STAT1010 picturing data

Author: Gilbert Holland
STAT1010 – picturing data

3.2 Visualizing Distributions of data

- A frequency table provides information on the distribution of data.
  - When we discuss the distribution of a variable, we are referring to the possible values, and which of the values occur more (or less) frequently than the others.

Political affiliation   Frequency
Democrat                517         (occurred a lot)
Republican              371
Independent             112         (occurred less frequently)

The "Political affiliation" column lists the possible values.
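As a rough illustration of reading a distribution off a frequency table, the Python sketch below stores the three survey counts from the table and picks out the most and least frequent values. The variable names are my own; only the counts come from the notes.

```python
# Frequency table from the survey in these notes (category -> frequency).
frequency = {"Democrat": 517, "Republican": 371, "Independent": 112}

n = sum(frequency.values())                       # total number of responses
most_common = max(frequency, key=frequency.get)   # the value that occurred a lot
least_common = min(frequency, key=frequency.get)  # the value that occurred less frequently

print(n, most_common, least_common)
```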

The distribution of the data

- The distribution of data is the way the data values are spread over all possible values.
  - What values occur frequently?
  - If the variable is numeric, what is the maximum value? What is the minimum value?
  - What is the “shape” of the distribution?

[Figure: histogram "Weight of Contents of Cans of Cola"; horizontal axis: Weight (grams), 330 to 390; vertical axis ticks at 0, 5, 10]

Graphical displays of distributions

- As the phrase goes, “a picture is worth 1000 words”, and distributions are often better conveyed using graphics rather than tables.

[Figure: the political affiliation frequency table (Democrat 517, Republican 371, Independent 112) shown next to a bar graph titled "Political affiliation in a 1000 person survey"; horizontal axis: political affiliation; vertical axis: frequency of affiliation, 0 to 600]


Pioneer in Statistical Graphics

- Florence Nightingale
  - See video clip from “Joy of Statistics”

Bar graph

- Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.

[Figure: bar graph "Political affiliation in a 1000 person survey"; horizontal axis: political affiliation (Democrat, Republican, Independent); vertical axis: frequency of affiliation, 0 to 600]
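A bar graph can be approximated in plain text to see the idea without any plotting library. The sketch below is only an illustration: the function name `text_bar_graph` and the choice of one `#` per 25 responses are assumptions of mine, while the counts are the survey frequencies from these notes.

```python
# Survey frequencies from the notes (category -> frequency).
counts = {"Democrat": 517, "Republican": 371, "Independent": 112}

def text_bar_graph(freqs, scale=25):
    """Return one text bar per category; each '#' stands for `scale` responses."""
    width = max(len(name) for name in freqs)   # pad labels to a common width
    lines = []
    for name, freq in freqs.items():
        bar = "#" * round(freq / scale)        # bar length is proportional to frequency
        lines.append(f"{name:<{width}} | {bar} {freq}")
    return lines

for line in text_bar_graph(counts):
    print(line)
```

Every bar is drawn with the same thickness (one text row), and only the length carries information, which mirrors the uniform bar widths of a real bar graph.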

Bar graph - labels

- Always provide useful labels.

[Figure: the bar graph "Political affiliation in a 1000 person survey" with its parts annotated: main title, vertical axis label (frequency of affiliation), horizontal axis label (political affiliation), tick marks, and categories (Democrat, Republican, Independent)]


Bar graph - formatting

- Some things to remember…
  - Space between bars (specifically when this is a categorical variable plot)
  - Some white space at top
  - Uniform (arbitrary) bar widths

[Figure: the bar graph "Political affiliation in a 1000 person survey" with these formatting features annotated]

Bar graph – Pareto chart

- A bar graph in which the bars are arranged in frequency order is called a Pareto chart.

[Figure: "Political affiliation in a 1000 person survey", captioned "A Pareto chart (descending order)"; bars in the order Democrat, Republican, Independent; vertical axis: frequency of affiliation]

Bar graph – Pareto chart

[Figure: "Political affiliation in a 1000 person survey", captioned "Not a Pareto chart"; bars in the order Democrat, Independent, Republican; vertical axis: frequency of affiliation]


Bar graph – Pareto chart

[Figure: two bar graphs, one captioned "A Pareto chart (and also a bar chart)" and the other "Not a Pareto chart (but it is a bar chart)"]
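The difference between the two charts is only the order of the bars. A minimal sketch, assuming "frequency order" means descending frequency as in the captioned example; the helper names `pareto_order` and `is_pareto` are hypothetical.

```python
# Survey frequencies, listed in the bar order of the "Not a Pareto chart" figure.
counts = {"Democrat": 517, "Independent": 112, "Republican": 371}

def pareto_order(freqs):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return sorted(freqs.items(), key=lambda item: item[1], reverse=True)

def is_pareto(pairs):
    """True when the frequencies never increase from one bar to the next."""
    values = [freq for _, freq in pairs]
    return all(a >= b for a, b in zip(values, values[1:]))

print(is_pareto(list(counts.items())))   # bar order Democrat, Independent, Republican
print(pareto_order(counts))              # bars rearranged into descending order
```

Rearranging bars like this is only meaningful for qualitative categories, which have no natural order of their own.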

Example: Deflategate

- In 2014, there was a National Football League (NFL) scandal called ‘Deflategate’.
- The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).
- Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will actually look at the data as the number of plays per fumble (a higher value means fewer fumbles).

Example: Deflategate (offensive plays)

[Figure: bar graph of the frequency of fumbles, presented as plays per fumble; the categories are the teams]

Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible


Dot plot – similar to a bar graph

- If there are only a small number of observations (or counts), a dot plot can be used.
- One dot per observation.
- Sometimes seen as a quick and easy plot in the engineering field.

Pie Charts

- Also used to plot qualitative variables.
- A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation   Frequency   Relative frequency
Democrat                517         0.517
Republican              371         0.371
Independent             112         0.112

[Figure: pie chart "Political affiliation in a 1000 person survey"; wedges: Democrat 51.7%, Republican 37.1%, Independent 11.2%]
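The relative frequencies in the table are each frequency divided by the total, and a wedge's angle is its relative frequency times 360 degrees. The sketch below works this out for the survey counts; the angle column is my addition and does not appear in the notes.

```python
# Survey frequencies from the notes (category -> frequency).
counts = {"Democrat": 517, "Republican": 371, "Independent": 112}
total = sum(counts.values())

# Relative frequency = frequency / total number of observations.
relative = {name: freq / total for name, freq in counts.items()}

# Each wedge takes up its share of the full 360-degree circle.
angles = {name: 360 * share for name, share in relative.items()}

for name in counts:
    print(f"{name:<11} {relative[name]:.3f} {angles[name]:7.2f} degrees")
```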

Pie Charts

- As I may have mentioned earlier, research has been done that shows that our brains do not interpret pie charts very well.
- Consider other options first before presenting a pie chart.

[Figure: two charts shown for comparison, captioned "Our brains comprehend this one better than this one."]


Histograms

- A histogram is like a bar graph, but it shows a distribution for a quantitative variable.
- The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.
- The bars in a histogram touch each other because there are no gaps between the categories.

Histogram

[Figure: generic histogram. Vertical axis: Frequency (how ‘often’ a value falls into a given bin). Horizontal axis: Measurement (quantitative values grouped into bins).]

Histogram Example

- 24 cola cans were sampled and weighed.
- A frequency table and histogram were created:

Class (range of values)   Frequency
[340,350)                 1
[350,360)                 11
[360,370)                 8
[370,380)                 4

[Figure: histogram "Weight of Contents of Cans of Cola"; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15]
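The classes in the table are half-open intervals: [350,360) includes 350 but excludes 360, so every weight falls into exactly one class. A minimal sketch, assuming classes of width 10 starting at 340 as in the table; the function name `class_label` and the sample weights are hypothetical and are not the 24 measured cans.

```python
import math

def class_label(weight, start=340, width=10):
    """Return the half-open class [a,b) that contains `weight`."""
    index = math.floor((weight - start) / width)   # how many full classes lie below the weight
    lower = start + index * width                  # lower bound is included in the class
    return f"[{lower},{lower + width})"            # upper bound is excluded

# Hypothetical weights chosen to sit on or near class boundaries.
for w in (349.9, 350, 359.99, 360):
    print(w, class_label(w))
```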


Histogram Example

[Figure: histogram "Weight of Contents of Cans of Cola"; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 15]

- Axes and labels still important.
- Some white space at top.
- No space between bars (specifically when this is a quantitative variable plot).
- Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.

Histogram Example

- Same data, more classes (narrower bins)… histogram looks a bit different.

Class (range of values)   Frequency
[345,350)                 1
[350,355)                 6
[355,360)                 5
[360,365)                 1
[365,370)                 7
[370,375)                 3
[375,380)                 1

[Figure: histogram "Weight of Contents of Cans of Cola" with 5-gram bins; horizontal axis: Weight (grams), 330 to 390; vertical axis: Frequency, 0 to 10]
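Because each 5-gram class sits entirely inside one 10-gram class, the narrow-bin table can be collapsed back into the wide-bin table by adding neighbouring frequencies. The sketch below does that; the function name `merge_bins` is hypothetical, and the approach only works when the narrow bins nest inside the wide ones.

```python
# Narrow-bin table from the notes: lower bound of each 5-gram class -> frequency.
narrow = {345: 1, 350: 6, 355: 5, 360: 1, 365: 7, 370: 3, 375: 1}

def merge_bins(freqs, new_width):
    """Add up frequencies of narrow classes that fall inside the same wider class."""
    merged = {}
    for lower, freq in freqs.items():
        new_lower = (lower // new_width) * new_width   # lower bound of the enclosing wide class
        merged[new_lower] = merged.get(new_lower, 0) + freq
    return dict(sorted(merged.items()))

wide = merge_bins(narrow, 10)
print(wide)   # {340: 1, 350: 11, 360: 8, 370: 4}
```

The merged counts match the earlier table with 10-gram classes, and both sum to the 24 sampled cans: the data are the same, only the binning differs.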

Example: Deflategate (all plays)

[Figure: histogram of plays per fumble (a numeric variable) against the number of teams falling into each bin; annotated "Patriots and their 187 plays/fumble"]

NOTE: This author should have the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.

Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible


Displaying Quantitative Data

- Histogram
  - Provides a picture or shape of the distribution of the data.
  - Collects values into bins.
  - Bins should be of equal width and they should touch each other.
  - Different bin choices can yield different pictures.
  - Can show frequencies or relative frequencies.

Stem-and-leaf plots

- We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.
- A stem-and-leaf plot is similar to a histogram, but individual data points are identified.
- As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.

Stem-and-leaf plots

- One leaf is associated with one data point.
- Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6

Here, a ‘leaf’ is the value one place to the right of the decimal place.
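To make the stem and leaf split concrete, the sketch below builds a text stem-and-leaf display for the eight example values, using the whole-number part as the stem and the first decimal digit as the leaf. The function name is hypothetical, and listing stems with no leaves (1 and 4 here) is my own choice so that gaps in the data stay visible.

```python
# Example data from the notes.
data = [5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6]

def stem_and_leaf(values):
    """Return display lines: stem = whole-number part, leaf = first decimal digit."""
    tenths = sorted(round(v * 10) for v in values)   # work in integer tenths to avoid float issues
    leaves = {}
    for t in tenths:
        stem, leaf = divmod(t, 10)                   # e.g. 54 tenths -> stem 5, leaf 4
        leaves.setdefault(stem, []).append(str(leaf))
    first, last = tenths[0] // 10, tenths[-1] // 10
    return [f"{stem} | {''.join(leaves.get(stem, []))}".rstrip()
            for stem in range(first, last + 1)]

for line in stem_and_leaf(data):
    print(line)
```

Each of the eight data points appears as exactly one leaf, so the value 2.6, which occurs twice, contributes two 6s on stem 2.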


Stem-and-leaf example

- Recall the 80 observations on compressive strengths:

105 221 183 186 121 181 180 143
 97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178  76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149  87 160 237 150 135
196 201 200 176 150 170 118 149

Stem-and-leaf example

- 80 observations
- Min: 76, Max: 245
- Here, a ‘leaf’ represents the “ones place”.
- Looks somewhat like a histogram turned on its side, but we can identify individual data points.
- Gives you a feel for the distribution of the data.

 7 | 6
 8 | 7
 9 | 7
10 | 15
11 | 058
12 | 013
13 | 133455
14 | 12356899
15 | 001344678888
16 | 0003357789
17 | 0112445668
18 | 0011346
19 | 034699
20 | 0178
21 | 8
22 | 189
23 | 7
24 | 5

The decimal point is 1 digit(s) to the right of the |
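The display for the 80 compressive strengths can be rebuilt from the raw observations by splitting each value into its tens part (the stem) and its ones digit (the leaf). This is a sketch of that idea, not necessarily how the original display was produced; the variable names are my own.

```python
from collections import defaultdict

# The 80 compressive-strength observations listed in the notes.
strengths = [
    105, 221, 183, 186, 121, 181, 180, 143,
     97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178,  76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149,  87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

leaves = defaultdict(str)
for value in sorted(strengths):
    stem, leaf = divmod(value, 10)   # stem = tens and above, leaf = ones digit
    leaves[stem] += str(leaf)

for stem in range(min(leaves), max(leaves) + 1):
    print(f"{stem:>2} | {leaves[stem]}")

print("n =", len(strengths), "min =", min(strengths), "max =", max(strengths))
```

The number of leaves on each stem is the frequency a histogram with 10-unit bins would show, which is why the display resembles a histogram turned on its side.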


Line charts

- Also used to represent a quantitative variable.
- Created by connecting the ‘center dots’ at the top of the bars of a histogram.
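The ‘center dot’ of a bar sits at the midpoint of its class, at the height of its frequency. As an illustration, the sketch below reuses the cola-can frequency table from the histogram example (an assumption on my part; the line chart slide shows different data) and computes the points a line chart would connect.

```python
# Cola-can classes from the histogram example: ((lower, upper), frequency).
classes = [((340, 350), 1), ((350, 360), 11), ((360, 370), 8), ((370, 380), 4)]

# One point per bar: x = class midpoint, y = frequency (the top of the bar).
points = [((lower + upper) / 2, freq) for (lower, upper), freq in classes]

print(points)   # [(345.0, 1), (355.0, 11), (365.0, 8), (375.0, 4)]
```

Joining these points in order from left to right gives the line chart.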

Line chart example

[Figure: line chart example. A histogram is also shown here, but it is not part of the line chart.]

Time-Series Graph

- If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.
- Time series plots show how things change over time.
- Often used with financial market information or housing data.


Time-Series Graph – example

- A line chart with a horizontal axis of time (Year) is a time-series graph.

Time-Series Graph – example

[Figure: "Homes sold in Iowa City by zip code and month"; horizontal axis: Year (data by the month)]

1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?

Time-Series Graph – example

[Figure: "Number of Olympic medals"; horizontal axis: Year]

1) What is the width of the underlying bin?
