STAT1010 – picturing data
3.2 Visualizing Distributions of data
• A frequency table provides information on the distribution of data.
  - When we discuss the distribution of a variable, we are referring to the possible values, and which of the values occur more (or less) frequently than the others.
Political affiliation (possible values)    Frequency
Democrat                                   517   (occurred a lot)
Republican                                 371
Independent                                112   (occurred less frequently)
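As a minimal sketch of how such a frequency table can be tallied, the snippet below rebuilds the 1000 survey responses from the counts in the table and counts them again with Python's standard library. The variable names are illustrative assumptions, not part of the course material.

```python
from collections import Counter

# Rebuild the 1000 individual responses from the counts in the table.
responses = ["Democrat"] * 517 + ["Republican"] * 371 + ["Independent"] * 112

# Counter tallies how often each possible value occurs.
frequency = Counter(responses)

# most_common() lists the values from most to least frequent.
ranked = frequency.most_common()
most_frequent = ranked[0]
least_frequent = ranked[-1]

for affiliation, count in ranked:
    print(f"{affiliation:<12} {count:>4}")
```

The first and last entries of `ranked` correspond to the "occurred a lot" and "occurred less frequently" annotations on the table.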
The distribution of the data
• The distribution of data is the way the data values are spread over all possible values.
  - What values occur frequently?
  - If the variable is numeric, what is the maximum value? What is the minimum value?
  - What is the “shape” of the distribution?
[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis: Weight (grams), from 330 to 390]
Graphical displays of distributions
• As the phrase goes… “a picture is worth 1000 words”, and distributions are often better conveyed using graphics rather than tables.
[Figure: the frequency table (Democrat 517, Republican 371, Independent 112) shown next to a bar graph titled “Political affiliation in a 1000 person survey”; horizontal axis: political affiliation; vertical axis: frequency of affiliation]
Pioneer in Statistical Graphics
• Florence Nightingale
  - See video clip from “Joy of Statistics”
Bar graph
• Used to represent frequencies (or relative frequencies) for qualitative or categorical variables.
[Figure: bar graph “Political affiliation in a 1000 person survey”; horizontal axis: political affiliation (Democrat, Republican, Independent); vertical axis: frequency of affiliation]
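A rough text-only sketch of a bar graph for the survey counts follows. It is an illustration under stated assumptions rather than the plotting method used for the slides: the function names, the `=` mark, and the choice of one mark per 10 responses are all hypothetical.

```python
counts = {"Democrat": 517, "Republican": 371, "Independent": 112}

def bar_lengths(counts, per_mark=10):
    """Number of marks per category; one mark stands for `per_mark` responses."""
    return {name: round(count / per_mark) for name, count in counts.items()}

def text_bar_graph(counts, title, category_label, value_label, per_mark=10):
    """Return a horizontal text bar graph with a title and both axis labels."""
    lengths = bar_lengths(counts, per_mark)
    width = max(len(name) for name in counts)
    lines = [title, f"{category_label} | {value_label} (one mark = {per_mark})"]
    for name, count in counts.items():
        lines.append("")  # blank row: bars of a categorical variable do not touch
        lines.append(f"{name:<{width}} | {'=' * lengths[name]} {count}")
    return "\n".join(lines)

graph = text_bar_graph(
    counts,
    title="Political affiliation in a 1000 person survey",
    category_label="political affiliation",
    value_label="frequency of affiliation",
)
print(graph)
```

Every bar uses the same row height, which mirrors the uniform (arbitrary) bar widths of a bar graph for categorical data.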
Bar graph - labels
• Always provide useful labels.
[Figure: bar graph “Political affiliation in a 1000 person survey” with its parts annotated: main title, vertical axis label (frequency of affiliation), horizontal axis label (political affiliation), tick marks, and categories (Democrat, Republican, Independent)]
Bar graph - formatting
• Some things to remember…
  - Space between bars (specifically when this is a categorical variable plot)
  - Some white space at top
  - Uniform (arbitrary) bar widths
[Figure: bar graph “Political affiliation in a 1000 person survey”; horizontal axis: political affiliation; vertical axis: frequency of affiliation]
Bar graph – Pareto chart
• A bar graph in which the bars are arranged in frequency order is called a Pareto chart.
[Figure: “A Pareto chart (descending order)”: bar graph “Political affiliation in a 1000 person survey” with bars ordered Democrat, Republican, Independent; vertical axis: frequency of affiliation]
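The ordering behind a Pareto chart can be sketched in a few lines. This assumes "frequency order" means descending frequency, as in the figure for this slide; the function names are hypothetical.

```python
# Counts listed in an arbitrary (non-Pareto) order.
counts = {"Democrat": 517, "Independent": 112, "Republican": 371}

def pareto_order(counts):
    """Return (category, frequency) pairs sorted from most to least frequent."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def is_pareto(ordered_counts):
    """True when the frequencies never increase from one bar to the next."""
    freqs = [freq for _, freq in ordered_counts]
    return all(a >= b for a, b in zip(freqs, freqs[1:]))

print(pareto_order(counts))
print(is_pareto(list(counts.items())))   # order Democrat, Independent, Republican
print(is_pareto(pareto_order(counts)))   # order Democrat, Republican, Independent
```

Both orderings are valid bar graphs; only the sorted one is a Pareto chart.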
[Figure: “Not a Pareto chart”: the same bar graph with bars ordered Democrat, Independent, Republican]
Bar graph – Pareto chart
[Figure: two bar graphs side by side, labeled “A Pareto chart (and also a bar chart)” and “Not a Pareto chart (but it is a bar chart)”]
Example: Deflategate
• During the 2014 season, there was a National Football League (NFL) scandal called ‘Deflategate’.
• The Patriots were accused of underinflating their game footballs, which would allow for fewer fumbles (an unfair advantage).
• Did it look like the Patriots had fewer fumbles? If so, how many fewer? We will actually look at the data as number of Plays per Fumble (a high value means fewer fumbles).
Example: Deflategate (offensive plays)
[Figure: bar graph with categories (i.e. teams) on one axis and frequency of fumbles, presented as plays per fumble, on the other]
Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible
Dot plot – similar to a bar graph
• If there are only a small number of observations (or counts), a dot plot can be used.
• One dot per observation.
• Sometimes seen as a quick and easy plot in the engineering field.
Pie Charts
• Also used to plot qualitative variables.
• A pie chart is a circle divided so that each wedge represents the relative frequency of a particular category.

Political affiliation    Frequency    Relative frequency
Democrat                 517          0.517
Republican               371          0.371
Independent              112          0.112

[Figure: pie chart “Political affiliation in a 1000 person survey” with wedges Democrat 51.7%, Republican 37.1%, Independent 11.2%]
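The arithmetic behind the wedges can be sketched as follows: each relative frequency is the count divided by the total, and the wedge angle is that share of 360 degrees. The variable names are illustrative assumptions.

```python
counts = {"Democrat": 517, "Republican": 371, "Independent": 112}

total = sum(counts.values())

# Relative frequency = frequency / total number of observations.
relative = {name: count / total for name, count in counts.items()}

# Each wedge takes the same share of the full circle (360 degrees).
angles = {name: 360 * share for name, share in relative.items()}

for name in counts:
    print(f"{name:<12} {relative[name]:.1%}  {angles[name]:.2f} degrees")
```

The relative frequencies sum to 1, so the wedge angles sum to 360 degrees and the wedges fill the circle exactly.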
Pie Charts
• As I may have mentioned earlier, research has been done that shows that our brains do not interpret pie charts very well.
• Consider other options first before presenting a pie chart.
[Figure: two charts compared, with the annotation “Our brains comprehend this one better than this one.”]
Histograms
• A histogram is like a bar graph, but it shows a distribution for a quantitative variable.
• The bars have a natural order (thus, the classes must be quantitative in nature) and the bar widths have specific meaning.
• The bars in a histogram touch each other because there are no gaps between the categories.

Histogram
[Figure: generic histogram. Horizontal axis (Measurement): quantitative values grouped into bins. Vertical axis (Frequency): how ‘often’ a value falls into a given bin.]
Histogram Example
• 24 cola cans were sampled and weighed.
• A frequency table and histogram were created:

Class range of values    Frequency
[340,350)                1
[350,360)                11
[360,370)                8
[370,380)                4

[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis: Weight (grams), from 330 to 390; vertical axis: Frequency, from 0 to 15]
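The class notation [340,350) denotes a half-open interval: 340 is included and 350 is not, so every weight falls into exactly one class. A minimal sketch of that rule is below; the function names and the probe values are hypothetical, chosen only to show what happens at a class boundary (they are not observations from the cola sample).

```python
import math

def class_interval(value, start=340, width=10):
    """Return the half-open class [low, high) that contains `value`."""
    index = math.floor((value - start) / width)
    low = start + index * width
    return (low, low + width)

def label(interval):
    """Format a class the way the frequency table writes it."""
    low, high = interval
    return f"[{low},{high})"

# Probe values at and around a class boundary (illustrative only).
for weight in (349.9, 350, 359.99):
    print(weight, "->", label(class_interval(weight)))
```

A weight of exactly 350 is counted in [350,360), not in [340,350), which is why adjacent bars can touch without any value being counted twice.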
Histogram Example
[Figure: histogram “Weight of Contents of Cans of Cola”; horizontal axis: Weight (grams); vertical axis: Frequency]
• Axes and labels still important.
• Some white space at top.
• No space between bars (specifically when this is a quantitative variable plot).
• Rearranging these bars (as we did in a Pareto chart for qualitative data) would not make sense here. The classes are in order from smallest to largest.
Histogram Example
• Same data, more classes (narrower bins)… histogram looks a bit different.

Class range of values    Frequency
[345,350)                1
[350,355)                6
[355,360)                5
[360,365)                1
[365,370)                7
[370,375)                3
[375,380)                1

[Figure: histogram “Weight of Contents of Cans of Cola” with 5-gram bins; horizontal axis: Weight (grams), from 330 to 390; vertical axis: Frequency, from 0 to 10]
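The two frequency tables describe the same 24 cans, so adding up the 5-gram classes that fall inside each 10-gram class should reproduce the first table. The sketch below checks this; `merge_bins` is a hypothetical helper written for this illustration.

```python
# Narrow-bin table from this slide: (low, high) -> frequency.
narrow = {
    (345, 350): 1, (350, 355): 6, (355, 360): 5, (360, 365): 1,
    (365, 370): 7, (370, 375): 3, (375, 380): 1,
}

def merge_bins(table, start, width):
    """Combine narrow classes into wider classes [start + k*width, start + (k+1)*width)."""
    merged = {}
    for (low, high), freq in table.items():
        wide_low = start + ((low - start) // width) * width
        wide = (wide_low, wide_low + width)
        if high > wide[1]:
            raise ValueError(f"class [{low},{high}) straddles two wide classes")
        merged[wide] = merged.get(wide, 0) + freq
    return merged

wide = merge_bins(narrow, start=340, width=10)
for (low, high), freq in sorted(wide.items()):
    print(f"[{low},{high})  {freq}")
```

The counts are identical in total, but the narrower bins reveal a dip between 360 and 365 grams that the 10-gram classes hide, which is one way different bin choices yield different pictures.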
Example: Deflategate (all plays)
[Figure: histogram of plays per fumble (a numeric variable) on the horizontal axis against the number of teams falling into each bin; the Patriots and their 187 plays/fumble are marked]
NOTE: This author should have the bars touching each other for a correct histogram presentation. Don’t put space between bars in a histogram.
Source: http://www.sharpfootballanalysis.com/blog/2015/the-new-england-patriots-prevention-of-fumbles-is-nearly-impossible
Displaying Quantitative Data
• Histogram
  - Provides a picture or shape of the distribution of the data.
  - Collects values into bins.
  - Bins should be of equal width and they should touch each other.
  - Different bin choices can yield different pictures.
  - Can show frequencies or relative frequencies.
Stem-and-leaf plots
• We can’t see individual data points in a histogram due to the binning and the use of the bars for frequencies.
• A stem-and-leaf plot is similar to a histogram, but individual data points are identified.
• As with dot plots, this type of plot probably makes the most sense when the number of observations is relatively small.
Stem-and-leaf plots
• One leaf is associated with one data point.
• Example data: 5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6
Here, a ‘leaf’ is the value one place to the right of the decimal place.
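One way to build the stem-and-leaf display for the eight example values is sketched below. The function name and the `leaf_unit` parameter are assumptions of this sketch, and so is the choice to print stems that have no leaves.

```python
from collections import defaultdict

data = [5.4, 0.7, 3.0, 2.6, 0.3, 2.8, 5.2, 2.6]

def stem_and_leaf(values, leaf_unit):
    """Return {stem: sorted leaves}; a leaf is the digit in the `leaf_unit` place."""
    leaves = defaultdict(list)
    for value in values:
        scaled = round(value / leaf_unit)   # e.g. 5.4 -> 54 when leaf_unit is 0.1
        stem, leaf = divmod(scaled, 10)     # 54 -> stem 5, leaf 4
        leaves[stem].append(leaf)
    # Include empty stems so that gaps in the data stay visible.
    return {stem: sorted(leaves[stem]) for stem in range(min(leaves), max(leaves) + 1)}

plot = stem_and_leaf(data, leaf_unit=0.1)
for stem, stem_leaves in plot.items():
    print(f"{stem} | {''.join(map(str, stem_leaves))}")
```

Each of the eight data points appears as exactly one leaf; the repeated value 2.6 shows up as two 6s on stem 2.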
Stem-and-leaf example
• Recall the 80 observations on compressive strengths:
105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149
Stem-and-leaf example
• 80 observations
• Min: 76, Max: 245
• Here, a ‘leaf’ represents the “ones place”.
• Looks somewhat like a histogram turned on its side, but we can identify individual data points.
• Gives you a feel for the distribution of the data.

The decimal point is 1 digit(s) to the right of the |

   7 | 6
   8 | 7
   9 | 7
  10 | 15
  11 | 058
  12 | 013
  13 | 133455
  14 | 12356899
  15 | 001344678888
  16 | 0003357789
  17 | 0112445668
  18 | 0011346
  19 | 034699
  20 | 0178
  21 | 8
  22 | 189
  23 | 7
  24 | 5
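The display for the 80 compressive strengths can be reproduced with a short script. This is a sketch, not necessarily how the slide's display was generated; it assumes integer data with the ones digit as the leaf, and the function name is hypothetical.

```python
# The 80 compressive-strength observations listed on the earlier slide.
strengths = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

def stem_rows(values):
    """Return {stem: string of sorted leaves}, with the ones digit as the leaf."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)      # 154 -> stem 15, leaf 4
        rows[stem] = rows.get(stem, "") + str(leaf)
    return rows

rows = stem_rows(strengths)
for stem in range(min(rows), max(rows) + 1):
    print(f"{stem:>4} | {rows.get(stem, '')}")
```

The longest rows (stems 15 to 17) show where the observations concentrate, which is the same information a histogram with 10-unit bins would give.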
Line charts
• Also used to represent a quantitative variable.
• Created by connecting the ‘center dots’ at the top of the bars of a histogram.
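The ‘center dots’ are the points (class midpoint, frequency). As a small sketch, the snippet below computes them for the 10-gram cola-can frequency table from the histogram example earlier in these notes; the function name is a hypothetical choice.

```python
# Frequency table from the cola-can histogram example: (low, high) -> frequency.
table = {(340, 350): 1, (350, 360): 11, (360, 370): 8, (370, 380): 4}

def line_chart_points(table):
    """Return (class midpoint, frequency) pairs, ordered left to right."""
    return [((low + high) / 2, freq) for (low, high), freq in sorted(table.items())]

points = line_chart_points(table)
for midpoint, freq in points:
    print(midpoint, freq)
```

Connecting these points in order with straight segments gives the line chart; the histogram bars themselves are not part of it.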
Line chart example
[Figure: a line chart drawn over a histogram. A histogram is also shown here, but it is not part of the line chart.]
Time-Series Graph
• If a histogram or line chart has a horizontal axis of time, then it is a time-series graph.
• Time series plots show how things change over time.
• Often used with financial market information or housing data.
Time-Series Graph – example
• A line chart with a horizontal axis of time (Year) -> a time-series graph.
Time-Series Graph – example
[Figure: “Homes sold in Iowa City by zip code and month”; horizontal axis: Year (data by the month)]
1) What is the general trend over the years 2006-2011?
2) What is the general trend within each year?
3) What is the width of the underlying bin?
Time-Series Graph – example
[Figure: “Number of Olympic medals”; horizontal axis: Year]
1) What is the width of the underlying bin?