2015-03-17
Where does the standard deviation come from? The standard deviation covers "about 2/3" of the data, but it has a definition based on the original data points.
Standard Deviation
• Standard deviation is a distance, from the average value, within which you find about 2/3 of the sample data points
These data sets each have 14 points, so about 2/3 of the data points is roughly 10 points (2/3 × 14 ≈ 9.3). Each has an average of 14. How far do you have to go from 14 (plus and minus) to include 10 data points, in each case?
Data set 1: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27
Data set 2: 11, 12, 12, 13, 13, 13.5, 14, 14, 14.5, 15, 15, 16, 16, 17
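A minimal Python sketch of the question above. The two lists are an assumption: they are read off the flattened figure (each has 14 points and an average of 14), so treat them as a reconstruction. `distance_to_cover` is a hypothetical helper that finds how far from the mean you must reach to include k points; for comparison, the loop also prints the formula-based sample standard deviation from Python's `statistics` module.

```python
import statistics

# Assumed reconstruction of the two data sets in the figure (14 points each)
wide = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
narrow = [11, 12, 12, 13, 13, 13.5, 14, 14, 14.5, 15, 15, 16, 16, 17]


def distance_to_cover(data, k):
    """Hypothetical helper: smallest d so that at least k points lie in mean - d .. mean + d."""
    mean = sum(data) / len(data)
    # Sort the distances from the mean; the k-th smallest one is the answer.
    distances = sorted(abs(x - mean) for x in data)
    return distances[k - 1]


for name, data in [("wide", wide), ("narrow", narrow)]:
    mean = sum(data) / len(data)
    print(f"{name}: mean {mean}, reach for 10 points {distance_to_cover(data, 10)}, "
          f"stdev {statistics.stdev(data):.2f}")
# wide: mean 14.0, reach for 10 points 9.0, stdev 8.37
# narrow: mean 14.0, reach for 10 points 2.0, stdev 1.72
```

The "reach" and the formula value are close but not identical, which is what you would expect from a rule that only says "about 2/3".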
Why does the standard deviation cover about 2/3 of the data points?
• Because of the Normal curve!
• Also called the Gaussian curve
[Figure: Normal curve]
Useful Details about the Normal Curve
1. σ is a measure of clustering
2. 68% of data is within 1σ of μ
3. 95% of data is within 2σ of μ
4. One σ from μ is at the point of inflection (POI) of the curve
5. σ² = variance
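The 68% and 95% figures and the point-of-inflection claim can be checked numerically. This sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8+); `second_derivative` is a hypothetical finite-difference helper added only for the illustration.

```python
from statistics import NormalDist

# Standard Normal curve: mean mu = 0, standard deviation sigma = 1
z = NormalDist(mu=0, sigma=1)

# Area under the curve within 1 and 2 standard deviations of the mean
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
print(f"within 1 sigma: {within_1:.4f}")  # 0.6827
print(f"within 2 sigma: {within_2:.4f}")  # 0.9545


def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2


# The curvature changes sign at x = 1 sigma: the point of inflection
print(second_derivative(z.pdf, 0.9) < 0 < second_derivative(z.pdf, 1.1))  # True
```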
Heuristics and the Normal Curve
• Heuristic: an approximate, simple rule based on experience
• The 2/3 rule (to find the standard deviation) is based on the upper picture.
• Another heuristic rule:
- To find the standard deviation, take the range and divide by 4. This rule is based on the lower picture. (About 95% of the data lie within 2σ of the mean, so the range covers roughly 4σ.)
Enter these points, in a single column…
80 40 90 100 45
100 60 45 75 30
45 30 55 60 35
50 25 55
These measurements represent weights, in grams.
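A quick comparison of the range/4 heuristic with the formula value, using the 18 weights listed above. The formula value here comes from Python's `statistics.stdev`, which computes the sample standard deviation (dividing by n - 1); the variable names are illustrative.

```python
import statistics

# Weights in grams, from the slide
weights = [80, 40, 90, 100, 45,
           100, 60, 45, 75, 30,
           45, 30, 55, 60, 35,
           50, 25, 55]

# Heuristic: the standard deviation is roughly the range divided by 4
range_estimate = (max(weights) - min(weights)) / 4
# Formula value: sample standard deviation (divides by n - 1)
formula_value = statistics.stdev(weights)

print(f"range / 4: {range_estimate:.2f} g")  # 18.75 g
print(f"stdev:     {formula_value:.2f} g")   # 23.51 g
```

For this small sample the heuristic underestimates the formula value, a reminder that it is only a rough guide.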
Calculating the Variance - Formula 1
• Compute the squares of the data
• Calculate the sum of the data, the sum of the squares of the data, and the count n
• Calculate the variance as:
$$\text{variance} = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n-1}$$
Observation: this calculation only depends on the sums and count, not on the individual data points. It can be updated as more data points are added.
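A minimal sketch of Formula 1 that keeps only the three running totals, so new points can be added without revisiting old ones. `RunningVariance` is a hypothetical class name chosen for this illustration; the data are the weights from the previous slide.

```python
class RunningVariance:
    """Keeps only the count, the sum, and the sum of squares (Formula 1)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x):
        # Each new point updates the three totals; the point itself is not stored.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def variance(self):
        # variance = (sum of squares - (sum)^2 / n) / (n - 1)
        if self.n < 2:
            raise ValueError("need at least two data points")
        return (self.sum_x2 - self.sum_x ** 2 / self.n) / (self.n - 1)


weights = [80, 40, 90, 100, 45, 100, 60, 45, 75, 30,
           45, 30, 55, 60, 35, 50, 25, 55]  # grams

stats = RunningVariance()
for w in weights:
    stats.add(w)
print(f"variance: {stats.variance():.2f} g^2")  # 552.94 g^2
```

Calling `stats.add(...)` again with a later measurement updates the result directly, which is the advantage the observation describes.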
Calculating the Standard Deviation
• The variance is a good measure of the scattering of the data, but…
- What are the units of the original data?
- What are the units of the mean?
- What are the units of the differences?
- What are the units of the variance???
• The units of the variance are the squares of the data units (grams squared for weights in grams), so the variance isn't directly comparable to the data.
• The standard deviation is the square root of the variance.
- It has the same units as the original data and the mean.
Standard Deviation
• The standard deviation is a value that has the same units as the average (and the data).
• It represents a distance from the average that covers about 2/3 of the data points.
• Calculate the square root of the variance to give the standard deviation:
$$\text{standard deviation} = \sqrt{\text{variance}}$$
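Taking the square root is what brings the units back from grams squared to grams. A short sketch with the weights data, using Python's `statistics.variance` (sample variance, divides by n - 1) and `math.sqrt`:

```python
import math
import statistics

weights = [80, 40, 90, 100, 45, 100, 60, 45, 75, 30,
           45, 30, 55, 60, 35, 50, 25, 55]  # grams

variance = statistics.variance(weights)  # units: grams squared
std_dev = math.sqrt(variance)            # units: grams, like the data and the mean

print(f"variance: {variance:.2f} g^2, standard deviation: {std_dev:.2f} g")
# variance: 552.94 g^2, standard deviation: 23.51 g
```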
Variance and standard deviation
Another Look
• Enter these points, in a column…
145 145 120 135 150
125 150 160 130 130
160 120 130 140 145 155 140
These measurements represent lengths, in inches.
Calculating the Variance - Another Way
• Compute the mean (the average) of the data
• Create a column of differences between each data point and the mean
- These are the errors
• Calculate the squares of the differences
- These are the Squared Errors
• Find the sum of the Squared Errors
- abbreviated SSE
• Divide the SSE by (n-1) to get the variance
- where n is the number of data points; (n-1) is called the degrees of freedom
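The same steps, one line each, applied to the 17 lengths from the "Another Look" slide. The variable names are illustrative; each comment maps a line to the matching bullet.

```python
lengths = [145, 145, 120, 135, 150,
           125, 150, 160, 130, 130,
           160, 120, 130, 140, 145, 155, 140]  # inches

n = len(lengths)
mean = sum(lengths) / n                   # compute the mean
errors = [x - mean for x in lengths]      # differences from the mean (the errors)
squared_errors = [e * e for e in errors]  # the Squared Errors
sse = sum(squared_errors)                 # SSE
variance = sse / (n - 1)                  # divide by the degrees of freedom

print(n, mean, sse, variance)  # 17 140.0 2650.0 165.625
```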
Formula 2 for Variance - the Sum of Squared Errors, SSE
$$\text{SSE} = \sum_{i=1}^{n} \left(x_i - \mu\right)^2 \qquad \text{variance} = \frac{\text{SSE}}{n-1}$$
where
- $x_i$ are the individual data points
- $\mu$ is the mean (the average) of all the data points
- $n$ is the number of data points
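Formula 1 and Formula 2 always give the same variance. Expanding the square and using μ = (Σx)/n shows that the SSE equals the numerator of Formula 1:

$$
\begin{aligned}
\text{SSE} = \sum_{i=1}^{n} (x_i - \mu)^2
  &= \sum x_i^2 - 2\mu \sum x_i + n\mu^2 \\
  &= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n}
   = \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
$$

As a check with the 17 lengths from the "Another Look" slide (Σx = 2380, Σx² = 335850):

$$\text{SSE} = 335850 - \frac{2380^2}{17} = 335850 - 333200 = 2650, \qquad \text{variance} = \frac{2650}{16} = 165.625$$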
Using SSE
Enter these points, in a column…
6.1 6.5 6.5 6.3 6.1
6.0 6.3 6.3 6.0 5.7
5.6 6.3 6.1 6.0 5.7
5.8 6.0 5.9
These measurements represent liquid quantities, in milliliters.
Calculating the Variance and Standard Deviation
• Compute the mean (the average) of the data
• Compute the differences
• Calculate the squares of the differences
• Find the sum of the Squared Errors
• Divide the SSE by (n-1) to get the variance
• Calculate the square root of the variance to give the standard deviation
Observation: this calculation is simple, but every difference depends on the mean, so it needs all the individual data points. It must be completely redone if more data points are added, because each new point changes the mean.
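A compact Python version of the step list, applied to the 18 liquid quantities. `sample_std_dev` is a hypothetical helper name; note that it walks over every stored point on each call, which is why adding a point means redoing the whole calculation.

```python
import math

volumes = [6.1, 6.5, 6.5, 6.3, 6.1,
           6.0, 6.3, 6.3, 6.0, 5.7,
           5.6, 6.3, 6.1, 6.0, 5.7,
           5.8, 6.0, 5.9]  # milliliters


def sample_std_dev(data):
    """Return (variance, standard deviation) using the SSE method."""
    n = len(data)
    mean = sum(data) / n                        # the mean
    sse = sum((x - mean) ** 2 for x in data)    # sum of the Squared Errors
    variance = sse / (n - 1)                    # divide by the degrees of freedom
    return variance, math.sqrt(variance)        # square root gives the standard deviation


variance, std_dev = sample_std_dev(volumes)
print(f"variance: {variance:.4f} mL^2, standard deviation: {std_dev:.4f} mL")
```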
Result
Standard Deviation, Variance, and Excel
• As with regression, Excel can calculate these values directly:
• =average( )
- calculate the mean of a set of points
• =count( )
- calculate n, the number of data points
• =var.s( ) or =var( )
- calculate the variance of a set of points
• =stdev.s( ) or =stdev( )
- calculate the standard deviation of a set of points
note: these calculations assume the points are a sample of the underlying population, so they divide by (n-1)
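Outside Excel, Python's standard-library `statistics` module offers rough counterparts. This is an analogy, not Excel's own API: `statistics.variance` and `statistics.stdev` treat the data as a sample (dividing by n - 1), while `statistics.pvariance` and `statistics.pstdev` treat it as a whole population (dividing by n).

```python
import statistics

volumes = [6.1, 6.5, 6.5, 6.3, 6.1, 6.0, 6.3, 6.3, 6.0,
           5.7, 5.6, 6.3, 6.1, 6.0, 5.7, 5.8, 6.0, 5.9]  # milliliters

print(statistics.mean(volumes))      # like =average( )
print(len(volumes))                  # like =count( )
print(statistics.variance(volumes))  # like =var.s( ): sample variance, divides by n - 1
print(statistics.stdev(volumes))     # like =stdev.s( ): sample standard deviation

# If the points were the whole population rather than a sample (divides by n):
print(statistics.pvariance(volumes))
print(statistics.pstdev(volumes))
```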