Where does the standard deviation come from?

Author: Miles Spencer
2015-03-17

Where does the standard deviation come from? The standard deviation covers "about 2/3" of the data, but it has a precise definition based on the original data points.

Standard deviation

• The standard deviation is a distance, from the average value, within which you find about 2/3 of the sample data points.

These data sets each have 14 points, so about 2/3 of the data points is roughly 10 points. Each has an average of 14. How far do you have to go from 14 (plus and minus) to include 10 data points, in each case?

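[Figure: the two data sets plotted side by side]

First data set: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27

Second data set: 11, 12, 12, 13, 13, 13.5, 14, 14, 14.5, 15, 15, 16, 16, 17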
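A minimal Python sketch of the exercise above: for each data set it sorts the distances from 14 and reads off the 10th smallest, which is how far you must go to include 10 points. The function name `distance_to_cover` is hypothetical, and the standard deviations from Python's `statistics.stdev` are printed only for comparison.

```python
import statistics

def distance_to_cover(data, center, k):
    """Smallest distance d such that at least k points lie within center - d .. center + d."""
    distances = sorted(abs(x - center) for x in data)
    return distances[k - 1]

first_set = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27]
second_set = [11, 12, 12, 13, 13, 13.5, 14, 14, 14.5, 15, 15, 16, 16, 17]

for data in (first_set, second_set):
    print(
        statistics.mean(data),             # both averages are 14
        distance_to_cover(data, 14, 10),   # distance that takes in 10 of the 14 points
        round(statistics.stdev(data), 2),  # sample standard deviation, for comparison
    )
```

On these data the first set needs about ±9 and the second about ±2 (the ties at 12 and 16 mean ±2 actually takes in 12 points), close to their sample standard deviations of roughly 8.37 and 1.72.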


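Why does the Standard Deviation cover 2/3 of the data points?

• Because of the Normal curve! For data that follow the Normal curve, about 68% (roughly 2/3) of the points lie within one standard deviation of the mean.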

Also called the Gaussian curve.

[Figure: the Normal curve, plotted for x from -20 to 50, with heights up to about 0.06]

Useful Details about the Normal Curve

1. σ is a measure of clustering (how tightly the data gather around μ)
2. 68% of data is within 1σ of μ
3. 95% of data is within 2σ of μ
4. One σ from μ is at the POI (point of inflection) of the curve
5. σ² = variance
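A small sketch that checks items 2 and 4 numerically, assuming an exact Normal curve: `math.erf` gives the fraction of the curve within k standard deviations, and the second derivative of the curve is zero one σ from μ. The helper names and the example values μ = 14, σ = 8 are illustrative assumptions.

```python
import math

def fraction_within(k):
    """Fraction of a Normal distribution within k standard deviations of the mean."""
    # For a standard Normal Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normal_pdf_curvature(x, mu, sigma):
    """Second derivative of the Normal curve; it changes sign at a point of inflection."""
    return normal_pdf(x, mu, sigma) * ((x - mu) ** 2 / sigma ** 4 - 1 / sigma ** 2)

for k in (1, 2):
    print(k, round(fraction_within(k), 4))   # 1 0.6827, then 2 0.9545

mu, sigma = 14.0, 8.0   # example parameters, chosen only for illustration
print(normal_pdf_curvature(mu + sigma, mu, sigma))  # prints 0.0: the curve bends the other way here
```

The curvature is negative between μ - σ and μ + σ and positive outside, so one σ marks where the curve switches from bending down to bending up.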


Heuristics and the Normal Curve

• Heuristic: an approximate, simple rule based on experience

• The ⅔ rule (to find the standard deviation) is based on the upper picture.

• Another heuristic rule: to find the standard deviation, take the range and divide by 4. This rule is based on the lower picture: about 95% of the data lie within 2σ of μ, so the range covers roughly 4σ.

Enter these points, in a single column…

80 40 90 100 45

100 60 45 75 30

45 30 55 60 35

50 25 55

These measurements represent weights, in grams.
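As a rough check on both heuristics, a sketch that applies them to these weights and compares the results with the sample standard deviation. It assumes the ⅔ rule means "the distance from the mean that takes in round(2n/3) points"; the function names are hypothetical.

```python
import statistics

# Weights in grams, from the slide above.
weights = [80, 40, 90, 100, 45, 100, 60, 45, 75, 30,
           45, 30, 55, 60, 35, 50, 25, 55]

def two_thirds_rule(data):
    """Distance from the mean needed to take in about 2/3 of the points."""
    mean = statistics.mean(data)
    k = round(2 * len(data) / 3)             # 12 of the 18 weights
    return sorted(abs(x - mean) for x in data)[k - 1]

def range_rule(data):
    """Range divided by 4 (about 95% of Normal data lie within ±2 sigma, a span of 4 sigma)."""
    return (max(data) - min(data)) / 4

print(round(two_thirds_rule(weights), 2))    # about 23.33 g
print(range_rule(weights))                   # 18.75 g
print(round(statistics.stdev(weights), 2))   # about 23.51 g
```

On this small sample the ⅔ rule lands close to the computed value, while range/4 comes out noticeably lower; both rules are only meant to be approximate.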


Calculating the Variance - Formula 1

• Compute the squares of the data

• Calculate the sum of the data, the sum of the squares of the data, and the count n

• Calculate the Variance as:

$$\text{variance} = \frac{\sum x^2 - \frac{\left(\sum x\right)^2}{n}}{n-1}$$

Observation: this calculation only depends on the sums and count, not on the individual data points. It can be updated as more data points are added.
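The observation above can be sketched in Python: a hypothetical `RunningSums` class keeps only n, Σx and Σx², so each new point updates three totals and the variance can be recomputed from Formula 1 at any time. The extra point 65 g is a made-up value for illustration.

```python
import math

class RunningSums:
    """Keeps only n, the sum of x, and the sum of x squared (Formula 1)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x):
        # Adding a point only updates the three running totals.
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def variance(self):
        # variance = (sum of x^2 - (sum of x)^2 / n) / (n - 1)
        return (self.sum_x2 - self.sum_x ** 2 / self.n) / (self.n - 1)

    def stdev(self):
        return math.sqrt(self.variance())

weights = [80, 40, 90, 100, 45, 100, 60, 45, 75, 30,
           45, 30, 55, 60, 35, 50, 25, 55]

sums = RunningSums()
for w in weights:
    sums.add(w)
print(sums.n, sums.sum_x, sums.sum_x2)   # 18 1020.0 67200.0
print(round(sums.variance(), 2))          # about 552.94 (9400 / 17), in g^2

# A hypothetical 19th weight: only the three totals change.
sums.add(65)
print(sums.n, round(sums.stdev(), 2))
```

One standard caution: when the mean is large compared with the spread, subtracting (Σx)²/n from Σx² can lose precision in floating point, so the SSE approach is usually safer when all the points are at hand.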

Calculating the Standard Deviation

• The variance is a good measure of the scattering of the data, but…
  - What are the units of the original data?
  - What are the units of the mean?
  - What are the units of the differences?
  - What are the units of the variance???

• The units of the variance are the data units squared (for weights in grams, square grams), so they aren't directly comparable to the data units.

• The Standard Deviation is the square root of the variance.
  - It has the same units as the original data and the mean.
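As a worked illustration of the units question, take the weights (in grams) from the Formula 1 exercise; from the listed points, Σx = 1020, Σx² = 67200 and n = 18:

```latex
\text{variance} = \frac{67200 - \frac{1020^2}{18}}{18 - 1} = \frac{9400}{17} \approx 552.9\ \text{g}^2
\qquad
\text{standard deviation} = \sqrt{552.9\ \text{g}^2} \approx 23.5\ \text{g}
```

The variance comes out in square grams, which is hard to interpret; its square root is back in grams, like the data and the mean.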


Standard Deviation

• The standard deviation is a value that has the same units as the average (and the data).

• It represents a distance from the average that covers about 2/3 of the data points.

• Calculate the square root of the variance to give the standard deviation:

$$\text{standard deviation} = \sqrt{\text{variance}}$$

Variance and standard deviation


Another Look

• Enter these points, in a column…

145 145 120 135 150

125 150 160 130 130

160 120 130 140 145 155 140

These measurements represent lengths, in inches.

Calculating the Variance: Another Way

• Compute the mean (the average) of the data

• Create a column of differences between each data point and the mean
  - These are the errors

• Calculate the squares of the differences
  - These are the Squared Errors

• Find the sum of the Squared Errors
  - abbreviated SSE

• Divide the SSE by (n-1) to get the Variance
  - where n is the number of data points and (n-1) is called the degrees of freedom
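A minimal sketch of these steps in Python, applied to the lengths (in inches) from the "Another Look" slide; each line follows one bullet. The variable names are illustrative.

```python
import math

# Lengths in inches, from the "Another Look" slide.
lengths = [145, 145, 120, 135, 150, 125, 150, 160, 130, 130,
           160, 120, 130, 140, 145, 155, 140]

n = len(lengths)
mean = sum(lengths) / n                       # step 1: the mean
errors = [x - mean for x in lengths]          # step 2: the errors
squared_errors = [e ** 2 for e in errors]     # step 3: the Squared Errors
sse = sum(squared_errors)                     # step 4: SSE
variance = sse / (n - 1)                      # step 5: divide by the degrees of freedom
std_dev = math.sqrt(variance)

print(n, mean, sse, variance)   # 17 140.0 2650.0 165.625
print(round(std_dev, 2))        # about 12.87 inches
print(sum(errors))              # 0.0: the errors cancel out
```

The errors add up to zero, which is one reason they are squared before summing: otherwise the positive and negative differences would cancel.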


Formula 2 for Variance: the Sum of Squared Errors, SSE

$$\text{SSE} = \sum_{i=1}^{n} \left(x_i - \mu\right)^2 \qquad \text{variance} = \frac{\text{SSE}}{n-1}$$

where
» $x_i$ are the individual data points
» $\mu$ is the mean (the average) of all the data points
» $n$ is the number of data points

Using SSE


Enter these points, in a column…

6.1 6.5 6.5 6.3 6.1

6.0 6.3 6.3 6.0 5.7

5.6 6.3 6.1 6.0 5.7

5.8 6.0 5.9

These measurements represent liquid quantities, in milliliters.

Calculating the Variance and Standard Deviation

• Compute the mean (the average) of the data

• Compute the differences

• Calculate the squares of the differences

• Find the sum of the Squared Errors

• Divide the SSE by (n-1) to get the variance

• Calculate the square root of the variance to give the standard deviation

Observation: this calculation is simple, but computing the average requires the individual data points. It must be completely redone if more data points are added, because a new point changes the mean and therefore every difference.
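A sketch comparing the two formulas on these volumes: Formula 1 uses only the sums and the count, while Formula 2 needs the mean first and then a second pass over every point. Both should give the same variance; the function names are hypothetical.

```python
import statistics

# Liquid quantities in milliliters, from the slide above.
volumes = [6.1, 6.5, 6.5, 6.3, 6.1, 6.0, 6.3, 6.3, 6.0, 5.7,
           5.6, 6.3, 6.1, 6.0, 5.7, 5.8, 6.0, 5.9]

def variance_formula_1(data):
    """Sums-only version: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(data)
    return (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

def variance_formula_2(data):
    """SSE version: needs the mean first, then a second pass over every point."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

v1 = variance_formula_1(volumes)
v2 = variance_formula_2(volumes)
print(round(v1, 6), round(v2, 6))              # both about 0.070588 mL^2 (1.2 / 17)
print(round(statistics.stdev(volumes), 4))     # about 0.2657 mL
```

If a point were added, `variance_formula_2` would have to recompute the mean and every difference, which is the observation on the slide.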


Result

Standard Deviation, Variance, and Excel

• As with regression, Excel can calculate these values directly:

• =average( ): calculate the mean of a set of points

• =count( ): calculate n, the number of data points

• =var.s( ) or =var( ): calculate the variance of a set of points

• =stdev.s( ) or =stdev( ): calculate the standard deviation of a set of points

Note: these calculations assume the points are a sample of the underlying population, so they divide by (n-1).
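For readers working outside Excel, Python's standard `statistics` module has analogous functions; the correspondence in the comments is a mapping suggested here, not something from the slides. The data are the lengths from the "Another Look" slide.

```python
import statistics

# Lengths in inches, from the "Another Look" slide.
points = [145, 145, 120, 135, 150, 125, 150, 160, 130, 130,
          160, 120, 130, 140, 145, 155, 140]

print(statistics.mean(points))      # like =AVERAGE(...)
print(len(points))                  # like =COUNT(...) for a list of numbers
print(statistics.variance(points))  # like =VAR.S(...): divides the SSE by n - 1
print(statistics.stdev(points))     # like =STDEV.S(...)

# For a whole population (divide by n instead), Excel has VAR.P and STDEV.P;
# the statistics module's counterparts are pvariance and pstdev.
print(statistics.pvariance(points), statistics.pstdev(points))
```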
