3.1 Measures of Center

Author: Linette Merritt

A measure of center is a value at the center or middle of a data set. The (arithmetic) mean, often referred to as the average, is the sum of all values divided by the total number of values. The mean calculated for population data is denoted by µ (mu). The mean calculated for sample data is denoted by x̄ (x bar).

\[ \mu = \frac{\sum x}{N} \]

\[ \bar{x} = \frac{\sum x}{n} \]

where N denotes the population size and n denotes the sample size. We will round the mean to one more decimal place than the data. Ex.

Cost of houses in a certain area:

499,000    899,000    598,000
629,000    649,000    759,000
4,900,000  989,000    899,000
715,000    629,000    1,219,000

Find the mean of the cost of houses in this area:

Are there any outliers?

Do you think the mean is a good measurement?
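The mean of the house costs above can be checked with a short Python sketch. The function name `mean` and the variable names are illustrative choices, not part of the notes; the second calculation (leaving out the largest value) is added only to show how much a single extreme value moves the mean.

```python
# House costs from the example above (in dollars).
costs = [
    499_000, 899_000, 598_000,
    629_000, 649_000, 759_000,
    4_900_000, 989_000, 899_000,
    715_000, 629_000, 1_219_000,
]


def mean(values):
    """Arithmetic mean: the sum of all values divided by the number of values."""
    return sum(values) / len(values)


house_mean = mean(costs)
print(f"mean = {house_mean:,.1f}")  # rounded to one more decimal place than the data

# For comparison only: the mean with the single largest value left out.
without_largest = sorted(costs)[:-1]
mean_without_largest = mean(without_largest)
print(f"mean without the largest value = {mean_without_largest:,.1f}")
```

Comparing the two printed values gives a concrete way to think about the two questions above.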

The median is the middle value when the data values are arranged in increasing or decreasing order. If you have an even number of data values, the median is the mean of the two middle values. The position of the middle value, when the data are arranged in increasing or decreasing order, can be calculated by the formula \(\frac{n+1}{2}\). If you don't get an integer answer, the median is the mean of the two values whose positions are the integers closest to that number.

Find the median of the cost of houses in the example above:

Do you think the median is a good measurement?

A statistic is resistant if its value is not affected much by extreme values (outliers) in the data set. Which is more resistant, the median or the mean?
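As a rough illustration of resistance, the sketch below computes the mean and the median of the house costs twice: once for all twelve values and once with the largest value removed. The helper names `mean` and `median` are assumptions made for this example.

```python
costs = [
    499_000, 899_000, 598_000,
    629_000, 649_000, 759_000,
    4_900_000, 989_000, 899_000,
    715_000, 629_000, 1_219_000,
]


def mean(values):
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # position (n + 1) / 2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


without_largest = sorted(costs)[:-1]  # drop the single largest value

median_all = median(costs)
median_trimmed = median(without_largest)
mean_all = mean(costs)
mean_trimmed = mean(without_largest)

print(f"median: {median_all:,.1f} -> {median_trimmed:,.1f}")
print(f"mean:   {mean_all:,.1f} -> {mean_trimmed:,.1f}")
```

The statistic that changes less when the extreme value is removed is the more resistant one.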

The mode of a data set is the value that occurs most frequently. If there are two values that occur with the same greatest frequency, the data set is bimodal. If there are more than two values that occur with the same greatest frequency, the data set is multimodal. If no value is repeated, there is no mode. Find the mode of the cost of houses in the example above:

Do you think the mode is a good measurement?

Is the mode resistant?
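The definition of the mode can be sketched in Python with `collections.Counter`. The function `modes` is a hypothetical helper written for this example; it returns an empty list when no value is repeated, matching the "no mode" case described above.

```python
from collections import Counter

costs = [
    499_000, 899_000, 598_000,
    629_000, 649_000, 759_000,
    4_900_000, 989_000, 899_000,
    715_000, 629_000, 1_219_000,
]


def modes(values):
    """Return every value that occurs with the greatest frequency.

    An empty list means no value is repeated, so there is no mode.
    """
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


house_modes = modes(costs)
print(house_modes)  # two values in the list means the data set is bimodal
```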

Relationships Among the Mean, Median, and Mode

Approximate Mean for Grouped Data

Although it can be beneficial to group data, we do lose information about the individual data values. However, we can approximate the mean of grouped data by assuming that every value in a class is equal to the class midpoint.

The following data give the frequency distribution of the test scores of all the students in a class. Find the mean of these test scores.

Test Scores    Frequency
90-100         7
80-89          10
70-79          12
60-69          5
50-59          2
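A minimal sketch of the grouped-mean approximation for the test-score table, assuming each class midpoint is the average of the class limits shown (for example, 95 for the class 90-100). The variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table above.
classes = [
    (90, 100, 7),
    (80, 89, 10),
    (70, 79, 12),
    (60, 69, 5),
    (50, 59, 2),
]

# Assumption: the midpoint of a class is the average of its two limits.
midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]

total_count = sum(frequencies)
weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))

grouped_mean = weighted_sum / total_count
print(f"N = {total_count}, sum of (midpoint x frequency) = {weighted_sum}")
print(f"approximate mean = {grouped_mean:.1f}")
```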

3.2 Measures of Spread

In some data sets the observations are close together, while in others they are more spread out. In addition to measures of center, it is often important to measure the spread of the data. One measure of variation that is easy to calculate is the range.

Range = (Largest value) - (Smallest value)

What are some disadvantages of this measurement?

We would like to measure the spread of all the data, not only the distance between the minimum and maximum values. The variance and the standard deviation measure the deviation of all of the data values from the mean.

Ex. Given the following data: 13, 14, 24, 24, 25, 26

Find the mean of the data:

Value of x    Deviation from mean (x - µ)
13
14
24
24
25
26
Total

The Population Variance is given by σ² =

The Sample Variance is given by s² =

The Population Standard Deviation is given by σ =

The Sample Standard Deviation is given by s =

Note: Do NOT use the other formulas for the variance and standard deviation that are also in the book.
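The sketch below works through the data 13, 14, 24, 24, 25, 26 from the example above. It assumes the definitional (deviation-based) formulas: the sum of squared deviations from the mean divided by N for a population and by n - 1 for a sample, with the standard deviation as the square root of the variance. That choice is an assumption based on the deviation table, and the function names are illustrative.

```python
import math

data = [13, 14, 24, 24, 25, 26]


def deviations(values):
    """Deviation of each value from the mean of the data."""
    m = sum(values) / len(values)
    return [x - m for x in values]


def population_variance(values):
    # Assumed definitional formula: sum of squared deviations divided by N.
    return sum(d ** 2 for d in deviations(values)) / len(values)


def sample_variance(values):
    # Assumed definitional formula: sum of squared deviations divided by n - 1.
    return sum(d ** 2 for d in deviations(values)) / (len(values) - 1)


pop_var = population_variance(data)
samp_var = sample_variance(data)
pop_sd = math.sqrt(pop_var)
samp_sd = math.sqrt(samp_var)

print("deviations:", deviations(data))
print(f"population: variance = {pop_var:.1f}, standard deviation = {pop_sd:.1f}")
print(f"sample:     variance = {samp_var:.1f}, standard deviation = {samp_sd:.1f}")
```

The deviations always add up to zero, which is why they are squared before being averaged.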

- The standard deviation is the measure of variation of all values from the mean.

- The variance is the measure of variation equal to the square of the standard deviation.

- The value of the standard deviation is usually positive; in rare cases it can equal zero. Describe how the data set would look if the standard deviation is zero:

- The unit of the standard deviation is the same unit as the units of the original data.

- It is unusual for data values to fall more than ................. standard deviations from the mean.

- Is the standard deviation resistant? Explain.

- A general round-off rule for variation: Carry one more decimal place than is present in the original set of data. Round only the final answer and not values in the middle of a calculation (if you must round in the middle of a calculation, keep several more decimal places than your final answer will have).

How to find Mean, Median, Mode, and Standard Deviation using our calculators:

Variance and Standard Deviation for Grouped Data

When finding the variance or standard deviation for grouped data, we will assume that each data value is equal to the midpoint of its class. The following data give the frequency distribution of the test scores of all the students in a class. Find the standard deviation of these test scores. Unless it is clear that a data set is from a population (the description will usually use the word ALL), we will assume it is a sample.

Test Scores    Frequency
90-100         7
80-89          10
70-79          12
60-69          5
50-59          2
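A sketch of the grouped-data calculation for this table, under two assumptions: each class midpoint is the average of the class limits shown, and the variance is the frequency-weighted sum of squared deviations of the midpoints from the grouped mean, divided by N (population) or n - 1 (sample). Because the problem refers to all the students in a class, the population version is the one that applies here; the sample version is shown only for comparison.

```python
import math

# (lower limit, upper limit, frequency) for each class in the table above.
classes = [
    (90, 100, 7),
    (80, 89, 10),
    (70, 79, 12),
    (60, 69, 5),
    (50, 59, 2),
]

midpoints = [(low + high) / 2 for low, high, _ in classes]
frequencies = [f for _, _, f in classes]
count = sum(frequencies)

grouped_mean = sum(m * f for m, f in zip(midpoints, frequencies)) / count

# Frequency-weighted sum of squared deviations of the midpoints from the mean.
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in zip(midpoints, frequencies))

population_sd = math.sqrt(sum_sq / count)    # data described as ALL students
sample_sd = math.sqrt(sum_sq / (count - 1))  # for comparison only

print(f"approximate mean = {grouped_mean:.1f}")
print(f"population standard deviation = {population_sd:.1f}")
print(f"sample standard deviation     = {sample_sd:.1f}")
```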

Use of Standard Deviation

Empirical Rule

For a bell-shaped distribution:
- about 68% of all values fall within 1 standard deviation of the mean
- about 95% of all values fall within 2 standard deviations of the mean
- almost all values (about 99.7%) fall within 3 standard deviations of the mean

ex. The prices of all college textbooks follow a bell-shaped distribution with a mean of $105 and a standard deviation of $20.

(a) Using the empirical rule, find the interval that contains the prices of about 99.7% of college textbooks.

(b) Using the empirical rule, find the percentage of all college textbooks with their prices between $85 and $125.

(c) Using the empirical rule, find the percentage of all college textbooks with their prices between $65 and $145.
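The textbook example can be sketched in Python as a lookup on the three percentages of the empirical rule. The helper names `interval_within` and `empirical_percent` are hypothetical, and the lookup only handles intervals that are exactly 1, 2, or 3 standard deviations on each side of the mean.

```python
import math

# Percentages stated by the empirical rule for k = 1, 2, 3 standard deviations.
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}


def interval_within(mean, sd, k):
    """Interval from k standard deviations below the mean to k above it."""
    return (mean - k * sd, mean + k * sd)


def empirical_percent(mean, sd, low, high):
    """Approximate percentage between low and high for a bell-shaped distribution."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    for k, percent in EMPIRICAL_PERCENT.items():
        if math.isclose(k_low, k) and math.isclose(k_high, k):
            return percent
    raise ValueError("interval is not the mean plus or minus 1, 2, or 3 standard deviations")


mean_price, sd_price = 105, 20

part_a = interval_within(mean_price, sd_price, 3)
part_b = empirical_percent(mean_price, sd_price, 85, 125)
part_c = empirical_percent(mean_price, sd_price, 65, 145)

print(part_a, part_b, part_c)
```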

Chebyshev's Theorem

At least \(1 - \frac{1}{k^2}\) of the data values lie within k standard deviations of the mean, for any k > 1. Note that the distribution does NOT have to be bell-shaped in order to use this theorem.

ex. Use the above formula for k=2, and interpret the result.

ex. Use the above formula for k=3, and interpret the result.

ex. Use the above formula for k=1.5, and interpret the result.

ex. Suppose the average credit card debt for households is $9,500 with a standard deviation of $2,600.

(a) Using Chebyshev's theorem, find at least what percentage of current credit card debts for all households are between $3,000 and $16,000.

(b) Using Chebyshev's theorem, find the interval that contains the credit card debts of at least 89% of all the households.
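The Chebyshev exercises above can be sketched as follows. The function `chebyshev_bound` is an illustrative name. For part (b), the sketch takes k = 3 on the assumption that the stated 89% is the rounded value of 1 - 1/9; solving 1 - 1/k² = 0.89 exactly would give a k slightly larger than 3.

```python
def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 1.5):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of the values")

mean_debt, sd_debt = 9_500, 2_600

# (a) $3,000 and $16,000 are both 6,500 away from the mean, so k = 6500 / 2600.
k_a = (16_000 - mean_debt) / sd_debt
percent_a = chebyshev_bound(k_a) * 100

# (b) Assumption: 89% is the rounded bound for k = 3.
k_b = 3
interval_b = (mean_debt - k_b * sd_debt, mean_debt + k_b * sd_debt)

print(f"(a) k = {k_a}, at least {percent_a:.0f}%")
print(f"(b) k = {k_b}, interval = {interval_b}")
```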

A Rough Estimation of the Standard Deviation

Most values fall within 2 standard deviations of the mean. Values that fall outside of this interval would be considered unusual. From this we could also say that \(SD \approx \frac{\text{range}}{4}\) (unless we have extreme outliers). You would only use the above formula if you are asked to find a "rough estimation" of the standard deviation. If you are ever asked to just find the standard deviation, rather than a rough estimate, then don't use this formula.

ex. The Wechsler Adult Intelligence Scale involves an IQ test designed so that the mean score is 100 and the standard deviation is 15. Use the Rough Estimation of the Standard Deviation to find the minimum and maximum "usual" IQ scores. Then determine whether an IQ score of 135 would be considered "unusual."
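A small sketch of the rough estimation, assuming "usual" means within 2 standard deviations of the mean, as described above. The function names are hypothetical.

```python
def usual_range(mean, sd):
    """Minimum and maximum 'usual' values: the mean minus and plus 2 standard deviations."""
    return (mean - 2 * sd, mean + 2 * sd)


def is_unusual(value, mean, sd):
    low, high = usual_range(mean, sd)
    return value < low or value > high


def rough_sd(values):
    """Rough estimate only: range divided by 4 (poor if there are extreme outliers)."""
    return (max(values) - min(values)) / 4


iq_low, iq_high = usual_range(100, 15)
print(f"usual IQ scores: {iq_low} to {iq_high}")
print("135 unusual?", is_unusual(135, 100, 15))
```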

3.3 Measures of Position

Z-Scores

Who is taller, a man 73 inches tall or a woman 68 inches tall? The obvious answer is that the man is taller. However, men are taller than women on the average. Let's ask the question this way: Who is taller relative to their gender, a man 73 inches tall or a woman 68 inches tall?

The z-score of an individual data value tells how many ______________________ that value is from its population mean. Let x be a value from a population with mean μ and standard deviation σ. The z-score for x is z=

Practice 1. A National Center for Health Statistics study states that the mean height for adult men in the U.S. is μ = 69.4 inches, with a standard deviation of σ = 3.1 inches. The mean height for adult women is μ = 63.8 inches, with a standard deviation of σ = 2.8 inches. Who is taller relative to their gender, a man 73 inches tall, or a woman 68 inches tall?

2. Eric proudly tells his brother Bruce that he got 94 points on his last math exam, which had an average of 73 points and a standard deviation of 9. Bruce says that he did even better on his math exam, on which he got 96 points, and this exam had an average of 79 points and a standard deviation of 7. Who did better on his exam relative to their class scores?

3. Suppose Eric's classmate got a z-score of -1.7 on his math exam. What was his exam score?
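The three practice problems can be checked with the sketch below. It assumes the standard definition of the z-score, z = (x - μ) / σ, and its rearrangement x = μ + zσ; the function names are illustrative.

```python
def z_score(x, mu, sigma):
    """Assumed standard definition: distance from the mean in standard deviations."""
    return (x - mu) / sigma


def value_from_z(z, mu, sigma):
    """Rearranged z-score formula: x = mu + z * sigma."""
    return mu + z * sigma


# Practice 1: heights relative to gender.
z_man = z_score(73, 69.4, 3.1)
z_woman = z_score(68, 63.8, 2.8)

# Practice 2: exam scores relative to each class.
z_eric = z_score(94, 73, 9)
z_bruce = z_score(96, 79, 7)

# Practice 3: Eric's classmate with z = -1.7 on Eric's exam.
classmate_score = value_from_z(-1.7, 73, 9)

print(f"man: {z_man:.2f}, woman: {z_woman:.2f}")
print(f"Eric: {z_eric:.2f}, Bruce: {z_bruce:.2f}")
print(f"classmate's score: {classmate_score:.1f}")
```

The larger z-score in each pair identifies the value that is higher relative to its own group.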

Percentiles

Percentiles, denoted P1, P2, ..., P99, divide sorted data into ______________ equal parts. Draw a picture:

The pth percentile is the value in a data set that has about p% of the data values smaller than it and about (100-p)% of the data values greater than it.

ex. When my daughter Linnéa was born, the doctor told me her length was in the 98.8th percentile for newborn girls. What does that mean?

To find the approximate value of the pth percentile:
1. Sort the data in increasing order.
2. Calculate \(L = \frac{p}{100} \cdot n\).
3. If L is a whole number, the pth percentile is the average of the number in position L and the number in position L+1. If L is NOT a whole number, the pth percentile is the number in the position of the next whole number higher than L.

\[ \text{Percentile rank of } x = \frac{(\text{Number of values less than } x) + 0.5}{n} \cdot 100 \]

(where x is some number in the data set)

Round the result to the nearest whole number.

ex. Given the following data set:

15 9 12 11 7 6 9 10 14 3 6 5

(a) Calculate the approximate value of the 55th percentile.

(b) Find the percentile rank of 7.
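The percentile procedure and the percentile rank formula above can be sketched in Python. This assumes the data set for the example consists of the twelve values 15, 9, 12, 11, 7, 6, 9, 10, 14, 3, 6, 5, and that p is a whole number from 1 to 99 so that L = (p/100)·n can be tested exactly with integer arithmetic. The function names are illustrative.

```python
import math

data = [15, 9, 12, 11, 7, 6, 9, 10, 14, 3, 6, 5]


def percentile(values, p):
    """Approximate pth percentile using the three-step procedure above."""
    if not (isinstance(p, int) and 1 <= p <= 99):
        raise ValueError("this sketch assumes a whole-number p from 1 to 99")
    ordered = sorted(values)                      # step 1
    whole, remainder = divmod(p * len(ordered), 100)  # step 2: L = whole + remainder/100
    if remainder == 0:
        # L is a whole number: average the values in positions L and L + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: take the value in the next whole position above L.
    return ordered[whole]


def percentile_rank(values, x):
    """((number of values less than x) + 0.5) / n * 100, rounded to a whole number."""
    less = sum(1 for v in values if v < x)
    rank = (less + 0.5) / len(values) * 100
    return math.floor(rank + 0.5)  # round half up


p55 = percentile(data, 55)
rank_of_7 = percentile_rank(data, 7)
print(f"55th percentile = {p55}, percentile rank of 7 = {rank_of_7}")
```

Integer arithmetic is used for the whole-number test because a floating-point product such as 0.55 × 12 may not compare cleanly against a whole number.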

Quartiles

Quartiles, denoted Q1, Q2, and Q3, divide sorted data into ____________ equal parts.

Q2 is the 50th percentile (median). Q1 is the 25th percentile. Q3 is the 75th percentile.

The five-number summary of a data set consists of: Minimum value, Q1, Q2, Q3, and Maximum value.

IQR = Interquartile Range = Q3 - Q1

The IQR method allows us to determine which values are outliers. Outliers are data values that are below Q1 − 1.5 ⋅ IQR, the lower outlier boundary, or above Q3 + 1.5 ⋅ IQR, the upper outlier boundary.

Find the five-number summary and the IQR for the given data sets below. Determine if they have any outliers.

ex. 1 3 6 7 9 9

ex. 3 2 10 8 2 9

Calculator use:

10 14 15

Boxplots

A boxplot is a graphic presentation of data using the five-number summary, the IQR, and outliers.

Lower outlier boundary = Q1 - 1.5 × IQR
Upper outlier boundary = Q3 + 1.5 × IQR

How to draw a box-and-whisker plot:
• Draw a number line, such that all numbers in the data set are covered.
• Draw a box above the number line, such that its left side is at Q1 and the right side at Q3. Draw a vertical line at Q2 also.
• Draw whiskers (horizontal lines) to join the box and the smallest and largest value resp. within the two outlier boundaries.
• Plot any outliers (values outside of outlier boundaries).

[Figure: boxplot with whisker ends labeled "Smallest value above lower outlier boundary" and "Largest value below upper outlier boundary"]

ex. The time (in minutes) that a student spent in the laundromat in a week, for 15 randomly selected weeks, is as follows:

72 62 84 73 107 81 93 135 77 85 67 90 83 112 72

(a) Prepare a box-and-whisker plot.

(b) Is the data skewed?

(c) Does the data contain any outliers?
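The numbers needed for the laundromat boxplot can be sketched as follows. This assumes the 15 values are 72, 62, 84, 73, 107, 81, 93, 135, 77, 85, 67, 90, 83, 112, 72, and that the quartiles are found with the percentile procedure from the Percentiles section (a calculator or software package may use a different quartile convention and give slightly different values). The function and variable names are illustrative.

```python
times = [72, 62, 84, 73, 107, 81, 93, 135, 77, 85, 67, 90, 83, 112, 72]


def percentile(values, p):
    """Approximate pth percentile (whole-number p) using L = (p / 100) * n."""
    ordered = sorted(values)
    whole, remainder = divmod(p * len(ordered), 100)
    if remainder == 0:
        # L is whole: average the values in positions L and L + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # L is not whole: value in the next whole position above L.
    return ordered[whole]


q1, q2, q3 = (percentile(times, p) for p in (25, 50, 75))
iqr = q3 - q1

lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr

outliers = sorted(t for t in times if t < lower_boundary or t > upper_boundary)
inside = [t for t in times if lower_boundary <= t <= upper_boundary]

# Whiskers end at the smallest and largest values within the outlier boundaries.
whisker_low, whisker_high = min(inside), max(inside)

five_number_summary = (min(times), q1, q2, q3, max(times))

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
print("outlier boundaries:", lower_boundary, upper_boundary)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

With these values in hand, the box spans Q1 to Q3, the line inside the box sits at Q2, and any listed outliers are plotted as separate points.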

A boxplot can help us better see the distribution of the data, such as the center, spread, skewness, and outliers.

We can also use boxplots to visually compare two or more data sets by placing them right above each other.