3In this chapter we cover

Author: Jasper James
P1: PBU/OVY GTBL011-03

P2: PBU/OVY

QC: PBU/OVY

GTBL011-Moore-v14.cls

T1: PBU

May 3, 2006

12:46

In this chapter we cover...
Density curves
Describing density curves
Normal distributions
The 68–95–99.7 rule
The standard Normal distribution
Finding Normal proportions
Using the standard Normal table
Finding a value given a proportion


CHAPTER 3

The Normal Distributions

We now have a kit of graphical and numerical tools for describing distributions. What is more, we have a clear strategy for exploring data on a single quantitative variable.

EXPLORING A DISTRIBUTION
1. Always plot your data: make a graph, usually a histogram or a stemplot.
2. Look for the overall pattern (shape, center, spread) and for striking deviations such as outliers.
3. Calculate a numerical summary to briefly describe center and spread.

In this chapter, we add one more step to this strategy:

4. Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

Density curves


Figure 3.1 is a histogram of the scores of all 947 seventh-grade students in Gary, Indiana, on the vocabulary part of the Iowa Test of Basic Skills.1 Scores of many students on this national test have a quite regular distribution. The histogram is symmetric, and both tails fall off smoothly from a single center peak. There are no large gaps or obvious outliers. The smooth curve drawn through the tops of the histogram bars in Figure 3.1 is a good description of the overall pattern of the data.

FIGURE 3.1 Histogram of the vocabulary scores of all seventh-grade students in Gary, Indiana. The smooth curve shows the overall shape of the distribution. (Horizontal axis: Iowa Test vocabulary score.)

EXAMPLE 3.1

From histogram to density curve

Our eyes respond to the areas of the bars in a histogram. The bar areas represent proportions of the observations. Figure 3.2(a) is a copy of Figure 3.1 with the leftmost bars shaded. The area of the shaded bars in Figure 3.2(a) represents the students with vocabulary scores 6.0 or lower. There are 287 such students, who make up the proportion 287/947 = 0.303 of all Gary seventh-graders. Now look at the curve drawn through the bars. In Figure 3.2(b), the area under the curve to the left of 6.0 is shaded. We can draw histogram bars taller or shorter by adjusting the vertical scale. In moving from histogram bars to a smooth curve, we make a specific choice: adjust the scale of the graph so that the total area under the curve is exactly 1. The total area represents the proportion 1, that is, all the observations. We can then interpret areas under the curve as proportions of the observations. The curve is now a density curve. The shaded area under the density curve in Figure 3.2(b) represents the proportion of students with score 6.0 or lower. This area is 0.293, only 0.010 away from the actual proportion 0.303. Areas under the density curve give quite good approximations to the actual distribution of the 947 test scores.
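The comparison in Example 3.1 can be sketched in a few lines of Python. This is only an illustration, not the software used for the figures. It assumes that the smooth curve is the Normal curve with mean 6.84 and standard deviation 1.55 (the values this chapter quotes for the Gary scores in Example 3.2), and it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Counts from Example 3.1: 287 of the 947 scores are 6.0 or lower.
histogram_proportion = 287 / 947

# Assumption: the density curve is Normal with the mean and standard
# deviation that the chapter quotes for these scores in Example 3.2.
curve = NormalDist(mu=6.84, sigma=1.55)
curve_area = curve.cdf(6.0)  # area under the curve to the left of 6.0

print(f"histogram proportion: {histogram_proportion:.3f}")
print(f"area under the curve: {curve_area:.3f}")
```

Under this assumption the two numbers differ by about 0.01, and the area agrees with the 0.293 quoted in the example to within about 0.001.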

FIGURE 3.2(a) The proportion of scores less than or equal to 6.0 from the histogram is 0.303. (Horizontal axis: Iowa Test vocabulary score.)

FIGURE 3.2(b) The proportion of scores less than or equal to 6.0 from the density curve is 0.293. (Horizontal axis: Iowa Test vocabulary score.)

DENSITY CURVE
A density curve is a curve that
• is always on or above the horizontal axis, and
• has area exactly 1 underneath it.
A density curve describes the overall pattern of a distribution. The area under the curve and above any range of values is the proportion of all observations that fall in that range.
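The two defining properties can be checked numerically. The sketch below uses a hypothetical triangle-shaped density (it is not one of the book's examples) and a simple midpoint rule to approximate areas. The function names `triangle_density` and `area_under` are made up for this illustration.

```python
def triangle_density(x):
    """Hypothetical density: rises linearly from 0 at x = 0 to 2 at x = 1."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def area_under(density, a, b, steps=10_000):
    """Approximate the area under `density` between a and b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width


# Property 1: the curve never dips below the horizontal axis.
never_negative = all(triangle_density(i / 100) >= 0 for i in range(-50, 151))

# Property 2: the total area is 1.
total_area = area_under(triangle_density, 0.0, 1.0)

# Area above a range of values = proportion of observations in that range.
below_half = area_under(triangle_density, 0.0, 0.5)

print(never_negative, round(total_area, 6), round(below_half, 6))
```

For this made-up density, one-quarter of the observations fall below 0.5, because the triangle over the interval from 0 to 0.5 has area 0.25.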


Density curves, like distributions, come in many shapes. Figure 3.3 shows a strongly skewed distribution, the survival times of guinea pigs from Exercise 2.34 (page 59). The histogram and density curve were both created from the data by software. Both show the overall shape and the “bumps” in the long right tail. The density curve shows a higher single peak as a main feature of the distribution. The histogram divides the observations near the peak between two bars, thus reducing the height of the peak. A density curve is often a good description of the overall pattern of a distribution. Outliers, which are deviations from the overall pattern, are not described by the curve. Of course, no set of real data is exactly described by a density curve. The curve is an idealized description that is easy to use and accurate enough for practical use.

FIGURE 3.3 A right-skewed distribution pictured by both a histogram and a density curve. (Horizontal axis: survival time in days, from 0 to 600.)

APPLY YOUR KNOWLEDGE 3.1

Sketch density curves. Sketch density curves that describe distributions with the following shapes: (a) Symmetric, but with two peaks (that is, two strong clusters of observations). (b) Single peak and skewed to the left.

Describing density curves

Our measures of center and spread apply to density curves as well as to actual sets of observations. The median and quartiles are easy. Areas under a density curve represent proportions of the total number of observations. The median is the point with half the observations on either side. So the median of a density curve is the equal-areas point, the point with half the area under the curve to its left and the remaining half of the area to its right. The quartiles divide the area under the curve into quarters. One-fourth of the area under the curve is to the left of the first quartile, and three-fourths of the area is to the left of the third quartile. You can roughly locate the median and quartiles of any density curve by eye by dividing the area under the curve into four equal parts.

Because density curves are idealized patterns, a symmetric density curve is exactly symmetric. The median of a symmetric density curve is therefore at its center. Figure 3.4(a) shows a symmetric density curve with the median marked. It isn’t so easy to spot the equal-areas point on a skewed curve. There are mathematical ways of finding the median for any density curve. That’s how we marked the median on the skewed curve in Figure 3.4(b).

FIGURE 3.4(a) The median and mean of a symmetric density curve both lie at the center of symmetry.

FIGURE 3.4(b) The median and mean of a right-skewed density curve. The mean is pulled away from the median toward the long tail. (Figure annotation: The long right tail pulls the mean to the right.)

What about the mean? The mean of a set of observations is their arithmetic average. If we think of the observations as weights strung out along a thin rod, the mean is the point at which the rod would balance. This fact is also true of density curves. The mean is the point at which the curve would balance if made of solid material. Figure 3.5 illustrates this fact about the mean. A symmetric curve balances at its center because the two sides are identical. The mean and median of a symmetric density curve are equal, as in Figure 3.4(a). We know that the mean of a skewed distribution is pulled toward the long tail. Figure 3.4(b) shows how the mean of a skewed density curve is pulled toward the long tail more than is the median. It’s hard to locate the balance point by eye on a skewed curve. There are mathematical ways of calculating the mean for any density curve, so we are able to mark the mean as well as the median in Figure 3.4(b).

MEDIAN AND MEAN OF A DENSITY CURVE
The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, at which the curve would balance if made of solid material.
The median and mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
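The "mathematical ways" of locating the median and mean can be imitated numerically. The sketch below is an illustration under stated assumptions, not the method used for Figure 3.4(b). It takes a hypothetical right-skewed density, exp(−x) for x ≥ 0, finds the balance point as the integral of x times the density, and finds the equal-areas point by bisection. The helper names `density` and `integrate` are invented for this example.

```python
import math


def density(x):
    """Hypothetical right-skewed density: exp(-x) for x >= 0, zero otherwise."""
    return math.exp(-x) if x >= 0 else 0.0


def integrate(func, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of func from a to b."""
    width = (b - a) / steps
    return sum(func(a + (i + 0.5) * width) for i in range(steps)) * width


UPPER = 40.0  # the area beyond this point is negligible for this density

# Mean: the balance point, computed as the integral of x * density(x).
mean = integrate(lambda x: x * density(x), 0.0, UPPER)

# Median: the equal-areas point, found by bisection on the area to its left.
low, high = 0.0, UPPER
for _ in range(30):
    mid = (low + high) / 2
    if integrate(density, 0.0, mid, steps=20_000) < 0.5:
        low = mid
    else:
        high = mid
median = (low + high) / 2

print(f"median = {median:.3f}, mean = {mean:.3f}")
```

For this density the median is about 0.693 and the mean is about 1, so the mean lies farther out in the long right tail, as the box above states.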

FIGURE 3.5 The mean is the balance point of a density curve.

We can roughly locate the mean, median, and quartiles of any density curve by eye. This is not true of the standard deviation. When necessary, we can once again call on more advanced mathematics to learn the value of the standard deviation. The study of mathematical methods for doing calculations with density curves is part of theoretical statistics. Though we are concentrating on statistical practice, we often make use of the results of mathematical study.

Because a density curve is an idealized description of a distribution of data, we need to distinguish between the mean and standard deviation of the density curve and the mean x̄ and standard deviation s computed from the actual observations. The usual notation for the mean of a density curve is μ (the Greek letter mu). We write the standard deviation of a density curve as σ (the Greek letter sigma).


APPLY YOUR KNOWLEDGE 3.2

A uniform distribution. Figure 3.6 displays the density curve of a uniform distribution. The curve takes the constant value 1 over the interval from 0 to 1 and is zero outside that range of values. This means that data described by this distribution take values that are uniformly spread between 0 and 1. Use areas under this density curve to answer the following questions. (a) Why is the total area under this curve equal to 1? (b) What percent of the observations lie above 0.8? (c) What percent of the observations lie below 0.6? (d) What percent of the observations lie between 0.25 and 0.75?

FIGURE 3.6 The density curve of a uniform distribution, for Exercises 3.2 and 3.3. (The curve has height 1 over the interval from 0 to 1.)

3.3 Mean and median. What is the mean μ of the density curve pictured in Figure 3.6? What is the median?

3.4 Mean and median. Figure 3.7 displays three density curves, each with three points marked on them. At which of these points on each curve do the mean and the median fall?

FIGURE 3.7 Three density curves, for Exercise 3.4. (Panels (a), (b), and (c), each with points A, B, and C marked.)


Normal distributions

One particularly important class of density curves has already appeared in Figures 3.1 and 3.2. These density curves are symmetric, single-peaked, and bell-shaped. They are called Normal curves, and they describe Normal distributions. Normal distributions play a large role in statistics, but they are rather special and not at all “normal” in the sense of being usual or average. We capitalize Normal to remind you that these curves are special.

All Normal distributions have the same overall shape. The exact density curve for a particular Normal distribution is described by giving its mean μ and its standard deviation σ. The mean is located at the center of the symmetric curve and is the same as the median. Changing μ without changing σ moves the Normal curve along the horizontal axis without changing its spread. The standard deviation σ controls the spread of a Normal curve. Figure 3.8 shows two Normal curves with different values of σ. The curve with the larger standard deviation is more spread out.

FIGURE 3.8 Two Normal curves, showing the mean μ and standard deviation σ.

The standard deviation σ is the natural measure of spread for Normal distributions. Not only do μ and σ completely determine the shape of a Normal curve, but we can locate σ by eye on the curve. Here’s how. Imagine that you are skiing down a mountain that has the shape of a Normal curve. At first, you descend at an ever-steeper angle as you go out from the peak:

Fortunately, before you find yourself going straight down, the slope begins to grow flatter rather than steeper as you go out and down:

The points at which this change of curvature takes place are located at distance σ on either side of the mean μ. You can feel the change as you run a pencil along a Normal curve, and so find the standard deviation. Remember that μ and σ alone do not specify the shape of most distributions, and that the shape of density curves in general does not reveal σ. These are special properties of Normal distributions.

NORMAL DISTRIBUTIONS
A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers, its mean and standard deviation.
The mean of a Normal distribution is at the center of the symmetric Normal curve. The standard deviation is the distance from the center to the change-of-curvature points on either side.

Why are the Normal distributions important in statistics? Here are three reasons. First, Normal distributions are good descriptions for some distributions of real data. Distributions that are often close to Normal include scores on tests taken by many people (such as Iowa Tests and SAT exams), repeated careful measurements of the same quantity, and characteristics of biological populations (such as lengths of crickets and yields of corn). Second, Normal distributions are good approximations to the results of many kinds of chance outcomes, such as the proportion of heads in many tosses of a coin. Third, we will see that many statistical inference procedures based on Normal distributions work well for other roughly symmetric distributions.

However, many sets of data do not follow a Normal distribution. Most income distributions, for example, are skewed to the right and so are not Normal. Non-Normal data, like nonnormal people, not only are common but are sometimes more interesting than their Normal counterparts.
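The change-of-curvature property of Normal curves described above can be checked numerically. The sketch below is an illustration only: the values of `MU` and `SIGMA` are hypothetical, and the `curvature` helper (a second difference of the density) is invented for this example. It relies on `statistics.NormalDist.pdf` from the Python standard library.

```python
from statistics import NormalDist

MU, SIGMA = 10.0, 2.0  # hypothetical mean and standard deviation
curve = NormalDist(mu=MU, sigma=SIGMA)


def curvature(x, h=1e-3):
    """Second difference of the density at x.

    Negative: the descent is still getting steeper (curve bends downward).
    Positive: the descent is flattening out (curve bends upward).
    """
    return (curve.pdf(x - h) - 2 * curve.pdf(x) + curve.pdf(x + h)) / h**2


# Probe points on both sides of mu + sigma and mu - sigma.
for offset in (-1.5, -1.1, -0.9, 0.9, 1.1, 1.5):
    x = MU + offset * SIGMA
    shape = "getting steeper" if curvature(x) < 0 else "growing flatter"
    print(f"{offset:+.1f} sigma from the mean: {shape}")
```

The sign of the curvature flips between 0.9σ and 1.1σ from the mean on each side, which is consistent with the change-of-curvature points sitting at distance σ from μ.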

The 68–95–99.7 rule

Although there are many Normal curves, they all have common properties. In particular, all Normal distributions obey the following rule.

THE 68–95–99.7 RULE
In the Normal distribution with mean μ and standard deviation σ:
• Approximately 68% of the observations fall within σ of the mean μ.
• Approximately 95% of the observations fall within 2σ of μ.
• Approximately 99.7% of the observations fall within 3σ of μ.

Figure 3.9 illustrates the 68–95–99.7 rule. By remembering these three numbers, you can think about Normal distributions without constantly making detailed calculations.
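The three percentages in the rule are rounded values of areas under a Normal curve. As a rough check, the sketch below computes those areas with `statistics.NormalDist` from the Python standard library. The helper `proportion_within` and the second set of parameters are hypothetical choices made for this illustration.

```python
from statistics import NormalDist


def proportion_within(k, mu=0.0, sigma=1.0):
    """Proportion of a Normal(mu, sigma) distribution within k sigma of mu."""
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    standard = proportion_within(k)
    # Hypothetical mean and standard deviation, to show that the
    # proportions do not depend on which Normal distribution is used.
    shifted = proportion_within(k, mu=100.0, sigma=15.0)
    print(f"within {k} sd: {standard:.4f} (other curve: {shifted:.4f})")
```

The computed proportions are about 0.6827, 0.9545, and 0.9973 for either curve, which round to the 68%, 95%, and 99.7% in the rule.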

FIGURE 3.9 The 68–95–99.7 rule for Normal distributions. (Horizontal axis: standard deviations from the mean, from −3 to 3. The marked regions contain 68%, 95%, and 99.7% of data.)

EXAMPLE 3.2

Iowa Test scores

Figures 3.1 and 3.2 show that the distribution of Iowa Test vocabulary scores for seventh-grade students in Gary, Indiana, is close to Normal. Suppose that the distribution is exactly Normal with mean μ = 6.84 and standard deviation σ = 1.55. (These are the mean and standard deviation of the 947 actual scores.) Figure 3.10 applies the 68–95–99.7 rule to Iowa Test scores. The 95 part of the rule says that 95% of all scores are between

μ − 2σ = 6.84 − (2)(1.55) = 6.84 − 3.10 = 3.74

and

μ + 2σ = 6.84 + (2)(1.55) = 6.84 + 3.10 = 9.94

The other 5% of scores are outside this range. Because Normal distributions are symmetric, half these scores are lower than 3.74 and half are higher than 9.94. That is, 2.5% of the scores are below 3.74 and 2.5% are above 9.94.


The 68–95–99.7 rule describes distributions that are exactly Normal. Real data such as the actual Gary scores are never exactly Normal. For one thing, Iowa Test scores are reported only to the nearest tenth. A score can be 9.9 or 10.0, but not 9.94. We use a Normal distribution because it’s a good approximation, and because we think the knowledge that the test measures is continuous rather than stopping at tenths. How well does our work in Example 3.2 describe the actual Iowa Test scores? Well, 900 of the 947 scores are between 3.74 and 9.94. That’s 95.04%, very accurate indeed. Of the remaining 47 scores, 20 are below 3.74 and 27 are above 9.94. The tails of the actual data are not quite equal, as they would be in an exactly Normal distribution. Normal distributions often describe real data better in the center of the distribution than in the extreme high and low tails.
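The arithmetic in Example 3.2 and the check against the actual scores can be reproduced directly. The sketch below uses only the numbers quoted in the text; the variable names are chosen for this illustration.

```python
mu, sigma = 6.84, 1.55  # mean and standard deviation from Example 3.2

# The "95" part of the rule: the interval within 2 sigma of the mean.
low = mu - 2 * sigma
high = mu + 2 * sigma
print(f"middle 95% (by the rule): {low:.2f} to {high:.2f}")

# Counts reported in the text for the 947 actual scores.
total, inside, below, above = 947, 900, 20, 27

print(f"actual percent inside:  {100 * inside / total:.2f}%")
print(f"actual percent below:   {100 * below / total:.2f}%  (rule says 2.5%)")
print(f"actual percent above:   {100 * above / total:.2f}%  (rule says 2.5%)")
```

The two tail percentages come out on either side of 2.5%, which is the slight asymmetry the text points to.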

FIGURE 3.10 The 68–95–99.7 rule applied to the distribution of Iowa Test scores in Gary, Indiana, with μ = 6.84 and σ = 1.55. (Horizontal axis: Iowa Test score, marked at 2.19, 3.74, 5.29, 6.84, 8.39, 9.94, and 11.49. Figure annotations: One standard deviation is 1.55. 2.5% of scores are below 3.74.)

EXAMPLE 3.3

Iowa Test scores

Look again at Figure 3.10. A score of 5.29 is one standard deviation below the mean. What percent of scores are higher than 5.29? Find the answer by adding areas in the figure. Here is the calculation in pictures:

percent between 5.29 and 8.39 + percent above 8.39 = percent above 5.29
68% + 16% = 84%

Be sure you see where the 16% came from: 32% of scores are outside the range 5.29 to 8.39, and half of these are above 8.39.
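The area-adding argument of Example 3.3 can be compared with a direct computation. The sketch below assumes, as the example does, that scores are exactly Normal with μ = 6.84 and σ = 1.55, and uses `statistics.NormalDist` from the Python standard library. It is an illustration, not part of the book's method.

```python
from statistics import NormalDist

# Answer from the 68-95-99.7 rule: the middle 68% plus half of the rest.
middle = 0.68
upper_tail = (1 - middle) / 2          # 0.16, the part above 8.39
rule_answer = middle + upper_tail      # proportion above 5.29

# Direct computation from the Normal curve assumed in the example.
scores = NormalDist(mu=6.84, sigma=1.55)
exact_answer = 1 - scores.cdf(5.29)

print(f"rule:  {rule_answer:.2f}")
print(f"exact: {exact_answer:.4f}")
```

The rule gives 0.84 and the direct computation gives about 0.8413; the small gap comes from 68% being a rounded figure.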


Because we will mention Normal distributions often, a short notation is helpful. We abbreviate the Normal distribution with mean μ and standard deviation σ as N(μ, σ ). For example, the distribution of Gary Iowa Test scores is approximately N(6.84, 1.55).

APPLY YOUR KNOWLEDGE


3.5

Heights of young women. The distribution of heights of women aged 20 to 29 is approximately Normal with mean 64 inches and standard deviation 2.7 inches.2 Draw a Normal curve on which this mean and standard deviation are correctly located. (Hint: Draw the curve first, locate the points where the curvature changes, then mark the horizontal axis.)

3.6

Heights of young women. The distribution of heights of women aged 20 to 29 is approximately Normal with mean 64 inches and standard deviation 2.7 inches. Use the 68–95–99.7 rule to answer the following questions. (Start by making a sketch like Figure 3.10.) (a) Between what heights do the middle 95% of young women fall? (b) What percent of young women are taller than 61.3 inches?

3.7

Length of pregnancies. The length of human pregnancies from conception to birth varies according to a distribution that is approximately Normal with mean 266 days and standard deviation 16 days. Use the 68–95–99.7 rule to answer the following questions. (a) Between what values do the lengths of almost all (99.7%) pregnancies fall? (b) How short are the shortest 2.5% of all pregnancies?

The standard Normal distribution

As the 68–95–99.7 rule suggests, all Normal distributions share many common properties. In fact, all Normal distributions are the same if we measure in units of size σ about the mean μ as center. Changing to these units is called standardizing. To standardize a value, subtract the mean of the distribution and then divide by the standard deviation.

STANDARDIZING AND z-SCORES
If x is an observation from a distribution that has mean μ and standard deviation σ, the standardized value of x is

\[ z = \frac{x - \mu}{\sigma} \]

A standardized value is often called a z-score.

A z-score tells us how many standard deviations the original observation falls away from the mean, and in which direction. Observations larger than the mean are positive when standardized, and observations smaller than the mean are negative.

EXAMPLE 3.4

Standardizing women’s heights

The heights of young women are approximately Normal with μ = 64 inches and σ = 2.7 inches. The standardized height is

\[ z = \frac{\text{height} - 64}{2.7} \]

A woman’s standardized height is the number of standard deviations by which her height differs from the mean height of all young women. A woman 70 inches tall, for example, has standardized height

\[ z = \frac{70 - 64}{2.7} = 2.22 \]

or 2.22 standard deviations above the mean. Similarly, a woman 5 feet (60 inches) tall has standardized height

\[ z = \frac{60 - 64}{2.7} = -1.48 \]

or 1.48 standard deviations less than the mean height.
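The standardizing calculation in Example 3.4 is short enough to write as a function. The sketch below is a minimal illustration; the name `z_score` is chosen here and is not taken from any particular package.

```python
def z_score(x, mu, sigma):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - mu) / sigma


MU, SIGMA = 64.0, 2.7  # heights of young women, in inches (Example 3.4)

for height in (70, 60):
    z = z_score(height, MU, SIGMA)
    side = "above" if z > 0 else "below"
    print(f"{height} inches: z = {z:.2f} ({abs(z):.2f} standard deviations {side} the mean)")
```

The sign of the result carries the direction: the 70-inch height standardizes to a positive value and the 60-inch height to a negative one.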

We often standardize observations from symmetric distributions to express them in a common scale. We might, for example, compare the heights of two children of different ages by calculating their z-scores. The standardized heights tell us where each child stands in the distribution for his or her age group.

If the variable we standardize has a Normal distribution, standardizing does more than give a common scale. It makes all Normal distributions into a single distribution, and this distribution is still Normal. Standardizing a variable that has any Normal distribution produces a new variable that has the standard Normal distribution.

STANDARD NORMAL DISTRIBUTION
The standard Normal distribution is the Normal distribution N(0, 1) with mean 0 and standard deviation 1.
If a variable x has any Normal distribution N(μ, σ) with mean μ and standard deviation σ, then the standardized variable

\[ z = \frac{x - \mu}{\sigma} \]

has the standard Normal distribution.
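One practical consequence of the box above is that a cumulative proportion for any Normal distribution can be read from the standard Normal distribution after standardizing. The sketch below illustrates this with the heights distribution from Example 3.4, using `statistics.NormalDist` from the Python standard library; the variable names are chosen for this example.

```python
from statistics import NormalDist

heights = NormalDist(mu=64.0, sigma=2.7)   # N(64, 2.7) from Example 3.4
standard = NormalDist(mu=0.0, sigma=1.0)   # the standard Normal, N(0, 1)

for x in (60, 64, 70):
    z = (x - heights.mean) / heights.stdev  # standardize the height
    direct = heights.cdf(x)                 # proportion at or below x inches
    via_z = standard.cdf(z)                 # proportion at or below z
    print(f"x = {x}: z = {z:+.2f}, direct = {direct:.4f}, via z = {via_z:.4f}")
```

The two columns of proportions agree, which is why a single table (or a single software routine) for N(0, 1) is enough for every Normal distribution.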

APPLY YOUR KNOWLEDGE 3.8

SAT versus ACT. Eleanor scores 680 on the mathematics part of the SAT. The distribution of SAT math scores in recent years has been Normal with mean 518 and standard deviation 114. Gerald takes the ACT Assessment mathematics test and scores 27. ACT math scores are Normally distributed with mean 20.7 and standard deviation 5.0. Find the standardized scores for both students. Assuming that both tests measure the same kind of ability, who has the higher score?

He said, she said. The height and weight distributions in this chapter come from actual measurements by a government survey. Good thing that is. When asked their weight, almost all women say they weigh less than they really do. Heavier men also underreport their weight, but lighter men claim to weigh more than the scale shows. We leave you to ponder the psychology of the two sexes. Just remember that “say so” is no substitute for measuring.

3.9

Men’s and women’s heights. The heights of women aged 20 to 29 are approximately Normal with mean 64 inches and standard deviation 2.7 inches. Men the same age have mean height 69.3 inches with standard deviation 2.8 inches. What are the z-scores for a woman 6 feet tall and a man 6 feet tall? Say in simple language what information the z-scores give that the actual heights do not.

Finding Normal proportions


Areas under a Normal curve represent proportions of observations from that Normal distribution. There is no formula for areas under a Normal curve. Calculations use either software that calculates areas or a table of areas. The table and most software calculate one kind of area, cumulative proportions.

CUMULATIVE PROPORTIONS
The cumulative proportion for a value x in a distribution is the proportion of observations in the distribution that lie at or below x.

The key to calculating Normal proportions is to match the area you want with areas that represent cumulative proportions. If you make a sketch of the area you want, you will almost never go wrong. Find areas for cumulative proportions either from software or (with an extra step) from a table. The following example shows the method in a picture. EXAMPLE 3.5

Who qualifies for college sports?

The National Collegiate Athletic Association (NCAA) requires Division I athletes to score at least 820 on the combined mathematics and verbal parts of the SAT exam in order to compete in their first college year. (Higher scores are required for students with poor high school grades.) The scores of the millions of high school seniors taking the SATs in recent years are approximately Normal with mean 1026 and standard deviation 209. What percent of high school seniors qualify for Division I college sports? Here is the calculation in a picture: the proportion of scores above 820 is the area under the curve to the right of 820. That’s the total area under the curve (which is always 1) minus the cumulative proportion up to 820.

area right of 820 = total area − area left of 820
= 1 − 0.1622
= 0.8378

About 84% of all high school seniors meet the NCAA requirement to compete in Division I college sports.
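The calculation in Example 3.5 can be sketched in Python as one possible stand-in for the software the text mentions. It uses `statistics.NormalDist` from the standard library, whose `cdf` method returns a cumulative proportion. The variable names are chosen for this illustration.

```python
from statistics import NormalDist

# SAT scores as described in Example 3.5: approximately N(1026, 209).
sat = NormalDist(mu=1026, sigma=209)

area_left = sat.cdf(820)     # cumulative proportion: scores at or below 820
area_right = 1 - area_left   # total area (1) minus the area to the left

print(f"area left of 820:  {area_left:.4f}")
print(f"area right of 820: {area_right:.4f}")
```

The pattern is the one in the picture: software supplies the area to the left, and subtracting it from 1 gives the area to the right.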

There is no area under a smooth curve and exactly over the point 820. Consequently, the area to the right of 820 (the proportion of scores > 820) is the same as the area at or to the right of this point (the proportion of scores ≥ 820). The actual data may contain a student who scored exactly 820 on the SAT. That the proportion of scores exactly equal to 820 is 0 for a Normal distribution is a consequence of the idealized smoothing of Normal distributions for data. To find the numerical value 0.1622 of the cumulative proportion in Example 3.5 using software, plug in mean 1026 and standard deviation 209 and ask for the cumulative proportion for 820. Software often uses terms such as “cumulative distribution” or “cumulative probability.” We will learn in Chapter 10 why the language of probability fits. Here, for example, is Minitab’s output:

Cumulative Distribution Function
Normal with mean = 1026 and standard deviation = 209
x 820
P ( X

2.85 (c) z > −1.66 (d) −1.66 < z < 2.85

3.11 How hard do locomotives pull? An important measure of the performance of a locomotive is its “adhesion,” which is the locomotive’s pulling force as a multiple of its weight. The adhesion of one 4400-horsepower diesel locomotive model varies in actual use according to a Normal distribution with mean μ = 0.37 and standard deviation σ = 0.04. (a) What proportion of adhesions measured in use are higher than 0.40? (b) What proportion of adhesions are between 0.40 and 0.50?


3.12 A better locomotive. Improvements in the locomotive’s computer controls change the distribution of adhesion to a Normal distribution with mean μ = 0.41 and standard deviation σ = 0.02. Find the proportions in (a) and (b) of the previous exercise after this improvement.

Finding a value given a proportion

Examples 3.5 to 3.8 illustrate the use of software or Table A to find what proportion of the observations satisfies some condition, such as “SAT score above 820.” We may instead want to find the observed value with a given proportion of the observations above or below it. Statistical software will do this directly.

EXAMPLE 3.9

Find the top 10% using software

Scores on the SAT verbal test in recent years follow approximately the N(504, 111) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT? We want to find the SAT score x with area 0.1 to its right under the Normal curve with mean μ = 504 and standard deviation σ = 111. That’s the same as finding the SAT score x with area 0.9 to its left. Figure 3.12 poses the question in graphical form. Most software will tell you x when you plug in mean 504, standard deviation 111, and cumulative proportion 0.9. Here is Minitab’s output:
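The inverse lookup in Example 3.9 can also be sketched in Python, as an illustrative alternative to a Minitab session. The sketch uses `statistics.NormalDist` from the standard library, whose `inv_cdf` method returns the value that has a given cumulative proportion. The variable names are chosen for this example.

```python
from statistics import NormalDist

# SAT verbal scores as described in Example 3.9: approximately N(504, 111).
sat_verbal = NormalDist(mu=504, sigma=111)

# Top 10% means area 0.1 to the right, that is, area 0.9 to the left.
cutoff = sat_verbal.inv_cdf(0.9)

# Sanity check: the area to the right of the cutoff should be 0.1.
area_right = 1 - sat_verbal.cdf(cutoff)

print(f"score needed for the top 10%: {cutoff:.1f}")
print(f"area to the right of that score: {area_right:.3f}")
```

Under the stated distribution the cutoff comes out at roughly 646.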

Inverse Cumulative Distribution Function
Normal with mean = 504 and standard deviation = 111
P( X

1.77 (d) −2.25 < z < 1.77

3.31 Standard Normal drill. (a) Find the number z such that the proportion of observations that are less than z in a standard Normal distribution is 0.8. (b) Find the number z such that 35% of all observations from a standard Normal distribution are greater than z.

ACT versus SAT. There are two major tests of readiness for college: the ACT and the SAT. ACT scores are reported on a scale from 1 to 36. The distribution of ACT scores in recent years has been roughly Normal with mean μ = 20.9 and standard deviation σ = 4.8. SAT scores are reported on a scale from 400 to 1600. SAT scores have been roughly Normal with mean μ = 1026 and standard deviation σ = 209. Exercises 3.32 to 3.43 are based on this information.

3.32 Tonya scores 1318 on the SAT. Jermaine scores 27 on the ACT. Assuming that both tests measure the same thing, who has the higher score?

3.33 Jacob scores 16 on the ACT. Emily scores 670 on the SAT. Assuming that both tests measure the same thing, who has the higher score?


3.34 José scores 1287 on the SAT. Assuming that both tests measure the same thing, what score on the ACT is equivalent to José’s SAT score?

3.35 Maria scores 28 on the ACT. Assuming that both tests measure the same thing, what score on the SAT is equivalent to Maria’s ACT score?

3.36 Reports on a student’s ACT or SAT usually give the percentile as well as the actual score. The percentile is just the cumulative proportion stated as a percent: the percent of all scores that were lower than this one. Tonya scores 1318 on the SAT. What is her percentile?

3.37 Reports on a student’s ACT or SAT usually give the percentile as well as the actual score. The percentile is just the cumulative proportion stated as a percent: the percent of all scores that were lower than this one. Jacob scores 16 on the ACT. What is his percentile?

3.38 It is possible to score higher than 1600 on the SAT, but scores 1600 and above are reported as 1600. What proportion of SAT scores are reported as 1600?

3.39 It is possible to score higher than 36 on the ACT, but scores 36 and above are reported as 36. What proportion of ACT scores are reported as 36?

3.40 What SAT scores make up the top 10% of all scores?

3.41 How well must Abigail do on the ACT in order to place in the top 20% of all students?

3.42 The quartiles of any distribution are the values with cumulative proportions 0.25 and 0.75. What are the quartiles of the distribution of ACT scores?

3.43 The quintiles of any distribution are the values with cumulative proportions 0.20, 0.40, 0.60, and 0.80. What are the quintiles of the distribution of SAT scores?

3.44 Heights of men and women. The heights of women aged 20 to 29 follow approximately the N(64, 2.7) distribution. Men the same age have heights distributed as N(69.3, 2.8). What percent of young women are taller than the mean height of young men?

3.45 Heights of men and women. The heights of women aged 20 to 29 follow approximately the N(64, 2.7) distribution. Men the same age have heights distributed as N(69.3, 2.8). What percent of young men are shorter than the mean height of young women?

3.46 A surprising calculation. Changing the mean of a Normal distribution by a moderate amount can greatly change the percent of observations in the tails. Suppose that a college is looking for applicants with SAT math scores 750 and above.
(a) In 2004, the scores of men on the math SAT followed the N(537, 116) distribution. What percent of men scored 750 or better?
(b) Women’s SAT math scores that year had the N(501, 110) distribution. What percent of women scored 750 or better?
You see that the percent of men above 750 is almost three times the percent of women with such high scores. Why this is true is controversial.

3.47 Grading managers. Many companies “grade on a bell curve” to compare the performance of their managers and professional workers. This forces the use of some low performance ratings so that not all workers are listed as “above average.” Ford Motor Company’s “performance management process” for a time assigned 10% A grades, 80% B grades, and 10% C grades to the company’s
18,000 managers. Suppose that Ford’s performance scores really are Normally distributed. This year, managers with scores less than 25 received C’s and those with scores above 475 received A’s. What are the mean and standard deviation of the scores?
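
Two cutoffs with known cumulative proportions are enough to pin down a Normal distribution, because each cutoff satisfies x = μ + zσ for its own standard score z. The Python sketch below works through that reasoning for the cutoffs in Exercise 3.47. It is an illustration under stated assumptions, not part of the original exercise: the function name normal_from_two_cutoffs is hypothetical, and the standard scores come from NormalDist in Python's standard statistics module rather than from the standard Normal table, so results may differ slightly from table-based answers.

```python
from statistics import NormalDist


def normal_from_two_cutoffs(x_low, p_low, x_high, p_high):
    """Return (mu, sigma) of the Normal distribution whose cumulative
    proportion is p_low at x_low and p_high at x_high.

    Hypothetical helper for illustration only.
    """
    standard = NormalDist()            # standard Normal, N(0, 1)
    z_low = standard.inv_cdf(p_low)    # standard score of the lower cutoff
    z_high = standard.inv_cdf(p_high)  # standard score of the upper cutoff

    # Subtracting x_low = mu + z_low*sigma from x_high = mu + z_high*sigma
    # eliminates mu and leaves one equation for sigma.
    sigma = (x_high - x_low) / (z_high - z_low)
    mu = x_low - z_low * sigma
    return mu, sigma


# Exercise 3.47: scores below 25 earn a C (bottom 10%),
# scores above 475 earn an A (top 10%, so cumulative proportion 0.90).
mu, sigma = normal_from_two_cutoffs(25, 0.10, 475, 0.90)
print(f"mean = {mu:.1f}, standard deviation = {sigma:.1f}")
```

Because the 10% and 90% points are symmetric about the mean, the mean falls halfway between the two cutoffs, and the standard deviation comes out near 176.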

3.48 Osteoporosis. Osteoporosis is a condition in which the bones become brittle due to loss of minerals. To diagnose osteoporosis, an elaborate apparatus measures bone mineral density (BMD). BMD is usually reported in standardized form. The standardization is based on a population of healthy young adults. The World Health Organization (WHO) criterion for osteoporosis is a BMD 2.5 standard deviations below the mean for young adults. BMD measurements in a population of people similar in age and sex roughly follow a Normal distribution.
(a) What percent of healthy young adults have osteoporosis by the WHO criterion?
(b) Women aged 70 to 79 are of course not young adults. The mean BMD in this age group is about −2 on the standard scale for young adults. Suppose that the standard deviation is the same as for young adults. What percent of this older population has osteoporosis?

3.49 Are the data Normal? ACT scores. Scores on the ACT test for the 2004 high school graduating class had mean 20.9 and standard deviation 4.8. In all, 1,171,460 students in this class took the test, and 1,052,490 of them had scores of 27 or lower.5 If the distribution of scores were Normal, what percent of scores would be 27 or lower? What percent of the actual scores were 27 or lower? Does the Normal distribution describe the actual data well?

3.50 Are the data Normal? Student loans. A government report looked at the amount borrowed for college by students who graduated in 2000 and had taken out student loans.6 The mean amount was x̄ = $17,776 and the standard deviation was s = $12,034. The quartiles were Q1 = $9900, M = $15,532, and Q3 = $22,500.
(a) Compare the mean x̄ and the median M. Also compare the distances of Q1 and Q3 from the median. Explain why both comparisons suggest that the distribution is right-skewed.
(b) The right skew pulls the standard deviation up. So a Normal distribution with the same mean and standard deviation would have a third quartile larger than the actual Q3. Find the third quartile of the Normal distribution with μ = $17,776 and σ = $12,034 and compare it with Q3 = $22,500.

The Normal Curve applet allows you to do Normal calculations quickly. It is somewhat limited by the number of pixels available for use, so that it can’t hit every value exactly. In the exercises below, use the closest available values. In each case, make a sketch of the curve from the applet marked with the values you used to answer the questions.

3.51 How accurate is 68–95–99.7? The 68–95–99.7 rule for Normal distributions is a useful approximation. To see how accurate the rule is, drag one flag across the other so that the applet shows the area under the curve between the two flags.
(a) Place the flags one standard deviation on either side of the mean. What is the area between these two values? What does the 68–95–99.7 rule say this area is?
(b) Repeat for locations two and three standard deviations on either side of the mean. Again compare the 68–95–99.7 rule with the area given by the applet.
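
If the applet is not at hand, the comparison in Exercise 3.51 can also be made in software. The short Python sketch below computes the area under the standard Normal curve within one, two, and three standard deviations of the mean and prints it beside the value the rule gives. It assumes NormalDist from Python's standard statistics module in place of the applet; the helper name area_within is hypothetical.

```python
from statistics import NormalDist

standard = NormalDist()  # standard Normal, N(0, 1)


def area_within(k):
    """Area under the standard Normal curve between -k and +k.

    Hypothetical helper for illustration only.
    """
    return standard.cdf(k) - standard.cdf(-k)


# Percents stated by the 68-95-99.7 rule for k = 1, 2, 3.
rule = {1: 68.0, 2: 95.0, 3: 99.7}

for k, rule_percent in rule.items():
    exact_percent = 100 * area_within(k)
    print(f"within {k} sd: rule says {rule_percent}%, "
          f"computed {exact_percent:.2f}%")
```

Standardizing shows that these areas are the same for every Normal distribution, so working with N(0, 1) loses nothing.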

3.52 Where are the quartiles? How many standard deviations above and below the mean do the quartiles of any Normal distribution lie? (Use the standard Normal distribution to answer this question.)

3.53 Grading managers. In Exercise 3.47, we saw that Ford Motor Company grades its managers in such a way that the top 10% receive an A grade, the bottom 10% a C, and the middle 80% a B. Let’s suppose that performance scores follow a Normal distribution. How many standard deviations above and below the mean do the A/B and B/C cutoffs lie? (Use the standard Normal distribution to answer this question.)
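
Exercises 3.52 and 3.53 both ask for the standard score z that has a given cumulative proportion, which is a table lookup run in reverse. The Python sketch below is one way to check answers read from the table or the applet. It assumes NormalDist from Python's standard statistics module; the function name z_for_proportion is hypothetical and does not come from the text.

```python
from statistics import NormalDist


def z_for_proportion(p):
    """Standard score z whose cumulative proportion is p (0 < p < 1).

    Hypothetical helper for illustration only.
    """
    return NormalDist().inv_cdf(p)


# Quartiles have cumulative proportions 0.25 and 0.75 (Exercise 3.52).
# The B/C and A/B cutoffs have cumulative proportions 0.10 and 0.90
# (Exercise 3.53).
for label, p in [("first quartile", 0.25), ("third quartile", 0.75),
                 ("B/C cutoff", 0.10), ("A/B cutoff", 0.90)]:
    print(f"{label}: z = {z_for_proportion(p):+.2f}")
```

The results are in standard deviations from the mean, so they apply to any Normal distribution: a value on the original scale is μ + zσ.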
