C22.0103 Homework Set 3 Spring 2012 Solutions

1. The file CDCLifeTable is available from the course web site. It’s in the M folder; the direct link is http://people.stern.nyu.edu/gsimon/statdata/B01.1305/M/index.htm This has data for non-Hispanic whites, with gender corresponding to separate sets of columns.

a. Plot the female and male yearly hazards (the q’s) as a function of age. These should be overlaid on a single graph. HINT: In using Minitab, select Graph ⇒ Scatterplot, and then use “with connect line.” Set up the information panel like this:

Use Multiple Graphs to put both graphs on the same axis. You can edit the labels if you wish by double-clicking on graph features.

b. Compute the gender ratio as a function of age. Let’s say you choose to do this as “females per 100 males.” Plot this as a function of age. HINT: In Minitab, use Calc ⇒ Calculator.

c. Consider the marriage of Joseph, age 25, to Felicia, also age 25. Assume that their future survival mechanisms operate independently (even though this is a questionable assumption for a married couple). Find the probability that
   both are alive at age 75,
   Joseph will be alive at age 75 but Felicia will have died,
   Felicia will be alive at age 75 but Joseph will have died,
   both die before age 75.

©gs2012

SOLUTION: For a, the suggested graph comes out as follows:

[Figure: Scatterplot of Female q vs Female Age, Male q vs Male Age. Horizontal axis: X-Data (0 to 100); vertical axis: Y-Data (0.00 to 0.35).]

There are many things you might do to make this more attractive. Here’s a suggestion:

[Figure: One-year hazard probabilities, with separate curves labeled Male and Female. Horizontal axis: Age (0 to 100); vertical axis: Probability of death in one year (0.00 to 0.35).]


Here’s a plausible graph for b:

[Figure: Females per 100 males at given age. Horizontal axis: Age (0 to 100); vertical axis: 100 to 300.]

This picture is a little bit dishonest, in that the data were set up with equal numbers at birth. In fact, females are slightly less than 50% of newborns.
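A minimal Python sketch of the ratio in b, assuming it is formed as 100 × l_x(female) / l_x(male) from the survivor columns. Only the two ages whose l values are quoted in the solution to part c are used here, and the function name is illustrative rather than part of the assignment.

```python
def females_per_100_males(l_female, l_male):
    """Females per 100 males among survivors at a given age."""
    return 100.0 * l_female / l_male

# Survivor counts l_x quoted in the solution to part c.
survivors = {
    25: {"female": 98_864, "male": 98_064},
    75: {"female": 74_084, "male": 62_220},
}

for age, l in survivors.items():
    ratio = females_per_100_males(l["female"], l["male"])
    print(f"age {age}: {ratio:.1f} females per 100 males")
```

Applying the same function to every row of the life table would give the full curve in the graph.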

For c, observe that P(Joseph is alive at 75) = $\frac{l_{75}(\text{male})}{l_{25}(\text{male})} = \frac{62{,}220}{98{,}064} \approx 0.6345$. This means, of course, P(Joseph has died before 75) = 1 – 0.6345 = 0.3655.

In a similar style, P(Felicia is alive at 75) = $\frac{l_{75}(\text{female})}{l_{25}(\text{female})} = \frac{74{,}084}{98{,}864} \approx 0.7494$. Thus, P(Felicia has died before age 75) = 1 – 0.7494 = 0.2506.

If these are presumed to be independent, then

P(both alive at 75)         = 0.6345 × 0.7494 ≈ 0.4755
P(only Joseph alive at 75)  = 0.6345 × 0.2506 ≈ 0.1590
P(only Felicia alive at 75) = 0.3655 × 0.7494 ≈ 0.2739
P(both die before 75)       = 0.3655 × 0.2506 ≈ 0.0916

These add to 1.0000. (Rounding might have resulted in a total of 0.9998 or 1.0001.)
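The four joint probabilities can be reproduced with a short Python sketch. It uses the l values quoted in the solution and the stated independence assumption; the variable names are illustrative. Because it works with unrounded survival probabilities, the last digit could in principle differ slightly from the hand calculation.

```python
# Survivor counts l_x from the life table, as quoted in the solution.
l_male = {25: 98_064, 75: 62_220}
l_female = {25: 98_864, 75: 74_084}

# Conditional probability of reaching 75, given alive at 25.
p_joseph = l_male[75] / l_male[25]
p_felicia = l_female[75] / l_female[25]

# Under independence, joint probabilities are products of marginals.
outcomes = {
    "both alive at 75": p_joseph * p_felicia,
    "only Joseph alive at 75": p_joseph * (1 - p_felicia),
    "only Felicia alive at 75": (1 - p_joseph) * p_felicia,
    "both die before 75": (1 - p_joseph) * (1 - p_felicia),
}

for label, prob in outcomes.items():
    print(f"{label}: {prob:.4f}")
print(f"total: {sum(outcomes.values()):.4f}")
```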

2. Consider the following life table:

Age (years)     Expectation of Future Life (years)
Birth to 20     30
20 to 25        27
25 to 30        25
30 to 35        22
35 to 40        20
40 to 41        19
41 to 42        18
42 to 43        17
43 to 44        16
44 to 45        15
45 to 46        14
46 to 47        13
47 to 48        12
48 to 49        11
49 to 50        10
50 to 55         9
55 to 60         7
60 and up        5

This table is presented in terms of expected future lifetime, rather than the more common (and more useful) $l_x$ and $d_x$ notation. The Age column is given as intervals, but it’s easy to link to the more conventional notation. For instance, think of the row “42 to 43” as simply being “42.” This row refers to someone who has reached the 42nd birthday but not the 43rd birthday.

Make a list of things that are wrong with this table. You need not do elaborate arithmetic to form this list. This is a genuine life table, and it was actually used!

SOLUTION: There are many imperfections with this table. Here’s a list:

(1) The opening age category “Birth to 20” is far too wide to be useful. The mortality issues for a one-year-old are very different from those of a 19-year-old. The end age category “60 and up” is also too wide, and it is also open-ended. The expected future lifetime for a 61-year-old has to be greater than for an 81-year-old.

(2) There are five-year categories, like 30 to 35, but these are not so distressing. The force of mortality is relatively flat over these intervals.

(3) Over the ages 40 to 50, the expected future lifetime drops exactly one year for each year of advancing age. This says, for instance, that a 41-year-old can expect to live 18 more years, that a 42-year-old can expect to live 17 more years, that a 43-year-old can expect to live 16 more years, and so on. The drops in this column need to be less than one year. Why? Consider the expected total lifetime. For a 41-year-old, this is (current age) + (expected future lifetime) = 41 + 18 = 59. The same value, 59, is produced for people age 42, 43, …, 49. However, the expected total lifetime has to grow as one ages! A person aged 48 has to have a longer expected total lifetime than someone aged 40, because the 48-year-old has already racked up eight more years toward total lifetime.

(4) There are some shocking discontinuities. For example, a person age 54 is expected to live 9 more years (to 63), while a person age 55 is expected to live 7 more years (to 62).
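The arithmetic behind point (3) can be sketched in a few lines of Python. The rows for ages 40 through 49 are copied from the table; the dictionary layout and variable names are illustrative choices only.

```python
# Single-year rows of the table, ages 40 through 49:
# age -> expected future lifetime in years.
future_life = {40: 19, 41: 18, 42: 17, 43: 16, 44: 15,
               45: 14, 46: 13, 47: 12, 48: 11, 49: 10}

# Expected total lifetime = current age + expected future lifetime.
total_life = {age: age + e for age, e in future_life.items()}

for age, total in total_life.items():
    print(f"age {age}: expected total lifetime {total}")

# In a coherent table these totals would grow with age.
grows_with_age = all(
    total_life[a] < total_life[a + 1] for a in range(40, 49)
)
print("totals grow with age:", grows_with_age)
```

Every total comes out the same, which is the defect described in (3).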

So where did this table come from? This was prepared about 200 A.D. in Rome by Ulpian. The Romans had a sufficiently stable and legalistic society that they could accumulate enough data to make such a table. You can find a copy of this table on page 31 of Length of Life: A Study of the Life Table, by Dublin, Louis I., Lotka, Alfred J., and Spiegelman, Mortimer, published by Ronald Press Company, New York, 1949 (revised edition). The original edition was 1936. There is an interesting discussion of this table in The Origin and Early History of Insurance, by Charles Farley Trenerry, London, P.S. King and Sons, Ltd., 1926; see pages 150-151. The Romans may have used this table to construct annuities rather than to prepare conventional life insurance. As the Roman Empire dissipated, many of its greatest creations were lost. The world did not produce another good actuarial table for the next 1,300 years!

3. Suppose that a pizza shop offers ten different toppings for its pizzas, and you select three of these at random.

a. What is the probability that mushrooms will be among your selections?
b. What is the probability that both mushrooms and onions will be among your selections?
c. What is the probability that your selections will be mushrooms, onions, and black olives?
d. How exactly would you go about selecting three toppings at random?

SOLUTION:

a. The solution must be 3/10, by a straight ratio-proportion argument. You can get this methodically as

$\frac{\#\text{ selections with mushrooms}}{\#\text{ of selections}} = \frac{\binom{9}{2}}{\binom{10}{3}} = \frac{36}{120} = \frac{3}{10}$

In the numerator, we do the counting by thinking of getting two of the other nine toppings.

b. This should be done as

$\frac{\#\text{ selections with mushrooms and onions}}{\#\text{ of selections}} = \frac{\binom{8}{1}}{\binom{10}{3}} = \frac{8}{120} = \frac{1}{15} \approx 0.0667$

The numerator is based on the requirement to select exactly one of the eight toppings which are not mushroom or onion.

c. As only one selection qualifies, this is $\frac{1}{\binom{10}{3}} = \frac{1}{120} \approx 0.0083$.

d. Haphazard guessing just does not work. You will need to use some form of random device. First associate the toppings with the numbers 1 through 10; this means an association like 1 ⇔ onions, 2 ⇔ green peppers, 3 ⇔ olives, and so on. Here are some possibilities:

(i) Use a computer random-number generator to select values in the set {1, 2, …, 10}. Use the first three different numbers.

(ii) Write the integers 1 through 10 (or the actual topping names) on 10 cards. Shuffle the cards and select three.

(iii) Find a ten-sided die, and roll this to select the toppings, using the first three different numbers. (There are only five solids whose faces are identical regular polygons, one of the great results of ancient geometry, and they have 4, 6, 8, 12, and 20 faces. The usual ten-sided die has kite-shaped faces instead, but its symmetry still makes every face equally likely.) If you had nine toppings, you could still use the ten-sided die by agreeing to ignore one of the numbers.

(iv) Use a board-game spinner, if you have one of these close at hand.
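Parts a through c, together with method (i) of part d, can be sketched in Python using only the standard library. The topping labels are placeholders, and `random.sample` is just one reasonable way to draw three distinct items; it is offered as an illustration, not as the method the assignment requires.

```python
import random
from fractions import Fraction
from math import comb

N_TOPPINGS, N_CHOSEN = 10, 3
total = comb(N_TOPPINGS, N_CHOSEN)  # number of equally likely selections

# a. mushrooms plus any 2 of the other 9 toppings
p_a = Fraction(comb(9, 2), total)
# b. mushrooms, onions, and any 1 of the other 8 toppings
p_b = Fraction(comb(8, 1), total)
# c. exactly one selection qualifies
p_c = Fraction(1, total)
print(p_a, p_b, p_c)

# d. method (i): placeholder names stand in for the real toppings.
toppings = [f"topping {i}" for i in range(1, N_TOPPINGS + 1)]
choice = random.sample(toppings, N_CHOSEN)  # three distinct toppings
print(choice)
```

Sampling without replacement avoids the "first three different numbers" bookkeeping, because repeats cannot occur.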

4. Suppose that you are dealt five cards, at random, from a standard deck.

a. What is the probability that exactly four of these cards will be spades?
b. What is the probability that at least four of these cards will be spades?

SOLUTION: This is a routine hypergeometric calculation. For part a, the probability is

$\frac{\binom{13}{4}\binom{39}{1}}{\binom{52}{5}} = \frac{715 \times 39}{2{,}598{,}960} \approx 0.010729$.

The numerator represents the number of ways of choosing 4 of the 13 spades, and exactly one of the 39 non-spades. The denominator is the total number of ways of choosing 5 cards from the deck of 52.

In this calculation, we used $\binom{13}{4} = \frac{13 \times 12 \times 11 \times 10}{4 \times 3 \times 2 \times 1} = 715$, $\binom{39}{1} = 39$, and also $\binom{52}{5} = \frac{52 \times 51 \times 50 \times 49 \times 48}{5 \times 4 \times 3 \times 2 \times 1} = 2{,}598{,}960$.

For part b, we need to consider also the probability of getting five spades. This probability is

$\frac{\binom{13}{5}\binom{39}{0}}{\binom{52}{5}} = \frac{1{,}287 \times 1}{2{,}598{,}960} \approx 0.000495$.

The overall probability is then 0.010729 + 0.000495 = 0.011224. These calculations can all be done easily with Minitab’s hypergeometric option. The command is Calc ⇒ Probability Distributions ⇒ Hypergeometric.
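For readers without Minitab, the same hypergeometric probabilities can be sketched in Python. The helper `hypergeom_pmf` is a hypothetical name written for this illustration (it is not a library function); it simply applies the counting formula used above.

```python
from math import comb

def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    ways = comb(successes, k) * comb(population - successes, draws - k)
    return ways / comb(population, draws)

# 52 cards, 13 spades, 5 cards dealt.
p_four = hypergeom_pmf(4, population=52, successes=13, draws=5)
p_five = hypergeom_pmf(5, population=52, successes=13, draws=5)

print(f"exactly four spades:  {p_four:.6f}")
print(f"exactly five spades:  {p_five:.6f}")
print(f"at least four spades: {p_four + p_five:.6f}")
```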

5. In games of the KENO or LOTTO style, the bettor selects numbers from a fixed set. Then the game operator selects another set of numbers, and the bettor wins according to the number of matches.

a. Suppose that the game uses the numbers 1 through 50, and suppose that the operator selects eight of these. If the bettor selects five numbers, find the probability that there are exactly five matches. HINT: You might find the number $\binom{50}{8}$ = 536,878,650 to be useful.

b. Suppose that the game uses the numbers 1 through 50, and suppose that the operator selects ten of these. If the bettor selects five numbers, find the probability that there are exactly five matches. Also note whether this probability is larger or smaller than the probability in a. HINT: You will want the number $\binom{50}{10}$ = 10,272,278,170.

c. Suppose that the game uses the numbers 1 through 50, and suppose that the operator selects ten of these. If the bettor selects six numbers, find the probability that there are exactly six matches. Note whether the probability here is larger or smaller than the probability in b. HINT: You can do these with a calculator, of course, but the calculations can be done more easily with Minitab’s hypergeometric option. The command is Calc ⇒ Probability Distributions ⇒ Hypergeometric.

SOLUTION: Let’s think of the bettor as going first. In part a, the bettor creates five “hot” numbers and 45 “cold” numbers. Now when the game operator selects 8 numbers, there are $\binom{50}{8}$ ways to make the selection. However, the number of ways in which the operator can select five “hot” and three “cold” numbers is $\binom{5}{5}\binom{45}{3}$ = 1 × 14,190 = 14,190. Thus, the probability is $\frac{14{,}190}{536{,}878{,}650} \approx 2.6431 \times 10^{-5}$ = 0.000026431.

Part b is similar. There are $\binom{50}{10}$ possible selections, and the number of ways in which the operator can select five “hot” and five “cold” is $\binom{5}{5}\binom{45}{5}$ = 1 × 1,221,759 = 1,221,759. The probability is $\frac{1{,}221{,}759}{10{,}272{,}278{,}170} \approx 1.1894 \times 10^{-4}$ = 0.00011894. You might observe that the probability in b is higher.

For part c, there are again $\binom{50}{10}$ possible selections, but the number of ways in which the operator can select six “hot” and four “cold” is $\binom{6}{6}\binom{44}{4}$ = 1 × 135,751 = 135,751. The probability is $\frac{135{,}751}{10{,}272{,}278{,}170} \approx 1.3215 \times 10^{-5}$ = 0.000013215. This is exactly one-ninth of the solution to b. Apparently getting five-of-five is much easier than getting six-of-six.

By the way, you’ll get the same solutions if you imagine that the operator selects first and that the (uninformed) bettor goes next. Using a to illustrate, the operator would create eight “hot” numbers and 42 “cold” numbers. The bettor has $\binom{50}{5}$ = 2,118,760 ways to select, and out of these the number of ways of getting five “hot” numbers is $\binom{8}{5}\binom{42}{0}$ = 56 × 1 = 56. Thus, the probability is $\frac{56}{2{,}118{,}760} \approx 2.6431 \times 10^{-5}$, which is exactly the solution above.
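The three KENO probabilities, and the "operator first" check for part a, can be reproduced with a short Python sketch. The function name `prob_all_match` and its parameter names are illustrative choices, not anything defined by the assignment.

```python
from math import comb

def prob_all_match(pool, operator_picks, bettor_picks):
    """P(every bettor number is among the operator's numbers)."""
    hot, cold = bettor_picks, pool - bettor_picks
    # Operator takes all the hot numbers and fills up with cold ones.
    ways = comb(hot, bettor_picks) * comb(cold, operator_picks - bettor_picks)
    return ways / comb(pool, operator_picks)

p_a = prob_all_match(50, 8, 5)
p_b = prob_all_match(50, 10, 5)
p_c = prob_all_match(50, 10, 6)
print(f"a: {p_a:.4e}  b: {p_b:.4e}  c: {p_c:.4e}")
print(f"b / c = {p_b / p_c:.2f}")

# Part a again, with the operator choosing first.
p_a_alt = comb(8, 5) * comb(42, 0) / comb(50, 5)
print(f"a (operator first): {p_a_alt:.4e}")
```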

6. The random variable Y has the probability distribution described in this table:

y           1     2     3     4
P[ Y = y ]  0.30  0.40  0.20  0.10

Find the mean and standard deviation of the random variable Y.

SOLUTION: It’s easy to find

$E\,Y = \sum_y y\,P[Y = y] = 1 \times 0.30 + 2 \times 0.40 + 3 \times 0.20 + 4 \times 0.10 = 0.30 + 0.80 + 0.60 + 0.40 = 2.10 = \mu_Y$

Here μY is a convenient symbol for E Y , the mean of Y (also called the expected value of Y). The standard deviation requires the calculation of

$\sum_y (y - \mu_Y)^2\,P[Y = y] = \sum_y y^2\,P[Y = y] - \mu_Y^2$

Either form, correctly executed, will lead to the correct result. If μY had been a nice round number (like 2), then we probably would have preferred the first form. Here μY = 2.1, so the differences ( y − μY ) take the values -1.1, -0.1, 0.9, 1.9. These are a bit annoying to work with, so we’ll prefer the second form. We find

$\sum_y y^2\,P[Y = y] = 1^2 \times 0.30 + 2^2 \times 0.40 + 3^2 \times 0.20 + 4^2 \times 0.10 = 1 \times 0.30 + 4 \times 0.40 + 9 \times 0.20 + 16 \times 0.10 = 0.30 + 1.60 + 1.80 + 1.60 = 5.30$

This leads us to

$\sum_y y^2\,P[Y = y] - \mu_Y^2 = 5.30 - 2.10^2 = 5.30 - 4.41 = 0.89$.

Finally, SD(Y) = $\sqrt{\sum_y y^2\,P[Y = y] - \mu_Y^2} = \sqrt{0.89} \approx 0.9434$.

How many significant digits should be reported with this answer? The original information was all given in one significant figure (1, 2, 3, 4, 0.3, 0.4, 0.2, 0.1), so it would seem that 0.94 would be more than sufficiently precise. There are differences of opinion here, but here are some suggestions: Derived values should be more precise than the input (so that 0.9 would be regarded as a too-crude answer). Using all the precision of a calculator can be ridiculous. The Windows calculator found this as 0.94339811320566038113206603776226. Many people like to use an even number of figures after a decimal point. The value 0.9434 was chosen as slightly over-generous in precision.
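The mean and standard deviation of Y can be checked with a small Python sketch that uses the second (shortcut) form of the variance. The helper name `mean_and_sd` and the dictionary representation of the table are assumptions made for this illustration.

```python
from math import sqrt

def mean_and_sd(dist):
    """Mean and SD of a discrete distribution given as {value: probability}."""
    mean = sum(y * p for y, p in dist.items())
    second_moment = sum(y**2 * p for y, p in dist.items())
    variance = second_moment - mean**2  # shortcut form of the variance
    return mean, sqrt(variance)

dist_y = {1: 0.30, 2: 0.40, 3: 0.20, 4: 0.10}
mu_y, sd_y = mean_and_sd(dist_y)
print(f"mean = {mu_y:.2f}, SD = {sd_y:.4f}")
```

The format specifiers control how many digits are printed, which is where the question of reported precision is settled in practice.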

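7. The random variable Q has the probability distribution described in this table:

q           100   200   300   400
P[ Q = q ]  0.20  0.50  0.20  0.10

Find the mean and standard deviation of Q. (After problem 6, this should be very easy.)

SOLUTION: The values of Q are 100 times those of Y in problem 6, so the same method applies. (The probabilities here differ from those in problem 6, so the answers are not simply 100 times the earlier ones.) Working with Q/100, the mean is 1 × 0.20 + 2 × 0.50 + 3 × 0.20 + 4 × 0.10 = 2.20 and the variance is 5.60 − 2.20² = 0.76. Thus E Q = 100 × 2.20 = 220 and SD(Q) = 100 × √0.76 ≈ 87.18.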
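A brief Python sketch, under the assumption that the table for Q is taken exactly as printed, illustrates the scaling rule that makes this problem easy: dividing every value by 100 divides both the mean and the standard deviation by 100. The helper `mean_and_sd` is a hypothetical name defined here for the illustration.

```python
from math import sqrt

def mean_and_sd(dist):
    """Mean and SD of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    variance = sum(v**2 * p for v, p in dist.items()) - mean**2
    return mean, sqrt(variance)

dist_q = {100: 0.20, 200: 0.50, 300: 0.20, 400: 0.10}
mu_q, sd_q = mean_and_sd(dist_q)

# Rescaled variable W = Q / 100 takes the values 1, 2, 3, 4.
dist_w = {q // 100: p for q, p in dist_q.items()}
mu_w, sd_w = mean_and_sd(dist_w)

print(f"E Q = {mu_q:.2f}, SD(Q) = {sd_q:.2f}")
print(f"100 * E W = {100 * mu_w:.2f}, 100 * SD(W) = {100 * sd_w:.2f}")
```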

8. The random variable X has this distribution:

x           37    38    39    40    41
P[ X = x ]  0.06  0.25  0.38  0.25  0.06

Find the mean E X.

SOLUTION: You can set up all the arithmetic, but the probability values are clearly symmetric about the value 39. Thus μX = 39.

9. Suppose that you are playing a game of chance in which the probability of winning is 0.14. What is the probability that you’ll win exactly once in six turns?

SOLUTION: This is a binomial phenomenon with n = 6 and p = 0.14. If X represents the number of wins, then $P[X = x] = \binom{6}{x}\,0.14^{x}\,0.86^{6-x}$. In particular, for x = 1 win, $P[X = 1] = \binom{6}{1}\,0.14^{1}\,0.86^{5} = 6 \times 0.14 \times 0.86^{5} \approx 0.3952 \approx 40\%$.
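The binomial formula in problem 9 translates directly into Python. The function name `binomial_pmf` is an illustrative choice; it applies the formula as written rather than calling any statistics library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P[X = x] for a binomial random variable with n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_one_win = binomial_pmf(1, n=6, p=0.14)
print(f"P[X = 1] = {p_one_win:.4f}")

# Sanity check: the probabilities for x = 0, ..., 6 sum to 1.
print(f"total = {sum(binomial_pmf(x, 6, 0.14) for x in range(7)):.4f}")
```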

10. The admissions office of a small, selective liberal-arts college will only offer admission to applicants who have a certain mix of accomplishments, including a combined SAT score of 1,400 or more. Based on past records, the head of admissions feels that the probability is 0.42 that an admitted applicant will come to the college. If 600 applicants are admitted, what is the probability that 250 or more will come? Note that “250 or more” means the set of values {250, 251, 252, …, 600}.

SOLUTION: You can get Minitab to find the cumulative binomial probability for n = 600, p = 0.42, and for 249 or fewer “successes.”

Cumulative Distribution Function
Binomial with n = 600 and p = 0.42
x 249

P( X