Author: Geraldine Smith
Stat 653 HW3

Divya Nair

Exercise 1 (2.1). An article in the New York Times (February 17, 1999) about the PSA blood test for detecting prostate cancer stated that, of men who had this disease, the test fails to detect prostate cancer in 1 in 4 (so-called false-negative results), and of men who did not have it, as many as two-thirds receive false-positive results. Let $C$ ($\bar{C}$) denote the event of having (not having) prostate cancer and let $+$ ($-$) denote a positive (negative) test result.

a. Which is true: $P(- \mid C) = 1/4$ or $P(C \mid -) = 1/4$? $P(\bar{C} \mid +) = 2/3$ or $P(+ \mid \bar{C}) = 2/3$?

Solution. The phrase "of the men who had this disease, the test fails to detect prostate cancer in 1 in 4" conditions on having the disease, so it means
$P(- \mid C) = \frac{P(- \cap C)}{P(C)} = \frac{1}{4}$.
Similarly, "of men who did not have it (the disease), as many as two-thirds receive false-positive results" conditions on not having the disease, so it means
$P(+ \mid \bar{C}) = \frac{P(+ \cap \bar{C})}{P(\bar{C})} = \frac{2}{3}$.
Hence, $P(- \mid C) = 1/4$ and $P(+ \mid \bar{C}) = 2/3$ are true.

b. What is the sensitivity of this test?

Solution. Sensitivity is the probability that the diagnostic test is positive given that a subject has the disease. Using the complement rule for conditional probability and the known probability from part (a),
$P(+ \mid C) = 1 - P(- \mid C) = 1 - \frac{1}{4} = \frac{3}{4}$.

c. Of men who take the PSA test, suppose $P(C) = 0.01$. Find the cell probabilities in the $2 \times 2$ table for the joint distribution that cross-classifies $Y$ = diagnosis ($+$, $-$) with $X$ = true disease status ($C$, $\bar{C}$).

Solution. The $2 \times 2$ table with all the cell probabilities is given below.

                 True Disease Status
Diagnosis        C         C̄         Total
+                0.0075    0.66      0.6675
−                0.0025    0.33      0.3325
Total            0.01      0.99      1

The values in this table are filled in as follows. Since $P(C) = 0.01$, its complement is $P(\bar{C}) = 1 - P(C) = 0.99$. This fills in the column totals (the bottom row).

Next, applying the definition of conditional probability to $P(- \mid C)$ and using the known probability from part (a), we have
$P(- \cap C) = P(- \mid C) \cdot P(C) = \frac{1}{4} \times 0.01 = 0.0025$.
Consequently, $P(+ \cap C) = 0.01 - 0.0025 = 0.0075$. These calculations fill in the first column.

Using the complement rule for conditional probability, we have
$P(- \mid \bar{C}) = 1 - P(+ \mid \bar{C}) = 1 - \frac{2}{3} = \frac{1}{3}$.
Also, applying the definition of conditional probability to $P(- \mid \bar{C})$ gives
$P(- \cap \bar{C}) = P(- \mid \bar{C}) \cdot P(\bar{C}) = \frac{1}{3} \times 0.99 = 0.33$.
Hence, $P(+ \cap \bar{C}) = 0.99 - 0.33 = 0.66$, $P(+) = 0.0075 + 0.66 = 0.6675$, and $P(-) = 0.0025 + 0.33 = 0.3325$. This completes the table.
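As a quick arithmetic check, here is a minimal Python sketch that rebuilds the part (c) cells from the three stated inputs, $P(C) = 0.01$, $P(- \mid C) = 1/4$, and $P(+ \mid \bar{C}) = 2/3$. The function name `joint_table` and the `"notC"` label are our own choices, and exact `Fraction` arithmetic is used only to avoid floating-point rounding.

```python
from fractions import Fraction

def joint_table(prevalence, false_neg, false_pos):
    """Joint probabilities keyed by (diagnosis, disease status)."""
    p_c = prevalence        # P(C)
    p_not_c = 1 - p_c       # P(C-bar), by the complement rule
    return {
        ("+", "C"): (1 - false_neg) * p_c,         # P(+ | C) P(C)
        ("-", "C"): false_neg * p_c,               # P(- | C) P(C)
        ("+", "notC"): false_pos * p_not_c,        # P(+ | C-bar) P(C-bar)
        ("-", "notC"): (1 - false_pos) * p_not_c,  # P(- | C-bar) P(C-bar)
    }

cells = joint_table(Fraction(1, 100), Fraction(1, 4), Fraction(2, 3))
p_plus = cells[("+", "C")] + cells[("+", "notC")]
p_minus = cells[("-", "C")] + cells[("-", "notC")]

for (diagnosis, status), prob in cells.items():
    print(diagnosis, status, float(prob))
print("P(+) =", float(p_plus), "P(-) =", float(p_minus))
```

Each cell is a conditional probability times the probability of the conditioning event, and the row sums give the marginal distribution of the diagnosis.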

d. Using (c), find the marginal distribution for the diagnosis.

Solution. As computed in part (c), $P(+) = 0.6675$ and $P(-) = 0.3325$.

e. Using (c) and (d), find $P(C \mid +)$, and interpret.

Solution. As computed in parts (c) and (d),
$P(C \mid +) = \frac{P(C \cap +)}{P(+)} = \frac{0.0075}{0.6675} = 0.01124$.
This means that the probability that a man who tests positive actually has prostate cancer is only about 0.011, barely above the prevalence $P(C) = 0.01$: because false positives are so common, a positive result adds little evidence of disease.
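The same number can be reproduced from the table cells alone. This is only an illustrative Python check, assuming the part (c) cell values are exact; it also shows that $P(C \mid +)$ equals $1/89$ exactly.

```python
from fractions import Fraction

# Cells from the part (c) table that involve a positive test, as exact fractions
p_c_and_plus = Fraction(75, 10000)     # P(C and +) = 0.0075
p_notc_and_plus = Fraction(66, 100)    # P(C-bar and +) = 0.66

p_plus = p_c_and_plus + p_notc_and_plus   # marginal P(+) from part (d)
p_c_given_plus = p_c_and_plus / p_plus    # definition of conditional probability

print(p_plus, p_c_given_plus, round(float(p_c_given_plus), 5))
```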


Exercise 2 (2.2). For diagnostic testing, let $X$ = true status (1 = disease, 2 = no disease) and $Y$ = diagnosis (1 = positive, 2 = negative). Let $\pi_i = P(Y = 1 \mid X = i)$, $i = 1, 2$.

a. Explain why sensitivity $= \pi_1$ and specificity $= 1 - \pi_2$.

Solution. Sensitivity is the probability that the diagnostic test is positive given that a subject has the disease, that is, $P(Y = 1 \mid X = 1)$. Said differently, sensitivity is the probability of "success" (a positive diagnosis) for the subjects in row 1 of the contingency table, which by definition is $\pi_1$.

Specificity is the probability that the test is negative given that the subject does not have the disease, that is, $P(Y = 2 \mid X = 2)$. In other words, specificity is the probability of "failure" for the subjects in row 2 of the contingency table. Hence it equals $1 - P(Y = 1 \mid X = 2) = 1 - \pi_2$.

b. Let $\gamma$ denote the probability that a subject has the disease. Given that the diagnosis is positive, use Bayes' theorem to show that the probability a subject truly has the disease is
$\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}$.

Solution. Recall that Bayes' theorem is
$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$.
By the law of total probability, the probability that the diagnosis is positive is
$P(Y = 1) = P(Y = 1 \cap X = 1) + P(Y = 1 \cap X = 2) = P(Y = 1 \mid X = 1) \cdot P(X = 1) + P(Y = 1 \mid X = 2) \cdot P(X = 2)$.
Thus, the probability that a subject truly has the disease, $P(X = 1 \mid Y = 1)$, is given by
$P(X = 1 \mid Y = 1) = \frac{P(Y = 1 \mid X = 1) \cdot P(X = 1)}{P(Y = 1)} = \frac{P(Y = 1 \mid X = 1) \cdot P(X = 1)}{P(Y = 1 \mid X = 1) \cdot P(X = 1) + P(Y = 1 \mid X = 2) \cdot P(X = 2)} = \frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}$.

c. For mammograms for detecting breast cancer, suppose $\gamma = 0.01$, sensitivity $= 0.86$, and specificity $= 0.88$. Given a positive test result, find the probability that the woman truly has breast cancer.

Solution. From part (b), the probability that the woman truly has breast cancer is given by
$\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)}$.
Sensitivity gives $\pi_1 = 0.86$, and since specificity $= 1 - \pi_2 = 0.88$, we get $\pi_2 = 1 - 0.88 = 0.12$. Thus,
$\frac{\pi_1 \gamma}{\pi_1 \gamma + \pi_2 (1 - \gamma)} = \frac{0.86 \times 0.01}{0.86 \times 0.01 + 0.12 \times (1 - 0.01)} = \frac{0.0086}{0.1274} = 0.0675$.
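The part (b) formula can be written as a small function. This is a sketch, not a standard library routine; the name `prob_disease_given_positive` is hypothetical, and the specificity is converted to $\pi_2 = 1 - \text{specificity}$ inside the function.

```python
def prob_disease_given_positive(sensitivity, specificity, gamma):
    """pi1*gamma / (pi1*gamma + pi2*(1 - gamma)), as derived in part (b)."""
    pi1 = sensitivity        # P(Y=1 | X=1)
    pi2 = 1 - specificity    # P(Y=1 | X=2), since specificity = 1 - pi2
    numerator = pi1 * gamma
    return numerator / (numerator + pi2 * (1 - gamma))

# Mammogram example from part (c)
p = prob_disease_given_positive(0.86, 0.88, 0.01)
print(f"{p:.4f}")
```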

d. To better understand the answer in (c), find the joint probabilities for the $2 \times 2$ cross-classification of $X$ and $Y$. Discuss their relative sizes in the two cells that refer to a positive test result.

Solution. The $2 \times 2$ table with joint probabilities is given below.

                 True Status
Diagnosis        X=1       X=2       Total
Y=1              0.0086    0.1188    0.1274
Y=2              0.0014    0.8712    0.8726
Total            0.01      0.99      1

The values in the table are found in the following way:
$P(Y = 1 \cap X = 1) = P(Y = 1 \mid X = 1) \cdot P(X = 1) = 0.86 \times 0.01 = 0.0086$
$P(Y = 2 \cap X = 1) = 0.01 - 0.0086 = 0.0014$
$P(X = 2) = 1 - 0.01 = 0.99$
$P(Y = 1 \cap X = 2) = P(Y = 1 \mid X = 2) \cdot P(X = 2) = 0.12 \times 0.99 = 0.1188$
$P(Y = 2 \cap X = 2) = 0.99 - 0.1188 = 0.8712$.

In the two cells with a positive test result, the probability that a woman has breast cancer and tests positive (0.0086) is much smaller than the probability that a woman does not have breast cancer but tests positive (0.1188); false positives outnumber true positives by roughly 14 to 1. This is why, in part (c), only $0.0086/0.1274 = 0.0675$ of positive results correspond to women who truly have breast cancer.

Exercise 3 (2.3). According to recent UN figures, the annual gun homicide rate is 62.4 per one million residents in the United States and 1.3 per one million residents in the UK.

a. Compare the proportion of residents killed annually by guns using (i) difference of proportions, (ii) relative risk.

Solution. Let $p_1$ and $p_2$ denote the proportions of residents killed annually by guns in the United States and the UK, respectively.

(i) Difference of proportions:
$\hat{\Delta} = p_1 - p_2 = 62.4 - 1.3 = 61.1$ per one million, that is, $0.0000611$.

(ii) Relative risk is estimated by
$\frac{p_1}{p_2} = \frac{62.4}{1.3} = 48$.

We see that, as a proportion, the difference is a very small number compared with the relative risk.
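A short Python check of both comparisons, with the rates converted from "per one million" to proportions (the variable names are ours):

```python
# Annual gun homicide rates, converted from "per one million" to proportions
p_us = 62.4 / 1_000_000
p_uk = 1.3 / 1_000_000

difference = p_us - p_uk      # (i) difference of proportions
relative_risk = p_us / p_uk   # (ii) relative risk

print(f"difference = {difference:.7f}")
print(f"relative risk = {relative_risk:.1f}")
```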

b. When both proportions are very close to 0, as here, which measure is more useful for describing the strength of association? Why?

Solution. The relative risk is more useful for describing the strength of association. When both proportions are near 0, their difference is necessarily tiny, which misleads one into thinking that the difference in the annual gun homicide rate between the two countries is negligible, whereas the relative risk shows that the U.S. rate is 48 times the UK rate.

Exercise 4 (2.4). A newspaper article preceding the 1994 World Cup semifinal match between Italy and

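Bulgaria stated that Italy is favored 10-11 to beat Bulgaria, which is rated at 10-3 to reach the final. Suppose this means that the odds that Italy wins are 11/10 and the odds that Bulgaria wins are 3/10. Find the probability that each team wins, and comment.

Solution. The probability of success is given by $\frac{\text{odds}}{\text{odds} + 1}$. The probability that Italy wins is
$\frac{11/10}{11/10 + 1} = \frac{11}{21} = 0.5238$,
and the probability that Bulgaria wins is
$\frac{3/10}{3/10 + 1} = \frac{3}{13} = 0.2308$.
Since exactly one of the two teams wins this semifinal (and the winner reaches the final), the two probabilities should sum to 1, but they sum to only 0.7546. The quoted odds are therefore not consistent with a single probability distribution for the match.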
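A sketch of the odds-to-probability conversion, using exact fractions so that the results $11/21$ and $3/13$ appear directly; the helper name `odds_to_probability` is our own. The last line prints the sum of the two probabilities.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Probability of success implied by the odds: odds / (odds + 1)."""
    return odds / (odds + 1)

p_italy = odds_to_probability(Fraction(11, 10))
p_bulgaria = odds_to_probability(Fraction(3, 10))

print("Italy:", p_italy, round(float(p_italy), 4))
print("Bulgaria:", p_bulgaria, round(float(p_bulgaria), 4))
print("Sum:", round(float(p_italy + p_bulgaria), 4))
```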

Exercise 5 (2.5). Consider the following two studies reported in the New York Times:

a. A British study reported (December 3, 1998) that, of smokers who get lung cancer, women are 1.7 times more vulnerable than men to get small-cell lung cancer. Is 1.7 an odds ratio, or a relative risk?

Solution. The number 1.7 is a relative risk, since the proportion of women (among smokers who get lung cancer) who get small-cell lung cancer is being compared with the corresponding proportion of men.


b. A National Cancer Institute study about tamoxifen and breast cancer reported (April 7, 1998) that the women taking the drug were 45% less likely to experience invasive breast cancer compared with the women taking placebo. Find the relative risk for (i) those taking the drug compared to those taking placebo, (ii) those taking placebo compared to those taking the drug.

Solution. Let $\pi_1$ and $\pi_2$ denote the probabilities of invasive breast cancer for the drug and placebo groups, respectively. The relative risk for those taking the drug compared to those taking placebo is
$\frac{\pi_1}{\pi_2} = 1 - 0.45 = 0.55$.
On the other hand, the relative risk for those taking placebo compared to those taking the drug is
$\frac{\pi_2}{\pi_1} = \frac{1}{0.55} = 1.8182$.

Exercise 6 (2.6). In the United States, the estimated annual probability that a woman over the age of 35 dies of lung cancer equals 0.001304 for current smokers and 0.000121 for nonsmokers [M. Pagano and K. Gauvreau, Principles of Biostatistics, Belmont, CA: Duxbury Press (1993), p. 134].

a. Calculate and interpret the difference of proportions and the relative risk. Which is more informative for these data? Why?

Solution. The difference of proportions is
$\hat{\Delta} = p_1 - p_2 = 0.001304 - 0.000121 = 0.001183$.
The relative risk is
$\frac{p_1}{p_2} = \frac{0.001304}{0.000121} = 10.7769$.
The annual probability of dying of lung cancer is about 0.0012 higher for smokers than for nonsmokers, or equivalently about 10.8 times as large. The relative risk is more informative here since the difference of proportions is so small.
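The three measures for the smoking data can be computed side by side. The helper `compare_proportions` is a hypothetical name for this illustration, and it also reports the odds ratio used in part (b).

```python
def compare_proportions(p1, p2):
    """Difference of proportions, relative risk, and odds ratio for p1 versus p2."""
    difference = p1 - p2
    relative_risk = p1 / p2
    odds_ratio = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return difference, relative_risk, odds_ratio

# Smokers (p1) versus nonsmokers (p2), women over 35
diff, rr, odds_ratio = compare_proportions(0.001304, 0.000121)
print(f"difference = {diff:.6f}, RR = {rr:.4f}, OR = {odds_ratio:.4f}")
```

Because both proportions are tiny, the odds ratio exceeds the relative risk only by the factor $(1 - p_2)/(1 - p_1) \approx 1.0012$.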

b. Calculate and interpret the odds ratio. Explain why the relative risk and odds ratio take similar values.

Solution. The odds ratio is given by
$\frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{0.001304/(1 - 0.001304)}{0.000121/(1 - 0.000121)} = 10.7896$.
Since the odds ratio is greater than 1, we conclude that women over the age of 35 who smoke are more likely to die of lung cancer than women over the age of 35 who do not smoke; their odds of dying of lung cancer in a given year are about 10.8 times those of nonsmokers.

The relative risk and odds ratio take similar values because both $p_1$ and $p_2$ are close to zero. Consequently, $1 - p_1$ is approximately the same as $1 - p_2$ (both are near 1), so these factors nearly cancel in the odds ratio formula,
$\frac{p_1/(1 - p_1)}{p_2/(1 - p_2)} = \frac{p_1}{p_2} \cdot \frac{1 - p_2}{1 - p_1} \approx \frac{p_1}{p_2}$,
leaving the formula for the relative risk.

Exercise 7 (2.7). For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender

(female, male) and survival (yes, no) was 11.4. (For data, see R. Dawson, J. Statist. Educ. 3, no. 3, 1995.)

a. What is wrong with the interpretation, "The probability of survival for females was 11.4 times that for males"? Give the correct interpretation.

Solution. The odds ratio is the ratio of the odds of an event occurring in one group to the odds of that event occurring in another group; it is not a ratio of probabilities. The correct interpretation is "The odds of survival for females were 11.4 times the odds of survival for males."

b. The odds of survival for females equaled 2.9. For each gender, find the proportion who survived.

Solution. The odds ratio is $\theta = \text{odds}_F / \text{odds}_M$. It is given in the problem that $\theta = 11.4$ and $\text{odds}_F = 2.9$, which gives $\text{odds}_M = 2.9/11.4 = 0.2544$. The probability of survival for males is given by
$\pi_M = \frac{\text{odds}_M}{\text{odds}_M + 1} = \frac{0.2544}{0.2544 + 1} = 0.2028$,
and the probability of survival for females is given by
$\pi_F = \frac{\text{odds}_F}{\text{odds}_F + 1} = \frac{2.9}{2.9 + 1} = 0.7436$.

c. Find the value of $R$ in the interpretation, "The probability of survival for females was $R$ times that for males."

Solution. For the given interpretation to be sensible, $R$ here has to be the relative risk, which is given by
$R = \frac{\pi_F}{\pi_M} = \frac{0.7436}{0.2028} = 3.6667$.
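A sketch of parts (b) and (c) in Python, assuming only the two given values ($\theta = 11.4$ and $\text{odds}_F = 2.9$); the variable and function names are illustrative.

```python
def odds_to_probability(odds):
    """Probability implied by the odds: odds / (1 + odds)."""
    return odds / (1 + odds)

theta = 11.4        # odds ratio, females versus males
odds_female = 2.9
odds_male = odds_female / theta   # from theta = odds_F / odds_M

pi_female = odds_to_probability(odds_female)
pi_male = odds_to_probability(odds_male)
relative_risk = pi_female / pi_male   # the R of part (c)

print(f"odds_M = {odds_male:.4f}, pi_F = {pi_female:.4f}, "
      f"pi_M = {pi_male:.4f}, R = {relative_risk:.4f}")
```

The gap between $\theta = 11.4$ and $R \approx 3.67$ illustrates why an odds ratio should not be read as a ratio of probabilities when the probabilities are not small.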


Exercise 8 (2.8). A research study estimated that under a certain condition, the probability a subject would be referred for heart catheterization was 0.906 for whites and 0.847 for blacks.

a. A press release about the study stated that the odds of referral for cardiac catheterization for blacks are 60% of the odds for whites. Explain how they obtained 60% (more accurately, 57%).

Solution. The odds ratio is
$\theta = \frac{\text{odds}_B}{\text{odds}_W} = \frac{\pi_B/(1 - \pi_B)}{\pi_W/(1 - \pi_W)} = \frac{0.847/0.153}{0.906/0.094} = 0.5744$,
which is equivalent to 57%, reported in rounded form as 60%.

b. An Associated Press story that described the study stated "Doctors were only 60% as likely to order cardiac catheterization for blacks as for whites." What is wrong with this interpretation? Give the correct percentage for this interpretation. (In stating results to the general public, it is better to use the relative risk than the odds ratio. It is simpler to understand and less likely to be misinterpreted. For details, see New Engl. J. Med., 341: 279-283, 1999.)

Solution. The given interpretation compares the probability of referral for cardiac catheterization for blacks with that for whites, but 60% describes the odds ratio instead. The interpretation can be corrected by using the relative risk, which is
$\frac{\pi_B}{\pi_W} = \frac{0.847}{0.906} = 0.9349 \approx 93\%$.
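Computing both measures from the two referral probabilities shows how far apart they are when the probabilities are large (a minimal sketch; the variable names are ours):

```python
pi_black, pi_white = 0.847, 0.906   # referral probabilities from the study

odds_ratio = (pi_black / (1 - pi_black)) / (pi_white / (1 - pi_white))
relative_risk = pi_black / pi_white

print(f"odds ratio = {odds_ratio:.4f}, relative risk = {relative_risk:.4f}")
```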

Exercise 9 (2.9). An estimated odds ratio for adult females between the presence of squamous cell carcinoma (yes, no) and smoking behavior (smoker, nonsmoker) equals 11.7 when the smoker category consists of subjects whose smoking level $s$ is $0 < s < 20$ cigarettes per day; it is 26.1 for smokers with $s \geq 20$ cigarettes per day (R. Brownson et al., Epidemiology, 3: 61-64, 1992). Show that the estimated odds ratio between carcinoma and smoking levels ($s \geq 20$, $0 < s < 20$) equals $26.1/11.7 = 2.2$.

Solution. Let $\text{odds}_c$, $\text{odds}_s$, and $\text{odds}_{ss}$ denote the estimated odds of squamous cell carcinoma for nonsmokers, for smokers with $0 < s < 20$, and for smokers with $s \geq 20$ cigarettes per day, respectively. In a $2 \times 2$ table, the estimated odds ratio between the presence of squamous cell carcinoma ($Y$) and smoking level ($X$) for $0 < s < 20$ cigarettes per day versus nonsmokers is $\text{odds}_s / \text{odds}_c = 11.7$. Similarly, the estimated odds ratio for $s \geq 20$ cigarettes per day versus nonsmokers is $\text{odds}_{ss} / \text{odds}_c = 26.1$. Then the estimated odds ratio between carcinoma and smoking levels ($s \geq 20$, $0 < s < 20$) is
$\frac{\text{odds}_{ss}}{\text{odds}_s} = \frac{26.1 \times \text{odds}_c}{11.7 \times \text{odds}_c} = \frac{26.1}{11.7} = 2.2$.

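Exercise 10 (2.10). Data posted at the FBI website (www.fbi.gov) stated that of all blacks slain in 2005, 91% were slain by blacks, and of all whites slain in 2005, 83% were slain by whites. Let $X$ denote race of victim and $Y$ denote race of murderer.

a. What conditional distribution do these statistics refer to, $Y$ given $X$, or $X$ given $Y$?

Solution. These statistics refer to $Y$ given $X$: each percentage describes the race of the murderer among victims of a given race.

b. Calculate and interpret the odds ratio between $X$ and $Y$.

Solution. The given information is filled in the following $2 \times 2$ contingency table, where b stands for black and w stands for white (rows are race of victim $X$, columns are race of murderer $Y$, and entries are conditional proportions given $X$).

                 Y (murderer)
X (victim)       b         w
b                0.91      0.09
w                0.17      0.83

With $\pi_1 = P(Y = b \mid X = b)$ and $\pi_2 = P(Y = b \mid X = w)$, the odds ratio between $X$ and $Y$ is then
$\frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \frac{0.91/0.09}{0.17/0.83} = 49.37$.
For black victims, the odds that the murderer was black (rather than white) are 49.37 times the corresponding odds for white victims.

c. Given that a murderer was white, can you estimate the probability that the victim was white? What additional information would you need to do this? (Hint: How could you use Bayes's Theorem?)

Solution. By Bayes's Theorem,
$P(X = w \mid Y = w) = \frac{P(Y = w \mid X = w) \cdot P(X = w)}{P(Y = w)}$,
where w stands for white. We know $P(Y = w \mid X = w) = 0.83$. To estimate this probability we also need $P(X = w)$ and $P(Y = w)$. Assuming, as in the table, that victims and murderers are either black or white, $P(Y = w) = 0.83\,P(X = w) + 0.09\,(1 - P(X = w))$, so the marginal distribution of race of victim, $P(X = w)$, is the additional information needed.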
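A Python sketch of parts (b) and (c). The odds ratio uses only the FBI percentages. The Bayes calculation needs the marginal $P(X = w)$, which the exercise does not give, so the value 0.5 below is a purely hypothetical placeholder, not data; the function (a name of our own) also assumes, as the table does, that only black and white races occur.

```python
def odds_ratio(p1, p2):
    return (p1 / (1 - p1)) / (p2 / (1 - p2))

# Conditional distribution of murderer's race Y given victim's race X (2005 FBI figures)
p_black_murderer = {"b": 0.91, "w": 0.17}   # P(Y = b | X = x)

theta = odds_ratio(p_black_murderer["b"], p_black_murderer["w"])
print(f"odds ratio = {theta:.2f}")

def prob_white_victim_given_white_murderer(p_white_victim):
    """Bayes' theorem; p_white_victim = P(X = w) must be supplied externally."""
    p_w_given_w = 1 - p_black_murderer["w"]   # P(Y = w | X = w) = 0.83
    p_w_given_b = 1 - p_black_murderer["b"]   # P(Y = w | X = b) = 0.09
    # Law of total probability, assuming only the two races in the table
    p_white_murderer = (p_w_given_w * p_white_victim
                        + p_w_given_b * (1 - p_white_victim))
    return p_w_given_w * p_white_victim / p_white_murderer

# HYPOTHETICAL marginal for illustration only, NOT from the exercise: P(X = w) = 0.5
print(round(prob_white_victim_given_white_murderer(0.5), 4))
```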

Exercise 11 (2.12). A statistical analysis that combines information from several studies is called a meta analysis. A meta analysis compared aspirin with placebo on incidence of heart attack and of stroke, separately for men and for women (J. Am. Med. Assoc., 295: 306-313, 2006). For the Women's Health Study, heart attacks were reported for 198 of 19,934 taking aspirin and for 193 of 19,942 taking placebo.

a. Construct a $2 \times 2$ table that cross-classifies the treatment (aspirin, placebo) with whether a heart attack was reported (yes, no).

Solution. The given information is recorded in the $2 \times 2$ table below (A = aspirin, P = placebo).

                 Heart Attack
Treatment        Yes       No        Total
A                198       19,736    19,934
P                193       19,749    19,942

b. Estimate the odds ratio and interpret.

Solution. The odds ratio is
$\hat{\theta} = \frac{n_{11}/n_{12}}{n_{21}/n_{22}} = \frac{198/19{,}736}{193/19{,}749} = 1.0266$.
Since the estimated odds ratio is only slightly greater than 1, the estimated odds of a heart attack are about 2.7% higher for women taking aspirin than for women taking placebo, which is a very small difference.

c. Find a 95% confidence interval for the population odds ratio for women. Interpret. (As of 2006, results suggested that for women, aspirin was helpful for reducing risk of stroke but not necessarily risk of heart attack.)

Solution. The confidence interval is obtained by exponentiating the endpoints of
$\log \hat{\theta} \pm z_{\alpha/2} \cdot \hat{\sigma}_{\log \hat{\theta}}$, where $\hat{\sigma}_{\log \hat{\theta}} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$
and log denotes the natural logarithm. The calculations needed to compute the 95% confidence interval are shown below.
$\log \hat{\theta} = \log 1.02658 = 0.0262$
$\hat{\sigma}_{\log \hat{\theta}} = \sqrt{\frac{1}{198} + \frac{1}{19736} + \frac{1}{193} + \frac{1}{19749}} = 0.1017$
$\log \hat{\theta} \pm z_{\alpha/2} \cdot \hat{\sigma}_{\log \hat{\theta}}$ becomes $0.0262 \pm 1.96 \times 0.1017 = (-0.1731, 0.2255)$,
so the 95% confidence interval for $\theta$ is $(e^{-0.1731}, e^{0.2255}) = (0.841, 1.253)$. Since the interval contains $\theta = 1$, the data are consistent with equal odds of heart attack for the two treatments; there is no evidence that aspirin changes the odds of a heart attack for women, in line with the 2006 results quoted above.
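The interval can be recomputed with Python's `math` module. This sketch (the function name `odds_ratio_ci` is ours) uses `math.log`, the natural logarithm, which is what the standard error formula for $\log \hat{\theta}$ assumes; a base-10 logarithm would give a different and incorrect interval.

```python
import math

def odds_ratio_ci(n11, n12, n21, n22, z):
    """Sample odds ratio and Wald confidence interval computed on the log scale."""
    theta_hat = (n11 * n22) / (n12 * n21)
    log_theta = math.log(theta_hat)                        # natural log
    se = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    lower = math.exp(log_theta - z * se)
    upper = math.exp(log_theta + z * se)
    return theta_hat, se, (lower, upper)

# Women's Health Study: rows aspirin, placebo; columns heart attack yes, no
theta_hat, se, (lo, hi) = odds_ratio_ci(198, 19736, 193, 19749, z=1.96)
print(f"theta_hat = {theta_hat:.4f}, SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```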

Exercise 12 (2.13). Refer to Table 2.1 about belief in an afterlife.

                 Belief in Afterlife
Gender           Yes       No        Total
F                509       116       625
M                398       104       502
Total            907       220       1127

a. Construct a 90% confidence interval for the difference of proportions, and interpret.

Solution. The confidence interval for the difference of proportions is given by
$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$.
Here, $p_1 = 509/625 = 0.8144$ and $p_2 = 398/502 = 0.7928$ are the sample proportions of females and males who believe in an afterlife. Thus, the 90% confidence interval is
$(0.8144 - 0.7928) \pm 1.645 \times \sqrt{\frac{0.8144 \times 0.1856}{625} + \frac{0.7928 \times 0.2072}{502}} = 0.0216 \pm 0.0392 = (-0.0176, 0.0608)$.
Since this interval contains 0, we cannot conclude that the proportions of females and males who believe in an afterlife differ. Plausible values range from the female proportion being about 0.018 lower to about 0.061 higher than the male proportion.

b. Construct a 90% confidence interval for the odds ratio, and interpret.

Solution. The confidence interval for the odds ratio is obtained by exponentiating the endpoints of
$\log \hat{\theta} \pm z_{\alpha/2} \cdot \hat{\sigma}_{\log \hat{\theta}}$,
where log denotes the natural logarithm. All the calculations are shown below.
$\log \hat{\theta} = \log \frac{509/116}{398/104} = \log 1.1466 = 0.1368$
$\hat{\sigma}_{\log \hat{\theta}} = \sqrt{\frac{1}{509} + \frac{1}{116} + \frac{1}{398} + \frac{1}{104}} = 0.1507$
$\log \hat{\theta} \pm z_{\alpha/2} \cdot \hat{\sigma}_{\log \hat{\theta}} = 0.1368 \pm 1.645 \times 0.1507 = (-0.1111, 0.3847)$
The 90% confidence interval for $\theta$ is $(e^{-0.1111}, e^{0.3847}) = (0.895, 1.469)$. Since this interval contains $\theta = 1$, we cannot conclude that the odds of belief in an afterlife differ for females and males.

c. Conduct a test of statistical independence. Report the p-value and interpret.

Solution. The null hypothesis is that the two variables are independent, that is,
$H_0: \pi_{ij} = \pi_{i+} \pi_{+j}$ for all $i$ and $j$.
The alternative hypothesis is that the two variables are dependent. We will use the Pearson chi-squared statistic for testing $H_0$, which is given by
$X^2 = \sum_{i,j} \frac{(n_{ij} - \hat{\mu}_{ij})^2}{\hat{\mu}_{ij}}$.
An estimate of the expected frequency is given by $\hat{\mu}_{ij} = \frac{n_{i+} n_{+j}}{n}$. A calculation of the estimated expected frequencies for each cell is given below.
$\hat{\mu}_{11} = \frac{625 \times 907}{1127} = 502.9947$
$\hat{\mu}_{12} = \frac{625 \times 220}{1127} = 122.0053$
$\hat{\mu}_{21} = \frac{502 \times 907}{1127} = 404.0053$
$\hat{\mu}_{22} = \frac{502 \times 220}{1127} = 97.9947$
The Pearson chi-squared statistic is
$X^2 = \frac{(509 - 502.9947)^2}{502.9947} + \frac{(116 - 122.0053)^2}{122.0053} + \frac{(398 - 404.0053)^2}{404.0053} + \frac{(104 - 97.9947)^2}{97.9947} = 0.8246$.
The degrees of freedom are $(I - 1)(J - 1) = (2 - 1)(2 - 1) = 1$, and the p-value is 0.3638. We fail to reject the null hypothesis: there is no evidence of an association between belief in an afterlife and gender, so the data are consistent with independence.
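All three parts can be reproduced with the Python standard library alone. In this sketch the chi-squared p-value for 1 degree of freedom is obtained from `math.erfc`, using the fact that a $\chi^2_1$ variable is the square of a standard normal variable; more degrees of freedom would call for a general chi-squared CDF (for example from SciPy). The variable names are our own.

```python
import math

# Table 2.1: rows female, male; columns believe yes, no
table = [[509, 116], [398, 104]]
z90 = 1.645   # z_{0.05} for a 90% interval

# (a) Wald interval for the difference of proportions
n1, n2 = sum(table[0]), sum(table[1])
p1, p2 = table[0][0] / n1, table[1][0] / n2
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff_ci = (p1 - p2 - z90 * se_diff, p1 - p2 + z90 * se_diff)

# (b) Wald interval for the odds ratio, built on the natural-log scale
(a, b), (c, d) = table
log_or = math.log((a * d) / (b * c))
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
or_ci = (math.exp(log_or - z90 * se_log_or), math.exp(log_or + z90 * se_log_or))

# (c) Pearson chi-squared test of independence (df = 1 for a 2x2 table)
n = n1 + n2
row_totals = [n1, n2]
col_totals = [table[0][j] + table[1][j] for j in range(2)]
x2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        x2 += (table[i][j] - expected) ** 2 / expected
# For df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"difference CI = ({diff_ci[0]:.4f}, {diff_ci[1]:.4f})")
print(f"odds ratio CI = ({or_ci[0]:.3f}, {or_ci[1]:.3f})")
print(f"X2 = {x2:.4f}, p-value = {p_value:.4f}")
```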
