Analysis of Categorical Data (and the sign test)
Cécile Ané, Stat 371, Spring 2006

The sign test

Hunger rating (exercise 9.30): The compound mCPP is thought to affect appetite and food intake in humans. 2 treatments: mCPP and placebo. 9 men were each given one treatment for 2 weeks, nothing for 2 weeks, then the other treatment for 2 weeks. The order was randomized, and the study was double-blinded. One subject dropped out. Each man rated how hungry he was at the end of each 2-week treatment period. Differences d = mCPP − placebo:

Subject     1     2     3     4     5     6     7     9
mCPP       64   119     0    48    65   119   149    99
placebo    69   112    28    95   145   112   141   119
d          −5    +7   −28   −47   −80    +7    +8   −20

The sign test

[Figure: normal probability plot of the differences d]

Signs of d: − + − − − + + −
5 of the 8 men were less hungry with mCPP: B− = 5, B+ = 3.
Is this difference (5 vs. 3) significant?

H0: the median of D is 0, i.e. IP{mCPP better} = IP{placebo better} = 0.5.
HA: the median of D is not 0, i.e. IP{D < 0} = IP{mCPP better} ≠ 0.5.

The sign test

Test statistic: B− = # of − signs; B− = 5 here. If H0 is true, each subject has a 50% chance of showing a − sign, so

  B− ~ B(n, 0.5)   if H0 is true.

p-value: "As extreme as or more extreme than B− = 5" means B− = 5, 6, 7, 8 or B+ = 5, 6, 7, 8, i.e. B− = 3, 2, 1, 0.

  p = IP{B− ≤ 3 or B− ≥ 5} = 1 − IP{B− = 4} = 1 − C(8,4) (0.5)^8 = 1 − 70/256 ≈ .73

Conclusion: There is no evidence that mCPP has an effect on hunger rating.

The sign test: rat experiment

8 rats were given a drug. Hemoglobin content of blood was measured before and after the drug. With d = y_before − y_after, we had B− = 1 and B+ = 7. "More extreme" means B− = 0, 1 or B+ = 0, 1, i.e. B− = 7, 8.

  p = IP{B− ≤ 1 or B− ≥ 7}
    = 2 IP{B− ≥ 7}                     because B(n, 0.5) is symmetric
    = 2 (IP{B− = 7} + IP{B− = 8}) = 2 (8 × (0.5)^8 + (0.5)^8) = 0.07

There is weak evidence that the drug decreases hemoglobin content on average (p = .07, sign test).
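The binomial tail probabilities above can be checked by hand. As a sketch (in Python rather than the course's R; the function name is ours), the exact two-sided sign-test p-value p = 2 × IP{Y ≥ Bs}, with Y ~ B(n, 0.5) and Bs = max{B−, B+}:

```python
from math import comb

def sign_test_pvalue(b_minus, b_plus):
    """Exact two-sided sign-test p-value: 2 * P(Y >= Bs), Y ~ B(n, 0.5)."""
    n = b_minus + b_plus
    bs = max(b_minus, b_plus)
    tail = sum(comb(n, k) for k in range(bs, n + 1)) * 0.5**n
    return min(1.0, 2 * tail)

# Hunger example: B- = 5, B+ = 3
print(round(sign_test_pvalue(5, 3), 3))   # 0.727
# Rat example: B- = 1, B+ = 7
print(round(sign_test_pvalue(1, 7), 3))   # 0.07
```

Note that for the hunger data, 2 × IP{B− ≥ 5} equals 1 − IP{B− = 4}, since B(8, 0.5) is symmetric.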

The sign test: Calculating the p-value

Exact p-value: for a two-sided test, choose Bs = max{B−, B+} and

  p = 2 × IP{Y ≥ Bs}   where Y ~ B(nd, 0.5).

Use Table 7 (p. 684) to bracket the p-value. The table is indexed by nd and by tail probability (2 tails: .20, .10, .05, .02, .01; 1 tail: .10, .05, .025, .01, .005).

Example: B− = 7, nd = 8. From the table, we can reject H0 at α = .20 and α = .10 but not at α = .05. We get

  .05 < p < .10

The sign test: What if there are ties?

Tie: y1 = y2 for some subject. Then d = 0 for this subject: no sign!! Exclude all zeros, and decrease the sample size accordingly.
Example: If the differences were

  2.2  −0.9  0.0  1.1  0.6  2.9  1.2  2.0
   +    −    0    +    +    +    +    +

then B− = 1 as before, but B+ = 6 and nd = 1 + 6 = 7, not 8. Use Table 7 with nd = 7: we get .10 < p < .20.

A genetic example

The genetic model predicts offspring types in ratio 12:3:1, i.e. with probabilities 12/16, 3/16 and 1/16. Observed counts: 153, 39 and 8, out of n = 200. With R:

> mycounts = c(153, 39, 8)
> chisq.test(mycounts, p=c(12/16, 3/16, 1/16))

        Chi-squared test for given probabilities

data:  mycounts
X-squared = 1.74, df = 2, p-value = 0.4190

Validity: expected counts (150, 37.5 and 12.5) are all ≥ 5. Good!
Conclusion: There is no evidence that the genetic model is false. The data are consistent with the genetic model (p = 0.4). The difference between the data and the model can easily be due to sampling error.
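The goodness-of-fit statistic behind R's output can be reproduced directly. A minimal Python sketch (ours, not from the course), using the fact that for df = 2 the chi-square upper-tail probability is simply exp(−x/2):

```python
from math import exp

obs = [153, 39, 8]
probs = [12/16, 3/16, 1/16]          # genetic model 12:3:1
n = sum(obs)                          # 200
exp_counts = [n * p for p in probs]   # expected counts: 150.0, 37.5, 12.5
x2 = sum((o - e)**2 / e for o, e in zip(obs, exp_counts))
p = exp(-x2 / 2)                      # chi-square upper tail, df = 2
print(round(x2, 2), round(p, 4))      # 1.74 0.419
```

This matches chisq.test: X-squared = 1.74, p-value = 0.4190.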

Test of independence

We want to compare the performance of 2 drugs on rats.

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

p1 = IP{success | drug 1}, the probability of success with drug 1, and
p2 = IP{success | drug 2}.
We want to test H0: the drugs perform equally, i.e. p1 = p2, against HA: one drug is better than the other, i.e. p1 ≠ p2. Equivalently, we have
H0: drug and success are independent.
HA: drug and success are not independent.

Test of independence

1. Build the table of expected counts under H0.

If H0 is true, p1 = p2, but we don't know this common value, so we estimate it. The best guess is

  p̂ = (total # successes) / (total # rats) = 116/192 = .60,

somewhere in between p̂1 = 71/105 = .68 and p̂2 = 45/87 = .52.
Expected # of successes with drug 1: 105 × p̂ = 105 × 116/192 = 63.44. In general,

  E = (Row total × Column total) / Grand total.

Test of independence

Observed counts:

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

Expected counts when drug and success are independent:

           drug 1   drug 2   total
success    63.44    52.56     116
failure    41.56    34.44      76
total       105       87      192

2. Calculate the test statistic X²:

  X² = Σ (obs − exp)²/exp        (sum over all cells)
     = (71 − 63.44)²/63.44 + ··· + (42 − 34.44)²/34.44
     = 5.026
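The computation of X² from the observed table can be sketched in a few lines of Python (ours, not the course's R), including the df = 1 tail probability, which Table 9 can only bracket. For df = 1 the chi-square upper tail equals erfc(sqrt(x/2)):

```python
from math import erfc, sqrt

obs = [[71, 45],                      # success by drug
       [34, 42]]                      # failure by drug
row = [sum(r) for r in obs]           # row totals: 116, 76
col = [sum(c) for c in zip(*obs)]     # column totals: 105, 87
n = sum(row)                          # grand total: 192
# expected count = row total * column total / grand total
exp_counts = [[r * c / n for c in col] for r in row]
x2 = sum((obs[i][j] - exp_counts[i][j])**2 / exp_counts[i][j]
         for i in range(2) for j in range(2))
p = erfc(sqrt(x2 / 2))                # chi-square upper tail, df = 1
print(round(x2, 3), round(p, 3))      # 5.026 0.025
```

This agrees with chisq.test(rats, correct=FALSE): X-squared = 5.0264, p-value = 0.02496.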

Test of independence

3. Calculate the p-value. If there is independence (success does not depend on drug), then X² has a χ² distribution, with df = 1 here. Using Table 9 (p. 686), we get .02 < p < .05.

4. Conclusion: There is moderate evidence that the drugs have different success rates (p = 0.025, chi-square test of independence). Furthermore, in the data we have p̂1 = .68 > p̂2 = .52: there is evidence that drug 1 has the higher success rate.

Test of independence with R

> rats = matrix( c(71,34,45,42), 2,2)
> rats
     [,1] [,2]
[1,]   71   45
[2,]   34   42
> chisq.test(rats)

        Pearson's Chi-squared test with Yates' continuity correction

data:  rats
X-squared = 4.3837, df = 1, p-value = 0.03628

> chisq.test(rats, correct=FALSE)

        Pearson's Chi-squared test

data:  rats
X-squared = 5.0264, df = 1, p-value = 0.02496

χ² test with a bigger table

Example 10.33: 6,800 German men were sampled.

                        Hair color
Eye color     brown   black   fair    red   total
brown           438     288    115     16     857
gray/green     1387     746    946     53    3132
blue            807     189   1768     47    2811
total          2632    1223   2829    116    6800

H0: Hair color and eye color are independent.
HA: They are not!

χ² test with a bigger table

H0 can be stated in many ways:
The frequencies of eye colors do not depend on hair color:
  IP{blue eyes | brown hair} = IP{blue eyes | black hair} = IP{blue eyes | fair hair} = IP{blue eyes | red hair},
  etc. with all other eye colors.
The frequencies of hair colors do not depend on eye color:
  IP{red hair | brown eyes} = IP{red hair | gray/green eyes} = IP{red hair | blue eyes},
  etc. with all other hair colors.
HA states that at least one of these equalities is not true.

Expected values

Observed counts:

                        Hair color
Eye color     brown   black   fair    red   total
brown           438     288    115     16     857
gray/green     1387     746    946     53    3132
blue            807     189   1768     47    2811
total          2632    1223   2829    116    6800

Expected counts (E = Row total × Column total / Grand total):

                         Hair color
Eye color      brown    black     fair     red    total
brown         331.71   154.13   356.54   14.62     857
gray/green   1212.26   563.30  1303.00   53.43    3132
blue         1088.02   505.57  1169.46   47.95    2811
total        2632     1223     2829     116       6800

X², degrees of freedom and p-value

  X² = Σ (obs − exp)²/exp = (438 − 331.71)²/331.71 + ··· + (47 − 47.95)²/47.95    (sum over all cells)
     = 34.1 + 116.3 + 163.6 + 0.1 + 25.2 + 59.3 + 97.8 + 0.004 + 72.6 + 198.2 + 306.3 + 0.02
     = 1073.5

Degrees of freedom: the number of pieces of information (cells) needed to fill in the entire table when the marginals (totals in the margins) are known:

  df = (# rows − 1) × (# columns − 1).

Here df = (3 − 1)(4 − 1) = 6.
p-value: From Table 9, we get p < .0001. There is overwhelming evidence that hair and eye color are not independent. They are associated.
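The same expected-count and X² calculation scales to this 3 × 4 table. A Python sketch (ours; the course would use chisq.test in R):

```python
obs = [[438, 288, 115, 16],      # brown eyes
       [1387, 746, 946, 53],     # gray/green eyes
       [807, 189, 1768, 47]]     # blue eyes
row = [sum(r) for r in obs]      # eye-color totals
col = [sum(c) for c in zip(*obs)]  # hair-color totals
n = sum(row)                     # 6800
# each cell contributes (obs - exp)^2 / exp, with exp = row*col/n
x2 = sum((obs[i][j] - row[i] * col[j] / n)**2 / (row[i] * col[j] / n)
         for i in range(3) for j in range(4))
df = (len(obs) - 1) * (len(obs[0]) - 1)
print(round(x2, 1), df)          # 1073.5 6
```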

Interpretation

We can now look at the largest contributions to X² to see where the association is strongest. Blue eyes and fair hair are associated: blue-eyed people tend to have fair hair more frequently than non-blue-eyed people. Conversely, blue-eyed people tend to have black hair less frequently than non-blue-eyed people.

Applicability of the method

Independence of the observations.
Expected counts ≥ 5 in all cells, for the χ² distribution to be a good approximation.
If some cells have small expected counts, what can be done?
  Fisher's exact test (2 × 2 tables only): we won't cover this method.
  Group cells together. Example: the eye/hair data with a 10-fold decrease in sample size.

Grouping cells

                        Hair color
Eye color     brown   black   fair   red        total
brown            44      29     11    1 (1.37)     85
gray/green      138      75     95    5 (5.06)    313
blue             81      19    177    5 (4.56)    282
total           263     123    283   11           680

(Expected counts for the "red" column are in parentheses; some are below 5.)

After grouping the "fair" and "red" columns:

                        Hair color
Eye color     brown   black   fair/red   total
brown            44      29      12        85
gray/green      138      75     100       313
blue             81      19     182       282
total           263     123     294       680

Now expected counts are ≥ 5 in all cells. We get df = 4, X² = 107 and p < .0001.
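The grouped-table result can be verified with the same X² recipe. A short Python check (ours, not from the course):

```python
obs = [[44, 29, 12],     # brown eyes: brown, black, fair/red hair
       [138, 75, 100],   # gray/green eyes
       [81, 19, 182]]    # blue eyes
row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
n = sum(row)             # 680
# X^2 with exp = row*col/n in each cell
x2 = sum((obs[i][j] - row[i] * col[j] / n)**2 / (row[i] * col[j] / n)
         for i in range(3) for j in range(3))
df = (3 - 1) * (3 - 1)
print(round(x2), df)     # 107 4
```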

Confidence interval for p1 − p2 (with 2 × 2 tables)

Experiment on rats:

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

In general:

           treatment 1   treatment 2
success       y1             y2
failure       n1 − y1        n2 − y2
total         n1             n2

p1 = IP{success | treatment 1} and p2 = IP{success | treatment 2}.
We know from the χ² test that p1 > p2 (the p-value was .025). Now we want more: a confidence interval for p1 − p2.

Confidence interval for p1 − p2

Same trick that we saw before: we add 4 fictitious individuals (rats), one in each cell (no favorite cell!).

           drug 1   drug 2
success     +1       +1
failure     +1       +1

           treatment 1     treatment 2
success     y1 + 1          y2 + 1
failure     n1 − y1 + 1     n2 − y2 + 1
total       n1 + 2          n2 + 2

Estimates of p1 and p2 are p̃1 = (y1 + 1)/(n1 + 2) and p̃2 = (y2 + 1)/(n2 + 2).
The estimate of p1 − p2 is p̃1 − p̃2. The standard error of this estimate is

  SE(p̃1 − p̃2) = sqrt( SE(p̃1)² + SE(p̃2)² ) = sqrt( p̃1(1 − p̃1)/(n1 + 2) + p̃2(1 − p̃2)/(n2 + 2) ).

Confidence interval for p1 − p2

The 95% confidence interval for p1 − p2 is

  p̃1 − p̃2 ± 1.96 × SE(p̃1 − p̃2).

Other confidence levels: use a z-multiplier from the Z-table. For 90% confidence, z.05 = 1.645 and the interval becomes p̃1 − p̃2 ± 1.645 × SE(p̃1 − p̃2).

Rats: p̃1 = (71 + 1)/(105 + 2) = .67, p̃2 = (45 + 1)/(87 + 2) = .52, and SE(p̃1 − p̃2) = 0.07.
95% confidence interval: .156 ± 1.96 × .07, i.e. (.019, .293).
90% confidence interval: (.041, .271).
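The plus-four interval above is easy to sketch in code. A Python version (ours; the function name plus4_ci is an assumption, not course notation):

```python
from math import sqrt

def plus4_ci(y1, n1, y2, n2, z=1.96):
    """Plus-four confidence interval for p1 - p2: add one success and
    one failure to each group, then use the usual Wald-style formula."""
    p1 = (y1 + 1) / (n1 + 2)
    p2 = (y2 + 1) / (n2 + 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    d = p1 - p2
    return d - z * se, d + z * se

lo, hi = plus4_ci(71, 105, 45, 87)                # rats, 95% confidence
print(round(lo, 3), round(hi, 3))                 # 0.019 0.293
lo90, hi90 = plus4_ci(71, 105, 45, 87, z=1.645)   # 90% confidence
print(round(lo90, 3), round(hi90, 3))             # 0.041 0.271
```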

Smoking cessation example

                          no contact   group counseling   total
quit smoking                   1              26            27
resumed within a year         30              69            99
total                         31              95           126

p1 = IP{quit | no contact}, and p̂1 = 1/31 = .032.

With R:
> 1-pchisq(8.09, df=1)
[1] 0.004451016

Smoking cessation example

Observed and expected counts:

                          no contact     group counseling    total
quit smoking               1  (6.64)       26 (20.36)          27
resumed within a year     30 (24.36)       69 (74.64)          99
total                     31               95                 126

Is a chi-square test okay to use here? Is it valid?
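As a check on the validity question, a Python sketch (ours, not from the course) that computes the expected counts and the X² statistic for this table; 8.09 is the value fed to pchisq above, and the smallest expected count decides validity:

```python
from math import erfc, sqrt

obs = [[1, 26],    # quit smoking: no contact, group counseling
       [30, 69]]   # resumed within a year
row = [sum(r) for r in obs]           # 27, 99
col = [sum(c) for c in zip(*obs)]     # 31, 95
n = sum(row)                          # 126
exp_counts = [[r * c / n for c in col] for r in row]
x2 = sum((obs[i][j] - exp_counts[i][j])**2 / exp_counts[i][j]
         for i in range(2) for j in range(2))
p = erfc(sqrt(x2 / 2))                # chi-square upper tail, df = 1
smallest = min(min(r) for r in exp_counts)
print(round(smallest, 2))             # 6.64
print(round(x2, 2), round(p, 3))      # 8.09 0.004
```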