Analysis of Categorical Data (and the sign test) Cécile Ané Stat 371
Spring 2006
The sign test

Hunger rating (exercise 9.30): The compound mCPP is thought to affect appetite and food intake in humans. Two treatments: mCPP and placebo. Nine men were given one treatment for 2 weeks, nothing for 2 weeks, then the other treatment for 2 weeks. The order was randomized and the study was double-blinded. One subject dropped out. Each man rated how hungry he was at the end of each 2-week treatment period.

Subject   mCPP   placebo     d
   1        64      69       -5
   2       119     112       +7
   3         0      28      -28
   4        48      95      -47
   5        65     145      -80
   6       119     112       +7
   7       149     141       +8
   9        99     119      -20
The sign test

[Figure: normal probability plot of d]

Signs of d: − + − − − + + −

5 of the 8 men were less hungry with mCPP: B− = 5, B+ = 3.
Is this difference (5 vs. 3) significant?

H0: the median of D is 0, i.e. P{mCPP better} = P{placebo better} = 0.5.
HA: the median of D is not 0, i.e. P{D < 0} = P{mCPP better} ≠ 0.5.
The sign test

Test statistic: B− = # of − signs; here B− = 5.
If H0 is true, each subject has a 50% chance of showing a − sign, so B− ∼ B(n, 0.5) under H0.

p-value: Outcomes at least as extreme as B− = 5 are B− = 5, 6, 7, 8 or B+ = 5, 6, 7, 8 (i.e. B− = 3, 2, 1, 0). So

p = P{B− ≤ 3 or B− ≥ 5} = 1 − P{B− = 4} = 1 − C(8,4) (0.5)^8 ≈ .73

Conclusion: There is no evidence that mCPP has an effect on hunger rating.
The sign test: rat experiment

8 rats were given a drug. The hemoglobin content of their blood was measured before and after the drug. With d = y_before − y_after we got B− = 1 and B+ = 7. Outcomes at least as extreme are B− = 0, 1 or B+ = 0, 1 (i.e. B− = 7, 8), so

p = P{B− ≤ 1 or B− ≥ 7}
  = 2 P{B− ≥ 7}                      (because B(n, 0.5) is symmetric)
  = 2 (P{B− = 7} + P{B− = 8})
  = 2 (8 · (0.5)^8 + (0.5)^8) ≈ .07

There is weak evidence that the drug decreases hemoglobin content on average (p = .07, sign test).
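Both p-value calculations above follow the same binomial pattern. As a sketch (the course software is R; this Python helper, using only the standard library, is purely illustrative):

```python
from math import comb

def sign_test_pvalue(n_minus: int, n: int) -> float:
    """Two-sided sign-test p-value: P{B in either tail} for B ~ B(n, 0.5)."""
    b_small = min(n_minus, n - n_minus)
    b_large = max(n_minus, n - n_minus)
    if b_small == b_large:          # B- = B+ = n/2: nothing more extreme exists
        return 1.0
    p = sum(comb(n, k) for k in range(b_small + 1))          # lower tail
    p += sum(comb(n, k) for k in range(b_large, n + 1))      # upper tail
    return p / 2**n

# Hunger example: B- = 5 out of n = 8 signs
print(round(sign_test_pvalue(5, 8), 2))   # 0.73

# Rat example: B- = 1 out of n = 8 signs
print(round(sign_test_pvalue(1, 8), 2))   # 0.07
```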
The sign test: Calculating the p-value

Exact p-value: for a two-sided test, take Bs = max{B−, B+} and

p = 2 P{Y ≥ Bs}   where Y ∼ B(n, 0.5).

Use Table 7 (p. 684) to bracket the p-value. For each nd, Table 7 gives the smallest value of Bs that is significant at each tail probability (2 tails: .20, .10, .05, .02, .01; 1 tail: .10, .05, .025, .01, .005).

Example: B− = 7, nd = 8. We can reject H0 at α = .20 and α = .10 but not at α = .05, so we get

.05 < p < .10
The sign test: What if there are ties?

Tie: y1 = y2 for some subject. Then d = 0 for this subject: no sign! Exclude all zeros, and decrease the sample size accordingly.

Example: If the differences were

  2.2  −0.9  0.0  1.1  0.6  2.9  1.2  2.0
   +    −    0    +    +    +    +    +

then B− = 1 as before, but B+ = 6 and nd = 1 + 6 = 7, not 8. Use Table 7 with nd = 7: we get .10 < p < .20.

A genetic example

A sample of n = 200 individuals gave observed counts 153, 39 and 8 in three categories, while a genetic model predicts the proportions 12/16, 3/16 and 1/16. With R:

> mycounts = c(153, 39, 8)
> chisq.test(mycounts, p=c(12/16, 3/16, 1/16))

        Chi-squared test for given probabilities

data:  mycounts
X-squared = 1.74, df = 2, p-value = 0.4190

Validity: the expected counts (150, 37.5 and 12.5) are all ≥ 5. Good!
Conclusion: There is no evidence that the genetic model is false. The data are consistent with the genetic model (p = 0.4). The difference between the data and the model can easily be due to sampling error.
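The same goodness-of-fit statistic can be computed by hand; a sketch in Python (illustrative only, the slides use R; with df = 2 the χ² upper-tail probability happens to be exactly exp(−x/2)):

```python
# Chi-square goodness-of-fit for the genetic example, step by step.
from math import exp

obs = [153, 39, 8]
model = [12/16, 3/16, 1/16]
n = sum(obs)                                   # 200

expected = [n * p for p in model]              # [150.0, 37.5, 12.5]
x2 = sum((o - e)**2 / e for o, e in zip(obs, expected))

# For df = 2, P{chi-square >= x} = exp(-x/2) exactly.
p_value = exp(-x2 / 2)

print(round(x2, 2), round(p_value, 3))   # 1.74 0.419
```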
Test of independence

We want to compare the performance of 2 drugs on rats.

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

p1 = P{success | drug 1}, the probability of success with drug 1
p2 = P{success | drug 2}

We want to test H0: the drugs perform equally, i.e. p1 = p2, against HA: one drug is better than the other, i.e. p1 ≠ p2. Equivalently, we have
H0: drug and success are independent.
HA: drug and success are not independent.
Test of independence

1. Build the table of expected counts under H0.

If H0 is true, p1 = p2, but we don't know this common value, so we estimate it. The best guess is

p̂ = (total # successes) / (total # rats) = 116/192 = .60,

somewhere in between p̂1 = 71/105 = .68 and p̂2 = 45/87 = .52.

Expected # of successes with drug 1: 105 · p̂ = 105 · (116/192) = 63.44. In general,

E = (row total × column total) / grand total.
Test of independence

2. Calculate the test statistic X².

Observed counts:

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

Expected counts when drug and success are independent:

           drug 1   drug 2   total
success    63.44    52.56     116
failure    41.56    34.44      76
total        105       87     192

X² = Σ_all cells (obs − exp)² / exp
   = (71 − 63.44)²/63.44 + · · · + (42 − 34.44)²/34.44
   = 5.026
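Steps 1 and 2 can be sketched in Python (illustrative only; the course uses R; for df = 1 the χ² tail probability equals erfc(√(x/2))):

```python
# Chi-square test of independence for the 2x2 rat table, step by step.
from math import sqrt, erfc

obs = [[71, 45],     # successes with drug 1, drug 2
       [34, 42]]     # failures  with drug 1, drug 2

row = [sum(r) for r in obs]                    # [116, 76]
col = [sum(c) for c in zip(*obs)]              # [105, 87]
n = sum(row)                                   # 192

# E = row total * column total / grand total
exp_ = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

x2 = sum((obs[i][j] - exp_[i][j])**2 / exp_[i][j]
         for i in range(2) for j in range(2))

# For df = 1, P{chi-square >= x} = erfc(sqrt(x/2)).
p_value = erfc(sqrt(x2 / 2))
print(round(x2, 3), round(p_value, 3))   # 5.026 0.025
```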
Test of independence

3. Calculate the p-value. If there is independence (success does not depend on drug) then X² has a χ² distribution, here with df = 1. Using Table 9 (p. 686), we get .02 < p < .05.

4. Conclusion: There is moderate evidence that the drugs have different success rates (p = 0.025, chi-square test of independence). Furthermore, in the data we have p̂1 = .68 > p̂2 = .52, so there is evidence that drug 1 has the higher success rate.
Test of independence with R

> rats = matrix( c(71,34,45,42), 2,2)
> rats
     [,1] [,2]
[1,]   71   45
[2,]   34   42
> chisq.test(rats)

        Pearson's Chi-squared test with Yates' continuity correction

data:  rats
X-squared = 4.3837, df = 1, p-value = 0.03628

> chisq.test(rats, correct=FALSE)

        Pearson's Chi-squared test

data:  rats
X-squared = 5.0264, df = 1, p-value = 0.02496
χ² test with a bigger table

Example 10.33: 6,800 German men were sampled.

                          Hair color
Eye color     brown   black   fair   red   total
brown           438     288    115    16     857
gray/green     1387     746    946    53    3132
blue            807     189   1768    47    2811
total          2632    1223   2829   116    6800

H0: Hair color and eye color are independent.
HA: They are not!
χ² test with a bigger table

H0 can be stated in many ways:

The frequencies of eye colors do not depend on hair color:
P{blue eyes | brown hair} = P{blue eyes | black hair} = P{blue eyes | fair hair} = P{blue eyes | red hair},
etc. with all other eye colors.

The frequencies of hair colors do not depend on eye color:
P{red hair | brown eyes} = P{red hair | gray/green eyes} = P{red hair | blue eyes},
etc. with all other hair colors.

HA states that at least one of these equalities is not true.
Expected values

Observed counts:

                          Hair color
Eye color     brown   black   fair   red   total
brown           438     288    115    16     857
gray/green     1387     746    946    53    3132
blue            807     189   1768    47    2811
total          2632    1223   2829   116    6800

Expected counts (each E = row total × column total / grand total):

                           Hair color
Eye color      brown    black     fair     red   total
brown         331.71   154.13   356.54   14.62     857
gray/green   1212.27   563.30  1303.00   53.43    3132
blue         1088.02   505.57  1169.46   47.95    2811
total           2632     1223     2829     116    6800
X², degrees of freedom and p-value

X² = Σ_all cells (obs − exp)² / exp
   = (438 − 331.71)²/331.71 + · · · + (47 − 47.95)²/47.95
   = 34.1 + 116.3 + 163.6 + 0.1 + 25.2 + 59.3 + 97.8 + 0.004 + 72.6 + 198.2 + 306.3 + 0.02
   = 1073.5

Degrees of freedom: the # of pieces of information (cells) needed to fill in the entire table once the marginals (totals in the margins) are known:

df = (# rows − 1) × (# columns − 1).

Here df = (3 − 1)(4 − 1) = 6.

p-value: From Table 9, we get p < .0001. There is overwhelming evidence that hair and eye color are not independent. They are associated.
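The same computation extends mechanically to any r × c table; a Python sketch (illustrative, the course uses R):

```python
# X^2 and df for the 3x4 hair/eye color table, expected counts from the marginals.
obs = [[438,  288,  115, 16],    # brown eyes
       [1387, 746,  946, 53],    # gray/green eyes
       [807,  189, 1768, 47]]    # blue eyes

row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
n = sum(row)                     # 6800

x2 = sum((obs[i][j] - row[i] * col[j] / n)**2 / (row[i] * col[j] / n)
         for i in range(len(obs)) for j in range(len(obs[0])))
df = (len(obs) - 1) * (len(obs[0]) - 1)

print(round(x2, 1), df)   # 1073.5 6
```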
Interpretation

We can now look at the largest contributions to X² to see where the association is strongest. Blue eyes and fair hair are associated: blue-eyed people tend to have fair hair more frequently than non-blue-eyed people. Conversely, blue-eyed people tend to have black hair less frequently than non-blue-eyed people.
Applicability of the method

Independence of observations.
Expected counts ≥ 5 in all cells, for the χ² distribution to be a good approximation.

If some cells have small counts, what can be done?
Fisher's test (2 × 2 tables only): but we won't cover this method.
Group cells together. Example: the eye/hair data with a 10-fold decrease in sample size.
Grouping cells

Observed counts (n = 680), with expected counts in parentheses for the red-hair column:

                        Hair color
Eye color    brown   black   fair   red        total
brown           44      29     11    1 (1.37)     85
gray/green     138      75     95    5 (5.06)    313
blue            81      19    177    5 (4.56)    282
total          263     123    283   11           680

Some expected counts in the red column are below 5, so we group the fair and red categories together:

                        Hair color
Eye color    brown   black   fair/red   total
brown           44      29      12        85
gray/green     138      75     100       313
blue            81      19     182       282
total          263     123     294       680

Now the expected counts are ≥ 5 in all cells. We get df = 4, X² = 107 and p < .0001.
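The grouped table can be checked with the same recipe; a Python sketch (illustrative, the course uses R):

```python
# X^2 and df for the grouped 3x3 table (fair and red hair combined).
obs = [[44,  29,  12],    # brown eyes
       [138, 75, 100],    # gray/green eyes
       [81,  19, 182]]    # blue eyes

row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
n = sum(row)              # 680

x2 = sum((obs[i][j] - row[i] * col[j] / n)**2 / (row[i] * col[j] / n)
         for i in range(3) for j in range(3))
df = (3 - 1) * (3 - 1)

print(round(x2), df)   # 107 4
```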
Confidence interval for p1 − p2 (with 2 × 2 tables)

Experiment on rats:

           drug 1   drug 2   total
success      71       45      116
failure      34       42       76
total       105       87      192

In general:

         treatment 1   treatment 2   total
success       y1            y2
failure    n1 − y1       n2 − y2
total         n1            n2

p1 = P{success | drug/treatment 1} and p2 = P{success | drug/treatment 2}.

We know from the χ² test that p1 > p2 (the p-value was .025). Now we want more: a confidence interval for p1 − p2.
Confidence interval for p1 − p2

Same trick that we saw before: we add 4 fictitious individuals (rats), one in each cell (no favorite cell!).

         drug 1   drug 2           treatment 1    treatment 2
success    +1       +1    success    y1 + 1         y2 + 1
failure    +1       +1    failure  n1 − y1 + 1    n2 − y2 + 1
                          total      n1 + 2         n2 + 2

Estimates of p1 and p2 are p̃1 = (y1 + 1)/(n1 + 2) and p̃2 = (y2 + 1)/(n2 + 2).
Estimate of p1 − p2: p̃1 − p̃2.
Standard error of this estimate:

SE(p̃1 − p̃2) = sqrt( SE²(p̃1) + SE²(p̃2) ) = sqrt( p̃1(1 − p̃1)/(n1 + 2) + p̃2(1 − p̃2)/(n2 + 2) )
Confidence interval for p1 − p2

The 95% confidence interval for p1 − p2 is

p̃1 − p̃2 ± 1.96 · SE(p̃1 − p̃2).

Other confidence levels: use a z-multiplier from the Z table. For 90% confidence, z.05 = 1.645 and the interval becomes

p̃1 − p̃2 ± 1.645 · SE(p̃1 − p̃2).

Rats: p̃1 = (71 + 1)/(105 + 2) = .67, p̃2 = (45 + 1)/(87 + 2) = .52, and SE(p̃1 − p̃2) = 0.07.
95% confidence interval: .156 ± 1.96 · .07, i.e. (.019, .293).
90% confidence interval: (.041, .271).
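The interval for the rat data can be reproduced numerically; a Python sketch (illustrative, the course uses R):

```python
# "Add four fictitious individuals" confidence interval for p1 - p2 (rat data).
from math import sqrt

y1, n1 = 71, 105     # successes / sample size, drug 1
y2, n2 = 45, 87      # successes / sample size, drug 2

p1 = (y1 + 1) / (n1 + 2)                       # ~ .67
p2 = (y2 + 1) / (n2 + 2)                       # ~ .52
se = sqrt(p1*(1 - p1)/(n1 + 2) + p2*(1 - p2)/(n2 + 2))

# 95% interval: estimate +/- 1.96 * SE
lo, hi = (p1 - p2) - 1.96*se, (p1 - p2) + 1.96*se
print(round(lo, 3), round(hi, 3))   # 0.019 0.293
```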
Smoking cessation example

                        no contact   group counseling   total
quit smoking                 1              26             27
resumed within a year       30              69             99
total                       31              95            126

p1 = P{quit smoking | no contact}, estimated by p̂1 = 1/31 = .032.

With R, the p-value for X² = 8.09 is:

> 1-pchisq(8.09, df=1)
[1] 0.004451016
Smoking cessation example

Observed and expected counts (expected in parentheses):

                        no contact    group counseling   total
quit smoking             1 (6.64)        26 (20.36)         27
resumed within a year   30 (24.36)       69 (74.64)         99
total                       31               95            126

Is a chi-square test okay to use here? Is it valid?
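The X² = 8.09 behind the earlier pchisq call can be recomputed from this table; a Python sketch (illustrative, the course uses R):

```python
# Chi-square statistic for the smoking-cessation table.
from math import sqrt, erfc

obs = [[1, 26],      # quit smoking: no contact, group counseling
       [30, 69]]     # resumed within a year

row = [sum(r) for r in obs]
col = [sum(c) for c in zip(*obs)]
n = sum(row)                                   # 126

exp_ = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
x2 = sum((obs[i][j] - exp_[i][j])**2 / exp_[i][j]
         for i in range(2) for j in range(2))

# All expected counts (6.64, 20.36, 24.36, 74.64) are >= 5.
# For df = 1, P{chi-square >= x} = erfc(sqrt(x/2)).
p_value = erfc(sqrt(x2 / 2))
print(round(x2, 2), round(p_value, 3))   # 8.09 0.004
```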