Week 3 Lecture: Two-Sample Hypothesis Tests (Chapter 8)

Testing for Differences Between Two Means

Just as we used the Student’s t-distribution for one-sample hypothesis testing, we can use it for testing two-sample hypotheses. The procedure is analogous to the t-test for one-sample hypotheses, except that the t-statistic is calculated as the ratio of the difference between the two sample means to the standard error of that difference:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}},$$

where $\bar{X}_i$ = sample mean, i = 1, 2;

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}, \qquad s_p^2 = \frac{SS_1 + SS_2}{\nu_1 + \nu_2},$$

$SS_i$ = sum of squares, i = 1, 2; and $\nu_i = n_i - 1$, i = 1, 2.

So, our t-statistic equation becomes:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}$$

or, if sample sizes are equal (i.e., $n_1 = n_2 = n$):

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{2 s_p^2}{n}}}.$$

The t-statistic is then compared to a critical t-value, $t_{\alpha,\nu}$, with $\nu = \nu_1 + \nu_2 = n_1 + n_2 - 2$ degrees of freedom.

The decision rule depends on whether you are testing one-tailed or two-tailed hypotheses (analogous to the one-sample case). Zar outlines a summary of decision rules on page 135:

1. When Ho: µ1 = µ2 and Ha: µ1 ≠ µ2: if $|t| \geq t_{\alpha(2),\nu}$, then reject Ho.

2. When Ho: µ1 ≥ µ2 and Ha: µ1 < µ2: if $t \leq -t_{\alpha(1),\nu}$, then reject Ho.

3. When Ho: µ1 ≤ µ2 and Ha: µ1 > µ2: if $t \geq t_{\alpha(1),\nu}$, then reject Ho.
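To make these formulas concrete, here is a minimal sketch in Python that computes the pooled-variance t-statistic from summary statistics; the function name and argument layout are my own illustration, not from Zar.

```python
import math

def pooled_t(n1, mean1, ss1, n2, mean2, ss2):
    """Pooled-variance two-sample t-statistic and its degrees of freedom."""
    nu1, nu2 = n1 - 1, n2 - 1
    sp2 = (ss1 + ss2) / (nu1 + nu2)          # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)      # SE of the difference in means
    t = (mean1 - mean2) / se
    return t, nu1 + nu2                      # df = n1 + n2 - 2

# Using the oak summary statistics from the example below:
# pooled_t(30, 81.9, 2903.46667, 39, 83.1, 2661.89744) -> (-0.54218, 67)
```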

EXAMPLE

Let’s do an example using the Appalachian oak data from last week’s lecture (WO = white oak, RO = red oak).

Ho: µWO = µRO
Ha: µWO ≠ µRO
α = 0.05

Sample 1 (WO heights, in feet):
87, 82, 85, 85, 82, 87, 100, 101, 93, 96, 75, 88, 85, 95, 72, 70, 88, 86, 81, 87, 75, 71, 86, 72, 72, 71, 63, 84, 66, 71

Sample 2 (RO heights, in feet):
92, 75, 90, 98, 79, 77, 91, 84, 86, 87, 94, 84, 90, 91, 94, 89, 100, 74, 77, 79, 95, 88, 80, 87, 87, 76, 77, 80, 86, 83, 79, 78, 74, 77, 77, 80, 72, 72, 60

Summary statistics:

Sample 1: n1 = 30, ν1 = 29, X̄1 = 81.9 feet, SS1 = 2903.46667 ft²
Sample 2: n2 = 39, ν2 = 38, X̄2 = 83.1 feet, SS2 = 2661.89744 ft²

$$s_p^2 = \frac{SS_1 + SS_2}{\nu_1 + \nu_2} = \frac{2903.46667 + 2661.89744}{29 + 38} = 83.06514 \text{ ft}^2$$

$$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{83.06514}{30} + \frac{83.06514}{39}} = 2.21330 \text{ feet}$$

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{81.9 - 83.1}{2.21330} = -0.54218$$

$$t_{\alpha(2),\nu} = t_{0.05(2),67} = 1.996, \quad \text{where } \nu = \nu_1 + \nu_2 = 67$$

Decision Rule: If |t| ≥ 1.996, then reject Ho; otherwise, do not reject Ho.

Conclusion: Since |−0.54218| < 1.996 (P > 0.50), do not reject Ho and conclude that mean height for red oaks is the same as mean height for white oaks in the population of Appalachian oaks that we sampled (or, to be more statistically correct: we do not have sufficient evidence to reject the null hypothesis that mean heights for red and white oaks are equal).
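As a check on the hand calculation, the same test can be run in Python with scipy (a sketch; equal_var=True requests the pooled-variance test used above):

```python
from scipy import stats

# Heights (feet) from the Appalachian oak example
wo = [87, 82, 85, 85, 82, 87, 100, 101, 93, 96, 75, 88, 85, 95, 72,
      70, 88, 86, 81, 87, 75, 71, 86, 72, 72, 71, 63, 84, 66, 71]
ro = [92, 75, 90, 98, 79, 77, 91, 84, 86, 87, 94, 84, 90, 91, 94,
      89, 100, 74, 77, 79, 95, 88, 80, 87, 87, 76, 77, 80, 86, 83,
      79, 78, 74, 77, 77, 80, 72, 72, 60]

t_stat, p_value = stats.ttest_ind(wo, ro, equal_var=True)
print(t_stat, p_value)  # t ≈ -0.542; two-tailed P ≈ 0.59, so do not reject Ho
```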

You should notice that the two-tailed hypotheses can also be written as: Ho: µ1 − µ2 = 0 and Ha: µ1 − µ2 ≠ 0. This can be generalized to test whether the difference between the sample means is significantly different from a hypothesized difference between population means: Ho: µ1 − µ2 = µo and Ha: µ1 − µ2 ≠ µo. For the generalized two-tailed hypotheses, the t-statistic can be calculated by:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_o}{s_{\bar{X}_1 - \bar{X}_2}}.$$

In a similar fashion, the one-tailed hypotheses can be rewritten as:

1. Ho: µ1 ≥ µ2 and Ha: µ1 < µ2 becomes Ho: µ1 − µ2 ≥ 0 and Ha: µ1 − µ2 < 0,

2. Ho: µ1 ≤ µ2 and Ha: µ1 > µ2 becomes Ho: µ1 − µ2 ≤ 0 and Ha: µ1 − µ2 > 0.


These cases can also be generalized, as in the two-tailed case, to test whether the difference between the sample means is significantly different from a hypothesized difference between population means:

1. Ho: µ1 − µ2 ≥ µo and Ha: µ1 − µ2 < µo,

2. Ho: µ1 − µ2 ≤ µo and Ha: µ1 − µ2 > µo.

For both of these generalized cases, the t-statistic is calculated by:

$$t = \frac{(\bar{X}_1 - \bar{X}_2) - \mu_o}{s_{\bar{X}_1 - \bar{X}_2}}.$$

The decision rule for the generalized cases also depends on whether you are testing one-tailed or two-tailed hypotheses. Zar outlines a summary of decision rules on page 135:

1. When Ho: µ1 − µ2 = µo and Ha: µ1 − µ2 ≠ µo: if $|t| \geq t_{\alpha(2),\nu}$, then reject Ho.

2. When Ho: µ1 − µ2 ≥ µo and Ha: µ1 − µ2 < µo: if $t \leq -t_{\alpha(1),\nu}$, then reject Ho.

3. When Ho: µ1 − µ2 ≤ µo and Ha: µ1 − µ2 > µo: if $t \geq t_{\alpha(1),\nu}$, then reject Ho.
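In code, the only change from the equal-means test is subtracting the hypothesized difference µo from the observed difference; a short sketch extending the hypothetical pooled_t helper from earlier:

```python
import math

def pooled_t_general(n1, mean1, ss1, n2, mean2, ss2, mu0=0.0):
    """t-statistic for Ho: mu1 - mu2 = mu0 (illustrative helper, not Zar's)."""
    sp2 = (ss1 + ss2) / ((n1 - 1) + (n2 - 1))    # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)          # SE of the difference
    return ((mean1 - mean2) - mu0) / se
```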

Confidence Intervals for the Population Mean

In the two-sample case where $\sigma_1^2 = \sigma_2^2$, confidence intervals for either µ1 or µ2 are calculated in an analogous fashion to the one-sample case, except that the pooled variance ($s_p^2$) is used instead of $s_1^2$ or $s_2^2$:

$$\bar{X}_i \pm t_{\alpha(2),\nu} \sqrt{\frac{s_p^2}{n_i}},$$

where i = 1, 2 (depending on which sample you are using) and all other variables are defined as before. If $\sigma_1^2 \neq \sigma_2^2$, then the confidence interval is calculated by:

$$\bar{X}_i \pm t_{\alpha(2),\nu'} \sqrt{\frac{s_i^2}{n_i}},$$

where i = 1, 2 (depending on which sample you are using), ν′ = degrees of freedom computed with Equation 8.12 (page 139), and all other variables are defined as before.
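A minimal sketch of the equal-variance interval in Python, using scipy for the critical value (the helper name and inputs are illustrative, not from Zar):

```python
import math
from scipy import stats

def pooled_ci(mean_i, n_i, sp2, nu, alpha=0.05):
    """CI for one mean using the pooled variance (equal-variance case)."""
    t_crit = stats.t.ppf(1 - alpha / 2, nu)      # two-tailed critical t
    half_width = t_crit * math.sqrt(sp2 / n_i)
    return mean_i - half_width, mean_i + half_width

# e.g., for the white oak sample: pooled_ci(81.9, 30, 83.06514, 67)
```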

Zar also outlines how to calculate confidence intervals for the difference between two population means (page 144) that we will not discuss here.

Power and Sample Size in Tests for Differences Between Two Means

Just as with one-sample tests, we can determine the sample size necessary to achieve a specified level of statistical precision. The difference is that we use the pooled variance in the two-sample case (we must assume that we have not violated any of the assumptions discussed below). The formula for calculating n (the size of each sample) for specified α- and β-levels and a minimum detectable difference, δ, is:

$$n \geq \frac{2 s_p^2}{\delta^2} \left( t_{\alpha,\nu} + t_{\beta(1),\nu} \right)^2.$$

We can rearrange this formula to calculate the minimum detectable difference, given the probability levels and sample size:

$$\delta \geq \sqrt{\frac{2 s_p^2}{n}} \left( t_{\alpha,\nu} + t_{\beta(1),\nu} \right).$$

We can also rearrange this formula, as we did in the one-sample case, to calculate the power of the test, 1 − β (prior to performing our hypothesis test):

$$t_{\beta(1),\nu} \leq \frac{\delta}{\sqrt{\dfrac{2 s_p^2}{n}}} - t_{\alpha,\nu}.$$

Note that maximum power (and robustness) is achieved when n1 = n2. You also get more power as n1 and n2 increase. If we want to estimate power after Ho is tested, we use the following formula (see Zar 4th edition, Example 8.7, p. 137):

$$\phi = \sqrt{\frac{n d^2 - 2 s_p^2}{4 s_p^2}},$$

where d = the difference between the sample means, $d = \bar{X}_1 - \bar{X}_2$. After computing φ, use Zar’s Figure B.1 to estimate β, which can then be used to find power (1 − β).
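Because ν itself depends on n, the sample-size formula is usually solved iteratively; a sketch in Python (the starting guess, tolerance, and example inputs are my own choices, not Zar’s):

```python
import math
from scipy import stats

def n_per_group(sp2, delta, alpha=0.05, beta=0.10, two_tailed=True):
    """Iterate n >= (2*sp2/delta^2)*(t_alpha + t_beta)^2 until it stabilizes."""
    n = 10.0                                       # initial guess
    for _ in range(100):
        nu = 2 * (n - 1)                           # df for two equal groups
        a = alpha / 2 if two_tailed else alpha
        t_a = stats.t.ppf(1 - a, nu)               # t_alpha,nu
        t_b = stats.t.ppf(1 - beta, nu)            # t_beta(1),nu
        n_new = 2 * sp2 / delta**2 * (t_a + t_b)**2
        if abs(n_new - n) < 0.01:
            break
        n = n_new
    return math.ceil(n_new)

# e.g., with the oak pooled variance and a 5-foot detectable difference:
# n_per_group(83.06514, 5.0)
```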

Assumptions of the t-test

The two-sample t-test assumes that both samples are independent and randomly selected from normal populations with equal variances. The t-test is quite robust and is not usually affected when these assumptions are violated (especially when sample sizes are equal or nearly equal and when testing two-tailed hypotheses; see Zar page 137). Also, as n increases, the t-test becomes more robust. Last week, we discussed how to test for normality (see the example of normality tests in SAS using the oak data). We can also test whether two variances are equal.

Testing for Difference Between Two Variances

As in one-sample variance tests, we cannot use the t-distribution for making two-sample variance tests. Zar suggests using the Variance Ratio Test for testing the null hypothesis that two variances are equal. This test uses the ratio of the two variances:

$$F = \frac{s_1^2}{s_2^2} \quad \text{or} \quad F = \frac{s_2^2}{s_1^2},$$

whichever is larger (i.e., put the larger variance in the numerator). Ratios of variances are fortunately known to follow Snedecor’s F-distribution. If we do not reject Ho, then we can calculate the pooled variance, which is necessary for the t-test. However, this test is severely affected by non-normal data. When the data are not normally distributed, you should not perform this test (proceed to the non-parametric Mann-Whitney test; see Zar p. 163).

Example: We previously tested the Appalachian oak data to see if they were normally distributed (they were). We should also test whether the variances are equal (i.e., homogeneous). If both of these assumptions hold, then we performed a valid t-test. So, let’s test the hypotheses:

Ho: σ²WO = σ²RO
Ha: σ²WO ≠ σ²RO
α = 0.05

For our samples:

Sample 1: n1 = 30, ν1 = 29, s1² = 100.11954 ft²
Sample 2: n2 = 39, ν2 = 38, s2² = 70.04993 ft²

$$F = \frac{s_1^2}{s_2^2} = \frac{100.11954}{70.04993} = 1.42926$$

$$F_{\alpha,\nu_1,\nu_2} = F_{0.05(2),29,38} = 1.97234$$

(this critical value was calculated in Excel with the FINV function).

Decision Rule: If F ≥ 1.97234, then reject Ho; otherwise, do not reject.

Conclusion: Since 1.42926 < 1.97234 (0.2 < P < 0.5), do not reject Ho and conclude that the two population variances are equal. Thus, all assumptions in our two-sample t-test of the Appalachian oaks were valid. (Note: the exact p-value = 0.29941, which is computed from the FDIST Excel function. When using Excel, remember that all values are computed for one tail of the F-distribution.)
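The same check in Python with scipy (a sketch; wo and ro are the height lists from the t-test example above):

```python
import numpy as np
from scipy import stats

var1, var2 = np.var(wo, ddof=1), np.var(ro, ddof=1)   # ~100.12 and ~70.05 ft^2
F = max(var1, var2) / min(var1, var2)                 # larger variance on top
df_num = (len(wo) if var1 >= var2 else len(ro)) - 1   # df of the numerator
df_den = (len(ro) if var1 >= var2 else len(wo)) - 1

crit = stats.f.isf(0.05 / 2, df_num, df_den)          # two-tailed critical F
p_two_tailed = 2 * stats.f.sf(F, df_num, df_den)      # ~0.299
print(F, crit, p_two_tailed)
```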

When the Assumptions Are Violated

If you believe that your sample severely violates the normality or homogeneous-variance assumptions, you have two options:

1. use a t-test for unequal variances (aka Welch’s approximate t), or
2. use the non-parametric Mann-Whitney test.

T-test for Unequal Variances

Zar presents a t-test (attributed to Smith) called “Welch’s Approximate t” that performs well when the two sample variances are not equal. The following formula is used to calculate the test statistic:

$$t' = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},$$

which is then compared to a t-critical value with the following degrees of freedom:

$$\nu' = \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^2}{\dfrac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \dfrac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}.$$
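scipy implements Welch’s test directly; a sketch using the oak samples from earlier (purely for illustration, since their variances turned out to be homogeneous):

```python
import numpy as np
from scipy import stats

# Welch's approximate t and its p-value (equal_var=False)
t_prime, p_welch = stats.ttest_ind(wo, ro, equal_var=False)

# The nu' formula above, computed by hand as a cross-check
v1, v2 = np.var(wo, ddof=1) / len(wo), np.var(ro, ddof=1) / len(ro)
nu_prime = (v1 + v2)**2 / (v1**2 / (len(wo) - 1) + v2**2 / (len(ro) - 1))
print(t_prime, p_welch, nu_prime)
```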


The Mann-Whitney Rank Test

When the assumptions of the t-test do not hold, or when the sample data are not suited to it (e.g., ordinal data), you can use non-parametric procedures to perform hypothesis tests. Non-parametric tests make no assumptions about the underlying distribution of the sampled population, and so they are often called “distribution-free” tests.

The Mann-Whitney (aka the Wilcoxon-Mann-Whitney) test is analogous to the two-sample t-test. This test was originally developed for equal sample sizes by Wilcoxon (i.e., the Wilcoxon test) and later expanded by Mann & Whitney. In the Mann-Whitney test, the data in both samples are ranked from lowest to highest (or the other way around; it doesn’t really matter). Then, two test statistics are calculated based on the ranking of the data:

1. $U = n_1 n_2 + \dfrac{n_1 (n_1 + 1)}{2} - R_1$

2. $U' = n_2 n_1 + \dfrac{n_2 (n_2 + 1)}{2} - R_2$,

where ni = number of observations in the ith sample, i = 1, 2, and Ri = sum of the ranks of the observations in the ith sample, i = 1, 2.

These test statistics are then used to test one- or two-tailed hypotheses by comparison to a critical value. For a two-tailed test, U or U′ (whichever is larger) is compared to the critical value $U_{\alpha(2),n_1,n_2}$, where n1 < n2 (if n1 > n2, then use the critical value $U_{\alpha(2),n_2,n_1}$). Critical values for the U-distribution can be found in Table B.11. If U or U′ ≥ $U_{\alpha(2),n_1,n_2}$ (or $U_{\alpha(2),n_2,n_1}$, if applicable), then Ho is rejected.

Note that the following condition should be checked to make certain that rank assignment is correct:

$$R_1 + R_2 = \frac{N(N + 1)}{2},$$

where N = n1 + n2. This check can fail when ranks are assigned incorrectly to tied data. When data are tied (i.e., when two or more observations have the same value), you must assign each tied observation the mean of the ranks that would have been assigned had these values not been tied (see Zar p. 167 for an example); with mean ranks, the condition above still holds.

The order of the ranking (i.e., lowest to highest vs. highest to lowest) is not important for two-tailed tests, but it is important for one-tailed tests. For one-tailed hypothesis tests, you need to determine which tail of the Mann-Whitney U-distribution is needed, then use the appropriate test statistic, U or U′ (see Zar page 166, Table 8.2, to determine which statistic is appropriate). Zar provides two good examples (Examples 8.11 and 8.12) on pages 164 and 167.
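A short sketch of the U computation in Python, with mean ranks for ties handled by scipy’s rankdata (the sample values are made up for illustration):

```python
import numpy as np
from scipy.stats import rankdata

x1 = [3.1, 4.7, 2.8, 5.0, 4.1]           # hypothetical sample 1
x2 = [5.3, 6.0, 4.7, 5.8, 6.2, 5.5]      # hypothetical sample 2

n1, n2 = len(x1), len(x2)
ranks = rankdata(np.concatenate([x1, x2]))   # tied values get mean ranks
R1, R2 = ranks[:n1].sum(), ranks[n1:].sum()

N = n1 + n2
assert R1 + R2 == N * (N + 1) / 2            # rank-assignment check

U = n1 * n2 + n1 * (n1 + 1) / 2 - R1
U_prime = n1 * n2 + n2 * (n2 + 1) / 2 - R2
print(U, U_prime)                            # note U + U' = n1 * n2
```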
