ANOVA: One-Way Analysis of Variance

This is the statistical analysis we use when we want to compare more than two means at once (with only 2 means, we use a form of the t test). The simplest design has 1 independent variable with 3 or more levels. For example, we might study the effects of caffeine on test performance where:

Level 1: One group gets no caffeine (the Control group).
Level 2: A second group gets a low dose (the Mild Buzz group).
Level 3: A third group gets a heavy dose (the Jolt group).

We would then use an ANOVA to analyze the results and decide whether there are statistically significant differences among the groups.

Assumptions:
1. Must have independent random samples.
2. Each population needs to be normal.
   - Do a histogram for each group.
   - If one group is skewed, you may want to pursue the ANOVA's non-parametric counterpart: Kruskal-Wallis.
3. The populations need to have a common variance σ².
   - Boxplots should show about the same spreads.
Hypotheses:
H0: All population means are equal (μ1 = μ2 = μ3).
Ha: At least one population mean differs from the others.
Example: Effects of Caffeine on Test Performance

                 Group 1: Control   Group 2: Mild   Group 3: Jolt
Test Scores            75                 80              70
                       77                 82              72
                       79                 84              74
                       81                 86              76
                       83                 88              78
Group Mean             79                 84              74
Group SD             3.16               3.16            3.16

Grand Mean = 79
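The group statistics above can be reproduced with a short sketch using only Python's standard-library statistics module (the variable names are just labels for the handout's data):

```python
from statistics import mean, stdev

# Test scores from the caffeine example
control = [75, 77, 79, 81, 83]  # no caffeine
mild    = [80, 82, 84, 86, 88]  # low dose
jolt    = [70, 72, 74, 76, 78]  # heavy dose

for name, scores in [("Control", control), ("Mild", mild), ("Jolt", jolt)]:
    print(f"{name}: mean = {mean(scores)}, sd = {stdev(scores):.2f}")

grand_mean = mean(control + mild + jolt)
print(f"Grand mean = {grand_mean}")
```

Note that every group has the same sample SD (3.16), which is consistent with the common-variance assumption above.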
Computations

                     (Raw - Grand Mean)^2    (Raw - Group Mean)^2    (Group Mean - Grand Mean)^2
                     Raw   Grand   SStot     Raw   Group   SSw       Group   Grand   SSb
Group 1 (Control)     75     79      16       75     79     16         79      79      0
m = 79, sd = 3.16     77     79       4       77     79      4         79      79      0
                      79     79       0       79     79      0         79      79      0
                      81     79       4       81     79      4         79      79      0
                      83     79      16       83     79     16         79      79      0
Group 2 (Mild)        80     79       1       80     84     16         84      79     25
m = 84, sd = 3.16     82     79       9       82     84      4         84      79     25
                      84     79      25       84     84      0         84      79     25
                      86     79      49       86     84      4         84      79     25
                      88     79      81       88     84     16         84      79     25
Group 3 (Jolt)        70     79      81       70     74     16         74      79     25
m = 74, sd = 3.16     72     79      49       72     74      4         74      79     25
                      74     79      25       74     74      0         74      79     25
                      76     79       9       76     74      4         74      79     25
                      78     79       1       78     74     16         74      79     25
Sum                 1185            370                    120                        250
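The three sums of squares can be checked directly in Python; a minimal sketch using the same data:

```python
from statistics import mean

control = [75, 77, 79, 81, 83]
mild    = [80, 82, 84, 86, 88]
jolt    = [70, 72, 74, 76, 78]
groups  = [control, mild, jolt]

scores = [x for g in groups for x in g]
grand  = mean(scores)  # 79

ss_tot = sum((x - grand) ** 2 for x in scores)                 # total variation
ss_w   = sum((x - mean(g)) ** 2 for g in groups for x in g)    # within groups
ss_b   = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # between groups

print(sum(scores), ss_tot, ss_w, ss_b)  # 1185 370 120 250
assert ss_tot == ss_w + ss_b            # SStot partitions into SSw + SSb
```

The assertion at the end confirms the key identity behind the table: the total sum of squares splits exactly into the within-groups and between-groups pieces.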
ANOVA Summary Table

Source            SS     df                    MS (= SS/df)          F (= MSb/MSw)
Between Groups    250    k - 1 = 2             250/2 = 125  (MSb)    125/10 = 12.50
Within Groups     120    N - k = 15 - 3 = 12   120/12 = 10  (MSw)
Total             370    N - 1 = 14
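The summary-table arithmetic, sketched in Python:

```python
k, N = 3, 15           # number of groups, total sample size
ss_b, ss_w = 250, 120  # sums of squares from the computations above

df_b = k - 1           # between-groups df: 2
df_w = N - k           # within-groups df: 12
ms_b = ss_b / df_b     # 125.0
ms_w = ss_w / df_w     # 10.0
F = ms_b / ms_w        # 12.5

print(f"F({df_b}, {df_w}) = {F}")
```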
Note: Every time we estimate something, we lose a degree of freedom (df). The df are also the numbers we divide by to estimate a population variance. Mean squares (MS) are average (mean) sums of squared deviations; that is, they are variance estimates. The variance is the mean squared deviation from the mean; the standard deviation is the root-mean-square deviation from the mean. F is a ratio of two mean squares. MSw is the variance within groups: it is the yardstick we use to judge how large the between-groups variance is. If there is a treatment effect, MSb will be larger than MSw, and the F ratio will be larger than 1.0.
F has a sampling distribution that is used to compute significance tests. The sampling distribution of F is not normal. However, the form of the distribution is known and can be looked up in the F Distribution table in our textbook. Unlike the other distributions we have studied, F demands that we supply 2 quantities, in addition to alpha, before its form is fully specified. The quantities we supply are the df for the numerator and the df for the denominator (i.e., dfb and dfw). Fortunately, there is no decision about one- and two-tailed tests; F is unidirectional.

                              Numerator df: dfb
Denominator df: dfw        1       2       3       4       5
  1    5%                161     200     216     225     230
       1%               4052    5000    5403    5625    5764
  2    5%               18.5    19      19.2    19.2    19.3
       1%               98.5    99      99.2    99.2    99.3
  5    5%               6.61    5.79    5.41    5.19    5.05
       1%               16.3    13.3    12.1    11.4    11.0
 10    5%               4.96    4.10    3.71    3.48    3.33
       1%               10.0    7.56    6.55    5.99    5.64
 12    5%               4.75    3.88    3.49    3.26    3.11
       1%               9.33    6.93    5.95    5.41    5.06
 14    5%               4.60    3.74    3.34    3.11    2.96
       1%               8.86    6.51    5.56    5.04    4.70
To find the critical value of the F distribution in this instance, with df = 2 for the numerator and df = 12 for the denominator, we read the intersection of these two values at the .05 level: 3.88. Our calculated F (12.50) is greater than the critical F (3.88) and therefore falls in the critical region, so we reject H0.
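We can check this table lookup without a table. When the numerator df is exactly 2, the F distribution's CDF has a closed form, P(F ≤ x) = 1 − (1 + 2x/dfw)^(−dfw/2), which this sketch inverts (this shortcut only works for dfb = 2; any other numerator df needs the full F table or software):

```python
df_w  = 12
alpha = 0.05

# Invert the dfb = 2 closed form to get the critical value at alpha = .05
f_crit = (df_w / 2) * (alpha ** (-2 / df_w) - 1)
print(f"Critical F(2, 12) at .05 = {f_crit:.3f}")

# p-value of the observed F = 12.50 under the same closed form
F_obs = 12.50
p = (1 + 2 * F_obs / df_w) ** (-df_w / 2)
print(f"p = {p:.4f}, reject H0: {p < alpha}")
```

The computed critical value, about 3.885, matches the tabled 3.88 up to rounding, and the p-value (about .0012) is far below .05, agreeing with the decision to reject H0.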
Also, the effect size, eta squared = SSb/SStot = 250/370 = .676, tells us the proportion of the total variation in test scores that is accounted for by group membership. In general, effect sizes derived from ANOVA methods with values of .02, .15, and .35 are considered to represent small, medium, and large effects, so we have a very large effect here. We conclude that these data provide evidence of statistically significant differences among the three caffeine conditions, but we do not yet know where those differences lie. We will perform post-hoc comparisons to determine which groups differ.
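Eta squared is just the between-groups share of the total sum of squares; a quick check:

```python
ss_b, ss_tot = 250, 370
eta_sq = ss_b / ss_tot        # proportion of total variation between groups
print(f"eta squared = {eta_sq:.3f}")  # 0.676
```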
Post Hoc Comparisons:

There are many comparison tests from which to choose. Assuming equality of variance, here are 4 tests that are often used in educational research.

1. Scheffe Method:
   - Most conservative and least likely to pick up a difference. However, if this test does find a difference, we can be quite sure it is real because of the test's conservative nature.
   - It takes into account the number of groups we are looking at.
2. Bonferroni Method:
   - Not as conservative as Scheffe.
3. Least Significant Difference (LSD) Test:
   - A more liberal test than the previous two in terms of finding mean differences between groups.
4. Tukey's HSD (Honestly Significant Difference)
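As one illustration, Tukey's HSD can be computed by hand from the ANOVA's MSw. The critical value of the studentized range statistic, q(.05, k = 3, dfw = 12) ≈ 3.77, is an assumed value taken from a standard q table (our F table above does not include it):

```python
from math import sqrt
from statistics import mean

control = [75, 77, 79, 81, 83]
mild    = [80, 82, 84, 86, 88]
jolt    = [70, 72, 74, 76, 78]

ms_w, n = 10, 5  # within-groups mean square and per-group n from the ANOVA
q_crit = 3.77    # studentized range q(.05, 3, 12), assumed from a q table
hsd = q_crit * sqrt(ms_w / n)  # smallest mean difference Tukey calls significant

for (name_a, a), (name_b, b) in [(("Control", control), ("Mild", mild)),
                                 (("Control", control), ("Jolt", jolt)),
                                 (("Mild", mild), ("Jolt", jolt))]:
    diff = abs(mean(a) - mean(b))
    print(f"{name_a} vs {name_b}: |diff| = {diff}, significant = {diff > hsd}")
```

With HSD ≈ 5.33, only the Mild vs Jolt difference (10 points) clears the bar; the two 5-point differences involving Control do not, which illustrates how post hoc criteria can be stricter than the overall F test.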