Nested ANOVA
Tablets Example 17.10
Page 1011
The data are the measured contents of tablets from different sites and from different batches within each site: 2 sites, 3 batches per site, and 5 tablets per batch (30 observations).

Obs  site  batch  content
  1     1      1     5.03
  2     1      1     5.10
  3     1      1     5.25
  4     1      1     4.98
  5     1      1     5.05
  6     1      2     4.64
  7     1      2     4.73
  8     1      2     4.82
  9     1      2     4.95
 10     1      2     5.06
 11     1      3     5.10
 12     1      3     5.15
 13     1      3     5.20
 14     1      3     5.08
 15     1      3     5.14
 16     2      1     5.05
 17     2      1     4.96
 18     2      1     5.12
 19     2      1     5.12
 20     2      1     5.05
 21     2      2     5.46
 22     2      2     5.15
 23     2      2     5.18
 24     2      2     5.18
 25     2      2     5.11
 26     2      3     4.90
 27     2      3     4.95
 28     2      3     4.86
 29     2      3     4.86
 30     2      3     5.07
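The handout does not show how the tablets data set was created. A minimal sketch, assuming the values are typed in as (site, batch, content) triples in the order listed above, might look like this:

```sas
/* Sketch only: the original data step is not shown in the handout. */
data tablets;
  /* The trailing @@ holds the input line so several triples can share one row */
  input site batch content @@;
  datalines;
1 1 5.03  1 1 5.10  1 1 5.25  1 1 4.98  1 1 5.05
1 2 4.64  1 2 4.73  1 2 4.82  1 2 4.95  1 2 5.06
1 3 5.10  1 3 5.15  1 3 5.20  1 3 5.08  1 3 5.14
2 1 5.05  2 1 4.96  2 1 5.12  2 1 5.12  2 1 5.05
2 2 5.46  2 2 5.15  2 2 5.18  2 2 5.18  2 2 5.11
2 3 4.90  2 3 4.95  2 3 4.86  2 3 4.86  2 3 5.07
;
run;
```

Batches are numbered 1 to 3 within each site, so batch 1 at site 1 and batch 1 at site 2 are different batches. The nested term batch(site) in the models that follow is what tells SAS to treat them that way.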
proc glm data=tablets;
  class site batch;
  model content = site batch(site);  * Include the random batch(site) effect;
  random batch(site) / q test;
  * q means get expected mean squares including Q();
  * test means use the right denominator MS for tests;
  * IGNORE! the p-values in the ANOVA table;
run;
The GLM Procedure
Dependent Variable: content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5       0.47226667    0.09445333      7.81   0.0002
Error              24       0.29020000    0.01209167
Corrected Total    29       0.76246667

R-Square   Coeff Var   Root MSE   content Mean
0.619393    2.180346   0.109962       5.043333
Source         DF    Type I SS    Mean Square   F Value   Pr > F
site            1   0.01825333    0.01825333      1.51   0.2311
batch(site)     4   0.45401333    0.11350333      9.39   0.0001
Source         Type III Expected Mean Square
site           Var(Error) + 5 Var(batch(site)) + Q(site)
batch(site)    Var(Error) + 5 Var(batch(site))
Dependent Variable: content

Source         DF   Type III SS   Mean Square   F Value   Pr > F
site            1      0.018253      0.018253      0.16   0.7089
Error           4      0.454013      0.113503
Error: MS(batch(site))

Source         DF   Type III SS   Mean Square   F Value   Pr > F
batch(site)     4      0.454013      0.113503      9.39   0.0001
Error          24      0.290200      0.012092
Error: MS(Error)
Based on the expected mean squares, we can find estimates of the variance components.

$$\hat{\sigma}^2_{\varepsilon} = MS_{Error} = 0.01209167$$

$$\hat{\sigma}^2_{batch(site)} = \frac{MS_{Batch(site)} - MS_{Error}}{5} = \frac{0.11350333 - 0.01209167}{5} = 0.02028$$
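The same arithmetic can be checked in a short data step. This is only a sketch: the mean squares are copied by hand from the GLM output, and the variable names are made up for illustration.

```sas
/* Method-of-moments estimates from the expected mean squares.        */
/* Variable names are illustrative; values are copied from the output. */
data _null_;
  n         = 5;             /* tablets per batch                  */
  ms_batch  = 0.11350333;    /* Mean Square for batch(site)        */
  ms_error  = 0.01209167;    /* Mean Square for Error              */
  var_error = ms_error;                    /* estimates Var(Error)       */
  var_batch = (ms_batch - ms_error) / n;   /* estimates Var(batch(site)) */
  put var_error= 10.5 var_batch= 10.5;     /* written to the SAS log     */
run;
```

The divisor n = 5 is the coefficient on Var(batch(site)) in the expected mean square, so this shortcut applies only when every batch has the same number of tablets.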
The MIXED procedure has many advantages over the GLM procedure, particularly for unbalanced data and for longitudinal, repeated-measures data.

proc mixed data=tablets;
  class site batch;
  model content = site;  * Do not include the random batch(site) effect in the MODEL statement;
  random batch(site);
run;

The Mixed Procedure

Cov Parm       Estimate
batch(site)     0.02028
Residual        0.01209
Fit Statistics
-2 Res Log Likelihood        -29.8
AIC (smaller is better)      -25.8
AICC (smaller is better)     -25.3
BIC (smaller is better)      -26.2
Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
site          1        4      0.16   0.7089
Interpreting the batch(site) effect and the test of a site effect

proc means data=tablets noprint;
  var content;
  by site batch;
  output out=somemeans mean=mean_content var=var_content;
run;

proc print data=somemeans;
run;

Obs  site  batch  _TYPE_  _FREQ_  mean_content  var_content
  1     1      1       0       5         5.082      0.01067
  2     1      2       0       5         4.840      0.02825
  3     1      3       0       5         5.134      0.00218
  4     2      1       0       5         5.060      0.00435
  5     2      2       0       5         5.216      0.01943
  6     2      3       0       5         4.928      0.00767
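The BY statement requires the input data to be sorted by site and batch, which the tablets data already are. If they were not, one possible alternative (a sketch, not the handout's method) is to use a CLASS statement with the NWAY option. The output data set name somemeans2 is made up here.

```sas
/* Alternative sketch: CLASS does not require sorted input.          */
/* NWAY keeps only the site-by-batch cells in the output data set.   */
proc means data=tablets noprint nway;
  class site batch;
  var content;
  output out=somemeans2 mean=mean_content var=var_content;
run;

proc print data=somemeans2;
run;
```

The means and variances should match the listing above. With two CLASS variables, _TYPE_ is 3 for the site-by-batch cells rather than the 0 produced by BY processing.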
proc glm data=somemeans;
  class site;
  model mean_content = site;
run;

Dependent Variable: mean_content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1       0.00365067    0.00365067      0.16   0.7089
Error               4       0.09080267    0.02270067
Corrected Total     5       0.09445333

R-Square   Coeff Var   Root MSE   mean_content Mean
0.038650    2.987457   0.150667            5.043333
Source   DF    Type I SS    Mean Square   F Value   Pr > F
site      1   0.00365067    0.00365067      0.16   0.7089

• With equal sample sizes (perfectly balanced data):
  o Using individual values in the nested ANOVA, SSBatch(Site) is 5 times the Error SS from the 1-way ANOVA on batch means. The 5 comes from having 5 values in each batch mean. The batch within site effect looks at variability between batches within the same site, that is, deviations of batch means from their site means.

    $$SS_{B(A)} = 5\left[(5.082-5.01867)^2 + (4.840-5.01867)^2 + (5.134-5.01867)^2 + (5.060-5.068)^2 + (5.216-5.068)^2 + (4.928-5.068)^2\right] = 5(0.09080267) = 0.454013$$

    Using only the mean value for each batch, the within (error) SS is

    $$SS_{Error,\,\bar{Y}\text{'s}} = (5.082-5.01867)^2 + (4.840-5.01867)^2 + (5.134-5.01867)^2 + (5.060-5.068)^2 + (5.216-5.068)^2 + (4.928-5.068)^2 = 0.09080267$$

    SSB(A) has an extra multiplier of 5 compared to the SSError from a 1-way ANOVA with batch means.
  o Using individual values in the nested ANOVA, MSBatch(Site) is 5 times the pooled variance (MSError) using batch means. MSB(A) = 5*0.02270067 = 0.113503
  o The Site SS in the nested analysis is 5 times the Site SS in the ANOVA with batch means. With 1 df for site, the same holds for the Site MS. Equations for SSA on page 1012 have multipliers of 15 because there are 15 values in each A (site) mean.

    $$SSA = 15\left[(5.01867-5.04333)^2 + (5.068-5.04333)^2\right] = 5\left[3(5.01867-5.04333)^2 + 3(5.068-5.04333)^2\right] = 5 \cdot SSA_{\bar{Y}\text{'s}}$$

    $$MSA = \frac{SSA}{df_A} = 5\,\frac{3(5.01867-5.04333)^2 + 3(5.068-5.04333)^2}{1} = 5 \cdot MSA_{\bar{Y}\text{'s}} = 5(0.00365067) = 0.018253$$
  o The test for a site effect in the nested analysis is the same as a 1-way ANOVA F-test or t-test using the batch means.

With balanced data and correlated measurements (e.g., tablets in the same batch), one way to obtain independent measurements is to average the correlated values and use these averages in the usual analysis.

Doing the nested analysis gives us information on variance components that could be used for planning future experiments.
• For example, optimal allocation to decide how many tablets to test from each batch.

If the n's are not equal, analyzing the batch means is not the right way to do the test. We want to give greater weight to conditions where we have more data.
• But how much weight we give to each mean depends on both variance components, so it is not simple.
  o If there is no within-batch (tablet-to-tablet) variance, only batch effects, then all means should count equally regardless of n.
  o If there is no batch effect, then each mean should be weighted proportional to n.
  o In general, values should be weighted inversely proportional to their variances.
  o Nested ANOVA with proc MIXED or proc GLM does the right analysis.
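The weighting argument can be written out with the variance of a batch mean. This is a sketch under the usual nested random-effects model, with n_i denoting the number of tablets in batch i (notation introduced here, not in the handout):

```latex
\operatorname{Var}(\bar{y}_i) = \sigma^2_{batch(site)} + \frac{\sigma^2_{\varepsilon}}{n_i},
\qquad
w_i \propto \frac{1}{\sigma^2_{batch(site)} + \sigma^2_{\varepsilon}/n_i}

% No within-batch variance: every batch mean gets the same weight
\sigma^2_{\varepsilon} = 0 \;\Rightarrow\; w_i \propto \frac{1}{\sigma^2_{batch(site)}} \quad (\text{constant in } n_i)

% No batch effect: weights are proportional to n_i
\sigma^2_{batch(site)} = 0 \;\Rightarrow\; w_i \propto \frac{n_i}{\sigma^2_{\varepsilon}} \propto n_i

% Balanced tablets data, n_i = 5 for every batch
\hat{\sigma}^2_{batch(site)} + \frac{\hat{\sigma}^2_{\varepsilon}}{5} = 0.02028 + \frac{0.01209}{5} = 0.02270
```

The last line reproduces the Error mean square (0.02270067) from the 1-way ANOVA on batch means. That agreement is why the batch-means analysis and the nested analysis give the same test when the data are balanced.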
proc ttest data=somemeans;
  class site;
  var mean_content;
run;

The TTEST Procedure

                                Lower CL            Upper CL   Lower CL              Upper CL
Variable       site         N       Mean     Mean       Mean    Std Dev   Std Dev    Std Dev   Std Err
mean_content   1            3     4.6289   5.0187     5.4084     0.0817    0.1569     0.9861    0.0906
mean_content   2            3     4.7099   5.068      5.4261     0.0751    0.1442     0.906     0.0832
mean_content   Diff (1-2)         -0.391   -0.049     0.2922     0.0903    0.1507     0.433     0.123

T-Tests

Variable       Method          Variances     DF   t Value   Pr > |t|
mean_content   Pooled          Equal          4     -0.40     0.7089
mean_content   Satterthwaite   Unequal     3.97     -0.40     0.7090
proc glm data=somemeans;
  class site;
  model mean_content = site;
run;

Dependent Variable: mean_content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1       0.00365067    0.00365067      0.16   0.7089
Error               4       0.09080267    0.02270067
Corrected Total     5       0.09445333

• As anticipated, a 2-sided t-test with equal variances is equivalent to the ANOVA F-test.
  o Same p-values (0.7089).
  o Both tests make exactly the same assumptions, so they had better be equivalent.
  o As usual F = t^2: 0.16 = (-0.40)^2
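The equivalence can be checked numerically from the printed output. The data step below is a sketch with illustrative variable names. It rebuilds the pooled t statistic from the site means and the pooled standard deviation, then compares t squared with the F ratios from the batch-means ANOVA and from the nested ANOVA.

```sas
/* Values are copied by hand from the output; variable names are illustrative. */
data _null_;
  n1 = 3; n2 = 3;                     /* batch means per site                */
  diff = 5.018667 - 5.068;            /* site 1 mean minus site 2 mean       */
  sp   = 0.150667;                    /* pooled SD = Root MSE on batch means */
  t    = diff / (sp * sqrt(1/n1 + 1/n2));
  t_sq = t**2;
  f_means  = 0.00365067 / 0.02270067; /* MS site / MS Error, batch means     */
  f_nested = 0.01825333 / 0.11350333; /* MS site / MS batch(site), nested    */
  p_t = 2 * (1 - probt(abs(t), n1 + n2 - 2));   /* two-sided t-test p-value  */
  p_f = 1 - probf(f_means, 1, n1 + n2 - 2);     /* F-test p-value            */
  put t= 8.4 t_sq= 8.4 f_means= 8.4 f_nested= 8.4 p_t= 8.4 p_f= 8.4;
run;
```

Both F ratios equal t squared up to rounding, because the factor of 5 in the nested numerator and denominator cancels. The two p-values should agree with the 0.7089 reported by PROC TTEST and PROC GLM.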
Suppose we have a=3 treatments, b=4 tanks per treatment, and n=6 fish per tank.

Source              df
Treatments           2 = 3-1
Tanks(Treatment)     9 = 3*(4-1)
Error               60 = 3*4*(6-1) = 12*(6-1)
Total               71 = 3*4*6 - 1
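The degrees of freedom follow the same pattern for any balanced nested design. A small sketch, with variable names chosen here for illustration, computes them from a, b, and n and confirms that the pieces add up to the total:

```sas
/* Degrees of freedom for a balanced nested design (illustrative names). */
data _null_;
  a = 3;  b = 4;  n = 6;         /* treatments, tanks per treatment, fish per tank */
  df_treat = a - 1;              /* Treatments                                     */
  df_tank  = a * (b - 1);        /* Tanks(Treatment)                               */
  df_error = a * b * (n - 1);    /* Error: fish within tanks                       */
  df_total = a * b * n - 1;      /* Total                                          */
  df_sum   = df_treat + df_tank + df_error;    /* should equal df_total            */
  put df_treat= df_tank= df_error= df_total= df_sum=;
run;
```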
$$SS_{Treat} = 24\,(\bar{y}_{Treat1} - \bar{y}_{All})^2 + 24\,(\bar{y}_{Treat2} - \bar{y}_{All})^2 + 24\,(\bar{y}_{Treat3} - \bar{y}_{All})^2$$
The Tank within Treatments, Tanks(Treatment), variance looks at how much tank means within the same treatment deviate from the mean for that treatment. Within each treatment we use the mean for each tank minus the mean for that treatment.

$$SS_{Tank(Treat)} = 6\,(\bar{y}_{Treat1,Tank1} - \bar{y}_{Treat1})^2 + \dots + 6\,(\bar{y}_{Treat1,Tank4} - \bar{y}_{Treat1})^2$$
$$\qquad +\; 6\,(\bar{y}_{Treat2,Tank1} - \bar{y}_{Treat2})^2 + \dots + 6\,(\bar{y}_{Treat2,Tank4} - \bar{y}_{Treat2})^2$$
$$\qquad +\; 6\,(\bar{y}_{Treat3,Tank1} - \bar{y}_{Treat3})^2 + \dots + 6\,(\bar{y}_{Treat3,Tank4} - \bar{y}_{Treat3})^2$$
Finally, MSError is the pooled variance of the within-tank variances of all 12 tanks, with df = 12*(6-1) = 60.
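For the fish example, the model would be fit in the same way as the tablets model. The sketch below assumes a data set named fish with variables treatment, tank, and response. All of these names are hypothetical, since the handout gives no data for this example.

```sas
/* Hypothetical data set FISH with variables TREATMENT, TANK, RESPONSE. */
proc mixed data=fish;
  class treatment tank;
  model response = treatment;    /* fixed effect only                   */
  random tank(treatment);        /* random tanks nested in treatments   */
run;
```

As in the tablets analysis, the treatment effect is then judged against tank-to-tank variability rather than fish-to-fish variability, so the F test for treatments should have 2 and 9 degrees of freedom, matching the table above.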