Author: Jason Benson
Nested ANOVA

Tablets Example 17.10

Page 1011

We have the contents of tablets from different sites and from different batches within each site.

Obs   site   batch   content
  1      1       1      5.03
  2      1       1      5.10
  3      1       1      5.25
  4      1       1      4.98
  5      1       1      5.05
  6      1       2      4.64
  7      1       2      4.73
  8      1       2      4.82
  9      1       2      4.95
 10      1       2      5.06
 11      1       3      5.10
 12      1       3      5.15
 13      1       3      5.20
 14      1       3      5.08
 15      1       3      5.14
 16      2       1      5.05
 17      2       1      4.96
 18      2       1      5.12
 19      2       1      5.12
 20      2       1      5.05
 21      2       2      5.46
 22      2       2      5.15
 23      2       2      5.18
 24      2       2      5.18
 25      2       2      5.11
 26      2       3      4.90
 27      2       3      4.95
 28      2       3      4.86
 29      2       3      4.86
 30      2       3      5.07

proc glm data=tablets;
  class site batch;
  model content = site batch(site);  * Include the random batch(site) effect;
  random batch(site) / q test;
  * q means get expected mean squares including Q();
  * test means use the right denominator MS for tests;
  * IGNORE! the p-values in the ANOVA table;
run;

The GLM Procedure
Dependent Variable: content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5       0.47226667    0.09445333      7.81   0.0002
Error              24       0.29020000    0.01209167
Corrected Total    29       0.76246667

R-Square   Coeff Var   Root MSE   content Mean
0.619393    2.180346   0.109962       5.043333

Source         DF    Type I SS   Mean Square   F Value   Pr > F
site            1   0.01825333    0.01825333      1.51   0.2311
batch(site)     4   0.45401333    0.11350333      9.39   0.0001

Source         Type III Expected Mean Square
site           Var(Error) + 5 Var(batch(site)) + Q(site)
batch(site)    Var(Error) + 5 Var(batch(site))

Dependent Variable: content

Source         DF   Type III SS   Mean Square   F Value   Pr > F
site            1      0.018253      0.018253      0.16   0.7089
Error           4      0.454013      0.113503
Error: MS(batch(site))

Source         DF   Type III SS   Mean Square   F Value   Pr > F
batch(site)     4      0.454013      0.113503      9.39   0.0001
Error          24      0.290200      0.012092
Error: MS(Error)
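As a cross-check outside SAS, the nested sums of squares and the two F ratios can be recomputed by hand. The following is a minimal Python sketch using only the standard library; the dictionary `tablets` and the function `nested_anova` are illustrative names, not part of the original handout, and the code assumes balanced data as in this example. It does not compute p-values.

```python
from statistics import mean

# Tablet contents keyed by (site, batch), copied from the data listing.
tablets = {
    (1, 1): [5.03, 5.10, 5.25, 4.98, 5.05],
    (1, 2): [4.64, 4.73, 4.82, 4.95, 5.06],
    (1, 3): [5.10, 5.15, 5.20, 5.08, 5.14],
    (2, 1): [5.05, 4.96, 5.12, 5.12, 5.05],
    (2, 2): [5.46, 5.15, 5.18, 5.18, 5.11],
    (2, 3): [4.90, 4.95, 4.86, 4.86, 5.07],
}


def nested_anova(data):
    """Sums of squares and F ratios for batches nested within sites."""
    sites = sorted({site for site, _ in data})
    all_values = [y for ys in data.values() for y in ys]
    by_site = {
        s: [y for (site, _), ys in data.items() if site == s for y in ys]
        for s in sites
    }
    grand_mean = mean(all_values)
    site_mean = {s: mean(values) for s, values in by_site.items()}

    # Site: deviations of site means from the grand mean.
    ss_site = sum(len(v) * (site_mean[s] - grand_mean) ** 2
                  for s, v in by_site.items())
    # Batch(site): deviations of batch means from their own site mean.
    ss_batch = sum(len(ys) * (mean(ys) - site_mean[site]) ** 2
                   for (site, _), ys in data.items())
    # Error: deviations of tablets from their own batch mean.
    ss_error = sum((y - mean(ys)) ** 2 for ys in data.values() for y in ys)

    ms_site = ss_site / (len(sites) - 1)
    ms_batch = ss_batch / (len(data) - len(sites))
    ms_error = ss_error / (len(all_values) - len(data))
    return {
        "ss_site": ss_site,
        "ss_batch": ss_batch,
        "ss_error": ss_error,
        "f_site": ms_site / ms_batch,    # site tested against MS(batch(site))
        "f_batch": ms_batch / ms_error,  # batch(site) tested against MS(Error)
    }


result = nested_anova(tablets)
for name, value in result.items():
    print(f"{name:>8}: {value:.6f}")
```

The key point is the denominator: the site mean square is divided by MS(batch(site)), not by MS(Error), which is what the `test` option on the `random` statement arranges in SAS.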

Based on the expected mean squares, we can find estimates of the variance components.

$$\hat{\sigma}^2_{\varepsilon} = MS_{Error} = 0.01209167$$

$$\hat{\sigma}^2_{batch(site)} = \frac{MS_{Batch(site)} - MS_{Error}}{5} = \frac{0.11350333 - 0.01209167}{5} = 0.02028$$
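The arithmetic above can be wrapped in a small helper. This is a sketch only: `variance_components` is a hypothetical name, the inputs are the mean squares printed by PROC GLM, and the formula assumes balanced data with the same number of tablets in every batch.

```python
def variance_components(ms_batch, ms_error, n_per_batch):
    """Method-of-moments estimates from the expected mean squares.

    E[MS(Error)]       = Var(Error)
    E[MS(batch(site))] = Var(Error) + n * Var(batch(site))
    """
    var_error = ms_error
    var_batch = (ms_batch - ms_error) / n_per_batch
    return var_batch, var_error


# Mean squares taken from the PROC GLM output, 5 tablets per batch.
var_batch, var_error = variance_components(0.11350333, 0.01209167, 5)
print(f"Var(batch(site)) = {var_batch:.5f}")
print(f"Var(Error)       = {var_error:.5f}")
```

Note that this estimator can come out negative when MS(batch(site)) is smaller than MS(Error); the helper returns the raw value without truncating it.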

The MIXED procedure has many advantages over the GLM procedure, particularly for unbalanced data and longitudinal, repeated measures data.

proc mixed data=tablets;
  class site batch;
  model content = site;  * Don't include the random batch(site) effect;
  random batch(site);
run;

The Mixed Procedure

Cov Parm       Estimate
batch(site)     0.02028
Residual        0.01209

Fit Statistics
-2 Res Log Likelihood        -29.8
AIC (smaller is better)      -25.8
AICC (smaller is better)     -25.3
BIC (smaller is better)      -26.2

Type 3 Tests of Fixed Effects

Effect   Num DF   Den DF   F Value   Pr > F
site          1        4      0.16   0.7089

Interpreting the batch(site) effect and the test of a site effect

proc means data=tablets noprint;
  var content;
  by site batch;
  output out=somemeans mean=mean_content var=var_content;
proc print data=somemeans;
run;

Obs   site   batch   _TYPE_   _FREQ_   mean_content   var_content
  1      1       1        0        5          5.082       0.01067
  2      1       2        0        5          4.840       0.02825
  3      1       3        0        5          5.134       0.00218
  4      2       1        0        5          5.060       0.00435
  5      2       2        0        5          5.216       0.01943
  6      2       3        0        5          4.928       0.00767
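The batch summary above can be reproduced without SAS. The sketch below is an illustration in Python with the standard library; `tablets` and `summary` are names introduced here, and `statistics.variance` is the sample variance (divisor n - 1), which is what the `var=` keyword in PROC MEANS reports by default.

```python
from statistics import mean, variance

# Tablet contents keyed by (site, batch), copied from the data listing.
tablets = {
    (1, 1): [5.03, 5.10, 5.25, 4.98, 5.05],
    (1, 2): [4.64, 4.73, 4.82, 4.95, 5.06],
    (1, 3): [5.10, 5.15, 5.20, 5.08, 5.14],
    (2, 1): [5.05, 4.96, 5.12, 5.12, 5.05],
    (2, 2): [5.46, 5.15, 5.18, 5.18, 5.11],
    (2, 3): [4.90, 4.95, 4.86, 4.86, 5.07],
}

# One row per batch: number of tablets, mean content, sample variance.
summary = {key: (len(ys), mean(ys), variance(ys)) for key, ys in tablets.items()}

for (site, batch), (n, m, v) in summary.items():
    print(f"site={site} batch={batch} n={n} mean={m:.3f} var={v:.5f}")
```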

proc glm data=somemeans;
  class site;
  model mean_content = site;
run;

Dependent Variable: mean_content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1       0.00365067    0.00365067      0.16   0.7089
Error               4       0.09080267    0.02270067
Corrected Total     5       0.09445333

R-Square   Coeff Var   Root MSE   mean_content Mean
0.038650    2.987457   0.150667            5.043333

Source   DF    Type I SS   Mean Square   F Value   Pr > F
site      1   0.00365067    0.00365067      0.16   0.7089
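The batch-means ANOVA above can be tied back to the nested ANOVA on individual tablets: with 5 tablets in every batch, each sum of squares from the batch means is multiplied by 5 to give the corresponding nested sum of squares. The Python sketch below checks this numerically from the six batch means; the variable names are illustrative and the code assumes balanced data.

```python
from statistics import mean

n_per_batch = 5
# Batch means by site, taken from the PROC MEANS listing.
batch_means = {1: [5.082, 4.840, 5.134], 2: [5.060, 5.216, 4.928]}

site_mean = {s: mean(m) for s, m in batch_means.items()}
grand_mean = mean(x for m in batch_means.values() for x in m)

# 1-way ANOVA on the batch means.
ss_site_means = sum(len(m) * (site_mean[s] - grand_mean) ** 2
                    for s, m in batch_means.items())
ss_error_means = sum((x - site_mean[s]) ** 2
                     for s, m in batch_means.items() for x in m)
df_error_means = sum(len(m) - 1 for m in batch_means.values())
ms_error_means = ss_error_means / df_error_means

# Nested ANOVA quantities obtained by multiplying by tablets per batch.
ss_site_nested = n_per_batch * ss_site_means
ss_batch_nested = n_per_batch * ss_error_means
ms_batch_nested = n_per_batch * ms_error_means

# The factor of 5 cancels in the F ratio, so the site test is unchanged.
f_site = ss_site_means / ms_error_means

print(f"SS site (means)     = {ss_site_means:.8f}")
print(f"SS site (nested)    = {ss_site_nested:.8f}")
print(f"SS batch(site)      = {ss_batch_nested:.8f}")
print(f"MS batch(site)      = {ms_batch_nested:.8f}")
print(f"F for site          = {f_site:.2f}")
```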

With equal sample sizes, perfectly balanced data

o Using individual values in the nested ANOVA, SSBatch(Site) is 5 times the error SS from a 1-way ANOVA using batch means.
   - The 5 comes from having 5 values in each batch mean.
   - The batch within site effect looks at variability between batches within the same site category: deviations of batch means from the site means.
   - Using all values, each batch mean has 5 values, so

$$
\begin{aligned}
SS_{B(A)} = {} & 5(5.082 - 5.01867)^2 + 5(4.840 - 5.01867)^2 + 5(5.134 - 5.01867)^2 \\
 & + 5(5.060 - 5.068)^2 + 5(5.216 - 5.068)^2 + 5(4.928 - 5.068)^2 = 0.454013
\end{aligned}
$$

   - Using only mean values for each batch, the within (error) SS is

$$
\begin{aligned}
SS_{Error} = {} & (5.082 - 5.01867)^2 + (4.840 - 5.01867)^2 + (5.134 - 5.01867)^2 \\
 & + (5.060 - 5.068)^2 + (5.216 - 5.068)^2 + (4.928 - 5.068)^2 = 0.09080267
\end{aligned}
$$

   - SSB(A) has an extra multiplier of 5 compared to the SSError if we do a 1-way ANOVA with batch means.

o Using individual values in the nested ANOVA, MSBatch(Site) is 5 times the pooled variance using batch means.
   - MSB(A) = 5*0.02270067 = 0.113503

o The Site SS in the nested analysis is 5 times the Site SS in the ANOVA with batch means. With 1 df for site, the same holds for the Site MS.
   - Equations for SSA on page 1012 have multipliers of 15.
   - There are 15 values for each A mean.

$$
\begin{aligned}
SSA &= 15\left[(5.01867 - 5.04333)^2 + (5.068 - 5.04333)^2\right] \\
    &= 5\left[3(5.01867 - 5.04333)^2 + 3(5.068 - 5.04333)^2\right] = 5 \cdot SSA_{\bar{Y}\text{'s}}
\end{aligned}
$$

$$
MSA = SSA / df_A = 5\left[\frac{3(5.01867 - 5.04333)^2 + 3(5.068 - 5.04333)^2}{1}\right] = 5 \cdot MSA_{\bar{Y}\text{'s}}
$$

   - Here the subscript Ȳ's marks quantities computed from the batch means.
   - 5*0.00365067 = 0.018253

o The test for a site effect in the nested analysis is the same as in a 1-way ANOVA F-test or t-test using the batch means.
   - With balanced data and correlated measurements, e.g. tablets in the same batch, a way to derive independent measurements is to average correlated values and use these averages in the usual analysis.
   - Doing the nested analysis gives us information on variance components that could be used for planning future experiments.
      • For example, optimal allocation to decide how many tablets to test from each batch.
   - If the n's are not equal, analyzing the batch means this way is not the right way to do the test.
   - We want to give greater weight to conditions where we have more data.
      • But how much weight we give to each mean depends on both variance components, so it is not simple.
         o If there is no within-batch (tablet-to-tablet) variance, only batch effects, then all means should count equally regardless of n.
         o If there is no batch effect, then each mean should be weighted proportional to n.
         o In general, values should be weighted inversely proportional to their variances.
         o Nested ANOVA with proc MIXED or proc GLM does the right analysis.
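The weighting argument in the last bullets can be made concrete. Under the nested model, the variance of a batch mean based on n tablets is Var(batch(site)) + Var(Error)/n, so inverse-variance weights depend on both components. The Python sketch below is an illustration only: the function name `batch_mean_weights` and the unequal batch sizes 2, 5 and 10 are hypothetical, and the variance components are the estimates from this example.

```python
def batch_mean_weights(n_per_batch, var_batch, var_error):
    """Normalized inverse-variance weights for batch means.

    Var(batch mean) = var_batch + var_error / n for a batch of n tablets.
    """
    variances = [var_batch + var_error / n for n in n_per_batch]
    if any(v <= 0 for v in variances):
        raise ValueError("each batch mean needs a positive variance")
    raw = [1.0 / v for v in variances]
    total = sum(raw)
    return [w / total for w in raw]


sizes = [2, 5, 10]  # hypothetical unequal numbers of tablets per batch

# No tablet-to-tablet variance: every batch mean counts equally.
equal = batch_mean_weights(sizes, var_batch=0.02028, var_error=0.0)

# No batch effect: weights are proportional to n.
by_n = batch_mean_weights(sizes, var_batch=0.0, var_error=0.01209)

# Both components present: weights fall between the two extremes.
mixed = batch_mean_weights(sizes, var_batch=0.02028, var_error=0.01209)

for label, w in [("batch only", equal), ("error only", by_n), ("both", mixed)]:
    print(label, [round(x, 3) for x in w])
```

With both components present, the largest batch gets more weight than the smallest, but much less than the 5-to-1 ratio that weighting by n alone would give.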

proc ttest data=somemeans;
  class site;
  var mean_content;
run;

The TTEST Procedure

Statistics

Variable       site         N   Lower CL Mean     Mean   Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err
mean_content   1            3          4.6289   5.0187          5.4084             0.0817    0.1569             0.9861    0.0906
mean_content   2            3          4.7099    5.068          5.4261             0.0751    0.1442              0.906    0.0832
mean_content   Diff (1-2)              -0.391   -0.049          0.2922             0.0903    0.1507              0.433     0.123

T-Tests

Variable       Method          Variances     DF   t Value   Pr > |t|
mean_content   Pooled          Equal          4     -0.40     0.7089
mean_content   Satterthwaite   Unequal     3.97     -0.40     0.7090

proc glm data=somemeans;
  class site;
  model mean_content = site;
run;

Dependent Variable: mean_content

Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1       0.00365067    0.00365067      0.16   0.7089
Error               4       0.09080267    0.02270067
Corrected Total     5       0.09445333

As anticipated, a 2-sided t-test with equal variances is equivalent to the ANOVA F-test.
o Same p-values.
o Both tests make exactly the same assumptions so they had better be equivalent.
o As usual F = t^2
   - 0.16 = (-0.40)^2
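The identity F = t^2 can be checked directly from the six batch means. This is a minimal Python sketch with the standard library; the variable names are illustrative, and p-values are left out because the standard library has no t or F distribution.

```python
from math import sqrt
from statistics import mean, variance

# Batch means for each site, taken from the PROC MEANS listing.
site1 = [5.082, 4.840, 5.134]
site2 = [5.060, 5.216, 4.928]
n1, n2 = len(site1), len(site2)

# Pooled (equal variances) two-sample t statistic.
pooled_var = ((n1 - 1) * variance(site1) + (n2 - 1) * variance(site2)) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))
t_value = (mean(site1) - mean(site2)) / std_err

# 1-way ANOVA F statistic: MS(site) has 1 df, MS(Error) is the pooled variance.
grand_mean = mean(site1 + site2)
ms_site = n1 * (mean(site1) - grand_mean) ** 2 + n2 * (mean(site2) - grand_mean) ** 2
f_value = ms_site / pooled_var

print(f"t = {t_value:.2f}, t^2 = {t_value ** 2:.4f}, F = {f_value:.4f}")
```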

Suppose we have a=3 treatments, b=4 tanks per treatment, and n=6 fish per tank.

Source              df
Treatments          2 = 3-1
Tanks(Treatment)    9 = 3*(4-1)
Error               60 = 3*4*(6-1) = 12*(6-1)
Total               71 = 3*4*6 - 1

Each treatment mean is based on 4*6 = 24 fish, so

$$SS_{Treat} = 24(\bar{y}_{Treat1} - \bar{y}_{All})^2 + 24(\bar{y}_{Treat2} - \bar{y}_{All})^2 + 24(\bar{y}_{Treat3} - \bar{y}_{All})^2$$

The Tank within Treatments, Tanks(Treatment), variance looks at how much tank means within the same treatment deviate from the mean for that treatment. Within each treatment we use the mean for each tank minus the mean in that treatment.

$$
\begin{aligned}
SS_{Tank(Treat)} = {} & 6(\bar{y}_{Treat1,Tank1} - \bar{y}_{Treat1})^2 + \dots + 6(\bar{y}_{Treat1,Tank4} - \bar{y}_{Treat1})^2 \\
 & + 6(\bar{y}_{Treat2,Tank1} - \bar{y}_{Treat2})^2 + \dots + 6(\bar{y}_{Treat2,Tank4} - \bar{y}_{Treat2})^2 \\
 & + 6(\bar{y}_{Treat3,Tank1} - \bar{y}_{Treat3})^2 + \dots + 6(\bar{y}_{Treat3,Tank4} - \bar{y}_{Treat3})^2
\end{aligned}
$$

Finally, MSError = Pooled variance using within tank variances of all 12 tanks. df = 12*(6-1)
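The degrees of freedom bookkeeping for a balanced two-level nested design can be written as a small helper. This is a sketch; `nested_df` is a hypothetical name, with a groups at the top level, b units nested in each group, and n observations per unit.

```python
def nested_df(a, b, n):
    """Degrees of freedom for a balanced nested design."""
    df = {
        "top": a - 1,               # treatments (or sites)
        "nested": a * (b - 1),      # tanks(treatment) or batch(site)
        "error": a * b * (n - 1),   # observations within the nested units
    }
    df["total"] = a * b * n - 1
    # The three pieces always add up to the total.
    assert df["top"] + df["nested"] + df["error"] == df["total"]
    return df


fish = nested_df(a=3, b=4, n=6)
tablet_example = nested_df(a=2, b=3, n=5)
print("fish tanks:", fish)
print("tablets:   ", tablet_example)
```

The second call reproduces the degrees of freedom in the tablets example: 1 for site, 4 for batch(site), 24 for error and 29 in total.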