Sample Size/Power Calculations

1

Why Power Analysis? •Research is expensive…wouldn’t want to conduct an experiment with far too 1. few experimental units (EUs) Project won’t find important differences that exist  not really worth doing

2. many experimental units (EUs) Project will be unnecessarily too expensive

•Typical granting agency requirement 2

A Simple Experimental Design • Effect of diet on blood pressure (mmHg) in rats • Consider a Completely Randomized Design (CRD) • 12 rats randomly assigned to one of two different diets • Trt 1: DASH diet…n=6 • Trt 2: Standard diet…n=6

• Investigator expects higher mean blood pressure (BP) at the end of 12 weeks when under Trt 2 • Is n=6 enough to detect this difference? 3

Statistical Analysis • Two competing hypotheses: Ho: m1=m2 H1: m1 a → fail to reject Ho, not enough evidence to conclude H1

• There are two possible incorrect conclusions based on this approach to inference 4

Type I and Type II errors

True unknown state

What the data indicate:

Fail to reject Ho: (P>a)

Reject Ho: (P≤a)

Ho

No error

Type I error (Prob is a)

H1

Type II error (Prob = b)

No error 5

So is n = 6 rats large enough? • Rephrase: Do we have enough statistical power? • Need to “know” two things 1.How large is the true mean difference (d = m2-m1)? a) What do you anticipate and/or want to detect? b) What would be economically/practically important?

2.How much variability (s) between rats within a grp? • Sometimes prior information available from pilot study or previously published studies • Otherwise need to make an educated “guess” • Always round up to be a little conservative 6

One way to elicit values for s • Empirical rule: Consider range of responses to be equal to 4s • Question to client: What would be the likely range (max-min) of responses for rats within the same trt? • Suppose the answer was 60 mmHg R = 60 → s = 15 mmHg. Suppose researchers also believe that d ≥20 mmHg is important R≈4s

7

Two competing hypotheses  2s 2  • Under Ho: y2  y1 ~ N  0,  n   2  2s   • Under H1: y2  y1 ~ N  d , n  

Currently assuming the data are Normal. Sole difference is in the mean of the distribution.

Conduct one-tailed z-test for a certain a Reject Ho: if z 

y2  y1 2s n

2

 za

if y2  y1  za

2s 2 n

8

Distributions of y2  y1 Ho:

Power  1-b



 1   za  d

2s / n 2

H1:

za

)

2s 2 n

a

1b

0

d 9

More reasonable statistical test • t-test

• Because you likely won’t be able to assume s2 is known • One-sided: Reject Ho if y2  y1 t  ta ,df 2 2 s1 s2  n n • Two-sided (H1: m1≠m2): Reject Ho: if

t  ta / 2,df • Use of t distribution results in more complicated alternative hypothesis distribution (non-central t) 10

Using SAS for power analysis proc power; twosamplemeans alpha=.05 nulldiff=0 sides=1 meandiff=20 npergroup=6 stddev=15 power=.; run;

or proc power; onewayanova alpha=.05 test=overall groupmeans=(0 20) npergroup=6 stddev=15 power=.; run; Similar to two-sided t-test

11

SAS Output Two-sample t Test for Mean Difference Fixed Scenario Elements Distribution Normal Method Exact Number of Sides 1 Null Difference 0 Alpha 0.05 Mean Difference 20 Standard Deviation 15 Sample Size Per Group 6 Computed Power Power 0.693

12

SAS Output Overall F Test for One-Way ANOVA Fixed Scenario Elements Method Alpha Group Means Standard Deviation Sample Size Per Group

Exact 0.05 0 20 15 6

Computed Power Power 0.550 Typically want power to be larger than 80% so more rats would be desirable 13

Using SAS for sample size proc power; twosamplemeans alpha=.05 nulldiff=0 sides=1 meandiff=20 npergroup=. stddev=15 power=.80; run;

or proc power; onewayanova alpha=.05 test=overall groupmeans=(0 20) npergroup=. stddev=15 power=.80; run;

14

SAS Output Two-sample t Test for Mean Difference Fixed Scenario Elements Distribution Method Number of Sides Null Difference Alpha Mean Difference Standard Deviation Nominal Power

Normal Exact 1 0 0.05 20 15 0.8

Computed N Per Group Actual N Per Power Group 0.813 8 15

SAS Output Overall F Test for One-Way ANOVA Fixed Scenario Elements

Method Alpha Group Means Standard Deviation Nominal Power

Exact 0.05 0 20 15 0.8

Computed N Per Group Actual N Per Power Group 0.805 10

16

Generating Power Curve I proc power; twosamplemeans alpha=.05 nulldiff=0 sides=1 meandiff=20 stddev=15 power=. npergroup=3 to 20 by 1; plot interpol=join yopts=(ref=0.80); run;

17

Power Curve for one-sided t test 1.0

0.9

0.8

0.8

Power

0.7

0.6

0.5

0.4

0.3 0

5

10

15

20

Sample Size Per Group 18

Generating Power Curve II proc power; twosamplemeans alpha=.05 nulldiff=0 sides=1 meandiff=10 to 30 by 1 stddev=15 npergroup=6 power=.; plot x=effect interpol=join yopts=(ref=0.80); run;

19

Determining sample size for a desired margin of error • Confidence interval y2  y1  tdf

s12 s22  n n

Margin of error

• Given guesstimates for the variances, one can set margin of error equal to desired amount and solve for n

What if more than two trts? • Example: In a study of vitamin supplementation, certain pigs are assigned to each of 5 treatment groups and weight gains over a specified time period are to be recorded. • Researchers anticipate mean responses to be 3.9, 4.1,4.2, 4.3 and 4.5 kg for the five treatments, respectively • Based on previous experience, they anticipate a within-treatment variance of about 0.30 kg2 • They want to know if n=4 animals per treatment would provide sufficient power for the ANOVA Ftest. 22

Linear model written two ways 1) Yij  mi  eij

Cell means model

2) Yij  m  ai  eij i= 1,....,r=5;

j = 1,2,…,n=4

Factor level effects model

eij ~ NIID  0, s 2 )

r

m

m i 1

r

i

r

 a i  mi  m   a i  0

i.e. Sum-to-zero constraints

i 1

23

One-way ANOVA table Source

Df

Treatment r-1

Error

r(n-1)

SS

MS

EMS

SSTrt

MSTrt

r  s 2  function    i 1

SSE

MSE

s2

ANOVA F-test: 1) Ho: m1=m2=m3=m4=m5

versus

H1: at least one mi≠mi’

2) Ho: ALL ai = 0

versus.

H1: at least one ai ≠ 0

Equivalent specs.

Note: if Ho: is true then both EMS = s2 such that F = MSTrt/MSE ~ Fr-1,r(n-1) Central F-distribution 24

Power determination for F-test  m1  3.9   m   4.1 • Under H1:  2    , or  m3    4.2       m 4   4.3  m5   4.5 

a1   0.3 a   0.1  2   with m  4.2 a 3    0.0      a  0.1  4   a 5   0.3

This means F = MSTrt/MSE ~ Fr-1,r(n-1),f

Non-central F-distribution (if f > 0)

f

n

s

2

r

a i 1

2 i

is the non-centrality parameter

“Corrected sum of squared means” (CSSM) =(-0.3)2+(-0.1)2+ +(0.0)2+ (0.1)2+(+0.3)2=0.20 for example 25

SAS Code proc power; onewayanova alpha=.05 test=overall groupmeans=(3.9 4.1 4.2 4.3 4.5) npergroup=4 stddev=0.5477 power=.; run;

This is the square root of 0.30

26

SAS Output Overall F Test for One-Way ANOVA Fixed Scenario Elements Method Exact Alpha 0.05 Group Means 3.9 4.1 4.2 4.3 4.5 Standard Deviation 0.5477 Sample Size Per Group 4

Computed Power Power 0.171

Very poorly underpowered….as designed, this would be a waste of time and money to run!! 27

SAS Code proc power; onewayanova alpha=.05 test=overall groupmeans=(3.9 4.1 4.2 4.3 4.5) npergroup=4 to 30 stddev=0.5477 power=.; plot interpol=join yopts=(ref=.80); run; Let’s look at a power curve to get an idea of the necessary sample size

28

1.0

0.9 0.8

0.8

Power

0.7

0.6

Looks like we need about 19 animals per group (almost 5 times the number before)

0.5

0.4

0.3

0.2

0.1 0

5

10

15

20

25

30

Sample Size Per Group 29

What if trt means unknown? • Use the “worst case” scenario • Conservative assessment of power • Just have to know the difference between the largest and smallest means or the smallest difference D that is scientifically meaningful • Use –D/2 and D/2 with all other means clumped at zero

Minimizes  a so it minimizes f 2 i

True power will be greater than or equal to this

30

SAS Code **Suppose D=0.6***; proc power; onewayanova alpha=.05 test=overall groupmeans=(-0.3 0 0 0 0.3) npergroup=4 to 30 stddev=0.5477 power=.; plot interpol=join yopts=(ref=.80); run;

31

1.0

0.9 0.8

0.8

Power

0.7

0.6

0.5

Looks like we need about 21 animals per group in the worst case

0.4

0.3

0.2

0.1 0

5

10

15

20

25

30

Sample Size Per Group

32

There is actually a “trick” to computing f using ANOVA software like PROC GLM/MIXED (O’Brien and Lohr, 1984) 1) Substitute “true means” for data in ANOVA. 2) Use the ANOVA table to compute the noncentrality parameter 3) Then use that computed value in power calculations!

33

Using “true means” for data Suppose you are interested in 3 treatments.

data oneway; input treatment mean; Anticipate true mean responses of 4.0, 4.3 and 4.6 datalines; 1 4.0 Anticipate residual variance of 0.30 1 4.0 1 4.0 Wish to compute power based on sample size of n= 4 1 4.0 for each treatment. 2 4.3 2 4.3 2 4.3 2 4.3 3 4.6 proc mixed data=oneway noprofile; Output the class treatment; 3 4.6 ANOVA model mean = treatment; 3 4.6 table to a parms (0.30) /noiter; 3 4.6 file called ods output tests3 = tests3; ;

run;

“tests3”

34

Trick to compute f " FTreatment "

• Compute the ANOVA treatment “F ratio” Obs

Effect

NumDF

DenDF

FValue

ProbF

1

treatment

2

9

1.20

0.3452

• Multiple “FTreatment” by numerator degrees of freedom (NumDF) to get f:

f  " FTreatment "* dfTreatment  1.2*2  2.4 • FTreatment is a function of CSSM.

35

Use f to computer power The critical value separating the “acceptance region” from the “rejection region”

data power; set tests3; Probability of falling in rejection region noncent = Fvalue*numdf; if H1 is true. alpha = 0.05; criticalvalue = Finv(1-alpha,numdf,dendf,0); Power = 1-Probf(criticalvalue,numdf,dendf,noncent); run; proc print data=power; run; Effect

Num DF

Den FValue DF

ProbF

noncent alpha

Critical value

treatment

2

9

0.3452

2.4

4.25649 0.20010

1.20

0.05

Power

36

PROC GLMPOWER does this data example1; Total number of input FactorA $ mean; experimental units datalines; Much simpler data 1 4.0 step 2 4.3 3 4.6 The GLMPOWER Procedure run; Fixed Scenario Elements proc glmpower data=example1 ; Dependent Variable mean Source FactorA class FactorA ; Alpha 0.05 model mean = FactorA ; Error Standard Deviation 0.548 power Total Sample Size 12 stddev = .548 Test Degrees of Freedom 2 ntotal = 12 Error Degrees of Freedom 9 power = . alpha=0.05; Computed Power run; Power 0.200

37

What about Factorial Designs? • An experiment was conducted to determine the effects of three different sources of dietary phosphorous and four different varieties of corn silage on daily milk production • Proposed a 3 x 4 factorial experiment: • Factor A, Dietary phosphorus : 1, 2, & 3 (a=3) • Factor B, Corn silage varieties: 1,2,3, & 4 (b=4).

• Each cow randomly assigned to just one particular A*B treatment combination. • How many cows should be considered? 38

Need to specify “true” means Power analysis requires “knowledge” of mij and s2. Suppose, investigator anticipates that: m11  37 m21  42 m31  47

m12  38 m22  43 m32  48

m13  44 m23  49 m33  54

m14  41 m24  46 m34  51

s2 = 5 kg2

Wishes to determine power for both main effects and two-way interaction and also the difference between, say, Level 1 and 2 of A m1.  m2. 

1 1  m11  m12  m13  m14 )   m21  m22  m23  m24 ) 4 4

39

Setup “data” data power; input FactorA FactorB cellmean; datalines; 1 1 37 1 2 38 1 3 44 1 4 41 2 1 42 symbol1 i=join; 2 2 43 proc gplot; 2 3 49 plot cellmean*FactorB=FactorA; 2 4 46 run; 3 1 47 3 2 48 3 3 54 3 4 51 run;

40

Profile means plot cellmean 54 53

Researcher anticipating no interaction

52 51 50 49 48

(Power analysis should still take its possiblity into account in ANOVA)

47 46 45 44 43 42 41 40 39 38 37 1

2

3

4

FactorB FactorA

1

2

3

41

Using GLMPower proc glmpower data=power ; class FactorA FactorB; model cellmean = FactorA | FactorB ; contrast 'A1 vs A2' FactorA 1 -1 0 FactorB 0 0 0 0 FactorA*FactorB 0.25 0.25 0.25 0.25 -0.25 -0.25 -0.25 -0.25 0 0 0 0 ; power stddev = 5 /* square root of residual standard deviation */ ntotal = 36 /* provides power determination for n =36/12 = 3 reps per group */ power = . /* Blank…because you want to compute power */ alpha=0.05; plot x=n min=24 max=96; /* power curve plot ranging from n = 24/12 to 96/12 */ run; 42

PROC GLMPOWER OUTPUT Fixed Scenario Elements Dependent Variable Alpha Error Standard Deviation Total Sample Size Error Degrees of Freedom

cellmean 0.05 5 36 24

Computed Power

Test Index Type Source DF Power 1 Effect FactorA 2 0.989 2 Effect FactorB 3 0.720 3 Effect FactorA*FactorB 6 0.050 4 Contrast A1 vs A2 1 0.652 43

Power curves

44