Contrast Coding Or: One of These Levels is Not Like the Others

Author: Earl Bradley

Scott Fraundorf (and Tuan Lam) MLM Reading Group – 03.10.11

Administrivia
● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit

Outline
● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions

Why Use Contrast Coding?
Scott's example study: Examining recall memory for spoken discourse as a function of:
● Location of disfluencies (categorical variable)
● Prior story knowledge (continuous variable)
[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]

Why Use Contrast Coding?
● Regression equation: Predicts values
 – Could use this to predict whether or not something will be remembered
[Model diagram: recall = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
● But in cognitive psych:
 – Often interested in the effect of specific levels
 – Test which ones differ significantly


Contrast Coding
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
Which ones differ significantly?

Contrast Coding
● Contrasts: Test differences between specific levels
 – Same as a planned comparison in an ANOVA
 – Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
 – If we are deciding tests post-hoc, there is a greater chance of capitalizing on chance / finding a spurious effect
 – Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
 – We are basically on the honor system here: there is no way to prove the comparison was planned ahead of time

Contrasts!
● Contrasts are like weighted sums of means
 – In multiple regression / MLM context, also subject to other variables in the model
● Using your scale to test what's different

Contrast Coding
It looks like the Fluent stories might not be remembered as well. Let's use a contrast to test this.
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]

Contrasts
Question 1: Do disfluencies affect recall?

Contrast weights are assigned:
TYPICAL    .33
ATYPICAL   .33
FLUENT    -.66

● One side positive. One side negative.
 – This determines which levels are being compared (+ versus -)
 – Doesn't really matter which side you choose as the + side. It affects the sign of the result, but not its magnitude or statistical significance
● Codes add up to zero.
 – Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.) abs(.33) + abs(-.66) ≈ 1 (exactly 1 with 1/3 and -2/3)
● Does the contrast differ significantly from zero? If so, the difference between levels is significant.

Can conceptualize the comparison as:
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)

Result: Contrast 1 is significant (*)
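
As a rough illustration of the "weighted sum of means" idea, Contrast 1 can be computed by hand. The condition means below are made-up numbers (not the study's data), and this sketch ignores any other variables that would be in the model.

```r
# Hypothetical condition means (illustration only, not the study's data)
cond_means <- c(Typical = 0.70, Atypical = 0.66, Fluent = 0.50)

# Contrast 1 weights: disfluent (Typical, Atypical) vs. Fluent
w1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# Codes add up to zero
stopifnot(isTRUE(all.equal(sum(w1), 0)))

# The contrast as a weighted sum of means
contrast1 <- sum(w1 * cond_means)

# Swapping the + and - sides flips the sign, not the magnitude
contrast1_flipped <- sum(-w1 * cond_means)
```

A value of zero would mean the weighted disfluent conditions and the fluent condition balance out; the significance test asks whether the value is reliably different from zero.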

Contrast Coding
Our first contrast reveals that fluent stories are remembered worse. Now let's look at Typical vs Atypical.
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; the disfluent vs. Fluent comparison is marked *]
We always have j – 1 contrasts, where j = the # of levels of the factor. So, here, 2 contrasts are needed to fully describe the differences among the 3 levels.

Contrasts
Question 2: Does location of disfluencies matter?

Contrast weights:
TYPICAL    .50
ATYPICAL  -.50
FLUENT     0 (zeroed out here!)

● One side positive. One side negative.
● Codes add up to zero. Sum of absolute values of codes is 1.

Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)

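Contrast Coding
[Figure: bar chart of % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; the disfluent vs. Fluent comparison is marked *, the Typical vs. Atypical comparison is marked n.s.]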

One Important Point!
● Choice of contrasts doesn't affect total variance accounted for by the variable
 – Contrasts are only about differences between levels
 – Can divide this up in multiple different ways and still account for the same total variance
[Diagram label: LOCATION IN STORY]


Why -.5 and .5?
● Why [-.5 .5] instead of [-1 1]?
 – Doesn't affect significance test
 – Does affect β weight (estimate)
   [Model output shown on the slide for FILLER LOCATION: [-.5 .5] and FILLER LOCATION: [-1 1]]
 – Std error is also scaled accordingly

Contrast Estimates
Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant.

CONTRAST CODE [-.5 .5]
ATYPICAL LOCATION    .5
TYPICAL LOCATION    -.5
(codes are 1 unit apart)
● In this case, a 1-unit change in the contrast IS the difference between the levels' codes
● Thus, the contrast correctly represents .04825 as the difference between the conditions

CONTRAST CODE [-1 1]
ATYPICAL LOCATION    1
TYPICAL LOCATION    -1
(codes are 2 units apart)
● Here, the total difference between the levels' codes is 2
● So, a 1-unit change in the contrast is only HALF the difference between the levels' codes
● Thus, the estimate of the contrast is .024 … only half the difference between the conditions

So Why -.5 and .5?
● Better tell you about difference in means!
 – The actual difference between conditions is .048
 – It would be perfectly correct to describe .024 as half the difference between levels and you could even put a CI around it … it's just less intuitive for your readers
● Both contrasts would account for the same amount of variance
● This is just another case of deciding the scale of a variable
 – Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
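
The scaling point can be checked with a small sketch. The data are hypothetical and balanced, and a plain lm() stands in for the mixed model so that the example runs in base R; the column names half and one are made up for illustration.

```r
# Hypothetical, balanced toy data (not the study's data)
d <- data.frame(
  location = rep(c("Typical", "Atypical"), each = 4),
  recall   = c(0.60, 0.62, 0.64, 0.66,   # Typical,  mean 0.63
               0.66, 0.68, 0.70, 0.72)   # Atypical, mean 0.69
)

# The same comparison coded on two scales
d$half <- ifelse(d$location == "Atypical", 0.5, -0.5)  # codes 1 unit apart
d$one  <- ifelse(d$location == "Atypical", 1, -1)      # codes 2 units apart

fit_half <- lm(recall ~ half, data = d)
fit_one  <- lm(recall ~ one,  data = d)

est_half <- unname(coef(fit_half)["half"])  # full difference between means
est_one  <- unname(coef(fit_one)["one"])    # half the difference

# The t statistic (and so the p value) is the same on both scales
t_half <- summary(fit_half)$coefficients["half", "t value"]
t_one  <- summary(fit_one)$coefficients["one", "t value"]
```

Only the estimate and its standard error change with the scale; their ratio does not.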

Imbalanced Designs
● You may have an unequal number of observations per cell
 – e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
 – Ask Tuan or Scott about how to do this :)


Contrasts in R
● To check what the current contrasts are:
 – contrasts(YourDataFrame$VariableName)
● To set the contrasts:
 – contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
 – Each c(xx,yy,zz) is the weights for one of the contrasts you want to run, e.g. (.33, .33, -.66) is one contrast
● After setting contrasts, run the lmer model to get the results of the contrasts
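
A minimal end-to-end sketch of these steps follows. The data are hypothetical, and lm() is used instead of lmer so that the example needs only base R; contrasts set on a factor this way are picked up the same way when the factor goes into an lmer formula. One detail worth checking is level order: R sorts factor levels alphabetically by default, so the rows of the contrast matrix only line up with Typical, Atypical, Fluent if the levels are put in that order first. The column names DisfVsFluent and TypVsAtyp are made up for illustration.

```r
# Hypothetical toy data (not the study's data)
d <- data.frame(
  location = rep(c("Typical", "Atypical", "Fluent"), each = 3),
  recall   = c(0.70, 0.72, 0.74,  0.64, 0.66, 0.68,  0.48, 0.50, 0.52)
)

# Set the level order explicitly; the default alphabetical order
# (Atypical, Fluent, Typical) would not match the rows of the matrix
d$location <- factor(d$location, levels = c("Typical", "Atypical", "Fluent"))

cmat <- cbind(c(1/3, 1/3, -2/3),   # Contrast 1: disfluent vs. fluent
              c(1/2, -1/2, 0))     # Contrast 2: typical vs. atypical
colnames(cmat) <- c("DisfVsFluent", "TypVsAtyp")
contrasts(d$location) <- cmat

contrasts(d$location)              # check what was set

fit <- lm(recall ~ location, data = d)
b <- coef(fit)
```

With these weights, each coefficient is on the scale of a difference in means: the first is the mean of the two disfluent conditions minus the fluent mean, the second is the typical mean minus the atypical mean.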

Contrasts in R
● Should have j – 1 contrasts, where j = # of levels of the factor
● If using a subset of data, some levels of the factor may no longer be present
 – e.g. you dropped a condition
 – But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts
 – Fix this by reconverting to a factor:
   YourDataFrame$Variable = factor(YourDataFrame$Variable)
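
A short sketch of the dropped-level problem and the fix, using a hypothetical data frame:

```r
# Hypothetical toy data
d <- data.frame(
  location = factor(c("Typical", "Atypical", "Fluent",
                      "Typical", "Atypical", "Fluent")),
  recall   = c(0.70, 0.66, 0.50, 0.72, 0.64, 0.52)
)

# Drop the Fluent condition
sub <- d[d$location != "Fluent", ]
n_before <- nlevels(sub$location)     # the unused "Fluent" level is still there

sub$location <- factor(sub$location)  # the fix from the slide
n_after <- nlevels(sub$location)

# droplevels() is a base R alternative with the same effect here
same <- identical(levels(droplevels(d[d$location != "Fluent", ])$location),
                  levels(sub$location))
```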

Another R Tip
● To see the mean of each level of an I.V.:
 – tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
 – Could also do median, sd, etc.
● For a 2-way (or more!) table
 – tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
● Doesn't work if you have missing values (cells containing an NA come back as NA)
 – But Tuan has made a version of tapply that fixes this problem
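
Tuan's version is not shown in the slides. As one base R alternative (an assumption about what is wanted, not a reconstruction of his function), tapply passes extra arguments on to the summary function, so na.rm = TRUE can be supplied directly. The data below are hypothetical.

```r
# Hypothetical toy data with one missing response
d <- data.frame(
  recall   = c(0.70, NA, 0.66, 0.62, 0.50, 0.54),
  location = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent")
)

# Without na.rm, any cell containing an NA comes back as NA
m_default <- tapply(d$recall, d$location, mean)

# Extra arguments are passed on to mean()
m_narm <- tapply(d$recall, d$location, mean, na.rm = TRUE)
```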


Multiple Comparisons (Here Comes Trouble!)

Multiple Comparisons
● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:
  FLUENT / YOUNGER    TYPICAL / YOUNGER    ATYPICAL / YOUNGER
  FLUENT / OLDER      TYPICAL / OLDER      ATYPICAL / OLDER

Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
 – Suppose we find typical > fluent, but typical and atypical don't reliably differ
 – Should expect atypical > fluent (to at least some degree)
 – Or, we find a main effect of age
 – Would expect to find an effect of age within at least some conditions if we looked at them individually

Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
● j – 1 contrasts actually describe everything
 – j = # of levels
 – FLUENT vs. MEAN OF: Typical, Atypical = .35730
 – TYPICAL vs. ATYPICAL = .04825
 – Can calculate all differences between levels based on this!
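
As a worked example of "can calculate all differences", the two estimates from the slide are enough to recover the remaining pairwise differences. The sign convention here is an assumption: the first estimate is taken as mean(Typical, Atypical) minus Fluent and the second as Typical minus Atypical; with the opposite convention the signs flip but the arithmetic is the same.

```r
# Estimates shown on the slide; the direction of each is an assumption
c1 <- 0.35730   # assumed: mean(Typical, Atypical) - Fluent
c2 <- 0.04825   # assumed: Typical - Atypical

# Typical sits c2/2 above the disfluent mean, Atypical c2/2 below it
typical_vs_fluent   <- c1 + c2 / 2
atypical_vs_fluent  <- c1 - c2 / 2
typical_vs_atypical <- c2
```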

Multiple Comparisons
● Want to avoid multiple comparisons
 – Error rate increases if you run overlapping, redundant tests
 – Suppose we have the wrong value for one of the means (due to sampling error, etc.)
 – In a single test, we set alpha so there is a 5% (.05) chance of incorrectly rejecting H0

Multiple Comparisons
● But now we run a 2nd test comparing that same “bad” condition to another condition
 – Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
 – Not an independent 5% chance of error
 – Multiple tests compound Type I error rate

Orthogonality
● Avoid this issue w/ orthogonal contrasts
 – Products of weights (across contrasts) sum to 0
 – Matrix of contrasts is made up of orthogonal vectors
 – Can think of this as the contrasts being uncorrelated with each other

Orthogonal example:
            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .33      x     .50      =    .165
ATYPICAL       .33      x    -.50      =   -.165
FLUENT        -.66      x     0        =    0
                                     SUM =  0

Non-orthogonal example:
            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .50      x     .50      =    .25
ATYPICAL      -.50      x     0        =    0
FLUENT         0        x    -.50      =    0
                                     SUM =  .25
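
The sum-of-products check is easy to run in R. This sketch uses the weights from the two slide examples as I read them (the second pair, typical vs. atypical and typical vs. fluent, is the non-orthogonal one); the vector names c1, c2, c3 are made up for illustration.

```r
c1 <- c(Typical = .33, Atypical = .33,  Fluent = -.66)  # disfluent vs. fluent
c2 <- c(Typical = .50, Atypical = -.50, Fluent = 0)     # typical vs. atypical
c3 <- c(Typical = .50, Atypical = 0,    Fluent = -.50)  # typical vs. fluent

dot_12 <- sum(c1 * c2)   # orthogonal pair
dot_23 <- sum(c2 * c3)   # non-orthogonal pair

# For a whole set of contrasts at once: the off-diagonal entries of
# crossprod() are these sums of products
cp <- crossprod(cbind(c1, c2, c3))
```

Any nonzero off-diagonal entry in cp flags a pair of contrasts that overlap.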

Corrections
● “But, Scott, I really want to do more than j – 1 comparisons”
● Can apply corrections to control Type I error
● Bonferroni: Multiply p value by # of comparisons
 – Worst case scenario
 – Less conservative corrections may be available
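
In R, both kinds of correction are available through p.adjust() in the stats package. The p values below are hypothetical; Holm's method is shown as one example of a less conservative correction, not as the one the slides had in mind.

```r
p_raw <- c(0.010, 0.020, 0.040)   # hypothetical p values from 3 comparisons

# Bonferroni: multiply each p value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down method that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")
```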


How Does it Work?
Behind the scenes...
[Model diagram: recall = PRIOR KNOWLEDGE + LOCATION OF DISFLUENCY + SUBJECT + ITEM]

Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ...

● Each categorical factor gets coded as j - 1 variables
 – j = number of levels in that factor
 – Number of contrasts you have
● Each coded variable represents one of your contrasts
 – CONTRAST 1 is coded as X₂:
   X₂ = .33 if typical location for disfluencies
   X₂ = .33 if atypical
   X₂ = -.66 if fluent
 – Value of contrast: β₂
 – Sig. difference between levels if β₂ differs from 0
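
To see the coded variables R actually builds, model.matrix() shows the design matrix. This is a sketch with one observation per level; the column names C1 and C2 are made up for illustration.

```r
location <- factor(c("Typical", "Atypical", "Fluent"),
                   levels = c("Typical", "Atypical", "Fluent"))
contrasts(location) <- cbind(C1 = c(.33, .33, -.66),
                             C2 = c(.50, -.50, 0))

# One row per observation: an intercept column plus j - 1 = 2 coded columns
X <- model.matrix(~ location)
```

Each row of X holds the X values that get multiplied by the β weights for that observation.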


Other Kinds of Coding
● Dummy/Treatment Coding
 – Compare all levels to a baseline level
 – Doesn't allow direct comparisons between non-baseline levels
 – R does this by default :(

            X2   X3
 Typical     1    0
 Atypical    0    1
 Fluent      0    0

● Sum/Effects Coding
 – Test whether each level differs from overall mean or from chance
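
Both coding schemes are built into R and can be inspected directly. This sketch asks for Fluent as the treatment baseline (the row of zeros in the slide's X2/X3 columns); note that R's own default would pick the first level as the baseline instead.

```r
levs <- c("Typical", "Atypical", "Fluent")

# Treatment (dummy) coding with the 3rd level, Fluent, as the baseline
treat <- contr.treatment(levs, base = 3)

# Sum (effects) coding: each column sums to zero
sumc <- contr.sum(levs)

# The coding R applies by default to unordered factors
default_unordered <- getOption("contrasts")[["unordered"]]
```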


Contrasts & Interactions
● Contrasts also apply in cases where we have interactions between variables
● Interaction term represents whether the value of the contrast depends on another variable
● We'll see some examples on the next slides

Interaction Example
● Suppose we also sampled different age groups in the disfluency experiment
 – 3 x 2 design

                 Story Type
Group            FLUENT           TYPICAL                        ATYPICAL
YOUNG ADULTS     Fluent, young    Typical disfluencies, young    Atypical disfluencies, young
OLDER ADULTS     Fluent, older    Typical disfluencies, older    Atypical disfluencies, older

What are possible patterns of results?

Possible Result 1
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     no
AGE            no
C1 x AGE       no
C2 x AGE       no

● Contrast 1 significant
 – Effect of disfluencies
● Contrast 2 non-sig.
 – Location irrelevant
● No effect of age at all in this case
 – Everything the same for both age groups

Possible Result 2
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     yes
AGE            no
C1 x AGE       no
C2 x AGE       no

● Contrast 2 is now significant
 – Typical > atypical
● Still no effect of AGE

Possible Result 3
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     yes
AGE            yes
C1 x AGE       no
C2 x AGE       no

● Now, AGE effect
 – Older adults remember more across the board
● But, no interaction
 – Disfluency effect is the same in both age groups

Possible Result 4
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     yes
AGE            yes
C1 x AGE       yes
C2 x AGE       no

● Contrast 1 interacts with AGE
 – Effect of the presence of disfluencies differs across age
 – Effect only for young adults
● Contrast 2 (location) still same in all cases

Possible Result 5
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     yes
AGE            yes
C1 x AGE       yes
C2 x AGE       yes

● Now, Contrast 2 also interacts with AGE
 – Reversal of Typical vs Atypical effect across age

Possible Result 6
[Figure: two panels, YOUNG and OLDER; y-axis 0 to 9; x-axis categories Before Plot Point, After Plot Point, Rest of Story]

               SIGNIFICANT?
CONTRAST 1     yes
CONTRAST 2     yes
AGE            yes
C1 x AGE       no
C2 x AGE       yes

● Contrast 2 interaction but not Contrast 1
 – Typical vs Atypical comparison does depend on age
 – Overall effect of having fillers does not

Interactions in R
● Implementing interactions in an R model formula (lmer or otherwise):
 – A + B
   Main effects of A and B, no interaction
 – A * B
   All possible interactions and main effects of A and B
 – A : B
   Interaction of A and B, no main effect (unless you add it separately)
● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
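
One way to confirm what each formula operator expands to is to ask R for the term labels. The variable names y, A, and B are placeholders, and labels_of is a small helper written for this sketch.

```r
# Helper (hypothetical name): list the terms a formula expands to
labels_of <- function(f) attr(terms(f), "term.labels")

add_only <- labels_of(y ~ A + B)   # main effects only
crossed  <- labels_of(y ~ A * B)   # main effects plus the interaction
int_only <- labels_of(y ~ A:B)     # interaction term only

# A * B is shorthand for A + B + A:B
same <- identical(crossed, labels_of(y ~ A + B + A:B))
```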
