Contrast Coding Or: One of These Levels is Not Like the Others
Scott Fraundorf (and Tuan Lam) MLM Reading Group – 03.10.11
Administrivia
● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit
Outline
● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions
Why Use Contrast Coding?
Scott's example study:
● Examining recall memory for spoken discourse as a function of:
– Location of disfluencies (categorical variable)
– Prior story knowledge (continuous variable)
[Model diagram: RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
Why Use Contrast Coding?
● Regression equation: Predicts values
– Could use this to predict whether or not something will be remembered
[Model diagram: RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM]
● But in cognitive psych:
– Often interested in the effect of specific levels
– Test which ones differ significantly
Contrast Coding
[Bar chart: % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
● Which ones differ significantly?
Contrast Coding
● Contrasts: Test differences between specific levels
– Same as a planned comparison in an ANOVA
– Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
– If we are deciding tests post-hoc, greater chance of capitalizing on chance / spurious effect
– Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
– We are basically on the honor system here; no way to prove the comparison was planned ahead of time
Contrasts!
● Contrasts are like weighted sums of means
– In multiple regression / MLM context, also subject to other variables in the model
● Using your scale to test what's different
Contrast Coding
[Bar chart: % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories]
● It looks like the Fluent stories might not be remembered as well.
● Let's use a contrast to test this.
Contrasts
TYPICAL
ATYPICAL
FLUENT
Question 1: Do disfluencies affect recall?
Contrasts
● Contrast weights are assigned
– One side positive. One side negative.
– This determines which levels are being compared (+ versus -)
.33 TYPICAL   .33 ATYPICAL   -.66 FLUENT
● Doesn't really matter which side you choose as the + side. It affects only the sign of the result, not its magnitude or statistical significance
Contrasts
● One side positive. One side negative.
● Codes add up to zero.
.33 TYPICAL   .33 ATYPICAL   -.66 FLUENT
● Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.) abs(.33) + abs(-.66) ≈ 1 (exactly 1 with weights of 1/3 and -2/3)
Contrasts
● One side positive. One side negative.
● Codes add up to zero.
.33 TYPICAL   .33 ATYPICAL   -.66 FLUENT
● Does contrast differ significantly from zero? If so, difference between levels is significant.
● Can conceptualize the comparison as: Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)
Contrasts
.33 TYPICAL   .33 ATYPICAL   -.66 FLUENT
Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)   * (significant)
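To make the "weighted sum of means" idea concrete, here is a minimal sketch in R. The cell means are made up for illustration (they are not the study's data), and the weights are written as exact thirds rather than the rounded .33 / -.66. It also shows how the weighted sum relates to the plain difference between the disfluent levels and Fluent.

```r
# Hypothetical cell means (proportion of story recalled); NOT the study's data.
cell_means <- c(Typical = 0.70, Atypical = 0.65, Fluent = 0.55)

# Contrast 1 weights: disfluent stories (Typical, Atypical) vs. Fluent.
# Exact thirds are used so the weights sum to exactly zero.
w1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)
stopifnot(isTRUE(all.equal(sum(w1), 0)))

# The contrast as a weighted sum of the level means.
contrast1 <- sum(w1 * cell_means[names(w1)])

# The plain difference: mean of the disfluent levels minus Fluent.
difference <- mean(cell_means[c("Typical", "Atypical")]) - cell_means[["Fluent"]]

# The weighted sum is this difference times 2/3 (the size of the Fluent weight);
# its sign and whether it differs from zero are the same.
c(weighted_sum = contrast1, difference = difference)
```

When these weights are used as codes in a regression, the fitted estimate for the contrast is the plain difference, because the + and – codes are one unit apart.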
Contrast Coding
[Bar chart: % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; * marks the significant first contrast]
● Our first contrast reveals that fluent stories are remembered worse.
● Now let's look at Typical vs Atypical
● We always have j – 1 contrasts, where j = the # of levels of the factor
– So, here 2 contrasts are needed to fully describe the factor
Contrasts
TYPICAL
ATYPICAL
Question 2: Does location of disfluencies matter?
Contrasts
● One side positive. One side negative.
● Codes add up to zero. Sum of absolute values of codes is 1.
.50 TYPICAL   -.50 ATYPICAL   0 FLUENT (zeroed out here!)
Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)
Contrast Coding
[Bar chart: % of story recalled (y-axis, 0.4 to 0.8) for Typical, Atypical, and Fluent stories; one comparison marked *, the other marked n.s.]
One Important Point!
● Choice of contrasts doesn't affect total variance accounted for by variable
– Only about differences between levels
– Can divide this up in multiple different ways and still account for same total variance
[Diagram label: LOCATION IN STORY]
Why -.5 and .5?
● Why [-.5 .5] instead of [-1 1]?
● Doesn't affect significance test
● Does affect β weight (estimate)
– FILLER LOCATION: [-.5 .5]
– FILLER LOCATION: [-1 1]
– Std error is also scaled accordingly
Contrast Estimates
CONTRAST CODE: ATYPICAL LOCATION = .5, TYPICAL LOCATION = -.5 (codes differ by 1)
● Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant
● In this case, a 1-unit change in contrast IS the difference between the levels' codes
● Thus, the contrast correctly represents .04825 as the difference between the conditions
Contrast Estimates
CONTRAST CODE: ATYPICAL LOCATION = 1, TYPICAL LOCATION = -1 (codes differ by 2)
● Here, the total difference between the levels' codes is 2
● So, a 1-unit change in the contrast is only HALF the difference between the levels' codes
● Thus, the estimate of the contrast is .024 … only half the difference between the conditions
Contrast Estimates
● Beta weight (estimate) represents the effect of a 1-unit change in the contrast
CONTRAST CODE: ATYPICAL LOCATION = .5, TYPICAL LOCATION = -.5 (codes differ by 1)
– 1 unit change in contrast IS the difference between levels (.04825 in this case)
CONTRAST CODE: ATYPICAL LOCATION = 1, TYPICAL LOCATION = -1 (codes differ by 2)
– 1 unit change in contrast IS only half the difference between levels
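The scaling point can be checked directly. The sketch below fits the same simulated two-level data with both codings; the data, variable names, and effect size are invented for illustration, and lm() stands in for lmer() only to keep the example dependency-free.

```r
set.seed(1)

# Simulated data, purely illustrative (not the study's data).
n <- 200
location <- factor(rep(c("Typical", "Atypical"), each = n / 2),
                   levels = c("Typical", "Atypical"))
recall <- ifelse(location == "Atypical", 0.65, 0.60) + rnorm(n, sd = 0.1)

# Two numeric codings of the same factor.
half_code <- ifelse(location == "Atypical", 0.5, -0.5)  # codes differ by 1
unit_code <- ifelse(location == "Atypical", 1, -1)      # codes differ by 2

fit_half <- lm(recall ~ half_code)
fit_unit <- lm(recall ~ unit_code)

b_half <- coef(summary(fit_half))["half_code", ]
b_unit <- coef(summary(fit_unit))["unit_code", ]

# Estimate and std. error are halved under [-1, 1]; t and p are unchanged.
rbind(half = b_half, unit = b_unit)
```

Under the [-.5, .5] coding the estimate equals the difference between the two condition means, which is the number most readers will want to see.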
So Why -.5 and .5?
● Tells you the difference in means more directly!
– FILLER LOCATION: [-.5 .5]
– FILLER LOCATION: [-1 1]
– The actual difference between conditions is .048
– It would be perfectly correct to describe .024 as half the difference between levels and you could even put a CI around it … it's just less intuitive for your readers
● Both contrasts would account for the same amount of variance
● This is just another case of deciding the scale of a variable
– Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
Imbalanced Designs
● You may have an unequal number of observations per cell
– e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
– Ask Tuan or Scott about how to do this :)
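One common way to center codes in an imbalanced two-level design is to subtract the observation-weighted mean of the codes. The sketch below is only an assumption about what "centered" means here and is not necessarily the presenters' own procedure; the cell counts are hypothetical.

```r
# Hypothetical unequal cell counts (e.g. after dropping uncodable responses).
location <- factor(c(rep("Typical", 60), rep("Atypical", 40)),
                   levels = c("Typical", "Atypical"))

# Proportion of observations at each level.
p <- sapply(levels(location), function(l) mean(location == l))

# Start from [.5, -.5] and subtract the observation-weighted mean of the codes,
# so the coded variable averages 0 while the two codes still differ by 1.
raw      <- c(Typical = 0.5, Atypical = -0.5)
centered <- raw - sum(raw * p[names(raw)])

# One code per observation.
coded <- centered[as.character(location)]

centered
mean(coded)
```

Because the two codes still differ by 1, the estimate keeps its reading as the difference between the levels.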
Contrasts in R
● To check what the current contrasts are:
– contrasts(YourDataFrame$VariableName)
● To set the contrasts:
– contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
● Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
– e.g. (.33, .33, -.66) is one contrast
– Weights follow the order of levels(YourDataFrame$VariableName), which is alphabetical by default
● After setting contrasts, run lmer model to get the results of the contrasts
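A minimal end-to-end sketch of the steps above, using a toy data frame. The data frame `d`, its values, and the contrast column names are hypothetical. lm() is used only so the example runs without extra packages; lmer() reads the same contrasts attribute from the factor.

```r
# Toy data frame; names and values are hypothetical.
d <- data.frame(
  Location = rep(c("Typical", "Atypical", "Fluent"), each = 4),
  Recall   = c(.72, .68, .71, .69,  .66, .64, .67, .63,  .55, .57, .53, .55)
)

# R orders factor levels alphabetically by default (Atypical, Fluent, Typical),
# so fix the order explicitly to match the order of the weights below.
d$Location <- factor(d$Location, levels = c("Typical", "Atypical", "Fluent"))

# One column per contrast; exact thirds instead of the rounded .33 / -.66.
contrasts(d$Location) <- cbind(DisflVsFluent     = c(1/3, 1/3, -2/3),
                               TypicalVsAtypical = c(1/2, -1/2, 0))
contrasts(d$Location)   # check what was set

# Fit the model; each contrast gets its own coefficient.
fit <- lm(Recall ~ Location, data = d)
coef(fit)
```

Naming the columns of the contrast matrix makes the coefficient labels in the model output self-explanatory.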
Contrasts in R
● Should have j – 1 contrasts, where j = # of levels of the factor
● If using a subset of data, some levels of the factor may no longer be present
– e.g. you dropped a condition
– But, R still “remembers” that these levels exist and will get mad you didn't specify enough contrasts
– Fix this by reconverting to a factor:
YourDataFrame$Variable = factor(YourDataFrame$Variable)
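A small sketch of the leftover-level problem and the fix, on a hypothetical four-row data frame:

```r
# Hypothetical data frame with a three-level factor.
d <- data.frame(Location = factor(c("Typical", "Atypical", "Fluent", "Typical")))

# Drop the Fluent condition.
sub <- d[d$Location != "Fluent", , drop = FALSE]

# The unused level is still registered on the factor.
levels_before <- levels(sub$Location)

# Re-create the factor so only levels present in the data remain.
# droplevels(sub$Location) does the same thing.
sub$Location <- factor(sub$Location)
levels_after <- levels(sub$Location)

levels_before
levels_after
```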
Another R Tip
● To see the mean of each level of an I.V.:
– tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
– Could also do median, sd, etc.
● For a 2-way (or more!) table
– tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
● Gives NA for any cell that contains missing values
– But Tuan has made a version of tapply that fixes this problem
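Tuan's version of tapply is not shown in these slides. As a stand-in, base R's tapply passes extra arguments on to the summary function, so na.rm = TRUE handles missing values. The data below are hypothetical.

```r
# Hypothetical data with one missing response.
d <- data.frame(
  Recall   = c(.70, NA, .66, .64, .55, .57),
  Location = c("Typical", "Typical", "Atypical", "Atypical", "Fluent", "Fluent")
)

# Without na.rm, the cell containing the NA comes back as NA.
means_raw <- tapply(d$Recall, d$Location, mean)

# Extra arguments after the function are passed through to mean().
means_ok <- tapply(d$Recall, d$Location, mean, na.rm = TRUE)

means_raw
means_ok
```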
Multiple Comparisons (Here Comes Trouble!)
Multiple Comparisons
● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:
FLUENT / YOUNGER   TYPICAL / YOUNGER   ATYPICAL / YOUNGER
FLUENT / OLDER     TYPICAL / OLDER     ATYPICAL / OLDER
Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
● Suppose we find typical > fluent, but typical and atypical don't reliably differ
– Should expect atypical > fluent (to at least some degree)
● Or, we find a main effect of age
– Would expect to find an effect of age within at least some conditions if we looked at them individually
Multiple Comparisons
● Some comparisons are (wholly or partially) redundant
● j – 1 contrasts actually describe everything (j = # of levels)
– FLUENT vs. MEAN OF Typical, Atypical: .35730
– TYPICAL vs. ATYPICAL: .04825
● Can calculate all differences between levels based on this!
Multiple Comparisons
● Want to avoid multiple comparisons
● Error rate increases if you run overlapping, redundant tests
● Suppose we have the wrong value for one of the means (due to sampling error, etc.)
● In a single test, we set alpha so there is a 5% (.05) chance of incorrectly rejecting H0
Multiple Comparisons
● But now we run a 2nd test comparing that same “bad” condition to another condition
● Outcome of this test is correlated with the previous one since they both refer to one of the same conditions
● Not an independent 5% chance of error
● Multiple tests compound Type I error rate
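To put rough numbers on how the error rate compounds, the sketch below computes the chance of at least one false rejection across k tests at alpha = .05. The independent-tests formula is only a reference case, since the tests discussed here are correlated. Whatever the dependence between tests, the rate cannot exceed k × alpha.

```r
alpha <- 0.05
k <- 1:6   # number of tests run

# Familywise error rate if the k tests were independent (reference case only).
fwer_independent <- 1 - (1 - alpha)^k

# Upper bound that holds under any dependence between the tests.
upper_bound <- pmin(1, k * alpha)

round(data.frame(k, fwer_independent, upper_bound), 4)
```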
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0
– Matrix of contrasts is made up of orthogonal vectors
– Can think of this as the contrasts being uncorrelated with each other
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .33      x     .50      =    .165
ATYPICAL       .33      x    -.50      =   -.165
FLUENT        -.66      x     0        =    0
                                     SUM =  0
Orthogonality
● Avoid this issue w/ orthogonal contrasts
– Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .50      x     .50      =    .25
ATYPICAL      -.50      x     0        =    0
FLUENT         0        x    -.50      =    0
                                     SUM =  .25 (not 0, so these two contrasts are not orthogonal)
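The orthogonality check is easy to script. This sketch tests the contrast pair used throughout the talk, plus a second pair in which both contrasts involve Typical (Typical vs. Atypical and Typical vs. Fluent). The helper name is_orthogonal is made up for illustration.

```r
# Hypothetical helper: TRUE if the products of the weights sum to (about) zero.
is_orthogonal <- function(x, y) abs(sum(x * y)) < 1e-8

# The pair used throughout the talk (exact thirds instead of .33 / -.66).
c1 <- c(Typical = 1/3, Atypical =  1/3, Fluent = -2/3)
c2 <- c(Typical = 1/2, Atypical = -1/2, Fluent =  0)

# A pair that shares the Typical level on the + side of both contrasts.
a <- c(Typical = 0.5, Atypical = -0.5, Fluent =  0)
b <- c(Typical = 0.5, Atypical =  0,   Fluent = -0.5)

sum(c1 * c2)          # sum of products for the first pair
is_orthogonal(c1, c2)
sum(a * b)            # sum of products for the second pair
is_orthogonal(a, b)
```

For a full contrast matrix, crossprod() on the matrix gives all pairwise sums of products at once; the off-diagonal entries should be zero.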
Corrections
● “But, Scott, I really want to do more than j – 1 comparisons”
● Can apply corrections to control Type I error
● Bonferroni: Multiply p value by # of comparisons
– Worst case scenario
● Less conservative corrections may be available
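In R, base p.adjust() applies these corrections. The p values below are invented for illustration. Holm is shown as one example of a less conservative correction, not necessarily the one the presenters had in mind.

```r
# Hypothetical uncorrected p values from four comparisons.
p_raw <- c(0.004, 0.020, 0.030, 0.200)

# Bonferroni: each p value times the number of comparisons (capped at 1).
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down method whose adjusted p values never exceed Bonferroni's.
p_holm <- p.adjust(p_raw, method = "holm")

data.frame(p_raw, p_bonf, p_holm)
```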
How Does it Work?
[Model diagram: RECALL = PRIOR KNOWLEDGE + LOCATION OF DISFLUENCY + SUBJECT + ITEM]
Behind the scenes...
How Does it Work?
[Model diagram: RECALL = PRIOR KNOWLEDGE + LOCATION OF DISFLUENCY + SUBJECT + ITEM]
Y = β0 + β1X1 + β2X2 + β3X3 + ...
● Each categorical factor gets coded as j - 1 variables
– j = number of levels in that factor
– Number of contrasts you have
How Does it Work?
● Each coded variable represents one of your contrasts
Y = β0 + β1X1 + β2X2 + β3X3 + ...
CONTRAST 1:
X2 = .33 if typical location for disfluencies
X2 = .33 if atypical
X2 = -.66 if fluent
● Value of contrast: β2
● Sig. difference between levels if β2 differs from 0
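The coded variables can be inspected with model.matrix(), which builds the X columns that the model actually sees. The sketch below uses one observation per level and hypothetical contrast names C1 and C2.

```r
# One observation per level, in the order used on the slides.
Location <- factor(c("Typical", "Atypical", "Fluent"),
                   levels = c("Typical", "Atypical", "Fluent"))

# j - 1 = 2 contrasts for a 3-level factor (exact thirds instead of .33 / -.66).
contrasts(Location) <- cbind(C1 = c(1/3, 1/3, -2/3),
                             C2 = c(1/2, -1/2, 0))

# Each contrast becomes one numeric column of the design matrix.
X <- model.matrix(~ Location)
X
```

Each row of X is what gets multiplied by the β weights to produce the prediction for that observation.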
Other Kinds of Coding
● Dummy/Treatment Coding
– Compare all levels to a baseline level
– Doesn't allow direct comparisons between non-baseline levels
– R does this by default :(

            X2    X3
Typical      1     0
Atypical     0     1
Fluent       0     0
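The same coding can be generated with base R's contr.treatment(). In this sketch the level order and the choice of Fluent as baseline are assumptions made to match the table above. R's own default takes the first level as the baseline.

```r
lv <- c("Typical", "Atypical", "Fluent")

# Treatment (dummy) coding with the 3rd level, Fluent, as the baseline:
# the baseline row is all zeros, every other level gets its own 0/1 column.
trt <- contr.treatment(lv, base = 3)
trt

# R's default for an unordered factor: treatment coding with the first
# level as baseline (here, Typical).
contrasts(factor(lv, levels = lv))
```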
Other Kinds of Coding
● Sum/Effects Coding
– Test whether each level differs from overall mean or from chance
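For reference, base R's contr.sum() produces sum (effects) coding. The level order below is an assumption. With this coding each coefficient is one level's deviation from the mean of the level means, and the last level gets no coefficient of its own.

```r
lv <- c("Typical", "Atypical", "Fluent")

# Sum coding: each column codes one level as 1 and the last level as -1.
sm <- contr.sum(lv)
sm

# Each column sums to zero, so the intercept is the mean of the level means.
colSums(sm)
```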
Contrasts & Interactions
● Contrasts also apply in cases where we have interactions between variables
● Interaction term represents whether the value of the contrast depends on another variable
● We'll see some examples on the next slides
Interaction Example
● Suppose we also sampled different age groups in the disfluency experiment
– 3 x 2 design

                 Story Type
Group            FLUENT           TYPICAL                        ATYPICAL
YOUNG ADULTS     Fluent, young    Typical disfluencies, young    Atypical disfluencies, young
OLDER ADULTS     Fluent, older    Typical disfluencies, older    Atypical disfluencies, older

● What are possible patterns of results?
Possible Result 1
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    no
AGE           no
C1 x AGE      no
C2 x AGE      no

● Contrast 1 significant
– Effect of disfluencies
● Contrast 2 non-sig.
– Location irrelevant
● No effect of age at all in this case
– Everything the same for both age groups
Possible Result 2
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           no
C1 x AGE      no
C2 x AGE      no

● Contrast 2 is now significant
– Typical > atypical
● Still no effect of AGE
Possible Result 3
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      no

● Now, AGE effect
– Older adults remember more across the board
● But, no interaction
– Disfluency effect is the same in both age groups
Possible Result 4
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      no

● Contrast 1 interacts with AGE
– Effect of the presence of disfluencies differs across age
– Effect only for young adults
● Contrast 2 (location) still same in all cases
Possible Result 5
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      yes

● Now, Contrast 2 also interacts with AGE
– Reversal of Typical vs Atypical effect across age
Possible Result 6
[Two plots, YOUNG and OLDER; x-axis: Before Plot Point, After Plot Point, Rest of Story; y-axis: 0 to 9]

SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      yes

● Contrast 2 interaction but not Contrast 1
– Typical vs Atypical comparison does depend on age
– Overall effect of having fillers does not
Interactions in R
● Implementing interactions in an R model formula (lmer or otherwise):
– A + B: Main effects of A and B, no interaction
– A * B: All possible interactions and main effects of A and B
– A : B: Interaction of A and B, no main effect (unless you add it separately)
● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
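To see exactly which terms a formula expands to before fitting anything, base R's terms() can be queried. The helper name labels_of and the variable names y, A, B, C are placeholders. The last line shows one way to cap the interaction order, which helps when there are many predictors.

```r
# Hypothetical helper: list the terms a model formula expands to.
labels_of <- function(f) attr(terms(f), "term.labels")

labels_of(y ~ A + B)   # main effects only
labels_of(y ~ A * B)   # main effects plus their interaction
labels_of(y ~ A:B)     # interaction only

# (A + B + C)^2 gives all main effects and 2-way interactions,
# but leaves out the 3-way interaction.
labels_of(y ~ (A + B + C)^2)
```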