Contrast Coding Or: One of These Levels is Not Like the Others


Author: Earl Bradley


Scott Fraundorf (and Tuan Lam) MLM Reading Group – 03.10.11

Administrivia

● 3/10 (TODAY): Contrast coding overview
● 4/7: Simple vs main effects
● 4/21: Principal components analysis
● 1st week of May: Harald Baayen visit

Outline

● Why use contrast coding?
● Example contrasts
● Contrast estimates
● Contrasts in R
● Multiple comparisons
● How does it work?
● Other kinds of coding
● Interactions

Why Use Contrast Coding?

● Scott's example study: examining recall memory for spoken discourse as a function of:
  – Location of disfluencies (categorical variable)
  – Prior story knowledge (continuous variable)

RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM

Why Use Contrast Coding?

RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM

● Regression equation: predicts values
  – Could use this to predict whether or not something will be remembered
● But in cognitive psych:
  – Often interested in the effect of specific levels
  – Test which ones differ significantly


Contrast Coding

[Figure: % of story recalled (y-axis, 0.4 to 0.8) by condition: Typical, Atypical, Fluent]

● Example: Fluent vs. disfluencies in typical locations vs. in atypical locations
● Which ones differ significantly?

Contrast Coding

● Contrasts: Test differences between specific levels
  – Same as a planned comparison in an ANOVA
  – Also analogous to a post-hoc test
● Planned comparisons vs post-hoc tests
  – If we are deciding tests post-hoc, there is a greater chance of capitalizing on chance / finding a spurious effect
  – Contrasts are set before you fit the model, but it would be possible to go back and change the contrasts afterwards
  – We are basically on the honor system here: there is no way to prove the comparison was planned ahead of time

Contrasts!

● Contrasts are like weighted sums of means
  – In a multiple regression / MLM context, also subject to the other variables in the model
● Using your scale to test what's different

Contrast Coding

[Figure: % of story recalled (y-axis, 0.4 to 0.8) by condition: Typical, Atypical, Fluent]

It looks like the Fluent stories might not be remembered as well. Let's use a contrast to test this.

Contrasts

Question 1: Do disfluencies affect recall?

Levels: TYPICAL, ATYPICAL, FLUENT

● One side positive. One side negative.

Contrasts

Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
  .33       .33       -.66

● One side positive. One side negative.
  – This determines which levels are being compared (+ versus -)
  – Doesn't really matter which side you choose as the + side. It only affects the sign of the result, not its magnitude or statistical significance

Contrasts

Contrast weights are assigned:

TYPICAL   ATYPICAL   FLUENT
  .33       .33       -.66

● One side positive. One side negative.
● Codes add up to zero.
● Also nice to have the absolute values of the + code and the – code sum to 1. (We'll see why later.)
  – abs(.33) + abs(-.66) = .99, i.e. 1 up to rounding (the exact weights are 1/3 and -2/3)

Contrasts

TYPICAL   ATYPICAL   FLUENT
  .33       .33       -.66

● One side positive. One side negative.
● Codes add up to zero.
● Does the contrast differ significantly from zero? If so, the difference between levels is significant.
● Can conceptualize the comparison as:
  Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent) (holding other variables constant)

Contrasts

TYPICAL   ATYPICAL   FLUENT
  .33       .33       -.66

Contrast 1: .33(Typical) + .33(Atypical) - .66(Fluent)   * (significant)
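To make the "weighted sum of means" idea concrete, here is a minimal sketch that applies Contrast 1 to made-up condition means. The means are hypothetical (they are not the study's data), and the weights use exact thirds rather than the rounded .33 / .66 shown on the slide.

```r
# Hypothetical condition means (illustrative only; NOT the study's data)
cond_means <- c(Typical = 0.70, Atypical = 0.65, Fluent = 0.55)

# Contrast 1: disfluent (Typical, Atypical) vs. Fluent, using exact thirds
w1 <- c(Typical = 1/3, Atypical = 1/3, Fluent = -2/3)

# The weights sum to zero, so the contrast is 0 when all means are equal
weights_sum <- sum(w1)

# Contrast value = weighted sum of the condition means
contrast1 <- sum(w1 * cond_means)
print(contrast1)
```

A positive value here means the disfluent conditions have the higher weighted mean. In a fitted regression the reported estimate also depends on how the codes are scaled, which the Contrast Estimates slides take up.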

Contrast Coding

[Figure: % of story recalled (y-axis, 0.4 to 0.8) by condition: Typical, Atypical, Fluent; * marks the significant first contrast]

Our first contrast reveals that fluent stories are remembered worse. Now let's look at Typical vs Atypical.

We always have j – 1 contrasts, where j = the # of levels of the factor. So, here 2 contrasts are needed to fully describe the differences between levels.

Contrasts

Question 2: Does location of disfluencies matter?

Levels compared: TYPICAL, ATYPICAL

● One side positive. One side negative.

Contrasts

TYPICAL   ATYPICAL   FLUENT
  .50       -.50       0 (zeroed out here!)

● Codes add up to zero.
● Sum of absolute values of codes is 1.

Contrast 2: .50(Typical) - .50(Atypical) + 0(Fluent)

Contrast Coding

[Figure: % of story recalled (y-axis, 0.4 to 0.8) by condition: Typical, Atypical, Fluent; one comparison marked * (significant), the other marked n.s.]

One Important Point!

● Choice of contrasts doesn't affect total variance accounted for by the variable
  – Contrasts are only about differences between levels
  – Can divide this up in multiple different ways and still account for the same total variance

[Figure: variance accounted for by LOCATION IN STORY]


Why -.5 and .5?

● Why [-.5 .5] instead of [-1 1]?
● Doesn't affect significance test
● Does affect β weight (estimate)
  – Std error is also scaled accordingly

[Model output compared for FILLER LOCATION: [-.5 .5] and FILLER LOCATION: [-1 1]]

Contrast Estimates

                     CONTRAST CODE
ATYPICAL LOCATION         .5
TYPICAL LOCATION         -.5
(difference between codes: 1)

● Beta weight (estimate) represents the effect of a 1-unit change in the contrast, holding everything else constant
● In this case, a 1-unit change in the contrast IS the difference between the levels' codes
● Thus, the contrast correctly represents .04825 as the difference between the conditions

Contrast Estimates

                     CONTRAST CODE
ATYPICAL LOCATION          1
TYPICAL LOCATION          -1
(difference between codes: 2)

● Here, the total difference between the levels' codes is 2
● So, a 1-unit change in the contrast is only HALF the difference between the levels' codes
● Thus, the estimate of the contrast is .024 … only half the difference between the conditions

Contrast Estimates

Beta weight (estimate) represents the effect of a 1-unit change in the contrast

                     CONTRAST CODE    CONTRAST CODE
ATYPICAL LOCATION         .5                1
TYPICAL LOCATION         -.5               -1
Difference in codes        1                2

● [-.5 .5]: a 1-unit change in the contrast IS the difference between levels (.04825 in this case)
● [-1 1]: a 1-unit change in the contrast is only half the difference between levels

So Why -.5 and .5?

[Model output compared for FILLER LOCATION: [-.5 .5] and FILLER LOCATION: [-1 1]]

● Better tells you about the difference in means!
  – The actual difference between conditions is .048
  – It would be perfectly correct to describe .024 as half the difference between levels and you could even put a CI around it … it's just less intuitive for your readers
● Both contrasts would account for the same amount of variance
● This is just another case of deciding the scale of a variable
  – Akin to measuring temperature in C versus F … both account for the same variance, but the numbers are on different scales
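The scaling point can be checked on simulated data. The sketch below is illustrative only: the data are randomly generated, `lm()` stands in for a mixed model, and the variable names are made up for the example.

```r
set.seed(1)  # simulated data, for illustration only

# Balanced two-level design: 50 hypothetical observations per location
location <- rep(c("Typical", "Atypical"), each = 50)
recall <- ifelse(location == "Atypical", 0.65, 0.60) + rnorm(100, sd = 0.1)

# Same comparison coded two ways
code_half <- ifelse(location == "Atypical", 0.5, -0.5)  # codes differ by 1
code_one  <- ifelse(location == "Atypical", 1, -1)      # codes differ by 2

fit_half <- summary(lm(recall ~ code_half))$coefficients
fit_one  <- summary(lm(recall ~ code_one))$coefficients

# Under [-.5 .5] the estimate is the full difference in means;
# under [-1 1] it is half that, and the std. error is halved too
est_half <- fit_half["code_half", "Estimate"]
est_one  <- fit_one["code_one", "Estimate"]
t_half   <- fit_half["code_half", "t value"]
t_one    <- fit_one["code_one", "t value"]

mean_diff <- mean(recall[location == "Atypical"]) -
  mean(recall[location == "Typical"])
```

Because the estimate and its standard error shrink by the same factor, the t value (and so the significance test) is identical under both codings.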

Imbalanced Designs

● You may have an unequal number of observations per cell
  – e.g. some data lost, or responses not codable
● Correct for this in your contrast codes if you want things centered
  – Ask Tuan or Scott about how to do this :)
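One common way to center codes in an unbalanced design is to subtract the mean of the codes across observations. Whether this is the correction the presenters had in mind is an assumption; the cell counts below are hypothetical.

```r
# Hypothetical unbalanced cell counts: 60 Typical, 40 Atypical observations
location <- rep(c("Typical", "Atypical"), times = c(60, 40))

# Start from the usual [-.5 .5] codes
code <- ifelse(location == "Typical", 0.5, -0.5)
code_mean <- mean(code)   # not zero, because the cells are unequal

# Center by subtracting the mean of the codes across observations
code_centered <- code - code_mean
print(unique(code_centered))
```

The two centered codes still differ by 1, so the estimate keeps its interpretation as the difference between the levels.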


Contrasts in R

● To check what the current contrasts are:
  – contrasts(YourDataFrame$VariableName)
● To set the contrasts:
  – contrasts(YourDataFrame$VariableName) = cbind(c(.33,.33,-.66),c(.50,-.50,0))
  – Each c(xx,yy,zz) is the weights for one of the contrasts you want to run
  – e.g. (.33, .33, -.66) is one contrast
● After setting contrasts, run the lmer model to get the results of the contrasts
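The steps above can be sketched end to end on simulated data. Everything here is hypothetical: the data frame `d`, the columns `Location` and `Recall`, and the contrast names are made up, and `lm()` is used instead of `lmer()` only to keep the example free of extra packages.

```r
set.seed(2)  # simulated data, for illustration only

# Hypothetical data frame: 3-level factor, 40 observations per level
d <- data.frame(
  Location = factor(rep(c("Typical", "Atypical", "Fluent"), each = 40),
                    levels = c("Typical", "Atypical", "Fluent"))
)
true_means <- c(Typical = 0.70, Atypical = 0.65, Fluent = 0.55)
d$Recall <- unname(true_means[as.character(d$Location)]) +
  rnorm(nrow(d), sd = 0.1)

# Rows of the matrix follow levels(d$Location): Typical, Atypical, Fluent
contrasts(d$Location) <- cbind(DisflVsFluent = c(1/3, 1/3, -2/3),
                               TypVsAtyp     = c(0.5, -0.5, 0))

# An lmer() call would add random effects for subjects and items
fit <- lm(Recall ~ Location, data = d)
print(coef(summary(fit)))

# Observed cell means, for comparison with the estimates
cell <- tapply(d$Recall, d$Location, mean)
```

The order of weights must match the order of `levels()` for the factor, so it is worth printing `contrasts(d$Location)` after setting them. In this balanced example each estimate equals the corresponding difference in observed cell means.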

Contrasts in R

● Should have j – 1 contrasts, where j = # of levels of the factor
● If using a subset of data, some levels of the factor may no longer be present
  – e.g. you dropped a condition
  – But, R still “remembers” that these levels exist and will get mad that you didn't specify enough contrasts
  – Fix this by reconverting to a factor:
    YourDataFrame$Variable = factor(YourDataFrame$Variable)
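A small sketch of the dropped-level problem, using a made-up data frame (`d`, `Cond`, and `sub` are hypothetical names):

```r
# Hypothetical data: three conditions, then the Fluent condition is dropped
d <- data.frame(Cond = factor(c("Typical", "Atypical", "Fluent", "Typical")))
sub <- d[d$Cond != "Fluent", , drop = FALSE]

# The subset still carries the unused "Fluent" level
n_before <- nlevels(sub$Cond)

# Re-create the factor so the unused level is forgotten
sub$Cond <- factor(sub$Cond)
n_after <- nlevels(sub$Cond)

# With two levels left, a single contrast (j - 1 = 1) is enough;
# rows follow levels(sub$Cond): Atypical, Typical
contrasts(sub$Cond) <- cbind(c(0.5, -0.5))
```

Base R's `droplevels()` does the same job for a single factor or for every factor in a data frame.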

Another R Tip

● To see the mean of each level of an I.V.:
  – tapply(YourDataFrame$DVName, YourDataFrame$IVName, mean)
  – Could also do median, sd, etc.
● For a 2-way (or more!) table:
  – tapply(YourDataFrame$DVName, list(YourDataFrame$IVName1, YourDataFrame$IVName2), mean)
● Doesn't work if you have missing values: any cell containing an NA gets a mean of NA
  – But Tuan has made a version of tapply that fixes this problem
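Tuan's version is not shown in the slides, so the sketch below is a base-R alternative rather than that function: extra arguments to `tapply()` are passed on to the summary function, so `na.rm = TRUE` reaches `mean()`. The data are made up.

```r
# Hypothetical data with one missing DV value
d <- data.frame(
  IV = factor(c("Typical", "Typical", "Atypical", "Atypical")),
  DV = c(0.7, NA, 0.6, 0.5)
)

# Without na.rm, any cell containing an NA has an NA mean
with_na <- tapply(d$DV, d$IV, mean)

# Extra arguments to tapply() are passed on to the function
without_na <- tapply(d$DV, d$IV, mean, na.rm = TRUE)
print(without_na)
```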


Multiple Comparisons (Here Comes Trouble!)

Multiple Comparisons

● Lots of comparisons you can run
● Suppose we tested both young & older adults on the disfluency task:

FLUENT / YOUNGER    TYPICAL / YOUNGER    ATYPICAL / YOUNGER
FLUENT / OLDER      TYPICAL / OLDER      ATYPICAL / OLDER

Multiple Comparisons

● Some comparisons are (wholly or partially) redundant
  – Suppose we find typical > fluent, but typical and atypical don't reliably differ
  – Should expect atypical > fluent (to at least some degree)
  – Or, we find a main effect of age
  – Would expect to find an effect of age within at least some conditions if we looked at them individually

Multiple Comparisons

● Some comparisons are (wholly or partially) redundant
● j – 1 contrasts actually describe everything
  – j = # of levels

FLUENT vs. MEAN OF Typical, Atypical:   .35730
TYPICAL vs. ATYPICAL:                   .04825

Can calculate all differences between levels based on this!

Multiple Comparisons

● Want to avoid multiple comparisons
  – Error rate increases if you run overlapping, redundant tests
  – Suppose we have the wrong value for one of the means (due to sampling error, etc.)
  – In a single test, we set alpha so there is a 5% (.05) chance of incorrectly rejecting H0

Multiple Comparisons

● But now we run a 2nd test comparing that same “bad” condition to another condition
  – Outcome of this test is correlated with the previous one, since they both refer to one of the same conditions
  – Not an independent 5% chance of error
  – Multiple tests compound Type I error rate

Orthogonality

● Avoid this issue w/ orthogonal contrasts
  – Products of weights (across contrasts) sum to 0
  – Matrix of contrasts is made up of orthogonal vectors
  – Can think of this as the contrasts being uncorrelated with each other

Orthogonality

● Avoid this issue w/ orthogonal contrasts
  – Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .33      x     .50      =    .165
ATYPICAL       .33      x    -.50      =   -.165
FLUENT        -.66      x      0       =     0
                                   SUM =     0

Orthogonality

● Avoid this issue w/ orthogonal contrasts
  – Products of weights (across contrasts) sum to 0

            CONTRAST 1     CONTRAST 2     PRODUCT
TYPICAL        .50      x     .50      =    .25
ATYPICAL      -.50      x      0       =     0
FLUENT          0       x    -.50      =     0
                                   SUM =    .25  (not 0, so these two contrasts are not orthogonal)
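The two worked tables can be checked in a couple of lines. This is a minimal sketch; the helper `sum_of_products` and the matrix names are made up for the example.

```r
# Rows: Typical, Atypical, Fluent
orthogonal_set  <- cbind(c1 = c(.33, .33, -.66), c2 = c(.50, -.50, 0))
overlapping_set <- cbind(c1 = c(.50, -.50, 0),   c2 = c(.50, 0, -.50))

# Sum of the products of weights across two contrasts
sum_of_products <- function(m) sum(m[, 1] * m[, 2])

ok_sum  <- sum_of_products(orthogonal_set)   # 0: orthogonal
bad_sum <- sum_of_products(overlapping_set)  # 0.25: not orthogonal
```

With more than two contrasts, `crossprod(m)` returns every pairwise sum of products at once; the set is orthogonal when all off-diagonal entries are 0.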

Corrections

● “But, Scott, I really want to do more than j – 1 comparisons”
● Can apply corrections to control Type I error
● Bonferroni: Multiply p value by # of comparisons
  – Worst case scenario
● Less conservative corrections may be available
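Base R's `p.adjust()` implements Bonferroni along with several less conservative methods. The sketch below uses made-up p values, and Holm is shown only as one example of a less conservative option, not as the presenters' recommendation.

```r
# Hypothetical p values from three comparisons (illustrative only)
p_raw <- c(0.010, 0.020, 0.040)

# Bonferroni: multiply each p value by the number of comparisons (capped at 1)
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Holm: a step-down procedure that is never more conservative than Bonferroni
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, bonferroni = p_bonf, holm = p_holm))
```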


How Does it Work?

RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM

Behind the scenes...

How Does it Work?

RECALL = LOCATION OF DISFLUENCY + PRIOR KNOWLEDGE + SUBJECT + ITEM

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

● Each categorical factor gets coded as j – 1 variables
  – j = number of levels in that factor
  – Number of contrasts you have

How Does it Work?

● Each coded variable represents one of your contrasts

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$

CONTRAST 1:
$X_2$ =  .33 if typical location for disfluencies
         .33 if atypical
        -.66 if fluent

● Value of contrast: $\beta_2$
● Sig. difference between levels if $\beta_2$ differs from 0
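To see the coded variables R builds behind the scenes, `model.matrix()` can be applied to a factor with contrasts set. This is a sketch with a made-up factor `loc` and made-up contrast names `C1` and `C2`.

```r
# Hypothetical factor with one observation per level
loc <- factor(c("Typical", "Atypical", "Fluent"),
              levels = c("Typical", "Atypical", "Fluent"))
contrasts(loc) <- cbind(C1 = c(.33, .33, -.66), C2 = c(.50, -.50, 0))

# One intercept column plus j - 1 = 2 coded columns
X <- model.matrix(~ loc)
print(X)
```

Each row of `X` holds the values of the coded variables for one observation; the fitted β for a column is the estimate for that contrast.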


Other Kinds of Coding

● Dummy/Treatment Coding
  – Compare all levels to a baseline level
  – Doesn't allow direct comparisons between non-baseline levels
  – R does this by default :(

            X2    X3
Typical      1     0
Atypical     0     1
Fluent       0     0

Other Kinds of Coding

● Dummy/Treatment Coding
  – Compare all levels to a baseline level
  – Doesn't allow direct comparisons between non-baseline levels
  – R does this by default :(
● Sum/Effects Coding
  – Test whether each level differs from overall mean or from chance
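Both coding schemes are available as built-in functions in base R. The sketch below prints the matrices for the three levels of the example; choosing Fluent as the treatment baseline is an assumption made to match the table on the slide (by default R takes the first level as baseline).

```r
lv <- c("Typical", "Atypical", "Fluent")

# Treatment (dummy) coding with Fluent (the 3rd level) as the baseline
treat <- contr.treatment(lv, base = 3)
print(treat)

# Sum (effects) coding: the last level gets -1 in every column
sums <- contr.sum(lv)
print(sums)
```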


Contrasts & Interactions

● Contrasts also apply in cases where we have interactions between variables
● Interaction term represents whether the value of the contrast depends on another variable
● We'll see some examples on the next slides

Interaction Example

● Suppose we also sampled different age groups in the disfluency experiment
  – 3 x 2 design

                          Story Type
Group           FLUENT          TYPICAL                        ATYPICAL
YOUNG ADULTS    Fluent, young   Typical disfluencies, young    Atypical disfluencies, young
OLDER ADULTS    Fluent, older   Typical disfluencies, older    Atypical disfluencies, older

● What are possible patterns of results?

Possible Result 1

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    no
AGE           no
C1 x AGE      no
C2 x AGE      no

● Contrast 1 significant
  – Effect of disfluencies
● Contrast 2 non-sig.
  – Location irrelevant
● No effect of age at all in this case
  – Everything the same for both age groups

Possible Result 2

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           no
C1 x AGE      no
C2 x AGE      no

● Contrast 2 is now significant
  – Typical > atypical
● Still no effect of AGE

Possible Result 3

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      no

● Now, AGE effect
  – Older adults remember more across the board
● But, no interaction
  – Disfluency effect is the same for both age groups

Possible Result 4

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      no

● Contrast 1 interacts with AGE
  – Effect of the presence of disfluencies differs across age
  – Effect only for young adults
● Contrast 2 (location) still same in all cases

Possible Result 5

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      yes
C2 x AGE      yes

● Now, Contrast 2 also interacts with AGE
  – Reversal of Typical vs Atypical effect across age

Possible Result 6

[Figure: two panels (YOUNG, OLDER); y-axis 0 to 9; x-axis: Before Plot Point, After Plot Point, Rest of Story]

              SIGNIFICANT?
CONTRAST 1    yes
CONTRAST 2    yes
AGE           yes
C1 x AGE      no
C2 x AGE      yes

● Contrast 2 interaction but not Contrast 1
  – Typical vs Atypical comparison does depend on age
  – Overall effect of having fillers does not

Interactions in R

● Implementing interactions in an R model formula (lmer or otherwise):
  – A + B
    ● Main effects of A and B, no interaction
  – A * B
    ● All possible interactions and main effects of A and B
  – A : B
    ● Interaction of A and B, no main effect (unless you add it separately)
● In, say, a corpus analysis with 20 predictors, you wouldn't want to test a 20-way interaction … but this lets you control what to include
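A quick way to see which terms a formula expands to is to inspect its term labels. In this sketch `y`, `A`, `B`, and `C` are placeholder names, and the `^2` form is an addition not shown on the slide: it is standard R formula syntax for capping the order of interactions.

```r
# Term labels produced by each formula; y, A, B, C are placeholder names
labels_plus  <- attr(terms(y ~ A + B), "term.labels")
labels_star  <- attr(terms(y ~ A * B), "term.labels")
labels_colon <- attr(terms(y ~ A : B), "term.labels")

# (A + B + C)^2 keeps main effects and 2-way interactions only
labels_pow2 <- attr(terms(y ~ (A + B + C)^2), "term.labels")

print(list(plus = labels_plus, star = labels_star,
           colon = labels_colon, pow2 = labels_pow2))
```

The same formula syntax carries over to the fixed-effects part of an lmer model.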