## Multiple Regression Analysis

Multiple Regression Analysis y = β0 + β1x1 + β2x2 + . . . βkxk + u 5. Dummy Variables 0 Dummy Variables !   A dummy variable (or “indicator variabl...
Author: Elinor Williams
Multiple Regression Analysis y = β0 + β1x1 + β2x2 + . . . βkxk + u 5. Dummy Variables

0

Dummy Variables !   A dummy variable (or “indicator variable” is a variable that takes on the value 1 or 0 Examples: !   male (=1 if male, 0 otherwise) !   south (=1 if in the south, 0 otherwise) !   year70 (=1 if year=1970, 0 otherwise) !   Dummy variables are also called binary variables, for obvious reasons 1

A Dummy Independent Variable !   Dummy variables can be quite useful in regression ! ! ! ! ! !

analysis   Consider the simple model with one continuous variable (x) and one dummy (d)   y = β0 + δ0d + β1x + u   We can interpret d as an intercept shift   If d = 0, then y = β0 + β1x + u   If d = 1, then y = (β0 + δ0) + β1x + u   Think of the case where d = 0 as the base group (or reference group)

2

Example of δ0 > 0 y

y=(β0 + δ0) + β1x d= 1

δ0

{

d=0

} β0

slope = β1 for both!

y = β0 + β1x x 3

Dummy Trap !   Suppose we include a dummy variable called ! ! ! !

female (=1 if female, 0 otherwise)   We could not also include a dummy variable called male (=1 if male, 0 otherwise)   This is because for all i, female=1-male;  perfect colinearity   Intuitively it wouldn’t make sense to have both, we only need 2 intercepts for the 2 groups   In general, with n groups use n-1 dummies 4

Dummies With log(y) !   Not surprisingly, when y is in log form the

coefficients have a percentage interpretation !   Suppose the model is: log(y) = β0 + δ0d + β1x + u !   100(δ0) = is (very roughly) the percentage change in y between two groups holding x fixed Example: !   Regression of log of housing prices on dummy for “heritage” and square feet 5

Interpreting coefficient on dummy with logs Actually, it’s not correct to say that 100*(δ0) is the percent change in y associated with a 1 unit change in x. Because the model is non-linear, we have to think of the RHS parameters as partial derivatives of the LHS w.r.t. the RHS variables of interest. But when we take a derivative, we are asking what the change in the LHS is as we make a small change in one of the RHS variables, holding the others constant. Changing a dummy variable from 0 to 1is not a small change. It’s a discrete jump. 6

Interpreting coefficient on dummy with log y Changing d from 0 to 1 leads to a 100 * (eδ 0 − 1)% change in y. Changing d from 1 to 0 leads to a 100 * (e −δ 0 − 1)% change in y. For instance, if δ0=0.2, this implies that changing d from 0 to 1 leads to a 22% increase in y. Changing d from 1 to 0 leads to an 18% decrease in y. For small values of δ0, the “100*δ0%” approximation will be close. Not so for larger values. 7

Interpreting coefficient on dummy with log y Things get slightly more complicated when we obtain an estimate of δ0. You can’t just plug the OLS estimate of δ0 into the formula on the previous slide, as this will produce a biased estimate of the effect of changing d from 0 to 1. Instead, for an estimate of the effect of changing d from 0 to 1 use the formula: ⎛ (δˆ0 − 12 (se(δˆ0 ))2 ) ⎞ 100 * ⎜ e − 1⎟ % ⎝ ⎠ For more on quirks involving dummy variables, see http://davegiles.blogspot.ca/2011/03/dummies-fordummies.html 8

Dummies for Multiple Categories 1. Several dummy variables can be included in the same equation, e.g., y = β0 + δ0d0 + δ1d1 + β1x + u !   Here the base case is whatever is implied by d0 = d1 = 0 (i.e. when all the dummy variables are zero) Example: log(hprice)= β0 + δ0heritage + δ1pool + β1sqft + u !   The base case is a non-heritage house with no pool !   δ0 gives information about the effect of heritage status (vs. not) on house price; δ1 gives us information about how a pool changes house value 9

Dummies for Multiple Categories 2. We can use dummy variables to control for one variable with more than two categories !   Suppose everyone in your data is either a HS dropout, HS grad only, or college grad (at least) !   To compare HS and college grads to HS dropouts, include 2 dummy variables hsgrad = 1 if HS grad only, 0 otherwise; and colgrad = 1 if college grad, 0 otherwise

10

Multiple Categories (cont) !   Any categorical variable can be turned into a set

of dummy variables !   However, because the base group is represented by the intercept, if there are n categories there should be n-1 dummy variables Example: Could turn # of bedrooms into dummies !   If there are a lot of categories, it may make sense to group some of them together Example: Age, usually divide into age categories !   In fact, can convert continuous variables as well 11

Other Interactions with Dummies !   Can also consider interacting a dummy variable, d, ! ! ! ! !

with a continuous variable, x   y = β0 + δ0d + β1x + δ1d*x + u   If d = 0, then y = β0 + β1x + u   If d = 1, then y = (β0 + δ0) + (β1 + δ1)x + u   i.e. the dummy variable allows both the intercept and the slope to change depending on whether it is “switched” on or off   Example: suppose δ0>0 and δ1 0 and δ1 < 0 y y = β0 + β1x d=0 d=1 y = (β0 + δ0) + (β1 + δ1) x x

15

Testing for Differences Across Groups !   Sometimes we will want to test whether a

regression function (intercept and slopes) differs across groups Example: Wage Equation log(wage)=β0 + β1educ + β2exper + u !   We might think that the intercept and slopes will differ for men and women (e.g. discrimination), so log(wage)=β0 + δ0male + β1educ + δ1male*educ + β2exper + δ2male*exper + u 16

Testing Across Groups (cont) !   This can be thought of as simply testing for the joint significance of the dummy and its interactions with all other x variables !   So, we could estimate the model with all the interactions (unrestricted) and without (restricted) and form the F statistic !   However, forming all of the interaction variables etc. can be unwieldy !   There is an alternative way to form the F statistic 17

The Chow Test !   It turns out that we can compute the proper F

statistic without estimating the unrestricted model !   In the general model with an intercept and k continuous variables we can test for differences across two groups as follows: 1. Run the restricted model for group one and get SSR1, then for group two and get SSR2 2. Run the restricted model for all to get SSR, then

[ SSR − (SSR1 + SSR2 )] [n − 2(k + 1)] F= • SSR1 + SSR2

k +1

18

The Chow Test (continued) !   Notice that the Chow test is really just a simple F

test for exclusion restrictions, but we’ve realized that SSRur = SSR1 + SSR2 !   Unrestricted model is exactly the same as running the restricted model for the groups separately !   Note, we have k + 1 restrictions (each of the slope coefficients and the intercept) !   Note also, the unrestricted model would estimate 2 different intercepts and 2 different slope coefficients, so the denominator df is n – 2k – 2 19

Linear Probability Model !   Binary variables can also be introduced on the ! ! ! ! ! !

LHS of an equation   e.g. y=1 if employed, 0 otherwise   In this setting the βj’s have a different interpretation   As long as E(u|x)=0, then   E(y|x) = β0 + β1x1 + … + βkxk , or   P(y = 1|x) = β0 + β1x1 + … + βkxk   The probability of “success” is a linear function of the x’s 20

Linear Probability Model (cont) !   So, the interpretation of βj is the change in the ! ! ! !

probability of success (y=1) when xj changes, holding the other x’s fixed   If xj is education and y is as above, βj is the change in the probability of being employed when education increases by 1 year   Predicted y is the predicted probability of success   One potential problem is that our predictions can be outside [0,1]   How can a probability be greater than 1? 21

Linear Probability Model (cont) !   Even without predictions outside of [0,1], we may ! ! ! ! !

estimate marginal effects that are greater than 1 and less than -1   Usually best to use values of independent variables near the sample means   This model will violate assumption of homoskedasticity   Problem?   Standard errors are not valid (correctable)   Despite drawbacks, it’s usually a good place to start when y is binary

22

Nonlinear Alternatives to LPM !   If you go on to practice econometrics with limited dependent variables (such as dummies), you will learn about some non-linear models that may be more appropriate   





e.g. logit, probit models These are little more complicated to work with than LPM (marginal effects differ from coefficient estimates, as in other nonlinear models) Advantages: They restrict predicted values to lie between 0 and 1; don’t have built-in heteroskedasticity like LPM EViews and other packages can estimate such models 23

Policy Analysis Using Dummy Variables !   Policy analysts often use dummies to measure the effect of a government program   

e.g., What’s the effect of raising the minimum wage? A simple before/after comparison could be done like this   Take a sample of low-wage firms (say, fast food restaurants) in

a state or province that is changing its minimum wage law.   Sample restaurants before minimum wage increase, then again after minimum wage increase (could use same restaurants or different ones, as long as sampling is random)   Estimate model of staffing levels, where Post is dummy indicating whether the observation is for a firm in the postincrease regime or the pre-increase regime (=1 if post)

# employeesit = β 0 + β1Post it + uit 24

Policy Analysis Using Dummy Variables !   If zero conditional mean assumption holds, then

βˆ1 gives us the effect of increasing the minimum

wage on employment levels at these firms. What sign do you expect? 



Do you see any reason the zero conditional mean assumption might fail? What if something else changes as the policy regime changes?       



Economy gets better or worse. Taxes on business change Immigration increases size of local labour force Any of these changes, if not controlled for, would be captured in u and would be correlated with Post. violation of zero conditional mean assumption

Is there a way (other than controlling for all possible things that might change simultaneously with minimum wage law) we can deal with this problem? 25

Policy Analysis Using Dummy Variables !   It would be nice if we had a “control group”—a bunch of similar firms existing under similar circumstances that didn’t experiment the minimum wage increase. 

How about fast food restaurants on the other side of the state/provincial border?   Similar economic circumstances   Similar labour supply issues

# employeesit = β 0 + β1Post it + β 2 NJ + β 3 Post * NJ it + uit

!   NJ=1 if firm is in NJ; =0 if firm is in Pennsylvania !   Post*NJ is an interaction of the two dummies 26

Policy Analysis Using Dummy Variables !   What this analysis does is look at the effect of an increase in the minimum wage in NJ, using fastfood restaurants in Pennsylvania as a control group. 



We have to assume that other determinants of employment that change at the same time as the minimum wage change are occurring in both states and with equal magnitudes (recessions, booms, etc.) One way to reassure ourselves (other than looking at data on the local economies) is to look at employment trends in FF restaurants in both states. If they tend to move together closely, that will provide some reassurance about our “identifying assumption” 27

Policy Analysis Using Dummy Variables !   Note that the ideal experiment would let us observe NJ at one point in time, simultaneously in 2 states of the world   

The state with no minimum wage increase The state with minimum wage increase

!   Unfortunately we can’t observe these parallel universes; with this approach we’re trying to do the next best thing 





We’re inferring a counterfactual (parallel universe where NJ experiences no minimum wage increase), by observing what happens in PA The counterfactual for NJ assumes NJ follows the same trend as in PA; then we just compare the actual (with min wage increase) outcome in NJ to the (inferred) counterfactual for NJ This gives us our estimate of the effect of increasing the minimum wage. 28

Policy Analysis Using Dummy Variables !   Focus on this picture and see if you can deduce for yourself what the effect of a minimum wage increase in NJ is (assuming common trends in employment in PA and NJ).

29

Policy Analysis Using Dummy Variables # employeesit = β 0 + β1Post it + β 2 NJ + β 3 Post * NJ it + uit βˆ1, βˆ 2 < 0 βˆ1

βˆ 2

βˆ 0

βˆ 0 , βˆ 3 > 0

βˆ 3

30

Policy Analysis Using Dummy Variables # employeesit = β 0 + β1Post it + β 2 NJ + β 3 Post * NJ it + uit

!   Given this model, average employment at a firm is β 0 in PA before min wage increase   β 0 + β1 in PA after min wage increase   β + β in NJ before min wage increase 0 2   β 0 + β1 + β 2 + β 3 in NJ after min wage increase !   Change in employment in NJ: β1 + β 3 !   Change in employment in PA: β1 



Note: This is the counterfactual change in employment for NJ

!   Diff-in-Diff (ΔNJ-ΔPA): β 3 31

Policy Analysis Using Dummy Variables !   This is called a “difference in difference” analysis. βˆ 3 gives the effect of the minimum wage.

!   Card and Krueger (1994) studied a minimum wage law

change in NJ in 1992, using fast food restaurants in NJ and eastern PA as the sample   





Famously found no effect of minimum wage on employment Some follow-up studies have disputed their findings, though not much evidence of a large effect of the minimum wage on employment One hypothesis: minimum wage was so low to begin with that it was close to the equilibrium (market-clearing) wage. Remains a point of empirical study and argument. 32

Policy Analysis Using Dummy Variables !   We can denote a generic diff-in-diff specification as

Outcomeit = β 0 + β1Post it + β 2Treat + β 3 Post _ Treat it + uit !   It’s not hard to think of dozens of cases in policy analysis where this type of approach could be used 

Effect of changes in welfare generosity on       



employment fertility marriage school performance of kids of recipients

Effect of change in divorce laws on   divorce   child outcomes   domestic violence incidents 33

Policy Analysis Using Dummy Variables !   Note the crucial assumption: Common trends 





If employment in NJ and PA always move together; or (more generally) if outcome variable in Treat=1 vs. Treat=0 always move together, then this is likely a valid assumption If employment and NJ and PA often move together but sometimes randomly deviate (for reasons other than changes in minimum wage laws) then assumption is more suspect Does the minimum wage really increase employment?   Seems unlikely theoretically   A more sensible conclusion would be either that minimum wage has no effect

or small effect on employment; OR something else changed in NJ (or PA) at the same time as minimum wage law change is driving their finding (i.e., their results are subject to bias

34

Caveats on Policy Analysis When Those Studied Choose Whether to be Treated !   Consider a job training program that unemployed

workers have the option of signing up for !   If individuals choose whether to participate in a program, this may lead to a self-selection problem !   e.g. Those that apply for job training may be more diligent or may have had to qualify

35

Self-selection Problems !   If we can control for everything that is correlated

with both participation and the outcome of interest then it’s not a problem !   Often, though, there are unobservables that are correlated with participation !   In this case, the estimate of the program effect is biased, and we don’t want to set policy based on it!

36