Econometrics I Lecture 7: Dummy Variables

Mohammad Vesal
Graduate School of Management and Economics, Sharif University of Technology
44716, Fall 1395


Introduction

• Dummy variable: di is a dummy variable if it takes only the values 0 and 1.
• Useful for studying the effect of qualitative variables, events, treatments, etc.
• Examples:
  – wage discrimination against women: female, a dummy equal to 1 if the person is female and 0 otherwise.
  – effect of clean drinking water: clean, a dummy that shows whether the individual has access to clean water.


Outline

• Single dummy as an explanatory variable
• Dummy for multiple categories
• Interaction terms with dummies
• Linear probability model

Reference: Wooldridge (2013), Ch 7.


Single dummy as a regressor

• Consider the wage equation

    wage = β0 + β1 female + β2 educ + u

• β1: the wage gain if the person is a woman rather than a man, holding education constant.
• It is easy to show that

    β1 = E[wage | female, educ] − E[wage | male, educ]

  where the level of education is the same across the two expectations.
• Effectively, β1 delivers a different intercept for females.

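The intercept-shift interpretation can be checked numerically. The sketch below uses hypothetical noiseless data generated from known coefficients (all numbers are illustrative, not from the lecture's dataset), so OLS recovers them exactly and the female coefficient equals the intercept gap at any fixed education level.

```python
import numpy as np

# Hypothetical data generated from wage = 5 - 2*female + 0.5*educ (no noise),
# so least squares recovers the coefficients exactly.
female = np.array([0, 0, 1, 1, 0, 1], dtype=float)
educ   = np.array([10.0, 12.0, 12.0, 14.0, 16.0, 10.0])
wage   = 5 - 2 * female + 0.5 * educ

# OLS via least squares on [constant, female, educ]
X = np.column_stack([np.ones_like(wage), female, educ])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
b0, b1, b2 = beta   # beta ≈ [5.0, -2.0, 0.5]

# b1 is the female-male gap in predicted wage at any fixed education level:
gap = (b0 + b1 + b2 * 12) - (b0 + b2 * 12)
```

With real data the fit is not exact, but the interpretation of b1 as an intercept shift is the same.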

Intuition: shifting intercept

(figure omitted)

Interpretation of dummies

• Interpretation of dummies is with reference to the omitted group.
• In

    wage = β0 + β1 female + β2 educ + u

  we dropped the male category, so β1 is the change in wage from being female relative to being male (holding education constant).
• We must omit one category or the constant. In

    wage = γ0 male + γ1 female + β2 educ + u

  β2 is the same, but γ1 = β0 + β1 and γ0 = β0.


Difference in means

• If we are interested in wage discrimination, one could compute w̄_female − w̄_male. How is this related to β̂1?
• Notice that δ̂1 = w̄_female − w̄_male in

    wage = δ0 + δ1 female + u

• When we add more regressors, we are trying to achieve a causal interpretation: the wage differential between men and women could be due to omitted characteristics that matter for wage.
  – Women might on average be more educated.
  – Women might on average have less experience.

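That δ̂1 equals the raw difference in group means is easy to verify. A minimal sketch with illustrative numbers (not the lecture's data):

```python
import numpy as np

# Hypothetical wages for three men and three women
female = np.array([0, 0, 0, 1, 1, 1], dtype=float)
wage   = np.array([10.0, 12.0, 14.0, 9.0, 10.0, 11.0])

# OLS of wage on a constant and the female dummy
X = np.column_stack([np.ones_like(wage), female])
delta, *_ = np.linalg.lstsq(X, wage, rcond=None)

# The slope on the dummy reproduces the difference in group means (10 - 12 = -2)
mean_gap = wage[female == 1].mean() - wage[female == 0].mean()
```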

Example – wage discrimination

Dep. Var.        wage       wage       wage       lwage
                 (1)        (2)        (3)        (4)
female          -2.512     -2.273     -1.811     -0.301
                (0.303)    (0.279)    (0.265)    (0.0372)
educ                        0.506      0.572      0.0875
                           (0.0504)   (0.0493)   (0.00694)
exper                                  0.0254     0.00463
                                      (0.0116)   (0.00163)
tenure                                 0.141      0.0174
                                      (0.0212)   (0.00298)
Constant         7.099      0.623     -1.568      0.501
                (0.210)    (0.673)    (0.725)    (0.102)
N                526        526        526        526
R2               0.114      0.256      0.359      0.388
Mean y           5.896      5.896      5.896      1.623

Standard errors in parentheses.


Categorical variables

• Often we want to include a categorical variable as a control in a regression.
• Example: marital status (=0 single, =1 married, =2 divorced, =3 widow), city of residence, ...
• Including the categorical variable itself in a regression makes no sense; instead, you must introduce a set of dummies for the categories.
• Example: correct specification

    wage = β0 + β1 married + β2 divorced + β3 widow + β4 educ + u

  incorrect specification

    wage = β0' + β1' marital_status + β4' educ + u

  here we implicitly impose β2 = 2β1 and β3 = 3β1.
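The restriction imposed by the incorrect specification can be seen numerically. A sketch with hypothetical data in which the four marital-status groups have wage means 5, 9, 6, 7 (illustrative values, not from any dataset):

```python
import numpy as np

# Hypothetical data: statuses 0..3 with group mean wages [5, 9, 6, 7]
status = np.array([0, 0, 1, 1, 2, 2, 3, 3])
wage   = np.array([5.0, 5.0, 9.0, 9.0, 6.0, 6.0, 7.0, 7.0])

# Correct: constant plus one dummy per non-omitted category (single omitted)
D = np.column_stack([np.ones_like(wage)] +
                    [(status == k).astype(float) for k in (1, 2, 3)])
beta, *_ = np.linalg.lstsq(D, wage, rcond=None)
# beta ≈ [5, 4, 1, 2]: each dummy coefficient is that group's mean
# minus the omitted (single) group's mean — no pattern imposed.

# Incorrect: entering the 0..3 code directly forces the category effects
# onto one straight line (effect of category k = k * slope).
Xbad = np.column_stack([np.ones_like(wage), status.astype(float)])
gamma, *_ = np.linalg.lstsq(Xbad, wage, rcond=None)
# The linear fit cannot reproduce the non-monotone group means.
```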

Ordinal variables

• For ordinal variables only the ordering matters.
• Example: credit rating (CR = 0, 1, 2, 3, 4)
  1. Including CR itself in a regression makes little sense.
  2. Including dummies for each value of CR makes a lot of sense; this is also a more flexible specification. In other words, (1) is a special case of (2)!


Fixed effects

• Think about a dataset with n individuals in J cities. We want to model returns to education:

    ln wage = β0 + β1 educ + u

• Cities might have special features that change returns to education. This could lead to omitted variable bias, because those living in cities with high returns to education would achieve higher levels of education!
• Is there a way to correct for this?
• City fixed effects: include J − 1 dummies that show whether the individual lives in each city:

    ln wage = β0 + β1 educ + Σ_{j=1}^{J−1} δj dj + u

  Sometimes we simplify the notation and write

    ln wage_ij = β0 + β1 educ_ij + δj + u_ij

Fixed effects

• What does ln wage_ij = β0 + β1 educ_ij + δj + u_ij do?
  – each city shifts the intercept of the regression
  – controls for any observed and unobserved city-specific characteristics that matter for wage
  – estimation of β1 relies solely on within-city variation in wage and education
  – effectively, β̂1 is a weighted average of the estimated returns to education from the J separate city regressions
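The omitted variable bias and its fixed-effects cure can be illustrated with a tiny constructed example (two hypothetical cities, no noise, all numbers invented for the sketch): city B pays more at every education level and its residents are also more educated, so pooled OLS overstates the return to education, while city dummies recover it.

```python
import numpy as np

# Hypothetical DGP: wage = 2*educ + 10*city (city B adds 10 to the intercept),
# and education is higher in city B, creating the omitted-variable problem.
city = np.array([0, 0, 1, 1])                  # 0 = city A, 1 = city B
educ = np.array([8.0, 10.0, 12.0, 14.0])
wage = 2 * educ + 10 * city

# Pooled OLS without city dummies: the city effect loads onto education
Xp = np.column_stack([np.ones_like(educ), educ])
bp, *_ = np.linalg.lstsq(Xp, wage, rcond=None)   # slope biased up to 4

# City fixed effects: one dummy per city (no separate constant) plus educ
Xfe = np.column_stack([(city == 0).astype(float),
                       (city == 1).astype(float), educ])
bfe, *_ = np.linalg.lstsq(Xfe, wage, rcond=None) # educ coefficient = 2
```

Only within-city variation in educ identifies the fixed-effects slope, which is why it is immune to the between-city confounding.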


Interaction terms with dummy variables

• Consider

    wage = β0 + β1 female + β2 educ + β3 female × educ + u

• the female term alone shifts the intercept
• the interaction term shifts the slope
• How do we test whether returns to education are the same across genders?
• How do we test whether being female has no effect on the wage?

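A sketch of how the interaction delivers separate intercepts and slopes, again with hypothetical noiseless data from known coefficients so OLS recovers them exactly:

```python
import numpy as np

# Hypothetical DGP: wage = 3 + 1*female + 0.5*educ - 0.2*female*educ
female = np.array([0, 0, 0, 1, 1, 1], dtype=float)
educ   = np.array([10.0, 12.0, 14.0, 10.0, 12.0, 14.0])
wage   = 3 + 1 * female + 0.5 * educ - 0.2 * female * educ

X = np.column_stack([np.ones_like(wage), female, educ, female * educ])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)
b0, b1, b2, b3 = beta

# Implied group-specific regression lines:
#   men:   intercept b0,      slope b2
#   women: intercept b0 + b1, slope b2 + b3
men_slope, women_slope = b2, b2 + b3
```

Testing equal returns to education across genders is a test of β3 = 0; testing no effect of being female at all is a joint test of β1 = β3 = 0.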

Graphical illustration

(figure omitted)

Testing for differences in regression functions across groups

• Could use the same interaction idea to test whether coefficients differ across groups.
• For example,

    wage = β0 + β1 female + β2 educ + β3 female × educ + β4 exper + β5 female × exper + u

• Testing β1 = β3 = β5 = 0 is equivalent to testing that the same regression equation applies to men and women.

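The joint test can be carried out with the standard F statistic comparing restricted and unrestricted residual sums of squares. A sketch on simulated data (the DGP, sample size, and coefficients are all invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
female = (rng.random(n) < 0.5).astype(float)
educ   = rng.uniform(8, 16, n)
exper  = rng.uniform(0, 20, n)
# Hypothetical DGP where women have a lower intercept and a flatter educ slope
wage = (2 - 1.0 * female + 0.6 * educ - 0.1 * female * educ
        + 0.05 * exper + rng.normal(0, 1, n))

def rss(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ b) ** 2).sum())

ones = np.ones(n)
# Unrestricted: all female terms; restricted: one equation for both groups
Xu = np.column_stack([ones, female, educ, female * educ, exper, female * exper])
Xr = np.column_stack([ones, educ, exper])
rss_u, rss_r = rss(Xu, wage), rss(Xr, wage)

q, k = 3, Xu.shape[1]          # q = 3 restrictions: beta1 = beta3 = beta5 = 0
F = ((rss_r - rss_u) / q) / (rss_u / (n - k))
# Compare F with the F(q, n - k) critical value to decide the test.
```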


Linear probability model (LPM)

• Consider

    y = β0 + β1 x1 + · · · + βK xK + u

  where y is a dummy variable.
• The conditional expectation is now the conditional probability that y = 1:

    E(y | x) = 1 · Pr(y = 1 | x) + 0 · Pr(y = 0 | x)
             = Pr(y = 1 | x)
             = β0 + β1 x1 + · · · + βK xK

• Increasing x1 by one unit, holding x2, . . . , xK fixed, changes the probability that y = 1 by β1 units.

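Fitting an LPM is just OLS on a 0/1 outcome. The sketch below, with hypothetical data chosen to make the problem visible, also previews the main weakness discussed next: the fitted line happily predicts "probabilities" below 0 and above 1.

```python
import numpy as np

# Hypothetical data: y switches from 0 to 1 as x grows
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# LPM = plain OLS of the dummy outcome on a constant and x
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
phat = X @ beta            # fitted "probabilities"

# beta[1] is the change in Pr(y = 1) per unit of x, but at the extremes
# of x the fitted line leaves [0, 1]: phat[0] < 0 and phat[-1] > 1.
```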

Pros and cons of LPM

• Advantages:
  – easy estimation and interpretation
  – often works well in practice
• Disadvantages:
  – predicted probabilities could be negative or greater than 1!
  – the model gives heteroskedastic errors: Var(y | x) = p(x)[1 − p(x)]
• The logit transformation could solve the issue of unreasonable predicted probabilities:

    Pr(y = 1 | x) = e^(β0 + β1 x) / (1 + e^(β0 + β1 x)) ∈ [0, 1]

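A quick sketch of why the logit transformation fixes the range problem: whatever the linear index β0 + β1 x is, the logistic function maps it strictly into (0, 1). The coefficients below are arbitrary illustrative values, not estimates.

```python
import numpy as np

def logistic(z):
    # numerically stable logistic function e^z / (1 + e^z)
    return np.where(z >= 0, 1 / (1 + np.exp(-z)),
                    np.exp(z) / (1 + np.exp(z)))

# Hypothetical linear index b0 + b1*x over a wide range of x
x = np.linspace(-10, 10, 9)
index = -0.5 + 0.8 * x
p = logistic(index)

# Unlike LPM fitted values, these never leave (0, 1),
# and they increase monotonically in the index.
```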

Example – female labor force participation

• What are the determinants of being employed?
  – husband's income, education, experience, age, number of kids, ...
• LPM:

    inlf = β0 + β1 nwifeinc + β2 educ + β3 exper + β4 age + β5 kids + ...

• Use MROZ.dta to estimate this model. Units of observation: married women.

Estimation results

Dep. Var. inlf (N=753)   (1)         (2)
nwifeinc                -0.0050     -0.0034
                        (0.0015)    (0.0014)
educ                                 0.0380
                                    (0.0074)
exper                                0.0395
                                    (0.0057)
expersq                             -0.0006
                                    (0.0002)
age                                 -0.0161
                                    (0.0025)
kidslt6                             -0.2618
                                    (0.0335)
kidsge6                              0.0130
                                    (0.0132)
Constant                 0.6692      0.5855
                        (0.0359)    (0.1542)
R2                       0.0125      0.257

Standard errors in parentheses.

Predicted probabilities – LPM

(figure omitted)

Predicted probabilities – Logit

(figure omitted)

Estimation results – heteroskedasticity adjusted

                         Unadjusted              Adjusted
Dep. Var. inlf (N=753)   (1)         (2)         (3)         (4)
nwifeinc                -0.0050     -0.0034     -0.0050     -0.0034
                        (0.0015)    (0.0014)    (0.0015)    (0.0015)
educ                                 0.0380                  0.0380
                                    (0.0074)                (0.0073)
exper                                0.0395                  0.0395
                                    (0.0057)                (0.0058)
expersq                             -0.0006                 -0.0006
                                    (0.0002)                (0.0002)
age                                 -0.0161                 -0.0161
                                    (0.0025)                (0.0024)
kidslt6                             -0.2618                 -0.2618
                                    (0.0335)                (0.0318)
kidsge6                              0.0130                  0.0130
                                    (0.0132)                (0.0135)
Constant                 0.6692      0.5855      0.6692      0.5855
                        (0.0359)    (0.1542)    (0.0355)    (0.1523)
R2                       0.0125      0.257       0.0125      0.257

Standard errors in parentheses; columns (3)–(4) use heteroskedasticity-adjusted standard errors.

Summary

• In this lecture we:
  – discussed the use of dummy variables as explanatory and dependent variables
  – discussed the interpretation of dummies
  – covered fixed effects and interaction terms
  – introduced the linear probability model