Day 3: Binary response models


05/12/2013 (1)

Forms of discreteness Censoring/corner solutions generate variables which are mixed discrete/continuous (e.g. hours of work are 0 for non-employed, any positive value for employees)

Truncation involves discarding part of the population (e.g. low-income targeted samples, or earnings models for employees only)

Count variables are the outcome of some counting process (e.g. the number of durables owned, or the number of employees of a firm)

Binary variables reflect a distinction between two states (e.g. unemployed or not, married or not)

Ordinal variables are ordered variables, possibly taking more than two values (e.g. happiness on a scale 1=miserable … 5=ecstatic; rank in the army)

Unordered variables reflect outcomes which are discrete but with no natural ordering (e.g. choice of occupation) 05/12/2013 (2)


Binary models (1)
Dependent variable is yit = 0 or 1. This describes:
• situations of choice between 2 alternatives
• sequences of events defining durations
E.g. suppose:
• yi = (0, 0, 0, 0, 1, 1, 1, 0, 1, 1) is a monthly panel observation
• 0 indicates unemployment, 1 indicates employment
Then yi represents a history of 4 months' unemployment followed by 3 months' employment, followed by 1 month's unemployment then 2 months' employment. 05/12/2013 (3)

Binary models (2)
An alternative to modelling the sequence yi is to model the set of durations: (U4, E3, U1, E2) → survival analysis.
An important issue concerns dynamics – how does the length of time already spent out of work affect this month's probability of finding work: duration dependence.
Here, we focus on modelling this period's state (0 or 1):
• as a function of explanatory variables and an individual effect (static model)
• as a function of explanatory variables, an individual effect and last period's state (dynamic model). This allows for state dependence. 05/12/2013 (4)


Why are special methods needed?
Consider the binary variable yit = 0 or 1. Notice that the expected value of yit is:
E(yit) = Pr(yit = 1) × 1 + Pr(yit = 0) × 0 = Pr(yit = 1)
where Pr(yit = 1) is the probability that yit = 1.
A simple way to model yit is to use a regression with yit as dependent variable. Then the RHS will be the conditional probability that yit = 1, plus an error term. This is the linear probability model (LPM):
yit = β0 + ziα + xitβ + ui + εit
With panel data methods (e.g. within-group or random-effects), the linear model implies:
E(yit | zi, xit, ui) = Pr(yit = 1 | zi, xit, ui) = Pit 05/12/2013 (5)

Disadvantages of the LPM
The linear probability model requires:
Pit = β0 + ziα + xitβ + ui
But this may fall outside the admissible [0, 1] interval. Moreover,
var(yit | zi, xit, ui) = Pit[1 − Pit]
which varies with zi and xit ⇒ heteroskedasticity is a problem.
[Despite its disadvantages, the panel LPM is simple to estimate and is often seen in applied work – but it's not an ideal choice.]

05/12/2013 (6)
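The boundary problem above is easy to demonstrate: fitting OLS to a 0/1 outcome can produce fitted "probabilities" outside [0, 1]. A minimal sketch on synthetic data (the variables are made up purely for illustration):

```python
import numpy as np

# Synthetic data: a single regressor and a binary outcome that switches at x = 5.
x = np.arange(10, dtype=float)
y = (x >= 5).astype(float)

# OLS fit of the linear probability model y = b0 + b1*x + error
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# Fitted values stray below 0 and above 1 at the extremes of x.
print(fitted.min(), fitted.max())
```

Here the fitted line dips below 0 at x = 0 and exceeds 1 at x = 9, which is exactly the defect the slide describes.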


Why nonlinear models are needed

[Figure: Pr(yit = 1) plotted against the index β0 + ziα + xitβ + ui. The LPM line rises without bound past 1, while the nonlinear response P(zi, xit, ui) stays bounded between 0 and 1.]

05/12/2013 (7)

Latent regression models: the binary case
To overcome the disadvantages of the LPM, use non-linear methods. Define a latent (unobservable) continuous counterpart, yit*.
Example from labour economics: if yit = 1 defines employment, then:
yit* = best available wage − minimum acceptable wage.
Let yit* be generated by a linear regression structure:
yit* = β0 + ziα + xitβ + ui + εit
Then employment is chosen whenever available wage − acceptable wage is positive:
yit = 1 if and only if yit* > 0

05/12/2013 (8)


Latent regression models: the binary case (2)

Pr(yit = 1 | zi, xit, ui) = Pr(β0 + ziα + xitβ + ui + εit > 0)
 = Pr(−εit < β0 + ziα + xitβ + ui)
 = F(β0 + ziα + xitβ + ui)

where F(·) is the distribution function of the random variable −εit.
Probit model: assume εit has a normal distribution ⇒ F(·) = Φ(·), the distribution function of the N(0,1) distribution.
Logit (logistic regression) model: assume εit has a logistic distribution ⇒ F(ε) = e^ε/[1 + e^ε], the distribution function of the logistic distribution.

05/12/2013 (9)
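Both link functions can be evaluated directly from the standard library; a minimal sketch (the index value is an arbitrary illustration, not from the slides):

```python
import math

def probit_cdf(v):
    # Phi(v): the N(0,1) distribution function, via the error function
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def logit_cdf(v):
    # Logistic distribution function: e^v / (1 + e^v)
    return math.exp(v) / (1.0 + math.exp(v))

index = 0.5  # hypothetical value of the index b0 + zi*a + xit*b + ui
print(probit_cdf(index), logit_cdf(index))
```

Both functions map the whole real line into (0, 1), which is what the LPM fails to do.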

An aside: understanding the results from binary latent regression models
In a linear regression model:
yit = β0 + ziα + xitβ + ui + εit
we can interpret the coefficients directly:
α = (average) effect on y of increasing z by 1 unit
β = (average) effect on y of increasing x by 1 unit
These are known as the marginal effects of z, x on y.
But in nonlinear models, things are more complicated. In:
Pr(yit = 1) = F(β0 + ziα + xitβ + ui)
α and β aren't the effects on Pr(yit = 1) of changing z or x by one unit ⇒ coefficients can't be directly interpreted.

05/12/2013 (10)


Some concepts for summarising results
Model: Pr(yit = 1) = F(β0 + ziα + xitβ + ui) (call this conditional probability Pit)

Coefficients = β0, α and β
Predicted probability = Pit
Odds (Oit) = Pit / (1 − Pit)

For 2 people with different z- and x-values, whose probabilities of y = 1 are P0 and P1:

Odds ratio = O1 / O0
Relative risk = P1 / P0

Relative risk and the odds ratio are often confused, but they are different. 05/12/2013 (11)

Marginal effects, relative risk and the odds ratio
Suppose person 0 has observable characteristics z0, x0 and unobservable characteristic u0; then:
P0 = F(β0 + z0α + x0β + u0)
Let's consider the effect of making a 1-unit change in (say) z. This means inventing a new person with characteristics (z0 + 1, x0, u0), for whom Pr(y = 1) is:
P1 = F(β0 + [z0 + 1]α + x0β + u0)
We can summarise the effect of this change in various ways:
• Marginal effect = P1 − P0
• Relative risk = P1 / P0
• Odds ratio = [P1 / (1 − P1)] / [P0 / (1 − P0)] = [P1 / P0] × [(1 − P0) / (1 − P1)]

Other variables are "held constant" at their baseline values (x0, u0).

05/12/2013 (12)


Logistic regression and the odds ratio
In the logit model:
P0 = exp(β0 + z0α + x0β + u0) / [1 + exp(β0 + z0α + x0β + u0)]
P1 = exp(β0 + [z0 + 1]α + x0β + u0) / [1 + exp(β0 + [z0 + 1]α + x0β + u0)]
Odds ratio = [P1 / (1 − P1)] / [P0 / (1 − P0)]
 = exp(β0 + [z0 + 1]α + x0β + u0) / exp(β0 + z0α + x0β + u0)
 = [exp(β0 + z0α + x0β + u0) × exp(α)] / [exp(β0 + z0α + x0β + u0)]
 = exp(α)
The odds ratio is usually only quoted in relation to logit results. It is hard to interpret and very often gets misinterpreted. It gives the proportionate effect of a 1-unit change in a variable on the odds, not on the probability Pr(y = 1). 05/12/2013 (13)

Misinterpretation of odds ratios Check that you understand the error in the following quotation from a well-known textbook: “The odds ratio of 1.3689 for females […] indicates that, controlling for the effects of the other explanatory variables, females are 37% more likely to be in poverty than males. Stated differently, the probability of being in poverty is 1.37 times greater for females than for males.” In fact, it isn’t possible to calculate the relative risk or the marginal effect on the probability of poverty, from knowledge of the odds ratio alone. What would be the relative risk and marginal effect if the predicted probability for a benchmark male individual is 0.2? What if it’s 0.001? What if it’s 0.8?

05/12/2013 (14)
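The slide's closing question can be answered numerically. Given only an odds ratio and a benchmark probability P0, the implied P1 follows by converting to odds, scaling, and converting back; a small sketch using the quoted odds ratio of 1.3689:

```python
def p_from_odds_ratio(p0, odds_ratio):
    """Convert a baseline probability and an odds ratio into the implied P1."""
    odds0 = p0 / (1.0 - p0)
    odds1 = odds_ratio * odds0
    return odds1 / (1.0 + odds1)

for p0 in (0.2, 0.001, 0.8):
    p1 = p_from_odds_ratio(p0, 1.3689)
    print(f"P0={p0}: P1={p1:.4f}, relative risk={p1/p0:.4f}, marginal effect={p1-p0:.4f}")
```

The relative risk ranges from about 1.06 (P0 = 0.8) to about 1.37 (P0 = 0.001), so "37% more likely" is only approximately right when the baseline probability is tiny — which is the textbook's error.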


Presentation of results
• We can report marginal effects evaluated at sample mean values of x and z, with individual effects u set at zero (i.e. the average in the population). But:
 – This represents a synthetic, hybrid person that doesn't exist.
 – Almost no-one has a zero individual effect.

• Present average partial effects (APE), which allow for the average effect of the unobserved individual effects. Evaluate at:
 – Mean x and z, or
 – Selected x and z to represent typical persons, or
 – Each sampled person's x and z, then average the results.

• These methods aren't possible with fixed-effects logit, as we don't estimate the (distribution of) individual effects or the coefficients of time-invariant variables.

05/12/2013 (15)

Fixed effects models – some issues
• To deal with individual effects in linear FE models, we can:
 – estimate the individual effects ui (LSDV), or
 – eliminate the individual effects ui by the within-group transform.
The two approaches are identical and give an unbiased estimate of β.
• But in non-linear FE models:
 – We can't remove the individual effects ui by within-group transformation as in linear regression.
 – With no short-cut method of calculating the estimate of β, we'd have to use the dummy variable method and calculate estimates of all the ui ⇒ the "incidental parameters problem".
 – All the estimated coefficients would be biased, even in large samples. 05/12/2013 (16)


Conditional ML estimation
• CML (as applied here) is a way of condensing the likelihood function into a form which does not depend on ui but does depend on β.
• Then CML is consistent (loosely speaking, unbiased in a large sample of individuals) for β.
• But CML is model-specific, as it is based on a technical "trick" that is only applicable in a few cases, e.g.:
 – logit models
 – Poisson model (for count data)

• Details of conditional logit are given in Appendix 4

05/12/2013 (17)

Fixed effects (or conditional) logit
Model: Pr(yit = 1) = F(β0 + ziα + xitβ + ui), where F(·) is the logistic form.
Avoiding technicalities, the method works as follows:
• Use the subsample of individuals for whom there is some change in yit during the observation period ⇒ we sacrifice information on any individuals who display no change in y.
• The changes in the covariates xit (i.e. deviations like xit − x̄i) are then used in a modified logit analysis to explain the changes in the observed sequence of outcomes yi1 … yiT.
• NB differencing the covariates removes any variables constant over time (e.g. gender, birth year, etc.), so α can't be estimated.
• But it also removes ui, so we don't have to assume anything about ui ⇒ FE logit (unlike RE logit) is unaffected by any endogeneity which is confined to ui. 05/12/2013 (18)


Random effects logit/probit Appropriate if we want to: • estimate the coefficients of zi • use a non-logistic form • allow for dynamic adjustment (i.e. use the lagged value yit-1 as an explanatory variable) In these circumstances, conditional likelihood is not available. The random effects approach is a natural solution. [and, of course, RE is preferred if the individual effects are independent of the x – use a Hausman test to decide] 05/12/2013 (19)

Random effects logit/probit
Consider the basic model:
yit* = β0 + ziα + xitβ + ui + εit
yit = 1 if and only if yit* > 0
Make standard random effects assumptions (including independence of (zi, xit) and ui). Since the εit are independent, the joint probability of observing (yi1, yi2, …, yiT) conditional on ui (and zi, xit) is just the product of the conditional probabilities for each time period:
Pr(yi1, …, yiT | zi, Xi, ui) = Pr(yi1 | zi, xi1, ui) × … × Pr(yiT | zi, xiT, ui)
where each term equals F(β0 + ziα + xitβ + ui) if yit = 1 and 1 − F(β0 + ziα + xitβ + ui) if yit = 0. 05/12/2013 (20)


Random effects logit/probit
Make an assumption about the distribution of ui (usually ui ~ N(0, σu²)). Average out (marginalise with respect to) the unobservable ui to get the unconditional probability of the data for individual i:
Pr(yi1, …, yiT | zi, Xi) = E[ Pr(yi1, …, yiT | zi, Xi, ui) ]
where "E[·]" refers to the expectation or mean with respect to the N(0, σu²) distribution of ui. This unconditional probability Pr(yi1, …, yiT | zi, Xi) is the likelihood for individual i. This process is repeated for all individuals in the sample. We then choose as our ML estimates the parameter values that maximise the likelihood over the whole sample. This is implemented in Stata, but computing run times can be quite long. This ML method works well only if cov(ui, [zi, xit]) = 0. 05/12/2013 (21)

Is the zero-correlation assumption valid? The Hausman test
• A Hausman test can be used to compare conditional logit estimates with the random-effects logit, which assumes independence between ui and (zi, Xi).
• Null hypothesis is H0: ui and (zi, Xi) are independent.
• Alternative hypothesis is H1: ui and (zi, Xi) are not independent (implies we should use conditional logit).
• β̂CL is consistent under H0 and H1, but inefficient under H0 (since it only uses information on changers).
• β̂RE is consistent and efficient under H0, but inconsistent under H1.
• Test statistic:
S = (β̂CL − β̂RE)′ [var(β̂CL) − var(β̂RE)]⁻¹ (β̂CL − β̂RE)
(distributed as χ² if H0 is correct, with df equal to the no. of coefficients in β) 05/12/2013 (22)
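The statistic is a quadratic form that is easy to compute once the two sets of estimates and covariance matrices are in hand. A hedged sketch — all the numbers below are made up purely to illustrate the calculation:

```python
import numpy as np

def hausman(b_cl, b_re, v_cl, v_re):
    """Hausman statistic S = (b_CL - b_RE)' [V_CL - V_RE]^{-1} (b_CL - b_RE)."""
    d = b_cl - b_re
    s = float(d @ np.linalg.inv(v_cl - v_re) @ d)
    return s, len(d)  # statistic and degrees of freedom

# Illustrative (invented) coefficient vectors and covariance matrices
b_cl = np.array([0.52, -0.11])                      # conditional logit
b_re = np.array([0.47, -0.09])                      # RE logit
v_cl = np.array([[0.020, 0.001], [0.001, 0.015]])   # less efficient, larger variance
v_re = np.array([[0.008, 0.000], [0.000, 0.006]])

s, df = hausman(b_cl, b_re, v_cl, v_re)
print(f"S = {s:.3f} on {df} df")  # compare with the chi-squared critical value
```

Only the coefficients of the time-varying covariates can be compared, since conditional logit does not estimate α.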


Individual effects correlated with regressors
• The RE probit/logit assumes that (zi, xit) and ui are independent.
• Ideal solution is to start from a theory to account for the xit–ui endogeneity and estimate an appropriate simultaneous model.
• A crude theory-free alternative is to allow ui to be correlated with elements of xit observed in the sample:
 – General formulation due to Chamberlain models the mean of ui as a function of the values of xit from all time periods.
 – Simplified version (based on the Mundlak model) is to model ui as a function of the individual means x̄i:
ui = λ0 + x̄iλ + ηi, where ηi | x̄i ~ N(0, σ_η²)
 – NB: this cannot be a structurally stable model. 05/12/2013 (23)
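In practice the Mundlak device just means augmenting the regressor set with within-individual means of the time-varying covariates. A hedged sketch on synthetic data (variable names are illustrative, not from the slides):

```python
import numpy as np

# Synthetic person-year panel: 4 individuals observed for 3 periods each
rng = np.random.default_rng(0)
n, T = 4, 3
person = np.repeat(np.arange(n), T)   # individual identifier per row
x = rng.normal(size=n * T)            # a time-varying covariate x_it

# Individual means x-bar_i, broadcast back to each person-year row
xbar = np.array([x[person == i].mean() for i in range(n)])[person]

# Regressor matrix for an RE probit/logit with the Mundlak correction:
# the mean term absorbs correlation between u_i and x_it.
X_mundlak = np.column_stack([x, xbar])
print(X_mundlak.shape)
```

The augmented matrix would then be passed to an ordinary RE probit/logit estimator.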

Dynamic models for binary data – estimation


Unobserved heterogeneity or state dependence?
• As seen in the HILDA data, there is much persistence in and repetition of categorical states. Past experience of a given state is often a good predictor of future experience of that state.
• Example: people who were unemployed in the past are more likely to be unemployed in the future.
• There are two plausible mechanisms behind this persistence:
 – State dependence: experience of a given state alters behaviour in the future so as to make that state more likely to occur.
 – Unobserved heterogeneity: individuals differ in their propensity to be in a given state and the factors explaining these differences persist over time and are unmeasured.

05/12/2013 (25)

Discrete dynamics
• Think in terms of discrete states (e.g. employment v. unemployment), rather than continuous variables.
• The basic concept is the (conditional) probability of the individual being in a particular state j at time t = Pit(j).
• The dependent variable yit is a categorical indicator of the current state, so yit ∈ {1, 2, …, K}, where K is the number of possible states.
• These outcome probabilities or choice probabilities may be conditional on:
 – unobserved individual "effects" ui
 – observed explanatory covariates xit
 – history of past outcomes yit-1, yit-2, yit-3, …

05/12/2013 (26)


Dynamic binary model
• Binary dependent variable model with a simple dynamic specification (one lag of the dependent variable).
• Latent autoregression:
yit* = β0 + xitβ + γ yit-1 + ui + εit
yit = 1 if yit* > 0
yit = 0 otherwise
• True state dependence is measured by γ
• Persistent unobserved heterogeneity is captured by ui
• Assume that {εit} is white noise

05/12/2013 (27)

Fixed effects estimation
• In general, fixed effects estimation won't work: if we treat the ui as parameters, the parameter space expands with N ⇒ estimates biased even in large samples; the model is nonlinear, so the ui can't be eliminated by a simple trick like the within-group transformation.
• But some progress is possible in special cases in the logit model (i.e. if εit has a logistic distribution):
 – Chamberlain (1985): conditional logit works for a pure AR model (i.e. no covariates xit) with fixed effects and T ≥ 4 waves.
 – Honoré and Kyriazidou (E'trica 2000): for a model with a single lag in a panel with T ≥ 4 waves, Pr(yi3 | yi0, yi1 + yi2, xi1, xi2 = xi3) doesn't involve ui ⇒ γ can be identified from the subsample for which xi2 = xi3 and xi1 varies across individuals.
• These approaches have limited applicability or require very large samples of individuals, so they aren't often used. 05/12/2013 (28)


The random effects model
Construct a likelihood by sequential conditioning. Define in turn:
Pi0 = Pr(yi0 | zi, Xi, ui)   (initial condition model)
Pi1 = Pr(yi1 | yi0, zi, xi1, ui)
 ⋮
PiT = Pr(yiT | yiT-1, zi, xiT, ui)
The probabilities Pi1 … PiT have the form:
F(β0 + xitβ + γ yit-1 + ui)   for yit = 1
1 − F(β0 + xitβ + γ yit-1 + ui)   for yit = 0.

05/12/2013 (29)

The random effects likelihood function
Given a particular value for ui, the likelihood function for individual i is:

Li(ui) = Pi0(ui) × Pi1(ui) × … × PiT(ui)

But ui is unknown, so:
• assume a specific distribution for ui [e.g. ui ~ N(0, σu²)]
• "integrate out" the unknown ui [i.e. integrate Li(ui) with respect to the N(0, σu²) distribution]
• this must be done separately, using numerical approximation, for each of the N individuals
• search over parameter values requires computation of the likelihood at each point in the search ⇒ intensive in computer time

Note: we haven’t said anything yet about the initial condition probability Pi0(ui) 05/12/2013 (30)


Dynamic models: the initial conditions problem
The Pi0(ui) term in the likelihood is the contribution of the initial condition – the first observed value of y. Three important cases:
(1) y-process observed from an exogenously fixed origin yi0 (e.g. model of youth employment ⇒ yi0 = 0 indicates non-employment during last year at school (t = 0))
(2) y-process observed from a heterogeneous origin (e.g. model of youth employment where t = 0 is the first post-school period ⇒ yi0 given by a separate model of "first destination after school")
(3) y-process is long-established and we observe it over periods t = 0 … T ⇒ yi0 depends on ui and the histories of {xit} and {εit} up to period 0
In cases 2 & 3, yi0 is variable and depends on ui, so cannot be assumed exogenous. 05/12/2013 (31)

Heckman’s method • In practice, it is difficult to derive an exact expression for Pi0(ui), especially if we do not observe the process from the beginning. • Heckman (1981) suggested approximating Pi0(ui) by a simple probit model, where regressors can include “pre-sample” information (e.g. family background). • Can be complicated to estimate. • See Akay (J. Appl. E’metrics 2012) for a comparison with other approaches

05/12/2013 (32)


Wooldridge’s method Wooldridge (J.Appl.Ec’metr. 2005) suggested an alternative: condition on yi0, without specifying its probability. Instead, model the distribution of ui conditional on yi0 and Xi=(xi0 ... xiT). For example, ui could be specified as:

u i   0  x i    0 yi 0   i and x i 



T

where i | X i , yi 0 ~ N(0,  2 )

x . The latent regression is then:

t 1 it

yi*  (   0 )  x it β   yit 1  xi   0 yi 0  i   it • Can be estimated as standard RE probit • But note this is just another approximation, specific to the particular sample – coefficients  and 0 will differ between panels of different lengths T. 05/12/2013 (33)

Example of Wooldridge’s method: BHPS men 1991-2004 . xtprobit married i.lmarried age i.degree i.emp linc /// mage mdegree memp mlinc i.married0, re Random-effects probit regression Number of obs Random effects u_i ~ Gaussian Wald chi2(10) Log likelihood = -1690.3634 Prob > chi2

= = =

9463 3699.91 0.0000

-----------------------------------------------------------------------------married | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------1.lmarried | 3.120169 .0856324 36.44 0.000 2.952332 3.288005 age | .0184809 .0097494 1.90 0.058 -.0006275 .0375893 1.degree | .4830863 .2286799 2.11 0.035 .0348819 .9312908 1.emp | .0164447 .1139384 0.14 0.885 -.2068705 .23976 linc | .0009747 .0004575 2.13 0.033 .000078 .0018713 mage | .0143642 .0109134 1.32 0.188 -.0070257 .0357542 mdegree | -.478919 .2546229 -1.88 0.060 -.9779706 .0201327 memp | .4653682 .1886365 2.47 0.014 .0956475 .8350889 mlinc | .0008749 .0006755 1.30 0.195 -.000449 .0021988 1.married0 | .3407543 .1325894 2.57 0.010 .0808839 .6006247 _cons | -3.173536 .2666428 -11.90 0.000 -3.696147 -2.650926 -------------+---------------------------------------------------------------/lnsig2u | -1.818687 .4575043 -2.715379 -.9219949 -------------+---------------------------------------------------------------sigma_u | .4027886 .0921388 .2572545 .6306543 rho | .1395915 .0549489 .062072 .2845516 -----------------------------------------------------------------------------Likelihood-ratio test of rho=0: chibar2(01) = 6.82 Prob >= chibar2 = 0.004 05/12/2013 (34)


Is there state dependence?

. margins, predict(pu0) dydx(lmarried) at((mean) _all ///
>     lmarried=0 age=40 degree=0 emp=1 linc=100)

Conditional marginal effects            Number of obs   =      9463
Model VCE    : OIM

Expression   : Pr(married=1 assuming u_i=0), predict(pu0)
dy/dx w.r.t. : 1.lmarried
at           : lmarried   =         0
               age        =        40
               degree     =         0
               emp        =         1
               linc       =       100
               mage       =  29.63701  (mean)
               mdegree    =  .1637958  (mean)
               memp       =  .8483568  (mean)
               mlinc      =  134.8314  (mean)
               0.married0 =  .7298954  (mean)
               1.married0 =  .2701046  (mean)

Note: we've chosen to set ui at its mean value

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  1.lmarried |   .8744943   .0144746    60.42   0.000     .8461247    .9028639
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
05/12/2013 (35)

Interpretation of dynamic models: short-run, interim & long-run effects in a binary model


Transition tree for a 3-wave panel

yi0 = 0
 ├─ yi1 = 0 | yi0 = 0
 │   ├─ yi2 = 0 | yi0 = 0, yi1 = 0
 │   └─ yi2 = 1 | yi0 = 0, yi1 = 0
 └─ yi1 = 1 | yi0 = 0
     ├─ yi2 = 0 | yi0 = 0, yi1 = 1
     └─ yi2 = 1 | yi0 = 0, yi1 = 1

For a panel with T waves, 2^T possible sequences of outcomes:
• potentially a very large number of transition probabilities
⇒ simplify by assuming transitions depend on history in a simple way – e.g. 1st-order autoregressive dependence 05/12/2013 (37)

Example: interpretation of a dynamic probit model
• Model: Pr(yit = 1 | xit, yit-1, ui) = Φ(β0 + xitβ + γ yit-1 + ui)
• Assumed initial state: yi0 = 0
• Transition probabilities:
P0|0(xit, ui) = 1 − Φ(β0 + xitβ + ui)
P1|0(xit, ui) = Φ(β0 + xitβ + ui)
P0|1(xit, ui) = 1 − Φ(β0 + xitβ + γ + ui)
P1|1(xit, ui) = Φ(β0 + xitβ + γ + ui)

05/12/2013 (38)


Example: A hypothetical probit model of marriage
yit = 1 if married, 0 if not; xit = degree, employed, monthly earnings/10
Pr(yit = 1 | xit, yit-1, ui) = Φ(−1.5 + .002 ageit + .40 degreeit − .05 employedit + .002 earningsit + 3.0 yit-1 + ui)
where Φ(·) is the distribution function of the N(0,1) distribution.
• Dependence of current marital status yit on past status yit-1 is known as state dependence
• Note that we're assuming the same process (i.e. same parameters) generates both entry into and exit from marriage
• To weaken this assumption, we'd need interactions between yit-1 and degreeit, employedit and earningsit 05/12/2013 (39)

Short-run impact
• Consider a 30-year old unmarried employee with no degree, monthly income of £1,000 and an individual effect of u = 0:
Pr(married at t+1) = Φ(−1.5 + .002×30 − .05 + .002×100) = 0.099
• For an identical married person: Pr(married at t+1) = 0.956 ⇒ state dependence is (unsurprisingly!) very important
• Effect of higher education: Φ(−1.5 + .002×30 + 0.4 − .05 + .002×100) = 0.187 ⇒ education nearly doubles the risk of marriage!
• Effect of a 50% increase in income: Φ(−1.5 + .002×30 − .05 + .002×150) = 0.117 ⇒ income increases the marriage risk by about 2 percentage points
But these are short-run effects that don't allow for the full effect to accumulate over time 05/12/2013 (40)
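These short-run calculations can be reproduced directly from the hypothetical coefficients; a sketch (the coefficients are the slide's illustrative values, not estimates):

```python
import math

def Phi(v):
    # N(0,1) distribution function via the error function
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def pr_married_next(age, degree, employed, earnings, married_now, u=0.0):
    """One-step-ahead marriage probability from the hypothetical probit."""
    index = (-1.5 + 0.002 * age + 0.40 * degree - 0.05 * employed
             + 0.002 * earnings + 3.0 * married_now + u)
    return Phi(index)

print(round(pr_married_next(30, 0, 1, 100, 0), 3))  # unmarried baseline
print(round(pr_married_next(30, 0, 1, 100, 1), 3))  # currently married
print(round(pr_married_next(30, 1, 1, 100, 0), 3))  # unmarried, with degree
print(round(pr_married_next(30, 0, 1, 150, 0), 3))  # unmarried, income +50%
```

The four printed values match the slide's 0.099, 0.956, 0.187 and 0.117.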


Steady-state (long-run) analysis
• Assume a (hypothetical) steady state: xit = xi for all t
• Transition probabilities:
P0|0 = 1 − Φ(β0 + xiβ + ui) ; P1|0 = Φ(β0 + xiβ + ui)
P0|1 = 1 − Φ(β0 + xiβ + γ + ui) ; P1|1 = Φ(β0 + xiβ + γ + ui)
• In steady state (if it exists), Pr(yit = 1) is the same in every period t. Call this steady-state equilibrium probability π.
• By the law of total probability, Pr(yit = 1) is Pr(yit = 1 | yit-1 = 0) Pr(yit-1 = 0) + Pr(yit = 1 | yit-1 = 1) Pr(yit-1 = 1), so:
π = P1|0 (1 − π) + P1|1 π
• Solution for π: π = P1|0 / [1 − P1|1 + P1|0] = P1|0 / [P0|1 + P1|0]
or: Pr(yit = 1) = share of transitions which are 0 → 1
• Steady state can be evaluated for different x- and u-values 05/12/2013 (41)

Steady-state example
• Hypothetical 30-year old employee, earning £1,000 pm
• Transition probabilities:
P0|0 = 1 − 0.089 = 0.911 ; P1|0 = 0.089
P0|1 = 1 − 0.951 = 0.049 ; P1|1 = 0.951
• Steady state solution for Pr(married): π = P1|0 / [P0|1 + P1|0] = 0.089 / [0.049 + 0.089] = 0.645
• In steady-state equilibrium, about 64% of people with these characteristics would be married
• Adding a degree ⇒ π = 0.776
• Increasing income by 50% ⇒ π = 0.725
• Long-run and short-run impacts are quite different 05/12/2013 (42)
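The steady-state formula is a one-liner once the two transition probabilities are known; a sketch using the slide's quoted values (small differences from the slide's figure reflect rounding of the inputs):

```python
def steady_state(p_1_given_0, p_1_given_1):
    """Equilibrium Pr(y=1): pi = P(1|0) / (P(0|1) + P(1|0))."""
    p_0_given_1 = 1.0 - p_1_given_1
    return p_1_given_0 / (p_0_given_1 + p_1_given_0)

# Baseline transition probabilities quoted on the slide
print(round(steady_state(0.089, 0.951), 3))
```

The same function can be re-evaluated with the degree or higher-income transition probabilities to reproduce the long-run comparisons.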


Monte Carlo simulation
Procedure:
• Specify a sequence of covariate values x17 … x59 describing the early-mid adult life of a hypothetical individual
• Set the persistent unobservable ui equal to its mean value 0
• Recover the estimated parameters β, γ and calculate the probit intercept β0 as the value consistent with E(u) = 0 (i.e. the mean over individuals of the estimates of (β0 + λ0) + x̄iλ + δ0 yi0)
• Repeat over 500 replications:
 – draw a fresh sequence of pseudo-random residual terms ε17 … ε59 from a N(0,1) distribution
 – initialise the marital state as y16 = 0
 – generate the corresponding sequence of marital outcomes y17 … y59 as yt = 1 if β0 + xtβ + γ yt-1 + u + εt > 0, and yt = 0 otherwise, for t = 17 … 59
• Store the resulting 500 marital histories and summarise them as a panel (with N = 500, T = 44) 05/12/2013 (43)
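The procedure above can be sketched in a few lines. This uses the slide's hypothetical coefficients (not the estimated BHPS parameters), with u fixed at 0 and the covariates held at the baseline (employed, no degree, earnings variable = 100), so the summary statistics will not match the slide's table exactly:

```python
import numpy as np

rng = np.random.default_rng(42)
reps = 500
ages = range(17, 60)                         # simulate ages 17..59
histories = np.zeros((reps, len(ages)), dtype=int)

for r in range(reps):
    y = 0                                    # initial state: y16 = 0
    for j, age in enumerate(ages):
        # index = b0 + x_t*b + gamma*y_{t-1} + u, with u = 0
        index = -1.5 + 0.002 * age - 0.05 + 0.002 * 100 + 3.0 * y
        y = int(index + rng.standard_normal() > 0)   # fresh N(0,1) residual
        histories[r, j] = y

years_married = histories.sum(axis=1)
print(histories.shape, years_married.mean())
```

Each row of `histories` is one simulated marital history, which can then be summarised (never-married proportion, mean years married, etc.) as in the slide's table.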

Summary of results from 500 Monte Carlo replications
(500 replications of the marriage process from age 17 to 59)

Hypothetical individual                          | Proportion    | Mean no.  | Mean no. | Mean years
                                                 | never married | marriages | divorces | married
-------------------------------------------------+---------------+-----------+----------+-----------
Baseline: continuous employment from age 16;     |     2.6%      |   1.56    |   0.74   |   20.1
no higher education                              |               |           |          |
Studying through age 21; degree qualification;   |     0.0%      |   1.34    |   0.36   |   30.7
continuous employment from age 22                |               |           |          |

05/12/2013 (44)


Calculating dynamic impacts by stochastic simulation

[Figure: 4 randomly-generated marital histories (replication nos. 1, 87, 234, 429). Each panel plots marital status (0/1) against age (roughly 20-60), comparing the baseline individual with the same individual with a degree.]

05/12/2013 (45)

Some conclusions
• Quite a lot of "churning" in the marriage market! Too much...
• More realistic dynamic models would allow the probability of divorce to depend on the marriage duration – not just the previous year's marital state.
• That would introduce duration dependence in addition to state dependence ⇒ making it a "semi-Markov" model.
• Also, we could use separate models of entry into and exit from marriage, to improve flexibility.
• In general, it's important to think about what dynamic features the model needs to capture before choosing a model specification.
• Simulation is a very effective way of revealing any shortcomings in model specification. 05/12/2013 (46)


Appendix 4
The following slides can be safely ignored if you're not interested in technical detail or if you aren't familiar with maximum likelihood and the maths of the logit model.
• Marginal effects
• Conditional logit
• Random effects likelihood function

05/12/2013 (47)

Marginal effects
• In the LPM, the marginal effect of an increase in a variable on the conditional probability that yit = 1 is just its coefficient. Formally:
∂P(xit, ui)/∂xjit = βj (where zi is absorbed into xit for brevity)
• Note the marginal effect in the LPM does not depend on the values of other covariates, or the individual effect. So the ME is the same for everyone.
• This is not generally true in non-linear models:
∂P(xit, ui)/∂xjit = ∂F(β0 + xitβ + ui)/∂xjit = f(β0 + xitβ + ui) βj

05/12/2013 (48)


Marginal effects (2) • Marginal effect is coefficient multiplied by the density function (normal for probit, logistic for logit), evaluated at the base values of x. • So marginal effects depend on covariates and individual effects. And usually we don’t estimate the individual effects directly! • Note we can still compare the relative effects of variables (since f(.) cancels out). So the ratio of MEs due to xj and xk is βj / βk . Doesn’t depend on value of latent variable.

05/12/2013 (49)
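The "coefficient times density" formula, and the cancellation of the density in ratios of marginal effects, can be checked numerically; a sketch with illustrative coefficients:

```python
import math

def probit_me(index, beta_j):
    # ME = phi(index) * beta_j, where phi is the N(0,1) density
    pdf = math.exp(-index**2 / 2.0) / math.sqrt(2.0 * math.pi)
    return pdf * beta_j

def logit_me(index, beta_j):
    # Logistic density equals F(1 - F), so ME = F(index)(1 - F(index)) * beta_j
    p = math.exp(index) / (1.0 + math.exp(index))
    return p * (1.0 - p) * beta_j

# The ratio of two variables' MEs equals the ratio of coefficients:
# the density term cancels, whatever the index value.
b1, b2, index = 0.8, 0.4, 0.3   # illustrative values
print(probit_me(index, b1) / probit_me(index, b2))
```

Because the density is largest at an index of 0 (probability near 0.5), marginal effects are biggest for people near the middle of the probability range.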

Conditional logit
Subsume zi in xit for notational simplicity. If we try to estimate the ui using individual-specific dummy variables, there is no simplification analogous to within-group regression. Moreover, the number of parameters → ∞ with n, so the MLDV estimator is not consistent.
Log-likelihood for the logit model for individual i, conditional on ui:

L(β, u1 … un) = Σi Σt=1..Ti [ yit ln( e^(xitβ+ui) / (1 + e^(xitβ+ui)) ) + (1 − yit) ln( 1 / (1 + e^(xitβ+ui)) ) ]

The statistic Σt yit is a sufficient statistic for ui: Pr(yi | Σt yit) does not depend on ui.
Example: Ti = 2; Σt yit can take values 0, 1, 2. Conditional on Σt yit = 0, yi1 = yi2 = 0 and, conditional on Σt yit = 2, yi1 = yi2 = 1 with prob 1. So only cases with Σt yit = 1 are of interest. 05/12/2013 (50)


Conditional logit (continued)
Probability of the conditioning event:
Pr(Σt yit = 1) = Pr(yi1 = 1, yi2 = 0) + Pr(yi1 = 0, yi2 = 1)
 = Pi1(1 − Pi2) + (1 − Pi1)Pi2
 = [e^(xi1β+ui) + e^(xi2β+ui)] / [(1 + e^(xi1β+ui))(1 + e^(xi2β+ui))]
Conditional probability:
Pr(yi1 = 1, yi2 = 0 | yi1 + yi2 = 1) = Pr(yi1 = 1, yi2 = 0) / Pr(yi1 + yi2 = 1)
 = e^(xi1β+ui) / [e^(xi1β+ui) + e^(xi2β+ui)]
 = e^(xi1β) / [e^(xi1β) + e^(xi2β)]
 = e^((xi1−xi2)β) / [1 + e^((xi1−xi2)β)]
ui is eliminated by conditioning on Σt yit 05/12/2013 (51)

Conditional logit (continued)
With T = 2, the conditional log-likelihood is:

L(β) = Σ_{i: Σt yit = 1} [ di (xi1 − xi2)β − ln(1 + e^((xi1−xi2)β)) ]

where di = 1 if yi1 = 1, yi2 = 0 and di = 0 if yi1 = 0, yi2 = 1.
Note that, if xit contains time-invariant covariates (i.e. zi), these disappear from (xi1 − xi2) ⇒ α cannot be estimated.
In general, conditional logit only uses data from individuals who experience change in yit over time. This sacrifices sample variation.
• The same conditioning approach does not work with probit and other functional forms, nor with general dynamic models
• But it can be generalised to:
 – unordered multinomial logit models
 – ordered logit models with more than two outcomes.

05/12/2013 (52)
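The elimination of ui can be verified numerically: the conditional probability computed "the long way" (with an explicit fixed effect) coincides with the formula in differenced covariates, whatever value ui takes. A sketch with illustrative β, x and u values:

```python
import math

def cond_prob_10(x1, x2, beta):
    """Pr(y1=1, y2=0 | y1+y2=1) from the differenced-covariate formula."""
    d = sum(b * (a1 - a2) for b, a1, a2 in zip(beta, x1, x2))
    return math.exp(d) / (1.0 + math.exp(d))

def joint_ratio(x1, x2, beta, u):
    """Same probability computed the long way, with the fixed effect u included."""
    a1 = sum(b * v for b, v in zip(beta, x1)) + u
    a2 = sum(b * v for b, v in zip(beta, x2)) + u
    p1 = math.exp(a1) / (1.0 + math.exp(a1))
    p2 = math.exp(a2) / (1.0 + math.exp(a2))
    return p1 * (1 - p2) / (p1 * (1 - p2) + (1 - p1) * p2)

beta, x1, x2 = [0.5, -1.0], [1.0, 2.0], [0.0, 1.0]
for u in (-3.0, 0.0, 3.0):   # any u gives the same conditional probability
    print(round(joint_ratio(x1, x2, beta, u), 6), round(cond_prob_10(x1, x2, beta), 6))
```

This is exactly why conditional logit needs no assumption about the distribution of ui.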


The random effects likelihood function (static model)
Let Pit(ui) = Pr(yit | zi, xit, ui), where
Pr(yit | zi, xit, ui) = F(β0 + ziα + xitβ + ui) if yit = 1
 = 1 − F(β0 + ziα + xitβ + ui) if yit = 0
Then the likelihood function for individual i, conditional on ui, is:

Li(ui) = Π_{t=1}^{T} Pit(ui)

which tells us, for given values of α, β, σu² and σε², and a given value of ui, how well the model fits the data on individual i. 05/12/2013 (53)

Integrating out the random effects
Including ui in the conditioning set greatly simplifies the likelihood function, because errors from different time periods are then independent (otherwise, we'd need to allow for dependence across periods). But… we don't know ui (also we have the incidental parameters problem). We do, however, know (by assumption!) its distribution. Therefore we can "average out" or marginalise with respect to ui:

Li = E[ Π_{t=1}^{Ti} Pit(ui) ] = ∫ [ Π_{t=1}^{Ti} Pit(u) ] g(u) du

where g(u) is an assumed density for u, e.g. for probit, Gaussian: g(u) = σu⁻¹ φ(u/σu). The full likelihood function is L = Πi Li.
Evaluation of the likelihood function requires the integral to be approximated numerically by a quadrature algorithm.

05/12/2013 (54)
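The quadrature step can be sketched with Gauss–Hermite nodes and weights: after the change of variable u = √2·σu·t, the Gaussian integral becomes a weighted sum of conditional likelihoods. The data, index values and σu below are illustrative only:

```python
import math
import numpy as np

def Phi(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def individual_likelihood(y, index, sigma_u, nodes=20):
    """Marginal likelihood for one individual: integrate the conditional probit
    likelihood over u ~ N(0, sigma_u^2) by Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(nodes)   # physicists' Hermite rule
    total = 0.0
    for tk, wk in zip(t, w):
        u = math.sqrt(2.0) * sigma_u * tk           # change of variable
        lik = 1.0
        for yt, idx in zip(y, index):
            p = Phi(idx + u)
            lik *= p if yt == 1 else 1.0 - p        # F or 1-F per period
        total += wk * lik
    return total / math.sqrt(math.pi)

y = [1, 1, 0, 1]                   # one individual's binary outcomes
index = [0.2, 0.5, -0.1, 0.3]      # fitted b0 + zi*a + xit*b for each period
print(individual_likelihood(y, index, sigma_u=0.6))
```

Setting sigma_u = 0 collapses the quadrature to the plain product of per-period probabilities, which is a convenient sanity check; an ML routine would sum the logs of these individual likelihoods and maximise over the parameters.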
