Event History Modeling: Basics, Parametric Models, and the Cox Model

Author: Janel Hodges
Event History Modeling: Basics, Parametric Models, and the Cox Model Jason S. Byers University of Georgia

April 3, 2018

Jason S. Byers (UGA)

Parametric and Cox Models

April 3, 2018

1 / 26

Objectives

By the end of this meeting, participants should be able to:

Explain the need for duration models with respect to censoring and time-dependent hazard rates.
Define the basic elements of a duration model.
Characterize various parametric models of event history.
Estimate parametric models and choose the best-fitting model.
Argue what the advantages of a Cox model are compared to parametric models.
Explain various methods for handling ties.
Estimate and interpret a Cox model.

Preliminaries

Duration

Imagine that, instead of a regular panel data set, our explanatory problem is the duration of some phenomenon. Examples include life itself, marriages, and coalition governments. In each case we ask how long the phenomenon lasts after some initiating event.

Take marriage. The initiating event is the wedding. The couple is then in the initial state, “married,” for some period of time, followed (maybe) by a transition to the absorbing state, “divorce.” We ask “How long do marriages last?” and then the more interesting question, “What factors make them last longer or shorter?”

This is duration analysis. Its defining characteristic is a binary status: the phenomenon is either “on” (1) or “off” (0). We can set up our data in two ways:

Measure how long something lasts and record that length of time as the dependent variable. One row per unit.
Create a stacked panel structure in which we code 1 or 0 for the event’s status. Observations drop out after the status change. Each row is a unit-wave.

Censoring (Example: Individual Presidential Approval)

Suppose we are studying a rally-round-the-flag effect.

Left-censoring: Some people were not affected by the event, never becoming approvers. This is not a big problem. We simply limit our inferences to those who meet the condition and say nothing about others. Inferences from duration models are inherently conditional.

Right-censoring: Some people never transitioned, continuing to approve throughout the duration of the study. This is a more serious issue. If we select these cases out of the sample, our inferences are subject to bias. This amounts to selection on the dependent variable, which can cause big problems.

Software solutions for right-censoring:

R: The Surv function has an event argument. Expected values are 0 = right-censored and 1 = event at time, among other possibilities. (The time argument gives the length of time.)
Stata: Declare your data as duration data with stset timevar, failure(failvar). Here timevar is the length of time and failvar is your censoring variable. Stata treats missing and zero values of failvar as censored.

Why Are Old Approaches Inappropriate?

Why a linear model does not work with length of time as the outcome:

The functional form is almost certain to be wrong.
The distribution of the errors is not normal.

Why logit does not work for a stacked panel:

A plain logit (R’s “glm” function, Stata’s “logit” function) with no terms for elapsed time assumes that the hazard rate is constant. If you are in state 1, then it doesn’t matter how long you have been there: the hazard of moving to state 0 is always the same.
For processes that are more likely to transition early than late (e.g., marriage), this is the wrong model.
Since we explicitly postulate that members respond to the flow of commentary, which changes over time, the logit constant-rate assumption is wrong. We need another model.

What Do Event History Data Look Like?

Event History Models with a Continuous Measure of Duration

Everything that we discuss today uses one row per observation in the data set.
The duration variable is continuous and measured as the length of time the phenomenon lasts.

Event History Models with a Discrete Measure of Duration

The data look more like a stacked panel.
The duration variable is discrete: it is coded “0” in each period until the phenomenon ends, then receives a “1”.
This type of event history model will be discussed next week!

Event History Model with a Continuous Measure

Example Data for Military Interventions

Intervention  Intervenor   Target        Duration  Contiguity  Censored
1             U.K.         Albania       1         0           0
46            El Salvador  Honduras      657       1           0
81            U.S.         Panama        274       0           1
184           Bulgaria     Greece        12        1           0
236           Taiwan       China         7456      1           0
278           Botswana     South Africa  1097      1           0
332           Uganda       Kenya         409       1           1
467           Israel       Egypt         357       1           0
621           Malawi       Mozambique    631       1           1
672           India        Pakistan      173       1           0

Event History Model with a Discrete Measure

Case I.D.  Event Occurrence  Year  Time Elapsed
1          0                 1974  1
1          0                 1975  2
...        ...               ...   ...
1          0                 1986  13
1          1                 1987  14
5          1                 1974  1
45         0                 1974  1
45         0                 1975  2
...        ...               ...   ...
45         0                 1992  19
45         0                 1993  20
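To make the link between the two data layouts concrete, here is a minimal Python sketch that expands one-row-per-unit spells into the stacked, one-row-per-period layout. The function name `expand_to_unit_periods` and the tuple layout of `spells` are hypothetical choices for illustration; the three spells are inferred from the rows visible in the discrete-measure table (case 45 is treated as right-censored because no event is recorded through 1993).

```python
from typing import Dict, List


def expand_to_unit_periods(case_id: int, start_year: int, duration: int,
                           event_observed: bool) -> List[Dict[str, int]]:
    """Turn one spell into one row per period at risk.

    The event indicator is 0 in every period except the last one,
    where it is 1 only if the event was actually observed.
    """
    rows = []
    for elapsed in range(1, duration + 1):
        is_last_period = elapsed == duration
        rows.append({
            "case_id": case_id,
            "year": start_year + elapsed - 1,
            "time_elapsed": elapsed,
            "event": 1 if (is_last_period and event_observed) else 0,
        })
    return rows


# (case_id, start_year, duration, event_observed), inferred from the table.
spells = [
    (1, 1974, 14, True),    # event occurs in 1987, the 14th period
    (5, 1974, 1, True),     # event occurs in the first period
    (45, 1974, 20, False),  # no event through 1993: right-censored
]

panel = [row for spell in spells for row in expand_to_unit_periods(*spell)]

# Show the first rows of case 1 and the last rows of case 45.
for row in panel[:2] + panel[-2:]:
    print(row)
```

Units drop out of the panel after their event, which is why case 5 contributes a single row.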

Survivor Function: A Starting Point

For $T$ the time until an event, the cumulative distribution function is:

$$F(t) = \Pr(T \le t)$$

The probability density function is:

$$f(t) = \frac{\partial F(t)}{\partial t}$$

The survivor function expresses the probability that a particular case has not transitioned at time $t$:

$$S(t) = \Pr(T \ge t)$$

For the example, that is the probability that a member is still in the approval state at time $t$.

Hazard: An Alternate Conception

The hazard rate $h(t)$ is:

$$h(t) = \frac{f(t)}{S(t)}$$

In other words, this is the rate at time $t$ at which units transition from “on” to “off,” given that they are currently “on” at time $t$.
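The three functions can be tied together numerically. The sketch below assumes an exponential duration with a hypothetical rate of 0.05 (a value chosen only for illustration, not taken from the slides), approximates $f(t)$ as a finite-difference derivative of $F(t)$, and then forms $h(t) = f(t)/S(t)$. For a continuous duration, $S(t) = 1 - F(t)$.

```python
import math

RATE = 0.05  # hypothetical constant hazard, chosen only for illustration


def cdf(t: float) -> float:
    """F(t) = Pr(T <= t) for an exponential duration."""
    return 1.0 - math.exp(-RATE * t)


def survivor(t: float) -> float:
    """S(t) = 1 - F(t) for a continuous duration."""
    return 1.0 - cdf(t)


def density(t: float, step: float = 1e-5) -> float:
    """f(t), approximated as a central difference of F(t)."""
    return (cdf(t + step) - cdf(t - step)) / (2.0 * step)


def hazard(t: float) -> float:
    """h(t) = f(t) / S(t)."""
    return density(t) / survivor(t)


for t in (1.0, 10.0, 30.0):
    print(f"t={t:5.1f}  S(t)={survivor(t):.4f}  h(t)={hazard(t):.4f}")
```

The survivor function falls over time while the hazard stays at the assumed rate, which is the flat-hazard property of the exponential model.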

Covariates

Without covariates we are asking the simple question, “How long, on average, does the effect last?” With covariates we are introducing explanations for how long the effect lasts. Testing interesting social theory will generally involve introducing covariates.

Parametric Models

Parametric Approaches

Choose a functional form you believe is appropriate for modeling duration time.
The resulting model should also have a more appropriate probability distribution for your variable.

The Exponential Model

A straw man that is almost never appropriate.

Flat hazard rate: $h(t) = \lambda > 0$

$$S(t) = \exp(-\lambda t)$$

$$f(t) = \lambda \exp(-\lambda t)$$

$$T \sim \mathrm{Exp}(\lambda^{-1})$$

With covariates: $h(t \mid x) = \lambda = \exp(-\beta' x)$

Choosing the Right Functional Form

The Weibull Model

Monotonic hazard rate: $h(t) = \lambda p (\lambda t)^{p-1}$ for $t > 0$, $\lambda > 0$, $p > 0$

$$S(t) = \exp\left(-(\lambda t)^{p}\right)$$

$$f(t) = \lambda p (\lambda t)^{p-1} \exp\left(-(\lambda t)^{p}\right)$$

$$T \sim \mathrm{Wei}(\lambda^{-1}, p)$$

With covariates: $h(t \mid x) = h_0(t) \exp\{\alpha_1 x_{i1} + \alpha_2 x_{i2} + \cdots + \alpha_j x_{ij}\}$

Note: The Gompertz model is another model with a monotonic hazard rate.

[Figure: Examples of Different Hazard Rates. Hazard rate (0.0 to 2.5) plotted against time (0 to 40) for the Exponential, Weibull, and Log-Logistic models.]
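As a rough stand-in for the hazard-rate figure, the following Python sketch evaluates three hazard functions at a few time points. The exponential and Weibull forms are the ones given on the slides; the log-logistic form, $h(t) = \lambda p (\lambda t)^{p-1} / (1 + (\lambda t)^p)$, is the one these slides give for the non-monotonic case. The parameter values (`lam = 0.1` and the various `p`) are hypothetical and are not the ones used to draw the original figure.

```python
import math


def exponential_hazard(t: float, lam: float) -> float:
    """Flat hazard: h(t) = lambda."""
    return lam


def weibull_hazard(t: float, lam: float, p: float) -> float:
    """Monotonic hazard: h(t) = lambda * p * (lambda * t)^(p - 1)."""
    return lam * p * (lam * t) ** (p - 1)


def log_logistic_hazard(t: float, lam: float, p: float) -> float:
    """Possibly unimodal hazard: Weibull-type numerator over 1 + (lambda * t)^p."""
    return lam * p * (lam * t) ** (p - 1) / (1 + (lam * t) ** p)


times = [1, 5, 10, 20, 40]
lam = 0.1  # hypothetical value for illustration

curves = {
    "exponential": [exponential_hazard(t, lam) for t in times],
    "weibull p=1.5": [weibull_hazard(t, lam, 1.5) for t in times],
    "weibull p=0.5": [weibull_hazard(t, lam, 0.5) for t in times],
    "log-logistic p=2": [log_logistic_hazard(t, lam, 2.0) for t in times],
}

for name, values in curves.items():
    print(f"{name:18s}", [round(v, 3) for v in values])
```

With these values the Weibull hazard rises when p > 1 and falls when p < 1, while the log-logistic hazard rises and then falls. Setting p = 1 in the Weibull hazard recovers the exponential model.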

Non-Monotonic Functional Forms

The Log-Logistic Model

Hazard rate is potentially non-monotonic and unimodal:

$$h(t) = \frac{\lambda p (\lambda t)^{p-1}}{1 + (\lambda t)^{p}}$$

Log-linear duration model: $\log(T) = \beta' x + \sigma\epsilon$, where $\epsilon$ has a logistic distribution.

$$S(t) = \frac{1}{1 + (\lambda t)^{p}}$$

$$f(t) = \frac{\lambda p (\lambda t)^{p-1}}{\left(1 + (\lambda t)^{p}\right)^{2}}$$

With covariates: substitute $\lambda = \exp\{-\beta' x\}$ into $h$.

The Log-Normal Model

Log-linear duration model: $\log(T) = \beta' x + \sigma\epsilon$, where $\epsilon$ has a normal distribution.

$$S(t) = 1 - \Phi\left(\frac{\log(t) - \beta' x}{\sigma}\right)$$

$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\, t^{-1} \exp\left[-\frac{1}{2}\left(\frac{\log(t) - \beta' x}{\sigma}\right)^{2}\right]$$

Hazard rate is potentially non-monotonic and unimodal:

$$h(t) = \frac{f(t)}{S(t)}$$

Covariates are built into $h$ through $S$ and $f$.

Estimation

Construct our likelihood function as the product of the probability density functions of our observed data:

$$L = \prod_{i=1}^{n} \{f(t_i)\}^{\delta_i} \{S(t_i)\}^{1-\delta_i}$$

Key here: $\delta_i$ is an indicator coded 0 if an observation is censored, and 1 if it is not. So, in each situation, which term contributes to the likelihood and which turns to 1 in a product term?

The log-likelihood is almost always easier to work with:

$$\ell = \log(L) = \sum_{i=1}^{n} \left[\delta_i \log\{f(t_i)\} + (1 - \delta_i) \log\{S(t_i)\}\right]$$

In each situation, which term contributes to the log-likelihood and which turns to 0 in an additive term?
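One way to see how censoring enters the log-likelihood is to code it for the simplest case. The sketch below assumes an exponential model without covariates and uses the durations and censoring flags from the military interventions table in these slides (the slides do not state the time unit). The function name `exponential_loglik` is hypothetical. For this model the maximizer has a closed form, the number of observed events divided by total time at risk, which the code compares against nearby values.

```python
import math

# Durations and censoring flags from the military interventions table.
durations = [1, 657, 274, 12, 7456, 1097, 409, 357, 631, 173]
censored = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]

# delta_i is 1 for an observed event and 0 for a censored case.
delta = [1 - c for c in censored]


def exponential_loglik(lam: float, times, events) -> float:
    """Censored log-likelihood with f(t) = lam*exp(-lam*t), S(t) = exp(-lam*t)."""
    total = 0.0
    for t, d in zip(times, events):
        log_f = math.log(lam) - lam * t  # contributes when d == 1
        log_s = -lam * t                 # contributes when d == 0
        total += d * log_f + (1 - d) * log_s
    return total


# Closed-form maximum likelihood estimate for the exponential model.
lam_hat = sum(delta) / sum(durations)

print("events:", sum(delta), "total time at risk:", sum(durations))
print("lambda-hat:", lam_hat)
for lam in (0.5 * lam_hat, lam_hat, 2.0 * lam_hat):
    print(f"loglik at {lam:.6f}: {exponential_loglik(lam, durations, delta):.3f}")
```

Censored cases still add their time at risk to the denominator of the estimate, which is how they inform the model without being counted as failures.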

Parametric Choice & Model Fit

Generalized gamma distribution: Can you impose constraints?

Likelihood ratio test: $-2(\log L_0 - \log L_1) \sim \chi^2$, with degrees of freedom set by the number of constraints.

AIC: $-2\log L + 2(c + p + 1)$

BIC: $-2\log L + \log(n)(c + p + 1)$
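As a worked illustration of these fit statistics, the sketch below compares the exponential and Weibull log-likelihoods reported in the U.N. peacekeeping table in these slides (-86.35 and -84.66, N = 54). The exponential model is the Weibull with the single constraint p = 1, so the test has one degree of freedom. Two points are assumptions rather than statements from the slides: c is read as the number of covariates (2 here) and p as the number of structural parameters (1 for the exponential, 2 for the Weibull). The p-value uses the identity that a chi-square variable with one degree of freedom is a squared standard normal.

```python
import math

# Log-likelihoods and sample size from the U.N. peacekeeping table.
loglik_exponential = -86.35  # restricted model (Weibull with p = 1)
loglik_weibull = -84.66      # unrestricted model
n_obs = 54

# Likelihood ratio statistic: -2 * (log L0 - log L1).
lr_stat = -2.0 * (loglik_exponential - loglik_weibull)

# Upper-tail probability of a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lr_stat / 2.0))


def aic(loglik: float, c: int, p: int) -> float:
    """AIC = -2 log L + 2(c + p + 1)."""
    return -2.0 * loglik + 2.0 * (c + p + 1)


def bic(loglik: float, c: int, p: int, n: int) -> float:
    """BIC = -2 log L + log(n)(c + p + 1)."""
    return -2.0 * loglik + math.log(n) * (c + p + 1)


print(f"LR statistic: {lr_stat:.2f}, p-value: {p_value:.3f}")
# ASSUMPTION: c = 2 covariates; p = 1 (exponential) or 2 (Weibull).
print(f"AIC exponential: {aic(loglik_exponential, 2, 1):.2f}")
print(f"AIC Weibull:     {aic(loglik_weibull, 2, 2):.2f}")
print(f"BIC exponential: {bic(loglik_exponential, 2, 1, n_obs):.2f}")
print(f"BIC Weibull:     {bic(loglik_weibull, 2, 2, n_obs):.2f}")
```

The likelihood ratio statistic is 3.38, just short of the 3.84 critical value for one degree of freedom at the .05 level, so the choice between the two models is a close call on this evidence.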

Cox Proportional Hazard Model

The Cox Proportional Hazard Model

Goal: Effect of covariates. Wrongly specified parametric models can interfere with this, so is there a good semi-parametric approach?

With the Cox model, we do not directly estimate the baseline hazard.

Specification

$$h_i(t) = \exp(\beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki})\, h_0(t)$$

$$\log\left[\frac{h_i(t)}{h_0(t)}\right] = \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki}$$

Partial Likelihood

Instead of using probability density functions from a parametric distribution, we use the probability of failure conditional on being in the risk set:

$$P(t_j = T_i \mid R(t_i)) = \frac{e^{\beta' x_i}}{\sum_{j \in R(t_i)} e^{\beta' x_j}}$$

The product of these probabilities gives us the partial likelihood function:

$$L_p = \prod_{i=1}^{K} \left[\frac{e^{\beta' x_i}}{\sum_{j \in R(t_i)} e^{\beta' x_j}}\right]^{\delta_i}$$

We optimize the log of the partial likelihood:

$$\ell_p = \log L_p = \sum_{i=1}^{K} \delta_i \left[\beta' x_i - \log \sum_{j \in R(t_i)} e^{\beta' x_j}\right]$$
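The log partial likelihood is short enough to code directly. The sketch below is a minimal version for a single covariate and assumes no tied failure times. The function name `cox_partial_loglik` and the four-case data set are hypothetical and exist only to show the mechanics: each observed failure contributes its own linear predictor minus the log of the summed risk scores of everyone still at risk.

```python
import math


def cox_partial_loglik(beta: float, durations, events, x) -> float:
    """Log partial likelihood for one covariate, assuming no tied failures."""
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(durations, events)):
        if d_i == 0:
            continue  # censored cases appear only inside risk sets
        # Risk set R(t_i): every case whose duration is at least t_i.
        risk_sum = sum(math.exp(beta * x[j])
                       for j, t_j in enumerate(durations) if t_j >= t_i)
        total += beta * x[i] - math.log(risk_sum)
    return total


# Hypothetical toy data: durations, event flags (1 = failed), one covariate.
toy_durations = [2, 4, 6, 8]
toy_events = [1, 0, 1, 1]
toy_x = [1.0, 0.0, 1.0, 0.0]

for beta in (-1.0, 0.0, 1.0, 2.0):
    value = cox_partial_loglik(beta, toy_durations, toy_events, toy_x)
    print(f"beta={beta:5.1f}  log partial likelihood={value:.4f}")
```

At beta = 0 every case has the same risk score, so the value reduces to minus the sum of the logged risk-set sizes (here 4, 2, and 1). Note that no baseline hazard appears anywhere in the calculation.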

Partial Likelihood Continued

An Example from Box-Steffensmeier and Jones

Case  Duration  Censored
7     7         No
4     15        No
5     21        No
2     28        Yes
9     30        Yes
3     36        No
8     45        Yes
1     46        No
6     51        No

Partial Likelihood Continued

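$$L_p = \frac{\psi(7)}{\psi(1) + \psi(2) + \psi(3) + \psi(4) + \psi(5) + \psi(6) + \psi(7) + \psi(8) + \psi(9)}$$

$$\times \frac{\psi(4)}{\psi(1) + \psi(2) + \psi(3) + \psi(4) + \psi(5) + \psi(6) + \psi(8) + \psi(9)}$$

$$\times \frac{\psi(5)}{\psi(1) + \psi(2) + \psi(3) + \psi(5) + \psi(6) + \psi(8) + \psi(9)}$$

$$\times \frac{\psi(3)}{\psi(1) + \psi(3) + \psi(6) + \psi(8)} \times \frac{\psi(1)}{\psi(1) + \psi(6)} \times \frac{\psi(6)}{\psi(6)}$$

Here $\psi(i)$ denotes $e^{\beta' x_i}$ for case $i$. Only uncensored cases appear in a numerator; censored cases (2, 9, and 8) appear in denominators until their censoring times.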
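The denominators in this example can be rebuilt mechanically from the data. The short Python sketch below uses the case numbers, durations, and censoring flags from the Box-Steffensmeier and Jones example table in these slides and lists, for each observed failure, the cases still in the risk set. The helper name `risk_set` is a hypothetical choice.

```python
# Data from the example table: case IDs, durations, and censoring status.
cases = [7, 4, 5, 2, 9, 3, 8, 1, 6]
durations = [7, 15, 21, 28, 30, 36, 45, 46, 51]
censored = [False, False, False, True, True, False, True, False, False]


def risk_set(failure_time: int):
    """Cases whose duration is at least as long as the failure time."""
    return sorted(c for c, t in zip(cases, durations) if t >= failure_time)


# One term per observed failure: (failing case, cases in the denominator).
terms = []
for case, t, is_censored in zip(cases, durations, censored):
    if is_censored:
        continue  # censored cases never supply a numerator
    terms.append((case, risk_set(t)))

for case, members in terms:
    denominator = " + ".join(f"psi({m})" for m in members)
    print(f"psi({case}) / [{denominator}]")
```

Each printed line corresponds to one factor of the partial likelihood, and the risk set shrinks as cases fail or are censored.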

Handling Ties

The partial likelihood function depends only on ordered duration times.

Numerator: all cases with an observed failure.
Denominator: an observation is counted in every risk set in which it is still surviving while others fail.

The Breslow Method

Assume that the risk set does not change among tied failure times.

$$L_{\text{Breslow}} = \prod_{i=1}^{K} \frac{e^{\beta' s_i}}{\left[\sum_{j \in R(t_i)} e^{\beta' x_j}\right]^{d_i}}$$

Here $d_i$ is the number of cases failing at $t_i$, and $s_i$ is the sum of the covariate vectors of those tied cases.

Handling Ties

The Efron Method

Assume that all failure orderings among tied events are equally likely. Then calculate the expected risk set.

$$L_{\text{Efron}} = \prod_{i=1}^{K} \frac{e^{\beta' s_i}}{\prod_{r=1}^{d_i} \left[\sum_{j \in R(t_i)} e^{\beta' x_j} - (r - 1)\, d_i^{-1} \sum_{j \in D(t_i)} e^{\beta' x_j}\right]}$$

Other Methods

Averaged Likelihood: Average across the likelihoods of all possible orderings.
The Exact Discrete Method: To start, treat time as discrete and observe a dummy for success or failure (more next time).
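To see how the Breslow and Efron approximations differ, the sketch below computes the contribution of a single tied failure time under each method, for one covariate. The function names and the five-case risk set are hypothetical. Here `x_risk` holds the covariate values of every case in the risk set (including the tied failures) and `x_tied` holds the values of the cases failing at that time.

```python
import math


def breslow_term(beta: float, x_risk, x_tied) -> float:
    """Breslow: the full risk-set sum is reused for every tied failure."""
    d = len(x_tied)
    numerator = math.exp(beta * sum(x_tied))
    risk_sum = sum(math.exp(beta * x) for x in x_risk)
    return numerator / risk_sum ** d


def efron_term(beta: float, x_risk, x_tied) -> float:
    """Efron: the r-th factor removes (r - 1)/d of the tied cases' risk."""
    d = len(x_tied)
    numerator = math.exp(beta * sum(x_tied))
    risk_sum = sum(math.exp(beta * x) for x in x_risk)
    tied_sum = sum(math.exp(beta * x) for x in x_tied)
    denominator = 1.0
    for r in range(1, d + 1):
        denominator *= risk_sum - (r - 1) / d * tied_sum
    return numerator / denominator


# Hypothetical risk set of five cases, two of which fail at the same time.
x_risk = [1.0, 0.0, 1.0, 0.0, 0.0]
x_tied = [1.0, 0.0]

for beta in (0.0, 0.5):
    print(f"beta={beta}: Breslow={breslow_term(beta, x_risk, x_tied):.4f}, "
          f"Efron={efron_term(beta, x_risk, x_tied):.4f}")
```

At beta = 0 the Breslow term is 1/25 while the Efron term is 1/20, because Efron shrinks the second factor's risk set to reflect that one of the tied cases has, on average, already left. With no ties (a single failure) the two methods coincide.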

The Most Important Slide: Interpretation

In a proportional hazard model:

Positive coefficients imply that the hazard rate increases with the covariate; hence, the survival time is shortened.
Negative coefficients imply that the hazard rate decreases with the covariate; hence, the survival time is lengthened.

In an accelerated failure time model:

Positive coefficients imply that the survival time is lengthened; hence, the hazard rate decreases.
Negative coefficients imply that the survival time is shortened; hence, the hazard rate increases.

Note carefully how your software is specifying the model.

How does the change in a covariate’s value influence the hazard rate? In a proportional hazard model:

$$\%\Delta h(t) = \left[\frac{e^{\beta(x_i = X_1)} - e^{\beta(x_i = X_2)}}{e^{\beta(x_i = X_2)}}\right] \times 100$$
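The percent-change formula is easy to apply by hand or in code. The sketch below implements it for a single covariate and applies it to the Cox estimates reported in the U.N. peacekeeping table in these slides (.73 for civil war, -.86 for interstate conflict), moving each dummy from 0 to 1. The function name `percent_change_in_hazard` is hypothetical, and `x_new` and `x_old` play the roles of X1 and X2 in the formula.

```python
import math


def percent_change_in_hazard(beta: float, x_new: float, x_old: float) -> float:
    """Percent change in the hazard when the covariate moves from x_old to x_new."""
    new_risk = math.exp(beta * x_new)
    old_risk = math.exp(beta * x_old)
    return (new_risk - old_risk) / old_risk * 100.0


# Cox coefficients from the U.N. peacekeeping table; dummies move 0 -> 1.
civil_war = percent_change_in_hazard(0.73, 1, 0)
interstate = percent_change_in_hazard(-0.86, 1, 0)

print(f"Civil war:           {civil_war:+.1f}% change in the hazard")
print(f"Interstate conflict: {interstate:+.1f}% change in the hazard")
```

Under these estimates, a civil war mission has roughly double the hazard of termination, while an interstate conflict mission has a hazard that is lower by more than half. For a dummy variable the formula simplifies to (exp(beta) - 1) times 100.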

Example of Parametric Models

Model of U.N. Peacekeeping Missions (cell entries are estimates with standard errors in parentheses)

Variables            Exponential   Weibull (A.F.T.)  Weibull (Prop. Hazards)  Cox (Prop. Hazards)
Constant             4.35 (.21)    4.29 (.27)        -3.46 (.50)
Civil War            -1.16 (.36)   -1.10 (.45)       .89 (.38)                .73 (.38)
Interstate Conflict  1.64 (.50)    1.74 (.62)        -1.40 (.51)              -.86 (.50)
N                    54            54                54                       54
Log-Likelihood       -86.35        -84.66            -84.66                   -127.16
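The two Weibull columns describe the same fitted model (note the identical log-likelihoods) in two parameterizations. A standard property of the Weibull model, which the slides do not state explicitly and is therefore flagged here as an added assumption, is that each proportional hazards coefficient equals the accelerated failure time coefficient multiplied by -p, where p is the shape parameter. The sketch below uses the table's estimates to back out the implied p from each row.

```python
# Weibull estimates from the U.N. peacekeeping table.
aft = {"Constant": 4.29, "Civil War": -1.10, "Interstate Conflict": 1.74}
ph = {"Constant": -3.46, "Civil War": 0.89, "Interstate Conflict": -1.40}

# ASSUMPTION (standard Weibull result, not stated on the slides):
# beta_PH = -p * beta_AFT, so p = -beta_PH / beta_AFT for every row.
implied_p = {name: -ph[name] / aft[name] for name in aft}

for name, p in implied_p.items():
    print(f"{name:20s} implied p = {p:.3f}")
```

All three rows imply a shape parameter of about 0.8, up to rounding in the reported coefficients. A value below 1 corresponds to a hazard that declines over time, and the sign flip between columns matches the interpretation rules for the two model types.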

Baseline Hazard and Survivor Functions

Estimated likelihood of being in a risk set:

$$\hat{\alpha}_i = \sum_{j \in R(t_i)} e^{\hat{\beta}' x_j}$$

We can estimate the baseline hazard rate as:

$$\hat{h}_0(t_i) = 1 - \hat{\alpha}_i$$

We can estimate the baseline survivor function as:

$$\hat{S}_0(t) = \prod_{i=1}^{K} \hat{\alpha}_i$$

The integrated hazard rate is:

$$\hat{H}_0(t) = -\log \hat{S}_0(t) = -\sum_{i=1}^{K} \log \hat{\alpha}_i$$

The individual-specific survivor probability is:

$$\hat{S}_i(t) = \hat{S}_0(t)^{\exp(\hat{\beta}' x)}$$

Homework

For Next Time

Read Box-Steffensmeier & Jones, chapters 5-7.

Estimate a parametric and a Cox model from the UN peacekeeping data. Justify your choice of parametric model.

Data: http://spia.uga.edu/faculty_pages/monogan/teaching/ts/UNFINAL.dta

duration is length of time, and failed captures censoring.
Your predictors are whether the mission was a response to a civil war (civil) and whether the mission was a response to an interstate conflict (interst).

Report the results of each in a table. Which model do you trust more and why? Interpret the coefficients for the model you prefer.
