Models for binary data: Logit and Probit

Quantitative Methods II for Political Science
Kenneth Benoit

February 25, 2009

Limited dependent variables

- Some dependent variables are limited in the possible values they may take on; they might be:
  - binary (aka dichotomous)
  - counts
  - unordered categories
  - ordered categories
- For these dependent variables, OLS and the CLRM fail to provide desirable estimates; in fact, OLS easily produces nonsensical results
- The focus here will be on binary and count limited dependent variables

Binary dependent variables

- Remember the OLS assumptions:
  - the errors ε_i have a constant variance σ² (homoskedasticity)
  - the ε_i are uncorrelated with one another
  - ε_i is normally distributed (necessary for inference)
  - Y is unconstrained on ℝ, implied by the lack of restrictions on the values of the independent variables (except that they cannot be exact linear combinations of each other)
- This cannot work if Y_i ∈ {0, 1} only:

  E(Y_i) = 1 · P(Y_i = 1) + 0 · P(Y_i = 0) = P(Y_i = 1) = Σ_k b_k X_ik = X_i b

- But if Y_i takes only two possible values, then for a given X_i the error e_i = Y_i − X_i b can also take on only two possible values (shown on the next slide)

Why OLS is unsuitable for binary dependent variables

- From above, P(Y_i = 1) = X_i b; hence this is called a linear probability model
  - if Y_i = 0, then 0 = X_i b + e_i, so e_i = −X_i b
  - if Y_i = 1, then 1 = X_i b + e_i, so e_i = 1 − X_i b

- We can maintain the assumption that E(e_i) = 0:

  E(e_i) = P(Y_i = 0)(−X_i b) + P(Y_i = 1)(1 − X_i b)
         = −(1 − P(Y_i = 1)) P(Y_i = 1) + P(Y_i = 1)(1 − P(Y_i = 1))
         = 0

- As a result, OLS estimates are unbiased, but the errors will not have a constant variance
- Also, OLS will easily predict values outside of (0, 1) even apart from the variance problem, and thus give nonsensical results (see the sketch below)
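A minimal simulation sketch of both problems; the data-generating process and all names here are made up for illustration, not taken from the lecture:

    ## linear probability model fit by OLS to simulated 0/1 data
    set.seed(42)
    n <- 200
    x <- runif(n, 0, 10)
    p <- pmin(pmax(0.1 * x - 0.1, 0), 1)   # true P(Y = 1) increases with x
    y <- rbinom(n, size = 1, prob = p)     # Bernoulli draws
    lpm <- lm(y ~ x)                       # OLS on a binary outcome
    range(fitted(lpm))                     # fitted "probabilities" can fall outside (0, 1)
    plot(fitted(lpm), residuals(lpm))      # residual spread varies with the fitted value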

Non-constant variance

  Var(e_i) = E(e_i²) − (E(e_i))² = E(e_i²) − 0
           = P(Y_i = 0)(−X_i b)² + P(Y_i = 1)(1 − X_i b)²
           = (1 − P(Y_i = 1))(P(Y_i = 1))² + P(Y_i = 1)(1 − P(Y_i = 1))²
           = P(Y_i = 1)(1 − P(Y_i = 1))
           = X_i b (1 − X_i b)

- Hence the variance of e_i varies systematically with the values of X_i
- Inference from OLS for binary dependent variables is therefore invalid
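A two-line illustration of this variance function: it peaks at P(Y_i = 1) = 0.5 and shrinks toward zero at the extremes.

    p <- seq(0.01, 0.99, by = 0.01)
    plot(p, p * (1 - p), type = "l", xlab = "P(Y = 1)", ylab = "Var(e)")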

Back to basics: the Bernoulli distribution

- The Bernoulli distribution is generated from a random variable with two possible events:
  1. Random variable Y_i has two mutually exclusive outcomes, Y_i ∈ {0, 1}: Pr(Y_i = 1 | Y_i = 0) = 0
  2. 0 and 1 are exhaustive outcomes: Pr(Y_i = 1) = 1 − Pr(Y_i = 0)
- Denote the population parameter of interest as π, the probability that Y_i = 1:

  Pr(Y_i = 1) = π        Pr(Y_i = 0) = 1 − π

Bernoulli distribution cont.

- Formula:

  f_bern(y_i | π) = π^y_i (1 − π)^(1 − y_i)   for y_i = 0, 1
                  = 0                          otherwise

- Expectation of Y_i is π:

  E(Y_i) = Σ_y y f(y)
         = 0 · f(0) + 1 · f(1)
         = 0 + π
         = π
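In R, dbinom() with size = 1 is the Bernoulli pmf, and a quick simulation confirms that the mean of Bernoulli draws approximates π (the value 0.7 here is arbitrary):

    dbinom(c(0, 1), size = 1, prob = 0.7)      # pmf at 0 and 1: 0.3 0.7
    mean(rbinom(10000, size = 1, prob = 0.7))  # sample mean close to π = 0.7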

Introduction to maximum likelihood

- Goal: find the parameter value β̃ that makes E(Y | X, β) as close as possible to the observed Y
- For the Bernoulli case, let P_i = P(Y_i = 1 | X_i), which implies P(Y_i = 0 | X_i) = 1 − P_i. The probability of observing Y_i is then

  P(Y_i | X_i) = P_i^Y_i (1 − P_i)^(1 − Y_i)

- Since the observations can be assumed to be independent events,

  P(Y | X) = Π_{i=1}^{N} P_i^Y_i (1 − P_i)^(1 − Y_i)

- Evaluated at a candidate value β̂, this expression yields a number on the interval (0, 1) that represents the likelihood of observing this sample Y given X if β̂ were the "true" value
- The MLE β̃ is the value of β that maximizes the likelihood: L(β̃ | Y, X) = max_b L(b | Y, X)
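In practice one maximizes the log of this likelihood. A minimal R sketch of the Bernoulli log-likelihood, evaluated at the same candidate π values that the worked example on the next slides computes by hand:

    ## Bernoulli log-likelihood of a 0/1 vector y at a candidate value pi
    bern.loglik <- function(pi, y) sum(y * log(pi) + (1 - y) * log(1 - pi))
    y <- c(0, 1, 1, 0, 1, 1, 0, 1, 1, 1)   # the coin-toss data of the next slides
    sapply(c(0.5, 0.6, 0.7, 0.8), bern.loglik, y = y)
    ## -6.931472 -6.324652 -6.108643 -6.390319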

MLE example: what π for a tossed coin?

π = 0.5:

  Y_i   π^Y_i   (1−π)^(1−Y_i)   L_i     ln L_i
   0    1       0.5             0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   0    1       0.5             0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   0    1       0.5             0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   1    0.5     1               0.5     −0.693147
   1    0.5     1               0.5     −0.693147

  Likelihood = 0.0009766      Log-likelihood = −6.931472

π = 0.6:

  Y_i   π^Y_i   (1−π)^(1−Y_i)   L_i     ln L_i
   0    1       0.4             0.4     −0.916291
   1    0.6     1               0.6     −0.510826
   1    0.6     1               0.6     −0.510826
   0    1       0.4             0.4     −0.916291
   1    0.6     1               0.6     −0.510826
   1    0.6     1               0.6     −0.510826
   0    1       0.4             0.4     −0.916291
   1    0.6     1               0.6     −0.510826
   1    0.6     1               0.6     −0.510826
   1    0.6     1               0.6     −0.510826

  Likelihood = 0.0017916      Log-likelihood = −6.324652

MLE example continued

π = 0.7:

  Y_i   π^Y_i   (1−π)^(1−Y_i)   L_i     ln L_i
   0    1       0.3             0.3     −1.203973
   1    0.7     1               0.7     −0.356675
   1    0.7     1               0.7     −0.356675
   0    1       0.3             0.3     −1.203973
   1    0.7     1               0.7     −0.356675
   1    0.7     1               0.7     −0.356675
   0    1       0.3             0.3     −1.203973
   1    0.7     1               0.7     −0.356675
   1    0.7     1               0.7     −0.356675
   1    0.7     1               0.7     −0.356675

  Likelihood = 0.0022236      Log-likelihood = −6.108643

π = 0.8:

  Y_i   π^Y_i   (1−π)^(1−Y_i)   L_i     ln L_i
   0    1       0.2             0.2     −1.609438
   1    0.8     1               0.8     −0.223144
   1    0.8     1               0.8     −0.223144
   0    1       0.2             0.2     −1.609438
   1    0.8     1               0.8     −0.223144
   1    0.8     1               0.8     −0.223144
   0    1       0.2             0.2     −1.609438
   1    0.8     1               0.8     −0.223144
   1    0.8     1               0.8     −0.223144
   1    0.8     1               0.8     −0.223144

  Likelihood = 0.0016777      Log-likelihood = −6.390319

- Of the candidate values, π = 0.7 gives the highest likelihood; it equals the sample proportion of heads, 7/10

MLE example in R

> ## MLE example
> y <- c(0, 1, 1, 0, 1, 1, 0, 1, 1, 1)
> coin.mle <- ...

Logit and probit examples in R

> summary(won.logit)$coeff
                                 Estimate   Std. Error   z value     Pr(>|z|)
(Intercept)                 -3.299165e+00 7.054731e-01 -4.676528 2.917722e-06
incumbIncumbent              3.174541e+00 8.427698e-01  3.766795 1.653564e-04
spend_total                  1.631863e-04 2.344526e-05  6.960312 3.395191e-12
electorate                  -8.813909e-06 8.328096e-06 -1.058334 2.899031e-01
incumbIncumbent:spend_total -6.421803e-05 4.297991e-05 -1.494141 1.351388e-01

> summary(won.probit)$coeff
                                 Estimate   Std. Error    z value     Pr(>|z|)
(Intercept)                 -1.944149e+00 3.904175e-01 -4.9796655 6.369428e-07
incumbIncumbent              1.861190e+00 4.670316e-01  3.9851492 6.743774e-05
spend_total                  9.304030e-05 1.247458e-05  7.4583902 8.758594e-14
electorate                  -4.418578e-06 4.749578e-06 -0.9303096 3.522108e-01
incumbIncumbent:spend_total -3.627852e-05 2.347314e-05 -1.5455330 1.222174e-01
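Since the coin.mle call above is truncated, here is a minimal sketch of how the maximization could be done numerically, together with hypothetical glm() calls of the general form that would produce coefficient output like the above; the response name won and data-set name dail are assumptions inferred from the printed coefficient names, not taken from the slides:

    ## numerical MLE for the coin example
    y <- c(0, 1, 1, 0, 1, 1, 0, 1, 1, 1)
    loglik <- function(pi) sum(y * log(pi) + (1 - y) * log(1 - pi))
    optimize(loglik, interval = c(0.01, 0.99), maximum = TRUE)$maximum  # about 0.7 = mean(y)

    ## hypothetical model calls ('won' and 'dail' are assumed names)
    # won.logit  <- glm(won ~ incumb * spend_total + electorate,
    #                   family = binomial(link = "logit"),  data = dail)
    # won.probit <- glm(won ~ incumb * spend_total + electorate,
    #                   family = binomial(link = "probit"), data = dail)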

Interpreting logit coefficients

- The problem: how do we interpret these coefficients when the marginal effect of a change in X no longer represents a β change in Y?
- Interpretation 1: we can compute fitted values using the formula for p_i:

  p_i = e^(X_i β) / (1 + e^(X_i β))

> ## interpretation 1: fitted values
> x <- ...
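The fitted-values code above is truncated; what follows is a sketch of the computation, plugging the logit estimates from the earlier output into the formula for p_i. The covariate values chosen here are made up for illustration:

    ## logit coefficients from summary(won.logit)$coeff, in the order
    ## (Intercept), incumbIncumbent, spend_total, electorate, incumb:spend
    b <- c(-3.299165, 3.174541, 1.631863e-04, -8.813909e-06, -6.421803e-05)

    ## a hypothetical incumbent spending 10,000 in a 30,000-voter constituency
    xb <- b[1] + b[2] * 1 + b[3] * 10000 + b[4] * 30000 + b[5] * 1 * 10000
    plogis(xb)   # e^xb / (1 + e^xb), about 0.65: the fitted probability of winning

    ## with the fitted model object, the same is done for every observation by:
    # predict(won.logit, type = "response")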
