- For these kinds of dependent variables, OLS and the CLRM will fail to provide desirable estimates; in fact, OLS easily produces nonsensical results
- Focus here will be on binary and count limited dependent variables
Binary dependent variables
Remember OLS assumptions

- e_i has a constant variance σ² (homoskedasticity)
- the e_i are uncorrelated with one another
- e_i is normally distributed (necessary for inference)
- Y is unconstrained on the real line, implied by the lack of restrictions on the values of the independent variables (except that they cannot be exact linear combinations of each other)
- This cannot work if Y = {0, 1} only:

  E(Y_i) = 1 · P(Y_i = 1) + 0 · P(Y_i = 0) = P(Y_i = 1) = Σ_k b_k X_ik = X_i b

- But if Y_i only takes two possible values, then for any given X_i the residual e_i = Y_i − Ŷ_i can also take only two possible values (one when Y_i = 0 and one when Y_i = 1)
Why OLS is unsuitable for binary dependent variables
- From above, P(Y_i = 1) = X_i b; hence this is called a linear probability model
- The two possible residuals are:
  - if Y_i = 0, then 0 = X_i b + e_i, so e_i = −X_i b
  - if Y_i = 1, then 1 = X_i b + e_i, so e_i = 1 − X_i b
- We can maintain the assumption that E(e_i) = 0:

  E(e_i) = (1 − X_i b) · P(Y_i = 1) + (−X_i b) · P(Y_i = 0)
         = (1 − X_i b) · X_i b − X_i b · (1 − X_i b)
         = 0

- But homoskedasticity fails:

  Var(e_i) = (1 − X_i b)² · X_i b + (−X_i b)² · (1 − X_i b) = X_i b · (1 − X_i b)

  Hence the variance of e_i varies systematically with the values of X_i
- Inference from OLS for binary dependent variables is therefore invalid
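To see the problem concretely, the following sketch (illustrative only; all data are simulated, not from the slides) fits OLS to a binary outcome and checks the fitted "probabilities":

## Simulate binary data whose true data-generating process is logistic
set.seed(42)
n <- 200
x <- rnorm(n, mean = 0, sd = 2)
y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * x))

## Fit the linear probability model by OLS
lpm <- lm(y ~ x)

## Fitted values fall outside [0, 1] -- nonsensical as probabilities
range(fitted(lpm))
mean(fitted(lpm) < 0 | fitted(lpm) > 1)   # share of out-of-range predictions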
Back to basics: the Bernoulli distribution
The Bernoulli distribution is generated from a random variable with two possible events:

1. Random variable Y_i has two mutually exclusive outcomes, Y_i = {0, 1}: Pr(Y_i = 1 | Y_i = 0) = 0
2. 0 and 1 are exhaustive outcomes: Pr(Y_i = 1) = 1 − Pr(Y_i = 0)

Denote the population parameter of interest as π, the probability that Y_i = 1:

Pr(Y_i = 1) = π
Pr(Y_i = 0) = 1 − π
Bernoulli distribution cont.
Formula:

f_bern(y_i | π) = π^y_i · (1 − π)^(1 − y_i)   for y_i = 0, 1
                = 0                            otherwise
Expectation of Y is π:

E(Y_i) = Σ_{y_i} y_i · f(y_i)
       = 0 · f(0) + 1 · f(1)
       = 0 + π
       = π
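A quick simulation check (illustrative, not from the slides) confirms that the sample mean of Bernoulli draws recovers π:

## Draw 100,000 Bernoulli(0.3) variates; their mean estimates pi
y <- rbinom(1e5, size = 1, prob = 0.3)
mean(y)   # approximately 0.3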
Introduction to maximum likelihood
- Goal: try to find the parameter value β̃ that makes E(Y | X, β) as close as possible to the observed Y
- For Bernoulli: let p_i = P(Y_i = 1 | X_i), which implies P(Y_i = 0 | X_i) = 1 − p_i. The probability of observing Y_i is then

  P(Y_i | X_i) = p_i^Y_i · (1 − p_i)^(1 − Y_i)
- Since the observations can be assumed to be independent events, the probability of the whole sample is the product

  L(Y | X, β) = ∏_{i=1}^N p_i^Y_i · (1 − p_i)^(1 − Y_i)

- When evaluated, this expression yields a result on the interval (0, 1) that represents the likelihood of observing this sample Y given X if β̂ were the "true" value
- The MLE β̃ is the value of β that maximizes this likelihood:

  L(Y | X, β̃) = max_b L(Y | X, b)

- In practice we maximize the log-likelihood, log L, which turns the product into a sum but has its maximum at the same β̃
MLE example: what π for a tossed coin?

[Figure: likelihood of π for a sample of coin tosses Y_i, peaking at π = 0.5]
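The R code on the "MLE example in R" slide below survives only in fragments, so here is a minimal sketch of how such an estimation could be done in base R. The data vector y (ten tosses, five heads) and the use of optimize() are illustrative assumptions, not the slides' actual code:

## Hypothetical coin-toss data: ten tosses, five heads (assumed)
y <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)

## Bernoulli log-likelihood as a function of pi (logs turn the product into a sum)
loglik <- function(p) sum(y * log(p) + (1 - y) * log(1 - p))

## Maximize over (0, 1); the MLE equals the sample mean, here 0.5
coin.mle <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)
coin.mle$maximum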
MLE example in R

> ## MLE example
[the remaining code and output for this slide are truncated in the source]

Logit and probit estimates

> summary(won.logit)$coeff
                                 Estimate   Std. Error   z value     Pr(>|z|)
(Intercept)                 -3.299165e+00 7.054731e-01 -4.676528 2.917722e-06
incumbIncumbent              3.174541e+00 8.427698e-01  3.766795 1.653564e-04
spend_total                  1.631863e-04 2.344526e-05  6.960312 3.395191e-12
electorate                  -8.813909e-06 8.328096e-06 -1.058334 2.899031e-01
incumbIncumbent:spend_total -6.421803e-05 4.297991e-05 -1.494141 1.351388e-01

> summary(won.probit)$coeff
                                 Estimate   Std. Error    z value     Pr(>|z|)
(Intercept)                 -1.944149e+00 3.904175e-01 -4.9796655 6.369428e-07
incumbIncumbent              1.861190e+00 4.670316e-01  3.9851492 6.743774e-05
spend_total                  9.304030e-05 1.247458e-05  7.4583902 8.758594e-14
electorate                  -4.418578e-06 4.749578e-06 -0.9303096 3.522108e-01
incumbIncumbent:spend_total -3.627852e-05 2.347314e-05 -1.5455330 1.222174e-01
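The calls that fit these models are not shown in the surviving text; with glm() they would typically look like the sketch below. The data frame name (elections) is an assumption, while the variable names are taken from the coefficient tables above:

## Assumed data frame 'elections' with variables won, incumb, spend_total, electorate
won.logit  <- glm(won ~ incumb * spend_total + electorate,
                  family = binomial(link = "logit"),  data = elections)
won.probit <- glm(won ~ incumb * spend_total + electorate,
                  family = binomial(link = "probit"), data = elections)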
Interpreting logit coefficients

The problem: how do we interpret these when the marginal effect of a change in X no longer represents a β change in Y?

1. We can compute fitted values using the formula for p_i (the slides' R code for this step is truncated; see the sketch below)
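Since the slides' own code for this step is lost, here is a sketch of computing fitted probabilities from the logit model, both by hand and with predict(); it assumes won.logit was fit with glm() as above:

## Fitted probabilities by hand: p_i = 1 / (1 + exp(-X_i b))
b <- coef(won.logit)           # estimated coefficients
X <- model.matrix(won.logit)   # design matrix from the fitted model
p.hand <- plogis(X %*% b)

## The same thing via predict()
p.pred <- predict(won.logit, type = "response")
head(cbind(p.hand, p.pred), 9)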