- For these kinds of dependent variables, OLS and the CLRM will fail to provide desirable estimates; in fact, OLS easily produces nonsensical results
- Focus here will be on binary and count limited dependent variables
Binary dependent variables
Remember OLS assumptions

- e_i has a constant variance σ² (homoskedasticity)
- the e_i are uncorrelated with one another
- e_i is normally distributed (necessary for inference)
- Y is unconstrained on the real line, implied by the lack of restrictions on the values of the independent variables (except that they cannot be exact linear combinations of each other)
- This cannot work if Y = {0, 1} only:

  E(Y_i) = 1 · P(Y_i = 1) + 0 · P(Y_i = 0) = P(Y_i = 1) = Σ_k b_k X_ik = X_i b

- But if Y_i only takes two possible values, then for any given X_i the residual e_i = Y_i − Ŷ_i can also take only two possible values (one when Y_i = 0 and one when Y_i = 1)
Why OLS is unsuitable for binary dependent variables
- From above, P(Y_i = 1) = X_i b; hence this is called a linear probability model
- The two possible residuals are:
  - if Y_i = 0, then 0 = X_i b + e_i, so e_i = −X_i b
  - if Y_i = 1, then 1 = X_i b + e_i, so e_i = 1 − X_i b
- We can maintain the assumption that E(e_i) = 0:

  E(e_i) = (1 − X_i b) · P(Y_i = 1) + (−X_i b) · P(Y_i = 0)
         = (1 − X_i b) · X_i b − X_i b · (1 − X_i b)
         = 0

- But homoskedasticity fails:

  Var(e_i) = (1 − X_i b)² · X_i b + (−X_i b)² · (1 − X_i b) = X_i b · (1 − X_i b)

  Hence the variance of e_i varies systematically with the values of X_i
- Inference from OLS for binary dependent variables is therefore invalid
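To see the problem concretely, the following sketch (illustrative only; all data are simulated, not from the slides) fits OLS to a binary outcome and checks the fitted "probabilities":

## Simulate binary data whose true data-generating process is logistic
set.seed(42)
n <- 200
x <- rnorm(n, mean = 0, sd = 2)
y <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * x))

## Fit the linear probability model by OLS
lpm <- lm(y ~ x)

## Fitted values fall outside [0, 1] -- nonsensical as probabilities
range(fitted(lpm))
mean(fitted(lpm) < 0 | fitted(lpm) > 1)   # share of out-of-range predictions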
Back to basics: the Bernoulli distribution
The Bernoulli distribution is generated from a random variable with two possible events:

1. Random variable Y_i has two mutually exclusive outcomes, Y_i = {0, 1}: Pr(Y_i = 1 | Y_i = 0) = 0
2. 0 and 1 are exhaustive outcomes: Pr(Y_i = 1) = 1 − Pr(Y_i = 0)

Denote the population parameter of interest as π, the probability that Y_i = 1:

Pr(Y_i = 1) = π
Pr(Y_i = 0) = 1 − π
Bernoulli distribution cont.
Formula:

f_bern(y_i | π) = π^y_i · (1 − π)^(1 − y_i)   for y_i = 0, 1
                = 0                            otherwise
Expectation of Y is π:

E(Y_i) = Σ_{y_i} y_i · f(y_i)
       = 0 · f(0) + 1 · f(1)
       = 0 + π
       = π
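A quick simulation check (illustrative, not from the slides) confirms that the sample mean of Bernoulli draws recovers π:

## Draw 100,000 Bernoulli(0.3) variates; their mean estimates pi
y <- rbinom(1e5, size = 1, prob = 0.3)
mean(y)   # approximately 0.3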
Introduction to maximum likelihood
- Goal: try to find the parameter value β̃ that makes E(Y | X, β) as close as possible to the observed Y
- For Bernoulli: let p_i = P(Y_i = 1 | X_i), which implies P(Y_i = 0 | X_i) = 1 − p_i. The probability of observing Y_i is then

  P(Y_i | X_i) = p_i^Y_i · (1 − p_i)^(1 − Y_i)
- Since the observations can be assumed to be independent events, the probability of the whole sample is the product

  L(Y | X, β) = ∏_{i=1}^N p_i^Y_i · (1 − p_i)^(1 − Y_i)

- When evaluated, this expression yields a result on the interval (0, 1) that represents the likelihood of observing this sample Y given X if β̂ were the "true" value
- The MLE β̃ is the value of β that maximizes this likelihood:

  L(Y | X, β̃) = max_b L(Y | X, b)

- In practice we maximize the log-likelihood, log L, which turns the product into a sum but has its maximum at the same β̃
MLE example: what π for a tossed coin?

[Figure: likelihood of π for a sample of coin tosses Y_i, peaking at π = 0.5]
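The R code on the "MLE example in R" slide below survives only in fragments, so here is a minimal sketch of how such an estimation could be done in base R. The data vector y (ten tosses, five heads) and the use of optimize() are illustrative assumptions, not the slides' actual code:

## Hypothetical coin-toss data: ten tosses, five heads (assumed)
y <- c(1, 0, 1, 1, 0, 0, 1, 0, 1, 0)

## Bernoulli log-likelihood as a function of pi (logs turn the product into a sum)
loglik <- function(p) sum(y * log(p) + (1 - y) * log(1 - p))

## Maximize over (0, 1); the MLE equals the sample mean, here 0.5
coin.mle <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)
coin.mle$maximum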
MLE example in R

> ## MLE example
[the remaining code and output for this slide are truncated in the source]

Logit and probit estimates

> summary(won.logit)$coeff
                                 Estimate   Std. Error   z value     Pr(>|z|)
(Intercept)                 -3.299165e+00 7.054731e-01 -4.676528 2.917722e-06
incumbIncumbent              3.174541e+00 8.427698e-01  3.766795 1.653564e-04
spend_total                  1.631863e-04 2.344526e-05  6.960312 3.395191e-12
electorate                  -8.813909e-06 8.328096e-06 -1.058334 2.899031e-01
incumbIncumbent:spend_total -6.421803e-05 4.297991e-05 -1.494141 1.351388e-01

> summary(won.probit)$coeff
                                 Estimate   Std. Error    z value     Pr(>|z|)
(Intercept)                 -1.944149e+00 3.904175e-01 -4.9796655 6.369428e-07
incumbIncumbent              1.861190e+00 4.670316e-01  3.9851492 6.743774e-05
spend_total                  9.304030e-05 1.247458e-05  7.4583902 8.758594e-14
electorate                  -4.418578e-06 4.749578e-06 -0.9303096 3.522108e-01
incumbIncumbent:spend_total -3.627852e-05 2.347314e-05 -1.5455330 1.222174e-01
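The calls that fit these models are not shown in the surviving text; with glm() they would typically look like the sketch below. The data frame name (elections) is an assumption, while the variable names are taken from the coefficient tables above:

## Assumed data frame 'elections' with variables won, incumb, spend_total, electorate
won.logit  <- glm(won ~ incumb * spend_total + electorate,
                  family = binomial(link = "logit"),  data = elections)
won.probit <- glm(won ~ incumb * spend_total + electorate,
                  family = binomial(link = "probit"), data = elections)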
Interpreting logit coefficients

The problem: how do we interpret these when the marginal effect of a change in X no longer represents a β change in Y?

1. We can compute fitted values using the formula for p_i (the slides' R code for this step is truncated; see the sketch below)
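Since the slides' own code for this step is lost, here is a sketch of computing fitted probabilities from the logit model, both by hand and with predict(); it assumes won.logit was fit with glm() as above:

## Fitted probabilities by hand: p_i = 1 / (1 + exp(-X_i b))
b <- coef(won.logit)           # estimated coefficients
X <- model.matrix(won.logit)   # design matrix from the fitted model
p.hand <- plogis(X %*% b)

## The same thing via predict()
p.pred <- predict(won.logit, type = "response")
head(cbind(p.hand, p.pred), 9)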