Multinomial Logistic Regression

Andrea Arias Ll.
Luis Sandoval-Mejía
Yangmei (Emily) Wang

Texas Tech University
ISQS 5349: Regression Analysis

April 28, 2016

Introduction Multinomial Logistic Regression Example in R Simulation in R References

Agenda

1. Introduction
2. Multinomial Logistic Regression: multinomial logit model; model assumptions; parameter estimation (MLE)
3. Example in R: estimated probabilities
4. Simulation in R: accounting example
5. References

Arias Ll., Sandoval-Mejía, Wang

Multinomial Logistic Regression


Introduction

Consider a data set with n observations in which the response variable can take one of several discrete values (1, 2, ..., J). For example, let Y be the program type¹: general, academic, or vocational.

¹ IDRE, 2016


Some data analysis

[Figure: "Probability of choosing one program by SES in general"; SES: low, middle, high; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by SES at the 1st quartile of writing score"; SES: low, middle, high; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by SES at the mean writing score"; SES: low, middle, high; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by SES at the 3rd quartile of writing score"; SES: low, middle, high; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by writing score at SES low"; x-axis: writing score (31 to 67); y-axis: probability; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by writing score at SES middle"; x-axis: writing score (31 to 67); y-axis: probability; programs: academic, general, vocational.]


Some data analysis (cont.)

[Figure: "Probability of choosing one program by writing score at SES high"; x-axis: writing score (31 to 67); y-axis: probability; programs: academic, general, vocational.]


When do we use Multinomial Logit?

When we need to model nominal (unordered) outcome variables, that is, when the individual chooses among more than two alternatives. Examples: undergraduate major choice, food choices, car manufacturer choice.

Important: the multinomial logit model applies only when the predictor variables are individual-specific, i.e., they describe the individual rather than the alternatives. Examples: income, age, socioeconomic status, and so on.


The Multinomial Distribution

Consider a random variable $Y_i$ that may take one of several discrete values $(1, 2, \dots, J)$. Let

$$\Pr\{Y_i = j\} = \pi_{ij}, \qquad i = 1, \dots, n, \quad j = 1, \dots, J,$$

where $\pi_{ij}$ is the probability that the i-th individual chooses the j-th category². And

$$\sum_{j=1}^{J} \pi_{ij} = 1, \qquad i = 1, \dots, n.$$

² Rodriguez, 2016


Multinomial logit model Model assumptions Parameter estimation: MLE

The Multinomial Logit Model

A model in which the probabilities depend on a vector of predictors $X_i$:
Nominate one of the response categories as the baseline.
Calculate the logits of all other categories with respect to the baseline.
Let the logits be linear functions of the predictors.


The Multinomial Logit model (cont.)



$$\log\left(\frac{\pi_{ij}}{\pi_{iJ}}\right) = X_i'\beta_j, \qquad j = 1, \dots, J-1$$
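Exponentiating the baseline-category logits and imposing that the J probabilities sum to one gives closed-form expressions for the probabilities. A short derivation in the same notation (using $k$ as the summation index):

```latex
\begin{aligned}
\frac{\pi_{ij}}{\pi_{iJ}} &= \exp(X_i'\beta_j), \qquad j = 1, \dots, J-1 \\
1 = \sum_{j=1}^{J} \pi_{ij} &= \pi_{iJ}\Big(1 + \sum_{k=1}^{J-1} \exp(X_i'\beta_k)\Big) \\
\pi_{iJ} &= \frac{1}{1 + \sum_{k=1}^{J-1} \exp(X_i'\beta_k)} \\
\pi_{ij} &= \pi_{iJ}\exp(X_i'\beta_j)
          = \frac{\exp(X_i'\beta_j)}{1 + \sum_{k=1}^{J-1} \exp(X_i'\beta_k)}, \qquad j = 1, \dots, J-1
\end{aligned}
```

With J = 2 there is a single logit, and these expressions reduce to the binary logistic model.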


The Multinomial Logit model (cont.)

Modeling the probabilities³:

$$\Pr(Y_i = j \mid X_i) = \pi_{ij} = \frac{\exp(X_i'\beta_j)}{1 + \sum_{k=1}^{J-1} \exp(X_i'\beta_k)}, \qquad j = 1, 2, \dots, J-1$$

$$\Pr(Y_i = J \mid X_i) = \pi_{iJ} = \frac{1}{1 + \sum_{k=1}^{J-1} \exp(X_i'\beta_k)}$$

Note: the binomial logit model is the special case J = 2.

³ Greene, 2012
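A minimal Python sketch of these probability formulas, using only the standard library. The function name `mnl_probs` and the coefficient values are hypothetical illustrations, not taken from the slides. The baseline category J gets a linear predictor of 0, and the largest linear predictor is subtracted before exponentiating, which leaves every ratio unchanged but avoids overflow.

```python
import math
from typing import Sequence


def mnl_probs(x: Sequence[float], betas: Sequence[Sequence[float]]) -> list[float]:
    """Multinomial logit probabilities for one individual.

    x     -- predictor vector X_i (include a leading 1.0 for an intercept)
    betas -- the J-1 coefficient vectors beta_1, ..., beta_{J-1};
             the baseline category J has its coefficients fixed at zero.
    Returns [pi_i1, ..., pi_iJ], with the baseline probability last.
    """
    # Linear predictors X_i' beta_j for j = 1..J-1, plus 0 for the baseline J.
    eta = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas] + [0.0]
    # Shift by the maximum: exp ratios are unchanged, overflow is avoided.
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [wj / total for wj in w]


# Hypothetical coefficients for J = 3 categories and X_i = (1, x).
probs = mnl_probs([1.0, 2.0], [[0.5, -0.2], [-1.0, 0.3]])
print([round(p, 3) for p in probs])
```

Passing a single coefficient vector (J = 2) returns the binary logistic probability as the first entry, matching the note above.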


Model assumptions

Linearity: the logits are linear functions of the predictors.
Independence of observations.
Independence of irrelevant alternatives (IIA); this is not as important as the first two.
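Under the multinomial logit model, the odds of alternative j versus alternative k for individual i are $\pi_{ij}/\pi_{ik} = \exp(X_i'(\beta_j - \beta_k))$, which involves no other alternative; this is the IIA property. A minimal sketch, assuming hypothetical linear-predictor values for one student, checks that dropping one alternative leaves the odds between the remaining two unchanged:

```python
import math


def probs_from_eta(eta: dict[str, float]) -> dict[str, float]:
    """Logit probabilities from one linear predictor per alternative."""
    m = max(eta.values())  # shift for numerical stability
    w = {k: math.exp(v - m) for k, v in eta.items()}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}


# Hypothetical linear predictors X_i' beta_j (general is the baseline, eta = 0).
eta_full = {"academic": 0.8, "general": 0.0, "vocational": -0.5}
# Same individual, but the vocational alternative is removed from the choice set.
eta_reduced = {k: v for k, v in eta_full.items() if k != "vocational"}

p_full = probs_from_eta(eta_full)
p_reduced = probs_from_eta(eta_reduced)

# Both odds equal exp(0.8 - 0.0), whether or not "vocational" is available.
print(p_full["academic"] / p_full["general"])
print(p_reduced["academic"] / p_reduced["general"])
```

Both lines print exp(0.8), about 2.2255, up to floating-point rounding. When this invariance is implausible (for example, two near-identical alternatives), IIA is questionable.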


Maximum Likelihood Estimation

Recall that the likelihood function for a parameter vector $\theta$ resulting from an independent sample is⁴:

$$L(\theta \mid y_1, y_2, \dots, y_n) = p(y_1 \mid \theta) \times p(y_2 \mid \theta) \times \cdots \times p(y_n \mid \theta)$$

In our case, since for observation i the probability is given by $\Pr\{Y_i = j\} = \pi_{ij}$, the likelihood function is:

$$L(\pi \mid \text{data}) = \prod_{i \in \text{choice } 1} \pi_{i1} \prod_{i \in \text{choice } 2} \pi_{i2} \cdots \prod_{i \in \text{choice } J} \pi_{iJ}$$

Because each $\pi_{ij}$ depends on $X_i$ and the coefficients $\beta_j$ through the multinomial logit model, we choose the $\hat\beta$ that maximizes $L(\pi \mid \text{data})$.

⁴ Westfall and Henning, 2013
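A minimal sketch of maximum likelihood for the multinomial logit, assuming plain gradient ascent on the average log-likelihood. This optimizer is a simple stand-in chosen for readability; dedicated software usually relies on Newton-type or quasi-Newton methods. The gradient for category j is $\sum_i (1[y_i = j] - \pi_{ij}) X_i$. The "true" coefficients and the simulated data are hypothetical.

```python
import math
import random


def mnl_probs(x, betas):
    """pi_i1..pi_iJ for one observation; the last category is the baseline."""
    eta = [sum(a * b for a, b in zip(x, beta)) for beta in betas] + [0.0]
    m = max(eta)  # stabilise exp(); ratios are unchanged
    w = [math.exp(e - m) for e in eta]
    s = sum(w)
    return [v / s for v in w]


def log_likelihood(betas, X, y):
    """log L = sum_i log pi_{i, y_i}, with y_i a 0-based category index."""
    return sum(math.log(mnl_probs(x, betas)[yi]) for x, yi in zip(X, y))


def gradient(betas, X, y):
    """Average gradient: (1/n) sum_i (1[y_i = j] - pi_ij) x_i for j < J."""
    n = len(X)
    grad = [[0.0] * len(beta) for beta in betas]
    for x, yi in zip(X, y):
        p = mnl_probs(x, betas)
        for j in range(len(betas)):
            r = (1.0 if yi == j else 0.0) - p[j]
            for k, xk in enumerate(x):
                grad[j][k] += r * xk / n
    return grad


def fit_mnl(X, y, n_categories, step=2.0, iters=500):
    """Maximise the log-likelihood by plain gradient ascent, starting at 0."""
    betas = [[0.0] * len(X[0]) for _ in range(n_categories - 1)]
    for _ in range(iters):
        g = gradient(betas, X, y)
        betas = [[b + step * gb for b, gb in zip(bj, gj)]
                 for bj, gj in zip(betas, g)]
    return betas


# Hypothetical "true" coefficients: J = 3 categories, X_i = (1, z_i).
random.seed(1)
true_betas = [[0.5, 1.0], [-0.5, -1.0]]
X = [[1.0, random.gauss(0.0, 1.0)] for _ in range(500)]
y = [random.choices(range(3), weights=mnl_probs(x, true_betas))[0] for x in X]

beta_hat = fit_mnl(X, y, n_categories=3)
print("beta_hat:", [[round(b, 2) for b in bj] for bj in beta_hat])
print("logL at beta_hat:", round(log_likelihood(beta_hat, X, y), 2))
print("logL at true betas:", round(log_likelihood(true_betas, X, y), 2))
```

The log-likelihood is concave in the coefficients, so gradient ascent with a small enough step approaches the unique maximizer; at that point the log-likelihood is at least as large as at the true coefficients.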



Estimated probabilities

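Once we have estimated the parameters ($\hat\beta$), we can estimate the probabilities for each particular cohort:

$$\Pr(Y = O_0 \mid X_1 = x_1, X_2 = x_2) = \frac{1}{1 + e^{\hat\beta_{10} + \hat\beta_{11} x_1 + \hat\beta_{12} x_2} + e^{\hat\beta_{20} + \hat\beta_{21} x_1 + \hat\beta_{22} x_2}}$$

$$\Pr(Y = O_1 \mid X_1 = x_1, X_2 = x_2) = \frac{e^{\hat\beta_{10} + \hat\beta_{11} x_1 + \hat\beta_{12} x_2}}{1 + e^{\hat\beta_{10} + \hat\beta_{11} x_1 + \hat\beta_{12} x_2} + e^{\hat\beta_{20} + \hat\beta_{21} x_1 + \hat\beta_{22} x_2}}$$

$$\Pr(Y = O_2 \mid X_1 = x_1, X_2 = x_2) = \frac{e^{\hat\beta_{20} + \hat\beta_{21} x_1 + \hat\beta_{22} x_2}}{1 + e^{\hat\beta_{10} + \hat\beta_{11} x_1 + \hat\beta_{12} x_2} + e^{\hat\beta_{20} + \hat\beta_{21} x_1 + \hat\beta_{22} x_2}}$$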



Simulation: Accounting example

Response variable: tax service provider, with three nominal choices: B (Big4), N (Non-Big4), and S (self-preparer).

Predictor variable: the company's tax preparation budget (lbudget).


Simulation: Accounting example (cont.)

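The model:

$$\log\left(\frac{\Pr(Y = N \mid \text{lbudget})}{\Pr(Y = B \mid \text{lbudget})}\right) = \beta_{01} + \beta_{11}\,\text{lbudget}$$

$$\log\left(\frac{\Pr(Y = S \mid \text{lbudget})}{\Pr(Y = B \mid \text{lbudget})}\right) = \beta_{02} + \beta_{12}\,\text{lbudget}$$

The probabilities of choosing each of the three options:

$$\Pr(Y = B \mid \text{lbudget}) = \frac{1}{1 + e^{\beta_{01} + \beta_{11}\,\text{lbudget}} + e^{\beta_{02} + \beta_{12}\,\text{lbudget}}}$$

$$\Pr(Y = N \mid \text{lbudget}) = \frac{e^{\beta_{01} + \beta_{11}\,\text{lbudget}}}{1 + e^{\beta_{01} + \beta_{11}\,\text{lbudget}} + e^{\beta_{02} + \beta_{12}\,\text{lbudget}}}$$

$$\Pr(Y = S \mid \text{lbudget}) = \frac{e^{\beta_{02} + \beta_{12}\,\text{lbudget}}}{1 + e^{\beta_{01} + \beta_{11}\,\text{lbudget}} + e^{\beta_{02} + \beta_{12}\,\text{lbudget}}}$$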
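A minimal simulation sketch of this model, with Big4 (B) as the baseline. The coefficient values, the uniform range for lbudget, and the helper names `provider_probs` and `simulate` are all hypothetical assumptions for illustration; the slides do not give them. The sketch draws each company's provider from the three probabilities above and compares the simulated shares with the average model probabilities.

```python
import math
import random


def provider_probs(lbudget, b01, b11, b02, b12):
    """Pr(B), Pr(N), Pr(S) given lbudget; B (Big4) is the baseline."""
    e_n = math.exp(b01 + b11 * lbudget)
    e_s = math.exp(b02 + b12 * lbudget)
    denom = 1.0 + e_n + e_s
    return {"B": 1.0 / denom, "N": e_n / denom, "S": e_s / denom}


def simulate(n, coefs, seed=0):
    """Draw n (lbudget, choice) pairs from the model with the given coefficients."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        lbudget = rng.uniform(8.0, 14.0)  # hypothetical range of lbudget
        p = provider_probs(lbudget, *coefs)
        choice = rng.choices(list(p), weights=list(p.values()))[0]
        data.append((lbudget, choice))
    return data


# Hypothetical (beta_01, beta_11, beta_02, beta_12): negative slopes make
# N and S less likely relative to Big4 as the budget grows.
coefs = (4.0, -0.3, 6.0, -0.5)
data = simulate(5000, coefs, seed=42)

for c in "BNS":
    share = sum(1 for _, ch in data if ch == c) / len(data)
    expected = sum(provider_probs(lb, *coefs)[c] for lb, _ in data) / len(data)
    print(c, round(share, 3), round(expected, 3))
```

Fitting the multinomial logit to the simulated data (for example, with the gradient-ascent approach of the MLE slide or with R) should approximately recover the coefficients used to generate it.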


References

McKee, D. 2015. An intuitive introduction to the multinomial logit. Available at: https://www.youtube.com/watch?v=n7zHXjuE6PE&t=2732s

Institute for Digital Research and Education (IDRE), UCLA. R Data Analysis Examples: Multinomial Logistic Regression. Available at: http://www.ats.ucla.edu/stat/r/dae/mlogit.htm

Greene, W. H. 2012. Econometric Analysis, 7th Edition. Prentice Hall, NJ.

Rodriguez, G. 2016. A note on interpreting multinomial logit coefficients. Princeton University. Available at: http://data.princeton.edu/wws509/stata/mlogit.html

Westfall, P., and Henning, K. S. 2013. Understanding Advanced Statistical Methods. CRC Press.


Thank you! Questions?
