Multinomial Choice Models

ECON 203C: System Models, TA Note 7: Version 1
Multinomial Choice Models
Hisayuki Yoshimoto
Last Modified: May 28, 2008

1 Introduction

In this TA note, we discuss multinomial choice models. So far, we have studied binary choice (probit and logit) models, which consist of only two choices. Now we extend the concept of the binary choice model to multinomial choice models, which consist of three or more choices. In summary, we have two categories of multinomial choice models, each paired with a specific distribution function for the error term:

    Category                                    With a specific error distribution
    Ordered multinomial choice models           Ordered probit / logit models
    Non-ordered multinomial choice models       Multinomial probit / logit models

We first discuss the ordered multinomial choice models. Then we discuss the non-ordered multinomial choice models.

2 Ordered Multinomial Choice Models (Ordered Probit / Logit Models)

A multinomial model is used for data in which $y_i$ takes three or more values. Furthermore, if the values of $y_i$ can be ordered, we call the model an ordered multinomial choice model. For example, responses in survey data (such as a political opinion survey asking "Do you agree with the tax reduction?") take the form

$$y_i = \begin{cases} \text{yes} & \text{(assign value 2)} \\ \text{yes/no (not sure)} & \text{(assign value 1)} \\ \text{no} & \text{(assign value 0)} \end{cases}$$

and we can order yes > yes/no (not sure) > no. Another example is grade data in an undergraduate college course,

$$y_i = \begin{cases} \text{grade A (student did well)} & \text{(assign value 3)} \\ \text{grade B (student did average)} & \text{(assign value 2)} \\ \text{grade C (student did poorly)} & \text{(assign value 1)} \\ \text{grade D (student did not understand anything; retake it!)} & \text{(assign value 0)} \end{cases}$$

In this case, we can order grade A > grade B > grade C > grade D.

2.1 Three-Choice Ordered Multinomial Models

We first consider the simplest ordered multinomial choice model, the case of three mutually exclusive choices. Assume that $y_i$ takes only three ordered values $0$, $1$, and $2$. We will generalize to models with more than three choices later. We have a latent variable $y_i^*$,

$$\underbrace{y_i^*}_{1 \times 1} = \underbrace{x_i'}_{1 \times K} \underbrace{\beta}_{K \times 1} + \underbrace{\varepsilon_i}_{1 \times 1},$$

where $x_i'\beta$ is the explanatory-variable part and $\varepsilon_i$ is an error term with c.d.f. $F(\varepsilon_i)$. The latent variable $y_i^*$ determines the value of $y_i$ by the following rule:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le \alpha_1 \\ 1 & \text{if } \alpha_1 < y_i^* \le \alpha_2 \\ 2 & \text{if } \alpha_2 < y_i^*. \end{cases}$$

Our goals are to estimate the parameters $\beta$, $\alpha_1$, and $\alpha_2$, and to derive their asymptotic distribution. Assume that the error term has c.d.f. $F(\cdot)$. If we assume the error term $\varepsilon_i$ is distributed standard normally, i.e., the c.d.f. of $\varepsilon_i$ is

$$F(\varepsilon_i) = \int_{-\infty}^{\varepsilon_i} \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{1}{2} \tilde{\varepsilon}^2 \right) d\tilde{\varepsilon},$$

we call the model the ordered probit model. If we assume the error term $\varepsilon_i$ follows the logistic distribution, i.e.,

$$F(\varepsilon_i) = \frac{\exp(\varepsilon_i)}{1 + \exp(\varepsilon_i)},$$

we call the model the ordered logit model. We will keep discussing the model without specifying the distribution of $\varepsilon_i$, and keep using the general c.d.f. notation $F(\cdot)$. Now, the conditional probabilities of $y_i = 0, 1, 2$ given $x_i$ are derived from

$$\begin{aligned}
y_i = 0 &\iff y_i^* \le \alpha_1 \iff x_i'\beta + \varepsilon_i \le \alpha_1 \iff \varepsilon_i \le \alpha_1 - x_i'\beta, \\
y_i = 1 &\iff \alpha_1 < y_i^* \le \alpha_2 \iff \alpha_1 - x_i'\beta < \varepsilon_i \le \alpha_2 - x_i'\beta, \\
y_i = 2 &\iff \alpha_2 < y_i^* \iff \alpha_2 - x_i'\beta < \varepsilon_i,
\end{aligned}$$

so that

$$\begin{aligned}
\Pr(y_i = 0 \mid x_i) &= F(\alpha_1 - x_i'\beta), \\
\Pr(y_i = 1 \mid x_i) &= F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta), \\
\Pr(y_i = 2 \mid x_i) &= 1 - F(\alpha_2 - x_i'\beta).
\end{aligned}$$

Therefore, we can write the $N$-sample likelihood function as (note that the samples are i.i.d.)¹

$$\begin{aligned}
L_N(\beta, \alpha_1, \alpha_2) &= \prod_{i=1}^{N} \{\Pr(y_i = 0 \mid x_i)\}^{1\{y_i = 0\}} \{\Pr(y_i = 1 \mid x_i)\}^{1\{y_i = 1\}} \{\Pr(y_i = 2 \mid x_i)\}^{1\{y_i = 2\}} \\
&= \prod_{i=1}^{N} \{F(\alpha_1 - x_i'\beta)\}^{1\{y_i = 0\}} \{F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta)\}^{1\{y_i = 1\}} \{1 - F(\alpha_2 - x_i'\beta)\}^{1\{y_i = 2\}}.
\end{aligned}$$

¹Recall that the p.d.f. of the binary choice model
$$y = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$
is $f(p) = p^y (1 - p)^{1 - y} = p^{1\{y = 1\}} (1 - p)^{1\{y = 0\}}$. Similarly, the p.d.f. of the three-choice model
$$y = \begin{cases} 2 & \text{with probability } p \\ 1 & \text{with probability } q \\ 0 & \text{with probability } 1 - p - q \end{cases}$$
is $f(p, q) = p^{1\{y = 2\}} q^{1\{y = 1\}} (1 - p - q)^{1\{y = 0\}}$.

The $N$-sample log-likelihood function is

$$\ell_N(\beta, \alpha_1, \alpha_2) = \ln L_N(\beta, \alpha_1, \alpha_2) = \sum_{i=1}^{N} \left[ \begin{aligned} &1\{y_i = 0\} \ln F(\alpha_1 - x_i'\beta) \\ &+ 1\{y_i = 1\} \ln\{F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta)\} \\ &+ 1\{y_i = 2\} \ln\{1 - F(\alpha_2 - x_i'\beta)\} \end{aligned} \right].$$

Therefore, the ML estimators are defined as

$$(\hat{\beta}_{ML}, \hat{\alpha}_{1,ML}, \hat{\alpha}_{2,ML}) = \arg\max_{\beta, \alpha_1, \alpha_2} \ell_N(\beta, \alpha_1, \alpha_2).$$
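To make the estimator concrete, here is a minimal Python sketch of three-choice ordered probit ML estimation by direct numerical maximization of the log-likelihood above. Everything in it (sample size, the two regressors, the parameter values) is a hypothetical simulation, not part of the original note:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulate hypothetical data: y_i* = x_i'beta + eps_i, eps_i ~ N(0, 1)
N, K = 2000, 2
beta_true, a1_true, a2_true = np.array([0.5, -1.0]), -0.5, 0.8
X = rng.normal(size=(N, K))
y_star = X @ beta_true + rng.normal(size=N)
y = np.where(y_star <= a1_true, 0, np.where(y_star <= a2_true, 1, 2))

def neg_loglik(theta):
    beta, a1, a2 = theta[:K], theta[K], theta[K + 1]
    xb = X @ beta
    # Pr(y=0) = F(a1 - x'b), Pr(y=1) = F(a2 - x'b) - F(a1 - x'b), Pr(y=2) = 1 - F(a2 - x'b)
    p0 = norm.cdf(a1 - xb)
    p1 = norm.cdf(a2 - xb) - p0
    p2 = 1.0 - norm.cdf(a2 - xb)
    probs = np.choose(y, [p0, p1, p2])
    return -np.sum(np.log(np.clip(probs, 1e-300, None)))

theta0 = np.zeros(K + 2); theta0[K + 1] = 1.0   # start with a1 < a2
res = minimize(neg_loglik, theta0, method="BFGS")
print("beta_hat:", res.x[:K], " alpha_hat:", res.x[K:])
```

With $F = \Phi$, `neg_loglik` is exactly the negative of $\ell_N(\beta, \alpha_1, \alpha_2)$; swapping in the logistic c.d.f. would give the ordered logit version.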

The asymptotic distribution of the ML estimator is derived in the usual way. Define the 1-sample likelihood and log-likelihood functions

$$L_1(\beta, \alpha_1, \alpha_2) = \{\Pr(y_i = 0 \mid x_i)\}^{1\{y_i = 0\}} \{\Pr(y_i = 1 \mid x_i)\}^{1\{y_i = 1\}} \{\Pr(y_i = 2 \mid x_i)\}^{1\{y_i = 2\}},$$

$$\begin{aligned}
\ell_1(\beta, \alpha_1, \alpha_2) = \ln L_1(\beta, \alpha_1, \alpha_2) &= 1\{y_i = 0\} \ln F(\alpha_1 - x_i'\beta) \\
&\quad + 1\{y_i = 1\} \ln\{F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta)\} \\
&\quad + 1\{y_i = 2\} \ln\{1 - F(\alpha_2 - x_i'\beta)\}.
\end{aligned}$$

The asymptotic distribution is

$$\sqrt{N} \left( \underbrace{\begin{bmatrix} \hat{\beta}_{ML} \\ \hat{\alpha}_{1,ML} \\ \hat{\alpha}_{2,ML} \end{bmatrix}}_{(K+2) \times 1} - \begin{bmatrix} \beta \\ \alpha_1 \\ \alpha_2 \end{bmatrix} \right) \xrightarrow{d} N\left( 0_{(K+2) \times 1}, \; \underbrace{I_1^{-1}}_{(K+2) \times (K+2)} \right),$$

where the 1-sample Fisher information matrix $I_1$ is defined as

$$\underbrace{I_1}_{(K+2) \times (K+2)} = -E\left[ \frac{\partial^2}{\partial \begin{bmatrix} \beta \\ \alpha_1 \\ \alpha_2 \end{bmatrix} \partial \begin{bmatrix} \beta \\ \alpha_1 \\ \alpha_2 \end{bmatrix}'} \, \ell_1(\beta, \alpha_1, \alpha_2) \right].$$
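Since $I_1$ is usually not available in closed form, a common practical shortcut is to estimate the asymptotic covariance from the numerical Hessian of the negative log-likelihood at the optimum. A rough sketch, continuing the simulation above (the finite-difference step size is an arbitrary choice):

```python
# Continuing the sketch above: estimate standard errors from the
# inverse Hessian of the negative log-likelihood at the optimum.
def num_hessian(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i, e_j = np.eye(n)[i] * h, np.eye(n)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

H = num_hessian(neg_loglik, res.x)          # estimates N * I_1
se = np.sqrt(np.diag(np.linalg.inv(H)))     # asymptotic standard errors
print("standard errors:", se)
```

Here `H` estimates $N \cdot I_1$, so inverting it directly gives the estimated covariance of the estimator itself, consistent with the $\sqrt{N}$ scaling above.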

2.2 J-Choice Ordered Multinomial Model

We can easily extend the above three-choice ordered model to a $J$-choice ordered model. Define the latent variable as

$$y_i^* = x_i'\beta + \varepsilon_i,$$

and $y_i^*$ determines the value of $y_i$ by the following rule:

$$y_i = \begin{cases} 0 & \text{if } y_i^* \le \alpha_1 \\ 1 & \text{if } \alpha_1 < y_i^* \le \alpha_2 \\ 2 & \text{if } \alpha_2 < y_i^* \le \alpha_3 \\ \;\vdots & \;\vdots \\ j & \text{if } \alpha_j < y_i^* \le \alpha_{j+1} \\ \;\vdots & \;\vdots \\ J-2 & \text{if } \alpha_{J-2} < y_i^* \le \alpha_{J-1} \\ J-1 & \text{if } y_i^* > \alpha_{J-1}. \end{cases}$$

Assume that the error term $\varepsilon_i$ has c.d.f. $F(\varepsilon_i)$. As in the three-choice model, if we choose $F$ to be the standard normal c.d.f., we call the model the ordered probit model; if we choose $F$ to be the logistic c.d.f., we call it the ordered logit model. We keep discussing the model without specifying the distribution, using the general c.d.f. notation $F(\cdot)$.

Again, our goal is to estimate the parameters $\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-2}, \alpha_{J-1}$ and to derive their asymptotic distribution. As before, the conditional probabilities of $y_i = 0, 1, 2, \ldots, J-1$ given $x_i$ are

$$\begin{cases}
\Pr(y_i = 0 \mid x_i) = F(\alpha_1 - x_i'\beta) \\
\Pr(y_i = 1 \mid x_i) = F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta) \\
\Pr(y_i = 2 \mid x_i) = F(\alpha_3 - x_i'\beta) - F(\alpha_2 - x_i'\beta) \\
\quad \vdots \\
\Pr(y_i = j \mid x_i) = F(\alpha_{j+1} - x_i'\beta) - F(\alpha_j - x_i'\beta) \\
\quad \vdots \\
\Pr(y_i = J-2 \mid x_i) = F(\alpha_{J-1} - x_i'\beta) - F(\alpha_{J-2} - x_i'\beta) \\
\Pr(y_i = J-1 \mid x_i) = 1 - F(\alpha_{J-1} - x_i'\beta).
\end{cases}$$

Therefore, we can write the likelihood function as (note that the samples are i.i.d.)

$$L_N(\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-1}) = \prod_{i=1}^{N} \prod_{j=0}^{J-1} \{\Pr(y_i = j \mid x_i)\}^{1\{y_i = j\}}.$$

The log-likelihood function is

$$\begin{aligned}
\ell_N(\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-1}) &= \ln L_N(\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-1}) \\
&= \sum_{i=1}^{N} \left[ \begin{aligned}
&1\{y_i = 0\} \ln F(\alpha_1 - x_i'\beta) \\
&+ 1\{y_i = 1\} \ln\{F(\alpha_2 - x_i'\beta) - F(\alpha_1 - x_i'\beta)\} \\
&+ \cdots + 1\{y_i = j\} \ln\{F(\alpha_{j+1} - x_i'\beta) - F(\alpha_j - x_i'\beta)\} + \cdots \\
&+ 1\{y_i = J-2\} \ln\{F(\alpha_{J-1} - x_i'\beta) - F(\alpha_{J-2} - x_i'\beta)\} \\
&+ 1\{y_i = J-1\} \ln\{1 - F(\alpha_{J-1} - x_i'\beta)\}
\end{aligned} \right].
\end{aligned}$$

Therefore, the ML estimators are defined as

$$(\hat{\beta}_{ML}, \hat{\alpha}_{1,ML}, \hat{\alpha}_{2,ML}, \ldots, \hat{\alpha}_{J-1,ML}) = \arg\max_{\beta, \alpha_1, \ldots, \alpha_{J-1}} \ell_N(\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-1}).$$

As in the three-choice ordered model, we have the asymptotic distribution

$$\sqrt{N} \left( \begin{bmatrix} \hat{\beta}_{ML} \\ \hat{\alpha}_{1,ML} \\ \hat{\alpha}_{2,ML} \\ \vdots \\ \hat{\alpha}_{J-2,ML} \\ \hat{\alpha}_{J-1,ML} \end{bmatrix} - \begin{bmatrix} \beta \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{J-2} \\ \alpha_{J-1} \end{bmatrix} \right) \xrightarrow{d} N\left( 0_{(K+J-1) \times 1}, \; \underbrace{I_1^{-1}}_{(K+J-1) \times (K+J-1)} \right),$$

where

$$I_1 = -E\left[ \frac{\partial^2}{\partial (\beta', \alpha_1, \ldots, \alpha_{J-1})' \, \partial (\beta', \alpha_1, \ldots, \alpha_{J-1})} \, \ell_1(\beta, \alpha_1, \alpha_2, \ldots, \alpha_{J-1}) \right].$$
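The $J$-choice probabilities are convenient to compute by padding the cutpoints with $\mp\infty$, so that every category has the form $F(\text{upper}) - F(\text{lower})$. A minimal Python sketch (the cutpoint values are hypothetical):

```python
import numpy as np
from scipy.stats import norm

def ordered_probs(xb, alphas):
    """Pr(y = j | x) for j = 0..J-1 in an ordered model with c.d.f. F = Phi.

    xb     : array of shape (N,), the index x_i'beta
    alphas : increasing cutpoints (alpha_1, ..., alpha_{J-1})
    """
    cuts = np.concatenate(([-np.inf], alphas, [np.inf]))  # J+1 boundaries
    # Column j is F(alpha_{j+1} - x'b) - F(alpha_j - x'b)
    return norm.cdf(cuts[1:] - xb[:, None]) - norm.cdf(cuts[:-1] - xb[:, None])

# Example with J = 4 categories and hypothetical cutpoints:
xb = np.array([-0.3, 0.0, 1.2])
P = ordered_probs(xb, alphas=np.array([-1.0, 0.0, 1.5]))
print(P, P.sum(axis=1))  # each row sums to 1
```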

3 Final 2003S: Question 2 - Ordered Probit Model

Suppose that a person faces three discrete choices $1, 2,$ and $3$, depending on the value of a latent variable. That is,

$$y_i^* = z_i'\beta_0 + v_i$$

and

$$\begin{aligned}
y_i = 1 &\iff y_i^* \le \alpha_1, \\
y_i = 2 &\iff \alpha_1 < y_i^* \le \alpha_2, \\
y_i = 3 &\iff y_i^* > \alpha_2,
\end{aligned}$$

where $\beta_0$ lies in a $K$-dimensional parameter space.

(1) Write the likelihood function which, if maximized, will yield an estimator for the model's parameters.

Answer²: Assume the samples are i.i.d. and the error term $v_i$ has c.d.f. $F(v_i)$. Then, by the same steps as in Section 2.1, the conditional probabilities are

$$\begin{aligned}
\Pr(y_i = 1 \mid z_i) &= F(\alpha_1 - z_i'\beta), \\
\Pr(y_i = 2 \mid z_i) &= F(\alpha_2 - z_i'\beta) - F(\alpha_1 - z_i'\beta), \\
\Pr(y_i = 3 \mid z_i) &= 1 - F(\alpha_2 - z_i'\beta).
\end{aligned}$$

Therefore, the $N$-sample likelihood function is

$$L_N(\beta, \alpha_1, \alpha_2) = \prod_{i=1}^{N} \{F(\alpha_1 - z_i'\beta)\}^{1\{y_i = 1\}} \{F(\alpha_2 - z_i'\beta) - F(\alpha_1 - z_i'\beta)\}^{1\{y_i = 2\}} \{1 - F(\alpha_2 - z_i'\beta)\}^{1\{y_i = 3\}}.$$

²Note that it is conventional to define $J$ multinomial values $y_i$ as $y_i = 0, 1, 2, \ldots, J-1$. However, in this question, unlike the convention, the professor defined $y_i = 1, 2, 3$, skipping $y_i = 0$. This change does not affect anything in the estimation.

(2) Suppose now that $v_i \mid z_i \sim \text{i.i.d. } N(0, \sigma_v^2)$. Write the specific likelihood function. What are the conditions that will allow us to estimate $\beta_0$? Justify your answer.

Answer: Now we have $v_i \mid z_i \sim \text{i.i.d. } N(0, \sigma_v^2)$ (so the model is an ordered probit model with error variance $\sigma_v^2$). The conditional probabilities become

$$\begin{aligned}
\Pr(y_i = 1 \mid z_i) &= F(\alpha_1 - z_i'\beta) = \Phi\left( \frac{\alpha_1 - z_i'\beta}{\sigma_v} \right), \\
\Pr(y_i = 2 \mid z_i) &= \Phi\left( \frac{\alpha_2 - z_i'\beta}{\sigma_v} \right) - \Phi\left( \frac{\alpha_1 - z_i'\beta}{\sigma_v} \right), \\
\Pr(y_i = 3 \mid z_i) &= 1 - \Phi\left( \frac{\alpha_2 - z_i'\beta}{\sigma_v} \right).
\end{aligned}$$

Since only the ratios $\beta/\sigma_v$, $\alpha_1/\sigma_v$, and $\alpha_2/\sigma_v$ enter these probabilities, we can identify $\beta$, $\alpha_1$, and $\alpha_2$ only up to scale. If we assume $\sigma_v = 1$ (i.e., the error term is distributed standard normal), we can identify the parameters $\beta$, $\alpha_1$, and $\alpha_2$.

Under the assumption $\sigma_v = 1$, the $N$-sample likelihood function is

$$L_N(\beta, \alpha_1, \alpha_2) = \prod_{i=1}^{N} \{\Phi(\alpha_1 - z_i'\beta)\}^{1\{y_i = 1\}} \{\Phi(\alpha_2 - z_i'\beta) - \Phi(\alpha_1 - z_i'\beta)\}^{1\{y_i = 2\}} \{1 - \Phi(\alpha_2 - z_i'\beta)\}^{1\{y_i = 3\}}.$$

The $N$-sample log-likelihood function is

$$\ell_N(\beta, \alpha_1, \alpha_2) = \ln L_N(\beta, \alpha_1, \alpha_2) = \sum_{i=1}^{N} \left[ \begin{aligned} &1\{y_i = 1\} \ln \Phi(\alpha_1 - z_i'\beta) \\ &+ 1\{y_i = 2\} \ln\{\Phi(\alpha_2 - z_i'\beta) - \Phi(\alpha_1 - z_i'\beta)\} \\ &+ 1\{y_i = 3\} \ln\{1 - \Phi(\alpha_2 - z_i'\beta)\} \end{aligned} \right].$$
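The scale normalization can be checked by simulation: generate data with some $\sigma_v \neq 1$, estimate under the $\sigma_v = 1$ normalization, and observe that the estimates approximate $(\beta/\sigma_v, \alpha_1/\sigma_v, \alpha_2/\sigma_v)$ rather than $(\beta, \alpha_1, \alpha_2)$. A hypothetical sketch, reusing the estimation approach from Section 2.1:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
N, K, sigma_v = 4000, 2, 2.0
beta0, a1, a2 = np.array([1.0, -0.5]), -1.0, 1.0

Z = rng.normal(size=(N, K))
y_star = Z @ beta0 + sigma_v * rng.normal(size=N)
y = 1 + (y_star > a1).astype(int) + (y_star > a2).astype(int)  # y in {1, 2, 3}

def neg_loglik(theta):
    b, c1, c2 = theta[:K], theta[K], theta[K + 1]
    zb = Z @ b
    p = [norm.cdf(c1 - zb),
         norm.cdf(c2 - zb) - norm.cdf(c1 - zb),
         1 - norm.cdf(c2 - zb)]
    probs = np.choose(y - 1, p)
    return -np.sum(np.log(np.clip(probs, 1e-300, None)))

theta0 = np.concatenate([np.zeros(K), [-0.5, 0.5]])
res = minimize(neg_loglik, theta0, method="BFGS")
# Estimates approximate (beta/sigma_v, a1/sigma_v, a2/sigma_v), not (beta, a1, a2)
print(res.x)
print("rescaled truth:", np.concatenate([beta0 / sigma_v, [a1 / sigma_v, a2 / sigma_v]]))
```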

(3) Provide the MLE for $\beta_0$, say $\hat{\beta}_n$, and show that it is a consistent estimator for $\beta_0$.

Answer: Denote $\hat{\beta}_{ML} = \hat{\beta}_n$. The ML estimators are defined as

$$(\hat{\beta}_{ML}, \hat{\alpha}_{1,ML}, \hat{\alpha}_{2,ML}) = \arg\max_{\beta, \alpha_1, \alpha_2} \ell_N(\beta, \alpha_1, \alpha_2).$$

Since the log-likelihood function of the ordered probit model (like that of the ordered logit model) is concave, the consistency of the estimator is established under the regularity conditions. (For details of the regularity conditions, see Hogg, McKean, and Craig, 6th edition, Ch. 6.)

(4) Suppose that it is claimed that

$$\sum_{k=1}^{K} \beta_{0k}^2 = K.$$

How would you test this hypothesis? Give a detailed description of the suggested procedure, including the test statistic that is being proposed.

Answer: We have the hypotheses

$$H_0: \sum_{k=1}^{K} \beta_{0k}^2 - K = 0, \qquad H_1: \sum_{k=1}^{K} \beta_{0k}^2 - K \neq 0.$$

We derive the Wald statistic. We have the asymptotic distribution of the ML estimator

$$\sqrt{N} \left( \underbrace{\begin{bmatrix} \hat{\beta}_{ML} \\ \hat{\alpha}_{1,ML} \\ \hat{\alpha}_{2,ML} \end{bmatrix}}_{(K+2) \times 1} - \begin{bmatrix} \beta_0 \\ \alpha_{01} \\ \alpha_{02} \end{bmatrix} \right) \xrightarrow{d} N\left( 0_{(K+2) \times 1}, \; \underbrace{I_1^{-1}}_{(K+2) \times (K+2)} \right),$$

where the 1-sample Fisher information matrix $I_1$ is defined as

$$\underbrace{I_1}_{(K+2) \times (K+2)} = -E\left[ \frac{\partial^2}{\partial \theta \, \partial \theta'} \, \ell_1(\beta, \alpha_1, \alpha_2) \right], \qquad \theta = (\beta', \alpha_1, \alpha_2)',$$

with

$$\ell_1(\beta, \alpha_1, \alpha_2) = 1\{y_i = 1\} \ln \Phi(\alpha_1 - z_i'\beta) + 1\{y_i = 2\} \ln\{\Phi(\alpha_2 - z_i'\beta) - \Phi(\alpha_1 - z_i'\beta)\} + 1\{y_i = 3\} \ln\{1 - \Phi(\alpha_2 - z_i'\beta)\}.$$

Define the continuous and differentiable function

$$h(\beta, \alpha_1, \alpha_2) = \sum_{k=1}^{K} \beta_k^2 - K.$$

Then we have

$$\underbrace{\frac{\partial}{\partial \theta} h(\beta, \alpha_1, \alpha_2)}_{(K+2) \times 1} = \begin{bmatrix} 2\beta_1 \\ 2\beta_2 \\ \vdots \\ 2\beta_K \\ 0 \\ 0 \end{bmatrix}, \qquad \left. \frac{\partial}{\partial \theta} h(\beta, \alpha_1, \alpha_2) \right|_{\beta = \beta_0, \, \alpha_1 = \alpha_{01}, \, \alpha_2 = \alpha_{02}} = \begin{bmatrix} 2\beta_{01} \\ 2\beta_{02} \\ \vdots \\ 2\beta_{0K} \\ 0 \\ 0 \end{bmatrix}.$$

Then, by the multivariate delta method³, we have

$$\sqrt{N} \left( h(\hat{\beta}_{ML}, \hat{\alpha}_{1,ML}, \hat{\alpha}_{2,ML}) - \underbrace{h(\beta_0, \alpha_{01}, \alpha_{02})}_{= \, 0 \text{ under } H_0} \right) \xrightarrow{d} N\left( 0, \; \underbrace{\left. \frac{\partial h}{\partial \theta'} \right|_{\theta = \theta_0} I_1^{-1} \left. \frac{\partial h}{\partial \theta} \right|_{\theta = \theta_0}}_{1 \times 1} \right).$$

Therefore, under $H_0$, the approximate distribution is

$$h(\hat{\beta}_{ML}, \hat{\alpha}_{1,ML}, \hat{\alpha}_{2,ML}) \stackrel{a}{\sim} N\left( 0, \; \frac{1}{N} \left. \frac{\partial h}{\partial \theta'} \right|_{\theta = \theta_0} I_1^{-1} \left. \frac{\partial h}{\partial \theta} \right|_{\theta = \theta_0} \right).$$

The Wald statistic is derived as (evaluating the gradient and the information matrix at the ML estimates)

$$W = N \, h(\hat{\theta}_{ML})' \left[ \left. \frac{\partial h}{\partial \theta'} \, \hat{I}_1^{-1} \, \frac{\partial h}{\partial \theta} \right|_{\theta = \hat{\theta}_{ML}} \right]^{-1} h(\hat{\theta}_{ML}) = N \left( \sum_{k=1}^{K} \hat{\beta}_{k,ML}^2 - K \right)^2 \left[ \left. \frac{\partial h}{\partial \theta'} \, \hat{I}_1^{-1} \, \frac{\partial h}{\partial \theta} \right|_{\theta = \hat{\theta}_{ML}} \right]^{-1},$$

and $W$ is distributed as

$$W \sim \chi^2(\text{number of restrictions}) = \chi^2(1).$$
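A sketch of how this Wald test could be computed numerically, continuing the simulated ordered probit example from question (2); `num_hessian` is the finite-difference helper from the Section 2.1 sketch, and using the Hessian as the information estimate is a practical stand-in for the analytic $I_1$:

```python
import numpy as np
from scipy.stats import chi2

# res.x = (beta_hat, a1_hat, a2_hat) from the ordered probit fit above;
# H = num_hessian(neg_loglik, res.x) estimates N * I_1 for that model.
H = num_hessian(neg_loglik, res.x)
beta_hat = res.x[:K]

h_val = np.sum(beta_hat**2) - K                      # h(theta_hat)
grad_h = np.concatenate([2 * beta_hat, [0.0, 0.0]])  # (2b_1, ..., 2b_K, 0, 0)'

# Var[h(theta_hat)] ~ grad_h' (N I_1)^{-1} grad_h, by the delta method
avar_h = grad_h @ np.linalg.inv(H) @ grad_h
W = h_val**2 / avar_h                                # Wald statistic
print("W =", W, " p-value =", chi2.sf(W, df=1))      # compare with chi2(1)
```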

³Multivariate delta method (rewriting): Let $\theta_n$ be a sequence of $K \times 1$ random vectors with asymptotic distribution $\sqrt{n}(\theta_n - \theta) \xrightarrow{d} N(0, \Sigma)$, where $\theta$ is a $K \times 1$ vector and $\Sigma$ is a $K \times K$ covariance matrix. For a given $L \times 1$ multidimensional function $g(x)$ such that $g: \mathbb{R}^K \to \mathbb{R}^L$, and for the given specific $K \times 1$ vector $\theta$, suppose that the $L \times K$ matrix $\left. \frac{\partial}{\partial x'} g(x) \right|_{x = \theta}$ exists and is not equal to $0_{L \times K}$. Then

$$\sqrt{n} \left( \underbrace{g(\theta_n)}_{L \times 1} - \underbrace{g(\theta)}_{L \times 1} \right) \xrightarrow{d} N\left( 0_{L \times 1}, \; \underbrace{\left. \frac{\partial g(x)}{\partial x'} \right|_{x = \theta}}_{L \times K} \underbrace{\Sigma}_{K \times K} \underbrace{\left( \left. \frac{\partial g(x)}{\partial x'} \right|_{x = \theta} \right)'}_{K \times L} \right).$$
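As a quick numerical sanity check of the delta method in the scalar case $g(x) = x^2$ (my own illustration, not part of the original note), the simulated variance of $g(\bar{\theta}_n)$ should match $(g'(\theta))^2 \, \sigma^2 / n$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma2, n, reps = 1.5, 4.0, 400, 20000

# theta_n = sample mean of n i.i.d. N(theta, sigma2) draws, g(x) = x**2
theta_n = theta + np.sqrt(sigma2 / n) * rng.standard_normal(reps)
g_n = theta_n**2

# Delta method: Var[g(theta_n)] ~ (g'(theta))^2 * sigma2 / n = (2*theta)^2 * sigma2 / n
print("simulated:", np.var(g_n), " delta method:", (2 * theta) ** 2 * sigma2 / n)
```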

4 (Non-Ordered) Multinomial Choice Models (Multinomial Probit / Logit Models)

In the previous sections, we discussed models in which the dependent variable $y_i$ has an order. However, there are many discrete choice variables that do not have clear order relations. This section summarizes the contribution of Daniel McFadden (Nobel prize recipient in 2000).

We explain by example. If you want to travel from New York City to Boston, there are three alternatives $j = 0, 1, 2$:

$$y_i = \begin{cases} \text{use Greyhound bus} & \text{(assign value } j = 0\text{)} \\ \text{use Amtrak train} & \text{(assign value } j = 1\text{)} \\ \text{use airplane} & \text{(assign value } j = 2\text{)} \end{cases}$$

associated with the explanatory variables

$$x_{ij} = \begin{bmatrix} \text{hours transportation method } j \text{ takes} \\ \text{price of a ticket for transportation method } j \\ \text{agent } i\text{'s wealth} \\ \text{agent } i\text{'s age} \end{bmatrix}$$

for every agent $i$. Clearly, these transportation methods do not have clear order relations⁴. Notice that econometricians (you) observe $\{y_i, x_{i,0}, x_{i,1}, x_{i,2}\}$. Define the partially observable latent variables $y_{i,j}^*$, $j = 0, 1, 2$, as

$$\begin{aligned}
y_{i,0}^* &= x_{i,0}'\beta_0 + \varepsilon_{i,0} \quad \text{(utility of using the bus)}, \\
y_{i,1}^* &= x_{i,1}'\beta_1 + \varepsilon_{i,1} \quad \text{(utility of using the train)}, \\
y_{i,2}^* &= x_{i,2}'\beta_2 + \varepsilon_{i,2} \quad \text{(utility of using the airplane)}.
\end{aligned}$$

You can interpret the latent variable $y_{i,j}^*$ as the utility that agent $i$ obtains by using transportation method $j$. By utility maximization theory, agent $i$ chooses

$$y_i = \begin{cases} \text{bus } (j = 0) & \text{if } y_{i,0}^* \ge y_{i,1}^* \text{ and } y_{i,0}^* \ge y_{i,2}^* \\ \text{train } (j = 1) & \text{if } y_{i,1}^* > y_{i,2}^* \text{ and } y_{i,1}^* > y_{i,0}^* \\ \text{airplane } (j = 2) & \text{if } y_{i,2}^* > y_{i,0}^* \text{ and } y_{i,2}^* > y_{i,1}^* \end{cases}$$

(ties occur with probability zero because the error terms are continuously distributed).

Analyzing in detail: agent $i$ uses the bus ($j = 0$) if

$$\begin{aligned}
& y_{i,0}^* \ge y_{i,1}^* \text{ and } y_{i,0}^* \ge y_{i,2}^* \\
\iff \; & x_{i,0}'\beta_0 + \varepsilon_{i,0} \ge x_{i,1}'\beta_1 + \varepsilon_{i,1} \text{ and } x_{i,0}'\beta_0 + \varepsilon_{i,0} \ge x_{i,2}'\beta_2 + \varepsilon_{i,2} \\
\iff \; & x_{i,0}'\beta_0 - x_{i,1}'\beta_1 \ge \varepsilon_{i,1} - \varepsilon_{i,0} \text{ and } x_{i,0}'\beta_0 - x_{i,2}'\beta_2 \ge \varepsilon_{i,2} - \varepsilon_{i,0}.
\end{aligned}$$

Agent $i$ uses the train ($j = 1$) if

$$\begin{aligned}
& y_{i,1}^* > y_{i,2}^* \text{ and } y_{i,1}^* > y_{i,0}^* \\
\iff \; & x_{i,1}'\beta_1 - x_{i,2}'\beta_2 > \varepsilon_{i,2} - \varepsilon_{i,1} \text{ and } x_{i,1}'\beta_1 - x_{i,0}'\beta_0 > \varepsilon_{i,0} - \varepsilon_{i,1}.
\end{aligned}$$

Agent $i$ uses the airplane ($j = 2$) if

$$\begin{aligned}
& y_{i,2}^* > y_{i,0}^* \text{ and } y_{i,2}^* > y_{i,1}^* \\
\iff \; & x_{i,2}'\beta_2 - x_{i,0}'\beta_0 > \varepsilon_{i,0} - \varepsilon_{i,2} \text{ and } x_{i,2}'\beta_2 - x_{i,1}'\beta_1 > \varepsilon_{i,1} - \varepsilon_{i,2}.
\end{aligned}$$

⁴FYI: Your TA once took the "Chinatown to Chinatown" ultra-cheap bus from NYC to Boston. It was only $15.

Denote $x_i = [x_{i,0}', x_{i,1}', x_{i,2}']'$. The conditional probabilities are

$$\begin{aligned}
\Pr(y_i = 0 \mid x_i) &= \Pr\left[ x_{i,0}'\beta_0 - x_{i,1}'\beta_1 \ge \varepsilon_{i,1} - \varepsilon_{i,0} \, \text{ and } \, x_{i,0}'\beta_0 - x_{i,2}'\beta_2 \ge \varepsilon_{i,2} - \varepsilon_{i,0} \right], \\
\Pr(y_i = 1 \mid x_i) &= \Pr\left[ x_{i,1}'\beta_1 - x_{i,2}'\beta_2 > \varepsilon_{i,2} - \varepsilon_{i,1} \, \text{ and } \, x_{i,1}'\beta_1 - x_{i,0}'\beta_0 > \varepsilon_{i,0} - \varepsilon_{i,1} \right], \\
\Pr(y_i = 2 \mid x_i) &= \Pr\left[ x_{i,2}'\beta_2 - x_{i,0}'\beta_0 > \varepsilon_{i,0} - \varepsilon_{i,2} \, \text{ and } \, x_{i,2}'\beta_2 - x_{i,1}'\beta_1 > \varepsilon_{i,1} - \varepsilon_{i,2} \right].
\end{aligned}$$

Assume that the error terms $[\varepsilon_{i,0}, \varepsilon_{i,1}, \varepsilon_{i,2}]'$ have a joint normal distribution,

$$\begin{bmatrix} \varepsilon_{i,0} \\ \varepsilon_{i,1} \\ \varepsilon_{i,2} \end{bmatrix} \sim N\left( 0_{3 \times 1}, \; \underbrace{\Sigma}_{3 \times 3} \right).$$

The likelihood function is defined as

$$L(\beta_0, \beta_1, \beta_2, \underbrace{\Sigma}_{3 \times 3}) = \prod_{i=1}^{N} \{\Pr(y_i = 0 \mid x_i)\}^{1\{y_i = 0\}} \{\Pr(y_i = 1 \mid x_i)\}^{1\{y_i = 1\}} \{\Pr(y_i = 2 \mid x_i)\}^{1\{y_i = 2\}},$$

and the ML estimator is defined as

$$(\hat{\beta}_{0,ML}, \hat{\beta}_{1,ML}, \hat{\beta}_{2,ML}, \hat{\Sigma}_{ML}) = \arg\max_{\beta_0, \beta_1, \beta_2, \Sigma} \{\ln L(\beta_0, \beta_1, \beta_2, \Sigma)\}.$$
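The choice probabilities above are orthant probabilities of correlated normal error differences and have no convenient closed form, which is why multinomial probit likelihoods are typically evaluated by simulation. A minimal Monte Carlo sketch for a single agent, with hypothetical utilities and error covariance:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical deterministic utilities x_{i,j}'beta_j for one agent
v = np.array([1.0, 1.2, 0.8])           # bus, train, airplane
Sigma = np.array([[1.0, 0.3, 0.2],      # covariance of (eps_0, eps_1, eps_2)
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

# Simulate utilities y*_j = v_j + eps_j and count which alternative wins
R = 200_000
eps = rng.multivariate_normal(np.zeros(3), Sigma, size=R)
choice = np.argmax(v + eps, axis=1)
probs = np.bincount(choice, minlength=3) / R
print("Pr(bus), Pr(train), Pr(plane) ~", probs)  # frequencies sum to 1
```

In a full estimation routine these simulated frequencies (or smoother simulators such as GHK) would stand in for $\Pr(y_i = j \mid x_i)$ inside the likelihood.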

Let's look at the following Comp question to understand the concept of the (unordered) multinomial choice model.

5 Comp 2004F Part II (Kyriazidou): Question 2

A person decides to migrate depending on whether the present value of his/her lifetime utility at the present location, assumed to be determined by

$$u_{i,p} = X_{i,p}\beta_p + \varepsilon_{i,p} \tag{1}$$

is less than or equal to the present value of his/her lifetime utility at the migration location, assumed to be determined by

$$u_{i,m} = X_{i,m}\beta_m + \varepsilon_{i,m} \tag{2}$$

minus the migration cost, assumed to be determined by

$$C_i = Z_i\gamma + u_i. \tag{3}$$

In expressions (1)-(2) the subscripts $p, m$ denote the present and migration locations. Variables in $X_{i,p}$ and $X_{i,m}$ would include, for example, the person's education, experience, age, race, gender, and local unemployment rates and average wages in different sectors. Variables in $Z_i$ would include, for example, whether the individual is self-employed and whether he/she has recently changed industry of employment. $\varepsilon_{i,p}$, $\varepsilon_{i,m}$, and $u_i$ are unobserved error terms that are jointly normally distributed with zero means and positive definite covariance matrix

$$\Sigma = \begin{bmatrix} \sigma_{pp} & \sigma_{pm} & \sigma_{pu} \\ \sigma_{pm} & \sigma_{mm} & \sigma_{mu} \\ \sigma_{pu} & \sigma_{mu} & \sigma_{uu} \end{bmatrix}.$$

Suppose that we can observe whether an individual has decided to migrate or not, and $X_{i,p}$, $X_{i,m}$, and $Z_i$, which are assumed to be independent of all error terms.

(Preparation) First of all, we distinguish observable and unobservable data. Observable data are $X_{i,p}$, $X_{i,m}$, and $Z_i$. Unobservable data are $u_{i,p}$, $u_{i,m}$, and $C_i$. Note that we assume $\Sigma$ is known⁵. The point of this question is that we can construct the model using the unobservable $u_{i,p}$, $u_{i,m}$, and $C_i$, but we cannot use unobservables when estimating (calculating) the parameters $\beta_p$, $\beta_m$, and $\gamma$. Let's see what this exactly means.

(a) Construct the econometric model for the migration decision of an individual $i$.
(b) Construct the log-likelihood for a sample of i.i.d. observations of the migration decisions of individuals.

(Answer) Person $i$ migrates if

$$u_{i,m} - C_i > u_{i,p}, \quad \text{i.e.,} \quad u_{i,m} - u_{i,p} - C_i > 0,$$

i.e., the utility at the new place is higher than the utility at the present place plus the migration cost. Substituting equations (1), (2), and (3) into the above inequality, we have

⁵One can guess that this assumption is fairly unrealistic. We just assume it to solve this question.

$$X_{i,m}\beta_m + \varepsilon_{i,m} - (X_{i,p}\beta_p + \varepsilon_{i,p}) - (Z_i\gamma + u_i) > 0$$

$$\iff -X_{i,p}\beta_p + X_{i,m}\beta_m - Z_i\gamma + \underbrace{\varepsilon_{i,m} - \varepsilon_{i,p} - u_i}_{\equiv \, \eta_i} > 0, \tag{4}$$

where I define the new r.v.

$$\eta_i = \varepsilon_{i,m} - \varepsilon_{i,p} - u_i = \underbrace{\begin{bmatrix} -1 & 1 & -1 \end{bmatrix}}_{\equiv \, A} \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,m} \\ u_i \end{bmatrix} = A \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,m} \\ u_i \end{bmatrix}.$$

Notice that, since $\varepsilon_{i,m}$, $\varepsilon_{i,p}$, and $u_i$ are distributed normally, the r.v. $\eta_i$ is also normal. The expectation of $\eta_i$ is

$$E[\eta_i] = E[\varepsilon_{i,m} - \varepsilon_{i,p} - u_i] = E[\varepsilon_{i,m}] - E[\varepsilon_{i,p}] - E[u_i] = 0,$$

by the assumption $E[\varepsilon_{i,m}] = E[\varepsilon_{i,p}] = E[u_i] = 0$. Also, since we have

$$\operatorname{Var}\left( \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,m} \\ u_i \end{bmatrix} \right) = \Sigma = \begin{bmatrix} \sigma_{pp} & \sigma_{pm} & \sigma_{pu} \\ \sigma_{pm} & \sigma_{mm} & \sigma_{mu} \\ \sigma_{pu} & \sigma_{mu} & \sigma_{uu} \end{bmatrix},$$

the variance of $\eta_i$ is

$$\operatorname{Var}[\eta_i] = \operatorname{Var}\left( A \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,m} \\ u_i \end{bmatrix} \right) = A \operatorname{Var}\left( \begin{bmatrix} \varepsilon_{i,p} \\ \varepsilon_{i,m} \\ u_i \end{bmatrix} \right) A' = \begin{bmatrix} -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} \sigma_{pp} & \sigma_{pm} & \sigma_{pu} \\ \sigma_{pm} & \sigma_{mm} & \sigma_{mu} \\ \sigma_{pu} & \sigma_{mu} & \sigma_{uu} \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ -1 \end{bmatrix}$$

$$= \sigma_{pp} + \sigma_{mm} + \sigma_{uu} - 2\sigma_{pm} + 2\sigma_{pu} - 2\sigma_{mu} \equiv \sigma_\eta^2.$$

Thus, the r.v. $\eta_i$ is distributed as (a sum of jointly normal r.v.'s is normally distributed)

$$\eta_i \sim N(0, \sigma_\eta^2),$$

and we can normalize $\eta_i / \sigma_\eta \sim N(0, 1)$. Define

$$y_i = \begin{cases} 1 & \text{if person } i \text{ migrates} \\ 0 & \text{if person } i \text{ stays in the present place.} \end{cases}$$
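A two-line numerical check of the variance algebra above, with a hypothetical known $\Sigma$ (the specific numbers are mine):

```python
import numpy as np

Sigma = np.array([[1.0, 0.2, 0.1],   # hypothetical known covariance of (eps_p, eps_m, u)
                  [0.2, 1.5, 0.3],
                  [0.1, 0.3, 0.5]])
A = np.array([-1.0, 1.0, -1.0])      # eta = -eps_p + eps_m - u

var_eta = A @ Sigma @ A              # = s_pp + s_mm + s_uu - 2 s_pm + 2 s_pu - 2 s_mu
spp, spm, spu = Sigma[0, 0], Sigma[0, 1], Sigma[0, 2]
smm, smu, suu = Sigma[1, 1], Sigma[1, 2], Sigma[2, 2]
print(var_eta, spp + smm + suu - 2 * spm + 2 * spu - 2 * smu)  # identical
```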

The conditional probability of migration is

$$\begin{aligned}
\Pr[y_i = 1 \mid x_{im}, x_{ip}, z_i] &= \Pr\left[ -x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma + \eta_i > 0 \mid x_{im}, x_{ip}, z_i \right] \quad \text{(from inequality (4))} \\
&= \Pr\left[ \eta_i > x_{i,p}\beta_p - x_{i,m}\beta_m + z_i\gamma \mid x_{im}, x_{ip}, z_i \right] \\
&= \Pr\left[ \frac{\eta_i}{\sigma_\eta} > \frac{x_{i,p}\beta_p - x_{i,m}\beta_m + z_i\gamma}{\sigma_\eta} \;\middle|\; x_{im}, x_{ip}, z_i \right] \\
&= 1 - \Phi\left( \frac{x_{i,p}\beta_p - x_{i,m}\beta_m + z_i\gamma}{\sigma_\eta} \right) \\
&= \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right) \quad \text{(by the symmetry of the standard normal)}.
\end{aligned}$$

Also, the conditional probability of staying in the present place is

$$\Pr[y_i = 0 \mid x_{im}, x_{ip}, z_i] = 1 - \Pr[y_i = 1 \mid x_{im}, x_{ip}, z_i] = 1 - \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right).$$

Then, the density function is

$$\begin{aligned}
f(y_i \mid x_{im}, x_{ip}, z_i) &= \{\Pr[y_i = 1 \mid x_{im}, x_{ip}, z_i]\}^{y_i} \{\Pr[y_i = 0 \mid x_{im}, x_{ip}, z_i]\}^{1 - y_i} \\
&= \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right)^{y_i} \left\{ 1 - \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right) \right\}^{1 - y_i}.
\end{aligned}$$

Since the samples are i.i.d., the $n$-sample likelihood function (the joint conditional density) is

$$L_n(\beta_m, \beta_p, \gamma; y_i, x_{im}, x_{ip}, z_i) = \prod_{i=1}^{n} \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right)^{y_i} \left\{ 1 - \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right) \right\}^{1 - y_i}.$$
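For concreteness, a minimal sketch of estimating this migration probit by ML in Python, under the scale normalization $\sigma_\eta = 1$ discussed in part (c) below and with hypothetical regressors (it illustrates the composite index $-X_{i,p}\beta_p + X_{i,m}\beta_m - Z_i\gamma$, not the exam's data):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n, kp, km, kz = 3000, 2, 2, 1

# Hypothetical data for the migration model, with sigma_eta normalized to 1
Xp, Xm, Z = rng.normal(size=(n, kp)), rng.normal(size=(n, km)), rng.normal(size=(n, kz))
bp, bm, g = np.array([0.8, -0.4]), np.array([0.5, 0.3]), np.array([0.6])
index = -Xp @ bp + Xm @ bm - Z @ g
y = (index + rng.normal(size=n) > 0).astype(float)   # 1 = migrate

def neg_loglik(theta):
    bp_, bm_, g_ = theta[:kp], theta[kp:kp + km], theta[kp + km:]
    w = -Xp @ bp_ + Xm @ bm_ - Z @ g_
    p1 = np.clip(norm.cdf(w), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

res = minimize(neg_loglik, np.zeros(kp + km + kz), method="BFGS")
print("estimates (bp, bm, g):", res.x)
```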

The $n$-sample log-likelihood is

$$\begin{aligned}
\ell_n(\beta_m, \beta_p, \gamma; y_i, x_{im}, x_{ip}, z_i) &= \ln L_n(\beta_m, \beta_p, \gamma; y_i, x_{im}, x_{ip}, z_i) \\
&= \sum_{i=1}^{n} \left[ y_i \ln \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right) + (1 - y_i) \ln\left\{ 1 - \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}\beta_m - z_i\gamma}{\sigma_\eta} \right) \right\} \right]. \tag{5}
\end{aligned}$$

Notice that this $n$-sample log-likelihood function does not include any unobservable data $u_{i,p}$, $u_{i,m}$, or $C_i$. Thus, under regularity conditions, the maximization problem over this $n$-sample log-likelihood function is implementable.

(c) Describe as precisely as you can the maximum likelihood estimation of the unknown parameters in equations (1)-(3). Can all the coefficients in equations (1)-(3) be consistently estimated?

(Answer)⁶ Taking f.o.c.'s w.r.t. $\beta_p$, $\beta_m$, $\gamma$, and $\sigma_\eta$, we have four (blocks of) equations. The maximum likelihood estimators $\hat{\beta}_{p,ML}$, $\hat{\beta}_{m,ML}$, $\hat{\gamma}_{ML}$, and $\hat{\sigma}_{ML}$ are defined as the solutions of the system of these four f.o.c.'s. Or equivalently, we can write

⁶Actually, there are two methods of estimating the parameters: maximum likelihood and WNLS. Since this question asks us to describe the maximum likelihood method, I do not discuss the WNLS method here. If you are interested in the WNLS method, please read the appendix.

$$(\hat{\beta}_{p,ML}, \hat{\beta}_{m,ML}, \hat{\gamma}_{ML}, \hat{\sigma}_{ML}) = \arg\max_{\beta_m, \beta_p, \gamma, \sigma_\eta} \ell_n(\beta_m, \beta_p, \gamma; y_i, x_{im}, x_{ip}, z_i).$$

However, for the following reasons, we cannot estimate $\beta_p$, $\beta_m$, $\gamma$, and $\sigma_\eta$ consistently.

(1) $\beta_p$, $\beta_m$, $\gamma$, and $\sigma_\eta$ can be identified only up to scale. This means that if we multiply $\beta_p$, $\beta_m$, and $\gamma$ by some scalar constant, say $a > 0$, and also multiply $\sigma_\eta$ by $a$, the $n$-sample log-likelihood (5) becomes

$$\ell_n = \sum_{i=1}^{n} \left[ y_i \ln \Phi\left( \frac{-x_{i,p}(a\beta_p) + x_{i,m}(a\beta_m) - z_i(a\gamma)}{a\sigma_\eta} \right) + (1 - y_i) \ln\left\{ 1 - \Phi\left( \frac{-x_{i,p}(a\beta_p) + x_{i,m}(a\beta_m) - z_i(a\gamma)}{a\sigma_\eta} \right) \right\} \right],$$

which is identical to (5). After taking the f.o.c.'s, all we can calculate is not $(\hat{\beta}_{p,ML}, \hat{\beta}_{m,ML}, \hat{\gamma}_{ML}, \hat{\sigma}_{ML})$ but $(\widehat{(a\beta_p)}_{ML}, \widehat{(a\beta_m)}_{ML}, \widehat{(a\gamma)}_{ML}, \widehat{(a\sigma_\eta)}_{ML})$. Thus, we can estimate $\beta_p$, $\beta_m$, and $\gamma$ only up to scale.

(2) If $X_{i,p}$, $X_{i,m}$, and $Z_i$ share common regressors, we can only estimate a combination of the coefficients of the shared regressors (here their difference, given the signs in inequality (4)). For an extreme example, if $X_{i,m}$ and $Z_i$ are identical, the log-likelihood function (5) becomes

$$\ell_n(\beta_m, \beta_p, \gamma; y_i, x_{im}, x_{ip}, z_i) = \sum_{i=1}^{n} \left[ y_i \ln \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}(\beta_m - \gamma)}{\sigma_\eta} \right) + (1 - y_i) \ln\left\{ 1 - \Phi\left( \frac{-x_{i,p}\beta_p + x_{i,m}(\beta_m - \gamma)}{\sigma_\eta} \right) \right\} \right],$$

in which $\beta_m$ and $\gamma$ enter only through the difference $\beta_m - \gamma$, so they cannot be estimated separately.
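Point (2) is easy to verify numerically: with $Z_i$ set identical to $X_{i,m}$ (a hypothetical design), shifting $\beta_m$ and $\gamma$ by the same constant leaves the log-likelihood unchanged, so only $\beta_m - \gamma$ is identified:

```python
# Illustration of point (2): set Z identical to Xm and compare the
# log-likelihood at (bp, bm, g) and at (bp, bm + c, g + c) for any c.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n, kp, km = 1000, 2, 2
Xp, Xm = rng.normal(size=(n, kp)), rng.normal(size=(n, km))
Z = Xm.copy()                       # shared regressors: Z == Xm
y = (rng.random(n) < 0.5).astype(float)

def loglik(bp, bm, g):
    w = -Xp @ bp + Xm @ bm - Z @ g  # index depends on Xm @ (bm - g) only
    p1 = np.clip(norm.cdf(w), 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

bp, bm, g, c = np.ones(kp), np.array([0.5, -0.2]), np.array([0.1, 0.3]), 7.0
print(loglik(bp, bm, g), loglik(bp, bm + c, g + c))  # identical values
```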

(Appendix remark on WNLS) After taking the f.o.c.'s (though I do not calculate them here), we would see that the system of four WNLS f.o.c.'s is equivalent to that of maximum likelihood. Thus, the WNLS and ML estimators are equivalent. This fact also implies that the WNLS estimator has the same asymptotic distribution as the ML estimator and is asymptotically efficient.⁷

⁷For a Bernoulli r.v. with p.d.f.

$$f(x) = p^x (1 - p)^{1 - x}, \quad x = 1 \text{ or } 0,$$

the expectation and variance are

$$E_x[X] = 1 \cdot p + 0 \cdot (1 - p) = p$$

and

$$\operatorname{Var}_x[X] = (1 - p)^2 p + (0 - p)^2 (1 - p) = (1 - p) \, p \, \{(1 - p) + p\} = (1 - p) \, p.$$
