Econometrics II (Fall 2016) Department of Economics, University of Copenhagen By Morten Nyboe Tabor

#2.3

ML Estimation in an Autoregressive Model – Solution Guide

(1) Present the intuition for the maximum likelihood estimation principle, and outline the basic steps in deriving the estimators and the covariance matrix of the estimates. What is the number of parameters in the statistical model?

We assume that the density of $y_t$ is known and given by,
$$y_t \sim \text{Density}(\theta), \qquad (2.1)$$
where $\theta$ is a $K$-dimensional vector of parameters for the assumed density. The probability of observing $y_t$ is given by the density function, $f(y_t \mid \theta)$. For independent and identically distributed (iid) observations the joint density is given by,
$$f(y_1, \dots, y_T) = \prod_{t=1}^{T} f(y_t \mid \theta). \qquad (2.2)$$
The likelihood function, defined as,
$$L(\theta) = f(y_1, \dots, y_T) = \prod_{t=1}^{T} f(y_t \mid \theta) = \prod_{t=1}^{T} L_t(\theta), \qquad (2.3)$$
can be written as the product of the individual likelihood contributions, $L_t(\theta)$, which indicate how much the individual observations contribute to the joint likelihood. The maximum likelihood estimator, $\widehat{\theta}_{ML}$, maximizes the joint likelihood. The intuition behind ML is that we select the estimator that maximizes the probability of observing the data given the model. Often, however, it is more convenient to work with the log of the likelihood function,
$$\log L(\theta) = \log\left(\prod_{t=1}^{T} L_t(\theta)\right) = \sum_{t=1}^{T} \log L_t(\theta). \qquad (2.4)$$

Since the log function is a monotonic transformation, maximizing the log-likelihood and the likelihood function yields the same result, but the log-likelihood function is typically easier to work with. To derive the ML estimator we carry out the following steps.

Step 1. Write the likelihood function and the log-likelihood function given the assumed distribution,
$$L(\theta) = f(y_1, \dots, y_T) = \prod_{t=1}^{T} f(y_t \mid \theta) = \prod_{t=1}^{T} L_t(\theta) \qquad (2.5)$$
$$\log L(\theta) = \log\left(\prod_{t=1}^{T} L_t(\theta)\right) = \sum_{t=1}^{T} \log L_t(\theta). \qquad (2.6)$$

Step 2. Find the score vector, which is the first derivative of the log-likelihood function with respect to the parameter vector $\theta$,
$$\underset{(K \times 1)}{s(\theta)} = \frac{\partial \log L(\theta)}{\partial \theta} = \sum_{t=1}^{T} \frac{\partial \log L_t(\theta)}{\partial \theta} = \sum_{t=1}^{T} s_t(\theta). \qquad (2.7)$$
Note that $s(\theta)$ is of dimension $(K \times 1)$ and that the score vector can be written as the sum of the individual scores for each observation. The score vector indicates the slope of the log-likelihood function.

Step 3. Find the first order conditions for the ML estimator, $\widehat{\theta}_{ML}$,
$$\underset{(K \times 1)}{s\left(\widehat{\theta}_{ML}\right)} = \sum_{t=1}^{T} s_t\left(\widehat{\theta}_{ML}\right) = 0, \qquad (2.8)$$

and solve for $\widehat{\theta}_{ML}$ to find the ML estimator. Note that this is a system with $K$ equations, the $K$ so-called likelihood equations, and $K$ parameters. In practice, it can be impossible to find an analytical solution to the likelihood equations. In such cases, numerical optimization algorithms can be used to find the ML estimator.

Step 4. Find the Hessian as the second derivative,
$$\underset{(K \times K)}{H_t} = \frac{\partial^2 \log L_t(\theta)}{\partial \theta \partial \theta'}, \qquad (2.9)$$
which indicates the curvature of the log-likelihood function. The Hessian is a $(K \times K)$ matrix. Additionally, find the information matrix for observation $t$,
$$I(\theta) = -E\left[\frac{\partial^2 \log L_t(\theta)}{\partial \theta \partial \theta'}\right] = -E[H_t]. \qquad (2.10)$$

Step 5. The asymptotic covariance matrix is given by the inverse of the information. As the Hessian measures the curvature of the log-likelihood function, the variance of the ML estimator depends on the Hessian: the greater the curvature, the greater the second derivative, and the smaller the variance. The ML estimator is asymptotically normally distributed with
$$\sqrt{T}\left(\widehat{\theta}_{ML} - \theta_0\right) \longrightarrow N(0, V), \qquad (2.11)$$
where $\theta_0$ are the true parameters and $V$ is the asymptotic variance, so that, approximately,
$$\widehat{\theta}_{ML} \sim N\left(\theta_0, T^{-1}V\right). \qquad (2.12)$$
The asymptotic variance, $V$, is given by,
$$V = I(\theta)^{-1} = \left[-E\left(\left.\frac{\partial^2 \log L_t(\theta)}{\partial \theta \partial \theta'}\right|_{\theta = \theta_0}\right)\right]^{-1}. \qquad (2.13)$$
In practice, we can estimate the asymptotic variance by replacing population expectations with sample averages and by replacing the unknown parameters with the ML estimates,
$$\widehat{V}_H = \left(-\frac{1}{T}\sum_{t=1}^{T}\left.\frac{\partial^2 \log L_t(\theta)}{\partial \theta \partial \theta'}\right|_{\theta = \widehat{\theta}_{ML}}\right)^{-1}, \qquad (2.14)$$
where the subscript $H$ indicates that the estimate is based on the Hessian (alternatively, the asymptotic variance can be estimated based on the outer product of the scores). In the autoregressive model considered below, the statistical model contains $K = 2$ parameters: the autoregressive parameter $\theta$ and the error variance $\sigma^2$.
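As a small numerical illustration of Steps 1-5, the following Python sketch estimates the mean and variance of a simulated iid Gaussian sample by minimizing the negative log-likelihood with scipy.optimize.minimize and obtains standard errors from the inverse of a numerical Hessian, in the spirit of (2.14). The simulated data, starting values, and step size h are arbitrary choices made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative sketch (not part of the exercise): ML estimation of
# theta = (mu, sigma^2) for an iid Gaussian sample.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)   # simulated data
T = y.size

def neg_loglik(params, y):
    mu, sigma2 = params
    if sigma2 <= 0:                            # keep the variance positive
        return np.inf
    # Step 1: log-likelihood = sum of individual contributions log L_t
    loglik_t = -0.5 * np.log(2 * np.pi) - 0.5 * np.log(sigma2) \
               - 0.5 * (y - mu) ** 2 / sigma2
    return -np.sum(loglik_t)

# Steps 2-3: the optimizer solves the likelihood equations s(theta_hat) = 0
res = minimize(neg_loglik, x0=np.array([0.0, 1.0]), args=(y,), method="Nelder-Mead")
theta_hat = res.x

# Steps 4-5: numerical Hessian of the negative log-likelihood at theta_hat;
# its inverse estimates the covariance matrix of the estimates, cf. (2.14).
def numerical_hessian(f, x, h=1e-4):
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            e_i, e_j = np.zeros(k), np.zeros(k)
            e_i[i], e_j[j] = h, h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

H = numerical_hessian(lambda p: neg_loglik(p, y), theta_hat)
cov_hat = np.linalg.inv(H)      # covariance of theta_hat, i.e. T^{-1} V_hat
se = np.sqrt(np.diag(cov_hat))
print("estimates:", theta_hat, "std. errors:", se)
```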


(2) Show how the joint density function for the time series, $y_0, y_1, \dots, y_T$, denoted $f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2)$, can be factorized into a series of conditional and marginal distributions. Discuss how to construct the likelihood function for $y_1, y_2, \dots, y_T$ conditional on $y_0$. How does this procedure differ from the IID case?

For iid data we can factorize the joint density as the product of the individual densities,
$$f(y_1, y_2, \dots, y_T \mid \theta, \sigma^2) = \prod_{t=1}^{T} f(y_t \mid \theta, \sigma^2). \qquad (2.15)$$
For most economic data the iid assumption does not hold, so we cannot use this factorization. However, we can use the factorization of a joint density into a conditional and a marginal density,
$$f(A, B) = f(A \mid B) \cdot f(B), \qquad (2.16)$$
to factorize the joint (unconditional) density of $y_0, y_1, \dots, y_T$ into a series of conditional and marginal densities,
$$\begin{aligned}
f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2) &= f(y_T \mid y_0, y_1, \dots, y_{T-1}; \theta, \sigma^2) \cdot f(y_0, y_1, \dots, y_{T-1} \mid \theta, \sigma^2) \\
&= f(y_T \mid y_0, y_1, \dots, y_{T-1}; \theta, \sigma^2) \\
&\quad \cdot f(y_{T-1} \mid y_0, y_1, \dots, y_{T-2}; \theta, \sigma^2) \cdot f(y_0, y_1, \dots, y_{T-2} \mid \theta, \sigma^2) \\
&= \dots \\
&= \prod_{t=1}^{T} f(y_t \mid y_0, y_1, \dots, y_{t-1}; \theta, \sigma^2) \cdot f(y_0 \mid \theta, \sigma^2). \qquad (2.17)
\end{aligned}$$
By rewriting the expression, we get the joint density of $y_1, y_2, \dots, y_T$ conditional on $y_0$,
$$f(y_1, \dots, y_T \mid y_0; \theta, \sigma^2) = \frac{f(y_0, y_1, \dots, y_T \mid \theta, \sigma^2)}{f(y_0 \mid \theta, \sigma^2)} = \prod_{t=1}^{T} f(y_t \mid y_0, y_1, \dots, y_{t-1}; \theta, \sigma^2). \qquad (2.18)$$
Although the time series data do not satisfy the iid assumption, we can still factorize the joint density into a product of individual (conditional) densities when we condition on $y_0$. Thereby, we can still use the 'usual' additive form for ML estimation based on the log-likelihood function.
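To make the factorization concrete, the following sketch checks numerically for a stationary Gaussian AR(1), where $y_t$ given the past depends only on $y_{t-1}$, that the joint log-density of $(y_0, \dots, y_T)$ equals the log-density of $y_0$ plus the sum of the conditional log-densities, as in (2.17). The parameter values and the path length are illustrative choices.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

# Illustrative check of the factorization (2.17)-(2.18) for a stationary
# Gaussian AR(1); parameter values are arbitrary.
theta, sigma2 = 0.6, 1.5
rng = np.random.default_rng(1)

# Simulate a short stationary AR(1) path, drawing y_0 from its stationary law.
T = 6
y = np.empty(T + 1)
y[0] = rng.normal(scale=np.sqrt(sigma2 / (1 - theta ** 2)))
for t in range(1, T + 1):
    y[t] = theta * y[t - 1] + rng.normal(scale=np.sqrt(sigma2))

# Joint density: (y_0, ..., y_T) is multivariate normal with covariance
# Cov(y_s, y_t) = sigma2 * theta**|s-t| / (1 - theta**2).
idx = np.arange(T + 1)
Sigma = sigma2 * theta ** np.abs(idx[:, None] - idx[None, :]) / (1 - theta ** 2)
log_joint = multivariate_normal(mean=np.zeros(T + 1), cov=Sigma).logpdf(y)

# Factorization: marginal of y_0 times the conditionals y_t | y_{t-1} ~ N(theta*y_{t-1}, sigma2).
log_marginal_y0 = norm.logpdf(y[0], scale=np.sqrt(sigma2 / (1 - theta ** 2)))
log_conditionals = norm.logpdf(y[1:], loc=theta * y[:-1], scale=np.sqrt(sigma2)).sum()

print(log_joint, log_marginal_y0 + log_conditionals)   # the two numbers agree
```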

(3) Find an expression for the likelihood contribution for $y_t \mid y_{t-1}$, denoted $L_t(\theta, \sigma^2)$, and state the likelihood function for $y_1, y_2, \dots, y_T \mid y_0$. Also write the corresponding log-likelihood function.

We consider the first order autoregressive, AR(1), model
$$y_t = \theta y_{t-1} + \epsilon_t, \quad t = 1, 2, \dots, T, \qquad (2.19)$$
where we assume that $\epsilon_t \sim N(0, \sigma^2)$ and we condition on the initial value $y_0$. We derive the ML estimator based on the assumption that the error term is normally distributed. Note that we have two parameters to estimate: the autoregressive parameter, $\theta$, and the variance of the error term, $\sigma^2$.

First, we find the likelihood contribution of the error terms, $\epsilon_t = y_t - \theta y_{t-1}$, where we assume that $E(\epsilon_t) = \mu_\epsilon = 0$,
$$\begin{aligned}
L_t(\theta, \sigma^2) = f(y_t \mid y_{t-1}; \theta, \sigma^2) &= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(\epsilon_t - \mu_\epsilon)^2}{2\sigma^2}\right\} \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}\right\}. \qquad (2.20)
\end{aligned}$$
The likelihood function is given by,
$$\begin{aligned}
L(\theta, \sigma^2) = f(y_1, y_2, \dots, y_T \mid y_0; \theta, \sigma^2) &= \prod_{t=1}^{T} f(y_t \mid y_{t-1}; \theta, \sigma^2) \\
&= \prod_{t=1}^{T} L_t(\theta, \sigma^2) \\
&= \prod_{t=1}^{T} \left(\frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_t - \theta y_{t-1})^2}{2\sigma^2}\right\}\right). \qquad (2.21)
\end{aligned}$$
The log-likelihood function is given by,
$$\log L(\theta, \sigma^2) = \sum_{t=1}^{T} \left(-\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^2 - \frac{1}{2}\frac{(y_t - \theta y_{t-1})^2}{\sigma^2}\right), \qquad (2.22)$$
and the log-likelihood contributions,
$$\log L_t(\theta, \sigma^2) = -\frac{1}{2}\log(2\pi) - \frac{1}{2}\log\sigma^2 - \frac{1}{2}\frac{(y_t - \theta y_{t-1})^2}{\sigma^2}. \qquad (2.23)$$
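A direct implementation of the conditional log-likelihood (2.22) could look as follows; the function name ar1_loglik, the simulated series, and the parameter values are illustrative choices, not part of the exercise.

```python
import numpy as np

# Sketch of the conditional AR(1) log-likelihood (2.22); names are my own.
def ar1_loglik(theta, sigma2, y):
    """Log-likelihood of y[1], ..., y[T] conditional on the initial value y[0]."""
    resid = y[1:] - theta * y[:-1]             # y_t - theta * y_{t-1}, t = 1, ..., T
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * np.log(sigma2)
                  - 0.5 * resid ** 2 / sigma2)

# Small usage example with simulated data and arbitrary parameter values.
rng = np.random.default_rng(2)
y = np.empty(201)
y[0] = 0.0
for t in range(1, 201):
    y[t] = 0.5 * y[t - 1] + rng.normal()
print(ar1_loglik(0.5, 1.0, y))
```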

(4) Calculate the individual scores
$$s_t(\theta, \sigma^2) = \begin{pmatrix} \frac{\partial \log L_t(\theta, \sigma^2)}{\partial \theta} \\ \frac{\partial \log L_t(\theta, \sigma^2)}{\partial \sigma^2} \end{pmatrix}.$$

We find the individual scores by differentiating the log-likelihood contributions with respect to the parameters $\theta$ and $\sigma^2$ (remember that we here differentiate with respect to $\sigma^2$ and not $\sigma$; alternatively, you could consider $\sigma$ and get similar results),
$$s_t(\theta, \sigma^2) = \begin{pmatrix} \frac{\partial \log L_t(\theta, \sigma^2)}{\partial \theta} \\ \frac{\partial \log L_t(\theta, \sigma^2)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \frac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^2} \\ -\frac{1}{2\sigma^2} + \frac{1}{2}\frac{(y_t - \theta y_{t-1})^2}{\sigma^4} \end{pmatrix}. \qquad (2.24)$$
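As a quick sanity check (not required by the exercise), the analytical scores in (2.24) can be compared with a numerical gradient of the log-likelihood contribution; the data point and parameter values below are arbitrary.

```python
import numpy as np

# Compare the analytical scores (2.24) with central finite differences.
def loglik_t(theta, sigma2, y_t, y_tm1):
    resid = y_t - theta * y_tm1
    return -0.5 * np.log(2 * np.pi) - 0.5 * np.log(sigma2) - 0.5 * resid ** 2 / sigma2

def score_t(theta, sigma2, y_t, y_tm1):
    resid = y_t - theta * y_tm1
    return np.array([y_tm1 * resid / sigma2,
                     -0.5 / sigma2 + 0.5 * resid ** 2 / sigma2 ** 2])

theta, sigma2, y_t, y_tm1 = 0.5, 1.5, 0.8, 1.2     # arbitrary illustrative values
h = 1e-6
numerical = np.array([
    (loglik_t(theta + h, sigma2, y_t, y_tm1) - loglik_t(theta - h, sigma2, y_t, y_tm1)) / (2 * h),
    (loglik_t(theta, sigma2 + h, y_t, y_tm1) - loglik_t(theta, sigma2 - h, y_t, y_tm1)) / (2 * h),
])
print(score_t(theta, sigma2, y_t, y_tm1), numerical)   # the two vectors agree
```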

(5) State the likelihood equations as the first order conditions for maximizing the log-likelihood function. Solve the first order conditions and find the ML estimators, $\widehat{\theta}_{ML}$ and $\widehat{\sigma}^2_{ML}$.

The first order conditions are given by the likelihood equations,
$$s\left(\widehat{\theta}, \widehat{\sigma}^2\right) = \sum_{t=1}^{T} s_t\left(\widehat{\theta}, \widehat{\sigma}^2\right) = \sum_{t=1}^{T} \begin{pmatrix} \frac{y_{t-1}(y_t - \widehat{\theta} y_{t-1})}{\widehat{\sigma}^2} \\ -\frac{1}{2\widehat{\sigma}^2} + \frac{1}{2}\frac{(y_t - \widehat{\theta} y_{t-1})^2}{\widehat{\sigma}^4} \end{pmatrix} = 0. \qquad (2.25)$$
We can rewrite the two equations separately as,
$$\begin{aligned}
\sum_{t=1}^{T} \frac{y_{t-1}\left(y_t - \widehat{\theta} y_{t-1}\right)}{\widehat{\sigma}^2} &= 0 \\
\sum_{t=1}^{T} y_{t-1}\left(y_t - \widehat{\theta} y_{t-1}\right) &= 0 \\
\sum_{t=1}^{T} y_{t-1} y_t - \sum_{t=1}^{T} y_{t-1} \widehat{\theta} y_{t-1} &= 0 \\
\sum_{t=1}^{T} y_{t-1} y_t &= \widehat{\theta} \sum_{t=1}^{T} y_{t-1}^2, \qquad (2.26)
\end{aligned}$$
and,
$$\begin{aligned}
\sum_{t=1}^{T} \left(-\frac{1}{2\widehat{\sigma}^2} + \frac{1}{2}\frac{\left(y_t - \widehat{\theta} y_{t-1}\right)^2}{\widehat{\sigma}^4}\right) &= 0 \\
-\frac{1}{2}\frac{T}{\widehat{\sigma}^2} + \frac{1}{2}\sum_{t=1}^{T} \frac{\left(y_t - \widehat{\theta} y_{t-1}\right)^2}{\widehat{\sigma}^4} &= 0 \\
\frac{\sum_{t=1}^{T} \left(y_t - \widehat{\theta} y_{t-1}\right)^2}{\widehat{\sigma}^4} &= \frac{T}{\widehat{\sigma}^2}. \qquad (2.27)
\end{aligned}$$
Hence, we get the ML estimator of the autoregressive parameter,
$$\widehat{\theta}_{ML} = \left(\sum_{t=1}^{T} y_{t-1}^2\right)^{-1} \sum_{t=1}^{T} y_{t-1} y_t, \qquad (2.28)$$
and, by noting that $\widehat{\epsilon}_t = y_t - \widehat{\theta} y_{t-1}$, we get the ML estimator of the error variance,
$$\widehat{\sigma}^2_{ML} = \frac{1}{T}\sum_{t=1}^{T} \widehat{\epsilon}_t^2. \qquad (2.29)$$
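The closed-form estimators (2.28) and (2.29) are straightforward to compute; the following sketch applies them to a simulated AR(1) series with illustrative true values $\theta = 0.7$ and $\sigma^2 = 2$.

```python
import numpy as np

# Closed-form ML estimators (2.28)-(2.29) applied to simulated AR(1) data.
rng = np.random.default_rng(3)
T = 1000
theta_true, sigma2_true = 0.7, 2.0                 # illustrative true values
y = np.empty(T + 1)
y[0] = 0.0
for t in range(1, T + 1):
    y[t] = theta_true * y[t - 1] + rng.normal(scale=np.sqrt(sigma2_true))

y_lag, y_cur = y[:-1], y[1:]                       # y_{t-1} and y_t, t = 1, ..., T
theta_hat = np.sum(y_lag * y_cur) / np.sum(y_lag ** 2)    # eq. (2.28)
resid = y_cur - theta_hat * y_lag
sigma2_hat = np.mean(resid ** 2)                   # eq. (2.29), denominator T
print(theta_hat, sigma2_hat)                       # close to 0.7 and 2.0
```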

(6) How do the ML estimators compare to the OLS estimators in the model (2.19)?

The maximum likelihood (ML) estimator, $\widehat{\theta}_{ML}$, is identical to the OLS estimator, $\widehat{\theta}_{OLS}$, but note that the ML estimator of the error variance is different from the OLS estimator of the error variance, given by,
$$\widehat{\sigma}^2_{OLS} = \frac{1}{T - K}\sum_{t=1}^{T} \widehat{\epsilon}_t^2. \qquad (2.30)$$
We know that the OLS estimator of the error variance, $\widehat{\sigma}^2_{OLS}$, is unbiased, so the ML estimator, $\widehat{\sigma}^2_{ML}$, must be biased but consistent. We also note that the ML estimator has the smallest possible asymptotic variance among all consistent and asymptotically normal estimators; this lower bound on the variance is known as the Cramér-Rao lower bound.
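As a small illustration of the difference between the two variance estimators, note that $\widehat{\sigma}^2_{ML} = \frac{T-K}{T}\,\widehat{\sigma}^2_{OLS}$, so the ML estimate is slightly smaller in finite samples; the residuals in the sketch below are made up for illustration.

```python
import numpy as np

# Relation between the two variance estimators with K = 1 regressor;
# the residual vector is an arbitrary made-up example.
resid = np.array([0.3, -1.1, 0.7, 0.2, -0.5, 1.4, -0.8, 0.1])
T, K = resid.size, 1
sigma2_ols = np.sum(resid ** 2) / (T - K)          # eq. (2.30)
sigma2_ml = np.sum(resid ** 2) / T                 # eq. (2.29)
print(sigma2_ols, sigma2_ml, sigma2_ml / sigma2_ols)   # ratio equals (T - K) / T
```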


(7) Find the Hessian matrix of double derivatives,
$$H_t = \begin{pmatrix} \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \theta} & \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \sigma^2} \\ \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \theta} & \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \sigma^2} \end{pmatrix},$$
and the information matrix $I(\theta, \sigma^2) = -E[H_t]$. Comment on the role of the information matrix in inference on the parameters and state the asymptotic distribution.

The Hessian matrix is the second derivative of the log-likelihood contributions, given by,
$$\frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \theta} = \frac{\partial}{\partial \theta}\left(\frac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^2}\right) = -\frac{y_{t-1}^2}{\sigma^2} \qquad (2.31)$$
$$\frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \sigma^2} = \frac{\partial}{\partial \sigma^2}\left(\frac{y_{t-1}(y_t - \theta y_{t-1})}{\sigma^2}\right) = -\frac{y_{t-1}\epsilon_t}{\sigma^4} \qquad (2.32)$$
$$\frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \theta} = \frac{\partial}{\partial \theta}\left(-\frac{1}{2\sigma^2} + \frac{1}{2}\frac{(y_t - \theta y_{t-1})^2}{\sigma^4}\right) = -\frac{y_{t-1}\epsilon_t}{\sigma^4} \qquad (2.33)$$
$$\frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \sigma^2} = \frac{\partial}{\partial \sigma^2}\left(-\frac{1}{2\sigma^2} + \frac{1}{2}\frac{(y_t - \theta y_{t-1})^2}{\sigma^4}\right) = \frac{1}{2\sigma^4} - \frac{\epsilon_t^2}{\sigma^6}, \qquad (2.34)$$
which gives the Hessian matrix,
$$H_t = \begin{pmatrix} \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \theta} & \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \theta \partial \sigma^2} \\ \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \theta} & \frac{\partial^2 \log L_t(\theta, \sigma^2)}{\partial \sigma^2 \partial \sigma^2} \end{pmatrix} = \begin{pmatrix} -\frac{y_{t-1}^2}{\sigma^2} & -\frac{y_{t-1}\epsilon_t}{\sigma^4} \\ -\frac{y_{t-1}\epsilon_t}{\sigma^4} & \frac{1}{2\sigma^4} - \frac{\epsilon_t^2}{\sigma^6} \end{pmatrix}. \qquad (2.35)$$
Note that the expected Hessian, and hence the information matrix, is block diagonal, because the off-diagonal terms have expectation zero. The information matrix is the negative expected Hessian,
$$I(\theta, \sigma^2) = -E\left[H_t(\theta, \sigma^2)\right]. \qquad (2.36)$$

As $E[\epsilon_t] = 0$, $E[\epsilon_t^2] = \sigma^2$, and $E[y_{t-1}\epsilon_t] = 0$, we get the information matrix,
$$I(\theta, \sigma^2) = \begin{pmatrix} \frac{1}{\sigma^2}E\left[y_{t-1}^2\right] & 0 \\ 0 & \frac{1}{2\sigma^4} \end{pmatrix}. \qquad (2.37)$$
The variance of the ML estimator, $\widehat{\theta}_{ML}$, is given by,
$$V\left(\widehat{\theta}_{ML}\right) = \frac{1}{T}\left(\frac{1}{\sigma^2}E\left[y_{t-1}^2\right]\right)^{-1}. \qquad (2.38)$$
As we do not know $E\left[y_{t-1}^2\right]$, we replace the expectation with the sample average to get the estimate of the asymptotic variance,
$$\widehat{V}\left(\widehat{\theta}_{ML}\right) = \frac{1}{T}\cdot\widehat{\sigma}^2_{ML}\cdot\left(\frac{1}{T}\sum_{t=1}^{T} y_{t-1}^2\right)^{-1} = \widehat{\sigma}^2_{ML}\left(\sum_{t=1}^{T} y_{t-1}^2\right)^{-1}. \qquad (2.39)$$
Hence, by (2.11)-(2.12), $\widehat{\theta}_{ML}$ is asymptotically normal, approximately $\widehat{\theta}_{ML} \sim N\left(\theta, \widehat{V}\left(\widehat{\theta}_{ML}\right)\right)$, and the inverse information, estimated as in (2.39), is what we use for inference on the parameters, e.g. to construct t-ratios and confidence intervals.
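Finally, as an illustrative check, the standard error implied by (2.39) can be compared with the Hessian-based estimate in the spirit of (2.14), using the analytical Hessian (2.35); because the first order condition sets $\sum_t y_{t-1}\widehat{\epsilon}_t = 0$, the two coincide at the ML estimates. The simulated data and true values below are arbitrary choices.

```python
import numpy as np

# Standard error of theta_hat from (2.39) versus the Hessian-based estimate
# built from the summed analytical Hessian (2.35); illustrative simulation.
rng = np.random.default_rng(4)
T = 500
y = np.empty(T + 1)
y[0] = 0.0
for t in range(1, T + 1):
    y[t] = 0.7 * y[t - 1] + rng.normal(scale=np.sqrt(2.0))

y_lag, y_cur = y[:-1], y[1:]
theta_hat = np.sum(y_lag * y_cur) / np.sum(y_lag ** 2)    # eq. (2.28)
resid = y_cur - theta_hat * y_lag
sigma2_hat = np.mean(resid ** 2)                          # eq. (2.29)

# eq. (2.39): estimated variance of theta_hat
var_theta = sigma2_hat / np.sum(y_lag ** 2)

# Hessian-based estimate: invert minus the Hessian (2.35) summed over t.
H_sum = np.array([
    [-np.sum(y_lag ** 2) / sigma2_hat, -np.sum(y_lag * resid) / sigma2_hat ** 2],
    [-np.sum(y_lag * resid) / sigma2_hat ** 2,
     T / (2 * sigma2_hat ** 2) - np.sum(resid ** 2) / sigma2_hat ** 3],
])
cov_hat = np.linalg.inv(-H_sum)
# The two standard errors coincide: the off-diagonal term is zero at the ML estimates.
print(np.sqrt(var_theta), np.sqrt(cov_hat[0, 0]))
```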
