Bayesian Estimation & Model Evaluation

Frank Schorfheide, University of Pennsylvania
MFM Summer Camp, June 12, 2016

Why Bayesian Inference?

• Why not?

p(θ|Y) = p(Y|θ) p(θ) / ∫ p(Y|θ) p(θ) dθ

• Treat uncertainty with respect to shocks, latent states, parameters, and model specifications symmetrically.
• Condition inference on what you know (the data Y) instead of what you don’t know (the parameter θ).
• Make optimal decisions conditional on the observed data.


Excuses and Overview

• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
1. Model Development
2. Identification
3. Priors
4. Computations
5. Working with Multiple Models


1. Model Development
• Bayesian estimation can take a lot of time... so don’t waste it on bad models!
• Suppose you have an elaborate macro-finance DSGE model...
• Applied theorists get credit for plugging parameter values into the model and solving/simulating it.
• You can easily get extra credit by:
  • specifying a prior distribution p(θ);
  • generating draws θ^i, i = 1, ..., N from the prior;
  • simulating trajectories Ỹ^i (conditional on θ^i), i = 1, ..., N;
  • computing sample statistics S(Ỹ^i);
  • comparing the distribution of simulated sample statistics to the observed sample statistic S(Y);
  • calling it a prior predictive check (see the sketch below).
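A minimal sketch of a prior predictive check along these lines, for a toy AR(1) model with a placeholder prior, data set, and sample statistic (none of these come from the referenced paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_model(theta, T=200):
    """Simulate a trajectory Y~ given a parameter draw theta (toy AR(1) model)."""
    rho, sigma = theta
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho * y[t - 1] + sigma * rng.standard_normal()
    return y

def sample_stat(y):
    """Sample statistic S(Y): here the first-order autocorrelation."""
    return np.corrcoef(y[1:], y[:-1])[0, 1]

# 1. Prior p(theta): rho ~ Beta(2, 2), sigma drawn as an inverse-gamma (1/Gamma)
N = 1000
prior_draws = np.column_stack([rng.beta(2, 2, N), 1.0 / rng.gamma(4, 1.0, N)])

# 2.-4. Draw theta^i, simulate Y~^i, compute S(Y~^i)
sim_stats = np.array([sample_stat(simulate_model(th)) for th in prior_draws])

# 5. Compare the simulated distribution with the observed statistic S(Y)
y_obs = simulate_model((0.9, 0.5))          # stand-in for the actual data
s_obs = sample_stat(y_obs)
pct = np.mean(sim_stats <= s_obs)
print(f"observed S(Y) = {s_obs:.2f}, prior predictive percentile = {pct:.2f}")
```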


1. Predictive Checks – An Example

Reference: Chang, Doh, and Schorfheide (2007, JMCB)

2. Identification
• We are trying to learn the parameters θ from the data.
• Formal definitions... e.g., the model is identified at θ0 if p(Y|θ) = p(Y|θ0) implies that θ = θ0.
• Without identification or with weak identification:
  • use more/different data to achieve identification;
  • use identification-robust inference procedures.
• Lack of identification does not raise conceptual issues for Bayesian inference (as long as priors are proper), but possibly computational challenges.
Reference: Fernandez-Villaverde, Rubio-Ramirez, Schorfheide (2016, Handbook of Macroeconomics chapter)


2. (Lack of) Identification – An Analytical Example
• Let φ be an identifiable reduced-form parameter.
• Let θ be a structural parameter of interest, restricted by
  φ ≤ θ  and  θ ≤ φ + 1.
• Parameter θ is set-identified.
• The interval Θ(φ) = [φ, φ + 1] is called the identified set.
• This problem shows up prominently in VARs identified with sign restrictions.
References: Moon and Schorfheide (2012, Econometrica); Schorfheide (2016, Discussion of World Congress Lectures by Müller and Uhlig)


2. (Lack of) Identification – An Analytical Example
• Joint posterior of θ and φ:
  p(θ, φ|Y) = p(φ|Y) p(θ|φ, Y) ∝ p(Y|φ) p(θ|φ) p(φ).
• Because θ does not enter the likelihood function, we deduce that
  p(φ|Y) = p(Y|φ) p(φ) / ∫ p(Y|φ) p(φ) dφ,   p(θ|φ, Y) = p(θ|φ).
  No updating of beliefs about θ conditional on φ!
• Marginal posterior distribution of θ:
  p(θ|Y) = ∫_{θ−1}^{θ} p(φ|Y) p(θ|φ) dφ
  Updating of the marginal posterior of θ!


2. An Analytical Example: Posterior p(θ|Y )

Assume φ|Y ∼ N(−0.5, V̄) and θ|φ ∼ U[φ, φ + 1]. V̄ is equal to 1/4 (solid red), 1/20 (dashed blue), and 1/100 (dotted green).

[Figure: posterior density π(θ) plotted against the parameter θ over the range −2 to 1.5.]
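With θ|φ ∼ U[φ, φ + 1], the marginal posterior has the closed form p(θ|Y) = ∫_{θ−1}^{θ} p(φ|Y) dφ; a small sketch that evaluates it for the three values of V̄ used in the figure (plotting code omitted):

```python
import numpy as np
from scipy.stats import norm

def posterior_theta(theta_grid, mean=-0.5, V=0.25):
    """p(theta|Y) = P[theta-1 <= phi <= theta | Y] with phi|Y ~ N(mean, V)."""
    sd = np.sqrt(V)
    return norm.cdf(theta_grid, mean, sd) - norm.cdf(theta_grid - 1.0, mean, sd)

theta_grid = np.linspace(-2.0, 1.5, 351)
for V in (1 / 4, 1 / 20, 1 / 100):
    dens = posterior_theta(theta_grid, V=V)
    print(f"V = {V:>6}: max density {dens.max():.3f} at theta = {theta_grid[dens.argmax()]:.2f}")
```

As V̄ shrinks, the posterior flattens out toward the indicator of the identified set [−0.5, 0.5]: the data sharpen beliefs about φ but never discriminate within Θ(φ).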


3. Prior Distributions

• Ideally: a probabilistic representation of our knowledge/beliefs before observing the sample Y.
• More realistically: the choice of prior as well as the model is influenced by some observations. Try to keep this influence small or adjust measures of uncertainty.
• Views about the role of priors:
1. keep them “uninformative” (???) so that the posterior inherits the shape of the likelihood function;
2. use them to regularize the likelihood function;
3. incorporate information from sources other than Y.


3. Role of Priors – Example 1
• “Uninformative” priors?
• Consider the structural VAR
  y_t = Φ y_{t−1} + Σ_tr Ω ε_t,   u_t = Σ_tr Ω ε_t,   E[u_t u_t′] = Σ.
• A uniform distribution on the orthonormal matrix Ω does not induce a uniform prior over the identified set for the impulse response (see the sketch below)
  IRF(i, h) = Φ^h Σ_tr [Ω]_{.i} = Φ^h Σ_tr q,   where ‖q‖ = 1.

[Figure: unit circle in (q_1, q_2)-space with the sets F_q(Σ_tr) and F_θ(Σ_tr), the angle θ, and the restriction Σ_tr,21 q_1 + Σ_tr,22 q_2 = 0.]
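A quick numerical illustration of the point, with an arbitrary lower-triangular Σ_tr (not an estimated one): a Haar-uniform rotation of a 2×2 system is equivalent to a uniform angle, yet the induced distribution of the impact response Σ_tr q piles up mass near the boundary of the attainable interval rather than being flat.

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma_tr = np.array([[1.0, 0.0],
                     [0.5, 0.8]])   # illustrative lower-triangular Cholesky factor

# Haar-uniform 2x2 rotations <=> a uniform angle on [0, 2*pi)
angles = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
q = np.column_stack([np.cos(angles), np.sin(angles)])   # ||q|| = 1

irf0 = q @ Sigma_tr.T          # impact responses Sigma_tr q for each draw
resp = irf0[:, 1]              # response of variable 2 to the shock

# The attainable impact responses form an interval, but the induced prior on
# that interval is far from uniform: mass concentrates near the endpoints.
hist, edges = np.histogram(resp, bins=20, density=True)
print("attainable range:", resp.min(), resp.max())
print("density near the center vs near an endpoint:", hist[10], hist[0])
```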

Reference: Schorfheide (2016, World Congress Discussion)

3. Role of Priors – Example 2a
• Consider the model
  y_t = θ1 x_{1,t} + θ1 θ2 x_{2,t} + u_t.
• No identification of θ2 if θ1 = 0.
• Models with multiplicative parameters generate likelihood functions that look like this... (see the sketch below)

[Figure: contour plot of the likelihood function over θ1 ∈ [0, 0.5] and θ2 ∈ [0, 1].]
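A small simulation that reproduces the shape of such a likelihood surface; the sample size, regressors, and parameter values below are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100
x1, x2 = rng.standard_normal(T), rng.standard_normal(T)
theta1_true, theta2_true, sigma = 0.1, 0.5, 1.0
y = theta1_true * x1 + theta1_true * theta2_true * x2 + sigma * rng.standard_normal(T)

def log_lik(theta1, theta2):
    """Gaussian log-likelihood (up to a constant) of the multiplicative model."""
    resid = y - theta1 * x1 - theta1 * theta2 * x2
    return -0.5 * np.sum(resid**2) / sigma**2

t1_grid = np.linspace(0.0, 0.5, 51)
t2_grid = np.linspace(0.0, 1.0, 51)
ll = np.array([[log_lik(t1, t2) for t2 in t2_grid] for t1 in t1_grid])

# Along theta1 = 0 the likelihood is completely flat in theta2:
print("spread in theta2 direction at theta1 = 0.0:", ll[0].max() - ll[0].min())
print("spread in theta2 direction at theta1 = 0.4:", ll[-11].max() - ll[-11].min())
```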


3. Role of Priors – Example 2a
• The identification problem also distorts
  p(θ1 = 0|Y) ∝ ∫ p(Y|θ1 = 0, θ2) p(θ2) p(θ1 = 0) dθ2.
• Reparameterize: α1 = θ1, α2 = θ1 θ2.
• A prior p(α1, α2) ∝ c can regularize the problem.
• Jacobian:
  |∂α/∂θ′| = |det [ 1, 0 ; θ2, θ1 ]| = |θ1|.
• The prior density p(θ1, θ2) ∝ |θ1| vanishes as θ1 approaches the point of non-identification.
• More generally: try to add information when the data are not particularly informative.

References (for the cointegration model): Kleibergen and van Dijk (1994); Kleibergen and Paap (2002)

3. Role of Priors – Example 2b
• For instance, high-dimensional VARs:
  Y = XΦ + U,   u_t ∼ N(0, Σ),
  with a low observation-to-parameter ratio.
• A hierarchical (conjugate) MNIW prior p(Φ, Σ|λ) adds information. Frequentist perspective: add some bias and reduce variance to improve MSE.
• How much? Data-driven choice of λ (empirical Bayes; see the sketch below):
  λ̂ = argmax_λ ∫ p(Y|Φ, Σ) p(Φ, Σ|λ) d(Φ, Σ)
• Or specify a prior p(λ) and integrate out the hyperparameters.
• Alternative priors: LASSO, spike-and-slab, ...
Reference: Giannone, Lenza, and Primiceri (2014, REStat)
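The idea behind the data-driven choice of λ can be sketched with a much simpler conjugate model than the MNIW-VAR setup: a Gaussian regression with β ∼ N(0, (1/λ)I) and known noise variance, for which the marginal likelihood is available in closed form. This is only an illustration of the mechanism, not the Giannone-Lenza-Primiceri implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
T, k, sigma2 = 80, 10, 1.0
X = rng.standard_normal((T, k))
beta = rng.normal(0.0, 0.3, k)                 # "true" coefficients (placeholder)
y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(T)

def log_marginal_likelihood(lam):
    """y | lambda ~ N(0, sigma2*I + (1/lambda) X X') when beta ~ N(0, (1/lambda) I)."""
    cov = sigma2 * np.eye(T) + (1.0 / lam) * X @ X.T
    return multivariate_normal.logpdf(y, mean=np.zeros(T), cov=cov)

lam_grid = np.exp(np.linspace(np.log(0.01), np.log(100.0), 60))
lml = np.array([log_marginal_likelihood(l) for l in lam_grid])
lam_hat = lam_grid[lml.argmax()]
print(f"empirical-Bayes shrinkage: lambda_hat = {lam_hat:.2f}")
```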


3. Role of Priors – Example 3
• Prior elicitation based on: pre-sample information; information from excluded data series; or micro (macro) level information when estimating a model on macro (micro) data.
• A cute example...
• Production function:
  Y_t = (A_t H_t)^α K_t^{1−α} ( 1 − ϕ · (H_t/H_{t−1} − 1)² ).
• Prior for the adjustment cost ϕ?
• Firms can either search for workers, incurring adjustment costs ϕ (ΔH/H)² Y, or pay head hunters for finding workers.
• The head hunters’ service fee is ζ W ΔH.
• Head hunters tend to charge about ζ = 1/3 to 2/3 of the quarterly earnings of a worker.
• Recruiting costs should be approximately the same: ϕ (ΔH/H)² Y = ζ W ΔH.
• With a labor share of 1/3 (= WH/Y) and a one-percent increase in employment, ΔH/H = 1%, we obtain a range of 22 to 44 for ϕ.
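Solving the cost-equalization condition in the last two bullets for ϕ makes the back-of-the-envelope step explicit (a one-line rearrangement in the slide’s own notation):

  ϕ (ΔH/H)² Y = ζ W ΔH   ⟹   ϕ = ζ · (WH/Y) / (ΔH/H),

so the implied prior range for ϕ scales one-for-one with the head-hunter fee ζ and the labor share, and inversely with the assumed size of the employment change.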

Reference: Chang, Doh, and Schorfheide (2007, JMCB)

4. Computations
• Practical work utilizes algorithms to generate draws θ^i, i = 1, ..., N from the posterior p(θ|Y).
• Post-process the draws by converting them into the object of interest, h^i = h(θ^i), to characterize p(h(θ)|Y) ⟹ inference and decision making under uncertainty.
• Important algorithms:
  • importance sampling;
  • Markov chain Monte Carlo (MCMC) algorithms, e.g., Metropolis-Hastings samplers or Gibbs samplers.
• More recently: widespread access to parallel computing environments.
• Sequential Monte Carlo (SMC) techniques provide an interesting alternative.
Reference: Herbst and Schorfheide (2015, Princeton University Press)

4. Importance Sampling
• Target posterior π(θ) ∝ f(θ).
• Use the identity ∫ h(θ) f(θ) dθ = ∫ h(θ) [f(θ)/g(θ)] g(θ) dθ.
• The θ^i’s are draws from g(·).
• Approximation (see the sketch below):
  E_π[h] ≈ [ (1/N) Σ_{i=1}^{N} h(θ^i) w(θ^i) ] / [ (1/N) Σ_{i=1}^{N} w(θ^i) ],   w(θ) = f(θ)/g(θ).

[Figure: the target density f and two proposal densities g1, g2, together with the corresponding importance weights f/g1 and f/g2.]
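A self-contained sketch of the self-normalized importance sampling estimator; the target density and the two proposals below merely mimic the figure’s setup (one well-matched and one poorly matched proposal):

```python
import numpy as np
from scipy.stats import norm, t as student_t

rng = np.random.default_rng(4)

def f(theta):
    """Target density (the method only needs it up to a normalizing constant)."""
    return 0.7 * norm.pdf(theta, -1.0, 0.8) + 0.3 * norm.pdf(theta, 2.0, 0.5)

def is_estimate(g_rvs, g_pdf, h, N=50_000):
    theta = g_rvs(N)                       # draws from the proposal g
    w = f(theta) / g_pdf(theta)            # importance weights w = f/g
    return np.sum(w * h(theta)) / np.sum(w), w

h = lambda th: th                          # posterior mean as the object of interest

# Proposal g1: wide Student-t (good overlap); g2: narrow normal (poor overlap)
g1 = (lambda n: student_t.rvs(df=5, loc=0.0, scale=2.0, size=n, random_state=rng),
      lambda th: student_t.pdf(th, df=5, loc=0.0, scale=2.0))
g2 = (lambda n: norm.rvs(loc=2.0, scale=0.3, size=n, random_state=rng),
      lambda th: norm.pdf(th, 2.0, 0.3))

for name, (rvs, pdf) in {"g1": g1, "g2": g2}.items():
    est, w = is_estimate(rvs, pdf, h)
    ess = w.sum() ** 2 / np.sum(w**2)      # effective sample size
    print(f"{name}: E[theta|Y] approx {est:.3f}, ESS = {ess:,.0f}")
```

The poorly matched proposal produces a tiny effective sample size: a few draws carry almost all of the weight, which is exactly the failure mode the figure illustrates.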

4. A Challenging Posterior
• Consider the state-space model:
  y_t = [1 1] s_t,
  s_t = [ θ1², 0 ; (1 − θ1²) − θ1θ2, 1 − θ1² ] s_{t−1} + [1, 0]′ ε_t.
• Shocks: ε_t ∼ iid N(0, 1); uniform prior.

• Simulate T = 200 observations given θ = [0.45, 0.45]′, which is observationally equivalent to θ = [0.89, 0.22]′ (see the sketch below).

[Figure: contour plot of the posterior density over (θ1, θ2) ∈ [0, 1]².]
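A sketch that evaluates this likelihood with a Kalman filter at both parameter vectors, assuming the transition matrix and shock loading written above; up to rounding of the reported values, the two log-likelihoods should be nearly identical, which is what makes the posterior bimodal:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def system(theta):
    t1, t2 = theta
    Phi = np.array([[t1**2, 0.0],
                    [(1.0 - t1**2) - t1 * t2, 1.0 - t1**2]])   # transition, as above
    R = np.array([[1.0], [0.0]])             # shock loading
    H = np.array([[1.0, 1.0]])               # y_t = [1 1] s_t
    return Phi, R, H

def kalman_loglik(theta, y):
    Phi, R, H = system(theta)
    s = np.zeros((2, 1))
    P = solve_discrete_lyapunov(Phi, R @ R.T)    # stationary initialization
    ll = 0.0
    for yt in y:
        s, P = Phi @ s, Phi @ P @ Phi.T + R @ R.T          # predict
        F = (H @ P @ H.T).item()
        v = yt - (H @ s).item()
        ll += -0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        K = P @ H.T / F                                     # update
        s, P = s + K * v, P - K @ H @ P
    return ll

# Simulate T = 200 observations at theta = [0.45, 0.45]
rng = np.random.default_rng(5)
Phi, R, H = system((0.45, 0.45))
s = np.zeros((2, 1)); y = np.empty(200)
for t in range(200):
    s = Phi @ s + R * rng.standard_normal()
    y[t] = (H @ s).item()

for theta in [(0.45, 0.45), (0.89, 0.22)]:
    print(theta, f"log-likelihood = {kalman_loglik(theta, y):.2f}")
```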

4. From Importance to Sequential Importance Sampling

[Figure: the sequence of tempered posterior densities π_n(θ1) across tempering stages n = 0, ..., 50.]

π_n(θ) = f_n(θ)/Z_n = [p(Y|θ)]^{φ_n} p(θ) / ∫ [p(Y|θ)]^{φ_n} p(θ) dθ,

where the tempering schedule φ_n increases from φ_0 = 0 to φ_{N_φ} = 1 (e.g., φ_n = n/N_φ).

4. SMC Algorithm: A Graphical Illustration

[Figure: particle values at tempering stages φ0, φ1, φ2, φ3, with Correction (C), Selection (S), and Mutation (M) steps applied between consecutive stages.]

• π_n(θ) is represented by a swarm of particles {θ_n^i, W_n^i}_{i=1}^{N}:
  h̄_{n,N} = (1/N) Σ_{i=1}^{N} W_n^i h(θ_n^i)  →  E_{π_n}[h(θ_n)]  almost surely.
• C is Correction; S is Selection; and M is Mutation.

4. SMC Algorithm
1. Initialization. (φ_0 = 0.) Draw the initial particles from the prior: θ_1^i ∼ iid p(θ) and W_1^i = 1, i = 1, ..., N.
2. Recursion. For n = 1, ..., N_φ,
   1. Correction. Reweight the particles from stage n − 1 by defining the incremental weights
      w̃_n^i = [p(Y|θ_{n−1}^i)]^{φ_n − φ_{n−1}}   (1)
      and the normalized weights
      W̃_n^i = w̃_n^i W_{n−1}^i / [ (1/N) Σ_{i=1}^{N} w̃_n^i W_{n−1}^i ],  i = 1, ..., N.   (2)
      An approximation of E_{π_n}[h(θ)] is given by
      h̃_{n,N} = (1/N) Σ_{i=1}^{N} W̃_n^i h(θ_{n−1}^i).   (3)
   2. Selection.

4. SMC Algorithm (continued)
1. Initialization.
2. Recursion. For n = 1, ..., N_φ,
   1. Correction.
   2. Selection. (Optional resampling.) Let {θ̂^i}_{i=1}^{N} denote N iid draws from a multinomial distribution characterized by support points and weights {θ_{n−1}^i, W̃_n^i}_{i=1}^{N}, and set W_n^i = 1. An approximation of E_{π_n}[h(θ)] is given by
      ĥ_{n,N} = (1/N) Σ_{i=1}^{N} W_n^i h(θ̂_n^i).   (4)
   3. Mutation. Propagate the particles {θ̂^i, W_n^i} via N_MH steps of an MH algorithm with transition density θ_n^i ∼ K_n(θ_n|θ̂_n^i; ζ_n) and stationary distribution π_n(θ). An approximation of E_{π_n}[h(θ)] is given by
      h̄_{n,N} = (1/N) Σ_{i=1}^{N} h(θ_n^i) W_n^i.   (5)
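A compact likelihood-tempering SMC sampler that follows the Correction/Selection/Mutation recursion above, applied to a toy bimodal one-dimensional posterior; the target, the tempering schedule exponent, and the random-walk mutation kernel are illustrative choices, not the adaptive implementation in Herbst and Schorfheide (2015):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)

# Toy ingredients: bimodal "likelihood" and a flat prior on [-10, 10]
def log_lik(theta):
    return np.log(0.5 * norm.pdf(theta, -3.0, 0.5) + 0.5 * norm.pdf(theta, 3.0, 0.5))

def log_prior(theta):
    return np.where(np.abs(theta) <= 10.0, -np.log(20.0), -np.inf)

N, N_phi, N_mh, step = 2000, 50, 2, 0.5
phis = (np.arange(N_phi + 1) / N_phi) ** 2        # tempering schedule, phi_0 = 0, ..., phi_Nphi = 1

theta = rng.uniform(-10.0, 10.0, N)               # initialization: draws from the prior
W = np.ones(N)

for n in range(1, N_phi + 1):
    # Correction: incremental weights [p(Y|theta)]^(phi_n - phi_{n-1})
    w_tilde = np.exp((phis[n] - phis[n - 1]) * log_lik(theta))
    W = w_tilde * W
    W = W / W.mean()

    # Selection: multinomial resampling, then reset the weights to one
    idx = rng.choice(N, size=N, p=W / W.sum())
    theta, W = theta[idx], np.ones(N)

    # Mutation: a few random-walk Metropolis-Hastings steps targeting pi_n
    log_target = lambda th: phis[n] * log_lik(th) + log_prior(th)
    for _ in range(N_mh):
        prop = theta + step * rng.standard_normal(N)
        accept = np.log(rng.uniform(size=N)) < log_target(prop) - log_target(theta)
        theta = np.where(accept, prop, theta)

print("posterior mean approx", np.average(theta, weights=W))
print("mass near each mode:", np.mean(theta < 0), np.mean(theta > 0))
```

Because the particles start from the prior and are only gradually pulled toward the posterior, both modes stay populated, which is the property that makes SMC attractive for multimodal posteriors like the one in the previous example.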

4. Remarks
• Correction step: reweight the particles from iteration n − 1 to create an importance sampling approximation of E_{π_n}[h(θ)].
• Selection step: the resampling of the particles
  • (good) equalizes the particle weights and thereby increases the accuracy of subsequent importance sampling approximations;
  • (not good) adds a bit of noise to the MC approximation.
• Mutation step:
  • adapts the particles to the posterior π_n(θ);
  • imagine we don’t do it: then we would be using draws from the prior p(θ) to approximate the posterior π(θ), which can’t be good!

[Figure: the sequence of tempered posterior densities of θ1 across stages n = 0, ..., 50.]


5. Working with Multiple Models
• Assign prior probabilities γ_{j,0} to models M_j, j = 1, ..., J.
• Posterior model probabilities are given by
  γ_{j,T} = γ_{j,0} p(Y|M_j) / Σ_{j=1}^{J} γ_{j,0} p(Y|M_j),   where p(Y|M_j) = ∫ p(Y|θ(j), M_j) p(θ(j)|M_j) dθ(j).
• Log marginal data densities are sums of one-step-ahead predictive scores:
  ln p(Y|M_j) = Σ_{t=1}^{T} ln ∫ p(y_t|θ(j), Y_{1:t−1}, M_j) p(θ(j)|Y_{1:t−1}, M_j) dθ(j).
• Bayesian model averaging (a small helper for the model probabilities follows below):
  p(h|Y) = Σ_{j=1}^{J} γ_{j,T} p(h_j(θ(j))|Y, M_j).
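Computing γ_{j,T} from log marginal data densities is a one-liner once it is done on the log scale; a small helper with placeholder numbers:

```python
import numpy as np

def posterior_model_probs(log_mdd, prior_probs=None):
    """gamma_{j,T} proportional to gamma_{j,0} * p(Y|M_j), from log marginal data densities."""
    log_mdd = np.asarray(log_mdd, dtype=float)
    if prior_probs is None:
        prior_probs = np.full(log_mdd.shape, 1.0 / log_mdd.size)
    log_post = np.log(prior_probs) + log_mdd
    log_post -= log_post.max()                  # log-sum-exp stabilization
    post = np.exp(log_post)
    return post / post.sum()

# Example: two models whose log MDDs differ by 3 log points
print(posterior_model_probs([-1312.4, -1315.4]))   # approx [0.95, 0.05]
```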


5. Working with Multiple Models

• Application: DSGE model with and without financial frictions.
• Food for thought:
  • Bayesian model averaging essentially assumes that the model space is complete. Is it?
  • Time-varying model weights can be a stand-in for nonlinear macroeconomic dynamics.

Reference: Del Negro, Hasegawa, and Schorfheide (2016, JoE)


5. A Stylized Framework
• Consider a principal-agent setting in order to separate the task of estimating models from the task of combining them.
• Agents M_m = econometric modelers:
  • provide the principal with predictive densities p(y_{t+1}|I_t^m, M_m);
  • are rewarded based on the realized value of ln p(y_{t+1}|I_t^m, M_m) (induces truth-telling).
• I_t^m is the model-specific information set.
• Principal P = policy maker who aggregates the information obtained from the modelers:
  p(y_{t+1}|λ, I_t^P, P) = λ p(y_{t+1}|I_t^1, M_1) + (1 − λ) p(y_{t+1}|I_t^2, M_2),
  where I_t^P = { y_{1:t}, {p(y_τ|I_{τ−1}^m, M_m)}_{τ=1}^{t} for m = 1, 2 }.


5. Bayesian Model Averaging (BMA): λ ∈ {1, 0}
• At any time T the policy maker can use the predictive densities to form marginal likelihoods:
  p(Y_{1:T}|M_i) = ∏_{t=1}^{T} p(y_t|Y_{1:t−1}, M_i)
• ... and use them to update the model probabilities:
  λ_T^BMA = P[λ = 1|Y_{1:T}] = λ_0^BMA p(Y_{1:T}|M_1) / [ λ_0^BMA p(Y_{1:T}|M_1) + (1 − λ_0^BMA) p(Y_{1:T}|M_2) ],
  which is P[M_1 is correct].
• Predictive density:
  p^BMA(y_{t+1}|I_t^P, P) = λ_t^BMA p(y_{t+1}|Y_{1:t}, M_1) + (1 − λ_t^BMA) p(y_{t+1}|Y_{1:t}, M_2).


5. BMA and Model Misspecification
• BMA is based on the assumption that the model space contains the ‘true’ model (“complete model space”):
  p(y_{1:T}|λ, P) = p(y_{1:T}|M_1) = ∏_{t=1}^{T} p(y_t|Y_{1:t−1}, M_1)   if λ = 1,
  p(y_{1:T}|λ, P) = p(y_{1:T}|M_2) = ∏_{t=1}^{T} p(y_t|Y_{1:t−1}, M_2)   if λ = 0.

[Figure: the data-generating process DGP = p(Y_{1:T}) and the two models p(Y_{1:T}|M_1), p(Y_{1:T}|M_2), with the KL discrepancy between the DGP and each model.]

• λ_T^BMA → 1 or 0 almost surely as T → ∞ (Dawid 1984, others): asymptotically, no model averaging! All the weight is on the model closest in KL discrepancy.

5. Optimal (Static) Pools: λ ∈ [0, 1]
• A policy maker concerned about misspecification of the M_i could create convex combinations of the predictive densities:
  p(Y_{1:T}|λ, P) = ∏_{t=1}^{T} [ λ p(y_t|Y_{1:t−1}, M_1) + (1 − λ) p(y_t|Y_{1:t−1}, M_2) ].

[Figure: the DGP p(Y_{1:T}) together with the two models p(Y_{1:T}|M_1) and p(Y_{1:T}|M_2).]

• λ_T^SP = argmax_{λ∈[0,1]} p(y_{1:T}|λ, P) generally does not converge to 1 or 0 (unless one of the models is correct): it exploits gains from diversification (see the sketch below).
References: Hall and Mitchell (2007); Geweke and Amisano (2011)
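Finding λ^SP is a one-dimensional optimization of the log predictive score. In the sketch below, p1 and p2 stand in for the two models’ one-step-ahead predictive density values evaluated at the realized data (here just simulated placeholders):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
T = 120
# Placeholder predictive density values p(y_t | Y_{1:t-1}, M_i); in practice these
# come from the estimated models.
p1 = np.exp(rng.normal(-1.3, 0.3, T))
p2 = np.exp(rng.normal(-1.4, 0.3, T))

def neg_log_score(lam):
    return -np.sum(np.log(lam * p1 + (1.0 - lam) * p2))

res = minimize_scalar(neg_log_score, bounds=(0.0, 1.0), method="bounded")
print(f"static pool weight lambda_SP = {res.x:.3f}")
```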



5. Dynamic Pools – Prior for Weights λ_{1:T}
• Dynamic pool: replace λ by a sequence λ_t.
• Likelihood function:
  p(y_{1:T}|λ_{1:T}, P) = ∏_{t=1}^{T} [ λ_t p(y_t|y_{1:t−1}, M_1) + (1 − λ_t) p(y_t|y_{1:t−1}, M_2) ].
• Prior p(λ_{1:T}|ρ) for the sequence λ_{1:T}:
  x_t = ρ x_{t−1} + √(1 − ρ²) ε_t,   ε_t ∼ iid N(0, 1),   x_0 ∼ N(0, 1),   λ_t = Φ(x_t),
  where Φ(·) is the Gaussian CDF.
• Unconditionally, λ_t ∼ U[0, 1] for all t.
• The hyperparameter ρ controls the amount of “smoothing.”
• As ρ → 1: dynamic pool → static pool.
• Specify a prior distribution for ρ (and other hyperparameters) and base our results on the (real-time) posterior distribution.

5. Dynamic Pools – Nonlinear State-Space System
• Measurement equation:
  p(y_t|λ_t, P) = λ_t p(y_t|y_{1:t−1}, M_1) + (1 − λ_t) p(y_t|y_{1:t−1}, M_2)
• Transition equation:
  λ_t = Φ(x_t),   x_t = ρ x_{t−1} + √(1 − ρ²) ε_t,   ε_t ∼ iid N(0, 1)
• Use a particle filter to construct the sequence p(λ_t|ρ, I_t^P, P) (see the sketch below).
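A bootstrap particle filter for this system takes only a few lines once the models’ predictive density values p(y_t|y_{1:t−1}, M_i) are available; below they are placeholder arrays, and the sketch covers only the filtering step for a fixed ρ, not the full real-time estimation with a prior on ρ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
T, N, rho = 100, 5000, 0.9

# Placeholder predictive density values for models M1 and M2
p1 = np.exp(rng.normal(-1.2, 0.3, T))
p2 = np.exp(rng.normal(-1.3, 0.3, T))

x = rng.standard_normal(N)                      # x_0 ~ N(0, 1)
lambda_mean = np.empty(T)

for t in range(T):
    # Transition: x_t = rho*x_{t-1} + sqrt(1-rho^2)*eps_t, lambda_t = Phi(x_t)
    x = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(N)
    lam = norm.cdf(x)

    # Measurement: incremental particle weight p(y_t | lambda_t, P)
    w = lam * p1[t] + (1.0 - lam) * p2[t]
    w /= w.sum()
    lambda_mean[t] = np.sum(w * lam)            # filtered mean E[lambda_t | I_t]

    # Resample the particles (multinomial)
    idx = rng.choice(N, size=N, p=w)
    x = x[idx]

print("filtered pool weights, last 5 periods:", np.round(lambda_mean[-5:], 3))
```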


5. Application

• Two models: Smets-Wouters and Smets-Wouters with financial frictions.
• Track relative performance over time and construct real-time weights.


5. Log Scores Comparison: SWFF vs SWπ

[Figure: log predictive scores based on p(ȳ_{t+h,h}|I_t^{m+}, M_m) for the two models.]

5. Dynamic Pools – Posterior p^DP(λ_t^{(h)}|I_t^P, P)
ρ ∼ U[0, 1], µ = 0, σ = 1

To Recap...

• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
1. Model Development
2. Identification
3. Priors
4. Computations
5. Working with Multiple Models

