Bayesian Estimation & Model Evaluation
Frank Schorfheide, University of Pennsylvania
MFM Summer Camp
June 12, 2016
Why Bayesian Inference?
• Why not? Bayes' theorem delivers the posterior

  p(θ|Y) = p(Y|θ) p(θ) / ∫ p(Y|θ) p(θ) dθ.

• Treat uncertainty with respect to shocks, latent states, parameters, and model specifications symmetrically.
• Condition inference on what you know (the data Y) instead of what you don't know (the parameter θ).
• Make optimal decisions conditional on observed data.
Excuses and Overview
• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
  1. Model Development
  2. Identification
  3. Priors
  4. Computations
  5. Working with Multiple Models
1. Model Development
• Bayesian estimation can take a lot of time... so don't waste it on bad models!
• Suppose you have an elaborate macro-finance DSGE model...
• Applied theorists get credit for plugging parameter values into the model and solving/simulating it.
• You can easily get extra credit by:
  • specifying a prior distribution p(θ);
  • generating draws θ^i, i = 1, ..., N, from the prior;
  • simulating trajectories Ỹ^i (conditional on θ^i), i = 1, ..., N;
  • computing sample statistics S(Ỹ^i);
  • comparing the distribution of simulated sample statistics with the observed sample statistic S(Y);
  • calling it a prior predictive check. (A code sketch follows below.)
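A minimal sketch of these steps, with a hypothetical AR(1) standing in for the DSGE model and the first-order autocorrelation as the sample statistic S(·):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_model(theta, T=200):
    """Simulate a trajectory from the model given theta.
    Stand-in model: an AR(1), y_t = theta * y_{t-1} + eps_t."""
    y = np.zeros(T)
    eps = rng.standard_normal(T)
    for t in range(1, T):
        y[t] = theta * y[t - 1] + eps[t]
    return y

def stat(y):
    """Sample statistic S(Y): first-order autocorrelation."""
    return np.corrcoef(y[:-1], y[1:])[0, 1]

# 1. draw theta^i from the prior, here Uniform(0, 0.95)
N = 1000
theta_draws = rng.uniform(0.0, 0.95, size=N)

# 2.-3. simulate Y~^i conditional on theta^i and compute S(Y~^i)
S_sim = np.array([stat(simulate_model(th)) for th in theta_draws])

# 4. compare with the observed statistic S(Y)
S_obs = 0.7  # placeholder for the statistic computed from actual data
pct = np.mean(S_sim <= S_obs)
print(f"S(Y) = {S_obs:.2f} sits at the {100 * pct:.1f}th percentile "
      "of the prior predictive distribution")
```

If S(Y) lands far in the tails of the prior predictive distribution, the model (or the prior) is in trouble before estimation even starts.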
1. Predictive Checks – An Example
Reference: Chang, Doh, and Schorfheide (2007, JMCB)
2. Identification
• We are trying to learn the parameters θ from the data.
• Formal definitions... e.g., the model is identified at θ0 if

  p(Y|θ) = p(Y|θ0) for all Y implies that θ = θ0.

• Without identification or with weak identification:
  • use more/different data to achieve identification;
  • use identification-robust inference procedures.
• Lack of identification does not raise conceptual issues for Bayesian inference (as long as priors are proper), but possibly computational challenges.

Reference: Fernandez-Villaverde, Rubio-Ramirez, Schorfheide (2016, HB of Macro Chapter)
2. (Lack of) Identification – An Analytical Example
• Let φ be an identifiable reduced-form parameter.
• Let θ be a structural parameter of interest, restricted by

  φ ≤ θ and θ ≤ φ + 1.

• Parameter θ is set-identified. The interval Θ(φ) = [φ, φ + 1] is called the identified set.
• This problem shows up prominently in VARs identified with sign restrictions.
References: Moon and Schorfheide (2012, Econometrica); Schorfheide (2016, Discussion of World Congress Lectures by Müller and Uhlig)
2. (Lack of) Identification – An Analytical Example
• Joint posterior of θ and φ:

  p(θ, φ|Y) = p(φ|Y) p(θ|φ, Y) ∝ p(Y|φ) p(θ|φ) p(φ).

• Because θ does not enter the likelihood function, we deduce that

  p(φ|Y) = p(Y|φ) p(φ) / ∫ p(Y|φ) p(φ) dφ   and   p(θ|φ, Y) = p(θ|φ).

  No updating of beliefs about θ conditional on φ!
• Marginal posterior distribution of θ (with θ|φ ∼ U[φ, φ + 1], only φ ∈ [θ − 1, θ] contributes):

  p(θ|Y) = ∫_{θ−1}^{θ} p(φ|Y) p(θ|φ) dφ.

  Updating of the marginal posterior of θ! (A numerical sketch follows below.)
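Anticipating the parametrization on the next slide (φ|Y ∼ N(−0.5, V̄), θ|φ ∼ U[φ, φ + 1]), the marginal posterior has the closed form p(θ|Y) = F(θ) − F(θ − 1), where F is the posterior CDF of φ. A small numerical sketch:

```python
import numpy as np
from scipy.stats import norm

def posterior_theta(theta, mu=-0.5, V=0.25):
    """p(theta|Y) = integral_{theta-1}^{theta} p(phi|Y) dphi,
    with phi|Y ~ N(mu, V) and theta|phi ~ U[phi, phi+1]."""
    F = norm(loc=mu, scale=np.sqrt(V)).cdf
    return F(theta) - F(theta - 1.0)

grid = np.linspace(-2.0, 1.5, 8)
for V in (1 / 4, 1 / 20, 1 / 100):  # the three cases plotted on the next slide
    print(f"V = {V:.2f}:", np.round(posterior_theta(grid, V=V), 3))
```

As V̄ shrinks, the density converges to the uniform distribution on [−0.5, 0.5]: the data pin down φ but never θ within the identified set.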
2. An Analytical Example: Posterior p(θ|Y)

[Figure: posterior density of θ over the range θ ∈ [−2, 1.5]. Assume φ|Y ∼ N(−0.5, V̄) and θ|φ ∼ U[φ, φ + 1]; V̄ equals 1/4 (solid red), 1/20 (dashed blue), and 1/100 (dotted green). As V̄ shrinks, p(θ|Y) flattens toward the uniform density on [−0.5, 0.5].]
3. Prior Distributions
• Ideally: a probabilistic representation of our knowledge/beliefs before observing the sample Y.
• More realistically: the choice of prior as well as model is influenced by some observations. Try to keep this influence small, or adjust measures of uncertainty.
• Views about the role of priors:
  1. keep them "uninformative" (???) so that the posterior inherits the shape of the likelihood function;
  2. use them to regularize the likelihood function;
  3. incorporate information from sources other than Y.
3. Role of Priors – Example 1
• "Uninformative" priors? Consider a structural VAR

  y_t = Φ y_{t−1} + Σ_tr Ω ε_t,   u_t = Σ_tr Ω ε_t,   E[u_t u_t'] = Σ,

  where Σ_tr is the lower-triangular Cholesky factor of Σ and Ω is orthonormal.
• A uniform distribution on the orthonormal matrix Ω does not induce a uniform prior over the identified set for the IRF

  IRF(i, h) = Φ^h Σ_tr [Ω]_{·i} = Φ^h Σ_tr q,   where ‖q‖ = 1.

[Figure: the unit circle in (q1, q2)-space is mapped into objects of interest via F_q(Σ_tr) and F_θ(Σ_tr); the line Σ_tr,21 q1 + Σ_tr,22 q2 = 0 marks the boundary implied by a zero/sign restriction.]

A simulation sketch follows below.
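An illustrative simulation (with a hypothetical Σ_tr, not a calibration from the reference): draw Ω from the uniform (Haar) distribution via a QR decomposition and inspect the induced distribution of an impact response under a sign restriction; it is far from flat over the identified set.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma_tr = np.array([[1.0, 0.0],
                     [0.5, 2.0]])  # hypothetical Cholesky factor

def haar_q(n=2):
    """First column of a Haar-distributed orthogonal matrix,
    obtained from the QR decomposition of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q[:, 0] * np.sign(R[0, 0])  # sign fix for Haar measure

draws = np.array([Sigma_tr @ haar_q() for _ in range(100_000)])
# sign restriction: impact response of variable 1 is non-negative
kept = draws[draws[:, 0] >= 0.0]

# induced prior for the impact response of variable 2:
hist, edges = np.histogram(kept[:, 1], bins=10, density=True)
print("density over the identified set is far from flat:")
print(np.round(hist, 3))
```

The uniform prior on Ω is uniform over directions q, but any nonlinear mapping from q to an object of interest concentrates prior mass in some regions of the identified set.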
Reference: Schorfheide (2016, World Congress Discussion)
3. Role of Priors – Example 2a
• Consider the model

  y_t = θ1 x_{1,t} + θ1 θ2 x_{2,t} + u_t.

• No identification of θ2 if θ1 = 0.
• Models with multiplicative parameters generate likelihood functions that look like this...

[Figure: likelihood contours in the (θ1, θ2)-plane; a ridge opens up as θ1 → 0.]
3. Role of Priors – Example 2a
• The identification problem also distorts

  p(θ1 = 0|Y) ∝ ∫ p(Y|θ1 = 0, θ2) p(θ2) p(θ1 = 0) dθ2.

• Reparameterize: α1 = θ1, α2 = θ1 θ2.
• A prior p(α1, α2) ∝ c can regularize the problem.
• Jacobian of the mapping from θ to α:

  ∂α/∂θ' = [ 1  0 ; θ2  θ1 ],   so   |det(∂α/∂θ')| = |θ1|.

• The induced prior density p(θ1, θ2) ∝ |θ1| vanishes as θ1 approaches the point of non-identification.
• More generally: try to add information when the data are not particularly informative.
References: For cointegration model: Kleibergen and van Dijk (1994), Kleibergen and Paap (2002)
3. Role of Priors – Example 2b
• For instance, high-dimensional VARs

  Y = X Φ + U,   u_t ∼ N(0, Σ),

  with a low observation-to-parameter ratio.
• A hierarchical (conjugate) MNIW prior p(Φ, Σ|λ) adds information. Frequentist perspective: add some bias and reduce variance to improve MSE.
• How much? Data-driven choice of λ (empirical Bayes), as sketched below:

  λ̂ = argmax_λ ∫ p(Y|Φ, Σ) p(Φ, Σ|λ) d(Φ, Σ).
• Or specify a prior p(λ) and integrate out the hyperparameters.
• Alternative priors: LASSO, spike-and-slab, ...

References: Giannone, Lenza, and Primiceri (2014, REStat)
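A stylized sketch of the empirical-Bayes step, using a conjugate normal prior with shrinkage hyperparameter λ in a plain regression (a stand-in for the MNIW-VAR setup); the marginal data density is available in closed form, so λ̂ is a one-dimensional maximization:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical regression standing in for the VAR: low obs/parameter ratio
T, k = 40, 20
X = rng.standard_normal((T, k))
beta_true = np.concatenate([np.ones(3), np.zeros(k - 3)])
y = X @ beta_true + rng.standard_normal(T)

def log_marglik(lam, y, X):
    """log p(Y | lambda) under beta ~ N(0, I/lambda), u_t ~ N(0, 1):
    marginally y ~ N(0, X X'/lambda + I)."""
    V = X @ X.T / lam + np.eye(len(y))
    sign, logdet = np.linalg.slogdet(V)
    return -0.5 * (logdet + y @ np.linalg.solve(V, y)
                   + len(y) * np.log(2 * np.pi))

lams = np.geomspace(0.01, 100, 200)
lam_hat = lams[np.argmax([log_marglik(l, y, X) for l in lams])]
print(f"empirical-Bayes shrinkage: lambda_hat = {lam_hat:.2f}")
```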
3. Role of Priors – Example 3
• Prior elicitation based on: pre-sample information; information from excluded data series; or micro (macro) level information when estimating a model on macro (micro) data.
• A cute example...
• Production function:

  Y_t = (A_t H_t)^α K_t^{1−α} (1 − ϕ (H_t/H_{t−1} − 1)²).

• Prior for the adjustment-cost parameter ϕ?
  • Firms can either search for workers, incurring adjustment costs ϕ (ΔH/H)² Y, or pay head hunters for finding workers.
  • The head hunters' service fee is ζ W ΔH.
  • Head hunters tend to charge about ζ = 1/3 to 2/3 of the quarterly earnings of a worker.
  • Recruiting costs should be approximately the same: ϕ (ΔH/H)² Y = ζ W ΔH.
  • With a labor share of 2/3 (= WH/Y) and a one percent increase of employment, ΔH/H = 1%, we obtain a range of 22 to 44 for ϕ. (The arithmetic is spelled out below.)
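Spelling out the arithmetic: equating the two recruiting costs and dividing by Y gives

  ϕ (ΔH/H)² = ζ (W ΔH)/Y = ζ (WH/Y)(ΔH/H)   ⟹   ϕ = ζ (WH/Y) / (ΔH/H).

With WH/Y = 2/3 and ΔH/H = 0.01, ζ = 1/3 yields ϕ ≈ 22 and ζ = 2/3 yields ϕ ≈ 44.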
Reference: Chang, Doh, and Schorfheide (2007, JMCB)
4. Computations
• Practical work utilizes algorithms to generate draws θ^i, i = 1, ..., N, from the posterior p(θ|Y).
• Post-process draws by converting them into the object of interest h^i = h(θ^i) to characterize p(h(θ)|Y) ⟹ inference and decision making under uncertainty.
• Important algorithms:
  • importance sampling;
  • Markov chain Monte Carlo (MCMC) algorithms, e.g., Metropolis-Hastings samplers or Gibbs samplers.
• More recently: widespread access to parallel computation environments.
• Sequential Monte Carlo (SMC) techniques provide an interesting alternative.

Reference: Herbst and Schorfheide (2015, Princeton University Press)
4. Importance Sampling
• Target posterior π(θ) ∝ f(θ).
• Use the identity ∫ h(θ) f(θ) dθ = ∫ h(θ) [f(θ)/g(θ)] g(θ) dθ.
• The θ^i's are draws from g(·).
• Approximation:

  E_π[h] ≈ [ (1/N) Σ_{i=1}^N h(θ^i) w(θ^i) ] / [ (1/N) Σ_{i=1}^N w(θ^i) ],   w(θ) = f(θ)/g(θ).

[Figure: target density f and two proposal densities g1, g2, together with the resulting weight functions f/g1 and f/g2; a poorly matched proposal produces highly uneven weights.]

A runnable sketch follows below.
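A minimal runnable version, with a t-density as a stand-in for the (unnormalized) posterior kernel f:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# target kernel f: a t(3) density as a stand-in for the posterior
f = lambda th: stats.t(df=3).pdf(th)

# proposal g: easy to draw from, with tails fat enough to cover f
g = stats.norm(loc=0.0, scale=2.0)

N = 50_000
theta = g.rvs(N, random_state=rng)
w = f(theta) / g.pdf(theta)            # importance weights w(theta) = f/g

h = lambda th: th**2                   # object of interest h(theta)
est = np.sum(h(theta) * w) / np.sum(w)
ess = w.sum()**2 / (w**2).sum()        # effective sample size: weight unevenness
print(f"E[h] approx {est:.3f} (true value 3.0), ESS = {ess:,.0f} of {N:,}")
```

A low effective sample size signals that a few draws dominate the weights, which is exactly the failure mode the figure illustrates.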
4. A Challenging Posterior
• Consider the state-space model:

  y_t = [1 1] s_t,
  s_t = [ θ1²                  0
          (1 − θ1²) − θ1 θ2    1 − θ1² ] s_{t−1} + [1 0]' ε_t.

• Shocks: ε_t ∼ iid N(0, 1); uniform prior.
• Simulate T = 200 observations given θ = [0.45, 0.45]', which is observationally equivalent to θ = [0.89, 0.22]'.

[Figure: bimodal posterior contours in the (θ1, θ2)-plane over [0, 1]², with one mode near each of the two observationally equivalent parameter values.]

A sketch verifying the equivalence with a Kalman filter follows below.
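A sketch that verifies the observational equivalence: simulate data at θ = (0.45, 0.45) and evaluate the Gaussian likelihood with a Kalman filter initialized at the stationary distribution; the exact counterpart of θ is (√(1 − 0.45²), 0.45²/√(1 − 0.45²)) ≈ (0.89, 0.22).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

rng = np.random.default_rng(0)

def ss_matrices(theta):
    """y_t = Z s_t, s_t = Phi s_{t-1} + R eps_t, eps_t ~ N(0,1)."""
    t1, t2 = theta
    Phi = np.array([[t1**2, 0.0],
                    [(1 - t1**2) - t1 * t2, 1 - t1**2]])
    return Phi, np.array([1.0, 0.0]), np.array([1.0, 1.0])

def kalman_loglik(y, theta):
    Phi, R, Z = ss_matrices(theta)
    s = np.zeros(2)
    P = solve_discrete_lyapunov(Phi, np.outer(R, R))  # stationary init
    ll = 0.0
    for yt in y:
        s, P = Phi @ s, Phi @ P @ Phi.T + np.outer(R, R)  # predict
        v = yt - Z @ s                                    # forecast error
        F = Z @ P @ Z                                     # its variance
        ll += -0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        K = P @ Z / F                                     # Kalman gain
        s, P = s + K * v, P - np.outer(K, Z @ P)          # update
    return ll

# simulate T = 200 observations at theta = (0.45, 0.45)
Phi, R, Z = ss_matrices((0.45, 0.45))
s, y = np.zeros(2), np.empty(200)
for t in range(200):
    s = Phi @ s + R * rng.standard_normal()
    y[t] = Z @ s

t1b = np.sqrt(1 - 0.45**2)
print(kalman_loglik(y, (0.45, 0.45)))         # the two values coincide
print(kalman_loglik(y, (t1b, 0.45**2 / t1b))) # (up to floating-point error)
```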
4. From Importance to Sequential Importance Sampling

[Figure: sequence of tempered posteriors of θ1, fanning out from the prior at n = 0 to the bimodal posterior at n = Nφ = 50.]

• Likelihood tempering:

  π_n(θ) = f_n(θ)/Z_n = [p(Y|θ)]^{φ_n} p(θ) / ∫ [p(Y|θ)]^{φ_n} p(θ) dθ,   φ_n = (n/Nφ)^λ.
4. SMC Algorithm: A Graphical Illustration

[Figure: particle swarms transported from the prior (φ0) toward the posterior (φ3) through repeated C-S-M cycles.]

• π_n(θ) is represented by a swarm of particles {θ_n^i, W_n^i}_{i=1}^N:

  h̄_{n,N} = (1/N) Σ_{i=1}^N W_n^i h(θ_n^i)  →a.s.  E_{π_n}[h(θ_n)].

• C is Correction; S is Selection; and M is Mutation.
4. SMC Algorithm
1. Initialization (φ0 = 0). Draw the initial particles from the prior:

   θ_1^i ∼iid p(θ) and W_1^i = 1, i = 1, ..., N.

2. Recursion. For n = 1, ..., Nφ:
   1. Correction. Reweight the particles from stage n − 1 by defining the incremental weights

      w̃_n^i = [p(Y|θ_{n−1}^i)]^{φ_n − φ_{n−1}}    (1)

      and the normalized weights

      W̃_n^i = w̃_n^i W_{n−1}^i / [ (1/N) Σ_{i=1}^N w̃_n^i W_{n−1}^i ],   i = 1, ..., N.    (2)

      An approximation of E_{π_n}[h(θ)] is given by

      h̃_{n,N} = (1/N) Σ_{i=1}^N W̃_n^i h(θ_{n−1}^i).    (3)

   2. Selection. (continued on the next slide)
4. SMC Algorithm
1. Initialization.
2. Recursion. For n = 1, ..., Nφ:
   1. Correction.
   2. Selection. (Optional resampling.) Let {θ̂^i}_{i=1}^N denote N iid draws from a multinomial distribution characterized by support points and weights {θ_{n−1}^i, W̃_n^i}_{i=1}^N, and set W_n^i = 1. An approximation of E_{π_n}[h(θ)] is given by

      ĥ_{n,N} = (1/N) Σ_{i=1}^N W_n^i h(θ̂_n^i).    (4)

   3. Mutation. Propagate the particles {θ̂_n^i, W_n^i} via N_MH steps of an MH algorithm with transition density θ_n^i ∼ K_n(θ_n|θ̂_n^i; ζ_n) and stationary distribution π_n(θ). An approximation of E_{π_n}[h(θ)] is given by

      h̄_{n,N} = (1/N) Σ_{i=1}^N h(θ_n^i) W_n^i.    (5)

(A compact end-to-end sketch follows below.)
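A compact sketch of the full correction-selection-mutation cycle on a stand-in bimodal posterior (for simplicity, weights here are normalized to sum to one rather than to average one as in the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in log-likelihood with two well-separated modes
def loglik(th):
    return np.logaddexp(-0.5 * ((th - 2.0) / 0.3) ** 2,
                        -0.5 * ((th + 2.0) / 0.3) ** 2)

log_prior = lambda th: -0.5 * (th / 3.0) ** 2       # N(0, 3^2) kernel

N, Nphi, lam, n_mh = 2000, 50, 2.0, 2
phis = (np.arange(Nphi + 1) / Nphi) ** lam          # tempering schedule

theta = rng.normal(0.0, 3.0, N)                     # init: draws from prior
W = np.full(N, 1.0 / N)

for n in range(1, Nphi + 1):
    # Correction: reweight by the incremental likelihood
    incr = (phis[n] - phis[n - 1]) * loglik(theta)
    W = W * np.exp(incr - incr.max())
    W /= W.sum()

    # Selection: resample when the effective sample size is low
    if 1.0 / np.sum(W**2) < N / 2:
        theta = rng.choice(theta, size=N, p=W)
        W = np.full(N, 1.0 / N)

    # Mutation: a few random-walk MH steps targeting pi_n
    scale = 0.5 * theta.std() + 1e-8
    for _ in range(n_mh):
        prop = theta + scale * rng.standard_normal(N)
        log_alpha = (phis[n] * loglik(prop) + log_prior(prop)
                     - phis[n] * loglik(theta) - log_prior(theta))
        accept = np.log(rng.uniform(size=N)) < log_alpha
        theta = np.where(accept, prop, theta)

print("posterior mean approx", np.sum(W * theta))   # near 0 by symmetry
print("mass near +2:", np.sum(W[theta > 0]))        # near 0.5: both modes kept
```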
4. Remarks
• Correction step: reweight particles from iteration n − 1 to create an importance sampling approximation of E_{π_n}[h(θ)].
• Selection step: the resampling of the particles
  • (good) equalizes the particle weights and thereby increases the accuracy of subsequent importance sampling approximations;
  • (not good) adds a bit of noise to the MC approximation.
• Mutation step:
  • adapts particles to the posterior π_n(θ);
  • imagine we don't do it: then we would be using draws from the prior p(θ) to approximate the posterior π(θ), which can't be good!

[Figure: tempered posteriors of θ1 again, from the prior at n = 0 to the bimodal posterior at n = 50.]
5. Working with Multiple Models
• Assign prior probabilities γ_{j,0} to models M_j, j = 1, ..., J.
• Posterior model probabilities are given by

  γ_{j,T} = γ_{j,0} p(Y|M_j) / Σ_{j=1}^J γ_{j,0} p(Y|M_j),   where   p(Y|M_j) = ∫ p(Y|θ_(j), M_j) p(θ_(j)|M_j) dθ_(j).

• Log marginal data densities are sums of one-step-ahead predictive scores:

  ln p(Y|M_j) = Σ_{t=1}^T ln ∫ p(y_t|θ_(j), Y_{1:t−1}, M_j) p(θ_(j)|Y_{1:t−1}, M_j) dθ_(j).

• Bayesian model averaging:

  p(h|Y) = Σ_{j=1}^J γ_{j,T} p(h_j(θ_(j))|Y, M_j).

A computational sketch for the model probabilities follows below.
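A small utility for turning log marginal data densities into posterior model probabilities; the log-sum-exp trick guards against underflow, and the numbers are hypothetical:

```python
import numpy as np

def model_probs(log_ml, log_prior_probs):
    """Posterior model probabilities gamma_{j,T} from log marginal
    data densities ln p(Y|M_j) and log prior probabilities."""
    z = np.asarray(log_ml) + np.asarray(log_prior_probs)
    z -= z.max()          # stabilize before exponentiating
    w = np.exp(z)
    return w / w.sum()

# hypothetical log marginal data densities for J = 3 models
print(model_probs([-1402.3, -1399.8, -1410.1], np.log([1/3, 1/3, 1/3])))
```

Note how a difference of a few log points already concentrates almost all posterior probability on one model.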
5. Working with Multiple Models
• Application: DSGE model with and without financial frictions.
• Food for thought:
  • Bayesian model averaging essentially assumes that the model space is complete. Is it?
  • Time-varying model weights can be a stand-in for nonlinear macroeconomic dynamics.
Reference: Del Negro, Hasegawa, and Schorfheide (2016, JoE)
5. A Stylized Framework
• Consider a principal-agent setting designed to separate the task of estimating models from the task of combining them.
• Agents M_m = econometric modelers:
  • provide the principal with predictive densities p(y_{t+1}|I_t^m, M_m);
  • are rewarded based on the realized value of ln p(y_{t+1}|I_t^m, M_m) (induces truth-telling);
  • I_t^m is the model-specific information set.
• Principal P = policy maker, who aggregates the information obtained from the modelers:

  p(y_{t+1}|λ, I_t^P, P) = λ p(y_{t+1}|I_t^1, M_1) + (1 − λ) p(y_{t+1}|I_t^2, M_2),

  where I_t^P = {y_{1:t}, {p(y_τ|I_{τ−1}^m, M_m)}_{τ=1}^t for m = 1, 2}.
5. Bayesian Model Averaging (BMA): λ ∈ {1, 0}
• At any time T the policy maker can use the predictive densities to form marginal likelihoods:

  p(Y_{1:T}|M_i) = Π_{t=1}^T p(y_t|Y_{1:t−1}, M_i)

• ... and use them to update model probabilities:

  λ_T^BMA = P[λ = 1|Y_{1:T}] = λ_0^BMA p(Y_{1:T}|M_1) / [ λ_0^BMA p(Y_{1:T}|M_1) + (1 − λ_0^BMA) p(Y_{1:T}|M_2) ],

  where λ_T^BMA is the probability that M_1 is correct.
• Predictive density:

  p_BMA(y_{t+1}|I_t^P, P) = λ_t^BMA p(y_{t+1}|Y_{1:t}, M_1) + (1 − λ_t^BMA) p(y_{t+1}|Y_{1:t}, M_2).

A recursive implementation sketch follows below.
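A recursive implementation sketch with hypothetical one-step-ahead log predictive densities for the two models:

```python
import numpy as np

def bma_weights(logpred1, logpred2, lam0=0.5):
    """Recursive BMA weight lambda_t^BMA on M1, updated each period
    with the realized one-step-ahead predictive densities."""
    lam, path = lam0, []
    for l1, l2 in zip(logpred1, logpred2):
        num = lam * np.exp(l1)
        lam = num / (num + (1 - lam) * np.exp(l2))
        path.append(lam)
    return np.array(path)

# hypothetical predictive scores: M1 better early on, M2 better later
rng = np.random.default_rng(0)
l1 = rng.normal(-1.0, 0.3, 100); l1[50:] -= 0.4
l2 = rng.normal(-1.2, 0.3, 100); l2[50:] += 0.4
print(bma_weights(l1, l2)[[0, 49, 99]].round(3))
```

The recursion reproduces the product form above: each period multiplies in the latest predictive density and renormalizes.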
5. BMA and Model Misspecification
• BMA is based on the assumption that the model space contains the 'true' model ("complete model space"):

  p(y_{1:T}|λ, P) = p(y_{1:T}|M_1) = Π_{t=1}^T p(y_t|Y_{1:t−1}, M_1)   if λ = 1,
  p(y_{1:T}|λ, P) = p(y_{1:T}|M_2) = Π_{t=1}^T p(y_t|Y_{1:t−1}, M_2)   if λ = 0.

[Figure: the DGP p(Y_{1:T}) and the two models p(Y_{1:T}|M_1) and p(Y_{1:T}|M_2), with arrows indicating their KL discrepancies from the DGP.]

• λ_T^BMA →a.s. 1 or 0 as T → ∞ (Dawid 1984, others): asymptotically, no model averaging! All the weight is on the model closest to the DGP in KL discrepancy.
5. Optimal (Static) Pools: λ ∈ [0, 1]
• A policy maker concerned about misspecification of the M_i could create convex combinations of the predictive densities:

  p(Y_{1:T}|λ, P) = Π_{t=1}^T [ λ p(y_t|Y_{1:t−1}, M_1) + (1 − λ) p(y_t|Y_{1:t−1}, M_2) ].

[Figure: the DGP p(Y_{1:T}), with the pooled model spanning the segment between p(Y_{1:T}|M_1) and p(Y_{1:T}|M_2).]

• λ_T^SP = argmax_{λ∈[0,1]} p(y_{1:T}|λ, P) generally does not converge to 1 or 0 (unless one of the models is correct): it exploits gains from diversification. (A grid-search sketch follows below.)
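A grid-search sketch for λ^SP, reusing the same style of hypothetical predictive scores:

```python
import numpy as np

def static_pool_weight(logpred1, logpred2, grid=None):
    """lambda^SP = argmax_lambda sum_t log[lam p_1t + (1-lam) p_2t]."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 201)
    p1, p2 = np.exp(logpred1), np.exp(logpred2)
    score = [np.sum(np.log(l * p1 + (1 - l) * p2)) for l in grid]
    return grid[int(np.argmax(score))]

# hypothetical scores: M1 better early on, M2 better later
rng = np.random.default_rng(0)
l1 = rng.normal(-1.0, 0.3, 100); l1[50:] -= 0.4
l2 = rng.normal(-1.2, 0.3, 100); l2[50:] += 0.4
print("lambda_SP =", static_pool_weight(l1, l2))
```

With neither model dominating throughout the sample, the optimizer settles on an interior weight: the diversification result stated above.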
References: Hall and Mitchell (2007), Geweke and Amisano (2011)
5. Dynamic Pools – Prior for Weights λ_{1:T}
• Dynamic pool: replace λ by a sequence λ_t.
• Likelihood function:

  p(y_{1:T}|λ_{1:T}, P) = Π_{t=1}^T [ λ_t p(y_t|y_{1:t−1}, M_1) + (1 − λ_t) p(y_t|y_{1:t−1}, M_2) ].

• Prior p(λ_{1:T}|ρ) for the sequence λ_{1:T}:

  x_t = ρ x_{t−1} + √(1 − ρ²) ε_t,   ε_t ∼ iid N(0, 1),   x_0 ∼ N(0, 1),   λ_t = Φ(x_t),

  where Φ(·) is the Gaussian CDF.
• Unconditionally, λ_t ∼ U[0, 1] for all t.
• Hyperparameter ρ controls the amount of "smoothing."
• As ρ → 1: dynamic pool → static pool.
• Specify a prior distribution for ρ (and other hyperparameters) and base our results on the (real-time) posterior distribution. (A simulation sketch of the prior follows below.)
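A quick simulation from this prior (illustrative): the AR(1) in x_t is stationary with unit variance, so each λ_t = Φ(x_t) is uniform marginally, while ρ governs persistence.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def lambda_path(rho, T=200):
    """Draw lambda_{1:T} from the prior: Gaussian AR(1) mapped
    through the normal CDF, so each lambda_t is U[0,1] marginally."""
    x = rng.standard_normal()                  # x_0 ~ N(0, 1)
    lam = np.empty(T)
    for t in range(T):
        x = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal()
        lam[t] = norm.cdf(x)
    return lam

print(lambda_path(0.99)[:5].round(2))  # rho near 1: smooth, static-pool-like
print(lambda_path(0.10)[:5].round(2))  # rho near 0: erratic weights
```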
5. Dynamic Pools – Nonlinear State-Space System
• Measurement equation:

  p(y_t|λ_t, P) = λ_t p(y_t|y_{1:t−1}, M_1) + (1 − λ_t) p(y_t|y_{1:t−1}, M_2).

• Transition equation:

  λ_t = Φ(x_t),   x_t = ρ x_{t−1} + √(1 − ρ²) ε_t,   ε_t ∼ iid N(0, 1).

• Use a particle filter to construct the sequence p(λ_t|ρ, I_t^P, P), as sketched below.
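A bootstrap particle filter sketch for the filtered pool weights, using hypothetical predictive scores for the two models (the measurement density of y_t is the pooled predictive density):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def dynamic_pool_filter(logpred1, logpred2, rho, M=5000):
    """Bootstrap particle filter for p(lambda_t | rho, I_t^P):
    particles track x_t, with lambda_t = Phi(x_t)."""
    x = rng.standard_normal(M)                 # x_0 ~ N(0, 1)
    means = []
    for l1, l2 in zip(logpred1, logpred2):
        # propagate particles through the transition equation
        x = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(M)
        lam = norm.cdf(x)
        # weight by the pooled predictive density of y_t
        w = lam * np.exp(l1) + (1 - lam) * np.exp(l2)
        w /= w.sum()
        means.append(np.sum(w * lam))          # filtered mean of lambda_t
        # resample to keep the particle system balanced
        x = rng.choice(x, size=M, p=w)
    return np.array(means)

# hypothetical predictive scores: M1 better early, M2 better later
l1 = rng.normal(-1.0, 0.3, 100); l1[50:] -= 0.4
l2 = rng.normal(-1.2, 0.3, 100); l2[50:] += 0.4
path = dynamic_pool_filter(l1, l2, rho=0.9)
print(path[[0, 49, 99]].round(2))  # filtered weight on M1 drifts down
```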
5. Application
• Two models: Smets-Wouters and Smets-Wouters with financial frictions.
• Track relative performance over time and construct real-time weights.
5. Log Scores Comparison: SWFF vs. SWπ

[Figure: h-step-ahead log predictive scores ln p(ȳ^{(h)}_{t+h,h}|I_t^{m+}, M_m) for the two models over time.]

5. Dynamic Pools – Posterior p_DP(λ_t|I_t^P, P)

[Figure: real-time posterior of λ_t, with hyperparameters ρ ∼ U[0, 1], µ = 0, σ = 1.]
To Recap...
• Too little time to provide a detailed survey of state-of-the-art Bayesian methods.
• Instead: an eclectic collection of ideas and insights related to:
  1. Model Development
  2. Identification
  3. Priors
  4. Computations
  5. Working with Multiple Models