David F. Hendry Economic Forecasting

Author: Damon Crawford
Economic Forecasting David F. Hendry Economics Department, Oxford University Carlos III University, Madrid June–July 2010

David F. Hendry

Economic Forecasting – p.1/169

Structure of lectures
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusions

Economic Forecasting – p.2/169

Introduction
Economic forecasting confronts a non-stationary, evolving world, where model and mechanism differ.
Given an objective and the available information, how ‘best’ to forecast a future outcome?
Forecasts instrumental to decisions, not an end in themselves: just good enough for purpose at hand.
Determines form of forecast, and how to evaluate it.
Discuss framework, basic concepts, main implications.

Economic Forecasting – p.3/169

Forecast failure
Poor historical track record of econometric systems: forecast failures, and out-performed by ‘naive devices’.
Problems date from the early history of econometrics.
Such an adverse outcome is surprising: econometrics uses inter-temporal causal information.
Explanation: many steps between predictability of a random variable at time t, and forecast of it from an estimated model at T.
We will unravel all of these steps.

Economic Forecasting – p.4/169

Forecasting
A forecast is any statement about the future.
Two basic methods of forecasting:
(1) a ‘crystal ball’ that can ‘see’ into the future;
(2) extrapolate from present information.
Unfortunately: “Never a crystal ball when you need one” (Robert J. Samuelson, Washington Post, 16 June 2001).
Working examples of (1) unavailable to humanity: forced to focus on dramatically inferior method (2).
Forecasting rule: systematic operational procedure for making statements about future events.

Economic Forecasting – p.5/169

Forecast uncertainty
Problem with forecasting is: future is uncertain.
Forecast uncertainty is intrinsic, but has two sources: one we know is present and whose probabilities we understand; and one due to factors we do not even know exist.
“Because of the things we don’t know [that] we don’t know, the future is largely unpredictable.” Maxine Singer, 1997, Thoughts of a Nonmillenarian, p. 39.
In tossing 2 dice, the two sources are:
the probability that any pair of numbers will be face up;
not knowing that the dice are loaded.
Second is the type of problem in the above quote by Singer.

Economic Forecasting – p.6/169

Unmeasured uncertainty
Unmeasured uncertainty important for the future.
Impossible to conceive of all possibilities: economic ‘earthquakes’ seem to occur all too often.
Once the unpredictable has occurred, economists can usually account for it, so explain the past.
History is not inevitable, but an improbable sequence where contingency played a large role.
New unpredictable events will intrude in the future: last 1000 years have been an exceptional period, and so will the present millennium.
Future will see more large, unanticipated shocks – viz. the recent collapses of the dotcom boom and commercial banks.

Economic Forecasting – p.7/169

Forecasting difficulties
Economic forecasters confront a difficult environment.
Many forecasting agencies maintain several models: different forecasts warn that some are at odds with reality.
[Figure: Fat-tail distribution versus a shift. Densities plotted over −20 to 20: t3, shifted mean, original distribution.]
Good guides sparse when future is not like past.
Distributions of events change over time: non-stationarity.

Economic Forecasting – p.8/169

Non-stationarity
Non-stationarity characteristic of economies from: technological progress, legislative changes, policy regime shifts, financial innovation, social mores altering.
Can markedly affect living standards, inflation, etc.
Two classes of non-stationarity:
a) regular persistent changes: stochastic trends;
b) sudden, unanticipated, large changes: structural breaks.
Location shifts are an important member of b): 1929 crash and Great Depression are classic example.
Problem manifest in almost every economic time series.

Economic Forecasting – p.9/169

Changes in P, M, GDP, Y, RL
[Figure: four time-series panels, 1850 to 2000: ∆P, ∆M, ∆Y, ∆RL.]

Economic Forecasting – p.10/169

∆p, ∆m, ∆y, ∆rL
[Figure: four time-series panels, 1850 to 2000, with 25-year means: ∆p, ∆m, ∆y, ∆rL.]

Economic Forecasting – p.11/169

Forecast failure
Significant deterioration relative to anticipated outcome.
Macro-economic forecast failure occurs regularly.
Forecast failure depends on forecast-period events: need not invalidate theory or model, nor be predictable from in-sample tests; neither avoided, nor induced, by congruence.

Analogy: rocket to moon predicted to land on 4th July, but is hit by meteor and knocked off course.
Forecast is systematically and badly wrong.
Outcome not due to poor engineering, nor bad forecasting models; and does not refute Newtonian gravitation theory.

Economic Forecasting – p.12/169

Theory of economic forecasting
Well developed if econometric model coincides with stationary DGP.
Consider $n \times 1$ vector $x_t \sim D_{x_t}(x_t \mid X_{t-1}, \theta)$ for $\theta \in \Theta \subseteq \mathbb{R}^k$, where $X_{t-1} = (\ldots x_1 \ldots x_{t-1})$.
Statistical forecast $\tilde{x}_{T+h|T} = f_h(X_T)$ for $T+h$ at $T$. How to select $f_h$?

Conditional expectation: $\hat{x}_{T+h|T} = E[x_{T+h} \mid X_T]$ is unbiased:
$$E[(x_{T+h} - \hat{x}_{T+h|T}) \mid X_T] = 0.$$
$\hat{x}_{T+h|T}$ has smallest mean-square forecast-error matrix:
$$M[\hat{x}_{T+h|T} \mid X_T] = E[(x_{T+h} - \hat{x}_{T+h|T})(x_{T+h} - \hat{x}_{T+h|T})' \mid X_T].$$

Economic Forecasting – p.13/169

Ten problems
First: problems learning $D_{X_T^1}(\cdot)$ and $\theta$:
(1) specification of the set of relevant variables $\{x_t\}$,
(2) measurement of the $x$s,
(3) formulation of $D_{X_T^1}(\cdot)$,
(4) modeling of the relationships,
(5) estimation of $\theta$, and
(6) properties of $D_{X_T^1}(\cdot)$ determine ‘intrinsic’ uncertainty,
all of which introduce in-sample uncertainties.
Next:
(7) properties of $D_{X_{T+H}^{T+1}}(\cdot)$ determine forecast uncertainty,
(8) which grows as $H$ increases,
(9) especially for integrated data,
(10) increased by changes in $D_{X_{T+H}^{T+1}}(\cdot)$ or $\theta$.
These 10 issues structure analysis of forecasting.

Economic Forecasting – p.14/169

Example
Stationary scalar first-order autoregressive example:
$$x_t = \rho x_{t-1} + v_t \quad \text{where } v_t \sim IN[0, \sigma_v^2] \text{ and } |\rho| < 1.$$
With $\rho$ known and constant, forecast from $x_T$ is:
$$\hat{x}_{T+1|T} = \rho x_T.$$
$D_{X_T^1}(\cdot)$ implies $D_{X_{T+1}^{T+1}}(\cdot)$, producing unbiased forecast:
$$E[(x_{T+1} - \hat{x}_{T+1|T}) \mid x_T] = E[(\rho - \rho) x_T + v_{T+1}] = 0,$$
with smallest possible variance determined by $D_{X_T^1}(\cdot)$:
$$V[x_{T+1} - \hat{x}_{T+1|T}] = \sigma_v^2.$$
Thus: $D_{X_{T+1}^{T+1}}(\cdot) = IN[\rho x_T, \sigma_v^2]$.
Issues (1)–(10) ‘assumed away’.

Economic Forecasting – p.15/169

Potential problems
(1) Specification incomplete if (e.g.) vector $x_t$ not scalar.
(2) Measurement incorrect if (e.g.) observe $\tilde{x}_t$ not $x_t$.
(3) Formulation inadequate if (e.g.) intercept needed.
(4) Modeling wrong if (e.g.) selected $\rho x_{t-2}$.
(5) Estimating $\rho$ adds bias, $(\rho - E[\hat{\rho}]) x_T$, and variance $V[\hat{\rho}] x_T^2$.
(6) Properties of $D(v_t) = IN[0, \sigma_v^2]$ determine $V[x_t]$.
(7) Assumed $v_{T+1} \sim IN[0, \sigma_v^2]$ but $V[v_{T+1}]$ could differ.
(8) Multi-step forecast error $\sum_{h=1}^{H} \rho^{H-h} v_{T+h}$ has $V = \frac{1 - \rho^{2H}}{1 - \rho^2} \sigma_v^2$.
(9) If $\rho = 1$ have trending forecast variance $H \sigma_v^2$.
(10) If $\rho$ changes could experience forecast failure.
Must be prepared for risks from (1)–(10).
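The variance formula in (8) and the unit-root case in (9) can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the function names are illustrative and not part of the lectures.

```python
import numpy as np

def multi_step_variance(rho: float, sigma2_v: float, H: int) -> float:
    """Variance of the H-step forecast error of an AR(1) with known rho."""
    if np.isclose(abs(rho), 1.0):
        # Unit-root case (9): variance trends linearly in the horizon.
        return H * sigma2_v
    return (1.0 - rho ** (2 * H)) / (1.0 - rho ** 2) * sigma2_v

def accumulated_variance(rho: float, sigma2_v: float, H: int) -> float:
    """Cross-check: sum the squared weights on the H future shocks directly."""
    weights = rho ** np.arange(H)          # rho^0, ..., rho^(H-1)
    return float(np.sum(weights ** 2) * sigma2_v)
```

Both functions should agree for any |ρ| < 1, and the variance approaches σ²_v/(1 − ρ²) as H grows.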


Economic Forecasting – p.16/169

Estimated AR1: ρ = 0.8, T = 40, $\sigma_v^2 = 10$
[Figure: four panels: (a) data $\{y_t\}$; (b) density $D(\hat{\rho})$; (c) density $D(\hat{\sigma}_v^2)$; (d) density $D(t_{\rho=0})$.]

Economic Forecasting – p.17/169

AR1 forecasts: break in ρ = 0.4 at T = 40
[Figure: four panels: (a) 1-step forecasts, known ρ versus estimated ρ; (b) 5-step forecasts, constant parameters, with full-variance and error-variance-only intervals; (c) constancy test under null and under alternative, with 5% significance line; (d) 5-step forecasts and $y_t$ for changed ρ.]

Economic Forecasting – p.18/169

Problems hardly disastrous
Small increase in uncertainty from estimating ρ; forecast intervals grow quite slowly as H increases.
Little noticeable impact from halving ρ at T = 40.
Constancy test hardly rejects false null.
But, slight change to model:
$$x_t = \alpha + \rho x_{t-1} + v_t \quad \text{where } v_t \sim IN[0, \sigma_v^2] \text{ and } |\rho| < 1.$$
Everything else the same, except α = 10.
Little change in estimation distributions or forecasts: until non-constant ρ, for same size and time of break.
Then – catastrophe!!

Economic Forecasting – p.19/169

AR1 forecasts: intercept & break in ρ
[Figure: four panels: data from $y_t = 10 + \rho y_{t-1} + v_t$; constancy test; 1-step forecasts against $y_t$; 5-step forecasts against $y_t$.]

Economic Forecasting – p.20/169

Problems now disastrous
Change due to effect on $E[x_t] = \mu$. Write model as:
$$\Delta x_t = (\rho - 1)(x_{t-1} - \mu) + v_t \quad \text{where } v_t \sim IN[0, \sigma_v^2] \text{ and } |\rho| < 1.$$
In first case $E[x_t] = 0$ before and after shift in $\rho$.

In second: $E[x_t] = \mu = \alpha / (1 - \rho)$.
$E[x_t] = \mu$ shifts markedly from $\mu = 50$ to $\mu^* = 17$.
All models in this class are equilibrium correction: fail systematically if $E[\cdot]$ changes, as converge back to $\mu$ irrespective of the value of $\mu^*$.
Huge class of equilibrium-correction models (EqCMs): regressions; dynamic systems; VARs; DSGEs; ARCH; GARCH; some other volatility models.
Pervasive and pernicious problem affecting all members.
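The shift in the equilibrium mean follows directly from µ = α/(1 − ρ). A minimal sketch in Python, using α = 10 and ρ halving from 0.8 to 0.4 as in the example; the function names and the 20-step horizon are illustrative assumptions, not part of the lectures.

```python
def equilibrium_mean(alpha: float, rho: float) -> float:
    """Long-run mean of x_t = alpha + rho * x_{t-1} + v_t for |rho| < 1."""
    return alpha / (1.0 - rho)

def eqcm_forecast(x_T: float, mu: float, rho: float, h: int) -> float:
    """h-step forecast from an equilibrium-correction model with mean mu."""
    return mu + rho ** h * (x_T - mu)

mu_before = equilibrium_mean(10.0, 0.8)   # in-sample equilibrium mean, about 50
mu_after = equilibrium_mean(10.0, 0.4)    # mean after rho is halved, about 16.7

# A model still using the old parameters is pulled back towards mu_before,
# even when the data have already settled near mu_after.
stale_forecast = eqcm_forecast(x_T=mu_after, mu=mu_before, rho=0.8, h=20)
```

Here `stale_forecast` stays close to the old mean of 50 although the process now fluctuates around roughly 17, which is the systematic failure the slide describes.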

Economic Forecasting – p.21/169

Explanation
Must write conditional expectation as: $\hat{x}_{T+h|T} = E_{T+h}[x_{T+h} \mid X_T]$.
Fine if stationary: $E_{T+h} = E_T$.
But paradox if $D_{x_t}(\cdot)$ not constant: need to know whole future distribution to derive forecast.
Cannot prove $\tilde{x}_{T+h|T} = E_T[x_{T+h} \mid X_T]$ is useful.
For example, if $x_t \sim IN[\mu_t, \sigma_x^2]$, then:
$$E_t[x_t \mid X_{t-1}] = \mu_t, \qquad E_{t-1}[x_t \mid X_{t-1}] = \mu_{t-1}$$
when the mean changes, letting $\epsilon_t = x_t - E_{t-1}[x_t \mid X_{t-1}]$:
$$E_t[\epsilon_t] = \mu_t - \mu_{t-1} \neq 0 \tag{1}$$
so conditional expectation is not even unbiased.

Economic Forecasting – p.22/169

New theoretical framework
Empirically-relevant theory needs to allow for:
model mis-specified for DGP;
parameters estimated from inaccurate observations,
on an integrated-cointegrated system,
intermittently altering unexpectedly from structural breaks.
Theory has achieved some success:
explains prevalence of forecast failure;
accounts for results of forecasting competitions;
explains good performance of ‘consensus’ forecasts.
Corrects some ‘folklore’ of forecasting: forecast failure not due to ‘poor econometric methods’, ‘inaccurate data’, ‘incorrect estimation’, or ‘data-based model selection’.

Economic Forecasting – p.23/169

Possible forecasting problems
Mis-specification, mis-estimation, non-constancy, of deterministic, stochastic, or error components, all could induce forecast failure.
But location shifts are the key problem, namely shifts in parameters of deterministic components.
Location shifts easy to detect.
Other breaks not so easy to detect: impulse response analyses then unreliable.
Many conventional results change radically when parameters are non-constant:
non-causal models can outperform causal;
multi-step forecasts more accurate than 1-step;
intercept corrections can improve forecasts.

Economic Forecasting – p.24/169

£ERI outturns & 2-year consensus forecasts
[Figure: £ERI index (1990 = 100), 1995 to 2004, with 2-year Consensus forecasts (end-points only).]

Economic Forecasting – p.25/169

Six different breaks
[Figure: six panels: Broken trend; Step shift; Impulse; Changing all VAR(1) parameters; Changing collinearity; Changing measurements.]

Economic Forecasting – p.26/169

Route map
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusions

Economic Forecasting – p.27/169

Measuring forecast accuracy
Measures of forecast accuracy often based on forecast-error second-moment (MSFE) matrix:
$$V_h \equiv E[e_{T+h} e_{T+h}'] = V[e_{T+h}] + E[e_{T+h}] E[e_{T+h}'] \tag{2}$$
where $e_{T+h}$ is vector of h-step ahead forecast errors.

Comparisons based on (2) may yield inconsistent rankings between forecasting models or methods, and for multi-step ahead forecasts. Figure 29 illustrates.
Analyses could begin with loss function, from which optimal predictor derived.
But well-defined mapping between forecast errors and costs not typical in macroeconomics.
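The decomposition in (2) into forecast-error variance plus squared bias can be verified on simulated errors. A minimal sketch in Python, assuming NumPy; the error draws are artificial and `msfe_matrix` is an illustrative name.

```python
import numpy as np

def msfe_matrix(errors: np.ndarray) -> np.ndarray:
    """Second-moment matrix of forecast errors; rows are replications."""
    return errors.T @ errors / errors.shape[0]

rng = np.random.default_rng(0)             # fixed seed, artificial data only
errors = rng.normal(loc=[0.5, -0.2], scale=1.0, size=(1000, 2))

bias = errors.mean(axis=0)
variance = np.cov(errors, rowvar=False, bias=True)   # divide by N, not N-1
decomposed = variance + np.outer(bias, bias)         # V[e] + E[e]E[e]'
```

With the variance normalised by N, `decomposed` reproduces `msfe_matrix(errors)` exactly, up to rounding.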

Economic Forecasting – p.28/169

Who wins: a or b?
[Figure: four panels, 1975 to 1978: forecast a against actual (levels); change forecast a against actual growth; forecast b against actual (levels); change forecast b against actual growth.]

Economic Forecasting – p.29/169

Forecast error measures
Problem with measures based on (2): lack of invariance to non-singular, scale-preserving, linear transformations, although model class is invariant.
Generalized forecast-error second-moment matrix (GFESM) reflects all variables and horizons:
$$\Phi_h = E[E_h E_h'].$$
$E_h$ stacks all forecast errors up to h-step ahead:
$$E_h' = (e_{T+1}', e_{T+2}', \ldots, e_{T+h-1}', e_{T+h}').$$
Direction invariant: volume of hypersphere around origin.
$|\Phi_h|$ is close to predictive likelihood.
Could base on cost or utility function of user, if known.
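The invariance point can be illustrated for a single horizon, where the GFESM reduces to the determinant of the MSFE matrix. A minimal sketch in Python, assuming NumPy; the matrices `V` and `M` are made-up values chosen only for illustration.

```python
import numpy as np

V = np.array([[2.0, 0.5],
              [0.5, 1.0]])                 # illustrative 1-step MSFE matrix
M = np.array([[1.0, 0.0],
              [1.0, 1.0]])                 # |M| = 1: scale-preserving transform

V_transformed = M @ V @ M.T                # MSFE of the transformed errors M e

trace_before, trace_after = np.trace(V), np.trace(V_transformed)
det_before, det_after = np.linalg.det(V), np.linalg.det(V_transformed)
```

The trace changes from 3 to 6 under the transformation, so a trace-based ranking can switch, while the determinant is the same before and after.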

Economic Forecasting – p.30/169

Route map
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusions

Economic Forecasting – p.31/169

Unpredictability
Non-degenerate $\nu_t$ is unpredictable wrt $I_{t-1}$ over $\mathcal{T}$ if:
$$D_{\nu_t}(\nu_t \mid I_{t-1}) = D_{\nu_t}(\nu_t) \quad \forall t \in \mathcal{T}. \tag{3}$$
Property of $\nu_t$ in relation to $I_{t-1}$ intrinsic to $\nu_t$; $\mathcal{T}$ may be a singleton ($\{t\}$).
Necessary that $\nu_t$ is unpredictable in mean and variance:
$$E_t[\nu_t \mid I_{t-1}] = E_t[\nu_t] \quad \text{and} \quad V_t[\nu_t \mid I_{t-1}] = V_t[\nu_t].$$
Former does not imply latter, or vice versa.
Take $E_t[\nu_t] = 0 \ \forall t$.

Economic Forecasting – p.32/169

Predictability
Predictability requires combinations with $I_{t-1}$ as in:
$$y_t = \varphi_t(I_{t-1}, \nu_t) \tag{4}$$
$y_t$ depends on information set and innovation, so:
$$D_{y_t}(y_t \mid I_{t-1}) \neq D_{y_t}(y_t) \quad \forall t \in \mathcal{T}. \tag{5}$$
Predictability not invariant to inter-temporal transforms.
Two most relevant special cases of (4) are:
$$y_t = f_t(I_{t-1}) + \nu_t \tag{6}$$
$$y_t = \nu_t \odot \phi_t(I_{t-1}) \tag{7}$$
$\odot$ is element by element multiplication, $y_{i,t} = \nu_{i,t} \phi_{i,t}(I_{t-1})$.
$y_t$ in (6) is predictable even if $\nu_t$ is not:
$$E_t[y_t \mid I_{t-1}] = f_t(I_{t-1}) \neq E_t[y_t].$$
‘Events’ which help predict $y_t$ must have happened.

Economic Forecasting – p.33/169

Prediction decomposition
Converse to (6):
$$D_X(X_T^1 \mid I_0, \cdot) = \prod_{t=1}^{T} D_{x_t}(x_t \mid I_{t-1}, \cdot). \tag{8}$$
$y_t$ decomposable into 2 orthogonal components, one being a mean innovation.
Since:
$$V_t[y_t \mid I_{t-1}] \leq V_t[y_t] \tag{9}$$
predictability ensures a variance reduction.
$y_t$ is unpredictable in mean in (7):
$$E_t[y_t \mid I_{t-1}] = E_t[\nu_t \odot \phi_t(I_{t-1}) \mid I_{t-1}] = 0,$$
but predictable in variance as $E_t[y_t y_t' \mid I_{t-1}]$ is:
$$E_t[\nu_t \nu_t' \odot \phi_t(I_{t-1}) \phi_t(I_{t-1})' \mid I_{t-1}] = \Omega_{\nu_t} \odot \phi_t(I_{t-1}) \phi_t(I_{t-1})'.$$

Economic Forecasting – p.34/169

GARCH processes
If $I_{t-1}$ is $\sigma$-field of $y_{t-i}$ with constant $\sigma_\nu^2$ and $\phi(\cdot) = \sigma_t$:
$$y_t = \nu_t \sigma_t,$$
GARCH processes generated by:
$$\sigma_t^2 = \phi_0 + \sum_{i=1}^{p} \phi_i y_{t-i}^2 + \sum_{j=1}^{p} \phi_{p+j} \sigma_{t-j}^2. \tag{10}$$
$\phi(\cdot) = \exp(\sigma_t / 2)$ leads to stochastic volatility:
$$\sigma_{t+1} = \phi_0 + \phi_1^2 \sigma_t + \eta_t. \tag{11}$$
Predictability of variance important for option pricing, and for appropriate forecast intervals.
But variance can also experience breaks: return to this later.
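The recursion in (10) with one lag of each term (a GARCH(1,1)) can be simulated in a few lines. A minimal sketch in Python, assuming NumPy and standard normal innovations; the function name and the parameter values are illustrative assumptions only.

```python
import numpy as np

def simulate_garch11(n, phi0, phi1, phi2, seed=0):
    """Simulate y_t = nu_t * sigma_t with
    sigma_t^2 = phi0 + phi1 * y_{t-1}^2 + phi2 * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    nu = rng.standard_normal(n)              # unpredictable innovations
    sigma2 = np.empty(n)
    y = np.empty(n)
    sigma2[0] = phi0 / (1.0 - phi1 - phi2)   # start at the unconditional variance
    y[0] = nu[0] * np.sqrt(sigma2[0])
    for t in range(1, n):
        sigma2[t] = phi0 + phi1 * y[t - 1] ** 2 + phi2 * sigma2[t - 1]
        y[t] = nu[t] * np.sqrt(sigma2[t])
    return y, sigma2

y, sigma2 = simulate_garch11(n=500, phi0=0.1, phi1=0.1, phi2=0.8)
```

The series `y` is unpredictable in mean, while `sigma2[t]` is known from information dated t − 1, which is what makes the variance predictable.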

Economic Forecasting – p.35/169

Changes in information sets
Predictability is relative to the information used: when $J_{t-1} \subset I_{t-1}$, it is possible that:
$$D_{u_t}(u_t \mid J_{t-1}) = D_{u_t}(u_t) \quad \text{yet} \quad D_{u_t}(u_t \mid I_{t-1}) \neq D_{u_t}(u_t). \tag{12}$$
Helps underpin Gets model selection, and congruence as a basis for econometric modeling.
Less learned from $J_{t-1}$ than $I_{t-1}$, and variance unnecessarily large.
Extra information in $I_{t-1}$ explains part of previous error.

Economic Forecasting – p.36/169

Prediction from reduced information
Use $J_{t-1} \subset I_{t-1}$ to predict $y_t = \nu_t + f_t(I_{t-1})$: less accurate, but unbiased predictions result.
Since $E_t[\nu_t \mid I_{t-1}] = 0$:
$$E_t[\nu_t \mid J_{t-1}] = 0,$$
then:
$$E_t[y_t \mid J_{t-1}] = E_t[f_t(I_{t-1}) \mid J_{t-1}] = g_t(J_{t-1}).$$
Let $e_t = y_t - g_t(J_{t-1})$, then:
$$E_t[e_t \mid J_{t-1}] = 0,$$
so $e_t$ is a mean innovation wrt $J_{t-1}$.

Economic Forecasting – p.37/169

Impacts
But as $e_t = \nu_t + f_t(I_{t-1}) - g_t(J_{t-1})$:
$$E_t[e_t \mid I_{t-1}] = f_t(I_{t-1}) - E_t[g_t(J_{t-1}) \mid I_{t-1}] = f_t(I_{t-1}) - g_t(J_{t-1}) \neq 0,$$
so $e_t$ not an innovation wrt $I_{t-1}$:
$$V_t[e_t] > V_t[\nu_t].$$
Cannot forecast the unpredictable: may be hope of forecasting predictable events.

Economic Forecasting – p.38/169

Increasing information
Disaggregate $y_T$ into its components $y_{i,T}$.
Extends information set $I_{T-1}$.
If $y_{i,T}$ depend in different ways on $I_{T-1}$, can predictability be improved by disaggregation?
Compared to predicting $y_T$ directly from $I_{T-1}$, nothing is gained when $I_{T-1}$ is the same.
Key issue is not predicting component changes, then aggregating those predictions, but including components in $I_{T-1}$, and not restricting $I_{T-1}$ to lags of aggregate.
If aggregate information set is $J_{T-1} \subset I_{T-1}$, must have larger MSFE.

Economic Forecasting – p.39/169

Increasing horizon for multi-step
Fix information at $I_T$ and predict $y_{T+h}$ as $h = 1, 2, \ldots, H$.
If unpredictable, remains so $\forall (T+h) \in \mathcal{T}$.
Little more can be said if densities can change over time:
$$y_t = \rho t + \epsilon_t t^{-1} \quad \text{where } \epsilon_t \sim IN[0, \sigma_\epsilon^2]. \tag{13}$$
Compare predictability of $y_{T+h}$ with $y_{T+h-1}$ given $I_T$:
$$V_{T+h}[y_{T+h} \mid I_T] = E_{T+h}[(y_{T+h} - \rho(T+h))^2] = (T+h)^{-2} \sigma_\epsilon^2 < V_{T+h-1}[y_{T+h-1} \mid I_T] \tag{14}$$
Variance falls as further ahead in this non-stationary case.
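The declining variance in (14) is easy to tabulate. A minimal sketch in Python; T = 40 and a unit error variance are arbitrary illustrative choices, and the function name is hypothetical.

```python
def conditional_variance(T: int, h: int, sigma2_eps: float) -> float:
    """V[y_{T+h} | I_T] for y_t = rho * t + eps_t / t, as in (14)."""
    return sigma2_eps / (T + h) ** 2

# Forecast-error variances for horizons h = 1, ..., 5 from origin T = 40.
variances = [conditional_variance(T=40, h=h, sigma2_eps=1.0) for h in range(1, 6)]
```

Each entry of `variances` is smaller than the one before, so in this artificial process forecasts become more accurate further ahead.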


Economic Forecasting – p.40/169

Non-stationarity
Unpredictability relative to historical time period:
$$D_{u_t}(u_t \mid I_{t-1}) \neq D_{u_t}(u_t) \quad \text{for } t = 1, \ldots, T,$$
yet (or vice versa):
$$D_{u_t}(u_t \mid I_{t-1}) = D_{u_t}(u_t) \quad \text{for } t = T+1, \ldots, T+H.$$
Two important cases of change in $f_t(I_{t-1})$:
(a) $f_{t+1}(\cdot) \neq f_t(\cdot)$, but $E_{t+1}[y_{t+1}] = E_t[y_t]$;
(b) ‘location shift’: $E_{t+1}[y_{t+1}] \neq E_t[y_t]$.
Such changes unproblematic for predictability: $y_{t+j} - f_{t+j}(I_{t+j-1})$ is unpredictable for both periods $j = 0, 1$.
But practical difficulties may be immense.

Economic Forecasting – p.41/169

Implications of predictability
To forecast, model $g_t(J_{t-1})$ by $\psi(\hat{J}_{t-1}, \theta)$ and use:
$$\hat{y}_{T+h|T} = \psi_h(\hat{J}_T, \hat{\theta}_{(T)}). \tag{15}$$
Predictability of $y_{T+h}$ from $J_T$ has 6 distinct aspects:
1. composition of $I_{t-1}$;
2. how $I_{t-1}$ influences $f_t(I_{t-1})$;
3. how $D_{y_t}(\cdot \mid I_{t-1})$ (or $f_t(I_{t-1})$) changes over time;
4. limited information set $J_{t-1} \subset I_{t-1}$;
5. mapping $f_t(I_{t-1})$ into $g_t(J_{t-1})$;
6. how $J_T$ will enter $D_{y_{T+h}}(\cdot \mid J_T)$ ($g_{T+h}(J_T)$).
Four more issues forecasting $y_{T+h}$ from $\psi_h(\hat{J}_T, \hat{\theta}_{(T)})$:
7. approximating $g_t(J_{t-1})$ by $\psi(J_{t-1}, \theta)$;
8. measurement errors between $J_{t-1}$ and observed $\hat{J}_{t-1}$ $\forall t$;
9. estimation of $\theta$ in $\psi(J_{t-1}, \theta)$ by $\hat{\theta}_{(T)}$ from in-sample data;
10. multi-step errors in $y_{T+h}$ from $\psi_h(\hat{J}_T, \hat{\theta}_{(T)})$.

Economic Forecasting – p.42/169

Taxonomy of forecast errors
Six basic sources of 1-step error on forecasting model, $u_{T+1} = y_{T+1} - \psi_1(\hat{J}_T, \hat{\theta}_{(T)})$, where $\hat{J}_T$ is observed information:
$$\begin{aligned}
u_{T+1} = {} & \nu_{T+1} && \text{DGP innovation error} \\
& + f_{T+1}(I_T) - g_{T+1}(J_T) && \text{incomplete information} \\
& + g_{T+1}(J_T) - g_T(J_T) && \text{induced change} \\
& + g_T(J_T) - \psi_1(J_T, \theta) && \text{approximation reduction} \\
& + \psi_1(J_T, \theta) - \psi_1(\hat{J}_T, \theta) && \text{measurement error} \\
& + \psi_1(\hat{J}_T, \theta) - \psi_1(\hat{J}_T, \hat{\theta}_{(T)}) && \text{estimation uncertainty}
\end{aligned}$$
$\nu_{T+1}$ is an innovation against DGP information $I_T$, so nothing will reduce its uncertainty.
Errors accumulate if $h > 1$ steps ahead.

Economic Forecasting – p.43/169

Incomplete information
Increases uncertainty, but:
$$g_{T+1}(J_T) = E_{T+1}[f_{T+1}(I_T) \mid J_T].$$
Thus, no additional biases, even when breaks often occur.
Problem of breaks lies in $g_{T+1}(J_T) - g_T(J_T)$.
Systematic mis-specifications possible from $\psi(\cdot)$ and $\theta$ as approximations to $g_T(J_T)$.
Measurement errors pernicious at forecast origin: ‘wrong’ local trend.
Estimation uncertainty from $\hat{\theta}_{(T)}$ in place of $\theta$: compound effects of breaks by slow adjustment to changes in $\theta$.

Economic Forecasting – p.44/169

Route map
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusions

Economic Forecasting – p.45/169

Implications for forecasting theory When model is a good representation of economy, and structure of economy unchanged, many important theorems can be proved. Forecasts approximate conditional expectations: ‘best’ model generally produces best forecasts – congruent encompassing model should dominate. Need to pool forecasts refutes encompassing. Forecast intervals accurate and increase with horizon. But history of economic forecasting refutes.


Economic Forecasting – p.46/169

New theory When econometric models mis-specified, and economies subject to important unanticipated shifts: forecast evaluations unfavourable to econometrics, as are forecasting competitions. ‘Simple’ extrapolative methods outperform; pooling forecasts often pays. ‘Judgement’ has value added in economic forecasting. Forecast intervals inaccurate and may decrease with horizon. Thus Jt−1 ⊂ It−1 , and gt−1 non-constant creates gulf between predictability and empirical forecasting.


Economic Forecasting – p.47/169

Taxonomy of VAR forecast errors
1. Forecasting model formulated from theory,
2. selected by empirical criteria,
3. model mis-specified for DGP,
4. parameters estimated (possibly inconsistently),
5. from (probably inaccurate) observations,
6. generated by an integrated-cointegrated process,
7. subject to intermittent structural breaks.
Theory to mimic empirical setting.
Forecast-error taxonomy partitioned by deterministic and stochastic influences.
Any or all could cause forecast failure: key is 7.

Economic Forecasting – p.48/169

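Data generation process
First-order VAR:
$$y_t = \varphi + \Pi y_{t-1} + \epsilon_t \quad \text{with } \epsilon_t \sim IN_n[0, \Omega_\epsilon].$$
Unconditional mean of $y_t$ is:
$$E[y_t] = (I_n - \Pi)^{-1} \varphi = \phi \tag{16}$$
and hence:
$$y_t - \phi = \Pi (y_{t-1} - \phi) + \epsilon_t.$$
Use forecasting model:
$$\hat{y}_{T+h|T+h-1} = \hat{\varphi} + \hat{\Pi} \hat{y}_{T+h-1|T+h-2}.$$
h-step forecasts at time $T$ for $h = 1, \ldots, H$:
$$\hat{y}_{T+h|T} - \hat{\phi} = \hat{\Pi} (\hat{y}_{T+h-1|T} - \hat{\phi}) = \hat{\Pi}^h (\hat{y}_T - \hat{\phi}), \tag{17}$$
where $\hat{\phi} = (I_n - \hat{\Pi})^{-1} \hat{\varphi}$.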
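The equivalence in (17) between iterating the 1-step rule and using the equilibrium-mean form can be sketched in code. This is a minimal illustration in Python, assuming NumPy; the function names and the numerical values of the intercept `phi`, the dynamics `Pi` and the origin `y_T` are assumptions chosen only so the example runs.

```python
import numpy as np

def var1_forecast(y_T, phi, Pi, h):
    """h-step forecast from y_t = phi + Pi y_{t-1} + eps_t via the equilibrium mean."""
    n = len(phi)
    eq_mean = np.linalg.solve(np.eye(n) - Pi, phi)       # (I_n - Pi)^{-1} phi
    return eq_mean + np.linalg.matrix_power(Pi, h) @ (np.asarray(y_T) - eq_mean)

def var1_forecast_recursive(y_T, phi, Pi, h):
    """Same forecast obtained by iterating the 1-step rule h times."""
    y = np.asarray(y_T, dtype=float)
    for _ in range(h):
        y = phi + Pi @ y
    return y

phi = np.array([1.0, 1.0])                 # illustrative intercept vector
Pi = np.array([[0.7, 0.2], [-0.2, 0.6]])   # illustrative stationary dynamics
y_T = np.array([4.0, 0.5])                 # illustrative forecast origin

direct = var1_forecast(y_T, phi, Pi, h=5)
iterated = var1_forecast_recursive(y_T, phi, Pi, h=5)
```

The two results coincide, and as h grows the forecast converges to the equilibrium mean, which is why a shift in that mean matters so much for this class of model.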

Economic Forecasting – p.49/169

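Structural break
After forecasting at time $T$, $(\varphi : \Pi)$ change to $(\varphi^* : \Pi^*)$, but process stays I(0). From $T+1$ onwards:
$$\begin{aligned}
y_{T+h} &= \varphi^* + \Pi^* y_{T+h-1} + \epsilon_{T+h} \\
&= \phi^* + \Pi^* (y_{T+h-1} - \phi^*) + \epsilon_{T+h} \\
&= \phi^* + (\Pi^*)^h (y_T - \phi^*) + \sum_{i=0}^{h-1} (\Pi^*)^i \epsilon_{T+h-i}.
\end{aligned}$$
Forecast error is $\hat{\epsilon}_{T+h|T} = y_{T+h} - \hat{y}_{T+h|T}$.
Let $\text{plim}_{T \to \infty} \hat{\Pi} = \Pi_p$ and $\text{plim}_{T \to \infty} \hat{\phi} = \phi_p$.
Figure 51 illustrates the various parameters.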

Economic Forecasting – p.50/169

Population parameters and estimates
[Figure: density $f(\hat{\pi})$ of the estimator, with the following values marked on the horizontal axis.]
$\hat{\pi}$ is the actual estimate
$\pi_p$ is the plim of the sample estimator, $\pi_p = \text{plim}\, \hat{\pi}$
$\pi_e$ is the expectation of the estimator
$\pi$ is the in-sample population value
$\pi^*$ is the forecast-period value
$\pi_s = (0, \tilde{\pi})$ is selected

Economic Forecasting – p.51/169

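Some theory
Two approximations needed:
$$\hat{\Pi}^h = (\Pi_p + \delta_\Pi)^h \simeq \Pi_p^h + \sum_{i=0}^{h-1} \Pi_p^i \delta_\Pi \Pi_p^{h-i-1} \simeq \Pi_p^h + C_h. \tag{18}$$
Note definition of $\delta_\Pi = \hat{\Pi} - \Pi_p$.
Let $(\cdot)^\nu$ denote the vector stacking the columns of a matrix:
$$\begin{aligned}
C_h (y_T - \hat{\phi}) &\simeq C_h (y_T - \phi_p) = \left( C_h (y_T - \phi_p) \right)^\nu \\
&= \left( \sum_{i=0}^{h-1} \Pi_p^i \otimes (y_T - \phi_p)' \Pi_p^{h-i-1\prime} \right) \delta_\Pi^\nu \\
&\simeq F_h \delta_\Pi^\nu.
\end{aligned}$$
Forecast error taxonomy now follows.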

Economic Forecasting – p.52/169

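VAR forecast-error taxonomy
$$\begin{aligned}
\hat{\epsilon}_{T+h|T} \simeq {} & (I_n - (\Pi^*)^h)(\phi^* - \phi) && \text{(ia) equilibrium-mean change} \\
& + ((\Pi^*)^h - \Pi^h)(y_T - \phi) && \text{(ib) slope change} \\
& + (I_n - \Pi_p^h)(\phi - \phi_p) && \text{(iia) equilibrium-mean mis-specification} \\
& + (\Pi^h - \Pi_p^h)(y_T - \phi) && \text{(iib) slope mis-specification} \\
& + (\Pi_p^h + C_h)(y_T - \hat{y}_T) && \text{(iii) forecast-origin uncertainty} \\
& - (I_n - \Pi_p^h)(\hat{\phi} - \phi_p) && \text{(iva) equilibrium-mean estimation} \\
& - F_h (\hat{\Pi} - \Pi_p)^\nu && \text{(ivb) slope estimation} \\
& + \sum_{i=0}^{h-1} (\Pi^*)^i \epsilon_{T+h-i} && \text{(v) error accumulation.}
\end{aligned} \tag{19}$$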
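Terms (ia) and (ib) can be checked numerically in the simplest setting, where the pre-break parameters are known exactly so that the mis-specification, origin and estimation terms vanish. A minimal sketch in Python, assuming NumPy; all numerical values below are illustrative assumptions.

```python
import numpy as np

Pi = np.array([[0.7, 0.2], [-0.2, 0.6]])        # illustrative pre-break dynamics
Pi_star = np.array([[0.5, -0.2], [0.1, 0.5]])   # illustrative post-break dynamics
eq_mean = np.array([3.75, 0.625])               # pre-break equilibrium mean
eq_mean_star = np.array([3.0, 1.0])             # post-break equilibrium mean (assumed)
y_T = np.array([4.0, 0.5])                      # forecast origin (assumed)
h = 4

P_h = np.linalg.matrix_power(Pi, h)
P_star_h = np.linalg.matrix_power(Pi_star, h)

# Expected outcome after the break minus the forecast from pre-break parameters.
expected_outcome = eq_mean_star + P_star_h @ (y_T - eq_mean_star)
forecast = eq_mean + P_h @ (y_T - eq_mean)
expected_error = expected_outcome - forecast

term_ia = (np.eye(2) - P_star_h) @ (eq_mean_star - eq_mean)   # equilibrium-mean change
term_ib = (P_star_h - P_h) @ (y_T - eq_mean)                  # slope change
```

The expected forecast error equals `term_ia + term_ib`. Setting `eq_mean_star` equal to `eq_mean` leaves only the slope term, which is zero when the origin `y_T` sits at the equilibrium mean.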

Economic Forecasting – p.53/169

Detectable shifts
$\hat{\epsilon}_{T+1|T} \simeq 0$ if slope changes at $y_T \simeq \phi$ ($\phi$ constant).
If in substantial disequilibrium, larger effect.
Finite-sample biases in estimators negligible: $E[\hat{\Pi}] \simeq \Pi$ and $E[\hat{\varphi}] \simeq \varphi$ (true for symmetric errors).

Detectable shifts are $\phi$ to $\phi^*$, either from changes in $\varphi$, or changes in dynamics (via $(I_n - \Pi^*)^{-1} \varphi$).
But if $\varphi$ and $\Pi$ alter with $\phi$ unchanged, not easily detected.
Also, ease of detection of breaks in $\Pi$ depends on $\varphi$.
Breaks take time to have full effect, so data expectations alter every period.

Economic Forecasting – p.54/169

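Undetectable breaks example
Data generation process (DGP) is bivariate VAR(1), so $n = 2$ and:
$$y_t = \varphi + \Pi y_{t-1} + \epsilon_t, \quad \epsilon_t \sim IN_n[0, \Omega], \quad t = 1, \ldots, T_1$$
$$y_t = \varphi^* + \Pi^* y_{t-1} + \epsilon_t, \quad \epsilon_t \sim IN_n[0, \Omega], \quad t = T_1 + 1, \ldots, T$$
$\omega_{i,i} = \sigma^2 = 0.01^2$, $\omega_{i,j} = 0$, so $|\epsilon_{i,t}| > 0.03 = 3\sigma$ is an ‘outlier’.
$$\Pi = \begin{pmatrix} 0.7 & 0.20 \\ -0.20 & 0.6 \end{pmatrix}, \quad \varphi = \begin{pmatrix} 1.0 \\ 1.0 \end{pmatrix}$$
$\pi_{i,j}$ changed by $(-20, -40, 30, -10)\sigma$, and $\varphi_i$ by $\pm 100\sigma$:
$$\Pi^* = \begin{pmatrix} 0.5 & -0.20 \\ 0.10 & 0.5 \end{pmatrix}, \quad \varphi^* = \begin{pmatrix} 2.0 \\ -0.0625 \end{pmatrix}$$
Fundamentally changed process at $T_1$ ($= 0.75T$ here).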

Economic Forecasting – p.55/169

Chow test rejection frequencies
[Figure: four panels: (a1) baseline data sample, x1,a and x2,a; (a2) baseline null rejection frequencies, Chow test at 0.01; (b1) data when changing all VAR(1) parameters, x1,b and x2,b; (b2) rejection frequencies when changing all VAR(1) parameters, Chow test at 0.01.]

Economic Forecasting – p.56/169

Detectable breaks
Fig. 56 shows outcomes for VAR and conditional model in:
(a) base scenario and
(b) all parameters changed.
No sign of any shift.
But with constant $\Pi$ and change $\varphi$ by just $5\sigma$ to:
$$\begin{pmatrix} 1.05 \\ 0.95 \end{pmatrix}$$
break easily detected, as fig. 58c,d show (even 1 period).

Economic Forecasting – p.57/169

All Chow test outcomes
[Figure: eight panels, data sample (left) and Chow test rejection frequencies at 0.01 (right) for each case: (a1, a2) baseline; (b1, b2) changing all VAR(1) parameters; (c1, c2) changing intercepts only; (d1, d2) one-period shift.]

Economic Forecasting – p.58/169

Location shifts
Key is long-run mean: $E[y_t] = (I - \Pi)^{-1} \varphi$ stays at:
$$\begin{pmatrix} 3.75 \\ 0.625 \end{pmatrix}$$
in first case, despite changes in $\Pi$ and $\varphi$.
In second case, shifts from that to:
$$\begin{pmatrix} 3.8125 \\ 0.46875 \end{pmatrix}.$$
Location shifts of a little over $6\sigma$ and $16\sigma$.

Location shifts are key to break detection.
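These long-run means can be reproduced from the parameter values quoted in the undetectable-breaks example. A minimal sketch in Python, assuming NumPy; `long_run_mean` is an illustrative helper name, not part of the lectures.

```python
import numpy as np

def long_run_mean(phi, Pi):
    """E[y_t] = (I - Pi)^{-1} phi for a stationary VAR(1) with intercept phi."""
    return np.linalg.solve(np.eye(len(phi)) - Pi, phi)

Pi = np.array([[0.7, 0.2], [-0.2, 0.6]])
Pi_star = np.array([[0.5, -0.2], [0.1, 0.5]])

base = long_run_mean(np.array([1.0, 1.0]), Pi)                   # baseline
all_changed = long_run_mean(np.array([2.0, -0.0625]), Pi_star)   # first case
intercept_only = long_run_mean(np.array([1.05, 0.95]), Pi)       # second case

sigma = 0.01
shift_in_sigmas = (intercept_only - base) / sigma                # location shift in units of sigma
```

Changing every parameter leaves the mean at (3.75, 0.625), whereas the small intercept change moves it by about 6.25σ and −15.6σ, which is why only the second case is detected.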


Economic Forecasting – p.59/169

Location shifts
Seen location shifts can be detected: break tests have power against them.
General approach via impulse saturation noted earlier.
Equilibrium-mean mis-specification mainly due to unmodeled location shifts in-sample.
$\bar{y} = \hat{\phi}$ is consistent for $\phi$ when $\phi$ is constant.
And $\varphi$ implicitly derived from $\phi$.

Source of problem: if equilibrium mean has shifted, contaminates all EqCM forecasts.
Cannot solve by adding variables that ‘explain’ shift.
Non-linear models essential: hard selection problem.
Classic example: long-period autoregressions for inflation.
Mean inflation in 1970s much higher than today: figure 61.

Economic Forecasting – p.60/169

AR(1) inflation forecasts
[Figure: two panels: (top) $\Delta p$ and forecasts $\Delta \hat{p}$, 1970 to 2000; (bottom) $\hat{\phi}_{(T)}$ with $\pm 2\hat{\sigma}$ bands, together with $\hat{\phi}$ (1970–87) and $\hat{\phi}$ (1987–2001).]

Economic Forecasting – p.61/169

Consequences for forecasting Intermittent forecast failures confirm these arguments: either economists uniquely fail own assumptions; or assumptions of time invariance are incorrect. Forecast failure mainly due to location shifts. Most location shifts seem unanticipated ex ante: explains forecast failure in equilibrium-correction models– vast majority of theories and models in economics. Zero-mean changes don’t harm forecasts, pace Lucas; but could damage policy analyses. Pernicious for impulse response analyses. Could imposing more economic-theory restrictions solve? Or perhaps even less....


Economic Forecasting – p.62/169

Route map
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusions

Economic Forecasting – p.63/169

Models of expectations
Theories of expectations must face realities of forecasting.
‘Rational’ expectations (RE): conditional expectation given available information ($I_t$):
$$y_{t+1}^{re} = E[y_{t+1} \mid I_t]. \tag{20}$$
Avoids arbitrage opportunities or unnecessary losses.
RE assumes free information, computing power, and discovery of form of $E[y_{t+1} \mid I_t]$.
But expectations, like forecasts, are instrumental: agents should equate marginal costs with benefits, which leads to ‘economically rational expectations’ (ERE).
Model-consistent expectations (M-CE): as models are false, also imposes invalid model on expectations.

Economic Forecasting – p.64/169

Problems in expectations formation
RE, ERE and M-CE assume agents can do calculations.
But (20) should be written as:
\[ y^{e}_{t+1} = E_{t+1}[y_{t+1} \mid I_t] = \int y_{t+1} f_{t+1}(y_{t+1} \mid I_t)\, dy_{t+1}. \qquad (21) \]
Only then is $y^{e}_{t+1}$ even unbiased for $y_{t+1}$.
But (21) requires crystal balls for future $f_{t+1}(y_{t+1} \mid I_t)$.
Best an agent can do is form ‘sensible expectation’, $y^{se}_{t+1}$, forecasting $f_{t+1}(\cdot)$ by $\hat{f}_{t+1}(\cdot)$:
\[ y^{se}_{t+1} = \int y_{t+1} \hat{f}_{t+1}(y_{t+1} \mid I_t)\, dy_{t+1}. \qquad (22) \]
If moments of $f_{t+1}(y_{t+1} \mid I_t)$ alter, no good rules for $\hat{f}_{t+1}(\cdot)$, but $\hat{f}_{t+1}(y_{t+1} \mid I_t) = f_t(\cdot)$ not a good choice.
Agents cannot know how $I_t$ enters $f_{t+1}(\cdot)$ if no time invariance.

Loss-minimization forecasting
Nor can econometricians.
Clive Granger (Empirical Modeling in Economics, p50, is typical) asserts must be explicit about cost function:
\[ \min_{\hat{y}_{T+h|T}} \int C\left(y_{T+h} - \hat{y}_{T+h|T}\right) dP_T(y_{T+h} \mid I_T) \qquad (23) \]
Claims can always be solved (perhaps numerically).
1] $dP_T(\cdot)$ assumes complete stationarity; or else it is the wrong distribution to average over.
2] Should at least write $dP_{T+h}(\cdot)$ to clarify: now if a location shift, $dP_T(\cdot)$ can be very poor.
3] Needs to forecast complete $dP_{T+h}(\cdot)$ to solve.

Robust devices
When $f_{t+1}(\cdot) \neq f_t(\cdot)$, forecasting devices robust to location shifts win forecasting competitions.
Key is avoiding systematic mis-forecasting.
‘Forecasting’ success of differenced models partly due to their robustness, partly to capturing key prediction information.
If agents use robust predictors, need to re-specify expectations in models.


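Cointegrated DGP
Vector equilibrium correction model (VEqCM):
\[ (\Delta x_t - \gamma) = \alpha\left(\beta' x_{t-1} - \mu\right) + \epsilon_t, \qquad (24) \]
with $\beta'\gamma = 0$, $E[\Delta x_t] = \gamma$ and $E[\beta' x_t] = \mu$.
Shifts of $\nabla\mu^* = \mu^* - \mu$ and $\nabla\gamma^* = \gamma^* - \gamma$. Then:
\[ \Delta x_{T+1} = \gamma^* + \alpha\left(\beta' x_T - \mu^*\right) + \epsilon_{T+1}, \qquad (25) \]
so from (25):
\[ \Delta x_{T+1} = \gamma + \alpha\left(\beta' x_T - \mu\right) + \epsilon_{T+1} + \nabla\gamma^* - \alpha\nabla\mu^*, \qquad (26) \]
or letting:
\[ \widehat{\Delta x}_{T+1|T} = \gamma + \alpha\left(\beta' x_T - \mu\right), \qquad (27) \]
then:
\[ \Delta x_{T+1} = \widehat{\Delta x}_{T+1|T} + \nabla\gamma^* - \alpha\nabla\mu^* + \epsilon_{T+1}. \qquad (28) \]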

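Forecasting models
Two models robust to location shifts: VAR in differences (DV):
\[ \Delta x_t = \gamma + \xi_t, \qquad (29) \]
and a DV in differences (DDV):
\[ \Delta^2 x_t = \zeta_t \quad \text{or} \quad \Delta x_t = \Delta x_{t-1} + \zeta_t. \qquad (30) \]
Consider a location shift at $T-1$:
\[ \Delta x_{T-1} = \gamma + \alpha\left(\beta' x_{T-2} - \mu\right) + \epsilon_{T-1} \]
\[ \Delta x_T = \gamma^* + \alpha\left(\beta' x_{T-1} - \mu^*\right) + \epsilon_T \qquad (31) \]
\[ \Delta x_{T+1} = \gamma^* + \alpha\left(\beta' x_T - \mu^*\right) + \epsilon_{T+1}. \]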

Robust device taxonomy
Consider 1-step forecasts from DGP:
\[ \Delta\hat{x}_{T+1|T} = \gamma + \alpha\left(\beta' x_T - \mu\right). \]
Taxonomy simplifies to:
\[ e_{T+1|T} = (\gamma^* - \gamma) - \alpha(\mu^* - \mu) + \epsilon_{T+1}, \qquad (32) \]
with a forecast-error bias of:
\[ E_{T+1}[e_{T+1|T}] = (\gamma^* - \gamma) - \alpha(\mu^* - \mu). \]
Contrast using $\Delta\tilde{x}_{T+1|T} = \Delta x_T$.
$\Delta x_{t-1}$ does not enter DGP: not causally-relevant.
Corresponding taxonomy is:
\[ \Delta x_{T+1} - \Delta\tilde{x}_{T+1|T} = \gamma^* + \alpha\left(\beta' x_T - \mu^*\right) + \epsilon_{T+1} - \Delta x_T = \Delta\epsilon_{T+1} + \alpha\beta'\Delta x_T. \qquad (33) \]

Forecast-error bias
\[ E_{T+1}[\Delta x_{T+1} - \Delta\tilde{x}_{T+1|T}] = \alpha\beta'\gamma^* = 0, \]
because $\beta'\gamma^*$ is zero.
Until VEqCM has parameters $\gamma^*$ and $\mu^*$, (30) will outperform, despite non-causal basis.
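The contrast between the two biases can be checked numerically. The sketch below is a simplified scalar case, assuming $\beta = 1$, $\gamma = \gamma^* = 0$, and illustrative values $\alpha = -0.2$, $\mu = 0$, $\mu^* = 1$ (none of these numbers come from the lectures). It tracks the mean 1-step error of the EqCM that keeps the old $\mu$ and of the device $\Delta\tilde{x}_{T+1|T} = \Delta x_T$.

```python
import numpy as np

def mean_forecast_errors(alpha=-0.2, mu=0.0, mu_star=1.0, sigma=0.1,
                         n_pre=50, n_post=30, reps=4000, seed=0):
    """Mean 1-step errors after a shift mu -> mu_star, by periods since the shift."""
    rng = np.random.default_rng(seed)
    x_prev = np.full(reps, mu)      # start every replication at the old equilibrium
    dx_prev = np.zeros(reps)
    eqcm = np.zeros(n_post)
    robust = np.zeros(n_post)
    for t in range(n_pre + n_post):
        m = mu if t < n_pre else mu_star            # equilibrium mean in force
        dx = alpha * (x_prev - m) + rng.normal(0.0, sigma, reps)
        if t >= n_pre:
            j = t - n_pre
            eqcm[j] = np.mean(dx - alpha * (x_prev - mu))  # forecast still uses old mu
            robust[j] = np.mean(dx - dx_prev)              # forecast is the last change
        x_prev, dx_prev = x_prev + dx, dx
    return eqcm, robust

eqcm_bias, robust_bias = mean_forecast_errors()
print(eqcm_bias[[0, 10, 29]])    # stays near -alpha * (mu_star - mu)
print(robust_bias[[0, 10, 29]])  # misses at first, then dies out
```

In this setup the EqCM bias persists at $-\alpha(\mu^* - \mu)$ for as long as the old $\mu$ is retained, whereas the differenced device misses when the shift arrives and its bias then decays towards zero as the process settles at the new equilibrium.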

DDV as a robust device
Economic time series do not continuously accelerate, so the second difference has a zero unconditional expectation:
\[ E[\Delta^2 x_t] = 0. \qquad (34) \]
Second differencing: removes two unit roots, intercepts and linear trends; changes location shifts to ‘blips’; and converts breaks in trends to impulses.
Key is differencing till no deterministic terms: hence success of ‘random walk’ for speculative prices.
But differencing incompatible with measurement errors: exacerbates negative moving average.

Location shifts and broken trends
[Figure (p.74): two columns, ‘Location shift’ and ‘Broken trend’, each showing the level and its differences down to Δ², over observations 0–100.]
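The effect of second differencing on the two deterministic patterns can be reproduced in a few lines. This is a noise-free sketch with an assumed break at observation 50 and assumed magnitudes; it is not the data behind the lecture figure.

```python
import numpy as np

t = np.arange(100)
location_shift = np.where(t >= 50, 1.0, 0.0)       # level jumps by 1 at t = 50
broken_trend = t + np.maximum(t - 50, 0.0)         # slope rises from 1 to 2 after t = 50

d2_shift = np.diff(location_shift, n=2)            # second differences
d2_trend = np.diff(broken_trend, n=2)

blip = d2_shift[np.nonzero(d2_shift)]              # +1 followed by -1
impulse = d2_trend[np.nonzero(d2_trend)]           # a single non-zero value
print(blip, impulse)
```

Everywhere else both second-differenced series are exactly zero, so a device based on $\Delta^2 x_t$ is disturbed only in the periods adjacent to the break.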

Using $\Delta x_T$ to forecast
Consider an extended in-sample DGP:
\[ \Delta x_T = \gamma + \alpha\left(\beta' x_{T-1} - \mu\right) + \Psi z_T + v_T, \qquad (35) \]
where $z_t$ denotes many omitted effects, with:
\[ \Delta x_{T+i} = \gamma^* + \alpha^*\left((\beta^*)' x_{T+i-1} - \mu^*\right) + \Psi^* z_{T+i} + v_{T+i}. \qquad (36) \]
A VEqCM in $x_t$ is used for forecasting:
\[ \Delta\hat{x}_{T+i|T+i-1} = \hat{\gamma} + \hat{\alpha}\left(\hat{\beta}' x_{T+i-1} - \hat{\mu}\right). \qquad (37) \]
All main sources of forecast error occur given (36): stochastic and deterministic breaks; omitted variables; inconsistent parameters; estimation uncertainty; innovation errors.

DDV avoids failure
Contrast using sequence of $\Delta x_{T+i-1}$ to forecast:
\[ \Delta\tilde{x}_{T+i|T+i-1} = \Delta x_{T+i-1}. \qquad (38) \]
But because of (36), for $i > 1$, $\Delta x_{T+i-1}$ is:
\[ \Delta x_{T+i-1} = \gamma^* + \alpha^*\left((\beta^*)' x_{T+i-2} - \mu^*\right) + \Psi^* z_{T+i-1} + v_{T+i-1}. \qquad (39) \]
Thus, $\Delta x_{T+i-1}$ reflects all the effects needed: parameter changes; no omitted variables; with no estimation issues at all.
Two drawbacks: unwanted presence of $v_{T+i-1}$ in (39), which doubles innovation error variance; and all variables lagged one extra period, which adds ‘noise’ of I(−1) effects.
Clear trade-off.

Explanation
But easy to see why $\Delta\tilde{x}_{T+i|T+i-1} = \Delta x_{T+i-1}$ may win.
Let $\Delta x_{T+i} - \Delta\tilde{x}_{T+i|T+i-1} = \tilde{v}_{T+i|T+i-1}$, then:
\[ \tilde{v}_{T+i|T+i-1} = \gamma^* + \alpha^*\left((\beta^*)' x_{T+i-1} - \mu^*\right) + \Psi^* z_{T+i} + v_{T+i} - \left[\gamma^* + \alpha^*\left((\beta^*)' x_{T+i-2} - \mu^*\right) + \Psi^* z_{T+i-1} + v_{T+i-1}\right] \]
\[ = \alpha^*(\beta^*)'\Delta x_{T+i-1} + \Psi^*\Delta z_{T+i} + \Delta v_{T+i}. \qquad (40) \]
All terms in last line must be I(−1), so very ‘noisy’, but no systematic failure.

Differencing the VEqCM
Shifts in $\mu$ are most pernicious for forecasting: use differenced variant of (24) after estimation.
Difference the VEqCM prior to forecasting:
\[ \Delta x_t = \left(I_n + \alpha\beta'\right)\Delta x_{t-1} + \Delta\epsilon_t, \qquad (41) \]
so written as forecasting device:
\[ \widehat{\Delta x}_{T+1|T} = \Delta x_T + \hat{\alpha}\Delta\left(\hat{\beta}' x_T - \hat{\mu}\right). \qquad (42) \]
In (41), cointegration rank restriction imposed; so (42) is double-differenced VAR plus $\hat{\alpha}\hat{\beta}'\Delta x_T$: re-introduces main observable from (36) in robust form.
From (26):
\[ \Delta x_{T+1} = \Delta x_T + \alpha\left(\beta'\Delta x_T - \Delta\mu^*\right) + \Delta\epsilon_{T+1}. \]
$\Delta\mu^* = \nabla\mu^*$ at time $T-1$ only, otherwise is zero: $\Delta\hat{\mu} = 0$.

Properties
Thus $E[\Delta x_{T+1} - \widetilde{\Delta x}_{T+1|T}]$ is:
\[ E\left[\Delta x_T + \alpha\beta'\Delta x_T - \left(I_n + \hat{\alpha}\hat{\beta}'\right)\Delta x_T\right] = 0. \]
DVEqCM ‘misses’ for 1 period only, does not make systematic, and increasing, errors.
Ignoring parameter estimation uncertainty:
\[ \tilde{e}_{T+1|T} = \Delta x_{T+1} - \widetilde{\Delta x}_{T+1|T} \simeq \Delta\epsilon_{T+1}, \]
and $\tilde{e}_{T+2|T+1} \simeq \Delta\epsilon_{T+2}$.
System error is $\{\epsilon_t\}$: so differencing doubles 1-step error variance.
Same for DDV, but adds variance of DVEqCM term $\alpha\beta'\Delta x_t$.
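The doubling of the 1-step error variance follows in one line. A short derivation, assuming the system errors are serially uncorrelated with a constant variance matrix, written here as $\Omega_\epsilon$ (a symbol introduced only for this step):

```latex
\[
V[\Delta\epsilon_{T+1}] = V[\epsilon_{T+1} - \epsilon_T]
  = V[\epsilon_{T+1}] + V[\epsilon_T] - C[\epsilon_{T+1}, \epsilon_T] - C[\epsilon_T, \epsilon_{T+1}]
  = 2\,\Omega_\epsilon .
\]
```

If the errors were positively autocorrelated, the covariance terms would not vanish and the increase would be smaller than a doubling.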

Public-service case study
Forecasting discounted net TV advertising revenues 10 years ahead for ITV3 licence fee renewal for OfCom.
Econometric VEqCM system from PcGets ‘quick modeler’; forecasting by the DVEqCM: the robust device which retains causal effects.
Also averaged across a small group of related methods: see http://www.ofcom.org.uk/research/tv/reports/tvadvmarket.pdf

TV advertising revenue
[Figure (p.81)]


Re-analysis of UK M1
Two ‘forecasting’ models of UK M1: pre/post 1984 Banking Act which legalized interest payments on checking accounts.
Data quarterly, SA, over 1963(1)–1989(2), defined as:
M: nominal M1,
I: real total final expenditure (TFE) at 1985 prices,
P: the TFE deflator,
$R_{LA}$: the three-month local authority interest rate,
$R_S$: learning-adjusted own interest rate,
$R_{NET}$: $R_{LA} - R_S$.
Model 1 uses competitive interest rate $R_{LA}$; model 2 uses opportunity-cost measure $R_{NET}$.
Model 1 excellent over 1963(3) to 1983(2).
But forecasts fail badly to 1989(2).

Money, Income, Inflation & Interest rates
[Figure (p.84): four panels over 1960–1990: $(m-p)_t$ and $x_t$; $\Delta(m-p)_t$ and $\Delta x_t$; $\Delta p_t$ and $R_{n,t}$; $(m-p-x)_t$ and $-R_{n,t}$.]

Short-sample estimation and forecasts
[Figure (p.85): four panels; legends are Δ(m−p) with fitted values (1965–1990), residuals, Δ(m−p) × fitted and forecast values, and Δ(m−p) with fitted and forecast values (1985–1990).]

Forecast Failure
Figure 87, if shift not modeled:
(a) intercept falls towards growth rate of 0.007;
(b) recursive estimates of feedback coefficient, $\hat{\alpha}_1$, for initial cointegrating relation $e = m - p - i + 8.7\Delta p + 6.6 R_{LA}$ converge to 0 with loss of cointegration (Perron, 1989; Hendry and Neale, 1991);
(c,d) constancy tests increasingly strongly reject;
(e,f) ex post ‘tracking’ improves over ex ante;
(f) clearly see problem of EqCM leading to forecast of a fall when data rise.

Intercept, α1, constancy tests, fitted values
[Figure (p.87): six panels. (a) recursive intercept with ±2σ̂ bands and (b) recursive α̂1 with ±2σ̂ bands, 1983–1990; (c, d) 1-step residuals with ±2σ̂_t and forecast Chow tests at 1%, 1983–1990; (e) Δ(m−p) with ex post fitted values and (f) Δ(m−p) with ex ante fitted values, 1965–1990.]

Equilibrium correction
Highlights that equilibrium-correction mechanisms correct to specific equilibria; if equilibrium shifts, model makes large forecast errors.
Revised model avoids predictive failure.
In-sample model is unchanged with identical estimates: only alter measure of ‘opportunity cost of holding money’ from $R_{LA}$ to $R_{LA} - R_S$.
Figure 89 shows forecasts for extended model: failure has been removed.
No possible within-sample test of later behaviour.
Key concept of ‘extended constancy’.

Extended model with forecasts
[Figure (p.89): four panels; legends are Δ(m−p) with fitted values (1965–1990), Δ(m−p) × fitted and forecast values, residuals (1965–1990), and Δ(m−p) with forecasts (1985–1990).]


Model mis-specification & forecast failure
Mis-specification of zero-mean components not major source of failure.
But non-zero means interacting with breaks could precipitate failure: induces falsely included shifting variable.
Model mean shifts but the data mean does not: changes $E_m[y_t]$ when $E[y_t]$ unaltered, where $E_m[\cdot]$ is expectation based on model.

Estimation uncertainty
Secondary problem: variance terms $O(1/T)$ or $O(1/T^2)$.
Large difference $E[y_{T+h}] - E[\hat{y}_{T+h|T}]$ could induce forecast inaccuracy.
Neither collinearity nor lack of parsimony are culprits.
But interacting with breaks elsewhere problematic: both could induce ex ante forecast failure, less likely to get ex post non-constancy: re-estimation eliminates non-systematic effects.
Lack of parsimony may entail false inclusion: in larger sample, offending variable would be dropped.
May be why ‘prior’ restrictions on VARs help.

From prediction to forecasting
Four obvious steps from predictability to forecastability: specification, modeling, estimation and selection.
First relates to specification of $J_{t-1}$; second to formulation of $E_t[y_t \mid J_{t-1}]$ to capture breaks etc.; third to estimation of unknown parameters such as $\psi_t$; fourth to selecting relevant variables from $J_{t-1}$.
Consider conditional regression model:
\[ y_t = \beta' x_t + \nu_t \quad \text{where} \quad \nu_t \sim \mathsf{IN}[0, \sigma^2_v] \qquad (43) \]
with $x_t \sim \mathsf{IN}_k[\mu, \Sigma]$ independently of $\{\nu_t\}$ and:
\[ E[x_t x_t'] = \mu\mu' + \Sigma = \Omega. \qquad (44) \]

Derivation
Known value $x_{T+1}$, so forecast is:
\[ \hat{y}_{T+1|T} = \hat{\beta}' x_{T+1}, \qquad (45) \]
with conditional MSFE for $\hat{\nu}_{T+1|T} = y_{T+1} - \hat{y}_{T+1|T}$:
\[ M[\hat{\nu}_{T+1|T} \mid x_{T+1}] = \sigma^2_v\left(1 + T^{-1} x_{T+1}'\Omega^{-1} x_{T+1}\right). \qquad (46) \]
Unconditional MSFE simplifies to well-known formula:
\[ M[\hat{\nu}_{T+1|T}] = \sigma^2_v\left(1 + T^{-1}k\right). \qquad (47) \]
Problems of collinearity, mis-specification, and breaks all need to be addressed.
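Formula (47) can be checked by Monte Carlo. The sketch below assumes $\mu = 0$, $\Sigma = I_k$, $\beta$ a vector of ones, $T = 50$ and $k = 5$; all of these are illustrative choices rather than settings from the lectures. Because (47) is a large-sample approximation, the simulated MSFE should land close to, and typically slightly above, $\sigma^2_v(1 + k/T)$.

```python
import numpy as np

def mc_msfe(T=50, k=5, sigma_v=1.0, reps=4000, seed=1):
    """Simulated unconditional 1-step MSFE of an OLS forecast with known x_{T+1}."""
    rng = np.random.default_rng(seed)
    beta = np.ones(k)
    sq_err = np.empty(reps)
    for r in range(reps):
        X = rng.normal(size=(T + 1, k))                    # rows 0..T-1 in-sample, row T is x_{T+1}
        y = X @ beta + rng.normal(0.0, sigma_v, T + 1)
        b_hat, *_ = np.linalg.lstsq(X[:T], y[:T], rcond=None)
        sq_err[r] = (y[T] - X[T] @ b_hat) ** 2
    return float(sq_err.mean())

msfe = mc_msfe()
approx = 1.0 * (1 + 5 / 50)        # sigma_v^2 * (1 + k/T)
print(round(msfe, 3), approx)
```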

Collinearity
Factorize $\Omega = H'\Lambda H$ where $H'H = I_k$ and $Hx_t = z_t$, so:
\[ E[z_t z_t'] = \Lambda. \qquad (48) \]
Then:
\[ x_{T+1}'\Omega^{-1}x_{T+1} = x_{T+1}'H'\Lambda^{-1}Hx_{T+1} = z_{T+1}'\Lambda^{-1}z_{T+1} = \sum_{i=1}^{k} \frac{z_{i,T+1}^2}{\lambda_i}. \qquad (49) \]
On average, (48) entails $E[z_{i,T+1}^2] = \lambda_i$, so unconditionally:
\[ E[x_{T+1}'\Omega^{-1}x_{T+1}] = E\left[\sum_{i=1}^{k} \frac{z_{i,T+1}^2}{\lambda_i}\right] = k. \qquad (50) \]
Collinearity irrelevant to forecasting if the marginal process stays constant.

Changes in collinearity
If $\Omega$ changes to $\Omega^*$, $\Lambda$ changes to $\Lambda^*$ then:
\[ E\left[\sum_{i=1}^{k} \frac{z_{i,T+1}^2}{\lambda_i}\right] = \sum_{i=1}^{k} \frac{\lambda_i^*}{\lambda_i}, \qquad (51) \]
so:
\[ M[\hat{\nu}_{T+1|T}] \simeq \sigma^2_v\left(1 + T^{-1}\sum_{i=1}^{k} \frac{\lambda_i^*}{\lambda_i}\right). \qquad (52) \]
Change in smallest $\lambda_j$ induces biggest changes in $M[\hat{\nu}_{T+1|T}]$.
$\lambda_j = 0.0001 \to \lambda_j^* = 0.01$ implies $\lambda_j^*/\lambda_j = 100$: dramatic increase in uncertainty.
With updating:
\[ E[\hat{\epsilon}^2_{T+2|T+1}] = \sigma^2\left(1 + \sum_{i=1}^{n} \frac{\lambda_i^*}{T\lambda_i + \lambda_i^*}\right) \qquad (53) \]
Figure 97 illustrates: same equation in both blocks; but re-estimated 2 quarters longer in lower block.
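The arithmetic behind (50) and (52) is easy to tabulate. The sketch below evaluates the approximation for constant and for changed eigenvalues, assuming the eigenvectors $H$ stay fixed so that only $\Lambda$ changes; the eigenvalues other than the $0.0001 \to 0.01$ pair, and the values $T = 100$, $\sigma^2_v = 1$, are illustrative.

```python
import numpy as np

def eigen_ratio_sum(lam_insample, lam_forecast):
    """Sum of lambda*_i / lambda_i, i.e. E[z' Lambda^{-1} z] when E[z z'] = Lambda*."""
    return float(np.sum(np.asarray(lam_forecast) / np.asarray(lam_insample)))

def msfe_approx(sigma2, T, lam_insample, lam_forecast):
    """Approximate 1-step MSFE as in (52)."""
    return sigma2 * (1.0 + eigen_ratio_sum(lam_insample, lam_forecast) / T)

lam = np.array([2.0, 1.0, 0.0001])        # in-sample eigenvalues (high collinearity)
lam_star = np.array([2.0, 1.0, 0.01])     # smallest eigenvalue rises 100-fold

constant = msfe_approx(1.0, 100, lam, lam)        # ratios sum to k = 3
shifted = msfe_approx(1.0, 100, lam, lam_star)    # ratios sum to 1 + 1 + 100
print(constant, shifted)
```

With constant collinearity the small eigenvalue is harmless, but once it changes the MSFE roughly doubles in this example, even though no regression parameter has moved.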

UK house-price inflation forecast errors
[Figure (p.97): two blocks, labelled ‘From 1972(1)’ (upper) and ‘From 1972(3)’ (lower); each shows $\Delta p_h$ with fitted values and 1-step forecasts, and scaled residuals with forecast errors, 1965–1975.]

Impact of breaks in collinearity
[Figure (p.98): four panels of MSFE against time since break (1–9), titled ‘Largest λ ratio 0.1/0.9’, ‘Largest λ ratio 0.01/0.999’, ‘Largest λ ratio 0.9/0.1’ and ‘Largest λ ratio 0.999/0.01’; legend: in-sample $\hat{\beta}$, recursive $\hat{\beta}$, rolling $\tilde{\beta}$ with k = 10, rolling $\tilde{\beta}$ with k = 25.]

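Mis-specification
Partition $x_t' = (x_{1,t}' : x_{2,t}')$, where $k_1 + k_2 = k$. Forecast:
\[ \bar{y}_{T+1|T} = x_{1,T+1}'\bar{\beta}_1, \qquad (54) \]
where $\beta' = (\beta_1' : \beta_2')$. Let $\Omega = \Lambda$. Then for $\rho_{1y} = E[x_{1,t}y_t]$:
\[ \beta_{1,e} = E[\bar{\beta}_1] \simeq \Lambda_{11}^{-1}\rho_{1y} = \beta_1 + \Lambda_{11}^{-1}\Lambda_{12}\beta_2 = \beta_1, \qquad (55) \]
for $\Lambda_{12} = 0$. Forecast error at $T+1$ is $\bar{\omega}_{T+1|T} = y_{T+1} - \bar{y}_{T+1|T}$:
\[ \bar{\omega}_{T+1|T} = x_{1,T+1}'\left(\beta_1 - \bar{\beta}_1\right) + x_{2,T+1}'\beta_2 + \omega_{T+1}. \qquad (56) \]
Unconditional MSE is:
\[ E[\bar{\omega}^2_{T+1|T}] = \sigma^2_\omega\left(1 + T^{-1}\sum_{i=1}^{k_1} \frac{\lambda_i^*}{\lambda_i}\right) + \beta_2'\Lambda_{22}^*\beta_2. \qquad (57) \]
This trade-off central to forecast-model selection.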

Model selection for forecasting
Since $\beta_2'\Lambda_{22}\beta_2 = \sum_{i=k_1+1}^{k} \beta_i^2\lambda_i$ and letting:
\[ \tau^2_{\beta_j} \simeq \frac{T\beta_j^2\lambda_j}{\sigma^2_\omega}, \qquad (58) \]
then from (57):
\[ E[\bar{\omega}^2_{T+1|T} \mid x_{T+1}] \simeq E[\hat{\nu}^2_{T+1|T} \mid x_{T+1}] + \frac{\sigma^2_\omega}{T}\sum_{j=k_1+1}^{k}\left(\tau^2_{\beta_j} - 1\right)\frac{\lambda_j^*}{\lambda_j}. \qquad (59) \]
Minimize $E[\bar{\omega}^2_{T+1|T} \mid x_{T+1}]$ by omitting $x_j$ if $\tau^2_{\beta_j} < 1$.
Invaluable to eliminate or retain correctly if $\lambda_j^*/\lambda_j$ is large.
Cannot forecast better by dropping relevant collinear variables.
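The selection rule implied by (59) can be tabulated directly. The sketch below evaluates the approximate change in 1-step MSFE from omitting a single regressor $j$; the function name and the values $\sigma^2_\omega = 1$, $T = 100$ are illustrative assumptions.

```python
def msfe_change_from_omitting(tau2, lam_ratio, sigma2=1.0, T=100):
    """Approximate change in 1-step MSFE when regressor j is omitted, as in (59)."""
    return (sigma2 / T) * (tau2 - 1.0) * lam_ratio

for tau2 in (0.25, 1.0, 4.0):
    print(tau2,
          msfe_change_from_omitting(tau2, lam_ratio=1.0),     # collinearity unchanged
          msfe_change_from_omitting(tau2, lam_ratio=100.0))   # smallest eigenvalue up 100-fold
```

The sign depends only on whether $\tau^2_{\beta_j}$ is below or above one, while the eigenvalue ratio scales the size of the gain or loss, which is why a correct retain-or-eliminate decision matters most when collinearity changes.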

Collinearity & mis-specification
[Figure: panels plotted against time since break under changing collinearity; surviving panel titles include ‘τ² > 1, largest λ ratio = 9, changing collinearity’ and a panel with largest λ ratio 0.11.]

τ2 T Provides basis for general-to-simple approach: linear model embedded in non-linear general model


Mitigating forecast failure
To avoid systematic forecast failure:
Forecast location shifts;
Forecast ongoing effects when shifts occur;
Adapt rapidly to breaks.
Obvious problems when model is non-constant.
But problems remain even if model is constant.

Empirical illustration
Learning-adjusted interest rate on retail sight deposits at banks following Banking Act of 1984:
\[ R_{o,t} = w_t \cdot R_{S,t} \]
$w_t$ is weighting function representing agents’ learning about interest-bearing retail sight deposits:
\[ w_t = \left(1 + \exp[\alpha - \beta(t - t^* + 1)]\right)^{-1} \qquad (68) \]
for $t \geq t^*$, zero otherwise, when $t^* = 1984(3)$.
$\alpha$, $\beta$ estimated as in Hendry and Ericsson (1991).
Figure 123 shows four stages of (68): (a) none; (b) after 1 year’s information; (c) after 2 years’ information; (d) full (5 years).
Find (b) is enough to forecast rest of impact.
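The learning weight (68) is a logistic ‘ogive’ in the time elapsed since $t^*$. A minimal sketch follows, with time measured in quarters relative to $t^*$ and with purely illustrative values $\alpha = 5$ and $\beta = 1.2$; these stand in for the estimates referred to above and should not be read as the published ones.

```python
import numpy as np

def learning_weight(t, t_star, alpha, beta):
    """Weight w_t from (68): logistic in (t - t_star + 1) for t >= t_star, zero before."""
    t = np.asarray(t, dtype=float)
    w = 1.0 / (1.0 + np.exp(alpha - beta * (t - t_star + 1.0)))
    return np.where(t >= t_star, w, 0.0)

quarters = np.arange(-4, 20)                 # four quarters before t*, twenty from t* on
w = learning_weight(quarters, t_star=0, alpha=5.0, beta=1.2)
print(np.round(w[4:12], 3))                  # weights over the first two years after t*
```

The weight starts near zero, rises most steeply in the first year or so, and then saturates at one, which is the shape that lets an early estimate of the curve carry forecasts through the rest of the adjustment.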

M1 forecasts at 4 stages of ‘learning’
[Figure (p.123): four panels of Δ(m−p) forecasts over 1986–1989: no learning, 1-year learning, 2-year learning, full learning.]

Forecasting during a break
DGP:
\[ y_t = \alpha + \lambda\left[1 - \exp(-\psi[t - T + 1])\right]1_{\{t \geq T\}} + \epsilon_t \qquad (69) \]
$\epsilon_t \sim \mathsf{IN}[0, \sigma^2_\epsilon]$, and $1_{\{t \geq T\}} = D_{\{T\}}$ is indicator function/dummy.
Know break occurs at $T$, forecasting over $T+1, \ldots, T+4$.
Four alternative 1-step ahead forecasting devices:
(a) an intercept-corrected model, $y_t = \gamma + \delta D_{\{T\}} + v_t$;
(b) the differenced device, $\tilde{y}_{T+h|T+h-1} = y_{T+h-1}$;
(c) an estimated version of (69): $\hat{y}_{T+1|T} = \hat{\alpha} + \hat{\delta}$ at $T+1$, and $\hat{y}_{T+h|T+h-1} = \hat{\alpha} + \hat{\lambda}(1 - \exp(-(h+1)\hat{\psi}))$, $T+2$ on;
(d) ignoring the break.
Forecast $T+1$ from $T$, then $T+2$ from $T+1$, etc.
Fig. 125 shows mapping of (69) to UK $R_S$.

Artificial break function and RS
[Figure (p.125): $R_S \times 10$ and the artificial break function, plotted over periods 0–20.]

Simulation results
DGP is (69) for $\lambda = 1$, $\psi = 0.2$, $\sigma_\epsilon = 0.1$ and $T = 100$.
4 alternative forecast devices: fig. 127 summarises.
Estimating the ‘ogive’ break does yield a lower MSFE than mechanistic devices even 2 periods after the break.
Mainly due to reduction in bias but MSFEs are very similar.
All updating devices do substantially better than ignoring the break.
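Part of this experiment is simple to replicate. The sketch below simulates (69) with the stated $\lambda$, $\psi$, $\sigma_\epsilon$ and $T$, and compares only two of the four devices: the differenced device (b) and the unadjusted model (d), the latter taken here to forecast with the pre-break sample mean. Setting $\alpha = 0$, the number of replications and the seed are assumptions of the sketch; devices (a) and (c) are not implemented.

```python
import numpy as np

def break_experiment(alpha=0.0, lam=1.0, psi=0.2, sigma=0.1,
                     T=100, H=4, reps=5000, seed=2):
    """Mean error and MSFE at T+1..T+H for the differenced and unadjusted devices."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, T + H + 1)                               # t = 1, ..., T+H
    ogive = np.where(t >= T, 1.0 - np.exp(-psi * (t - T + 1)), 0.0)
    y = alpha + lam * ogive + rng.normal(0.0, sigma, (reps, T + H))
    actual = y[:, T:]                                         # y_{T+1}, ..., y_{T+H}
    differenced = y[:, T - 1:T + H - 1]                       # device (b): previous outcome
    unadjusted = y[:, :T - 1].mean(axis=1, keepdims=True)     # device (d): pre-break mean
    out = {}
    for name, fc in (("differenced", differenced), ("unadjusted", unadjusted)):
        err = actual - fc
        out[name] = (err.mean(axis=0), (err ** 2).mean(axis=0))   # (mean error, MSFE)
    return out

results = break_experiment()
me_d, msfe_d = results["differenced"]
me_u, msfe_u = results["unadjusted"]
print(np.round(me_d, 3), np.round(msfe_d, 3))
print(np.round(me_u, 3), np.round(msfe_u, 3))
```

The unadjusted model’s mean error grows as the break unfolds, while the differenced device only ever misses the latest increment of the ogive, at the cost of a doubled error variance.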

Mean error and MSFE for break DGP
[Figure (p.127): mean error (upper panel) and MSFE (lower panel) at T+1 to T+4 for four devices: intercept correction, estimated DGP, differenced device, unadjusted model.]

Learning weights and interest rates
[Figure (p.128)]

4-step forecasts of real money
[Figure (p.129)]

1-step forecasts of UK M1
[Figure (p.130): two panels over 1983–1990: (a) Δ(m−p) with EqCM[$R_{LA}$] forecasts; (b) Δ(m−p) with DEqCM[$R_{LA}$] forecasts.]

Forecast errors from all UKM1 models
[Figure (p.131): four panels (a)–(d) of forecast errors over 1984–1990; legends: EqCM[$R_{la}$], DEqCM[$R_{la}$], DDV, ADV, EqCM[$R_{net}$], DEqCM[$R_{net}$], VEqCM[$R_{net}$], DVEqCM[$R_{net}$].]


Pooling of forecasts
Combination of individual forecasts often outperforms: delivers smaller MSFE than individual forecasts.
Simple rules (averages) often work well: why?
If models are partial explanations, combination might improve: e.g. offsetting biases.
If all explanatory variables orthogonal, and models are subsets, combination reflects information.
But why best, particularly if some produce poor forecasts?
Averaging reduces variance: but parameter estimation uncertainty not key.
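The ‘offsetting biases’ case is the easiest one to illustrate. The sketch below is a stylized example, assuming two forecasts whose errors have equal and opposite biases (±0.5) and independent noise; these settings are chosen to make the point and are not drawn from the lectures.

```python
import numpy as np

def msfe(outcome, forecast):
    """Mean square forecast error."""
    return float(np.mean((outcome - forecast) ** 2))

rng = np.random.default_rng(3)
n = 20000
y = rng.normal(0.0, 1.0, n)                    # outcomes to be forecast
f1 = y + 0.5 + rng.normal(0.0, 0.5, n)         # over-predicts by 0.5 on average
f2 = y - 0.5 + rng.normal(0.0, 0.5, n)         # under-predicts by 0.5 on average
pooled = 0.5 * (f1 + f2)                       # simple average of the two

msfe_1, msfe_2, msfe_pooled = msfe(y, f1), msfe(y, f2), msfe(y, pooled)
print(round(msfe_1, 3), round(msfe_2, 3), round(msfe_pooled, 3))
```

Here the average removes the bias and halves the noise variance. If both forecasts were biased in the same direction, as after an unexpected break, averaging would reduce only the variance component.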

Properties
If unexpected break, most forecasts fail in same direction: but mis-specifications interacting with repeated breaks could account for the benefits of pooling.
Additional forecasts act like intercept corrections (ICs): improve forecasts if there are structural breaks.
Pooling is like ‘shrinkage’ estimation: unknown future value is ‘meta-parameter’, so average may provide a ‘better’ estimate thereof.
In a weakly-stationary process, combination ‘insures’ against worst forecasts.
Encompassing violated by need to pool: but encompassed models may later become dominant.

Role of encompassing
If fixed weights and constant DGP: only non-encompassed models worth pooling.
But if location shifts occur, an encompassed model may later dominate.
Since breaks pandemic in macroeconomics, no general results for forecast encompassing.
If model averaging: average only over ‘reasonable’ models or get counter outcomes.
Analogy: mix a glass of poison with n glasses of pure water then drink.

Averaging forecasting models
Debate between model averaging and selection.
See Hansen (2007), Raffalovich, Deane, Armstrong and Tsao (2001), Jacobson and Karlsson (2004).
Averaging without selection has obvious drawbacks:
extreme of averaging over all $2^K$ models in a class: Hoeting, Madigan, Raftery and Volinsky (1999);
easy to create counter examples: Hendry and Reade (2006).
Some selection seems essential.
But bad selection also causes bad averaging, so good selection required to lead to good averaging, but begs the question ‘why bother averaging’?

Methods of averaging and selection
Summarize various modeling methods by choice of weights $w_m$:
Model selection: $w_m = 1$ if $m = m^*$, and $0$ otherwise.
Simple averaging: $w_m = 2^{-K}$.
Weighted averaging: $w_m = C_m / \sum_{l \in M} C_l$.
Selection/averaging combination: $w_m = C_m / \sum_{l \in M_s} C_l$ if $m \in M_s$, and $0$ if $m \notin M_s$.
Weighted averaging carried out using some criterion $C_m$, scaled by weights on models in model set $M_s \subset M$.
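The four weighting schemes differ only in how the vector $w_m$ is filled in. A minimal sketch follows; the function names are hypothetical, the criterion values are made up, and taking $m^*$ to be the model with the largest $C_m$ is an assumption of the sketch rather than something the slides specify.

```python
import numpy as np

def selection_weights(criterion):
    """All weight on the single model with the largest criterion value (assumed m*)."""
    c = np.asarray(criterion, dtype=float)
    w = np.zeros_like(c)
    w[np.argmax(c)] = 1.0
    return w

def simple_weights(K):
    """Equal weights 2**-K over all 2**K models in the class."""
    return np.full(2 ** K, 2.0 ** -K)

def criterion_weights(criterion, keep=None):
    """Weights proportional to C_m, optionally restricted to a selected subset M_s."""
    c = np.asarray(criterion, dtype=float)
    mask = np.ones(c.shape, dtype=bool) if keep is None else np.asarray(keep, dtype=bool)
    w = np.where(mask, c, 0.0)
    return w / w.sum()

C = [1.0, 2.0, 3.0, 4.0]                          # made-up criterion values for 2**2 models
w_select = selection_weights(C)
w_simple = simple_weights(2)
w_weighted = criterion_weights(C)
w_combined = criterion_weights(C, keep=[False, False, True, True])
print(w_select, w_simple, w_weighted, w_combined)
```

A pooled forecast is then the weighted sum of the individual model forecasts, so the same forecasting code can serve all four schemes by swapping the weight vector.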

Model averaging and selection
GUM in (70) has $\beta_i = \tau_{\beta_i}/\sqrt{T}$ with $\tau_{\beta_i} = 0.1, 1, 2, 3, 4, 5, 10$ ($\tau_{\beta_i}$ is population t-ratio):
\[ y_t = \beta_0 + \sum_{i=1}^{5}\beta_i z_{i,t} + \sum_{j=1}^{5}\beta_j z_{j,t} + \epsilon_t \qquad (70) \]
All variables enter demeaned.
Figure plots 1-step MSFEs.
In most cases GUM forecasts better than averaging.
Autometrics is started from GUM: report both selected model and terminal models from tree-search, averaged using SIC.

Simulation graphics
[Figure (p.139): 1-step MSFE (1.0–2.0) against population t-value (0–10) for: all $2^k$ models with equal weights and with SC weights; unconditional BMA; conditional BMA; Autometrics; GUM; DGP.]

Simulation findings
Autometrics forecasts have ‘hump’ around t = 2, partly as no bias correction.
BMAs also have ‘hump’, but around t = 4.
Model averaging less useful as significance of $z_{i,t}$ increases.
Using equal $2^{-K}$ weights or SIC weights, MSFEs increase exponentially.
Autometrics provides a better, more consistent, performance.
When $\tau_{\beta_i} < 2$, model averaging can outperform GUM, and when $0 < \tau_{\beta_i} < 4$ can outperform Autometrics.
When $\tau_{\beta_i} = 0$, Autometrics selects null model; unconditional model averaging averages over all models.

Changing selection probabilities
[Figure (p.141): MSFE (0.975–1.100) against t-value (0–8) for selection at significance levels 5%, 10%, 16%, 25%, 50% and 75%, with GUM and DGP shown for comparison.]

Forecasting under non-stationarity
GUM has $\beta_i = 3/\sqrt{T}$ and $\beta_j = 0$; $d_t = 1_{\{t > T-4\}}$ is step dummy:
\[ y_t = \beta_0 + \sum_{i=1}^{4}\beta_i z_{i,t} + \sum_{j=1}^{5}\beta_j z_{j,t} + \gamma d_t + \epsilon_t \qquad (71) \]
All variables enter demeaned.
Correlation of $\rho = 0.05$ between all variables.
Largest break of 4.8 is 2.4 standard deviations.
Once break exceeds 1.25 SDs Autometrics outperforms model averaging.
Hump is Autometrics at 5% not selecting indicator sufficiently often.
For very small breaks, Autometrics and model averaging outperform forecasting from DGP.

Structural breaks graphics
[Figure (p.143): MSFE (1.2–2.4) against t-value (0–5) for all $2^k$ models, Autometrics, GUM and DGP. Design: 4 significant variables with t = 3; 1 step dummy (centred, step at T−4 = 97, t = 0.005, ..., 1.2); 5 irrelevant variables.]


Forecasting volatility
Reconsider GARCH(1,1) process:
\[ \sigma^2_t = \phi_0 + \phi_1 u^2_{t-1} + \phi_2\sigma^2_{t-1}. \qquad (72) \]
Long-run variance is $\phi_0/(1 - \phi_1 - \phi_2) > 0$: equilibrium-correction model.
Forecast of next period’s volatility is:
\[ \hat{\sigma}^2_{T+1|T} = \hat{\phi}_0 + \hat{\phi}_1\hat{u}^2_T + \hat{\phi}_2\hat{\sigma}^2_T. \qquad (73) \]
Confronts every problem noted for forecasts of mean: breaks in $\phi_0$, $\phi_1$, $\phi_2$; mis-specification of variance evolution; estimation uncertainty; etc.

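Shifts in variance
Consider a shift in $\omega, \alpha, \beta$ to $\omega^*, \alpha^*, \beta^*$ at $T$ (writing $\omega = \phi_0$, $\alpha = \phi_1$, $\beta = \phi_2$ and $\varphi = \omega/(1 - \alpha - \beta)$ for the long-run variance), so:
\[ \sigma^2_{T+1} = \varphi^* + \alpha^*\left(u^2_T - \sigma^2_T\right) + (\alpha^* + \beta^*)\left(\sigma^2_T - \varphi^*\right). \qquad (74) \]
Let subscript $p$ denote the plim.
The 1-step forecast-error taxonomy for $\sigma^2_{T+1} - \hat{\sigma}^2_{T+1|T}$ follows.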
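The equilibrium-correction form is algebraically identical to the usual GARCH(1,1) recursion, which a few lines confirm. In this sketch the function names and the parameter values ($\omega = 0.1$, $\alpha = 0.1$, $\beta = 0.8$) are illustrative assumptions, and $\omega$, $\alpha$, $\beta$ play the roles of the intercept, the coefficient on $u^2_T$ and the coefficient on $\sigma^2_T$.

```python
def long_run_variance(omega, alpha, beta):
    """Long-run variance omega / (1 - alpha - beta); requires alpha + beta < 1."""
    if alpha + beta >= 1.0:
        raise ValueError("long-run variance undefined when alpha + beta >= 1")
    return omega / (1.0 - alpha - beta)

def garch_forecast(omega, alpha, beta, u2_T, sigma2_T):
    """1-step variance forecast from the standard GARCH(1,1) recursion."""
    return omega + alpha * u2_T + beta * sigma2_T

def garch_forecast_eqcm(omega, alpha, beta, u2_T, sigma2_T):
    """The same forecast written in equilibrium-correction form."""
    lrv = long_run_variance(omega, alpha, beta)
    return lrv + alpha * (u2_T - sigma2_T) + (alpha + beta) * (sigma2_T - lrv)

standard = garch_forecast(0.1, 0.1, 0.8, u2_T=2.0, sigma2_T=1.5)
eqcm_form = garch_forecast_eqcm(0.1, 0.1, 0.8, u2_T=2.0, sigma2_T=1.5)
print(standard, eqcm_form, long_run_variance(0.1, 0.1, 0.8))
```

Writing the forecast this way makes the parallel with the mean case explicit: a shift in the long-run variance plays the same role as a shift in the equilibrium mean of an EqCM.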

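Variance-forecast error taxonomy
$\sigma^2_{T+1} - \hat{\sigma}^2_{T+1|T} =$
$(1 - (\alpha^* + \beta^*))(\varphi^* - \varphi)$ : long-run mean shift, [1]
$+ (1 - (\hat{\alpha} + \hat{\beta}))(\varphi - \varphi_p)$ : long-run mean inconsistency, [2]
$+ (1 - (\hat{\alpha} + \hat{\beta}))(\varphi_p - \hat{\varphi})$ : long-run mean variability, [3]
$+ (\alpha^* - \alpha)(u^2_T - \sigma^2_T)$ : $\alpha$ shift, [4]
$+ (\alpha - \alpha_p)(u^2_T - \sigma^2_T)$ : $\alpha$ inconsistency, [5]
$+ (\alpha_p - \hat{\alpha})(u^2_T - \sigma^2_T)$ : $\alpha$ variability, [6]
$+ \hat{\alpha}(u^2_T - E_T[\hat{u}^2_T])$ : impact inconsistency, [7]
$+ \hat{\alpha}(E_T[\hat{u}^2_T] - \hat{u}^2_T)$ : impact variability, [8]
$+ [(\alpha^* + \beta^*) - (\alpha + \beta)](\sigma^2_T - \varphi)$ : variance shift, [9]
$+ [(\alpha + \beta) - (\alpha_p + \beta_p)](\sigma^2_T - \varphi)$ : variance inconsistency, [10]
$+ [(\alpha_p + \beta_p) - (\hat{\alpha} + \hat{\beta})](\sigma^2_T - \varphi)$ : variance variability, [11]
$+ \hat{\beta}(\sigma^2_T - E_T[\hat{\sigma}^2_T])$ : $\sigma^2_T$ inconsistency, [12]
$+ \hat{\beta}(E_T[\hat{\sigma}^2_T] - \hat{\sigma}^2_T)$ : $\sigma^2_T$ variability, [13].
(75)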

Implications
In practice, $\hat{\phi}_1 + \hat{\phi}_2$ often close to unity, and $\hat{\phi}_0$ is small.
Rather like unit root in AR(1) from location shifts.
Usually miss jumps in volatility: capture high/low phases.
Consider equivalent of $\Delta^2\hat{x}_{T+1|T} = 0$, namely:
\[ \tilde{\sigma}^2_{T+1|T} = \hat{\sigma}^2_T. \qquad (76) \]
Extrapolate latest volatility estimate: track main changes.
All earlier ‘tricks’ seem to apply again.


Summary
Four provable theorems underpin our present knowledge.
The first theorem from §32 is that: more information does not worsen predictability, even in intrinsically non-stationary processes.
But more information need not improve forecasting.
Second theorem applies only to stationary processes (§42): retaining all variables with non-centralities $\tau^2 > 1$ in their $t^2$-distributions dominates in 1-step forecasting on MSFE.
$\tau^2 > 1$ implies expected $t^2 > 2$, like AIC (see Akaike, 1985).
These theorems counter conventional wisdom on parsimony in forecasting.

Two more theorems
Third theorem in §32 returns to predictability: less information should not induce predictive failure.
Costs of reducing information are inefficiency, not failure: parsimony need not be too expensive.
The fourth theorem, in §42, is negative counter to first: if DGP lacks time invariance, in-sample dominant ‘causal’ models need not outperform naive devices in forecasting.
Thus, for forecasting variables from processes that lack time invariance, theoretical analysis cannot establish the pre-eminence of more information over less.
Implication of four theorems for forecasting: transform congruent encompassing causal models to achieve robustness to ex post breaks: §103.

Conclusions Despite weak assumptions of non-stationary economy, subject to unanticipated structural breaks, model differs from DGP in unknown ways, selected and estimated from unreliable data, can derive many useful insights. Implications differ considerably from: model = DGP or constant mechanism. Fundamental concepts point towards problems confronting successful forecasting. Econometric systems should outperform—but do not. Causal information swamped by unmodeled breaks. Strategy: retain former yet avoid systematic failure.


Conclusions on differencing
$\Delta^2 x_t$ representations are knowingly mis-specified in-sample, with highly restricted $J_{t-1}$, yet help avoid systematic forecast failure.
Illustrated by UK M1 after Banking Act of 1984.
Adaptations behaved as anticipated from theory, and showed the difficulty of out-performing ‘naive devices’, which are adaptive to location shifts.
Transformations may be best route to retain causal information when model mis-specification unknown.
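The adaptive behaviour of differenced devices can be illustrated with a minimal sketch. It assumes a noise-free series with a hypothetical one-off level shift (the numbers are illustrative and are not the UK M1 data), and it reads the double-differenced device as setting the forecast of $\Delta^2 x_{T+1}$ to zero, so that the forecast is $x_T + \Delta x_T$. The function names are hypothetical.

```python
def mean_forecast(history):
    """Forecast the next value with the full-sample mean (constant-mean model)."""
    return sum(history) / len(history)


def d2_forecast(history):
    """Forecast from the double-differenced device: x_T + (x_T - x_{T-1})."""
    return history[-1] + (history[-1] - history[-2])


# Hypothetical series: level 10 for 20 periods, then a location shift to 15.
series = [10.0] * 20 + [15.0] * 6
shift_index = 20

mean_errors, d2_errors = [], []
for origin in range(shift_index, len(series)):
    history = series[:origin]          # data available at the forecast origin
    mean_errors.append(series[origin] - mean_forecast(history))
    d2_errors.append(series[origin] - d2_forecast(history))

print("constant-mean errors:", [round(e, 3) for e in mean_errors])
print("double-difference errors:", d2_errors)
```

In this sketch the constant-mean model keeps under-forecasting after the shift, whereas the differenced device errs only in the two periods around the shift and is then back on track, despite being mis-specified for the pre-shift data.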


Conclusions on policy
Causal information need not dominate non-causal: best forecasting model not necessarily best for policy.
Unanticipated location shifts (deterministic) pernicious for forecasting; other parameter changes (stochastic) pernicious for policy.
Congruent models may not outperform non-congruent; mis-specification testing need not improve forecasts; equilibrium-mean shifts induce forecast failure: cointegration helpful only if it remains constant.
Co-breaking suggests causal information may be valuable.


Conclusions on modeling
Surprisingly: poor methods, bad models, inaccurate data, and data-based selection not primary causes of systematic mistakes.
Main causes are unanticipated large changes affecting forecast period.
Primary fault in economic forecasting: not rapidly adjusting forecasts once they go wrong.
Unanticipated departures of E[yt] from E[yt] main source of systematic forecast failure.
Forecasting success/failure inadequate for model choice: focus on ‘out-of-sample’ performance unsustainable.
Greater reliance on economic theory cannot solve problem.


Conclusions on forecasting breaks
Whether breaks are predictable from relevant information available at the forecast origin remains unknown as yet.
But progress in developing forecasting models, and in methods of testing for and selecting such models.
Predictability theory: two information sets, regular and shifts; model latter as non-linear ogive.
Handle non-normality by impulse saturation: size is controlled, distributions well behaved, and power for a range of interesting shifts.
Leads to new automatic test of super exogeneity.
Applies to more variables than observations and hence invaluable in non-linear modeling.


Conclusions: external breaks
Analysis of changing collinearity in mis-specified models: minimize $E[\epsilon^2_{T+1|T}]$ by eliminating regressors with $\tau^2_{\beta_j} < 1$.
Adverse impact on MSFE if collinearity changes: cannot forecast better simply by dropping collinear variables if $\tau^2_{\beta_j} > 1$.
Data-based orthogonal transformations may change an external break into an internal one.
Immediate updating invaluable.
Even more valuable to correctly eliminate or retain variables when $\lambda^*_n/\lambda_n$ is large.


Conclusions: internal breaks
Location shifts key cause of forecast failure.
Several approaches to alleviate impact following a break: updating; intercept corrections; differencing; estimating future impact of break during progress.
Model of break process outperforms mechanistic corrections but requires 3 periods into evolving break.
Break course can be tracked, even if break cannot be predicted.
Improves in practice for UK M1, but not greatly over differencing.
Analysis of location shifts in VEqCMs reveals why differencing VEqCM for forecasting may be valuable.


Conclusions on modeling for breaks
Extended automatic Gets algorithm for non-linear functions.
Approximations capture unknown non-linear functional form: initial focus on polynomials; final selection ‘ogive’ like.
Portmanteau test of functional form with just 2n terms.
Approximate a wide range of non-linear models.
Operational rules to reduce collinearities.
Indicator saturation to remove extreme observations.
Large number of potential non-linear variables: super-conservative strategy and handle N > T.
Encompassing tests against ‘ogive’ of choice.
Non-linear capability may help to forecast breaks.


Conclusions on model averaging
First, minor levels of collinearity produce poor forecasts using simple model averaging, either weighted or not.
Second, consistent selection, as embodied in Autometrics but not Occam’s window, provides effective forecasts whether one selects a single model or a range of models to average over.
Third, model averaging, either weighted or not, performs badly when outliers are not modeled.


Final conclusions
Econometric systems have: cointegration, co-breaking, model selection, and intercept corrections.
But forecasting models must be robust to breaks.
Conclusions have positive prescriptions; many are surprising; and many criticisms have clear remedies.


Rerouting
(A) Introduction and background
(B) Measuring forecast accuracy
(C) Unpredictability
(D) Implications for forecasting: taxonomies
(E) Implications for models of expectations
(F) Forecasting models and forecast failure
(G) Re-analysis of UK M1
(H) Other potential sources of forecast failure
(I) Potential improvements to forecasting devices
(J) Forecasting (during) breaks
(K) Pooling of forecasts
(L) Forecasting volatility
Conclusion


References
Akaike, H. (1985). Prediction and entropy. In Atkinson, A. C., and Fienberg, S. E. (eds.), A Celebration of Statistics, pp. 1–24. Springer-Verlag.
Castle, J. L., and Hendry, D. F. (2010). A low-dimension, portmanteau test for non-linearity. Journal of Econometrics, DOI:10.1016/j.jeconom.2010.01.006.
Castle, J. L., and Hendry, D. F. (2008). Forecasting UK inflation. In Rapach, D. E., and Wohar, M. E. (eds.), Forecasting in the Presence of Structural Breaks and Model Uncertainty, pp. 41–92. Emerald Group.
Clements, M. P., and Hendry, D. F. (1999). Forecasting Non-stationary Economic Time Series. MIT Press.
Clements, M. P., and Hendry, D. F. (2005). Information in economic forecasting. OxBull, 67, 713–753.
Collier, P., and Hoeffler, A. (2007). Civil war. In Sandler, T., and Hartley, K. (eds.), Handbook of Defense Economics, Chapter 22. Elsevier.
Hansen, B. E. (1999). Discussion of ‘Data mining reconsidered’. EctJ, 2, 26–40.
Hansen, B. E. (2007). Selection and combination of unit root and stationary autoregressions. Madison.
Hendry, D. F. (2006). Robustifying forecasts from EqCMs. JEcts, 135, 399–426.
Hendry, D. F., and Ericsson, N. R. (1991). Demand for narrow money in UK and US. EER, 35, 833–886.
Hendry, D. F., and Neale, A. J. (1991). Effects of structural breaks on tests for unit roots. In Hackl, P., and Westlund, A. H. (eds.), Economic Structural Change, pp. 95–119. Springer-Verlag.
Hendry, D. F., and Reade, J. J. (2006). Forecasting using model averaging. Oxford.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian model averaging. Statistical Science, 14, 382–417.
Jacobson, T., and Karlsson, S. (2004). Finding good predictors for inflation. JForc, 23, 479–496.
Perron, P. (1989). The Great Crash, the oil price shock and the unit root hypothesis. Econometrica, 57, 1361–1401.
Raffalovich, L., Deane, D., Armstrong, D., and Tsao, H.-S. (2001). Model selection procedures. WP 2005/16, CSDAs, SUNY, Albany.


Measurement errors versus breaks
Signal extraction problem: if $\tilde{y}_{i,T|T-\delta}$ differs significantly from $y_{i,T}$:
outlier due to measurement error?
more permanent location shift?
combination of both?
Observationally equivalent with one data point.
Measurement errors and location shifts at $T$ require different forecasting models.
Location shift requires adding intercept correction (IC); measurement errors require subtracting IC to offset error.
Resolving conflict between opposite responses is central to accurate nowcasting.


Distinguishing measurement errors from breaks
Measurement error at T does not ‘carry forward’, although its effects will in a dynamic process;
data at T usually revised: revisions to errors at T + 1 informative about source of error from T to T + 1;
next nowcast error (from T + 1 to T + 2) large if source is location shift, so repeated mis-forecasting indicative of location shift;
extraneous contemporaneous data help to discriminate: discrepancies from existing models persist or disappear;
variance of measurement errors usually changes as the forecast origin approaches.


Analyzing measurement errors versus breaks
Simplest model:
$y_t = y^* + \epsilon_t$ where $\epsilon_t \sim \mathsf{IN}[0, \sigma^2]$   (77)
Three cases:
$y^*$ is constant – pure measurement error;
$\epsilon_t = 0 \;\forall t$ but $y^*$ alters – pure location shift;
both $y^*$ shifts and $\epsilon_t \neq 0$.
Nowcast at $T$ from estimation sample $1, \ldots, T - \delta$ for small $\delta$.
Sequence of nowcasts as if in real time, at $T + 1$, $T + 2$ etc.


A. Measurement errors only
$E_{T+1}[y_{T+1} \mid I_{T+1}] = E_{T+1}[(y^* + \epsilon_{T+1})] = y^*$   (78)
Despite measurement error, 1-step minimum MSE forecasting device: $\hat{y}_{T+1|T} = \bar{y}_{(T)} = \frac{1}{T}\sum_{t=1}^{T} y_t$. Thus:
$E_{T+1}[\hat{y}_{T+1|T} \mid I_{T+1}] = y^*$
$V_{T+1}[\hat{y}_{T+1|T}] = \sigma^2 \left(1 + \frac{1}{T}\right)$
Full-sample estimate $\bar{y}_{(T)}$ of $y^*$ is unbiased predictor, with smallest variance of any sample size choice, which is just $\frac{1}{T}$ (in units of $\sigma^2$) larger than nowcast from the known location $y^*$.
Largest sample is best.
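A small Monte Carlo check of case A can make the variance result concrete. This is a minimal sketch under stated assumptions: it reads the variance expression as the variance of the 1-step forecast error $y_{T+1} - \hat{y}_{T+1|T}$, and the values of $T$, $\sigma$, $y^*$, the replication count and the function name are all hypothetical choices for illustration.

```python
import random
import statistics


def simulate_error_variance(T, sigma, y_star, reps, seed=0):
    """Simulate 1-step errors of the full-sample mean under pure measurement error."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        # In-sample data: constant location y* plus IN[0, sigma^2] errors.
        sample = [y_star + rng.gauss(0.0, sigma) for _ in range(T)]
        forecast = sum(sample) / T                  # full-sample mean
        outcome = y_star + rng.gauss(0.0, sigma)    # y_{T+1}
        errors.append(outcome - forecast)
    return statistics.fmean(errors), statistics.pvariance(errors)


T, sigma, y_star = 10, 1.0, 3.0
mean_err, var_err = simulate_error_variance(T, sigma, y_star, reps=20000)
theory = sigma**2 * (1 + 1 / T)

print(f"mean error {mean_err:.3f}, simulated variance {var_err:.3f}, theory {theory:.3f}")
```

The simulated mean error should be close to zero and the simulated variance close to $\sigma^2(1 + 1/T)$; the gap to $\sigma^2$ shrinks as $T$ grows, which is the sense in which the largest sample is best here.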


B. Location shift only
Best predictor of $y_{T+1}$ is $\tilde{y}_{T+1|T} = y_T$.
When no location shifts, $\tilde{y}_{T+1|T} = y_T = y^*$ is exact;
When shift at $T$ ($\nabla_{(T)} y^*$) a one-period mistake is made: $y_{T+1} - \tilde{y}_{T+1|T} = y_{T+1} - y_T = \nabla_{(T)} y^*$
Nowcast at $T + 2$ unbiased if process remains constant: $y_{T+2} - \tilde{y}_{T+2|T+1} = y_{T+2} - y_{T+1} = 0$.
Best 1-step predictor is lagged value, uses smallest possible sample size – final observation – ignoring all earlier observations.
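The one-period mistake of the lagged-value predictor in case B can be traced on a toy series. This sketch assumes no measurement error and a single hypothetical shift in $y^*$ from 2 to 5; the series and the function name are illustrative only.

```python
def last_value_errors(series):
    """Return the 1-step errors y_{t+1} - y_t of the lagged-value predictor."""
    return [nxt - cur for cur, nxt in zip(series, series[1:])]


# Hypothetical noise-free series: y* = 2 for ten periods, then y* = 5.
series = [2.0] * 10 + [5.0] * 5
errors = last_value_errors(series)

# Positions (index of the predicted observation) where the predictor misses.
misses = [(i + 1, e) for i, e in enumerate(errors) if e != 0.0]
print("errors:", errors)
print("misses:", misses)
```

Only the forecast made across the shift is wrong, by exactly the size of the shift; every other forecast is exact, because the predictor uses nothing older than the final observation.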


3. Cross solutions
A.] Use $\tilde{y}_{T+1|T} = y_T$ but only measurement error:
$E_{T+1}[\tilde{y}_{T+1|T}] = E_{T+1}[y_T] = y^*$
$V_{T+1}[\tilde{y}_{T+1|T}] = 2\sigma^2$
Predictor unbiased but variance increases by $(T - 1)/T$ (in units of $\sigma^2$).
B.] Use $\hat{y}_{T+1|T}$ when $\epsilon_t = 0 \;\forall t$ but location shift ($\nabla_{(\tau^*)} y^*$):
$E_{T+1}[y_{T+1} - \hat{y}_{T+1|T}] = \frac{\tau^*}{T} \nabla_{(\tau^*)} y^*$
$\bar{y}_{(T)}$ is biased predictor – impact of in-sample break under-estimated when not modelled.
Either model break or use rolling sample spanning post-break period.
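The bias in cross solution B can be checked by direct arithmetic. This is a minimal sketch that assumes $\tau^*$ counts the pre-break observations (so the shift takes effect from observation $\tau^* + 1$) and that no further shift occurs at $T + 1$; the numbers and function names are hypothetical.

```python
def build_sample(T, tau_star, y_star, shift):
    """Noise-free sample: y* for the first tau_star periods, y* + shift afterwards."""
    return [y_star] * tau_star + [y_star + shift] * (T - tau_star)


def full_sample_error(T, tau_star, y_star, shift):
    """1-step error of the full-sample mean after an unmodelled in-sample shift."""
    sample = build_sample(T, tau_star, y_star, shift)
    return (y_star + shift) - sum(sample) / T


def rolling_error(T, tau_star, y_star, shift, window):
    """1-step error of a rolling mean over the last `window` observations."""
    recent = build_sample(T, tau_star, y_star, shift)[-window:]
    return (y_star + shift) - sum(recent) / len(recent)


T, tau_star, y_star, shift = 20, 15, 2.0, 4.0
bias = full_sample_error(T, tau_star, y_star, shift)
formula = tau_star / T * shift
rolling = rolling_error(T, tau_star, y_star, shift, window=T - tau_star)

print(f"full-sample bias {bias}, formula {formula}, rolling-window error {rolling}")
```

With these numbers the full-sample mean misses by 3, matching $\tau^* \nabla_{(\tau^*)} y^* / T$, while a rolling window confined to the post-break observations has zero error, in line with the slide's advice to use a rolling sample spanning the post-break period.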
