Statistics 203: Introduction to Regression and Analysis of Variance
Time Series Regression
Jonathan Taylor
Today’s class

■ Regression with autocorrelated errors
  ● Autocorrelation
  ● Durbin-Watson test for autocorrelation
  ● Correcting for AR(1) in regression model
  ● Two-stage regression
  ● Other models of correlation
  ● More than one time series
■ Functional data
  ● Functional Data
  ● Scatterplot smoothing
  ● Smoothing splines
  ● Kernel smoother
Autocorrelation
■ In the random effects model, outcomes within groups were correlated.
■ Other regression applications also have correlated outcomes (i.e. correlated errors). A common example: time series data.
■ Why worry? Autocorrelation can lead to underestimates of the SE, hence inflated t statistics and false positives.
Durbin-Watson test for autocorrelation
■ In the regression setting, if the noise is AR(1), a simple estimate of ρ is obtained by (essentially) regressing e_t onto e_{t−1}:
    ρ̂ = ∑_{t=2}^n e_t e_{t−1} / ∑_{t=1}^n e_t².
■ To formally test H_0 : ρ = 0 (i.e. whether the residuals are independent vs. AR(1)), use the Durbin-Watson test, based on
    d = ∑_{t=2}^n (e_t − e_{t−1})² / ∑_{t=1}^n e_t² ≈ 2(1 − ρ̂).
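A minimal numpy sketch of these two quantities, computed from a vector of regression residuals (the function name is illustrative):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson statistic d and the AR(1) estimate rho_hat,
    computed from a vector of regression residuals e."""
    e = np.asarray(e, dtype=float)
    # rho_hat: (essentially) the regression of e_t onto e_{t-1}.
    rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e**2)
    # d: sum of squared successive differences, scaled; d ~ 2 * (1 - rho_hat).
    d = np.sum(np.diff(e)**2) / np.sum(e**2)
    return d, rho_hat
```

Values of d near 2 are consistent with independent residuals; d well below 2 suggests positive autocorrelation.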
Correcting for AR(1) in regression model
■ If we know ρ, it is possible to “pre-whiten” the data and regressors:
    Ỹ_{i+1} = Y_{i+1} − ρ Y_i,  i ≥ 1
    X̃_{(i+1)j} = X_{(i+1)j} − ρ X_{ij},  i ≥ 1
■ The transformed model then satisfies the “usual” assumptions.
■ For the coefficients β̃ in the new model: β_0 = β̃_0 / (1 − ρ), β_j = β̃_j for j ≥ 1.
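A minimal numpy sketch of the transformation (illustrative, assuming ρ is known):

```python
import numpy as np

def prewhiten(y, X, rho):
    """AR(1) pre-whitening sketch: quasi-difference the response and the
    design matrix so the transformed errors are (approximately) iid.
    y: length-n response; X: (n, p) regressor matrix; rho: AR(1) parameter."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    y_t = y[1:] - rho * y[:-1]      # Y~_{i+1} = Y_{i+1} - rho * Y_i
    X_t = X[1:] - rho * X[:-1]      # same transformation, column-wise
    return y_t, X_t
```

Note that an intercept column of ones becomes the constant (1 − ρ) after the transformation, which is exactly where the back-transformation β_0 = β̃_0/(1 − ρ) comes from.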
Two-stage regression
■ Step 1: Fit the linear model to the unwhitened data.
■ Step 2: Estimate ρ with ρ̂ from the residuals.
■ Step 3: Pre-whiten the data using ρ̂ and refit the model.
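The three steps can be sketched in numpy as a Cochrane-Orcutt-style procedure (a sketch under the AR(1) assumption; the function name and the convention that the first design column is the intercept are illustrative):

```python
import numpy as np

def two_stage_ar1(y, X):
    """Two-stage regression sketch for AR(1) errors.
    X is an (n, p) design matrix whose first column is the intercept."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)

    # Step 1: OLS fit to the unwhitened data.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ beta

    # Step 2: estimate rho from the residuals.
    rho = np.sum(e[1:] * e[:-1]) / np.sum(e**2)

    # Step 3: pre-whiten and refit, with a fresh intercept column.
    y_t = y[1:] - rho * y[:-1]
    X_t = X[1:] - rho * X[:-1]
    X_t[:, 0] = 1.0                       # regress on a plain intercept: beta~_0
    beta_t = np.linalg.lstsq(X_t, y_t, rcond=None)[0]
    beta_t[0] /= (1.0 - rho)              # back-transform: beta_0 = beta~_0/(1-rho)
    return beta_t, rho
```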
Other models of correlation
■ If we have ARMA(p, q) noise, we can also pre-whiten the data and perform OLS; this is equivalent to GLS.
■ If we estimate the parameters, we can then use a two-stage procedure as in the AR(1) case.
■ Or we can just use MLE (or REML): R does this. This is similar to iterating the two-stage procedure.
More than one time series
■ Suppose we have r time series Y_{ij}, 1 ≤ i ≤ r, 1 ≤ j ≤ n_i.
■ Regression model:
    Y_{ij} = β_0 + β_1 X_{ij} + ε_{ij},
  where the β’s are common to all series and
    ε_i = (ε_{i1}, …, ε_{in_i}) ∼ N(0, Σ_i),
  independent across i.
■ We can put all of this into one big regression model and estimate everything. Easy to do in R.
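The “one big regression” can be sketched in numpy by whitening each series with its own Σ_i and stacking (a sketch assuming each Σ_i is known; in practice it is estimated, and the function name is illustrative):

```python
import numpy as np

def pooled_gls(ys, Xs, Sigmas):
    """GLS sketch for r independent series with a common beta.
    ys: list of length-n_i responses; Xs: list of (n_i, p) design matrices;
    Sigmas: list of (n_i, n_i) covariance matrices, assumed known."""
    wy, wX = [], []
    for y, X, S in zip(ys, Xs, Sigmas):
        L = np.linalg.cholesky(S)            # S = L L'
        wy.append(np.linalg.solve(L, y))     # whiten: L^{-1} y
        wX.append(np.linalg.solve(L, X))     # whiten: L^{-1} X
    # Stack all whitened series into one big OLS problem.
    y_all = np.concatenate(wy)
    X_all = np.vstack(wX)
    return np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```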
Functional Data
■ Having observations that are time series can be thought of as having a “function” as an observation.
■ Having many time series, e.g. daily temperature in NY, SF, LA, …, allows one to think of the individual time series as observations.
■ The field of “Functional Data Analysis” (Ramsay & Silverman) is the part of statistics that focuses on this type of data.
■ Today we’ll think of having one function and what we might do with it.
Scatterplot smoothing
■ When we only have one “function”, we can think of fitting a trend as smoothing a scatterplot of the pairs (X_i, Y_i), 1 ≤ i ≤ n.
■ Different techniques:
  ◆ B-splines;
  ◆ smoothing splines;
  ◆ kernel smoothers;
  ◆ many others.
Smoothing splines
■ We saw early on in the class that we could use B-splines in a regression setting to predict Y_i from X_i.
■ Smoothing splines: for λ ≥ 0 and weights w_i, 1 ≤ i ≤ n, find the twice-differentiable function f that minimizes
    ∑_{i=1}^n w_i (Y_i − f(X_i))² + λ ∫ (f″(x))² dx.
■ This should remind you of ridge regression: the prior is now on functions.
■ Equivalent to saying that we have a Gaussian prior (integrated Brownian motion) on functions and we want the “MAP” estimator based on observing f at the points X_i with measurement errors ε_i ∼ N(0, 1/w_i).
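The ridge-regression analogy can be made concrete on an evenly spaced grid: replacing ∫(f″)² with a sum of squared second differences turns the criterion into a ridge-type linear system. This is a Whittaker-style discrete sketch, not an exact smoothing spline:

```python
import numpy as np

def discrete_smoother(y, lam, w=None):
    """Discrete analogue of the smoothing-spline criterion (a sketch):
    minimize sum_i w_i (y_i - f_i)^2 + lam * ||D2 f||^2,
    where D2 takes second differences of f on an evenly spaced grid."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # Second-difference matrix D2, shape (n-2, n): rows like [1, -2, 1, 0, ...].
    D2 = np.diff(np.eye(n), n=2, axis=0)
    # Normal equations of the ridge-type problem: (W + lam * D2'D2) f = W y.
    W = np.diag(w)
    return np.linalg.solve(W + lam * D2.T @ D2, w * y)
```

As λ → 0 the fit interpolates the data; as λ → ∞ it tends to the weighted least-squares line, mirroring the nullspace of f″ in the continuous criterion.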
Kernel smoother
■ Given a kernel function K and a bandwidth h, the kernel smooth of the scatterplot (X_i, Y_i), 1 ≤ i ≤ n, is defined by the local average
    Ŷ(x) = ∑_{i=1}^n Y_i · K((x − X_i)/h) / ∑_{i=1}^n K((x − X_i)/h).
■ Most commonly used kernel: the Gaussian kernel
    K(x) = e^{−x²/2}.
■ The key parameter is the bandwidth h. Much work has been done on choosing an “optimal” bandwidth.
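A minimal numpy sketch of this local average (a Nadaraya-Watson smoother with the Gaussian kernel; the function name is illustrative):

```python
import numpy as np

def kernel_smooth(x_grid, X, Y, h):
    """Kernel smooth of the scatterplot (X_i, Y_i) at the points x_grid:
    Yhat(x) = sum_i Y_i K((x - X_i)/h) / sum_i K((x - X_i)/h)."""
    x_grid = np.atleast_1d(np.asarray(x_grid, dtype=float))
    u = (x_grid[:, None] - np.asarray(X, dtype=float)[None, :]) / h
    K = np.exp(-0.5 * u**2)                       # Gaussian kernel e^{-x^2/2}
    return (K @ np.asarray(Y, dtype=float)) / K.sum(axis=1)
```

Small h tracks the data closely (low bias, high variance); large h averages over many points (high bias, low variance), which is why bandwidth choice is the central issue.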