Regression III: Advanced Methods

Lecture 13: Nonconstant Error Variance

Regression III: Advanced Methods
Bill Jacoby
Michigan State University
http://polisci.msu.edu/jacoby/icpsr/regress3

Non-Constant Error Variance

• Also called heteroscedasticity.
• An important assumption of the least-squares regression model is that the variance of the errors around the regression surface is everywhere the same: $V(\varepsilon) = V(Y \mid x_1, \ldots, x_k) = \sigma_\varepsilon^2$.
• Non-constant error variance does not cause biased estimates, but it does pose problems for efficiency, and the usual formulas for standard errors are inaccurate.
  – OLS estimates are inefficient because they give equal weight to all observations, even though observations with large error variance contain less information about the regression surface.
• Two types of non-constant error variance are common (though others are possible):
  – The error variance increases as the expectation of Y increases;
  – There is a systematic relationship between the errors and one (or more) of the X's.
A small simulated illustration of the first pattern is sketched below.
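As a quick illustration (not from the original slides), the following R sketch simulates data whose error standard deviation grows with x; an OLS fit then shows the characteristic fan shape in the residual plot:

```r
# Simulated illustration: error SD proportional to x, so the
# residuals "fan out" when plotted against the fitted values.
set.seed(123)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)  # V(error) increases with x

m <- lm(y ~ x)
plot(fitted(m), rstudent(m),
     xlab = "Fitted values", ylab = "Studentized residuals")
abline(h = 0, lty = 2)
```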

Assessing Non-Constant Error Variance

• Direct examination of the data is usually not helpful in assessing non-constant error variance, especially if there are many predictors. We look to the distribution of the residuals to try to discover the distribution of the errors.
  – Still, the distribution of the residuals is different from that of the errors: because the residuals are weighted averages of the data, they can be normally distributed even if the errors are not.
  – Moreover, it is not helpful to plot Y against the residuals E, because there is a built-in correlation between Y and E: $r_{YE} = \sqrt{1 - R^2}$.
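A short derivation of this built-in correlation (not on the original slide; it uses the fact, noted below, that $\hat{Y}$ and E are uncorrelated in the least-squares fit):

$$\mathrm{Cov}(Y, E) = \mathrm{Cov}(\hat{Y} + E,\; E) = V(E)$$

$$r_{YE} = \frac{V(E)}{\sqrt{V(Y)\,V(E)}} = \sqrt{\frac{V(E)}{V(Y)}} = \sqrt{1 - R^2}$$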

• The least-squares fit ensures that the correlation between $\hat{Y}$ and E is 0, so a plot of E against $\hat{Y}$ (a residual plot) can help us uncover non-constant error variance.

Residual Plots (1)

• Plotting the residuals against the fitted values can be very useful for assessing heteroscedasticity, in particular for the problem of the error variance increasing as Y grows larger.
• The pattern of changing spread is often more easily seen by plotting the studentized residuals $E_i^*$ against $\hat{Y}$.
  – We can also plot the squared or absolute studentized residuals against $\hat{Y}$ or the X's.
• If the values of $\hat{Y}$ are all positive, we can use a spread-level plot:
  – Plot $\log |E_i^*|$ (the "log spread") against $\log \hat{Y}$ (the "log level").
  – The slope b of the regression line fit to this plot suggests the variance-stabilizing power transformation $Y^{(p)}$, with p = 1 − b (see the sketch below).
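A minimal R sketch of the spread-level computation by hand, assuming a fitted model `m` whose fitted values are all positive:

```r
# Spread-level plot "by hand" for a fitted lm model `m`
# (assumes all fitted values are positive).
log.spread <- log(abs(rstudent(m)))   # log |studentized residual|
log.level  <- log(fitted(m))          # log fitted value

plot(log.level, log.spread,
     xlab = "Log level (log fitted values)",
     ylab = "Log spread (log |studentized residual|)")
sl <- lm(log.spread ~ log.level)
abline(sl)

b <- coef(sl)["log.level"]  # slope of the spread-level regression
p <- 1 - b                  # suggested variance-stabilizing power
p
```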


Assessing nonconstant error variance: Example: Ornstein data (1)


Assessing nonconstant error variance: Example: Ornstein data (2)

[Figure: Plot of studentized residuals versus fitted values; rstudent(Ornstein.model) against fitted.values(Ornstein.model).]

[Figure: Spread-Level Plot for Ornstein.model; absolute studentized residuals against fitted values + 3, both axes on log scales.]

• In the residual plot we see that the data "fan out": the spread (hence, the variance) of the residuals increases as the fitted values get larger.
• The slope of the spread-level plot helps us find an appropriate transformation.

Assessing nonconstant error variance: Example: Ornstein data (3)

• The spread.level.plot command in car returns a suggested power transformation for Y of λ = .322 (that is, 1 minus the slope of the spread-level plot; see the sketch below).
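For reference, in current versions of the car package this function is called spreadLevelPlot() (spread.level.plot is the older name used in these notes). A sketch of the call, assuming the Ornstein model has been specified as in the standard car examples:

```r
library(car)

# Fit assumed from the earlier slides (the Ornstein data ship with car)
Ornstein.model <- lm(interlocks ~ assets + nation + sector,
                     data = Ornstein)

# Draws the spread-level plot and prints the suggested
# variance-stabilizing power transformation
spreadLevelPlot(Ornstein.model)
```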

Testing for non-constant error variance (1)

• Assume that a discrete X (or a recoded version of a continuous X) partitions the data into m groups.
• Let $Y_{ij}$ denote the ith of the $n_j$ dependent-variable scores in group j.
• The within-group sample variances are then calculated as follows:

$$S_j^2 = \frac{\sum_{i=1}^{n_j} (Y_{ij} - \bar{Y}_j)^2}{n_j - 1}$$

• We could then compare these within-group sample variances to see whether they differ (a quick check is sketched below).
• If the distribution of the errors is non-normal, however, tests that examine the $S_j^2$ directly are not valid, because the mean is not a good summary of the data.
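A quick way to inspect the within-group variances in R (illustrative only; y and g are hypothetical names for the response and the grouping factor):

```r
# Within-group sample variances S_j^2 and group sizes n_j
tapply(y, g, var)
tapply(y, g, length)
```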

Testing for non-constant error variance (2): Levene's test

• An alternative robust test is Levene's test, which replaces each value $Y_{ij}$ with its absolute deviation from the group median, $Z_{ij} = |Y_{ij} - \tilde{Y}_j|$, and performs a one-way analysis of variance of the $Z_{ij}$ over the groups.

• If the error variance is not constant across the groups, the group means of the $Z_{ij}$ (the $\bar{Z}_j$) will differ.
• A simple one-way analysis of variance (F-test) is used to assess whether they differ (see the sketch below).
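Current versions of the car package provide a leveneTest() function that carries out this test directly (an illustrative call; y and g are again hypothetical names):

```r
library(car)

# Median-centered Levene test (the default in car::leveneTest);
# y is the response, g a factor defining the groups
leveneTest(y ~ g, data = mydata)
```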


R script: Breusch-Pagan (a.k.a. Cook-Weisberg) test for heteroskedasticity
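The script from the original slide is not reproduced above; in the car package, this score test is run with ncvTest(). A minimal sketch, assuming the Ornstein model fit earlier:

```r
library(car)

# Score test for non-constant error variance
# (Breusch-Pagan / Cook-Weisberg); by default it tests whether the
# error variance changes with the level of the fitted values.
ncvTest(Ornstein.model)
```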

• The result here shows that the non-constant error variance is statistically significant.
  – Y should be transformed before proceeding to the final regression analysis; recall that the spread.level.plot command suggested a transformation of p ≈ 1/3.
  – Alternatively, weighted least squares could be used instead of OLS.

Weighted Least Squares (1)

• Recall that the maximum-likelihood estimators for WLS are defined as:

$$b_{\text{WLS}} = (X^\top W X)^{-1} X^\top W y, \qquad W = \mathrm{diag}(w_1^2, \ldots, w_n^2)$$

• This is equivalent to minimizing the weighted sum of squares $\sum_i w_i^2 E_i^2$, giving greater weight to observations with smaller error variance.
  – Initial estimates of the weights are obtained from a preliminary OLS fit.
  – We then examine the residual plots to determine whether the magnitude of the errors is proportional to one of the X's, in which case we use $w_i = 1/X_i$ as the weights, ensuring that the rescaled errors $\varepsilon_i' \equiv \varepsilon_i / X_i$ have constant variance.

Weighted Least Squares (2): Example: Ornstein data

• Assume that we have reason to believe that the error variance in the Ornstein model is proportional to $X_1$ (assets); the "fanning" pattern in the residual plot indicates that it is.
• This implies that $V(\varepsilon_i) = \sigma^2 \times \text{assets}_i$.
• We can then estimate a WLS regression using the weight $1/X_{i1}$.
• In R we simply add a weights argument to the lm function:
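The code from the original slide is not shown above; a minimal sketch, assuming the model specification used earlier in the lecture (in R's lm, the weights are taken to be inversely proportional to the error variances):

```r
library(car)  # for the Ornstein data

# WLS fit: weights inversely proportional to the assumed
# error variance V(e_i) = sigma^2 * assets_i
Ornstein.wls <- lm(interlocks ~ assets + nation + sector,
                   data = Ornstein, weights = 1 / assets)
summary(Ornstein.wls)
```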


Robust Standard Errors (1)

• Robust standard errors can be calculated to compensate for an unknown pattern of non-constant error variance.
• Robust standard errors require fewer assumptions about the model than WLS (which is a better choice when the error variance increases with the level of Y).
  – Robust standard errors do not change the OLS coefficient estimates or solve the inefficiency problem, but they do give more accurate p-values.
• There are several methods for calculating heteroskedasticity-consistent standard errors (known variously as White, Eicker, or Huber standard errors), but most are variants of the method originally proposed by White (1980).


Robust Standard Errors (2): White's Standard Errors

• The covariance matrix of the OLS estimator is

$$V(b) = (X^\top X)^{-1} X^\top V(y)\, X (X^\top X)^{-1}$$

• If the assumptions of normality and constant error variance are satisfied, $V(y) = \sigma_\varepsilon^2 I_n$, and the variance then simplifies to

$$V(b) = \sigma_\varepsilon^2 (X^\top X)^{-1}$$

• In the presence of non-constant error variance, however, $\Sigma \equiv V(y) = \mathrm{diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$, meaning that

$$V(b) = (X^\top X)^{-1} X^\top \Sigma\, X (X^\top X)^{-1}$$

• Since the variance of the ith error is $\sigma_i^2 = E(\varepsilon_i^2)$ (because $E(\varepsilon_i) = 0$), White suggests a consistent estimator of the variance that constrains $\Sigma$ to a diagonal matrix containing only the squared residuals.

Robust Standard Errors (3): White's Standard Errors

• The heteroskedasticity-consistent covariance matrix (HCCM) estimator is then:

$$\widehat{V}(b) = (X^\top X)^{-1} X^\top \hat{\Sigma}\, X (X^\top X)^{-1}, \qquad \hat{\Sigma} = \mathrm{diag}\{E_1^2, \ldots, E_n^2\}$$

• The HCCM can easily be found using the hccm(model) function in the car package. Taking the square root of the diagonal of the HCCM gives the robust standard errors, as sketched below.
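A one-line sketch, again assuming the Ornstein model fit earlier:

```r
library(car)

# White (heteroskedasticity-consistent) standard errors:
# square roots of the diagonal of the HCCM
sqrt(diag(hccm(Ornstein.model)))
```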


Function for lm output that includes White standard errors

• The robust.se function below relies on the hccm function in car and the summary function in base R.
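The function itself was lost from this copy of the slides; a minimal sketch of what such a robust.se function might look like (the recomputed t-statistics and p-values use the OLS residual degrees of freedom):

```r
library(car)  # for hccm()

# Sketch of a robust.se-style function: reports the usual lm
# coefficient table, but with White standard errors from hccm()
# and correspondingly recomputed t-statistics and p-values.
robust.se <- function(model) {
  s  <- summary(model)
  se <- sqrt(diag(hccm(model)))   # robust standard errors
  b  <- coef(model)
  t  <- b / se
  p  <- 2 * pt(abs(t), df = df.residual(model), lower.tail = FALSE)
  s$coefficients <- cbind(Estimate = b, "Robust SE" = se,
                          "t value" = t, "Pr(>|t|)" = p)
  s
}

robust.se(Ornstein.model)
```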


Summary for the Ornstein model

[R output not reproduced.]


Output for robust.se

[R output not reproduced.]


Robust Standard Errors (4)

• Since the HCCM is found without a formal model of the heteroskedasticity, relying instead only on the regressors and residuals from the OLS fit for its computation, it can easily be adapted to many different applications.
• For example, robust standard errors can also be used to improve statistical inference from clustered data, pooled time-series data with autocorrelated errors (e.g., Newey-West standard errors), and panel data (a sketch of the Newey-West case follows).
• Cautions:
  – Not all robust estimators of the variance perform equally well in small samples (Long and Ervin, 2000).
  – Robust standard errors should not be seen as a substitute for careful model specification. In particular, if the pattern of heteroskedasticity is known, it can often be corrected more effectively, and the model estimated more efficiently, using weighted least squares.
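Not from the original slides: one common route to Newey-West standard errors in R is the sandwich and lmtest packages (a sketch, assuming ts.model is a hypothetical lm fit to time-series data):

```r
library(sandwich)  # NeweyWest() HAC covariance estimator
library(lmtest)    # coeftest() for tests with a supplied covariance

# Coefficient tests using Newey-West (HAC) standard errors,
# robust to both heteroskedasticity and autocorrelation
coeftest(ts.model, vcov = NeweyWest(ts.model))
```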