Multiple Regression Analysis
y = β0 + β1x1 + β2x2 + . . . + βkxk + u
Heteroskedasticity
ECON 5329 - Professor Crowder
What is Heteroskedasticity
Recall that the assumption of homoskedasticity implies that, conditional on the explanatory variables, the variance of the unobserved error, u, is constant.
If this is not true, that is, if the variance of u differs across values of the x’s, then the errors are heteroskedastic.
Example: when estimating returns to education, ability is unobservable, and we might think that the variance in ability differs by educational attainment.
Example of Heteroskedasticity
[Figure: conditional densities f(y|x) drawn at x1, x2, and x3 around the line E(y|x) = β0 + β1x; axes are y and x.]
Why Worry About Heteroskedasticity?
OLS is still unbiased and consistent, even if we do not assume homoskedasticity.
The usual standard errors of the estimates are biased if we have heteroskedasticity.
If the standard errors are biased, we cannot use the usual t statistics, F statistics, or LM statistics for drawing inferences.
Variance with Heteroskedasticity
For the simple case,
$$\hat\beta_1 = \beta_1 + \frac{\sum (x_i - \bar x)\,u_i}{\sum (x_i - \bar x)^2},$$
so
$$\mathrm{Var}(\hat\beta_1) = \frac{\sum (x_i - \bar x)^2 \sigma_i^2}{SST_x^2}, \qquad \text{where } SST_x = \sum (x_i - \bar x)^2.$$
A valid estimator for this when σi² ≠ σ² is
$$\frac{\sum (x_i - \bar x)^2\,\hat u_i^2}{SST_x^2},$$
where the ûi are the OLS residuals.
Variance with Heteroskedasticity
For the general multiple regression model, a valid estimator of Var(β̂j) with heteroskedasticity is
$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum \hat r_{ij}^2\,\hat u_i^2}{SST_j^2},$$
where r̂ij is the i-th residual from regressing xj on all other independent variables, and SSTj is the sum of squared residuals from this regression.
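The formula above can be checked numerically. The following is a minimal sketch, assuming NumPy and simulated data (the variable names, the data-generating process, and the helper functions are all hypothetical, not taken from the course files). It computes the robust variance of one coefficient from the auxiliary-regression formula and compares it with the corresponding diagonal element of the usual sandwich matrix, with no degrees-of-freedom correction.

```python
import numpy as np

def ols(X, y):
    """OLS of y on X (X already contains a constant); returns (coefficients, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def robust_var_partial(X, y, j):
    """Robust variance of beta_j: sum(r_ij^2 * u_i^2) / SST_j^2."""
    _, u = ols(X, y)                               # residuals from the full regression
    _, r = ols(np.delete(X, j, axis=1), X[:, j])   # x_j regressed on all other regressors
    sst_j = np.sum(r ** 2)                         # sum of squared residuals of that regression
    return np.sum(r ** 2 * u ** 2) / sst_j ** 2

def robust_cov_sandwich(X, y):
    """White covariance matrix (X'X)^-1 X' diag(u^2) X (X'X)^-1, no df correction."""
    _, u = ols(X, y)
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (u ** 2)[:, None])
    return bread @ meat @ bread

# Simulated data (hypothetical): the error spread grows with |x1|
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n) * (1.0 + np.abs(x1))
X = np.column_stack([np.ones(n), x1, x2])

v_partial = robust_var_partial(X, y, j=1)
v_sandwich = robust_cov_sandwich(X, y)[1, 1]
print(np.sqrt(v_partial), np.sqrt(v_sandwich))  # two routes to the same robust standard error
```

The two numbers coincide because, by the partialling-out result, row j of (X'X)⁻¹X' equals the auxiliary residual vector divided by its sum of squares.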
Robust Standard Errors
Use our familiar wage regression to demonstrate the calculation of robust standard errors.

Dependent Variable: LWAGE
Method: Least Squares
Included observations: 526

Variable     Coefficient   Std. Error   t-Statistic   Prob.
MARRIED         0.212676     0.055357      3.841881   0.0001
MARFEM         -0.300593     0.071767     -4.188461   0.0000
FEMALE         -0.110350     0.055742     -1.979658   0.0483
EDUC            0.078910     0.006694     11.78733    0.0000
EXPER           0.026801     0.005243      5.111835   0.0000
EXPERSQ        -0.000535     0.000110     -4.847105   0.0000
TENURE          0.029088     0.006762      4.301613   0.0000
TENURSQ        -0.000533     0.000231     -2.305552   0.0215
C               0.321378     0.100009      3.213492   0.0014

R-squared             0.460877    Mean dependent var      1.623268
Adjusted R-squared    0.452535    S.D. dependent var      0.531538
S.E. of regression    0.393290    Akaike info criterion   0.988423
Sum squared resid     79.96799    Schwarz criterion       1.061403
Log likelihood       -250.9552    Hannan-Quinn criter.    1.016998
F-statistic           55.24559    Durbin-Watson stat      1.784785
Prob(F-statistic)     0.000000

Step 1: Save the residuals from this regression and square them.
Robust Standard Errors
Step 2: Run the auxiliary regression of xj on all other x’s and save the residuals and the sum of the squared residuals.

Dependent Variable: MARRIED
Method: Least Squares
Date: 10/03/13 Time: 21:29
Sample: 1 526
Included observations: 526

Variable     Coefficient   Std. Error   t-Statistic   Prob.
MARFEM          0.893251     0.041284     21.63694    0.0000
FEMALE         -0.597199     0.035622    -16.76484    0.0000
EDUC            0.011068     0.005291      2.091790   0.0369
EXPER           0.025058     0.004013      6.244295   0.0000
EXPERSQ        -0.000438     8.55E-05     -5.126129   0.0000
TENURE          0.010709     0.005346      2.003072   0.0457
TENURSQ        -0.000230     0.000183     -1.252971   0.2108
C               0.275636     0.078449      3.513583   0.0005

R-squared             0.597240    Mean dependent var      0.608365
Adjusted R-squared    0.591797    S.D. dependent var      0.488580
S.E. of regression    0.312158    Akaike info criterion   0.524476
Sum squared resid     50.47516    Schwarz criterion       0.589347
Log likelihood       -129.9371    Hannan-Quinn criter.    0.549876
F-statistic           109.7323    Durbin-Watson stat      1.964443
Prob(F-statistic)     0.000000
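Variance with Heteroskedasticity
$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum \hat r_{i1}^2\,\hat u_i^2}{SST_1^2}, \qquad \sum \hat r_{i1}^2\,\hat u_i^2 = 8.1765, \qquad SST_1^2 = 2547.7422$$
$$\widehat{\mathrm{Var}}(\hat\beta_1) = 0.0032, \qquad \hat\sigma_{\hat\beta_1} = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)} = 0.0567$$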
Robust Standard Errors
EViews can calculate the robust standard errors for us.

Dependent Variable: LWAGE
Method: Least Squares
Included observations: 526
White heteroskedasticity-consistent standard errors & covariance

Variable     Coefficient   Std. Error   t-Statistic   Prob.
MARRIED         0.212676     0.056651      3.754142   0.0002
MARFEM         -0.300593     0.071750     -4.189453   0.0000
FEMALE         -0.110350     0.056626     -1.948772   0.0519
EDUC            0.078910     0.007351     10.73469    0.0000
EXPER           0.026801     0.005095      5.260206   0.0000
EXPERSQ        -0.000535     0.000105     -5.076983   0.0000
TENURE          0.029088     0.006881      4.227050   0.0000
TENURSQ        -0.000533     0.000242     -2.206796   0.0278
C               0.321378     0.108528      2.961234   0.0032

R-squared             0.460877    Mean dependent var      1.623268
Adjusted R-squared    0.452535    S.D. dependent var      0.531538
S.E. of regression    0.393290    Akaike info criterion   0.988423
Sum squared resid     79.96799    Schwarz criterion       1.061403
Log likelihood       -250.9552    Hannan-Quinn criter.    1.016998
F-statistic           55.24559    Durbin-Watson stat      1.784785
Prob(F-statistic)     0.000000
Robust Standard Errors
Now that we have a consistent estimate of the variance, its square root can be used as a standard error for inference.
We typically call these robust standard errors.
Sometimes the estimated variance is corrected for degrees of freedom by multiplying by n/(n – k – 1).
As n → ∞ it’s all the same, though.
Robust Standard Errors

Dependent Variable: LWAGE
Method: Least Squares
Included observations: 526
White heteroskedasticity-consistent standard errors & covariance

Variable     Coefficient   Std. Error   t-Statistic   Prob.
MARRIED         0.212676     0.057142      3.721886   0.0002
MARFEM         -0.300593     0.072372     -4.153457   0.0000
FEMALE         -0.110350     0.057116     -1.932028   0.0539
EDUC            0.078910     0.007415     10.64246    0.0000
EXPER           0.026801     0.005139      5.215010   0.0000
EXPERSQ        -0.000535     0.000106     -5.033361   0.0000
TENURE          0.029088     0.006941      4.190731   0.0000
TENURSQ        -0.000533     0.000244     -2.187835   0.0291
C               0.321378     0.109469      2.935791   0.0035

R-squared             0.460877    Mean dependent var      1.623268
Adjusted R-squared    0.452535    S.D. dependent var      0.531538
S.E. of regression    0.393290    Akaike info criterion   0.988423
Sum squared resid     79.96799    Schwarz criterion       1.061403
Log likelihood       -250.9552    Hannan-Quinn criter.    1.016998
F-statistic           55.24559    Durbin-Watson stat      1.784785
Prob(F-statistic)     0.000000

Our hand calculation gave
$$\hat\sigma_{\hat\beta_1} = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)} = 0.0567.$$
Correct for the degrees of freedom used in the estimation by multiplying the variance by n/(n – k – 1), which means multiplying the standard error by the square root of that factor:
$$\hat\sigma_{\hat\beta_1} \times \sqrt{\frac{n}{n-k-1}} = 0.0567 \times \sqrt{1.0174} = 0.0567 \times 1.0087 \approx 0.057$$
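The degrees-of-freedom adjustment can be checked against the two EViews outputs for MARRIED. This is a small arithmetic sketch using only numbers reported in the slides (n = 526, k = 8 regressors, unadjusted White standard error 0.056651); the variable names are illustrative. Because the factor n/(n – k – 1) scales the variance, the standard error scales by its square root.

```python
import math

n, k = 526, 8                 # observations and number of slope coefficients
se_white = 0.056651           # unadjusted White standard error for MARRIED (from the slides)

factor = n / (n - k - 1)      # scales the estimated variance
se_adjusted = se_white * math.sqrt(factor)  # so the standard error scales by sqrt(factor)

print(round(factor, 4), round(se_adjusted, 6))
```

The result reproduces the adjusted standard error of 0.057142 shown in the second EViews table.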
Robust Standard Errors (cont)
It is important to remember that these robust standard errors only have asymptotic justification. With small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t, and inferences will not be correct.
A Robust LM Statistic
1. Run OLS on the restricted model and save the residuals ŭ.
2. Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals ř1, ř2, …, řq.
3. Regress a variable defined to be equal to 1 on ř1ŭ, ř2ŭ, …, řqŭ, with no intercept.
4. The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final regression. Under the null it is compared to the χ²(q) distribution.
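The procedure for the robust LM statistic can be sketched in a few lines. This is a minimal illustration assuming NumPy and simulated data; the function names `resid` and `robust_lm` and the data-generating process are hypothetical and are not part of the course materials.

```python
import numpy as np

def resid(X, y):
    """Residuals from an OLS regression of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def robust_lm(y, X_incl, X_excl):
    """Heteroskedasticity-robust LM statistic for excluding the columns of X_excl.

    X_incl holds the included regressors (with a constant), X_excl the q excluded ones.
    """
    n = len(y)
    u = resid(X_incl, y)                                    # restricted-model residuals
    R = np.column_stack([resid(X_incl, X_excl[:, j])        # each excluded variable on the
                         for j in range(X_excl.shape[1])])  # included ones: residuals r_j
    products = R * u[:, None]                               # element-by-element r_j * u
    ssr1 = np.sum(resid(products, np.ones(n)) ** 2)         # regress 1 on products, no intercept
    return n - ssr1

# Simulated example (hypothetical): z1 and its square are irrelevant for y
rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
z1 = 0.3 * x1 + rng.normal(size=n)
y = 0.5 + x1 + rng.normal(size=n) * np.exp(0.3 * x1)
X_incl = np.column_stack([np.ones(n), x1])
X_excl = np.column_stack([z1, z1 ** 2])

lm = robust_lm(y, X_incl, X_excl)
print(lm)  # compare with a chi-square(2) critical value
```

Because the last regression has no intercept and the dependent variable is a column of ones, SSR1 can never exceed n, so the statistic is never negative.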
A Robust LM Statistic
Let’s use the arrest data from project #2 to examine the calculation of the robust LM statistic. Of interest is whether increasing average sentence length (avgsen) reduces crime (narr86).

Dependent Variable: NARR86
Method: Least Squares
Included observations: 2725

Variable     Coefficient   Std. Error   t-Statistic   Prob.
PCNV           -0.135595     0.040370     -3.358825   0.0008
AVGSEN          0.017841     0.009696      1.840039   0.0659
AVGSEN^2       -0.000516     0.000297     -1.738336   0.0823
PTIME86        -0.039360     0.008693     -4.527519   0.0000
QEMP86         -0.050507     0.014435     -3.499055   0.0005
INC86          -0.001480     0.000341     -4.345149   0.0000
BLACK           0.324602     0.045419      7.146872   0.0000
HISPAN          0.193380     0.039703      4.870607   0.0000
C               0.567013     0.036057     15.72531    0.0000

R-squared             0.072798    Mean dependent var      0.404404
Adjusted R-squared    0.070067    S.D. dependent var      0.859077
S.E. of regression    0.828434    Akaike info criterion   2.464738
Sum squared resid     1863.998    Schwarz criterion       2.484258
Log likelihood       -3349.205    Hannan-Quinn criter.    2.471794
F-statistic           26.65535    Durbin-Watson stat      1.840556
Prob(F-statistic)     0.000000
A Robust LM Statistic
Since avgsen enters our model in levels and through its square, the null hypothesis is a joint null of β2 = β3 = 0.
The first step is to estimate the restricted model and save the residuals, ŭ.

Dependent Variable: NARR86
Method: Least Squares
Included observations: 2725

Variable     Coefficient   Std. Error   t-Statistic   Prob.
PCNV           -0.132278     0.040341     -3.279039   0.0011
PTIME86        -0.037795     0.008497     -4.448054   0.0000
QEMP86         -0.050981     0.014436     -3.531573   0.0004
INC86          -0.001490     0.000340     -4.376607   0.0000
BLACK           0.329688     0.045178      7.297577   0.0000
HISPAN          0.195451     0.039693      4.924074   0.0000
C               0.570334     0.036007     15.83942    0.0000

R-squared             0.071618    Mean dependent var      0.404404
Adjusted R-squared    0.069569    S.D. dependent var      0.859077
S.E. of regression    0.828656    Akaike info criterion   2.464541
Sum squared resid     1866.370    Schwarz criterion       2.479724
Log likelihood       -3350.938    Hannan-Quinn criter.    2.470029
F-statistic           34.94583    Durbin-Watson stat      1.841232
Prob(F-statistic)     0.000000
A Robust LM Statistic
Regress each excluded variable on all of the included independent variables and save the residuals, ř1.

Dependent Variable: AVGSEN
Method: Least Squares
Included observations: 2725

Variable     Coefficient   Std. Error   t-Statistic   Prob.
PCNV            0.180057     0.164973      1.091438   0.2752
PTIME86         0.391734     0.034749     11.27337    0.0000
QEMP86         -0.002581     0.059035     -0.043712   0.9651
INC86          -0.002354     0.001392     -1.690467   0.0911
BLACK           0.990388     0.184754      5.360567   0.0000
HISPAN          0.216405     0.162324      1.333165   0.1826
C               0.344888     0.147252      2.342170   0.0192

R-squared             0.068887    Mean dependent var      0.632294
Adjusted R-squared    0.066831    S.D. dependent var      3.508031
S.E. of regression    3.388782    Akaike info criterion   5.281383
Sum squared resid     31213.08    Schwarz criterion       5.296566
Log likelihood       -7188.885    Hannan-Quinn criter.    5.286871
F-statistic           33.51432    Durbin-Watson stat      1.825014
Prob(F-statistic)     0.000000
A Robust LM Statistic
Regress each excluded variable on all of the included variables and save the residuals, ř2.

Dependent Variable: AVGSEN^2
Method: Least Squares
Included observations: 2725

Variable     Coefficient   Std. Error   t-Statistic   Prob.
PCNV           -0.202596     5.385343     -0.037620   0.9700
PTIME86        10.50548      1.134329      9.261409   0.0000
QEMP86          0.829241     1.927145      0.430295   0.6670
INC86          -0.061357     0.045448     -1.350053   0.1771
BLACK          24.37115      6.031096      4.040916   0.0001
HISPAN          3.466819     5.298882      0.654255   0.5130
C               5.484001     4.806861      1.140869   0.2540

R-squared             0.043563    Mean dependent var      12.70156
Adjusted R-squared    0.041452    S.D. dependent var      112.9895
S.E. of regression    110.6230    Akaike info criterion   12.25270
Sum squared resid     33261354    Schwarz criterion       12.26788
Log likelihood       -16687.30    Hannan-Quinn criter.    12.25819
F-statistic           20.63297    Durbin-Watson stat      1.945413
Prob(F-statistic)     0.000000
A Robust LM Statistic

Dependent Variable: X1
Method: Least Squares
Included observations: 2725

Variable     Coefficient   Std. Error   t-Statistic   Prob.
RU2            -0.001045     0.000548     -1.906863   0.0566
RU1             0.027785     0.014060      1.976170   0.0482

Mean dependent var    1.000000    S.D. dependent var      0.000000
S.E. of regression    0.999633    Akaike info criterion   2.837877
Sum squared resid     2721.003    Schwarz criterion       2.842215
Log likelihood       -3864.607    Hannan-Quinn criter.    2.839445
Durbin-Watson stat    0.003010

Multiply the residuals (element by element) ři and ŭ to form ř1ŭ and ř2ŭ. Then regress the unit vector (X1 above) on these residual products, with no intercept. The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final regression: 2725 – 2721.003 = 4.0. Compare this to the χ²(2) distribution, whose 5% critical value is 5.99, so the joint null is not rejected at the 5% level.
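The comparison with the χ²(2) distribution can be made explicit. A short sketch using only the numbers from the final regression (n = 2725, SSR1 = 2721.003): with two degrees of freedom the chi-square upper-tail probability has the closed form exp(–x/2), so no statistics library is needed. The variable names are illustrative.

```python
import math

n = 2725
ssr1 = 2721.003                       # sum of squared residuals from the final regression

lm = n - ssr1                         # robust LM statistic
p_value = math.exp(-lm / 2)           # chi-square(2) upper tail: P(X > x) = exp(-x/2)
crit_5pct = -2 * math.log(0.05)       # chi-square(2) 5% critical value, about 5.99

print(round(lm, 3), round(p_value, 4), round(crit_5pct, 2))
```

The p-value is roughly 0.14, so the robust test does not reject the joint null at conventional levels.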
A Robust LM Statistic

Redundant Variables Test
Specification: NARR86 PCNV AVGSEN AVGSEN^2 PTIME86 QEMP86 INC86 BLACK HISPAN C
Redundant Variables: AVGSEN AVGSEN^2

                     Value        df          Probability
F-statistic          1.727778     (2, 2716)   0.1779
Likelihood ratio     3.464803     2           0.1769

LR test summary:
                     Value        df
Restricted LogL     -3350.938     2718
Unrestricted LogL   -3349.205     2716

The LR test of the same restriction is not robust to heteroskedasticity but still yields the same basic inference.
Testing for Heteroskedasticity
Essentially, we want to test H0: Var(u|x1, x2, …, xk) = σ², which is equivalent to H0: E(u²|x1, x2, …, xk) = E(u²) = σ².
If we assume the relationship between u² and the xj is linear, we can test this as a linear restriction.
So, for u² = δ0 + δ1x1 + … + δkxk + v, this means testing H0: δ1 = δ2 = … = δk = 0.
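The equivalence between the two statements of the null relies on the zero conditional mean assumption, E(u|x) = 0, which is maintained throughout. As a one-line reminder of the step:

```latex
\mathrm{Var}(u \mid x) = E(u^2 \mid x) - \big[E(u \mid x)\big]^2 = E(u^2 \mid x)
\quad \text{because } E(u \mid x) = 0 .
```

So testing whether the conditional variance is constant is the same as testing whether the conditional mean of u² depends on the x’s.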
The Breusch-Pagan Test
We don’t observe the error, but we can estimate it with the residuals from the OLS regression.
After regressing the squared residuals on all of the x’s, we can use the R² to form an F or LM test.
The F statistic is just the reported F statistic for overall significance of the regression, F = [R²/k]/[(1 – R²)/(n – k – 1)], which is distributed F(k, n – k – 1).
The LM statistic is LM = nR², which is distributed χ²(k).
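The test can be sketched as follows. This is a minimal illustration assuming NumPy and simulated data; the function name `breusch_pagan` and the data-generating process are hypothetical, and the sketch returns only the statistics, leaving the comparison with χ²(k) or F(k, n – k – 1) critical values to the reader.

```python
import numpy as np

def breusch_pagan(X, y):
    """Breusch-Pagan statistics. X must include a constant column.

    Returns (LM, F, k), where k is the number of slope regressors.
    """
    n, p = X.shape
    k = p - 1
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ beta) ** 2                      # squared OLS residuals
    gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
    e = u2 - X @ gamma                            # residuals of the auxiliary regression
    r2 = 1.0 - np.sum(e ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2
    f = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return lm, f, k

# Simulated data (hypothetical): the error standard deviation is proportional to x1
rng = np.random.default_rng(2)
n = 1000
x1 = rng.uniform(1.0, 5.0, size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + x2 + x1 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

lm, f, k = breusch_pagan(X, y)
print(lm, f, k)  # compare lm with chi-square(k), f with F(k, n - k - 1)
```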
The Breusch-Pagan Test
Let’s test the housing price model from HPRICE1.RAW for heteroskedasticity using the Breusch-Pagan test.

Dependent Variable: PRICE
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
LOTSIZE         0.002068     0.000642      3.220096   0.0018
SQRFT           0.122778     0.013237      9.275093   0.0000
BDRMS          13.85252      9.010145      1.537436   0.1279
C             -21.77031     29.47504      -0.738601   0.4622

R-squared             0.672362    Mean dependent var      293.5460
Adjusted R-squared    0.660661    S.D. dependent var      102.7134
S.E. of regression    59.83348    Akaike info criterion   11.06540
Sum squared resid     300723.8    Schwarz criterion       11.17800
Log likelihood       -482.8775    Hannan-Quinn criter.    11.11076
F-statistic           57.46023    Durbin-Watson stat      2.109796
Prob(F-statistic)     0.000000

From the original regression, save the residuals and calculate their squares, û².
The Breusch-Pagan Test
Regress the squared residuals on the independent variables.

Dependent Variable: RESSQ
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
LOTSIZE         0.201521     0.071009      2.837961   0.0057
SQRFT           1.691037     1.463850      1.155198   0.2513
BDRMS        1041.760      996.3810        1.045544   0.2988
C           -5522.795     3259.478        -1.694380   0.0939

R-squared             0.160141    Mean dependent var      3417.316
Adjusted R-squared    0.130146    S.D. dependent var      7094.384
S.E. of regression    6616.646    Akaike info criterion   20.47695
Sum squared resid     3.68E+09    Schwarz criterion       20.58956
Log likelihood       -896.9860    Hannan-Quinn criter.    20.52232
F-statistic           5.338919    Durbin-Watson stat      2.351111
Prob(F-statistic)     0.002048

Calculate the LM statistic = nR² = 88 × 0.1601 = 14.09. Compare this to the χ²(3) critical value, since there are k = 3 regressors.
Or just use the regression F statistic, 5.34, and compare it to the F(3, 84) distribution.
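As a quick arithmetic check, both statistics can be recovered from the R-squared of the auxiliary regression alone. The sketch below uses only values reported in the output (n = 88, k = 3, R² = 0.160141); the variable names are illustrative.

```python
n, k = 88, 3           # observations and number of regressors in the auxiliary regression
r2 = 0.160141          # R-squared from regressing the squared residuals on the x's

lm = n * r2                                   # LM = n * R^2, compared with chi-square(k)
f = (r2 / k) / ((1 - r2) / (n - k - 1))       # F form, compared with F(k, n - k - 1)
df_num, df_den = k, n - k - 1

print(round(lm, 2), round(f, 2), (df_num, df_den))  # 14.09 5.34 (3, 84)
```

These match the reported Obs*R-squared and F-statistic up to rounding of the R-squared.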
The Breusch-Pagan Test
EViews calculates B-P directly so you don’t have to.

Heteroskedasticity Test: Breusch-Pagan-Godfrey

F-statistic            5.338919    Prob. F(3,84)          0.0020
Obs*R-squared          14.09239    Prob. Chi-Square(3)    0.0028
Scaled explained SS    27.35542    Prob. Chi-Square(3)    0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C           -5522.795     3259.478        -1.694380   0.0939
LOTSIZE         0.201521     0.071009      2.837961   0.0057
SQRFT           1.691037     1.463850      1.155198   0.2513
BDRMS        1041.760      996.3810        1.045544   0.2988

R-squared             0.160141    Mean dependent var      3417.316
Adjusted R-squared    0.130146    S.D. dependent var      7094.384
S.E. of regression    6616.646    Akaike info criterion   20.47695
Sum squared resid     3.68E+09    Schwarz criterion       20.58956
Log likelihood       -896.9860    Hannan-Quinn criter.    20.52232
F-statistic           5.338919    Durbin-Watson stat      2.351111
Prob(F-statistic)     0.002048
The White Test
The Breusch-Pagan test will detect any linear forms of heteroskedasticity.
The White test allows for nonlinearities by using squares and cross products of all the x’s.
We are still just using an F or LM statistic to test whether all the xj, xj², and xjxh are jointly significant.
This can get unwieldy pretty quickly.
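To see how quickly the White regression grows, the regressor set can be built explicitly. This is a minimal sketch assuming NumPy; the helper name `white_terms` is hypothetical. With k regressors there are k levels, k squares, and k(k – 1)/2 cross products, so k = 3 already gives 9 test restrictions.

```python
import numpy as np
from itertools import combinations

def white_terms(X):
    """Levels, squares, and pairwise cross products of the columns of X.

    X is an n x k array of regressors WITHOUT the constant column.
    Note: squares of dummy variables duplicate the levels and should be dropped in practice.
    """
    k = X.shape[1]
    cross = [X[:, a] * X[:, b] for a, b in combinations(range(k), 2)]
    return np.column_stack([X, X ** 2] + cross)

# Hypothetical data with k = 3 regressors
rng = np.random.default_rng(3)
X = rng.normal(size=(88, 3))
Z = white_terms(X)
print(Z.shape)  # (88, 9): 3 levels + 3 squares + 3 cross products
```

The squared OLS residuals would then be regressed on a constant and the columns of `Z`, and the resulting R² used to form the F or LM statistic.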
The White Test

Heteroskedasticity Test: White

F-statistic            5.386953    Prob. F(9,78)          0.0000
Obs*R-squared          33.73166    Prob. Chi-Square(9)    0.0001
Scaled explained SS    65.47818    Prob. Chi-Square(9)    0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C           15626.24      11369.41         1.374411   0.1733
LOTSIZE        -1.859507      0.637097    -2.918719   0.0046
LOTSIZE^2      -4.98E-07      4.63E-06    -0.107498   0.9147
LTSZ*SQFT       0.000457      0.000277     1.649673   0.1030
LTSIZ*BDR       0.314647      0.252094     1.248135   0.2157
SQRFT          -2.673918      8.662183    -0.308689   0.7584
SQRFT^2         0.000352      0.001840     0.191484   0.8486
SQFT*BDR       -1.020860      1.667154    -0.612337   0.5421
BDRMS       -1982.841      5438.483       -0.364595   0.7164
BDRMS^2       289.7541      758.8303       0.381843   0.7036

R-squared             0.383314    Mean dependent var      3417.316
Adjusted R-squared    0.312158    S.D. dependent var      7094.384
S.E. of regression    5883.814    Akaike info criterion   20.30444
Sum squared resid     2.70E+09    Schwarz criterion       20.58596
Log likelihood       -883.3955    Hannan-Quinn criter.    20.41786
F-statistic           5.386953    Durbin-Watson stat      2.052712
Prob(F-statistic)     0.000010
Alternate form of the White test
Consider that the fitted values from OLS, ŷ, are a function of all the x’s.
Thus, ŷ² will be a function of the squares and cross products, so ŷ and ŷ² can proxy for all of the xj, xj², and xjxh.
Regress the squared residuals on ŷ and ŷ² and use the R² to form an F or LM statistic.
Note that we are only testing 2 restrictions now.
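The alternate form can be sketched in the same way as the Breusch-Pagan regression, replacing the x’s with the fitted values and their squares. This is a minimal illustration assuming NumPy and simulated data; the function name `white_special` and the data-generating process are hypothetical.

```python
import numpy as np

def white_special(X, y):
    """Special form of the White test. X must include a constant column.

    Regresses squared residuals on a constant, yhat, and yhat^2; returns (LM, F).
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat = X @ beta
    u2 = (y - yhat) ** 2
    Z = np.column_stack([np.ones(n), yhat, yhat ** 2])
    gamma, *_ = np.linalg.lstsq(Z, u2, rcond=None)
    e = u2 - Z @ gamma
    r2 = 1.0 - np.sum(e ** 2) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2                                   # compare with chi-square(2)
    f = (r2 / 2) / ((1.0 - r2) / (n - 3))         # compare with F(2, n - 3)
    return lm, f

# Simulated data (hypothetical): the error standard deviation is proportional to x1
rng = np.random.default_rng(4)
n = 1000
x1 = rng.uniform(1.0, 5.0, size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + x2 + x1 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

lm, f = white_special(X, y)
print(lm, f)
```

Whatever the number of regressors in the original model, the test always has 2 restrictions, which is what keeps it manageable.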
Alternate form of the White test

Dependent Variable: RESSQ
Method: Least Squares
Date: 10/04/13 Time: 20:54
Sample: 1 88
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
YHAT         -119.6554       53.31721     -2.244217   0.0274
YHAT2           0.208947      0.074596     2.801037   0.0063
C           19071.59       8876.227        2.148615   0.0345

R-squared             0.184868    Mean dependent var      3417.316
Adjusted R-squared    0.165689    S.D. dependent var      7094.384
S.E. of regression    6480.055    Akaike info criterion   20.42434
Sum squared resid     3.57E+09    Schwarz criterion       20.50880
Log likelihood       -895.6710    Hannan-Quinn criter.    20.45837
F-statistic           9.638819    Durbin-Watson stat      2.031774
Prob(F-statistic)     0.000169

LM statistic = nR² = 88 × 0.1849 = 16.27. Compare to the χ²(2) critical value.
Dealing with Heteroskedasticity
The results of our tests for heteroskedasticity provide strong evidence against the null hypothesis of homoskedasticity.
Remember that we had earlier suggested that taking logs of the data may help with heteroskedasticity.
Let’s check that out.
Dealing with Heteroskedasticity
Here is our house price model, but now in logs.

Dependent Variable: LPRICE
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
LLOTSIZE        0.167967     0.038281      4.387712   0.0000
LSQRFT          0.700232     0.092865      7.540306   0.0000
BDRMS           0.036958     0.027531      1.342411   0.1831
C              -1.297041     0.651284     -1.991515   0.0497

R-squared             0.642965    Mean dependent var      5.633180
Adjusted R-squared    0.630214    S.D. dependent var      0.303573
S.E. of regression    0.184603    Akaike info criterion  -0.496833
Sum squared resid     2.862563    Schwarz criterion      -0.384227
Log likelihood        25.86066    Hannan-Quinn criter.   -0.451467
F-statistic           50.42372    Durbin-Watson stat      2.088995
Prob(F-statistic)     0.000000
Dealing with Heteroskedasticity

Heteroskedasticity Test: Breusch-Pagan-Godfrey

F-statistic            1.411498    Prob. F(3,84)          0.2451
Obs*R-squared          4.223241    Prob. Chi-Square(3)    0.2383
Scaled explained SS    9.738973    Prob. Chi-Square(3)    0.0209

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Included observations: 88

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C               0.509993     0.257857      1.977814   0.0512
LLOTSIZE       -0.007016     0.015156     -0.462882   0.6446
LSQRFT         -0.062737     0.036767     -1.706315   0.0916
BDRMS           0.016841     0.010900      1.544983   0.1261

R-squared             0.047991    Mean dependent var      0.032529
Adjusted R-squared    0.013991    S.D. dependent var      0.073605
S.E. of regression    0.073088    Akaike info criterion  -2.349914
Sum squared resid     0.448717    Schwarz criterion      -2.237307
Log likelihood        107.3962    Hannan-Quinn criter.   -2.304547
F-statistic           1.411498    Durbin-Watson stat      2.109799
Prob(F-statistic)     0.245146

The B-P test now cannot reject the null hypothesis at standard significance levels.
Dealing with Heteroskedasticity

Heteroskedasticity Test: White

F-statistic            1.054955    Prob. F(9,78)          0.4053
Obs*R-squared          9.549442    Prob. Chi-Square(9)    0.3882
Scaled explained SS    22.02142    Prob. Chi-Square(9)    0.0088

White’s test also cannot reject the null. So taking logs can sometimes solve the heteroskedasticity problem.
Weighted Least Squares
While it’s always possible to estimate robust standard errors for OLS estimates, if we know something about the specific form of the heteroskedasticity, we can obtain more efficient estimates than OLS.
The basic idea is to transform the model into one that has homoskedastic errors, which is called weighted least squares.
Case of form being known up to a multiplicative constant
Suppose the heteroskedasticity can be modeled as Var(u|x) = σ²h(x), where the trick is to figure out what h(x) ≡ hi looks like.
E(ui/√hi|x) = 0, because hi is only a function of x, and Var(ui/√hi|x) = σ², because we know Var(u|x) = σ²hi.
So, if we divide our whole equation by √hi, we have a model where the error is homoskedastic.
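Written out, the transformation and the variance calculation look as follows. This is a sketch of the standard argument in the slides' notation, with x_{ij} denoting observation i on regressor j; note that the intercept becomes a variable, 1/√hi, in the transformed equation.

```latex
\frac{y_i}{\sqrt{h_i}}
  = \beta_0 \frac{1}{\sqrt{h_i}}
  + \beta_1 \frac{x_{i1}}{\sqrt{h_i}}
  + \dots
  + \beta_k \frac{x_{ik}}{\sqrt{h_i}}
  + \frac{u_i}{\sqrt{h_i}},
\qquad
\mathrm{Var}\!\left(\frac{u_i}{\sqrt{h_i}} \,\Big|\, x\right)
  = \frac{E(u_i^2 \mid x)}{h_i}
  = \frac{\sigma^2 h_i}{h_i}
  = \sigma^2 .
```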
Generalized Least Squares
Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
GLS will be BLUE in this case.
GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of Var(ui|xi).
Weighted Least Squares
While it is intuitive to see why performing OLS on a transformed equation is appropriate, it can be tedious to do the transformation.
Weighted least squares is a way of getting the same thing, without the transformation.
The idea is to minimize the weighted sum of squares (weighted by 1/hi).
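The equivalence between the two routes can be verified numerically. This is a minimal sketch assuming NumPy and simulated data in which h is taken to be an income-like variable; the function names and the data-generating process are hypothetical. One function runs OLS on the data divided by √hi, the other solves the weighted normal equations with weights 1/hi.

```python
import numpy as np

def wls_by_transformation(X, y, h):
    """OLS on the transformed data: every variable (including the constant) divided by sqrt(h)."""
    s = np.sqrt(h)
    beta, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)
    return beta

def wls_by_weights(X, y, h):
    """Minimize sum_i (y_i - x_i'b)^2 / h_i directly via the weighted normal equations."""
    w = 1.0 / h
    return np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))

# Simulated data (hypothetical): Var(u | inc) = sigma^2 * inc
rng = np.random.default_rng(5)
n = 2000
inc = rng.uniform(10.0, 100.0, size=n)
y = -10.0 + 0.8 * inc + np.sqrt(inc) * rng.normal(size=n)
X = np.column_stack([np.ones(n), inc])

b_transform = wls_by_transformation(X, y, h=inc)
b_weighted = wls_by_weights(X, y, h=inc)
print(b_transform, b_weighted)  # the two coefficient vectors coincide
```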
Weighted Least Squares
Weighted Least Squares
Let’s look at an example using data from 401ksubs.wf1. We estimate net total financial assets, nettfa, as a function of income, inc.

Dependent Variable: NETTFA
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017
White heteroskedasticity-consistent standard errors & covariance

Variable     Coefficient   Std. Error   t-Statistic   Prob.
INC             0.820681     0.103594      7.922125   0.0000
C             -10.57095      2.530272     -4.177793   0.0000

R-squared             0.082673    Mean dependent var      13.59498
Adjusted R-squared    0.082218    S.D. dependent var      47.59058
S.E. of regression    45.59223    Akaike info criterion   10.47834
Sum squared resid     4188483.    Schwarz criterion       10.48390
Log likelihood       -10565.41    Hannan-Quinn criter.    10.48038
F-statistic           181.5995    Durbin-Watson stat      1.914495
Prob(F-statistic)     0.000000

We have used White’s robust standard errors, and OLS is unbiased and consistent, so we can appeal to asymptotic distribution theory to make valid inference. But OLS is not efficient if the errors are not homoskedastic.
Weighted Least Squares

Heteroskedasticity Test: Breusch-Pagan-Godfrey

F-statistic            8.770638    Prob. F(1,2015)        0.0031
Obs*R-squared          8.741295    Prob. Chi-Square(1)    0.0031
Scaled explained SS    1174.716    Prob. Chi-Square(1)    0.0000

Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable     Coefficient   Std. Error   t-Statistic   Prob.
C           -1886.359     1537.683        -1.226755   0.2201
INC           134.5828      45.44374       2.961526   0.0031

R-squared             0.004334    Mean dependent var      2076.590
Adjusted R-squared    0.003840    S.D. dependent var      34086.54
S.E. of regression    34021.04    Akaike info criterion   23.70834
Sum squared resid     2.33E+12    Schwarz criterion       23.71390
Log likelihood       -23907.86    Hannan-Quinn criter.    23.71038
F-statistic           8.770638    Durbin-Watson stat      2.628511
Prob(F-statistic)     0.003097

The B-P test for heteroskedasticity rejects the null hypothesis of homoskedasticity at conventional significance levels (p-value 0.0031).
Weighted Least Squares
Let’s assume that the error variance is a function of income: Var(u|x) = σ² × inc, so that hi = inci.

Dependent Variable: YSTAR
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable     Coefficient   Std. Error   t-Statistic   Prob.
XSTAR           0.787052     0.063481     12.39815    0.0000
CSTAR          -9.580702     1.653284     -5.794953   0.0000

R-squared             0.043868    Mean dependent var      2.180711
Adjusted R-squared    0.043394    S.D. dependent var      7.381265
S.E. of regression    7.219338    Akaike info criterion   6.792395
Sum squared resid     105019.5    Schwarz criterion       6.797957
Log likelihood       -6848.130    Hannan-Quinn criter.    6.794436
Durbin-Watson stat    1.957812

The results are not much different from OLS, but the standard errors are smaller.
Weighted Least Squares
Let’s add some other control variables and compare OLS with WLS. Displayed are the OLS results.

Dependent Variable: NETTFA
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable      Coefficient   Std. Error   t-Statistic   Prob.
INC              0.770583     0.061452     12.53960    0.0000
(AGE-25)^2       0.025127     0.002593      9.688756   0.0000
MALE             2.477927     2.047776      1.210057   0.2264
E401K            6.886223     2.123275      3.243209   0.0012
C              -20.98499      2.472022     -8.488998   0.0000

R-squared             0.127868    Mean dependent var      13.59498
Adjusted R-squared    0.126134    S.D. dependent var      47.59058
S.E. of regression    44.48805    Akaike info criterion   10.43079
Sum squared resid     3982124.    Schwarz criterion       10.44470
Log likelihood       -10514.46    Hannan-Quinn criter.    10.43590
F-statistic           73.74763    Durbin-Watson stat      1.968952
Prob(F-statistic)     0.000000

The B-P test for heteroskedasticity has a p-value of 0.0034.
Weighted Least Squares
Now the WLS results.

Dependent Variable: YSTAR
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable      Coefficient   Std. Error   t-Statistic   Prob.
XSTAR            0.740384     0.064303     11.51401    0.0000
AGE1STAR         0.017537     0.001931      9.079619   0.0000
MALESTAR         1.840529     1.563587      1.177120   0.2393
E401KSTAR        5.188281     1.703426      3.045792   0.0024
CSTAR          -16.70252      1.957995     -8.530422   0.0000

R-squared             0.085679    Mean dependent var      2.180711
Adjusted R-squared    0.083861    S.D. dependent var      7.381265
S.E. of regression    7.064989    Akaike info criterion   6.750656
Sum squared resid     100427.1    Schwarz criterion       6.764561
Log likelihood       -6803.036    Hannan-Quinn criter.    6.755759
Durbin-Watson stat    1.999335

The coefficient estimates and standard errors across OLS and WLS are only marginally different, so basic inference changes little in this example. The B-P test for heteroskedasticity has a p-value of 0.1383.
Feasible GLS
WLS is great if we know what Var(ui|xi) looks like.
More typical is the case where you don’t know the form of the heteroskedasticity.
In this case, you need to estimate h(xi).
Typically, we start with the assumption of a fairly flexible model, such as Var(u|x) = σ²exp(δ0 + δ1x1 + … + δkxk).
Since we don’t know the δ’s, we must estimate them.
Feasible GLS (continued)
Our assumption implies that u² = σ²exp(δ0 + δ1x1 + … + δkxk)v, where E(v|x) = 1.
If v is also independent of x, then ln(u²) = α0 + δ1x1 + … + δkxk + e, where E(e) = 0 and e is independent of x.
Now, we know that û is an estimate of u, so we can estimate this by OLS.
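The step from the multiplicative model to the log equation is short but easy to lose. The following sketch spells it out, assuming v is independent of x; the definitions of α0 and e are the usual ones and are stated here only for illustration.

```latex
\ln(u^2) = \ln\sigma^2 + \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + \ln v
         = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e,
\qquad
\alpha_0 \equiv \ln\sigma^2 + \delta_0 + E(\ln v), \quad
e \equiv \ln v - E(\ln v).
```

By construction e has mean zero, and it inherits independence from x through v, so the slope coefficients δ1, …, δk can be estimated by OLS.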
Feasible GLS (continued)
Now, an estimate of h is obtained as ĥ = exp(ĝ), and the inverse of this is our weight.
So, what did we do?
Run the original OLS model, save the residuals, û, square them, and take the log.
Regress ln(û²) on all of the independent variables and get the fitted values, ĝ.
Do WLS using 1/exp(ĝ) as the weight.
Feasible GLS (continued)
Feasible GLS (continued)
1. Run the regression of y on x1, x2, …, xk and obtain the residuals, û.
2. Create log(û²) by first squaring the OLS residuals and then taking the natural log.
3. Regress the log-squared residuals on the x’s and save the fitted values.
4. Exponentiate the fitted values from the previous step. These are the ĥi.
5. Estimate by WLS using the weights 1/ĥi created above.
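The five steps translate almost line by line into code. This is a minimal sketch assuming NumPy and simulated data with an exponential variance function; the function name `fgls` and the data-generating process are hypothetical, and the WLS step is carried out by dividing every variable (including the constant) by the square root of the estimated h.

```python
import numpy as np

def fgls(X, y):
    """Feasible GLS with Var(u|x) = sigma^2 * exp(x'delta). X must include a constant column."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ beta_ols                              # step 1: OLS residuals
    log_u2 = np.log(u ** 2)                           # step 2: square first, then log
    delta, *_ = np.linalg.lstsq(X, log_u2, rcond=None)
    g_hat = X @ delta                                 # step 3: fitted values
    h_hat = np.exp(g_hat)                             # step 4: estimated h_i
    s = np.sqrt(h_hat)
    beta_fgls, *_ = np.linalg.lstsq(X / s[:, None], y / s, rcond=None)  # step 5: WLS
    return beta_fgls, h_hat

# Simulated data (hypothetical): Var(u|x) = exp(0.5 + x)
rng = np.random.default_rng(6)
n = 2000
x = rng.uniform(0.0, 2.0, size=n)
y = 1.0 + 2.0 * x + np.exp(0.25 + 0.5 * x) * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta_fgls, h_hat = fgls(X, y)
print(beta_fgls)  # should be near the true values (1, 2) used in the simulation
```

Exponentiating the fitted values guarantees that every estimated variance is positive, which is the main reason for modeling the variance in logs.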
Feasible GLS
Here is the earlier OLS regression of financial wealth on income and other controls.

Dependent Variable: NETTFA
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable      Coefficient   Std. Error   t-Statistic   Prob.
INC              0.770583     0.061452     12.53960    0.0000
(AGE-25)^2       0.025127     0.002593      9.688756   0.0000
MALE             2.477927     2.047776      1.210057   0.2264
E401K            6.886223     2.123275      3.243209   0.0012
C              -20.98499      2.472022     -8.488998   0.0000

R-squared             0.127868    Mean dependent var      13.59498
Adjusted R-squared    0.126134    S.D. dependent var      47.59058
S.E. of regression    44.48805    Akaike info criterion   10.43079
Sum squared resid     3982124.    Schwarz criterion       10.44470
Log likelihood       -10514.46    Hannan-Quinn criter.    10.43590
F-statistic           73.74763    Durbin-Watson stat      1.968952
Prob(F-statistic)     0.000000

When we estimated the WLS version, we assumed a known form of heteroskedasticity. That assumption may be wrong. To estimate F-GLS, save the residuals from this regression.
Feasible GLS
Create the log squared residuals (square first, then log). Regress these on the independent variables and save the fitted values, ĝ.

Dependent Variable: LRESSQ
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable      Coefficient   Std. Error   t-Statistic   Prob.
INC              0.053933     0.003053     17.66508    0.0000
(AGE-25)^2       0.002315     0.000129     17.96550    0.0000
MALE            -0.028477     0.101738     -0.279908   0.7796
E401K            0.219585     0.105489      2.081582   0.0375
C                1.945335     0.122816     15.83943    0.0000

R-squared             0.255746    Mean dependent var      4.340098
Adjusted R-squared    0.254266    S.D. dependent var      2.559493
S.E. of regression    2.210273    Akaike info criterion   4.426585
Sum squared resid     9829.237    Schwarz criterion       4.440490
Log likelihood       -4459.211    Hannan-Quinn criter.    4.431689
F-statistic           172.8442    Durbin-Watson stat      1.941515
Prob(F-statistic)     0.000000
Feasible GLS
Exponentiate the fitted values, ĝ. These are now the ĥi. We then take the square root of ĥi and divide each variable (including the intercept) by this value.

Dependent Variable: YFGLS
Method: Least Squares
Sample: 1 9275 IF FSIZE=1
Included observations: 2017

Variable      Coefficient   Std. Error   t-Statistic   Prob.
INCFGLS          0.443493     0.056687      7.823524   0.0000
AGE1FGLS         0.014946     0.002520      5.931668   0.0000
MALEFGLS         1.429707     1.010099      1.415412   0.1571
E401KFGLS        4.170857     1.195483      3.488847   0.0005
CFGLS           -9.265882     1.423330     -6.510005   0.0000

R-squared             0.022768    Mean dependent var      0.890035
Adjusted R-squared    0.020825    S.D. dependent var      3.396705
S.E. of regression    3.361151    Akaike info criterion   5.264920
Sum squared resid     22730.24    Schwarz criterion       5.278825
Log likelihood       -5304.672    Hannan-Quinn criter.    5.270023
Durbin-Watson stat    2.847689

Note that the coefficient estimates and R² are noticeably smaller once we account for the heteroskedasticity. The B-P test has a p-value of 0.78.
WLS Wrapup
When doing F tests with WLS, form the weights from the unrestricted model and use those weights to do WLS on the restricted model as well as the unrestricted model.
Remember we are using WLS just for efficiency – OLS is still unbiased and consistent.
Estimates will still be different due to sampling error, but if they are very different then it’s likely that some other Gauss-Markov assumption is false.