## Multiple Regression Analysis

Author: Raymond Harmon
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$$

Heteroskedasticity

ECON 5329 - Professor Crowder

### What is Heteroskedasticity?

- Recall that the homoskedasticity assumption implies that, conditional on the explanatory variables, the variance of the unobserved error, $u$, is constant.
- If this is not true, that is, if the variance of $u$ differs across values of the $x$'s, then the errors are heteroskedastic.
- Example: when estimating returns to education, ability is unobservable, and we may think the variance of ability differs by educational attainment.

### Example of Heteroskedasticity

[Figure: conditional densities f(y|x) drawn at x1, x2, and x3 along the regression line E(y|x) = β0 + β1x, with y on the vertical axis and x on the horizontal axis.]

### Why Worry About Heteroskedasticity?

- OLS is still unbiased and consistent, even if we do not assume homoskedasticity.
- The usual standard errors of the estimates are biased if we have heteroskedasticity.
- If the standard errors are biased, we cannot use the usual t statistics, F statistics, or LM statistics for drawing inferences.

### Variance with Heteroskedasticity

For the simple case,

$$\hat\beta_1 = \beta_1 + \frac{\sum_i (x_i - \bar x)\,u_i}{\sum_i (x_i - \bar x)^2}, \quad \text{so} \quad \mathrm{Var}(\hat\beta_1) = \frac{\sum_i (x_i - \bar x)^2 \sigma_i^2}{SST_x^2}, \quad \text{where } SST_x = \sum_i (x_i - \bar x)^2.$$

A valid estimator for this when $\sigma_i^2 \neq \sigma^2$ is

$$\frac{\sum_i (x_i - \bar x)^2\,\hat u_i^2}{SST_x^2},$$

where the $\hat u_i$ are the OLS residuals.
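The simple-regression formula above can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not the course's EViews workflow; the function name `simple_ols_with_robust_se` and the five data points are assumptions chosen so the arithmetic is easy to check by hand.

```python
import math

def simple_ols_with_robust_se(x, y):
    """OLS slope for y = b0 + b1*x with usual and robust standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sst_x = sum(d * d for d in dx)                      # SST_x
    b1 = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sst_x
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]  # OLS residuals u-hat
    # Usual variance: sigma-hat^2 / SST_x (valid only under homoskedasticity)
    sigma2_hat = sum(u * u for u in resid) / (n - 2)
    usual_var = sigma2_hat / sst_x
    # Robust variance: sum (x_i - x_bar)^2 * u_i^2 / SST_x^2
    robust_var = sum(d * d * u * u for d, u in zip(dx, resid)) / sst_x ** 2
    return b1, math.sqrt(usual_var), math.sqrt(robust_var)

# Hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
b1, se_usual, se_robust = simple_ols_with_robust_se(x, y)
print(b1, se_usual, se_robust)
```

With only five observations the two standard errors differ a lot; the robust one has only asymptotic justification, so a sample this small is useful for checking the formula and nothing more.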

### Variance with Heteroskedasticity

For the general multiple regression model, a valid estimator of $\mathrm{Var}(\hat\beta_j)$ with heteroskedasticity is

$$\widehat{\mathrm{Var}}(\hat\beta_j) = \frac{\sum_i \hat r_{ij}^2\,\hat u_i^2}{SST_j^2},$$

where $\hat r_{ij}$ is the $i$th residual from regressing $x_j$ on all other independent variables, and $SST_j$ is the sum of squared residuals from this regression.

### Robust Standard Errors

Use our familiar wage regression to demonstrate the calculation of robust standard errors. Step 1: Save the residuals from this regression and square them.

Dependent Variable: LWAGE. Method: Least Squares. Included observations: 526.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| MARRIED | 0.212676 | 0.055357 | 3.841881 | 0.0001 |
| MARFEM | -0.300593 | 0.071767 | -4.188461 | 0.0000 |
| FEMALE | -0.110350 | 0.055742 | -1.979658 | 0.0483 |
| EDUC | 0.078910 | 0.006694 | 11.78733 | 0.0000 |
| EXPER | 0.026801 | 0.005243 | 5.111835 | 0.0000 |
| EXPERSQ | -0.000535 | 0.000110 | -4.847105 | 0.0000 |
| TENURE | 0.029088 | 0.006762 | 4.301613 | 0.0000 |
| TENURSQ | -0.000533 | 0.000231 | -2.305552 | 0.0215 |
| C | 0.321378 | 0.100009 | 3.213492 | 0.0014 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.460877 | Mean dependent var | 1.623268 |
| Adjusted R-squared | 0.452535 | S.D. dependent var | 0.531538 |
| S.E. of regression | 0.393290 | Akaike info criterion | 0.988423 |
| Sum squared resid | 79.96799 | Schwarz criterion | 1.061403 |
| Log likelihood | -250.9552 | Hannan-Quinn criter. | 1.016998 |
| F-statistic | 55.24559 | Durbin-Watson stat | 1.784785 |
| Prob(F-statistic) | 0.000000 | | |

### Robust Standard Errors

Step 2: Run the auxiliary regression of $x_j$ on all the other $x$'s and save the residuals and the sum of squared residuals.

Dependent Variable: MARRIED. Method: Least Squares. Date: 10/03/13, Time: 21:29. Sample: 1 526. Included observations: 526.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| MARFEM | 0.893251 | 0.041284 | 21.63694 | 0.0000 |
| FEMALE | -0.597199 | 0.035622 | -16.76484 | 0.0000 |
| EDUC | 0.011068 | 0.005291 | 2.091790 | 0.0369 |
| EXPER | 0.025058 | 0.004013 | 6.244295 | 0.0000 |
| EXPERSQ | -0.000438 | 8.55E-05 | -5.126129 | 0.0000 |
| TENURE | 0.010709 | 0.005346 | 2.003072 | 0.0457 |
| TENURSQ | -0.000230 | 0.000183 | -1.252971 | 0.2108 |
| C | 0.275636 | 0.078449 | 3.513583 | 0.0005 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.597240 | Mean dependent var | 0.608365 |
| Adjusted R-squared | 0.591797 | S.D. dependent var | 0.488580 |
| S.E. of regression | 0.312158 | Akaike info criterion | 0.524476 |
| Sum squared resid | 50.47516 | Schwarz criterion | 0.589347 |
| Log likelihood | -129.9371 | Hannan-Quinn criter. | 0.549876 |
| F-statistic | 109.7323 | Durbin-Watson stat | 1.964443 |
| Prob(F-statistic) | 0.000000 | | |

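### Variance with Heteroskedasticity

$$\widehat{\mathrm{Var}}(\hat\beta_1) = \frac{\sum_i \hat r_{i1}^2\,\hat u_i^2}{SST_1^2}, \qquad \sum_i \hat r_{i1}^2\,\hat u_i^2 = 8.1765, \qquad SST_1^2 = 2547.7422, \qquad \widehat{\mathrm{Var}}(\hat\beta_1) = 0.0032$$

$$\hat\sigma_{\hat\beta_1} = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)} = 0.0567$$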
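As a quick check of the arithmetic, the two reported sums can be plugged into the formula directly. The variable names below are illustrative; the numbers are the ones reported on the slide.

```python
import math

sum_r2_u2 = 8.1765        # sum of r_i1^2 * u_i^2, as reported
sst1_squared = 2547.7422  # SST_1^2, the squared SSR from the auxiliary regression

var_hat = sum_r2_u2 / sst1_squared  # robust variance of beta_1-hat
se_hat = math.sqrt(var_hat)         # robust standard error

print(round(var_hat, 4), round(se_hat, 4))
```

Note that 2547.7422 is the square of 50.47516, the sum of squared residuals from the auxiliary regression of MARRIED on the other regressors.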

### Robust Standard Errors

EViews can calculate the robust standard errors for us.

Dependent Variable: LWAGE. Method: Least Squares. Included observations: 526. White heteroskedasticity-consistent standard errors & covariance.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| MARRIED | 0.212676 | 0.056651 | 3.754142 | 0.0002 |
| MARFEM | -0.300593 | 0.071750 | -4.189453 | 0.0000 |
| FEMALE | -0.110350 | 0.056626 | -1.948772 | 0.0519 |
| EDUC | 0.078910 | 0.007351 | 10.73469 | 0.0000 |
| EXPER | 0.026801 | 0.005095 | 5.260206 | 0.0000 |
| EXPERSQ | -0.000535 | 0.000105 | -5.076983 | 0.0000 |
| TENURE | 0.029088 | 0.006881 | 4.227050 | 0.0000 |
| TENURSQ | -0.000533 | 0.000242 | -2.206796 | 0.0278 |
| C | 0.321378 | 0.108528 | 2.961234 | 0.0032 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.460877 | Mean dependent var | 1.623268 |
| Adjusted R-squared | 0.452535 | S.D. dependent var | 0.531538 |
| S.E. of regression | 0.393290 | Akaike info criterion | 0.988423 |
| Sum squared resid | 79.96799 | Schwarz criterion | 1.061403 |
| Log likelihood | -250.9552 | Hannan-Quinn criter. | 1.016998 |
| F-statistic | 55.24559 | Durbin-Watson stat | 1.784785 |
| Prob(F-statistic) | 0.000000 | | |

### Robust Standard Errors

- Now that we have a consistent estimate of the variance, its square root can be used as a standard error for inference.
- These are typically called robust standard errors.
- Sometimes the estimated variance is corrected for degrees of freedom by multiplying by $n/(n - k - 1)$.
- As $n \to \infty$ the two versions are equivalent.

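### Robust Standard Errors

Dependent Variable: LWAGE. Method: Least Squares. Included observations: 526. White heteroskedasticity-consistent standard errors & covariance.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| MARRIED | 0.212676 | 0.057142 | 3.721886 | 0.0002 |
| MARFEM | -0.300593 | 0.072372 | -4.153457 | 0.0000 |
| FEMALE | -0.110350 | 0.057116 | -1.932028 | 0.0539 |
| EDUC | 0.078910 | 0.007415 | 10.64246 | 0.0000 |
| EXPER | 0.026801 | 0.005139 | 5.215010 | 0.0000 |
| EXPERSQ | -0.000535 | 0.000106 | -5.033361 | 0.0000 |
| TENURE | 0.029088 | 0.006941 | 4.190731 | 0.0000 |
| TENURSQ | -0.000533 | 0.000244 | -2.187835 | 0.0291 |
| C | 0.321378 | 0.109469 | 2.935791 | 0.0035 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.460877 | Mean dependent var | 1.623268 |
| Adjusted R-squared | 0.452535 | S.D. dependent var | 0.531538 |
| S.E. of regression | 0.393290 | Akaike info criterion | 0.988423 |
| Sum squared resid | 79.96799 | Schwarz criterion | 1.061403 |
| Log likelihood | -250.9552 | Hannan-Quinn criter. | 1.016998 |
| F-statistic | 55.24559 | Durbin-Watson stat | 1.784785 |
| Prob(F-statistic) | 0.000000 | | |

The uncorrected robust standard error is $\hat\sigma_{\hat\beta_1} = \sqrt{\widehat{\mathrm{Var}}(\hat\beta_1)} = 0.0567$ (0.056651 before rounding). Correct for the degrees of freedom used in the estimation by multiplying the variance by $n/(n - k - 1) = 526/517 = 1.0174$, which means multiplying the standard error by the square root of that factor:

$$\hat\sigma_{\hat\beta_1} \times \sqrt{\frac{n}{n - k - 1}} = 0.056651 \times 1.0087 \approx 0.0571$$

This matches the standard error reported for MARRIED in the table above.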
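The degrees-of-freedom correction can be verified numerically. The sketch below assumes the uncorrected robust standard error for MARRIED is 0.056651 (from the White-corrected output without the adjustment) and that the model has k = 8 slope coefficients plus an intercept; the variable names are illustrative.

```python
import math

n, k = 526, 8              # observations and number of slope coefficients
se_uncorrected = 0.056651  # robust s.e. of MARRIED without the d.f. adjustment

variance_factor = n / (n - k - 1)  # scales the estimated variance
se_corrected = se_uncorrected * math.sqrt(variance_factor)  # scales the s.e.

print(round(variance_factor, 4), round(se_corrected, 6))
```

Because the factor applies to the variance, the standard error grows only by its square root, about 0.9 percent here, which reproduces the 0.057142 shown in the degrees-of-freedom-adjusted output.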

### Robust Standard Errors (cont)

It is important to remember that these robust standard errors have only asymptotic justification. With small sample sizes, t statistics formed with robust standard errors will not have a distribution close to the t distribution, and inferences based on them may not be correct.

### A Robust LM Statistic

1. Run OLS on the restricted model and save the residuals, ŭ.
2. Regress each of the excluded variables on all of the included variables (q different regressions) and save each set of residuals ř1, ř2, …, řq.
3. Regress a variable defined to be equal to 1 on ř1 ŭ, ř2 ŭ, …, řq ŭ, with no intercept.
4. The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final regression.
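The four steps can be sketched in plain Python. This is an illustrative implementation under stated assumptions, not the EViews procedure used in the course: the helpers `solve`, `ols_residuals`, and `robust_lm` are hypothetical names, the regressions are solved through the normal equations (fine for small, well-conditioned examples), and the data are made up.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def ols_residuals(y, cols):
    """Residuals from regressing y on the given columns (no intercept is added)."""
    k = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)]
           for i in range(k)]
    xty = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(beta[j] * cols[j][i] for j in range(k))
            for i, yi in enumerate(y)]

def robust_lm(y, included, excluded):
    """Heteroskedasticity-robust LM statistic for excluding the given variables."""
    n = len(y)
    ones = [1.0] * n
    X = [ones] + included
    u = ols_residuals(y, X)                  # step 1: restricted residuals
    products = []
    for z in excluded:                       # step 2: one regression per excluded variable
        r = ols_residuals(z, X)
        products.append([ri * ui for ri, ui in zip(r, u)])
    e = ols_residuals(ones, products)        # step 3: regress 1 on r*u, no intercept
    ssr1 = sum(ei * ei for ei in e)
    return n - ssr1                          # step 4: LM = n - SSR1

# Hypothetical data: one included regressor, one excluded regressor
x1 = [float(i) for i in range(12)]
z = [float((i * i) % 7) for i in range(12)]
y = [1.0 + 0.5 * i + 0.3 * ((3 * i) % 5 - 2) for i in range(12)]
lm = robust_lm(y, [x1], [z])
print(lm)
```

The statistic would be compared with a chi-square distribution with q degrees of freedom, where q is the number of excluded variables (q = 1 in this toy example).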

### A Robust LM Statistic

Let's use the arrest data from project #2 to examine the calculation of the robust LM statistic. Of interest is whether increasing average sentence length (avgsen) reduces crime (narr86).

Dependent Variable: NARR86. Method: Least Squares. Included observations: 2725.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| PCNV | -0.135595 | 0.040370 | -3.358825 | 0.0008 |
| AVGSEN | 0.017841 | 0.009696 | 1.840039 | 0.0659 |
| AVGSEN^2 | -0.000516 | 0.000297 | -1.738336 | 0.0823 |
| PTIME86 | -0.039360 | 0.008693 | -4.527519 | 0.0000 |
| QEMP86 | -0.050507 | 0.014435 | -3.499055 | 0.0005 |
| INC86 | -0.001480 | 0.000341 | -4.345149 | 0.0000 |
| BLACK | 0.324602 | 0.045419 | 7.146872 | 0.0000 |
| HISPAN | 0.193380 | 0.039703 | 4.870607 | 0.0000 |
| C | 0.567013 | 0.036057 | 15.72531 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.072798 | Mean dependent var | 0.404404 |
| Adjusted R-squared | 0.070067 | S.D. dependent var | 0.859077 |
| S.E. of regression | 0.828434 | Akaike info criterion | 2.464738 |
| Sum squared resid | 1863.998 | Schwarz criterion | 2.484258 |
| Log likelihood | -3349.205 | Hannan-Quinn criter. | 2.471794 |
| F-statistic | 26.65535 | Durbin-Watson stat | 1.840556 |
| Prob(F-statistic) | 0.000000 | | |

### A Robust LM Statistic

Since avgsen enters our model in levels and through its square, the null hypothesis is a joint null of $\beta_2 = \beta_3 = 0$. The first step is to estimate the restricted model and save the residuals, ŭ.

Dependent Variable: NARR86. Method: Least Squares. Included observations: 2725.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| PCNV | -0.132278 | 0.040341 | -3.279039 | 0.0011 |
| PTIME86 | -0.037795 | 0.008497 | -4.448054 | 0.0000 |
| QEMP86 | -0.050981 | 0.014436 | -3.531573 | 0.0004 |
| INC86 | -0.001490 | 0.000340 | -4.376607 | 0.0000 |
| BLACK | 0.329688 | 0.045178 | 7.297577 | 0.0000 |
| HISPAN | 0.195451 | 0.039693 | 4.924074 | 0.0000 |
| C | 0.570334 | 0.036007 | 15.83942 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.071618 | Mean dependent var | 0.404404 |
| Adjusted R-squared | 0.069569 | S.D. dependent var | 0.859077 |
| S.E. of regression | 0.828656 | Akaike info criterion | 2.464541 |
| Sum squared resid | 1866.370 | Schwarz criterion | 2.479724 |
| Log likelihood | -3350.938 | Hannan-Quinn criter. | 2.470029 |
| F-statistic | 34.94583 | Durbin-Watson stat | 1.841232 |
| Prob(F-statistic) | 0.000000 | | |

### A Robust LM Statistic

Regress each excluded variable on all of the included independent variables and save the residuals. Here the excluded variable is AVGSEN, which gives ř1.

Dependent Variable: AVGSEN. Method: Least Squares. Included observations: 2725.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| PCNV | 0.180057 | 0.164973 | 1.091438 | 0.2752 |
| PTIME86 | 0.391734 | 0.034749 | 11.27337 | 0.0000 |
| QEMP86 | -0.002581 | 0.059035 | -0.043712 | 0.9651 |
| INC86 | -0.002354 | 0.001392 | -1.690467 | 0.0911 |
| BLACK | 0.990388 | 0.184754 | 5.360567 | 0.0000 |
| HISPAN | 0.216405 | 0.162324 | 1.333165 | 0.1826 |
| C | 0.344888 | 0.147252 | 2.342170 | 0.0192 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.068887 | Mean dependent var | 0.632294 |
| Adjusted R-squared | 0.066831 | S.D. dependent var | 3.508031 |
| S.E. of regression | 3.388782 | Akaike info criterion | 5.281383 |
| Sum squared resid | 31213.08 | Schwarz criterion | 5.296566 |
| Log likelihood | -7188.885 | Hannan-Quinn criter. | 5.286871 |
| F-statistic | 33.51432 | Durbin-Watson stat | 1.825014 |
| Prob(F-statistic) | 0.000000 | | |

### A Robust LM Statistic

Regress each excluded variable on all of the included variables and save the residuals. Here the excluded variable is AVGSEN^2, which gives ř2.

Dependent Variable: AVGSEN^2. Method: Least Squares. Included observations: 2725.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| PCNV | -0.202596 | 5.385343 | -0.037620 | 0.9700 |
| PTIME86 | 10.50548 | 1.134329 | 9.261409 | 0.0000 |
| QEMP86 | 0.829241 | 1.927145 | 0.430295 | 0.6670 |
| INC86 | -0.061357 | 0.045448 | -1.350053 | 0.1771 |
| BLACK | 24.37115 | 6.031096 | 4.040916 | 0.0001 |
| HISPAN | 3.466819 | 5.298882 | 0.654255 | 0.5130 |
| C | 5.484001 | 4.806861 | 1.140869 | 0.2540 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.043563 | Mean dependent var | 12.70156 |
| Adjusted R-squared | 0.041452 | S.D. dependent var | 112.9895 |
| S.E. of regression | 110.6230 | Akaike info criterion | 12.25270 |
| Sum squared resid | 33261354 | Schwarz criterion | 12.26788 |
| Log likelihood | -16687.30 | Hannan-Quinn criter. | 12.25819 |
| F-statistic | 20.63297 | Durbin-Watson stat | 1.945413 |
| Prob(F-statistic) | 0.000000 | | |

### A Robust LM Statistic

Multiply the residuals element by element to form ř1 ŭ and ř2 ŭ. Then regress a vector of ones on these two residual products, with no intercept.

Dependent Variable: X1. Method: Least Squares. Included observations: 2725.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| RU2 | -0.001045 | 0.000548 | -1.906863 | 0.0566 |
| RU1 | 0.027785 | 0.014060 | 1.976170 | 0.0482 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Mean dependent var | 1.000000 | S.D. dependent var | 0.000000 |
| S.E. of regression | 0.999633 | Akaike info criterion | 2.837877 |
| Sum squared resid | 2721.003 | Schwarz criterion | 2.842215 |
| Log likelihood | -3864.607 | Hannan-Quinn criter. | 2.839445 |
| Durbin-Watson stat | 0.003010 | | |

The LM statistic is n – SSR1, where SSR1 is the sum of squared residuals from this final regression: 2725 – 2721.003 = 4.0. Compare this to the χ²(2) distribution, whose 5% critical value is 5.99, so the joint null is not rejected at the 5% level.
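The final step is simple arithmetic, and for two degrees of freedom the chi-square p-value has a closed form, $P(\chi^2_2 > x) = e^{-x/2}$, so no statistics library is needed. The snippet below is a small check using the numbers reported above; the variable names are illustrative.

```python
import math

n = 2725
ssr1 = 2721.003  # sum of squared residuals from regressing 1 on the residual products

lm = n - ssr1                   # robust LM statistic
p_value = math.exp(-lm / 2.0)   # survival function of chi-square with 2 d.f.

print(round(lm, 3), round(p_value, 4))
```

A p-value of roughly 0.14 is in line with the non-robust tests of the same restriction, which also fail to reject at conventional levels.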

### A Robust LM Statistic

Redundant Variables Test. Specification: NARR86 PCNV AVGSEN AVGSEN^2 PTIME86 QEMP86 INC86 BLACK HISPAN C. Redundant Variables: AVGSEN AVGSEN^2.

| Test | Value | df | Probability |
|---|---|---|---|
| F-statistic | 1.727778 | (2, 2716) | 0.1779 |
| Likelihood ratio | 3.464803 | 2 | 0.1769 |

LR test summary:

| | Value | df |
|---|---|---|
| Restricted LogL | -3350.938 | 2718 |
| Unrestricted LogL | -3349.205 | 2716 |

The LR test of the same restriction is not robust to heteroskedasticity but still yields the same basic inference.

### Testing for Heteroskedasticity

- Essentially, we want to test $H_0: \mathrm{Var}(u|x_1, x_2, \dots, x_k) = \sigma^2$, which is equivalent to $H_0: E(u^2|x_1, x_2, \dots, x_k) = E(u^2) = \sigma^2$.
- If we assume the relationship between $u^2$ and the $x_j$ is linear, we can test this as a linear restriction.
- So, for $u^2 = \delta_0 + \delta_1 x_1 + \dots + \delta_k x_k + v$, this means testing $H_0: \delta_1 = \delta_2 = \dots = \delta_k = 0$.

### The Breusch-Pagan Test

- We don't observe the error, but we can estimate it with the residuals from the OLS regression.
- After regressing the squared residuals on all of the $x$'s, we can use the $R^2$ to form an F or LM test.
- The F statistic is just the reported F statistic for overall significance of the regression, $F = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$, which is distributed $F_{k,\,n-k-1}$.
- The LM statistic is $LM = nR^2$, which is distributed $\chi^2_k$.

### The Breusch-Pagan Test

Let's test the housing price model from HPRICE1.RAW for heteroskedasticity using the Breusch-Pagan test. From the original regression, save the residuals and calculate their squares, $\hat u^2$.

Dependent Variable: PRICE. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| LOTSIZE | 0.002068 | 0.000642 | 3.220096 | 0.0018 |
| SQRFT | 0.122778 | 0.013237 | 9.275093 | 0.0000 |
| BDRMS | 13.85252 | 9.010145 | 1.537436 | 0.1279 |
| C | -21.77031 | 29.47504 | -0.738601 | 0.4622 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.672362 | Mean dependent var | 293.5460 |
| Adjusted R-squared | 0.660661 | S.D. dependent var | 102.7134 |
| S.E. of regression | 59.83348 | Akaike info criterion | 11.06540 |
| Sum squared resid | 300723.8 | Schwarz criterion | 11.17800 |
| Log likelihood | -482.8775 | Hannan-Quinn criter. | 11.11076 |
| F-statistic | 57.46023 | Durbin-Watson stat | 2.109796 |
| Prob(F-statistic) | 0.000000 | | |

### The Breusch-Pagan Test

Regress the squared residuals on the independent variables.

Dependent Variable: RESSQ. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| LOTSIZE | 0.201521 | 0.071009 | 2.837961 | 0.0057 |
| SQRFT | 1.691037 | 1.463850 | 1.155198 | 0.2513 |
| BDRMS | 1041.760 | 996.3810 | 1.045544 | 0.2988 |
| C | -5522.795 | 3259.478 | -1.694380 | 0.0939 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.160141 | Mean dependent var | 3417.316 |
| Adjusted R-squared | 0.130146 | S.D. dependent var | 7094.384 |
| S.E. of regression | 6616.646 | Akaike info criterion | 20.47695 |
| Sum squared resid | 3.68E+09 | Schwarz criterion | 20.58956 |
| Log likelihood | -896.9860 | Hannan-Quinn criter. | 20.52232 |
| F-statistic | 5.338919 | Durbin-Watson stat | 2.351111 |
| Prob(F-statistic) | 0.002048 | | |

Calculate the LM test: $nR^2 = 88 \times 0.1601 = 14.09$. Compare this to the $\chi^2(3)$ critical value, since the auxiliary regression has $k = 3$ slope coefficients. Or just use the regression F-statistic, 5.34, and compare it to the $F(3, 84)$ distribution.
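Both Breusch-Pagan statistics follow from the auxiliary regression's $R^2$ alone. The short check below plugs in the values reported above (n = 88, k = 3, R² = 0.160141); the variable names are illustrative.

```python
n, k = 88, 3    # observations and number of regressors in the auxiliary regression
r2 = 0.160141   # R-squared from regressing squared residuals on the x's

lm = n * r2                                    # LM = n * R^2, chi-square with k d.f.
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))   # F with (k, n - k - 1) d.f.

print(round(lm, 2), round(f_stat, 2))
```

Both values agree, up to rounding of the reported R², with the Obs*R-squared and F-statistic that EViews prints for its built-in Breusch-Pagan-Godfrey test.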

### The Breusch-Pagan Test

EViews calculates B-P directly so you don't have to.

Heteroskedasticity Test: Breusch-Pagan-Godfrey

| Statistic | Value | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 5.338919 | F(3,84) | 0.0020 |
| Obs*R-squared | 14.09239 | Chi-Square(3) | 0.0028 |
| Scaled explained SS | 27.35542 | Chi-Square(3) | 0.0000 |

Test Equation. Dependent Variable: RESID^2. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -5522.795 | 3259.478 | -1.694380 | 0.0939 |
| LOTSIZE | 0.201521 | 0.071009 | 2.837961 | 0.0057 |
| SQRFT | 1.691037 | 1.463850 | 1.155198 | 0.2513 |
| BDRMS | 1041.760 | 996.3810 | 1.045544 | 0.2988 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.160141 | Mean dependent var | 3417.316 |
| Adjusted R-squared | 0.130146 | S.D. dependent var | 7094.384 |
| S.E. of regression | 6616.646 | Akaike info criterion | 20.47695 |
| Sum squared resid | 3.68E+09 | Schwarz criterion | 20.58956 |
| Log likelihood | -896.9860 | Hannan-Quinn criter. | 20.52232 |
| F-statistic | 5.338919 | Durbin-Watson stat | 2.351111 |
| Prob(F-statistic) | 0.002048 | | |

### The White Test

- The Breusch-Pagan test will detect any linear forms of heteroskedasticity.
- The White test allows for nonlinearities by using squares and cross products of all the $x$'s.
- We are still just using an F or LM statistic to test whether all the $x_j$, $x_j^2$, and $x_j x_h$ are jointly significant.
- This can get unwieldy pretty quickly.

### The White Test

Heteroskedasticity Test: White

| Statistic | Value | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 5.386953 | F(9,78) | 0.0000 |
| Obs*R-squared | 33.73166 | Chi-Square(9) | 0.0001 |
| Scaled explained SS | 65.47818 | Chi-Square(9) | 0.0000 |

Test Equation. Dependent Variable: RESID^2. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 15626.24 | 11369.41 | 1.374411 | 0.1733 |
| LOTSIZE | -1.859507 | 0.637097 | -2.918719 | 0.0046 |
| LOTSIZE^2 | -4.98E-07 | 4.63E-06 | -0.107498 | 0.9147 |
| LTSZ*SQFT | 0.000457 | 0.000277 | 1.649673 | 0.1030 |
| LTSIZ*BDR | 0.314647 | 0.252094 | 1.248135 | 0.2157 |
| SQRFT | -2.673918 | 8.662183 | -0.308689 | 0.7584 |
| SQRFT^2 | 0.000352 | 0.001840 | 0.191484 | 0.8486 |
| SQFT*BDR | -1.020860 | 1.667154 | -0.612337 | 0.5421 |
| BDRMS | -1982.841 | 5438.483 | -0.364595 | 0.7164 |
| BDRMS^2 | 289.7541 | 758.8303 | 0.381843 | 0.7036 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.383314 | Mean dependent var | 3417.316 |
| Adjusted R-squared | 0.312158 | S.D. dependent var | 7094.384 |
| S.E. of regression | 5883.814 | Akaike info criterion | 20.30444 |
| Sum squared resid | 2.70E+09 | Schwarz criterion | 20.58596 |
| Log likelihood | -883.3955 | Hannan-Quinn criter. | 20.41786 |
| F-statistic | 5.386953 | Durbin-Watson stat | 2.052712 |
| Prob(F-statistic) | 0.000010 | | |

### Alternate form of the White test

- Consider that the fitted values from OLS, $\hat y$, are a function of all the $x$'s.
- Thus, $\hat y^2$ will be a function of the squares and cross products, so $\hat y$ and $\hat y^2$ can proxy for all of the $x_j$, $x_j^2$, and $x_j x_h$.
- Regress the squared residuals on $\hat y$ and $\hat y^2$ and use the $R^2$ to form an F or LM statistic.
- Note that we are only testing 2 restrictions now.

### Alternate form of the White test

Dependent Variable: RESSQ. Method: Least Squares. Date: 10/04/13, Time: 20:54. Sample: 1 88. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| YHAT | -119.6554 | 53.31721 | -2.244217 | 0.0274 |
| YHAT2 | 0.208947 | 0.074596 | 2.801037 | 0.0063 |
| C | 19071.59 | 8876.227 | 2.148615 | 0.0345 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.184868 | Mean dependent var | 3417.316 |
| Adjusted R-squared | 0.165689 | S.D. dependent var | 7094.384 |
| S.E. of regression | 6480.055 | Akaike info criterion | 20.42434 |
| Sum squared resid | 3.57E+09 | Schwarz criterion | 20.50880 |
| Log likelihood | -895.6710 | Hannan-Quinn criter. | 20.45837 |
| F-statistic | 9.638819 | Durbin-Watson stat | 2.031774 |
| Prob(F-statistic) | 0.000169 | | |

LM test: $nR^2 = 88 \times 0.1849 = 16.27$. Compare to the $\chi^2(2)$ critical value.

### Dealing with Heteroskedasticity

- The results of our tests for heteroskedasticity provide strong evidence against the null hypothesis of homoskedasticity.
- Remember that we had earlier suggested that taking logs of the data may help with heteroskedasticity.
- Let's check that out.

### Dealing with Heteroskedasticity

Here is our house price model, but now in logs.

Dependent Variable: LPRICE. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| LLOTSIZE | 0.167967 | 0.038281 | 4.387712 | 0.0000 |
| LSQRFT | 0.700232 | 0.092865 | 7.540306 | 0.0000 |
| BDRMS | 0.036958 | 0.027531 | 1.342411 | 0.1831 |
| C | -1.297041 | 0.651284 | -1.991515 | 0.0497 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.642965 | Mean dependent var | 5.633180 |
| Adjusted R-squared | 0.630214 | S.D. dependent var | 0.303573 |
| S.E. of regression | 0.184603 | Akaike info criterion | -0.496833 |
| Sum squared resid | 2.862563 | Schwarz criterion | -0.384227 |
| Log likelihood | 25.86066 | Hannan-Quinn criter. | -0.451467 |
| F-statistic | 50.42372 | Durbin-Watson stat | 2.088995 |
| Prob(F-statistic) | 0.000000 | | |

### Dealing with Heteroskedasticity

Heteroskedasticity Test: Breusch-Pagan-Godfrey

| Statistic | Value | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 1.411498 | F(3,84) | 0.2451 |
| Obs*R-squared | 4.223241 | Chi-Square(3) | 0.2383 |
| Scaled explained SS | 9.738973 | Chi-Square(3) | 0.0209 |

Test Equation. Dependent Variable: RESID^2. Method: Least Squares. Included observations: 88.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | 0.509993 | 0.257857 | 1.977814 | 0.0512 |
| LLOTSIZE | -0.007016 | 0.015156 | -0.462882 | 0.6446 |
| LSQRFT | -0.062737 | 0.036767 | -1.706315 | 0.0916 |
| BDRMS | 0.016841 | 0.010900 | 1.544983 | 0.1261 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.047991 | Mean dependent var | 0.032529 |
| Adjusted R-squared | 0.013991 | S.D. dependent var | 0.073605 |
| S.E. of regression | 0.073088 | Akaike info criterion | -2.349914 |
| Sum squared resid | 0.448717 | Schwarz criterion | -2.237307 |
| Log likelihood | 107.3962 | Hannan-Quinn criter. | -2.304547 |
| F-statistic | 1.411498 | Durbin-Watson stat | 2.109799 |
| Prob(F-statistic) | 0.245146 | | |

The B-P test now cannot reject the null hypothesis at standard significance levels.

### Dealing with Heteroskedasticity

Heteroskedasticity Test: White

| Statistic | Value | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 1.054955 | F(9,78) | 0.4053 |
| Obs*R-squared | 9.549442 | Chi-Square(9) | 0.3882 |
| Scaled explained SS | 22.02142 | Chi-Square(9) | 0.0088 |

White's test also cannot reject the null. So taking logs can sometimes solve the heteroskedasticity problem.

### Weighted Least Squares

- While it's always possible to estimate robust standard errors for OLS estimates, if we know something about the specific form of the heteroskedasticity, we can obtain more efficient estimates than OLS.
- The basic idea is to transform the model into one that has homoskedastic errors. This is called weighted least squares.

### Case of form being known up to a multiplicative constant

- Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u|x) = \sigma^2 h(x)$, where the trick is to figure out what $h(x) \equiv h_i$ looks like.
- $E(u_i/\sqrt{h_i}\,|\,x) = 0$, because $h_i$ is only a function of $x$, and $\mathrm{Var}(u_i/\sqrt{h_i}\,|\,x) = \sigma^2$, because we know $\mathrm{Var}(u|x) = \sigma^2 h_i$.
- So, if we divided our whole equation by $\sqrt{h_i}$, we would have a model where the error is homoskedastic.
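Written out, dividing the model through by $\sqrt{h_i}$ gives the transformed equation below. This is a worked restatement of the argument above; note that the intercept becomes a regressor $1/\sqrt{h_i}$, so the transformed model is estimated without a separate constant.

$$\frac{y_i}{\sqrt{h_i}} = \beta_0 \frac{1}{\sqrt{h_i}} + \beta_1 \frac{x_{i1}}{\sqrt{h_i}} + \dots + \beta_k \frac{x_{ik}}{\sqrt{h_i}} + \frac{u_i}{\sqrt{h_i}}$$

$$\mathrm{Var}\!\left(\frac{u_i}{\sqrt{h_i}}\,\Big|\,x\right) = \frac{\mathrm{Var}(u_i|x)}{h_i} = \frac{\sigma^2 h_i}{h_i} = \sigma^2$$

The coefficients $\beta_0, \dots, \beta_k$ are the same as in the original model, so they keep their original interpretation.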

### Generalized Least Squares

- Estimating the transformed equation by OLS is an example of generalized least squares (GLS).
- GLS will be BLUE in this case.
- GLS is a weighted least squares (WLS) procedure where each squared residual is weighted by the inverse of $\mathrm{Var}(u_i|x_i)$.

### Weighted Least Squares

- While it is intuitive to see why performing OLS on a transformed equation is appropriate, it can be tedious to do the transformation.
- Weighted least squares is a way of getting the same thing without the transformation.
- The idea is to minimize the weighted sum of squares (weighted by $1/h_i$).


### Weighted Least Squares

Let's look at an example using data from 401ksubs.wf1. We estimate net total financial assets, nettfa, as a function of income, inc. We have used White's robust standard errors, and OLS is unbiased and consistent, so we can appeal to asymptotic distribution theory to make valid inference. But OLS is not efficient if the errors are not homoskedastic.

Dependent Variable: NETTFA. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017. White heteroskedasticity-consistent standard errors & covariance.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| INC | 0.820681 | 0.103594 | 7.922125 | 0.0000 |
| C | -10.57095 | 2.530272 | -4.177793 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.082673 | Mean dependent var | 13.59498 |
| Adjusted R-squared | 0.082218 | S.D. dependent var | 47.59058 |
| S.E. of regression | 45.59223 | Akaike info criterion | 10.47834 |
| Sum squared resid | 4188483. | Schwarz criterion | 10.48390 |
| Log likelihood | -10565.41 | Hannan-Quinn criter. | 10.48038 |
| F-statistic | 181.5995 | Durbin-Watson stat | 1.914495 |
| Prob(F-statistic) | 0.000000 | | |

### Weighted Least Squares

Heteroskedasticity Test: Breusch-Pagan-Godfrey

| Statistic | Value | Distribution | Prob. |
|---|---|---|---|
| F-statistic | 8.770638 | F(1,2015) | 0.0031 |
| Obs*R-squared | 8.741295 | Chi-Square(1) | 0.0031 |
| Scaled explained SS | 1174.716 | Chi-Square(1) | 0.0000 |

Test Equation. Dependent Variable: RESID^2. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| C | -1886.359 | 1537.683 | -1.226755 | 0.2201 |
| INC | 134.5828 | 45.44374 | 2.961526 | 0.0031 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.004334 | Mean dependent var | 2076.590 |
| Adjusted R-squared | 0.003840 | S.D. dependent var | 34086.54 |
| S.E. of regression | 34021.04 | Akaike info criterion | 23.70834 |
| Sum squared resid | 2.33E+12 | Schwarz criterion | 23.71390 |
| Log likelihood | -23907.86 | Hannan-Quinn criter. | 23.71038 |
| F-statistic | 8.770638 | Durbin-Watson stat | 2.628511 |
| Prob(F-statistic) | 0.003097 | | |

The B-P test for heteroskedasticity rejects the null hypothesis of homoskedasticity at conventional significance levels (p = 0.0031).

### Weighted Least Squares

Let's assume that the error variance is proportional to income: $\mathrm{Var}(u|x) = \sigma^2 \cdot inc$, so $h_i = inc_i$.

Dependent Variable: YSTAR. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| XSTAR | 0.787052 | 0.063481 | 12.39815 | 0.0000 |
| CSTAR | -9.580702 | 1.653284 | -5.794953 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.043868 | Mean dependent var | 2.180711 |
| Adjusted R-squared | 0.043394 | S.D. dependent var | 7.381265 |
| S.E. of regression | 7.219338 | Akaike info criterion | 6.792395 |
| Sum squared resid | 105019.5 | Schwarz criterion | 6.797957 |
| Log likelihood | -6848.130 | Hannan-Quinn criter. | 6.794436 |
| Durbin-Watson stat | 1.957812 | | |

The results are not much different from OLS, but the standard errors are smaller.
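The transformed variables in this regression (YSTAR, XSTAR, CSTAR) are presumably the original variables divided by $\sqrt{inc_i}$. A minimal sketch of that transformation for a one-regressor model follows. The function name `wls_simple` and the data are hypothetical; the data are constructed to lie exactly on a line so the result is easy to verify.

```python
import math

def wls_simple(y, x, h):
    """WLS for y = b0 + b1*x when Var(u|x) = sigma^2 * h, via the transformed model."""
    s = [math.sqrt(hi) for hi in h]
    y_star = [yi / si for yi, si in zip(y, s)]   # y / sqrt(h)
    c_star = [1.0 / si for si in s]              # transformed intercept column
    x_star = [xi / si for xi, si in zip(x, s)]   # x / sqrt(h)
    # Normal equations for regressing y* on (c*, x*) with no extra intercept
    a11 = sum(c * c for c in c_star)
    a12 = sum(c * xv for c, xv in zip(c_star, x_star))
    a22 = sum(xv * xv for xv in x_star)
    r1 = sum(c * yv for c, yv in zip(c_star, y_star))
    r2 = sum(xv * yv for xv, yv in zip(x_star, y_star))
    det = a11 * a22 - a12 * a12
    b0 = (r1 * a22 - a12 * r2) / det
    b1 = (a11 * r2 - a12 * r1) / det
    return b0, b1

# Hypothetical data lying exactly on y = -10 + 0.8 * inc
inc = [10.0, 20.0, 30.0, 40.0, 50.0]
y = [-10.0 + 0.8 * v for v in inc]
b0, b1 = wls_simple(y, inc, h=inc)  # h_i = inc_i, as assumed on the slide
print(b0, b1)
```

With noiseless data WLS and OLS coincide; with real data the two differ because WLS downweights the high-variance (here, high-income) observations.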

### Weighted Least Squares

Let's add some other control variables and compare OLS with WLS. Displayed are the OLS results.

Dependent Variable: NETTFA. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| INC | 0.770583 | 0.061452 | 12.53960 | 0.0000 |
| (AGE-25)^2 | 0.025127 | 0.002593 | 9.688756 | 0.0000 |
| MALE | 2.477927 | 2.047776 | 1.210057 | 0.2264 |
| E401K | 6.886223 | 2.123275 | 3.243209 | 0.0012 |
| C | -20.98499 | 2.472022 | -8.488998 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.127868 | Mean dependent var | 13.59498 |
| Adjusted R-squared | 0.126134 | S.D. dependent var | 47.59058 |
| S.E. of regression | 44.48805 | Akaike info criterion | 10.43079 |
| Sum squared resid | 3982124. | Schwarz criterion | 10.44470 |
| Log likelihood | -10514.46 | Hannan-Quinn criter. | 10.43590 |
| F-statistic | 73.74763 | Durbin-Watson stat | 1.968952 |
| Prob(F-statistic) | 0.000000 | | |

The B-P test for heteroskedasticity has a p-value of 0.0034.

### Weighted Least Squares

Now the WLS results.

Dependent Variable: YSTAR. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| XSTAR | 0.740384 | 0.064303 | 11.51401 | 0.0000 |
| AGE1STAR | 0.017537 | 0.001931 | 9.079619 | 0.0000 |
| MALESTAR | 1.840529 | 1.563587 | 1.177120 | 0.2393 |
| E401KSTAR | 5.188281 | 1.703426 | 3.045792 | 0.0024 |
| CSTAR | -16.70252 | 1.957995 | -8.530422 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.085679 | Mean dependent var | 2.180711 |
| Adjusted R-squared | 0.083861 | S.D. dependent var | 7.381265 |
| S.E. of regression | 7.064989 | Akaike info criterion | 6.750656 |
| Sum squared resid | 100427.1 | Schwarz criterion | 6.764561 |
| Log likelihood | -6803.036 | Hannan-Quinn criter. | 6.755759 |
| Durbin-Watson stat | 1.999335 | | |

The coefficient estimates and standard errors across OLS and WLS are only marginally different, so basic inference changes little in this example. The B-P test for heteroskedasticity has a p-value of 0.1383.

### Feasible GLS

- WLS is great if we know what $\mathrm{Var}(u_i|x_i)$ looks like.
- More typical is the case where you don't know the form of the heteroskedasticity.
- In this case, you need to estimate $h(x_i)$.
- Typically, we start with the assumption of a fairly flexible model, such as $\mathrm{Var}(u|x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$.
- Since we don't know the $\delta_j$, we must estimate them.

### Feasible GLS (continued)

- Our assumption implies that $u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\,v$, where $E(v|x) = 1$.
- If we assume $v$ is independent of $x$, then $\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + e$, where $E(e) = 0$ and $e$ is independent of $x$.
- Now, we know that $\hat u$ is an estimate of $u$, so we can estimate this by OLS.

### Feasible GLS (continued)

- Now, an estimate of $h$ is obtained as $\hat h = \exp(\hat g)$, and the inverse of this is our weight.
- So, what did we do?
  - Run the original OLS model, save the residuals, $\hat u$, square them, and take the log.
  - Regress $\ln(\hat u^2)$ on all of the independent variables and get the fitted values, $\hat g$.
  - Do WLS using $1/\exp(\hat g)$ as the weight.


### Feasible GLS (continued)

1. Run the regression of $y$ on $x_1, x_2, \dots, x_k$ and obtain the residuals, $\hat u$.
2. Create $\log(\hat u^2)$ by first squaring the OLS residuals and then taking the natural log.
3. Regress the log-squared residuals on the $x$'s and save the fitted values, $\hat g$.
4. Exponentiate the fitted values from the previous step. These are the $\hat h_i$.
5. Estimate by WLS using the weights $1/\hat h_i$ created above.
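The five steps can be sketched for a single regressor in plain Python. This is an illustrative outline under stated assumptions, not the course's EViews commands: `weighted_line_fit` and `fgls_simple` are hypothetical helper names, and the data are synthetic, with an error whose magnitude grows in proportion to x.

```python
import math

def weighted_line_fit(y, x, w):
    """Minimize sum w_i * (y_i - b0 - b1*x_i)^2; w_i = 1 for all i gives OLS."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def fgls_simple(y, x):
    ones = [1.0] * len(y)
    b0, b1 = weighted_line_fit(y, x, ones)                 # step 1: OLS
    log_u2 = [math.log((yi - b0 - b1 * xi) ** 2)           # step 2: log of squared residuals
              for xi, yi in zip(x, y)]
    a0, d1 = weighted_line_fit(log_u2, x, ones)            # step 3: regress on x, get g-hat
    h_hat = [math.exp(a0 + d1 * xi) for xi in x]           # step 4: h-hat = exp(g-hat)
    weights = [1.0 / hi for hi in h_hat]
    return weighted_line_fit(y, x, weights), h_hat         # step 5: WLS with weights 1/h-hat

# Synthetic data: y = 2 + 3x + u, with |u| proportional to x and alternating sign
x = [float(i) for i in range(1, 21)]
y = [2.0 + 3.0 * xi + (0.1 * xi if i % 2 == 0 else -0.1 * xi)
     for i, xi in enumerate(x)]
(fgls_b0, fgls_b1), h_hat = fgls_simple(y, x)
print(fgls_b0, fgls_b1)
```

Minimizing the weighted sum of squares with weights $1/\hat h_i$ is equivalent to running OLS after dividing every variable, including the intercept, by $\sqrt{\hat h_i}$.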

### Feasible GLS

Here is the earlier OLS regression of financial wealth on income and other controls.

Dependent Variable: NETTFA. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| INC | 0.770583 | 0.061452 | 12.53960 | 0.0000 |
| (AGE-25)^2 | 0.025127 | 0.002593 | 9.688756 | 0.0000 |
| MALE | 2.477927 | 2.047776 | 1.210057 | 0.2264 |
| E401K | 6.886223 | 2.123275 | 3.243209 | 0.0012 |
| C | -20.98499 | 2.472022 | -8.488998 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.127868 | Mean dependent var | 13.59498 |
| Adjusted R-squared | 0.126134 | S.D. dependent var | 47.59058 |
| S.E. of regression | 44.48805 | Akaike info criterion | 10.43079 |
| Sum squared resid | 3982124. | Schwarz criterion | 10.44470 |
| Log likelihood | -10514.46 | Hannan-Quinn criter. | 10.43590 |
| F-statistic | 73.74763 | Durbin-Watson stat | 1.968952 |
| Prob(F-statistic) | 0.000000 | | |

When we estimated the WLS version we assumed a known form of heteroskedasticity. That assumption may be wrong. To estimate F-GLS, save the residuals from this regression.

### Feasible GLS

Create the log squared residuals (square first, then log). Regress these on the independent variables and save the fitted values, $\hat g$.

Dependent Variable: LRESSQ. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| INC | 0.053933 | 0.003053 | 17.66508 | 0.0000 |
| (AGE-25)^2 | 0.002315 | 0.000129 | 17.96550 | 0.0000 |
| MALE | -0.028477 | 0.101738 | -0.279908 | 0.7796 |
| E401K | 0.219585 | 0.105489 | 2.081582 | 0.0375 |
| C | 1.945335 | 0.122816 | 15.83943 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.255746 | Mean dependent var | 4.340098 |
| Adjusted R-squared | 0.254266 | S.D. dependent var | 2.559493 |
| S.E. of regression | 2.210273 | Akaike info criterion | 4.426585 |
| Sum squared resid | 9829.237 | Schwarz criterion | 4.440490 |
| Log likelihood | -4459.211 | Hannan-Quinn criter. | 4.431689 |
| F-statistic | 172.8442 | Durbin-Watson stat | 1.941515 |
| Prob(F-statistic) | 0.000000 | | |

### Feasible GLS

Exponentiate the fitted values, $\hat g$. These are now the $\hat h_i$. We then take the square root of $\hat h_i$ and divide each variable (including the intercept) by this value.

Dependent Variable: YFGLS. Method: Least Squares. Sample: 1 9275 IF FSIZE=1. Included observations: 2017.

| Variable | Coefficient | Std. Error | t-Statistic | Prob. |
|---|---|---|---|---|
| INCFGLS | 0.443493 | 0.056687 | 7.823524 | 0.0000 |
| AGE1FGLS | 0.014946 | 0.002520 | 5.931668 | 0.0000 |
| MALEFGLS | 1.429707 | 1.010099 | 1.415412 | 0.1571 |
| E401KFGLS | 4.170857 | 1.195483 | 3.488847 | 0.0005 |
| CFGLS | -9.265882 | 1.423330 | -6.510005 | 0.0000 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| R-squared | 0.022768 | Mean dependent var | 0.890035 |
| Adjusted R-squared | 0.020825 | S.D. dependent var | 3.396705 |
| S.E. of regression | 3.361151 | Akaike info criterion | 5.264920 |
| Sum squared resid | 22730.24 | Schwarz criterion | 5.278825 |
| Log likelihood | -5304.672 | Hannan-Quinn criter. | 5.270023 |
| Durbin-Watson stat | 2.847689 | | |

Note that the coefficient estimates and $R^2$ are noticeably smaller once we account for the heteroskedasticity. The B-P test has a p-value of 0.78.

### WLS Wrapup

- When doing F tests with WLS, form the weights from the unrestricted model and use those weights to do WLS on the restricted model as well as the unrestricted model.
- Remember we are using WLS just for efficiency; OLS is still unbiased and consistent.
- Estimates will still be different due to sampling error, but if they are very different then it's likely that some other Gauss-Markov assumption is false.