ECON 482 / WH Hong
Multiple Regression Analysis: Inference
1. Sampling Distribution of the OLS Estimators

Statistical inference in the regression model
- Hypothesis tests about population parameters
- Construction of confidence intervals

Sampling distribution of the OLS estimators
- The OLS estimators are random variables.
- We already know their expected values and their variances.
- However, for hypothesis tests we need to know their distribution.
- In order to derive their distribution we need an additional assumption.
- Assumption about the distribution of errors: normal distribution =>
Additional Assumption MLR.6 (Normality of error terms)
The population error u_i is independent of the explanatory variables x_i1, x_i2, ..., x_ik and is normally distributed with zero mean and variance σ². That is,

  u_i ~ Normal(0, σ²), independently of x_i1, x_i2, ..., x_ik
Terminology
- MLR.1 - MLR.5: Gauss-Markov assumptions
- MLR.1 - MLR.6: Gauss-Markov assumptions + Normality = Classical Linear Model (CLM) assumptions

Under the CLM assumptions, we can summarize the population assumptions as:

  y | x ~ Normal( β_0 + β_1 x_1 + ... + β_k x_k , σ² )
Graphically,
Note that Assumption MLR.6 is much stronger than any of our previous assumptions. In fact, if we make Assumption MLR.6, we are necessarily assuming MLR.4 (Zero conditional mean) and MLR.5 (Homoskedasticity).
Discussion of the normality assumption
• The error term is the sum of "many" different unobserved factors.
• Sums of independent factors are normally distributed by the Central Limit Theorem (CLT).
• Problems with this argument:
  - The CLT argument works only when all unobserved factors affect y in a separate, additive fashion. But nothing guarantees that this is so; the functional form may be complicated.
  - The distributions of the individual factors may be very heterogeneous.
  - How independent are the different factors?
• The normality of the error term is an empirical question. In many cases, normality is questionable or impossible by definition.
• Examples where normality cannot hold: wage (nonnegative); an unemployment indicator (takes on only the values 1 or 0), etc.
• In some cases, normality can be achieved through a transformation of the dependent variable (e.g. use log(wage) instead of wage).
• Under the Gauss-Markov assumptions OLS is BLUE; adding normality (MLR.6), OLS is the minimum variance unbiased estimator, i.e., best among all (not just linear) unbiased estimators.
• Important: for the purpose of statistical inference, the assumption of normality can be replaced by a large sample size.
Normal sampling distributions
• Normality of the error term translates into normal sampling distributions of the OLS estimators.

Theorem 4.1 (Normal sampling distributions)
Under the CLM assumptions MLR.1 through MLR.6,

  β̂_j ~ Normal( β_j , Var(β̂_j) )

where

  Var(β̂_j) = σ² / [ SST_j (1 − R²_j) ],   SST_j = Σ_{i=1}^{n} ( x_ij − x̄_j )²,

and R²_j is the R-squared from the regression of x_j on all other independent variables.

Therefore, the standardized estimator follows:

  ( β̂_j − β_j ) / s.d.(β̂_j) ~ Normal(0, 1)
Sketch of the proof
• The proof uses a property of normality: any linear combination of normally distributed random variables is also normally distributed.
• Note that the OLS estimators are linear combinations of the errors:

  β̂_j = β_j + Σ_{i=1}^{n} w_ij u_i,  where w_ij = r̂_ij / SSR_j,

  r̂_ij is the i-th residual from the regression of x_j on all other independent variables, and SSR_j = Σ_{i=1}^{n} r̂²_ij.
• Therefore, β̂_j is also normally distributed.
• Note that any linear combination of β̂_1, ..., β̂_k is also normally distributed, since each β̂_j is normally distributed.
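Theorem 4.1 can be illustrated by simulation: with normal errors and the regressors held fixed, the OLS slope estimates from repeated samples scatter normally around the true coefficient, with the standard deviation given by the variance formula above. A minimal sketch (the data-generating process, with b0 = 1, b1 = 2, σ = 1, is hypothetical; numpy only):

```python
import numpy as np

# Hypothetical DGP: y = b0 + b1*x + u, with u ~ Normal(0, sigma^2) (MLR.6)
# and x held fixed across replications, as in the theorem.
rng = np.random.default_rng(0)
b0, b1, sigma, n, reps = 1.0, 2.0, 1.0, 50, 2000
x = np.linspace(0.0, 1.0, n)
sst_x = np.sum((x - x.mean()) ** 2)          # SST of the single regressor

slopes = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma, n)            # normal errors
    y = b0 + b1 * x + u
    # OLS slope in simple regression: sum of cross-products / SST_x
    slopes[r] = np.sum((x - x.mean()) * (y - y.mean())) / sst_x

# Empirical mean/sd of the estimates vs. the theoretical sd sigma/sqrt(SST_x)
print(slopes.mean(), slopes.std(), sigma / np.sqrt(sst_x))
```

A histogram of `slopes` looks bell-shaped around 2, and the empirical standard deviation matches σ/√SST_x, as Theorem 4.1 predicts for the one-regressor case.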
2. Testing Hypotheses about a Single Population Parameter: The t Test

(1) Test statistics

Theorem 4.2 (t-distribution for the standardized estimator)
Under the CLM assumptions MLR.1 through MLR.6,

  ( β̂_j − β_j ) / s.e.(β̂_j) ~ t_{n−k−1} = t_df

where k + 1 is the number of unknown parameters, and n − k − 1 is the degrees of freedom (df). Because the standard error s.e.(β̂_j) replaces the unknown σ with its estimate σ̂, the distribution is t rather than standard normal.

Note: the t-distribution is close to the standard normal distribution if n − k − 1 is large.
Constructing the test statistic and hypothesis testing
• Null hypothesis: H_0: β_j = 0
  - The population parameter is equal to zero, i.e., after controlling for the other independent variables, x_j has no effect on the expected value of y.
• Under this null hypothesis, we can construct the t-statistic (or t-ratio) as:

  t_{β̂_j} = β̂_j / s.e.(β̂_j)

  - The farther the estimated coefficient is from zero, the less likely it is that the null hypothesis holds true.
• Distribution of the t-statistic if the null hypothesis is true:

  t_{β̂_j} = β̂_j / s.e.(β̂_j) = ( β̂_j − β_j ) / s.e.(β̂_j) ~ t_{n−k−1}

• Goal: define a rejection rule so that, if H_0 is true, it is rejected only with a small probability (= significance level, e.g. 5%).
(2) Testing against one-sided alternatives
• Hypothesis (greater than zero):

  H_0: β_j = 0  against  H_1: β_j > 0

• Construct the t-statistic under the null hypothesis.
• Decide on a rejection rule, i.e., choose the significance level (10%, 5%, or 1%) and find the corresponding critical value c.
• The null is rejected if

  t_{β̂_j} > c_{0.05} = t_{n−k−1, 0.05};  e.g. c_{0.05} = t_{28, 0.05} = 1.701

• Graphically, the rejection region is the right tail of the t_{n−k−1} distribution with area 0.05.
(Example) Wage equation
• Estimated equation:

  log(wage) = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
             (0.104) (0.007)      (0.0017)       (0.003)

  n = 526, R² = 0.316; standard errors are in parentheses
• Test whether, after controlling for education and tenure, more work experience leads to higher hourly wages.
• Hypothesis: H_0: β_exper = 0 vs. H_1: β_exper > 0
• t-statistic:

  t_{β̂_exper} = 0.0041 / 0.0017 ≈ 2.41

• Critical values: df = n − k − 1 = 526 − 3 − 1 = 522
  At the 5% significance level, c_{0.05} = t_{522, 0.05} = 1.645; at the 1% significance level, c_{0.01} = t_{522, 0.01} = 2.326.
• Since t_{β̂_exper} > c_{0.05}, the effect of experience on hourly wage is statistically greater than zero at the 5% (and even at the 1%) significance level.
Similarly, we can test against the other one-sided alternative (less than zero):

  H_0: β_j = 0  against  H_1: β_j < 0

• Construct the t-statistic under the null hypothesis.
• Decide on a rejection rule, i.e., choose the significance level (10%, 5%, or 1%) and find the corresponding critical value c.
• The null is rejected if

  t_{β̂_j} < −c_{0.05} = −t_{n−k−1, 0.05};  e.g. −c_{0.05} = −t_{18, 0.05} = −1.734

• Graphically, the rejection region is the left tail of the t_{n−k−1} distribution with area 0.05.
(3) Testing against two-sided alternatives
• Hypothesis:

  H_0: β_j = 0  against  H_1: β_j ≠ 0

• Construct the t-statistic under the null hypothesis.
• Decide on a rejection rule, i.e., choose the significance level (10%, 5%, or 1%) and find the corresponding critical value c.
• The null is rejected if

  | t_{β̂_j} | > c_{0.05} = t_{n−k−1, 0.025};  e.g. c_{0.05} = t_{25, 0.025} = 2.06

• Graphically, the rejection region consists of both tails of the t_{n−k−1} distribution, each with area 0.025.
(Example) Determinants of college GPA
• Estimated equation:

  collGPA = 1.39 + 0.412 hsGPA + 0.015 ACT − 0.083 skipped
           (0.33) (0.094)       (0.011)     (0.026)

  skipped: lectures missed per week
  n = 141, R² = 0.234

• t_hsGPA = 4.38 > c_{0.01} = 2.58
  | t_ACT | = 1.36 < c_{0.10} = 1.645
  | t_skipped | = 3.19 > c_{0.01} = 2.58
• The effects of hsGPA and skipped are significantly different from zero at the 1% significance level. The effect of ACT is not significantly different from zero, not even at the 10% significance level.
"Statistically significant" variables in a regression
- If a regression coefficient is significantly different from zero in a two-sided test, the corresponding variable is said to be "statistically significant."
- If the number of degrees of freedom is large enough for the normal approximation to apply, the following rules of thumb hold:

  | t-ratio | > 1.645  =>  "statistically significant at the 10% level"
  | t-ratio | > 1.96   =>  "statistically significant at the 5% level"
  | t-ratio | > 2.576  =>  "statistically significant at the 1% level"
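These rules of thumb are simply the two-sided critical values of the standard normal distribution, which can be reproduced with `scipy.stats.norm` (a sketch):

```python
from scipy import stats

# Two-sided critical values from the standard normal distribution,
# valid as large-df approximations to the t distribution.
c10 = stats.norm.ppf(0.95)    # 10% level: 5% in each tail
c05 = stats.norm.ppf(0.975)   # 5% level: 2.5% in each tail
c01 = stats.norm.ppf(0.995)   # 1% level: 0.5% in each tail
print(round(c10, 3), round(c05, 2), round(c01, 3))
```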
(4) Testing other hypotheses: the general case
• Hypothesis:

  H_0: β_j = α_j  against  H_1: β_j ≠ α_j

  where α_j is a hypothesized value of the coefficient.
• t-statistic:

  t = ( estimate − hypothesized value ) / standard error = ( β̂_j − α_j ) / s.e.(β̂_j)

• Critical value: c_{0.05} = t_{n−k−1, 0.025}
• Reject the null if | t | > c_{0.05}.
• The test works exactly as before, except that the hypothesized value is subtracted from the estimate when forming the statistic.
(Example) Campus crime and enrollment
• An interesting hypothesis is whether crime increases by one percent when enrollment increases by one percent.
• Estimated equation:

  log(crime) = −6.63 + 1.27 log(enroll)
              (1.03)  (0.11)

  n = 97, R² = 0.585

• Hypothesis: H_0: β_log(enroll) = 1 vs. H_1: β_log(enroll) ≠ 1
• t = (1.27 − 1) / 0.11 ≈ 2.45 > 1.96 = c_{0.05}, so the null is rejected at the 5% significance level.
(5) Computing p-values for t-tests
• If the significance level is made smaller and smaller, there is a point at which the null hypothesis can no longer be rejected.
• The smallest significance level at which the null hypothesis is still rejected is called the p-value of the hypothesis test.
• For a two-sided test, p-value = P( |T| > |t-ratio| ), where T is a t-distributed random variable.
• A small p-value is evidence against the null hypothesis.
• For example, when the t-statistic is 1.85 with df = 40, the two-sided p-value is P( |T| > 1.85 ) = 2(0.0359) = 0.0718 (a computer program is needed for this calculation).
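The p-value calculation in this example is a one-liner in scipy (a sketch, using the survival function `sf`, which gives the upper-tail probability):

```python
from scipy import stats

t_ratio, df = 1.85, 40
# Two-sided p-value: P(|T| > 1.85) = 2 * P(T > 1.85) for T ~ t_40
p = 2 * stats.t.sf(t_ratio, df)
print(round(p, 4))
```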
(6) Confidence intervals
• Simple manipulation of the result in Theorem 4.2 implies that

  P( β̂_j − c_{0.05} · s.e.(β̂_j) ≤ β_j ≤ β̂_j + c_{0.05} · s.e.(β̂_j) ) = 0.95

  where c_{0.05} = t_{n−k−1, 0.025} is the critical value of the two-sided test at the 5% significance level.
• Interpretation of the confidence interval:
  - The bounds of the interval are random.
  - In repeated samples, the interval constructed in this way covers the population regression coefficient in 95% of the cases.
• Confidence intervals for typical confidence levels:

  P( β̂_j − c_{0.01} · s.e.(β̂_j) ≤ β_j ≤ β̂_j + c_{0.01} · s.e.(β̂_j) ) = 0.99
  P( β̂_j − c_{0.05} · s.e.(β̂_j) ≤ β_j ≤ β̂_j + c_{0.05} · s.e.(β̂_j) ) = 0.95
  P( β̂_j − c_{0.10} · s.e.(β̂_j) ≤ β_j ≤ β̂_j + c_{0.10} · s.e.(β̂_j) ) = 0.90

  As a rule of thumb (for large df), we can use c_{0.01} = 2.576, c_{0.05} = 1.96, and c_{0.10} = 1.645.
(Example) Model of firms' R&D expenditure
• Estimated equation:

  log(rd) = −4.38 + 1.084 log(sales) + 0.0217 profmarg
           (0.47)  (0.060)            (0.0128)

  n = 32, R² = 0.918; profmarg: profits as a percentage of sales

• Critical value: c_{0.05} = t_{29, 0.025} = 2.045
• C.I. for log(sales): 1.084 ± 2.045(0.060) = [0.961, 1.21]
  - The effect of sales on R&D is estimated relatively precisely, as the interval is narrow.
  - Moreover, the effect is significantly different from zero, because zero lies outside the interval.
• C.I. for profmarg: 0.0217 ± 2.045(0.0128) = [−0.0045, 0.0479]
  - The effect is estimated imprecisely, as the interval is very wide.
  - It is not even statistically significant, because zero lies inside the interval.
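Both confidence intervals can be reproduced with `scipy.stats.t` (a sketch; the estimates and standard errors are those reported in the estimated equation above):

```python
from scipy import stats

df = 32 - 2 - 1                      # n - k - 1 = 29
c = stats.t.ppf(0.975, df)           # two-sided 5% critical value (~2.045)

# 95% CI for the log(sales) coefficient: estimate +/- c * s.e.
lo1, hi1 = 1.084 - c * 0.060, 1.084 + c * 0.060
# 95% CI for the profmarg coefficient
lo2, hi2 = 0.0217 - c * 0.0128, 0.0217 + c * 0.0128

print(round(c, 3), (round(lo1, 3), round(hi1, 2)), (round(lo2, 4), round(hi2, 4)))
```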
3. Testing Hypotheses about a Single Linear Combination of the Parameters

(Example) Return to education at 2-year vs. 4-year colleges
• log(wage) = β_0 + β_1 jc + β_2 univ + β_3 exper + u
  jc: years of education at 2-year colleges
  univ: years of education at 4-year colleges
  exper: work experience
• Test hypothesis:

  H_0: β_1 = β_2 (i.e., β_1 − β_2 = 0)  vs.  H_1: β_1 − β_2 < 0
(i) A possible test statistic
• t-statistic:

  t = ( β̂_1 − β̂_2 ) / s.e.( β̂_1 − β̂_2 );  hence, reject the null if t < −c_{0.05} = −t_{n−k−1, 0.05}

• However, this is impossible with standard regression output, because

  s.e.( β̂_1 − β̂_2 ) = sqrt( Var̂(β̂_1) + Var̂(β̂_2) − 2 Cov̂(β̂_1, β̂_2) )

  and the covariance estimate Cov̂(β̂_1, β̂_2) is not reported.
(ii) Alternative method
• Define θ_1 = β_1 − β_2 and test

  H_0: θ_1 = 0  against  H_1: θ_1 < 0

• Substituting β_1 = θ_1 + β_2 into the model:

  log(wage) = β_0 + (θ_1 + β_2) jc + β_2 univ + β_3 exper + u
            = β_0 + θ_1 jc + β_2 ( jc + univ ) + β_3 exper + u

• Estimated equation (with totcoll = jc + univ):

  log(wage) = 1.472 − 0.0102 jc + 0.0769 totcoll + 0.0049 exper
             (0.021) (0.0069)     (0.0023)        (0.0002)

  n = 6,763, R² = 0.222

• t = −0.0102 / 0.0069 ≈ −1.48, one-sided p-value = P(T < −1.48) ≈ 0.070
• 95% C.I.: −0.0102 ± 1.96(0.0069) = [−0.0237, 0.0033]
  => The null hypothesis is rejected at the 10% level but not at the 5% level.
• This method always works for a single linear hypothesis.
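The t-statistic, one-sided p-value, and confidence interval for θ̂_1 can be checked with scipy (a sketch; the estimate −0.0102 and standard error 0.0069 come from the reparameterized regression above):

```python
from scipy import stats

theta_hat, se, df = -0.0102, 0.0069, 6763 - 3 - 1
t_stat = theta_hat / se                   # ~ -1.48
p_one_sided = stats.t.cdf(t_stat, df)     # P(T < t), matching H1: theta_1 < 0
lo, hi = theta_hat - 1.96 * se, theta_hat + 1.96 * se   # 95% CI (normal approx.)

print(round(t_stat, 2), round(p_one_sided, 3), (round(lo, 4), round(hi, 4)))
```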
4. Testing Multiple Linear Restrictions: The F Test

(1) Testing exclusion restrictions

(Example) MLB players' salaries
• log(salary) = β_0 + β_1 years + β_2 gamesyr + β_3 bavg + β_4 hrunsyr + β_5 rbisyr + u
  years: years in the league
  gamesyr: average number of games per year
  bavg: batting average
  hrunsyr: home runs per year
  rbisyr: runs batted in per year
• Hypothesis:

  H_0: β_3 = 0, β_4 = 0, β_5 = 0  vs.  H_1: H_0 is not true

  - Test whether the performance measures have no effect on salary, i.e., whether they can be excluded from the regression.
  - The alternative hypothesis reads as "at least one performance measure has an effect on salary."
(Example) MLB players' salaries (cont'd)
• Estimation of the unrestricted model:

  log(salary) = 11.19 + 0.0689 years + 0.0126 gamesyr + 0.00098 bavg + 0.0144 hrunsyr + 0.0108 rbisyr
               (0.29)  (0.0121)       (0.0026)         (0.00110)      (0.0161)         (0.0072)

  n = 353, SSR_ur = 183.186, R²_ur = 0.6278

  - None of the performance measures is statistically significant when tested individually with a t-statistic.
• Estimation of the restricted model (β_3 = 0, β_4 = 0, β_5 = 0):

  log(salary) = 11.22 + 0.0713 years + 0.0202 gamesyr
               (0.11)  (0.0125)       (0.0013)

  n = 353, SSR_r = 198.311, R²_r = 0.5971

  - The sum of squared residuals necessarily increases, because possibly relevant variables are dropped in the restricted regression. => Is the increase statistically significant?
(Example) MLB players' salaries (cont'd)
• Test statistic:

  F = [ ( SSR_r − SSR_ur ) / q ] / [ SSR_ur / (n − k − 1) ] ~ F_{q, n−k−1}

  where q = numerator df = number of restrictions = df_r − df_ur, and n − k − 1 = denominator df = df_ur.
  - Without derivation: an F-distributed random variable is the ratio of two independent chi-square variables, each divided by its df, F = ( χ²_q / q ) / ( χ²_{n−k−1} / (n − k − 1) ).
• Rejection rule: reject the null if F > c_{0.05} = F_{q, n−k−1, 0.05}
  - An F-distributed variable takes on only positive values.
(Example) MLB players' salaries (cont'd)
• Test decision:

  F = [ (198.311 − 183.186) / 3 ] / [ 183.186 / (353 − 5 − 1) ] ≈ 9.55  and  c_{0.01} = F_{3, 347, 0.01} = 3.78

  p-value = P( F > 9.55 ) ≈ 0.000

  Thus, the null hypothesis is overwhelmingly rejected (even at very small significance levels).
• Discussion:
  - The three variables were not significant when tested individually.
  - But they are "jointly significant."
  - The likely reason is multicollinearity between them.
ECON 482 / WH Hong
Multiple Regression Analysis: Inference
• The R-squared form of the F statistic:

  F = [ ( SSR_r − SSR_ur ) / q ] / [ SSR_ur / (n − k − 1) ] = [ ( R²_ur − R²_r ) / q ] / [ ( 1 − R²_ur ) / (n − k − 1) ]

  - The proof follows easily from SSR_r = SST(1 − R²_r) and SSR_ur = SST(1 − R²_ur).
  - Note that the R-squared form of the test is only valid for exclusion restrictions.
  - Since R²_r = 0.5971 and R²_ur = 0.6278,

    F = [ (0.6278 − 0.5971) / 3 ] / [ (1 − 0.6278) / 347 ] ≈ 9.54

  - The difference in the last decimal digit is due to rounding error.
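Both forms of the F statistic for the MLB example can be verified numerically, along with the critical value and p-value (a sketch; the SSR and R² figures are those reported above):

```python
from scipy import stats

# Numbers from the unrestricted and restricted MLB salary regressions
ssr_r, ssr_ur = 198.311, 183.186
r2_r, r2_ur = 0.5971, 0.6278
q, df_ur = 3, 353 - 5 - 1            # 3 restrictions, 347 denominator df

# SSR form and R-squared form of the F statistic
F_ssr = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
F_r2 = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)

c01 = stats.f.ppf(0.99, q, df_ur)    # 1% critical value (~3.8)
p = stats.f.sf(F_ssr, q, df_ur)      # p-value, effectively zero here
print(round(F_ssr, 2), round(F_r2, 2), round(c01, 2), p)
```

The small discrepancy between the two forms (9.55 vs. 9.54) reflects the rounding of the reported R² values, as noted in the text.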
(2) Test of overall significance of a regression
• Unrestricted model:

  y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ... + β_k x_ik + u_i

• Hypothesis:

  H_0: β_1 = ... = β_k = 0  vs.  H_1: H_0 is not true

  - The null hypothesis states that the explanatory variables are not useful at all in explaining the dependent variable.
• Restricted model (regression on a constant):

  y_i = β_0 + u_i

• Test statistic:

  F = [ ( SSR_r − SSR_ur ) / k ] / [ SSR_ur / (n − k − 1) ] = [ R²_ur / k ] / [ ( 1 − R²_ur ) / (n − k − 1) ] ~ F_{k, n−k−1}

  - Note that R²_r = 0 and q = k here.
(3) Testing general linear restrictions with the F-test

(Example) Test whether house price assessments are rational
• Estimation model:

  log(price) = β_0 + β_1 log(assess) + β_2 log(lotsize) + β_3 log(sqrft) + β_4 bdrms + u

  price: actual house price
  assess: the assessed housing value (before the house was sold)
  lotsize: size of the lot (in square feet)
  sqrft: size of the house (in square feet)
  bdrms: number of bedrooms

• Hypothesis:

  H_0: β_1 = 1, β_2 = 0, β_3 = 0, β_4 = 0  vs.  H_1: H_0 is not true

  - If house price assessments are rational, a 1% change in the assessment should be associated with a 1% change in price.
  - In addition, other known factors should not influence the price once the assessed value has been controlled for.
(Example) Test whether house price assessments are rational (cont'd)
• Unrestricted regression:

  y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + u,  SSR_ur = 1.822

• Restricted regression:

  y = β_0 + x_1 + u  =>  [ y − x_1 ] = β_0 + u,  SSR_r = 1.880

  - This is a regression of [ y − x_1 ] on a constant.
• Test statistic:

  F = [ ( SSR_r − SSR_ur ) / q ] / [ SSR_ur / (n − k − 1) ] = [ (1.880 − 1.822) / 4 ] / [ 1.822 / (88 − 4 − 1) ] ≈ 0.661

  Critical value: c_{0.05} = F_{4, 83, 0.05} = 2.50
  - H_0 cannot be rejected, since F = 0.661 < c_{0.05} = 2.50.
• Note that the R-squared form of the test statistic CANNOT be applied here, because the restricted and unrestricted regressions have different dependent variables (y − x_1 vs. y), so their total sums of squares differ.
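The F statistic and test decision for this example can be sketched as follows (the SSR values 1.880 and 1.822 and the sample size n = 88 are those reported above):

```python
from scipy import stats

ssr_r, ssr_ur = 1.880, 1.822
q, df_ur = 4, 88 - 4 - 1             # 4 restrictions, 83 denominator df

# SSR form of the F statistic (the R-squared form is NOT valid here,
# since the restricted model has a different dependent variable)
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)
c05 = stats.f.ppf(0.95, q, df_ur)    # 5% critical value (~2.5)
print(round(F, 3), round(c05, 2), F < c05)   # F below the critical value: do not reject
```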