Chapter 6: Statistical inference for regression


Contents

Sampling distributions                                 2
    Assumptions                                        3
    Sampling distributions                             4
    Mean of A, B                                       5
    Variance of A, B                                   6

Gauss-Markov theorem                                   7
    Gauss-Markov theorem                               8

Maximum likelihood                                     9
    Maximum likelihood                                 10

Normal distribution                                    11
    Normal distribution                                12
    σε² is unknown                                     13

Confidence intervals and testing                       14
    Confidence intervals                               15
    Testing                                            16

Multiple regression                                    17
    Multiple regression                                18
    Variance of Bj                                     19

Confidence intervals and testing                       20
    Simple regression                                  21
    Incremental F-test                                 22
    Example: Omnibus F-test                            23
    Omnibus F-test                                     24

Confounders                                            25
    Empirical vs structural relationships              26
    Omitting a confounder                              27

Measurement error in indep. variables                  28
    Measurement error                                  29
    Model                                              30
    Effects                                            31
    What to do about it?                               32

Sampling distributions

2 / 32

Assumptions

For one independent variable:

■ $Y = \alpha + \beta X + \epsilon$
■ Linearity: $E(\epsilon_i) = 0$
■ Constant variance: $\mathrm{Var}(\epsilon_i) = \sigma_\epsilon^2$
■ Normality: $\epsilon_i \sim N(0, \sigma_\epsilon^2)$
■ Independence: $\epsilon_i$ and $\epsilon_j$ are independent for $i \neq j$
■ The X-values are fixed, or, if random, are independent of the statistical errors.

Note that the independent variable(s) do not need to be normal! We will now see what each assumption is needed for.

3 / 32

Sampling distributions

■ Imagine an infinite population.
■ We get a random sample, and compute estimates A, B and SE.
■ If we get another sample of the same population, we will get different estimates.
■ Hence, A, B, and SE are random variables, and they have a sampling distribution.
■ We will study their sampling distributions. Based on these distributions, we can develop hypothesis tests.
■ Show simulation of the sampling distribution.

4 / 32
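The "show simulation" cue above can be realized with a few lines of R. This is a minimal sketch only; the population ($\alpha = 1$, $\beta = 2$, $\sigma_\epsilon = 1$), the sample size $n = 50$, and the fixed design are illustrative assumptions, not values from the course.

```r
## Simulate the sampling distribution of the least squares estimates A and B.
set.seed(1)
n <- 50
x <- runif(n, 0, 10)                   # fixed X-values, reused in every sample
nsim <- 5000
coefs <- replicate(nsim, {
  y <- 1 + 2 * x + rnorm(n, sd = 1)    # new errors => new sample from the population
  coef(lm(y ~ x))                      # returns A (intercept) and B (slope)
})
rowMeans(coefs)                        # close to (alpha, beta) = (1, 2): unbiasedness
apply(coefs, 1, sd)                    # Monte Carlo standard errors of A and B
hist(coefs["x", ], main = "Sampling distribution of B", xlab = "B")
```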

Mean of A, B

■ Suppose the independent variables are fixed: $x_1, \ldots, x_n$.
■ Let $m_i = \dfrac{x_i - \bar{x}}{\sum_j (x_j - \bar{x})^2}$. Then
  $$B = \sum_i m_i Y_i, \qquad A = \bar{Y} - B\bar{x} = \frac{1}{n}\sum_i Y_i - \bar{x}\sum_i m_i Y_i.$$
■ A and B are linear estimates, in the sense that they are linear functions of the observations $Y_i$.
■ Unbiased estimators: $E(B) = \beta$ and $E(A) = \alpha$. Assumptions needed: linearity (derivation on board).

5 / 32
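As a quick numerical check of the linear-estimator representation, the following R sketch (data simulated purely for illustration) confirms that $\sum m_i Y_i$ and $\bar{Y} - B\bar{x}$ reproduce the coefficients from lm().

```r
## Check that B = sum(m_i * Y_i) and A = Ybar - B * xbar match lm().
set.seed(2)
x <- rnorm(20, mean = 5, sd = 2)
y <- 1 + 2 * x + rnorm(20)
m <- (x - mean(x)) / sum((x - mean(x))^2)   # the weights m_i
B <- sum(m * y)
A <- mean(y) - B * mean(x)
c(A = A, B = B)
coef(lm(y ~ x))                             # same two numbers
```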


Variance of A, B

■ Sampling variances:
  $$\mathrm{Var}(A) = \frac{\sigma_\epsilon^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2} = \frac{\sigma_\epsilon^2 \sum x_i^2}{n(n-1)S_X^2},$$
  $$\mathrm{Var}(B) = \frac{\sigma_\epsilon^2}{\sum (x_i - \bar{x})^2} = \frac{\sigma_\epsilon^2}{(n-1)S_X^2}.$$
  Assumptions needed: linearity(?), constant variance, independence (derivation on board).
■ What are the effects on Var(A) and Var(B) of: the sample size $n$, the error variance $\sigma_\epsilon^2$, the spread of the independent variables, the center of the independent variables?

6 / 32
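A small R sketch of the Var(B) formula in action; all parameter values ($\alpha = 1$, $\beta = 2$, $\sigma_\epsilon = 3$, $n = 40$) are illustrative assumptions. The theoretical variance is compared with the variance of B observed across many simulated samples.

```r
## Compare Var(B) = sigma^2 / sum((x - xbar)^2) with a Monte Carlo estimate.
set.seed(3)
n <- 40
sigma <- 3
x <- seq(1, 10, length.out = n)                  # fixed design
theory <- sigma^2 / sum((x - mean(x))^2)
b <- replicate(10000, {
  y <- 1 + 2 * x + rnorm(n, sd = sigma)
  coef(lm(y ~ x))["x"]
})
c(theoretical = theory, simulated = var(b))      # the two should be close
```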

Gauss-Markov theorem

7 / 32

Gauss-Markov theorem

■ Of all linear and unbiased estimators, the least squares estimator is most efficient: it has the smallest variance. Assumptions needed: linearity, constant variance, independence.
■ Under the extra assumption of normality, the least squares estimator is most efficient among all unbiased estimators (so not just among the linear unbiased estimators).
■ When the assumption of normality is not met, there may be other estimators that are much more efficient than the least squares estimator.

8 / 32
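The last point can be illustrated by simulation. The sketch below assumes heavy-tailed errors (t with 2 degrees of freedom) and compares the least squares slope with the Theil-Sen estimator (median of all pairwise slopes); both the error distribution and the alternative estimator are illustrative choices, not part of the course material.

```r
## Under heavy-tailed errors, OLS need not be efficient: compare the spread of
## the OLS slope with the Theil-Sen slope over repeated samples.
set.seed(4)
n <- 30
x <- seq(0, 10, length.out = n)
theil_sen <- function(x, y) {
  s <- outer(y, y, "-") / outer(x, x, "-")       # all pairwise slopes
  median(s[lower.tri(s)])                        # median of the pairwise slopes
}
res <- replicate(2000, {
  y <- 1 + 2 * x + rt(n, df = 2)                 # heavy-tailed errors
  c(ols = unname(coef(lm(y ~ x))["x"]), ts = theil_sen(x, y))
})
apply(res, 1, mad)   # robust spread: Theil-Sen is typically less variable here
```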

Maximum likelihood

9 / 32

Maximum likelihood

■ If all assumptions hold, then the least squares coefficients A and B are the maximum likelihood estimates of α and β.
■ See exercise 6.5 in the book.

10 / 32
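A small sketch of this equivalence: numerically maximizing the normal log-likelihood recovers the least squares fit. The simulated data and the starting values for the optimizer are illustrative assumptions.

```r
## Maximum likelihood by direct optimization versus least squares via lm().
set.seed(5)
x <- runif(40, 0, 10)
y <- 1 + 2 * x + rnorm(40, sd = 2)
negloglik <- function(p) {                       # p = (alpha, beta, log(sigma))
  -sum(dnorm(y, mean = p[1] + p[2] * x, sd = exp(p[3]), log = TRUE))
}
mle <- optim(c(0, 0, 0), negloglik)$par
rbind(mle = mle[1:2],                            # intercept and slope, approximately
      ols = unname(coef(lm(y ~ x))))             # equal to the lm() coefficients
```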


Normal distribution

11 / 32

Normal distribution

■ If all assumptions hold, then the least squares coefficients are normally distributed:
  $$A \sim N\!\left(\text{mean} = \alpha,\ \mathrm{Var} = \frac{\sigma_\epsilon^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2}\right), \qquad
    B \sim N\!\left(\text{mean} = \beta,\ \mathrm{Var} = \frac{\sigma_\epsilon^2}{\sum (x_i - \bar{x})^2}\right).$$
■ Even if the errors are not normally distributed, the distributions of A and B are still approximately normal for larger sample sizes, because of the central limit theorem.

12 / 32

σε² is unknown

■ The formulas for the variances of A and B contain $\sigma_\epsilon^2$, which we do not know.
■ However, we have an estimate: $S_E^2 = \dfrac{\sum E_i^2}{n - 2}$.
■ This gives estimates for the sampling variances of A and B:
  $$\widehat{\mathrm{Var}}(A) = \frac{S_E^2 \sum x_i^2}{n \sum (x_i - \bar{x})^2}, \qquad
    \widehat{\mathrm{Var}}(B) = \frac{S_E^2}{\sum (x_i - \bar{x})^2} = \frac{S_E^2}{(n-1)S_X^2}.$$
■ $\widehat{\mathrm{SE}}(A) = \sqrt{\widehat{\mathrm{Var}}(A)}$ and $\widehat{\mathrm{SE}}(B) = \sqrt{\widehat{\mathrm{Var}}(B)}$.

13 / 32
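To connect these formulas to R output, here is a minimal sketch (simulated data, illustrative values only) computing $S_E^2$ and $\widehat{\mathrm{SE}}(B)$ by hand and comparing them with what summary(lm()) reports.

```r
## S_E^2 and hat SE(B) from the formulas, versus lm()'s own output.
set.seed(6)
x <- runif(30, 0, 10)
y <- 1 + 2 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)
n <- length(y)
SE2 <- sum(resid(fit)^2) / (n - 2)               # S_E^2
seB <- sqrt(SE2 / sum((x - mean(x))^2))          # hat SE(B)
c(by_hand = SE2, from_lm = summary(fit)$sigma^2)
c(by_hand = seB, from_lm = coef(summary(fit))["x", "Std. Error"])
```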


Confidence intervals and testing

14 / 32

Confidence intervals

■ Because of the extra uncertainty that comes from estimating $\sigma_\epsilon$, we need to use a t-distribution instead of the normal distribution.
■ For simple linear regression we use a t-distribution with $n - 2$ degrees of freedom: the sample size minus the number of estimated parameters.
■ $100(1 - \alpha)\%$ confidence intervals, with $t_{\text{crit}}$ = qt(1 - alpha/2, df = n - 2):
  $$\text{for } \alpha:\ A \pm t_{\text{crit}} \cdot \widehat{\mathrm{SE}}(A), \qquad \text{for } \beta:\ B \pm t_{\text{crit}} \cdot \widehat{\mathrm{SE}}(B).$$
■ What does a confidence interval mean? See simulations.

15 / 32
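A small R sketch of both points: the 95% interval for β computed by hand and via confint(), followed by a quick coverage simulation. The data, the true β = 2, and the 95% level are illustrative assumptions.

```r
## Confidence interval for beta, by hand and via confint(), plus coverage check.
set.seed(7)
x <- runif(30, 0, 10)
make_y <- function() 1 + 2 * x + rnorm(30, sd = 2)
y <- make_y()
fit <- lm(y ~ x)
tcrit <- qt(1 - 0.05 / 2, df = 30 - 2)
seB <- coef(summary(fit))["x", "Std. Error"]
coef(fit)["x"] + c(-1, 1) * tcrit * seB          # by hand
confint(fit, "x", level = 0.95)                  # same interval from R

## In repeated samples, about 95% of such intervals should cover beta = 2.
covered <- replicate(2000, {
  ci <- confint(lm(make_y() ~ x), "x")
  ci[1] <= 2 && 2 <= ci[2]
})
mean(covered)
```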

Testing

■ Test statistic to test whether the intercept equals a certain value $\alpha_0$:
  ◆ Null hypothesis $H_0: \alpha = \alpha_0$
  ◆ Alternative hypothesis $H_a: \alpha \neq \alpha_0$
  ◆ $t_0 = (A - \alpha_0)/\widehat{\mathrm{SE}}(A)$
■ Test statistic to test whether the slope equals a certain value $\beta_0$:
  ◆ Null hypothesis $H_0: \beta = \beta_0$
  ◆ Alternative hypothesis $H_a: \beta \neq \beta_0$
  ◆ $t_0 = (B - \beta_0)/\widehat{\mathrm{SE}}(B)$
■ Both statistics have a t-distribution with $n - 2$ degrees of freedom when the null hypothesis is true.
■ p-value: the probability of observing a value as or more extreme than $t_0$ when the null hypothesis is true.
■ We usually reject the null hypothesis when $p < 0.05$.

16 / 32
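A minimal R sketch (simulated data) computing the t-statistic and two-sided p-value for $H_0: \beta = 0$ by hand; the results should match the "x" row of summary(lm()).

```r
## t-test of H0: beta = 0 by hand, versus the summary(lm()) table.
set.seed(8)
x <- runif(30, 0, 10)
y <- 1 + 0.3 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)
t0 <- coef(fit)["x"] / coef(summary(fit))["x", "Std. Error"]
p  <- 2 * pt(abs(t0), df = 30 - 2, lower.tail = FALSE)   # two-sided p-value
c(t0 = unname(t0), p = unname(p))
coef(summary(fit))["x", c("t value", "Pr(>|t|)")]         # same numbers
```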


Multiple regression

17 / 32

Multiple regression

■ $Y = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k + \epsilon$
■ Assumptions:
  ◆ Linearity
  ◆ Constant variance
  ◆ Normality
  ◆ Independence
  ◆ Fixed X's, or X independent of $\epsilon$
■ Then, the estimators $A, B_1, \ldots, B_k$ are:
  ◆ linear functions of the data
  ◆ unbiased
  ◆ maximally efficient among unbiased estimators
  ◆ maximum-likelihood estimators
  ◆ normally distributed

18 / 32

Variance of Bj

■ $$\mathrm{Var}(B_j) = \frac{1}{1 - R_j^2} \cdot \frac{\sigma_\epsilon^2}{\sum_i (x_{ij} - \bar{x}_j)^2}$$
■ $R_j^2$ is the squared multiple correlation coefficient for the regression of $X_j$ on all the other independent variables.
■ The first factor is called a variance inflation factor. This factor is large when the independent variable $X_j$ is strongly correlated with the other independent variables ⇒ strong correlation between the independent variables is problematic.
■ The second factor is similar to before (but $\sigma_\epsilon^2$ is smaller).

19 / 32
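A short R sketch (simulated, deliberately correlated predictors) computing the variance inflation factor $1/(1 - R_j^2)$ by hand and checking the Var(Bj) identity against vcov(lm()). If the car package is available, car::vif() reports the same inflation factors.

```r
## Variance inflation factor for X1 and the Var(B1) identity, by hand.
set.seed(9)
n <- 100
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)              # x2 correlated with x1
y <- 1 + 2 * x1 - 1 * x2 + rnorm(n)
R2_1 <- summary(lm(x1 ~ x2))$r.squared           # R_j^2: regress X1 on the other X's
vif1 <- 1 / (1 - R2_1)                           # variance inflation factor for B1
fit <- lm(y ~ x1 + x2)
c(vif_x1 = vif1)
c(var_B1_from_lm = vcov(fit)["x1", "x1"],        # estimated Var(B1) from lm()
  var_B1_formula = vif1 * summary(fit)$sigma^2 / sum((x1 - mean(x1))^2))
```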


Confidence intervals and testing

20 / 32

Simple regression

■ For confidence intervals and tests, we use $S_E^2 = \sum E_i^2 / (n - k - 1)$ as the estimate for $\sigma_\epsilon^2$.
■ $$\widehat{\mathrm{SE}}(B_j) = \frac{1}{\sqrt{1 - R_j^2}} \cdot \frac{S_E}{\sqrt{\sum_i (x_{ij} - \bar{x}_j)^2}}$$
■ We use a t-distribution with $n - k - 1$ degrees of freedom (degrees of freedom = sample size minus the number of estimated parameters).

21 / 32

Incremental F-test

■ Null model (m0): $Y = \alpha + \beta_{q+1} X_{q+1} + \cdots + \beta_k X_k + \epsilon$.
■ Full regression model (m1): $Y = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k + \epsilon$.
■ Compute $RSS_0$ and $df_0$ for the null model.
■ Compute $RSS_1$ and $df_1$ for the full model.
■ The null model is a special case of the full model ⇒ $RSS_1 \le RSS_0$. Also $df_1 \le df_0$, why?
■ The incremental sum of squares is $RSS_0 - RSS_1$. We reject the null hypothesis if this is large.
■ The F-statistic is
  $$F = \frac{(RSS_0 - RSS_1)/(df_0 - df_1)}{RSS_1/df_1}$$
  and has an F distribution with $df_0 - df_1$ and $df_1$ degrees of freedom.

22 / 32
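In R the incremental F-test is carried out by anova() on two nested fits. The sketch below (simulated data; which predictors are dropped is an illustrative choice) also reproduces the F-statistic from the RSS formula above.

```r
## Incremental F-test via anova(), and the same F from the RSS formula.
set.seed(10)
n <- 80
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y <- 1 + 0.5 * x3 + rnorm(n)                     # x1 and x2 truly have no effect
m0 <- lm(y ~ x3)                                 # null model: beta1 = beta2 = 0
m1 <- lm(y ~ x1 + x2 + x3)                       # full model
anova(m0, m1)                                    # incremental F-test in R

RSS0 <- sum(resid(m0)^2); df0 <- m0$df.residual
RSS1 <- sum(resid(m1)^2); df1 <- m1$df.residual
Fstat <- ((RSS0 - RSS1) / (df0 - df1)) / (RSS1 / df1)
c(F = Fstat, p = pf(Fstat, df0 - df1, df1, lower.tail = FALSE))
```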

Example: Omnibus F-test

■ Null model: $Y = \alpha + \epsilon$
■ Full model: $Y = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k + \epsilon$
■ Null hypothesis: $\beta_1 = \cdots = \beta_k = 0$
■ Alternative hypothesis $H_a$: $\beta_j \neq 0$ for at least one $j \in \{1, \ldots, k\}$.
■ Analysis of variance table:

  Source       Sum of squares   df          Mean square                     F
  Regression   TSS - RSS        k           RegMS = (TSS - RSS)/k           RegMS/RMS
  Residuals    RSS              n - k - 1   RMS = RSS/(n - k - 1) = S_E^2
  Total        TSS              n - 1

■ $F = RegMS/RMS$ and has an F distribution with $k$ and $n - k - 1$ degrees of freedom. We reject the null hypothesis for large values of F.

23 / 32
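The omnibus F-statistic is what appears on the last line of summary(lm()). The sketch below (simulated data, k = 2 illustrative predictors) rebuilds it from TSS and RSS as in the table above and compares it with R's output.

```r
## Omnibus F-test: from the ANOVA-table formula and from summary(lm()).
set.seed(11)
n <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
y <- 1 + 0.4 * x1 + rnorm(n)
fit <- lm(y ~ x1 + x2)
TSS <- sum((y - mean(y))^2)
RSS <- sum(resid(fit)^2)
k <- 2
Fstat <- ((TSS - RSS) / k) / (RSS / (n - k - 1))
c(F = Fstat, p = pf(Fstat, k, n - k - 1, lower.tail = FALSE))
summary(fit)$fstatistic                          # same F with its degrees of freedom
```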


Omnibus F-test

■ The omnibus F-test is used as protection against multiple testing.
■ In R: see the last line of the summary of a linear model fit.
■ If there are many independent variables in the model, we do many tests. Then the chance of falsely rejecting a null hypothesis is much larger than 0.05.
■ Procedure:
  ◆ First check the omnibus F-test.
  ◆ If the F-test is non-significant, then stop.
  ◆ If the F-test is significant, then look at the individual p-values of the effects.

24 / 32

Confounders

25 / 32

Empirical vs structural relationships

■ Empirical: description, prediction.
■ Structural: mechanism for determination of the dependent variable, causal analysis.
■ Empirical:
  ◆ Regression coefficients do not represent an effect on the dependent variable.
  ◆ It is no problem to leave out an independent variable.
■ Structural:
  ◆ Regression coefficients do represent an effect on the dependent variable.
  ◆ Omitting confounders causes bias.

26 / 32

Omitting a confounder

■ True model: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \epsilon$.
■ Fitted model: $Y = \alpha + \beta_1' X_1 + \epsilon'$. The effect of $X_2$ is absorbed in $\epsilon'$.
■ If $X_1$ and $X_2$ are correlated, then $X_1$ and $\epsilon'$ are correlated. When fitting least squares, we assume $X_1$ and $\epsilon'$ are uncorrelated.
■ $\beta_1 = \sigma_{1Y}/\sigma_1^2 - \beta_2\,\sigma_{12}/\sigma_1^2$
■ $\beta_1' = \sigma_{1Y}/\sigma_1^2 = \beta_1 + \beta_2\,\sigma_{12}/\sigma_1^2 = \beta_1 + \text{bias}$.
■ The bias is nonzero if $\beta_2 \neq 0$ and $\sigma_{12} \neq 0$, i.e. if:
  ◆ $X_2$ has an effect on $Y$,
  ◆ and $X_1$ and $X_2$ are correlated.
■ Difference between a common prior cause and an intervening variable (see causal diagrams on blackboard).

27 / 32
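A small R simulation of the bias formula; all parameter values ($\beta_1 = 2$, $\beta_2 = 3$, the correlation between $X_1$ and $X_2$) are illustrative assumptions. The short regression's slope is off by roughly $\beta_2\,\sigma_{12}/\sigma_1^2$.

```r
## Omitted-variable bias: leave the confounder X2 out of the regression.
## True model (illustrative): Y = 1 + 2*X1 + 3*X2 + eps, X1 and X2 correlated.
set.seed(12)
n <- 10000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                        # correlated with x1
y <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
coef(lm(y ~ x1 + x2))["x1"]                      # close to the true beta1 = 2
coef(lm(y ~ x1))["x1"]                           # biased estimate beta1'
2 + 3 * cov(x1, x2) / var(x1)                    # beta1 + beta2 * sigma12 / sigma1^2
```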

Measurement error in indep. variables

28 / 32

Measurement error

■ The regression model accommodates error in the dependent variable: it is put in $\epsilon$.
■ Independent variables are assumed to be measured without error.
■ What happens if there is error in the independent variables?

29 / 32

Model

■ $Y = \alpha + \beta_1 \tau + \beta_2 X_2 + \epsilon$
■ $X_2$ is measured without error; $\tau$ is measured with error: we observe $X_1 = \tau + \delta$.
■ The measurement errors $\delta$ are assumed to be random, mean zero, and uncorrelated with $\tau$, $\epsilon$ and $X_2$.
■ We find:
  $$\beta_1 = \frac{\sigma_{Y1}\sigma_2^2 - \sigma_{12}\sigma_{Y2}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2 - \sigma_\delta^2\sigma_2^2},$$
  $$\beta_2 = \frac{\sigma_{Y2}\sigma_1^2 - \sigma_{12}\sigma_{Y1}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} - \frac{\beta_1\sigma_{12}\sigma_\delta^2}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2}.$$
■ Ignoring measurement error gives estimates $\beta_1'$ and $\beta_2'$ as above, but with $\sigma_\delta^2 = 0$.

30 / 32

Effects

■ The estimate for $\beta_1$ is driven towards zero. Ignoring measurement error attenuates the effect of the independent variable.
■ $$\lim_{\sigma_\delta^2 \to \infty} \beta_2' = \frac{\sigma_{Y2}\sigma_1^2}{\sigma_1^2\sigma_2^2} = \sigma_{Y2}/\sigma_2^2.$$
  This means that ignoring measurement error drives the estimate for $\beta_2$ towards the estimate that we would get if we regressed Y on $X_2$ alone. Measurement error in $X_1$ makes it an imperfect statistical control.

31 / 32
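A small R simulation of both effects; the true model ($\beta_1 = 2$, $\beta_2 = 1$, correlated $\tau$ and $X_2$) and the measurement-error standard deviations are illustrative assumptions. As the error in $X_1$ grows, its coefficient shrinks towards zero and the coefficient of $X_2$ moves towards the Y-on-$X_2$-alone slope.

```r
## Attenuation from measurement error in X1. True model (illustrative):
## Y = 1 + 2*tau + 1*X2 + eps; we observe X1 = tau + delta instead of tau.
set.seed(13)
n <- 100000
x2 <- rnorm(n)
tau <- 0.5 * x2 + rnorm(n)                       # tau correlated with X2
y <- 1 + 2 * tau + 1 * x2 + rnorm(n)
for (sd_delta in c(0, 1, 3)) {                   # increasing measurement error
  x1 <- tau + rnorm(n, sd = sd_delta)
  print(round(coef(lm(y ~ x1 + x2))[-1], 2))     # coefficient of x1 shrinks towards 0
}
coef(lm(y ~ x2))["x2"]                           # where the x2 coefficient is driven
```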


What to do about it?

■ There are some statistical methods to deal with measurement error. These are beyond the scope of this course, and involve assumptions that are hard to justify in practice.
■ Remember:
  ◆ Measurement error can invalidate regression analysis.
  ◆ If substantial measurement errors are likely, don't view regression analysis as definitive.
  ◆ Try to get data without measurement errors. It is worth investing in this.

32 / 32
