Lecture 1: Simple Linear Regression

Maochao Xu, Department of Mathematics, Illinois State University ([email protected])

Review basic concepts in inference

Assume $X_1, \ldots, X_n$ are i.i.d. Normal random variables with mean $\mu$ and variance $\sigma^2$. That is, $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$.

• Sample mean:
  $\bar{X} = \sum_{i=1}^n X_i / n.$

• Sample variance:
  $S^2 = \sum_{i=1}^n (X_i - \bar{X})^2 / (n - 1).$

• Unbiased estimates:
  $E(\bar{X}) = \mu, \qquad E(S^2) = \sigma^2.$

• Standard normal distribution:
  $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$

• t distribution:
  $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$

• $\chi^2$ distribution:
  Let $Z_1, \ldots, Z_n$ be $n$ i.i.d. standard normal random variables. Then the chi-square random variable is defined as
  $\chi^2_n = Z_1^2 + \cdots + Z_n^2.$
  It is known that $E(\chi^2_n) = n$ and $\mathrm{Var}(\chi^2_n) = 2n$. Recall that
  $\frac{(n - 1) S^2}{\sigma^2} \sim \chi^2_{n-1}.$

• F distribution (how to use it?):
  $F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n},$
  where $\chi^2_m$ and $\chi^2_n$ are independent.
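
A minimal Python sketch (numpy and scipy assumed; the parameter values and seed are illustrative) that simulates normal samples and checks the quantities above numerically:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 20000

# Draw many samples of size n and form the quantities from the slide.
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                      # sample variance, divisor n - 1

t = (xbar - mu) / (np.sqrt(s2) / np.sqrt(n))    # should follow t_{n-1}
chi = (n - 1) * s2 / sigma**2                   # should follow chi^2_{n-1}

print(s2.mean(), sigma**2)                        # E(S^2) = sigma^2
print(chi.mean(), chi.var(), n - 1, 2 * (n - 1))  # mean n - 1, variance 2(n - 1)
print(np.mean(np.abs(t) > 2.0), 2 * stats.t.sf(2.0, df=n - 1))  # simulated tail prob vs exact t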

Estimation methods

• Maximum likelihood estimator (MLE)

  Suppose $Y_1, \ldots, Y_n$ are random variables with density function $f(y; \theta)$, where $\theta$ is an unknown parameter. Given independent observations $y_1, \ldots, y_n$, the likelihood function can be expressed as
  $L(\theta) = \prod_{i=1}^n f(y_i; \theta).$
  The MLE is obtained by maximizing $L(\theta)$ or $\log L(\theta)$; that is,
  $\hat{\theta} = \arg\max_{\theta} L(\theta).$

• Least squares estimator (LSE)

  The sample observations are assumed to be of the form
  $Y_i = f_i(\theta) + \epsilon_i, \qquad i = 1, \ldots, n,$
  where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\epsilon_i$ are random variables. The LSE is obtained by minimizing
  $Q(\theta) = \sum_{i=1}^n [Y_i - f_i(\theta)]^2;$
  that is,
  $\hat{\theta} = \arg\min_{\theta} Q(\theta).$
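
A small numerical sketch of the two estimators for a simple location model $Y_i = \theta + \epsilon_i$ with $N(0,1)$ errors, where both coincide with the sample mean (scipy assumed; data and seed are illustrative):

import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
theta_true = 3.0
y = theta_true + rng.normal(0.0, 1.0, size=50)   # Y_i = theta + eps_i, eps_i ~ N(0, 1)

def negloglik(theta):
    # Negative log-likelihood, -log L(theta), under N(theta, 1) errors.
    return -np.sum(stats.norm.logpdf(y, loc=theta, scale=1.0))

def Q(theta):
    # Least squares criterion Q(theta) = sum_i (Y_i - theta)^2.
    return np.sum((y - theta) ** 2)

mle = minimize_scalar(negloglik).x
lse = minimize_scalar(Q).x
print(mle, lse, y.mean())   # all three agree up to numerical tolerance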

Simple linear regression

Example 1: Mother and daughter's heights
During the period 1893-1898, K. Pearson organized the collection of n = 1375 heights of mothers in the United Kingdom under the age of 65 and one of their adult daughters over the age of 18.

Questions:
1. How is a mother's height related to her daughter's height?
2. Can a daughter's height be predicted from her mother's height?

Example 2: Atmospheric pressure and the boiling point
A Scottish physicist named James D. Forbes discussed a series of experiments that he had done concerning the relationship between atmospheric pressure and the boiling point of water. He collected 17 observations from different locations.

Questions:
1. How are pressure and boiling point related?
2. Can pressure be predicted from boiling point, and how well?

Example 3: House values
What is the fair market value of a house?

Questions:
1. If a house has 8 rooms, what is the price?
2. What is the price range for a house?
3. What is the average price per room?

Model and notations

1. Bivariate data $(X, Y)$: $(x_1, y_1), \ldots, (x_n, y_n)$.
2. $X$: independent variable (predictor).
3. $Y$: dependent variable (response).
4. Regression equation:
   $Y = \beta_0 + \beta_1 X + \epsilon,$
   where $\beta_0$ is the intercept and $\beta_1$ is the slope, which represents the number of units by which $Y$ increases when $X$ increases by one unit. For the $i$th trial,
   $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i.$
5. Assumptions on $\epsilon_i$:
   • Independence: $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$, $i \neq j$.
   • Homoscedasticity (constant variance): $\mathrm{Var}(\epsilon_i \mid X = x_i) = \sigma^2$.
   • Zero mean: $E[\epsilon_i \mid X = x_i] = 0$.
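
To make the model concrete, here is a short simulation sketch (numpy assumed; the values of $\beta_0$, $\beta_1$, and $\sigma$ are illustrative, not from the lecture) that generates data satisfying these assumptions:

import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma, n = 1.0, 0.5, 2.0, 100    # illustrative parameter values

x = rng.uniform(0, 10, size=n)                 # predictor values
eps = rng.normal(0.0, sigma, size=n)           # independent, zero-mean, constant-variance errors
y = beta0 + beta1 * x + eps                    # Y_i = beta_0 + beta_1 X_i + eps_i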

Ordinary least squares estimation: OLS

Minimize the residual sum of squares.

1. OLS criterion:
   $\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^n [y_i - (\beta_0 + \beta_1 x_i)]^2.$
   Find $\beta_0$ and $\beta_1$ to minimize $\mathrm{RSS}(\beta_0, \beta_1)$ (how?): $(b_0, b_1)$ (or $(\hat{\beta}_0, \hat{\beta}_1)$).
2. Residual sum of squares:
   $\mathrm{RSS} = \mathrm{RSS}(b_0, b_1) = \sum_{i=1}^n [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^n e_i^2.$
3. Residual: $e_i = y_i - (b_0 + b_1 x_i)$, $i = 1, \ldots, n$.
4. Fitted value: $\hat{y}_i = b_0 + b_1 x_i$.

Setting the partial derivatives of $\mathrm{RSS}(\beta_0, \beta_1)$ with respect to $\beta_0$ and $\beta_1$ equal to zero and solving gives the closed-form estimates
$b_1 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}.$
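
A minimal numpy sketch of these formulas on illustrative simulated data (the data, seed, and variable names are my own, not from the lecture):

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)

# Closed-form OLS estimates.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x          # \hat{y}_i
resid = y - fitted            # e_i
rss = np.sum(resid ** 2)      # RSS, the residual sum of squares

# Sanity check against numpy's built-in least squares fit.
print(b1, b0)
print(np.polyfit(x, y, deg=1))   # returns [slope, intercept]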

Estimating $\sigma^2$

The regression equation gives the mean of the distribution of $Y$ at each $X$; what is the standard deviation about this mean? That is, we also have to estimate $\sigma$ in
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad \mathrm{Var}(\epsilon_i) = \sigma^2.$

A natural estimator for $\sigma^2$ is
$\frac{1}{n} \sum_{i=1}^n (\epsilon_i - E[\epsilon])^2 = \frac{1}{n} \sum_{i=1}^n \epsilon_i^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.$

Since $\beta_0$ and $\beta_1$ are unknown, we use their estimates:
$\frac{1}{n} \sum_{i=1}^n e_i^2 = \frac{1}{n} \sum_{i=1}^n (Y_i - b_0 - b_1 X_i)^2 = \mathrm{SSE}/n.$

Since $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimated, the $e_i^2$ are no longer independent, so we use the following estimator instead:
$s^2 = \frac{\mathrm{SSE}}{n - 2} = \mathrm{MSE},$
where $n - 2$ is the degrees of freedom (df) (why?), and MSE stands for the error mean square or residual mean square. Generally, df = number of cases − number of parameters.
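
Continuing the same kind of illustrative sketch as above (numpy assumed; data simulated, not from the lecture), $s^2 = \mathrm{MSE}$ can be computed as:

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)
n = len(y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
e = y - (b0 + b1 * x)                 # residuals

sse = np.sum(e ** 2)                  # SSE (= RSS)
mse = sse / (n - 2)                   # s^2 = MSE, on n - 2 degrees of freedom
print(mse, np.sqrt(mse))              # estimates of sigma^2 and sigma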

Properties of OLS

The estimates $b_0$ and $b_1$ can both be written as linear combinations of $Y_1, \ldots, Y_n$ (prove!):
$b_1 = \sum_{i=1}^n k_i Y_i, \qquad \text{where } k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^n (X_i - \bar{X})^2}.$

Some interesting properties of the $k_i$:
• $\sum k_i = 0$;
• $\sum k_i X_i = 1$;
• $\sum k_i^2 = 1 / \sum (X_i - \bar{X})^2$.

Properties of $b_1$:
1. Mean (unbiasedness): $E(b_1) = \beta_1$.
2. Variance:
   $\mathrm{Var}(b_1) = \sum k_i^2 \mathrm{Var}(Y_i) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}.$
3. Estimated variance:
   $S^2(b_1) = \frac{\mathrm{MSE}}{\sum (X_i - \bar{X})^2},$
   which is an unbiased estimate of $\mathrm{Var}(b_1)$.
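
A quick numerical check of the $k_i$ weights and of $S^2(b_1)$, again on illustrative simulated data (numpy assumed):

import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)

sxx = np.sum((x - x.mean()) ** 2)
k = (x - x.mean()) / sxx                # weights k_i, so that b1 = sum_i k_i Y_i

print(np.sum(k))                        # sum k_i       -> 0
print(np.sum(k * x))                    # sum k_i X_i   -> 1
print(np.sum(k ** 2), 1 / sxx)          # sum k_i^2     -> 1 / Sxx

b1 = np.sum(k * y)
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (len(y) - 2)
print(mse / sxx)                        # S^2(b1), the estimated variance of b1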

Properties of OLS

The fitted value at $x = \bar{x}$ is
$\hat{E}(Y \mid X = \bar{x}) = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y},$
so the fitted line must pass through the point $(\bar{x}, \bar{y})$, intuitively the center of the data. So we have
$b_0 = \bar{Y} - b_1 \bar{X}.$

Under the linear model assumptions, the least squares estimates are unbiased: $E(b_0) = \beta_0$. Further,
$\mathrm{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$
Similarly, the estimated variance is
$S^2(b_0) = \mathrm{MSE} \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$

The two estimates are correlated, with covariance
$\mathrm{Cov}(b_0, b_1) = -\sigma^2 \frac{\bar{X}}{\sum (X_i - \bar{X})^2}.$
(Question: what happens if the data become more spread out?)

Properties

• The sum of the residuals is zero:
  $\sum_{i=1}^n e_i = 0.$
• The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
  $\sum_{i=1}^n Y_i = \sum_{i=1}^n \hat{Y}_i.$
• The sum of the weighted residuals is zero:
  $\sum_{i=1}^n X_i e_i = 0.$
• Also,
  $\sum_{i=1}^n \hat{Y}_i e_i = 0.$
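
These identities are easy to verify numerically; a sketch on illustrative simulated data (numpy assumed):

import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

print(np.sum(e))                 # sum e_i = 0 (up to rounding)
print(np.sum(y) - np.sum(yhat))  # sum Y_i = sum Yhat_i
print(np.sum(x * e))             # sum X_i e_i = 0
print(np.sum(yhat * e))          # sum Yhat_i e_i = 0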

Gauss-Markov theorem

The OLS estimates are the best linear unbiased estimators (BLUE). For example, if
$b_1 = \sum k_i Y_i, \qquad E(b_1) = \beta_1,$
then for any other unbiased linear estimate $b_1^*$ (prove!)
$\mathrm{Var}(b_1^*) \geq \mathrm{Var}(b_1).$

If we further assume that $\epsilon_i \sim N(0, \sigma^2)$, then the OLS estimates are also the maximum likelihood estimates (MLE). Under the normality assumption,
$\hat{\beta}_1 \sim N\!\left( \beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \right), \qquad \hat{\beta}_0 \sim N\!\left( \beta_0, \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (X_i - \bar{X})^2} \right) \right).$
These quantities will be used to construct confidence intervals, tests, and other statistical inferences.
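
A small Monte Carlo sketch (numpy assumed; parameter values illustrative) that checks the stated sampling distribution of $\hat{\beta}_1$ by repeatedly simulating responses from a fixed design:

import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, sigma = 1.0, 0.5, 2.0
x = np.linspace(0, 10, 30)                # fixed design, reused in every replication
sxx = np.sum((x - x.mean()) ** 2)

b1s = []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    b1s.append(np.sum((x - x.mean()) * (y - y.mean())) / sxx)
b1s = np.array(b1s)

print(b1s.mean(), beta1)                  # unbiased: mean close to beta_1
print(b1s.var(), sigma**2 / sxx)          # variance close to sigma^2 / Sxx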

Confidence intervals and tests

Linear model assumptions:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$

Under this model:
$\frac{b_0 - \beta_0}{S(b_0)} \sim t_{n-2}, \qquad \frac{b_1 - \beta_1}{S(b_1)} \sim t_{n-2}.$

Hence, a 100(1 − α)% confidence interval for $\beta_0$ is $b_0 \pm t_{n-2, \alpha/2} S(b_0)$, and a 100(1 − α)% confidence interval for $\beta_1$ is $b_1 \pm t_{n-2, \alpha/2} S(b_1)$.

A hypothesis test of
$H_0: \beta_0 = \beta_0^* \quad \text{vs} \quad H_a: \beta_0 \neq \beta_0^*$
is obtained by computing
$t = \frac{b_0 - \beta_0^*}{S(b_0)} \sim t_{n-2} \quad \text{under } H_0.$
Then, reject $H_0$ if $|t| > t_{n-2, \alpha/2}$.
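
A sketch of these intervals and tests on illustrative simulated data (scipy assumed; the 95% level and the null values $\beta_0^* = \beta_1^* = 0$ are chosen for illustration):

import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)
n = len(y)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

se_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / sxx))   # S(b0)
se_b1 = np.sqrt(mse / sxx)                             # S(b1)

tcrit = stats.t.ppf(0.975, df=n - 2)                   # t_{n-2, alpha/2} for alpha = 0.05
print("95% CI for beta0:", b0 - tcrit * se_b0, b0 + tcrit * se_b0)
print("95% CI for beta1:", b1 - tcrit * se_b1, b1 + tcrit * se_b1)

# Tests of H0: beta0 = 0 and H0: beta1 = 0, with two-sided p-values.
t0, t1 = b0 / se_b0, b1 / se_b1
print(2 * stats.t.sf(abs(t0), df=n - 2), 2 * stats.t.sf(abs(t1), df=n - 2))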

Confidence intervals and tests

Similarly, a hypothesis test of
$H_0: \beta_1 = \beta_1^* \quad \text{vs} \quad H_a: \beta_1 \neq \beta_1^*$
is obtained by computing
$t = \frac{b_1 - \beta_1^*}{S(b_1)} \sim t_{n-2} \quad \text{under } H_0.$
Then, reject $H_0$ if $|t| > t_{n-2, \alpha/2}$. The p-value can be computed as $p = 2 P(T > |t|)$, where $T \sim t_{n-2}$.

Considering the test problem
$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \neq 0,$
we have
$t^2 = \left( \frac{b_1}{S(b_1)} \right)^2 = \frac{b_1^2}{S^2(b_1)} \sim F_{1, n-2}.$
So the square of a t statistic with d df is equivalent to an F statistic with (1, d) df.
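
This relationship between the $t_d$ and $F_{1,d}$ distributions can be checked directly with scipy's distribution functions (the degrees of freedom and observed statistic below are arbitrary illustrations):

import numpy as np
from scipy import stats

d = 20          # illustrative degrees of freedom
t = 2.3         # an arbitrary observed t statistic

# Two-sided t p-value equals the upper-tail F_{1,d} probability at t^2.
p_t = 2 * stats.t.sf(abs(t), df=d)
p_f = stats.f.sf(t ** 2, dfn=1, dfd=d)
print(p_t, p_f)   # the two p-values agree

# Critical values match too: t_{d, alpha/2}^2 = F_{1, d, alpha}.
alpha = 0.05
print(stats.t.ppf(1 - alpha / 2, df=d) ** 2, stats.f.ppf(1 - alpha, dfn=1, dfd=d))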

Interval estimation of $E(Y_h)$

A common objective in regression analysis is to estimate the mean response for one or more levels of $X$. Let $X_h$ denote the level of $X$ for which we wish to estimate the mean response. Then, by the regression equation, we have
$\hat{Y}_h = b_0 + b_1 X_h.$

Sampling distribution of $\hat{Y}_h$:
• $\hat{Y}_h$ is a normal random variable. (Why?)
• Mean:
  $E(\hat{Y}_h) = \beta_0 + \beta_1 X_h.$
• Variance:
  $\mathrm{Var}(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• Estimated variance:
  $S^2(\hat{Y}_h) = \mathrm{MSE} \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• t distribution:
  $\frac{\hat{Y}_h - E(\hat{Y}_h)}{S(\hat{Y}_h)} \sim t_{n-2}.$

Hence, the 100(1 − α)% confidence interval for $E(Y_h)$ is
$\hat{Y}_h \pm t_{\alpha/2; n-2} S(\hat{Y}_h).$
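
A sketch of the confidence interval for the mean response at a chosen level $X_h$ (scipy assumed; the data and the value $X_h = 7$ are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)
n = len(y)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

xh = 7.0                                             # level of X at which to estimate E(Y_h)
yh = b0 + b1 * xh                                    # point estimate Yhat_h
se_yh = np.sqrt(mse * (1 / n + (xh - x.mean()) ** 2 / sxx))
tcrit = stats.t.ppf(0.975, df=n - 2)
print(yh - tcrit * se_yh, yh + tcrit * se_yh)        # 95% CI for E(Y_h)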

Prediction

In prediction we have a new case, possibly a future value, not one used to estimate the parameters, with observed value of the predictor $X_*$. We would like to know the value $Y_*$, the corresponding response, but it has not yet been observed. We can use the estimated mean function to predict it:
$Y_* = \beta_0 + \beta_1 X_* + \epsilon_*, \qquad \mathrm{Var}(\epsilon_*) = \sigma^2.$

A natural prediction is
$\tilde{Y}_* = b_0 + b_1 X_*.$

The variance of the prediction error is
$\mathrm{Var}(\mathrm{pred}) = \mathrm{Var}(\tilde{Y}_* - Y_*) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$

The estimated standard error of prediction at $X_*$ is
$S(\mathrm{pred}) = \sqrt{\mathrm{MSE}} \left[ 1 + \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right]^{1/2}.$

Hence,
$\frac{Y_* - \tilde{Y}_*}{S(\mathrm{pred})} \sim t_{n-2},$
so the 100(1 − α)% prediction interval for $Y_*$ is
$\tilde{Y}_* \pm t_{\alpha/2, n-2} S(\mathrm{pred}).$
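
A sketch of the prediction interval at a new predictor value $X_*$ (scipy assumed; the data and the value $X_* = 7$ are illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=50)
n = len(y)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

xstar = 7.0                                               # new predictor value X_*
ypred = b0 + b1 * xstar                                   # point prediction Ytilde_*
se_pred = np.sqrt(mse * (1 + 1 / n + (xstar - x.mean()) ** 2 / sxx))
tcrit = stats.t.ppf(0.975, df=n - 2)
print(ypred - tcrit * se_pred, ypred + tcrit * se_pred)   # 95% prediction interval for Y_*

Note the extra "1 +" inside the square root, which makes the prediction interval wider than the confidence interval for the mean response at the same value of $X$.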