Simple linear regression
Lecture 1: Simple Linear Regression
Maochao Xu, Department of Mathematics, Illinois State University
[email protected]
Review basic concepts in inference

Assume $X_1, \ldots, X_n$ are i.i.d. Normal random variables with mean $\mu$ and variance $\sigma^2$. That is, $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$.
• Sample mean:
  $\bar{X} = \sum_{i=1}^{n} X_i / n.$
• Sample variance:
  $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1).$
• Unbiased estimates:
  $E(\bar{X}) = \mu, \qquad E(S^2) = \sigma^2.$
• Standard normal distribution:
  $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$
• t distribution:
  $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$
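These facts can be checked by simulation. A minimal sketch, assuming NumPy is available (the parameter values and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 20, 20000

# Draw many samples of X_1, ..., X_n ~ N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, n))

xbar = X.mean(axis=1)        # sample means
s2 = X.var(axis=1, ddof=1)   # sample variances (divide by n - 1)

# Unbiasedness: averaging the estimators over replicates recovers mu, sigma^2
print(xbar.mean())           # close to 5.0
print(s2.mean())             # close to 4.0

# The t statistic replaces sigma with S, giving heavier tails than N(0, 1)
t_stat = (xbar - mu) / (np.sqrt(s2) / np.sqrt(n))
print(t_stat.std())          # slightly above 1 (t_19 has variance 19/17)
```

Note that dividing by $n$ instead of $n-1$ in the sample variance would bias $S^2$ downward, which is why `ddof=1` is used above.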
• $\chi^2$ distribution:
  Let $Z_1, \ldots, Z_n$ be $n$ i.i.d. standard normal random variables. Then the chi-square random variable is defined as
  $\chi^2_n = Z_1^2 + \cdots + Z_n^2.$
  It is known that
  $E(\chi^2_n) = n, \qquad Var(\chi^2_n) = 2n.$
  Recall that
  $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$
• F distribution (how to use it?):
  $F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n},$
  where $\chi^2_m$ and $\chi^2_n$ are independent.
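The relation $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ can also be verified by simulation; a small sketch, assuming NumPy (the chosen $n$, $\sigma$, and replicate count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps = 10, 3.0, 50000

X = rng.normal(0.0, sigma, size=(reps, n))
s2 = X.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2   # should follow a chi-square with n - 1 df

print(q.mean())               # close to n - 1 = 9
print(q.var())                # close to 2(n - 1) = 18
```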
Estimation methods

• Maximum likelihood estimator (MLE)
  Suppose $Y_1, \ldots, Y_n$ are random variables with density function $f(y; \theta)$, where $\theta$ is an unknown parameter. Given independent observations $y_1, \ldots, y_n$, the likelihood function can be expressed as
  $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta).$
  The MLE is obtained by maximizing $L(\theta)$ or $\log L(\theta)$. That is,
  $\hat{\theta} = \arg\max_{\theta} L(\theta).$
• Least squares estimator (LSE)
  The sample observations are assumed to be of the form
  $Y_i = f_i(\theta) + \epsilon_i, \qquad i = 1, \ldots, n,$
  where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\epsilon_i$ are random variables. The LSE is obtained by minimizing
  $Q(\theta) = \sum_{i=1}^{n} [Y_i - f_i(\theta)]^2.$
  That is, $\hat{\theta} = \arg\min_{\theta} Q(\theta).$
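As a toy illustration of least squares, consider the hypothetical one-parameter model $f_i(\theta) = \theta x_i$ (a line through the origin; not the course's simple linear regression model). Minimizing $Q(\theta)$ numerically should agree with the closed-form solution $\hat{\theta} = \sum x_i y_i / \sum x_i^2$. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
# Hypothetical model through the origin: Y_i = theta * x_i + eps_i
x = rng.uniform(0, 10, size=50)
y = 1.7 * x + rng.normal(0, 1, size=50)

def Q(theta):
    return np.sum((y - theta * x) ** 2)   # least squares criterion

theta_hat = minimize_scalar(Q).x          # numerical minimizer of Q
theta_closed = np.sum(x * y) / np.sum(x * x)  # closed-form LSE for this model

print(theta_hat, theta_closed)            # the two agree
```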
Example 1: Mother and daughter's heights
During the period 1893–1898, Karl Pearson organized the collection of n = 1375 heights of mothers in the United Kingdom under the age of 65 and one of their adult daughters over the age of 18.
Questions:
1. How is the mother's height related to the daughter's height?
2. Can a daughter's height be predicted from her mother's height?
Example 2: Atmospheric pressure and the boiling point
A Scottish physicist named James D. Forbes discussed a series of experiments that he had done concerning the relationship between atmospheric pressure and the boiling point of water. He collected 17 observations at different locations.
Questions:
1. How are pressure and boiling point related?
2. Can pressure be predicted from boiling point, and how well?
Example 3: House values
What is the fair market value of a house?
Questions:
1. If a house has 8 rooms, what is its price?
2. What is the price range for a house?
3. What is the average price for a room?
Model and notations
1. Bivariate data $(X, Y)$: $(x_1, y_1), \ldots, (x_n, y_n)$.
2. $X$: independent variable.
3. $Y$: dependent variable (response).
4. Regression equation: $Y = \beta_0 + \beta_1 X + \epsilon$, where $\beta_0$ is the intercept and $\beta_1$ is the slope, which represents the number of units $Y$ increases if $X$ increases by one unit. For the $i$th trial,
   $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i.$
5. Assumptions on $\epsilon_i$:
   • Independence: $Cov(\epsilon_i, \epsilon_j) = 0$, $i \neq j$.
   • Homoscedasticity (constant variance): $Var(\epsilon_i \mid X = x_i) = \sigma^2$.
   • Zero mean: $E[\epsilon_i \mid X = x_i] = 0$.
Ordinary least squares estimation: OLS

Minimize the residual sum of squares.
1. OLS criterion:
   $RSS(\beta_0, \beta_1) = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2.$
   Find $\beta_0$ and $\beta_1$ to minimize $RSS(\beta_0, \beta_1)$ (how?): $(b_0, b_1)$ (or $(\hat{\beta}_0, \hat{\beta}_1)$).
2. Residual sum of squares:
   $RSS = RSS(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^{n} e_i^2.$
3. Residual: $e_i = y_i - (b_0 + b_1 x_i)$, $i = 1, \ldots, n$.
4. Fitted value: $\hat{y}_i = b_0 + b_1 x_i$.
Ordinary least squares estimation: OLS

$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}.$
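The closed-form formulas can be computed directly and cross-checked against a library fit. A minimal Python sketch, assuming NumPy; the simulated data and true coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 100
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

# Closed-form OLS estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check against numpy's degree-1 least-squares fit
b1_np, b0_np = np.polyfit(x, y, 1)

print(b0, b1)   # close to (2.0, 0.5) and equal to polyfit's values
```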
Estimating $\sigma^2$

The regression equation gives the mean of the group. What is the standard deviation of this group? That is, we have to estimate $\sigma$.
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad Var(\epsilon_i) = \sigma^2.$
A natural estimator for $\sigma^2$ is
$\frac{1}{n} \sum_{i=1}^{n} (\epsilon_i - E[\epsilon])^2 = \frac{1}{n} \sum_{i=1}^{n} \epsilon_i^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$
Since $\beta_0$ and $\beta_1$ are unknown, we use the estimates:
$\frac{1}{n} \sum_{i=1}^{n} e_i^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = SSE/n.$
Since $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimated, the $e_i^2$ are no longer independent, so we use the following estimator:
$s^2 = \frac{SSE}{n-2} = MSE,$
where $n - 2$ is the degrees of freedom (df) (why?), and MSE stands for error mean square or residual mean square. Generally, df = number of cases − number of parameters.
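That dividing SSE by $n-2$ (rather than $n$) gives an unbiased estimator of $\sigma^2$ can be seen by refitting on many simulated data sets and averaging the MSE. A sketch assuming NumPy; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.5, 15, 20000
x = np.linspace(0, 5, n)
Sxx = np.sum((x - x.mean()) ** 2)

mse = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)               # residuals
    mse[r] = np.sum(e ** 2) / (n - 2)   # SSE / (n - 2)

print(mse.mean())   # close to sigma^2 = 2.25, confirming unbiasedness
```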
Properties of OLS

The estimates $b_0$ and $b_1$ can both be written as linear combinations of $Y_1, \ldots, Y_n$ (prove!):
$b_1 = \sum k_i Y_i, \qquad \text{where} \quad k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$
Some interesting properties of $k_i$:
• $\sum k_i = 0$;
• $\sum k_i X_i = 1$;
• $\sum k_i^2 = 1 / \sum (X_i - \bar{X})^2$.
Properties of $b_1$:
1. Mean (unbiased): $E(b_1) = \beta_1$.
2. Variance:
   $Var(b_1) = \sum k_i^2 Var(Y_i) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}.$
3. Estimated variance:
   $S^2(b_1) = \frac{MSE}{\sum (X_i - \bar{X})^2},$
   which is an unbiased estimate of $Var(b_1)$.
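The three properties of the weights $k_i$ hold for any predictor values; a quick numerical check in Python, assuming NumPy (the $x$ values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # arbitrary predictor values
Sxx = np.sum((x - x.mean()) ** 2)
k = (x - x.mean()) / Sxx                    # the weights k_i

print(np.sum(k))           # 0: sum of k_i
print(np.sum(k * x))       # 1: sum of k_i * X_i
print(np.sum(k ** 2))      # 1/Sxx: sum of k_i^2
```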
Properties of OLS

The fitted value at $x = \bar{x}$ is
$\hat{E}(Y \mid X = \bar{x}) = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y},$
so the fitted line must pass through the point $(\bar{x}, \bar{y})$, intuitively the center of the data. So we have
$b_0 = \bar{Y} - b_1 \bar{X}.$
Under the linear model assumptions, the least squares estimates are unbiased: $E(b_0) = \beta_0$. Further,
$Var(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$
Similarly, the estimated variance is
$S^2(b_0) = MSE \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$
The two estimates are correlated, with covariance
$Cov(b_0, b_1) = -\sigma^2 \frac{\bar{X}}{\sum (X_i - \bar{X})^2}.$
(Question: What happens if the data become more spread out?)
Properties

• The sum of the residuals is zero:
  $\sum_{i=1}^{n} e_i = 0.$
• The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
  $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i.$
• The sum of the weighted residuals is zero:
  $\sum_{i=1}^{n} X_i e_i = 0.$
• Also
  $\sum_{i=1}^{n} \hat{Y}_i e_i = 0.$
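All four residual identities can be verified on any fitted data set; a minimal sketch assuming NumPy, with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=30)

# OLS fit via the closed-form estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

print(e.sum())               # 0 (up to rounding): sum of residuals
print((x * e).sum())         # 0: sum of weighted residuals
print((yhat * e).sum())      # 0: fitted values orthogonal to residuals
print(y.sum() - yhat.sum())  # 0: observed and fitted sums match
```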
Gauss–Markov theorem

The OLS estimates are the best linear unbiased estimators (BLUE). For example, if
$b_1 = \sum k_i Y_i, \qquad E(b_1) = \beta_1,$
then for any other unbiased linear estimator $b_1^*$ (prove!),
$Var(b_1^*) \geq Var(b_1).$
If we further assume that $\epsilon_i \sim N(0, \sigma^2)$, then the OLS estimates are also the maximum likelihood estimates (MLE). Under the normality assumption,
$\hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \right), \qquad \hat{\beta}_0 \sim N\left( \beta_0, \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (X_i - \bar{X})^2} \right) \right).$
These quantities will be used to construct confidence intervals, tests, and other statistical inferences.
Confidence intervals and tests

Linear model assumptions:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$
Under this model:
$\frac{b_0 - \beta_0}{S(b_0)} \sim t_{n-2}, \qquad \frac{b_1 - \beta_1}{S(b_1)} \sim t_{n-2}.$
Hence, a $100(1-\alpha)\%$ confidence interval for $\beta_0$ is $b_0 \pm t_{n-2, \alpha/2} S(b_0)$; a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is $b_1 \pm t_{n-2, \alpha/2} S(b_1)$.
A hypothesis test of
$H_0: \beta_0 = \beta_0^* \quad \text{vs} \quad H_a: \beta_0 \neq \beta_0^*$
is obtained by computing
$t = \frac{b_0 - \beta_0^*}{S(b_0)} \sim t_{n-2} \quad \text{under } H_0.$
Then reject $H_0$ if $|t| > t_{n-2, \alpha/2}$.
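Putting the pieces together, the coefficient intervals can be computed from MSE and the standard-error formulas. A sketch assuming NumPy and SciPy, with illustrative simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

s_b1 = np.sqrt(mse / Sxx)                       # S(b1)
s_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / Sxx))  # S(b0)

tcrit = stats.t.ppf(0.975, df=n - 2)            # t_{n-2, alpha/2}, alpha = 0.05
print(b1 - tcrit * s_b1, b1 + tcrit * s_b1)     # 95% CI for beta_1
print(b0 - tcrit * s_b0, b0 + tcrit * s_b0)     # 95% CI for beta_0
```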
Confidence intervals and tests

Similarly, a hypothesis test of
$H_0: \beta_1 = \beta_1^* \quad \text{vs} \quad H_a: \beta_1 \neq \beta_1^*$
is obtained by computing
$t = \frac{b_1 - \beta_1^*}{S(b_1)} \sim t_{n-2} \quad \text{under } H_0.$
Then reject $H_0$ if $|t| > t_{n-2, \alpha/2}$. p-values can be computed as $p = 2P(T > |t|)$.
Considering the test problem
$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \neq 0,$
we have
$t^2 = \frac{b_1^2}{S^2(b_1)} \sim F_{1, n-2}.$
So the square of a t statistic with $d$ df is equivalent to an F statistic with $(1, d)$ df.
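The t-squared/F equivalence also holds at the level of critical values: $t_{d, \alpha/2}^2 = F_{1, d, \alpha}$. A one-liner check with SciPy (the choice of $d$ and $\alpha$ is arbitrary):

```python
from scipy import stats

d, alpha = 18, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=d)        # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=d)    # F(1, d) critical value

print(t_crit ** 2, f_crit)   # the two agree
```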
Interval estimation of $E(Y_h)$

A common objective in regression analysis is to estimate the mean of the probability distribution of $Y$ at one or more levels of $X$. Let $X_h$ denote the level of $X$ for which we wish to estimate the mean response. Then, by the regression equation, we have
$\hat{Y}_h = b_0 + b_1 X_h.$
Sampling distribution of $\hat{Y}_h$:
• $\hat{Y}_h$ is a Normal random variable. (Why?)
• Mean:
  $E(\hat{Y}_h) = \beta_0 + \beta_1 X_h.$
• Variance:
  $Var(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• Estimated variance:
  $S^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• t distribution:
  $\frac{\hat{Y}_h - E(\hat{Y}_h)}{S(\hat{Y}_h)} \sim t_{n-2}.$
Hence, the $100(1-\alpha)\%$ confidence interval is $\hat{Y}_h \pm t_{\alpha/2; n-2} S(\hat{Y}_h).$
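The mean-response interval follows the same recipe as the coefficient intervals; a sketch assuming NumPy and SciPy, with an illustrative $X_h$ and simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

xh = 4.0                                   # level of X of interest (illustrative)
yh = b0 + b1 * xh                          # estimated mean response
s_yh = np.sqrt(mse * (1 / n + (xh - x.mean()) ** 2 / Sxx))

tcrit = stats.t.ppf(0.975, df=n - 2)
print(yh - tcrit * s_yh, yh + tcrit * s_yh)   # 95% CI for E(Y_h)
```

Note that $S(\hat{Y}_h)$ grows as $X_h$ moves away from $\bar{X}$, so the interval is narrowest at the center of the data.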
Prediction

In prediction we have a new case, possibly a future value, not one used to estimate the parameters, with observed value of the predictor $X_*$. We would like to know the value $Y_*$, the corresponding response, but it has not yet been observed. We can use the estimated mean function to predict it:
$Y_* = \beta_0 + \beta_1 X_* + \epsilon_*, \qquad Var(\epsilon_*) = \sigma^2.$
A natural prediction is
$\tilde{Y}_* = b_0 + b_1 X_*.$
The variance of the prediction error is
$Var(pred) = Var(\tilde{Y}_* - Y_*) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
The estimated standard error of prediction at $X_*$ is
$S(pred) = \sqrt{MSE} \left( 1 + \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)^{1/2}.$
Hence,
$\frac{Y_* - \tilde{Y}_*}{S(pred)} \sim t_{n-2}.$
So, the prediction interval for $Y_*$ is
$\tilde{Y}_* \pm t_{\alpha/2, n-2} S(pred).$
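A prediction interval differs from the mean-response interval only through the extra "1 +" term, which accounts for the new observation's own noise. A sketch assuming NumPy and SciPy, with an illustrative $X_*$ and simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

xstar = 4.0
ytilde = b0 + b1 * xstar
# Prediction error variance: extra "1 +" for the new observation's noise
s_pred = np.sqrt(mse * (1 + 1 / n + (xstar - x.mean()) ** 2 / Sxx))

tcrit = stats.t.ppf(0.975, df=n - 2)
print(ytilde - tcrit * s_pred, ytilde + tcrit * s_pred)  # 95% prediction interval

# The prediction interval is always wider than the CI for the mean at the same x
s_mean = np.sqrt(mse * (1 / n + (xstar - x.mean()) ** 2 / Sxx))
print(s_pred > s_mean)   # True
```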