Simple linear regression
Lecture 1: Simple Linear Regression
Maochao Xu, Department of Mathematics, Illinois State University
[email protected]
Review basic concepts in inference

Assume $X_1, \ldots, X_n$ are i.i.d. Normal random variables with mean $\mu$ and variance $\sigma^2$. That is, $X_i \sim N(\mu, \sigma^2)$, $i = 1, \ldots, n$.
• Sample mean:
  $\bar{X} = \sum_{i=1}^{n} X_i / n.$
• Sample variance:
  $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n-1).$
• Unbiased estimates:
  $E(\bar{X}) = \mu, \qquad E(S^2) = \sigma^2.$
• Standard normal distribution:
  $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$
• t distribution:
  $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$
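These facts can be checked by simulation. A minimal sketch, assuming NumPy is available (the parameter values and sample sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 20, 20000

# Draw many samples of X_1, ..., X_n ~ N(mu, sigma^2)
X = rng.normal(mu, sigma, size=(reps, n))

xbar = X.mean(axis=1)        # sample means
s2 = X.var(axis=1, ddof=1)   # sample variances (divide by n - 1)

# Unbiasedness: averaging the estimators over replicates recovers mu, sigma^2
print(xbar.mean())           # close to 5.0
print(s2.mean())             # close to 4.0

# The t statistic replaces sigma with S, giving heavier tails than N(0, 1)
t_stat = (xbar - mu) / (np.sqrt(s2) / np.sqrt(n))
print(t_stat.std())          # slightly above 1 (t_19 has variance 19/17)
```

Note that dividing by $n$ instead of $n-1$ in the sample variance would bias $S^2$ downward, which is why `ddof=1` is used above.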
• $\chi^2$ distribution:
  Let $Z_1, \ldots, Z_n$ be $n$ i.i.d. standard normal random variables. Then the chi-square random variable is defined as
  $\chi^2_n = Z_1^2 + \cdots + Z_n^2.$
  It is known that
  $E(\chi^2_n) = n, \qquad Var(\chi^2_n) = 2n.$
  Recall that
  $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$
• F distribution (how to use it?):
  $F_{m,n} = \frac{\chi^2_m / m}{\chi^2_n / n},$
  where $\chi^2_m$ and $\chi^2_n$ are independent.
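The relation $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$ can also be verified by simulation; a small sketch, assuming NumPy (the chosen $n$, $\sigma$, and replicate count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, reps = 10, 3.0, 50000

X = rng.normal(0.0, sigma, size=(reps, n))
s2 = X.var(axis=1, ddof=1)
q = (n - 1) * s2 / sigma**2   # should follow a chi-square with n - 1 df

print(q.mean())               # close to n - 1 = 9
print(q.var())                # close to 2(n - 1) = 18
```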
Estimation methods

• Maximum likelihood estimator (MLE)
  Suppose $Y_1, \ldots, Y_n$ are random variables with density function $f(y; \theta)$, where $\theta$ is an unknown parameter. Given independent observations $y_1, \ldots, y_n$, the likelihood function can be expressed as
  $L(\theta) = \prod_{i=1}^{n} f(y_i; \theta).$
  The MLE is obtained by maximizing $L(\theta)$ or $\log L(\theta)$. That is,
  $\hat{\theta} = \arg\max_{\theta} L(\theta).$
• Least squares estimator (LSE)
  The sample observations are assumed to be of the form
  $Y_i = f_i(\theta) + \epsilon_i, \qquad i = 1, \ldots, n,$
  where $f_i(\theta)$ is a known function of the parameter $\theta$ and the $\epsilon_i$ are random variables. The LSE is obtained by minimizing
  $Q(\theta) = \sum_{i=1}^{n} [Y_i - f_i(\theta)]^2.$
  That is, $\hat{\theta} = \arg\min_{\theta} Q(\theta).$
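As a toy illustration of least squares, consider the hypothetical one-parameter model $f_i(\theta) = \theta x_i$ (a line through the origin; not the course's simple linear regression model). Minimizing $Q(\theta)$ numerically should agree with the closed-form solution $\hat{\theta} = \sum x_i y_i / \sum x_i^2$. A sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
# Hypothetical model through the origin: Y_i = theta * x_i + eps_i
x = rng.uniform(0, 10, size=50)
y = 1.7 * x + rng.normal(0, 1, size=50)

def Q(theta):
    return np.sum((y - theta * x) ** 2)   # least squares criterion

theta_hat = minimize_scalar(Q).x          # numerical minimizer of Q
theta_closed = np.sum(x * y) / np.sum(x * x)  # closed-form LSE for this model

print(theta_hat, theta_closed)            # the two agree
```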
Example 1: Mother and daughter's heights
During the period 1893–1898, Karl Pearson organized the collection of n = 1375 heights of mothers in the United Kingdom under the age of 65 and one of their adult daughters over the age of 18.
Questions:
1. How is the mother's height related to the daughter's height?
2. Can a daughter's height be predicted from her mother's height?
Example 2: Atmospheric pressure and the boiling point
A Scottish physicist named James D. Forbes discussed a series of experiments that he had done concerning the relationship between atmospheric pressure and the boiling point of water. He collected 17 observations at different locations.
Questions:
1. How are pressure and boiling point related?
2. Can pressure be predicted from boiling point, and how well?
Example 3: House values
What is the fair market value of a house?
Questions:
1. If a house has 8 rooms, what is its price?
2. What is the price range for a house?
3. What is the average price for a room?
Model and notations
1. Bivariate data $(X, Y)$: $(x_1, y_1), \ldots, (x_n, y_n)$.
2. $X$: independent variable.
3. $Y$: dependent variable (response).
4. Regression equation: $Y = \beta_0 + \beta_1 X + \epsilon$, where $\beta_0$ is the intercept and $\beta_1$ is the slope, which represents the number of units $Y$ increases if $X$ increases by one unit. For the $i$th trial,
   $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i.$
5. Assumptions on $\epsilon_i$:
   • Independence: $Cov(\epsilon_i, \epsilon_j) = 0$, $i \neq j$.
   • Homoscedasticity (constant variance): $Var(\epsilon_i \mid X = x_i) = \sigma^2$.
   • Zero mean: $E[\epsilon_i \mid X = x_i] = 0$.
Ordinary least squares estimation: OLS

Minimize the residual sum of squares.
1. OLS criterion:
   $RSS(\beta_0, \beta_1) = \sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2.$
   Find $\beta_0$ and $\beta_1$ to minimize $RSS(\beta_0, \beta_1)$ (how?): $(b_0, b_1)$ (or $(\hat{\beta}_0, \hat{\beta}_1)$).
2. Residual sum of squares:
   $RSS = RSS(b_0, b_1) = \sum_{i=1}^{n} [y_i - (b_0 + b_1 x_i)]^2 = \sum_{i=1}^{n} e_i^2.$
3. Residual: $e_i = y_i - (b_0 + b_1 x_i)$, $i = 1, \ldots, n$.
4. Fitted value: $\hat{y}_i = b_0 + b_1 x_i$.
Ordinary least squares estimation: OLS

$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}.$
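The closed-form formulas can be computed directly and cross-checked against a library fit. A minimal Python sketch, assuming NumPy; the simulated data and true coefficients are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, sigma, n = 2.0, 0.5, 1.0, 100
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

# Closed-form OLS estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check against numpy's degree-1 least-squares fit
b1_np, b0_np = np.polyfit(x, y, 1)

print(b0, b1)   # close to (2.0, 0.5) and equal to polyfit's values
```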
Estimating $\sigma^2$

The regression equation gives the mean of the group. What is the standard deviation of this group? That is, we have to estimate $\sigma$.
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i, \qquad Var(\epsilon_i) = \sigma^2.$
A natural estimator for $\sigma^2$ is
$\frac{1}{n} \sum_{i=1}^{n} (\epsilon_i - E[\epsilon])^2 = \frac{1}{n} \sum_{i=1}^{n} \epsilon_i^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$
Since $\beta_0$ and $\beta_1$ are unknown, we use the estimates:
$\frac{1}{n} \sum_{i=1}^{n} e_i^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 = SSE/n.$
Since $\hat{\beta}_0$ and $\hat{\beta}_1$ are estimated, the $e_i^2$ are no longer independent, so we use the following estimator:
$s^2 = \frac{SSE}{n-2} = MSE,$
where $n - 2$ is the degrees of freedom (df) (why?), and MSE stands for error mean square or residual mean square. Generally, df = number of cases − number of parameters.
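That dividing SSE by $n-2$ (rather than $n$) gives an unbiased estimator of $\sigma^2$ can be seen by refitting on many simulated data sets and averaging the MSE. A sketch assuming NumPy; the parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1, sigma, n, reps = 1.0, 2.0, 1.5, 15, 20000
x = np.linspace(0, 5, n)
Sxx = np.sum((x - x.mean()) ** 2)

mse = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)               # residuals
    mse[r] = np.sum(e ** 2) / (n - 2)   # SSE / (n - 2)

print(mse.mean())   # close to sigma^2 = 2.25, confirming unbiasedness
```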
Properties of OLS

The estimates $b_0$ and $b_1$ can both be written as linear combinations of $Y_1, \ldots, Y_n$ (prove!):
$b_1 = \sum k_i Y_i, \qquad \text{where} \quad k_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$
Some interesting properties of $k_i$:
• $\sum k_i = 0$;
• $\sum k_i X_i = 1$;
• $\sum k_i^2 = 1 / \sum (X_i - \bar{X})^2$.
Properties of $b_1$:
1. Mean (unbiased): $E(b_1) = \beta_1$.
2. Variance:
   $Var(b_1) = \sum k_i^2 Var(Y_i) = \frac{\sigma^2}{\sum (X_i - \bar{X})^2}.$
3. Estimated variance:
   $S^2(b_1) = \frac{MSE}{\sum (X_i - \bar{X})^2},$
   which is an unbiased estimate of $Var(b_1)$.
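The three properties of the weights $k_i$ hold for any predictor values; a quick numerical check in Python, assuming NumPy (the $x$ values are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])   # arbitrary predictor values
Sxx = np.sum((x - x.mean()) ** 2)
k = (x - x.mean()) / Sxx                    # the weights k_i

print(np.sum(k))           # 0: sum of k_i
print(np.sum(k * x))       # 1: sum of k_i * X_i
print(np.sum(k ** 2))      # 1/Sxx: sum of k_i^2
```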
Properties of OLS

The fitted value at $x = \bar{x}$ is
$\hat{E}(Y \mid X = \bar{x}) = b_0 + b_1 \bar{x} = \bar{y} - b_1 \bar{x} + b_1 \bar{x} = \bar{y},$
so the fitted line must pass through the point $(\bar{x}, \bar{y})$, intuitively the center of the data. So we have
$b_0 = \bar{Y} - b_1 \bar{X}.$
Under the linear model assumptions, the least squares estimates are unbiased: $E(b_0) = \beta_0$. Further,
$Var(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$
Similarly, the estimated variance is
$S^2(b_0) = MSE \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum (X_i - \bar{X})^2} \right).$
The two estimates are correlated, with covariance
$Cov(b_0, b_1) = -\sigma^2 \frac{\bar{X}}{\sum (X_i - \bar{X})^2}.$
(Question: What happens if the data become more spread out?)
Properties

• The sum of the residuals is zero:
  $\sum_{i=1}^{n} e_i = 0.$
• The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
  $\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} \hat{Y}_i.$
• The sum of the weighted residuals is zero:
  $\sum_{i=1}^{n} X_i e_i = 0.$
• Also
  $\sum_{i=1}^{n} \hat{Y}_i e_i = 0.$
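All four residual identities can be verified on any fitted data set; a minimal sketch assuming NumPy, with illustrative simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=30)
y = 1.0 + 0.8 * x + rng.normal(0, 1, size=30)

# OLS fit via the closed-form estimates
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x
e = y - yhat

print(e.sum())               # 0 (up to rounding): sum of residuals
print((x * e).sum())         # 0: sum of weighted residuals
print((yhat * e).sum())      # 0: fitted values orthogonal to residuals
print(y.sum() - yhat.sum())  # 0: observed and fitted sums match
```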
Gauss–Markov theorem

The OLS estimates are the best linear unbiased estimators (BLUE). For example, if
$b_1 = \sum k_i Y_i, \qquad E(b_1) = \beta_1,$
then for any other unbiased linear estimator $b_1^*$ (prove!),
$Var(b_1^*) \geq Var(b_1).$
If we further assume that $\epsilon_i \sim N(0, \sigma^2)$, then the OLS estimates are also the maximum likelihood estimates (MLE). Under the normality assumption,
$\hat{\beta}_1 \sim N\left( \beta_1, \frac{\sigma^2}{\sum (X_i - \bar{X})^2} \right), \qquad \hat{\beta}_0 \sim N\left( \beta_0, \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{\sum (X_i - \bar{X})^2} \right) \right).$
These quantities will be used to construct confidence intervals, tests, and other statistical inferences.
Confidence intervals and tests

Linear model assumptions:
$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim N(0, \sigma^2).$
Under this model:
$\frac{b_0 - \beta_0}{S(b_0)} \sim t_{n-2}, \qquad \frac{b_1 - \beta_1}{S(b_1)} \sim t_{n-2}.$
Hence, a $100(1-\alpha)\%$ confidence interval for $\beta_0$ is $b_0 \pm t_{n-2, \alpha/2} S(b_0)$; a $100(1-\alpha)\%$ confidence interval for $\beta_1$ is $b_1 \pm t_{n-2, \alpha/2} S(b_1)$.
A hypothesis test of
$H_0: \beta_0 = \beta_0^* \quad \text{vs} \quad H_a: \beta_0 \neq \beta_0^*$
is obtained by computing
$t = \frac{b_0 - \beta_0^*}{S(b_0)} \sim t_{n-2} \quad \text{under } H_0.$
Then reject $H_0$ if $|t| > t_{n-2, \alpha/2}$.
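Putting the pieces together, the coefficient intervals can be computed from MSE and the standard-error formulas. A sketch assuming NumPy and SciPy, with illustrative simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

s_b1 = np.sqrt(mse / Sxx)                       # S(b1)
s_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / Sxx))  # S(b0)

tcrit = stats.t.ppf(0.975, df=n - 2)            # t_{n-2, alpha/2}, alpha = 0.05
print(b1 - tcrit * s_b1, b1 + tcrit * s_b1)     # 95% CI for beta_1
print(b0 - tcrit * s_b0, b0 + tcrit * s_b0)     # 95% CI for beta_0
```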
Confidence intervals and tests

Similarly, a hypothesis test of
$H_0: \beta_1 = \beta_1^* \quad \text{vs} \quad H_a: \beta_1 \neq \beta_1^*$
is obtained by computing
$t = \frac{b_1 - \beta_1^*}{S(b_1)} \sim t_{n-2} \quad \text{under } H_0.$
Then reject $H_0$ if $|t| > t_{n-2, \alpha/2}$. p-values can be computed as $p = 2P(T > |t|)$.
Considering the test problem
$H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \neq 0,$
we have
$t^2 = \frac{b_1^2}{S^2(b_1)} \sim F_{1, n-2}.$
So the square of a t statistic with $d$ df is equivalent to an F statistic with $(1, d)$ df.
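The t-squared/F equivalence also holds at the level of critical values: $t_{d, \alpha/2}^2 = F_{1, d, \alpha}$. A one-liner check with SciPy (the choice of $d$ and $\alpha$ is arbitrary):

```python
from scipy import stats

d, alpha = 18, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=d)        # two-sided t critical value
f_crit = stats.f.ppf(1 - alpha, dfn=1, dfd=d)    # F(1, d) critical value

print(t_crit ** 2, f_crit)   # the two agree
```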
Interval estimation of $E(Y_h)$

A common objective in regression analysis is to estimate the mean of the probability distribution of $Y$ at one or more levels of $X$. Let $X_h$ denote the level of $X$ for which we wish to estimate the mean response. Then, by the regression equation, we have
$\hat{Y}_h = b_0 + b_1 X_h.$
Sampling distribution of $\hat{Y}_h$:
• $\hat{Y}_h$ is a Normal random variable. (Why?)
• Mean:
  $E(\hat{Y}_h) = \beta_0 + \beta_1 X_h.$
• Variance:
  $Var(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• Estimated variance:
  $S^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
• t distribution:
  $\frac{\hat{Y}_h - E(\hat{Y}_h)}{S(\hat{Y}_h)} \sim t_{n-2}.$
Hence, the $100(1-\alpha)\%$ confidence interval is $\hat{Y}_h \pm t_{\alpha/2; n-2} S(\hat{Y}_h).$
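The mean-response interval follows the same recipe as the coefficient intervals; a sketch assuming NumPy and SciPy, with an illustrative $X_h$ and simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

xh = 4.0                                   # level of X of interest (illustrative)
yh = b0 + b1 * xh                          # estimated mean response
s_yh = np.sqrt(mse * (1 / n + (xh - x.mean()) ** 2 / Sxx))

tcrit = stats.t.ppf(0.975, df=n - 2)
print(yh - tcrit * s_yh, yh + tcrit * s_yh)   # 95% CI for E(Y_h)
```

Note that $S(\hat{Y}_h)$ grows as $X_h$ moves away from $\bar{X}$, so the interval is narrowest at the center of the data.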
Prediction

In prediction we have a new case, possibly a future value, not one used to estimate the parameters, with observed value of the predictor $X_*$. We would like to know the value $Y_*$, the corresponding response, but it has not yet been observed. We can use the estimated mean function to predict it:
$Y_* = \beta_0 + \beta_1 X_* + \epsilon_*, \qquad Var(\epsilon_*) = \sigma^2.$
A natural prediction is
$\tilde{Y}_* = b_0 + b_1 X_*.$
The variance of the prediction error is
$Var(pred) = Var(\tilde{Y}_* - Y_*) = \sigma^2 + \sigma^2 \left[ \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right].$
The estimated standard error of prediction at $X_*$ is
$S(pred) = \sqrt{MSE} \left( 1 + \frac{1}{n} + \frac{(X_* - \bar{X})^2}{\sum (X_i - \bar{X})^2} \right)^{1/2}.$
Hence,
$\frac{Y_* - \tilde{Y}_*}{S(pred)} \sim t_{n-2}.$
So, the prediction interval for $Y_*$ is
$\tilde{Y}_* \pm t_{\alpha/2, n-2} S(pred).$
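A prediction interval differs from the mean-response interval only through the extra "1 +" term, which accounts for the new observation's own noise. A sketch assuming NumPy and SciPy, with an illustrative $X_*$ and simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 25
x = rng.uniform(0, 10, size=n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=n)

Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

xstar = 4.0
ytilde = b0 + b1 * xstar
# Prediction error variance: extra "1 +" for the new observation's noise
s_pred = np.sqrt(mse * (1 + 1 / n + (xstar - x.mean()) ** 2 / Sxx))

tcrit = stats.t.ppf(0.975, df=n - 2)
print(ytilde - tcrit * s_pred, ytilde + tcrit * s_pred)  # 95% prediction interval

# The prediction interval is always wider than the CI for the mean at the same x
s_mean = np.sqrt(mse * (1 / n + (xstar - x.mean()) ** 2 / Sxx))
print(s_pred > s_mean)   # True
```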