Theory of Point Estimation

Theory of Point Estimation Dr. Phillip YAM

2012/2013 Spring Semester

Reference: Chapter 6 of “Probability and Statistical Inference” by Hogg and Tanis.

Section 6.1 Point estimation

- Estimate characteristics of the distribution from the corresponding characteristics of the sample.
- The sample mean x̄ can be thought of as an estimate of the distribution mean µ.
- The sample variance s² can be used as an estimate of the distribution variance σ².
- What makes an estimate good? Can we quantify how close the estimate is to the true value?

Section 6.1 Point estimation

- The functional form of the p.d.f. is known but depends on an unknown parameter θ.
- θ takes values in the parameter space Ω.
- Example: f(x; θ) = (1/θ) exp(−x/θ), 0 < x < ∞, where θ is a positive number.
- The experimenter needs a point estimate of the parameter θ.

Section 6.1 Point estimation

- Repeat the experiment n independent times and observe the sample X1, X2, ..., Xn.
- Estimate the parameter by using the observations x1, x2, ..., xn.
- An estimator of θ is a function (statistic) u(X1, X2, ..., Xn).
- Suppose that X follows a Bernoulli distribution with success probability p, so f(x; p) = p^x (1 − p)^{1−x}. Then

  P(X1 = x1, X2 = x2, ..., Xn = xn) = ∏_{i=1}^n p^{x_i}(1 − p)^{1−x_i} = p^{Σ x_i}(1 − p)^{n − Σ x_i} .

- Find the value of p that maximizes this probability.
- That value of p is the one most likely to have produced these sample values. The joint p.d.f., regarded as a function of the parameter, is called the likelihood function.

Section 6.1 Point estimation

The first derivative (gradient) of the likelihood function is

  L′(p) = (Σ x_i) p^{Σ x_i − 1}(1 − p)^{n − Σ x_i} − (n − Σ x_i) p^{Σ x_i}(1 − p)^{n − Σ x_i − 1} .

Setting L′(p) = 0 gives

  p^{Σ x_i}(1 − p)^{n − Σ x_i} [ (Σ x_i)/p − (n − Σ x_i)/(1 − p) ] = 0 ,

whose solution is

  p = (Σ_{i=1}^n x_i)/n = x̄ .

It can be shown that L″(x̄) < 0, so that L(x̄) is a maximum. The statistic (Σ_{i=1}^n X_i)/n = X̄ is called the maximum likelihood estimator of p.

Section 6.1 Point estimation

When finding a maximum likelihood estimator, it is often easier to maximize the natural logarithm of the likelihood function rather than the likelihood function itself:

  ln L(p) = (Σ_{i=1}^n x_i) ln p + (n − Σ_{i=1}^n x_i) ln(1 − p) .

Differentiating and setting the derivative equal to zero,

  d[ln L(p)]/dp = (Σ_{i=1}^n x_i)(1/p) + (n − Σ_{i=1}^n x_i)(−1/(1 − p)) = 0 ,

which again yields p̂ = x̄.
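A minimal numerical sketch of this calculation, assuming NumPy and SciPy are available (the sample below is hypothetical): maximizing −ln L(p) over (0, 1) reproduces the closed-form estimate p̂ = x̄.

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])  # hypothetical Bernoulli sample

def neg_log_likelihood(p):
    # -ln L(p) = -[ (sum x_i) ln p + (n - sum x_i) ln(1 - p) ]
    s, n = x.sum(), x.size
    return -(s * np.log(p) + (n - s) * np.log(1 - p))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x.mean())   # the numerical maximizer agrees with the sample mean x-bar
```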

Section 6.1 Point estimation

The joint p.d.f. of X1, X2, ..., Xn, namely

  L(θ1, θ2, ..., θm) = f(x1; θ1, ..., θm) f(x2; θ1, ..., θm) ··· f(xn; θ1, ..., θm),  (θ1, θ2, ..., θm) ∈ Ω ,

when regarded as a function of θ1, θ2, ..., θm, is called the likelihood function. The maximizing values define the maximum likelihood estimators

  θ̂1 = u1(X1, ..., Xn), θ̂2 = u2(X1, ..., Xn), ..., θ̂m = um(X1, ..., Xn) .

Section 6.1 Point estimation Example 6.1-1

Let X1, X2, ..., Xn be a random sample from the exponential distribution with p.d.f.

  f(x; θ) = (1/θ) e^{−x/θ}, 0 < x < ∞, θ ∈ Ω = {θ : 0 < θ < ∞} .

The likelihood function is given by

  L(θ) = L(θ; x1, x2, ..., xn) = [(1/θ) e^{−x1/θ}][(1/θ) e^{−x2/θ}] ··· [(1/θ) e^{−xn/θ}] = (1/θ^n) exp(−Σ_{i=1}^n x_i / θ),  0 < θ < ∞ .

Maximizing ln L(θ) = −n ln θ − (Σ x_i)/θ gives the maximum likelihood estimate θ̂ = x̄.

By contrast, for a random sample from a gamma distribution with parameters θ1 > 0 and θ2 > 0, it is difficult to maximize

  L(θ1, θ2; x1, ..., xn) = [1/(Γ(θ1) θ2^{θ1})]^n (x1 x2 ··· xn)^{θ1 − 1} exp(−Σ_{i=1}^n x_i / θ2)

with respect to θ1 and θ2 in closed form, owing to the presence of the gamma function Γ(θ1).
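Because the gamma likelihood has no closed-form maximizer, it is typically maximized numerically. A hedged sketch, assuming SciPy's optimizer and a simulated sample (the data and starting values are illustrative only):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=200)   # hypothetical gamma sample (theta1=2, theta2=3)

def neg_log_likelihood(params):
    t1, t2 = params
    if t1 <= 0 or t2 <= 0:
        return np.inf
    # -ln L = n*ln Gamma(t1) + n*t1*ln t2 - (t1-1)*sum(ln x_i) + sum(x_i)/t2
    n = x.size
    return (n * gammaln(t1) + n * t1 * np.log(t2)
            - (t1 - 1) * np.log(x).sum() + x.sum() / t2)

res = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)   # numerical MLE of (theta1, theta2)
```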

Section 6.1 Point estimation

- (Method of moments) Equate the sample moment(s) to the theoretical moment(s).
- For the gamma distribution, the first two theoretical moments give

  θ1 θ2 = X̄  and  θ1 θ2² = V ,

  so that

  θ̃1 = X̄²/V  and  θ̃2 = V/X̄ .

- θ̃1 and θ̃2 are the respective estimators of θ1 and θ2 found by the method of moments.
- The statistic M_k = (Σ_{i=1}^n X_i^k)/n is the kth moment of the sample, k = 1, 2, 3, .... The method of moments can be described as follows: equate E(X^k) to M_k.
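For comparison with the numerical MLE above, the method-of-moments estimates need only the sample mean and variance; a small sketch with the same kind of simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=3.0, size=200)   # hypothetical sample

xbar = x.mean()
v = x.var()                     # variance with divisor n, matching V in the notes
theta1_tilde = xbar**2 / v
theta2_tilde = v / xbar
print(theta1_tilde, theta2_tilde)
```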

Section 6.1 Point estimation Example 6.1-6

Let the distribution of X be N(µ, σ²). Then

  E(X) = µ  and  E(X²) = σ² + µ² .

For a random sample of size n, the first two sample moments are

  m1 = (1/n) Σ_{i=1}^n x_i  and  m2 = (1/n) Σ_{i=1}^n x_i² .

We set m1 = E(X) and m2 = E(X²) and solve for µ and σ². That is,

  (1/n) Σ_{i=1}^n x_i = µ  and  (1/n) Σ_{i=1}^n x_i² = σ² + µ² .

The first equation yields x̄ as the estimate of µ.

Section 6.1 Point estimation

Replacing µ² with x̄² in the second equation and solving for σ², we obtain

  (1/n) Σ_{i=1}^n x_i² − x̄² = (1/n) Σ_{i=1}^n (x_i − x̄)² = v

as the solution for σ². Thus, the method-of-moments estimators for µ and σ² are µ̃ = X̄ and σ̃² = V, which are the same as the maximum likelihood estimators. Of course, µ̃ = X̄ is unbiased, whereas σ̃² = V is biased.

Section 6.2 Confidence intervals for means

- A random sample X1, X2, ..., Xn from a normal distribution N(µ, σ²), where σ² is known.
- X̄ is N(µ, σ²/n), so

  P(−zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2) = 1 − α .

  The event −zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2 is equivalent to

  X̄ − zα/2 (σ/√n) ≤ µ ≤ X̄ + zα/2 (σ/√n) .

Section 6.2 Confidence intervals for means

- Hence

  P[X̄ − zα/2 (σ/√n) ≤ µ ≤ X̄ + zα/2 (σ/√n)] = 1 − α .

- The probability that the random interval

  [X̄ − zα/2 (σ/√n), X̄ + zα/2 (σ/√n)]

  includes the unknown mean µ is 1 − α. This interval is a 100(1 − α)% confidence interval for the unknown mean µ. A shorter confidence interval indicates that we have more credence in x̄ as an estimate of µ.

Section 6.2 Confidence intervals for means

Suppose we cannot assume that the distribution from which the sample arose is normal. By the central limit theorem, provided that n is large enough, the ratio (X̄ − µ)/(σ/√n) still has an approximate N(0, 1) distribution, so

  P(−zα/2 ≤ (X̄ − µ)/(σ/√n) ≤ zα/2) ≈ 1 − α ,

and

  [x̄ − zα/2 (σ/√n), x̄ + zα/2 (σ/√n)]

is an approximate 100(1 − α)% confidence interval for µ. When the underlying distribution is unimodal (has only one mode) and continuous, the approximation is usually quite good even for small n, such as n = 5; in almost all cases, an n of at least 30 is adequate.
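A short sketch of the z-interval with known σ, assuming SciPy for the normal quantile (the data and σ below are made up for illustration):

```python
import numpy as np
from scipy.stats import norm

x = np.array([4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7])  # hypothetical sample
sigma = 0.3                                              # assumed known standard deviation
alpha = 0.05

z = norm.ppf(1 - alpha / 2)                 # z_{alpha/2}
half_width = z * sigma / np.sqrt(x.size)
print(x.mean() - half_width, x.mean() + half_width)      # 95% CI for mu
```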

Section 6.2 Confidence intervals for means

- When σ² is unknown,

  T = (X̄ − µ)/(S/√n)

  has a t distribution with r = n − 1 degrees of freedom.
- X has a chi-square distribution with r degrees of freedom if X has a gamma distribution with θ = 2 and α = r/2, i.e.

  f(x) = 1/(Γ(r/2) 2^{r/2}) x^{r/2 − 1} e^{−x/2} ,  0 < x < ∞ .

- (Theorem 5.5-2) Let X1, X2, ..., Xn be observations of a random sample of size n from the normal distribution N(µ, σ²). Then the sample mean X̄ and the sample variance S² are independent, and (n − 1)S²/σ² follows a chi-square distribution with n − 1 degrees of freedom.

Section 6.2 Confidence intervals for means

- (Theorem 5-4.2) Let X1, X2, ..., Xn be independent chi-square random variables with respective degrees of freedom r1, r2, ..., rn. Then X1 + X2 + ··· + Xn is a chi-square random variable with r1 + r2 + ··· + rn degrees of freedom.
- (Proof) Consider the product of the moment generating functions of the chi-square random variables.
- (Theorem 3-6.2) Let X be a normal random variable with mean µ and variance σ². Then Z² = (X − µ)²/σ² is a chi-square random variable with 1 degree of freedom.
- (Proof) Consider P(Z² ≤ v) = P(−√v ≤ Z ≤ √v).

Section 6.2 Confidence intervals for means

- (Theorem 5-5.3) Let

  T = Z / √(U/r) ,

  where Z is a standard normal random variable (mean zero, variance 1) and U is an independent random variable following a chi-square distribution with r degrees of freedom. Then T has a t distribution with p.d.f.

  f(t) = Γ((r + 1)/2) / [√(πr) Γ(r/2)] · 1/(1 + t²/r)^{(r+1)/2} ,  −∞ < t < ∞ .

- (Proof) Consider F(t) = P(Z/√(U/r) ≤ t) = P(Z ≤ √(U/r) t).

Section 6.2 Confidence intervals for means

Select tα/2(n − 1) so that P[T ≥ tα/2(n − 1)] = α/2. Then

  1 − α = P[−tα/2(n − 1) ≤ (X̄ − µ)/(S/√n) ≤ tα/2(n − 1)]
        = P[X̄ − tα/2(n − 1)(S/√n) ≤ µ ≤ X̄ + tα/2(n − 1)(S/√n)] ,

so

  [x̄ − tα/2(n − 1)(s/√n), x̄ + tα/2(n − 1)(s/√n)]

is a 100(1 − α)% confidence interval for µ.
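The corresponding t interval when σ is unknown, as a sketch with the same hypothetical data:

```python
import numpy as np
from scipy.stats import t

x = np.array([4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7])  # hypothetical sample
alpha = 0.05
n = x.size

s = x.std(ddof=1)                          # sample standard deviation S
tq = t.ppf(1 - alpha / 2, df=n - 1)        # t_{alpha/2}(n-1)
half_width = tq * s / np.sqrt(n)
print(x.mean() - half_width, x.mean() + half_width)   # 95% t interval for mu
```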

Section 6.2 Confidence intervals for means

- If we are not able to assume that the underlying distribution is normal, approximate confidence intervals for µ can still be constructed with the formula

  T = (X̄ − µ)/(S/√n) ,

  which now has only an approximate t distribution. The approximation works well if the underlying distribution is symmetric, unimodal, and of the continuous type.
- One-sided intervals: assume the variance is known. Once X̄ is observed to be equal to x̄, it follows that [x̄ − zα(σ/√n), ∞) is a 100(1 − α)% one-sided confidence interval for µ.

Section 6.3 Confidence intervals for the difference of 2 means

- Two independent samples X1, X2, ..., Xn and Y1, Y2, ..., Ym with respective distributions N(µX, σX²) and N(µY, σY²).
- Both variances are assumed to be known.
- The distribution of the difference W = X̄ − Ȳ is N(µX − µY, σX²/n + σY²/m).
- With σW = √(σX²/n + σY²/m),

  [x̄ − ȳ − zα/2 σW , x̄ − ȳ + zα/2 σW]

  serves as a 100(1 − α)% confidence interval for the difference of the two means.
- If the sample sizes are large while both variances are unknown, we can replace the population variances with the sample variances, and

  x̄ − ȳ ± zα/2 √(sx²/n + sy²/m)

  serves as an approximate 100(1 − α)% confidence interval for the difference of the two means.

Section 6.3 Confidence intervals for the difference of 2 means

- Small samples (< 30) with a common variance σ².
- Z = [X̄ − Ȳ − (µX − µY)] / √(σ²/n + σ²/m) is N(0, 1).
- U = (n − 1)SX²/σ² + (m − 1)SY²/σ² is χ²(n + m − 2).
- T = Z / √(U/(n + m − 2)) has a t distribution with n + m − 2 degrees of freedom.
- Set t0 = tα/2(n + m − 2) and SP = √{[(n − 1)SX² + (m − 1)SY²]/(n + m − 2)}.
- x̄ − ȳ ± t0 sP √(1/n + 1/m) is a 100(1 − α)% confidence interval for the difference of the two means.
- Notice that when both n and m are large,

  T ≈ [X̄ − Ȳ − (µX − µY)] / √(SX²/m + SY²/n) ,

  i.e. each variance is divided by the wrong sample size.
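A sketch of the pooled (equal-variance) interval, with hypothetical samples:

```python
import numpy as np
from scipy.stats import t

x = np.array([24.1, 22.8, 25.3, 23.9, 24.6, 23.2])   # hypothetical sample 1
y = np.array([21.7, 22.5, 20.9, 23.0, 21.4])          # hypothetical sample 2
alpha = 0.05
n, m = x.size, y.size

sp = np.sqrt(((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2))
t0 = t.ppf(1 - alpha / 2, df=n + m - 2)
half_width = t0 * sp * np.sqrt(1 / n + 1 / m)
diff = x.mean() - y.mean()
print(diff - half_width, diff + half_width)   # pooled 95% CI for mu_X - mu_Y
```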

Section 6.3 Confidence intervals for the difference of 2 means

- Small samples with different variances.
- Use the approximate number of degrees of freedom r given by

  1/r = c²/(n − 1) + (1 − c)²/(m − 1) ,  where  c = (sx²/n)/(sx²/n + sy²/m) .

- x̄ − ȳ ± tα/2(r) √(sx²/n + sy²/m) is an approximate 100(1 − α)% confidence interval for the difference of the two means.
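A sketch of the unequal-variance interval just described, reusing the same hypothetical samples; note that SciPy's t.ppf accepts the non-integer degrees of freedom r:

```python
import numpy as np
from scipy.stats import t

x = np.array([24.1, 22.8, 25.3, 23.9, 24.6, 23.2])   # hypothetical sample 1
y = np.array([21.7, 22.5, 20.9, 23.0, 21.4])          # hypothetical sample 2
alpha = 0.05
n, m = x.size, y.size

vx, vy = x.var(ddof=1) / n, y.var(ddof=1) / m
c = vx / (vx + vy)
r = 1.0 / (c**2 / (n - 1) + (1 - c)**2 / (m - 1))     # approximate degrees of freedom
half_width = t.ppf(1 - alpha / 2, df=r) * np.sqrt(vx + vy)
diff = x.mean() - y.mean()
print(r, diff - half_width, diff + half_width)
```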

Section 6.3 Confidence intervals for the difference of 2 means

- X and Y are dependent, e.g. weight before and after participating in a diet-and-exercise program.
- Let (X1, Y1), (X2, Y2), ..., (Xn, Yn) be n pairs of dependent measurements.
- The differences Di ≡ Xi − Yi form a random sample from N(µD, σD²).
- Use T = (D̄ − µD)/(SD/√n) as the test statistic; it has a t distribution with n − 1 degrees of freedom.

Section 6.4 Confidence intervals for variances

- The distribution of (n − 1)S²/σ² is χ²(n − 1), so

  P[a ≤ (n − 1)S²/σ² ≤ b] = 1 − α .

- Selecting a and b so that a = χ²_{1−α/2}(n − 1) and b = χ²_{α/2}(n − 1),

  1 − α = P[ a/((n − 1)S²) ≤ 1/σ² ≤ b/((n − 1)S²) ] = P[ (n − 1)S²/b ≤ σ² ≤ (n − 1)S²/a ] .

- A 100(1 − α)% confidence interval for σ, the standard deviation, is given by

  [ √((n − 1)s²/b) , √((n − 1)s²/a) ] = [ √((n − 1)/b) s , √((n − 1)/a) s ] .
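A sketch of the chi-square interval for σ² and σ; here chi2.ppf(1 − α/2, n − 1) plays the role of b (the upper α/2 point) and chi2.ppf(α/2, n − 1) the role of a:

```python
import numpy as np
from scipy.stats import chi2

x = np.array([4.8, 5.1, 5.3, 4.9, 5.6, 5.0, 5.2, 4.7])  # hypothetical sample
alpha = 0.05
n = x.size
s2 = x.var(ddof=1)

b = chi2.ppf(1 - alpha / 2, df=n - 1)   # upper alpha/2 point of chi-square(n-1)
a = chi2.ppf(alpha / 2, df=n - 1)       # lower alpha/2 point
print((n - 1) * s2 / b, (n - 1) * s2 / a)                      # CI for sigma^2
print(np.sqrt((n - 1) * s2 / b), np.sqrt((n - 1) * s2 / a))    # CI for sigma
```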

Section 6.4 Confidence intervals for variances

- (Example 5-2.4) Let F = (U/r1)/(V/r2), where U and V are independent chi-square random variables with respective degrees of freedom r1 and r2. Then F has an F-distribution with those degrees of freedom, and its p.d.f. is

  f(w) = Γ[(r1 + r2)/2] (r1/r2)^{r1/2} w^{r1/2 − 1} / { Γ(r1/2) Γ(r2/2) [1 + r1 w/r2]^{(r1 + r2)/2} } ,  0 < w < ∞ .

- (Proof) First write down the joint density of U and V, and then use the definition

  F(w) = P[ (U/r1)/(V/r2) ≤ w ] = P[ U ≤ (r1/r2) V w ] .

Section 6.4 Confidence intervals for variances

- Two independent samples X1, X2, ..., Xn and Y1, Y2, ..., Ym with respective distributions N(µX, σX²) and N(µY, σY²).
- (n − 1)SX²/σX² and (m − 1)SY²/σY² are independent chi-square variables with n − 1 and m − 1 degrees of freedom.
- Hence

  F = [ (m − 1)SY²/σY² ]/(m − 1) ÷ [ (n − 1)SX²/σX² ]/(n − 1) = (SY²/σY²)/(SX²/σX²)

  has an F distribution with r1 = m − 1 and r2 = n − 1 degrees of freedom.

Section 6.4 Confidence intervals for variances

- Therefore

  1 − α = P[ c ≤ (SY²/σY²)/(SX²/σX²) ≤ d ] = P[ c SX²/SY² ≤ σX²/σY² ≤ d SX²/SY² ] .

- If sx² and sy² are the observed values of SX² and SY², respectively, then

  [ (1/Fα/2(n − 1, m − 1)) sx²/sy² , Fα/2(m − 1, n − 1) sx²/sy² ]

  is a 100(1 − α)% confidence interval for σX²/σY².
- These intervals are generally not very useful because they are often very wide, and the confidence coefficients are not very accurate if we deviate much from underlying normal distributions.
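A sketch of the F-based interval for σX²/σY², with hypothetical samples; f.ppf(1 − α/2, dfn, dfd) is the upper α/2 point Fα/2(dfn, dfd):

```python
import numpy as np
from scipy.stats import f

x = np.array([24.1, 22.8, 25.3, 23.9, 24.6, 23.2])   # hypothetical sample X
y = np.array([21.7, 22.5, 20.9, 23.0, 21.4])          # hypothetical sample Y
alpha = 0.05
n, m = x.size, y.size

ratio = x.var(ddof=1) / y.var(ddof=1)                 # s_x^2 / s_y^2
lower = ratio / f.ppf(1 - alpha / 2, dfn=n - 1, dfd=m - 1)
upper = ratio * f.ppf(1 - alpha / 2, dfn=m - 1, dfd=n - 1)
print(lower, upper)   # 95% CI for sigma_X^2 / sigma_Y^2
```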

Section 6.5 Confidence intervals for proportions

- Let Y equal the frequency of measurements in a given interval out of the n observations.
- Y has the binomial distribution b(n, p).
- We want to determine the accuracy of the relative frequency Y/n as an estimator of p.
- The ratio

  (Y − np)/√(np(1 − p)) = [ (Y/n) − p ] / √(p(1 − p)/n)

  has an approximate normal distribution N(0, 1), so

  P[ −zα/2 ≤ ((Y/n) − p)/√(p(1 − p)/n) ≤ zα/2 ] ≈ 1 − α .

Section 6.5 Confidence intervals for proportions

- (Method I) Replacing p with Y/n in p(1 − p)/n in the endpoints,

  [ y/n − zα/2 √((y/n)(1 − y/n)/n) , y/n + zα/2 √((y/n)(1 − y/n)/n) ]

  serves as an approximate 100(1 − α)% confidence interval for p.

Section 6.5 Confidence intervals for proportions

- (Method II) The inequality

  |Y/n − p| / √(p(1 − p)/n) ≤ zα/2

  is equivalent to

  H(p) = (Y/n − p)² − zα/2² p(1 − p)/n ≤ 0 .

- Letting p̂ = Y/n and z0 = zα/2,

  H(p) = (1 + z0²/n) p² − (2p̂ + z0²/n) p + p̂² ,

  and the roots of H(p) = 0,

  [ p̂ + z0²/(2n) ± z0 √( p̂(1 − p̂)/n + z0²/(4n²) ) ] / (1 + z0²/n) ,

  are the endpoints of the confidence interval.
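A sketch comparing Method I with Method II (the interval obtained from the roots of H(p)); the counts below are hypothetical:

```python
import numpy as np
from scipy.stats import norm

y, n = 37, 100          # hypothetical: 37 successes in 100 trials
alpha = 0.05
phat = y / n
z0 = norm.ppf(1 - alpha / 2)

# Method I: phat +/- z * sqrt(phat(1-phat)/n)
half = z0 * np.sqrt(phat * (1 - phat) / n)
print(phat - half, phat + half)

# Method II: roots of H(p) = 0
center = phat + z0**2 / (2 * n)
spread = z0 * np.sqrt(phat * (1 - phat) / n + z0**2 / (4 * n**2))
denom = 1 + z0**2 / n
print((center - spread) / denom, (center + spread) / denom)
```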

Section 6.5 Confidence intervals for proportions

- Statistical inference about the difference p1 − p2.
- Y1/n1 − Y2/n2 has mean p1 − p2 and variance

  p1(1 − p1)/n1 + p2(1 − p2)/n2 .

- Hence

  [ (Y1/n1) − (Y2/n2) − (p1 − p2) ] / √( p1(1 − p1)/n1 + p2(1 − p2)/n2 )

  has an approximate normal distribution N(0, 1).
- For large n1 and n2, replace p1 and p2 in the denominator of this ratio by Y1/n1 and Y2/n2, respectively.
- An approximate 100(1 − α)% confidence interval for the unknown difference p1 − p2 is

  y1/n1 − y2/n2 ± zα/2 √( (y1/n1)(1 − y1/n1)/n1 + (y2/n2)(1 − y2/n2)/n2 ) .

Section 6.6 Sample size

- How large should the sample size be to estimate a mean?
- In general, the smaller the variance, the smaller the sample size needed to achieve a given degree of accuracy.
- (Example 6-6.1) A mathematics department wishes to evaluate a new method of teaching calculus with a computer. The aim is to find the sample size n such that we are fairly confident that x̄ ± 1 contains the unknown test mean µ.
- From past experience, it is believed that the standard deviation associated with this type of test is about 15.
- X̄ is approximately N(µ, σ²/n), so x̄ ± 1.96(15/√n) will serve as an approximate 95% confidence interval for µ.
- Setting

  1.96 (15/√n) = 1

  gives √n = 29.4, and thus n ≈ 864.36, or n = 865 because n must be an integer.

Section 6.6 Sample size

- A less ambitious plan: x̄ ± 2 as a satisfactory 80% interval. Then

  1.282 (15/√n) = 2 ,

  or, equivalently, √n = 9.615, so that n ≈ 92.4. Since n must be an integer, we use n = 93 instead.
- In general, for the 100(1 − α)% confidence interval for µ, x̄ ± zα/2(σ/√n), to be no longer than that given by x̄ ± ε, set up the equation

  ε = zα/2 σ/√n ,  where Φ(zα/2) = 1 − α/2 .

  That is,

  n = zα/2² σ²/ε² ,

  where it is assumed that σ² is known.
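A sketch of the sample-size formula n = zα/2²σ²/ε², using the numbers from Example 6-6.1 (σ ≈ 15, ε = 1):

```python
import math
from scipy.stats import norm

sigma = 15.0          # assumed known standard deviation (from the example)
eps = 1.0             # desired half-width of the interval
alpha = 0.05

z = norm.ppf(1 - alpha / 2)
n = math.ceil((z * sigma / eps) ** 2)
print(n)              # 865, matching the calculation in the notes
```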

Section 6.6 Sample size

- To find the required sample size to estimate a proportion p.
- The point estimate of p is p̂ = y/n.
- An approximate 100(1 − α)% confidence interval for p is

  p̂ ± zα/2 √( p̂(1 − p̂)/n ) .

- Since p̂ is unknown before the experiment is run, we cannot use the value of p̂ in our determination of n. With a guessed value p*, take

  n = zα/2² p*(1 − p*)/ε² .

- No matter what value p* takes between 0 and 1, it is always true that p*(1 − p*) ≤ 1/4, so we can always pick

  n = zα/2²/(4ε²) .

Section 6.6 Sample size

- Finite population size.
- Let N1 individuals in a population of size N have characteristic C. Let p = N1/N.
- Take a sample of size n without replacement; the number of observations X with characteristic C has a hypergeometric distribution. Recall (see Examples 2-2.5 and 2-3.5 for details) that the mean and the variance of X are

  µ = n(N1/N) = np  and  σ² = n (N1/N)(1 − N1/N)((N − n)/(N − 1)) = np(1 − p)((N − n)/(N − 1)) .

Section 6.6 Sample size

- Hence

  E(X/n) = µ/n = p  and  Var(X/n) = σ²/n² = [ p(1 − p)/n ] ((N − n)/(N − 1)) .

- An approximate 1 − α confidence interval for p is

  p̂ ± zα/2 √( [ p̂(1 − p̂)/n ] ((N − n)/(N − 1)) ) ,  where p̂ = x/n .

- Let ε be the maximum error of the estimate of p:

  ε = zα/2 √( [ p(1 − p)/n ] ((N − n)/(N − 1)) ) .

  Solving for n gives

  n = m / (1 + (m − 1)/N) ,

  which is an increasing function of m, where we substitute

  m = zα/2² p*(1 − p*)/ε²  and pick p* = 1/2.
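A sketch of the proportion sample-size calculation with the finite-population correction; N, ε, and α below are hypothetical choices, and p* = 1/2 is the conservative value from the notes:

```python
import math
from scipy.stats import norm

N = 5000              # hypothetical population size
eps = 0.03            # desired maximum error of the estimate of p
alpha = 0.05
p_star = 0.5          # conservative choice p* = 1/2

z = norm.ppf(1 - alpha / 2)
m = z**2 * p_star * (1 - p_star) / eps**2      # sample size ignoring the finite population
n = math.ceil(m / (1 + (m - 1) / N))           # corrected for sampling without replacement
print(m, n)
```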

Section 6.7 Simple regression

- For any random variable Y, even though we cannot predict a future observed value Y = y with certainty, we can still estimate its mean.
- Suppose that E(Y) is a function of another observed variable x. E(Y) = µ(x) is assumed to be of a given form, such as linear, quadratic, or exponential: µ(x) could be assumed to be equal to α + βx, α + βx + γx², or αe^{βx}.
- First, consider the case in which E(Y) = µ(x) is a linear function.
- Given (x1, y1), (x2, y2), ..., (xn, yn), we would like to fit a straight line to the data:

  Yi = α1 + βxi + εi ,

  where the εi, i = 1, 2, ..., n, are independent and N(0, σ²).

Section 6.7 Simple regression

- Y1, Y2, ..., Yn are mutually independent normal variables with respective means α + β(xi − x̄), i = 1, 2, ..., n, and unknown variance σ², where α = α1 + βx̄.

Section 6.7 Simple regression

- The likelihood function is

  L(α, β, σ²) = ∏_{i=1}^n (1/√(2πσ²)) exp{ −[yi − α − β(xi − x̄)]² / (2σ²) }
              = (1/(2πσ²))^{n/2} exp{ −Σ_{i=1}^n [yi − α − β(xi − x̄)]² / (2σ²) } .

- To maximize L(α, β, σ²) or, equivalently, to minimize

  −ln L(α, β, σ²) = (n/2) ln(2πσ²) + Σ_{i=1}^n [yi − α − β(xi − x̄)]² / (2σ²) ,

  we must select α and β to minimize

  H(α, β) = Σ_{i=1}^n [yi − α − β(xi − x̄)]² .

Section 6.7 Simple regression

- Selecting α and β so that the sum of squares is minimized means that we are fitting the straight line to the data by the "method of least squares".

  ∂H(α, β)/∂α = 2 Σ_{i=1}^n [yi − α − β(xi − x̄)](−1) ,

  ∂H(α, β)/∂β = 2 Σ_{i=1}^n [yi − α − β(xi − x̄)][−(xi − x̄)] .

- Setting ∂H(α, β)/∂α = 0 gives

  Σ_{i=1}^n yi − nα − β Σ_{i=1}^n (xi − x̄) = 0 .

  Since Σ_{i=1}^n (xi − x̄) = 0, this reduces to Σ_{i=1}^n yi − nα = 0, so

  α̂ = Ȳ .

Section 6.7 Simple regression

- The equation ∂H(α, β)/∂β = 0 yields

  Σ_{i=1}^n (yi − ȳ)(xi − x̄) − β Σ_{i=1}^n (xi − x̄)² = 0 ,

  so

  β̂ = [ Σ_{i=1}^n (Yi − Ȳ)(xi − x̄) ] / [ Σ_{i=1}^n (xi − x̄)² ] = [ Σ_{i=1}^n Yi (xi − x̄) ] / [ Σ_{i=1}^n (xi − x̄)² ] .

- Differentiating with respect to σ²,

  ∂[−ln L(α, β, σ²)]/∂(σ²) = n/(2σ²) − Σ_{i=1}^n [yi − α − β(xi − x̄)]² / (2(σ²)²) ,

  and setting this equal to zero with α = α̂ and β = β̂ gives

  σ̂² = (1/n) Σ_{i=1}^n [Yi − α̂ − β̂(xi − x̄)]² .

- The maximum likelihood estimate of σ² is thus the sum of the squares of the residuals divided by n.
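A sketch of the least-squares/maximum-likelihood fit in the (α, β) parametrization used here, with hypothetical data:

```python
import numpy as np

# hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])
n = x.size

xbar = x.mean()
alpha_hat = y.mean()                                    # alpha-hat = Y-bar
beta_hat = np.sum(y * (x - xbar)) / np.sum((x - xbar)**2)
residuals = y - alpha_hat - beta_hat * (x - xbar)
sigma2_hat = np.sum(residuals**2) / n                   # MLE of sigma^2 (divides by n)
print(alpha_hat, beta_hat, sigma2_hat)
```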

Section 6.7 Simple regression

- Both α̂ and β̂ are linear functions of Y1, Y2, ..., Yn, and hence have normal distributions with respective means and variances

  E(α̂) = E[ (1/n) Σ_{i=1}^n Yi ] = (1/n) Σ_{i=1}^n E(Yi) = α ,

  Var(α̂) = (1/n)² Σ_{i=1}^n Var(Yi) = σ²/n .

- Similarly,

  E(β̂) = [ Σ_{i=1}^n (xi − x̄) E(Yi) ] / [ Σ_{i=1}^n (xi − x̄)² ]
        = [ Σ_{i=1}^n (xi − x̄)(α + β(xi − x̄)) ] / [ Σ_{i=1}^n (xi − x̄)² ]
        = [ α Σ_{i=1}^n (xi − x̄) + β Σ_{i=1}^n (xi − x̄)² ] / [ Σ_{i=1}^n (xi − x̄)² ] = β .

Section 6.7 Simple regression

- Also,

  Var(β̂) = Σ_{i=1}^n [ (xi − x̄)/Σ_{j=1}^n (xj − x̄)² ]² Var(Yi)
          = σ² Σ_{i=1}^n (xi − x̄)² / [ Σ_{i=1}^n (xi − x̄)² ]²
          = σ² / Σ_{i=1}^n (xi − x̄)² .

- The sum of squares decomposes as

  Σ_{i=1}^n [Yi − α − β(xi − x̄)]² = Σ_{i=1}^n { (α̂ − α) + (β̂ − β)(xi − x̄) + [Yi − α̂ − β̂(xi − x̄)] }²
  = n(α̂ − α)² + (β̂ − β)² Σ_{i=1}^n (xi − x̄)² + Σ_{i=1}^n [Yi − α̂ − β̂(xi − x̄)]² .

Section 6.7 Simple regression

- Yi, α̂, and β̂ have normal distributions, and hence each of

  [Yi − α − β(xi − x̄)]²/σ² ,  (α̂ − α)²/(σ²/n) ,  and  (β̂ − β)²/(σ²/Σ_{i=1}^n (xi − x̄)²)

  has a chi-square distribution with one degree of freedom. Therefore,

  Σ_{i=1}^n [Yi − α − β(xi − x̄)]²/σ²

  is χ²(n).
- It can be shown that α̂, β̂, and σ̂² are mutually independent (see Hogg, McKean and Craig [2005]). Therefore,

  Σ_{i=1}^n [Yi − α̂ − β̂(xi − x̄)]²/σ² = nσ̂²/σ²

  is χ²(n − 2).

Section 6.7 Simple regression

- Therefore, we deduce that

  T1 = [ (β̂ − β) √(Σ_{i=1}^n (xi − x̄)²) / σ ] / √( nσ̂² / ((n − 2)σ²) ) = (β̂ − β) / √( nσ̂² / [ (n − 2) Σ_{i=1}^n (xi − x̄)² ] )

  has a t distribution with n − 2 degrees of freedom.
- Let

  η = √( nσ̂² / [ (n − 2) Σ_{i=1}^n (xi − x̄)² ] ) .

  Then

  [ β̂ − tγ/2(n − 2) η , β̂ + tγ/2(n − 2) η ]

  is a 100(1 − γ)% confidence interval for β, where P[T1 ≥ tγ/2(n − 2)] = γ/2.
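A sketch of the confidence interval for β built from η and tγ/2(n − 2), continuing the hypothetical data above:

```python
import numpy as np
from scipy.stats import t

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data, as above
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])
n, gamma = x.size, 0.05

xbar = x.mean()
alpha_hat = y.mean()
sxx = np.sum((x - xbar)**2)
beta_hat = np.sum(y * (x - xbar)) / sxx
sigma2_hat = np.sum((y - alpha_hat - beta_hat * (x - xbar))**2) / n

eta = np.sqrt(n * sigma2_hat / ((n - 2) * sxx))
tq = t.ppf(1 - gamma / 2, df=n - 2)
print(beta_hat - tq * eta, beta_hat + tq * eta)   # 95% CI for beta
```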

Section 6.7 Simple regression

- Similarly,

  T2 = [ √n (α̂ − α)/σ ] / √( nσ̂² / ((n − 2)σ²) ) = (α̂ − α) / √( σ̂²/(n − 2) )

  has a t distribution with n − 2 degrees of freedom.
- The fact that nσ̂²/σ² has a chi-square distribution with n − 2 degrees of freedom can be used to make inferences about the variance σ².

Section 6.8 More regression

- For a given x, Ŷ = α̂ + β̂(x − x̄) is a point estimate of the mean of Y. Since α̂ and β̂ are normally and independently distributed, Ŷ has a normal distribution with

  E(Ŷ) = E[α̂ + β̂(x − x̄)] = α + β(x − x̄) ,

  Var(Ŷ) = Var[α̂ + β̂(x − x̄)] = σ²/n + σ²(x − x̄)²/Σ_{i=1}^n (xi − x̄)² = σ² [ 1/n + (x − x̄)²/Σ_{i=1}^n (xi − x̄)² ] .

Section 6.8 More regression

- Since α̂ and β̂ are independent of σ̂²,

  T = { α̂ + β̂(x − x̄) − [α + β(x − x̄)] } / { σ √( 1/n + (x − x̄)²/Σ_{i=1}^n (xi − x̄)² ) } ÷ √( nσ̂² / ((n − 2)σ²) )

  has a t distribution with r = n − 2 degrees of freedom.
- Let

  c = √( nσ̂²/(n − 2) ) √( 1/n + (x − x̄)²/Σ_{i=1}^n (xi − x̄)² ) .

  The endpoints of a 100(1 − γ)% confidence interval for µ(x) = α + β(x − x̄) are

  α̂ + β̂(x − x̄) ± c tγ/2(n − 2) .

Section 6.8 More regression

- A prediction interval for Y_{n+1} when x = x_{n+1}:

  W = Y_{n+1} − α̂ − β̂(x_{n+1} − x̄)

  is a linear combination of normally and independently distributed random variables, so W has a normal distribution with

  E(W) = E[Y_{n+1} − α̂ − β̂(x_{n+1} − x̄)] = α + β(x_{n+1} − x̄) − α − β(x_{n+1} − x̄) = 0 .

- Since Y_{n+1}, α̂, and β̂ are independent, the variance of W is

  Var(W) = σ² + σ²/n + σ²(x_{n+1} − x̄)²/Σ_{i=1}^n (xi − x̄)² = σ² [ 1 + 1/n + (x_{n+1} − x̄)²/Σ_{i=1}^n (xi − x̄)² ] .

Section 6.8 More regression

- Since Y_{n+1}, α̂, and β̂ are independent of σ̂²,

  T = [ Y_{n+1} − α̂ − β̂(x_{n+1} − x̄) ] / { σ √( 1 + 1/n + (x_{n+1} − x̄)²/Σ_{i=1}^n (xi − x̄)² ) } ÷ √( nσ̂² / ((n − 2)σ²) )

  has a t distribution with r = n − 2 degrees of freedom.
- Let

  d = √( nσ̂²/(n − 2) ) √( 1 + 1/n + (x_{n+1} − x̄)²/Σ_{i=1}^n (xi − x̄)² ) .

  The endpoints of a 100(1 − γ)% prediction interval for Y_{n+1} are

  α̂ + β̂(x_{n+1} − x̄) ± d tγ/2(n − 2) .
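A sketch of the confidence interval for µ(x) and the prediction interval for Y_{n+1} at a hypothetical new value x_{n+1}, continuing the same illustrative data:

```python
import numpy as np
from scipy.stats import t

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # hypothetical data, as above
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8])
n, gamma = x.size, 0.05
x_new = 4.5                                     # hypothetical new x value

xbar = x.mean()
sxx = np.sum((x - xbar)**2)
alpha_hat, beta_hat = y.mean(), np.sum(y * (x - xbar)) / sxx
sigma2_hat = np.sum((y - alpha_hat - beta_hat * (x - xbar))**2) / n

fit = alpha_hat + beta_hat * (x_new - xbar)
tq = t.ppf(1 - gamma / 2, df=n - 2)
scale = np.sqrt(n * sigma2_hat / (n - 2))

c = scale * np.sqrt(1 / n + (x_new - xbar)**2 / sxx)      # for the mean mu(x_new)
d = scale * np.sqrt(1 + 1 / n + (x_new - xbar)**2 / sxx)  # for a new observation Y_{n+1}
print(fit - tq * c, fit + tq * c)     # confidence interval for mu(x_new)
print(fit - tq * d, fit + tq * d)     # prediction interval for Y_{n+1}
```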

The end of Chapter 6