Homework 1 Solutions Heather Royer

1. Wages for workers in a particular industry average $11.90 per hour and the standard deviation is $0.40. If the wages are assumed to be normally distributed:

First of all, let’s review some of the facts relating to probabilities. If you don’t understand them, it may be helpful to draw a picture. Suppose X is a continuous random variable. Then, the following are true (note that some of these rules are special cases of others):

1. Pr(X = a) = 0 since X is a continuous random variable.

2. Pr(X ≥ a) + Pr(X ≤ a) = 1 for some constant a. This is true because X has to be less than a or greater than or equal to a and because of rule 1.

3. Pr(X ≤ −a) + Pr(−a ≤ X ≤ 0) + Pr(0 ≤ X ≤ a) + Pr(a ≤ X) = 1 for some positive constant a. This is true because X has to be either less than −a, between −a and 0, between 0 and a, or greater than a.

4. Pr(b ≤ X ≤ a) = Pr(X ≤ a) − Pr(X ≤ b) for some constants a and b where a > b.

5. Pr(X ≤ a) = Pr(0 ≤ X ≤ a) + Pr(X ≤ 0) for some positive constant a (special case of 4 where b = 0).

6. Pr(X ≤ −a) = Pr(X ≤ 0) − Pr(−a ≤ X ≤ 0) for some positive constant a (special case of 4 where a = 0 and b = −a).

Second of all, let’s review some of the facts relating to the standard normal distribution. Suppose Z is a standard normal random variable. Then, the following are true:

7. Pr(Z ≤ 0) = 0.5 and Pr(Z ≥ 0) = 0.5 because the standard normal distribution is symmetric around its zero mean.

8. Pr(Z ≥ a) = Pr(Z ≤ −a) for some constant a. This is true because the standard normal distribution is symmetric around 0.

9. 2 Pr(0 ≤ Z ≤ a) + 2 Pr(a ≤ Z) = 1 ⇒ Pr(0 ≤ Z ≤ a) + Pr(a ≤ Z) = 0.5 for some positive a because of rule 8 and rule 3.

10. 2 Pr(−a ≤ Z ≤ 0) + 2 Pr(Z ≤ −a) = 1 ⇒ Pr(−a ≤ Z ≤ 0) + Pr(Z ≤ −a) = 0.5 for some positive a because of rule 8 and rule 3.

Third of all, let’s review what the Z table handed out in class gives us and what the normdist function in EXCEL gives us. You should get used to using the Z table handed out in class because that is the table that you’ll have available during an exam.

The Z table handed out in class gives us Pr(0 < Z < a).

The normdist function in EXCEL and the Z table in the back of the textbook give us Pr(Z ≤ a).
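The link between the two kinds of table can be checked numerically. The sketch below is not part of the original solutions: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for the cumulative probabilities that normdist returns, and the helper names `class_table_value` and `cumulative_from_table` are made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def class_table_value(a: float) -> float:
    """Pr(0 < Z < a) for a >= 0, the quantity the class Z table reports."""
    # Rules 5 and 7: Pr(Z <= a) = 0.5 + Pr(0 <= Z <= a).
    return STD_NORMAL.cdf(a) - 0.5

def cumulative_from_table(table_value: float) -> float:
    """Recover Pr(Z <= a) from a class-table entry Pr(0 < Z < a)."""
    return 0.5 + table_value

print(round(class_table_value(1.25), 4))
print(round(cumulative_from_table(class_table_value(1.25)), 4))
```

Going from one table to the other is just a matter of adding or subtracting 0.5, which is what rules 5 and 7 say.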

a) what percentage of workers receive wages between $10.80 and $12.40?

Let Y be a random variable for wages. Then, we want to know Pr($10.80 ≤ Y ≤ $12.40). Using rule 4, we can rewrite this expression as follows:

\[ \Pr(10.80 \le Y \le 12.40) = \Pr(Y \le 12.40) - \Pr(Y \le 10.80) \]

In order to compute these probabilities, though, we need to transform Y (a normal random variable) into a standard normal random variable because we don’t have tables for the cumulative distribution function of every normal random variable. If we did, there would be a lot of tables because the normal distribution is defined by its mean and variance, and there are an infinite number of values those parameters can take on. To convert Y to a standard normal random variable Z, we subtract off the population mean and divide by the population standard deviation. That is,

\[ Z = \frac{Y - \mu}{\sigma} \]

Note that

\[ \Pr(Y \le b) = \Pr\left(Z \le \frac{b - \mu}{\sigma}\right) \]

In this case µ = $11.90 and σ = $0.40, so

\begin{align*}
\Pr(10.80 \le Y \le 12.40) &= \Pr\left(Z \le \frac{12.40 - 11.90}{0.40}\right) - \Pr\left(Z \le \frac{10.80 - 11.90}{0.40}\right) \\
&= \Pr(Z \le 1.25) - \Pr(Z \le -2.75)
\end{align*}

To use the Z table handed out in class, we have to rewrite Pr(Z ≤ 1.25) − Pr(Z ≤ −2.75) in terms of probabilities of the form Pr(0 < Z < a). Let’s first look at Pr(Z ≤ 1.25).

\begin{align*}
\Pr(Z \le 1.25) &= \Pr(Z \le 0) + \Pr(0 \le Z \le 1.25) && \text{because of rule 5} \\
&= 0.5 + \Pr(0 \le Z \le 1.25) && \text{because of rule 7} \\
&= 0.5 + 0.3944 && \text{reading off the Z table} \\
&= 0.8944
\end{align*}

Let’s now look at Pr(Z ≤ −2.75).

\begin{align*}
\Pr(Z \le -2.75) &= \Pr(Z \le 0) - \Pr(-2.75 \le Z \le 0) && \text{because of rule 6} \\
&= 0.5 - \Pr(-2.75 \le Z \le 0) && \text{because of rule 7} \\
&= 0.5 - \Pr(2.75 \ge Z \ge 0) && \text{because of rule 8} \\
&= 0.5 - 0.4970 && \text{reading off the Z table} \\
&= 0.0030
\end{align*}

Now we can compute Pr($10.80 ≤ Y ≤ $12.40).

\begin{align*}
\Pr(10.80 \le Y \le 12.40) &= \Pr(Z \le 1.25) - \Pr(Z \le -2.75) \\
&= 0.8944 - 0.0030 = 0.8914
\end{align*}

The probability that a worker receives a wage between $10.80 and $12.40 is 0.89, so given many workers, the percentage of workers earning between $10.80 and $12.40 is 89 percent.

b) what percentage of workers receive less than $11.00?

In other words, we want to know Pr(Y < $11.00). As in part a, we want to convert Y into a standard normal random variable.

\begin{align*}
\Pr(Y < 11.00) &= \Pr\left(Z < \frac{11.00 - 11.90}{0.40}\right) \\
&= \Pr(Z < -2.25) && \text{simplification} \\
&= \Pr(Z \le -2.25) && \text{because of rule 1} \\
&= \Pr(Z \le 0) - \Pr(-2.25 \le Z \le 0) && \text{because of rule 6} \\
&= 0.5 - \Pr(-2.25 \le Z \le 0) && \text{because of rule 7} \\
&= 0.5 - \Pr(0 \le Z \le 2.25) && \text{because of rule 8} \\
&= 0.5 - 0.4878 && \text{reading off the Z table} \\
&= 0.0122
\end{align*}

So, the probability that a worker receives a wage below $11.00 is 0.0122, so given many workers, the percentage of workers earning less than $11.00 is approximately 1.2 percent.

c) what percentage of workers receive more than $12.95?

For this part of the question, we want to know Pr(Y > $12.95). Now let’s transform Y into a standard normal random variable Z. Once we do that, we know that

\begin{align*}
\Pr(Y > 12.95) &= \Pr\left(Z > \frac{12.95 - 11.90}{0.40}\right) \\
&= \Pr(Z > 2.625) \\
&= 1 - \Pr(Z \le 2.625) && \text{because of rule 2} \\
&= 1 - [\Pr(Z \le 0) + \Pr(0 \le Z \le 2.625)] && \text{because of rule 5} \\
&= 1 - 0.5 - \Pr(0 \le Z \le 2.625) && \text{because of rule 7} \\
&= 0.5 - 0.4956 = 0.0044 && \text{reading off the Z table}
\end{align*}

So, the probability that a worker receives a wage above $12.95 is 0.0044, so given many workers, the percentage of workers earning more than $12.95 is approximately 0.44 percent.
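As a cross-check on parts a through c, the same three probabilities can be computed directly from the cumulative distribution function. This is only a sketch: it assumes Python's standard-library `statistics.NormalDist` in place of the Z table, so the results can differ from the table-based answers in the last digit.

```python
from statistics import NormalDist

# Y ~ Normal(mean 11.90, standard deviation 0.40), as given in the problem.
wages = NormalDist(mu=11.90, sigma=0.40)

p_between = wages.cdf(12.40) - wages.cdf(10.80)  # part a, rule 4
p_below = wages.cdf(11.00)                       # part b
p_above = 1.0 - wages.cdf(12.95)                 # part c, rule 2

print(f"a) {p_between:.4f}   b) {p_below:.4f}   c) {p_above:.4f}")
```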

d) 10 percent of all workers earn more than what wage?

For this part, we want to know the wage a such that Pr(a ≤ Y) = 0.10. To answer this question using the Z table, we transform Y, which is a normal random variable, into a standard normal variable as we did in parts a through c. That is,

\[ \Pr(a \le Y) = \Pr\left(\frac{a - 11.90}{0.40} \le Z\right) = 0.10 \tag{1} \]

Note that a is greater than the population mean wage because only 10 percent of workers earn more than a, so (a − $11.90)/$0.40 is positive. Thus, we can rewrite equation (1) as follows:

\begin{align*}
\Pr(a \le Y) = \Pr\left(\frac{a - 11.90}{0.40} \le Z\right) &= 0.5 - \Pr\left(0 \le Z \le \frac{a - 11.90}{0.40}\right) = 0.1 && \text{using rule 9} \\
\Rightarrow \Pr\left(0 \le Z \le \frac{a - 11.90}{0.40}\right) &= 0.5 - 0.1 = 0.4
\end{align*}

Now use the Z table to find the point c where Pr(0 ≤ Z ≤ c) = 0.4 where c = (a − $11.90)/$0.40. We find that c = 1.28. Note that the Z table doesn’t have an entry exactly equal to 0.4, so I used the entry closest to 0.4. Therefore, we can solve for a as follows:

\[ 1.28 = \frac{a - 11.90}{0.40} \Rightarrow a = 12.41 \]

That means that 10 percent of all workers earn more than $12.41.
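Part d runs the table lookup in reverse: we start from a probability and look for the wage. A minimal sketch of that reverse lookup, assuming Python's standard-library `NormalDist.inv_cdf` in place of searching the Z table for the entry closest to 0.4 (the variable names are illustrative):

```python
from statistics import NormalDist

wages = NormalDist(mu=11.90, sigma=0.40)

# Pr(a <= Y) = 0.10 is the same statement as Pr(Y <= a) = 0.90 (rules 1 and 2).
top_decile_cutoff = wages.inv_cdf(0.90)

# The corresponding standard normal point c = (a - mu) / sigma.
z_value = (top_decile_cutoff - wages.mean) / wages.stdev

print(f"c = {z_value:.2f}, a = {top_decile_cutoff:.2f}")
```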

e) 25 percent of all workers earn less than what wage?

For this part, we want to know the wage a such that Pr(Y ≤ a) = 0.25. To answer this question using the Z table, we transform Y into a standard normal variable as we did in parts a through d. That is,

\[ \Pr(Y \le a) = \Pr\left(Z \le \frac{a - 11.90}{0.40}\right) = 0.25 \]

Note that a is less than the population mean wage because only 25 percent of workers earn less than a, so (a − $11.90)/$0.40 is negative. Therefore, we can rewrite Pr(Y ≤ a) as follows:

\begin{align*}
\Pr(Y \le a) = \Pr\left(Z \le \frac{a - 11.90}{0.40}\right) &= \Pr(Z \le 0) - \Pr\left(\frac{a - 11.90}{0.40} \le Z \le 0\right) = 0.25 && \text{using rule 6} \\
&= 0.5 - \Pr\left(\frac{a - 11.90}{0.40} \le Z \le 0\right) = 0.25 && \text{using rule 7} \\
&= 0.5 - \Pr\left(0 \le Z \le -\frac{a - 11.90}{0.40}\right) = 0.25 && \text{using rule 8} \\
\Rightarrow \Pr\left(0 \le Z \le -\frac{a - 11.90}{0.40}\right) &= 0.25
\end{align*}

Now use the Z table to find the point c where Pr(0 ≤ Z ≤ c) = 0.25 where c = −(a − $11.90)/$0.40. We find that c = 0.67. Therefore, we can solve for a as follows:

\[ 0.67 = -\frac{a - 11.90}{0.40} \Rightarrow a = 11.63 \]

That means that 25 percent of all workers earn less than $11.63.

f) determine the interquartile wage range (25-75 percentile) for workers in this industry?

For this part, we want to know the wages a and b such that Pr(Y ≤ a) = 0.25 and Pr(Y ≤ b) = 0.75. From part e, we know that a is $11.63. So, all we have to compute is the b for which Pr(Y ≤ b) = 0.75. Note that

\[ \Pr(Y \le b) = 0.75 \Rightarrow \Pr(Y \ge b) = 0.25 \qquad \text{because of rule 2} \]

The normal pdf of Y is symmetric around its population mean. Therefore, the points a and b are equidistant from the population mean because Pr(Y ≤ a) = 0.25 and Pr(Y ≥ b) = 0.25. It may be helpful to draw a picture if you don’t understand this. Thus, µ − a = b − µ. So,

\[ 11.90 - a = b - 11.90 \Rightarrow b = 11.90 + 11.90 - a = 11.90 \cdot 2 - 11.63 = 12.17 \]

Thus, the interquartile wage range (25% to 75%), which is the wage range for the middle 50% of workers, is [$11.63, $12.17].
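The symmetry argument used for the upper quartile can be checked against a direct computation. The sketch below again assumes Python's standard-library `statistics.NormalDist` rather than the Z table; the variable names are illustrative.

```python
from statistics import NormalDist

wages = NormalDist(mu=11.90, sigma=0.40)

q1 = wages.inv_cdf(0.25)  # part e: Pr(Y <= a) = 0.25
q3 = wages.inv_cdf(0.75)  # part f: Pr(Y <= b) = 0.75

# Symmetry around the mean: mu - a = b - mu, so b = 2 * mu - a.
q3_by_symmetry = 2 * wages.mean - q1

print(f"interquartile range: [{q1:.2f}, {q3:.2f}]")
```

Both routes to b agree, which is what the picture of a symmetric pdf suggests.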

2. An orange juice producer buys all his oranges from a large orange orchard. The amount squeezed from each of the oranges is approximately normally distributed with a mean of 4.70 ounces and a standard deviation of 0.5 ounce.

a) what is the probability that an orange, selected at random, will contain between 4.70 and 5.00 ounces?

This question should seem similar to the questions in problem 1. Let X be a normal random variable denoting the amount of juice squeezed from an orange. We want to know Pr(4.70 ounces ≤ X ≤ 5.00 ounces). To compute this probability, we need to transform X (a normal random variable) into a standard normal variable as we did in problem 1. We know that once we transform X into a standard normal random variable Z, the following are true:

\begin{align*}
\Pr(4.70 \le X \le 5.00) &= \Pr\left(\frac{4.70 - 4.70}{0.5} \le Z \le \frac{5.00 - 4.70}{0.5}\right) \\
&= \Pr\left(Z \le \frac{5.00 - 4.70}{0.5}\right) - \Pr\left(Z \le \frac{4.70 - 4.70}{0.5}\right) && \text{using rule 4} \\
&= \Pr(Z \le 0.6) - \Pr(Z \le 0) && \text{simplification} \\
&= \Pr(Z \le 0.6) - 0.5 && \text{using rule 7} \\
&= \Pr(0 \le Z \le 0.6) + \Pr(Z \le 0) - 0.5 && \text{using rule 5} \\
&= \Pr(0 \le Z \le 0.6) + 0.5 - 0.5 && \text{using rule 7} \\
&= 0.2257 + 0.5 - 0.5 = 0.2257 && \text{using the Z table}
\end{align*}

Therefore, the probability that an orange, selected at random, will contain between 4.70 and 5.00 ounces is 0.23.

b) 77 percent of the oranges will contain at least how many ounces of juice?

For this part, we want to know the point a such that Pr(a ≤ X) = 0.77. To compute this probability, we need to transform X (a normal random variable) into a standard normal variable as we did in part a. We know that once we transform X into a standard normal random variable Z, the following is true:

\[ \Pr(a \le X) = \Pr\left(\frac{a - 4.70}{0.5} \le Z\right) = 0.77 \]

Note that a is less than the population mean because 77 percent of the oranges contain at least a ounces of juice. Therefore, (a − 4.70)/0.5 will be negative. So,

\begin{align*}
\Pr(a \le X) = \Pr\left(\frac{a - 4.70}{0.5} \le Z\right) &= 1 - \Pr\left(\frac{a - 4.70}{0.5} \ge Z\right) = 0.77 && \text{using rule 2} \\
\Rightarrow \Pr\left(\frac{a - 4.70}{0.5} \ge Z\right) &= 1 - 0.77 = 0.23 \\
\Pr\left(\frac{a - 4.70}{0.5} \ge Z\right) &= 0.5 - \Pr\left(0 \ge Z \ge \frac{a - 4.70}{0.5}\right) = 0.23 && \text{using rule 10} \\
\Rightarrow \Pr\left(0 \ge Z \ge \frac{a - 4.70}{0.5}\right) &= 0.27 \\
\Pr\left(0 \ge Z \ge \frac{a - 4.70}{0.5}\right) &= \Pr\left(0 \le Z \le -\frac{a - 4.70}{0.5}\right) = 0.27 && \text{using rule 8}
\end{align*}

Now we read off the Z table to find the point c such that Pr(0 ≤ Z ≤ c) = 0.27 where c = −(a − 4.70)/0.5. The value of c is 0.74. Therefore,

\[ 0.74 = -\frac{a - 4.70}{0.5} \Rightarrow a = 4.33 \]

Thus, 77 percent of the oranges will contain at least 4.33 ounces of juice.

c) between what two values (in ounces) symmetrically distributed around the population mean will 80 percent of the oranges fall?

First note that since these two values are symmetrically distributed around the population mean and 80 percent of the oranges fall between these two values, the lower value must be the point a where Pr(X ≤ a) = 0.10 and the upper value must be the point b where Pr(b ≤ X) = 0.10. Also, note that since a and b are equidistant from the population mean because of the symmetry of the normal probability distribution function, it must be true that

\[ 4.70 - a = b - 4.70 \]

We can solve this equation for b,

\[ b = 9.4 - a \tag{2} \]

Therefore, if we determine a, we can compute b using equation (2). So, let’s determine a. To calculate a, we need to first transform X, a normal random variable, into a standard normal random variable Z.

\[ \Pr(X \le a) = \Pr\left(Z \le \frac{a - 4.70}{0.5}\right) = 0.10 \]

Note that a will be less than the population mean because Pr(X ≤ a) = 0.10, so (a − 4.70)/0.5 is negative. Therefore,

\begin{align*}
\Pr(X \le a) = \Pr\left(Z \le \frac{a - 4.70}{0.5}\right) &= \Pr(Z \le 0) - \Pr\left(\frac{a - 4.70}{0.5} \le Z \le 0\right) = 0.10 && \text{using rule 6} \\
&= 0.5 - \Pr\left(\frac{a - 4.70}{0.5} \le Z \le 0\right) = 0.10 && \text{using rule 7} \\
\Rightarrow \Pr\left(\frac{a - 4.70}{0.5} \le Z \le 0\right) &= 0.4 \\
\Pr\left(\frac{a - 4.70}{0.5} \le Z \le 0\right) &= \Pr\left(-\frac{a - 4.70}{0.5} \ge Z \ge 0\right) = 0.4 && \text{using rule 8}
\end{align*}

Now we read off the Z table to find the point c such that Pr(0 ≤ Z ≤ c) = 0.40 where c = −(a − 4.70)/0.5. The value of c is 1.28. Therefore,

\[ 1.28 = -\frac{a - 4.70}{0.5} \Rightarrow a = 4.06 \]

Therefore, the lower value is 4.06 ounces. We can use equation (2) to determine b by plugging in the value for a. Doing this, we find that

\[ b = 9.4 - 4.06 = 5.34 \]

Thus, the upper value is 5.34 ounces. Hence, 80 percent of the oranges will fall symmetrically around the population mean between 4.06 ounces and 5.34 ounces.

Suppose that a sample of 25 oranges is selected:

d) what is the probability that the sample mean will be at least 4.60 ounces?

First note that for parts d through f, we know the population standard deviation, so we don’t have to use the t-distribution. We use the t-distribution when we don’t know the population variance and have to use the sample variance to estimate it. This question asks for

\[ \Pr(4.60 \text{ ounces} \le \bar{X}) \]

This problem is a bit different from parts a through c because we want to know the probability that the sample mean, rather than the juice content of a single orange, is at least 4.60 ounces. In those parts, we assumed that the amount of juice squeezed from a single orange is normally distributed, so we used the Z statistic directly. From the Central Limit Theorem, we know that the distribution of the sample mean is approximately normal with mean \(\mu_x\) and variance \(\sigma_x^2/n\), where \(\mu_x\) and \(\sigma_x^2\) are the population mean and population variance of \(X_i\), respectively. For large enough n, the distribution of the sample mean will be approximately normal no matter the distribution of the \(X_i\). In parts a through c, we had to assume the \(X_i\) were distributed normally to use the Z statistic. We can do a similar transformation as in part b to transform \(\bar{X}\), a normal random variable, into a standard normal random variable. To do so, we subtract the population mean from \(\bar{X}\) and divide by the square root of the variance of \(\bar{X}\) to compute the Z statistic. That is,

\[ \Pr(a \le \bar{X}) = \Pr\left(\frac{a - \mu_x}{\sqrt{\sigma_x^2 / n}} \le Z\right) = \Pr\left(\frac{\sqrt{n}\,(a - \mu_x)}{\sigma_x} \le Z\right) \]

where Z is a standard normal random variable. In this case, n = 25, \(\mu_x = 4.70\), and \(\sigma_x = 0.5\). So,

\begin{align*}
\Pr(4.60 \le \bar{X}) &= \Pr\left(\frac{\sqrt{25}\,(4.60 - 4.70)}{0.5} \le Z\right) \\
&= \Pr(-1.0 \le Z) && \text{simplification} \\
&= \Pr(1.0 \ge Z) && \text{using rule 8} \\
&= \Pr(0 \le Z \le 1.0) + \Pr(Z \le 0) && \text{using rule 5} \\
&= \Pr(0 \le Z \le 1.0) + 0.5 && \text{using rule 7} \\
&= 0.3413 + 0.5 = 0.8413 && \text{using the Z table}
\end{align*}

Therefore, the probability that the sample mean will be at least 4.60 ounces is 0.84.
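The only change from the single-orange calculations is the standard deviation: the sample mean has standard deviation σ/√n instead of σ. A short sketch of part d under that assumption, using Python's standard-library `statistics.NormalDist` (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

mu_x, sigma_x, n = 4.70, 0.5, 25

# Sampling distribution of the mean: Normal(mu_x, sigma_x / sqrt(n)).
sample_mean = NormalDist(mu=mu_x, sigma=sigma_x / sqrt(n))

# Pr(4.60 <= X-bar) = 1 - Pr(X-bar <= 4.60), by rule 2.
p_at_least = 1.0 - sample_mean.cdf(4.60)

print(f"Pr(sample mean >= 4.60) = {p_at_least:.4f}")
```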

e) between what two values (in ounces) symmetrically distributed around the population mean will 80 percent of the sample means fall?

Using the same logic as part c of this problem, we want to find the points a and b such that the following are true:

\[ \Pr(\bar{X} \le a) = 0.10 \quad \text{and} \quad \Pr(b \le \bar{X}) = 0.10 \]

We also know that b = 9.4 − a, which we derived in part c, must hold for the same reason stated in part c. So, we can find a and then use the formula for b (equation (2)) to determine b. We know from the Central Limit Theorem that

\[ \Pr(\bar{X} \le a) = \Pr\left(Z \le \frac{\sqrt{25}\,(a - 4.70)}{0.5}\right) = 0.10 \]

since \(\bar{X}\) is distributed normal with mean \(\mu_x\) and variance \(\sigma_x^2/n\). Note that a will be less than the population mean because \(\Pr(\bar{X} \le a) = 0.10\), so \(\sqrt{25}\,(a - 4.70)/0.5\) is negative. Therefore,

\begin{align*}
\Pr(\bar{X} \le a) = \Pr\left(Z \le \frac{\sqrt{25}\,(a - 4.70)}{0.5}\right) &= 0.10 \\
\Pr(Z \le 0) - \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right) &= 0.10 && \text{using rule 6} \\
0.5 - \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right) &= 0.10 && \text{using rule 7} \\
0.5 - \Pr\left(-\frac{\sqrt{25}\,(a - 4.70)}{0.5} \ge Z \ge 0\right) &= 0.10 && \text{using rule 8} \\
\Rightarrow \Pr\left(-\frac{\sqrt{25}\,(a - 4.70)}{0.5} \ge Z \ge 0\right) &= 0.40
\end{align*}

Now we read off the Z table at the back of the book to find the point c such that Pr(0 ≤ Z ≤ c) = 0.40 where \(c = -\sqrt{25}\,(a - 4.70)/0.5\). The value of c is 1.28. Therefore,

\[ -\frac{\sqrt{25}\,(a - 4.70)}{0.5} = 1.28 \Rightarrow a = 4.572 \]

Using the formula for b in terms of a (i.e., b = 9.4 − a (equation (2))), we can solve for b:

\[ b = 9.4 - 4.572 = 4.828 \]

Therefore, the two values symmetrically distributed around the population mean between which 80 percent of the sample means fall are 4.57 ounces and 4.83 ounces.

f) 77 percent of the sample means are above what value?

This part of the problem should look very similar to part b. We want to know the point a such that \(\Pr(a \le \bar{X}) = 0.77\). To compute this probability, we need to transform \(\bar{X}\) into a standard normal variable as we did in part e. Again as in parts d and e, we know that

\[ \Pr(a \le \bar{X}) = \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z\right) = 0.77 \]

from the Central Limit Theorem. We know that a will be less than the population mean because \(\Pr(a \le \bar{X}) = 0.77\), so \(\sqrt{25}\,(a - 4.70)/0.5\) will be negative. Therefore,

\begin{align*}
\Pr(a \le \bar{X}) = \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z\right) &= 1 - \Pr\left(Z \le \frac{\sqrt{25}\,(a - 4.70)}{0.5}\right) = 0.77 && \text{using rule 2} \\
&= 1 - \left(\Pr(Z \le 0) - \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right)\right) = 0.77 && \text{using rule 6} \\
&= 1 - \left(0.5 - \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right)\right) = 0.77 && \text{using rule 7} \\
\Rightarrow \Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right) &= 0.27 \\
\Pr\left(\frac{\sqrt{25}\,(a - 4.70)}{0.5} \le Z \le 0\right) &= \Pr\left(-\frac{\sqrt{25}\,(a - 4.70)}{0.5} \ge Z \ge 0\right) = 0.27 && \text{using rule 8}
\end{align*}

Now we read off the Z table to find the point c such that Pr(0 ≤ Z ≤ c) = 0.27 where \(c = -\sqrt{25}\,(a - 4.70)/0.5\). The value of c is 0.74. Therefore,

\[ 0.74 = -\frac{\sqrt{25}\,(a - 4.70)}{0.5} \Rightarrow a = 4.626 \]

Thus, 77 percent of the sample means are above 4.63 ounces.

g) Compare your answers in b), c), e), and f).

First of all, parts b) and f) are similar, as are parts c) and e). Let’s first examine b) and f) together. In part b, we found that 77 percent of the oranges will contain at least 4.33 ounces of juice. In part f, we found that 77 percent of the sample means are above 4.63 ounces. What is the reason for this difference? Well, the variances of X and \(\bar{X}\) are \(\sigma^2\) and \(\sigma^2/n\), respectively. So, as n increases, the variance of the sample mean falls but the variance of X does not. Therefore, we’d expect more of the sample means to be near \(E(\bar{X}) = E(X) = 4.70\) when n is large. Indeed, if n were really large, our sample would encompass the entire population, so we would expect that the sample mean would be exactly the population mean and the sampling distribution of the mean would have zero variance. So, as n increases, we would expect more of the sample means to be closer to the population mean. That is what we find, as 4.63 ounces is closer to 4.70 ounces than 4.33 ounces is.

Similar logic follows for comparing c) and e). As n increases, more of the sample means should be close to 4.70. In part c, we found that 80 percent of the oranges will fall symmetrically around the population mean between 4.06 ounces and 5.34 ounces. In part e, we found that 80 percent of the sample means will fall symmetrically around the population mean between 4.57 ounces and 4.83 ounces. The interval for the sample mean is tighter around the population mean because the variance of the sample mean goes to zero as n increases.
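The comparison in part g can be made concrete by computing both 80 percent intervals side by side. The sketch below assumes Python's standard-library `statistics.NormalDist`; the helper `central_interval` is a made-up name, not something from the course materials.

```python
from math import sqrt
from statistics import NormalDist

def central_interval(mu: float, sigma: float, coverage: float) -> tuple[float, float]:
    """Interval symmetric around mu that contains `coverage` of the probability."""
    tail = (1.0 - coverage) / 2.0
    dist = NormalDist(mu=mu, sigma=sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

mu_x, sigma_x, n = 4.70, 0.5, 25

single_orange = central_interval(mu_x, sigma_x, 0.80)             # part c
sample_mean_25 = central_interval(mu_x, sigma_x / sqrt(n), 0.80)  # part e

# The interval for the mean is narrower by a factor of sqrt(n).
width_ratio = (single_orange[1] - single_orange[0]) / (sample_mean_25[1] - sample_mean_25[0])

print(single_orange, sample_mean_25, width_ratio)
```

With n = 25 the interval for the sample mean is one fifth as wide, since the standard deviation of the sample mean is σ/√n.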

3. The personnel director of a large corporation wishes to study absenteeism among the clerical workers at the corporation’s central office during the year. A random sample of 25 clerical workers reveals the following: Absenteeism sample mean = 9.7 days, sample standard deviation = 4.0 days.

Although this is not a necessary part of your solution, I will start with a derivation of the 95% confidence interval. Let \(\mu_x\) be the population mean and X be a random variable. Suppose we don’t know the population variance. We want to construct a 95 percent confidence interval for a two-sided test. That is, we want to find the a such that

\[ \Pr(\mu_x \text{ lies between } \bar{X} - a \text{ and } \bar{X} + a) = 0.95 \tag{3} \]

Since the population variance is not known, we must use the t-distribution. Unlike when the population variance is known, we have to estimate the population variance, and this estimation changes the distribution of the test statistic. Note that if n is large (around 30), the t-distribution and normal distribution are very similar. In this case, we know that \((\bar{X} - \mu_x)/\sqrt{s^2/n}\) is distributed t with n − 1 degrees of freedom, where \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2\) is the sample variance. First, we want to find the b such that

\[ \Pr\left(-b < \frac{\bar{X} - \mu_x}{\sqrt{s^2/n}} < b\right) = 0.95 \tag{4} \]

Then, we can rewrite equation (4) as follows,

\[ \Pr\left(-b\frac{s}{\sqrt{n}} - \bar{X} < -\mu_x < b\frac{s}{\sqrt{n}} - \bar{X}\right) = 0.95 \]

by multiplying both sides of the inequalities by \(s/\sqrt{n}\) and subtracting \(\bar{X}\) from both sides of the inequalities. We can divide both sides of the inequalities by −1, which reverses the inequality signs.

\[ \Pr\left(\bar{X} + b\frac{s}{\sqrt{n}} > \mu_x > \bar{X} - b\frac{s}{\sqrt{n}}\right) = 0.95 \]

Therefore, the 95% confidence interval is

\[ \bar{X} \pm b\frac{s}{\sqrt{n}} \tag{5} \]

So, the a in (3) is \(b\,s/\sqrt{n}\). We need to figure out what b, \(\bar{X}\), n, and s are to determine the 95% confidence interval. To determine b, we use the t-table with the appropriate degrees of freedom such that the area in each of the two tails is 0.025, which sums to 0.05, because we want to know the b such that

\[ \Pr\left(-b < \frac{\bar{X} - \mu_x}{\sqrt{s^2/n}} < b\right) = 0.95 \]

is true.

a) Set up a 95 percent confidence interval estimate of the average number of absences for clerical workers last year.

Let X be a random variable for the number of absences. Using the t-distribution with 25 − 1 = 24 degrees of freedom and the table handed out in class, we find that b = 2.064. From the information given, we know that \(\bar{X} = 9.7\) days and \(s/\sqrt{n} = 4.0/5 = 0.8\). Therefore, the 95% confidence interval is

\[ \bar{X} \pm b\frac{s}{\sqrt{n}} = 9.7 \pm 2.064 \cdot 0.8 = 9.7 \pm 1.6512 \]

Thus, with a 0.95 probability, the population mean number of absences lies between 8.05 and 11.35 days.
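The arithmetic of equation (5) for part a can be written out as a few lines of code. This is only a sketch: Python's standard library has no t-distribution, so the critical value 2.064 is typed in from the t-table (24 degrees of freedom, 0.025 in each tail) rather than computed, and the variable names are illustrative.

```python
from math import sqrt

x_bar = 9.7     # sample mean, in days
s = 4.0         # sample standard deviation, in days
n = 25          # sample size
t_crit = 2.064  # t-table value for 24 degrees of freedom, taken from the solution above

std_error = s / sqrt(n)          # s / sqrt(n) = 0.8
half_width = t_crit * std_error  # the "a" in equation (3)
ci = (x_bar - half_width, x_bar + half_width)

print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```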

b) Set up a 95 percent confidence interval estimate of the average number of absences for clerical workers last year assuming that you have the population variance 17 days.

This part of the problem differs from the previous part in that we know the population variance. Remember that when we know the population variance, we can use the normal distribution because we don’t have to estimate the population variance. We can still use the general formula for the 95% confidence interval we derived at the start of this problem (i.e., equation (5)) with \(\sigma_x\) substituting for s, where \(\sigma_x\) is the population standard deviation. If you don’t believe me, it might be helpful to go through the steps of that derivation for this different case. In this case, the appropriate b is different because we are using the normal distribution. In particular, we want to know the b such that

\[ \Pr\left(-b < \frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} < b\right) = 0.95 \]

where \((\bar{X} - \mu_x)/\sqrt{\sigma_x^2/n}\) is distributed normal with a mean of zero and a variance of 1. Since this is a two-tailed test, we put equal probability in each of the tails, so

\[ \Pr\left(\frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} > b\right) = 0.025 \]

Note that b is positive since we are looking at the top 2.5 percent of the standard normal distribution. So,

\begin{align*}
\Pr\left(\frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} > b\right) &= 1 - \Pr\left(\frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} < b\right) = 0.025 && \text{using rule 2} \\
&= 1 - \left(\Pr\left(0 < \frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} < b\right) + \Pr\left(\frac{\bar{X} - \mu_x}{\sqrt{\sigma_x^2/n}} < 0\right)\right) = 0.025 && \text{using rule 5}
\end{align*}

Since the second probability inside the parentheses is 0.5 by rule 7, the first must be 0.475, and the Z table gives b = 1.96. The interval therefore runs from \(\bar{X} - 1.96\,\sigma_x/\sqrt{n}\) to \(\bar{X} + 1.96\,\sigma_x/\sqrt{n}\).

We know \(\bar{X}\), \(\sigma_x\), and n from the given information. Thus, the 95% confidence interval is

\[ \bar{X} \pm b\frac{\sigma_x}{\sqrt{n}} = 9.7 \pm 1.96 \cdot \frac{\sqrt{17}}{5} = 9.7 \pm 1.616 \]

Thus, with a 0.95 probability, the population mean number of absences lies between 8.08 and 11.32 days.
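When the population variance is known, the critical value comes from the standard normal distribution, so it can be computed rather than looked up. A minimal sketch, assuming Python's standard-library `NormalDist.inv_cdf` in place of the Z table (the variable names are illustrative):

```python
from math import sqrt
from statistics import NormalDist

x_bar = 9.7      # sample mean, in days
variance = 17.0  # known population variance
n = 25           # sample size

# Point b with 0.025 in the upper tail, i.e. Pr(Z <= b) = 0.975.
z_crit = NormalDist().inv_cdf(0.975)

half_width = z_crit * sqrt(variance / n)  # b * sigma_x / sqrt(n)
ci = (x_bar - half_width, x_bar + half_width)

print(f"b = {z_crit:.2f}, 95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```

The interval here is slightly narrower than in part a even though the standard deviation is slightly larger (√17 ≈ 4.12 versus 4.0), because 1.96 is smaller than the t-table value 2.064.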

4. Find the Ordinary Least Squares (OLS) estimators for the following models (where a and b are the parameters of interest).

Before finding the OLS estimators for the models, let’s first review the general procedure for finding the OLS estimators. Remember that the OLS estimators minimize the sum of the squared residuals (SSR) or, equivalently, the sum of squared differences between the dependent variable \(Y_i\) and the predicted value of the dependent variable \(\hat{Y}_i\) (i.e., \(\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2\)). To do this, we take the following steps:

1. To start, it will be useful to rewrite \(\sum_{i=1}^{n} e_i^2\) in terms of the unknown parameters. For instance, in the case where the model is \(Y_i = a + bX_i + e_i\), \(\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2\).

2. Take the derivative of \(\sum_{i=1}^{n} e_i^2\) with respect to the unknown parameters. If there is more than one unknown parameter, then we use partial derivatives; otherwise we use ordinary derivatives.

3. Set the derivative(s) equal to zero. These are the first order conditions (FOC). This gives you as many equations as there are unknown parameters.

4. We solve the equations to get formula(e) for our unknown parameters in terms of the data. If you have two or more unknown parameters, it is ok to leave the formula(e) for the unknown parameters in terms of other unknown parameters, except that the formula for one of the unknown parameters should be in terms of the data (usually X and Y). For example, when deriving a and b for the model Y=a+bX+e, it is ok to leave the formula for a as a function of b, but b must be solved in terms of X and Y. In class, the formulae we derived for a and b in this case were as follows:

\[ a = \bar{Y} - b\bar{X} \]

\[ b = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \]

5. Check the second order conditions (SOC). That is, compute the second derivative of \(\sum_{i=1}^{n} e_i^2\) with respect to the unknown parameters. We know that for a minimum, the second order conditions must be positive when evaluated at the minimized values for the unknown parameters. If the SOC conditions for a minimum are satisfied, then you have found a minimum.
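The formulae quoted in step 4 can be turned into a few lines of code, which is a handy way to check a hand calculation. The sketch below is an illustration only: the function name `ols_simple` and the data are made up, and the data are chosen to lie exactly on the line Y = 1 + 2X so the answer is known in advance.

```python
def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """OLS estimates (a, b) for Y = a + bX + e, using the formulae from step 4."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    # b = (sum X_i Y_i - n * X-bar * Y-bar) / (sum X_i^2 - n * X-bar^2)
    b = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # a = Y-bar - b * X-bar
    a = y_bar - b * x_bar
    return a, b

# Made-up data lying exactly on Y = 1 + 2X.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a_hat, b_hat = ols_simple(x, y)

print(a_hat, b_hat)
```

Note that b is computed first, purely from the data, and a is then computed from b, which mirrors the point made in step 4.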

a) Y=a+e

Note that this problem and a similar problem of deriving the OLS estimator (b) for the model Y=bX+e are covered in Tanguy Brachet’s OLS handout. To start, we want to minimize the sum of squared residuals with respect to a, the unknown parameter. That is,

\[ \min \sum_{i=1}^{n} e_i^2 \text{ with respect to } a \tag{6} \]

Note that the model Y=a+e can be equivalently written as \(Y_i = a + e_i\). Also, note that we can rewrite equation (6) in terms of the unknown parameter a using the fact that \(e_i = Y_i - a\) (step 1). That is,

\[ \min \sum_{i=1}^{n} e_i^2 \text{ with respect to } a \;\Rightarrow\; \min \sum_{i=1}^{n} (Y_i - a)^2 \text{ with respect to } a \tag{7} \]

Next, we follow step 2 and take the ordinary derivative of (7) with respect to a. The result is

\[ \frac{d \sum_{i=1}^{n} (Y_i - a)^2}{da} = -2 \sum_{i=1}^{n} (Y_i - a) \tag{8} \]

Then we set the derivative equal to zero (our FOC) as follows:

\[ -2 \sum_{i=1}^{n} (Y_i - a) = 0 \]

Dividing both sides by −2 and using the fact that \(\sum_{i=1}^{n} (X_i + Y_i) = \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i\) (distribution of the summation), we get

\[ \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} a = 0 \]

Using the fact that \(\sum_{i=1}^{n} a = an\) and bringing the an to the other side of the equation,

\[ \sum_{i=1}^{n} Y_i = an \]

Now we divide both sides by n,

\[ \frac{\sum_{i=1}^{n} Y_i}{n} = a \]

\(\frac{\sum_{i=1}^{n} Y_i}{n}\) is precisely the sample mean of Y (\(\bar{Y}\)), so

\[ \bar{Y} = a \]

Now we need to check that the second order condition for a minimum is satisfied. So, we take the derivative of (8) with respect to a.

\[ \frac{d\left(-2 \sum_{i=1}^{n} (Y_i - a)\right)}{da} = 2n > 0 \]

Since the second order condition is positive, we know that we have found a minimum. Therefore, the OLS estimator in this model is

\[ a = \bar{Y} \]
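A quick numerical check of this result: at a equal to the sample mean, the first order condition (8) should be zero and nudging a in either direction should raise the sum of squared residuals. The data and the function name `ssr` below are made up for illustration.

```python
def ssr(a: float, ys: list[float]) -> float:
    """Sum of squared residuals for the model Y_i = a + e_i."""
    return sum((y - a) ** 2 for y in ys)

ys = [2.0, 4.0, 9.0]        # made-up data
a_hat = sum(ys) / len(ys)   # OLS estimator: the sample mean of Y

# First order condition (8) evaluated at a_hat: -2 * sum(Y_i - a).
foc_at_a_hat = -2 * sum(y - a_hat for y in ys)

# Second order condition: the second derivative is 2n, which is positive.
soc = 2 * len(ys)

print(a_hat, foc_at_a_hat, ssr(a_hat, ys), ssr(a_hat + 0.1, ys), ssr(a_hat - 0.1, ys))
```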

b) Y=a+bX+e

First note that this is the model for which we derived a and b in class and in section. To start, we write \(\sum_{i=1}^{n} e_i^2\) in terms of the unknown parameters. In this case,

\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - a - bX_i)^2 \tag{9} \]

Now we take the partial derivatives of (9) with respect to the unknown parameters a and b. First, let’s start with the partial derivative of (9) with respect to a.

\[ \frac{\partial}{\partial a} \sum_{i} [Y_i - a - bX_i]^2 = -2 \sum_{i} (Y_i - a - bX_i) \tag{10} \]

Now set the partial derivative equal to 0 (i.e., derive the FOC for a).

\[ -2 \sum_{i} (Y_i - a - bX_i) = 0 \;\Rightarrow\; \sum_{i} (Y_i - a - bX_i) = 0 \qquad \text{FOC for } a \tag{11} \]

Similarly, we can take the partial derivative of (9) with respect to b,

\[ \frac{\partial}{\partial b} \sum_{i} [Y_i - a - bX_i]^2 = -2 \sum_{i} (Y_i - a - bX_i) X_i \tag{12} \]

Now set the partial derivative equal to 0 (i.e., derive the FOC for b).

\[ -2 \sum_{i} (Y_i - a - bX_i) X_i = 0 \;\Rightarrow\; \sum_{i} (Y_i - a - bX_i) X_i = 0 \qquad \text{FOC for } b \tag{13} \]

We now have two equations, (11) and (13), and two unknowns (a and b), so we can solve these equations for a and b. Let’s start with equation (11):

\[ \sum_{i} (Y_i - a - bX_i) = 0 \]

Distributing the summation and moving the summation involving a to the other side of the equation,

\[ \sum_{i} Y_i - \sum_{i} bX_i = \sum_{i} a \]

We can further simplify this expression using the facts that \(\sum_{i=1}^{n} bX_i = b \sum_{i=1}^{n} X_i\) and \(\sum_{i=1}^{n} a = na\). Therefore,

\[ \sum_{i} Y_i - b \sum_{i} X_i = an \]

Divide both sides of the equation by n,

\[ \frac{1}{n} \sum_{i} Y_i - b \frac{1}{n} \sum_{i} X_i = a \tag{14} \]

Remembering that \(\frac{1}{n} \sum_{i} Y_i = \bar{Y}\) and \(\frac{1}{n} \sum_{i} X_i = \bar{X}\), we can rewrite the equation above as follows:

\[ a = \bar{Y} - b\bar{X} \]

Note that this formula for a involves b. This is fine as long as we derive a formula for b that doesn’t involve a. That is, we need to derive formulae for a and b such that we could use them to compute a and b if we were given data on X and Y. Now let’s derive a formula for b using equation (13) along with the formula for a (i.e., \(a = \bar{Y} - b\bar{X}\)).

\[ \sum_{i} (Y_i - a - bX_i) X_i = 0 \]

Multiplying the \(X_i\) through, distributing the summation, and using the fact that \(\sum_{i=1}^{n} bX_i^2 = b \sum_{i=1}^{n} X_i^2\),

\[ \sum_{i=1}^{n} X_i Y_i - a \sum_{i=1}^{n} X_i - b \sum_{i=1}^{n} X_i^2 = 0 \tag{15} \]

Note that \(\sum_{i=1}^{n} X_i = n\bar{X}\) and \(a = \bar{Y} - b\bar{X}\), so

\[ \sum_{i=1}^{n} X_i Y_i - (\bar{Y} - b\bar{X})\, n\bar{X} - b \sum_{i=1}^{n} X_i^2 = 0 \]

Gathering the terms involving b together,

\[ \sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y} + b\left(n\bar{X}^2 - \sum_{i=1}^{n} X_i^2\right) = 0 \]

Moving the term involving b to the other side of the equal sign and dividing by \(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\),

\[ \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} = b \]

Notice that b is a function of known information (X and Y). Now we need to check if the second order conditions for a minimum are satisfied. To start, let’s take the partial derivative of (10) with respect to a, which is the second partial derivative of the sum of squared residuals with respect to a. That is,

\[ \frac{\partial^2 \left[\sum_{i} (Y_i - a - bX_i)^2\right]}{\partial a^2} = \frac{\partial \left[-2 \sum_{i} (Y_i - a - bX_i)\right]}{\partial a} = -2 \sum_{i} (-1) = 2n > 0 \]

Therefore, the second order condition for a satisfies the condition for a minimum since it is positive. Now let’s take the partial derivative of (12) with respect to b, which is the second partial derivative of the sum of squared residuals with respect to b. That is,

\[ \frac{\partial^2 \left[\sum_{i} (Y_i - a - bX_i)^2\right]}{\partial b^2} = \frac{\partial \left[-2 \sum_{i} (Y_i - a - bX_i) X_i\right]}{\partial b} = -2 \sum_{i} (-X_i^2) = 2 \sum_{i} X_i^2 > 0 \]

Therefore, the second order condition for b satisfies the condition for a minimum since it is positive. Thus, the OLS estimators for a and b are respectively

¯ − bX¯ a = Y n ¯¯ b = i=1n XXiY2i −−nnXX¯ 2Y i=1 i c) Y2 =a+bX+e We could derive a and b in this case from stratch. Note that although we have Y2 , we are still estimating a linear relationship. The linear relationship we are estimating is between Y2 and X. To derive a and b like we did before, we can minimize

 Y n

i=1

2

n

ei =

i=1

2 ( i

− (a − bXi))2

with respect to a and b. Alternatively, we can rewrite this model to look like the model which we considered in part b. Define 2. Therefore, the model is equivalently

Y˜ = Y

Y˜i = a + bXi + ei 15



Y

Now we can use the FOC we derived for a and b in part b. We just need to substitute i for i . We can do this because when we took derivatives of the sum of squared residuals we treated the Yi as known i i and ∂Y . and constant. That is, the Yi are not functions of a and b, so ∂Y ∂a ∂b You are not required to go through all of the steps that follow as along as you make the appropriate and correct substitutions. That is, you could simply give a and b with the correct substitutions as answers. These solutions give more detail than necessary to highlight the appropriate and correct substitutions. So, the first order conditions for a and b in this case are respectively

=0



Y˜ − a − bX i

 i



i

=0

= 0



Y˜ − a − bX X = 0 i

i

i

i

Now we use the formula for a and b we derived in part b (except we can’t use n1 used in part b). Using equation (14),

1  Y˜ − b 1

X = a

ni

=1 Yi

= Y¯

that we

n  ˜i and X ¯ for 1 n Xi as follows We can substitute Yi2 for Y n i=1 1 2 ¯ n Yi − bX = a n

i

i

i

i

i

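Similarly, we can use the formula for b in part b with the appropriate substitutions. That is, using equation (15),

\[ \sum_{i=1}^{n} X_i\tilde{Y}_i - a\sum_{i=1}^{n} X_i - b\sum_{i=1}^{n} X_i^2 = 0 \]

Note that $\sum_{i=1}^{n} X_i = n\bar{X}$ and $a = \frac{1}{n}\sum_{i=1}^{n} \tilde{Y}_i - b\bar{X}$ (from above), so

\[ \sum_{i=1}^{n} X_i\tilde{Y}_i - \left( \frac{1}{n}\sum_{i=1}^{n} \tilde{Y}_i - b\bar{X} \right) n\bar{X} - b\sum_{i=1}^{n} X_i^2 = 0 \]

Gathering the terms involving b together,

\[ \sum_{i=1}^{n} X_i\tilde{Y}_i - n\bar{X}\left( \frac{1}{n}\sum_{i=1}^{n} \tilde{Y}_i \right) + b\left( n\bar{X}^2 - \sum_{i=1}^{n} X_i^2 \right) = 0 \]

Moving the term involving b to the other side of the equal sign, dividing by $\sum_{i=1}^{n} X_i^2 - n\bar{X}^2$, and substituting $Y_i^2$ for $\tilde{Y}_i$,

\[ b = \frac{\sum_{i=1}^{n} X_i Y_i^2 - n\bar{X}\left( \frac{1}{n}\sum_{i=1}^{n} Y_i^2 \right)}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \]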

The second order conditions for a minimum are satisfied because if we substitute $Y_i^2$ for $Y_i$ in the second order conditions for a and b, the second order conditions are still positive. So, the OLS estimators of a and b are as follows

\[ a = \frac{1}{n}\sum_{i=1}^{n} Y_i^2 - b\bar{X} \]

\[ b = \frac{\sum_{i=1}^{n} X_i Y_i^2 - n\bar{X}\left( \frac{1}{n}\sum_{i=1}^{n} Y_i^2 \right)}{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2} \]
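As a rough numerical check of the substitution argument in part c, the sketch below reuses the part b closed-form expressions with $Y_i^2$ in place of $Y_i$ and confirms that both first order conditions hold. The function name `ols_simple` and the data values are made up for illustration; they are not part of the assignment.

```python
def ols_simple(x, y):
    """Closed-form OLS intercept and slope for y = a + b*x + e (part b formulas)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
        sum(xi ** 2 for xi in x) - n * x_bar ** 2
    )
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.5, 2.1, 2.4, 3.0, 3.2]

# Part c: substitute Y-tilde_i = Y_i^2 for Y_i, then reuse the part b formulas.
y_tilde = [yi ** 2 for yi in y]
a_c, b_c = ols_simple(x, y_tilde)

# Both first order conditions should hold up to floating-point rounding.
resid_c = [yt - a_c - b_c * xi for xi, yt in zip(x, y_tilde)]
foc_a_c = sum(resid_c)                               # sum of residuals
foc_b_c = sum(r * xi for r, xi in zip(resid_c, x))   # sum of residuals times X_i
print(a_c, b_c, foc_a_c, foc_b_c)
```

The only change relative to part b is the transformed dependent variable passed to the function, which mirrors the point that the $Y_i$ are treated as known constants in the derivation.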

d) $Y = a + b\ln(X) + e$

In this part, we are estimating a linear relationship between Y and ln(X). We could derive a and b from scratch as we did in part b by minimizing

\[ \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( Y_i - a - b\ln X_i \right)^2 \]

with respect to a and b, but it is faster to use the derivations in part b and transform the model in part d to look like the model in part b. Again, you are not required to go through all of the steps that follow as long as you make the appropriate and correct substitutions. That is, you could simply give a and b with the correct substitutions as answers. These solutions give you more detail than necessary to highlight the appropriate and correct substitutions. To start, we can rewrite the model in this part as follows

\[ Y_i = a + b\breve{X}_i + e_i \]

where $\breve{X}_i = \ln X_i$, to look like the model in part b. Now we can use the FOC we derived for a and b in part b. We just need to substitute $\breve{X}_i$ for $X_i$. We can do this because when we derived the first order conditions for a and b, we treated the $X_i$ as constant and known. The $X_i$ are not functions of a and b, so $\partial X_i / \partial a = 0$ and $\partial X_i / \partial b = 0$. So, the first order conditions for a and b in this case are respectively

\[ \sum_{i=1}^{n} \left( Y_i - a - b\breve{X}_i \right) = 0 \]

\[ \sum_{i=1}^{n} \left( Y_i - a - b\breve{X}_i \right) \breve{X}_i = 0 \]

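Now we use the formulas for a and b we derived in part b (except we can't use $\frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$ that we used in part b). Using equation (14),

\[ \frac{1}{n}\sum_{i=1}^{n} Y_i - b\,\frac{1}{n}\sum_{i=1}^{n} \breve{X}_i = a \]

We can substitute $\ln X_i$ for $\breve{X}_i$ and $\bar{Y}$ for $\frac{1}{n}\sum_{i=1}^{n} Y_i$ as follows

\[ \bar{Y} - b\,\frac{1}{n}\sum_{i=1}^{n} \ln X_i = a \]

Similarly, we can use the formula for b in part b with the appropriate substitutions. That is, using equation (15),

\[ \sum_{i=1}^{n} \breve{X}_i Y_i - a\sum_{i=1}^{n} \breve{X}_i - b\sum_{i=1}^{n} \breve{X}_i^2 = 0 \]

Note that $a = \bar{Y} - b\,\frac{1}{n}\sum_{i=1}^{n} \ln X_i$, so

\[ \sum_{i=1}^{n} \breve{X}_i Y_i - \left( \bar{Y} - b\,\frac{1}{n}\sum_{i=1}^{n} \ln X_i \right) \sum_{i=1}^{n} \breve{X}_i - b\sum_{i=1}^{n} \breve{X}_i^2 = 0 \]

Gathering the terms involving b together and substituting $\ln X_i$ for $\breve{X}_i$,

\[ \sum_{i=1}^{n} (\ln X_i) Y_i - \bar{Y}\sum_{i=1}^{n} \ln(X_i) + b\left( \frac{1}{n}\left( \sum_{i=1}^{n} \ln X_i \right)^2 - \sum_{i=1}^{n} (\ln X_i)^2 \right) = 0 \]

Moving the term involving b to the other side of the equal sign and dividing by $\sum_{i=1}^{n} (\ln X_i)^2 - \frac{1}{n}\left( \sum_{i=1}^{n} \ln X_i \right)^2$,

\[ b = \frac{\sum_{i=1}^{n} (\ln X_i) Y_i - \bar{Y}\sum_{i=1}^{n} \ln(X_i)}{\sum_{i=1}^{n} (\ln X_i)^2 - \frac{1}{n}\left( \sum_{i=1}^{n} \ln X_i \right)^2} \]

The second order conditions for a minimum are satisfied because if we substitute $\ln X_i$ for $X_i$ in the second order conditions for a and b, the second order conditions are still positive. The OLS estimators for a and b for this model are

\[ a = \bar{Y} - b\,\frac{1}{n}\sum_{i=1}^{n} \ln X_i \]

\[ b = \frac{\sum_{i=1}^{n} (\ln X_i) Y_i - \bar{Y}\sum_{i=1}^{n} \ln(X_i)}{\sum_{i=1}^{n} (\ln X_i)^2 - \frac{1}{n}\left( \sum_{i=1}^{n} \ln X_i \right)^2} \]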
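The part d estimators can be checked numerically in the same spirit. The sketch below is only an illustration under stated assumptions: the function name `ols_log_x` is hypothetical, and the data are constructed so that Y is exactly 2 + 3 ln(X) with no error term, in which case the closed-form expressions should return those coefficients.

```python
import math


def ols_log_x(x, y):
    """OLS for y = a + b*ln(x) + e using the part d closed-form expressions."""
    n = len(x)
    log_x = [math.log(xi) for xi in x]   # X-breve_i = ln X_i (requires x_i > 0)
    y_bar = sum(y) / n
    sum_log = sum(log_x)
    numerator = sum(l * yi for l, yi in zip(log_x, y)) - y_bar * sum_log
    denominator = sum(l ** 2 for l in log_x) - sum_log ** 2 / n
    b = numerator / denominator
    a = y_bar - b * sum_log / n
    return a, b


# Hypothetical data generated from Y = 2 + 3*ln(X) with no error term.
x_d = [1.0, 2.0, 4.0, 8.0]
y_d = [2.0 + 3.0 * math.log(xi) for xi in x_d]

a_d, b_d = ols_log_x(x_d, y_d)
print(a_d, b_d)
```

The relationship between Y and X here is nonlinear, yet the fitted model is linear in a and b, which is why the part b machinery carries over.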

What can you say about the linearity of the least squares approach?

A satisfactory answer may mention that the least squares approach minimizes the sum of squared residuals to fit a line to the data. For instance, in part d, we fit a linear relationship between Y and ln(X). As Professor Hildreth mentioned in class, OLS allows for nonlinearity in the variables, as we saw in parts c and d; what must be linear is the way the parameters a and b enter the model.

5. Suppose that $Y_1, Y_2, Y_3$ is a simple random sample from a population whose expected value m is unknown. Determine which of these estimators of m is preferable and explain why this is the case.

\[ m_1 = 0.25Y_1 + 0.50Y_2 + 0.25Y_3 \]

\[ m_2 = 0.16Y_1 + 0.34Y_2 + 0.50Y_3 \]

First of all, what makes an estimator preferable? Some of these conditions include

1. Unbiasedness of the estimator (i.e., E(estimator) = population mean)

2. Efficiency of the estimator (i.e., the estimator has the lowest variance within the class of unbiased estimators)

3. Consistency of the estimator (i.e., an estimator that has a high probability of being close to the population mean as the sample size increases). We probably won't concern ourselves with this too much in this class. This is a large sample property.

Of first importance is unbiasedness because, in terms of efficiency, we can't compare the variances of estimators if either of them is biased (look at the definition of efficiency). Next to note is that when we say that we draw a random sample from a population, we will assume that each observation in the sample comes from the same distribution with the same variance and mean and that each observation is independent. So, let's look at $E(m_1)$.

\[ E(m_1) = E(0.25Y_1 + 0.50Y_2 + 0.25Y_3) \]

By the linearity property of the expected value (i.e., E(X+Y) = E(X)+E(Y)),

\[ E(m_1) = E(0.25Y_1) + E(0.50Y_2) + E(0.25Y_3) \]

Since E(aX) = aE(X),

\[ E(m_1) = 0.25E(Y_1) + 0.50E(Y_2) + 0.25E(Y_3) \]

We know that each of the $Y_i$ is drawn from the same distribution with population mean m, so $E(Y_i) = m$. Therefore,

\[ E(m_1) = 0.25m + 0.50m + 0.25m = m \]

Therefore, $m_1$ is an unbiased estimator of m. Now let's check if $m_2$ is an unbiased estimator of m. Using the same properties of expected value as above,

\[ E(m_2) = 0.16E(Y_1) + 0.34E(Y_2) + 0.50E(Y_3) = 0.16m + 0.34m + 0.50m = m \]

Thus, $m_2$ is also an unbiased estimator of m. Now that we have established that both estimators are unbiased, let's check which estimator has a lower variance (an idea related to efficiency). Suppose $\tilde{X}$ and $\breve{X}$ are linear unbiased estimators of the population mean. If $var(\breve{X}) > var(\tilde{X})$, we say that the estimator $\tilde{X}$ is efficient relative to $\breve{X}$. Note that neither linear estimator ($m_1$ or $m_2$) is efficient because we know that the sample mean is the best linear unbiased estimator and neither estimator is the sample mean. Starting with the variance of $m_1$,

\[ Var(m_1) = Var(0.25Y_1 + 0.50Y_2 + 0.25Y_3) \]

Since the $Y_i$ are independent draws, $cov(Y_i, Y_j) = 0$ if $i \neq j$, so the variance of the sum is the sum of the variances. Therefore,

\[ Var(m_1) = Var(0.25Y_1) + Var(0.50Y_2) + Var(0.25Y_3) \]

Now we need to invoke the fact that $Var(aX) = a^2 Var(X)$, so

\[ Var(m_1) = 0.25^2 Var(Y_1) + 0.50^2 Var(Y_2) + 0.25^2 Var(Y_3) \]

Since each $Y_i$ comes from the same distribution, $Var(Y_i) = \sigma^2$ where $\sigma^2$ is the unknown population variance. Thus,

\[ Var(m_1) = (0.25^2 + 0.50^2 + 0.25^2)\sigma^2 = 0.375\sigma^2 \]

Turning to $m_2$,

\[ Var(m_2) = Var(0.16Y_1 + 0.34Y_2 + 0.50Y_3) \]

Using the same logic as for the determination of the variance of $m_1$,

\[ Var(m_2) = 0.16^2 Var(Y_1) + 0.34^2 Var(Y_2) + 0.50^2 Var(Y_3) = (0.16^2 + 0.34^2 + 0.50^2)\sigma^2 = 0.3912\sigma^2 \]

The variance of $m_1$ is smaller than the variance of $m_2$ (0.375σ² versus 0.3912σ²), so $m_1$ is efficient relative to $m_2$. Therefore, $m_1$ is the preferable estimator of m. If we had also included the sample mean of the $Y_i$ as an estimator, you should be able to show that the sample mean would be the preferable estimator (because it is BLUE).
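The comparison in problem 5 reduces to two sums over the weights: a weighted estimator $\sum w_i Y_i$ of i.i.d. draws has expected value $(\sum w_i)m$ and variance $(\sum w_i^2)\sigma^2$. The sketch below computes both sums exactly for $m_1$, $m_2$, and the sample mean. The helper name `mean_and_variance_factors` is made up for illustration, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def mean_and_variance_factors(weights):
    """For sum_i w_i*Y_i with i.i.d. Y_i: E = (sum w_i)*m, Var = (sum w_i^2)*sigma^2."""
    return sum(weights), sum(w ** 2 for w in weights)


m1_w = [Fraction("0.25"), Fraction("0.50"), Fraction("0.25")]
m2_w = [Fraction("0.16"), Fraction("0.34"), Fraction("0.50")]
mean_w = [Fraction(1, 3)] * 3   # the sample mean, included for comparison

for name, w in [("m1", m1_w), ("m2", m2_w), ("sample mean", mean_w)]:
    mean_factor, var_factor = mean_and_variance_factors(w)
    # mean_factor == 1 means the estimator is unbiased for m
    print(name, float(mean_factor), float(var_factor))
```

All three sets of weights sum to one, so all three estimators are unbiased; the variance factors are 0.375 for $m_1$, 0.3912 for $m_2$, and 1/3 for the sample mean, which is the smallest of the three.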
