Econometrics Problem Set #1 Nathaniel Higgins

B.1 Suppose that a high school student is preparing to take the SAT exam. Explain why his or her eventual SAT score is properly viewed as a random variable.

• Because even if you knew everything that you think could possibly impact your SAT score, you still wouldn’t be totally sure what your SAT score would be without taking the test.
• The SAT score is determined by “an experiment” (not very satisfying if you don’t realize what the experiment is represented by in this context; see below).
• You cannot possibly know the SAT score until the exam is taken (the exam is the “experiment”).

B.2 Let X be a random variable distributed as Normal(5,4). Find the probabilities of the following events:
1. P(X ≤ 6)
2. P(X > 4)
3. P(|X − 5| > 1)

(i) P(X ≤ 6) First: don’t be confused by the notation. P(X ≤ 6) translates to the probability that X is less than or equal to 6. Second: realize what X is. In this instance, X is a random variable. 6 is, well, 6: it’s just a number. The clue that X is a random variable is that it is represented by a capital letter. Little x usually represents a regular old variable (not a random variable). A regular old variable like little x is just a number that we don’t know the value of; it doesn’t vary. So little x is some fixed number, the value of which we don’t happen to know, whereas X is a random variable, which is a variable that takes on lots of different values. So we want to know P(X ≤ 6): the probability that X (a draw of the random variable X) is less than or equal to 6.

[Figure: pdf of Normal(5,4). Horizontal axis: X; vertical axis: probability density.]

This is the probability that the draw falls to the left of 6. The total probability is given by the area under the pdf and to the left of 6 (the gray area in the figure). The area under the pdf is given by the value of the cdf evaluated at the number 6. In the second figure (below) a blue line shows the value of the cdf evaluated at 6.

[Figure: cdf of Normal(5,4). Horizontal axis: X; vertical axis: probability.]

In other words, the cdf of a Normal(5,4) distribution gives us exactly the answer we are looking for. But computing the value of the Normal(5,4) cdf is not trivial. We can use software to do it, but it’s not something we want to do in our heads. Rather, we tend to look this number up in a table. Most statistics books contain only one table representing the cdf of a Normal distribution: specifically, the Normal(0,1) distribution. The Normal(0,1) distribution has a mean of 0 and a standard deviation of 1. The random variable we are interested in is distributed Normal(5,4), i.e. it has a mean of 5 and a variance of 4, so its standard deviation is 2.

So to use the table contained in the back of Wooldridge (or most any other textbook you grab off the shelf), we have to standardize our variable X. That is, we need to create a new variable, using X as a building block. If we take X and subtract from it the number 5, we get a new random variable that has a mean of 0 and a standard deviation of 2 (simply subtracting 5 from X does nothing to change the dispersion of the new random variable). If we then take this variable and divide it by 2, we get a random variable with a mean of 0 and a standard deviation of 1, which is exactly what we are after. So if X is distributed Normal(5,4), then $(X - 5)/2$ is distributed Normal(0,1). To find P(X ≤ 6), we transform both sides of the inequality contained inside the parentheses. We subtract 5 and divide by 2 on both sides of the “≤” sign.

$$P(X \le 6) = P\left(\frac{X - 5}{2} \le \frac{6 - 5}{2}\right)$$

By convention, a variable that is distributed Normal(0,1) is called “Z” (don’t ask me why, that’s just how folks do it). We can clean up the expression above by writing

$$P(X \le 6) = P\left(Z \le \frac{1}{2}\right)$$

The distribution of a Normal(0,1) (i.e. the distribution of Z) can be found on pages 823-824 of Wooldridge. Using the table we see that $P(Z \le 1/2) = 0.6915$.
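If you would rather check the table lookup with software, here is a minimal sketch in Python. It assumes Python 3.8 or later and uses the standard library’s `statistics.NormalDist`; the variable names are only illustrative. Note that `NormalDist` is parameterized by the standard deviation (2), not the variance (4).

```python
from statistics import NormalDist

# X ~ Normal(5, 4): mean 5, variance 4, so the standard deviation is 2.
X = NormalDist(mu=5, sigma=2)
Z = NormalDist(mu=0, sigma=1)  # the standard normal from the table

# Direct route: evaluate the Normal(5, 4) cdf at 6.
p_direct = X.cdf(6)

# Table route: standardize first, then evaluate the Normal(0, 1) cdf.
z = (6 - 5) / 2
p_standardized = Z.cdf(z)

print(round(p_direct, 4), round(p_standardized, 4))  # 0.6915 0.6915
```

Both routes give the same number, which is the whole point of standardizing: the table for Z is enough for any normal distribution.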

(ii) P (X > 4) The next problem is slightly different. We now want to know P (X > 4). In this case, instead of needing to find the area under the pdf to the left of a certain point, we want to find the area under the pdf to the right of a certain point (to the right of 4).

[Figure: pdf of Normal(5,4). Horizontal axis: X; vertical axis: probability density.]

This problem is a bit trickier. Evaluating the cdf at 4 does not give us our answer! Evaluating the cdf at 4 gives us the cumulative probability up to 4 (i.e. the area under the pdf to the left of the number 4).

[Figure: cdf of Normal(5,4). Horizontal axis: X; vertical axis: probability.]

To find the answer, then, we need to realize that the probability that X falls to the right of 4 is the complement of an event we do know the probability of. The occurrence of X falling to the right of 4 is exactly the opposite of the occurrence of X falling to the left of 4. So P(X > 4) is exactly equal to 1 − P(X < 4).

$$P(X > 4) = 1 - P(X < 4)$$

Transform the random variable X into Z just like we did above.

$$1 - P(X < 4) = 1 - P\left(\frac{X - 5}{2} < \frac{4 - 5}{2}\right) = 1 - P\left(Z < -\frac{1}{2}\right)$$

We look up $P(Z < -1/2)$ on page 823 and find that it is equal to 0.3085. Subtract 0.3085 from 1 to find that P(X > 4) = 1 − 0.3085 = 0.6915. This matches the answer to part (i), as it should: the normal distribution is symmetric around its mean, and 4 is exactly as far below 5 as 6 is above it.
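The complement rule is easy to check numerically. A small sketch, again assuming Python 3.8+ and the standard library’s `statistics.NormalDist` (the variable names are mine, not part of the problem):

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # Normal(5, 4) has standard deviation 2

p_left_of_4 = X.cdf(4)          # area under the pdf to the left of 4
p_right_of_4 = 1 - p_left_of_4  # complement: area to the right of 4

print(round(p_left_of_4, 4), round(p_right_of_4, 4))  # 0.3085 0.6915
```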

(iii) P(|X − 5| > 1) Finally, we want to find P(|X − 5| > 1). We’ll use the same trick we did above to deal with the fact that we want to know the probability of a draw falling to the right of 1. The new problem that this calculation presents us with is related to those nasty absolute value signs. Ask yourself: when is |X − 5| > 1? There are two ways that |X − 5| can be greater than 1. First, we could have that X − 5 > 1 (notice no absolute value signs this time). This happens when X > 6. But |X − 5| is also greater than 1 when X − 5 is less than negative 1, which happens when X < 4. So to determine P(|X − 5| > 1) we really need to determine the probability of two events: P(X − 5 > 1) and P(X − 5 < −1).

First P(X − 5 > 1):

$$P(X - 5 > 1) = P\left(\frac{X - 5}{2} > \frac{1}{2}\right) = 1 - P\left(Z < \frac{1}{2}\right) = 1 - 0.6915 = 0.3085.$$

Then P(X − 5 < −1):

$$P(X - 5 < -1) = P\left(\frac{X - 5}{2} < -\frac{1}{2}\right) = P\left(Z < -\frac{1}{2}\right) = 0.3085.$$

The two events cannot both happen, so the total probability is the sum P(X − 5 > 1) + P(X − 5 < −1) = 0.3085 + 0.3085 = 0.617.
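The two-tail calculation can be sketched the same way (Python 3.8+, standard library only; the names are illustrative):

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=2)  # Normal(5, 4) has standard deviation 2

p_upper = 1 - X.cdf(6)  # P(X - 5 > 1), i.e. X lands to the right of 6
p_lower = X.cdf(4)      # P(X - 5 < -1), i.e. X lands to the left of 4
p_total = p_upper + p_lower

print(round(p_total, 4))  # 0.6171
```

The fourth decimal differs from 0.617 only because the table values being added were already rounded to four places.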

B.3 Much is made of the fact that certain mutual funds outperform the market year after year (that is, the return from holding shares in the mutual fund is higher than the return from holding a portfolio such as the S&P 500). For concreteness, consider a 10-year period and let the population be the 4,170 mutual funds reported in The Wall Street Journal on January 1, 1995. By saying that performance relative to the market is random, we mean that each fund has a 50-50 chance of outperforming the market in any year and that performance is independent from year to year.

(i) If performance relative to the market is truly random, what is the probability that any particular fund outperforms the market in all 10 years? The probability that any particular fund outperforms the market in any one year is 0.5. The probability that any particular fund outperforms the market in two consecutive years is equal to the probability that the fund outperforms the market in the first year, times the probability that it outperforms the market in the second year: 0.5 × 0.5 = 0.25. When two “draws” are independent (the performance of the fund relative to the S&P 500 is the random draw, in this case), the joint probability of the two events is just the product of the probabilities of the individual events. By this logic, the probability that a given fund outperforms the S&P 500 in 10 consecutive years is equal to 0.5 × 0.5 × . . . × 0.5 = $0.5^{10}$ = 0.0009765625.

(ii) Find the probability that at least one fund out of 4,170 funds outperforms the market in all 10 years. We already know the probability that any given fund outperforms the market in all 10 years: it’s really small ($0.5^{10}$). What if there are 4,170 chances for one fund to do it? How likely is it that out of 4,170 chances for a low-probability event to occur, we observe at least one (and maybe two, three, . . . ) success(es)? The binomial distribution is our friend here. The binomial distribution (in general) gives us the probability of k successes in n tries:

$$f(k) = \binom{n}{k} p^k (1 - p)^{n - k},$$

where p is the probability of success in one trial. We have all the building blocks of the answer here: we know the number of chances for at least one success (4,170) and we know the probability of success in a single trial ($0.5^{10}$).

The event that we have at least one success is actually a compound event. This is another way of saying that there are lots of different ways to observe “at least one success.” We could observe exactly one success. We could observe exactly two successes. And so on. Any number of successes ≥ 1 would count as at least one success. So the probability of at least one success is equal to the probability of one success + the probability of two successes + . . . . If we make X the number of funds that beat the market in all 10 years, we can write this as P(X ≥ 1) = P(X = 1) + P(X = 2) + . . . + P(X = 4,170). To find P(X ≥ 1) this way, we would have to compute 4,170 different probabilities and add them together.

Or we could do it the easy way. The probability of at least one success is exactly the opposite of . . . no successes. The probability of no successes plus the probability of one success + . . . + the probability of 4,170 successes = 1. Therefore, the probability of at least one success = 1 − the probability of no successes.

$$P(X \ge 1) = 1 - P(X = 0)$$

We only need to evaluate the Binomial(4170, $0.5^{10}$) pdf once to get our answer. The binomial pdf f(k) given above can be found on page 721 of Wooldridge. Therefore,

$$P(X \ge 1) = 1 - f(0) = 1 - \binom{4170}{0} p^0 (1 - p)^{4170}.$$

$\binom{4170}{0} = 1$ and anything raised to the 0th power is 1, so we really only need to compute that last term on the far right of the expression above.

$$P(X \ge 1) = 1 - (1 - p)^{4170} = 1 - (1 - 0.5^{10})^{4170} = 0.9829951.$$

Even though each fund has no better than a 50-50 shot of beating the market in a given year, the chance that at least one fund looks absolutely studly after 10 years of beating the market is 98%!
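To see that the easy way and the long way agree, here is a rough sketch in Python (3.8+ for `math.comb`). The helper `binom_pmf` and the cutoff of 30 terms are my own choices for illustration: the long sum is truncated because the probabilities of more than 30 winning funds are far too small to show up in the printed digits.

```python
from math import comb

n = 4170        # number of funds
p = 0.5 ** 10   # chance that one fund beats the market 10 years running

def binom_pmf(k: int) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Easy way: complement of "no fund beats the market in all 10 years".
p_easy = 1 - binom_pmf(0)

# Long way (truncated): add up the chances of exactly 1, 2, ..., 30 winners.
p_long = sum(binom_pmf(k) for k in range(1, 31))

print(round(p_easy, 7), round(p_long, 7))  # 0.9829951 0.9829951
```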

B.4 For a randomly selected county in the United States, let X represent the proportion of adults over age 65 who are employed, or the elderly employment rate. Then, X is restricted to a value between zero and one. Suppose that the cumulative distribution function for X is given by $F(x) = 3x^2 - 2x^3$ for 0 ≤ x ≤ 1. Find the probability that the elderly employment rate is at least 0.6 (60%).

We have been given the cumulative distribution function (the cdf) corresponding to the old folks employment rate, X. This function tells us the probability that the old folks employment rate is less than a particular value. Pick a value. Any value. Say 0.2. The function they gave us, $F(x) = 3x^2 - 2x^3$, tells us the probability that the old folks employment rate is less than that value substituted in where you see x. Evaluated at 0.2 we see that $P(X < 0.2) = P(X \le 0.2) = 3(0.2)^2 - 2(0.2)^3 = 0.104$. To answer the question they have given us, we need to calculate the probability that X is at least 0.6. The probability that X is at least 0.6 is 1 minus the probability that X is less than 0.6.

$$P(X \ge 0.6) = 1 - P(X < 0.6) = 1 - \left[3(0.6)^2 - 2(0.6)^3\right] = 1 - 0.648 = 0.352$$
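A short sketch of the same calculation in Python, in case you want to try other cutoffs. The function name `F` mirrors the notation in the problem; everything else is illustrative.

```python
def F(x: float) -> float:
    """cdf of the elderly employment rate, valid for 0 <= x <= 1."""
    return 3 * x**2 - 2 * x**3

p_below_02 = F(0.2)         # probability the rate is at most 0.2
p_at_least_06 = 1 - F(0.6)  # probability the rate is at least 0.6

print(round(p_below_02, 3), round(p_at_least_06, 3))  # 0.104 0.352
```

As a sanity check on the cdf itself, F(0) is 0 and F(1) is 1, as they must be for a variable restricted to values between zero and one.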

B.5 Just prior to jury selection for O.J. Simpson’s murder trial in 1995, a poll found that about 20% of the adult population believed Simpson was innocent (after much of the physical evidence in the case had been revealed to the public). Ignore the fact that this 20% is an estimate based on a subsample from the population; for illustration, take it as the true percentage of people who thought Simpson was innocent prior to jury selection. Assume that the 12 jurors were selected randomly and independently from the population (although this turned out not to be true).

(i) Find the probability that the jury had at least one member who believed in Simpson’s innocence prior to jury selection. [Hint: Define the Binomial(12,0.20) random variable X to be the number of jurors believing in Simpson’s innocence.]

The binomial again. The probability that the jury had at least one member who believed in Simpson’s innocence is equal to the probability that the jury had exactly one member who thought Simpson was innocent plus the probability that the jury had exactly two members who thought Simpson was innocent plus . . . the probability that all twelve members of the jury thought Simpson was innocent. Once again, we can either compute each of these probabilities, or we can compute the probability that no members of the jury thought Simpson was innocent, subtract that value from 1, and be done with it. The probability that no jurors thought Simpson was innocent plus all the other probabilities enumerated above have to sum to 1. So we get the probability that at least one juror thought he was innocent by subtracting from 1 the probability that nobody thought he was innocent.

$$P(X = 0) + P(X = 1) + P(X = 2) + \ldots + P(X = 12) = 1$$

Another way of writing this same thing is

$$P(X = 0) + P(X \ge 1) = 1$$

Therefore

$$P(X \ge 1) = 1 - P(X = 0)$$

The probability that X = 0 is given by the binomial distribution with 12 chances (12 draws from the jury pool) and a probability of “success” equal to p. The probability of “success” here is the probability that a given draw from the jury pool will give us a juror that thinks Simpson is innocent. And what is p? p is the proportion of the population that believes Simpson is innocent (in this question they asked us to treat 0.2 as the true probability that any given individual chosen at random would think Simpson was innocent).

$$P(X = 0) = f(0) = \binom{12}{0} p^0 (1 - p)^{12} = 1 \times 1 \times (1 - 0.2)^{12} = 0.8^{12} = 0.06871948.$$

To determine P(X ≥ 1) we simply subtract this value from 1:

$$P(X \ge 1) = 1 - 0.06871948 = 0.9312805$$

So, given the proportion of the population that believed Simpson was innocent, if the jury had been selected at random from the U.S. population, there was a 93% chance that at least one juror came in believing Simpson was innocent!

(ii) Find the probability that the jury had at least two members who believed in Simpson’s innocence. [Hint: P(X ≥ 2) = 1 − P(X ≤ 1), and P(X ≤ 1) = P(X = 0) + P(X = 1).]

Same logic that applies above applies here as well. They essentially give you the sequence of calculations in the hint above.

$$P(X \ge 2) = 1 - (P(X = 0) + P(X = 1))$$

We already found P(X = 0) above (it was 0.06871948 if you’ve already forgotten). Compute P(X = 1):

$$f(1) = \binom{12}{1} p^1 (1 - p)^{11} = 12 \times 0.2 \times 0.8^{11} = 0.2061584,$$

and the desired result immediately follows:

$$P(X \ge 2) = 1 - (P(X = 0) + P(X = 1)) = 1 - 0.06871948 - 0.2061584 = 0.7251221.$$
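Both jury calculations fit in a few lines of Python. This is only a sketch (Python 3.8+ for `math.comb`); the function name `f` follows the notation for the binomial pdf used above.

```python
from math import comb

n, p = 12, 0.20  # 12 jurors, each believes in innocence with probability 0.2

def f(k: int) -> float:
    """Binomial(12, 0.20) probability of exactly k believers on the jury."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_at_least_one = 1 - f(0)
p_at_least_two = 1 - (f(0) + f(1))

print(round(p_at_least_one, 7), round(p_at_least_two, 7))  # 0.9312805 0.7251221
```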


B.7 If a basketball player is a 74% free throw shooter, then, on average, how many free throws will he or she make in a game with eight free throw attempts?

See the spreadsheet B-7.xlsx (posted on my website) to see an informative way to answer this question. There is a much easier calculation you could do (I bet many of you did it): the number of made free throws is Binomial(8, 0.74), and the expected value of a binomial is n × p = 8 × 0.74. The spreadsheet gives an alternative way to attack the problem. Warning: the alternative way is longer and may result in learning (or maybe not). The answer is 5.92. If you rounded that number to the nearest whole number, that would make sense. You can’t really make 0.92 of a free throw, now can you?
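One longer route to the same number is to apply the definition of an expected value directly: weight each possible number of made free throws by its probability and add. The sketch below does that in Python (3.8+). It assumes the attempts are independent, so the number made is Binomial(8, 0.74); it is my own illustration and not necessarily what B-7.xlsx does.

```python
from math import comb

n, p = 8, 0.74  # eight attempts, 74% chance of making each one

def f(k: int) -> float:
    """Probability of making exactly k of the n free throws."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Long way: weight each possible number of makes by its probability.
expected_long = sum(k * f(k) for k in range(n + 1))

# Short way: the mean of a Binomial(n, p) random variable is n * p.
expected_short = n * p

print(round(expected_long, 2), round(expected_short, 2))  # 5.92 5.92
```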

B.8 Suppose that a college student is taking three courses: a two-credit course, a three-credit course, and a four-credit course. The expected grade in the two-credit course is 3.5, while the expected grade in the three- and four-credit courses is 3.0. What is the expected overall grade point average for the semester? (Remember that each course grade is weighted by its share of the total number of units.)

$$\frac{2 \times 3.5 + 3 \times 3.0 + 4 \times 3.0}{9} = \frac{28}{9} \approx 3.11$$
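The weighting is easy to get wrong by hand, so here is a tiny Python sketch of the credit-weighted average. The list names are illustrative.

```python
units = [2, 3, 4]                 # credits for the three courses
expected_grades = [3.5, 3.0, 3.0]  # expected grade in each course

total_units = sum(units)
# Each grade is weighted by its course's share of the total units.
expected_gpa = sum(u * g for u, g in zip(units, expected_grades)) / total_units

print(round(expected_gpa, 2))  # 3.11
```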

B.10 Suppose that at a large university, college grade point average, GPA, and SAT score, SAT, are related by the conditional expectation E(GPA|SAT) = 0.70 + 0.002 SAT.

(i) Find the expected GPA when SAT = 800. Find E(GPA|SAT = 1,400).

E(GPA|SAT = 800) = 0.70 + 0.002 × 800 = 2.3.
E(GPA|SAT = 1,400) = 0.70 + 0.002 × 1,400 = 3.5.

Whadya know. People with higher SAT scores have higher GPAs. Learn something new every day.

(ii) If the average SAT in the university is 1,100, what is the average GPA? (Hint: Use Property CE.4.)


Property CE.4 is called the “law of iterated expectations.” You don’t need to know that, but if you look at the rule for a while, you might understand why it’s called this. And understanding why it’s called this might help you remember it. Just sayin’. The law of iterated expectations (in terms of two random variables, Y and X) is given by

$$E[E(Y|X)] = E(Y),$$

where the outer expectation on the left side of the equation is the expectation over X. Let’s use that same rule but substitute GPA for Y and SAT for X:

$$E(\mathit{GPA}) = E[E(\mathit{GPA}|\mathit{SAT})] = E[0.70 + 0.002\,\mathit{SAT}] = 0.70 + 0.002\,E(\mathit{SAT}) = 0.70 + 0.002(1100) = 2.9.$$
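To make the rule concrete, here is a small Python sketch. The five SAT scores are hypothetical: I made them up only so that their average is 1,100. Averaging the conditional expectations over those scores gives the same answer as plugging the average SAT into the formula, which works here because the conditional expectation is linear in SAT.

```python
def expected_gpa(sat: float) -> float:
    """Conditional expectation E(GPA | SAT) from the problem."""
    return 0.70 + 0.002 * sat

# Hypothetical SAT scores, chosen only so that their average is 1,100.
sat_scores = [800, 1000, 1100, 1200, 1400]
mean_sat = sum(sat_scores) / len(sat_scores)

# Left side of the rule: average the conditional expectations over SAT.
mean_of_conditional = sum(expected_gpa(s) for s in sat_scores) / len(sat_scores)

# Shortcut: plug the average SAT into the conditional expectation.
gpa_at_mean_sat = expected_gpa(mean_sat)

print(mean_sat, round(mean_of_conditional, 2), round(gpa_at_mean_sat, 2))  # 1100.0 2.9 2.9
```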

(iii) If a student’s SAT score is 1,100, does this mean he or she will have the GPA found in part (ii)? Explain. Nope. Of course there is randomness involved here. Just because a student has an SAT score equal to 1100 doesn’t mean they have a GPA equal to 2.9. On average, a GPA of 2.9 is what we expect from someone with an 1100 SAT. But other things influence GPA, including work ethic, how much of a suckup the person is, etc.

C.1 Let Y1, Y2, Y3, and Y4 be independent, identically distributed random variables from a population with mean µ and variance $\sigma^2$. Let $\bar{Y} = \frac{1}{4}(Y_1 + Y_2 + Y_3 + Y_4)$ denote the average of these four random variables.

(i) What are the expected value and variance of $\bar{Y}$ in terms of µ and $\sigma^2$? $\bar{Y}$ is an estimator. $\bar{Y}$ is a variable built using building blocks that are random variables. Another way to say this is that $\bar{Y}$ is a function of random variables. This means that $\bar{Y}$ is a random variable. This is important to realize. If you don’t realize this fact, then this exercise seems very, very pointless. If you do realize this fact, then you notice that taking the expectation of $\bar{Y}$ has a point: the expectation of $\bar{Y}$ should be equal to the thing (the non-random thing) that you are trying to estimate. What are you trying to estimate when you calculate the sample average? The population average, µ.

$$E(\bar{Y}) = E\left(\frac{1}{4}\sum_i Y_i\right) = \frac{1}{4}\sum_i E(Y_i) = \frac{1}{4}\sum_i \mu = \frac{1}{4}\,4\mu = \mu$$


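Notice what we did. We started with one fact, that the expectation of each Yi was µ, and from this derived the expectation of $\bar{Y}$. Even if it seems intuitive to you that the expectation of $\bar{Y}$ is µ, you should realize upon reflection that there is no reason why this has to be true. We started out knowing the distribution of Y, not $\bar{Y}$. Now for the variance of the estimator $\bar{Y}$:

$$V(\bar{Y}) = V\left(\frac{1}{4}\sum_i Y_i\right) = \frac{1}{16}\,V\left(\sum_i Y_i\right) = \frac{1}{16}\sum_i V(Y_i) = \frac{1}{16}\,4\sigma^2 = \frac{1}{4}\sigma^2 = 0.25\sigma^2.$$

The third equality uses independence: the variance of a sum of independent random variables is the sum of their variances.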

The variance of $\bar{Y}$ characterizes the dispersion of $\bar{Y}$. If we take a bit of data and calculate the sample average, we expect to get something pretty close to µ. But we won’t get exactly µ every time. This is the nature of a random variable. Instead, we’ll (hopefully) get something close to µ. The variance of $\bar{Y}$ tells us something about how likely we are to get values far away from µ. You’ll see why this matters below.
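A quick simulation makes the point that the sample average is itself a random variable with its own mean and variance. This is only a sketch: the population here is assumed to be normal with µ = 10 and σ = 2, which are values I picked for illustration (the problem does not specify a distribution), and it needs Python 3.8+ for `statistics.fmean`.

```python
import random
from statistics import fmean

random.seed(0)          # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0   # hypothetical population mean and standard deviation
reps = 50_000           # number of simulated samples of size 4

# Each entry is one draw of the sample average of four independent Y's.
ybars = [fmean(random.gauss(mu, sigma) for _ in range(4)) for _ in range(reps)]

mean_ybar = fmean(ybars)
var_ybar = fmean((y - mean_ybar) ** 2 for y in ybars)

print(round(mean_ybar, 2))  # should land close to mu = 10
print(round(var_ybar, 2))   # should land close to sigma**2 / 4 = 1
```

The simulated averages bounce around µ, and their spread is about a quarter of the spread of a single draw, which is what the variance formula says.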

(ii) Now, consider a different estimator of µ:

$$W = \frac{1}{8}Y_1 + \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{2}Y_4.$$

This is an example of a weighted average of the Yi. Show that W is also an unbiased estimator of µ. Find the variance of W.

A weighted average is an estimator, just like the regular old average. A weighted average is (usually) meant to estimate the population average (µ), just like the regular old sample average is meant to estimate µ. So let’s use the same mechanics to find out if using a weighted average of our sample data is a good way to estimate µ.

$$E(W) = E\left(\frac{1}{8}Y_1 + \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{2}Y_4\right) = \frac{1}{8}\mu + \frac{1}{8}\mu + \frac{1}{4}\mu + \frac{1}{2}\mu = \mu$$

So far so good . . .

$$V(W) = V\left(\frac{1}{8}Y_1 + \frac{1}{8}Y_2 + \frac{1}{4}Y_3 + \frac{1}{2}Y_4\right) = \frac{1}{64}\sigma^2 + \frac{1}{64}\sigma^2 + \frac{1}{16}\sigma^2 + \frac{1}{4}\sigma^2 = \frac{22}{64}\sigma^2 = 0.34375\sigma^2.$$
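The mechanics are the same for any set of weights: with independent, identically distributed Yi, the expectation of a weighted sum is µ times the sum of the weights, and the variance is σ² times the sum of the squared weights. Here is a sketch in Python that does the bookkeeping with exact fractions; the helper name `mean_and_variance_factors` is hypothetical.

```python
from fractions import Fraction as Fr

def mean_and_variance_factors(weights):
    """For sum(w_i * Y_i) with i.i.d. Y_i, return (multiple of mu, multiple of sigma^2)."""
    return sum(weights), sum(w * w for w in weights)

ybar_weights = [Fr(1, 4)] * 4                         # the plain sample average
w_weights = [Fr(1, 8), Fr(1, 8), Fr(1, 4), Fr(1, 2)]  # the weighted average W

print(mean_and_variance_factors(ybar_weights))  # (Fraction(1, 1), Fraction(1, 4))
print(mean_and_variance_factors(w_weights))     # (Fraction(1, 1), Fraction(11, 32))
```

Both sets of weights sum to 1, so both estimators are unbiased; 11/32 is 0.34375, which is larger than 1/4.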

(iii) Based on your answers to parts (i) and (ii), which estimator of µ do you prefer, $\bar{Y}$ or W? Both estimators are unbiased, but the variance of the regular old sample average $\bar{Y}$ ($0.25\sigma^2$) is smaller than the variance of W ($0.34375\sigma^2$). So I like $\bar{Y}$ better. When estimating µ I would prefer an estimator that varies less around µ.
