Econometrics I Lecture 2: Statistics

Mohammad Vesal
Graduate School of Management and Economics
Sharif University of Technology
44716, Fall 1395

1 / 43

Outline

Preliminary definitions

Point estimators

Approaches to parameter estimation

Interval estimation and confidence intervals

Hypothesis testing

Reference: Wooldridge, Appendix C; Stock and Watson, Ch. 3; Greene, Appendix C (partly)

2 / 43

Definitions

• Population, sample, statistical inference

• Learning: estimation and hypothesis testing

• Example: returns to education
  - impractical and costly to survey the whole population
  - point estimate: 7.5 percent; interval estimate: 5.6–9.4 percent

• Example: neighborhood watch affects crime → hypothesis testing

3 / 43

Definitions (2)

• Identify the population of interest and then the relation of interest; the relations involve (features of) probability distributions.

• Y: a random variable on a population with pdf f(y; θ).

• Draw a sample from the population to learn what θ is.

• {Y1, ..., Yn} is a random sample from f(y; θ) if Y1, ..., Yn are independent with common pdf f(y; θ) (i.i.d.).

• Once the sample is drawn we have a set of numbers {y1, ..., yn}.

4 / 43

Example

• We want to know the average returns to education in Iran.

• Y: wage, X: years of schooling.

• We take a random sample of n individuals in Iran and ask about their wages and years of schooling.

• Before we fill out the questionnaire: {(X1, Y1), ..., (Xn, Yn)}; after, we have data: {(x1, y1), ..., (xn, yn)}.

• Joint pdf in the whole population: f_{X,Y}(x, y; θ), where θ is a set of parameters.

• Aim: learn something about θ from the sample.

• We might be interested in a subset of parameters: f_{Y|X}(y | X = x; β). A more specific example:

  E[Y | X = x] = β0 + β1 x

5 / 43

Outline

Preliminary definitions

Point estimators
  Finite sample properties
  Asymptotic (large sample) properties

Approaches to parameter estimation

Interval estimation and confidence intervals

Hypothesis testing

6 / 43

What is an estimator?

• Random sample {Y1, ..., Yn} from f(y; θ).

• (Point) estimator: a rule (function) that relates the observed values of the sample to a value for the parameter(s) of interest (θ):

  W = h(Y1, ..., Yn)

  - W is a random variable itself.
  - Given an actual sample {y1, ..., yn} we can calculate a point estimate for θ: w = h(y1, ..., yn).

• Sometimes I use θ̂ when talking about an estimator for θ.

• With a new sample, we get a new estimate for the parameter.

7 / 43

Example

• Say we are interested in knowing the population mean of a random variable Y ∼ N(µ, σ²).

• We draw a random sample {Y1, ..., Yn}.

• What are the potential estimators for µ? Anything like µ̂ = h(Y1, ..., Yn).
  - Natural candidate is the sample mean: µ̂ = Ȳ = (1/n) Σ_{i=1}^n Yi.
  - Weird candidate: use the first draw only: µ̂1 = Y1.

• How do we pick among the many possible estimators?

8 / 43
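As a concrete illustration that an estimator is just a rule applied to the sample, here is a minimal Python sketch (simulated data; the parameter values are made up):

```python
import numpy as np

# Two candidate estimators for mu, each a rule h(Y1, ..., Yn).
def mu_hat(sample):       # sample mean
    return sample.mean()

def mu_hat_1(sample):     # "weird" candidate: first draw only
    return sample[0]

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 1.0, 50                   # hypothetical population values

sample = rng.normal(mu, sigma, n)             # one realized sample {y1, ..., yn}
print(mu_hat(sample), mu_hat_1(sample))       # two point estimates of mu

new_sample = rng.normal(mu, sigma, n)         # a new sample ...
print(mu_hat(new_sample), mu_hat_1(new_sample))  # ... gives new estimates
```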

Finite sample properties of estimators

• Define criteria for a good estimator:
  - Unbiasedness
  - Relative efficiency

9 / 43

Unbiasedness

• An estimator W of θ is an unbiased estimator if E(W) = θ for all possible values of θ.
  - Interpretation: this is about the mean of the distribution of W, NOT the value you get for an observed sample (w).

• Define bias: Bias(W) ≡ E(W) − θ

• Is µ̂ = Ȳ an unbiased estimator of the population mean µ? What about µ̂1 = Y1?

• Some poor estimators are unbiased.

• Exercise: Is S² = (1/n) Σ_{i=1}^n (Yi − µ)² an unbiased estimator for σ²?

10 / 43
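A simulation can suggest the answer to the exercise before you prove it. The sketch below (illustrative Python with made-up parameters) approximates E(S²) for the µ-centered estimator on the slide and, for comparison, for the variant that replaces µ with Ȳ:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, reps = 0.0, 4.0, 20, 100_000   # hypothetical population values

s2_mu, s2_ybar = [], []
for _ in range(reps):
    y = rng.normal(mu, np.sqrt(sigma2), n)
    s2_mu.append(np.mean((y - mu) ** 2))          # centered on the known mu
    s2_ybar.append(np.mean((y - y.mean()) ** 2))  # centered on the sample mean

# The simulated mean of each estimator approximates its expectation;
# compare both with sigma^2 = 4 to judge (un)biasedness.
print(np.mean(s2_mu), np.mean(s2_ybar), sigma2)
```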

Relative efficiency

• Why only consider the mean of the distribution of W? The sampling variance of estimators could be important too! Var(W)?

• Relative efficiency: if W1 and W2 are two unbiased estimators of θ, W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ, with strict inequality for at least one θ.

• Example: Var(µ̂) vs. Var(µ̂1):

  Var(µ̂) = σ²/n < Var(µ̂1) = σ²  for n > 1

11 / 43
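The same comparison by simulation (a sketch with hypothetical values; both estimators center on µ, but their spreads differ):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 25, 100_000   # hypothetical values

samples = rng.normal(mu, sigma, (reps, n))
mu_hat = samples.mean(axis=1)    # sample mean in each replication
mu_hat1 = samples[:, 0]          # first draw only

print(mu_hat.var(), sigma**2 / n)   # ~ sigma^2 / n = 0.04
print(mu_hat1.var(), sigma**2)      # ~ sigma^2     = 1.00
```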

Unbiasedness vs. relative efficiency

[Figure omitted] Source: Wooldridge (2013)

12 / 43

Comparing two estimators

• What if one estimator is biased and the other is not? How do we compare them?

• Mean squared error (MSE):

  MSE(W) = E[(W − θ)²] = Var(W) + (Bias(W))²

13 / 43
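The decomposition follows by adding and subtracting E(W) inside the square; the cross term vanishes because E(W) − θ is a constant and E[W − E(W)] = 0:

```latex
\begin{aligned}
\mathrm{MSE}(W) &= E\!\left[(W-\theta)^2\right]
  = E\!\left[\big([W - E(W)] + [E(W) - \theta]\big)^2\right] \\
 &= \underbrace{E\!\left[(W - E(W))^2\right]}_{\mathrm{Var}(W)}
  + 2\,[E(W)-\theta]\,\underbrace{E[W - E(W)]}_{=0}
  + \underbrace{[E(W)-\theta]^2}_{(\mathrm{Bias}(W))^2} \\
 &= \mathrm{Var}(W) + (\mathrm{Bias}(W))^2 .
\end{aligned}
```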

Asymptotic properties

• How do the properties of estimators change as the sample size increases?

• How large is large?

14 / 43

Consistency

• Wn, an estimator of θ based on {Y1, ..., Yn}, is consistent if

  ∀ε > 0, Pr(|Wn − θ| > ε) → 0 as n → ∞

• Sometimes we write the above condition as plim Wn = θ.
  - If I require the estimator to be close enough to the parameter, I can find a large enough sample size that does this. Or: as the sample size increases, the probability of being far from θ drops to zero.

• Unbiased estimators are not necessarily consistent, but those with shrinking variance are: if E(Wn) = θ and Var(Wn) → 0 as n → ∞, then Wn is consistent.

15 / 43

LLN and CLT

• Law of Large Numbers (LLN): if Y1, ..., Yn are i.i.d. with mean µ, then plim(Ȳn) = µ.

• Asymptotic normality: Central Limit Theorem (CLT). If Y1, ..., Yn is a random sample with mean µ and variance σ², then Zn = (Ȳn − µ)/(σ/√n) has an asymptotic standard normal distribution.

16 / 43
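Both theorems are easy to see numerically. The sketch below (illustrative; it deliberately uses a skewed exponential population, so the Yi are far from normal) tracks Ȳn as n grows for the LLN, and the moments of Zn across many samples for the CLT:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = sigma = 1.0   # the exponential(1) distribution has mean 1 and std. dev. 1

# LLN: the sample mean approaches mu as n grows.
for n in (10, 1_000, 100_000):
    print(n, rng.exponential(mu, n).mean())

# CLT: Z_n = (Ybar_n - mu) / (sigma / sqrt(n)) is approximately N(0, 1).
n, reps = 500, 10_000
ybar = rng.exponential(mu, (reps, n)).mean(axis=1)
z = (ybar - mu) / (sigma / np.sqrt(n))
print(z.mean(), z.std())   # ~ 0 and ~ 1, despite the skewed population
```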

Approaches to parameter estimation

• Are there systematic methods for derivation of good estimators?

• We discuss three approaches briefly:
  - Method of moments
  - Maximum likelihood
  - Least squares

• But during the course we mostly rely on least squares.

17 / 43

Method of moments

• Remember the following moments for a random variable with a given distribution:

  µ = E(Y),  σ² = E[(Y − µ)²]

• A random sample with n observations is {Y1, ..., Yn}.

• A natural way to estimate the parameters µ and σ² is to replace the moment conditions with their sample counterparts:

  µ̂ = Ȳ = (1/n) Σ_{i=1}^n Yi,  σ̂² = (1/(n−1)) Σ_{i=1}^n (Yi − Ȳ)²

• A more general interpretation: we know some random variables must satisfy a few conditions in the population. We use the sample counterparts of these conditions to formulate method of moments estimators.

18 / 43
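In code the sample counterparts are one line each; a minimal sketch on simulated (hypothetical) data:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, 200)   # pretend this is the observed sample

mu_hat = y.mean()                                      # counterpart of E(Y)
sigma2_hat = ((y - mu_hat) ** 2).sum() / (len(y) - 1)  # counterpart of E[(Y - mu)^2]

print(mu_hat, sigma2_hat)   # method-of-moments estimates of mu and sigma^2
```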

Maximum likelihood

• For a sample {Y1, ..., Yn} we can define the likelihood function as follows:

  L(θ; Y1, ..., Yn) = f(Y1, ..., Yn; θ)

  where f(·) is the joint pdf for the random variables Yi with a vector of unknown parameters θ.

• Maximum likelihood (ML) suggests that a good estimator for θ maximizes the likelihood function.

• Intuition: L(θ; y1, ..., yn) gives the probability of observing a given realization of the sample as a function of θ. ML says a good estimator picks values for θ such that the probability of observing the current sample is maximized.

• Under random sampling and assuming a pdf for Y this could be a fruitful method.

• Example: random sample and Y ∼ N(µ, σ²). Find the ML estimators for µ and σ².

19 / 43
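For this example the ML estimators have closed forms, µ̂ = Ȳ and σ̂² = (1/n) Σ (Yi − Ȳ)². The sketch below (an illustration, not a full solution of the exercise) checks them by maximizing the log-likelihood numerically with scipy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
y = rng.normal(5.0, 2.0, 500)   # simulated sample with mu = 5, sigma = 2

def neg_loglik(params):
    mu, log_sigma = params      # optimize log(sigma) so sigma stays positive
    return -norm.logpdf(y, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_loglik, x0=[0.0, 0.0])
mu_ml, sigma2_ml = res.x[0], np.exp(res.x[1]) ** 2

# Closed-form ML answers for comparison; note the 1/n (not 1/(n-1)) divisor.
print(mu_ml, y.mean())
print(sigma2_ml, ((y - y.mean()) ** 2).mean())
```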

Least squares

• Consider the following decomposition:

  Yi = E[Yi | Xi = xi] + ui

  where ui = Yi − E[Yi | Xi = xi].

• Also consider a functional form for the conditional mean: E[Yi | Xi = xi] = g(xi; θ).

• One way to assess the goodness of g(·) is to see how far we are on average from the observed values of Y:

  l(θ) = Σ_{i=1}^n (Yi − g(xi; θ))²

• The LS estimator is θ̂ = argmin l(θ).

• Example: assume g(xi; β0, β1) = β0 + β1 xi; find the LS estimators for β0 and β1.

20 / 43
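For the linear example, minimizing l(β0, β1) yields the familiar closed forms β̂1 = Σ(xi − x̄)(Yi − Ȳ) / Σ(xi − x̄)² and β̂0 = Ȳ − β̂1 x̄; a sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)   # true beta0 = 1, beta1 = 0.5

# Closed-form least-squares solution for g(x; b0, b1) = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)

print(np.polyfit(x, y, 1))   # numpy's least squares returns the same [b1, b0]
```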

Outline

Preliminary definitions

Point estimators

Approaches to parameter estimation

Interval estimation and confidence intervals

Hypothesis testing

21 / 43

Point vs. interval estimation

• A point estimate doesn't provide information about how close the estimate is likely to be to the parameter of interest.

• Point estimate for returns to education: 7 percent.
  - We cannot know for certain how close this is to the population parameter, but we can make probabilistic claims.

• Interval estimate: 3–11 percent.
  - Build a confidence interval.

22 / 43

Interval estimate for the mean

• Random sample {Y1, ..., Yn} from a population with N(µ, σ²).

• Use the sample mean to estimate µ:

  Ȳ ∼ N(µ, σ²/n)  ⇒  (Ȳ − µ)/(σ/√n) ∼ N(0, 1)

• From the standard normal distribution we know

  Pr(−1.96 < (Ȳ − µ)/(σ/√n) < +1.96) = 0.95

  which suggests that with 95 percent probability the interval [Ȳ − 1.96 σ/√n, Ȳ + 1.96 σ/√n] contains µ.

• Interval estimate for µ: [ȳ − 1.96 σ/√n, ȳ + 1.96 σ/√n].

23 / 43

100(1 − α)% confidence interval

• We considered 95 percent confidence intervals, but the concept is more general.

• Let c_{α/2} denote the 100(1 − α/2) percentile of the standard normal distribution.

• The 100(1 − α)% confidence interval around the mean is obtained as

  [ȳ − c_{α/2} σ/√n, ȳ + c_{α/2} σ/√n]

• Notice σ/√n is the standard deviation of Ȳ.

• Numerical example: σ = 1, n = 100, ȳ = 0.4.
  - 95% conf. int. (c_{2.5} = 1.96): [0.4 − 1.96 × 1/10, 0.4 + 1.96 × 1/10] ⇒ [0.204, 0.596]
  - 99% conf. int. (c_{0.5} = 2.575): [0.4 − 2.575 × 1/10, 0.4 + 2.575 × 1/10] ⇒ [0.1425, 0.6575]

24 / 43
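The numerical example can be reproduced directly; a short sketch using scipy for the normal percentiles:

```python
import numpy as np
from scipy.stats import norm

sigma, n, ybar = 1.0, 100, 0.4
se = sigma / np.sqrt(n)           # standard deviation of Ybar

for alpha in (0.05, 0.01):
    c = norm.ppf(1 - alpha / 2)   # c_{alpha/2}: 1.96 and about 2.576
    print(alpha, (ybar - c * se, ybar + c * se))
# alpha = 0.05 -> (0.204, 0.596);  alpha = 0.01 -> (~0.142, ~0.658)
```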

Meaning of a confidence interval

• Is it right to say: the probability that µ is in the calculated confidence interval is 95 percent?
  - No! For 95 percent of all random samples, the confidence interval contains µ!

25 / 43
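This repeated-sampling interpretation is easy to demonstrate by simulation; a sketch with hypothetical parameter values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 100, 10_000   # hypothetical values
se = sigma / np.sqrt(n)

covered = 0
for _ in range(reps):
    ybar = rng.normal(mu, sigma, n).mean()
    # Each sample yields a different interval; mu is fixed throughout.
    if ybar - 1.96 * se < mu < ybar + 1.96 * se:
        covered += 1

print(covered / reps)   # ~ 0.95: the probability statement is about the intervals
```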

Confidence interval with unknown variance

• If Y ∼ N(µ, σ²) with known σ, then (Ȳ − µ)/(σ/√n) ∼ N(0, 1).

• But if σ is unknown, then we need to estimate it, e.g. with S = √[(1/(n−1)) Σ_{i=1}^n (Yi − Ȳ)²].

• Therefore (Ȳ − µ)/(S/√n) ∼ t_{n−1}.

• Let c be the 97.5th percentile of the t_{n−1} distribution; then [Ȳ − c S/√n, Ȳ + c S/√n] contains µ with 95 percent probability.

• Once again S/√n is a point estimator for the standard deviation of Ȳ. This is usually referred to as the standard error of the point estimate.

26 / 43
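With σ estimated, only the percentile source changes; a sketch using scipy's t distribution on simulated data:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(0)
y = rng.normal(2.0, 1.0, 15)    # small simulated sample
n, ybar = len(y), y.mean()

s = y.std(ddof=1)               # S, with the n - 1 divisor
c = t.ppf(0.975, df=n - 1)      # 97.5th percentile of t_{n-1}; > 1.96 for small n
print(c, (ybar - c * s / np.sqrt(n), ybar + c * s / np.sqrt(n)))
```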

Confidence interval with a non-normal distribution

• If Y ∼ (µ, σ²) but the distribution is non-normal, then Ȳ is still an estimator for µ, but what is the distribution of Ȳ?

• CLT: if the Yi ∼ (µ, σ²) are i.i.d. random variables, then (Ȳ − µ)/(σ/√n) is standard normal when n is large.
  - We can use S as an estimator for σ and the theorem still holds.
  - We build asymptotic confidence intervals.

27 / 43

Racial discrimination 1988 Washington D.C.

• What is the extent of racial discrimination in hiring?

• Study: 5 pairs of black and white applicants (identical in all other aspects); data from 241 employers.
  - We observe whether each applicant receives a job offer.
  - Object of interest: θB − θW, where θr denotes the probability of receiving a job offer for race r.

• Bi = 1 if the black applicant gets an offer from employer i.

• Wi = 1 if the white applicant gets an offer from employer i.

• B̄ and W̄ are unbiased estimators for θB and θW.

• Define Yi = Bi − Wi; then E[Ȳ] = θB − θW. Is Yi normal?

• From the data we learn: ȳ = 0.224 − 0.357 = −0.133.

• 95% asymptotic conf. int.:

  [−0.133 − 1.96 × 0.482/√241, −0.133 + 1.96 × 0.482/√241] ⇒ [−0.194, −0.072]

28 / 43
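The interval arithmetic checks out in a few lines (inputs as reported on the slide):

```python
import numpy as np

ybar = 0.224 - 0.357      # mean of Y_i = B_i - W_i
s, n = 0.482, 241         # sample std. dev. of Y_i and number of employers
se = s / np.sqrt(n)

print(ybar, (ybar - 1.96 * se, ybar + 1.96 * se))
# -0.133, (~ -0.194, ~ -0.072): zero is well outside the interval
```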

Outline

Preliminary definitions

Point estimators

Approaches to parameter estimation

Interval estimation and confidence intervals

Hypothesis testing

29 / 43

Why formulate a hypothesis test?

• In estimation, we looked at the magnitude of an effect of interest.

• Sometimes we may want to answer a question, e.g.
  - Did subsidy reform increase inflation?
  - Does strict labor regulation increase wages?

• Statistical significance vs. practical significance.

30 / 43

Example 1 - returns to construction

• Someone tells you the average return to construction of residential buildings in Tehran is r = 30 percent.

• You want to test if this is a valid claim.

• Collect data on costs and revenues of 100 construction projects and calculate r̄ = 0.20.
  - Is this enough to claim the average return is not 0.3? We need to assess the strength of the evidence.
  - Is finding r̄ = 0.1 stronger evidence against the claim?
  - Is finding r̄ = 0.2 in a sample of 10,000 projects stronger evidence against the claim?

• Given our sample, what is the probability that r = 0.3?

• Null hypothesis H0: r = 0.3; alternative hypothesis H1: r ≠ 0.3.

31 / 43

Types of mistakes

• Type I and type II errors:
  - Type I: reject H0 when it is true.
  - Type II: fail to reject H0 when it is false.

• Reality vs. statistical rejection.

[Figure omitted] Source: Introduction to hypothesis testing

32 / 43

Significance and power

• Significance level (size) of a test: probability of a type I error,

  α = Pr(Reject H0 | H0)

  - This shows how concerned you are about falsely rejecting H0.

• Power of a test: 1 − probability of a type II error,

  π(r) = Pr(Reject H0 | r) = 1 − Pr(do NOT reject H0 | r)

  - This is a function of the actual value of the parameter!

• Routine: we pick a significance level (α), then try to maximize the power π(r).

33 / 43

Test statistic

• A test statistic T is a random variable built from the random sample (like an estimator!).
  - For a given draw we calculate the value of this random variable and denote it by t.

• Given a test statistic we need a rejection rule to decide when to reject H0 in favor of H1.
  - Simple rule: compare t to a critical value c.
  - Values of t that lead to rejection of H0 are called the rejection region.
  - p-value: given the calculated t, what is the largest significance level at which we still fail to reject H0?
  - Choosing α (significance level) will pin down c.

34 / 43

Example 1 - cont.

• H0: r = 0.3 vs. H1: r ≠ 0.3

• Since this is a hypothesis about the mean, let's use the standardized sample average as a test statistic:

  T = (Ȳ − r)/(σ/√n)

  - If Yi ∼ i.i.d. N(r, σ²) with known σ, then T ∼ N(0, 1).

• Pick significance level: α = 5%.

• What is the critical value c? α = Pr(|T| > c | H0) ⇒ c = z_{α/2} = 1.96.

• Reject H0 if |t| > 1.96 (rejection region).

35 / 43

Example 1 - with numbers

• Say σ = 1 and for a given sample (n = 100) we calculated ȳ = 0.2. Do we reject H0: r = 0.3 in favor of H1: r ≠ 0.3?

  t = (0.2 − 0.3)/(1/√100) = −1 ⇒ |t| < 1.96 ⇒ we don't reject H0.

• What if the sample size was n = 10,000?

  t = −10 ⇒ |t| > 1.96 ⇒ reject H0.

• What if n = 100 but ȳ = 0.1?

  t = −2 ⇒ |t| > 1.96 ⇒ reject H0.

36 / 43
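The three scenarios reduce to one line each; a minimal sketch:

```python
import numpy as np

def z_stat(ybar, r0, sigma, n):
    # Test statistic T = (Ybar - r0) / (sigma / sqrt(n)) under H0: r = r0
    return (ybar - r0) / (sigma / np.sqrt(n))

for ybar, n in [(0.2, 100), (0.2, 10_000), (0.1, 100)]:
    t = z_stat(ybar, r0=0.3, sigma=1.0, n=n)
    verdict = "reject H0" if abs(t) > 1.96 else "don't reject H0"
    print(ybar, n, round(t, 2), verdict)
# t = -1 (don't reject), t = -10 (reject), t = -2 (reject)
```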

Rejection region

[Figure omitted]

37 / 43

How to calculate power

• So far we focused on the type I error.

• How do we calculate the type II error? Alternatively, how do we calculate power (1 − type II error)?

• Power: what is the probability that we reject H0 if the true value of the parameter is r1?

  π(r1) = Pr(Reject H0 | r = r1) = Pr(|T| > c | r = r1)
        = Φ(−c_{α/2} − (r1 − r0)/(σ/√n)) + 1 − Φ(c_{α/2} − (r1 − r0)/(σ/√n))

• Graphical calculation is nice.

38 / 43
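The power formula translates directly into code; a sketch for the setup of example 1 (σ = 1, n = 100, r0 = 0.3, α = 5%):

```python
import numpy as np
from scipy.stats import norm

def power(r1, r0=0.3, sigma=1.0, n=100, alpha=0.05):
    c = norm.ppf(1 - alpha / 2)                # c_{alpha/2} = 1.96
    delta = (r1 - r0) / (sigma / np.sqrt(n))   # shift of T when r = r1
    return norm.cdf(-c - delta) + 1 - norm.cdf(c - delta)

for r1 in (0.3, 0.25, 0.2, 0.1):
    print(r1, round(power(r1), 3))
# Power equals alpha at r1 = r0 and rises toward 1 as |r1 - r0| grows.
```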

Graphical representation of the power function

[Figure omitted]

39 / 43

Some desirable features of a test

• Unbiased test: the power is greater than or equal to the significance level of the test for all values of the parameter.

• Test consistency: the power goes to one as the sample size goes to infinity.

• Do you think our test statistic in example 1 delivers an unbiased and consistent test?
  - Remember Ȳ was a consistent estimator for r; it seems the test based on it should also be a consistent test!

• We can also pick among various test statistics based on the properties of their power functions.

40 / 43

Unknown σ / non-normal distributions

• If σ is unknown, then we use S = √[(1/(n−1)) Σ (Yi − Ȳ)²] instead; this means

  T = (Ȳ − r)/(S/√n) ∼ t_{n−1}

  - Must choose c based on percentiles of t_{n−1}, but if n is large this won't be that different from the standard normal.

• If Yi ∼ D(µ, σ²) and we can derive the distribution of T, then use percentiles of f(t).

• If Yi ∼ (µ, σ²) but the distribution is unknown, then we need the CLT to build an asymptotic test:

  T = (Ȳ − r)/(S/√n) ∼ N(0, 1) as n → ∞

• The principles remain the same!

41 / 43

Confidence intervals vs. hypothesis testing

• Confidence intervals are just the complement of the rejection region.
  - If a value of the parameter (r0) falls in the 95% confidence interval around the estimated mean, then we will not reject H0: r = r0 against H1: r ≠ r0.

• Note many values fall in the confidence interval, and therefore many nulls won't be rejected!! That's why we don't say we accept H0.
  - Reject H0 OR fail to reject H0; "accept H0" is nonsense!

42 / 43

Summary

• In this topic we learned:
  - What a (point) estimator is and what desirable finite-sample and asymptotic properties it may possess.
  - What interval estimation is and how we build confidence intervals around a point estimate.
  - How to formulate a hypothesis and conduct a hypothesis test.

43 / 43