Econometrics I — Lecture 2: Statistics
Mohammad Vesal
Graduate School of Management and Economics, Sharif University of Technology
44716, Fall 1395
Outline
• Preliminary definitions
• Point estimators
• Approaches to parameter estimation
• Interval estimation and confidence intervals
• Hypothesis testing

Reference: Wooldridge, Appendix C; Stock and Watson, Ch. 3; Greene, Appendix C (partly)
Definitions
• Population, sample, statistical inference
• Learning: estimation and hypothesis testing
• Example: returns to education
  - Impractical and costly to survey the whole population
  - Point estimate: 7.5 percent
  - Interval estimate: 5.6–9.4 percent
• Example: does neighborhood watch affect crime? → hypothesis testing
Definitions (2)
• Identify the population of interest and then the relation of interest.
  - The relations involve (features of) probability distributions.
• Y is a random variable on a population with pdf f(y; θ). Draw a sample from the population to learn what θ is.
• {Y1, ..., Yn} is a random sample from f(y; θ) if Y1, ..., Yn are independent with common pdf f(y; θ) (i.i.d.).
  - Once the sample is drawn we have a set of numbers {y1, ..., yn}.
Example
• We want to know the average returns to education in Iran.
• Y: wage, X: years of schooling
• We take a random sample of n individuals in Iran and ask about their wages and years of schooling.
• Before we fill out the questionnaire: {(X1, Y1), ..., (Xn, Yn)}; after, we have data: {(x1, y1), ..., (xn, yn)}.
• Joint pdf in the whole population: f_{X,Y}(x, y; θ), where θ is a set of parameters.
• Aim: learn something about θ from the sample.
• We might be interested in a subset of parameters: f_{Y|X}(y | X = x; β)
  - A more specific example: E[Y | X = x] = β0 + β1 x
Outline
• Preliminary definitions
• Point estimators
  - Finite sample properties
  - Asymptotic (large sample) properties
• Approaches to parameter estimation
• Interval estimation and confidence intervals
• Hypothesis testing
What is an estimator?
• Random sample {Y1, ..., Yn} from f(y; θ).
• (Point) estimator: a rule (function) that relates the observed values of the sample to a value for the parameter(s) of interest (θ):
  W = h(Y1, ..., Yn)
  - W is a random variable itself.
  - Given an actual sample {y1, ..., yn} we can calculate a point estimate for θ: w = h(y1, ..., yn).
• Sometimes I use θ̂ when talking about an estimator for θ.
• With a new sample, we get a new estimate for the parameter.
Example
• Say we are interested in knowing the population mean for a random variable Y ∼ N(µ, σ²).
• We draw a random sample {Y1, ..., Yn}.
• What are the potential estimators for µ? Anything like µ̂ = h(Y1, ..., Yn).
  - Natural candidate is the sample mean: µ̂ = Ȳ = (1/n) Σ Yi.
  - Weird candidate: use the first draw only: µ̂1 = Y1.
• How do we pick among the many possible estimators?
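A quick simulation (a sketch, not from the slides; the distribution and parameter values are illustrative) makes the contrast between the two candidates concrete: both are centered at µ, but the sample mean is far less dispersed.

```python
import numpy as np

# Compare the sample mean with the "first draw only" estimator
# for mu when Y ~ N(mu, sigma^2), over many repeated samples.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 1.0, 50, 20_000

samples = rng.normal(mu, sigma, size=(reps, n))
mu_hat = samples.mean(axis=1)   # sample mean, one estimate per replication
mu_hat1 = samples[:, 0]         # first observation only

# Both are centered near mu = 2.0, but the sample mean's variance
# is roughly sigma^2/n while the first draw's is sigma^2.
print(mu_hat.mean(), mu_hat1.mean())
print(mu_hat.var(), mu_hat1.var())
```

This previews the efficiency comparison a few slides ahead: Var(µ̂) = σ²/n against Var(µ̂1) = σ².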
Finite sample properties of estimators
• Define criteria for a good estimator:
  - Unbiasedness
  - Relative efficiency
Unbiasedness
• An estimator W of θ is an unbiased estimator if E(W) = θ for all possible values of θ.
  - Interpretation: this is about the mean of the distribution of W, NOT the value you get for an observed sample (w).
• Define bias: Bias(W) ≡ E(W) − θ
• Is µ̂ = Ȳ an unbiased estimator of the population mean µ? What about µ̂1 = Y1?
• Some poor estimators are unbiased.
• Exercise: Is S² = (1/n) Σ (Yi − µ)² an unbiased estimator for σ²?
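A simulation sketch related to the exercise (the parameter values are illustrative): with the known mean µ replaced by the sample mean Ȳ, dividing by n gives a downward-biased estimator of σ², which is why the later slides divide by n − 1.

```python
import numpy as np

# Bias check: variance estimators built around Ybar, dividing by n vs n-1.
rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 100_000

y = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
ybar = y.mean(axis=1, keepdims=True)

s2_n = ((y - ybar) ** 2).sum(axis=1) / n          # divide by n
s2_n1 = ((y - ybar) ** 2).sum(axis=1) / (n - 1)   # divide by n - 1

# E[s2_n] = (n-1)/n * sigma^2 = 3.6, while E[s2_n1] = sigma^2 = 4.0
print(s2_n.mean(), s2_n1.mean())
```

The bias factor (n − 1)/n vanishes as n grows, which anticipates the distinction between unbiasedness and consistency below.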
Relative efficiency
• Why only consider the mean of the distribution of W? The sampling variance of estimators, Var(W), could be important too!
• Relative efficiency: if W1 and W2 are two unbiased estimators of θ, W1 is efficient relative to W2 when Var(W1) ≤ Var(W2) for all θ, with strict inequality for at least one θ.
• Example: Var(µ̂) vs. Var(µ̂1):
  Var(µ̂) = σ²/n < Var(µ̂1) = σ²  for n > 1
Unbiasedness vs. relative efficiency
Source: Wooldridge (2013)
Comparing two estimators
• What if one estimator is biased and the other is not? How do we compare them?
• Mean squared error (MSE):
  MSE(W) = E[(W − θ)²] = Var(W) + (Bias(W))²
Asymptotic properties
• How do the properties of estimators change as the sample size increases?
• How large is large?
Consistency
• Wn, an estimator of θ based on {Y1, ..., Yn}, is consistent if
  ∀ε > 0, Pr(|Wn − θ| > ε) → 0 as n → ∞
• Sometimes we write the above condition as plim Wn = θ.
  - If I require the estimator to be close enough to the parameter, I can find a large enough sample size that does this. Or: as the sample size increases, the probability of being far from θ drops to zero.
• Unbiased estimators are not necessarily consistent, but those with shrinking variance are:
  - If E(Wn) = θ and Var(Wn) → 0 as n → ∞, then Wn is consistent.
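The definition can be watched at work in a small simulation (a sketch with illustrative values): for a fixed ε, the probability that the sample mean lands more than ε from µ shrinks as n grows.

```python
import numpy as np

# Consistency of the sample mean: Pr(|Ybar_n - mu| > eps) falls with n.
rng = np.random.default_rng(2)
mu, eps, reps = 0.0, 0.1, 5_000

probs = {}
for n in (10, 100, 1000):
    ybar = rng.normal(mu, 1.0, size=(reps, n)).mean(axis=1)
    probs[n] = np.mean(np.abs(ybar - mu) > eps)

print(probs)  # the probabilities fall toward zero as n increases
```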
LLN and CLT
• Law of Large Numbers (LLN): if Y1, ..., Yn are i.i.d. with mean µ, then plim(Ȳn) = µ.
• Asymptotic normality
• Central Limit Theorem (CLT): if Y1, ..., Yn is a random sample with mean µ and variance σ², then Zn = (Ȳn − µ)/(σ/√n) has an asymptotic standard normal distribution.
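A CLT sketch (illustrative, not from the slides): even when the parent distribution is strongly skewed, the standardized sample mean Zn behaves approximately like a standard normal for moderate n. Here the parent is exponential with rate 1, so µ = σ = 1.

```python
import numpy as np

# CLT demonstration with a decidedly non-normal parent distribution.
rng = np.random.default_rng(3)
n, reps = 500, 10_000

y = rng.exponential(1.0, size=(reps, n))
z = (y.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))  # (Ybar - mu)/(sigma/sqrt(n))

# z should look standard normal: mean ~ 0, variance ~ 1,
# and about 95% of draws inside +/- 1.96.
print(z.mean(), z.var(), np.mean(np.abs(z) < 1.96))
```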
Approaches to parameter estimation
• Are there systematic methods for derivation of good estimators?
• We discuss three approaches briefly:
  - Method of moments
  - Maximum likelihood
  - Least squares
• But during the course we mostly rely on least squares.
Method of moments
• Remember the following moments for a random variable with a given distribution:
  µ = E(Y),  σ² = E[(Y − µ)²]
• A random sample with n observations is {Y1, ..., Yn}.
• A natural way to estimate the parameters µ and σ² is to replace the moment conditions with their sample counterparts:
  µ̂ = Ȳ = (1/n) Σ Yi,   σ̂² = (1/(n − 1)) Σ (Yi − Ȳ)²
• A more general interpretation:
  - We know some random variables must satisfy a few conditions in the population. We use the sample counterparts of these conditions to formulate method of moments estimators.
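In code, the method of moments for this case is just the slide's two sample counterparts evaluated on data (a sketch with illustrative parameter values):

```python
import numpy as np

# Method-of-moments estimates of mu and sigma^2 from one simulated sample.
rng = np.random.default_rng(4)
mu, sigma2, n = 5.0, 9.0, 10_000
y = rng.normal(mu, np.sqrt(sigma2), size=n)

mu_hat = y.mean()                                 # sample counterpart of E(Y)
sigma2_hat = ((y - mu_hat) ** 2).sum() / (n - 1)  # counterpart of E[(Y - mu)^2]

print(mu_hat, sigma2_hat)  # close to 5.0 and 9.0
```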
Maximum likelihood
• For a sample {Y1, ..., Yn} we can define the likelihood function as follows:
  L(θ; Y1, ..., Yn) = f(Y1, ..., Yn; θ)
  where f(·) is the joint pdf for the random variables Yi with a vector of unknown parameters θ.
• Maximum likelihood (ML) suggests a good estimator for θ maximizes the likelihood function.
• Intuition: L(θ; y1, ..., yn) gives the probability of observing a given realization for the sample as a function of θ. ML says a good estimator picks values for θ such that the probability of observing the current sample is maximized.
• Under random sampling and assuming a pdf for Y this could be a fruitful method.
• Example: random sample and Y ∼ N(µ, σ²). Find the ML estimators for µ and σ².
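For the normal example, the ML estimators turn out to have closed forms: µ̂ = Ȳ and σ̂² = (1/n) Σ (Yi − Ȳ)². A sketch (illustrative values) checks numerically that these closed forms do beat nearby parameter values in log-likelihood:

```python
import numpy as np

# Closed-form normal MLEs, verified against the log-likelihood they maximize.
rng = np.random.default_rng(5)
y = rng.normal(2.0, 1.5, size=1_000)
n = y.size

def loglik(mu, sigma2):
    """Normal log-likelihood of the sample at (mu, sigma2)."""
    return -n / 2 * np.log(2 * np.pi * sigma2) - ((y - mu) ** 2).sum() / (2 * sigma2)

mu_mle = y.mean()
sigma2_mle = ((y - y.mean()) ** 2).sum() / n

# The closed forms should dominate nearby candidate values.
print(loglik(mu_mle, sigma2_mle) > loglik(mu_mle + 0.1, sigma2_mle))
print(loglik(mu_mle, sigma2_mle) > loglik(mu_mle, sigma2_mle + 0.1))
```

Note the 1/n (not 1/(n − 1)) in σ̂²: the ML variance estimator is biased in finite samples but consistent, tying back to the earlier discussion.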
Least squares
• Consider the following decomposition:
  Yi = E[Yi | Xi = xi] + ui
  where ui = Yi − E[Yi | Xi = xi].
• Also consider a functional form for the conditional expectation:
  E[Yi | Xi = xi] = g(xi; θ)
• One way to assess the goodness of g(·) is to see how far we are on average from the observed values for Y:
  l(θ) = Σ (Yi − g(xi; θ))²
• The LS estimator is θ̂ = argmin l(θ).
• Example: assume g(xi; β0, β1) = β0 + β1 xi; find the LS estimators for β0 and β1.
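For the linear example, minimizing l(β0, β1) gives the familiar closed forms β̂1 = sample cov(x, y) / sample var(x) and β̂0 = ȳ − β̂1 x̄. A sketch with simulated data (illustrative parameter values):

```python
import numpy as np

# Least-squares estimates for the linear example, via the closed forms.
rng = np.random.default_rng(6)
n, beta0, beta1 = 1_000, 1.0, 0.5
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, 1, size=n)

b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0_hat = y.mean() - b1_hat * x.mean()

print(b0_hat, b1_hat)  # close to 1.0 and 0.5
```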
Outline
• Preliminary definitions
• Point estimators
• Approaches to parameter estimation
• Interval estimation and confidence intervals
• Hypothesis testing
Point vs. interval estimation
• A point estimate doesn't provide information about how close the estimate is likely to be to the parameter of interest.
• Point estimate for returns to education: 7 percent
  - We cannot know for certain how close this is to the population parameter. But we can make probabilistic claims.
• Interval estimate: 3–11 percent
  - Build a confidence interval.
Interval estimate for the mean
• Random sample {Y1, ..., Yn} from a population with N(µ, σ²).
• Use the sample mean to estimate µ:
  Ȳ ∼ N(µ, σ²/n)  ⇒  (Ȳ − µ)/(σ/√n) ∼ N(0, 1)
• From the standard normal distribution we know
  Pr(−1.96 < (Ȳ − µ)/(σ/√n) < +1.96) = 0.95
• which suggests that with 95 percent probability the interval
  [Ȳ − 1.96 σ/√n, Ȳ + 1.96 σ/√n]
  contains µ.
• Interval estimate for µ: [ȳ − 1.96 σ/√n, ȳ + 1.96 σ/√n].
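The coverage claim is about the random interval, and a simulation sketch (illustrative values) shows it directly: over many repeated samples, the interval captures µ about 95 percent of the time.

```python
import numpy as np

# Coverage of the known-sigma 95% interval over repeated normal samples.
rng = np.random.default_rng(7)
mu, sigma, n, reps = 3.0, 2.0, 25, 20_000

ybar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
half = 1.96 * sigma / np.sqrt(n)   # half-width of the interval
covered = np.mean((ybar - half < mu) & (mu < ybar + half))

print(covered)  # close to 0.95
```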
100(1 − α)% confidence interval
• We considered 95 percent confidence intervals, but the concept is more general.
• Let cα/2 denote the 100(1 − α/2) percentile of the standard normal distribution.
• The 100(1 − α)% confidence interval around the mean is obtained as
  [ȳ − cα/2 σ/√n, ȳ + cα/2 σ/√n]
• Notice σ/√n is the standard deviation of Ȳ.
• Numerical example: σ = 1, n = 100, ȳ = 0.4
  - 95% conf. int. (c2.5 = 1.96): [0.4 − 1.96 × 1/10, 0.4 + 1.96 × 1/10] ⇒ [0.204, 0.596]
  - 99% conf. int. (c0.5 = 2.575): [0.4 − 2.575 × 1/10, 0.4 + 2.575 × 1/10] ⇒ [0.1425, 0.6575]
Meaning of a confidence interval
• Is it right to say: the probability that µ is in the calculated confidence interval is 95 percent?
  - No! For 95 percent of all random samples, the confidence interval contains µ!
Confidence interval with unknown variance
• If Y ∼ N(µ, σ²) with known σ, then (Ȳ − µ)/(σ/√n) ∼ N(0, 1).
• But if σ is unknown then we need to estimate it, e.g. with
  S = √[(1/(n − 1)) Σ (Yi − Ȳ)²]
  Therefore, (Ȳ − µ)/(S/√n) ∼ t_{n−1}.
• Let c be the 97.5th percentile of the t_{n−1} distribution; then
  [Ȳ − c S/√n, Ȳ + c S/√n]
  contains µ with 95 percent probability.
• Once again S/√n is a point estimator for the standard deviation of Ȳ. This is usually referred to as the standard error of the point estimate.
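A sketch of the unknown-σ interval on one simulated sample (illustrative values; the critical value 2.093 for t with 19 degrees of freedom is taken from a standard t table):

```python
import numpy as np

# 95% CI for mu with sigma unknown: estimate S and use a t critical value.
rng = np.random.default_rng(8)
y = rng.normal(10.0, 3.0, size=20)
n = y.size

ybar = y.mean()
s = np.sqrt(((y - ybar) ** 2).sum() / (n - 1))  # S, the sample std dev
se = s / np.sqrt(n)                             # standard error of Ybar
c = 2.093                                       # 97.5th pctile of t_19

ci = (ybar - c * se, ybar + c * se)
print(ci)
```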
Confidence interval with non-normal distribution
• If Y ∼ (µ, σ²) but the distribution is non-normal, then Ȳ is still an estimator for µ, but what is the distribution of Ȳ?
• CLT: if Yi ∼ (µ, σ²) are i.i.d. random variables, then (Ȳ − µ)/(σ/√n) is standard normal when n is large.
  - We can use S as an estimator for σ and the theorem still holds.
  - We build asymptotic confidence intervals.
Racial discrimination — 1988, Washington D.C.
• What is the extent of racial discrimination in hiring?
• Study: 5 pairs of black and white applicants (identical in all other aspects).
  - Observe if applicants receive a job offer.
  - Object of interest: θB − θW, where θr indicates the probability of receiving a job offer for race r.
• Bi = 1 if the black person gets an offer from employer i
• Wi = 1 if the white person gets an offer from employer i
• B̄ and W̄ are unbiased estimators for θB and θW.
  - Define Yi = Bi − Wi; then E[Ȳ] = θB − θW. Is Yi normal?
• From the data we learn: ȳ = 0.224 − 0.357 = −0.133
• 95% asymptotic conf. int.:
  [−0.133 − 1.96 × 0.482/√241, −0.133 + 1.96 × 0.482/√241] → [−0.194, −0.072]
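The slide's interval can be recomputed directly from the reported summary numbers (ȳ = −0.133, s = 0.482, n = 241):

```python
import numpy as np

# Asymptotic 95% CI for theta_B - theta_W from the reported summaries.
ybar, s, n = -0.133, 0.482, 241
se = s / np.sqrt(n)                       # standard error of ybar
ci = (ybar - 1.96 * se, ybar + 1.96 * se)

print(ci)  # roughly (-0.194, -0.072)
```

Zero lies well outside the interval, which is the evidence of discrimination in this example.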
Outline
• Preliminary definitions
• Point estimators
• Approaches to parameter estimation
• Interval estimation and confidence intervals
• Hypothesis testing
Why formulate a hypothesis test?
• In estimation, we looked at the magnitude of an effect of interest.
• Sometimes we may want to answer a question, e.g.
  - Did subsidy reform increase inflation?
  - Does strict labor regulation increase wages?
• Statistical significance vs. practical significance
Example 1 — returns to construction
• Someone tells you the average return to construction of residential buildings in Tehran is r = 30 percent.
• You want to test if this is a valid claim.
• Collect data on cost and revenue of 100 construction projects and calculate r̄ = 0.20.
  - Is this enough to claim the average return is not 0.3? We need to assess the strength of the evidence.
  - Is finding r̄ = 0.1 stronger evidence against the claim?
  - Is finding r̄ = 0.2 in a sample of 10,000 projects stronger evidence against the claim?
• Given our sample, what is the probability that r = 0.3?
• Null hypothesis H0: r = 0.3; alternative hypothesis H1: r ≠ 0.3
Types of mistakes
• Type I and type II errors:
  - Type I: reject H0 when it is true.
  - Type II: fail to reject H0 when it is false.
• Reality vs. statistical rejection
Source: Introduction to hypothesis testing
Significance and power
• Significance level (size) of a test: probability of type I error
  α = Pr(Reject H0 | H0)
  - This shows how concerned you are about falsely rejecting H0.
• Power of a test: 1 − probability of type II error
  π(r) = Pr(Reject H0 | r) = 1 − Pr(do NOT reject H0 | r)
  - This is a function of the actual value of the parameter!
• Routine: we pick a significance level (α), then try to maximize power π(r).
Test statistic
• A test statistic (T) is a random variable built from the random sample (like an estimator!).
  - For a given draw we calculate the value of this random variable and denote it by t.
• Given a test statistic we need a rejection rule to decide when to reject H0 in favor of H1.
  - Simple rule: compare t to a critical value c.
  - Values of t that lead to rejection of H0 are called the rejection region.
  - p-value: given the calculated t, what is the smallest significance level at which H0 would be rejected?
  - Choosing α (significance level) will pin down c.
Example 1 — cont.
• H0: r = 0.3 vs. H1: r ≠ 0.3
• Since this is a hypothesis about the mean, let's use the sample average to form a test statistic:
  T = (Ȳ − r)/(σ/√n)
  - If the Yi are i.i.d. N(r, σ²) with known σ, then T ∼ N(0, 1).
• Pick the significance level: α = 5%
  - What is the critical value c? α = Pr(|T| > c | H0) ⇒ c = zα/2 = 1.96
  - Reject H0 if |t| > 1.96 (rejection region).
Example 1 — with numbers
• Say σ = 1, and for a given sample (n = 100) we calculated ȳ = 0.2. Do we reject H0: r = 0.3 in favor of H1: r ≠ 0.3?
  t = (0.2 − 0.3)/(1/√100) = −1 ⇒ |t| < 1.96 ⇒ we don't reject H0.
• What if the sample size was n = 10000?
  |t| = |−10| > 1.96 ⇒ reject H0.
• What if n = 100 but ȳ = 0.1?
  |t| = |−2| > 1.96 ⇒ reject H0.
Rejection region
How to calculate power
• So far we focused on type I error.
• How do we calculate type II error? Alternatively, how do we calculate power (1 − type II error)?
• Power: what is the probability we reject H0 if the true value of the parameter is r1?
  π(r1) = Pr(Reject H0 | r = r1) = Pr(|T| > c | r = r1)
        = Φ(−cα/2 − (r1 − r0)/(σ/√n)) + 1 − Φ(cα/2 − (r1 − r0)/(σ/√n))
• Graphical calculation is nice.
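The power formula can be evaluated directly for Example 1's test (a sketch; Φ is built from the standard-library error function):

```python
import numpy as np
from math import erf, sqrt

# Power function pi(r1) for the two-sided test of H0: r = r0.
def Phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(r1, r0=0.3, sigma=1.0, n=100, c=1.96):
    d = (r1 - r0) / (sigma / np.sqrt(n))  # shift of T under r = r1
    return Phi(-c - d) + 1.0 - Phi(c - d)

print(power(0.3))            # at r1 = r0, power equals the size: 0.05
print(power(0.2))            # truths farther from r0 are easier to detect
print(power(0.2, n=10_000))  # larger n pushes power toward 1
```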
Graphical representation of power function
Some desirable features of a test
• Unbiased test: power is greater than or equal to the significance level of the test for all values of the parameter.
• Test consistency: power goes to one as the sample size goes to infinity.
• Do you think our test statistic in Example 1 delivers an unbiased and consistent test?
  - Remember Ȳ was a consistent estimator for r; it seems the test based on it should also be a consistent test!
• We can also pick among various test (statistic)s based on the properties of their power functions.
Unknown σ / non-normal distributions
• If σ is unknown, then we use S = √[(1/(n − 1)) Σ (Yi − Ȳ)²] instead; this means
  T = (Ȳ − r)/(S/√n) ∼ t_{n−1}
  - Must choose c based on percentiles of t_{n−1}, but if n is large this won't be that different from the standard normal.
• If Yi ∼ D(µ, σ²) and we can derive the distribution of T, then use percentiles of f(t).
• If Yi ∼ (µ, σ²) but the distribution is unknown, then we need the CLT to build an asymptotic test:
  T = (Ȳ − r)/(S/√n) ∼ N(0, 1) as n → ∞
• The principles remain the same!
Confidence intervals vs. hypothesis testing
• Confidence intervals are just the complement of the rejection region.
  - If a value of the parameter (r0) falls in the 95% confidence interval around the estimated mean, then we will not reject H0: r = r0 against H1: r ≠ r0.
• Note many values fall in the confidence interval, and therefore many nulls won't be rejected!! That's why we don't say we accept H0.
  - Reject H0 OR fail to reject H0; "accept H0" is nonsense!
Summary
• In this topic we learned:
  - What a (point) estimator is and what desirable finite-sample and asymptotic properties it may possess.
  - What interval estimation is and how we build confidence intervals around a point estimate.
  - How to formulate a hypothesis and conduct a hypothesis test.