Theoretical Statistics. Lecture 21

Theoretical Statistics. Lecture 21. Peter Bartlett

1. Motivation: Asymptotics of tests
2. Recall: Contiguity
3. Le Cam’s lemmas.

[vdV6]


Motivating example: asymptotic testing

Consider the asymptotics of a test. We have
• A parametric model Pθ for θ ∈ Θ.
• A null hypothesis θ = θ0.
• An alternative hypothesis θ = θ0 + h.

Test: compute the log likelihood ratio,

λ = log ∏_{i=1}^n (dP_{θ0+h}/dP_{θ0})(X_i),

and reject the null hypothesis if it is sufficiently large.


Asymptotic testing

For a fixed alternative, this typically has trivial asymptotics. For example, suppose Pθ = N(θ, σ²). Then

λ = log ∏_{i=1}^n (dP_{θ0+h}/dP_{θ0})(X_i)
  = (1/(2σ²)) ∑_{i=1}^n [(X_i − θ0)² − (X_i − θ0 − h)²]
  = (1/(2σ²)) ∑_{i=1}^n (2h(X_i − θ0) − h²)
  = (nh/σ²)(X̄ − θ0) − nh²/(2σ²).

Asymptotic testing

Under the null hypothesis, X̄ ∼ N(θ0, σ²/n), so the log likelihood ratio satisfies

λ ∼_{θ0} N(−nh²/(2σ²), nh²/σ²).

[Notice that the mean is minus half the variance!] Clearly (consider, for example, Chebyshev’s inequality), for a fixed h ≠ 0, we have λ → −∞ in probability (that is, for all c, Pr(λ < c) → 1). So the asymptotics are rather trivial: asymptotically, we do not reject the null hypothesis.
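To see this drift concretely, here is a small simulation sketch: the parameter choices θ0 = 0, σ = 1, h = 0.5 are illustrative (not from the lecture), and λ is computed from its closed form above. The sample mean of λ tracks −nh²/(2σ²), which tends to −∞.

```python
import random
import statistics

def gaussian_llr(xs, theta0, h, sigma):
    # log prod_i dP_{theta0+h}/dP_{theta0}(x_i) for the N(theta, sigma^2) model,
    # using the closed form (nh/sigma^2)(xbar - theta0) - nh^2/(2 sigma^2).
    n = len(xs)
    xbar = sum(xs) / n
    return n * h / sigma**2 * (xbar - theta0) - n * h**2 / (2 * sigma**2)

random.seed(0)
theta0, sigma, h = 0.0, 1.0, 0.5
means = {}
for n in [10, 100, 1000]:
    lams = [gaussian_llr([random.gauss(theta0, sigma) for _ in range(n)],
                         theta0, h, sigma)
            for _ in range(500)]
    means[n] = statistics.mean(lams)
    print(n, means[n])  # should track -n h^2 / (2 sigma^2)
```

With these choices the mean of λ under the null is −n·0.125, so the printed averages drift toward −∞ as n grows.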


Asymptotic testing

Consider instead a shrinking alternative: replace h with hn → 0. Then

λ = log ∏_{i=1}^n (dP_{θ0+hn}/dP_{θ0})(X_i)
  = (nhn/σ²)(X̄ − θ0) − nhn²/(2σ²)
  ∼_{θ0} N(−nhn²/(2σ²), nhn²/σ²).

So for √n hn → h ≠ 0, its parameters approach (−h²/(2σ²), h²/σ²). And provided the mean dominates the standard deviation, that is, nhn²/(2σ²) ≫ √n hn/σ (equivalently √n hn/(2σ) ≫ 1, or hn/(2σ) ≫ n^{−1/2}), we do not reject the null hypothesis.
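A quick check of the nondegenerate limit, with illustrative parameters (θ0 = 0, σ = 1, hn = h/√n with h = 1; not from the lecture). Under the null X̄ ∼ N(θ0, σ²/n) exactly, so we can simulate X̄ directly rather than n observations.

```python
import math
import random
import statistics

random.seed(1)
theta0, sigma, h = 0.0, 1.0, 1.0
n = 10_000
hn = h / math.sqrt(n)          # shrinking alternative with sqrt(n) * hn -> h
lams = []
for _ in range(5000):
    # Under the null, Xbar ~ N(theta0, sigma^2/n) exactly.
    xbar = random.gauss(theta0, sigma / math.sqrt(n))
    lams.append(n * hn / sigma**2 * (xbar - theta0) - n * hn**2 / (2 * sigma**2))
m, v = statistics.mean(lams), statistics.variance(lams)
print(m, v)  # should be close to -h^2/(2 sigma^2) = -0.5 and h^2/sigma^2 = 1
```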

Asymptotic testing

These asymptotics are typical. Another example: the exponential family with sufficient statistic T has density pθ(x) = exp(T(x)θ − A(θ)). We have

λ = log ∏_{i=1}^n (dP_{θ0+hn}/dP_{θ0})(X_i)
  = hn ∑_{i=1}^n T(X_i) − n(A(θ0 + hn) − A(θ0))
  = hn ∑_{i=1}^n T(X_i) − n(A′(θ0)hn + ½ A″(θ0)hn² + o(hn²))
  = hn ∑_{i=1}^n (T(X_i) − Pθ0 T(X_i)) − (n/2) A″(θ0)hn² + o(nhn²),

using A′(θ0) = Pθ0 T(X_i).

Asymptotic testing

So if hn = h/√n, and T(X_i) has finite variance, then we have

λ = (h/√n) ∑_{i=1}^n (T(X_i) − Pθ0 T(X_i)) − (h²/2) A″(θ0) + o(1)
  ∼_{θ0} N(−h² var(T(X_1))/2, h² var(T(X_1))),

since A″(θ0) = var(T(X_1)).
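A sketch for one concrete exponential family (my choice, not the lecture’s): the Exponential(θ) density in natural form is exp(T(x)θ − A(θ)) with T(x) = −x and A(θ) = −log θ, so A″(θ0) = var T(X1) = 1/θ0². With θ0 = 1 and h = 1 the limit above is N(−1/2, 1).

```python
import math
import random
import statistics

random.seed(2)
theta0, h = 1.0, 1.0
n = 1000
hn = h / math.sqrt(n)
lams = []
for _ in range(2000):
    s = sum(random.expovariate(theta0) for _ in range(n))
    # lambda = hn * sum T(X_i) - n (A(theta0 + hn) - A(theta0))
    #        = -hn * s + n (log(theta0 + hn) - log(theta0))
    lams.append(-hn * s + n * (math.log(theta0 + hn) - math.log(theta0)))
m, v = statistics.mean(lams), statistics.variance(lams)
print(m, v)  # limit N(-h^2 var T / 2, h^2 var T) = N(-1/2, 1) here
```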


Contiguity

For these examples, the distributions are absolutely continuous wrt each other. In general, we need to make sure that the likelihood ratios dQn/dPn make sense (at least asymptotically). We need an asymptotic analogue of absolute continuity (P(A) = 0 ⇒ Q(A) = 0): contiguity.


Recall: Absolute Continuity

Definition:
1. Q ≪ P (“Q is absolutely continuous wrt P”) means: ∀A, P(A) = 0 ⇒ Q(A) = 0.
2. P ⊥ Q (“P and Q are orthogonal”) means: ∃ΩP, ΩQ with
   P(ΩP) = 1,  Q(ΩP) = 0,  Q(ΩQ) = 1,  P(ΩQ) = 0.


Recall: Absolute Continuity

Suppose that P and Q have densities p and q wrt some measure µ. Define

Qᵃ(A) = Q(A ∩ {p > 0}),   Q^⊥(A) = Q(A ∩ {p = 0}).

Lemma:
1. Q = Qᵃ + Q^⊥, with Qᵃ ≪ P and Q^⊥ ⊥ P. (Lebesgue decomposition)
2. Qᵃ(A) = ∫_A (q/p) dP = ∫_A (dQ/dP) dP.
3. Q ≪ P ⇔ Q = Qᵃ ⇔ Q(p = 0) = 0 ⇔ ∫ (q/p) dP = 1.

If Q ≪ P then Qf(X) = P[f(X) (dQ/dP)].

Contiguity

Definition: Qn ⊳ Pn (“Qn is contiguous wrt Pn”) means: ∀An, Pn(An) → 0 ⇒ Qn(An) → 0.

Contiguity: Examples

Example:
1. Pn = N(0, 1), Qn = N(µn, σ²) with σ² > 0 and µn → µ ∈ ℝ. Then Qn ⊳ Pn and Pn ⊳ Qn.
2. Pn = N(0, 1), Qn = N(µn, σ²) with σ² > 0 and µn → ∞. Then An = [µn, µn + 1] shows that we do not have Qn ⊳ Pn. (But notice that we have Qn ≪ Pn for all n.)
3. Pn is uniform on [0, 1], Qn is uniform on [θn0, θn1], θn0 < θn1, θn0 → 0, θn1 → 1. Then Pn ⊳ Qn and Qn ⊳ Pn. (But notice that we have neither Pn ≪ Qn nor Qn ≪ Pn.)
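Example 2 can be checked numerically with exact normal probabilities, taking µn = √n as one concrete sequence tending to infinity (a choice for illustration; with σ = 1): Pn(An) → 0 while Qn(An) = Φ(1) − Φ(0) ≈ 0.34 for every n.

```python
import math

def norm_cdf(x, mu=0.0, sigma=1.0):
    # Normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

results = {}
for n in [1, 16, 100]:
    mu_n = math.sqrt(n)                          # any sequence with mu_n -> infinity
    a, b = mu_n, mu_n + 1.0                      # the sets A_n = [mu_n, mu_n + 1]
    p_n = norm_cdf(b) - norm_cdf(a)              # P_n(A_n) -> 0
    q_n = norm_cdf(b, mu_n) - norm_cdf(a, mu_n)  # Q_n(A_n) = Phi(1) - Phi(0), all n
    results[n] = (p_n, q_n)
    print(n, p_n, q_n)
```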

Contiguity

Lemma: [Le Cam’s first lemma] The following are equivalent:
1. Qn ⊳ Pn.
2. If dPn/dQn ⇝ U under Qn along a subsequence, then P(U > 0) = 1.
3. If dQn/dPn ⇝ V under Pn along a subsequence, then EV = 1.
4. Tn → 0 in Pn-probability ⇒ Tn → 0 in Qn-probability.

Contiguity

Notice that dPn/dQn and dQn/dPn are non-negative, and

E_{Pn}[dQn/dPn] ≤ 1,   E_{Qn}[dPn/dQn] ≤ 1.

So the likelihood ratios are uniformly tight, and therefore have a weakly converging subsequence (Prohorov’s theorem). Le Cam’s first lemma shows that the limits characterize contiguity. These characterizations are analogous to the characterizations we saw for absolute continuity:

Q ≪ P ⇔ Q(dP/dQ = 0) = 0 ⇔ E_P[dQ/dP] = 1.

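The inequality E_P[dQ/dP] ≤ 1 is strict exactly when Q puts mass where p = 0, since E_P[q(X)/p(X)] = Q(p > 0). A Monte Carlo sketch with a hypothetical uniform pair (my example, not from the lecture):

```python
import random
import statistics

# P = Uniform[0,1], Q = Uniform[-1/2,1/2]: densities p = 1 on [0,1],
# q = 1 on [-1/2,1/2].  Then E_P[q(X)/p(X)] = Q(p > 0) = Q([0,1]) = 1/2 < 1.
random.seed(3)
xs = (random.random() for _ in range(200_000))           # draws from P
est = statistics.mean(1.0 if x <= 0.5 else 0.0 for x in xs)  # q(x)/p(x) at x ~ P
print(est)  # roughly 1/2
```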

Aside: Recall normal asymptotic testing

For Pθ = N(θ, σ²),

λ = log ∏_{i=1}^n (dP_{θ0+hn}/dP_{θ0})(X_i)
  = (nhn/σ²)(X̄ − θ0) − nhn²/(2σ²)
  ∼_{θ0} N(−nhn²/(2σ²), nhn²/σ²).

And for √n hn → h ≠ 0, its parameters approach (−h²/(2σ²), h²/σ²).

Contiguity

Here is an important example. Local asymptotic normality: the log likelihood ratio of a local alternative to the true parameter is asymptotically normal.

Example: Suppose log(dPn/dQn) ⇝ N(µ, σ²) under Qn. Then dPn/dQn ⇝ U = exp(N(µ, σ²)) > 0 under Qn, so part (2) of the lemma shows that Qn ⊳ Pn. Conversely, part (3) of the lemma shows that Pn ⊳ Qn iff E exp(N(µ, σ²)) = 1.


Contiguity

Example: (Continued) This is true iff

1 = ∫ (1/√(2πσ²)) exp(x) exp(−(x − µ)²/(2σ²)) dx
  = ∫ (1/√(2πσ²)) exp(−(x − (µ + σ²))²/(2σ²)) dx · exp(((µ + σ²)² − µ²)/(2σ²))
  = exp(µ + σ²/2),

which is equivalent to µ = −σ²/2. (Alternatively, E exp(Z) = M_Z(1) for Z ∼ N(µ, σ²), and M_Z(t) = exp(µt + σ²t²/2), so M_Z(1) = 1 iff µ = −σ²/2.)
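Both routes to the condition µ = −σ²/2 can be checked numerically (σ = 1.5 is an arbitrary illustrative choice): the MGF formula gives E exp(Z) = 1 exactly, and a Monte Carlo average of exp(Z) agrees.

```python
import math
import random
import statistics

random.seed(4)
sigma = 1.5
mu = -sigma**2 / 2                 # the critical mean mu = -sigma^2/2
# Closed form: E exp(Z) = M_Z(1) = exp(mu + sigma^2/2) = exp(0) = 1.
closed_form = math.exp(mu + sigma**2 / 2)
# Monte Carlo check of E exp(Z) for Z ~ N(mu, sigma^2).
mc = statistics.mean(math.exp(random.gauss(mu, sigma)) for _ in range(300_000))
print(closed_form, mc)
```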


Contiguity

Example: (Continued) That is, for log(dPn/dQn) ⇝ N(µ, σ²) under Qn,

Pn ⊳ Qn iff µ = −σ²/2.

Contiguity and change of measure

Recall that, if Q ≪ P, then we can write the Q-law of X in terms of the P-law of the pair (X, dQ/dP). Le Cam’s third lemma shows an asymptotic version: if Qn is contiguous wrt Pn, then we can write the limit of the Qn-law of a weakly converging random variable Xn in terms of the limit of the Pn-law of the pair (Xn, dQn/dPn).

Contiguity and change of measure

Theorem: [Le Cam’s third lemma] If Qn ⊳ Pn and

(Xn, dQn/dPn) ⇝ (X, V) under Pn,

then Xn ⇝ L under Qn, where the distribution L satisfies

L(A) = E[1[X ∈ A] V] = ∫_{A×ℝ} v dP_{X,V}(x, v).

Contiguity and change of measure

Corollary: Suppose, for Xn ∈ ℝᵏ,

(Xn, log(dQn/dPn)) ⇝ N( (µ, −σ²/2), [Σ, τ; τᵀ, σ²] ) under Pn.

Then Xn ⇝ N(µ + τ, Σ) under Qn.

Think of Xn as some test statistic, which approaches a normal under Pn. Think of Qn as an alternative distribution, for which the asymptotic distribution of the log likelihood ratio is normal, with µ = −σ²/2. Under the alternative distribution Qn, the asymptotic distribution of the statistic Xn is again normal, with the same covariance Σ, but with the mean shifted by τ, the limiting covariance between Xn and log(dQn/dPn) under Pn.
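A sketch of the corollary in the simplest case (my construction, not from the lecture): take Pn = N(0,1)ⁿ, Qn = N(h/√n, 1)ⁿ, and Xn = √n X̄. Under Pn, Xn ∼ N(0,1) and log(dQn/dPn) = hXn − h²/2, so µ = 0, Σ = 1, σ² = h², and τ = cov(Xn, log LR) = h. The corollary then predicts Xn ⇝ N(µ + τ, Σ) = N(h, 1) under Qn.

```python
import math
import random
import statistics

random.seed(5)
h, n, reps = 2.0, 100, 5000
hn = h / math.sqrt(n)
xn = []
for _ in range(reps):
    # Sample under Qn = N(h/sqrt(n), 1)^n and form the statistic Xn = sqrt(n) Xbar.
    xbar = statistics.mean(random.gauss(hn, 1.0) for _ in range(n))
    xn.append(math.sqrt(n) * xbar)
m, v = statistics.mean(xn), statistics.variance(xn)
print(m, v)  # the corollary predicts mean h = 2 and variance 1
```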

Contiguity and change of measure: Proof

Let (X, Z) have the limiting distribution, so

(Xn, log(dQn/dPn)) ⇝ (X, Z) under Pn,

hence

(Xn, dQn/dPn) ⇝ (X, exp(Z)) under Pn.

Because Z ∼ N(−σ²/2, σ²), E exp(Z) = 1, so Le Cam’s first lemma gives Qn ⊳ Pn. By Le Cam’s third lemma, Xn ⇝ L under Qn, where ∫ f(x) L(dx) = E[f(X) exp(Z)].

Contiguity and change of measure: Proof

Thus, the characteristic function of L is

φ_L(t) = E exp(i tᵀX + Z) = φ_{X,Z}(t, −i).

But the normal distribution of (X, Z) implies that its characteristic function is

φ_{X,Z}(t, u) = exp( i(tᵀµ − uσ²/2) − ½(tᵀΣt + 2u τᵀt + u²σ²) ).

Contiguity and change of measure: Proof

Substituting u = −i gives

φ_L(t) = exp( i tᵀ(µ + τ) − ½ tᵀΣt ),

which implies that L is N(µ + τ, Σ).
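The substitution can be sanity-checked numerically in the scalar case (arbitrary parameter choices, with τ² ≤ Σσ² so the covariance matrix is valid): a Monte Carlo average of exp(itX + Z) over jointly normal (X, Z) should match exp(it(µ + τ) − t²Σ/2).

```python
import cmath
import math
import random

# Scalar check: (X, Z) jointly normal with mean (mu, -s2/2) and
# covariance [[Sig, tau], [tau, s2]].
random.seed(6)
mu, Sig, s2, tau = 0.3, 1.0, 0.8, 0.4   # tau^2 <= Sig * s2: valid covariance
reps, t = 300_000, 0.7
acc = 0j
for _ in range(reps):
    z = random.gauss(-s2 / 2, math.sqrt(s2))
    # X | Z is normal: mean mu + (tau/s2)(Z - E Z), variance Sig - tau^2/s2.
    x = random.gauss(mu + tau / s2 * (z + s2 / 2), math.sqrt(Sig - tau**2 / s2))
    acc += cmath.exp(1j * t * x + z)
mc = acc / reps
exact = cmath.exp(1j * t * (mu + tau) - t**2 * Sig / 2)
print(mc, exact)  # the two complex numbers should nearly agree
```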
