The Central Limit Theorem

November 19, 2009

Convergence in distribution

Convergence in distribution, written $X_n \to^D X$, is defined by
$$\lim_{n \to \infty} E h(X_n) = E h(X)$$
for every bounded continuous function $h : \mathbb{R} \to \mathbb{R}$. However, it is not necessary to verify this for each choice of $h$. We can limit ourselves to a smaller, so-called convergence determining, family of functions.

• For random variables taking values in the natural numbers, $\{h_z(x) = z^x;\ |z| < 1\}$ is convergence determining. In this case, we are looking at convergence of the probability generating function.

• For real-valued random variables, $\{h_t(x) = \exp(tx);\ -h < t < h\}$ is convergence determining, provided the necessary expected values exist. Note that $\exp(tx)$ is not bounded, and so we need an additional argument to include these functions. In this case, we are looking at convergence of the moment generating function.

Example 1. For the binomial distribution with parameters $n$ and $p$, the probability generating function is
$$\rho_{X_n}(z) = ((1 - p) + pz)^n = (1 - p(1 - z))^n.$$
If we take the success probability $p = \lambda/n$ to depend on $n$, then
$$\rho_{X_n}(z) = \left(1 - \frac{\lambda}{n}(1 - z)\right)^n \to \exp(-\lambda(1 - z)) = \rho_X(z),$$
the probability generating function for a Poisson random variable $X$ with parameter $\lambda$. Thus, the given binomial random variables converge in distribution to a Poisson random variable. To use this: if $n$ is large but $\lambda = np$ is moderate, then the binomial random variable is well approximated by a Poisson random variable. In particular, $E h(X_n) \approx E h(X)$ for any bounded continuous $h$.
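To see the quality of this approximation numerically, here is a minimal sketch, assuming NumPy and SciPy are available; the value of $\lambda$ and the choices of $n$ are illustrative only.

```python
import numpy as np
from scipy.stats import binom, poisson

# Compare the Binomial(n, lambda/n) pmf with the Poisson(lambda) pmf
# as n grows with lambda = n*p held fixed.
lam = 3.0
k = np.arange(0, 20)  # nearly all of the probability mass lies here
for n in [10, 100, 1000]:
    pmf_binom = binom.pmf(k, n, lam / n)
    pmf_pois = poisson.pmf(k, lam)
    # Half the l1-distance between the pmfs: the total variation
    # distance, restricted to k = 0, ..., 19
    tv = 0.5 * np.sum(np.abs(pmf_binom - pmf_pois))
    print(f"n = {n:4d}  total variation distance = {tv:.5f}")
```

The printed distances shrink as $n$ grows, which is the convergence in distribution above seen through one convergence determining family at a time.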

1 Central Limit Theorem

If we look at distributions for the sum $T_n = X_1 + X_2 + \cdots + X_n$, what do we see? Let's look first at the simplest case, the $X_i$ Bernoulli random variables; the distribution of the number of successes begins to look like the bell curve. To make the comparisons fair, let's look at standardized versions of the random variables with mean $\mu$ and variance $\sigma^2$:

$$Z_n = \frac{T_n - n\mu}{\sigma \sqrt{n}} \qquad (1)$$

[Figure 1: a. Successes in 100 Bernoulli trials with p = 0.2, 0.4, 0.6 and 0.8. b. Successes in Bernoulli trials with p = 1/2 and n = 20, 40 and 80.]
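The convergence suggested by the figure can also be checked by simulation. A minimal sketch, assuming NumPy and SciPy; the choices $p = 0.3$, $n = 100$, and the seed are illustrative. It standardizes binomial counts as in equation (1) and compares the empirical distribution with the standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, p = 100, 0.3
mu, sigma = p, np.sqrt(p * (1 - p))  # mean and sd of one Bernoulli trial

# 10,000 replicates of T_n, the number of successes in n Bernoulli trials
T = rng.binomial(n, p, size=10_000)
Z = (T - n * mu) / (sigma * np.sqrt(n))  # standardize as in equation (1)

# Compare empirical probabilities with the standard normal cdf
for z in [-2, -1, 0, 1, 2]:
    print(f"P(Z_n <= {z:+d}): empirical {np.mean(Z <= z):.3f}, "
          f"normal {norm.cdf(z):.3f}")
```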

Looking next at the density of the sum of standardized exponential random variables (Figure 2), we again see the densities approaching that of a bell curve.

The classical central limit theorem states that if $\{X_i;\ i \ge 1\}$ are independent and identically distributed with common mean $\mu$ and common variance $\sigma^2$, then $Z_n$, as defined by equation (1), converges in distribution to $Z$, a standard normal random variable. In terms of the cumulative distribution function,
$$\lim_{n \to \infty} P\{Z_n \le z\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-x^2/2}\, dx = \Phi(z),$$
where $\Phi$ is the cumulative distribution function of the standard normal.

We will prove this in the case that the $X_i$ have a moment generating function $M_X(t)$ on the interval $t \in (-h, h)$ by showing that
$$\lim_{n \to \infty} M_{Z_n}(t) = \exp\frac{t^2}{2},$$
or, equivalently, by showing that the cumulant generating functions $K_{Z_n}(t) = \log M_{Z_n}(t)$ satisfy
$$\lim_{n \to \infty} K_{Z_n}(t) = \frac{t^2}{2}.$$

[Figure 2: Density of the standardized version of the sum of n independent exponential random variables for n = 2, 4, 8, 16 and 32.]

Write
$$Y_i = \frac{X_i - \mu}{\sigma}, \qquad \text{so that} \qquad Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} Y_i.$$
For $M_Y(t)$ the moment generating function of the $Y_i$, independence gives
$$M_{Z_n}(t) = E \exp(t Z_n) = E \exp\left(\frac{t}{\sqrt{n}} \sum_{i=1}^{n} Y_i\right) = \prod_{i=1}^{n} E \exp\left(\frac{t}{\sqrt{n}} Y_i\right) = M_Y\left(\frac{t}{\sqrt{n}}\right)^n,$$
and
$$K_{Z_n}(t) = n \log M_Y\left(\frac{t}{\sqrt{n}}\right) = n K_Y\left(\frac{t}{\sqrt{n}}\right).$$
Recall that for the cumulant generating function $K_Y$,
$$K_Y'(0) = EY_1 = 0, \qquad K_Y''(0) = \mathrm{Var}(Y_1) = 1.$$
Finally, substituting $\epsilon = 1/\sqrt{n}$ and applying L'Hôpital's rule twice (differentiating with respect to $\epsilon$),
$$\lim_{n \to \infty} K_{Z_n}(t) = \lim_{n \to \infty} n K_Y\left(\frac{t}{\sqrt{n}}\right) = \lim_{\epsilon \to 0} \frac{K_Y(\epsilon t)}{\epsilon^2} = \lim_{\epsilon \to 0} \frac{t K_Y'(\epsilon t)}{2\epsilon} = \lim_{\epsilon \to 0} \frac{t^2 K_Y''(\epsilon t)}{2} = \frac{t^2 K_Y''(0)}{2} = \frac{t^2}{2}.$$
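As a numerical check on this limit, $K_{Z_n}(t) = n K_Y(t/\sqrt{n})$ can be evaluated for a distribution whose cumulant generating function is available in closed form. A minimal sketch, assuming NumPy, for standardized Exponential(1) variables, where $M_X(s) = 1/(1-s)$ gives $K_Y(s) = -s - \log(1-s)$ for $s < 1$:

```python
import numpy as np

def K_Y(s):
    # Cumulant generating function of Y = X - 1 for X ~ Exponential(1),
    # valid for s < 1: K_Y(s) = -s - log(1 - s)
    return -s - np.log(1 - s)

t = 1.0
print(f"target t^2/2 = {t**2 / 2:.6f}")
for n in [4, 16, 64, 256, 1024]:
    K_Zn = n * K_Y(t / np.sqrt(n))  # K_{Z_n}(t) = n K_Y(t / sqrt(n))
    print(f"n = {n:5d}  K_Zn(t) = {K_Zn:.6f}")
```

The printed values decrease toward $t^2/2 = 0.5$, matching the L'Hôpital computation above.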

Example 2. For Bernoulli trials, $\mu = p$ and $\sigma^2 = p(1 - p)$. Thus, for large enough $n$,
$$Z_n = \frac{T_n - np}{\sqrt{np(1 - p)}}$$
has approximately the distribution of a standard normal random variable. For 100 tosses of a fair coin,
$$Z_n = \frac{T_n - 50}{5},$$
and $\{T_n \le 40\} = \{Z_n \le -2\}$. So,
$$P\{T_n \le 40\} \approx P\{Z \le -2\} = 0.023.$$

Example 3. Consider an exponential sample with mean 1. Then the standard deviation is also 1, and for 64 observations,
$$Z_n = \frac{T_n - 64}{8},$$
and $\{T_n \ge 78\} = \{Z_n \ge 1.75\}$. So,
$$P\{T_n \ge 78\} \approx P\{Z \ge 1.75\} = 0.040.$$
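Both approximations can be checked against exact probabilities: the binomial cdf directly, and, for the exponential sample, the fact that the sum of 64 independent mean-one exponentials has a Gamma(64, 1) distribution. A minimal sketch, assuming SciPy:

```python
from scipy.stats import binom, gamma, norm

# Example 2: P(T_n <= 40) for T_n ~ Binomial(100, 1/2)
exact = binom.cdf(40, 100, 0.5)
approx = norm.cdf(-2)
print(f"binomial: exact {exact:.4f}, normal approximation {approx:.4f}")

# Example 3: P(T_n >= 78) for T_n the sum of 64 Exponential(1)
# variables, which has a Gamma(64, 1) distribution
exact = gamma.sf(78, 64)  # sf = 1 - cdf
approx = norm.sf(1.75)
print(f"gamma: exact {exact:.4f}, normal approximation {approx:.4f}")
```

The gaps between the exact and approximate values shrink as $n$ grows, and they can be reduced further for the binomial case by a continuity correction.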

2 Slutsky's Theorem

Some useful extensions of the central limit theorem are based on Slutsky's theorem.

Theorem 4. Let $X_n \to^D X$ and $Y_n \to^P a$, a constant, as $n \to \infty$. Then

1. $Y_n X_n \to^D aX$, and

2. $X_n + Y_n \to^D X + a$.

For example, by the law of large numbers, the sample variance $S_n^2 \to^{a.s.} \sigma^2$, the distribution variance, as $n \to \infty$. Thus,
$$\frac{S_n}{\sigma} \to^{a.s.} 1,$$
and so it also converges in probability. So, by Slutsky's theorem, the t-statistic
$$\frac{T_n - n\mu}{S_n \sqrt{n}} = \frac{\sigma}{S_n} \cdot \frac{T_n - n\mu}{\sigma \sqrt{n}} = \frac{\sigma}{S_n} Z_n \to^D 1 \cdot Z,$$
a standard normal, as $n \to \infty$.
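A simulation sketch of this conclusion, assuming NumPy and SciPy; the Exponential(1) samples, $n = 64$, and the seed are illustrative choices. The statistic uses $S_n$ in place of the (here known) $\sigma$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, reps = 64, 10_000
mu = 1.0  # mean of an Exponential(1) random variable

# Each row is one sample X_1, ..., X_n from an Exponential(1) distribution
X = rng.exponential(scale=1.0, size=(reps, n))
T = X.sum(axis=1)
S = X.std(axis=1, ddof=1)  # sample standard deviation S_n

# Studentized statistic: sigma replaced by S_n, as in Slutsky's theorem
t_stat = (T - n * mu) / (S * np.sqrt(n))
for z in [-2, -1, 1, 2]:
    print(f"P(t <= {z:+d}): empirical {np.mean(t_stat <= z):.3f}, "
          f"normal {norm.cdf(z):.3f}")
```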

3 Delta Method

For a random sample $\{X_n;\ n \ge 1\}$ with common mean $\mu$ and common variance $\sigma^2$, we can write the central limit theorem using the sample mean:
$$\sqrt{n}(\bar{X}_n - \mu) \to^D \sigma Z,$$
where $Z$ is a standard normal. To generalize this, assume that $\{Y_n;\ n \ge 1\}$ is a sequence of random variables satisfying
$$\sqrt{n}(Y_n - \theta) \to^D \sigma Z$$
for some value $\theta$. Then the delta method states that if a function $g$ has a continuous derivative and $g'(\theta) \ne 0$, then
$$\sqrt{n}(g(Y_n) - g(\theta)) \to^D \sigma g'(\theta) \tilde{Z},$$
where $\tilde{Z}$ is also a standard normal. To prove this, expand $g$ in a Taylor series about the value $\theta$:
$$g(Y_n) = g(\theta) + g'(\tilde{\theta})(Y_n - \theta),$$
or
$$\sqrt{n}(g(Y_n) - g(\theta)) = g'(\tilde{\theta}) \sqrt{n}(Y_n - \theta),$$
where $\tilde{\theta}$ lies between $Y_n$ and $\theta$. Note that since $Y_n \to^P \theta$ implies $\tilde{\theta} \to^P \theta$, and $g'$ is continuous,
$$g'(\tilde{\theta}) \to^P g'(\theta),$$
and the theorem follows from applying Slutsky's theorem.

Example 5. For Bernoulli trials, write $\bar{X} = \hat{p}$; then
$$\sqrt{n}(\hat{p} - p) \to^D \sqrt{p(1 - p)}\, Z.$$
If we could find $g$ so that
$$g'(p) = \frac{1}{\sqrt{p(1 - p)}},$$
then
$$\sqrt{n}(g(\hat{p}) - g(p)) \to^D Z.$$
Such a choice, which here is $g(p) = 2 \arcsin(\sqrt{p})$, is called a variance stabilizing transformation.
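A simulation sketch of the stabilization, assuming NumPy; the values of $p$, $n$, and the seed are illustrative. On the raw scale, the variance of $\sqrt{n}\,\hat{p}$ varies with $p$ through $p(1-p)$, while on the transformed scale the variance of $\sqrt{n}\, g(\hat{p})$ stays near 1:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 400, 10_000

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    p_hat = rng.binomial(n, p, size=reps) / n
    # Raw scale: the variance of sqrt(n) * p_hat is approximately p(1 - p)
    raw_var = np.var(np.sqrt(n) * p_hat)
    # Transformed scale: g(p) = 2 arcsin(sqrt(p)) gives variance near 1
    stab_var = np.var(np.sqrt(n) * 2 * np.arcsin(np.sqrt(p_hat)))
    print(f"p = {p:.1f}  raw var = {raw_var:.3f}  "
          f"stabilized var = {stab_var:.3f}")
```

The stabilized variances are close to 1 for every $p$, which is exactly why such transformations are useful for constructing confidence intervals whose width does not depend on the unknown $p$.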

