Lecture 6

1 Characteristic Functions

For weak convergence of probability measures on ℝ^d, or equivalently weak convergence of ℝ^d-valued random variables, an important convergence determining class of functions consists of the functions f(x) = e^{it·x}, with t ∈ ℝ^d. They lead to

Definition 1.1 [Characteristic Function] Let X be an ℝ^d-valued random variable with distribution µ on ℝ^d. Then the characteristic function of X (or µ) is defined to be

  φ(t) := E[e^{it·X}] = ∫ e^{it·x} µ(dx).

When µ has a density with respect to Lebesgue measure, i.e., µ(dx) = ρ(x)dx, φ(·) is just the Fourier transform of the function ρ(·). Therefore in general, we can think of φ(·) as the Fourier transform of the measure µ.

Here are some properties which are immediate from the definition:

Proposition 1.2 [Properties of Characteristic Functions]

(i) Since e^{it·x} = cos(t·x) + i sin(t·x) has bounded real and imaginary parts, φ(t) is well-defined.

(ii) φ(0) = 1 and |φ(t)| ≤ 1 for all t ∈ ℝ^d.

(iii) φ is uniformly continuous on ℝ^d. More precisely,

  |φ(t+h) − φ(t)| = |E[e^{i(t+h)·X}] − E[e^{it·X}]| = |E[e^{it·X}(e^{ih·X} − 1)]| ≤ E[|e^{ih·X} − 1|],

which tends to 0 as h → 0 by the bounded convergence theorem, uniformly in t.

(iv) The complex conjugate φ̄(t) is the characteristic function of −X, and φ(t) ∈ ℝ for all t ∈ ℝ^d if X and −X are equal in distribution.

(v) For a ∈ ℝ and b ∈ ℝ^d, aX + b has characteristic function e^{ib·t} φ(at).

(vi) If X and Y are two independent random variables with characteristic functions φ_X and φ_Y, then X + Y has characteristic function φ_{X+Y}(t) = E[e^{it·(X+Y)}] = φ_X(t) φ_Y(t).

Proposition 1.3 [Common Distributions and Their Characteristic Functions]

(a) The delta measure at a ∈ ℝ: µ(dx) = δ_a(dx). Then φ(t) = e^{iat}.

(b) The coin flip: µ({1}) = µ({−1}) = 1/2. Then φ(t) = (e^{it} + e^{−it})/2 = cos t.

(c) The Poisson distribution with parameter λ > 0: µ({n}) = e^{−λ} λ^n/n! for n ∈ {0} ∪ ℕ. Then

  φ(t) = e^{−λ} Σ_{n=0}^∞ e^{itn} λ^n/n! = e^{−λ(1−e^{it})}.

(d) The compound Poisson distribution: X = Σ_{i=1}^T Y_i, where T has Poisson distribution with parameter λ, and (Y_i)_{i∈ℕ} is an i.i.d. sequence, independent of T, with characteristic function ψ(·). Then φ(t) = E[e^{itX}] = e^{−λ(1−ψ(t))}.

(e) The Gaussian (or standard normal) distribution: µ(dx) = (1/√(2π)) e^{−x²/2} dx on ℝ. Then

  φ(t) = (1/√(2π)) ∫ e^{itx} e^{−x²/2} dx = e^{−t²/2} (1/√(2π)) ∫ e^{−(x−it)²/2} dx = e^{−t²/2}

by contour integration.

(f) The exponential and gamma distributions: µ(dx) = (1/Γ(γ)) e^{−x} x^{γ−1} dx on [0, ∞), where γ > 0 and Γ(γ) is the normalizing constant. When γ = 1, µ is the exponential distribution. Then

  φ(t) = (1/Γ(γ)) ∫_0^∞ e^{itx} e^{−x} x^{γ−1} dx = 1/(1−it)^γ.
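As a quick numerical sanity check of Definition 1.1 and Proposition 1.3(d), here is a minimal sketch (added for illustration, not part of the notes; the parameters lam, sigma, the sample size and the test points t are arbitrary choices): it estimates φ(t) = E[e^{itX}] by Monte Carlo for a compound Poisson sum of standard Gaussians and compares with the closed form e^{−λ(1−ψ(t))}, where ψ(t) = e^{−σ²t²/2} by (e).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, sigma, N = 3.0, 1.0, 100_000   # illustrative parameters and sample size

# Sample X = Y_1 + ... + Y_T with T ~ Poisson(lam) and Y_i ~ N(0, sigma^2) i.i.d.
T = rng.poisson(lam, size=N)
X = np.fromiter((rng.normal(0.0, sigma, size=k).sum() for k in T),
                dtype=float, count=N)

for t in [0.0, 0.5, 1.0, 2.0]:
    mc = np.exp(1j * t * X).mean()            # Monte Carlo estimate of E[e^{itX}]
    psi = np.exp(-(sigma * t) ** 2 / 2)       # char. fn. of Y_i, Proposition 1.3(e)
    exact = np.exp(-lam * (1.0 - psi))        # Proposition 1.3(d)
    print(f"t={t:3.1f}  MC={complex(mc):.4f}  exact={exact:.4f}")
```

The Monte Carlo estimates should match the closed form up to fluctuations of order N^{−1/2}.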

The next result shows that the characteristic function of a probability measure on ℝ uniquely determines the probability measure, just as the Fourier transform of a function can be inverted to recover the original function.

Theorem 1.4 [The Inversion Formula] Let φ(·) be the characteristic function of a probability measure µ on ℝ. If a < b, then

  lim_{T→∞} (1/2π) ∫_{−T}^T [(e^{−ita} − e^{−itb})/(it)] φ(t) dt = µ((a,b)) + (µ({a}) + µ({b}))/2.

Note that Theorem 1.4 recovers µ((a,b)) from φ(·) for all a < b with µ({a}) = µ({b}) = 0, and hence recovers µ. This implies that the family of functions {x ∈ ℝ ↦ e^{itx} : t ∈ ℝ} is a distribution determining class.

Proof of Theorem 1.4. Note that

  |(e^{−ita} − e^{−itb})/(it)| = |∫_a^b e^{−ity} dy| ≤ b − a   for all t ∈ ℝ.   (1.1)

Therefore by Fubini, we can write

  lim_{T→∞} (1/2π) ∫_{−T}^T [(e^{−ita} − e^{−itb})/(it)] φ(t) dt
    = lim_{T→∞} (1/2π) ∫_{−T}^T [(e^{−ita} − e^{−itb})/(it)] (∫ e^{itx} µ(dx)) dt
    = lim_{T→∞} (1/2π) ∫ (∫_{−T}^T [(e^{it(x−a)} − e^{it(x−b)})/(it)] dt) µ(dx)
    = lim_{T→∞} (1/2π) ∫ (∫_{−T}^T [(sin t(x−a) − sin t(x−b))/t] dt) µ(dx)
    = lim_{T→∞} (1/2π) ∫ (u(T, x−a) − u(T, x−b)) µ(dx),

where we used that (cos t(x−a) − cos t(x−b))/(it) is an odd function in t, and

  u(T, z) := ∫_{−T}^T (sin tz)/t dt = 2 ∫_0^T (sin tz)/t dt = 2 ∫_0^{zT} (sin t)/t dt −→ sign(z) · 2 ∫_0^∞ (sin t)/t dt   as T → ∞,

where sign(z) = 0 if z = 0, −1 if z < 0, and 1 if z > 0. Furthermore, it is known that the so-called Dirichlet integral satisfies ∫_0^∞ (sin t)/t dt = lim_{T→∞} ∫_0^T (sin t)/t dt = π/2 (see [1, Appendix 6, Ex. 6.6]). Therefore sup_{z,T} |u(T, z)| < ∞, and hence by the bounded convergence theorem,

  lim_{T→∞} (1/2π) ∫ (u(T, x−a) − u(T, x−b)) µ(dx) = µ((a,b)) + (µ({a}) + µ({b}))/2,

since the integrand u(T, x−a) − u(T, x−b) converges to 2π for x ∈ (a,b), to π for x ∈ {a, b}, and to 0 for x ∉ [a,b].
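To see the inversion formula in action, here is a minimal numerical sketch (added for illustration; the choice µ = N(0,1), the truncation T = 50 and the grid size are arbitrary): the truncated integral of Theorem 1.4 is approximated by the trapezoidal rule and compared with µ((a,b)) computed from the error function. Since the Gaussian has no atoms, the boundary term vanishes.

```python
import numpy as np
from math import erf, sqrt, pi

def inversion_estimate(phi, a, b, T=50.0, n=200_000):
    """Trapezoidal approximation of (1/2pi) int_{-T}^{T} (e^{-ita}-e^{-itb})/(it) phi(t) dt."""
    # n even => the symmetric grid never contains t = 0, where the
    # integrand has a removable singularity (its value there is b - a).
    t = np.linspace(-T, T, n)
    integrand = (np.exp(-1j * t * a) - np.exp(-1j * t * b)) / (1j * t) * phi(t)
    return np.trapz(integrand, t).real / (2 * pi)

phi_gauss = lambda t: np.exp(-t**2 / 2)               # char. fn. of N(0,1), Prop. 1.3(e)
a, b = -1.0, 2.0
approx = inversion_estimate(phi_gauss, a, b)
exact = 0.5 * (erf(b / sqrt(2)) - erf(a / sqrt(2)))   # mu((a,b)) for N(0,1)
print(f"inversion: {approx:.6f}   exact: {exact:.6f}")
```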

It is known in the theory of Fourier transforms that the faster f(x) tends to 0 as |x| → ∞, the smoother (more differentiable) is its Fourier transform f̂. Conversely, the faster f̂(t) → 0 as |t| → ∞, the smoother is f. The characteristic function φ is the Fourier transform of the measure µ. When φ is integrable, which can be interpreted as a condition on the decay of φ(t) as |t| → ∞, it can be shown that µ is "smooth" in the sense that it admits a density.

Theorem 1.5 [Inverse Fourier Transform] Let φ(t) = ∫ e^{itx} µ(dx) be such that ∫_ℝ |φ(t)| dt < ∞. Then µ(dx) = f(x) dx with bounded density

  f(x) = (1/2π) ∫_ℝ e^{−itx} φ(t) dt.

Proof. By (1.1) and the assumption that φ is integrable, we can apply the dominated convergence theorem in Theorem 1.4 to conclude that for all a < b,

  µ((a,b)) + (µ({a}) + µ({b}))/2 = lim_{T→∞} (1/2π) ∫_{−T}^T [(e^{−ita} − e^{−itb})/(it)] φ(t) dt
    = |(1/2π) ∫_ℝ [(e^{−ita} − e^{−itb})/(it)] φ(t) dt| ≤ (|b−a|/2π) ∫_ℝ |φ(t)| dt,

which implies that µ has no atoms: letting b ↓ a, the right-hand side tends to 0, so µ({a}) = 0 for every a. Therefore by Fubini, for all a < b,

  µ((a,b)) = (1/2π) ∫_ℝ [(e^{−ita} − e^{−itb})/(it)] φ(t) dt = (1/2π) ∫_ℝ (∫_a^b e^{−ity} dy) φ(t) dt
    = ∫_a^b ((1/2π) ∫_ℝ e^{−ity} φ(t) dt) dy = ∫_a^b f(y) dy,

which implies that µ(dx) = f(x) dx.

The above proof can be extended to show that probability measures on ℝ^d are also uniquely determined by their characteristic functions. Alternatively, see [2, Theorem 15.8] for a proof which uses the fact that the algebra of functions generated by {e^{it·x} : t ∈ ℝ^d} is dense in a suitable sense in the space of bounded continuous functions, which makes the characteristic function distribution determining.
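Theorem 1.5 can be checked numerically in the same spirit (again an illustrative sketch with arbitrary grid choices): φ(t) = e^{−t²/2} is integrable, so the inverse transform should reproduce the standard normal density.

```python
import numpy as np

t = np.linspace(-40.0, 40.0, 80_000)     # phi is negligible beyond |t| ~ 8
phi = np.exp(-t**2 / 2)                  # integrable char. fn. of N(0,1)

for x in [0.0, 1.0, 2.5]:
    f_x = np.trapz(np.exp(-1j * t * x) * phi, t).real / (2 * np.pi)
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(f"x={x:3.1f}  inverse transform: {f_x:.6f}  exact density: {exact:.6f}")
```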

2 Characteristic Functions and Weak Convergence

We now show that the class of functions {x ∈ ℝ ↦ e^{itx} : t ∈ ℝ} is not only distribution determining, but also convergence determining. More precisely, we show

Theorem 2.1 [Lévy's Continuity Theorem on ℝ] Let (µ_n)_{n∈ℕ} be a sequence of probability measures on ℝ, with characteristic functions (φ_n)_{n∈ℕ}. If µ_n ⇒ µ_∞, then φ_n converges pointwise to φ_∞, the characteristic function of µ_∞. Conversely, if φ_n converges pointwise to a function φ_∞ which is continuous at 0, then φ_∞ is the characteristic function of a probability measure µ_∞, and µ_n ⇒ µ_∞.

Proof. Since e^{itx} has bounded and continuous real and imaginary parts, φ_n → φ_∞ follows from µ_n ⇒ µ_∞ by the definition of weak convergence. For the converse, assume that φ_n → φ_∞ pointwise on ℝ, where φ_∞ is continuous at 0. To prove that µ_n converges weakly to some limit µ_∞, we only need to show that {µ_n}_{n∈ℕ} is a relatively compact set of probability measures, which implies that every subsequence of {µ_n}_{n∈ℕ} has a further weakly convergent subsequence. The assumption φ_n → φ_∞ then implies that all subsequential weak limits of (µ_n)_{n∈ℕ} have characteristic function φ_∞, and hence µ_n converges weakly to a unique limit µ_∞ with characteristic function φ_∞. By Prohorov's Theorem, {µ_n}_{n∈ℕ} is relatively compact if and only if it is tight, namely, for all ε > 0, we can find A_ε such that

  µ_n((−∞, −A_ε)) + µ_n((A_ε, ∞)) ≤ ε   for all n ∈ ℕ.   (2.2)

Information about the tail probability µ_n((−∞, −A)) + µ_n((A, ∞)) can in fact be recovered from the behavior of φ_n near 0. We proceed as follows. Note that for any T > 0, by Fubini,

  (1/2T) ∫_{−T}^T φ_n(t) dt = ∫ ((1/2T) ∫_{−T}^T e^{itx} dt) µ_n(dx)
    = ∫ [sin(Tx)/(Tx)] µ_n(dx)
    ≤ ∫_{|x|<l} |sin(Tx)/(Tx)| µ_n(dx) + ∫_{|x|≥l} |sin(Tx)/(Tx)| µ_n(dx)
    ≤ µ_n((−l, l)) + (1/(Tl)) µ_n(x : |x| ≥ l) = 1 − (1 − 1/(Tl)) µ_n(x : |x| ≥ l),

where we used that |sin y / y| ≤ 1 and |sin y| ≤ 1 for all y ∈ ℝ. Therefore

  (1 − 1/(Tl)) µ_n(x : |x| ≥ l) ≤ 1 − (1/2T) ∫_{−T}^T φ_n(t) dt.

Choosing l = 2/T then gives

  µ_n(x : |x| ≥ 2/T) ≤ 2 (1 − (1/2T) ∫_{−T}^T φ_n(t) dt) = (1/T) ∫_{−T}^T (1 − φ_n(t)) dt,   (2.3)

where the right-hand side converges to (1/T) ∫_{−T}^T (1 − φ_∞(t)) dt as n → ∞ by the assumption φ_n → φ_∞ and the bounded convergence theorem. On the other hand, the assumptions φ_∞(0) = 1 and φ_∞ continuous at 0 imply that (1/T) ∫_{−T}^T (1 − φ_∞(t)) dt → 0 as T → 0. Therefore given any ε > 0, we can first choose T sufficiently small and then n sufficiently large, say n ≥ n_0(ε), such that

  µ_n(x : |x| ≥ 2/T) ≤ (1/T) ∫_{−T}^T (φ_∞(t) − φ_n(t)) dt + (1/T) ∫_{−T}^T (1 − φ_∞(t)) dt ≤ ε.

We can then choose T to be even smaller such that the above bound holds for all µ_i with 1 ≤ i < n_0, which establishes the tightness condition (2.2) with A = 2/T.

Remark. We can complement (2.3) by establishing an inequality in the reverse direction, so that how quickly φ(t) → 1 as t → 0 is controlled by the tail probability:

  |1 − φ(t)| ≤ ∫ |e^{itx} − 1| µ(dx) = ∫_{|x|<l} |e^{itx} − 1| µ(dx) + ∫_{|x|≥l} |e^{itx} − 1| µ(dx)
    ≤ ∫_{|x|<l} |∫_0^{tx} e^{iy} dy| µ(dx) + 2 µ(x : |x| ≥ l)   (2.4)
    ≤ |t| l + 2 µ(x : |x| ≥ l).

If we choose l = l(t) such that l ↑ ∞ and |t| l → 0 as t → 0, then we obtain a bound on how fast φ(t) → 1 as t → 0 in terms of how fast µ(x : |x| ≥ l) → 0 as l → ∞.
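To see what (2.3) measures, here is a small numerical sketch (the Gaussian families and the coupling σ = 10/T are illustrative choices, not from the notes). For µ_n = N(0,1), whose characteristic function is continuous at 0, the right-hand side of (2.3) vanishes as T → 0. Along the non-tight family µ_n = N(0, n²), the pointwise limit of φ_n is the indicator of {t = 0}, which is discontinuous at 0; for each T one can pick a member with σ large compared to 1/T for which the right-hand side of (2.3) stays of order 1.

```python
import numpy as np

def rhs_23(sigma, T, n=20_001):
    """Right-hand side of (2.3), (1/T) int_{-T}^{T} (1 - phi(t)) dt, for N(0, sigma^2)."""
    t = np.linspace(-T, T, n)
    return np.trapz(1.0 - np.exp(-(sigma * t) ** 2 / 2), t) / T

for T in [1.0, 0.1, 0.01]:
    print(f"T={T:5.2f}   N(0,1): {rhs_23(1.0, T):.5f}   "
          f"N(0,(10/T)^2): {rhs_23(10.0 / T, T):.5f}")
```

The first column tends to 0 with T (tightness), while the second stays bounded away from 0, reflecting mass escaping beyond 2/T.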

Lévy's Continuity Theorem can be extended to higher dimensions.

Theorem 2.2 [Lévy's Continuity Theorem on ℝ^d] Let (µ_n)_{n∈ℕ} be a sequence of probability measures on ℝ^d with characteristic functions (φ_n)_{n∈ℕ}. If φ_n converges pointwise to a function φ_∞ which is continuous at 0, then µ_n ⇒ µ_∞ for some probability measure µ_∞ on ℝ^d with characteristic function φ_∞.

Proof. As in the one-dimensional case, it suffices to show that {µ_n}_{n∈ℕ} is tight. Let X_n := (X_n(1), . . . , X_n(d)) ∈ ℝ^d be a random variable with distribution µ_n. We leave it as an exercise to show that:

Exercise 2.3 A family of ℝ^d-valued random variables {X_n := (X_n(1), . . . , X_n(d))}_{n∈ℕ} is tight if and only if for each coordinate 1 ≤ i ≤ d, {X_n(i)}_{n∈ℕ} is a tight family of ℝ-valued random variables.

Therefore it only remains to show that {X_n(i)}_{n∈ℕ} is tight for each 1 ≤ i ≤ d. Note that E[e^{itX_n(1)}] = φ_n(t, 0, . . . , 0) → φ_∞(t, 0, . . . , 0), where φ_∞(t, 0, . . . , 0) is continuous at t = 0 by assumption. Therefore by Lévy's Continuity Theorem on ℝ, {X_n(1)}_{n∈ℕ} is tight, and likewise {X_n(i)}_{n∈ℕ} for each 1 ≤ i ≤ d.

The next result shows that weak convergence of ℝ^d-valued random variables can be reduced to weak convergence of a family of ℝ-valued random variables, which can be useful at times.

Theorem 2.4 [Cramér-Wold Device] Let X_n := (X_n(1), . . . , X_n(d)), n ∈ ℕ, be a sequence of ℝ^d-valued random variables. Then X_n converges weakly to some limit X_∞ = (X_∞(1), . . . , X_∞(d)) if and only if for every λ ∈ ℝ^d, ⟨λ, X_n⟩ := Σ_{i=1}^d λ_i X_n(i) converges weakly to some limit Y_λ, in which case Y_λ is distributed as ⟨λ, X_∞⟩.

Proof. If X_n ⇒ X_∞, then ⟨λ, X_n⟩ ⇒ ⟨λ, X_∞⟩ by the Continuous Mapping Theorem, since x ∈ ℝ^d ↦ ⟨λ, x⟩ ∈ ℝ is a continuous map. Conversely, if ⟨λ, X_n⟩ ⇒ Y_λ for some Y_λ for each λ ∈ ℝ^d, then in particular X_n(i) converges weakly for each 1 ≤ i ≤ d (take λ to be the i-th standard basis vector), which by Exercise 2.3 implies that {X_n}_{n∈ℕ} is tight and hence relatively compact. On the other hand, E[e^{i⟨t, X_n⟩}] → E[e^{iY_t}] uniquely determines the characteristic function of any subsequential weak limit of {X_n}_{n∈ℕ}, and hence X_n converges weakly to a unique limit X_∞, with Y_λ distributed as ⟨λ, X_∞⟩.
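As a standard application of the Cramér-Wold device (a worked example added here for illustration; it is not in the original notes): let (ξ_n, η_n)_{n∈ℕ} be i.i.d. ℝ²-valued random vectors with mean zero and covariance matrix Σ, and set X_n := n^{−1/2} Σ_{k=1}^n (ξ_k, η_k). For every λ ∈ ℝ², ⟨λ, X_n⟩ = n^{−1/2} Σ_{k=1}^n (λ_1 ξ_k + λ_2 η_k) is a normalized sum of i.i.d. real random variables with variance λ^T Σ λ, so the one-dimensional CLT gives ⟨λ, X_n⟩ ⇒ N(0, λ^T Σ λ). Since N(0, λ^T Σ λ) is the law of ⟨λ, Z⟩ for Z ∼ N(0, Σ), Theorem 2.4 yields X_n ⇒ N(0, Σ): the multivariate CLT follows from the one-dimensional one.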

3 Characteristic Functions and Moments

In the proof of Lévy's Continuity Theorem, we saw a close connection between the tail probability µ((−∞, −A)) + µ((A, ∞)) of a measure µ and the behavior of its characteristic function φ near 0. We now explore this connection further and show how moments of µ are related to the higher order derivatives of φ at 0.

Theorem 3.1 [Derivatives of a Characteristic Function] Let µ be a probability measure on ℝ with characteristic function φ.

(i) If ∫ |x|^n µ(dx) < ∞ for some n ∈ ℕ, then φ has a continuous n-th derivative φ^{(n)}(t) = ∫ (ix)^n e^{itx} µ(dx). In particular, ∫ x^n µ(dx) = (−i)^n φ^{(n)}(0).

(ii) Conversely, if φ^{(2n)}(0) exists for some n ∈ ℕ, then ∫ x^{2n} µ(dx) < ∞.

4 The Method of Moments

Suppose that (µ_n)_{n∈ℕ} is a sequence of probability measures on ℝ whose moments converge, i.e., ∫ x^k µ_n(dx) → m_k ∈ ℝ as n → ∞ for each k ∈ ℕ. In particular the second moments are bounded, sup_n ∫ x² µ_n(dx) =: C < ∞, so by Chebyshev's inequality,

  µ_n(x : |x| > l) ≤ C/l²,

which in turn implies that {µ_n}_{n∈ℕ} is a tight (and hence relatively compact) family of probability measures. Therefore subsequential weak limits are guaranteed to exist. It only remains to find conditions on (m_k)_{k∈ℕ} such that there exists a unique probability measure µ with ∫ x^k µ(dx) = m_k for all k ∈ ℕ. This is known as the Hamburger moment problem.

Exercise 4.1 Assume that lim_{n→∞} ∫ x^k µ_n(dx) = m_k ∈ ℝ, for each k ∈ ℕ, for a sequence of probability measures (µ_n)_{n∈ℕ} on ℝ. Prove that if µ is any subsequential weak limit of (µ_n)_{n∈ℕ}, then m_k = ∫ x^k µ(dx) for all k ∈ ℕ.

This exercise shows that in our context, the existence of a solution to the Hamburger moment problem with moment sequence (m_k)_{k∈ℕ} is guaranteed. The real issue is the uniqueness of the solution, for which it is necessary to impose some conditions on (m_k)_{k∈ℕ}.

We now construct a moment sequence (m_k)_{k∈ℕ} which corresponds to two distinct probability measures. Let µ = Σ_{n=0}^∞ a_n δ_{e^n} and ν = Σ_{n=0}^∞ b_n δ_{e^n}, with a_n, b_n ≥ 0 and Σ a_n = Σ b_n = 1. Assume further that for each k ∈ ℕ,

  m_k = ∫ x^k µ(dx) = Σ_{n=0}^∞ a_n e^{kn} = ∫ x^k ν(dx) = Σ_{n=0}^∞ b_n e^{kn}.

Writing c_n = a_n − b_n, the construction of µ ≠ ν satisfying the above conditions is equivalent to finding a sequence (c_n)_{n≥0}, not identically 0, such that

  C(z) := Σ_{n=0}^∞ c_n z^n = 0   for z = 1, e, e², . . .,   with Σ_n |c_n| < ∞.

Indeed, given such (c_n)_{n≥0}, we can simply take a_n := 2 max{c_n, 0}/Σ_m |c_m| and b_n := 2 max{−c_n, 0}/Σ_m |c_m|; the condition C(1) = Σ c_n = 0 guarantees that both sum to 1. One explicit construction is to take C(z) = Π_{n=0}^∞ (1 − z/e^n), which is an entire function by the Weierstrass Factorization Theorem (the n = 0 factor produces the required zero at z = 1). Expanding C(z) then gives (c_n)_{n≥0}, and Σ_{n=0}^∞ |c_n| e^{kn} < ∞ for all k ∈ ℕ since the radius of convergence of C(z) is ∞.

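The construction above can be tested numerically. A minimal sketch (added for illustration; the truncation level N is an arbitrary choice): the finite product Π_{n=0}^N (1 − z/e^n) has zeros exactly at 1, e, . . . , e^N, so the two finitely supported measures built from its coefficients share their moments of order k = 0, 1, . . . , N.

```python
import numpy as np

N = 6
# Coefficients of C(z) = prod_{n=0}^{N} (1 - z * e^{-n}), ascending in powers of z,
# built by repeated polynomial multiplication.
c = np.array([1.0])
for n in range(N + 1):
    c = np.convolve(c, np.array([1.0, -np.exp(-float(n))]))

total = np.abs(c).sum()
a = 2 * np.maximum(c, 0.0) / total    # weights of mu on the atoms e^0, e^1, ...
b = 2 * np.maximum(-c, 0.0) / total   # weights of nu on the same atoms
print("sum a =", a.sum(), " sum b =", b.sum())   # both equal 1 since C(1) = 0

atoms = np.exp(np.arange(len(c), dtype=float))   # e^0, e^1, ..., e^{N+1}
for k in range(N + 1):
    mk_mu = (a * atoms**k).sum()
    mk_nu = (b * atoms**k).sum()
    print(f"k={k}: mu-moment {mk_mu:.6e}  nu-moment {mk_nu:.6e}  "
          f"rel. diff {(mk_mu - mk_nu) / mk_mu:.1e}")
```

The relative differences stay at floating-point level for k ≤ N even though the two measures are visibly different.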

To ensure that at most one probability measure corresponds to (m_k)_{k∈ℕ}, we require m_k to grow not too fast in k ∈ ℕ. The classic condition is Carleman's condition:

  Σ_{k=1}^∞ m_{2k}^{−1/(2k)} = ∞,   (4.5)

which is slightly weaker than what we will assume.

Theorem 4.2 Let (m_k)_{k∈ℕ} be such that Σ_{k=1}^∞ m_{2k} a^{2k}/(2k)! < ∞ for some a > 0. Then there is at most one probability measure µ with ∫ x^k µ(dx) = m_k for all k ∈ ℕ.

Proof. It suffices to determine uniquely the characteristic function of any µ with moment sequence (m_k)_{k∈ℕ}. Since m_k ∈ ℝ for all k ∈ ℕ, φ(t) = ∫ e^{itx} µ(dx) is infinitely differentiable at each t ∈ ℝ, with |φ^{(2k)}(t)| ≤ m_{2k} for each k ∈ ℕ. Since for k ≥ 0,

  |φ^{(2k+1)}(t)| ≤ ∫ |x|^{2k+1} µ(dx) ≤ √(m_{2k} m_{2k+2}) ≤ (m_{2k} + m_{2k+2})/2

by the Cauchy-Schwarz inequality (and the arithmetic-geometric mean inequality in the last step), the assumption Σ_{k=1}^∞ m_{2k} a^{2k}/(2k)! < ∞ implies that for any t ∈ ℝ,

  φ(t + z) = Σ_{k=0}^∞ φ^{(k)}(t) z^k/k!

is analytic in z with |z| < a (see [1, 2] for detailed justifications of the power series expansion). In particular, for |z| < a, φ(z) is determined by its Taylor series at 0, with Taylor coefficients φ^{(k)}(0) determined by (m_k)_{k∈ℕ}. We can then repeat the argument and Taylor expand at t = ±a/2, ±a, ±3a/2, . . . to conclude that φ(z) is determined by (m_k)_{k∈ℕ} for all z = t + ix, with t ∈ ℝ and −a < x < a. In particular, φ is determined by (m_k)_{k∈ℕ}, which in turn determines µ.

Remark. The most common distributions, such as the exponential or Gaussian distribution, satisfy Carleman's condition or the condition in Theorem 4.2. Therefore to prove that (µ_n)_{n∈ℕ} converges weakly to the exponential or Gaussian distribution, it suffices to prove that the moments of µ_n converge to those of the exponential or Gaussian. This is called the method of moments for proving weak convergence. On the other hand, a distribution whose moments satisfy Carleman's condition (4.5) is uniquely determined by its characteristic function φ on [−ε, ε] for any ε > 0, by Theorem 3.1 (i). Therefore to prove weak convergence to such a distribution using Lévy's Continuity Theorem, it is sufficient to verify the convergence of the characteristic functions on [−ε, ε] for any ε > 0.

Lastly we note that, without assuming Carleman's condition, a distribution in general is not uniquely determined by its characteristic function on a finite interval [−a, a]. This can be seen easily from the following result (see [1, 2] for a proof):

Theorem 4.3 [Pólya's Criterion] If φ : ℝ → [0, 1] is a real, continuous and even function with φ(0) = 1, lim_{t→∞} φ(t) = 0, and φ convex on [0, ∞), then φ is a characteristic function. Indeed, two such functions can be made to agree on [−a, a] and yet differ outside, producing two distinct distributions whose characteristic functions coincide on [−a, a].
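For example (a worked check added for illustration), the standard Gaussian distribution has m_{2k} = (2k)!/(2^k k!), so

  Σ_{k=1}^∞ m_{2k} a^{2k}/(2k)! = Σ_{k=1}^∞ (a²/2)^k/k! = e^{a²/2} − 1 < ∞

for every a > 0, and Theorem 4.2 confirms that the Gaussian distribution is uniquely determined by its moments.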

References

[1] R. Durrett. Probability: Theory and Examples, 2nd edition, Duxbury Press, 1996.

[2] A. Klenke. Probability Theory: A Comprehensive Course, Springer-Verlag.
