STAT 421 Lecture Notes
4.4 Moments
Let X be a random variable and k a positive integer. The kth moment of X is defined to be E(X^k). The kth moment need not exist, though the kth and all lower moments exist if E(|X|^k) exists (Theorem 4.4.1 below).

Theorem 4.4.1. If E(|X^k|) < \infty for some positive integer k, then E(|X^j|) < \infty for all positive integers j < k.

A sketch of the proof for the case of X with a continuous distribution follows. Suppose that E(|X^k|) < \infty for some positive integer k and that j is a positive integer less than k. Since |x^j| \le 1 for |x| \le 1 and |x^j| \le |x^k| for |x| > 1,

    E(|X^j|) = \int_{-\infty}^{\infty} |x^j| f(x)\,dx
             = \int_{-1}^{1} |x^j| f(x)\,dx + \int_{\{x : |x| > 1\}} |x^j| f(x)\,dx
             \le \int_{-1}^{1} f(x)\,dx + \int_{\{x : |x| > 1\}} |x^k| f(x)\,dx
             \le \Pr(|X| \le 1) + E(|X^k|) < \infty.

Central Moments

Suppose that X is a random variable with E(X) = \mu. For every positive integer k, the kth central moment is defined to be E[(X - \mu)^k]. For example, the second central moment is the variance E[(X - \mu)^2]. Suppose that the distribution of X is symmetric about \mu and that E[(X - \mu)^k] exists. Then, if k is odd, E[(X - \mu)^k] = 0 because, after substituting y = x - \mu and using the symmetry f(\mu - y) = f(\mu + y),

    E[(X - \mu)^k] = \int_{-\infty}^{0} y^k f(\mu + y)\,dy + \int_{0}^{\infty} y^k f(\mu + y)\,dy
                   = -\int_{0}^{\infty} y^k f(\mu + y)\,dy + \int_{0}^{\infty} y^k f(\mu + y)\,dy = 0.
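The inequality chain in the proof can be checked numerically. The sketch below uses a Monte Carlo sample; the choice of a standard normal X is only for illustration:

```python
import random

random.seed(1)
n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]  # illustrative choice of X

j, k = 2, 4  # j < k
e_abs_j = sum(abs(x) ** j for x in xs) / n         # estimate of E(|X^j|)
e_abs_k = sum(abs(x) ** k for x in xs) / n         # estimate of E(|X^k|)
pr_le_1 = sum(1 for x in xs if abs(x) <= 1.0) / n  # estimate of Pr(|X| <= 1)

# The bound E(|X^j|) <= Pr(|X| <= 1) + E(|X^k|) from the proof:
print(e_abs_j, "<=", pr_le_1 + e_abs_k)
```

Here the left side is near E(X^2) = 1 and the right side near 0.68 + 3, so the bound holds with room to spare.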
Example 4.4.1. Suppose that X is continuous with p.d.f.

    f(x) = c e^{-(x-3)^2/2}, \quad x \in \mathbb{R}.

The p.d.f. is symmetric about 3, and so the median and mean of X are both \mu = 3. It can be shown that for every positive integer k,

    E(|X^k|) = \int_{-\infty}^{\infty} |x^k| c e^{-(x-3)^2/2}\,dx < \infty.
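The density in Example 4.4.1 is that of a normal distribution with mean 3 and unit variance (so c = 1/\sqrt{2\pi}). A quick simulation, consistent with the symmetry result above, shows the odd central moments vanishing:

```python
import random

random.seed(5)
n = 200_000
mu = 3.0
xs = [random.gauss(mu, 1.0) for _ in range(n)]  # draws from f(x) = c e^{-(x-3)^2/2}

odd_moments = {}
for k in (1, 3, 5):
    odd_moments[k] = sum((x - mu) ** k for x in xs) / n  # estimates E[(X - mu)^k]
print(odd_moments)
```

Each estimate is near 0, with Monte Carlo noise growing with k.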
By symmetry of f, E[(X - \mu)^k] = 0 when k is an odd positive integer. For the even central moments, a recursive formula can be developed. Let k = 2n and y = x - \mu. Then,

    E[(X - \mu)^k] = \int_{-\infty}^{\infty} y^{2n} c e^{-y^2/2}\,dy.

Using integration by parts, set

    u = y^{2n-1} \Rightarrow du = (2n - 1)y^{2n-2}\,dy,
    dv = c y e^{-y^2/2}\,dy \Rightarrow v = -c e^{-y^2/2}.

Then,

    E[(X - \mu)^{2n}] = \left[-y^{2n-1} c e^{-y^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} (2n - 1) y^{2n-2} c e^{-y^2/2}\,dy
                      = 0 + (2n - 1)E[(X - \mu)^{2n-2}]
                      = (2n - 1)E[(X - \mu)^{2n-2}].

Now, E[(X - \mu)^0] = E(1) = 1, E[(X - \mu)^2] = (2 - 1) \times 1 = 1, E[(X - \mu)^4] = (4 - 1) \times 1 = 3, E[(X - \mu)^6] = (6 - 1) \times 3 = 15, and so on.

Skewness is a measure of lack of symmetry based on the third central moment. Since the third central moment is 0 if the distribution is symmetric, the difference between E[(X - \mu)^3] and 0 reflects lack of symmetry. For interpretability, E[(X - \mu)^3] is scaled by the cube of the standard deviation, and the measure of skewness is

    \frac{E[(X - \mu)^3]}{(E[(X - \mu)^2])^{3/2}} = \frac{E[(X - \mu)^3]}{\sigma^3}.

The figure below and to the left shows the probability function for the binomial distribution as a function of p. Values of p are drawn from the set \{.1, .15, .2, \ldots, .85, .9\}. Skewness is plotted against p in the figure below and to the right.
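Returning to the even central moments: the recursion and its starting values can be checked numerically. The sketch below assumes c = 1/\sqrt{2\pi} (the normalizing constant for the p.d.f. of Example 4.4.1) and uses a simple trapezoidal rule:

```python
import math

def even_central_moment(n, half_width=10.0, steps=100_000):
    """Trapezoidal-rule approximation of the integral of
    y^(2n) * c * exp(-y^2/2) over [-half_width, half_width]."""
    c = 1.0 / math.sqrt(2.0 * math.pi)  # assumed normalizing constant
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        w = 0.5 if i in (0, steps) else 1.0  # trapezoidal endpoint weights
        total += w * y ** (2 * n) * c * math.exp(-y * y / 2.0)
    return total * h

# Recursion: E[(X - mu)^(2n)] = (2n - 1) * E[(X - mu)^(2n - 2)], starting at 1.
recursive = [1.0]
for n in range(1, 4):
    recursive.append((2 * n - 1) * recursive[-1])

for n in range(4):
    print(2 * n, even_central_moment(n), recursive[n])
```

The numerical integrals agree with the recursive values 1, 1, 3, 15.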
[Figure: left panel, Pr(X = x) plotted against x (0 to 20) for binomial distributions with p \in \{.1, .15, \ldots, .9\}; right panel, skewness plotted against p.]
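The skewness curve in the right panel can be reproduced directly from the p.f. The sketch below assumes n = 20 (matching the x axis of the left panel) and computes the skewness E[(X - \mu)^3]/\sigma^3:

```python
from math import comb

def binomial_skewness(n, p):
    """Skewness E[(X - mu)^3] / sigma^3 computed directly from the binomial p.f."""
    pf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
    mu = sum(x * q for x, q in enumerate(pf))
    m2 = sum((x - mu) ** 2 * q for x, q in enumerate(pf))  # second central moment
    m3 = sum((x - mu) ** 3 * q for x, q in enumerate(pf))  # third central moment
    return m3 / m2 ** 1.5

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(p, round(binomial_skewness(20, p), 4))
```

The result agrees with the known closed form (1 - 2p)/\sqrt{np(1-p)}: skewness is positive for p < 1/2, zero at p = 1/2, and negative for p > 1/2.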
Moment Generating Functions

Definition 4.4.2. Let X be a random variable. The moment generating function (m.g.f.) of X is defined to be \psi(t) = E(e^{tX}), t \in \mathbb{R}. Any two random variables with the same distribution have the same moment generating function. The m.g.f. may be infinite for some values of t (though if X is bounded, then \psi(t) is finite for every t). Note that \psi(0) = E(1) = 1 always exists, but is uninteresting.

Theorem 4.4.2. Let X be a random variable whose m.g.f. \psi(t) is finite for all values of t in an open interval containing t = 0. Then, for each positive integer n, E(X^n), the nth moment of X, is finite and equals the nth derivative of \psi(t) evaluated at t = 0. The nth derivative evaluated at 0 is denoted by \psi^{(n)}(0).

The proof of the theorem depends strongly on the following result:

    \psi^{(n)}(t) = \frac{d^n E(e^{tX})}{dt^n} = E\left(\frac{d^n e^{tX}}{dt^n}\right).

Accepting the truth of this statement leads to

    \psi^{(1)}(0) = E\left(\left.\frac{d\,e^{tX}}{dt}\right|_{t=0}\right) = E(Xe^{0 \cdot X}) = E(X).
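Theorem 4.4.2 can be illustrated by differentiating an m.g.f. numerically. The sketch below uses a Bernoulli(p) variable, whose m.g.f. 1 - p + pe^t is a special case of the binomial m.g.f. derived later in this section; central finite differences approximate \psi^{(1)}(0) and \psi^{(2)}(0):

```python
import math

p = 0.3
def psi(t):
    return 1 - p + p * math.exp(t)  # m.g.f. of a Bernoulli(p) variable

h = 1e-5
d1 = (psi(h) - psi(-h)) / (2 * h)            # approximates psi'(0)  = E(X)
d2 = (psi(h) - 2 * psi(0) + psi(-h)) / h**2  # approximates psi''(0) = E(X^2)
print(d1, d2)
```

Both approximations are close to p = 0.3, as the theorem predicts (for a Bernoulli variable X^2 = X, so E(X) = E(X^2) = p).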
Example 4.4.3. Suppose that X has the following p.d.f.: f(x) = e^{-x} I_{\{r|r>0\}}(x). The m.g.f. of X is

    \psi(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} e^{-x}\,dx = \int_0^{\infty} e^{(t-1)x}\,dx.

This integral is finite provided that t \in (-\infty, 1), and for such t, \psi(t) = (1 - t)^{-1}. To determine the first and second moments, and the variance of X, we compute

    \psi^{(1)}(t) = (1 - t)^{-2} \Rightarrow \psi^{(1)}(0) = E(X) = 1,
    \psi^{(2)}(t) = 2(1 - t)^{-3} \Rightarrow \psi^{(2)}(0) = E(X^2) = 2,
    Var(X) = \psi^{(2)}(0) - [\psi^{(1)}(0)]^2 = 1.

Question 1: Compute E(X) where X has the following p.f.:

    f(x) = \frac{\lambda^x e^{-\lambda}}{x!} I_{\{0,1,2,\ldots\}}(x).
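The m.g.f. in Example 4.4.3 can be checked by simulation, since \psi(t) is the mean of e^{tX}; a sketch, sampling from the exponential p.d.f. e^{-x}:

```python
import random, math

random.seed(7)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]  # X with p.d.f. e^{-x}, x > 0

results = {}
for t in (-1.0, 0.0, 0.4):  # all within (-inf, 1), where psi is finite
    estimate = sum(math.exp(t * x) for x in xs) / n  # sample mean of e^{tX}
    results[t] = (estimate, 1.0 / (1.0 - t))         # (Monte Carlo, psi(t) = (1 - t)^{-1})
print(results)
```

The Monte Carlo estimates track (1 - t)^{-1} closely at each t.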
Properties of moment generating functions

Let X be a random variable with m.g.f. \psi_1(t) and let Y = aX + b, where a and b are constants. Let \psi_2(t) denote the m.g.f. of Y. Then, for every value of t for which \psi_1(at) is finite,

    \psi_2(t) = e^{bt} \psi_1(at).

The proof proceeds as follows:

    \psi_2(t) = E\left[e^{(aX+b)t}\right] = E\left[e^{bt} e^{aXt}\right] = e^{bt} E\left[e^{atX}\right] = e^{bt} \psi_1(at).

Example 4.4.4. Suppose that X has the following p.d.f.: f(x) = e^{-x} I_{\{r|r>0\}}(x),
and so the m.g.f. of X is \psi_1(t) = (1 - t)^{-1}. If Y = 3 - 2X, then the m.g.f. of Y is finite for t > -1/2, where it has the value

    \psi_2(t) = e^{3t} \psi_1(-2t) = \frac{e^{3t}}{1 + 2t}.
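A quick Monte Carlo check of Example 4.4.4 (the value t = 0.25 below is an arbitrary point with t > -1/2):

```python
import random, math

random.seed(11)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]  # X with m.g.f. (1 - t)^{-1}

t = 0.25
estimate = sum(math.exp(t * (3 - 2 * x)) for x in xs) / n  # sample mean of e^{tY}, Y = 3 - 2X
exact = math.exp(3 * t) / (1 + 2 * t)                      # e^{3t} / (1 + 2t)
print(round(estimate, 4), round(exact, 4))
```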
Sums of independent random variables (but not necessarily identically distributed random variables) are important statistics, and computing the m.g.f. of the sum is a convenient method for determining the distribution, or at least the moments, of the sum. The following theorem is key:

Theorem 4.4.4. Suppose that X_1, \ldots, X_n are independent random variables with m.g.f.s \psi_1, \ldots, \psi_n. Let Y = \sum_{i=1}^n X_i and let \psi denote the m.g.f. of Y. Then, for every value of t for which \psi_1(t), \ldots, \psi_n(t) are finite,

    \psi(t) = \prod_{i=1}^n \psi_i(t).

The proof proceeds as follows:

    \psi(t) = E\left(e^{t \sum_{i=1}^n X_i}\right)
            = E\left(\prod_{i=1}^n e^{tX_i}\right)
            = \prod_{i=1}^n E\left(e^{tX_i}\right) \quad \text{(by independence of } X_1, \ldots, X_n\text{)}
            = \prod_{i=1}^n \psi_i(t).
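A sketch verifying Theorem 4.4.4 for the sum of two independent exponential variables (each with m.g.f. (1 - t)^{-1}, from Example 4.4.3):

```python
import random, math

random.seed(3)
n = 200_000
t = 0.3  # within (-inf, 1), where both m.g.f.s are finite
# Y = X1 + X2 with X1, X2 independent exponentials
ys = [random.expovariate(1.0) + random.expovariate(1.0) for _ in range(n)]
estimate = sum(math.exp(t * y) for y in ys) / n
exact = (1 - t) ** -2  # the product of the two m.g.f.s
print(round(estimate, 4), round(exact, 4))
```

The sample mean of e^{tY} agrees with (1 - t)^{-2}, the product predicted by the theorem.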
The moment generating function for the binomial random variable

Suppose that X \sim Binom(n, p) and that X_i \sim Binom(1, p), i = 1, \ldots, n, are independent. Then, X = \sum X_i. Furthermore, the m.g.f. of X_i is

    \psi_i(t) = \sum_{x \in \{0,1\}} e^{tx} p^x (1 - p)^{1-x} = 1 - p + pe^t.

By Theorem 4.4.4, the m.g.f. of X is

    \psi(t) = \prod_{i=1}^n \psi_i(t) = (1 - p + pe^t)^n.

Theorem 4.4.5. If the m.g.f.s of random variables X and Y are finite and identical in an open interval containing t = 0, then the probability distributions of X and Y are identical.
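The binomial m.g.f. can be confirmed exactly by summing e^{tx} against the p.f.; a sketch (the particular n, p, and t are arbitrary):

```python
from math import comb, exp

def binom_mgf_from_pf(n, p, t):
    """E(e^{tX}) computed by summing e^{tx} against the binomial p.f."""
    return sum(exp(t * x) * comb(n, x) * p**x * (1 - p) ** (n - x)
               for x in range(n + 1))

n, p, t = 10, 0.3, 0.7
lhs = binom_mgf_from_pf(n, p, t)
rhs = (1 - p + p * exp(t)) ** n  # the closed form derived above
print(lhs, rhs)
```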
The proof is beyond the scope of the book.

Theorem 4.4.6. Suppose that X \sim Binom(n_1, p) and Y \sim Binom(n_2, p) are independent (note that p is the same for both random variables). Then, X + Y \sim Binom(n_1 + n_2, p).

To prove the claim, let \psi_1(t) and \psi_2(t) denote the m.g.f.s of X and Y, and let \psi(t) denote the m.g.f. of X + Y. By Theorem 4.4.4,

    \psi(t) = \psi_1(t)\psi_2(t) = (1 - p + pe^t)^{n_1} (1 - p + pe^t)^{n_2} = (1 - p + pe^t)^{n_1 + n_2}.

This is the m.g.f. of a Binom(n_1 + n_2, p) random variable, and so by Theorem 4.4.5 the distribution of X + Y is that of a Binom(n_1 + n_2, p) random variable.
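Theorem 4.4.6 can also be verified directly: the p.f. of X + Y, obtained by convolving the two binomial p.f.s, matches the Binom(n1 + n2, p) p.f. term by term. A sketch with arbitrary n1, n2, p:

```python
from math import comb

def binom_pf(n, p, x):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n1, n2, p = 4, 6, 0.35  # arbitrary illustrative values
support = range(n1 + n2 + 1)
# p.f. of X + Y by convolving the p.f.s of X ~ Binom(n1, p) and Y ~ Binom(n2, p)
conv = [sum(binom_pf(n1, p, x) * binom_pf(n2, p, s - x)
            for x in range(max(0, s - n2), min(n1, s) + 1))
        for s in support]
direct = [binom_pf(n1 + n2, p, s) for s in support]
max_diff = max(abs(a - b) for a, b in zip(conv, direct))
print(max_diff)
```

The maximum discrepancy is at the level of floating-point roundoff, matching the conclusion reached through the m.g.f.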