2 Functions of random variables

There are three main methods for finding the distribution of a function of one or more random variables: we can work with the cdf, transform the pdf directly, or use moment generating functions. We shall study these in turn and along the way derive some results which are useful for statistics.

2.1 Method of distribution functions

I shall give an example before discussing the general method.

Example 2.1. Suppose the random variable Y has pdf

f_Y(y) = 3y²,   0 ≤ y ≤ 1,

and zero otherwise.

Definition 2.1. We say that a random variable Y has a Gamma distribution with parameters α > 0 and β > 0, which we shall write as Y ∼ Ga(α, β), if

f_Y(y) = (β^α y^(α−1) / Γ(α)) exp(−βy),   0 ≤ y < ∞,

where

Γ(α) = ∫_0^∞ y^(α−1) exp(−y) dy

is the Gamma function.
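As a quick numerical check, the following sketch (a minimal example assuming Python with NumPy and SciPy available; the chosen values of α and β are arbitrary) verifies that the Ga(α, β) density above integrates to one, and compares it with scipy.stats.gamma, which is parameterised by a scale equal to 1/β.

```python
import numpy as np
from scipy import integrate, special, stats

def gamma_pdf(y, alpha, beta):
    """Ga(alpha, beta) density with beta as a rate parameter, as defined above."""
    return beta**alpha * y**(alpha - 1) * np.exp(-beta * y) / special.gamma(alpha)

for alpha, beta in [(1.0, 2.0), (2.5, 1.0), (3.5, 0.7)]:
    total, _ = integrate.quad(gamma_pdf, 0, np.inf, args=(alpha, beta))
    # scipy.stats.gamma uses a scale parameter, i.e. scale = 1/beta
    same = np.allclose(gamma_pdf(1.3, alpha, beta),
                       stats.gamma(a=alpha, scale=1/beta).pdf(1.3))
    print(f"alpha={alpha}, beta={beta}: integral={total:.6f}, matches scipy={same}")
```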

Note that some textbooks define the Gamma distribution with 1/β in place of β; it does not really matter which convention is used. Now we can see that

Γ(1) = ∫_0^∞ exp(−y) dy = [−e^(−y)]_0^∞ = 1.

Also, if we integrate Γ(α) by parts we see that

Γ(α) = ∫_0^∞ y^(α−1) exp(−y) dy
     = [y^(α−1)(−e^(−y))]_0^∞ + ∫_0^∞ (α − 1) y^(α−2) exp(−y) dy
     = 0 + (α − 1) ∫_0^∞ y^(α−2) exp(−y) dy
     = (α − 1) Γ(α − 1).

Note that as Γ(1) = 1 we have Γ(2) = 1 × Γ(1) = 1, Γ(3) = 2 × Γ(2) = 2, Γ(4) = 3 × Γ(3) = 6, and so on, so that if n is a positive integer Γ(n) = (n − 1)!. It is possible to show, though we are not going to, that Γ(1/2) = √π.

Definition 2.2. We say that a random variable Y with a Ga(ν/2, 1/2) distribution, where ν is a positive integer, has a Chi-square distribution with ν degrees of freedom, and we write Y ∼ χ²_ν. (ν is the Greek letter nu.)

Example 2.3. We showed in Example 2.2 that the square of a standard normal random variable had pdf

f_U(u) = (1/√(2πu)) exp(−u/2),   u > 0.

Using the results above we can rewrite this as

f_U(u) = ((1/2)^(1/2) u^(−1/2) / Γ(1/2)) exp(−u/2),

and so U has a Ga(1/2, 1/2), that is a χ²_1, distribution.

So we have proved the following theorem.

Theorem 2.1. If the random variable Z ∼ N(0, 1) then Z² ∼ χ²_1.

Example 2.4. (Probability integral transform.) Suppose X is a continuous random variable with cdf F_X. We can ask: what is the distribution of Y = F_X(X)? The cdf F_X is non-decreasing and F_X^(−1)(y) can be defined for 0 < y < 1 as the smallest x satisfying F_X(x) ≥ y. Therefore

P(Y ≤ y) = P(F_X(X) ≤ y)
         = P(X ≤ F_X^(−1)(y))
         = F_X(F_X^(−1)(y))
         = y,   0 < y < 1,

which is the cdf of a uniform distribution on (0, 1), so Y ∼ U(0, 1).
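To see the probability integral transform in action, here is a minimal simulation sketch (assuming NumPy and SciPy; the rate and sample size are arbitrary choices): it applies the exponential cdf to exponential samples and checks, with a Kolmogorov-Smirnov test, that the result looks uniform on (0, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rate = 2.0
x = rng.exponential(scale=1/rate, size=100_000)   # X ~ Exp(rate)
y = 1 - np.exp(-rate * x)                         # Y = F_X(X)

# If the theory is right, Y should be indistinguishable from U(0, 1).
print(stats.kstest(y, "uniform"))   # large p-value expected
print(y.mean(), y.var())            # roughly 1/2 and 1/12
```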

2.2 Method of transformations

For a single continuous random variable the method rests on the following result: if X has pdf f_X and Y = g(X), where g is monotonic with inverse x = g^(−1)(y), then

f_Y(y) = f_X(g^(−1)(y)) |dx/dy|

on the support of Y.

Example 2.5. Suppose X has pdf

f_X(x) = θ / x^(θ+1),   x > 1,

and zero otherwise, where θ is a positive parameter. This is an example of a Pareto distribution. We want to find the density of Y = ln X. As the support of X, i.e. the range on which the density is non-zero, is x > 1, the support of Y is y > 0. The inverse transformation is x = e^y and dx/dy = e^y. Therefore

f_Y(y) = (θ / (e^y)^(θ+1)) × e^y = θ e^(−θy),   y > 0,

and so Y has an exponential distribution.
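A minimal simulation sketch of Example 2.5 (assuming NumPy and a recent SciPy; the value of θ is arbitrary): draw Pareto(θ) samples, take logs, and compare with an Exp(θ) distribution using a Kolmogorov-Smirnov test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
theta = 1.5
# scipy.stats.pareto(b) has pdf b / x**(b + 1) for x >= 1, matching Example 2.5
x = stats.pareto(theta).rvs(size=100_000, random_state=rng)
y = np.log(x)

# Y should follow an exponential distribution with rate theta (scale 1/theta).
print(stats.kstest(y, stats.expon(scale=1/theta).cdf))   # large p-value expected
print(y.mean(), 1/theta)                                  # both roughly 0.667
```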

There is an analogous result for transforming two, or indeed n, random variables. I am not going to prove it as a theorem (it rests on results from calculus courses), but I will indicate the method.

Suppose X1, X2 have joint pdf f_{X1,X2}(x1, x2) with support A = {(x1, x2) : f(x1, x2) > 0}. We are interested in the random variables Y1 = g1(X1, X2) and Y2 = g2(X1, X2). Suppose the transformation y1 = g1(x1, x2), y2 = g2(x1, x2) is a one-to-one transformation of A onto B, so that there is an inverse transformation x1 = g1^(−1)(y1, y2), x2 = g2^(−1)(y1, y2). Let the determinant

J = | ∂x1/∂y1  ∂x1/∂y2 |
    | ∂x2/∂y1  ∂x2/∂y2 |

be the Jacobian of the transformation, where we assume the partial derivatives are continuous and J ≠ 0 for (y1, y2) ∈ B. Then the joint pdf of Y1 = g1(X1, X2) and Y2 = g2(X1, X2) is

f_{Y1,Y2}(y1, y2) = |J| f_{X1,X2}(g1^(−1)(y1, y2), g2^(−1)(y1, y2)),   (y1, y2) ∈ B.

We will look at some examples.

Example 2.6. X1 and X2 have joint pdf

f(x1, x2) = exp(−(x1 + x2)),   x1 ≥ 0, x2 ≥ 0.

Consider the transformation y1 = x1, y2 = x1 + x2, with inverse x1 = y1, x2 = y2 − y1. The set B = {(y1, y2) : 0 ≤ y1 ≤ y2 < ∞}. The Jacobian is

J = |  1  0 |
    | −1  1 | = 1.

So the joint pdf of Y1 and Y2 is given by

f_{Y1,Y2}(y1, y2) = 1 × exp(−y2),   0 ≤ y1 ≤ y2 < ∞.

If we want the pdf of Y2 = X1 + X2 we must find the marginal pdf of Y2 by integrating out y1:

f_{Y2}(y2) = ∫_0^{y2} e^(−y2) dy1 = y2 e^(−y2),   0 ≤ y2 < ∞.
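A simulation sketch of Example 2.6 (assuming NumPy and SciPy): the sum of two independent Exp(1) variables should match a Ga(2, 1) distribution, whose density is y e^(−y).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x1 = rng.exponential(scale=1.0, size=100_000)
x2 = rng.exponential(scale=1.0, size=100_000)
y2 = x1 + x2

# Compare with Ga(2, 1), whose density is y * exp(-y) for y >= 0.
print(stats.kstest(y2, stats.gamma(a=2, scale=1.0).cdf))   # large p-value expected
print(y2.mean(), y2.var())                                  # both roughly 2
```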


Note in this example that as we started with two random variables we have to transform to two random variables. If we are only interested in one of them we can integrate out the other.

Example 2.7. X1 and X2 have joint pdf

f(x1, x2) = 8 x1 x2,   0 < x1 < x2 < 1.

Suppose we want to find the pdf of Y1 = X1/X2. We need another variable; we choose Y2 = X2 as we can then find the inverse easily. The inverse is x1 = y1 y2, x2 = y2. The Jacobian is

J = | y2  y1 |
    |  0   1 | = y2.

We have A = {0 < x1 < x2 < 1}, which implies that B = {0 < y1 y2 < y2 < 1}, that is B = {0 < y1 < 1, 0 < y2 < 1}. So

f(y1, y2) = 8(y1 y2) y2 × y2 = 8 y1 y2³,   (y1, y2) ∈ B.

Thus the marginal pdf of Y1 is

f(y1) = ∫_0^1 8 y1 y2³ dy2 = 8 y1 [y2⁴/4]_0^1 = 2 y1,   0 < y1 < 1.
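A simulation sketch of Example 2.7 (assuming NumPy and SciPy; the proposal scheme and sample size are arbitrary choices): draw (X1, X2) from the joint density 8 x1 x2 on 0 < x1 < x2 < 1 by rejection sampling and check that Y1 = X1/X2 has cdf y1², i.e. density 2 y1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Rejection sampling: propose uniformly on the unit square, accept with
# probability f(x1, x2) / 8 = x1 * x2 on the region x1 < x2.
u1, u2, u3 = rng.random((3, 400_000))
keep = (u1 < u2) & (u3 < u1 * u2)
x1, x2 = u1[keep], u2[keep]

y1 = x1 / x2
# The claimed marginal density is 2*y1 on (0, 1), i.e. cdf F(y) = y**2.
print(stats.kstest(y1, lambda y: y**2))   # large p-value expected
print(y1.mean(), 2/3)                     # E[Y1] = integral of 2y^2 dy = 2/3
```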

2.3 Method of moment generating functions

The moment generating function (mgf) of a random variable X, written M_X(t), is defined by

M_X(t) = E[e^(tX)],

provided this expectation exists for t in a region around 0, say −h < t < h for some h > 0. Why is M_X(t) useful?


First note that M_X(0) = 1. Differentiating M_X(t) with respect to t, and assuming X is continuous, we have

M_X'(t) = d/dt ∫ e^(tx) f(x) dx = ∫ x e^(tx) f(x) dx,

so

M_X'(0) = ∫ x f(x) dx = E[X],

where we assume we can take d/dt inside the integral. Similarly

M_X''(t) = d²/dt² ∫ e^(tx) f(x) dx = ∫ x² e^(tx) f(x) dx,

so

M_X''(0) = ∫ x² f(x) dx = E[X²].

Hence Var[X] = M_X''(0) − (M_X'(0))². If we can calculate the value of the integral (or sum, for a discrete random variable) in terms of t then we can find the moments of X by differentiation.

Example 2.8. Suppose X is a discrete binomial random variable with probability mass function

f(x) = (n choose x) p^x (1 − p)^(n−x),   x = 0, 1, . . . , n.


Then the moment generating function is

M(t) = Σ_{x=0}^{n} e^(tx) (n choose x) p^x (1 − p)^(n−x)
     = Σ_{x=0}^{n} (n choose x) (p e^t)^x (1 − p)^(n−x)
     = (p e^t + (1 − p))^n.

Thus

M'(t) = n (p e^t + (1 − p))^(n−1) p e^t,

so M'(0) = np = E[X]. Also

M''(t) = n(n − 1) (p e^t + (1 − p))^(n−2) p² e^(2t) + n (p e^t + (1 − p))^(n−1) p e^t,

so M''(0) = n(n − 1)p² + np and

Var[X] = n(n − 1)p² + np − (np)² = np − np² = np(1 − p).

Example 2.9. Consider an exponential distribution with pdf f(x) = λ e^(−λx), x > 0. Then, for t < λ,

M(t) = ∫_0^∞ e^(tx) λ e^(−λx) dx
     = ∫_0^∞ λ e^(−x(λ−t)) dx
     = [−λ e^(−x(λ−t)) / (λ − t)]_0^∞
     = λ / (λ − t).

So M'(t) = λ(λ − t)^(−2), M'(0) = λ^(−1) = E[X], and M''(t) = 2λ(λ − t)^(−3), M''(0) = 2λ^(−2), hence Var[X] = 2λ^(−2) − (λ^(−1))² = λ^(−2).
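As a quick numerical illustration of how the mgf encodes moments, the sketch below (assuming NumPy; λ and the step size h are arbitrary) differentiates the exponential mgf λ/(λ − t) numerically at t = 0 and recovers the mean 1/λ and variance 1/λ².

```python
import numpy as np

lam = 3.0
M = lambda t: lam / (lam - t)   # mgf of an Exp(lam) random variable, valid for t < lam

h = 1e-5
m1 = (M(h) - M(-h)) / (2 * h)             # central difference for M'(0)
m2 = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # central difference for M''(0)

print(m1, 1 / lam)              # E[X]   = M'(0)               = 1/lam
print(m2 - m1**2, 1 / lam**2)   # Var[X] = M''(0) - M'(0)^2    = 1/lam^2
```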


Example 2.10. Suppose X has a Gamma distribution, Ga(α, β). We find the mgf of X as follows:

M_X(t) = ∫_0^∞ e^(tx) (β^α / Γ(α)) x^(α−1) exp(−βx) dx
       = (β^α / Γ(α)) ∫_0^∞ x^(α−1) exp(−x(β − t)) dx
       = (β^α / (β − t)^α) ∫_0^∞ ((β − t)^α / Γ(α)) x^(α−1) exp(−x(β − t)) dx.

Now the integral is of the pdf of a Ga(α, β − t) random variable and so is equal to 1. Note we have to have t < β to make this work; that is ok, so long as we can let t → 0, which we can. Thus the mgf of a Ga(α, β) random variable X is

M_X(t) = (β / (β − t))^α.

We now find the mgf of a normally distributed random variable.

Example 2.11. Suppose X has a normal distribution, N(µ, σ²); find the mgf of X.

M_X(t) = ∫_{−∞}^{∞} e^(tx) (1/(σ√(2π))) exp(−(x − µ)²/(2σ²)) dx
       = ∫_{−∞}^{∞} (1/(σ√(2π))) exp(−[x² − 2µx + µ² − 2σ²tx]/(2σ²)) dx
       = ∫_{−∞}^{∞} (1/(σ√(2π))) exp(−[x² − 2x(µ + σ²t) + µ²]/(2σ²)) dx.

Now we complete the square:

x² − 2x(µ + σ²t) + µ² = [x − (µ + σ²t)]² + µ² − (µ + σ²t)²
                      = [x − (µ + σ²t)]² − (2µσ²t + σ⁴t²),

and so, as the final bracket does not depend on x, we can take it outside the integral to give

M_X(t) = exp((2µσ²t + σ⁴t²)/(2σ²)) ∫_{−∞}^{∞} (1/(σ√(2π))) exp(−[x − (µ + σ²t)]²/(2σ²)) dx.

Now the function inside the integral is the pdf of a N(µ + σ²t, σ²) random variable and so the integral is equal to one. Thus

M_X(t) = exp(µt + σ²t²/2).

Now, differentiating the mgf we find

M'(t) = exp(µt + σ²t²/2) (µ + σ²t),   M'(0) = µ = E[X],

and

M''(t) = exp(µt + σ²t²/2) (µ + σ²t)² + exp(µt + σ²t²/2) σ²,   M''(0) = µ² + σ²,

and hence Var[X] = µ² + σ² − µ² = σ².

The following theorem, which we won't prove, tells us why we can use the mgf to find the distributions of transformed variables.

Theorem 2.3. If X1 and X2 are random variables and M_{X1}(t) = M_{X2}(t) for all t in a region around 0, then X1 and X2 have the same distribution.

Example 2.12. Suppose Z ∼ N(0, 1) and Y = Z². Find the distribution of Y using the mgf technique. We have that

M_Y(t) = E[e^(tY)] = E[e^(tZ²)]
       = ∫_{−∞}^{∞} e^(tz²) (1/√(2π)) exp(−z²/2) dz
       = ∫_{−∞}^{∞} (1/√(2π)) exp(−z²(1 − 2t)/2) dz
       = (1 − 2t)^(−1/2) ∫_{−∞}^{∞} ((1 − 2t)^(1/2)/√(2π)) exp(−(1 − 2t)z²/2) dz.

Now the function inside the integral is the pdf of a N(0, (1 − 2t)^(−1)) random variable and so equals one. Therefore

M_Y(t) = (1/(1 − 2t))^(1/2) = ((1/2)/((1/2) − t))^(1/2),

which is the mgf of a Ga(1/2, 1/2) random variable, or equivalently of a χ²_1 random variable. Thus the distribution of Y is χ²_1.
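The uniqueness property in Theorem 2.3 can be illustrated numerically. This sketch (assuming NumPy; the grid of t values and the sample size are arbitrary) estimates the mgf of Z² by Monte Carlo and compares it with (1 − 2t)^(−1/2); the agreement is what identifies the χ²_1 distribution.

```python
import numpy as np

rng = np.random.default_rng(4)
z = rng.standard_normal(1_000_000)

# Empirical mgf of Y = Z^2 at a few t values (kept well below 1/2 so that
# E[exp(t*Y)] is finite and the Monte Carlo estimate is stable),
# against the Ga(1/2, 1/2) mgf (1 - 2t)^(-1/2).
for t in (-1.0, -0.5, -0.1, 0.1, 0.2):
    empirical = np.exp(t * z**2).mean()
    exact = (1 - 2 * t) ** (-0.5)
    print(f"t={t:+.2f}: empirical={empirical:.4f}, (1-2t)^(-1/2)={exact:.4f}")
```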

The moment generating function is also useful for proving other results, for example results about sums of random variables. Suppose X1, X2, . . . , Xn are independent random variables with mgfs M_{Xi}(t), i = 1, . . . , n. Let Y = Σ_i Xi. Then

M_Y(t) = Π_{i=1}^{n} M_{Xi}(t).

This is easily proved:

M_Y(t) = E[e^(tY)]
       = E[e^(t Σ_i Xi)]
       = E[e^(tX1) e^(tX2) · · · e^(tXn)]
       = E[e^(tX1)] E[e^(tX2)] · · · E[e^(tXn)]   (by independence)
       = M_{X1}(t) M_{X2}(t) · · · M_{Xn}(t).

Example 2.13. If X1, X2, . . . , Xn are independent, each with an exponential distribution with mean λ^(−1), show that Y = Σ_i Xi has a Ga(n, λ) distribution. We showed in Example 2.9 that for an exponential distribution M_{Xi}(t) = λ/(λ − t). Thus

M_Y(t) = Π_{i=1}^{n} λ/(λ − t) = (λ/(λ − t))^n,

which is the mgf of a Ga(n, λ) random variable.
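A simulation sketch of Example 2.13 (assuming NumPy and SciPy; n, λ and the number of replications are arbitrary): sum n independent exponentials with mean 1/λ and compare with a Ga(n, λ) distribution, i.e. shape n and scale 1/λ in SciPy's parameterisation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, lam, reps = 5, 2.0, 100_000

x = rng.exponential(scale=1/lam, size=(reps, n))
y = x.sum(axis=1)

print(stats.kstest(y, stats.gamma(a=n, scale=1/lam).cdf))   # large p-value expected
print(y.mean(), n/lam)                                       # both roughly 2.5
print(y.var(), n/lam**2)                                     # both roughly 1.25
```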

Example 2.14. Suppose Y1, Y2, . . . , Yn are independent and normally distributed with means E[Yi] = µi and variances Var[Yi] = σi². Define U = a1 Y1 + a2 Y2 + · · · + an Yn, where the ai, i = 1, 2, . . . , n, are constants. Show that U is normally distributed with mean E[U] = Σ_i ai µi and variance Var[U] = Σ_i ai² σi².

The mgf of Yi is M_{Yi}(t) = exp(µi t + σi² t²/2), so the mgf of ai Yi is

M_{ai Yi}(t) = E[e^(ai Yi t)] = exp(µi ai t + σi² ai² t²/2).

The Yi are independent, so the ai Yi are independent. Hence

M_U(t) = Π_i M_{ai Yi}(t)
       = Π_i exp(µi ai t + σi² ai² t²/2)
       = exp((Σ_i ai µi) t + (Σ_i ai² σi²) t²/2).

Comparing this with the mgf of a normal distribution we see that U is normal with mean Σ_i ai µi and variance Σ_i ai² σi².
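A simulation sketch of Example 2.14 (assuming NumPy and SciPy; the means, standard deviations and constants are arbitrary choices): the sample mean and variance of U = Σ ai Yi should match Σ ai µi and Σ ai² σi², and U should pass a Kolmogorov-Smirnov check against that normal distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([0.5, 1.5, 2.0])
a = np.array([2.0, -1.0, 0.3])
reps = 100_000

y = rng.normal(loc=mu, scale=sigma, size=(reps, 3))
u = y @ a

mean_u, var_u = a @ mu, (a**2) @ sigma**2
print(u.mean(), mean_u)   # both roughly 4.15
print(u.var(), var_u)     # both roughly 3.61
print(stats.kstest(u, stats.norm(loc=mean_u, scale=np.sqrt(var_u)).cdf))
```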


We give the next result, about sums of squared standard normal random variables, as a theorem as the result is important.

Theorem 2.4. Suppose Y1, . . . , Yn are independent and normally distributed with means E[Yi] = µi and variances Var[Yi] = σi². Let Zi = (Yi − µi)/σi, so that Z1, . . . , Zn are independent and each has a N(0, 1) distribution. Then Σ_i Zi² has a χ²_n distribution.

Proof. We have seen before that each Zi² has a χ²_1 distribution, so M_{Zi²}(t) = (1 − 2t)^(−1/2). Let V = Σ_i Zi². Then

M_V(t) = Π_i M_{Zi²}(t) = 1/(1 − 2t)^(n/2) = ((1/2)/((1/2) − t))^(n/2),

but this is the mgf of a Ga(n/2, 1/2) random variable, that is of a χ²_n random variable. □
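A simulation sketch of Theorem 2.4 (assuming NumPy and SciPy; the degrees of freedom tested are arbitrary); with n = 1 it also illustrates Theorem 2.1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
reps = 100_000

for n in (1, 4, 10):
    z = rng.standard_normal(size=(reps, n))
    v = (z**2).sum(axis=1)                       # V = sum of n squared N(0,1) draws
    ks = stats.kstest(v, stats.chi2(df=n).cdf)   # compare with a chi-square(n) cdf
    print(f"n={n}: mean={v.mean():.3f} (expect {n}), var={v.var():.3f} (expect {2*n}), "
          f"KS p-value={ks.pvalue:.3f}")
```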
