5. Moment Generating Functions and Functions of Random Variables

5.1 The Distribution of a Sum of Two Independent Random Variables - Convolutions

Suppose X and Y are discrete random variables. Let S = X + Y. We have

P(S = k) = \sum_{x ∈ S_X, k - x ∈ S_Y} P(X = x, Y = k - x),

i.e. we sum the probabilities of all the possible combinations of X and Y that lead to the sum being k. In particular, if X and Y are non-negative random variables that take integer values, then

P(S = k) = \sum_{x=0}^{k} P(X = x, Y = k - x).

Example 5.1

Using the fact that

(x + y)^k = \sum_{i=0}^{k} \binom{k}{i} x^i y^{k-i},

show that the sum of two independent Poisson distributed random variables also has a Poisson distribution.

Assume that X ∼ Poisson(λ) and Y ∼ Poisson(µ). We have

P(S = k) = \sum_{i=0}^{k} P(X = i, Y = k - i) = \sum_{i=0}^{k} P(X = i) P(Y = k - i)
         = \sum_{i=0}^{k} \frac{e^{-λ} λ^i}{i!} \frac{e^{-µ} µ^{k-i}}{(k-i)!}
         = \frac{e^{-µ-λ}}{k!} \sum_{i=0}^{k} \frac{k! λ^i µ^{k-i}}{i!(k-i)!}
         = \frac{e^{-µ-λ} (µ + λ)^k}{k!}.

Hence, X + Y ∼ Poisson(µ + λ).
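A minimal numerical check of this closure property, using scipy.stats; the rates, seed-free setup and variable names below are our own illustrative choices, not part of the original notes.

```python
import numpy as np
from scipy.stats import poisson

lam, mu = 2.0, 3.0                     # illustrative rates
ks = np.arange(15)

# Convolution formula: P(S = k) = sum_i P(X = i) * P(Y = k - i)
conv = np.array([
    np.sum(poisson.pmf(np.arange(k + 1), lam) * poisson.pmf(k - np.arange(k + 1), mu))
    for k in ks
])

print(np.allclose(conv, poisson.pmf(ks, lam + mu)))   # True: S ~ Poisson(lam + mu)
```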

Convolutions of Continuous Random Variables

Suppose X and Y are continuous random variables. Let S = X + Y. We have

f_S(k) = \int_{-∞}^{∞} f_{X,Y}(x, k - x) dx,

i.e. we integrate over all the possible combinations of X and Y that lead to the sum equalling k. In particular, if X and Y are non-negative random variables, then

f_S(k) = \int_{0}^{k} f_{X,Y}(x, k - x) dx.

Example 5.2

Suppose X and Y are independent variables from the Exp(λ) distribution. Derive the density function of the sum S = X + Y.

Since exponential random variables are non-negative, we have

f_S(k) = \int_0^k f_{X,Y}(x, k - x) dx = \int_0^k f_X(x) f_Y(k - x) dx
       = \int_0^k λe^{-λx} λe^{-λ(k-x)} dx
       = λ^2 \int_0^k e^{-kλ} dx
       = λ^2 k e^{-kλ}.
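The density λ^2 k e^{-kλ} is that of a Gamma distribution with shape 2 and rate λ. A quick sanity check of the algebra; the rate, seed and sample size below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

lam = 1.5                               # illustrative rate
rng = np.random.default_rng(0)
s = rng.exponential(scale=1 / lam, size=(200_000, 2)).sum(axis=1)

k = np.linspace(0.1, 5, 6)
derived = lam ** 2 * k * np.exp(-lam * k)                      # density derived above
print(np.allclose(derived, gamma.pdf(k, a=2, scale=1 / lam)))  # True: Gamma(shape 2, rate λ)
print(s.mean(), 2 / lam)                                       # sample mean of S vs. E[S] = 2/λ
```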

5.2 Distributions of the Minimum and the Maximum of a Set of Independent Random Variables

Suppose X1, X2, ..., Xn are independent, identically distributed (i.i.d.) random variables. We wish to derive the distribution of the maximum U = max{X1, X2, ..., Xn} and minimum V = min{X1, X2, ..., Xn} of these variables. These distributions can be found by deriving their cumulative distribution functions.

Distribution of the Maximum of a Set of Random Variables

Note that if all of the Xi are less than some x, then their maximum is less than x. Hence,

F_U(x) = P(U < x) = P(X1 < x, X2 < x, ..., Xn < x) = P(X1 < x) P(X2 < x) ... P(Xn < x) = P(X1 < x)^n,

where the last two equalities use the fact that the Xi are independent and identically distributed. We can find the density function of the maximum by differentiating the cumulative distribution function, i.e. f_U(x) = F_U'(x).

Example 5.3

Suppose X1, X2, ..., Xn are i.i.d. random variables from the Exp(λ) distribution. Find the density function of U = max{X1, X2, ..., Xn}.

We have

F_U(x) = P(U < x) = P(X1 < x, X2 < x, ..., Xn < x) = P(X1 < x)^n.

Note that

P(X1 < x) = \int_0^x λe^{-λt} dt = [-e^{-λt}]_0^x = 1 - e^{-λx}.

Hence,

F_U(x) = (1 - e^{-λx})^n
f_U(x) = F_U'(x) = nλe^{-λx}(1 - e^{-λx})^{n-1}.
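A brief simulation check of this CDF; the values of λ and n, the seed and the test points below are arbitrary choices for illustration.

```python
import numpy as np

lam, n = 2.0, 5                          # illustrative parameters
rng = np.random.default_rng(1)
u = rng.exponential(scale=1 / lam, size=(100_000, n)).max(axis=1)

x = np.array([0.5, 1.0, 2.0])
print((u[:, None] < x).mean(axis=0))     # empirical P(U < x)
print((1 - np.exp(-lam * x)) ** n)       # derived F_U(x) = (1 - e^{-λx})^n
```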

Distribution of the Minimum of a Set of Random Variables

We calculate the cumulative distribution function of V = min{X1, X2, ..., Xn} indirectly by using the fact that if all the Xi are greater than x, then the minimum is greater than x. We have

P(V > x) = 1 - P(V < x) = 1 - F_V(x) = P(X1 > x, X2 > x, ..., Xn > x) = P(X1 > x)^n.

Example 5.4

Suppose X1, X2, ..., Xn are i.i.d. random variables from the Exp(λ) distribution. Find the density function of V = min{X1, X2, ..., Xn}.

We have

P(V > x) = 1 - F_V(x) = P(X1 > x, X2 > x, ..., Xn > x) = P(X1 > x)^n.

Note that

P(X1 > x) = \int_x^∞ λe^{-λt} dt = [-e^{-λt}]_x^∞ = e^{-λx}.

Hence,

1 - F_V(x) = (e^{-λx})^n = e^{-λnx}
F_V(x) = 1 - e^{-λnx}
f_V(x) = F_V'(x) = nλe^{-λnx}.

Note that the minimum of n random variables from the Exp(λ) distribution has an Exp(nλ) distribution. This agrees with the interpretation of the exponential distribution as describing the time to "the first call" when calls arrive at rate λ. Suppose a call centre takes n types of call and each type of call arrives at rate λ.

The total rate at which calls come in is thus nλ. Hence, the time to the next call has an Exp(nλ) distribution. Let Xi be the time until the next call of type i, i.e. Xi ∼ Exp(λ). The time to the next call is simply the minimum of the Xi.
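A minimal simulation sketch of this result, comparing the minimum of n Exp(λ) samples with the Exp(nλ) distribution via a Kolmogorov-Smirnov test; the parameter values and seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kstest

lam, n = 0.5, 4                          # illustrative parameters
rng = np.random.default_rng(2)
v = rng.exponential(scale=1 / lam, size=(100_000, n)).min(axis=1)

# A large p-value indicates no evidence against V ~ Exp(n * λ)
print(kstest(v, 'expon', args=(0, 1 / (n * lam))))
```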

5.3 Moment Generating Functions

The moment generating function of a continuous random variable X is denoted by M_X(t) and is given by

M_X(t) = E[e^{tX}] = \int_{-∞}^{∞} e^{tx} f_X(x) dx.

The moment generating function of a discrete random variable X is given by

M_X(t) = E[e^{tX}] = \sum_{x ∈ S_X} e^{tx} P(X = x).

Example 5.5

Calculate the moment generating function for the exponential distribution with parameter λ, i.e. f_X(x) = λe^{-λx}, x ≥ 0.

We have

M_X(t) = E[e^{tX}] = \int_0^∞ e^{tx} λe^{-λx} dx
       = \int_0^∞ λe^{-x(λ-t)} dx
       = -\frac{λ}{λ-t} [e^{-x(λ-t)}]_{x=0}^{∞}.

Note that this integral is well defined as long as λ > t. It follows that

M_X(t) = \frac{λ}{λ-t},  λ > t.
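A quick Monte Carlo check of this formula at one value of t; the rate, the value of t and the seed below are illustrative assumptions.

```python
import numpy as np

lam, t = 2.0, 0.7                        # illustrative values with t < λ
rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / lam, size=1_000_000)

print(np.mean(np.exp(t * x)))            # Monte Carlo estimate of E[e^{tX}]
print(lam / (lam - t))                   # closed form λ/(λ - t) ≈ 1.538
```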

Properties of the Moment Generating Function

1. Just like the density function or the cumulative distribution function, the moment generating function M_X(t) defines the distribution of a random variable.

2. For any continuous random variable X,
   M_X(0) = \int_{-∞}^{∞} f_X(x) dx = 1
   (a similar result holds if X has a discrete distribution).

3. If the n-th moment of X exists, it is given by M_X^{(n)}(0), where g^{(n)} denotes the n-th derivative of the function g.

4. Suppose X1, X2, ..., Xk are independent random variables and S = X1 + X2 + ... + Xk. Then
   M_S(t) = M_{X1}(t) M_{X2}(t) ... M_{Xk}(t).

Properties of the Moment Generating Function - Number 3

Using the Taylor expansion of e^{tX} around 0,

E[e^{tX}] = E[1 + tX + \frac{t^2 X^2}{2!} + \frac{t^3 X^3}{3!} + ...]
          = 1 + tE[X] + \frac{t^2}{2!} E[X^2] + \frac{t^3}{3!} E[X^3] + ...

Differentiating n times, we obtain a series in t whose constant term is equal to E[X^n]; thus the value of the n-th derivative at 0 is equal to the n-th moment.
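As an illustration of Property 3, the sketch below differentiates the Exp(λ) moment generating function from Example 5.5 symbolically with sympy and reads off the first three moments; the symbol names are our own.

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)                                   # MGF of Exp(λ) from Example 5.5

# Property 3: the n-th moment is the n-th derivative of M at t = 0
moments = [sp.simplify(sp.diff(M, t, n).subs(t, 0)) for n in range(1, 4)]
print(moments)                                        # [1/lambda, 2/lambda**2, 6/lambda**3]
```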

Example 5.6

i) Using Property 4 above, derive the moment generating function of X, where X ∼ Bin(n, p).
ii) Using your result from i), calculate the expected value and variance of X.

i) We use the fact that X = X1 + X2 + ... + Xn, where the Xi are independent and Xi ∼ 0-1(p), i.e.

P(Xi = 0) = 1 - p;  P(Xi = 1) = p.

We first calculate the moment generating function of Xi.

We have

M_{Xi}(t) = E[e^{tXi}] = \sum_{x=0,1} e^{tx} P(Xi = x) = (1 - p) + pe^t.

It follows that

M_X(t) = M_{X1}(t) M_{X2}(t) ... M_{Xn}(t) = [(1 - p) + pe^t]^n.

ii) In order to derive expected values from this, we need to differentiate M_X(t). We have E(X) = M_X'(0) and

M_X'(t) = n[(1 - p) + pe^t]^{n-1} pe^t.

Hence, E(X) = M_X'(0) = n[(1 - p) + p]^{n-1} p = np.

In order to derive the variance of X, we use Var(X) = E(X^2) - E(X)^2. Hence, we need to calculate the second moment of X. We have

M_X''(t) = n(n - 1)[(1 - p) + pe^t]^{n-2} (pe^t)^2 + n[(1 - p) + pe^t]^{n-1} pe^t.

Hence, E(X^2) = M_X''(0) = n(n - 1)p^2 + np.

Thus

Var(X) = E(X^2) - E(X)^2 = np + n(n - 1)p^2 - n^2 p^2 = np - np^2 = np(1 - p).
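The same calculation can be reproduced symbolically; the sketch below differentiates the binomial MGF derived above using sympy, with our own symbol names.

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)
n = sp.Symbol('n', positive=True, integer=True)
M = ((1 - p) + p * sp.exp(t)) ** n                    # MGF of Bin(n, p) derived above

EX = sp.diff(M, t, 1).subs(t, 0)
EX2 = sp.diff(M, t, 2).subs(t, 0)
print(sp.simplify(EX))                                # n*p
print(sp.expand(EX2 - EX ** 2))                       # expect n*p - n*p**2, i.e. np(1 - p)
```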

5.4 The Distribution of a Function of a Random Variable

From Chapters 2 and 3, it can be seen that we can calculate the expected value of a function g of a random variable X directly from the distribution of X, i.e. without having to derive the distribution of g(X). However, in some cases we may be interested in the distribution of g(X) itself.

Distribution of a Function of a Discrete Random Variable with Finite Support

Suppose X is a discrete random variable with a finite support S_X. Suppose we wish to find the distribution of Y = g(X). To find P(Y = y), we use

P(Y = y) = \sum_{x ∈ S_X : y = g(x)} P(X = x),

i.e. we sum the probabilities of all the realisations of X that give the appropriate value of the function.

Example 5.7

Let X denote the result of a die roll. Derive the distribution of Y = (X - 3)^2.

It is easiest to derive the distribution of Y using a tabulation of the distribution of X. We have

x             | 1    2    3    4    5    6
y = (x - 3)^2 | 4    1    0    1    4    9
P(X = x)      | 1/6  1/6  1/6  1/6  1/6  1/6

It follows that

P(Y = 0) = P(X = 3) = 1/6
P(Y = 1) = P(X = 2) + P(X = 4) = 1/6 + 1/6 = 1/3
P(Y = 4) = P(X = 1) + P(X = 5) = 1/6 + 1/6 = 1/3
P(Y = 9) = P(X = 6) = 1/6.
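The same tabulation can be done mechanically; a small sketch using exact arithmetic via fractions (the variable names are our own).

```python
from fractions import Fraction

# P(Y = y) for Y = (X - 3)^2, X a fair die roll: sum P(X = x) over x with (x - 3)^2 = y
pY = {}
for x in range(1, 7):
    y = (x - 3) ** 2
    pY[y] = pY.get(y, Fraction(0)) + Fraction(1, 6)

for y, prob in sorted(pY.items()):
    print(y, prob)        # 0 1/6, 1 1/3, 4 1/3, 9 1/6
```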

Distribution of a Function of a Continuous Random Variable

When we can solve the inequality g(x) ≤ k algebraically and the density function (or cumulative distribution function) of X is known, the simplest way of finding the distribution of Y = g(X) is by finding the cumulative distribution function of g(X). We can then find the density function of Y = g(X) by differentiation, since f_Y(y) = F_Y'(y).

Example 5.8

i) Suppose X has a uniform distribution on [0, 1]. Find the density function of Y = -ln(1 - X).

ii) Suppose Z has a standard normal distribution. Find the density function of W = Z^2. Note that, from the section on the relation between the Gamma distribution and the normal distribution in Chapter 3, W has a Chi-squared distribution with one degree of freedom.

i) First consider the support of Y. Since S_X = [0, 1], 0 ≤ 1 - X ≤ 1 ⇒ Y ∈ [0, ∞). We then calculate the cumulative distribution function of Y. Using the form of Y as a function of X, we have

F_Y(y) = P(Y ≤ y) = P(-ln(1 - X) ≤ y).

We now rearrange this probability into the form P(X ∈ A) and express this probability using the distribution function of X:

F_Y(y) = P(-ln(1 - X) ≤ y) = P(ln(1 - X) ≥ -y) = P(1 - X ≥ e^{-y}) = P(X ≤ 1 - e^{-y}) = F_X(1 - e^{-y}).

We then differentiate using the chain rule to find the density function of Y. Differentiating with respect to y, we obtain

F_Y(y) = F_X(1 - e^{-y})
F_Y'(y) = f_Y(y) = F_X'(1 - e^{-y}) [1 - e^{-y}]'
f_Y(y) = f_X(1 - e^{-y}) e^{-y}.

On its support, the density function of X is 1. Hence, f_Y(y) = e^{-y}. This is valid on the support of Y, i.e. for y ∈ [0, ∞). It follows that Y ∼ Exp(1).

ii) First consider the support of W. Since the support of Z is R, the support of W = Z^2 is [0, ∞). Using the form of W as a function of Z, we obtain

F_W(w) = P(W ≤ w) = P(Z^2 ≤ w)
       = P(-\sqrt{w} < Z < \sqrt{w}) = P(Z < \sqrt{w}) - P(Z < -\sqrt{w})
       = F_Z(\sqrt{w}) - F_Z(-\sqrt{w}).

Differentiating with respect to w, we obtain

F_W'(w) = f_W(w) = F_Z'(\sqrt{w}) [\sqrt{w}]' - F_Z'(-\sqrt{w}) [-\sqrt{w}]'
f_W(w) = \frac{f_Z(\sqrt{w})}{2\sqrt{w}} + \frac{f_Z(-\sqrt{w})}{2\sqrt{w}}.

Using the symmetry of the standard normal distribution around 0, we have f_Z(z) = f_Z(-z), thus

f_W(w) = \frac{f_Z(\sqrt{w})}{\sqrt{w}}.

Finally, the density function of Z is given by

f_Z(z) = \frac{\exp(-z^2/2)}{\sqrt{2π}}.

Hence,

f_W(w) = \frac{f_Z(\sqrt{w})}{\sqrt{w}} = \frac{\exp(-w/2)}{\sqrt{2πw}}.

This density function is valid on the support of W, i.e. for w ∈ [0, ∞).
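A one-line numerical comparison of this density with the chi-squared density with one degree of freedom from scipy.stats; the test points are arbitrary.

```python
import numpy as np
from scipy.stats import chi2

w = np.array([0.1, 0.5, 1.0, 2.0, 4.0])
derived = np.exp(-w / 2) / np.sqrt(2 * np.pi * w)    # f_W(w) obtained above
print(np.allclose(derived, chi2.pdf(w, df=1)))       # True: chi-squared with 1 degree of freedom
```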

The Inverse Method of Generating Random Numbers

Theorem. Suppose X ∼ U[0, 1]. Let Y = F^{-1}(X); then Y has distribution function F.

Random number generators are based on sequences of "pseudo-random" numbers from the U[0, 1] distribution. If a cumulative distribution function F is invertible, then this theorem can be used to generate a sequence from the corresponding distribution, first by generating a sequence of realisations from the U[0, 1] distribution and then applying F^{-1} to each of these realisations. Note that if X ∼ U[0, 1], then F_X(x) = x for 0 ≤ x ≤ 1.

We have

F_Y(y) = P(Y < y) = P(F^{-1}(X) < y) = P(F[F^{-1}(X)] < F(y)) = P(X < F(y)).

Since 0 ≤ F(y) ≤ 1, it follows that F_Y(y) = P(X < F(y)) = F(y).

Consider Example 5.8 i). If X ∼ U[0, 1] and Y = -ln(1 - X), then Y ∼ Exp(1). Note that if Y ∼ Exp(1), then f_Y(y) = e^{-y}, y ≥ 0, and

F_Y(y) = P(Y < y) = \int_0^y e^{-t} dt = [-e^{-t}]_0^y = 1 - e^{-y}.

We have F_Y(y) = 1 - e^{-y}. To find the inverse function, we transform x = F_Y(y) into y = F_Y^{-1}(x):

x = 1 - e^{-y} ⇔ e^{-y} = 1 - x ⇔ -y = ln(1 - x) ⇔ y = -ln(1 - x).

Hence, this example is a particular case of the theorem given above.

Example 5.9

Suppose I have a random number generator which produces a sequence from the U[0, 1] distribution. Define a procedure for generating numbers from the distribution with density function f_X(x) = 1/x^2 for x ≥ 1, otherwise f_X(x) = 0.

First we must derive the cumulative distribution function of this random variable.

F_X(x) = P(X < x) = \int_1^x \frac{dt}{t^2} = [-\frac{1}{t}]_1^x = 1 - \frac{1}{x}.

Note that this is valid on the support of X, i.e. for x ≥ 1.

We now derive the inverse of the cumulative distribution function:

u = 1 - \frac{1}{x} ⇔ \frac{1}{x} = 1 - u ⇔ x = \frac{1}{1 - u}.

It follows that if U has a uniform distribution on [0, 1], then X = \frac{1}{1 - U} has the required distribution.
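A minimal sketch of this procedure, checking the generated sample against the derived CDF F_X(x) = 1 - 1/x; the sample size, seed and test points are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.uniform(size=500_000)
x = 1.0 / (1.0 - u)                      # inverse method: x = F_X^{-1}(u)

# Empirical CDF vs. the derived CDF F_X(x) = 1 - 1/x at a few points
for point in (2.0, 5.0, 10.0):
    print((x < point).mean(), 1 - 1 / point)
```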

5.5 Distribution of a Function of a Pair of Random Variables

This method is a generalisation of the method based on inverting the distribution function. Suppose we know the joint density of the random variables X and Y. We wish to find the joint density of the pair of random variables U = U(X, Y) and V = V(X, Y). It is assumed that these transformations are invertible, i.e. we can define X and Y as functions of U and V, i.e. X = X(U, V) and Y = Y(U, V).

The Jacobian of the Pair of Transformations U = U(X, Y) and V = V(X, Y)

Denote the Jacobian of the pair of transformations U = U(X, Y) and V = V(X, Y) by J(U, V). By definition,

J(U, V) = det \begin{pmatrix} \frac{∂U}{∂X} & \frac{∂U}{∂Y} \\ \frac{∂V}{∂X} & \frac{∂V}{∂Y} \end{pmatrix}.

The joint density function of U and V is given by

f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{|J(U, V)|}.

Note that in the above formula f_{X,Y}(x, y) is given in terms of x and y. To get the final answer, we need to use the transformations x = X(u, v), y = Y(u, v).

Example 5.10

Suppose that X and Y are independent and both have standard normal distributions. Let U = X + Y and V = X - Y.

i) Find the joint density function of U and V.
ii) Show that U and V are independent normal random variables with mean 0 and variance 2.

The density function of the standard normal distribution is given by

f_X(x) = \frac{1}{\sqrt{2π}} \exp(-x^2/2),   f_Y(y) = \frac{1}{\sqrt{2π}} \exp(-y^2/2).

Since X and Y are independent, it follows that

f_{X,Y}(x, y) = f_X(x) f_Y(y) = \frac{1}{2π} \exp\left(-\frac{x^2 + y^2}{2}\right).

Now we calculate the Jacobian for this transformation. We have

∂U/∂X = 1;  ∂U/∂Y = 1
∂V/∂X = 1;  ∂V/∂Y = -1.

It follows that

J(U, V) = det \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = -2.

It follows that

f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{|J(U, V)|} = \frac{1}{4π} \exp\left(-\frac{x^2 + y^2}{2}\right).   (*)

The final thing to do is to substitute in x = X(u, v) and y = Y(u, v). We have

U = X + Y   (1)
V = X - Y   (2)

Adding Equations (1) and (2) together, we obtain U + V = 2X. Hence,

X = X(U, V) = \frac{U + V}{2}.

From Equation (2), it follows that

V = \frac{U + V}{2} - Y ⇒ Y = Y(U, V) = \frac{U - V}{2}.

Substituting this back into Equation (*), we obtain

f_{U,V}(u, v) = \frac{1}{4π} \exp\left(-\frac{(u + v)^2/4 + (u - v)^2/4}{2}\right)
             = \frac{1}{4π} \exp\left(-\frac{u^2}{4} - \frac{v^2}{4}\right).

We should always consider the support of the variables U and V. Since X and Y are independent, normally distributed random variables, it follows that U = X + Y and V = X - Y are normally distributed (as they are simply linear combinations of X and Y). Hence, R is the support of both U and V. It should be noted that this joint density function can be factorised into a product of a function of u and a function of v. It follows that U and V are independent random variables.

We have

f_{U,V}(u, v) = \frac{1}{2\sqrt{π}} \exp\left(-\frac{u^2}{4}\right) \cdot \frac{1}{2\sqrt{π}} \exp\left(-\frac{v^2}{4}\right).

It can be seen that U and V have the same distribution. Comparing this with the general formula for the density function of a normal distribution,

f(x) = \frac{1}{σ\sqrt{2π}} \exp\left(-\frac{(x - µ)^2}{2σ^2}\right),

it follows that µ = 0 and σ^2 = 2.
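A short simulation check of this conclusion (seed and sample size are arbitrary): the sample variances should be close to 2 and the correlation close to 0; since (U, V) is jointly normal, zero correlation is consistent with independence.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.standard_normal(500_000)
y = rng.standard_normal(500_000)
u, v = x + y, x - y

print(u.var(), v.var())                  # both close to 2
print(np.corrcoef(u, v)[0, 1])           # close to 0
```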

Use of the Jacobian Method to Find the Distribution of a Single Function of Two Random Variables

If the joint density cannot be split into the product of two density functions in this way, we can find the density function of U by integrating out V. This method can always be used to find the distribution of U = U(X, Y). Note that if no variable V is defined, we can always define V = X or V = Y.

Example 5.11

Suppose X and Y are independent random variables with Exp(λ) distributions. Using the Jacobian method, find the density function of U = X + Y.

Note: This distribution has already been derived using the convolution approach (Example 5.2). U has support [0, ∞).

Since no second function of X and Y is given, we may define V as we wish. An obvious choice is V = X. Note that U and V will be dependent. Hence, we have U = X + Y, V = X. Thus

∂U/∂X = 1;  ∂U/∂Y = 1
∂V/∂X = 1;  ∂V/∂Y = 0.

Hence,

J(U, V) = det \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} = -1.

Since X and Y are independent random variables with an Exp(λ) distribution, for x, y ≥ 0 we have

f_{X,Y}(x, y) = f_X(x) f_Y(y) = λe^{-λx} λe^{-λy} = λ^2 e^{-λ(x+y)}.

Hence,

f_{U,V}(u, v) = \frac{f_{X,Y}(x, y)}{|J(U, V)|} = λ^2 e^{-λ(x+y)}.

It remains to substitute in x = X(u, v) and y = Y(u, v) and determine the support of U and V.

Since U = X + Y, V = X and both X and Y are non-negative, it follows that 0 ≤ V ≤ U. Both U and V are unbounded. Hence, the support of (U, V) is given by {(u, v) : 0 ≤ v ≤ u}. We have X = V, thus Y = U - X = U - V. Hence, X(u, v) = v and Y(u, v) = u - v. It follows that

f_{U,V}(u, v) = λ^2 e^{-λ(v + u - v)} = λ^2 e^{-λu},   for 0 ≤ v ≤ u.

In order to find the density function of U, we integrate v out. Note that the density is positive only when 0 ≤ v ≤ u. It follows that

f_U(u) = \int_0^u λ^2 e^{-λu} dv = λ^2 u e^{-λu},

which agrees with the result obtained by convolution in Example 5.2.

Appendix

The Cauchy Distribution

The standard Cauchy distribution has density function

f(x) = \frac{1}{π(1 + x^2)},   x ∈ R.

This distribution is symmetric around 0 and has a similar shape to the normal distribution (however, it is less peaked/more spread out).

Note that this does indeed define a probability distribution, since f(x) > 0 for all x ∈ R. Also,

\int_{-∞}^{∞} f(x) dx = \frac{1}{π} \int_{-∞}^{∞} \frac{dx}{1 + x^2} = \frac{1}{π} [\tan^{-1}(x)]_{-∞}^{∞} = \frac{1}{π} \left(\frac{π}{2} - \left(-\frac{π}{2}\right)\right) = 1.

However, the expected value is undefined for this distribution, since

E(X) = \int_{-∞}^{∞} x f(x) dx = \frac{1}{π} \int_{-∞}^{∞} \frac{x dx}{1 + x^2} = \frac{1}{2π} \left[\ln(1 + x^2)\right]_{-∞}^{∞}.

This integral is undefined, as ln(1 + x^2) is unbounded as x tends to ∞ or -∞.

Two Inequalities

Markov’s Inequality. Assume that X is a non-negative random variable and k > 0. Then

P(X > k) ≤ \frac{E(X)}{k}.

Chebyshev’s Inequality. Let σ^2 = Var(X) and k > 0. Then

P(|X - E(X)| > kσ) ≤ \frac{1}{k^2}.

Proof of Markov’s Inequality for Continuous Distributions

Since X is assumed to be non-negative, we have

E(X) = \int_0^∞ x f(x) dx = \int_0^k x f(x) dx + \int_k^∞ x f(x) dx.

Note that

i) \int_0^k x f(x) dx ≥ 0, and
ii) \int_k^∞ x f(x) dx ≥ k \int_k^∞ f(x) dx = k P(X > k).

It follows that

E(X) ≥ k P(X > k) ⇒ P(X > k) ≤ \frac{E(X)}{k}.

Proof of Chebyshev’s Inequality for Continuous Distributions

We have

Var(X) = σ^2 = \int_{-∞}^{∞} (x - E[X])^2 f(x) dx
       = \int_{|x-E(X)|≤kσ} (x - E[X])^2 f(x) dx + \int_{|x-E(X)|>kσ} (x - E[X])^2 f(x) dx.

The first of these integrals is non-negative and

\int_{|x-E(X)|>kσ} (x - E[X])^2 f(x) dx ≥ k^2 σ^2 \int_{|x-E(X)|>kσ} f(x) dx = k^2 σ^2 P(|X - E(X)| > kσ).

It follows that

σ^2 ≥ k^2 σ^2 P(|X - E(X)| > kσ) ⇒ \frac{1}{k^2} ≥ P(|X - E(X)| > kσ).

Example A.1

I toss a fair coin 100 times. Let X be the number of heads.

i) Using Markov’s inequality, find an upper bound on P(X > 70).
ii) Using Chebyshev’s inequality, find a lower bound on P(30 ≤ X ≤ 70).
iii) Using your answer to ii) and the symmetry of the distribution of X, obtain a better upper bound on P(X > 70).

We have X ∼ Bin(100, 0.5). Thus E(X) = 50, Var(X) = 25.

i) Using Markov’s inequality,

P(X > k) ≤ \frac{E(X)}{k} ⇒ P(X > 70) ≤ \frac{E(X)}{70} = \frac{5}{7}.

ii) Note that P(30 ≤ X ≤ 70) = P(|X - E(X)| ≤ 4σ). Using Chebyshev’s inequality,

P(|X - E(X)| > kσ) ≤ \frac{1}{k^2} ⇒ P(|X - E(X)| > 4σ) ≤ \frac{1}{16}.

We have

P(|X - E(X)| ≤ 4σ) = 1 - P(|X - E(X)| > 4σ) ≥ 1 - \frac{1}{16} = \frac{15}{16}.

iii) Using the symmetry of the distribution of X around 50, we have P(X > 70) = P(X < 30). Hence,

P(|X - E(X)| > 4σ) = P(X < 30) + P(X > 70) = 2P(X > 70) ≤ \frac{1}{16}.

It follows that P(X > 70) ≤ \frac{1}{32}.
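For comparison, the exact tail probability is far below either bound; a quick check with scipy.stats (a sketch, not part of the original notes):

```python
from scipy.stats import binom

exact = binom.sf(70, 100, 0.5)           # exact P(X > 70) for X ~ Bin(100, 0.5)
print(exact)                             # on the order of 1e-5
print(5 / 7, 1 / 32)                     # Markov bound and the refined Chebyshev bound
```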

Jensen’s Inequalities

Suppose g is a convex function. It follows that E[g(X)] ≥ g(E[X]). Note that since g(X) = X^2 is a convex function, we have E[g(X)] = E[X^2] ≥ g(E[X]) = E[X]^2.

Suppose that h is a concave function. It follows that E[h(X)] ≤ h(E[X]). Since h(X) = ln X is a concave function, we have E[h(X)] = E[ln X] ≤ h(E[X]) = ln(E[X]).
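A small numerical illustration of both inequalities for an arbitrary positive random variable (an exponential sample here; the parameters and seed are our own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=2.0, size=200_000)     # an arbitrary positive random variable

print((x ** 2).mean(), x.mean() ** 2)            # E[X^2] >= (E[X])^2   (convex g)
print(np.log(x).mean(), np.log(x.mean()))        # E[ln X] <= ln(E[X])  (concave h)
```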