Pairs of Random Variables
• In this chapter, we consider experiments that produce a collection of random variables, X1, X2, . . . , Xn, where n can be any integer.
• For most of this chapter, we study n = 2 random variables: X and Y. A pair of random variables is enough to show the important concepts and useful problem-solving techniques. Moreover, the definitions and theorems we introduce for X and Y generalize to n random variables. These generalized definitions appear near the end of this chapter in Section 5.10.
• We also note that a pair of random variables X and Y is the same as the two-dimensional vector [X Y]′. Similarly, the random variables X1, . . . , Xn can be written as the n-dimensional vector X = [X1 · · · Xn]′. Since the components of X are random variables, X is called a random vector. Thus this chapter begins our study of random vectors.
• This subject is continued in Chapter 8, which uses techniques of linear algebra to develop further the properties of random vectors.

Pairs of Random Variables: Definitions
• We begin here with the definition of FX,Y(x, y), the joint cumulative distribution function of two random variables, a generalization of the CDF introduced in Section 3.4 and again in Section 4.2.
• The joint CDF is a complete probability model for any experiment that produces two random variables. However, it is not very useful for analyzing practical experiments.
• More useful models are PX,Y(x, y), the joint probability mass function for two discrete random variables, presented in Sections 5.2 and 5.3, and fX,Y(x, y), the joint probability density function of two continuous random variables, presented in Sections 5.4 and 5.5.
• Section 5.7 considers functions of two random variables and their expected values.
• We extend the definition of independent events to define independent random variables.
• The subject of Section 5.9 is the special case in which X and Y are Gaussian.

Example 5.1 We would like to measure random variable X, but we instead observe Y = X + Z.

(5.1)

The noise Z prevents us from perfectly observing X. In some settings, Z is an interfering signal. In the simplest setting, Z is just noise inside the circuitry of your measurement device that is unrelated to X. In this case, it is appropriate to assume that the signal and noise are independent; that is, the events X = x and Z = z are independent. This simple model produces three random variables, X, Y and Z, but any pair completely specifies the remaining random variable. Thus we will see that a probability model for the pair (X, Z) or for the pair (X, Y ) will be sufficient to analyze experiments related to this system.

Section 5.1

Joint Cumulative Distribution Function

Figure 5.1 The region {X ≤ x, Y ≤ y} of the (X, Y) plane, with corner at the point (x, y), corresponding to the joint cumulative distribution function FX,Y(x, y).

Definition 5.1 Joint Cumulative Distribution Function (CDF)
The joint cumulative distribution function of random variables X and Y is

FX,Y(x, y) = P[X ≤ x, Y ≤ y].

Theorem 5.1 For any pair of random variables X, Y,
(a) 0 ≤ FX,Y(x, y) ≤ 1,
(b) FX,Y(∞, ∞) = 1,
(c) FX(x) = FX,Y(x, ∞),
(d) FY(y) = FX,Y(∞, y),
(e) FX,Y(x, −∞) = 0,
(f) FX,Y(−∞, y) = 0,
(g) If x ≤ x1 and y ≤ y1, then FX,Y(x, y) ≤ FX,Y(x1, y1).

Example 5.2 Problem X years is the age of children entering first grade in a school. Y years is the age of children entering second grade. The joint CDF of X and Y is

FX,Y(x, y) =
  0                 x < 5,
  0                 y < 6,
  (x − 5)(y − 6)    5 ≤ x < 6, 6 ≤ y < 7,
  y − 6             x ≥ 6, 6 ≤ y < 7,
  x − 5             5 ≤ x < 6, y ≥ 7,
  1                 otherwise.        (5.3)

Find FX(x) and FY(y).

Example 5.2 Solution Using Theorem 5.1(c) and Theorem 5.1(d), we find

FX(x) =
  0        x < 5,
  x − 5    5 ≤ x < 6,
  1        x ≥ 6,

FY(y) =
  0        y < 6,
  y − 6    6 ≤ y < 7,
  1        y ≥ 7.        (5.4)

Referring to Theorem 4.6, we see from Equation (5.4) that X is a continuous uniform (5, 6) random variable and Y is a continuous uniform (6, 7) random variable.

Theorem 5.2 P [x1 < X ≤ x2, y1 < Y ≤ y2] = FX,Y (x2, y2) − FX,Y (x2, y1) − FX,Y (x1, y2) + FX,Y (x1, y1) .

Quiz 5.1 Express the following extreme values of the joint CDF FX,Y (x, y) as numbers or in terms of the CDFs FX(x) and FY (y).

(a) FX,Y (−∞, 2)

(b) FX,Y (∞, ∞)

(c) FX,Y (∞, y)

(d) FX,Y (∞, −∞)

Quiz 5.1 Solution Each value of the joint CDF can be found by considering the corresponding probability. (a) FX,Y (−∞, 2) = P[X ≤ −∞, Y ≤ 2] ≤ P[X ≤ −∞] = 0 since X cannot take on the value −∞. (b) FX,Y (∞, ∞) = P[X ≤ ∞, Y ≤ ∞] = 1. This result is given in Theorem 5.1. (c) FX,Y (∞, y) = P[X ≤ ∞, Y ≤ y] = P[Y ≤ y] = FY (y). (d) FX,Y (∞, −∞) = P[X ≤ ∞, Y ≤ −∞] = P[Y ≤ −∞] = 0 since Y cannot take on the value −∞.

Section 5.2

Joint Probability Mass Function

Definition 5.2 Joint Probability Mass Function (PMF)
The joint probability mass function of discrete random variables X and Y is

PX,Y(x, y) = P[X = x, Y = y].

Example 5.3 Problem Test two integrated circuits one after the other. On each test, the possible outcomes are a (accept) and r (reject). Assume that all circuits are acceptable with probability 0.9 and that the outcomes of successive tests are independent. Count the number of acceptable circuits X and count the number of successful tests Y before you observe the first reject. (If both tests are successful, let Y = 2.) Draw a tree diagram for the experiment and find the joint PMF PX,Y (x, y).

Example 5.3 Solution

The experiment has a two-stage tree diagram (omitted here): on the first test, a occurs with probability 0.9 and r with probability 0.1; the second test behaves the same way. The four leaves of the tree are

aa (X = 2, Y = 2),  ar (X = 1, Y = 1),  ra (X = 1, Y = 0),  rr (X = 0, Y = 0).

The sample space of the experiment is

S = {aa, ar, ra, rr}.        (5.7)

Observing the tree diagram, we compute

P[aa] = 0.81,  P[ar] = 0.09,        (5.8)
P[ra] = 0.09,  P[rr] = 0.01.        (5.9)

Each outcome specifies a pair of values X and Y . Let g(s) be the function that transforms each outcome s in the sample space S into the pair of random variables (X, Y ). Then g(aa) = (2, 2),

g(ar) = (1, 1),

g(ra) = (1, 0),

g(rr) = (0, 0). (5.10) [Continued]

Example 5.3 Solution

(Continued 2)

For each pair of values x, y, PX,Y(x, y) is the sum of the probabilities of the outcomes for which X = x and Y = y. For example, PX,Y(1, 1) = P[ar].

PX,Y(x, y)   y = 0   y = 1   y = 2
x = 0        0.01    0       0
x = 1        0.09    0.09    0
x = 2        0       0       0.81

The joint PMF can be represented by the table above, as a set of labeled points in the x, y plane where each point is a possible value of the pair (x, y) (plot omitted), or as a simple list:

PX,Y(x, y) =
  0.81    x = 2, y = 2,
  0.09    x = 1, y = 1,
  0.09    x = 1, y = 0,
  0.01    x = 0, y = 0,
  0       otherwise.

Theorem 5.3 For discrete random variables X and Y and any set B in the X, Y plane, the probability of the event {(X, Y) ∈ B} is

P[B] = Σ_{(x,y)∈B} PX,Y(x, y).

Example 5.4 Problem Continuing Example 5.3, find the probability of the event B that X, the number of acceptable circuits, equals Y , the number of tests before observing the first failure.

Example 5.4 Solution Mathematically, B is the event {X = Y }. The elements of B with nonzero probability are B ∩ SX,Y = {(0, 0), (1, 1), (2, 2)} .

(5.12)

Therefore, P [B ] = PX,Y (0, 0) + PX,Y (1, 1) + PX,Y (2, 2) = 0.01 + 0.09 + 0.81 = 0.91.

(5.13)
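As a quick numerical check of this calculation, we can store the joint PMF of Example 5.3 on a sample-space grid and sum over the event {X = Y}, anticipating the Matlab techniques of Section 5.11. This is a minimal sketch, not part of the original example; the script name is illustrative.

%checkb.m: P[X=Y] for the joint PMF of Example 5.3 (sketch)
PXY=[0.01 0 0; 0.09 0.09 0; 0 0 0.81];  % rows x=0,1,2; columns y=0,1,2
[SX,SY]=ndgrid(0:2,0:2);                % sample space grid
PB=sum(PXY(SX==SY))                     % 0.01+0.09+0.81 = 0.91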

Figure 5.2 Subsets B of the (X, Y) plane: B = {X + Y ≤ 3} and B = {X² + Y² ≤ 9}. Points (X, Y) ∈ SX,Y are marked by bullets.

Quiz 5.2 The joint PMF PQ,G(q, g) for random variables Q and G is given in the following table:

PQ,G(q, g)   g = 0   g = 1   g = 2   g = 3
q = 0        0.06    0.18    0.24    0.12
q = 1        0.04    0.12    0.16    0.08

Calculate the following probabilities:
(a) P[Q = 0]
(b) P[Q = G]
(c) P[G > 1]
(d) P[G > Q]

Quiz 5.2 Solution From the joint PMF of Q and G given in the table, we can calculate the requested probabilities by summing the PMF over those values of Q and G that correspond to the event. (a) The probability that Q = 0 is P [Q = 0] = PQ,G (0, 0) + PQ,G (0, 1) + PQ,G (0, 2) + PQ,G (0, 3) = 0.06 + 0.18 + 0.24 + 0.12 = 0.6.

(1)

(b) The probability that Q = G is P [Q = G] = PQ,G (0, 0) + PQ,G (1, 1) = 0.18.

(2)

(c) The probability that G > 1 is

P[G > 1] = Σ_{g=2}^{3} Σ_{q=0}^{1} PQ,G(q, g) = 0.24 + 0.16 + 0.12 + 0.08 = 0.6.        (3)

(d) The probability that G > Q is

P[G > Q] = Σ_{q=0}^{1} Σ_{g=q+1}^{3} PQ,G(q, g) = 0.18 + 0.24 + 0.12 + 0.16 + 0.08 = 0.78.        (4)

Section 5.3

Marginal PMF

Theorem 5.4 For discrete random variables X and Y with joint PMF PX,Y(x, y),

PX(x) = Σ_{y∈SY} PX,Y(x, y),        PY(y) = Σ_{x∈SX} PX,Y(x, y).

Example 5.5 Problem In Example 5.3, we found that random variables X and Y have the joint PMF shown in this table:

PX,Y(x, y)   y = 0   y = 1   y = 2
x = 0        0.01    0       0
x = 1        0.09    0.09    0
x = 2        0       0       0.81

Find the marginal PMFs for the random variables X and Y.

Example 5.5 Solution We note that both X and Y have range {0, 1, 2}. Theorem 5.4 gives

PX(0) = Σ_{y=0}^{2} PX,Y(0, y) = 0.01,    PX(1) = Σ_{y=0}^{2} PX,Y(1, y) = 0.18,        (5.14)
PX(2) = Σ_{y=0}^{2} PX,Y(2, y) = 0.81,    PX(x) = 0,  x ≠ 0, 1, 2.        (5.15)

Referring to the table representation of PX,Y(x, y), we observe that each value of PX(x) is the result of adding all the entries in one row of the table. Similarly, the formula for the PMF of Y in Theorem 5.4, PY(y) = Σ_{x∈SX} PX,Y(x, y), is the sum of all the entries in one column of the table. [Continued]

Example 5.5 Solution

(Continued 2)

We display PX(x) and PY(y) by rewriting the table and placing the row sums and column sums in the margins.

PX,Y(x, y)   y = 0   y = 1   y = 2   PX(x)
x = 0        0.01    0       0       0.01
x = 1        0.09    0.09    0       0.18
x = 2        0       0       0.81    0.81
PY(y)        0.10    0.09    0.81

Thus the column in the right margin shows PX(x) and the row in the bottom margin shows PY (y). Note that the sum of all the entries in the bottom margin is 1 and so is the sum of all the entries in the right margin. This is simply a verification of Theorem 3.1(b), which states that the PMF of any random variable must sum to 1.
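The row-sum and column-sum interpretation translates directly to Matlab. Here is a minimal sketch (the script and variable names are illustrative, not from the text):

%margpmf.m: marginal PMFs of Example 5.5 as row and column sums (sketch)
PXY=[0.01 0 0; 0.09 0.09 0; 0 0 0.81];  % rows x=0,1,2; columns y=0,1,2
PX=sum(PXY,2)'     % row sums:    [0.01 0.18 0.81]
PY=sum(PXY,1)      % column sums: [0.10 0.09 0.81]
sum(PX)            % equals 1, verifying Theorem 3.1(b)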

Quiz 5.3 The probability mass function PH,B(h, b) for the two random variables H and B is given in the following table. Find the marginal PMFs PH(h) and PB(b).        (5.16)

PH,B(h, b)   b = 0   b = 2   b = 4
h = −1       0       0.4     0.2
h = 0        0.1     0       0.1
h = 1        0.1     0.1     0

Quiz 5.3 Solution By Theorem 5.4, the marginal PMF of H is

PH(h) = Σ_{b=0,2,4} PH,B(h, b).        (1)

For each value of h, this corresponds to calculating the row sum across the table of the joint PMF. Similarly, the marginal PMF of B is

PB(b) = Σ_{h=−1}^{1} PH,B(h, b).        (2)

For each value of b, this corresponds to the column sum down the table of the joint PMF. The easiest way to calculate these marginal PMFs is to simply sum each row and column:

PH,B(h, b)   b = 0   b = 2   b = 4   PH(h)
h = −1       0       0.4     0.2     0.6
h = 0        0.1     0       0.1     0.2
h = 1        0.1     0.1     0       0.2
PB(b)        0.2     0.5     0.3

Section 5.4

Joint Probability Density Function

Definition 5.3 Joint Probability Density Function (PDF)
The joint PDF of the continuous random variables X and Y is a function fX,Y(x, y) with the property

FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(u, v) dv du.

Theorem 5.5

fX,Y(x, y) = ∂²FX,Y(x, y) / (∂x ∂y).

Example 5.6 Problem Use the joint CDF for children's ages X and Y given in Example 5.2 to derive the joint PDF presented in Equation (5.5).

Example 5.6 Solution Referring to Equation (5.3) for the joint CDF FX,Y(x, y), we must evaluate the partial derivative ∂²FX,Y(x, y)/∂x ∂y for each of the six regions specified in Equation (5.3). However, ∂²FX,Y(x, y)/∂x ∂y is nonzero only if FX,Y(x, y) is a function of both x and y. In this example, only the region {5 ≤ x < 6, 6 ≤ y < 7} meets this requirement. Over this region,

fX,Y(x, y) = ∂²/(∂x ∂y) [(x − 5)(y − 6)] = (∂/∂x)[x − 5] · (∂/∂y)[y − 6] = 1.        (5.18)

Over all other regions, the joint PDF fX,Y(x, y) is zero.

Theorem 5.6 A joint PDF fX,Y(x, y) has the following properties corresponding to the first and second axioms of probability (see Section 1.2):
(a) fX,Y(x, y) ≥ 0 for all (x, y),
(b) ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1.

Theorem 5.7 The probability that the continuous random variables (X, Y) are in A is

P[A] = ∫∫_A fX,Y(x, y) dx dy.

Example 5.7 Problem Random variables X and Y have joint PDF

fX,Y(x, y) =
  c    0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0    otherwise.        (5.19)

Find the constant c and P[A] = P[2 ≤ X < 3, 1 ≤ Y < 3].

Example 5.7 Solution The large rectangle 0 ≤ x ≤ 5, 0 ≤ y ≤ 3 (diagram omitted) is the area of nonzero probability. Theorem 5.6 states that the integral of the joint PDF over this rectangle is 1:

1 = ∫_0^5 ∫_0^3 c dy dx = 15c.        (5.20)

Therefore, c = 1/15. The small dark rectangle inside it is the event A = {2 ≤ X < 3, 1 ≤ Y < 3}. P[A] is the integral of the PDF over this rectangle, which is

P[A] = ∫_2^3 ∫_1^3 (1/15) dv du = 2/15.        (5.21)

This probability model is an example of a pair of random variables uniformly distributed over a rectangle in the X, Y plane.
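As a numerical cross-check of c and P[A], we can integrate the PDF with Matlab's integral2 function. This is a sketch assuming a Matlab release that provides integral2; the names are illustrative.

%unifrect.m: numerical check of Example 5.7 (sketch)
f=@(x,y) (1/15)*ones(size(x));     % joint PDF on the rectangle 0<=x<=5, 0<=y<=3
total=integral2(f,0,5,0,3)         % equals 1
PA=integral2(f,2,3,1,3)            % equals 2/15 = 0.1333...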

Figure 5.3 Five cases for the CDF FX,Y(x, y) of Example 5.8: (a) x < 0 or y < 0; (b) 0 ≤ y ≤ x ≤ 1; (c) 0 ≤ x < y, 0 ≤ x ≤ 1; (d) 0 ≤ y ≤ 1, x > 1; (e) x > 1, y > 1, covering the whole triangle.

Example 5.8 Problem Find the joint CDF FX,Y(x, y) when X and Y have joint PDF

fX,Y(x, y) =
  2    0 ≤ y ≤ x ≤ 1,
  0    otherwise.        (5.22)

Example 5.8 Solution We can derive the joint CDF using Definition 5.3, in which we integrate the joint PDF fX,Y(x, y) over the area shown in Figure 5.1. To perform the integration it is extremely useful to draw a diagram that clearly shows the area with nonzero probability and then to use the diagram to derive the limits of the integral in Definition 5.3. The difficulty with this integral is that the nature of the region of integration depends critically on x and y. In this apparently simple example, there are five cases to consider! The five cases are shown in Figure 5.3. First, we note that with x < 0 or y < 0, the triangle is completely outside the region of integration, as shown in Figure 5.3a. Thus we have FX,Y(x, y) = 0 if either x < 0 or y < 0. Another simple case arises when x ≥ 1 and y ≥ 1. In this case, we see in Figure 5.3e that the triangle is completely inside the region of integration, and we infer from Theorem 5.6 that FX,Y(x, y) = 1. The other cases we must consider are more complicated. In each case, since fX,Y(x, y) = 2 over the triangular region, the value of the integral is two times the indicated area. When (x, y) is inside the area of nonzero probability (Figure 5.3b), the integral is

FX,Y(x, y) = ∫_0^y ∫_v^x 2 du dv = 2xy − y²        (Figure 5.3b).        (5.23)

[Continued]

Example 5.8 Solution

(Continued 2)

In Figure 5.3c, (x, y) is above the triangle, and the integral is

FX,Y(x, y) = ∫_0^x ∫_v^x 2 du dv = x²        (Figure 5.3c).        (5.24)

The remaining situation to consider is shown in Figure 5.3d, when (x, y) is to the right of the triangle of nonzero probability, in which case the integral is

FX,Y(x, y) = ∫_0^y ∫_v^1 2 du dv = 2y − y²        (Figure 5.3d).        (5.25)

The resulting CDF, corresponding to the five cases of Figure 5.3, is

FX,Y(x, y) =
  0            x < 0 or y < 0          (a),
  2xy − y²     0 ≤ y ≤ x ≤ 1           (b),
  x²           0 ≤ x < y, 0 ≤ x ≤ 1    (c),
  2y − y²      0 ≤ y ≤ 1, x > 1        (d),
  1            x > 1, y > 1            (e).        (5.26)

In Figure 5.4, the surface plot of FX,Y (x, y) shows that cases (a) through (e) correspond to contours on the “hill” that is FX,Y (x, y). In terms of visualizing the random variables, the surface plot of FX,Y (x, y) is less instructive than the simple triangle characterizing the PDF fX,Y (x, y). Because the PDF in this example is fX,Y (x, y) = 2 over (x, y) ∈ SX,Y , each probability is just two times the area of the region shown in one of the diagrams (either a triangle or a trapezoid). You may want to apply some high school geometry to verify that the results obtained from the integrals are indeed twice the areas of the regions indicated. The approach taken in our solution, integrating over SX,Y to obtain the CDF, works for any PDF.

Figure 5.4 A graph of the joint CDF FX,Y(x, y) of Example 5.8.

Example 5.9 Problem As in Example 5.7, random variables X and Y have joint PDF

fX,Y(x, y) =
  1/15    0 ≤ x ≤ 5, 0 ≤ y ≤ 3,
  0       otherwise.        (5.27)

What is P[A] = P[Y > X]?

Example 5.9 Solution Applying Theorem 5.7, we integrate fX,Y(x, y) over the part of the X, Y plane satisfying Y > X (the triangle above the line y = x; diagram omitted). In this case,

P[A] = ∫_0^3 ( ∫_x^3 (1/15) dy ) dx        (5.28)
     = ∫_0^3 (3 − x)/15 dx = [−(3 − x)²/30]_0^3 = 3/10.        (5.29)
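A Monte Carlo estimate confirms this probability. The sketch below generates independent uniform X and Y, which produces the uniform joint PDF over the rectangle; the sample size is an arbitrary choice and the script name is illustrative.

%ygtx.m: estimate P[Y>X] for Example 5.9 (sketch)
m=100000;
x=5*rand(m,1);        % X uniform on [0,5]
y=3*rand(m,1);        % Y uniform on [0,3]
PA=mean(y>x)          % close to 3/10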

Quiz 5.4 The joint probability density function of random variables X and Y is

fX,Y(x, y) =
  cxy    0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
  0      otherwise.        (5.30)

Find the constant c. What is the probability of the event A = {X² + Y² ≤ 1}?

Quiz 5.4 Solution To find the constant c, we apply

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = ∫_0^2 ∫_0^1 cxy dx dy
  = c ∫_0^2 y (x²/2 |_0^1) dy = (c/2) ∫_0^2 y dy = (c/2)(y²/2 |_0^2) = c.        (1)

Thus c = 1.

To calculate P[A], we write

P[A] = ∫∫_A fX,Y(x, y) dx dy.        (2)

The event A is the quarter disk x² + y² ≤ 1 in the first quadrant (diagram omitted). To integrate over A, we convert to polar coordinates using the substitutions x = r cos θ, y = r sin θ and dx dy = r dr dθ. This yields

P[A] = ∫_0^{π/2} ∫_0^1 r² sin θ cos θ · r dr dθ
     = (∫_0^1 r³ dr)(∫_0^{π/2} sin θ cos θ dθ) = (r⁴/4 |_0^1)(sin²θ/2 |_0^{π/2}) = 1/8.        (3)

Section 5.5

Marginal PDF

Theorem 5.8 If X and Y are random variables with joint PDF fX,Y(x, y),

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy,        fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx.

Proof: Theorem 5.8 From the definition of the joint PDF, we can write

FX(x) = P[X ≤ x] = ∫_{−∞}^{x} ( ∫_{−∞}^{∞} fX,Y(u, y) dy ) du.        (5.31)

Taking the derivative of both sides with respect to x (which involves differentiating an integral with variable limits), we obtain

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.

A similar argument holds for fY(y).

Example 5.10 Problem The joint PDF of X and Y is

fX,Y(x, y) =
  5y/4    −1 ≤ x ≤ 1, x² ≤ y ≤ 1,
  0       otherwise.        (5.32)

Find the marginal PDFs fX(x) and fY(y).

Example 5.10 Solution We use Theorem 5.8 to find the marginal PDF fX(x). The values of X and Y for which fX,Y(x, y) > 0 form the bowl-shaped region between the parabola y = x² and the line y = 1 (diagram omitted). When x < −1 or when x > 1, fX,Y(x, y) = 0, and therefore fX(x) = 0. For −1 ≤ x ≤ 1,

fX(x) = ∫_{x²}^{1} (5y/4) dy = 5(1 − x⁴)/8.        (5.33)

The complete expression for the marginal PDF of X is

fX(x) =
  5(1 − x⁴)/8    −1 ≤ x ≤ 1,
  0              otherwise.        (5.34)

[Continued]

Example 5.10 Solution

(Continued 2)

For the marginal PDF of Y, we note that for y < 0 or y > 1, fY(y) = 0. For 0 ≤ y ≤ 1, we integrate over the horizontal bar Y = y inside the region of nonzero probability. The boundaries of the bar are x = −√y and x = √y. Therefore, for 0 ≤ y ≤ 1,

fY(y) = ∫_{−√y}^{√y} (5y/4) dx = (5y/4) x |_{x=−√y}^{x=√y} = 5y^{3/2}/2.        (5.35)

The complete marginal PDF of Y is

fY(y) =
  (5/2)y^{3/2}    0 ≤ y ≤ 1,
  0               otherwise.        (5.36)
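A quick sanity check that each marginal PDF integrates to 1 can be done with Matlab's integral function; a minimal sketch (script name is illustrative):

%margcheck.m: verify the marginal PDFs of Example 5.10 integrate to 1 (sketch)
fx=@(x) 5*(1-x.^4)/8;        % marginal PDF of X on [-1,1]
fy=@(y) (5/2)*y.^(3/2);      % marginal PDF of Y on [0,1]
ax=integral(fx,-1,1)         % equals 1
ay=integral(fy,0,1)          % equals 1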

Quiz 5.5 The joint probability density function of random variables X and Y is

fX,Y(x, y) =
  6(x + y²)/5    0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
  0              otherwise.        (5.37)

Find fX(x) and fY(y), the marginal PDFs of X and Y.

Quiz 5.5 Solution By Theorem 5.8, the marginal PDF of X is

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy.        (1)

Note that fX(x) = 0 for x < 0 or x > 1. For 0 ≤ x ≤ 1,

fX(x) = (6/5) ∫_0^1 (x + y²) dy = (6/5)[xy + y³/3]_{y=0}^{y=1} = (6x + 2)/5.        (2)

The complete expression for the PDF of X is

fX(x) =
  (6x + 2)/5    0 ≤ x ≤ 1,
  0             otherwise.        (3)

By the same method we obtain the marginal PDF for Y. For 0 ≤ y ≤ 1,

fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = (6/5) ∫_0^1 (x + y²) dx = (6/5)[x²/2 + xy²]_{x=0}^{x=1} = (6y² + 3)/5.        (4)

Since fY(y) = 0 for y < 0 or y > 1, the complete expression for the PDF of Y is

fY(y) =
  (3 + 6y²)/5    0 ≤ y ≤ 1,
  0              otherwise.        (5)

Section 5.6

Independent Random Variables

Definition 5.4 Independent Random Variables Random variables X and Y are independent if and only if Discrete:

PX,Y (x, y) = PX(x)PY (y)

Continuous: fX,Y (x, y) = fX(x)fY (y).

Example 5.11 Problem Are the childrens’ ages X and Y in Example 5.2 independent?

Example 5.11 Solution In Example 5.2, we derived the CDFs FX(x) and FY(y), which showed that X is uniform (5, 6) and Y is uniform (6, 7). Thus X and Y have marginal PDFs

fX(x) =
  1    5 ≤ x < 6,
  0    otherwise,

fY(y) =
  1    6 ≤ y < 7,
  0    otherwise.        (5.38)

Referring to Equation (5.5), we observe that fX,Y (x, y) = fX(x)fY (y). Thus X and Y are independent.

Example 5.12 Problem Random variables X and Y have joint PDF

fX,Y(x, y) =
  4xy    0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
  0      otherwise.

Are X and Y independent?

Example 5.12 Solution The marginal PDFs of X and Y are

fX(x) =
  2x    0 ≤ x ≤ 1,
  0     otherwise,

fY(y) =
  2y    0 ≤ y ≤ 1,
  0     otherwise.        (5.39)

It is easily verified that fX,Y (x, y) = fX(x)fY (y) for all pairs (x, y), and so we conclude that X and Y are independent.

Example 5.13 Problem Random variables U and V have joint PDF

fU,V(u, v) =
  24uv    u ≥ 0, v ≥ 0, u + v ≤ 1,
  0       otherwise.        (5.40)

Are U and V independent?

Example 5.13 Solution Since fU,V(u, v) looks similar in form to fX,Y(x, y) in the previous example, we might suppose that U and V can also be factored into marginal PDFs fU(u) and fV(v). However, this is not the case. Owing to the triangular shape of the region of nonzero probability, the marginal PDFs are

fU(u) =
  12u(1 − u)²    0 ≤ u ≤ 1,
  0              otherwise,

fV(v) =
  12v(1 − v)²    0 ≤ v ≤ 1,
  0              otherwise.

Clearly, U and V are not independent. Learning U changes our knowledge of V . For example, learning U = 1/2 informs us that P[V ≤ 1/2] = 1.

Example 5.14 Problem Consider again the noisy observation model of Example 5.1. Suppose X is a Gaussian (0, σX ) information signal sent by a radio transmitter and Y = X + Z is the output of a low-noise amplifier attached to the antenna of a radio receiver. The noise Z is a Gaussian (0, σZ ) random variable that is generated within the receiver. What is the joint PDF fX,Z(x, z)?

Example 5.14 Solution From the information given, we know that X and Z have PDFs

fX(x) = (1/√(2πσX²)) e^{−x²/2σX²},        fZ(z) = (1/√(2πσZ²)) e^{−z²/2σZ²}.        (5.41)

The signal X depends on the information being transmitted by the sender and the noise Z depends on electrons bouncing around in the receiver circuitry. As there is no reason for these to be related, we model X and Z as independent. Thus, the joint PDF is

fX,Z(x, z) = fX(x) fZ(z) = (1 / (2π√(σX²σZ²))) e^{−(1/2)(x²/σX² + z²/σZ²)}.        (5.42)

Quiz 5.6(A) Random variables X and Y in Example 5.3 and random variables Q and G in Quiz 5.2 have joint PMFs:

PX,Y(x, y)   y = 0   y = 1   y = 2
x = 0        0.01    0       0
x = 1        0.09    0.09    0
x = 2        0       0       0.81

PQ,G(q, g)   g = 0   g = 1   g = 2   g = 3
q = 0        0.06    0.18    0.24    0.12
q = 1        0.04    0.12    0.16    0.08

(a) Are X and Y independent?
(b) Are Q and G independent?

Quiz 5.6(A) Solution (a) For random variables X and Y from Example 5.3, we observe that PY (1) = 0.09 and PX(0) = 0.01. However, PX,Y (0, 1) = 0 6= PX (0) PY (1)

(1)

Since we have found a pair x, y such that PX,Y (x, y) 6= PX(x)PY (y), we can conclude that X and Y are dependent. Note that whenever PX,Y (x, y) = 0, independence requires that either PX(x) = 0 or PY (y) = 0. (b) For random variables Q and G from Quiz 5.2, it is not obvious whether they are independent. Unlike X and Y in part (a), there are no obvious pairs q, g that fail the independence requirement. In this case, we calculate the marginal PMFs from the table of the joint PMF PQ,G(q, g) in Quiz 5.2. In transposed form, this table is PQ,G(q, g) g=0 g=1 g=2 g=3 PQ(q)

q=0 q=1 0.06 0.04 0.18 0.12 0.24 0.16 0.12 0.08 0.60 0.40

PG(g) 0.10 0.30 0.40 0.20

Careful study of the table will verify that PQ,G(q, g) = PQ(q)PG(g) for every pair q, g. Hence Q and G are independent.
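The "careful study" can also be delegated to Matlab by comparing the joint PMF with the outer product of its marginals; a brief sketch (names are illustrative):

%indepqg.m: check PQ,G(q,g) = PQ(q)PG(g) for Quiz 5.2 (sketch)
PQG=[0.06 0.18 0.24 0.12; 0.04 0.12 0.16 0.08];  % rows q=0,1; columns g=0..3
PQ=sum(PQG,2); PG=sum(PQG,1);
maxdiff=max(max(abs(PQG-PQ*PG)))   % 0, so Q and G are independent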

Quiz 5.6(B) Random variables X1 and X2 are independent and identically distributed with probability density function

fX(x) =
  x/2    0 ≤ x ≤ 2,
  0      otherwise.        (5.43)

What is the joint PDF fX1,X2(x1, x2)?

Quiz 5.6(B) Solution Since X1 and X2 are identically distributed, fX1(x) = fX2(x) = fX(x). Since X1 and X2 are independent,

fX1,X2(x1, x2) = fX1(x1) fX2(x2) =
  (x1/2)(x2/2) = x1x2/4    0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2,
  0                        otherwise.        (1)

Section 5.7

Expected Value of a Function of Two Random Variables

Theorem 5.9 For random variables X and Y, the expected value of W = g(X, Y) is

Discrete:      E[W] = Σ_{x∈SX} Σ_{y∈SY} g(x, y) PX,Y(x, y),
Continuous:    E[W] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fX,Y(x, y) dx dy.

Theorem 5.10 E [a1g1(X, Y ) + · · · + angn(X, Y )] = a1 E [g1(X, Y )] + · · · + an E [gn(X, Y )] .

Proof: Theorem 5.10 Let g(X, Y) = a1 g1(X, Y) + · · · + an gn(X, Y). For discrete random variables X, Y, Theorem 5.9 states

E[g(X, Y)] = Σ_{x∈SX} Σ_{y∈SY} (a1 g1(x, y) + · · · + an gn(x, y)) PX,Y(x, y).        (5.44)

We can break the double summation into n weighted double summations:

E[g(X, Y)] = a1 Σ_{x∈SX} Σ_{y∈SY} g1(x, y) PX,Y(x, y) + · · · + an Σ_{x∈SX} Σ_{y∈SY} gn(x, y) PX,Y(x, y).

By Theorem 5.9, the ith double summation on the right side is E[gi(X, Y)]; thus,

E[g(X, Y)] = a1 E[g1(X, Y)] + · · · + an E[gn(X, Y)].        (5.45)

For continuous random variables, Theorem 5.9 says

E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (a1 g1(x, y) + · · · + an gn(x, y)) fX,Y(x, y) dx dy.        (5.46)

To complete the proof, we express this integral as the sum of n integrals and recognize that each of the new integrals is a weighted expected value, ai E[gi(X, Y)].

Theorem 5.11 For any two random variables X and Y , E [X + Y ] = E [X ] + E [Y ] .

Theorem 5.12 The variance of the sum of two random variables is Var [X + Y ] = Var [X ] + Var [Y ] + 2 E [(X − µX )(Y − µY )] .

Proof: Theorem 5.12 Since E[X + Y] = µX + µY,

Var[X + Y] = E[(X + Y − (µX + µY))²]
           = E[((X − µX) + (Y − µY))²]
           = E[(X − µX)² + 2(X − µX)(Y − µY) + (Y − µY)²].        (5.47)

We observe that each of the three terms in the preceding expected values is a function of X and Y. Therefore, Theorem 5.10 implies

Var[X + Y] = E[(X − µX)²] + 2 E[(X − µX)(Y − µY)] + E[(Y − µY)²].        (5.48)

The first and last terms are, respectively, Var[X] and Var[Y].

Example 5.15 Problem A company website has three pages. They require 750 kilobytes, 1500 kilobytes, and 2500 kilobytes for transmission. The transmission speed can be 5 Mb/s for external requests or 10 Mb/s for internal requests. Requests arrive randomly from inside and outside the company independently of page length, which is also random. The probability models for transmission speed, R, and page length, L, are:

PR(r) =
  0.4    r = 5,
  0.6    r = 10,
  0      otherwise,

PL(l) =
  0.3    l = 750,
  0.5    l = 1500,
  0.2    l = 2500,
  0      otherwise.        (5.49)

Write an expression for the transmission time g(R, L) seconds. Derive the expected transmission time E[g(R, L)]. Does E[g(R, L)] = g(E[R], E[L])?

Example 5.15 Solution The transmission time T seconds is the page length in kilobits, 8L, divided by the transmission speed in kilobits per second, 1000R; that is, T = 8L/(1000R). Because R and L are independent, PR,L(r, l) = PR(r)PL(l) and

E[g(R, L)] = Σ_r Σ_l PR(r) PL(l) · 8l/(1000r)
           = (8/1000) (Σ_r PR(r)/r) (Σ_l l PL(l))
           = (8/1000) (0.4/5 + 0.6/10) (0.3(750) + 0.5(1500) + 0.2(2500))
           = 1.652 s.        (5.50)

By comparison, E[R] = Σ_r r PR(r) = 8 Mb/s and E[L] = Σ_l l PL(l) = 1475 kilobytes. This implies

g(E[R], E[L]) = 8 E[L] / (1000 E[R]) = 1.475 s ≠ E[g(R, L)].        (5.51)
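This expectation is also easy to compute on a sample space grid, anticipating the Matlab approach of Section 5.11. A sketch (script and variable names are illustrative):

%pagetime.m: E[g(R,L)] for Example 5.15 on a grid (sketch)
[SR,SL]=ndgrid([5 10],[750 1500 2500]);
PRL=[0.4;0.6]*[0.3 0.5 0.2];      % independence: joint PMF = product of marginals
T=8*SL./(1000*SR);                % transmission time g(R,L) in seconds
ET=sum(sum(T.*PRL))               % 1.652
gE=8*1475/(1000*8)                % g(E[R],E[L]) = 1.475, a different number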

Section 5.8

Covariance, Correlation and Independence

Definition 5.5 Covariance The covariance of two random variables X and Y is Cov [X, Y ] = E [(X − µX ) (Y − µY )] .

Example 5.16 Suppose we perform an experiment in which we measure X and Y in centimeters (for example the height of two sisters). However, if we change units and measure height in meters, we will perform the same experiment except we observe X̂ = X/100 and Ŷ = Y/100. In this case, X̂ and Ŷ have expected values µ_X̂ = µX/100 m, µ_Ŷ = µY/100 m and

Cov[X̂, Ŷ] = E[(X̂ − µ_X̂)(Ŷ − µ_Ŷ)] = E[(X − µX)(Y − µY)]/10,000 = Cov[X, Y]/10,000 m².        (5.53)

Changing the unit of measurement from cm² to m² reduces the covariance by a factor of 10,000. However, the tendency of X − µX and Y − µY to have the same sign is the same as the tendency of X̂ − µ_X̂ and Ŷ − µ_Ŷ to have the same sign. (Both are an indication of how likely it is that a girl is taller than average if her sister is taller than average.)

Definition 5.6 Correlation Coefficient
The correlation coefficient of two random variables X and Y is

ρX,Y = Cov[X, Y] / √(Var[X] Var[Y]) = Cov[X, Y] / (σX σY).

Figure 5.5 Scatter plots of the random variable pair (X, Y) with E[X] = E[Y] = 0 and Var[X] = Var[Y] = 1 for (a) ρX,Y = −0.9, (b) ρX,Y = 0, (c) ρX,Y = 0.9. Each graph has 200 samples, each marked by a dot.

Theorem 5.13 If X̂ = aX + b and Ŷ = cY + d, then
(a) ρ_X̂,Ŷ = ρX,Y,
(b) Cov[X̂, Ŷ] = ac Cov[X, Y].

Theorem 5.14 −1 ≤ ρX,Y ≤ 1.

Proof: Theorem 5.14 Let σX² and σY² denote the variances of X and Y, and for a constant a, let W = X − aY. Then,

Var[W] = E[(X − aY)²] − (E[X − aY])².        (5.54)

Since E[X − aY] = µX − aµY, expanding the squares yields

Var[W] = E[X² − 2aXY + a²Y²] − (µX² − 2aµXµY + a²µY²)
       = Var[X] − 2a Cov[X, Y] + a² Var[Y].        (5.55)

Since Var[W] ≥ 0 for any a, we have 2a Cov[X, Y] ≤ Var[X] + a² Var[Y]. Choosing a = σX/σY yields Cov[X, Y] ≤ σY σX, which implies ρX,Y ≤ 1. Choosing a = −σX/σY yields Cov[X, Y] ≥ −σY σX, which implies ρX,Y ≥ −1.

Theorem 5.15 If X and Y are random variables such that Y = aX + b,

ρX,Y =
  −1    a < 0,
  0     a = 0,
  1     a > 0.

5.8 Comment: Examples of Correlation Some examples of positive, negative, and zero correlation coefficients include: • X is a student’s height. Y is the same student’s weight. 0 < ρX,Y < 1. • X is the distance of a cellular phone from the nearest base station. Y is the power of the received signal at the cellular phone. −1 < ρX,Y < 0. • X is the temperature of a resistor measured in degrees Celsius. Y is the temperature of the same resistor measured in Kelvins. ρX,Y = 1 . • X is the gain of an electrical circuit measured in decibels. Y is the attenuation, measured in decibels, of the same circuit. ρX,Y = −1. • X is the telephone number of a cellular phone. Y is the Social Security number of the phone’s owner. ρX,Y = 0.

Definition 5.7 Correlation The correlation of X and Y is rX,Y = E[XY ]

Theorem 5.16 (a) Cov[X, Y ] = rX,Y − µX µY . (b) Var[X + Y ] = Var[X] + Var[Y ] + 2 Cov[X, Y ]. (c) If X = Y , Cov[X, Y ] = Var[X] = Var[Y ] and rX,Y = E[X 2] = E[Y 2].

Proof: Theorem 5.16 Cross-multiplying inside the expected value of Definition 5.5 yields Cov [X, Y ] = E [XY − µX Y − µY X + µX µY ] .

(5.56)

Since the expected value of the sum equals the sum of the expected values, Cov [X, Y ] = E [XY ] − E [µX Y ] − E [µY X ] + E [µY µX ] .

(5.57)

Note that in the expression E[µY X], µY is a constant. Referring to Theorem 3.12, we set a = µY and b = 0 to obtain E[µY X] = µY E[X] = µY µX . The same reasoning demonstrates that E[µX Y ] = µX E[Y ] = µX µY . Therefore, Cov [X, Y ] = E [XY ] − µX µY − µY µX + µY µX = rX,Y − µX µY .

(5.58)

The other relationships follow directly from the definitions and Theorem 5.12.

Example 5.17 Problem For the integrated circuits tests in Example 5.3, we found in Example 5.5 that the probability model for X and Y is given by the following matrix.

PX,Y(x, y)   y = 0   y = 1   y = 2   PX(x)
x = 0        0.01    0       0       0.01
x = 1        0.09    0.09    0       0.18
x = 2        0       0       0.81    0.81
PY(y)        0.10    0.09    0.81

Find rX,Y and Cov[X, Y].

Example 5.17 Solution By Definition 5.7,

rX,Y = E[XY] = Σ_{x=0}^{2} Σ_{y=0}^{2} xy PX,Y(x, y)        (5.59)
     = (1)(1)(0.09) + (2)(2)(0.81) = 3.33.        (5.60)

To use Theorem 5.16(a) to find the covariance, we find

E[X] = (1)(0.18) + (2)(0.81) = 1.80,        E[Y] = (1)(0.09) + (2)(0.81) = 1.71.        (5.61)

Therefore, by Theorem 5.16(a), Cov[X, Y] = 3.33 − (1.80)(1.71) = 0.252.
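The same calculation on a sample-space grid takes only a few Matlab lines; a sketch (names are illustrative):

%corrxy.m: rX,Y and Cov[X,Y] for Example 5.17 (sketch)
PXY=[0.01 0 0; 0.09 0.09 0; 0 0 0.81];  % rows x=0,1,2; columns y=0,1,2
[SX,SY]=ndgrid(0:2,0:2);
rXY=sum(sum(SX.*SY.*PXY))               % E[XY] = 3.33
EX=sum(sum(SX.*PXY)); EY=sum(sum(SY.*PXY));
covXY=rXY-EX*EY                         % 0.252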

Definition 5.8 Orthogonal Random Variables Random variables X and Y are orthogonal if rX,Y = 0.

Definition 5.9 Uncorrelated Random Variables Random variables X and Y are uncorrelated if Cov[X, Y ] = 0.

Theorem 5.17 For independent random variables X and Y ,

(a) E[g(X)h(Y)] = E[g(X)] E[h(Y)],
(b) rX,Y = E[XY] = E[X] E[Y],
(c) Cov[X, Y] = ρX,Y = 0,
(d) Var[X + Y] = Var[X] + Var[Y].

Proof: Theorem 5.17 We present the proof for discrete random variables. By replacing PMFs and sums with PDFs and integrals we arrive at essentially the same proof for continuous random variables. Since PX,Y(x, y) = PX(x)PY(y),

E[g(X)h(Y)] = Σ_{x∈SX} Σ_{y∈SY} g(x)h(y) PX(x) PY(y)
            = (Σ_{x∈SX} g(x)PX(x)) (Σ_{y∈SY} h(y)PY(y)) = E[g(X)] E[h(Y)].        (5.62)

If g(X) = X and h(Y) = Y, this equation implies rX,Y = E[XY] = E[X] E[Y]. This equation and Theorem 5.16(a) imply Cov[X, Y] = 0. As a result, Theorem 5.16(b) implies Var[X + Y] = Var[X] + Var[Y]. Furthermore, ρX,Y = Cov[X, Y]/(σX σY) = 0.

Example 5.18 Problem For the noisy observation Y = X +Z of Example 5.1, find the covariances Cov[X, Z] and Cov[X, Y ] and the correlation coefficients ρX,Z and ρX,Y .

Example 5.18 Solution We recall from Example 5.1 that the signal X is Gaussian (0, σX), that the noise Z is Gaussian (0, σZ), and that X and Z are independent. We know from Theorem 5.17(c) that independence of X and Z implies

Cov[X, Z] = ρX,Z = 0.        (5.63)

In addition, by Theorem 5.17(d),

Var[Y] = Var[X] + Var[Z] = σX² + σZ².        (5.64)

Since E[X] = E[Z] = 0, Theorem 5.11 tells us that E[Y] = E[X] + E[Z] = 0 and Theorem 5.17(b) says that E[XZ] = E[X] E[Z] = 0. This permits us to write

Cov[X, Y] = E[XY] = E[X(X + Z)] = E[X² + XZ] = E[X²] + E[XZ] = E[X²] = σX².        (5.65)

This implies

ρX,Y = Cov[X, Y] / √(Var[X] Var[Y]) = σX² / √(σX²(σX² + σZ²)) = √( (σX²/σZ²) / (1 + σX²/σZ²) ).        (5.66)
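A short simulation sketch can verify this expression for ρX,Y; the values σX = 4 and σZ = 3 are arbitrary choices (they anticipate Example 5.20), and the script name is illustrative.

%noisycorr.m: simulated correlation coefficient of X and Y = X + Z (sketch)
m=100000; sigx=4; sigz=3;
x=sigx*randn(m,1);                   % Gaussian (0,sigx) signal
z=sigz*randn(m,1);                   % independent Gaussian (0,sigz) noise
y=x+z;
rho=mean(x.*y)/sqrt(var(x)*var(y))   % close to 4/5 for these parameters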

Quiz 5.8(A) Random variables L and T have joint PMF

PL,T(l, t)    t = 40 sec   t = 60 sec
l = 1 page    0.15         0.1
l = 2 pages   0.30         0.2
l = 3 pages   0.15         0.1

Find the following quantities.
(a) E[L] and Var[L]
(b) E[T] and Var[T]
(c) The covariance Cov[L, T]
(d) The correlation coefficient ρL,T

Quiz 5.8(A) Solution It is helpful to first make a table that includes the marginal PMFs.

PL,T(l, t)   l = 1   l = 2   l = 3   PT(t)
t = 40       0.15    0.3     0.15    0.6
t = 60       0.1     0.2     0.1     0.4
PL(l)        0.25    0.5     0.25

(a) The expected value of L is

E[L] = 1(0.25) + 2(0.5) + 3(0.25) = 2.        (1)

Since the second moment of L is

E[L²] = 1²(0.25) + 2²(0.5) + 3²(0.25) = 4.5,        (2)

the variance of L is

Var[L] = E[L²] − (E[L])² = 0.5.        (3)

(b) The expected value of T is

E[T] = 40(0.6) + 60(0.4) = 48.        (4)

The second moment of T is

E[T²] = 40²(0.6) + 60²(0.4) = 2400.        (5)

[Continued]

Quiz 5.8(A) Solution

(Continued 2)

Thus

Var[T] = E[T²] − (E[T])² = 96.        (6)

(c) First we need to find

E[LT] = Σ_{t=40,60} Σ_{l=1}^{3} lt PL,T(l, t)
      = 1(40)(0.15) + 2(40)(0.3) + 3(40)(0.15) + 1(60)(0.1) + 2(60)(0.2) + 3(60)(0.1) = 96.        (7)

The covariance of L and T is

Cov[L, T] = E[LT] − E[L] E[T] = 96 − 2(48) = 0.        (8)

(d) Since Cov[L, T] = 0, the correlation coefficient is ρL,T = 0.

Quiz 5.8(B) The joint probability density function of random variables X and Y is

fX,Y(x, y) =
  xy    0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
  0     otherwise.        (5.67)

Find the following quantities.
(a) E[X] and Var[X]
(b) E[Y] and Var[Y]
(c) The covariance Cov[X, Y]
(d) The correlation coefficient ρX,Y

Quiz 5.8(B) Solution As in the discrete case, the calculations become easier if we first calculate the marginal PDFs fX(x) and fY(y). For 0 ≤ x ≤ 1,

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy = ∫_0^2 xy dy = (1/2)xy² |_{y=0}^{y=2} = 2x.        (1)

Similarly, for 0 ≤ y ≤ 2,

fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx = ∫_0^1 xy dx = (1/2)x²y |_{x=0}^{x=1} = y/2.        (2)

The complete expressions for the marginal PDFs are

fX(x) =
  2x    0 ≤ x ≤ 1,
  0     otherwise,

fY(y) =
  y/2    0 ≤ y ≤ 2,
  0      otherwise.        (3)

From the marginal PDFs, it is straightforward to calculate the various expectations.
(a) The first and second moments of X are

E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_0^1 2x² dx = 2/3,        (4)
E[X²] = ∫_{−∞}^{∞} x² fX(x) dx = ∫_0^1 2x³ dx = 1/2.        (5)

[Continued]

Quiz 5.8(B) Solution

(Continued 2)

The variance of X is Var[X] = E[X²] − (E[X])² = 1/2 − (2/3)² = 1/18.

(b) The first and second moments of Y are

E[Y] = ∫_{−∞}^{∞} y fY(y) dy = ∫_0^2 (1/2)y² dy = 4/3,        (6)
E[Y²] = ∫_{−∞}^{∞} y² fY(y) dy = ∫_0^2 (1/2)y³ dy = 2.        (7)

The variance of Y is

Var[Y] = E[Y²] − (E[Y])² = 2 − 16/9 = 2/9.        (8)

(c) We start by finding

E[XY] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX,Y(x, y) dx dy = ∫_0^1 ∫_0^2 x²y² dy dx
      = (x³/3 |_0^1)(y³/3 |_0^2) = 8/9.        (9)

The covariance of X and Y is then

Cov[X, Y] = E[XY] − E[X] E[Y] = 8/9 − (2/3)(4/3) = 0.        (10)

(d) Since Cov[X, Y] = 0, the correlation coefficient is ρX,Y = 0.

Section 5.9

Bivariate Gaussian Random Variables

Definition 5.10 Bivariate Gaussian Random Variables
Random variables X and Y have a bivariate Gaussian PDF with parameters µX, µY, σX > 0, σY > 0, and ρX,Y satisfying −1 < ρX,Y < 1 if

fX,Y(x, y) = [1 / (2πσXσY√(1 − ρ²X,Y))] exp{ −[ ((x − µX)/σX)² − 2ρX,Y(x − µX)(y − µY)/(σXσY) + ((y − µY)/σY)² ] / [2(1 − ρ²X,Y)] }.

Figure 5.6 The joint Gaussian PDF fX,Y(x, y) for µX = µY = 0, σX = σY = 1, and three values of ρX,Y = ρ: ρ = −0.9, ρ = 0, and ρ = 0.9. Next to each PDF, 200 sample pairs (X, Y) generated with that PDF are plotted.

Theorem 5.18 If X and Y are the bivariate Gaussian random variables in Definition 5.10, X is the Gaussian (µX, σX) random variable and Y is the Gaussian (µY, σY) random variable:

fX(x) = (1/(σX√(2π))) e^{−(x−µX)²/2σX²},        fY(y) = (1/(σY√(2π))) e^{−(y−µY)²/2σY²}.

Proof: Theorem 5.18 Integrating fX,Y(x, y) in Equation (5.69) over all y, we have

fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy
      = (1/(σX√(2π))) e^{−(x−µX)²/2σX²} ∫_{−∞}^{∞} (1/(σ̃Y√(2π))) e^{−(y−µ̃Y(x))²/2σ̃Y²} dy.        (5.70)

The integral over y equals 1 because it is the integral of a Gaussian PDF. The remainder of the formula is the PDF of the Gaussian (µX, σX) random variable. The same reasoning with the roles of X and Y reversed leads to the formula for fY(y).

Theorem 5.19 Bivariate Gaussian random variables X and Y in Definition 5.10 have correlation coefficient ρX,Y .

Theorem 5.20 Bivariate Gaussian random variables X and Y are uncorrelated if and only if they are independent.

Theorem 5.21 If X and Y are bivariate Gaussian random variables with PDF given by Definition 5.10, and W1 and W2 are given by the linearly independent equations

W1 = a1X + b1Y,        W2 = a2X + b2Y,

then W1 and W2 are bivariate Gaussian random variables such that

E[Wi] = ai µX + bi µY,    i = 1, 2,
Var[Wi] = ai²σX² + bi²σY² + 2 ai bi ρX,Y σX σY,    i = 1, 2,
Cov[W1, W2] = a1a2 σX² + b1b2 σY² + (a1b2 + a2b1) ρX,Y σX σY.

Example 5.19 Problem For the noisy observation in Example 5.14, find the PDF of Y = X + Z.

Example 5.19 Solution Since X is Gaussian (0, σX) and Z is Gaussian (0, σZ) and X and Z are independent, X and Z are jointly Gaussian. It follows from Theorem 5.21 that Y is Gaussian with E[Y] = E[X] + E[Z] = 0 and variance σY² = σX² + σZ². The PDF of Y is

fY(y) = (1/√(2π(σX² + σZ²))) e^{−y²/2(σX² + σZ²)}.        (5.71)

Example 5.20 Problem Continuing Example 5.19, find the joint PDF of X and Y when σX = 4 and σZ = 3.

Example 5.20 Solution From Theorem 5.21, we know that X and Y are bivariate Gaussian. We also know that µX = µY = 0 and that Y has variance σY² = σX² + σZ² = 25. Substituting σX = 4 and σZ = 3 in the formula for the correlation coefficient derived in Example 5.18, we have

ρX,Y = √( (σX²/σZ²) / (1 + σX²/σZ²) ) = 4/5.        (5.72)

Applying these parameters to Definition 5.10, we obtain

fX,Y(x, y) = (1/(24π)) e^{−(25x²/16 − 2xy + y²)/18}.        (5.73)
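To visualize this joint PDF, we can evaluate Equation (5.73) on a grid and check numerically that it integrates to (approximately) 1. This is a sketch, not part of the original example; the grid limits and step are arbitrary choices.

%bigauss.m: evaluate the bivariate Gaussian PDF of Example 5.20 (sketch)
[x,y]=meshgrid(-15:0.1:15,-20:0.1:20);
f=exp(-(25*x.^2/16-2*x.*y+y.^2)/18)/(24*pi);
total=sum(f(:))*0.1^2              % approximately 1
mesh(x,y,f); xlabel('x'); ylabel('y');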

Quiz 5.9 Let X and Y be jointly Gaussian (0, 1) random variables with correlation coefficient 1/2. What is the joint PDF of X and Y ?

Quiz 5.9 Solution This problem just requires identifying the various parameters in Definition 5.10. Specifically, from the problem statement, we know ρX,Y = 1/2 and

µX = 0,    µY = 0,    σX = 1,    σY = 1.

Applying these facts to Definition 5.10, we have

fX,Y(x, y) = e^{−2(x² − xy + y²)/3} / (√3 π).        (1)

Section 5.10

Multivariate Probability Models

Definition 5.11 Multivariate Joint CDF The joint CDF of X1, . . . , Xn is FX1,...,Xn (x1, . . . , xn) = P [X1 ≤ x1, . . . , Xn ≤ xn] .

Definition 5.12 Multivariate Joint PMF The joint PMF of the discrete random variables X1, . . . , Xn is PX1,...,Xn (x1, . . . , xn) = P [X1 = x1, . . . , Xn = xn] .

Definition 5.13 Multivariate Joint PDF The joint PDF of the continuous random variables X1, . . . , Xn is the function

fX1,...,Xn(x1, . . . , xn) = ∂ⁿFX1,...,Xn(x1, . . . , xn) / (∂x1 · · · ∂xn).

Theorem 5.22 If X1, . . . , Xn are discrete random variables with joint PMF PX1,...,Xn(x1, . . . , xn),
(a) PX1,...,Xn(x1, . . . , xn) ≥ 0,
(b) Σ_{x1∈SX1} · · · Σ_{xn∈SXn} PX1,...,Xn(x1, . . . , xn) = 1.

Theorem 5.23 If X1, . . . , Xn have joint PDF fX1,...,Xn(x1, . . . , xn),
(a) fX1,...,Xn(x1, . . . , xn) ≥ 0,
(b) FX1,...,Xn(x1, . . . , xn) = ∫_{−∞}^{x1} · · · ∫_{−∞}^{xn} fX1,...,Xn(u1, . . . , un) du1 · · · dun,
(c) ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX1,...,Xn(x1, . . . , xn) dx1 · · · dxn = 1.

Theorem 5.24 The probability of an event A expressed in terms of the random variables X1, . . . , Xn is

Discrete:      P[A] = Σ_{(x1,...,xn)∈A} PX1,...,Xn(x1, . . . , xn),
Continuous:    P[A] = ∫ · · · ∫_A fX1,...,Xn(x1, . . . , xn) dx1 dx2 . . . dxn.

Example 5.21 Problem Consider a set of n independent trials in which there are r possible outcomes s1, . . . , sr for each trial. In each trial, P[si] = pi. Let Ni equal the number of times that outcome si occurs over n trials. What is the joint PMF of N1, . . . , Nr ?

Example 5.21 Solution The solution to this problem appears in Theorem 2.9 and is repeated here:

PN1,...,Nr(n1, . . . , nr) = [n! / (n1! n2! · · · nr!)] p1^{n1} p2^{n2} · · · pr^{nr}.        (5.74)

Theorem 5.25 For a joint PMF PW,X,Y,Z(w, x, y, z) of discrete random variables W, X, Y, Z, some marginal PMFs are

PX,Y,Z(x, y, z) = Σ_{w∈SW} PW,X,Y,Z(w, x, y, z),
PW,Z(w, z) = Σ_{x∈SX} Σ_{y∈SY} PW,X,Y,Z(w, x, y, z).

Theorem 5.26 For a joint PDF fW,X,Y,Z(w, x, y, z) of continuous random variables W, X, Y, Z, some marginal PDFs are

fW,X,Y(w, x, y) = ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dz,
fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} ∫_{−∞}^{∞} fW,X,Y,Z(w, x, y, z) dw dy dz.

Example 5.22 Problem As in Quiz 5.10, the random variables Y1, . . . , Y4 have the joint PDF

fY1,...,Y4(y1, . . . , y4) =
  4    0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,
  0    otherwise.        (5.75)

Find the marginal PDFs fY1,Y4(y1, y4), fY2,Y3(y2, y3), and fY3(y3).

Example 5.22 Solution

fY1,Y4(y1, y4) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fY1,...,Y4(y1, . . . , y4) dy2 dy3.        (5.76)

In the foregoing integral, the hard part is identifying the correct limits. These limits will depend on y1 and y4. For 0 ≤ y1 ≤ 1 and 0 ≤ y4 ≤ 1,

fY1,Y4(y1, y4) = ∫_{y1}^{1} ∫_{0}^{y4} 4 dy3 dy2 = 4(1 − y1)y4.        (5.77)

The complete expression for fY1,Y4(y1, y4) is

fY1,Y4(y1, y4) =
  4(1 − y1)y4    0 ≤ y1 ≤ 1, 0 ≤ y4 ≤ 1,
  0              otherwise.        (5.78)

Similarly, for 0 ≤ y2 ≤ 1 and 0 ≤ y3 ≤ 1,

fY2,Y3(y2, y3) = ∫_{0}^{y2} ∫_{y3}^{1} 4 dy4 dy1 = 4y2(1 − y3).        (5.79)

[Continued]

Example 5.22 Solution

(Continued 2)

The complete expression for fY2,Y3(y2, y3) is

fY2,Y3(y2, y3) =
  4y2(1 − y3)    0 ≤ y2 ≤ 1, 0 ≤ y3 ≤ 1,
  0              otherwise.        (5.80)

Lastly, for 0 ≤ y3 ≤ 1,

fY3(y3) = ∫_{−∞}^{∞} fY2,Y3(y2, y3) dy2 = ∫_{0}^{1} 4y2(1 − y3) dy2 = 2(1 − y3).        (5.81)

The complete expression is

fY3(y3) =
  2(1 − y3)    0 ≤ y3 ≤ 1,
  0            otherwise.        (5.82)

Definition 5.14 N Independent Random Variables
Random variables X1, . . . , Xn are independent if for all x1, . . . , xn,

Discrete:      PX1,...,Xn(x1, . . . , xn) = PX1(x1) PX2(x2) · · · PXn(xn),
Continuous:    fX1,...,Xn(x1, . . . , xn) = fX1(x1) fX2(x2) · · · fXn(xn).

Definition 5.15 Independent and Identically Distributed (iid)
X1, . . . , Xn are independent and identically distributed (iid) if

Discrete:      PX1,...,Xn(x1, . . . , xn) = PX(x1) PX(x2) · · · PX(xn),
Continuous:    fX1,...,Xn(x1, . . . , xn) = fX(x1) fX(x2) · · · fX(xn).

Example 5.23 Problem The random variables X1, . . . , Xn have the joint PDF

fX1,...,Xn(x1, . . . , xn) =
  1    0 ≤ xi ≤ 1, i = 1, . . . , n,
  0    otherwise.        (5.83)

Let A denote the event that maxi Xi ≤ 1/2. Find P[A].

Example 5.23 Solution We can solve this problem by applying Theorem 5.24:

P[A] = P[maxi Xi ≤ 1/2] = P[X1 ≤ 1/2, . . . , Xn ≤ 1/2]
     = ∫_0^{1/2} · · · ∫_0^{1/2} 1 dx1 · · · dxn = 1/2ⁿ.        (5.84)

As n grows, the probability that the maximum is less than 1/2 rapidly goes to 0.

We note that inspection of the joint PDF reveals that X1, . . . , Xn are iid continuous uniform (0, 1) random variables. The integration in Equation (5.84) is easy because independence implies

P[A] = P[X1 ≤ 1/2, . . . , Xn ≤ 1/2] = P[X1 ≤ 1/2] × · · · × P[Xn ≤ 1/2] = (1/2)ⁿ.        (5.85)
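A simulation sketch of this result (n and the number of trials are arbitrary choices; the script name is illustrative):

%maxcheck.m: estimate P[max_i Xi <= 1/2] for iid uniform (0,1) Xi (sketch)
n=4; m=100000;
X=rand(m,n);                     % each row is one realization of X1,...,Xn
PA=mean(max(X,[],2)<=0.5)        % close to (1/2)^n = 0.0625 for n=4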

Quiz 5.10 The random variables Y1, . . . , Y4 have the joint PDF

fY1,...,Y4(y1, . . . , y4) =
  4    0 ≤ y1 ≤ y2 ≤ 1, 0 ≤ y3 ≤ y4 ≤ 1,
  0    otherwise.        (5.86)

Let C denote the event that maxi Yi ≤ 1/2. Find P[C].

Quiz 5.10 Solution We find P[C] by integrating the joint PDF over the region of interest. Specifically,

P[C] = ∫_0^{1/2} dy2 ∫_0^{y2} dy1 ∫_0^{1/2} dy4 ∫_0^{y4} 4 dy3
     = 4 (∫_0^{1/2} y2 dy2) (∫_0^{1/2} y4 dy4)
     = 4 (y2²/2 |_0^{1/2}) (y4²/2 |_0^{1/2}) = 4 (1/8)(1/8) = 1/16.        (1)

Section 5.11

Matlab

Sample Space Grids
• We start with the case when X and Y are finite random variables with ranges

SX = {x1, . . . , xn},        SY = {y1, . . . , ym}.        (5.87)

In this case, we can take advantage of Matlab techniques for surface plots of g(x, y) over the x, y plane.
• In Matlab, we represent SX and SY by the n-element vector sx and the m-element vector sy.
• The function [SX,SY]=ndgrid(sx,sy) produces the pair of n × m matrices

SX = [x1 · · · x1; ... ; xn · · · xn],        SY = [y1 · · · ym; ... ; y1 · · · ym].        (5.88)

We refer to the matrices SX and SY as a sample space grid because they are a grid representation of the joint sample space

SX,Y = {(x, y) | x ∈ SX, y ∈ SY}.        (5.89)

That is, [SX(i,j) SY(i,j)] is the pair (xi, yj).

(5.89)

Probabilities on Grids
• To complete the probability model for X and Y in Matlab, we employ the n × m matrix PXY such that PXY(i,j) = PX,Y(xi, yj).
• To make sure that probabilities have been generated properly, we note that [SX(:) SY(:) PXY(:)] is a matrix whose rows list all possible pairs xi, yj and the corresponding probabilities PX,Y(xi, yj).
• Given a function g(x, y) that operates on the elements of vectors x and y, the advantage of this grid approach is that the Matlab function g(SX,SY) will calculate g(x, y) for each x ∈ SX and y ∈ SY.
• In particular, g(SX,SY) produces an n × m matrix with i, jth element g(xi, yj).
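For instance, here is a minimal sketch of how a grid is typically used; the ranges and PMF values are made up purely for illustration:

%gridsketch.m: using sample space grids (sketch with illustrative values)
sx=[1 2 3]; sy=[10 20];
[SX,SY]=ndgrid(sx,sy);
PXY=[0.2 0.1; 0.3 0.1; 0.2 0.1];   % PXY(i,j) = P[X=sx(i), Y=sy(j)]
ok=sum(PXY(:))                      % must equal 1
EW=sum(sum((SX+SY).*PXY))           % E[X+Y] computed on the grid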

Example 5.24 Problem An Internet photo developer website prints compressed photo images. Each image file contains a variable-sized image of X × Y pixels described by the joint PMF

PX,Y(x, y)   y = 400   y = 800   y = 1200
x = 800      0.2       0.05      0.1
x = 1200     0.05      0.2       0.1
x = 1600     0         0.1       0.2        (5.90)

For random variables X, Y, write a script imagepmf.m that defines the sample space grid matrices SX, SY, and PXY.

Example 5.24 Solution In the script imagepmf.m, the matrix SX has [800 1200 1600]' for each column and SY has [400 800 1200] for each row. After running imagepmf.m, we can inspect the variables:

%imagepmf.m
PXY=[0.2 0.05 0.1; ...
     0.05 0.2 0.1; ...
     0 0.1 0.2];
[SX,SY]=ndgrid([800 1200 1600],...
               [400 800 1200]);

>> imagepmf; SX
SX =
         800         800         800
        1200        1200        1200
        1600        1600        1600
>> SY
SY =
         400         800        1200
         400         800        1200
         400         800        1200

Example 5.25 Problem At 24 bits (3 bytes) per pixel, a 10:1 image compression factor yields image files with B = 0.3XY bytes. Find the expected value E[B] and the PMF PB(b).

Example 5.25 Solution The script imagesize.m produces the expected value as eb, and produces the PMF, which is represented by the vectors sb and pb. The 3 × 3 matrix SB has i, jth element g(xi, yj) = 0.3 xi yj. The calculation of eb is simply a Matlab implementation of Theorem 5.9. Since some elements of SB are identical, sb=unique(SB) extracts the unique elements. Although SB and PXY are both 3 × 3 matrices, each is stored internally by Matlab as a 9-element vector. Hence, we can pass SB and PXY to the finitepmf() function, which was designed to handle a finite random variable described by a pair of column vectors. Figure 5.7 shows one result of running the program imagesize. The vectors sb and pb comprise PB(b). For example, PB(288000) = 0.3.

%imagesize.m
imagepmf;
SB=0.3*(SX.*SY);
eb=sum(sum(SB.*PXY))
sb=unique(SB)'
pb=finitepmf(SB,PXY,sb)'

Figure 5.7 Output resulting from imagesize.m in Example 5.25:

>> imagesize
eb = 319200
sb =  96000  144000  192000  288000  384000  432000  576000
pb = 0.2000  0.0500  0.0500  0.3000  0.1000  0.1000  0.2000

Example 5.26 Problem Write a function xy=imagerv(m) that generates m sample pairs of the image size random variables X, Y of Example 5.25.

Example 5.26 Solution The function imagerv uses the imagepmf.m script to define the matrices SX, SY, and PXY. It then calls the finiterv.m function. Here is the code imagerv.m and a sample run:

function xy = imagerv(m);
imagepmf;
S=[SX(:) SY(:)];
xy=finiterv(S,PXY(:),m);

>> xy=imagerv(3)
xy =
         800         400
        1200         800
        1600         800

Example 5.27 Problem Given a list xy of sample pairs of random variables X, Y with Matlab range grids SX and SY, write a Matlab function fxy=freqxy(xy,SX,SY) that calculates the relative frequency of every pair x, y. The output fxy should correspond to the matrix [SX(:) SY(:) PXY(:)].

Example 5.27 Solution The matrix [SX(:) SY(:)] in freqxy has rows that list all possible pairs x, y. We append this matrix to xy to ensure that the new xy has every possible pair x, y. Next, the unique function copies all unique rows of xy to the matrix U and also provides the vector J that indexes the rows of xy in U; that is, xy=U(J). In addition, the number of occurrences of j in J indicates the number of occurrences in xy of row j in U. Thus we use the hist function on J to calculate the relative frequencies. We include the correction factor -1 because we had appended [SX(:) SY(:)] to xy at the start. Lastly, we reorder the rows of fxy because the output of unique produces the rows of U in a different order from [SX(:) SY(:) PXY(:)].

function fxy = freqxy(xy,SX,SY)
xy=[xy; SX(:) SY(:)];
[U,I,J]=unique(xy,'rows');
N=hist(J,1:max(J))-1;
N=N/sum(N);
fxy=[U N(:)];
fxy=sortrows(fxy,[2 1 3]);

Example 5.28 Problem Generate m = 10, 000 samples of random variables X, Y of Example 5.25. Calculate the relative frequencies and use stem3 to graph them.

Example 5.28 Solution The script imagestem.m generates a relative frequency stem plot (plot omitted):

%imagestem.m
imagepmf;
xy=imagerv(10000);
fxy=freqxy(xy,SX,SY);
stem3(fxy(:,1),...
    fxy(:,2),fxy(:,3));
xlabel('\it x');
ylabel('\it y');