Continuous Random Variables
M. George Akritas

Outline
• The Probability Density Function
  • The Expected Value and Variance
  • Transformations
  • The Median
  • Other Percentiles (or Quantiles)
  • IQR: Another measure of variability
• The Exponential Distribution
• The Normal or Gaussian Distribution
  • Definition: The pdf and cdf
  • Finding Probabilities via the Standard Normal Table
  • Finding Percentiles via the Standard Normal Table

A random variable X is called continuous if it can take any value within a finite or infinite interval of the real line.

Such are measurements of quantitative variables (weight, strength, lifetime, pH, concentration of contaminants, etc.) as well as sample averages and variances of such measurements.
• Because no measuring device has infinite precision, continuous variables are only the ideal versions of the discretized variables that are actually measured.
• The sample space of any continuous random variable has uncountably many values.


A continuous random variable cannot have a pmf because P(X = x) = 0 for any value x. (See Example 3.1, p. 102, of the book.)

Definition. The probability density function, or pdf, $f_X$, of a continuous random variable X is a nonnegative function with the property that P(a < X < b) equals the area under it and above the interval (a, b). Thus,

$$P(a < X < b) = \text{area under } f_X \text{ between } a \text{ and } b = \int_a^b f_X(x)\,dx.$$
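As a minimal sketch of the area interpretation, the R code below integrates a pdf numerically. The density f(x) = 2x on [0, 1] is a hypothetical example chosen only for illustration; it does not come from the slides.

```r
# Hypothetical pdf (an assumption for illustration): f(x) = 2x on [0, 1], 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# The total area under a pdf must equal 1
total_area <- integrate(f, lower = 0, upper = 1)$value

# P(0.2 < X < 0.5) is the area under f between 0.2 and 0.5
prob <- integrate(f, lower = 0.2, upper = 0.5)$value

print(total_area)  # approximately 1
print(prob)        # approximately 0.21 (= 0.5^2 - 0.2^2)
```

Any other nonnegative function that integrates to 1 could be substituted for `f`.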

[Figure: plot of a pdf f(x) for x from −3 to 3, with P(1.0 < X < 2.0) marked as the area under the curve between 1.0 and 2.0.]

Common Shapes of PDFs

[Figure: four panels illustrating common pdf shapes: symmetric, bimodal, positively skewed, and negatively skewed.]

A positively skewed distribution is also called skewed to the right, and a negatively skewed distribution is also called skewed to the left.


Proposition. If X has pdf $f_X(x)$ and cdf $F_X(x)$, then
1. $F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$, and
2. $f_X(x) = \frac{d}{dy} F_X(y)\Big|_{y=x}$.

Example (Exponential RV). The lifetime T, measured in hours, of a randomly selected component has pdf $f_T(t) = \lambda \exp(-\lambda t)$ for t ≥ 0 (and 0 otherwise), with λ = 0.001. Find: a) $F_T(t)$, and b) P(900 < T < 1200).
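One possible way to work this example in R is sketched below. The cdf comes from integrating the pdf as in the proposition above; the names `F_T`, `prob`, and `prob_builtin` are illustrative.

```r
lambda <- 0.001  # rate from the example, per hour

# a) cdf of T: F_T(t) = 1 - exp(-lambda * t) for t >= 0, and 0 for t < 0
F_T <- function(t) ifelse(t >= 0, 1 - exp(-lambda * t), 0)

# b) P(900 < T < 1200) = F_T(1200) - F_T(900) = exp(-0.9) - exp(-1.2)
prob <- F_T(1200) - F_T(900)
print(round(prob, 4))  # 0.1054

# Cross-check with R's built-in exponential cdf
prob_builtin <- pexp(1200, rate = lambda) - pexp(900, rate = lambda)
```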


The Uniform Random Variable
• Consider selecting a number at random from the interval [0, 1] in such a way that any two subintervals of [0, 1] of equal length are equally likely to contain the selected number.
• For example, the subintervals [0.3, 0.4] and [0.6, 0.7] are equally likely to contain the selected number.
• If X denotes the outcome of such a selection, then X is said to have the uniform distribution on [0, 1]; this is denoted by X ∼ U(0, 1).
• Since we know the probability with which X takes a value in any interval, we know its distribution.
• What pdf describes it?


The Uniform PDF

If X ∼ U(0, 1), its pdf is:

[Figure: the U(0, 1) pdf plotted over [0, 1], with both axes running from 0.0 to 1.0 and the area corresponding to P(0.2 < X < 0.6) marked.]

The Uniform cdf

If X ∼ U(0, 1), its cdf is (why?):

[Figure: the U(0, 1) cdf plotted over [0, 1], with both axes running from 0.0 to 1.0.]
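One way to see what the U(0, 1) cdf must be is to integrate the pdf, as in the proposition relating the cdf to the pdf. This sketch assumes the U(0, 1) pdf equals 1 on [0, 1] and 0 elsewhere:

```latex
F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy =
\begin{cases}
0, & x < 0,\\[2pt]
\int_0^x 1\,dy = x, & 0 \le x \le 1,\\[2pt]
1, & x > 1.
\end{cases}
```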

Definition. If X has pdf f(x), its expected value and variance are defined, respectively, by

$$\mu_X = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \sigma_X^2 = \int_{-\infty}^{\infty} (x-\mu_X)^2 f(x)\,dx = E(X^2) - \mu_X^2.$$

The standard deviation is $\sigma_X = \sqrt{\sigma_X^2}$.

Proposition.
1. For a function Y = h(X) of X, $E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\,dx$.
2. If Y = a + bX, then $E(a + bX) = a + bE(X)$ and $\sigma_{a+bX}^2 = b^2 \sigma_X^2$.


Example. If X ∼ U(0, 1), show that $\mu_X = 0.5$ and $\sigma_X^2 = 1/12$.
Solution. First, $\mu_X = \int_0^1 x\,dx = 0.5$, and $E(X^2) = \int_0^1 x^2\,dx = 1/3$. Thus, $\sigma_X^2 = 1/3 - 0.5^2 = 1/12$.

Example. If Y ∼ U(A, B), show that $\mu_Y = \frac{B+A}{2}$ and $\sigma_Y^2 = \frac{(B-A)^2}{12}$.
Solution. First note that if X ∼ U(0, 1), then Y = A + (B − A)X ∼ U(A, B). Now use the previous proposition to show the result.
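The step left to the reader can be sketched as follows, taking a = A and b = B − A in the proposition on E(a + bX) and using the U(0, 1) mean 1/2 and variance 1/12:

```latex
\mu_Y = A + (B-A)\,\mu_X = A + \frac{B-A}{2} = \frac{A+B}{2},
\qquad
\sigma_Y^2 = (B-A)^2\,\sigma_X^2 = \frac{(B-A)^2}{12}.
```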


Proposition. If Y is a nonnegative random variable, then $E(Y) = \int_0^{\infty} P(Y > y)\,dy$.

Example. Use the above proposition to calculate
1. the mean of X ∼ U(0, 1);
2. the mean of X ∼ Exp(λ).
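A possible worked solution, assuming the standard tail probabilities P(X > y) = 1 − y on [0, 1] for the uniform case and P(X > y) = e^{−λy} for the exponential case:

```latex
% 1. X ~ U(0,1): P(X > y) = 1 - y for 0 <= y <= 1, and 0 for y > 1
E(X) = \int_0^1 (1 - y)\,dy = 1 - \tfrac{1}{2} = \tfrac{1}{2}.

% 2. X ~ Exp(lambda): P(X > y) = e^{-lambda y} for y >= 0
E(X) = \int_0^{\infty} e^{-\lambda y}\,dy = \frac{1}{\lambda}.
```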


Example.
a) If X ∼ U(0, 1), find the distribution of Y = X².
b) If X ∼ U(−1, 1), find the distribution of Y = X².
c) If X ∼ U(0, 1), find the distribution of Y = log X.
d) See Example 3.26, p. 134, of the book.

Theorem. Let X be continuous with pdf $f_X$, and let g(x) be a strictly monotonic and differentiable function. Then Y = g(X) has pdf

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\, g^{-1}(y)\right|$$

for y in the range of the function g, and zero otherwise.
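As a sketch of how the theorem applies, consider the case X ∼ U(0, 1) and Y = X². Here g(x) = x² is strictly increasing on (0, 1), so the inverse is g⁻¹(y) = √y:

```latex
g^{-1}(y) = \sqrt{y}, \qquad
\frac{d}{dy}\, g^{-1}(y) = \frac{1}{2\sqrt{y}}, \qquad
f_Y(y) = f_X(\sqrt{y})\,\left|\frac{1}{2\sqrt{y}}\right| = \frac{1}{2\sqrt{y}}
\quad \text{for } 0 < y < 1,
```

and $f_Y(y) = 0$ otherwise. The theorem does not apply directly when X ∼ U(−1, 1), because x² is not monotonic on (−1, 1).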


Example.
1. (The Probability Transformation) Let X be continuous with cumulative distribution function $F_X$. Then, if $g = F_X$, Y = g(X) ∼ U(0, 1).
2. (The Quantile Transformation) Let X ∼ U(0, 1) and let F be the cumulative distribution function of a continuous random variable. Then, if $g = F^{-1}$, Y = g(X) has $F_Y = F$.
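Both transformations can be tried numerically. The sketch below uses the exponential cdf as the target F; the rate `lambda <- 2`, the seed, and the sample size are arbitrary choices made for illustration, not values from the slides.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
lambda <- 2      # hypothetical rate, chosen for illustration
u <- runif(10000)

# Quantile transformation: F(x) = 1 - exp(-lambda * x), so F^{-1}(u) = -log(1 - u) / lambda
x <- -log(1 - u) / lambda

# x should behave like an Exp(lambda) sample: its mean should be near 1 / lambda
print(mean(x))

# Probability transformation: applying the cdf to x recovers the U(0, 1) values
u_back <- pexp(x, rate = lambda)
print(max(abs(u_back - u)))  # essentially zero, up to floating-point error
```

The quantile transformation is the basis of inverse-transform sampling: uniform random numbers are turned into draws from F.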


Definition. The median of a continuous r.v. X, or of its distribution, is defined as the number $\tilde{\mu}_X$ with the property $P(X \le \tilde{\mu}_X) = P(X \ge \tilde{\mu}_X)$, or, equivalently, $F(\tilde{\mu}_X) = P(X \le \tilde{\mu}_X) = 0.5$, where F is the cdf of X.

Proposition (Relationship between $\mu_X$ and $\tilde{\mu}_X$).
• In symmetric distributions, $\mu_X = \tilde{\mu}_X$.
• In positively skewed distributions, $\mu_X > \tilde{\mu}_X$.
• In negatively skewed distributions, $\mu_X < \tilde{\mu}_X$.


Example. If X ∼ U(A, B), find $\tilde{\mu}_X$.
Solution: The cdf of X is F(x) = (x − A)/(B − A) for A ≤ x ≤ B, with F(x) = 0 for x ≤ A and F(x) = 1 for x ≥ B. To find $\tilde{\mu}_X$ we need to solve the equation $F(\tilde{\mu}_X) = 0.5$. The solution is $\tilde{\mu}_X = A + 0.5 \times (B - A)$.


Definition. Let α be a number between 0 and 1. The 100(1 − α)th percentile (or quantile) of a continuous r.v. X is the number, denoted by $x_\alpha$, with the property $F(x_\alpha) = P(X \le x_\alpha) = 1 - \alpha$, where F is the cdf of X. Thus, $x_{0.05}$, the 95th percentile of X, separates the top 5% of the population units (in terms of their X-value) from the rest.
• $x_{0.5}$ is the median and is also denoted by $q_2$.
• $x_{0.75}$ is also called the lower quartile, and is denoted by $q_1$.
• $x_{0.25}$ is also called the upper quartile, and is denoted by $q_3$.
• For any given α, $x_\alpha$ can be found by solving the equation $F(x_\alpha) = 1 - \alpha$ for $x_\alpha$.


Example. Let the cdf F(x) of the r.v. X be such that F(x) = 0 for x ≤ 0, $F(x) = x^2/4$ for x between 0 and 2, and F(x) = 1 for x > 2. Find the three quartiles (the 25th, 50th, and 75th percentiles).
Solution: The 100(1 − α)th percentile of X is found by solving $F(x_\alpha) = 1 - \alpha$, or $x_\alpha^2/4 = 1 - \alpha$, or $x_\alpha = 2\sqrt{1 - \alpha}$. The 25th, 50th, and 75th percentiles correspond to α = 0.75, 0.5, and 0.25, respectively. Thus,

$$q_1 = 2\sqrt{0.25} = 1, \qquad q_2 = 2\sqrt{0.5} = 1.41, \qquad q_3 = 2\sqrt{0.75} = 1.73.$$
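When a cdf cannot be inverted by hand, the percentile equation can be solved numerically. The sketch below does this for the cdf of this example using R's `uniroot`; the names `cdf_F`, `p`, and `quartiles` are illustrative.

```r
# cdf from the example on its support: F(x) = x^2 / 4 for 0 <= x <= 2
cdf_F <- function(x) x^2 / 4

# Solve F(x) = p numerically on [0, 2] for p = 0.25, 0.50, 0.75
p <- c(0.25, 0.50, 0.75)
quartiles <- sapply(p, function(pk) {
  uniroot(function(x) cdf_F(x) - pk, interval = c(0, 2), tol = 1e-10)$root
})

print(round(quartiles, 2))  # 1.00 1.41 1.73

# Interquartile range: distance between the 25th and 75th percentiles
iqr <- quartiles[3] - quartiles[1]
```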


Definition. The interquartile range, abbreviated IQR, is the distance between the 25th and 75th percentiles. Thus, IQR = $q_3 - q_1$.

Example. Let X ∼ U(a, b). Find the IQR and compare it with the standard deviation.
Solution: The cdf F(x) of X satisfies F(x) = 0 for x ≤ a, $F(x) = \frac{x-a}{b-a}$ for a ≤ x ≤ b, and F(x) = 1 for x ≥ b. The solution to the equation

$$\frac{x_\alpha - a}{b - a} = 1 - \alpha \quad \text{is} \quad x_\alpha = a + (b-a)(1-\alpha).$$

Thus, $q_1 = x_{0.75} = a + 0.25(b-a)$, $q_3 = x_{0.25} = a + 0.75(b-a)$, and $\mathrm{IQR}_X = 0.5(b-a)$.
• It can be shown that $\mathrm{IQR}_X$ and $\sigma_X$ change proportionately whenever a r.v. X is multiplied by a constant.
• For example, for X ∼ U(a, b), $\sigma_X = (b-a)/\sqrt{12} = 0.289(b-a)$. Thus the ratio $\mathrm{IQR}_X/\sigma_X$ remains constant for all values of a and b.
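For the uniform case the constant ratio can be written out explicitly from the two expressions for the IQR and the standard deviation:

```latex
\frac{\mathrm{IQR}_X}{\sigma_X}
= \frac{0.5\,(b-a)}{(b-a)/\sqrt{12}}
= \frac{\sqrt{12}}{2}
= \sqrt{3} \approx 1.732 .
```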


The Exponential Distribution
• X ∼ Exp(λ) if $f_X(x) = \lambda e^{-\lambda x} I(x > 0)$, for some λ > 0.
• $F(x) = P(X \le x) = 1 - e^{-\lambda x}$ for x > 0.
• $E(X^n) = \frac{n}{\lambda} E(X^{n-1})$, so that $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$.

Example. Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with an average value of 10,000 miles. A person decides to take a 5,000 mile trip, having just changed the battery. What is the probability that the trip will be completed without having to replace the battery?
Solution: Measuring X in thousands of miles, X ∼ Exp(1/10), so $P(X > 5) = e^{-5/10} = 0.607$.
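The battery calculation can be checked in R with the built-in exponential cdf. This is a sketch with distances expressed in thousands of miles, so that the rate is 1/10; the variable names are illustrative.

```r
mean_miles <- 10          # mean battery life, in thousands of miles
rate <- 1 / mean_miles    # lambda = 1 / mean for an exponential distribution
trip <- 5                 # trip length, in thousands of miles

# P(X > 5) = 1 - F(5) = exp(-rate * 5)
p_complete <- pexp(trip, rate = rate, lower.tail = FALSE)
print(round(p_complete, 3))  # 0.607
```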


The Memoryless Property of the Exponential RV
• If X ∼ Exp(λ), then for t > s we have P(X > t | X > s) = P(X > t − s).

Example. Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with an average value of 10,000 miles. A person decides to take a 5,000 mile trip. What is the probability that the trip will be completed without having to replace the battery?
Solution: By the memoryless property, the miles already driven on the battery do not matter. With X measured in thousands of miles, the probability is $P(X > 5) = e^{-5/10} = 0.607$.
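The memoryless property follows in a few steps from the exponential tail probability $P(X > x) = e^{-\lambda x}$. For t > s, the event {X > t} is contained in {X > s}, so:

```latex
P(X > t \mid X > s)
= \frac{P(X > t,\ X > s)}{P(X > s)}
= \frac{P(X > t)}{P(X > s)}
= \frac{e^{-\lambda t}}{e^{-\lambda s}}
= e^{-\lambda (t-s)}
= P(X > t - s).
```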


The Poisson-Exponential Relationship

Proposition. Let X(t) be a Poisson process with parameter λ, and let T be the time until the first occurrence. Then T ∼ Exp(λ).
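A short sketch of why this holds, assuming X(t) counts the occurrences in [0, t], so that X(t) has the Poisson distribution with mean λt. The first occurrence comes after time t exactly when there are no occurrences in [0, t]:

```latex
P(T > t) = P\big(X(t) = 0\big) = e^{-\lambda t}\,\frac{(\lambda t)^0}{0!} = e^{-\lambda t},
\qquad\text{so}\qquad
F_T(t) = 1 - e^{-\lambda t}, \quad t > 0,
```

which is the Exp(λ) cdf.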


The Normal or Gaussian Distribution
• The normal distribution is the most important distribution in probability and statistics.
• X ∼ N(µ, σ²) if its pdf is

$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty.$$

• The cdf, F(x; µ, σ²), does not have a closed-form expression.
• R command for f(x; µ, σ²): dnorm(x,µ,σ). Note that the third argument is the standard deviation σ, not the variance σ².
  • For example, dnorm(0,0,1) gives 0.3989423, which is the value of f(0; µ = 0, σ² = 1).
• R command for F(x; µ, σ²): pnorm(x,µ,σ).
  • For example, pnorm(0,0,1) gives 0.5, which is the value of F(0; µ = 0, σ² = 1).


The Standard Normal Distribution

When µ = 0 and σ = 1, X is said to have the standard normal distribution and is denoted, universally, by Z. The pdf of Z is

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty.$$

The cdf of Z is denoted by Φ. Thus

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(x)\,dx.$$

Φ(z) has no closed-form expression, but is tabulated in Table A.3.

Plot of φ(z)

[Figure: the standard normal pdf (mu = 0, sigma^2 = 1), f(x), plotted for x from −3 to 3, with the mean mu = 0 marked on the horizontal axis; the vertical axis runs from 0.0 to 0.4.]

Historical Notes
• The normal distribution was discovered by Abraham DeMoivre in 1733, for approximating binomial probabilities when n is large. He called it the exponential bell-shaped curve.
  • DeMoivre was the first statistical consultant, working out of "Slaughter's Coffee House", a betting shop in Long Acres, London.
• In 1803, Karl Friedrich Gauss used it for predicting the location of astronomical objects. Because of this it became known as the Gaussian distribution.
• By the late 19th century, statisticians had noted that most data sets would have approximately bell-shaped histograms. It came to be accepted that it was "normal" for any well-behaved data set to follow this curve. So the Gaussian curve became the normal curve.


Proposition. If X ∼ N(µ, σ²), then
1. E(X) = µ.
2. Var(X) = σ².
3. For any real numbers a and b ≠ 0, Y = a + bX ∼ N(a + bµ, b²σ²).
For example, if X ∼ N(4, 9) then Y = 5 + 2X ∼ N(13, 36).


Corollary.
1. If Z ∼ N(0, 1), then X = µ + σZ ∼ N(µ, σ²).
2. If X ∼ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
3. If X ∼ N(µ, σ²), then $x_\alpha = \mu + \sigma z_\alpha$, where $x_\alpha$ and $z_\alpha$ denote the percentiles of X and Z (see the figure of the standard normal percentile).
• The corollary implies that probabilities and percentiles of any normal random variable can be computed from the corresponding probabilities and percentiles of Z.

Figure of the Standard Normal Percentile

[Figure: the standard normal pdf f(x) for x from −3 to 3, with z_alpha marked on the horizontal axis and the area to its right, equal to alpha, labeled.]

• Normal percentiles in R: qnorm(p,µ,σ). For example, qnorm(0.95,0,1) gives 1.644854, which is the value of $z_{0.05}$.


In Table A.3, z-values are identified from the left column, up to the first decimal, and the top row, for the second decimal. Thus, 1 is identified by 1.0 in the left column and 0.00 in the top row.

Example (The 68-95-99.7% Property). Let Z ∼ N(0, 1). Then
1. P(−1 < Z < 1) = Φ(1) − Φ(−1) = .8413 − .1587 = .6826.
2. P(−2 < Z < 2) = Φ(2) − Φ(−2) = .9772 − .0228 = .9544.
3. P(−3 < Z < 3) = Φ(3) − Φ(−3) = .9987 − .0013 = .9974.
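The same three probabilities can be computed in R with `pnorm` rather than read from the table. A brief sketch; the results can differ from the table-based values in the fourth decimal because the table entries are rounded before subtracting.

```r
k <- 1:3

# P(-k < Z < k) = Phi(k) - Phi(-k) for k = 1, 2, 3
probs <- pnorm(k) - pnorm(-k)

print(round(probs, 4))
```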


Example. Let X ∼ N(1.25, 0.46²). Find a) P(1 ≤ X ≤ 1.75), and b) P(X > 2).
Solution. Use $Z = \frac{X - 1.25}{0.46} \sim N(0, 1)$ to express these probabilities in terms of Z. Thus,

a) $$P(1 \le X \le 1.75) = P\left(\frac{1 - 1.25}{.46} \le \frac{X - 1.25}{.46} \le \frac{1.75 - 1.25}{.46}\right) = P(-.54 < Z < 1.09) = \Phi(1.09) - \Phi(-.54) = .8621 - .2946 = .5675.$$

b) $$P(X > 2) = P\left(Z > \frac{2 - 1.25}{.46}\right) = 1 - \Phi(1.63) = .0516.$$
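As a cross-check on the table lookup, the probabilities can be computed directly in R, without standardizing by hand. This is a sketch; the results should agree with the table-based answers to about two or three decimals, since the table rounds z to two decimals.

```r
mu <- 1.25
sigma <- 0.46   # standard deviation (pnorm expects sd, not variance)

# a) P(1 <= X <= 1.75) = F(1.75) - F(1)
p_a <- pnorm(1.75, mean = mu, sd = sigma) - pnorm(1, mean = mu, sd = sigma)

# b) P(X > 2) = 1 - F(2), computed as an upper-tail probability
p_b <- pnorm(2, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(p_a, p_b))
```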


The 68-95-99.7% Property

The 68-95-99.7% rule applies to any normal random variable X ∼ N(µ, σ²):
• P(µ − 1σ < X < µ + 1σ) = P(−1 < Z < 1) = 0.6826,
• P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544,
• P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9974.


To find zα , one first locates 1 − α in the body of Table A.3 and then reads zα from the margins. If the exact value of 1 − α does not exist in the main body of the table, then an approximation is used as described in the following.

Example. Find $z_{0.05}$, the 95th percentile of Z.
Solution. 1 − α = 0.95 does not exist in the body of the table. The entry that is closest to, but smaller than, 0.95 (which is 0.9495) corresponds to 1.64. The entry that is closest to, but larger than, 0.95 (which is 0.9505) corresponds to 1.65. We approximate $z_{0.05}$ by averaging these two z-values:

$$z_{0.05} \approx \frac{1.64 + 1.65}{2} = 1.645.$$


Example. Let X denote the weight of a randomly chosen frozen yogurt cup. Suppose X ∼ N(8, 0.46²). Find the value c that separates the upper 5% of weight values from the lower 95%.
Solution. This is another way of asking for the 95th percentile, $x_{0.05}$, of X. Using the formula $x_\alpha = \mu + \sigma z_\alpha$, we have $x_{0.05} = 8 + 0.46\, z_{0.05} = 8 + (0.46)(1.645) = 8.76$.
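The percentile can also be obtained in R, either directly with `qnorm` or through the formula $x_\alpha = \mu + \sigma z_\alpha$. A short sketch; the variable names are illustrative.

```r
mu <- 8
sigma <- 0.46   # standard deviation

# Direct computation of the 95th percentile of N(8, 0.46^2)
x_05 <- qnorm(0.95, mean = mu, sd = sigma)

# Same value via the standard normal percentile z_0.05
x_05_formula <- mu + sigma * qnorm(0.95)

print(round(x_05, 2))  # 8.76
```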


Example. A message consisting of a string of binary (either 0 or 1) signals is transmitted from location A to location B. Due to channel noise, however, when x is sent from A, location B receives y = x + e, where e ∼ N(0, 1) represents the noise. To minimize error, location A sends x = 2 for 1 and x = −2 for 0. Location B decodes the received signal y as 1 if y ≥ 0.5 and as 0 if y < 0.5. Find the probability of an error in the decoded signal.
Solution. Let B denote the event that the signal is decoded incorrectly. We cannot find P(B) (why?), but we can find
P(B | signal is 1) = P(x + e < 0.5 | x = 2) = P(e < −1.5) = 0.0668,
P(B | signal is 0) = P(x + e ≥ 0.5 | x = −2) = P(e ≥ 2.5) = 0.0062.
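The two conditional error probabilities can be reproduced in R. The sketch below also shows how an overall error probability could be assembled by the law of total probability if the proportion of 1s in the message were known. The value `p1 <- 0.5` is a hypothetical assumption; the example does not give it.

```r
threshold <- 0.5

# P(error | signal is 1): y = 2 + e < 0.5, i.e. e < -1.5
p_err_given_1 <- pnorm(threshold - 2)

# P(error | signal is 0): y = -2 + e >= 0.5, i.e. e >= 2.5
p_err_given_0 <- pnorm(threshold + 2, lower.tail = FALSE)

print(round(c(p_err_given_1, p_err_given_0), 4))  # 0.0668 0.0062

# Hypothetical prior (NOT given in the example): P(signal is 1) = p1
p1 <- 0.5
p_err <- p1 * p_err_given_1 + (1 - p1) * p_err_given_0
```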
