Continuous Random Variables
M. George Akritas

Outline
• The Probability Density Function
    • The Expected Value and Variance
    • Transformations
    • The Median
    • Other Percentiles (or Quantiles)
    • IQR: Another measure of variability
• The Exponential Distribution
• The Normal or Gaussian Distribution
    • Definition: The pdf and cdf
    • Finding Probabilities via the Standard Normal Table
    • Finding Percentiles via the Standard Normal Table
• A random variable X is called continuous if it can take any value within a finite or infinite interval of the real line.
• Examples are measurements of quantitative variables (weight, strength, lifetime, pH, concentration of contaminants, etc.) as well as sample averages and variances of such measurements.
• Because no measuring device has infinite precision, continuous variables are only idealized versions of the discretized variables that are actually measured.
• The sample space of any continuous random variable has uncountably many values.
• A continuous random variable cannot have a pmf because P(X = x) = 0 for any value x. (See Example 3.1, p. 102, of the book.)

Definition. The probability density function, or pdf, $f_X$, of a continuous random variable X is a nonnegative function with the property that P(a < X < b) equals the area under it and above the interval (a, b). Thus,
\[ P(a < X < b) = \text{area under } f_X \text{ between } a \text{ and } b = \int_a^b f_X(x)\,dx. \]
[Figure: a pdf f(x) plotted for x from −3 to 3; the shaded area under the curve between 1.0 and 2.0 is P(1.0 < X < 2.0).]
Common Shapes of PDFs

[Figure: four example pdfs, with panels titled symmetric, bimodal, positively skewed, and negatively skewed.]

A positively skewed distribution is also called skewed to the right, and a negatively skewed distribution is also called skewed to the left.
Proposition. If X has pdf $f_X(x)$ and cdf $F_X(x)$, then
1. $F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy$, and
2. $f_X(x) = \frac{d}{dy} F_X(y)\big|_{y=x}$.

Example (Exponential RV). The lifetime, T, measured in hours, of a randomly selected component has pdf $f_T(t) = \lambda \exp(-\lambda t)$ for t ≥ 0 (and 0 otherwise), with λ = 0.001. Find: a) $F_T(t)$, and b) P(900 < T < 1200).
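One possible way to check the exponential lifetime example numerically is sketched below in Python (the slides themselves use R). It assumes the pdf applies for t ≥ 0, so that integrating it gives $F_T(t) = 1 - e^{-\lambda t}$; the function names are illustrative only.

```python
import math

LAMBDA = 0.001  # rate per hour, taken from the example


def exp_cdf(t: float, lam: float = LAMBDA) -> float:
    """F_T(t) = 1 - exp(-lam * t) for t >= 0, and 0 for t < 0."""
    return 1.0 - math.exp(-lam * t) if t >= 0 else 0.0


def prob_between(a: float, b: float, lam: float = LAMBDA) -> float:
    """P(a < T < b) = F_T(b) - F_T(a)."""
    return exp_cdf(b, lam) - exp_cdf(a, lam)


p = prob_between(900, 1200)
print(round(p, 4))  # 0.1054
```

The same difference can be written by hand as $e^{-0.9} - e^{-1.2}$.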
The Uniform Random Variable
• Consider selecting a number at random from the interval [0, 1] in such a way that any two subintervals of [0, 1] of equal length are equally likely to contain the selected number.
• For example, the subintervals [0.3, 0.4] and [0.6, 0.7] are equally likely to contain the selected number.
• If X denotes the outcome of such a selection, then X is said to have the uniform distribution on [0, 1]; this is denoted by X ∼ U(0, 1).
• Since we know the probability with which X takes a value in any interval, we know its distribution.
• What pdf describes it?
The Uniform PDF

If X ∼ U(0, 1), its pdf is:

[Figure: the U(0, 1) pdf plotted over [0, 1], with the area between 0.2 and 0.6 shaded to show P(0.2 < X < 0.6).]
The Uniform cdf

If X ∼ U(0, 1), its cdf is (why?):

[Figure: the U(0, 1) cdf plotted over [0, 1], with both axes running from 0.0 to 1.0.]
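One way to answer the "why?" is to integrate the pdf. The sketch below assumes the U(0, 1) pdf equals 1 on [0, 1] and 0 elsewhere, which is what the equal-length-subintervals description implies.

```latex
\[
F_X(x) = \int_{-\infty}^{x} f_X(y)\,dy =
\begin{cases}
0, & x < 0,\\[2pt]
\int_0^x 1\,dy = x, & 0 \le x \le 1,\\[2pt]
1, & x > 1.
\end{cases}
\]
```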
Definition. If X has pdf f(x), its expected value and variance are defined, respectively, by
\[ \mu_X = \int_{-\infty}^{\infty} x f(x)\,dx, \qquad \sigma_X^2 = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx = E(X^2) - \mu_X^2. \]
The standard deviation is $\sigma_X = \sqrt{\sigma_X^2}$.

Proposition.
1. For a function Y = h(X) of X, $E(h(X)) = \int_{-\infty}^{\infty} h(x) f(x)\,dx$.
2. If Y = a + bX, then $E(a + bX) = a + bE(X)$ and $\sigma_{a+bX}^2 = b^2 \sigma_X^2$.
Example. If X ∼ U(0, 1), show that $\mu_X = 0.5$ and $\sigma_X^2 = 1/12$.
Solution. First, $\mu_X = \int_0^1 x\,dx = 0.5$, and $E(X^2) = \int_0^1 x^2\,dx = 1/3$. Thus, $\sigma_X^2 = 1/3 - 0.5^2 = 1/12$.

Example. If Y ∼ U(A, B), show that $\mu_Y = \frac{B+A}{2}$ and $\sigma_Y^2 = \frac{(B-A)^2}{12}$.
Solution. First note that if X ∼ U(0, 1), then Y = A + (B − A)X ∼ U(A, B). Now use the previous proposition to show the result.
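The step left to the reader in the U(A, B) example can be sketched as follows, applying the linear-transformation proposition with a = A and b = B − A to the U(0, 1) results $\mu_X = 1/2$ and $\sigma_X^2 = 1/12$.

```latex
\begin{align*}
\mu_Y &= E\big(A + (B-A)X\big) = A + (B-A)\,\mu_X = A + \frac{B-A}{2} = \frac{A+B}{2},\\
\sigma_Y^2 &= (B-A)^2\,\sigma_X^2 = \frac{(B-A)^2}{12}.
\end{align*}
```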
Proposition. If Y is a nonnegative random variable, then $E(Y) = \int_0^{\infty} P(Y > y)\,dy$.

Example. Use the above proposition to calculate
1. the mean of X ∼ U(0, 1);
2. the mean of X ∼ Exp(λ).
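A possible worked solution, assuming the tail probabilities P(X > y) = 1 − y on [0, 1] for the uniform case and $P(X > y) = e^{-\lambda y}$ for the exponential case:

```latex
\begin{align*}
X \sim U(0,1):\quad & E(X) = \int_0^{1} (1 - y)\,dy = \Big[\,y - \tfrac{y^2}{2}\Big]_0^1 = \frac{1}{2},\\
X \sim \mathrm{Exp}(\lambda):\quad & E(X) = \int_0^{\infty} e^{-\lambda y}\,dy = \Big[-\tfrac{1}{\lambda} e^{-\lambda y}\Big]_0^{\infty} = \frac{1}{\lambda}.
\end{align*}
```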
Example.
a) If X ∼ U(0, 1), find the distribution of $Y = X^2$.
b) If X ∼ U(−1, 1), find the distribution of $Y = X^2$.
c) If X ∼ U(0, 1), find the distribution of Y = log X.
d) See Example 3.26, p. 134, of the book.

Theorem. Let X be continuous with pdf $f_X$, and let g(x) be a strictly monotonic and differentiable function. Then Y = g(X) has pdf
\[ f_Y(y) = f_X\big(g^{-1}(y)\big) \left| \frac{d}{dy}\, g^{-1}(y) \right| \]
for y in the range of the function g, and zero otherwise.
Example.
1. (The Probability Transformation) Let X be continuous with cumulative distribution function $F_X$. Then, if $g = F_X$, Y = g(X) ∼ U(0, 1).
2. (The Quantile Transformation) Let X ∼ U(0, 1) and let F be the cumulative distribution function of a continuous random variable. Then, if $g = F^{-1}$, Y = g(X) has $F_Y = F$.
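The quantile transformation is what makes it possible to simulate from a distribution using only U(0, 1) draws. The Python sketch below illustrates this for Exp(λ), whose inverse cdf is $F^{-1}(u) = -\ln(1-u)/\lambda$; the rate 2.0, the seed, and the function names are arbitrary choices for illustration.

```python
import math
import random


def exp_quantile(u: float, lam: float) -> float:
    """Inverse cdf of Exp(lam): solves 1 - exp(-lam * x) = u for x."""
    return -math.log(1.0 - u) / lam


def sample_exponential(n: int, lam: float, seed: int = 0) -> list[float]:
    """Draw n values by applying the inverse cdf to U(0, 1) draws."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [exp_quantile(rng.random(), lam) for _ in range(n)]


draws = sample_exponential(100_000, lam=2.0)
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # should be close to 1 / lam = 0.5
```

With a large sample, the sample mean should sit near 1/λ, consistent with Y having cdf F.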
Definition. The median of a continuous r.v. X, or of its distribution, is defined as the number $\tilde{\mu}_X$ with the property $P(X \le \tilde{\mu}_X) = P(X \ge \tilde{\mu}_X)$ or, equivalently, $F(\tilde{\mu}_X) = P(X \le \tilde{\mu}_X) = 0.5$, where F is the cdf of X.

Proposition (Relationship between $\mu_X$ and $\tilde{\mu}_X$).
• In symmetric distributions, $\mu_X = \tilde{\mu}_X$.
• In positively skewed distributions, $\mu_X > \tilde{\mu}_X$.
• In negatively skewed distributions, $\mu_X < \tilde{\mu}_X$.
Example. If X ∼ U(A, B), find $\tilde{\mu}_X$.
Solution: The cdf of X is F(x) = (x − A)/(B − A) for A ≤ x ≤ B, F(x) = 0 for x ≤ A, and F(x) = 1 for x ≥ B. To find $\tilde{\mu}_X$ we need to solve the equation $F(\tilde{\mu}_X) = 0.5$. Its solution is $\tilde{\mu}_X = A + 0.5(B - A)$.
Definition. Let α be a number between 0 and 1. The 100(1 − α)th percentile (or quantile) of a continuous r.v. X is the number, denoted by $x_\alpha$, with the property $F(x_\alpha) = P(X \le x_\alpha) = 1 - \alpha$, where F is the cdf of X. Thus, $x_{0.05}$, the 95th percentile of X, separates the top 5% of the population units (in terms of their X-value) from the rest.
• $x_{0.5}$ is the median and is also denoted by $q_2$.
• $x_{0.75}$ is also called the lower quartile, and is denoted by $q_1$.
• $x_{0.25}$ is also called the upper quartile, and is denoted by $q_3$.
• For any given α, $x_\alpha$ can be found by solving the equation $F(x_\alpha) = 1 - \alpha$ for $x_\alpha$.
Example. Let the cdf F(x) of the r.v. X be such that F(x) = 0 for x ≤ 0, $F(x) = x^2/4$ for x between 0 and 2, and F(x) = 1 for x > 2. Find the three quartiles (the 25th, 50th, and 75th percentiles).
Solution: The 100(1 − α)th percentile of X is found by solving $F(x_\alpha) = 1 - \alpha$, or $x_\alpha^2/4 = 1 - \alpha$, or $x_\alpha = 2\sqrt{1 - \alpha}$. The 25th, 50th, and 75th percentiles correspond to α = 0.75, 0.5, and 0.25, respectively. Thus,
\[ q_1 = 2\sqrt{0.25} = 1, \quad q_2 = 2\sqrt{0.5} = 1.41, \quad q_3 = 2\sqrt{0.75} = 1.73. \]
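When the equation $F(x_\alpha) = 1 - \alpha$ cannot be solved by hand, a numerical root finder can be used instead. The Python sketch below applies plain bisection to the cdf $x^2/4$ from the quartile example; the function names and the iteration count are illustrative choices, not part of the slides.

```python
def cdf(x: float) -> float:
    """The cdf from the example: x^2 / 4 on [0, 2]."""
    if x <= 0:
        return 0.0
    if x >= 2:
        return 1.0
    return x * x / 4.0


def percentile(alpha: float, lo: float = 0.0, hi: float = 2.0) -> float:
    """Solve F(x) = 1 - alpha for x by bisection on [lo, hi]."""
    target = 1.0 - alpha
    for _ in range(60):  # each step halves the bracket
        mid = (lo + hi) / 2.0
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


q1, q2, q3 = (percentile(a) for a in (0.75, 0.5, 0.25))
print(round(q1, 2), round(q2, 2), round(q3, 2))  # 1.0 1.41 1.73
```

Bisection works here because a cdf is nondecreasing, so the bracket always contains the solution.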
Definition. The interquartile range, abbreviated IQR, is the distance between the 25th and 75th percentiles. Thus, $IQR = q_3 - q_1$.

Example. Let X ∼ U(a, b). Find the IQR and compare it with the standard deviation.
Solution: The cdf F(x) of X satisfies F(x) = 0 for x ≤ a, $F(x) = \frac{x - a}{b - a}$ for a ≤ x ≤ b, and F(x) = 1 for x ≥ b. The solution to the equation
\[ \frac{x_\alpha - a}{b - a} = 1 - \alpha \quad\text{is}\quad x_\alpha = a + (b - a)(1 - \alpha). \]
Thus, $q_1 = x_{0.75} = a + 0.25(b - a)$, $q_3 = x_{0.25} = a + 0.75(b - a)$, and $IQR_X = 0.5(b - a)$.
• It can be shown that $IQR_X$ and $\sigma_X$ change proportionately whenever a r.v. X is multiplied by a constant.
• For example, for X ∼ U(a, b), $\sigma_X = (b - a)/\sqrt{12} = 0.289(b - a)$. Thus the ratio $IQR_X/\sigma_X$ remains constant for all values of a and b.
• X ∼ Exp(λ) if $f_X(x) = \lambda e^{-\lambda x} I(x > 0)$, for some λ > 0.
• $F(x) = P(X \le x) = 1 - e^{-\lambda x}$ for x > 0.
• $E(X^n) = \frac{n}{\lambda} E(X^{n-1})$, so that $E(X) = \frac{1}{\lambda}$ and $\mathrm{Var}(X) = \frac{1}{\lambda^2}$.

Example. Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with an average value of 10,000 miles. A person decides to take a 5,000 mile trip having just changed the battery. What is the probability that the trip will be completed without having to replace the battery?
Solution: With X measured in thousands of miles, λ = 1/10, so $P(X > 5) = e^{-5/10} = 0.607$.
The Memoryless Property of the Exponential RV
• If X ∼ Exp(λ), then for t > s we have P(X > t | X > s) = P(X > t − s).

Example. Suppose that the number of miles a car can run before its battery wears out is exponentially distributed with an average value of 10,000 miles. A person decides to take a 5,000 mile trip. What is the probability that the trip will be completed without having to replace the battery?
Solution: By the memoryless property, the mileage already on the battery does not matter, so with X measured in thousands of miles, $P(X > 5) = e^{-5/10} = 0.607$.
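The memoryless property can be checked numerically from the survival function $P(X > x) = e^{-\lambda x}$. In the Python sketch below, the value s = 7 (thousand miles already driven on the battery) is a hypothetical choice made only for illustration.

```python
import math


def survival(x: float, lam: float) -> float:
    """P(X > x) = exp(-lam * x) for X ~ Exp(lam), x >= 0."""
    return math.exp(-lam * x)


lam = 1 / 10  # mean of 10 (thousand miles) gives lam = 1/10

# Probability of completing a 5 (thousand) mile trip on a new battery.
p_new = survival(5, lam)

# Same trip on a battery that has already lasted s thousand miles:
# P(X > s + 5 | X > s) = P(X > s + 5) / P(X > s).
s = 7  # hypothetical mileage already on the battery
p_used = survival(s + 5, lam) / survival(s, lam)

print(round(p_new, 3), round(p_used, 3))  # 0.607 0.607
```

Any other nonnegative s gives the same conditional probability, which is the content of the property.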
The Poisson-Exponential Relationship
Proposition. Let X(t) be a Poisson process with parameter λ, and let T be the time until the first occurrence. Then T ∼ Exp(λ).
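A short argument for the Poisson-exponential relationship, assuming X(t) counts the occurrences in [0, t] and has the Poisson distribution with mean λt: the first occurrence is later than t exactly when there are no occurrences up to time t.

```latex
\[
P(T > t) = P\big(X(t) = 0\big) = e^{-\lambda t}\,\frac{(\lambda t)^0}{0!} = e^{-\lambda t},
\qquad\text{so}\qquad
F_T(t) = 1 - e^{-\lambda t}, \quad t > 0,
\]
```

which is the cdf of the Exp(λ) distribution.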
• The normal distribution is the most important distribution in probability and statistics.
• X ∼ N(µ, σ²) if its pdf is
\[ f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty. \]
• The cdf, F(x; µ, σ²), does not have a closed-form expression.
• R command for f(x; µ, σ²): dnorm(x,µ,σ).
    • For example, dnorm(0,0,1) gives 0.3989423, which is the value of f(0; µ = 0, σ² = 1).
• R command for F(x; µ, σ²): pnorm(x,µ,σ).
    • For example, pnorm(0,0,1) gives 0.5, which is the value of F(0; µ = 0, σ² = 1).
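For readers working outside R, the same two values can be reproduced with Python's standard library. This is only a rough analogue of dnorm and pnorm, using `statistics.NormalDist`, which, like the R functions, is parameterized by the standard deviation rather than the variance.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # sigma is the standard deviation

density_at_0 = std_normal.pdf(0.0)  # analogue of dnorm(0, 0, 1)
cdf_at_0 = std_normal.cdf(0.0)      # analogue of pnorm(0, 0, 1)

print(round(density_at_0, 7), cdf_at_0)  # 0.3989423 0.5
```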
The Standard Normal Distribution

When µ = 0 and σ = 1, X is said to have the standard normal distribution and is denoted, universally, by Z. The pdf of Z is
\[ \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty. \]
The cdf of Z is denoted by Φ. Thus
\[ \Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \phi(x)\,dx. \]
Φ(z) has no closed-form expression, but is tabulated in Table A.3.
Plot of φ(z)

[Figure: the standard normal pdf (mu = 0, sigma^2 = 1) plotted for values from −3 to 3, with its peak of about 0.4 at mu = 0.]
Historical Notes
• The normal distribution was discovered by Abraham DeMoivre in 1733, for approximating binomial probabilities when n is large. He called it the exponential bell-shaped curve.
• DeMoivre was the first statistical consultant, working out of "Slaughter's Coffee House", a betting shop in Long Acres, London.
• In 1803, Karl Friedrich Gauss used it for predicting the location of astronomical objects. Because of this it became known as the Gaussian distribution.
• By the late 19th century, statisticians had noted that most data sets would have approximately bell-shaped histograms. It came to be accepted that it was "normal" for any well-behaved data set to follow this curve. So the Gaussian curve became the normal curve.
Proposition. If X ∼ N(µ, σ²), then
1. E(X) = µ.
2. Var(X) = σ².
3. For any real numbers a and b with b ≠ 0, Y = a + bX ∼ N(a + bµ, b²σ²).
For example, if X ∼ N(4, 9) then Y = 5 + 2X ∼ N(13, 36).
Corollary.
1. If Z ∼ N(0, 1), then X = µ + σZ ∼ N(µ, σ²).
2. If X ∼ N(µ, σ²), then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.
3. If X ∼ N(µ, σ²), then $x_\alpha = \mu + \sigma z_\alpha$, where $x_\alpha$ and $z_\alpha$ denote the percentiles of X and Z (see figure in next slide).
• The corollary implies that probabilities and percentiles of any normal random variable can be computed from corresponding probabilities and percentiles of Z.
Figure of the Standard Normal Percentile

[Figure: the standard normal pdf plotted for values from −3 to 3; the area to the right of z_alpha equals alpha.]

• Normal percentiles in R: qnorm(p,µ,σ). For example, qnorm(0.95,0,1) gives 1.644854, which is the value of $z_{0.05}$.
In Table A.3, z-values are identified from the left column, up to the first decimal, and the top row, for the second decimal. Thus, 1 is identified by 1.0 in the left column and 0.00 in the top row.

Example (The 68-95-99.7% Property). Let Z ∼ N(0, 1). Then
1. P(−1 < Z < 1) = Φ(1) − Φ(−1) = .8413 − .1587 = .6826.
2. P(−2 < Z < 2) = Φ(2) − Φ(−2) = .9772 − .0228 = .9544.
3. P(−3 < Z < 3) = Φ(3) − Φ(−3) = .9987 − .0013 = .9974.
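The table lookups above can be cross-checked in software. The Python sketch below uses `statistics.NormalDist` from the standard library as a stand-in for Table A.3; it is a convenience check, not the method the slides use.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

for k in (1, 2, 3):
    prob = Z.cdf(k) - Z.cdf(-k)  # P(-k < Z < k) = Phi(k) - Phi(-k)
    print(k, round(prob, 4))
```

The software values can differ from the table-based ones in the fourth decimal place, because each table entry is itself rounded to four decimals before subtracting.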
Example. Let X ∼ N(1.25, 0.46²). Find a) P(1 ≤ X ≤ 1.75), and b) P(X > 2).
Solution. Use $Z = \frac{X - 1.25}{0.46} \sim N(0, 1)$ to express these probabilities in terms of Z. Thus,
a) \[ P(1 \le X \le 1.75) = P\left(\frac{1 - 1.25}{0.46} \le \frac{X - 1.25}{0.46} \le \frac{1.75 - 1.25}{0.46}\right) = P(-0.54 < Z < 1.09) = \Phi(1.09) - \Phi(-0.54) = 0.8621 - 0.2946 = 0.5675. \]
b) \[ P(X > 2) = P\left(Z > \frac{2 - 1.25}{0.46}\right) = 1 - \Phi(1.63) = 0.0516. \]
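As a sanity check on the standardization steps, the two probabilities can also be computed directly, without rounding z to two decimals. The sketch below is in Python with `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=1.25, sigma=0.46)  # sigma = 0.46 is the standard deviation

p_a = X.cdf(1.75) - X.cdf(1.0)  # P(1 <= X <= 1.75)
p_b = 1.0 - X.cdf(2.0)          # P(X > 2)

print(round(p_a, 3), round(p_b, 3))
```

The results should agree with the table-based answers to about three decimals; small differences come from rounding the z-values before the table lookup.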
The 68-95-99.7% Property

The 68-95-99.7% rule applies for any normal random variable X ∼ N(µ, σ²):
• P(µ − 1σ < X < µ + 1σ) = P(−1 < Z < 1) = 0.6826,
• P(µ − 2σ < X < µ + 2σ) = P(−2 < Z < 2) = 0.9544,
• P(µ − 3σ < X < µ + 3σ) = P(−3 < Z < 3) = 0.9974.
To find $z_\alpha$, one first locates 1 − α in the body of Table A.3 and then reads $z_\alpha$ from the margins. If the exact value of 1 − α does not exist in the main body of the table, then an approximation is used, as described in the following example.

Example. Find $z_{0.05}$, the 95th percentile of Z.
Solution. 1 − α = 0.95 does not exist in the body of the table. The entry that is closest to, but smaller than, 0.95 (which is 0.9495) corresponds to 1.64. The entry that is closest to, but larger than, 0.95 (which is 0.9505) corresponds to 1.65. We approximate $z_{0.05}$ by averaging these two z-values:
\[ z_{0.05} \approx \frac{1.64 + 1.65}{2} = 1.645. \]
Example. Let X denote the weight of a randomly chosen frozen yogurt cup. Suppose X ∼ N(8, 0.46²). Find the value c that separates the upper 5% of weight values from the lower 95%.
Solution. This is another way of asking for the 95th percentile, $x_{0.05}$, of X. Using the formula $x_\alpha = \mu + \sigma z_\alpha$, we have
\[ x_{0.05} = 8 + 0.46\, z_{0.05} = 8 + (0.46)(1.645) = 8.76. \]
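The percentile formula $x_\alpha = \mu + \sigma z_\alpha$ can be verified numerically. The Python sketch below uses `NormalDist.inv_cdf` from the standard library, which plays roughly the role of qnorm in R; the variable names are illustrative.

```python
from statistics import NormalDist

z_05 = NormalDist().inv_cdf(0.95)                     # analogue of qnorm(0.95, 0, 1)
x_05 = NormalDist(mu=8.0, sigma=0.46).inv_cdf(0.95)   # 95th percentile of X

print(round(z_05, 3), round(x_05, 2))  # 1.645 8.76
# Check of the relation x_alpha = mu + sigma * z_alpha
print(abs(x_05 - (8.0 + 0.46 * z_05)) < 1e-9)  # True
```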
Example. A message consisting of a string of binary (either 0 or 1) signals is transmitted from location A to location B. Due to channel noise, however, when x is sent from A, location B receives y = x + e, where e ∼ N(0, 1) represents the noise. To minimize error, location A sends x = 2 for 1 and x = −2 for 0. Location B decodes the received signal y as 1 if y ≥ 0.5 and as 0 if y < 0.5. Find the probability of an error in the decoded signal.
Solution. Let B = signal is decoded incorrectly. We cannot find P(B) (why?), but we can find
P(B | signal is 1) = P(x + e < 0.5 | x = 2) = P(e < −1.5) = 0.0668,
P(B | signal is 0) = P(x + e ≥ 0.5 | x = −2) = P(e ≥ 2.5) = 0.0062.
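The two conditional error probabilities can be reproduced as below. P(B) itself needs the proportion of 1s among the transmitted signals, which the example does not give; the helper `total_error` therefore takes that proportion as a hypothetical input `p_one` and combines the two cases by the law of total probability.

```python
from statistics import NormalDist

noise = NormalDist()  # e ~ N(0, 1)
THRESHOLD = 0.5

# Sent 1 as x = 2: error if 2 + e < 0.5, i.e. e < -1.5.
p_err_given_1 = noise.cdf(THRESHOLD - 2)
# Sent 0 as x = -2: error if -2 + e >= 0.5, i.e. e >= 2.5.
p_err_given_0 = 1.0 - noise.cdf(THRESHOLD + 2)


def total_error(p_one: float) -> float:
    """P(B) by total probability; p_one = P(signal is 1) is an assumed input."""
    return p_one * p_err_given_1 + (1.0 - p_one) * p_err_given_0


print(round(p_err_given_1, 4), round(p_err_given_0, 4))  # 0.0668 0.0062
```

For instance, `total_error(0.5)` would give the overall error rate if 0s and 1s were equally likely, which is an assumption and not something stated in the example.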