Continuous Random Variables
Lecture 4
© 2010, All Rights Reserved, Robi Polikar. No part of this presentation may be used without explicit written permission. Such permission will be given – upon request – for noncommercial educational purposes only. Limited permission is hereby granted, however, to post or distribute this presentation if you agree to all of the following: 1. you do so for noncommercial educational purposes; 2. the entire presentation is kept together as a whole, including this entire notice. 3. you include the following link/reference on your site: Robi Polikar, http://engineering.rowan.edu/~polikar.
ECE 09.360
Dr. P.’s Clinic Consultant Module in
Probability & Statistics in Engineering
Today in P&S
• Review of Discrete Random Variables: Binomial distribution; Hypergeometric and negative binomial distributions; Poisson distribution
• Continuous Random Variables and Their Probability Distributions: Probability density (distribution) function; Cumulative distribution function
• Percentiles, Expected Values & Variances of Cont. Random Variables
• The Normal (Gaussian) Distribution: Standard & non-standard normal distribution; The normal approximation to the binomial distribution
• Other continuous distributions: Gamma, Beta, Exponential, Chi-Squared and Weibull Distributions
© 2010 All Rights Reserved, Robi Polikar, Rowan University
Random Variables
A random variable X is a function that maps every possible outcome in the sample space S of a random experiment to a real number.
[Figure: X maps each outcome of the experiment (fish types such as Trout, Seabass, Sword, Other) to a real number x.]
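A cumulative distribution function tells us the probability of X assuming a value x or less: F(x) = P(X ≤ x).
…
For a continuous rv X, a probability density function (pdf) f(x) must satisfy two conditions:
1. f(x) ≥ 0 for all values of x.
2. The area of the region between the graph of f and the x-axis is equal to 1.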
$P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1$
[Figure: pdf curve y = f(x); the total area under the curve is 1.]
The actual probability is the area under the PDF… I repeat… The Actual Probability is …
P(a ≤ X ≤ b) is the area of the shaded region:
$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$
[Figure: pdf curve y = f(x) with the area between x = a and x = b shaded.]
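P(a ≤ X ≤ b) is just an area, so it can be approximated numerically when no closed form is handy. The sketch below is a minimal example under stated assumptions. It uses a hypothetical pdf f(x) = 2x on [0, 1], which is not from the slides, and it uses composite Simpson's rule as one possible quadrature method.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """Hypothetical example pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


total_area = simpson(pdf, 0.0, 1.0)  # condition 2: the total area should be 1
p_ab = simpson(pdf, 0.2, 0.5)        # P(0.2 <= X <= 0.5) = 0.5**2 - 0.2**2
print(round(total_area, 6), round(p_ab, 6))  # -> 1.0 0.21
```

Simpson's rule is exact for polynomials up to degree 3, so for this linear pdf the results agree with the hand calculation up to floating-point rounding.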
Uniform Distribution
If all outcomes of a random experiment are equally likely, the random variable is said to have a uniform distribution. More formally, a continuous r.v. X is said to have a uniform distribution on the interval [a, b] if the pdf of X is
$f(x; a, b) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$
[Figure: rectangular pdf of constant height 1/(b − a) between x = a and x = b.]
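The uniform pdf is flat, so the probability of any subinterval is simply its length (clipped to [a, b]) divided by b − a. The following is a small sketch of that idea. The function names and the example interval [0, 10] are hypothetical choices for illustration.

```python
def uniform_pdf(x, a, b):
    """pdf f(x; a, b) of a continuous uniform rv on [a, b]."""
    if not a < b:
        raise ValueError("need a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b], with c <= d.

    Because the pdf is flat, the area is (length of [c, d] inside [a, b]) / (b - a).
    """
    if not a < b:
        raise ValueError("need a < b")
    overlap = max(0.0, min(d, b) - max(c, a))
    return overlap / (b - a)


# Hypothetical example: X uniform on [0, 10]
print(uniform_pdf(3, 0, 10))       # -> 0.1
print(uniform_prob(2, 5, 0, 10))   # -> 0.3
print(uniform_prob(8, 15, 0, 10))  # -> 0.2 (only [8, 10] carries probability)
```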
Cumulative Distribution Function
As in the discrete case, the cumulative distribution function (cdf) F(x) of a continuous rv X is defined for every number x by
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
Note that for each x, F(x) is the area under the density curve to the left of x.
[Figure: pdf f(x) and cdf F(x); F(200) is the area under f(x) to the left of x = 200.]
$F(200) = P(X \le 200) = \int_{0}^{200} f(x)\,dx$
Conversely, the pdf can be obtained from the cdf as
$f(x) = F'(x) = \dfrac{dF(x)}{dx}$ at every x at which the derivative F'(x) exists.
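The two directions on this slide can be checked numerically. Integrating the pdf gives the cdf, and differentiating the cdf recovers the pdf. The sketch below makes two assumptions. The pdf is the hypothetical f(x) = 2x on [0, 1] (so F(x) = x² there). The methods are the trapezoid rule and a central finite difference, which are illustrative choices.

```python
def pdf(x):
    """Hypothetical example pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x, lower=0.0, n=1000):
    """F(x): integrate f(t) from `lower` to x with the trapezoid rule.

    `lower` = 0 stands in for -infinity because this pdf is zero below 0.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / n
    total = 0.5 * (pdf(lower) + pdf(x))
    for i in range(1, n):
        total += pdf(lower + i * h)
    return total * h


def pdf_from_cdf(x, h=1e-5):
    """Recover f(x) = F'(x) with a central finite difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


print(round(cdf(0.5), 6))           # F(0.5) = 0.5**2 -> 0.25
print(round(pdf_from_cdf(0.5), 6))  # f(0.5) = 2 * 0.5 -> 1.0
```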
Cumulative Distribution Function
[Figure: pdf f(x) and cdf F(x); the area under f(x) between x = 100 and x = 200 is shaded.]
$P(100 \le X \le 200) = \int_{100}^{200} f(x)\,dx = F(200) - F(100)$
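The identity above says that integrating the pdf over [100, 200] and subtracting cdf values must agree. The pdf behind the slide's figure is not given, so this sketch assumes a hypothetical pdf f(x) = x/80000 on [0, 400]. Its cdf is F(x) = x²/160000 on that interval.

```python
def pdf(x):
    """Hypothetical example pdf: f(x) = x / 80000 on [0, 400], zero elsewhere."""
    return x / 80000.0 if 0.0 <= x <= 400.0 else 0.0


def cdf(x):
    """Closed-form cdf of the pdf above: F(x) = x**2 / 160000 on [0, 400]."""
    if x < 0.0:
        return 0.0
    if x > 400.0:
        return 1.0
    return x * x / 160000.0


def midpoint(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


direct = midpoint(pdf, 100.0, 200.0)  # area under f between 100 and 200
via_cdf = cdf(200.0) - cdf(100.0)     # F(200) - F(100)
print(round(direct, 6), round(via_cdf, 6))  # -> 0.1875 0.1875
```

Once F is known in closed form, the subtraction is cheaper than integrating again for every new interval.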
Percentiles in Cont. Distributions
Percentiles indicate relative standing in ordered data. For example, if your SAT score is in the 90th percentile, then 90% of all test takers scored lower than you did, and 10% scored higher. So the (100p)th percentile η(p), 0 ≤ p ≤ 1, is the value that exceeds 100p% of all scores and is exceeded by 100(1−p)% of all scores. More formally: let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by
$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$
Thus on a pdf curve, η(p) is the value on the horizontal axis such that 100p% of the area under f(x) lies to the left of η(p) and 100(1-p)% lies to the right!
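Finding η(p) means solving F(η) = p for η. When F cannot be inverted by hand, a root finder works because F is nondecreasing. The sketch below assumes the hypothetical cdf F(x) = x² on [0, 1], which is the cdf of f(x) = 2x, so the exact answer is η(p) = √p. Bisection is one simple choice of solver, not the only one.

```python
def cdf(x):
    """Hypothetical example cdf: F(x) = x**2 on [0, 1] (cdf of f(x) = 2x)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x


def percentile(p, lo=0.0, hi=1.0, tol=1e-12):
    """Solve F(eta) = p for eta by bisection.

    Assumes F is continuous and nondecreasing on [lo, hi] with F(lo) <= p <= F(hi).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid  # too little area to the left of mid: move right
        else:
            hi = mid  # enough area to the left of mid: move left
    return (lo + hi) / 2


print(round(percentile(0.25), 6))  # 25th percentile -> 0.5
print(round(percentile(0.5), 6))   # 50th percentile (median) -> 0.707107
```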
Parameters of Cont. RVs: Mean & Median
The median of a continuous distribution, denoted by μ̃, is the 50th percentile, η(0.5). So μ̃ satisfies 0.5 = F(μ̃). That is, half the area under the density curve lies to the left of μ̃. The mean, or expected value, of a continuous r.v. is defined similarly to its discrete counterpart, with the summation replaced by integration:
$\mu_X = E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx$
Often we wish to compute the expected value of some function h(X) of the r.v. X. Although h(X) is itself a new r.v., its expected value can be computed directly from the pdf f(x) of X, without first finding the pdf of h(X):
$\mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty} h(x)\, f(x)\,dx$
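The two integrals above can be evaluated numerically in the same way. The sketch below assumes the hypothetical pdf f(x) = 2x on [0, 1] and uses Simpson's rule. Note that `expect` only ever uses the pdf of X, even when h(x) = x².

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """Hypothetical example pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def expect(h, a=0.0, b=1.0):
    """E[h(X)] = integral of h(x) f(x) dx over the support [a, b] of f."""
    return simpson(lambda x: h(x) * pdf(x), a, b)


mean = expect(lambda x: x)         # E(X) = 2/3
second = expect(lambda x: x ** 2)  # E(X^2) = 1/2, with h(x) = x**2
median = 0.5 ** 0.5                # solves F(m) = m**2 = 0.5
print(round(mean, 6), round(second, 6), round(median, 6))  # -> 0.666667 0.5 0.707107
```

For this skewed pdf the mean (≈0.667) and the median (≈0.707) differ, which is why both parameters are worth reporting.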
Variance of a Continuous rv
The variance and standard deviation of a continuous rv are also defined similarly to their discrete counterparts, with summations replaced by integrals:
$\sigma_X^2 = V(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx = E\left[(X - \mu_X)^2\right]$
$\sigma_X^2 = E(X^2) - [E(X)]^2$
$\sigma_X = \sqrt{\sigma_X^2}$
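The definition and the shortcut formula should give the same variance. The sketch below checks this numerically for the hypothetical pdf f(x) = 2x on [0, 1], where the exact value is 1/2 − (2/3)² = 1/18.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def pdf(x):
    """Hypothetical example pdf: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


mu = simpson(lambda x: x * pdf(x), 0.0, 1.0)                       # E(X) = 2/3
var_def = simpson(lambda x: (x - mu) ** 2 * pdf(x), 0.0, 1.0)      # E[(X - mu)^2]
var_short = simpson(lambda x: x * x * pdf(x), 0.0, 1.0) - mu ** 2  # E(X^2) - [E(X)]^2
sd = math.sqrt(var_def)
print(round(var_def, 6), round(var_short, 6), round(sd, 6))  # -> 0.055556 0.055556 0.235702
```

The shortcut needs only E(X²) and E(X), but it subtracts two close numbers. For very concentrated distributions the definition form can therefore be numerically safer.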
The Normal (Gaussian) Distribution
By far the most important distribution in all of probability and statistics, because it is the most commonly occurring in nature:
• It provides a good model for many, but not all, continuously valued phenomena
• Physical measurements of length, weight, width, etc., measurement errors, exam scores, quality control results, outcomes of medical diagnostic tests, many financial indicators…
• Even if the individual variables of an experiment are not normal, the sum of a large number of independent such variables is approximately normal (Central Limit Theorem, CLT)
• Even if the individual factors affecting an experiment's outcome are not normal, their combined effect on the actual outcome is often approximately normal!
• Entirely determined by just two parameters, μ and σ. Knowing them means knowing everything!
• Well studied and well understood
• Has a nice bell shape to it…!
The Normal Distribution
A continuous r.v. X is said to have a normal distribution with parameters μ and σ (or σ²), where -∞ < μ < ∞ and σ > 0, if the pdf of X is
$f(x; \mu, \sigma) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty$
[Figure: the bell-shaped normal pdf; about 68.3% of the area lies within μ ± σ (a width of 2σ), 95.4% within μ ± 2σ (4σ), and 99.7% within μ ± 3σ (6σ).]
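The pdf formula translates directly into code, and the percentages in the figure can be reproduced from it. The sketch below uses Python's standard `math.erf`. It relies on the fact that P(μ − kσ ≤ X ≤ μ + kσ) = erf(k/√2) for any normal X. The function names are illustrative only.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal pdf f(x; mu, sigma)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coef = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return coef * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X.

    This equals erf(k / sqrt(2)), independent of mu and sigma.
    """
    return math.erf(k / math.sqrt(2.0))


print(round(normal_pdf(0.0), 6))  # peak height 1/sqrt(2*pi) -> 0.398942
for k in (1, 2, 3):
    print(k, round(100 * prob_within(k), 2))  # -> 68.27, 95.45, 99.73 (percent)
```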
Computing Normal Distribution
As with other distributions, to compute the probability that a random variable takes a value in a particular range, we integrate the pdf over that range, i.e., we find the area under the normal curve. For example, if the weights of students in this class are normally distributed (probably true) with a mean of, say, 170 lbs and a standard deviation of 20 lbs, then the probability that a randomly selected student weighs between 185 and 200 lbs is:
$P(185 \le X \le 200) = \int_{185}^{200} f(x)\,dx = \int_{185}^{200} \dfrac{1}{\sqrt{2\pi}\cdot 20}\, e^{-(x-170)^2 / (2\cdot 20^2)}\,dx \approx 0.1598 \approx 16\%$
[Figure: normal curve with the area between 185 and 200 shaded.]
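The integral above has no elementary antiderivative, but a numerical quadrature reproduces the 0.1598 figure directly. The sketch below is one way to do that, using Simpson's rule with the example's μ = 170 and σ = 20.

```python
import math

MU, SIGMA = 170.0, 20.0  # mean and std. dev. of weight in the example (lbs)


def f(x):
    """Normal pdf with the example's parameters."""
    return math.exp(-((x - MU) ** 2) / (2.0 * SIGMA ** 2)) / (math.sqrt(2.0 * math.pi) * SIGMA)


def simpson(g, a, b, n=1000):
    """Approximate the integral of g over [a, b] with composite Simpson's rule."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


p = simpson(f, 185.0, 200.0)  # P(185 <= X <= 200)
print(round(p, 4))  # -> 0.1598
```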
Computing Normal Distributions
The problem, however, is that the function exp(−x²) has no antiderivative expressible in elementary functions, so these integrals cannot be computed analytically! Instead, the integral is computed numerically for a range of values and the results are tabulated. However, tabulating integrals for every possible value of μ and σ is impossible…! We therefore define the standard normal distribution: the normal distribution with parameter values μ = 0 and σ = 1 is called a standard normal distribution. The random variable for this distribution is typically denoted by Z. Its pdf is therefore
$f(z; 0, 1) = \dfrac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad -\infty < z < \infty$
The associated cdf of Z, typically denoted by Φ(z), is
$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy = \int_{-\infty}^{z} \dfrac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy$
We therefore tabulate only the standard normal distribution. Although the standard normal distribution rarely describes data directly, it is very commonly used as a reference distribution. It is straightforward to convert a nonstandard normal distribution to and from the standard one: if X is normal with mean μ and standard deviation σ, then Z = (X − μ)/σ is standard normal, so P(X ≤ x) = Φ((x − μ)/σ).
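In code, Φ(z) is usually obtained from the error function through the identity Φ(z) = (1 + erf(z/√2))/2. Python's `math.erf` is itself a numerical approximation, which is consistent with the point that no closed form exists. The sketch below standardizes X and reproduces the weight example. The helper names are illustrative.

```python
import math


def phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X normal with mean mu and std. dev. sigma, via Z = (X - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return phi((x - mu) / sigma)


# Weight example: P(185 <= X <= 200) with mu = 170, sigma = 20
p = normal_cdf(200, 170, 20) - normal_cdf(185, 170, 20)  # Phi(1.5) - Phi(0.75)
print(round(p, 4))  # -> 0.1598
```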
Using Gaussian Tables
The total area under the curve is 1, and the curve is symmetric about zero, so the area on each side of zero is 0.5.
[Figure: standard normal curve with regions labeled A, B and C; Area B = Φ(0.82).]
Example: if z = 0.82,
A = area under the curve over [0, 0.82] = 0.294
B = total area over (-∞, 0.82] = 0.5 + 0.294 = 0.794 = Φ(0.82)
This value is the probability that Z ≤ 0.82.
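The table lookup in this example can be cross-checked numerically. The area A over [0, z] is Φ(z) − 0.5, and B adds back the 0.5 to the left of zero. The sketch below again computes Φ from Python's `math.erf`. The function names are illustrative, not part of any table convention.

```python
import math


def phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def area_0_to_z(z):
    """Area under the standard normal pdf over [0, z] for z >= 0, i.e. Phi(z) - 0.5."""
    return phi(z) - 0.5


z = 0.82
A = area_0_to_z(z)  # area over [0, 0.82]
B = 0.5 + A         # area over (-inf, 0.82] = Phi(0.82)
print(round(A, 4), round(B, 4))  # -> 0.2939 0.7939
```

Rounded to three decimals these match the table values 0.294 and 0.794 quoted above.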
Using Gaussian Tables
In some books, the standard normal cdf Φ(z) itself is tabulated, rather than the area under the pdf from 0 to z.
P(-0.44