Continuous Random Variables -3 -2 -

 + +2 +3

Lecture 4

© 2010, All Rights Reserved, Robi Polikar. No part of this presentation may be used without explicit written permission. Such permission will be given – upon request – for noncommercial educational purposes only. Limited permission is hereby granted, however, to post or distribute this presentation if you agree to all of the following: 1. you do so for noncommercial educational purposes; 2. the entire presentation is kept together as a whole, including this entire notice. 3. you include the following link/reference on your site: Robi Polikar, http://engineering.rowan.edu/~polikar.

ECE 09.360

Dr. P.’s Clinic Consultant Module in

Probability & Statistics in Engineering

Today in P&S -3 -2 -

 + +2 +3

• Review of Discrete Random Variables
  • Binomial distribution
  • Hypergeometric and negative binomial distributions
  • Poisson distribution
• Continuous Random Variables and Their Probability Distributions
  • Probability density (distribution) function
  • Cumulative distribution function
  • Percentiles, Expected Values & Variances of Continuous Random Variables
• The Normal (Gaussian) Distribution
  • Standard & non-standard normal distribution
  • The normal approximation to the binomial distribution
• Other continuous distributions
  • Gamma, Beta, Exponential, Chi-Squared and Weibull Distributions

© 2010 All Rights Reserved, Robi Polikar, Rowan University

Random Variables -3 -2 -

 + +2 +3

• A random variable X is a function that maps every possible outcome in the sample space S of a random experiment to a real number.

[Figure: a random variable X mapping fish-catch outcomes (Trout, Seabass, Sword, Other) to real numbers x, with their associated probabilities]

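• A cumulative distribution function tells us the probability of X assuming a value x or less: F(x) = P(X ≤ x).

[…]

• A probability density function (pdf) f(x) of a continuous rv X must satisfy two conditions:
1. f(x) ≥ 0 for all values of x.
2. The area of the region between the graph of f and the x-axis is equal to 1.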

y  f ( x)

P   X    



 f x dx  1



Area = 1 © 2010 All Rights Reserved, Robi Polikar, Rowan University

The actual probability is the area under the PDF… I repeat… The Actual Probability is …

P(a  X  b) is the area of the shaded region.

y  f ( x)

a

b © 2010 All Rights Reserved, Robi Polikar, Rowan University
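To make "probability = area" concrete, here is a minimal Python sketch that approximates P(a ≤ X ≤ b) by integrating a pdf numerically with Simpson's rule. The density f(x) = 2x on [0, 1] and the helper names `pdf` and `prob_between` are hypothetical, chosen only for illustration.

```python
def pdf(x):
    """Hypothetical example density (not from the lecture): f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def prob_between(f, a, b, n=10_000):
    """Approximate P(a <= X <= b), the area under f on [a, b], with Simpson's rule.

    n is the number of subintervals and must be even.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights alternate 4, 2, 4, ...
        total += weight * f(a + i * h)
    return total * h / 3


print(prob_between(pdf, 0.5, 1.0))  # exact value: 1 - 0.25 = 0.75
print(prob_between(pdf, 0.0, 1.0))  # total area under a valid pdf: 1
```

The second call is a sanity check of condition 2 above: a legitimate pdf must enclose total area 1.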

Uniform Distribution -3 -2 -

 + +2 +3

• If all outcomes of a random experiment are equally likely (for a continuous r.v., the density is constant over an interval), the random variable is said to have a uniform distribution. More formally:
• A continuous r.v. X is said to have a uniform distribution on the interval [a, b] if the pdf of X is

 1 a xb  f x; a, b    b  a  0 otherwise

f(x) 1 ba

a

b

x © 2010 All Rights Reserved, Robi Polikar, Rowan University
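For the uniform case the area under the pdf is just a rectangle, so no numerical integration is needed: the probability is the overlap length times the height 1/(b - a). A minimal sketch, assuming the hypothetical helper names `uniform_pdf` and `uniform_prob` and an example interval [0, 10]:

```python
def uniform_pdf(x, a, b):
    """pdf of a uniform rv on [a, b]: 1/(b - a) inside the interval, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_prob(c, d, a, b):
    """P(c <= X <= d) for X uniform on [a, b] (hypothetical helper).

    Rectangle area: length of the overlap of [c, d] with [a, b], times 1/(b - a).
    """
    lo, hi = max(c, a), min(d, b)
    return max(hi - lo, 0.0) / (b - a)


# Example: X uniform on [0, 10]
print(uniform_pdf(3.0, 0, 10))        # 0.1
print(uniform_prob(2.0, 5.0, 0, 10))  # 0.3
```

Clipping [c, d] to [a, b] handles ranges that extend past the support, where the pdf is zero.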

Cumulative Distribution Function -3 -2 -

 + +2 +3

• As in the discrete case, the cumulative distribution function (cdf) F(x) of a continuous rv X is defined for every number x by

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$

• Note that for each x, F(x) is the area under the density curve to the left of x.

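[Figure: pdf f(x) with the area to the left of x = 200 shaded, next to the corresponding cdf F(x)]

$$F(200) = P(X \le 200) = \int_{0}^{200} f(x)\,dx$$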

• Conversely, the pdf can be obtained from the cdf by differentiation, at every x where the derivative F'(x) exists:

$$f(x) = F'(x) = \frac{dF(x)}{dx}$$
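Both directions (integrate f to get F, differentiate F to get f) can be checked numerically. This sketch assumes the hypothetical density f(x) = 2x on [0, 1], whose cdf is F(x) = x² there; the midpoint rule and the central difference are generic numerical choices, not methods from the lecture.

```python
def pdf(x):
    """Hypothetical example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def cdf(x, n=2_000):
    """F(x) = integral of f(t) dt from -inf to x, via the midpoint rule.

    This example f is zero below 0, so the integral effectively starts at 0.
    """
    if x <= 0.0:
        return 0.0
    upper = min(x, 1.0)  # f is zero above 1, so F(x) = 1 for x >= 1
    h = upper / n
    return sum(pdf((i + 0.5) * h) for i in range(n)) * h


def pdf_from_cdf(x, h=1e-5):
    """Recover f(x) = F'(x) with a central finite difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


print(cdf(0.5))           # exact F(0.5) = 0.5**2 = 0.25
print(pdf_from_cdf(0.5))  # exact f(0.5) = 1.0
```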

Cumulative Distribution Function -3 -2 -

 + +2 +3

[Figure: pdf f(x) with the area between x = 100 and x = 200 shaded, next to the cdf F(x)]

$$P(100 \le X \le 200) = \int_{100}^{200} f(x)\,dx = F(200) - F(100)$$
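A quick numerical check that integrating the pdf over [100, 200] gives the same number as F(200) - F(100). The figure does not specify the density, so this sketch assumes an exponential pdf with a hypothetical rate λ = 1/150 (`LAM` in the code), whose cdf has the closed form F(x) = 1 - e^(-λx) for x ≥ 0.

```python
import math

LAM = 1 / 150  # hypothetical rate parameter, chosen only for illustration


def pdf(x):
    """Assumed exponential density f(x) = LAM * exp(-LAM * x) for x >= 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0


def cdf(x):
    """Closed-form cdf of the same density: F(x) = 1 - exp(-LAM * x) for x >= 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h


direct = integrate(pdf, 100, 200)  # area under f between 100 and 200
via_cdf = cdf(200) - cdf(100)      # F(200) - F(100)
print(direct, via_cdf)             # the two agree to many decimal places
```

Once F is known, any interval probability costs two cdf evaluations instead of an integral, which is exactly why cdf tables are useful.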

Percentiles in Cont. Distributions

• Percentiles indicate relative standing in ordered data. For example, if we are talking about SAT scores and you are in the 90th percentile, then 90% of all test takers did more poorly than you did, and 10% did better. So the (100p)th percentile η(p), 0 ≤ p ≤ 1, is the value that exceeds 100p% of all scores and is exceeded by 100(1-p)% of all scores. More formally:
• Let p be a number between 0 and 1. The (100p)th percentile of the distribution of a continuous rv X, denoted by η(p), is defined by

$$p = F(\eta(p)) = \int_{-\infty}^{\eta(p)} f(y)\,dy$$

• Thus on a pdf curve, η(p) is the value on the horizontal axis such that 100p% of the area under f(x) lies to the left of η(p) and 100(1-p)% lies to the right!
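Because F is nondecreasing, η(p) can be found numerically by bisection on F(η) = p. A minimal sketch, assuming the hypothetical density f(x) = 2x on [0, 1] (so F(x) = x² and the exact answer is η(p) = √p); `percentile` is an illustrative name, not a library function.

```python
def cdf(x):
    """cdf of the hypothetical example density f(x) = 2x on [0, 1]: F(x) = x**2."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return x * x


def percentile(F, p, lo, hi, tol=1e-12):
    """Solve F(eta) = p for eta by bisection.

    Assumes F is nondecreasing with F(lo) <= p <= F(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid  # too little area to the left of mid: move right
        else:
            hi = mid  # enough area to the left of mid: move left
    return (lo + hi) / 2


print(percentile(cdf, 0.5, 0.0, 1.0))  # median; exact value sqrt(0.5) ≈ 0.70711
print(percentile(cdf, 0.9, 0.0, 1.0))  # 90th percentile; exact sqrt(0.9) ≈ 0.94868
```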

Parameters of Cont. RVs: Mean & Median

• The median of a continuous distribution, denoted by μ̃, is the 50th percentile. So μ̃ satisfies 0.5 = F(μ̃). That is, half the area under the density curve is to the left of μ̃.
• The mean, or expected value, of a continuous r.v. is defined similarly to its discrete counterpart, with the summation replaced by integration:

$$\mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$

• Often we wish to compute the expected value of some function h(X) of the r.v. X. Although h(X) is itself a new r.v., we do not need its pdf: its expected value can be computed directly from the pdf f(x) of X:

$$\mu_{h(X)} = E[h(X)] = \int_{-\infty}^{\infty} h(x) \cdot f(x)\,dx$$
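Both E(X) and E[h(X)] reduce to one integral against f(x), so a single helper covers them. A minimal sketch with the midpoint rule, assuming the hypothetical density f(x) = 2x on [0, 1], for which E(X) = 2/3 and E(X²) = 1/2 exactly; `expect` is an illustrative name.

```python
def pdf(x):
    """Hypothetical example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def expect(h, f, a, b, n=10_000):
    """E[h(X)] = integral of h(x) * f(x) dx, approximated by the midpoint rule
    over [a, b], an interval outside of which f is zero."""
    step = (b - a) / n
    return sum(h(a + (i + 0.5) * step) * f(a + (i + 0.5) * step)
               for i in range(n)) * step


mean = expect(lambda x: x, pdf, 0.0, 1.0)              # h(x) = x, exact: 2/3
second_moment = expect(lambda x: x**2, pdf, 0.0, 1.0)  # h(x) = x^2, exact: 1/2
print(mean, second_moment)
```

Note that the pdf of h(X) never appears: only f(x) is integrated, as the formula above states.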

Variance of a Continuous rv.

• The variance and standard deviation of a continuous rv are also defined similarly to their discrete counterparts, with summations replaced by integrals:



$$\sigma_X^2 = E\left[(X - \mu)^2\right] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2$$

$$\sigma_X = \sqrt{\sigma_X^2}$$
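The defining integral and the shortcut E(X²) - [E(X)]² should give the same variance. A sketch under the same hypothetical density f(x) = 2x on [0, 1], where the exact variance is 1/2 - (2/3)² = 1/18; `integrate` is an illustrative midpoint-rule helper.

```python
import math


def pdf(x):
    """Hypothetical example density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(g, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h


mu = integrate(lambda x: x * pdf(x), 0.0, 1.0)                      # E(X) = 2/3
var_def = integrate(lambda x: (x - mu) ** 2 * pdf(x), 0.0, 1.0)     # definition
var_short = integrate(lambda x: x**2 * pdf(x), 0.0, 1.0) - mu ** 2  # shortcut
sigma = math.sqrt(var_def)
print(var_def, var_short, sigma)  # both variances ≈ 1/18 ≈ 0.05556; sigma ≈ 0.2357
```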

The Normal (Gaussian) Distribution -3 -2 -

 + +2 +3

• By far the most important distribution in all of probability and statistics, because it is so common in nature:
  • It provides a good model for many, but not all, continuously valued phenomena
  • Physical measurements of length, weight, width, etc., measurement errors, exam scores, quality control results, outcomes of medical diagnostic tests, many financial indicators…
  • Even if the individual variables of an experiment are not normal, the sum (or average) of many independent such variables is approximately normal (Central Limit Theorem, CLT)
  • Even if the individual factors affecting an experiment's outcome are not normal, their combined effect, which determines the actual outcome, is often approximately normal!!!
  • Entirely determined by just two parameters, the mean μ and the standard deviation σ. Knowing them means knowing everything!
  • Well studied and well understood
  • Has a nice bell shape to it…!

The Normal Distribution -3 -2 -

 + +2 +3

• A continuous r.v. X is said to have a normal distribution with parameters μ and σ (or σ²), where -∞ < μ < ∞ and σ > 0, if the pdf of X is

1 f x;  ,    e 2 



 x   2 2

f ( x) 

1 e  2

1  x     2  

2



[Figure: bell-shaped normal pdf with axis marks at μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ; about 68.2% of the area lies within μ ± σ (a width of 2σ), 95.4% within μ ± 2σ (4σ), and 99.7% within μ ± 3σ (6σ)]
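The 68.2% / 95.4% / 99.7% figures can be reproduced from the error function: for any normal X, P(μ - kσ ≤ X ≤ μ + kσ) = erf(k/√2). A minimal Python sketch using the standard library's `math.erf`; `normal_pdf` and `prob_within` are illustrative names. The exact values come out to about 68.27%, 95.45% and 99.73%.

```python
import math


def normal_pdf(x, mu, sigma):
    """f(x; mu, sigma) = exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X; equals erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.2f}%")
# within 1 sigma: 68.27%
# within 2 sigma: 95.45%
# within 3 sigma: 99.73%

# Cross-check k = 1 by integrating the pdf directly (midpoint rule), with mu = 0, sigma = 1
n = 10_000
h = 2.0 / n
area = sum(normal_pdf(-1.0 + (i + 0.5) * h, 0.0, 1.0) for i in range(n)) * h
print(round(area, 4))  # 0.6827
```

The result does not depend on μ or σ, which is the first hint that one standardized table can serve every normal distribution.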

Computing Normal Distribution -3 -2 -

 + +2 +3

• As with other continuous distributions, to compute the probability that a normal random variable takes a value in a particular range, we integrate its pdf over that range, i.e., we compute the area under the normal curve.
• For example, if the weights of students in this class are normally distributed (probably roughly true) with a mean of, say, 170 lbs and a standard deviation of 20 lbs, then the probability that a randomly selected student weighs between 185 and 200 lbs is:

$$P(185 \le X \le 200) = \int_{185}^{200} f(x)\,dx = \int_{185}^{200} \frac{1}{\sqrt{2\pi}\cdot 20}\, e^{-(x-170)^2/(2\cdot 20^2)}\,dx \approx 0.1598 \approx 16\%$$
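The 0.1598 above can be reproduced by integrating the pdf numerically, which is exactly what the slide says must be done. A sketch using Simpson's rule (an assumed choice of numerical method) with μ = 170 and σ = 20 from the example; `normal_pdf` and `simpson` are illustrative names.

```python
import math

MU, SIGMA = 170.0, 20.0  # mean and std. dev. of student weight (lbs), as in the example


def normal_pdf(x, mu, sigma):
    """Normal density f(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)


def simpson(f, a, b, n=1_000):
    """Simpson's rule over [a, b] with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3


p = simpson(lambda x: normal_pdf(x, MU, SIGMA), 185.0, 200.0)
print(round(p, 4))  # 0.1598
```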

Computing Normal Distributions -3 -2 -

 + +2 +3

• The problem, however, is that exp(-x²) has no antiderivative in terms of elementary functions, so the integral cannot be computed analytically over an arbitrary range! Instead, the integral is computed numerically for a range of values and the results are tabulated.
• However, tabulating integrals for every possible value of μ and σ is impossible…! Therefore we define the standard normal distribution:
• The normal distribution with parameter values μ = 0 and σ = 1 is called the standard normal distribution. The random variable for this distribution is typically denoted by Z. The pdf is therefore

$$f(z; 0, 1) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \qquad -\infty < z < \infty$$

• The associated cdf of Z, typically denoted by Φ(z), is

$$\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} f(y; 0, 1)\,dy = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy$$

• We therefore tabulate only the standard normal distribution. Although the standard normal distribution does not occur very often in practice, it is very commonly used as a reference distribution: it is straightforward to convert a nonstandard normal distribution to and from the standard one, using Z = (X - μ)/σ.
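In place of a printed table, the standard normal cdf can be evaluated with the error function, Φ(z) = ½[1 + erf(z/√2)], and a nonstandard normal X is handled by standardizing, Z = (X - μ)/σ. A minimal sketch using Python's `math.erf`; `phi` and `normal_cdf` are illustrative names. Note that erf is itself computed numerically, so this is the same idea as a table, just evaluated on demand.

```python
import math


def phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via standardization Z = (X - mu) / sigma."""
    return phi((x - mu) / sigma)


# Weight example: P(185 <= X <= 200) with mu = 170, sigma = 20
p = normal_cdf(200, 170, 20) - normal_cdf(185, 170, 20)  # = Phi(1.5) - Phi(0.75)
print(round(p, 4))  # 0.1598
```

This matches the direct integration of the weight example without ever integrating the N(170, 20²) density itself.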

Using Gaussian Tables -3 -2 -

 + +2 +3

The curve is symmetric about zero and its total area is 1, so the area under the curve on each side of zero is 0.5.

[Figure: standard normal pdf with regions A, B and C marked; area B = Φ(0.82)]

Example: if z = 0.82,
A = area under the curve over [0, 0.82] = 0.294
B = total area over (-∞, 0.82] = 0.5 + 0.294 = 0.794 = Φ(0.82)
This value is the probability that Z ≤ 0.82.
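The table lookup in this example can be cross-checked the same way: area A is Φ(0.82) - 0.5 and area B is Φ(0.82). A short sketch using `math.erf` (`phi` is an illustrative name); to four decimals it gives 0.2939 and 0.7939, which round to the 0.294 and 0.794 above.

```python
import math


def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


z = 0.82
area_a = phi(z) - 0.5  # area from 0 to z (what a 0-to-z table lists)
area_b = phi(z)        # area from -inf to z = 0.5 + area_a
print(round(area_a, 4), round(area_b, 4))  # 0.2939 0.7939
```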

Using Gaussian Tables -3 -2 -

 + +2 +3

• In some books, the standard normal cdf Φ(z) itself is tabulated, rather than the area under the pdf from 0 to z.

P(-0.44