## Probability Distributions for Risk Analysis

Integration Track
SCEA/ISPA Joint International Conference
Thursday, June 9th, 2011

What This Session Is Not

• What probability distributions are
  – CEB 08 Probability and Statistics
    • Tuesday, June 26th, 1045-1215, Orange C
• How to use probability distributions in a simulation
  – CEA 07 Monte Carlo Simulation
    • Wednesday, June 27th, 1530-1700, Orange B
• Distributions for CER risk
  – PT 06 CER Risk and S-Curves
    • Wednesday, June 27th, 1530-1700, Orange A
• Fitting distributions to data
  – CEA 09 Advanced Probability and Statistics
    • Thursday, June 28th, 1330-1500, Orange B

Unit III - Module 10 © 2002-2011 SCEA. All rights reserved.

What This Session Is Not (continued)

• How to do Inputs Risk
  – CEB 07 Basic Cost Risk
    • Wednesday, June 27th, 1030-1200, Orange C
• How to analyze historical program risk data
  – CEA 06 Advanced Cost Risk
    • Thursday, June 28th, 1030-1200, Orange B
• Joint probability distributions
  – INT 04 Joint Cost and Schedule Risk
    • Thursday, June 28th, 1330-1500, Orange A
• Interaction of risk distributions and contract types
  – INT 05 Contracts Risk
    • Thursday, June 28th, 1530-1700, Orange A

Outline

• Probability Prerequisites
  – Algebra and Calculus
• Types of Risk Distributions
• Common Distributions
  – Normal Distribution
  – Lognormal Distribution
  – Triangular Distribution
  – Other Risk Distributions

Calculus for Probability

• Calculus was invented (Newton, Leibniz, et al.) to support mechanics (a branch of physics)
  – In the modern world, and certainly for cost estimators, probability is the primary motivation for the use of calculus
• Differential calculus deals with slopes and rates of change
  – Mechanics: velocity (position), acceleration (velocity)
  – Probability: pdf (cdf), mode
• Integral calculus deals with areas under curves and cumulative values
  – Mechanics: position (velocity), velocity (acceleration)
  – Probability: cdf (pdf), mean, variance, median, percentiles
• Derivatives and integrals are generally inverse operations
  – Fundamental Theorem of Calculus!
• Following slides offer a brief refresher of a few key results needed for probability derivations

Calculus Refresher – Derivatives

• Product Rule
  – Derivative of a product
  – “Derivative of the first, times the second, plus the first times the derivative of the second”

$$\frac{d}{dx}\left[f(x)\,g(x)\right] = \left[\frac{d}{dx}f(x)\right]g(x) + f(x)\left[\frac{d}{dx}g(x)\right] = f'(x)\,g(x) + f(x)\,g'(x)$$

• Chain Rule
  – Derivative of a compound function, with $y = g(x)$
  – “Derivative of the outside, times derivative of the inside”

$$\frac{d}{dx}f(g(x)) = \frac{d}{dx}f(y) = \left[\frac{d}{dy}f(y)\right]\frac{dy}{dx} = f'(y)\left[\frac{d}{dx}g(x)\right] = f'(g(x))\,g'(x)$$

Calculus Refresher - Integrals

• Substitution
  – Technique for evaluating (definite) integrals
  – Replace functions, differentials, and limits consistently
  – Essentially chain rule in reverse

$$y = g(x), \qquad \frac{dy}{dx} = \frac{d}{dx}g(x) = g'(x) \;\Rightarrow\; dy = g'(x)\,dx$$

$$\int_a^b f(g(x))\,g'(x)\,dx = \int_{g(a)}^{g(b)} f(y)\,dy = F(y)\Big|_{g(a)}^{g(b)} = F(g(b)) - F(g(a))$$

Calculus Refresher - Series

• Taylor Series
  – Representation of a function as an infinite series (sum)
  – Centered around a point of interest
  – Converges within a certain radius
  – Handy for computation, including approximations (finite number of terms)
  – In cost estimating, tends to be used for:
    • Quantities near 0, e.g., Coefficient of Variation (CV); or
    • Quantities near 1, e.g., Learning Curve Slope (LCS)
• Some handy Taylor Series
  – Reciprocals (-1 < x < 1)

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \cdots \approx 1 + x$$

  – Log (0 ≤ x < 1)

$$\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots \approx -x$$

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \approx x$$

  – Square root (-1 ≤ x ≤ 1)

$$\sqrt{1+x} = 1 + \frac{x}{2} - \frac{x^2}{8} + \cdots \approx 1 + \frac{x}{2}$$
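As a quick numerical check of the first-order approximations above, the following Python sketch evaluates the error of each truncated series for small x, such as a CV near 0. The function name `first_order_errors` and the test values of x are illustrative and not part of the original session.

```python
import math

def first_order_errors(x):
    """Absolute error of each first-order Taylor approximation at x."""
    return {
        "1/(1-x) ~ 1+x": abs(1 / (1 - x) - (1 + x)),
        "ln(1+x) ~ x": abs(math.log(1 + x) - x),
        "sqrt(1+x) ~ 1+x/2": abs(math.sqrt(1 + x) - (1 + x / 2)),
    }

# Illustrative small values of x (e.g., CVs of 5%, 10%, 20%)
for x in (0.05, 0.10, 0.20):
    print(x, first_order_errors(x))
```

The errors grow roughly with the square of x, which is why these shortcuts are most useful for quantities close to the center of the expansion.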

Probability Notation

• Roman letters are generally used to represent random variables or their associated distribution functions
  – X, Y, Z for variables; x, y, z for realizations of those variables
  – F, G for cumulative distribution functions (cdfs)
    • f, g for corresponding probability density functions (pdfs)
    • Subscripts for random variable in case of ambiguity
• Lowercase Greek letters are generally used to represent parameters of probability distributions
  – µ (mu) = mean, a location parameter
  – σ (sigma) = standard deviation, a scale parameter
  – α (alpha), β (beta), θ (theta), and λ (lambda) are often shape, scale, or rate parameters
• Capital Greek letters are generally used to represent particular special functions or mathematical operations

Types of (Program) Risk Distributions

1. Input distributions
   – Characterize uncertainty of inputs to the cost estimating process, such as cost-driver parameters (weight, power, SLOC, etc.)
2. Intermediate output distributions
   – Characterize uncertainty about estimates for individual cost elements
     • Prediction interval (PI) associated with a cost-estimating relationship (CER)
       – Commonly include t or log t
     • Risk ranges provided by a subject matter expert (SME)
3. Final output distributions
   – Characterize the uncertainty of an overall cost estimate
4. Cross-program risk distributions
   – Characterize range of cost growth factors (CGFs) associated with historical programs

• A classic problem in risk analysis is how to make inferences about #3 from #4

Input Distributions

1. Input distributions
   – Characterize uncertainty of inputs to the cost estimating process, such as cost-driver parameters (weight, power, SLOC, etc.)
   – Commonly include Normal, Lognormal, and Triangular
2. Intermediate output distributions
   – Characterize uncertainty about estimates for individual cost elements
     • Prediction interval (PI) associated with a cost-estimating relationship (CER)
       – Commonly include t or log t
     • Risk ranges provided by a subject matter expert (SME)
       – Commonly include Triangular
3. Final output distributions
   – Characterize the uncertainty of an overall cost estimate
   – Commonly include Normal and Lognormal

Taking the Next Step: Turning OLS CER-Based Estimates into Risk Distributions. C.M. Kanick, E.R. Druker, R.L. Coleman, M.M. Cain, P.J. Braxton, SCEA 2008.

Normality of Work Breakdown Structures, M. Dameron, J. Summerville, R. Coleman, N. St. Louis, Joint ISPA/SCEA Conference, June 2001.

Portfolio Risk Distributions

4. Cross-program risk distributions
   – Characterize range of cost growth factors (CGFs) associated with historical programs
   – Commonly include Lognormal, Triangular, or other skew-right distributions (including heavy-tailed distributions)

• A classic problem in risk analysis is how to make inferences about #3 from #4

Continuous Distributions

Normal Distribution Overview

• Distribution, $x \in (-\infty, \infty)$

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$F(x) = \int_{-\infty}^{x}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$$

[Figure: Normal Distribution pdf, plotted for x from -4 to 4]

• Key Facts
  – If X ~ N(μ, σ), then (X - μ)/σ ~ N(0,1) (“standard” normal)
  – Rule of thumb: the Central Limit Theorem approximation is usually treated as adequate for n ≥ 30
  – 68.3/95.5/99.7 Rule
  – Limiting case of t distribution
  – Exponential of normal is lognormal
  – Excel
    • Dist: NORMDIST(x, mean, stddev, cum), where cum = TRUE for cdf and FALSE for pdf
    • Inv cdf: NORMINV(prob, mean, stddev)
• Parameters and Statistics
  – Mean = μ
  – Variance = σ²
  – Skewness = 0
  – 2.5th percentile = μ - 1.96σ
  – 97.5th percentile = μ + 1.96σ
• Applications
  – Central Limit Theorem
    • Approximation of distributions
  – Regression Analysis
    • Assumed error term
  – Distribution of cost
    • Default distribution
  – Distribution of risk
    • Symmetric risks and uncertainties
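For readers working outside Excel, the Normal facts above can be reproduced with Python's standard-library `statistics.NormalDist`. This is a minimal sketch; the parameter values (mean 100, standard deviation 20) are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 100.0, 20.0  # hypothetical mean and standard deviation
dist = NormalDist(mu, sigma)

pdf_at_mean = dist.pdf(mu)    # analogue of NORMDIST(x, mean, stddev, FALSE)
cdf_at_mean = dist.cdf(mu)    # analogue of NORMDIST(x, mean, stddev, TRUE)
p975 = dist.inv_cdf(0.975)    # analogue of NORMINV(0.975, mean, stddev)

# Probability within one standard deviation of the mean (the "68.3" in the rule)
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

# Standardizing: (x - mu) / sigma is evaluated against N(0, 1)
z = (130.0 - mu) / sigma
same_probability = NormalDist().cdf(z)  # matches dist.cdf(130.0)

print(round(p975, 2), round(within_one_sigma, 3))
```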

t Distribution Overview

• Distribution, $x \in (-\infty, \infty)$

$$p(x) = \frac{\Gamma[(n+1)/2]}{\sqrt{n\pi}\;\Gamma(n/2)\,\left[1 + (x^2/n)\right]^{(n+1)/2}}$$

[Figure: t Distribution pdf, plotted for x from -4 to 4]

• Key Facts
  – As n approaches infinity, the t distribution approaches Normal
  – The t is distributed as

$$t = \frac{N(0,1)}{\sqrt{\chi^2(n)/n}}$$

  – Excel
    • TDIST(x, n, tails) returns a tail probability rather than the cdf: Tails = 1 gives the right-hand tail P(T > x); Tails = 2 also includes the probability in the left-hand tail, giving P(|T| > x)
• Parameters and Statistics
  – Degrees of freedom = n
  – Mean = 0
  – Variance = n/(n - 2), for n > 2
  – Skewness = 0
• Applications
  – Confidence Intervals
    • Mean of Normal variates
  – Regression Analysis
    • Significance of individual coefficients
  – Hypothesis Testing

Lognormal Distribution Overview

• Distribution, $x \in [0, \infty)$

$$p(x) = \frac{1}{x\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$$

[Figure: Lognormal Distribution pdf, plotted for x from 0 to 35]

• Key Facts
  – If X has a lognormal distribution, then ln(X) has a normal distribution
  – For small standard deviations, the normal approximates the lognormal distribution
    • For CVs < 25%, this holds
  – Excel
• Parameters and Statistics
  – Median = $e^{\mu}$
  – Std Deviation of ln X = σ
  – Mean = $e^{\mu + \sigma^2/2}$
  – Variance = $e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$
• Applications
  – Risk Analysis

Triangular Distribution Overview

• Distribution

$$p(x) = \begin{cases} \dfrac{2(x-a)}{(b-a)(c-a)} & a \le x \le c \\[2ex] \dfrac{2(b-x)}{(b-a)(b-c)} & c \le x \le b \end{cases}$$

$$F(x) = \begin{cases} \dfrac{(x-a)^2}{(b-a)(c-a)} & a \le x \le c \\[2ex] 1 - \dfrac{(b-x)^2}{(b-a)(b-c)} & c \le x \le b \end{cases}$$

[Figure: Triangular Distribution pdf, plotted for x from 0 to 120]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
  – A symmetrical triangle approximates a normal when a = μ - σ√6, c = μ, b = μ + σ√6
• Parameters and Statistics
  – Min = a
  – Max = b
  – Mode = c (a ≤ c ≤ b)
  – Mean = (a + b + c)/3
  – Variance = (a² + b² + c² - ab - ac - bc)/18
• Applications
  – Risk Analysis
    • SME Input
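Since the triangular pdf and cdf are to be computed directly from the formulas above, a small Python sketch may help. The function names and the example SME range (10, 20, 60) are hypothetical, and the sketch assumes a < c < b so that neither denominator is zero.

```python
def tri_pdf(x, a, c, b):
    """pdf of T(a, c, b); assumes a < c < b."""
    if x < a or x > b:
        return 0.0
    if x <= c:
        return 2 * (x - a) / ((b - a) * (c - a))
    return 2 * (b - x) / ((b - a) * (b - c))

def tri_cdf(x, a, c, b):
    """cdf of T(a, c, b); assumes a < c < b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= c:
        return (x - a) ** 2 / ((b - a) * (c - a))
    return 1 - (b - x) ** 2 / ((b - a) * (b - c))

def tri_mean(a, c, b):
    return (a + b + c) / 3

def tri_variance(a, c, b):
    return (a * a + b * b + c * c - a * b - a * c - b * c) / 18

# Hypothetical SME input: low 10, most likely 20, high 60
a, c, b = 10.0, 20.0, 60.0
print(tri_pdf(c, a, c, b), tri_cdf(c, a, c, b), tri_mean(a, c, b), tri_variance(a, c, b))
```

Note that the cdf at the mode equals (c - a)/(b - a), so a right-skewed triangle has less than half of its probability below the most likely value.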

Uniform Distribution Overview

• Distribution

$$p(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\[2ex] 0 & x < a \text{ or } x > b \end{cases}$$

$$F(x) = \begin{cases} 0 & x < a \\[1ex] \dfrac{x-a}{b-a} & a \le x < b \\[2ex] 1 & x \ge b \end{cases}$$

[Figure: Uniform Distribution pdf, plotted for x from 0 to 20]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
• Parameters and Statistics
  – Min = a
  – Max = b
  – Mode = any value in [a, b]
  – Mean = (a + b)/2
  – Variance = (b - a)²/12
• Applications
  – Risk Analysis
    • SME Input
  – Sampling from arbitrary distributions
    • Example: Rejection Sampling

Beta Distribution Overview

• Distribution

$$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,x^{\alpha-1}(1-x)^{\beta-1}$$

$$F(x) = I_x(\alpha, \beta)$$

[Figure: Beta Distribution pdf, plotted for x from 0 to 1.2]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
• Parameters and Statistics
  – Min = 0
  – Max = 1
  – Mode = (α - 1)/(α + β - 2), for α > 1, β > 1
  – Mean = α/(α + β)
  – Variance = αβ / [(α + β)²(α + β + 1)]
• Applications
  – Risk Analysis
  – Order Statistics
  – Rule of Succession
  – Bayesian Inference
  – Task Duration

Gamma Distribution Overview

• Distribution

$$p(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x}$$

$$F(x) = \frac{\gamma(\alpha, \beta x)}{\Gamma(\alpha)}$$

[Figure: Gamma Distribution pdf, plotted for x from 0 to 120]

• Key Facts
  – Excel
    • GAMMADIST, GAMMAINV, and GAMMALN [natural log of gamma]
  – It is the conjugate prior for the precision (i.e., inverse of the variance) of a normal distribution
• Parameters and Statistics
  – Min = 0
  – Max = ∞
  – Mode = (α - 1)/β, for α ≥ 1
  – Mean = α/β
  – Variance = α/β²
• Applications
  – Risk Analysis
    • Due to shape and scale parameters
  – Modeling
    • Size of insurance claims and rainfall

Weibull Distribution Overview

• Distribution

$$f(x) = \begin{cases} \dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k} & x \ge 0 \\[2ex] 0 & x < 0 \end{cases}$$

$$F(x) = 1 - e^{-(x/\lambda)^k}, \quad x \ge 0$$

[Figure: Weibull Distribution pdf, plotted for x from 0 to 12]

• Key Facts
  – Excel
    • WEIBULL function where α = k and β = λ, as shown above
  – Interpolates between the exponential distribution with intensity 1/λ (when k = 1) and a Rayleigh distribution (when k = 2)
• Parameters and Statistics
  – Min = 0
  – Max = ∞
  – Mode = $\lambda\left(\frac{k-1}{k}\right)^{1/k}$ for k > 1, and 0 for k ≤ 1
  – Mean = $\lambda\,\Gamma\left(1 + \frac{1}{k}\right)$
  – Variance = $\lambda^2\,\Gamma\left(1 + \frac{2}{k}\right) - \mu^2$, where μ is the mean
• Applications
  – Risk Analysis
  – Reliability Engineering
  – Failure Analysis
  – Delivery Times
  – RF Dispersion

Exponential Distribution Overview

• Distribution

$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

$$F(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

[Figure: Exponential Distribution pdf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • EXPONDIST
  – Exponential distribution exhibits infinite divisibility
  – Memoryless

$$\Pr(T > s + t \mid T > s) = \Pr(T > t) \quad \text{for all } s, t \ge 0$$

• Parameters and Statistics
  – Min = 0
  – Max = ∞
  – Mode = 0
  – Mean = 1/λ
  – Variance = 1/λ²
• Applications
  – Risk Analysis
  – Inter-arrival times (Poisson)
  – Time between events
  – Reliability engineering
  – Hazard Rate
  – Bathtub Curve

Pareto Distribution Overview

• Distribution

$$f(x) = \begin{cases} \dfrac{\alpha\,x_m^{\alpha}}{x^{\alpha+1}} & x \ge x_m \\[2ex] 0 & x < x_m \end{cases}$$

$$F(x) = \begin{cases} 1 - \left(\dfrac{x_m}{x}\right)^{\alpha} & x \ge x_m \\[2ex] 0 & x < x_m \end{cases}$$

[Figure: Pareto Distribution pdf, plotted for x from 0 to 120]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
  – The Pareto distribution and lognormal distribution are alternative distributions for describing the same types of quantities
• Parameters and Statistics
  – Min = $x_m$
  – Max = +∞
  – Mode = $x_m$
  – Mean = $\dfrac{\alpha\,x_m}{\alpha - 1}$, for α > 1
  – Variance = $\dfrac{x_m^2\,\alpha}{(\alpha-1)^2(\alpha-2)}$, for α > 2
• Applications
  – Risk Analysis
  – Population sizes
  – File size and Internet traffic
  – Hard disk error rates
  – Distribution of Income

Discrete Distributions

• Bernoulli
• Binomial
• Other Discrete

Bernoulli Distribution Overview

• Distribution

$$p(x) = \begin{cases} q = 1 - p & x = 0 \\ p & x = 1 \end{cases}$$

$$F(x) = \begin{cases} 0 & x < 0 \\ q & 0 \le x < 1 \\ 1 & x \ge 1 \end{cases}$$

[Figure: Bernoulli pmf, with probability q at x = 0 and p at x = 1]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
  – The sum of n Bernoullis is Binomial(n, p)
• Parameters and Statistics
  – Min = 0
  – Max = 1
  – Mean = p
  – Variance = pq
• Applications
  – Risk Analysis
    • Discrete Risks
    • X = Cf * Bernoulli
    • p = Pf
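The discrete-risk application above (X = Cf * Bernoulli with p = Pf) can be sketched in Python as follows. Reading Cf as the cost consequence if the risk occurs and Pf as its probability of occurrence is an assumption, as are the function names and the numeric values.

```python
import random

def discrete_risk_stats(pf, cf):
    """Mean and variance of X = cf * Bernoulli(pf)."""
    mean = pf * cf
    variance = cf ** 2 * pf * (1 - pf)
    return mean, variance

def sample_discrete_risk(pf, cf, rng):
    """Draw one realization: cf with probability pf, otherwise 0."""
    return cf if rng.random() < pf else 0.0

# Hypothetical risk: 30% chance of a cost impact of 50
pf, cf = 0.3, 50.0
mean, variance = discrete_risk_stats(pf, cf)

rng = random.Random(2011)  # fixed seed so the run is repeatable
draws = [sample_discrete_risk(pf, cf, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(mean, variance, sample_mean)
```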

Binomial Distribution Overview

• Distribution

$$p(k) = \binom{n}{k}\,p^k(1-p)^{n-k}$$

$$F(x) = \sum_{i=0}^{\lfloor x \rfloor}\binom{n}{i}\,p^i(1-p)^{n-i}$$

[Figure: Binomial Distribution pmf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • BINOMDIST
  – The number of “successes” in a sequence of n independent experiments
  – Beta Distribution is the conjugate prior
• Parameters and Statistics
  – Min = 0
  – Max = n
  – Mean = np
  – Variance = np(1-p)
• Applications
  – Risk Analysis
  – Models for repeated processes or experiments
  – May be used for batch processing failures

Negative Binomial Distribution Overview

• Distribution

$$p(k) = \binom{k+r-1}{k}(1-p)^r\,p^k$$

$$\Pr(X \le k) = 1 - I_p(k+1, r)$$

[Figure: Negative Binomial Distribution pmf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • NEGBINOMDIST
• Parameters and Statistics
  – Min = 0
  – Max = ∞
  – Mean = pr/(1 - p)
  – Variance = pr/(1 - p)²
• Applications
  – Risk Analysis
    • Discrete Risks
  – Process Analysis

Poisson Distribution Overview

• Distribution

$$p(k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

$$F(k) = e^{-\lambda}\sum_{i=0}^{k}\frac{\lambda^i}{i!}$$

[Figure: Poisson Distribution pmf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • POISSON
  – Used by Ladislaus Bortkiewicz (1898) to investigate the number of Prussian soldiers killed by horse kicks
  – Law of Small Numbers or Law of Rare Events
• Parameters and Statistics
  – Min = 0
  – Max = ∞
  – Mean = λ
  – Variance = λ
• Applications
  – Risk Analysis
  – Reliability Engineering
  – Customer Service
  – Civil Engineering
  – Astronomy

Geometric Distribution Overview

• Distribution

$$p(k) = (1-p)^{k-1}\,p$$

$$F(k) = 1 - (1-p)^k$$

[Figure: Geometric Distribution pmf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
  – Continuous analogue is the exponential distribution
• Parameters and Statistics
  – Min = 1
  – Max = ∞
  – Mean = 1/p
  – Variance = (1 - p)/p²
• Applications
  – Risk Analysis
    • Discrete Risks

Discrete Uniform Distribution Overview

• Distribution

$$p(k) = \begin{cases} \dfrac{1}{b-a+1} & a \le k \le b \\[2ex] 0 & \text{otherwise} \end{cases}$$

$$F(k) = \begin{cases} 0 & k < a \\[1ex] \dfrac{k-a+1}{b-a+1} & a \le k \le b \\[2ex] 1 & k > b \end{cases}$$

[Figure: Discrete Uniform Distribution pmf, plotted for x from 0 to 60]

• Key Facts
  – Excel
    • pdf and cdf calculations can be handled as they are listed in the above formulas
• Parameters and Statistics
  – Min = a
  – Max = b
  – Mean = (a + b)/2
  – Variance = [(b - a + 1)² - 1]/12
• Applications
  – Risk Analysis
    • Discrete Risks
    • Quantity selections
  – German Tank Problem

Befriending Your Distributions

• Normal Demonstration
• Triangular Derivations
• Lognormal Derivations and Demonstration

Befriending Your Distributions

• Probability distributions are to risk analysts what words are to poets
• Look at distributions in multiple ways:
  – Graphical – PDF and CDF graphs
  – Numerical – Excel functions
  – Algebraic – formulae, parameters
• Build a “toy problem” and play with it
• Understand properties of distributions, when they “arise,” how they are related to other distributions

Excel 2010 Distributions

• New and improved format for statistical functions in Excel
  – More consistent syntax
  – Excel 2007 versions maintained for backwards compatibility
• Same set of distributions
  – Continuous: BETA, CHISQ, EXPON, F, GAMMA, GAMMALN, LOGNORM, NORM, T, WEIBULL
  – Discrete: BINOM, HYPGEOM, NEGBINOM, POISSON
• Suffixes denote different variants
  – .DIST = cdf (and sometimes pdf)
  – .S = standard (normal only)
  – .INV = inverse cdf
  – .2T = two-tail
  – .RT = right tail (left tail is default)
• Examples:
  – =NORM.S.INV(RAND()) will generate a standard normal random value
  – =T.INV.2T(0.05,30) will give the (positive) critical value for a two-tailed t-test at alpha = 0.05, 30 degrees of freedom

Triangular Distribution – PDF and Mean

• For Triangle(L, M, H), denote L = a, H = b, M = c by T(a, c, b)
• Since the area of the triangle must be 1 (100%), the height is twice the reciprocal of the base, 2/(b - a)
  – We can then derive the PDF by using similar triangles

$$p(x) = \begin{cases} \dfrac{2}{b-a}\cdot\dfrac{x-a}{c-a} & a \le x \le c \\[2ex] \dfrac{2}{b-a}\cdot\dfrac{b-x}{b-c} & c \le x \le b \end{cases}$$

$$E[X] = \int_a^b x\,p(x)\,dx = \int_a^c \frac{2x}{b-a}\cdot\frac{x-a}{c-a}\,dx + \int_c^b \frac{2x}{b-a}\cdot\frac{b-x}{b-c}\,dx$$

$$= \frac{2}{b-a}\left\{\left[\frac{\tfrac{1}{3}x^3 - \tfrac{1}{2}ax^2}{c-a}\right]_a^c + \left[\frac{\tfrac{1}{2}bx^2 - \tfrac{1}{3}x^3}{b-c}\right]_c^b\right\}$$

$$= \frac{2}{b-a}\left\{\frac{(c-a)(2c+a)}{6} + \frac{(b-c)(b+2c)}{6}\right\}$$

$$= \frac{1}{3}\cdot\frac{bc - ac + b^2 - a^2}{b-a} = \frac{a+b+c}{3}$$

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.

Triangular Distribution – Variance

$$\sigma^2 = E\left[(X-\mu)^2\right] = E\left[X^2\right] - \mu^2$$

$$E\left[X^2\right] = \int_a^b x^2 p(x)\,dx = \int_a^c \frac{2x^2}{b-a}\cdot\frac{x-a}{c-a}\,dx + \int_c^b \frac{2x^2}{b-a}\cdot\frac{b-x}{b-c}\,dx$$

$$= \frac{2}{b-a}\left\{\left[\frac{\tfrac{1}{4}x^4 - \tfrac{1}{3}ax^3}{c-a}\right]_a^c + \left[\frac{\tfrac{1}{3}bx^3 - \tfrac{1}{4}x^4}{b-c}\right]_c^b\right\}$$

$$= \frac{2}{b-a}\left\{\frac{(c-a)(3c^2 + 2ac + a^2)}{12} + \frac{(b-c)(b^2 + 2bc + 3c^2)}{12}\right\} = \frac{a^2 + b^2 + c^2 + ab + ac + bc}{6}$$

$$\mu^2 = \left(\frac{a+b+c}{3}\right)^2 = \frac{a^2 + b^2 + c^2 + 2ab + 2ac + 2bc}{9}$$

$$\sigma^2 = \frac{3a^2 + 3b^2 + 3c^2 + 3ab + 3ac + 3bc}{18} - \frac{2a^2 + 2b^2 + 2c^2 + 4ab + 4ac + 4bc}{18} = \frac{a^2 + b^2 + c^2 - ab - ac - bc}{18}$$

• Square of the base minus product of the half-bases!

$$\sigma^2 = \frac{b^2 - 2ab + a^2 + c^2 + ab - ac - bc}{18} = \frac{(b-a)^2 - (c-a)(b-c)}{18}$$

• Sum of squares of half-bases and product of half-bases!

$$\sigma^2 = \frac{c^2 - 2ac + a^2 + b^2 - 2bc + c^2 + bc - ab - c^2 + ac}{18} = \frac{(c-a)^2 + (b-c)^2 + (c-a)(b-c)}{18}$$

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.

Substituting a Triangular for a Normal: The √6 Factor

• For a symmetric triangle, let ML = m, L = m-w, H = m+w, where w is the half-base
  – Then the mean is m, and the variance is w²/6
• It follows that the half-base is greater than the standard deviation by a factor of √6
• To approximate a normal, N(μ, σ), the factor of √6 is multiplied by the standard deviation of the normal to be emulated to produce the half-base
  – By this means, end points are found that will produce a triangular distribution that emulates the underlying normal in mean and standard deviation
  – This triangular distribution, Tri(μ - √6σ, μ, μ + √6σ), differs from the underlying normal in all other moments, and at all percentiles other than the median and two “cross-over” points, but the difference is minor

[Figure: Tri and Norm curves compared, plotted for x from -3.0 to 3.0 and y from 0.0 to 1.0]

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.
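The claim that the √6 triangle tracks the normal closely can be checked numerically. The sketch below compares the cdf of Tri(μ - √6σ, μ, μ + √6σ) with the cdf of N(μ, σ) on a grid; the function name, the grid, and the choice of a standard normal are illustrative assumptions.

```python
import math
from statistics import NormalDist

def symmetric_tri_cdf(x, m, w):
    """cdf of the symmetric triangle T(m - w, m, m + w) with half-base w."""
    if x <= m - w:
        return 0.0
    if x >= m + w:
        return 1.0
    if x <= m:
        return (x - (m - w)) ** 2 / (2 * w * w)
    return 1 - ((m + w) - x) ** 2 / (2 * w * w)

mu, sigma = 0.0, 1.0          # normal to be emulated
w = math.sqrt(6) * sigma      # half-base from the sqrt(6) factor
tri_variance = w ** 2 / 6     # variance of the symmetric triangle

normal = NormalDist(mu, sigma)
grid = [mu + sigma * i / 100 for i in range(-400, 401)]
max_gap = max(abs(symmetric_tri_cdf(x, mu, w) - normal.cdf(x)) for x in grid)
print(tri_variance, round(max_gap, 4))
```

Under these assumptions the largest cdf gap on the grid is under two percentage points, consistent with the statement that the difference is minor.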

Derivation of Lognormal pdf

• Relationship of Lognormal (X) and corresponding Normal (Y)
  – Both parametrized in terms of mean and standard deviation of the related normal

$$X = e^Y \sim \mathrm{LogN}(\mu, \sigma) \;\Leftrightarrow\; Y = \ln X \sim N(\mu, \sigma)$$

• Lognormal cdf in terms of related Normal cdf

$$F_X(x) = P(X \le x) = P(Y \le \ln x) = F_Y(\ln x)$$

• Lognormal pdf is derivative of cdf

$$f_X(x) = F_X'(x) = \frac{d}{dx}F_X(x) = \frac{d}{dx}F_Y(\ln x)$$

  – Apply chain rule

$$f_X(x) = \left[\frac{d}{dy}F_Y\right](\ln x)\cdot\frac{d}{dx}\ln x = f_Y(\ln x)\cdot\frac{1}{x} = \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}$$

Derivation of Lognormal Median

• Definition of median

$$\int_0^m f_X(x)\,dx = \frac{1}{2}$$

• Lognormal pdf (see previous slide)

$$\int_0^m \frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\,dx = \frac{1}{2}$$

• Substitution, with $y = \ln x \Rightarrow dy = \frac{1}{x}\,dx$

$$\int_{-\infty}^{\ln m} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,dy = \frac{1}{2}$$

• This is just the pdf of the related normal, whose median is μ!

$$\ln m = \mu \;\Rightarrow\; m = e^{\mu}$$

• Median is preserved by transformation

Derivation of Lognormal Mode

• Definition of mode, apply product rule and chain rule!

$$f_X'(x) = \frac{d}{dx}\left[\frac{1}{x\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\right] = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\left[-\frac{1}{x^2} - \frac{\ln x - \mu}{\sigma^2 x^2}\right] = 0$$

• First term is strictly positive, get common denominator

$$-\frac{\sigma^2 + \ln x - \mu}{\sigma^2 x^2} = 0 \;\Rightarrow\; \ln x = \mu - \sigma^2 \;\Rightarrow\; x = e^{\mu - \sigma^2} = \frac{e^{\mu}}{1 + CV^2}$$

  – Note mode shift (left)
  – See later CV derivation
• pdf is concave down at peak
  – We’ll spare you the derivative!

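Derivation of Lognormal Mean

• Definition of expected value, lognormal pdf (see earlier slide), substitution $x = e^y \Rightarrow dx = e^y\,dy$

$$E[X] = \int_0^{\infty} x f_X(x)\,dx = \int_0^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\,dx = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,e^{y}\,dy$$

• Completing the square!

$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2 - 2\mu y + \mu^2 - 2\sigma^2 y}{2\sigma^2}}\,dy = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{\left[y - (\mu + \sigma^2)\right]^2 - 2\mu\sigma^2 - \sigma^4}{2\sigma^2}}\,dy$$

• Normal pdf, area under the curve = 1!

$$= e^{\mu + \frac{\sigma^2}{2}}\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{\left[y - (\mu + \sigma^2)\right]^2}{2\sigma^2}}\,dy = e^{\mu + \frac{\sigma^2}{2}}$$

• Note mean shift (right); see later CV derivation

$$e^{\mu + \frac{\sigma^2}{2}} = e^{\mu}\sqrt{1 + CV^2} \approx e^{\mu}\left(1 + \frac{CV^2}{2}\right)$$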

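Derivation of Lognormal Variance

• Same approach, substitution $x = e^y \Rightarrow dx = e^y\,dy$

$$E\left[X^2\right] = \int_0^{\infty} x^2 f_X(x)\,dx = \int_0^{\infty}\frac{x}{\sigma\sqrt{2\pi}}\,e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}}\,dx = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(y - \mu)^2}{2\sigma^2}}\,e^{2y}\,dy$$

$$= \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{y^2 - 2\mu y + \mu^2 - 4\sigma^2 y}{2\sigma^2}}\,dy = \int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{\left[y - (\mu + 2\sigma^2)\right]^2 - 4\mu\sigma^2 - 4\sigma^4}{2\sigma^2}}\,dy$$

$$= e^{2\mu + 2\sigma^2}\int_{-\infty}^{\infty}\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{\left[y - (\mu + 2\sigma^2)\right]^2}{2\sigma^2}}\,dy = e^{2\mu + 2\sigma^2}$$

• Recall “easy-to-compute” variance

$$Var[X] = E\left[X^2\right] - E[X]^2 = e^{2\mu + 2\sigma^2} - e^{2\mu + \sigma^2} = e^{2\mu + \sigma^2}\left(e^{\sigma^2} - 1\right)$$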

Derivation of Lognormal CV

• Standard deviation is square root of variance

$$\sqrt{Var[X]} = e^{\mu + \frac{\sigma^2}{2}}\sqrt{e^{\sigma^2} - 1}$$

• First factor is the mean!

$$CV = \sqrt{e^{\sigma^2} - 1}$$

• Note that the CV is entirely a function of the variance (not CV) of the related normal
  – Pythagorean relationship

$$1 + CV^2 = e^{\sigma^2}$$

  – Normal standard deviation as function of lognormal CV

$$\sigma^2 = \ln\left(1 + CV^2\right) \;\Rightarrow\; \sigma = \sqrt{\ln\left(1 + CV^2\right)} \approx CV$$

| std dev (norm) | CV (lognorm) |
|---|---|
| 0.100 | 10% |
| 0.198 | 20% |
| 0.294 | 30% |
| 0.385 | 40% |
| 0.472 | 50% |
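The table of related-normal standard deviations can be regenerated from the Pythagorean relationship. A minimal Python sketch follows; the function names are illustrative.

```python
import math

def related_normal_sigma(cv):
    """Standard deviation of ln X for a lognormal X with the given CV."""
    return math.sqrt(math.log(1 + cv ** 2))

def lognormal_cv(sigma):
    """CV of a lognormal whose related normal has standard deviation sigma."""
    return math.sqrt(math.exp(sigma ** 2) - 1)

for cv in (0.10, 0.20, 0.30, 0.40, 0.50):
    print(f"CV = {cv:.0%}  sigma = {related_normal_sigma(cv):.3f}")
```

The gap between σ and CV widens as CV grows, which is the numerical face of the approximation σ ≈ CV holding only for small CVs.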

Derivation of Related Normal

• Manipulate first and second moment to solve for normal parameters
• Variance of related normal as function of variance and mean of lognormal
  – Agrees with previous CV result

$$\frac{E\left[X^2\right]}{E[X]^2} = \frac{Var[X] + E[X]^2}{E[X]^2} = \frac{e^{2\mu + 2\sigma^2}}{e^{2\mu + \sigma^2}} = e^{\sigma^2} \;\Rightarrow\; \sigma^2 = \ln\left(1 + \frac{Var[X]}{E[X]^2}\right)$$

• Mean of related normal as function of variance and mean of lognormal
  – Divide top and bottom by mean of lognormal

$$\frac{E[X]^2}{\sqrt{E\left[X^2\right]}} = \frac{e^{2\mu + \sigma^2}}{e^{\mu + \sigma^2}} = e^{\mu} \;\Rightarrow\; \mu = \ln\left(\frac{E[X]^2}{\sqrt{Var[X] + E[X]^2}}\right) = \ln\left(\frac{E[X]}{\sqrt{1 + \frac{Var[X]}{E[X]^2}}}\right) = \ln E[X] - \frac{1}{2}\ln\left(1 + \frac{Var[X]}{E[X]^2}\right)$$

  – Note mean shift (left)

Related Normal Example

• Mean = 100, CV = 20%
  – Mode shift = -3.8% (relative to median)
  – Mean shift = +2.0% (relative to median)
• Lognormal: Mode = 94.3, Median = 98.0, Mean = 100.0
• Related Normal: Mean = ln(98.0) = 4.6, Std dev = 0.2

| CV (lognorm) | mode shift factor | mean shift factor | percentile of mean |
|---|---|---|---|
| 10% | 0.990 | 1.005 | 52.0% |
| 20% | 0.962 | 1.020 | 53.9% |
| 30% | 0.917 | 1.044 | 55.8% |
| 40% | 0.862 | 1.077 | 57.6% |
| 50% | 0.800 | 1.118 | 59.3% |

[Figure: Lognormal and Normal pdfs, plotted for x from -50 to 300 with Mode, Median, and Mean marked, alongside the Related Normal pdf, plotted for x from 0 to 10]
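The example values can be reproduced from the derivations with a few lines of Python. This is a sketch under the stated inputs (mean 100, CV 20%); the function name `related_normal` is illustrative.

```python
import math
from statistics import NormalDist

def related_normal(mean, cv):
    """Return (mu, sigma) of ln X for a lognormal X with the given mean and CV."""
    sigma = math.sqrt(math.log(1 + cv ** 2))
    mu = math.log(mean) - sigma ** 2 / 2
    return mu, sigma

mean, cv = 100.0, 0.20
mu, sigma = related_normal(mean, cv)

median = math.exp(mu)               # e^mu
mode = math.exp(mu - sigma ** 2)    # e^(mu - sigma^2)
# Percentile at which the lognormal mean falls, via the related normal cdf
percentile_of_mean = NormalDist(mu, sigma).cdf(math.log(mean))

print(round(mu, 3), round(sigma, 3), round(median, 1), round(mode, 1),
      round(percentile_of_mean, 3))
```

The shift factors in the table follow directly: mode/median = 1/(1 + CV²) and mean/median = √(1 + CV²).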

Selecting From a Distribution

• Random Number Generation
• Inverse CDF Technique

The Inverse CDF Technique

• Monte Carlo simulation requires a means of generating values from probability distributions
  – Every uncertain event in a simulation must be assigned a value drawn from some sort of probability distribution
  – The Inverse CDF Technique is the easiest and most common
• The CDF of a distribution maps a value (x) to the probability that the random variable takes on a value less than or equal to x
  – Therefore, the inverse of the CDF maps a probability (between 0 and 1) to a value from the distribution
  – By generating a Uniform(0,1) random number, we can produce a value from any distribution with an invertible CDF
• Every simulation must contain some sort of Uniform(0,1) random number generator
  – In Excel this function is “=RAND()”, although its “randomness” is not sufficient
  – There is an entire area of computer science dedicated to the production of the “most random” numbers (of particular interest to cryptographers)

Inverse CDF Normal Example

• The RAND() function returns a value of 0.0639
• This is plugged into the inverse CDF
• The value -45.677 is returned
  – 1.523 standard deviations below the mean
• This is at the appropriate percentile of the distribution

[Figure: Normal cdf (top) and pdf (bottom), plotted for x from -90 to 90]
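The inverse CDF technique can be sketched in Python as follows. The normal parameters (mean 0, standard deviation 30) are an assumption inferred from the example's plot range and from -45.677 lying 1.523 standard deviations below the mean; the exponential rate and function names are hypothetical, included to show the same technique on a cdf that inverts in closed form.

```python
import math
import random
from statistics import NormalDist

def sample_normal(u, mean, stddev):
    """Map a Uniform(0,1) value u to a normal value via the inverse cdf."""
    return NormalDist(mean, stddev).inv_cdf(u)

def sample_exponential(u, rate):
    """Invert F(x) = 1 - exp(-rate * x) to get x = -ln(1 - u) / rate."""
    return -math.log(1 - u) / rate

# The slide's uniform draw, under the assumed N(0, 30)
u = 0.0639
x = sample_normal(u, 0.0, 30.0)
z = x / 30.0  # number of standard deviations from the mean

# Repeated draws: uniform random numbers in, exponential values out
rng = random.Random(45)  # fixed seed so the run is repeatable
rate = 0.5               # hypothetical rate, so the mean is 1 / rate = 2
draws = [sample_exponential(rng.random(), rate) for _ in range(10_000)]
sample_mean = sum(draws) / len(draws)
print(round(x, 2), round(z, 3), round(sample_mean, 2))
```

The small difference from -45.677 in the normal result arises because the uniform draw is shown rounded to four decimal places.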

Applications of the Triangular Distribution

• Subject Matter Expert (SME) Understatement of Uncertainty
• Conflating SME Inputs

The Geometry of Symmetric Triangles

• For a symmetric Triangle(L, M, H), where M-L = H-M
• Find points l and h such that l and h are the pth and (1-p)th percentiles
  – If l-L = 1/2*(M-L), H-h = 1/2*(H-M), then p = 1/(2*2²) = 1/8 = 12.5%
  – If l-L = 1/3*(M-L), H-h = 1/3*(H-M), then p = 1/(2*3²) = 1/18 = 5.6%
  – pth percentile → √(p/2) base fraction → √(2p) half-base fraction
  – So, the 20th percentile (p = 1/5) occurs at point √(1/10) = 0.3162 base fraction
• These two “tiled pictures” show two relationships of a fraction of the base to a fraction of the area, showing the above equations in a graphical way

[Figure: two tiled symmetric triangles with L, l, M, h, and H marked on the base, for the 1/2 and 1/3 half-base cases]

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.
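The percentile-to-base-fraction relationships above translate directly into code. In this Python sketch the function names are illustrative, and p is assumed to be a lower-tail probability no greater than 0.5.

```python
import math

def base_fraction(p):
    """Distance of the pth percentile from L, as a fraction of the full base."""
    return math.sqrt(p / 2)

def half_base_fraction(p):
    """Distance of the pth percentile from L, as a fraction of the half-base."""
    return math.sqrt(2 * p)

def tail_probability(half_base_frac):
    """Area to the left of a point at the given half-base fraction from L."""
    return half_base_frac ** 2 / 2

print(tail_probability(1 / 2), tail_probability(1 / 3))
print(round(base_fraction(0.2), 4), round(half_base_fraction(0.2), 4))
```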

Triangles With Related Areas

• We wish to know how to draw triangular distributions that are related to one another
• Constant area:
  – Used in expansion of experts (correcting understated variance)
  – For area to remain constant, in this case A = 1, as the base increases by a factor, the height must be multiplied by the reciprocal of that factor

$$A = \frac{1}{2}bh = \frac{1}{2}(bk)\left(\frac{h}{k}\right)$$

• Reduction in area:
  – For area to be reduced by a factor, the dimensions of a similar triangle must be reduced by the square root of that factor

$$A_2 = \frac{1}{k}A_1 = \frac{1}{2k}b_1h_1 = \frac{1}{2}\left(\frac{b_1}{\sqrt{k}}\right)\left(\frac{h_1}{\sqrt{k}}\right)$$

  – For area to be reduced by a factor, the height must be reduced by that factor if the base is to remain constant
    • Used in sampling of experts

$$A_2 = \frac{1}{k}A_1 = \frac{1}{2k}b_1h_1 = \frac{1}{2}\,b_1\left(\frac{h_1}{k}\right)$$

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.

Correction of Understated Variance for Triangles

• For symmetric triangles
  – To expand from 20-80 to Min-Max, multiply by 2.72 = 1/0.368
    • √(1/10) = 0.3162 base fraction
    • √(2/5) = 0.6325 half-base fraction
  – To expand from plus-or-minus-one-sigma to Min-Max, multiply by 2.45 (√6)
    • (√6-1)/(2√6) = 0.2959 base fraction
    • (√6-1)/√6 = 0.5918 half-base fraction
    • Compare with 68.3% within one sigma rule of thumb for Normal distribution

[Figure: symmetric triangles split at the 20th and 80th percentiles (20% / 60% / 20% of area) and at plus-or-minus one sigma (17.5% / 65% / 17.5% of area)]

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.

Correction of Understated Variance for Triangles

• For symmetric triangles, general case
  – To expand from pth-(1-p)th to Min-Max, multiply by 1/(1-√(2p))
  – If 2p = α, then multiply by 1/(1-√α)
  – To expand from (α₁/2)th-(1-α₁/2)th to (α₂/2)th-(1-α₂/2)th [α₁ > α₂], multiply by (1-√α₂)/(1-√α₁)
  – For example, to expand from 33-67 to 20-80, multiply by (1-√(2/5))/(1-√(2/3)) ≈ 2.0

[Figure: symmetric triangle with tail areas p, p and central area 1-2p, marked at base fractions √(p/2) and 1-√(p/2) and half-base fractions √(2p) and 1-√(2p)]

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.
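The general expansion rule can be captured in one small function. In this Python sketch the name `expansion_factor` and its argument names are illustrative; α is the total probability outside the stated range (2p), and α = 0 denotes Min-Max.

```python
import math

def expansion_factor(alpha_from, alpha_to=0.0):
    """Factor by which to widen a symmetric triangle's stated range.

    alpha_from: total tail probability outside the stated range (2p)
    alpha_to:   total tail probability outside the target range (0 = Min-Max)
    """
    if not 0.0 <= alpha_to < alpha_from < 1.0:
        raise ValueError("require 0 <= alpha_to < alpha_from < 1")
    return (1 - math.sqrt(alpha_to)) / (1 - math.sqrt(alpha_from))

# 20-80 to Min-Max (alpha = 0.4)
print(round(expansion_factor(0.4), 2))
# 33-67 to 20-80 (alpha from 2/3 to 2/5)
print(round(expansion_factor(2 / 3, 2 / 5), 2))
# Plus-or-minus one sigma to Min-Max: half-base fraction (sqrt(6)-1)/sqrt(6)
one_sigma_alpha = ((math.sqrt(6) - 1) / math.sqrt(6)) ** 2
print(round(expansion_factor(one_sigma_alpha), 2))
```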

Variance of Hybrid Distributions – A Pythagorean Relationship

• Suppose k distributions with pdf $p_i(x_i)$, mean $\mu_i$, and standard deviation $\sigma_i$ are sampled
• Then the pdf of the hybrid distribution is the “average” of the pdfs

$$p(x) = \frac{1}{k}\sum_{i=1}^{k} p_i(x_i)$$

• The mean of the hybrid distribution is the average of the means

$$\mu = E[X] = \frac{1}{k}\sum_{i=1}^{k}\int x_i\,p_i(x_i)\,dx_i = \frac{\sum_{i=1}^{k}\mu_i}{k}$$

• The variance of the hybrid distribution is the average of the variances plus the variance of the means taken as a discrete probability distribution!
  – See next slide for derivation

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.

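Variance of Hybrid Distributions – A Pythagorean Relationship

$$E[X^2] = \frac{1}{k} \sum_{i=1}^{k} \int x_i^2 \, p_i(x_i) \, dx_i = \frac{1}{k} \sum_{i=1}^{k} \left( \sigma_i^2 + \mu_i^2 \right)$$

$$\sigma^2 = E[X^2] - \left( E[X] \right)^2 = \frac{1}{k} \sum_{i=1}^{k} \left( \sigma_i^2 + \mu_i^2 \right) - \left( \frac{1}{k} \sum_{i=1}^{k} \mu_i \right)^2 = \frac{1}{k} \sum_{i=1}^{k} \sigma_i^2 + \left[ \frac{1}{k} \sum_{i=1}^{k} \mu_i^2 - \left( \frac{1}{k} \sum_{i=1}^{k} \mu_i \right)^2 \right]$$

• In the special case of two congruent distributions (common standard deviation $\sigma$) with centers at m-d and m+d, the variance is

$$\sigma^2 + \frac{(m-d)^2 + (m+d)^2}{2} - m^2 = \sigma^2 + d^2$$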

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.


Equivalence of Averaging Distributions and Averaging Parameters for Symmetric Triangles

• In the case of symmetric triangles, averaging the individual triangles (with perfect rank correlation) can be shown to be equivalent to averaging the parameters
  – We will prove it in the case of two triangles, but the proof can easily be extended to more
• As previously shown, the pth percentile (p < 0.5) of a symmetric triangle lies a fraction √(p/2) of the base above the minimum, so it is a linear function of the minimum and maximum; by symmetry, the same holds for p > 0.5
• With perfect rank correlation, the pth percentile of the average is the average of the pth percentiles, which by linearity is the pth percentile of the parameter-averaged triangle
• Since all percentiles are equal, the resulting distributions are identical
• Monte Carlo simulation could be used to explore the difference between the two methods for asymmetric triangles, but it is not expected to be large

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.
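The comparison for asymmetric triangles can be explored with a short sketch. Instead of a Monte Carlo simulation, this version compares quantiles on a fixed grid, which is equivalent under perfect rank correlation (both triangles are evaluated at the same probability u). The triangles used and the function names are illustrative assumptions.

```python
import math

def tri_quantile(u, a, c, b):
    """Inverse cdf of a triangle T(a, c, b) with min a, mode c, max b."""
    if u <= (c - a) / (b - a):
        return a + math.sqrt(u * (b - a) * (c - a))
    return b - math.sqrt((1.0 - u) * (b - a) * (b - c))

def max_quantile_gap(t1, t2, steps=999):
    """Largest gap between 'average the draws' and 'average the parameters'."""
    a, c, b = ((p + q) / 2 for p, q in zip(t1, t2))  # parameter-averaged triangle
    gap = 0.0
    for j in range(1, steps + 1):
        u = j / (steps + 1)
        # Perfect rank correlation: both triangles are drawn at the same u
        averaged_draw = (tri_quantile(u, *t1) + tri_quantile(u, *t2)) / 2
        gap = max(gap, abs(averaged_draw - tri_quantile(u, a, c, b)))
    return gap

gap_symmetric = max_quantile_gap((0, 5, 10), (2, 4, 6))     # two symmetric triangles
gap_asymmetric = max_quantile_gap((0, 2, 10), (0, 8, 10))   # strongly skewed pair
```

For the symmetric pair the gap is zero up to floating-point error. For the deliberately extreme skewed pair it is a small fraction of the 0 to 10 range, which is in line with the expectation that the difference is not large.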


Equivalence of Averaging Means and Averaging Modes for Triangles

• If we average parameters, as long as we average mins and maxes, it doesn’t matter whether we average means or modes
  – Algebraically equivalent
  – Any number of triangles, symmetry not required
• Let the ith triangle be $T(a_i, c_i, b_i)$, $i = 1, \ldots, k$, and the parameter-averaged triangle be $T(A, C, B)$, where

$$A = \frac{1}{k} \sum_{i=1}^{k} a_i, \qquad C = \frac{1}{k} \sum_{i=1}^{k} c_i, \qquad B = \frac{1}{k} \sum_{i=1}^{k} b_i$$

• This is averaging the modes; the resulting mean is

$$\frac{A + B + C}{3} = \frac{\sum_{i=1}^{k} a_i + \sum_{i=1}^{k} b_i + \sum_{i=1}^{k} c_i}{3k} = \frac{1}{k} \sum_{i=1}^{k} \frac{a_i + b_i + c_i}{3}$$

  which is just the average of the means!
• Reversing the flow, averaging the means can be shown to produce a mode which is the average of the modes

“Understatement of Risk and Uncertainty by Subject Matter Experts (SMEs)”, P.J. Braxton, R.L. Coleman, SCEA/ISPA, 2011.
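The algebra above can be checked numerically with a minimal sketch. The three triangles are made-up illustrative values, written as (min, mode, max); none of the names come from the original material.

```python
def tri_mean(a, c, b):
    """Mean of a triangle T(a, c, b) with min a, mode c, max b."""
    return (a + c + b) / 3.0

# Hypothetical triangles; symmetry is not required
triangles = [(10.0, 12.0, 20.0), (8.0, 15.0, 30.0), (5.0, 6.0, 7.0)]
k = len(triangles)

A = sum(t[0] for t in triangles) / k   # averaged min
C = sum(t[1] for t in triangles) / k   # averaged mode
B = sum(t[2] for t in triangles) / k   # averaged max

# Forward: average the modes, then compute the mean of T(A, C, B)
mean_from_averaged_modes = tri_mean(A, C, B)
average_of_means = sum(tri_mean(*t) for t in triangles) / k

# Reverse: fix A, B and the averaged mean, then back out the implied mode
mode_from_averaged_means = 3.0 * average_of_means - A - B
```

Both directions agree: the mean of the parameter-averaged triangle equals the average of the means, and the mode implied by the averaged mean equals the average of the modes.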


Cost Growth Factor and Percentiles • A result for Final Output Distributions


Implied Percentile or CGF
• For a given CV, Cost Growth Factor (CGF) and percentile (of the point estimate) are related
  – Think of CGF as the factor it takes to bring the point estimate up to the true mean
  – Different answers for normal or lognormal
• Given a CGF and CV, we might ask:
  – At what percentile must the point estimate be?
• Given a percentile and CV, we might ask:
  – What is the CGF between that value and the mean?


Implied Percentile – Normal
• Without loss of generality (w.l.o.g.), assume mean of one (1.0)
  – Then the standard deviation is equal to the CV!
• We are looking for the percentile of the reciprocal of CGF

$$X \sim N(\mu, \sigma) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

$$p = \Phi\left( \frac{1/CGF - \mu}{\sigma} \right) = \Phi\left( \frac{1/CGF - 1}{CV} \right)$$
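The normal case can be sketched with the standard library's `statistics.NormalDist`. The function name `implied_percentile_normal` and the example inputs (CGF of 1.20, CV of 20%) are illustrative assumptions.

```python
from statistics import NormalDist

def implied_percentile_normal(cgf: float, cv: float) -> float:
    """Percentile of the point estimate when cost ~ Normal(mean 1, sd CV).

    The point estimate sits at 1/CGF when the mean is scaled to 1.
    """
    return NormalDist().cdf((1.0 / cgf - 1.0) / cv)

# A 20% cost growth factor with a 20% CV puts the point estimate
# at roughly the 20th percentile
p_example = implied_percentile_normal(cgf=1.20, cv=0.20)
```

A CGF of exactly 1 returns the 50th percentile, since the point estimate then coincides with the mean (and median) of the normal distribution.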


Implied Percentile – Lognormal
• W.l.o.g., assume lognormal mean of one (1.0)
• Then the mean and standard deviation of the related normal are:

$$\mu = \ln\left( \frac{1}{\sqrt{1 + CV^2}} \right), \qquad \sigma = \sqrt{\ln\left( 1 + CV^2 \right)}$$

• We are looking for the percentile of the reciprocal of CGF (unit space)

$$X = e^{Y} \sim LogN(\mu, \sigma) \;\Rightarrow\; Y = \ln X \sim N(\mu, \sigma) \;\Rightarrow\; Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$$

$$p = \Phi\left( \frac{\ln(1/CGF) - \mu}{\sigma} \right) = \Phi\left( \frac{\ln\left( \sqrt{1 + CV^2} \, / \, CGF \right)}{\sqrt{\ln\left( 1 + CV^2 \right)}} \right)$$
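A matching sketch for the lognormal case, again using only the standard library. The function name and example inputs are assumptions chosen for illustration; `mu` and `sigma` are the log-space parameters for a lognormal with mean 1.

```python
import math
from statistics import NormalDist

def implied_percentile_lognormal(cgf: float, cv: float) -> float:
    """Percentile of the point estimate when cost ~ Lognormal with mean 1."""
    sigma = math.sqrt(math.log(1.0 + cv * cv))   # log-space standard deviation
    mu = -0.5 * sigma * sigma                    # log-space mean, so that E[X] = 1
    return NormalDist().cdf((math.log(1.0 / cgf) - mu) / sigma)

# Same inputs as a 20% CGF with 20% CV; the answer differs slightly
# from the normal case
p_example = implied_percentile_lognormal(cgf=1.20, cv=0.20)

# The median of the lognormal corresponds to CGF = sqrt(1 + CV^2)
p_median = implied_percentile_lognormal(cgf=math.sqrt(1.04), cv=0.20)
```

Unlike the normal case, a point estimate at the 50th percentile still implies a CGF above 1, because the lognormal mean exceeds its median.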

Percentile-to-CGF Conversion
• This table shows the implied CGF for a given percentile of the Point Estimate for various CVs of a Normal distribution

NORMAL: rows are Coefficient of Variation (CV), columns are percentile p

CV \ p  0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  0.45  0.50  0.55  0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95
0.05    1.09  1.07  1.05  1.04  1.03  1.03  1.02  1.01  1.01  1.00  0.99  0.99  0.98  0.97  0.97  0.96  0.95  0.94  0.92
0.10    1.20  1.15  1.12  1.09  1.07  1.06  1.04  1.03  1.01  1.00  0.99  0.98  0.96  0.95  0.94  0.92  0.91  0.89  0.86
0.15    1.33  1.24  1.18  1.14  1.11  1.09  1.06  1.04  1.02  1.00  0.98  0.96  0.95  0.93  0.91  0.89  0.87  0.84  0.80
0.20    1.49  1.34  1.26  1.20  1.16  1.12  1.08  1.05  1.03  1.00  0.98  0.95  0.93  0.91  0.88  0.86  0.83  0.80  0.75
0.25    1.70  1.47  1.35  1.27  1.20  1.15  1.11  1.07  1.03  1.00  0.97  0.94  0.91  0.88  0.86  0.83  0.79  0.76  0.71
0.30    1.97  1.62  1.45  1.34  1.25  1.19  1.13  1.08  1.04  1.00  0.96  0.93  0.90  0.86  0.83  0.80  0.76  0.72  0.67
0.35    2.36  1.81  1.57  1.42  1.31  1.22  1.16  1.10  1.05  1.00  0.96  0.92  0.88  0.84  0.81  0.77  0.73  0.69  0.63
0.40    2.92  2.05  1.71  1.51  1.37  1.27  1.18  1.11  1.05  1.00  0.95  0.91  0.87  0.83  0.79  0.75  0.71  0.66  0.60
0.45    3.85  2.36  1.87  1.61  1.44  1.31  1.21  1.13  1.06  1.00  0.95  0.90  0.85  0.81  0.77  0.73  0.68  0.63  0.57
0.50    5.63  2.78  2.08  1.73  1.51  1.36  1.24  1.15  1.07  1.00  0.94  0.89  0.84  0.79  0.75  0.70  0.66  0.61  0.55


Percentile-to-CGF Conversion
• This table shows the implied CGF for a given percentile of the Point Estimate for various CVs of a Lognormal distribution

LOGNORMAL: rows are Coefficient of Variation (CV), columns are percentile p

CV \ p  0.05  0.10  0.15  0.20  0.25  0.30  0.35  0.40  0.45  0.50  0.55  0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95
0.05    1.09  1.07  1.05  1.04  1.04  1.03  1.02  1.01  1.01  1.00  0.99  0.99  0.98  0.98  0.97  0.96  0.95  0.94  0.92
0.10    1.18  1.14  1.11  1.09  1.07  1.06  1.04  1.03  1.02  1.00  0.99  0.98  0.97  0.95  0.94  0.92  0.91  0.88  0.85
0.15    1.29  1.22  1.18  1.15  1.12  1.09  1.07  1.05  1.03  1.01  0.99  0.97  0.95  0.94  0.91  0.89  0.87  0.84  0.79
0.20    1.41  1.31  1.25  1.20  1.17  1.13  1.10  1.07  1.05  1.02  0.99  0.97  0.94  0.92  0.89  0.86  0.83  0.79  0.74
0.25    1.55  1.41  1.33  1.27  1.22  1.17  1.13  1.10  1.06  1.03  1.00  0.97  0.94  0.91  0.87  0.84  0.80  0.75  0.69
0.30    1.69  1.52  1.42  1.34  1.27  1.22  1.17  1.12  1.08  1.04  1.01  0.97  0.93  0.90  0.86  0.82  0.77  0.72  0.64
0.35    1.85  1.64  1.51  1.41  1.33  1.27  1.21  1.15  1.11  1.06  1.02  0.97  0.93  0.89  0.84  0.80  0.74  0.69  0.61
0.40    2.03  1.76  1.61  1.49  1.40  1.32  1.25  1.19  1.13  1.08  1.03  0.98  0.93  0.88  0.83  0.78  0.72  0.66  0.57
0.45    2.22  1.90  1.71  1.57  1.46  1.37  1.29  1.22  1.16  1.10  1.04  0.98  0.93  0.88  0.82  0.76  0.70  0.63  0.54
0.50    2.43  2.05  1.82  1.66  1.54  1.43  1.34  1.26  1.19  1.12  1.05  0.99  0.93  0.87  0.81  0.75  0.69  0.61  0.51
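Entries of the percentile-to-CGF tables can be reproduced by inverting the implied-percentile relationships: CGF = 1/(1 + Z_p·CV) for the normal and CGF = √(1 + CV²)·exp(−Z_p·σ) for the lognormal, with σ = √ln(1 + CV²). The sketch below assumes a mean scaled to 1; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def cgf_normal(p: float, cv: float) -> float:
    """CGF that lifts the p-th percentile of Normal(1, CV) up to the mean."""
    # Only meaningful while 1 + Z_p * CV > 0 (true across the table's range)
    return 1.0 / (1.0 + NormalDist().inv_cdf(p) * cv)

def cgf_lognormal(p: float, cv: float) -> float:
    """CGF that lifts the p-th percentile of a mean-1 lognormal up to the mean."""
    sigma = math.sqrt(math.log(1.0 + cv * cv))
    return math.sqrt(1.0 + cv * cv) * math.exp(-NormalDist().inv_cdf(p) * sigma)

# Print a few cells of each table, rounded to two decimals as on the slides
percentiles = [0.05, 0.25, 0.50, 0.75, 0.95]
for cv in (0.10, 0.50):
    normal_row = [round(cgf_normal(p, cv), 2) for p in percentiles]
    lognormal_row = [round(cgf_lognormal(p, cv), 2) for p in percentiles]
    print(f"CV={cv:.2f}  normal={normal_row}  lognormal={lognormal_row}")
```

The two distributions agree closely at low CV and diverge in the tails as CV grows, which is the pattern visible when the two tables are compared row by row.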

Alternate Specifications
• Normal
• Lognormal
• Two Lognormals: Ambiguity from Square Roots

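Alternate Specification of Normal

With $Z_p = \Phi^{-1}(p)$ and, for two percentiles, $Z_i = \Phi^{-1}(p_i)$, $i = 1, 2$:

• Given Mean, CV
  – Mean: $\mu$
  – Std Dev: $CV \cdot \mu$
  – CV: $CV$
• Given Mean, Percentile $(X_p, p)$
  – Mean: $\mu$
  – Std Dev: $\dfrac{X_p - \mu}{Z_p}$
  – CV: $\dfrac{X_p / \mu - 1}{Z_p}$
• Given CV, Percentile $(X_p, p)$
  – Mean: $\dfrac{X_p}{1 + Z_p \cdot CV}$
  – Std Dev: $\dfrac{X_p \cdot CV}{1 + Z_p \cdot CV}$
  – CV: $CV$
• Given Two Percentiles
  – Mean: $\dfrac{Z_2 X_1 - Z_1 X_2}{Z_2 - Z_1}$
  – Std Dev: $\dfrac{X_2 - X_1}{Z_2 - Z_1}$
  – CV: Std Dev divided by Mean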
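Two of the normal specifications, sketched in Python under the assumption that percentiles are given as (value, probability) pairs. The function names and the example inputs (a 20th percentile of 90 and an 80th percentile of 120) are illustrative only.

```python
from statistics import NormalDist

def normal_from_two_percentiles(x1, p1, x2, p2):
    """Mean and standard deviation of the normal through two percentiles."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    mean = (z2 * x1 - z1 * x2) / (z2 - z1)
    sd = (x2 - x1) / (z2 - z1)
    return mean, sd

def normal_from_cv_and_percentile(cv, xp, p):
    """Mean and standard deviation from a CV and one percentile."""
    zp = NormalDist().inv_cdf(p)
    mean = xp / (1.0 + zp * cv)
    return mean, cv * mean

# Symmetric percentiles put the mean at the midpoint of the two values
mean, sd = normal_from_two_percentiles(90.0, 0.20, 120.0, 0.80)
```

A quick way to sanity-check any of these specifications is to rebuild the distribution with `NormalDist(mean, sd)` and confirm that its `cdf` at the given value returns the given probability.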

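Alternate Specification of Lognormal

Here $\mu$ and $\sigma$ are the mean and standard deviation of the related normal, $Z_p = \Phi^{-1}(p)$, and, for two percentiles, $Z_i = \Phi^{-1}(p_i)$, $i = 1, 2$.

• Given Mean, CV
  – Mean: $E[X]$; Std Dev: $CV \cdot E[X]$; CV: $CV$
  – $\mu = \ln\left( \dfrac{E[X]}{\sqrt{1 + CV^2}} \right)$
  – $\sigma = \sqrt{\ln\left( 1 + CV^2 \right)}$
• Given Mean, Percentile $(X_p, p)$
  – Mean: $E[X]$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \ln\left( \dfrac{E[X]}{\sqrt{1 + CV^2}} \right)$
  – $\sigma = Z_p \pm \sqrt{Z_p^2 - 2 \ln\left( \dfrac{X_p}{E[X]} \right)}$
• Given CV, Percentile $(X_p, p)$
  – Mean: $e^{\mu + \sigma^2 / 2}$; Std Dev: $CV \cdot E[X]$; CV: $CV$
  – $\mu = \ln X_p - Z_p \sqrt{\ln\left( 1 + CV^2 \right)}$
  – $\sigma = \sqrt{\ln\left( 1 + CV^2 \right)}$
• Given Two Percentiles
  – Mean: $e^{\mu + \sigma^2 / 2}$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \dfrac{Z_2 \ln X_1 - Z_1 \ln X_2}{Z_2 - Z_1}$
  – $\sigma = \dfrac{\ln\left( X_2 / X_1 \right)}{Z_2 - Z_1}$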
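A sketch of two lognormal specifications (mean with CV, and two percentiles) together with the conversion back to unit-space moments. All function names are hypothetical, and the inputs (mean 100, CV 30%) are chosen only to show a round trip.

```python
import math
from statistics import NormalDist

def lognormal_from_mean_cv(mean, cv):
    """Log-space (mu, sigma) from the unit-space mean and CV."""
    sigma = math.sqrt(math.log(1.0 + cv * cv))
    mu = math.log(mean) - 0.5 * sigma * sigma   # same as ln(mean / sqrt(1 + CV^2))
    return mu, sigma

def lognormal_from_two_percentiles(x1, p1, x2, p2):
    """Log-space (mu, sigma) from two (value, probability) pairs."""
    z1 = NormalDist().inv_cdf(p1)
    z2 = NormalDist().inv_cdf(p2)
    sigma = math.log(x2 / x1) / (z2 - z1)
    mu = (z2 * math.log(x1) - z1 * math.log(x2)) / (z2 - z1)
    return mu, sigma

def unit_space_moments(mu, sigma):
    """Unit-space mean, standard deviation, and CV from log-space parameters."""
    mean = math.exp(mu + 0.5 * sigma * sigma)
    cv = math.sqrt(math.exp(sigma * sigma) - 1.0)
    return mean, cv * mean, cv

mu, sigma = lognormal_from_mean_cv(100.0, 0.30)
mean, sd, cv = unit_space_moments(mu, sigma)   # recovers 100, 30, 0.30
```

Because every specification resolves to the same pair (μ, σ), a round trip through `unit_space_moments` is a convenient consistency check for any of the rows above.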

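Alternate Specification of Lognormal

• Given Mean, Median (m)
  – Mean: $E[X]$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \ln m$
  – $\sigma = \sqrt{2 \ln\left( \dfrac{E[X]}{m} \right)}$
• Given Median (m), Mode (M)
  – Mean: $e^{\mu + \sigma^2 / 2}$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \ln m$
  – $\sigma = \sqrt{\ln\left( \dfrac{m}{M} \right)}$
• Given Mean, Mode (M)
  – Mean: $E[X]$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \dfrac{1}{3} \ln\left( E[X]^2 \, M \right)$
  – $\sigma = \sqrt{\dfrac{2}{3} \ln\left( \dfrac{E[X]}{M} \right)}$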
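The mean, median, and mode specifications can be verified with a round trip: start from known log-space parameters, compute the three statistics, and recover the parameters from each pair. The starting values μ = 4.5 and σ = 0.4 and all function names are assumptions made for this sketch.

```python
import math

def lognormal_from_mean_median(mean, median):
    return math.log(median), math.sqrt(2.0 * math.log(mean / median))

def lognormal_from_median_mode(median, mode):
    return math.log(median), math.sqrt(math.log(median / mode))

def lognormal_from_mean_mode(mean, mode):
    mu = math.log(mean * mean * mode) / 3.0
    sigma = math.sqrt(2.0 * math.log(mean / mode) / 3.0)
    return mu, sigma

# Known parameters and the statistics they imply
mu_true, sigma_true = 4.5, 0.4
mean = math.exp(mu_true + 0.5 * sigma_true ** 2)
median = math.exp(mu_true)
mode = math.exp(mu_true - sigma_true ** 2)

# Each pair of statistics should recover (mu_true, sigma_true)
recovered = [
    lognormal_from_mean_median(mean, median),
    lognormal_from_median_mode(median, mode),
    lognormal_from_mean_mode(mean, mode),
]
```

All three rely on the ordering mode < median < mean for a lognormal; inputs that violate it make the logarithm under the square root negative and `math.sqrt` raises a `ValueError`.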

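Alternate Specification of Lognormal

With $Z_p = \Phi^{-1}(p)$:

• Given Median (m), Percentile $(X_p, p)$
  – Mean: $e^{\mu + \sigma^2 / 2}$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \ln m$
  – $\sigma = \dfrac{\ln\left( X_p / m \right)}{Z_p}$
• Given Mode (M), Percentile $(X_p, p)$
  – Mean: $e^{\mu + \sigma^2 / 2}$; Std Dev: $CV \cdot E[X]$; CV: $\sqrt{e^{\sigma^2} - 1}$
  – $\mu = \ln\left( M \left( 1 + CV^2 \right) \right)$
  – $\sigma = -\dfrac{Z_p}{2} \pm \dfrac{1}{2} \sqrt{Z_p^2 + 4 \ln\left( \dfrac{X_p}{M} \right)}$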

Two Lognormals
• The plus-or-minus in the two previous orange cells (the σ formulas for “Mean, Percentile” and “Mode, Percentile”) gives rise to the possibility of two solutions to a given specification
• Mean and a Percentile

$$\sigma = Z_p \pm \sqrt{Z_p^2 - 2 \ln\left( \frac{X_p}{E[X]} \right)}$$

  – Percentile > Mean: two tenable solutions
  – Percentile = Mean: one tenable solution and one singular solution (σ = 0)
  – Percentile < Mean: one tenable solution and one untenable solution (σ < 0)
• Mode and a Percentile

$$\sigma = -\frac{Z_p}{2} \pm \frac{1}{2} \sqrt{Z_p^2 + 4 \ln\left( \frac{X_p}{M} \right)}$$

  – Percentile < Mode: two tenable solutions
  – Percentile = Mode: one tenable solution and one singular solution (σ = 0)
  – Percentile > Mode: one tenable solution and one untenable solution (σ < 0)
• Note that the mischief occurs when the percentile (load) is on the other side of the mean/mode (effort) from the median (fulcrum): a third-class lever!
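The mean-and-percentile ambiguity can be made concrete with a short sketch that solves the quadratic in σ and keeps only positive roots. The function names are hypothetical, and the inputs (mean of 100 with an 80th percentile of 120) are used here as an illustrative case.

```python
import math
from statistics import NormalDist

def lognormal_sigmas_from_mean_percentile(mean, xp, p):
    """Positive roots of sigma^2 - 2*Zp*sigma + 2*ln(Xp/mean) = 0."""
    zp = NormalDist().inv_cdf(p)
    disc = zp * zp - 2.0 * math.log(xp / mean)
    if disc < 0:
        return []                       # no lognormal fits this specification
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide
    return sorted(s for s in {zp - root, zp + root} if s > 0)

def describe(mean, sigma):
    """CV, median, and mode of the lognormal with this mean and sigma."""
    cv = math.sqrt(math.exp(sigma ** 2) - 1.0)
    median = mean * math.exp(-0.5 * sigma ** 2)
    mode = mean * math.exp(-1.5 * sigma ** 2)
    return cv, median, mode

solutions = [
    describe(100.0, s)
    for s in lognormal_sigmas_from_mean_percentile(100.0, 120.0, 0.80)
]
for cv, median, mode in solutions:
    print(f"CV = {cv:.0%}, median = {median:.1f}, mode = {mode:.1f}")
```

With the percentile above the mean, both roots are positive, so two distinct lognormals satisfy the same specification and the analyst has to choose between them on other grounds.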


Two Lognormals Example
• Mean = 100, 80th percentile = 120
  – “Regular” solution with 26% CV
  – “Extreme” solution with 258% CV!

[Figure: the two lognormal solutions plotted against cost, with vertical axes scaled 0 to 1 and 0 to 0.025 and the 80th percentile = 120 marked; “regular” solution: Mode = 90.7, Median = 96.8; “extreme” solution: Mode = 4.7, Median = 36.1]