Statistics with Excel Examples

Statistics with Excel Examples Questions • What is a probability density function (pdf)? • What is a cumulative density function (cdf)? • What is an...
Author: Austin Ross
0 downloads 2 Views 442KB Size
Statistics with Excel Examples

Questions • What is a probability density function (pdf)? • What is a cumulative density function (cdf)? • What is an inverse cdf? • Examples of distributions:  Uniform, Normal, Beta, Gamma, ChiSquare

• Random Numbers  What is a uniform distribution?  The Excel worksheet function rand()

• Synthesis of distributions. • What is an “Order Statistic”?  Role of Beta distribution.  Synthesis of order statistic distributions. Statistics with Excel Examples, G. Shirley

January 31, 2012

2

Synthesis of Distributions • Consider a cdf, F. P  F ( x)  Probability as a function of some distributed variable.  Examples of F: BETADIST, GAMMADIST,..

• We want a collection of x’s, x1, x2, x3,…xi,… distributed according to F. • How to do it?  Generate a random number from the uniform distribution on [0,1] and plug into the inverse cdf. xi  F 1 U i   Examples x=BETAINV(rand(),Alpha,Beta)

x=NORMSINV(rand())

Distributed normally, with mean 0, and sd = 1.

Statistics with Excel Examples, G. Shirley

January 31, 2012

3

Order Statistics • Sample n numbers from a distribution, F. • Pick the kth smallest  k=1 is the smallest.  k=n is the largest.

• Do this many times. • How is the k:n distributed?  It is the k:n (“k of n”) order statistic of F.  The 1:1 order statistic is F itself.

Statistics with Excel Examples, G. Shirley

January 31, 2012

4

Synthesis of Order Statistics • Mental furniture (just know it):  The k:n order statistic of the uniform distribution is the Beta distribution with Alpha = k, Beta = n+k-1.

• To generate numbers distributed according to the k:n order statistic of the uniform distribution: x = BETAINV(rand(), k, n+1-k) Uk:n

U1:1

Alpha

Beta

• And for any distribution, generate its k:n order statistic by plugging Uk:n into the “probability” argument of its inverse cdf, for example: xk:n=NORMSINV(Uk:n)

Isn’t that nice!

Statistics with Excel Examples, G. Shirley

January 31, 2012

5

Normal Distribution • To synthesize instances distributed according to a dist’n, plug uniformly dist’d random numbers into the probability argument of the inverse cdf of the dist’n. • Synthesis of normally distributed data  Mean = m, Variance = V (standard error = V) Inverse of standard normal dist’n. Mean = 0, variance = 1.

x m V z

z   1  Probability  is called the "Probit"

x  m  V   1  u  x  m  V  NORMSINV  rand() 

Uniformly distributed

Normally distributed with mean, m, and standard deviation V.

Statistics with Excel Examples, G. Shirley

January 31, 2012

6

Synthesis of a Multi-Normal Dist’n • For each sample, instead of generating one random number, generate one vector of random numbers. • And make the numbers in each vector correlated. • To do this, generalize to

x  m V z Square root of variance!

x  m Rz Desired sample vector i.

 x1i   m1   R11  i     x2   m2   R21  x3i    m3    R31       ..   ..   ..  ..   ..   ..

R12 R22 R32

R13 .. ..  z1i   i  R23   z2    z3i  R33     ..    .. 

Vector of means.

Matrix which is the “root” of the covariance matrix. Vector of independently sampled Probits. Instances are i = 1, 2, 3, .. Statistics with Excel Examples, G. Shirley

January 31, 2012

7

Covariance Matrix, and Its Root • The two dimensional covariance matrix is...   12  1 2   V  2      2  1 2 

 12  x12  x1

2

 22  x22  x2

2

 1 2   x1 x2  x1 x2

• Since V is real-symmetric and +ve definite, V can be •

factorized such that V  RR Transpose of R. So, since..

0  1  V  1   0  2  

   1 1   0

0   1 0   1    2   0  2   

 1  1   2  0 0



0  2    1 0    1   1       1   2   0  2   2   2 1   2   0  2 1   2 

• ..we have

  1 0   R  2    1   2  2 

The upper (or lower) triangular root is the “Cholesky root”.

Statistics with Excel Examples, G. Shirley

January 31, 2012

8

Covariance Matrix, and Its Root • Other roots differing from the Cholesky root by a rotation work too.. 0  1  V  1   0  2   1

 

1   0

0  2 

  1    1      1    1    1   1    1    1      1    1     1    1      1    1       1    1      1    1      1    1      1    1    

  1 0   2    0  2   12   1 12   1  22

   1

1   1 

1 2

1 2

1 2

1 2

1 2

1 2

 

1 1 2

1 1 2

1 2 2

1 2 2

1 1 2

1 2 2

 

 

1    1      1 0   1   1    0 2  

 

1   1    1   1   

• So

 

  1 12 1    1   R   1  2 2 1    1  

 

 

 

 1 12 1    1     1  2 2 1    1   

Statistics with Excel Examples, G. Shirley

January 31, 2012

9

2-D Example • What happens when  = 1,  = -1,  = 0? x  m  R z 0   z1   x1   m1    1    x   m    2   2   2   2   2 1     z2  x1  m1   1 z1 x2  m2   2  z1   2 1   2 z2 z1   1 U1   NORMSINV(rand()) z2   1 U 2   NORMSINV(rand())

• U1 and U2 are independently sampled from the uniform distribution on [0,1].

Statistics with Excel Examples, G. Shirley

January 31, 2012

10

Gaussian Copula • Set m1 = m2 = 0, 1 = 2 = 1  x1   1 x     2   

  z1    2  1     z2  z1   1 U1   NORMSINV(rand()) 0

z2   1 U 2   NORMSINV(rand())

• x1 and x2 are normally distributed with mean = 0, var = 1  c1     x1    c    x   2    2  • c1 and c2 are uniform on [0,1], so simulate any marginal 1 distn’s  A   x1   d  1 A, B are arbitrary cdfs.   d    1  2   B   x2   inverse!

Statistics with Excel Examples, G. Shirley

January 31, 2012

11

Gaussian Copula • Infamously implicated in financial disaster:

Recipe for Disaster: The Formula that Killed Wall Street. Wired Mag. February 2009

Miscorrelation in Manufacturing Test

April 20, 2011

12

Extension to N Dimensions • Gaussian copulas are easily extended to N dimensions. V11 V12 V13 V V V 23 V   21 22 V31 V32 V33  .. ..  ..

.. ..  ..  ..

Vij  V ji  xi x j  xi

xj

• If all marginal distributions have m = 0 and  = 1, then  1  V   21   31   ..

12 1

 32 ..

13 ..  23 .. 1 ..

..  ..

ij = ji is the correlation coefficient between variables i and j.

• Calculation of the Cholesky root of V.  Analytically messy for N > 2.  But algorithms are easily available. Statistics with Excel Examples, G. Shirley

January 31, 2012

13