Appendix D: Probability Distributions

This appendix archives a number of useful results from texts by Papoulis [44], Lee [33] and Cover [12]. Table 16.1 in Cover (page 486) gives entropies of many distributions not listed here.

D.1 Transforming PDFs

Because probabilities are defined as areas under PDFs, when we transform a variable

y = f(x)    (D.1)

we transform the PDF by preserving areas,

p(y) |dy| = p(x) |dx|    (D.2)

where the absolute value is taken because the changes in x or y (dx and dy) may be negative and areas must be positive. Hence

p(y) = \frac{p(x)}{|dy/dx|}    (D.3)

where the derivative is evaluated at x = f^{-1}(y). This means that the function f(x) must be one-to-one and invertible.

If the function is many-to-one then its inverse will have multiple solutions x_1, x_2, ..., x_n and the PDF is transformed at each of these points (Papoulis' Fundamental Theorem [44], page 93):

p(y) = \frac{p(x_1)}{|dy/dx_1|} + \frac{p(x_2)}{|dy/dx_2|} + ... + \frac{p(x_n)}{|dy/dx_n|}    (D.4)
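As an illustration (not part of the original text), the following Python sketch checks the fundamental theorem (D.4) numerically for the many-to-one map y = x^2 applied to a standard Gaussian variable; the use of NumPy/SciPy, the sample size and the histogram settings are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)
y = x**2                                   # many-to-one: roots +sqrt(y) and -sqrt(y)

# Empirical density of y, estimated from a histogram over part of its support.
counts, edges = np.histogram(y, bins=70, range=(0.5, 4.0))
centres = 0.5 * (edges[:-1] + edges[1:])
empirical = counts / (y.size * np.diff(edges))

# Theoretical density from (D.4): each root contributes p(x) / |dy/dx|,
# with |dy/dx| = 2*sqrt(y) at both roots.
theoretical = (norm.pdf(np.sqrt(centres)) + norm.pdf(-np.sqrt(centres))) / (2 * np.sqrt(centres))

print(np.max(np.abs(empirical - theoretical)))   # small, up to Monte Carlo and binning error
```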

D.1.1 Mean and Variance

For more on the mean and variance of functions of random variables see Weisberg [64].

Expectation is a linear operator. That is,

E[a_1 x_1 + a_2 x_2] = a_1 E[x_1] + a_2 E[x_2]    (D.5)

Therefore, given the function

y = a x    (D.6)

we can calculate the mean and variance of y as functions of the mean and variance of x:

E[y] = a E[x],    Var(y) = a^2 Var(x)    (D.7)

If y is a function of many uncorrelated variables,

y = \sum_i a_i x_i    (D.8)

we can use the results

E[y] = \sum_i a_i E[x_i]    (D.9)

Var[y] = \sum_i a_i^2 Var[x_i]    (D.10)

But if the variables are correlated then

Var[y] = \sum_i a_i^2 Var[x_i] + 2 \sum_i \sum_{j>i} a_i a_j Var(x_i, x_j)    (D.11)

where Var(x_i, x_j) denotes the covariance of the random variables x_i and x_j.

Standard Error

As an example, the mean

m = \frac{1}{N} \sum_i x_i    (D.12)

of uncorrelated variables x_i has a variance

\sigma_m^2 \equiv Var(m) = \sum_i \frac{1}{N^2} Var(x_i) = \frac{\sigma_x^2}{N}    (D.13)

where we have used the substitution a_i = 1/N in equation D.10. Hence

\sigma_m = \frac{\sigma_x}{\sqrt{N}}    (D.14)
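A minimal simulation, assuming NumPy is available, that illustrates the standard-error result (D.13)-(D.14); the values of N, \sigma_x and the number of trials are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials, sigma_x = 50, 20_000, 2.0

x = rng.normal(0.0, sigma_x, size=(trials, N))   # 'trials' independent samples of size N
m = x.mean(axis=1)                               # one sample mean per trial

print(m.var())              # empirical Var(m)
print(sigma_x**2 / N)       # predicted sigma_x^2 / N = 0.08, as in (D.13)
```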

Figure D.1: The Gaussian probability density function with \mu = 3 and \sigma = 2.

D.2 Uniform Distribution

The uniform PDF is given by

U(x; a, b) = \frac{1}{b - a}    (D.15)

for a \le x \le b and zero otherwise. The mean is 0.5(a + b) and the variance is (b - a)^2 / 12.

The entropy of a uniform distribution is

H(x) = \log(b - a)    (D.16)

D.3 Gaussian Distribution

The Normal or Gaussian probability density function, for the case of a single variable, is

N(x; \mu, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)    (D.17)

where \mu and \sigma^2 are the mean and variance.

D.3.1 Entropy

The entropy of a Gaussian variable is

H(x) = \frac{1}{2}\log\sigma^2 + \frac{1}{2}\log 2\pi + \frac{1}{2}    (D.18)

Figure D.2: The Gamma density for b = 1.6 and c = 3.125.

For a given variance, the Gaussian distribution has the highest entropy. For a proof of this see Bishop ([3], page 240).

D.3.2 Relative Entropy

For Normal densities q(x) = N(x; \mu_q, \sigma_q^2) and p(x) = N(x; \mu_p, \sigma_p^2) the KL-divergence is

D[q||p] = \frac{1}{2}\log\frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2 + \mu_q^2 + \mu_p^2 - 2\mu_q\mu_p}{2\sigma_p^2} - \frac{1}{2}    (D.19)
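The closed form (D.19) can be checked against direct numerical integration of q(x)\log(q(x)/p(x)). The sketch below assumes SciPy is available; the parameter values are illustrative.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu_q, sig_q = 1.0, 1.5
mu_p, sig_p = -0.5, 2.0

# Closed form (D.19).
kl = (0.5 * np.log(sig_p**2 / sig_q**2)
      + (sig_q**2 + mu_q**2 + mu_p**2 - 2 * mu_q * mu_p) / (2 * sig_p**2)
      - 0.5)

# Direct numerical evaluation of the integral of q(x) * log(q(x)/p(x)).
def integrand(x):
    return norm.pdf(x, mu_q, sig_q) * (norm.logpdf(x, mu_q, sig_q) - norm.logpdf(x, mu_p, sig_p))

kl_numeric, _ = quad(integrand, -30.0, 30.0)
print(kl, kl_numeric)       # should agree to several decimal places
```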

D.4 The Gamma distribution

The Gamma density is defined as

\Gamma(x; b, c) = \frac{1}{\Gamma(c)} \frac{x^{c-1}}{b^c} \exp\left(-\frac{x}{b}\right)    (D.20)

where \Gamma() is the gamma function [49]. The mean of a Gamma density is given by bc and the variance by b^2 c. Logs of Gamma densities can be written as

\log \Gamma(x; b, c) = -\frac{x}{b} + (c - 1)\log x + K    (D.21)

where K is a quantity which does not depend on x; the log of a Gamma density comprises a term in x and a term in \log x. The Gamma distribution is only defined for positive variables.

D.4.1 Entropy

Using the result for Gamma densities

\int p(x) \log x \, dx = \Psi(c) + \log b    (D.22)

where \Psi() is the digamma function [49], the entropy can be derived as

H(x) = \log\Gamma(c) + c\log b - (c - 1)(\Psi(c) + \log b) + c    (D.23)

D.4.2 Relative Entropy

For Gamma densities q(x) = \Gamma(x; b_q, c_q) and p(x) = \Gamma(x; b_p, c_p) the KL-divergence is

D[q||p] = (c_q - 1)\Psi(c_q) - \log b_q - c_q - \log\Gamma(c_q) + \log\Gamma(c_p) + c_p\log b_p - (c_p - 1)(\Psi(c_q) + \log b_q) + \frac{b_q c_q}{b_p}    (D.24)
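As a sanity check (not from the original text), the entropy formula (D.23) can be compared with SciPy's Gamma entropy, identifying SciPy's shape and scale parameters with c and b; the values b = 1.6, c = 3.125 are those used in Figure D.2.

```python
import numpy as np
from scipy.special import gammaln, digamma
from scipy.stats import gamma

b, c = 1.6, 3.125    # scale and shape, as in Figure D.2

# Entropy from (D.23).
H = gammaln(c) + c * np.log(b) - (c - 1) * (digamma(c) + np.log(b)) + c

print(H)
print(gamma(c, scale=b).entropy())   # SciPy's Gamma entropy; should match
```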

D.5 The \chi^2-distribution

If z_1, z_2, ..., z_N are independent normally distributed random variables with zero mean and unit variance then

x = \sum_{i=1}^{N} z_i^2    (D.25)

has a \chi^2-distribution with N degrees of freedom ([33], page 276). This distribution is a special case of the Gamma distribution with b = 2 and c = N/2. This gives

\chi^2(x; N) = \frac{1}{\Gamma(N/2)\, 2^{N/2}} x^{N/2 - 1} \exp\left(-\frac{x}{2}\right)    (D.26)

The mean and variance are N and 2N. The entropy and relative entropy can be found by substituting the values b = 2 and c = N/2 into equations D.23 and D.24. The \chi^2-distribution is only defined for positive variables.
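A brief Monte Carlo illustration of (D.25) and the stated moments, assuming NumPy; N = 5 and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5
z = rng.normal(size=(100_000, N))   # independent zero-mean, unit-variance normals
x = (z**2).sum(axis=1)              # eq. (D.25)

print(x.mean(), x.var())            # approximately N = 5 and 2N = 10
```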

If x is a \chi^2 variable with N degrees of freedom and

y = \sqrt{x}    (D.27)

then y has a \chi-density with N degrees of freedom. For N = 3 we have a Maxwell density and for N = 2 a Rayleigh density ([44], page 96).

Figure D.3: The \chi^2 density for N = 5 degrees of freedom.

D.6 The t-distribution

If z_1, z_2, ..., z_N are independent Normally distributed random variables with mean \mu and variance \sigma^2, and m is the sample mean and s is the sample standard deviation, then

x = \frac{m - \mu}{s/\sqrt{N}}    (D.28)

has a t-distribution with N - 1 degrees of freedom. It is written

t(x; D) = \frac{1}{B(D/2, 1/2)\sqrt{D}} \left(1 + \frac{x^2}{D}\right)^{-(D+1)/2}    (D.29)

where D is the number of 'degrees of freedom' and

B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}    (D.30)

is the beta function. For D = 1 the t-distribution reduces to the standard Cauchy distribution ([33], page 281).
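The density (D.29), written with the beta function, can be checked against SciPy's t-distribution; D = 5 and the grid of x values are illustrative.

```python
import numpy as np
from scipy.special import beta
from scipy.stats import t

D = 5
x = np.linspace(-4.0, 4.0, 9)

# Density (D.29) written with the beta function.
pdf_d29 = (1.0 / (beta(D / 2, 0.5) * np.sqrt(D))) * (1.0 + x**2 / D) ** (-(D + 1) / 2)

print(np.allclose(pdf_d29, t.pdf(x, df=D)))   # True
```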

D.7 Generalised Exponential Densities

The 'exponential power' or 'generalised exponential' probability density is defined as

p(a) = G(a; R, \beta) = \frac{R\,\beta^{1/R}}{2\,\Gamma(1/R)} \exp(-\beta |a|^R)    (D.31)

Figure D.4: The t-distribution with (a) N = 3 and (b) N = 49 degrees of freedom.

Figure D.5: The generalised exponential distribution with (a) R = 1, w = 5 and (b) R = 6, w = 5. The parameter R fixes the weight of the tails and w fixes the width of the distribution. For (a) we have a Laplacian which has positive kurtosis (k = 3), i.e. heavy tails. For (b) we have a light-tailed distribution with negative kurtosis (k = -1).

where \Gamma() is the gamma function [49], the mean of the distribution is zero (for a non-zero mean we simply replace a with a - \mu, where \mu is the mean), the width of the distribution is determined by 1/\beta and the weight of its tails is set by R. This gives rise to a Gaussian distribution for R = 2, a Laplacian for R = 1 and a uniform distribution in the limit R \to \infty. The density is equivalently parameterised by a variable w, which defines the width of the distribution, where w = \beta^{-1/R}, giving

p(a) = \frac{R}{2w\,\Gamma(1/R)} \exp(-|a/w|^R)    (D.32)

The variance is

V = w^2 \frac{\Gamma(3/R)}{\Gamma(1/R)}    (D.33)

which for R = 2 gives V = 0.5 w^2. The kurtosis is given by [7]

K = \frac{\Gamma(5/R)\,\Gamma(1/R)}{\Gamma(3/R)^2} - 3    (D.34)

where we have subtracted 3 so that a Gaussian has zero kurtosis. Samples may be generated from the density using a rejection method [59].
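Rather than the rejection sampler cited above, the following sketch simply verifies (D.33) and (D.34) by numerical quadrature of the density (D.32); R = 6 and w = 5 match Figure D.5(b), and the finite integration limits are an assumption that is harmless because the tails are negligible there.

```python
import numpy as np
from scipy.special import gamma as gamma_fn
from scipy.integrate import quad

R, w = 6.0, 5.0

def pdf(a):
    # Generalised exponential density (D.32).
    return R / (2.0 * w * gamma_fn(1.0 / R)) * np.exp(-np.abs(a / w) ** R)

m2, _ = quad(lambda a: a**2 * pdf(a), -20.0, 20.0)   # second moment (mean is zero)
m4, _ = quad(lambda a: a**4 * pdf(a), -20.0, 20.0)   # fourth moment

print(m2, w**2 * gamma_fn(3.0 / R) / gamma_fn(1.0 / R))                       # variance, eq. (D.33)
print(m4 / m2**2 - 3.0,
      gamma_fn(5.0 / R) * gamma_fn(1.0 / R) / gamma_fn(3.0 / R) ** 2 - 3.0)   # kurtosis, eq. (D.34)
```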

D.8 PDFs for Time Series

Given a signal a = f(t) which is sampled uniformly over a time period T, its PDF, p(a), can be calculated as follows. Because the signal is uniformly sampled we have p(t) = 1/T. The function f(t) acts to transform this density from one over t to one over a. Hence, using the method for transforming PDFs, we get

p(a) = \frac{p(t)}{|da/dt|}    (D.35)

where |\cdot| denotes the absolute value and the derivative is evaluated at t = f^{-1}(a).

D.8.1 Sampling

When we convert an analogue signal into a digital one, the sampling process can have a crucial effect on the resulting density. If, for example, we attempt to sample uniformly but the sampling frequency is a multiple of the signal frequency, we are, in effect, sampling non-uniformly. For true uniform sampling it is necessary that the ratio of the sampling and signal frequencies be irrational.

D.8.2 Sine Wave

For a sine wave, a = \sin(t), we get

p(a) = \frac{p(t)}{|\cos(t)|}    (D.36)

where \cos(t) is evaluated at t = \sin^{-1}(a). The inverse sine is only defined for -\pi/2 \le t \le \pi/2 and p(t) is uniform within this range, so p(t) = 1/\pi. Therefore

p(a) = \frac{1}{\pi\sqrt{1 - a^2}}    (D.37)

This density is bimodal, having peaks at +1 and -1. For a more general sine wave

a = R\sin(wt)    (D.38)

we get p(t) = w/\pi and

p(a) = \frac{1}{\pi R \sqrt{1 - (a/R)^2}}    (D.39)

which has peaks at \pm R.
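A quick empirical check of (D.39), not taken from the original text: uniformly sampled values of a sine wave are histogrammed and compared with the analytic density; R, w, the sample size and the bin range (kept away from the peaks at \pm R) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
R, w = 3.0, 2.0
t = rng.uniform(0.0, 2.0 * np.pi / w, size=500_000)   # uniform sampling over one period
a = R * np.sin(w * t)

# Empirical density of a over part of its support, away from the peaks at +/- R.
counts, edges = np.histogram(a, bins=50, range=(-2.5, 2.5))
centres = 0.5 * (edges[:-1] + edges[1:])
empirical = counts / (a.size * np.diff(edges))

theoretical = 1.0 / (np.pi * R * np.sqrt(1.0 - (centres / R) ** 2))   # eq. (D.39)
print(np.max(np.abs(empirical - theoretical)))    # small, up to Monte Carlo and binning error
```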

Figure D.6: The PDF of a = R\sin(wt) for R = 3.

Appendix E: Multivariate Probability Distributions

E.1 Transforming PDFs

Just as univariate Probability Density Functions (PDFs) are transformed so as to preserve area, so multivariate probability distributions are transformed so as to preserve volume. If

y = f(x)    (E.1)

then this can be achieved using

p(y) = \frac{p(x)}{\mathrm{abs}(|J|)}    (E.2)

where abs() denotes the absolute value and |\cdot| the determinant, and

J = \begin{bmatrix} \partial y_1/\partial x_1 & \partial y_1/\partial x_2 & \cdots & \partial y_1/\partial x_d \\ \partial y_2/\partial x_1 & \partial y_2/\partial x_2 & \cdots & \partial y_2/\partial x_d \\ \vdots & \vdots & \ddots & \vdots \\ \partial y_d/\partial x_1 & \partial y_d/\partial x_2 & \cdots & \partial y_d/\partial x_d \end{bmatrix}    (E.3)

is the Jacobian matrix for d-dimensional vectors x and y. The partial derivatives are evaluated at x = f^{-1}(y). As the determinant of J measures the volume of the transformation, using it as a normalising term preserves the volume under the PDF as desired. See Papoulis [44] for more details.
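As a small check of (E.2), consider a linear map y = Ax of a standard bivariate Gaussian, for which the Jacobian is simply A; the matrix and the test point below are arbitrary, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

A = np.array([[2.0, 0.5],
              [0.3, 1.5]])
x = np.array([0.7, -1.2])
y = A @ x

# Change of variables (E.2): for a linear map the Jacobian is A, so |J| = det(A).
p_x = mvn(mean=np.zeros(2), cov=np.eye(2)).pdf(x)
p_y_rule = p_x / abs(np.linalg.det(A))

# Independent check: a linear map of a standard Gaussian gives y ~ N(0, A A^T).
p_y_direct = mvn(mean=np.zeros(2), cov=A @ A.T).pdf(y)
print(p_y_rule, p_y_direct)    # should agree
```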

Figure E.1: (a) 3D-plot and (b) contour plot of the multivariate Gaussian PDF with \mu = [1, 1]^T, \sigma_{11} = \sigma_{22} = 1 and \sigma_{12} = \sigma_{21} = 0.6, i.e. a positive correlation of r = 0.6.

E.1.1 Mean and Covariance

For a vector of random variables (Gaussian or otherwise), x, with mean \mu_x and covariance \Sigma_x, a linear transformation

y = Fx + C    (E.4)

gives rise to a random vector y with mean

\mu_y = F\mu_x + C    (E.5)

and covariance

\Sigma_y = F\Sigma_x F^T    (E.6)

If we generate another random vector, this time from a different linear transformation of x,

z = Gx + D    (E.7)

then the covariance between the random vectors y and z is given by

\Sigma_{y,z} = F\Sigma_x G^T    (E.8)

The (i, j)th entry in this matrix is the covariance between y_i and z_j.
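A Monte Carlo sketch of (E.5), (E.6) and (E.8), assuming NumPy; the matrices F and G, the offsets C and D, and the values of \mu_x and \Sigma_x are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(4)
mu_x = np.array([1.0, -2.0, 0.5])
Sigma_x = np.array([[2.0, 0.3, 0.1],
                    [0.3, 1.0, 0.2],
                    [0.1, 0.2, 1.5]])
F = np.array([[1.0, 0.5, -0.2],
              [0.0, 2.0, 1.0]]);  C = np.array([1.0, 0.0])
G = np.array([[0.5, -1.0, 0.0],
              [1.5, 0.0, 0.3]]);  D = np.array([0.0, 2.0])

x = rng.multivariate_normal(mu_x, Sigma_x, size=200_000)
y = x @ F.T + C
z = x @ G.T + D

print(y.mean(axis=0), F @ mu_x + C)        # mean, eq. (E.5)
print(np.cov(y.T), F @ Sigma_x @ F.T)      # covariance, eq. (E.6)

# Cross-covariance (E.8): the top-right 2x2 block of the joint covariance of [y, z].
joint = np.cov(np.hstack([y, z]).T)
print(joint[:2, 2:], F @ Sigma_x @ G.T)
```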

E.2 The Multivariate Gaussian

The multivariate normal PDF for d variables is

N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)    (E.9)

where the mean \mu is a d-dimensional vector, \Sigma is a d \times d covariance matrix, and |\Sigma| denotes the determinant of \Sigma.

E.2.1 Entropy

The entropy is

H(x) = \frac{1}{2}\log|\Sigma| + \frac{d}{2}\log 2\pi + \frac{d}{2}    (E.10)

E.2.2 Relative Entropy

For Normal densities q(x) = N(x; \mu_q, \Sigma_q) and p(x) = N(x; \mu_p, \Sigma_p) the KL-divergence is

D[q||p] = 0.5\log\frac{|\Sigma_p|}{|\Sigma_q|} + 0.5\,\mathrm{Trace}(\Sigma_p^{-1}\Sigma_q) + 0.5(\mu_q - \mu_p)^T \Sigma_p^{-1} (\mu_q - \mu_p) - \frac{d}{2}    (E.11)

where |\Sigma_p| denotes the determinant of the matrix \Sigma_p.
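Equation (E.11) is straightforward to evaluate directly; the two-dimensional parameter values below are illustrative.

```python
import numpy as np

d = 2
mu_q = np.array([1.0, 0.0]);  Sigma_q = np.array([[1.0, 0.3], [0.3, 0.5]])
mu_p = np.array([0.0, -1.0]); Sigma_p = np.array([[2.0, 0.0], [0.0, 1.0]])

inv_Sp = np.linalg.inv(Sigma_p)
diff = mu_q - mu_p

# KL-divergence between multivariate Gaussians, eq. (E.11).
kl = 0.5 * (np.log(np.linalg.det(Sigma_p) / np.linalg.det(Sigma_q))
            + np.trace(inv_Sp @ Sigma_q)
            + diff @ inv_Sp @ diff
            - d)
print(kl)
```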

E.3 The Multinomial Distribution

If a random variable x can take one of m discrete values x_1, x_2, ..., x_m and

p(x = x_s) = \pi_s    (E.12)

then x is said to have a multinomial distribution.

E.4 The Dirichlet Distribution

If \pi = [\pi_1, \pi_2, ..., \pi_m] are the parameters of a multinomial distribution then

q(\pi) = \Gamma(\lambda_{tot}) \prod_{s=1}^{m} \frac{\pi_s^{\lambda_s - 1}}{\Gamma(\lambda_s)}    (E.13)

defines a Dirichlet distribution over these parameters, where

\lambda_{tot} = \sum_s \lambda_s    (E.14)

The mean value of \pi_s is \lambda_s/\lambda_{tot}.

E.4.1 Relative Entropy

For Dirichlet densities q(\pi) = D(\pi; \lambda_q) and p(\pi) = D(\pi; \lambda_p), where the number of states is m, \lambda_q = [\lambda_q(1), \lambda_q(2), ..., \lambda_q(m)] and \lambda_p = [\lambda_p(1), \lambda_p(2), ..., \lambda_p(m)], the KL-divergence is

D[q||p] = \log\Gamma(\lambda_q^{tot}) - \log\Gamma(\lambda_p^{tot}) + \sum_{s=1}^{m}\left[(\lambda_q(s) - 1)(\Psi(\lambda_q(s)) - \Psi(\lambda_q^{tot})) - \log\Gamma(\lambda_q(s))\right] - \sum_{s=1}^{m}\left[(\lambda_p(s) - 1)(\Psi(\lambda_q(s)) - \Psi(\lambda_q^{tot})) - \log\Gamma(\lambda_p(s))\right]    (E.15)

where

\lambda_q^{tot} = \sum_{s=1}^{m}\lambda_q(s), \quad \lambda_p^{tot} = \sum_{s=1}^{m}\lambda_p(s)    (E.16)

and \Psi() is the digamma function.
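A direct evaluation of (E.15) using SciPy's gammaln and digamma, with illustrative parameter vectors (here p is the uniform Dirichlet); \lambda is written as lam in the code.

```python
import numpy as np
from scipy.special import gammaln, digamma

lam_q = np.array([3.0, 1.5, 2.0])
lam_p = np.array([1.0, 1.0, 1.0])   # uniform Dirichlet
q_tot, p_tot = lam_q.sum(), lam_p.sum()

elog_pi = digamma(lam_q) - digamma(q_tot)   # E_q[log pi_s]

# KL-divergence between Dirichlet densities, eq. (E.15).
kl = (gammaln(q_tot) - gammaln(p_tot)
      + np.sum((lam_q - 1.0) * elog_pi - gammaln(lam_q))
      - np.sum((lam_p - 1.0) * elog_pi - gammaln(lam_p)))
print(kl)
```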