Gaussian Distribution

The Gaussian distribution is the most widely known distribution, and the most widely used.

    P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

The mean is µ and the variance is σ². All Gaussians have the same symmetric shape, as opposed to the Binomial or Poisson distributions, and are easily characterized. E.g.:
•  68.3% of the probability lies within 1 standard deviation of the mean
•  95.45% within 2 standard deviations
•  99.7% within 3 standard deviations
•  FWHM = 2.35σ

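These coverage numbers follow directly from the Gaussian CDF. A minimal sketch in Python (standard library only):

```python
import math

def gauss_cdf(x, mu=0.0, sigma=1.0):
    # Gaussian CDF: 0.5 * (1 + erf((x - mu) / (sqrt(2) * sigma)))
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))

# Central probability within n standard deviations of the mean
for n in (1, 2, 3):
    p = gauss_cdf(n) - gauss_cdf(-n)
    print(f"within {n} sigma: {p:.4f}")   # 0.6827, 0.9545, 0.9973
```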

Derivation of Gauss Distribution

We consider two derivations of the Gauss function. First, the derivation starting from the binomial distribution. The appropriate limit in this case is N→∞ and r→∞, with p not too small and not too big. We have already seen that this leads to a symmetric distribution.

[Figure: Binomial distribution with N=50, p=0.5 compared to a Gaussian with µ = Np = 25, σ² = Np(1−p).]

We will need Stirling’s approximation,

    \ln n! \approx \ln\sqrt{2\pi n} + n \ln n - n, \quad\text{or}\quad n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n,

which we now substitute into the Binomial formula.

Gaussian - derivation

    f(r; N, p) = \frac{N!}{r!(N-r)!}\, p^r (1-p)^{N-r}
               \approx \frac{\sqrt{2\pi N}\,(N/e)^N}{\sqrt{2\pi r}\,(r/e)^r \, \sqrt{2\pi (N-r)}\,((N-r)/e)^{N-r}}\, p^r (1-p)^{N-r}

    = \frac{1}{\sqrt{2\pi}} \sqrt{\frac{N}{r(N-r)}}\, \frac{N^N}{r^r (N-r)^{N-r}}\, p^r (1-p)^{N-r}

    = \frac{1}{\sqrt{2\pi N}}\, \frac{N^{N+1}}{r^{r+1/2}\,(N-r)^{N-r+1/2}}\, p^r (1-p)^{N-r}

or

    f(r; N, p) \approx \frac{1}{\sqrt{2\pi N}} \left(\frac{r}{N}\right)^{-r-1/2} \left(\frac{N-r}{N}\right)^{-N+r-1/2} p^r (1-p)^{N-r}

Doesn’t look much like the Gaussian …

Derivation-cont.

Change variables: r = Np + ξ. Here ξ measures the distance between the mean of the binomial, Np, and the measured quantity, r. The variance of a binomial is Np(1−p), so the typical deviation of r from Np is given by

    \sigma = \sqrt{Np(1-p)}.

Terms of the form ξ/r (and ξ/N) will therefore be of order 1/√N and will be small. Furthermore,

    \ln(1+\xi/N) \approx \xi/N - \tfrac{1}{2}(\xi/N)^2.

First we rewrite in terms of ξ:

    \left(\frac{r}{N}\right)^{-r-1/2} = (p + \xi/N)^{-r-1/2} = p^{-r-1/2}\left(1 + \frac{\xi}{Np}\right)^{-r-1/2}

    \left(\frac{N-r}{N}\right)^{-N+r-1/2} = (1-p)^{-N+r-1/2}\left(1 - \frac{\xi}{N(1-p)}\right)^{-N+r-1/2}


Derivation-cont.

    f(r; N, p) \approx \frac{1}{\sqrt{2\pi N}} \left(\frac{r}{N}\right)^{-r-1/2} \left(\frac{N-r}{N}\right)^{-N+r-1/2} p^r (1-p)^{N-r}

so

    f(r; N, p) \approx \frac{1}{\sqrt{2\pi N p(1-p)}} \left(1 + \frac{\xi}{Np}\right)^{-r-1/2} \left(1 - \frac{\xi}{N(1-p)}\right)^{-N+r-1/2}

Rewrite in exponential form and use the approximations from the last page:

    f(r; N, p) \approx \frac{1}{\sqrt{2\pi Np(1-p)}} \exp\left[\left(-r - \tfrac{1}{2}\right)\ln\left(1 + \frac{\xi}{Np}\right) + \left(-N + r - \tfrac{1}{2}\right)\ln\left(1 - \frac{\xi}{N(1-p)}\right)\right]

    = \frac{1}{\sqrt{2\pi Np(1-p)}} \exp\left[\left(-Np - \xi - \tfrac{1}{2}\right)\left(\frac{\xi}{Np} - \frac{1}{2}\left(\frac{\xi}{Np}\right)^2\right) + \left(-N(1-p) + \xi - \tfrac{1}{2}\right)\left(-\frac{\xi}{N(1-p)} - \frac{1}{2}\left(\frac{\xi}{N(1-p)}\right)^2\right)\right]

Keeping terms up to order ξ²/N (the −1/2 pieces only contribute at higher order, while the ξ in each prefactor multiplies the linear term and must be kept, since the leading ±ξ terms cancel):

    = \frac{1}{\sqrt{2\pi Np(1-p)}} \exp\left[-\frac{\xi^2}{2Np(1-p)}\right], \qquad \sigma^2 = Np(1-p)

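A quick numerical illustration of this limit (a sketch, standard library only, with N = 50 and p = 0.5 as in the figure above):

```python
import math

def binom_pmf(r, N, p):
    return math.comb(N, r) * p**r * (1 - p)**(N - r)

def gauss(x, mu, var):
    return math.exp(-(x - mu)**2 / (2 * var)) / math.sqrt(2 * math.pi * var)

N, p = 50, 0.5
mu, var = N * p, N * p * (1 - p)   # Gaussian limit: mu = Np, sigma^2 = Np(1-p)
for r in (20, 25, 30):
    print(r, binom_pmf(r, N, p), gauss(r, mu, var))
# The two columns already agree to within a few percent at N = 50.
```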

A different derivation

Here we follow the argument used by Gauss, who wanted to solve the following problem: what is the form of the function ϕ(x_i − µ) which gives maximum probability when µ equals the arithmetic mean of the observed values {x_i},

    \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \,?

The probability to get {x_i} is

    f(\vec{x} \mid \mu) = \varphi(x_1 - \mu)\,\varphi(x_2 - \mu)\cdots\varphi(x_n - \mu).

Gauss wanted this function to peak at µ = x̄:

    \left.\frac{df}{d\mu}\right|_{\mu=\bar{x}} = \left.\frac{d}{d\mu}\prod_{i=1}^{n}\varphi(x_i - \mu)\right|_{\mu=\bar{x}} = 0.

Assuming f(µ = x̄) ≠ 0,

    \sum_i \frac{\varphi'(x_i - \bar{x})}{\varphi(x_i - \bar{x})} = 0.

Define ψ = ϕ′/ϕ and z_i = x_i − x̄. Then

    \sum_i z_i = 0 \quad\text{and}\quad \sum_i \psi(z_i) = 0

for all possible z_i; the only function whose values sum to zero whenever its arguments do is a linear one, so ψ ∝ z.

Gauss’ derivation-cont.

    \psi = kz \;\Rightarrow\; \frac{1}{\varphi}\frac{d\varphi}{dz} = kz, \quad\text{or}\quad \varphi(z) \propto \exp\left(\frac{kz^2}{2}\right)

(with k < 0, so that the distribution is normalizable). We get the prefactor via normalization.

Lessons:
•  Binomial looks like Gaussian for large enough N, p
•  Poisson also looks like Gaussian for large enough ν
•  Gauss’ formula follows from general arguments (maximizing posterior probability)
•  Gauss’ formula is much easier to use than the Binomial or Poisson, so use it when you’re allowed.

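A small numerical illustration of Gauss’ requirement (a sketch; the data values are invented for the example): the Gaussian likelihood ∏ϕ(x_i − µ) is maximized exactly at the arithmetic mean.

```python
import math

data = [1.2, 0.7, 2.1, 1.5, 0.9]   # arbitrary example measurements

def log_likelihood(mu, sigma=1.0):
    # log of prod_i phi(x_i - mu) for a unit-width Gaussian phi
    return sum(-(x - mu)**2 / (2 * sigma**2) for x in data)

# Scan mu on a fine grid and find the maximum
grid = [i / 1000.0 for i in range(0, 3001)]
best = max(grid, key=log_likelihood)
print(best, sum(data) / len(data))   # both ~ 1.28
```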

Comparison Gaussian-Poisson

Four events expected:

                parameters          µ      σ²     3rd central moment
    Binomial    N = 10, p = 0.4     4      2.4    0.48
    Poisson     ν = 4               4      4      4
    Gaussian    µ = 4, σ² = 2.4     4      2.4    0

•  In this case, the Binomial more closely resembles a Gaussian than does the Poisson.
•  Note that for the Binomial we can change N and p.

Smaller number expected:

                parameters          µ      σ²     3rd central moment
    Binomial    N = 2, p = 0.9      1.8    0.18   −0.14
    Poisson     ν = 1.8             1.8    1.8    1.8
    Gaussian    µ = 1.8, σ² = 0.18  1.8    0.18   0

In general, we need to use the Poisson or Binomial when dealing with small statistics or p ≅ 0, 1.

Larger number expected:

                parameters          µ      σ²     3rd central moment
    Binomial    N = 100, p = 0.1    10     9      7.2
    Poisson     ν = 10              10     10     10
    Gaussian    µ = 10, σ² = 9      10     9      0

For large numbers, the Gaussian is an excellent approximation.
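The table entries follow from closed-form moments (a minimal sketch; the third central moment is Np(1−p)(1−2p) for the Binomial and ν for the Poisson):

```python
def binom_moments(N, p):
    var = N * p * (1 - p)
    # mean, variance, third central moment of the Binomial
    return N * p, var, var * (1 - 2 * p)

def poisson_moments(nu):
    # For the Poisson: mean = variance = third central moment = nu
    return nu, nu, nu

print(binom_moments(10, 0.4))    # (4.0, 2.4, 0.48)
print(binom_moments(2, 0.9))     # (1.8, 0.18, -0.144)
print(binom_moments(100, 0.1))   # (10.0, 9.0, 7.2)
print(poisson_moments(4))        # (4, 4, 4)
```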

Some Applications

When we don’t know better, we use a Gaussian for unknown probability distributions. E.g., the distribution of systematic deviations from the true values. This can sometimes be justified with the Central Limit Theorem.

When reporting uncertainties on a measurement, we quote ±1σ values. These are understood as Gaussian standard deviations, and therefore refer to the probability that our measurement lies within the uncertainty of the true value (a 68.3% central probability interval).


Over-applications

From a book review of The (Mis)behavior of Markets: A Fractal View of Risk, Ruin, and Reward, by Benoit Mandelbrot and Richard L. Hudson. Review by Ian Kaplan:

Bachelier claimed that the change in market prices followed a Gaussian distribution. This distribution describes many natural features, like height, weight and intelligence among people. The Gaussian distribution is one of the foundations of modern statistics. If economic features followed a Gaussian distribution, a range of mathematical techniques could be applied in economics. Unfortunately, as Mandelbrot points out in The (Mis)behavior of Markets, the foundation of this new era of economics was rotten. …There are far more market bubbles and market crashes than these models suggest. The change in market prices does not follow a Gaussian distribution in a reliable fashion. Like income distribution, market statistics frequently follow a power law. When a graph is made of market returns (e.g., profit and loss), the curve will not fall toward zero as sharply as a Gaussian curve. The distribution of market returns has "fat tails". The "fat tails" of the return curve reflect risk, where large losses and profits can be realized.

Gaussian Distribution

    P(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

The Gaussian distribution is very important in practice: many distributions resemble Gaussians, and the Gaussian is relatively easy to work with – it can be used to estimate uncertainties, etc. The Central Limit Theorem underlies much of this, so we look into its derivation to understand how it arises. First we introduce characteristic functions; these will be generally useful.


Characteristic Function

The characteristic function is a moment-generating function:

    \varphi(k) = \int dx\, e^{ikx}\, p(x)

It is simply the Fourier transform of the p.d.f. Expanding the exponential,

    \varphi(k) = \int dx\, p(x)\left[1 + ikx - \frac{k^2 x^2}{2!} - \frac{i k^3 x^3}{3!} + \cdots\right] = 1 + ik\langle x\rangle - \frac{k^2}{2!}\langle x^2\rangle + \cdots + \frac{(ik)^n}{n!}\langle x^n\rangle + \cdots

so

    \left.\frac{d^n \varphi(k)}{dk^n}\right|_{k=0} = i^n\, \langle x^n \rangle.

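A minimal numerical sketch of the moment-generating property, using a flat distribution on [0, 1] (mean 1/2, second moment 1/3) as the test case:

```python
import numpy as np

# p.d.f. of a flat distribution on [0, 1]
x = np.linspace(0.0, 1.0, 20001)
p = np.ones_like(x)

def phi(k):
    # characteristic function: integral of exp(ikx) p(x) dx
    return np.trapz(np.exp(1j * k * x) * p, x)

h = 1e-3
d1 = (phi(h) - phi(-h)) / (2 * h)               # first derivative at k = 0
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2   # second derivative at k = 0
print((d1 / 1j).real)      # ~ 0.5 = <x>
print((d2 / 1j**2).real)   # ~ 1/3 = <x^2>
```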

Characteristic Function

Characteristic function for a Gaussian:

    \varphi(k) = \int_{-\infty}^{\infty} dx\, e^{ikx}\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
               = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} dx\, \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma} - ik\sigma\right)^2\right] \exp\left(ik\mu - \frac{k^2\sigma^2}{2}\right)

where we have used

    \int_{-\infty}^{\infty} e^{-z^2/a^2}\, dz = a\sqrt{\pi},

so

    \varphi(k) = e^{ik\mu}\, e^{-\frac{k^2\sigma^2}{2}}.
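A numerical cross-check of this result (a sketch; µ and σ are chosen arbitrarily):

```python
import numpy as np

mu, sigma = 1.0, 2.0
x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 200001)
p = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

for k in (0.3, 0.7):
    numeric = np.trapz(np.exp(1j * k * x) * p, x)
    exact = np.exp(1j * k * mu - k**2 * sigma**2 / 2)
    print(k, numeric, exact)   # agree to high precision
```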

Characteristic Function

Suppose x is a random variable with pdf p_x(x), y is an independent random variable with pdf p_y(y), and z = f(x, y). We are interested in the probability that z lies in the interval z → z + dz; call this p_z(z) dz. The characteristic function of z is

    \varphi_z(k) = \int e^{ikz}\, p_z(z)\, dz = \int\!\!\int e^{ikf(x,y)}\, p_x(x)\, dx\; p_y(y)\, dy.

(Make sure this is clear.)

Once we have the characteristic function, we can get the pdf for z with an inverse Fourier transform:

    p_z(z) = \frac{1}{2\pi}\int e^{-ikz}\, \varphi_z(k)\, dk.


Central Limit Theorem

As a concrete example, suppose z = x + y. Then

    \varphi_z(k) = \int\!\!\int e^{ikx}\, p(x)\, dx\; e^{iky}\, q(y)\, dy = \varphi_x(k)\,\varphi_y(k),

i.e. the characteristic function of a sum of independent random variables is the product of the individual characteristic functions.

We now use this to prove the CLT. Suppose we make n measurements of x. The average of the measurements is

    a = \frac{1}{n}(x_1 + x_2 + \cdots + x_n).

What is the distribution of a? It is simpler to consider the distribution of a − µ, Q(a − µ), where µ = ⟨x⟩:

    \Phi(k) = \int e^{ik(a-\mu)}\, Q(a-\mu)\, da.
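The product rule is easy to verify by Monte Carlo (a sketch; the two input distributions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(1.0, size=1_000_000)   # arbitrary choice for p(x)
y = rng.uniform(0.0, 2.0, size=1_000_000)  # arbitrary choice for q(y)

def emp_char(samples, k):
    # empirical characteristic function E[exp(ikz)]
    return np.mean(np.exp(1j * k * samples))

k = 0.5
print(emp_char(x + y, k))               # ~ phi_x(k) * phi_y(k)
print(emp_char(x, k) * emp_char(y, k))
```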

Central Limit Theorem-cont.

    \Phi(k) = \int e^{\frac{ik}{n}[(x_1-\mu)+\cdots+(x_n-\mu)]}\, p(x_1)\,dx_1 \cdots p(x_n)\,dx_n
            = \left[\int e^{\frac{ik}{n}(x_1-\mu)}\, p(x_1)\,dx_1\right] \cdots \left[\int e^{\frac{ik}{n}(x_n-\mu)}\, p(x_n)\,dx_n\right]
            = \left[\varphi\!\left(\frac{k}{n}\right)\right]^n

where ϕ(k) is the characteristic function of x − µ:

    \varphi(k) = \int e^{ik(x-\mu)}\, p(x)\, dx = 1 + ik\,\langle x-\mu\rangle - \frac{k^2}{2}\langle (x-\mu)^2\rangle + \cdots = 1 - \frac{k^2\sigma^2}{2} + \cdots

so

    \Phi(k) = \left[\varphi\!\left(\frac{k}{n}\right)\right]^n = \left[1 - \frac{1}{2}\frac{k^2\sigma^2}{n^2} + \cdots\right]^n \;\xrightarrow{\;n\to\infty\;}\; e^{-\frac{k^2\sigma^2}{2n}}.

Central Limit Theorem-cont.

To get the pdf, we use an inverse Fourier transform:

    Q(a-\mu) = \frac{1}{2\pi}\int dk\, e^{-ik(a-\mu)}\, e^{-\frac{k^2\sigma^2}{2n}} = \frac{1}{\sqrt{2\pi}\,\xi}\, e^{-\frac{(a-\mu)^2}{2\xi^2}}

i.e.

    Q(a-\mu) = P(a) = \sqrt{\frac{n}{2\pi\sigma^2}}\; e^{-\frac{n(a-\mu)^2}{2\sigma^2}}.

The distribution of the average of a large number of measurements of a random variable x (given here by a) follows a Gaussian distribution. The width of the Gaussian is given by

    \xi = \frac{\sigma}{\sqrt{n}}

where σ is the standard deviation of x. The shape of the initial distribution is unimportant!

Central Limit Theorem-Example

10 experiments in which we sample 10 times randomly from a flat distribution. The data are shown as the black bars; the red bar gives the mean of the 10 samples.


Central Limit Theorem-Example

The mean value from 1000 experiments, each with 10 samplings of the distribution. The red curve is a Gaussian with

    \mu = 0.5 \quad\text{and}\quad \sigma = \frac{1}{\sqrt{12}}\,\frac{1}{\sqrt{10}}.

Do you understand how the factors arise? (1/√12 is the standard deviation of a flat distribution on [0, 1], and the 1/√10 comes from averaging n = 10 samples.)

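The example is straightforward to reproduce (a minimal sketch):

```python
import math
import numpy as np

rng = np.random.default_rng(0)
# 1000 experiments, each averaging 10 samples from a flat distribution on [0, 1]
means = rng.uniform(0.0, 1.0, size=(1000, 10)).mean(axis=1)

print(means.mean())                       # ~ 0.5
print(means.std())                        # ~ 0.0913
print(1 / math.sqrt(12) / math.sqrt(10))  # = 0.0913
```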

Central Limit Theorem - conclusion

When results are presented, the uncertainties are usually quoted assuming Gaussian distributions:
•  For event counting, we have seen that the Binomial and Poisson reduce to the Gaussian distribution for large numbers of events (≥ 25 or so). The statistical error (1 Gaussian standard deviation) is then taken to be σ = √N (from the Poisson distribution).
•  For other types of uncertainties (so-called systematic uncertainties or systematic errors), a Gaussian distribution is again often assumed to describe the distribution of the measured value relative to the true one. This is usually justified with the CLT, although it is a rather indirect use. Examples of systematic uncertainties: energy calibration, alignment, time dependence, …


Full Width Half Maximum (FWHM)

This quantity is often used instead of σ to quantify the width of a distribution:

    G(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad \text{peak at } x = \mu.

The FWHM is the separation of the two points where the curve falls to half its peak value:

    e^{-\frac{(x-\mu)^2}{2\sigma^2}} = 0.5 \;\Rightarrow\; x = \mu \pm \sigma\sqrt{2\ln 2},

so

    \mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.35\,\sigma.

[Figure: a Gaussian curve with the FWHM indicated.]
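A numerical check of the half-maximum point by bisection (a sketch; µ and σ are arbitrary):

```python
import math

mu, sigma = 0.0, 3.0

def half_max_condition(x):
    # G(x)/G(mu) - 0.5 = 0 at the half-maximum points
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) - 0.5

# Bisection on [mu, mu + 3*sigma] for the right half-maximum point
lo, hi = mu, mu + 3 * sigma
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if half_max_condition(mid) > 0:
        lo = mid
    else:
        hi = mid

fwhm = 2 * (lo - mu)
print(fwhm / sigma)   # ~ 2*sqrt(2*ln 2) = 2.3548
```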

Gaussian used for Binomial or Poisson

Probability of r successes in N trials:

    f(r; N, p) = \frac{N!}{r!(N-r)!}\, p^r q^{N-r}, \qquad q = 1-p,

where the number of combinations N!/(r!(N−r)!) is the binomial coefficient.

    Binomial: \mu = Np, \quad \sigma = \sqrt{Np(1-p)}

    Poisson: f(n; \nu) = \frac{\nu^n e^{-\nu}}{n!}, \qquad \mu = \nu, \quad \sigma = \sqrt{\nu}

For the Poisson, E[n] = ν by definition and σ² = ν; variance = mean is its most important property.


Poisson Distribution-cont.

    f(n; \nu) = \frac{\nu^n e^{-\nu}}{n!}, \qquad E[n] = \nu, \quad \sigma^2 = \nu

[Figure: Poisson distributions for ν = 0.1, 0.5, 1.0, 2.0, 5.0, 10, 20, 50.]

Notes:
•  As ν increases, the distribution becomes more symmetric.
•  It is approximately Gaussian for ν > 20.
•  The Poisson formula is much easier to use than the Binomial formula.

Gaussian used for Binomial or Poisson

The Gaussian is a continuous distribution, whereas the Binomial and Poisson are discrete. We need to integrate the Gaussian to get the probability for a given outcome. E.g., for the Poisson:

    f(n; \nu) = \frac{e^{-\nu}\nu^n}{n!} \quad\longleftrightarrow\quad G(n; \mu=\nu, \sigma=\sqrt{\nu}) = \int_{n-0.5}^{n+0.5} \frac{1}{\sqrt{2\pi\nu}}\, e^{-\frac{(x-\nu)^2}{2\nu}}\, dx

Comparison:

    f(3; 0.5) = 0.013,   G(3; 0.5, \sqrt{0.5}) = 0.0023
    f(10; 9) = 0.12,     G(10; 9, \sqrt{9}) = 0.12

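These comparisons can be reproduced with the erf-based Gaussian CDF from the next slide (a minimal sketch):

```python
import math

def poisson_pmf(n, nu):
    return math.exp(-nu) * nu**n / math.factorial(n)

def gauss_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (math.sqrt(2) * sigma)))

def gauss_prob(n, nu):
    # Integrate the Gaussian with mu = nu, sigma = sqrt(nu) from n-0.5 to n+0.5
    s = math.sqrt(nu)
    return gauss_cdf(n + 0.5, nu, s) - gauss_cdf(n - 0.5, nu, s)

print(poisson_pmf(3, 0.5), gauss_prob(3, 0.5))     # 0.0126 vs 0.0023
print(poisson_pmf(10, 9.0), gauss_prob(10, 9.0))   # 0.1186 vs 0.1253, both ~ 0.12
```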

Cumulative Distribution Function for Gaussian

    \mathrm{CDF}(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x'-\mu)^2}{2\sigma^2}}\, dx' = \frac{1}{2}\left[1 + \mathrm{erf}\!\left(\frac{x-\mu}{\sqrt{2}\,\sigma}\right)\right]

The ‘error function’ is available in many computer math libraries.

Sum and difference of two independent Gaussian-distributed quantities:

    u = x + y: \quad p(u; \mu_x, \sigma_x, \mu_y, \sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_u}\, e^{-\frac{(u-\mu_u)^2}{2\sigma_u^2}}, \qquad \mu_u = \mu_x + \mu_y, \quad \sigma_u^2 = \sigma_x^2 + \sigma_y^2

    v = x - y: \quad p(v; \mu_x, \sigma_x, \mu_y, \sigma_y) = \frac{1}{\sqrt{2\pi}\,\sigma_v}\, e^{-\frac{(v-\mu_v)^2}{2\sigma_v^2}}, \qquad \mu_v = \mu_x - \mu_y, \quad \sigma_v^2 = \sigma_x^2 + \sigma_y^2
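A Monte Carlo check of these rules (a sketch; the µ’s and σ’s are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(1.0, 0.5, size=1_000_000)   # mu_x = 1.0, sigma_x = 0.5
y = rng.normal(2.0, 1.2, size=1_000_000)   # mu_y = 2.0, sigma_y = 1.2

for z in (x + y, x - y):
    print(z.mean(), z.var())
# means ~ 3.0 and ~ -1.0; variance ~ 0.5**2 + 1.2**2 = 1.69 in both cases
```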

Multivariate Gaussian

    \vec{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{pmatrix}, \qquad
    \Sigma = \begin{pmatrix}
      \mathrm{cov}(x_1, x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_N) \\
      \mathrm{cov}(x_2, x_1) & & & \vdots \\
      \vdots & & \ddots & \vdots \\
      \mathrm{cov}(x_N, x_1) & \cdots & \cdots & \mathrm{cov}(x_N, x_N)
    \end{pmatrix}

    f(x_1, x_2, \ldots, x_N) = \frac{1}{(2\pi)^{N/2}\, |\Sigma|^{1/2}}\, \exp\left[-\frac{1}{2}(\vec{x}-\vec{\mu})^T\, \Sigma^{-1}\, (\vec{x}-\vec{\mu})\right]

Example: bivariate, with zero means,

    f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\, \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho xy}{\sigma_x\sigma_y}\right)\right]
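A minimal sketch evaluating the general N-dimensional density and checking it against the bivariate formula (the test point and parameters are arbitrary):

```python
import numpy as np

def multivariate_gauss(x, mu, cov):
    # f(x) = exp(-0.5 (x-mu)^T Sigma^-1 (x-mu)) / ((2 pi)^(N/2) |Sigma|^(1/2))
    d = x - mu
    N = len(mu)
    norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

# Bivariate case: widths sigma_x, sigma_y and correlation rho
sx, sy, rho = 1.0, 2.0, 0.3
cov = np.array([[sx**2, rho * sx * sy],
                [rho * sx * sy, sy**2]])
x, y = 0.5, -1.0

general = multivariate_gauss(np.array([x, y]), np.zeros(2), cov)
bivariate = np.exp(-((x / sx)**2 + (y / sy)**2 - 2 * rho * x * y / (sx * sy))
                   / (2 * (1 - rho**2))) / (2 * np.pi * sx * sy * np.sqrt(1 - rho**2))
print(general, bivariate)   # identical
```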