5: Estimation: Maximum Likelihood Method

1 Principle and examples

The principle of the Maximum Likelihood method (ML) is to estimate a parameter $a$ by finding the value $\hat a$ which maximizes the likelihood:

$$ \frac{d \log L}{da}\bigg|_{a=\hat a} = 0 $$

Note that one does not need to "justify" such an estimator: it is an estimator, and the open questions are: is it consistent, unbiased and efficient?

Example: lifetime with acceptance

Suppose we run an experiment to measure the lifetime of an elementary particle, with limited acceptance: the decay time can only be measured for those events with $T_{\min} < T < T_{\max}$, both of the same order of magnitude as the actual lifetime. The probability function, after proper normalization, is then:

$$ P(t_i; t_0) = \frac{1}{t_0}\, \frac{e^{-t_i/t_0}}{e^{-T_{\min}/t_0} - e^{-T_{\max}/t_0}} $$

$$ \log L = \sum_{i=1}^{N} \left[ -\log t_0 - \frac{t_i}{t_0} - \log\!\left(e^{-T_{\min}/t_0} - e^{-T_{\max}/t_0}\right) \right] $$

$$ \frac{d \log L}{dt_0} = -\frac{N}{t_0} + \sum_{i} \frac{t_i}{t_0^2} - N\, \frac{(T_{\min}/t_0^2)\, e^{-T_{\min}/t_0} - (T_{\max}/t_0^2)\, e^{-T_{\max}/t_0}}{e^{-T_{\min}/t_0} - e^{-T_{\max}/t_0}} $$

We are looking for the value $\hat t_0$ which maximizes this quantity:

$$ 0 = -N \hat t_0 + \sum_{i} t_i + N\, \frac{T_{\max}\, e^{-T_{\max}/\hat t_0} - T_{\min}\, e^{-T_{\min}/\hat t_0}}{e^{-T_{\min}/\hat t_0} - e^{-T_{\max}/\hat t_0}} $$

$$ \hat t_0 = \frac{1}{N}\sum_{i} t_i + \frac{T_{\max}\, e^{-T_{\max}/\hat t_0} - T_{\min}\, e^{-T_{\min}/\hat t_0}}{e^{-T_{\min}/\hat t_0} - e^{-T_{\max}/\hat t_0}} $$

which then needs to be solved numerically. See the code at ML.m and note the alternative to solving the equation.
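ML.m is the course's own code and is not reproduced here; as a rough illustration, here is a Python sketch (with made-up values of $T_{\min}$, $T_{\max}$ and $t_0$, and simulated data) that solves the equation for $\hat t_0$ by simple fixed-point iteration and, as the alternative, maximizes $\log L$ directly.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Illustrative (made-up) acceptance window, true lifetime and simulated data
rng = np.random.default_rng(1)
Tmin, Tmax, t0_true = 1.0, 5.0, 2.0
t = rng.exponential(t0_true, 20000)
t = t[(t > Tmin) & (t < Tmax)]        # keep only the accepted decay times

def correction(t0):
    """Acceptance correction term of the ML equation for t0."""
    num = Tmax * np.exp(-Tmax / t0) - Tmin * np.exp(-Tmin / t0)
    den = np.exp(-Tmin / t0) - np.exp(-Tmax / t0)
    return num / den

# Fixed-point iteration of  t0 = mean(t) + correction(t0)
t0_hat = t.mean()
for _ in range(100):
    t0_hat = t.mean() + correction(t0_hat)

# Alternative: maximize log L directly, i.e. minimize -log L
def neg_log_L(t0):
    norm = np.exp(-Tmin / t0) - np.exp(-Tmax / t0)
    return -np.sum(-np.log(t0) - t / t0 - np.log(norm))

res = minimize_scalar(neg_log_L, bounds=(0.1, 20.0), method="bounded")
print(t0_hat, res.x)                  # both close to t0_true
```

Both routes give the same estimate; minimizing $-\log L$ with an existing routine generalizes more easily to several parameters.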

Example: weighting measurements with Gaussian errors

Assume that the same quantity $X$ has been measured $N$ times, each time giving the result $x_i$ with a precision $\sigma_i$, the errors being normally distributed. What is the maximum likelihood estimator of $X$?

$$ \log L = -\sum_{i=1}^{N} \log(\sigma_i \sqrt{2\pi}) - \sum_{i=1}^{N} \frac{(x_i - X)^2}{2\sigma_i^2} $$

$$ \frac{d \log L}{dX} = \sum_{i=1}^{N} \frac{x_i - X}{\sigma_i^2} $$

$$ \sum_{i=1}^{N} \frac{\hat X}{\sigma_i^2} = \sum_{i} \frac{x_i}{\sigma_i^2} $$

$$ \hat X = \frac{\sum x_i/\sigma_i^2}{\sum 1/\sigma_i^2} $$

This is the weighted average intuitively justified in section 3.2.2. Note that one can evaluate its variance by:

$$ \frac{d^2 \log L}{dX^2} = -\sum_{i=1}^{N} \frac{1}{\sigma_i^2} $$

$$ V = \left( -\frac{d^2 \log L}{dX^2} \right)^{-1} = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2} $$

which, of course, in the case of measurements with the same uncertainty, becomes $V = \sigma^2/N$.
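As a quick numerical illustration (a minimal sketch, not part of the original notes, with made-up measurements), the weighted average and its variance:

```python
import numpy as np

# Made-up measurements x_i of the same quantity, with uncertainties sigma_i
x     = np.array([10.2,  9.8, 10.5,  9.9])
sigma = np.array([ 0.3,  0.2,  0.5,  0.2])

w     = 1.0 / sigma**2                 # inverse-variance weights
X_hat = np.sum(w * x) / np.sum(w)      # ML (weighted) average
V     = 1.0 / np.sum(w)                # variance of the estimator

print(X_hat, np.sqrt(V))
```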

2 Basic properties of the ML method

It can be shown that ML estimators are usually consistent. However, the bad news is that they are usually biased. Let us first illustrate this with an example.

2.1 Example of a bias: estimating µ and σ

We want to estimate the mean $\mu$ and the standard deviation $\sigma$ of a normal distribution for which we have $N$ measurements. The likelihood is:

$$ \log L = -\sum_{i=1}^{N} \log(\sigma \sqrt{2\pi}) - \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} $$

$$ \frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i - \mu) $$

$$ \frac{\partial \log L}{\partial \sigma} = +\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{\sigma^3} - \sum_{i} \frac{1}{\sigma} $$

And the solution to this system of two equations is:

$$ \hat\mu = \frac{1}{N} \sum_{i=1}^{N} x_i $$

$$ \hat\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat\mu)^2 $$

which is the biased estimator of $\sigma^2$! The reason for such a bias is that ML estimators have the nice property of being invariant under parameter change: by construction, if the ML estimator of the parameter $a$ is $\hat a$, then the ML estimator of a related parameter $b = f(a)$ will be $\hat b = f(\hat a)$. The price one has to pay for this is clearly biasedness, since in general $\langle f(a) \rangle \neq f(\langle a \rangle)$. But the good news is that in frequent situations one can recover the property of unbiasedness by finding the function of $\hat a$ which indeed is an unbiased estimator of $f(a)$.
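A small Monte Carlo check of this bias (a sketch not in the original notes; sample size, seed and number of trials are arbitrary), comparing the ML estimator $\hat\sigma^2$ with the $1/(N-1)$ corrected version:

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials, mu, sigma = 5, 200_000, 0.0, 1.0

x = rng.normal(mu, sigma, size=(trials, N))
mu_hat = x.mean(axis=1)

# ML estimator (divides by N): biased for sigma^2
s2_ml = np.mean((x - mu_hat[:, None])**2, axis=1)
# Corrected estimator (divides by N-1): unbiased for sigma^2
s2_unb = s2_ml * N / (N - 1)

print(s2_ml.mean(), s2_unb.mean())   # ~0.8 vs ~1.0 for N = 5
```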

2.2 Example of the benefit of a parameter change

Let's consider the classic exponential distribution $f(x; \lambda) = \lambda e^{-\lambda x}$. The Maximum Likelihood method gives:

$$ \log L = N \log\lambda - \sum_{i=1}^{N} \lambda x_i $$

$$ \frac{d \log L}{d\lambda} = \frac{N}{\lambda} - \sum_{i=1}^{N} x_i $$

$$ \hat\lambda = \frac{N}{\sum x_i} $$

Unfortunately this estimator is clearly biased: $\langle \sum_i x_i / N \rangle$ is indeed $1/\lambda$, but $\langle N / \sum_i x_i \rangle \neq \lambda$. Repeating the exercise with the different parameterization $t_0 = 1/\lambda$ yields:

$$ f(x; t_0) = \frac{1}{t_0}\, e^{-x/t_0} $$

$$ \log L = -N \log t_0 - \sum_{i=1}^{N} \frac{x_i}{t_0} $$

$$ \frac{d \log L}{dt_0} = -\frac{N}{t_0} + \sum_{i=1}^{N} \frac{x_i}{t_0^2} = \frac{N}{t_0^2}\left( \frac{\sum x_i}{N} - t_0 \right) $$

$$ \hat t_0 = \frac{\sum x_i}{N} $$

which is an unbiased and moreover efficient estimator, since we can, as explained in section 4.3.4, write

$$ \frac{d \log L}{dt_0} = (\hat t_0 - t_0)\, f(t_0) $$
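To make the bias concrete, here is a small Monte Carlo sketch (not from the original notes; sample size and seed are arbitrary): for exponential data, $\hat t_0 = \bar x$ averages to $t_0$, while $\hat\lambda = N/\sum x_i$ averages to $\lambda N/(N-1)$, not $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials, lam = 5, 200_000, 2.0      # true lambda = 2, i.e. t0 = 0.5

x = rng.exponential(1.0 / lam, size=(trials, N))

t0_hat  = x.mean(axis=1)              # ML estimator of t0 = 1/lambda
lam_hat = N / x.sum(axis=1)           # ML estimator of lambda

print(t0_hat.mean())    # ~0.5 : unbiased
print(lam_hat.mean())   # ~2.5 : biased, E[lam_hat] = lambda*N/(N-1)
```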

3 Asymptotic properties of the Maximum Likelihood

At large $N$, since the ML estimator is in general consistent, it is unbiased. Let us show that it is also efficient. Suppose that the true value of $a$ is $a_0$. By construction, one has:

$$ \frac{d \log L}{da}\bigg|_{a=\hat a} = 0 $$

Because of consistency, we can perform an expansion around $a = a_0$:

$$ \frac{d \log L}{da}\bigg|_{a=a_0} + (\hat a - a_0)\, \frac{d^2 \log L}{da^2}\bigg|_{a=a_0} = 0 $$

$\hat a$ differs from the true value $a_0$ because the derivative of $\log L$ at $a_0$ differs from zero due to statistical fluctuations in the data. We saw in section 4.3.2 that $d \log L/da$ has zero expectation. The central limit theorem ensures it to be Gaussian, since it is obtained by summing $N$ independent terms, one for each of the $x_i$'s, and its variance is:

$$ V\!\left( \frac{d \log L}{da}\bigg|_{a=a_0} \right) = \left\langle \left( \frac{d \log L}{da} \right)^{2} \right\rangle = -\left\langle \frac{d^2 \log L}{da^2} \right\rangle $$

Thus the distribution of $\hat a - a_0$, which is also Gaussian of mean 0, has a variance of:

$$ V(\hat a - a_0) = \frac{ -\left\langle d^2 \log L/da^2 \right\rangle }{ \left( d^2 \log L/da^2 \big|_{a=a_0} \right)^{2} } $$

At large $N$ the second derivative, being itself a sum of $N$ independent terms, approaches its expectation value, so that

$$ V(\hat a) = \frac{-1}{\left\langle d^2 \log L/da^2 \right\rangle} $$

which is the Minimum Variance Bound: the ML estimator is asymptotically efficient. For several parameters $a_i$, the same argument gives the inverse covariance matrix:

$$ \left(V^{-1}\right)_{ij} = \left\langle \frac{\partial \log L}{\partial a_i}\, \frac{\partial \log L}{\partial a_j} \right\rangle = -\left\langle \frac{\partial^2 \log L}{\partial a_i\, \partial a_j} \right\rangle = -\frac{\partial^2 \log L}{\partial a_i\, \partial a_j}\bigg|_{a=a_0} $$
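As a quick numerical check of these asymptotic statements (a Python sketch with assumed values, not part of the original notes), take exponential decay times: $\hat t_0 = \bar x$ should be close to Gaussian, with a variance reaching the MVB $t_0^2/N$.

```python
import numpy as np

rng = np.random.default_rng(3)
N, trials, t0 = 200, 50_000, 2.0

x = rng.exponential(t0, size=(trials, N))
t0_hat = x.mean(axis=1)

print(t0_hat.mean())                    # ~2.0 : consistent / unbiased
print(t0_hat.var(), t0**2 / N)          # empirical variance ~ MVB = t0^2/N
print(np.mean(np.abs(t0_hat - t0) < np.sqrt(t0**2 / N)))  # ~0.68 if Gaussian
```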

4 Errors of an unbiased, efficient ML estimator

We are now working either at large $N$, or at moderate $N$ where, in the latter case, we assume that we have an ML estimator which we know is unbiased and efficient. In either case, according to section 4.3.4, one can write, because the MVB is reached:

$$ \frac{d \log L}{da} = A(a)\,(\hat a - a) $$

Differentiating this with respect to $a$ gives, noting once more that $\hat a$ is independent of $a$:

$$ \frac{d^2 \log L}{da^2} = \frac{dA}{da}\,(\hat a - a) - A $$

Taking the expectation value gives, noting that $\hat a$ is unbiased and $A$ is independent of the $x_i$'s:

$$ 0 - A = \left\langle \frac{d^2 \log L}{da^2} \right\rangle $$

Evaluating again the same formula at $a = \hat a$ yields:

$$ A = -\frac{d^2 \log L}{da^2}\bigg|_{a=\hat a} $$

Thus, the inverse of the variance of $\hat a$ can be written:

$$ V(\hat a)^{-1} = -\left\langle \frac{d^2 \log L}{da^2} \right\rangle = -\frac{d^2 \log L}{da^2}\bigg|_{a=\hat a} $$

We saw in the previous section that, thanks to the Central Limit Theorem, the probability distribution for $\hat a$ is Gaussian. For this to be exactly true, the second derivative $d^2 \log L/da^2$ has to be constant; for it to be a good approximation, it has to change little over the range of $a$ close to $a_0$. Then we can integrate the above equation

$$ \frac{d \log L}{da} = A(a)\,(\hat a - a) $$

into:

$$ L(x_i; a) = \alpha\, e^{-\frac{A}{2}(a - \hat a)^2} $$

which means that the log-likelihood function is a parabola. $1/\sqrt{A}$, the standard deviation of the Gaussian $L$, is also the standard deviation of the estimator $\hat a$.

Conclusion: if a Maximum Likelihood estimator is unbiased and efficient, the asymptotic properties hold at moderate $N$: the likelihood function is a Gaussian, and the log-likelihood is a parabola whose parameter is the standard deviation of the estimator.
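To see the parabola in practice (a sketch with assumed values, not from the original notes), one can compare, for exponential decay times, the error obtained from the curvature $-d^2\log L/dt_0^2$ at the maximum with the one read off the points where $\log L$ drops by $1/2$:

```python
import numpy as np

rng = np.random.default_rng(4)
N, t0_true = 500, 2.0
x = rng.exponential(t0_true, N)

def log_L(t0):
    return -N * np.log(t0) - x.sum() / t0

t0_hat = x.mean()                               # ML estimate

# Error from the curvature: V^{-1} = -d2(log L)/dt0^2 at the maximum
d2 = (N / t0_hat**2) - 2.0 * x.sum() / t0_hat**3
sigma_curv = 1.0 / np.sqrt(-d2)

# Error from the Delta(log L) = -1/2 rule on a scan of the log-likelihood
grid = np.linspace(0.5 * t0_hat, 1.5 * t0_hat, 10_001)
inside = grid[log_L(grid) >= log_L(t0_hat) - 0.5]
sigma_scan = 0.5 * (inside.max() - inside.min())

print(sigma_curv, sigma_scan, t0_hat / np.sqrt(N))   # the three should be close
```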

5 More examples of the Maximum Likelihood at work

5.1 Another example illustrating the benefit of parameter change

One measures an angle $\theta$ by two independent measurements of $\cos\theta$ and $\sin\theta$ with the same uncertainty $\sigma$ (errors are normal). What are the ML estimators of $\tan\theta$ and $\theta$? Are they biased? Efficient?

Let us call $c$ the measurement of $\cos\theta$ and $s$ that of $\sin\theta$; one has, denoting the Normal distribution $N(x; \mu, \sigma)$:

$$ L = N(c; \cos\theta, \sigma) \cdot N(s; \sin\theta, \sigma) $$

$$ \log L = -\frac{(c - \cos\theta)^2}{2\sigma^2} - \frac{(s - \sin\theta)^2}{2\sigma^2} + \text{const} $$

$$ \log L = -\frac{1}{2\sigma^2}\left[ 1 + c^2 + s^2 - 2c\cos\theta - 2s\sin\theta \right] + \text{const} $$

$$ \frac{d \log L}{d\theta} = \frac{1}{\sigma^2}\,(s\cos\theta - c\sin\theta) $$

$$ \tan\hat\theta = \frac{s}{c} $$

By symmetry arguments, one can show that $\tan\hat\theta$ is a biased estimator while $\hat\theta$ is not; a further differentiation of $d \log L/d\theta$ gives the Minimum Variance Bound:

$$ \frac{d^2 \log L}{d\theta^2} = -\frac{1}{\sigma^2}\,(s\sin\theta + c\cos\theta) $$

$$ MVB = \frac{\sigma^2}{\langle s\sin\theta + c\cos\theta \rangle} = \sigma^2 $$

This estimator is also efficient, since, from the law of propagation of errors, replacing the estimators by their expectation values:

$$ V(\tan\hat\theta) = \left( \frac{\partial \tan\hat\theta}{\partial s} \right)^{2} V(s) + \left( \frac{\partial \tan\hat\theta}{\partial c} \right)^{2} V(c) = \sigma^2\,\frac{1}{c^2} + \sigma^2\,\frac{s^2}{c^4} = \sigma^2 \left( \frac{1}{c^2} \right)^{2} $$

and, since $d\tan\theta/d\theta = 1/\cos^2\theta$,

$$ V(\hat\theta) = \sigma^2 $$

equal to the MVB.


As an exercise, one can write a computer program to show that $\hat\theta$, the estimator of $\theta$, is indeed efficient. This is illustrated in CS.m and SinCos.m.
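The .m files are the course's own code; as a rough stand-in, a Python Monte Carlo sketch (the angle, uncertainty and sample size are made up) checking that $V(\hat\theta) \approx \sigma^2$ and that $\tan\hat\theta$ is slightly biased:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma, trials = 0.7, 0.05, 200_000

# Independent Gaussian measurements of cos(theta) and sin(theta)
c = rng.normal(np.cos(theta), sigma, trials)
s = rng.normal(np.sin(theta), sigma, trials)

theta_hat = np.arctan2(s, c)          # ML estimator of theta
tan_hat   = s / c                     # ML estimator of tan(theta)

print(theta_hat.mean() - theta)       # ~0 : theta_hat essentially unbiased
print(theta_hat.var(), sigma**2)      # variance ~ MVB = sigma^2
print(tan_hat.mean() - np.tan(theta)) # small positive bias of tan(theta_hat)
```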

5.2 Estimating the mean of a Cauchy distribution

The Cauchy distribution

$$ f(x) = \frac{1}{\pi}\, \frac{1}{1 + (x - a)^2} $$

has the nasty property that its variance is infinite. Thus the intuitive estimator of the location parameter $a$,

$$ \hat a = \frac{1}{N} \sum_{i} x_i $$

will have an infinite variance. One can nonetheless handle such a situation at the expense of "solving" a high-degree polynomial equation:

$$ \log L = -\sum_{i} \log\!\left[ 1 + (x_i - a)^2 \right] - N \log\pi $$

$$ \frac{d \log L}{da} = \sum_{i} \frac{2(x_i - a)}{1 + (x_i - a)^2} = 0 $$

This is an equation of degree $2N - 1$ in $a$, and up to $2N - 1$ different solutions may exist, $N$ of which will correspond to maxima of the Likelihood Function. Usually the best value, corresponding to the highest maximum of $L$, is near the sample median. The median may therefore be taken as the starting value in an iterative search for the maximum of $L$. Note that, in the limit of large $N$, the minimum variance bound is

$$ MVB = \frac{2}{N} $$

See the code in CAUCHY.m and LNCAUCHY.m.
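As a rough Python counterpart to the referenced .m files (a sketch with made-up data, not the original code), one can maximize the Cauchy log-likelihood numerically, starting from the sample median:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
a_true, N = 3.0, 200
x = a_true + rng.standard_cauchy(N)      # Cauchy sample centred on a_true

def neg_log_L(a):
    return np.sum(np.log1p((x - a[0])**2)) + N * np.log(np.pi)

start = np.median(x)                     # robust starting value
res = minimize(neg_log_L, x0=[start], method="Nelder-Mead")

print(start, res.x[0])                   # both close to a_true
print(np.sqrt(2.0 / N))                  # asymptotic error from the MVB
```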

6 Summary of the properties of the Maximum Likelihood method

1. ML estimators are in most cases consistent but in general biased; quite often, a variable change can provide an unbiased estimator.

2. When $N \to \infty$, they become unbiased and efficient, and

$$ \frac{1}{V(\hat a)} = -\left\langle \frac{d^2 \log L}{da^2} \right\rangle = -\frac{d^2 \log L}{da^2}\bigg|_{a=a_0} $$

$\hat a$ is normally distributed, the distribution of $L$ is normal, and the log-likelihood is a parabola:

$$ \log L = -\frac{1}{2}\, \frac{(a - a_0)^2}{V(\hat a)} $$

3. At moderate $N$, IF the ML estimator is known to be unbiased and efficient, the above is true, replacing $a_0$ by $\hat a$.

4. The most important advice: when writing a program for finding the minimum of $-\log L$, never attempt to write your own minimization package, especially when you are dealing with more than one parameter.
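In that spirit (a sketch, not part of the original notes), one would hand $-\log L$ to an existing minimizer rather than writing one; for example, with scipy, fitting the $\mu$ and $\sigma$ of a Gaussian sample:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
x = rng.normal(10.0, 2.0, 1000)          # made-up data set

def neg_log_L(p):
    mu, sigma = p
    if sigma <= 0:
        return np.inf                    # keep the minimizer in the allowed region
    return np.sum(np.log(sigma) + (x - mu)**2 / (2.0 * sigma**2))

res = minimize(neg_log_L, x0=[x.mean(), x.std()], method="Nelder-Mead")
print(res.x)                             # ML estimates of (mu, sigma)
```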