Hierarchical Bayes Models


Hierarchical Bayes Models

• The hierarchical Bayes idea has become very important in recent years.
• It allows us to entertain a much richer class of models that can better capture our statistical understanding of the problem.
• With the advent of MCMC, it has become possible to do the calculations on these much more complex models, thus rendering the hierarchical Bayes approach much more practical.

• Nonhierarchical models have a very simple structure, e.g.,

$$p(\theta, \sigma \mid x) \propto p(x \mid \theta, \sigma)\,\pi(\theta)\,\pi(\sigma)$$

• This can be represented by a DAG (Directed Acyclic Graph), which diagrams the dependencies of each variable on others. Since x is stochastically dependent on θ and σ in the above model, we get the following DAG:

[DAG: θ → x ← σ]


Hierarchical Bayes Models

• The basic idea in a hierarchical model is that when you look at the likelihood function and decide on the right priors, it may be appropriate to use priors that themselves depend on other parameters not mentioned in the likelihood. These parameters will in turn require priors, which themselves may (or may not) depend on new parameters. The process terminates when we no longer introduce new parameters.

• Thus we might have something like

$$p(\theta, \sigma, W, V, Z \mid x) \propto p(x \mid \theta, \sigma)\,\pi(\theta \mid W, V)\,\pi(W)\,\pi(V)\,\pi(\sigma \mid Z)\,\pi(Z)$$

• To this we would associate the DAG shown below. The new parts (shown in red in the original figure) represent the new hierarchical structure; W, V, and Z are not part of the likelihood:

[DAG: W → θ, V → θ, Z → σ, θ → x, σ → x]


Normal Hierarchical Model

• Consider the case of measuring the metallicity of a class of objects. We can expect the measured metallicity to vary from star to star for two reasons: (1) measurement error, and (2) intrinsic (“cosmic”) scatter from object to object. Our aim is to estimate the metallicity of the class, as well as the individual metallicity of each star in the sample.
• Thus we postulate the following model:

$$x_i \sim N(\theta_i,\ \sigma_i^2), \quad \text{where } \sigma_i^2 \text{ is known}$$
$$\theta_i \sim N(\mu,\ \tau^2)$$
$$\mu \sim \text{flat}$$
$$\tau \sim \text{flat on } (0, \infty)$$
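As a concrete illustration, here is a minimal R sketch (not from the slides) that simulates data from this model; the sample size, true µ and τ, and the known measurement errors are all assumed values:

N <- 20                                    # number of stars (assumed)
mu.true <- -1.5                            # true group metallicity (assumed)
tau.true <- 0.3                            # true cosmic scatter (assumed)
sigma <- runif(N, 0.05, 0.2)               # known per-star errors (assumed)
theta.true <- rnorm(N, mu.true, tau.true)  # individual metallicities
x <- rnorm(N, theta.true, sigma)           # observed values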

Normal Hierarchical Model

• The DAG corresponding to this model is

[DAG: µ → θ_i ← τ, θ_i → x_i]

Normal Hierarchical Model

• Writing down the posterior probability explicitly, we see that

$$p(\theta, \mu, \tau \mid x) \propto \frac{1}{\tau^N}\prod_{i=1}^{N}\exp\!\left(-\frac{1}{2\tau^2}(\mu-\theta_i)^2\right)\prod_{i=1}^{N}\exp\!\left(-\frac{1}{2\sigma_i^2}(\theta_i-x_i)^2\right)$$

• Routine completion of the square in the first term shows that we can sample µ in a Gibbs step using

$$\mu \mid x, \theta, \tau \sim N(\bar{\theta},\ \tau^2/N)$$

where $\bar{\theta}$ is the mean of the $\theta_i$.

Normal Hierarchical Model

• Compare the formula with the code below:

$$\mu \mid x, \theta, \tau \sim N(\bar{\theta},\ \tau^2/N)$$

mu = rnorm(1, mean(theta), tau/sqrt(N))

Normal Hierarchical Model

• Similarly, we can sample τ in a Gibbs step using

$$\tau \mid x, \theta, \mu \sim \text{InverseChisquare}(S,\ N-1)$$

where

$$S = \sum_{i=1}^{N}(\mu-\theta_i)^2$$

(that is, $\tau^2$ is drawn as $S/\chi^2_{N-1}$, matching the code on the next slide).

Normal Hierarchical Model

• Compare the code with the formula:

$$\tau \mid x, \theta, \mu \sim \text{InverseChisquare}(S,\ N-1), \qquad S = \sum_{i=1}^{N}(\mu-\theta_i)^2$$

tau = sqrt(sum((mu-theta)^2)/rchisq(1, N-1))

Normal Hierarchical Model

• By completing the square of the terms involving the individual θ_i, we find that we can also sample the individual θ_i:

$$\theta_i \mid x, \mu, \tau \sim N(a_i,\ s_i^2)$$

Normal Hierarchical Model

• Where

$$a_i = \frac{x_i/\sigma_i^2 + \mu/\tau^2}{1/\sigma_i^2 + 1/\tau^2}, \qquad \frac{1}{s_i^2} = \frac{1}{\sigma_i^2} + \frac{1}{\tau^2}$$

• Thus we are using a mean a_i that is a weighted mean of the observation x_i and the group mean µ: the θ_i will “shrink” towards the group mean. This is very typical in hierarchical models. The variance is the usual variance for a weighted mean.

Normal Hierarchical Model

• Again, compare formula and code:

$$\theta_i \mid x, \mu, \tau \sim N(a_i,\ s_i^2)$$

S = 1/(1/sigma^2 + 1/tau^2)
a = (x/sigma^2 + mu/tau^2)*S
theta = rnorm(N, a, sqrt(S))

• Here sigma, S, and a are vectors, so this draws all of the θ_i at once. Loops are inefficient in R; this is much faster than a loop.

Normal Hierarchical Model

• The data are taken from a baseball example (Efron and Morris) quoted in Peter Lee's book Bayesian Statistics, Second Edition. The "EM Estimator" column gives the Efron-Morris shrinkage estimates; note how they pull the extreme early-season averages towards the group mean.

Player        Early Season   EM Estimator   Late Season
Clemente      0.400          0.290          0.346
F Robinson    0.378          0.286          0.298
F Howard      0.356          0.281          0.276
Johnstone     0.333          0.277          0.222
Berry         0.311          0.273          0.273
Spencer       0.311          0.273          0.270
Kessinger     0.289          0.268          0.263
L. Alvarado   0.267          0.264          0.210
Santo         0.244          0.259          0.269
Swoboda       0.244          0.259          0.230
Unser         0.222          0.254          0.264
Williams      0.222          0.254          0.256
Scott         0.222          0.254          0.303
Petrocelli    0.222          0.254          0.264
E Rodriguez   0.222          0.254          0.226
Campaneris    0.200          0.249          0.285
Munson        0.178          0.244          0.316
Alvis         0.156          0.239          0.200

Example: RR Lyrae Statistical Parallax

• Here's a more complicated astronomical example that illustrates the power of the hierarchical Bayes idea. We consider the problem of determining the absolute magnitude of RR Lyrae stars using proper motion and radial velocity data.

Example: RR Lyrae Statistical Parallax

• Our data are the proper motions µ, the radial velocities $v_r$, and the apparent magnitudes m of the stars, the latter assumed measured with negligible error. The proper motions are related to the cross-track velocities by multiplying the former by the distance s to the star:

$$s\boldsymbol{\mu} = \mathbf{V}_\perp$$

[Figure: geometry of the velocity decomposition, showing $\mathbf{V}$, $\mathbf{V}_\perp$, $\mathbf{V}_{||}$, µ, and the star at distance s; $\hat{\mathbf{s}} = \mathbf{s}/s$ is the unit vector towards the star, $V_{||} = \hat{\mathbf{s}}\cdot\mathbf{V}$, and $\hat{\mathbf{s}}\cdot\mathbf{V}_\perp = 0$]

• By assuming that the proper motions and radial velocities are characterized by the same kinematical parameters, we can (statistically) infer s, and then the absolute magnitude M of the star through the well-known relationship

$$s = 10^{0.2(m - M + 5)}$$

Example: RR Lyrae Statistical Parallax

• Temporarily suppressing the subscript i, for each star we have, from linear algebra and the geometry of the figure,

$$\mathbf{V} = (V_\alpha^\perp,\ V_\delta^\perp,\ V_{||}) = \mathbf{V}_\perp + \mathbf{V}_{||} = \mathbf{V}_\perp + \hat{\mathbf{s}}V_{||}$$

$$\mathbf{V}_{||} = \hat{\mathbf{s}}\hat{\mathbf{s}}'\mathbf{V} \quad \text{(projects } \mathbf{V} \text{ onto } \hat{\mathbf{s}})$$

$$\mathbf{V}_\perp = \mathbf{V} - \mathbf{V}_{||} = (I - \hat{\mathbf{s}}\hat{\mathbf{s}}')\mathbf{V} \quad \text{(projects } \mathbf{V} \text{ onto the plane of the sky)}$$
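A small R sketch of these projection operators (illustrative; the unit vector and velocity are assumed values):

s.hat <- c(0, 0, 1)               # assumed unit vector towards the star
V <- c(10, -5, 20)                # assumed space velocity, km/s
P.par <- s.hat %*% t(s.hat)       # s.hat s.hat', projects onto s.hat
V.par <- as.vector(P.par %*% V)   # parallel (line-of-sight) component
V.perp <- V - V.par               # (I - s.hat s.hat') V, plane of sky
sum(s.hat * V.perp)               # ~0: V.perp is orthogonal to s.hat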

The Likelihood

• If $\boldsymbol{\mu}^o = (\mu_\alpha^o,\ \mu_\delta^o,\ 0)$ are the components of the observed proper motion vector (in the plane of the sky), and $v_r^o$ is the observed radial velocity, we assume

$$\mu_\alpha^o \sim N(V_\alpha^\perp/s,\ \sigma_{\mu_\alpha}^2)$$
$$\mu_\delta^o \sim N(V_\delta^\perp/s,\ \sigma_{\mu_\delta}^2)$$
$$v_r^o \sim N(V_{||},\ \sigma_{v_r}^2)$$

• The variances are measured at the telescope and assumed known perfectly. The variables without superscripts are the “true” values.
• The likelihood is just the product of all these normals over all stars in the sample.
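As a concrete sketch, the per-star log-likelihood could be written in R as follows (illustrative; it works in the star's local frame where $\hat{\mathbf{s}} = (0,0,1)$, so that $\mathbf{V} = (V_\alpha^\perp, V_\delta^\perp, V_{||})$, and the argument names are assumptions):

loglik.star <- function(mu.obs, vr.obs, V, s, sd.mu, sd.vr) {
  # mu.obs: observed (mu.alpha, mu.delta); vr.obs: observed radial velocity
  # V: true velocity in the local frame; s: distance; sd.*: known errors
  sum(dnorm(mu.obs, V[1:2]/s, sd.mu, log = TRUE)) +   # proper motions
    dnorm(vr.obs, V[3], sd.vr, log = TRUE)            # radial velocity
}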

Quantities Being Estimated

• The posterior distribution is a function of at least the following quantities:
  • The true velocities $\mathbf{V}_i$ of the stars
  • The mean velocity $\mathbf{V}_0$ of the stars relative to the Sun
  • The three-dimensional covariance matrix W that describes the distribution of the stellar velocities
  • The distance $s_i$ to each star (in turn a function of each star's associated absolute magnitude $M_i$)
  • The mean absolute magnitude M of the RR Lyrae stars as a group
• In addition, we will introduce several additional variables for convenience.

Priors Defining the Hierarchical Model

• The priors on the $\mathbf{V}_i$ are assigned by assuming that the velocities of the individual stars are drawn from a (three-dimensional) multivariate normal distribution:

$$\mathbf{V}_i \mid \mathbf{V}_0, W \sim N(\mathbf{V}_0,\ W)$$

• We choose an improper flat prior on $\mathbf{V}_0$, the mean velocity of the RR Lyrae stars relative to the velocity of the Sun.

Priors Defining the Hierarchical Model

• The prior on W requires some care. One wants a proper posterior distribution (flat priors are improper, for example, but there is no problem if the posterior distribution is proper).
• To satisfy this demand, while maintaining a reasonably noncommittal (vague) prior, we chose a “hierarchical independence Jeffreys prior” on W, which for a three-dimensional distribution implies

$$\pi(W) \propto |I + W|^{-2}$$

Priors Defining the Hierarchical Model

• The distances $s_i$ are related to the $M_i$ and M by an exact (defined) relationship:

$$s_i = 10^{0.2(m_i - M_i + 5)}$$

• We introduce new variables $U_i = M_i - M$, because preliminary calculations indicated that sampling would be more efficient on the variables $U_i$ (the $M_i$ are highly correlated with each other and with M, but the $U_i$ are not). Therefore we write

$$s_i = 10^{0.2(m_i - M - U_i + 5)}$$

Priors Defining the Hierarchical Model

• [Figure: scatterplot illustrating the correlation between M and one of the $M_i$ (it doesn't look bad, but there are hundreds of stars in the sample, which exacerbates the problem)]

Priors Defining the Hierarchical Model

• [Figure: scatterplot illustrating the low correlation between M and one of the $U_i$]

Priors Defining the Hierarchical Model

• [Figure: scatterplot illustrating the low correlation between the various $U_i$]

Priors Defining the Hierarchical Model

• We choose a flat prior on M (we have also tried a somewhat informative prior based on known data, with not much change).
• Evidence indicates a “cosmic scatter” of about 0.15 magnitudes in the $M_i$. Thus a prior on $U_i$ of the form

$$U_i \sim N(0,\ (0.15)^2)$$

seems appropriate.

DAG Describing Hierarchical Model

• The following DAG is a simplified description of the structure of the hierarchical model:

[DAG with nodes $\mathbf{V}_0$, W, $U_i$, M, $\mathbf{V}_i$, and the observed data $(\boldsymbol{\mu}_i, v_{r,i})$]

DAG Describing Hierarchical Model

• The following DAG shows the structure of the hierarchical model in more detail:

[Figure: detailed DAG of the hierarchical model]

Sampling Strategy

• We can use Gibbs sampling to sample on $\mathbf{V}_0$:

$$\mathbf{V}_0 \mid \{\mathbf{V}_i\}, W \sim N(\bar{\mathbf{V}},\ W/N)$$

where N is the number of stars in our sample and

$$\bar{\mathbf{V}} = \frac{1}{N}\sum \mathbf{V}_i$$
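A minimal R sketch of this Gibbs step (assuming V is an N-by-3 matrix holding the current draws of the $\mathbf{V}_i$ and W is the current covariance draw; MASS::mvrnorm supplies the multivariate normal sampler):

library(MASS)                             # for mvrnorm
V0 <- mvrnorm(1, mu = colMeans(V), Sigma = W/nrow(V))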

Sampling Strategy

• We can also use Gibbs sampling to sample on W; however, we cannot generate W directly, but instead use a rejection sampling scheme. Generate a candidate W* using

$$W^* \mid \{\mathbf{V}_i\}, \mathbf{V}_0 \sim \text{InverseWishart}(T,\ \text{df} = N)$$

where

$$T = \sum (\mathbf{V}_i - \mathbf{V}_0)(\mathbf{V}_i - \mathbf{V}_0)'$$

(The inverse Wishart is a multivariate version of the inverse chi-square.)

• Then accept or reject the candidate with probability

$$|W^*|^2 / |I + W^*|^2$$

• Repeat until successful. This scheme is very efficient for large degrees of freedom.
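A minimal sketch of this rejection step in R (same assumed objects as above; base R's rWishart gives inverse Wishart draws by inverting a Wishart draw with inverted scale):

R <- sweep(V, 2, V0)                      # subtract V0 from each row
Tmat <- t(R) %*% R                        # T = sum (Vi - V0)(Vi - V0)'
N <- nrow(V)
repeat {
  # W* | {Vi}, V0 ~ InverseWishart(T, df = N)
  W.star <- solve(rWishart(1, N, solve(Tmat))[, , 1])
  # accept with probability |W*|^2 / |I + W*|^2
  if (runif(1) < (det(W.star)/det(diag(3) + W.star))^2) break
}
W <- W.star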

Sampling Strategy

• $\mathbf{V}_i$ can be sampled in a Gibbs step; it involves data on the individual stars as well as the general velocity distribution, and the covariance matrix $S_i$ of the velocities of the individual stars depends on $s_i$ because of the relation $s\boldsymbol{\mu} = \mathbf{V}_\perp$. Omitting the algebraic details, the resulting sampling scheme is

$$\mathbf{V}_i \sim N(\mathbf{u}_i,\ Z_i)$$

where

$$Z_i = (S_i^{-1} + W^{-1})^{-1}$$
$$\mathbf{u}_i = Z_i(S_i^{-1}\mathbf{V}_i^o + W^{-1}\mathbf{V}_0)$$
$$\mathbf{V}_i^o = s_i\boldsymbol{\mu}_i^o + \hat{\mathbf{s}}_i v_{r,i}^o$$

Sampling Strategy

• We sample the $U_i$ using a Metropolis-Hastings step. Our proposal is $U_i^* \sim N(U_i,\ w)$ with an appropriate w, adjusted for good mixing. Recalling that $s_i$ is a function of $U_i$, the conditional distribution is proportional to

$$s_i^2\, N(s_i\boldsymbol{\mu}_i + \hat{\mathbf{s}}_i V_i^{||} - \mathbf{V}_0,\ W)$$

together with the informative prior on $U_i$ we described earlier,

$$U_i \sim N(0,\ (0.15)^2)$$
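A sketch of one such Metropolis-Hastings update in R (illustrative only: it works in the star's local frame where $\hat{\mathbf{s}} = (0,0,1)$, takes w as the proposal standard deviation, uses mvtnorm::dmvnorm for the multivariate normal density, and all variable names are assumptions):

library(mvtnorm)                          # for dmvnorm
# unnormalized log conditional for one star's U
log.target.U <- function(U, m, M, mu.obs, V.par, V0, W) {
  s <- 10^(0.2*(m - M - U + 5))           # distance as a function of U
  resid <- c(s*mu.obs, V.par) - V0        # s*mu + s.hat*V.par - V0, local frame
  2*log(s) + dmvnorm(resid, sigma = W, log = TRUE) +
    dnorm(U, mean = 0, sd = 0.15, log = TRUE)   # informative prior on U_i
}
# one random-walk Metropolis-Hastings step
U.star <- rnorm(1, U, w)
logr <- log.target.U(U.star, m, M, mu.obs, V.par, V0, W) -
        log.target.U(U,      m, M, mu.obs, V.par, V0, W)
if (log(runif(1)) < logr) U <- U.star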

Sampling Strategy



• We sample on M using a Metropolis-Hastings step. Our proposal for M* is a t distribution centered on M with an appropriate choice of degrees of freedom and variance, adjusted for good mixing. Recalling that $s_i$ is a function of M, the conditional is proportional to

$$\prod_i s_i^2\, N(s_i\boldsymbol{\mu}_i + \hat{\mathbf{s}}_i V_i^{||} - \mathbf{V}_0,\ W)$$

with a prior on M that may or may not be informative (as discussed earlier). We found that dof = 10 and variance = 0.01 on the t proposal distribution mixed well.

Sampling History of M(ab)

• The samples on M show good mixing.

[Figure: trace plot of the samples of M(ab)]
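A sketch of the corresponding M update (illustrative; because the t proposal is symmetric in M, the Metropolis ratio involves only the target, and the variable names are the same assumptions as in the U_i sketch above):

# log conditional for M, summed over all stars (a flat prior adds nothing)
log.target.M <- function(M, m, U, mu.obs, V.par, V0, W) {
  s <- 10^(0.2*(m - M - U + 5))           # m, U, V.par: length-N vectors
  total <- 0
  for (i in seq_along(s)) {
    resid <- c(s[i]*mu.obs[i, ], V.par[i]) - V0
    total <- total + 2*log(s[i]) + dmvnorm(resid, sigma = W, log = TRUE)
  }
  total
}
M.star <- M + sqrt(0.01)*rt(1, df = 10)   # t proposal: dof = 10, scale 0.01
logr <- log.target.M(M.star, m, U, mu.obs, V.par, V0, W) -
        log.target.M(M,      m, U, mu.obs, V.par, V0, W)
if (log(runif(1)) < logr) M <- M.star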

Marginal Density of M(ab)

• Plotting a smoothed histogram of the M samples displays the posterior marginal distribution of M.

[Figure: smoothed histogram of the posterior marginal density of M(ab)]

Marginal Density of M(c)

• The same, for the overtone pulsators.

[Figure: smoothed histogram of the posterior marginal density of M(c)]
