The empirical Bayes approach

The empirical Bayes approach arises from the multiparameter estimation problem in which known relationships among the coordinates of θ = (θ1, …, θk)′ suggest pooling information across similar experiments, to get a better estimate of each θi.

Examples:
θi = proportion of defectives in supplier's lot i
θi = mean bushels of corn per acre for a random selection of farmers in a given county

These problems have a long history: "random effects models," "mixed models" – the latter gave rise to Proc MIXED in SAS!

Chapter 5: The Empirical Bayes Approach – p. 1/14
The empirical Bayes approach

Morris (1983, JASA) classified EB methods into two categories, parametric and nonparametric:

Parametric EB (PEB): We assume the prior for θ is in some parametric class p(θ|η), where only η (the hyperparameter) is unknown. Estimate the hyperparameter by η̂, and plug in to get the estimated posterior. Well developed in a series of papers by Efron and Morris in the mid-1970s.

Nonparametric EB (NPEB): We assume only that the θi are iid from some distribution G. Use the data to estimate the prior or the marginal distribution directly. Pioneered and championed by Robbins (1950s; actually older than PEB).

Chapter 5: The Empirical Bayes Approach – p. 2/14
Nonparametric EB basics

Start with the compound sampling model:

yi | θi ∼ f(yi|θi) = Poisson(θi), independently, and θi ∼ G(·), iid, for i = 1, …, k

Under squared error loss, the Bayes estimate is the posterior mean:

θ̂i(y) = E_G(θi | y) = E_G(θi | yi)
      = [∫ (u^(yi+1)/yi!) e^(−u) dG(u)] / [∫ (u^yi/yi!) e^(−u) dG(u)]
      = (yi + 1) m_G(yi + 1) / m_G(yi).

⇒ The "Robbins Miracle": θ̂i(y) is directly estimable as

θ̂i(y) = (yi + 1) m̂_G(yi + 1) / m̂_G(yi) = (yi + 1) [# yj's = yi + 1] / [# yj's = yi].

Chapter 5: The Empirical Bayes Approach – p. 3/14
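The Robbins rule needs only frequency counts of the observed yi's. A minimal sketch in Python (the data vector here is hypothetical, purely for illustration):

```python
from collections import Counter

def robbins_estimates(y):
    """Robbins' NPEB rule for Poisson data:
    theta_hat_i = (y_i + 1) * #{j : y_j = y_i + 1} / #{j : y_j = y_i}."""
    counts = Counter(y)  # marginal frequency counts stand in for m_G
    return [(yi + 1) * counts[yi + 1] / counts[yi] for yi in y]

# Hypothetical counts (e.g., insurance claims per policyholder)
y = [0, 0, 0, 1, 1, 2, 0, 1, 3, 0]
print(robbins_estimates(y))
```

Note the instability the "miracle" can exhibit: the largest observed count always gets estimate 0, since no yj equals it plus one. This is one motivation for smoothing the estimated marginal.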
Nonparametric EB summary

Maritz and Lwin (1988) discuss "Simple EB," a generalization of this idea to non-Poisson families. But one can't take it very far...

New idea: use m_Ĝ(yi + 1) instead of m̂_G(yi + 1) – that is, estimate the prior G first, then compute the marginal it implies.

Computationally more feasible, and enables imposition of appropriate structure (monotonicity, convexity, etc.) that the empirical cdf doesn't impose.

The maximizing G (the NPML) is a finite mixture of degenerate distributions with no more than k mass points, computable via the Expectation-Maximization (EM) algorithm (C&L Sec. 5.2.2).

On the whole, NPEB can do quite well in a wide variety of scenarios (e.g., when the true prior is bimodal), and has spawned research into fully Bayesian nonparametric approaches (C&L Sec. 2.6).

Chapter 5: The Empirical Bayes Approach – p. 4/14
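The EM computation can be sketched by restricting G to a fixed grid of candidate mass points and iterating on the mixture weights (a common simplification of the full NPML computation; the grid and data below are invented for illustration, not taken from C&L):

```python
import math

def npml_em(y, grid, iters=500):
    """EM for the weights of a discrete mixing distribution G supported on a
    fixed grid, for Poisson(theta) data: maximizes
    sum_i log( sum_j w_j * Poisson(y_i | grid_j) ) over the weights w."""
    k, m = len(y), len(grid)
    w = [1.0 / m] * m  # start from uniform weights
    lik = [[math.exp(-t) * t ** yi / math.factorial(yi) for t in grid] for yi in y]
    for _ in range(iters):
        new_w = [0.0] * m
        for i in range(k):
            denom = sum(w[j] * lik[i][j] for j in range(m))
            for j in range(m):
                new_w[j] += w[j] * lik[i][j] / denom  # E-step responsibilities
        w = [nw / k for nw in new_w]                  # M-step: average them
    return w

y = [0, 0, 1, 1, 2, 5, 6, 7]           # data suggesting a bimodal prior
grid = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]  # candidate mass points for G
w = npml_em(y, grid)
```

In practice the NPML's support points are themselves optimized rather than fixed in advance, but the weight update above is the EM core.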
Parametric EB basics

Stage 1: Yi | θi ∼ fi(yi|θi), independently for i = 1, …, k
Stage 2: θi ∼ p(θi|η), iid for i = 1, …, k

Suppose we seek point estimates for the θi. The marginal distribution of y = (y1, …, yk) is

m(y|η) = ∫ f(y|θ) p(θ|η) dθ
       = ∫ [∏_{i=1}^k fi(yi|θi)] [∏_{i=1}^k p(θi|η)] dθ
       = ∏_{i=1}^k ∫ fi(yi|θi) p(θi|η) dθi
       = ∏_{i=1}^k mi(yi|η)

⇒ the yi are marginally independent (and iid if fi = f for all i).

Chapter 5: The Empirical Bayes Approach – p. 5/14
Parametric EB basics (cont'd)

Similarly, the posterior for θi depends on the data only through yi, namely

p(θi | yi, η) = fi(yi|θi) p(θi|η) / mi(yi|η)

But if we assume η is unknown and estimate it from the marginal distribution of all the data, m(y|η), we get the estimated posterior

p(θi | yi, η̂),

where η̂ = η̂(y) is usually obtained as an MLE or method of moments (MOM) estimate from m(y|η).

Now take θ̂i to be the mean of the estimated posterior. Note that θ̂i depends on all the data through η̂.

Chapter 5: The Empirical Bayes Approach – p. 6/14
Example: Normal/Normal model

yi | θi ∼ N(θi, σ²), independently for i = 1, …, k, with σ² known;
θi ∼ N(μ, τ²), iid for i = 1, …, k, with (μ, τ²) both unknown.

We know m(yi|μ, τ²) = N(μ, σ² + τ²), so

m(y|μ, τ²) = ∏_{i=1}^k [1 / (√(2π) (σ² + τ²)^(1/2))] exp{ −(yi − μ)² / (2(σ² + τ²)) }.

Maximizing this as a function of (μ, τ²), we get

μ̂ = ȳ and τ̂² = (s² − σ²)_+ ≡ max{0, s² − σ²},

where ȳ = (1/k) Σ yi and s² = (1/k) Σ (yi − ȳ)².

Chapter 5: The Empirical Bayes Approach – p. 7/14
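These marginal MLEs are closed-form and easy to compute directly. A small sketch (the data vector and σ² are invented for illustration):

```python
def marginal_mle(y, sigma2):
    """Marginal MLEs in the normal/normal model:
    mu_hat = ybar and tau2_hat = max(0, s2 - sigma2),
    where s2 is the (1/k)-divisor sample variance."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    return ybar, max(0.0, s2 - sigma2)

# Illustrative data (not from the slides)
y = [1.2, 0.4, 2.1, 1.7, 0.9]
mu_hat, tau2_hat = marginal_mle(y, sigma2=0.25)
```

The positive-part truncation matters: when the observed spread s² is smaller than the sampling variance σ², the data carry no evidence of between-unit variation and τ̂² = 0.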
Example: Normal/Normal model

Thus the estimated posterior is

p(θi | yi, μ̂, τ̂²) = N( B̂μ̂ + (1 − B̂)yi , (1 − B̂)σ² ),

where μ̂ = ȳ and B̂ = σ² / (σ² + τ̂²) = σ² / (σ² + (s² − σ²)_+) ∈ [0, 1].

The PEB point estimator is the mean of this dist'n:

θ̂i^PEB = B̂μ̂ + (1 − B̂)yi = B̂ȳ + (1 − B̂)yi

This is sometimes called a "shrinkage" estimator, since every point estimate is "shrunk back" toward the grand mean ȳ from its original estimate yi. Also, B̂ is sometimes called a "shrinkage factor."

Intuitively, shrinkage makes sense here: the problems are independent, but similar.

Chapter 5: The Empirical Bayes Approach – p. 8/14
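Putting the pieces together, the whole normal/normal PEB point estimator fits in a few lines (again with invented data):

```python
def peb_estimates(y, sigma2):
    """Normal/normal PEB: shrink each y_i toward the grand mean ybar by
    B_hat = sigma2 / (sigma2 + tau2_hat)."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    tau2_hat = max(0.0, s2 - sigma2)
    B = sigma2 / (sigma2 + tau2_hat)  # shrinkage factor in [0, 1]
    return [B * ybar + (1 - B) * yi for yi in y]

theta_hat = peb_estimates([1.2, 0.4, 2.1, 1.7, 0.9], sigma2=0.25)
```

Each estimate lands between its yi and ȳ; as s² shrinks toward σ² (no evidence of between-unit variation), B̂ → 1 and everything shrinks all the way to ȳ.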
Illustration: Morris' Baseball Data

 i   player         yi     θi       i   player         yi     θi
 1   Clemente      .400   .346     10   Swoboda       .244   .230
 2   F. Robinson   .378   .298     11   Unser         .222   .264
 3   F. Howard     .356   .276     12   Williams      .222   .256
 4   Johnstone     .333   .222     13   Scott         .222   .303
 5   Berry         .311   .273     14   Petrocelli    .222   .264
 6   Spencer       .311   .270     15   E. Rodriguez  .222   .226
 7   Kessinger     .289   .263     16   Campaneris    .200   .285
 8   L. Alvarado   .267   .210     17   Munson        .178   .316
 9   Santo         .244   .269     18   Alvis         .156   .200

For players i = 1, …, 18,
yi = batting average after first 45 at bats in 1970,
θi = true 1970 batting ability (pretend the final 1970 averages measure this).

Chapter 5: The Empirical Bayes Approach – p. 9/14
Illustration: Morris' Baseball Data

Data: ȳ = .265, B̂ = .788.

Use our normal/normal EB model, so that

θ̂i^PEB = B̂ȳ + (1 − B̂)yi = .788(.265) + .212 yi

Results show that the PEB point estimates work well:

individually: in 16 of the 18 cases, (θ̂i^PEB − θi)² < (yi − θi)² (smaller individual risk)

overall: aggregate MSE numbers are
MSE(y) = Σ_{i=1}^{18} (yi − θi)² = .077
MSE(θ̂^PEB) = Σ_{i=1}^{18} (θ̂i^PEB − θi)² = .022
(PEB has smaller ensemble risk)

Chapter 5: The Empirical Bayes Approach – p. 10/14
Theoretical support for PEB

It turns out that the PEB estimate will always have lower ensemble risk in this setting provided k ≥ 3!

Surprising! The PEB estimate has better frequentist risk (MSE) properties than the usual, "unshrunk" estimate, which itself is the MLE, UMVUE, etc.

This general area is called Stein Estimation (Stein, 1955; James and Stein, 1961).

In summary, PEB point estimates have excellent ensemble risk performance, with respect to either:

frequentist risk: E_{Y|θ} L(θ, θ̂(Y))

preposterior (or "EB") risk: E_{θ,Y} L(θ, θ̂(Y)) = E_θ E_{Y|θ} L(θ, θ̂(Y)) = E_Y E_{θ|Y} L(θ, θ̂(Y))

Chapter 5: The Empirical Bayes Approach – p. 11/14
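The risk advantage is easy to see in simulation. The sketch below checks the preposterior (EB) risk by repeatedly drawing θ from the prior and then y given θ (the simulation settings are invented; with σ² = τ², shrinkage roughly halves the ensemble risk):

```python
import random

def peb(y, sigma2):
    """Normal/normal PEB estimator from the earlier slides:
    shrink each y_i toward the grand mean by B_hat."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    B = sigma2 / (sigma2 + max(0.0, s2 - sigma2))
    return [B * ybar + (1 - B) * yi for yi in y]

random.seed(1)
k, sigma2, mu, tau2 = 18, 1.0, 0.0, 1.0  # invented simulation settings
tot_raw = tot_peb = 0.0
for _ in range(2000):
    theta = [random.gauss(mu, tau2 ** 0.5) for _ in range(k)]  # draw ensemble
    y = [random.gauss(t, sigma2 ** 0.5) for t in theta]        # then the data
    est = peb(y, sigma2)
    tot_raw += sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    tot_peb += sum((ei - ti) ** 2 for ei, ti in zip(est, theta))
print(tot_peb / tot_raw)  # ensemble risk ratio, well below 1
```

Stein's result is stronger than this check: the dominance holds at every fixed θ, not just on average over the prior.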
What about EB interval estimation?

Taking the upper and lower α/2-points of the estimated posterior p(θi|yi, η̂) gives a 100 × (1 − α)% credible set for θi: if qα(yi, η) denotes the α-quantile of p(θi|yi, η), i.e.

P(θi ≤ qα(yi, η) | θi ∼ p(θi|yi, η)) = α,

then the naive EBCI is ( q_{α/2}(yi, η̂), q_{1−α/2}(yi, η̂) ).

In the normal/normal model, the 95% naive EBCI is

E(θi | yi, η̂) ± 1.96 √Var(θi | yi, η̂).

"Naive" since this variance approximates only the first term in the true posterior variance,

Var(θi|y) = E_{η|y}[Var(θi|yi, η)] + Var_{η|y}[E(θi|yi, η)].

The naive EBCI thus ignores the posterior uncertainty about η ⇒ the naive interval may be too short.

Chapter 5: The Empirical Bayes Approach – p. 12/14
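In the normal/normal case everything is available in closed form, so the naive EBCI takes only a few lines (illustrative data again):

```python
def naive_ebci(y, yi, sigma2, z=1.96):
    """Naive 95% EBCI in the normal/normal model: plug the marginal MLEs
    into the conditional posterior N(B*ybar + (1-B)*yi, (1-B)*sigma2),
    ignoring the uncertainty in the hyperparameter estimates."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((v - ybar) ** 2 for v in y) / k
    B = sigma2 / (sigma2 + max(0.0, s2 - sigma2))
    center = B * ybar + (1 - B) * yi
    half = z * ((1 - B) * sigma2) ** 0.5
    return center - half, center + half

lo, hi = naive_ebci([1.2, 0.4, 2.1, 1.7, 0.9], yi=2.1, sigma2=0.25)
```

Note the interval is strictly narrower than the frequentist yi ± 1.96σ, consistent with the warning above that the naive EBCI may be too short.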
Possible remedies for EBCIs

Morris: get a "plug in" estimate of Var_{η|y}[E(θi|yi, η)]

bias corrected naive method: solve

E_{η̂, yi | η} P(θi ≤ q_{α′}(yi, η̂) | θi ∼ p(θi|yi, η)) = α

for α′ = α′(η̂, α), and take the naive interval with α replaced by α′.

marginal posterior approach: place a hyperprior ψ(η) on η, and base the EBCI for θi on the marginal posterior

p_h(θi|y) = ∫ p(θi|y, η) h(η|y) dη, where h(η|y) ∝ m(y|η)ψ(η)   ← "Bayes empirical Bayes"

Trouble: all of these are hard to do outside of low-dimensional, conjugate, two-stage models!

Chapter 5: The Empirical Bayes Approach – p. 13/14
So dump EB and return to full Bayes?

EB is still easier for non-Bayesians to accept.

Full Bayes requires MCMC computing, which can be tricky and may tempt us to fit models larger than the data can support – or than we can understand...

Example: choice of a "vague" hyperprior for a variance component τ². The most widespread current choice is τ² ∼ Gamma(ε, ε), so that E(τ²) = 1 and Var(τ²) = 1/ε.

But recent work shows that such hyperpriors can actually:
have significant impact on the resulting posteriors (at least for the variances)
lead to MCMC convergence failure – or apparent convergence when the joint posterior is improper!

The EB approach (replacing τ² by τ̂²) may produce estimates that are still improved, yet safer to use.

Chapter 5: The Empirical Bayes Approach – p. 14/14