The empirical Bayes approach

The empirical Bayes approach arises from the multiparameter estimation problem in which known relationships among the coordinates of θ = (θ1, …, θk)′ suggest pooling information across similar experiments, to get a better estimate of each θi.

Examples:
θi = proportion of defectives in supplier's lot i
θi = mean bushels of corn per acre for a random selection of farmers in a given county

These problems have a long history: "random effects models," "mixed models" – the latter gave rise to Proc MIXED in SAS!

Chapter 5: The Empirical Bayes Approach – p. 1/14
The empirical Bayes approach

Morris (1983, JASA) classified EB methods into two categories, parametric and nonparametric:

Parametric EB (PEB): We assume the prior for θ is in some parametric class p(θ|η), where only η (the hyperparameter) is unknown. Estimate the hyperparameter by η̂, and plug in to get the estimated posterior. Well developed in a series of papers by Efron and Morris in the mid-1970s.

Nonparametric EB (NPEB): We assume only that the θi are iid from some distribution G. Use the data to estimate the prior or the marginal distribution directly. Pioneered and championed by Robbins (1950s; actually older than PEB).

Chapter 5: The Empirical Bayes Approach – p. 2/14
Nonparametric EB basics

Start with the compound sampling model:

yi | θi ∼ f(yi|θi) = Poisson(θi), independently, and θi ∼ G(·), iid, for i = 1, …, k

Under squared error loss, the Bayes estimate is the posterior mean:

θ̂i(y) = E_G(θi | y) = E_G(θi | yi)
      = [∫ (u^(yi+1)/yi!) e^(−u) dG(u)] / [∫ (u^yi/yi!) e^(−u) dG(u)]
      = (yi + 1) m_G(yi + 1) / m_G(yi).

⇒ The "Robbins Miracle": θ̂i(y) is directly estimable as

θ̂i(y) = (yi + 1) m̂_G(yi + 1) / m̂_G(yi) = (yi + 1) [# yj's = yi + 1] / [# yj's = yi].

Chapter 5: The Empirical Bayes Approach – p. 3/14
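The Robbins rule needs only frequency counts of the observed yi's. A minimal sketch in Python (the data vector here is hypothetical, purely for illustration):

```python
from collections import Counter

def robbins_estimates(y):
    """Robbins' NPEB rule for Poisson data:
    theta_hat_i = (y_i + 1) * #{j : y_j = y_i + 1} / #{j : y_j = y_i}."""
    counts = Counter(y)  # marginal frequency counts stand in for m_G
    return [(yi + 1) * counts[yi + 1] / counts[yi] for yi in y]

# Hypothetical counts (e.g., insurance claims per policyholder)
y = [0, 0, 0, 1, 1, 2, 0, 1, 3, 0]
print(robbins_estimates(y))
```

Note the instability the "miracle" can exhibit: the largest observed count always gets estimate 0, since no yj equals it plus one. This is one motivation for smoothing the estimated marginal.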
Nonparametric EB summary

Maritz and Lwin (1988) discuss "Simple EB," a generalization of this idea to non-Poisson families. But one can't take it very far...

New idea: use m_Ĝ(yi + 1) instead of m̂_G(yi + 1) – that is, estimate the prior G first, then compute the marginal it implies.

Computationally more feasible, and enables imposition of appropriate structure (monotonicity, convexity, etc.) that the empirical cdf doesn't impose.

The maximizing G (the NPML) is a finite mixture of degenerate distributions with no more than k mass points, computable via the Expectation-Maximization (EM) algorithm (C&L Sec. 5.2.2).

On the whole, NPEB can do quite well in a wide variety of scenarios (e.g., when the true prior is bimodal), and has spawned research into fully Bayesian nonparametric approaches (C&L Sec. 2.6).

Chapter 5: The Empirical Bayes Approach – p. 4/14
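The EM computation can be sketched by restricting G to a fixed grid of candidate mass points and iterating on the mixture weights (a common simplification of the full NPML computation; the grid and data below are invented for illustration, not taken from C&L):

```python
import math

def npml_em(y, grid, iters=500):
    """EM for the weights of a discrete mixing distribution G supported on a
    fixed grid, for Poisson(theta) data: maximizes
    sum_i log( sum_j w_j * Poisson(y_i | grid_j) ) over the weights w."""
    k, m = len(y), len(grid)
    w = [1.0 / m] * m  # start from uniform weights
    lik = [[math.exp(-t) * t ** yi / math.factorial(yi) for t in grid] for yi in y]
    for _ in range(iters):
        new_w = [0.0] * m
        for i in range(k):
            denom = sum(w[j] * lik[i][j] for j in range(m))
            for j in range(m):
                new_w[j] += w[j] * lik[i][j] / denom  # E-step responsibilities
        w = [nw / k for nw in new_w]                  # M-step: average them
    return w

y = [0, 0, 1, 1, 2, 5, 6, 7]           # data suggesting a bimodal prior
grid = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]  # candidate mass points for G
w = npml_em(y, grid)
```

In practice the NPML's support points are themselves optimized rather than fixed in advance, but the weight update above is the EM core.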
Parametric EB basics

Stage 1: Yi | θi ∼ fi(yi|θi), independently for i = 1, …, k
Stage 2: θi ∼ p(θi|η), iid for i = 1, …, k

Suppose we seek point estimates for the θi. The marginal distribution of y = (y1, …, yk) is

m(y|η) = ∫ f(y|θ) p(θ|η) dθ
       = ∫ [∏_{i=1}^k fi(yi|θi)] [∏_{i=1}^k p(θi|η)] dθ
       = ∏_{i=1}^k ∫ fi(yi|θi) p(θi|η) dθi
       = ∏_{i=1}^k mi(yi|η)

⇒ the yi are marginally independent (and iid if fi = f for all i).

Chapter 5: The Empirical Bayes Approach – p. 5/14
Parametric EB basics (cont'd)

Similarly, the posterior for θi depends on the data only through yi, namely

p(θi | yi, η) = fi(yi|θi) p(θi|η) / mi(yi|η)

But if we assume η is unknown and estimate it from the marginal distribution of all the data, m(y|η), we get the estimated posterior

p(θi | yi, η̂),

where η̂ = η̂(y) is usually obtained as an MLE or method of moments (MOM) estimate from m(y|η).

Now take θ̂i to be the mean of the estimated posterior. Note that θ̂i depends on all the data through η̂.

Chapter 5: The Empirical Bayes Approach – p. 6/14
Example: Normal/Normal model

yi | θi ∼ N(θi, σ²), independently for i = 1, …, k, with σ² known;
θi ∼ N(μ, τ²), iid for i = 1, …, k, with (μ, τ²) both unknown.

We know m(yi|μ, τ²) = N(μ, σ² + τ²), so

m(y|μ, τ²) = ∏_{i=1}^k [1 / (√(2π) (σ² + τ²)^(1/2))] exp{ −(yi − μ)² / (2(σ² + τ²)) }.

Maximizing this as a function of (μ, τ²), we get

μ̂ = ȳ and τ̂² = (s² − σ²)_+ ≡ max{0, s² − σ²},

where ȳ = (1/k) Σ yi and s² = (1/k) Σ (yi − ȳ)².

Chapter 5: The Empirical Bayes Approach – p. 7/14
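These marginal MLEs are closed-form and easy to compute directly. A small sketch (the data vector and σ² are invented for illustration):

```python
def marginal_mle(y, sigma2):
    """Marginal MLEs in the normal/normal model:
    mu_hat = ybar and tau2_hat = max(0, s2 - sigma2),
    where s2 is the (1/k)-divisor sample variance."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    return ybar, max(0.0, s2 - sigma2)

# Illustrative data (not from the slides)
y = [1.2, 0.4, 2.1, 1.7, 0.9]
mu_hat, tau2_hat = marginal_mle(y, sigma2=0.25)
```

The positive-part truncation matters: when the observed spread s² is smaller than the sampling variance σ², the data carry no evidence of between-unit variation and τ̂² = 0.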
Example: Normal/Normal model

Thus the estimated posterior is

p(θi | yi, μ̂, τ̂²) = N( B̂μ̂ + (1 − B̂)yi , (1 − B̂)σ² ),

where μ̂ = ȳ and B̂ = σ² / (σ² + τ̂²) = σ² / (σ² + (s² − σ²)_+) ∈ [0, 1].

The PEB point estimator is the mean of this dist'n:

θ̂i^PEB = B̂μ̂ + (1 − B̂)yi = B̂ȳ + (1 − B̂)yi

This is sometimes called a "shrinkage" estimator, since every point estimate is "shrunk back" toward the grand mean ȳ from its original estimate yi. Also, B̂ is sometimes called a "shrinkage factor."

Intuitively, shrinkage makes sense here: the problems are independent, but similar.

Chapter 5: The Empirical Bayes Approach – p. 8/14
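Putting the pieces together, the whole normal/normal PEB point estimator fits in a few lines (again with invented data):

```python
def peb_estimates(y, sigma2):
    """Normal/normal PEB: shrink each y_i toward the grand mean ybar by
    B_hat = sigma2 / (sigma2 + tau2_hat)."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    tau2_hat = max(0.0, s2 - sigma2)
    B = sigma2 / (sigma2 + tau2_hat)  # shrinkage factor in [0, 1]
    return [B * ybar + (1 - B) * yi for yi in y]

theta_hat = peb_estimates([1.2, 0.4, 2.1, 1.7, 0.9], sigma2=0.25)
```

Each estimate lands between its yi and ȳ; as s² shrinks toward σ² (no evidence of between-unit variation), B̂ → 1 and everything shrinks all the way to ȳ.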
Illustration: Morris' Baseball Data

 i   player         yi     θi       i   player         yi     θi
 1   Clemente      .400   .346     10   Swoboda       .244   .230
 2   F. Robinson   .378   .298     11   Unser         .222   .264
 3   F. Howard     .356   .276     12   Williams      .222   .256
 4   Johnstone     .333   .222     13   Scott         .222   .303
 5   Berry         .311   .273     14   Petrocelli    .222   .264
 6   Spencer       .311   .270     15   E. Rodriguez  .222   .226
 7   Kessinger     .289   .263     16   Campaneris    .200   .285
 8   L. Alvarado   .267   .210     17   Munson        .178   .316
 9   Santo         .244   .269     18   Alvis         .156   .200

For players i = 1, …, 18,
yi = batting average after first 45 at bats in 1970,
θi = true 1970 batting ability (pretend the final 1970 averages measure this).

Chapter 5: The Empirical Bayes Approach – p. 9/14
Illustration: Morris' Baseball Data

Data: ȳ = .265, B̂ = .788.

Use our normal/normal EB model, so that

θ̂i^PEB = B̂ȳ + (1 − B̂)yi = .788(.265) + .212 yi

Results show that the PEB point estimates work well:

individually: in 16 of the 18 cases, (θ̂i^PEB − θi)² < (yi − θi)² (smaller individual risk)

overall: aggregate MSE numbers are
MSE(y) = Σ_{i=1}^{18} (yi − θi)² = .077
MSE(θ̂^PEB) = Σ_{i=1}^{18} (θ̂i^PEB − θi)² = .022
(PEB has smaller ensemble risk)

Chapter 5: The Empirical Bayes Approach – p. 10/14
Theoretical support for PEB

It turns out that the PEB estimate will always have lower ensemble risk in this setting provided k ≥ 3!

Surprising! The PEB estimate has better frequentist risk (MSE) properties than the usual, "unshrunk" estimate, which itself is the MLE, UMVUE, etc.

This general area is called Stein Estimation (Stein, 1955; James and Stein, 1961).

In summary, PEB point estimates have excellent ensemble risk performance, with respect to either:

frequentist risk: E_{Y|θ} L(θ, θ̂(Y))

preposterior (or "EB") risk: E_{θ,Y} L(θ, θ̂(Y)) = E_θ E_{Y|θ} L(θ, θ̂(Y)) = E_Y E_{θ|Y} L(θ, θ̂(Y))

Chapter 5: The Empirical Bayes Approach – p. 11/14
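The risk advantage is easy to see in simulation. The sketch below checks the preposterior (EB) risk by repeatedly drawing θ from the prior and then y given θ (the simulation settings are invented; with σ² = τ², shrinkage roughly halves the ensemble risk):

```python
import random

def peb(y, sigma2):
    """Normal/normal PEB estimator from the earlier slides:
    shrink each y_i toward the grand mean by B_hat."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((yi - ybar) ** 2 for yi in y) / k
    B = sigma2 / (sigma2 + max(0.0, s2 - sigma2))
    return [B * ybar + (1 - B) * yi for yi in y]

random.seed(1)
k, sigma2, mu, tau2 = 18, 1.0, 0.0, 1.0  # invented simulation settings
tot_raw = tot_peb = 0.0
for _ in range(2000):
    theta = [random.gauss(mu, tau2 ** 0.5) for _ in range(k)]  # draw ensemble
    y = [random.gauss(t, sigma2 ** 0.5) for t in theta]        # then the data
    est = peb(y, sigma2)
    tot_raw += sum((yi - ti) ** 2 for yi, ti in zip(y, theta))
    tot_peb += sum((ei - ti) ** 2 for ei, ti in zip(est, theta))
print(tot_peb / tot_raw)  # ensemble risk ratio, well below 1
```

Stein's result is stronger than this check: the dominance holds at every fixed θ, not just on average over the prior.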
What about EB interval estimation?

Taking the upper and lower α/2-points of the estimated posterior p(θi|yi, η̂) gives a 100 × (1 − α)% credible set for θi: if qα(yi, η) denotes the α-quantile of p(θi|yi, η), i.e.

P(θi ≤ qα(yi, η) | θi ∼ p(θi|yi, η)) = α,

then the naive EBCI is ( q_{α/2}(yi, η̂), q_{1−α/2}(yi, η̂) ).

In the normal/normal model, the 95% naive EBCI is

E(θi | yi, η̂) ± 1.96 √Var(θi | yi, η̂).

"Naive" since this variance approximates only the first term in the true posterior variance,

Var(θi|y) = E_{η|y}[Var(θi|yi, η)] + Var_{η|y}[E(θi|yi, η)].

The naive EBCI thus ignores the posterior uncertainty about η ⇒ the naive interval may be too short.

Chapter 5: The Empirical Bayes Approach – p. 12/14
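In the normal/normal case everything is available in closed form, so the naive EBCI takes only a few lines (illustrative data again):

```python
def naive_ebci(y, yi, sigma2, z=1.96):
    """Naive 95% EBCI in the normal/normal model: plug the marginal MLEs
    into the conditional posterior N(B*ybar + (1-B)*yi, (1-B)*sigma2),
    ignoring the uncertainty in the hyperparameter estimates."""
    k = len(y)
    ybar = sum(y) / k
    s2 = sum((v - ybar) ** 2 for v in y) / k
    B = sigma2 / (sigma2 + max(0.0, s2 - sigma2))
    center = B * ybar + (1 - B) * yi
    half = z * ((1 - B) * sigma2) ** 0.5
    return center - half, center + half

lo, hi = naive_ebci([1.2, 0.4, 2.1, 1.7, 0.9], yi=2.1, sigma2=0.25)
```

Note the interval is strictly narrower than the frequentist yi ± 1.96σ, consistent with the warning above that the naive EBCI may be too short.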
Possible remedies for EBCIs

Morris: get a "plug in" estimate of Var_{η|y}[E(θi|yi, η)]

bias corrected naive method: solve

E_{η̂, yi | η} P(θi ≤ q_{α′}(yi, η̂) | θi ∼ p(θi|yi, η)) = α

for α′ = α′(η̂, α), and take the naive interval with α replaced by α′.

marginal posterior approach: place a hyperprior ψ(η) on η, and base the EBCI for θi on the marginal posterior

p_h(θi|y) = ∫ p(θi|y, η) h(η|y) dη, where h(η|y) ∝ m(y|η)ψ(η)   ← "Bayes empirical Bayes"

Trouble: all of these are hard to do outside of low-dimensional, conjugate, two-stage models!

Chapter 5: The Empirical Bayes Approach – p. 13/14
So dump EB and return to full Bayes?

EB is still easier for non-Bayesians to accept.

Full Bayes requires MCMC computing, which can be tricky and may tempt us to fit models larger than the data can support – or than we can understand...

Example: choice of a "vague" hyperprior for a variance component τ². The most widespread current choice is τ² ∼ Gamma(ε, ε), so that E(τ²) = 1 and Var(τ²) = 1/ε.

But recent work shows that such hyperpriors can actually:
have significant impact on the resulting posteriors (at least for the variances)
lead to MCMC convergence failure – or apparent convergence when the joint posterior is improper!

The EB approach (replacing τ² by τ̂²) may produce estimates that are still improved, yet safer to use.

Chapter 5: The Empirical Bayes Approach – p. 14/14