
Missing Data

Outline
1. Notation and Definitions
2. Modeling Strategies
   (a) Selection models
   (b) Pattern mixture models
3. Assumptions: MCAR, MAR, Ignorable
4. Missing covariates
   (a) Selection model approach
   (b) Computational strategies
5. Example: Human Fertility

Notation:

Y_i iid∼ f_Y(θ), for i = 1, …, n

M_i = 1 if Y_i is missing, 0 if Y_i is observed

Interest: Inference on θ

How to proceed?
• Complete-case analysis: focus on the observed Y's.
  Assumption: the complete cases are a random subsample.
• Alternative: model the joint distribution of Y & M.

Illustration

Suppose we send out a questionnaire to 100 Duke professors, requesting information on their current body weight (bw). Our interest is in estimating the bw distribution. Only 55 of the profs responded to the questionnaire, so there are 45 missing observations.

[Figure: Estimated body weight distribution - density of body weight (lbs), comparing all profs with the profs providing data]

Modeling the Missing Data Mechanism

Selection model:

f_{Y,M}(y_i, m_i | θ, ψ) = f_Y(y_i | θ) f_{M|Y}(m_i | y_i, ψ)

f_Y(y_i | θ) = complete-data model for Y
f_{M|Y}(m_i | y_i, ψ) = model for the missing-data mechanism

Pattern-mixture model:

f_{Y,M}(y_i, m_i | φ, π) = f_{Y|M}(y_i | m_i, φ) f_M(m_i | π)

f_{Y|M}(y_i | m_i, φ) = conditional distribution of Y given the missingness pattern
f_M(m_i | π) = marginal model for the missingness pattern

Application to Body Weight Data

Let y_1, …, y_100 denote the body weights for the 100 profs surveyed. Let m_1, …, m_100 denote 0/1 indicators that the survey was not returned (so y_i is missing when m_i = 1).

Selection Model Likelihood:

(2πσ²)^{−n/2} exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − μ)² } Π_{i=1}^n h(y_i; ψ)^{m_i} {1 − h(y_i; ψ)}^{1−m_i},

where h(y_i; ψ) = Pr(M = 1 | Y = y_i). Can we choose a non-informative prior for θ = (σ, μ)?

Point 1: For m_i = 1, y_i is unknown (i.e., a latent variable)
Point 2: ψ is typically not identified from the data
Point 3: Analyses with missing data must rely on mechanistic assumptions or informative priors

We simulated the prof weight data under the selection model with μ = 185, σ = 20, and

Pr(M = 1 | Y = y_i) = h(y_i; ψ) = exp{−1 + 2(y_i − μ)/σ} / [1 + exp{−1 + 2(y_i − μ)/σ}]

Under this model, the probability of not returning the questionnaire increases with body weight. For example, the heavier profs may be embarrassed to respond (or may respond inaccurately, which is a different issue).
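This simulation is easy to reproduce. The sketch below (with the sample size inflated well beyond 100 purely to make the bias unmistakable) draws weights, applies the logistic missingness model above, and compares the complete-case mean to the truth:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 185.0, 20.0, 100_000   # n inflated from 100 to make the bias obvious

y = rng.normal(mu, sigma, n)

# h(y) = Pr(M = 1 | Y = y) from the slide: expit(-1 + 2 * (y - mu) / sigma)
h = 1.0 / (1.0 + np.exp(-(-1.0 + 2.0 * (y - mu) / sigma)))
m = rng.random(n) < h                 # m = True -> questionnaire not returned

complete_case_mean = y[~m].mean()
print(y.mean(), complete_case_mean)   # complete-case mean is biased downward
```

The complete-case mean lands several pounds below μ = 185, because the heavier profs are exactly the ones most likely to be missing.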

• Such informative missingness occurs commonly in biomedical studies & can lead to bias.
• If the missingness model is known, it is straightforward to adjust for the bias.
• Potentially, sensitivity analyses can be conducted using a range of models.

Pattern-Mixture Likelihood:

(2πσ²)^{−n/2} exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − μ − m_i α)² } Π_{i=1}^n π^{m_i} (1 − π)^{1−m_i}.

• The normal mean depends on whether or not the observation is missing, which is not the case for the selection model.
• For α = 0, the random-subsample assumption holds & complete-case analysis is appropriate.
• Under the random-subsample assumption, the pattern-mixture & selection model likelihoods are equivalent.
• Identifiability of α necessarily relies on an informative prior.
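The pattern-mixture likelihood transcribes directly into code. This sketch evaluates the complete-data log-likelihood, treating the y_i for missing cases as already filled in (they are latent, so in practice this would sit inside an imputation or augmentation scheme):

```python
import numpy as np

def pattern_mixture_loglik(y, m, mu, alpha, sigma, pi):
    """Complete-data log-likelihood of the pattern-mixture model above:
    y_i ~ N(mu + m_i * alpha, sigma^2) and m_i ~ Bernoulli(pi)."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    normal_part = (-0.5 * y.size * np.log(2.0 * np.pi * sigma ** 2)
                   - np.sum((y - mu - m * alpha) ** 2) / (2.0 * sigma ** 2))
    bernoulli_part = np.sum(m * np.log(pi) + (1.0 - m) * np.log(1.0 - pi))
    return normal_part + bernoulli_part
```

Note that for α = 0 the normal term no longer depends on m, which is the random-subsample case discussed above.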

Identifying assumption: Missing Completely At Random (MCAR)

M is independent of Y:

f_Y(y_i | θ) = f_{Y|M}(y_i | m_i, φ), with θ = φ
f_{M|Y}(m_i | y_i, ψ) = f_M(m_i | π), with ψ = π

• Under MCAR, the selection & pattern-mixture models are equivalent.
• Alternative: informative prior.
• Note: when data are multivariate, the MCAR assumption can be relaxed without sacrificing identifiability.

Now suppose that we measure covariates X = (X_1, …, X_p)′ for each subject. Consider the extended selection model incorporating X,

f_{Y,M|X}(y_i, m_i | x_i, θ, ψ) = f_{Y|X}(y_i | x_i, θ) f_{M|Y,X}(m_i | y_i, x_i, ψ)

If M is conditionally independent of Y given X, then

f_{M|Y,X}(m_i | y_i, x_i, ψ) = f_{M|X}(m_i | x_i, ψ)

and the data are said to be Missing At Random (MAR).

• Under MAR, whether or not an observation is missing depends only on the observed data.
• Under MAR, it is appropriate to ignore the missingness process when making inference on θ, provided θ is distinct from ψ.

Ignorability

The observed-data likelihood L(θ, ψ | y_obs, m, x) is

Π_{i=1}^n { f_{Y|X}(y_i | x_i, θ) f_{M|Y,X}(0 | y_i, x_i, ψ) }^{1−m_i} × { ∫ f_{Y|X}(y_mis | x_i, θ) f_{M|Y,X}(1 | y_mis, x_i, ψ) dy_mis }^{m_i}

Under MAR, this reduces to

Π_{i=1}^n f_{M|X}(m_i | x_i, ψ) { f_{Y|X}(y_i | x_i, θ) }^{1−m_i} { ∫ f_{Y|X}(y_mis | x_i, θ) dy_mis }^{m_i}.

Hence, as long as θ and ψ are distinct (i.e., have disjoint parameter spaces), inference on θ can be based on the likelihood proportional to

Π_{i: m_i = 0} f_{Y|X}(y_i | x_i, θ).

Example

Suppose we know the age (X_1) & gender (X_2) of all 100 profs.

Complete-data model: y_i = β_1 + β_2 x_i1 + β_3 x_i2 + ε_i, with ε_i ∼ N(0, σ²)

Missing-data model: logit Pr(M = 1 | Y = y_i, X_1 = x_i1, X_2 = x_i2) = ψ_1 + ψ_2 x_i1 + ψ_3 x_i2

Under this model, missingness is conditionally independent of weight given age & gender.

• Suppose older male profs are heavier & less likely to respond.
• Then integrating X_1, X_2 out of the above expressions leads to a positive association between non-response & weight (as in the earlier model).
• If we are willing to assume that violations of MCAR are explained by age & gender, we can focus on the complete-case regression model.
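A small simulation illustrates the point (all covariate distributions and coefficient values here are hypothetical stand-ins for age & gender): under MAR, complete-case least squares still recovers the regression coefficients even when roughly half the responses are missing in a covariate-driven way.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                                # large n, purely illustrative

x1 = rng.normal(50.0, 10.0, n)             # stand-in for age
x2 = rng.integers(0, 2, n).astype(float)   # stand-in for gender indicator

beta = np.array([100.0, 1.0, 25.0])        # hypothetical (beta1, beta2, beta3)
y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.normal(0.0, 10.0, n)

# MAR: missingness depends on (x1, x2) only, never on y itself
eta = -3.0 + 0.05 * x1 + 1.0 * x2
m = rng.random(n) < 1.0 / (1.0 + np.exp(-eta))   # m = True -> y missing

# Complete-case least squares uses only the rows with y observed
X = np.column_stack([np.ones(n), x1, x2])
beta_cc, *_ = np.linalg.lstsq(X[~m], y[~m], rcond=None)
print(beta_cc)   # close to (100, 1, 25) despite heavy covariate-driven missingness
```

Dropping the covariates from the model would reintroduce the bias, since missingness is informative about y marginally.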

Missing Covariates

Suppose interest focuses on relating a response Y to covariates X = (X_1, …, X_p)′.
• It is extremely common to be missing data on a subset of X for some of the subjects under study.
• For example, suppose we included questions about medical history, demographic characteristics, and diet in the questionnaire sent to the profs. Some questions would likely be left blank.
• Standard software discards all information collected on a subject having any missing data.
• This complete-case approach can be very inefficient.

Selection Model Approach

Let m_i = (m_i1, …, m_ip)′ denote the missing-data indicators for covariates X_1, …, X_p, respectively.

Model components:
1. Conditional likelihood of the response: f_{Y|X}(y_i | x_i, θ)
2. Sampling density of X: f_X(x_i | τ)
3. Missing-data model: f_{M|Y,X}(m_i | y_i, x_i, ψ)

If there were no missing data, we would focus on 1. Typically, unless some sort of probability-weighting scheme is used, f_X needs to be modeled even if interest is in θ.

Imputing the Missing Values

A Simple Bootstrap Approach:
1. Fill in the missing X's by drawing a random sample with replacement from the empirical distribution of the observed X's.
2. Repeat the process multiple times to produce multiple "complete" data sets.
3. Fit each complete data set using standard software.
4. Average the parameter estimates & obtain a variance estimate that accounts for within- and between-imputation uncertainty.

• Very easy to implement & does not require specification of f_X.
• Unfortunately, it does require MCAR.
• Numerous imputation schemes have been proposed in the literature.
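The four steps above can be sketched as follows for a toy problem (estimating a mean under MCAR; the data and constants are hypothetical). The combination in step 4 uses Rubin's within-plus-between variance formula:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: estimate the mean of x when ~30% of x is missing, MCAR
n = 300
x = rng.normal(10.0, 2.0, n)
miss = rng.random(n) < 0.3            # MCAR: missingness ignores the values
x_obs = x[~miss]

n_imp = 20                            # step 2: repeat the imputation n_imp times
point_ests, within_vars = [], []
for _ in range(n_imp):
    x_fill = x.copy()
    # Step 1: fill the holes by resampling observed values with replacement
    x_fill[miss] = rng.choice(x_obs, size=miss.sum(), replace=True)
    # Step 3: fit each "complete" data set (here the fit is just a sample mean)
    point_ests.append(x_fill.mean())
    within_vars.append(x_fill.var(ddof=1) / n)   # variance of the mean estimate

# Step 4 (Rubin's rules): average the estimates; the total variance combines
# the average within-imputation variance and the between-imputation spread
point = float(np.mean(point_ests))
between = float(np.var(point_ests, ddof=1))
total_var = float(np.mean(within_vars)) + (1 + 1 / n_imp) * between
print(point, np.sqrt(total_var))
```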

Data Augmentation within MCMC

Choose a prior for θ, τ, ψ and consider the unknown X's as latent data. Apply the following MCMC algorithm:
1. Impute the missing X's by sampling from their full conditional distribution.
2. Conditional on the completed data, follow standard Gibbs sampling (or other) steps for updating the parameters θ, τ, ψ.
3. Repeat steps 1-2 for a large number of iterations.

Often, the conditional distributions required for implementation of this algorithm follow a simple form.
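A minimal data-augmentation sketch for a toy version of this setup: y_i = θx_i + ε_i with ε_i ∼ N(0, 1), x_i ∼ N(0, 1) with known variance, some x_i missing completely at random, and a N(0, 10²) prior on θ. Both full conditionals are conjugate normals. (The model and all constants are illustrative, not from the slides.)

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data: y_i = theta * x_i + N(0, 1), x_i ~ N(0, 1); ~40% of x MCAR
n, theta_true = 500, 2.0
x_full = rng.normal(0.0, 1.0, n)
y = theta_true * x_full + rng.normal(0.0, 1.0, n)
miss = rng.random(n) < 0.4
x = np.where(miss, 0.0, x_full)       # missing x's initialized at 0

theta, draws = 0.0, []
for it in range(3000):
    # Step 1: impute each missing x_i from its full conditional,
    # x_i | y_i, theta ~ N(v * theta * y_i, v) with v = 1 / (1 + theta^2)
    v = 1.0 / (1.0 + theta ** 2)
    x[miss] = rng.normal(v * theta * y[miss], np.sqrt(v))
    # Step 2: update theta | completed data (conjugate N(0, 10^2) prior)
    V = 1.0 / (x @ x + 1.0 / 100.0)
    theta = rng.normal(V * (x @ y), np.sqrt(V))
    if it >= 500:                     # step 3: iterate; discard burn-in
        draws.append(theta)

print(np.mean(draws))                 # posterior mean, near theta_true
```

The same two-step pattern applies with richer models for f_X and the missingness mechanism; only the full conditionals change.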

Human Fertility Modeling Example

Data collected for women attempting pregnancy:

y_ij = 1 if conception occurs in cycle j, 0 otherwise
x_ijk = 1 if intercourse on day k of cycle j, 0 otherwise
m_ijk = 1 if missing intercourse data for day k, 0 otherwise

Conception Probability Model:

Pr(y_ij = 1 | x_ij) = ω [ 1 − Π_{k=1}^K (1 − p_k)^{x_ijk} ]

Intercourse Occurrence Model: Pr(x_ijk = 1) = π_k, for k = 1, …, K

Missing Data Model: Pr(m_ijk = 1 | x_ijk = 0) = λ_0, Pr(m_ijk = 1 | x_ijk = 1) = λ_1

Once the missing intercourse indicators are filled in, the Gibbs sampling algorithm of Dunson & Zhou (2000, JASA) can be used for posterior computation. To extend the Dunson & Zhou algorithm, we simply add a step to impute the missing intercourse indicators by sampling from their full conditional distribution, which is

[x_ijk | y_ij, x_ij(k), ·] =d Bernoulli( Pr(Y = y_ij | x_ijk = 1, ·) λ*_k / [ Pr(Y = y_ij | x_ijk = 1, ·) λ*_k + Pr(Y = y_ij | x_ijk = 0, ·)(1 − λ*_k) ] ),

where λ*_k = λ_1 π_k / (λ_1 π_k + λ_0 (1 − π_k)).
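This Bernoulli update reduces to a one-line probability calculation once the two conception-model terms Pr(Y = y_ij | x_ijk = ·, ·) are available; a sketch (the function name and argument names are hypothetical):

```python
def impute_prob(pr_y_x1, pr_y_x0, pi_k, lam0, lam1):
    """Full-conditional Pr(x_ijk = 1) for a missing intercourse indicator.
    pr_y_x1, pr_y_x0 = Pr(Y = y_ij | x_ijk = 1 or 0, rest of cycle fixed),
    computed from the conception probability model; pi_k, lam0, lam1 are the
    intercourse-occurrence and missing-data model parameters."""
    # lambda*_k = Pr(x_ijk = 1 | m_ijk = 1), from Bayes' rule
    lam_star = lam1 * pi_k / (lam1 * pi_k + lam0 * (1.0 - pi_k))
    num = pr_y_x1 * lam_star
    return num / (num + pr_y_x0 * (1.0 - lam_star))
```

As a sanity check, when λ_0 = λ_1 (non-informative missingness) and the two conception terms are equal, the update returns π_k, the marginal intercourse probability.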

A similar strategy can be used to accommodate informative missingness in a broad variety of applications

[Figure: Estimated pregnancy probabilities, Pr(Pregnancy | Cycle Viable), by day relative to ovulation (−5 to 0), comparing uncorrected & corrected estimates]

Summary
• Selection and pattern-mixture models can be used to accommodate missing data.
• Under MCAR or MAR assumptions, the missing-data mechanism is typically ignorable.
• In missing-covariate applications, there can be a substantial loss of efficiency associated with discarding subjects with missing observations - even under MCAR or MAR.
• To avoid restrictive assumptions, one often needs to model the missing-data mechanism.
• Given the lack of identifiability, informative priors should be used & sensitivity analyses conducted.

Assignment:

1. Complete the following exercise: Suppose y_i ∼ N(μ, σ²) and let m_i be an indicator that y_i is missing, for i = 1, …, n. Suppose that it is reasonable to assume that logit Pr(m_i = 1 | y_i, x_i) = α + x_i′β, where x_i = (x_i1, …, x_ip)′ is a vector of known covariates.
   (a) Generate data under the above model with y_i ∼ N(x_i′θ, 1), where x_i ∼ N_5(0, I_{5×5}), θ = β = (1, 1, 1, 1, 1)′, α = −1, and n = 250.
   (b) Estimate μ by maximum likelihood using a complete-case analysis.
   (c) Does MCAR hold? Why or why not?
   (d) Does MAR hold? Why or why not?
   (e) Is the complete-case estimate unbiased?
   (f) Propose a strategy for posterior computation of μ, σ:
       i. With the sampling density of x_i assumed MVN.
       ii. With the sampling density of x_i unspecified.
