Random effect models for longitudinal binary/binomial responses
Marco Alfò
with A. Spagnoli and J. Houwing-Duistermaat
“Sapienza” Università di Roma, Leiden Univ. Medical Center
London, April 15, 2009
Table of contents
• The statistical problem: Leiden 85+ data; Data; MMSE index; Modeling issues
• Modeling approach: Random effect structure; Modeling initial conditions
• Data analysis
• Missing data: Ignorable dropouts; Non-ignorable dropouts
• Data analysis - more results
• Concluding remarks
Purpose
• Define a regression model for longitudinal binomial data that accounts for dependence on baseline values and for random effects;
• Adopt a modeling structure adequate to obtain reliable ML parameter estimates;
• Since some data may be missing due to dropout, allow for potentially non-ignorable missing data.
Notation
We have a two-stage random sample $y_{it}$, $i = 1, \dots, n$, $t = 1, \dots, T$, from a binomial distribution $f(y \mid \theta)$ with covariates $X = (x_{it})$ and canonical parameter $\theta$. We consider binomial longitudinal responses, where $i$ indexes individuals and $t$ indexes time occasions within individuals; $T$ denotes the planned (design) completion time.
Intra-cluster Dependence
Primary interest: to allow for both apparent and true contagion.
• Apparent contagion: individuals are drawn from heterogeneous populations, each population having a constant, but different, propensity to the event of interest.
• True contagion: current and future outcomes are directly influenced by past ones, which cause changes over time in the corresponding distribution.
We will consider the effect of the baseline outcome on subsequent responses, with a special focus on random effect models for short panel series.
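Why apparent contagion matters for baseline dependence can be seen in a small simulation. The following is a minimal Python sketch with illustrative values (not estimates from the Leiden data) and hypothetical function names: it generates a random intercept binomial model in which, given $b_i$, the occasions are independent, so there is no true contagion. Heterogeneity alone is enough to make the baseline response strongly correlated with later responses.

```python
import numpy as np

def simulate_random_intercept(n, T, mu, sigma, n_items=30, seed=0):
    """Simulate Y_it | b_i ~ Bin(n_items, expit(mu + b_i)), b_i ~ N(0, sigma^2).

    Given b_i the occasions are independent: there is no true contagion."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, sigma, size=n)                 # individual propensities
    p = 1.0 / (1.0 + np.exp(-(mu + b)))                # constant over time
    return rng.binomial(n_items, p[:, None], size=(n, T))

def baseline_correlation(y):
    """Sample correlation between the baseline (t = 1) and the second occasion."""
    return np.corrcoef(y[:, 0], y[:, 1])[0, 1]

y_het = simulate_random_intercept(n=5000, T=3, mu=1.5, sigma=1.0)   # heterogeneous
y_hom = simulate_random_intercept(n=5000, T=3, mu=1.5, sigma=0.0)   # homogeneous
print(baseline_correlation(y_het))   # clearly positive: apparent contagion only
print(baseline_correlation(y_hom))   # close to zero
```

A baseline coefficient estimated without accounting for $b_i$ would therefore mix the two sources of dependence.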
Data
• Between September 1997 and September 1999, 705 inhabitants of Leiden (The Netherlands) aged 85 were eligible to participate in the Leiden 85-plus Study;
• 599 subjects were enrolled, 14 died before they could be enrolled and 92 refused to participate (der Wiel et al., 2002);
• Response variable: the Mini-Mental State Examination (MMSE) index (Folstein et al., 1975), which assesses the global cognitive status of older adults (measured once a year, from age 85 up to age 90).
MMSE index
• Assessed by a standard questionnaire;
• 30 items, each answered separately by each subject; the score on each item is binary (1 for a correct answer, 0 otherwise);
• the MMSE index is defined as
$$Y_{it} = \sum_{j=1}^{30} Y_{ijt}, \qquad Y_{ijt} \in \{0, 1\},$$
where $Y_{ijt}$, $i = 1, \dots, n$, $j = 1, \dots, 30$, $t = 1, \dots, T$, is the score of the $i$-th subject on the $j$-th item at time $t$.
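The aggregation from item scores to the MMSE index can be sketched as follows. The array layout (subjects × items × occasions) and the function name are assumptions made for illustration; missing item responses are not handled.

```python
import numpy as np

def mmse_index(items):
    """MMSE index Y_it = sum_j Y_ijt from binary item scores.

    items: array of shape (n, 30, T) with entries in {0, 1}
           (subjects x items x occasions; this layout is an assumption).
    Returns an (n, T) integer array with values in 0..30."""
    items = np.asarray(items)
    if items.ndim != 3 or items.shape[1] != 30:
        raise ValueError("expected an array of shape (n, 30, T)")
    if not np.isin(items, (0, 1)).all():
        raise ValueError("item scores must be 0 or 1")
    return items.sum(axis=1)

items = np.zeros((2, 30, 3), dtype=int)
items[0, :25, 0] = 1        # subject 1 answers 25 items correctly at t = 1
items[1, :, 2] = 1          # subject 2 answers all items correctly at t = 3
Y = mmse_index(items)
```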
Modeling issues
• Investigate the relationship between the annual value of the cognitive function, the baseline value and a set of covariates.
• Covariates collected at entry time: gender (female = 0) and educational status, classified into two levels: primary (= 0) and higher;
• A possible approach could be based on a Rasch (1960)-type model:
$$\operatorname{logit}\left[\Pr(Y_{ijt} = 1 \mid x_{it}, u_i)\right] = x_{it}^{\top}\beta + (u_i - \gamma_j),$$
where $u_i$ and $\gamma_j$, $i = 1, \dots, n$, $j = 1, \dots, 30$, represent the individual-specific (random) ability and the item-specific (fixed) complexity, respectively.
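A minimal Python sketch of the item-level probabilities implied by the Rasch-type model above. The function name and the item complexities are hypothetical, and the fixed part $x_{it}^{\top}\beta$ is passed in as a single number:

```python
import numpy as np

def rasch_item_probs(eta_fixed, u, gamma):
    """Item-level probabilities Pr(Y_ijt = 1 | x_it, u_i) = expit(x_it' beta + u_i - gamma_j).

    eta_fixed : value of the fixed part x_it' beta for subject i at time t
    u         : individual-specific ability u_i
    gamma     : array of item complexities gamma_1, ..., gamma_30
    """
    gamma = np.asarray(gamma, dtype=float)
    return 1.0 / (1.0 + np.exp(-(eta_fixed + u - gamma)))

gamma = np.linspace(-2.0, 2.0, 30)             # hypothetical item complexities
p_low = rasch_item_probs(0.5, -1.0, gamma)     # lower ability
p_high = rasch_item_probs(0.5, 1.0, gamma)     # higher ability
expected_low, expected_high = p_low.sum(), p_high.sum()   # E(Y_it | u_i) = sum_j p_ijt
```

Higher ability raises every item probability, while more complex items (larger $\gamma_j$) have lower probabilities for a given subject.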
Model structure
However, in what follows we take as response the MMSE index
$$Y_{it} = \sum_{j} Y_{ijt}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T.$$
Thus, we (implicitly) assume:
• local (conditional) independence;
• constant (across items and over time) item complexity, i.e. $\gamma_j = \gamma \Rightarrow \pi_{ijt} = \pi_{it}$.
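The role of the constant-complexity assumption can be checked on the first two moments. Under local independence the total score is a sum of Bernoulli variables; it is exactly $\mathrm{Bin}(30, \pi_{it})$ only when all item probabilities coincide. The sketch below uses illustrative values for $x_{it}^{\top}\beta$, $u_i$ and $\gamma_j$ (not estimates):

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def total_score_moments(eta_fixed, u, gamma):
    """Mean and variance of Y_it = sum_j Y_ijt under local independence.

    With item-specific gamma_j the total is a sum of non-identical Bernoulli
    variables; with gamma_j = gamma for all j it is Bin(30, pi_it)."""
    p = expit(eta_fixed + u - np.asarray(gamma, dtype=float))
    return p.sum(), (p * (1.0 - p)).sum()

eta, u = 0.5, 0.3                          # illustrative x_it' beta and u_i
pi_it = expit(eta + u - 0.2)               # common item probability when gamma_j = 0.2
m_eq, v_eq = total_score_moments(eta, u, np.full(30, 0.2))
m_sp, v_sp = total_score_moments(eta, u, np.linspace(-2.0, 2.0, 30))
p_bar = m_sp / 30                          # average item probability, unequal gamma_j
```

With equal complexities the moments match $30\pi_{it}$ and $30\pi_{it}(1-\pi_{it})$; with unequal complexities the variance falls below that of a binomial with the same mean, so departures from $\gamma_j = \gamma$ show up as underdispersion relative to the binomial, conditional on $u_i$.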
Variance components
Conditional on unobserved characteristics summarized by the individual-specific vector $b_i$, the responses are assumed to be independent:
$$Y_{it} \mid b_i \sim \mathrm{Bin}(30, \pi_{it}).$$
The canonical model, with $\theta_{it} = \operatorname{logit}(\pi_{it})$, is
$$\theta_{it} = x_{it}^{\top}\beta + z_{it}^{\top} b_i,$$
where the random coefficients $b_i$ are associated with a subset of covariates, $z_{it}$. The $b_i$ may be independent multivariate Gaussian random variables, $b_i \sim \mathrm{MVN}(0, \Sigma)$, or follow an arbitrary distribution with unknown density $g(b \mid \Sigma)$.
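Given $b_i$, one subject's log-likelihood follows from the canonical form of the binomial: for $Y \sim \mathrm{Bin}(30, \pi)$ with $\theta = \operatorname{logit}(\pi)$, $\log f(y \mid \theta) = y\theta - 30\log(1 + e^{\theta}) + \log\binom{30}{y}$. A minimal Python sketch, with hypothetical function names and design matrices passed as arrays:

```python
import math
import numpy as np

N_ITEMS = 30
# log C(30, k) for k = 0..30
LOG_COMB = np.array([math.log(math.comb(N_ITEMS, k)) for k in range(N_ITEMS + 1)])

def binom_loglik_canonical(y, theta):
    """Sum over occasions of log Bin(y_t; 30, expit(theta_t)), in canonical form
    y*theta - 30*log(1 + exp(theta)) + log C(30, y)."""
    y = np.asarray(y, dtype=int)
    theta = np.asarray(theta, dtype=float)
    return float(np.sum(y * theta - N_ITEMS * np.logaddexp(0.0, theta) + LOG_COMB[y]))

def conditional_loglik(y_i, X_i, Z_i, beta, b_i):
    """log prod_t f(y_it | b_i) with theta_it = x_it' beta + z_it' b_i.

    y_i: (T,) responses; X_i: (T, p) and Z_i: (T, q) design matrices; b_i: (q,)."""
    theta = np.asarray(X_i) @ np.asarray(beta) + np.asarray(Z_i) @ np.asarray(b_i)
    return binom_loglik_canonical(y_i, theta)
```

`np.logaddexp(0, theta)` computes $\log(1 + e^{\theta})$ without overflow for large $|\theta|$.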
Marginal likelihood
The likelihood is
$$L(\beta, \Sigma) = \prod_{i=1}^{n} \int \left[ \prod_{t=1}^{T} f(y_{it} \mid b_i) \right] g(b_i)\, db_i .$$
Under Gaussian assumptions, the likelihood may be approximated by
$$L(\beta, \Sigma) \doteq \prod_{i=1}^{n} \sum_{k_1, \dots, k_m} \left[ \prod_{t=1}^{T} f(y_{it} \mid b_{k_1, \dots, k_m}) \right] \pi_{k_1, \dots, k_m},$$
where $b_{k_1, \dots, k_m}$ and $\pi_{k_1, \dots, k_m}$ are the mass points and masses of a $(K_1, \dots, K_m)$-point Gaussian quadrature.
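A minimal Python sketch of the quadrature approximation for the simplest case, a random intercept ($m = 1$, $z_{it} = 1$) with $b_i \sim N(0, \sigma^2)$. NumPy's `hermgauss` gives nodes $x_k$ and weights $w_k$ for integrals against $e^{-x^2}$; these are rescaled to mass points $b_k = \sqrt{2}\,\sigma x_k$ and masses $\pi_k = w_k / \sqrt{\pi}$. Complete data, the array layout and the function names are assumptions of the sketch. Because the rule is not adaptive, many nodes may be needed when the integrand is sharply peaked (e.g. with 30 items per occasion); adaptive versions that center and scale the nodes for each subject are often preferred in practice.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

N_ITEMS = 30
# log C(30, k) for k = 0..30, used by the binomial log-density
LOG_COMB = np.array([math.log(math.comb(N_ITEMS, k)) for k in range(N_ITEMS + 1)])

def gh_masses(sigma, K):
    """Mass points b_k and masses pi_k for b ~ N(0, sigma^2) from K-point Gauss-Hermite."""
    x, w = hermgauss(K)                      # rule for integrals against exp(-x^2)
    return math.sqrt(2.0) * sigma * x, w / math.sqrt(math.pi)

def marginal_loglik(y, X, beta, sigma, K=20):
    """Approximate log L(beta, sigma) for a random intercept binomial model.

    y : (n, T) integer responses in 0..30 (complete data assumed)
    X : (n, T, p) covariates; theta_it = x_it' beta + b_i, b_i ~ N(0, sigma^2)
    """
    y = np.asarray(y, dtype=int)
    nodes, masses = gh_masses(sigma, K)
    theta = (X @ beta)[:, :, None] + nodes[None, None, :]      # (n, T, K)
    yk = y[:, :, None]
    # log f(y_it | b_k) in canonical form
    ll = yk * theta - N_ITEMS * np.logaddexp(0.0, theta) + LOG_COMB[yk]
    a = ll.sum(axis=1) + np.log(masses)[None, :]                # (n, K)
    m = a.max(axis=1, keepdims=True)                            # log-sum-exp over k
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

With $\sigma = 0$ all mass points collapse to zero and the masses sum to one, so the function reduces to the ordinary binomial log-likelihood, which is a convenient sanity check.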
Parametric assumptions?
The parametric approach may be implemented using a standard EM algorithm; see Aitkin (1999) for details. If the random effect distribution is left unspecified (nonparametric), NPML estimation of the mixing distribution can be carried out in a finite mixture framework using the same algorithm; computational details of NPML estimation for clustered/longitudinal data are also discussed in Aitkin (1999). R libraries: lme4 (Bates and Maechler) and npmlreg (Einbeck, Darnell and Hinde).
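A minimal Python sketch of the EM iterations for a $K$-point (NPML-style) mixing distribution, simplified to an intercept-only binomial model so that the M-step has a closed form. It is not the npmlreg implementation: with covariates, the M-step is usually a weighted binomial GLM fit on the data expanded $K$ times, and in NPML the number of mass points is typically increased until the likelihood stops improving. The function name and starting values are hypothetical.

```python
import numpy as np

N_ITEMS = 30

def npml_em_intercept(y, K=3, n_iter=500, tol=1e-8):
    """EM for a K-point discrete mixing distribution in an intercept-only
    binomial model: Y_it | class k ~ Bin(30, expit(theta_k)), Pr(class k) = pi_k.
    y is an (n, T) integer array without missing values.
    Returns mass points theta_k, masses pi_k and the log-likelihood
    (up to the binomial coefficients) at the last E-step."""
    y = np.asarray(y, dtype=float)
    _, T = y.shape
    s = y.sum(axis=1)                                   # per-subject sufficient statistic
    # start the mass points at spread-out quantiles of the subject-level logits
    p_i = np.clip(s / (N_ITEMS * T), 1e-3, 1 - 1e-3)
    theta = np.quantile(np.log(p_i / (1 - p_i)), np.linspace(0.1, 0.9, K))
    pi = np.full(K, 1.0 / K)
    old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior class weights w_ik
        a = (s[:, None] * theta[None, :]
             - N_ITEMS * T * np.logaddexp(0.0, theta)[None, :]
             + np.log(np.maximum(pi, 1e-300))[None, :])
        m = a.max(axis=1, keepdims=True)
        w = np.exp(a - m)
        tot = w.sum(axis=1, keepdims=True)
        w /= tot
        loglik = float(np.sum(m[:, 0] + np.log(tot[:, 0])))
        # M-step: closed-form updates (valid only without covariates)
        pi = w.mean(axis=0)
        p_k = (w * s[:, None]).sum(axis=0) / (N_ITEMS * T * w.sum(axis=0) + 1e-300)
        p_k = np.clip(p_k, 1e-10, 1 - 1e-10)
        theta = np.log(p_k / (1 - p_k))
        if loglik - old < tol:
            break
        old = loglik
    return theta, pi, loglik
```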
Dependence on baseline
Unobserved heterogeneity may be (partially) accounted for by including in the linear predictor the response value observed at $t = 1$ (the baseline value). The corresponding model is
$$Y_{it} \mid Y_{i1}, b_i \sim \mathrm{Bin}(30, \pi_{it}), \qquad t = 2, \dots, T,$$
with
$$\theta_{it} = x_{it}^{\top}\beta + \alpha y_{i1} + z_{it}^{\top} b_i, \qquad b_i \sim g(b_i \mid \Sigma).$$
ML estimation
In this case, the joint marginal distribution is given by
$$f(y_i) = \int \left[ \prod_{t=2}^{T} f(y_{it} \mid y_{i1}, b_i) \right] f(y_{i1} \mid b_i)\, g(b_i)\, db_i .$$
However, the term $f(y_{i1} \mid b_i)$ is not specified by the model assumptions; thus, we need to modify the standard variance component (VC) approach to account for this modeling structure. This issue is linked to the initial conditions problem in models with both autoregressive (AR) and random terms; see Aitkin and Alfò (1998).
How to handle it...
Various approaches can be adopted. We discuss:
• conditional modeling;
• joint modeling of first and subsequent occasions.
In the first case, we maximize the likelihood conditional on the initial conditions $y_{i1}$; this is not feasible when the baseline value is included as a covariate in a random effect model. In the second case, we define a joint model that includes a modified model for the first occasion and a standard model for subsequent occasions.
Conditional modeling - 1
Since the $b_i$'s are shared by all of subject $i$'s responses, the likelihood function results from the following integral:
$$L(\beta, \Sigma \mid y_{i1}) = \prod_{i=1}^{n} \int \left[ \prod_{t=2}^{T} f(y_{it} \mid y_{i1}, b_i) \right] g(b_i \mid y_{i1})\, db_i$$
(see Aitkin and Alfò, 1998, 2006). However, conditioning changes the random effect distribution: parametric assumptions on $g(b_i)$ may not be valid for $g(b_i \mid y_{i1})$.
Conditional modeling - 2
The random coefficient vector can be rewritten as
$$b_i = b_i^{*} + \mathrm{E}(b_i \mid y_{i1}) = b_i^{*} + \Gamma y_{i1},$$
where $b_i^{*}$ is independent of $Y_{i1}$. The canonical model is
$$\theta_{it} = x_{it}^{\top}\beta + \alpha y_{i1} + z_{it}^{\top}\left[b_i^{*} + \Gamma y_{i1}\right].$$
For random intercept models ($z_{it} = 1$, scalar $\Gamma = \gamma$) we have
$$\theta_{it} = x_{it}^{\top}\beta + \alpha y_{i1} + b_i^{*} + \gamma y_{i1} = x_{it}^{\top}\beta + \tilde{\alpha} y_{i1} + b_i^{*}, \qquad \tilde{\alpha} = \alpha + \gamma.$$
This means that the estimate of $\tilde{\alpha}$ will include both the effect of the baseline value on the response variable and its effect on the random effect.
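How conditioning on the baseline shifts the random effect distribution can be checked numerically. A minimal Python sketch, assuming a random intercept, a first-occasion model $y_{i1} \mid b_i \sim \mathrm{Bin}(30, \operatorname{expit}(\mu_1 + b_i))$ and $b_i \sim N(0, \sigma^2)$ with illustrative values; $\mathrm{E}(b_i \mid y_{i1})$ is computed by integration on a fine grid. The code does not assume linearity, so the computed values can be compared with the working form $\Gamma y_{i1}$. Names are hypothetical.

```python
import numpy as np

N_ITEMS = 30

def posterior_mean_b(y1, mu1, sigma, n_grid=4001, width=8.0):
    """E(b_i | y_i1) for y_i1 | b_i ~ Bin(30, expit(mu1 + b_i)), b_i ~ N(0, sigma^2), sigma > 0.

    Uses a fine equally spaced grid; normalizing constants and the grid step
    cancel in the ratio."""
    b = np.linspace(-width * sigma, width * sigma, n_grid)
    theta = mu1 + b
    log_post = y1 * theta - N_ITEMS * np.logaddexp(0.0, theta) - 0.5 * (b / sigma) ** 2
    w = np.exp(log_post - log_post.max())
    return float(np.sum(w * b) / np.sum(w))

means = np.array([posterior_mean_b(y, mu1=0.0, sigma=1.0) for y in range(N_ITEMS + 1)])
```

The posterior mean increases with $y_{i1}$, so $b_i$ and $Y_{i1}$ are not independent; this is the dependence that the decomposition $b_i = b_i^{*} + \Gamma y_{i1}$ moves into the baseline coefficient.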
Drawbacks
• The standard VC algorithm can easily be adapted to conditional modeling. However:
  - it discards information about the random effect structure, which, for short time series, may produce inefficient parameter estimates;
  - it is of no help for random effect models with baseline dependence (in general, the corresponding estimates would be biased).
• Solution: include the term $f(y_{i1} \mid b_i)$ in a full model for all the data.
Model all the data jointly
As before, the model for $t \geq 2$ is defined by
$$\theta_{it} = x_{it}^{\top}\beta + \alpha y_{i1} + z_{it}^{\top} b_i,$$
while for the first occasion we assume
$$\theta_{i1} = x_{i1}^{\top}\beta_1 + z_{i1}^{\top} b_{i1},$$
with $b_{i1} = \Lambda b_i$, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$, so that $\mathrm{cov}(b_{i1}) = \Lambda \Sigma \Lambda^{\top}$. The random effect covariance matrix may therefore differ between the first and later occasions; the same applies to the fixed parameter vector ($\beta_1$ vs $\beta$). This is handled by defining a dummy variable $d_t = 1$ for $t \geq 2$ (and $0$ for $t = 1$) and interacting it with the appropriate terms.
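A minimal Python sketch of the joint log-likelihood for the random intercept case ($m = 1$, $z_{it} = 1$, scalar $\lambda$), assuming $b_i \sim N(0, \sigma^2)$ integrated out by Gauss-Hermite quadrature and complete data. Separate coefficient vectors `beta1` and `beta` play the role of the $d_t$ interactions; function and argument names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

N_ITEMS = 30
LOG_COMB = np.array([math.log(math.comb(N_ITEMS, k)) for k in range(N_ITEMS + 1)])

def _binom_ll(y, theta):
    """Elementwise log Bin(y; 30, expit(theta)) in canonical form."""
    return y * theta - N_ITEMS * np.logaddexp(0.0, theta) + LOG_COMB[y]

def joint_loglik(y, X1, X, beta1, beta, alpha, lam, sigma, K=30):
    """Joint model for all occasions, random intercept version:
        theta_i1 = x_i1' beta1 + lam * b_i          (first occasion)
        theta_it = x_it' beta + alpha * y_i1 + b_i  (t >= 2)
    with b_i ~ N(0, sigma^2).
    y: (n, T) int; X1: (n, p1); X: (n, T-1, p) covariates for t = 2..T."""
    y = np.asarray(y, dtype=int)
    x, w = hermgauss(K)
    b = math.sqrt(2.0) * sigma * x                        # mass points (K,)
    log_masses = np.log(w / math.sqrt(math.pi))
    y1, ylater = y[:, 0], y[:, 1:]
    th1 = (X1 @ beta1)[:, None] + lam * b[None, :]        # (n, K)
    ll1 = _binom_ll(y1[:, None], th1)                     # f(y_i1 | b_i) term
    eta = X @ beta + alpha * y1[:, None]                  # (n, T-1)
    th = eta[:, :, None] + b[None, None, :]               # (n, T-1, K)
    ll2 = _binom_ll(ylater[:, :, None], th).sum(axis=1)   # (n, K)
    a = ll1 + ll2 + log_masses[None, :]
    m = a.max(axis=1, keepdims=True)                      # log-sum-exp over mass points
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

The first-occasion term $f(y_{i1} \mid b_i)$ sits inside the same integral as the later occasions, which is what distinguishes this approach from conditional modeling.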
Leiden 85+ data, VC model
Variable       Coeff.     Std. Err.
Cons.           1.3525    0.0783
Age            -0.1944    0.0066
Gender          0.2299    0.1216
Educational     1.0240    0.1207
σb              1.3344
ℓ              -2756
AIC             5522
Table: VC model: parameter estimates
Leiden 85+ data, VC.bas model
Variable       Coeff.     Std. Err.
Cons.          -3.3395    0.0717
Age            -0.1944    0.0065
Gender         -0.0133    0.1079
Educ.           0.2264    0.0595
MMSE85          0.2091    0.0050
Cons.bas        1.4313    0.1028
Gender.bas      0.0526    0.1311
Educ.bas        0.3243    0.0420
σb              0.8705
λ               1.1432    0.2221
ℓ              -2293
AIC             4606
Table: VC.bas model: parameter estimates
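The log-likelihood and AIC entries in the tables can be cross-checked with $\mathrm{AIC} = -2\ell + 2p$, where $p$ is the number of estimated parameters. A minimal Python sketch, assuming $p$ equals the number of table entries other than $\ell$ and AIC (our reading of the tables), reproduces the reported AIC values from the rounded log-likelihoods:

```python
def aic(loglik, n_params):
    """Akaike information criterion: AIC = -2 * loglik + 2 * n_params."""
    return -2.0 * loglik + 2.0 * n_params

# parameters counted as the table entries other than the log-likelihood and AIC
vc_params = ["Cons.", "Age", "Gender", "Educational", "sigma_b"]
vcbas_params = ["Cons.", "Age", "Gender", "Educ.", "MMSE85",
                "Cons.bas", "Gender.bas", "Educ.bas", "sigma_b", "lambda"]
print(aic(-2756, len(vc_params)))       # 5522.0
print(aic(-2293, len(vcbas_params)))    # 4606.0
```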
Intro
However, 276 of the 599 subjects (46.1%) have incomplete sequences; these may arise from a variety of selection rules: for example, an individual may refuse to answer some questions, or drop out during the study. Such behavior may distort the survey design and call into question the representativeness of the observed sample when drawing inference about the general population.
The key question is whether those who drop out differ (in any way relevant to the analysis) from those who remain.
More on key question
Once other variables in the data have been controlled for, do the missing data depend on the current (unobserved) value of the response variable? If not, the missing data mechanism is said to be ignorable (ID); otherwise, it is said to be non-ignorable (NID). In the first case, one may use standard ML methods for consistent estimation. Otherwise, one has to take into account the missing data mechanism to obtain consistent model parameter estimates.
Some notation
Let us denote by $R_{it}$ a missing data indicator, defined as:
• $R_{it} = 1$ if the $i$-th unit has dropped out at any time in $(t - 1, t)$ or earlier, $t = 1, \dots, T$;
• $R_{it} = 0$ otherwise.
The number of available responses for the $i$-th subject is
$$S_i = T - \sum_{t=1}^{T} R_{it}.$$
We focus on a special case, assuming that, once a person drops out, he/she is out forever (attrition is an absorbing state).
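Under absorbing attrition the indicators can be built from each subject's last observed occasion. A minimal Python sketch, assuming that $R_{it}$ stays at 1 from the dropout interval onward, which is what makes $S_i = T - \sum_t R_{it}$ count the available responses; the function name and input format are hypothetical.

```python
import numpy as np

def dropout_indicators(last_observed, T):
    """Missing data indicators under absorbing attrition.

    last_observed: last occasion (1..T) at which each subject was observed.
    Returns R (n, T), with R_it = 1 once subject i has dropped out,
    and S (n,), with S_i = T - sum_t R_it."""
    last = np.asarray(last_observed)
    t = np.arange(1, T + 1)
    R = (t[None, :] > last[:, None]).astype(int)
    S = T - R.sum(axis=1)
    return R, S

# six yearly occasions (ages 85 to 90): completer, dropout after t = 3, baseline only
R, S = dropout_indicators([6, 3, 1], T=6)
```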
Statistically speaking...
If the mechanism is ignorable, the joint density is
$$\begin{aligned}
f(y_i^{O}, r_i \mid b_i, \phi_i) &= \int f(y_i^{O}, y_i^{M} \mid b_i)\, h(r_i \mid y_i, \phi_i)\, dy_i^{M} \\
&= \int f(y_i^{O}, y_i^{M} \mid b_i)\, h(r_i \mid y_i^{O}, \phi_i)\, dy_i^{M} \\
&= f(y_i^{O} \mid b_i)\, h(r_i \mid y_i^{O}, \phi_i),
\end{aligned}$$
where $y_i^{O}$ and $y_i^{M}$ denote the observed and missing parts of $y_i$. If $b_i \perp \phi_i$, the likelihood function can be factorized, and ML estimates of $\beta$ can be derived from the first term using standard approaches.
RCBDM (random coefficient based dropout models)
When dropout is non-ignorable, we may choose a parameterization that describes the association between the primary outcome and the missing data mechanism. In clinical trials, for instance, unobserved disease status could influence both the primary response and dropout (e.g. through non-compliance). The dependence between the outcome and the missing data indicator may thus arise because they are jointly determined by shared (or correlated) latent effects.
On the model side...
Assuming conditional independence given the random coefficients $(b_i, \phi_i)$, we may introduce an explicit discrete-time model for the dropout process:
$$h(r_i \mid \phi_i, V_i, y_i) = h(r_i \mid V_i, \phi_i),$$
where $V_i$ collects the covariates of the dropout model. This leads to the following joint density:
$$f(y_i, r_i) = \int f(y_i \mid X_i, b_i)\, h(r_i \mid V_i, \phi_i)\, dG(b_i, \phi_i).$$
The previous density may be specified using different approaches.
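One possible specification can be sketched in Python, assuming (this is our choice, not necessarily the authors' exact model) a random intercept binomial model for $Y_{it}$, a discrete-time logistic model for staying in the study ($1 - R_{it}$, the response used for the dropout part of the selection model table), and $(b_i, \phi_i)$ bivariate Gaussian with standard deviations $\sigma_b$, $\sigma_\phi$ and correlation $\rho$. The integral over $G$ uses a tensor-product Gauss-Hermite rule; baseline-dependence terms are omitted for brevity, and all names are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss

N_ITEMS = 30
LOG_COMB = np.array([math.log(math.comb(N_ITEMS, k)) for k in range(N_ITEMS + 1)])

def bivariate_gh(sigma_b, sigma_phi, rho, K):
    """Tensor-product Gauss-Hermite rule for (b, phi) ~ N2(0, Sigma); needs sigmas > 0, |rho| < 1."""
    x, w = hermgauss(K)
    z1, z2 = np.meshgrid(math.sqrt(2.0) * x, math.sqrt(2.0) * x, indexing="ij")
    weights = np.outer(w, w).ravel() / math.pi            # sum to 1
    cov = np.array([[sigma_b**2, rho * sigma_b * sigma_phi],
                    [rho * sigma_b * sigma_phi, sigma_phi**2]])
    L = np.linalg.cholesky(cov)                           # correlate standard normal nodes
    nodes = L @ np.vstack([z1.ravel(), z2.ravel()])
    return nodes[0], nodes[1], weights

def selection_loglik(y, last_obs, X, V, beta, delta, sigma_b, sigma_phi, rho, K=15):
    """Joint log-likelihood of responses and dropout with correlated random intercepts.

    y: (n, T) int, entries after dropout are placeholders in 0..30 and are ignored;
    last_obs: (n,) last observed occasion S_i; X: (n, T, p); V: (n, T, q)."""
    y = np.asarray(y, dtype=int)
    _, T = y.shape
    b, phi, weights = bivariate_gh(sigma_b, sigma_phi, rho, K)
    t = np.arange(1, T + 1)[None, :]
    S = np.asarray(last_obs)[:, None]
    observed = (t <= S)[:, :, None]                        # y_it available
    at_risk = ((t >= 2) & (t <= S + 1))[:, :, None]        # dropout can occur in (t-1, t)
    stay = (t <= S).astype(float)[:, :, None]              # 1 - R_it
    th_y = (X @ beta)[:, :, None] + b[None, None, :]
    ll_y = y[:, :, None] * th_y - N_ITEMS * np.logaddexp(0.0, th_y) + LOG_COMB[y][:, :, None]
    th_r = (V @ delta)[:, :, None] + phi[None, None, :]
    ll_r = stay * th_r - np.logaddexp(0.0, th_r)           # logistic model for staying in
    a = (np.where(observed, ll_y, 0.0).sum(axis=1)
         + np.where(at_risk, ll_r, 0.0).sum(axis=1)
         + np.log(weights)[None, :])                       # (n, K*K)
    m = a.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

With $\rho = 0$ the double integral factorizes into separate response and dropout parts, which corresponds to the ignorable case; $\rho \neq 0$ links dropout to the unobserved propensity driving the MMSE responses.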
Modeling choice
Little and Rubin (2002) discussed two classes of models to handle non-ignorable missing data:
• Pattern mixture models: the observed sample is stratified according to the observed dropout patterns. See e.g. Alfò and Aitkin (2000), Verbeke and Molenberghs (2000), Roy (2003), Wilkins and Fitzmaurice (2007).
• Selection models: a complete-data model is defined for the primary response and augmented by a model describing the missing data mechanism conditional on the complete data. See e.g. Verzilli and Carpenter (2002), Gao (2004), Lin et al. (2004), Rizopoulos et al. (2008).
Leiden 85+ data, Pattern Mixture model

Variable       Coeff.     Std. Err.
Cons.          -3.2729    0.0718
Age            -0.1994    0.0067
Gender         -0.0201    0.1080
Educ.           0.2065    0.0688
MMSE85          0.2044    0.0051
S               0.0542    0.0160
Cons.bas        1.4305    0.1029
Gender.bas     -0.0208    0.0484
Educ.bas        0.1314    0.0524
S.bas           0.0158    0.0114
σb              0.8622
λ               1.1429    0.2223
ℓ              -2287
AIC             4598
Table: Pattern mixture model: parameter estimates
Leiden 85+ data, Selection model

Response    Variable        Coeff.     Std. Err.
Y_it        Cons.           -3.3343    0.0711
Y_it        Age             -0.1893    0.0086
Y_it        Gender           0.0924    0.1027
Y_it        Educ.            0.1867    0.0476
Y_it        MMSE85           0.2012    0.0069
Y_it        Cons.bas         1.4311    0.1032
Y_it        Gender.bas       0.1006    0.1277
Y_it        Educ.bas         0.0674    0.0454
1 − R_it    Age.surv        -0.1064    0.0092
1 − R_it    Gender.surv     -1.0755    0.2604
1 − R_it    Educ.surv        0.3600    0.1604
1 − R_it    MMSE85.surv      0.1008    0.0191
            σb               0.9322
            λ                1.1543    0.1929
            σφ               2.4102
            ρ                0.6923
            ℓ               -2775
            AIC              5584
Table: Selection model: parameter estimates
Findings...
• Once we control for the baseline value, the effect of educational level drops markedly;
• however, the effect remains significant in both the VC.bas and the PM model;
• when we move to the selection model, educational level is not significant in the baseline (first-occasion) model, while it is significant on subsequent occasions as well as in the discrete-time survival model;
• these findings are, obviously, just a starting point: they need to be validated, but they point out the need for proper handling of missing data.
Next steps?
• From discrete-time to continuous-time survival models (event times are available);
• Genetic factors (in particular APOE, apolipoprotein E) have to be considered;
• From random coefficient based to response driven dropout models;
• Bayesian approach?
• Sensitivity analysis: do model assumptions influence results? Explore the impact of potential deviations from non-ignorability (see e.g. Troxel, Ma and Heitjan, 2004);
• From static to dynamic mixing: e.g. HMM (Cappé et al., 2005).
THANK YOU!