Random effect models for longitudinal binary/binomial responses

Marco Alfò
with A. Spagnoli and J. Houwing-Duistermaat
“Sapienza” Università di Roma, Leiden Univ. Medical Center

London, April 15, 2009

Table of contents

- The statistical problem
- Leiden 85+ data
- Data
- MMSE index
- Modeling issues
- Modeling approach
- Random effect structure
- Modeling initial conditions
- Data analysis
- Missing data
- Ignorable dropouts
- Non-ignorable dropouts
- Data analysis - more results
- Concluding remarks


Purpose

- Define a regression model for longitudinal binomial data that accounts for dependence on baseline values and random effects;
- Adopt an adequate modeling structure to obtain reliable ML parameter estimates;
- Since some data may be missing due to dropout, consider potentially non-ignorable missing data.


Notation


We have a two-stage random sample $y_{it}$, $i = 1, \dots, n$, $t = 1, \dots, T$, from a binomial distribution $f(y \mid \theta)$ with covariates $X = (x_{it})$ and canonical parameter $\theta$. We will consider binomial longitudinal responses, where $i$ indexes individuals and $t$ indexes time occasions within individuals. In this case, $T$ will denote the designed completion time.


Intra-cluster Dependence

Primary interest: to allow for apparent and true contagion.

- Apparent contagion: individuals are drawn from heterogeneous populations, each population having a constant, but different, propensity to the event of interest.
- True contagion: current and future outcomes are directly influenced by past ones, which cause changes over time in the corresponding distribution.


We will consider the effect of the baseline outcome on subsequent responses, with a special focus on random effect models for short panel series.

Data

- Between September 1997 and September 1999, 705 inhabitants of Leiden (The Netherlands), 85 years old, were eligible to participate in the Leiden 85-plus Study;
- 599 subjects were enrolled, 14 died before they could be enrolled and 92 refused to participate (der Wiel et al., 2002);
- Response variable: Mini-Mental State Examination (MMSE) index (Folstein et al., 1975), assessing the global cognitive status of older adults (measured once a year, from age 85 up to age 90).


MMSE index

- Assessed by a standard questionnaire;
- 30 items to be separately answered by each subject; scores on each item are binary (1 for a correct answer and 0 otherwise);
- the MMSE index is defined as follows:

\[ Y_{it} = \sum_{j=1}^{30} Y_{ijt}, \qquad Y_{ijt} \in \{0, 1\}, \]

where $Y_{ijt}$, $i = 1, \dots, n$, $j = 1, \dots, 30$, $t = 1, \dots, T$, represents the MMSE score for the $i$-th subject on the $j$-th item at time $t$.
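As a small illustration of the definition above, the index can be obtained by summing the binary item scores over the item dimension. This is only a sketch: the function name `mmse_index`, the array layout (subjects, items, occasions) and the toy data are assumptions made for the example, not part of the Leiden 85+ data set.

```python
import numpy as np

def mmse_index(items):
    """Sum binary item scores Y_ijt over the 30 items j to obtain Y_it.

    items: array of shape (n, 30, T) with entries in {0, 1} (assumed layout).
    Returns an (n, T) array of MMSE totals, each between 0 and 30.
    """
    items = np.asarray(items)
    if items.ndim != 3 or items.shape[1] != 30:
        raise ValueError("expected an array of shape (n, 30, T)")
    if not np.isin(items, (0, 1)).all():
        raise ValueError("item scores must be 0 or 1")
    return items.sum(axis=1)

# Toy data (hypothetical): 2 subjects, 30 items, 3 occasions.
rng = np.random.default_rng(0)
toy_items = rng.integers(0, 2, size=(2, 30, 3))
toy_index = mmse_index(toy_items)
```

The result has one row per subject and one column per occasion, which is the layout of the response $Y_{it}$ used in the rest of the talk.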


Modeling issues

- Investigate the relationship between the annual value of the cognitive function, the baseline value and a set of covariates.
- Covariates collected at entry time: gender (female = 0), educational status classified into two levels (primary = 0 and higher levels);
- A possible approach could be based on a Rasch (1960)-type model:

\[ \operatorname{logit}\left[\Pr(Y_{ijt} = 1 \mid x_{it}, u_i)\right] = x_{it}^{T}\beta + (u_i - \gamma_j), \]

where $u_i$ and $\gamma_j$, $i = 1, \dots, n$, $j = 1, \dots, 30$, represent individual-specific (random) ability and item-specific (fixed) complexity.


Model structure

However, in the following we will consider, as response, the MMSE index defined by:

\[ Y_{it} = \sum_{j} Y_{ijt}, \qquad i = 1, \dots, n, \quad t = 1, \dots, T. \]

Thus, we (implicitly) assume:

- local (conditional) independence;
- constant (across items and time) item-specific complexity, i.e. $\gamma_j = \gamma \Rightarrow \pi_{ijt} = \pi_{it}$.


Variance components

Conditional on unobserved characteristics summarized by the individual-specific vector $b_i$, the responses are assumed to be independent:

\[ Y_{it} \mid b_i \sim \mathrm{Bin}(30, \pi_{it}). \]

The canonical model is

\[ \theta_{it} = x_{it}^{T}\beta + z_{it}^{T} b_i, \]

where the random coefficient $b_i$ may be associated with a subset of covariates, $z_{it}$. The $b_i$ may be independent multivariate Gaussian random variables, $b_i \sim \mathrm{MVN}(0, \Sigma)$, or arbitrary with unknown density $g(b \mid \Sigma)$.
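To make the variance component structure concrete, the sketch below simulates data from the simplest special case: a Gaussian random intercept ($z_{it} = 1$), with the binomial canonical parameter taken as $\theta_{it} = \operatorname{logit}(\pi_{it})$. The function name, the argument names and the parameter values are assumptions chosen for illustration; they are not estimates from the Leiden 85+ data.

```python
import numpy as np

def simulate_random_intercept(X, beta, sigma_b, m=30, seed=0):
    """Draw Y_it | b_i ~ Bin(m, pi_it) with logit(pi_it) = x_it' beta + b_i.

    X: (n, T, p) covariate array; beta: (p,) fixed effects;
    sigma_b: standard deviation of the Gaussian random intercept b_i.
    Returns the (n, T) responses and the (n,) simulated random intercepts.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    b = rng.normal(0.0, sigma_b, size=n)          # b_i ~ N(0, sigma_b^2)
    theta = X @ np.asarray(beta, dtype=float) + b[:, None]  # canonical parameter
    pi = 1.0 / (1.0 + np.exp(-theta))             # inverse logit
    y = rng.binomial(m, pi)                       # independent given b_i
    return y, b

# Hypothetical design: intercept plus a centred time trend, 5 subjects, 4 occasions.
n_subj, n_occ = 5, 4
time = np.arange(n_occ) - (n_occ - 1) / 2.0
X_demo = np.stack([np.ones((n_subj, n_occ)), np.tile(time, (n_subj, 1))], axis=-1)
y_demo, b_demo = simulate_random_intercept(X_demo, beta=[1.0, -0.2], sigma_b=1.0)
```

Responses from the same subject share $b_i$, so they are marginally dependent even though they are independent conditional on $b_i$.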


Marginal likelihood

The likelihood is

\[ L(\beta, \Sigma) = \prod_{i=1}^{n} \int \left[ \prod_{t=1}^{T} f(y_{it} \mid b_i) \right] g(b_i)\, db_i . \]

Under Gaussian assumptions, the likelihood may be approximated by

\[ L(\beta, \Sigma) \doteq \prod_{i=1}^{n} \sum_{k_1, \dots, k_m} \left[ \prod_{t=1}^{T} f(y_{it} \mid b_{k_1, \dots, k_m}) \right] \pi_{k_1, \dots, k_m}, \]

where $b_{k_1, \dots, k_m}$ and $\pi_{k_1, \dots, k_m}$ are masspoints and masses for $(K_1, \dots, K_m)$-point Gaussian quadrature.
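The quadrature approximation can be sketched for the one-dimensional case ($m = 1$, a Gaussian random intercept) using Gauss-Hermite nodes from NumPy. The function names, the argument `eta` (the fixed part $x_{it}^{T}\beta$, assumed already computed) and the default of 20 quadrature points are assumptions for this example; it is not the implementation used for the results in the talk.

```python
import math
import numpy as np

def binom_logpmf(y, m, theta):
    """Log-pmf of Bin(m, pi) at y, with logit(pi) = theta (theta may be an array)."""
    log_comb = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return log_comb + y * theta - m * np.logaddexp(0.0, theta)

def marginal_loglik(y, eta, sigma, m=30, n_points=20):
    """Gauss-Hermite approximation to the log marginal likelihood.

    y: (n, T) binomial counts; eta: (n, T) fixed part of the linear predictor;
    sigma: standard deviation of the random intercept b_i ~ N(0, sigma^2).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    b = math.sqrt(2.0) * sigma * nodes                   # mass points b_k
    log_mass = np.log(weights) - 0.5 * math.log(math.pi)  # log masses pi_k (sum to 1)
    total = 0.0
    for y_i, eta_i in zip(y, eta):
        # log prod_t f(y_it | b_k), evaluated at every mass point k
        log_f = np.zeros(n_points)
        for y_it, eta_it in zip(y_i, eta_i):
            log_f = log_f + binom_logpmf(int(y_it), m, eta_it + b)
        # log sum_k exp(log_f_k) * pi_k, computed stably
        a = log_f + log_mass
        total += a.max() + math.log(np.exp(a - a.max()).sum())
    return total
```

With `sigma = 0` all mass points collapse to zero and the function reduces to the ordinary binomial log-likelihood, which gives a simple sanity check.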


Parametric assumptions?

The parametric approach may be implemented using a standard EM algorithm; see Aitkin (1999) for details. For an arbitrary (nonparametric) assumption, nonparametric maximum likelihood (NPML) estimation of the mixing distribution can be achieved in a finite mixture framework using the same algorithm. Computational details of NPML estimation in the context of clustered/longitudinal data analysis are discussed in Aitkin (1999).

R libraries: lme4 (Bates and Maechler) and npmlreg (Einbeck, Darnell and Hinde).
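The finite mixture idea behind NPML can be sketched in a deliberately simplified setting: no covariates, a fixed number $K$ of mass points, and mass points stored on the probability scale, so that the M-step has a closed form. This is an assumption-laden toy version, not the algorithm of Aitkin (1999) nor the `npmlreg` implementation, where the M-step is a weighted GLM fit with covariates; the name `npml_em` and its defaults are hypothetical.

```python
import numpy as np

def npml_em(y, K=3, m=30, n_iter=200):
    """EM for a K-point mixture with Y_it | component k ~ Bin(m, p_k).

    y: (n, T) counts, all occasions of a subject sharing the same component.
    Returns (p, w): mass points (probability scale) and their masses.
    """
    y = np.asarray(y, dtype=float)
    n, T = y.shape
    s = y.sum(axis=1)                      # per-subject sufficient statistic
    p = np.linspace(0.2, 0.8, K)           # starting mass points
    w = np.full(K, 1.0 / K)                # starting masses
    for _ in range(n_iter):
        # E-step: posterior component weights (binomial coefficients cancel)
        log_post = (np.log(w) + s[:, None] * np.log(p)
                    + (m * T - s)[:, None] * np.log1p(-p))
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: update masses and mass points as weighted proportions
        col = post.sum(axis=0)
        w = np.clip(col / n, 1e-12, None)
        w /= w.sum()
        p = (post * s[:, None]).sum(axis=0) / (np.maximum(col, 1e-12) * m * T)
        p = np.clip(p, 1e-8, 1.0 - 1e-8)
    return p, w
```

In the Gaussian quadrature version the mass points and masses are fixed in advance; here both are estimated, which is what makes the mixing distribution nonparametric.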


Dependence on baseline

Unobserved heterogeneity may be (partially) accounted for by inserting, in the linear predictor, the response value observed at $t = 1$ (baseline value). The corresponding model is, for $t > 1$,

\[ Y_{it} \mid Y_{i1}, b_i \sim \mathrm{Bin}(30, \pi_{it}), \qquad t = 2, \dots, T, \]

with

\[ \theta_{it} = x_{it}^{T}\beta + \alpha y_{i1} + z_{it}^{T} b_i, \qquad b_i \sim g(b_i \mid \Sigma). \]


ML estimation

In this case, the joint marginal distribution is given by:

\[ f(y_i) = \int \left\{ \prod_{t=2}^{T} f(y_{it} \mid y_{i1}, b_i) \right\} f(y_{i1} \mid b_i)\, g(b_i)\, db_i . \]

However, the term $f(y_{i1} \mid b_i)$ is not specified by the model assumptions; thus, we need to modify the standard VC approach to account for this modeling structure. This issue can be linked to the initial condition problem in models with both AR and random terms; see Aitkin and Alfò (1998).


How to handle it...

Various approaches can be adopted. We discuss:

- Conditional modeling
- Joint modeling of first and subsequent occasions

In the first case, we maximize the likelihood conditional on the initial conditions $y_{i1}$; this is not feasible when the baseline value is included as a covariate in a random effect model. In the second case, we define a joint model including a modified model for the first occasion and a standard model for subsequent occasions.


Conditional modeling - 1

Since the $b_i$'s are shared by all responses of subject $i$, the likelihood function results from the following integral:

\[ L(\beta, \Sigma \mid y_{i1}) = \prod_{i=1}^{n} \int \left\{ \prod_{t=2}^{T} f(y_{it} \mid y_{i1}, b_i) \right\} g(b_i \mid y_{i1})\, db_i ; \]

see Aitkin and Alfò (1998, 2006). However, the random effect distribution has changed due to conditioning $\Rightarrow$ parametric assumptions on $g(b_i)$ may not be valid for $g(b_i \mid y_{i1})$.


Conditional modeling - 2

The random coefficient vector can be rewritten as:

\[ b_i = \left[ b_i^{*} + E(b_i \mid y_{i1}) \right] = b_i^{*} + \Gamma y_{i1}, \]

where $b_i^{*}$ is independent of $Y_{i1}$. The canonical model is

\[ \theta_{it} = x_{it}^{T}\beta + \alpha y_{i1} + z_{it}^{T} \left[ b_i^{*} + \Gamma y_{i1} \right]. \]

When models with a random intercept only are considered ($z_{it} = 1$, $\Gamma = \gamma$), we have:

\[ \theta_{it} = x_{it}^{T}\beta + \alpha y_{i1} + b_i^{*} + \gamma y_{i1} = x_{it}^{T}\beta + \tilde{\alpha} y_{i1} + b_i^{*}, \qquad \tilde{\alpha} = \alpha + \gamma. \]

This means that the estimated $\tilde{\alpha}$ will include the effect of the baseline value on the response variable and on the random effect.

Drawbacks

The standard VC algorithm can be simply adapted to conditional modeling. However,

- it discards information about the random effect structure which, in the case of short time series, may produce inefficient parameter estimates;
- it cannot be of any help in the case of random effect models with baseline dependence (in general the corresponding estimate would be biased).

Solution: include the term $f(y_{i1} \mid b_i)$ in a full model for all the data.


Model all the data jointly

As before, the model for $t \geq 2$ is defined by:

\[ \theta_{it} = x_{it}^{T}\beta + \alpha y_{i1} + z_{it}^{T} b_i, \]

while for the first occasion we assume:

\[ \theta_{i1} = x_{i1}^{T}\beta_1 + z_{i1}^{T} b_{i1}, \]

with $b_{i1} = \Lambda b_i$, $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$, $\mathrm{cov}(b_{i1}) = \Lambda \Sigma \Lambda^{T}$. The RE covariance matrix may be different on later occasions; the same applies to the fixed parameter vector. This is handled by defining a dummy variable $d_t = 1$ for $t \geq 2$ (and $d_t = 0$ for $t = 1$), and interacting it with appropriate terms.
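The joint model can be sketched for a single subject in the random intercept case, where $\Lambda$ reduces to a scalar $\lambda$ and the integral over $b_i$ is approximated by Gauss-Hermite quadrature. Everything named here (`joint_loglik_subject`, `eta1` for $x_{i1}^{T}\beta_1$, `eta_rest` for $x_{it}^{T}\beta$ with $t \geq 2$, the Gaussian assumption on $b_i$) is an assumption of this example rather than the authors' code.

```python
import math
import numpy as np

def log_binom(y, m, theta):
    """Log-pmf of Bin(m, pi) at y, with logit(pi) = theta (theta may be an array)."""
    log_comb = math.lgamma(m + 1) - math.lgamma(y + 1) - math.lgamma(m - y + 1)
    return log_comb + y * theta - m * np.logaddexp(0.0, theta)

def joint_loglik_subject(y_i, eta1, eta_rest, alpha, lam, sigma, m=30, n_points=20):
    """Log of the integral of f(y_i1 | b) * prod_{t>=2} f(y_it | y_i1, b) * g(b) db.

    y_i: sequence (y_i1, ..., y_iT); eta1: fixed part at the first occasion;
    eta_rest: fixed parts for t = 2, ..., T; alpha: baseline effect;
    lam: scale of the random intercept at the first occasion; sigma: sd of b_i.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    b = math.sqrt(2.0) * sigma * nodes                    # mass points
    log_mass = np.log(weights) - 0.5 * math.log(math.pi)  # log masses
    y1 = int(y_i[0])
    # First occasion: theta_i1 = eta1 + lam * b_i (b_i1 = lambda * b_i)
    log_f = log_binom(y1, m, eta1 + lam * b)
    # Later occasions: theta_it = eta_it + alpha * y_i1 + b_i
    for y_it, eta_it in zip(y_i[1:], eta_rest):
        log_f = log_f + log_binom(int(y_it), m, eta_it + alpha * y1 + b)
    a = log_f + log_mass
    return a.max() + math.log(np.exp(a - a.max()).sum())
```

Summing this quantity over subjects gives the joint log-likelihood; the same random intercept enters both the first-occasion term and the later terms, which is what supplies the missing factor $f(y_{i1} \mid b_i)$.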


Leiden 85+ data, VC model

Variable       Coeff.     Std. Err.
Cons.           1.3525    0.0783
Age            -0.1944    0.0066
Gender          0.2299    0.1216
Educational     1.0240    0.1207
σb              1.3344
ℓ              -2756
AIC             5522

Table: VC model: parameter estimates


Leiden 85+ data, VC.bas model

Variable       Coeff.     Std. Err.
Cons.          -3.3395    0.0717
Age            -0.1944    0.0065
Gender         -0.0133    0.1079
Educ.           0.2264    0.0595
MMSE85          0.2091    0.0050
Cons.bas        1.4313    0.1028
Gender.bas      0.0526    0.1311
Educ.bas        0.3243    0.0420
σb              0.8705
λ               1.1432
ℓ              -2293
AIC             4606

A further standard error, 0.2221, is reported for one of the random effect parameters (σb or λ).

Table: VC.bas model: parameter estimates


Intro

However, 276 of 599 subjects (46.1%) have incomplete sequences; these may occur due to a variety of selection rules: for example, the individual may refuse to answer some questions, or drop out during the study. This individual behavior may bias the survey design and call into question the representativeness of the observed sample in drawing inference about the general population.

The key question is whether those who drop out differ (in any way relevant to the analysis) from those who remain.


More on key question


Once other variables in the data have been controlled for, do the missing data depend on the current (unobserved) value of the response variable? If not, the missing data mechanism is said to be ignorable (ID); otherwise, it is said to be non-ignorable (NID). In the first case, one may use standard ML methods for consistent estimation. Otherwise, one has to take into account the missing data mechanism to obtain consistent model parameter estimates.


Some notation

Let us denote by $R_{it}$ a missing data indicator, defined as:

- $R_{it} = 1$ if the $i$-th unit drops out at any time in $(t - 1, t)$, $t = 1, \dots, T$;
- $R_{it} = 0$ otherwise.

The number of available responses for the $i$-th subject is:

\[ S_i = T - \sum_{t=1}^{T} R_{it}. \]

We focus on a special case, assuming that, once a person drops out, he/she is out forever (attrition is an absorbing state).


Statistically speaking...

If the mechanism is ignorable, the joint density is:

\[
\begin{aligned}
f(y_i^{O}, r_i \mid b_i, \phi_i) &= \int f(y_i^{O}, y_i^{M} \mid b_i)\, h(r_i \mid y_i, \phi_i)\, dy_i^{M} \\
&= \int f(y_i^{O}, y_i^{M} \mid b_i)\, h(r_i \mid y_i^{O}, \phi_i)\, dy_i^{M} \\
&= f(y_i^{O} \mid b_i)\, h(r_i \mid y_i^{O}, \phi_i),
\end{aligned}
\]

where $y_i^{O}$ and $y_i^{M}$ denote the observed and missing parts of $y_i$. If $b_i \perp \phi_i$, the likelihood function can be factorized and ML estimates for $\beta$ can be derived from the first term, using standard approaches.


RCBDM (random coefficient based dropout models)

When dropout is non-ignorable, we may choose a parameterization to describe association between the primary outcome and the missing data mechanism. In clinical trials, unobserved disease status could influence both the primary response and the dropout due e.g. to non-compliance. The dependence between the outcome and the missing data indicator may thus arise because they are jointly determined by shared (or correlated) latent effects.


On the model side...

Assuming conditional independence given the random coefficients $(b_i, \phi_i)$, we may introduce an explicit discrete-time model for the dropout process:

\[ h(r_i \mid \phi_i, V_i, y_i) = h(r_i \mid V_i, \phi_i), \]

which leads to the following joint density:

\[ f(y_i, r_i) = \int f(y_i \mid X_i, b_i)\, h(r_i \mid V_i, \phi_i)\, dG(b_i, \phi_i). \]

The previous density may be specified using different approaches.

Modeling choice

Little and Rubin (2002) discussed two classes of models to handle non-ignorable missing data:

- Pattern mixture models: the observed sample is stratified according to the observed patterns of dropout. See e.g. Alfò and Aitkin (2000), Verbeke and Molenberghs (2000), Roy (2003), Wilkins and Fitzmaurice (2007).
- Selection models: a complete-data model is defined for the primary response, augmented by a model describing the missing-data mechanism conditional on the complete data. See e.g. Verzilli and Carpenter (2002), Gao (2004), Lin et al. (2004), Rizopoulos et al. (2008).


Leiden 85+ data, Pattern Mixture model

Variable       Coeff.     Std. Err.
Cons.          -3.2729    0.0718
Age            -0.1994    0.0067
Gender         -0.0201    0.1080
Educ.           0.2065    0.0688
MMSE85          0.2044    0.0051
S               0.0542    0.0160
Cons.bas        1.4305    0.1029
Gender.bas     -0.0208    0.0484
Educ.bas        0.1314    0.0524
S.bas           0.0158    0.0114
σb              0.8622
λ               1.1429
ℓ              -2287
AIC             4598

A further standard error, 0.2223, is reported for one of the random effect parameters (σb or λ).

Table: Pattern mixture model: parameter estimates
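The log-likelihood and AIC values reported for the VC, VC.bas and pattern mixture models can be cross-checked with the usual definition $\mathrm{AIC} = -2\ell + 2p$. The parameter counts used below (regression coefficients plus $\sigma_b$, and $\lambda$ where present) are an assumption made for this check; they are not stated on the slides, but they reproduce the reported AIC values.

```python
def aic(loglik, n_params):
    """Akaike information criterion: -2 * log-likelihood + 2 * number of parameters."""
    return -2.0 * loglik + 2.0 * n_params

# (log-likelihood, assumed parameter count, reported AIC) taken from the tables.
reported = {
    "VC": (-2756, 5, 5522),                # 4 coefficients + sigma_b
    "VC.bas": (-2293, 10, 4606),           # 8 coefficients + sigma_b + lambda
    "Pattern mixture": (-2287, 12, 4598),  # 10 coefficients + sigma_b + lambda
}

checks = {name: aic(ll, k) == target for name, (ll, k, target) in reported.items()}
```

On this criterion the pattern mixture model (4598) is slightly preferred to the VC.bas model (4606), and both improve substantially on the VC model (5522).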


Leiden 85+ data, Selection model

Response    Variable        Coeff.     Std. Err.
Yit         Cons.           -3.3343    0.0711
            Age             -0.1893    0.0086
            Gender           0.0924    0.1027
            Educ.            0.1867    0.0476
            MMSE85           0.2012    0.0069
            Cons.bas         1.4311    0.1032
            Gender.bas       0.1006    0.1277
            Educ.bas         0.0674    0.0454
1 − Rit     Age.surv        -0.1064    0.0092
            Gender.surv     -1.0755    0.2604
            Educ.surv        0.3600    0.1604
            MMSE85.surv      0.1008    0.0191
            σb               0.9322
            λ                1.1543
            σφ               2.4102
            ρ                0.6923
            ℓ               -2775
            AIC              5584

A further standard error, 0.1929, is reported for one of the random effect parameters (σb, λ, σφ or ρ).

Table: Selection model: parameter estimates


Findings...

- Once we control for the baseline value, the effect of educational status drops substantially (from 1.0240 in the VC model to 0.2264 in the VC.bas model);
- however, the effect remains significant in both the VC.bas and the PM model;
- when we move to the selection model, educational status is not significant in the baseline model, while it is significant in subsequent occasions as well as in the discrete-time survival model;
- these findings are, obviously, just a starting point: they need to be validated, but they point out the need for proper handling of missing data.


Next steps?

- From discrete-time to continuous-time survival models (event times available);
- Genetic factors (in particular APOE, apolipoprotein E) have to be considered;
- From random coefficient based to response driven dropout models; Bayesian approach?
- Perform sensitivity analysis: do model assumptions influence results? Explore the impact of potential deviations from non-ignorability (see e.g. Troxel, Ma and Heitjan, 2004);
- From static to dynamic mixing: e.g. HMM (Cappé et al., 2005).


THANK YOU!
