Lecture 5: Overdispersion in logistic regression

Claudia Czado, TU München

© Claudia Czado, TU Munich

ZFS/IMS Göttingen 2004


Overview

• Definition of overdispersion
• Detection of overdispersion
• Modeling of overdispersion


Overdispersion in logistic regression

Collett (2003), Chapter 6

Logistic model: Y_i ~ Bin(n_i, p_i) independent, with

p_i = e^{x_i^t β} / (1 + e^{x_i^t β})

⇒ E(Y_i) = n_i p_i,  Var(Y_i) = n_i p_i (1 − p_i)

If one assumes that p_i is correctly modeled, but the observed variance is larger or smaller than the variance n_i p_i (1 − p_i) implied by the logistic model, one speaks of over- or underdispersion. In applications one usually observes only overdispersion, so we concentrate on modeling overdispersion.


How to detect overdispersion?

If the logistic model is correct, the residual deviance D is asymptotically χ²_{n−p} distributed. Therefore D > n − p = E(χ²_{n−p}) can indicate overdispersion.

Warning: D > n − p can also be the result of
- missing covariates and/or interaction terms;
- neglected nonlinear effects;
- a wrong link function;
- the existence of large outliers;
- binary data or small n_i.
One has to exclude these causes through EDA and regression diagnostics.
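As a concrete illustration of comparing D with n − p, the following sketch simulates overdispersed grouped counts (a beta-binomial data-generating process, an intercept-only fit, and all numeric values are assumptions for the example, not part of the lecture):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)

# Illustrative overdispersed data: g groups of n trials whose success
# probabilities are drawn from a Beta distribution (beta-binomial counts).
g, n, p, phi = 50, 20, 0.3, 0.15
ab = 1.0 / phi - 1.0                        # a + b chosen so Var(v) = phi*p*(1-p)
v = rng.beta(p * ab, (1 - p) * ab, size=g)
y = rng.binomial(n, v)

# Intercept-only logistic fit: p-hat is simply the pooled success fraction.
p_hat = y.sum() / (g * n)

def dev_term(yi, ni, pi):
    # One group's contribution to the grouped binomial deviance;
    # terms with yi = 0 or yi = ni contribute 0 in the skipped branch.
    out = 0.0
    if yi > 0:
        out += yi * np.log(yi / (ni * pi))
    if yi < ni:
        out += (ni - yi) * np.log((ni - yi) / (ni * (1 - pi)))
    return 2.0 * out

D = sum(dev_term(yi, n, p_hat) for yi in y)
df = g - 1  # g groups minus 1 fitted parameter
print(f"D = {D:.1f} vs df = {df}; P(chi2_{{df}} > D) = {chi2.sf(D, df):.2e}")
```

A deviance far above its degrees of freedom suggests overdispersion, but only after the other causes in the warning above have been ruled out.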


Residual deviance for binary logistic models

Collett (2003) shows that the residual deviance for binary logistic models can be written as

D = −2 Σ_{i=1}^n [ p̂_i ln( p̂_i / (1 − p̂_i) ) + ln(1 − p̂_i) ],

where p̂_i = e^{x_i^t β̂} / (1 + e^{x_i^t β̂}). This is independent of Y_i and therefore not useful for assessing goodness of fit.

Need to group data to use the residual deviance as a goodness-of-fit measure.


Reasons for overdispersion

Overdispersion can be explained by
- variation among the success probabilities, or
- correlation between the binary responses.
Both explanations amount to the same thing, since variation leads to correlation and vice versa. For interpretation, however, one explanation may be more natural than the other.


Variation among the success probabilities

If groups of experimental units are observed under the same conditions, the success probabilities may still vary from group to group.

Example: The default probabilities of groups of creditors facing the same conditions can vary from bank to bank. A reason for this can be unmeasured or imprecisely measured covariates that make the groups differ with respect to their default probabilities.


Correlation among the binary responses

Let Y_i = Σ_{j=1}^{n_i} R_ij, where

R_ij = 1 for a success, 0 otherwise,  P(R_ij = 1) = p_i.

⇒ Var(Y_i) = Σ_{j=1}^{n_i} Var(R_ij) + Σ_{j=1}^{n_i} Σ_{k≠j} Cov(R_ij, R_ik),

where Var(R_ij) = p_i (1 − p_i) and the covariances are ≠ 0, so

Var(Y_i) ≠ n_i p_i (1 − p_i) = binomial variance,

i.e. Y_i does not have a binomial distribution.

Examples:
- the same patient is observed over time;
- all units come from the same family or litter (cluster effects).


Modeling of variability among success probabilities

Williams (1982): Y_i = number of successes in n_i trials with random success probability v_i, i = 1, . . . , n.

Assume E(v_i) = p_i and Var(v_i) = φ p_i (1 − p_i), where φ ≥ 0 is an unknown scale parameter.

Note:
- Var(v_i) = 0 if p_i = 0 or 1;
- v_i ∈ (0, 1) is an unobserved (latent) random variable.


Conditional expectation and variance of Y_i:

E(Y_i | v_i) = n_i v_i,  Var(Y_i | v_i) = n_i v_i (1 − v_i).

Since

E(Y) = E_X(E(Y|X)),  Var(Y) = E_X(Var(Y|X)) + Var_X(E(Y|X)),

the unconditional expectation and variance are

E(Y_i) = E_{v_i}(E(Y_i | v_i)) = E_{v_i}(n_i v_i) = n_i p_i

Var(Y_i) = E_{v_i}(n_i v_i (1 − v_i)) + Var_{v_i}(n_i v_i)
         = n_i [E_{v_i}(v_i) − E_{v_i}(v_i²)] + n_i² φ p_i (1 − p_i)
         = n_i (p_i − φ p_i (1 − p_i) − p_i²) + n_i² φ p_i (1 − p_i)
         = n_i p_i (1 − p_i) [1 + (n_i − 1) φ]
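The identity Var(Y_i) = n_i p_i (1 − p_i)[1 + (n_i − 1)φ] can be checked by simulation. A minimal sketch, assuming a Beta mixing distribution chosen to have the stated mean and variance (the Beta choice and all numeric values are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulation check of Var(Y) = n*p*(1-p)*[1 + (n-1)*phi], assuming a Beta
# mixing distribution with mean p and variance phi*p*(1-p).
n, p, phi, reps = 10, 0.4, 0.1, 200_000
ab = 1.0 / phi - 1.0                          # Beta(a, b) with a + b = 1/phi - 1
v = rng.beta(p * ab, (1 - p) * ab, size=reps)
y = rng.binomial(n, v)

theory = n * p * (1 - p) * (1 + (n - 1) * phi)
print(f"empirical Var(Y) = {y.var():.3f}, formula = {theory:.3f}")
```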


Remarks
- φ = 0 ⇒ no overdispersion.
- φ > 0 ⇒ overdispersion if n_i > 1.
- n_i = 1 (Bernoulli data) ⇒ no information about φ is available, so this model is not useful.


Modeling of correlation among the binary responses

Let Y_i = Σ_{j=1}^{n_i} R_ij, where R_ij = 1 for a success, 0 otherwise, and P(R_ij = 1) = p_i, so E(Y_i) = n_i p_i, but now

Cor(R_ij, R_ik) = δ for k ≠ j

⇒ Cov(R_ij, R_ik) = δ √(Var(R_ij) Var(R_ik)) = δ p_i (1 − p_i)

⇒ Var(Y_i) = Σ_{j=1}^{n_i} Var(R_ij) + Σ_{j=1}^{n_i} Σ_{k≠j} Cov(R_ij, R_ik)
           = n_i p_i (1 − p_i) + n_i (n_i − 1) δ p_i (1 − p_i)
           = n_i p_i (1 − p_i) [1 + (n_i − 1) δ]


Remarks
- δ = 0 ⇒ no overdispersion.
- δ > 0 ⇒ overdispersion if n_i > 1; δ < 0 ⇒ underdispersion.
- Since we need 1 + (n_i − 1) δ > 0, δ cannot be too small. For n_i → ∞ this forces δ ≥ 0.
- For δ ≥ 0 the unconditional mean and variance are the same in both approaches, so we cannot distinguish between them from the data.


Estimation of φ

Y_i | v_i ~ Bin(n_i, v_i),  E(v_i) = p_i,  Var(v_i) = φ p_i (1 − p_i),  i = 1, . . . , g.

Special case n_i = n ∀i:

Var(Y_i) = n p_i (1 − p_i) [1 + (n − 1) φ],

where σ² := 1 + (n − 1) φ is called the heterogeneity factor. One can show that

E(χ²) ≈ (g − p) [1 + (n − 1) φ] = (g − p) σ²,

where p = number of parameters in the largest model to be considered and

χ² = Σ_{i=1}^g (y_i − n p̂_i)² / (n p̂_i (1 − p̂_i)).

⇒  σ̂² = χ² / (g − p)  and  φ̂ = (σ̂² − 1) / (n − 1).

The estimation of β remains the same.
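The moment estimators σ̂² and φ̂ above can be sketched on simulated grouped data (the intercept-only fit and all numeric values are hypothetical choices for the example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Moment estimation of sigma^2 and phi on simulated beta-binomial counts
# (intercept-only fit, so p = 1 parameter in the formulas).
g, n, p_true, phi_true = 200, 25, 0.3, 0.1
ab = 1.0 / phi_true - 1.0
v = rng.beta(p_true * ab, (1 - p_true) * ab, size=g)
y = rng.binomial(n, v)

p_hat = y.mean() / n                            # pooled estimate of p
chi2_stat = np.sum((y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat)))

sigma2_hat = chi2_stat / (g - 1)                # sigma2-hat = chi2 / (g - p)
phi_hat = (sigma2_hat - 1.0) / (n - 1.0)        # phi-hat = (sigma2-hat - 1) / (n - 1)
print(f"sigma2-hat = {sigma2_hat:.2f}, phi-hat = {phi_hat:.3f} (true phi = {phi_true})")
```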


Analysis of deviance when variability among the success probabilities is present

model  df   deviance  covariates
1      ν1   D1        x_{i1}, . . . , x_{iν1}
2      ν2   D2        x_{i1}, . . . , x_{iν1}, x_{i(ν1+1)}, . . . , x_{iν2}
0      ν0   D0        x_{i1}, . . . , x_{iν0}

For Y_i | v_i ~ Bin(n_i, v_i), i = 1, . . . , g, we have E(χ²) ≈ σ² (g − p), so we expect χ² ~a σ² χ²_{g−p} and D ~a σ² χ²_{g−p}.

Statistic                              approximate distribution
D1 − D2                                σ² χ²_{ν2−ν1}
D0                                     σ² χ²_{ν0}
[(D1 − D2)/(ν2 − ν1)] / [D0/ν0]        F_{ν2−ν1, ν0}

Since σ² cancels in the F ratio, there is no change to the ordinary case.

Estimated standard errors in overdispersed models

ŝe(β̂_j) = σ̂ · ŝe₀(β̂_j),

where ŝe₀(β̂_j) is the estimated standard error in the model without overdispersion. This holds since Var(Y_i) = σ² n_i p_i (1 − p_i), and in both cases E(Y_i) = n_i p_i.
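Applied to hypothetical numbers (both inputs below are made up for illustration), the adjustment is a one-line rescaling:

```python
# Scale a model-based standard error by sigma-hat (illustrative values).
sigma2_hat = 3.4                 # e.g. chi2/(g - p) from the moment estimator
se0 = 0.12                       # se of beta_j-hat ignoring overdispersion
se_adj = sigma2_hat ** 0.5 * se0
print(f"adjusted se = {se_adj:.4f}")
```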


Beta-binomial models

v_i = latent success probability ∈ (0, 1),  v_i ~ Beta(a_i, b_i) with density

f(v_i) = 1/B(a_i, b_i) · v_i^{a_i − 1} (1 − v_i)^{b_i − 1},  a_i, b_i > 0,

where B(a, b) = ∫₀¹ x^{a−1} (1 − x)^{b−1} dx is the Beta function.

E(v_i) = a_i / (a_i + b_i) =: p_i

Var(v_i) = a_i b_i / [(a_i + b_i)² (a_i + b_i + 1)] = p_i (1 − p_i) / [a_i + b_i + 1] = p_i (1 − p_i) τ_i,

where τ_i := 1 / (a_i + b_i + 1).

If a_i > 1 and b_i > 1 ∀i, the density is unimodal and Var(v_i) < p_i (1 − p_i) / 3. If τ_i = τ, the beta-binomial model is equivalent to the model with variability among success probabilities with φ = τ < 1/3 (⇒ more restrictive).
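These moment identities can be verified numerically for hypothetical shape values a, b, cross-checking against scipy's Beta distribution:

```python
from scipy.stats import beta as beta_dist

# Check the stated moments for hypothetical shape values a, b.
a, b = 2.0, 4.0
p = a / (a + b)                       # E(v)
tau = 1.0 / (a + b + 1.0)
var_v = p * (1 - p) * tau             # Var(v) in the slide's form
print(var_v, beta_dist.var(a, b))     # the two expressions agree
# With a > 1 and b > 1, Var(v) < p*(1-p)/3:
print(var_v < p * (1 - p) / 3.0)
```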

ZFS/IMS G¨ottingen 2004

– 16 –

(Marginal) likelihood

l(β) = Π_{i=1}^n ∫₀¹ f(y_i | v_i) f(v_i) dv_i
     = Π_{i=1}^n (n_i choose y_i) · 1/B(a_i, b_i) · ∫₀¹ v_i^{y_i + a_i − 1} (1 − v_i)^{n_i − y_i + b_i − 1} dv_i
     = Π_{i=1}^n (n_i choose y_i) · B(y_i + a_i, n_i − y_i + b_i) / B(a_i, b_i),

where p_i = a_i / (a_i + b_i) = e^{x_i^t β} / (1 + e^{x_i^t β}).

This needs to be maximized to determine the MLE of β.

Remark: no standard software exists.
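Since the slides note the absence of standard software, here is a hedged sketch of the marginal log-likelihood for an intercept-only model with common τ (the common-τ parameterization, the helper names, and the toy data are all assumptions for the example, not part of the lecture):

```python
from math import lgamma, comb, log, exp

def log_beta(a, b):
    # ln B(a, b) via log-gamma, to avoid overflow in B(a, b) itself.
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binom_loglik(beta0, y, n, tau=0.1):
    # Marginal log-likelihood of an intercept-only beta-binomial model,
    # assuming p = logit^-1(beta0), a = p*(1/tau - 1), b = (1-p)*(1/tau - 1),
    # so that tau = 1/(a + b + 1) as in the slides.  Hypothetical sketch.
    p = 1.0 / (1.0 + exp(-beta0))
    s = 1.0 / tau - 1.0
    a, b = p * s, (1.0 - p) * s
    return sum(
        log(comb(ni, yi)) + log_beta(yi + a, ni - yi + b) - log_beta(a, b)
        for yi, ni in zip(y, n)
    )

y, n = [3, 5, 2, 7], [10, 10, 10, 10]  # toy data: 4 groups of 10 trials
print(beta_binom_loglik(0.0, y, n))
```

In practice one would hand this function (negated) to a numerical optimizer to obtain the MLE of β.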


Random effects in logistic regression

Let v_i = latent success probability with E(v_i) = p_i and

log( v_i / (1 − v_i) ) = x_i^t β + δ_i,

where δ_i is a “random effect”. It accounts for missing or imprecisely measured covariates. When an intercept is included we can assume E(δ_i) = 0. Further assume the δ_i are i.i.d. with Var(δ_i) = σ_δ². Let Z_i be i.i.d. with E(Z_i) = 0 and Var(Z_i) = 1

⇒ δ_i =_D γ Z_i with γ = σ_δ ≥ 0.

Therefore

log( v_i / (1 − v_i) ) = x_i^t β + γ Z_i.

Remark: this model can also be used for binary regression data.


Estimation in logistic regression with random effects

If Z_i ~ N(0, 1) i.i.d., the joint likelihood for β, γ, and Z is

L(β, γ, Z) = Π_{i=1}^n (n_i choose y_i) v_i^{y_i} (1 − v_i)^{n_i − y_i}
           = Π_{i=1}^n (n_i choose y_i) exp{x_i^t β + γ Z_i}^{y_i} / [1 + exp{x_i^t β + γ Z_i}]^{n_i},

which involves p + 1 + n parameters. These are too many parameters, so one maximizes the marginal likelihood instead:

L(β, γ) := ∫_{R^n} L(β, γ, Z) f(Z) dZ
         = Π_{i=1}^n ∫_{−∞}^{∞} (n_i choose y_i) exp{x_i^t β + γ Z_i}^{y_i} / [1 + exp{x_i^t β + γ Z_i}]^{n_i} · (1/√(2π)) e^{−Z_i²/2} dZ_i.


This can only be determined numerically. One approach is to use a Gauss–Hermite approximation, given by

∫_{−∞}^{∞} f(u) e^{−u²} du ≈ Σ_{j=1}^m c_j f(s_j)

for known weights c_j and nodes s_j (see the tables in Abramowitz and Stegun (1972)). m ≈ 20 is often sufficient.
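A sketch of this approximation in Python, using the Gauss–Hermite nodes and weights provided by numpy (the values chosen for x_i^t β, γ, y_i, n_i below are hypothetical):

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

# Nodes s_j and weights c_j for the weight function e^{-u^2}, m = 20.
s, c = hermgauss(20)

# Sanity check: for f = 1 the integral is sqrt(pi).
print(c.sum(), np.sqrt(np.pi))

# One factor of the marginal likelihood: substituting Z = sqrt(2)*u turns
# E_Z[g(Z)] with Z ~ N(0,1) into (1/sqrt(pi)) * sum_j c_j g(sqrt(2)*s_j).
beta_lin, gamma, yi, ni = 0.2, 0.5, 3, 10  # hypothetical x_i^t beta, gamma, y_i, n_i

def g(z):
    # Binomial kernel without the (n_i choose y_i) factor.
    e = np.exp(beta_lin + gamma * z)
    return e ** yi / (1 + e) ** ni

approx = (c * g(np.sqrt(2.0) * s)).sum() / np.sqrt(np.pi)
print(approx)
```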


Remarks on using random effects
- No standard software exists for the maximization.
- One can also use a non-normal random effect.
- Extensions to several random effects are possible. Maximization over high-dimensional integrals may require Markov chain Monte Carlo (MCMC) methods.
- Random effects may be correlated in time or space when time series or spatial data are considered.


References

Abramowitz, M. and I. A. Stegun (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. 10th printing, with corrections. John Wiley & Sons.

Collett, D. (2003). Modelling Binary Data (2nd edition). London: Chapman & Hall.

Williams, D. (1982). Extra-binomial variation in logistic linear models. Applied Statistics 31, 144–148.
