Logistic Regression versus Cox Regression
Ch. Mélot, MD, PhD, MSciBiostat
Department of Intensive Care (Service des Soins Intensifs), Hôpital Universitaire Erasme
ESP, 26 February 2008
Why do we need multivariable analyses?
We live in a multivariable world. Most events, whether medical, political, social, or personal, have multiple causes, and these causes are related to one another.
Definition
Multivariable analysis is a tool for determining the relative contributions of different causes to a single event.
Note: the terms "multivariate analysis" and "multivariable analysis" are often used interchangeably. In the strict sense, multivariate analysis refers to simultaneously predicting multiple outcomes.
Multivariable approach
Y: single dependent variable (outcome), e.g., dead or alive
X1, X2, X3, …: independent variables (risk factors, predictors), e.g., age, gender, …
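Multivariate approach
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} \beta_{01} & \beta_{11} & \beta_{21} & \beta_{31} \\ \beta_{02} & \beta_{12} & \beta_{22} & \beta_{32} \\ \beta_{03} & \beta_{13} & \beta_{23} & \beta_{33} \end{pmatrix} \begin{pmatrix} 1 \\ x_1 \\ x_2 \\ x_3 \end{pmatrix}$$
y1, y2, y3: multiple dependent variables (outcomes), e.g., countries
x1, x2, x3: independent variables (risk factors, predictors), e.g., drugs, …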
MULTIVARIATE ANALYSIS
[Figure: two-dimensional map (both axes from -2 to 2) showing European countries (Belgium-Luxembourg, Italy, Finland, Germany, Austria, Sweden, Denmark, UK, Ireland, Holland, Switzerland, Spain, Portugal, Norway, France) together with sedative and analgesic drugs (propofol, sufentanil, morphine, fentanyl, midazolam).]
Soliman H.M., Mélot C., et al. Br. J. Anaesth. 2001;87:186-192
Definition
Outcome: n. That which comes out of, or follows from; issue; result (Webster's dictionary).
Outcome: status of the patient at the end of an episode of care: presence of symptoms, level of activity, and mortality.
Types of regression
If y = continuous -> linear regression
If y = categorical (1 or 0) -> logistic regression
If y = count of events in a period of time -> Poisson regression
If y = time to event (censored data) -> Cox regression
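MULTIVARIABLE REGRESSION
If y = continuous variable: multiple regression
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
If y = dichotomous variable: multiple logistic regression (y = probability of the outcome)
$$y = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}, \qquad \operatorname{logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$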
MULTIVARIABLE REGRESSION
If y = count of events during a given period of time ($t_i$): multivariable Poisson regression
$$y = t_i\, e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}, \qquad \ln(y/t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
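If the outcome is the time to an event: multivariable Cox regression. Here y denotes the hazard of the event at time t and $h_0(t)$ the baseline hazard:
$$y = h_0(t)\, e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}, \qquad \ln\big(y/h_0(t)\big) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$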
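All four models share the same linear predictor β0 + β1x1 + β2x2 + β3x3 and differ only in how it is mapped onto the outcome scale. The minimal Python sketch below makes that explicit; the function names, coefficients, covariate values, exposure time and baseline hazard are all made up for illustration.

```python
import math

def linear_predictor(beta, x):
    """beta = [b0, b1, ..., bk], x = [x1, ..., xk]; returns b0 + sum(bj * xj)."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def linear_mean(beta, x):
    # Multiple linear regression: y = b0 + b1 x1 + b2 x2 + b3 x3
    return linear_predictor(beta, x)

def logistic_prob(beta, x):
    # Multiple logistic regression: y = e^eta / (1 + e^eta)
    eta = linear_predictor(beta, x)
    return math.exp(eta) / (1.0 + math.exp(eta))

def poisson_expected_count(beta, x, t):
    # Poisson regression: y = t * e^eta, so ln(y / t) = eta
    return t * math.exp(linear_predictor(beta, x))

def cox_hazard(beta_no_intercept, x, h0_t):
    # Cox regression: h(t) = h0(t) * e^(b1 x1 + ...); no intercept (absorbed in h0(t))
    return h0_t * math.exp(sum(b * xi for b, xi in zip(beta_no_intercept, x)))

beta = [-1.0, 0.5, 0.0, 0.2]   # illustrative coefficients only
x = [2.0, 1.0, 0.0]            # illustrative covariate values
print(linear_mean(beta, x))                  # eta = -1 + 1 + 0 + 0 = 0.0
print(logistic_prob(beta, x))                # e^0 / (1 + e^0) = 0.5
print(poisson_expected_count(beta, x, 3.0))  # 3 * e^0 = 3.0
print(cox_hazard(beta[1:], x, 0.01))         # 0.01 * e^1, about 0.0272
```

The same linear predictor appears in every case; only the "link" between it and the quantity being modeled changes, which is what the comparison table summarizes.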
| Aspect | Multiple linear regression | Multiple logistic regression | Proportional hazards analysis | Multiple Poisson regression |
|---|---|---|---|---|
| What is being modeled? | The mean value of the outcome | The logarithm of the odds of the outcome (logit) | The logarithm of the relative hazard | The logarithm of the count of events |
| Relationship of multiple independent variables (X's) to outcome | The mean value of the outcome changes linearly with the X's | The logit of the outcome changes linearly with the X's | The logarithm of the relative hazard changes linearly with the X's | The logarithm of the count of events changes linearly with the X's |
| Distribution of the outcome variable | Normal | Binomial | None specified | Poisson |
| Variance of the outcome variable | Equal around the mean | Depends only on the mean | None specified | Mean equals variance |
| Relative hazard over time | Not applicable | Not applicable | Constant | Not applicable |
Expression of the results
If y = continuous variable: multiple regression
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
$\beta_1$ = "slope" for the risk factor x1
Expression of the results
If y = dichotomous variable: multivariable logistic regression
$$\operatorname{logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
$e^{\beta_1}$ = odds ratio for the risk factor x1
Expression of the results
If y = count of events during a given period of time ($t_i$): multivariable Poisson regression
$$\ln(y/t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
$e^{\beta_1}$ = relative risk of the occurrence of the event during the period of time, i.e., the incidence rate ratio for the risk factor x1
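Expression of the results
If the outcome is the time to an event: multivariable Cox regression (y = hazard at time t, $h_0(t)$ = baseline hazard)
$$\ln\big(y/h_0(t)\big) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$$
$e^{\beta_1}$ = hazard ratio for the risk factor x1, or incidence rate ratio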
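Exponentiating a coefficient yields a ratio because every other term cancels: the exposure time $t_i$ in the Poisson model and the baseline hazard $h_0(t)$ in the Cox model. The sketch below checks this numerically under stated assumptions: the coefficients, the fixed second covariate and the time-varying baseline hazard are all hypothetical values chosen for illustration.

```python
import math

beta0, beta1, beta2 = -2.0, 0.7, 0.3   # illustrative coefficients
x2 = 1.5                               # another covariate held fixed

def poisson_rate(x1, x2):
    # Expected count per unit time: y / t_i = e^(b0 + b1 x1 + b2 x2)
    return math.exp(beta0 + beta1 * x1 + beta2 * x2)

def baseline_hazard(t):
    # Hypothetical baseline hazard h0(t) that changes with time
    return 0.01 + 0.002 * t

def cox_hazard(t, x1, x2):
    # h(t) = h0(t) * e^(b1 x1 + b2 x2)
    return baseline_hazard(t) * math.exp(beta1 * x1 + beta2 * x2)

irr = poisson_rate(1.0, x2) / poisson_rate(0.0, x2)
hrs = [cox_hazard(t, 1.0, x2) / cox_hazard(t, 0.0, x2) for t in (0.5, 5.0, 50.0)]
print(irr, math.exp(beta1))   # both about 2.0138
print(hrs)                    # the same ratio at every t: proportional hazards
```

Even though $h_0(t)$ changes over time, the hazard ratio does not, which is the "constant relative hazard" assumption listed for proportional hazards analysis.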
Cox versus Logistic Regression
Cox regression vs logistic regression
Distinction between rate and proportion:
– Incidence (hazard) rate: number of new cases of disease per population at risk per unit time (or mortality rate, if the outcome is death)
– Cumulative incidence: proportion of the population at risk that develops new disease in a given time period
Cox regression vs logistic regression
Distinction between hazard/rate ratio and odds ratio/risk ratio:
– Hazard/rate ratio: ratio of incidence rates
– Odds/risk ratio: ratio of proportions (or of the odds computed from them)
Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio. By taking time into account, Cox regression uses more information than the binary yes/no outcome alone, which gains power and precision.
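To make the distinction concrete, the sketch below assumes (purely for illustration) two groups with constant hazards of 0.05 and 0.025 per person-year, both followed for 10 years, and converts each hazard into a cumulative incidence with R(t) = 1 - e^(-ht), the relationship derived in the Risks vs Rates slides. The hazard ratio, risk ratio and odds ratio computed from the same two groups then differ.

```python
import math

h_exposed, h_unexposed = 0.05, 0.025   # illustrative constant hazard rates per person-year
t = 10.0                                # years of follow-up

def risk(h, t):
    # Cumulative incidence (a proportion) under a constant hazard: R(t) = 1 - e^(-h t)
    return 1.0 - math.exp(-h * t)

r1, r0 = risk(h_exposed, t), risk(h_unexposed, t)
hazard_ratio = h_exposed / h_unexposed            # ratio of incidence rates
risk_ratio = r1 / r0                              # ratio of proportions
odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))    # ratio of odds

print(f"HR = {hazard_ratio:.3f}, RR = {risk_ratio:.3f}, OR = {odds_ratio:.3f}")
# HR = 2.000, RR = 1.779, OR = 2.284
```

When the outcome is rare (small h·t) the three measures are close; with a common outcome, as here, the risk ratio falls below the hazard ratio and the odds ratio rises above it.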
Risks vs Rates
Relationship between risk and rates:
$$R(t) = 1 - e^{-ht}$$
where h = constant hazard rate and R(t) = probability of disease in time t.
Risks vs Rates
For example, if the rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:
$$R(t) = 1 - e^{-(0.005)(10)} = 1 - e^{-0.05} = 1 - 0.951 = 0.0488$$
Compare to 0.005 × 10 = 5%.
The loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.
Risks vs Rates
If the rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:
$$R(t) = 1 - e^{-(0.05)(10)} = 1 - e^{-0.5} = 1 - 0.61 = 0.39$$
Compare to 0.05 × 10 = 50%.
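| Year | Persons at risk | New cases (incidence: 0.050) |
|---|---|---|
| 1 | 1000 | 50 |
| 2 | 950 | 48 |
| 3 | 902 | 45 |
| 4 | 857 | 43 |
| 5 | 814 | 41 |
| 6 | 773 | 39 |
| 7 | 734 | 37 |
| 8 | 697 | 35 |
| 9 | 662 | 33 |
| 10 | 629 | 32 |
| Total | | 403 |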
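The arithmetic on these slides can be checked with a few lines of Python. This sketch (the function name is hypothetical) computes R(t) = 1 - e^(-ht) for both example rates and, separately, rebuilds the cohort table by applying a 5% yearly incidence to the persons still at risk. It keeps fractional persons instead of rounding each year, so its counts differ slightly from the rounded figures in the table.

```python
import math

def risk_from_rate(h, t):
    """Probability of disease within time t for a constant hazard rate h: R(t) = 1 - e^(-h t)."""
    return 1.0 - math.exp(-h * t)

for h in (0.005, 0.05):          # 5 and 50 cases per 1000 person-years
    print(f"h = {h}: R(10) = {risk_from_rate(h, 10):.4f}  vs  naive h*t = {h * 10:.2f}")

# Year-by-year depletion of a closed cohort of 1000 with 5% incidence per year
at_risk, total_cases = 1000.0, 0.0
for year in range(1, 11):
    cases = 0.05 * at_risk
    print(f"year {year:2d}: at risk {at_risk:7.1f}, new cases {cases:5.1f}")
    total_cases += cases
    at_risk -= cases
print(f"total over 10 years: {total_cases:.1f}")
```

The unrounded total is about 401 cases (a 40% ten-year risk). It is close to, but not the same as, the 39% from the continuous formula, because a fixed 5% per yearly step is only a discrete approximation of a constant hazard of 0.05.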
Risk vs Rates
Relationship between risk and rates (derivation):
$$r(t) = h\,e^{-ht}$$
Exponential density function for the waiting time until the event (constant hazard rate).
$$R(t) = \int_0^t h\,e^{-hu}\,du = \Big[-e^{-hu}\Big]_0^t = -e^{-ht} - \big(-e^{0}\big) = 1 - e^{-ht}$$
The waiting-time distribution will change if the hazard rate changes as a function of time: h(t).
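The derivation can be sanity-checked numerically: integrating the exponential density r(u) = h e^(-hu) from 0 to t should reproduce the closed form 1 - e^(-ht). The sketch uses a simple composite trapezoidal rule and an illustrative h and t; the helper name is hypothetical.

```python
import math

def integrate_trapezoid(f, a, b, n=10_000):
    # Composite trapezoidal rule with n equal sub-intervals
    step = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * step) for i in range(1, n))
    return total * step

h, t = 0.05, 10.0                  # illustrative constant hazard and time

def density(u):
    # r(u) = h e^(-h u): exponential waiting-time density
    return h * math.exp(-h * u)

numeric = integrate_trapezoid(density, 0.0, t)
closed_form = 1.0 - math.exp(-h * t)
print(numeric, closed_form)        # agree to many decimal places
```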
LOGISTIC REGRESSION
Data set (CHD: coronary heart disease; Yes = 1, No = 0; AGEGRP 1 to 8 = 20-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-69 years)
PATID: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
AGEGRP: 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
AGE: 20 23 24 25 25 26 26 28 28 29 30 30 30 30 30 30 32 32 33 33 34 34 34 34 34 35 35 36 36 36 37 37 37
CHD: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0

PATID: 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
AGEGRP: 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 6
AGE: 38 38 39 39 40 40 41 41 42 42 42 42 43 43 43 44 44 44 44 45 45 46 46 47 47 47 48 48 48 49 49 49 50
CHD: 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0

PATID: 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
AGEGRP: 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8
AGE: 50 51 52 52 53 53 54 55 55 55 56 56 56 57 57 57 57 57 57 58 58 58 59 59 60 60 61 62 62 63 64 64 65 69
CHD: 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1

LINEAR REGRESSION CHD (0 = No, 1 = Yes)
[Figure: CHD (0 = No, 1 = Yes) plotted against age (0 to 80 years) with the fitted least-squares line y = 0.0218 x - 0.538, R² = 0.264.]
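The fitted line can be reproduced from the CHD data set with ordinary least squares. This is a plain-Python sketch (no statistics library assumed); the ages and CHD codes are transcribed from the data set in patient order.

```python
# Ages and CHD status (1 = yes, 0 = no) for patients 1..100, transcribed from the data set
AGE = [20, 23, 24, 25, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 30, 32, 32, 33, 33,
       34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 39, 39, 40, 40, 41,
       41, 42, 42, 42, 42, 43, 43, 43, 44, 44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 48,
       48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 53, 54, 55, 55, 55, 56, 56, 56, 57,
       57, 57, 57, 57, 57, 58, 58, 58, 59, 59, 60, 60, 61, 62, 62, 63, 64, 64, 65, 69]
CHD = [int(c) for c in (
    "000010000000000100000010000010010"    # patients 1-33
    "000101000001001001101010010110010"    # patients 34-66
    "1001111011111001111011110111110111"   # patients 67-100
)]

n = len(AGE)
mean_x = sum(AGE) / n
mean_y = sum(CHD) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(AGE, CHD))
sxx = sum((x - mean_x) ** 2 for x in AGE)
syy = sum((y - mean_y) ** 2 for y in CHD)

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_squared = sxy ** 2 / (sxx * syy)
print(f"y = {slope:.4f} x {intercept:+.3f}, R² = {r_squared:.3f}")
# y = 0.0218 x -0.538, R² = 0.264
```

Note that this line predicts values below 0 at age 20 and above 1 at age 80, which cannot be probabilities; this is one motivation for modeling a 0/1 outcome with logistic regression instead.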
LOGISTIC REGRESSION
[Figure, two panels: number of patients, and percentage of patients (0% to 100%), without CHD (CHD = 0, n = 57) and with CHD (CHD = 1, n = 43) in each age group (20-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-69 years).]
LOGISTIC REGRESSION
[Figure: proportion of patients with CHD (0 to 1.00) in each age group, plotted against age (20 to 70 years).]
LOGISTIC REGRESSION
$$\pi(x) = \frac{e^{-5.31 + 0.111\,\mathrm{Age}}}{1 + e^{-5.31 + 0.111\,\mathrm{Age}}}$$
[Figure: proportion with CHD (0 to 1.0) against age (0 to 100 years) with the fitted logistic curve π(x).]
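The coefficients -5.31 and 0.111 are maximum-likelihood estimates. As a minimal sketch, assuming no statistics package is available, they can be obtained from the CHD data set with a hand-written Newton-Raphson loop for a one-predictor logistic model (the function name is hypothetical; a real analysis would use a statistics package).

```python
import math

# Ages and CHD status (1 = yes, 0 = no) for patients 1..100, transcribed from the data set
AGE = [20, 23, 24, 25, 25, 26, 26, 28, 28, 29, 30, 30, 30, 30, 30, 30, 32, 32, 33, 33,
       34, 34, 34, 34, 34, 35, 35, 36, 36, 36, 37, 37, 37, 38, 38, 39, 39, 40, 40, 41,
       41, 42, 42, 42, 42, 43, 43, 43, 44, 44, 44, 44, 45, 45, 46, 46, 47, 47, 47, 48,
       48, 48, 49, 49, 49, 50, 50, 51, 52, 52, 53, 53, 54, 55, 55, 55, 56, 56, 56, 57,
       57, 57, 57, 57, 57, 58, 58, 58, 59, 59, 60, 60, 61, 62, 62, 63, 64, 64, 65, 69]
CHD = [int(c) for c in (
    "000010000000000100000010000010010"    # patients 1-33
    "000101000001001001101010010110010"    # patients 34-66
    "1001111011111001111011110111110111"   # patients 67-100
)]

def fit_logistic(x, y, iterations=25):
    """Maximum-likelihood fit of logit(pi) = b0 + b1 * x by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0              # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0      # information matrix entries
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det   # Newton step: beta += I^-1 * score
        b1 += (i00 * g1 - i01 * g0) / det
    se_b1 = math.sqrt(i00 / det)             # SE(b1) from the inverse information matrix
    return b0, b1, se_b1

b0, b1, se_b1 = fit_logistic(AGE, CHD)
print(f"logit(pi) = {b0:.2f} + {b1:.3f} Age, SE(b1) = {se_b1:.3f}")
print(f"OR per 10 years of age = {math.exp(10 * b1):.2f}")
```

The estimates should match the slide's -5.31 and 0.111; the standard error it prints is the SE(β1) used later for confidence intervals.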
LOGISTIC TRANSFORMATION
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
$$\operatorname{logit}[\pi(x)] = \ln\left[\frac{\pi(x)}{1 - \pi(x)}\right]$$
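The two formulas are inverses of each other: the logistic function maps the linear predictor β0 + β1x onto a probability, and the logit maps that probability back. A minimal sketch with hypothetical function names, using the fitted CHD coefficients from the previous slides:

```python
import math

def logistic(z):
    """pi = e^z / (1 + e^z): maps a linear predictor to a probability in (0, 1)."""
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    """ln[p / (1 - p)]: maps a probability back to the linear-predictor scale."""
    return math.log(p / (1.0 - p))

b0, b1 = -5.31, 0.111           # CHD model from the slides
for age in (30, 50, 70):
    z = b0 + b1 * age
    p = logistic(z)
    print(f"age {age}: linear predictor {z:+.2f}, pi = {p:.3f}, logit(pi) = {logit(p):+.2f}")
```

At age 50, for example, the linear predictor is -5.31 + 5.55 = 0.24, giving a predicted probability of CHD of about 0.56.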
LOGISTIC REGRESSION
$$\operatorname{logit}[\pi(x)] = -5.31 + 0.111\,\mathrm{age}$$
[Figure: logit of the proportion with CHD (-3 to 3) against age (0 to 100 years).]
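Odds and Probability
$$\text{Probability} = \pi(x) = \frac{1}{6} \approx 0.167$$
$$\text{Odds in favour} = \frac{\pi(x)}{1 - \pi(x)} = \frac{1/6}{5/6} = \frac{1}{5} = 0.20$$
$$\text{Odds against} = \frac{5/6}{1/6} = 5 \quad (5 \text{ against } 1)$$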
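The conversions between probability and odds can be written as two small functions (names hypothetical). Exact fractions are used so that the output mirrors the slide's 1/6, 1/5 and 5.

```python
from fractions import Fraction

def odds_in_favour(p):
    # odds = p / (1 - p)
    return p / (1 - p)

def prob_from_odds(odds):
    # inverse conversion: p = odds / (1 + odds)
    return odds / (1 + odds)

p = Fraction(1, 6)
print(odds_in_favour(p))                # 1/5 (= 0.20)
print(1 / odds_in_favour(p))            # 5 (odds against: 5 against 1)
print(prob_from_odds(Fraction(1, 5)))   # 1/6
```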
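ODDS RATIO AND LOGISTIC REGRESSION
$$\mathrm{OR} = \frac{\pi(x{=}1)/[1 - \pi(x{=}1)]}{\pi(x{=}0)/[1 - \pi(x{=}0)]} = e^{\beta_1}$$
Example (10-year increase in age): $\mathrm{OR} = e^{0.111 \times 10} = 3.03$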
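The identity OR = e^β1 can be checked directly: compute the odds from the fitted CHD model at two ages and take their ratio. The sketch below uses the slide's coefficients; the helper name is hypothetical.

```python
import math

b0, b1 = -5.31, 0.111    # CHD model: logit(pi) = b0 + b1 * age

def odds(age):
    # odds = pi / (1 - pi), which equals e^(b0 + b1 * age)
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * age)))
    return p / (1.0 - p)

for age in (30, 50):
    print(f"OR(+1 year at {age}) = {odds(age + 1) / odds(age):.4f}")  # e^b1, about 1.1174, at any age
print(f"OR(+10 years) = {odds(60) / odds(50):.2f}")                # e^(10 * b1), about 3.03
```

Because the odds ratio does not depend on the starting age, a single number e^β1 summarizes the effect of one extra year of age across the whole range.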
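ODDS RATIO AND LOGISTIC REGRESSION
$$\mathrm{OR} = \frac{\pi(x{=}1)/[1 - \pi(x{=}1)]}{\pi(x{=}0)/[1 - \pi(x{=}0)]} = e^{\beta_1}, \qquad \ln(\mathrm{OR}) = \beta_1$$
$$95\%\ \text{CI for OR} = e^{\beta_1 \pm 1.96\,\mathrm{SE}(\beta_1)}$$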
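Forest plot: odds ratio with 95% confidence interval, computed on the log scale: 95% CI = exp[ln(OR) ± 1.96 SE(ln OR)]
For a 2 × 2 table with cells a, b, c, d:
$$\mathrm{OR} = \frac{ad}{bc}, \qquad \mathrm{SE}(\ln \mathrm{OR}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$
[Figure: forest plot on an OR axis (0 to 3) with a vertical line at OR = 1 (Trt A = Trt B). OR < 1: Trt A > Trt B, "Favours active"; OR > 1: Trt B > Trt A, "Favours placebo". Annotations: amplitude of the observed effect (point estimate) and precision of the observed effect (interval width); an interval crossing 1 corresponds to p = ns, one excluding 1 to p < 0.05.]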
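Both confidence-interval formulas work the same way: build a symmetric interval on the log-odds scale, then exponentiate, which is why the interval around an OR is asymmetric. A minimal sketch with hypothetical function names; the 2 × 2 counts and the standard error passed to the second function are made up for illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR = ad/bc from a 2 x 2 table, with a Wald 95% CI computed on the log scale."""
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

def ci_from_coefficient(beta, se, z=1.96):
    """OR = e^beta and its 95% CI e^(beta +/- z * SE) from a logistic-regression coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

print(odds_ratio_ci(10, 20, 5, 25))      # hypothetical counts: OR = 2.5, interval includes 1
print(ci_from_coefficient(0.111, 0.05))  # beta from the CHD model, illustrative SE only
```

In the first example the interval contains 1, so on a forest plot it would cross the line of no effect (p = ns) even though the point estimate is 2.5.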
COX REGRESSION
Cox model
A Cox model is a well-recognized statistical technique for exploring the relationship between the occurrence of an event (e.g., death, relapse, …) in a patient and several explanatory variables.
Survival analysis is concerned with studying the time between entry to a study and a subsequent event (such as death).
Censored survival times occur if the event of interest does not occur for a patient during the study period.
Survival Analysis: Terms
Time-to-event: the time from entry into a study until a subject has a particular outcome.
Censoring: subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
– If dropout is related to both outcome and treatment, dropouts may bias the results.
Right censoring (T > t)
Common examples:
– Termination of the study
– Death due to a cause that is not the event of interest
– Loss to follow-up
We know that the subject survived at least to time t.
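Right-censored data are usually stored as a pair (time observed, event indicator). As a minimal sketch with entirely made-up follow-up data, the snippet shows how censored subjects still contribute their disease-free time to the denominator of an incidence rate, linking back to the rate definition used in the Cox vs logistic comparison.

```python
# Hypothetical follow-up data: (years observed, event), where event = 1 means the outcome
# occurred and event = 0 means right-censored (study ended, lost to follow-up, or death
# from another cause). All values are made up for illustration.
follow_up = [(2.0, 1), (5.0, 0), (3.5, 1), (5.0, 0), (1.2, 0), (4.0, 1), (5.0, 0), (0.8, 1)]

events = sum(event for _, event in follow_up)
person_years = sum(time for time, _ in follow_up)   # censored subjects contribute their time too
rate = events / person_years

# A simple proportion ignores how long each subject was actually followed;
# it is shown only for contrast.
naive_proportion = events / len(follow_up)

print(f"{events} events over {person_years:.1f} person-years: rate = {rate:.3f} per person-year")
print(f"naive proportion with event = {naive_proportion:.3f}")
```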
Left censoring (T