Why do we need multivariable analyses?

Author: Julian Lang
Logistic Regression versus Cox Regression

Ch. Mélot, MD, PhD, MSciBiostat Service des Soins Intensifs Hôpital Universitaire Erasme

ESP, 26 February 2008

Why do we need multivariable analyses?

We live in a multivariable world. Most events, whether medical, political, social, or personal, have multiple causes, and these causes are related to one another.

Definition

Multivariable analysis is a tool for determining the relative contributions of different causes to a single event.

Note: the terms “multivariate analysis” and “multivariable analysis” are often used interchangeably. In the strict sense, multivariate analysis refers to simultaneously predicting multiple outcomes.

Multivariable approach

Y: single dependent variable (outcome), e.g., dead or alive
X1, X2, X3, …: independent variables (risk factors, predictors), e.g., age, gender, …

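Multivariate approach

$$\begin{pmatrix} y_1 & y_2 & y_3 \end{pmatrix} = \begin{pmatrix} 1 & x_1 & x_2 & x_3 \end{pmatrix} \times \begin{pmatrix} \beta_{01} & \beta_{02} & \beta_{03} \\ \beta_{11} & \beta_{12} & \beta_{13} \\ \beta_{21} & \beta_{22} & \beta_{23} \\ \beta_{31} & \beta_{32} & \beta_{33} \end{pmatrix}$$

Column j of the coefficient matrix holds $\beta_{0j}, \beta_{1j}, \beta_{2j}, \beta_{3j}$ for outcome $y_j$.

y1, y2, y3: multiple dependent variables (outcomes), e.g., countries
x1, x2, x3: independent variables (risk factors, predictors), e.g., drugs, …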

MULTIVARIATE ANALYSIS

[Figure: two-dimensional plot, both axes running from -2 to 2, positioning countries (Belgium-Luxembourg, Italy, Finland, Germany, Austria, Sweden, Denmark, UK, Ireland, Holland, Switzerland, Spain, Portugal, Norway, France) relative to drugs (propofol, sufentanil, morphine, fentanyl, midazolam).]

Soliman H.M., Mélot C., et al. Br. J. Anaesth. 2001;87:186-192

Definition

Outcome: n. That which comes out of, or follows from; issue; result (Webster’s dictionary).

Outcome: status of the patient at the end of an episode of care: presence of symptoms, level of activity, and mortality.

Types of regression

If y = continuous -> linear regression
If y = categorical (1 or 0) -> logistic regression
If y = count of events in a period of time -> Poisson regression
If y = time to event (censored data) -> Cox regression

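MULTIVARIABLE REGRESSION

If y = continuous variable: multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

If y = dichotomous variable: multiple logistic regression
$y = \dfrac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}}$
$\text{Logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

If y = count of events during a given period of time ($t_i$): multivariable Poisson’s regression
$y = t_i \, e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$
$\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$

If y = time to event: multivariable Cox’s regression
$y = h_0(t) \, e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}$
$\ln(y / h_0(t)) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
(In the Cox equations, y on the left-hand side stands for the hazard at time t, and $h_0(t)$ is the baseline hazard.)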

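Models compared: multiple linear regression, multiple logistic regression, proportional hazards analysis, multiple Poisson’s regression.

What is being modeled?
– Multiple linear regression: the mean value of the outcome
– Multiple logistic regression: the logarithm of the odds of the outcome (logit)
– Proportional hazards analysis: the logarithm of the relative hazard
– Multiple Poisson’s regression: the logarithm of the count of the events

Relationship of multiple independent variables (X’s) to outcome
– Multiple linear regression: the mean value of the outcome changes linearly with X’s
– Multiple logistic regression: the logit of the outcome changes linearly with X’s
– Proportional hazards analysis: the logarithm of the relative hazard changes linearly with X’s
– Multiple Poisson’s regression: the logarithm of the count of the events changes linearly with X’s

Distribution of the outcome variable
– Multiple linear regression: normal
– Multiple logistic regression: binomial
– Proportional hazards analysis: none specified
– Multiple Poisson’s regression: Poisson

Variance of outcome variable
– Multiple linear regression: equal around the mean
– Multiple logistic regression: depends only on the mean
– Proportional hazards analysis: none specified
– Multiple Poisson’s regression: mean equals variance

Relative hazard over time
– Multiple linear regression: not applicable
– Multiple logistic regression: not applicable
– Proportional hazards analysis: constant
– Multiple Poisson’s regression: not applicable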

Expression of the results

If y = continuous variable: multiple regression
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$\beta_1$ = “slope” for the risk factor x1

If y = dichotomous variable: multivariable logistic regression
$\text{Logit}(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = odds ratio for the risk factor x1

If y = count of events during a given period of time ($t_i$): multivariable Poisson’s regression
$\ln(y / t_i) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = relative risk of the occurrence of the event during the period of time, or relative incidence

If y = time to event: multivariable Cox’s regression
$\ln(y / h_0(t)) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
$e^{\beta_1}$ = hazard ratio for the risk factor x1, or incidence rate ratio
(Here y on the left-hand side stands for the hazard at time t, and $h_0(t)$ is the baseline hazard.)
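The interpretation of the coefficient in each model can be checked numerically. The sketch below is only an illustration: the coefficients, the observation period and the baseline hazard are hypothetical values chosen for the example, and a single predictor x1 is varied while any others are assumed fixed. It compares each model's prediction at x1 = 1 and x1 = 0.

```python
import math

# Hypothetical values, chosen only for illustration (not taken from the slides).
B0, B1 = -2.0, 0.5
T_I = 3.0    # hypothetical observation period for the Poisson model
H0_T = 0.1   # hypothetical baseline hazard at some fixed time t


def linear_mean(x1):
    """Multiple regression: y = b0 + b1 * x1."""
    return B0 + B1 * x1


def logistic_odds(x1):
    """Logistic regression: odds = p / (1 - p), with p from the logistic formula."""
    p = math.exp(B0 + B1 * x1) / (1.0 + math.exp(B0 + B1 * x1))
    return p / (1.0 - p)


def poisson_count(x1):
    """Poisson regression: expected count y = t_i * exp(b0 + b1 * x1)."""
    return T_I * math.exp(B0 + B1 * x1)


def cox_hazard(x1):
    """Cox regression: hazard = h0(t) * exp(b1 * x1)."""
    return H0_T * math.exp(B1 * x1)


# Effect of a one-unit increase in x1 under each model.
slope = linear_mean(1.0) - linear_mean(0.0)            # a difference
odds_ratio = logistic_odds(1.0) / logistic_odds(0.0)   # a ratio
rate_ratio = poisson_count(1.0) / poisson_count(0.0)   # a ratio
hazard_ratio = cox_hazard(1.0) / cox_hazard(0.0)       # a ratio

print(f"slope        = {slope:.4f}")
print(f"odds ratio   = {odds_ratio:.4f}")
print(f"rate ratio   = {rate_ratio:.4f}")
print(f"hazard ratio = {hazard_ratio:.4f}")
print(f"exp(b1)      = {math.exp(B1):.4f}")
```

In the linear model the coefficient is an additive difference, whereas in the three other models the exponentiated coefficient is a multiplicative ratio that does not depend on the intercept, the period or the baseline hazard.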

Cox versus Logistic Regression


Cox regression vs logistic regression

Distinction between rate and proportion:
– Incidence (hazard) rate: number of new cases of disease per population at-risk per unit time (or mortality rate, if outcome is death)
– Cumulative incidence: proportion of new cases that develop in a given time period

Cox regression vs logistic regression

Distinction between hazard/rate ratio and odds ratio/risk ratio:
– Hazard/rate ratio: ratio of incidence rates
– Odds/risk ratio: ratio of proportions

Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.

By taking time into account, you are taking into account more information than just binary yes/no. Gain power/precision.

Risks vs Rates

Relationship between risk and rates:
$R(t) = 1 - e^{-ht}$
h = constant hazard rate
R(t) = probability of disease in time t

For example, if the rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(0.005)(10)} = 1 - e^{-0.05} = 1 - 0.951 = 0.0488$
Compare to 0.005 × 10 = 5%.

The loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.

Risks vs Rates

If the rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(0.05)(10)} = 1 - e^{-0.5} = 1 - 0.61 = 0.39$
Compare to 0.05 × 10 = 50%.

Year    Persons at risk    New cases (incidence: 0.050)
1       1000               50
2       950                48
3       902                45
4       857                43
5       814                41
6       773                39
7       734                37
8       697                35
9       662                33
10      629                32
Total                      403
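The two worked examples can be reproduced in a few lines. This is a minimal sketch: the function names are illustrative, and the second function assumes that the table applies 0.050 as a yearly proportion to the persons still at risk, which is one reading of the slide rather than something it states.

```python
import math


def risk_from_rate(h, t):
    """Cumulative risk R(t) = 1 - exp(-h * t) for a constant hazard rate h."""
    return 1.0 - math.exp(-h * t)


def risk_from_yearly_proportion(p, years):
    """Cumulative risk when a proportion p of those still at risk
    develops disease each year (assumed reading of the table)."""
    return 1.0 - (1.0 - p) ** years


for cases_per_1000 in (5, 50):
    h = cases_per_1000 / 1000.0          # rate per person-year
    exact = risk_from_rate(h, 10)        # 1 - exp(-h * t)
    naive = h * 10                       # rate multiplied by time
    print(f"rate {h:.3f}/yr: R(10) = {exact:.4f}, rate x time = {naive:.3f}")

table_risk = risk_from_yearly_proportion(0.05, 10)
print(f"yearly proportion 0.05 over 10 years: {table_risk:.4f}")
```

At the low rate the simple product of rate and time is close to the exact risk; at the high rate it overstates the risk, because the population at risk shrinks noticeably during follow-up.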

Risks vs Rates

Relationship between risk and rates (derivation):

$r(t) = h \, e^{-ht}$
Exponential density function for waiting time until the event (constant hazard rate)

$R(t) = \int_0^t h \, e^{-hu} \, du = \left[ -e^{-hu} \right]_0^t = -e^{-ht} - (-e^{0}) = 1 - e^{-ht}$

Waiting time distribution will change if the hazard rate changes as a function of time: h(t)
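As a quick numerical check of the derivation, the exponential density can be integrated with the trapezoidal rule and compared with the closed form. This is only a sketch: the function names, the number of steps and the rate used are illustrative choices.

```python
import math


def density(u, h):
    """Exponential waiting-time density r(u) = h * exp(-h * u)."""
    return h * math.exp(-h * u)


def integrate_density(h, t, steps=10_000):
    """Trapezoidal approximation of the integral of r(u) from 0 to t."""
    width = t / steps
    total = 0.5 * (density(0.0, h) + density(t, h))
    for k in range(1, steps):
        total += density(k * width, h)
    return total * width


h, t = 0.05, 10.0                      # rate and horizon from the second example
numeric = integrate_density(h, t)
closed_form = 1.0 - math.exp(-h * t)
print(f"numerical integral: {numeric:.6f}")
print(f"1 - exp(-h t):      {closed_form:.6f}")
```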

LOGISTIC REGRESSION


Data set (CHD: Coronary Heart Disease; Yes: 1 / No: 0)

PATID 1 to 33
AGEGRP 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
AGE 20 23 24 25 25 26 26 28 28 29 30 30 30 30 30 30 32 32 33 33 34 34 34 34 34 35 35 36 36 36 37 37 37
CHD 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0

PATID 34 to 66
AGEGRP 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 6
AGE 38 38 39 39 40 40 41 41 42 42 42 42 43 43 43 44 44 44 44 45 45 46 46 47 47 47 48 48 48 49 49 49 50
CHD 0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0

PATID 67 to 100
AGEGRP 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8
AGE 50 51 52 52 53 53 54 55 55 55 56 56 56 57 57 57 57 57 57 58 58 58 59 59 60 60 61 62 62 63 64 64 65 69
CHD 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1

LINEAR REGRESSION

[Figure: CHD (0 = No, 1 = Yes) plotted against age (years, 0 to 80), with the fitted line y = 0.0218 x - 0.538, R² = 0.264.]
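The straight-line fit quoted on the slide can be recomputed from the data set with ordinary least squares. The sketch below uses only the AGE and CHD values listed above; the variable and function names are illustrative.

```python
# AGE and CHD values copied from the data set above (100 patients).
AGE = [int(v) for v in """
20 23 24 25 25 26 26 28 28 29 30 30 30 30 30 30 32 32 33 33 34 34 34 34 34 35 35 36 36 36 37 37 37
38 38 39 39 40 40 41 41 42 42 42 42 43 43 43 44 44 44 44 45 45 46 46 47 47 47 48 48 48 49 49 49 50
50 51 52 52 53 53 54 55 55 55 56 56 56 57 57 57 57 57 57 58 58 58 59 59 60 60 61 62 62 63 64 64 65 69
""".split()]
CHD = [int(v) for v in """
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0
0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0
1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
""".split()]


def least_squares(x, y):
    """Return (slope, intercept, r_squared) of the simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = least_squares(AGE, CHD)
print(f"y = {slope:.4f} x + ({intercept:.3f}), R^2 = {r_squared:.3f}")

# A straight line is not bounded to [0, 1]: the prediction at age 20 is negative.
predicted_at_20 = intercept + slope * 20
print(f"predicted CHD 'probability' at age 20: {predicted_at_20:.3f}")
```

The negative prediction at age 20 is one plausible reason a straight line is a poor model for a 0/1 outcome, which is where the logistic model comes in.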

LOGISTIC REGRESSION

[Figure: number of patients with CHD = 0 (n = 57) and CHD = 1 (n = 43) in each age group (20-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-69 years).]

[Figure: the same two groups shown as percentages (0% to 100%) of the patients in each age group.]

LOGISTIC REGRESSION

[Figure: proportion of patients with CHD (0.00 to 1.00) plotted against age (20 to 70 years).]

[Figure: proportion CHD against age (0 to 100 years) with the fitted logistic curve:]

$\pi(x) = \dfrac{e^{-5.31 + 0.111 \, \text{Age}}}{1 + e^{-5.31 + 0.111 \, \text{Age}}}$
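The fitted coefficients can be approximately recovered by maximum likelihood. The slides do not say which software or algorithm produced them; the sketch below assumes a plain Newton-Raphson iteration for a single predictor, written with the standard library only, and the function name is illustrative.

```python
import math

# AGE and CHD values copied from the data set listed earlier (100 patients).
AGE = [int(v) for v in """
20 23 24 25 25 26 26 28 28 29 30 30 30 30 30 30 32 32 33 33 34 34 34 34 34 35 35 36 36 36 37 37 37
38 38 39 39 40 40 41 41 42 42 42 42 43 43 43 44 44 44 44 45 45 46 46 47 47 47 48 48 48 49 49 49 50
50 51 52 52 53 53 54 55 55 55 56 56 56 57 57 57 57 57 57 58 58 58 59 59 60 60 61 62 62 63 64 64 65 69
""".split()]
CHD = [int(v) for v in """
0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0
0 0 0 1 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 1 0
1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 0 1 1 1
""".split()]


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for logit(p) = b0 + b1 * x; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information matrix (negative Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: inverse of the 2x2 information matrix times the gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_logistic(AGE, CHD)
print(f"logit(p) = {b0:.2f} + {b1:.3f} * age")
```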

LOGISTIC TRANSFORMATION

$\pi(x) = \dfrac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$

$\text{Logit}[\pi(x)] = \ln\left[ \dfrac{\pi(x)}{1 - \pi(x)} \right]$
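The two formulas are inverses of each other, which the short sketch below illustrates. It uses the coefficients quoted on the slides (-5.31 and 0.111); the function names and the ages chosen are illustrative.

```python
import math

B0, B1 = -5.31, 0.111   # coefficients quoted on the slides


def logistic(z):
    """Map a linear predictor z to a probability in (0, 1)."""
    return math.exp(z) / (1.0 + math.exp(z))


def logit(p):
    """Map a probability p to the log-odds ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


for age in (30, 50, 70):
    z = B0 + B1 * age            # linear on the logit scale
    p = logistic(z)              # S-shaped on the probability scale
    print(f"age {age}: logit = {z:.2f}, probability = {p:.3f}, "
          f"back-transformed logit = {logit(p):.2f}")
```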

LOGISTIC REGRESSION

[Figure: logit of the proportion CHD (-3 to 3) plotted against age (0 to 100 years), with the fitted line:]

$\text{Logit} \, \pi(x) = -5.31 + 0.111 \, \text{age}$

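Odds and Probability

Probability: $\pi(x) = \dfrac{1}{6} \approx 0.167$

Odds in favour: $\dfrac{\pi(x)}{1 - \pi(x)} = \dfrac{1/6}{5/6} = \dfrac{1}{5} = 0.20$

Odds against: $\dfrac{1 - \pi(x)}{\pi(x)} = \dfrac{5/6}{1/6} = 5$, i.e., 5 against 1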
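The same arithmetic can be written with exact fractions. A minimal sketch, with illustrative function names:

```python
from fractions import Fraction


def odds_in_favour(p):
    """Odds in favour of an event with probability p."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event with probability p."""
    return (1 - p) / p


def probability_from_odds(odds):
    """Convert odds in favour back to a probability."""
    return odds / (1 + odds)


p = Fraction(1, 6)
print("probability:   ", p, "=", round(float(p), 3))
print("odds in favour:", odds_in_favour(p), "=", float(odds_in_favour(p)))
print("odds against:  ", odds_against(p))
print("back to probability:", probability_from_odds(odds_in_favour(p)))
```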

ODDS RATIO AND LOGISTIC REGRESSION

$OR = \dfrac{\pi(x=1) / [1 - \pi(x=1)]}{\pi(x=0) / [1 - \pi(x=0)]}$

Example (10-year increase in age): $OR = e^{\beta_1 \times 10} = e^{0.111 \times 10} = 3.03$

ODDS RATIO AND LOGISTIC REGRESSION

$OR = \dfrac{\pi(x=1) / [1 - \pi(x=1)]}{\pi(x=0) / [1 - \pi(x=0)]} = e^{\beta_1}$

$\ln(OR) = \beta_1$

95% CI for OR $= e^{\beta_1 \pm 1.96 \, SE(\beta_1)}$
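The confidence interval is built on the log scale and then exponentiated. The sketch below uses the age coefficient 0.111 from the slides; the standard error is NOT reported in the slides, so the value 0.024 is an assumed figure used purely to make the example run, and the function name is illustrative.

```python
import math

BETA_AGE = 0.111   # age coefficient quoted on the slides
SE_AGE = 0.024     # ASSUMED standard error, for illustration only


def odds_ratio_with_ci(beta, se, units=1.0, z=1.96):
    """Odds ratio and 95% CI for a change of `units` in the predictor."""
    log_or = beta * units
    half_width = z * se * units          # interval half-width on the log scale
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


for units in (1, 10):
    odds_ratio, lower, upper = odds_ratio_with_ci(BETA_AGE, SE_AGE, units)
    print(f"{units:>2}-year increase: OR = {odds_ratio:.2f} "
          f"(95% CI {lower:.2f} to {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself.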

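Forest plot: odds ratio with 95% confidence interval

$OR = \dfrac{a/c}{b/d} = \dfrac{ad}{bc}$

$SE(\ln(OR)) = \sqrt{\dfrac{1}{a} + \dfrac{1}{b} + \dfrac{1}{c} + \dfrac{1}{d}}$

95% CI: $\ln(OR) \pm 1.96 \, SE(\ln(OR))$, back-transformed to the OR scale.

[Figure: forest plot on an OR axis from 0 to 3. The vertical line at OR = 1 marks Trt A = Trt B; the two sides are labelled "Favours active" / "Favours placebo" and "Trt A > Trt B" / "Trt B > Trt A". The position of the point estimate shows the amplitude of the observed effect; the width of the interval shows the precision of the observed effect. Example intervals are labelled p = ns or p < 0.05.]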
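The formulas for a 2 × 2 table can be sketched as follows. The cell counts are hypothetical and chosen only for illustration; the function name is also illustrative.

```python
import math


def odds_ratio_2x2(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with its 95% CI computed on the log scale."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, se_log_or, lower, upper


# Hypothetical cell counts, for illustration only.
a, b, c, d = 20, 80, 10, 90
odds_ratio, se_log_or, lower, upper = odds_ratio_2x2(a, b, c, d)
print(f"OR = {odds_ratio:.2f}, SE(ln OR) = {se_log_or:.3f}, "
      f"95% CI {lower:.2f} to {upper:.2f}")

# An interval that contains 1 corresponds to "p = ns" on the forest plot.
print("interval contains 1:", lower < 1.0 < upper)
```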

COX’s REGRESSION

Cox model

A Cox model is a well-recognized statistical technique for exploring the relationship between the occurrence of an event (e.g., death, relapse, …) in a patient and several explanatory variables.

Survival analysis is concerned with studying the time between entry to a study and a subsequent event (such as death).

Censored survival times occur if the event of interest does not occur for a patient during the study period.

Survival Analysis: Terms

Time-to-event: The time from entry into a study until a subject has a particular outcome.

Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest. They are counted as alive or disease-free for the time they were enrolled in the study.
– If dropout is related to both outcome and treatment, dropouts may bias the results.

Right Censoring (T>t)

Common examples:
– Termination of the study
– Death due to a cause that is not the event of interest
– Loss to follow-up

We know that the subject survived at least to time t.
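In practice a right-censored observation is usually stored as a pair: the time observed and a flag saying whether the event was seen. The sketch below illustrates that encoding; the subjects, times and function name are hypothetical.

```python
def observe(event_time, censor_time):
    """Return (observed_time, event_observed) under right censoring.

    event_time  -- true time T at which the event would occur
    censor_time -- time at which observation stops (study end, dropout, ...)
    """
    if event_time <= censor_time:
        return event_time, True      # event seen: T is known exactly
    return censor_time, False        # censored: only T > censor_time is known


# Hypothetical subjects: (true time to event, time at which observation stops).
subjects = {
    "A": (3.0, 5.0),   # event occurs before the study ends
    "B": (8.0, 5.0),   # study ends first: right-censored at 5.0
    "C": (4.5, 2.0),   # lost to follow-up at 2.0: right-censored
}

for name, (event_time, censor_time) in subjects.items():
    time, seen = observe(event_time, censor_time)
    status = "event" if seen else "censored (T > t)"
    print(f"subject {name}: t = {time}, {status}")
```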


Left censoring (T