Nested case-control and case-cohort studies


Author: Abel Freeman


An introduction and some new developments
Pre-course, 13th Norwegian Epidemiology Conference, Tromsø 23-24 November 2005

Ørnulf Borgan and Sven Ove Samuelsen
Department of Mathematics, University of Oslo

Nested case-control and case-cohort studies – p.1/46

Outline of course

Wednesday 23/11
11:00-11:45 Introduction to case-control studies (SOS)
12:00-12:45 More on traditional case-control studies (SOS)
14:00-14:45 Introduction to Cox-regression (ØB)
15:00-15:45 Nested case-control studies (ØB)
16:15-17:00 Case-cohort studies (SOS)

Thursday 24/11 at Rica Ishavshotell
8:30-9:15 Countermatching (ØB)
9:30-10:30 Stratified case-cohort studies (SOS)

Outline for first two hours

In general on cohort and case-control studies
Relative risks and odds-ratios
Efficiency comparisons between case-control and cohort studies
Logistic regression for cohort and unmatched studies
Matched studies: Mantel-Haenszel and conditional logistic regression

Cohort and Case-Control

Cohort study (prospective)
Exposure at start of study
Disease after follow-up of all individuals

Case-control study (retrospective)
Case-status from registry
Exposure on cases and sample of eligible controls

Main ideas:

Cohort study: Compare disease rates between
exposed individuals
non-exposed individuals
Higher incidence among exposed points to causes of disease.

Case-control study: Compare exposure characteristics between
diseased individuals: cases
non-diseased individuals: controls
A higher level of exposure among cases also points to the exposure being associated with causes of disease.

Cases and controls should be selected from the same population.

Example: Bladder cancer in suspect industry

375 cases of bladder cancer
368 controls (no bladder cancer)

Employed in suspect industry | Cases | Controls
Yes | 118 | 69
No | 257 | 299
Total | 375 | 368

118/375 = 31.5 % exposed among cases
69/368 = 18.8 % exposed among controls

Exposure "suspect industry" seems to be associated with disease.

Data sources case-control

Cases:
Registries
Within cohort study
Patients with a specific disease in a hospital

Controls:
General population
Within same cohort study
Patients with other diseases in the hospital

Cohort or Case-Control?

A case-control study will be cheaper and less time-consuming than a cohort study and can provide almost as precise risk estimates.
One will typically only take a sample of all eligible non-diseased individuals.
However, case-control studies can be subject to:
Recall bias: Retrospective information gathered from cases and controls need not be equally reliable.
Selection bias: Cases and controls not selected from the same population.

Example: Smoking and lung cancer, Doll & Hill (1951)

[Table: no. cigarettes smoked per day (0, 1-4, 5-14, 15-24, 25-49, 50+) among lung cancer cases and hospital controls, with totals]

The distribution of cigarettes smoked seems to be shifted towards higher values among the lung cancer cases.

We can test for differences between the distributions among cases and controls in several ways, for instance:
Chi-square test for "homogeneity" in the table
t-test assigning a number of cigarettes to each exposure group

Lung cancer example cont.

For this table the Pearson chi-square test for homogeneity has 5 d.f. (six groups).

Assigning no. cigarettes 0, 2.5, 10, 20, 37.5, 60 to the groups 0, 1-4, etc. cigarettes per day, we get
Mean cig. among lung cancer cases: 20.84 (std. dev. 14.07)
Mean cig. among control patients: 15.89 (std. dev. 11.69)
which leads to a t-statistic and again a p-value.

(Actually this study was matched on age and sex, and somewhat different tests are more appropriate. More on matching later.)

Relative risks in cohort study (population)

Cohort (or prospective) study:
Exposure for all in a population at start of study
Disease registered during study

Let E = exposed, $\bar E$ = non-exposed, D = disease (case) and $\bar D$ = non-diseased (control). We then have the rates (or probabilities) of disease among exposed and non-exposed as $P(D\mid E)$ and $P(D\mid \bar E)$. The relative risk (RR) is thus given as
$$RR = \frac{P(D\mid E)}{P(D\mid \bar E)}$$

Example: Bladder cancer and suspect industry

Assume we had complete population data for the bladder cancer data (we don't!), with data as in this table:

Suspect industry | Case | Non-diseased | Total
Yes | 118 | 6900 | 7018
No | 257 | 29900 | 30157

We would estimate the disease rates
$$\hat P(D\mid E) = \frac{118}{7018} = 0.0168 \quad\text{and}\quad \hat P(D\mid \bar E) = \frac{257}{30157} = 0.0085,$$
leading to
$$\widehat{RR} = \frac{118/7018}{257/30157} = 1.97.$$
Bladder cancer is twice as common in the suspect industry.
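A minimal Python sketch of the two tests on the lung cancer slides, using only the standard library. The group counts are an assumption: they could not be recovered from these slides and are taken from Doll & Hill's male series as usually tabulated. They do reproduce the means and standard deviations quoted above. The p-value uses a normal approximation to the t-distribution, which is reasonable with about 650 subjects per group. No chi-square p-value is computed because the standard library has no chi-square distribution.

```python
import math
from statistics import mean, stdev

# ASSUMED counts (not on the slide): cases and hospital controls in the
# groups 0, 1-4, 5-14, 15-24, 25-49, 50+ cigarettes per day.
CASES = [2, 33, 250, 196, 136, 32]
CONTROLS = [27, 55, 293, 190, 71, 13]
# Scores assigned to the six groups, as on the slide.
SCORES = [0, 2.5, 10, 20, 37.5, 60]


def expand(counts, scores):
    """Turn grouped counts into one score per individual."""
    return [s for n, s in zip(counts, scores) for _ in range(n)]


def t_statistic(x, y):
    """Two-sample t-statistic for a difference in means (unpooled standard error)."""
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (mean(x) - mean(y)) / se


def pearson_chi2(row1, row2):
    """Pearson chi-square for homogeneity in a 2 x k table; d.f. = k - 1."""
    n1, n2 = sum(row1), sum(row2)
    total = n1 + n2
    chi2 = 0.0
    for o1, o2 in zip(row1, row2):
        col = o1 + o2
        for observed, row_total in ((o1, n1), (o2, n2)):
            expected = row_total * col / total
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(row1) - 1


cases = expand(CASES, SCORES)
controls = expand(CONTROLS, SCORES)
t = t_statistic(cases, controls)
chi2, df = pearson_chi2(CASES, CONTROLS)
print(f"mean cases {mean(cases):.2f} (sd {stdev(cases):.2f})")
print(f"mean controls {mean(controls):.2f} (sd {stdev(controls):.2f})")
print(f"t = {t:.2f}, two-sided normal-approx. p = {math.erfc(abs(t) / math.sqrt(2)):.1e}")
print(f"Pearson chi-square = {chi2:.1f} on {df} d.f.")
```

The `expand` step mirrors the slide's approach of giving every individual the score of their group before running an ordinary t-test.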

Odds-ratio

Instead of relative risks we often use odds-ratios, defined by means of odds.
Among exposed:
$$\text{Odds}_E = \frac{P(D\mid E)}{1-P(D\mid E)}$$
Among unexposed:
$$\text{Odds}_{\bar E} = \frac{P(D\mid \bar E)}{1-P(D\mid \bar E)}$$
The odds-ratio is then defined as
$$OR = \frac{\text{Odds}_E}{\text{Odds}_{\bar E}}$$
Approximation to relative risk when incidence is low: $OR \approx RR$.
Thus the difference is acceptably small even with incidences as high as 0.20.

Relative risk vs. odds-ratio

Let $p_1 = P(D\mid E)$ and $p_0 = P(D\mid \bar E)$. Then
$$OR = \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = RR \cdot \frac{1-p_0}{1-p_1} \approx RR \quad\text{when incidence is small.}$$
If $p_1 > p_0$, the factor $(1-p_0)/(1-p_1)$ is larger than one, so $OR > RR > 1$. If $p_1 < p_0$, it is smaller than one, so $OR < RR < 1$. In general we either have
$$1 \le RR \le OR \quad\text{or}\quad OR \le RR \le 1.$$
Thus RR is always closer to one.

Why odds-ratio?

Parameter interpretation in logistic regression.
Can be estimated in case-control studies (as we will see).

Estimation of RR and OR in cohort study

From a 2x2 table, with a = no. of subjects that are exposed and diseased, etc.:

Exposure | D | $\bar D$ | Total
E | a | b | a+b
$\bar E$ | c | d | c+d

we estimate $P(D\mid E)$ with $a/(a+b)$ and $P(D\mid \bar E)$ with $c/(c+d)$. Thus the estimate for relative risk becomes
$$\widehat{RR} = \frac{a/(a+b)}{c/(c+d)},$$
while the odds-ratio is estimated by
$$\widehat{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}.$$
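The cohort estimators from the 2x2 table can be written as a short Python sketch. The class name `TwoByTwo` is illustrative and not from the slides. Applied to the artificial bladder-cancer cohort, it reproduces RR ≈ 1.97 and gives OR ≈ 1.99. It also checks the identity $OR = RR\,(1-p_0)/(1-p_1)$ behind the statement that RR is always closer to one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cohort 2x2 table: a = exposed & diseased, b = exposed & non-diseased,
    c = non-exposed & diseased, d = non-exposed & non-diseased."""
    a: int
    b: int
    c: int
    d: int

    def risks(self):
        """Estimated P(D|E) and P(D|not E)."""
        return self.a / (self.a + self.b), self.c / (self.c + self.d)

    def relative_risk(self):
        p1, p0 = self.risks()
        return p1 / p0

    def odds_ratio(self):
        return (self.a * self.d) / (self.b * self.c)


# Artificial bladder-cancer cohort from the slides.
cohort = TwoByTwo(a=118, b=6900, c=257, d=29900)
p1, p0 = cohort.risks()
rr, or_ = cohort.relative_risk(), cohort.odds_ratio()
print(f"P(D|E) = {p1:.4f}, P(D|not E) = {p0:.4f}")
print(f"RR = {rr:.2f}, OR = {or_:.2f}")
# OR = RR * (1 - p0) / (1 - p1) holds exactly (up to floating point).
assert abs(or_ - rr * (1 - p0) / (1 - p1)) < 1e-12
```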

Odds-ratio from case-control studies

Among cases:
$$\text{Odds}_D = \frac{P(E\mid D)}{1-P(E\mid D)}$$
Among controls:
$$\text{Odds}_{\bar D} = \frac{P(E\mid \bar D)}{1-P(E\mid \bar D)}$$
The odds-ratio of exposure equals the odds-ratio of disease:
$$\frac{\text{Odds}_D}{\text{Odds}_{\bar D}} = \frac{P(D\mid E)/(1-P(D\mid E))}{P(D\mid \bar E)/(1-P(D\mid \bar E))} = OR.$$

Why?

If you really want to verify this mathematical fact, use that conditional probabilities are defined as
$$P(E\mid D) = \frac{P(E \text{ and } D)}{P(D)}$$
and similarly for the other terms involved. This is standard algebra, although rather boring and somewhat tedious. Both sides reduce to
$$\frac{P(E \text{ and } D)\,P(\bar E \text{ and } \bar D)}{P(\bar E \text{ and } D)\,P(E \text{ and } \bar D)}.$$

2x2 table in un-matched case-control

In a case-control study we know the number of cases and the number of controls:

Exposure | D | $\bar D$
E | a | b
$\bar E$ | c | d
Total | a+c | b+d

Thus now the column marginals are fixed. Then we may estimate $P(E\mid D)$ by $a/(a+c)$ and $P(E\mid \bar D)$ by $b/(b+d)$. Since
$$\widehat{\text{Odds}}_D = \frac{a}{c} \quad\text{and}\quad \widehat{\text{Odds}}_{\bar D} = \frac{b}{d},$$
the odds-ratio estimate $\widehat{OR} = ad/(bc)$ is valid also in case-control studies. This is so because the case-control study allows estimation of the odds of exposure for cases and controls.

However, without knowledge of the sampling fractions, we cannot estimate $P(D\mid E)$ and $P(D\mid \bar E)$, and so neither can we estimate the relative risk RR.

The artificial cohort: Bladder-cancer data

We assumed we had cohort data as in the table

Suspect industry | Case | Non-diseased | Total
Yes | 118 | 6900 | 7018
No | 257 | 29900 | 30157

Then
$$\widehat{OR} = \frac{118 \cdot 29900}{6900 \cdot 257} = 1.99,$$
whereas $\widehat{RR} = 1.97$ (as previously calculated).
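Exact fractions show why the odds-ratio survives case-control sampling but the relative risk does not. This is a sketch with assumed sampling fractions: all cases are kept and 1% of the non-diseased are kept. Those fractions turn the artificial bladder cohort into exactly the 118/69 and 257/299 case-control table used earlier. The helper names are hypothetical.

```python
from fractions import Fraction as F

# Artificial bladder-cancer cohort: (exposure, disease) -> count.
cohort = {("E", "D"): 118, ("E", "notD"): 6900,
          ("notE", "D"): 257, ("notE", "notD"): 29900}


def disease_or(t):
    """Odds of disease among exposed divided by odds among non-exposed."""
    return F(t["E", "D"], t["E", "notD"]) / F(t["notE", "D"], t["notE", "notD"])


def exposure_or(t):
    """Odds of exposure among diseased divided by odds among non-diseased."""
    return F(t["E", "D"], t["notE", "D"]) / F(t["E", "notD"], t["notE", "notD"])


def sample(t, frac_cases, frac_controls):
    """Expected case-control table when sampling depends only on disease status."""
    return {(e, d): n * (frac_cases if d == "D" else frac_controls)
            for (e, d), n in t.items()}


def naive_rr(t):
    """'Risk' ratio computed as if the table were a cohort."""
    r1 = F(t["E", "D"]) / (t["E", "D"] + t["E", "notD"])
    r0 = F(t["notE", "D"]) / (t["notE", "D"] + t["notE", "notD"])
    return r1 / r0


# ASSUMED sampling fractions: all cases, 1% of the non-diseased.
cc = sample(cohort, F(1), F(1, 100))
print(f"disease OR = {float(disease_or(cohort)):.2f}, exposure OR = {float(exposure_or(cohort)):.2f}")
print(f"case-control OR = {float(disease_or(cc)):.2f}")
print(f"'RR' from cohort = {float(naive_rr(cohort)):.2f}, from case-control = {float(naive_rr(cc)):.2f}")
```

The odds-ratio is unchanged because each column is scaled by its own sampling fraction, and those factors cancel in $ad/(bc)$. The naive ratio of proportions does not cancel, so it no longer estimates RR.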

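Estimation of OR in case-control study

A case-control study allows estimation of the odds of exposure among cases and among controls, and their ratio equals the disease odds-ratio. Hence the odds-ratio can be estimated also in a case-control study (Cornfield, 1951).

Alternative argument

In the tables below, a and b are the exposed and non-exposed cases, and c and d are the exposed and non-exposed controls. Let the population counts be

Population | Disease | Not disease
Exposed | $a_0$ | $c_0$
Not exp. | $b_0$ | $d_0$

With probabilities $\pi_1$ for cases and $\pi_0$ for controls of being included in the case-control study, we have

Case-control | Case | Control
Exposed | $a \approx \pi_1 a_0$ | $c \approx \pi_0 c_0$
Not exp. | $b \approx \pi_1 b_0$ | $d \approx \pi_0 d_0$

and hence
$$\widehat{OR} = \frac{ad}{bc} \approx \frac{\pi_1 a_0 \cdot \pi_0 d_0}{\pi_1 b_0 \cdot \pi_0 c_0} = \frac{a_0 d_0}{b_0 c_0} = OR.$$

Example: Lung cancer and no. cigarettes

The argument can be made for more than two exposure levels, for example groups of no. cigarettes (0, 1-4, 5-14, 15-24, 25-49, 50+) among cancer cases and controls.
For each smoking group the exposure-odds relative to non-smokers are estimated among cases and among controls, and their ratio gives the odds-ratio for that group. For instance, the odds-ratio between those that smoke 1-4 cigarettes and non-smokers becomes
$$\widehat{OR} = \frac{(\text{no. cases, 1-4})/(\text{no. cases, 0})}{(\text{no. controls, 1-4})/(\text{no. controls, 0})}.$$

Confidence interval for OR

Woolf's formula: variance estimate for the log-odds-ratio,
$$\widehat{\operatorname{var}}(\log \widehat{OR}) = \frac1a + \frac1b + \frac1c + \frac1d, \qquad \widehat{se} = \sqrt{\widehat{\operatorname{var}}(\log \widehat{OR})}.$$
By a normal distribution approximation, valid when a, b, c and d are all "large", this gives a 95% confidence interval for OR:
$$\exp\left(\log \widehat{OR} - 1.96\,\widehat{se}\right) \quad\text{to}\quad \exp\left(\log \widehat{OR} + 1.96\,\widehat{se}\right).$$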
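Woolf's interval is easy to compute directly. The sketch below follows the labelling above (a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls). The counts for 1-4 cigarettes vs non-smokers are an assumption. They come from the same Doll & Hill male series assumed earlier, not from these slides, but they do give the small b = 2 mentioned in the example that follows.

```python
import math


def woolf_ci(a, b, c, d, z=1.96):
    """OR = ad/(bc) with a Woolf (log-scale, normal-approximation) CI.

    a, b = exposed / non-exposed cases; c, d = exposed / non-exposed controls.
    Returns (OR, se of log OR, lower limit, upper limit).
    """
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), se, math.exp(log_or - z * se), math.exp(log_or + z * se)


# ASSUMED counts: cases 33 (1-4 cig.) and 2 (non-smokers);
# controls 55 (1-4 cig.) and 27 (non-smokers).
or_, se, lo, hi = woolf_ci(a=33, b=2, c=55, d=27)
print(f"OR = {or_:.1f}, se(log OR) = {se:.2f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```

The very wide interval is driven almost entirely by the 1/b = 1/2 term. This is why the normal approximation is questionable here.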

Examples CI

Ex) Lung cancer, 1-4 cig. vs non-smokers. Here a and b are the numbers of cases smoking 1-4 cigarettes and of non-smoking cases, and c and d are the corresponding numbers of controls. $\widehat{OR} = ad/(bc)$ and $\widehat{se}$ from Woolf's formula give the 95% CI $\exp(\log\widehat{OR} \pm 1.96\,\widehat{se})$.
The normal approximation this CI relies on is shaky, though (b = 2 isn't really big). Exact methods may give a better confidence interval.

Efficiency between study designs

Assume that two designs allow estimation of the same quantity. The efficiency of design 2 relative to design 1 is then defined as
$$\text{Efficiency} = \frac{\text{Variance with design 1}}{\text{Variance with design 2}}.$$
Since variances are approximately proportional to 1/(sample size), the interpretation of an efficiency is:
$$\text{Efficiency} = \frac{\text{sample size needed with design 1}}{\text{sample size needed with design 2}} \quad\text{for the same precision,}$$
i.e. the reduction in sample size with design 1 that gives the same precision as design 2 (if design 1 is more efficient than design 2).

Efficiency case-control vs. cohort

Here a and b are the numbers of exposed and non-exposed cases, and c and d are the numbers of exposed and non-exposed controls. When the disease is rare, the numbers of available controls in the cohort, c and d, are large compared to the numbers of cases, a and b. Thus the cohort variance is approximately
$$\widehat{\operatorname{var}}_{\text{cohort}} \approx \frac1a + \frac1b.$$
Assume a 1:K case-control study with all $n = a + b$ cases and K controls per case. The total no. of controls is then Kn.
Assume also OR = 1, thus no effect of exposure. We would then have $c \approx Ka$ and $d \approx Kb$.

Efficiency case-control vs. cohort, contd.

By Woolf's formula the case-control variance is
$$\widehat{\operatorname{var}}_{\text{cc}} \approx \frac1a + \frac1b + \frac1{Ka} + \frac1{Kb} = \left(1 + \frac1K\right)\left(\frac1a + \frac1b\right).$$
The efficiency becomes
$$\text{Efficiency} = \frac{\text{Cohort variance}}{\text{Case-control variance}} \approx \frac{K}{K+1} \to 1$$
when K is large, and there is little gain by more than K = 4-5.
These efficiencies are approximately valid when OR is not very different from 1 and exposure is not very rare.
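The K/(K+1) result can be checked numerically with Woolf's variance. This sketch makes the slide's assumptions explicit: OR = 1, so controls and cohort non-diseased share the cases' exposure split. The cohort numbers (100 exposed and 300 non-exposed cases, one million non-diseased) are hypothetical, chosen to make the disease rare.

```python
def woolf_var(a, b, c, d):
    """Woolf variance estimate of log OR: 1/a + 1/b + 1/c + 1/d."""
    return 1 / a + 1 / b + 1 / c + 1 / d


def efficiency_1_to_k(a, b, k, n_noncases):
    """Efficiency of a 1:K case-control study relative to the full cohort.

    a, b: exposed / non-exposed cases. Assumes OR = 1, so both the cohort's
    n_noncases non-diseased and the K*(a+b) sampled controls are split a : b.
    """
    share = a / (a + b)
    var_cohort = woolf_var(a, b, share * n_noncases, (1 - share) * n_noncases)
    var_cc = woolf_var(a, b, k * a, k * b)
    return var_cohort / var_cc


# HYPOTHETICAL rare-disease cohort: 100 + 300 cases, 1,000,000 non-diseased.
for k in (1, 2, 4, 5, 10):
    eff = efficiency_1_to_k(100, 300, k, 1_000_000)
    print(f"K = {k:2d}: efficiency {eff:.3f}, K/(K+1) = {k / (k + 1):.3f}")
```

Going from K = 4 to K = 10 raises the efficiency only from about 0.80 to 0.91, which is the "little gain" noted above.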

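Example Efficiency: Bladder cancer

Artificial cohort study: $\widehat{OR} = 1.99$, $\widehat{\operatorname{var}}(\log\widehat{OR}) = \frac1{118} + \frac1{257} + \frac1{6900} + \frac1{29900} = 0.01254$
Case-control study: $\widehat{OR} = 1.99$, $\widehat{\operatorname{var}}(\log\widehat{OR}) = \frac1{118} + \frac1{257} + \frac1{69} + \frac1{299} = 0.03020$
Efficiency $= 0.01254/0.03020 \approx 0.42$, somewhat smaller than the 0.5 corresponding to a 1:1 case:control ratio.

Logistic regression model for cohort studies

Let Y be an indicator for disease and x a covariate for an individual. Assume that the probability that the individual has the disease can be written
$$P(Y=1\mid x) = \frac{\exp(\alpha + \beta x)}{1 + \exp(\alpha + \beta x)}.$$
Then the odds of having the disease equals
$$\text{Odds}(x) = \frac{P(Y=1\mid x)}{1 - P(Y=1\mid x)} = \exp(\alpha + \beta x),$$
and the odds-ratio between two individuals with covariates $x_1$ and $x_2$ becomes
$$OR = \frac{\text{Odds}(x_1)}{\text{Odds}(x_2)} = \exp(\beta(x_1 - x_2)).$$

Logistic regression and binary exposure

Let x = 1 if an individual is exposed and x = 0 if the individual is un-exposed. Then the model for disease can be written as a logistic regression model with
$$P(D\mid E) = \frac{e^{\alpha+\beta}}{1+e^{\alpha+\beta}}, \qquad P(D\mid \bar E) = \frac{e^{\alpha}}{1+e^{\alpha}},$$
so $\text{Odds}_E = e^{\alpha+\beta}$ and $\text{Odds}_{\bar E} = e^{\alpha}$, and
$$OR = \frac{\text{Odds}_E}{\text{Odds}_{\bar E}} = e^{\beta}.$$

Artificial bladder-cancer data, contd.

Assume a case-control study with K controls per case. For different values of K we get the following efficiencies:
[Table: efficiency relative to the artificial cohort for different numbers K of controls per case]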
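The bladder-cancer efficiency can be reproduced from the counts with Woolf's formula. The second part of this sketch carries an extra assumption: with K controls per case it uses expected control counts, K·375 controls split in the cohort's 6900 : 29900 ratio. Its numbers therefore illustrate the trend and need not match the table on the slide.

```python
def woolf_var(a, b, c, d):
    """Woolf variance estimate of log OR (a, b cases; c, d controls)."""
    return 1 / a + 1 / b + 1 / c + 1 / d


EXP_CASES, NONEXP_CASES = 118, 257
EXP_NONDIS, NONEXP_NONDIS = 6900, 29900  # non-diseased in the artificial cohort
EXP_CTRL, NONEXP_CTRL = 69, 299          # controls actually sampled

var_cohort = woolf_var(EXP_CASES, NONEXP_CASES, EXP_NONDIS, NONEXP_NONDIS)
var_cc = woolf_var(EXP_CASES, NONEXP_CASES, EXP_CTRL, NONEXP_CTRL)
efficiency = var_cohort / var_cc
print(f"cohort var {var_cohort:.5f}, case-control var {var_cc:.5f}, efficiency {efficiency:.2f}")

# ASSUMPTION: K controls per case, using expected (not sampled) control counts.
n_cases = EXP_CASES + NONEXP_CASES
share_exposed = EXP_NONDIS / (EXP_NONDIS + NONEXP_NONDIS)
for k in (1, 2, 4, 8):
    c = k * n_cases * share_exposed
    d = k * n_cases * (1 - share_exposed)
    eff_k = var_cohort / woolf_var(EXP_CASES, NONEXP_CASES, c, d)
    print(f"K = {k}: efficiency {eff_k:.2f}  (K/(K+1) = {k / (k + 1):.2f})")
```

The efficiencies fall somewhat below K/(K+1) here because OR is about 2 rather than 1, outside the conditions under which that approximation was derived.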

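Binary exposure and 2x2 table

This framework with binary exposure and binary outcome can be put up in a 2x2 table:

Exposure | D | $\bar D$ | Total
E | a | b | a+b
$\bar E$ | c | d | c+d

From here on we will use the notation Y = 1 instead of D, x = 1 instead of E, etc.

Logistic regression for un-matched case-control studies

Let S = 1 indicate being sampled to the case-control study. Assume that sampling depends only on disease status, not on covariates, and let $\pi_1$ and $\pi_0$ be the sampling fractions for cases and controls. Then
$$P(Y=1\mid x, S=1) = \frac{\exp(\alpha^* + \beta x)}{1+\exp(\alpha^* + \beta x)}, \qquad \alpha^* = \alpha + \log(\pi_1/\pi_0).$$
Can estimate the odds-ratio $e^\beta$ from case-control data!

Proof: Logistic regression un-matched studies

Let S be the indicator for being sampled as case or control, with $P(S=1\mid Y=1, x) = \pi_1$ and $P(S=1\mid Y=0, x) = \pi_0$ not depending on x. By Bayes' rule
$$P(Y=1\mid x, S=1) = \frac{\pi_1 P(Y=1\mid x)}{\pi_1 P(Y=1\mid x) + \pi_0 P(Y=0\mid x)} = \frac{\pi_1 e^{\alpha+\beta x}}{\pi_0 + \pi_1 e^{\alpha+\beta x}} = \frac{e^{\alpha^*+\beta x}}{1+e^{\alpha^*+\beta x}}.$$

Note: The argument shows that the estimates are valid. In fact, the estimates are maximum likelihood estimates, so standard errors, tests, etc. are also valid.

Multivariate logistic regression for cohort studies

Several covariates (confounders) adjusted for in the model:
$$P(Y=1\mid x_1,\dots,x_p) = \frac{\exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \dots + \beta_p x_p)},$$
so that $e^{\beta_j}$ is the odds-ratio for a unit increase in $x_j$, adjusted for the other covariates.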
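The proof can be illustrated by fitting the same logistic model to the artificial cohort and to the case-control sample. This is a minimal Newton-Raphson sketch for grouped binary data. The function name `fit_logistic` is hypothetical, and any standard logistic regression routine would do the same job. The slope should be identical in both fits. The intercept should shift by $\log(\pi_1/\pi_0) = \log(1/0.01)$, assuming all cases and 1% of the non-diseased were sampled, as in the artificial example.

```python
import math


def fit_logistic(groups, iters=50):
    """Maximum likelihood for logit P(Y=1|x) = alpha + beta*x by Newton-Raphson.

    groups: list of (x, n_cases, n_noncases) for grouped binary data.
    """
    alpha = beta = 0.0
    for _ in range(iters):
        # Score vector (u0, u1) and information matrix [[i00, i01], [i01, i11]].
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y1, y0 in groups:
            n = y1 + y0
            p = 1 / (1 + math.exp(-(alpha + beta * x)))
            w = n * p * (1 - p)
            u0 += y1 - n * p
            u1 += x * (y1 - n * p)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: (alpha, beta) += I^{-1} U
        alpha += (i11 * u0 - i01 * u1) / det
        beta += (i00 * u1 - i01 * u0) / det
    return alpha, beta


cohort = [(1, 118, 6900), (0, 257, 29900)]
case_control = [(1, 118, 69), (0, 257, 299)]
a_coh, b_coh = fit_logistic(cohort)
a_cc, b_cc = fit_logistic(case_control)
print(f"cohort:       alpha  = {a_coh:.3f}, beta = {b_coh:.3f}")
print(f"case-control: alpha* = {a_cc:.3f}, beta = {b_cc:.3f}")
print(f"alpha* - alpha = {a_cc - a_coh:.3f}, log(pi1/pi0) = {math.log(1 / 0.01):.3f}")
```

With a single binary covariate the maximum likelihood estimate is available in closed form ($\hat\beta = \log(ad/bc)$). The iterative fit is shown because the same score and information computations extend to several covariates.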

Multivariate logistic regression for un-matched case-control studies

Again assume that sampling to the case-control study depends only on disease status, not on covariates, and let $\pi_1$ and $\pi_0$ be the sampling fractions among cases and controls. Then, just like for univariate logistic regression,
$$P(Y=1\mid x_1,\dots,x_p, S=1) = \frac{\exp(\alpha^* + \beta_1 x_1 + \dots + \beta_p x_p)}{1 + \exp(\alpha^* + \beta_1 x_1 + \dots + \beta_p x_p)}, \qquad \alpha^* = \alpha + \log(\pi_1/\pi_0).$$
Can estimate adjusted odds-ratios from case-control data! Only the intercept is changed.

NB. Logistic regression for case-control data requires that the logistic cohort model is valid. If, for instance, the model is linear (risk difference model)
$$P(Y=1\mid x) = \alpha + \beta x,$$
the argument will not hold.
However, if the sampling fractions $\pi_1$ and $\pi_0$ are known, one can do weighted regression with weights $1/\pi_1$ for cases and $1/\pi_0$ for controls. STATA with "probability weighting" will produce correct standard errors.

Ex: Dysmelia (missing fingers, parts of arm, toes, etc.)

21 cases and 107 controls from Grenland and Mo i Rana.
[Table: crude (CrOR) and adjusted (AdjOR) odds-ratios with 95% CI for: pregnant after using p-pills, mother smokes, high maternal education, pregnant in spring time, previous spontaneous abortion; reference categories "No", or "Low" for maternal education]

Matching

Logistic regression is one method for controlling confounding.
Another method is to match on (some of) the confounders: for each case, sample controls with the same value of the confounder as the case.
Typical matching factors: age, sex, neighborhood, family, ...
Matching can also give some efficiency improvements and is generally a more flexible method for controlling the confounding factors. However, one cannot estimate effects of the matching factors.
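With known sampling fractions, weighting each case by $1/\pi_1$ and each control by $1/\pi_0$ restores the cohort proportions. This sketch assumes the artificial bladder-cancer setting (all cases sampled, 1% of the non-diseased as controls). It treats a single binary exposure, where the weighted linear (risk difference) model is saturated, so its fitted values are just weighted proportions. The sketch gives point estimates only. It does not reproduce the "probability weighting" standard errors mentioned for STATA.

```python
# ASSUMED known sampling fractions (artificial bladder-cancer data).
PI_CASES, PI_CONTROLS = 1.0, 0.01
data = {  # exposure x -> (cases, controls) in the case-control sample
    1: (118, 69),
    0: (257, 299),
}


def weighted_risk(cases, controls, pi1, pi0):
    """Disease risk in one exposure group, weighting cases by 1/pi1 and controls by 1/pi0."""
    w_cases, w_controls = cases / pi1, controls / pi0
    return w_cases / (w_cases + w_controls)


r1 = weighted_risk(*data[1], PI_CASES, PI_CONTROLS)
r0 = weighted_risk(*data[0], PI_CASES, PI_CONTROLS)
# In the saturated linear model P(Y=1|x) = alpha + beta*x: alpha = r0, beta = r1 - r0.
print(f"risk exposed {r1:.4f}, risk non-exposed {r0:.4f}")
print(f"risk difference {r1 - r0:.4f}, RR {r1 / r0:.2f}")
```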

Matched sets

1:1 matching: Select one control for each case.
1:K matching: Select K controls for each case.
M:K matching: Select K controls for a group of M cases.
If M and K are large, the design is often referred to as a stratified design.
Methods of analysis may differ with these different sizes.

Odds-ratio in matched study: Mantel-Haenszel estimate

A matched study with binary exposure can be represented by 2x2 tables for all matched sets. For set i:

Exposure | D | $\bar D$
E | $a_i$ | $b_i$
$\bar E$ | $c_i$ | $d_i$

Let $n_i = a_i + c_i$ be the no. of cases in set i (= 1 with 1:K matching) and $m_i = b_i + d_i$ the no. of controls in set i (= 1 with 1:1 matching). Let also $N_i = n_i + m_i$. Then the odds-ratio is estimated by
$$\widehat{OR}_{MH} = \frac{\sum_i a_i d_i / N_i}{\sum_i b_i c_i / N_i}.$$

Mantel-Haenszel by 1:1 matching

Discordant pairs:
1) Case exposed (x = 1) and control non-exposed (x = 0)
2) Case non-exposed (x = 0) and control exposed (x = 1)
No. pairs of type 1: $n_{10}$. No. pairs of type 2: $n_{01}$.
The Mantel-Haenszel estimate then becomes
$$\widehat{OR}_{MH} = \frac{n_{10}}{n_{01}},$$
i.e. the ratio of the numbers of discordant pairs. Furthermore, the variance estimate of $\log \widehat{OR}_{MH}$ is given by
$$\widehat{\operatorname{var}}(\log \widehat{OR}_{MH}) = \frac{1}{n_{10}} + \frac{1}{n_{01}}.$$

Conditional logistic regression with 1:K matching

Logistic regression model in set no. s:
$$P(Y=1\mid x) = \frac{\exp(\alpha_s + \beta x)}{1 + \exp(\alpha_s + \beta x)},$$
where Y is the disease-indicator for an individual and the intercepts $\alpha_s$ differ between sets (nuisance parameters).
Can "condition out" the nuisance parameters $\alpha_s$ by conditioning on there being exactly one case in set s:
$$P(\text{individual } j \text{ is the case} \mid \text{one case in set } s) = \frac{\exp(\beta x_j)}{\sum_{k \in \text{set } s} \exp(\beta x_k)},$$
and estimate β by maximizing the conditional likelihood
$$L(\beta) = \prod_s \frac{\exp(\beta x_{s,\text{case}})}{\sum_{k \in \text{set } s} \exp(\beta x_k)},$$
which is of the same form as a Cox-likelihood.
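The Mantel-Haenszel estimator can be written as a short sketch using the per-set 2x2 layout above. The class and helper names are hypothetical, and the pair counts are made up for illustration. Concordant pairs contribute zero to both sums, so for 1:1 matching the estimate reduces to the ratio of discordant pairs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchedSet:
    """2x2 table for one matched set (rows E / not E, columns D / not D):
    a = exposed cases, b = exposed controls, c = non-exposed cases, d = non-exposed controls."""
    a: int
    b: int
    c: int
    d: int

    @property
    def n(self):
        return self.a + self.b + self.c + self.d


def mantel_haenszel_or(sets):
    """OR_MH = sum(a_i d_i / N_i) / sum(b_i c_i / N_i)."""
    num = sum(s.a * s.d / s.n for s in sets)
    den = sum(s.b * s.c / s.n for s in sets)
    return num / den


def pair(case_exposed, control_exposed):
    """2x2 table of a 1:1 matched pair."""
    return MatchedSet(a=int(case_exposed), b=int(control_exposed),
                      c=int(not case_exposed), d=int(not control_exposed))


# HYPOTHETICAL pairs: n10 = 15, n01 = 5, plus concordant pairs.
n10, n01 = 15, 5
pairs = ([pair(True, False)] * n10 + [pair(False, True)] * n01
         + [pair(True, True)] * 20 + [pair(False, False)] * 60)
or_mh = mantel_haenszel_or(pairs)
se_log = math.sqrt(1 / n10 + 1 / n01)
print(f"OR_MH = {or_mh:.2f} (n10/n01 = {n10 / n01:.2f}), se(log OR) = {se_log:.3f}")
```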

Conditional logistic 1:K matching and Cox-regression

Actually, the conditional likelihood L(β) is of the same form as the likelihood of a stratified Cox-regression. One may fit the model with a program for Cox-regression where:
The status variable is the indicator for case.
For the time variable, use a common arbitrary value, f.ex. 1, for all individuals.
Covariates are entered as in Cox-regression.
The variable that represents the matched set is used as stratum variable.
Estimates and tests from Cox-regression are valid!
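Here is a sketch of maximizing the conditional likelihood directly for 1:K matched sets with one scalar covariate. Per the slide, a stratified Cox program with a common event time gives the same likelihood. The function names are hypothetical and the data are made up. For 1:1 pairs with a binary exposure the maximizer should be $\log(n_{10}/n_{01})$, matching the Mantel-Haenszel estimate. Because no set-specific intercept appears, shifting every covariate within a set leaves the estimate unchanged.

```python
import math


def conditional_loglik_derivs(beta, sets):
    """Score and information of sum_s [beta*x_case - log(sum_k exp(beta*x_k))].

    sets: list of (x_case, [x_control_1, ..., x_control_K]), scalar covariate.
    """
    score = info = 0.0
    for x_case, x_controls in sets:
        xs = [x_case] + list(x_controls)
        weights = [math.exp(beta * x) for x in xs]
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_x2 = sum(w * x * x for w, x in zip(weights, xs)) / total
        score += x_case - mean_x
        info += mean_x2 - mean_x ** 2
    return score, info


def fit_conditional_logistic(sets, iters=50):
    """Newton-Raphson for beta; the nuisance intercepts alpha_s never enter."""
    beta = 0.0
    for _ in range(iters):
        score, info = conditional_loglik_derivs(beta, sets)
        beta += score / info
    return beta


# HYPOTHETICAL 1:1 pairs: 15 with only the case exposed, 5 with only the
# control exposed, and 30 concordant pairs (which carry no information).
sets = [(1, [0])] * 15 + [(0, [1])] * 5 + [(1, [1])] * 10 + [(0, [0])] * 20
beta_hat = fit_conditional_logistic(sets)
print(f"beta = {beta_hat:.4f}, exp(beta) = {math.exp(beta_hat):.2f}")
```

The same function accepts sets with several controls (1:K). Each set contributes the case's covariate minus a weighted mean over everyone in the set, exactly as a Cox risk set does.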

Analysing M:K matched data

More complex conditional likelihood. Cannot use Cox-regression.
Special programs available: Egret, Epicure, LogXact, SAS, ...
With stratified studies (M and K large), use standard logistic regression with the stratum as a categorical covariate. The estimates for the usual covariates are then almost unbiased.
Estimates for the stratum variable are confounded with the sampling fractions in the stratum and are not interpretable.