Age-period-cohort models in epidemiology: advantages and disadvantages
Eva Gelnarová, Institute of Biostatistics and Analyses
Carcinoma diseases
• Today: the second most common cause of death (in industrially developed countries)
• Mid-20th century: a seemingly epidemic appearance in Europe and North America. A civilization disease? Hence: epidemiology of chronic diseases
• Archaeological findings: carcinoma has always been present in the population, e.g. breast cancer (C50):
  ◦ papyri from about 3000 B.C.
  ◦ Hippocrates
  ◦ Galen
Factors in the development of the disease
• The occurrence of carcinoma differs across parts of the world and across races.
  1. Internal factors (genetically conditioned)
  2. External factors, e.g. exposure to carcinogenic chemicals, lifestyle
  3. Age: the number of spontaneous mutations increases, genetic instability
• In the last fifty years the average age of the population has increased rapidly, so the number of cases in the population increases.
Incidence
• Incidence is defined as the number of new cases per 100 000 persons in the population.
• The incidence rate is the number of persons who develop the disease divided by the number of persons (person-years) at risk, multiplied by 100 000:

$$\text{incidence rate} = \frac{\text{number of new cases}}{\text{number of persons (person-years) at risk}} \times 100\,000$$

• Age-specific incidence: the investigated population includes only persons in a specific age category.
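A minimal Python sketch of the incidence-rate formula above. The function names and the example figures are hypothetical; cases and person-years are assumed to come from a registry extract that is already aggregated by age category.

```python
def incidence_rate(new_cases, person_years_at_risk, per=100_000):
    """Incidence rate: new cases / person-years at risk, scaled per 100 000."""
    if person_years_at_risk <= 0:
        raise ValueError("person-years at risk must be positive")
    # multiply first so that round figures stay exact in floating point
    return new_cases * per / person_years_at_risk


def age_specific_rates(cases_by_age, pyears_by_age, per=100_000):
    """Age-specific incidence: the same formula restricted to each age category."""
    return {
        age: incidence_rate(cases_by_age[age], pyears_by_age[age], per)
        for age in cases_by_age
    }


# Hypothetical numbers, for illustration only
print(incidence_rate(250, 500_000))  # 50.0 per 100 000 person-years
print(age_specific_rates({"30-34": 10, "35-39": 30},
                         {"30-34": 100_000, "35-39": 150_000}))
```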
Three time scales
• Tasks of interest: On which factors does incidence depend, and how? Estimate the future development of incidence via projection.
• Because additional information is lacking (long incubation period; e.g. historical pollution monitoring is not available), the factors are expressed as functions of time. Three time scales:
  ◦ age at diagnosis
  ◦ year of diagnosis
  ◦ year of birth
Currently available information
• Data sources: oncological registries
• No individual records currently available
• Number of persons (person-years) at risk
• Number of persons who develop the disease over an interval of time
• Gender, region, stage ...
• Age at diagnosis (age a, number of age groups A)
• Date of diagnosis (period p, number of periods P)
• Date of birth (cohort c, number of cohorts C = A − 1 + P)
  ◦ Artificial: the cohort is determined by age and period, c = p − a
  ◦ a restricted version of a more general age-by-period interaction
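The indexing of the artificial cohorts can be sketched as follows. The sketch assumes age groups and periods of equal width, indexed from 1, and uses the common convention k = j − i + A (the origin of k is an assumption) so that the cohort index runs from 1 to C = A + P − 1.

```python
def cohort_index(i, j, A):
    """Cohort index k for age group i (1..A) and period j (1..P).

    Cohort ~ period - age (c = p - a); adding A is an assumed choice of origin
    so that k runs from 1 (oldest cohort) to C = A + P - 1 (youngest cohort).
    """
    return j - i + A


def n_cohorts(A, P):
    """Number of distinct artificial cohorts in an A x P table."""
    return A + P - 1


# Oldest age group in the first period -> first cohort
print(cohort_index(13, 1, 13))   # 1
# Youngest age group in the last period -> last cohort
print(cohort_index(1, 27, 13))   # 39
```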
Lexis diagram: mean year of birth (COHORT) by age group (AGE) and calendar time of diagnosis (PERIOD)

Age group (mean age) | 1977–1981 (1979) | 1982–1986 (1984) | 1987–1991 (1989) | 1992–1996 (1994) | 1997–2001 (1999)
20–24 (22)           | 1957             | 1962             | 1967             | 1972             | 1977
25–29 (27)           | 1952             | 1957             | 1962             | 1967             | 1972
30–34 (32)           | 1947             | 1952             | 1957             | 1962             | 1967
35–39 (37)           | 1942             | 1947             | 1952             | 1957             | 1962
40–44 (42)           | 1937             | 1942             | 1947             | 1952             | 1957
45–49 (47)           | 1932             | 1937             | 1942             | 1947             | 1952
50–54 (52)           | 1927             | 1932             | 1937             | 1942             | 1947
55–59 (57)           | 1922             | 1927             | 1932             | 1937             | 1942
60–64 (62)           | 1917             | 1922             | 1927             | 1932             | 1937
65–69 (67)           | 1912             | 1917             | 1922             | 1927             | 1932
70–74 (72)           | 1907             | 1912             | 1917             | 1922             | 1927
75–79 (77)           | 1902             | 1907             | 1912             | 1917             | 1922
80–84 (82)           | 1897             | 1902             | 1907             | 1912             | 1917
85–89 (87)           | 1892             | 1897             | 1902             | 1907             | 1912
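The cohort entries of the Lexis diagram can be regenerated from its margins: each cell's mean year of birth is the mean year of diagnosis minus the mean age (c = p − a). A small sketch, with midpoints taken as in the table headings; the function names are hypothetical.

```python
# Margins of the Lexis diagram: 5-year calendar periods and 5-year age groups
periods = [(1977, 1981), (1982, 1986), (1987, 1991), (1992, 1996), (1997, 2001)]
age_groups = [(lo, lo + 4) for lo in range(20, 90, 5)]  # 20-24, ..., 85-89


def midpoint(interval):
    """Mean year / mean age of a closed 5-unit interval, e.g. (1977, 1981) -> 1979."""
    lo, hi = interval
    return (lo + hi) // 2


def lexis_table():
    """Return {(age_group, period): mean year of birth}, using c = p - a."""
    return {
        (ag, per): midpoint(per) - midpoint(ag)
        for ag in age_groups
        for per in periods
    }


if __name__ == "__main__":
    table = lexis_table()
    for ag in age_groups:
        label = f"{ag[0]}-{ag[1]} ({midpoint(ag)})"
        print(label, *(table[(ag, per)] for per in periods))
```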
Age-period-cohort models
• Complex analysis of trends in incidence, widely used in
  ◦ epidemiology
  ◦ demography
  ◦ sociology
• We consider APC models that fall into the class of generalized linear models (log-linear models).
Age-period-cohort models: assumptions
1. The number of cases in age group $i$ in period $j$, denoted $y_{ij}$, is a realisation of a Poisson random variable with mean $\theta_{ij}$, where $i = 1, \dots, A$ and $j = 1, \dots, P$.
2. The number of persons at risk in age group $i$ in period $j$, $N_{ij}$, is a fixed, known value.
3. The random variables $y_{ij}$ are jointly independent.
4. The logarithm of the expected rate is a linear function:

$$\ln E[r_{ij}] = \ln\left(\frac{\theta_{ij}}{N_{ij}}\right) = \mu + \alpha_i + \beta_j + \gamma_k, \qquad i = 1, \dots, A,\; j = 1, \dots, P,\; k = 1, \dots, C,$$

where the cohort index of cell $(i, j)$ is $k = j - i + A$.
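Under assumptions 1–4 the model is fully specified by the expected counts θ_ij = N_ij · exp(μ + α_i + β_j + γ_k). The sketch below computes them and the Poisson log-likelihood that maximum likelihood estimation maximises. Passing the effects as plain 0-based Python lists, and the function names, are assumptions for illustration.

```python
import math


def expected_counts(mu, alpha, beta, gamma, N):
    """theta_ij = N_ij * exp(mu + alpha_i + beta_j + gamma_k).

    alpha, beta, gamma are 0-based lists of length A, P and C = A + P - 1;
    N is an A x P nested list of persons (person-years) at risk.
    """
    A, P = len(alpha), len(beta)
    if len(gamma) != A + P - 1:
        raise ValueError("gamma must have A + P - 1 entries")
    theta = []
    for i in range(A):
        row = []
        for j in range(P):
            k = j - i + A - 1  # 0-based version of k = j - i + A
            row.append(N[i][j] * math.exp(mu + alpha[i] + beta[j] + gamma[k]))
        theta.append(row)
    return theta


def poisson_loglik(y, theta):
    """Log-likelihood of independent Poisson counts y_ij with means theta_ij > 0."""
    return sum(
        yij * math.log(tij) - tij - math.lgamma(yij + 1)
        for y_row, t_row in zip(y, theta)
        for yij, tij in zip(y_row, t_row)
    )
```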
Effects description

$$\ln E[r_{ij}] = \ln\left(\frac{\theta_{ij}}{N_{ij}}\right) = \mu + \alpha_i + \beta_j + \gamma_k,$$

$$E[r_{ij}] = \frac{\theta_{ij}}{N_{ij}} = e^{\mu} \cdot e^{\alpha_i} \cdot e^{\beta_j} \cdot e^{\gamma_k},$$

• $\mu$: mean effect
• $\alpha_i$: effect of age group $i$; differing risks associated with different age groups
• $\beta_j$: effect of time period $j$; change in rate that is associated with all age groups simultaneously
• $\gamma_k$: effect of the $k$th birth cohort; long-term habits
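Because the effects enter multiplicatively, exp(α_i), exp(β_j) and exp(γ_k) act as rate multipliers, and the ratio of expected rates in two cells depends only on the differences of their effects (μ cancels). A hypothetical pair of helpers illustrating this:

```python
import math


def expected_rate(mu, alpha_i, beta_j, gamma_k):
    """E[r_ij] written as a product of multiplicative factors."""
    return math.exp(mu) * math.exp(alpha_i) * math.exp(beta_j) * math.exp(gamma_k)


def rate_ratio(cell_a, cell_b):
    """E[r] in cell a divided by E[r] in cell b.

    Each cell is given as (alpha_i, beta_j, gamma_k); mu cancels, so the ratio is
    exp((alpha_a - alpha_b) + (beta_a - beta_b) + (gamma_a - gamma_b)).
    """
    return math.exp(sum(a - b for a, b in zip(cell_a, cell_b)))
```

Note that two cells cannot differ in age alone: within one period, moving up one age group moves down one birth cohort, so an age contrast is always mixed with a cohort contrast. This is the root of the identifiability problem.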
Estimating parameters of age-period-cohort models
Maximum likelihood estimates
• Parameters of the model can be estimated with any statistical package that has a generalized linear modelling procedure, e.g. GLIM, R or SAS.
• The GLM must be fitted with the constraints $\alpha_{a_0} = \beta_{p_0} = \gamma_{c_0} = 0$ (*), plus at least one additional constraint or assumption.
• Fixed effect model
• Possible refinements (natural splines etc.)
Bayesian statistics
• BAMP ("Bayesian Age-Period-Cohort Modeling and Prediction"), www.stat.uni-muenchen.de/NSchmidt/bamp
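The slide points to GLM procedures in GLIM, R or SAS. As a self-contained stand-in, the sketch below fits the Poisson log-linear APC model by iteratively reweighted least squares (IRLS) in pure Python, with log(N_ij) as an offset. The parametrisation is an assumption: first categories as references (α_1 = β_1 = γ_1 = 0) plus the extra constraint β_2 = 0, i.e. β_1 = β_2 = 0, which is only one of many possible additional constraints. A different extra constraint gives the same fitted rates but different effect estimates. For real registry tables a proper GLM routine is preferable.

```python
import math


def design_row(i, j, A, P):
    """Dummy coding for cell (i, j), 1-based, with cohort k = j - i + A.

    Columns: [mu, alpha_2..alpha_A, beta_3..beta_P, gamma_2..gamma_C].
    Assumed constraints: alpha_1 = gamma_1 = 0 and beta_1 = beta_2 = 0.
    """
    C = A + P - 1
    k = j - i + A
    row = [1.0]
    row += [1.0 if i == a else 0.0 for a in range(2, A + 1)]
    row += [1.0 if j == p else 0.0 for p in range(3, P + 1)]
    row += [1.0 if k == c else 0.0 for c in range(2, C + 1)]
    return row


def solve(M, v):
    """Solve M x = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(M[r]) + [v[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_apc(y, N, max_iter=50, tol=1e-10):
    """Poisson log-linear APC fit by IRLS with offset log(N); y, N are A x P lists."""
    A, P = len(y), len(y[0])
    X, yy, off = [], [], []
    for i in range(1, A + 1):
        for j in range(1, P + 1):
            X.append(design_row(i, j, A, P))
            yy.append(float(y[i - 1][j - 1]))
            off.append(math.log(N[i - 1][j - 1]))
    n, m = len(X), len(X[0])
    mu = [v + 0.5 for v in yy]  # usual GLM start: fitted values close to y
    eta = [math.log(v) for v in mu]
    b = None
    for _ in range(max_iter):
        # working response (offset removed) and weights mu for the log link
        z = [e - o + (v - f) / f for e, o, v, f in zip(eta, off, yy, mu)]
        XtWX = [[sum(mu[r] * X[r][a] * X[r][c] for r in range(n)) for c in range(m)]
                for a in range(m)]
        XtWz = [sum(mu[r] * X[r][a] * z[r] for r in range(n)) for a in range(m)]
        b_new = solve(XtWX, XtWz)
        eta = [sum(x * c for x, c in zip(row, b_new)) + o for row, o in zip(X, off)]
        mu = [math.exp(e) for e in eta]
        done = b is not None and max(abs(p - q) for p, q in zip(b_new, b)) < tol
        b = b_new
        if done:
            break
    return b
```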
Age-period-cohort models: problems
• Artificial birth cohorts form a sequence of overlapping intervals (usually ignored); e.g. a 5-year age group observed during a 5-year period contains births from a 10-year span.
• There is an exact linear dependency among the three factors (c = p − a):
  ◦ Only one cohort is associated with each cell of the two-way table.
  ◦ The design matrix is singular.
  ◦ There are infinitely many solutions.
  ◦ Identifiability problem.
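The exact linear dependency can be checked numerically: moving a linear trend t between the three time scales (α_i + t·i, β_j − t·j, γ_k + t·k, μ − t·A) leaves every cell's linear predictor unchanged, so the data cannot distinguish these parameter sets. A sketch, assuming the indexing k = j − i + A; the function names are hypothetical.

```python
def linear_predictor(mu, alpha, beta, gamma, i, j):
    """mu + alpha_i + beta_j + gamma_k for 1-based i, j with k = j - i + A."""
    A = len(alpha)
    k = j - i + A
    return mu + alpha[i - 1] + beta[j - 1] + gamma[k - 1]


def add_drift(mu, alpha, beta, gamma, t):
    """Shift a linear trend t between age, period and cohort.

    Added terms: t*i - t*j + t*k - t*A = t*(i - j + (j - i + A) - A) = 0,
    so every cell keeps the same linear predictor.
    """
    A = len(alpha)
    return (
        mu - t * A,
        [a + t * i for i, a in enumerate(alpha, start=1)],
        [b - t * j for j, b in enumerate(beta, start=1)],
        [g + t * k for k, g in enumerate(gamma, start=1)],
    )
```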
Solutions to the identifiability problem
• Use of individual records: triangular tabulation, no artificial rectangular cohorts; hierarchical, random effect models
• Additional assumptions: the cohort or period trend is dominant. Sequential method (e.g. first fit an age-cohort model, then fit the residuals of this model on period)
• Additional constraints: e.g. $\beta_{p_0} = \beta_{p_1} = 0$
• Holford's method: fit a model with any parametrisation, then regress the age estimates on age (the period estimates on period, the cohort estimates on cohort). The residuals represent the deviations from linearity. The linear trend cannot be uniquely assigned to any of the three time scales.
• Intrinsic estimator
• Age-period-cohort-characteristic models (O'Brien, 2000): a cohort characteristic variable is used instead of cohort dummy variables.
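Holford's argument can be illustrated with ordinary least squares: if two parametrisations differ by a drift t, their age estimates differ by t·i, so regressing each set on age gives slopes that differ by t but identical residuals. The hypothetical helper below performs only this regression step; the effect estimates are assumed to come from any fit of the model.

```python
def detrend(estimates):
    """Regress effect estimates on their index 1..n by ordinary least squares.

    Returns (slope, residuals); the residuals are the deviations from linearity,
    which do not depend on the arbitrary drift chosen by the parametrisation.
    """
    n = len(estimates)
    xs = list(range(1, n + 1))
    x_bar = sum(xs) / n
    y_bar = sum(estimates) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, estimates)) / sxx
    resid = [y - (y_bar + slope * (x - x_bar)) for x, y in zip(xs, estimates)]
    return slope, resid
```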
Effect of different parametrisations: Danish testis cancer data
[Figure]
Example: C50 breast carcinoma
• Females only
• A = 13 age groups (30 years and older)
• 1977–2003, stratified into P = 27 periods
• 39 cohorts (C = P + A − 1)
[Figure: observed incidence rates plotted against date of birth, date of diagnosis and age at diagnosis]
Comparison: Holford, sequential and "weighted" methods
• Assumption: the cohort effect is the major one
[Figure: fitted rates against age and calendar time for the Holford, "weighted" and sequential methods]
Data fit:
• The cohort effect is major.
• The period effect is marginal.
Comparison: Holford, sequential and "weighted" methods
• Assumption: the period effect is the major one
[Figure: fitted rates against age and calendar time for the Holford, "weighted" and sequential methods]
Data fit:
• The period effect is not so major.
• The cohort effect is as significant as the period effect.
Both models:
• The cohort effect is needed.
• How can this be quantified?
[Figure: fitted rates against age and calendar time, "period major" vs. "cohort major"]
Goodness of fit: residual deviance

Model                 Resid. Df   Resid. Dev
Age                   343         4658.3
Age-drift             342         1058.4
Age-Cohort            336          720.8
Age-Period-Cohort     330          642.1
Age-Period            336         1006.1
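The residual deviances can be compared between nested models: the drop in deviance is referred to a chi-square distribution whose degrees of freedom equal the difference in residual df. This is an assumption about how the models would be compared, not necessarily the author's procedure. The sketch defines the Poisson deviance and an exact chi-square upper-tail probability that is valid only for even df (enough for the 6-df comparisons here); the numbers are copied from the goodness-of-fit slide.

```python
import math


def poisson_deviance(y, mu):
    """Residual deviance 2 * sum[y log(y / mu) - (y - mu)], with 0 log 0 = 0."""
    return 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )


def chi2_sf_even(x, df):
    """P(chi2_df > x), exact closed form that holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # accumulates (x/2)^i / i!
        total += term
    return math.exp(-half) * total


# (residual df, residual deviance) from the goodness-of-fit slide (C50 data)
fits = {
    "Age-Cohort": (336, 720.8),
    "Age-Period": (336, 1006.1),
    "Age-Period-Cohort": (330, 642.1),
}


def compare(reduced, full):
    """Deviance drop and df difference when moving from `reduced` to `full`."""
    (df0, dev0), (df1, dev1) = fits[reduced], fits[full]
    return dev0 - dev1, df0 - df1
```

For example, compare("Age-Cohort", "Age-Period-Cohort") gives a deviance drop of about 78.7 on 6 df, and compare("Age-Period", "Age-Period-Cohort") about 364.0 on 6 df.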
BAMP results
[Figure: estimated age (30–34 to 90–94), period (1977–2001) and cohort (1882–1887 to 1962–1967) effects]
• Model (median) deviance: 360.724
Conclusion
• Age-period-cohort models are appropriate for fitting observed cancer incidence rates.
• Because of the identifiability problem, the results must be treated with caution.
• To construct the model we need additional information, which, unfortunately, is exactly what we want the model to produce.
Literature
• O'Brien, R. M. (2000): Age period cohort characteristic models. Social Science Research, 29, pp. 123–139.