Multinomial logistic regression: Analysis of multi-category outcomes and its application to a Salmonella Enteritidis investigation in Ontario
PHO Rounds: Epidemiology, February 21, 2013
Dr. Laura Rosella, Scientist; Ryan Walton, Epidemiologist
Overview
1. An overview of multinomial logistic regression (Laura)
2. An applied example from a Salmonella Enteritidis investigation (Ryan)
www.oahpp.ca
Learning Objectives
1. Familiarity with multinomial logistic regression, including the ability to identify situations when the technique may be useful
2. An understanding of the strengths and limitations of multinomial logistic regression
3. Increased awareness of Salmonella Enteritidis (SE) epidemiology in Ontario
4. Working knowledge of the application of multinomial logistic regression to public health practice, and where to look for more information
Multinomial Logit Analysis: When to use it
• Binomial logistic regression handles two outcomes, but how do you deal with more than two?
• One option is to re-categorize into 2 outcomes
  • E.g. pain scale as an outcome; options are: no pain, mild pain, moderate pain, severe pain
  • Could re-categorize as: no pain versus mild, moderate or severe pain, OR no/mild pain versus moderate or severe pain
• Challenges with this approach:
  • Results in an inevitable loss of information
  • Results can change depending on how you collapse your categories
  • Could lead to seriously misleading conclusions
Multinomial Logit Analysis: When to use it
• One might try to use ordinary least squares (OLS, linear) regression with a numerically coded categorical outcome (e.g. 1, 2, 3 for low, medium, high)
• Challenges with this approach:
  • The residuals cannot be normally distributed (an OLS assumption)
  • The OLS model makes nonsensical predictions (e.g. values between or beyond the category codes), since the dependent variable is NOT continuous
  • The coding is completely arbitrary, i.e., recoding the dependent variable can give very different results
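To make the arbitrary-coding problem concrete, here is a minimal Python sketch on hypothetical toy data (not from any study in this presentation). With a single binary predictor, the OLS slope is simply the difference in mean codes between the two groups, so stretching the code for "high" from 3 to 10 changes the estimated effect even though the categories themselves are unchanged.

```python
# Hypothetical toy data (not from any study in this presentation):
# a binary predictor x and an ordered outcome with three levels.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y_labels = ["low", "low", "medium", "high", "medium", "high", "high", "high"]


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs for a single predictor with an intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    return sxy / sxx


# Two numeric codings that both respect the order low < medium < high
coding_a = {"low": 1, "medium": 2, "high": 3}
coding_b = {"low": 1, "medium": 2, "high": 10}

slope_a = ols_slope(x, [coding_a[v] for v in y_labels])
slope_b = ols_slope(x, [coding_b[v] for v in y_labels])
print(slope_a, slope_b)  # 1.0 4.5
```

Both codings preserve the ordering, yet the "effect" of x more than quadruples; a categorical model avoids assigning these arbitrary distances.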
Multinomial Logit Analysis: When to use it
• One might delete one of the categories
• Challenges with this approach:
  • Losing information, data, and power
  • Potentially systematically biasing your sample
Multinomial Logistic Regression
• Another model, which considers the full form of the outcome, is called the ‘multinomial’ or ‘polytomous’ logistic model because the outcome is no longer assumed to be BINOMIAL but rather MULTINOMIAL
• Powerful
• Slightly more complicated interpretation because you are no longer comparing two outcomes (but this isn’t a reason not to use it)
Types of multinomial outcomes
1. Nominal
• The outcomes do not have an order, i.e., discrete choice
  • E.g. the outcome is a particular strain of influenza (e.g. H3N2, Flu B, H1N1), with no numeric meaning
• A nominal distribution is also assumed when the outcome may have an order but this order is not easily captured
  • E.g. outcomes for asthma: primary care physician visit, hospitalization, death
  • Even though these range from least to most severe, it may not be appropriate to treat them as a scale of increasing severity
• Referred to as the “generalized logit model”
Types of multinomial outcomes
2. Ordinal
• When the categories are ordered
• E.g. disease scales, intensity (low, medium, high)
• Referred to as the “proportional odds model”
• If the response is ordinal, using this information can:
  • Result in a simpler, more parsimonious model
  • Increase power to detect associations
Ordinal logistic regression
logit[P(Y ≤ j | x)] = α_j + βx, j = 1, …, J − 1
• β describes the effect of the covariate x on the log-odds of a response in category j or below
• Implies that the curve for each cumulative category is identical but shifted; the shift is determined by the intercept α_j
• The baseline category is either the highest or the lowest
Worked example (J = 3): π1 = 0.3, π2 = 0.4, π3 = 0.3
logit[P(Y ≤ 1)] = log[π1/(π2 + π3)] = log[0.3/(0.4 + 0.3)]
logit[P(Y ≤ 2)] = log[(π1 + π2)/π3] = log[(0.3 + 0.4)/0.3]
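A short Python sketch reproducing the arithmetic above: it accumulates the category probabilities and takes the natural-log odds of each cumulative probability. The function name `cumulative_logits` is illustrative only.

```python
import math

# Category probabilities from the worked example (J = 3 ordered categories)
probs = [0.3, 0.4, 0.3]


def cumulative_logits(p):
    """Return logit[P(Y <= j)] for j = 1, ..., J-1, using natural logs."""
    logits = []
    cum = 0.0
    for pj in p[:-1]:  # P(Y <= J) = 1 has no finite logit, so stop at J-1
        cum += pj
        logits.append(math.log(cum / (1.0 - cum)))
    return logits


print([round(v, 3) for v in cumulative_logits(probs)])  # [-0.847, 0.847]
```

The two cumulative logits are log(3/7) and log(7/3); they are mirror images here only because π1 = π3.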
Proportional Odds Assumption
• β applies to any given category and to each cumulative probability
• exp(β) is the odds ratio for x on increasing or decreasing outcome categories
• This is known as the PROPORTIONAL ODDS ASSUMPTION: the βs are independent of j
• If this assumption holds, we can use ONE coefficient to study shifts between any of the categories of the dependent variable
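A minimal Python sketch of what the assumption means, assuming the parametrization logit[P(Y ≤ j | x)] = α_j + βx (some software and textbooks write α_j − βx instead, which flips the sign of β). The intercepts and slope are hypothetical values chosen for illustration. The sketch derives category probabilities by differencing cumulative probabilities, then checks that the cumulative odds ratio comparing x = 1 with x = 0 equals exp(β) at every cut point.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def po_category_probs(alphas, beta, x):
    """Category probabilities under logit[P(Y <= j | x)] = alpha_j + beta * x.

    alphas must be increasing so the cumulative probabilities are ordered.
    """
    cum = [expit(a + beta * x) for a in alphas] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]


def cumulative_odds(p):
    """Odds of P(Y <= j) at each cut point j = 1, ..., J-1."""
    out, c = [], 0.0
    for pj in p[:-1]:
        c += pj
        out.append(c / (1.0 - c))
    return out


# Hypothetical parameters (not estimates from any study): 3 categories, 2 cut points
alphas, beta = [-0.85, 0.85], 0.5
odds_x0 = cumulative_odds(po_category_probs(alphas, beta, x=0))
odds_x1 = cumulative_odds(po_category_probs(alphas, beta, x=1))
ratios = [o1 / o0 for o0, o1 in zip(odds_x0, odds_x1)]
print([round(r, 4) for r in ratios], round(math.exp(beta), 4))  # [1.6487, 1.6487] 1.6487
```

A single β therefore summarizes the shift at every cut point; if the ratios differed materially across cut points, the assumption would not hold.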
Example
Merani, Abdulla, Kwong, Rosella, et al. Increasing tuition fees in a country with two different models of medical education. Medical Education 2010;44:577–586.
Example
• In 2001, the odds of Quebec students reporting increasing levels of financial stress were 40% lower than those of students in the rest of Canada
• “Although the overall rates of financial stress changed only slightly between 2001 and 2007, students in Quebec were much less likely to report financial stress than students outside Quebec”
Testing the proportional odds assumption
• Can use graphical methods as described in Harrell*
• Can test in statistical software, most commonly using a score test, which tends to be on the conservative side
* Harrell, Jr., F. E. (2001). Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. Springer-Verlag, New York.
What to do if the Proportional Odds Test Fails
• Collapse two or more levels, particularly if some of the levels have small N
• Estimate separate models using dichotomizations to see how different they are
• Examine graphically
• Run a multinomial nominal logit model
• Use the partial proportional odds model (available in SAS through PROC GENMOD; advanced)
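For the simplest case of one binary covariate, "estimate separate models using dichotomizations" can be sketched without a modelling library: a logistic regression on a single binary predictor reproduces the 2 × 2 cross-product odds ratio, so the odds ratio at each cut point can be read directly from a table. The counts below are hypothetical. Similar odds ratios across cut points are consistent with proportional odds; this is a descriptive check only, not a substitute for the score test or graphical methods.

```python
import math

# Hypothetical counts for an ordinal outcome (categories 1 < 2 < 3) by a binary exposure
counts = {
    "unexposed": [30, 40, 30],
    "exposed": [20, 40, 40],
}


def cumulative_log_odds_ratios(unexp, exp_):
    """Log odds ratio of P(Y <= j), exposed vs unexposed, at each cut point j."""
    lors = []
    for j in range(1, len(unexp)):
        a, b = sum(exp_[:j]), sum(exp_[j:])    # exposed: Y <= j, Y > j
        c, d = sum(unexp[:j]), sum(unexp[j:])  # unexposed: Y <= j, Y > j
        lors.append(math.log((a * d) / (b * c)))
    return lors


lors = cumulative_log_odds_ratios(counts["unexposed"], counts["exposed"])
print([round(math.exp(v), 3) for v in lors])  # [0.583, 0.643]
```

Here the two dichotomizations give odds ratios of about 0.58 and 0.64, which would not suggest a serious departure from proportional odds.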
Logit models for nominal responses
log(π_j/π_J) = α_j + β_j x, j = 1, …, J − 1 (category J is the baseline)
• Note that β has a j subscript, meaning that each comparison has a different estimate, unlike under the proportional odds assumption
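The baseline-category model implies a probability for every category, with the baseline's linear predictor fixed at zero. This Python sketch (hypothetical parameter values, J = 3, last category as baseline) computes those probabilities and shows that exp(β_j) is the odds ratio for category j versus the baseline; because each comparison has its own β_j, there is no single common odds ratio.

```python
import math


def baseline_category_probs(alphas, betas, x):
    """Probabilities under log(pi_j / pi_J) = alpha_j + beta_j * x, j = 1..J-1.

    The baseline category J gets a linear predictor of 0.
    """
    etas = [a + b * x for a, b in zip(alphas, betas)] + [0.0]
    denom = sum(math.exp(e) for e in etas)
    return [math.exp(e) / denom for e in etas]


# Hypothetical parameters for J = 3 (categories 1 and 2 each compared with baseline 3)
alphas = [0.2, -0.5]
betas = [0.7, 1.1]  # a separate beta_j for each comparison
p0 = baseline_category_probs(alphas, betas, x=0)
p1 = baseline_category_probs(alphas, betas, x=1)

# Odds ratio for category j versus the baseline, x = 1 vs x = 0, next to exp(beta_j)
for j in range(2):
    or_j = (p1[j] / p1[2]) / (p0[j] / p0[2])
    print(j + 1, round(or_j, 4), round(math.exp(betas[j]), 4))
```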
Logit models for nominal responses
• Multinomial regression fits the above model simultaneously for all outcome categories
• The choice of baseline category is arbitrary
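Because the baseline is arbitrary, odds ratios for any other pair of categories can be recovered from the fitted ones. For the special case of a single binary exposure (a saturated model), the maximum likelihood baseline-category odds ratios equal the 2 × 2 cross-product ratios, so the point can be shown with counts alone. The counts and category labels below are hypothetical.

```python
# Hypothetical counts by outcome category and a binary exposure
counts = {
    "A": {"exposed": 20, "unexposed": 30},
    "B": {"exposed": 15, "unexposed": 25},
    "C": {"exposed": 10, "unexposed": 40},
}


def crude_or(counts, cat, ref):
    """Cross-product odds ratio for `cat` versus `ref`, exposed vs unexposed.

    With a single binary exposure this equals the maximum likelihood estimate of
    exp(beta_cat) from a baseline-category logit model with `ref` as the baseline.
    """
    return (counts[cat]["exposed"] * counts[ref]["unexposed"]) / (
        counts[cat]["unexposed"] * counts[ref]["exposed"]
    )


or_a_vs_c = crude_or(counts, "A", "C")
or_b_vs_c = crude_or(counts, "B", "C")
# Re-basing: the A-versus-B odds ratio follows from the two C-baseline odds ratios
or_a_vs_b = crude_or(counts, "A", "B")
print(round(or_a_vs_c, 4), round(or_b_vs_c, 4), round(or_a_vs_b, 4), round(or_a_vs_c / or_b_vs_c, 4))
# 2.6667 2.4 1.1111 1.1111
```

On the log scale this is β_A − β_B, which is why switching the baseline changes how results are presented but not the fitted model.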
Outcome: Willingness to vaccinate children against HPV
• Willing to vaccinate only if the vaccine is free
• Willing to vaccinate even if the vaccine is not free
• Not willing to vaccinate/don’t know
• Ran a multinomial logistic regression
Interpretation:
• Prior awareness of HPV was associated with greater willingness to vaccinate. This held both for being willing to vaccinate only if the vaccine is free (OR: 1.42; 95% CI: 1.21–1.66) and for being willing to vaccinate even if the vaccine is not free (OR: 1.96; 95% CI: 1.75–2.20), compared with the not willing/don’t know group.
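Published odds ratios are often reported only with 95% CIs. Assuming the CI was computed on the log-odds scale as β ± 1.96 × SE (a Wald interval), the coefficient, its standard error and an approximate p-value can be back-calculated, as in this Python sketch using the "willing only if the vaccine is free" estimate above. The function name is hypothetical, and because the published values are rounded, the results are approximate.

```python
import math


def beta_se_from_or_ci(or_, lower, upper, z=1.959964):
    """Back-calculate log-OR, its standard error and a two-sided Wald p-value
    from an odds ratio and a 95% CI assumed symmetric on the log scale."""
    beta = math.log(or_)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    wald_z = beta / se
    p = math.erfc(abs(wald_z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal
    return beta, se, p


# OR 1.42 (95% CI 1.21-1.66): willing only if free vs not willing/don't know
beta, se, p = beta_se_from_or_ci(1.42, 1.21, 1.66)
print(round(beta, 3), round(se, 3), p < 0.001)  # 0.351 0.081 True
```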
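SAS code
Ordinal (outcome: low, med, high, coded numerically, e.g. 1, 2, 3, so the sort order matches the intended order):
proc logistic data = your_dataset;
  /* With more than two response levels and no LINK= option, PROC LOGISTIC fits a cumulative logit (proportional odds) model and prints a score test of the proportional odds assumption */
  model outcome = X1 X2;
run;
Multinomial (outcome: A, B, C):
proc logistic data = your_dataset;
  /* ref = "A" makes A the baseline category; link = glogit requests the generalized (baseline-category) logit model */
  model outcome (ref = "A") = X1 X2 / link = glogit;
run;
NB: Make sure you understand the reference category, as different packages have different default settings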
Summary
• Multinomial logistic regression can allow you to extract more information from your data and avoid the loss of information due to collapsing categories
• Interpretation requires careful consideration when comparing multiple categories
• Like any regression model, multinomial regression has assumptions, which should be carefully scrutinized
Source: http://www.sagestossel.com
What is Salmonella Enteritidis (SE)?
• Gram-negative, rod-shaped, facultatively anaerobic bacterium
• Serovar belonging to S. enterica subspecies enterica
• Phage typing is used to discriminate between clusters of SE (Ward et al., 1987)
• Symptoms include nausea, vomiting, abdominal pain, diarrhea and fever
• It is estimated that for every reported case of salmonellosis in Canada, there are approximately 13–37 cases in the population (Thomas et al., 2006)
Source: http://salmonellablog.com
Figure 1. Number of confirmed cases of SE in Ontario by month, 2007–2010
[Figure: confirmed cases of SE in Ontario (y-axis) by episode month, January to December (x-axis), one series each for 2007, 2008, 2009 and 2010]
Data source: Ontario Ministry of Health and Long-Term Care, integrated Public Health Information System (iPHIS) database, extracted by Public Health Ontario [2013/02/13]
Provincial SE Investigation
• Partnership between PHO (then OAHPP) and the Ministry of Health and Long-Term Care
• Support from public health units across the province
• Centralized interviewing
• Questionnaire based on previous SE outbreaks and relevant literature
  • Symptoms, travel, animal exposures, and 3-day food history
  • Eggs, chicken, dairy, nuts, certain fruits and vegetables
• Multiple phage types (PTs), multiple hypotheses
[Timeline, July 2010 to August 2011: hypothesis-generating stage, followed by the case-control study]
Hypothesis-Generating Stage – Objectives
• To examine the associations between various risk factors and infection with different domestic PTs of SE
• To inform the development of a case-control study that will investigate risk factors for infection with domestic PTs of SE
Methods – Study Population
Table 1. Status of follow-up procedure for SE cases identified from the OAHPP PHL line list between July 11 and December 1, 2010
Interview status | Total (%)
Interviewed | 238 (65)
Lost to follow-up | 97 (27)
Refused | 31 (8)
Total | 366
Source: Ontario SE investigation
• SE cases identified daily from the Public Health Ontario Laboratory line list
  • Age, sex, city of residence (used to impute health unit)
• SE investigation dataset contains interview data for 238 individuals
  • Collected between July 12 and December 10, 2010
Methods – Inclusion and Exclusion Criteria
• Case definition: laboratory confirmation of SE infection with clinically compatible signs and symptoms (i.e., headache, diarrhea, abdominal pain, nausea, fever) in an Ontario resident with a lab received date on or after July 10, 2010
• Inclusion criteria: interviewed case (access to a telephone, ability to communicate in English), PT available
• Exclusion criteria: travel outside of Canada/U.S. in the 3 days before illness onset; potential secondary cases (i.e., living in the same household as someone who was ill with similar symptoms in the week before illness onset)
Table 2. Reported travel history for interviewed SE cases stratified by phage type (secondary cases excluded), 2010
Phage type | Non-travel, n (%) | International travel, n (%) | Total
PT 1 | 2 (9) | 21 (91) | 23
PT 2 | 1 (100) | 0 (0) | 1
PT 3 | 0 (0) | 2 (100) | 2
PT 4 | 1 (25) | 3 (75) | 4
PT 5b | 2 (14) | 12 (86) | 14
PT 6a | 2 (100) | 0 (0) | 2
PT 6c | 0 (0) | 1 (100) | 1
PT 7a | 0 (0) | 2 (100) | 2
PT 8 | 52 (87) | 8 (13) | 60
PT 13 | 14 (88) | 2 (12) | 16
PT 13a | 32 (86) | 5 (14) | 37
PT 14b | 1 (50) | 1 (50) | 2
PT 15a | 0 (0) | 2 (100) | 2
PT 21 | 1 (25) | 3 (75) | 4
PT 22 | 4 (80) | 1 (20) | 5
PT 23 | 1 (100) | 0 (0) | 1
PT 27 | 0 (0) | 1 (100) | 1
PT 51 | 5 (83) | 1 (17) | 6
PT 55 | 0 (0) | 1 (100) | 1
PT 256 | 0 (0) | 1 (100) | 1
PT 339 | 1 (100) | 0 (0) | 1
Atypical PT | 2 (66) | 1 (33) | 3
Untypable PT | 2 (66) | 1 (33) | 3
Unknown | 2 (66) | 1 (33) | 3
Total | 125 (64) | 70 (36) | 195
123 individuals meeting case definition
Source: Ontario SE investigation
Methods – Measurement of Variables
• Exposure variables obtained from interview data
• Time from date of symptom onset to date of interview (average = 18.4 days)
  • Do you remember what you ate on Sunday, February 3?
• Example questions:
  • During this 3-day period, did you eat any cooked eggs (either at home or at a restaurant)? Yes / No / Don’t know
  • What type of nuts did you eat? (CHECK ALL THAT APPLY) Peanuts / Almonds / Cashews / Pistachios / Pecans / Other, please specify: _______________
Table 3. Food items reported by SE cases with greater than 10% frequency during the hypothesis-generating stage, 2010
Food item | OSEI, n (%) | CDC Food Atlas (%) | Nesbitt et al. (%)
Cooked eggs | 51 (41) | 66.5 | 82.2
Runny eggs | 18 (15) | 17.7 | 42.0
Firm eggs | 40 (33) | - | -
Eggs at home | 45 (37) | - | -
Eggs at restaurant | 15 (12) | - | -
Uncooked eggs | 12 (10) | 7.3 | 5.9
Any cheese | 91 (74) | 74.0 | 66.7
Processed cheese | 45 (37) | - | -
Dried or powdered cheese | 13 (11) | - | -
Hard cheese | 72 (59) | 73.7 | -
Soft cheese | 16 (13) | 12.6 | -
Pasteurized milk | 84 (68) | 82.0 | 87.0
Peanut butter | 37 (30) | 54.1 | -
Nuts | 16 (13) | - | -
Almonds | 12 (10) | - | -
Poultry | 94 (76) | 84.0 | 91.4
Poultry at home | 67 (54) | - | -
Poultry at restaurant | 41 (33) | - | -
Poultry from unfrozen | 32 (26) | - | -
Poultry from frozen | 17 (14) | - | -
Processed poultry | 31 (25) | - | 19.9
Fast food poultry | 14 (11) | - | -
Deli meat (poultry) | 19 (15) | 25.2 | 19.0
Unaware poultry juice colour | 26 (21) | - | -
Prepared meal from raw poultry | 18 (15) | - | -
Carrots | 50 (41) | 59.8 | 80.5
Broccoli | 38 (31) | 29.3 | 43.4
Peppers | 37 (30) | - | 27.0
Onions | 51 (41) | 40.8 | 45.9
Lettuce | 63 (51) | 72.6 | 84.0
Spinach | 15 (12) | 16.8 | 19.9
Tomatoes | 62 (50) | 68.2 | 72.5
Mushrooms | 25 (20) | 12.7 | 36.5
Strawberries | 40 (33) | 32.8 | 31.5
Cantaloupe | 13 (11) | 27.5 | 22.8
OSEI: Ontario SE investigation
Source: Ontario SE investigation
Methods – Multinomial Logistic Regression
• Case-case comparisons leverage exposure data collected for cases (usually of different diseases) in the absence of control data
• Consistent with our hypothesis of PT-specific risk factors, case-case comparisons could be made between PTs
• The multinomial approach uses all cases in a single model, maximizing power to find associations
• Outcome (nominal): infection with PT 8, PT 13a or another PT
• Main exposures of interest: processed chicken products, items containing uncooked eggs
• Potential confounders and effect modifiers: sex, age, remaining risk factors
Methods – Exposures of Interest
Processed Chicken Products
• Risk factor for S. Heidelberg infections in Canada (Currie et al., 2005)
• Epidemiological evidence (e.g., barbecue cluster)
• Food samples positive for SE
Items Containing Uncooked Eggs
• Eggs have been implicated in past SE outbreaks
• Epidemiologic evidence (e.g., new protein shake diet)
• Biological explanation
Methods – Hosmer and Lemeshow Model Building
• Test univariate models; select variables with p-values < 0.2
• Variables meeting this criterion: cooked eggs, poultry at home, peanut butter, nuts, spinach, tomatoes, age, sex
• Run multivariate models (order based on univariate p-values); retain variables with p-values < 0.2 or that alter the effect size of the exposures of interest by more than 10% (10% rule)
• Choose and test possible interaction terms
model phagetype (ref = 'Other PT') = procpltry conunckegg age sex eatnuts spinach anytomatoe eatckeggs homepltry / link = glogit; /* link = glogit requests the nominal (baseline-category) model */
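The retention rule above can be sketched as a simple check. This is a minimal illustration, not the authors' code: it assumes the change in effect size is measured on the coefficient (log-odds) scale, relative to the model that still contains the candidate variable, as Hosmer and Lemeshow describe; some analysts apply the 10% rule to the odds ratio instead. The function name and example values are hypothetical.

```python
import math


def keep_variable(p_value, beta_full, beta_reduced, p_threshold=0.2, change_threshold=0.10):
    """Hosmer-Lemeshow-style retention rule (sketch).

    beta_full:    coefficient of the exposure of interest with the candidate variable included
    beta_reduced: the same coefficient after dropping the candidate variable
    Keep the candidate if its p-value is below p_threshold, or if dropping it changes
    the exposure coefficient by more than change_threshold (relative change).
    """
    relative_change = abs(beta_reduced - beta_full) / abs(beta_full)
    return p_value < p_threshold or relative_change > change_threshold


# Hypothetical examples: exposure OR of 4.0 with the candidate variable in the model
print(keep_variable(0.45, math.log(4.0), math.log(3.5)))  # False: about a 9.6% change
print(keep_variable(0.45, math.log(4.0), math.log(3.0)))  # True: about a 21% change (confounder)
print(keep_variable(0.15, math.log(4.0), math.log(3.9)))  # True: p < 0.2
```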
Results – Descriptive Statistics (Person)
Table 4. Descriptive statistics for selected cases from the Ontario SE investigation, July to December 2010
Item | Count (n) | Frequency (%)
Gender (n = 122)
  Male | 64 | 52.5
  Female | 58 | 47.5
Age, years (n = 123)
  0–19 | 53 | 43.1
  20–39 | 40 | 32.5
  40+ | 30 | 24.4
PT (n = 123)
  PT 8 | 52 | 42.3
  PT 13a | 32 | 26.0
  Other PTs | 39 | 31.7
Symptoms (n = 123)
  Diarrhea | 113 | 91.9
  Nausea | 57 | 46.3
  Fever | 81 | 65.9
Source: Ontario SE investigation
Results – Descriptive Statistics (Place)
Source: Ontario SE investigation
Results – Descriptive Statistics (Time)
Figure 2. Epidemic curve for cases selected from the Ontario SE Investigation (n = 110)
[Figure: confirmed cases of SE in Ontario (y-axis, 0–5) by symptom onset date (x-axis)]
Source: Ontario SE investigation
Results – Multinomial Logistic Regression
Table 5a. Unadjusted multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category
Item | No. cases in analysis | SE PT 8, n (%) | SE PT 8, OR (95% CI) | SE PT 13a, n (%) | SE PT 13a, OR (95% CI) | p-value
Processed chicken products | | | | | | 0.08
  No | 92 | 34 (37.0) | 1 | 23 (25.0) | 1 |
  Yes | 31 | 18 (58.1) | 3.95 (1.12–13.1) | 9 (29.0) | 3.28 (0.88–12.2) |
Items containing uncooked eggs | | | | | | 0.43
  No | 111 | 48 (43.2) | 1 | 25 (22.5) | 1 |
  Yes | 12 | 4 (33.3) | 0.98 (0.20–4.80) | 5 (41.7) | 2.24 (0.48–10.5) |
Source: Ontario SE investigation
Results – Multinomial Logistic Regression
Table 5b. Age- and sex-adjusted multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category
Item | No. cases in analysis | SE PT 8, n (%) | SE PT 8, OR (95% CI) | SE PT 13a, n (%) | SE PT 13a, OR (95% CI) | p-value
Processed chicken products | | | | | | 0.11
  No | 92 | 34 (37.0) | 1 | 23 (25.0) | 1 |
  Yes | 31 | 18 (58.1) | 3.78 (1.09–13.1) | 9 (29.0) | 2.77 (0.71–10.8) |
Items containing uncooked eggs | | | | | | 0.48
  No | 111 | 48 (43.2) | 1 | 25 (22.5) | 1 |
  Yes | 12 | 4 (33.3) | 0.99 (0.20–4.87) | 5 (41.7) | 2.15 (0.45–10.2) |
Source: Ontario SE investigation
Results – Multinomial Logistic Regression
Table 5c. Full model* multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category
Item | No. cases in analysis | SE PT 8, n (%) | SE PT 8, OR (95% CI) | SE PT 13a, n (%) | SE PT 13a, OR (95% CI) | p-value
Processed chicken products | | | | | | 0.10
  No | 92 | 34 (37.0) | 1 | 23 (25.0) | 1 |
  Yes | 31 | 18 (58.1) | 4.91 (1.13–21.4) | 9 (29.0) | 2.89 (0.58–14.5) |
Items containing uncooked eggs | | | | | | 0.43
  No | 111 | 48 (43.2) | 1 | 25 (22.5) | 1 |
  Yes | 12 | 4 (33.3) | 1.48 (0.23–9.41) | 5 (41.7) | 3.25 (0.48–22.1) |
* Full model is adjusted for age, sex, and consumption of cooked eggs, poultry at home, nuts, spinach, tomatoes
Source: Ontario SE investigation
Table 6. Full model* multinomial regression analysis of SE PT 8 and SE PT 13a, with all other phage types as the reference category
Item | No. cases in analysis | SE PT 8, n (%) | SE PT 8, OR (95% CI) | SE PT 13a, n (%) | SE PT 13a, OR (95% CI) | p-value
Cooked eggs | | | | | | 0.28
  No | 72 | 29 (40.3) | 1 | 22 (30.6) | 1 |
  Yes | 51 | 23 (45.1) | 0.92 (0.33–2.60) | 10 (19.6) | 0.40 (0.12–1.39) |
Poultry at home | | | | | | 0.67
  No | 56 | 19 (33.9) | 1 | 13 (23.2) | 1 |
  Yes | 67 | 33 (49.3) | 1.57 (0.59–4.19) | 19 (28.4) | 1.35 (0.42–4.29) |
Nuts | | | | | | 0.30
  No | 107 | 45 (42.1) | 1 | 29 (27.1) | 1 |
  Yes | 16 | 6 (37.5) | 0.55 (0.13–2.23) | 2 (12.5) | 0.21 (0.03–1.55) |
Spinach | | | | | | 0.16
  No | 108 | 47 (43.5) | 1 | 25 (23.1) | 1 |
  Yes | 15 | 5 (33.3) | 1.15 (0.21–6.43) | 7 (46.7) | 3.95 (0.70–22.3) |
Tomatoes | | | | | | 0.07
  No | 61 | 30 (49.2) | 1 | 12 (19.7) | 1 |
  Yes | 62 | 22 (35.5) | 0.53 (0.19–1.52) | 20 (32.3) | 1.84 (0.55–6.14) |
* Full model also contains sex, age and consumption of processed chicken and items containing uncooked eggs
Source: Ontario SE investigation
Results – Interpretation
• Domestic cases of SE PT 8 had 4.91 (95% CI 1.13–21.4) times the odds of having consumed processed chicken products compared with domestic cases of other phage types, when controlling for sex, age and consumption of cooked eggs, poultry at home, nuts, spinach, and tomatoes.
Discussion – Strengths and Limitations
• Limitation – Small sample size (especially when stratified by PT)
• Limitation – Recall issues
• Limitation – Attributable proportion explained
• Limitation – Validity of comparison group (aggregation of other PTs)
• Strength – Centralized interviewing
  • Eliminates inter-health unit differences in follow-up procedure
  • Standard questionnaire, limited number of interviewers
• Strength – Multinomial approach, proof of principle
  • In agreement with the notion of PT-specific hypotheses
  • Uses the greatest number of SE cases, increasing power for evaluating associations of interest compared with binomial logistic regression
Discussion – Implications
• Results support laboratory and epidemiologic evidence implicating processed chicken as a risk factor for infection with domestic PTs of SE
  • Nine processed chicken samples tested positive during the hypothesis-generating stage (four PT 8, three PT 13a, one PT 19, one PT 22)
  • Anecdotal reports from cases indicated a lack of consumer knowledge regarding the uncooked nature of some products
• Results fed into a provincial case-control study that ran from January to August 2011
  • Questionnaire design, food sampling, sample size calculations
• In the case-control study, after adjustment for confounders, the following were significantly associated with SE infection:
  • Consuming any poultry – aOR 2.24, 95% CI 1.31–3.83
  • Processed chicken – aOR 3.32, 95% CI 1.26–8.7
  • Not washing hands following handling of raw eggs – OR 2.82, 95% CI 1.48–5.37
Discussion – Looking Ahead • If any meat product is not a ready-to-eat meat product but has the appearance of, or could be mistaken for, a ready-to-eat meat product, the meat product shall bear the following information on its label: (a) the words “must be cooked”, “raw product”, “uncooked” or any equivalent words or word as part of the common name of the product to indicate that the product requires cooking before consumption; and (b) comprehensive cooking instructions such as an internal temperature-time relationship that, if followed, will result in a ready-to-eat meat product (SOR/90-288).
Laura’s References
• Agresti A. Categorical Data Analysis. New Jersey: John Wiley; 2012.
• Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer-Verlag; 2001.
• Armstrong BG, Sloan M. Ordinal models for epidemiologic data. Am J Epidemiol 1989;129:191–204.
• Ananth CV, Kleinbaum DG. Regression models for ordinal responses: a review of methods and applications. Int J Epidemiol 1997;26:1323–1333.
• Preisser JS, Koch GG. Categorical data analysis in public health. Annu Rev Public Health 1997;18:51–82.
Ryan’s References
• Ward LR, de Sa JD, Rowe B. A phage-typing scheme for Salmonella enteritidis. Epidemiol Infect 1987;99(2):291–294.
• Thomas MK, Majowicz SE, Sockett PN, et al. Estimated numbers of community cases of illness due to Salmonella, Campylobacter and verotoxigenic Escherichia coli: pathogen-specific community rates. Can J Infect Dis Med Microbiol 2006 Jul–Aug;17(4):229–234.
• Currie A, MacDougal M, Aramini J, et al. Frozen chicken nuggets and strips and eggs are leading risk factors for Salmonella Heidelberg infections in Canada. Epidemiol Infect 2005;133:809–816.
• Tighe MK, Savage R, Vrbova L, et al. The epidemiology of travel-related Salmonella Enteritidis in Ontario, Canada, 2010–2011. BMC Public Health 2012;12:310.
• Nesbitt A, Majowicz S, Finley R, et al. Food consumption patterns in the Waterloo Region, Ontario, Canada: a cross-sectional telephone survey. BMC Public Health 2008;8:370.
• Centers for Disease Control and Prevention (CDC). Foodborne Active Surveillance Network (FoodNet) Population Survey Atlas of Exposures. Atlanta, Georgia: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention; 2006–2007.
• Varga C, Middleton D, Walton R, et al. Evaluating risk factors for endemic human Salmonella Enteritidis infections with different phage types in Ontario, Canada using multinomial logistic regression and a case-case study approach. BMC Public Health 2012;12:866.
Thank You!
Public Health Ontario: Dr. Dean Middleton, Rachel Savage, Badal Dhar, Steven Johnson, Caitlin Johnson, Duri Song, Diana Yung
Canadian Field Epidemiology Program: Marianna Ofner-Agostini
Ontario Ministry of Agriculture, Food and Rural Affairs: Dr. Csaba Varga
Health unit case investigators and epidemiologists