PubHlth640 - Spring 2012
Intermediate Biostatistics
Unit 5 – Logistic Regression Practice Problems SOLUTIONS Version SAS
Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004.

Exercises #1-#3 use a data set provided by Afifi, Clark and May (2004). The data come from a longitudinal study of depression. The purpose of the study was to obtain estimates of the prevalence and incidence of depression and to explore its risk factors. The study variables were of several types: demographics, life events, stressors, physical health, health services utilization, medication use, lifestyle, and social support. These exercises use just a subset of these data. I have provided them to you in three formats: Stata (depress.dta), SAS (depress.sas7bdat), and Excel (depress.xls).
http://people.umass.edu/~biep640w/webpages/assignments.html
Consider the following three variables.

Variable   Codings                               Format in SAS   Label in Stata
DRINK      1 = yes, 2 = no                       DRINK           DRINK
SEX        1 = male, 2 = female                  SEX             SEX
CASES      0 = Normal, 1 = Case of Depression    CASES           CASES
1. Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004, Problem 12.9, page 330.
Using Stata, SAS, or Excel, load the depression data set and fill in the following table:

          Regular Drinker
Sex       Yes    No    Total
Female    139    44    183
Male       95    16    111
Total     234    60    294
What are the odds that a woman is a regular drinker?
139 / 44 = 3.2

What are the odds that a man is a regular drinker?
95 / 16 = 5.9

What is the odds ratio? That is, compared to a man, what is the relative odds (odds ratio) that a woman is a regular drinker?
OR = [odds for woman] / [odds for man] = 3.2 / 5.9 = 0.54
(This uses the rounded odds. The unrounded cross-product, [(139)(16)] / [(44)(95)], gives 0.53.)
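The hand calculation above can be cross-checked from the four cell counts alone. The following is a minimal sketch, not part of the original solution: the data set name drink2x2 and the variable names sex, drinker, and count are made up for illustration. ORDER=DATA is used so that Female is the first row and Yes is the first column, which makes the reported odds ratio the odds for women relative to men.

```sas
/* Hypothetical data set built from the cell counts in the problem #1 table */
data drink2x2;
   input sex $ drinker $ count;
   datalines;
Female Yes 139
Female No 44
Male Yes 95
Male No 16
;
run;

/* WEIGHT treats each record as 'count' observations.          */
/* ORDER=DATA keeps Female/Yes as the first row and column.    */
/* RELRISK requests the odds ratio for the 2x2 table.          */
proc freq data=drink2x2 order=data;
   weight count;
   tables sex*drinker / relrisk nopercent nocol;
run;
```

Because PROC FREQ works from the unrounded counts, the odds ratio it reports should be close to 0.53 rather than the 0.54 obtained from the rounded odds.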
Sol_logistic_sas.doc
2. Repeat the tabulation that you produced for problem #1 two times, one for persons who are depressed and the other for persons who are not depressed.
Among Persons Who are Depressed

          Regular Drinker
Sex       Yes    No    Total
Female     33     7     40
Male        8     2     10
Total      41     9     50

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(33)(2)] / [(7)(8)] = 1.18
Among Persons Who are NOT Depressed

          Regular Drinker
Sex       Yes    No    Total
Female    106    37    143
Male       87    14    101
Total     193    51    244
OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(106)(14)] / [(37)(87)] = 0.46

3. Fit a logistic regression model using these variables. Use DRINK as the dependent variable and CASES and SEX as independent variables. Also include as an independent variable the appropriate interaction term.

Fitted Model:
logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [ CASES ] - 0.7743 [ FEMALE ] + 0.9386 [ FEM_CASE ]
where
CASES = 1 if depressed; 0 otherwise
FEMALE = 1 if female; 0 otherwise
FEM_CASE = (CASES) * (FEMALE)

Is the interaction term in your model significant?
No. $\hat{\beta}_3 = 0.9386$, $SE(\hat{\beta}_3) = 0.96$, and p-value = .33

How does your answer to problem #3 compare to your answer to problem #2? Comment.
The answers match.
Among Depressed: OR = 1.18
Among NON-depressed: OR = 0.46
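The significance check on the interaction term in problem #3 can be reproduced by hand from the estimate and its standard error. This is a small sketch, assuming the usual Wald chi-square on 1 degree of freedom; the variable names b3, se_b3, wald, and p are chosen here for illustration only.

```sas
data _null_;
   b3    = 0.9386;                  /* interaction estimate (fem_case)       */
   se_b3 = 0.9579;                  /* its standard error from the listing   */
   wald  = (b3 / se_b3)**2;         /* Wald chi-square statistic, 1 df       */
   p     = 1 - probchi(wald, 1);    /* upper-tail chi-square probability     */
   put wald=8.4 p=8.4;              /* written to the SAS log                */
run;
```

The values written to the log should be roughly 0.96 for the Wald chi-square and 0.33 for the p-value, in line with the figures quoted above. Small differences in the last digit are expected because the inputs are already rounded to four decimals.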
logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [ CASES] - 0.7743 [ FEMALE ] + 0.9386 [ FEM_CASE ]
Among Depressed

                 CASES   FEMALE   FEM_CASE
“1” = Female       1        1        1
“0” = Male         1        0        0

logit [ female ] = 1.8269 - 0.4406 - 0.7743 + 0.9386 = 1.5506
logit [ male ] = 1.8269 - 0.4406 = 1.3863
logit [ female ] - logit [ male ] = 1.5506 - 1.3863 = +0.1643
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { +0.1643 } = 1.1786
Among NON Depressed

                 CASES   FEMALE   FEM_CASE
“1” = Female       0        1        0
“0” = Male         0        0        0

logit [ female ] = 1.8269 - 0.7743 = 1.0526
logit [ male ] = 1.8269
logit [ female ] - logit [ male ] = 1.0526 - 1.8269 = -0.7743
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { -0.7743 } = 0.4610
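The two stratum-specific calculations above are instances of one general expression. The derivation below is a sketch; labelling the intercept and the CASES and FEMALE coefficients as $\beta_0$, $\beta_1$, and $\beta_2$ is an assumption made here for notation (the handout names only the interaction coefficient, $\beta_3$).

```latex
\begin{aligned}
\operatorname{logit}[\Pr(\text{drinker}=\text{yes})]
  &= \beta_0 + \beta_1\,\text{CASES} + \beta_2\,\text{FEMALE} + \beta_3\,\text{FEM\_CASE} \\
\operatorname{logit}[\text{female}] - \operatorname{logit}[\text{male}]
  &= \beta_2 + \beta_3\,\text{CASES} \\
\text{OR}_{\text{women vs. men}}
  &= \exp\left(\beta_2 + \beta_3\,\text{CASES}\right) \\
\text{CASES}=1:\quad \text{OR}
  &= \exp(-0.7743 + 0.9386) = \exp(0.1643) = 1.1786 \\
\text{CASES}=0:\quad \text{OR}
  &= \exp(-0.7743) = 0.4610
\end{aligned}
```

The intercept and the CASES main effect cancel in the difference, which is why only the FEMALE and FEM_CASE coefficients enter the odds ratio.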
For SAS Users

*_______________________________________________
*
* Tell SAS location of data
* (you will have to edit this to be your path)
*________________________________________________;
libname class "Z:\bigelow\teaching\web640\data sets";

*_________________________________________________
*
* Read data of interest into a temporary copy
*__________________________________________________;
data temp(keep=drink sex cases);
   set class.depress;
run;

*______________________________________________________________
*
* Create indicators as needed and format values for readability
*_____________________________________________________________;
proc format;
   value drinkf 0='0=nondrinker' 1='1=drinker';
   value casef  0='0=normal'     1='1=depressed';
   value sexf   0='0=male'       1='1=female';
run;

data temp(drop=drink sex);
   set temp;
   drink01=.;
   if drink=1 then drink01=1;
   else if drink=2 then drink01=0;
   format drink01 drinkf.;
   female=.;
   if sex=2 then female=1;
   else if sex=1 then female=0;
   format female sexf.;
   format cases casef.;
   fem_case = female*cases;
run;

*___________________________________________________________
*
* Descriptives
*_________________________________________________________;
proc freq data=temp;
   tables drink01 female cases fem_case;
run;

*______________________________________________
*
* Logistic regression model
* NOTE - SAS chooses as the event the lower value
* Use option DESCENDING so the value=1 is the event
* of interest
*_____________________________________________;
proc logistic data=temp descending;
   model drink01 = cases female fem_case;
run;
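The stratum-specific odds ratios worked out by hand on the previous page can also be requested from PROC LOGISTIC directly. The following is a hedged sketch rather than part of the original program: it assumes the data set temp and the variables drink01, cases, female, and fem_case created by the program above, and it uses CONTRAST statements with the ESTIMATE=EXP option to exponentiate the relevant combinations of coefficients. The contrast labels are made up for illustration.

```sas
proc logistic data=temp descending;
   model drink01 = cases female fem_case;

   /* Women vs. men among the depressed:                        */
   /* exp( coefficient of female + coefficient of fem_case )    */
   contrast 'Female vs male, depressed'
            female 1 fem_case 1 / estimate=exp;

   /* Women vs. men among the non-depressed:                    */
   /* exp( coefficient of female )                              */
   contrast 'Female vs male, not depressed'
            female 1 / estimate=exp;
run;
```

If the model is fitted as above, the exponentiated contrasts should agree with the hand-computed values of about 1.18 and 0.46, and the output also supplies confidence limits that the hand calculation does not.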
Partial listing of Output

Response Profile

Ordered Value   drink01         Total Frequency
1               1=drinker       234
2               0=nondrinker     60

Probability modeled is drink01='1=drinker'.

The LOGISTIC Procedure
Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1     1.8269           0.2880           40.2469
CASES        1    -0.4406           0.8414            0.2742
female       1    -0.7743           0.3455            5.0223
fem_case     1     0.9386           0.9579            0.9602