Unit 5 Logistic Regression Practice Problems. SOLUTIONS Version STATA

Author: Flora Lee
PubHlth640 - Spring 2012

Intermediate Biostatistics


Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004.

Exercises #1-#3 use a data set provided by Afifi, Clark and May (2004). The data come from a longitudinal study of depression. The purpose of the study was to obtain estimates of the prevalence and incidence of depression and to explore its risk factors. The study variables were of several types: demographics, life events, stressors, physical health, health services utilization, medication use, lifestyle, and social support. These exercises use just a subset of these data. I have provided them to you in three formats: Stata (depress.dta), SAS (depress.sas7bdat), and Excel (depress.xls).
http://people.umass.edu/~biep640w/webpages/assignments.html

Consider the following three variables.

Variable   Codings                               Format in SAS   Label in STATA
DRINK      1 = yes, 2 = no                       DRINK           DRINK
SEX        1 = male, 2 = female                  SEX             SEX
CASES      0 = Normal, 1 = Case of Depression    CASES           CASES

1. Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004, Problem 12.9, page 330.

Using Stata, SAS, or Excel, load the depression data set and fill in the following table:

             Regular Drinker
Sex          Yes     No      Total
Female       139     44      183
Male          95     16      111
Total        234     60      294

What are the odds that a woman is a regular drinker? 139 / 44 = 3.2

What are the odds that a man is a regular drinker? 95 / 16 = 5.9

What is the odds ratio? That is, compared to a man, what is the relative odds (odds ratio) that a woman is a regular drinker?
OR = [odds for woman] / [odds for man] = (139/44) / (95/16) = 0.53
(Dividing the rounded odds, 3.2/5.9, gives 0.54; the unrounded value is 0.53.)
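The arithmetic above can be checked directly in Stata. This is only a minimal sketch that uses the counts from the table in problem #1; it needs no data set, and the display formats are a choice made here for illustration.

```stata
* Odds of being a regular drinker, using the counts from the table in problem #1
display "Odds (women):  " %5.2f 139/44
display "Odds (men):    " %5.2f 95/16
* Odds ratio, women compared to men, computed from the unrounded odds
display "Odds ratio:    " %5.2f (139/44)/(95/16)
* Equivalent cross-product form of the same odds ratio
display "Cross-product: " %5.2f (139*16)/(44*95)
```

Both forms of the odds ratio should agree, at about 0.53, because they are algebraically identical.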

Sol_logistic_stata.doc


2. Repeat the tabulation that you produced for problem #1 two times, one for persons who are depressed and the other for persons who are not depressed.

Among Persons Who are Depressed

             Regular Drinker
Sex          Yes     No      Total
Female        33      7       40
Male           8      2       10
Total         41      9       50

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(33)(2)] / [(7)(8)] = 1.18

Among Persons Who are NOT Depressed

             Regular Drinker
Sex          Yes     No      Total
Female       106     37      143
Male          87     14      101
Total        193     51      244

OR (Relative odds, compared to a man, that a woman is a regular drinker):
OR = [(106)(14)] / [(37)(87)] = 0.46
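As a possible cross-check on the two hand-computed odds ratios, Stata's `cc` command with the `by()` option tabulates stratum-specific odds ratios. The sketch below assumes the data set loads from the URL used in the Stata session of this handout and that `drink`, `sex`, and `cases` are coded as described in the variable table. The indicator names `drink01` and `female` mirror the ones created in that session.

```stata
* Load the course data set (URL as given in the Stata session of this handout)
use "http://people.umass.edu/biep640w/datasets/depress.dta", clear
* Build 0/1 indicators: drinker (drink==1) and female (sex==2)
generate drink01 = (drink == 1) if !missing(drink)
generate female  = (sex == 2)   if !missing(sex)
* Odds ratio for drinking (women vs men), reported separately by depression status
cc drink01 female, by(cases)
```

Besides the stratum-specific odds ratios, `cc` with `by()` also reports a combined estimate and a test of homogeneity. That test addresses the same question as the interaction term in problem #3.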

3. Fit a logistic regression model using these variables. Use DRINK as the dependent variable and CASES and SEX as independent variables. Also include as an independent variable the appropriate interaction term.

Fitted Model:
logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [ CASES ] - 0.7743 [ FEMALE ] + 0.9386 [ FEM_CASE ]
where
CASES = 1 if depressed; 0 otherwise
FEMALE = 1 if female; 0 otherwise
FEM_CASE = (CASES) * (FEMALE)

Is the interaction term in your model significant? No. Estimated β3 = 0.9386, SE(estimated β3) = 0.96, and p-value = .33.

How does your answer to problem #3 compare to your answer to problem #2? Comment. The answers match.
Among Depressed: OR = 1.18
Among NON-depressed: OR = 0.46
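The p-value quoted for the interaction term is a Wald test from the fitted model. A likelihood ratio test is a common alternative. The sketch below is not part of the original solution. It assumes the data set loads from the URL used in this handout's Stata session, and the stored-estimate names `full` and `reduced` are arbitrary labels chosen here.

```stata
* Load data and build the same 0/1 indicators used in the handout's session
use "http://people.umass.edu/biep640w/datasets/depress.dta", clear
generate drink01  = (drink == 1) if !missing(drink)
generate female   = (sex == 2)   if !missing(sex)
generate fem_case = female*cases
* Full model: includes the interaction term
quietly logit drink01 cases female fem_case
estimates store full
* Reduced model: drops the interaction term
quietly logit drink01 cases female
estimates store reduced
* Likelihood ratio test of H0: coefficient of fem_case = 0 (1 degree of freedom)
lrtest full reduced
```

With only 50 depressed persons in the sample, the Wald and likelihood ratio tests may differ somewhat, but both test the same null hypothesis of no interaction.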


logit [ pr (drinker=yes) ] = 1.8269 - 0.4406 [ CASES] - 0.7743 [ FEMALE ] + 0.9386 [ FEM_CASE ]

Among Depressed

             “1” = Female   “0” = Male
CASES             1              1
FEMALE            1              0
FEM_CASE          1              0

logit [ female ] = 1.8269 - 0.4406 - 0.7743 + 0.9386 = 1.5506
logit [ male ] = 1.8269 - 0.4406 = 1.3863
logit [ female ] - logit [ male ] = 1.5506 - 1.3863 = +0.1643
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { +0.1643 } = 1.1786

Among NON Depressed

             “1” = Female   “0” = Male
CASES             0              0
FEMALE            1              0
FEM_CASE          0              0

logit [ female ] = 1.8269 - 0.7743 = 1.0526
logit [ male ] = 1.8269
logit [ female ] - logit [ male ] = 1.0526 - 1.8269 = -0.7743
OR [ women compared to men ] = exp { logit [ p1 ] - logit [ p0 ] } = exp { -0.7743 } = 0.4610
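The two hand calculations above are linear combinations of the fitted coefficients, so Stata's `lincom` command with the `or` option can produce them along with confidence intervals. This is a sketch under the assumption that the data load from the URL used in this handout's session and that the variables are built as in that session.

```stata
* Load data and build the same 0/1 indicators used in the handout's session
use "http://people.umass.edu/biep640w/datasets/depress.dta", clear
generate drink01  = (drink == 1) if !missing(drink)
generate female   = (sex == 2)   if !missing(sex)
generate fem_case = female*cases
* Fit the interaction model (output suppressed)
quietly logit drink01 cases female fem_case
* Non-depressed (CASES = 0): OR for women vs men = exp(coefficient of female)
lincom female, or
* Depressed (CASES = 1): OR = exp(coefficient of female + coefficient of fem_case)
lincom female + fem_case, or
```

If the data match the session shown in this handout, the two estimates should agree with the hand-computed values of 0.46 and 1.18.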


For STATA Users

Dear class: I edited my session in Word after exiting STATA to provide the coloring you see here - cb.
* GREEN: comments   BLACK: commands   BLUE: output

. * Use FILE > OPEN to read in data set called depress.dta
. use "http://people.umass.edu/biep640w/datasets/depress.dta"
. * Use the command DESCRIBE to obtain description of data set
. describe
(output omitted here)

. * Solution to Exercise #1
. * Use command TABULATE to obtain cross tab of regular drinker by sex
. * TABULATE ROWVARIABLE COLUMNVARIABLE
. tabulate drink sex

   regular |         sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |        95        139 |       234
        no |        16         44 |        60
-----------+----------------------+----------
     Total |       111        183 |       294

. * Solution to Exercise #2
. * Use command SORT to sort the data by case status (depressed or not depressed)
. sort case
. * Use the command BY in front of the command TABULATE
. by case: tabulate drink sex

------------------------------------------------------------------------------
-> cases = normal

   regular |         sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |        87        106 |       193
        no |        14         37 |        51
-----------+----------------------+----------
     Total |       101        143 |       244

------------------------------------------------------------------------------
-> cases = depressed

   regular |         sex
  drinker? |      male     female |     Total
-----------+----------------------+----------
       yes |         8         33 |        41
        no |         2          7 |         9
-----------+----------------------+----------
     Total |        10         40 |        50


. * Some variable creation commands
. * Create 0/1 indicators of drinker and female gender
. generate drink01=.
(294 missing values generated)
. replace drink01=1 if drink==1
(234 real changes made)
. replace drink01=0 if drink==2
(60 real changes made)
. label define drinkf 0 "0=nondrinker" 1 "1=drinker"
. label values drink01 drinkf
. generate female=.
(294 missing values generated)
. replace female=0 if sex==1
(111 real changes made)
. replace female=1 if sex==2
(183 real changes made)
. label define sexf 0 "0=male" 1 "1=female"
. label values female sexf
. * Create a new variable called FEM_CASE that is the interaction of FEMALE and CASES
. generate fem_case=female*cases

. * Solution to Exercise #3
. * Use the command LOGISTIC if you want output to include ODDS RATIOS
. * Use the command LOGIT if you want the output to include BETAs and SEs
. * LOGISTIC OUTCOME PREDICTOR PREDICTOR etc..
. logistic drink01 cases female fem_case

Logistic regression                               Number of obs   =        294
                                                  LR chi2(3)      =       5.62
                                                  Prob > chi2     =     0.1318
Log likelihood = -145.95772                       Pseudo R2       =     0.0189

------------------------------------------------------------------------------
     drink01 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cases |   .6436782   .5415789    -0.52   0.601     .1237324    3.348528
      female |   .4610127   .1592889    -2.24   0.025     .2342104    .9074437
    fem_case |   2.556483   2.448818     0.98   0.327     .3911017    16.71076
------------------------------------------------------------------------------


. logit drink01 cases female fem_case

Iteration 0:   log likelihood = -148.76664
Iteration 1:   log likelihood = -145.99305
Iteration 2:   log likelihood = -145.95773
Iteration 3:   log likelihood = -145.95772

Logistic regression                               Number of obs   =        294
                                                  LR chi2(3)      =       5.62
                                                  Prob > chi2     =     0.1318
Log likelihood = -145.95772                       Pseudo R2       =     0.0189

------------------------------------------------------------------------------
     drink01 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       cases |  -.4405564   .8413815    -0.52   0.601    -2.089634    1.208521
      female |  -.7743296   .3455196    -2.24   0.025    -1.451536   -.0971237
    fem_case |   .9386327   .9578851     0.98   0.327    -.9387877    2.816053
       _cons |   1.826851   .2879632     6.34   0.000     1.262453    2.391248
------------------------------------------------------------------------------

. log close

4. Source: Kleinbaum, Kupper, Miller, and Nizam. Applied Regression Analysis and Other Multivariable Methods, Third Edition. Pacific Grove: Duxbury Press, 1998. p 683 (problem 2).

A five-year follow-up study on 600 disease-free subjects was carried out to assess the effect of a 0/1 exposure E on the development (or not) of a certain disease. The variables AGE (continuous) and obesity status (OBS), the latter a 0/1 variable, were determined at the start of the follow-up and were to be considered as control variables in analyzing the data.


(A) State the logit form of a logistic regression model that assesses the effect of the 0/1 exposure variable E controlling for the confounding effects of AGE and OBS and the interaction effects of AGE with E and OBS with E.

Solution: logit[π] = β0 + β1*E + β2*AGE + β3*OBS + β4*AGEE + β5*OBSE

I used the following notation:
π = Probability [ disease ]
AGEE = AGE * E. This is a created variable that is the interaction of AGE with E.
OBSE = OBS * E. Similarly, this is the interaction of OBS with E.

(B) Given the model you have for part “A”, give a formula for the odds ratio for the exposure-disease relationship that controls for the confounding and interactive effects of AGE and OBS.

Solution: The solution here follows the ideas on pp 9-11 in Lecture Notes 5, Logistic Regression.

             Value of Predictor for Person who is
Predictor    Exposed      Not Exposed
E            1            0
AGE          AGE1         AGE0
OBS          OBS1         OBS0
AGEE         AGE1         0
OBSE         OBS1         0

Then
OR = exp { logit[π for exposed person] - logit[π for NON exposed person] }
   = exp { [ β0 + β1 + β2*AGE1 + β3*OBS1 + β4*AGE1 + β5*OBS1 ] - [ β0 + β2*AGE0 + β3*OBS0 ] }
   = exp { β1 + β2*(AGE1 - AGE0) + β3*(OBS1 - OBS0) + β4*AGE1 + β5*OBS1 }


(C) Now use the formula that you have for part “B” to write an expression for the estimated odds ratio for the exposure-disease relationship that considers both confounding and interaction when AGE=40 and OBS=1.

Solution: Estimated OR = exp { β1 + (40)β4 + β5 }, with each β replaced by its estimate from the fitted model.

             Value of Predictor for Person who is
Predictor    Exposed      Not Exposed
E            1            0
AGE          40           40
OBS          1            1
AGEE         40           0
OBSE         1            0

OR = exp { β1 + β2*(40 - 40) + β3*(1 - 1) + β4*40 + β5*1 }
   = exp { β1 + β4*40 + β5*1 }
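In Stata, an estimated odds ratio of this form could be obtained with `lincom` after fitting the model. The exercise provides no data, so the sketch below runs on SIMULATED data purely to show the mechanics. The variable names (`expo`, `obese`, `agee`, `obse`, `disease`), the seed, and the generating coefficients are all hypothetical, and the resulting numbers have no connection to the study described in the problem.

```stata
* SIMULATED data for illustration only; nothing here comes from the actual study
clear
set seed 640
set obs 600
* Hypothetical 0/1 exposure E, AGE in years, and 0/1 obesity status OBS
generate expo  = runiform() < 0.5
generate age   = 30 + floor(40*runiform())
generate obese = runiform() < 0.3
* Interaction terms: AGEE = AGE * E and OBSE = OBS * E
generate agee  = age*expo
generate obse  = obese*expo
* Outcome drawn from a logistic model with arbitrary, made-up coefficients
generate disease = runiform() < invlogit(-3 + 0.5*expo + 0.02*age + 0.4*obese + 0.01*agee + 0.3*obse)
* Fit the model from part (A)
logit disease expo age obese agee obse
* Estimated OR for exposure at AGE = 40 and OBS = 1: exp(b1 + 40*b4 + b5)
lincom expo + 40*agee + obse, or
```

The `lincom` expression mirrors the formula in part (C) term by term. To evaluate the odds ratio at a different age or obesity status, change the multipliers on `agee` and `obse`.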
