Logistic Regression
• Logistic Regression - Dichotomous response variable and numeric and/or categorical explanatory variable(s)
– Goal: Model the probability of a particular outcome as a function of the predictor variable(s)
– Problem: Probabilities are bounded between 0 and 1

• Distribution of responses: Binomial
• Link function: g(µ) = log[µ / (1 − µ)]

Logistic Regression with 1 Predictor
• Response - Presence/Absence of characteristic
• Predictor - Numeric variable observed for each case
• Model - π(x) ≡ Probability of presence at predictor level x

π(x) = e^(α + βx) / (1 + e^(α + βx))

• β = 0 ⇒ P(Presence) is the same at each level of x
• β > 0 ⇒ P(Presence) increases as x increases
• β < 0 ⇒ P(Presence) decreases as x increases
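The shape of the logistic curve above can be sketched directly; the parameter values α = −2 and β = ±0.5 below are illustrative assumptions, not from the slides.

```python
import math

def pi(x, alpha, beta):
    """Logistic response curve: P(presence) at predictor level x."""
    return math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))

# beta > 0: P(presence) increases as x increases (illustrative alpha, beta)
probs_up = [pi(x, alpha=-2.0, beta=0.5) for x in range(5)]
# beta < 0: P(presence) decreases as x increases
probs_down = [pi(x, alpha=-2.0, beta=-0.5) for x in range(5)]
# beta = 0: P(presence) is the same at every x
probs_flat = [pi(x, alpha=-2.0, beta=0.0) for x in range(5)]
```

Note that π(x) stays strictly between 0 and 1 for any α, β, which is exactly why the logit link is used.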

Logistic Regression with 1 Predictor
• α, β are unknown parameters and must be estimated using statistical software such as SPSS, SAS, or STATA
• Primary interest is in estimating and testing hypotheses regarding β
• Large-sample test (Wald test):

H0: β = 0    HA: β ≠ 0
T.S.: X²_obs = (β̂ / σ̂_β̂)²
R.R.: X²_obs ≥ χ²_(α,1)
P-val: P(χ² ≥ X²_obs)
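The three pieces of the Wald test can be computed with any chi-square routine; a minimal sketch using SciPy (the function name `wald_test` is just for illustration):

```python
from scipy.stats import chi2

def wald_test(beta_hat, se, alpha=0.05):
    """Large-sample Wald test of H0: beta = 0 vs HA: beta != 0."""
    x2_obs = (beta_hat / se) ** 2        # T.S.: squared z-statistic
    crit = chi2.ppf(1 - alpha, df=1)     # R.R. cutoff: chi-square(alpha, 1)
    p_val = chi2.sf(x2_obs, df=1)        # P-val: P(chi-square >= X2_obs)
    return x2_obs, crit, p_val

# Values from the migraine example later in the slides
x2, crit, p = wald_test(beta_hat=0.165, se=0.037)
```

Note that from the rounded estimate and standard error, (0.165/0.037)² ≈ 19.89; SPSS reports 19.819 because it uses the unrounded values internally.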

Example - Rizatriptan for Migraine
• Response - Complete pain relief at 2 hours (Yes/No)
• Predictor - Dose (mg): Placebo (0), 2.5, 5, 10

Dose   # Patients   # Relieved   % Relieved
0          67            2           3.0
2.5        75            7           9.3
5         130           29          22.3
10        145           40          27.6
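The maximum likelihood estimates can be computed directly from the grouped counts above with Newton-Raphson on the binomial log-likelihood; this is a minimal sketch of what SPSS does internally, and it should reproduce the estimates reported on the next slide.

```python
import numpy as np

# Grouped dose-response data from the table above
dose = np.array([0.0, 2.5, 5.0, 10.0])
n    = np.array([67.0, 75.0, 130.0, 145.0])   # patients per group
y    = np.array([2.0, 7.0, 29.0, 40.0])       # number relieved

X = np.column_stack([np.ones_like(dose), dose])  # design: intercept + dose
b = np.zeros(2)
for _ in range(25):                    # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ b))   # fitted pi(x) for each dose group
    W = n * p * (1.0 - p)              # binomial variance weights
    b = b + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - n * p))

alpha_hat, beta_hat = b                # expect roughly -2.490 and 0.165

# Standard errors from the inverse information matrix at the MLE
p = 1.0 / (1.0 + np.exp(-X @ b))
se = np.sqrt(np.diag(np.linalg.inv(X.T @ ((n * p * (1.0 - p))[:, None] * X))))
```

The score vector Xᵀ(y − nπ) and information matrix XᵀWX used here are the standard ingredients of IRLS fitting for a binomial GLM.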

Example - Rizatriptan for Migraine (SPSS)

Variables in the Equation (Step 1ª)
            B        S.E.    Wald     df   Sig.   Exp(B)
DOSE         .165    .037    19.819    1   .000    1.180
Constant   -2.490    .285    76.456    1   .000     .083

a. Variable(s) entered on step 1: DOSE.

π̂(x) = e^(−2.490 + 0.165x) / (1 + e^(−2.490 + 0.165x))

H0: β = 0    HA: β ≠ 0
T.S.: X²_obs = (0.165 / 0.037)² = 19.819
R.R.: X²_obs ≥ χ²_(.05,1) = 3.84
P-val: .000

Odds Ratio
• Interpretation of the regression coefficient (β):
– In linear regression, the slope coefficient is the change in the mean response as x increases by 1 unit
– In logistic regression, we can show that:

odds(x) = π(x) / (1 − π(x))

odds(x + 1) / odds(x) = e^β

• Thus e^β represents the multiplicative change in the odds of the outcome when x increases by 1 unit
• If β = 0, the odds and probability are the same at all x levels (e^β = 1)
• If β > 0, the odds and probability increase as x increases (e^β > 1)
• If β < 0, the odds and probability decrease as x increases (e^β < 1)

Cumulative Logit Model
• For an ordinal response Y with c categories, model the cumulative probabilities:

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / P(Y > j) ] = α_j + βX    j = 1, …, c − 1

This is called the proportional odds model, and assumes the effect of X is the same for each cumulative probability
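The proportional odds structure can be checked numerically; the cutpoints α₁ = −1.0, α₂ = 0.5 and slope β = 0.3 below are hypothetical values chosen for illustration.

```python
import math

def cum_probs(alphas, beta, x):
    """P(Y <= j), j = 1..c-1, under logit[P(Y <= j)] = alpha_j + beta*x."""
    return [math.exp(a + beta * x) / (1 + math.exp(a + beta * x))
            for a in alphas]

# Hypothetical 3-category response: two cutpoints, one common slope
alphas, beta = [-1.0, 0.5], 0.3
p_x0 = cum_probs(alphas, beta, x=0)
p_x1 = cum_probs(alphas, beta, x=1)
```

Because the same β applies to every cumulative logit, the odds ratio comparing x = 1 to x = 0 equals e^β for every j; that is the "proportional odds" assumption.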

Example - Urban Renewal Attitudes
• Response: Attitude toward urban renewal project (Negative (Y=1), Moderate (Y=2), Positive (Y=3))
• Predictor variable: Respondent's race (White, Nonwhite)
• Contingency table:

Attitude\Race     White   Nonwhite
Negative (Y=1)      101        106
Moderate (Y=2)       91        127
Positive (Y=3)      170        190

SPSS Output
• Note that SPSS fits the model in the following form:

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / P(Y > j) ] = α_j − βX    j = 1, …, c − 1

Parameter Estimates

                           Estimate  Std. Error    Wald    df  Sig.   95% CI Lower  95% CI Upper
Threshold  [ATTITUDE = 1]   -1.027      .102     101.993   1  .000      -1.227         -.828
           [ATTITUDE = 2]     .165      .094       3.070   1  .080       -.020          .351
Location   [RACE=0]          -.001      .133        .000   1  .993       -.263          .260
           [RACE=1]          0ª         .           .      0  .           .             .

Link function: Logit.
a. This parameter is set to zero because it is redundant.

Note that the race variable is not significant (or even close).

Fitted Equation
• The fitted equation for each group/category:

Negative/White:       logit[P(Y ≤ 1 | White)]    = −1.027 − (−0.001) = −1.026
Negative/Nonwhite:    logit[P(Y ≤ 1 | Nonwhite)] = −1.027 − (0)      = −1.027
Neg or Mod/White:     logit[P(Y ≤ 2 | White)]    = 0.165 − (−0.001)  = 0.166
Neg or Mod/Nonwhite:  logit[P(Y ≤ 2 | Nonwhite)] = 0.165 − 0         = 0.165

For each group, the fitted probability of falling in that set of categories is e^L / (1 + e^L), where L is the logit value, giving 0.264, 0.264, 0.541, and 0.541 respectively.
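Converting the four fitted logits back to probabilities is a one-line transformation (the group labels below are just shorthand for the four cases):

```python
import math

def inv_logit(L):
    """Convert a fitted logit L back to a probability e^L / (1 + e^L)."""
    return math.exp(L) / (1 + math.exp(L))

# Logits from the fitted equations above
logits = {"Neg/White": -1.026, "Neg/Nonwhite": -1.027,
          "NegOrMod/White": 0.166, "NegOrMod/Nonwhite": 0.165}
probs = {k: round(inv_logit(L), 3) for k, L in logits.items()}
# probs -> {'Neg/White': 0.264, 'Neg/Nonwhite': 0.264,
#           'NegOrMod/White': 0.541, 'NegOrMod/Nonwhite': 0.541}
```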

Inference for Regression Coefficients
• If β = 0, the response (Y) is independent of X
• A Z-test can be conducted to test this (estimate divided by its standard error)
• Most software conducts the Wald test instead, with the statistic being the z-statistic squared, which has a chi-squared distribution with 1 degree of freedom under the null hypothesis
• The odds ratio for increasing X by 1 unit and its confidence interval are obtained by raising e to the power of the regression coefficient and of its upper and lower bounds

Example - Urban Renewal Attitudes

Parameter Estimates

                           Estimate  Std. Error    Wald    df  Sig.   95% CI Lower  95% CI Upper
Threshold  [ATTITUDE = 1]   -1.027      .102     101.993   1  .000      -1.227         -.828
           [ATTITUDE = 2]     .165      .094       3.070   1  .080       -.020          .351
Location   [RACE=0]          -.001      .133        .000   1  .993       -.263          .260
           [RACE=1]          0ª         .           .      0  .           .             .

Link function: Logit.
a. This parameter is set to zero because it is redundant.

• Z-statistic for testing for race differences: Z = 0.001/0.133 = 0.0075 (recall that the model estimates −β)
• Wald statistic: .000 (P-value = .993)
• Estimated odds ratio: e^0.001 = 1.001
• 95% Confidence interval: (e^−0.260, e^0.263) = (0.771, 1.301)
• The interval contains 1, so the odds of being in a given category or below are the same for whites as for nonwhites
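The odds ratio and its interval follow directly from the SPSS output above; note the sign flip because SPSS parameterizes the location term as −β.

```python
import math

# Location estimate for [RACE=0] and its 95% CI bounds (SPSS output above);
# SPSS estimates -beta, so flip signs before exponentiating
est, lo, hi = -0.001, -0.263, 0.260

odds_ratio = math.exp(-est)             # e^0.001  ~ 1.001
ci = (math.exp(-hi), math.exp(-lo))     # (e^-0.260, e^0.263) ~ (0.771, 1.301)
```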

Ordinal Predictors
• Creating dummy variables for ordinal categories treats them as if they were nominal, discarding the ordering
• To exploit the ordering, create a new numeric variable X that scores the levels of the ordinal variable
• The scoring depends on the assignment of levels (the simplest choice is X = 1, ..., c for the c categories, which treats the levels as equally spaced)
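The equally spaced scoring described above amounts to a simple mapping; the category names below are hypothetical.

```python
# Equally spaced scores for an ordinal predictor: X = 1, ..., c
levels = ["low", "medium", "high"]            # hypothetical ordered categories
scores = {level: i + 1 for i, level in enumerate(levels)}

# Apply the scores to raw data before fitting the regression
raw = ["low", "high", "medium", "low"]
x = [scores[v] for v in raw]                  # -> [1, 3, 2, 1]
```

Other scorings (e.g. midpoints of underlying intervals) change only the assignment in `scores`; the model fitting is unchanged.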
