Logistic Regression
• Logistic Regression - Dichotomous response variable and numeric and/or categorical explanatory variable(s)
– Goal: Model the probability of a particular outcome as a function of the predictor variable(s)
– Problem: Probabilities are bounded between 0 and 1

• Distribution of responses: Binomial
• Link function: g(µ) = log[µ / (1 − µ)]

Logistic Regression with 1 Predictor
• Response - Presence/Absence of characteristic
• Predictor - Numeric variable observed for each case
• Model - π(x) ≡ Probability of presence at predictor level x

π(x) = e^(α + βx) / (1 + e^(α + βx))

• β = 0 ⇒ P(Presence) is the same at each level of x
• β > 0 ⇒ P(Presence) increases as x increases
• β < 0 ⇒ P(Presence) decreases as x increases
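The shape of the logistic curve above can be sketched directly; the parameter values α = −2 and β = ±0.5 below are illustrative assumptions, not from the slides.

```python
import math

def pi(x, alpha, beta):
    """Logistic response curve: P(presence) at predictor level x."""
    return math.exp(alpha + beta * x) / (1 + math.exp(alpha + beta * x))

# beta > 0: P(presence) increases as x increases (illustrative alpha, beta)
probs_up = [pi(x, alpha=-2.0, beta=0.5) for x in range(5)]
# beta < 0: P(presence) decreases as x increases
probs_down = [pi(x, alpha=-2.0, beta=-0.5) for x in range(5)]
# beta = 0: P(presence) is the same at every x
probs_flat = [pi(x, alpha=-2.0, beta=0.0) for x in range(5)]
```

Note that π(x) stays strictly between 0 and 1 for any α, β, which is exactly why the logit link is used.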

Logistic Regression with 1 Predictor
• α, β are unknown parameters and must be estimated using statistical software such as SPSS, SAS, or STATA
• Primary interest is in estimating and testing hypotheses regarding β
• Large-sample test (Wald test):

H0: β = 0    HA: β ≠ 0
T.S.: X²_obs = (β̂ / σ̂_β̂)²
R.R.: X²_obs ≥ χ²_(α,1)
P-val: P(χ² ≥ X²_obs)
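The three pieces of the Wald test can be computed with any chi-square routine; a minimal sketch using SciPy (the function name `wald_test` is just for illustration):

```python
from scipy.stats import chi2

def wald_test(beta_hat, se, alpha=0.05):
    """Large-sample Wald test of H0: beta = 0 vs HA: beta != 0."""
    x2_obs = (beta_hat / se) ** 2        # T.S.: squared z-statistic
    crit = chi2.ppf(1 - alpha, df=1)     # R.R. cutoff: chi-square(alpha, 1)
    p_val = chi2.sf(x2_obs, df=1)        # P-val: P(chi-square >= X2_obs)
    return x2_obs, crit, p_val

# Values from the migraine example later in the slides
x2, crit, p = wald_test(beta_hat=0.165, se=0.037)
```

Note that from the rounded estimate and standard error, (0.165/0.037)² ≈ 19.89; SPSS reports 19.819 because it uses the unrounded values internally.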

Example - Rizatriptan for Migraine
• Response - Complete pain relief at 2 hours (Yes/No)
• Predictor - Dose (mg): Placebo (0), 2.5, 5, 10

Dose   # Patients   # Relieved   % Relieved
0          67            2           3.0
2.5        75            7           9.3
5         130           29          22.3
10        145           40          27.6
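The maximum likelihood estimates can be computed directly from the grouped counts above with Newton-Raphson on the binomial log-likelihood; this is a minimal sketch of what SPSS does internally, and it should reproduce the estimates reported on the next slide.

```python
import numpy as np

# Grouped dose-response data from the table above
dose = np.array([0.0, 2.5, 5.0, 10.0])
n    = np.array([67.0, 75.0, 130.0, 145.0])   # patients per group
y    = np.array([2.0, 7.0, 29.0, 40.0])       # number relieved

X = np.column_stack([np.ones_like(dose), dose])  # design: intercept + dose
b = np.zeros(2)
for _ in range(25):                    # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ b))   # fitted pi(x) for each dose group
    W = n * p * (1.0 - p)              # binomial variance weights
    b = b + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - n * p))

alpha_hat, beta_hat = b                # expect roughly -2.490 and 0.165

# Standard errors from the inverse information matrix at the MLE
p = 1.0 / (1.0 + np.exp(-X @ b))
se = np.sqrt(np.diag(np.linalg.inv(X.T @ ((n * p * (1.0 - p))[:, None] * X))))
```

The score vector Xᵀ(y − nπ) and information matrix XᵀWX used here are the standard ingredients of IRLS fitting for a binomial GLM.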

Example - Rizatriptan for Migraine (SPSS)

Variables in the Equation (Step 1ª)
            B        S.E.    Wald     df   Sig.   Exp(B)
DOSE         .165    .037    19.819    1   .000    1.180
Constant   -2.490    .285    76.456    1   .000     .083

a. Variable(s) entered on step 1: DOSE.

π̂(x) = e^(−2.490 + 0.165x) / (1 + e^(−2.490 + 0.165x))

H0: β = 0    HA: β ≠ 0
T.S.: X²_obs = (0.165 / 0.037)² = 19.819
R.R.: X²_obs ≥ χ²_(.05,1) = 3.84
P-val: .000

Odds Ratio
• Interpretation of the regression coefficient (β):
– In linear regression, the slope coefficient is the change in the mean response as x increases by 1 unit
– In logistic regression, we can show that:

odds(x) = π(x) / (1 − π(x))

odds(x + 1) / odds(x) = e^β

• Thus e^β represents the multiplicative change in the odds of the outcome when x increases by 1 unit
• If β = 0, the odds and probability are the same at all x levels (e^β = 1)
• If β > 0, the odds and probability increase as x increases (e^β > 1)
• If β < 0, the odds and probability decrease as x increases (e^β < 1)

Cumulative Logit Model
• For an ordinal response Y with c categories, model the cumulative probabilities:

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / P(Y > j) ] = α_j + βX    j = 1, …, c − 1

This is called the proportional odds model, and assumes the effect of X is the same for each cumulative probability
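The proportional odds structure can be checked numerically; the cutpoints α₁ = −1.0, α₂ = 0.5 and slope β = 0.3 below are hypothetical values chosen for illustration.

```python
import math

def cum_probs(alphas, beta, x):
    """P(Y <= j), j = 1..c-1, under logit[P(Y <= j)] = alpha_j + beta*x."""
    return [math.exp(a + beta * x) / (1 + math.exp(a + beta * x))
            for a in alphas]

# Hypothetical 3-category response: two cutpoints, one common slope
alphas, beta = [-1.0, 0.5], 0.3
p_x0 = cum_probs(alphas, beta, x=0)
p_x1 = cum_probs(alphas, beta, x=1)
```

Because the same β applies to every cumulative logit, the odds ratio comparing x = 1 to x = 0 equals e^β for every j; that is the "proportional odds" assumption.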

Example - Urban Renewal Attitudes
• Response: Attitude toward urban renewal project (Negative (Y=1), Moderate (Y=2), Positive (Y=3))
• Predictor variable: Respondent's race (White, Nonwhite)
• Contingency table:

Attitude\Race     White   Nonwhite
Negative (Y=1)      101        106
Moderate (Y=2)       91        127
Positive (Y=3)      170        190

SPSS Output
• Note that SPSS fits the model in the following form:

logit[P(Y ≤ j)] = log[ P(Y ≤ j) / P(Y > j) ] = α_j − βX    j = 1, …, c − 1

Parameter Estimates

                           Estimate  Std. Error    Wald    df  Sig.   95% CI Lower  95% CI Upper
Threshold  [ATTITUDE = 1]   -1.027      .102     101.993   1  .000      -1.227         -.828
           [ATTITUDE = 2]     .165      .094       3.070   1  .080       -.020          .351
Location   [RACE=0]          -.001      .133        .000   1  .993       -.263          .260
           [RACE=1]          0ª         .           .      0  .           .             .

Link function: Logit.
a. This parameter is set to zero because it is redundant.

Note that the race variable is not significant (or even close).

Fitted Equation
• The fitted equation for each group/category:

Negative/White:       logit[P(Y ≤ 1 | White)]    = −1.027 − (−0.001) = −1.026
Negative/Nonwhite:    logit[P(Y ≤ 1 | Nonwhite)] = −1.027 − (0)      = −1.027
Neg or Mod/White:     logit[P(Y ≤ 2 | White)]    = 0.165 − (−0.001)  = 0.166
Neg or Mod/Nonwhite:  logit[P(Y ≤ 2 | Nonwhite)] = 0.165 − 0         = 0.165

For each group, the fitted probability of falling in that set of categories is e^L / (1 + e^L), where L is the logit value, giving 0.264, 0.264, 0.541, and 0.541 respectively.
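Converting the four fitted logits back to probabilities is a one-line transformation (the group labels below are just shorthand for the four cases):

```python
import math

def inv_logit(L):
    """Convert a fitted logit L back to a probability e^L / (1 + e^L)."""
    return math.exp(L) / (1 + math.exp(L))

# Logits from the fitted equations above
logits = {"Neg/White": -1.026, "Neg/Nonwhite": -1.027,
          "NegOrMod/White": 0.166, "NegOrMod/Nonwhite": 0.165}
probs = {k: round(inv_logit(L), 3) for k, L in logits.items()}
# probs -> {'Neg/White': 0.264, 'Neg/Nonwhite': 0.264,
#           'NegOrMod/White': 0.541, 'NegOrMod/Nonwhite': 0.541}
```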

Inference for Regression Coefficients
• If β = 0, the response (Y) is independent of X
• A Z-test can be conducted to test this (estimate divided by its standard error)
• Most software conducts the Wald test instead, with the statistic being the z-statistic squared, which has a chi-squared distribution with 1 degree of freedom under the null hypothesis
• The odds ratio for increasing X by 1 unit and its confidence interval are obtained by raising e to the power of the regression coefficient and of its upper and lower bounds

Example - Urban Renewal Attitudes

Parameter Estimates

                           Estimate  Std. Error    Wald    df  Sig.   95% CI Lower  95% CI Upper
Threshold  [ATTITUDE = 1]   -1.027      .102     101.993   1  .000      -1.227         -.828
           [ATTITUDE = 2]     .165      .094       3.070   1  .080       -.020          .351
Location   [RACE=0]          -.001      .133        .000   1  .993       -.263          .260
           [RACE=1]          0ª         .           .      0  .           .             .

Link function: Logit.
a. This parameter is set to zero because it is redundant.

• Z-statistic for testing for race differences: Z = 0.001/0.133 = 0.0075 (recall that the model estimates −β)
• Wald statistic: .000 (P-value = .993)
• Estimated odds ratio: e^0.001 = 1.001
• 95% Confidence interval: (e^−0.260, e^0.263) = (0.771, 1.301)
• The interval contains 1, so the odds of being in a given category or below are the same for whites as for nonwhites
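The odds ratio and its interval follow directly from the SPSS output above; note the sign flip because SPSS parameterizes the location term as −β.

```python
import math

# Location estimate for [RACE=0] and its 95% CI bounds (SPSS output above);
# SPSS estimates -beta, so flip signs before exponentiating
est, lo, hi = -0.001, -0.263, 0.260

odds_ratio = math.exp(-est)             # e^0.001  ~ 1.001
ci = (math.exp(-hi), math.exp(-lo))     # (e^-0.260, e^0.263) ~ (0.771, 1.301)
```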

Ordinal Predictors
• Creating dummy variables for ordinal categories treats them as if they were nominal, discarding the ordering
• To exploit the ordering, create a new numeric variable X that scores the levels of the ordinal variable
• The scoring depends on the assignment of levels (the simplest choice is X = 1, ..., c for the c categories, which treats the levels as equally spaced)
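The equally spaced scoring described above amounts to a simple mapping; the category names below are hypothetical.

```python
# Equally spaced scores for an ordinal predictor: X = 1, ..., c
levels = ["low", "medium", "high"]            # hypothetical ordered categories
scores = {level: i + 1 for i, level in enumerate(levels)}

# Apply the scores to raw data before fitting the regression
raw = ["low", "high", "medium", "low"]
x = [scores[v] for v in raw]                  # -> [1, 3, 2, 1]
```

Other scorings (e.g. midpoints of underlying intervals) change only the assignment in `scores`; the model fitting is unchanged.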
