Model selection in regression

Michael Friendly, Psychology 6140


Author: Daniella Todd


Selecting the “best” model

There are often:
• many variables to choose
• many models: subtly different configurations
• different costs
• different power
• not an unequivocal “best”

More realistic goal: select a “most-satisficing” model, one that gets you where you want to go, at reasonable cost

Box: “All models are wrong, but some are useful”

Selecting the “best” model

Criteria for model selection
• Sometimes quantifiable
• Sometimes subjective
• Sometimes biased by pre-conceived ideas
• Sometimes pre-conceived ideas are truly important
• How well do they apply in future samples?

Model selection: the task of selecting a (mathematical) model from a set of potential models, given evidence and some goal.

Regression: Opposing criteria

• Good fit, good in-sample prediction:
  Make R² large or MSE small → Include many variables
• Parsimony:
  Keep cost of data collection low, interpretation simple, standard errors small → Include few variables

Statistical goals

• Descriptive/exploratory
  Describe relations between response & predictors → want precision, parsimony?
• Scientific explanation
  Test hypothesis, possibly ‘causal’ relations
  → Control/adjust for background variables
  → Want precise tests for hypothesized predictors
• Prediction/selection
  How well will my model predict/select in future samples? → Cross-validation methods
• Data mining
  Sometimes we have a huge # of possible predictors
  Don’t care about explanation
  Happy with a small % “lift” in prediction

Model selection criteria

• R² = SSR_model / SS_Total
  Cannot decrease as more variables added → look at ΔR² as new variables added
• Adjusted R² attempts to adjust for # predictors.

$$\text{Adj } R^2 = 1 - \left(\frac{n-1}{n-p}\right)(1 - R^2)$$

• This is on the right track, but antiquated (Wherry, 1931)
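To make the adjusted R² formula concrete, here is a minimal Python sketch. It plugs in the R² values that the SAS output later in these slides reports for the best 4-predictor model and the full 5-predictor fuel model. The sample size `N_OBS = 48` is an assumption (the slides do not state n), and the function name is illustrative.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adj R^2 = 1 - ((n - 1) / (n - p)) * (1 - R^2); p counts the intercept."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


N_OBS = 48  # assumption: number of observations, not stated on the slides

# R^2 values copied from the fuel example output (4 and 5 predictors).
adj_4 = adjusted_r2(0.6956, N_OBS, p=5)  # pop tax inc drivers + intercept
adj_5 = adjusted_r2(0.6986, N_OBS, p=6)  # all five predictors + intercept

# Plain R^2 can only go up when a predictor is added; adjusted R^2 need not.
print(round(adj_4, 4), round(adj_5, 4))
```

Under that assumed n, the full model has the larger R² but the smaller adjusted R², which is the behavior the adjustment is meant to produce.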

Model selection criteria: Cp

• Mallows’ Cp: measure of ‘total error of prediction’ using p parameters

Cp is an estimate of

$$\frac{1}{\sigma^2} \sum \left[ \underbrace{\operatorname{var}(\hat{y}_p)}_{\text{random error}} + \underbrace{(\hat{y}_{\text{true}} - \hat{y}_p)^2}_{\text{bias}} \right]$$

$$C_p = \frac{SSE_p}{MSE_{\text{all}}} - (n - 2p)$$

Related to AIC and other measures favoring model parsimony

Model selection criteria: Cp

• Relation to incremental F test:

$$C_p = p + (m + 1 - p)(F_p - 1)$$

Fp = incremental F for omitted predictors, testing H0: $\beta_{p+1} = \dots = \beta_m = 0$ when there are m available predictors.
p = # parameters, including intercept

        H0 true (no bias)    H0 false (bias)
Cp      Cp ≈ p               Cp > p
Fp      Fp ≈ 1               Fp > 1

A “good” model should therefore have Cp ≈ p
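The two expressions for Cp above are algebraically the same quantity. The following Python sketch checks this numerically. The SSE values and n are hypothetical numbers chosen only for illustration, and the function names are not from the slides.

```python
def cp_from_sse(sse_p, sse_all, n, p, m):
    """Cp = SSE_p / MSE_all - (n - 2p), with MSE_all from the model with all m predictors."""
    mse_all = sse_all / (n - m - 1)
    return sse_p / mse_all - (n - 2 * p)


def cp_from_incremental_f(sse_p, sse_all, n, p, m):
    """Cp = p + (m + 1 - p) * (F_p - 1), F_p being the incremental F for omitted predictors."""
    omitted = m + 1 - p  # number of predictors left out of the p-parameter model
    if omitted == 0:
        return float(p)  # full model: nothing omitted, so Cp = p
    mse_all = sse_all / (n - m - 1)
    f_p = ((sse_p - sse_all) / omitted) / mse_all
    return p + omitted * (f_p - 1.0)


# Hypothetical values: n = 48 cases, m = 5 available predictors,
# candidate model with p = 3 parameters (intercept + 2 predictors).
n, m, p = 48, 5, 3
sse_all, sse_p = 100.0, 150.0

cp_a = cp_from_sse(sse_p, sse_all, n, p, m)
cp_b = cp_from_incremental_f(sse_p, sse_all, n, p, m)
print(cp_a, cp_b)  # both are 21 up to floating-point rounding
```

Here Cp is far above p = 3, which by the table above signals bias from the omitted predictors.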

Scientific explanation

• Need to include variable(s) whose effect you are testing
  Does gasoline price affect consumption?
  Does physical fitness decrease with age?
• Need to include control variable(s) that could affect the outcome
  Omitted control variables can bias other estimates
  E.g., per capita income might affect consumption
  Weight might affect physical fitness
• Better to include more variables and risk some reduced precision than to risk bias, even if their p-values are NS

Model selection criteria: Parsimony

• Attempt to balance goodness of fit vs. # predictors
• Akaike Information Criterion (AIC)

$$AIC = \underbrace{n \ln\left(\frac{SSE}{n}\right)}_{\text{error}} + \underbrace{2p}_{\text{penalty}}$$

• Bayesian Information Criterion (BIC)

$$BIC = \underbrace{n \ln\left(\frac{SSE}{n}\right)}_{\text{error}} + \underbrace{2(p+2)q - 2q^2}_{\text{penalty}} \quad \text{where } q = \frac{n\hat{\sigma}^2}{SSE}$$

• AIC & BIC
  Smaller = Better
  Model comparison statistics, not test statistics: no p-values
  Applicable to all statistical model comparisons: logistic regression, FA, mixed models, etc.
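As a rough illustration of the AIC and BIC formulas for regression given on this slide, here is a small Python sketch. The SSE values are hypothetical. Treating σ̂² as the error-variance estimate from the full model is an assumption (the slide does not say which estimate is meant), and the function names are illustrative.

```python
import math


def aic(sse: float, n: int, p: int) -> float:
    """AIC = n ln(SSE / n) + 2p: an error term plus a penalty of 2 per parameter."""
    return n * math.log(sse / n) + 2 * p


def bic(sse: float, n: int, p: int, sigma2_hat: float) -> float:
    """BIC = n ln(SSE / n) + 2(p + 2)q - 2q^2, where q = n * sigma2_hat / SSE.

    sigma2_hat is assumed here to be the MSE of the full model.
    """
    q = n * sigma2_hat / sse
    return n * math.log(sse / n) + 2 * (p + 2) * q - 2 * q ** 2


# Hypothetical comparison: adding one predictor lowers SSE from 150 to 140.
n = 48
sigma2_hat = 100.0 / 42  # hypothetical full-model MSE
aic_small, aic_big = aic(150.0, n, 3), aic(140.0, n, 4)
bic_small, bic_big = bic(150.0, n, 3, sigma2_hat), bic(140.0, n, 4, sigma2_hat)

# Smaller is better; only differences between models are meaningful.
print(aic_big - aic_small, bic_big - bic_small)
```

If an added predictor leaves SSE unchanged, AIC rises by exactly 2, which is the penalty term at work.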

Descriptive/Exploratory

• Generally only include variables with strong statistical support (low p values)
  Choose models with highest adjusted R² or lowest AIC
  Parsimony particularly valuable for making in-sample predictions
  • High precision
  • Fewer variables to measure
• Models with AIC close to best model are also supported by the data
  If you need to choose just one, pick the simplest in this group
  Better to report alternatives, perhaps in a footnote
• Examine whether statistically significant relationships have effect sizes & signs that are meaningful
  Units of regression coefficients: units of Y/units of X

Example: US Fuel consumption

pop      Population (1000s)
tax      Motor fuel tax (cents/gal.)
nlic     Number licensed drivers (1000s)
inc      Per Capita Personal income ($)
road     Length Federal Highways (mi.)
drivers  Proportion licensed drivers
fuel     Fuel consumption (/person)

state     pop    tax    nlic    inc   road  drivers  fuel
AL       3510    7.0    1801   3333   6594    0.513   554
AR       1978    7.5    1081   3357   4121    0.547   628
AZ       1945    7.0    1173   4300   3635    0.603   632
CA      20468    7.0   12130   5002   9794    0.593   524
CO       2357    7.0    1475   4449   4639    0.626   587
CT       3082   10.0    1760   5342   1333    0.571   457
DE        565    8.0     340   4983    602    0.602   540
FL       7259    8.0    4084   4188   5975    0.563   574
...

%include data(fuel);
proc reg data=fuel;
   id state;
   model fuel = pop tax inc road drivers / selection = rsquare cp aic best=4;
run;

Number in
  Model    R-Square       C(p)        AIC    Variables in Model
    1        0.4886    27.2658   423.6829    drivers
    1        0.2141    65.5021   444.3002    pop
    1        0.2037    66.9641   444.9368    tax
    1        0.0600    86.9869   452.8996    inc
-----------------------------------------------------------------------
    2        0.6175    11.2968   411.7369    inc drivers
    2        0.5567    19.7727   418.8210    tax drivers
    2        0.5382    22.3532   420.7854    pop drivers
    2        0.4926    28.6951   425.2970    road drivers
-----------------------------------------------------------------------
    3        0.6749     5.3057   405.9397    tax inc drivers
    3        0.6522     8.4600   409.1703    pop tax drivers
    3        0.6249    12.2636   412.7973    inc road drivers
    3        0.6209    12.8280   413.3129    pop road drivers
-----------------------------------------------------------------------
    4        0.6956     4.4172   404.7775    pop tax inc drivers
    4        0.6787     6.7723   407.3712    tax inc road drivers
    4        0.6687     8.1598   408.8362    pop tax road drivers
    4        0.6524    10.4390   411.1495    pop inc road drivers
-----------------------------------------------------------------------
    5        0.6986     6.0000   406.3030    pop tax inc road drivers

cpplot macro

%cpplot(data=fuel, yvar=fuel, xvar=tax drivers road inc pop,
        gplot=CP AIC, plotchar=T D R I P, cpmax=20);
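The C(p) and AIC columns in this output can be recovered, up to rounding, from the R-Square column alone, because SSE_p / SSE_all = (1 − R²_p) / (1 − R²_all). The Python sketch below does this for three rows of the table. The value `N_OBS = 48` is an assumption: n is not stated on the slides, and 48 is simply the value under which the formulas reproduce the printed numbers.

```python
import math

N_OBS = 48        # assumption: number of observations, not stated on the slides
M_PRED = 5        # available predictors: pop tax inc road drivers
R2_FULL = 0.6986  # R-Square of the 5-predictor model, from the output above
AIC_FULL = 406.3030


def cp_from_r2(r2: float, p: int) -> float:
    """Cp = SSE_p / MSE_all - (n - 2p), rewritten in terms of R^2."""
    sse_over_mse_all = (1.0 - r2) / (1.0 - R2_FULL) * (N_OBS - M_PRED - 1)
    return sse_over_mse_all - (N_OBS - 2 * p)


def aic_gap_from_r2(r2: float, p: int) -> float:
    """AIC(model) - AIC(full model) = n ln(SSE_p / SSE_all) + 2 (p - (m + 1))."""
    return N_OBS * math.log((1.0 - r2) / (1.0 - R2_FULL)) + 2 * (p - (M_PRED + 1))


# (R-Square, C(p), AIC) as printed; p = number of predictors + 1 for the intercept.
reported = {
    2: (0.4886, 27.2658, 423.6829),  # drivers
    3: (0.6175, 11.2968, 411.7369),  # inc drivers
    5: (0.6956, 4.4172, 404.7775),   # pop tax inc drivers
}
for p, (r2, cp_sas, aic_sas) in reported.items():
    print(p, round(cp_from_r2(r2, p), 2), cp_sas,
          round(aic_gap_from_r2(r2, p), 2), round(aic_sas - AIC_FULL, 2))
```

Small discrepancies in the second or third decimal are expected because the printed R-Square values are rounded to four places.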


NB: Cp always = p for model with all predictors
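For readers without SAS, the kind of all-subsets table shown above can be sketched in Python with NumPy. This is a minimal illustration on simulated data, not the fuel data and not how PROC REG is implemented. The function names and the simulated coefficients are made up for the example.

```python
from itertools import combinations
import math

import numpy as np


def sse_of_fit(X, y):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def all_subsets(X, y, names):
    """R^2, Cp and AIC for every non-empty subset of the columns of X."""
    n, m = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    mse_all = sse_of_fit(X, y) / (n - m - 1)
    rows = []
    for k in range(1, m + 1):
        for cols in combinations(range(m), k):
            sse = sse_of_fit(X[:, list(cols)], y)
            p = k + 1  # parameters, including the intercept
            rows.append({
                "vars": tuple(names[j] for j in cols),
                "r2": 1.0 - sse / sst,
                "cp": sse / mse_all - (n - 2 * p),
                "aic": n * math.log(sse / n) + 2 * p,
            })
    return rows


# Simulated data: only x1 and x3 truly affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=60)

table = all_subsets(X, y, ["x1", "x2", "x3", "x4"])
for row in sorted(table, key=lambda r: r["aic"])[:3]:
    print(row["vars"], round(row["r2"], 3), round(row["cp"], 2), round(row["aic"], 2))
```

With 4 candidate predictors there are 2⁴ − 1 = 15 models, and the row for the full model has Cp equal to its number of parameters, as the note above states.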

Variable selection methods

• All possible regressions (or best subsets)
    proc reg; model … / selection=rsquare best=;
    R: leaps package: regsubsets()
  2^p − 1 models: p=10 → 1023 models
  Useful overview, but beware of:
  • Effects of collinearity
  • Influential observations (n: small, moderate)
  • Lurking variables: unmeasured, but important
  → Use R², Cp, AIC to select candidate models, to be explored in more detail, not for final selection

Variable selection methods

• Forward selection
    proc reg; model … / selection=forward SLentry=.10;
  At each step, find the variable Xk with the largest partial Fk value

$$F_k = \frac{MSR(X_k \mid \text{others})}{MSE(X_k, \text{others})}$$

  If Pr(Fk) < SLentry: add Xk to the model; else STOP

  Variable inc Entered: R-Square = 0.6175 and C(p) = 11.2968

• If Pr(Fk) > SLstay: remove from model; else STOP
  Result depends on SLstay (liberal default)

SAS output fragment (row labels lost in extraction):
F Value    11.76  43.94  0.02  2.93  12.54
Pr > F     0.0013  …  0.0117  0.5497  0.0003  0.0331
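The forward-selection rule described above (add the candidate with the largest partial F, stop when none qualifies) can be sketched in Python as follows. Two simplifications are assumed here: the entry test uses a fixed F-to-enter cutoff instead of the SLentry p-value, to avoid needing an F distribution function, and the cutoff 2.84 is only roughly the 0.10 critical value of F with 1 and 40 degrees of freedom. The data are simulated and all names are illustrative.

```python
import numpy as np


def sse_of_fit(X, y):
    """Residual sum of squares of an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def forward_select(X, y, f_to_enter=2.84):
    """Forward selection with an F-to-enter cutoff (a stand-in for SLentry)."""
    n, m = X.shape
    selected = []
    sse_current = float(((y - y.mean()) ** 2).sum())  # intercept-only model
    while len(selected) < m:
        best = None
        for j in range(m):
            if j in selected:
                continue
            sse_new = sse_of_fit(X[:, selected + [j]], y)
            df_resid = n - (len(selected) + 2)  # intercept + selected + candidate
            # Partial F: 1-df reduction in SSE over the MSE of the larger model.
            f_k = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or f_k > best[1]:
                best = (j, f_k, sse_new)
        if best[1] < f_to_enter:
            break  # no remaining candidate passes the entry test: STOP
        selected.append(best[0])
        sse_current = best[2]
    return selected


# Simulated data: columns 0 and 2 carry the signal, 1 and 3 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 4))
y = 2.0 + 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=80)

selected = forward_select(X, y)
print(selected)  # column indices in order of entry
```

A liberal cutoff can still let a pure-noise column enter after the real predictors, which is one reason the result depends on the entry level chosen.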


Variable selection methods

• Stepwise regression
    proc reg; model … / selection=stepwise SLentry=.10 SLstay=.10;
  Start with 2 forward selection steps
  Then alternate:
  • Forward step: Add Xk w/ highest Fk if Pr(Fk) < SLentry
  • Backward step: Remove Xk w/ lowest Fk if Pr(Fk) > SLstay
  • Until: no variables entered or removed

Summary of Stepwise Selection

      Variable  Variable                                   Number    Partial     Model
Step  Entered   Removed   Label                            Vars In  R-Square  R-Square      C(p)  F Value
  1   drivers             Proportion licensed drivers         1      0.4886    0.4886   27.2658    43.94
  2   inc                 Per Capita Personal income ($)      2      0.1290    0.6175   11.2968    15.17
  3   tax                 Motor fuel tax (cents/gal.)         3      0.0573    0.6749    5.3057     7.76
  4   pop                 Population (1000s)                  4      0.0207    0.6956    4.4172     2.93

Pr > F