Week 8 Lecture: Regression Models for Quantitative & Qualitative Predictor Variables (Chapter 8)

Author: Abel Maxwell


Polynomial Regression

Polynomial regression models are commonly used to model quadratic and higher-power responses, mainly because curvilinear responses can then be modeled using MLR rather than nonlinear regression. The approach uses a single predictor variable raised to various powers to fit the curvilinear response. Polynomial regression can be used when the true curvilinear response function is a polynomial function, or is at least well approximated by one. However, polynomial regression models should not be used for extrapolation! The fit statistics and inferences of MLR are applicable to polynomial regression models, and the model form looks similar to any other MLR model. For example, a second-order polynomial regression function with one predictor variable is:

$$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \varepsilon_i$$
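To make the "polynomial regression is just MLR" point concrete, here is a minimal sketch in Python with NumPy (the course itself uses SAS). The data are synthetic and noise-free, generated from a quadratic with made-up coefficients, so least squares should recover those coefficients; the variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic, noise-free data from a known quadratic (illustrative values only).
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 + 2.0 * x + 0.5 * x**2

# Design matrix with columns 1, X, X^2. The model is linear in the betas,
# so ordinary least squares applies even though the fitted curve is not a line.
X = np.column_stack([np.ones_like(x), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # estimates of beta0, beta1, beta2
```

The only step that differs from an ordinary MLR fit is building the squared column before fitting.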

Interaction Effects in the MLR Model

Regression models with interaction effects are in the class of general linear models and are typically encountered when you use regression analysis in an ANOVA context. We will not go into this type of model in detail here, because that was covered in another class. It essentially involves making an interaction predictor variable from two predictor variables, then including it in the MLR model. For example, here is a regression model with two predictor variables and an interaction variable (this is representative of a two-way ANOVA):

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$
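One way to read the cross-product term in a model of this form is to hold the second predictor fixed and ask how the mean response changes with the first. The short derivation below uses only the symbols of the two-predictor interaction model:

```latex
E[Y] = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2
     = (\beta_0 + \beta_2 X_2) + (\beta_1 + \beta_3 X_2)\, X_1
\quad\Longrightarrow\quad
\frac{\partial E[Y]}{\partial X_1} = \beta_1 + \beta_3 X_2
```

So the slope with respect to X1 is no longer a single constant: it shifts by β3 for every unit change in X2, and it reduces to β1 only when β3 = 0.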

You can also fit models with quadratic and higher power interaction effects, but I refer you to the discussion in KNNL for that information. KNNL also provide a useful illustration (Figures 8.8 and 8.9) on page 309 that describes some of the possible interaction effects that you can encounter. You can make contour plots in SAS similar to those in Figures 8.8 and 8.9.

Indicator Variables or Qualitative Predictor Variables

So far, we have been using dependent and independent variables that are measured on a continuous or quantitative scale. We will now discuss regression models that use variables that are measured on a discrete or qualitative scale. In this lecture, we will focus on qualitative independent variables. In a later lecture, we will discuss qualitative dependent variables (logistic regression, Chapter 14).

One Qualitative Predictor Variable

In an MLR model, independent variables can be measured quantitatively or qualitatively. A mixture of quantitative and qualitative independent variables does not affect the validity of a regression model. A qualitative independent variable is often called an indicator or dummy variable.


Example

We will use the insurance firm example in Chapter 8 of Kutner et al. (page 316). In this example, an economist wishes to relate the speed with which a particular insurance innovation is adopted (Y) to the size of the firm (X1) and the type of firm (stock company or mutual company). The response variable, Y, is the number of months elapsed between the time the first firm adopted the innovation and the time the given firm adopted it. The size of the firm, X1, is the amount of total assets of the firm (quantitative). The type of firm is qualitative, so an indicator variable must be used to identify it in the MLR model. Indicator variables are typically assigned the values 0 and 1, though other values can be used. We will use 0 and 1 in this example. Because the type of firm has only two classes, one new independent variable is enough to represent it:

$$X_2 = \begin{cases} 1 & \text{if stock company} \\ 0 & \text{otherwise} \end{cases}$$

The MLR model is then:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$

Thus, X2 will be activated (equal to 1) if the firm is a stock company and deactivated (equal to 0) if the firm is a mutual company. For a stock company, β2 is added to the intercept term, β0. The lines in both cases (stock or mutual) therefore have the same slope but different elevations. Here are the two response functions, depending on the value of X2:

1. $E[Y] = \beta_0 + \beta_1 X_1$ if mutual company

2. $E[Y] = (\beta_0 + \beta_2) + \beta_1 X_1$ if stock company.

As you can see, both regression lines have the same slope, β1, but they have different intercepts. In the first equation, the intercept is β0 while in the second equation the intercept is β0 + β2. You can see the results of the regression on pages 458 – 460. In Table 11.1 on page 459, you can see how they coded the data to fit the model (note the indicator variable coding in column 4). See the sample SAS program, “IndicatorVariableExample.sas”, to see how to code this example in SAS.
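The 0/1 coding and the resulting pair of intercepts can be sketched in Python with NumPy. This is not the course's "IndicatorVariableExample.sas" program and it does not use the textbook data: the firm sizes, the coefficients used to generate the response, and all variable names are hypothetical.

```python
import numpy as np

# Hypothetical firms: size (X1) and an indicator for stock companies (X2).
size = np.array([10.0, 20.0, 30.0, 40.0, 15.0, 25.0, 35.0, 45.0])
stock = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Noise-free response built from made-up coefficients:
# beta0 = 30, beta1 = -0.1, beta2 = 8.
months = 30.0 - 0.1 * size + 8.0 * stock

# Fit Y = b0 + b1*X1 + b2*X2 by ordinary least squares.
X = np.column_stack([np.ones_like(size), size, stock])
b, *_ = np.linalg.lstsq(X, months, rcond=None)

intercept_mutual = b[0]         # X2 = 0
intercept_stock = b[0] + b[2]   # X2 = 1 adds b2 to the intercept
common_slope = b[1]             # shared by both firm types
```

In this sketch the estimate of β2 is the constant vertical distance between the two parallel lines.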

Model With Interaction Effects

You can also incorporate interaction terms that involve both quantitative and qualitative variables. In the insurance innovation example, the economist examined how the size of the firm interacted with the type of the firm. The MLR model with the cross-product term is:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$

The response function can now have different elevations and slopes, depending on the firm:

1. $E[Y] = \beta_0 + \beta_1 X_1$ if mutual company

2. $E[Y] = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X_1$ if stock company.

In these two cases, you can see that the elevation of the regression line is adjusted by β2 and the slope is adjusted by β3 if the company is a stock firm. You can see two possible results of fitting this model to the insurance innovation data in Figure 8.14 on page 325 and Figure 8.15 on page 326. The interaction term involving quantitative and qualitative predictors cannot be interpreted in the usual sense of “interfering” types of interaction. Instead, nonparallel responses that intersect within the scope of the model (Figure 8.14) are called disordinal interactions; those that do not intersect within the scope of the model (Figure 8.15) are called ordinal interactions. When the MLR model with the cross-product term is fit to the insurance innovation data, the coefficient for the interaction term is not significantly different from zero (i.e., do not reject Ho: β3 = 0); see page 327 for the results.
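A small sketch of the cross-product model, again in Python with NumPy and with made-up, noise-free data rather than the insurance innovation data. It recovers the two slopes and then checks where the two lines would cross: setting the two response functions equal gives β2 + β3 X1 = 0, so the crossing point is X1 = −β2/β3. Treating the observed range of X1 as the "scope of the model" is an assumption of this example.

```python
import numpy as np

# Hypothetical firms: size (X1) and stock-company indicator (X2).
size = np.array([10.0, 20.0, 30.0, 40.0, 15.0, 25.0, 35.0, 45.0])
stock = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Made-up coefficients: beta0 = 30, beta1 = -0.10, beta2 = 8, beta3 = -0.05.
months = 30.0 - 0.10 * size + 8.0 * stock - 0.05 * size * stock

# The interaction predictor is simply the product of the two columns.
X = np.column_stack([np.ones_like(size), size, stock, size * stock])
b, *_ = np.linalg.lstsq(X, months, rcond=None)

slope_mutual = b[1]          # X2 = 0
slope_stock = b[1] + b[3]    # X2 = 1 adds b3 to the slope

# The lines intersect where b2 + b3 * X1 = 0.
x_cross = -b[2] / b[3]
crosses_in_scope = bool(size.min() <= x_cross <= size.max())
```

Here the crossing point lies far outside the observed sizes, so under the definitions above this made-up interaction would be ordinal.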

More Complex Indicator Variable Models

Qualitative Variable With More Than Two Classes

If a qualitative independent variable has more than two classes, then more indicator variables are necessary. In the example in Kutner et al. on page 318, they want to build a regression model that predicts tool wear (Y) as a function of tool speed (X1) and tool model. Tool model is a qualitative independent variable with four classes (M1, M2, M3, M4). We will need the following three indicator variables to fit this model:

$$X_2 = \begin{cases} 1 & \text{if tool model M1} \\ 0 & \text{otherwise} \end{cases}$$

$$X_3 = \begin{cases} 1 & \text{if tool model M2} \\ 0 & \text{otherwise} \end{cases}$$

$$X_4 = \begin{cases} 1 & \text{if tool model M3} \\ 0 & \text{otherwise} \end{cases}$$

The full model is then:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \varepsilon_i$$

Thus, the response functions will have different elevations depending on the tool model, with M4 (all three indicators equal to 0) serving as the baseline:

1. $E[Y] = (\beta_0 + \beta_2) + \beta_1 X_1$ if tool model M1

2. $E[Y] = (\beta_0 + \beta_3) + \beta_1 X_1$ if tool model M2

3. $E[Y] = (\beta_0 + \beta_4) + \beta_1 X_1$ if tool model M3

4. $E[Y] = \beta_0 + \beta_1 X_1$ if tool model M4

You can view the results of their example on pages 319 - 320. You can also add interaction terms to the model.
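The coding for a four-class variable can be sketched as follows in Python with NumPy. The speeds, the coefficients, and the helper name `indicator_columns` are hypothetical and are not taken from the textbook example. The sketch also checks a point the coding relies on: with an intercept in the model, adding a fourth indicator would make the design matrix rank deficient, because the four indicators sum to the intercept column.

```python
import numpy as np

def indicator_columns(labels, classes):
    """Return one 0/1 column per class listed in `classes` (hypothetical helper)."""
    return np.column_stack([(labels == c).astype(float) for c in classes])

# Hypothetical tools: model label and speed (X1).
model = np.array(["M1", "M2", "M3", "M4", "M1", "M2", "M3", "M4"])
speed = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])

# Three indicators (X2, X3, X4); M4 is the baseline class.
D = indicator_columns(model, ["M1", "M2", "M3"])

# Noise-free wear from made-up coefficients:
# beta0 = 5, beta1 = 0.2, beta2 = 3, beta3 = 2, beta4 = 1.
wear = 5.0 + 0.2 * speed + D @ np.array([3.0, 2.0, 1.0])

ones = np.ones_like(speed)
X = np.column_stack([ones, speed, D])
b, *_ = np.linalg.lstsq(X, wear, rcond=None)

elevations = {
    "M1": b[0] + b[2],
    "M2": b[0] + b[3],
    "M3": b[0] + b[4],
    "M4": b[0],
}

# With all four indicators plus an intercept there are 6 columns but only rank 5.
X_all = np.column_stack([ones, speed, indicator_columns(model, ["M1", "M2", "M3", "M4"])])
rank_all = np.linalg.matrix_rank(X_all)
```

In general, a qualitative variable with c classes is represented by c − 1 indicator variables when the model includes an intercept.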

More Than One Qualitative Variable

As you may have already considered, you can build MLR models with more than one qualitative variable. In fact, you can build models that have only qualitative variables as predictors; you do not need a quantitative predictor if you do not want one. You can also have interaction terms for multiple qualitative variables. I refer you to page 328 in Kutner et al. for an example with more than one qualitative variable. The procedure follows intuitively from what we have discussed already, so I will not talk about this example in class.


Comparing Two Regression Models

You can use indicator variables to compare two or more regression models. For example, you might wish to determine if two populations are different, or you may want to compare two different regional models for predicting tree volume. We will work the example beginning on page 329 of Kutner et al. to show the procedure for comparing regression models. Basically, you first determine whether the two regression lines are identical; if they are not, you then determine whether the slopes are different. In their example, a company wants to compare the efficiency of two production lines that make soap. For each observation they recorded the amount of scrap produced (Y), the speed of the line (X1), and which of the two production lines was used (see Table 8.5 for the data). They fit the following model to the data:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i1} X_{i2} + \varepsilon_i$$

where:

$X_{i1}$ = line speed

$$X_{i2} = \begin{cases} 1 & \text{if production line 1} \\ 0 & \text{if production line 2} \end{cases}$$

i = 1, 2, …, 27.

They first examined the residuals for trends, and they formally tested whether the error variance was the same for both production lines. The model passed these two diagnostics (results on page 331), so they proceeded to test whether the two regression lines are identical, that is, whether they differ in elevation, slope, or both. Here are the formal hypotheses (page 333):

Ho: β2 = β3 = 0
Ha: not both β2 = 0 and β3 = 0.


The test statistic is:

$$F = \frac{MSR(X_2, X_1 X_2 \mid X_1)}{MSE}$$

where:

$$MSR(X_2, X_1 X_2 \mid X_1) = \frac{SSR(X_2, X_1 X_2 \mid X_1)}{2} = \frac{SSR(X_2 \mid X_1) + SSR(X_1 X_2 \mid X_1, X_2)}{2}$$

You can see that they reject Ho and conclude that the two regression lines are not identical.

They next tested whether the slopes were the same (page 333):

Ho: β3 = 0
Ha: β3 ≠ 0

$$F = \frac{MSR(X_1 X_2 \mid X_1, X_2)}{MSE}$$

You can see that they do not reject Ho and conclude that the two regression lines have equal slopes. Thus, they concluded that although scrap production increases with line speed at the same rate for both production lines, the expected amount of scrap produced at any given speed differs between the two production lines by a constant amount.
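Both F statistics can be computed from error sums of squares of nested models, since each extra sum of squares is a difference in SSE: SSR(X2 | X1) = SSE(X1) − SSE(X1, X2) and SSR(X1X2 | X1, X2) = SSE(X1, X2) − SSE(X1, X2, X1X2). The Python/NumPy sketch below does this for simulated data. It is not the soap production data from Table 8.5; the sample size, the coefficients, the noise level, and the helper `sse` are assumptions made for illustration, and no critical values or p-values are computed.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit of y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return float(resid @ resid)

# Simulated data: two lines with the same slope but different elevations.
rng = np.random.default_rng(0)
x1 = np.tile(np.linspace(100.0, 300.0, 6), 2)   # line speed
x2 = np.repeat([1.0, 0.0], 6)                   # 1 = line 1, 0 = line 2
y = 10.0 + 1.2 * x1 + 50.0 * x2 + rng.normal(0.0, 5.0, size=12)

one = np.ones_like(x1)
sse_x1 = sse(np.column_stack([one, x1]), y)                  # reduced: X1 only
sse_x1_x2 = sse(np.column_stack([one, x1, x2]), y)           # adds X2
sse_full = sse(np.column_stack([one, x1, x2, x1 * x2]), y)   # adds X1*X2

n, p = len(y), 4               # p = number of parameters in the full model
mse = sse_full / (n - p)

# Extra sums of squares as differences in SSE.
ssr_x2_given_x1 = sse_x1 - sse_x1_x2
ssr_x1x2_given_x1_x2 = sse_x1_x2 - sse_full

# Ho: beta2 = beta3 = 0 (identical lines), 2 numerator degrees of freedom.
f_identical = ((ssr_x2_given_x1 + ssr_x1x2_given_x1_x2) / 2) / mse

# Ho: beta3 = 0 (equal slopes), 1 numerator degree of freedom.
f_slopes = (ssr_x1x2_given_x1_x2 / 1) / mse
```

Each statistic would then be compared with the appropriate F critical value, using 2 or 1 numerator degrees of freedom and n − 4 denominator degrees of freedom.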
