Residual Analysis and Outliers

Author: Ginger Burke
Residual Analysis and Outliers Lecture 48 Sections 13.4 - 13.5 Robb T. Koether Hampden-Sydney College

Wed, Apr 11, 2012


Outline

1. Introduction
2. Residual Analysis
3. Nonlinear Regression
4. Outliers and Influential Points
5. Assignment



Introduction

How do we know that a linear regression model is the best choice? What other types of regression are there? There are many other types. How many would you like? The linear model is by far the simplest, but it is not the only choice.


TI-83 Nonlinear Regression

The TI-83 will do a variety of nonlinear regressions. Press STAT > CALC. The list includes

LinReg - Linear regression: $\hat{y} = a + bx$.
QuadReg - Quadratic regression: $\hat{y} = ax^2 + bx + c$.
CubicReg - Cubic regression: $\hat{y} = ax^3 + bx^2 + cx + d$.
QuartReg - Quartic regression: $\hat{y} = ax^4 + bx^3 + cx^2 + dx + e$.
LnReg - Logarithmic regression: $\hat{y} = a + b \ln x$.
ExpReg - Exponential regression: $\hat{y} = ab^x$.
PwrReg - Power regression: $\hat{y} = ax^b$.
Logistic - Logistic regression: $\hat{y} = \dfrac{c}{1 + ae^{-bx}}$.
SinReg - Sinusoidal regression: $\hat{y} = a \sin(bx + c) + d$.
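As a rough illustration of what fitting a model of the form ŷ = ab^x can involve, the Python sketch below takes logarithms of y and runs an ordinary least-squares line on (x, ln y). This is one common approach and is an assumption here; the slides do not say how the calculator computes ExpReg internally. The helper names `fit_line` and `fit_exponential` and the sample data are made up for the illustration.

```python
from math import exp, log


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def fit_exponential(xs, ys):
    """Fit y = a * b**x via a straight-line fit to (x, ln y); needs every y > 0."""
    intercept, slope = fit_line(xs, [log(y) for y in ys])
    return exp(intercept), exp(slope)


# Hypothetical data generated exactly from y = 3 * 2**x.
xs = [0, 1, 2, 3, 4]
ys = [3, 6, 12, 24, 48]

a, b = fit_exponential(xs, ys)
print(f"y-hat = {a:.3f} * {b:.3f}^x")
```

Because the made-up data lie exactly on an exponential curve, the fit should recover a and b up to rounding error; with noisy data the log transform changes how the errors are weighted, so the result can differ from a direct nonlinear least-squares fit.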


The Appropriateness of the Linear Model

We can learn a bit about the nature of the model by examining the residuals. This is called residual analysis. First, we need to find the residuals $e_i = y_i - \hat{y}_i$. Then we draw a scatterplot of x versus e and see whether there is a pattern.


To do this on the TI-83, first find the predicted values $\hat{y}$ and store them in L3:

    Y1(L1) → L3

Then find the residuals and store them in L4:

    L2 − L3 → L4

Then draw a scatterplot of L1 (x) versus L4 (e).
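The same steps can be sketched in Python. The lists `L1` and `L2` play the role of the calculator lists holding x and y; the data values and the helper `fit_line` are hypothetical, chosen only to illustrate the computation.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


L1 = [1, 2, 3, 4, 5]  # x values (made-up data)
L2 = [2, 4, 5, 4, 5]  # y values (made-up data)

a, b = fit_line(L1, L2)

# Predicted values, mirroring Y1(L1) -> L3.
L3 = [a + b * x for x in L1]

# Residuals e = y - y-hat, mirroring L2 - L3 -> L4.
L4 = [y - y_hat for y, y_hat in zip(L2, L3)]

# The pairs (x, e) are the points of the residual plot.
for x, e in zip(L1, L4):
    print(f"x = {x}, residual = {e:+.2f}")
```

Plotting the pairs printed at the end (with any plotting tool) gives the residual plot described above.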


The Residual Plot

Example (Residual Plots)

[Figure: free lunch rate vs. graduation rate. Horizontal axis: Free Lunch Rate; vertical axis: Graduation Rate.]

[Figure: the residual plot. Horizontal axis: Free Lunch Rate; vertical axis: Residuals.]

The Appropriateness of the Linear Model

If the residual plot shows no clear pattern, but just a big blob of points, then the linear model is appropriate. On the other hand, if the residual plot shows a distinct curvature, or any other distinct pattern, then the linear model may not be appropriate.


A Nonlinear Model

Example (A Nonlinear Model)

Consider the following data.

x   1   2   2   2   2   3   3   4   4   5   6   6   7   7   7   8   8
y   2   2   4   4   5   7   8   9  10  12   9  12   7   9  11   9  10

[Figure: the scatterplot.]

[Figure: the regression line.]

[Figure: the residual plot.]

[Figure: quadratic regression.]
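The comparison between the linear and quadratic models for the data in this example can be sketched in Python. This is a minimal sketch that assumes an ordinary least-squares polynomial fit solved through the normal equations; the slides do not state how the calculator computes QuadReg. The helper names `solve_linear_system`, `polyfit`, and `sse` are made up for the illustration.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] of c0 + c1*x + c2*x**2 + ..."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve_linear_system(A, b)


def sse(xs, ys, coeffs):
    """Sum of squared residuals for a polynomial with the given coefficients."""
    total = 0.0
    for x, y in zip(xs, ys):
        y_hat = sum(c * x ** k for k, c in enumerate(coeffs))
        total += (y - y_hat) ** 2
    return total


# Data from the example.
xs = [1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8]
ys = [2, 2, 4, 4, 5, 7, 8, 9, 10, 12, 9, 12, 7, 9, 11, 9, 10]

linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)

sse_linear = sse(xs, ys, linear)
sse_quadratic = sse(xs, ys, quadratic)

print("linear coefficients:   ", [round(c, 3) for c in linear])
print("quadratic coefficients:", [round(c, 3) for c in quadratic])
print("SSE linear:", round(sse_linear, 2), " SSE quadratic:", round(sse_quadratic, 2))
```

A quadratic fit can never have a larger sum of squared residuals than the line on the same data, so the size of the drop, together with the shape of the residual plot, is what suggests whether the extra term is worthwhile.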


Outliers

Definition (Outlier) An outlier is a point with an unusually large residual (e.g., at least 2.5 standard deviations from the mean).

Definition (Influential Point) An influential point is a point that exerts an inordinate influence on the regression line.


An outlier may or may not be influential. An influential point may or may not be an outlier.


Outliers and Influential Points

Example (Outliers and Influential Points)

Consider the following data.

x   1   2   3   4   4   4   5   5   6
y   6   5   5   6   4  10   3   4   3

[Figure: the scatterplot.]

The regression line is $\hat{y} = 7.0 - 0.5x$.

 x     y    $\hat{y}$    $y - \hat{y}$
 1     6    6.5     −0.5
 2     5    6.0     −1.0
 3     5    5.5     −0.5
 4     6    5.0      1.0
 4     4    5.0     −1.0
 4    10    5.0      5.0
 5     3    4.5     −1.5
 5     4    4.5     −0.5
 6     3    4.0     −1.0

The mean residual is 0.0 (always) and the standard deviation of these residuals is 2.0. Thus, the residual 5.0 is 2.5 standard deviations above the mean, so the point (4, 10) is an outlier. But is the point (4, 10) influential? Remove it and see what the effect is.
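The outlier check can be sketched in Python using the data from this example. The sketch assumes the sample standard deviation (dividing by n − 1), which reproduces the value 2.0 quoted above; the helper `fit_line`, the constant `THRESHOLD`, and the rounding step are choices made for the illustration.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Data from the example.
xs = [1, 2, 3, 4, 4, 4, 5, 5, 6]
ys = [6, 5, 5, 6, 4, 10, 3, 4, 3]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

n = len(residuals)
mean_e = sum(residuals) / n  # zero up to rounding error
sd_e = sqrt(sum((e - mean_e) ** 2 for e in residuals) / (n - 1))

THRESHOLD = 2.5  # "at least 2.5 standard deviations from the mean"

outliers = []
for x, y, e in zip(xs, ys, residuals):
    # Round to two decimals so floating-point noise does not push a
    # value that is exactly on the threshold just below it.
    z = round((e - mean_e) / sd_e, 2)
    if abs(z) >= THRESHOLD:
        outliers.append((x, y))

print(f"line: y-hat = {a:.3f} + ({b:.3f})x, residual SD = {sd_e:.3f}")
print("outliers:", outliers)
```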

[Figure: including the point (4, 10).]

[Figure: excluding the point (4, 10).]

The regression line of the remaining points is $\hat{y} = 6.615 - 0.564x$. This is nearly the same as $\hat{y} = 7.0 - 0.5x$, so the point (4, 10) is an outlier but not an influential point.
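The "remove it and see" test can be sketched in Python as a refit with one point left out. The helper names `fit_line` and `refit_without` are hypothetical; the data are those of the example.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def refit_without(points, target):
    """Refit the line after dropping every copy of `target` from `points`."""
    kept = [p for p in points if p != target]
    return fit_line([x for x, _ in kept], [y for _, y in kept])


points = list(zip([1, 2, 3, 4, 4, 4, 5, 5, 6],
                  [6, 5, 5, 6, 4, 10, 3, 4, 3]))

full = fit_line([x for x, _ in points], [y for _, y in points])
reduced = refit_without(points, (4, 10))

print(f"with (4, 10):    y-hat = {full[0]:.3f} + ({full[1]:.3f})x")
print(f"without (4, 10): y-hat = {reduced[0]:.3f} + ({reduced[1]:.3f})x")
```

How large a change counts as "inordinate" is a judgment call; the sketch only reports the two lines so they can be compared.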


Now change the point (4, 10) to the point (12, 12).

x   1   2   3   4   4   5   5   6  12
y   6   5   5   6   4   3   4   3  12

[Figure: scatterplot of the data with the point (12, 12).]

Is (12, 12) an outlier?


The regression line including (12, 12) is $\hat{y} = 2.767 + 0.55x$. Removing (12, 12) changes it to $\hat{y} = 6.615 - 0.564x$.

[Figure: including the point (12, 12).]

[Figure: excluding the point (12, 12).]

Yet the residual of (12, 12) is only 2.63. The standard deviation of the set of residuals is 2.12. (12, 12) is only 1.24 standard deviations above the mean. Therefore, (12, 12) is not an outlier, but it is influential.
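The arithmetic behind this conclusion can be written out from the numbers quoted above. The symbols $z$ (standardized residual) and $s_e$ (standard deviation of the residuals) are notation introduced here for the illustration, and 2.5 is the threshold from the definition of an outlier.

```latex
\begin{align*}
\hat{y}(12) &= 2.767 + 0.55(12) = 9.367, \\
e &= y - \hat{y} = 12 - 9.367 = 2.633 \approx 2.63, \\
z &= \frac{e}{s_e} = \frac{2.63}{2.12} \approx 1.24 < 2.5 .
\end{align*}
```

At the same time, including the point changes the slope from $-0.564$ to $+0.55$, which is why it counts as influential.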


Assignment

Homework

Read Sections 13.4, 13.5, pages 823 - 834.
Let’s Do It! 13.5, 13.6.
Exercises 8, 9, 10, page 835.
