Simple Linear Regression for the Advertising Data

Author: Joy Warren
[Figure: Scatterplot of Revenue vs. Pages of Advertising]

Simple Linear Regression for the Advertising Data

What do we do with the data?

y_i = Revenue of the ith issue
x_i = Pages of advertising in the ith issue
i = 1, ..., n
n = 37 (sample size)

Primary Research Question:
1. How do pages of advertising relate to revenue?

Exploratory Results

r = 0.82, Cov(X, Y) = 65.83

[Figure: Scatterplot of Revenue vs. Pages of Advertising]

1. Form
   • Linear?
2. Direction
   • Positive or negative?
3. Strength
4. Outliers

SLR Model Fit

ŷ = −4.09 + 1.67 × (Pages), σ̂ = 7.432

[Figure: Scatterplot of Revenue vs. Pages of Advertising with the fitted regression line]
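The course webpage provides R and SAS code; purely as an illustration, here is a Python sketch of the least-squares computations behind a fitted line of this form. The data below are made up (the slides do not reproduce the 37 advertising observations), so the numbers will not match the slide's fit.

```python
# Least-squares fit for simple linear regression, from the textbook
# formulas: b1 = Cov(X, Y) / Var(X), b0 = ybar - b1 * xbar.
# Data are hypothetical, for illustration only.
from statistics import mean

pages = [5, 8, 10, 12, 15, 18, 20, 25]    # hypothetical x values
revenue = [6, 9, 13, 16, 21, 27, 30, 38]  # hypothetical y values

n = len(pages)
xbar, ybar = mean(pages), mean(revenue)

sxy = sum((x - xbar) * (y - ybar) for x, y in zip(pages, revenue))
sxx = sum((x - xbar) ** 2 for x in pages)
b1 = sxy / sxx          # slope
b0 = ybar - b1 * xbar   # intercept

# Residual standard error: sqrt(SSE / (n - 2))
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(pages, revenue))
sigma_hat = (sse / (n - 2)) ** 0.5
print(b0, b1, sigma_hat)
```

A useful sanity check on any least-squares fit is that the residuals sum to zero.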

Is this model any good? Or, does it explain the structure in the data well?

SLR Model Fit

Measures of Goodness of Fit:

1. R²

Total Sum of Squares: SST = Σ_{i=1}^{n} (y_i − ȳ)²

Sum of Squared Errors: SSE = Σ_{i=1}^{n} (y_i − ŷ_i)², where ŷ_i = β̂₀ + β̂₁ x_i

Sum of Squares from Regression: SSR = Σ_{i=1}^{n} (ŷ_i − ȳ)²

SST (total error) = SSE (error after regression) + SSR (error taken away by regression)

R² ∈ [0, 1]: R² = SSR / SST = 1 − SSE / SST = 0.6719

Interpretation: R² is the percent of variation in revenue that is explained by pages of advertising.

Intuition: the percent "better off" you are in predicting revenue when you include the information from pages of advertising.

Issue: R² only says how well your model explains the data; it says nothing about how well your model predicts.
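The sums-of-squares decomposition above can be checked numerically. This is a sketch on hypothetical data (the real data give R² = 0.6719; these made-up numbers will not):

```python
# Computing R^2 = SSR/SST = 1 - SSE/SST for a simple linear regression,
# and verifying the decomposition SST = SSE + SSR. Hypothetical data.
from statistics import mean

x = [5, 8, 10, 12, 15, 18, 20, 25]
y = [6, 9, 13, 16, 21, 27, 30, 38]

xbar, ybar = mean(x), mean(y)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / \
     sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * a for a in x]

sst = sum((b - ybar) ** 2 for b in y)            # total error
sse = sum((b - h) ** 2 for b, h in zip(y, yhat)) # error after regression
ssr = sum((h - ybar) ** 2 for h in yhat)         # error taken away

r2 = ssr / sst
print(round(r2, 4))
```

Because SST = SSE + SSR, the two formulas for R² agree.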

SLR Model Fit

Measures of Goodness of Fit:

2. Predictive Accuracy via Cross Validation
   i.   Randomly remove p% of your data (called a test set)
   ii.  Fit the model to the remaining (1 − p)% of your data (called a training set)
   iii. Use the fit to predict the held-out test data.

Predictive Bias = (1 / n_test) Σ_{i=1}^{n_test} (ŷ_i − y_i)

Predictive Mean Square Error (PMSE) = (1 / n_test) Σ_{i=1}^{n_test} (ŷ_i − y_i)²

RPMSE = √PMSE

SLR Model Fit

Interpreting Cross Validation Metrics

Predictive Bias = (1 / n_test) Σ_{i=1}^{n_test} (ŷ_i − y_i)

Bias: systematic errors in estimation. For example, if bias > 0 then predictions are, on average, too high, and if bias = 0 predictions are just right.

RPMSE = √( (1 / n_test) Σ_{i=1}^{n_test} (ŷ_i − y_i)² )

Root predicted mean square error: how far off my predictions are on average.
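The three-step cross-validation recipe above can be sketched as follows. This is an illustrative Python version with simulated data, not the course's R/SAS code; the split fraction and seed are arbitrary choices.

```python
# One cross-validation split: hold out p% as a test set, fit SLR on the
# training set, then compute predictive bias and RPMSE on the test set.
import random
from statistics import mean

random.seed(1)
x = [float(v) for v in range(5, 26)]
y = [2.0 * v + random.gauss(0, 3) for v in x]  # simulated linear-ish data

# i. randomly remove p% of the data (test set)
p = 0.25
idx = list(range(len(x)))
random.shuffle(idx)
n_test = int(p * len(x))
test, train = idx[:n_test], idx[n_test:]

# ii. fit the model to the remaining (1 - p)% (training set)
xt = [x[i] for i in train]
yt = [y[i] for i in train]
xbar, ybar = mean(xt), mean(yt)
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(xt, yt)) / \
     sum((a - xbar) ** 2 for a in xt)
b0 = ybar - b1 * xbar

# iii. predict the held-out test data and score the predictions
pred = [b0 + b1 * x[i] for i in test]
obs = [y[i] for i in test]
bias = mean(p_ - o for p_, o in zip(pred, obs))
rpmse = mean((p_ - o) ** 2 for p_, o in zip(pred, obs)) ** 0.5
print(bias, rpmse)
```

In practice this split-fit-predict loop is repeated many times and the metrics are averaged.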

SLR Model Fit

ŷ = −4.09 + 1.67 × (Pages), σ̂ = 7.432

[Figure: Scatterplot of Revenue vs. Pages of Advertising with the fitted regression line]

Are our assumptions OK?
1. Linear – maybe
2. Independent – maybe
3. Normal – maybe
4. Equal variance – no

Assessing Model Assumptions

Tools for Assessing Model Assumptions:

1. Residuals vs. Fitted Values Scatterplot

ε̂_i = y_i − (β̂₀ + β̂₁ x_i) ⇒ the residuals should be independent of ŷ_i (no pattern) with constant variance (if there is a pattern, they are likely dependent).

[Figure: Residuals vs. fitted values scatterplot]


Assessing Model Assumptions

Tools for Assessing Model Assumptions:

1. Residuals vs. Fitted Values Scatterplot
   • What is "close enough" to homoskedastic (equal variance)?

Breusch–Pagan Test (mathematical details are beyond the prerequisites for this course):
H₀: Data are homoskedastic
Hₐ: Data are heteroskedastic
p-value: 0.01

Warning: the Breusch–Pagan test is highly sensitive, so always check it against the residuals vs. fitted values plot.
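Although the slides skip the math, the idea behind the test is simple enough to sketch: regress the squared residuals on x; under homoskedasticity, n times the R² of that auxiliary regression is approximately chi-squared with 1 degree of freedom. The code below is an illustrative stdlib-only version on simulated heteroskedastic data, not a replacement for a statistical package's implementation.

```python
# Sketch of the Breusch-Pagan idea: auxiliary regression of squared
# residuals on x, LM statistic = n * R^2 of that regression.
# Data are simulated with variance growing in x (heteroskedastic).
import math
import random
from statistics import mean, NormalDist

random.seed(2)
x = [float(v) for v in range(5, 45)]
y = [1.0 + 2.0 * v + random.gauss(0, 0.3 * v) for v in x]

def slr(xs, ys):
    """Least-squares intercept and slope."""
    xb, yb = mean(xs), mean(ys)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(xs, ys)) / \
         sum((a - xb) ** 2 for a in xs)
    return yb - b1 * xb, b1

# main regression and squared residuals
b0, b1 = slr(x, y)
e2 = [(b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y)]

# auxiliary regression of e^2 on x, then its R^2
g0, g1 = slr(x, e2)
m = mean(e2)
sst = sum((v - m) ** 2 for v in e2)
ssr = sum((g0 + g1 * a - m) ** 2 for a in x)
lm = len(x) * ssr / sst

# chi-squared(1) tail probability via the normal: P(Z^2 > lm) = 2 P(Z > sqrt(lm))
p_value = 2 * (1 - NormalDist().cdf(math.sqrt(lm)))
print(lm, p_value)
```

In R this is `bptest` from the lmtest package; the point of the sketch is only to show where the p-value comes from.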

Assessing Model Assumptions

Tools for Assessing Model Assumptions:

2. Histogram (density) of standardized residuals, which should look normal: ε̂_i / SE(ε̂_i) ~ N(0, 1) (in theory)

[Figure: Histogram of standardized residuals – do we have outliers?]

Assessing Model Assumptions

Tools for Assessing Model Assumptions:

3. Normal Quantile–Quantile (QQ) Plot: ε̂_i / SE(ε̂_i) ~ N(0, 1) (in theory)
   i.   Sort the ε̂_i / SE(ε̂_i) so that ε̂_(1)/SE(ε̂_(1)) < ··· < ε̂_(n)/SE(ε̂_(n))
   ii.  Find z_(1), ..., z_(n) so that Prob(Z < z_(i)) ≈ i/n under a standard normal
   iii. ε̂_(i)/SE(ε̂_(i)) ≈ z_(i) if the normal assumption holds

[Figure: Normal Q–Q plot of the standardized residuals – here's that outlier again!]
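Steps i–iii of the QQ-plot construction can be sketched directly. The residuals below are hypothetical, and the plotting positions use (i − 0.5)/n, a common variant of i/n that avoids asking for the quantile of probability 1:

```python
# Building the QQ-plot pairs: sorted standardized residuals against
# matching standard-normal quantiles. Residuals are hypothetical.
from statistics import NormalDist

std_resid = [0.3, -1.2, 0.8, 1.9, -0.4, -0.1, 1.1, -2.0, 0.5, -0.6]
n = len(std_resid)

# i. sort the standardized residuals
sample_q = sorted(std_resid)

# ii. matching normal quantiles z_(i) with Prob(Z < z_(i)) ~ (i - 0.5)/n
theor_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# iii. if normality holds, these pairs fall near the 45-degree line
for t, s in zip(theor_q, sample_q):
    print(round(t, 2), round(s, 2))
```

A point far off the line in the upper-right corner is exactly how the outlier shows up in the slide's plot.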

Outliers in SLR

Two questions:

1. Should we worry about outliers?

β̂₁ = Corr(Y, X) · (s_y / s_x)

Correlation is sensitive to outliers, so our regression line will be sensitive as well.

2. How do we identify outliers?
   i.  Graphical – histogram/QQ plot of standardized residuals
   ii. Cook's Distance:

D_i = Σ_{j=1}^{n} (ŷ_j − ŷ_{j(i)})² / (2σ̂²)

ŷ_j: prediction of the jth point using all the data.
ŷ_{j(i)}: prediction of the jth point using all the data EXCEPT the ith point.
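Computed literally from the formula above, Cook's distance means refitting the model once per observation. This is a sketch on hypothetical data with one deliberately influential point tacked onto the end:

```python
# Cook's distance by brute force: D_i = sum_j (yhat_j - yhat_j(i))^2 / (2 sigma^2),
# refitting the SLR without point i each time. Hypothetical data; the
# last point is a planted outlier.
from statistics import mean

x = [5.0, 7, 9, 11, 13, 15, 17, 19, 21, 25]
y = [8.0, 11, 14, 17, 20, 23, 26, 29, 32, 5]  # last point is way off the line

def fit(xs, ys):
    xb, yb = mean(xs), mean(ys)
    b1 = sum((a - xb) * (b - yb) for a, b in zip(xs, ys)) / \
         sum((a - xb) ** 2 for a in xs)
    return yb - b1 * xb, b1

n = len(x)
b0, b1 = fit(x, y)
yhat = [b0 + b1 * a for a in x]
sigma2 = sum((b - h) ** 2 for b, h in zip(y, yhat)) / (n - 2)

cooks = []
for i in range(n):
    xs = x[:i] + x[i + 1:]
    ys = y[:i] + y[i + 1:]
    c0, c1 = fit(xs, ys)                     # refit without point i
    d = sum((h - (c0 + c1 * a)) ** 2 for a, h in zip(x, yhat)) / (2 * sigma2)
    cooks.append(d)

flagged = [i for i, d in enumerate(cooks) if d > 4 / n]  # rule-of-thumb cutoff
print(flagged)
```

The planted outlier is flagged by the 4/n cutoff from the next slide. R's `cooks.distance` uses an equivalent closed-form expression rather than refitting n times.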

Outliers in SLR

Cook's Distance
• Use a cutoff (rule of thumb) of 4/n to flag a point as "influential" or an "outlier".

[Figure: Scatterplot of Revenue vs. Pages of Advertising with the largest Cook's distances labeled: D₁ = 0.414, D₂ = 0.395, D₈ = 0.409]

Assessing Model Assumptions

Tools for Assessing Model Assumptions:

2. What is "close enough" to normal? Kolmogorov–Smirnov (KS) Test (mathematical details are beyond the prerequisites for this class):
H₀: Data come from a normal distribution
Hₐ: Data don't come from a normal distribution
p-value: 0.2689

Dealing with Violations of the SLR Assumptions

Based on the above diagnostics we know:
1. The linearity assumption is a bit sketchy.
2. Homoskedasticity is certainly an issue.
3. Normality is OK, with the exception of a few outliers.

So, what do we do?
1. Change your assumptions (hard but preferred)
2. Transformations

Idea:

t_Y(y_i) = β₀ + β₁ t_X(x_i) + ε_i

t_Y(y_i) = transformation of y
t_X(x_i) = transformation of x

Example:

log(y_i) = β₀ + β₁ √x_i + ε_i

t_Y(y_i) = log(y_i)
t_X(x_i) = √x_i

Transformations

Name          Transformation                              Fixes                               Issues
log           ln(y_i)                                     Nonlinearity, heteroskedasticity    Only if positive
Square root   √y_i                                        Nonlinearity, heteroskedasticity    Only if positive
Ratio         y_i / x_i                                   Heteroskedasticity                  Reverses relationship
Power         y_i^λ, λ ∈ (−1, 1)                          Heteroskedasticity                  Hard to interpret
Box–Cox       (y_i^λ − 1)/λ if λ ≠ 0; ln(y_i) if λ = 0    Non-normality                       Impossible to interpret

With Box–Cox transformations, λ can be treated as a parameter and estimated using least squares OR maximum likelihood.
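The two-branch Box–Cox definition in the table can be sketched in a few lines. One detail worth seeing numerically is that the λ = 0 branch is not arbitrary: as λ → 0, (y^λ − 1)/λ approaches ln(y), so the family is continuous in λ:

```python
# The Box-Cox family from the table: (y^lam - 1)/lam for lam != 0,
# ln(y) for lam = 0. Defined only for positive y.
import math

def box_cox(y, lam):
    if y <= 0:
        raise ValueError("Box-Cox requires positive y")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam

print(box_cox(10.0, 0.5), box_cox(10.0, 0.0))
# near lam = 0 the power branch nearly agrees with the log branch:
print(abs(box_cox(10.0, 1e-8) - math.log(10.0)))
```

Note that λ = 1 gives y − 1, i.e. essentially no transformation, which is why estimating λ lets the data decide how much to transform.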

Transformations

[Figure: Scatterplots of the advertising data under candidate transformations – Revenue vs. Pages, ln(Revenue) vs. ln(Pages), and sqrt(Revenue) vs. sqrt(Pages)]

Transformations

Issues with transformations:
1. With the exception of Box–Cox, you're guessing, so your choice of transformation is subjective.
2. They change the interpretation of the parameters.
   • You need to back-transform in order to produce anything interpretable.
3. They change the standard errors of the parameters.
4. It is not always easy to keep track of all your assumptions.

Advertisement Example

ln(y_i) = β₀ + β₁ ln(x_i) + ε_i

⇒ fitted model: ln(ŷ) = −1.05 + 1.48 × ln(x)
⇒ ŷ = exp{−1.05 + 1.48 × ln(x)}

Cook's distance values are better now.

[Figure: Fitted curve on the Revenue vs. Pages of Advertising scatterplot, histogram of standardized residuals, and normal Q–Q plot for the transformed model]
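The back-transform step for this log-log model can be sketched as follows, taking the slide's fitted coefficients at face value. Note that exp{−1.05 + 1.48 ln(x)} simplifies to e^{−1.05} · x^{1.48}, a power curve on the original scale:

```python
# Back-transforming the log-log fit ln(yhat) = -1.05 + 1.48 * ln(x)
# to predict revenue on the original scale. Coefficients from the slide.
import math

def predict_revenue(pages):
    log_pred = -1.05 + 1.48 * math.log(pages)
    return math.exp(log_pred)

for pages in (5, 10, 15, 20, 25):
    print(pages, round(predict_revenue(pages), 1))
```

This is the "back-transform in order to produce anything interpretable" step from the transformations slide: predictions are only meaningful once mapped back to revenue units.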

End of Advertisement Analysis (see webpage for R and SAS code)