Partial Least Squares: A tutorial
Lutgarde Buydens

Author: Barry Dean
12/9/2013

• Multivariate regression
• Multiple Linear Regression (MLR)
• Principal Component Regression (PCR)
• Partial Least Squares (PLS)
• Validation
• Preprocessing

Multivariate Regression

[Figure: raw data, spectra plotted against wavenumber (cm-1), alongside the data matrices X (n rows × p columns) and Y (n rows × k columns)]

Rows (n): cases, observations, …
• Analytical observations of different samples
• Experimental runs
• Persons
• …

Columns: variables, classes, tags
• p: spectral variables, analytical measurements
• k: class information, concentration, …

X: independent variables (will always be available)
Y: dependent variables (to be predicted later from X)

Y = f(X): predict Y from X
• MLR: Multiple Linear Regression
• PCR: Principal Component Regression
• PLS: Partial Least Squares

MLR: Multiple Linear Regression

From univariate to Multiple Linear Regression (MLR)

Univariate: $y = b_0 + b_1 x_1 + \varepsilon$
• $b_0$: intercept
• $b_1$: slope
• Least squares regression

Multiple Linear Regression: $y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + \varepsilon$
• $\hat{Y} = Y - E$
• Least squares regression maximizes $r(y, \hat{y})$

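MLR: Multiple Linear Regression

$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + \varepsilon$

$\hat{Y} = Y - E$

In matrix form, with a leading column of ones in X for the intercept $b_0$:

$y_{n \times 1} = X_{n \times (p+1)} \, b_{(p+1) \times 1} + e_{n \times 1}$

$b = (X^T X)^{-1} X^T y$

For k dependent variables: $Y_{n \times k} = X_{n \times p} B_{p \times k} + E_{n \times k}$

• Uncorrelated X-variables required: problems arise when $r(x_1, x_2) \approx 1$
• $n \geq p + 1$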
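The matrix formula above can be tried out in a few lines. The following is a minimal sketch, assuming NumPy; the function name `fit_mlr` and the demo data are hypothetical and not part of the original slides. It uses `np.linalg.lstsq`, which solves the same least-squares problem as $b = (X^T X)^{-1} X^T y$ without forming the inverse explicitly.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bp*xp; returns [b0, b1, ..., bp]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if n < p + 1:
        raise ValueError("MLR needs n >= p + 1 observations")
    X1 = np.column_stack([np.ones(n), X])  # leading column of ones carries b0
    # Same solution as b = (X^T X)^-1 X^T y, computed in a numerically safer way
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2
X_demo = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 5.0], [4.0, 2.0]])
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
b_demo = fit_mlr(X_demo, y_demo)
print(b_demo)
```

Because the demo data contain no noise, the fitted coefficients should recover the generating values up to rounding error.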

MLR: Multiple Linear Regression

Disadvantages: $(X^T X)^{-1}$

• Uncorrelated X-variables required
• $r(x_1, x_2) \approx 1$: fits a plane through a line!!

$y_{n \times 1} = X_{n \times p} \, b_{p \times 1} + e_{n \times 1}$

      Set A             Set B
    x1      x2        x1      x2         y
 -1.01   -0.99     -1.01   -0.99     -1.89
  3.23    3.25      3.23    3.25     10.33
  5.49    5.55      5.49    5.55     19.09
  0.23    0.21      0.23    0.23      2.19
 -2.87   -2.91     -2.87   -2.91     -8.09
  3.67    3.76      3.67    3.76     11.29

$y = b_1 x_1 + b_2 x_2 + \varepsilon$

MLR        b1       b2      R2
Set A    10.3    -6.92    0.98
Set B    2.96     0.28    0.98
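The instability shown in the table can be reproduced directly from the six rows of Set A and Set B. This is a minimal sketch, assuming NumPy and the no-intercept model $y = b_1 x_1 + b_2 x_2 + \varepsilon$ from the slide; the variable and function names are illustrative only.

```python
import numpy as np

y = np.array([-1.89, 10.33, 19.09, 2.19, -8.09, 11.29])
x1 = np.array([-1.01, 3.23, 5.49, 0.23, -2.87, 3.67])
x2_a = np.array([-0.99, 3.25, 5.55, 0.21, -2.91, 3.76])
x2_b = x2_a.copy()
x2_b[3] = 0.23                      # the only value that differs between Set A and Set B

X_a = np.column_stack([x1, x2_a])
X_b = np.column_stack([x1, x2_b])

def mlr_no_intercept(X, y):
    # b = (X^T X)^-1 X^T y, written as a linear solve
    return np.linalg.solve(X.T @ X, X.T @ y)

b_a = mlr_no_intercept(X_a, y)
b_b = mlr_no_intercept(X_b, y)
r_a = np.corrcoef(x1, x2_a)[0, 1]   # correlation between the two X-variables

print("r(x1, x2), Set A:", r_a)
print("Set A coefficients:", b_a)
print("Set B coefficients:", b_b)
```

Changing a single entry by 0.02 moves the coefficients from roughly (10.3, -6.92) to roughly (2.96, 0.28), because $X^T X$ is close to singular when $x_1$ and $x_2$ are almost perfectly correlated.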

MLR: Multiple Linear Regression

Disadvantages: $(X^T X)^{-1}$
• Uncorrelated X-variables required
• $n \geq p + 1$

Dimension reduction:
→ Variable selection
→ Latent variables (PCR, PLS)

PCR: Principal Component Regression

Step 1: Perform PCA on the original X
• X (n rows × p cols) → PCA → T (n rows × a cols)

Step 2: Use the orthogonal PC-scores as independent variables in an MLR model
• y regressed on T → MLR → $a_1, a_2, \dots, a_a$

Step 3: Calculate b-coefficients from the a-coefficients
• $a_1, a_2, \dots, a_a$ → $b_0, b_1, \dots, b_p$

PCR: Principal Component Regression

Step 0: Mean-center X

Step 1: Perform PCA
• $X = T P^T$
• $X^* = (T P^T)^*$

Step 2: Perform MLR
• $Y = T A$
• $A = (T^T T)^{-1} T^T Y$

Step 3: Calculate B (MLR on reconstructed $X^* = (T P^T)^*$)
• $Y = X^* B$
• $Y = (T P^T) B$
• $A = P^T B$
• $B = (P P^T)^{-1} P A$
• $B = P A$ (the loadings are orthonormal, $P^T P = I$)

Calculate $b_0$'s: $b_0 = \bar{y} - \bar{\hat{y}}$

[Figure: PC1 in the space of $x_1$, $x_2$, …, $x_p$]

Dimension reduction: use scores (projections) on latent variables that explain maximal variance in X
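Steps 0 to 3 can be sketched compactly with an SVD-based PCA. This is a minimal sketch, assuming NumPy; the function name `pcr_coefficients` and the `center` switch are hypothetical additions. The intercept is computed here as $\bar{y} - \bar{x}^T B$, which is one common way to obtain $b_0$ after mean-centering and is an assumption rather than the slide's exact formula. With `center=False` and one PC, the toy data of Set A (no-intercept model) is used to check the result.

```python
import numpy as np

def pcr_coefficients(X, y, n_components, center=True):
    """Principal Component Regression; returns (b0, B) for y ~ b0 + X @ B."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0) if center else np.zeros(X.shape[1])
    y_mean = y.mean() if center else 0.0
    Xc = X - x_mean                                    # Step 0: mean-center X
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # Step 1: PCA via SVD
    P = Vt[:n_components].T                            # loadings, p x a
    T = Xc @ P                                         # scores, n x a
    A = np.linalg.solve(T.T @ T, T.T @ (y - y_mean))   # Step 2: MLR of y on T
    B = P @ A                                          # Step 3: B = P A
    b0 = y_mean - x_mean @ B                           # intercept (assumed form)
    return b0, B

# Set A from the slides, fitted without intercept and with one PC
y = np.array([-1.89, 10.33, 19.09, 2.19, -8.09, 11.29])
X_a = np.array([[-1.01, -0.99], [3.23, 3.25], [5.49, 5.55],
                [0.23, 0.21], [-2.87, -2.91], [3.67, 3.76]])
b0_1pc, B_1pc = pcr_coefficients(X_a, y, n_components=1, center=False)
print(B_1pc)
```

With a single PC the coefficients should come out near (1.60, 1.62). Using all PCs gives back the MLR solution, so the stabilising effect comes entirely from dropping the low-variance direction.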

PCR: Principal Component Regression

Optimal number of PC's

Calculate cross-validation RMSE for different # PC's:

$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$

PLS: Partial Least Squares Regression

Phase 1 (PLS): X (n rows × p cols) → T (n rows × a cols)
Phase 2 (MLR): Y (n rows × k cols) regressed on T → $a_1, a_2, \dots, a_a$
Phase 3: $a_1, a_2, \dots, a_a$ → $b_0, b_1, \dots, b_p$

PLS: Partial Least Squares Regression

Projection to Latent Structure

• PCR: use PC, maximizes variance in X
• PLS: use LV, maximizes covariance (X, y); for the scores $t = Xw$, $\mathrm{cov}(t, y) = \sqrt{\mathrm{var}(t)} \, \sqrt{\mathrm{var}(y)} \, \mathrm{cor}(t, y)$

[Figure: PC1 (PCR) and LV1 with weight vector w (PLS) in the space of $x_1$, $x_2$, …, $x_p$]

Phase 1: Calculate new independent variables (T)

Sequential algorithm: latent variables and their scores are calculated sequentially
• Step 0: Mean center X
• Step 1: Calculate w. LV1 = $w_1$ maximizes covariance (X, Y): SVD on $X^T Y$
  – $(X^T Y)_{p \times k} = W_{p \times a} D_{a \times a} Z^T_{a \times k}$
  – $w_1$ = 1st col. of W

PLS: Partial Least Squares Regression

Phase 1 (PLS): X (n rows × p cols) → T (n rows × a cols)
Phase 2 (MLR): Y (n rows × k cols) regressed on T → $a_1, a_2, \dots, a_a$
Phase 3: $a_1, a_2, \dots, a_a$ → $b_0, b_1, \dots, b_p$

Phase 1: Calculate new independent variables (T)

Sequential algorithm: latent variables and their scores are calculated sequentially
• Step 1: Calculate LV1 = $w_1$ that maximizes covariance (X, Y): SVD on $X^T Y$
  – $(X^T Y)_{p \times k} = W_{p \times a} D_{a \times a} Z^T_{a \times k}$
  – $w_1$ = 1st col. of W
• Step 2: Calculate $t_1$, scores (projections) of X on $w_1$
  – $t_{n \times 1} = X_{n \times p} \, w_{p \times 1}$

[Figure: weight vector w in the space of $x_1$, $x_2$, …, $x_p$]
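Steps 1 and 2 for the first latent variable can be sketched as follows. This is a minimal sketch, assuming NumPy; the function name `pls_first_lv` and the `center` switch are hypothetical. The regression of y on $t_1$ and the back-transformation to b-coefficients at the end correspond to Phases 2 and 3 for a single LV. Set A from the MLR example is used with the no-intercept model, so centering is switched off there.

```python
import numpy as np

def pls_first_lv(X, Y, center=True):
    """First PLS weight vector w1 (SVD on X^T Y) and scores t1 = X w1."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)   # n x k
    if center:
        X = X - X.mean(axis=0)                           # Step 0: mean center X
    W, d, Zt = np.linalg.svd(X.T @ Y, full_matrices=False)
    w1 = W[:, 0]                                         # Step 1: 1st column of W
    t1 = X @ w1                                          # Step 2: scores on w1
    return w1, t1

y = np.array([-1.89, 10.33, 19.09, 2.19, -8.09, 11.29])
X_a = np.array([[-1.01, -0.99], [3.23, 3.25], [5.49, 5.55],
                [0.23, 0.21], [-2.87, -2.91], [3.67, 3.76]])

w1, t1 = pls_first_lv(X_a, y, center=False)
a1 = (t1 @ y) / (t1 @ t1)      # Phase 2: MLR of y on the single score vector
b = w1 * a1                    # Phase 3: back to coefficients for x1 and x2
print(b)
```

The sign of `w1` returned by the SVD is arbitrary, but it cancels in `b`, which should come out near (1.60, 1.62) for this data.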

PLS: Partial Least Squares Regression

Optimal number of LV's

Calculate cross-validation RMSE for different # LV's:

$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$

MLR, PCR, PLS:

      Set A             Set B
    x1      x2        x1      x2         y
 -1.01   -0.99     -1.01   -0.99     -1.89
  3.23    3.25      3.23    3.25     10.33
  5.49    5.55      5.49    5.55     19.09
  0.23    0.21      0.23    0.23      2.19
 -2.87   -2.91     -2.87   -2.91     -8.09
  3.67    3.76      3.67    3.76     11.29

$y = b_1 x_1 + b_2 x_2 + \varepsilon$

           Set A             Set B
         b1      b2        b1      b2
MLR    10.3   -6.92      2.96    0.28
PCR    1.60    1.62      1.60    1.62
PLS    1.60    1.62      1.60    1.62

VALIDATION

RMSECV: common measure for prediction error

Estimating prediction error

Basic principle: test how well your model works with new data it has not seen yet!

A Biased Approach

Prediction error of the samples the model was built on:
• Error is biased!
• Samples also used to build the model
→ model is biased towards accurate prediction of these specific samples

Validation: Basic Principle

Basic principle: test how well your model works with new data it has not seen yet!

Split data in training and test set. Several ways:
• One large test set
• Leave one out and repeat: LOO
• Leave n objects out and repeat: LNO
• ...

Apply entire model procedure on the test set

Validation

Training and test sets

Split in training and test set.
• Test set should be representative of training set
• Random choice is often the best
• Check for extremely unlucky divisions
• Apply whole procedure on the test and validation sets

Full data set → Training set → Build model: $b_0, \dots, b_p$
Full data set → Test set → RMSEP

Remark: for final model use whole data set.
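The training/test scheme can be illustrated with a random split and an MLR model. This is a minimal sketch, assuming NumPy; the simulated data, the 25% test fraction, and all names are hypothetical choices made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the sketch is reproducible

# Hypothetical data: 40 samples, 3 variables, known coefficients plus noise
n, p = 40, 3
X = rng.normal(size=(n, p))
y = 0.5 + X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

# Random choice of the test set: about 25% of the samples
idx = rng.permutation(n)
n_test = n // 4
test_idx, train_idx = idx[:n_test], idx[n_test:]

def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Build the model on the training set only
b, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

rmse_train = rmse(y[train_idx], add_intercept(X[train_idx]) @ b)  # biased: samples used for fitting
rmsep = rmse(y[test_idx], add_intercept(X[test_idx]) @ b)         # samples the model has not seen

print("RMSE on training samples:", rmse_train)
print("RMSEP on test samples:  ", rmsep)
```

Only `rmsep` estimates the prediction error for new data; the training-set value is shown for comparison with the biased approach.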

Cross-validation

• Most simple case: Leave-One-Out (= LOO, segment = 1 sample). Normally 10-20% out (= LnO).
• Remark: for final model use whole data set.

Cross-validation: an example

• The data
• Split data into training set and validation set
• Build a model on the training set
• Split data again into training set and validation set
  – Until all samples have been in the validation set once
  – Common: Leave-One-Out (LOO)
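The loop described above can be sketched with a small helper that returns the RMSECV. This is a minimal sketch, assuming NumPy and an MLR model as a stand-in for any regression method; `rmsecv`, `fit_mlr` and `predict_mlr` are hypothetical names. Segments are taken as contiguous blocks of rows, so rows with a systematic order should be shuffled first.

```python
import numpy as np

def fit_mlr(X, y):
    X1 = np.column_stack([np.ones(len(X)), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return b

def predict_mlr(b, X):
    return b[0] + X @ b[1:]

def rmsecv(X, y, n_segments):
    """Cross-validated RMSE; n_segments == len(y) gives Leave-One-Out."""
    n = len(y)
    y_hat = np.empty(n)
    for held_out in np.array_split(np.arange(n), n_segments):
        train = np.setdiff1d(np.arange(n), held_out)
        b = fit_mlr(X[train], y[train])               # model never sees held_out
        y_hat[held_out] = predict_mlr(b, X[held_out])
    # Every sample has been in the validation set exactly once
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=20)

rmsecv_loo = rmsecv(X, y, n_segments=len(y))   # Leave-One-Out
rmsecv_5 = rmsecv(X, y, n_segments=5)          # 20% out per segment
print(rmsecv_loo, rmsecv_5)
```

Leaving out 20% per segment refits the model five times instead of twenty, which is the usual reason to prefer larger segments on bigger data sets.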

Cross-validation: a warning

• Data: 13 x 5 = 65 NIR spectra (1102 wavelengths)
  – 13 samples: different composition of NaOH, NaOCl and Na2CO3
  – 5 temperatures: each sample measured at 5 temperatures

• The data: X (65 rows × 1102 columns), Y (65 rows × 3 columns)

Composition   NaOH (wt%)   NaOCl (wt%)   Na2CO3 (wt%)   Temperature (°C)
 1              18.99         0             0            15  21  27  34  40
 2               9.15         9.99          0.15         15  21  27  34  40
 3              15.01         0             4.01         15  21  27  34  40
 4               9.34         5.96          3.97         15  21  27  34  40
 …
13              16.02         2.01          1.00         15  21  27  34  40

• Leave SAMPLE out: the 5 spectra of a sample go into the validation set together
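Leaving a whole sample out means the cross-validation segments follow the composition, not the individual spectrum. This is a minimal sketch, assuming NumPy and assuming the 65 spectra are stored with the 5 temperatures of each sample in consecutive rows (the row order is not given on the slide); `leave_sample_out` is a hypothetical helper name.

```python
import numpy as np

n_samples, n_temps = 13, 5

# sample_id[i] = composition number (1..13) of spectrum i.
# Assumed layout: the 5 temperature measurements of a sample are consecutive rows.
sample_id = np.repeat(np.arange(1, n_samples + 1), n_temps)

def leave_sample_out(sample_id):
    """Yield (train_idx, valid_idx) with all spectra of one sample held out together."""
    for s in np.unique(sample_id):
        valid_idx = np.flatnonzero(sample_id == s)
        train_idx = np.flatnonzero(sample_id != s)
        yield train_idx, valid_idx

splits = list(leave_sample_out(sample_id))
print(len(splits), "segments of", len(splits[0][1]), "spectra each")
```

With plain Leave-One-Out on the 65 spectra, four spectra of the same composition would stay in the training set, so the validation error would look better than it is for a truly new sample.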


Validation

Selection of number of LV's

Through validation:
• Choose the number of LV's that results in the model with the lowest prediction error
• The test set used to assess the final model cannot be used for this!
• Divide the training set: cross-validation

Full data set → Training set
  1) Determine # LV's: with test' set
  2) Build model: $b_0, \dots, b_p$
Full data set → Test set → RMSEP

Remark: for final model use whole data set.

Double Cross Validation

Full data set (CV2) → Training set (CV1)
  1) Determine # LV's: CV inner loop
  2) Build model: CV outer loop, $b_0, \dots, b_p$ → RMSEP

Remark: for final model use whole data set.

Double cross-validation

• The data
• Split data into training set and validation set
  – Validation set: used later to assess model performance!
• Apply cross-validation on the rest: split training set into (new) training set and test set
  – Build models with 1 LV, 2 LV, 3 LV for each split
  – Select the number of LVs with the lowest RMSECV
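The inner and outer loops can be sketched as two nested cross-validations. This is a minimal sketch, assuming NumPy; all names and the simulated data are hypothetical. The slides describe only the weight vector and scores of PLS, so the deflation step in `pls1_fit` (a standard NIPALS-style PLS1 for a single y) is an assumption added here to obtain more than one LV.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """PLS1 for a single y with NIPALS-style deflation (assumed); returns (b0, B)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)      # weight vector: direction of maximal covariance
        t = E @ w                      # scores
        tt = t @ t
        p = E.T @ t / tt               # X-loadings
        q.append(f @ t / tt)           # regression of y on t
        E = E - np.outer(t, p)         # deflate X before the next LV
        W.append(w)
        P.append(p)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return y_mean - x_mean @ B, B

def cv_rmse(X, y, n_lv, n_segments):
    n = len(y)
    y_hat = np.empty(n)
    for held in np.array_split(np.arange(n), n_segments):
        train = np.setdiff1d(np.arange(n), held)
        b0, B = pls1_fit(X[train], y[train], n_lv)
        y_hat[held] = b0 + X[held] @ B
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def double_cv(X, y, max_lv, outer_segments, inner_segments):
    n = len(y)
    y_hat = np.empty(n)
    chosen = []
    for held in np.array_split(np.arange(n), outer_segments):
        train = np.setdiff1d(np.arange(n), held)
        # Inner loop: choose # LVs with the lowest RMSECV, training samples only
        scores = [cv_rmse(X[train], y[train], a, inner_segments)
                  for a in range(1, max_lv + 1)]
        best = int(np.argmin(scores)) + 1
        chosen.append(best)
        # Outer loop: predict samples used neither for fitting nor for LV selection
        b0, B = pls1_fit(X[train], y[train], best)
        y_hat[held] = b0 + X[held] @ B
    return float(np.sqrt(np.mean((y - y_hat) ** 2))), chosen

# Hypothetical data for illustration
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + rng.normal(scale=0.05, size=30)

rmsep, chosen = double_cv(X, y, max_lv=4, outer_segments=5, inner_segments=4)
print("RMSEP:", rmsep, "LVs chosen per outer segment:", chosen)
```

The number of LVs may differ between outer segments; the reported error then reflects the whole procedure, including the selection step.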


Cross-validation: an example

• Repeat procedure
  – Until all samples have been in the validation set once

Double cross-validation

• In this way:
  – The number of LVs is determined by using samples not used to build the model with
  – The prediction error is also determined using samples the model has not seen before

Remark: for final model use whole data set.

PLS: an example

Raw + meancentered data
[Figure: raw data and meancentered data, absorbance (a.u.) vs. wavenumber (cm-1)]

RMSECV vs. No of LVs
[Figure: RMSECV values for prediction of NaOH, RMSECV vs. number of LVs (1 to 10)]

Regression coefficients
[Figure: raw data, absorbance (a.u.) vs. wavenumber (cm-1), and regression coefficient vs. wavenumber (cm-1)]

True vs. predicted

[Figure: true values vs. predictions, NaOH, predicted vs. NaOH, true]

Why Pre-Processing?

Data artefacts
• Baseline correction
• Alignment
• Scatter correction
• Noise removal
• Scaling, normalisation
• Transformation
• …

Other
• Missing values
• Outliers

[Figure: original spectrum and the same spectrum with offset, offset + slope, multiplicative, and offset + slope + multiplicative artefacts (offset, slope, scatter); intensity (a.u.) vs. wavelength (a.u.)]

Pre-Processing Methods

STEP 1: (7x) BASELINE
• No baseline correction
• (3x) Detrending, polynomial order (2-3-4)
• (2x) Derivatisation (1st – 2nd)
• AsLS

STEP 2: (10x) SCATTER
• No scatter correction
• (4x) Scaling: mean, median, max, L2 norm
• SNV
• (3x) RNV (15, 25, 35)%
• MSC

STEP 3: (10x) NOISE
• No noise removal
• (9x) S-G smoothing (window: 5-9-11 pt) (order: 2-3-4)

STEP 4: (7x) SCALING & TRANSFORMATIONS
• Meancentering
• Autoscaling
• Range scaling
• Pareto scaling
• Poisson scaling
• Level scaling
• Log transformation

Supervised pre-processing methods
• OSC
• DOSC

4914 combinations: all reasonable

Pre-Processing Results

• Complexity of the model: no of LV
• Classification accuracy

[Figure: complexity of the model (no of LV) vs. classification accuracy (%) for raw data and for meancentering, autoscaling, range scaling, Pareto scaling, Poisson scaling, level scaling and log scaling]

J. Engel et al. TrAC 2013
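A few of the listed pre-processing methods are simple enough to sketch in one line each. This is a minimal sketch, assuming NumPy and the usual textbook definitions of meancentering, autoscaling, Pareto scaling and SNV; the function names and the simulated spectra (one peak with a different offset and multiplicative factor per spectrum) are hypothetical.

```python
import numpy as np

def mean_center(X):
    """Subtract the column (variable) means."""
    return X - X.mean(axis=0)

def autoscale(X):
    """Mean-center and divide each column by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center and divide each column by the square root of its standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))

def snv(X):
    """Standard Normal Variate: center and scale each spectrum (row) separately."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)

# Hypothetical spectra: one Gaussian peak with offset and multiplicative artefacts
wavelength = np.linspace(0.0, 1.0, 50)
peak = np.exp(-((wavelength - 0.5) ** 2) / 0.02)
gains = np.array([0.8, 1.0, 1.3, 1.1])       # multiplicative effect per spectrum
offsets = np.array([0.1, 0.0, 0.3, -0.2])    # offset per spectrum
X = gains[:, None] * peak + offsets[:, None]

X_snv = snv(X)
print(np.abs(X_snv - X_snv[0]).max())        # spread between spectra after SNV
```

In this simulation SNV removes both the offset and the multiplicative factor, so all four corrected spectra coincide. When such steps are combined with validation, the column means and standard deviations should be computed on the training set and then applied to the test or validation samples, in line with applying the whole procedure to those sets.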

SOFTWARE

• PLS Toolbox (Eigenvector Inc.)
  – www.eigenvector.com
  – For use in MATLAB (or standalone!)

• XLSTAT-PLS (XLSTAT)
  – www.xlstat.com
  – For use in Microsoft Excel

• Package pls for R
  – Free software
  – http://cran.r-project.org