Overview
• Introduction
• Linear regression
• Linear-transformable regressions
• Pitfalls, options and complications
• Non-linear fitting
• Robust estimation: alternative cost functions and weighting
• Implementation and software
• Comparing and testing models

Curve Fitting
S.K. Piechnik

Introduction
• Frequently we want to describe a relation within experimental data by an analytical expression linking the measured variables.
• The fitted parameters can then serve as summary descriptors of the underlying process, enabling comparisons between datasets independently of the particular choice of measurement points.

From whence does this “best” line come?

Curve fitting: Definitions
• Curve fitting: a statistical technique used to derive coefficient values for equations that express the value of one variable (the dependent variable) as a function of another (the independent variable).
• Linear regression: curve fitting for relationships that are best approximated by a straight line.
• Non-linear regression: curve fitting for relationships that are best approximated by a curved (i.e. non-linear) equation.

Maximum likelihood
• Probability that our measurements came from the specific line, under Gaussian noise:

  P \propto \prod_{i=1}^{N} \exp\left[ -\frac{1}{2} \left( \frac{y_i - y(x_i)}{\sigma_i} \right)^{2} \right] \Delta y

• "Gut feeling": the maximum probability (likelihood) principle says to maximize the above probability.
• After taking the logarithm and removing the constants (N, Δy and σ), this is equivalent to minimising

  Cost = \sum_{i=1}^{N} \left( y_i - y(x_i) \right)^{2}

• i.e. the least (sum) of squares (of error).
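The step from likelihood to least squares can be written out explicitly. A minimal sketch in LaTeX, assuming independent Gaussian errors and a common σ (the same assumption the slide makes when it drops σ as a constant):

  \begin{aligned}
  P &\propto \prod_{i=1}^{N} \exp\!\left[-\tfrac{1}{2}\left(\frac{y_i - y(x_i)}{\sigma_i}\right)^{2}\right]\Delta y \\
  -\ln P &= \tfrac{1}{2}\sum_{i=1}^{N}\left(\frac{y_i - y(x_i)}{\sigma_i}\right)^{2} + \text{const} \\
  \sigma_i = \sigma \;\Rightarrow\; \arg\max P &= \arg\min \sum_{i=1}^{N}\bigl(y_i - y(x_i)\bigr)^{2}
  \end{aligned}

So maximizing the Gaussian likelihood and minimizing the sum of squared residuals pick out the same line.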

From whence does this “best” line come?

Line Fitting
• Exact analytic solution.
• Implemented in scientific calculators and in M$ Excel.
• Can even easily get the errors on the parameters.

  (Offset)  a = \frac{\sum y_i \sum x_i^2 - \sum x_i \sum x_i y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}

  (Slope)   b = \frac{N\sum x_i y_i - \sum x_i \sum y_i}{N\sum x_i^2 - \left(\sum x_i\right)^2}

[From: The Cartoon Guide to Statistics, L. Gonick & W. Smith]
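A minimal sketch of the closed-form solution in Python (numpy assumed; function name and the example data, which reappear later in the deck, are illustrative):

    import numpy as np

    def fit_line(x, y):
        """Least-squares straight line y = a + b*x (vertical offsets)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        Sx, Sy, Sxx, Sxy = x.sum(), y.sum(), (x * x).sum(), (x * y).sum()
        b = (n * Sxy - Sx * Sy) / (n * Sxx - Sx ** 2)    # slope
        a = (Sy - b * Sx) / n                            # offset
        # parameter errors from the residual variance
        resid = y - (a + b * x)
        s2 = (resid ** 2).sum() / (n - 2)                # residual variance
        var_b = n * s2 / (n * Sxx - Sx ** 2)
        var_a = s2 * Sxx / (n * Sxx - Sx ** 2)
        return a, b, np.sqrt(var_a), np.sqrt(var_b)

    # example data used later in the slides: (1,3), (2,2), (3,4), (4,7), (5,9)
    print(fit_line([1, 2, 3, 4, 5], [3, 2, 4, 7, 9]))    # a = -0.1, b = 1.7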

Linear regression of (some) nonlinear functions

Linear fitting of non-linear functions? Just a contradiction in terms?

• This method of least squares is not limited to linear fits (or to fits in two variables).
• One can just as readily use the same procedure for Y = ax² + bx + c by minimizing

  SS = \sum_{i=1}^{n} (y_i - Y_i)^2 = \sum_{i=1}^{n} \left( y_i - a x_i^2 - b x_i - c \right)^2

Example: Quadratic Regression

  Y_i = a x_i^2 + b x_i + c

Setting the partial derivatives of \sum_{i=1}^{n} d_i^2 with respect to a, b and c to zero,

  \frac{\partial \sum_{i=1}^{n} d_i^2}{\partial a} = 0, \qquad
  \frac{\partial \sum_{i=1}^{n} d_i^2}{\partial b} = 0, \qquad
  \frac{\partial \sum_{i=1}^{n} d_i^2}{\partial c} = 0,

gives the normal equations

  -\sum_{i=1}^{n} x_i^2 y_i + a\sum_{i=1}^{n} x_i^4 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^2 = 0 \quad (1)
  -\sum_{i=1}^{n} x_i y_i + a\sum_{i=1}^{n} x_i^3 + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i = 0 \quad (2)
  -\sum_{i=1}^{n} y_i + a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i + nc = 0 \quad (3)

Quadratic Regression (cont’d)
• Solve the linear system of equations:

  \begin{pmatrix} \sum x_i^4 & \sum x_i^3 & \sum x_i^2 \\ \sum x_i^3 & \sum x_i^2 & \sum x_i \\ \sum x_i^2 & \sum x_i & n \end{pmatrix}
  \begin{pmatrix} a \\ b \\ c \end{pmatrix} =
  \begin{pmatrix} \sum x_i^2 y_i \\ \sum x_i y_i \\ \sum y_i \end{pmatrix}
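A minimal sketch of building and solving this system with numpy (names and test data are illustrative; np.polyfit gives the same answer and is shown as a cross-check):

    import numpy as np

    def fit_quadratic(x, y):
        """Solve the normal equations for y = a*x^2 + b*x + c."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        S = lambda p: (x ** p).sum()
        A = np.array([[S(4), S(3), S(2)],
                      [S(3), S(2), S(1)],
                      [S(2), S(1), len(x)]])
        rhs = np.array([(x ** 2 * y).sum(), (x * y).sum(), y.sum()])
        return np.linalg.solve(A, rhs)           # [a, b, c]

    x = np.arange(6, dtype=float)
    y = 2 * x ** 2 - 3 * x + 1 + np.random.normal(0, 0.1, x.size)
    print(fit_quadratic(x, y))                   # close to [2, -3, 1]
    print(np.polyfit(x, y, 2))                   # numpy's built-in equivalent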

Exponential Fitting, Logarithmic Fitting, Power Law Fitting
• Linearize the equation and apply the fit to a straight line.
[Plots: the same exponential example data and fitted trend, y = 4.2986 e^{0.2668x}, R² = 0.9935, shown on a linear y-axis (0–120) and on a logarithmic y-axis (1–1000).]
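A minimal sketch of the linearize-then-fit approach for y = A e^{Bx} (numpy assumed; the synthetic data roughly echo the slide's example values, which are otherwise illustrative):

    import numpy as np

    x = np.linspace(0.5, 14, 25)
    y = 4.3 * np.exp(0.27 * x) * np.exp(np.random.normal(0, 0.05, x.size))

    # ln(y) = ln(A) + B*x  ->  ordinary straight-line fit on (x, ln y)
    B, lnA = np.polyfit(x, np.log(y), 1)
    A = np.exp(lnA)
    print(A, B)      # close to 4.3 and 0.27

The same recipe covers every row of the table below: apply the listed transforms to Y and X, fit a straight line, then back-transform the coefficients.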

(far from) Exhaustive list of regression transforms

  Non-linear function      Transform Y     Transform X   Equivalent regression
  y = A e^{Bx}             ln(y)           x             ln(y) = ln(A) + Bx
  y = A x^B                ln(y)           ln(x)         ln(y) = ln(A) + B ln(x)
  y = 1/(A + Bx)           1/y             x             1/y = A + Bx
  y = 1 - e^{-Bx}          ln(1/(1-y))     x             ln(1/(1-y)) = Bx
  y = ln(A + Bx)           e^y             x             e^y = A + Bx
  y = x/(A + Bx)           1/y             1/x           1/y = B + A(1/x)
  ...                      ...             ...           ...

• Even Excel...

Correlation Coefficient
• Given a relation between y and x, how good is the fit?
• The parameter which conveys this information is the correlation coefficient, usually denoted by r.
[Plot: Y (dependent) vs X (independent) for the points (1,3), (2,2), (3,4), (4,7), (5,9), with the fitted line y = 1.7x - 0.1, R² = 0.85.]

  r = \left( 1 - \frac{\sigma_{y,x}^2}{\sigma_y^2} \right)^{1/2}
      \qquad \text{(variation in residuals over variation in data)}

where σ_y is the variation in Y about its mean and σ_{y,x} is the variation in Y about its prediction:

  \sigma_y = \left[ \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \right]^{1/2},
  \qquad
  \sigma_{y,x} = \left[ \frac{\sum_{i=1}^{n} (y_i - Y_i)^2}{n-2} \right]^{1/2}

Correlation Coefficient (cont’d)
Eventually:

  r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}
           {\left[ n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 \right]^{1/2}
            \left[ n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2 \right]^{1/2}}

• The value of r ranges from -1 (perfectly correlated in the negative direction) to +1 (perfectly correlated in the positive direction).
• When r = 0, the two variables are not correlated.
• The “goodness” of fit is usually given by r².
• This is also known as the proportion of explained variability and is used as the basic descriptor of fit quality.
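A minimal sketch computing r and r² for the five example points with the formula above (numpy assumed):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5], float)
    y = np.array([3, 2, 4, 7, 9], float)
    n = len(x)

    num = n * (x * y).sum() - x.sum() * y.sum()
    den = np.sqrt(n * (x * x).sum() - x.sum() ** 2) * \
          np.sqrt(n * (y * y).sum() - y.sum() ** 2)
    r = num / den
    print(r, r ** 2)     # about 0.922 and 0.85, matching the fit-quality tables below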

Fit Quality (STATISTICA ver. 7.0)
Part of GLM

Test of SS Whole Model vs. SS Residual (Spreadsheet1)
  Dependent variable: Y
  Multiple R = 0.921954, Multiple R² = 0.850000, Adjusted R² = 0.800000
  SS Model = 28.90000, df Model = 1, MS Model = 28.90000
  SS Residual = 5.100000, df Residual = 3

Parameter Estimates (Spreadsheet1), sigma-restricted parameterization
  Effect      Param.      Std.Err    t          p          -95% Cnf.Lmt   +95% Cnf.Lmt
  Intercept   -0.100000   1.367479   -0.073127  0.946308   -4.45193       4.251930
  X            1.700000   0.412311    4.123106  0.025865    0.38784       3.012156

Univariate Tests of Significance for Y (Spreadsheet1), sigma-restricted parameterization, effective hypothesis decomposition
  Effect      SS         Degr. of Freedom   MS         F          p
  Intercept   0.00909    1                  0.00909    0.00535    0.946308
  X           28.90000   1                  28.90000   17.00000   0.025865
  Error       5.10000    3                  1.70000
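This is not the STATISTICA procedure itself, but an equivalent summary can be produced in Python; a hedged sketch assuming the statsmodels package is available, using the same five data points:

    import numpy as np
    import statsmodels.api as sm

    x = np.array([1, 2, 3, 4, 5], float)
    y = np.array([3, 2, 4, 7, 9], float)

    model = sm.OLS(y, sm.add_constant(x)).fit()
    print(model.summary())              # R^2 = 0.85, intercept = -0.1, slope = 1.7
    print(model.conf_int(alpha=0.05))   # 95% confidence limits for the parameters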

Model/Transform Identification
[Plot: the fitted data, Y (0–35) against X (0–30), shown alongside the GLM tables above.]
• Plots are pictures of science, worth thousands of words in boring tables.

Automatic Model Identification (SPSS ver. 12)
Independent: x

  Mth   Rsq    d.f.   F         Sigf   Upper bound   b0        b1       b2       b3
  LIN   .984   23     1412.48   .000                 7.0775    1.0250
  LOG   .800   23     92.22     .000                 7.7379    5.5463
  INV   .256   23     7.93      .010                 20.4520   -1.9434
  QUA   .993   22     1470.38   .000                 5.6967    1.3839   -.0149
  CUB   .996   21     1841.49   .000                 4.7333    1.9133   -.0710   .0016
  COM   .818   23     103.33    .000                 8.0675    1.0664
  POW   .977   23     996.62    .000                 7.2018    .4219
  S     .528   23     25.69     .000                 2.9657    -.1919
  GRO   .818   23     103.33    .000                 2.0878    .0643
  EXP   .818   23     103.33    .000                 8.0675    .0643
  LGS   .818   23     103.33    .000   .             .1240     .9377

Actual function used: Y = 3*exp((0.7x)^0.3)
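A minimal sketch of the same idea in Python: fit a few candidate linearized models and rank them by R². The candidate set here is only a subset of the SPSS list, and R² is computed on the transformed scale, as the automatic procedures do:

    import numpy as np

    def r_squared(y, yhat):
        return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

    x = np.linspace(0.5, 24, 25)
    y = 3 * np.exp((0.7 * x) ** 0.3)        # the slide's actual function

    candidates = {
        "LIN": (x, y),                      # y = b0 + b1*x
        "LOG": (np.log(x), y),              # y = b0 + b1*ln(x)
        "EXP": (x, np.log(y)),              # ln(y) = b0 + b1*x
        "POW": (np.log(x), np.log(y)),      # ln(y) = b0 + b1*ln(x)
    }
    for name, (xt, yt) in candidates.items():
        b1, b0 = np.polyfit(xt, yt, 1)
        print(name, round(r_squared(yt, b0 + b1 * xt), 3))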

These 4 plots have the same slopes, intercepts and r values!

Non-linear Transform Exploration

[The same SPSS Automatic Model Identification table as above, shown alongside a STATISTICA v. 7 plot of the observed data (actual function used: Y = 3*exp((0.7x)^0.3), x from 0 to 25, y up to about 30) overlaid with the fitted Linear, Logarithmic, Inverse, Quadratic, Cubic, Compound, Power, S, Growth, Exponential and Logistic models.]

Non-linear Transform Exploration in Pictures
[Matrix of categorized scatter plots, made with Excel (transform & scale) + Statistica (categorized plot): each transform of Y (Y, Y^2, Y^3, Y^0.5, Y^0.3, exp(Y), log(Y)) is plotted against each transform of X (X, X^2, X^3, X^0.5, X^0.3, log(x), exp(x)), looking for the combination that comes closest to a straight line. Actual function used: Y = 3*exp((0.7x)^0.3).]
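A minimal matplotlib sketch of the same exploration; the transform lists, grid size and figure styling are illustrative, not the original slide's:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0.2, 24, 40)
    y = 3 * np.exp((0.7 * x) ** 0.3)        # the slide's actual function

    fx = {"x": x, "x^0.5": x ** 0.5, "x^0.3": x ** 0.3, "log(x)": np.log(x)}
    fy = {"y": y, "y^0.5": y ** 0.5, "log(y)": np.log(y)}

    fig, axes = plt.subplots(len(fy), len(fx), figsize=(10, 7))
    for i, (yname, yval) in enumerate(fy.items()):
        for j, (xname, xval) in enumerate(fx.items()):
            axes[i, j].plot(xval, yval, ".")              # one panel per transform pair
            axes[i, j].set_title(f"{yname} vs {xname}", fontsize=8)
    plt.tight_layout()
    plt.show()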

• Transformations can be very useful when used appropriately.
• But beware, and follow these rules:
  – Transform your data when the transformation makes the variability more consistent and more Gaussian.
  – Do not transform your data when the transformation makes the variability less consistent and less Gaussian.
[Plots: example data Y and Y + noise shown before and after transformation.]

Choice of distance and error weighting
[Plots: an exponential data set fitted directly and via the transform-and-fit route ("TransformFit"), shown on linear (0–350) and logarithmic (0.01–1000) scales.]
• Note: this is the result of the Exponential fit performed by MS Excel.
• This is ONE BIG and (not) really MEAN square!

Vertical vs Perpendicular offsets

Vertical offsets:

  SS = \sum_{i=1}^{M} (y_i - a - b x_i)^2

Perpendicular offsets:

  SS = \sum_{i=1}^{M} \left[ (y_i - a - b x_i)\cos(\arctan b) \right]^2
     = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{1 + b^2}

Weighting by the errors in Y:

  SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2}

Weighting by the errors in both X and Y:

  SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2}

“In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets. This provides a much simpler analytic form for the fitting parameters. Minimizing R²_perp for a second- or higher-order polynomial leads to polynomial equations of higher order, so this formulation cannot be extended. In any case, for a reasonable number of noisy data points, the difference between vertical and perpendicular fits is quite small.” [MathWorld]

[Plots: the same data set fitted under each of the cost functions above.]
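A minimal sketch contrasting the vertical-offset and perpendicular-offset cost functions numerically (scipy assumed; the synthetic data and starting point are illustrative):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    x = np.linspace(0, 10, 12)
    y = 1.0 + 0.8 * x + rng.normal(0, 0.5, x.size)

    def ss_vertical(p):
        a, b = p
        return ((y - a - b * x) ** 2).sum()

    def ss_perpendicular(p):
        a, b = p
        return (((y - a - b * x) ** 2) / (1 + b ** 2)).sum()

    print(minimize(ss_vertical, [0, 1]).x)        # same as the analytic solution
    print(minimize(ss_perpendicular, [0, 1]).x)   # usually only slightly different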

Regression of data with X & Y errors

  SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2}
     = \sum_{i=1}^{M} (y_i - a - b x_i)^2 \cdot w_i,
  \qquad
  w_i = \frac{1}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2}

(Offset)

  a = \frac{\sum_{i=1}^{M} (y_i - b x_i) \cdot w_i}{\sum_{i=1}^{M} w_i}

(Slope)
  b = ?  It can NOT be calculated analytically: it must be optimised and fed back into the calculation of the weights!

Non-linear fitting
• Often the linearized approach is not adequate:
  – linearisation is not possible or introduces errors
  – too cumbersome
• An optimal fit with a non-linear function is usually also obtained with least squares.
• The difference is that the optimal set of function parameters (a1, …, an) must be found iteratively, by trial and error, until the best combination is found.
• Thus the problem reduces to the minimisation of a function (SS) in multidimensional space.
• Important aspects:
  – choice of optimisation method
  – start parameters
  – convergence threshold and method
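A minimal sketch of the feed-the-slope-back-into-the-weights idea (an effective-variance style fit); scipy is assumed, and the data, bounds and function names are illustrative. The offset and weight formulas follow the slide; the slope is found numerically, as the slide says it must be:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def fit_xy_errors(x, y, sx, sy):
        """Straight line y = a + b*x with errors on both x and y."""
        x, y, sx, sy = map(np.asarray, (x, y, sx, sy))

        def offset(b):
            w = 1.0 / (sy ** 2 + b ** 2 * sx ** 2)     # weights depend on the slope
            return (w * (y - b * x)).sum() / w.sum()   # analytic offset for a given b

        def ss(b):
            w = 1.0 / (sy ** 2 + b ** 2 * sx ** 2)
            return (w * (y - offset(b) - b * x) ** 2).sum()

        b = minimize_scalar(ss, bounds=(-10, 10), method="bounded").x
        return offset(b), b

    x = np.array([0, 1, 2, 3, 4, 5], float)
    y = np.array([0.1, 1.2, 1.9, 3.1, 3.9, 5.2])
    print(fit_xy_errors(x, y, sx=np.full(6, 0.1), sy=np.full(6, 0.2)))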

Minimising SS

  SS = \sum_{i=1}^{M} \left[ y_i - f(x_i, a_0, a_1, \ldots, a_m) \right]^2 \cdot w_i

• Treat the sum of squares as a continuous function of the m parameters and search the m-dimensional space for the appropriate minimum value.
• Grid Search: vary each parameter in turn, minimizing chi-squared with respect to each parameter independently. Many successive iterations are required to locate the minimum of chi-squared unless the parameters are independent.
• Gradient Search: vary all parameters simultaneously, adjusting the relative magnitudes of the variations so that the direction of propagation in parameter space is along the direction of steepest descent of chi-squared.
• Expansion Methods: find an approximate analytical function that describes the chi-squared hypersurface and use this function to locate the minimum. The number of computed points is smaller, but the computations are considerably more complicated.
• Marquardt Method: a gradient/expansion combination.

[From Bevington and Robinson]
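A minimal sketch of iterative non-linear least squares in Python; scipy's curve_fit defaults to a Levenberg-Marquardt-type routine, in the spirit of the Marquardt method above. The model, noise level and starting values are illustrative:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, b, c):
        return a * np.exp(b * x) + c

    rng = np.random.default_rng(1)
    x = np.linspace(0, 5, 30)
    y = model(x, 2.0, 0.6, 1.0) + rng.normal(0, 0.2, x.size)

    p0 = [1.0, 0.1, 0.0]                       # starting parameters matter!
    popt, pcov = curve_fit(model, x, y, p0=p0)
    print(popt)                                # close to [2.0, 0.6, 1.0]
    print(np.sqrt(np.diag(pcov)))              # parameter standard errors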

Problems to look out for
• Ringing
  – max iterations
  – stop condition (ΔSS)
• High initial SS
  – select a good starting point
• Multiple minima
  – inspect results
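One common way to guard against multiple minima is to repeat the fit from several starting points and keep the lowest SS; a minimal sketch (scipy assumed, model and starting grid illustrative):

    import numpy as np
    from scipy.optimize import least_squares

    def residuals(p, x, y):
        a, b = p
        return y - a * np.sin(b * x)            # a model with several local minima in b

    rng = np.random.default_rng(2)
    x = np.linspace(0, 10, 50)
    y = 2.0 * np.sin(1.3 * x) + rng.normal(0, 0.1, x.size)

    best = None
    for b0 in np.linspace(0.2, 3.0, 15):        # multi-start over the frequency parameter
        res = least_squares(residuals, [1.0, b0], args=(x, y))
        if best is None or res.cost < best.cost:
            best = res
    print(best.x)                               # close to [2.0, 1.3]; always inspect the result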