Overview
• Introduction
• Linear regression
• Linear-transformable regressions
• Pitfalls, options and complications
• Non-linear fitting
• Robust estimation: alternative cost functions and weighting
• Implementation and software
• Comparing and testing models
Curve Fitting S.K. Piechnik
Introduction
• Frequently, a relation within the experimental data is sought in the form of an analytical expression between the variables that were measured.
• The fitted parameters may subsequently be used as summary descriptors of the underlying process, enabling multi-level comparisons between datasets regardless of the particular choice of measurement points.
Where does this "best" line come from?
Curve fitting: definitions
• Curve fitting: a statistical technique used to derive coefficient values for equations that express the value of one variable (the dependent variable) as a function of another (the independent variable).
• Linear regression: curve fitting for relationships that are best approximated by a straight line.
• Non-linear regression: curve fitting for relationships that are best approximated by a curved (i.e. non-linear) equation.
Maximum likelihood
• Probability that our measurements came from the specific line, under Gaussian noise:

P \propto \prod_{i=1}^{N} \exp\left[ -\frac{1}{2} \left( \frac{y_i - y(x_i)}{\sigma_i} \right)^2 \right] \Delta y

• Gut feeling, formalised: the maximum likelihood principle.
• Maximize the above probability. After taking the logarithm and dropping the constants (N, \Delta y and \sigma), this is equivalent to minimising

\mathrm{Cost} = \sum_{i=1}^{N} \left( y_i - y(x_i) \right)^2

• Hence the name: least (sum of) squares (of error).
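The equivalence is easy to verify numerically: with a common, known noise level, the Gaussian negative log-likelihood is exactly half the sum of squares, so both criteria rank candidate lines identically. A minimal sketch (the data and candidate lines below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)
sigma = 1.0  # assumed common, known noise level

def neg_log_likelihood(a, b):
    # Gaussian negative log-likelihood, up to additive constants
    r = y - (a + b * x)
    return 0.5 * np.sum((r / sigma) ** 2)

def sum_of_squares(a, b):
    return np.sum((y - (a + b * x)) ** 2)

# Candidate (offset, slope) pairs: both criteria order them the same way
candidates = [(1.0, 2.0), (0.5, 2.1), (2.0, 1.8)]
nll_order = sorted(candidates, key=lambda p: neg_log_likelihood(*p))
ss_order = sorted(candidates, key=lambda p: sum_of_squares(*p))
print(nll_order == ss_order)  # True
```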
From whence do this “best” line come?
Line fitting
• Exact analytic solution:

(Slope) \quad b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}

(Offset) \quad a = \frac{\sum y_i - b \sum x_i}{n}

• Implemented in scientific calculators and in MS Excel.
• Can even easily get the errors on the parameters.
The Cartoon Guide to Statistics, L. Gonick & W. Smith
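The exact analytic solution can be sketched in a few lines; the small data set here is invented for illustration, and the closed-form slope and offset agree with numpy's built-in fit:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 2.0, 4.0, 7.0, 9.0])
n = x.size

# Closed-form least-squares estimates for y = a + b*x
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n

# Cross-check against numpy's polynomial fit
b_np, a_np = np.polyfit(x, y, 1)
assert np.allclose([a, b], [a_np, b_np])
print(a, b)  # approximately -0.1 and 1.7
```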
Linear regression of (some) nonlinear functions
Linear fitting of non-linear functions? Just a contradiction in terms?
• This method of least squares is not limited to linear fits (or to fits in two variables).
• One can just as readily use the same procedure for Y = ax² + bx + c by minimizing

SS = \sum_{i=1}^{n} (y_i - Y_i)^2 = \sum_{i=1}^{n} \left( y_i - a x_i^2 - b x_i - c \right)^2

Example: quadratic regression, Y_i = a x_i^2 + b x_i + c. Writing d_i = y_i - Y_i and setting the partial derivatives of SS to zero:

\frac{\partial}{\partial a} \sum_{i=1}^{n} d_i^2 = 0: \quad -\sum_{i=1}^{n} x_i^2 y_i + a \sum_{i=1}^{n} x_i^4 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^2 = 0

\frac{\partial}{\partial b} \sum_{i=1}^{n} d_i^2 = 0: \quad -\sum_{i=1}^{n} x_i y_i + a \sum_{i=1}^{n} x_i^3 + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i = 0

\frac{\partial}{\partial c} \sum_{i=1}^{n} d_i^2 = 0: \quad -\sum_{i=1}^{n} y_i + a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i + n c = 0
Quadratic regression (cont'd)
• Solve the linear system of equations:

-\sum x_i^2 y_i + a \sum x_i^4 + b \sum x_i^3 + c \sum x_i^2 = 0 \quad (1)

-\sum x_i y_i + a \sum x_i^3 + b \sum x_i^2 + c \sum x_i = 0 \quad (2)

-\sum y_i + a \sum x_i^2 + b \sum x_i + n c = 0 \quad (3)

or, in matrix form,

\begin{pmatrix} \sum x_i^4 & \sum x_i^3 & \sum x_i^2 \\ \sum x_i^3 & \sum x_i^2 & \sum x_i \\ \sum x_i^2 & \sum x_i & n \end{pmatrix} \begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} \sum x_i^2 y_i \\ \sum x_i y_i \\ \sum y_i \end{pmatrix}
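The normal equations above can be solved directly as a 3 x 3 linear system. A minimal sketch with synthetic data (the coefficients and noise level are invented for illustration); the result matches numpy's built-in quadratic fit:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = 2.0 * x**2 - 1.0 * x + 0.5 + rng.normal(0, 0.2, x.size)
n = x.size

# Normal-equation matrix and right-hand side, exactly as on the slide
M = np.array([
    [np.sum(x**4), np.sum(x**3), np.sum(x**2)],
    [np.sum(x**3), np.sum(x**2), np.sum(x)],
    [np.sum(x**2), np.sum(x),    n],
])
rhs = np.array([np.sum(x**2 * y), np.sum(x * y), np.sum(y)])

a, b, c = np.linalg.solve(M, rhs)
# Same result as numpy's direct quadratic fit
assert np.allclose([a, b, c], np.polyfit(x, y, 2))
print(a, b, c)
```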
Exponential fitting
• Linearize the equation and apply the fit to a straight line.

[Figure: exponential data with the fitted trendline y = 4.2986 e^{0.2668x}, R² = 0.9935, shown on a linear scale (0 to 120) and on a semilog scale (1 to 1000), where the same fit appears as a straight line. A companion panel illustrates logarithmic fitting.]
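The linearisation can be sketched in code (the true parameters and noise level are invented for illustration; note the noise here is multiplicative, which is exactly the case where the log transform is appropriate):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.5, 12, 30)
# y = A * exp(B*x) with multiplicative noise
y = 4.3 * np.exp(0.27 * x) * np.exp(rng.normal(0, 0.05, x.size))

# Fit ln(y) = ln(A) + B*x as a straight line
B, lnA = np.polyfit(x, np.log(y), 1)
A = np.exp(lnA)
print(A, B)  # close to the true values 4.3 and 0.27
```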
(Far from) exhaustive list of regression transforms

Non-linear function    Equivalent regression        Transform Y      Transform X
y = A e^(Bx)           ln(y) = ln(A) + Bx           ln(y)            x
y = A x^B              ln(y) = ln(A) + B ln(x)      ln(y)            ln(x)
y = 1/(A + Bx)         1/y = A + Bx                 1/y              x
y = 1 - e^(-Bx)        ln(1/(1 - y)) = Bx           ln(1/(1 - y))    x
y = ln(A + Bx)         e^y = A + Bx                 e^y              x
y = x/(A + Bx)         1/y = A(1/x) + B             1/y              1/x
...                    ...                          ...              ...

• Even Excel offers several of these built in (exponential, logarithmic and power-law fitting).

Correlation coefficient
• Given a relation between y and x, how good is the fit? The parameter which conveys this information is the correlation coefficient, usually denoted by r.

[Figure: scatter plot of the example data, X = 1, 2, 3, 4, 5 (independent) and Y = 3, 2, 4, 7, 9 (dependent), with the fitted line y = 1.7x - 0.1, R² = 0.85.]
r = \left( 1 - \frac{\sigma_{y,x}^2}{\sigma_y^2} \right)^{1/2} \qquad \text{(variation in residuals / variation in data)}

Correlation coefficient (cont'd)
Where:

\sigma_y = \left( \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \right)^{1/2} \quad \text{(variation in Y about its mean)}

\sigma_{y,x} = \left( \frac{\sum_{i=1}^{n} (y_i - Y_i)^2}{n - 2} \right)^{1/2} \quad \text{(variation in Y about its prediction)}

Eventually:

r = \frac{ n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i }{ \left[ n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2 \right]^{1/2} \left[ n \sum_{i=1}^{n} y_i^2 - \left( \sum_{i=1}^{n} y_i \right)^2 \right]^{1/2} }

• The value of r ranges from -1 (perfectly correlated in the negative direction) to +1 (perfectly correlated in the positive direction).
• When r = 0, the two variables are not correlated.
• The "goodness" of fit is usually given by r².
• r² is also known as the proportion of explained variability and is used as the basic descriptor of fit quality.
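The last formula, applied to the five-point example used earlier (X = 1 to 5, Y = 3, 2, 4, 7, 9), reproduces r² = 0.85:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 2.0, 4.0, 7.0, 9.0])
n = x.size

# r from the summation formula on the slide
num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
den = np.sqrt(n * np.sum(x**2) - np.sum(x)**2) * np.sqrt(n * np.sum(y**2) - np.sum(y)**2)
r = num / den
print(r**2)  # 0.85

# Agrees with numpy's Pearson correlation
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```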
Fit quality (STATISTICA ver. 7.0, part of GLM)

Test of SS whole model vs. SS residual (Spreadsheet1):

Dependent  Multiple R  Multiple R²  Adjusted R²  SS Model  df Model  MS Model  SS Residual  df Residual
Y          0.921954    0.850000     0.800000     28.90000  1         28.90000  5.100000     3

Parameter estimates (Spreadsheet1; sigma-restricted parameterization):

Effect     Param.     Std.Err.  t          p         -95% Cnf.Lmt  +95% Cnf.Lmt
Intercept  -0.100000  1.367479  -0.073127  0.946308  -4.45193      4.251930
X          1.700000   0.412311  4.123106   0.025865  0.38784       3.012156

Univariate tests of significance for Y (Spreadsheet1; sigma-restricted parameterization, effective hypothesis decomposition):

Effect     SS        Degr. of Freedom  MS        F         p
Intercept  0.00909   1                 0.00909   0.00535   0.946308
X          28.90000  1                 28.90000  17.00000  0.025865
Error      5.10000   3                 1.70000

[Figure: the example data with the fitted regression line, axes 0 to 30.]
Plots are pictures of science, worth thousands of words in boring tables.
Model/transform identification

Automatic model identification (SPSS ver. 12), independent variable: x

Mth  Rsq   d.f.  F        Sigf  b0       b1       b2      b3
LIN  .984  23    1412.48  .000  7.0775   1.0250
LOG  .800  23    92.22    .000  7.7379   5.5463
INV  .256  23    7.93     .010  20.4520  -1.9434
QUA  .993  22    1470.38  .000  5.6967   1.3839   -.0149
CUB  .996  21    1841.49  .000  4.7333   1.9133   -.0710  .0016
COM  .818  23    103.33   .000  8.0675   1.0664
POW  .977  23    996.62   .000  7.2018   .4219
S    .528  23    25.69    .000  2.9657   -.1919
GRO  .818  23    103.33   .000  2.0878   .0643
EXP  .818  23    103.33   .000  8.0675   .0643
LGS  .818  23    103.33   .000  .1240    .9377

Actual function used: Y = 3·exp((0.7x)^0.3)

[Figure: the observed data (0 ≤ x ≤ 25) with the linear, logarithmic, inverse, quadratic, cubic, compound, power, S, growth, exponential and logistic fits overlaid.]

[Figure: four visibly different scatter plots which nevertheless have the same slopes, intercepts and r values!]

Non-linear transform exploration (STATISTICA v. 7)
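An automatic-identification pass like the one above can be sketched by fitting a few candidate models and comparing r² on the original scale. The sampling grid below is invented; the function is the one from the slide, taken noiseless here, and the model labels follow SPSS's naming:

```python
import numpy as np

x = np.linspace(0.5, 25, 25)
y = 3.0 * np.exp((0.7 * x) ** 0.3)  # actual function from the slide

def r2(y_obs, y_fit):
    ss_res = np.sum((y_obs - y_fit) ** 2)
    ss_tot = np.sum((y_obs - np.mean(y_obs)) ** 2)
    return 1.0 - ss_res / ss_tot

fits = {}
b, a = np.polyfit(x, y, 1)
fits["LIN"] = r2(y, a + b * x)
b, a = np.polyfit(np.log(x), y, 1)
fits["LOG"] = r2(y, a + b * np.log(x))
p = np.polyfit(x, y, 2)
fits["QUA"] = r2(y, np.polyval(p, x))
p = np.polyfit(x, y, 3)
fits["CUB"] = r2(y, np.polyval(p, x))

# Nested polynomial models can only improve the raw R²
for name, val in sorted(fits.items(), key=lambda kv: -kv[1]):
    print(name, round(val, 3))
```

As in the SPSS table, the higher-order polynomials top the ranking, which is precisely why raw R² alone should not decide the model.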
Non-linear transform exploration in pictures
(Excel for transform & scale + STATISTICA for the categorized plot)

[Figure: a grid of categorized scatter plots, each axis rescaled to -0.2 to 1.2. The Y transforms tried are Y, Y², Y³, Y^0.5, Y^0.3, exp(Y) and log(Y); the X transforms are X, X², X³, X^0.5, X^0.3, log(X) and exp(X). The combination that renders the data a straight line identifies the transform. Actual function used: Y = 3·exp((0.7x)^0.3).]
• Transformations can be very useful when used appropriately.
• But beware, and follow these rules:
  – You should transform your data when the transformation makes the variability more consistent and more Gaussian.
  – You should not transform your data when the transformation makes the variability less consistent and less Gaussian.

[Figure: Y and Y + noise plotted on the original scale and after the transform fit (TransformFit), showing how the transformation changes the spread of the noise.]
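The rule above can be checked mechanically by comparing the spread of replicated measurements before and after the transform. In this invented example the noise is multiplicative by construction, so the log transform makes the variability consistent; with additive noise, the same transform would do the opposite:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.repeat(np.arange(1, 11), 50)  # 10 x-levels, 50 replicate measurements each
# Exponential trend with multiplicative (constant-percentage) noise
y = 5.0 * np.exp(0.3 * x) * np.exp(rng.normal(0, 0.1, x.size))

def spread_ratio(values):
    # Within-group standard deviation at the largest vs the smallest x level
    s_low = np.std(values[x == 1])
    s_high = np.std(values[x == 10])
    return s_high / s_low

raw_ratio = spread_ratio(y)          # variability grows strongly with x
log_ratio = spread_ratio(np.log(y))  # roughly constant after the transform
print(raw_ratio, log_ratio)
```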
Choice of distance and error weighting
• Note: this is the result of an exponential fit performed by MS Excel. Excel fits the linearised (log-transformed) data, so on the original scale the largest data point contributes one huge squared residual: this is ONE BIG and (not really) MEAN square!

[Figure: the same exponential data and trendline shown on a linear scale (0 to 350) and on a logarithmic scale (0.01 to 100).]
Vertical vs perpendicular offsets

Vertical offsets:

SS = \sum_{i=1}^{M} (y_i - a - b x_i)^2

Perpendicular offsets:

SS = \sum_{i=1}^{M} \left[ (y_i - a - b x_i) \cos(\arctan b) \right]^2 = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{1 + b^2}

• "In practice, the vertical offsets from a line (polynomial, surface, hyperplane, etc.) are almost always minimized instead of the perpendicular offsets."
• This provides a much simpler analytic form for the fitting parameters.
• Minimizing the perpendicular SS for a second- or higher-order polynomial leads to polynomial equations of higher order, so this formulation cannot be extended.
• "In any case, for a reasonable number of noisy data points, the difference between vertical and perpendicular fits is quite small." [MathWorld]

[Figure: the same data fitted by minimizing vertical and by minimizing perpendicular offsets.]
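The identity between the two perpendicular-offset expressions is exact, since cos(arctan b) = 1/sqrt(1 + b²). A quick numerical check (data invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 30)
y = 1.5 * x + 2.0 + rng.normal(0, 0.8, x.size)

b, a = np.polyfit(x, y, 1)
res = y - (a + b * x)  # vertical residuals

ss_vert = np.sum(res ** 2)
# Perpendicular distance = vertical residual scaled by cos(arctan(b))
ss_perp = np.sum((res * np.cos(np.arctan(b))) ** 2)

assert np.isclose(ss_perp, ss_vert / (1.0 + b ** 2))
print(ss_vert, ss_perp)
```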
Weighting by known measurement errors:

Errors in Y only:

SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2}

Errors in both X and Y (the X errors are projected onto Y through the slope b):

SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2}

[Figure: straight-line fits to data with error bars on Y only, and on both X and Y.]
Regression of data with X & Y errors

SS = \sum_{i=1}^{M} \frac{(y_i - a - b x_i)^2}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2} = \sum_{i=1}^{M} (y_i - a - b x_i)^2 \cdot w_i, \qquad w_i = \frac{1}{\sigma_{Y_i}^2 + b^2 \sigma_{X_i}^2}

(Offset)

a = \frac{\sum_{i=1}^{M} (y_i - b x_i) \cdot w_i}{\sum_{i=1}^{M} w_i}

(Slope) b = ? It can NOT be calculated analytically, because b appears in the weights. It must be optimised and fed back into the calculation of the weights!
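One way to close that loop is to alternate between updating the weights and re-solving the weighted least-squares problem until the slope stops changing. A minimal sketch of that iteration (the data, error levels and convergence threshold are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x_true = np.linspace(0, 10, n)
sx = np.full(n, 0.2)           # known standard errors on X
sy = 0.2 + 0.1 * x_true        # known, varying standard errors on Y
x = x_true + rng.normal(0.0, sx)
y = 2.0 * x_true + 1.0 + rng.normal(0.0, sy)

b = 0.0  # starting slope
for _ in range(100):
    # Weights depend on the current slope estimate
    w = 1.0 / (sy**2 + b**2 * sx**2)
    # Weighted least squares for y = a + b*x with these (fixed) weights
    Sw, Swx, Swy = np.sum(w), np.sum(w * x), np.sum(w * y)
    Swxx, Swxy = np.sum(w * x * x), np.sum(w * x * y)
    b_new = (Sw * Swxy - Swx * Swy) / (Sw * Swxx - Swx**2)
    a = (Swy - b_new * Swx) / Sw   # the analytic offset from the slide
    if abs(b_new - b) < 1e-12:
        b = b_new
        break
    b = b_new
print(a, b)  # close to the true offset 1.0 and slope 2.0
```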
Non-linear fitting
• Often the linearised approach is not adequate:
  – Linearisation is not possible, or introduces errors
  – Too cumbersome
• An optimal fit with a non-linear function is usually also obtained with least squares.
• The difference lies in the fact that the optimal set of function parameters (a1, …, an) must be found iteratively, by trial and error, until the best combination is found.
• Thus the problem reduces to the minimisation of a function (SS) in multidimensional space.
• Important aspects:
  – Choice of optimisation method
  – Start parameters
  – Convergence threshold and method
Minimising SS

SS = \sum_{i=1}^{M} \left( y_i - f(x_i; a_0, a_1, \ldots, a_m) \right)^2 \cdot w_i

• Treat the sum of squares as a continuous function of the m parameters and search the m-dimensional space for the appropriate minimum value.
• Grid search: vary each parameter in turn, minimizing chi-squared with respect to each parameter independently. Many successive iterations are required to locate the minimum unless the parameters are independent.
• Gradient search: vary all parameters simultaneously, adjusting the relative magnitudes of the variations so that the direction of propagation in parameter space is along the direction of steepest descent of chi-squared.
• Expansion methods: find an approximate analytical function that describes the chi-squared hypersurface and use this function to locate the minimum. The number of computed points is smaller, but the computations are considerably more complicated.
• Marquardt method: a combination of the gradient and expansion methods.

(From Bevington and Robinson)
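The search idea can be illustrated without any optimisation library. The sketch below is not the Marquardt method: it fits y = A·exp(Bx) (model, parameters and noise level all invented for illustration) by grid-searching B while solving A analytically at each trial value, since the model is linear in A:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 5.0, 40)
y = 3.0 * np.exp(0.5 * x) + rng.normal(0.0, 0.3, x.size)

def ss(A, B):
    return np.sum((y - A * np.exp(B * x)) ** 2)

def best_A(B):
    # For fixed B the model is linear in A, so A has a closed form
    e = np.exp(B * x)
    return np.sum(y * e) / np.sum(e * e)

# 1-D grid search over B; A solved analytically at each trial value
Bs = np.linspace(0.01, 1.0, 1000)
ss_vals = np.array([ss(best_A(B), B) for B in Bs])
B_best = Bs[np.argmin(ss_vals)]
A_best = best_A(B_best)
print(A_best, B_best)  # close to the true values 3.0 and 0.5
```

Real implementations replace the brute-force grid with the gradient or Marquardt schemes above, which scale far better as the number of parameters grows.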
Problems to look out for
• Ringing
• High initial SS
• Multiple minima
  – Max iterations
  – Stop condition: ΔSS
• Select a good starting point
• Inspect the results