Case Study: Curve Fitting

Mathematics Case Study: Curve Fitting This section provides an overview of some of the MATLAB basic data analysis capabilities in the form of a case ...
Author: Chad Sims
0 downloads 0 Views 111KB Size
Mathematics

Case Study: Curve Fitting This section provides an overview of some of the MATLAB basic data analysis capabilities in the form of a case study. The examples that follow work with a collection of census data, using MATLAB functions to experiment with fitting curves to the data: Polynomial fit Analyzing residuals Exponential fit Error bounds This section also tells you how to use the Basic Fitting interface to perform curve fitting tasks. Loading the Data The file census.mat contains U.S. population data for the years 1790 through 1990. Load it into MATLAB: load census Your workspace now contains two new variables, cdate and pop: cdate is a column vector containing the years from 1790 to 1990 in increments of 10. pop is a column vector with the U.S. population figures that correspond to the years in cdate.

Multiple Regression

Polynomial Fit

Mathematics

Polynomial Fit A first try in fitting the census data might be a simple polynomial fit. Two MATLAB functions help with this process. Curve Fitting Function Summary Function Description polyfit Polynomial curve fit. polyval Evaluation of polynomial fit. The MATLAB polyfit function generates a "best fit" polynomial (in the least squares sense) of a specified order for a given set of data. For a polynomial fit of the fourth-order p = polyfit(cdate,pop,4) Warning: Polynomial is badly conditioned. Remove repeated data points or try centering and scaling as described in HELP POLYFIT. p = 1.0e+05 * 0.0000

-0.0000

0.0000

-0.0126 6.0020

The warning arises because the polyfit function uses the cdate values as the basis for a matrix with very large values (it creates a Vandermonde matrix in its calculations - see the polyfit M-file for details). The spread of the cdate values results in scaling problems. One way to deal with this is to normalize the cdate data. Preprocessing: Normalizing the Data Normalization is a process of scaling the numbers in a data set to improve the accuracy of the subsequent numeric computations. A way to normalize cdate is to center it at zero mean and scale it to unit standard deviation: sdate = (cdate - mean(cdate))./std(cdate) Now try the fourth-degree polynomial model using the normalized data: p = polyfit(sdate,pop,4) p = 0.7047

0.9210

23.4706

73.8598

62.2285

Evaluate the fitted polynomial at the normalized year values, and plot the fit against the observed data points:

pop4 = polyval(p,sdate); plot(cdate,pop4,'-',cdate,pop,'+'), grid on

Another way to normalize data is to use some knowledge of the solution and units. For example, with this data set, choosing 1790 to be year zero would also have produced satisfactory results.

Case Study: Curve Fitting

Analyzing Residuals

Mathematics

Analyzing Residuals A measure of the "goodness" of fit is the residual, the difference between the observed and predicted data. Compare the residuals for the various fits, using normalized cdate values. It's evident from studying the fit plots and residuals that it should be possible to do better than a simple polynomial fit with this data set. Comparison Plots of Fit and Residual Fit

Residuals

p1 = polyfit(sdate,pop,1); pop1 = polyval(p1,sdate); plot(cdate,pop1,'b-',cdate,pop,'g+')

res1 = pop - pop1; figure, plot(cdate,res1,'g+')

p = polyfit(sdate,pop,2); pop2 = polyval(p,sdate); plot(cdate,pop2,'b-',cdate,pop,'g+')

res2 = pop - pop2; figure, plot(cdate,res2,'g+')

p = polyfit(sdate,pop,4); pop4 = polyval(p,sdate); plot(cdate,pop4,'b-',cdate,pop,'g+')

Polynomial Fit

res4 = pop - pop4; figure, plot(cdate,res4,'g+')

Exponential Fit

Mathematics

Exponential Fit By looking at the population data plots on the previous pages, the population data curve is somewhat exponential in appearance. To take advantage of this, let's try to fit the logarithm of the population values, again working with normalized year values. logp1 = polyfit(sdate, log10(pop),1); logpred1 = 10.^polyval(logp1,sdate); semilogy (cdate,logpred1,'-',cdate,pop,'+'); grid on

Now try the logarithm analysis with a second-order model.

logp2 = polyfit(sdate,log10(pop),2); logpred2 = 10.^polyval(logp2,sdate); semilogy(cdate,logpred2,'-',cdate,pop,'+'); grid on

This is a more accurate model. The upper end of the plot appears to taper off, while the polynomial fits in the previous section continue, concave up, to infinity. Compare the residuals for the second-order logarithmic model. Residuals in Log Population Scale

Residuals in Population Scale

logres2 = log10(pop)polyval(logp2,sdate); plot(cdate,logres2,'+')

r = pop - 10.^(polyval(logp2,sdate)); plot(cdate,r,'+')

The residuals are more random than for the simple polynomial fit. As might be expected, the residuals tend to get larger in magnitude as the population increases. But overall, the logarithmic model provides a more accurate fit to the population data.

Analyzing Residuals

Error Bounds

Mathematics

Error Bounds Error bounds are useful for determining if your data is reasonably modeled by the fit. You can obtain the error bounds by passing an optional second output parameter from polyfit as an input parameter to polyval. This example uses the census demo data and normalizes the data by centering it at zero mean and scaling it to unit standard deviation. The example then uses polyfit and polyval to produce error bounds for a second-order polynomial model. Year values are normalized. This code uses an interval of ±2 , corresponding to a 95% confidence interval. load census sdate = (cdate - mean(cdate))./std(cdate) [p2,S2] = polyfit(sdate,pop,2); [pop2,del2] = polyval(p2,sdate,S2); plot(cdate,pop,'+',cdate,pop2,'g-',cdate,pop2+2*del2,'r:',... cdate,pop2-2*del2,'r:'), grid on

Exponential Fit

The Basic Fitting Interface