Curve Fitting Best Practice

Author: Darrell Ross
Enabling Science

Curve Fitting Best Practice Part 2: Resolving fitting issues

The most effective way to maximize the quality of data sets is to follow a set of ‘best practice’ criteria for your data, which helps you rectify issues in the fitting process.

Ensure there are enough measurements

If there is not enough data in your data set, you may have to perform more measurements, change the assays being run or increase the number of data points being collected to generate more meaningful results.

Ensure data is defined within all areas of the fit

If the data set is not complete, take more measurements to fill the gaps in the data ranges. For example, for a dose response curve, ensure the data set contains a well defined minimum and maximum, as well as enough data points to construct the center of the curve appropriately. Acquiring a complete set of data for all areas of your curve may also require you to check that your assays are collecting data in the correct ranges.

Knock out outlying data points

Outliers can be ‘knocked out’ of the fit either manually, one by one, or with automatic processes such as robust fitting, which detects outliers automatically and refits the data set accordingly. Be careful not to knock out too many data points. In particular, avoid knocking out points one after another simply until the fit looks good.

Fig 1: An Excel spreadsheet contains further analysis controls that limit the Chi2 value and the number of outlying data points that a user can knock out (top). The spreadsheet flags the number of knocked out points as red (bottom) when the limit is reached and flags the Chi2 value as red when its threshold is exceeded.

IDBS • Unit 2 • Occam Court • Surrey Research Park • Guildford • Surrey GU2 7QB • UK t: +44 1483 595000 • e: [email protected] • w: http://www.idbs.com


By adding a control to a spreadsheet or analysis that limits the number of data points a user can knock out, you can ensure that there are always enough data points left to produce a meaningful fit.

About robust fitting

Iteratively Re-weighted Least Squares (IRLS), or robust fitting, is an automatic method of knocking out outlying data points. A fitting process is iterative in that it performs a set of cycles and, in each successive cycle, changes the parameter values in order to converge on the best fit.

Robust fitting also takes into account the individual ‘weighting’ of a data point when performing this fitting process. Weighting assigns more bearing or relevance to data points for which the measured value is closer to the fitted value and less bearing to outlying data points, so reducing the impact of outliers. On each cycle of the fitting iteration, IRLS changes the weighting values for each data point as well as parameter values, converging on a better fit where the effect of outliers is reduced or eliminated.
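The re-weighting cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not XLfit's implementation: it fits a straight line rather than a dose response curve so that each weighted fit has a closed form, and the Tukey bisquare weighting function, the tuning constant c and the function names are all assumptions chosen for the example.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Closed-form weighted least squares for the line y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def bisquare_weight(resid, limit):
    """Tukey bisquare weight (an assumed choice): 0 beyond the limit."""
    if abs(resid) >= limit:
        return 0.0
    return (1.0 - (resid / limit) ** 2) ** 2


def irls_line_fit(x, y, cycles=20, c=4.685):
    """Refit repeatedly, recomputing each point's weight from its residual."""
    w = [1.0] * len(x)
    for _ in range(cycles):
        a, b = weighted_line_fit(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust estimate of scatter from the median absolute residual.
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0:
            break
        w = [bisquare_weight(r, c * scale) for r in resid]
    return a, b, w


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 30.0, 14.2, 15.9]  # sixth point is an outlier
a, b, w = irls_line_fit(x, y)
print("intercept", round(a, 3), "slope", round(b, 3))
print("weights", [round(wi, 2) for wi in w])
```

A point whose weight falls to zero has effectively been knocked out of the fit, while points close to the fitted line keep weights near 1.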

Fig 2: Using the automated data knock out technique IRLS on the two datasets above, XLfit has identified and knocked out significant outlying data points, eliminating the need for manual interaction

Choose the correct model to analyze your data

For example, if you are using a standard dose response curve to fit a data range in which several data points ‘slope away’ to form a bell-shaped curve, you could use a bell-shaped dose response curve instead. This model treats the fit as two linked dose response curves and ultimately extracts two IC50/EC50 values.
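As a rough illustration of how two linked dose response curves produce a bell shape, the sketch below multiplies a rising Hill term by a falling one. The exact equation XLfit uses is not given here, so this particular form, the function names and the parameter names are assumptions for illustration only. Concentrations must be greater than zero.

```python
def hill_rise(conc, ec50, slope):
    """Fraction of the maximal response on a rising dose response curve."""
    return 1.0 / (1.0 + (ec50 / conc) ** slope)


def bell_response(conc, bottom, top, ec50_rise, slope_rise, ec50_fall, slope_fall):
    """Bell-shaped response: one curve rises, a second one falls away."""
    rising = hill_rise(conc, ec50_rise, slope_rise)
    falling = 1.0 - hill_rise(conc, ec50_fall, slope_fall)
    return bottom + (top - bottom) * rising * falling


concs = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
curve = [bell_response(c, 0.0, 100.0, 0.1, 1.0, 1000.0, 1.0) for c in concs]
for c, r in zip(concs, curve):
    print(c, round(r, 1))
```

The two midpoints (ec50_rise and ec50_fall in this sketch) correspond to the two IC50/EC50 values that a bell-shaped fit reports.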

Fig 3: Rather than removing the last few data points in the curve above to enable the data to be fitted to a standard dose response curve, XLfit enables you to use a bell-shaped dose response curve which essentially fits the data as two curves, with individual results and slopes. The bell-shaped model allows you to analyze and interpret dose response data without having to reject data points.

IDBS 2008


Check the model accepts zero values

If you are experiencing ‘division by zero’ errors, either change to a model that accepts zero values or change the zero concentration value to an arbitrarily small amount to ensure the model returns a result.

Use a model with the correct number of parameters

To avoid poor quality results due to an over-parameterized model, lock parameters at specific values or choose a more appropriate model for your fitting needs. For example, by locking the parameter that represents the Hill slope, you can change a four-parameter model into a three-parameter model.

Use appropriate starting parameter values

If you know what each parameter in a model represents, you can make reasonable estimates of good starting parameter values based on the data set and an understanding of the model. By bringing parameter starting values closer to these estimates, or by employing techniques such as prefitting, where calculations define suitable starting parameter values for a particular fit, researchers can greatly improve the quality of fitting results.
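The sketch below shows one simple way to derive starting values from the data for a four-parameter logistic curve, along with the zero-concentration substitution and a locked slope. It is not XLfit's prefit calculation, which is not described here. The function names, the substitute value tiny and the estimation rules (minimum and maximum response, concentration nearest the half-way response) are assumptions for illustration.

```python
def four_pl(conc, bottom, top, ec50, slope):
    """Four-parameter logistic curve; divides by conc, so conc == 0 fails."""
    return bottom + (top - bottom) / (1.0 + (ec50 / conc) ** slope)


def replace_zero(concs, tiny=1e-12):
    """Swap zero concentrations for an arbitrarily small positive value."""
    return [c if c > 0 else tiny for c in concs]


def estimate_start(concs, responses, locked_slope=1.0):
    """Rough starting values read directly from the data set."""
    bottom, top = min(responses), max(responses)
    half = (bottom + top) / 2.0
    # Use the concentration whose response lies closest to the half-way level.
    ec50 = min(zip(concs, responses), key=lambda p: abs(p[1] - half))[0]
    # Holding the slope fixed turns the four-parameter model into three.
    return {"bottom": bottom, "top": top, "ec50": ec50, "slope": locked_slope}


concs = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0]
responses = [2.0, 5.0, 20.0, 52.0, 85.0, 98.0]
safe_concs = replace_zero(concs)
start = estimate_start(safe_concs, responses)
print(start)
print(four_pl(safe_concs[0], **start))
```

Starting values like these are only a first guess for the fitting iterations to refine, but they place the fit far closer to the data than starting values of zero.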

Ensure the fit has converged

Fig 4: The curve above has failed to fit. Opening graph information in XLfit (below left), you can see that all the starting parameter values are at zero. Change to more meaningful values (below right) or use XLfit’s prefit functionality, which assigns a new set of parameter starting values. When you choose prefit, XLfit assesses the data set you have provided and uses inbuilt calculations to determine good starting parameter values for the selected model.

If the fit has failed to converge, examine the convergence criteria, including the convergence limit, which specifies when the fit is accepted as converged, and the fit results to work out why the fit is not converging. You may need to increase the number of iterations in the fitting cycle to achieve a meaningful result. The process may also have settled in a local minimum rather than the best fit, or may be repeating its iterations without ever meeting the convergence criteria.
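To make the roles of the iteration limit and the convergence limit concrete, here is a minimal sketch of an iterative fit that reports whether it converged. It uses a Gauss-Newton update on a deliberately simple one-parameter model, y = exp(-k*x), rather than a dose response model. The function name, the tolerance tol and the test data are assumptions for illustration, not XLfit settings.

```python
import math


def fit_decay_rate(x, y, k_start, max_iter, tol=1e-9):
    """Gauss-Newton fit of y = exp(-k*x); returns (k, iterations, converged)."""
    k = k_start
    for iteration in range(1, max_iter + 1):
        pred = [math.exp(-k * xi) for xi in x]
        resid = [yi - pi for yi, pi in zip(y, pred)]
        jac = [-xi * pi for xi, pi in zip(x, pred)]  # derivative of pred w.r.t. k
        step = sum(j * r for j, r in zip(jac, resid)) / sum(j * j for j in jac)
        k += step
        # Convergence limit: stop once the parameter change is small enough.
        if abs(step) < tol:
            return k, iteration, True
    # Iteration limit reached before the convergence limit was met.
    return k, max_iter, False


x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(-0.5 * xi) for xi in x]  # noise-free data generated with k = 0.5

print(fit_decay_rate(x, y, k_start=0.1, max_iter=2))
print(fit_decay_rate(x, y, k_start=0.1, max_iter=100))
```

With the smaller iteration limit the routine stops before the parameter change has dropped below the tolerance, so the returned flag is False. With the larger limit it stops as soon as the tolerance is met.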


Fig 5: The curve on the left has only 5 iterations so has failed to converge. Increasing the number of iterations in Excel to 100 enables an accurate fit (right).

Check results and corresponding error information

For each final parameter value, i.e., each result, check the corresponding standard error and 95% confidence interval. The larger the standard error and the wider the confidence interval, the poorer the quality of your fitting result. By taking into account all the individual error values for a result, you can determine more accurately whether or not the results are meaningful.

Check goodness of fit

When your fit has reached its final convergence, you can check ‘goodness of fit’ for the curve using a range of statistical checks.

Residuals

A residual is the distance between a measured data point and its corresponding point on the fitted line. Summing the squares of the residuals for all data points gives an indication of how well the curve fits your data. Outliers can also be identified from their residual values: the farther a data point lies from the fitted line, the more likely it is to be an outlier.
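The residual calculations above are simple enough to show directly. In this sketch the measured and fitted values are invented for illustration, and the function names are hypothetical.

```python
def residuals(measured, fitted):
    """Distance between each measured value and the fitted curve."""
    return [m - f for m, f in zip(measured, fitted)]


def sum_of_squares(resid):
    """Square each residual first, then add them up."""
    return sum(r * r for r in resid)


def largest_residual_index(resid):
    """Index of the point farthest from the fitted curve."""
    return max(range(len(resid)), key=lambda i: abs(resid[i]))


measured = [2.0, 10.0, 48.0, 20.0, 97.0]  # invented example data
fitted = [3.0, 12.0, 50.0, 88.0, 98.0]
resid = residuals(measured, fitted)
print("residuals:", resid)
print("sum of squares:", sum_of_squares(resid))
print("most likely outlier: point", largest_residual_index(resid))
```

Squaring before summing matters: positive and negative residuals would otherwise cancel each other out and hide a poor fit.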

Fig 6: Measuring residuals in XLfit to identify outliers


Fig 7: Knocking out outliers in XLfit recalculates residuals and changes the fit to lower the Chi2 value and improve fit quality

F test

The F test compares the variance of the Y values with the variance of the fitted Y values.

T test

The T test compares the mean of the differences between the Y values and the fitted Y values with the corresponding mean standard deviation.

Both the F and T tests judge goodness of fit by how close the results of the calculations are to 1.

Chi2

Used as a direct representation of the scatter in a data set compared to the fit, Chi2 is calculated from the magnitude of each residual, combined into a single sum over all data points. Normalized Chi2 divides the Chi2 value, typically by the number of degrees of freedom in the fit, and is probably the best Chi2 result you can use.
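The sketch below shows one conventional way to compute these statistics. It is an assumption-laden illustration rather than XLfit's own formulas, which are not given here: Chi2 is taken as the unweighted sum of squared residuals, the normalized value divides by the degrees of freedom (data points minus fitted parameters), and the F-style check is a plain ratio of sample variances. All function names and data are hypothetical.

```python
from statistics import variance


def chi_squared(measured, fitted):
    """Unweighted Chi2: sum of squared residuals (assumes equal uncertainties)."""
    return sum((m - f) ** 2 for m, f in zip(measured, fitted))


def reduced_chi_squared(measured, fitted, n_params):
    """Chi2 divided by the degrees of freedom of the fit."""
    dof = len(measured) - n_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi_squared(measured, fitted) / dof


def variance_ratio(measured, fitted):
    """Variance of the Y values over variance of the fitted Y values."""
    return variance(measured) / variance(fitted)


measured = [2.0, 11.0, 49.0, 90.0, 97.0, 100.0]  # invented example data
fitted = [3.0, 12.0, 50.0, 88.0, 98.0, 99.0]
print("Chi2:", chi_squared(measured, fitted))
print("normalized Chi2:", reduced_chi_squared(measured, fitted, n_params=4))
print("variance ratio:", round(variance_ratio(measured, fitted), 3))
```

Normalizing by the degrees of freedom makes Chi2 values comparable between fits that use different numbers of data points or parameters.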

An example best practice process

Pre-analysis

Check the concentration ranges in your data. For example, ensure that the concentration range spans three orders of magnitude.

Check the number of data points. For example, if you only have three or four data points, you can immediately reject the data set as it will produce a poor quality fit. Data can then be normalized, as discussed earlier, to enable direct data range comparisons.

Analysis

Once data is normalized, check again that the number of data points is sufficient for a good quality fit.

Fitting

Choose a single fit, such as a standard dose response curve, a bell-shaped dose response curve, or a three-, four- or five-parameter curve, or analyze multiple curves so that you have a library of related curves to select from. For example, you might have a two-site and a single-site dose response model that you can fit at the same time, then perform a check such as a T or Chi2 test, or some other fitting analysis, to see which of the two models has fitted the data best.

Perform QC tests

Check all goodness of fit statistics, parameter values and associated error values.

Check if your IC50 value is greater than the highest concentration tested. If so, you are trying to extract a result from a non-defined area of the curve.


Check if the minimum response produced is greater than 50%. If so, all the data points are too high.

Check if the maximum response is less than 50%, in which case the sample may be inactive.

Apply automated robust fitting techniques to eliminate outliers and improve the quality of the fit. Perform residuals and goodness of fit checks such as F and T tests and Chi2.

Check fitting status

If the fit has failed, try to rectify it by checking the parameter starting values, making changes if necessary, and then refitting.

Perform a manual QC, if required, and reanalyze the data. Check the user hasn’t rejected too many data points and perform the fitting process again.

If the fit has passed, generate a report on that particular fit, including in the report all parameter values and error estimates, fitting statistics, Chi2 values, F and T test values, and number of knocked out points.

If another researcher or a project leader is interpreting the results you have produced, include the graph, as well as its analysis, in the report. If you are producing intersects, such as EC20s, EC80s, IC20s and IC80s, report the error values for the intersects as well.
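Several of the pre-analysis and QC checks in the example process above are simple rules that lend themselves to automation. The sketch below is one possible encoding, assuming responses normalized to percent. The function names and message strings are hypothetical, and the thresholds come from the examples in the text (more than four data points, three orders of magnitude, the 50% response level).

```python
import math


def pre_analysis_issues(concs, min_points=5, min_decades=3.0):
    """Checks to run before fitting; returns a list of problems found."""
    issues = []
    if len(concs) < min_points:
        issues.append("too few data points")
    positive = [c for c in concs if c > 0]
    if len(positive) < 2 or math.log10(max(positive) / min(positive)) < min_decades:
        issues.append("concentration range too narrow")
    return issues


def qc_issues(concs, responses, ic50):
    """Checks to run on a fitted result; responses are in percent."""
    issues = []
    if ic50 > max(concs):
        issues.append("IC50 above highest concentration tested")
    if min(responses) > 50.0:
        issues.append("minimum response above 50%")
    if max(responses) < 50.0:
        issues.append("maximum response below 50%")
    return issues


concs = [0.01, 0.1, 1.0, 10.0, 100.0]
print(pre_analysis_issues(concs))
print(qc_issues(concs, [3.0, 12.0, 48.0, 85.0, 97.0], ic50=1.1))
print(qc_issues(concs, [5.0, 8.0, 12.0, 20.0, 31.0], ic50=250.0))
```

Returning a list of named issues, rather than a single pass or fail flag, makes it straightforward to include the reasons for rejection in the fit report.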

Summary

Using the best practice guidelines above, plus automation wherever possible, you can improve consistency, reduce errors and make the fitting process faster and more efficient.

While the pre-analysis checks help improve the quality of your data set before the process begins, automated fitting consistently analyzes data from start to finish and eliminates curves as soon as poor quality is detected. Improving consistency means that researchers can directly compare curves knowing that the data sets have been produced in the same way with the same techniques and analysis interpretation. Using best practices with automation increases throughput and saves time, reducing the need for manual QC intervention and accelerating decision making.
