Application of median filtering to noisy data

Application of median filtering to noisy data Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal ...
2 downloads 1 Views 605KB Size
Application of median filtering to noisy data

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

David C. Stone

Abstract: The properties and application of the median filter are investigated using both simulated and real data for signals evolving with time. Comparison is made with existing numerical techniques for drift compensation and noise reduction in analytical measurements such as the moving average and Savitzky-Golay digital filters. The median filter provides a means for dealing with "spiky" noise and separating peaks from a slowly changing baseline, even when the exact nature of the drift and noise distribution is not known. Median filtering is a useful and complementary addition to existing digital filtering techniques, being mathematically robust and readily implemented on any computer platform. Key words: median filtering, signal noise, signal processing, digital filters, chemometrics, data smoothing.

RCsumC : Utilisant des donnCes simulCes ainsi que rCelles de signaux Cvoluant avec le temps, on a CtudiC les propriCtCs et le domaine d'application du filtre mCdian. On a effectuC une comparaison de cette mCthode avec les techniques numkriques existantes pour la compensation de la dCrive et pour la rCduction du bruit dans des mesures analytiques telles que la moyenne pondtrCe et les filtres digitaux de Savitzky-Golay. Le filtre mCdian fournit une fagon de tenir compte du bruit i pointes et de sCparer les pics d'une ligne debase qui se modifie lentement, mCme lorsque la nature exacte de la dCrive et de la distribution du bruit n'est pas connue. Le filtrage mCdian est une addition utile et complCmentaire aux techniques existantes de filtrage digital, il est mathtmatiquement robuste et il peut facilement Ctre mis en place sur toutes sortes d'ordinateurs.

Mots clis : filtrage mCdian, bruit de signal, Ctude des signaux, filtres digitaux, chimiomCtrique, aplanissement des donnCes. [Traduit par la rCdaction]

Introduction Many measurements in analytical chemistry involve recording the evolution of some dependent variable (the signal) with an independent variable increasing in discrete steps. The most obvious examples are the measurement of time-dependent phenomena in kinetic studies, although with the increasing use of computers many wavelength-dependent measurements can be described in the same terms. This has led to the development of various numerical. methods for dealing with noisy or drifting signals, ranging from the simple to the complex (1-3). The methodof choice for a particular situation will be determined by factors such as the nature..of the measurement and noise, the magnitude of the signal-to-noise ratio, the computational and programming resources available, and the time scale for the required calculations. Different methods may also be considered where calculations must be performed in real time as data are collected rather than following completion of the measurement cycle. Thus while sophisticated and powerful procedures are increasingly being applied to analytical situations, there is still a place for simple mathematical procedures that can be accomplished with a minimum of computational expenditure and programming experience. One such method is the techReceived February 16, 1995.

I

D.C. stone.' Department of Chemistry, University of Toronto, Toronto, ON M5S 1Al, Canada.

' Telephone: (416) 978-6568. Fax: (416) 978-8775. E-mail: [email protected]

Can. J. Chem. 73: 1573-1581 (1995). Printed in Canada / Imprim6 au Canada

nique of median filtering. Originally described by Tukey (4) in the 1970s, median filtering has previously been applied to image ( 5 )and speech processing (6).For example, many imageprocessing programs that provide "sharpening" or "edge extraction" functions exploit median filtering to achieve this. It is difficult to say how widely median filtering has been employed in analytical chemistry, although it is likely that averaging techniques and polynomial smoothing are both considerably more prevalent. Papers explicitly on median filters and their properties are rare within the chemical literature (7, 8), while the technique is not as well covered in chemometrics texts as other methods of noise reduction. Median filtering can, however, be usefully applied to problems in analytical chemistry as demonstrated by the recent original work of Moore and Jorgenson (7). They applied this method to the extraction of peaks from chromatograms having strongly drifting baselines, and were able to show that it provided a superior performance to the mathematically more complex Butterworth and Chebyshev digital filters. More recently Camus and Larson (8) applied median filtering to noise reduction in atom probe analysis. This paper is of particular interest because it highlights the implicit assumption often made that the noise distribution is Gaussian. In this case the distribution is binomial (an event with probability q will occur m times out of n trials where m is an integer, 0 5 m 5 n), so filters developed for Gaussian noise d o not perform well, distorting important features of the data. A more familiar example of this situation might be photon-counting measurements, in which errors are described by the Poisson distribution.

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

Can. J. Chern. Vol. 73, 1995

In this paper, the properties of the median filter are explored for situations common in analytical chemistry using synthetic transient response data, and its performance is compared to other, readily implemented digital filters. The transient functions chosen represent both continuous and discontinuous, symmetrical and asymmetrical functions. The baseline correction method of Moore and Jorgenson is also further evaluated in terms of the integrity of the extracted peak profile as well as the recovered peak area. Finally, the median filter is demonstrated using "worse case scenario" data from a surface acoustic wave based chemical sensor system.

Fig. 1. The effect of filtering on outliers and steps: (A) smoothing by a moving average filter ( r = 3 ) ; ( B ) smoothing by a median filter ( r = 3 ) . (-) original data, (+) filtered data.

Experimental Instrumentation The surface acoustic wave - gas chromatographic combination used in this work is a development of a system described previously (9). Briefly, a surface acoustic wave (SAW) device is used as the frequency-controlling element in a radio frequency oscillator circuit, and is situated in a GC detector oven immediately at the outlet of a capillary column. Under these circumstances, the oscillation frequency is perturbed by mass eluting from the column and adsorbing to the SAW surface. The recorded chromatographic peaks are therefore a function of both the evolving eluate concentration and the adsorption kinetics at the sensor surface. In the current system, both channels of a quartz 52 MHz dual delay-line SAW device are electrically connected in parallel and exposed to a single gas stream. A second, external SAW device is used as a signal reference source. Measurements can be made by monitoring either the difference frequency between the two SAWS, or recording the frequency of the detector SAW directly. The oscillation frequency of the SAW device was monitored continuously using an HP5328A counter (Hewlett Packard) connected to an ~ ~ ~~ al c ien t o~ s hI1 "~computer via an IEEE 488 interface bus (National Instruments). Mixtures of up to six components were injected on to a J&W 30 m x 0.53 mm DB-Wax capillary column (Chromatographic Specialities, Inc.). Chemicals used were either analytical grade and used as received or were freshly distilled before use. Confirmation of the chromatogram was obtained using an FID detector in series with the SAW device and an HP3390A recording integrator (Hewlett Packard). Software Data collection and analysis were performed using custom 6.0 (Symantec software written and compiled using Think cTU Corp.). The median filtering subroutines were written using methods derived from Press et al. (10). Additional smoothing and filtering were achieved using the generalized SavitzkyGolay method (sliding or initial-point least-squares smoothing) as implemented by Gorry (1 1). The software also includes routines for peak height, width, and area calculation, baseline subtraction, and curve fitting. Peaks are defined by the user in a special window by drawing a straight line across the base of the peak as identified visually; all peak calculations are then performed within this region. A second program was written to generate artificial data for more detailed analysis of the filter. This allowed data sets describing a variety of functions to be constructed with or without the addition of specific noise distributions. Compari-

0.0001 0.00

2.50

1

5.00

7.50 Time (arbitrary units)

10.00

son of the original and filtered data was achieved by exporting the data to a spreadsheet program for analysis and plotting.

Results and discussion General description of filter properties A common technique for improving the signal-to-noise ratio of data evolving with time is the moving average (MA) filter (ref. 2, p. 175). In essence, the kth value is replaced with the arithmetic mean of all the values in the range (k - r) to (k + r) of a moving "window" of rank r and width (2r + 1). It should be noted that the first and last r values are "lost" from the data in the process. An alternative scheme involves bin (or boxcar) averaging, where each kth value is replaced with the mean of j replicate determinations. This latter approach is satisfactory where the rate of data collection is much greater than the rate of any change in the measured signal and the noise is normally distributed. When the sampling rate is low relative to the rate of change of the signal, however, this can result in a significant distortion of the measured profile. A simple calculation can readily show that a single outlier affects the mean greatly, resulting in a rectangular "step" of width 2 r + 1 in the filtered data. In the same way it can be shown that passing a MA filter over a step function will result in a ramp of width 2 r + 1 centred at the point of the step, since each successive window will contain a greater number of larger values (Fig. 1A). Recognition of this sensitivity of the mean to outliers has resulted in increasing interest in "robust" (nonparametric) statistical methods (12). Such methods are attractive since it is readily demonstrated that the median is considerably less sensitive to outliers than the mean. Furthermore, there are no assumptions concerning the nature of the noise distribution implicit in their use. From a computational

Stone

viewpoint, however, the median could be considered as less efficient since calculation requires sorting the data into either ascending or descending order. For a set of n values xi the median is calculated as

Fig. 2. The effect of filtering on a chromatographic peak. Upper: (A) original data; (B) smoothing by a moving average filter ( r = 8); (C) smoothing by a median filter ( r = 8). Data are spaced every 0.01 min. Lower: plot of relative peak height versus filter rank for both filters. 0.40

Xi). [2] m = (Xk 2 '

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

+

k = ni2; 1 = (n/2) + 1

(n even)

Since this is a selection process it is possible to speed up the calculation by using a variation of the Quicksort method in which only those subsets containing the required kth value are sorted (10). In this modification a further improvement in speed can be obtained for even n > 100 by equating the median to the (nI2)th value, avoiding two sorts (for the two centre values). With these considerations in mind, implementation of the median filter (ME) is essentially the same as for the moving average filter: For r > 0, the kth point in the filtered data corresponds to the median of the unfiltered values in the window (k - r) to (k + r). As with the moving average filter, the first and last r points are lost. Again, a simple calculation serves to illustrate the properties of this filter. When a median filter (moving average using the median instead of the mean) with r = 1 is passed over a single outlier, the outlying value will always be at the extreme end of the ranked data, and so will never become the median value. Similarly, when a median filter is passed over a step function it can readily be appreciated that the median value will become the higher value precisely when the centre of the window is at the step point provided that the window is narrower than the step function (Fig. 1B). This clearly demonstrates the importance of careful selection and optimization of filter parameters when applying digital filtering to experimental data. Figure 2 shows the effects of applying both filters to an artificially generated chromatographic peak, which is essentially a continuous function (Gaussian) with a slight asymmetry introduced by virtue of monitoring the evolving profile at a fixed point over time rather than at a fixed time over distance (13). For the data used here the retention time and variance were 3.00 min and 0.004 min2, respectively, while the actual peak consisted of =40 data points. Both filters caused a decrease in observed peak height with increasing r, although for small r the ME filter showed a smaller reduction. They also gave rise to increased peac widths, although the median filter did so at a slower rate. The two filters showed opposite behaviour with increasing r for other peak properties such as observed area and retention time, however. Thus the MA filter gave constant area and retention time, while the ME filter showed reduced area and a slight increase in retention time (3.00 to 3.08 min for r = 0 to r = 15). These results can be directly attributed to the difference in mathematical properties of the median and mean, which also explain the flat-topped, truncated peaks produced by the median filter. The latter are produced whenever the data points are symmetrically distributed about a single minimum or maximum, i.e., the values on one side are the mirror image of those on the other. In fact it can be shown (Fig. 3) that there is a relationship between rand the number of identical values about the centre, n,. For n even, symmetrically distributed points,

0.00 2.50

2.62

2.75

2.87 Time (mln)

3.00

40.0 0

3

6

9

12

Filter rank, r

Fig. 3. Origin of peak truncation for the median filter. The median of the (k - l)th window is identical to the (k - 1)th value. The median of the kth window is equal to the (k - 1)th and ( k + 1)th values, not the kth value. kth window

2,

+

1

Original data

[3] n, = r for r odd, r > 1 [4] n, = r + 1 for r even, r > 0 and for n odd, symmetrically distributed points,

15

Can. J. Chem. Vol. 73. 1995 Table 1. Effect of filtering on peak

shape for the chromatogram of Fig. 2. RMS deviation"

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

Rank r

Median

Average

Fig. 4. The effect of filtering on an exponential peak shape: (A) original data; (B) smoothing by a moving average filter (r = 10); (C) smoothing by a median filter (r = 10). Inset: close-up of the peak maximum showing truncation of the peak by the median filter.

Data are spaced every 0.01 min.

aRoot-mean-sum of the square of the differences between ith points in the original and filtered chromatograms.

Table 2. Effect of filtering on peak shape for the profile of Fig. 4. RMS deviation"

0.00

0.50

1.OO

1.50

2.00

2.50

Time (min)

Rank r

Median

Average

4

0.000 36 0.001 06 0.003 29

0.0064 0.0157 0.0345

8

15

'Same as Table 1.

[5] n, = r + 2 for r odd, r > 0 [6] n, = r + 1 for r even, r > 0 The effect of filtering on overall peak shape can be evaluated by calculating the root mean square of the differences between filtered and unfiltered data, i.e.,

Such calculations showed that both filters caused increasing distortion of overall peak shape, the MA filter having the greater effect (Table 1). A discontinuous, asymmetrical peak shape was generated using exponential functions. This type of transient would be encountered, for example, in flow injection analysis using a well-stirred mixing tank with a step concentration input function. The moving average and median filters were again applied to these data, and their effect evaluated as before (Fig. 4). Both filters caused approximately the same decrease in peak height with increasing rank (97.7% of maximum for r = 15) while the peak width increased slightly. Times for peak maximum were unchanged for the ME filter but decreased markedly for the MA filter (86.0% of original for r = 15). A similar truncation of the peak maximum was observed for the ME filter as before, but the moving average filter caused a much larger distortion of the overall peak shape (Table 2). In particular, the time at which the concentration first increases was masked by the moving average filter. Note also that the severity of all these effects for both filters depends on the number of points sampled during evolution of the profile: the

higher the sampling rate and the broader the profile, the lower the relative distortion of the filtered to original signals.

Effect of noise on filter performance The effect of measurement noise was investigated by superposing known noise functions onto the original data shown in Fig. 1 and repeating the filtering calculations. The noise values were generated using methods outlined in Press et al. (10) to give three different distributions: (i) uniform random deviates with zero mean and ranges ?0.02,0.05, and 0.10; (ii) normally distributed deviates with zero mean and standard deviations of 0.02, 0.05, and 0.10; and (iii) exponentially distributed deviates with zero mean multiplied by scaling factors of 0.02,0.05, and 0.10. As expected, the median filter successfully removed the spike from the data while retaining the step function for all three noise types. (Note, however, that when r is greater than or equal to the width of the step function in the data (r = 5 in this case) the step function will be completely removed.) This is to be expected since, as we have seen, the mathematical properties of the median filter favour application to data where edge retention is important (5). In the same way, the moving average filter gave rise to the same features as shown in Fig. 1B. Both filters were successful in reducing baseline noise as measured by the standard deviation, there being a significant difference in s between filtered and unfiltered data as measured by the F-test (p = 0.05). Using the same measure, there was no significant difference between the standard deviations for the two different filters. Similar results were obtained by Camus and Larson for a simulated composition profile with superposed binomially distributed noise (8). Filter performance for drift compensation Moore and Jorgenson (7) used median filtering to extract chromatographic peaks from data with a drifting baseline by first using the filter to completely remove the peak of interest and then subtracting the baseline to obtain the peak in the absence of drift. As long as the filter rank was greater than or equal to the number of points in the peak, accurate peak areas and heights were obtained on comparing the filtered and unfiltered data. A similar procedure was carried out using the data of Fig. 2 added to known baseline drift and noise functions. Initial cal-

Table 3. Evaluation of median filtering for correction of baseline drift

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

No.

Description" Original 1 + S 7.510.01 As 2 2 + R 0.2510.14 As 4 1 + S 3.510.04 As 6 1 + S 3.510.02 As 8 1 + S 3.510.01 As 10 As 10 1 + R 0.0710.10 1 + R 0.0310.10 1 + R 0.0110.10 1 + R 0.0510.10 1 + R 0.1010.10 1 + R 0.2010.10

Noise rangeb

Baseline slopec

r

% area

% height

% width

None None 0.02 0.02 0.02 None 0.02 None None None None None None None None None None None

'S XIY: sine curve with period X minutes and amplitude Y;R XIY: linear ramp with slope X and intercept Y. bunifom random deviates with zero mean and indicated range. 'Measured adjacent to peak for ramp functions and by interpolation across peak for sine functions, arbitrary units.

culations were performed using a filter of rank 40 since there were approximately this many points in the peak. The performance of this method of baseline correction was evaluated in terms of the recovered peak area as a percentage of the starting peak area, and the results are summarized in Table 3. In general terms, the technique worked well provided that the slope of the baseline within the peak envelope did not change rapidly, although a slight rise of the baseline, after filtering, on the trailing side of the peak was observed in some cases. This is a result of the properties of the median when applied to data having a varying slope, and results in a slight dip below zero in the baseline-subtracted peak. With the data analysed here, 95% or better of the original peak area (based on the area before addition of simulated drift) was recovered using r = 40 when the magnitude of the baseline slope under 4 comparison, the peak height/ the peak was less than ~ 0 . 0 (for width ratio was 2.10). The recovered area could be improved for any degree of baseline drift by increasing the width of the window used in the filter, but this also resulted in a considerable increase in the time taken to perform the filtering operation. In terms of the other peak parameters, the recovered peak heights and widths (width at half-maximum height) follow the same basic pattern as the areas, although the extent of the variation is lower. For example, when only 67% of the area was recovered, 82% of the height and 87% of the width were recovered. Peak retention times were only shifted relative to the peak of Fig. 2 when artificial noise was added to the signal, typically giving a variation of less than 0.3%.

Comparison with other digital filters A common approach to filtering noise is the application of Savitzky-Golay (SG) smoothing (14-18). This technique also

uses a moving window; however, a polynomial is fitted through the points in the window by the method of least squares. (The MA filter can therefore be considered as a zeroorder SG filter.) The success of this technique lies in the fact that fitting can be achieved by simple convolution with a set of integer weights, rendering it computationally fast and simple. An added benefit is that derivatives can easily be obtained during the smoothing process by calculating the appropriate weights, explaining the widespread use and popularity of the method. Although the first and last r points are again lost on smoothing the data, Gorry (1 1) has developed a refinement that retains these values. In the following discussion, the exact filter will be identified using Savitzky and Golay's notation, XaYZ, where X is the width (= 2 r + I), Y the order of the polynomial, and Z the order of the differential. The effect of applying SG smoothing to the data of Fig. 1, both with and without added noise, was investigated using r values of 2,5, and 7 and both second- and third-order polynomials. The noise distributions used were as follows: normal with mean zero and standard deviation 20.05; uniform with mean zero and range k0.05; and exponential with mean zero and scale factor 0.05. Some typical results are shown in Fig. 5. Note that at sharp edges the filter produces ripples that widen as the filter rank is increased. This is a result of trying to fit a polynomial function across a discontinuity: similar results would be obtained with filters based on the fast Fourier transform (FFT) algorithm such as the Butterworth and Chebyshev filters (7, 19). Savitzky-Golay smoothing also reduces the height and increases the width of such features as spikes and steps. Note that the smoothing gives rise to a curving baseline in the same way that sharp edges give rise to ripples. A onetailed F-test was used to see if there was a significant improvement between the baseline standard deviations of the filtered

Can. J. Chem. Vol. 73. 1995 Fig. 5. The effect of a 15a20 Savitzky-Golay smoothing on the data of Fig. 1 with added uniform noise. See text for further details.

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

0.700

0.100 0.00

2.00

4.00

6.00

8.00

Fig. 6. Effect of the Savitzky-Golay, median and moving average filters on different baseline noise distributions as a function of filter rank r. (A) Exponential noise with zero mean and scale factor 0.05; (B) Gaussian noise with zero mean and standard deviation 0.05; (C) uniform random noise with zero mean and range 40.05. Noise is expressed as the standard deviation relative to that of the unfiltered data.

10.00

Time (min)

and unfiltered data for each noise function over increasing time frames. Generally, it was found that while all the filters used here reduced baseline noise significantly over long time periods, a significant reduction was only obtained over short periods with normally distributed noise. When the moving average and median filters were applied to the same data, the results were the same as in the absence of the noise functions. Camus and Larson report that a MA filter of rank 9 gave a better reduction in noise than a ME filter of equal rank for a sloping function with binomially distributed noise, whereas the median filter showed a significantly better performance for a step function, in keeping with the results obtained here. Figure 6 shows the performance of the three different filters for three data sets, each containing 1000 points and having a different noise distribution function. A filter rank of zero indicates the unfiltered data. For comparison, Camus and Larson presented similar calculations using a binomial noise distribution and median, moving average, and parabolic filters. As expected, all filters successfully reduce noise as r is increased for flat baselines regardless of the noise distribution, although the magnitude of the reduction in variance decreases exponentially with increasing r. For exponential and normal noise functions, there is little to choose between the ME and MA filters if the sole criterion is reduction in baseline variance. For uniform and binomial noise distributions, the MA filter gives the better performance for the same rankprovided that the data do not contain a discontinuity, such as a step function. Both the MA and ME filters perf0c.m better than the SG filter for exponential and normal noise distributions. This is because (for the same rank) the noise reduction capability of SG filters decreases with order and, as mentioned previously, the MA filter is a SG filter of rank zero. For uniform deviates the SG filter would be preferred to the ME on the basis of computational speed only. A similar study was performed on the peak of Fig. 2 in order to compare the performance of the different filters with regard to peak shape: different noise functions were added to the original data, and the filters applied prior to peak shape analysis. All SG filters studied were third-order polynomials (i.e., Xa30 filters). Since the original data set is a continuous function with superposed noise, the SG filter produced increasingly smooth peaks as the window width was increased. The baseline was also flattened out, exhibiting similar features as noted above. In comparison, the ME filter gave rise to increasingly

0

2

4

6

8

Rank r

0

2

4

6

8

Rank r

0

2

4

6

Rank r

8

Stone Fig. 7. Distortion produced by different filters on noisy data. Original = data of Fig. 2; exponential = exponential noise with scale factor 0.02; Gaussian = normally distributed noise with zero mean and standard deviation f0.02; uniform = random distribution with zero mean and range f 0.02; medX = median filter of rank X;

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

aveX = moving average filter of rank X.

Fllter type and rank truncated, flat-topped peaks as described earlier. The MA filter again had a similar effect on the baseline, while the peaks were of the expected shape. Analysing the peak parameters showed that both the ME and MA filters gave a decrease in peak height and an increase in width at half height with increasing r as noted above, the ME filter showing the slower rate of change. Results for the SG filter were harder to interpret because of the added noise, but comparing results for the same analysis performed on the original peak showed that the SG filter increased area and decreased-height while leaving the width at half height and retention time unaffected. Similar results have been obtained by Enke and Nieman for Gaussian and Lorentzian peaks (16). Results for the ME and MA filters were identical to those noted earlier. The effect of filtering on overall peak shape was also studied by comparing the noisy peak (filtered and unfiltered) with the original noise-free data by calculating the distortion as described above. The results are summarized in Fig. 7 and show two effects: (i) an initial decrease due to noise reduction followed by (ii) increasing "true" distortion due to the filter properties. Note that in the absence of any superposed noise all the filters cause increasing distortion with increasing r, the least effect occurring with the SG filter and the greatest with the moving average. With a uniform noise distribution of zero mean and range k0.02 there appears to be an optimum filter rank for reducing peak distortion for all three filters, corresponding to the balance of the two processes already outlined. The rapid increase in distortion measured for the ME and MA filters can be attributed to the peak truncation and increasing inclusion of the baseline, respectively. For normal (Gaussian) and exponential ("spiky") distributions, the best results were obtained for the SG filter with increasing rank: significant reductions in distortion were also achieved using the ME and MA filters. but this was reversed if the rank was increased beyond an optimal value.

Table 4. Comparison of approximate

calculation times for different filters. Time" (s)

"Calculations performed on the same 1000 point data set using a Macintosh I1 with 68020 processor and 68881 co-processor. bThe specific implementation of the routine to calculate weight tables for SG filtering was limited to a rank of 25 for practical reasons.

The final comparison between these three filters is concerned with the time taken to perform the necessary computations. Table 4 summarizes the time required to complete filtering of the same 1000 point data set for different filter parameters. These results were obtained using the implementations described in the experimental section, with the exception that median calculations were performed using a full data sort. The reason for this was that the faster median implementation described earlier was found to be faulty whenever the routine was applied to a window containing a sequence of identical values. Clearly, if this error was remedied calculation times could be improved. Although calculation times on different platforms will to some extent be implementation and compiler dependent, generally faster times will be obtained on more recent personal computers. Other digital filtering techniques that have been applied to

Can. J. Chern. Vol. 73, 1995

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

Fig. 8. SAW-GC response for injection of 0.20 yL of 1:1 toluene:2-nitrotoluene (by volume) using Tinj= 207OC, T,,, = 17S°C, T,,, = 72OC, and T,,, = 57.6OC. 2-Nitrotoluene peak (a) in the presence of electrical noise, and (b) after application of the median filter, r = 1. Note: no coating on the quartz SAW device. For other details, see text and ref. 9.

0.0 2.00

3.50

5.00

6.50 Time (min)

that under certain conditions with the particular configuration used, the SAW response can be subject to noise resulting from the heating circuits of the GC turning on and off. This noise is characteristically a regularly spaced, single-point "spike" in the data and thus ME filtering represents a logical choice for its removal. An example is shown in Fig. 8 both before and after filtering. Since the noise peaks are only a single point, they can be successfully removed using r = 1, ensuring minimal distortion of the analyte peak profiles. Both SG and moving average filters would not be as successful in removing such features. A hazard in the use of any form of data processing is overprocessing, i.e., applying a sequence of processing techniques that produce a good picture but in the process completely mask the quality of the original measurement. For completeness, an example is given in Fig. 9. This shows results deliberately obtained from the SAW-GC system under a "worstcase" scenario: a badly aged SAW device operating at elevated temperature and close to the edge of the oscillators' stable region combined with an old, contaminated capillary column and injection port septum. Application of ME filtering with subsequent SG smoothing to restore a rounded appearance to the analyte peaks gives a reasonable chromatogram, but in the process has generated one spurious peak and significantly altered peak heights and areas. Multiple filter passes and combinations of techniques can make poor data look good, but they do not make it better. As always, it is the responsibility of the operator to exercise judgement on both the quality of the data and the required level and sophistication of filtering applied to it prior to analysis.

Conclusions analytical signals are based on either Fourier Transform or Kalman filtering techniques (1-3). Unlike the methods described so far, these techniques require more detailed knowledge of the nature of the signal and any noise or drift associated with it. In addition, they represent a significant increase in mathematical and computational complexity. A good example of this is the recent application of a modified extended Kalman filter incorporating least median of squares (robust) regression to detect and correct drifting calibration in graphite furnace-AAS while minimizing the effect of outliers and the number of recalibration measurements (20). One potential advantage of.Kalman filtering is that it is not necessary to know the exact physical relationship describing the evolving signal, as long as a suitable mathematical function can be found that accurately describes the observed behaviour. To obtain good results with this technique, however, it is essential to have good estimates of the system and measurement noise. Another potential problem with FFT-based filters (such as the previously mentioned Butterworth and Chebyshev filters) is that they can introduce features into the data that may constitute an interference from an analytical viewpoint. A more detailed comparison with these filters is beyond the scope of the present article.

Application to transient sensor signals As described in the experimental section, median filtering has been implemented in the software used to collect and analyse data from a SAW-GC combination. The reason for this was

Like any other digital processing technique, median filtering has its advantages and disadvantages. It has been demonstrated that, compared to moving average and Savitzky-Golay filtering, ME filtering is particularly effective at removing noise spikes from experimental data. When applied in this manner, low filter ranks should be used (1 5 r 5 3) to ensure successful filtering with minimal distortion of the original signal. Likewise, ME filtering is particularly well suited to situations in which retention of sharp edges and discontinuities in the data are important. Median filtering can also be successfully applied to peak measurement in the presence of drifting baselines as previously demonstrated by Moore and Jorgenson (7). In addition to the conditions identified by these authors, it has been demonstrated here that the relative baseline slope is important in determining the accuracy of peak areas and heights obtained by this method. In more general terms, a moving average or SavitzkyGolay filter may be preferred in situations were large filter ranks are required for successful smoothing of noisy data, but once again careful attention must be paid to the nature of the signal being processed and the information required from it: if peak heights are important, for example, an SG filter may well be preferable to an MA filter. Finally, a simple median filter could be implemented for real-time smoothing of experimental data as an alternative to other techniques, provided that large values of r are not required. Median filtering is thus a useful complement to existing data-processing methods for instrumental measurements.

Stone

Can. J. Chem. Downloaded from www.nrcresearchpress.com by MICHIGAN STATE UNIV on 01/20/17 For personal use only.

Fig. 9. SAW-GC chromatogram showing overprocessing of very noisy data: (a) original data for a six-component mixture of organic compounds; (b) the same data following median filtering (r = 2) and SG smoothing (15a20).

0.0 2.00

3.00

4.00

5.00

6.00

7.00

Time (min)

Acknowledgment I would like to thank Michael Thompson (University of Toronto, Canada) for providing the opportunity to perform this work; M. Donata Frank (University of Toronto, Canada) for the SAW-GC measurements; and Peter A. Gony (University of Manchester, England) for providing a FORTRAN listing of his generalized Savitzky-Golay routine. This work was supported in part by the Institute for Chemical Science and Technology, Canada.

References 1. S.D. Brown. Anal. Chim. Acta, 181, 1 (1986). 2. R.G. Brereton. Chemometrics: applications of mathematics and statistics to laboratory systems. Ellis Horwood, New York. 1990. 3. C.L. Erickson, M.J. Lysaght, and J.B. Callis. Anal. Chem. 64, 1l55A (1992). 4. J.W. Tukey. Exploratory data analysis. Addison-Wesley, Reading, Mass. 1977. 5. T.S. Huang (Editor). Two-dimensional digital signal processing 11: Transforms and median filters. Springer-Verlag, Berlin. 1981.

6. L.R. Rabiner, M.R. Sambur, and C.E. Schmidt. IEEE Trans. Acoust. Speech Signal Process. 23,552 (1975). 7. A.W. Moore and J.W. Jorgenson. Anal. Chem. 65, 168 (1993). 8. P.P. Camus and D.J. Larson. Appl. Surf. Sci. 76177,416 (1994). 9. M. Thompson and D.C. Stone. Anal. Chem. 62, 1895 (1990). 10. W.H. Press, S.A. Teukolsky, W.T. Vettering, and B.P. Flannery. Numerical recipes in C: The art of scientific computing. 2nd ed. Cambridge University Press, Cambridge. 1992. pp274ff, 341ff. 11. P.A. Gorry. Anal. Chem. 62, 570 (1990). 12. J.N. Miller. Analyst, 118,455 (1993). 13. P.J. Karol. Anal. Chem. 61, 1937 (1989). 14. A. Savitzky and M.J.E. Golay. Anal. Chem. 36, 1627 (1964). 15. J. Steinier, Y. Termonia, and J. Deltour. Anal. Chem. 44, 1906 (1972). C.G. Enke and T.A. Nieman. Anal. Chem. 48,705A (1976). A. Proctor and P.M.A. Shenvood. Anal. Chem. 52,2315 (1980). P.D. Wentzall, T.P. Doherty, and S.R. Crouch. Anal. Chem. 59, 367 (1987). R.W. Hamming. Digital filters. 3rd ed. Prentice Hall, Englewood Cliffs, N.J. 1989. D. Wienke, T. Vijn, and L. Buydens. Anal. Chem. 66, 841 (1994).

Suggest Documents