Fast Accurate Fuzzy Clustering through Data Reduction

Steven Eschrich, Jingwei Ke, Lawrence O. Hall and Dmitry B. Goldgof
Department of Computer Science and Engineering, ENB 118
University of South Florida, 4202 E. Fowler Ave., Tampa, FL 33620
{hall,eschrich,goldgof}@csee.usf.edu

November 13, 2002

Abstract

Clustering is a useful approach in image segmentation, data mining and other pattern recognition problems for which unlabeled data exist. Fuzzy clustering using fuzzy c-means or variants of it can provide a data partition that is both better and more meaningful than hard clustering approaches. The clustering process can be quite slow when there are many objects or patterns to be clustered. This paper discusses an algorithm, brFCM, which reduces the number of distinct patterns that must be clustered without adversely affecting partition quality. The reduction is done by aggregating similar examples and then using a weighted exemplar in the clustering process. The reduction in the amount of clustering data allows a partition of the data to be produced faster. The algorithm is applied to the problem of segmenting 32 magnetic resonance images into different tissue types and the problem of segmenting 172 infrared images into trees, grass and target. Average speed-ups of 59 to 290 times over a traditional implementation of fuzzy c-means were obtained using brFCM, while producing partitions that are equivalent to those produced by fuzzy c-means.

1 Introduction

Clustering algorithms can be used to partition unlabeled data [15]. A clustering or partitioning of unlabeled feature vectors obtained from an image provides a segmentation of the image into unlabeled regions. The resultant regions can be labeled by hand or by some automated process.

© 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Images are often large, made up of many pixels or voxels. A 512 by 512 pixel image from which a feature vector is extracted at each pixel provides 262,144 unlabeled feature vectors. Such an image, which is not considered particularly large today, will take a significant amount of time to cluster or partition. In the domain of data mining, a large number of unlabeled examples which can be described by real-valued feature vectors may be better understood if they are partitioned by a clustering program. Such data sets may be created in the process of text mining [22] or drug discovery [23], for example. However, the process of producing a partition with a clustering algorithm can be very time-consuming.

In this paper, we look at a method to speed up the clustering process and yet produce a partition of quality equal to that of well-known, slower algorithms. The number of examples to cluster is reduced by placing them into bins, creating an exemplar for each bin, and weighting the exemplar by the number of examples in its corresponding bin. In particular, we examine modifications to the fuzzy c-means (FCM) clustering algorithm [2, 10]. FCM has been shown to provide effective partitions for image segmentation on medical images [12, 3, 4, 21, 11, 20, 33], satellite images [33, 32], infrared images [13], etc. Here, we show examples of the modified algorithm applied to two image segmentation domains. The modifications proposed here are based on 2rFCM [16, 10].

In order to provide speedup and FCM correspondence results, we test against a set of synthetic infrared images of natural scenes. These images were generated for an Army Research Lab project and were used for automatic target recognition (ATR). The natural scenes were observed to cluster well into trees and grass. Each image is a 480x830 8-bit grey-level image and was clustered into 5 clusters. There are 172 images in the dataset. One of Laws' texture energy features [19] was generated to create a two-dimensional feature vector (pixel intensity and texture). We also test the system using a set of magnetic resonance (MR) images of the human brain. There are 32 MRI slices from 7 normal volunteers. Each slice consists of three 12-bit features in a 256 by 256 pixel image. Physician-generated ground truth exists for each image.

We call the algorithm introduced here brFCM and compare its reduced execution time to that of FCM on both sets of images. The differences in the resulting partitions are also examined and shown to be no greater than the differences between different random initializations of FCM itself.

This paper is organized into six sections. Section 2 briefly discusses related work, Section 3 describes the modifications to FCM that make up brFCM, Section 4 discusses some implementation issues that must be addressed to maximize the speedup, Section 5 presents experimental results, and Section 6 contains the conclusions.

2 Related work

The most frequent operational complaint about FCM is that it may consume, for large data sets, significant amounts of CPU time [6]. This concern also applies to variants of FCM. In [6], AFCM was implemented to approximate FCM by using a lookup table approach. The time required per iteration in AFCM is one sixth of the time required by FCM. However, AFCM is not guaranteed to converge, and the lookup tables in AFCM depend on the number of bits in the data. In [28], a multistage random sampling fuzzy c-means (mrFCM) algorithm was devised to expedite FCM. The average speedup factor of mrFCM vs. FCM was shown to be 3.14. mrFCM preserves the convergence property and the clustering quality of FCM, and has also been used in other research [33, 32]. In [27, 24] a method of subsampling data is proposed and shown to provide good speedup when the subsampling algorithm is effective.

Previously, the possibility of combining identical feature vectors to speed up FCM was examined. However, there are generally few identical feature vectors. In this paper, the combination of similar feature vectors is used to speed up FCM. Higher-dimensional feature spaces may yield improved image segmentation when compared to visual interpretation or gray-scale segmentation of a single image [4]. On the other hand, clustering in multidimensional feature spaces is more time-intensive than in one-dimensional feature spaces. Due to the difficulties of multidimensional optimization [16], one typically minimizes an overall one-dimensional objective function. The fuzzy c-means algorithm uses iterative optimization to approximate minima of an objective function which is a member of a family of fuzzy c-means functionals using a particular inner-product norm metric as a similarity measure [6]. Our work on speeding up fuzzy c-means has some connection to vector quantization [14, 7], in the sense that our first step can be seen as a quantization of the data.

In [1], examples to be clustered are removed during the clustering process when they are no further from the centroid than they were in the previous iteration. On several images, a little more than a two-times speedup is obtained. In [18], a faster version of relational fuzzy c-means is proposed and shown to be robust and up to an order of magnitude faster for text clustering. A fuzzy c-means clustering algorithm designed for cases in which the number of features is significantly larger than the number of feature vectors utilizes the covariance structure of the feature vectors and cluster centers to reduce the number of floating point operations and provide some speed increase [5]. Delaunay triangular functions are used to store proximity information to speed up a medoid clustering model for spatial data mining [29]. In [26], the authors utilized visualization in conjunction with automated clustering to speed up the process of partitioning data. In [23], an extension to the self-organizing map is used to cluster protein sequences by choosing generalized medians of symbol strings to reduce the amount of data for clustering.

A scalable, parallel approach to clustering has recently been explored for a shared-nothing architecture [31]. Several efficient and scalable parallel algorithms for squared-error clustering have been proposed for a special-purpose architecture where reconfigurable optical buses and a variable number of processors are available [30]. A method of accelerating the hard c-means (k-means) algorithm and obtaining a partition equivalent to that obtained by hard c-means is given in [25]. The authors show a speedup of up to 170 times on real astrophysical data, although the approach scales poorly for more than eight features. Some implementation efficiencies for FCM along the lines of those we utilize were recently examined in [17]. The authors eliminate the partition (U) matrix and show some impressive speedups on a small set of examples.


3 brFCM

In [16], a description is given of a modified FCM algorithm known as 2rFCM. The algorithm reduces the number of feature vectors to be clustered, possibly reducing the precision of the data, in order to speed up the clustering. We present an alternate view of the algorithm, generalizing it to arbitrary numeric data. This algorithm is discussed within the image processing domain; however, the technique can be applied to many other clustering problems.

The brFCM algorithm consists of two phases: data reduction and fuzzy clustering using FCM. The data reduction phase consists of an optional precision reduction step and an aggregation step. Both steps attempt to reduce the number of feature vectors presented to the FCM algorithm. Specifically, we attempt to reduce the number of distinct examples to be clustered from $n$ to $n'$, for some $n' \le n$. At the same time, we want to preserve partition "goodness."

3.1 Data Reduction: Overview

The first step in data reduction for the brFCM algorithm is quantization. When continuous data of any type is measured and translated into the digital domain, some level of quantization occurs. This quantization is often a reduction in precision and therefore a potential loss of information. However, an assumption is generally (and we feel appropriately) made: small, often human-imperceptible changes in values do not affect the classification of the object in question. This can be seen as a tolerance to noise, or simply the extent of distinguishability within the human brain. Typically, we expect small changes in values to lead to correspondingly small changes in distances, and clustering makes the assumption that groups of examples exist within some close distance in feature space.

The second step of data reduction in brFCM is aggregation. For this algorithm, aggregation combines identical feature vectors into a single, weighted exemplar. Identical feature vectors are not common in all data; however, where they do exist, aggregating the duplicates often yields a significant reduction in complexity. Again, consider the image processing domain and an image with 256 grey-level values of intensity as the only feature. As long as the image contains more than 256 pixels, some level of aggregation can occur. An image of size 800x600 would provide at minimum a 99.9% reduction in feature vectors (from 480,000 to 256).

When both quantization and aggregation are used, significant data reduction can be obtained. Quantization forces different continuous values into the same quantization level, or bin, creating identical feature vectors from "similar" ones. In brFCM, aggregation creates a single exemplar representing the quantization bin. The value of this exemplar is taken as the mean value of all full-precision feature vectors quantized to this bin. The data reduction step of brFCM thus attempts to create truly representative examples from the original feature space, since the mean value of a set of examples retains some information. In addition, each representative feature vector has an associated weight, corresponding to the number of full-precision examples within its quantization level.

Once the data reduction has been accomplished using quantization and aggregation, the resulting dataset of examples is clustered using a modified FCM clustering algorithm. Once clustering is complete, the membership values of each representative feature vector are distributed identically to all members of its quantization level.

Data reduction using quantization will necessarily lose information about the dataset. There is no a priori method of determining an appropriate level of reduction; acceptable precision loss must be empirically determined. As will be shown later in this paper, small precision reductions can produce clusters that closely correspond to those of FCM. It should also be noted that quantization is an optional step in data reduction. The brFCM algorithm with only aggregation is functionally equivalent to the original FCM. If data redundancy is significant, the dataset can be represented in a more compact form for clustering, and the brFCM algorithm can then be used with significant computational savings relative to FCM with no difference in clustering output.
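To make the data reduction phase concrete, the following is a minimal sketch in Python/NumPy. It is not the authors' code; the function name `reduce_data` and the bit-shift parameterization of the quantization step are our own illustrative choices.

```python
import numpy as np

def reduce_data(X, mask_bits=3):
    """Quantize integer feature vectors by masking their low-order bits,
    then aggregate identical quantized vectors into weighted exemplars.

    X : (n, d) integer array of full-precision feature vectors.
    Returns (exemplars, weights): per-bin means of the full-precision
    data and per-bin counts.
    """
    # Quantization (optional step): zero the low-order bits of each feature.
    quantized = (X >> mask_bits) << mask_bits

    # Aggregation: group rows that share the same quantized value.
    bins, inverse, counts = np.unique(
        quantized, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()

    # Each exemplar is the mean of the FULL-precision vectors in its bin,
    # not the bin center, as described above.
    sums = np.zeros((bins.shape[0], X.shape[1]))
    np.add.at(sums, inverse, X.astype(float))
    exemplars = sums / counts[:, None]
    return exemplars, counts
```

Setting `mask_bits=0` performs aggregation only, which, per the discussion above, leaves the clustering result identical to FCM while still shrinking the dataset whenever duplicate vectors exist.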

3.2 brFCM Details

Once the data reduction phase of brFCM has been performed, the reduced-precision dataset can be clustered by the FCM algorithm. FCM is modified to include support for weighted feature vectors. Recall that the aggregation step of data reduction creates representative exemplars; the weights correspond to the number of aggregated feature vectors.

In more formal terms, consider the set of example vectors $X' = \{x'_1, \ldots, x'_{n'}\}$ representing a reduced-precision view of the dataset $X$. There are $n'$ such vectors, with $n' \le n$. Each $x'_j$ represents the mean of all full-precision members of the $j$th quantization bin, $b_j$. In addition, $x'_j$ has an associated weight, $w_j$, representing the number of feature vectors aggregated into $b_j$. Clustering is done through alternating calculations of membership values and cluster centroids. The cluster centroids are calculated by

$$ v_i = \frac{\sum_{j=1}^{n'} w_j \, u_{ij}^{m} \, x'_j}{\sum_{j=1}^{n'} w_j \, u_{ij}^{m}}, \qquad 1 \le i \le c \tag{1} $$

The cluster membership values are calculated by

$$ u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x'_j - v_i \rVert}{\lVert x'_j - v_k \rVert} \right)^{2/(m-1)} \right]^{-1} \tag{2} $$

where $1 \le i \le c$, $1 \le j \le n'$, and $m > 1$ is the fuzzification exponent. It is worth noting two particular features of this algorithm:

- When no quantization occurs and the aggregation step does not reduce the dataset, $n' = n$ and $w_j = 1$ for all $j$. The algorithm reduces to FCM.

- When the aggregation step is used by itself, the algorithm also reduces to FCM. However, the calculation is more efficient, since identical terms in the summation are grouped together. This formulation can significantly improve the speed of clustering without loss of accuracy.
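As a concrete illustration of the weighted updates in equations (1) and (2), the following is a minimal NumPy sketch. It is ours, not the authors' implementation; the name `br_fcm` and defaults such as $m = 2$ and a Euclidean norm are illustrative assumptions.

```python
import numpy as np

def br_fcm(exemplars, weights, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Weighted FCM over (exemplars, weights) from the data reduction step.

    exemplars : (n', d) array of bin means x'_j
    weights   : (n',)   array of bin counts w_j
    Returns (centroids v of shape (c, d), memberships u of shape (c, n')).
    """
    rng = np.random.default_rng(seed)
    n_prime = exemplars.shape[0]
    # Random initial fuzzy memberships; each column sums to 1.
    u = rng.random((c, n_prime))
    u /= u.sum(axis=0)

    for _ in range(max_iter):
        um = (u ** m) * weights                      # w_j * u_ij^m
        # Equation (1): weighted centroid update.
        v = (um @ exemplars) / um.sum(axis=1, keepdims=True)

        # Equation (2): memberships from distances to each centroid.
        d = np.linalg.norm(exemplars[None, :, :] - v[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                        # guard exact hits
        u_new = 1.0 / (d ** (2.0 / (m - 1.0)))
        u_new /= u_new.sum(axis=0)                   # normalize columns

        if np.max(np.abs(u_new - u)) < eps:
            u = u_new
            break
        u = u_new
    return v, u
```

After convergence, each exemplar's memberships would be copied to every full-precision vector in its bin, as described in Section 3.1.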

3.3 Example

As an example of the brFCM algorithm, consider an image consisting of 4 pixels, as listed in Table 1, with the values representing intensity levels. For this example, we will quantize the feature space by masking the lower $b$ bits of the feature value.

  Feature Vector   Value   Binary Value   Quantized Value   Binary Quantized Value
  x_1              25      011001         24                011000
  x_2              26      011010         24                011000
  x_3              32      100000         32                100000
  x_4              32      100000         32                100000

  Table 1: Example - 4 Pixel Image

Aggregation by itself would produce one reduced vector with a weight of 2, since $x_3$ and $x_4$ are identical. The other two vectors are represented in the reduced dataset as they are, with a weight of 1.

We can also quantize the feature space using a bit mask of 111000, that is $b = 3$, where the width of the quantized bin is $2^3 = 8$. After quantization, the pixels have the values seen in Table 1, columns 4 and 5. Aggregation can then be applied to the dataset to provide data reduction, as shown in Table 2. Note that the mean value used in the clustering step is computed from the full-precision values.

  Feature Vector   Mean Value   Weight
  x'_1             25.5         2
  x'_2             32           2

  Table 2: Example - 2 Pixel Reduced Image
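The arithmetic in Tables 1 and 2 can be verified with a few lines of Python (ours, purely illustrative):

```python
pixels = [25, 26, 32, 32]
b = 3                                  # mask the lower 3 bits (bin width 8)

bins = {}
for p in pixels:
    q = (p >> b) << b                  # 25 -> 24, 26 -> 24, 32 -> 32
    bins.setdefault(q, []).append(p)

for q, members in sorted(bins.items()):
    print(q, sum(members) / len(members), len(members))
# 24 25.5 2   -> x'_1 in Table 2
# 32 32.0 2   -> x'_2 in Table 2
```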

3.4 Image Characteristics

One question that arises from the brFCM algorithm is: "For what type of image is brFCM suited?" At present there are no direct tests to indicate the usefulness of the algorithm on a particular set of images. However, as discussed earlier, the presence of many identical feature vectors indicates that substantial data reduction is possible. When multiple dimensions are considered in an image (e.g., multi-spectral, RGB), the feature space grows and the likelihood of identical vectors decreases.

The brFCM algorithm can be an effective replacement for FCM if the feature space defined by the data set is similar in size to, or smaller than, the number of vectors in the dataset. As an example, consider an RGB image with $256 \times 256 \times 256 = 2^{24}$ possible values. With no prior knowledge of the image, the probability of any particular RGB value would be $1/2^{24}$.
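As a rough practical test of this criterion (our suggestion, not from the paper), one can count the distinct quantized vectors in an image before committing to brFCM; a hypothetical sketch:

```python
import numpy as np

def reduction_ratio(X, mask_bits=0):
    """Fraction of feature vectors eliminated by quantization + aggregation.

    X : (n, d) integer feature vectors, one row per pixel.
    """
    q = (X >> mask_bits) << mask_bits
    n_distinct = np.unique(q, axis=0).shape[0]
    return 1.0 - n_distinct / X.shape[0]

# For an 800x600 single-band 8-bit image there are at most 256 distinct
# vectors among 480,000 pixels, so the ratio is >= 0.999 even with
# mask_bits=0; for a full-color RGB image it may be near 0.
```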
