Carnegie Mellon
Hyperspectral Feature Selection for Detection of Chicken Skin Tumors Songyot Nakariyakul 2003
Electrical & Computer
ENGINEERING
Advisor: Prof. Casasent
Songyot Nakariyakul
Department of Electrical
and Computer Engineering
Carnegie Mellon University,
Pittsburgh,
PA 15213
Advisor: Prof. David Casasent
ABSTRACT
We consider a feature selection method to detect skin tumors on chicken carcasses using hyperspectral data. A chicken skin tumor is an ulcerous lesion that is surrounded by a rim of thickened skin. Detection of chicken tumors is a difficult detection problem because chicken tumors are of many sizes and shapes; some tumors appear on the side of the chicken. In addition, different areas of normal chicken skin have a variety of hyperspectral response variations, some of which are very similar to the spectral responses of tumors. Similarly, different tumors have different spectral responses. Thus, proper training is needed and many false alarms are expected. Since the spectral responses on the lesion and thickened skin regions of tumors are considerably different, we train our feature selection algorithm to detect lesions and thickened skin separately; we then morphologically process the resultant images and fuse the two detection results to reduce false alarms. Forward selection and modified branch and bound algorithms are used to select a small number of features that are useful for discrimination. Initial results show that our method has a good tumor detection rate and a low false alarm rate.
Keywords: feature reduction, feature selection, hyperspectral data, product inspection.
1. INTRODUCTION
Hyperspectral (HS) image data is high-dimensional data that contains more than a hundred images in narrowly spaced spectral bands (λ). Hyperspectral information has been shown to be useful for detection of objects in military applications such as detecting military vehicles [1, 2] and mines [3], for land use applications [4], and for many USDA product inspection applications [5-12]. This occurs since HS data provides spectral information that uniquely characterizes and identifies the chemical, moisture, and physical properties of the constituent parts of an input object, scene region, or agricultural product. Hyperspectral data has successfully separated internally damaged almonds from normal ones [5], aflatoxin-infested corn kernels from good ones [6-8], vitreous durum wheat kernels from non-vitreous ones [9], and fecal-contaminated chicken carcasses from clean ones [10-12].

One of the main problems in classification of high-dimensional data is that there are often not enough samples in the training data. It is generally accepted that the required number of training samples per class must be at least ten times the number of features (in this case, the number of input λ spectral samples) [13] if one wants to be able to accurately predict the class of an unknown sample. This phenomenon is known as the curse of dimensionality. Thus, use of hyperspectral data requires more than a thousand training samples per class in order to cope with the curse of dimensionality. In general, this number of samples is quite difficult to obtain. Thus, it is necessary to reduce the number of features by either feature extraction or feature selection techniques. Feature extraction refers to algorithms that map all of the original features into a few features (each of which is a function of all original features), and feature selection refers to algorithms that select a small subset of the input feature set (only several λ features) to use for classification. Feature selection is preferred to feature extraction because HS data acquisition systems then only need data from a few λ bands (this provides faster data acquisition and a less expensive system that requires fewer filters or simpler laser diode light sources). Thus, we consider a new feature selection algorithm developed earlier at Carnegie Mellon University [14].
In this report, we discuss the use of feature selection on hyperspectral images to detect skin tumors in HS images of chicken carcasses. A chicken skin tumor is a round ulcerous lesion surrounded by a rim of thickened skin [15]. Figure 1 shows a [...] x 1500 pixel color image of a chicken carcass with 14 tumors present. Figure 2 displays the 554 nm wavelength band image with all tumors numbered and marked by rectangles. The images in HS data are gray-scale and correspond to reflectance data of the object; the image in each λ spectral band is affected by the skin color, shading, and slope of each local region of the carcass. Figures 3a, 3b, 3c and 3d are enlarged images of the tumors numbered 10, 11, 6 and 14, respectively, in Figure 2. Chicken skin tumors vary in size from 4x2 pixels to more than 40x25 pixels. In a single gray-scale HS image at one λ, such as Figure 2, the central ulcerous lesion regions of tumors appear as bright regions, as seen in Figures 3a and 3b, and the thickened skin surrounding the lesion regions appears as dark-gray rings, as shown in Figures 3a and 3b. When tumors occur on the side of the carcass, they appear elliptical and are very small. Such tumors are shown in Figures 3c and 3d. Thus, detecting chicken skin tumors is a difficult problem.
Figure 1. A color image of a chicken carcass with skin tumors.
Figure 2. The 554 nm wavelength band image of the carcass shown in Figure 1 with all tumors numbered and marked by rectangles.
Figure 3. Enlarged images of the tumors numbered (a) 10, (b) 11, (c) 6 and (d) 14 in Figure 2.
Prior work on detection of chicken skin tumors using HS data considered the statistical properties (mean, standard deviation, skewness, kurtosis, and coefficient of variation) of HS images in three selected bands (λ) [16]. Principal component analysis (PCA) was applied to hyperspectral images of normal and tumor regions to produce the first 10 PCA eigenimages. The eigenimage with the best contrast and differences between tumor and normal regions was analyzed to find the three bands with the most important (largest) coefficients contributing to that eigenimage. The three bands at 465 nm, 575 nm, and 705 nm were used. A square grid with a mesh size of 64x64 pixels was placed over each HS image, with each pixel corresponding to a sample area of 0.1 mm². Statistical features (mean, skewness, kurtosis, and coefficient of variation, defined as (standard deviation / mean) x 100) were calculated within each square in this grid and used as inputs to fuzzy classifiers. The fuzzy inference process was implemented with three operations: (1) fuzzify numerical statistical inputs into input membership functions based on observations; (2) apply fuzzy operators to the antecedents of the rule base; (3) evaluate the consequent portion of the rule [16]. The fuzzy classifiers classify each grid region into one of two categories: normal or tumorous skin. Use of three features (coefficient of variation, skewness, and kurtosis) gave successful detection rates of 91% and 86% for normal and tumorous skin tissue regions, respectively. However, the grid size is too large for our database since some skin tumors in our database consist of only 10 to 20 pixels. This emphasizes the need to classify each pixel individually.

Kim et al. [17] approached the problem differently. They computed the maximum intensities, slopes, and ratios of maximum intensities in several specific wavelength bands for each pixel and used them as features for a linear classifier. Three features were chosen by inspection of the spectra of the training data; as a result, these features are not guaranteed to give the best solution. A simple unspecified linear classifier was used. Image pixels were classified into either the tumor or the normal class. Normal-class pixels that were misclassified by the linear classifier as tumor-class pixels were referred to as false alarms. Morphological image processing was applied to the resultant binary 2-class image to remove false alarms. 31 of 41 skin tumors (76%) were detected with 12 false alarms. Fluorescence images were used in this work. They used 10 images of chicken carcasses; our 2 images were in this set, but the sensor used was different. They used 48 tumor pixel samples, but it is not clear if these were from all 41 of the tumors.

Our database contains HS reflectance images in a total of 65 spectral bands (λ) ranging from λ = 425 to 711 nm. We show the spectral responses of some of the tumor regions in Figure 4. These are the responses at one pixel in the lesion regions of tumors numbered 6, 10, 11 and 14 in Figure 2 (or Figures 3c, 3a, 3b and 3d, respectively). From Figure 4, the spectra of the lesion regions of tumors have similar relative shapes but varying intensities. This is expected because tumors appearing on the side of the carcass (tumors numbered 6 and 14) reflect the light away from the HS sensors, resulting in lower intensities than those of tumors appearing in the middle of the carcass (tumors numbered 10 and 11). This emphasizes the need to normalize the response at every pixel in the database before training or testing. The response at each data pixel is normalized by dividing its response in each band by its average response over all wavelengths. Figure 5 shows the normalized version of the spectra in Figure 4. This data indicates that the spectral responses of the lesion regions of different tumors are not exactly the same, and thus one must carefully select the training set pixel database to represent all tumors.
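The per-pixel normalization described above can be sketched as follows. This is a minimal illustration, not the code used in this work; the function names and the three-band example spectra are hypothetical.

```python
def normalize_spectrum(spectrum):
    """Divide each band response by the pixel's mean response over all bands."""
    mean_response = sum(spectrum) / len(spectrum)
    if mean_response == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [value / mean_response for value in spectrum]


def normalize_pixels(pixels):
    """Normalize every pixel spectrum in a list of spectra."""
    return [normalize_spectrum(spectrum) for spectrum in pixels]


# Two hypothetical pixels with the same spectral shape but different brightness,
# e.g. a tumor in the middle of the carcass and one on its side.
bright = [4000.0, 6000.0, 8000.0]
dim = [2000.0, 3000.0, 4000.0]
normalized = normalize_pixels([bright, dim])
```

After normalization the two example spectra coincide, which is the effect the normalization is meant to have on spectra that differ only in overall intensity.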
Figure 4. Non-normalized spectra of the lesions of tumors in Figure 2. [Plot of response versus band number for tumors #10, #11, #6 and #14.]
Figure 5. Normalized version of the spectra in Figure 4. [Plot of normalized response versus band number for tumors #10, #11, #6 and #14.]
Next, we address whether there is a noticeable difference in the response of the lesion regions and the surrounding thickened skin regions of tumors. Figure 6 shows the normalized spectra of some pixels in the lesion and thickened skin regions of tumor number 10 in Figure 2. It is clear from Figure 6 that the lesions and thickened skin of tumors have quite different HS responses. Thus, we need different sets of feature bands to detect each. To do this, we select a portion of the lesion region and normal skin region pixels of the chicken images as the pixel training/test set for lesions, or the lesion pixel database. We also create a thickened skin pixel database that includes a portion of the thickened skin region and the normal skin region pixels of the chicken images; this is the pixel training/test set for thickened skin. We train our feature selection algorithm on the lesion pixel database and on the thickened skin pixel database; the results are used to detect the lesions and the thickened skin regions of tumors, respectively.

We must first select the optimal features (λ) to use. The only optimal feature selection algorithms are exhaustive search and branch and bound (BB) [18]. An exhaustive search finds the best subset of features out of n by evaluating a criterion function J for all possible combinations and selecting the best one. If we want to find the best subset of 4 features out of 65, 65!/(4!(65 − 4)!) = 677,040 subsets must be searched. In many hyperspectral image cases that have more than a hundred features (λ), exhaustive search is prohibitively time consuming. The BB algorithm is more efficient because it avoids an exhaustive search of the whole search space by rejecting many subsets that are guaranteed to be sub-optimal, and it guarantees that the selected subset is the globally optimal solution for any criterion function that satisfies monotonicity. A modified branch and bound (MBB) algorithm developed by Xue-Wen Chen [14] modifies the BB algorithm by providing a more efficient way to search the subsets. Thus, it is faster than BB and much faster than an exhaustive search. However, for general HS data with more than a hundred feature bands (λ), the computational load of the MBB algorithm is also impractical. This emphasizes the need to reduce the dimensionality of the problem before we apply the MBB algorithm. We use the forward selection (FS) algorithm to select 30 initial features and then use the MBB algorithm to select the optimal subset of features out of these 30 FS features. This is referred to as the high dimensional BB algorithm [14]. We note that the FS algorithm is not guaranteed to produce optimal results like the MBB algorithm does because it does not examine all possible subsets. The FS method also has the nesting problem, i.e. the subset of the best four features chosen by FS contains the subset of the best three features chosen by FS, etc. In practice, the best four features may not contain any of the best three features, etc. Hence, we only use FS to reduce the initial dimensionality of the problem, and we use the MBB algorithm to select a number of final features (three or four features at most) to use in a nearest neighbor (NNB) classifier.

To assign a test sample to a class in the NNB classifier, the Euclidean distance from that sample to each tumor-class and normal-class sample in the training set is computed. The NNB classifier assigns the test sample to the same class as its nearest neighbor in the training set. Using this NNB classifier, each pixel in the chicken carcass is then classified to be either one (white) for candidate skin tumor regions or zero (black) for candidate normal skin regions. Normal-class pixels that are misclassified by the NNB classifier as tumor-class pixels are referred to as false alarms. There are two binary output images formed for each HS image, one to locate lesions (using the lesion pixel database) and one to locate thickened skin regions (using the thickened skin pixel database). We then fuse (intersect) the binary output images to obtain the final detection result. Fusion of the two binary output images is shown to reject false alarms. Creating two pixel databases, the lesion pixel database and the thickened skin pixel database, and fusing the two binary output images resulting from the two versions of the feature selection algorithm is new and has not previously been employed in HS data processing.

Sect. 2 describes the database used. Sect. 3 discusses various background algorithms and issues used in this paper. Methods and test results are presented in Sects. 4 and 5.
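The nearest neighbor classification step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the three-feature training samples, and the 2x2 example image are all hypothetical, and labels follow the convention in the text (1 for tumor, 0 for normal skin).

```python
import math


def nearest_neighbor_label(sample, training_set):
    """Return the label of the training sample closest to `sample` (Euclidean distance)."""
    best_label, best_distance = None, math.inf
    for features, label in training_set:
        distance = math.dist(sample, features)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def classify_image(image, training_set):
    """Classify every pixel: 1 (white) = candidate tumor, 0 (black) = candidate normal skin."""
    return [[nearest_neighbor_label(pixel, training_set) for pixel in row] for row in image]


# Hypothetical training samples: (selected-band features, class label).
training = [
    ((1.20, 0.80, 1.00), 1),  # tumor
    ((0.90, 1.10, 1.00), 0),  # normal skin
    ((0.70, 1.00, 1.30), 0),  # normal skin with a different spectral shape
]
# Hypothetical 2x2 image; each pixel holds its selected-band features.
image = [
    [(1.15, 0.85, 1.00), (0.88, 1.10, 1.02)],
    [(0.72, 1.00, 1.28), (1.25, 0.80, 0.95)],
]
binary = classify_image(image, training)
```

Running the same function once with the lesion training set and once with the thickened skin training set would give the two binary output images that are later fused.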
Figure 6. Normalized spectra of pixels in the lesion and thickened skin region of tumor #10 in Figure 2. [Plot of normalized response versus band number; curves: lesion, thickened skin.]
2. DATABASE
Chicken carcasses with skin tumors were sent for processing to the Instrumentation and Sensing Laboratory (ISL) in Maryland. The hyperspectral (HS) imaging system used consists of a CCD camera, a spectrograph, a sample transport mechanism, and lighting sources [16]. More details on the ISL hyperspectral imaging system are provided elsewhere [19]. The locations of tumors were identified by a Food Safety and Inspection Service (FSIS) veterinarian. Two HS cubes were provided to us for initial testing (an HS cube contains a series of images in narrowly spaced spectral bands (λk), where each image corresponds to the image obtained at one specific frequency band). Each HS cube consists of 65 spectral band images ranging from λ = 425 to 711 nm.

The first HS cube contains a single chicken carcass (Figure 1) with 14 skin tumors. The size of each image is 460x400 pixels; the 554 nm wavelength band image from this HS cube was shown in Figure 2 with all tumors numbered and marked by rectangles. Tumor number 2 was identified by the FSIS veterinarian as normal tissue, but Kim et al. [17] stated in their paper that it was a tumor. If we look at the color image of the carcass (Figure 1), we agree with Kim that it is a tumor (so does our feature selection algorithm). Tumor number 12 appears to contain two small tumors close to each other, but the FSIS veterinarian classified them as one tumor. Six of the 65 band images are shown in Figure 7. Although they look similar, their pixel intensities vary drastically.

The second HS cube has two chicken carcasses with a total of 7 tumors on them. The size of each image is 460x600 pixels; the 554 nm wavelength band image is shown in Figure 8 with tumors numbered and marked by rectangles. The color image for these carcasses is shown in Figure 9. Many tumors in the second HS cube are small compared to those in the first HS cube. Tumor number 2 in the second HS cube consists of only six pixels and has no lesion region. Thus, we do not expect our feature selection algorithm to detect it.
Figure 7. HS images at various λk, k = 10, 20, 30, 40, 50, 60, from left to right and top to bottom.
Figure 8. The 554 nm wavelength band image of the second HS cube with all tumors numbered and marked by rectangles.
Figure 9. A color image of the chicken carcasses for the second HS cube.
In general, one would select the pixel training and test set pixel database from one HS cube and train the feature selection algorithm on it. One then presents the feature selection result on the second HS cube, which has not been trained on before. However, our database is limited; we have only three chicken carcasses available for training and testing. We thus selected a portion of the skin tumors from the first HS cube and the normal skin regions from both HS cubes. This was necessary to reduce false alarms in the second image, since some normal skin regions on the second HS cube have very different spectral responses from those on the first HS cube. Thus, it is necessary to select normal skin region training data from both HS cubes. The tumor regions in both images seem to have similar spectral responses. We selected pixel samples from only some of the tumors in the first HS cube for training and testing because many tumors in the first HS cube are larger than those in the second HS cube. All of the pixel training and test set data for the lesion and thickened skin regions were selected from the first HS cube.

For the lesion pixel database, we extracted the 65 λ band spectral responses for 100 pixels from only five of the 14 ulcerous lesion regions of the tumors (tumors numbered 4, 7, 9, 10 and 11 in Figure 2) and labeled them as "tumor" class. Half (50 samples) of them are used for the pixel training set and half (50 samples) for the pixel test set of target (tumor) pixel data. With 50 tumor pixels for the pixel training set and 65 spectral features, this represents high-dimensional data. We extracted the 65 λ band spectral responses for 360 pixels from normal skin regions of the carcasses and labeled them as "normal" class. Since different areas of normal chicken skin have different spectral responses, there are more normal-class samples than tumor-class samples. We used half (180 samples) of these as the pixel training set and half (180 samples) as the pixel test set for normal-class pixel data. 300 of the 360 normal samples were selected from various areas of the normal skin regions of the carcass in HS cube 1 that have very different spectral responses. These regions include pale skin, pinkish skin, skin covering bony joints, and the shadowy area under the wings, as denoted in Figure 10. Figure 11 shows the normalized spectra of some of these regions; as seen, they are quite different. 60 of the 360 normal skin samples were chosen from the normal skin regions of a carcass in the second HS cube. Unlike the carcass image in the first HS cube, this carcass image displays the side of the chicken as captured by the HS imaging system. Thus, some regions in this carcass, as noted in Figure 12, are not present in the carcass in the first HS cube, and they have different spectral responses, as shown in Figure 13. We included 60 pixels in these regions in our normal skin pixel training and test set.

For the thickened skin pixel database, we extracted the 65 λ band spectral responses for 100 pixels from the thickened skin region surrounding the same five tumors chosen for the lesion pixel database and labeled them as tumor class. Half (50 samples) of them are used for the pixel training set and half (50 samples) for the pixel test set. The normal pixel training and test set pixel databases were the same as those used for lesion detection.
Figure 10. Normal skin regions with different spectral responses.
Figure 11. Normalized spectra of different areas of normal chicken skins. [Plot of normalized response versus band number; curves: pale skin, pinkish skin, bony area, shadowy region.]
Figure 12. Normal skin regions in the second image with different spectral responses from those in the first image.
Figure 13. Normalized spectra of different areas of normal skin regions shown in Figure 12. [Plot of normalized response versus band number; curves: region 1, region 2, region 3, region 4.]
3. BACKGROUND
This section summarizes the feature selection algorithms used in this paper. The goal of each feature selection algorithm is to select features that are important for distinguishing tumor-class samples from normal-class ones. These features, along with the pixel training set, are used in the NNB classifier to classify each sample. To quantify performance, we give the Pc (percentage of tumor and normal pixels correctly classified) score for the pixel training and test sets.

3.1 Forward Selection (FS)
The FS method first selects the best single feature and then adds one feature at a time, namely the feature that, in combination with the previously selected features, maximizes the criterion function J. We use the Bhattacharyya distance as the criterion function, i.e.,

$$J = \frac{1}{8}(\mu_1 - \mu_2)^T \left[\frac{C_1 + C_2}{2}\right]^{-1} (\mu_1 - \mu_2) + \frac{1}{2}\ln\frac{\left|\frac{C_1 + C_2}{2}\right|}{\sqrt{|C_1|\,|C_2|}}, \qquad (1)$$

where μ1 and μ2 are the mean vectors for the tumor-class and normal-class training samples, and C1 and C2 are the covariance matrices for the tumor-class and normal-class training samples, respectively [20, p. 48]. The Bhattacharyya distance is large if the mean difference between the two classes is large (the first term in J) and if the variances of the two classes are different (the second term in J). To select the best subset of m features out of n original features, the number of subsets searched by the FS algorithm is [(2n − m + 1)m]/2, which is much smaller than the number of subsets evaluated in an exhaustive search, n!/(m!(n − m)!), or in the branch and bound method. For example, to select the best subset of 4 features out of 65, the FS algorithm requires searching [(2x65 − 4 + 1)x4]/2 = 254 subsets, whereas an exhaustive search requires searching 65!/(4!(65 − 4)!) = 677,040 subsets. However, the FS method does not examine all possible subsets, so the resulting subset is not guaranteed to be the optimal set of features nor to give the best classification rate Pc. The FS method also has the nesting problem, i.e. the subset of the four best features chosen by FS contains the subset of the three best features chosen by FS, etc. This is not normally expected to be the case. Recall that the FS algorithm produces a set of ordered features. We thus use the FS algorithm to select 30 initial features (more than the final number of features, 3 or 4, that we want to use) and the MBB algorithm to select the optimal subset of final features (three or four features at most) out of these 30 FS features to use in our classifier. This is our FS/MBB algorithm [14].

3.2 Modified Branch and Bound (MBB)
Since the modified branch and bound (MBB) algorithm developed at CMU [14] uses modifications to the basic branch and bound (BB) algorithm, we first give a brief description of the basic BB algorithm. To select the best set of m features out of n original features, the BB algorithm selects the n − m features to be discarded. It creates a search tree with n − m levels, with one feature being omitted at each level of the tree. The problem is to select the path through the tree that yields the largest J. The BB algorithm assumes monotonicity of J, i.e. J decreases as we move down the tree; this is logical because more features are omitted as we move down the tree. We use the Bhattacharyya distance in (1) as the criterion function. The BB algorithm starts the search at the top of the tree, and all nodes at level 1 are analyzed. A given level-1 node has several nodes below it. The successor node below the level-1 node N1 with the largest J is analyzed further. The search continues until it reaches the bottom of the tree, level n − m, resulting in one full path through the tree with an estimate (a bound B) on the criterion function J. J is then evaluated at other level-1 nodes and the process is continued to lower levels of the tree; if J < B for a given node, then J does not have to be evaluated at successor nodes under that node because J decreases as we proceed down the tree. If J > B for a given node, paths from that node to the bottom of the tree are explored (as long as their J remains larger than B). If a new, different full path with J > B is found, the bound B is updated with the new larger value. When a mother node has a low J < B, its successor nodes need not be analyzed. Omitting evaluation of J for a set of successor nodes (when J < B at some mother nodes) speeds up the search, and thus BB is more efficient than exhaustive search.

One BB improvement in the MBB algorithm is to obtain a good initial estimate of B [14]. If we can obtain a good, high initial B, J values at many more nodes higher up in the tree are less likely to give J > B. Therefore, calculations of J for many paths can hopefully be omitted. The MBB algorithm uses FS or other sub-optimal algorithms to order all n features from best to worst, and the tree is then constructed with this ordered feature set. An initial estimate of B is calculated using the m best features ordered by FS. This B bound estimate is higher than one estimated by using non-ordered features. Using an ordered set of features (λ) puts the best FS features on mother nodes with many successor nodes. Hence, we expect a low J to be obtained at these nodes because when the best FS features are omitted, we expect J to be less than B. When this occurs, all subsets of features below these nodes can then be omitted in the search, and this speeds up the search.

Another MBB modification is to find a proper starting search level. The motivation for this is that at the upper levels of the search tree, we do not expect J < B, since only one or two features are omitted. In MBB, we thus start the BB search (J evaluation) at level (n − m)/4, because we only expect J to be less than a good initial B estimate when some reasonable number of features are omitted. J is evaluated for all nodes at this level. If all nodes give J > B, we jump to level (n − m)/2, calculate J for all of its nodes, and apply the BB search to nodes below all nodes with J > B. If any node at some level such as (n − m)/4 has J < B, [...] J > B. If all nodes at level (n − m) have J > B, then we know that we would have had to evaluate J at all nodes above that level. This "jump search" algorithm thus saves evaluating J at all nodes above that level [14].
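The basic BB idea described above (discard n − m features one level at a time and prune any node whose J cannot exceed the current bound B) can be sketched as follows. This is a simplified depth-first illustration, not the MBB algorithm of [14]: it omits the ordering of successor nodes by J, the FS-based initial bound, and the jump search. The criterion `toy_criterion`, its weights, and all function names are hypothetical; any criterion used here must be monotonic (J never increases when a feature is removed).

```python
from itertools import combinations


def branch_and_bound(n, m, criterion):
    """Return (best_subset, best_J, evaluations) for a monotonic criterion."""
    best_subset, bound, evaluations = None, float("-inf"), 0

    def search(removed, last):
        nonlocal best_subset, bound, evaluations
        kept = tuple(f for f in range(n) if f not in removed)
        value = criterion(kept)
        evaluations += 1
        if value <= bound:
            return  # monotonicity: no subset below this node can exceed the bound B
        if len(kept) == m:
            best_subset, bound = kept, value  # full path found: update the bound B
            return
        depth = len(removed)
        # Remove features in increasing index order so each subset is reached once.
        for f in range(last + 1, m + depth + 1):
            search(removed | {f}, f)

    search(frozenset(), -1)
    return best_subset, bound, evaluations


# Hypothetical monotonic criterion: non-negative feature weights plus
# non-negative bonuses for pairs of features that work well together.
weights = [5, 1, 4, 2, 7, 3, 6, 2]
pair_bonus = {(0, 2): 6, (1, 3): 9, (4, 7): 1}


def toy_criterion(subset):
    chosen = set(subset)
    total = sum(weights[f] for f in chosen)
    total += sum(b for pair, b in pair_bonus.items() if set(pair) <= chosen)
    return total


best_subset, best_j, evaluations = branch_and_bound(8, 3, toy_criterion)
exhaustive_j = max(toy_criterion(s) for s in combinations(range(8), 3))
```

In this toy example the three individually best features (4, 6 and 0) are not the best triple, because features 0 and 2 are more useful together; BB still returns the same optimum as the exhaustive search.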
4. MATERIAL AND METHODS
First, the background must be removed prior to image processing. We remove the background around the chicken carcasses by placing a mask whose value is one (white) on the carcasses and zero (black) on the background. To obtain the mask for each carcass image, we first obtain the spectra of the background and several skin regions on the carcass. Unlike the spectral responses of tumors and normal skin, the spectral response of the background does not noticeably vary over the 65 spectral bands. Figure 14 shows the unnormalized spectra of the background, some normal skin regions, and two tumor regions. From these training data, we chose to compute the difference between the responses in two separate bands (the 10th and 40th bands) for each pixel; we set pixels with an intensity difference less than 500 to zero and otherwise set the pixel value to one. This forms the mask. The result is shown and discussed in Sect. 5.1.

Figure 14. Spectra of the background and other chicken regions. [Plot of response versus band number; curves: background, normal, lesion, dermis.]
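The background mask described above can be sketched as follows. This is a minimal illustration under stated assumptions: bands are indexed from zero (so the 10th and 40th bands are indices 9 and 39), the absolute value of the band difference is compared with the threshold of 500, and the function names and synthetic 65-band pixels are hypothetical.

```python
def background_mask(cube, band_a=9, band_b=39, threshold=500):
    """Return a binary mask: 1 on the carcass, 0 on the background.

    `cube[row][col]` is the list of band responses at one pixel.
    The absolute band difference is an assumption; the text only
    states that differences below 500 are set to zero.
    """
    return [
        [1 if abs(pixel[band_b] - pixel[band_a]) >= threshold else 0 for pixel in row]
        for row in cube
    ]


def make_pixel(base, slope, bands=65):
    """Build a synthetic spectrum that rises linearly with band number."""
    return [base + slope * k for k in range(bands)]


background = make_pixel(1000, 0)  # flat response, like the background in Figure 14
skin = make_pixel(2000, 50)       # response that varies strongly with band number
cube = [[background, skin], [skin, background]]
mask = background_mask(cube)
```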
Second, we select the spectral bands to use to locate the lesion and thickened skin regions of tumors and separate them from normal skin regions. We train our feature selection algorithm on the lesion pixel database and the thickened skin pixel database; these are used to detect the lesions and thickened skin regions of tumors, respectively. We use forward selection (FS) to select the 30 best features out of the 65 available ones, and we then apply the modified branch and bound algorithm to select a number of final features (FS/MBB algorithm). For this database (with only 65 λ features), it is possible to apply the MBB algorithm directly to the original databases without first reducing the number of features by FS. In general, we do not expect this to be the case. We show in Sect. 5.2 that the two methods (MBB and FS/MBB) give the same set of final features for our pixel databases. To select the best four features out of all 65 features, MBB took more than three hours, while the FS/MBB algorithm (MBB applied to 30 FS features) took less than two minutes. Thus, the proposed FS/MBB algorithm is preferable for many HS applications. We note that the two methods do not give the same set of final features in general. In such cases, we find small differences in the spectral bands chosen. Sect. 5.2 provides results and a discussion of this. The MBB algorithm solution is optimal, whereas the FS/MBB algorithm solution is sub-optimal because FS/MBB starts from the 30 features chosen by FS, and these are not necessarily the optimal set of 30 features. This is expected, since feature selection is an NP-complete problem [21], and only a search over the entire database can give the best solution. In general HS applications with more than a hundred feature bands, applying MBB to the original database is too time consuming to be considered.

We now address the computation times for the different algorithms. Chen [14] noted that to select the best four features from 137 features, an exhaustive search required a search time of seven days, MBB took four days, and FS/MBB needed only six minutes on a Pentium II 250 MHz computer.
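The FS stage with the Bhattacharyya distance as the criterion J can be sketched as follows. This is a minimal pure-Python illustration, not the implementation used in this work: the helper names, the use of the sample covariance (n − 1 denominator), and the synthetic five-feature data are assumptions made for the example.

```python
import math
import random


def mean_and_cov(samples):
    """Return the mean vector and sample covariance matrix of a list of vectors."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov


def solve_and_logdet(matrix, rhs):
    """Gaussian elimination with partial pivoting: x with matrix @ x = rhs, and ln|det|."""
    d = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    logdet = 0.0
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        a[col], a[pivot] = a[pivot], a[col]
        logdet += math.log(abs(a[col][col]))
        for r in range(col + 1, d):
            factor = a[r][col] / a[col][col]
            for c in range(col, d + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        x[i] = (a[i][d] - sum(a[i][j] * x[j] for j in range(i + 1, d))) / a[i][i]
    return x, logdet


def bhattacharyya(class1, class2, features):
    """Bhattacharyya distance between two classes using only the given feature indices."""
    m1, c1 = mean_and_cov([[s[f] for f in features] for s in class1])
    m2, c2 = mean_and_cov([[s[f] for f in features] for s in class2])
    d = len(features)
    diff = [p - q for p, q in zip(m1, m2)]
    avg = [[(c1[i][j] + c2[i][j]) / 2 for j in range(d)] for i in range(d)]
    x, logdet_avg = solve_and_logdet(avg, diff)
    _, logdet1 = solve_and_logdet(c1, diff)  # only the log-determinant is used
    _, logdet2 = solve_and_logdet(c2, diff)
    mean_term = sum(p * q for p, q in zip(diff, x)) / 8
    cov_term = 0.5 * (logdet_avg - 0.5 * (logdet1 + logdet2))
    return mean_term + cov_term


def forward_selection(class1, class2, n_features, m):
    """Greedily add the feature that maximizes J together with those already selected."""
    selected = []
    while len(selected) < m:
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: bhattacharyya(class1, class2, selected + [f]))
        selected.append(best)
    return selected


# Synthetic data: features 3 and 1 separate the classes, the others are noise.
random.seed(0)
normal = [[random.gauss(mu, 1.0) for mu in (0, 0, 0, 0, 0)] for _ in range(200)]
tumor = [[random.gauss(mu, 1.0) for mu in (0, 3, 0, 6, 0)] for _ in range(200)]
selected = forward_selection(tumor, normal, n_features=5, m=3)
```

Because features are added one at a time, the first two entries of `selected` are exactly what FS would return for m = 2; this is the nesting property discussed in Sect. 3.1. In an FS/MBB pipeline, the ordered FS output would then be passed to the branch and bound stage.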
We now compare the computation times for the MBB algorithm to those for the BB and exhaustive search algorithms on the present database. We first used the FS method to reduce the number of original λ features in the lesion pixel database from 65 to 30. The three optimal feature selection algorithms were then used to select different numbers of final features from the reduced 30 FS features. Table 1 lists the number of calculations of the criterion function J that each of the three algorithms has to evaluate to obtain one to eight final features out of 30 initial ones. The number of calculations of J is a measure of the speed of the different algorithms. When one or two features are selected, an exhaustive search is faster than the BB-based methods. When three or more features are needed, the MBB algorithm outperforms the other two methods. The improvement factor increases as more final features are chosen.
Table 1. The number of calculations of J required by three optimal feature selection algorithms to select different numbers of final features from 30 FS features for the lesion pixel database.

# features selected | Exhaustive search | BB     | MBB
1                   | 30                | 440    | 45
2                   | 435               | 1820   | 466
3                   | 4060              | 5433   | 1686
4                   | 27405             | 14960  | 5528
5                   | 142506            | 34856  | 16418
6                   | 593775            | 59284  | 30429
7                   | 2035800           | 121119 | 33450
8                   | 5852925           | 196304 | 78240
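The exhaustive search column of Table 1 can be checked independently, since an exhaustive search evaluates J once for every subset of k features out of 30, i.e. C(30, k) times. The short Python sketch below reproduces that column; the function name is ours and is only illustrative. The BB and MBB counts depend on the data and on the search order, so they cannot be reproduced this way.

```python
from math import comb

def exhaustive_evaluations(n_features: int, k: int) -> int:
    """Number of k-feature subsets an exhaustive search must score with J."""
    return comb(n_features, k)

# Exhaustive-search column of Table 1: 30 FS features, k = 1..8 final features.
counts = [exhaustive_evaluations(30, k) for k in range(1, 9)]
for k, n in enumerate(counts, start=1):
    print(k, n)
```

The same formula gives C(65, 4) = 677040 subsets when four features are selected directly from all 65 bands.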
We now address the processing applied to the binary image produced after using the features selected by the FS/MBB algorithm and applying them (for each image pixel) to an NNB classifier. A binary image results, with each pixel classified as one for the tumor class and zero for the normal skin class. We refer to this binary image after application of our feature selection algorithm and an NNB classifier as the binary pixel classification image. We expect to have a higher false alarm rate in the binary pixel classification image than in the pixel databases because of the large variation in normal skin and because the normal skin training samples in the pixel databases might not represent all of the normal skin regions in a full image. In addition, we expect false alarms because normal chicken skin has a variety of hyperspectral responses, some of which are very similar to the spectral responses of tumors at the few λs used. Nevertheless, these false alarms should not occur at a number of adjacent pixels if we train our system properly. Since a tumor is not a single pixel but is a region that we assume consists of at least 6 pixels, we analyze the blob colored [22] version of the binary pixel classification image and omit any pixel blobs that form connected regions with five or fewer pixels. We then perform morphological processing on the resultant binary image. We apply a closing operation on the resultant binary image with a structuring element of size 3x3 pixels; this fills in internal holes on tumors of 2x2 pixels or less in size (holes are tumor-class pixels that are misclassified by our classifier as normal skin pixels). The structuring element size of 3x3 pixels was chosen after analysis of the binary images of the 5 tumors in the training set, which showed some 2x2 pixel holes inside the tumor regions; we do not expect large holes in tumor regions from our feature selection algorithm. The two post-processed binary images for the lesion and thickened skin cases are then fused (AND, intersected) to produce the final classification image result. We believe that the transition region from the lesion region to the thickened skin region of a tumor responds to both the lesion and thickened skin features. As a result, when we fuse the binary images for the lesion and thickened skin cases, these regions are detected. This requires further analysis in future work (i.e. modifying the fusion algorithm used). Fusion results of the binary pixel classification images, which further reduce false alarms, are shown in Sect. 5.3.
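The three post-processing steps described above (small-blob removal, 3x3 closing, and AND fusion) can be sketched in plain Python as follows. This is only an illustrative sketch, not the code used in this work: the function names are ours, images are assumed to be lists of lists of 0/1 values, 8-connectivity is assumed for the blob analysis, and the image is zero-padded before closing (none of these choices is stated in the text).

```python
from collections import deque

def remove_small_blobs(img, min_size=6):
    """Zero out 8-connected regions of ones with fewer than min_size pixels."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            blob, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one connected region
                y, x = queue.popleft()
                blob.append((y, x))
                for ny in range(y - 1, y + 2):
                    for nx in range(x - 1, x + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(blob) < min_size:
                for y, x in blob:
                    out[y][x] = 0
    return out

def _filter(img, half, reducer):
    """Square-window filter: reducer=max dilates, reducer=min erodes (outside = 0)."""
    rows, cols = len(img), len(img[0])
    return [[reducer(img[y][x] if 0 <= y < rows and 0 <= x < cols else 0
                     for y in range(r - half, r + half + 1)
                     for x in range(c - half, c + half + 1))
             for c in range(cols)] for r in range(rows)]

def close(img, size=3):
    """Binary closing (dilation then erosion) with a size x size square, size >= 3."""
    half = size // 2
    width = len(img[0]) + 2 * half
    # Zero-pad first so that regions touching the image border are not eroded.
    padded = ([[0] * width for _ in range(half)]
              + [[0] * half + row[:] + [0] * half for row in img]
              + [[0] * width for _ in range(half)])
    closed = _filter(_filter(padded, half, max), half, min)
    return [row[half:-half] for row in closed[half:-half]]

def fuse(a, b):
    """Pixel-wise AND (intersection) of two binary images."""
    return [[pa & pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

# Toy data: a lesion ring with a one-pixel hole plus an isolated false alarm.
lesion = [[0, 0, 0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0, 0, 0],
          [0, 1, 0, 1, 0, 0, 1],
          [0, 1, 1, 1, 0, 0, 0],
          [0, 0, 0, 0, 0, 0, 0]]
thick = [[0, 0, 0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1, 1, 0],
         [0, 0, 1, 1, 1, 1, 0],
         [0, 0, 1, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0, 0]]
lesion_clean = close(remove_small_blobs(lesion))
thick_clean = close(remove_small_blobs(thick))
fused = fuse(lesion_clean, thick_clean)
```

In this toy example the isolated pixel is discarded, the hole in the lesion ring is filled, and only the overlap of the two cleaned images survives fusion.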
5. RESULTS AND DISCUSSION
In this section, we review how we remove the background from the carcasses and show results in Sect. 5.1. We demonstrate our feature selection algorithm performance on the two pixel databases in Sect. 5.2. Sect. 5.3 summarizes the detection results on the two HS images.

5.1 Background Removal Processes
We remove the background from the carcasses by placing a mask over the image; the mask has a value of zero (black) for the background and one (white) for the chicken. Using the fact that the spectral response of the background is nearly constant over λ, we obtain the mask by computing the difference in responses at each pixel between two bands with large differences for the carcass. We used the 10th and 40th band images (see Figure 14 in Sect. 4). We calculated this difference for each image pixel, formed a pixel difference image, and set the pixels with an intensity difference less than 500 to zero and other pixels to one. The resultant image for the first HS images is shown in Figure 15a. We remove some unwanted background blob regions by retaining only connected regions with more than 1000 pixels (the chicken carcasses will have more than 1000 pixels). We then perform morphological processing. We close the resultant binary image with a 5x5 pixel structuring element; this fills in small holes in the mask with 4x4 pixels or less. The structuring element size chosen is presently ad hoc. The final masks for the first HS images and the second HS images are shown in Figure 15b and Figure 16, respectively.
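The thresholding step of the mask computation can be sketched as below. This is a minimal illustration under stated assumptions: the function name is ours, band images are lists of lists of intensities, and the absolute value of the band difference is used (the text does not say whether the difference is signed). The subsequent removal of regions with 1000 or fewer pixels and the 5x5 closing are not shown.

```python
def background_mask(band_a, band_b, threshold=500):
    """Return 1 where the two bands differ by at least threshold (carcass), else 0."""
    return [[0 if abs(a - b) < threshold else 1 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(band_a, band_b)]

# Toy 2x2 band images standing in for the 10th and 40th band images.
band10 = [[100, 120], [900, 2000]]
band40 = [[110, 130], [1500, 2100]]
mask = background_mask(band10, band40)
print(mask)
```

Only the pixel whose two band responses differ by 600 is kept as carcass; the three pixels with nearly constant response are labeled background.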
Figure 15. Mask for the first HS images: (a) pre-processing; (b) post-processing.
Figure 16. Mask for the second HS images.
5.2 Feature Selection Pixel Database Results
The Lesion Pixel Database
We used FS to reduce the number of original pixel database features from 65 to 30. Table 2 lists the 30 features selected in order for the lesion pixel database by the FS algorithm. We then used the MBB algorithm to select the best subsets of one to four features from these 30 FS features (FS/MBB). Our objective was to keep the number of final features as small as possible to allow for real-time implementation. Thus, a maximum of four final features for each pixel database is considered. We also applied the MBB algorithm to the original 65 λ pixel database. Table 3 compares the features selected by the FS (on all 65 original features), MBB (on all 65 original features), and FS/MBB (on 30 FS features) algorithms. As seen, the MBB and FS/MBB algorithms select the same features for discrimination. Thus, our FS/MBB algorithm selects the optimal set of λ features for this pixel database. Only two of the four best features ordered by FS (features 18 and 28 in Table 2) are present in the best subset of four features selected by the FS/MBB algorithm (Table 3); neither of the two best features ordered by FS in Table 2 is present in the best subset of two features selected by the FS/MBB algorithm. Thus, an optimal feature selection algorithm (such as MBB) is needed, and the initial λ feature reduction algorithm (such as FS) must provide a number of starting features that is much larger than the number of final features considered. Thus, one cannot select the best features using only the FS algorithm. The FS/MBB data show the nesting problem that FS produces: once FS has selected a feature, it cannot discard it later. The best subset of three features (features 11, 18, and 28) does not contain any feature in the best subset of two features (features 20 and 29) or one feature (feature 34). The best subset of four features contains only two features in the best subset of three features (features 18 and 28). The FS algorithm cannot handle such cases.
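The nesting problem can be reproduced with a very small example. The sketch below is illustrative only: the criterion values in TOY_J are invented for three hypothetical features (they are not Bhattacharyya values from our data), and a plain exhaustive search stands in for MBB, since both return the subset that maximizes J.

```python
from itertools import combinations

# Hypothetical criterion values J for every subset of three toy features.
TOY_J = {
    (0,): 3.0, (1,): 2.0, (2,): 2.0,
    (0, 1): 4.0, (0, 2): 4.0, (1, 2): 5.0,
}

def criterion(subset):
    """Look up J for a subset given in any order."""
    return TOY_J[tuple(sorted(subset))]

def forward_selection(n_features, k):
    """Greedy FS: repeatedly add the single feature that most increases J."""
    selected = []
    for _ in range(k):
        remaining = [f for f in range(n_features) if f not in selected]
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
    return selected

def exhaustive_best(n_features, k):
    """Optimal subset of size k, found by scoring every subset."""
    return max(combinations(range(n_features), k), key=criterion)

fs_pair = forward_selection(3, 2)
best_pair = exhaustive_best(3, 2)
print(fs_pair, best_pair)
```

FS keeps feature 0 because it is the best single feature, so it can never reach the optimal pair (1, 2), which contains none of the features in the best subset of size one.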
Table 2. The 30 features chosen (in order of importance) out of 65 by the FS algorithm for the lesion pixel database.

feature # | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15
λ feature | 34 | 18 | 28 | 11 | 64 | 14 | 29 | 46 | 7  | 33 | 37 | 47 | 42 | 51 | 56

feature # | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
λ feature | 6  | 9  | 13 | 16 | 20 | 32 | 58 | 53 | 24 | 52 | 41 | 40 | 61 | 38 | 65
Table 3. Best features chosen by three feature selection algorithms from the 30 FS features in Table 2 for the lesion pixel database.

The number of features | FS          | MBB         | FS/MBB
1                      | 34          | 34          | 34
2                      | 18,34       | 20,29       | 20,29
3                      | 18,28,34    | 11,18,28    | 11,18,28
4                      | 11,18,28,34 | 14,18,28,64 | 14,18,28,64
After using the FS/MBB algorithm to reduce the number of features to a low-dimensional space, each sample in the training and test pixel database is fed to the NNB classifier (using the training set pixels as the NNB database), and the classification rates Pc for the training and test pixel sets are obtained. In obtaining training set Pc, the training set sample being classified is of course removed from the NNB classifier.
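The leave-one-out procedure used to obtain the training set Pc can be sketched as follows. This is a hedged illustration: the function name and the toy two-feature samples are ours, and a plain nearest neighbor rule (equivalent to a zero threshold) is assumed.

```python
from math import dist

def loo_nn_pc(samples, labels):
    """Percent of samples whose nearest *other* sample carries the same label."""
    correct = 0
    for i, x in enumerate(samples):
        # Leave sample i out of the nearest neighbor database.
        others = [(dist(x, y), labels[j]) for j, y in enumerate(samples) if j != i]
        _, predicted = min(others)
        correct += predicted == labels[i]
    return 100.0 * correct / len(samples)

# Toy two-feature samples; the last tumor sample lies close to normal skin.
samples = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = ["normal", "normal", "tumor", "tumor", "tumor"]
pc_train = loo_nn_pc(samples, labels)
print(pc_train)
```

Without the leave-one-out step every training sample would match itself at distance zero, and the training set Pc would trivially be 100%.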
Table 4 compares the Pc scores using the features chosen by the FS and the FS/MBB algorithms as the number of final features is increased. When two and three features are used, the training set Pc scores (88.7% and 98.3%) using our FS/MBB algorithm are noticeably higher than those (Pc = 85.2% and 92.6%) using the features selected by the FS algorithm. Thus, our optimal FS/MBB feature selection algorithm is needed. Since the Bhattacharyya measure J used does not relate directly to Pe (or Pc), the features chosen by the FS algorithm can and do sometimes give a better Pc than those chosen by our algorithm (e.g. when 4 features are used). However, the Pc difference is small. We note that generalization is good for all cases in Table 4.

Figure 17 shows the classification rate Pc for the training and test pixel sets as the number of final FS/MBB features used increases. To select the number of final features to use, we look at Pc of the training set. From Figure 17, we see that Pc for the training pixel set reaches a distinct peak when three FS/MBB features are used (Pc = 98%) and it decreases when four final features are used. Thus, we keep three features for the lesion pixel database. Figure 17 shows that the test set pixel database gives a similar result, thus confirming our choice of 3 final features. We also notice that the Pc for the pixel training set when four FS/MBB features are used (Pc = 95%) is lower than that when three features are used. This demonstrates that more features do not always give a higher Pc. In general, we expect Pc for the training set to increase as the number of features increases; thus, in general, use of a validation set (as in [14]) is needed to select the number of final features to use.

Table 5 lists the training and test pixel set Pc and PFA (percentage of normal skin pixels misclassified as tumor pixels) for the three final features (features 11, 18, and 28) selected by the FS/MBB algorithm. Although we obtain a low PFA score of 1.1% for the training set and 2.8% for the test set, there are more % errors for normal skin pixels than for lesion pixels. Image results are presented and discussed in Sect. 5.3.
Table 4. Pc results for features chosen using the FS and FS/MBB algorithms for the lesion pixel database.

# features | FS Pc(train)% | FS Pc(test)% | FS/MBB Pc(train)% | FS/MBB Pc(test)%
1          | 76.5          | 76.5         | 76.5              | 76.5
2          | 85.2          | 82.6         | 88.7              | 85.2
3          | 92.6          | 93.5         | 98.3              | 95.2
4          | 96.1          | 94.4         | 95.2              | 93.9
Figure 17. Pc for the training and test pixel sets vs the number of final FS/MBB features for the lesion pixel database. (Plot of Pc in % against the number of final features, with one curve for the training set and one for the test set.)

Table 5. Pc and PFA (% of normal skin pixels misclassified as tumor) for the training and test pixel sets using the three features selected by the FS/MBB algorithm for the lesion pixel database.

             | Pc (%) | PFA (%)
Training set | 98.3   | 1.1
Test set     | 95.2   | 2.8
The Thickened Skin Pixel Database
Table 6 lists the 30 selected features ordered by the FS algorithm for the thickened skin pixel database. Table 7 compares the features selected by the FS (on all 65 original features), MBB (on all 65 original features), and FS/MBB (on 30 FS features) algorithms. As we can see from Table 7, the best single feature (feature 18) and the best two features (features 1 and 18) chosen by FS are also the optimal ones chosen by the FS/MBB algorithm. The MBB and FS/MBB algorithms yield the same features, and (for this database) both algorithms are optimal. The best three and four features are similar for all algorithms. When three features are used, the FS algorithm only has one (feature 42) of the optimal features chosen by the FS/MBB algorithm, but the other two features are close in λ (feature 1 vs 3 and feature 18 vs 19). When four features are used, the FS algorithm has three (features 1, 3, and 42) of the optimal features chosen by the FS/MBB algorithm. The other feature is close in λ (feature 18 vs 19). We also see the nesting problem in the FS algorithm; from Table 7, we see that the best subset of three features (features 3, 42, and 19) does not contain any feature in the best subset of two features (features 1 and 18). The FS algorithm cannot make such changes.

Table 8 compares the Pc scores using the features chosen by the FS and the FS/MBB algorithms as the number of final features selected is increased. From Table 8, we see that the training set Pc scores using the FS/MBB algorithm are consistently higher than those using the FS algorithm when three or four features are used. Figure 18 shows the classification rates Pc for the pixel training and test sets as the number of final features used increases. The Pc score for the training set pixels is highest when four features are used. From Figure 18, we expect that when five or more FS/MBB features are used, the training set Pc will increase. However, we set a maximum of four wavelength bands for real-time implementation. We thus keep four features to use in the NNB classifier for the thickened skin pixel database (using the training set pixels as the NNB database). In obtaining training set Pc, the training set sample under test is removed from the NNB classifier.

Table 9 lists Pc and PFA for the training and test pixel sets for the four final features (features 1, 3, 19, and 42) selected by the FS/MBB algorithm. The training set PFA score (the % of normal pixels called thickened skin) for the thickened skin pixel database (5.9% for four features) is noticeably higher than the training set PFA score for the lesion pixel database (1.1% for three features). Thus, we expect more false alarms in the binary pixel classification output image for the thickened skin pixel database. We also note more normal skin errors (PFA) than thickened skin errors.
Table 6. The 30 features chosen (in order of importance) out of 65 by the FS algorithm for the thickened skin pixel database.

feature # | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15
λ feature | 18 | 1  | 42 | 3  | 19 | 37 | 8  | 5  | 48 | 23 | 27 | 4  | 21 | 28 | 16

feature # | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30
λ feature | 15 | 38 | 51 | 45 | 47 | 26 | 12 | 49 | 61 | 65 | 54 | 58 | 24 | 44 | 20
Table 7. Best features chosen by three feature selection algorithms from the 30 FS features in Table 6 for the thickened skin pixel database.

The number of features | FS        | MBB       | FS/MBB
1                      | 18        | 18        | 18
2                      | 1,18      | 1,18      | 1,18
3                      | 1,18,42   | 3,19,42   | 3,19,42
4                      | 1,3,18,42 | 1,3,19,42 | 1,3,19,42
Table 8. Pc results for features chosen using the FS and FS/MBB algorithms for the thickened skin pixel database.

# features | FS Pc(train)% | FS Pc(test)% | FS/MBB Pc(train)% | FS/MBB Pc(test)%
1          | 78.2          | 78.2         | 78.2              | 78.2
2          | 88.2          | 87.2         | 88.2              | 87.3
3          | 90.1          | 89.1         | 90.9              | 88.6
4          | 88.2          | 89.1         | 91.3              | 90.5
Figure 18. Pc on the pixel training and test sets vs the number of final features for the thickened skin pixel database. (Plot of Pc in % against the number of final features.)
Table 9. Pc and PFA for the training and test pixel sets using the four features selected by the FS/MBB algorithm for the thickened skin pixel database.

             | Pc (%) | PFA (%)
Training set | 91.4   | 5.9
Test set     | 90.5   | 6.5
5.3 Detection Results
The First Image
The three chosen features (features 11, 18 and 28) for the lesion pixel database and the NNB classifier were applied to the pixels in the first HS images. To assign an unknown pixel to a class in the NNB classifier, the closest Euclidean distance from that sample to any tumor-class sample, d1, and to any normal-class sample, d2, are computed. The sample is then assigned to the tumor class if d1 < d2 + T (T is a threshold), and to the normal class otherwise. When the threshold T is zero, the NNB classifier is unbiased. If the threshold T is a large negative number, d2 + T becomes a negative or small positive number. Thus, d1 is more likely to be larger than d2 + T, and most of the pixels will be classified as normal skin. For the lesion pixel database, when the NNB threshold is zero, the false alarm rate on image one is high (PFA = 5%). Note that this is noticeably larger than the PFA = 1.1% for the lesion pixel database. We selected a small threshold of -0.0005 and found the detection result to be acceptable (PFA = 1.5%). Figure 19a shows the binary classification result for the lesion features on image one.
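The thresholded decision rule can be written compactly. The sketch below is illustrative only: the function and variable names are ours (d_tumor and d_normal are the two nearest distances in the rule above), and the three-feature sample values are invented to show how a small negative T flips a borderline pixel to the normal class.

```python
from math import dist

def nnb_classify(pixel, tumor_samples, normal_samples, threshold=0.0):
    """Return 1 (tumor) if d_tumor < d_normal + threshold, else 0 (normal)."""
    d_tumor = min(dist(pixel, s) for s in tumor_samples)
    d_normal = min(dist(pixel, s) for s in normal_samples)
    return 1 if d_tumor < d_normal + threshold else 0

# Hypothetical three-feature training samples (one per class for brevity).
tumor_samples = [(0.30, 0.40, 0.50)]
normal_samples = [(0.10, 0.20, 0.30)]

borderline = (0.2001, 0.3001, 0.4001)  # only marginally closer to the tumor sample
clear_tumor = (0.29, 0.40, 0.50)

unbiased = nnb_classify(borderline, tumor_samples, normal_samples, threshold=0.0)
biased = nnb_classify(borderline, tumor_samples, normal_samples, threshold=-0.0005)
confident = nnb_classify(clear_tumor, tumor_samples, normal_samples, threshold=-0.0005)
print(unbiased, biased, confident)
```

A negative threshold only affects pixels whose two nearest distances are nearly equal, which is why it reduces false alarms while leaving clearly tumor-like pixels detected.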
Since a tumor is not a single pixel but consists of at least 6 pixels, we discard (omit) any white (one) pixels in the binary classification image which are part of a connected region of five or fewer pixels. We then perform a closing operation on the resultant binary image with a 3x3 pixel structuring element to fill in small holes in tumor regions. The resultant image is shown in Figure 19b (detected tumors are marked by rectangles). The number of false alarms in Figure 19a is significantly reduced in Figure 19b. In Figure 19b, 12 of the 14 tumors are detected, but more than 20 false alarm regions are still present. Thus, use of the thickened skin pixel features is necessary to reduce false alarms. Using only one database, the lesion pixel database or the thickened skin pixel database, for training is not recommended.

The four chosen features (features 1, 3, 19 and 42) for the thickened skin pixel database are used in the NNB classifier and applied to the image. We use the same threshold of -0.0005 for the NNB classifier for the thickened skin pixel spectra features, since it gives good results. Figures 20a and 20b show the binary classification results before and after binary morphological image processing for the thickened skin pixel features, respectively. The number of false alarm regions in Figure 20a is greatly reduced to 30 in Figure 20b. In Figure 20b, 13 of the 14 tumors are detected and marked by rectangles. There are many more false alarms in the binary pixel classification output image for the thickened skin features (Figure 20a) than for the lesion features (Figure 19a), as we expected from its lower Pc score for the pixel database.

We then fuse (intersect) the morphologically processed binary classification image results for the two feature cases (Figures 19b and 20b) and obtain the final classification image result in Figure 21a. The result indicates that the transition regions from the lesion regions to the thickened skin regions of tumors are detected by both the lesion and thickened skin features. We use the fact that we do not expect to detect tumors on the edge of the chicken images (we refer to this rule as post-processing) to remove such potential false alarms (6 in Figure 21a) that appear within 15 pixels of the edge of the chicken in the classification image result. The resultant image in Figure 21b has located 12 tumors marked by rectangles and has only a single false alarm marked by a triangle. For this image, we detect 12 of the 14 skin tumors with one false alarm.

Tumor number 5 (left center) in Figure 2 was missed; it has a small lesion region (only 5-6 pixels), and only one pixel in this lesion region is detected by our feature selection algorithm for the lesion features (Figure 19a). Although the thickened skin region of this tumor is detected as shown in Figure 20b, the final classification image result does not detect this tumor since its lesion region is not detected in Figure 19b. Tumor number 6 in Figure 2 is also missed; it has no lesion region, and our feature selection algorithm thus does not detect it in Figure 19a. Some pixels in its thickened skin region are detected in Figure 20a, but they consist of five or fewer pixels and thus are removed in Figure 20b. Thus, both missed tumors are expected. Fusion of the binary pixel classification images significantly reduces the number of false alarms from more than 20 in Figure 19b and more than 30 in Figure 20b to only 7 in Figure 21a. Although we use our post-processing rule to remove 6 of these false alarms from Figure 21a, fusion of the two binary pixel classification images is necessary.
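The edge post-processing rule can be sketched as follows. This is one possible reading, not the implementation used here: the function name is ours, the rule is applied pixel by pixel, distance to the edge is measured with a square (Chebyshev) window, and the image border is treated as an edge. The text only states that detections within 15 pixels of the edge of the chicken are removed.

```python
def suppress_near_edge(detections, mask, margin=15):
    """Clear detected pixels lying within margin pixels of the carcass mask edge."""
    rows, cols = len(mask), len(mask[0])

    def near_edge(r, c):
        # A pixel is near the edge if any pixel in its window is background
        # (mask == 0) or falls outside the image.
        for y in range(r - margin, r + margin + 1):
            for x in range(c - margin, c + margin + 1):
                if not (0 <= y < rows and 0 <= x < cols) or mask[y][x] == 0:
                    return True
        return False

    return [[1 if detections[r][c] and not near_edge(r, c) else 0
             for c in range(cols)] for r in range(rows)]

# Toy 9x9 example with a margin of 2 pixels instead of 15.
mask = [[1 if 1 <= r <= 7 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(9)]
detections = [[0] * 9 for _ in range(9)]
detections[4][4] = 1  # interior detection, should be kept
detections[2][4] = 1  # detection close to the mask edge, should be removed
kept = suppress_near_edge(detections, mask, margin=2)
```

A blob-level variant, which removes an entire connected region when any of its pixels is near the edge, would be an equally plausible reading of the rule.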
Figure 19. Detection results using the lesion features on the first image with tumors marked by rectangles: (a) before morphological processing; (b) after morphological processing.
Figure 20. Detection results using the thickened skin features on the first image with tumors marked by rectangles: (a) before morphological processing; (b) after morphological processing.
Figure 21. Final fused classification image results on the first image with tumors marked by rectangles and the false alarm marked by a triangle: (a) pre-processing; (b) post-processing.
The Second Image
The chosen features for the lesion pixel database and the thickened skin pixel database and the NNB classifier are now applied to the pixels in the second HS image. We do not expect our feature selection algorithm to detect tumor number 2 in this image (Figure 8) because the tumor consists of only six pixels and displays no lesion region. Figures 22a and 22b show the binary classification images for image 2 before and after morphological processing for the lesion features, respectively. In Figure 22a, the lesion region of tumor number 1 in Figure 8 is detected (center of carcass on left), but these pixels are disconnected and thus any group of them contains only five or fewer pixels. This and other regions are removed and not shown in Figure 22b. Tumor number 2 (on the right leg of the carcass on the left) is not detected in the binary classification result for the lesion features; this is expected because this tumor has no lesion region. Thus, both missed tumors on the left carcass are expected at the present resolution. In Figure 22b, 4 of the 7 tumors are detected and marked by rectangles, but more than 40 false alarms are present. Thus, use of the thickened skin pixel features is again necessary to reduce false alarms.

Figures 23a and 23b show the binary classification images before and after morphological processing for the thickened skin features, respectively. Again, we see that the binary classification image for the thickened skin features (Figure 23a) has more false alarms than the image for the lesion features (Figure 22a). In Figure 23a, only 4 of the 6 pixels on tumor number 2 in Figure 8 are detected and thus it is removed in our blob analysis. In Figure 23b, 5 of the 7 tumors are detected and marked by rectangles. The 2 missed tumors (on the left carcass) are expected, as discussed earlier.

Figure 24 shows the final classification image result for the second image after fusing Figures 22b and 23b and removing 3 potential false alarms on the edge of the chicken image. 4 of the 7 skin tumors are detected and marked by rectangles, and 2 false alarms occur and are marked by triangles. Tumor number 6 (lower right) in Figure 8 is detected by the thickened skin features but is missed by the lesion features. The tumor has a small lesion region (_-pixels), but only 3 of these pixels are detected by our feature selection algorithm. As a result, it is not detected in the final fused classification image in Figure 24. Fusion of the binary classification results reduces the number of false alarms from more than 40 in Figure 22b and more than 50 in Figure 23b to only 5. 3 of these 5 false alarms are near the edge of the chicken image and thus are removed by our post-processing rule.
Figure 22. Detection results using the lesion features on the second image with tumors marked by rectangles: (a) before morphological processing; (b) after morphological processing.
Figure 23. Detection results using the thickened skin features on the second image with tumors marked by rectangles: (a) before morphological processing; (b) after morphological processing.
Figure 24. Final fused classification image result on the second image with tumors marked by rectangles and false alarms marked by triangles.
6. CONCLUSIONS
Since the spectral responses on the lesions and thickened skin portions of tumors are different, we train our feature selection algorithm to detect lesions and thickened skin regions separately; we then morphologically process the resultant images, and we fuse the two detection results to reduce false alarms. The FS/MBB feature selection algorithm was described. HS data was shown to be useful for detecting chicken skin tumors. Our feature selection algorithm found that only 7 bands (features 1, 3, 11, 18, 19, 28 and 42) of HS data were needed. Our initial test result is promising, with 16 of 21 skin tumors detected with only 3 false alarms. Four of the five misclassified tumors are very small or have small lesion regions and thus were expected (for the data at the present resolution). Two of these tumors were detected for the thickened skin features.

Much more extensive tests are needed on much more data. The database should also have higher resolution, so that there are more pixels on each tumor. Creating a training and test set database is difficult because exact pixel tumor locations and sizes are not clear in the present data. Their locations should be carefully addressed. Future work should consider using a k-nearest neighbor (KNN) or a neural net classifier.
ACKNOWLEDGEMENT
The author would like to thank Dr. Yud-Ren Chen, Dr. Moon Kim and Dr. Kevin Chao of the Agricultural Research Service in Maryland for providing the database used in this paper.
REFERENCES
[1] B. Thai and G. Healey, "Invariant subpixel target identification in hyperspectral imagery," Proc. SPIE, vol. 3717, pp. 14-24, 1999.
[2] T. Nichols, J. Thomas, W. Kober, and V. Velten, "Interference-invariant target detection in hyperspectral images," Proc. SPIE, vol. 3372, pp. 176-87, 1998.
[3] J. Goutsias and A. Banerji, "A morphological approach to automatic mine detection problems," IEEE Trans. Aerospace and Electronic Systems, vol. 34, no. 4, pp. 1085-1096, 1998.
[4] J.E. Pinzon, S.L. Ustin, C.M. Casteneda, J.F. Pierce, and L.A. Costick, "Robust spatial and spectral feature extraction for multispectral and hyperspectral imagery," Proc. SPIE, vol. 3372, pp. 199-210, 1998.
[5] D. Casasent and X.-W. Chen, "Hyperspectral data discrimination methods," Proc. SPIE, vol. 4203, pp. 27-36, 2000.
[6] T.C. Pearson, D.T. Wicklow, E.B. Maghirang, F. Xie, and F.E. Dowell, "Detecting aflatoxin in single corn kernels by transmittance and reflectance spectroscopy," Trans. of the ASAE, vol. 44(5), pp. 1247-1254, 2001.
[7] D. Casasent, X.-W. Chen, and S. Nakariyakul, "Hyperspectral methods to detect aflatoxin in whole kernel corn," Proc. of the 2nd Fungal Genomics, 3rd Fumonisin Elimination and 15th Aflatoxin Elimination Workshops, October 2002.
[8] F. Dowell, T. Pearson, E. Maghirang, F. Xie, and D. Wicklow, "Reflectance and transmittance spectroscopy applied to detecting fumonisin in single corn kernels infected with Fusarium verticillioides," Cereal Chem., vol. 79(2), pp. 222-226, 2002.
[9] F. Dowell, "Detecting vitreous and non-vitreous durum wheat kernels using near-infrared spectroscopy," in 1999 ASAE Annual International Meeting, Paper No. 993082, 1999.
[10] W.R. Windham, K.C. Lawrence, and B. Park, "Visible/NIR spectroscopy for characterizing fecal contamination of chicken carcasses," American Society for Agricultural Engineers, Paper No. 016004, 2001.
[11] W.R. Windham, B. Park, K.C. Lawrence, and R.J. Buhr, "Selection of visible/NIR wavelengths for characterizing fecal and ingesta contamination of poultry carcasses," International Conference on Near-Infrared Spectroscopy, Abstract p. O10-5, 2001.
[12] W.R. Windham, B. Park, K.C. Lawrence, and D.P. Smith, "Analysis of reflectance spectra from hyperspectral images of poultry carcasses for fecal and ingesta detection," International Society for Optical Engineering, Paper No. 4816-30, 2002.
[13] S. Kumar, J. Ghosh, and M. Crawford, "A hierarchical multiclassifier system for hyperspectral data analysis," in Multiple Classifier Systems, J. Kittler and F. Roli (Eds.), LNCS, vol. 1857, Springer, pp. 270-279, 2000.
[14] D. Casasent and X.-W. Chen, "Waveband selection for hyperspectral data: optimal feature selection," Proc. SPIE, vol. 5601, April 2003.
[15] B.W. Calnek, H. John Barnes, C.W. Beard, W.M. Reid, and H.W. Yoder, Diseases of Poultry, Chapter 16, pp. 386-484, Iowa State University Press, Ames, IA.
[16] K. Chao, P.M. Mehl, M. Kim, and Y.R. Chen, "Detection of chicken skin tumors by multispectral imaging," Proc. SPIE, vol. 4206, pp. 214-223, 2001.
[17] I. Kim, Y.R. Chen, M. Kim, and S. Kang, "Application of hyperspectral fluorescence imaging for detection of skin tumors on chicken carcasses," in 2002 ASAE Annual International Meeting, Paper No. 023142, 2002.
[18] P. Narendra and K. Fukunaga, "A branch and bound algorithm for feature subset selection," IEEE Trans. Comput., vol. 26, pp. 917-922, 1977.
[19] M.S. Kim, Y.R. Chen, and P.M. Mehl, "Hyperspectral reflectance and fluorescence imaging system for food quality and safety," Trans. of the ASAE, vol. 44(3), pp. 721-729, 2001.
[20] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., A Wiley-Interscience Publication, New York, p. 48, 2001.
[21] T. Cover and J. Campenhout, "On the possible orderings in the measurement selection problem," IEEE Trans. Systems, Man, and Cybernetics, SMC-7(9), pp. 657-661, 1977.
[22] D. Ballard and C. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, N.J., p. 151, 1982.