UNIVERSITY POLYTECHNIC OF MADRID FACULTY OF COMPUTER SCIENCE

Author: Sandra Cortés Soto

DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES FOR CLASSIFICATION LANDSAT TM IMAGES

Presented by KAMAL R. AL-RAWI To obtain the Ph.D. in Computer Science

MADRID- SPAIN 2001

CONSUELO GONZALO MARTIN, Associate Professor, Department of Architecture and Technology of Computer Systems, Faculty of Computer Science, University Polytechnic of Madrid.

CERTIFIES: that the thesis entitled "DESIGN NEW SUPERVISED ART-TYPE ARTIFICIAL NEURAL NETWORKS, AND THEIR PERFORMANCES FOR CLASSIFICATION LANDSAT TM IMAGES" has been carried out by KAMAL R. AL-RAWI, under my supervision, in the Department of Architecture and Technology of Computer Systems, Faculty of Computer Science, University Polytechnic of Madrid.

To Prof. Dr. Amos Eddy

ACKNOWLEDGEMENTS

I gratefully thank Dr. Consuelo Gonzalo Martín, associate professor of computer science at the Faculty of Computer Science, University Polytechnic of Madrid, for her continuous efforts during the supervision of this thesis. Her guidance and criticism were a great help to me. The criticism of Dr. Águeda Arquero Hidalgo and Dr. Estibaliz Martínez Izquierdo was also a great help; they were always in touch during the preparation of this work. My grateful thanks go to Professor Dr. Pedro Gómez Vilda and the rest of the Working Group on Computer Technology, Dr. Victoria Rodellar Biarge, Dr. Mercedes Pérez Castellanos, and Dr. Víctor Nieto Lluis, for their support and useful discussions. I would like to thank all the graduate students in the group, especially Vicente Garcia del Cantara, for the friendly atmosphere during my stay in the Department of Architecture and Technology of Computer Systems. My thanks to the secretary of the department, Mrs. M. del Carmen Parró Cruz, who was always there to arrange our administrative work. I gratefully thank Professor Dr. José Luis Casanova, Director of the Remote Sensing Laboratory (LATUV), University of Valladolid, for the use of the laboratory's facilities. My thanks to Miss Sarah Strauss and Miss Nicole Knudsen for proofreading the thesis. Finally, I thank my wife Eman, my daughter Hiba, and my sons Saif Al-Deen and Haitham for their support during the preparation of this work.

INDEX

CHAPTER I: INTRODUCTION
1.1 Historical background
1.2 Adaptive Resonance Theory ANNs
1.2.1 Unsupervised ART ANNs
1.2.2 Supervised ART ANNs
1.3 Classifying remotely sensed data with ANNs
1.4 Objectives

CHAPTER II: FUZZY ART ANN
2.1 Introduction
2.2 Matching system and vigilance parameter
2.3 Fuzzy ART dynamics
2.4 Fast-learning slow record option
2.5 Complement coding
2.6 Fuzzy subset and conservative limit
2.7 Training algorithms of Fuzzy ART
2.8 Evolution of Fuzzy ART
2.9 Newly developed versions of Fuzzy ART
2.9.1 Flagged approach
2.9.2 Training algorithms of Flagged-Fuzzy ART
2.9.3 Compact approach
2.9.4 Training algorithms of Compact-Fuzzy ART
2.10 Categorization

CHAPTER III: FUZZY ARTMAP
3.1 Introduction
3.2 Fuzzy ARTMAP
3.2.1 Vigilance parameter dynamics in supervised environment
3.2.2 Training phase
3.2.3 Classification phase
3.3 Full algorithm of Fuzzy ARTMAP
3.3.1 Training algorithms of Fuzzy ARTMAP
3.3.2 Classification algorithm of Fuzzy ARTMAP

CHAPTER IV: SUPERVISED ART-I ANN
4.1 Introduction
4.2 Supervised ART-I
4.2.1 Architecture of Supervised ART-I
4.2.2 Data description
4.2.3 Training of Supervised ART-I
4.2.4 Classification by Supervised ART-I
4.3 Algorithm of Supervised ART-I
4.3.1 Training algorithm of Supervised ART-I
4.3.2 Classification algorithm of Supervised ART-I
4.4 Discussion

CHAPTER V: SUPERVISED ART-II ANN
5.1 Introduction
5.2 Supervised ART-II
5.2.1 Architecture of Supervised ART-II
5.2.2 Training of Supervised ART-II
5.2.3 Classification by Supervised ART-II
5.3 Full algorithm of Supervised ART-II
5.3.1 Training algorithm of Supervised ART-II
5.3.2 Classification algorithm of Supervised ART-II
5.4 Discussion

CHAPTER VI: PERFORMANCE OF SUPERVISED ART-I&II FOR CLASSIFICATION OF LANDSAT TM IMAGES
6.1 Satellites Landsat
6.2 Data
6.3 Performance
6.3.1 Training performance
6.3.2 Classification performance

CHAPTER VII: PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS
7.1 Introduction
7.2 Vigilance dynamics
7.2.1 Flying approach
7.2.2 Fixed vigilance approach
7.2.3 Free vigilance approach
7.2.4 Floating approach
7.3 Results and discussion

CHAPTER VIII: CONCLUSIONS

BIBLIOGRAPHY

APPENDIX: SUMMARY (RESUMEN)
A.1. INTRODUCTION
A.1.1 Historical evolution of Artificial Neural Networks (ANNs)
A.1.2 Classification of remotely sensed data with ANNs
A.2. OBJECTIVES OF THE THESIS
A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS
A.3.1 Fuzzy ART
A.3.2 Fuzzy ARTMAP
A.4. PROPOSAL OF TWO IMPROVED VERSIONS OF FUZZY ART
A.4.1 "Flagged" version of Fuzzy ART
A.4.2 "Compact" version of Fuzzy ART
A.5. PROPOSAL OF TWO NEW SUPERVISED ART-TYPE ARCHITECTURES
A.5.1 Learning and classification algorithms of the Supervised ART-I architecture
A.5.2 Learning and classification algorithms of the Supervised ART-II architecture
A.6. EVALUATION OF THE PERFORMANCE OF SUPERVISED ART-I AND SUPERVISED ART-II IN THE CLASSIFICATION OF REMOTELY SENSED IMAGES
A.7. PERFORMANCE OF SUPERVISED ART-TYPE NETWORKS FOR DIFFERENT DYNAMICS OF THE VIGILANCE PARAMETER
A.8. CONCLUSIONS

LIST OF FIGURES

Figure 2-1: Fuzzy ART dynamics
Figure 2-2: The architecture of Fuzzy ART
Figure 2-3: The architecture of Flagged Fuzzy ART
Figure 2-4: The architecture of Compact Fuzzy ART
Figure 3-1: Block diagram showing supervision through map field
Figure 3-2: The full architecture for supervision through map field
Figure 3-3: The architecture of ARTMAP for classification problem
Figure 3-4: Full architecture of Fuzzy ARTMAP
Figure 3-5: Match tracking using flying vigilance parameter
Figure 4-1: Training of map field weights
Figure 4-2: Supervision dynamic of the tagging approach of Supervised ART-I
Figure 4-3: Architecture of Supervised ART-I
Figure 5-1: Supervision dynamic of the stacking approach of Supervised ART-II
Figure 5-2: Architecture of Supervised ART-II
Figure 5-3: Determination of the winning node in the stacking-supervision approach of Supervised ART-II
Figure 6-1: Number of category nodes in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-2: Training time, in minutes, for Supervised ART-I, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-3: Training time, in minutes, for Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-4: The ratio of training time for Supervised ART-I / Supervised ART-II, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, using 9000 pixels of the Landsat TM images
Figure 6-5: Classification time, in minutes, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for 52 440 pixels of the Landsat TM images
Figure 6-6: Classification performance, in the domain of the vigilance parameter $\rho$ and the dynamic learning parameter $\beta$, for Landsat TM images
Figure 6-7: The upper image is the reference image. The lower image is the classified image using Supervised ART-II, with vigilance parameter $\rho = 0.98$, dynamic learning parameter $\beta = 0.50$, and training with 9000 exemplars. The classification accuracy is 85.82%
Figure 7-1: Sketches showing different vigilance parameter dynamics: fixed, free, and float approaches
Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns represent classified images using the fly, float, fixed, and free vigilance parameter, respectively. The first, second, third, fourth, and fifth rows represent classified images using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively

LIST OF TABLES

Table 2-1: Comparison among Original, Flagged, and Compact algorithms of Fuzzy ART. The last two have been developed in this study
Table 5-1: Comparisons between Fuzzy ARTMAP, Supervised ART-I, and Supervised ART-II
Table 6-1: Descriptions for Landsat-5 Thematic Mapper (TM) images
Table 6-2: Performance of Supervised ART-II when trained with different sizes of training samples
Table 6-3: Training and classification statistics for Landsat TM image at individual class level
Table 6-4: The confusion matrix for the classification process for the 52 440 pixels of the Landsat TM image
Table 7-1: The performance of Supervised ART-II ANN with different vigilance dynamics

ABSTRACT

New supervised ART ANNs with simple architectures have been developed in this study. Their architectures are built from a single ART module rather than a pair of modules connected by a map field, as in all other supervised ART-type ANNs reported in the literature. Two different algorithms have been developed: Supervised ART-I and Supervised ART-II. The developed algorithms reduce the number of dynamic parameters, the memory requirement, and the training time, which is the major problem facing ANNs, without altering the classification accuracy. Two simplified versions of the Fuzzy ART algorithm have also been developed, keeping the categorization performance of the original algorithm: Flagged Fuzzy ART and Compact Fuzzy ART. While Supervised ART-I and Supervised ART-II are general in nature and can be applied to all ART ANNs, the supervision of Compact Fuzzy ART is the case addressed in this work. The full algorithms for Supervised ART-I and Supervised ART-II are listed. The newly developed ANNs have been applied to classify Landsat Thematic Mapper (TM) images. The performance of the systems has been tested for different dynamic parameters and different training samples, and their behavior across the domain of the vigilance parameter and the dynamic learning parameter has been studied and is now understood. Only one approach to vigilance dynamics in supervised ART-type ANNs has been addressed in the literature. Three more approaches have been developed in this study: fixed, free, and float. The performance of the developed ANNs for classification of Landsat TM images has been tested for all these vigilance dynamics.

CHAPTER I
INTRODUCTION

1.1 Historical background
Although the roots of the field of Artificial Neural Networks (ANNs) extend back to 1943, when McCulloch and Pitts built the first artificial neural structure, its foundations were established in the mid-seventies. Werbos (1974) developed the principle of the Back Propagation (BP) ANN, and Grossberg (1976) developed the principle of Adaptive Resonance Theory (ART) ANNs. However, the great theoretical advances in the field were achieved in the 1980s. In that decade the algorithms of the BP ANN were developed independently by several authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). The Kohonen Self-Organizing Map, KSOM (Kohonen 1982), and the Hopfield ANN (Hopfield 1982) were developed, and many advances were achieved for ART ANNs (Carpenter & Grossberg 1987a&b). ART ANNs are the concern of this study because of their stability, speed, and accuracy (Carpenter et al. 1991a&b, 1992, 1997, and Gan & Lua 1992).

ART ANNs have been applied in many fields. The Boeing Company has implemented an ART-1 neural information retrieval system for its engineering designs (Caudell et al. 1994). Boeing has thousands of designs for its aircraft parts. Features are extracted from each design and presented to the network to establish categories for these designs. When a new design is needed, its features are presented to the system to determine the category to which the required design belongs. Retrieving features from the designs in the indicated category avoids repeating work for the new design.

ART ANNs have been employed for target recognition (Seibert & Waxman 1992); their approach extracts features of the target (an aircraft) from different views. Bernardon & Carrick (1995) also used them for target recognition, using Synthetic Aperture Radar (SAR) imagery. After the network has been trained, target recognition is done by matching the signal of the target against a set of stored target models. Kumar & Guez (1989) and Waxman et al. (1995) have used ART ANNs for target recognition too. Kumar and Guez worked with visible imagery, while Waxman and his group worked with visible, infrared, and SAR imagery. Moreover, ART ANNs have been employed for robot sensory motor control (Baloch & Waxman 1991, Bachelder et al. 1993, Dubrawski & Crowley 1994, Srinivasa & Sharma 1996) and robot navigation (Racz & Dubrawski 1995); machine vision (Caudell & Healy 1994); object recognition (Seibert & Waxman 1992); face recognition (Seibert & Waxman 1993); pattern clustering (Moore 1989, Mekkaoui & Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical imaging (Soliz & Donohoe 1996); electrocardiogram wave recognition (Ham & Han 1996); signature verification (Murshed et al. 1995); fault identification in a nuclear power plant (Keyvan 1999); and remote sensing (Gopal et al. 1994; Baraldi & Parmiggiani 1995).

1.2 Adaptive Resonance Theory ANNs
There are two types of ANNs: supervised and unsupervised. In the unsupervised case, only the input features are introduced to the input layer, and the network then categorizes them. In the supervised type, the class code is supplied to the network together with the input features. During the training phase, when the network correctly classifies an input, the weights are trained; otherwise a correction is made.

1.2.1 Unsupervised ART ANNs
The principle of ART ANNs was introduced in the literature as a theory of human cognitive information processing (Grossberg 1976, 1980). Since then, a series of ART-based ANNs have been developed for unsupervised category learning and pattern recognition in real time: ART1 (Carpenter & Grossberg 1987a), ART2 (Carpenter & Grossberg 1987b), ART3 (Carpenter & Grossberg 1990), SART (Baraldi & Parmiggiani 1995), and Fuzzy ART (Carpenter et al. 1991a). ART1 can categorize arbitrary binary input patterns (Carpenter & Grossberg 1987a). ART2 can deal with both binary and analog patterns (Carpenter & Grossberg 1987b). In ART1 and ART2, information flows forward through weights that connect each node in the input layer to all nodes in the category layer, and backward through another set of weights that connect each category node to all nodes in the input layer. A simpler unsupervised ART architecture, called Fuzzy ART, was later developed (Carpenter et al. 1991a). Like ART2, it can categorize analog multi-valued input patterns as well as binary input patterns. Weights in Fuzzy ART connect each node in the input layer to all category nodes, and information flows through these weights in one direction, from the input layer to the category layer. Fuzzy ART is explained in detail in Chapter II.

1.2.2 Supervised ART ANNs
In the early nineties, two supervised ART architectures were developed: ARTMAP (Carpenter et al. 1991b) and Fuzzy ARTMAP (Carpenter et al. 1992). The architecture of ARTMAP is built from two modules of ART1, while the architecture of Fuzzy ARTMAP is built from two modules of Fuzzy ART. ARTMAP can learn and classify binary multi-valued input patterns. Fuzzy ARTMAP can learn and classify analog input patterns in addition to binary ones (Carpenter et al. 1992). More supervised ART-type ANNs have since been developed: ART-EMAP (Carpenter & Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter & Markuzon 1998), and Distributed ARTMAP (Carpenter 1998). In all these architectures, supervision is done through a map field, which requires two modules of ART. Fuzzy ARTMAP has been widely used. It has shown better performance than various other ANNs on different problems, such as automatic analysis of electrocardiograms (Ham & Han 1996), diagnostic monitoring of nuclear plants (Keyvan et al. 1993), and prediction of protein secondary structure (Mehta et al. 1993).

1.3 Classifying remotely sensed data with ANNs
Mapping land cover using remotely sensed data is a very active area of research, owing to the advances in space and computer technology (Benediktsson et al. 1990). Conventional classification is usually employed for this task. However, neural networks have often been used in the last decade. The main advantages of neural networks over conventional classifiers such as the Maximum Likelihood Classifier (MLC) are the following. 1) They are non-parametric; therefore, the probability distributions for each class are not required. This allows ancillary data (slope, topography, aspect, etc.) to be introduced to the network in addition to the spectral data, which many authors report can increase the classification accuracy (Benediktsson et al. 1990, Carpenter et al. 1997). Moreover, neural networks are more robust when the distribution is not Gaussian (Paola & Schowengerdt 1997, Hepner et al. 1990). 2) Unlike conventional classifiers, neural networks are able to manage fuzzy classifications (Paola & Schowengerdt 1997, Warner & Shank 1997, Yool 1998). The numbers in the output represent the strength of the class membership of the specific input. This is very important when dealing with low spatial resolution. 3) The parallel nature of neural networks allows the speed of the classification process to be increased by implementing them on parallel computers (Salu & Tilton 1993, Heermann & Khazenie 1992). 4) Neural networks have flexibility for classification improvement (Carpenter et al. 1997). 5) They are able to establish an arbitrary decision boundary (Paola & Schowengerdt 1995, Tzeng et al. 1994). "Neural networks offer a flexible approach to building the complex, highly non-linear models that required for a complex system. ... Unlike traditional expert systems where knowledge is made explicit in the form of rules, neural networks generate their own rules by learning from exemplars" (Keyvan 1993).

The Multi-Layer Perceptron (MLP), with Back Propagation learning, is the neural network most commonly used in the literature to classify remotely sensed data. This is due to the preferable learning approach of the network, which is based on minimizing the error between the output of the network and the target value. While some authors have reported that conventional classifiers perform better than MLP (Mulder & Spreeuwers 1991, Solaiman & Mouchot 1994), many authors have reported that MLP performs better than MLC in classifying remotely sensed data (Hepner et al. 1990, Heermann & Khazenie 1992, Paola & Schowengerdt 1994, Yoshida & Omatu 1994). The classification performance of MLP can be improved by using ancillary data in addition to the spectral data (Benediktsson et al. 1990).

However, employing MLP as a classifier incurs many problems. The architecture of the network is not fixed: the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This is a very costly process, given the long training time of the network. In addition, MLP might fall into a local minimum during the training phase. Moreover, MLP might not converge. Using a small learning rate to avoid the convergence problem makes the long training time of the MLP network much longer. Heermann & Khazenie (1992) suggested using parallel computers to reduce the training time; this reduces the training time but increases the hardware cost. For classification of a Landsat image, Carpenter et al. (1997) reported that MLP did not converge, using learning rate = 0.6 and momentum rate = 0.4, after 212 minutes of training time on a SUN 4 SPARC Station, using 100 000 input presentations. They employed a lower learning rate to avoid the convergence problem. The training time exceeded 1000 minutes, while the classification accuracy was less than 27%. They reported that Fuzzy ARTMAP (Carpenter et al. 1992) achieves better classification accuracy than MLP, with lower training time. They also reported that Fuzzy ARTMAP and MLC reach the same level of classification accuracy.

Fuzzy ARTMAP has also been employed by Mannan et al. (1998) to classify a (512x512)-pixel image from the Linear Imaging Self-scanning Sensor (LISS-II) of the Indian Remote Sensing Satellite (IRS-1B) into their 13 classes. They reported that Fuzzy ARTMAP performs better than both MLC and MLP in classification accuracy. The average classification accuracies for six data sets are 84.7%, 80.3%, and 79.9% for Fuzzy ARTMAP, MLC, and MLP, respectively. They reported that the training time of Fuzzy ARTMAP was slightly shorter than that of MLC and many times shorter than that of MLP.

Unlike MLP, Fuzzy ARTMAP has a well-defined architecture, always converges, and can tune itself to represent sub-classes by generating a new category node. However, the main drawback of Fuzzy ARTMAP lies in its complex architecture: it is constructed from two modules of Fuzzy ART linked by a map field.

1.4 Objectives
The global objective of this work is to design new, simplified versions of ART ANN architectures that maintain the original performance while reducing computation time and memory requirements. This objective can be divided into several partial objectives:
• Design new, simple architectures of ART-type ANNs that provide the same classification performance as classical ARTs.
• Develop learning and classification algorithms for these architectures.
• Encode the developed algorithms.
• Study the behavior of the developed architectures for the classification of remotely sensed Landsat Thematic Mapper (TM) images over the whole domain of the dynamic parameters.

The layout of this study is as follows. Chapter II deals with Fuzzy ART. Chapter III deals with Fuzzy ARTMAP. Chapter IV deals with the newly developed architecture Supervised ART-I. Chapter V deals with the newly developed architecture Supervised ART-II. The performance of the Supervised ART-I and Supervised ART-II ANNs for learning and classifying Landsat TM images is addressed in Chapter VI. The performance of the newly developed ANNs using different vigilance dynamics is addressed in Chapter VII. Conclusions are listed in Chapter VIII.

CHAPTER II
FUZZY ART ANN

2.1 Introduction
Fuzzy ART is an unsupervised ART-based ANN. Its architecture has been designed for learning and categorization of arbitrary analog or binary multi-valued input patterns. This is achieved by using the minimum operator ($\wedge$) of fuzzy set theory instead of the intersection operator ($\cap$) of set theory, which is employed in ART1.

2.2 Matching system and vigilance parameter
"Fuzzy ART incorporates the basic features of all ART systems, notably, pattern matching between bottom-up input and top-down learned prototype vectors. This matching process leads either to a resonant state that focuses attention and triggers stable prototype learning or a self-regulating parallel memory search. If the search ends by selecting an established category, then the category's prototype may be refined to incorporate new information in the input pattern. If the search ends by selecting a previously untrained node, then learning of a new category takes place" (Carpenter et al. 1991a). If the matching value is greater than the predetermined value, resonance occurs and new information is incorporated into the winning category node by training its weights; otherwise, a self-organizing parallel memory search is conducted.

The match criterion is called the vigilance parameter $\rho$. It calibrates the minimum confidence that a category node must have to represent the current input before a search for a better committed category node is triggered. If all committed category nodes fail to represent the current input, a new category node is committed, as long as the network's memory capacity is not fully utilized. The vigilance parameter is a nondimensional number $\rho \in [0, 1]$. A value of 1 means perfect matching. A low vigilance parameter leads to code compression with broad generalization for categories. A high vigilance parameter leads to a large number of category nodes with fine categories. The vigilance parameter is the key feature of all ART ANNs. An ART ANN can discriminate down to the individual level by setting $\rho = 1$, while it creates a single category node for all data by setting $\rho = 0$. The value of the vigilance parameter is determined according to the type and amount of available data, the categorization level sought, the required speed, and the available memory. The vigilance parameter is fixed during training in all unsupervised ART ANNs.
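The vigilance test described above can be illustrated with a minimal Python sketch. It assumes a complement-coded input of length 2M, a match value computed as the sum of the component-wise minimum divided by M, and a "greater than or equal" comparison so that a vigilance of 1 still accepts a perfect match. The function names and the numeric vectors are hypothetical and are not taken from the thesis.

```python
def fuzzy_and(a, w):
    """Component-wise minimum of two equal-length vectors (fuzzy AND)."""
    return [min(x, y) for x, y in zip(a, w)]


def match_value(A, w):
    """Match value of a complement-coded input A against weights w.

    A holds M features followed by their M complements, so its components
    sum to M; dividing by M is the normalization assumed here.
    """
    M = len(A) // 2
    return sum(fuzzy_and(A, w)) / M


def resonates(A, w, rho):
    """True when the match value reaches the vigilance rho.

    Assumption: '>=' is used so that rho = 1 accepts a perfect match.
    """
    return match_value(A, w) >= rho


A = [0.2, 0.7, 0.8, 0.3]   # hypothetical input (0.2, 0.7) with its complements
w = [0.1, 0.6, 0.7, 0.3]   # hypothetical weights of a committed category node
print(match_value(A, w))
print(resonates(A, w, 0.80), resonates(A, w, 0.90))
```

With these illustrative numbers the same node resonates under the lower vigilance and is rejected under the higher one, which is the mechanism by which a high vigilance produces more and finer categories.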

2.3 Fuzzy ART dynamics
Input patterns $A_i^{(t)} \in [0, 1]$ are presented at the input layer $F_1$. The choice function $T_j^{(t)}$ for each committed category node of the category layer $F_2$ is computed according to equation (2-1). The choice value represents the activation level of each committed category node:

$$T_j^{(t)} = \frac{\sum_{i=1}^{2M} \left( A_i^{(t)} \wedge w_{ij} \right)}{\alpha + \sum_{i=1}^{2M} w_{ij}}, \qquad j = 1, \ldots, C \qquad \text{(2-1)}$$

where $w_{ij}$ are the weights that connect each category node $j$ in the $F_2$ field with all nodes of the input layer $F_1$. All weight values are initially set to 1 (i.e. $w_{ij} = 1$ for $i = 1, \ldots, 2M$ and $j = 1, \ldots, C$). $M$ represents the dimension of the input features. Since the normalized features and their complements are introduced to the network, the dimension of the input vector $A^{(t)}$ is $2M$. $\alpha$ is the choice parameter ($\alpha > 0$). $C$ is the total number of committed category nodes at iteration $t$. The winning category node $J$ is determined by

$$T_J^{(t)} = \max\{ T_j^{(t)} \}, \qquad j = 1, \ldots, N \qquad \text{(2-2)}$$

It represents the category node with the highest choice value among all category nodes (committed and uncommitted) in the category layer. $N$ represents the full memory capacity of Fuzzy ART. The value of $N$ is normally much larger than $C$ ($N \gg C$).
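As a rough illustration of the choice function and the winner selection described above, the following Python sketch complement-codes a feature vector, evaluates the choice value of each node, and returns the index of the largest one. It is a simplified sketch: only committed nodes are compared, the handling of uncommitted nodes is omitted, ties go to the lowest index, and all names and numbers are hypothetical.

```python
def complement_code(a):
    """Append the complements 1 - a_i to the normalized features a_i."""
    return list(a) + [1.0 - x for x in a]


def choice_value(A, w, alpha):
    """Choice value: sum of min(A_i, w_i) divided by (alpha + sum of w_i)."""
    numerator = sum(min(x, y) for x, y in zip(A, w))
    return numerator / (alpha + sum(w))


def winner(A, weights, alpha):
    """Return (index of the node with the highest choice value, all values).

    Assumption: on a tie, the node with the lowest index wins.
    """
    values = [choice_value(A, w, alpha) for w in weights]
    J = max(range(len(values)), key=values.__getitem__)
    return J, values


A = complement_code([0.2, 0.7])          # hypothetical input, M = 2
weights = [
    [0.1, 0.6, 0.7, 0.3],                # hypothetical committed node 0
    [0.9, 0.1, 0.1, 0.9],                # hypothetical committed node 1
]
J, values = winner(A, weights, alpha=0.01)
print(J, values)
```

Node 0 wins in this example because its weight vector lies almost entirely below the input, so the fuzzy AND keeps nearly all of its weight mass.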

All $N$ category nodes are involved, instead of only the $C$ committed category nodes, as employed by (Carpenter et al. 1991a). Their reasoning is to let uncommitted category nodes be committed, when needed, in sequence order $(1, 2, \ldots, j-1, j, j+1, \ldots, N)$. To achieve this, they assigned a very small positive value $\xi_j$ to each category node before training starts. They called these values $F_2$-order constants. The values decrease as the index $j$ of the category node in the memory field increases:

$$T_j = \xi_j, \qquad j = C+1, \ldots, N$$

0 < [...] $\to \infty$, all committed category nodes will be tested before a new node is committed. The value of $\alpha$ alters the order of search among committed category nodes. "A node is called an uncommitted node if all its top-down weights are equal to the initial top-down weight value, otherwise, the node is committed" (Georgiopoulos et al. 1996). This test takes time, especially when the number of input features is large.

The weight updating in their Fuzzy ART algorithm (they called it Fuzzy ART Variant, because $\alpha \to \infty$ and 0 [...]

[...] $\alpha > 0$: the choice parameter.

b) Data characteristics:
i- $M$: The dimension of the input features.
ii- $P_t$: The number of exemplars to be used in learning.
iii- $L$: The number of classes.

c) Initialization:
i- Number of iterations $t = 1$.
ii- Number of committed category nodes $C = 1$.

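2) New input:

$$A_i^{(t)} = \begin{cases} a_i & \text{for } i = 1, \ldots, M \\ 1 - a_{i-M} & \text{for } i = M+1, \ldots, 2M \end{cases}$$

[...]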

[Table 6-4: The confusion matrix for the classification process for the 52 440 pixels of the Landsat TM image.]

Forest has the lowest classification accuracy among all classes. Mountains contributed 347, 751, and 328 pixels to natural vegetation-1, natural vegetation-2, and forest, respectively. The behaviour of both Supervised ART-I and Supervised ART-II, for training remotely sensed data, for all the domain of the dynamic parameters is well understood. According to the results that have been obtained, Supervised ART-II should be employed when the number of category nodes is in thousands. Otherwise Supervised ART-I performs better. However, Supervised ART-II can be employed here too, since the learning time is very short when the number of category nodes is less than 1000, which is less than a minute.


CHAPTER VII PERFORMANCES OF SUPERVISED ART ANNs WITH DIFFERENT VIGILANCE DYNAMICS

7.1 Introduction

Only one approach to vigilance dynamics in supervised ART ANNs has been addressed in the literature. If the match value of the winning category node is greater than the predetermined vigilance parameter ρ while class matching fails, the current match value, increased by a very small value ε, is assigned to the vigilance parameter (equation 3-1). The vigilance parameter ρ therefore only increases during the training phase, when class matching of the winning category node fails for a specific input. The very small positive value ε is added to the failed match value in order to classify rare events (Carpenter et al. 1991b). However, Carpenter et al. (1998b) reported that reducing the match value by ε rather than increasing it reduces the number of category nodes without affecting the classification accuracy of the network. The vigilance dynamic of this approach is shown in figure 3-5.

7.2 Vigilance dynamics

7.2.1 Flying approach

As mentioned above, the only vigilance dynamic reported in the literature is one in which the vigilance parameter only increases during the training phase, when class matching of the winning category node fails for a specific input. This approach is called the flying approach to differentiate it from the other approaches proposed in this work. The vigilance parameter in the flying approach is controlled by the following equation:

$$\rho_{t+1} = \max\left\{\rho_t,\ \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t\right\} \pm \varepsilon \qquad (7\text{-}1)$$

The flying approach pushes out of the competition a committed category node that has a match value greater than the initial vigilance parameter and belongs to the class of the current input, if the match value of the failed category node is higher than the match value of this node (see figure 3-5). This generates more committed category nodes, so longer training and classification times are required.
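The match value that appears inside equation 7-1 can be illustrated with a minimal sketch. It assumes the usual Fuzzy ART conventions, which this section does not restate: inputs are complement coded into 2M components, and the operator ∧ is the element-wise minimum. The function names `complement_code` and `match_value` are hypothetical and chosen only for this illustration.

```python
def complement_code(a):
    """Return the 2M-dimensional vector (a, 1 - a) for an M-dimensional input in [0, 1].

    Complement coding is an assumption taken from standard Fuzzy ART.
    """
    return list(a) + [1.0 - x for x in a]


def match_value(A, w_K):
    """Sum over i of min(A_i, w_iK), divided by M (half the coded length)."""
    M = len(A) // 2
    return sum(min(x, y) for x, y in zip(A, w_K)) / M


A = complement_code([0.2, 0.7])    # coded input with 2M = 4 components
w_K = complement_code([0.3, 0.6])  # illustrative weight vector of node K
print(match_value(A, w_K))         # approximately 0.9
print(match_value(A, A))           # approximately 1.0: a node identical to the input
```

Under these assumptions the match value lies between 0 and 1, which is why it can be compared directly with the vigilance parameter.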

7.2.2 Fixed vigilance approach

In this approach, the vigilance parameter is constant during the training phase and keeps its initial value:

$$\rho_{t+1} = \rho_t \qquad (7\text{-}2)$$

This allows all committed category nodes to be created under the same level of confidence. Moreover, a committed category node that has a match value greater than the initial vigilance parameter and belongs to the class of the current input can represent the input, independently of its choice-value rank among the committed category nodes (see figure 7-1a).

7.2.3 Free vigilance approach

The free vigilance approach assigns to the vigilance parameter the match value of the previous category node if that node fails to represent the current input.

Figure 7-1: Sketches of the different vigilance parameter dynamics: (a) fixed, (b) free, (c) float. The x-axis represents the ranking of all committed nodes according to their choice values. The y-axis represents the match value of each category node. Sketch (a) represents the fixed approach: all category nodes are committed at the same vigilance level. Sketch (b) represents the free approach: the vigilance parameter is always equal to the previous match value, so a category node might be committed with a match value less than the initial vigilance parameter ρ0. Sketch (c) represents the float approach: the vigilance parameter is equal to the previous match value if it is not smaller than the initial vigilance value; otherwise the initial vigilance value is employed.

$$\rho_{t+1} = \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t \qquad (7\text{-}3)$$

This allows the vigilance parameter to change freely above and below the initial value, so the network can adjust itself to a suitable vigilance parameter during the training phase rather than being forced to one (see figure 7-1b).

7.2.4 Floating approach

The floating approach is like the free vigilance approach, but with a constraint that does not let the vigilance parameter fall below its initial value. This ensures that all committed category nodes have the minimum required level of confidence.

$$\rho_{t+1} = \max\left\{\rho_0,\ \left(\sum_{i=1}^{2M} (A_i \wedge w_{iK}) / M\right)_t\right\} \qquad (7\text{-}4)$$

This generates more category nodes than both the fixed and free approaches, but fewer than the flying approach (see figure 7-1c). Note that ρ1 = ρ0 for all the above vigilance dynamics, where ρ0 is the initial vigilance value.
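The four update rules of section 7.2 can be compared side by side in a short sketch. It is only an illustration of equations 7-1 to 7-4 as they are read here, not the implementation used for the experiments: the function name `next_vigilance`, its argument names, and the default value of `eps` are assumptions, and the flying rule is shown with +ε only (the increasing variant).

```python
def next_vigilance(approach, rho_t, rho_0, match, eps=1e-4):
    """Vigilance for the next step after a node with match value `match` fails.

    rho_t is the current vigilance, rho_0 the initial vigilance.
    """
    if approach == "flying":    # equation 7-1, increasing variant (+eps)
        return max(rho_t, match) + eps
    if approach == "fixed":     # equation 7-2: vigilance never changes
        return rho_t
    if approach == "free":      # equation 7-3: follows the failed match value
        return match
    if approach == "floating":  # equation 7-4: free, but never below rho_0
        return max(rho_0, match)
    raise ValueError(f"unknown approach: {approach}")


# One failed node whose match value lies below the initial vigilance.
for name in ("flying", "fixed", "free", "floating"):
    print(name, next_vigilance(name, rho_t=0.7, rho_0=0.7, match=0.5))
```

With a failed match value of 0.5 and ρ0 = 0.7, only the free rule lets the vigilance drop below its initial value, which is the behaviour sketched in figure 7-1b.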

7.3 Results and discussion

The performance of the Supervised ART-II ANN has been tested, for classification of the Landsat TM images, using all of the above-mentioned vigilance dynamics.

Table 7-1: The performance of Supervised ART-II ANN with different vigilance dynamics. Alpha station 500 has been used for these runs. For each pair of initial vigilance parameter ρ and dynamic learning rate β, the table reports the classification performance (%) and the number of category nodes (#cn) for the fly, float, fixed, and free approaches.

Figure 7-2: Classified images for Landsat TM images. The first, second, third, and fourth columns show images classified using the fly, float, fixed, and free vigilance dynamics, respectively. The first, second, third, fourth, and fifth rows show images classified using initial vigilance parameter and dynamic learning rate of (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15), respectively. Classes are assigned colours as follows: 1) meadow-white, 2) mountain-brown, 3) fallow land1-yellow, 4) fallow land2-dark yellow, 5) fallow land3-bright yellow, 6) irrigated land-red, 7) alfalfa-black, 8) wetland-dark blue, 9) forest-dark green, 10) wheat-light green, 11) natural vegetation1-yellowish green, 12) natural vegetation2-green, and 13) river-blue.

The network has been tested using five different combinations of vigilance parameter and dynamic learning rate: (0.98, 0.50), (0.95, 0.20), (0.90, 0.15), (0.70, 0.15), and (0.00, 0.15). These values of the vigilance parameter ρ0 and the dynamic learning rate β are located on the optimum line for classification performance (figure 6-6). This optimum line gives, for a specific value of ρ, the value of β that yields the maximum classification accuracy using the flying approach. The classification performance ranged from 66.71% using the free approach to 87.05% using the flying approach. The numbers of category nodes were 120 and 1252 for the free and flying approaches, respectively. These results were obtained with a vigilance parameter of 0.98 and a dynamic learning rate of 0.50; see table 7-1 for details. Classified images are shown in figure 7-2.

The neural network performances using the flying, floating, and fixed approaches become closer to each other as the vigilance parameter approaches unity. It is clear from the theory that all of the above-mentioned approaches lead to the same classification accuracy and number of category nodes at ρ0 = 1. The performances using the floating and free approaches become closer to each other as ρ0 approaches zero, and they are identical at ρ0 = 0. While the flying approach shows better accuracy when the initial vigilance parameter is equal to or greater than 0.95, the floating approach shows better accuracy for initial vigilance parameters less than 0.95. In terms of the number of category nodes, the network performs better using the floating approach: while the numbers of category nodes are equal at ρ0 = 1, the number is reduced to less than 25% (56/227) at ρ0 = 0 (see table 7-1). Such a reduction also leads to a reduction in the training time and the classification time.
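As a quick check of the node-count figures quoted above, the two ratios can be worked out directly from the numbers given in the text (56 and 227 category nodes at ρ0 = 0; 120 and 1252 category nodes at ρ0 = 0.98):

```latex
\frac{56}{227} \approx 0.247 \quad (\text{about } 24.7\%,\ \text{i.e. less than } 25\%),
\qquad
\frac{120}{1252} \approx 0.096 \quad (\text{about } 9.6\%).
```

The second ratio is not stated in the text; it is included only to show that, at ρ0 = 0.98, the free approach commits roughly one tenth of the category nodes of the flying approach, at the cost of the lower accuracy reported above.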


CHAPTER VIII CONCLUSIONS

In this study, new simplified ANN architectures have been designed and employed to analyze remotely sensed data. The conclusions that can be drawn from this study are:

1) Two new versions of Fuzzy ART have been developed. The algorithms show that these new versions have the same categorization performance as the original algorithm, but require less training and categorization time.
2) A new supervised ART-type architecture, called Supervised ART-I, has been developed. It is built from a single ART module rather than from two ART modules linked by a map field, as is the case for all supervised ART ANNs addressed in the literature. This eliminates the map field and its parameters. It is theoretically proven that Supervised ART-I has the classification performance of fuzzy ARTMAP; however, it requires less memory and less training time because of its simple architecture.
3) Another supervised ART-type architecture, called Supervised ART-II, has been developed. It is also built from a single ART module and has the classification performance of fuzzy ARTMAP and Supervised ART-I. The category layer of Supervised ART-II is divided into stacks, each of which represents a single class. This reduces the memory required for labeling category nodes from N in the tagging approach of Supervised ART-I to only L in the stacking approach of Supervised ART-II.

4) An uncommitted category node in Supervised ART-I is free to represent any class, whereas an uncommitted category node in Supervised ART-II is predetermined to represent a specific class. When a stack runs out of uncommitted category nodes, borrowing an uncommitted category node from another stack is not possible. This limitation of the stacking supervision approach of Supervised ART-II can be solved by increasing the memory size of each stack. The additional memory is compensated by the fact that stacking employs only L of the N memory released from the tagging supervision of Supervised ART-I; the released memory can be used to increase the memory size of each stack by one fold.
5) While the newly developed supervision approaches were employed here only for Fuzzy ART, they can be applied to all ART-type ANNs.
6) Supervised ART-I is oriented to homogeneous environments, while Supervised ART-II is oriented to non-homogeneous environments. The homogeneity of the environment depends on the type of data and on the dynamic parameters.
7) Since both Supervised ART-I and Supervised ART-II are built from a single ART module, the cost of building chips for classification tasks will be much lower than with the map field approach.
8) The behavior of both Supervised ART-I and Supervised ART-II in training on remotely sensed data is well understood over the whole domain of the dynamic parameters.
9) An automatic system for classifying Landsat TM images, with very good classification accuracy, has been developed.
10) This study shows that the flying approach should be employed for the vigilance dynamic if the vigilance parameter is very high (>0.95), while the floating approach should be employed otherwise.
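The difference between the tagging and stacking supervision approaches described in conclusions 3 and 4 can be sketched as two small data structures. This is a hypothetical illustration only: the class names, the equal stack size, and the use of one counter per class are assumptions made here, not details taken from the thesis.

```python
class TaggedLayer:
    """Tagging: every one of the N category nodes stores its own class label."""

    def __init__(self, n_nodes):
        self.labels = [None] * n_nodes  # N label slots
        self.n_committed = 0

    def commit(self, cls):
        if self.n_committed == len(self.labels):
            raise MemoryError("no uncommitted category node left")
        j = self.n_committed
        self.labels[j] = cls            # any free node may take any class
        self.n_committed += 1
        return j


class StackedLayer:
    """Stacking: nodes are grouped in one stack per class; only L counters are kept."""

    def __init__(self, n_classes, stack_size):
        self.stack_size = stack_size
        self.committed = [0] * n_classes  # L counters instead of N labels

    def commit(self, cls):
        if self.committed[cls] == self.stack_size:
            # A full stack cannot borrow a node from another stack.
            raise MemoryError(f"stack of class {cls} is full")
        j = cls * self.stack_size + self.committed[cls]
        self.committed[cls] += 1
        return j

    def class_of(self, j):
        return j // self.stack_size       # the class is implied by the position


tagged = TaggedLayer(n_nodes=6)
stacked = StackedLayer(n_classes=3, stack_size=2)
node = stacked.commit(2)
print(len(tagged.labels), len(stacked.committed), stacked.class_of(node))
```

In this sketch the stacked layer trades flexibility for memory: it needs no per-node labels, but a class that needs more nodes than its stack holds fails even when other stacks are empty, which is the limitation noted in conclusion 4.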


Some aspects derived from this study need to be investigated in future work. New learning algorithms need to be developed that eliminate or reduce the under-training and over-training episodes. Further studies are recommended to investigate the behavior of the designed architectures on different digital signal processing problems. Some studies in this direction have already been conducted: the developed architectures have been employed successfully for monitoring forest fires (Al-Rawi et al. 2001a, b, c & d) and for cloud detection (Al-Rawi et al. 2001e & f).


BIBLIOGRAPHY

Al-Rawi, K. R., 1999, "Supervised ART-I: A new neural network architecture for learning and classifying multivalued input patterns", Lecture Notes in Computer Science, 1606, 756-765.
Al-Rawi, K. R., Gonzalo, C., and Arquero, A., 1999, "Supervised ART-II: A new neural network architecture, with quicker learning algorithm, for classifying multivalued input patterns", In Proceedings of the European Symposium on Artificial Neural Networks ESANN'99, Bruges, Belgium, 289-294.
Al-Rawi, K. R., Gonzalo, C., and Martínez, E., 2000, "Supervised ART-II for classification Landsat Thematic Mapper image", Remote Sensing in the 21st Century: Economic and Environmental Applications, Casanova (ed), Balkema, Rotterdam, 229-235.
Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001a, "Burned area mapping system and fire detection system, based on neural networks and NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Romo, A., 2001b, "IFEMS: New approach for monitoring wildfire evolution with NOAA-AVHRR imagery", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Louakfaoui, M., 2001c, "IFEMS for monitoring spatial-temporal behaviour of multiple fire phenomena", International Journal of Remote Sensing (in press).
Al-Rawi, K. R., Casanova, J. L., and Calle, A., 2001d, "ART neural network for mapping burned area and determination severity of burn with Landsat TM images", Submitted to IEEE on Geoscience and Remote Sensing.
Al-Rawi, K. R., Casanova, J. L., and Vasileisky, A., 2001e, "A very quick neural network algorithm for cloud detection", Submitted to Geocarto International.
Al-Rawi, K. R., and Casanova, J. L., 2001f, "Neural network as an aid tool for building non-linear threshold algorithm for cloud detection", Submitted to Remote Sensing of Environment.
Bachelder, I. A., Waxman, A. M., and Seibert, M., 1993, "A neural system for mobile robot visual place learning and recognition", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 512-517.


Baloch, A. A., and Waxman, A. M., 1991, "Visual learning, adaptive expectations, and behavioral conditioning of the mobile robot MAVIN", Neural Networks, 4, 271-302.
Baraldi, A., and Parmiggiani, F., 1995, "A neural network for unsupervised categorization for multivalued input patterns. An application to satellite image clustering", IEEE Transactions on Geoscience and Remote Sensing, 33, 305-316.
Benediktsson, J. A., Swain, P. H., and Ersoy, O. K., 1990, "Neural network approaches versus statistical methods in classification of multisource remote sensing data", IEEE Transactions on Geoscience and Remote Sensing, 28, 540-552.
Bernardon, A. M., and Carrick, J. E., 1995, "A neural system for automatic target learning and recognition applied to bare and camouflaged SAR target", Neural Networks, 8, 1103-1108.
Carpenter, G. A., and Grossberg, S., 1987a, "A massively parallel architecture for a self-organizing neural pattern recognition machine", Computer Vision, Graphics, and Image Processing, 37, 54-115.
Carpenter, G. A., and Grossberg, S., 1987b, "ART2: Stable self-organization of pattern recognition codes for analog input patterns", Applied Optics, 26, 4919-4930.
Carpenter, G. A., and Grossberg, S., 1990, "ART3: Hierarchical search using chemical transmitters in self-organizing pattern recognition architectures", Neural Networks, 3, 129-159.
Carpenter, G. A., Grossberg, S., and Rosen, D. B., 1991a, "Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system", Neural Networks, 4, 759-771.
Carpenter, G. A., Grossberg, S., and Reynolds, J. H., 1991b, "ARTMAP: Supervised real-time learning and classification of nonstationary data by self-organizing neural network", Neural Networks, 4, 565-588.
Carpenter, G. A., Grossberg, S., Markuzon, N., Reynolds, J. H., and Rosen, D. B., 1992, "FUZZY ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps", IEEE Transactions on Neural Networks, 3, 698-882.
Carpenter, G. A., and Ross, W. D., 1993, "ART-EMAP: a new neural network architecture for object recognition by evidence accumulation", IEEE Transactions on Neural Networks, 6, 805-818.
Carpenter, G. A., Gjaja, M. N., Gopal, S., and Woodcock, C. E., 1997, "ART neural networks for remote sensing: vegetation classification from Landsat TM and terrain data", IEEE Transactions on Geoscience and Remote Sensing, 35, 308-325.


Carpenter, G. A., 1997, "Distributed learning, recognition, and prediction by ART and ARTMAP neural networks", Neural Networks, 10, 1473-1494.
Carpenter, G. A., and Markuzon, N., 1998, "ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases", Neural Networks, 11, 323-336.
Carpenter, G. A., 1998, "Distributed ARTMAP: a neural network for fast distributed supervised learning", Neural Networks, 11, 793-813.
Caudell, T. P., and Healy, M. J., 1994, "Adaptive Resonance Theory networks in the Encephalon autonomous vision system", Proceedings of the IEEE International Conference on Neural Networks, Piscataway, NJ, USA, 1235-1240.
Caudell, T. P., Smith, S. D. G., Escobedo, R., and Anderson, M., 1994, "NIRS: large scale ART-1 neural architectures for engineering design retrieval", Neural Networks, 7, 1339-1350.
Dubrawski, A., and Crowley, J. L., 1994, "Learning locomotion reflexes: A self-supervised neural system for a mobile robot", Robotics and Autonomous Systems, 12, 133-142.
Gan, K. W., and Lua, K. T., 1992, "Chinese character classification using adaptive resonance network", Pattern Recognition, 25, 877-882.
Georgiopoulos, M., Fernlund, H., Bebis, G., and Heileman, G. L., 1996, "Order of search in Fuzzy ART and Fuzzy ARTMAP: Effect of the choice parameter", Neural Networks, 9, 1541-1559.
Georgiopoulos, M., Dagher, L., Heileman, G. L., and Bebis, G., 1999, "Properties of learning of a Fuzzy ART variant", Neural Networks, 12, 837-850.
Gopal, S., Sklarew, D. M., and Lambin, E., 1994, "Fuzzy-neural networks in multitemporal classification of landcover change in the Sahel", Proceedings of the DOSES Workshop on New Tools for Spatial Analysis, Lisbon, Portugal, 55-68.
Grossberg, S., 1976, "Adaptive pattern classification and universal recoding, II: Feedback, expectation, olfaction, and illusions", Biological Cybernetics, 23, 187-202.
Grossberg, S., 1980, "How does a brain build a cognitive code?", Psychological Review, 1, 1-51.
Ham, F. M., and Han, S. W., 1996, "Quantitative study of the QRS complex using fuzzy ARTMAP and MIT/BIH arrhythmia database", In Proceedings of the World Congress on Neural Networks, 1, 207-211.
Heermann, P. D., and Khazenie, N., 1992, "Classification of multispectral remote sensing data using a Back-Propagation neural network", IEEE Transactions on Geoscience and Remote Sensing, 30, 81-88.


Hepner, G. F., Logan, T., Ritter, N., and Bryant, N., 1990, "Artificial neural network classification using a minimal training set: comparison to conventional supervised classification", Photogrammetric Engineering & Remote Sensing, 56, 469-473.
Hopfield, J. J., 1982, "Neural networks and physical systems with emergent collective computational abilities", Proceedings of the National Academy of Sciences, 79, 2554-2558.
Keyvan, S., Drug, A., and Rabelo, L. C., 1993, "Application of artificial neural networks for development of diagnostic monitoring system in nuclear plants", Transactions of the American Nuclear Society, 1, 515-522.
Keyvan, S., 1999, "Application of ART2-A as a Pseudo-supervised paradigm to nuclear reactor diagnostics", Lecture Notes in Computer Science, 1606, 747-755.
Kohonen, T., 1982, "Self-organized formation of topologically correct feature maps", Biological Cybernetics, 43, 59-69.
Kumar, S. S., and Guez, A., 1989, "A neural network approach to target recognition", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, II, 573-578.
Lang, K. J., and Witbrock, M. J., 1989, "Learning to tell two spirals apart", Proceedings of the 1988 Connectionist Models Summer School, 52-59.
Le Cun, Y., 1986, "Learning processes in an asymmetric threshold network", in Disordered Systems and Biological Organization, E. Bienenstock, F. Fogelman Soulié, and G. Weisbuch, Eds., Berlin, Springer-Verlag.
Mannan, B., Roy, J., and Ray, K., 1998, "Fuzzy ARTMAP supervised classification of multi-spectral remotely-sensed data", International Journal of Remote Sensing, 19, 767-774.
Mehta, B. V., Vij, L., and Rabelo, L. C., 1993, "Prediction of secondary structure of protein using fuzzy ARTMAP", In Proceedings of the World Congress on Neural Networks, 1, 228-232.
Mekkaoui, A., and Jespers, P., 1990, "An optimal self-organizing pattern classifier", International Joint Conference on Neural Networks, Washington DC, Hillsdale, NJ, Erlbaum Associates, 1, 477-450.
Moore, B., 1989, "ART1 and patterns clustering", Proceedings of the 1988 Connectionist Models Summer School, D. Touretzky, G. Hinton, and T. Sejnowski, Eds., San Mateo, CA: Morgan Kaufmann, 174-185.
Mulder, N. J., and Spreeuwers, L., 1991, "Neural networks applied to the classification of remotely sensed data", International Geoscience and Remote Sensing Symposium (IGARSS'91), Espoo, Finland, 2211-2213.


Murshed, N. A., Bortolozzi, F., and Sabourin, R., 1995, "Off-line signature verification, without a priori knowledge of class ω2. A new approach", Proceedings of the Third International Conference on Document Analysis and Recognition, Piscataway, NJ, USA.
Paola, J. D., and Schowengerdt, R. A., 1994, "Comparisons of neural networks to standard techniques for image classification and correlation", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1404-1406.
Paola, J. D., and Schowengerdt, R. A., 1995, "A review and analysis of backpropagation neural networks for classification of remotely-sensed multispectral imagery", International Journal of Remote Sensing, 16, 3033-3058.
Paola, J. D., and Schowengerdt, R. A., 1997, "The effect of neural-network structure on a multispectral land-use / land-cover classification", Photogrammetric Engineering & Remote Sensing, 63, 535-544.
Parker, D., 1986, "Computational research in economics and management science", MIT, Cambridge, MA, USA, technical report TR-87, 1986.
Racz, J., and Dubrawski, A., 1995, "Artificial neural network for mobile robot topological localization", Robotics and Autonomous Systems, 16, 73-80.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J., 1986, "Learning internal representations by back-propagation", Parallel Distributed Processing: Explorations in the Microstructure of Cognition (D. E. Rumelhart and J. L. McClelland, Eds.), MIT Press, Cambridge, Massachusetts, 318-362.
Salu, Y., and Tilton, J., 1993, "Classification of multispectral image data by the binary diamond neural network and by nonparametric, pixel-by-pixel methods", IEEE Transactions on Geoscience and Remote Sensing, 31, 606-617.
Seibert, M., and Waxman, M., 1992, "Adaptive 3-D object recognition from multiple views", IEEE Transactions on Pattern Analysis and Machine Intelligence, 14, 107-124.
Seibert, M., and Waxman, A. M., 1993, "An approach to face recognition using saliency maps and caricatures", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 661-664.
Simpson, P. K., 1990, "Neural networks for sonar signal processing", Handbook of Neural Computing Applications (A. J. Maren, C. T. Harston, and R. M. Pap, Eds.), San Diego, Academic Press, 319-335.
Solaiman, B., and Mouchot, M. C., 1994, "A comparative study of conventional and neural network classification of multispectral data", International Geoscience and Remote Sensing Symposium (IGARSS'94), Pasadena, CA, USA, 1413-1415.


Soliz, P., and Donohoe, G. W., 1996, "Adaptive resonance theory neural network for fundus image segmentation", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 1180-1183.
Srinivasa, N., and Sharma, R., 1996, "A self-organizing invertible map for active vision applications", Proceedings of the World Congress on Neural Networks, Hillsdale, NJ, USA, 121-124.
Tzeng, Y. C., Chen, K. S., Kao, W. L., and Fung, A. K., 1994, "A dynamic learning neural network for remote sensing applications", IEEE Transactions on Geoscience and Remote Sensing, 32, 1096-1102.
Yool, S. R., 1998, "Land cover classification in rugged areas using simulated moderate-resolution remote sensor data and an artificial neural network", International Journal of Remote Sensing, 19, 85-96.
Yoshida, T., and Omatu, S., 1994, "Neural network approach to land cover mapping", IEEE Transactions on Geoscience and Remote Sensing, 32, 1103-1109.
Warner, T. A., and Shank, M., 1997, "An evaluation of the potential for fuzzy classification of multispectral data using artificial neural networks", Photogrammetric Engineering & Remote Sensing, 63, 1285-1294.
Waxman, A. M., Seibert, M. R., Gove, A., Fay, D. A., Bernardon, A. M., Lazott, C., Steele, W. R., and Cunningham, R. K., 1995, "Neural processing of targets in visible, multispectral IR and SAR imagery", Neural Networks, 8, 1029-1051.
Werbos, P. J., 1974, "Beyond regression: New tools for prediction and analysis in the behavioural sciences", Ph.D. thesis, Harvard University, Cambridge, MA, USA.
Williamson, J. R., 1996, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps", Neural Networks, 9, 881-897.
Wilson, C. L., Wilkinson, R. A., and Ganis, M. D., 1990, "Self-organizing neural network character recognition on a massively parallel computer", International Joint Conference on Neural Networks, San Diego, Piscataway, NJ, IEEE Service Center, II, 325-329.


APPENDIX: SUMMARY

A.1. INTRODUCTION

A.1.1 Historical evolution of Artificial Neural Networks (ANNs)

Although the origin of ANNs can be dated to 1943, when McCulloch and Pitts built the first ANN structure, the foundations of the field were developed in the first half of the 1970s. It was then that Werbos (1974) set out the principles of the learning algorithm known as Back Propagation (BP) and Grossberg (1976) established the basis of Adaptive Resonance Theory (ART). Nevertheless, it was in the 1980s that major theoretical progress was made in this field. The BP algorithm was developed simultaneously and independently by different authors (Le Cun 1986, Parker 1986, and Rumelhart et al. 1986). In addition, new neural network structures and new learning algorithms were proposed. Thus, in 1982 Kohonen proposed the Self-Organizing Map (KSOM).

In this work, special attention has been paid to the evolution of ART-type ANNs (Carpenter and Grossberg 1987a&b), given their proven stability, speed, and accuracy (Carpenter et al. 1991a&b, 1992, 1997, and Gan & Lua 1992). These capabilities have facilitated their application in many different areas. For example, the Boeing company has used this type of ANN to retrieve information from different systems in order to facilitate the design of new systems (Caudell et al. 1994). This type of network has also been used for moving target recognition (Seibert and Waxman 1992, Bernardon and Carrick 1995, Kumar and Guez 1989, Koch et al. 1995, and Waxman et al. 1995); for motor control in robotics (Baloch and Waxman 1991, Bachelder et al. 1993, Dubrawski and Crowley 1994, Srinivasa and Sharma 1996); in robot navigation (Racz and Dubrawski 1995); in artificial vision (Caudell and Healy 1994); object recognition (Seibert and Waxman 1992); face recognition (Seibert and Waxman 1993); pattern clustering (Moore 1989, Mekkaoui and Jespers 1990); character recognition (Wilson et al. 1990); sonar signal processing (Simpson 1990); medical image processing (Soliz and Donohoe 1996); wave recognition in electrocardiograms (Ham and Han 1996); signature verification (Murshed et al. 1995); fault identification in nuclear plants (Keyvan 1999); and remote sensing (Gopal et al. 1994, Baraldi and Parmiggiani 1995).

A.1.2 Classification of remotely sensed data with ANNs

The advances of recent decades, both in space research and in computing technologies, have made it possible to use remotely sensed data for the automatic determination and location of the thematic classes present on the Earth's surface. At present, this area of knowledge is a very active line of research (Benediktsson et al. 1990). The advantages of using ANNs for these classification tasks, compared with some conventional classifiers such as the maximum likelihood classifier (MLC), are:

1) ANNs do not need to know a priori the probability distribution of each class, since they are non-parametric systems. This also makes it possible to introduce auxiliary data of a non-spectral nature (slope, topography, texture, etc.), which appear to improve the classification accuracy (Benediktsson et al. 1990, Carpenter et al. 1997). It has also been shown that neural networks are more robust when the distribution is not Gaussian (Paola and Schowengerdt 1997, Hepner et al. 1990).
2) Unlike conventional classifiers, ANNs are able to handle fuzzy classifications (Paola and Schowengerdt 1997, Warner and Zanca 1997, Yool 1998). In these cases, the values provided by the output neurons can quantify the degree of membership of the input data in a given class. This aspect is especially relevant when working with sensors of low spatial resolution.
3) The inherent parallelism of ANNs makes these systems relatively easy to run on parallel computers (Salu and Tilton 1993, Heermann and Khazenie 1992), considerably reducing the time spent in the classification process compared with classical classifiers.
4) The flexibility of ANNs makes it possible to improve the classification results in certain circumstances (Carpenter et al. 1997).
5) Finally, these systems are able to establish arbitrary decision boundaries (Paola and Schowengerdt 1995, Tzeng et al. 1994).

The neural network most commonly used in the literature to classify remotely sensed data is the multi-layer Perceptron (MLP), with the well-known Backpropagation learning algorithm. This algorithm is based on minimizing the error between the value provided by the network at its output and the true value. Some authors have claimed that conventional classifiers perform better than the MLP (Mulder and Spreeuwers 1991, Solaiman and Mouchot 1994). However, others have concluded that the MLP classifies remotely sensed data more accurately than the MLC (Hepner et al. 1990, Heermann and Khazenie 1992, Paola and Schowengerdt 1994, Yoshida and Omatu 1994). Nevertheless, classifying remotely sensed data with the MLP has a number of drawbacks: the network architecture is not fixed, and the number of hidden layers and the number of nodes in each hidden layer must be determined by trial and error. This process can be very costly in computation time, since training the network is slow. Moreover, during learning the network can become trapped in local minima, which would prevent it from converging. This problem can be reduced by lowering the learning rate, but that increases the training time. Heermann and Khazenie (1992) proposed using parallel computers to reduce the training time, at the cost of more expensive hardware.

Some studies (Carpenter et al. 1992) have shown that Fuzzy ARTMAP provides a higher classification accuracy than the MLP for images from the Thematic Mapper (TM) sensor carried by the Landsat satellite, and in less time. These authors also concluded that, in this case, Fuzzy ARTMAP and the MLC provided the same classification accuracy. However, Mannan et al. (1998) compared the performance of Fuzzy ARTMAP, the MLP, and the MLC in classifying a 512x512 image acquired by the LISS-II sensor carried by the Indian satellite IRS-1B, and concluded that the classification accuracy of Fuzzy ARTMAP was far superior to that of the other two classifiers. Its learning time was slightly shorter than that of the MLC and considerably shorter than that of the MLP. It should also be noted that, unlike the MLP, the architecture of Fuzzy ARTMAP is well defined, it always converges, and it is able by itself to generate new nodes that represent subclasses. The main drawback of Fuzzy ARTMAP is the complexity of its architecture.


A.2. OBJECTIVES OF THE THESIS

The objective of this Thesis follows from the issues discussed above. It can be stated as the search for ART-type neural network architectures that match the performance of the classical ART networks but are structurally simpler, which in turn will reduce the computation time of both learning and operation. This overall objective can be broken down into the following partial objectives:

• Design of new ART-type ANN architectures that provide the same classification accuracy as the classical ART networks while reducing the complexity of their architectures.

• Proposal of learning algorithms for these architectures.

• Implementation of the learning algorithms for the different proposed architectures.

• Exhaustive comparative study of the performance of the proposed networks and algorithms in classifying images remotely sensed by the Thematic Mapper sensor.

A.3. ART-TYPE ARTIFICIAL NEURAL NETWORKS

The principles of Adaptive Resonance Theory (ART) were set out by Carpenter and Grossberg (Center for Adaptive Systems, Department of Cognitive and Neural Systems, Boston University) as a theory of information processing in the human cognitive system (Grossberg 1976, 1980). Building on this theory, several unsupervised structures were developed first: ART1 (Carpenter and Grossberg 1987a), ART2 (Carpenter and Grossberg 1987b), ART3


(Carpenter and Grossberg 1990), SART (Baraldi and Parmiggiani 1995) and Fuzzy ART (Carpenter et al. 1991a). All of these networks could group the inputs into classes using only the information that characterizes those inputs (unsupervised learning). The fundamental difference between ART1 and ART2 is that the former accepts only binary data, whereas the latter also accepts analog data. In both, information flows forward and backward. It flows forward through the weights that connect each node of the input layer to all the nodes of the layer that clusters the input data; each of these nodes is called a category node. It flows backward through another set of weights that connects each category node to all the nodes of the input layer. Like ART2, Fuzzy ART can classify both binary and analog data. In Fuzzy ART, however, information flows only forward, from the input layer to the classification layer. Another fundamental difference between Fuzzy ART and ART1 and ART2 is that the intersection operator of set theory ($\cap$) has been replaced by the operator ($\wedge$), which denotes the minimum operator of fuzzy logic. The first ART-type neural network with supervised learning was ARTMAP, proposed by Carpenter et al. (1991). In this case, in addition to the features to be classified, the network must be given, during training, the class code that corresponds to each input. In 1992, the same authors presented another ART-type network with supervised learning, Fuzzy ARTMAP (Carpenter et al. 1992).
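The replacement of the set intersection by the fuzzy minimum operator, described above, can be illustrated with a few lines of Python. This is only a minimal sketch: the function name `fuzzy_and` and the sample vectors are illustrative choices, not taken from the thesis. It shows that the component-wise minimum coincides with the intersection when the vectors are binary, and extends naturally to analog values in [0, 1].

```python
def fuzzy_and(a, b):
    """Fuzzy AND: component-wise minimum of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [min(x, y) for x, y in zip(a, b)]


# Binary vectors: the result is the indicator vector of the set intersection.
binary_result = fuzzy_and([1, 0, 1, 1], [1, 1, 0, 1])

# Analog vectors: the same operator applies without modification.
analog_result = fuzzy_and([0.2, 0.9, 0.5], [0.6, 0.4, 0.5])

print(binary_result)
print(analog_result)
```

Because the same operator handles both cases, a network built on it can accept binary and analog inputs alike.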
Subsequently, many other supervised ART-type architectures have been investigated, including ART-EMAP (Carpenter and Ross 1993), Gaussian ARTMAP (Williamson 1996), ARTMAP-IC (Carpenter and Markuzon 1998) and Distributed ARTMAP (Carpenter 1998). All of these


architectures share the feature that supervision is carried out through a "map field", which requires two ART-type modules (ARTa and ARTb). The main difference between ARTMAP and Fuzzy ARTMAP is that the former is built from two ART1 modules, whereas the latter uses two Fuzzy ART modules. ARTMAP can learn and classify multi-valued binary input patterns, whereas Fuzzy ARTMAP also accepts analog patterns. Of all the supervised networks mentioned above, Fuzzy ARTMAP has been the most widely used. It has been applied to a variety of problems, such as the automatic analysis of electrocardiograms (Ham and Han 1996), the management and diagnosis of nuclear power plants (Keyvan et al. 1993), and the prediction of the secondary structure of some proteins (Mehta et al. 1993).

A.3.1 Fuzzy ART

Since all the architectures and algorithms proposed in this work are inspired by Fuzzy ART and Fuzzy ARTMAP, a brief description of both is given here. It should first be noted that both retain the basic characteristics of all ART-type systems. Particularly noteworthy among these is the matching, according to similarity criteria, between the input patterns and the prototype vectors previously learned by the network. This matching process can lead the network to a resonant state, which can result either in the learning of new prototypes (categories) or in a search for similar, previously learned prototypes. If the similarity between the input pattern and the stored prototype exceeds the predetermined value, resonance occurs and the new information is incorporated into the selected category node


by training its weights. The similarity criterion is set by the so-called vigilance parameter, $\rho$. This parameter determines the threshold that a committed category node must exceed in order to represent a given input pattern, before a search is triggered for another category node that represents the pattern better. If none of the committed category nodes exceeds the threshold, a new category node must be committed. This process can be repeated as long as the memory capacity of the network is not exceeded. The vigilance parameter, $\rho$, is a dimensionless number defined in the interval (0, 1]. A value of 1 represents a perfect match; that is, it yields very well differentiated classes but a large number of category nodes, whereas low values allow the network to work with few category nodes but yield very general classes. This parameter is one of the keys to all ART-type ANNs. Its value depends on the type and volume of data, the desired classification accuracy, the required speed and the available memory. It is held constant during the operation of all unsupervised networks. Figure 2-1 of the thesis shows the dynamics of Fuzzy ART. In that figure, $F_1$ is the input layer and $F_2$ the so-called classification layer. The weights $w_{ij}$ connect each node of the input layer to all the nodes of the classification layer. The weights of the winning node, $w_J$, are trained only if that node passes the similarity test, in other words if it exceeds the vigilance parameter; otherwise the node leaves the competition (reset). In Figure 2-1, $|X|$ represents the degree of similarity between the input, $A$, and the weights of the winning category node $J$. This degree of similarity is determined by the relation

$$X = A \wedge w_J$$

Selecting the winning node involves computing the activation level of each category node, $T_j$


(Eq. 2-1), and choosing the node that reaches the highest level. The value of $T_j$ is an estimate of the degree of membership of the input in the class represented by node $j$. The architecture of Fuzzy ART is shown in Figure 2-2, which depicts the $2M$ nodes of the input layer, where $M$ is the number of values that define each input pattern. The last $M$ input nodes represent the complementary values of those patterns. Figure 2-2 also shows the category nodes and all the connections between the nodes of $F_1$ and $F_2$. Category nodes with indices from 1 to $C$ are called committed category nodes, whereas those with indices from $C+1$ to $N$ are called uncommitted category nodes. When all the committed category nodes fail to represent an input and are therefore out of the competition, one of the uncommitted category nodes must be committed. Once a node capable of representing the input pattern has been found and has passed the vigilance test, its weights must be updated so that node $J$ incorporates the features of the new pattern (Eq. 2-7). The weight adaptation equation is given by the following expression:

$$w_J^{new} = \beta \left(A \wedge w_J^{old}\right) + (1 - \beta)\, w_J^{old}$$
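The Fuzzy ART cycle described above (complement coding, selection of the winning node, vigilance test, reset, and weight adaptation) can be sketched in Python as follows. This is a hedged illustration, not the thesis implementation. Several details are assumptions, because the corresponding equations are not reproduced in this section: the activation function is the standard Fuzzy ART choice function $T_j = |A \wedge w_j| / (\alpha + |w_j|)$, the match is normalized as $|X| / |A|$ before comparison with the vigilance parameter, and uncommitted nodes start with all weights equal to 1. The names `FuzzyART`, `complement_code`, `alpha` and `max_nodes` are likewise illustrative.

```python
def complement_code(pattern):
    """Turn M values in [0, 1] into 2M values: the pattern, then its complement."""
    return list(pattern) + [1.0 - x for x in pattern]


def fuzzy_and(a, b):
    """Fuzzy AND: component-wise minimum."""
    return [min(x, y) for x, y in zip(a, b)]


class FuzzyART:
    def __init__(self, rho, beta=1.0, alpha=0.001, max_nodes=None):
        if not 0.0 < rho <= 1.0:
            raise ValueError("rho must lie in (0, 1]")
        self.rho = rho              # vigilance parameter
        self.beta = beta            # learning rate of the weight update
        self.alpha = alpha          # choice parameter (assumed, see lead-in)
        self.max_nodes = max_nodes  # memory capacity N; None means unbounded
        self.weights = []           # one weight vector per committed node

    def _activation(self, A, w):
        # Assumed choice function: |A ^ w| / (alpha + |w|)
        return sum(fuzzy_and(A, w)) / (self.alpha + sum(w))

    def _update(self, J, X):
        # w_new = beta * (A ^ w_old) + (1 - beta) * w_old, with X = A ^ w_old
        w = self.weights[J]
        self.weights[J] = [self.beta * x + (1.0 - self.beta) * wi
                           for x, wi in zip(X, w)]

    def train(self, pattern):
        """Present one pattern; return the index of the node that learned it."""
        A = complement_code(pattern)
        # Visit committed nodes from highest to lowest activation.
        ranked = sorted(range(len(self.weights)),
                        key=lambda j: self._activation(A, self.weights[j]),
                        reverse=True)
        for J in ranked:
            X = fuzzy_and(A, self.weights[J])
            if sum(X) / sum(A) >= self.rho:  # vigilance test passed: resonance
                self._update(J, X)
                return J
            # Otherwise node J is reset and the next candidate is tried.
        if self.max_nodes is not None and len(self.weights) >= self.max_nodes:
            raise RuntimeError("memory capacity of the network exceeded")
        # Commit a new node; with initial weights of 1, A ^ w_old equals A.
        self.weights.append([1.0] * len(A))
        J = len(self.weights) - 1
        self._update(J, A)
        return J


strict = FuzzyART(rho=0.8)
strict_nodes = [strict.train(p) for p in ([0.1, 0.2], [0.12, 0.22], [0.9, 0.9])]

loose = FuzzyART(rho=0.2)
loose_nodes = [loose.train(p) for p in ([0.1, 0.2], [0.9, 0.9])]
```

In this small example the high-vigilance network commits a second node for the dissimilar pattern, while the low-vigilance network absorbs both patterns into a single, more general category, which mirrors the trade-off described for the vigilance parameter.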
... and $b$. The first set of vectors represents the training patterns, whereas the second represents the binary code assigned to the class to which the corresponding training pattern belongs. When the activation level of the winning node exceeds the vigilance parameter, the class match must be evaluated. It is considered acceptable if it exceeds the preset value of the map field vigilance parameter, $\rho_{ab}$, in which case the weights are updated according to the equations

$$w_{iJ}^{new} = \beta \left(A_i \wedge w_{iJ}^{old}\right) + (1 - \beta)\, w_{iJ}^{old}; \quad i = 1, \ldots, 2M$$