MULTIVARIATE TEMPORAL CLASSIFICATION BY WINDOWED WAVELET DECOMPOSITION AND RECURRENT NEURAL NETWORKS



International Topical Meeting on Nuclear Plant Instrumentation, Controls, and Human-Machine Interface Technologies (NPIC&HMIT 2000), Washington, DC, November, 2000.


Davide Roverso
Institutt for energiteknikk, OECD Halden Reactor Project
P.O. Box 173, N-1751 Halden, Norway

Keywords: Neural Networks, Wavelets, Transient Classification

ABSTRACT

The operation of any industrial plant is based on the readings of a set of sensors. The ability to identify the state of operation, or the events that are occurring, from the time evolution of these readings is essential for tasks such as supervisory control, detection and diagnosis of faults, and process quality control. Reasoning in time, however, is very demanding, because time adds a dimension with considerable additional freedom and complexity. The real-time history of scores of variables can be displayed and monitored in most computerized process monitoring and control systems. However, when the process is in significant transience or crises have occurred, the displayed trends of interacting variables and alarms can easily overwhelm an operator. In this paper we describe how the combined use of wavelets and recurrent neural networks improves on our previously proposed solutions to the transient classification problem. In particular, the newly developed system overcomes two basic limitations of the earlier systems, namely the requirement for fixed-length transients, and the requirement for a trigger signal indicating the start of a transient, i.e. the need for a separate transient detection component. The paper also includes an experimental analysis of the discrimination power of the proposed system, which makes a strong case for its potential application to a wide variety of industrial processes.

1. INTRODUCTION

Many industrial processes are characterized by long periods of steady-state operation, interspersed with occasional shorter periods of more dynamic behaviour caused either by normal events, such as minor disturbances, planned interruptions or transitions to different operating states, or by abnormal events, such as major disturbances, actuator failures, instrumentation failures, etc. This second class of events represents a challenge, and possibly a threat, to the smooth, safe, and economical operation of the monitored process. Prompt detection and recognition of such an event is essential for the most effective and informed response to the challenge. The most common way of performing event detection and recognition in an industrial plant is to rely on experienced operators who, by observing the current values of important process variables, their recent history on trend displays, and any alarms generated by the process monitoring system, can usually diagnose the current event quickly and reliably and take the appropriate corrective actions through the plant control system. However, when the process is in significant transience or crises have occurred, the displayed trends of interacting variables and alarms can easily overwhelm an operator. When process variables change at different rates, or are affected by varying lags, it is very difficult for a human operator to track and recognize the current situation. Also, when the changes to the process variables caused by the event are subtle or very slow, and do not cause the alarm system to generate alert signals, the abnormal situation can easily be overlooked by an operator. In both cases, a computerized operator support system (COSS) able to detect and classify these process changes would be of great value. This paper describes some recent advances in the design of such a COSS within the framework of the ALADDIN project (Roverso, 2000a), which has been investigating various techniques based on soft computing (neural networks and fuzzy logic) as part of the internationally sponsored OECD Halden Reactor Project.

The remainder of this paper is organized as follows. Section 2 presents a brief overview of previous work on soft computing applied to transient classification, focusing on the successes and limitations of our previous proposals (Roverso, 1998; Roverso, 2000a; Roverso, 2000b). Section 3 discusses the main contribution of the paper, namely the novel combination of the feature extraction capability of wavelets with the dynamic classification properties of recurrent neural networks. Section 4 describes an experimental analysis of the discriminatory power of the new proposal, conducted on specially designed multivariate time series. Section 5 concludes the paper with a summary of the contribution and a discussion of current open issues.

2. PREVIOUS APPROACHES TO TRANSIENT CLASSIFICATION WITH NEURAL NETWORKS

Artificial Neural Networks (ANNs) are particularly suited to the problem of event identification in dynamic processes for several reasons (for a general reference on neural networks, see Hassoun, 1995). First, ANNs can approximate any well-behaved function to arbitrary accuracy, which is a key advantage over methods based on linear regression when the problem at hand presents essential nonlinearities. The biggest advantage of ANNs shows when dealing with hard problems, e.g. significantly overlapping patterns, high background noise, and dynamically changing environments.

Focusing first on applications in the nuclear field, possibly the first to demonstrate the feasibility of using ANNs were Bartlett and Uhrig (1992). That work was developed further and enhanced by introducing a modular ANN architecture (Basu, 1995). An important contribution was made by Bartal et al. (1995), who recognized that a classifier must be able to provide a “don’t-know” answer when presented with a transient of a kind not contained in its accumulated knowledge base. An alternative way of dealing with temporal data, using an implicit time measure, was proposed by Jeong et al. (1995). The same authors later proposed (Jeong, 1996) the adaptive template matching algorithm, which describes transients in a two-dimensional continuum of time and severity level. Many recently published approaches also borrow techniques first developed for speech recognition. Among these, we should particularly mention Dynamic Time Warping (see (Kassidas, 1998) for an application to batch processes, and (Keogh, 1999) for an application to knowledge discovery in time-series databases), and Hidden Markov Models (Kundu, 1994; Kwon, 1999).

2.1 The ALADDIN Prototypes

In the last couple of years, various techniques based on neural networks and fuzzy logic for event classification have been developed and tested at the OECD Halden Reactor Project within the framework of the prototype system ALADDIN (Roverso, 2000a). The main motivation for developing such a system was the need for new principled methods to perform alarm structuring/suppression in a nuclear power plant (NPP) alarm system. One such method bases alarm structuring/suppression on fast recognition of the event generating the alarms, so that a subset of alarms sufficient to handle the current fault efficiently can be selected for the operator, minimizing the operator’s workload in a potentially stressful situation. The scope of application of a system like ALADDIN, however, goes beyond alarm handling to include diagnostic tasks in general. The possible application of the system to domains other than NPPs was also given special consideration during the design phase. The first ALADDIN prototype is described in (Roverso, 1998; Roverso, 2000a), together with a comparative evaluation of alternative neural classification models. The best-performing architecture was based on a recurrent Elman ANN (Elman, 1990) and a possibilistic fuzzy clustering validation module (Krishnapuram, 1993).
The principal problems encountered with this prototype were related to the difficulty of training the recurrent neural network module. Recurrent neural networks are known to be hard to train (Mozer, 1992), especially when the temporal relationships being modelled span relatively long intervals (Bengio, 1994). These problems were tackled in (Roverso, 2000b), where the use of transient compression and of bagging ensembles (Breiman, 1996) of recurrent NNs was proposed. A successful application of the resulting prototype to the classification of various occurrences of anomalous rapid load rejection events in a PWR NPP, i.e. plant islanding, including particular malfunctions of plant I&C systems (instrumentation, actuators or closed-loop control systems), was also presented. However, two main problems still limited the general applicability of the system, namely (a) the requirement for a trigger signal to indicate the beginning of a transient and (b) the requirement that each transient have a fixed, predefined length. These restrictions were due to the need to compress the transients to make them tractable (i.e. learnable) by the recurrent NNs. In the remainder of this paper a new architecture is presented which addresses these restrictions by using wavelet features (Strang, 1996) extracted from a sliding window on the multivariate transient signals.
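Bagging, one of the remedies adopted in (Roverso, 2000b), trains each ensemble member on a bootstrap replicate of the training set (a sample of the same size drawn with replacement) and then combines the members' outputs. The Python sketch below illustrates only that mechanism. It assumes that members are combined by averaging their per-class output vectors, which may differ from the combination rule actually used in (Roverso, 2000b), and `train_prior` is a hypothetical toy stand-in for training a recurrent NN.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: one bagging replicate."""
    return [data[rng.randrange(len(data))] for _ in range(len(data))]


def train_bagged(data, train_fn, n_members=5, seed=0):
    """Train n_members models, each on its own bootstrap replicate of data."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(n_members)]


def ensemble_outputs(models, x):
    """Combine members by averaging their per-class output vectors (assumed rule)."""
    outputs = [model(x) for model in models]
    return [sum(column) / len(outputs) for column in zip(*outputs)]


def train_prior(data, n_classes=2):
    """Hypothetical toy 'trainer' standing in for a recurrent NN: the returned
    model ignores its input and outputs the class frequencies it was trained on."""
    counts = Counter(label for _, label in data)
    priors = [counts[k] / len(data) for k in range(n_classes)]
    return lambda x: priors


# Usage: (transient, class label) pairs; the transients are placeholders here.
data = [("transient A", 0)] * 6 + [("transient B", 1)] * 4
models = train_bagged(data, train_prior, n_members=7)
print(ensemble_outputs(models, "new transient"))
```

Because each member sees a slightly different resampled training set, the members make partly independent errors, and averaging them tends to reduce the variance of the combined classifier.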


3. WAVELET-BASED RECURRENT NN CLASSIFIER

The idea of combining neural networks with multiscale wavelet decomposition has been proposed by a number of authors (Chen, 1999; Dechenes, 1995; Koulouris, 1996; Wang, 1999). These approaches use wavelets either as the neurons' activation functions (Koulouris, 1996) or in a pre-processing phase, to extract features from the time-series data (Chen, 1999; Dechenes, 1995). Our approach is of the second kind, but differs from earlier work mainly in that it combines wavelet feature extraction with recurrent NNs, and in that it shows that promising results can be obtained even with a greatly simplified wavelet feature extraction step. Such simplification is particularly welcome in on-line applications, as well as in high-dimensional applications, i.e. applications where the number of process variables and/or the number of transient classes is considerable. The belief that most realistic application scenarios are of this kind led us to investigate the limits of the trade-off between simplicity and functionality; Section 4 presents some related experimental results.

3.1 The Concept

The idea proposed here is to use an Elman recurrent neural network (ERNN) (Elman, 1990), or, if necessary, an ensemble of such networks (Roverso, 2000b), to detect and classify process events by recognizing the corresponding transients generated in a set of monitored process variables (hereafter called signals). The ERNN has one output for each considered event class, and three input streams for each signal. These inputs are the consecutive values of the following features, extracted by Haar wavelet decomposition (Strang, 1996) from a sliding window on the actual signal time series:

(a) The mean residual signal taken at the highest, i.e. coarsest, scale.
(b) The minimum wavelet coefficient over all the scales.
(c) The maximum wavelet coefficient over all the scales.
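The three features listed above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ALADDIN implementation: it uses the averaging Haar convention (approximation (a + b)/2, detail (a - b)/2) because the paper does not state a normalization, it drops any trailing partial window, and the function names (`sliding_windows`, `haar_features`, `transient_to_inputs`, `compression_factors`) are hypothetical. The last function reproduces the compression arithmetic for a 32-sample window with a 5-sample overlap.

```python
def sliding_windows(signal, size, overlap):
    """Cut a 1-D sequence into windows of `size` samples overlapping by `overlap`.

    A trailing partial window is dropped (an on-line system would wait for it).
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def haar_features(window, levels=None):
    """Return (a) mean of the coarsest residual, (b) min and (c) max wavelet
    (detail) coefficient over all computed scales, for one dyadic window."""
    n = len(window)
    if n < 2 or n & (n - 1):
        raise ValueError("window length must be a power of two >= 2")
    max_levels = n.bit_length() - 1
    levels = max_levels if levels is None else max(1, min(levels, max_levels))
    approx = [float(v) for v in window]
    details = []
    for _ in range(levels):
        evens, odds = approx[0::2], approx[1::2]
        # Averaging Haar convention (an assumption): detail (a-b)/2, approximation (a+b)/2
        details.extend((a - b) / 2.0 for a, b in zip(evens, odds))
        approx = [(a + b) / 2.0 for a, b in zip(evens, odds)]
    return sum(approx) / len(approx), min(details), max(details)


def transient_to_inputs(signals, size=32, overlap=5, levels=None):
    """Turn equal-length signals (one per process variable) into a sequence of
    ERNN input vectors holding 3 features per signal for each window."""
    per_signal = [
        [haar_features(w, levels) for w in sliding_windows(s, size, overlap)]
        for s in signals
    ]
    # zip(*...) aligns window k of every signal; flatten the feature triples.
    return [[f for triple in window_feats for f in triple] for window_feats in zip(*per_signal)]


def compression_factors(size, overlap, features_per_signal=3):
    """Linear compression per feature stream, and overall compression per signal."""
    step = size - overlap
    return step, step / features_per_signal


# Usage: four step-shaped signals of 300 samples each.
signals = [[0.0] * 100 + [1.0] * 200 for _ in range(4)]
inputs = transient_to_inputs(signals)
print(len(inputs), len(inputs[0]))   # 10 12
print(compression_factors(32, 5))     # (27, 9.0)
```

With the averaging convention, feature (a) equals the plain mean of the window, so only (b) and (c) depend on the detail coefficients. An orthonormal Haar transform would instead scale the level-j details by 2^(j/2), which changes which scale tends to supply the minimum and maximum. The optional `levels` argument limits the number of scales computed.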
The rationale behind this choice is that (a) should capture the general trend of the signal in a compact way, while (b) and (c) should capture important discontinuities (e.g. step changes, spikes, etc.) which would otherwise be severely smoothed out by the compression process. The window size is selected to correspond to dyadic wavelet decomposition values (i.e. powers of 2), and consecutive windows are chosen with a slight overlap. Since we are using the Haar wavelet on a dyadic window, we need not be concerned with edge effects. The choice of window size can be used here to strike a balance between a high level of transient compression (which greatly improves the performance of the ERNN) and a resolution still sufficient to discriminate among the event classes. As an example, a window length of 32 samples with an overlap of 5 samples leads to a 4-scale wavelet decomposition and a linear compression factor of 27:1, since each new window advances by 32 - 5 = 27 samples. Of course, each original signal generates three inputs to the ERNN, so the overall information compression in this case is about 27/3 = 9:1. A smaller window might be needed if the only discriminating features among signal transients corresponding to separate event classes are time differences close to, or less than, 27 samples. This new scheme of transient compression clearly overcomes the problems of earlier solutions (Roverso, 2000b), where transients needed to be of a fixed length and synchronized by a trigger flag signalling the start of a transient (i.e. they assumed the availability of a separate event detection system).

4. EXPERIMENTAL ANALYSIS OF DISCRIMINATION ABILITY

Successful tests of the proposed system were conducted on data from a 900 MW PWR nuclear power plant, obtained from EDF (Electricité de France), and were reported in (Roverso, 2000b). In that case the task was to discriminate among seven different transient classes, corresponding to various occurrences of anomalous rapid load rejection events, i.e. plant islanding, including particular malfunctions of plant I&C systems (instrumentation, actuators or closed-loop control systems). Following those tests, and in order to systematically investigate the discrimination ability of the proposed classification model, a series of controlled tests was conducted on artificially generated multivariate time series. The aim of these tests was to demonstrate the ability of the system to base its classification decision on a range of discriminating features, from low-frequency to high-frequency features, and from early- to late-developing features. Fig. 1 shows two cases of transient behaviour for four distinct variables. The first variable has two types of behaviour that are distinguishable by the amplitude of the step change. The second variable has two types of behaviour that can be distinguished by the presence of a high-frequency peak at the beginning of the transient. The third variable has two types of behaviour that differ in the presence of a small burst in the middle phase of the transient.
Finally, the fourth variable has two types of behaviour that are distinguishable by the low-frequency change in the later stage of the transient.

Fig. 1. Basic transient behaviour cases (Case 1 and Case 2 for variables Var1 to Var4; time axis 0 to 300 samples).

By combining these transients in all possible ways, one obtains 2^4 = 16 prototype transients composed of 300 four-dimensional samples, as shown in Fig. 2. These 16 transient classes constitute a very challenging task for a transient classification system like ALADDIN, since for each transient class there are 4 other classes that differ from it by only a single feature, forcing the system to make optimal use of the available discriminating features, which, as seen from Fig. 1, span a range of characteristic behaviours.

Fig. 2. The 16 class prototypes (C1 to C16; time axis 0 to 300 samples).
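The construction of the 16 prototypes amounts to taking every combination of one behaviour per variable. The Python sketch below enumerates these combinations as 4-bit codes and checks the claim that each class has exactly 4 neighbours differing in a single feature. The behaviour labels, and any mapping from codes to the class names C1 to C16, are hypothetical.

```python
from itertools import product

# Hypothetical labels for the two behaviours of each variable in Fig. 1;
# which behaviour corresponds to "Case 1" and which to "Case 2" is not assumed.
BEHAVIOURS = (
    ("small step", "large step"),          # Var1: amplitude of the step change
    ("no initial peak", "initial peak"),   # Var2: high-frequency peak at the start
    ("no mid burst", "mid burst"),         # Var3: small burst in the middle phase
    ("no late change", "late change"),     # Var4: low-frequency change late in the transient
)

# Every combination of one behaviour per variable: 2**4 = 16 prototype classes.
PROTOTYPES = list(product((0, 1), repeat=len(BEHAVIOURS)))


def describe(code):
    """Translate a 0/1 code such as (0, 1, 0, 1) into behaviour labels."""
    return tuple(options[bit] for options, bit in zip(BEHAVIOURS, code))


def single_feature_neighbours(code, codes=PROTOTYPES):
    """Classes that differ from `code` in exactly one variable."""
    return [c for c in codes if sum(a != b for a, b in zip(code, c)) == 1]


print(len(PROTOTYPES))                                # 16
print(len(single_feature_neighbours(PROTOTYPES[0])))  # 4
```

The neighbour count explains why the task is hard: a classifier that ignores any one of the four variables necessarily confuses every class with one of its neighbours.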

Using these 16 transients as class prototypes, a training set and a test set, each of 800 transients (50 for each of the 16 classes), were generated by random amplitude warping (±30%), delay (up to 20%), time warping (±20%), and noise (±1% Gaussian). This introduced variation in the signal amplitudes and transient speeds (lengths) within each event class. The resulting transients varied in length from about 200 to 400 samples. The window size chosen was 64 samples, with a 9-sample overlap between windows, resulting in "compressed" transients of between 4 and 8 12-dimensional¹ "samples" (i.e. windows). The basic recurrent NN architecture had 12 input units, 32 hidden units, and 16 output (classification) units, and was trained for 200,000 epochs. The obtained classification accuracy is shown in Table 1. As can be seen, the results are very satisfactory, with a 94.1% correct classification rate on the test set, 1.4% misclassification, and 4.5% indecisive classification (i.e. either the system did not classify the transient, or it classified it as belonging to more than one class).
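The paper gives only the ranges of the four random perturbations, so the Python sketch below is one possible reading of them, with every interpretation stated as an assumption: the delay and time-warp factor are shared by all variables of a transient (so the variables stay aligned), the amplitude gain is drawn per variable and scales the deviation from the initial value, time warping is done by linear resampling, and the Gaussian noise standard deviation is 1% of each signal's peak-to-peak range. The names `resample` and `augment` are hypothetical, and the resulting lengths will not exactly match the paper's 200 to 400 samples.

```python
import random


def resample(x, factor):
    """Time-warp a sequence by linear interpolation onto round(len(x) * factor) points."""
    n_out = max(2, round(len(x) * factor))
    out = []
    for j in range(n_out):
        t = j * (len(x) - 1) / (n_out - 1)   # position on the original time axis
        i = min(int(t), len(x) - 2)
        frac = t - i
        out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
    return out


def augment(signals, rng, amp=0.30, max_delay=0.20, warp=0.20, noise=0.01):
    """Return one randomly perturbed copy of a multivariate prototype transient
    (all interpretations of the perturbations are assumptions, see above)."""
    n = len(signals[0])
    delay = int(rng.uniform(0.0, max_delay) * n)      # leading steady-state samples
    factor = rng.uniform(1.0 - warp, 1.0 + warp)      # shared transient speed change
    out = []
    for s in signals:
        gain = rng.uniform(1.0 - amp, 1.0 + amp)      # per-variable amplitude warping
        scaled = [s[0] + gain * (v - s[0]) for v in s]
        warped = resample(scaled, factor)
        padded = [warped[0]] * delay + warped
        sigma = noise * ((max(padded) - min(padded)) or 1.0)
        out.append([v + rng.gauss(0.0, sigma) for v in padded])
    return out


# Usage: 50 perturbed copies of one 4-variable, 300-sample prototype.
rng = random.Random(42)
prototype = [[0.0] * 100 + [1.0] * 200 for _ in range(4)]
examples = [augment(prototype, rng) for _ in range(50)]
lengths = [len(e[0]) for e in examples]
print(min(lengths), max(lengths))
```

Under these assumptions a 300-sample prototype yields transients of 240 to 420 samples, which the windowing with 64-sample windows and a 9-sample overlap then reduces to a handful of 12-dimensional window vectors.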

¹ The samples are 12-dimensional because each window yields 3 features for each of the 4 variables.


Table 1. Classification results.

            Correct    Misclassified    Not classified
Training    95.2 %     0.9 %            3.9 %
Test        94.1 %     1.4 %            4.5 %
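To make the three columns of Table 1 concrete, the Python sketch below runs a compressed transient (a sequence of 12-dimensional window vectors) through a minimal, untrained 12-32-16 Elman network and applies a hypothetical decision rule: a transient is assigned to a class only if exactly one output reaches a threshold (0.5 here, an assumption), while no active output or several active outputs count as "not classified". This rule is a simplified stand-in, not the possibilistic fuzzy clustering validation module of the earlier ALADDIN prototype, and training (200,000 epochs in the experiment above) is not shown. The names `ElmanNet`, `decide` and `tally` are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ElmanNet:
    """Untrained Elman network: the hidden layer also receives a copy of its
    own previous activations (the context units)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = matrix(n_hidden, n_in)
        self.w_ctx = matrix(n_hidden, n_hidden)
        self.w_out = matrix(n_out, n_hidden)
        self.b_h = [0.0] * n_hidden
        self.b_o = [0.0] * n_out

    def run(self, sequence):
        """Return the output vector produced after each input vector in `sequence`."""
        context = [0.0] * len(self.b_h)
        outputs = []
        for x in sequence:
            hidden = [
                sigmoid(b + sum(w * v for w, v in zip(w_x, x))
                        + sum(w * c for w, c in zip(w_c, context)))
                for w_x, w_c, b in zip(self.w_in, self.w_ctx, self.b_h)
            ]
            outputs.append([sigmoid(b + sum(w * h for w, h in zip(w_o, hidden)))
                            for w_o, b in zip(self.w_out, self.b_o)])
            context = hidden  # copied back as context for the next window
        return outputs


def decide(output, threshold=0.5):
    """Index of the single active class, or None when no class or several
    classes reach the threshold (an indecisive classification)."""
    active = [k for k, y in enumerate(output) if y >= threshold]
    return active[0] if len(active) == 1 else None


def tally(decisions, labels):
    """Fractions (correct, misclassified, not classified), as in Table 1."""
    if not labels:
        raise ValueError("no labels")
    n = len(labels)
    correct = sum(d == t for d, t in zip(decisions, labels))
    unclassified = sum(d is None for d in decisions)
    return correct / n, (n - correct - unclassified) / n, unclassified / n


# Usage: one compressed transient of 6 windows x 12 features.
net = ElmanNet(n_in=12, n_hidden=32, n_out=16)
outputs = net.run([[0.1] * 12 for _ in range(6)])
print(decide(outputs[-1]))   # untrained weights: the result is arbitrary
```

The sketch decides only after the last window; since the network produces an output after every window, an on-line variant could instead apply `decide` at each step and report the first stable decision.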

A similar test was also carried out to investigate the discriminatory power of the system when the class differences lie in the relative timing of signal features (in this case, a small delay in the onset of a peak). In this case the window size was set at 32 samples with a 5-sample overlap. The classification accuracy obtained was again above 90%, which is remarkable considering that the windowing resolution (32 - 5 = 27 samples) was actually larger than the discriminating peak delay (25 samples). Space constraints do not allow us to go into further detail, but a number of tests were also performed which investigated, among other properties, the memory ability of the system, that is, its capability of discriminating among classes based on temporally distant signal changes. In this case the transient compression property of the system was enough to bring the signal changes sufficiently close together for effective learning by the ERNN. At the same time, as the previous tests show, the extraction of the wavelet features is effective in retaining the information contained in short-lived changes, such as spikes. As already mentioned, positive results were also obtained on the PWR islanding transients discussed in (Roverso, 2000b).

5. CONCLUSIONS

In this paper, a new transient classification system has been described, which represents a significant step forward from earlier prototypes. The heart of the system is a novel and computationally simple combination of wavelet decomposition, performed on a sliding window over the multivariate transient signals, and recurrent neural network classifiers. A set of systematic tests has shown the good discrimination power of the system, and especially its ability to make concurrent use of temporally separated features, as well as of long- and short-lived signal events.
The integration of the system with the Halden experimental control room and its COSSs will be our next step, which, we believe, will demonstrate the on-line capabilities of the system. We also expect the system to be applicable to a wide range of processes.

REFERENCES

Bartal, Y., J. Lin, and R.E. Uhrig, 1995. Nuclear Power Plant Transient Diagnostics Using Artificial Neural Networks that Allow “Don’t-Know” Classifications. Nuclear Technology 110.
Bartlett, E.B. and R.E. Uhrig, 1992. Nuclear Power Plant Status Diagnostics Using an Artificial Neural Network. Nuclear Technology 97, pp. 272-281.
Basu, A. and E.B. Bartlett, 1995. Detecting Faults in a Nuclear Power Plant by Using a Dynamic Node Architecture Artificial Neural Network. Nuclear Science and Engineering 116.


Bengio, Y., P. Simard, and P. Frasconi, 1994. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Transactions on Neural Networks 5(2), pp. 157-166.
Breiman, L., 1996. Bagging Predictors. Machine Learning 24(2), pp. 123-140.
Chen, B.H., X.Z. Wang, S.H. Yang, and C. McGreavy, 1999. Application of wavelets and neural networks to diagnostic system development, 1, feature extraction. Computers and Chemical Engineering 23, pp. 899-906.
Dechenes, C.J., 1995. Fuzzy Kohonen Network for the Classification of Transients Using the Wavelet Transform for Feature Extraction. Information Sciences 87, pp. 247-266.
Elman, J.L., 1990. Finding structure in time. Cognitive Science 14, pp. 179-211.
Hassoun, M.H., 1995. Fundamentals of Artificial Neural Networks. The MIT Press.
Jeong, E., K. Furuta, and S. Kondo, 1995. Identification of Transient in Nuclear Power Plant Using Neural Network with Implicit Time Measure. Proc. of the International Topical Meeting on Computer-Based Human Support Systems: Technology, Methods, and Future. The American Nuclear Society Inc., pp. 467-474.
Jeong, E., K. Furuta, and S. Kondo, 1996. Identification of Transient in Nuclear Power Plant Using Adaptive Template Matching with Neural Network. Proc. of the International Topical Meeting on Nuclear Plant Instrumentation, Control, and Human-Machine Interface Technologies, NPIC&HMIT’96, pp. 243-250.
Kassidas, A., J.F. MacGregor, and P.A. Taylor, 1998. Synchronization of Batch Trajectories Using Dynamic Time Warping. AIChE Journal 44(4), pp. 864-875.
Keogh, E.J. and M.J. Pazzani, 1999. Scaling up dynamic time warping to massive datasets. In Proc. of the 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, Prague.
Koulouris, A., B.R. Bakshi, and G. Stephanopoulos, 1996. Empirical Learning Through Neural Networks: The Wave-net Solution. In G. Stephanopoulos and C. Han (Eds.), Intelligent Systems in Process Engineering. Academic Press, pp. 437-484.
Krishnapuram, R. and J. Keller, 1993. A possibilistic approach to clustering. IEEE Transactions on Fuzzy Systems 1(2).
Kundu, A., G.C. Chen, and C.E. Persons, 1994. Transient Sonar Signal Classification Using Hidden Markov Models and Neural Nets. IEEE Journal of Oceanic Engineering 19(1), pp. 87-99.
Kwon, K.C. and J.H. Kim, 1999. Accident identification in nuclear power plants using hidden Markov models. Engineering Applications of Artificial Intelligence 12, pp. 491-501.
Mozer, M.C., 1992. Induction of multiscale temporal structure. In Moody, Hanson and Lippmann (Eds.), Advances in Neural Information Processing Systems 4, pp. 349-391.
Roverso, D. and P.F. Fantoni, 1998. ALADDIN: A Neural Model for the Classification of Fast Transients in Nuclear Power Plants. In Proc. of the 3rd Int. FLINS Workshop on Fuzzy Logic and Intelligent Technologies for Nuclear Science and Industry.
Roverso, D., 2000a. Neural and Fuzzy Transient Classification Systems. In D. Ruan (Ed.), Fuzzy Systems and Soft Computing in Nuclear Engineering, pp. 208-234. Physica-Verlag, Heidelberg.
Roverso, D., 2000b. Neural Ensembles for Event Identification. In Proc. of Safeprocess'2000, The 4th IFAC Symposium on Fault Detection, Supervision and Safety for Technical Processes, pp. 478-483.
Strang, G. and T. Nguyen, 1996. Wavelets and Filter Banks. Wellesley-Cambridge Press.
Wang, X.Z., B.H. Chen, S.H. Yang, and C. McGreavy, 1999. Application of wavelets and neural networks to diagnostic system development, 2, an integrated framework and its application. Computers and Chemical Engineering 23, pp. 945-954.
