to be published in:

International Conference on Artificial Neural Networks, ICANN 97, Lecture Notes in Computer Science, Springer Verlag

An Extended Elman Net for Modeling Time Series

Peter Stagge†    Bernhard Sendhoff†

Institut für Neuroinformatik, Ruhr-Universität Bochum, 44780 Bochum, Germany

† e-mail: {peter,bs}@neuroinformatik.ruhr-uni-bochum.de

Abstract

The prediction and modeling of dynamical systems, for example chaotic time series, with neural networks remains an interesting and challenging research problem. It seems rather natural to employ recurrent neural networks, for which we suggest a new structure based on the Elman net [1]. The major difference to neural networks as proposed by Williams and Zipser [2] is the way we organize the time steps. The dynamics of the network and of the input flow are defined so as to guarantee that the information at the input node is available at the output node within one time step, irrespective of the connection matrix. We apply the network to the Lorenz and the Rössler system and comment on the problem of evaluating the quality of a network used as a dynamical model.

1 Introduction

The standard multi-layer perceptron is known to be a universal function approximator [3]. The static mapping realized by feedforward neural networks has been frequently applied to the prediction of dynamical systems. Generally the system state is given by a vector q(t) and the neural network model learns a function F(q(t), Δt) = q(t + Δt). The identification of the state q(t) itself is a problem; however, we will propose an internal method in section 2. The quality of the prediction of nonlinear dynamical systems is usually determined by the deviation between the predicted and the known state at time t + Δt. Beyond prediction, we also demand the modeling of a system. The quality of modeling a system is usually determined by iterating the model, i.e. feeding the output at time t back to the input at time t + Δt. The model can then itself be seen as a dynamical system, and the deviation between invariants like the Liapunov spectrum or entropy-based measures determines the modeling quality. Simply observing the attractor structure of the model will also prove useful. For feedforward neural networks it is hard to identify the underlying dynamical process, but it was shown that the modeling capacity of those networks can be improved if iterating the network is incorporated in the learning process [4], [5]. Since we are looking for dynamical systems as models, it is sensible to use recurrent neural networks. In recurrent structures we have, in addition to the static mapping between input and output, also internal states of the network.
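To make the distinction concrete, the following sketch (not part of the original paper) contrasts the one-step prediction error with free-running iteration of a model, where `model` is a placeholder for any trained one-step predictor mapping q(t) to q(t + Δt):

import numpy as np

def one_step_error(model, states):
    """Mean squared error of single-step predictions q(t) -> q(t + dt)."""
    preds = np.array([model(q) for q in states[:-1]])
    return np.mean((preds - states[1:]) ** 2)

def iterate_model(model, q0, n_steps):
    """Free-running iteration: the output at time t is fed back as the input at t + dt."""
    trajectory = [q0]
    for _ in range(n_steps):
        trajectory.append(model(trajectory[-1]))
    return np.array(trajectory)

The iterated trajectory, rather than the one-step error, is what the invariants mentioned above (Liapunov spectrum, entropy-based measures) would be computed from.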

These states work as a short-term memory and are able to represent information about the preceding inputs. Led by this idea that internal states can represent the past, Elman proposed a neural network in which the activations of the hidden layer are used as an additional input in the next time step. Elman strictly kept the concept of layers in the network structure and used this architecture to predict words in simple [1], and more complex [6], sentences. The recurrent network structure proposed in this paper is an extension of the Elman network which seems to be more suitable for time series modeling.

Williams and Zipser [2] introduced a method to calculate the error gradient with respect to the weights for arbitrarily connected recurrent neural networks. Besides the very high computational cost of their real-time recurrent learning algorithm (the number of computations increases as O(n^4), with n being the number of neurons), the algorithm has another disadvantage concerning the time series prediction task: when the activations for time step t are computed, it is possible that, depending on the network structure, the input at time t does not reach the output nodes at all. Instead the input information is transformed and kept in some internal activations, from where it might influence the output in one of the following time steps. This would implicitly transform a one-step prediction task into a two- or more-step prediction task.

2 Implicit Embedding

The success of the prediction of a dynamical system strongly depends on the identification of the complete system state. Usually the embedding theorem is used to regroup the scalar measurement values at different times into vectors. Let {x(t)}, t ∈ {0, 1, ..., T}, be the set of measured values. The embedding theorem [7] then guarantees the existence of a diffeomorphism between the reconstructed vectors

\tilde{q}(t) = (x(t), x(t - \tau), \ldots, x(t - (d_E - 1)\tau))   (1)

and the original state space. However, in practice, the determination of the correct values for the embedding dimension d_E and for the time-lag τ is problematic [8]. Furthermore, we expect that the correct reconstruction depends on the structure which is used to model the original dynamical system. Having in mind what the Elman net was originally used for, namely a representation of the past, we will use our recurrent network structure not only to learn the dynamics of the system, but also to identify the correct state space which is needed to achieve optimal representation and prediction. This way we circumvent the problem of model-dependent reconstruction: the reconstruction and the modeling problem are handled by the same system simultaneously. Thus the recurrent networks proposed here perform an implicit reconstruction, which can actually be observed at some single-neuron outputs after successful learning of the system. Of course, the more we demand from the network, the more flexible we should keep its structure.
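For illustration, here is a minimal sketch of the standard delay-coordinate reconstruction of equation (1); the values of d_E and τ in the usage example are arbitrary, not values used in the paper:

import numpy as np

def delay_embed(x, d_E, tau):
    """Delay vectors q(t) = (x(t), x(t - tau), ..., x(t - (d_E - 1) * tau))."""
    start = (d_E - 1) * tau
    return np.array([[x[t - i * tau] for i in range(d_E)]
                     for t in range(start, len(x))])

# Hypothetical usage: embed a scalar series with d_E = 3 and tau = 2.
x = np.sin(0.1 * np.arange(200))
Q = delay_embed(x, d_E=3, tau=2)   # shape (196, 3)

The implicit approach described above avoids having to fix d_E and τ explicitly in this way.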

3 The Recurrent Network Structure

Following the remarks made in the introduction, we define a recurrent neural network, based upon the work by Elman, with the following dynamics and notation:

y_i(t+1) = \sigma\left( \sum_{j=1}^{i-1} w_{ij}\, y_j(t+1) + \sum_{j=1}^{N} r_{ij}\, y_j(t) + \sum_{k=1}^{K} f_{ik}\, in_k(t) + \theta_i \right)   (2)

y_i(t+1) : output of neuron i at time t+1
N, K     : number of neurons and of inputs
\sigma   : sigmoidal function, e.g. tanh()
w_{ij}   : forward connection matrix, lower triangular, w_{ii} = 0
f_{ik}   : input connection matrix
r_{ij}   : recurrent connection matrix
\theta_i : threshold weights
in_k(t)  : input to the network at time t
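As an illustration, a minimal sketch of one update step of equation (2) follows, assuming dense numpy matrices; names and shapes are illustrative rather than taken from the paper. Because w is strictly lower triangular, the new activations y_j(t+1) with j < i are already available when neuron i is updated:

import numpy as np

def step(y_prev, inputs, W, R, F, theta):
    """One network time step of equation (2): returns y(t+1) given y(t) and in(t)."""
    N = len(y_prev)
    y_new = np.zeros(N)
    for i in range(N):                      # neurons are updated in index order
        a = (W[i, :i] @ y_new[:i]           # feedforward part, within this time step
             + R[i] @ y_prev                # recurrent part, from the previous time step
             + F[i] @ inputs                # external inputs at time t
             + theta[i])                    # threshold weight
        y_new[i] = np.tanh(a)               # sigmoidal activation
    return y_new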

We use standard backpropagation for the calculation of the error gradients of the weights. The Williams & Zipser structure does not contain the first sum on the r.h.s. of equation (2). This makes the r.h.s. a function of t only and seems more natural to physicists, as it looks like a discrete differential equation. However, as we pointed out in the introduction, neglecting this term leads to the problem of insufficient propagation of the input information.

The network, equation (2), itself defines a dynamical system if we identify the output at time t with the input at time t+1:

y(t+1) = F(y(t))   (3)

This is a typical return map, and the dimensionality of the resulting dynamical system equals the number of neurons whose output is fed back to the feedforward calculation in the next step (including the output nodes). Unfortunately, it is quite difficult to make any analytical statements about the behaviour of the dynamics produced by the return map, e.g. chaos, periodic orbits, stable fixed points, or the number of attractors and their basins. One result, which is guaranteed by the finiteness of the sigmoid function, is the existence of at least one fixed point.

Figure 1 shows an example of a neural network with the introduced recurrent structure. It has one input and one output node. In terms of layers this network would correspond to a net with five hidden layers consisting of one neuron each, but we believe it is more appropriate to drop the restriction to a layered structure and call figure 1 a bulk of neurons instead. This network structure (figure 1) will be used for the simulations in the next section.
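For concreteness, a sketch of iterating the network as an autonomous system in the sense of equation (3), reusing the step function sketched above; the choice of output index and the initial conditions are purely illustrative:

import numpy as np

def iterate_network(y0, x0, W, R, F, theta, n_steps, output_index=-1):
    """Iterate the trained network: the output at time t becomes the input at t+1."""
    y, x = y0.copy(), np.array([x0])
    outputs = []
    for _ in range(n_steps):
        y = step(y, x, W, R, F, theta)      # one update of equation (2)
        x = np.array([y[output_index]])     # feed the output back as the next input
        outputs.append(x[0])
    return np.array(outputs)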

4 Experiments with chaotic time series

The system we want to model is the Lorenz attractor [9], which is defined by the three coupled differential equations (4) and shows chaotic behaviour for the parameter values σ = 16.0, r = 45.92, and b = 4.0:

\frac{dx(t)}{dt} = \sigma(-x + y), \qquad \frac{dy(t)}{dt} = -xz + rx - y, \qquad \frac{dz(t)}{dt} = xy - bz   (4)

Figure 1. Recurrent neural network structure, with one input and one output node. The solid lines indicate forward connections and the dashed lines indicate recurrent (feedback) connections. In one time step all feedforward activations are calculated and propagated through all connections. The feedback connections correspond to the propagation of activations from one time step before.

We solved (4) using a 4th-order Runge-Kutta method with a time step dt = 0.01. The actual time step for the simulations was Δt = 5 dt = 0.05, and we used the data from the x coordinate. We want the network to learn a model of the entire dynamical system; however, the task during the presentation of the training sequence was to predict one time step ahead: x(t) → x(t + Δt). We used a training and a test sequence of 500 data points each, which corresponds to about 26 stretching and folding processes in the attractor. We took standard backpropagation with learning rates between 0.005 and 0.01 and a momentum term of 0.1. As we aim at modeling the dynamical system and not just at good prediction, the standard prediction error alone does not yield sufficient insight. Therefore, we also monitor the largest Liapunov exponent during learning, which we calculate from the network seen as an iterative system (3), [10]. The resulting network after the training process comes very close to the Lorenz system in terms of the deviation of the network's Liapunov exponents from those of the Lorenz system; see Table 1 for a comparison. Our network has six Liapunov exponents, as the activations of neurons 2-7 are used for the next time step.

                  first exponent   second exponent   third exponent   (4th, 5th, 6th) exponent
neural network    2.12             -0.084            -27.1            (-40.3, -97.5, -250)
Lorenz system     2.16              0                -32.4            (-, -, -)

Table 1. Liapunov exponents of the Lorenz chaotic system and the neural network model (values in bits/sec).

We note that not only the first (most important) exponent but also the second and the third are in good agreement with the real system.
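The data generation described above could look roughly as follows; the initial condition and the length of the discarded transient are assumptions, not taken from the paper:

import numpy as np

# Integrate the Lorenz equations (4) with a 4th-order Runge-Kutta step of dt = 0.01
# and keep every 5th value of the x coordinate (Delta t = 0.05).
SIGMA, R, B = 16.0, 45.92, 4.0

def lorenz(q):
    x, y, z = q
    return np.array([SIGMA * (y - x), -x * z + R * x - y, x * y - B * z])

def rk4_step(q, dt):
    k1 = lorenz(q)
    k2 = lorenz(q + 0.5 * dt * k1)
    k3 = lorenz(q + 0.5 * dt * k2)
    k4 = lorenz(q + dt * k3)
    return q + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

q = np.array([1.0, 1.0, 1.0])      # assumed initial condition
for _ in range(10000):              # discard an assumed transient
    q = rk4_step(q, 0.01)

series = []
for _ in range(500):                # 500 training points, Delta t = 5 * dt
    for _ in range(5):
        q = rk4_step(q, 0.01)
    series.append(q[0])             # use the x coordinate only
x_train = np.array(series)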

There are several remarks to be made about the learning process. First, when iterating the neural network, the starting point of the trajectory depends on the first input and on the initialisation of the activations which are needed to calculate the first output. Although the neural net might have other attractors, the trajectory robustly finds its way to the attractor which was learned. Second, in the final network we did not find another attractor or a fixed point for various initialisations, whereas at the start of the learning process any network with randomized initial weights converged to a fixed point. During the learning process the system can have more than one fixed point, or a combination of a fixed point and a periodic orbit, simultaneously.

Figure 2. Learning curve for the Rössler attractor and images of the reconstructed attractors of the recurrent network (r(t) = (x(t), x(t - Δt), x(t - 2Δt))). The increasing quality of modeling the dynamical system at various stages during the training process is striking.

In a second experiment, we modeled the Rössler [11] chaotic attractor using data from the x coordinate. Since the folding process occurs very irregularly, it is difficult for a network to model. Therefore, we slightly extended the structure of our recurrent network so that it can represent activations from more than one time step back. However, with these modifications the Liapunov exponents are more difficult to calculate, so we decided to visually inspect the network's performance in "copying" the Rössler attractor.

Figure 2 shows that the decrease in the approximation error coincides with the increasing quality of the modeling of the attractor structure. The insets display reconstructed attractors of the iterated network (r(t) = (x(t), x(t - Δt), x(t - 2Δt))), with 2000 points each. The bottom right picture shows the attractor reconstructed from the original x data. Beginning with a fixed point, the network first learns a periodic oscillation, which corresponds to a plateau in the one-step approximation error. By learning higher periodic orbits the net comes closer to the real attractor and the approximation error decreases significantly. This underlines the importance of a global model of a dynamical system, in addition to the forecasting error.
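The visual check described above can be sketched as follows, assuming matplotlib is available; the lag and the data passed to the function are placeholders rather than the paper's actual settings:

import numpy as np
import matplotlib.pyplot as plt

def delay_vectors(x, lag=1):
    """Rows (x(t), x(t - lag), x(t - 2 * lag)) for a scalar series x."""
    return np.column_stack([x[2 * lag:], x[lag:-lag], x[:-2 * lag]])

def plot_attractor(x, lag=1, title=""):
    """2D projection of the reconstructed point cloud, as in the insets of Figure 2."""
    r = delay_vectors(np.asarray(x), lag)
    plt.figure()
    plt.plot(r[:, 0], r[:, 1], ".", markersize=1)
    plt.xlabel("x(t)")
    plt.ylabel("x(t - lag)")
    plt.title(title)
    plt.show()

Applied both to the iterated network output and to the original data, the two point clouds can be compared by eye, as done for the insets of Figure 2.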

5 Conclusion

Modeling dynamical systems beyond one- or few-step prediction is shown to be possible with an appropriate recurrent neural network structure. Choosing a recurrent network circumvents the problematic embedding procedure and allows the network to organize the temporal information in a problem- and model-dependent manner. In order to estimate the quality of the network model, global measures like system invariants are more reliable than the standard prediction error. As we have shown, visual inspection of the network's dynamical structure can also give interesting insight into how the decrease in prediction error goes together with the increase in modeling quality.

References

1. J. Elman. Finding structure in time. Cognitive Science, 14:179-211, 1990.
2. R. J. Williams and D. Zipser. A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1:270-280, 1989.
3. K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359-366, 1989.
4. G. Deco and B. Schürmann. Neural learning of chaotic system behavior. IEICE Trans. Fundamentals, E77-A:1840-1845, 1994.
5. J. C. Principe and J.-M. Kuo. Dynamic modelling of chaotic time series with neural networks. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems (NIPS) 7. Morgan Kaufmann, 1995.
6. J. Elman. Learning and development in neural networks: the importance of starting small. Cognition, 48:71-99, 1993.
7. T. Sauer, J. A. Yorke, and M. Casdagli. Embedology. J. Stat. Phys., 65, 1991.
8. H. Abarbanel, R. Brown, J. Sidorowich, and L. Tsimring. Analysis of observed chaotic data in physical systems. Rev. Mod. Phys., 65:1331-1392, 1993.
9. E. N. Lorenz. Deterministic nonperiodic flow. J. Atmospheric Sci., 20:130-141, 1963.
10. A. Wolf, J. B. Swift, H. L. Swinney, and J. A. Vastano. Determining Lyapunov exponents from a time series. Physica D, 16:285-317, 1985.
11. O. E. Rössler. Phys. Lett. A, 57, 1976.