Transfer of Learned Knowledge in Life-Long Learning Agents

Joseph O'Sullivan
[email protected]
School of Computer Science, Carnegie Mellon University
February 1997

Abstract

Previous work has demonstrated that the performance of machine learning algorithms can be improved by exploiting various forms of knowledge, such as domain theories. More recently, it has been recognized that some forms of knowledge can in turn be learned – in particular, action models and task-specific internal representations. Using learned knowledge as a source of learning improvement can be particularly appropriate for agents that face many tasks. Over a long lifetime, an agent can amortize effort expended in learning knowledge by reducing the number of examples required to learn further tasks. In developing such a "lifelong learning" agent, a number of research issues arise, including: will an agent benefit from learned knowledge, can an agent exploit multiple sources of learned knowledge, how should the agent adapt as a new task arrives, how might the order of task arrival impact learning, and how can such an agent be built? I propose that an agent can be constructed which learns knowledge and exploits that knowledge to effectively improve further learning by reducing the number of examples required to learn. I intend to study the transfer of learned knowledge by life-long learning agents within a neural network based architecture capable of increasing capacity with the number of tasks faced. This proposal describes an appropriate architecture, based on preliminary work in controlled settings. This work has shown that learned knowledge can reduce the number of examples required to learn novel tasks and that combining previously separate mechanisms can yield a synergistic improvement on learning ability. It has also explored how capacity can be expanded as new tasks arise over time and how the order in which tasks arise can be exploited with a graded curriculum. This preliminary work will be applied to a life-long learning agent and extended by carrying out experimental studies of a simulated robot agent in a controlled environment and of a real-world mobile robot agent in Wean Hall.

Thesis Committee: Tom Mitchell (Chair), Sebastian Thrun, Manuela Veloso, Jude Shavlik (University of Wisconsin)

1. Introduction

One common factor exists in the majority of previous work on agents [54, 2]: as the agent ages and further new tasks arise, the agent does not improve its ability to learn those tasks. Yet everyday experience tells us that knowledge learned from previous tasks can help in learning new tasks – by helping in some form to constrain the possible hypothesis space for those tasks. Recently the machine learning community has started focusing on this aspect of a learning system, the ability to exploit previously learned knowledge for the learning of new tasks. For instance, algorithms which learn useful representations for a particular domain [9, 4] and algorithms which learn domain-specific models for use in further learning [31, 50, 36] have recently been developed (as will be discussed in detail in Section 4). Such algorithms have been described as "lifelong learning" algorithms [42, 48], or discussed in the context of "learning to learn" or "inductive transfer" [34]. An important contribution to these methods would be to embed them in a domain as an agent. The longer such an agent lives, the more knowledge it can learn, and the more opportunities will exist to exploit that knowledge. Given that little previous work in utilizing learned knowledge has situated the learning in a long-living agent, an exciting opportunity presents itself for a thesis which will address this area.

This thesis is "that a life-long learning agent can be constructed on top of a neural network based architecture that learns internal representations and action models over its lifetime to reduce sample complexity for subsequent learning problems."

Section 2 defines the research proposed for such a thesis. In it, I outline the importance of life-long learning agents, present a specific neural network based approach for implementing an agent, and enumerate the issues I see as the goals for the thesis research: will an agent benefit from learned knowledge, how can an agent exploit multiple sources of learned knowledge, how should the agent adapt as a new task arrives, how might the order of task arrival impact learning, and how can such an agent be built. Preliminary work has already been carried out to back up some components of our approach. This work, which is reported in Section 3, indicates that utilizing previously learned knowledge and combining separate learning mechanisms can yield a synergistic improvement in learning ability. Furthermore, I have explored how networks can expand capacity over time as new tasks arise, and whether the ordering of task arrival can be exploited with a graded curriculum. Such work has led me to believe that the architecture underlying the research can be used in practice by a long-living agent, and will result in an improvement in learning. I examine how this will be evaluated, by carrying out experimental studies of a simulated robot agent in a controlled environment and of a real-world mobile robot agent in Wean Hall. In both domains, the life-long learning agent will be compared to a learning agent to test whether fewer examples are required by the life-long learning agent. Finally, in Section 4, I discuss related work and how the research goals will be addressed, presenting a research plan and time-frame.


2. A Life-Long Learning Agent

Here, a task is defined as learning a concept from a series of examples. Under supervised learning, an end-user must classify each example. The better the learning algorithm, the fewer examples required to learn a task. One perspective on machine learning is that all learning algorithms are search methods for exploring large spaces of potential hypotheses. A life-long learning agent learns knowledge, and uses that knowledge to direct this search and to choose hypotheses that are most appropriate for a domain from fewer examples. The longer an agent lives, the more knowledge it can learn, and the more opportunities will exist to exploit this knowledge.

In my research to date, I have addressed two types of knowledge which a life-long learning agent can learn: action models and internal representations.

Action models describe the effects of taking actions in the world, and can be learned in many domains. Action models can be exploited when learning novel tasks by using Explanation Based Neural Networks (EBNN) to bias the hypotheses considered during learning in favor of those that satisfy the predictions of the action models [49, 50, 31]. EBNN takes a 3-step approach to training a neural network from examples of a task and such action models:

1. Explain how the training example could satisfy the target function: use the domain knowledge to predict the value of the target function for the training example.
2. Analyze this explanation to determine feature relevance: examine the weights and activations of the networks of domain knowledge used in the above explanation, to extract the partial derivatives of the explained target function value with respect to each training example feature.
3. Refine the target function network: update the target network weights to fit both the observed target value (inductive component) and the target derivatives extracted from the explanation (analytical component).

Internal representations are the language in which knowledge is described, and as such an internal representation is a constraint on the knowledge itself. Internal representations can be shared between tasks in a single domain, with the constraints on knowledge transferred, by using a simple yet highly effective mechanism, Multitask Learning (MTL) [10, 9, 4]. In MTL, a single network is trained to predict multiple tasks from the same domain using back-propagation. MTL in operation is conceptually simple:

1. Create a single network with n task outputs, with "sufficient" hidden units.
2. Train on n tasks.
3. Monitor performance by performing cross-validation on each individual output during training.

MTL frequently outperforms "single-task learning" wherein a network is devoted to each task, with no sharing of internal representations (see Section 3 and [10]). These shared internal representations help by being appropriate for tasks in a given domain, and also constrain the training process by biasing hypotheses toward that particular domain.
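To make the MTL recipe above concrete, here is a minimal sketch in Python/numpy of a single backpropagation network with one shared hidden layer and one output per task. The class name, layer sizes, learning rate and squared-error loss are illustrative assumptions, not the implementation used in this work.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MTLNetwork:
    """Single network: one shared hidden layer, one sigmoid output per task."""

    def __init__(self, n_inputs, n_hidden, n_tasks, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_inputs))  # input -> shared hidden
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_tasks, n_hidden))   # shared hidden -> task outputs
        self.b2 = np.zeros(n_tasks)
        self.lr = lr

    def forward(self, x):
        h = sigmoid(self.W1 @ x + self.b1)
        y = sigmoid(self.W2 @ h + self.b2)
        return h, y

    def train_example(self, x, targets):
        """One backpropagation step; `targets` holds one label per task, so every
        task's error shapes the shared internal representation."""
        h, y = self.forward(x)
        delta_out = (y - targets) * y * (1.0 - y)
        delta_hid = (self.W2.T @ delta_out) * h * (1.0 - h)
        self.W2 -= self.lr * np.outer(delta_out, h)
        self.b2 -= self.lr * delta_out
        self.W1 -= self.lr * np.outer(delta_hid, x)
        self.b1 -= self.lr * delta_hid

Cross-validation on each individual output (step 3 of the recipe) would then be layered on top of this, using held-out examples per task.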

The preliminary research, discussed in Section 3, suggests that an architecture which combines the use of internal representations and action models will yield a clear improvement in learning performance. Verifying this conjecture will be a research priority.

In addition to learning and using knowledge, a life-long learning agent architecture should also consider that new tasks will arise during the lifetime of the agent. When the agent is created, all tasks may not be known – new tasks may be internally created or they may arrive from an external advisor. How well an approach handles the arrival of new tasks will impact the application of the agent in real-world situations. The arrival of tasks will influence the agent in two ways: the architecture will have to expand its capacity to cater for new tasks, and the order in which new tasks arrive may itself be exploitable as an aid to the learning process.

One means of expanding the capacity of a network architecture as novel tasks arise is to widen a single hidden layer appropriately for each new task. Alternatively, the network depth can be expanded. These variations offer different advantages and weaknesses for the transfer of learned knowledge (see Sections 3.2 and 3.3) – compromises between these two variations shall be investigated.

As the agent faces novel tasks, it will exploit knowledge learned from previous tasks. This lends importance to the order in which tasks are encountered by the agent. It provides a means for an end-user to supply expert knowledge to the agent, in the form of a graded curriculum. Ideally, the agent is initially trained upon simple tasks, and gradually more complex tasks are introduced which are learned more effectively by exploiting the knowledge from the simple tasks. This curriculum aspect of a life-long learning agent is a factor which I see as being important in using these agents in real-world situations. In the thesis research, rather than attempting to derive precise definitions of "simple" and "complex" tasks for arbitrary agents, our primary aim will be to understand what effect task ordering has on learning performance, and to then derive "a priori" statements about task usefulness.

The priority for the life-long learning agent is to reduce the number of examples necessary to learn a novel task, not necessarily to reduce the computational power and space required to learn that task (although generally it may be expected that learning from far fewer examples will go hand in hand with a speed-up in convergence). As such, I am prepared to store all examples seen by the agent during learning, and to reuse these examples for validation or for further learning as may be found necessary.

Taken together, I see several key components in a life-long learning agent:

- It lives for a long time, facing many tasks over its lifetime, so that effort expended in learning knowledge can be amortized over many tasks.
- It should exploit as much learned knowledge as possible. Two sources of knowledge – internal representations and action models – are particularly appropriate.
- It should expand and cater to the novel tasks that arise as the agent ages.
- It should benefit when a graded curriculum exists.

2.1. Preliminary Architecture

While recognizing that a wide range of architectures can be appropriate for life-long learning agents, I now present a particular architecture that will be the basis of the thesis work and that is supported by our preliminary research. The basic learning mechanism is that of neural networks trained by a variant of Backpropagation [39]. Each example supplied to the agent is classified as one of a number of existing tasks, or as a novel task, in which case the capacity of the architecture is expanded. Without any further knowledge, an agent based on this architecture is a learning agent. For the architecture to be a life-long learning architecture, each novel task introduced is related to appropriate previous tasks in terms of action models.

For instance, in one simple setting that I have previously used, a robot agent moves along a corridor in Wean Hall, repeatedly taking snapshots S of the current state of the world (see Figure 1). The tasks faced by the agent were:

Door_d: snapshot → {t, f}, where Door_d(S) = t iff there is a door d meters ahead, f otherwise.

Forward1M: snapshot → snapshot, where Forward1M(S) = S' and S' is the observed snapshot after the agent moves 1 meter forward.

Given an agent that has initially learned Door0(S) and Forward1M(S), when the novel task Door1(S) is introduced, Door1(S) is related to Door0(Forward1M(S)) (see Figure 1). The quality of the learned knowledge will determine the accuracy of the approximation, but I have found that even weak action models can be exploited by EBNN [31].

[Figure omitted: network schematic with inputs S and the agent's actions, a shared hidden layer augmented with new hidden units, and outputs Forward1M(S), Door0(S) and the new task Door1(S).]
Figure 1: An example of the architecture in use by an agent. The agent is capable of performing actions that modify its view of the world. In its environment are tasks for it to learn. In this example, knowledge is transferred by exploiting the fact that Door1(S) ≈ Door0(Forward1M(S)), and by a common hidden layer sharing internal representations.
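The chaining just described can be sketched as follows. This is not the thesis implementation: door0 and forward1m stand for previously learned models wrapped as plain Python callables (hypothetical names), the slopes of the explanation are extracted here by finite differences rather than by analyzing network weights and activations as EBNN actually does, and the combined inductive/analytical fit is expressed only as a loss term.

import numpy as np

def explain_example(door0, forward1m, snapshot, eps=1e-3):
    """EBNN steps 1-2 (sketch): explain Door1(S) as Door0(Forward1M(S)) and
    extract the slope of that explanation with respect to each input feature."""
    explained_value = door0(forward1m(snapshot))
    slopes = np.zeros_like(snapshot)
    for i in range(len(snapshot)):
        bumped = snapshot.copy()
        bumped[i] += eps
        slopes[i] = (door0(forward1m(bumped)) - explained_value) / eps
    return explained_value, slopes

def ebnn_loss(predict, grad_wrt_input, snapshot, observed, slopes, mu=0.5):
    """EBNN step 3 (sketch): fit the observed label (inductive component) and the
    explanation's slopes (analytical component). `predict` and `grad_wrt_input`
    are the target network's output and input gradient at `snapshot`."""
    inductive = (predict(snapshot) - observed) ** 2
    analytical = np.sum((grad_wrt_input(snapshot) - slopes) ** 2)
    return inductive + mu * analytical

The weighting mu between the two components would in practice depend on how much the action models are trusted; EBNN's LOB* heuristic, mentioned in Section 4, plays this role.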

All tasks (an action model is also a task) are represented within a single neural network. The capacity of the network expands as novel tasks arise. By training multiple tasks within a single network, knowledge is transferred via shared internal representations. Serial MTL, introduced in Section 3.2, allows this to be effective as tasks arise during the lifetime of the agent.


2.2. Research Issues

First and foremost is the issue of what is an appropriate architecture for life-long learning agents. Important characteristics of an appropriate architecture will be the degree to which it can exploit learnable knowledge, and the effectiveness of the architecture in a long-living agent. Preliminary studies in Section 3 provide a foundation for the conjecture that using neural networks as a basis for the architecture offers a variety of opportunities for the successful exploitation of learned knowledge. However, neural networks have very rarely been applied to long-living agent settings or to serial learning paradigms; typically, memory-based machine learning mechanisms (such as in [26]) are utilized. Here, I am prepared to investigate a variety of neural network management techniques in order to reduce the usual drawbacks of local minima, concept drift, etc. In my preliminary work I have taken an ad hoc initial approach, and should consider the work on automatic network construction (such as [13, 14]) for superior approaches to network expansion, whether to periodically restructure the network, either by pruning weights and nodes using a technique such as OBD or SBP [18, 25], or by splitting single networks into multiple networks containing only highly related tasks (I have previously discovered that the notion of task relevance can be important in knowledge transfer [51]).

Given an adequate architecture, there is the issue of suitable metrics. The priority of course is to reduce the number of examples required to learn novel tasks. An interesting bonus will be to discover whether the average improvement between task i and task i+1 is about the same for all i greater than some N. Ideally, as i increases, learning task i will always be easier. In RATLE, indications can be seen of a law of diminishing returns whereby a second piece of advice is not as helpful as the first [19] – this can be explained by the two pieces of advice being redundant, but the thesis research will provide grounds for a full investigation of this issue. Indeed, this thesis should also provide some insight into the redundancy of fundamentally different types of knowledge – in this case learned internal representations and learned action models. There is a hope that I can present a synergistic improvement in learning when using different sources, although my preliminary work yields inconclusive results.

A rich variety of further issues could be raised during the research: whether to throw away examples during training, how to most effectively use stored examples to estimate the ability of the learner, allowing the architecture to utilize continuous action models, etc. Alternatively, improving bias by using ensembles of neural networks is appealing; computationally intensive techniques exist (e.g. [6]) to reduce variance error by using groups of networks. In addition, there are a number of directions in which to expand the application of the architecture – most notably, applying the learning mechanisms within a reinforcement learning framework, to improve the ability to learn task policies.


3. Exploratory Research

When using this proposed architecture to transfer knowledge in a life-long learning agent, some unanswered questions exist: what degree of improvement can be expected by using learned knowledge in the agent, what is the impact when an agent is faced with a series of tasks ("serial learning"), and how important will the order in which an agent faces a series of tasks be? Satisfactory answers to these questions will be important to the thesis research, and so preliminary experiments were conducted in restricted problem settings. In addition, I have looked ahead to the environments in which I intend to carry out the thesis research.

3.1. Study 1: Benefits of Learning Knowledge

Previous work has demonstrated learning improvements when either internal representations or action models are used as the sole source of domain knowledge. It is important to estimate what those benefits would be, and whether these improvements would complement or subsume each other. The experimental setting chosen to investigate this was originally used to investigate properties of EBNN [31]. As a robot moves along a corridor in Wean Hall, it repeatedly takes snapshots of the current state of the world. Each snapshot, a combination of 24 sonar readings and a coarse (10x10 pixel) camera image, makes up the input state S from which the tasks are predicted. The tasks were:

Door_d: snapshot → {t, f}, where Door_d(S) = t iff there is a door d meters ahead, f otherwise, for d = 0, 1, 2, 3, 4 meters.

Forward1M: snapshot → snapshot, where Forward1M(S) = S' and S' is the observed snapshot after driving forward 1 meter.

In this domain, I compared 4 methods:

Tabula rasa learning: all tasks were trained in separate networks using Backpropagation – no knowledge was transferred between tasks.

MTL: all tasks were trained in the same network using MTL, sharing a common internal representation.

EBNN: Door_d and Forward1M are chained to estimate Door_{d-1}. EBNN uses this estimation to constrain the training of the actual task Door_{d-1}. All tasks were trained in separate networks using EBNN.

Both: an approach combining EBNN and MTL – EBNN used the chaining of Door_d and Forward1M to constrain Door_{d-1}, and all tasks were trained in the same network using MTL, sharing internal representations.

Each method was provided with the same data: four different starting points in the corridor, each more than 1 meter from the next doorway.

Training sets were generated by starting the agent at these points and moving forward to collect up to N contiguous examples, where N = 10, 25, 40, 55, 70 examples. For each starting point, a separate hold-out set leading up to an unseen doorway was used for cross-validation, and a separate evaluation set of the remaining unseen examples was used for calculating the generalization performance. Cross-validation uses the prediction error over the hold-out set to determine the number of training iterations which produces the best performance over this hold-out set. Results were reported by averaging the prediction performance over the examples in the unseen evaluation sets.

The results, shown in Figure 2, depict the resulting improvement due to the various sources of learned knowledge. A paired comparison is performed between different tasks for each experiment run, to reduce variance due to different door sequences. For this study, using internal representations proved to be more powerful than using just action models. Furthermore, when both sources were combined, the combination outperformed each individual method. In this example domain, being able to improve learning by even the 10 examples shown by the combined method over MTL is a win – remember that each example had been hand-labelled by an end-user, so any reduction in the number of examples that have to be labelled results in an immediate benefit to the end-user.
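A minimal sketch of this hold-out scheme, assuming a network object with illustrative train_epoch and error methods (the names are not from the thesis):

def train_with_holdout(network, train_set, holdout_set, max_iters=1000):
    """Train, track the hold-out error after every pass, and report the iteration
    count that gave the lowest hold-out error (the stopping point used for
    evaluation)."""
    best_error, best_iter = float("inf"), 0
    for it in range(1, max_iters + 1):
        network.train_epoch(train_set)
        err = network.error(holdout_set)
        if err < best_error:
            best_error, best_iter = err, it
    return best_iter, best_error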

[Figure 2 plots: (a) number of wrong evaluation examples vs. number of training examples for Singletask, Multitask, EBNN and Both; (b) % improvement vs. number of examples for Both Sources of Knowledge, Multitask (Shared Representations), EBNN (Action Models) and the Null Hypothesis.]

Figure 2: The learning curve in (a) is affected by the inherent ordering in the training data (initially, no positive examples, etc). To compensate for this ordering, in (b) under a paired comparison each method is compared to tabula rasa learning: where X_method is the performance of a method on a set of examples, this paired comparison plots ((X_tabula_rasa − X_method) / X_tabula_rasa) × 100. The null hypothesis is that the learning algorithm does not benefit from the learned knowledge. Results are presented with 95% confidence intervals. When few examples are available, no learning algorithm does well. As the number of examples increases, the more knowledge available to the agent, the better the learning performance.
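The paired-comparison quantity plotted in Figure 2(b) is simply the relative reduction in errors; as a small helper (the function name is illustrative):

def paired_improvement(x_tabula_rasa, x_method):
    """Percentage improvement of a method over tabula rasa learning on the same
    evaluation examples; X counts wrong evaluation examples, so positive values
    mean fewer errors than tabula rasa learning."""
    return 100.0 * (x_tabula_rasa - x_method) / x_tabula_rasa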

3.2. Study 2: Serial Learning

The ability of an architecture to increase capacity appropriately as new tasks are encountered will be an important feature of a practical life-long learning agent. One of the architectural mechanisms, MTL, is ill-suited to a long-living agent: either a network must be initially created containing all possible tasks, or the complete network must be retrained as each novel task is added. Otherwise the ability of MTL to transfer knowledge between multiple tasks is uncertain.

I examined MTL on a series of pose recognition tasks. From a database collected for face recognition, the learner had to classify visual examples as "a happy face", "an angry face", etc. This database was helpful for our study, as the learned internal representations can be visualized via hidden weight activations (see Figure 3). On this database, attempting to apply MTL without retraining from scratch led to an overall performance degradation. Continually retraining from scratch would be prohibitive in a long-living agent, so I derived a modified version of MTL, Serial MTL, which improved performance over single-task learning while allowing training to be continuous (Table 1).

Initial state (both methods): a neural network created for n tasks, and trained on those tasks.

MTL:
(1) Create a network for n + 1 tasks.
(2) Train the network on n + 1 tasks.
(3) Monitor network performance by performing cross-validation on each individual output during training.

Serial MTL:
(1) Expand the network to n + 1 tasks:
    (a) augment the network with extra outputs for the novel task;
    (b) augment the hidden layer with extra hidden units;
    (c) increase the learning rate from the new units to the novel task outputs.
(2) Train the network on n + 1 tasks.
    (a) During training, decay the increased learning rate to the normal rate.
(3) Monitor network performance by performing cross-validation on each individual output during training.

Table 1: Differences between MTL and Serial MTL

As a new task arrived, naively adding in extra hidden units and an output for the task itself (Serial MTL without steps 1c and 2a) resulted in an improvement over single-task training for that task. However, it proved to be ineffective at allowing older tasks to exploit knowledge contained in the new examples, leading to a reduction in overall learning performance. When the learned internal representations were examined, the novel task was discovered to be chiefly utilizing representations from the old tasks, with the new nodes correcting residual error. To compensate for this, when a new task was presented, learning rates were increased for that task on the weights from the newly created hidden nodes. This learning rate is then decayed to the normal rate, which allows tasks to exploit all nodes equally. For the face recognition problem, this technique led to serial learning showing a decrease in generalization error both for the old and the new tasks. A limited amount of evaluation data hampered a statistical study of Serial MTL, and so an artificial problem setting was constructed.
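A minimal sketch of the expansion step of Serial MTL (Table 1, steps 1a-1c), reusing the weight-matrix layout of the MTL sketch in Section 2; the boost factor, the initialization scale and the zero-initialization of old-task connections to the new hidden units are illustrative assumptions:

import numpy as np

def expand_for_new_task(W1, b1, W2, b2, n_new_hidden, init_scale=0.1, boost=10.0, seed=0):
    """Add extra hidden units and one extra output for the novel task, keeping all
    existing weights. Also returns a per-weight learning-rate multiplier that
    boosts only the new-hidden-unit -> new-output weights (step 1c); the caller
    decays that multiplier back toward 1 during training (step 2a)."""
    rng = np.random.default_rng(seed)
    n_hidden, n_inputs = W1.shape
    n_tasks = W2.shape[0]

    # New hidden units get their own input weights (step 1b).
    W1_new = np.vstack([W1, rng.normal(0.0, init_scale, (n_new_hidden, n_inputs))])
    b1_new = np.concatenate([b1, np.zeros(n_new_hidden)])

    # One new output row for the novel task, connected to all hidden units (step 1a).
    W2_new = np.zeros((n_tasks + 1, n_hidden + n_new_hidden))
    W2_new[:n_tasks, :n_hidden] = W2
    W2_new[n_tasks, :] = rng.normal(0.0, init_scale, n_hidden + n_new_hidden)
    b2_new = np.concatenate([b2, [0.0]])

    # Boosted learning rate from the new hidden units to the novel task's output,
    # so the new task does not simply reuse old representations (step 1c).
    lr_scale = np.ones_like(W2_new)
    lr_scale[n_tasks, n_hidden:] = boost
    return W1_new, b1_new, W2_new, b2_new, lr_scale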

3.2.1. The "Peaks" Domain

A simple problem setting, which allowed for easy statistical evaluation, was constructed (with [11]) to test Serial MTL. The "peaks" dataset consists of numeric vectors (A, B, C, D, E, F), each numeric element of the input vector being represented by a Gaussian centered over its value, with tasks being functions on triplets from this vector, for example IF A > 1/2 THEN B ELSE C, IF B > 1/2 THEN E ELSE C, etc. Such functions can share a common decision, consequent, antecedent or some combination of all three. The degree to which two tasks are related depends upon the factors they have in common.
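A sketch of how one such example might be generated; the number and width of the Gaussian units per element, the value range, the 1/2 threshold and the particular pair of tasks are assumptions made for illustration:

import numpy as np

def gaussian_code(value, centers, width=0.1):
    """Encode one numeric element as the activations of units with Gaussian
    receptive fields centered over `centers`."""
    return np.exp(-((centers - value) ** 2) / (2.0 * width ** 2))

def make_peaks_example(rng, n_centers=10):
    """One 'peaks' example: six values A..F, each Gaussian-coded, with two
    illustrative tasks defined on triplets of the values."""
    values = rng.uniform(0.0, 1.0, 6)                      # A, B, C, D, E, F
    centers = np.linspace(0.0, 1.0, n_centers)
    x = np.concatenate([gaussian_code(v, centers) for v in values])
    A, B, C, D, E, F = values
    task1 = B if A > 0.5 else C                            # IF A > 1/2 THEN B ELSE C
    task2 = E if B > 0.5 else C                            # IF B > 1/2 THEN E ELSE C
    return x, np.array([task1, task2])

rng = np.random.default_rng(0)
examples = [make_peaks_example(rng) for _ in range(140)]   # N up to 140, as in the study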

Figure 3: With Serial MTL, several tasks were learned from each example – (a) happy (b) forward (c) open (d) angry (e) left (f) sad (g) right (h) neutral and (i) up. This shows the weight activations corresponding to the internal representations learned for each task. Step 2a of Serial MTL (see Table 1) was not performed during training, to exaggerate the internal representations by reducing sharing.

Networks were trained from this domain under single-task learning, MTL, and Serial MTL. Each method was provided with the same data, N randomly generated examples, where N = 10..140. Due to the artificial problem setting, a large number of random unseen evaluation examples (500) was generated. Learning parameters for each method were optimized by hand for the problem setting prior to running the experiment. Results for each method were reported by averaging the prediction performance on the examples in the unseen evaluation sets over 20 runs. The resulting generalization performance is presented in Figure 4. In summary, MTL can be demonstrated to have a significant performance improvement over single-task learning, needing 30 fewer examples than single-task learning to achieve the same level of performance. Several variants of Serial MTL were explored and while none matched the best-case performance of MTL, they all exploited the multiple tasks to outperform single-task learning, needing for instance 80 examples to perform as well as single-task learning does with 90.

[Figure 4 plot: % error (0% = perfect) vs. number of examples (0-140) for single-task, serial multitask and parallel multitask learning.]

Figure 4: Generalization performance on unseen evaluation sets summarized from 20 runs, for single-task learning, MTL, and serial MTL. MTL consistently outperforms serial MTL, yet serial MTL still demonstrates an improvement over single task learning. Results are presented with 95% confidence intervals.


3.3. Study 3: Role of a Curriculum

Figure 5: A simple robot agent – in this case perception is limited to N = 12 sonars. Each sonar reports a range estimate of the nearest object to the agent in the cone traced by a sonar echo.

I have shown some evidence to support the idea that the mechanisms to be used in the thesis research can reduce the number of examples needed to learn tasks. I now look in detail at how these mechanisms will actually be used in a life-long learning agent. In a simulated test-bed to be used during the thesis research, a simple robot agent perceives a world using N sonars (Figure 5). This world is made up of walls that face only north, south, east or west. The agent initially faces a compass direction. To aid in training this agent, an advisor constructs a curriculum of tasks, T1..Tn, chosen to increase in complexity, with representations discovered from early tasks being useful for solving more complex tasks. In this domain, that curriculum could be:

T1_i(S): In state S, is there a wall perpendicular to sonar i?
T2(S): Is there a wall within range of the agent's perception?
T3(S): How far is that wall?
T4(S): In what direction does the wall run?
T5(S): Is the agent in a corridor?
T6(S): What is the direction of the corridor?
T7_i(S): Is there a corner at sonar i?
... Is there an open door at sonar i? How far away is the open door? ...

In addition, the agent can perform 4 discrete actions – go forward 1 unit, reverse 1 unit, and turn left or right 360/N degrees. These restrictions on movement have been chosen so as to simplify the analysis. They lead to 4 further tasks which can be learned:

A1(S): Predict S', the state of the agent from state S after going forward 1 unit.
A2(S): Predict S', after the agent reverses for 1 unit.
A3(S): Predict S', after the agent turns left 360/N degrees.
A4(S): Predict S', after the agent turns right 360/N degrees.

Consider task T1_i. T1_i(S) is true if, when the agent is in state S, a wall of sufficient length (that is, the length of the wall extends over enough sonar beacons) is perpendicular to sonar i at distance d. Looking at Figure 6a, when reading(i) = d, then for a wall to be perpendicular reading(i+1) = d/cos θ and reading(i+2) = d/cos 3θ, where θ = 360/(2N) and N = number of sonars. I can rewrite these statements as constraints on any pairings of sonars, in particular:

C1(i): reading(i) (1/cos θ) − reading(i+1) = 0
C2(i): reading(i) (1/cos 3θ) − reading(i+2) = 0
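As a concrete illustration, the two constraints can be computed directly from a snapshot of sonar readings; the tolerance used below to decide that a constraint is satisfied (and hence that T1_i holds) is an illustrative assumption:

import numpy as np

def wall_constraints(readings, i, N):
    """C1(i) and C2(i) from the equations above; both are approximately zero when
    a wall is perpendicular to sonar i (single wall, noise-free readings).
    Sonars are indexed modulo N."""
    theta = 2.0 * np.pi / (2.0 * N)          # 360/(2N) degrees, in radians
    c1 = readings[i] / np.cos(theta) - readings[(i + 1) % N]
    c2 = readings[i] / np.cos(3.0 * theta) - readings[(i + 2) % N]
    return c1, c2

def t1(readings, i, N, tol=1e-2):
    """T1_i(S): true if a wall is perpendicular to sonar i, within tolerance."""
    c1, c2 = wall_constraints(readings, i, N)
    return abs(c1) < tol and abs(c2) < tol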

[Figure omitted: (a) sonar readings i, i+1, i+2 and the distance d to a perpendicular wall; (b) a network over sonar inputs i..N with hidden units for C1(i), −C1(i), C2(i), −C2(i) feeding an output unit for T1_i(S).]

Figure 6: (a) The agent's relevant perceptions when a wall is perpendicular to a sonar i; (b) one solution to "T1_i(S): Is there a wall perpendicular to sonar i?" using a multi-layer network with sigmoidal output 1/(1 + exp(−x)).

More such constraints can be generated to increase the robustness of the solution in the event of shorter walls or noisy data. For a wall of sufficient length, T1_i(S) is true if C1(i) and C2(i) are both satisfied. A network with one hidden layer can be constructed to represent this solution (Figure 6b), with the hidden units representing these constraints. In this particular case, the leftmost hidden unit is monotonic in the angle between a wall perpendicular to sonar i and that actually detected by sonars i and i+1. By combining this unit with its inverse (offset by a small bias), the combination of the two units will peak only when that angle is 0, that is when C1(i) is satisfied (Appendix A gives further details of how a network architecture can test for equality). C2(i) can be represented similarly, as in turn can T1_i(S). This shows that sufficient internal representations to represent the solution can be derived.

The action models can be used to help the network arrive at the particular weights which give this internal representation. Consider that a task based on sonar i is the same as the task based on sonar i+1 if the robot rotates the width of one sonar, namely:

T1_i(S) = T1_{i+1}(A4(S))

Given an action model which predicts to some degree the effect of A4 on S, knowledge learned from that action model can be used to transfer information in T1_{i+1} to T1_i. In this case, the action models help to enforce a symmetry in the network.

This study can be continued for other tasks. Take T2 – for a wall to be within range of the agent's perception, that wall must be perpendicular to one sonar (due to the initial assumptions about the world, and the agent's ability to move in the world). It can be stated that for a wall to exist, C1(i) and C2(i) must both be zero for some i. A shallow network which examines these constraints can solve this problem (as in Figure 7). As this is continued for more tasks, I note that it is known that a single hidden layer network can approximate any function. However, in expressing solutions to this study, it gets easier to describe the more complex tasks using deeper networks. By building on the initial tasks, a complete representational solution to the range of tasks encountered by the agent can be derived (and is sketched in Figure 8). Appendix A verifies that each task can indeed be represented in this configuration.

[Figure omitted: a network over the sonar inputs with two hidden units computing C1(0)+..+C1(i)+..+C1(N) and C2(0)+..+C2(i)+..+C2(N), feeding an output unit for T2(S).]

Figure 7: A solution to “Is there a wall within range of the agent’s perception?”, where only links with non-zero weights are shown, and the above subsection is duplicated for all sonars i.

The remaining tasks can be expressed in terms of these constraints (a single wall is assumed for T3 and T4, and a single corridor for T5, T6 and T7):

T3:    Σ_{i=1..N} [ if C1(i) & C2(i) then |i| else 0 ]
T4:    Σ_{i=1..N} [ if C1(i) & C2(i) then dirn(i) else 0 ]
T5:    Σ_{i=1..N} [ C1(i) & C2(i) & C1(i + N/2) & C2(i + N/2) ]
T6:    Σ_{i=1..N} [ if C1(i) & C2(i) & C1(i + N/2) & C2(i + N/2) then dirn(i) else 0 ]
T7(i): C1(i) & C2(i) & C1(i + N/4) & C2(i + N/4)
...

[Figure omitted: a layered network over sonar inputs i..N, with T1(S) units and T2 in the lower layers and T3, T4, T5, T6 and T7(S) built on top of them.]

Figure 8: A network structure solving the tasks T1(S) through T7(S). See text for a description of constraints on the weights.

We have shown that, for a simulated setting at least, we can demonstrate methods by which representations and action models are utilized within a graded curriculum to improve a life-long learning agent's ability to learn. This simulated setting will be used during the thesis research to explore whether a real agent can discover such solutions and to test the agent's abilities.

3.4. Lessons

Learned knowledge can benefit a life-long learning agent: The experiments presented are preliminary; the peaks dataset was artificial and may well be atypical of most real-world domains, and the doors experiment was restricted in terms of scope – only one action model was utilized and the agent did not face many tasks nor exist long enough to amortize the learning of that action model. However, the experiments do indicate what sorts of improvements we might expect for a life-long learning agent.

An important metric is the number of examples required to learn a task, and the following table presents the approximate improvement seen for the experiments when compared to the learning of the task without using any previously learned knowledge.

Experiment   Learned Knowledge          Reduction in Examples
Peaks        Internal Representations   ~ 30 out of 130
Doors        Action Models              ~ 60 out of 150
             Internal Representations   ~ 90 out of 150
             Both                       ~ 90 out of 150

In each of these cases, fewer examples were required when knowledge was utilized during learning. This strongly suggests that learned knowledge will be important for a life-long learning agent.

Tradeoffs lurk in serial learning: To be confident about life-long learning, we had to understand how MTL, the component that had been least understood as regards serial learning, would perform when adapted to serial learning. Serial MTL was found to still demonstrate an improvement in generalization power over single-task learning, while not matching the performance of unmodified MTL on the Peaks domain (Section 3.2.1). We saw in Section 3.3 that allowing Serial MTL to also expand the network depth could provide an alternative means of exploiting the shared representations. This variation may be explored during the thesis research.

Using multiple sources of knowledge is beneficial: In order to understand whether it would be worthwhile expending effort in utilizing multiple sources of knowledge, it was important to check that these different sources did not simply provide redundant learning improvement. In many settings, such redundancy alone could be valuable – however, it would be harder to measure empirically. Study 1 showed that using internal representations under MTL consistently outperformed using just action models under EBNN, but also showed that the combination of both methods indeed outperforms the individual mechanisms.

3.5. Validation: A Real World Test-bed

In addition to the simulated test-bed outlined in Section 3.3, an important focus of this research will be to validate the life-long learning agent for tasks faced by a vision-based instructable indoor mobile robot. In addition to being a domain in which we have extensive experience [29, 30, 31, 45], the domain has the interesting properties that there exists a clear tradeoff between the cost of gathering new training examples versus that of extracting more information from current knowledge, and that the robot's perception of the world is easy to define, yet complex to model, posing a difficult problem for non-learning agents.

Figure 9 shows Amelia, a commercial mobile robot based on Xavier, a robot developed at Carnegie Mellon University [30]. It is built on a 4-wheeled omni-directional base, 21 inches in diameter.


Figure 9: (a) Amelia; (b) the environment, as perceived with the camera and sonar system on Amelia.

The sensors used are the color camera visible on the pan/tilt head on top of the robot, and a ring of sonar sensors visible approximately one third of the way down the robot torso. In addition, Amelia has a gripper with 4 degrees of movement, allowing for some simple manipulation tasks. Amelia has a large number of basic capabilities which have been tested over the past years: a low-level navigation system for obstacle avoidance [44], a high-level system for topological map based navigation [46], and a large repertoire of elementary actions including trash collection, marker tracking and wall following. Following the ideas in [20, 21], we have previously had experience with "appearance-based" recognition tasks in this domain, discriminating within a 12-object database, based on around 30 examples of each object [51]. As such, we believe a suitable family of tasks in this domain is that of recognizing objects with which the agent can interact. A second, more ambitious family of tasks would be actively searching for or interacting with those objects. However, the initial experiments will be to repeat training upon tasks T1..T7 and the curriculum used in the simulated domain. Allowing an agent to exist long enough to perform a large number of tasks in both the simulated and real-world test-beds provides a foundation upon which we will:

- evaluate the design, implementation and the degree of learning improvement for any mechanism.
- examine the degree to which the quality and source of prior knowledge impacts learning.
- examine the tradeoffs involved in serial training.
- examine the degree to which a training curriculum can assist learning.

4. Related Work

The importance of using constraints provided by domain-related knowledge in the learning process has long been acknowledged [8]. In fact, systems where built-in knowledge restricts what can be learned have arguably seen the most effective learning agents [7, 33]. Without built-in knowledge, there are fundamental limits on the generalization accuracy that can be expected from a learner that learns just from examples [12, 40].

This results in it being increasingly difficult for a system to generalize well as the complexity of the learning task increases [15, 52]. A life-long learning agent would provide a flexible means of specializing to a domain of interest, especially domains that are difficult to model in advance [24, 17, 33].

The proposed life-long learning agent improves its ability to learn by combining two mechanisms. MTL [4, 10] is a generalization of work on hints [1, 47]. With hints, extra tasks are important learnable features of a main task, and are supplied as network outputs to pass domain-specific knowledge to the network. MTL recognizes that when learning many related tasks, a Backpropagation network can use these tasks as inductive bias for each other and thus learn better. EBNN [50, 49] extends earlier work that exploited hand-crafted slope information during learning [43] by itself learning slope constraints. Both of these methods have been applied with success in various application domains (EBNN to chess, object recognition, robot control – see [50]; MTL to object recognition, medical diagnostics – see [10]), but prior to this research combining these approaches had not been explored, nor had a life-long learning agent been constructed from these approaches (although the idea of a life-long learning agent is from [49]).

Additional mechanisms exist to exploit domain knowledge, suitable for life-long learning agents, which could be explored in future work. For an exhaustive list see the introduction to [34], but some salient work includes learning an appropriate inductive learning algorithm for a domain [37], optimizing algorithm control parameters to a domain, such as distance metrics [51], or generating synthetic data according to domain rules [33, 5].

Work has been done on the serial transfer of knowledge. [36] has explored transferring knowledge between networks, but with a primary goal of improving learning speed rather than improved generalization. Indeed, in [35], negative results are reported for generalization performance. A life-long learning agent will have to face the difficulties acknowledged with serial learning: preventing catastrophic forgetting while attempting to maintain a mutual benefit between tasks. Serial MTL as discussed here appears to be an effective, if ad hoc, solution.

We note that there have been some more negative results on the use of prior knowledge in learning – [32] presents a simple example where prior knowledge hinders learning in FOCL, and [51] demonstrates that it is important for two tasks to be related in order for them to share bias. While acknowledging that a notion of task-relevance would be beneficial to a life-long learning agent, MTL and EBNN are both weak methods of utilizing prior knowledge, and in general degrade to the performance of a tabula rasa learner. EBNN achieves this with an LOB* heuristic which monitors the correctness of the prior knowledge, and MTL depends on both individual cross-validation and sufficient hidden units to prevent tasks being damaged by prior knowledge.

An aspect of long-living agents which is not considered in this research is that of concept drift. As the agent ages, the domain and the agent's interaction with the domain may change. For instance, the agent's camera may be nudged, causing every image to be shifted slightly. Or the agent's bearings may loosen, causing the action models to have to gradually shift. A large body of research is ongoing in this area (see [16]), and this has been addressed specifically for robot agents [26]. This latter approach, of monitoring the relevance of training examples, could be applied on top of our approach. However, we feel that the research issues outlined are the priorities for this thesis, and will only consider the issue of concept drift if time allows.

Both CHILD [38] and IS [41, 55] are existing agents which "learn to learn" in simulated maze-like environments. Both solve complicated reinforcement learning tasks, and transfer the skills learned to similar but more complex tasks, improving learning ability. In CHILD, a hierarchical structure is built by constructing transitions between actions, which describe the degree to which one behavior should follow another – these transitions can in turn be layered, and their inherent knowledge used to transfer to further tasks in the same domain. IS transfers knowledge in the form of procedural bias – an agent learns to modify its update policy, and hence its ability to learn. In both cases, the improvement of an agent on future tasks depends on whether those future tasks are extensions of previous tasks. In our case, we care about any future tasks in the same domain, and utilize sources of bias which are more widely applicable, namely action models and shared internal representations.

FourEyes [23] is a visual database query system that learns to describe components of visual data ("sky", "grass", etc.) driven by user annotation of scenes. As with our domain, examples are expensive, and FourEyes attempts to learn domain bias during learning, to reduce the future need for examples. This domain bias is an adaptation of the weighting for nearest neighbor learning, something we have ourselves considered previously [51], and is similar to the shared internal representation used here. FourEyes is not itself able to act, and so is unable to use action models as a source of domain knowledge.

RATLE [19] is a reinforcement learning agent, operating in a simulated world, that incorporates advice from an external advisor during learning, to improve its ability to learn. It does not improve its own ability to learn, yet is highly appropriate for many problem settings where an external advisor is available.

Overall, our thesis research aims to progress beyond such work by broadening the range of knowledge which the agent uses to improve its ability to learn, by investigating how a long-living agent can expand to novel tasks, and by understanding the impact of a curriculum on the agent. Previously, curricula have been used to insert procedural bias into a learning agent. Typically, as in [3], a robot is trained on easy missions (tasks), and gradually moved to more difficult ones. Here, we will explore how a curriculum can be used to bias learning, by inserting constraints on the search space of the possible hypotheses considered by the agent.

Agents have been embedded within simulated worlds since PENGI [2]. Such simulated worlds have been invaluable as a controlled means of investigating the performance of an agent [27, 53], and in this work we too rely on the simulated environment described in Section 3.3 for basic validation of results. More recently, researchers have been moving agents from such simulated environments onto real-world robots – navigating office buildings [45] and picking up objects [28]. We consider it an important goal to demonstrate that a life-long learning agent can be effective as a real-world agent.

5. Research Plan


[Figure omitted: Gantt chart with rows for Action Models, Internal Representations, Serial Learning, Agent Implementation, Multiple Sources, Curriculum, Simulation, Real World and Writing, spanning June '96 to June '98.]

Figure 10: The detailed schedule for addressing the proposed thesis.

The time-line for this research (see Figure 10) derives from several goals:

Implementing a life-long learning agent utilizing multiple sources of learned knowledge on the test-beds described in Sections 3.3 and 3.5. This requires the test-beds to be implemented, along with a basic implementation of a life-long learning agent. A curriculum of tasks must be generated, derived from, say, the curriculum of Section 3.3, and the agent trained on a large number of tasks, to build up a background of learned domain knowledge for use in the experimental investigations.

Investigating how multiple sources of knowledge may be combined to yield synergistic learning improvements. This requires the ability to perform a paired comparison between multiple approaches, i.e. each approach must be compared over the same sets of examples. Given a capability to record and reuse examples, a series of experiments will be conducted, investigating the learning performance of the agent over time while utilizing different sources of knowledge.

Investigating the effects and benefits of a serial learning approach towards using learned domain knowledge and learned representations. In a separate series of experiments, the variants of parallel and serial learning outlined in Section 3 will be implemented and compared. This includes "from scratch" parallel learning, a more rigorous investigation of the use of individual learning rates in serial learning (Section 3.2), and the investigation of less ad hoc approaches to serial learning.

Investigating how an end-user may use a curriculum to provide bias to an agent. Section 3.3 showed how a curriculum can be used to constrain future hypotheses for other tasks to satisfy basic properties of a domain. While we expect an agent to succeed irrespective of the arrival pattern of tasks, the possible boost from curriculum training can also be investigated in a series of experiments comparing both random and teacher-guided ("ideal") task arrival orders and investigating the sensitivity of the agent to small perturbations from the "ideal" ordering.

Converting and documenting data from the experimental studies into a format suitable for the UCI Machine Learning Archive [22]. Life-long learning agents are a young field, and would be furthered by the availability of a shared test-bed.


6. Conclusions

It is expected that this thesis will demonstrate life-long learning agents, by implementing such agents on mobile robots and allowing them to exist in simulated and real-world domains. The agents will be flexible, by being able to exploit multiple sources of learned knowledge from the environment, and by being able to operate under a serial learning paradigm. The thesis will improve understanding of when and where learned knowledge is useful, whether that learned knowledge is in the form of learned internal representations or of learned action models. The thesis will also improve the understanding of employing neural network based agents in long-living settings and under serial learning. In addition, the thesis will consider the value of having a user in the machine learning loop, by exploring how curricula can be exploited in life-long learning.


References

[1] Yaser S. Abu-Mostafa. Hints. Neural Computation, 7:637–671, 1995.
[2] Philip E. Agre and David Chapman. Pengi: An implementation of a theory of activity. In AAAI-87, 1987.
[3] Minoru Asada, Shoichi Noda, Sukoya Tawaratsumida, and Koh Hosoda. Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23(2/3):279–303, May 1996.
[4] Jonathan Baxter. Learning Internal Representations. PhD thesis, The Flinders University of South Australia, 1994.
[5] D. Beymer and T. Poggio. Face recognition from one model view. In Proceedings of the International Conference on Computer Vision, 1995.
[6] Leo Breiman. Bagging predictors. Technical report, University of California, Berkeley, 1994.
[7] Rodney Brooks and Maja Mataric. Real Robots, Real Learning Problems. In Robot Learning, chapter 8, pages 193–213. Kluwer Academic Publishers, 1993.
[8] J. G. Carbonell, R. S. Michalski, and T. M. Mitchell. General Issues in Machine Learning. In R. S. Michalski, J. G. Carbonell, and T. M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach, chapter 1, pages 3–24. Tioga Publishing, Palo Alto, 1983.
[9] Rich Caruana. Learning Many Related Tasks At the Same Time With Backpropagation. In Advances in Neural Information Processing Systems 6. Morgan Kaufmann, December 1994.
[10] Rich Caruana. Algorithms and applications for multitask learning. In Lorenza Saitta, editor, 13th International Conference on Machine Learning, pages 87–96, Bari, Italy, July 1996.
[11] Rich Caruana. Multitask Learning. PhD thesis, School of Computer Science, CMU, to appear.
[12] T. G. Dietterich. Limitations of inductive learning. In Proceedings of the Sixth International Workshop on Machine Learning, pages 124–128. Morgan Kaufmann, 1989.
[13] S. E. Fahlman and C. Lebiere. The cascade-correlation architecture. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2. Morgan Kaufmann, 1990.
[14] M. Frean. The upstart algorithm: A method for constructing and training feedforward neural networks. Neural Computation, 2:198–209, 1991.
[15] D. Haussler. Decision Theoretic Generalizations of the PAC Model for Neural Net and other Learning Applications. Inform. Comput., 100:78–150, 1992.
[16] Miroslav Kubat and Gerhard Widmer, editors. ICML-96 Workshop on Learning in Context-Sensitive Domains, Bari, Italy, July 1996.
[17] J. Laird, E. Yager, C. Tuck, and M. Hucka. Learning in tele-autonomous systems using Soar. In Proceedings of the 1989 NASA Conference of Space Telerobotics, 1989.
[18] Y. LeCun, J. S. Denker, and S. A. Solla. Optimal brain damage. In D. S. Touretzky, editor, Advances in Neural Information Processing Systems 2. Morgan Kaufmann, 1990.
[19] R. Maclin and Jude W. Shavlik. Creating advice-taking reinforcement learners. Machine Learning, 22(1-3):251–281, 1996.
[20] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from appearance. International Journal of Computer Vision, 14:5–24, 1994.
[21] Bartlett Mel. Seemore: A view-based approach to 3-D object recognition using multiple visual cues. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. MIT Press, 1996.
[22] C. J. Merz and P. M. Murphy. UCI repository of machine learning databases. [http://www.ics.uci.edu/~mlearn/MLRepository.html], 1996. Irvine, CA: University of California, Department of Information and Computer Science.
[23] T. P. Minka and R. W. Picard. Interactive learning using a "society of models". Technical Report 349, M.I.T. Media Laboratory Perceptual Computing Section, 1996. Submitted to Special Issue of Pattern Recognition on Image Database: Classification and Retrieval.
[24] Tom M. Mitchell. Becoming increasingly reactive. In Proceedings of the Eighth National Conference on Artificial Intelligence, pages 1051–1059. AAAI Press/MIT Press, 1990.
[25] John Moody. Prediction risk and architecture selection for neural networks. In V. Cherkassky, J. H. Friedman, and H. Wechsler, editors, From Statistics to Neural Networks: Theory and Pattern Recognition Applications, NATO ASI Series F. Springer-Verlag, 1994.
[26] Andrew W. Moore. Efficient Memory-based Learning for Robot Control. PhD thesis, Trinity Hall, University of Cambridge, March 1991. Computer Science Technical Report 209.
[27] Nils J. Nilsson. Teleo-reactive programs for agent control. Journal of Artificial Intelligence Research, 1:139–158, January 1994.
[28] Stefan Nolfi and Domenico Parisi. Evolving non-trivial behaviors on real robots: an autonomous robot that picks up objects. In M. Gori and G. Soda, editors, Proceedings of the Fourth Congress of the Italian Association of Artificial Intelligence, Topics in Artificial Intelligence, pages 243–254. Springer Verlag, 1995.
[29] Joseph O'Sullivan. Towards a robot learning architecture. In Wei-Min Shen, editor, Learning Action Models – Papers from the 1993 AAAI Workshop, Technical Report WS-93-06, pages 47–51. AAAI Press, Menlo Park, CA 94025, 1993.
[30] Joseph O'Sullivan and Karen Haigh. Xavier Manual. Carnegie Mellon University, School of Computer Science, http://www.cs.cmu.edu/~Xavier, March 1994.
[31] Joseph O'Sullivan, Tom M. Mitchell, and Sebastian B. Thrun. Explanation Based Learning for Mobile Robot Perception. In Katsushi Ikeuchi and Manuela Veloso, editors, Symbolic Visual Learning. Oxford University Press, 1996.
[32] Michael J. Pazzani and Dennis Kibler. The role of prior knowledge in inductive learning. Machine Learning, 9:54–97, 1992.
[33] Dean Pomerleau. Neural Network Perception for Mobile Robot Guidance. Kluwer Academic Publishers, 1993.
[34] Lorien Pratt and Sebastian Thrun, editors. Learning To Learn. Kluwer Academic Publishers, 1997.
[35] Lorien Y. Pratt. Transferring Previously Learned Back-Propagation Neural Networks to New Learning Tasks. PhD thesis, Rutgers University, Department of Computer Science, May 1993. Technical Report ML-TR-37.
[36] Lorien Y. Pratt. Experiments on the Transfer of Knowledge between Neural Networks. In Computational Learning Theory and Natural Learning Systems, Constraints and Prospects, chapter 14, pages 523–560. MIT Press, 1994.
[37] L. Rendell, R. Seshu, and D. Tcheng. Layered concept-learning and dynamically-variable bias management. In Proceedings of IJCAI-87, pages 308–314, 1987.
[38] Mark Ring. Two methods for hierarchy learning in reinforcement environments. In From Animals to Animats 2: Proceedings of the 2nd International Conference on Simulation of Adaptive Behaviour. MIT Press, 1992.
[39] David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. In David E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol. I+II. MIT Press, 1986.
[40] C. Schaffer. Overfitting avoidance as bias. Machine Learning, 10:153–178, 1993.
[41] Jürgen Schmidhuber, Jieyu Zhao, and Marco Wiering. Simple principles of metalearning. Technical Report IDSIA-69-96, IDSIA, Corso Elvezia 36, CH-6900 Lugano, Switzerland, June 1996.
[42] Bart Selman, Rodney A. Brooks, Thomas Dean, Eric Horvitz, Tom M. Mitchell, and Nils J. Nilsson. Challenge problems for artificial intelligence. In Proceedings of AAAI-96, Menlo Park, CA, August 1996. AAAI Press.
[43] Patrice Simard, Bernard Victorri, Yann LeCun, and John Denker. Tangent prop – a formalism for specifying selected invariances in an adaptive network. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 895–903. Morgan Kaufmann, December 1992.
[44] Reid Simmons. The curvature-velocity method for local obstacle avoidance. In International Conference on Robotics and Automation, Minneapolis, MN, April 1996.
[45] Reid Simmons, Richard Goodwin, Karen Haigh, Sven Koenig, and Joseph O'Sullivan. A modular architecture for office delivery robots. In The First International Conference on Autonomous Agents, February 1997.
[46] Reid Simmons and Sven Koenig. Probabilistic navigation in partially observable environments. In International Joint Conference on Artificial Intelligence, Montreal, Canada, August 1995.
[47] Steven C. Suddarth and Alistair D. C. Holden. Symbolic-neural systems and the use of hints for developing complex systems. Int. J. Man-Machine Studies, 35:291–311, 1991.
[48] Sebastian Thrun. Lifelong learning: A case study. Technical Report CMU-CS-95-208, School of Computer Science, Carnegie Mellon University, November 1995.
[49] Sebastian Thrun. Explanation-Based Neural Network Learning: A Lifelong Learning Approach. Kluwer Academic Publishers, Boston, MA, 1996. To appear.
[50] Sebastian B. Thrun and Tom M. Mitchell. Integrating inductive neural network learning and explanation-based learning. In Proceedings of IJCAI-93, Chambéry, France, July 1993. IJCAI.
[51] Sebastian B. Thrun and Joseph O'Sullivan. Discovering Structure in Multiple Learning Tasks: The TC Algorithm. In Proceedings of ICML, Bari, Italy, July 1996.
[52] L. G. Valiant. A Theory of the Learnable. Comm. ACM, 27:1134–1142, 1984.
[53] S. Vere and T. Bickmore. A basic agent. Computational Intelligence, 6:41–60, 1990.
[54] M. Wooldridge and N. R. Jennings. Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2), 1995.
[55] Jieyu Zhao and Jürgen Schmidhuber. Incremental self-improvement for lifelong multiagent reinforcement learning. In Fourth International Conference on Simulation of Adaptive Behavior, Cape Cod, USA, 1996.


A. Neural Tool-box

Section 3.3 sketched neural networks capable of implementing various tasks. To verify those sketches, it must be the case that single-hidden-layer networks can represent gating and equality. Presented here are examples of trained networks which solve each case, to show that the sketches can be completed. In each case, the hidden units use 1/(1 + exp(−x)) as the nonlinear sigmoid.

A1. Gating: if x > c then y else 0

Figure 11: (a) A network which solves "if x > 0.2 then y else 0"; (b) top: the target network output, bottom: the actual network output.

A2. Equality: x = y

Figure 12: (a) A network which solves "x = y"; (b) top: the target network output, bottom: the actual network output.
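To make the construction concrete, the following hand-built single-hidden-layer network implements an equality test in the manner described in Section 3.3: two opposed sigmoid units, each offset by a small bias, are both "half on" only when x and y coincide, and an output unit thresholds their sum. The weights are illustrative and are not those of the trained network shown in Figure 12.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def equality_net(x, y, w=50.0, b=3.0, k=20.0):
    h1 = sigmoid(w * (x - y) + b)     # monotonic in (x - y)
    h2 = sigmoid(-w * (x - y) + b)    # its inverse, offset by the same small bias
    return sigmoid(k * (h1 + h2 - 1.5))

print(round(equality_net(0.4, 0.4), 3))   # near 1: x equals y
print(round(equality_net(0.4, 0.6), 3))   # near 0: x differs from y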
