Computational Intelligence and Games: Challenges and Opportunities

International Journal of Automation and Computing, 05(1), January 2008, 45-57. DOI: 10.1007/s11633-008-0045-8

Simon M. Lucas
Centre for Computational Intelligence, Department of Computing and Electronic Systems, University of Essex, Colchester CO4 3SQ, UK

Abstract: The last few decades have seen a phenomenal increase in the quality, diversity and pervasiveness of computer games. The worldwide computer games market is estimated to be worth around USD 21bn annually, and is predicted to continue to grow rapidly. This paper reviews some of the recent developments in applying computational intelligence (CI) methods to games, points out some of the potential pitfalls, and suggests some fruitful directions for future research.

Keywords: Games, machine learning, evolution, temporal difference learning (TDL), neural networks, n-tuple systems.

1 Introduction

Games have long been seen as an ideal test-bed for the study of artificial intelligence (AI). Early pioneers of computing and information theory such as Alan Turing and Claude Shannon were intrigued by the idea that computers might one day play grand-master level chess, and sketched out ideas of how this might be achieved. Until recently, much of the academic work on AI and games focused on traditional board games, the challenge being to beat humans at a variety of demanding games. This strand of work has achieved its goal for many complex games, such as checkers and chess. Checkers is now a solved game[1], the result being a draw if both players play optimally. Chess is far from being solved, but computers play at world-leading level. Checkers and chess playing algorithms depend on efficient game-tree search coupled with a heuristic board evaluation function, which may be hand-programmed or learned.

In AI research, the emphasis is on producing apparently intelligent behaviour using whatever techniques are appropriate for a given problem. In computational intelligence (CI) research, the emphasis is placed on intelligence being an emergent property. Examples of CI methods include evolutionary computation, where intelligence may arise from the principles of natural selection, and neural networks, where intelligence may arise from the collective behaviour of many simple, connected neurons.

Both checkers and chess have seen the successful application of CI methods, with Fogel and Chellapilla's famous work on Blondie 24[2] (checkers) and Fogel et al.'s[3] on Blondie 25 (chess). Both used a fairly standard minimax engine, but instead of a hand-coded heuristic evaluation function they used a multi-layer perceptron (MLP). The MLP had specially designed input layers with spatially arranged inputs covering all possible sub-squares of the 8 × 8 board. While the architecture of the MLP was fixed, the weights were evolved using co-evolution. Essentially, a (15 + 15) evolution strategy (ES) was used, but the fitness function was based on how well each network played against a set of opponent networks drawn from the current population.

The Blondie systems achieved high levels of play both at checkers and at chess, and have acted as a great inspiration to many researchers. However, due to the way the MLP was embedded in a minimax engine, it is unclear how much of the playing ability comes from the neural network, and how much from the minimax search.

Other games such as Go have been less susceptible to this combination of game-tree search and heuristic position evaluation, due to the large branching factor and the difficulty of designing a useful position evaluation function. Go remains a grand challenge for computer science research, though very recently Monte Carlo methods have shown great promise, especially for smaller versions of the game such as 9 × 9 (standard Go is played on a 19 × 19 board). While traditional game-tree methods search broadly but shallowly, Monte Carlo methods are relatively narrow but deep. A position is evaluated by playing it out many thousands of times using random moves until the end of each game is reached (at which point the true value of that particular line of play is known for sure). Each position is then characterized by the probability of winning from that position, based on those random playouts. Successful Go programs such as MoGo[4, 5] use a more sophisticated version of this called UCT Monte Carlo (UCT stands for upper confidence bounds applied to trees). As each game is played out, common initial moves are shared in a tree structure, so that each line of play updates the statistics for all the tree nodes that led to it. More attention is given to playing out promising nodes in the tree, but this is balanced by the need for exploration. UCT Monte Carlo has achieved remarkable results in Go, and the method is a testament to the power of statistical techniques in general. It is reminiscent of the success of statistical language processing methods compared to their more grammar-based alternatives[6]. It is likely that UCT Monte Carlo also has significant potential to be applied to other games. While the branching factor of Go is much larger than that of chess, it is still of a different nature to that encountered in video games, with their virtually continuous state-spaces.
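As a rough illustration of the exploration/exploitation balance just described, UCT-style algorithms typically select which child node to play out next using a UCB1-type rule of the following generic form (this is the textbook formulation, not necessarily the exact variant used in MoGo[4, 5]):

$$a^{*} = \arg\max_{a}\left(\frac{w_a}{n_a} + C\sqrt{\frac{\ln N}{n_a}}\right)$$

where $w_a/n_a$ is the observed win rate of child $a$, $n_a$ is the number of times that child has been played out, $N$ is the number of visits to the parent node, and $C$ is an exploration constant.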


While the traditional focus of academic research has been on board games, and this research continues to flourish, there is also a growing body of work on applying AI and CI to video games. Many techniques are applicable to any game regardless of genre, though they may require careful application to get the best results. For example, it seems likely that UCT Monte Carlo may have important applications in controlling non-player characters (NPCs) in video games; although the concept is quite simple, getting the best out of the method has been the subject of intensive research over the last few years. In the UK, the Engineering and Physical Sciences Research Council (EPSRC) has just funded a network grant to foster closer collaboration between industry and academia in the area of AI and games[7].

Commercial games are based on increasingly sophisticated simulated environments, and present human players with world views approaching photo-realism. As virtual worlds become more complex, it becomes harder to endow NPCs with sufficiently interesting and believable behaviour. Current NPC control techniques mostly involve methods such as finite state machines, scripting, and search. Most of the behaviour is hand-programmed, involving much human effort to code and to test. Although games companies have occasionally used AI techniques such as genetic algorithms and neural networks[8, 9], the methods used in games often lag behind the state of the art in machine learning. Similarly, machine learning and AI have much to gain from games, by applying and extending algorithms in more challenging environments than are usually encountered[4]. Furthermore, as games utilise more realistic physics engines, there is scope for greater synergy between games and robotics research: the control of a humanoid robot and of an NPC may eventually share much in common.

One challenge for research in this area is to find sufficient common ground to make collaboration attractive for both parties. While academics may be most interested in how smart an NPC is, or how well it learns from its environment, a games company is in the business of selling games, and is therefore likely to be most interested in how intelligent NPC behaviour can be channeled into making games more enjoyable. Many academics are now eager to test their methods in the complex virtual worlds of modern console games, guided by industrial collaboration. This will bring great benefits both to the research community and to the games industry. Peter Molyneux OBE, an internationally renowned game designer, recently commented: "AI is certainly the undiscovered country of games design. Any game genre – from hardcore shooters to the most story-driven adventure game – would be truly revolutionized by AI driving plot, characters and scenarios."

On the other hand, there is still some resistance to the more open-ended game play that advanced AI and CI techniques could facilitate. This comment on a blog site1 expresses a view held by many games industry insiders: "I think the most drastic different in the mind set of an AI programmer/designer in big budget game and an academic is that in a real game project you really need to think about the production issues (anything unpredictable is no-no) . . ." By their very nature, however, next generation video games will be more open-ended, and therefore less predictable, in the same way that multi-player online games are more open-ended and less predictable due to the intelligence and variable behaviour of the other human players.

1 Memoni, http://realtimecollisiondetection.net/blog/?p=36, October 13, 2007

The views of game-industry insiders such as these cannot simply be dismissed. However, it is likely that their objection to more open-ended AI is based on what seems to be currently achievable, rather than on what might one day be achieved. Hence it is not unpredictable behaviour that is unacceptable, but annoying or uninteresting behaviour. As NPC behaviour becomes more intelligent, and NPCs become better able to learn, this opens the way for new genres of game. For instance, the NERO game[10] involves players pre-training characters to act according to a desired tactical doctrine. NERO achieves this by using a neuro-evolution approach internally to evolve soldier behaviours that satisfy the given training tasks. Fig. 1 shows a screenshot from the game.

Fig. 1  A screenshot from the NERO game

NERO presents an example of the open-ended game-play that CI methods enable, and one can expect to see many new games that exploit CI in various ways.

Games have also been studied from a social, economic, and financial perspective. A classic social game is the iterated prisoner's dilemma (IPD)[11], which has led to many CI studies of evolving and co-evolving strategies for this game. CI methods have also been applied to many aspects of finance, such as supply-chain management[9] and sequential bargaining[12]. CI methods have been used not only to optimize agent strategy, but also to design marketplace rules. A good example of this can be found in the work of Dave Cliff. He noted that Gode and Sunder[14] had found that stochastic agents with zero intelligence (ZI) could approximate the equilibrium price when trading in a suitably structured market. Cliff[13] showed that the ZI agents could deviate significantly from this performance when the market structure was varied from the Gode and Sunder[14] setup. To overcome this, Cliff added a small amount of intelligence to the agents (hence ZI plus, or ZIP), endowing each one with a simple learning rule that adjusted its behaviour based on observations of the last transaction. The ZIP agents were shown to approximate human behaviour better than the ZI ones.


Cliff then reasoned that if such simple agents could perform well, they could be used within a fitness function for an evolutionary algorithm whose aim was to optimize the design of the marketplace[15]. This is a great example of the creative application of CI: in game terms, it uses evolution to design the game rather than to design agents that play the game.

2 Playing games with CI

This section describes the two main ways in which a neural network or fuzzy system can be interfaced to a game: as an action selector, or as a value function. As an example, we consider the Mountain Car task introduced by Moore[16], and also described in [17, p. 214]. The problem is illustrated in Fig. 2. The task is to drive a car to reach a goal at the top of the hill on the right (the rightmost vertical line in the figure). There is a barrier to the left (the leftmost vertical line) at which the car stops dead. At each time step, there are three possible actions: accelerate left (a_t = −1), accelerate right (a_t = 1), or apply no force (a_t = 0). The car's engine has insufficient power to overcome gravity, so the problem is a bit deceptive: to reach the goal as quickly as possible, it may be necessary to accelerate (apply a force) away from the goal in some circumstances. The following equations specify the update rules for position (s) and velocity (v):

Fig. 2  The Mountain Car problem

$$s_{t+1} = \mathrm{bound}_s(s_t + v_{t+1}) \qquad (1)$$

$$v_{t+1} = \mathrm{bound}_v(v_t + 0.001\,a_t - 0.0025\cos(3 s_t)) \qquad (2)$$

The bounds for s and v are as follows:

$$-1.2 \le s_{t+1} \le 0.5 \qquad (3)$$

$$-0.07 \le v_{t+1} \le 0.07. \qquad (4)$$
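For concreteness, a minimal sketch of these dynamics in code is given below (the class and method names are illustrative, not taken from the paper; the velocity is zeroed at the left barrier to model the car stopping dead there):

    public class MountainCar {
        double s = 0.0;   // position
        double v = 0.0;   // velocity

        // Apply one time step with action a in {-1, 0, +1}, following (1)-(4).
        void step(int a) {
            v = clamp(v + 0.001 * a - 0.0025 * Math.cos(3 * s), -0.07, 0.07);
            s = clamp(s + v, -1.2, 0.5);
            if (s <= -1.2) v = 0;      // the car stops dead at the left barrier
        }

        boolean atGoal() { return s >= 0.5; }

        static double clamp(double x, double lo, double hi) {
            return Math.max(lo, Math.min(hi, x));
        }
    }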

Each epoch (trial) starts with the car having a random position and velocity drawn uniformly within the above bounds. Each trial terminates when the car reaches the goal, or after 2 500 steps. The best solutions for this particular configuration of the problem solve the task in approximately 52 steps on average (each step has a cost of 1, so this is referred to as a score of −52).

The value function method has been a popular way to solve this, and is depicted in Fig. 3. All possible actions are applied to a model of the current state of the car to generate a set of future states. Each future state is then evaluated by the value function, and the action that leads to the state with the highest value is chosen. Fig. 3 shows that for this example, the state reached by taking the accelerate-right action has the highest value, so that would be the chosen action. Value function methods are very flexible because more computation can be applied to look further into the future, and hence get a truer picture of which actions lead to the best states in the long run.

Fig. 3  Illustration of the value function applied to the Mountain Car problem

To interface an action selection network to this problem, the inputs to the network are the current state of the system (defined by the car's current position and velocity), and the output must be interpretable as one of the possible actions. For complex action sets, one might use one output per possible action, and take the action associated with the highest output value, as in Fig. 4.

Fig. 4 Illustration of the action selection method applied to the Mountain Car problem

In this case, the three possible actions have a natural ordering, so a natural coding is to use a single output unit and define output intervals that represent left, neutral, and right.
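To make the value function interface concrete, the sketch below shows 1-ply action selection over the hypothetical MountainCar model given earlier (the functional interface used for the value function is an assumption for illustration, not the paper's code):

    import java.util.function.DoubleBinaryOperator;

    public class ValueFunctionPolicy {
        // Apply each action to a copy of the current state and pick the action
        // whose successor state the value function rates most highly.
        static int selectAction(MountainCar car, DoubleBinaryOperator valueFunction) {
            int bestAction = 0;
            double bestValue = Double.NEGATIVE_INFINITY;
            for (int a = -1; a <= 1; a++) {
                MountainCar copy = new MountainCar();
                copy.s = car.s;
                copy.v = car.v;
                copy.step(a);                     // simulate the action on a model
                double value = valueFunction.applyAsDouble(copy.s, copy.v);
                if (value > bestValue) { bestValue = value; bestAction = a; }
            }
            return bestAction;
        }
    }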


The Mountain Car problem was considered to be reasonably hard for neural networks to solve, and was used as a benchmark by Whiteson and Stone[18] as an initial test for their NEAT+Q evolutionary/temporal difference hybrid learning algorithm, where NEAT is an abbreviation for neuro-evolution of augmenting topologies, and +Q denotes the addition of Q-learning, a particular type of temporal difference learning (TDL). Running TDL with an MLP (or a CMAC architecture) may produce value functions that have a meaningful interpretation: in this case, the value of each state is the negative of the number of steps the agent needs to reach the goal from that state. Whiteson and Stone[18] noted that the value functions produced by NEAT+Q were very different from the TDL ones, yet the performance of these functions was very similar. Either TDL or evolution can train a network to perform well at this task, and each takes hundreds (or even thousands) of epochs or fitness evaluations. The value function learned by TDL using a grid-based decomposition of the input space is shown in Fig. 5 after 3 000 epochs; it had an average score of −61. This performance is similar to, but inferior to, the results reported for CMAC architectures, which use perturbed overlapping grids, but it illustrates how TDL attempts to learn a meaningful value function.

Fig. 5  A value function learned with TDL after 3 000 epochs

However, as we can see, the problem can be made trivial depending on how it is presented to the learner. While experimenting on this problem, it was observed that the learned value function could look very different from run to run, while generally converging to networks with high performance. Furthermore, it was observed that when the inputs used were the square of the velocity and the position, the problem was easily solved by a single-layer perceptron, with all the best solutions tending to ignore the position and strongly weight the velocity. This led to the hypothesis that only the absolute value of the velocity matters for this problem, and this was confirmed empirically, with abs(v) achieving an average score of −51.97 (standard error = 0.09) based on 100 000 trials. This compares favourably with the Whiteson and Stone[18] quoted averages of 52.75 (NEAT+Q) and 52.02 (Sarsa + CMAC), though NEAT+Q took hundreds of thousands of episodes to reach that performance.

This leads to two very simple near-optimal solutions, depending on whether a state value or a direct action method is used. For an action selection network with the ordered interpretation of the output explained above, an optimal solution could be tanh(wv), where w is a single weight with a high value such as 10, enough to make the neuron behave like a sign function sgn(v). For the value function, abs(v) is near-optimal, as shown in Fig. 6: it assigns to each state the absolute value of the velocity in that state. It is also worth noting that genetic programming would be expected to produce solutions for this problem with relative ease, especially if an abs(·) function were included in the terminal set.

Fig. 6  A near-optimal solution

There are two important points raised here. First, the problem setup is critical: the problem can be made easy or hard depending on the details of the setup (input coding, action versus state value function). The second point, which follows from this, is that one should be wary of drawing any conclusions from tests on problems that have such simple solutions.
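To spell out just how simple these solutions are, here are the two near-optimal controllers written out in code (illustrative encodings under the assumptions stated above, not the exact networks found in the experiments):

    public class TrivialMountainCarSolutions {
        // Direct action selection: tanh(w * v) with a large w behaves like sgn(v),
        // i.e. always accelerate in the current direction of travel.
        static int action(double v) {
            double w = 10.0;
            return (int) Math.signum(Math.tanh(w * v));
        }

        // State value function: value a state by the absolute value of its velocity.
        static double value(double s, double v) {
            return Math.abs(v);
        }
    }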

3 Learning game strategy

For many real-world game applications, it is vital that learning can happen quickly. The main methods for game strategy learning are TDL and evolution (including co-evolution). From previous work by the author and others, it seems that these methods have significantly different strengths and weaknesses.

When it works, TDL often learns much more quickly than evolution. This is not surprising in one sense, since TDL learns a value function that is constantly updated during game-play. Note, however, that for the most part this is done in an unsupervised way: it simply learns that game states that are close in time should have similar values. At certain points in the game (which points depends on the game; typically the end of the game in the case of board games), the true value of the game state is known, and over the course of training this ties down the values of states close to those states, and so on. The problem is that for non-trivial games, and for complex function approximators such as MLPs, there is no guarantee that the value function will converge to anything reasonable (let alone optimal).
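For reference, the core temporal difference update that underlies this kind of value-function learning can be written in its generic textbook form[17] as

$$V(s_t) \leftarrow V(s_t) + \alpha\left[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)\right]$$

where $\alpha$ is a learning rate, $\gamma$ a discount factor, and $r_{t+1}$ the reward, which in board-game self-play is typically zero except at the end of a game, where the true outcome is known. (This is the standard formulation rather than the exact update used in any particular study cited below.)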


Until recently, there had been very little work comparing TDL with evolution. With growing recognition of the importance of such comparisons, the body of work is expanding steadily. Previous comparisons deal mostly with learning evaluation functions for board games. An influential early experiment on using TDL to learn board game evaluation functions is due to Tesauro[19], who achieved world-class performance when training neural network-based backgammon evaluation functions with self-play. Pollack and Blair[20] trained evaluation functions using the same game and function representation, but instead of TDL they used the simplest possible form of evolutionary algorithm (EA): a hill-climber in which a single individual was repeatedly mutated, and the mutation was kept only if it won a sufficient number of games against the non-mutated individual. The algorithm worked, but its end results were inferior to those of Tesauro[21]. Darwen[22] performed a set of similar comparisons for backgammon, and found that a population-based EA eventually outperformed TDL when training a linear board evaluation function, even though the EA was much slower. However, when training a non-linear evaluation function, board evaluators trained by the EA never reached the performance of those trained by TDL, for the simple reason that the EA took too long; to be effective, the EA needed to evaluate the same pair of individuals many times.

Runarsson and Lucas[23] investigated TDL versus co-evolutionary learning (CEL) for small-board Go strategies. They found that TDL learned faster, but that with careful tuning, CEL eventually learned better strategies. In particular, with CEL it was necessary to use parent-offspring weighted averaging in order to cope with the effects of noise. This effect was found to be even more pronounced in a follow-up paper by Lucas and Runarsson[24] comparing the two methods for learning an Othello position value function. Kotnik and Kalita[25] found that evolution outperformed TDL at learning to play the card game rummy, which, unlike the board games in the above studies, is not a game of perfect information.

There have also been some comparisons using control-based problems. Taylor et al.[26] compared TDL and the NEAT neuro-evolution algorithm for "keep-away" RoboCup soccer, and found that evolution could learn better policies, though it took more evaluations to do so. Their results also showed that TDL learned better policies when the task was fully observable, and that NEAT learned faster when the task was deterministic. Gomez et al.[27] investigated an impressive range of reinforcement learning techniques, including several versions of neuro-evolution and TDL, on four increasingly difficult versions of the benchmark pole-balancing problem. In striking contrast to some other studies, the best evolutionary methods universally outperformed the best temporal difference-based (TD-based) methods, both in terms of learning speed and in terms of which methods could solve the harder versions of the problem at all. Further, there were significant differences between the various neuro-evolutionary and TD-based methods, with the best TD-based techniques sometimes outperforming some evolutionary techniques. The relative ordering of the algorithms was similar across the different versions of the problem. One feature of all their pole-balancing problems was that they could be solved by relatively small neural networks.

Some general results are beginning to emerge, but there is still much to be learned. One tendency that can be observed in many (though not all) of these studies is that TDL learns faster than evolution, but evolution eventually learns better strategies. Under which conditions this holds is an important open question.

4 TDL versus evolution for simulated car racing

The choice of game to use for benchmarking CI methods is important. On one hand, it is preferable for the game to be simple and fast to compute, since some learning algorithms may require millions of simulated game steps to converge. If it is too simple, however (as could be said of Mountain Car), then it is questionable what can be learned from it. Lucas and Togelius[28] compared TDL and evolution for a simplified car racing game. In that study, they used a point-to-point car racing game, in which the objective is to drive a car to visit as many way-points as possible within a given number of time steps (they used 500). The way-points were randomly generated, and had to be visited in order. At any time the agent could see only the next three way-points. The cars also had simple underlying models. This made the simulation very fast to compute, allowing millions of game-steps per second to be calculated (though the actual speed then depends on the complexity of the controller). Fig. 7 shows a sample run for a hand-designed heuristic state value function, illustrating how the controller got stuck orbiting a way-point. The heuristic was to value proximity to the next way-point while punishing excess speed, which can lead to oscillations, or to the orbiting shown here (for more details see [28]).

Fig. 7 A naive controller that became stuck after orbiting 5 waypoints

Because each trial or episode (i.e., a run of 500 time steps) is randomly generated, there is a natural random variation in the scores obtained by a particular agent. For this task, average scores of about 20 are reasonable, above 30 are good, and above 40 are very good. Two types of car were used: normal and holonomic. The normal car was a simplified model of a bang-bang type radio-controlled toy car, which at each time step chose among the following actions: accelerate forward, accelerate backward, steer left, and steer right. The holonomic car was modelled as a point mass; its controller had five choices at each time step: apply no force, or apply a force in one of the four compass directions. Each network had three inputs: the square of the velocity, the distance to the next way-point, and the angular difference between the current heading of the car and a straight line connecting the car with the next way-point.


Overall, the best results were obtained by evolving state-value MLPs. However, when TDL worked, it could learn these state values much more quickly. For the normal car, TDL was competitive with evolution, with the very best learned controllers having very similar performance. While evolution also worked well for the holonomic car, TDL failed badly in this case. For the normal car, when TDL did learn, it often achieved high performance within the first 10 epochs, and in these cases offered much faster learning than evolution. A sample successful run is shown in Fig. 8. Note that score is plotted against epoch for TDL rather than against generation (used for evolution); an epoch is approximately equivalent to a single fitness evaluation in terms of the computation required.

One of the most surprising conclusions of this work was just how sensitive the results were to details of the setup that might have hitherto seemed unimportant. It was observed that the MLP failed to learn a good controller for the holonomic car when all three inputs were used. The holonomic car does not have a heading in the sense of a normal car, since it can be accelerated in any of the four directions independently of its current direction of travel. Hence, the heading information seems to act as a spurious input. Evolution failed to learn to ignore this, but was much more successful when using only the two-input version, as can be seen in Tables 1 and 2. Regarding the learning speed of each technique, some sample runs demonstrate this: Fig. 9 shows an average of 20 runs of the evolutionary algorithm, evolving an MLP for the normal car.

Table 1  Statistics for 20 learning runs of the normal car

Method            Mean    Standard error
EVO-MLP           35.0    0.24
EVO-perceptron    30.2    0.33
TDL-perceptron    26.2    1.4
Heuristic         18.8    1.0

Table 2  Statistics for 20 runs of the holonomic car

Method              Mean    Standard error
EVO-MLP-2           32.2    0.06
EVO-perceptron-2    26.7    0.01
EVO-perceptron-3    18.5    2.1
EVO-MLP-3           11.3    0.7
Heuristic           26.8    0.21

The extent to which results from one domain (e.g. rummy) can be used to make predictions for another domain, such as car racing or Othello, is unclear. One of the significant findings of the current work is that even when tasks are very similar (here, all variants of a simple car racing problem), minor changes in the problem setup can introduce unpredictable biases, resulting in significant changes in the relative performance of each method. There currently seems to be very little theoretical research that can help us here, and a great need for empirical investigation.

Fig. 8 Learning a perceptron-based state-controller with TDL (a successful run)

Fig. 9  Evolving a state-value MLP for controlling a normal car

5 Othello, and the choice of architecture

In this section we review recent results on learning a position value function for Othello. Othello is a challenging, unsolved game in which the best computer players already exceed human ability. It is an interesting benchmark for CI methods, and the author has been running a series of competitions to find the best neural network (or other function approximation architecture) for this game.

The first strong learning Othello program was Bill[29, 30]. Later, Buro's Logistello[31], the best Othello program from 1993–1997, became the first program to beat a human champion. Logistello uses a linear weighted evaluation function, but with more complex features than just the plain board; nevertheless, the weights are tuned automatically using self-play. Logistello also uses an opening book based on over 23 000 tournament games, together with fast game-tree search[8].

More recently, Chong et al.[32] co-evolved a spatially aware MLP for playing Othello. Their MLP was similar to the one used by Fogel and Chellapilla[2] for playing checkers, and had a dedicated input unit for every possible sub-square of the board. Together with the hidden layers, this led to a network with 5 900 weights, which they evolved over around 100 000 games. The weighted piece counter (WPC) used in the current paper has only 64 weights. The results below show that optimal tuning of such WPCs can take hundreds of thousands of games, and relies heavily on parent-child averaging. These considerations suggest that further improvement in the performance of evolved spatial MLPs should be possible.
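For reference, a WPC is simply a linear evaluation of the raw board. A common formulation (the ±1 piece coding below is an assumption for illustration; the exact encoding used in the experiments is not spelled out here) is

$$v(b) = \sum_{i=1}^{64} w_i x_i, \qquad x_i \in \{+1, 0, -1\}$$

where $x_i$ encodes the contents of square $i$ (own piece, empty, or opponent piece) and $w_i$ is the learned weight for that square.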


First, we give a brief explanation of the game. Othello is played on an 8 × 8 board between two players, black and white. At each turn, a player must place a counter on the board if there are any legal places to play; otherwise the player must pass. A legal move places a counter on an empty square so as to "pincer" one or more opponent counters on a line between the new counter and one of the player's existing counters, and all opponent counters that are pincered in this way are flipped over to the color of the current player. The initial board has four counters (two of each color), with black to play first. This is shown in Fig. 10, with the open circles representing the possible places that black can play (under symmetry, all opening moves for black are equivalent). The game terminates when there are no legal moves available for either player; this happens at the latest when the board is full (after 60 moves, since the opening board already has four counters on it), but can also happen earlier. The winner is the player with the most pieces of their color at the end of the game.
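To make the pincering rule concrete, a minimal sketch of a legality/flipping check is given below (the board encoding of 0 = empty, 1 = black, −1 = white and all names are illustrative assumptions, not the representation used in the experiments):

    import java.util.ArrayList;
    import java.util.List;

    public class OthelloRules {
        static final int[][] DIRS = {{-1,-1},{-1,0},{-1,1},{0,-1},{0,1},{1,-1},{1,0},{1,1}};

        // Returns the squares that would be flipped if 'player' moved at (row, col);
        // an empty list means the move is illegal.
        static List<int[]> flips(int[][] board, int row, int col, int player) {
            List<int[]> flipped = new ArrayList<>();
            if (board[row][col] != 0) return flipped;        // must play on an empty square
            for (int[] d : DIRS) {
                List<int[]> line = new ArrayList<>();
                int r = row + d[0], c = col + d[1];
                // walk over a contiguous run of opponent counters ...
                while (r >= 0 && r < 8 && c >= 0 && c < 8 && board[r][c] == -player) {
                    line.add(new int[]{r, c});
                    r += d[0]; c += d[1];
                }
                // ... which is pincered only if the run ends on one of the player's counters
                if (!line.isEmpty() && r >= 0 && r < 8 && c >= 0 && c < 8 && board[r][c] == player) {
                    flipped.addAll(line);
                }
            }
            return flipped;
        }
    }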

Fig. 10  The opening board for Othello

As play proceeds, the piece difference tends to oscillate wildly, and some strategies aim to have few counters during the middle stages of the game in order to limit the possible opponent moves. Fig. 11 shows how the piece difference can change during the course of a game.

Fig. 11  Typical volatile trajectory of the piece difference during a game of Othello

The author has been running an Othello neural network web server for the past two years. During that time, well over one thousand neural networks have been uploaded to the site. When a network is uploaded, it is played against the standard heuristic WPC for many games (initially 1 000, but this has been reduced to 100 to reduce load), and this gives it a ranking in the trial league. Then, for particular competition events, entrants are allowed to nominate two of their best networks to participate in a round-robin league.

For the competitions, and for the results in this paper, all play is based on value functions evaluated at 1-ply, and all games are played with a 10% chance of a forced random move. Hence, this is no longer strictly speaking Othello, but it is a better benchmark for our purpose: when evaluating two deterministic feed-forward neural networks on the true game of Othello, there are only two possible outcomes, depending only on who plays first.

The best network found in this way so far was an MLP. Co-evolution finds it hard to learn MLPs for this task, and for a long time the best network was a TDL-trained MLP. For the 2006 IEEE Congress on Evolutionary Computation (CEC) Othello competition, however, a new champion was developed by Kyung-Joong Kim and Sung-Bae Cho. They seeded a population with small random variations of the previous best MLP, and then ran co-evolution for 100 generations. This produced a champion that performed significantly better in the round-robin league than the other players, including the TDL-trained MLP from which it was developed. This points toward the value of TDL/evolution hybrids.

5.1 Parent-child averaging

A surprising result obtained by Runarsson and Lucas[23] for small-board Go, and validated even more dramatically for Othello[24], is the way that standard co-evolution can fail to learn very well. In both of these studies, forced random moves were used to get a more robust evaluation of playing ability. To get good performance from co-evolution, it proved essential to use parent-child averaging. This is a technique that had been used previously in evolution strategies, and was also used by Pollack and Blair[20] with their (1 + 1) ES for backgammon. Runarsson and Lucas[23] experimented with a wide variety of settings for the ES, but unless parent-child weighted averaging was used, all performed poorly. The following equation explains the process. The weights of the parent neural network are in vector w_0 and the weights of the best child network are in vector w; the β factor controls the weighting between parent and child. With β = 1.0 we get a standard ES, while with β = 0 no information from the child is used at all, so no evolution occurs. Remarkably, they found that small values of β worked best. The results plotted in Fig. 12 (from [24]) show how standard co-evolution (i.e. a standard ES with a relative fitness function) fails to learn much, but learning does occur with β = 0.05. Overall, they found that co-evolution with weighted parent-child averaging performed better than TDL, and was able to learn a WPC that can outperform the standard heuristic weights; TDL learned more quickly, but was unable to match the standard heuristic weights.

$$\vec{w}_0 \leftarrow \vec{w}_0 + \beta\,(\vec{w} - \vec{w}_0) \qquad (5)$$
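A minimal runnable sketch of a (1, 10) ES generation loop that uses update (5) is given below; the fitness function here is a stand-in (distance to a fixed target vector) rather than game-playing fitness, and all names are illustrative rather than the authors' code:

    import java.util.Random;

    public class WeightedAveragingES {
        public static void main(String[] args) {
            Random rng = new Random();
            int n = 64, lambda = 10;
            double beta = 0.05, sigma = 0.1;
            double[] parent = new double[n];
            double[] target = new double[n];
            for (int i = 0; i < n; i++) target[i] = rng.nextGaussian();

            for (int gen = 0; gen < 1000; gen++) {
                double[] bestChild = null;
                double bestFitness = Double.NEGATIVE_INFINITY;
                for (int k = 0; k < lambda; k++) {
                    double[] child = parent.clone();
                    for (int i = 0; i < n; i++) child[i] += sigma * rng.nextGaussian();
                    double f = fitness(child, target);
                    if (f > bestFitness) { bestFitness = f; bestChild = child; }
                }
                // Equation (5): move the parent a small fraction beta towards the best child.
                for (int i = 0; i < n; i++) parent[i] += beta * (bestChild[i] - parent[i]);
            }
        }

        // Stand-in fitness: negative squared distance to the target vector.
        static double fitness(double[] w, double[] target) {
            double d = 0;
            for (int i = 0; i < w.length; i++) d += (w[i] - target[i]) * (w[i] - target[i]);
            return -d;
        }
    }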


Fig. 12  Co-evolutionary learning average performance (probability of a win) against the heuristic player, plotted against generation (The 1/45 000 term indicates the scaling on this axis; multiply each value by 45 000 to find the number of games played. The grey lines indicate one standard deviation from the mean, and this run used a (1, 10) ES[24].)

5.2 n-tuple architectures

Given that background, this brings us to the subject of architecture. In the field of CI and games, by far the most popular choice of architecture is the neural network. This comes in various forms, but perceptrons and multi-layer perceptrons are the most common. Also popular are more flexibly structured networks such as those created with NEAT, where there is no distinct layering, connections may evolve between any neurons, and the number of neurons is not fixed.

Very recently, the author has experimented with using n-tuple networks for this task. n-tuple networks date back to the late 1950s with the optical character recognition work of Bledsoe and Browning[33]; more detailed treatments of standard n-tuple systems can be found in [34, 35]. They work by randomly sampling the input space with a set of n points. If each sample point has m possible values, then the sample can be interpreted as an n-digit number in base m, and used as an index into an array of weights. The value function for a board is then calculated by summing all the table values indexed by all the n-tuples. The n-tuple works in a similar way to the kernel trick used in support vector machines (SVMs), and is also related to Kanerva's sparse distributed memory model: the low-dimensional board is projected into a high-dimensional sample space by the n-tuple indexing process.

This work is still in its initial stages, but has already proved to be remarkably successful. An n-tuple network trained with a few hundred self-play games was able to significantly outperform the 2006 CEC champion. Fig. 13 illustrates the system architecture, but shows only a single n-tuple. Each n-tuple specifies a set of n board locations, and samples them under all equivalent reflections and rotations. Fig. 13 shows a single 3-tuple, sampling 3 squares along an edge into the corner.

Fig. 13 The system architecture of the n-tuple-based value function, showing a single 3-tuple sampling at its eight equivalent positions (equivalent under reflection and rotation)

5.3 How it works

Fig. 13 illustrates the operation of a single n-tuple. The value function for a board is simply the sum of the values for each n-tuple; for convenient training with error back-propagation, the total output is put through a tanh function. Each n-tuple has an associated look-up table (LUT). The output for each n-tuple is calculated by summing the LUT values indexed by each of its equivalent sample positions (eight in the example). Each sample position is simply interpreted as an n-digit ternary (base three) number, since each square has three possible values (white, vacant, or black). The board digit values were chosen as white = 0, vacant = 1, black = 2. By inspecting the board in Fig. 13, it can be seen that each n-tuple sample position indexes the look-up table value pointed to by the corresponding arrow. These table values are shown after several hundred self-play games of training using TDL. The larger the black bar for a LUT entry, the more positive the value (the actual range for this figure was between about ±0.04). Some of these table entries have obvious interpretations: good for black means more positive, good for white means more negative. The LUT entry for zero corresponds to all sampled squares being white, and is the most negative value in the table; the LUT entry for twenty-six corresponds to all sampled squares being black, and is the most positive value in the table.

This can be expressed in the following equation:

$$v(b) = \sum_{d \in D(b)} l(d) \qquad (6)$$

where b is the board, v(b) is the calculated value of the board, d is a sampled n-digit index in the set D(b) of indices produced for board b, and l is the vector of values in the LUT. Given the explanation above of how the value function is calculated, the entries of the LUT can be seen as the weights of a single-layer perceptron. The indexing operation performs a non-linear mapping into a high-dimensional feature space, but that mapping is fixed for any particular choice of n-tuples.
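As an illustration of the indexing process, the sketch below implements the value of a single n-tuple over its symmetric placements (the board encoding follows the white = 0, vacant = 1, black = 2 convention above; the class and field names are assumptions for illustration, not the author's code):

    public class NTupleValue {
        // samplePoints[p][k] gives the board index (0..63) of the k-th sample point
        // of the p-th symmetric placement (eight placements: reflections and rotations).
        final int[][] samplePoints;
        final double[] lut;                 // 3^n entries, one per ternary index

        NTupleValue(int[][] samplePoints, int n) {
            this.samplePoints = samplePoints;
            this.lut = new double[(int) Math.pow(3, n)];
        }

        // Sum the LUT entries indexed by each equivalent placement of this n-tuple.
        double value(int[] board) {
            double sum = 0;
            for (int[] placement : samplePoints) {
                sum += lut[index(board, placement)];
            }
            return sum;
        }

        // Interpret the sampled squares as an n-digit ternary number.
        static int index(int[] board, int[] placement) {
            int index = 0;
            for (int square : placement) {
                index = index * 3 + board[square];
            }
            return index;
        }
    }

The full value function of equation (6) simply sums value(board) over all the n-tuples in the system, with tanh applied to the total for TDL training, as described above.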


Since a linear function is being learned, there are no local optima to contend with. The TDL training process, shown in the algorithm below, is very simple, and can be explained in two parts. The first part is how it is interfaced to the Othello game: the game engine calls a TDL update method for any TDL player after each move has been made. It calls "inGameUpdate" during a game, and "terminalUpdate" at the end of a game. To show just how simple the process is, the code for these two methods is given below.

Algorithm. The two main methods for TDL learning in Othello.

    public void inGameUpdate(double[] prev, double[] next) {
        // TD update during the game: move the value of the previous board state
        // towards the value of the current board state.
        double op = tanh(net.forward(prev));
        double tg = tanh(net.forward(next));
        double delta = alpha * (tg - op) * (1 - op * op);
        net.updateWeights(prev, delta);
    }

    public void terminalUpdate(double[] prev, double tg) {
        // At the end of a game, the target tg is the true game outcome.
        double op = tanh(net.forward(prev));
        double delta = alpha * (tg - op) * (1 - op * op);
        net.updateWeights(prev, delta);
    }

The variables are as follows: op is the output of the network; tg is the target value; alpha is the learning rate (set to 0.001); delta is the back-propagated error term; prev is the previous state of the board; next is the current state of the board; and net is an instance variable bound to some neural-network type of architecture (an n-tuple system in this case, which implements the same interface). The forward method calculates the output of the network given a board as input. The updateWeights method propagates an error term and makes weight updates based on it in conjunction with the board input. For the n-tuple system, the update method is very simple: while the value function is calculated by summing over all LUT entries indexed by the current board state, the update rule simply adds the error term δ to all LUT entries indexed by the current board:

$$l(d) \leftarrow l(d) + \delta, \qquad \forall d \in D(b). \qquad (7)$$
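Continuing the hypothetical NTupleValue sketch given earlier, update rule (7) amounts to a one-line loop over the symmetric placements of each n-tuple (again an illustrative sketch of a method belonging to that class, not the author's code):

        // Add the back-propagated error term delta to every LUT entry indexed by
        // the current board; this is exactly equation (7).
        void updateWeights(int[] board, double delta) {
            for (int[] placement : samplePoints) {
                lut[index(board, placement)] += delta;
            }
        }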

One of the best features of an n-tuple system is how it scales. Because of the indexing operation, the cost of evaluating the value function is independent of the size of the LUT. Therefore, although the LUT size grows exponentially with respect to n, the evaluation speed remains almost constant in n, and is linear in the number of n-tuples. Hence, n-tuple value functions with hundreds of thousands of weights can be calculated extremely quickly.

5.4 Choosing the sample points

The n positions can be arranged in a straight line, in a rectangle, or as random points scattered over the board. The results in this paper are based on random snakes: shapes constructed from random walks. Each n-tuple is constructed by choosing a random square on the board and taking a random walk from that point. At each step of the walk, the next square is chosen as one of the eight immediate neighbours of the current square. Each walk was for six steps, but only distinct squares are retained, so each randomly constructed n-tuple had between 2 and 6 sample points. The results in this paper are based on 30 such n-tuples. One would expect some n-tuples to be more useful than others, and there should be scope for evolving the n-tuple sample points while training the look-up table values using TDL.
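A sketch of this random snake construction is given below (a minimal illustration under the description above; the names are not taken from the author's code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;

    public class RandomSnake {
        // Build one n-tuple as a 6-step random walk over the 8 x 8 board,
        // keeping only the distinct squares visited.
        static int[] makeSnake(Random rng) {
            List<Integer> squares = new ArrayList<>();
            int row = rng.nextInt(8), col = rng.nextInt(8);
            squares.add(row * 8 + col);
            for (int step = 0; step < 6; step++) {
                int r, c;
                do {   // move to one of the eight immediate neighbours, staying on the board
                    r = row + rng.nextInt(3) - 1;
                    c = col + rng.nextInt(3) - 1;
                } while ((r == row && c == col) || r < 0 || r > 7 || c < 0 || c > 7);
                row = r; col = c;
                int square = row * 8 + col;
                if (!squares.contains(square)) squares.add(square);
            }
            int[] result = new int[squares.size()];
            for (int i = 0; i < result.length; i++) result[i] = squares.get(i);
            return result;
        }
    }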

Fig. 14 Variation in win ratio against a heuristic player (each sample point based on 100 games, 50 each as black and white) (The x-axis, nGames/25, indicates the scaling; multiply each value by 25 to find the number of games played. The y-axis shows the ratio of the number of wins to the total number of games played)

Fig. 14 shows how performance improves with the number of self-play games. After every 25 self-play games, performance was measured by playing 100 games against the standard heuristic player (50 each as black and white). Table 3 shows how performance against the 2006 CEC champion varies with the number of self-play games, denoted nsp (in this case playing 200 games against the champion, 100 each as black and as white). After the first 500 self-play games have been played, the champion is defeated in nearly 70% of games.

Table 3  Performance of the TDL n-tuple player versus the 2006 CEC champion over 200 games, sampled after varying nsp

nsp      Won    Drawn    Lost
250       89      5       106
500      135      6        59
750      142      5        53
1000     136      2        62
1250     142      5        53

Not only has the n-tuple-based player reached a higher level of performance than any player submitted to date, it has also done so much more quickly. While n-tuple systems are well known to a small base of appreciative users for their high speed and reasonable accuracy, their accuracy is usually not quite as high as that of the best neural networks or SVMs on many pattern recognition tasks. Game strategy learning differs from pattern recognition in important ways: the data is acquired through active exploration, and it is typically much noisier. This might be the reason why n-tuple systems perform so well on this task.



6 Future directions

6.1 Robotics and games

The simulation middleware that underlies many video games enables more lifelike games to be developed with greater ease. With high quality middleware to support game development, developers can concentrate their creativity on creating fun environments with appropriate interest and challenge, with the game-play arising naturally from the situation rather than having to be pre-programmed. For example, until recently the graphics for explosions had to be designed with a great deal of manual effort. With modern physics engines such as Ageia's PhysX, an explosion can be simulated as a particle system, and the graphics then arise directly from the physical model. The explosion can then depend very naturally on the amount of fuel in the tank, for example.

A major interest of our research group has been in car racing challenges. Competitive car driving is a problem of great practical importance, and has received some attention from the CI community. Most often, researchers have used various learning methods to develop controllers for car racing simulations or games[36, 37]. However, CI techniques have also been applied to physical car racing, famously by Thrun in the DARPA (Defense Advanced Research Projects Agency) Grand Challenge[38], but also by, e.g., Tanev et al.[39], who evolved controllers for radio-controlled toy cars. The author has recently been involved with work developing a robotic car racing platform2. Robotic car racing offers the same type of challenge whether done on full-size cars or model cars, but model cars are much cheaper. If the research can be done in simulation, then of course it is cheaper still, though ultimately less convincing, at least until transferred back to the real world.

As a starting point, take the challenge of driving quickly along a path or road using a computer vision system as the main source of input. The vision problem is of similar complexity whether it is tackled on a real-world car or using the video output of a modern console or personal computer (PC) racing game. This can be seen from the examples: Fig. 15 shows an image captured by the web-cam on our robotic model car (while in motion at about 10 miles per hour), and Fig. 16 shows a screen shot taken from Sega Rally running on a PC. While the vision problems are of similar complexity (depending on the details of the environment or track), the game version typically offers much more forgiving physics than the real world, especially when it comes to car damage! We are currently investigating running a car racing competition using an on-line commercial gaming lobby such as Xbox Live, where all the competitors are software agents processing the real-time video from the console output to drive the tracks as competitively as possible.

2 http://dces.essex.ac.uk/staff/lucas/roborace/roborace.html

Fig. 15  An image captured on-board our autonomous model car

Fig. 16  A screen shot from Sega Rally

6.2 Direct video input mode

Direct video input mode has much wider application than car racing. The idea is that the input to the software agent is simply the real-time video output from the game (audio could also be included). One of the main difficulties for academic research on CI methods for video games is the time taken to learn the application programming interface (API) needed to interact with a complex game. While we may look forward to more standardization of NPC APIs in the future, the possibility of circumventing API issues by giving the software agent exactly the same view as a human player offers tremendous challenges; but for some types of game (especially car racing games) it may be possible to make rapid progress. Indeed, results have already been obtained for a relatively simple 3D car racing game using CI methods[37].

More recently, this screen capture mode has been used to enable software agents to play Ms. Pac-Man without requiring any access to the internals of the software. Fig. 17 shows an agent under test controlling the Ms. Pac-Man character using the direct screen capture mode of operation. The Ms. Pac-Man game is shown to the right, the extracted game objects are shown to the left, and the top window shows the direction key currently selected by the agent. The screen is captured approximately 15 times per second.


The exact capture rate depends on the speed of the computer and the amount of processing performed by the agent control algorithm. After each screen is captured, some basic image processing routines (e.g. connected component analysis) are run to extract the main game objects, such as the Ms. Pac-Man agent, the ghosts, the power pills, and the food pills.

Fig. 17 An agent under test controlling the Ms. Pac-Man character via direct screen capture

These extracted objects are then given as input to the agent under test. In response, the agent generates a key event (one of the cursor keys) to control the movement of the Ms. Pac-Man character. Because of the inherent delays in the screen capture process and the image processing operations, the state of the game that the agent sees will often be out of date by the time it is delivered, and any further processing done by the agent only adds to the delay in sending the key event to the operating system to control the game. This delay is variable, but is often between 50 ms and 100 ms. Naive agent controllers that ignore this effect often spend much of their time oscillating around a junction while missing the turning! From a CI viewpoint this merely adds to the challenge of developing high quality controllers.

This task has been run as a competition at the 2007 IEEE CEC, and will also be run at the 2008 IEEE World Congress on Computational Intelligence (WCCI). So far, the leading entry has been one called Pacool, supplied by Abbas Mehrabian and Arian Khosravi, which has scored over 17 000 points when run in this mode. The current human world champion record stands at 933 580 (by Abdner Ashman, who cleared 133 screens in the process), so there is some way to go yet before software agents can compete with the best humans at this game.
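As an illustration of how such a screen-capture control loop can be put together, the sketch below uses the standard java.awt.Robot class for capture and key events; the window region, the object extraction step, and the stub policy are all placeholder assumptions rather than details of the actual competition software:

    import java.awt.Rectangle;
    import java.awt.Robot;
    import java.awt.event.KeyEvent;
    import java.awt.image.BufferedImage;

    public class ScreenCaptureAgent {
        public static void main(String[] args) throws Exception {
            Robot robot = new Robot();
            Rectangle gameWindow = new Rectangle(0, 0, 448, 576);   // assumed window position and size
            while (true) {
                BufferedImage frame = robot.createScreenCapture(gameWindow);
                // Placeholder: connected component analysis would extract Ms. Pac-Man,
                // the ghosts and the pills from the captured frame here.
                int key = chooseDirection(frame);                   // agent policy (stub)
                robot.keyPress(key);
                robot.keyRelease(key);
                Thread.sleep(60);                                   // roughly 15 captures per second
            }
        }

        // Stub policy: always press the left cursor key; a real agent would decide
        // based on the extracted game objects.
        static int chooseDirection(BufferedImage frame) {
            return KeyEvent.VK_LEFT;
        }
    }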

6.3 Competitions

There are many game-related competitions, and these are a great way to drive research forward. One of the most active fields of game research is Go, and the dedicated Go community is also one of the best at running regular competitions. Without this, radical new approaches such as MoGo, which really was a major departure from conventional computer Go wisdom, would most likely have taken much longer to become established.


Well-designed competitions are simply the best way to establish which techniques work best. Some games require interesting AI rather than competitive AI. Interesting behaviour is in the eye of the beholder: it is subjective and naturally hard to measure, though statistically there may be much agreement about what is interesting and what is not. Other games (especially real-time strategy games) do require smarter AI, and any games company wishing to discover the best AI for its game might find that publishing the appropriate API and running open competitions offers a very cost-effective way to do this. This can be done for the entire behaviour of an agent, or to measure performance on particular tasks. It also enables a rapid transfer of technology between academia and the games industry. For standard tasks, it will be relatively easy to test the various performance aspects of a given algorithm or component. The use of standard benchmarks also makes entry into this research area attractive for academics. This has already been seen for simpler game environments such as the Stanford general game playing competition3, the Essex Othello neural network server4, project Hoshimi5, and simulated car racing6. There are many interesting challenges and opportunities involved in making these competitions work for more complex games, including the standardization of sensory inputs, body parts, control systems, and actuators.

There are also great opportunities for companies wishing to outsource advanced software agent development work when it is unclear who may be the best provider for the required system. In such cases, a competition can be run where the reward goes to the entrants who provide the best solutions. This is essentially what DARPA did by organizing the Grand Challenge, but it can also be done using the web for smaller scale challenges.

3 http://games.stanford.edu
4 http://algoval.essex.ac.uk:8080/othello/html/Othello.html
5 http://www.project-hoshimi.com
6 http://julian.togelius.com/cig2007competition

7 Conclusions

There exists a diverse range of game genres on which to test and apply CI methods, and this paper has mentioned only a few examples. CI methods can be used to develop competitive agents that successfully control non-player characters in these games. However, the studies of the Mountain Car problem, of point-to-point car racing, and of Othello reported here led to some remarkable observations. In the case of the Mountain Car problem, we showed how certain problem setups could render the problem trivial, so that running an evolutionary algorithm often revealed near-optimal solutions in the initial randomly constructed population. In point-to-point car racing, we observed that learning performance could be severely impaired by the presence of an unnecessary input. In the case of Othello, there were two main findings. First, standard co-evolution failed to learn good quality WPCs; however, by using parent-child weighted averaging we were able to learn high quality WPCs, the best of which even outperformed the standard heuristic weights.


Second, by adopting a radically different type of neural network, an n-tuple system, we were able to easily outperform the best performing MLPs for this problem. In summary, much experimentation, and perhaps even rather novel techniques, may be necessary to get the best performance from CI methods. This makes the area very interesting and challenging to research.

References

[1] J. Schaeffer, N. Burch, Y. Björnsson, A. Kishimoto, M. Müller, R. Lake, P. Lu, S. Sutphen. Checkers Is Solved. Science, vol. 317, no. 5844, pp. 1518–1522, 2007.
[2] K. Chellapilla, D. B. Fogel. Evolving an Expert Checkers Playing Program without Using Human Expertise. IEEE Transactions on Evolutionary Computation, vol. 5, no. 4, pp. 422–428, 2001.
[3] D. B. Fogel, T. J. Hays, S. L. Hahn, J. Quon. An Evolutionary Self-learning Chess Program. Proceedings of the IEEE, vol. 92, no. 12, pp. 1947–1954, 2004.
[4] S. Gelly, Y. Wang, R. Munos, O. Teytaud. Modification of UCT with Patterns in Monte-Carlo Go, Technical Report 6062, INRIA, France, 2006.
[5] Y. Wang, S. Gelly. Modifications of UCT and Sequence-like Simulations for Monte-Carlo Go. In Proceedings of IEEE Symposium on Computational Intelligence and Games, IEEE Press, pp. 175–182, 2007.
[6] E. Charniak. Statistical Language Learning, MIT Press, Cambridge, Massachusetts, USA, 1996.
[7] S. Colton, P. Cowling, S. M. Lucas. An Industry/Academia Research Network on Artificial Intelligence and Games Technologies, Technical Report EP/F033834, EPSRC, Swindon, UK, 2007.
[8] M. Buro. ProbCut: An Effective Selective Extension of the Alpha-Beta Algorithm. ICCA Journal, vol. 18, no. 2, pp. 71–76, 1995.
[9] T. Gosling, N. Jin, E. Tsang. Games, Supply Chains and Automatic Strategy Discovery Using Evolutionary Computation. Handbook of Research on Nature-inspired Computing for Economics and Management, J. P. Rennard (ed.), vol. 2, pp. 572–588, 2007.
[10] K. O. Stanley, B. D. Bryant, R. Miikkulainen. Real-time Neuroevolution in the NERO Video Game. IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 653–668, 2005.
[11] R. M. Axelrod. The Evolution of Cooperation, Basic Books Inc., New York, USA, 1984.
[12] N. Jin, E. Tsang. Co-adaptive Strategies for Sequential Bargaining Problems with Discount Factors and Outside Options. In Proceedings of Congress on Evolutionary Computation, Vancouver, BC, Canada, pp. 2149–2156, 2006.
[13] D. Cliff. Minimal-intelligence Agents for Bargaining Behaviors in Market-based Environments, Technical Report HPL-97-91 970811, Hewlett Packard Laboratories, USA, 1997.
[14] D. K. Gode, S. Sunder. Allocative Efficiency of Markets with Zero-intelligence Traders: Market as a Partial Substitute for Individual Rationality. The Journal of Political Economy, vol. 101, no. 1, pp. 119–137, 1993.
[15] D. Cliff. Explorations in Evolutionary Design of Online Auction Market Mechanisms. Journal of Electronic Commerce Research and Applications, vol. 2, no. 2, pp. 162–175, 2003.
[16] A. Moore. Efficient Memory-based Learning for Robot Control, Ph. D. dissertation, University of Cambridge, UK, 1990.
[17] R. Sutton, A. Barto. Introduction to Reinforcement Learning, MIT Press, Cambridge, MA, USA, 1998.
[18] S. Whiteson, P. Stone. Evolutionary Function Approximation for Reinforcement Learning. Journal of Machine Learning Research, vol. 7, pp. 877–917, 2006.
[19] G. Tesauro. Temporal Difference Learning and TD-Gammon. Communications of the ACM, vol. 38, no. 3, pp. 58–68, 1995.
[20] J. B. Pollack, A. D. Blair. Co-evolution in the Successful Learning of Backgammon Strategy. Machine Learning, vol. 32, no. 3, pp. 225–240, 1998.
[21] G. Tesauro. Comments on "Co-evolution in the Successful Learning of Backgammon Strategy". Machine Learning, vol. 32, no. 3, pp. 241–243, 1998.
[22] P. J. Darwen. Why Co-evolution Beats Temporal Difference Learning at Backgammon for a Linear Architecture, but not a Non-linear Architecture. In Proceedings of Congress on Evolutionary Computation, IEEE Press, vol. 2, pp. 1003–1010, 2001.
[23] T. P. Runarsson, S. M. Lucas. Co-evolution versus Self-play Temporal Difference Learning for Acquiring Position Evaluation in Small-board Go. IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 628–640, 2005.
[24] S. M. Lucas, T. P. Runarsson. Temporal Difference Learning versus Co-evolution for Acquiring Othello Position Evaluation. In Proceedings of IEEE Symposium on Computational Intelligence and Games, Reno/Lake Tahoe, USA, pp. 53–59, 2006.
[25] C. Kotnik, J. Kalita. The Significance of Temporal-difference Learning in Self-play Training: TD-rummy versus EVO-rummy. In Proceedings of the International Conference on Machine Learning, Washington D.C., USA, pp. 369–375, 2003.
[26] M. E. Taylor, S. Whiteson, P. Stone. Comparing Evolutionary and Temporal Difference Methods in a Reinforcement Learning Domain. In Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, Seattle, Washington, USA, pp. 1321–1328, 2006.
[27] F. Gomez, J. Schmidhuber, R. Miikkulainen. Efficient Non-linear Control through Neuroevolution. In Proceedings of the European Conference on Machine Learning, Lecture Notes in Computer Science, vol. 4212, pp. 654–662, 2006.
[28] S. M. Lucas, J. Togelius. Point-to-point Car Racing: An Initial Study of Evolution versus Temporal Difference Learning. In Proceedings of IEEE Symposium on Computational Intelligence and Games, Toronto, Canada, pp. 260–267, 2007.
[29] K.-F. Lee, S. Mahajan. A Pattern Classification Approach to Evaluation Function Learning. Artificial Intelligence, vol. 36, no. 1, pp. 1–25, 1988.
[30] K.-F. Lee, S. Mahajan. The Development of a World Class Othello Program. Artificial Intelligence, vol. 43, no. 1, pp. 21–36, 1990.
[31] M. Buro. LOGISTELLO – A Strong Learning Othello Program. NEC Research Institute, Princeton, NJ, USA, 1997, [Online], Available: http://www.cs.ualberta.ca/~mburo/ps/log-overview.ps.gz.
[32] S. Y. Chong, M. K. Tan, J. D. White. Observing the Evolution of Neural Networks Learning to Play the Game of Othello. IEEE Transactions on Evolutionary Computation, vol. 9, no. 3, pp. 240–251, 2005.
[33] W. W. Bledsoe, I. Browning. Pattern Recognition and Reading by Machine. In Proceedings of the Eastern Joint Computer Conference, pp. 225–232, 1959.
[34] J. Ullman. Experiments with the n-tuple Method of Pattern Recognition. IEEE Transactions on Computers, vol. 18, no. 12, pp. 1135–1137, 1969.
[35] R. Rohwer, M. Morciniec. A Theoretical and Experimental Account of n-tuple Classifier Performance. Neural Computation, vol. 8, no. 3, pp. 629–642, 1996.
[36] B. Chaperot, C. Fyfe. Improving Artificial Intelligence in a Motocross Game. In Proceedings of IEEE Symposium on Computational Intelligence and Games, Reno/Lake Tahoe, USA, pp. 181–186, 2006.
[37] D. Floreano, T. Kato, D. Marocco, E. Sauser. Coevolution of Active Vision and Feature Selection. Biological Cybernetics, vol. 90, no. 3, pp. 218–228, 2004.
[38] S. Thrun, M. Montemerlo, H. Dahlkamp, D. Stavens, A. Aron, J. Diebel, P. Fong, J. Gale, M. Halpenny, G. Hoffmann, K. Lau, C. Oakley, M. Palatucci, V. Pratt, P. Stang, S. Strohband, C. Dupont, L.-E. Jendrossek, C. Koelen, C. Markey, C. Rummel, J. van Niekerk, E. Jensen, P. Alessandrini, G. Bradski, B. Davies, S. Ettinger, A. Kaehler, A. Nefian, P. Mahoney. The Robot that Won the DARPA Grand Challenge. Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.
[39] I. Tanev, M. Joachimczak, H. Hemmi, K. Shimohara. Evolution of the Driving Styles of Anticipatory Agent Remotely Operating a Scaled Model of Racing Car. In Proceedings of IEEE Congress on Evolutionary Computation, vol. 2, pp. 1891–1898, 2005.

Simon M. Lucas received the B. Sc. degree in computer systems engineering from the University of Kent, UK, in 1986, and the Ph. D. degree from the University of Southampton, UK, in 1991. He worked for a year as a research engineer at GEC Avionics. After a one-year post-doctoral research fellowship (funded by British Telecom), he was appointed to a lectureship at the University of Essex in 1992, and is currently a reader in computer science. He was chair of IAPR Technical Committee 5 on Benchmarking and Software. He is the inventor of the scanning n-tuple classifier, a fast and accurate OCR method. He was appointed inaugural chair of the IEEE CIS Games Technical Committee in July 2006, has been competitions chair for many international conferences, and co-chaired the first IEEE Symposium on Computational Intelligence and Games in 2005. He was program chair for the IEEE Congress on Evolutionary Computation (CEC) in 2006, program co-chair for the IEEE Symposium on Computational Intelligence and Games (CIG) in 2007, and will be program co-chair for the International Conference on Parallel Problem Solving from Nature in 2008. He is an associate editor of IEEE Transactions on Evolutionary Computation and the Journal of Memetic Computing. He was an invited keynote speaker at the IEEE Congress on Evolutionary Computation in 2007. His research interests include evolutionary computation, games, and pattern recognition, and he has published widely in these fields, with over 120 refereed papers, mostly in leading international conferences and journals.