Machine Learning-based gameplay

BABEŞ-BOLYAI UNIVERSITY OF CLUJ-NAPOCA
FACULTY OF MATHEMATICS AND COMPUTER SCIENCE
COMPUTER SCIENCE SPECIALIZATION, ROMANIAN LINE OF STUDY

DIPLOMA THESIS

Machine Learning-based gameplay

Scientific supervisor: Prof. univ. dr. Czibula Gabriela
Graduate: Cîmpean Alexandru

2012

Table of contents

Introduction
I. Game perspective in Artificial Intelligence
  1.1 A general game description
  1.2 The context of video games
  1.3 Artificial intelligence in video games
II. Learning to play
  2.1 Software agents: autonomous programs
  2.2 Machine learning: making computers act
  2.3 Embedding machine learning in computer games
    2.3.1 The history of machine learning and games
    2.3.2 What do the games have to learn?
    2.3.3 Creating a Learning Agent
    2.3.4 Problems with Learning
III. Relevant applications of machine learning in games
  3.1 TD-Gammon
  3.2 Samuel's checkers program
  3.3 NeuroChess
  3.4 SAL
IV. “A Tale of Two Catapults” – machine learning techniques applied in games
  4.1 Problem statement
  4.2 Motivation
  4.3 Analysis and design
    4.3.1 Problem analysis
    4.3.2 Solution’s design
  4.4 Implementation
  4.5 User manual
  4.6 Results obtained
  4.7 Future work
Conclusions
Bibliography


Introduction

Since the beginning of computer games, whether they were inspired by traditional games such as chess, backgammon, or checkers, or whether they exploited the new opportunities of virtual worlds and created new and innovative genres, the problem of an artificial player has been present. Players wanted a program that knows how to play the game and can play with them or against them, with skills similar to their own. The first games included a component that played against the human player, but it proved very poor and limited compared to the players' demands. Some games also had a collaborative component that helped the player in situations that mostly could not be handled alone.

To build such programs, developers drew inspiration from a classical branch of artificial intelligence, namely game playing based on game theory: the programs used various algorithms to choose the best moves based on heuristics and classical moves. This approach proved to be a good one until its limits were discovered. It requires a well-established strategy, and when the state space is large the algorithms run very slowly. Another shortcoming is the incapacity to adapt to new situations, which makes a radical change of the game parameters difficult or even impossible.

Later, the concept of an agent, which we will explain in this work, introduced new possibilities for game-playing programs. Artificial intelligence in video games thus grew according to the need for better non-human players and split into multiple directions: traditional game playing, mimicking intelligence and human player behavior, and approaches that adapt methods from another branch of artificial intelligence, machine learning. Machine learning is a concept we will explain further in this work; applied to games, it means that an agent tries to learn a game without knowing its rules and without being given a strategy, based only on the feedback it receives after playing repeated games with itself.


In the first chapter of this work we present theoretical aspects of video games in general and of artificial intelligence in video games, as well as the traditional techniques for building a game-playing program. In the second chapter we present other techniques of artificial intelligence. First we describe agents and their characteristics: what an agent needs in order to be considered intelligent and which features are required; we classify agents according to different criteria and give examples of some of them. Further in this chapter we present the concept of machine learning, describe its subtypes and characteristics, and present the steps needed to build a learning algorithm; we also describe the concept of learning to play, give examples of learning methods used in game play, and see how they work. In the third chapter of this work we discuss recent work in this domain and analyze some of the most relevant learning games together with their characteristics. The fourth chapter is a practical application of machine learning in games. The game is called “A Tale of Two Catapults”: the player controls a catapult and, in order to win, has to hit the opponent's catapult. The player chooses the angle and the initial velocity with which a projectile is shot toward the opponent, who is situated at some distance; furthermore, the wind can blow from any direction and influence the trajectory of the projectile. To win the game the player has to hit the opponent three times. The opponent is in fact a program designed to learn to shoot after playing many games with itself. In the fourth chapter we present all the steps in developing such a program, including the design, the algorithms and methods used, the difficulties met, and the results obtained.


I. Game perspective in Artificial Intelligence

1.1 A general game description

Below we have some definitions of games from different theoreticians and specialists in games:

"A game is a system in which players engage in an artificial conflict, defined by rules, that results in a quantifiable outcome." (Katie Salen and Eric Zimmerman)[1]



"A game is a form of art in which participants, termed players, make decisions in order to manage resources through game tokens in the pursuit of a goal." (Greg Costikyan)[2]



"A game is an activity among two or more independent decision- makers seeking to achieve their objectives in some limiting context." (Clark C. Abt)[3]



"At its most elementary level then we can define game as an exercise of voluntary control systems in which there is an opposition between forces, confined by a procedure and rules in order to produce a disequilibrial outcome." (Elliot Avedon and Brian SuttonSmith)[4]



"A game is a form of play with goals and structure." (Kevin J. Maroney)[5]

1.2 The context of video games

There are two principal reasons to continue to do research on games ... First, human fascination with game playing is long-standing and pervasive. Anthropologists have catalogued popular games in almost every culture.... Games intrigue us because they address important cognitive functions.... The second reason ... is that some difficult games remain to be won, games that people play very well but computers do not. These games clarify what our current approach lacks. They set challenges for us to meet, and they promise ample rewards [6]. - Susan L. Epstein, Game Playing: The Next Moves


Computer game designer Chris Crawford attempted to define the term game using a series of dichotomies: 

Creative expression is art if made for its own beauty, and entertainment if made for money.



A piece of entertainment is a plaything if it is interactive. Movies and books are cited as examples of non-interactive entertainment.



If no goals are associated with a plaything, it is a toy. (Crawford notes that by his definition, (a) a toy can become a game element if the player makes up rules, and (b) The Sims and SimCity are toys, not games.) If it has goals, a plaything is a challenge.



If a challenge has no "active agent against whom you compete," it is a puzzle; if there is one, it is a conflict. (Crawford admits that this is a subjective test. Video games with noticeably algorithmic artificial intelligence can be played as puzzles; these include the patterns used to evade ghosts in Pac-Man.)



Finally, if the player can only outperform the opponent, but not attack them to interfere with their performance, the conflict is a competition. (Competitions include racing and figure skating.) However, if attacks are allowed, then the conflict qualifies as a game.[7]

Video Games encompass a range of electronic games, including coin-operated arcade games, console and cartridge games, various handheld electronics, and games on diskette and CD-ROM ranging in sophistication from simple geometric shapes to virtual reality programs with movielike qualities. [8]

Most games fall within a particular category. Some blend different gaming styles and thus could appear under more than one category simultaneously. And others pioneer new approaches to electronic entertainment; those often fall outside of any pre-conceived genre. If such a title becomes popular enough that others try to duplicate the experience, a new genre comes into being.[9]

Below is a list of some of the most common video game genres with a brief description and some examples of games that fit within that category. This list is not comprehensive and is limited to larger classifications. It is easy to classify particular titles much more narrowly and, thus, create dozens of genres and/or sub-genres. This is an attempt to give a broader perspective of types of video games.[9]

Shooter:

One of the oldest genres of video game is the classic shooter. It has roots in the early 60s with Steve Russell's Spacewar! Shooters are games that require the player to blow away enemies or objects in order to survive and continue gameplay. They usually fall into one of two categories: 1) horizontal, or 2) vertical. However, like Spacewar!, Star Castle, and Asteroids, there are shooters that are neither horizontal nor vertical. These involve moving around the screen and shooting in whatever direction necessary to keep from being destroyed. Other classic examples include Defender, Galaga, R-Type, Phoenix, Space Invaders, Tempest, Xevious, and Zaxxon.

First-Person-Shooter (or FPS):

This is an example of a sub-genre that has grown enough to become its own genre. In fact, because of the prevalence of these games, many people use the term "shooter" to refer to first-person-shooters. These games are real-time, fast-paced action games in which the player navigates an environment from a first-person perspective and, usually, blows everything and everyone away whenever possible. Though Wolfenstein 3D is regarded as the first successful example of this genre, it wasn't until the release of Doom that people began to recognize the true potential of this type of gaming. Doom enabled multiple game players to share in the same game simultaneously via modem and LAN. This would become the standard of this genre, opening the game format up to multi-player deathmatches that would become so important to the format that some titles put little effort into story and the single-player experience in general (i.e., Unreal Tournament and Quake III). Though this is a relatively new genre (since the early 1990s), it has grown in popularity. Examples of first-person-shooter franchises include Wolfenstein 3D, Doom, Duke Nukem 3D, Descent, Marathon, GoldenEye, Halo, Quake, and Time Splitters.




Adventure:

Another of the first video game genres, especially on computer platforms, was the adventure game. These were initially text-based games like Will Crowther's Colossal Cave and the original Zork games. However, as the power of the gaming systems grew, developers tried to tap into the visual capabilities of each consecutive platform. The Atari VCS offered a game entitled Adventure. Roberta Williams began developing the King's Quest series for Sierra Online in an attempt to add interactive graphics and point-and-click functionality to the more puzzle-oriented traditional text-based adventure. There has always been a strong following for this genre because of the challenge of puzzle-solving and the general lack of violence. This has also made it popular for many non-traditional gaming demographics. In recent years, LucasArts and Cyan have been known for their contributions to the adventure genre. Other examples of adventure franchises include Gabriel Knight, Indiana Jones, Maniac Mansion, Monkey Island, Myst, Police Quest, and Syberia.

Platform:

It is believed that the platform genre began in 1981 with the release of the games Donkey Kong and Space Panic. Games within this genre are usually identified by navigating environments that require timing and jumping in order to reach a destination while avoiding and/or disposing of enemies. Many of these, like Donkey Kong, have a series of screens, each with its own individual pattern of challenges. As companies began to develop platform games for home consoles and computers instead of arcade machines (i.e. Super Mario Bros. for the Famicom and Nintendo Entertainment System), they took advantage of the evolving processors and greater memory capacity by transcending individual screens and utilizing actively side-scrolling worlds. This evolutionary step in platform games moved them closer to immersive stories rather than challenging puzzles. Platform video games continued to evolve as gaming became more 3D. One of the greatest 3D platform games was introduced with the launch of the Nintendo 64 and was called Super Mario 64. Examples of 2D screen-based platform franchises include Bubble Bobble, Burgertime, Donkey Kong, Lode Runner, Mario Bros., and Space Panic. Examples of 2D scrolling platform franchises include Bonk, Donkey Kong Country, Sonic the Hedgehog, Super Mario Bros., and Vectorman. Examples of 3D platform franchises include Crash Bandicoot, Pac-Man World, Spyro the Dragon, and the aforementioned Super Mario 64.

Role-Playing Games (RPGs):

Evolving from pen-and-paper games like Dungeons and Dragons, RPGs are a special type of adventure game that usually incorporate three major elements: 1) a specific quest, 2) a process for evolving a character through experience to improve his/her ability to handle deadlier foes, 3) the careful acquisition and management of inventory items for the quest (i.e., weapons, armor, healing items, food, and tools). Having said that, these games still have many variations and appearances.

Puzzle:

In many ways, puzzle video games are not dissimilar from traditional puzzles. What they offer are unique environments that are not as easily introduced in one's living room. For example, Wetrix enables the player to build up a series of walls that would be able to contain a deluge of water when it falls. Successful completion of a level involves capturing enough water. Other examples include Tetris, Intelligent Qube, Puzzle Bobble, Puyo Puyo, Devil Dice, and Mercury. 

Simulations:

By their nature, simulations are attempts to accurately re-create an experience. These can be in the form of management simulations like SimCity and Theme Hospital, or more hands-on like Microsoft Flight Simulator or Gran Turismo.

Strategy/Tactics:

Like simulations, strategy/tactics games attempt to capture a sense of realism for the game player to experience. However, these titles are often turn-based as opposed to real-time and they give the player a greater sense of specific control over a situation. Franchises that fall into this genre include Tactics Ogre, Command and Conquer, Final Fantasy Tactics, and Worms.




Sports

As you can imagine, sports games are those that simulate the playing of sports. Many of these have incorporated novel aspects beyond the games themselves. For example, most football video games like the Madden series enable the player to create and customize teams and play them for an entire season. Furthermore, many sports games include management elements beyond the games themselves. There is quite a bit of variety in this genre for fans of the games, the players, and the behind the scenes responsibilities of owning a team. 

Fighting:

These titles pit player against player (usually 2 players head-to-head) and involve one triumphing over the other. Many of these games include a single player mode, but the real draw to this genre is the ability to demonstrate one's gaming prowess against a friend. Examples of franchises in this genre include Street Fighter, Soul Calibur, Mortal Kombat, Tekken, Virtua Fighter, Dead or Alive, King of Fighters, and Bloody Roar. 

Dance/Rhythm:

Dance Dance Revolution is probably the single largest franchise in this genre. Of the rest, many require a specialized controller like DDR, but several don't. This grouping of games is differentiated by the timed elements usually synched to music somehow. Other good examples of this form include Parappa the Rapper, Bust a Groove, Gitaroo Man, Space Channel 5, Frequency, Beatmania, Para Para Paradise, Donkey Konga, and Eyetoy Groove.



Survival Horror:

As the name suggests, these titles are an interactive evolutionary step of the horror genre. The main gameplay mechanic in these is to "survive" an environment that includes fantastic or supernatural elements that are very frightening and often disturbing. Many of these titles are rated mature because they are not intended for younger audiences and often include graphic scenes.



Hybrids:

It's important to recognize that many games are not limited to a single genre. Some are the combination of two or more game types. In fact, as gaming evolves, we see lines blurred between genres more frequently than not. Since the introduction of 3D gaming, the action/adventure genre has grown dramatically. It is practically a catch-all category that incorporates 3D games with real-time combat and puzzle-solving in a fairly cohesive storyline. Many of these games are also first-person-shooters. Some are 3D platform titles. And most survival horror titles qualify as Action/Adventure games too. Another example of a hybrid is Myst. It is both an adventure game and a puzzle game. However, it is most certainly not an Action/Adventure game. [9]

Below we have three examples of games that show the evolution of video games throughout the decades:
Figure 1 – OXO (Noughts And Crosses) – made in 1952 by Alexander S. Douglas
Figure 2 – Super Mario Bros. – made in 1985 by Nintendo Co., Ltd
Figure 3 – Assassin’s Creed – made in 2007 by Ubisoft Entertainment S.A.


Figure 1 - OXO

Figure 2 - Super Mario Bros.

Figure 3 - Assassin's Creed


1.3 Artificial intelligence in video games

Artificial intelligence in video games can be implemented using two types of algorithms: machine learning, which we will discuss in the next chapter, and search algorithms. Next we will give a classification of search algorithms and describe them.

Search in artificial intelligence

Search plays a major role in solving many Artificial Intelligence (AI) problems; it is a universal problem-solving mechanism in AI. In many problems the sequence of steps required to reach a solution is not known in advance but must be determined by systematic trial-and-error exploration of alternatives.

The problems that are addressed by AI search algorithms fall into three general classes: single-agent path-finding problems, two-player games, and constraint-satisfaction problems [10].

Path-finding problems

Classic examples of path-finding problems in the AI literature are sliding-tile puzzles, Rubik's Cube, and theorem proving. The sliding-tile puzzles are common test beds for research in AI search algorithms, as they are very simple to represent and manipulate. Real-world problems include the traveling salesman problem, vehicle navigation, and the wiring of VLSI circuits. In each case, the task is to find a sequence of operations that map an initial state to a goal state.

Two-player games

Two-player games here are two-player, perfect-information games; chess, checkers, and Othello are typical examples.




Constraint-satisfaction problems

The Eight Queens problem is the classic example: the task is to place eight queens on an 8×8 chessboard such that no two queens are on the same row, column, or diagonal. Real-world examples of constraint-satisfaction problems are planning and scheduling applications.
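As an illustration of this class of problems, below is a small Python sketch, not part of the original thesis, showing how the Eight Queens constraint-satisfaction problem can be solved by a simple backtracking search that places one queen per row and abandons a partial placement as soon as a constraint is violated.

```python
# Illustrative sketch (not from the thesis): Eight Queens solved by backtracking.
def solve_queens(n=8):
    def safe(placed, col):
        # A queen in the next row conflicts if it shares a column or a diagonal.
        row = len(placed)
        return all(col != c and abs(col - c) != row - r
                   for r, c in enumerate(placed))

    def backtrack(placed):
        if len(placed) == n:          # all rows filled: a solution was found
            return placed
        for col in range(n):          # try every column for the next row
            if safe(placed, col):
                result = backtrack(placed + [col])
                if result is not None:
                    return result
        return None                   # dead end: undo and try another column

    return backtrack([])

print(solve_queens())                 # e.g. [0, 4, 7, 5, 2, 6, 1, 3]
```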

Problem space

A problem space is a set of states together with the connections between them, used to represent the problem. A problem-space graph is used to represent a problem space: the states are represented by the nodes of the graph, and the operators by edges between nodes. Although most problem spaces correspond to graphs with more than one path between a pair of nodes, for simplicity they are often represented as trees, where the initial state is the root of the tree. The cost of this simplification is that any state that can be reached by two different paths will be represented by duplicate nodes, thereby increasing the tree size. The benefit of using a tree is that the absence of cycles greatly simplifies many search algorithms.

One feature that distinguishes AI search algorithms from other graph-searching algorithms is the size of the graph involved. For example, the entire chess graph is estimated to contain over 10^40 nodes. Even a simple problem like the twenty-four puzzle contains almost 10^25 nodes. As a result, the problem-space graphs of AI problems are never represented explicitly by listing each state, but rather are implicitly represented by specifying an initial state and a set of operators that generate new states from existing states. Moreover, the size of an AI problem is rarely expressed as the number of nodes in its problem-space graph. The two parameters of a search tree that determine the efficiency of various search algorithms are its branching factor and its solution depth. The branching factor is simply the average number of children of a given node. For example, in the eight-puzzle the average branching factor is √3, or about 1.732. The solution depth of a problem instance is the length of a shortest path from the initial state to a goal state, that is, the length of a shortest sequence of operators that solve the problem. [10]
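To make the idea of an implicitly represented problem space concrete, here is a small Python sketch (added for illustration, not taken from the thesis) for the eight-puzzle: the space is never listed explicitly; only an initial state and a set of operators that generate successor states are specified.

```python
# Illustrative sketch: an implicit problem space for the eight-puzzle,
# given only as an initial state plus operators (0 marks the blank tile).
MOVES = {'up': -3, 'down': 3, 'left': -1, 'right': 1}

def successors(state):
    """Yield (operator, new_state) pairs reachable from a 3x3 puzzle state."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for move, delta in MOVES.items():
        if (move == 'up' and row == 0) or (move == 'down' and row == 2) \
                or (move == 'left' and col == 0) or (move == 'right' and col == 2):
            continue                          # operator not applicable in this state
        new = list(state)
        new[blank], new[blank + delta] = new[blank + delta], new[blank]
        yield move, tuple(new)

initial = (1, 2, 3, 4, 0, 5, 6, 7, 8)
print([move for move, _ in successors(initial)])   # ['up', 'down', 'left', 'right']
```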


Classes of search algorithms 

For virtual search spaces

Algorithms for searching virtual spaces are used in constraint-satisfaction problems, where the goal is to find a set of value assignments to certain variables that will satisfy specific mathematical equations and inequalities. They are also used when the goal is to find a variable assignment that will maximize or minimize a certain function of those variables. Algorithms for these problems include the basic brute-force search (also called "naïve" or "uninformed" search), and a variety of heuristics that try to exploit partial knowledge about the structure of the space, such as linear relaxation, constraint generation, and constraint propagation.

An important subclass are the local search methods, which view the elements of the search space as the vertices of a graph, with edges defined by a set of heuristics applicable to the case, and scan the space by moving from item to item along the edges, for example according to the steepest descent or best-first criterion, or in a stochastic search. This category includes a great variety of general metaheuristic methods, such as simulated annealing, tabu search, A-teams, and genetic programming, that combine arbitrary heuristics in specific ways. This class also includes various tree search algorithms, which view the elements as vertices of a tree and traverse that tree in some special order. Examples of the latter include exhaustive methods such as depth-first search and breadth-first search, as well as various heuristic-based search-tree pruning methods such as backtracking and branch and bound. Unlike general metaheuristics, which at best work only in a probabilistic sense, many of these tree-search methods are guaranteed to find the exact or optimal solution, if given enough time.

Another important sub-class consists of algorithms for exploring the game tree of multiple-player games, such as chess or backgammon, whose nodes consist of all possible game situations that could result from the current situation. The goal in these problems is to find the move that provides the best chance of a win, taking into account all possible moves of the opponent(s). Similar problems occur when humans or machines have to make successive decisions whose outcomes are not entirely under one's control, such as in robot guidance or in marketing, financial or military strategy planning. This kind of problem - combinatorial search - has been extensively studied in the context of artificial intelligence. Examples of algorithms for this class are the minimax algorithm, alpha-beta pruning, and the A* algorithm.[11]
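As a concrete example of this last class, the sketch below (Python, added for illustration and not part of the thesis) shows minimax search with alpha-beta pruning; the `children` and `evaluate` arguments are hypothetical game-specific hooks that the caller would supply.

```python
# Illustrative sketch: minimax game-tree search with alpha-beta pruning.
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    succ = children(state)
    if depth == 0 or not succ:                    # frontier: use the static evaluation
        return evaluate(state)
    if maximizing:
        value = float('-inf')
        for child in succ:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:                     # the opponent will avoid this branch
                break
        return value
    value = float('inf')
    for child in succ:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# Toy usage on a hand-built two-level tree whose leaves evaluate to themselves.
tree = {'A': ['B', 'C'], 'B': [3, 5], 'C': [2, 9]}
best = alphabeta('A', 2, float('-inf'), float('inf'), True,
                 lambda s: tree.get(s, []), lambda s: s)
print(best)   # 3: max(min(3, 5), min(2, 9))
```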

For sub-structures of a given structure

The name combinatorial search is generally used for algorithms that look for a specific substructure of a given discrete structure, such as a graph, a string, a finite group, and so on. The term combinatorial optimization is typically used when the goal is to find a sub-structure with a maximum (or minimum) value of some parameter. (Since the sub-structure is usually represented in the computer by a set of integer variables with constraints, these problems can be viewed as special cases of constraint satisfaction or discrete optimization; but they are usually formulated and solved in a more abstract setting where the internal representation is not explicitly mentioned.) An important and extensively studied subclass are the graph algorithms, in particular graph traversal algorithms, for finding specific sub-structures in a given graph — such as subgraphs, paths, circuits, and so on. Examples include Dijkstra's algorithm, Kruskal's algorithm, the nearest neighbour algorithm, and Prim's algorithm. Another important subclass of this category are the string searching algorithms, that search for patterns within strings. Two famous examples are the Boyer–Moore and Knuth–Morris–Pratt algorithms, and several algorithms based on the suffix tree data structure. [11]


II. Learning to play

2.1 Software agents: autonomous programs

There have been many debates about how a software agent should be defined, and a universally accepted definition has not been given. Below are some definitions of software agents given by theoreticians and practitioners in this domain.

Software agents are computational systems that inhabit some complex dynamic environment, sense and act autonomously in this environment, and by doing so realize a set of goals or tasks for which they are designed (P. Maes) [12].

Russell and Norvig point out: “the notion of an agent is meant to be a tool for analyzing systems, not an absolute characterization that divides the world into agents and non-agents” [14].

We can say that ‘software agent’ or ‘intelligent agent’ can best be seen as an umbrella term for programs that to some extent display attributes commonly associated with agency (Nwana H.) [13].

Franklin and Graesser (1996) give their own definition of what an agent is supposed to be: “an autonomous agent is a system situated within and part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future.” [18]


Figure 4 - Franklin and Graesser’s (1996) agent taxonomy

Agent characteristics

There are some characteristics that define a software agent. In the section below we will enumerate and describe each characteristic as defined and accepted in the literature.

Reactive

In order for an agent to function autonomously in any given environment it must be able to perceive its environment and act in a timely fashion upon changes that occur in it. A software agent may employ any type and number of sensors to sense its environment. The agent can react to sensory input using its actuators. We can differentiate between various degrees of reactivity, ranging from purely reactive software agents on the one hand, to software agents that deliberate extensively before reacting on the other hand. [15] 

Pro-active and goal-oriented

Pro-activity is a more specialized form of autonomy. When an agent is said to be pro-active, it does not simply act in response to its environment, but it will exhibit goal-directed behavior and take initiative to attain its goals or design objectives [15].




Deliberative

More sophisticated agents are not merely reactive (i.e., operating in a stimulus-response manner) but are able to reason about their actions. The ability to reason enables agents to act pro-actively and perform more difficult tasks in complex environments. [16]

Continual

In order for a software agent to accomplish its goals, it must have temporal continuity. The agent must be able to function over a certain period of time with persistence of identity and state. Agents that have an episodic memory are able to learn from previous experiences. [17]

Adaptive

Making agents adaptive is one way of attaining flexibility. Adaptivity can range from (1) being able to adapt flexibly to short-term, smaller changes in the environment, to (2) dealing with more significant and long-term (lasting) changes in the environment [12]. Software agents that are able to deal with long-term changes are able to improve themselves and their performance over time by storing the knowledge of past experiences within their internal state and taking this knowledge into account when executing (similar) actions in the future. [16]

Communicative

Agents should be able to communicate with other software agents and even humans in order to complete their tasks and help other agents complete theirs. [15] Especially in multi-agent systems the ability to communicate with other agents is important. Agents communicate with other agents using a common agent language, such as FIPA ACL or KQML. When agents communicate with humans they must communicate using natural language. [16] 

Mobile

Although mobility is neither a necessary nor a sufficient condition for agency, many scholars (Gilbert et al. 1995; Nwana 1996; Brazier et al. 2003) include mobility when describing agent characteristics. It is oftentimes better for an agent to interact with a remote system at the location of the remote system than to do it over a distance. Several reasons for this preferred form of interaction can be specified.

A first reason is efficiency. Network traffic can be reduced when the agent and the remote system are at the same location. For instance, when an agent queries a remote database, data has to be sent back and forth between the remote database and the agent. This communication can be kept local when the agent and the remote system are at the same location, thereby reducing the strain on external networks such as the Internet. A second reason is that data need not be exchanged over (public) networks but can be handled locally, which also means that agents can operate more securely. A third reason is that the remote system may only allow agents to operate locally, thereby forcing the agent to migrate to the remote location. [16]

Agent typologies

In the next section we will investigate different agent typologies and try to place different agents into classes. A typology refers to the study of types of entities. There are several dimensions along which existing software agents can be classified. Firstly, agents may be classified by their mobility, i.e. by their ability to move around some network. This yields the classes of static and mobile agents. [13] Secondly, they may be classed as either deliberative or reactive. Deliberative agents derive from the deliberative thinking paradigm: the agents possess an internal symbolic reasoning model and they engage in planning and negotiation in order to achieve coordination with other agents. Work on reactive agents originates from research carried out by Brooks (1986) and Agre & Chapman (1987). These agents, on the contrary, do not have any internal, symbolic models of their environment, and they act using a stimulus/response type of behavior by responding to the present state of the environment in which they are embedded (Ferber, 1994). Indeed, Brooks has argued that intelligent behavior can be realized without the sort of explicit, symbolic representations of traditional AI (Brooks, 1991b). [13]


Figure 5 - Typology based on Nwana’s (Nwana 1996) primary attribute dimension

Thirdly, agents may be classified along several ideal and primary attributes which agents should exhibit. We have identified a minimal list of three: autonomy, learning and cooperation. We appreciate that any such list is contentious, but it is no more or no less so than any other proposal. Hence, we are not claiming that this is a necessary or sufficient set.

Autonomy refers to the principle that agents can operate on their own without the need for human guidance, even though this would sometimes be invaluable. Hence agents have individual internal states and goals, and they act in such a manner as to meet their goals on behalf of their users. A key element of their autonomy is their proactiveness, i.e. their ability to take the initiative rather than acting simply in response to their environment (Wooldridge & Jennings, 1995a).

Cooperation with other agents is paramount: it is the raison d’être for having multiple agents in the first place in contrast to having just one. In order to cooperate, agents need to possess a social ability, i.e. the ability to interact with other agents and possibly humans via some communication language (Wooldridge & Jennings, 1995a). Having said this, it is possible for agents to coordinate their actions without cooperation (Nwana et al., 1996).

Lastly, for agent systems to be truly smart, they would have to learn as they react and/or interact with their external environment. In our view, agents are (or should be) disembodied bits of intelligence. Though we will not attempt to define what intelligence is, we maintain that a key attribute of any intelligent being is its ability to learn. The learning may also take the form of increased performance over time. [13]


2.2 Machine learning: making computers act

Tom M. Mitchell is one of the most prominent theoreticians in the field of machine learning and he gave a widely quoted definition of the term: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” [19].

Machine learning algorithms are organized into a taxonomy based on the desired outcome of the algorithm. Common algorithm types include:

• Supervised learning - where the algorithm generates a function that maps inputs to desired outputs. One standard formulation of the supervised learning task is the classification problem: the learner is required to learn (to approximate the behavior of) a function which maps a vector into one of several classes by looking at several input-output examples of the function.
• Unsupervised learning - which models a set of inputs: labeled examples are not available.
• Semi-supervised learning - which combines both labeled and unlabeled examples to generate an appropriate function or classifier.
• Reinforcement learning - where the algorithm learns a policy of how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback that guides the learning algorithm.
• Transduction - similar to supervised learning, but does not explicitly construct a function: instead, it tries to predict new outputs based on training inputs, training outputs, and new inputs.
• Learning to learn - where the algorithm learns its own inductive bias based on previous experience. [19]

Of course the list above can be enlarged with other types of machine learning and all kinds of hybrid algorithms.
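To make the supervised-learning setting concrete, here is a tiny Python sketch (illustrative only; the data and learning rate are invented): a linear function is fitted to input-output examples with a least-mean-squares gradient-descent update, the same general idea that this work later applies to the catapult's target function.

```python
# Illustrative sketch: supervised learning as function approximation.
# A linear hypothesis w0 + w1*x is fitted to (input, output) examples
# with a least-mean-squares (stochastic gradient descent) update rule.
examples = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]   # made-up (x, y) pairs
w = [0.0, 0.0]            # initial weights
eta = 0.01                # learning rate, chosen arbitrarily

for epoch in range(1000):
    for x, y in examples:
        error = y - (w[0] + w[1] * x)     # difference between target and prediction
        w[0] += eta * error               # move each weight a small step
        w[1] += eta * error * x           # along the error gradient

print(w)   # roughly [1.1, 1.9]: the learner has recovered y ≈ 1 + 2x
```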


2.3 Embedding machine learning in computer games

2.3.1 The history of machine learning and games

The history of the interaction of machine learning and computer game playing started at the same time Artificial Intelligence appeared. Arthur Samuel was the first to apply machine learning techniques in games, working on his checker-playing program, and was thereby a pioneer of both machine-learning and game-playing techniques (Samuel, 1959, 1967). He facilitated the evolution of both fields, and since then the intersection of the two can be found regularly in conferences in their respective fields and has evolved through research along the years.

In recent years, the computer games industry has discovered AI as a necessary element to make games more entertaining and challenging and, vice versa, AI has discovered computer games as an interesting and rewarding application area. The industry's perspective is witnessed by a multitude of recent books on introductions to AI techniques for game programmers and a series of edited collections of articles. AI research on computer games began to follow developments in the games industry early on, and John Laird's keynote address at the AAAI 2000 conference, in which he advocated interactive computer games as a challenging and rewarding application area for AI (Laird & van Lent, 2001) [20], demonstrates the growing importance of game-playing applications for Artificial Intelligence. [21]

2.3.2 What do the games have to learn?

The need for AI in computer games is growing, and a good user experience must be matched by an equally good artificial intelligence component. These needs can be recast as a call for new practical and theoretical tools to help with [21]:

• learning to play the game: Game worlds provide excellent test beds for investigating the potential to improve agents' capabilities via learning. The environment can be constructed with varying characteristics, from deterministic and discrete as in classical board and card games to non-deterministic and continuous as in action computer games. Learning algorithms for such tasks have been studied quite thoroughly. Probably the best-known instance of a learning game-playing agent is the Backgammon-playing program TD-Gammon (Tesauro, 1995) [23].

• learning about players: Opponent modeling, partner modeling, team modeling, and multiple team modeling are fascinating, interdependent and largely unsolved challenges that aim at improving play by trying to discover and exploit the plans, strengths, and weaknesses of a player's opponents and/or partners. One of the grand challenges in this line of work is games like Poker, where opponent modeling is crucial to improve over game-theoretically optimal play (Billings et al., 2002) [22].



• behavior capture of players: Creating a convincing avatar based on a player's in-game behavior is an interesting and challenging supervised learning task. For example, in Massive Multiplayer Online Role-playing Games (MMORPGs) an avatar that is trained to simulate a user's game-playing behavior could take its creator's place at times when the human player cannot attend to his game character. First steps in this area have been made in commercial video games such as Forza Motorsport (Xbox), where the player can train a "Drivatar" that learns to go around the track in the style of the player by observing and learning from the driving style of that player and generalizing to new tracks and cars.
• model selection and stability: Online settings lead to what is effectively the unsupervised construction of models by supervised algorithms. Methods for biasing the proposed model space without significant loss of predictive power are critical not just for learning efficiency, but for interpretive ability and end-user confidence.
• optimizing for adaptivity: Building opponents that can just barely lose in interesting ways is just as important for the game world as creating world-class opponents. This requires building highly adaptive models that can substantively personalize to adversaries or partners with a wide range of competence and rapid shifts in play style. By introducing a very different set of update and optimization criteria for learners, a wealth of new research targets are created [21].



• model interpretation: “What's my next move” is not the only query desired of models in a game, but it is certainly the one which gets the most attention. Creating the illusion of intelligence requires “painting a picture” of an agent's thinking process. The ability to describe the current state of a model and the process of inference in that model from decision to decision enables queries that provide the foundation for a host of social actions in a game such as predictions, contracts, counter-factual assertions, advice, justification, negotiation, and demagoguery. These can have as much or more influence on outcomes as actual in-game actions [21].

• performance: Resource requirements for update and inference will always be of great importance. The AI does not get the bulk of the CPU or memory, and the machines driving the market will always be underpowered compared to typical desktops at any point in time [21].

2.3.3 Creating a Learning Agent

We will now see how to implement a learning system in a game, given the need for games to be adaptable. The example we will use is given by Nick Palmer in the article Machine Learning in Games Development and concerns a team-strategy based paintball game. The aim of the program is for a team of seven agents to capture the opponent's flag and bring it back to their starting position. They must do this without being hit by the opposing team's paintballs. For a start, we shall exclude the team strategy from our discussions and concentrate on the individual agents - it is no good having a perfect strategy if the agents aren't smart enough to survive on their own [24]. We must consider the factors which will influence an agent's performance in the game. Terrain is an obvious start point, as this is all that stands between the two teams, so it must be used to the agent's advantage. Secondly, there must be an element of stealth behind the behavior of each agent, as otherwise it will be simple to undermine any tactics used during the game by simply picking off the naïve agents one-by-one [24].

A learning agent is composed of a few fundamental parts: a learning element, a performance element, a curiosity element (or 'problem generator'), and a performance analyzer (or 'critic'). The learning element is the part of the agent which modifies the agent's behavior and creates improvements. Tom M. Mitchell names this component in his book “Machine Learning” The Generalizer [19]. The performance element is responsible for choosing external actions based on the percepts it has received (percepts being information that is known by the agent about its environment). To illustrate this, consider that one of our agents is in the woods playing paintball. He is aware of an opposing paintballer nearby. This would be the percept that the agent responds to, by selecting an action - moving behind a tree. This choice of action is made by the performance element. This component is called The Performance System by Tom M. Mitchell. The performance analyzer judges the performance of the agent against some suitable performance measure. The performance must be judged on the same percepts as those received by the performance element - the state of affairs 'known' to the agent. When the analysis of performance has been made, the agent must decide whether or not a better performance could be made in the future, under the same circumstances. This decision is then passed to the learning element, which decides on the appropriate alteration to future behavior, and modifies the performance element accordingly. This component is called The Critic in Tom M. Mitchell's book [19][24]. Below we have an illustrated example of the four components above working together, as presented in Tom M. Mitchell's book.

Figure 6 - Tom M. Mitchell's Learning Model


How do we make sure that the agent advances in its learning, and doesn't merely confine itself to previously observed behavior? This is dealt with by the curiosity element (so-called because it searches for a better solution), which has knowledge of the desirable behavior of the agent (i.e. it knows that being shot is not desirable, and that finding the opponent's flag is). To achieve optimal performance, this element will pose new challenges to the agent in an attempt to prevent bad habits developing. This component is called The Experiment Generator by Tom Mitchell [19].

To understand the benefits of this, we will consider a paintballer who is hiding behind a tree. From his past experience, he knows that he is safe to stay where he is, and this would result in an adequate performance. However, the curiosity element kicks in, and suggests that he makes a break from his cover and heads to a nearby tree which is closer to the enemy flag. This may result in the agent ultimately being shot at, but could also achieve a more desirable goal. It is then up to the performance analyzer and the learning element to consider whether there is a benefit to this change in strategy [24].

Figure 7 - Learning architecture proposed by Nick Palmer

This style of learning is known as reinforcement learning, which means that the agent can see the result of its actions, but is not told directly what it should have done instead. This means that the agent must use trial and error to evaluate its performance and learn from mistakes. The advantage of this is that there is no limitation on the behavior, other than the limit on alterations suggested through the curiosity element. If after each action the agent were told what its mistake was and how it should correct its behavior, then the desired behavior would have to be understood already [24].
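A highly simplified Python sketch of how the four components described above can be wired together is given below. It is illustrative only: the class, the toy number-guessing task and all of the names are invented for this example and are not taken from Nick Palmer's article or from the thesis.

```python
# Illustrative skeleton of a learning agent built from Mitchell's four components,
# applied to a toy task (guessing numbers) only so that the example is runnable.
import random

class LearningAgent:
    def __init__(self):
        self.guess = 0.0                      # the agent's current hypothesis

    def experiment_generator(self):
        # Curiosity element: pose a new problem for the agent to attempt.
        return random.uniform(0, 100)

    def performance_system(self, target):
        # Performance element: act using the current hypothesis.
        return self.guess, target

    def critic(self, trace):
        # Performance analyzer: turn the observed outcome into a training signal.
        guess, target = trace
        return target - guess                 # the error made on this attempt

    def generalizer(self, error):
        # Learning element: adjust the hypothesis to reduce future error.
        self.guess += 0.1 * error

agent = LearningAgent()
for _ in range(200):
    problem = agent.experiment_generator()
    outcome = agent.performance_system(problem)
    feedback = agent.critic(outcome)
    agent.generalizer(feedback)
print(round(agent.guess))   # usually lands near 50, the mean of the posed targets
```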

As the learning agent is ultimately part of a game, it must not be left simply to work out for itself how to play. The agents must be imparted with a fair degree of prior knowledge about the way to behave. In the case of paintball, this could include methods for avoiding fire by using cover, which may later be adapted during the learning process [24].

2.3.4 Problems with Learning

Although machine learning can bring a very big improvement in gameplay, there are certain problems that can occur when trying to implement such an agent [24]:

• Mimicking Stupidity - When teaching an AI by copying a human player's strategy, we will find that the computer could be taught badly. This is more than likely when the player is unfamiliar with a game. In this situation, a reset function may be required to bring the AI player back to its initial state, or else a minimum level must be imposed on the computer player to prevent its performance dropping below a predetermined standard.



• Overfitting - This can occur if an AI agent is taught only a certain section of a game, and then expected to display intelligent behavior based on its experience. Using an FPS as an example, an agent which has learnt from its experience over one level will encounter problems when attempting a new level, as it may not have learnt the correct 'lessons' from its performance.



• Local Optimality - When choosing a parameter on which the agent is to base its learning, be sure to choose one which has no dependency on earlier actions. As an example, we will take a snow-boarding game. The agent learns, through the use of an optimization algorithm, the best course to take down the ski slope, using its rotation as a parameter. This may mean that a non-optimal solution is reached, in which any small change cannot improve performance.



• Set Behavior - Once an agent has a record of its past behavior and the resulting performance analysis, will it stick to the behavior which has been successful in the past, or will it try new methods in an attempt to improve? This is a problem which must be addressed, or else an agent may either try to evaluate every possible behavior, or else stick to one without finding the optimal solution [24].


III. Relevant applications of machine learning in games

In this chapter we present the most relevant game-playing programs that use machine learning techniques and try to analyze their structure and the algorithms they use.

3.1 TD-Gammon

One of the most impressive applications of reinforcement learning to date is Gerry Tesauro's work on backgammon (Tesauro, 1992, 1994, 1995). Tesauro's program, TD-Gammon, required little backgammon knowledge, yet it learned to play very well, close to the level of the world's best players. TD-Gammon's learning algorithm was a straightforward combination of the TD(λ) algorithm and nonlinear function approximation using a multilayer neural network trained on the TD errors computed with the backpropagation algorithm. [23]

To apply the learning rule, a source of backgammon games was needed. Tesauro obtained an unending sequence of training games by having the agent play against itself. To choose its moves, TD-Gammon considered each of the approximately 20 ways it could play its dice roll and the corresponding positions that would result. The neural network was consulted to estimate the value of each of these positions, and the move leading to the position with the best estimated value was selected. Continuing in this way, it was possible to easily generate a large number of backgammon games. Each game was treated as an episode, with the sequence of positions s0, s1, s2, ... and the TD rule applied after each move. The network's weights were initialized to small random values, so the initial evaluations were completely arbitrary. Since moves were selected based on these evaluations, the initial moves were inevitably weak, and the first games often lasted hundreds or thousands of moves before either side won. After a few dozen games, however, performance improved rapidly.


After playing about 300,000 games against itself, TD-Gammon 0.0, as described above, learned to play as well as the best previous backgammon computer programs. This was a striking result, because all previous high-performance programs had relied on extensive backgammon knowledge. For example, the champion at that time was undoubtedly Neurogammon, another program written by Tesauro, which used neural-network learning but not TD. Neurogammon's network was trained on examples provided by a backgammon expert and, moreover, began with a set of features specifically designed for backgammon. TD-Gammon 0.0, on the other hand, was constructed with essentially zero backgammon knowledge. The fact that TD-Gammon played as well as Neurogammon and the other previous programs demonstrates the potential of learning algorithms in games [23].
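The core idea can be illustrated with a much simpler, tabular sketch of the temporal-difference update (Python, added here for illustration; TD-Gammon itself used TD(λ) with a neural network rather than a table):

```python
# Illustrative sketch: a tabular TD(0) update applied to the state sequence
# s0, s1, ..., sT of one self-play episode that ends with a final reward.
def td_episode_update(values, states, final_reward, alpha=0.1):
    for s, s_next in zip(states, states[1:]):
        target = values.get(s_next, 0.0)          # bootstrap from the next state's estimate
        values[s] = values.get(s, 0.0) + alpha * (target - values.get(s, 0.0))
    last = states[-1]
    values[last] = values.get(last, 0.0) + alpha * (final_reward - values.get(last, 0.0))
    return values

# Toy usage: repeating an episode that ends in a win (+1) pulls the value
# estimates of all visited states toward the observed outcome.
v = {}
for _ in range(200):
    v = td_episode_update(v, ["s0", "s1", "s2"], final_reward=1.0)
print({s: round(x, 2) for s, x in v.items()})     # all three values approach 1.0
```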

3.2 Samuel's checkers program

An important precursor of TD-Gammon was Arthur Samuel's work (1959, 1967) on building learning programs for playing checkers. Samuel was one of the first to make effective use of search heuristics and of what we now call temporal-difference learning. He wrote a checkers program for the IBM 701 in 1952; its learning version was completed in 1955 and was shown on television in 1956. Later versions acquired the ability to play well, though not at expert level [25].

Samuel used two main methods of learning, the simplest of which he called rote learning. It consisted simply in saving a description of each board position encountered during play, together with its value as determined by the minimax back-up procedure. The result was that if a position that had already been seen appeared again as a terminal position of a search tree, the effective depth of the search was increased, because the stored value of that position already summarized an earlier search. Rote learning produced slow but continuous improvement, and was most effective for the opening and the endgame. Samuel's program became a "better than average" player, having learned from many games against itself, against a broad array of human opponents, and in a supervised learning mode from examples in books [25].
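The rote-learning idea can be sketched in a few lines of Python (illustrative only, not Samuel's actual program): backed-up minimax values are stored in a cache that persists across games, so that a previously analyzed position reappearing at the frontier of a later search contributes the value of its earlier, deeper analysis. The `children` and `evaluate` hooks are hypothetical, as in the earlier alpha-beta sketch.

```python
# Illustrative sketch of rote learning: remember backed-up minimax values across
# games and reuse them when the same position is reached again.
cache = {}   # position -> value backed up by an earlier search

def minimax_value(position, depth, maximizing, children, evaluate):
    if position in cache:                       # rote learning: reuse the stored value
        return cache[position]
    succ = children(position)
    if depth == 0 or not succ:
        return evaluate(position)               # static evaluation at the search frontier
    child_values = [minimax_value(c, depth - 1, not maximizing, children, evaluate)
                    for c in succ]
    value = max(child_values) if maximizing else min(child_values)
    cache[position] = value                     # save the backed-up value for later games
    return value
```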


3.3 NeuroChess

NeuroChess, created by Sebastian Thrun, is a program that learned to play chess. It learns from a combination of self-play and the observation of experts. The search algorithm is taken from GNU Chess, but the evaluation function is a neural network. NeuroChess is an application of "explanation-based neural networks" (EBNN), which adapts explanation-based learning to neural networks. NeuroChess represents a chess position as a hand-designed feature vector of 175 values. It has two neural networks, V and M. V is the evaluation function: it takes a feature vector as input and produces the assessment used to play. M is a model of the game: it takes a feature vector as input and predicts the features of the position two half-moves later. The output of M is used in the training of V [26].

After training M on 120,000 master-level games and V on another 2,400 games, NeuroChess defeated GNU Chess in about 13% of their games. With the same training but without the model M, NeuroChess wins about 10% of its games against GNU Chess; EBNN thus improves its performance. Of course, these percentages grow with the number of games played. Learning only from games against itself failed. However, learning from the experts introduced some artifacts into the program's evaluations. A clear example is that the program observed that the probability of winning is greater when experts move their queen toward the middle of the board, without realizing that the queen is moved like this only when threatened. Therefore the best results are obtained by combining self-play with learning from experts [26].

3.4 SAL

SAL is a system developed by Michael Gherrity that learns to play multiple games. SAL can apparently learn any two-player board game of perfect information, provided the game can be played on a rectangular board and uses fixed types of pieces [27].


It works in the following way:

The user supplies a move generator, written in C, and some constants that declare the size of the board and the number and type of pieces used in the game. This is SAL's only knowledge of the rules of the game. SAL chooses its moves with a search, like any chess program. It uses a two-ply, full-width alpha-beta search, plus a consistency search for quiescence. SAL's evaluation function is a backpropagation neural network (actually, it uses separate evaluators for the two sides of the game). The inputs to the network are features representing the board position, the number of pieces of each type on the board, the type of piece just moved, the type of piece just captured, and a number of ad hoc features having to do with pieces and squares under attack and threats to win the game [27]. The neural network evaluator is trained by temporal-difference methods to estimate the outcome of the game from the current state. SAL has been tested on tic-tac-toe (noughts and crosses), Connect Four, and chess.
• In tic-tac-toe, SAL learned to play perfectly in 20,000 games, which means it is rather slow.
• In Connect Four, a more complex game, SAL learned to beat an opponent about 80% of the time after 100,000 games.

• In chess, SAL played 4,200 games against GNU Chess, with GNU Chess set to play at a rate of one move per second. SAL won eight games and lost the rest. Its first game shows random play, since it had not yet learned anything; a later drawn game shows how much SAL improved, although it still plays very weakly. The published graphs show that SAL continues to improve [27].
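Since SAL, like most of the programs discussed in this chapter, places an alpha-beta search on top of its learned evaluator, a minimal negamax-style alpha-beta sketch is given below. It is our own illustration rather than SAL's code; the IGameState interface is an assumption, and the evaluation is taken to be from the point of view of the side to move.

// Minimal negamax alpha-beta sketch (illustrative only, not SAL's actual code).
using System;
using System.Collections.Generic;

interface IGameState
{
    bool IsTerminal { get; }
    float Evaluate();                        // learned evaluator, from the side to move
    IEnumerable<IGameState> Successors();
}

static class AlphaBeta
{
    // Returns the negamax value of 'state' searched to 'depth' plies.
    public static float Search(IGameState state, int depth, float alpha, float beta)
    {
        if (depth == 0 || state.IsTerminal)
            return state.Evaluate();

        float best = float.NegativeInfinity;
        foreach (IGameState child in state.Successors())
        {
            // Negamax: a child's value from our point of view is the negation
            // of its value from the opponent's point of view.
            float value = -Search(child, depth - 1, -beta, -alpha);
            best = Math.Max(best, value);
            alpha = Math.Max(alpha, value);
            if (alpha >= beta)
                break;                       // beta cut-off
        }
        return best;
    }
}

A full-width two-ply search as used by SAL would correspond to calling Search(root, 2, float.NegativeInfinity, float.PositiveInfinity).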


IV. “A Tale of Two Catapults” – machine learning techniques applied in games

4.1 Problem statement
The application is a Windows Phone game: a turn-based game with catapults. The player controls a catapult and must hit the opponent (in our case an AI opponent). In order to win, the player needs to hit the opponent three times directly; if the player is hit three times by the opponent, the game is lost. The logic of the application is the essential part, because the AI opponent is not an ordinary one. The application is, in essence, an example of how machine learning techniques can be used in video games, and more than that: the central idea of this project is not the game itself but the design of a framework, an agent-oriented architecture, that can be applied to any turn-based two-player game.

4.2 Motivation
Our motivation for this application is the growing demand of computer game players for a better experience and a more capable AI. Over the last decades games have evolved drastically thanks to ever more powerful graphics processors. Along with this evolution came the evolution of the video gamer, whose need for a more competent opponent remains largely unmet. The problem with recent games is that their AI component is too weak, because it is static and limited. There have been some very good examples of machine learning techniques used in video games which have shown the great potential of machine learning, but unfortunately they are only a few. Inspired by Tom M. Mitchell's book "Machine Learning", we decided to build a computer game with a learning component and to present the results obtained.


4.3 Analysis and design

4.3.1 Problem analysis
Being a game played only against the computer, we have no human expert to teach our program different strategies; besides, the game itself is fairly simple. We have therefore chosen a learning technique that does not require direct feedback. The most suitable learning strategy for this kind of task is reinforcement learning, or any other technique that works on the same principle, namely learning by repeatedly playing games against itself. We have chosen gradient descent learning for our program, a technique in which the player has to learn a target function reflecting the mathematical function that gives the distance travelled by a projectile fired with a given initial velocity and angle. In order to evaluate a game state we assign to every state a value given by the target function. For the final game states this is easy: we decide that V(s) = 2000 if the game is won and V(s) = -2000 if the game is lost, where s is the state being evaluated. For the intermediate states we need an approximation function that reflects the real value of the state as accurately as possible. The mathematical function the program has to learn is D = (v0² / g) · sin(2θ), where v0 is the initial velocity, g is the gravitational acceleration (9.8 m/s²) and θ is the angle, in degrees, between the ground and the catapult arm that fires. For instance, with v0 = 20 m/s and θ = 45° the projectile travels D = (400 / 9.8) · sin 90° ≈ 40.8 m. Because the above function is too complex to be learned in that form, it is expressed in an approximate form, as a linear combination of the features of the environment in a given state. We consider the following features of the environment to be relevant for a state:
• x1 – the number of hits player one can still take;
• x2 – the number of hits player two can still take;
• x3 – the distance between the two catapults (in meters);
• x4 – the intensity of the wind (in m/s) and its direction;
• x5 – the initial velocity of the projectile;


• x6 – the angle the projectile is shot at, in degrees.
To compute the approximation function we assign to each feature a weight, represented by a floating-point number, plus a bias. These weights are adjusted throughout learning, making the function converge towards the desired one. The approximation function can therefore be expressed in the following way:

Ṽ(s) = w0 + Σi wi · xi

where xi are the values of the features, wi are the values of the weights and w0 is the bias. In order to learn properly, the agent needs some kind of feedback from a critic. In our case that feedback is given indirectly by the value of a function called the training function. For every state s, the training function can be expressed in the following way:

Vtrain(s) = Ṽ(Successor(s))

where Successor(s) is the state that immediately followed s in the list of training examples provided by the game's performance system. We need just one more ingredient to make the algorithm work properly: the way the weights are recalculated every turn. For that we have chosen the LMS (least mean squares) algorithm, in which every turn we minimize the value of E, the error given by the sum of squared differences between the target function and the training function. In the ideal situation, in which the agent has learned the function perfectly, the target function coincides with the training function. Thus, each turn, for every training example (made up of a state and its associated value), every weight is updated according to the following rule:

wi = wi + η · (Vtrain(s) - Ṽ(s)) · xi

where η is a very small constant called the learning rate, and the other values have been explained before.
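As a concrete illustration of the two formulas above, the fragment below sketches the linear evaluation and the LMS weight update over a simple array of features. It is a simplified stand-alone sketch, not the application's actual classes (those are presented in section 4.4); the weights array is assumed to hold the bias in position 0 followed by one weight per feature.

// Stand-alone sketch of the linear approximation and of the LMS update rule.
static class LinearLearningSketch
{
    const float LearningRate = 0.0000001f;            // the constant eta

    // Vtilde(s) = w0 + sum_i(wi * xi); weights[0] is the bias w0.
    public static float Approximate(float[] features, float[] weights)
    {
        float value = weights[0];
        for (int i = 0; i < features.Length; i++)
            value += weights[i + 1] * features[i];
        return value;
    }

    // One LMS step: wi = wi + eta * (Vtrain - Vtilde(s)) * xi, with x0 = 1 for the bias.
    public static void Update(float[] features, float[] weights, float vTrain)
    {
        float error = vTrain - Approximate(features, weights);
        weights[0] += LearningRate * error;            // bias term
        for (int i = 0; i < features.Length; i++)
            weights[i + 1] += LearningRate * error * features[i];
    }
}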


After repeated simulations the target function should represent the desired function more and more closely and, eventually, coincide with it. As stated before, our learning model was inspired by Tom M. Mitchell's book "Machine Learning" [19], and it can be modelled in the same way using the following components:

[Diagram: the Experiment Generator supplies a new initial state to the Performance System; the game history produced by the Performance System is turned by the Critic into training examples for the Generalizer, which outputs the updated target function.]

Figure 8 - Catapult Learning Model


4.3.2 Solution’s design

Use case diagram
The application being simple, the use case diagram is also simple: the player can start the game, play the game or exit the game. The characteristics of the game will be detailed in later sections.

Figure 9 - Use case diagram

Class diagram
Because the application has many classes, including the UI built with the XNA Game Studio 4.0 framework as well as the application logic, which includes the abstract classes representing the agent architecture together with their game-specific implementations, we show only the class diagram of the abstract agent model, in order to clarify the relationships between the classes.


Figure 10 - Class diagram

* The diagrams were made using the CASE tool StarUML.

Description of classes
In this section we describe every component of the model, present the way it works and its relationship with the other classes. The main classes of the agent architecture are: Play, Simulation, Player, AI, Action, Environment and State. The machine learning part is given by the classes Learner, Critic, Generalizer, TargetFunction and PerformanceSystem.

Play – this class is practically the core of the model. It has a single abstract method, "Start", which takes an initial state as parameter. It also holds as class members the two players and the environment in which the game takes place. From this class two abstract classes are derived: Simulation and Game.


Simulation – this class is as important as its base class Play, because it implements the actual logic of the game at an abstract level, independently of the implementation of the classes involved. It also has a function "IsComplete", which checks the environment to determine whether the game is over. In the implementation section we present a code snippet of this algorithm. It also has two components that facilitate machine learning: the Learner and the History, which will be discussed later.

Player – this class is the abstract form of a player. It stores as a member the player's number in the turn-based game. This class is further specialized into two classes: AI and Human. It has a single method, "SelectAction", which chooses the best action from the current state in order to win the game.

AI – the representation of an AI player in a game, having all the components required to play it properly. Like its parent class it has the "SelectAction" method, but in addition it has two new elements which are part of the machine learning system: the classes TargetFunction and PerformanceSystem are specific to a player, and each of them has its role in the learning process. Details about how these two classes work are given later in this chapter.

Action – this class represents a basic action an agent can perform when it is its turn. It has a single method, "Execute", which applies the effect of the player's selected action to the current state.

Environment – this class represents the environment in which the game takes place. It usually has a current state and a method to update it. The environment is an important element of the model because it is one of the key elements of the interaction process. The State class is a property of the environment that gives a clear description of a state of the whole process as well as the parameters needed for evaluation.

Learner – this is the main component of the learning mechanism, consisting of the Critic and the Generalizer and combining the two processes using the history of the game as input. The History is created during the simulation and, based on it, the Learner adjusts the target function.


Critic – this is the component of the learning part which takes the history of the game and transforms it into training examples, consisting primarily of a game state and its value as a float number (or any desired format).

Generalizer – this component takes the training examples produced by the Critic and adjusts the weights using the LMS algorithm described above. After the adjustments are made, the Learner stores them so that they persist.

TargetFunction – this class has a single method and no members. Its job is to calculate the value of a state according to the method we select in its implementation.

PerformanceSystem – this is one of the most important parts of the learning system. It is a property of the AI class and it chooses the best move to make based on the target function and other action-selection mechanisms. Because in the beginning the target function is completely wrong, we need a method to widen the search for good moves in all cases. That is why we implement an epsilon-greedy mechanism for the selection of the best moves, the value of epsilon increasing along with the performance of the system. For the generation of random numbers we used an algorithm proposed by George Marsaglia that generates random numbers according to a Gaussian (normal) distribution [28].
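The skeleton below condenses the abstract classes just described into plain C# signatures. It is a sketch assembled from the descriptions above and from the code snippets in section 4.4, not the project's exact source; any member not named in the text is an assumption.

// Condensed sketch of the abstract agent architecture described above.
public abstract class State { }

public abstract class Action
{
    public abstract State Execute(Player p, State s);    // apply the action's effect
}

public abstract class Environment
{
    public abstract void SetInitialState(State s);
    public abstract State CurrentState();
    public abstract void UpdateState(Player p, Action a);
}

public abstract class Player
{
    public int playerNr;                                  // number in the turn order
    public abstract Action SelectAction(State current);  // choose the next move
}

public abstract class Play
{
    protected Environment env;
    protected Player player, opponent;
    public abstract void Start(State initState);
}

// Learning side: the Critic builds training examples from the game history,
// the Generalizer adjusts the weights, the Learner combines the two, and the
// TargetFunction/PerformanceSystem pair is owned by the AI player.
public abstract class TargetFunction
{
    public abstract float CalculateValue(State s);
}

public abstract class PerformanceSystem
{
    public abstract State ChooseBestMove(State s);
}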


4.4 Implementation
The program is implemented in the C# programming language using the XNA Game Studio 4.0 framework and the Windows Phone 7 SDK; as an IDE we used Microsoft Visual Studio 2010. In the following section we present these frameworks along with their characteristics.

Windows Phone 7
The Windows Phone Application Platform enables developers to create engaging consumer experiences running on a Windows® Phone. It is built upon existing Microsoft® tools and technologies such as Visual Studio, Expression Blend®, Silverlight®, and the XNA Framework. Developers already familiar with those technologies and their related tools will be able to create new applications for Windows Phone without a steep learning curve [29].

Runtimes
With Silverlight and the XNA Framework, all development is done in managed code, in a protected sandbox allowing for the rapid development of safe and secure applications. Applications written for Silverlight or the XNA Framework today will run on Windows Phone with only a small number of adjustments, such as for screen size or for device-specific features. The two frameworks, along with Windows Phone-specific components and the Common Base Class Library, provide a substantial number of components for developers to build applications on [29].
• Silverlight – Silverlight is the ideal framework for creating Rich Internet Application-style user interfaces. A Windows Phone Silverlight application exposes its UI through a set of pages. Windows Phone controls that match the Windows Phone visual style can be used to enhance the look and feel of the UI. Visual Studio or Expression Blend can be used for designing the XAML-based interfaces. Visual Studio can be used to implement the application logic by utilizing the media-rich Silverlight libraries or the base functionality provided by the Common Base Library.


• XNA Framework – The XNA Framework is composed of software, services, and resources focused on enabling game developers to be successful on Microsoft gaming platforms. Microsoft provides technology that allows professional developers to quickly enable games on platforms like Windows Phone, Xbox 360, Zune HD, and Windows 7. The XNA Framework provides a complete set of managed APIs for game development. This includes 2D sprite-based APIs that support rotation, scaling, stretching, and filtering, as well as 3D graphics APIs for 3D geometry, textures, and standard lighting and shading.
• Sensors – A variety of sensors return data that can be consumed by developers. For example, multi-touch input, accelerometer, compass, gyroscope, and microphone sensors are all accessible through APIs.
• Media – Both Silverlight and the XNA Framework provide developers with a programming model for building rich user experiences that incorporate graphics, animation, and media. The managed APIs support a variety of media formats and allow for discovery and enumeration of media on the device and for playback of that media.
• Data – Isolated storage allows an application to create and maintain data in a sandboxed, isolated virtual folder. All I/O operations are restricted to isolated storage and do not have direct access to the underlying operating system file system. This prevents unauthorized access and data corruption by other applications.

• Location – The Microsoft Location Service for Windows Phone allows application developers to access the user's physical location information from a single API. Developers can query for the current location, subscribe to location-changed events, set the desired accuracy of the data, get access to the device heading and speed, and calculate the distance between points. The Location APIs on the phone work in conjunction with the Location Cloud Services [29].

XNA Framework
The XNA Framework class library is a library of classes, interfaces, and value types that are included in XNA Game Studio. This library provides access to XNA Framework functionality and is designed to be the foundation on which XNA Game Studio applications, components, and controls are built [30].

Namespaces
• Microsoft.Xna.Framework – provides commonly needed game classes such as timers and game loops.
• Microsoft.Xna.Framework.Audio – contains low-level application programming interface (API) methods that can load and manipulate XACT-created project and content files to play audio.
• Microsoft.Xna.Framework.Content – contains the run-time components of the Content Pipeline.
• Microsoft.Xna.Framework.Design – provides a unified way of converting types of values to other types.
• Microsoft.Xna.Framework.GamerServices – contains classes that implement various services related to gamers. These services communicate directly with the gamer, the gamers' data, or otherwise reflect choices the gamer makes. Gamer services include input device and profile data APIs.
• Microsoft.Xna.Framework.Graphics – contains low-level application programming interface (API) methods that take advantage of hardware acceleration capabilities to display 3D objects.
• Microsoft.Xna.Framework.Graphics.PackedVector – represents data types with components that are not multiples of 8 bits.
• Microsoft.Xna.Framework.Input – contains classes to receive input from keyboard, mouse, and Xbox 360 Controller devices.
• Microsoft.Xna.Framework.Input.Touch – contains classes that enable access to touch-based input on devices that support it.
• Microsoft.Xna.Framework.Media – contains classes to enumerate, play, and view songs, albums, playlists, and pictures.
• Microsoft.Xna.Framework.Net – contains classes that implement support for Xbox LIVE, multiplayer, and networking for XNA Framework games.
• Microsoft.Xna.Framework.Storage – contains classes that allow reading and writing of files [30].

Code snippets of important functions
In this section we will present relevant fragments of the main functions of the algorithms.

The Start() function, which implements the actual flow of a simulation.

public override void Start(State initState)
{
    env.SetInitialState(initState);
    env.CurrentState().Display();
    history.AddToHistory(env.CurrentState());
    while (!IsComplete())
    {
        if (playersTurn)
        {
            Action action = player.SelectAction(env.CurrentState());
            env.UpdateState(player, action);
            playersTurn = false;
        }
        else
        {
            Action action = opponent.SelectAction(env.CurrentState());
            env.UpdateState(opponent, action);
            playersTurn = true;
        }
        env.CurrentState().Display();
        history.AddToHistory(env.CurrentState());
    }
    learner.Learn(history);
    Console.WriteLine("END of simulation");
}

In the function above, the variable env represents the environment, the variables player and opponent represent the two players of the game, and learner is the component that actually does the learning once the game is over (checked by the IsComplete() function), based on the history, which is a list of game states.
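For training, the simulation can simply be run many times in a row, as suggested by the results in section 4.6 (weights logged every 200 self-play games, up to 1,000). The short driver below is only an illustrative sketch; CatapultSimulation and CreateInitialState are hypothetical names standing in for the project's concrete classes.

// Illustrative training driver: repeated self-play simulations.
public void TrainBySelfPlay(int games)
{
    for (int game = 0; game < games; game++)
    {
        Play simulation = new CatapultSimulation();  // concrete Simulation subclass
        State initial = CreateInitialState();        // fresh starting state
        simulation.Start(initial);                   // plays one game, then learns
    }
}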


The Learn() function of the Learner component, which combines the Critic and the Generalizer.

public void Learn(History history)
{
    List<TrainingExample> examples = new List<TrainingExample>();
    examples = critic.CalculateTrainingFunction(history);
    State state = generalizer.AdjustWeights(examples);
    SaveValues(state);
}
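SaveValues() itself is not listed here; below is a minimal sketch of how the learned weights could be persisted on Windows Phone 7 using isolated storage. This is only an assumed implementation: the file name weightLog.txt comes from section 4.6, but the exact format used by the application is not documented, and System.IO plus System.IO.IsolatedStorage are assumed to be imported.

// Hypothetical persistence of the learned weights into isolated storage.
private void SaveValues(State state)
{
    GameState gameState = state as GameState;
    if (gameState == null) { return; }

    using (IsolatedStorageFile store = IsolatedStorageFile.GetUserStoreForApplication())
    using (IsolatedStorageFileStream stream = store.OpenFile("weightLog.txt", FileMode.Append))
    using (StreamWriter writer = new StreamWriter(stream))
    {
        foreach (Feature f in gameState.Features)
        {
            // One line per feature: name and current weight.
            writer.WriteLine(f.Name + ";" + f.Weight);
        }
    }
}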

The AdjustWeights() function of the Generalizer component, which implements the LMS algorithm.

public override State AdjustWeights(List<TrainingExample> exemple)
{
    float eta = 0.0000001f;
    GameState savedFeatures = new GameState();
    State sta = exemple[0].State;
    if (sta is GameState)
    {
        savedFeatures = (GameState)sta;
    }
    foreach (TrainingExample te in exemple)
    {
        GameState newState = null;
        State st = te.State;
        if (st is GameState)
        {
            newState = (GameState)st;
        }
        float vtrain = te.Vtrain;
        float vaprox = agent.TF.CalculateValue(te.State);
        foreach (Feature f in newState.Features)
        {
            Feature newFeature = savedFeatures.getFeatureById(f.FeatureID);
            float weight = 0;
            weight = newFeature.Weight + eta * (vtrain - vaprox) * f.Value;
            newFeature.Weight = weight;
        }
    }
    return savedFeatures;
}

The ChooseBestMove() function of the PerformanceSystem implementation, which chooses the next move based on the target function, using the epsilon-greedy technique to increase the exploration rate.

public override State ChooseBestMove(State s)
{
    GameState current = null;
    if (s is GameState)
    {
        current = (GameState)s;
    }
    currentState = current;
    List<GameState> nextStates = new List<GameState>();
    nextStates = GeneratePossibleSuccessors(current);
    RandomGenerator.SetSeedFromSystemTime();
    GameState bestState = new GameState(current);
    float bestValue = agent.TF.CalculateValue(nextStates.ElementAt(randomIndex.Next(0, nextStates.Count - 1)));
    double minThreshold = 1;
    double maxThreshold = 0;
    foreach (GameState g in ScrambleArrayList(nextStates))
    {
        float approxVal = agent.TF.CalculateValue(g);
        if (approxVal > bestValue)
        {
            double threshold = RandomGenerator.GetNormal(0.5, 0.15);
            if (threshold < minThreshold) minThreshold = threshold;
            if (threshold > maxThreshold) maxThreshold = threshold;
            if (Math.Abs(threshold) < 1 - epsilon)
            {
                bestValue = approxVal;
                bestState = g;
            }
        }
    }
    nextStates.Clear();
    return bestState;
}
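The RandomGenerator.GetNormal() call above is not listed in the thesis; below is a sketch of one way it could be implemented with Marsaglia's polar method [28]. The class and method names are assumptions made to match the call site.

// Sketch of RandomGenerator.GetNormal() using Marsaglia's polar method [28].
static class RandomGenerator
{
    private static Random rng = new Random();

    public static void SetSeedFromSystemTime()
    {
        rng = new Random(Environment.TickCount);
    }

    // Returns a sample from a Gaussian distribution with the given mean and
    // standard deviation.
    public static double GetNormal(double mean, double stdDev)
    {
        double u, v, s;
        do
        {
            // Two uniform samples in (-1, 1), rejected until they fall
            // strictly inside the unit circle.
            u = 2.0 * rng.NextDouble() - 1.0;
            v = 2.0 * rng.NextDouble() - 1.0;
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);

        double factor = Math.Sqrt(-2.0 * Math.Log(s) / s);
        return mean + stdDev * u * factor;
    }
}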


The CalculateValue() function from the implementation of the TargetFunction class, which implements the formula by which the target function is calculated.

public override float CalculateValue(State s)
{
    GameState state = null;
    if (s is GameState)
    {
        state = (GameState)s;
    }
    float val = 0;
    int p1 = state.Features.Where(x => x.Name == "P1 Lives").First().Value;
    int p2 = state.Features.Where(x => x.Name == "P2 Lives").First().Value;
    if (p1 == 0)
    {
        val = -2000;
    }
    else if (p2 == 0)
    {
        val = 2000;
    }
    else
    {
        foreach (Feature f in state.Features)
        {
            val += f.Value * f.Weight;
        }
    }
    return val;
}

The Execute() function from the Shoot class, an implementation of Action, where the effects on the environment are applied.

public override State Execute(Player p, State s)
{
    ..............
    int varDist = dist - wind;
    float g = 9.8f;
    int range = (int)((Math.Pow(velocity, 2) * Math.Sin(2 * (Math.PI * angle / 180.0))) / 9.8f);
    if (Math.Abs(varDist - range) < 5)
    {
        if (p.playerNr == 1)
        {
            state.Features.Where(x => x.Name == "P2 Lives").First().Value--;
        }
        else if (p.playerNr == 2)
        {
            state.Features.Where(x => x.Name == "P1 Lives").First().Value--;
        }
    }
    .............
    return state;
}

4.5 User manual
When the player starts the application, the game begins automatically. The screen will look like the following screenshot: a simple environment with two catapults, one blue and one red.

Figure 11 - Screenshot from "A Tale of Two Catapults"

To select the angle and velocity, swipe freely on the screen of the phone and release when you want to shoot. The game is over once you have hit the opponent three times or the opponent has hit you three times. You can start over by selecting the game again. At any time you can exit the game by pressing the "Back" button on the phone. The next section contains a flow chart of how the game works.
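The swipe-based control described above can be read through the XNA touch gesture API. The fragment below is only an illustrative sketch of this approach, not the application's actual input code; the class name, the field names and the velocity scaling factor are assumptions.

// Illustrative sketch of reading a swipe to set the angle and velocity with
// XNA's TouchPanel gestures.
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Input.Touch;

public class AimingInput
{
    private Vector2 dragVector = Vector2.Zero;
    public float Angle { get; private set; }      // degrees above the horizontal
    public float Velocity { get; private set; }   // proportional to swipe length
    public bool ShotRequested { get; private set; }

    public AimingInput()
    {
        TouchPanel.EnabledGestures = GestureType.FreeDrag | GestureType.DragComplete;
    }

    public void Update()
    {
        ShotRequested = false;
        while (TouchPanel.IsGestureAvailable)
        {
            GestureSample gesture = TouchPanel.ReadGesture();
            if (gesture.GestureType == GestureType.FreeDrag)
            {
                dragVector += gesture.Delta;       // accumulate the swipe
            }
            else if (gesture.GestureType == GestureType.DragComplete)
            {
                // Screen Y grows downwards, so negate it for the launch angle.
                Angle = MathHelper.ToDegrees((float)System.Math.Atan2(-dragVector.Y, dragVector.X));
                Velocity = dragVector.Length() * 0.1f;   // assumed scaling factor
                ShotRequested = true;
                dragVector = Vector2.Zero;
            }
        }
    }
}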


Figure 12 - Game flowchart

We tried to optimize the game for the best user experience, but there is room for future improvements both in the UI and in the logic of the game.
* The chart was made using the online tool Gliffy, www.gliffy.com


4.6 Results obtained
In the following section we present the performance obtained by our machine learning program, using logs from the application. The first chart shows the percentage of games won over many simulations against itself. The second table shows the evolution of the weights over the simulations.

Figure 13 - Percentage of games won

As we can see from the graph above, there is little difference between wins and losses, the two being kept almost equal. Although the improvements seem insignificant, the small differences reflect the evolution of the algorithm: because the program is playing against itself, about half of the games are won. The ups and downs of the curve correspond to the more significant changes in the weights' values, changes which affect the entire function.


Name             Initial Value   Value after     Value after     Value after     Value after     Value after
                                 200 games       400 games       600 games       800 games       1000 games
Bias             0.68            0.6731037       0.6733158       0.6732157       0.6702238       0.6715883
Player 1 Lives   0.1             0.1301638       0.1617171       0.1838343       0.2016241       0.2248101
Player 2 Lives   0.1             0.09603488      0.09260094      0.08035062      0.06279583      0.05061561
Distance         3               -0.2506298      0.04197381      0.3638703       -0.1142112      0.5817273
Wind             12              6.957904        4.366641        2.584595        1.598882        0.9481826
Velocity         7               3.011064        2.094112        1.595586        0.9499249       0.8309253
Angle            6               1.566059        0.07657859      -0.01758801     -0.05721566     -0.01786953

We can observe that the values either grow or decrease and tend to converge towards small numbers. But in order to reach convergence they would need more training, perhaps more relevant features and perhaps other action-selection mechanisms. Nonetheless, this is evidence that learning is a valuable mechanism in artificial intelligence, but also one that still needs refinement and research. The values used in the chart and table were taken from the files "Log.txt" and "weightLog.txt" located in the application's resources folder.
* The chart was made using the online tool FooPlot, www.fooplot.com


4.7 Future work


Conclusions


Bibliography
[1] Salen, Katie; Zimmerman, Eric (2003). Rules of Play: Game Design Fundamentals. MIT Press. p. 80. ISBN 0-262-24045-9
[2] Costikyan, Greg (1994). "I Have No Words & I Must Design". Interactive Fantasy, 2 (1994), 3
[3] Abt, Clark C. Serious Games. Viking Press. 1970. p. 6. ISBN 0-670-63490-5
[4] Avedon, Elliot; Sutton-Smith, Brian (1971). The Study of Games. J. Wiley. p. 405. ISBN 0-471-03839-3
[5] Maroney, Kevin (2001). My Entire Waking Life. The Games Journal. May 2001, 1
[6] Epstein, Susan L. Game Playing: The Next Moves. 1999, Sixteenth National Conference on Artificial Intelligence, 987-993. Menlo Park, Calif.: AAAI Press
[7] Crawford, Chris (2003). Chris Crawford on Game Design. New Riders. ISBN 0-88134117-7
[8] Kent, Steven L. The Ultimate History of Video Games. Roseville, Calif.: Prima, 2001
[9] Ted Stahl, Video Game Genres, 2005, http://www.thocp.net/software/games/reference/genres.htm
[10] Robin, AI Search Techniques. December 13th, 2009, http://intelligence.worldofcomputing.net/ai-search/ai-search-techniques.html
[11] Knuth, Donald. The Art of Computer Programming. Volume 3: Sorting and Searching. ISBN 0-201-89685-0
[12] Maes, P. 1995. Modeling Adaptive Autonomous Agents. In Artificial Life: An Overview, Ed. C. G. Langton, 135-162. Cambridge, Mass.: MIT Press
[13] Nwana, H. S. 1996. Software Agents: An Overview. Knowledge Engineering Review, 11(3): 205-244
[14] Russell, S.J., Norvig, P. (1995). Artificial Intelligence: A Modern Approach, Englewood Cliffs, NJ: Prentice Hall
[15] Wooldridge, M., Jennings, N.R. (1995). Intelligent Agents: Theory and Practice, Knowledge Engineering Review


[16] Schermer, B.W., Software agents, surveillance, and the right to privacy: A legislative framework for agent-enabled surveillance. Leiden University Press, 2007
[17] Bradshaw, J. (1998). Software Agents, Menlo Park, California: AAAI Press
[18] Franklin, S., Graesser, A. 1996. Is It an Agent or Just a Program? A Taxonomy for Autonomous Agents. In Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages. New York: Springer-Verlag
[19] Mitchell, T. (1997). Machine Learning, McGraw Hill. ISBN 0-07-042807-7
[20] Laird, J. E., & van Lent, M. (2001). Human-level AI's killer application: Interactive computer games. AI Magazine, 22(2), 15-26
[21] Bowling, M., Furnkranz, J., Graepel, T., Musick, R. Machine learning and games, Machine Learning (2006) 63:211-215, DOI 10.1007/s10994-006-8919-x
[22] Billings, D., Pena, L., Schaeffer, J., & Szafron, D. (2002). The challenge of poker. Artificial Intelligence, 134(1-2), 201-240, Special Issue on Games, Computers and Artificial Intelligence
[23] Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68
[24] Palmer, Nick. Machine Learning in Games Development (June 2002), http://www.aidepot.com/GameAI/Learning.html
[25] Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers". IBM Journal 3 (3): 210-229. Retrieved 2011-10-31
[26] Thrun, Sebastian. Learning to play the game of chess (1995), http://www.robots.stanford.edu/papers/thrun.nips7.neuro-chess.html
[27] Gherrity, Michael (1993). A Game Learning Machine. Ph.D. Thesis, University of California, San Diego
[28] Marsaglia, G., "Random numbers fall mainly in the planes", Proc. Natl. Acad. Sci. 61(1), 25-28 (1968)
[29] MSDN, Windows Phone, Application Platform Overview for Windows Phone. March 22, 2012, http://www.msdn.microsoft.com/en-us/library/ff402531(v=vs.92)
[30] MSDN, XNA Game Studio 4.0, XNA Framework Class Library. http://www.msdn.microsoft.com/en-us/library/bb203940(XNAGameStudio.40).aspx

Images and diagrams were taken from:
1. Figure 1 - OXO: http://www.dreamauthentics.blogspot.ro/2011/09/noughts-and-crosses-first-real.html
2. Figure 2 - Super Mario Bros.: http://nintendo-okie.com/tag/new-super-mario-bros-wii/
3. Figure 3 - Assassin's Creed: http://www.elder-geek.com/2009/09/re-reviewed-assassins-creed/
4. Figure 4 - Franklin and Graesser's (1996) agent taxonomy: Franklin, S., and Graesser, A. 1996. Is It an Agent or Just a Program? A Taxonomy for Autonomous Agents. In Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages. New York: Springer-Verlag
5. Figure 5 - Typology based on Nwana's (Nwana 1996) primary attribute dimension: Nwana, H. S. 1996. Software Agents: An Overview. Knowledge Engineering Review, 11(3): 205-244
6. Figure 6 - Tom M. Mitchell's Learning Model: Mitchell, T. (1997). Machine Learning, McGraw Hill. ISBN 0-07-042807-7
7. Figure 7 - Learning architecture proposed by Nick Palmer: Palmer, Nick. Machine Learning in Games Development (June 2002), http://www.aidepot.com/GameAI/Learning.html
