Game Playing and AI

Game Playing
Chapter 5.1 – 5.3, 5.5

Game Playing and AI

•  Game playing as a problem for AI research:
   –  game playing is non-trivial
      •  players need "human-like" intelligence
      •  games can be very complex (e.g., Chess, Go)
      •  requires decision making within limited time
   –  games usually are:
      •  well-defined and repeatable
      •  fully observable and limited environments
   –  can directly compare humans and computers

Computers Playing Chess

Types of Games

Definitions:
•  Zero-sum: one player's gain is the other player's loss. Does not mean fair.
•  Discrete: states and decisions have discrete values
•  Finite: finite number of states and decisions
•  Deterministic: no coin flips, die rolls – no chance
•  Perfect information: each player can see the complete game state. No simultaneous decisions.

Game Playing and AI

                                         Deterministic                   Stochastic (chance)
Fully Observable (perfect info)          Checkers, Chess, Go, Othello    Backgammon, Monopoly
Partially Observable (imperfect info)    Stratego, Battleship            Bridge, Poker, Scrabble

All are also multi-agent, adversarial, static tasks

Game Playing as Search

•  Consider two-player, perfect information, deterministic, zero-sum board games:
   –  e.g., chess, checkers, tic-tac-toe
   –  board configuration: a unique arrangement of "pieces"
•  Representing board games as a search problem:
   –  states: board configurations
   –  actions: legal moves
   –  initial state: starting board configuration
   –  goal state: game over/terminal board configuration
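As a concrete illustration of this formulation, here is a minimal sketch of tic-tac-toe as a search problem (the class and method names are illustrative, not from the slides): a state is a board configuration plus whose turn it is, actions are the legal moves, and terminal states end the game.

# Minimal sketch of a board game cast as a search problem (illustrative names).
class TicTacToe:
    def initial_state(self):
        return ((None,) * 9, 'X')          # empty 3x3 board, X moves first

    def actions(self, state):
        board, _ = state
        return [i for i in range(9) if board[i] is None]   # legal moves

    def result(self, state, move):
        board, player = state
        new_board = list(board)
        new_board[move] = player
        return (tuple(new_board), 'O' if player == 'X' else 'X')

    def winner(self, state):
        board, _ = state
        lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
        for a, b, c in lines:
            if board[a] is not None and board[a] == board[b] == board[c]:
                return board[a]
        return None

    def is_terminal(self, state):
        return self.winner(state) is not None or all(c is not None for c in state[0])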

Game Tree Representation

What's the new aspect to the search problem? There's an opponent we cannot control! How can we handle this?

[Figure: partial tic-tac-toe game tree; from the empty board, X's possible moves lead to boards where O then has possible replies, and so on.]

Greedy Search using an Evaluation Function

•  A Utility function is used to map each terminal state of the board (i.e., states where the game is over) to a score indicating the value of that outcome to the computer
•  We'll use:
   –  positive for winning; a large + value means better for the computer
   –  negative for losing; a large − value means better for the opponent
   –  0 for a draw
   –  typical values (loss to win):
      •  -∞ to +∞
      •  -1.0 to +1.0

Greedy Search using an Evaluation Function

•  Expand the search tree to the terminal states on each branch
•  Evaluate the Utility of each terminal board configuration
•  Make the initial move that results in the board configuration with the maximum value

[Figure: two-ply game tree. Root A shows the computer's possible moves to B, C, D, E; their children are the opponent's possible moves leading to terminal states F(-7), G(-5), H(3), I(9), J(-6), K(0), L(2), M(1), N(3), O(2). Board evaluations are from the computer's perspective. Taking the maximum below each node gives B = -5, C = 9, D = 2, E = 3, so greedy search at A picks C with value 9.]

•  Assuming a reasonable search space, what's the problem?
   This ignores what the opponent might do!
   Computer chooses C; the opponent then chooses J and defeats the computer.
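A minimal sketch of this greedy strategy, assuming the game interface sketched above plus a utility function defined on terminal states: it searches each subtree for its best terminal value and never models the opponent, which is exactly the flaw just described.

# Greedy move selection (illustrative sketch): expand each move's subtree to the
# terminal states, score terminals with utility(), and pick the initial move whose
# subtree contains the highest value -- ignoring what the opponent might do.
def best_terminal_value(game, state, utility):
    if game.is_terminal(state):
        return utility(state)
    return max(best_terminal_value(game, game.result(state, m), utility)
               for m in game.actions(state))

def greedy_move(game, state, utility):
    return max(game.actions(state),
               key=lambda m: best_terminal_value(game, game.result(state, m), utility))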

Minimax Principle

Assume both players play optimally:
–  assuming there are two moves until the terminal states,
–  high Utility values favor the computer
   •  the computer should choose maximizing moves
–  low Utility values favor the opponent
   •  a smart opponent chooses minimizing moves

•  The computer assumes that after it moves the opponent will choose the minimizing move
•  The computer chooses the best move considering both its move and the opponent's optimal move

[Figure: the same game tree as before with minimax values propagated up: terminal states F(-7), G(-5), H(3), I(9), J(-6), K(0), L(2), M(1), N(3), O(2); opponent's possible moves B = -7, C = -6, D = 0, E = 1; computer's possible moves give A = 1. Board evaluations are from the computer's perspective.]
Propagating Minimax Values Up the Game Tree

•  Explore the tree to the terminal states
•  Evaluate the Utility of the resulting board configurations
•  The computer makes a move to put the board in the best configuration for it, assuming the opponent makes her best moves on her turn(s).

Deeper Game Trees

•  Minimax can be generalized to more than 2 moves
•  Propagate values up the tree:
   –  start at the leaves
   –  assign a value to the parent node as follows
      •  use the minimum when the node is the opponent's move
      •  use the maximum when the node is the computer's move

[Figure: deeper (4-ply) game tree alternating computer (max) and opponent (min) levels; terminal-state values are propagated up, giving values such as F = 4, B = -5, C = 3, E = -7 and A = 3 at the root.]

General Minimax Algorithm

For each move by the computer:
1.  Perform depth-first search, stopping at terminal states
2.  Evaluate each terminal state
3.  Propagate the minimax values upwards:
    –  if it is the opponent's move, propagate up the minimum value of its children
    –  if it is the computer's move, propagate up the maximum value of its children
4.  Choose the move at the root with the maximum of the minimax values of its children

Search algorithm independently invented by Claude Shannon (1950) and Alan Turing (1951)

Complexity of Minimax Algorithm

Assume all terminal states are at depth d and there are b possible moves at each step
•  Space complexity: depth-first search, so O(bd)
•  Time complexity: branching factor b, so O(b^d)
•  Time complexity is a major problem since the computer typically has only a limited amount of time to make a move

Complexity of Game Playing

•  Assume the opponent's moves can be predicted given the computer's moves
•  How complex would search be in this case?
   –  worst case: O(b^d) with branching factor b and depth d
   –  Tic-Tac-Toe: ~5 legal moves, at most 9 moves per game
      •  5^9 = 1,953,125 states
   –  Chess: ~35 legal moves, ~100 moves per game
      •  b^d ~ 35^100 ~ 10^154 states, but only ~10^40 legal states
   –  Go: ~250 legal moves, ~150 moves per game
•  Common games produce enormous search trees

Complexity of Minimax Algorithm

•  The Minimax algorithm applied to complete game trees is impractical in practice
   –  instead do depth-limited search to ply (depth) m
   –  but the Utility function is defined only for terminal states
   –  we need to know a value for non-terminal states
•  Static Evaluation functions use heuristics to estimate the value of non-terminal states

Static Board Evaluation

•  A Static Board Evaluation (SBE) function is used to estimate how good the current board configuration is for the computer
   –  it reflects the computer's chances of winning from that node
   –  it must be easy to calculate from a board configuration
•  For example, for Chess:
      SBE = α * materialBalance + β * centerControl + γ * …
   where materialBalance = value of white pieces − value of black pieces,
   with pawn = 1, rook = 5, queen = 9, etc.

Static Board Evaluation

•  Typically, one subtracts how good the position is for the opponent from how good it is for the computer
•  If the SBE gives X for one player, then it gives -X for the opponent
•  The SBE should agree with the Utility function when calculated at terminal nodes
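A minimal sketch of such an SBE as a weighted sum of features, using the slide's chess-like material balance; the piece value table and helper names are illustrative, not from the slides.

# Illustrative linear static board evaluation (SBE): a weighted sum of features.
# Material balance is white minus black, so the score flips sign for the other side.
PIECE_VALUES = {'pawn': 1, 'knight': 3, 'bishop': 3, 'rook': 5, 'queen': 9}

def material_balance(counts_white, counts_black):
    # counts_* map piece names to how many of that piece each side still has
    white = sum(PIECE_VALUES[p] * n for p, n in counts_white.items())
    black = sum(PIECE_VALUES[p] * n for p, n in counts_black.items())
    return white - black

def sbe(features, weights):
    # features and weights are parallel lists: w1*f1 + w2*f2 + ... + wn*fn
    return sum(w * f for w, f in zip(weights, features))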

Minimax with Evaluation Functions

•  The same as general Minimax, except
   –  only go to depth m
   –  estimate values at the leaves using the SBE function
•  How would this algorithm perform at Chess?
   –  if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
   –  if it could look ahead ~8 pairs, it is as good as a human master

Tic-Tac-Toe Example

Evaluation function = (# 3-lengths open for me) − (# 3-lengths open for opponent)
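A sketch of this tic-tac-toe evaluation function for a 9-element board of 'X', 'O', or None (helper names are illustrative): a line is "open" for a player if it contains none of the opponent's marks.

# Tic-tac-toe SBE from the slide: open 3-lengths for me minus open 3-lengths for opponent.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def open_lines(board, player):
    opponent = 'O' if player == 'X' else 'X'
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))

def tictactoe_sbe(board, player):
    opponent = 'O' if player == 'X' else 'X'
    return open_lines(board, player) - open_lines(board, opponent)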

Minimax Algorithm

function Max-Value(s)
  inputs: s: current state in game, Max about to play
  output: best score (for Max) available from s
  if ( s is a terminal state or at depth limit ) then
    return ( SBE value of s )
  else
    v = –∞                          // v is current best minimax value at s
    foreach s’ in Successors(s)
      v = max( v, Min-Value(s’) )
    return v

function Min-Value(s)
  output: best score (for Min) available from s
  if ( s is a terminal state or at depth limit ) then
    return ( SBE value of s )
  else
    v = +∞                          // v is current best minimax value at s
    foreach s’ in Successors(s)
      v = min( v, Max-Value(s’) )
    return v
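A straightforward Python rendering of this pseudocode, as a sketch: the helper names (successors, is_terminal, sbe) are placeholders supplied by the caller, and a depth counter plays the role of the depth limit.

# Depth-limited minimax mirroring the Max-Value / Min-Value pseudocode above.
import math

def max_value(s, depth, successors, is_terminal, sbe):
    if is_terminal(s) or depth == 0:
        return sbe(s)
    v = -math.inf                    # best minimax value found so far at s
    for s2 in successors(s):
        v = max(v, min_value(s2, depth - 1, successors, is_terminal, sbe))
    return v

def min_value(s, depth, successors, is_terminal, sbe):
    if is_terminal(s) or depth == 0:
        return sbe(s)
    v = math.inf
    for s2 in successors(s):
        v = min(v, max_value(s2, depth - 1, successors, is_terminal, sbe))
    return v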

Summary So Far

•  Can't use Minimax search to the end of the game
   –  if we could, then choosing the optimal move would be easy
•  The SBE isn't perfect at estimating/scoring
   –  if it were, we could just choose the best move without searching
•  Since neither is feasible for interesting games, combine the Minimax and SBE concepts:
   –  use Minimax to a cutoff depth m
   –  use the SBE to estimate/score board configurations at that depth

Minimax Example

[Figure: the deeper game tree from the previous slide, with alternating max and min levels (max at root A; min at B, C, D, E; max at F–M; min at the deeper nodes N–X) and the terminal values shown; the exercise is to propagate the minimax values up this tree.]

Alpha-Beta Idea

•  Some of the branches of the game tree won't be taken if playing against an intelligent opponent
•  "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
•  Pruning can be used to ignore some branches
•  While doing DFS of the game tree, keep track of:
   –  At maximizing levels:

•  highest SBE value, v, seen so far in subtree below each node •  lower bound on node's final minimax value

–  At minimizing levels:

•  lowest SBE value, v, seen so far in subtree below each node •  upper bound on node's final minimax value

Alpha-Beta Idea: Alpha Cutoff

[Figure: max node S with α = 100; its MIN child A has leaf children C(200) and D(100), giving A = 100; its MIN child B has leaf children E(120), F(20), and G, with v = 20 at B after visiting F.]

Depth-first traversal order:
•  After returning from A, S can get at least 100
•  After returning from F, B can get at most 20
•  At this point, no matter what minimax value is computed at G, S will prefer A over B. So, S loses interest in B
•  There is no need to visit G. The subtree at G is pruned. Saves time. Called "Alpha cutoff" (at MIN node B)

Beta Cutoff Example

[Figure: max node S; its MIN child A has β = 20; A's MAX child B has leaf children D(20), E(-10), F(-20), giving B = 20; A's MAX child C has leaf children G(25) and H, with v = 25 at C after visiting G.]

•  After returning from B, can get at most 20 at MIN node A
•  After returning from G, can get at least 25 at MAX node C
•  No matter what minimax value is found at H, A will NEVER choose C over B, so don't visit node H
•  Called "Beta Cutoff" (at MAX node C)

Alpha Cutoff

•  At each MIN node, keep track of the minimum value returned so far from its visited children
•  Store this value as v
•  Each time v is updated (at a MIN node), check its value against the α value of all its MAX node ancestors
•  If α ≥ v for some MAX node ancestor, don't visit any more of the current MIN node's children; i.e., prune (cut off) all subtrees rooted at the remaining children

Beta Cutoff

•  At each MAX node, keep track of the maximum value returned so far from its visited children
•  Store this value as v
•  Each time v is updated (at a MAX node), check its value against the β value of all its MIN node ancestors
•  If v ≥ β for some MIN node ancestor, don't visit any more of the current MAX node's children; i.e., prune (cut off) all subtrees rooted at the remaining children

Implementation of Cutoffs

At each node, keep both α and β values, where
   α = largest (i.e., best) value from its MAX node ancestors in the search tree, and
   β = smallest (i.e., best) value from its MIN node ancestors in the search tree.
Pass these down the tree during traversal.
–  At a MAX node, v = largest value from its children visited so far
   •  the v value at MAX comes from its descendants
   •  the β value at MAX comes from its MIN node ancestors
–  At a MIN node, v = smallest value from its children visited so far
   •  the α value at MIN comes from its MAX node ancestors
   •  the v value at MIN comes from its descendants

•  At each node, keep two bounds (based on all ancestors visited so far on the path back to the root):
   –  α: the best (largest) MAX can do at any ancestor
   –  β: the best (smallest) MIN can do at any ancestor
   –  v: the best value returned by the current node's visited children
•  If at any time α ≥ v at a MIN node, the remaining children are pruned (i.e., not visited)

Implementation of Alpha Cutoff

[Figure: initialize the root's values to α = -∞, β = +∞ at max node S; S's MIN children are A and B; A's leaf children are C(200) and D(100); B's leaf children are E(120), F(20), and G.]

Alpha Cutoff Example

[Figure sequence: the same tree stepped through. The root S starts with α = -∞, β = +∞, which are passed down. Visiting A's children C(200) and D(100) updates v at MIN node A from +∞ to 200 to 100, so A = 100 and S's α becomes 100. At MIN node B (α = 100), visiting E(120) sets v = 120, then visiting F(20) sets v = 20; since α = 100 ≥ v = 20, G is pruned.]

Notes:
•  An alpha cutoff means not visiting some of a MIN node's children
•  v values at MIN come from descendants
•  Alpha values at MIN come from MAX node ancestors

Alpha-Beta Algorithm

Starting from the root: Max-Value(root, -∞, +∞)

function Max-Value(s, α, β)
  inputs: s: current state in game, Max about to play
          α: best score (highest) for Max along path from s to root
          β: best score (lowest) for Min along path from s to root
  if ( s is a terminal state or at depth limit ) then
    return ( SBE value of s )
  v = -∞                            // v = best minimax value found so far at s
  for each s’ in Successors(s)
    v = max( v, Min-Value(s’, α, β) )
    if ( v ≥ β ) then return v      // prune remaining children
    α = max( α, v )
  return v                          // return value of best child

function Min-Value(s, α, β)
  if ( s is a terminal state or at depth limit ) then
    return ( SBE value of s )
  v = +∞                            // v = best minimax value found so far at s
  for each s’ in Successors(s)
    v = min( v, Max-Value(s’, α, β) )
    if ( α ≥ v ) then return v      // prune remaining children
    β = min( β, v )
  return v                          // return value of best child
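A Python rendering of this pseudocode as a sketch, with the same caller-supplied helpers (successors, is_terminal, sbe) assumed as in the minimax version above.

# Depth-limited alpha-beta search mirroring the pseudocode above.
import math

def ab_max_value(s, alpha, beta, depth, successors, is_terminal, sbe):
    if is_terminal(s) or depth == 0:
        return sbe(s)
    v = -math.inf
    for s2 in successors(s):
        v = max(v, ab_min_value(s2, alpha, beta, depth - 1, successors, is_terminal, sbe))
        if v >= beta:
            return v                 # prune remaining children (beta cutoff)
        alpha = max(alpha, v)
    return v

def ab_min_value(s, alpha, beta, depth, successors, is_terminal, sbe):
    if is_terminal(s) or depth == 0:
        return sbe(s)
    v = math.inf
    for s2 in successors(s):
        v = min(v, ab_max_value(s2, alpha, beta, depth - 1, successors, is_terminal, sbe))
        if alpha >= v:
            return v                 # prune remaining children (alpha cutoff)
        beta = min(beta, v)
    return v

# Start the search from the root:
#   ab_max_value(root, -math.inf, math.inf, depth_limit, successors, is_terminal, sbe)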





Alpha-Beta Example

[Figure sequence: alpha-beta search traced step by step on the deeper game tree from the Minimax example (root A; MIN nodes B, C, D, E; MAX nodes F–M; deeper MIN nodes N–X). Each panel shows the current α, β, and v values along the path and the recursion call stack. Legend: brown = terminal state, blue = at the depth limit, red = not visited.]

Key steps and annotations from the trace:
•  Initialize the root's values: α = -∞, β = +∞ at A, and pass them down during the descent to B and F
•  v(F) = alpha(F) = 4, the maximum seen so far
•  v(O) = -3, the minimum seen so far below O
•  alpha(O) ≥ v(O): stop expanding O (alpha cutoff)
   Why? The smart opponent will choose W or worse, thus O's upper bound is -3. So, at F the computer shouldn't choose O:-3 since N:4 is better
•  alpha(F) not changed (maximizing)
•  v(B) = beta(B) = 4, the minimum seen so far; then v(B) = beta(B) = -5, updated to the minimum seen so far
•  v(A) = alpha(A) = -5, the maximum seen so far; copy the alpha and beta values from A to C
•  v(C) = beta(C) = 3, the minimum seen so far
•  beta(C) not changed (minimizing)
•  v(J) = 9; v(J) ≥ beta(J): stop expanding J (beta cutoff)
   Why? The computer can get P or better at J, so J's lower bound is 9. But the smart opponent at C won't take J:9 since H:3 is better for the opponent. v(C) and beta(C) are not changed (minimizing)
•  v(A) = alpha(A) = 3, updated to the maximum seen so far
•  How does the algorithm finish the search tree? alpha(A) and v(A) are not updated after returning from D (maximizing)
•  After visiting K, S, T, L and returning to E, alpha(E) ≥ v(E): stop expanding E and don't visit M (alpha cutoff)
   Why? The smart opponent will choose L or worse, thus E's upper bound is 2. So the computer at A shouldn't choose E:2 since C:3 is a better move
•  Final result: the computer chooses move C

Another step-by-step example (from the AI course at UC Berkeley) is given at https://www.youtube.com/watch?v=xBXHtz4Gbdo

Effectiveness of Alpha-Beta Search

•  Effectiveness depends on the order in which successors are examined; it is more effective if the best successors are examined first
•  Worst Case:
   –  ordered so that no pruning takes place
   –  no improvement over exhaustive search

•  Best Case:

–  each player’s best move is visited first

•  In practice, performance is closer to the best case than to the worst case

Dealing with Limited Time

•  In real games, there is usually a time limit T on making a move
•  How do we take this into account?
   –  we cannot stop alpha-beta midway and expect to use its results with any confidence
   –  so, we could set a conservative depth limit that guarantees we will find a move in time < T
   –  but then the search may finish early and the opportunity to do more search is wasted

Effectiveness of Alpha-Beta Search

•  In practice we often get O(b^(d/2)) rather than O(b^d)
   –  the same as having a branching factor of √b, since (√b)^d = b^(d/2)

•  Example: Chess
   –  Deep Blue went from b ~ 35 to b ~ 6, visiting 1 billionth the number of nodes visited by the plain Minimax algorithm
   –  permits much deeper search in the same time
   –  makes computer chess competitive with humans

Dealing with Limited Time

In practice, use iterative deepening search (IDS):
–  run alpha-beta search with depth-first search and an increasing depth limit
–  when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
–  this is an "anytime algorithm"
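A minimal sketch of this idea; search_at_depth is an assumed stand-in for a depth-limited alpha-beta search that returns a best move, and a real implementation would also check the clock inside that search and abort cleanly rather than only between iterations.

# Time-limited iterative deepening (sketch): keep the result of the last
# completed depth-limited search when the clock runs out.
import time

def iterative_deepening(root, search_at_depth, time_limit_seconds):
    deadline = time.monotonic() + time_limit_seconds
    best_move = None
    depth = 1
    while time.monotonic() < deadline:
        best_move = search_at_depth(root, depth)   # last completed search wins
        depth += 1
    return best_move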

The Horizon Effect

•  Sometimes disaster lurks just beyond the search depth
   –  the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)

•  The computer has a limited horizon; it cannot see that this significant event could happen
•  How do you avoid catastrophic losses due to "short-sightedness"?
   –  quiescence search
   –  secondary search

Book Moves

•  Build a database of opening moves, end games, and studied configurations
•  If the current state is in the database, use the database:
   –  to determine the next move
   –  to evaluate the board

•  Otherwise, do Alpha-Beta search

The Horizon Effect •  Quiescence Search –  when SBE value is frequently changing, look deeper than limit –  look for point when game “quiets down” –  E.g., always expand any forced sequences

•  Secondary Search
   1.  find the best move looking to depth d
   2.  look k steps beyond to verify that it still looks good
   3.  if it doesn't, repeat step 2 for the next best move

More on Evaluation Functions

The board evaluation function estimates how good the current board configuration is for the computer
–  it is a heuristic function of the board's features
   •  i.e., function(f1, f2, f3, …, fn)
–  the features are numeric characteristics
   •  feature 1, f1, is the number of white pieces
   •  feature 2, f2, is the number of black pieces
   •  feature 3, f3, is f1/f2
   •  feature 4, f4, is an estimate of the "threat" to the white king
   •  etc.

Linear Evaluation Functions

•  A linear evaluation function of the features is a weighted sum of f1, f2, f3, ...
      w1 * f1 + w2 * f2 + w3 * f3 + … + wn * fn
   –  where f1, f2, …, fn are the features
   –  and w1, w2, …, wn are the weights

•  More important features get more weight

Examples of Algorithms that Learn to Play Well: Checkers

A. L. Samuel, "Some Studies in Machine Learning using the Game of Checkers," IBM Journal of Research and Development, 11(6):601-617, 1959
•  Learned by playing thousands of times against a copy of itself
•  Used an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
•  Successful enough to compete well at human tournaments

Linear Evaluation Functions

•  The quality of play depends directly on the quality of the evaluation function
•  To build an evaluation function we have to:
   1.  construct good features using expert domain knowledge
   2.  pick or learn good weights

Examples of Algorithms that Learn to Play Well: Backgammon

G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3), 357-390, 1989
•  Also learns by playing against copies of itself
•  Uses a non-linear evaluation function – a neural network
•  Rated one of the top three players in the world

Non-Deterministic Games

[Figure: backgammon board with numbered points]

•  Some games involve chance, for example:
   –  roll of dice
   –  spin of a game wheel
   –  deal of cards from a shuffled deck
•  How can we handle games with random elements?
•  The game tree representation is extended to include "chance nodes":
   1.  computer moves
   2.  chance nodes (representing random events)
   3.  opponent moves

Non-Deterministic Games

•  Weight the score by the probability that the move occurs
•  Use the expected value for a move: instead of using max or min, compute the average, weighted by the probabilities of each child
•  Choose the move with the highest expected value

Extended game tree representation:

[Figure: max root A over two 50/50 chance nodes. The left chance node's MIN children are B = min(7, 2) = 2 and C = min(9, 6) = 6, giving expected value 0.5·2 + 0.5·6 = 4; the right chance node's MIN children are D = min(5, 0) = 0 and E = min(8, -4) = -4, giving expected value -2. A chooses the branch with the highest expected value, 4.]

Expectiminimax

expectiminimax(n) =
   Utility(n)                                          if n is a terminal state
   max over s in Succ(n) of expectiminimax(s)          if n is a Max node
   min over s in Succ(n) of expectiminimax(s)          if n is a Min node
   Σ over s in Succ(n) of P(s) · expectiminimax(s)     if n is a Chance node
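A direct sketch of this recurrence, assuming a small node interface (is_terminal, utility, node_type, successors, chance_successors) that is illustrative rather than from the slides.

# Expectiminimax over a game tree with max, min, and chance nodes.
def expectiminimax(n):
    if n.is_terminal():
        return n.utility()
    if n.node_type == 'max':
        return max(expectiminimax(s) for s in n.successors())
    if n.node_type == 'min':
        return min(expectiminimax(s) for s in n.successors())
    # chance node: probability-weighted average of the children
    return sum(p * expectiminimax(s) for p, s in n.chance_successors())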

Non-Deterministic Games

•  Non-determinism increases the branching factor
   –  21 possible distinct rolls with 2 dice (since 6-5 is the same as 5-6)
•  The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
•  Alpha-Beta pruning is less effective

History of Search Innovations

•  Shannon, Turing      Minimax search           1950
•  Kotok/McCarthy       Alpha-Beta pruning       1966
•  MacHack              Transposition tables     1967
•  Chess 3.0+           Iterative deepening      1975
•  Belle                Special hardware         1978
•  Cray Blitz           Parallel search          1983
•  Hitech               Parallel evaluation      1985
•  Deep Blue            ALL OF THE ABOVE         1997

Computers Play Grandmaster Chess: "Deep Blue" (IBM)

•  Parallel processor, 32 "nodes"
•  Each node had 8 dedicated VLSI "chess chips"
•  Searched 200 million configurations/second
•  Used minimax, alpha-beta, and sophisticated heuristics
•  Average branching factor ~6 instead of ~40
•  In 2001 searched to 14 ply (i.e., 7 pairs of moves)
•  Avoided the horizon effect by searching as deep as 40 ply
•  Used book moves

Computers can Play Grandmaster Chess

Kasparov vs. Deep Blue, May 1997
•  6-game full-regulation chess match sponsored by ACM
•  Kasparov lost the match: 1 win and 3 ties against Deep Blue's 2 wins
•  Historic achievement for computer chess; the first time a computer became the best chess player on the planet
•  Deep Blue played by "brute force" (i.e., raw power from computer speed and memory); it used relatively little that is similar to human intuition and cleverness

Status of Computers in Other Deterministic Games

“Game Over: Kasparov and the Machine” (2003)

•  Checkers –  First computer world champion: Chinook –  Beat all humans (beat Marion Tinsley in 1994) –  Used Alpha-Beta search and book moves

•  Othello –  Computers easily beat world experts

•  Go –  Branching factor b ~ 360, very large! –  Beat Lee Sedol, one of the top players in the world, in 2016, 4 games to 1

Game Playing: Go Google’s AlphaGo beat Korean grandmaster Lee Sedol 4 games to 1 in 2016

How to Improve Performance?

•  Reduce the depth of search
   –  Better SBEs
      •  Use deep learning rather than hand-crafted features
•  Reduce the breadth of search
   –  Sample possible moves instead of exploring all of them
      •  Use randomized exploration of the search space

Monte Carlo Tree Search (MCTS)

Pure Monte Carlo Tree Search

•  Concentrate search on most promising moves •  Best-first search based on random sampling of search space

•  For each possible legal move of the current player, simulate k random games by selecting moves at random for both players until the game is over (these simulations are called playouts); count how many wins occur out of the k playouts; the move with the most wins is selected
•  Stochastic simulation of the game
•  The game must have a finite number of possible moves, and the game length must be finite
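A minimal sketch of pure Monte Carlo tree search under these assumptions, reusing the kind of game interface sketched earlier (actions, result, is_terminal, winner are assumed names); draws simply count as non-wins here.

# Pure MCTS: k random playouts per legal move; pick the move with the most wins.
import random

def random_playout(game, state, me):
    while not game.is_terminal(state):
        state = game.result(state, random.choice(game.actions(state)))
    return 1 if game.winner(state) == me else 0

def pure_mcts_move(game, state, me, k=100):
    def wins(move):
        after = game.result(state, move)
        return sum(random_playout(game, after, me) for _ in range(k))
    return max(game.actions(state), key=wins)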

•  Monte Carlo methods are a broad class of algorithms that rely on repeated random sampling to obtain numerical results. They can be used to solve problems having a probabilistic interpretation.

Exploitation vs. Exploration

•  Rather than selecting a child at random, how do we select the best child node during tree descent?
   –  Exploitation: keep track of the average win rate for each child from previous searches; prefer the child that has previously led to more wins
   –  Exploration: allow for exploration of relatively unvisited children (moves) too
•  Combine these factors to compute a "score" for each child; pick the child with the highest score at each successive node in the search

MCTS Algorithm

Recursively build the search tree, where each round consists of:
1.  Starting at the root, successively select the best child nodes using the scoring method until a leaf node L is reached
2.  Create and add the best new child node, C, of L
3.  Perform a random playout from C
4.  Update the score at C and all of C's ancestors in the search tree
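The slides leave the child-scoring rule abstract; UCB1 is a common concrete choice for combining the two factors, so the sketch below uses it as an illustrative assumption (the constant c trades off exploration against exploitation).

# UCB1 score for child selection during tree descent (illustrative sketch).
import math

def ucb1(child_wins, child_visits, parent_visits, c=1.4):
    if child_visits == 0:
        return math.inf                      # always try unvisited children first
    win_rate = child_wins / child_visits     # exploitation term
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return win_rate + exploration

def select_child(children):
    # children: list of (wins, visits) pairs for one node's children;
    # parent visits approximated as the sum of the children's visits
    parent_visits = sum(v for _, v in children)
    return max(range(len(children)),
               key=lambda i: ucb1(children[i][0], children[i][1], parent_visits))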

Monte Carlo Tree Search (MCTS)

[Figures: one MCTS round shown on a small tree – selection down to leaf L, expansion of a new child C, a random playout from C, and updating the scores of C and all its ancestors. Key: each node is labeled "number of games won / number of playouts played".]

State-of-the-Art Go Programs

•  Google's AlphaGo
•  Facebook's Darkforest
•  MCTS implemented using multiple threads and GPUs, and up to 110K playouts
•  Also use a deep neural network to compute the SBE

Summary

•  Minimax is an algorithm that chooses "optimal" moves by assuming that the opponent always chooses their best move
•  Alpha-beta is an algorithm that can avoid large parts of the search tree, thus enabling the search to go deeper
•  For many well-known games, computer algorithms using heuristic search can match or out-perform human world experts

Summary

•  Game playing is best modeled as a search problem
•  Search trees for games represent alternating computer/opponent moves
•  Evaluation functions estimate the quality of a given board configuration for each player:
   −  good for opponent
   0  neutral
   +  good for computer