Game Tree Representation

What's the new aspect to the search problem?

There's an opponent we cannot control!

How can we handle this?

[Figure: partial tic-tac-toe game tree — X's possible moves from the
current board, followed by O's possible replies]

Greedy Search
using an Evaluation Function

• A Utility function is used to map each terminal state
  of the board (i.e., states where the game is over) to a
  score indicating the value of that outcome to the
  computer
• We'll use:
  – positive for winning; large + means better for computer
  – negative for losing; large − means better for opponent
  – 0 for a draw
  – typical values (loss to win):
    • -∞ to +∞
    • -1.0 to +1.0
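The utility function described above can be sketched for tic-tac-toe. This is a minimal illustration, not from the slides: it assumes the board is a tuple of 9 cells holding 'X' (computer), 'O' (opponent), or None, and that the board passed in is terminal.

```python
# Utility function sketch for tic-tac-toe (illustrative representation):
# maps a terminal board to a score from the computer's perspective.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def utility(board):
    """Return +1.0 for a computer win, -1.0 for a loss, 0.0 for a draw."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return +1.0 if board[a] == 'X' else -1.0
    return 0.0  # no completed line: a draw (assumes the board is terminal)
```

Any range symmetric around 0 works equally well (e.g., -∞/+∞ for a forced loss/win); only the ordering of outcomes matters.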
Greedy Search
using an Evaluation Function

• Expand the search tree to the terminal states on each branch
• Evaluate the Utility of each terminal board configuration
• Make the initial move that results in the board configuration
  with the maximum value

[Figure: tree of the computer's possible moves A → B, C, D, E
(the opponent's possible moves), with terminal states F…O evaluated
from the computer's perspective as -7, -5, 3, 9, -6, 0, 2, 1, 3, 2;
the backed-up values are B = -5, C = 9, D = 2, E = 3, A = 9]

Evaluation function = (# 3-lengths open for me) – (# 3-lengths open for opponent)

Greedy Search
using an Evaluation Function

• Assuming a reasonable search space, what's the problem?

  This ignores what the opponent might do!
  Computer chooses C
  Opponent chooses J and defeats computer
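The greedy procedure can be sketched on the slides' example tree. The grouping of terminals under B–E is inferred from the backed-up values in the figure (B = -5, C = 9, D = 2, E = 3), so treat it as illustrative:

```python
# Greedy search sketch: score each immediate move by the best value
# reachable anywhere below it, then pick the max -- ignoring that the
# opponent chooses at the next level (the flaw the slides point out).

tree = {'A': ['B', 'C', 'D', 'E'],                       # computer's moves
        'B': ['F', 'G'], 'C': ['H', 'I', 'J'],           # opponent's moves
        'D': ['K', 'L'], 'E': ['M', 'N', 'O']}
value = {'F': -7, 'G': -5, 'H': 3, 'I': 9, 'J': -6,
         'K': 0, 'L': 2, 'M': 1, 'N': 3, 'O': 2}

def best_reachable(s):
    if s not in tree:                      # terminal state: its utility
        return value[s]
    return max(best_reachable(c) for c in tree[s])

def greedy_move(state):
    return max(tree[state], key=best_reachable)
```

Greedy picks C (it sees I = 9 below it), but a smart opponent replies with J = -6, exactly the failure the slide describes.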
Minimax Algorithm

function Max-Value(s)
  inputs:
    s: current state in game, Max about to play
  output: best-score (for Max) available from s
  if ( s is a terminal state or at depth limit )
  then return ( SBE value of s )
  else
    v = –∞
    foreach s’ in Successors(s)
      v = max( v, Min-Value(s’) )
    return v

function Min-Value(s)
  output: best-score (for Min) available from s
  if ( s is a terminal state or at depth limit )
  then return ( SBE value of s )
  else
    v = +∞
    foreach s’ in Successors(s)
      v = min( v, Max-Value(s’) )
    return v

Minimax Example

[Figure: example game tree. Max root A; min nodes B, C, D = 0, E;
max nodes F, G = -5, H = 3, I = 8, J, K = 2, L, M; below them,
min node O and terminal values N = 4, P = 9, Q = -6, R = 0, S = 3,
T = 5, U = -7, V = -9; terminals W = -3, X = -5 under O]
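The Max-Value / Min-Value pseudocode transcribes almost line-for-line into Python. This sketch assumes a hypothetical game given as two dicts — successors for internal states and SBE (static board evaluation) values at the leaves — rather than a real game:

```python
# Minimax via mutually recursive Max-Value / Min-Value, as in the
# pseudocode above. The tiny tree here is illustrative only.
import math

successors = {'A': ['B', 'C'],      # A: max node (root)
              'B': ['D', 'E'],      # B, C: min nodes
              'C': ['F', 'G']}
sbe = {'D': 3, 'E': 5, 'F': 2, 'G': 9}   # leaf evaluations

def max_value(s):
    if s not in successors:               # terminal state or depth limit
        return sbe[s]
    v = -math.inf
    for s2 in successors[s]:
        v = max(v, min_value(s2))
    return v

def min_value(s):
    if s not in successors:
        return sbe[s]
    v = math.inf
    for s2 in successors[s]:
        v = min(v, max_value(s2))
    return v
```

Here B backs up min(3, 5) = 3 and C backs up min(2, 9) = 2, so the root's minimax value is max(3, 2) = 3.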
Alpha Cutoff Example

[Figure: max root S over min nodes A and B; terminals C = 200,
D = 100 under A, and E = 120, F = 20, G under B]

• Enter min node A with α = -∞, β = +∞, v = +∞
• Visit C: v = min(+∞, 200) = 200, so β = 200
• Visit D: v = min(200, 100) = 100, so β = 100; A returns 100,
  and at S, α = 100
• Enter min node B with α = 100, β = +∞
• Visit E: v = 120 and β = 120; v > α, so continue to the next
  child of B
Alpha Cutoff Example

[Figure: max root S with α = 100; min node A (β = 100, value 100);
min node B with β = 120, then v = 20; child G marked with an X]

• Visit F: v = min(120, 20) = 20; v = 20 ≤ α = 100, so prune G

Notes:
• An alpha cutoff means not visiting some of a MIN node’s children
• v values at a MIN node come from its descendants
• Alpha values at a MIN node come from its MAX node ancestors

Alpha-Beta Cutoffs

• At a MAX node, if v ≥ β then don’t visit (i.e.,
  cut off) the remaining children of this MAX node
  – v is the max value found so far from visiting the current MAX
    node’s children
  – β is the best value found so far at any MIN node ancestor
    of the current MAX node
• At a MIN node, if v ≤ α then don’t visit the
  remaining children of this MIN node
  – v is the min value found so far from visiting the current MIN
    node’s children
  – α is the best value found so far at any MAX node ancestor
    of the current MIN node
Alpha-Beta Algorithm

Starting from the root: Max-Value(root, -∞, +∞)

function Max-Value(s, α, β)
  inputs:
    s: current state in game, Max about to play
    α: best score (highest) for Max along path from s to root
    β: best score (lowest) for Min along path from s to root
  if ( s is a terminal state or at depth limit )
  then return ( SBE value of s )
  v = -∞
  for each s’ in Successors(s)
    v = max( v, Min-Value(s’, α, β) )
    if ( v ≥ β ) then return v    // prune remaining children
    α = max(α, v)
  return v                        // return value of best child

function Min-Value(s, α, β)
  if ( s is a terminal state or at depth limit )
  then return ( SBE value of s )
  v = +∞
  for each s’ in Successors(s)
    v = min( v, Max-Value(s’, α, β) )
    if ( v ≤ α ) then return v    // prune remaining children
    β = min(β, v)
  return v                        // return value of best child
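The alpha-beta pseudocode in Python, run on the alpha cutoff example tree (S → A, B; A → C = 200, D = 100; B → E = 120, F = 20, G). G's value never appears in the slides, since it is pruned; the 999 below is an arbitrary stand-in to make the point that its value cannot matter:

```python
# Alpha-beta search over a dict-based game tree (illustrative).
import math

successors = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['E', 'F', 'G']}
sbe = {'C': 200, 'D': 100, 'E': 120, 'F': 20,
       'G': 999}   # arbitrary: G is pruned, so its value is never read
visited = []       # records terminal visits, to observe the pruning

def max_value(s, alpha, beta):
    if s not in successors:
        visited.append(s)
        return sbe[s]
    v = -math.inf
    for s2 in successors[s]:
        v = max(v, min_value(s2, alpha, beta))
        if v >= beta:
            return v          # beta cutoff: prune remaining children
        alpha = max(alpha, v)
    return v

def min_value(s, alpha, beta):
    if s not in successors:
        visited.append(s)
        return sbe[s]
    v = math.inf
    for s2 in successors[s]:
        v = min(v, max_value(s2, alpha, beta))
        if v <= alpha:
            return v          # alpha cutoff: prune remaining children
        beta = min(beta, v)
    return v
```

Calling `max_value('S', -math.inf, math.inf)` returns 100, and G is never visited: after A returns 100, B's child F drives v down to 20 ≤ α = 100, triggering the alpha cutoff.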
Alpha-Beta Example

[Figure: the minimax example tree, terminal states shown in brown.
The search descends with α = -∞, β = +∞ at each new node: first
into min node B (call stack: A, B), then into max node F (call
stack: A, B, F)]
Alpha-Beta Example

[Figure: F's first child N = 4 gives v = 4, α = 4 at F. F's next
child, min node O, is entered with α = 4, β = +∞ (call stack:
A, B, F, O); O's first child W = -3 — a state evaluated at the
depth limit, shown in blue — gives β = v = -3 at O]
Alpha-Beta Example

v(O) = -3 ≤ alpha(O) = 4: stop expanding O (cutoff)

Why? The smart opponent will choose W or worse, so O's upper bound
is -3. Thus, at F the computer shouldn't choose O (-3), since
N (4) is better.

[Figure: O's other child X, shown in red, is not visited]

v(F) = alpha(F) = 4, not changed (maximizing)

v(B) = beta(B) = 4, the minimum seen so far
Alpha-Beta Example

[Figure: B's next child G = -5 gives β(B) = v(B) = -5; B returns
-5 to A]

v(A) = alpha(A) = -5, the maximum seen so far

Copy the alpha and beta values from A to C: min node C is entered
with α = -5, β = +∞ (call stack: A, C)
Alpha-Beta Example

[Figure: C's first child H = 3 gives v = 3, β = 3 at C (call
stack: A, C); C's next child I = 8 leaves v and β unchanged
(minimizing)]
Alpha-Beta Example

[Figure: C's next child, max node J, is entered with α = -5, β = 3
(call stack: A, C, J); J's first child P = 9 gives v = 9. J's
remaining children Q and R, shown in red, are not visited]
Alpha-Beta Example

v(J) = 9 ≥ beta(J) = 3: stop expanding J (cutoff)

Why? The computer would choose P or better at J, so J's lower
bound is 9. But the smart opponent at C won't take J (9), since
H (3) is better for the opponent.

v(C) and beta(C) not changed (minimizing); C returns 3 to A, and
the search moves on to child D (call stack: A, D)
Alpha-Beta Example

alpha(A) and v(A) are not updated after returning from D = 0
(because A is a maximizing node and 0 is not an improvement)

How does the algorithm finish the search tree?

[Figure: min node E is entered with α = 3, β = +∞ and finishes
with α = 3, β = 5, v = 2; inside it, max node L ends with α = 5,
β = 2]
Alpha-Beta Example

Final result: Computer chooses move C

[Figure: the finished tree, with α = 3 at max root A — the root's
backed-up value is 3, obtained through move C]

Another step-by-step example (from the AI course at UC Berkeley) is
given at https://www.youtube.com/watch?v=xBXHtz4Gbdo
Non-Deterministic Games

[Figure: backgammon board, points numbered 0–25]

• Some games involve chance, for example:
  – roll of dice
  – spin of game wheel
  – deal cards from a shuffled deck
• How can we handle games with random elements?
  – Modify the game search tree to include chance nodes:
    1. computer moves
    2. chance nodes (representing random events)
    3. opponent moves

Non-Deterministic Games

Extended game tree representation:

[Figure: game tree with max node A at the root, chance nodes below
it, and leaf values 7, 2, 9, 6, 5, 0, 8, -4]

• Weight each score by the probability that the move occurs
• Use the expected value for a move: instead of taking the max or
  min, compute the average of the children's values, weighted by
  the probabilities of each child
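The expected-value idea above is often called expectiminimax. A minimal sketch, with an illustrative tree and made-up probabilities (not the slides' figure):

```python
# Expectiminimax sketch: max/min nodes back up max/min as before;
# chance nodes back up the probability-weighted average.

def expectiminimax(node):
    kind, data = node
    if kind == 'leaf':
        return data
    if kind == 'max':
        return max(expectiminimax(c) for c in data)
    if kind == 'min':
        return min(expectiminimax(c) for c in data)
    # chance node: data is a list of (probability, child) pairs
    return sum(p * expectiminimax(c) for p, c in data)

# A max node choosing between two chance nodes (e.g., two dice events):
tree = ('max', [
    ('chance', [(0.5, ('leaf', 7)), (0.5, ('leaf', 2))]),  # expectation 4.5
    ('chance', [(0.8, ('leaf', 5)), (0.2, ('leaf', 0))]),  # expectation 4.0
])
```

The max player prefers the first chance node (4.5 > 4.0) even though the second offers the safer worst case; expectations, not single outcomes, drive the choice.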
Computers Can Play Grandmaster Chess

“Game Over: Kasparov and the Machine” (2003)
Monte Carlo Tree Search (MCTS)

• Concentrate the search on the most promising moves
• Best-first search based on random sampling of the search space
• Monte Carlo methods are a broad class of algorithms that rely
  on repeated random sampling to obtain numerical results. They
  can be used to solve problems having a probabilistic
  interpretation.

Pure Monte Carlo Tree Search

• For each possible legal move of the current player, simulate k
  random games (called playouts) by selecting moves at random for
  both players until the game is over; count how many of the k
  playouts were wins; select the move with the most wins
• Stochastic simulation of the game
• The game must have a finite number of possible moves, and game
  length must be finite
Exploitation vs. Exploration

• Rather than selecting a child at random, how should we select
  the best child node during tree descent?
  – Exploitation: keep track of the average win rate for each
    child from previous searches; prefer the child that has
    previously led to more wins
  – Exploration: allow for exploration of relatively unvisited
    children (moves) too
• Combine these factors to compute a “score” for each child;
  pick the child with the highest score at each successive node
  in the search

Scoring Child Nodes

• Choose the child node, i, with the highest score
• One way of defining the score at a node:
  – Upper Confidence Bound for Trees (UCT):

      score_i = w_i / n_i  +  c · √( ln t / n_i )
                (exploitation)    (exploration)

    where w_i = number of wins after the ith move,
          n_i = number of playout simulations after the ith move,
          t = total number of playout simulations
Playout