
Game Playing
Chapter 5.1 – 5.3, 5.5

Game Playing and AI

Game playing as a problem for AI research:
– game playing is non-trivial
  • players need human-like intelligence
  • games can be very complex (e.g., Chess, Go)
  • requires decision making within limited time
– games usually are:
  • well-defined and repeatable
  • fully observable and limited environments
– can directly compare humans and computers

Computers Playing Chess

Types of Games

Definitions:
• Zero-sum: one player's gain is the other player's loss. Does not mean fair.
• Discrete: states and decisions have discrete values
• Finite: finite number of states and decisions
• Deterministic: no coin flips, die rolls – no chance
• Perfect information: each player can see the complete game state. No simultaneous decisions.
Game Playing and AI

                            Deterministic           Stochastic (chance)
Fully Observable            Checkers, Chess,        Backgammon,
(perfect info)              Go, Othello             Monopoly
Partially Observable        Stratego,               Bridge, Poker,
(imperfect info)            Battleship              Scrabble

All are also multi-agent, adversarial, static tasks

Game Playing as Search

• Consider two-player, perfect information, deterministic, 0-sum board games:
  – e.g., chess, checkers, tic-tac-toe
  – Board configuration: a specific arrangement of "pieces"
• Representing board games as a search problem:
  – states: board configurations
  – actions: legal moves
  – initial state: starting board configuration
  – goal state: game over/terminal board configuration
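
The following is a minimal sketch (not from the slides) of what this search-problem representation might look like as an interface; the class and method names are illustrative assumptions, not a standard API.

  # Sketch of a two-player, zero-sum board game as a search problem.
  class Game:
      def initial_state(self):        # starting board configuration
          raise NotImplementedError
      def legal_moves(self, state):   # actions: legal moves from this state
          raise NotImplementedError
      def result(self, state, move):  # board configuration after a move
          raise NotImplementedError
      def is_terminal(self, state):   # goal test: is the game over?
          raise NotImplementedError
      def utility(self, state):       # value of a terminal state to the computer
          raise NotImplementedError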

Game Tree Representation

What's the new aspect to the search problem?
There's an opponent ... we cannot control!
How can we handle this?
[Figure: partial tic-tac-toe game tree showing the board configurations reachable from the computer's X moves and the opponent's O replies]

Greedy Search using an Evaluation Function

• A Utility function is used to map each terminal state of the board (i.e., states where the game is over) to a score indicating the value of that outcome to the computer
• We'll use:
  – positive for winning; large + means better for computer
  – negative for losing; large − means better for opponent
  – 0 for a draw
  – typical values (loss to win):
    • -∞ to +∞
    • -1.0 to +1.0
Greedy Search Greedy Search
using an Evaluation Function using an Evaluation Function
• Expand the search tree to the terminal states • Assuming a reasonable search space, what's the
on each branch
• Evaluate the Utility of each terminal board problem?
configuration This ignores what the opponent might do!
• Make the initial move that results in the board Computer chooses C
configuration with the maximum value Opponent chooses J and defeats computer
computer's computer's
A
A possible moves A possible moves
9 9

B C D E opponent's B C D E opponent's
B
-5
C
9
D
2
E
3 -5 9 2 3
possible moves possible moves

F G H I J K L M N O F G H I J K L M N O
-7 -5 3 9 -6 0 2 1 3 2 terminal states -7 -5 3 9 -6 0 2 1 3 2 terminal states

board evaluation from computer's perspective board evaluation from computer's perspective

Minimax Principle

• Assume both players play optimally
  – assuming there are two moves until the terminal states,
  – high Utility values favor the computer
    • the computer should choose maximizing moves
  – low Utility values favor the opponent
    • a smart opponent chooses minimizing moves

• The computer assumes that after it moves the opponent will choose the minimizing move
• The computer chooses the best move considering both its move and the opponent's optimal move
[Figure: the computer's possible moves from A lead to opponent nodes B(-7), C(-6), D(0), E(1); terminal states F–O have values -7, -5, 3, 9, -6, 0, 2, 1, 3, 2; the minimax value of A is 1; board evaluation is from the computer's perspective]


Propagating Minimax Values Up the Game Tree

• Explore the tree to the terminal states
• Evaluate the Utility of the resulting board configurations
• The computer makes a move to put the board in the best configuration for it, assuming the opponent makes her best moves on her turn(s):
  – start at the leaves
  – assign a value to the parent node as follows:
    • use the minimum when the node is the opponent's move
    • use the maximum when the node is the computer's move

Deeper Game Trees

• Minimax can be generalized to more than 2 moves
• Propagate values up the tree
[Figure: four-ply example tree. Root A (max) gets value 3; min nodes B, C, D, E get -5, 3, 0, -7; max nodes F–M get 4, -5, 3, 8, 9, 5, 2, -7; min nodes N–V get 4, -5, 9, -6, 0, 3, 5, -7, -9; terminal states include W = -3 and X = -5; values propagate from the terminal states up to the root]

General Minimax Algorithm

For each move by the computer:
1. Perform depth-first search, stopping at terminal states
2. Evaluate each terminal state
3. Propagate upwards the minimax values:
   – if opponent's move, propagate up the minimum value of its children
   – if computer's move, propagate up the maximum value of its children
4. Choose the move at the root with the maximum of the minimax values of its children

Search algorithm independently invented by Claude Shannon (1950) and Alan Turing (1951)

Complexity of Minimax Algorithm

Assume all terminal states are at depth d and there are b possible moves at each step
• Space complexity
  Depth-first search, so O(bd)
• Time complexity
  Branching factor b, so O(b^d)
• Time complexity is a major problem since the computer typically only has a limited amount of time to make a move
Complexity of Game Playing

• Assume the opponent's moves can be predicted given the computer's moves
• How complex would search be in this case?
  – worst case: O(b^d) with branching factor b and depth d
  – Tic-Tac-Toe: ~5 legal moves, at most 9 moves per game
    • 5^9 = 1,953,125 states
  – Chess: ~35 legal moves, ~100 moves per game
    • b^d ≈ 35^100 ≈ 10^154 states, only ~10^40 legal states
  – Go: ~250 legal moves, ~150 moves per game
• Common games produce enormous search trees

Complexity of Minimax Algorithm

• The minimax algorithm applied to complete game trees is impractical
  – instead do depth-limited search to ply (depth) m
  – but the Utility function is defined only for terminal states
  – we need a value for non-terminal states
• Static Evaluation functions use heuristics to estimate the value of non-terminal states

Static Board Evaluation

• A Static Board Evaluation (SBE) function is used to estimate how good the current board configuration is for the computer
  – it reflects the computer's chances of winning from that node
  – it must be easy to calculate from a board configuration
• Typically, one subtracts how good the configuration is for the opponent from how good it is for the computer
• If the SBE gives X for one player, then it gives -X for the opponent
• The SBE should agree with the Utility function when calculated at terminal nodes
• For example, for Chess:
  SBE = α * materialBalance + β * centerControl + γ * …
  where material balance = value of white pieces − value of black pieces, with pawn = 1, rook = 5, queen = 9, etc.
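
A small sketch (mine, not from the slides) of such a linear SBE. The pawn/rook/queen values follow the slide; the knight/bishop values, the board encoding, and the weights are illustrative assumptions.

  # Sketch of a linear static board evaluation for chess-like material balance.
  PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}   # N and B values assumed

  def material_balance(board):
      """board: iterable of (piece_letter, is_white) tuples (an assumed encoding)."""
      score = 0
      for piece, is_white in board:
          value = PIECE_VALUE.get(piece, 0)
          score += value if is_white else -value
      return score

  def sbe(board, center_control, alpha=1.0, beta=0.1):
      # SBE = alpha * materialBalance + beta * centerControl + ...
      return alpha * material_balance(board) + beta * center_control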
Minimax with Evaluation Functions

• The same as general Minimax, except
  – only go to depth m
  – estimate values at the leaves using the SBE function
• How would this algorithm perform at Chess?
  – if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
  – if it could look ahead ~8 pairs, it is as good as a human master

Tic-Tac-Toe Example

Evaluation function = (# 3-lengths open for me) – (# 3-lengths open for opponent)
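
A minimal sketch of that evaluation function, assuming a board encoded as a flat list of 9 cells holding "X", "O", or None (the encoding is my assumption, not from the slides).

  # (# 3-lengths open for me) - (# 3-lengths open for opponent)
  LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
           (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
           (0, 4, 8), (2, 4, 6)]              # diagonals

  def open_lines(board, player):
      """Count 3-in-a-row lines not blocked by the other player."""
      other = "O" if player == "X" else "X"
      return sum(1 for line in LINES
                 if all(board[i] != other for i in line))

  def evaluate(board, me):
      opponent = "O" if me == "X" else "X"
      return open_lines(board, me) - open_lines(board, opponent)

  # Example: on an empty board both players have 8 open lines, so the value is 0
  print(evaluate([None] * 9, "X"))   # 0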
Minimax Algorithm

function Max-Value(s)
  inputs: s: current state in game, Max about to play
  output: best score (for Max) available from s

  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = –∞
    foreach s' in Successors(s)
      v = max( v, Min-Value(s') )
    return v

function Min-Value(s)
  output: best score (for Min) available from s

  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = +∞
    foreach s' in Successors(s)
      v = min( v, Max-Value(s') )
    return v

Minimax Example

[Figure: the same four-ply tree as in the Deeper Game Trees slide (root A; min nodes B, C, D = 0, E; max nodes F–M; min nodes N–V over terminal values 4, 9, -6, 0, 3, 5, -7, -9 and leaves -5, 3, 8, 2; terminal states W = -3, X = -5), evaluated bottom-up by Minimax]
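
Below is a runnable Python rendering of this pseudocode, as a sketch under my own assumptions about the game representation: `successors`, `is_terminal`, and `sbe` are caller-supplied functions, and the depth limit is passed explicitly.

  def minimax_value(state, depth, maximizing, successors, is_terminal, sbe):
      """Depth-limited minimax; returns the backed-up SBE value of `state`."""
      if is_terminal(state) or depth == 0:
          return sbe(state)
      if maximizing:   # computer's move: take the maximum over children
          return max(minimax_value(s, depth - 1, False, successors, is_terminal, sbe)
                     for s in successors(state))
      else:            # opponent's move: take the minimum over children
          return min(minimax_value(s, depth - 1, True, successors, is_terminal, sbe)
                     for s in successors(state))

  def best_move(state, depth, successors, is_terminal, sbe):
      """Choose the root move whose child has the maximum minimax value."""
      return max(successors(state),
                 key=lambda s: minimax_value(s, depth - 1, False,
                                             successors, is_terminal, sbe))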

Summary So Far

• Can't use Minimax search to the end of the game
  – if we could, then choosing the optimal move would be easy
• SBE isn't perfect at estimating/scoring
  – if it were, we could just choose the best move without searching
• Since neither is feasible for interesting games, combine the Minimax and SBE concepts:
  – use Minimax to cut off search at depth m
  – use SBE to estimate/score board configurations

Alpha-Beta Idea

• Some of the branches of the game tree won't be taken if playing against an intelligent opponent
• "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
• Pruning can be used to ignore some branches
• While doing DFS of the game tree, keep track of:
  – At maximizing levels:
    • the highest SBE value, v, seen so far in the subtree below each node
    • a lower bound on the node's final minimax value
  – At minimizing levels:
    • the lowest SBE value, v, seen so far in the subtree below each node
    • an upper bound on the node's final minimax value
Alpha-Beta Idea

• Also keep track of:
  – α = best already-explored option along the path to the root for MAX, including the current node
  – β = best already-explored option along the path to the root for MIN, including the current node

Alpha Cutoff

[Figure: MAX root S (α = 100) with MIN children A = 100 and B (v = 20, v ≤ α); A's children C = 200, D = 100; B's children E = 120, F = 20, and unvisited G]
• Depth-first traversal order
• After returning from A, MAX can get at least 100 at S
• After returning from F, MAX can get at most 20 at B
• At this point, no matter what minimax value is computed at G, S will prefer A over B. So S loses interest in B
• There is no need to visit G. The subtree at G is pruned. Saves time. Called an "Alpha cutoff" (at MIN node B)

Alpha Cutoff

• At each MIN node, keep track of the minimum value returned so far from its visited children
• Store this value as v
• Each time v is updated (at a MIN node), check its value against the α value of all its MAX node ancestors
• If v ≤ α for some MAX node ancestor, don't visit any more of the current MIN node's children; i.e., prune (cutoff) all subtrees rooted at the remaining children of the MIN node

Beta Cutoff Example

[Figure: MAX root S; MIN node A = 20 (β = 20); A's MAX children are B = 20 (from D = 20, E = -10, F = -20) and C (v = 25, v ≥ β, from G = 25 and unvisited H)]
• After returning from B, MIN can get at most 20 at MIN node A
• After returning from G, MAX can get at least 25 at MAX node C
• No matter what minimax value is found at H, A will NEVER choose C over B, so don't visit node H
• Called a "Beta Cutoff" (at MAX node C)
Beta Cutoff

• At each MAX node, keep track of the maximum value returned so far from its visited children
• Store this value as v
• Each time v is updated (at a MAX node), check its value against the β value of all its MIN node ancestors
• If v ≥ β for some MIN node ancestor, don't visit any more of the current MAX node's children; i.e., prune (cutoff) all subtrees rooted at the remaining children of the MAX node

Implementation of Cutoffs

At each node, keep both α and β values, where α = the largest (i.e., best) value of all MAX node ancestors in the search tree, and β = the smallest (i.e., best) value of all MIN node ancestors in the search tree. Pass these down the tree during traversal.
– At a MAX node, v = the largest value from its children visited so far; cutoff if v ≥ β
  • the v value at MAX comes from its descendants
  • the β value at MAX comes from its MIN node ancestors
– At a MIN node, v = the smallest value from its children visited so far; cutoff if v ≤ α
  • the α value at MIN comes from its MAX node ancestors
  • the v value at MIN comes from its descendants

Implementation of Alpha Cutoff

• At each node, keep two bounds (based on all nodes on the path back to the root):
  – α: the best (largest) value MAX can get at any ancestor
  – β: the best (smallest) value MIN can get at any ancestor
  – v: the best value returned by the current node's visited children
• If at any time α ≥ v at a MIN node, the remaining children are pruned (i.e., not visited)

Alpha Cutoff Example

[Figure: MAX root S initialized with α = -∞, β = +∞; these bounds are passed down to MIN node A (α = -∞, β = +∞, v = +∞); A's children are C = 200 and D = 100; B's children are E = 120, F = 20, and G]
Alpha Cutoff Example (continued)

[Figure sequence: after visiting C = 200, MIN node A has β = 200, v = 200; after visiting D = 100, A has β = 100, v = 100, so root S gets α = 100. At MIN node B, visiting E = 120 gives v = 120 > α, so continue to B's next child]
Alpha Cutoff Example (continued)

[Figure: after visiting F = 20 at MIN node B, v = 20 ≤ α = 100, so B's remaining child G is pruned]
Notes:
• An alpha cutoff means not visiting some of a MIN node's children
• v values at MIN come from its descendants
• The α value at MIN comes from its MAX node ancestors

Alpha-Beta Cutoffs

• At a MAX node, if v ≥ β then don't visit (i.e., cutoff) the remaining children of this MAX node
  – v is the max value found so far from visiting the current MAX node's children
  – β is the best value found so far at any MIN node ancestor of the current MAX node
• At a MIN node, if v ≤ α then don't visit the remaining children of this MIN node
  – v is the min value found so far from visiting the current MIN node's children
  – α is the best value found so far at any MAX node ancestor of the current MIN node

Alpha-Beta Algorithm

Starting from the root: Max-Value(root, -∞, +∞)

function Max-Value(s, α, β)
  inputs:
    s: current state in game, Max about to play
    α: best score (highest) for Max along path from s to root
    β: best score (lowest) for Min along path from s to root
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = -∞
  for each s' in Successors(s)
    v = max( v, Min-Value(s', α, β) )
    if ( v ≥ β ) then return v   // prune remaining children
    α = max(α, v)
  return v   // return value of best child

function Min-Value(s, α, β)
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = +∞
  for each s' in Successors(s)
    v = min( v, Max-Value(s', α, β) )
    if ( v ≤ α ) then return v   // prune remaining children
    β = min(β, v)
  return v   // return value of best child
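
A runnable Python version of this pseudocode, as a sketch under the same assumptions as the earlier minimax code: `successors`, `is_terminal`, and `sbe` are caller-supplied functions describing the game.

  import math

  def max_value(state, alpha, beta, depth, successors, is_terminal, sbe):
      if is_terminal(state) or depth == 0:
          return sbe(state)
      v = -math.inf
      for s in successors(state):
          v = max(v, min_value(s, alpha, beta, depth - 1, successors, is_terminal, sbe))
          if v >= beta:            # prune remaining children (beta cutoff)
              return v
          alpha = max(alpha, v)
      return v

  def min_value(state, alpha, beta, depth, successors, is_terminal, sbe):
      if is_terminal(state) or depth == 0:
          return sbe(state)
      v = math.inf
      for s in successors(state):
          v = min(v, max_value(s, alpha, beta, depth - 1, successors, is_terminal, sbe))
          if v <= alpha:           # prune remaining children (alpha cutoff)
              return v
          beta = min(beta, v)
      return v

  # Starting from the root: max_value(root, -math.inf, math.inf, depth_limit, ...)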
Alpha-Beta Example

[Figure sequence: alpha-beta search on the same four-ply tree as the Minimax Example (root A; MIN nodes B, C, D = 0, E; MAX nodes F–M; MIN nodes N–V over terminal values 4, 9, -6, 0, 3, 5, -7, -9 and leaves -5, 3, 8, 2; terminal states W = -3, X = -5). The call stack and the α, β, and v values at each node are updated step by step during the depth-first traversal.]

• Start at the root: call Max-Value(A, -∞, +∞); recurse on B, then F, then terminal state N = 4
• v(F) = alpha(F) = 4, the maximum seen so far at F
• Visit O, then W = -3: v(O) = -3, the minimum seen so far below O
• v(O) = -3 ≤ alpha(O) = 4: stop expanding O (alpha cutoff); O's remaining child is not visited
  – Why? A smart opponent will choose W or worse, so O's upper bound is -3. At F the computer shouldn't choose O:-3 since N:4 is better
• Back at F: v(F) = alpha(F) = 4, not changed (maximizing)
• Back at B: v(B) = beta(B) = 4, the minimum seen so far; after visiting G = -5, v(B) = beta(B) = -5
• Back at A: v(A) = alpha(A) = -5, the maximum seen so far
• Copy the α and β values from A down to C; visit H = 3: v(C) = beta(C) = 3, the minimum seen so far
  – v(C) = 3 > α(C) = -5, so no cutoff; visiting I = 8 leaves beta(C) unchanged (minimizing)
• Visit J, then P = 9: v(J) = 9 ≥ beta(J) = 3, so stop expanding J (beta cutoff); J's remaining children are not visited
  – Why? The computer can get P:9 or better at J, so J's lower bound is 9. But the smart opponent at C won't take J:9 since H:3 is better for the opponent
• Back at C: v(C) and beta(C) not changed (minimizing)
• Back at A: v(A) = alpha(A) = 3, updated to the maximum seen so far
• Visit D = 0: alpha(A) and v(A) are not updated after returning from D (because A is a maximizing node)
• Visit E, then K, S, T, L; on returning to E, v(E) = 2 ≤ alpha(E) = 3, so stop expanding E and don't visit M (alpha cutoff)
  – Why? The smart opponent will choose L or worse, so E's upper bound is 2. The computer at A shouldn't choose E:2 since C:3 is a better move
• Final result: the computer chooses move C

Another step-by-step example (from the AI course at UC Berkeley) is given at
https://www.youtube.com/watch?v=xBXHtz4Gbdo

Effectiveness of Alpha-Beta Search

• Effectiveness (i.e., the amount of pruning) depends on the order in which successors are examined
• Worst case:
  – ordered so that no pruning takes place
  – no improvement over exhaustive search
• Best case:
  – each player's best move is visited first
• In practice, performance is closer to the best case than the worst case
• In practice we often get O(b^(d/2)) rather than O(b^d)
  – the same as having a branching factor of √b, since (√b)^d = b^(d/2)
• Example: Chess
  – Deep Blue went from b ≈ 35 to b ≈ 6, visiting 1 billionth the number of nodes visited by the Minimax algorithm
  – permits much deeper search in the same time
  – makes computer chess competitive with humans
Dealing with Limited Time

• In real games, there is usually a time limit, T, on making a move
• How do we deal with this?
  – we cannot stop the alpha-beta algorithm midway and expect to use its results with any confidence
• Solution #1: set a (conservative) depth-limit that guarantees we will finish in time < T
  – but the search may finish very early and the opportunity is lost to do more searching
• Solution #2: use iterative deepening search (IDS)
  – run alpha-beta search with depth-first search and an increasing depth-limit
  – when time runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
  – an "anytime algorithm" (sketched below)
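
A minimal sketch (mine) of this anytime IDS driver. The `alpha_beta_best_move(root, depth, time_budget)` routine is a hypothetical helper assumed to search to the given depth and return None if it cannot finish within the budget.

  import time

  def iterative_deepening_move(root, time_limit, alpha_beta_best_move):
      """Keep deepening until the time limit; return the move from the deepest
      fully completed search (an 'anytime' algorithm)."""
      deadline = time.monotonic() + time_limit
      best = None
      depth = 1
      while True:
          remaining = deadline - time.monotonic()
          if remaining <= 0:
              break
          move = alpha_beta_best_move(root, depth, remaining)
          if move is None:          # this depth did not complete in time
              break
          best = move               # deepest completed search so far
          depth += 1
      return best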

The Horizon Effect

• Sometimes disaster lurks just beyond the search depth
  – the computer captures the queen, but a few moves later the opponent checkmates (i.e., wins)
• The computer has a limited horizon; it cannot see that this significant event could happen
• How do you avoid catastrophic losses due to short-sightedness?
  – quiescence search
  – secondary search

• Quiescence Search
  – when the SBE value is frequently changing, look deeper than the depth-limit
  – look for a point when the game quiets down
  – e.g., always expand any forced sequences
• Secondary Search
  1. find the best move looking to depth d
  2. look k steps beyond to verify that it still looks good
  3. if it doesn't, repeat step 2 for the next best move
Book Moves

• Build a database of opening moves, end games, and studied configurations
• If the current state is in the database, use the database:
  – to determine the next move
  – to evaluate the board
• Otherwise, do Alpha-Beta search

More on Evaluation Functions

The board evaluation function estimates how good the current board configuration is for the computer
– it is a heuristic function of the board's features
  • i.e., function(f1, f2, f3, …, fn)
– the features are numeric characteristics
  • feature 1, f1, is the number of white pieces
  • feature 2, f2, is the number of black pieces
  • feature 3, f3, is f1/f2
  • feature 4, f4, is an estimate of the threat to the white king
  • etc.

Linear Evaluation Functions

• A linear evaluation function of the features is a weighted sum of f1, f2, f3, ...
  w1 * f1 + w2 * f2 + w3 * f3 + … + wn * fn
  – where f1, f2, …, fn are the features
  – and w1, w2, …, wn are the weights
• More important features get more weight

• The quality of play depends directly on the quality of the evaluation function
• To build an evaluation function we have to:
  1. construct good features using expert domain knowledge or machine learning
  2. pick or learn good weights
Examples of Algorithms that Learn to Play Well

Checkers
A. L. Samuel, Some Studies in Machine Learning using the Game of Checkers, IBM Journal of Research and Development, 11(6):601-617, 1959
• Learned by playing thousands of times against a copy of itself
• Used an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
• Successful enough to compete well at human tournaments

Backgammon
G. Tesauro and T. J. Sejnowski, A Parallel Network that Learns to Play Backgammon, Artificial Intelligence, 39(3):357-390, 1989
• Also learned by playing against itself
• Used a non-linear evaluation function – a neural network
• Rated one of the top three players in the world

Non-Deterministic Games

[Figure: a backgammon board, an example of a game with chance elements]

• Some games involve chance, for example:
  – roll of dice
  – spin of a game wheel
  – deal of cards from a shuffled deck
• How can we handle games with random elements?
  – Modify the game search tree to include chance nodes:
    1. computer moves
    2. chance nodes (representing random events)
    3. opponent moves
Non-Deterministic Games

• Extended game tree representation:
[Figure: MAX node A; 50/50 chance nodes; MIN nodes B = 2, C = 6, D = 0, E = -4 over leaves 7, 2, 9, 6, 5, 0, 8, -4]

• Weight the score by the probability that the move occurs
• Use the expected value for a move: instead of using max or min, compute the average weighted by the probabilities of each child
[Figure: the same tree with the chance nodes evaluated: 0.5 * 2 + 0.5 * 6 = 4 and 0.5 * 0 + 0.5 * (-4) = -2]

Non-Deterministic Games

• Choose the move with the highest expected value
[Figure: MAX node A takes the value 4 = max(4, -2) from its 50/50 chance children; MIN nodes B = 2, C = 6, D = 0, E = -4 over leaves 7, 2, 9, 6, 5, 0, 8, -4]

Expectiminimax

expectiminimax(n) =
  SBE(n)                                              for n a Terminal state or a state at the cutoff depth
  max over s in Succ(n) of expectiminimax(s)          for n a Max node
  min over s in Succ(n) of expectiminimax(s)          for n a Min node
  sum over s in Succ(n) of P(s) * expectiminimax(s)   for n a Chance node
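
A direct Python sketch of this recurrence (mine, not from the slides); the helpers `node_type`, `successors`, `prob`, `is_terminal`, and `sbe` are assumed to be supplied by the caller.

  def expectiminimax(node, is_terminal, sbe, node_type, successors, prob):
      """node_type(n) is assumed to return 'max', 'min', or 'chance';
      prob(n, s) is the probability of reaching child s from chance node n."""
      if is_terminal(node):                  # terminal or cutoff-depth state
          return sbe(node)
      children = successors(node)
      values = [expectiminimax(s, is_terminal, sbe, node_type, successors, prob)
                for s in children]
      kind = node_type(node)
      if kind == "max":
          return max(values)
      if kind == "min":
          return min(values)
      # chance node: probability-weighted average of the children
      return sum(prob(node, s) * v for s, v in zip(children, values))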
Non-Deterministic Games

• Non-determinism increases the branching factor
  – 21 possible distinct rolls with 2 dice (since 6-5 is the same as 5-6)
• The value of look-ahead diminishes: as depth increases, the probability of reaching a given node decreases
• Alpha-Beta pruning is less effective

Computers Play GrandMaster Chess

Deep Blue (1997, IBM)
• Parallel processor, 32 nodes
• Each node had 8 dedicated VLSI chess chips
• Searched 200 million configurations/second
• Used minimax, alpha-beta, and sophisticated heuristics
• Average branching factor ~6 instead of ~40
• In 2001 searched to 14 ply (i.e., 7 pairs of moves)
• Avoided the horizon effect by searching as deep as 40 ply
• Used book moves

Computers can Play GrandMaster Chess

Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation chess match sponsored by ACM
• Kasparov lost the match 3½ to 2½ (Deep Blue won 2 games, Kasparov won 1, and 3 games were drawn)
• Historic achievement for computer chess; the first time a computer became the best chess player on the planet
• Deep Blue played by brute force (i.e., raw power from computer speed and memory); it used relatively little that is similar to human intuition and cleverness

"Game Over: Kasparov and the Machine" (2003)
Status of Computers Playing Other Games

• Checkers
  – First computer world champion: Chinook
  – Beat all humans (beat Marion Tinsley in 1994)
  – Used Alpha-Beta search and book moves
• Othello
  – Computers easily beat world experts
• Go
  – Branching factor b ~ 360, very large!

Game Playing: Go

Google's AlphaGo beat Korean grandmaster Lee Sedol 4 games to 1 in 2016

AlphaGo Documentary Movie (2017)

Game Playing Summary

• Game playing is modeled as a search problem, doing a limited look-ahead (depth bound)
• Search trees for games represent both computer and opponent moves
• A single evaluation function estimates the quality of a given board configuration for both players (zero-sum assumption)
  − good for opponent
  0 neutral
  + good for computer
Summary

• The Minimax algorithm determines the optimal moves by assuming that both players always choose their best move
• The Alpha-Beta algorithm can avoid large parts of the search tree, thus enabling the search to go deeper
• For many well-known games, computer algorithms using heuristic search can match or out-perform human experts

How to Improve Performance?

• Reduce the depth of search
  – Better SBEs
  – Use machine learning to learn good features rather than using "manually-defined" features
• Reduce the breadth of search
  – Explore a subset of the possible moves instead of exploring all of them
• Use randomized exploration of the search space

Monte Carlo Tree Search (MCTS)

• Concentrate search on the most promising moves
• Best-first search based on random sampling of the search space
• Monte Carlo methods are a broad class of algorithms that rely on repeated random sampling to obtain numerical results. They can be used to solve problems having a probabilistic interpretation.

Pure Monte Carlo Tree Search

• For each possible legal move of the current player, simulate k random games by selecting moves at random for both players until the game is over (called playouts); count how many of the k playouts were wins; the move with the most wins is selected (sketched after this list)
• Stochastic simulation of the game
• The game must have a finite number of possible moves, and game length must be finite
Exploitation vs. Exploration

• Rather than selecting a child at random, how do we select the best child node during the tree descent?
  – Exploitation: Keep track of the average win rate for each child from previous searches; prefer the child that has previously led to more wins
  – Exploration: Allow for exploration of relatively unvisited children (moves) too
• Combine these factors to compute a "score" for each child; pick the child with the highest score at each successive node in the search

Scoring Child Nodes

• Choose the child node, i, with the highest score
• One way of defining the score at a node is the Upper Confidence Bound for Trees (UCT):

  score_i = w_i / n_i + c * sqrt( ln(t) / n_i )

  where w_i / n_i is the exploitation term, c * sqrt( ln(t) / n_i ) is the exploration term, and
  w_i = number of wins after the ith move,
  n_i = number of playout simulations after the ith move,
  t = total number of playout simulations
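
A direct translation of the UCT score (mine); the exploration constant c = sqrt(2) is a common default, not something the slides specify.

  import math

  def uct_score(wins_i, playouts_i, total_playouts, c=math.sqrt(2)):
      """UCT = exploitation (w_i / n_i) + exploration (c * sqrt(ln t / n_i))."""
      if playouts_i == 0:
          return math.inf            # always try an unvisited child first
      exploitation = wins_i / playouts_i
      exploration = c * math.sqrt(math.log(total_playouts) / playouts_i)
      return exploitation + exploration

  def best_child(children):
      """children: list of (wins_i, playouts_i) pairs for one node's children;
      returns the index of the child with the highest UCT score."""
      t = sum(n for _, n in children)
      return max(range(len(children)),
                 key=lambda i: uct_score(children[i][0], children[i][1], t))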

MCTS Algorithm

Recursively build the search tree, where each round consists of:
1. Starting at the root, successively select the best child nodes using the scoring method until a leaf node L is reached
2. Create and add the best (or a random) new child node, C, of L
3. Perform a random playout from C
4. Update the score at C and all of C's ancestors in the search tree based on the playout results

Monte Carlo Tree Search (MCTS)

[Figure: one MCTS round on an example tree. Selection descends from the root to leaf L, a new child C is expanded, a playout is run from C, and the counts at each node (key: number of games won / number of playouts) are updated along the path back to the root. Note: only the exploitation term is used here to pick the best child]
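
A skeleton (mine) of one such MCTS round; the `Node` class, the `expand` and `playout` helpers, and the `score` function (e.g., the UCT score above) are assumptions about how the tree and game are represented.

  class Node:
      """Sketch of an MCTS tree node tracking win/playout counts."""
      def __init__(self, state, parent=None):
          self.state, self.parent = state, parent
          self.children = []
          self.wins = 0
          self.playouts = 0

  def mcts_round(root, score, expand, playout):
      # 1. Selection: descend using the scoring method until a leaf L is reached
      node = root
      while node.children:
          node = max(node.children, key=score)
      # 2. Expansion: create and attach a new child C of L
      child = expand(node)          # assumed to return the new child Node
      # 3. Simulation: random playout from C; returns 1 for a win, 0 otherwise
      won = playout(child.state)
      # 4. Backpropagation: update C and all of C's ancestors
      while child is not None:
          child.playouts += 1
          child.wins += won
          child = child.parent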
State-of-the-Art Go Programs

• Google's AlphaGo
• Facebook's Darkforest
• MCTS implemented using multiple threads and GPUs, and up to 110K playouts
• Also used a deep neural network to compute the SBE
