
Game Theory 101: The Complete Textbook
@rvvincelli
Introduction
Game theory is the study of situations of strategic interdependence: those where an action has consequences not only for the actor himself but also for others.
The most famous example is the prisoner’s dilemma:
• if both keep silent, they each serve a short sentence
• if one confesses and the other does not, the confessor scores zero and the other faces the full sentence
• and dually
• if they both confess, they both go to jail, but for a bit less than the full sentence
The mathematical representation of the prisoner's dilemma is:

           shut up   confess
shut up    1,1       12,0
confess    0,12      8,8

where the payoff, which is the reward of an action, can represent for example months of jail time.
In the prisoner's dilemma the best joint strategy would be for both to coordinate with each other and stay silent, but it is not sustainable; the confess-confess pair is. Solving a game is independent of the numerical values of the outcomes, as long as they are consistent. Game theory is the mathematics that bridges the gap between reasoning and drawing conclusions.
The prisoner's dilemma has a dominant strategy: one that yields a better outcome, in utility, than any other, independently of what the other player does. Applied to the equilibria among countries in a bomb-first context, it suggests a rather interventionist approach. A better option always exists regardless of what the other players do. The prisoner's dilemma also models the arms race, or the dynamics of import-export tariffs.
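
To make the dominance check concrete, here is a minimal Python sketch on the jail-time matrix above; the data layout is our own assumption, and lower numbers are better since payoffs are months of jail:

JAIL = {  # (my move, their move) -> my months of jail
    ("shut up", "shut up"): 1,  ("shut up", "confess"): 12,
    ("confess", "shut up"): 0,  ("confess", "confess"): 8,
}
MOVES = ["shut up", "confess"]

def strictly_dominant(move):
    """True if `move` yields strictly less jail time than every
    alternative, whatever the opponent does."""
    return all(
        JAIL[(move, theirs)] < JAIL[(other, theirs)]
        for other in MOVES if other != move
        for theirs in MOVES
    )

for m in MOVES:
    print(m, strictly_dominant(m))  # only "confess" is dominant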

The dynamics of the other


In America, the cigarette act of the 70s prohibited the advertisement of cigarette brands. Actually, this has been beneficial for tobacco producers, because it forced them to cooperate:

          no ads   ads
no ads    3,3      1,4
ads       4,2      2,2

The model of the prisoners' dilemma is: two agents each have a do/not-do choice against each other; what had they better do? Are there situations where losing is better than winning? Sure! And the matrix is:

        try     fail
try     0,0     -1,-1
fail    1,-1    0,0
Failure strictly dominates here. This is an example of deadlock: the players cannot improve their results unless the opponent chooses the wrong strategy. Up until now the players have thought alike, so the game always has one strategy dominating for both, no matter whether it is a fair or rigged game.
Let's suppose a player has no dominant strategy: what should they do? They should just look at the opponent first, and minimize the damage. Looking at the opponent actually means predicting them, and a quick heuristic is to assume that the opponent always picks a dominant strategy, if they have one. Once that degree of freedom is removed, what will the opponent do? The choice is clear for him. If he always chooses what is best for himself, the game is said to come to a rational outcome. To complicate things, let's consider a game where the best reply changes with every action of the opponent:

          left    center   right
up        13,3    1,4      7,3
middle    4,1     3,3      6,2
down      -1,9    2,8      8,-1
By looking at the different columns, player one should play now up, now middle, now down. It is all about inferring, politely guessing, how the other will behave, and acting accordingly. Or better, what they will not do. In the vanilla prisoner game, confessing strictly dominates keeping silent. Once the freedom of the other player is eliminated, the game is reduced. Of course this is still under the assumption that the opponent does not willingly choose a suboptimal strategy - but that's another story. This process of guessing out is called iterated elimination of strictly dominated strategies (IESDS). A game is reduced to a trivial choice by means of "he knows she knows".
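
As a sketch of how the elimination can be mechanized, the following Python fragment (data layout assumed) runs IESDS on the 3x3 game above, and happens to reduce it all the way to a single cell:

payoffs = {
    ("up", "left"): (13, 3),    ("up", "center"): (1, 4),     ("up", "right"): (7, 3),
    ("middle", "left"): (4, 1), ("middle", "center"): (3, 3), ("middle", "right"): (6, 2),
    ("down", "left"): (-1, 9),  ("down", "center"): (2, 8),   ("down", "right"): (8, -1),
}
rows = ["up", "middle", "down"]      # player 1's surviving strategies
cols = ["left", "center", "right"]   # player 2's surviving strategies

def u1(r, c): return payoffs[(r, c)][0]
def u2(r, c): return payoffs[(r, c)][1]

def dominated_row(r):
    # is some other row strictly better against every surviving column?
    return any(all(u1(alt, c) > u1(r, c) for c in cols) for alt in rows if alt != r)

def dominated_col(c):
    return any(all(u2(r, alt) > u2(r, c) for r in rows) for alt in cols if alt != c)

changed = True
while changed:
    changed = False
    for r in [r for r in rows if dominated_row(r)]:
        rows.remove(r); changed = True
    for c in [c for c in cols if dominated_col(c)]:
        cols.remove(c); changed = True

print(rows, cols)  # ['middle'] ['center'] - the game is fully reduced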

Different games
A duopoly - pure one-on-one competition - can also be modeled as an IESDS game. In the book we have the example of a duopoly which finds the best strategy for both sides if they produce a minimum amount of products. Notice that in the elimination process the order does not matter - we converge anyway. Strict dominance means that the return is always strictly better; if the alternative is never worse and sometimes strictly better, we have weak dominance. The elimination of weakly dominated strategies is not order invariant and it does not necessarily lead to the optimum.
In some games the optimal strategy of a player is completely and univocally determined by what the other player chooses. The game lacks a dominant strategy. Such games are solved by isolating Nash equilibria. A Nash equilibrium is a set of strategies, one per player, such that none of the players has an incentive to change his strategy given what the others are doing. Example. In a field, the red birds are very easy to capture but less valuable; the blue bird requires cooperation and is much more valuable:

        blue    red
blue    3,3     0,2
red     2,0     1,1
These equilibria are found like this: fix one player's choice and then ask, given that the other player knows it, is he motivated to change his own? Does it make sense for him to deviate knowing how the other will play? The question is basically: is there a hidden information gain? <blue, blue> is a Nash equilibrium, a pure one. A pure equilibrium is one where both players have deterministic strategies. Whenever there exists a deviation from a strategy towards a more profitable one - when we are better off choosing something else - that strategy is not a Nash equilibrium. <red, red> is also one. A Nash equilibrium therefore does not have to be optimal. A Nash equilibrium is a no-regrets strategy: once the payoff is realized, a Nash-oriented player does not regret his choices. The prisoner dilemma can be modified into an isomorphic red-blue game where blue is keeping quiet and red is confessing.
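
The definition translates directly into a check over unilateral deviations. A minimal Python sketch on the bird game above:

payoffs = {
    ("blue", "blue"): (3, 3), ("blue", "red"): (0, 2),
    ("red", "blue"): (2, 0),  ("red", "red"): (1, 1),
}
moves = ["blue", "red"]

def is_pure_nash(r, c):
    # neither player can improve by changing only his own move
    fine_for_1 = all(payoffs[(alt, c)][0] <= payoffs[(r, c)][0] for alt in moves)
    fine_for_2 = all(payoffs[(r, alt)][1] <= payoffs[(r, c)][1] for alt in moves)
    return fine_for_1 and fine_for_2

print([cell for cell in payoffs if is_pure_nash(*cell)])
# [('blue', 'blue'), ('red', 'red')] - both pure equilibria, as argued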

The world around us is Nashing


Game theory is a unifying framework. But it excludes the environment, or it tries to: the players will keep to their math no matter the investigator trying to nail them. Best response: the optimal strategy of a player given what everybody else does. An algorithm to find the Nash equilibria: for every player, fix the strategies of the other players and find the best response of that player itself. Example. A general wins if he brings more soldiers than the counterpart, or none at all:
     0      1      2      3
0    0,0    0,1    0,0    0,0
1    0,0    0,0    -1,1   -1,1
2    0,0    1,-1   0,0    -1,1
3    0,0    1,-1   1,-1   0,0
If my strategy is "no troops", shall I change it, seeing that the other general chooses to bring no troops himself? Another way to see a Nash equilibrium is thus: a mutual best response. Interpretation: the Nash equilibrium is a law which would naturally get followed, even without enforcement or police. When there are multiple equilibria, it may be that some of them are more profitable than others from the perspective of a single player. Each of the players would like his own flavor of the Nash coordination. In the case of a crossroad where two cars arrive from opposite directions, the equilibria are <stop, go>, <go, stop>. And obviously each of the two would like to proceed. A traffic light fixes a specific equilibrium, one of the two, in turns, the better one for each player alternately. These strategies are self-sufficient anyway: a third actor to force the parties is not needed; this is the no-regrets property of Nash. As seen, determining Nash equilibria is very direct, taking O(n²) in the number of moves n. The result of the iterated elimination of strictly dominated strategies, if unique, is a Nash equilibrium and it is the only one. This does not hold for the elimination of weakly dominated strategies - some Nash equilibria might have been eliminated.
Even if it should be very clear, these are all theorems. The proofs are left to the student, which is a must in a math book, ain't it? For the first fact, intuitively, this comes from the no-regrets property: the elimination aims exactly at removing those strategies which the player would regret having chosen, because there would have been better ones! If the reduction does not lead to a unique strategy, there are other techniques to isolate the Nash equilibria from the finalists, for example that of best responses. It is not possible to say how the weak elimination will behave by just looking at the game itself, unfortunately - this is not decidable. It is possible that a game has both weakly and strictly dominated strategies. It is better to eliminate all of the strictly dominated ones first, only then moving on to the weak ones. If we do so, we do not risk eliminating Nash equilibria too soon without noticing.

The game that people play


So far we have seen games where players have an incentive to cooperate to obtain the maximum result, and where the maximum advantage, or win, of one player is not equivalent to the loss of the other. A strictly competitive game, or zero-sum game, is one where it is in the interest of each player that the other plays as badly as possible, that he loses. For example: we each have a coin and show one side simultaneously; if they match I win them both, otherwise you do. It is clear that this game has no pure Nash equilibrium indeed: if I choose heads, but then I come to know that you will play tails, then it is better for me to switch to tails too, so that I win. As we know, if we mark the best responses for each combination, no cell will ever hold both marks at once. The moment has come for the famous Nash theorem, of A Beautiful Mind fame: every finite game has at least one Nash equilibrium. The example above has no pure Nash strategy, so it must have some mixed form - what does that mean? Say that one player could read the other's mind; such a strategy may be neutralized with a trick: go ahead and read whatever you like, but I am the one tossing the coin, and chance determines the outcome. In other words, a randomized strategy. The expected value of this game is zero. The distribution is:
• P(XO) = P(A) = 1/4
• P(OX) = P(B) = 1/4
• P(XX) = P(C) = 1/4
• P(OO) = P(D) = 1/4

And from the perspective of player one (who wins on a match):

• v(A) · P(A) = −1 · 1/4 = −1/4
• v(B) · P(B) = −1 · 1/4 = −1/4
• v(C) · P(C) = 1 · 1/4 = 1/4
• v(D) · P(D) = 1 · 1/4 = 1/4

And this results in an expected value for the game of v(A)P(A) + v(B)P(B) + v(C)P(C) + v(D)P(D) = −1/4 − 1/4 + 1/4 + 1/4 = 0.
Actually one can prove that if one player randomizes by just flipping the coin, whatever the other's distribution - so also in the case of the proverbial non-fair coin - the expected value is zero. If we both flip the coin, no deviation from our strategy is promising. So, this is a mixed strategy Nash equilibrium. Mixed because with the randomization of the coin flip we choose not to choose, and we combine the two pure strategies, heads and tails. Said otherwise, no single outcome can be the best answer for both, the mutual best response.
Let us consider the game above, but with slightly spicier payoffs (best responses marked with *):

      H        T
H     3*,-3    -2,2*
T     -1,1*    0,0
In this new game, if we start from the assumption that one of the two randomizes, the other should fix a deterministic optimal strategy. But if he does so, the first should play his own optimal pure reply. Therefore it is just better to randomize again. This is a loop! Intuitively for now, let's take "mixed" as "randomized". But it is not always like this: is going random always the best mixed Nash strategy? In this modified version, the mixed strategy is indifferent for one player and convenient for the other. But an equilibrium strategy must exist, the theorem tells us so. With the algorithm, it is found by simply solving a system of equations to find under which probabilities the expected values for the two players match. Let us set up, for each player move, the expected utility of that move as a function of the probabilities of the opponent's moves:

EUi,j = Σk=1..K Σl=1..L pkl · rkl

where EUi,j is the expected utility of player i for move j, pkl is the probability that player k plays move l, and rkl is the corresponding reward. An example:
EU1,left = EU1,right ⇔ σ2,up − 3σ2,down = σ2,up + σ2,down. Observing that the two weights must sum to one, so that σdown = 1 − σup, it is then possible to find the distribution over the moves such that, if player A follows it, player B may just play anything - indifference is reached. Or better: B may choose whatever distribution and the effect does not change. The collection of the indifference strategies, one per player, is a mutual best response and defines a Nash equilibrium. It is possible that pure equilibria coexist with mixed ones in the same game. The pure equilibria are insensitive to the exact payoffs: if the reward of a given strategy changes a little, the result does not. The mixed equilibria are probability distributions calculated from a linear system, so of course they change. What happens if our system has no solutions? Simply, there exists no mixed equilibrium. The same holds if the solution cannot be validated as a probability distribution, with the scores summing up to one.
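
For 2x2 games the system of indifference equations collapses to closed formulas; a sketch, using the modified coin game above (rows and columns ordered H, T; the layout is assumed):

from fractions import Fraction

# a[i][j] = row player's payoff, b[i][j] = column player's payoff
a = [[Fraction(3), Fraction(-2)], [Fraction(-1), Fraction(0)]]
b = [[Fraction(-3), Fraction(2)], [Fraction(1), Fraction(0)]]

# q = probability that the column player plays his first move, chosen
# so the row player is indifferent between his two rows:
#   a00*q + a01*(1-q) = a10*q + a11*(1-q)
q = (a[1][1] - a[0][1]) / (a[0][0] - a[0][1] - a[1][0] + a[1][1])
# p = probability that the row player plays his first row, making the
# column player indifferent between his two columns
p = (b[1][1] - b[1][0]) / (b[0][0] - b[1][0] - b[0][1] + b[1][1])

print(p, q)                          # 1/6 1/3
assert 0 <= p <= 1 and 0 <= q <= 1   # otherwise: no mixed equilibrium
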
A strictly dominated strategy can never be played with positive probability in a mixed Nash equilibrium. This follows from the fact that each time the drawn move is strictly dominated, the player is leaving something on the table and he would be better off changing choices - the opposite of the no-regrets property of Nash. But what about using a weakly dominated strategy instead, in a game which has a mixed-strategy Nash equilibrium? That can make sense. A game might have both a pure and a mixed Nash equilibrium. In that case, how do the two compare with each other? The payoff from a mixed distribution is easily computed by weighting the known rewards. In the swerving game, for example, where two cars speed towards each other and must either swerve at the last moment or collide, there are both kinds of equilibria. In the mixed one, the players expect to earn −2/5 on average. The game is:

               go straight   swerve
go straight    -10,-10       2,-2
swerve         -2,2          0,0

EU1 = (1/5)(1/5)(−10) + (1/5)(4/5)(2) + (4/5)(1/5)(−2) + (4/5)(4/5)(0) = −2/5

These are the weighted averages
of each of the outcomes for player one. We do not know whether a mixed strategy is convenient until the exact expected reward is computed as above. A mixed strategy is represented by a probability distribution. We can think of some machinery which implements it: it tells the player what to play each time. Would it make sense for the players to share one single machine, to coordinate and eliminate the uncertainty? A special kind of game in this sense is pure coordination:
        left    right
up      1,1     0,0
down    0,0     1,1
In this game the players have no other priority than agreeing on a shared one. Under this framework we can model the choice of which side of an unsigned street two cars should drive on. It is easy to show that this game also has a mixed strategy, but it is worse than the pure equilibria <left, right>, <right, left>. This also makes sense intuitively. In real life, laws and conventions save us from the traps of inefficient mixed strategies: for example, we all drive on the right, at least outside the UK. It goes without saying, mixed strategies do not have to be symmetrical - that depends on the reward matrix of the game itself. For example, in zero-sum games the loss of the one is the win of the other - we just have to negate. There are alternative methods to compute equilibria. They all agree, as expected.
If a mixture of two pure strategies strictly dominates a third strategy, that third strategy is strictly dominated. Strategies dominated by mixtures may be removed like the standard ones during the reduction; the concept of strict dominance does not change - a strictly dominated strategy is one you cannot deviate to without a loss.
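
A tiny numeric sketch of this fact, with hypothetical payoffs of our own rather than an example from the book:

# row player's payoffs against the opponent's two columns
rows = {"A": [3, 0], "B": [0, 3], "C": [1, 1]}

def mix(p, r1, r2):
    """Payoff vector of playing r1 with probability p, r2 otherwise."""
    return [p * x + (1 - p) * y for x, y in zip(rows[r1], rows[r2])]

half = mix(0.5, "A", "B")  # [1.5, 1.5]
print(all(m > c for m, c in zip(half, rows["C"])))        # True: C dominated
# ...yet neither A nor B alone dominates C:
print(all(x > c for x, c in zip(rows["A"], rows["C"])))   # False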

Mix & Match


Theorem: a game can never have an even or infinite number of equilibria. Almost: there are some pathological cases, and the point is to understand what they are. For example, let us think of a game where one dollar is given to each of two friends if they both accept it in a secret vote. The surprising thing is that, absurd as it sounds, unanimity might not be reached:

           vote yes   vote no
vote yes   1,1        0,0
vote no    0,0        0,0

With <yes, yes> we have a pure Nash equilibrium of course. Actually, both refusing also is one: if I know that the other votes no, it is meaningless for me to change my mind, since we cannot agree and win. With the procedure we learned we can show that there are no mixed equilibria. This is a rare example of a game with an even number of equilibria, so the one above is indeed one of those "almost theorems". Here is an example of a game with infinite equilibria:

        left    right
up      2,2     9,0
down    2,3     5,-1
<up, left>, <down, left> are pure strategy Nash equilibria. The point is that any mixture between up and down qualifies as a best response to left - any distribution will work. Player 2 will always select left, and the expected utility equation for player one becomes EU1 = 2σup + 2(1 − σup) = 2. We can see the indifference because 2 is exactly the outcome whatever player 1 does (up or down). Such a game with infinitely many mixed strategy equilibria is called a game with a partially mixed Nash equilibrium. We can also have infinite families of equilibria as long as one player pins down a given strategy; in these games a weakly dominated strategy usually shows up. Think of a game where two players go for a pot of gold. If they both claim it, no one gets it; if only one does, he gets the whole amount, no sharing. This may look like a prisoner's dilemma, but it is not, even if it seems so: whereas the prisoner had a best strategy, which is to confess, here there are three Nash equilibria, all except <share, share>.

Lemon (game)tree
Selten's game:

        left    right
up      3,1     0,0
down    2,2     2,2

This one has an infinite number of equilibria, actually with the cardinality of the reals. It has two pure Nash equilibria, <up, left> and <down, right>, and a continuum of mixed ones where p1 chooses down (pure strategy) and p2 mixes, choosing left with a probability no larger than 2/3. Therefore we cannot predict, without further assumptions, what the players will do. In general then, in the games with asymmetrical information seen so far, a player has no information on what the other will do when he must make his move. This happens quite often in reality, and this kind of game is called classic, or simultaneous.
We can think of another kind of game, called sequential, where the decisions are interleaved and temporally ordered. Player 1 makes his move, player 2 evaluates it and makes his own. A sequential game may be expressed as a tree:

I: down -> (2,2)
   up   -> II: left  -> (3,1)
               right -> (0,0)

This is called the extended form of a game. Each internal node is a decision node, and such a structure is indeed a proper decision tree. In this particular example of Selten's game, do the various equilibria still make sense in the new form? <up, left> does, for example. A subtree defines a subgame. Simply put, the question defining a Nash equilibrium - if a player could see the opponent's move after having chosen his own already, would he change it? - no longer needs hypotheses and bets here: first you see, then you choose. It is not necessarily true that an equilibrium of the classic version is still an equilibrium in the sequential version, pure or mixed. An equilibrium which "survives" is referred to as a subgame perfect equilibrium (SPE). In some way, we can see SPEs as a further refinement of Nash equilibria, one ensuring that the threats of a player are to be taken seriously. Rewards in the extended form of a game have no absolute value, only a value relative to what has already happened above in the tree. As said anyway, the model does not cover how the players assign their own rewards: any piece of knowledge or assumption is incorporated in the little number of the reward.
In the simplest form, extended games are sequential. But we can also think of simultaneous moves:

1: H -> 2: H -> (1,-1)
           T -> (-1,1)
   T -> 2: H -> (-1,1)
           T -> (1,-1)

(Player 2's two decision nodes are joined by a dashed line: a single information set.)

This is the matching pennies game. The dashed line shows that player 2 is blind with respect to player 1's move. Such a game may be converted to the classic matrix form in order to be easier to solve. What does this game mean, in simple terms? After 1 has played heads, the left subtree remains:

2: H -> (1,-1)
   T -> (-1,1)

If 1 starts the game with heads then 2 will choose among the options of the left subtree, of the right one otherwise. To analyze a situation from the strategic point of view,
this form is better than the matrix one. For each matrix form there can be several extended forms. When presented with a game in the extensive, decision-tree form, SPEs may be found by converting to the matrix form, as we saw. Alternatively, backwards induction may be applied if the game has no simultaneous moves. Let's consider this game:

A
1 0,0

B
C
2 1,-2

D
E
1 -2,1

-1,1

The decision node between E and F for player 1 is the starting point:

E
1 -2,?

-1,?

−1 > −2, so p1 will choose F if he can. Now to the node above, p2's choice between C and D: p2 knows that, if put in the condition to do so, p1 will play F. Therefore the game simplifies for p2:

C
2 ?,-2

?,-1

The best choice for p2 is, in this case, to play D. We can now move up to the beginning of the game. Having the whole plan of decisions in front of him, p1 can optimally choose to end the game by playing A: playing A, p1 gets 0; playing B, p2 will play D, and then p1 knows he will have to play F, at a net of −1. How can we formalize this? The subgame perfect equilibrium is <(accept, war), escalate>: p1 chooses accept (A) and war (F) at his decision nodes, while p2 chooses escalate (D) at his own.
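
Backwards induction is easy to mechanize. A sketch in Python on this very tree, with an assumed nested representation: a decision node is (player, {move: subtree}), a leaf is a payoff pair:

game = ("p1", {
    "A": (0, 0),
    "B": ("p2", {
        "C": (1, -2),
        "D": ("p1", {"E": (-2, 1), "F": (-1, 1)}),
    }),
})

plan = {}  # optimal move at each decision node: together, the SPE

def solve(node):
    if isinstance(node[0], int):       # leaf: nothing left to decide
        return node
    player, moves = node
    idx = 0 if player == "p1" else 1   # which payoff the mover maximizes
    best = max(moves, key=lambda m: solve(moves[m])[idx])
    plan[(player, tuple(moves))] = best
    return solve(moves[best])

print(solve(game))  # (0, 0)
print(plan)         # p1 plays A (and F below), p2 plays D - the SPE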

The SPE is the study of credible threats: it is very likely that if a player threatens an SPE action he will carry it out - actually it should be certain, because that would be the rational, maximizing option. In case there are several terminal decision nodes, the choice of the node from which to start moving up is irrelevant, but each of them must be processed. It is important to remember that the SPE gives the best choice at every decision moment, regardless of whether that situation will actually manifest or not. The alternative to this backwards induction is to find the equilibria from the matrix and isolate the strategies which are credible threats, as explained above. Usually backwards induction produces one subgame perfect equilibrium - but some games are an exception. An example is the ultimatum game:

1: Split -> 2: accept -> (2,1)
               reject -> (0,0)
   Take  -> 2: allow  -> (4,0)
               spurn  -> (0,0)

Again there are no simultaneous moves, so backwards induction is fine. The SPE is: <split, (accept, spurn)>. In this special game it can be proven that an SPE is also <take, (accept, σallow > 0.5)>, where σallow is the probability that player 2 plays allow. Conversely, for σallow < 0.5 the SPE is <split, (accept, σallow < 0.5)>. Finally, if σallow = 0.5 exactly, the SPE becomes <σsplit, (accept, σallow = 0.5)>. Generalizing: whenever a player at a decision node has the same payoff for each of his choices, there are multiple equilibria. In this situation the player can actually pick one strategy or mix freely between them, as we can see from the parametrization above.
When do we have guarantees that there is a single SPE? As long as, for every player, the payoffs at the different outcomes are all different, backwards induction must lead to a unique SPE, because such a configuration leaves the player a single optimal decision at each node. It does not matter if two or more players get the same reward in a particular outcome. Saying that backwards induction is possible only if there are no simultaneous moves just means that each decision node has only one history: it is thus always possible to move up from the last decision with a unique history.

Credible threats
We talked about threats - a threat is nothing but a strategy a player convinces the opponent he will play. In order to be credible, the player must shut some doors, burn some bridges, or tie his hands. In the game of the two bridges, an isle is reachable via two bridges, each from the side of one of two lands at war with each other. The isle has no value in itself; it is only useful to march through to invade the enemy. In the easiest version one can think of, the SPE is <(burn, retreat), (not invade, challenge)>, where the moves are: crossing the bridge and burning it, thus sealing off one's own land but also getting trapped on the isle; crossing the bridge but then holding back; deciding not to do anything; going to war. Player 1 may turn a non-credible threat into a credible one: if he burns the bridge he will have no choice but to fight for the island in case player 2 shows up; player 2 sees the credibility and does not show up.
Tied hands: a boss knows that a crew member steals, but the cost of a new hire is higher than the damage caused by the stealing, therefore it is not worthwhile to replace the thief. At the first meeting after the fact, he says that whoever is caught stealing will be replaced. Is it a good idea to make such an announcement? Again, a simple tree can be built and, comparing the two backwards inductions, one sees that it is convenient for the boss to make the statement; the SPE is <(warn, ignore, follow through), (steal, stop)>: the boss warns, the employee stops. The SPE is indeed the study of credible threats: by limiting his options and thus making a threat credible, a player can combine what is useful for him with what is useful for the opponent.
Let us imagine we get pulled over by a patrol and, even if we don't want to, the officer suggests we allow a quick vehicle search, under the promise that in two minutes they will be done. The alternative is to refuse, but then they will call special ops, and we cannot say no to those. Therefore:

You: R.S.    -> (2,1)
     consent -> Officer: 2min -> (3,2)
                         long -> (1,3)

(R.S. for rapid search.) The SPE is <R.S., long>. The officers suffer from the known commitment problem: even if a quick search would clearly be better for everybody involved, the officers' word is not binding - if we consent we are at their mercy, and it may last two minutes or much more. The strong assumption is that a policeman puts duty - doing his job well - above honor and promises. Let's try to invert the order of the moves instead:

Officer: 2min -> You: yes -> (3,2)
                      no  -> (2,1)
         long -> You: yes -> (1,3)
                      no  -> (2,1)

(With a little notation abuse: our payoff is written first and the police's second.) In this game the two reach the good outcome instead: the officer commits to the two-minute search, and we say yes. The takeaway is that committed moves are worth more than words given now (that's what contracts exist for, anyway). If anything, the police do not want to be seen as weak by their own colleagues.
The reason why civil wars usually end in annihilation rather than a peace treaty is actually to be found in this same commitment-problem logic. The dictator's dilemma: smash the opposition, or concede, hoping that once the tables have turned it will still be possible to keep some power?

Dictator: fight   -> (7,-10)
          concede -> events: 20% -> Rebels: forgive     -> (-8,5)
                                            assassinate -> (-13,7)
                             80% -> Rebels: murder him  -> (-10,10)

The SPE is <fight, (assassinate, murder)> - what a joy! What if the rebels just will not agree to spare the dictator's life? The release ending would actually be better for both. As said, the answer is a binding contract: the contract represents a formal incentive to keep one's word, sometimes under the menace of a legal repercussion.

Flipping backwards
Backwards induction: do we always need the tree form of an extensive game? Pirates' game: n + 1 pirates; pirate n + 1 conquers the riches and makes a proposal on how to split them. There must be a quorum: (n + 1)/2 of the pirates must accept, otherwise pirate n + 1 walks the plank and the game continues with the remaining n pirates. The pirates' goal is first of all to survive, and in second place to maximize their own cut of the riches. Obviously they also have an interest in seeing the pirates above them walk the plank, because that way they bump up in the hierarchy. Assumption: pirates do not form coalitions but always vote with their belly. The approach with trees is not feasible, since the tree is exponential in the number of pirates. Let's build another structure instead:

At every step either the proposal passes and we split the profits, or the current captain is terminated. On this chain, linear in n, backwards induction is feasible, but at each node we have to consider all of the strategies somehow, finding, at every step, the SPE for that pirate. We can build something like this:

     A      B      C      D      E
A    a      b      c      d      x
B    e      f      g      h      y
C    i      l      m      n      w
D    o      p      q      r      z
E    dead   dead   dead   dead   10
The SPE for the last pirate is of course to vote <no, no, no, no, yes>, with an expected payoff of 10 - and the rest of the crew dead. So the rows are SPEs and the columns the offer received by a single pirate. For pirate four: <no, no, no, yes>, with himself making 10 and pirate five making nothing - 1 out of 2 is a quorum. Things get trickier above: the more senior pirates have to somehow convince, or just buy off, the rest. For pirate three, for example, four will always vote against him, but just one coin is sufficient to buy off five, since five knows he will get nothing if four wins. In general the balance changes at every round; at every round one pirate is more vulnerable than the others, and a pirate who voted yes in the previous round will vote no in this one - or so. The final table is:

     A    B    C    D    E
A    8    0    1    0    1
B    d    9    0    1    0
C    d    d    9    0    1
D    d    d    d    10   0
E    d    d    d    d    10

These are all the SPEs. The equilibrium outcome is: A makes the offer, and C and E vote yes together with him. Be careful to note that the payoff is not only the gain in coins, but also having one's own life spared.
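
A sketch of this linear backwards induction, under the assumptions just stated (10 coins, a proposal passes with at least half the votes, the proposer's included, and a pirate votes yes only for strictly more than his continuation payoff):

COINS = 10

def split(n_pirates):
    """Allocation proposed by the current captain, senior pirate first."""
    if n_pirates == 1:
        return [COINS]
    later = split(n_pirates - 1)           # world where he walks the plank
    votes_needed = -(-n_pirates // 2) - 1  # ceil(n/2), minus his own vote
    # buy the cheapest voters: those with the lowest continuation payoff
    order = sorted(range(n_pirates - 1), key=lambda i: later[i])
    offer = [0] * (n_pirates - 1)
    for i in order[:votes_needed]:
        offer[i] = later[i] + 1            # one coin more than the alternative
    captain = COINS - sum(offer)
    return [captain] + offer

print(split(5))  # [8, 0, 1, 0, 1] - matching the table above
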
Another game we can learn to handle is nim: there are two players and n chips; the players take turns removing one or two chips, and the winner is whoever takes the very last one. Starting from two chips, we can create a sequence of trees, i.e. a forest, with two generic players X and Y taking one or two chips at a time:

X (2 chips left): take 2 -> (1,-1)
                  take 1 -> Y: take 1 -> (-1,1)

After a while a pattern becomes obvious, and so do all the SPEs: whenever the number of chips is a multiple of 3, whoever moves second wins; otherwise the first player wins by always bringing the pile back to a multiple of 3.
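
A memoized solver makes the pattern plain; True means the player about to move can force a win:

from functools import lru_cache

@lru_cache(maxsize=None)
def wins(chips):
    if chips == 0:
        return False  # the previous player took the last chip and won
    return any(not wins(chips - take) for take in (1, 2) if take <= chips)

print([n for n in range(1, 16) if not wins(n)])  # [3, 6, 9, 12, 15]
# the mover loses exactly on multiples of 3, as the pattern suggests
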
Backwards induction and SPE are rather intuitive, but is that enough if players play irrationally, or make mistakes? An example: 100 players; if player x presses A he wins 1, if he presses B he wins 100, but if and only if everybody before and after him does the same. Take the first player x1: he has the most difficult choice, since his expected value depends on how much he trusts the other xi's, and on how much they trust each other too. The assumption failing to hold here is therefore confidence, and completeness of information, in the choice.
One more example. A franchise is facing competition in five different cities, from different local players. The only way to beat them is to start a price war whenever a competitor enters a city. The franchise does not want that. Solving the game backwards, all competitors will enter the market, but we have some intuition that nobody will actually do so, as the only thing preventing competitors from entering is the price war, which the franchise would win; each competitor decides by looking at what the other competitors did. Analyzing the situation, it is complete information which is missing. And if the assumptions are incomplete or wrong, any method relying on them will lead to different outcomes.
Now an example of vicious irrationality instead: the centipede game.

1: take -> (2,0)
   add  -> 2: take -> (1,3)
              add  -> 1: take -> ...
                         add  -> ...
where players alternately add two dollars to the pot or take it. If we take n turns, at the last one there will be 2n dollars in the pot. The last player decides whether to split it in equal parts or take n + 1 dollars and leave the rest to the other player. Under the assumption that players just want to maximize their payoff, the SPE is for both to play the take move at every hand. But this equilibrium is limiting, because by adding move after move the players could also win more and more. There is a dichotomy between SPE and perfect play. In this kind of game, what does it mean to be rational anyway? What if the drive of the players wasn't the payoff, but rather altruism? Again, false assumptions produce inaccurate predictions.

Back and forth


Dual to backwards induction we have forward induction: assuming the previous moves were rational, we condition on everything we can infer from them. A game we have seen already is the stag hunt. A change:

1: S -> 2: S -> (3,3)
           H -> (0,2)
   H -> 2: S -> (2,0)
           H -> (1,1)

(Player 2's two nodes form a single information set: he does not observe 1's move.)

In this game cooperation makes you stronger. Since there is an epsilon (nondeterministic) move, there are multiple equilibria. With forward induction, since everything that happened up to that moment was rational, even if 2 does not see 1's move he will be able to infer that 1 has not played H, and thus must have played S. With this reasoning we arrive at <stag, stag> as the equilibrium. The random choice singles out some sort of universe: the one where one player chooses H while the other chooses S is excluded, which promises the best stag hunt. Forward induction is much closer to the standard way of thinking.
Let's now consider a richer version of the swerving game above: drivers have the chance to lock the steering and throw the steering wheel out of the window, with a gesture visible to the other. This action is a very credible threat: the driver will not swerve anymore for sure, because he cannot; he is not bluffing. When one of the two throws the steering wheel out of the window, the other will surely steer away to avoid certain death: this is an SPE. But, as a curiosity, what can we say if neither of the two throws anything out of the window? With forward induction one sees that swerving is not rational for such a player: if he wanted the other to dodge, then it would have been better to throw the wheel away. How do we know that there is a mixed strategy hiding here? There is an indifference for the players between keeping pushing and throwing the wheel away. This is something we already know how to deal with - we will have an infinite space of equilibria. It is interesting to observe that the equilibria we can pull out must be based on players who are not only rational, but who also, for example, read the opponent through inductions. What changes if throwing the wheel has exactly zero cost? The cost simply enters as a penalty on the original payoffs deriving from the decision. The surviving reply strategies do not change: if A throws, B will have to dodge him; the irrationality of swerving still holds. The forward induction solution defines the following: player 1 should continue at full throttle and player 2 will move aside; this is a threat which cannot be violated. Another interesting case with forward induction is found when an equilibrium strategy does not coincide with the optimal strategy for the group.

Back 2 Basics
Let's go back to matrix games now, but with some generalizations, for example parametrized rewards. A reward may be a pseudo-distribution. Let's consider the free kick game:

          G left    G right
K left    0,0       x,-x
K right   1,-1      0,0

(K for the kicker, G for the goalkeeper) where of course the attacker maximizes his payoff if the goalkeeper jumps to the opposite side. If the striker decides to shoot left, he has a precision given by the indicator x, with 0 < x < 1. Another generalization may be a mixed strategy with more than two moves; think for example of a randomized rock-scissors-paper.
The generalized prisoners' dilemma:

        left    right
up      R,r     S,t
down    T,s     P,p

The letters are the so-called exogenous variables. By setting specific constraints on those variables we instantiate a particular version of the game. This also allows us to define generic equations to solve any instance, i.e. we find generalized formulas for the mixed strategy. The point is that as long as a constraint like a > b > c is defined, an algorithm like the pure-strategy equilibrium search (PSNE) will work anyway, and the mixed-strategy one (MSNE) too. The MSNE will lead us to something like σup = (b − c)/(a + b − 2c), and of course we always have to check that this makes sense as a probability score. Back to the example: do mixed strategies arise in this generalized version of the dilemma? We set T > R > P > S and t > r > p > s, and one sees that, even if strict dominance is overlooked, trying to derive a mixed strategy by solving the equations will magically lead to a contradiction, like for example deriving p = s when we set p > s initially. <down, right> is confirmed as the only Nash equilibrium.
Equilibria follow from a calculation, and if the numbers change they may disappear. When we have a generic game, modeled on exogenous variables, the equilibria will be there only for a particular set of instances.
It is convenient to know how to detect bizarre equilibrium configurations and exclude them as unrealistic. We call those knife-edge equilibria. The hawk-pigeon game:

          hawk             pigeon
hawk      v/2-c, v/2-c     v, 0
pigeon    0, v             v/2, v/2

v > 0 is the value of a piece of food contested between two animals, which may decide to act as hawks or pigeons, that is, to fight or not for that piece. c > 0 is the cost of the conflict, paid only if the animal decides to act as a hawk and fight. For v → ∞ there exists a dominant strategy, but for v → 0 the game has two PSNEs and just one MSNE. The interesting point about this game is when v/2 ≈ c; it is therefore convenient to study v/2 versus c in all three inequality cases separately. For v/2 > c, both players choose hawk in equilibrium and the game is isomorphic to the prisoner's dilemma, with hawking equal to confessing and doving equal to keeping quiet. For the equality there is an indifference between hawk and pigeon when the other player chooses hawk, which leads to an infinite-equilibria situation; this condition comes with a number of other peculiarities, but since on the real line v/2 = c has probability zero, this is righteously a knife-edge equilibrium case.
Knife-edge equilibria are known to induce indifference where weak dominance arises. It is always possible to forge a game of one's own by setting the variables to edge cases; this usually creates challenging and fun games.
Whenever we introduce changes in a game by altering the payoffs, it can be interesting to compare the two outcomes; the study of this change is called comparative statics. One approach to it is:
1. solve for the equilibria
2. calculate one or more interesting elements, such as the probability of a certain total payoff for the players
3. differentiate that function with respect to the exogenous variable being manipulated in the altered game
4. take the derivative as the model to study the change
This might sound intimidating, especially because we have to define a differentiable function - an example is due.
Let's take a simplified free kick problem where both players are superhuman: the only chance for the goalie is to correctly guess the jump - otherwise the superprecision of the puntero will always, inevitably lead to a goal:

            g left    g right
pu left     0,0       1,-1
pu right    1,-1      0,0
No PSNE can exist: the players have opposite intentions - this is a zero-sum game. The Nash equilibrium here is an MSNE: if one player flips a coin, the best the other can do is flip as well - no profitable deviation under perfectly simultaneous actions, which is our assumption of course. In the generalized version with exogenous variables we can model a weakness too, as we have seen: the kick-left/dive-right cell thus becomes (x, −x), with 0 < x < 1. Let's say we want to study the following question: as the kicker's accuracy on his left side increases, will he kick to that side more frequently - should he? Now:
1. find the equilibria: the goalie goes left with probability x/(1 + x), the puntero aims left with probability 1/(1 + x)
2. this point is already taken care of: the element of interest is already represented by the mixed strategy distribution itself
3. we must differentiate f(x) = 1/(1 + x), using the quotient rule for example: f'(x) = −1/(1 + x)², still bounding 0 < x < 1, where f'(x) < 0 for all x
4. the counterintuitive result is this: the probability that the kicker aims left decreases as his accuracy on that side improves!
About the last point, one would guess the opposite: why would improving your abilities on one side make you want to use that same side less frequently? The kicker is only as good as his weakest link. Another assumption which must be made explicit for this counterintuitive result to hold is the following: the goalie must be aware of such a weakness, and because of it he will be inclined to guard the kicker's stronger side even more. Dually, the kicker will be tempted to try his own weaker side, knowing it will be less guarded. Again, much - too much, perhaps - information and value lies in the assumptions, and especially in players who act rationally.
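
A quick numeric check of the equilibrium formulas from step 1:

def kicker_left(x): return 1 / (1 + x)   # puntero's P(aim left)
def goalie_left(x): return x / (1 + x)   # goalie's P(dive left)

for x in (0.2, 0.5, 0.8):
    print(f"x={x}: kicker aims left {kicker_left(x):.2f}, "
          f"goalie dives left {goalie_left(x):.2f}")
# as the accuracy x grows, the kicker aims left *less* often, while
# the goalie guards that side more: f'(x) = -1/(1+x)^2 < 0
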
A very archetypal game is the volunteers' dilemma. Two players are each willing to take an action, but only if they know the other will not; the action comes with a cost for whoever takes it. It is unclear who has to act, and there is a risk that nobody will. The action may be assumed to be idempotent, at least externally: taken once or twice, the effect is the same. If c is the cost for a player taking action, then:

          ignore    call
ignore    0, 0      1, 1-c
call      1-c, 1    1-c, 1-c
This could model something like: two neighbors see a third neighbor bleeding on the pavement but don't dare to call the police, because they fear automatically becoming suspects. The PSNEs are <call, ignore> and <ignore, call>: basically, waiting on the other, or making the call and risking drawing attention to oneself. We also know that virtually all games have an odd number of equilibria: since there is no domination here, the third one should be a mixed strategy. Doing the math, it turns out to be P(ignore) = c, P(call) = 1 − c. This is, on the other hand, quite logical: the higher c - for example the probability of being accused of being the murderer - the less likely the neighbor is to call the police. And thus c² of the time the third neighbor will die in the alley - even though each of the other two would surely call if only they knew the rival neighbor would not call 911. Since f(c) = c² is the probability of the victim dying and f'(c) = 2c, things get worse faster and faster as c → 1 - which was clear already.
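
A short sketch verifying the indifference condition behind the mixed equilibrium:

def eu_call(c, q):   return 1 - c   # calling costs c, someone surely calls
def eu_ignore(c, q): return q * 1   # pays off only if the other calls

for c in (0.2, 0.5, 0.8):
    q = 1 - c  # opponent's equilibrium P(call)
    assert abs(eu_call(c, q) - eu_ignore(c, q)) < 1e-12  # indifference holds
    print(f"c={c}: P(nobody calls) = {c*c:.2f}")
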
The hawk-pigeon game above can be a model to study two enemy countries: will they declare war (hawk) or not (pigeon)? The probability that they declare war decreases, but weakly, as the cost c increases; weakly because if c < v/2 the countries will fight anyway, among other things.
Finally, interesting comparative dynamics are not a given - we could have a game that is invariant to the variation of its variables. An important property of MSNEs: all pure strategies in a mix at equilibrium generate the same expected utility. It is also important to notice that a mixed strategy can insist on n moves, not just two.
But we are not done yet with the trivia. If a player mixes among all the strategies he has, then the other player cannot play a weakly dominated strategy; intuitively, this is because the move is expected to be dominated according to the distribution of the dominating moves in the opponent's mix. This can be represented as follows: EUA = Σi=1..n σi · EUAi, where EUAi is the expected utility of playing A with the opponent playing i. If B is weakly dominated by A, then σi · EUAi ≥ σi · EUBi for all i, with at least one strict inequality because of weak dominance. One cannot mix between A and B, because to do so the utilities would have to be equal, which is not the case. Computationally, isolating weakly dominated strategies rules out as non-equilibria the mixes containing them - it saves time, because the range of possible supports counts C(n,k) candidates for every 1 ≤ k ≤ n. The other golden rule to guide us here is that in order to be willing to mix, all pure strategies in the mix must yield the same utility.
We just mentioned n-move games, and we have flown over a few cases anyway. Solving a generic n-move game can be expensive though. The decision matrix is as follows:

         pure    mixed
pure     Y       N
mixed    N       N

Classic games
We come now to the father of all games: rock-scissors-paper (or whatever the right order is).

           ROCK    PAPER   SCISSORS
ROCK       0,0     -1,1    1,-1
PAPER      1,-1    0,0     -1,1
SCISSORS   -1,1    1,-1    0,0

No PSNE exists because no cell is a *-marked pair of mutual best responses. We could think that a purely random, uniform strategy is a good one. Again, as in the previous section, any move choice then leads to a zero expected payoff, which brings indifference between the choices and thus an MSNE. Things start to get counterintuitive when one unbalances the payoffs, e.g. say that paper, if it wins, gets you 10 instead. Let's start to work towards generalizations by adopting a trick for zero-sum games - this is one of them: the reward is three-valued in {−1, 0, 1}, and for one to win the other must lose. For symmetric zero-sum games, the expected utility in equilibrium must be zero. This can be proven. It can also be shown that no equilibrium can mix over just two of the three moves. The most meaningful generalization is the one hinted at above: choosing some x, y, z > 0 powers for the three items against the other two. Still, no two-valued mix can exist. With the usual calculations the generalized version yields an MSNE over the three values, of the form (x/(x+y+z), y/(x+y+z), z/(x+y+z)); the mixed strategy of the base game is clearly an instance of this. Interestingly, if x/(x+y+z) is the benefit score of playing scissors (or any other of the moves), then x does not appear among the rewards of scissors itself (and dually).
Comparative analysis of the equilibrium share x/(x+y+z) also highlights this counterintuitive link to x. Say that x is the payoff of paper; then we can think ahead and realize that as paper grows in value, scissors - its nemesis - must grow in popularity as well. In the real world, do people randomize, even when it is optimal? No: it does not feel natural, and it could also be difficult to defend in front of the stakeholders. People usually master one single strategy. Many videogames are actually crafted versions of the scissors game, where the MSNE just teaches us a way of playing the game, a way which might be a winning one, but is uninteresting and purpose-defeating.
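
A sketch verifying the indifference behind this family of equilibria, assuming the winner of a round earns the value of his winning move and the loser pays the same:

r, p, s = 1.0, 10.0, 1.0   # e.g. paper bumped to 10, as in the text

# row player's payoffs, rows/columns ordered rock, paper, scissors
A = [[0, -p,  r],
     [p,  0, -s],
     [-r, s,  0]]

total = r + p + s
mix = [s / total, r / total, p / total]  # rock, paper, scissors

# every pure strategy must yield the same expected utility (zero here)
for row in A:
    eu = sum(prob * payoff for prob, payoff in zip(mix, row))
    assert abs(eu) < 1e-12
print(mix)  # with paper lucrative, scissors (its nemesis) is played most

Notice how each move's weight is the value of the move it beats - a move's own reward never enters its own weight, just as the text observes.
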
All the games we have seen have a finite number of pure strategies and are amenable to a tree or matrix representation, but there is more; the definition of game is independent of them anyway. We know that finite games must have at least one equilibrium. And the infinite case? Let's take the xy game, where the players choose naturals x, y and the reward is x ∗ y for both; clearly there is no equilibrium, because we are on an infinite set and there is always a profitable deviation: for any choice n we make, it is just n + 1. An interesting game prototype, infinite as well, is Hotelling, where two gelato gazebos have to claim a spot on a uniformly crowded beach. The only equilibrium, a pure one, is both in the middle: under the fair assumption that they split all the clients at position x = 0.5, neither can profitably deviate, being right in the middle the best option already. Will they lose something by doing so? Ever wondered why gas stations or fast foods cluster together, then? This is also why presidential candidates try to win the median voter during elections.
Closing up, a few classic games with real world applications.
Gunslingers (pistoleros). Where is the best place to stand and fire, knowing that missing is fatal and hitting saves us? It is where d1 + d2 = 1, with d1 = d2 and 1 representing maximum accuracy. That is why competing technologies such as the PlayStation 3 and the Xbox always come out at the same time!
A second-price auction is one where the highest bid wins, yet the winner pays what the second highest bidder offered. What should the bidders offer? An equilibrium is reached when all of them offer the highest price they are ready to pay. For a loser, offering more to overtake the winner means going above budget; offering less has no point. For the winner, it is pointless to offer more, since he has already won and is entitled to pay, in the second-worst case, no more than what he offered; offering less instead is dangerous, because he runs the risk of slipping out of first place and losing the auction. This is a no-brainer strategy: players only have to focus on their own valuation, bidding the maximum of their budget. Also, this equilibrium is invariant to the number of bidders. One important difference is incomplete information: so far we have assumed actors with an estimate of the opponents' utility - they knew the others' rewards. Here, if the bidders are honest with themselves, each one's budget is independent of the others, and they do not need to know anything about the other budgets anyway.
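
A simulation sketch (hypothetical numbers, rivals bidding uniformly at random) showing that truthful bidding does at least as well as shading or overbidding:

import random

random.seed(0)

def utility(value, bid, rival_bids):
    top_rival = max(rival_bids)
    if bid > top_rival:          # we win and pay the second price
        return value - top_rival
    return 0.0                   # we lose and pay nothing

value = 0.7
for strategy, bid in [("truthful", 0.7), ("shade", 0.5), ("overbid", 0.9)]:
    total = sum(
        utility(value, bid, [random.random() for _ in range(4)])
        for _ in range(100_000)
    )
    print(f"{strategy:9s} average utility: {total / 100_000:.4f}")
# shading forfeits profitable wins, overbidding adds losing wins:
# truthful bidding never does worse than either deviation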
