PL #6 Dynamic Programming
Alexandra Moutinho
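Stage $n$ = $n$th play of the game ($n = 1, 2, 3$)
$x_n$ = number of chips to bet at stage $n$
State $s_n$ = number of chips in hand to begin stage $n$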
This definition of the state is chosen because it provides the needed information about the current
situation for making an optimal decision on how many chips to bet next.
Because the objective is to maximize the probability that the statistician will win her bet, the
objective function to be maximized at each stage must be the probability of finishing the three plays
with at least five chips. (Note that the value of ending with more than five chips is just the same as
ending with exactly five, since the bet is won either way.) Therefore,
$f_n(s_n, x_n)$ = probability of finishing three plays with at least five chips, given that the statistician
starts stage $n$ in state $s_n$, makes immediate decision $x_n$, and makes optimal decisions
thereafter,
$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n)$.
The expression for $f_n(s_n, x_n)$ must reflect the fact that it may still be possible to accumulate five
chips eventually even if the statistician should lose the next play. If she loses, the state at the next
stage will be $s_n - x_n$, and the probability of finishing with at least five chips will then be
$f_{n+1}^*(s_n - x_n)$. If she wins the next play instead, the state will become $s_n + x_n$, and the
corresponding probability will be $f_{n+1}^*(s_n + x_n)$. Because the assumed probability of winning a
given play is 2/3, it now follows that:
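$f_n(s_n, x_n) = \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n)$,
where $f_4^*(s_4)$ is 0 for $s_4 < 5$ and 1 for $s_4 \geq 5$, since the bet is won only by finishing with
at least five chips. The recursive relationship is therefore
$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n) \right\}$
for $n = 1, 2, 3$, with $f_4^*(s_4)$ as just defined.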
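The backward recursion can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `f_star`, the constants, and the use of exact fractions are assumptions made for the example, and the starting state of three chips is inferred from the optimal policy (a winning bet of one chip leads to four chips).

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning a single play (from the text)
TARGET = 5               # chips needed after the last play
PLAYS = 3                # number of plays (stages)


@lru_cache(maxsize=None)
def f_star(n, s):
    """Return (f_n^*(s), tuple of all optimal bets) for stage n and state s.

    Hypothetical helper for illustration; stage PLAYS + 1 is the terminal
    stage, where the bet is either won (s >= TARGET) or lost.
    """
    if n == PLAYS + 1:
        return (Fraction(1) if s >= TARGET else Fraction(0)), ()
    values = {}
    for x in range(s + 1):                    # feasible bets: 0, 1, ..., s
        lose = f_star(n + 1, s - x)[0]        # value if the play is lost
        win = f_star(n + 1, s + x)[0]         # value if the play is won
        values[x] = (1 - P_WIN) * lose + P_WIN * win
    best = max(values.values())
    return best, tuple(x for x, v in values.items() if v == best)


if __name__ == "__main__":
    value, bets = f_star(1, 3)   # assumed starting state: 3 chips
    print(value, bets)
```

Exact fractions are used so that ties between bets are detected reliably; with floating-point values, equal alternatives such as betting 2 or 3 chips could be separated by rounding error.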
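$n = 3$: $f_3(s_3, x_3) = \frac{1}{3} f_4^*(s_3 - x_3) + \frac{2}{3} f_4^*(s_3 + x_3)$

$s_3$       $f_3^*(s_3)$    $x_3^*$
0           0               -                         (nothing to bet)
1           0               -
2           0               -
3           2/3             2 (or more)
4           2/3             1 (or more)
$\geq 5$    1               0 (or $\leq s_3 - 5$)     (already won)

Here $f_3^*(s_3)$ is the probability of winning the bet.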
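$n = 2$: $f_2(s_2, x_2) = \frac{1}{3} f_3^*(s_2 - x_2) + \frac{2}{3} f_3^*(s_2 + x_2)$

$s_2$       $f_2^*(s_2)$    $x_2^*$
0           0               -
1           0               -
2           4/9             1 or 2
3           2/3             0, 2, or 3
4           8/9             1
$\geq 5$    1               0 (or $\leq s_2 - 5$)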
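As a worked check of the last step of the recursion, the stage 1 values can be computed by hand. This sketch assumes the starting state $s_1 = 3$ (inferred from the optimal policy) and uses the stage 2 values implied by the recursion: $f_2^*(0) = f_2^*(1) = 0$, $f_2^*(2) = 4/9$, $f_2^*(3) = 2/3$, $f_2^*(4) = 8/9$, and $f_2^*(s_2) = 1$ for $s_2 \geq 5$.

```latex
\begin{align*}
f_1(3, 0) &= \tfrac{1}{3} f_2^*(3) + \tfrac{2}{3} f_2^*(3) = \tfrac{2}{3}, \\
f_1(3, 1) &= \tfrac{1}{3} f_2^*(2) + \tfrac{2}{3} f_2^*(4)
           = \tfrac{1}{3} \cdot \tfrac{4}{9} + \tfrac{2}{3} \cdot \tfrac{8}{9} = \tfrac{20}{27}, \\
f_1(3, 2) &= \tfrac{1}{3} f_2^*(1) + \tfrac{2}{3} f_2^*(5) = \tfrac{2}{3}, \\
f_1(3, 3) &= \tfrac{1}{3} f_2^*(0) + \tfrac{2}{3} f_2^*(6) = \tfrac{2}{3}.
\end{align*}
```

The maximum is $f_1^*(3) = 20/27$, attained only at $x_1 = 1$.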
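Starting with $s_1 = 3$: $x_1^* = 1$
  if win ($s_2 = 4$), $x_2^* = 1$
    if win ($s_3 = 5$), $x_3^* = 0$
    if lose ($s_3 = 3$), $x_3^* = 2$ or 3
  if lose ($s_2 = 2$), $x_2^* = 1$ or 2
    if win ($s_3 = 3$ or 4), $x_3^* = 2$ or 3 (for $x_2^* = 1$); 1, 2, 3 or 4 (for $x_2^* = 2$)
    if lose ($s_3 = 1$ or 0), bet is lost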
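One way to check a policy like this is to enumerate all eight win/lose sequences and add up the probabilities of those that end with at least five chips. The sketch below does so for one of the optimal policies (choosing $x_2 = 1$ and $x_3 = 2$ where several bets tie). The names `POLICY` and `success_probability` are hypothetical, and the starting state of three chips is an assumption inferred from the policy itself.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)  # probability of winning a single play (from the text)

# One optimal policy, keyed by (stage, chips in hand) -> chips to bet.
# Ties are broken arbitrarily: x_2 = 1 when s_2 = 2, x_3 = 2 when s_3 = 3.
POLICY = {
    (1, 3): 1,
    (2, 4): 1, (2, 2): 1,
    (3, 5): 0, (3, 3): 2, (3, 1): 0,  # with 1 chip the bet is already lost
}


def success_probability(policy, start=3, plays=3, target=5):
    """Exact probability of ending with at least `target` chips under `policy`."""
    total = Fraction(0)
    for outcome in product((True, False), repeat=plays):  # True = win
        chips, prob = start, Fraction(1)
        for stage, won in enumerate(outcome, start=1):
            bet = policy[(stage, chips)]
            chips += bet if won else -bet
            prob *= P_WIN if won else 1 - P_WIN
        if chips >= target:
            total += prob
    return total


if __name__ == "__main__":
    print(success_probability(POLICY))
```

Under these assumptions the enumeration gives 20/27, which matches the value of the recursion at stage 1. A policy that holds back for two plays and bets two chips only on the last play gives 2/3, so the gain from optimal play is modest.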