OPTIMIZAÇÃO E DECISÃO 10/11

PL #6 Dynamic Programming
Alexandra Moutinho
(from Hillier & Lieberman, Introduction to Operations Research, 8th edition)

DYNAMIC PROGRAMMING PROBABILISTIC PROBLEM


An enterprising young statistician believes that she has developed a system for winning a popular Las
Vegas game. Her colleagues do not believe that her system works, so they have made a large bet
with her that if she starts with three chips, she will not have at least five chips after three plays of the
game. Each play of the game involves betting any desired number of available chips and then either
winning or losing this number of chips. The statistician believes that her system will give her a
probability of 2/3 of winning a given play of the game.
Assuming the statistician is correct, use dynamic programming to determine her optimal policy
regarding how many chips to bet (if any) at each of the three plays of the game. The decision at each
play should take into account the results of earlier plays. The objective is to maximize the probability
of winning her bet with her colleagues.
Solution:
The dynamic programming formulation for this problem is:
Stage $n$ = $n$th play of the game ($n = 1, 2, 3$),
$x_n$ = number of chips to bet at stage $n$,
State $s_n$ = number of chips in hand to begin stage $n$.

This definition of the state is chosen because it provides the needed information about the current
situation for making an optimal decision on how many chips to bet next.
Because the objective is to maximize the probability that the statistician will win her bet, the
objective function to be maximized at each stage must be the probability of finishing the three plays
with at least five chips. (Note that the value of ending with more than five chips is just the same as
ending with exactly five, since the bet is won either way.) Therefore,
$f_n(s_n, x_n)$ = probability of finishing three plays with at least five chips, given that the statistician
starts stage $n$ in state $s_n$, makes immediate decision $x_n$, and makes optimal decisions
thereafter,

$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} f_n(s_n, x_n)$.

The expression for $f_n(s_n, x_n)$ must reflect the fact that it may still be possible to accumulate five
chips eventually even if the statistician should lose the next play. If she loses, the state at the next
stage will be $s_n - x_n$, and the probability of finishing with at least five chips will then be
$f_{n+1}^*(s_n - x_n)$. If she wins the next play instead, the state will become $s_n + x_n$, and the
corresponding probability will be $f_{n+1}^*(s_n + x_n)$. Because the assumed probability of winning a
given play is 2/3, it now follows that

$$f_n(s_n, x_n) = \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n)$$

[where $f_4^*(s_4)$ is defined to be 0 for $s_4 < 5$ and 1 for $s_4 \ge 5$]. Thus, there is no direct
contribution to the objective function from stage $n$ other than the effect of then being in the next
state. These basic relationships are summarized in the next figure.
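
As an illustration of the expression above, consider the last play ($n = 3$) with three chips in hand ($s_3 = 3$). Here $f_3(s_3, x_3)$ denotes the probability defined earlier and $f_4^*(s_4)$ the terminal value (0 below five chips, 1 otherwise). The particular state and bets are chosen only as an example:

$$f_3(3, 1) = \frac{1}{3} f_4^*(2) + \frac{2}{3} f_4^*(4) = \frac{1}{3} \cdot 0 + \frac{2}{3} \cdot 0 = 0$$

$$f_3(3, 2) = \frac{1}{3} f_4^*(1) + \frac{2}{3} f_4^*(5) = \frac{1}{3} \cdot 0 + \frac{2}{3} \cdot 1 = \frac{2}{3}$$

Betting a single chip cannot reach five chips even on a win, whereas betting two chips succeeds exactly when the play is won.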

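Therefore, the recursive relationship for this problem is

$$f_n^*(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ \frac{1}{3} f_{n+1}^*(s_n - x_n) + \frac{2}{3} f_{n+1}^*(s_n + x_n) \right\}$$

for $n = 1, 2, 3$, with $f_4^*(s_4)$ as just defined.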
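
The recursion can be checked numerically. The following is a minimal Python sketch, not part of the original handout; the names `f_star`, `P_WIN`, `TARGET` and `N_PLAYS` are illustrative assumptions. It uses exact fractions and memoisation to evaluate the optimal value and the optimal bets for any stage and state.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem statement)
TARGET = 5               # chips needed after the last play
N_PLAYS = 3              # number of plays (stages)


@lru_cache(maxsize=None)
def f_star(n, s):
    """Return (optimal probability, tuple of optimal bets) for stage n, state s."""
    if n > N_PLAYS:
        # Terminal condition: 1 if at least TARGET chips are held, else 0.
        return (Fraction(int(s >= TARGET)), ())
    # Value of each admissible bet x = 0, 1, ..., s.
    values = {
        x: (1 - P_WIN) * f_star(n + 1, s - x)[0] + P_WIN * f_star(n + 1, s + x)[0]
        for x in range(s + 1)
    }
    best = max(values.values())
    return (best, tuple(x for x, v in values.items() if v == best))


best, bets = f_star(1, 3)
print(best, bets)  # 20/27 (1,)
```

For instance, `f_star(2, 4)` returns `(Fraction(8, 9), (1,))`, and `f_star(3, 3)` returns `(Fraction(2, 3), (2, 3))`. Ties between bets are all reported, since several bets can be optimal in the same state.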

This recursive relationship leads to the following computational results.


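$n = 3$:

$s_3$      $f_3^*(s_3)$   $x_3^*$
0          0              -                        (nothing to bet)
1          0              -                        (winning would still lead to $s_4 < 5$, so $f_3^*(s_3) = 0$)
2          0              -                        (winning would still lead to $s_4 < 5$, so $f_3^*(s_3) = 0$)
3          2/3            2 (or more)              (probability of winning)
4          2/3            1 (or more)              (probability of winning)
$\ge 5$    1              0 (or $\le s_3 - 5$)     (already won)

$n = 2$, with $f_2(s_2, x_2) = \frac{1}{3} f_3^*(s_2 - x_2) + \frac{2}{3} f_3^*(s_2 + x_2)$:

$s_2$      $x_2=0$   $x_2=1$   $x_2=2$   $x_2=3$   $x_2=4$   $f_2^*(s_2)$   $x_2^*$
0          0                                                 0              -
1          0         0                                       0              -
2          0         4/9       4/9                           4/9            1 or 2
3          2/3       4/9       2/3       2/3                 2/3            0, 2, or 3
4          2/3       8/9       2/3       2/3       2/3       8/9            1
$\ge 5$    1                                                 1              0 (or $\le s_2 - 5$)

$n = 1$, with $f_1(s_1, x_1) = \frac{1}{3} f_2^*(s_1 - x_1) + \frac{2}{3} f_2^*(s_1 + x_1)$:

$s_1$      $x_1=0$   $x_1=1$   $x_1=2$   $x_1=3$   $f_1^*(s_1)$   $x_1^*$
3          2/3       20/27     2/3       2/3       20/27          1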
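
To see how a single entry of these results arises, the chain below evaluates a one-chip bet at the first play by applying the recursion backwards. It is a worked sketch using only the recursive relationship and the terminal values; the stage-3 values used are $f_3^*(1) = 0$, $f_3^*(3) = 2/3$ and $f_3^*(5) = 1$.

$$f_2(4, 1) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(5) = \frac{1}{3} \cdot \frac{2}{3} + \frac{2}{3} \cdot 1 = \frac{8}{9}$$

$$f_2(2, 1) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(3) = \frac{1}{3} \cdot 0 + \frac{2}{3} \cdot \frac{2}{3} = \frac{4}{9}$$

$$f_1(3, 1) = \frac{1}{3} f_2^*(2) + \frac{2}{3} f_2^*(4) = \frac{1}{3} \cdot \frac{4}{9} + \frac{2}{3} \cdot \frac{8}{9} = \frac{4}{27} + \frac{16}{27} = \frac{20}{27}$$

Every other first bet gives $2/3 = 18/27$, so betting one chip is the unique optimum at stage 1.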

Therefore, the optimal policy is:

$x_1^* = 1$
    if win ($s_2 = 4$), $x_2^* = 1$
        if win ($s_3 = 5$), $x_3^* = 0$
        if lose ($s_3 = 3$), $x_3^* = 2$ or $3$
    if lose ($s_2 = 2$), $x_2^* = 1$ or $2$
        if win ($s_3 = 3$ or $4$), $x_3^* = 2$ or $3$ (for $x_2^* = 1$); $1, 2, 3$ or $4$ (for $x_2^* = 2$)
        if lose ($s_3 = 1$ or $0$), bet is lost

This policy gives the statistician a probability of 20/27 of winning her bet with her colleagues.
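
The success probability of this policy can be confirmed independently of the recursion by enumerating all eight win/lose sequences. The Python sketch below is an illustration rather than part of the original solution; `POLICY` and `success_probability` are hypothetical names, and where several bets are optimal the smallest one is chosen arbitrarily.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)  # probability of winning one play (from the problem statement)

# Bet as a function of (play number, chips in hand), following the optimal policy.
# Where several bets are optimal, the smallest is used (an arbitrary choice).
POLICY = {
    (1, 3): 1,
    (2, 4): 1, (2, 2): 1,
    (3, 5): 0, (3, 3): 2,
}


def success_probability(start_chips=3, target=5, plays=3):
    """Sum the probabilities of all win/lose sequences that end with >= target chips."""
    total = Fraction(0)
    for outcomes in product([True, False], repeat=plays):
        chips, prob = start_chips, Fraction(1)
        for n, win in enumerate(outcomes, start=1):
            # States not listed in POLICY are ones where the bet is already lost: bet 0.
            bet = POLICY.get((n, chips), 0)
            chips += bet if win else -bet
            prob *= P_WIN if win else 1 - P_WIN
        if chips >= target:
            total += prob
    return total


print(success_probability())  # 20/27
```

Enumeration is practical here because three plays give only $2^3 = 8$ outcome sequences; for longer horizons the backward recursion is the more economical route.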

Optimização e Decisão 09/10 - PL #6 Dynamic Programming - Alexandra Moutinho
