
Probabilistic Dynamic Programming
Group 5
Minie Joy Adarne
Kathleen Marzan

STOCHASTIC CONDITION

Stochastic processes are probability models for processes that evolve over time in a probabilistic manner.

A stochastic process is defined to be an indexed collection of random variables $\{X_t\}$, where the index $t$ runs through a given set of nonnegative integers.

The random variable $X_t$ represents the characteristic of interest at time $t$.

For example, $X_t$ may represent the inventory level of a particular product at the end of week $t$.

In general, $X_t$ represents the state of the system at time $t$.

The current status of the system can fall into any one of a set of mutually exclusive categories called states.

A stochastic process therefore gives a mathematical representation of how the status of a physical system evolves over time.

Probabilistic Dynamic Programming

The state at the next stage is not completely determined by the state and policy decision at the current stage. Instead, there is a probability distribution for what the next state will be (see figure).

S = number of possible states at stage n + 1.

The system goes to state i (i = 1, 2, ..., S) with probability $p_i$, given state $S_n$ and decision $X_n$ at stage n.

$C_i$ = contribution of stage n to the objective function if the system goes to state i.

When the figure is expanded to include all possible states and decisions at all stages, it is a decision tree.
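The structure above can be sketched in a few lines of Python. This is only an illustration under one assumption that the slides do not state explicitly: the objective is the expected value of the stage contribution plus the optimal value of the next state. The function name `expected_stage_value` and the sample numbers are hypothetical.

```python
from fractions import Fraction

def expected_stage_value(probs, contributions, next_values):
    """Sum over next states i of p_i * (C_i + f*_{n+1}(i)).

    Assumes the objective is an expected sum of stage contributions;
    other objectives would need a different combination rule.
    """
    return sum(p * (c + v) for p, c, v in zip(probs, contributions, next_values))

# Hypothetical numbers: two possible next states, no immediate contribution.
probs = [Fraction(1, 3), Fraction(2, 3)]
contributions = [Fraction(0), Fraction(0)]
next_values = [Fraction(0), Fraction(1)]

value = expected_stage_value(probs, contributions, next_values)
print(value)  # 2/3
```

Evaluating this quantity for every feasible decision and keeping the best one gives the optimal value of the current state.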


Example 7: Winning in Las Vegas

A statistician has a procedure that she believes will win a popular Las Vegas game.

Her colleagues bet that she will not have at least five chips after three plays of the game if she begins with three chips.

She believes her procedure gives a 67% chance (probability 2/3) of winning a given play of the game. On each play she bets some of her available chips and either wins or loses that many chips.

Assuming she is correct, determine the optimal policy for how many chips to bet at each play, taking into account the results of earlier plays.

Objective: maximize the probability of winning her bet with her colleagues.

Dynamic programming problem formulation

Stage n: nth play of the game (n = 1, 2, 3).

Decision at stage n: $X_n$ = number of chips to bet.

State at stage n: $S_n$ = number of chips available for betting at the start of stage n, with $S_1 = 3$.

The goal is to maximize the probability of ending with at least 5 chips after the 3rd play, given that she begins with 3 chips.

Recursion relation: let $f_n(S_n, X_n)$ be the probability of finishing with at least 5 chips, given $S_n$ chips at the beginning of stage n, a bet of $X_n$ at stage n, and optimal bets afterwards. A play is lost with probability 1/3 and won with probability 2/3, so

$$f_n(S_n, X_n) = \tfrac{1}{3}\, f^*_{n+1}(S_n - X_n) + \tfrac{2}{3}\, f^*_{n+1}(S_n + X_n).$$

Therefore the recursive equation is

$$f^*_n(S_n) = \max_{X_n = 0, 1, \dots, S_n} \left\{ \tfrac{1}{3}\, f^*_{n+1}(S_n - X_n) + \tfrac{2}{3}\, f^*_{n+1}(S_n + X_n) \right\}$$

for n = 1, 2, 3, with the boundary condition

$$f^*_4(S_4) = \begin{cases} 0 & \text{if } S_4 < 5, \\ 1 & \text{if } S_4 \ge 5. \end{cases}$$
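The recursion and boundary condition can be checked with a short backward-recursion sketch. This is one possible implementation, not the presenters' code; the function name `best` and the constant names are hypothetical. It uses exact fractions so the results can be compared with the hand computations.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning a play (from the example)
TARGET = 5               # chips needed after the last play
PLAYS = 3                # number of plays

@lru_cache(maxsize=None)
def best(n, chips):
    """Return (f*_n(chips), tuple of optimal bets) for stage n."""
    if n > PLAYS:  # boundary condition f*_4
        return (Fraction(1) if chips >= TARGET else Fraction(0)), ()
    # Value of each feasible bet: lose with 1/3, win with 2/3.
    values = {
        bet: (1 - P_WIN) * best(n + 1, chips - bet)[0]
             + P_WIN * best(n + 1, chips + bet)[0]
        for bet in range(chips + 1)
    }
    top = max(values.values())
    return top, tuple(b for b, v in values.items() if v == top)

prob, bets = best(1, 3)
print(prob, bets)  # 20/27 (1,)
```

Calling `best(n, chips)` for other stages and chip counts gives the corresponding optimal value and every bet that attains it.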

Stage 3: n = 3 (sample computation)

$$f_3(S_3, X_3) = \tfrac{1}{3}\, f^*_4(S_3 - X_3) + \tfrac{2}{3}\, f^*_4(S_3 + X_3)$$

Each term is evaluated with the boundary condition $f^*_4(S_4) = 0$ if $S_4 < 5$ and $f^*_4(S_4) = 1$ if $S_4 \ge 5$, where $S_4 = S_3 - X_3$ after a loss and $S_4 = S_3 + X_3$ after a win.

| S3 | X3 = 0 | X3 = 1 | X3 = 2 | X3 = 3 | X3 = 4 | X3 = 5 | f3*(S3) | X3* |
|----|--------|--------|--------|--------|--------|--------|---------|-----|
| 0 | 0 | - | - | - | - | - | 0 | - |
| 1 | 0 | 0 | - | - | - | - | 0 | - |
| 2 | 0 | 0 | 0 | - | - | - | 0 | - |
| 3 | 0 | 0 | 2/3 | 2/3 | - | - | 2/3 | 2 or 3 |
| 4 | 0 | 2/3 | 2/3 | 2/3 | 2/3 | - | 2/3 | 1, 2, 3, or 4 |
| 5 | 1 | 2/3 | 2/3 | 2/3 | 2/3 | 2/3 | 1 | 0 |

A dash marks an infeasible bet ($X_3 > S_3$). In the X3* column, a dash marks a state from which 5 chips cannot be reached, so every bet is equally good.
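The stage 3 entries follow directly from the boundary condition, which the following sketch reproduces row by row. The helper names `f4_star` and `stage3_row` are hypothetical. For states 0 to 2 the printed list of optimal bets contains every feasible bet, because all of them tie at probability 0.

```python
from fractions import Fraction

def f4_star(chips):
    """Boundary condition: 1 if she ends with at least 5 chips, else 0."""
    return Fraction(1) if chips >= 5 else Fraction(0)

def stage3_row(s3):
    """Return [f_3(s3, x3) for x3 = 0, 1, ..., s3]."""
    return [
        Fraction(1, 3) * f4_star(s3 - x3) + Fraction(2, 3) * f4_star(s3 + x3)
        for x3 in range(s3 + 1)
    ]

for s3 in range(6):
    row = stage3_row(s3)
    best_value = max(row)
    best_bets = [x3 for x3, v in enumerate(row) if v == best_value]
    print(s3, [str(v) for v in row], str(best_value), best_bets)
```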

Solution:

n = 3

| S3 | f3*(S3) | X3* |
|----|---------|-----|
| 0 | 0 | - |
| 1 | 0 | - |
| 2 | 0 | - |
| 3 | 2/3 | 2 (or more) |
| 4 | 2/3 | 1 (or more) |
| >= 5 | 1 | 0 (or at most S3 - 5) |

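n = 2

$$f_2(S_2, X_2) = \tfrac{1}{3}\, f^*_3(S_2 - X_2) + \tfrac{2}{3}\, f^*_3(S_2 + X_2)$$

| S2 | X2 = 0 | X2 = 1 | X2 = 2 | X2 = 3 | X2 = 4 | f2*(S2) | X2* |
|----|--------|--------|--------|--------|--------|---------|-----|
| 0 | 0 | - | - | - | - | 0 | - |
| 1 | 0 | 0 | - | - | - | 0 | - |
| 2 | 0 | 4/9 | 4/9 | - | - | 4/9 | 1 or 2 |
| 3 | 2/3 | 4/9 | 2/3 | 2/3 | - | 2/3 | 0, 2, or 3 |
| 4 | 2/3 | 8/9 | 2/3 | 2/3 | 2/3 | 8/9 | 1 |
| >= 5 | 1 | | | | | 1 | 0 (or at most S2 - 5) |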

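n = 1

$$f_1(S_1, X_1) = \tfrac{1}{3}\, f^*_2(S_1 - X_1) + \tfrac{2}{3}\, f^*_2(S_1 + X_1)$$

| S1 | X1 = 0 | X1 = 1 | X1 = 2 | X1 = 3 | f1*(S1) | X1* |
|----|--------|--------|--------|--------|---------|-----|
| 3 | 2/3 | 20/27 | 2/3 | 2/3 | 20/27 | 1 |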
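As a check on the stage 1 entries, two of them can be worked out by hand from the stage 2 values $f^*_2(1) = 0$, $f^*_2(2) = 4/9$, $f^*_2(4) = 8/9$, and $f^*_2(5) = 1$. The arithmetic below is a worked illustration in the notation used above.

```latex
\begin{align*}
f_1(3, 1) &= \tfrac{1}{3} f^*_2(2) + \tfrac{2}{3} f^*_2(4)
           = \tfrac{1}{3}\cdot\tfrac{4}{9} + \tfrac{2}{3}\cdot\tfrac{8}{9}
           = \tfrac{4}{27} + \tfrac{16}{27} = \tfrac{20}{27}, \\
f_1(3, 2) &= \tfrac{1}{3} f^*_2(1) + \tfrac{2}{3} f^*_2(5)
           = \tfrac{1}{3}\cdot 0 + \tfrac{2}{3}\cdot 1 = \tfrac{2}{3}.
\end{align*}
```

Since 20/27 is larger than 2/3 = 18/27, betting one chip on the first play is better than betting two.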

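Solution (contd.)

From the tables, the optimal policy is:

First play: bet 1 chip ($X_1^* = 1$).

If she wins the first play (4 chips), bet 1 chip on the second play. If she wins again (5 chips), bet nothing on the third play; if she loses (3 chips), bet 2 or 3 chips on the third play.

If she loses the first play (2 chips), bet 1 or 2 chips on the second play. If she wins, bet 2 or 3 chips on the third play after a bet of 1 (3 chips), or 1, 2, 3, or 4 chips after a bet of 2 (4 chips). If she loses, she can no longer win the bet.

With this policy, the statistician has a 20/27 probability of winning the bet with her colleagues.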
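The 20/27 figure can be confirmed by enumerating all eight win/lose sequences under one optimal policy. The sketch below breaks ties in one particular way: bet 1 on the first two plays, then on the third play bet exactly the number of chips needed to reach 5 if that is affordable, and otherwise bet nothing. The function name `bet` and this tie-breaking choice are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)

def bet(play, chips):
    """One optimal policy consistent with the tables (ties broken arbitrarily)."""
    if play in (1, 2):
        return 1
    need = 5 - chips  # play 3: chips still missing to reach the target
    return need if 0 < need <= chips else 0

total = Fraction(0)
for outcomes in product([True, False], repeat=3):  # True = win
    chips, prob = 3, Fraction(1)
    for play, won in enumerate(outcomes, start=1):
        x = bet(play, chips)
        chips += x if won else -x
        prob *= P_WIN if won else 1 - P_WIN
    if chips >= 5:
        total += prob

print(total)  # 20/27
```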
