Dynamic Programming
Group 5
Minie Joy Adarne
Kathleen Marzan
STOCHASTIC CONDITION
Probabilistic Dynamic Programming
In probabilistic dynamic programming, the state at the next stage is not completely determined by the state and policy decision at the current stage. Instead, a probability distribution determines the next state (see figure).
Objective: maximize the probability of ending up with at least 5 chips after the 3rd bet, given that you have 3 chips to begin with.
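Recursion relation: if you have $s_n$ chips at the beginning of stage $n$ and bet $x_n$, you lose the bet with probability 1/3 (leaving $s_n - x_n$ chips) and win it with probability 2/3 (leaving $s_n + x_n$ chips). The probability of finishing with at least 5 chips is therefore

$$f_n(s_n, x_n) = \tfrac{1}{3}\, f^*_{n+1}(s_n - x_n) + \tfrac{2}{3}\, f^*_{n+1}(s_n + x_n),$$

and the optimal value is

$$f^*_n(s_n) = \max_{x_n = 0, 1, \ldots, s_n} \left\{ \tfrac{1}{3}\, f^*_{n+1}(s_n - x_n) + \tfrac{2}{3}\, f^*_{n+1}(s_n + x_n) \right\}$$

for $n = 1, 2, 3$, with $f^*_4(s_4)$ given by the boundary condition.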
Boundary condition:

$$f^*_4(s_4) = \begin{cases} 0 & \text{if } s_4 < 5 \\ 1 & \text{if } s_4 \ge 5 \end{cases}$$
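The recursion and boundary condition above can be sketched as a short backward-induction routine. This is a minimal illustration, not the presenters' code: the names `solve`, `P_WIN`, `TARGET`, and `N_BETS` are hypothetical, and the win probability of 2/3 is read off the coefficients of the recursion. Exact fractions are used so the results can be compared with hand calculations.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)  # assumed from the 2/3 coefficient in the recursion
TARGET = 5              # chips needed after the last bet
N_BETS = 3              # number of stages


def solve(start_chips):
    """Backward induction; returns {stage: {chips: (f_star, best_bets)}}."""
    # Most chips reachable if every bet is "all in" and every bet is won.
    max_chips = start_chips * 2 ** N_BETS
    # Boundary condition f*_4: 1 with at least TARGET chips, otherwise 0.
    f_next = {s: Fraction(int(s >= TARGET)) for s in range(max_chips + 1)}
    tables = {}
    for n in range(N_BETS, 0, -1):
        f_now, tables[n] = {}, {}
        reachable = start_chips * 2 ** (n - 1)  # largest state possible at stage n
        for s in range(reachable + 1):
            # f_n(s, x) for every feasible bet x = 0, 1, ..., s
            values = {
                x: (1 - P_WIN) * f_next[s - x] + P_WIN * f_next[s + x]
                for x in range(s + 1)
            }
            best = max(values.values())
            f_now[s] = best
            tables[n][s] = (best, [x for x, v in values.items() if v == best])
        f_next = f_now
    return tables
```

Under these assumptions, `solve(3)[1][3]` evaluates to `(Fraction(20, 27), [1])`, meaning a first bet of 1 chip and a success probability of 20/27.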
n=3

At the last stage the recursion uses the boundary condition directly:

$$f_3(s_3, x_3) = \tfrac{1}{3}\, f^*_4(s_3 - x_3) + \tfrac{2}{3}\, f^*_4(s_3 + x_3),$$

where $f^*_4(s_4) = 0$ if $s_4 < 5$ and $f^*_4(s_4) = 1$ if $s_4 \ge 5$.

s3     f3*(s3)   x3*
0      0         -
1      0         -
2      0         -
3      2/3       2 (or more)
4      2/3       1 (or more)
≥ 5    1         0 (or any bet ≤ s3 - 5)
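As a worked check of the stage-3 values, take $s_3 = 3$ and apply the recursion with the boundary condition:

$$
\begin{aligned}
f_3(3, 1) &= \tfrac{1}{3}\, f^*_4(2) + \tfrac{2}{3}\, f^*_4(4) = \tfrac{1}{3}(0) + \tfrac{2}{3}(0) = 0, \\
f_3(3, 2) &= \tfrac{1}{3}\, f^*_4(1) + \tfrac{2}{3}\, f^*_4(5) = \tfrac{1}{3}(0) + \tfrac{2}{3}(1) = \tfrac{2}{3}.
\end{aligned}
$$

With 3 chips at the last bet, at least 2 chips must be wagered to have any chance of reaching 5, which is why the optimal bet for $s_3 = 3$ is 2 or more.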
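n=2

$$f_2(s_2, x_2) = \tfrac{1}{3}\, f^*_3(s_2 - x_2) + \tfrac{2}{3}\, f^*_3(s_2 + x_2)$$

s2     x2=0   x2=1   x2=2   x2=3   x2=4   f2*(s2)   x2*
0      0                                  0         -
1      0      0                           0         -
2      0      4/9    4/9                  4/9       1 or 2
3      2/3    4/9    2/3    2/3           2/3       0, 2 or 3
4      2/3    8/9    2/3    2/3    2/3    8/9       1
≥ 5    1                                  1         0 (or any bet ≤ s2 - 5)

Blank cells are infeasible bets ($x_2 > s_2$); for $s_2 \ge 5$ only $x_2 = 0$ is listed.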
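To see where the stage-2 entries come from, here is the row for $s_2 = 4$ worked out from the recursion, using the stage-3 values $f^*_3(3) = f^*_3(4) = \tfrac{2}{3}$ and $f^*_3(5) = 1$:

$$
\begin{aligned}
f_2(4, 0) &= f^*_3(4) = \tfrac{2}{3}, \\
f_2(4, 1) &= \tfrac{1}{3}\, f^*_3(3) + \tfrac{2}{3}\, f^*_3(5) = \tfrac{1}{3} \cdot \tfrac{2}{3} + \tfrac{2}{3} \cdot 1 = \tfrac{8}{9}.
\end{aligned}
$$

Betting a single chip is best here because a loss still leaves 3 chips, from which the target remains reachable at the last bet.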
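n=1

$$f_1(s_1, x_1) = \tfrac{1}{3}\, f^*_2(s_1 - x_1) + \tfrac{2}{3}\, f^*_2(s_1 + x_1)$$

s1     x1=0   x1=1    x1=2   x1=3   f1*(s1)   x1*
3      2/3    20/27   2/3    2/3    20/27     1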
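One way to sanity-check the result is to follow a single policy forward and add up the probability of every path that ends with at least 5 chips. The sketch below does this exactly with fractions. The `POLICY` table is one choice consistent with the recursion (bet 1 chip at stages 1 and 2; at stage 3 bet 2 chips with 3 in hand and nothing otherwise), and the names `POLICY` and `success_probability` are hypothetical.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)  # assumed from the 2/3 coefficient in the recursion
TARGET = 5              # chips needed after the last bet

# One optimal policy (an assumption where ties exist): POLICY[stage][chips] -> bet.
POLICY = {
    1: {3: 1},
    2: {2: 1, 4: 1},
    3: {1: 0, 3: 2, 5: 0},
}


def success_probability(start_chips):
    """Propagate the chip distribution forward under POLICY."""
    dist = {start_chips: Fraction(1)}
    for stage in sorted(POLICY):
        nxt = {}
        for chips, prob in dist.items():
            bet = POLICY[stage][chips]
            # Win: gain the bet with probability 2/3; lose: forfeit it with 1/3.
            for new_chips, p in ((chips + bet, P_WIN), (chips - bet, 1 - P_WIN)):
                nxt[new_chips] = nxt.get(new_chips, Fraction(0)) + prob * p
        dist = nxt
    return sum(prob for chips, prob in dist.items() if chips >= TARGET)
```

Under these assumptions, `success_probability(3)` returns `Fraction(20, 27)`, matching $f^*_1(3)$ in the stage-1 table.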
Solution
(contd.)