Probabilistic Dynamic Programming (Stochastic Dynamic Programming)

PROBABILISTIC DYNAMIC PROGRAMMING
Neal Cristian S. Perlas

Probabilistic Dynamic Programming
(Stochastic Dynamic Programming)
• What does stochastic mean? It means having a random probability distribution or pattern
that may be analyzed statistically but may not be predicted precisely.
• Uncertainty is involved
• A given input can lead to different outputs
• Uses backward recursion (the backward pass rule)
• Has three basic elements:
a. Stages
b. State
c. Objective
Because many problems in the field of Operations Research deal with future planning, and
many future events are hard to predict with certainty, it is not hard to imagine the
importance of stochastic dynamic programming (SDP) and related techniques. According to
Bellman and Dreyfus [5], the stochastic case is always the actual situation.
Problem:
An enterprising young statistician believes that she has
developed a system for winning a popular Las Vegas
game. Her colleagues do not believe that her system
works, so they have made a large bet with her that if she
starts with three chips, she will not have at least five chips
after three plays of the game. Each play of the game
involves betting any desired number of available chips
and then either winning or losing this number of chips. The
statistician believes that her system will give her a
probability of 2/3 of winning a given play of the game.
n (Stage) = nth play of the game (n = 1, 2, 3)
$X_n$ (Decision) = number of chips to bet at stage n
$S_n$ (State) = number of chips in hand to begin stage n
Objective = to win the bet against her colleagues (to have
at least 5 chips after three plays of the game)
$f_n(S_n, X_n)$ = probability of finishing three plays with at least
five chips, given that she begins stage n with $S_n$ chips, bets $X_n$
chips, and plays optimally afterward
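If she wins (probability 2/3), the state at the next stage will be $S_{n+1} = S_n + X_n$.
If she loses (probability 1 – 2/3 = 1/3), the state at the next stage will be $S_{n+1} = S_n - X_n$.
The recursion is therefore
$$f_n(S_n, X_n) = \frac{1}{3} f_{n+1}^*(S_n - X_n) + \frac{2}{3} f_{n+1}^*(S_n + X_n)$$
where $f_n^*(S_n) = \max_{X_n = 0, 1, \dots, S_n} f_n(S_n, X_n)$, and $f_4^*(S_4) = 1$ if $S_4 \ge 5$ and $0$ otherwise.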
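The recursion can be checked numerically. The following is a minimal sketch, not part of the original slides: it evaluates the same recursion with memoized (top-down) calls rather than filling the tables by hand, and it uses exact fractions. The names `f`, `f_star`, `P_WIN`, `N_PLAYS`, and `TARGET` are illustrative assumptions.

```python
from fractions import Fraction
from functools import lru_cache

P_WIN = Fraction(2, 3)   # probability of winning one play (from the problem)
N_PLAYS = 3              # number of plays of the game
TARGET = 5               # chips needed after the last play


@lru_cache(maxsize=None)
def f_star(n, s):
    """Best achievable probability of finishing with >= TARGET chips,
    when play n begins with s chips in hand."""
    if n > N_PLAYS:
        # Boundary condition: all plays are done, so the bet is won or lost.
        return Fraction(1) if s >= TARGET else Fraction(0)
    # Maximize over every feasible bet x = 0, 1, ..., s.
    return max(f(n, s, x) for x in range(s + 1))


def f(n, s, x):
    """Success probability if she bets x chips at play n, then plays optimally."""
    return (1 - P_WIN) * f_star(n + 1, s - x) + P_WIN * f_star(n + 1, s + x)


print(f_star(1, 3))  # 20/27
```

Calling `f(2, 4, 1)` gives 8/9, the value of betting one chip at the second play with four chips in hand.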
n=3

S3     f*3(S3)    X*3
0      0          -
1      0          -
2      0          -
3      2/3        2 (or more)
4      2/3        1 (or more)
≥5     1          0 (or ≤ S3 – 5)
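As a rough check of the n=3 table, the sketch below evaluates every feasible last bet for a given number of chips and lists the bets that achieve the best probability. The helper names `terminal` and `stage3_row` are assumptions made for illustration. For states where winning is impossible (0, 1, or 2 chips), every bet ties at probability 0, which the table marks with "-".

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)   # probability of winning one play
TARGET = 5               # chips needed after the third play


def terminal(s):
    """1 if she finishes with at least TARGET chips, else 0."""
    return Fraction(1) if s >= TARGET else Fraction(0)


def stage3_row(s3):
    """Return (f3*(s3), list of optimal bets) for s3 chips at the last play."""
    values = {
        x: (1 - P_WIN) * terminal(s3 - x) + P_WIN * terminal(s3 + x)
        for x in range(s3 + 1)
    }
    best = max(values.values())
    return best, [x for x, v in values.items() if v == best]


for s3 in range(7):
    best, bets = stage3_row(s3)
    print(s3, best, bets)
```

With 3 chips the optimal bets come out as 2 or 3, and with 4 chips as 1 through 4, matching the "2 (or more)" and "1 (or more)" entries.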
n=2
$$f_2(S_2, X_2) = \frac{1}{3} f_3^*(S_2 - X_2) + \frac{2}{3} f_3^*(S_2 + X_2)$$

       f2(S2,X2) for X2 =
S2     0      1      2      3      4      f*2(S2)   X*2
0      0                                  0         -
1      0      0                           0         -

$f_2(0,0) = \frac{1}{3} f_3^*(0-0) + \frac{2}{3} f_3^*(0+0) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(0) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$
$f_2(1,0) = \frac{1}{3} f_3^*(1-0) + \frac{2}{3} f_3^*(1+0) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(1) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$
$f_2(1,1) = \frac{1}{3} f_3^*(1-1) + \frac{2}{3} f_3^*(1+1) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(2) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$
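n=2
$$f_2(S_2, X_2) = \frac{1}{3} f_3^*(S_2 - X_2) + \frac{2}{3} f_3^*(S_2 + X_2)$$

       f2(S2,X2) for X2 =
S2     0      1      2      3      4      f*2(S2)   X*2
0      0                                  0         -
1      0      0                           0         -
2      0      4/9    4/9                  4/9       1 or 2

$f_2(2,0) = \frac{1}{3} f_3^*(2-0) + \frac{2}{3} f_3^*(2+0) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(2) = \frac{1}{3}(0) + \frac{2}{3}(0) = 0$
$f_2(2,1) = \frac{1}{3} f_3^*(2-1) + \frac{2}{3} f_3^*(2+1) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(3) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$
$f_2(2,2) = \frac{1}{3} f_3^*(2-2) + \frac{2}{3} f_3^*(2+2) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(4) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$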
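n=2
$$f_2(S_2, X_2) = \frac{1}{3} f_3^*(S_2 - X_2) + \frac{2}{3} f_3^*(S_2 + X_2)$$

       f2(S2,X2) for X2 =
S2     0      1      2      3      4      f*2(S2)   X*2
0      0                                  0         -
1      0      0                           0         -
2      0      4/9    4/9                  4/9       1 or 2
3      2/3    4/9    2/3    2/3           2/3       0, 2 or 3

$f_2(3,0) = \frac{1}{3} f_3^*(3-0) + \frac{2}{3} f_3^*(3+0) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(3) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$
$f_2(3,1) = \frac{1}{3} f_3^*(3-1) + \frac{2}{3} f_3^*(3+1) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(4) = \frac{1}{3}(0) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{4}{9}$
$f_2(3,2) = \frac{1}{3} f_3^*(3-2) + \frac{2}{3} f_3^*(3+2) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(5) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$
$f_2(3,3) = \frac{1}{3} f_3^*(3-3) + \frac{2}{3} f_3^*(3+3) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$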
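n=2
$$f_2(S_2, X_2) = \frac{1}{3} f_3^*(S_2 - X_2) + \frac{2}{3} f_3^*(S_2 + X_2)$$

       f2(S2,X2) for X2 =
S2     0      1      2      3      4      f*2(S2)   X*2
0      0                                  0         -
1      0      0                           0         -
2      0      4/9    4/9                  4/9       1 or 2
3      2/3    4/9    2/3    2/3           2/3       0, 2 or 3
4      2/3    8/9    2/3    2/3    2/3    8/9       1
≥5     1                                  1         0 (or ≤ S2 – 5)

$f_2(4,0) = \frac{1}{3} f_3^*(4-0) + \frac{2}{3} f_3^*(4+0) = \frac{1}{3} f_3^*(4) + \frac{2}{3} f_3^*(4) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$
$f_2(4,1) = \frac{1}{3} f_3^*(4-1) + \frac{2}{3} f_3^*(4+1) = \frac{1}{3} f_3^*(3) + \frac{2}{3} f_3^*(5) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}(1) = \frac{8}{9}$
$f_2(4,2) = \frac{1}{3} f_3^*(4-2) + \frac{2}{3} f_3^*(4+2) = \frac{1}{3} f_3^*(2) + \frac{2}{3} f_3^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$
$f_2(4,3) = \frac{1}{3} f_3^*(4-3) + \frac{2}{3} f_3^*(4+3) = \frac{1}{3} f_3^*(1) + \frac{2}{3} f_3^*(7) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$
$f_2(4,4) = \frac{1}{3} f_3^*(4-4) + \frac{2}{3} f_3^*(4+4) = \frac{1}{3} f_3^*(0) + \frac{2}{3} f_3^*(8) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$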
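The rows of the n=2 table can be reproduced with a few lines of code. This is a hedged sketch only: the dictionary `F3_STAR` copies the n=3 results, and the helper names `f3_star` and `stage2_row` are assumptions introduced for illustration.

```python
from fractions import Fraction

P_WIN = Fraction(2, 3)   # probability of winning one play

# f3*(S3) copied from the n=3 table; any S3 >= 5 has value 1.
F3_STAR = {
    0: Fraction(0),
    1: Fraction(0),
    2: Fraction(0),
    3: Fraction(2, 3),
    4: Fraction(2, 3),
}


def f3_star(s3):
    """Look up f3*(s3); states with five or more chips already win."""
    return F3_STAR.get(s3, Fraction(1))


def stage2_row(s2):
    """Values f2(s2, x2) for every feasible bet x2 = 0, 1, ..., s2."""
    return [
        (1 - P_WIN) * f3_star(s2 - x) + P_WIN * f3_star(s2 + x)
        for x in range(s2 + 1)
    ]


for s2 in range(5):
    row = stage2_row(s2)
    print(s2, [str(v) for v in row], "max =", max(row))
```

The largest entry in each row is f*2(S2), and the positions where it occurs are the optimal bets X*2.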
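n=1
$$f_1(S_1, X_1) = \frac{1}{3} f_2^*(S_1 - X_1) + \frac{2}{3} f_2^*(S_1 + X_1)$$

       f1(S1,X1) for X1 =
S1     0      1       2      3      f*1(S1)   X*1
3      2/3    20/27   2/3    2/3    20/27     1

$f_1(3,0) = \frac{1}{3} f_2^*(3-0) + \frac{2}{3} f_2^*(3+0) = \frac{1}{3} f_2^*(3) + \frac{2}{3} f_2^*(3) = \frac{1}{3}\left(\frac{2}{3}\right) + \frac{2}{3}\left(\frac{2}{3}\right) = \frac{2}{3}$
$f_1(3,1) = \frac{1}{3} f_2^*(3-1) + \frac{2}{3} f_2^*(3+1) = \frac{1}{3} f_2^*(2) + \frac{2}{3} f_2^*(4) = \frac{1}{3}\left(\frac{4}{9}\right) + \frac{2}{3}\left(\frac{8}{9}\right) = \frac{20}{27}$
$f_1(3,2) = \frac{1}{3} f_2^*(3-2) + \frac{2}{3} f_2^*(3+2) = \frac{1}{3} f_2^*(1) + \frac{2}{3} f_2^*(5) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$
$f_1(3,3) = \frac{1}{3} f_2^*(3-3) + \frac{2}{3} f_2^*(3+3) = \frac{1}{3} f_2^*(0) + \frac{2}{3} f_2^*(6) = \frac{1}{3}(0) + \frac{2}{3}(1) = \frac{2}{3}$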
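Reading the optimal decisions off the tables: bet 1 chip on the first play. If she wins
(S2 = 4), bet 1 chip on the second play, then bet 0 if she wins again (S3 = 5) or 2 or 3
chips if she loses (S3 = 3). If she loses the first play (S2 = 2), bet 1 or 2 chips on the
second play; after a win she bets enough on the third play to reach five chips, and after
a loss the bet is lost.
This policy gives the statistician a probability of 20/27 of winning her bet with her colleagues.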
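The 20/27 figure can also be confirmed without the recursion, by fixing one optimal policy and summing the probabilities of the win/lose sequences that end with at least five chips. The sketch below does this under stated assumptions: the function names `bet` and `win_probability` are illustrative, and ties among optimal bets are broken by taking the smallest bet.

```python
from fractions import Fraction
from itertools import product

P_WIN = Fraction(2, 3)   # probability of winning one play


def bet(n, s):
    """One optimal policy read off the tables (ties broken by the smallest bet)."""
    if n == 1:
        return 1
    if n == 2:
        # Starting from 3 chips and betting 1, only S2 = 2 or S2 = 4 can occur.
        return {2: 1, 4: 1}.get(s, 0)
    # n == 3: bet exactly what is needed to reach five chips, if affordable.
    need = 5 - s
    return need if 0 < need <= s else 0


def win_probability(start=3, plays=3, target=5):
    """Sum the probabilities of all outcome sequences that reach the target."""
    total = Fraction(0)
    for outcome in product([True, False], repeat=plays):  # True = she wins the play
        s, prob = start, Fraction(1)
        for n, won in enumerate(outcome, start=1):
            x = bet(n, s)
            s = s + x if won else s - x
            prob *= P_WIN if won else 1 - P_WIN
        if s >= target:
            total += prob
    return total


print(win_probability())  # 20/27
```

Four of the eight sequences succeed (win-win-win, win-win-lose, win-lose-win, lose-win-win), with probabilities 8/27, 4/27, 4/27, and 4/27.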
Thank you!
