
Dynamic Programming
• It is a useful mathematical technique for making a sequence of interrelated decisions.
• It provides a systematic procedure for determining the optimal combination of decisions.
• There is no standard mathematical formulation of "the" dynamic programming problem.
• Knowing when to apply dynamic programming depends largely on experience with its general structure.

Prototype example: the stagecoach problem

• A fortune seeker wants to travel from Missouri to California in the mid-19th century.
• The possible routes are shown in the figure on the next page.
• The trip has 4 stages. A is Missouri and J is California.
• The cost of each leg is the life-insurance premium for that leg; the lowest-cost route is therefore the safest trip.

(Figure: road system and costs.)


Costs

• The cost cij of going from state i to state j is given by the cost tables in the figure.
• Problem: which route minimizes the total cost of the policy?

Solving the problem

• Note that a greedy approach does not work:
  • The greedy solution A → B → F → I → J has a total cost of 13.
  • However, e.g. A → D → F is cheaper than A → B → F.
• Another possibility is trial-and-error, but that requires too much effort even for this simple problem.
• Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
• It usually starts from the last stage of the problem and enlarges the problem one stage at a time.

Formulation

• Decision variables xn (n = 1, 2, 3, 4) are the immediate destination at stage n.
• The route is A → x1 → x2 → x3 → x4, where x4 = J.
• The total cost of the best overall policy for the remaining stages is fn(s, xn), given that the current state is s, ready to start stage n, and xn is selected as the immediate destination:

    fn(s, xn) = immediate cost (stage n) + minimum future cost (stages n+1 onward)
              = csxn + f*n+1(xn)

  where the value of csxn is given by cij with i = s and j = xn.
• xn* denotes the value of xn that minimizes fn(s, xn), and fn*(s) is that minimum value:

    fn*(s) = min over xn of fn(s, xn) = fn(s, xn*)

• Objective: find f1*(A) and the corresponding route.
• Dynamic programming finds successively f4*(s), f3*(s), f2*(s) and finally f1*(A).
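To make the recursion concrete, here is a minimal Python sketch of this backward procedure. The cost table is an assumption: it is the standard textbook instance of the stagecoach problem, chosen because it reproduces the values quoted on these slides (f4*(H) = 3, f3*(F) = 7, f1*(A) = 11); the function names are illustrative only.

```python
# Backward dynamic programming for the stagecoach problem.
# The cost table is the standard textbook instance (assumed here), consistent
# with the numbers quoted on these slides.

cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

def solve(costs, destination='J'):
    """Return f*(s) and the optimal next state x*(s) for every state s."""
    f = {destination: 0}      # f*(J) = 0: no further cost once we arrive
    best = {}
    # The dict lists the states stage by stage, so reversing its insertion
    # order processes stages 4, 3, 2, 1 (i.e. backward).
    for s in reversed(list(costs)):
        # f*(s) = min over xn of { c(s, xn) + f*(xn) }
        xn, value = min(((x, c + f[x]) for x, c in costs[s].items()),
                        key=lambda pair: pair[1])
        f[s], best[s] = value, xn
    return f, best

f, best = solve(cost)
print(f['A'])                 # 11, matching f1*(A) on the slides

# Recover one optimal route by following x* forward from A.
route, s = ['A'], 'A'
while s != 'J':
    s = best[s]
    route.append(s)
print(' -> '.join(route))     # e.g. A -> C -> E -> H -> J
```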

Solution procedure

• When n = 4, the route is determined entirely by the current state s (H or I) and the final destination J.
• Since f4*(s) = f4(s, J) = csJ, the solution for n = 4 is:

    s    f4*(s)    x4*
    H      3        J
    I      4        J

Stage n = 3

• This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs cF,H = 6 or cF,I = 3.
• Choosing H, the minimum additional cost from H onward is f4*(H) = 3, so the total cost is 6 + 3 = 9.
• Choosing I, the total cost is 3 + 4 = 7. This is smaller, so I is the optimal choice for state F.

Stage n = 3

• Similar calculations can be made for the two remaining states, s = E and s = G, resulting in the table for n = 3, where f3(s, x3) = csx3 + f4*(x3):

    s    f3*(s)    x3*
    E      4        H
    F      7        I
    G      6        H

Stage n = 2

• In this case, f2(s, x2) = csx2 + f3*(x2). Example for node C:
  • x2 = E: f2(C, E) = cC,E + f3*(E) = 3 + 4 = 7 (optimal)
  • x2 = F: f2(C, F) = cC,F + f3*(F) = 2 + 7 = 9
  • x2 = G: f2(C, G) = cC,G + f3*(G) = 4 + 6 = 10

Stage n = 2

• Similar calculations can be made for the two remaining states, s = B and s = D, resulting in the table for n = 2, where f2(s, x2) = csx2 + f3*(x2):

    s    x2 = E    x2 = F    x2 = G    f2*(s)    x2*
    B      11        11        12        11      E or F
    C       7         9        10         7      E
    D       8         8        11         8      E or F

Stage n = 1

• There is just one possible starting state: A.

• The calculations for each x1 are:
  • x1 = B: f1(A, B) = cA,B + f2*(B) = 2 + 11 = 13
  • x1 = C: f1(A, C) = cA,C + f2*(C) = 4 + 7 = 11 (optimal)
  • x1 = D: f1(A, D) = cA,D + f2*(D) = 3 + 8 = 11 (optimal)
• This results in the table for n = 1:

    s    x1 = B    x1 = C    x1 = D    f1*(s)    x1*
    A      13        11        11        11      C or D

Optimal solution

• There are three optimal solutions, all with total cost f1*(A) = 11: A → C → E → H → J, A → D → E → H → J and A → D → F → I → J.
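As a small illustration (not from the slides), the x*-columns of the tables can be read as a policy: following every tied choice enumerates all optimal routes. The dictionary below is just the x*-sets read off the stage tables above, restricted to states reachable under an optimal policy.

```python
# Enumerate all optimal routes from the per-stage x*-tables.
best_next = {
    'A': ['C', 'D'],   # x1* = C or D
    'C': ['E'],        # x2*(C) = E
    'D': ['E', 'F'],   # x2*(D) = E or F
    'E': ['H'],        # x3*(E) = H
    'F': ['I'],        # x3*(F) = I
    'H': ['J'],        # x4* = J
    'I': ['J'],
}

def optimal_routes(state='A'):
    """Yield every optimal route from `state` to J as a list of nodes."""
    if state == 'J':
        yield ['J']
        return
    for nxt in best_next[state]:
        for tail in optimal_routes(nxt):
            yield [state] + tail

for route in optimal_routes():
    print(' -> '.join(route))
# A -> C -> E -> H -> J
# A -> D -> E -> H -> J
# A -> D -> F -> I -> J
```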

Characteristics of DP

1. The problem can be divided into stages, with a policy decision required at each stage.
   • Example: 4 stages, with a life-insurance policy to choose at each one.
   • Dynamic programming problems require making a sequence of interrelated decisions.

2. Each stage has a number of states associated with the beginning of that stage.
   • Example: the states are the possible territories where the fortune seeker could be located.
   • States are the possible conditions in which the system might be.

Characteristics of DP

3. The policy decision at each stage transforms the current state into a state associated with the beginning of the next stage.
   • Example: the fortune seeker's decision led him from his current state to the next state on his journey.
   • DP problems can be interpreted in terms of networks: each node corresponds to a state.
   • The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
   • In most cases, the objective corresponds to finding the shortest or the longest path.

Characteristics of DP

4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.
   • Example: the solution procedure constructed a table for each stage n that prescribed the optimal decision xn* for each possible state s.
   • In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.

Characteristics of DP

5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
   • The optimal immediate decision depends only on the current state, not on how it was obtained: this is the principle of optimality for DP.
   • Example: at any state, the best insurance policy from there onward is independent of how the fortune seeker got there.
   • Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.

Characteristics of DP

6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship is available that identifies the optimal policy for stage n, given the optimal policy for stage n + 1:

    fn*(sn) = max over xn of { fn(sn, xn) }   or   fn*(sn) = min over xn of { fn(sn, xn) },

   where fn(sn, xn) is written in terms of sn, xn, f*n+1(sn+1), and probably some measure of the immediate contribution of xn to the objective function.

Characteristics of DP

7. (cont.) Notation:
    N = number of stages.
    n = label for the current stage (n = 1, 2, ..., N).
    sn = current state for stage n.
    xn = decision variable for stage n.
    xn* = optimal value of xn (given sn).
    fn(sn, xn) = contribution of stages n, n + 1, ..., N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
    fn*(sn) = fn(sn, xn*).
   • Example: for the stagecoach problem, the recursive relationship was fn*(s) = min over xn of { csxn + f*n+1(xn) }.
   • The recursive relationship differs somewhat among dynamic programming problems.

Characteristics of DP

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
   • It stops when the optimal policy starting at the initial stage is found.
   • At that point, the optimal policy for the entire problem has been found.
   • Example: the tables built for the stages of the stagecoach problem illustrate this procedure.

Characteristics of DP

8. (cont.) For DP problems, a table such as the following would be obtained for each stage (n = N, N-1, ..., 1):

    sn    fn(sn, xn) for each xn    fn*(sn)    xn*

Deterministic dynamic programming

• Deterministic problems: the state at the next stage is completely determined by the state and the policy decision at the current stage.
• Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.
• Set of states: may be discrete or continuous, or a state vector. The decision variables can also be discrete or continuous.
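Points 6-8 translate directly into a small backward-recursion routine. The sketch below is illustrative only (none of these names come from the slides); it assumes finite, discrete state and decision sets and an additive objective to be minimized.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

def backward_dp(
    N: int,
    states: Callable[[int], Iterable[Hashable]],               # states(n) -> possible sn
    decisions: Callable[[int, Hashable], Iterable[Hashable]],  # decisions(n, sn) -> feasible xn
    contribution: Callable[[int, Hashable, Hashable], float],  # immediate contribution of xn
    transition: Callable[[int, Hashable, Hashable], Hashable], # next state s_{n+1}
    terminal_value: Callable[[Hashable], float] = lambda s: 0.0,
) -> Dict[int, Dict[Hashable, Tuple[float, Hashable]]]:
    """Return, for each stage n, a table {sn: (fn*(sn), xn*)}."""
    tables: Dict[int, Dict[Hashable, Tuple[float, Hashable]]] = {}
    # f*_{N+1} is given by the terminal values (0 if the problem simply ends);
    # the caller must therefore also define states(N + 1).
    f_next = {s: terminal_value(s) for s in states(N + 1)}
    for n in range(N, 0, -1):                 # n = N, N-1, ..., 1
        table = {}
        for sn in states(n):
            # fn*(sn) = min over xn of { contribution(n, sn, xn) + f*_{n+1}(s_{n+1}) }
            best_x, best_f = None, float("inf")
            for xn in decisions(n, sn):
                f = contribution(n, sn, xn) + f_next[transition(n, sn, xn)]
                if f < best_f:
                    best_x, best_f = xn, f
            table[sn] = (best_f, best_x)
        tables[n] = table
        f_next = {sn: f for sn, (f, _) in table.items()}
    return tables
```

For the stagecoach problem, for example, states(1) = {A}, states(5) = {J}, decisions(n, s) are the arcs leaving s, contribution(n, s, x) = csx and transition(n, s, x) = x.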

Example: distributing medical teams

• The World Health Council has five medical teams to allocate to three underdeveloped countries.
• Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.

    Thousands of additional person-years of life
    Medical teams    Country 1    Country 2    Country 3
         0                0            0            0
         1               45           20           50
         2               70           45           70
         3               90           75           80
         4              105          110          100
         5              120          150          130

Formulation of the problem

• The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (the stages).
• xn is the number of teams to allocate to country (stage) n.

States to be considered

• What are the states? What changes from one stage to another?
• sn = number of medical teams still available for the remaining countries (n, ..., 3).
• Thus: s1 = 5, s2 = 5 - x1 = s1 - x1, s3 = s2 - x2.

Overall problem

• pi(xi): measure of performance from allocating xi medical teams to country i (see the table above).

    Maximize    p1(x1) + p2(x2) + p3(x3),
    subject to  x1 + x2 + x3 = 5,
    with the xi nonnegative integers.

Policy

• The recursive relationship relating these functions is

    fn*(sn) = max over xn = 0, 1, ..., sn of { pn(xn) + f*n+1(sn - xn) },   for n = 1, 2,

  while for the last stage, n = 3,

    f3*(s3) = max over x3 = 0, 1, ..., s3 of p3(x3).

Solution procedure, stage n = 3

• For the last stage, n = 3, the values of p3(x3) are the last column of the table. Here x3* = s3 and f3*(s3) = p3(s3):

    n = 3:
        s3    f3*(s3)    x3*
         0       0        0
         1      50        1
         2      70        2
         3      80        3
         4     100        4
         5     130        5

Stage n = 2

• Here, finding x2* requires calculating f2(s2, x2) = p2(x2) + f3*(s2 - x2) for the values x2 = 0, 1, ..., s2.
• Example for state s2 = 2:
  • x2 = 0: f2(2, 0) = p2(0) + f3*(2) = 0 + 70 = 70 (optimal)
  • x2 = 1: f2(2, 1) = p2(1) + f3*(1) = 20 + 50 = 70 (optimal)
  • x2 = 2: f2(2, 2) = p2(2) + f3*(0) = 45 + 0 = 45

Stage n = 2

• Similar calculations can be made for the other values of s2, resulting in the table for n = 2:

    n = 2:          f2(s2, x2) = p2(x2) + f3*(s2 - x2)
        s2    x2=0   x2=1   x2=2   x2=3   x2=4   x2=5    f2*(s2)    x2*
         0      0      -      -      -      -      -        0        0
         1     50     20      -      -      -      -       50        0
         2     70     70     45      -      -      -       70      0 or 1
         3     80     90     95     75      -      -       95        2
         4    100    100    115    125    110      -      125        3
         5    130    120    125    145    160    150      160        4

Stage n = 1

• The only state to consider is the starting state s1 = 5:

    n = 1:          f1(s1, x1) = p1(x1) + f2*(s1 - x1)
        s1    x1=0   x1=1   x1=2   x1=3   x1=4   x1=5    f1*(s1)    x1*
         5    160    170    165    160    155    120       170        1

Optimal policy decision

• Tracing the tables forward: x1* = 1, so s2 = 5 - 1 = 4; x2*(4) = 3, so s3 = 4 - 3 = 1; and x3*(1) = 1.
• The optimal policy is therefore to allocate 1 team to country 1, 3 teams to country 2 and 1 team to country 3, giving f1*(5) = 170 thousand additional person-years of life.
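As a cross-check, here is a short sketch of this backward recursion in code. The p-values are the person-years table from these slides (in thousands); the variable names and structure are just one possible implementation, not taken from the slides.

```python
# Backward recursion for the World Health Council example.
p = {  # p[i][x] = thousands of additional person-years from giving x teams to country i
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
TEAMS = 5

# f[n][s] = (fn*(s), xn*): best value and decision with s teams left at stage n.
f = {4: {s: (0, None) for s in range(TEAMS + 1)}}   # nothing left to allocate after stage 3
for n in (3, 2, 1):
    f[n] = {}
    for s in range(TEAMS + 1):
        # fn*(s) = max over xn = 0..s of { pn(xn) + f*_{n+1}(s - xn) }
        f[n][s] = max(((p[n][x] + f[n + 1][s - x][0], x) for x in range(s + 1)),
                      key=lambda t: t[0])

# Recover the optimal allocation by moving forward through the stages.
s, allocation = TEAMS, []
for n in (1, 2, 3):
    _, x = f[n][s]
    allocation.append(x)
    s -= x

print(f[1][TEAMS][0], allocation)   # 170 [1, 3, 1], matching the slides
```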

Distribution of effort problem

• One kind of resource is allocated to a number of activities. Objective: how to distribute the effort (resource) among the activities most effectively.
• DP involves only one (or a few) resources, while LP can deal with thousands of resources.
• The LP assumptions of proportionality, divisibility and certainty can be violated in DP. Only additivity (or its analogue for a product of terms) is necessary, because of the principle of optimality.
  • The World Health Council problem violates proportionality and divisibility (only integer allocations are allowed).

Formulation of distribution of effort

• Stage n = activity n (n = 1, 2, ..., N).
• xn = amount of resource allocated to activity n.
• State sn = amount of resource still available for allocation to the remaining activities (n, ..., N).
• When the system starts at stage n in state sn, the choice xn results in the next state at stage n + 1 being sn+1 = sn - xn:

    Stage:     n              n + 1
    State:     sn   --xn-->   sn - xn

Example: distributing scientists to research teams

• Three research teams are working on an engineering problem needed to safely fly people to Mars.
• Two extra scientists are available; assigning new scientists to a team reduces its probability of failure.

    Probability of failure
    New scientists    Team 1    Team 2    Team 3
          0            0.40      0.60      0.80
          1            0.20      0.40      0.50
          2            0.15      0.20      0.30
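This example has a multiplicative objective. Assuming the goal is to minimize the product of the three teams' failure probabilities (the usual reading of this distribution-of-effort example; the slides only give the table), a minimal sketch is:

```python
# Backward recursion with a multiplicative objective:
# fn*(s) = min over x = 0..s of { qn(x) * f*_{n+1}(s - x) }, with f4*(s) = 1.
q = {  # q[i][x] = probability of failure of team i if it gets x new scientists
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
SCIENTISTS = 2

f = {4: {s: (1.0, None) for s in range(SCIENTISTS + 1)}}
for n in (3, 2, 1):
    f[n] = {s: min(((q[n][x] * f[n + 1][s - x][0], x) for x in range(s + 1)),
                   key=lambda t: t[0])
            for s in range(SCIENTISTS + 1)}

# Recover the allocation stage by stage.
s, plan = SCIENTISTS, []
for n in (1, 2, 3):
    _, x = f[n][s]
    plan.append(x)
    s -= x

print(round(f[1][SCIENTISTS][0], 3), plan)   # 0.06 [1, 0, 1]
```

Under this assumption the best use of the two extra scientists is one for team 1 and one for team 3, giving an overall failure probability of 0.20 × 0.60 × 0.50 = 0.06.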
