
Dynamic Programming
• It is a useful mathematical technique for making a sequence of interrelated decisions.
• It provides a systematic procedure for determining the optimal combination of decisions.
• There is no standard mathematical formulation of "the" dynamic programming problem.
• Knowing when to apply dynamic programming depends largely on experience with its general structure.

Prototype example: the stagecoach problem

• A fortune seeker wants to travel from Missouri to California in the mid-19th century.
• The possible routes are shown in the figure on the next page.
• The trip has 4 stages. A is Missouri and J is California.
• The cost of each leg is the life-insurance premium for that leg; the lowest-cost route is therefore the safest trip.

(Figure: road system and costs.)


Costs

• The cost cij of going from state i to state j is given by the cost tables in the figure.
• Problem: which route minimizes the total cost of the policy?

Solving the problem

• Note that a greedy approach does not work:
  • The greedy solution A → B → F → I → J has a total cost of 13.
  • However, e.g. A → D → F is cheaper than A → B → F.
• Another possibility is trial-and-error, but that requires too much effort even for this simple problem.
• Dynamic programming is much more efficient than exhaustive enumeration, especially for large problems.
• It usually starts from the last stage of the problem and enlarges the problem one stage at a time.

Formulation

• Decision variables xn (n = 1, 2, 3, 4) are the immediate destination at stage n.
• The route is A → x1 → x2 → x3 → x4, where x4 = J.
• The total cost of the best overall policy for the remaining stages is fn(s, xn), given that the current state is s, ready to start stage n, and xn is selected as the immediate destination:

    fn(s, xn) = immediate cost (stage n) + minimum future cost (stages n+1 onward)
              = csxn + f*n+1(xn)

  where the value of csxn is given by cij with i = s and j = xn.
• xn* denotes the value of xn that minimizes fn(s, xn), and fn*(s) is that minimum value:

    fn*(s) = min over xn of fn(s, xn) = fn(s, xn*)

• Objective: find f1*(A) and the corresponding route.
• Dynamic programming finds successively f4*(s), f3*(s), f2*(s) and finally f1*(A).
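To make the recursion concrete, here is a minimal Python sketch of this backward procedure. The cost table is an assumption: it is the standard textbook instance of the stagecoach problem, chosen because it reproduces the values quoted on these slides (f4*(H) = 3, f3*(F) = 7, f1*(A) = 11); the function names are illustrative only.

```python
# Backward dynamic programming for the stagecoach problem.
# The cost table is the standard textbook instance (assumed here), consistent
# with the numbers quoted on these slides.

cost = {
    'A': {'B': 2, 'C': 4, 'D': 3},
    'B': {'E': 7, 'F': 4, 'G': 6},
    'C': {'E': 3, 'F': 2, 'G': 4},
    'D': {'E': 4, 'F': 1, 'G': 5},
    'E': {'H': 1, 'I': 4},
    'F': {'H': 6, 'I': 3},
    'G': {'H': 3, 'I': 3},
    'H': {'J': 3},
    'I': {'J': 4},
}

def solve(costs, destination='J'):
    """Return f*(s) and the optimal next state x*(s) for every state s."""
    f = {destination: 0}      # f*(J) = 0: no further cost once we arrive
    best = {}
    # The dict lists the states stage by stage, so reversing its insertion
    # order processes stages 4, 3, 2, 1 (i.e. backward).
    for s in reversed(list(costs)):
        # f*(s) = min over xn of { c(s, xn) + f*(xn) }
        xn, value = min(((x, c + f[x]) for x, c in costs[s].items()),
                        key=lambda pair: pair[1])
        f[s], best[s] = value, xn
    return f, best

f, best = solve(cost)
print(f['A'])                 # 11, matching f1*(A) on the slides

# Recover one optimal route by following x* forward from A.
route, s = ['A'], 'A'
while s != 'J':
    s = best[s]
    route.append(s)
print(' -> '.join(route))     # e.g. A -> C -> E -> H -> J
```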

Solution procedure

• When n = 4, the route is determined entirely by the current state s (H or I) and the final destination J.
• Since f4*(s) = f4(s, J) = csJ, the solution for n = 4 is:

    s    f4*(s)    x4*
    H      3        J
    I      4        J

Stage n = 3

• This stage needs a few calculations. If the fortune seeker is in state F, he can go to either H or I, with costs cF,H = 6 or cF,I = 3.
• Choosing H, the minimum additional cost from H onward is f4*(H) = 3, so the total cost is 6 + 3 = 9.
• Choosing I, the total cost is 3 + 4 = 7. This is smaller, so I is the optimal choice for state F.

Stage n = 3

• Similar calculations can be made for the two remaining states, s = E and s = G, resulting in the table for n = 3, where f3(s, x3) = csx3 + f4*(x3):

    s    f3*(s)    x3*
    E      4        H
    F      7        I
    G      6        H

Stage n = 2

• In this case, f2(s, x2) = csx2 + f3*(x2). Example for node C:
  • x2 = E: f2(C, E) = cC,E + f3*(E) = 3 + 4 = 7 (optimal)
  • x2 = F: f2(C, F) = cC,F + f3*(F) = 2 + 7 = 9
  • x2 = G: f2(C, G) = cC,G + f3*(G) = 4 + 6 = 10

Stage n = 2

• Similar calculations can be made for the two remaining states, s = B and s = D, resulting in the table for n = 2, where f2(s, x2) = csx2 + f3*(x2):

    s    x2 = E    x2 = F    x2 = G    f2*(s)    x2*
    B      11        11        12        11      E or F
    C       7         9        10         7      E
    D       8         8        11         8      E or F

Stage n = 1

• There is just one possible starting state: A.

• The calculations for each x1 are:
  • x1 = B: f1(A, B) = cA,B + f2*(B) = 2 + 11 = 13
  • x1 = C: f1(A, C) = cA,C + f2*(C) = 4 + 7 = 11 (optimal)
  • x1 = D: f1(A, D) = cA,D + f2*(D) = 3 + 8 = 11 (optimal)
• This results in the table for n = 1:

    s    x1 = B    x1 = C    x1 = D    f1*(s)    x1*
    A      13        11        11        11      C or D

Optimal solution

• There are three optimal solutions, all with total cost f1*(A) = 11: A → C → E → H → J, A → D → E → H → J and A → D → F → I → J.
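As a small illustration (not from the slides), the x*-columns of the tables can be read as a policy: following every tied choice enumerates all optimal routes. The dictionary below is just the x*-sets read off the stage tables above, restricted to states reachable under an optimal policy.

```python
# Enumerate all optimal routes from the per-stage x*-tables.
best_next = {
    'A': ['C', 'D'],   # x1* = C or D
    'C': ['E'],        # x2*(C) = E
    'D': ['E', 'F'],   # x2*(D) = E or F
    'E': ['H'],        # x3*(E) = H
    'F': ['I'],        # x3*(F) = I
    'H': ['J'],        # x4* = J
    'I': ['J'],
}

def optimal_routes(state='A'):
    """Yield every optimal route from `state` to J as a list of nodes."""
    if state == 'J':
        yield ['J']
        return
    for nxt in best_next[state]:
        for tail in optimal_routes(nxt):
            yield [state] + tail

for route in optimal_routes():
    print(' -> '.join(route))
# A -> C -> E -> H -> J
# A -> D -> E -> H -> J
# A -> D -> F -> I -> J
```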

Characteristics of DP

1. The problem can be divided into stages, with a policy decision required at each stage.
   • Example: 4 stages, with a life-insurance policy to choose at each one.
   • Dynamic programming problems require making a sequence of interrelated decisions.

2. Each stage has a number of states associated with the beginning of that stage.
   • Example: the states are the possible territories where the fortune seeker could be located.
   • States are the possible conditions in which the system might be.

Characteristics of DP

3. The policy decision at each stage transforms the current state into a state associated with the beginning of the next stage.
   • Example: the fortune seeker's decision led him from his current state to the next state on his journey.
   • DP problems can be interpreted in terms of networks: each node corresponds to a state.
   • The value assigned to each link is the immediate contribution to the objective function from making that policy decision.
   • In most cases, the objective corresponds to finding the shortest or the longest path.

Characteristics of DP

4. The solution procedure finds an optimal policy for the overall problem: a prescription of the optimal policy decision at each stage for each of the possible states.
   • Example: the solution procedure constructed a table for each stage n that prescribed the optimal decision xn* for each possible state s.
   • In addition to identifying optimal solutions, DP provides a policy prescription of what to do under every possible circumstance (which is why a decision is called a policy decision). This is useful for sensitivity analysis.

Characteristics of DP

5. Given the current state, an optimal policy for the remaining stages is independent of the policy decisions adopted in previous stages.
   • The optimal immediate decision depends only on the current state, not on how it was obtained: this is the principle of optimality for DP.
   • Example: at any state, the best insurance policy from there onward is independent of how the fortune seeker got there.
   • Knowledge of the current state conveys all the information necessary for determining the optimal policy henceforth (Markovian property). Problems lacking this property are not dynamic programming problems.

Characteristics of DP

6. The solution procedure begins by finding the optimal policy for the last stage. This solution is usually trivial.
7. A recursive relationship is available that identifies the optimal policy for stage n, given the optimal policy for stage n + 1:

    fn*(sn) = max over xn of { fn(sn, xn) }   or   fn*(sn) = min over xn of { fn(sn, xn) },

   where fn(sn, xn) is written in terms of sn, xn, f*n+1(sn+1), and probably some measure of the immediate contribution of xn to the objective function.

Characteristics of DP

7. (cont.) Notation:
    N = number of stages.
    n = label for the current stage (n = 1, 2, ..., N).
    sn = current state for stage n.
    xn = decision variable for stage n.
    xn* = optimal value of xn (given sn).
    fn(sn, xn) = contribution of stages n, n + 1, ..., N to the objective function if the system starts in state sn at stage n, the immediate decision is xn, and optimal decisions are made thereafter.
    fn*(sn) = fn(sn, xn*).
   • Example: for the stagecoach problem, the recursive relationship was fn*(s) = min over xn of { csxn + f*n+1(xn) }.
   • The recursive relationship differs somewhat among dynamic programming problems.

Characteristics of DP

8. Using the recursive relationship, the solution procedure starts at the end and moves backward stage by stage.
   • It stops when the optimal policy starting at the initial stage is found.
   • At that point, the optimal policy for the entire problem has been found.
   • Example: the tables built for the stages of the stagecoach problem illustrate this procedure.

Characteristics of DP

8. (cont.) For DP problems, a table such as the following would be obtained for each stage (n = N, N-1, ..., 1):

    sn    fn(sn, xn) for each xn    fn*(sn)    xn*

Deterministic dynamic programming

• Deterministic problems: the state at the next stage is completely determined by the state and the policy decision at the current stage.
• Form of the objective function: minimize or maximize the sum, product, etc. of the contributions from the individual stages.
• Set of states: may be discrete or continuous, or a state vector. The decision variables can also be discrete or continuous.
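Points 6-8 translate directly into a small backward-recursion routine. The sketch below is illustrative only (none of these names come from the slides); it assumes finite, discrete state and decision sets and an additive objective to be minimized.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

def backward_dp(
    N: int,
    states: Callable[[int], Iterable[Hashable]],               # states(n) -> possible sn
    decisions: Callable[[int, Hashable], Iterable[Hashable]],  # decisions(n, sn) -> feasible xn
    contribution: Callable[[int, Hashable, Hashable], float],  # immediate contribution of xn
    transition: Callable[[int, Hashable, Hashable], Hashable], # next state s_{n+1}
    terminal_value: Callable[[Hashable], float] = lambda s: 0.0,
) -> Dict[int, Dict[Hashable, Tuple[float, Hashable]]]:
    """Return, for each stage n, a table {sn: (fn*(sn), xn*)}."""
    tables: Dict[int, Dict[Hashable, Tuple[float, Hashable]]] = {}
    # f*_{N+1} is given by the terminal values (0 if the problem simply ends);
    # the caller must therefore also define states(N + 1).
    f_next = {s: terminal_value(s) for s in states(N + 1)}
    for n in range(N, 0, -1):                 # n = N, N-1, ..., 1
        table = {}
        for sn in states(n):
            # fn*(sn) = min over xn of { contribution(n, sn, xn) + f*_{n+1}(s_{n+1}) }
            best_x, best_f = None, float("inf")
            for xn in decisions(n, sn):
                f = contribution(n, sn, xn) + f_next[transition(n, sn, xn)]
                if f < best_f:
                    best_x, best_f = xn, f
            table[sn] = (best_f, best_x)
        tables[n] = table
        f_next = {sn: f for sn, (f, _) in table.items()}
    return tables
```

For the stagecoach problem, for example, states(1) = {A}, states(5) = {J}, decisions(n, s) are the arcs leaving s, contribution(n, s, x) = csx and transition(n, s, x) = x.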

Example: distributing medical teams

• The World Health Council has five medical teams to allocate to three underdeveloped countries.
• Measure of performance: additional person-years of life, i.e., increased life expectancy (in years) times the country's population.

    Thousands of additional person-years of life
    Medical teams    Country 1    Country 2    Country 3
         0                0            0            0
         1               45           20           50
         2               70           45           70
         3               90           75           80
         4              105          110          100
         5              120          150          130

Formulation of the problem

• The problem requires three interrelated decisions: how many teams to allocate to each of the three countries (the stages).
• xn is the number of teams to allocate to country (stage) n.

States to be considered

• What are the states? What changes from one stage to another?
• sn = number of medical teams still available for the remaining countries (n, ..., 3).
• Thus: s1 = 5, s2 = 5 - x1 = s1 - x1, s3 = s2 - x2.

Overall problem

• pi(xi): measure of performance from allocating xi medical teams to country i (see the table above).

    Maximize    p1(x1) + p2(x2) + p3(x3),
    subject to  x1 + x2 + x3 = 5,
    with the xi nonnegative integers.

Policy

• The recursive relationship relating these functions is

    fn*(sn) = max over xn = 0, 1, ..., sn of { pn(xn) + f*n+1(sn - xn) },   for n = 1, 2,

  while for the last stage, n = 3,

    f3*(s3) = max over x3 = 0, 1, ..., s3 of p3(x3).

Solution procedure, stage n = 3

• For the last stage, n = 3, the values of p3(x3) are the last column of the table. Here x3* = s3 and f3*(s3) = p3(s3):

    n = 3:
        s3    f3*(s3)    x3*
         0       0        0
         1      50        1
         2      70        2
         3      80        3
         4     100        4
         5     130        5

Stage n = 2

• Here, finding x2* requires calculating f2(s2, x2) = p2(x2) + f3*(s2 - x2) for the values x2 = 0, 1, ..., s2.
• Example for state s2 = 2:
  • x2 = 0: f2(2, 0) = p2(0) + f3*(2) = 0 + 70 = 70 (optimal)
  • x2 = 1: f2(2, 1) = p2(1) + f3*(1) = 20 + 50 = 70 (optimal)
  • x2 = 2: f2(2, 2) = p2(2) + f3*(0) = 45 + 0 = 45

Stage n = 2

• Similar calculations can be made for the other values of s2, resulting in the table for n = 2:

    n = 2:          f2(s2, x2) = p2(x2) + f3*(s2 - x2)
        s2    x2=0   x2=1   x2=2   x2=3   x2=4   x2=5    f2*(s2)    x2*
         0      0      -      -      -      -      -        0        0
         1     50     20      -      -      -      -       50        0
         2     70     70     45      -      -      -       70      0 or 1
         3     80     90     95     75      -      -       95        2
         4    100    100    115    125    110      -      125        3
         5    130    120    125    145    160    150      160        4

Stage n = 1

• The only state to consider is the starting state s1 = 5:

    n = 1:          f1(s1, x1) = p1(x1) + f2*(s1 - x1)
        s1    x1=0   x1=1   x1=2   x1=3   x1=4   x1=5    f1*(s1)    x1*
         5    160    170    165    160    155    120       170        1

Optimal policy decision

• Tracing the tables forward: x1* = 1, so s2 = 5 - 1 = 4; x2*(4) = 3, so s3 = 4 - 3 = 1; and x3*(1) = 1.
• The optimal policy is therefore to allocate 1 team to country 1, 3 teams to country 2 and 1 team to country 3, giving f1*(5) = 170 thousand additional person-years of life.
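As a cross-check, here is a short sketch of this backward recursion in code. The p-values are the person-years table from these slides (in thousands); the variable names and structure are just one possible implementation, not taken from the slides.

```python
# Backward recursion for the World Health Council example.
p = {  # p[i][x] = thousands of additional person-years from giving x teams to country i
    1: [0, 45, 70, 90, 105, 120],
    2: [0, 20, 45, 75, 110, 150],
    3: [0, 50, 70, 80, 100, 130],
}
TEAMS = 5

# f[n][s] = (fn*(s), xn*): best value and decision with s teams left at stage n.
f = {4: {s: (0, None) for s in range(TEAMS + 1)}}   # nothing left to allocate after stage 3
for n in (3, 2, 1):
    f[n] = {}
    for s in range(TEAMS + 1):
        # fn*(s) = max over xn = 0..s of { pn(xn) + f*_{n+1}(s - xn) }
        f[n][s] = max(((p[n][x] + f[n + 1][s - x][0], x) for x in range(s + 1)),
                      key=lambda t: t[0])

# Recover the optimal allocation by moving forward through the stages.
s, allocation = TEAMS, []
for n in (1, 2, 3):
    _, x = f[n][s]
    allocation.append(x)
    s -= x

print(f[1][TEAMS][0], allocation)   # 170 [1, 3, 1], matching the slides
```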

Distribution of effort problem

• One kind of resource is allocated to a number of activities. Objective: how to distribute the effort (resource) among the activities most effectively.
• DP involves only one (or a few) resources, while LP can deal with thousands of resources.
• The LP assumptions of proportionality, divisibility and certainty can be violated in DP. Only additivity (or its analogue for a product of terms) is necessary, because of the principle of optimality.
  • The World Health Council problem violates proportionality and divisibility (only integer allocations are allowed).

Formulation of distribution of effort

• Stage n = activity n (n = 1, 2, ..., N).
• xn = amount of resource allocated to activity n.
• State sn = amount of resource still available for allocation to the remaining activities (n, ..., N).
• When the system starts at stage n in state sn, the choice xn results in the next state at stage n + 1 being sn+1 = sn - xn:

    Stage:     n              n + 1
    State:     sn   --xn-->   sn - xn

Example: distributing scientists to research teams

• Three research teams are working on an engineering problem needed to safely fly people to Mars.
• Two extra scientists are available; assigning new scientists to a team reduces its probability of failure.

    Probability of failure
    New scientists    Team 1    Team 2    Team 3
          0            0.40      0.60      0.80
          1            0.20      0.40      0.50
          2            0.15      0.20      0.30
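This example has a multiplicative objective. Assuming the goal is to minimize the product of the three teams' failure probabilities (the usual reading of this distribution-of-effort example; the slides only give the table), a minimal sketch is:

```python
# Backward recursion with a multiplicative objective:
# fn*(s) = min over x = 0..s of { qn(x) * f*_{n+1}(s - x) }, with f4*(s) = 1.
q = {  # q[i][x] = probability of failure of team i if it gets x new scientists
    1: [0.40, 0.20, 0.15],
    2: [0.60, 0.40, 0.20],
    3: [0.80, 0.50, 0.30],
}
SCIENTISTS = 2

f = {4: {s: (1.0, None) for s in range(SCIENTISTS + 1)}}
for n in (3, 2, 1):
    f[n] = {s: min(((q[n][x] * f[n + 1][s - x][0], x) for x in range(s + 1)),
                   key=lambda t: t[0])
            for s in range(SCIENTISTS + 1)}

# Recover the allocation stage by stage.
s, plan = SCIENTISTS, []
for n in (1, 2, 3):
    _, x = f[n][s]
    plan.append(x)
    s -= x

print(round(f[1][SCIENTISTS][0], 3), plan)   # 0.06 [1, 0, 1]
```

Under this assumption the best use of the two extra scientists is one for team 1 and one for team 3, giving an overall failure probability of 0.20 × 0.60 × 0.50 = 0.06.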
