ECE 551 LECTURE 2

Preface

This lecture reviews the method of dynamic programming. The method was developed by R. Bellman
during the 1950s and was meant to leverage the new digital computer technology emerging at the
time. The principle of optimality is the key concept for understanding dynamic programming. Because
dynamic programming is a decision-based algorithm for solving optimization problems that does not
depend on the mathematical particulars of the problem, it has found wide use in both science and
engineering as well as in many non-scientific fields.

Dynamic Programming

Dynamic programming is a method of determining a control input that minimizes the performance
measure, i.e. one that solves the optimal control problem. The algorithm was developed by R. Bellman
using his principle of optimality.

Principle of Optimality

According to Bellman, the following statement holds for an optimal policy:

An optimal policy has the property that whatever the initial state and initial decision are, the
remaining decisions must constitute an optimal policy with regard to the state resulting from the
first decision.

Thus, consider a multi-stage decision process as shown below:

a --J_ab--> b --J_be--> e

Suppose that the first decision, made at point a, results in segment a-b with accompanying cost J_ab.
Further suppose that the remaining decision(s) yield segment b-e at a cost J_be. Thus, the minimum cost
J*_ae from a to e is given by

    J*_ae = J_ab + J_be        (3.1)

If we apply the principle of optimality, and if a-b-e is the optimal path from a to e, then b-e must be
the optimal path from b to e.

Suppose multiple paths exist between the initial and final states of a dynamical process. For example,
consider the process with initial state b and final state f as shown below:

b --J_bc--> c --J*_cf--> f

b --J_bd--> d --J*_df--> f

b --J_be--> e --J*_ef--> f

Each of the paths shown above represents an allowable decision at state b, and the paths from the
intermediate states c, d, and e to the final state f are all optimal. So, which trajectory represents the
optimal path b-f? The answer is found by examining the following quantities (costs):

    C_bcf = J_bc + J*_cf

    C_bdf = J_bd + J*_df

    C_bef = J_be + J*_ef

The lowest cost among these three alternatives must represent the optimal decision at state b. Thus,
dynamic programming is a computational technique that uses a sequence of decisions to define an optimal
state trajectory.
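
As a minimal sketch of this comparison, the snippet below selects the lowest-cost alternative at state b;
all numeric costs are hypothetical values assumed purely for illustration.

```python
# Choosing the optimal decision at state b by comparing the candidate costs.
# All numeric costs below are hypothetical values assumed for illustration.
J_edge = {"c": 4.0, "d": 2.0, "e": 3.0}   # J_bc, J_bd, J_be (assumed)
J_star = {"c": 5.0, "d": 6.0, "e": 4.0}   # J*_cf, J*_df, J*_ef (assumed)

# C_bxf = J_bx + J*_xf for each intermediate state x
C = {x: J_edge[x] + J_star[x] for x in J_edge}
best = min(C, key=C.get)

print(C)     # {'c': 9.0, 'd': 8.0, 'e': 7.0}
print(best)  # 'e': the lowest-cost alternative is the optimal decision at b
```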

Simple Illustrative Example

A motorist wishes to minimize the cost of reaching a destination h from his current location. He must
travel through a series of one-way streets as depicted in the diagram below.

a --8--> d --3--> e --8--> h
|        ^        |        ^
5        5        2        2
v        |        v        |
b --9--> c --3--> f --3--> g

(N is up, E is to the right)

Referring to the street diagram above, the labeled nodes represent intersections and the arrows indicate
allowed travel directions, with the indicated cost incurred along each segment. Each state corresponds to
an intersection, and each decision is the choice of heading (i.e. control) when leaving an intersection.

Now, suppose that the motorist is at c. He has only two choices: go to d or go to f. Let's assume
that the motorist knows the minimum costs J*_dh = 10 and J*_fh = 5. Then the minimum cost J*_ch is the
smaller of the following quantities:


    C_cdh = J_cd + J*_dh = 5 + 10 = 15

    C_cfh = J_cf + J*_fh = 3 + 5 = 8

Thus, we have that


    J*_ch = min{ C_cdh, C_cfh } = min{ 15, 8 } = 8

So, the optimal decision at c is to go to f. But how does the motorist know the values of J*_dh and J*_fh?
The answer is that these values are pre-calculated by working backward from h, e.g.


    J*_gh = 2   (there is only one possible path from g to h)

And so,

    J*_fh = J_fg + J*_gh = 3 + 2 = 5

The situation for determining J*_dh is a bit more complicated, since we first need J*_eh and there are
two possible paths from e to h, i.e.


    J*_eh = min{ C_eh, C_efh } = min{ J_eh, J_ef + J*_fh } = min{ 8, 7 } = 7

And so,


    J*_dh = J_de + J*_eh = 3 + 7 = 10

Note that the optimal path for the motorist problem was found by starting at the end point (final
state) and working backwards toward the beginning point (initial state). At each point along the way,
one incrementally computes a segment of the optimal path by selecting the minimum cost among
all allowable decisions or choices. This is the algorithm of dynamic programming.

Let us define the following quantities:

x is the current state

u_i is an allowable decision at state x

x_i is the state adjacent to x reached by application of u_i

h is the final state

J_{x_i} is the cost to move from x to x_i

J*_{x_i h} is the minimum cost to reach h from x_i

C_{x_i h} is the minimum cost to go from x to h via x_i

J*_h is the minimum cost to go from x to h via any allowable path

u*(x) is the optimal decision (control) at x

Using the above quantities, we have that

    C_{x_i h} = J_{x_i} + J*_{x_i h}
                                                                (3.2)
    J*_h = min{ C_{x_1 h}, C_{x_2 h}, ..., C_{x_i h}, ... }

Equation (3.2) defines the algorithm known as dynamic programming.

Let's now revisit our motorist problem and apply the algorithm of dynamic programming. We start at h
and work backwards, so that J*_{x_i h} is known prior to calculating C_{x_i h} = J_{x_i} + J*_{x_i h}.
The table below provides the results of applying dynamic programming to the motorist problem.

    x    u_i   x_i   C_{x_i h}      J*_h   u*(x)

    g     N     h    2 + 0  =  2      2      N

    f     E     g    3 + 2  =  5      5      E

    e     E     h    8 + 0  =  8      7      S
          S     f    2 + 5  =  7

    d     E     e    3 + 7  = 10     10      E

    c     N     d    5 + 10 = 15      8      E
          E     f    3 + 5  =  8

    b     E     c    9 + 8  = 17     17      E

    a     E     d    8 + 10 = 18     18      E
          S     b    5 + 17 = 22

Let's work an example using our table above. We will start at b. From the table, the optimal decision is
to head E (east) to c. At c, the optimal choice is again to head E, and so on. The resulting optimal
path is diagrammed below.
    b --9--> c --3--> f --3--> g --2--> h
      (E)      (E)      (E)      (N)

Thus, the optimal path results in the following cost:

    J*_bh = 9 + 3 + 3 + 2 = 17
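
A minimal sketch of the backward recursion of equation (3.2), applied to this street network, is given
below. The edge dictionary simply transcribes the arrows and costs of the street diagram; the variable
names are my own.

```python
# Backward dynamic programming (eq. 3.2) on the motorist street network.
# edges[state] maps each allowable decision (heading) to (next_state, cost),
# transcribed from the street diagram above.
edges = {
    "a": {"E": ("d", 8), "S": ("b", 5)},
    "b": {"E": ("c", 9)},
    "c": {"N": ("d", 5), "E": ("f", 3)},
    "d": {"E": ("e", 3)},
    "e": {"E": ("h", 8), "S": ("f", 2)},
    "f": {"E": ("g", 3)},
    "g": {"N": ("h", 2)},
    "h": {},
}

J = {"h": 0}   # J*_h: minimum cost-to-go, seeded at the final state
u = {}         # u*(x): optimal heading at each intersection

# Visit states in an order that guarantees every successor's cost-to-go is
# already known (a fixed backward order suffices for this small network).
for x in ["g", "f", "e", "d", "c", "b", "a"]:
    candidates = {ui: cost + J[xi] for ui, (xi, cost) in edges[x].items()}
    u[x] = min(candidates, key=candidates.get)
    J[x] = candidates[u[x]]

print(J["b"])   # 17, matching J*_bh above
print(u)        # optimal heading at each intersection, matching the table
```

Running it reproduces the table above, including J*_bh = 17.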

Optimal Control Problem

With these concepts in mind, let's now attack a simple optimal control problem. Consider the following
first-order dynamical equation:

    ẋ(t) = a x(t) + b u(t)        (3.3)

We require that the state and control variables obey the following constraints:

    0 ≤ x(t) ≤ 1.5
    -1 ≤ u(t) ≤ 1        (3.4)

Finally, the performance measure to be minimized is given by the following expression:

    J = x²(T) + λ ∫_0^T u²(t) dt        (3.5)

Before we use dynamic programming to solve this problem, we must first transform equation (3.3) into a
discrete-time representation or difference equation. For simplicity, we begin by approximating the
derivative by a finite difference, i.e.

    [ x(t + Δt) - x(t) ] / Δt ≈ a x(t) + b u(t)

Or,

    x(t + Δt) = (1 + a Δt) x(t) + b Δt u(t)        (3.6)

where Δt is the time interval associated with subdividing 0 ≤ t ≤ T into N equal segments. Since
t = k Δt for k = 0, 1, 2, ..., N - 1, we have (writing x(k) for x(kΔt)) that

    x(k + 1) = (1 + a Δt) x(k) + b Δt u(k)        (3.7)

Similarly, we can convert the performance measure into a discrete representation as follows:

    J = x²(NΔt) + λ ∫_0^{Δt} u²(t) dt + λ ∫_{Δt}^{2Δt} u²(t) dt + ... + λ ∫_{(N-1)Δt}^{NΔt} u²(t) dt

Or, assuming a rectangular approximation to each integral term, i.e. u(t) ≈ u(kΔt) on the k-th
subinterval, we have that

    J = x²(N) + λ Δt Σ_{k=0}^{N-1} u²(k)        (3.8)
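
As a minimal sketch of this discretization (the function and parameter names are my own, with lam
standing for the weighting constant λ):

```python
# Sketch of the discretized dynamics (3.7) and performance measure (3.8)
# for general parameters a, b, lam (the weighting constant) and dt.
def step(x, u, a, b, dt):
    """One step of x(k+1) = (1 + a*dt) * x(k) + b*dt * u(k), eq. (3.7)."""
    return (1 + a * dt) * x + b * dt * u

def cost(x0, controls, a, b, lam, dt):
    """J = x(N)^2 + lam * dt * sum of u(k)^2 over k = 0..N-1, eq. (3.8)."""
    x = x0
    running = 0.0
    for u in controls:
        running += lam * dt * u ** 2
        x = step(x, u, a, b, dt)
    return x ** 2 + running

# With the parameter values substituted below in the text:
print(cost(1.5, [-0.5, -0.5], a=0, b=1, lam=2, dt=1))  # 1.25
```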

Let's substitute the following values for the as-yet undefined parameters:

    a = 0
    b = 1
    λ = 2
    T = 2
    Δt = 1   (hence N = 2)

Thus,

    x(k + 1) = x(k) + u(k)   for k = 0, 1        (3.9)

    J = x²(2) + 2 u²(0) + 2 u²(1)        (3.10)

    0 ≤ x(k) ≤ 1.5   for k = 0, 1, 2        (3.11)

    -1 ≤ u(k) ≤ 1   for k = 0, 1        (3.12)

One last chore is to quantize the constrained variables, and so we let

    x(k) ∈ { 0.0, 0.5, 1.0, 1.5 }

    u(k) ∈ { -1.0, -0.5, 0.0, +0.5, +1.0 }

We can now apply the method of dynamic programming to solve this problem. We begin our calculations
by letting k = 1. Thus, we compute x(2) = x(1) + u(1) using all admissible values of x(1) and u(1), as
shown in Table 1 below.

Table 1: x(2) = x(1) + u(1)

                            u(1)
                 -1.0   -0.5    0.0   +0.5   +1.0

    x(1)  +1.5   +0.5   +1.0   +1.5

          +1.0    0.0   +0.5   +1.0   +1.5

          +0.5           0.0   +0.5   +1.0   +1.5

           0.0                  0.0   +0.5   +1.0

Note that the populated entries in Table 1 above correspond to the values of x(2), while the empty
positions (shaded light-red in the original) represent non-admissible quantities. Given these results,
we can now calculate J_12(x(1), u(1)) = x²(2) + 2 u²(1). Table 2 below captures those predicted costs.

Table 2: J_12(x(1), u(1)) = x²(2) + 2 u²(1)

                            u(1)
                 -1.0   -0.5    0.0   +0.5   +1.0

    x(1)  +1.5   2.25  *1.50   2.25

          +1.0   2.00  *0.75   1.00   2.75

          +0.5          0.50  *0.25   1.50   4.25

           0.0                *0.00   0.75   3.00

For example, the entry for x(1) = +1.5 and u(1) = -1.0 is (0.5)² + 2(-1)² = 2.25.

Note that the populated entries in Table 2 above correspond to the values of J_12(x(1), u(1)), i.e. the
cost of applying the control u(1) at state x(1). As before, the empty positions represent non-admissible
quantities, while each starred entry (shaded light-green in the original) marks the optimal decision,
i.e. u*(x(1), 1), and its corresponding cost J*_12(x(1)).

Note that we started at the last stage, i.e. k = 1 (the second-to-last stage), and computed the optimal
decision (control) associated with each possible value of the state, i.e. u*(x(1), 1). So, we are following
the optimal path backwards from the final state, which is not fixed in this particular case; we are
implementing the dynamic programming algorithm.

Since the optimal control u*(x(1), 1) and its corresponding cost J*_12(x(1)) will be needed in subsequent
stages, we must store these quantities in computer memory, as shown in Table 3 below.

Table 3

    J*_12(x(1))              u*(x(1), 1)

    J*_12(1.5) = 1.50        u*(1.5, 1) = -0.5

    J*_12(1.0) = 0.75        u*(1.0, 1) = -0.5

    J*_12(0.5) = 0.25        u*(0.5, 1) = 0.0

    J*_12(0.0) = 0.00        u*(0.0, 1) = 0.0
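
A minimal sketch of this last-stage computation over the quantized grids (the variable names are my own)
reproduces Tables 2 and 3:

```python
# Last stage (k = 1): tabulate J_12 = x(2)^2 + 2*u(1)^2 over the quantized
# grids, keeping only admissible transitions, and record the row minima.
x_grid = [0.0, 0.5, 1.0, 1.5]
u_grid = [-1.0, -0.5, 0.0, 0.5, 1.0]

J1, u1_opt = {}, {}
for x1 in x_grid:
    costs = {}
    for u in u_grid:
        x2 = x1 + u                           # eq. (3.9)
        if 0.0 <= x2 <= 1.5:                  # admissibility, eq. (3.11)
            costs[u] = x2 ** 2 + 2 * u ** 2   # J_12, cf. Table 2
    u1_opt[x1] = min(costs, key=costs.get)
    J1[x1] = costs[u1_opt[x1]]

print(J1)      # {0.0: 0.0, 0.5: 0.25, 1.0: 0.75, 1.5: 1.5}  (Table 3)
print(u1_opt)  # {0.0: 0.0, 0.5: 0.0, 1.0: -0.5, 1.5: -0.5}
```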

The next step in this process is to let k = 0 and calculate the cost of the two-stage process. We need
not compute x(1) = x(0) + u(0), since Table 1 above already reflects those results directly if we simply
replace k = 1 with k = 0 in the table. Instead, we compute C_02(x(0), u(0)) = 2 u²(0) + J*_12(x(1)), as
shown in Table 4 below.

Table 4: C_02(x(0), u(0)) = 2 u²(0) + J*_12(x(1))

                            u(0)
                 -1.0   -0.5    0.0   +0.5   +1.0

    x(0)  +1.5   2.25  *1.25   1.50

          +1.0   2.00  *0.75  *0.75   2.00

          +0.5          0.50  *0.25   1.25   3.50

           0.0                *0.00   0.75   2.75

For example, the entry for x(0) = +1.5 and u(0) = -1.0 is 0.25 + 2(-1)² = 2.25.

As before, each starred entry (shaded light-green in the original) marks the optimal decision, i.e.
u*(x(0), 0), and its corresponding cost J*_02(x(0)). Note that the optimal control is not unique at
x(0) = 1.0. We will revisit this result later in the course. For now, we would once again store these
quantities in computer memory, as shown in Table 5 below.

Table 5

    J*_02(x(0))              u*(x(0), 0)

    J*_02(1.5) = 1.25        u*(1.5, 0) = -0.5

    J*_02(1.0) = 0.75        u*(1.0, 0) = -0.5 or 0.0

    J*_02(0.5) = 0.25        u*(0.5, 0) = 0.0

    J*_02(0.0) = 0.00        u*(0.0, 0) = 0.0
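
Continuing the sketch, the k = 0 stage reuses the costs stored from Table 3; again, the variable names
are my own:

```python
# First stage (k = 0): combine the one-step cost 2*u(0)^2 with the stored
# cost-to-go J1 (Table 3) to form C_02, then minimize over u(0).
x_grid = [0.0, 0.5, 1.0, 1.5]
u_grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
J1 = {0.0: 0.00, 0.5: 0.25, 1.0: 0.75, 1.5: 1.50}   # stored from Table 3

J0, u0_opt = {}, {}
for x0 in x_grid:
    costs = {}
    for u in u_grid:
        x1 = x0 + u
        if 0.0 <= x1 <= 1.5:                 # admissibility of x(1)
            costs[u] = 2 * u ** 2 + J1[x1]   # C_02, cf. Table 4
    J0[x0] = min(costs.values())
    # keep *all* minimizers, to expose the tie at x(0) = 1.0
    u0_opt[x0] = [u for u, c in costs.items() if c == J0[x0]]

print(J0)      # {0.0: 0.0, 0.5: 0.25, 1.0: 0.75, 1.5: 1.25}  (Table 5)
print(u0_opt)  # note u0_opt[1.0] == [-0.5, 0.0]: the optimum is not unique
```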

Of course, one could extend this procedure to as many stages as required. Thus, in general, we
have that

    C_{kN}(x(k), u(k)) = J_{k,k+1}(x(k), u(k)) + J*_{k+1,N}(x(k+1))
                                                                        (4.9)
    J*_{kN}(x(k)) = min_{u(k)} C_{kN}(x(k), u(k))

Note that equation (4.9) is the functional equation of dynamic programming. We will next take
these mathematical expressions and make them specific to the optimal control problem. Before we do
that, a few comments are in order regarding the numeric choices (discrete values) made for the
constraints in this problem. The discrete values in the example above were chosen merely for
convenience: the calculations were straightforward, and the results always remained within the set of
discrete values, so computational complexity was avoided altogether or kept minimal. In practice, a
digital computer will be used to solve the optimal control problem, much finer numerical resolution
will be used (if not required), and the possibility of needing to interpolate values will generally arise.
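
To illustrate that last point, here is a minimal sketch of the recursion (4.9) on finer grids, assuming
numpy and linear interpolation via np.interp; the grid sizes and time step are arbitrary choices made
only for illustration:

```python
# Backward recursion (4.9) on finer grids, using linear interpolation of
# the stored cost-to-go whenever x(k+1) falls between grid points.
import numpy as np

a, b, lam = 0.0, 1.0, 2.0
dt, N = 0.5, 4                                    # T = N*dt = 2
x_grid = np.linspace(0.0, 1.5, 31)                # finer state grid
u_grid = np.linspace(-1.0, 1.0, 41)               # finer control grid

J = x_grid ** 2                                   # terminal cost x(N)^2

for k in range(N - 1, -1, -1):
    # x_next[i, j] = successor of x_grid[i] under u_grid[j], eq. (3.7)
    x_next = (1 + a * dt) * x_grid[:, None] + b * dt * u_grid[None, :]
    C = lam * dt * u_grid[None, :] ** 2 + np.interp(x_next, x_grid, J)
    C[(x_next < 0.0) | (x_next > 1.5)] = np.inf   # enforce the state constraint
    J = C.min(axis=1)                             # J*_{kN} on the grid

print(J[-1])   # approximate minimum cost from x(0) = 1.5
```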
