
An Iterative Solution to the Finite-Time Linear Quadratic Optimal Feedback Control Problem

Randal Beard, George Saridis, John Wen
Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180-3590. E-mail: {beard, saridis, wen}@cat.rpi.edu

This work was supported by NASA grants NGT 10000 and NAGW-1333.

Abstract: Saridis's successive approximation theory is applied to the finite-time linear quadratic optimal control problem. The result is an iterative scheme which successively improves any initial control law, ultimately converging to the optimal state feedback control. The novelty of the approach is that the solution of a nonlinear Riccati equation is replaced by the successive solution of a linear Lyapunov equation. Numerical examples illustrate the approach.
1 Introduction

In this paper we apply the successive approximation theory developed in [5, 6] to the finite-time LQ optimal control problem, to obtain an easily computable feedback control law. Without loss of generality, we consider the problem where an initial state is to be driven to the origin in finite time. The system equations are given by

$\dot{x} = Ax + Bu, \qquad (1)$

where $x \in \mathbb{R}^n$, $u : \mathbb{R} \to \mathbb{R}^m$, $A$, $B$, $Q$, and $R$ are constant matrices of compatible dimension, and $\|x\|_Q^2 = x^T Q x$. The performance index is

$J = \alpha \|x(T)\|^2 + \int_0^T \bigl( \|x\|_Q^2 + \|u\|_R^2 \bigr)\, dt. \qquad (2)$

Assuming that $R$ is symmetric positive definite and that $(A,B)$ is controllable, a necessary condition for $V^*$ to be the optimal cost is that it satisfy the Hamilton-Jacobi-Bellman equation

$V_t + V_x^T A x + x^T Q x - \tfrac{1}{4} V_x^T B R^{-1} B^T V_x = 0, \qquad (3)$

with boundary condition

$V(T, x(T)) = \alpha \|x(T)\|^2.$

If we assume that $V$ is of the form

$V(t,x) = x^T(t) P(t) x(t), \qquad (4)$

and that the resulting optimal trajectory is non-singular ($x(t) \neq 0$ on a time set of positive measure), then the optimal cost is given by (4) if $P$ satisfies

$\dot{P} + PA + A^T P + Q - P B R^{-1} B^T P = 0, \qquad P(T) = \alpha I. \qquad (5)$

Furthermore, the optimal control is given by the feedback control

$u^*(t,x) = -R^{-1} B^T P(t) x(t), \qquad (6)$

and the optimal cost is

$J(x_0, u^*) = x_0^T P(0) x_0. \qquad (7)$

While this result is well known [4], in practice it is hard to implement due to the difficulty of solving the time-varying Riccati equation (5). In Section 2 we present an iterative scheme that is easy to compute and successively improves any feedback control law, ultimately converging to the optimal feedback control. Iterative solutions to the algebraic Riccati equation (ARE) have been reported in [2, 3, 6]. This paper extends those solutions to the finite-time case. One of the major differences between the results of this paper and those for the ARE is that all requirements on the initial iteration (i.e., stability of the initial control law) are removed. The significance of Saridis's approximation approach is that any initial control is successively improved and that the control law at every iteration has a guaranteed (sub-optimal) performance index.
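To make the comparison concrete, the sketch below (our illustration, not part of the paper) integrates the Riccati equation (5) backward in time with SciPy for the double-integrator plant used later in Section 3, yielding the optimal cost (7) and gain (6). The plant data and tolerances are our own choices.

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (cf. Section 3)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.linalg.inv(np.array([[10.0]]))
alpha, T = 100.0, 1.0

def riccati_rhs(t, p):
    # Equation (5) rearranged: dP/dt = -(P A + A^T P + Q - P B R^{-1} B^T P).
    P = p.reshape(2, 2)
    return (-(P @ A + A.T @ P + Q - P @ B @ Rinv @ B.T @ P)).ravel()

# solve_ivp accepts a decreasing time span, so integrate from T down to 0,
# starting from the terminal condition P(T) = alpha * I.
sol = solve_ivp(riccati_rhs, (T, 0.0), (alpha * np.eye(2)).ravel(),
                rtol=1e-8, atol=1e-10)
P0 = sol.y[:, -1].reshape(2, 2)          # P(0)

x0 = np.array([10.0, 0.0])
print("optimal cost (7):", x0 @ P0 @ x0)
print("optimal gain at t = 0, cf. (6):", Rinv @ B.T @ P0)

The iterative scheme of Section 2 avoids this nonlinear integration by solving a sequence of linear Lyapunov equations instead.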

2 The Main Algorithm

To improve an initial stationary feedback control law we apply the approximation theory presented in [6]. To be definite, let $u^{(0)}$ be defined as

$u^{(0)}(t,x) = -K x(t), \qquad t \in [0,T]. \qquad (8)$

In the infinite-time case, the initial control law is required to be stabilizing for the approximation method to converge. For finite time this is not the case; in particular, we may choose $K = 0$. However, as we will show, $K$ provides a degree of freedom which may be judiciously chosen to speed the convergence of the algorithm.

Initial Step

According to the approximation theory [6], we initialize the process by finding $V^{(0)}(t,x)$ to satisfy the Generalized-Hamilton-Jacobi-Bellman (GHJB) equation

$V_t^{(0)} + V_x^{(0)T} A^{(0)} x + x^T G^{(0)} x = 0, \qquad (9)$

with boundary condition $V^{(0)}(T, x(T)) = \alpha \|x(T)\|^2$, where $A^{(0)} = A - BK$ and $G^{(0)} = Q + K^T R K$. Assuming that $V^{(0)}(t,x)$ is of the form

$V^{(0)}(t,x) = x^T(t) P^{(0)}(t) x(t), \qquad (10)$

where $P^{(0)}(t) = P^{(0)T}(t)$, and that the trajectory is non-singular, (9) reduces to

$\dot{P}^{(0)} + P^{(0)} A^{(0)} + A^{(0)T} P^{(0)} + G^{(0)} = 0, \qquad P^{(0)}(T) = \alpha I. \qquad (11)$

It is well known (cf. [1]) that (11), which is a Lyapunov equation, has the solution

$P^{(0)}(t) = e^{A^{(0)T}(T-t)}\, \alpha\, e^{A^{(0)}(T-t)} + \int_t^T e^{A^{(0)T}(\tau-t)}\, G^{(0)}\, e^{A^{(0)}(\tau-t)}\, d\tau. \qquad (12)$
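Since $A^{(0)}$ is constant, (12) can be evaluated directly with matrix exponentials. The sketch below is our own check of the closed form, not the paper's code; the double-integrator data and the choice $K = 0$ are our assumptions.

import numpy as np
from scipy.linalg import expm
from scipy.integrate import trapezoid

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[10.0]])
alpha, T = 100.0, 1.0
K = np.zeros((1, 2))          # K = 0 is admissible in the finite-time case

A0 = A - B @ K                # A^(0)
G0 = Q + K.T @ R @ K          # G^(0)

def P0_of_t(t, n_quad=201):
    # Terminal term of (12): exp(A0^T (T-t)) (alpha I) exp(A0 (T-t)).
    E = expm(A0 * (T - t))
    P = alpha * (E.T @ E)
    # Integral term of (12), approximated with trapezoidal quadrature.
    taus = np.linspace(t, T, n_quad)
    vals = np.array([expm(A0 * (tau - t)).T @ G0 @ expm(A0 * (tau - t))
                     for tau in taus])
    return P + trapezoid(vals, taus, axis=0)

print("P^(0)(0) =")
print(P0_of_t(0.0))           # cost of the initial control is x0^T P^(0)(0) x0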
The Iterative Step

Setting $i = 1$, we let the new control law be

$u^{(i)}(t,x) = -R^{-1} B^T P^{(i-1)}(t) x(t). \qquad (13)$

Letting $A^{(i-1)} = A - B R^{-1} B^T P^{(i-1)}$ and $G^{(i-1)} = Q + P^{(i-1)} B R^{-1} B^T P^{(i-1)}$, we find the value function for this control law by solving the following GHJB equation:

$V_t^{(i)} + V_x^{(i)T} A^{(i-1)} x + x^T G^{(i-1)} x = 0, \qquad (14)$

with boundary condition $V^{(i)}(T, x(T)) = \alpha \|x(T)\|^2$. As in the initial step, we assume a non-singular solution and that $V^{(i)}(t,x)$ has the form

$V^{(i)} = x^T(t) P^{(i)}(t) x(t). \qquad (15)$

The result is the following equation for $P^{(i)}(t)$:

$\dot{P}^{(i)} + P^{(i)} A^{(i-1)} + A^{(i-1)T} P^{(i)} + G^{(i-1)} = 0, \qquad P^{(i)}(T) = \alpha I. \qquad (16)$

The solution to (16) can be written analytically as

$P^{(i)}(t) = \Phi^{(i-1)T}(T,t)\, \alpha\, \Phi^{(i-1)}(T,t) + \int_t^T \Phi^{(i-1)T}(\tau,t)\, G^{(i-1)}(\tau)\, \Phi^{(i-1)}(\tau,t)\, d\tau, \qquad (17)$

where $\Phi^{(i)}$ is the state transition matrix associated with $A^{(i)}(t)$.
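Numerically it is often easier to integrate (16) directly backward in time than to evaluate the transition-matrix formula (17). The following sketch is our own illustration (the helper name lyap_sweep is hypothetical); it performs one iteration, representing the previous iterate $P^{(i-1)}$ as an interpolant on a time grid.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
Rinv = np.linalg.inv(np.array([[10.0]]))
alpha, T, n = 100.0, 1.0, 2
ts = np.linspace(0.0, T, 101)

def lyap_sweep(P_prev):
    # Given P^(i-1) sampled on ts (shape (len(ts), n, n)), build
    # A^(i-1)(t) = A - B R^{-1} B^T P^(i-1)(t) and
    # G^(i-1)(t) = Q + P^(i-1) B R^{-1} B^T P^(i-1), then integrate the
    # linear Lyapunov equation (16) backward from P(T) = alpha * I.
    Pf = interp1d(ts, P_prev, axis=0, fill_value="extrapolate")
    def rhs(t, p):
        P = p.reshape(n, n)
        Pm = Pf(t)
        Ai = A - B @ Rinv @ B.T @ Pm
        Gi = Q + Pm @ B @ Rinv @ B.T @ Pm
        return (-(P @ Ai + Ai.T @ P + Gi)).ravel()
    sol = solve_ivp(rhs, (T, 0.0), (alpha * np.eye(n)).ravel(),
                    t_eval=ts[::-1], rtol=1e-8, atol=1e-10)
    return sol.y.T[::-1].reshape(len(ts), n, n)   # P^(i) on ts, ascending t

Repeated calls, P = lyap_sweep(P), implement (13)-(16); equation (13) then gives the improved gain $R^{-1} B^T P^{(i)}(t)$. Each sweep solves only a linear matrix ODE, which is the point of the method.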

The following theorem shows that $u^{(i)}(t,x)$ and $V^{(i)}(t,x)$, given in equations (13) and (15), converge monotonically to $u^*(t,x)$ and $V^*(t,x)$ given in equations (6) and (7).

Theorem

If $R = R^T > 0$, $(A,B)$ is controllable, and $A$, $B$, and $Q$ are such that a unique positive definite solution to (5) exists, then

$V^*(t,x) \le V^{(i)}(t,x) \le V^{(i-1)}(t,x), \qquad (18)$

with equality holding iff $V^{(i)}(t,x) \equiv V^*(t,x)$. Furthermore, $V^{(i)}(t,x) \to V^*(t,x)$ and $u^{(i)}(t,x) \to u^*(t,x)$ pointwise for all $t \in [0,T]$ and $x \in \mathbb{R}^n$.

Proof: This proof extends [6, Theorem 4] by removing all restrictions on the initial control $u^{(0)}(t,x)$, and by showing that in the linear case the convergence is strictly monotone. By completing the squares (cf. [6]) we can derive the following formulas, which directly imply equation (18):

$V^{(0)}(t,x) - V^{(1)}(t,x) = \tfrac{1}{4} \int_t^T \bigl\| 2\bigl(B^T P^{(0)}(\tau) - RK\bigr) \Phi^{(1)}(\tau,t)\, x(t) \bigr\|_{R^{-1}}^2\, d\tau, \qquad (19)$

$V^{(i)}(t,x) - V^{(i-1)}(t,x) = -\tfrac{1}{4} \int_t^T \bigl\| 2 B^T \bigl(P^{(i-1)}(\tau) - P^{(i-2)}(\tau)\bigr) \Phi^{(i)}(\tau,t)\, x(t) \bigr\|_{R^{-1}}^2\, d\tau, \qquad (20)$

$V^*(t,x) - V^{(i)}(t,x) = -\tfrac{1}{4} \int_t^T \bigl\| 2 B^T \bigl(P^{(i-1)}(\tau) - P(\tau)\bigr) \Phi^{(i)}(\tau,t)\, x(t) \bigr\|_{R^{-1}}^2\, d\tau. \qquad (21)$

Since $V^{(i)}$ is bounded above and below, it converges in the limit to $V^{(\infty)}$, and by direct substitution into (14) it can be shown that $V^{(\infty)}$ satisfies the Hamilton-Jacobi-Bellman equation (3) with the appropriate boundary conditions. Substituting $V^{(\infty)}(t,x) = x^T P^{(\infty)} x$, we see that $P^{(\infty)}$ must satisfy (5). Uniqueness then gives $V^{(\infty)} = V^*$. Furthermore, if the integral in (19) or (20) is zero, then $V^{(i)}(t,x)$ also satisfies (3), and the uniqueness of (5) again gives $V^{(i)}(t,x) \equiv V^*(t,x)$, showing strict monotone convergence. In the linear case it is straightforward to show that $V^*$ is continuously differentiable, which implies the pointwise convergence of $u^{(i)}(t,x)$ to $u^*(t,x)$. Q.E.D.
3 Illustrative Example

As an example of the above technique, consider the inverted pendulum

$(I + m_p l^2)\ddot{\theta} + m_p g l \sin(\theta) = \tau,$

with feedback linearization

$\tau = m_p g l \sin(\theta) + u.$

By normalizing the parameters we obtain the double integrator

$\dot{x} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} x + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u.$

In this example we use performance index (2) with

$Q = I, \quad R = 10, \quad \alpha = 100, \quad T = 1.$

Figure 1(a) shows the cost function of the system versus $x_1$ when the initial condition is $x_0^T = (x_1(0), 0)$. Figure 1(b) shows the cost versus iteration for the fixed initial condition $x_0^T = (10, 0)$. These plots clearly show the convergence of $V^{(i)}$ to $V^*$. The initial control used to initialize the system in Figure 1 is $u^{(0)}(t,x) \equiv 0$. In the infinite-time case, $u^{(0)}$ is required to be asymptotically stabilizing for the method to converge (cf. [2]); this restriction does not apply in the finite-time problem. The initial error, however, may be very large if an initially unstable controller is used. This situation can be avoided by choosing the initial control gain $K$ such that the matrix $A - BK$ is Hurwitz. For example, if the initial control is $u = -Kx$ with

$K = (0.3162,\ 0.8558),$

which corresponds to the optimal LQR gains for the infinite-time horizon problem, then simulations similar to Figure 1 are shown in Figure 2. By comparing the two figures, it can be observed that the convergence is faster when an initially asymptotically stable control law is used. It should be noted that the "size" of the control effort as $t \to T$ is governed by $\alpha$: as $\alpha$ becomes large, the performance improves but the control becomes correspondingly large. This difficulty is inherent in global state-feedback control with a finite-time constraint.

Figure 1: Double integrator with initially unstable control: (a) cost function versus state for several iterations; (b) cost function versus iteration for $x_0^T = (10, 0)$.

Figure 2: Double integrator with initially stable control: (a) cost function versus state for several iterations; (b) cost function versus iteration for $x_0^T = (10, 0)$.
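To reproduce the flavor of the experiment behind Figures 1 and 2, the sketch below (our own reconstruction; the printed numbers are whatever the integration yields, not values copied from the paper) runs the full iteration with the parameters above, once from $u^{(0)} \equiv 0$ and once from the stabilizing LQR gain, printing the cost at $x_0^T = (10, 0)$ per iteration. The Theorem guarantees each sequence decreases monotonically.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[10.0]])
Rinv = np.linalg.inv(R)
alpha, T, n = 100.0, 1.0, 2
ts = np.linspace(0.0, T, 101)
x0 = np.array([10.0, 0.0])

def sweep(P_prev):
    # Backward pass of (16), with A^(i-1)(t), G^(i-1)(t) built from P_prev.
    Pf = interp1d(ts, P_prev, axis=0, fill_value="extrapolate")
    def rhs(t, p):
        P = p.reshape(n, n)
        Pm = Pf(t)
        Ai = A - B @ Rinv @ B.T @ Pm
        Gi = Q + Pm @ B @ Rinv @ B.T @ Pm
        return (-(P @ Ai + Ai.T @ P + Gi)).ravel()
    sol = solve_ivp(rhs, (T, 0.0), (alpha * np.eye(n)).ravel(),
                    t_eval=ts[::-1], rtol=1e-8, atol=1e-10)
    return sol.y.T[::-1].reshape(len(ts), n, n)

# A zero "previous iterate" encodes the initial step with K = 0 (A^(0) = A,
# G^(0) = Q). Any symmetric Pbar with R^{-1} B^T Pbar = K encodes a general
# stationary gain K; here B^T picks out the second row, so we set it to R K.
K = np.array([[0.3162, 0.8558]])                  # infinite-horizon LQR gain
Pbar = np.array([[0.0, 3.162], [3.162, 8.558]])   # second row equals R @ K

for name, P in [("K = 0   ", sweep(np.zeros((len(ts), n, n)))),
                ("LQR gain", sweep(np.broadcast_to(Pbar, (len(ts), n, n)).copy()))]:
    costs = []
    for i in range(6):
        costs.append(x0 @ P[0] @ x0)              # cost of the i-th control
        P = sweep(P)                              # iterative step (13)-(16)
    print(name, ["%.1f" % c for c in costs])

Comparing the two printed sequences illustrates the claim of Figures 1 and 2: both initializations converge, but the stabilizing gain converges in fewer iterations.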

4 Conclusions

In this paper we apply Saridis's successive approximation theory to derive an iterative solution to the finite-time LQ optimal control problem. We show that the proposed algorithm monotonically improves any initial control law, with guaranteed convergence to the optimal control. The practical significance of this approach is that if a sub-optimal control law for a linear plant exists, the algorithm derived in Section 2 may be used to improve its performance in the LQ sense. This paper is also significant in that it illustrates an approach that is applicable to nonlinear systems. Using a similar approach we have developed a practical feedback synthesis algorithm for H2 optimal control of nonlinear plants that are affine in the control. These results will be presented in forthcoming papers.

References

[1] Robert R. Bitmead and Michel Gevers. Riccati difference and differential equations: Convergence, monotonicity and stability. In S. Bittanti, A. J. Laub, and J. C. Willems, editors, The Riccati Equation, pages 263-291. Springer-Verlag, 1991.

[2] David L. Kleinman. On an iterative technique for Riccati equation computations. IEEE Transactions on Automatic Control, AC-13:114-115, February 1968.

[3] Alan J. Laub. Invariant subspace methods for the numerical solution of Riccati equations. In S. Bittanti, A. J. Laub, and J. C. Willems, editors, The Riccati Equation, pages 163-196. Springer-Verlag, 1991.

[4] Frank L. Lewis. Optimal Control. John Wiley & Sons, New York, 1986.

[5] G. N. Saridis and J. Balaram. Suboptimal control for nonlinear systems. Control Theory and Advanced Technology, 2(3):547-562, September 1986.

[6] George N. Saridis and Chun-Sing G. Lee. An approximation theory of optimal control for trainable manipulators. IEEE Transactions on Systems, Man, and Cybernetics, SMC-9(3):152-159, March 1979.
