
Combining State-Dependent Riccati Equation (SDRE) methods with Model Predictive Neural Control (MPNC)

Eric A. Wan and Alexander A. Bogdanov Oregon Graduate Institute of Science & Technology 20000 NW Walker Rd, Beaverton, Oregon 97006 ericwan@ece.ogi.edu, alexb@ece.ogi.edu
Abstract

In this paper, we present a method for control of MIMO nonlinear systems based on receding horizon model predictive control (MPC), used to augment state-dependent Riccati equation (SDRE) control. The MPC is implemented using a neural network (NN) feedback controller, hence we refer to the approach as model predictive neural control (MPNC). The NN is optimized to minimize state and control costs, subject to dynamic and kinematic constraints. While SDRE control provides robust control based on a pseudo-linear formulation of the dynamics, the MPNC utilizes highly accurate numerical nonlinear models. By combining the two techniques, we achieve overall improved robustness and performance. Specifically, the SDRE technique is used in a number of ways to provide a) enhanced local stability of the system, b) an initial feasible control trajectory for use in the MPC optimization, c) a control Lyapunov function (CLF) for approximation of the cost-to-go used in the receding horizon optimization, and d) increased computational performance by approximating plant Jacobians used in the MPC optimization. Results are illustrated with an example involving control of a highly realistic helicopter model.

1. Introduction

Model predictive control (MPC) is an optimization-based framework for learning a stabilizing control sequence which minimizes a specified cost function [9, 8]. For general nonlinear systems, this requires a numerical optimization procedure involving iterative forward (and backward) simulations of the model dynamics. An initial guess of a stabilizing control sequence is essential for implementation and proper convergence. The resulting control sequence represents an open-loop control law, which can then be re-optimized on-line at periodic update intervals to improve robustness. Our approach to MPC utilizes a combination of an SDRE controller and an optimal NN controller. The SDRE technique [3, 4] is an improvement over traditional linearization-based Linear Quadratic (LQ) controllers (SDRE control will be elaborated on in Section 3.1).
This work was sponsored by DARPA under grant F33615-98-C-3516 with principal co-investigators Richard Kieburtz, Eric Wan, and Antonio Baptista. We also would like to express special thanks to Ying Long Zhang, Andy Moran and Magnus Carlsson for assistance with the Flightlab model.

In our framework, the SDRE controller provides an initial stabilizing controller which is augmented by a NN controller. The NN controller is optimized using a calculus of variations approach to minimize the MPC cost function. For simplicity, we will refer to the combined control system (SDRE+NN) as model predictive neural control (MPNC). Earlier related work on our approach is described in [11, 2]. We begin in the next section with additional details on the MPC and receding horizon framework.

2. MPC and Receding Horizon Control

The general MPC optimization problem involves minimizing a cost function

J(u) = sum_{k=t}^{t_f} L(x_k, u_k),    (1)

which represents an accumulated cost of the sequence of states x_k and controls u_k from the current discrete time t to the final time t_f. Optimization is done with respect to the control sequence, subject to constraints of the system dynamics,

x_{k+1} = f(x_k, u_k).    (2)

As an example, L(x_k, u_k) = x_k^T Q x_k + u_k^T R u_k for regulation problems corresponds to the standard Linear Quadratic cost. For linear systems, this leads to linear state-feedback control, which is found by solving a Riccati equation [5]. In this paper we consider general MIMO nonlinear systems with tracking error costs of the form

L(x_k, u_k) = e_k^T Q e_k + u_k^T R u_k + ubar_k^T R_sat ubar_k,

where e_k = x_k - x_k^des, with x_k^des corresponding to a desired reference state trajectory. The last term provides a penalty for control saturation, where each element ubar_k(i) (i = 1, ..., m, with m the dimension of the control vector u_k) of the vector ubar_k is defined as

ubar_k(i) = u_k(i) - u_max(i) sign(u_k(i)),  if |u_k(i)| > u_max(i),
ubar_k(i) = 0,  otherwise.    (3)

In general, a numerical optimization approach is used to solve for the sequence of controls {u_k}, corresponding to an open-loop control law, which can then be re-optimized on-line at periodic update intervals. The complexity of the approach is a function of the final time t_f, which determines the length of the optimal control sequence. In practice, we can reduce the number of computations by taking a Receding Horizon (RH) approach, in which optimization is performed over a shorter fixed-length time interval. This is accomplished by rewriting the cost function as

J = sum_{k=t}^{t+N-1} L(x_k, u_k) + V(e_{t+N}),    (4)

where the last term V(e_{t+N}) denotes the cost-to-go from time t+N to time t_f. The advantage is that this yields an optimization problem of fixed length N. In practice, the true cost-to-go is unknown and must be approximated. Most common is to simply set V = 0; however, this may lead to reduced stability and poor performance for short horizon lengths [7]. Alternatively, we may include a control Lyapunov function (CLF), which guarantees local stability if the CLF is an upper bound on the cost-to-go [7]. In Section 3.2, we will use the solution to the SDRE controller to form the cost-to-go function.
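As a minimal, illustrative sketch (not the authors' implementation) of the tracking stage cost and the saturation excess of Equation 3; the weighting matrices, dimensions, and control limits below are placeholder assumptions:

```python
import numpy as np

def saturation_excess(u, u_max):
    """Element-wise saturation excess of Equation 3: zero inside the limits,
    u - u_max*sign(u) for channels that exceed them."""
    return np.where(np.abs(u) > u_max, u - u_max * np.sign(u), 0.0)

def stage_cost(e, u, Q, R, R_sat, u_max):
    """Tracking stage cost: e'Q e + u'R u + ubar'R_sat ubar."""
    ubar = saturation_excess(u, u_max)
    return e @ Q @ e + u @ R @ u + ubar @ R_sat @ ubar

# Illustrative call with placeholder weights and limits (not the paper's values).
Q, R, R_sat = np.eye(12), 0.1 * np.eye(4), 10.0 * np.eye(4)
u_max = np.array([10.0, 8.0, 8.0, 15.0])       # hypothetical deflection limits [deg]
e, u = np.zeros(12), np.array([12.0, 1.0, -9.0, 0.5])
print(stage_cost(e, u, Q, R, R_sat, u_max))    # channels exceeding their limits are penalized
```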

3. Neural and SDRE Receding Horizon Control

Difficulties with MPC include a) the need for a good initial sequence of controls that is capable of stabilizing the model, and b) the need to re-optimize at short intervals to avoid problems associated with open-loop control. We address these issues by directly implementing a feedback controller as a combination of an SDRE stabilizing controller and a neural controller:

u_k = u_k^sd + u_k^nn,    (5)

where u_k^sd is the SDRE feedback control and u_k^nn is the output of the neural network. The SDRE controller provides a robust stabilizing control (as established via simulation), while the weights of the neural network are optimized to minimize the overall receding horizon MPC cost. Each of these components is detailed in the following sections.
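As a minimal sketch of the composite feedback law in Equation 5 (the nn_forward and sdre_gain callables are placeholders for the components described in Sections 3.1 and 3.2, not the authors' code):

```python
def composite_control(x, x_des, nn_forward, sdre_gain, w):
    """Equation 5: u_k = u_k^sd + u_k^nn."""
    e = x - x_des                 # tracking error e_k = x_k - x_k^des
    u_sd = -sdre_gain(x) @ e      # SDRE feedback (Equation 8)
    u_nn = nn_forward(x, w)       # NN correction (near zero for small weights)
    return u_sd + u_nn
```

Initializing the NN weights near zero makes u_k^nn negligible, so the SDRE term dominates at the start of training, consistent with the initialization strategy discussed in Section 3.2.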

3.1. SDRE Controller

Referring to the system state-space Equation 2, an SDRE controller [3] is designed by reformulating f(x_k, u_k) as

f(x_k, u_k) = A(x_k) x_k + B(x_k) u_k.    (6)

This representation is not a linearization. To illustrate the principle, consider a simple scalar example: if f(x) = x^3, then a(x) = x^2 gives f(x) = a(x) x exactly. This yields the resulting system

x_{k+1} = A(x_k) x_k + B(x_k) u_k.    (7)

Based on this new state-space representation, we design an optimal LQ controller to track the desired state x_k^des. This leads to the nonlinear controller

u_k^sd = -K(x_k) e_k,    (8)

where the gain K(x_k) is computed from P(x_k), a solution of the standard Riccati equations using the state-dependent matrices A(x_k) and B(x_k), which are treated as being constant. The procedure is repeated at every time step at the current state x_k and provides local asymptotic stability of the plant [3]. In practice, the approach has been found to be far more robust than LQ controllers based on standard linearization techniques.
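As an illustration of the SDRE design step (Equations 6-8), the sketch below freezes A(x) and B(x) at the current state, solves a discrete-time algebraic Riccati equation with SciPy, and forms the feedback gain; the discrete-time gain formula and the generic factorization interface are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def sdre_step(x, e, A_of_x, B_of_x, Q, R):
    """One SDRE step, repeated at every time instant (Section 3.1).

    A_of_x, B_of_x: callables returning the state-dependent factorization of
    Equation 6, evaluated (and then treated as constant) at the current state x.
    Returns u_sd = -K(x) e and the Riccati solution P(x).
    """
    A, B = A_of_x(x), B_of_x(x)
    P = solve_discrete_are(A, B, Q, R)
    # Discrete-time LQ gain: K = (R + B'PB)^-1 B'PA.
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return -K @ e, P
```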

3.2. Neural Network Controller

The overall flowgraph of the MPNC system is shown in Figure 1. Optimization is performed by learning the weights w of the NN in order to minimize the receding horizon MPC cost function (Equation 4) subject to the system dynamics and the composite form of the overall feedback controller (Equation 5). The problem is solved by taking a standard calculus of variations approach, where lambda_k and mu_k are vectors of Lagrange multipliers in the augmented cost function

J_a = V(e_{t+N}) + sum_{k=t}^{t+N-1} [ L(x_k, u_k) + lambda_{k+1}^T (f(x_k, u_k) - x_{k+1}) + mu_k^T (u_k^sd + u_k^nn - u_k) ].    (9)

The cost-to-go V(e_{t+N}) is approximated using the solution of the SDRE at time t+N,

V(e_{t+N}) = alpha e_{t+N}^T P(x_{t+N}) e_{t+N},    (10)

where alpha is a constant. This CLF provides the exact cost-to-go for regulation assuming a linear system at the horizon length. A similar formulation was used for nonlinear regulation control in [10]. We can now derive the recurrent Euler-Lagrange equations

lambda_k = (de_k/dx_k)^T 2Q e_k + (du_k/dx_k)^T (2R u_k + 2R_sat ubar_k) + (df/dx_k + df/du_k du_k/dx_k)^T lambda_{k+1},    (11)

du_k/dx_k = du_k^nn/dx_k - d(K(x_k) e_k)/dx_k,    (12)

with the terminal condition lambda_{t+N} = dV(e_{t+N})/dx_{t+N}.
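A minimal sketch of the SDRE-based cost-to-go of Equation 10 and the terminal condition of the adjoint recursion (the constant alpha and the symmetric Riccati solution are assumptions of the sketch):

```python
import numpy as np

def terminal_cost_and_multiplier(e_N, P_N, alpha=1.0):
    """Equation 10: V = alpha * e'P e with P the SDRE solution at the horizon.
    Its gradient, 2*alpha*P e (P symmetric), initializes lambda_{t+N} for the
    backward recursion of Equations 11-12."""
    V = alpha * e_N @ P_N @ e_N
    lam_N = 2.0 * alpha * P_N @ e_N
    return V, lam_N
```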

Figure 1: MPNC signal flow diagram (neural network and SDRE feedback paths combining into the plant control u_k, with the tracking error e_k formed from the target coordinates and velocities and the desired state x_k^des).


Figure 2: Adjoint system (backward pass through the plant, SDRE, desired-state, and neural network Jacobians used to accumulate the cost gradients).

Here, ubar_k is given by Equation 3, d(ubar_k^T R_sat ubar_k)/du_k = 2 ubar_k^T R_sat, and each element of the gradient vector dubar_k/du_k is 1 if |u_k(i)| > u_max(i) and 0 otherwise, for i = 1, ..., m (the dimension of the control vector u_k). These equations correspond to the adjoint system shown graphically in Figure 2, with optimality condition

dJ_a/dw = sum_{k=t}^{t+N-1} (du_k^nn/dw)^T [ 2R u_k + 2R_sat ubar_k + (df/du_k)^T lambda_{k+1} ] = 0.    (13)

The overall training procedure for the NN can now be summarized as follows:

1. Simulate the system forward in time for N time steps (Figure 1). Note that the SDRE controller is updated at each time step.

2. Run the adjoint system backward in time to accumulate the Lagrange multipliers (Figure 2). Jacobians are evaluated analytically or by perturbation; in practice, the plant Jacobians are computed by perturbation.

3. Update the weights using gradient descent^1.

4. Repeat until convergence or until an acceptable level of cost reduction is achieved.

^1 In practice we use an adaptive learning rate for each weight in the network, using a procedure similar to delta-bar-delta [6].

The NN is initialized with small weights, which allows the SDRE controller to initially dominate, providing stable tracking and good conditions for training. As training progresses, the NN decreases the tracking error. This reduces the SDRE control output, which in turn gives more authority to the neural controller. The training process is repeated at the update interval to recompute new weights for the next horizon. A sketch of this forward/backward procedure follows.
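The following is a minimal sketch of that procedure as backpropagation through time. The plant step and Jacobians (f, f_jac), the neural network and its Jacobians (nn, nn_jac_x, nn_jac_w), and the SDRE quantities (K_of_x, P_of_x) are placeholder callables (assumptions, not the paper's code); the gradient expressions mirror Equations 11-13, with de_k/dx_k taken as the identity, alpha = 1, and the SDRE Jacobian approximated by K(x) as in simplification 1 of Section 3.3.

```python
import numpy as np

def train_one_horizon(x0, x_des, f, f_jac, nn, nn_jac_x, nn_jac_w,
                      K_of_x, P_of_x, Q, R, R_sat, u_max, w,
                      lr=1e-3, epochs=10):
    """Steps 1-4 of the NN training procedure for a single receding horizon."""
    N = len(x_des) - 1
    for _ in range(epochs):
        # Step 1: forward simulation with the composite controller (Equation 5).
        xs, us, es = [x0], [], []
        for k in range(N):
            e = xs[k] - x_des[k]
            u = -K_of_x(xs[k]) @ e + nn(xs[k], w)
            xs.append(f(xs[k], u)); us.append(u); es.append(e)

        # Step 2: backward (adjoint) pass, Equations 11-13.
        lam = 2.0 * P_of_x(xs[N]) @ (xs[N] - x_des[N])   # terminal condition
        grad_w = np.zeros_like(w)
        for k in reversed(range(N)):
            A_k, B_k = f_jac(xs[k], us[k])               # df/dx_k, df/du_k
            sat = np.abs(us[k]) > u_max                  # saturated channels
            ubar = np.where(sat, us[k] - u_max * np.sign(us[k]), 0.0)
            dL_du = 2.0 * R @ us[k] + 2.0 * sat * (R_sat @ ubar)
            dL_dx = 2.0 * Q @ es[k]                      # de/dx ~= I assumed
            du_dx = nn_jac_x(xs[k], w) - K_of_x(xs[k])   # Equation 12 (approx.)
            grad_w += nn_jac_w(xs[k], w).T @ (dL_du + B_k.T @ lam)
            lam = dL_dx + du_dx.T @ dL_du + (A_k + B_k @ du_dx).T @ lam

        # Step 3: gradient-descent update of the NN weights.
        w = w - lr * grad_w
    return w
```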

Figure 3: Reference trajectory generation.

3.3. Computation considerations and approximations

MPC design implies numerous simulations of the system forward in time and of the adjoint system backward in time. Computations scale with the number of training epochs and the horizon length. The most computationally demanding operations correspond to solving the Riccati equation for the SDRE controller in the forward simulation, and the numeric (i.e., by perturbation) computation of the plant Jacobians in the backward simulation. In order to approach real-time feasibility, we consider possible trade-offs between computational effort and controller performance through a number of successive simplifications (a caching sketch is given at the end of this subsection):

1. The plant Jacobian with respect to x_k is approximated as df/dx_k ≈ A_d(x_k), i.e., we use the state-dependent transition matrix found analytically from the model. In addition, the SDRE Jacobian is approximated as d(K(x_k) e_k)/dx_k ≈ K(x_k) de_k/dx_k, where de_k/dx_k can then be calculated analytically as given in Equation 15.

2. Same as (1), plus the discrete system matrix A_d(x_k) is memorized at each time step during the first epoch and then used for all subsequent epochs within the current horizon. Here we assume that the control sequence, and thus the Jacobians, do not change significantly during training.

3. Same as (2), plus the matrices K(x_k) and P(x_k) are memorized for the first epoch, and again used for all subsequent epochs. This simplification allows us to avoid re-solving the SDRE in the corresponding forward simulations.

4. Same as (3), plus all matrices that were calculated in the previous horizon are re-used in the current horizon for the time segment over which the two overlap. For example, if the horizon length is 10 and the update interval is 2, then there will be an overlap of 8 time steps, and only the last 2 time steps require computation of new matrices.

The computational versus performance trade-offs associated with these simplifications are evaluated in the context of helicopter control in Section 4.4.
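A minimal sketch of the memoization idea behind simplification levels 2-4: state-dependent matrices computed during the first training epoch are cached per time step and reused for later epochs, and cached entries from the previous horizon are shifted and reused where the horizons overlap. The class, method names, and interface are placeholder assumptions.

```python
class SdreCache:
    """Per-time-step cache of state-dependent matrices for one receding horizon
    (simplification levels 2-4 of Section 3.3)."""
    def __init__(self, update_interval, compute_matrices):
        self.shift = update_interval
        self.compute = compute_matrices   # callable: (x, k) -> (A_d, B, K, P)
        self.store = {}                   # time step k -> (A_d, B, K, P)

    def get(self, x, k):
        # Levels 2-3: matrices are computed once (the first epoch in which
        # step k is visited) and reused for all later epochs.
        if k not in self.store:
            self.store[k] = self.compute(x, k)
        return self.store[k]

    def advance_horizon(self):
        # Level 4: entries from the previous horizon are shifted and reused for
        # the overlapping segment; only the final `update_interval` steps of the
        # new horizon require fresh computations.
        self.store = {k - self.shift: v for k, v in self.store.items()
                      if k - self.shift >= 0}
```

With a horizon of 10 and an update interval of 2, advance_horizon keeps 8 shifted entries, matching the overlap example given in simplification 4.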

4. Application to Helicopter Control

We develop helicopter control through the use of the FlightLab simulator [1]. FlightLab is a commercial software product developed by Advanced Rotorcraft Technologies. It includes modeling of the main rotor, tail rotor and fuselage flow interference, effects of dynamic stall, transonic flow, aeroelastic response, dynamic loads, vortex wake, and blade element aerodynamics, and can also provide finite element structural analysis. For our research purposes, we generated a high-fidelity helicopter model with a rigid fuselage, flexible blades, quasi-unsteady airloads and 3-state inflow. The model is presented as a numerical discrete-time nonlinear system with 76 internal states.

A challenge in using flight simulators such as FlightLab to design controllers is that the governing dynamic equations are not readily available (i.e., the aircraft represents a black-box model). This precludes the use of most traditional nonlinear control approaches that require an analytic model. To utilize the MPNC approach, the numeric simulator model is approximated by a 6DOF rigid body dynamic model, providing the set of governing equations at each time instance necessary to design the SDRE. The neural network controller, however, is still trained to minimize the MPC cost based on the full nonlinear simulator model. Details of this are discussed in the following section.

4.1. Helicopter and FlightLab Design Considerations

For helicopter control, we define the state x_k to correspond to the standard states of a 6DOF rigid body model. This 12-dimensional state vector consists of Cartesian coordinates in the inertial frame, Euler angles (yaw, roll, pitch), and linear and angular velocities in the body coordinate frame. Technically, this represents a reduced state-space, as our FlightLab model utilizes a total of 76 internal states (e.g., rotor states). However, we treat the additional states as both unobservable and uncontrollable for purposes of deriving the controller. There are 4 control inputs, corresponding to the main collective, lateral cyclic, longitudinal cyclic, and tail collective (incidence angles of the blades)^2. While the dynamics of the control mechanisms are simulated (and thus accounted for in the MPC design), the explicit states associated with these dynamics are again not utilized in our derivations.

The tracking error for the helicopter, e_k, is determined by the trajectory of a reference target x_k^tar.

^2 We set control constraints as follows: maximum deflections (in degrees) are imposed on each of the four control inputs.

The target specifies desired coordinates and velocities in the inertial frame (with roll, pitch and angular velocities set to zero). The reference state is then projected into the body frame to produce the desired state x_k^des, using an appropriate projection matrix consisting of the necessary rotation operators. Minimization of this error causes the helicopter to move in the direction of the target motion (see Figure 3).

4.2. Specifics of the SDRE Controller

As stated earlier, the SDRE controller requires a set of governing equations, which are not available in FlightLab. Thus we derive a 6DOF rigid body model as an analytical approximation. The simplified dynamics are given by

r_dot = C(Theta) v,
Theta_dot = Phi(Theta) omega,
m v_dot = F - m (omega × v) + m C(Theta)^T g,
J omega_dot = M - omega × (J omega),    (14)

where r denotes the inertial-frame coordinates, Theta the Euler angles, v and omega the body-frame linear and angular velocities, C(Theta) is a rotation matrix (coordinate transformation) from body frame to inertial frame, Phi(Theta) the Euler-angle kinematic matrix, g the gravity vector, J the inertia matrix, m is the aircraft mass, and F and M are rotor-induced forces and moments. The forces and moments are nonlinear functions of the helicopter states and control inputs. We then rewrite this into an SDRE continuous canonical representation, x_dot = A(x) x + B(x) u. The matrix A(x) is given explicitly in Figure 4. The discrete matrix A_d(x) is then obtained from A(x) by discretization at each time step (e.g., a first-order approximation over the sampling interval)^3.

Since the nonlinear mapping of states and control inputs to rotor-induced forces and moments is not known, B(x) cannot be explicitly found. Thus we approximate B by linearizing the full FlightLab model with respect to the control inputs around an equilibrium point in hover at the current altitude or appropriate trim state. This is accomplished numerically by successively perturbing each control input to the simulator at the current state and time. Finally, given A_d and B, we can design the SDRE control gain K(x_k) at each time step. Note that while A_d(x) is based on the 6DOF model, the state argument x_k comes directly from the FlightLab model. We have found that this mixed approach, using the approximate model plus the linearized control matrix B, is far more robust than simply using a standard LQ approach based on linearization for all system matrices.

Figure 4: State-dependent system matrix representation (s and c denote sin and cos of the Euler angles).

^3 Parameter settings (moments of inertia in lb ft^2, weights in lb, and lengths in ft) are chosen to match the FlightLab model.
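The control-input perturbation and the discretization of A(x) can be sketched as follows; the one-step simulator interface, perturbation size, and first-order discretization rule are placeholder assumptions for illustration.

```python
import numpy as np

def linearize_controls(step, x, u0, du=1e-3):
    """Estimate the control matrix by perturbing each control channel of the
    black-box one-step simulator `step(x, u) -> x_next` (an assumed interface)."""
    x_nom = step(x, u0)
    B = np.zeros((x_nom.size, u0.size))
    for i in range(u0.size):
        u_pert = u0.copy()
        u_pert[i] += du
        B[:, i] = (step(x, u_pert) - x_nom) / du
    return B

def discretize_A(A, dt):
    """First-order discretization of the continuous state-dependent matrix:
    A_d(x) ~ I + A(x)*dt (one common choice; the paper's exact rule is not stated)."""
    return np.eye(A.shape[0]) + A * dt
```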

4.3. Specifics of the Neural Network Controller

The neural controller is designed using the full FlightLab helicopter model and augments the SDRE controller. The overall flowgraph of the system is consistent with that shown in Figure 1.

Figure 5: Test trajectory (position in the XYZ space, in ft): a) SDRE, b) MPNC.

Figure 6: Collective control (main rotor collective angle [deg] versus time [s]): a) SDRE, b) MPNC.

However, the neural controller is specified with an extended input vector, in which we include sines and cosines of the yaw, pitch, and roll angles. This is motivated since the helicopter dynamics depend on such trigonometric functions of the Euler angles. Coordinates of the aircraft in the inertial frame do not influence the dynamics, and are excluded as inputs.

The adjoint system (see Figure 2) also requires a slight modification to incorporate the effects of the tracking error. Specifically, the term de_k/dx_k appearing in Equation 11 is given by

de_k/dx_k = I - dx_k^des/dx_k,    (15)

where the partial dx_k^des/dx_k is evaluated analytically by specifying the desired state x_k^des as an explicit function of the current state through the body-frame projection described in Section 4.1. All other aspects of the NN training (e.g., approximation of the cost-to-go, computational considerations, etc.) are as described previously.

4.4. Simulation Results

Figure 5 shows a test trajectory for the helicopter (vertical rise, forward flight, left turn, u-turn, forward flight to hover). The figure compares tracking performance at a velocity of 20 for the MPNC system (SDRE+NN) versus a standard SDRE controller (with fixed MPNC settings for the horizon length, update interval, number of training epochs, and sampling time). Note that a standard LQ controller based on linearization exhibits loss of tracking and crashes for velocities above 12. The smaller tracking error for the MPNC controller is apparent (note the reduced overshooting and oscillations at mode transitions). Figure 6 shows that the control effort spent by the MPNC controller is also less. Table 1 illustrates the trade-offs between computational effort and control performance (accumulated cost) for the simplifications discussed in Section 3.3. Clearly, substantial speed-up in simulation time can be achieved with only a minor loss of performance (while we have included representative simulation times for relative comparison, the experiments were performed in MATLAB, with the stand-alone vehicle model generated by the FlightLab simulator, and were not optimized for efficient implementation).

Table 1: Simplification levels vs. computing time and performance costs (Pentium-3 750 MHz, Linux).

Simp. level    Cost    Sim. time
Accurate       5.85    34 hr
1              6.15    3.5 hr
2              7.40    54 min
3              7.17    33 min
4              7.84    24 min

Note that all simplifications still result in a substantial improvement over the standard SDRE controller. For all subsequent simulations we use simplification level 3.

Finally, Table 2 summarizes comparisons of the accumulated cost with respect to the horizon length and MPC update interval, the number of neurons in the hidden layer, and with and without the use of the SDRE-based cost-to-go. The weights of the NN are trained for 10 epochs (number of simulated trajectories) for each horizon.

Table 2: Performance cost comparisons.

Configurations compared: 50 and 200 hidden neurons; horizon/update intervals of 10/2, 25/5, and 50/10; MPNC with and without the SDRE cost-to-go.
Accumulated costs: 8.29, 10.31, 7.56, 7.17, 8.95, 7.47, 9.58, 10.78, 6.21, 8.50, 31.28, 8.76.

Missing data in the table corresponds to a case where the states and control inputs exceeded the envelope of FlightLab model consistency. Results indicate that the optimal horizon length is between 10 and 25 time steps (1-2 seconds). The importance of the cost-to-go function is apparent for short horizon lengths. On the other hand, inclusion of the cost-to-go does not appear to help for longer horizons. Overall, significant performance improvement is clearly achieved with the MPNC controller relative to the pure SDRE controller.

5. Conclusions

In this paper, we have presented a new approach to receding horizon MPC based on a NN feedback controller in combination with an SDRE controller. The approach exploits both a sophisticated numerical model of the vehicle (FlightLab) and its analytical nonlinear approximation (the 6DOF model). The NN is optimized using the full FlightLab simulator to minimize the MPC cost, while the SDRE controller is designed using the approximate model and provides a baseline stabilizing control trajectory. In addition, we considered a number of simplifications in order to improve the computational requirements of the approach. Overall, the results verify the superior performance of the approach over traditional SDRE (and LQ) control. Future work includes incorporation of a vehicle-environment interaction model and the use of environmental short-term forecasts for improved flight control.

6. References

[1] FlightLab release note, version 2.8.4. Advanced Rotorcraft Technology, Inc., 1999.
[2] A. A. Bogdanov and E. A. Wan. Model predictive neural control of a high-fidelity helicopter model. Submitted to the AIAA Guidance, Navigation and Control Conference, Montreal, Quebec, Canada, August 2001.
[3] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 1, Theory. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.
[4] J. R. Cloutier, C. N. D'Souza, and C. P. Mracek. Nonlinear regulation and nonlinear H-infinity control via the state-dependent Riccati equation technique: Part 2, Examples. In Proceedings of the International Conference on Nonlinear Problems in Aviation and Aerospace, Daytona Beach, FL, May 1996.
[5] G. F. Franklin, J. D. Powell, and M. L. Workman. Digital Control of Dynamic Systems. Addison-Wesley, Reading, MA, second edition, 1990.
[6] R. A. Jacobs. Increasing rates of convergence through learning rate adaptation. Neural Networks, 1(4):295-307, 1988.
[7] A. Jadbabaie, J. Yu, and J. Hauser. Stabilizing receding horizon control of nonlinear systems: a control Lyapunov function approach. In Proceedings of the American Control Conference, 1999.
[8] E. S. Meadows and J. B. Rawlings. Nonlinear Process Control, chapter: Model predictive control. Prentice Hall, 1997.
[9] S. J. Qin and T. A. Badgwell. An overview of industrial model predictive control technology. Chemical Process Control, AIChE Symposium Series, pages 232-256, 1997.
[10] M. Sznaier, J. Cloutier, R. Hull, D. Jacques, and C. Mracek. Receding horizon control Lyapunov function approach to suboptimal regulation of nonlinear systems. Journal of Guidance, Control, and Dynamics, 23(3):399-405, May-June 2000.
[11] E. A. Wan and A. A. Bogdanov. Model predictive neural control with applications to a 6 DoF helicopter model. In Proceedings of the IEEE American Control Conference, Arlington, VA, June 2001.
