SAAB2001 - On A Discrete Time Stochastic Learning Control Algorithm

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 46, NO. 8, AUGUST 2001

On a Discrete-Time Stochastic Learning Control Algorithm

Samer S. Saab

Abstract: In an earlier paper, the learning gain for a D-type learning algorithm is derived based on minimizing the trace of the input error covariance matrix for linear time-varying systems. It is shown that, if the product of the input/output coupling matrices is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of uncorrelated random disturbances, whereas the state error covariance matrix converges uniformly to zero in the presence of measurement noise. However, in general, the proposed algorithm requires knowledge of the state matrix. In this note, it is shown that equivalent results can be achieved without knowledge of the state matrix. Furthermore, the convergence rate of the input error covariance matrix is shown to be inversely proportional to the number of learning iterations.

Index Terms: Iterative learning control, stochastic control.

Manuscript received June 5, 2000; revised November 16, 2000 and March 9, 2001. This work was supported by the University Research Council at the Lebanese American University.
The author is with the Department of Electrical and Computer Engineering, Lebanese American University, Byblos, Lebanon.
Publisher Item Identifier S 0018-9286(01)07686-3.
0018-9286/01$10.00 © 2001 IEEE

NOMENCLATURE
$K_k$ and $P_{u,k}$: Learning gain and input error covariance matrices resulting from the optimal control algorithm presented in [1].
$\bar{K}_k$ and $\bar{P}_{u,k}$: Learning gain and sequence of matrices (analogous to $P_{u,k}$) used to define the modified learning algorithm proposed in this note.
$\tilde{P}_{u,k}$: Actual input error covariance matrices resulting from the proposed (modified) learning algorithm.

I. PRELIMINARY

The system considered in [1] is a discrete-time-varying linear system described by the following difference equation:

$$x(t+1,k) = A(t)\,x(t,k) + B(t)\,u(t,k) + w(t,k)$$
$$y(t,k) = C(t)\,x(t,k) + v(t,k) + v_b(k) \qquad (1)$$

where
$t \in [0, n_t]$;
$x(t,k) \in \mathbb{R}^n$;
$u(t,k) \in \mathbb{R}^p$;
$w(t,k) \in \mathbb{R}^n$;
$y(t,k) \in \mathbb{R}^q$;
$v(t,k) \in \mathbb{R}^q$;
$v_b(k)$: undesired bias vector;
$A(t)$: state matrix;
$B(t)$: input coupling matrix;
$C(t)$: output coupling matrix.

The learning update is given by

$$u(t,k+1) = u(t,k) + K(t,k)\,[e(t+1,k) - e(t,k)] \qquad (2)$$

where $K(t,k)$ is the ($p \times q$) learning control gain matrix, and $e(t,k)$ is the output error; i.e., $e(t,k) = y_d(t) - y(t,k)$, where $y_d(t)$ is a realizable desired output trajectory. It is assumed that, for any realizable output trajectory and an appropriate initial condition $x_d(0)$, there exists a unique control input $u_d(t) \in \mathbb{R}^p$ generating the trajectory for the nominal plant. That is, the following difference equation is satisfied:

$$x_d(t+1) = A(t)\,x_d(t) + B(t)\,u_d(t)$$
$$y_d(t) = C(t)\,x_d(t). \qquad (3)$$

Define the state and the input error vectors as $\delta x(t,k) = x_d(t) - x(t,k)$ and $\delta u(t,k) = u_d(t) - u(t,k)$, respectively. It is assumed that the initial state error $\delta x(0,k)$, the initial input error $\delta u(t,0)$, the state disturbance $w(t,k)$, and the unbiased measurement error $v(t,k)$ are all modeled as zero-mean white Gaussian noise and are statistically independent. The input error and state error covariance matrices are defined as $P_{u,k} = E[\delta u(t,k)\,\delta u(t,k)^T]$ and $P_{x,k} = E[\delta x(t,k)\,\delta x(t,k)^T]$, respectively, where $E$ is the expectation operator. It is shown in [1] that the learning gain which minimizes the trace of the input error covariance matrix is given by

$$K_k = P_{u,k}(C^+B)^T\big[(C^+B)P_{u,k}(C^+B)^T + (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}\big]^{-1} \qquad (4)$$

where the argument $t$ is dropped for compactness, and $C^+ = C(t+1)$. The corresponding input error covariance update is given by

$$P_{u,k+1} = (I - K_kN)\,P_{u,k} \qquad (5)$$
$$= \big[I + P_{u,k}N^TS_{1,k}^{-1}N\big]^{-1}P_{u,k} \qquad (6)$$

where $S_{1,k} = (C - C^+A)P_{x,k}(C - C^+A)^T + C^+Q_tC^{+T} + R_t + R_{t+1}$ and $N = C^+B$. The corresponding results are summarized as follows. If $C(t+1)B(t)$ is full-column rank, then the learning algorithm presented by (2), (4), and (5) guarantees the following.

2) $P_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$. Moreover, the eigenvalues of $(I - K_kC^+B)$ are positive and strictly less than one; i.e., $0 < \lambda(I - K_kC^+B) < 1$ $\forall k$ and $t \in [0, n_t]$. Consequently, there exists a consistent norm $\|\cdot\|$ such that $\|I - K_kC^+B\| < 1$ $\forall k$ and $t \in [0, n_t]$.
3) $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$. In addition, $P_{u,k} \to 0$ and $K_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.
4) In the absence of state disturbance and reinitialization errors (excluding biased measurement noise, so that $R_t$ is positive definite), $\|P_{u,k+1}\| < \|P_{u,k}\|$ $\forall k$; moreover, $P_{u,k} \to 0$, $K_k \to 0$, and the state error covariance matrix $P_{x,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.

II. MAIN RESULTS

In this section, the modified, or suboptimal, stochastic learning control algorithm and its convergence characteristics are presented. Consider the modified learning gain matrix given by

$$\bar{K}_k = \bar{P}_{u,k}N^T\big[N\bar{P}_{u,k}N^T + S_2\big]^{-1} \qquad (7)$$

where $S_2 = C^+Q_tC^{+T} + R_t + R_{t+1}$, and let the matrix $\bar{P}_{u,k}$ be updated by the recursion

$$\bar{P}_{u,k+1} = (I - \bar{K}_kN)\,\bar{P}_{u,k}\,(I - \bar{K}_kN)^T + \bar{K}_kS_2\bar{K}_k^T$$
$$= \bar{P}_{u,k} - \bar{K}_kN\bar{P}_{u,k} - \bar{P}_{u,k}N^T\bar{K}_k^T + \bar{K}_k(N\bar{P}_{u,k}N^T + S_2)\bar{K}_k^T.$$
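In the scalar case $p = q = 1$, the gain (7) and the recursion above can be iterated directly. The following is a minimal sketch under stated assumptions, not the author's implementation: the numeric values of $N$, $S_2$, and $\bar{P}_{u,0}$ are hypothetical, and the function names `modified_gain`, `symmetric_update`, and `run` are introduced here for illustration only. At each step it checks that, with the gain (7), the symmetric update collapses to $(I - \bar{K}_kN)\bar{P}_{u,k}$, and it prints the iterates next to the scalar form of the closed-form expression of Claim 1 and the bound $c_1/k$ of Theorem 2.

```python
import math


def modified_gain(p_bar: float, n: float, s2: float) -> float:
    """Scalar form of the modified gain (7): Pbar * N / (N * Pbar * N + S2)."""
    return p_bar * n / (n * p_bar * n + s2)


def symmetric_update(p_bar: float, k_bar: float, n: float, s2: float) -> float:
    """Scalar form of (1 - Kbar*N) * Pbar * (1 - Kbar*N) + Kbar * S2 * Kbar."""
    a = 1.0 - k_bar * n
    return a * p_bar * a + k_bar * s2 * k_bar


def run(p0: float, n: float, s2: float, iterations: int) -> list[float]:
    """Iterate gain (7) and the update; return [Pbar_0, ..., Pbar_iterations]."""
    history = [p0]
    p_bar = p0
    for _ in range(iterations):
        k_bar = modified_gain(p_bar, n, s2)
        p_next = symmetric_update(p_bar, k_bar, n, s2)
        # With gain (7), the symmetric form collapses to (1 - Kbar*N) * Pbar.
        assert math.isclose(p_next, (1.0 - k_bar * n) * p_bar, rel_tol=1e-9)
        p_bar = p_next
        history.append(p_bar)
    return history


if __name__ == "__main__":
    N, S2, P0 = 2.0, 0.5, 1.0  # hypothetical scalar values
    c1 = S2 / N**2  # scalar form of 1 / lambda_min(N^T S2^{-1} N)
    hist = run(P0, N, S2, iterations=50)
    for k in (1, 10, 50):
        closed_form = P0 / (1.0 + k * P0 * N**2 / S2)  # scalar form of (9)
        print(k, hist[k], closed_form, c1 / k)
```

With $N = 2$, $S_2 = 0.5$, and $\bar{P}_{u,0} = 1$, the scalar closed form reads $\bar{P}_{u,k} = 1/(1 + 8k)$, which stays just below $c_1/k = 0.125/k$.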
Note that this learning gain can no longer be claimed to be an optimal gain matrix. Substituting the value of $\bar{K}_k$ from (7) into the last equality of the recursion for $\bar{P}_{u,k+1}$, we get

$$\bar{P}_{u,k+1} = \bar{P}_{u,k} - \bar{P}_{u,k}N^T(N\bar{P}_{u,k}N^T + S_2)^{-1}N\bar{P}_{u,k} - \bar{P}_{u,k}N^T(N\bar{P}_{u,k}N^T + S_2)^{-1}N\bar{P}_{u,k} + \bar{P}_{u,k}N^T(N\bar{P}_{u,k}N^T + S_2)^{-1}N\bar{P}_{u,k}$$
$$= (I - \bar{K}_kN)\,\bar{P}_{u,k}.$$

Making use of [1, Claim 1], we have

$$\bar{P}_{u,k+1} = (I - \bar{K}_kN)\,\bar{P}_{u,k} = \big[I + \bar{P}_{u,k}N^TS_2^{-1}N\big]^{-1}\bar{P}_{u,k}. \qquad (8)$$

It is worthwhile noting that, by eliminating $(C - C^+A)P_{x,k}(C - C^+A)^T$, the matrix $S_{1,k}$ becomes $S_2$; consequently, the update $\bar{P}_{u,k+1} = (I - \bar{K}_kN)\bar{P}_{u,k}$ and (8) are consistent with (5) and (6), respectively. Since the term $(C - C^+A)P_{x,k}(C - C^+A)^T$ is eliminated from the learning gain $\bar{K}_k$ and from the update of the error covariance matrix, 1) knowledge of the state matrix is no longer needed, and 2) $\bar{P}_{u,k}$ no longer represents the true input error covariance matrix.

In the following, the convergence characteristics of $\bar{K}_k$ and $\bar{P}_{u,k}$ are shown to be equivalent to those of $K_k$ and $P_{u,k}$.

Theorem 1: If $N = C(t+1)B(t)$ is full-column rank, then the learning algorithm presented by (2), (7), and (8) guarantees the following:
2) $\bar{P}_{u,k}$ is a symmetric positive-definite matrix $\forall k$ and $t \in [0, n_t]$;
3) the eigenvalues of $I - \bar{K}_kN$ are positive and strictly less than one; i.e., $0 < \lambda(I - \bar{K}_kN) < 1$ $\forall k$ and $t \in [0, n_t]$;
4) $\|\bar{P}_{u,k+1}\| < \|\bar{P}_{u,k}\|$ $\forall k$. In addition, $\bar{P}_{u,k} \to 0$ and $\bar{K}_k \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.

The proofs of Theorem 1 are identical to the proofs of their counterparts presented in [1] and are thus omitted.

In what follows, we show that, as $k \to \infty$, $\bar{P}_{u,k} \to 0$ (and consequently $\bar{K}_k \to 0$) uniformly in $[0, n_t]$ if and only if the true input error covariance matrix $\tilde{P}_{u,k} \to 0$ uniformly in $[0, n_t]$. This fact underlines the main contribution of this note. In addition, we show that the rate of convergence is inversely proportional to the number of learning iterations.

Claim 1: If $N$ is full-column rank, then the learning algorithm presented by (2), (7), and (8) assures that

$$\bar{P}_{u,k} = \big[I + k\,\bar{P}_{u,0}N^TS_2^{-1}N\big]^{-1}\bar{P}_{u,0}. \qquad (9)$$

In the following, we denote by $\lambda(M)$ the eigenvalues of $M$.

Proof: The proof proceeds by induction. Using (8) for $k = 1$, we obtain

$$\bar{P}_{u,1} = \big[I + \bar{P}_{u,0}M\big]^{-1}\bar{P}_{u,0}$$

where $M = N^TS_2^{-1}N$ is a symmetric positive-definite matrix. Since (9) is true for $k = 1$, we assume that the equality is true for $k - 1$, i.e., $\bar{P}_{u,k-1} = [I + (k-1)\bar{P}_{u,0}M]^{-1}\bar{P}_{u,0}$. Again using (8), we get

$$\bar{P}_{u,k} = \big[I + \bar{P}_{u,k-1}M\big]^{-1}\bar{P}_{u,k-1}$$
$$= \big\{I + [I + (k-1)\bar{P}_{u,0}M]^{-1}\bar{P}_{u,0}M\big\}^{-1}[I + (k-1)\bar{P}_{u,0}M]^{-1}\bar{P}_{u,0}$$
$$= \big\{[I + (k-1)\bar{P}_{u,0}M]\,\big[I + [I + (k-1)\bar{P}_{u,0}M]^{-1}\bar{P}_{u,0}M\big]\big\}^{-1}\bar{P}_{u,0}$$
$$= \big\{I + (k-1)\bar{P}_{u,0}M + \bar{P}_{u,0}M\big\}^{-1}\bar{P}_{u,0}$$
$$= (I + k\,\bar{P}_{u,0}M)^{-1}\bar{P}_{u,0}.$$

Theorem 2: If $N$ is full-column rank and $\bar{P}_{u,0}$ is a symmetric positive-definite matrix, then the learning algorithm presented by (2), (7), and (8) guarantees that

$$\|\bar{P}_{u,k}\| < \frac{1}{k}c_1 \quad \text{and} \quad \|\bar{K}(t,k)\| < \frac{1}{k}c_2 \qquad (10)$$

where $c_1 \triangleq 1/\lambda_{\min}[\lambda(N^TS_2^{-1}N)]$ and $c_2 \triangleq c_1\|N\|\,\|S_2^{-1}\|$.

Proof: Since $\bar{P}_{u,k}$ is a symmetric positive-definite matrix, using the results of Claim 1, we have

$$\|\bar{P}_{u,k}\| = \big\|(\bar{P}_{u,0}^{-1} + kN^TS_2^{-1}N)^{-1}\big\| = 1/\lambda_{\min}\big[\lambda(\bar{P}_{u,0}^{-1} + kN^TS_2^{-1}N)\big].$$

Note that since $\bar{P}_{u,0}^{-1}$ is a symmetric positive-definite matrix, $\lambda_{\min}[\lambda(\bar{P}_{u,0}^{-1} + kN^TS_2^{-1}N)] > \lambda_{\min}[\lambda(kN^TS_2^{-1}N)] = k\,\lambda_{\min}[\lambda(N^TS_2^{-1}N)]$. Therefore

$$\|\bar{P}_{u,k}\| < \frac{1}{k\,\lambda_{\min}[\lambda(N^TS_2^{-1}N)]} = \frac{1}{k}c_1.$$

Using (7), we have

$$\|\bar{K}(t,k)\| < \frac{c_1}{k}\|N\|\,\big\|(N\bar{P}_{u,k}N^T + S_2)^{-1}\big\| \le \frac{c_1}{k}\|N\|\,\|S_2^{-1}\| = \frac{1}{k}c_2.$$

Theorem 3: Assuming that $N$ is a full-column rank matrix, the learning algorithm presented by (2), (7), and (8) guarantees that the input error covariance matrix $\tilde{P}_{u,k} \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$. Furthermore, the rate of convergence of $\tilde{P}_{u,k}$ is inversely proportional to $k$ for $k > 1$; that is, there exists a positive constant $c_P$ such that $\|\tilde{P}_{u,k}\| < c_P/k$. Conversely, if $\tilde{P}_{u,k} \to 0$ as $k \to \infty$ and $\|\tilde{P}_{u,k}\| < c_P/k$, then $\bar{P}_{u,k} \to 0$ as $k \to \infty$. Furthermore, if the rate of convergence of $\tilde{P}_{u,k}$ is inversely proportional to $k$, then the rate of convergence of $\bar{P}_{u,k}$ is also inversely proportional to $k$.

Proof: In [1], it is shown that for any given learning gain matrix $K_k$, in particular $K_k = \bar{K}_k$, the input error covariance matrix is given by

$$\tilde{P}_{u,k+1} \triangleq E[\delta u(t,k+1)\,\delta u(t,k+1)^T]$$
$$= (I - \bar{K}_kN)\tilde{P}_{u,k}(I - \bar{K}_kN)^T + \bar{K}_k\big[(C - C^+A)\tilde{P}_{x,k}(C - C^+A)^T + S_2\big]\bar{K}_k^T$$

where $\tilde{P}_{x,k}$ is the state error covariance matrix corresponding to $\bar{K}_k = \bar{P}_{u,k}N^T(N\bar{P}_{u,k}N^T + S_2)^{-1}$. Define $L_k \triangleq (C - C^+A)\tilde{P}_{x,k}(C - C^+A)^T$, which is symmetric and positive semidefinite. Employing the fact that

$$\bar{P}_{u,k+1} = (I - \bar{K}_kN)\bar{P}_{u,k}(I - \bar{K}_kN)^T + \bar{K}_kS_2\bar{K}_k^T$$

and subtracting the last two equations, we get

$$\Delta P_{u,k+1} \triangleq \tilde{P}_{u,k+1} - \bar{P}_{u,k+1} = (I - \bar{K}_kN)\,\Delta P_{u,k}\,(I - \bar{K}_kN)^T + \bar{K}_kL_k\bar{K}_k^T.$$
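The recursion for $\Delta P_{u,k}$ can be exercised numerically in the scalar case. In this sketch the state-dependent term $L_k$ is replaced by a hypothetical constant `l_const`; this is an assumption made only for illustration, since in the note $L_k$ depends on $\tilde{P}_{x,k}$ and is only known to be symmetric, positive semidefinite, and bounded. All numeric values and the name `simulate` are hypothetical. The true covariance $\tilde{P}_{u,k}$ is propagated with the same gain $\bar{K}_k$ but including $L_k$, while $\bar{P}_{u,k}$ ignores it, and $\Delta P_{u,0} = 0$.

```python
def simulate(
    p0: float, n: float, s2: float, l_const: float, iterations: int
) -> tuple[list[float], list[float], list[float]]:
    """Scalar sketch: return (p_bar, p_true, delta) histories for k = 0..iterations.

    l_const is a hypothetical stand-in for L_k = (C - C+ A) Ptilde_x (C - C+ A)^T.
    """
    p_bar, p_true, delta = p0, p0, 0.0  # Pbar_0 = Ptilde_0, so Delta P_0 = 0
    p_bar_hist, p_true_hist, delta_hist = [p_bar], [p_true], [delta]
    for _ in range(iterations):
        k_bar = p_bar * n / (n * p_bar * n + s2)  # scalar form of gain (7)
        a = 1.0 - k_bar * n
        # Recursion run by the algorithm: uses S2 only (no state matrix).
        p_bar = a * p_bar * a + k_bar * s2 * k_bar
        # Actual input error covariance under the same gain: adds L_k.
        p_true = a * p_true * a + k_bar * (l_const + s2) * k_bar
        # Difference recursion: Delta P_{k+1} = a * Delta P_k * a + Kbar * L * Kbar.
        delta = a * delta * a + k_bar * l_const * k_bar
        p_bar_hist.append(p_bar)
        p_true_hist.append(p_true)
        delta_hist.append(delta)
    return p_bar_hist, p_true_hist, delta_hist


if __name__ == "__main__":
    pb, pt, d = simulate(p0=1.0, n=2.0, s2=0.5, l_const=3.0, iterations=1000)
    for k in (1, 10, 100, 1000):
        # k * Ptilde_k stays bounded, consistent with the 1/k rate of Theorem 3.
        print(k, pt[k], pb[k] + d[k], k * pt[k])
```

With a constant $L_k = L$ in this scalar setting, one can check by induction that $\Delta P_{u,k} = k\,\bar{P}_{u,k}^2N^2L/S_2^2$, so $k\,\tilde{P}_{u,k}$ remains bounded (below $0.875$ for the values above), in line with the $1/k$ rate.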
Fig. 1. Top plot: … (solid), and … (dashed). Bottom plot: ….

Without loss of generality (see the subsequent remark), it is assumed that only at initialization we set $\bar{P}_{u,0} = \tilde{P}_{u,0}$, that is, $\Delta P_{u,0} = 0$. Define $D_k \triangleq \bar{K}_kL_k\bar{K}_k^T$. Iterating the recursion for $\Delta P_{u,k+1}$ up to $k - 1$, we have

$$\tilde{P}_{u,k} = \bar{P}_{u,k} + \Delta P_{u,k}$$

where

$$\Delta P_{u,k} = \sum_{i=0}^{k-1}\Big[\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N)\Big]D_i\Big[\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N)\Big]^T \qquad (11)$$

with $\prod_{j=k}^{k-1}(I - \bar{K}_{k-j}N) \triangleq I$. Note that since $D_i$ is a symmetric positive-semidefinite matrix, $\Delta P_{u,k}$ is also a symmetric positive-semidefinite matrix. Employing (8), we have

$$\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N)\,\bar{P}_{u,1} = \prod_{j=i+1}^{k-2}(I - \bar{K}_{k-j}N)\,\bar{P}_{u,2} = \cdots = (I - \bar{K}_{k-i-1}N)\,\bar{P}_{u,k-i-1} = \bar{P}_{u,k-i} = [I + (k-i)G]^{-1}\bar{P}_{u,0}$$

where $G \triangleq \bar{P}_{u,0}N^TS_2^{-1}N$, and (9) is used to obtain the last equality. Note that since $\bar{P}_{u,0}$ and $N^TS_2^{-1}N$ are symmetric positive-definite matrices, the eigenvalues of $G$ are strictly positive, that is, $\lambda(G) > 0$. Substituting $\bar{P}_{u,1} = (I + G)^{-1}\bar{P}_{u,0}$ into the left-hand side of the last equation and multiplying on the right by $\bar{P}_{u,0}^{-1}(I + G)$, we get

$$\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N) = [I + (k-i)G]^{-1}(I + G).$$

Since $\lambda_{\min}\{\lambda[I + (k-i)G]\} > \lambda_{\min}\{\lambda[(k-i)G]\} = (k-i)\,\lambda_{\min}[\lambda(G)]$, then

$$\big\|[I + (k-i)G]^{-1}(I + G)\big\| < \frac{\|I + G\|}{(k-i)\,\lambda_{\min}[\lambda(G)]}.$$

From the previous results, we have $\|\bar{K}_k\| < (1/k)c_2$, or $\|\bar{K}_i\| < (1/i)c_2$. Since $\|I - \bar{K}_kN\| < 1$ for all $k$, the boundedness of $\tilde{P}_{x,k}$ is guaranteed [1], which implies that there exists a positive constant $c_L$ such that $\|L_i\| \le c_L$. Therefore, $\|D_i\| < (1/i^2)c_2^2c_L$. Define $c_{c1} \triangleq (c_2^2c_L\|I + G\|^2)/\{\lambda_{\min}[\lambda(G)]\}^2$ and $c_{c2} \triangleq (\|D_0\|\,\|I + G\|^2)/\{\lambda_{\min}[\lambda(G)]\}^2$. Taking the norm on both sides of $\Delta P_{u,k}$ defined in (11) and using the derived norm bounds, we get

$$\|\Delta P_{u,k}\| < c_{c1}\sum_{i=1}^{k-1}\frac{1}{i^2(k-i)^2} + c_{c2}\frac{1}{k^2}.$$

Note that $\sum_{i=1}^{k-1}1/[i^2(k-i)^2]$ equals 1 and 0.5 for $k = 2$ and 3, respectively, and is less than $2/k$ for $k > 3$. This implies that, for $k > 1$, $\sum_{i=1}^{k-1}1/[i^2(k-i)^2] \le 2/k$. Define $c_{c3} \triangleq \max(2c_{c1}, c_{c2})$; then, for $k > 1$, we have

$$\|\Delta P_{u,k}\| < \frac{2}{k}c_{c3}.$$
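The elementary series estimate used in this step can be checked with exact rational arithmetic. The sketch below uses Python's standard `fractions.Fraction` to evaluate $\sum_{i=1}^{k-1}1/[i^2(k-i)^2]$ exactly, confirms the values 1 and 0.5 at $k = 2$ and 3, and checks the estimate $\sum \le 2/k$ for $1 < k \le 100$. The helper names `series` and `bound_holds` are hypothetical, and a finite check of this kind illustrates the estimate rather than proving it.

```python
from fractions import Fraction


def series(k: int) -> Fraction:
    """Exact value of sum_{i=1}^{k-1} 1 / (i^2 (k - i)^2)."""
    return sum(
        (Fraction(1, (i * (k - i)) ** 2) for i in range(1, k)),
        Fraction(0),
    )


def bound_holds(k_max: int) -> bool:
    """True if series(k) <= 2/k for every integer 1 < k <= k_max."""
    return all(series(k) <= Fraction(2, k) for k in range(2, k_max + 1))


if __name__ == "__main__":
    for k in (2, 3, 4, 10, 100):
        s = series(k)
        print(k, s, float(s), float(Fraction(2, k)))
    print("bound holds up to k = 100:", bound_holds(100))
```

For instance, at $k = 4$ the sum equals $41/144 \approx 0.285$, which is below $2/k = 0.5$ but above $1/k = 0.25$, so the factor 2 cannot simply be dropped for small $k$.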
Equation (11) implies that

$$\|\tilde{P}_{u,k}\| \le \|\bar{P}_{u,k}\| + \|\Delta P_{u,k}\|.$$

Therefore, by applying the learning algorithm presented by (2), (7), and (8), for which $\|\bar{P}_{u,k}\| < (1/k)c_1$ by Theorem 2, we have

$$\|\tilde{P}_{u,k}\| < \frac{1}{k}c_P$$

where $c_P \triangleq c_1 + 2c_{c3}$. Consequently, as $k \to \infty$, $\tilde{P}_{u,k} \to 0$. Conversely, (11) gives $\tilde{P}_{u,k} = \bar{P}_{u,k} + \Delta P_{u,k}$, where $\bar{P}_{u,k}$ is a symmetric positive-definite matrix and $\Delta P_{u,k}$ is a symmetric positive-semidefinite matrix; hence, in the spectral norm, $\|\bar{P}_{u,k}\| \le \|\tilde{P}_{u,k}\|$ and $\|\Delta P_{u,k}\| \le \|\tilde{P}_{u,k}\|$. Therefore, 1) if $\tilde{P}_{u,k} \to 0$ as $k \to \infty$, then $\bar{P}_{u,k} \to 0$ and $\Delta P_{u,k} \to 0$ as $k \to \infty$, and 2) if $\|\tilde{P}_{u,k}\| < c_P/k$, then there exist positive constants $c_{P1}$ and $c_{\Delta P1}$ such that $\|\bar{P}_{u,k}\| < c_{P1}/k$ and $\|\Delta P_{u,k}\| < c_{\Delta P1}/k$ for all $k > 1$.

Remark: When applying the proposed algorithm, it is intuitive to initially set $\bar{P}_{u,0}$ to the estimate of $\tilde{P}_{u,0}$. However, if this is not the case, then $\Delta P_{u,0} \neq 0$. Thus, $\Delta P_{u,k}$ in (11) becomes

$$\Delta P_{u,k} = \sum_{i=0}^{k-1}\Big[\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N)\Big]D_i\Big[\prod_{j=i+1}^{k-1}(I - \bar{K}_{k-j}N)\Big]^T + \Big[\prod_{j=0}^{k-1}(I - \bar{K}_{k-j-1}N)\Big]\Delta P_{u,0}\Big[\prod_{j=0}^{k-1}(I - \bar{K}_{k-j-1}N)\Big]^T.$$

For instance, to guarantee that $\Delta P_{u,k}$ is a symmetric positive-semidefinite matrix, it may be assumed that $\Delta P_{u,0}$ is also a symmetric positive-semidefinite matrix. Applying an argument similar to the original proof leads to the desired results.

Theorem 4: If $N$ is a full-column rank matrix, then, in the absence of state disturbance and reinitialization errors (excluding biased measurement noise), the learning algorithm presented by (2), (7), and (8) guarantees that the input error covariance matrix $\tilde{P}_{u,k} \to 0$ and the state error covariance matrix $\tilde{P}_{x,k} \triangleq E[\delta x(t,k)\,\delta x(t,k)^T] \to 0$ uniformly in $[0, n_t]$ as $k \to \infty$.

Proof: Theorem 1 implies that $\bar{P}_{u,k} \to 0$, and consequently Theorem 3 implies that $\tilde{P}_{u,k} \to 0$. The rest of the proof is similar to the proof of its counterpart in [1] and is thus omitted.

III. NUMERICAL EXAMPLE

The algorithm presented by (2), (7), and (8) is now applied to the same example given in [1]. The convergence characteristics of $\tilde{P}_{u,k}$, $\bar{P}_{u,k}$, and $\bar{K}_k$ are illustrated in Fig. 1. The top and bottom plots conform to the 10 dB/decade attenuation characteristics of $\tilde{P}_{u,k}$, $\bar{P}_{u,k}$, and $\bar{K}_k$.

IV. CONCLUSION

This note presented an upgraded, or suboptimal, version of the stochastic algorithm presented in [1]. The presented algorithm does not require the use of the state matrix. In the presence of uncorrelated random state disturbance, reinitialization errors, and biased measurement errors, this algorithm is shown to drive the input error covariance matrix to zero as the number of learning iterations increases. The rate of convergence is shown to be inversely proportional to the number of iterations. The state error covariance matrix is also shown to converge uniformly to zero in the presence of random measurement errors.

ACKNOWLEDGMENT

The author would like to thank one of the earlier reviewers for constructive suggestions in improving this work.

REFERENCES

[1] S. S. Saab, "A discrete-time stochastic learning control algorithm," IEEE Trans. Automat. Contr., vol. 46, pp. 877–887, June 2001.