6, JUNE 2001
I. INTRODUCTION
ITERATIVE learning control algorithms are reported and applied to fields spanning robotics [4]-[7] and [23]-[36]; for example, precision speed control of servomotors and its application to a VCR [1], cyclic production processes with application to extruders [2], and coil-to-coil control in rolling [3]. Several approaches for updating the control law with repeated trials on identical tasks, making use of stored preceding inputs and output errors, have been proposed and analyzed [4]-[38]. Several algorithms assume that the initial condition is fixed; that is, at each iteration the state is always initialized at the same point. For a learning procedure that automatically accomplishes this task, the reader may refer to [28] and [17]. For algorithms that employ more than one iteration of history data, refer to [15] and [16]. A majority of these methodologies are based on contraction-mapping requirements to develop sufficient conditions for convergence and/or boundedness of trajectories through repeated iterations. Consequently, a condition on the learning gain matrix $K$ is assumed in order to satisfy a necessary condition for contraction. Based on the nature of the control update law, conditions on $K$, such as $\|I - K\,CB\| < 1$ or $\|I - CB\,K\| < 1$, are imposed (where the matrix $CB$ is a function of the input/output coupling functions of the system).
Manuscript received September 1, 1999; revised April 25, 2000. Recommended by Associate Editor M. Polycarpou. This work was supported by the
University Research Council at the Lebanese American University.
The author is with Lebanese American University, Byblos 48328, Lebanon
(e-mail: ssaab@lau.edu.lb).
Publisher Item Identifier S 0018-9286(01)05129-7.
Consider the linear discrete-time system

$$x_k(t+1) = A\,x_k(t) + B\,u_k(t) + w_k(t), \qquad y_k(t) = C\,x_k(t) + v_k(t) \tag{1}$$

where, at iteration $k$ and time $t \in [0, T]$, $x_k(t) \in \mathbb{R}^n$ is the state, $u_k(t) \in \mathbb{R}^r$ is the input, $y_k(t) \in \mathbb{R}^m$ is the output, $w_k(t)$ is the state disturbance, and $v_k(t)$ is the measurement noise. The learning update is given by

$$u_{k+1}(t) = u_k(t) + K_k(t)\,e_k(t+1) \tag{2}$$

where $K_k(t) \in \mathbb{R}^{r \times m}$ is the learning control gain matrix and $e_k(t) = y_d(t) - y_k(t)$ is the output error, with $y_d(t)$ a realizable desired output trajectory. It is assumed that, for any realizable output trajectory and an appropriate initial condition $x_d(0)$, there exists a unique control input $u_d(t)$ generating the trajectory for the nominal plant. That is, the following difference equation is satisfied:

$$x_d(t+1) = A\,x_d(t) + B\,u_d(t), \qquad y_d(t) = C\,x_d(t). \tag{3}$$

Note that if $CB$ is full-column rank and $y_d(t)$ is a realizable output trajectory, then a unique input generating the output trajectory is given by $u_d(t) = [(CB)^T CB]^{-1}(CB)^T\,[\,y_d(t+1) - CA\,x_d(t)\,]$. Defining the state error $\delta x_k(t) = x_d(t) - x_k(t)$ and the input error $\delta u_k(t) = u_d(t) - u_k(t)$, the state error satisfies

$$\delta x_k(t+1) = A\,\delta x_k(t) + B\,\delta u_k(t) - w_k(t). \tag{4}$$

Subtracting (2) from $u_d(t)$, the input error at iteration $k+1$ is $\delta u_{k+1}(t) = \delta u_k(t) - K_k(t)\,e_k(t+1)$, where $e_k(t+1) = C\,\delta x_k(t+1) - v_k(t+1)$. Substituting the value of the state error in the previous equation, we have

$$\delta u_{k+1}(t) = \bigl[I - K_k(t)\,CB\bigr]\delta u_k(t) - K_k(t)\,CA\,\delta x_k(t) + K_k(t)\bigl[C\,w_k(t) + v_k(t+1)\bigr]. \tag{5}$$
Combining (4) and (5) in the two-dimensional Roesser model [39], we obtain

$$\begin{bmatrix} \delta x_k(t+1) \\ \delta u_{k+1}(t) \end{bmatrix} = \begin{bmatrix} A & B \\ -K_k(t)\,CA & I - K_k(t)\,CB \end{bmatrix} \begin{bmatrix} \delta x_k(t) \\ \delta u_k(t) \end{bmatrix} + \begin{bmatrix} -w_k(t) \\ K_k(t)\bigl[C\,w_k(t) + v_k(t+1)\bigr] \end{bmatrix}. \tag{6}$$

The following development is justified because the input vector of difference equation (6) has zero mean. Note that if the disturbances are nonstationary or colored noise, then the system can easily be augmented to cover this class of disturbances. Writing (6) in a compact form, we have

$$z_k(t+1) = \Phi_k(t)\,z_k(t) + \xi_k(t) \tag{7}$$

where $z_k(t) = [\delta x_k(t)^T \;\; \delta u_k(t)^T]^T$, $\Phi_k(t)$ is the block matrix of (6), and $\xi_k(t)$ collects the disturbance terms. Let

$$P_k(t) = E\bigl[z_k(t)\,z_k(t)^T\bigr] = \begin{bmatrix} P_{x,k}(t) & 0 \\ 0 & P_{u,k}(t) \end{bmatrix}$$

where $P_{x,k}(t) = E[\delta x_k(t)\,\delta x_k(t)^T]$ is an $n \times n$ matrix, $P_{u,k}(t) = E[\delta u_k(t)\,\delta u_k(t)^T]$ is an $r \times r$ matrix, and the zero submatrices are due to the zero cross-correlation between $\delta x_k(t)$ and $\delta u_k(t)$ (established below). As a consequence, $P_{x,k}(t)$ and $P_{u,k}(t)$ are nonnegative-semidefinite matrices. For compactness, we denote $Q(t) = E[w_k(t)\,w_k(t)^T]$ and $R(t) = E[v_k(t)\,v_k(t)^T]$, and we assume that $w_k(t)$ is uncorrelated with $v_k(t)$; then, at iterate $k$, $E[w_k(t)\,v_k(t)^T] = 0$. Taking the covariance of both sides of (7), we obtain

$$E\left\{\begin{bmatrix} \delta x_k(t+1) \\ \delta u_{k+1}(t) \end{bmatrix}\begin{bmatrix} \delta x_k(t+1) \\ \delta u_{k+1}(t) \end{bmatrix}^T\right\} = \Phi_k(t)\,P_k(t)\,\Phi_k(t)^T + E\bigl[\xi_k(t)\,\xi_k(t)^T\bigr]. \tag{8}$$

Expanding the left-hand terms of (8), we get expressions for $P_{x,k}(t+1)$ and $P_{u,k+1}(t)$. Consequently, the trace of the left-hand side of (8) is equivalent to the following:

$$\operatorname{trace}\bigl[P_{x,k}(t+1)\bigr] + \operatorname{trace}\bigl[P_{u,k+1}(t)\bigr]$$

where $P_{x,k}(t+1)$ is an $n \times n$ matrix and $P_{u,k+1}(t)$ is an $r \times r$ matrix. Next, we attempt to find a learning gain matrix $K_k(t)$ such that the trace of the error covariance matrix $P_{u,k+1}(t)$ is minimized. It is implicitly assumed that the input and state errors have zero mean, so it is proper to refer to $P_{u,k}(t)$ as a covariance matrix. Note that the mean-square error is used as the performance criterion because the trace is the summation of the error variances.
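To make the coupled error recursions concrete, here is a short simulation sketch of the state/input error propagation for one learning iteration. All matrices, the (fixed) gain, and the noise levels below are arbitrary placeholders, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative dimensions and matrices (arbitrary placeholders)
n, r, m, T = 3, 1, 2, 50
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.7]])
B = np.array([[0.0], [1.0], [1.0]])
C = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
K = 0.3 * np.ones((r, m))           # fixed learning gain, for illustration only

dx = np.zeros((T + 1, n))           # state error  delta x_k(t)
du = rng.normal(size=(T, r))        # input error  delta u_k(t) at iteration k
du_next = np.zeros_like(du)         # input error  delta u_{k+1}(t)

for t in range(T):
    w = 0.01 * rng.normal(size=n)   # state disturbance w_k(t)
    v = 0.01 * rng.normal(size=m)   # measurement noise v_k(t+1)
    # state error recursion: dx(t+1) = A dx(t) + B du(t) - w(t)
    dx[t + 1] = A @ dx[t] + B @ du[t] - w
    # output error: e_k(t+1) = C dx(t+1) - v(t+1)
    e = C @ dx[t + 1] - v
    # input error update across iterations: du_{k+1}(t) = du_k(t) - K e_k(t+1)
    du_next[t] = du[t] - K @ e
```

Running the loop once corresponds to one trial of the learning process; stacking the two recursions is exactly what the Roesser-model form packages into a single difference equation.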
Expanding $P_{u,k+1}(t)$ from (8), and anticipating that the cross terms vanish,

$$P_{u,k+1}(t) = \bigl[I - K_k(t)\,CB\bigr]P_{u,k}(t)\bigl[I - K_k(t)\,CB\bigr]^T + K_k(t)\bigl[CA\,P_{x,k}(t)(CA)^T + C\,Q(t)\,C^T + R(t+1)\bigr]K_k(t)^T. \tag{9}$$

Setting

$$\frac{\partial\,\operatorname{trace}\bigl[P_{u,k+1}(t)\bigr]}{\partial K_k(t)} = 0 \tag{10}$$

and solving for $K_k(t)$ yields the minimizing gain given below in (13). Next, we show that $E[\delta x_k(t)\,\delta u_k(t)^T] = 0$. This is accomplished by expressing the state error $\delta x_k(t)$ as a function of the initial state error $\delta x_k(0)$, the input errors $\delta u_k(j)$, and the disturbances $w_k(j)$ for $j < t$, and, similarly, the input error $\delta u_k(t)$ as a function of $\delta u_0(t)$, $\delta x_j(t)$, $w_j(t)$, and $v_j(t+1)$ for $j < k$, and then correlating $\delta x_k(t)$ and $\delta u_k(t)$. Iterating the argument of (4), we get

$$\delta x_k(t) = A^t\,\delta x_k(0) + \sum_{j=0}^{t-1} A^{t-1-j}\bigl[B\,\delta u_k(j) - w_k(j)\bigr] \tag{11}$$

and, iterating (5) with respect to the iteration index,

$$\delta u_k(t) = \prod_{j=0}^{k-1}\bigl[I - K_j(t)\,CB\bigr]\,\delta u_0(t) - \sum_{j=0}^{k-1}\Bigl\{\prod_{i=j+1}^{k-1}\bigl[I - K_i(t)\,CB\bigr]\Bigr\}K_j(t)\bigl[CA\,\delta x_j(t) - C\,w_j(t) - v_j(t+1)\bigr]. \tag{12}$$

Correlating the right-hand terms of (11) and (12), it can be readily concluded that $\delta x_k(0)$, $w_k(j)$, $w_j(t)$, and $v_j(t+1)$ are all mutually uncorrelated. In addition, these terms are also uncorrelated with $\delta u_0(t)$ and $\delta u_k(j)$, $j < t$. At this point, the correlation of (11) and (12) is equivalent to correlating $\sum_{j=0}^{t-1} A^{t-1-j} B\,\delta u_k(j)$ with the terms of (12) involving $\delta u_0(t)$ and $\delta x_j(t)$. Since the terms $\delta u_k(j)$, $j < t$, and $\delta u_0(t)$ cannot be represented as functions of one another, these terms are considered uncorrelated. Therefore, $E[\delta x_k(t)\,\delta u_k(t)^T] = 0$, and consequently, the cross terms in (8) vanish. Then, the expression of $P_{u,k+1}(t)$ is reduced to (9).
Taking the trace of (9),

$$\operatorname{trace}\bigl[P_{u,k+1}(t)\bigr] = \operatorname{trace}\Bigl\{\bigl[I - K_k(t)\,CB\bigr]P_{u,k}(t)\bigl[I - K_k(t)\,CB\bigr]^T\Bigr\} + \operatorname{trace}\bigl[K_k(t)\,H_k(t)\,K_k(t)^T\bigr] \tag{15}$$

where $H_k(t) = CA\,P_{x,k}(t)(CA)^T + C\,Q(t)\,C^T + R(t+1)$. Setting the gradient of (15) with respect to $K_k(t)$ to zero yields

$$K_k(t) = P_{u,k}(t)(CB)^T\bigl[CB\,P_{u,k}(t)(CB)^T + H_k(t)\bigr]^{-1} \tag{13}$$

and the input error covariance matrix becomes

$$P_{u,k+1}(t) = \bigl[I - K_k(t)\,CB\bigr]P_{u,k}(t)\bigl[I - K_k(t)\,CB\bigr]^T + K_k(t)\,H_k(t)\,K_k(t)^T. \tag{14}$$

Substitution of (13) into (15), equivalently into (14), yields

$$P_{u,k+1}(t) = \bigl[I - K_k(t)\,CB\bigr]P_{u,k}(t). \tag{16}$$

Claim 1: Assuming that $P_{u,k}(t)$ and $H_k(t)$ are symmetric positive definite, then

$$I - K_k(t)\,CB = \bigl[I + P_{u,k}(t)(CB)^T H_k(t)^{-1} CB\bigr]^{-1}. \tag{17}$$

Proof: Substituting (13) in $I - K_k(t)\,CB$, we get the left-hand side explicitly. One way to show the claimed equality is to multiply the left-hand side of (17) by the inverse of the right-hand side and show that the result is nothing but the identity matrix.
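As a numerical sanity check of this derivation, the following sketch (random placeholder matrices; `H` denotes the quantity $CA\,P_x(CA)^T + CQC^T + R$ assumed in this reconstruction) verifies the Claim 1 identity and the symmetry of the updated covariance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 3, 2, 3                    # m >= r so CB can be full-column rank

A = rng.normal(size=(n, n))
B = rng.normal(size=(n, r))
C = rng.normal(size=(m, n))
CB = C @ B                           # full-column rank for generic random data

def spd(k, scale=1.0):
    """Random symmetric positive-definite matrix."""
    M = rng.normal(size=(k, k))
    return scale * (M @ M.T) + np.eye(k)

Pu, Px = spd(r), spd(n)              # input / state error covariances
Q, R = spd(n, 0.1), spd(m, 0.1)      # disturbance / noise covariances

H = C @ A @ Px @ A.T @ C.T + C @ Q @ C.T + R
# learning gain of the form (13)
K = Pu @ CB.T @ np.linalg.inv(CB @ Pu @ CB.T + H)
# covariance update of the form (16)
Pu_next = (np.eye(r) - K @ CB) @ Pu

# Claim 1: I - K CB equals [I + Pu (CB)^T H^{-1} CB]^{-1}
lhs = np.eye(r) - K @ CB
rhs = np.linalg.inv(np.eye(r) + Pu @ CB.T @ np.linalg.inv(H) @ CB)
```

The check `np.allclose(lhs, rhs)` holds by the matrix inversion (Woodbury) lemma, and the updated covariance stays symmetric positive definite.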
IV. CONVERGENCE

In this section, we show that stochastic convergence is guaranteed in the presence of random disturbances if $CB$ is full-column rank. In particular, it is shown that the input error covariance matrix converges uniformly to zero in the presence of random disturbances (Theorem 3), whereas the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise (Theorem 5). In the following, some useful results are first derived.

Proposition 1: If $CB$ is a full-column rank matrix, then $K_k(t) = 0$ if and only if $P_{u,k}(t) = 0$.

Proof: The sufficient condition is the trivial case [set $P_{u,k}(t) = 0$ in (13)]. Necessary condition: since $CB$ is full-column rank, $(CB)^T$ is a full-row rank matrix, and the gain (13) can be written as

$$K_k(t) = P_{u,k}(t)(CB)^T\bigl[CB\,P_{u,k}(t)(CB)^T + H_k(t)\bigr]^{-1}. \tag{18}$$

Using a well-known matrix inversion lemma [40], the inverse in (18) can be expanded as

$$\bigl[CB\,P_{u,k}(t)(CB)^T + H_k(t)\bigr]^{-1} = H_k(t)^{-1} - H_k(t)^{-1}CB\bigl[P_{u,k}(t)^{-1} + (CB)^T H_k(t)^{-1} CB\bigr]^{-1}(CB)^T H_k(t)^{-1}. \tag{19}$$

Substituting (19) into the right-hand side of (18), $K_k(t) = 0$ forces $P_{u,k}(t)(CB)^T = 0$; since $(CB)^T$ is full-row rank, it admits a right inverse, and therefore $P_{u,k}(t) = 0$.
where, again, $H_k(t) = CA\,P_{x,k}(t)(CA)^T + C\,Q(t)\,C^T + R(t+1)$. Since $H_k(t)$ is symmetric positive definite, the following holds.

Lemma 2: If $CB$ is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that $P_{u,k}(t)$ is a symmetric positive-definite matrix for all $k \ge 0$ and $t \in [0, T]$. Moreover, the eigenvalues of $I - K_k(t)\,CB$ are positive and strictly less than one, i.e., $0 < \lambda_i\bigl(I - K_k(t)\,CB\bigr) < 1$ for all $i$, $k$, and $t$.

Proof: The proof proceeds by induction with respect to the iteration index $k$. By examining (14), since $P_{u,0}(t)$ is assumed to be a symmetric positive-definite matrix, $P_{u,1}(t)$ is symmetric and nonnegative definite. Define $D_k(t) = I + P_{u,k}(t)(CB)^T H_k(t)^{-1} CB$. Equation (17) implies that $I - K_k(t)\,CB = D_k(t)^{-1}$. Since $CB$ is full-column rank and $H_k(t)$ is symmetric positive definite, $(CB)^T H_k(t)^{-1} CB$ is symmetric positive definite. In addition, having $P_{u,k}(t)$ symmetric and positive definite implies that all eigenvalues of $P_{u,k}(t)(CB)^T H_k(t)^{-1} CB$ are positive. Therefore, the eigenvalues of $D_k(t)$ are strictly greater than one, which is equivalent to concluding that the eigenvalues of $I - K_k(t)\,CB = D_k(t)^{-1}$ are positive and strictly less than one. This implies that $I - K_k(t)\,CB$ is nonsingular. Equation (16) then implies that $P_{u,1}(t)$ is nonsingular; thus, $P_{u,1}(t)$ is a symmetric positive-definite matrix. We may now assume that $P_{u,k}(t)$ is symmetric positive definite. Equation (14) implies that $P_{u,k+1}(t)$ is symmetric and nonnegative definite. Using a similar argument as for $k = 0$ implies that $I - K_k(t)\,CB$, with strictly positive eigenvalues, is nonsingular, and consequently, $P_{u,k+1}(t)$ is symmetric and positive definite.

Remark: Since all the eigenvalues of $I - K_k(t)\,CB$ are strictly positive and strictly less than one for all $k$ and $t$ (Lemma 2), there exists a consistent norm $\|\cdot\|$ such that

$$\bigl\|I - K_k(t)\,CB\bigr\| \le \rho < 1 \quad \text{for all } k \text{ and } t. \tag{20}$$
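The eigenvalue claim of Lemma 2 is easy to probe numerically. The following is an illustration with random placeholder matrices (not a proof, and not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, m = 4, 2, 3

A = rng.normal(size=(n, n))
B = rng.normal(size=(n, r))
C = rng.normal(size=(m, n))
CB = C @ B                          # full-column rank for generic random data

def spd(k, scale=1.0):
    """Random symmetric positive-definite matrix."""
    M = rng.normal(size=(k, k))
    return scale * (M @ M.T) + np.eye(k)

Pu, Px = spd(r), spd(n)
Q, R = spd(n, 0.1), spd(m, 0.1)
H = C @ A @ Px @ A.T @ C.T + C @ Q @ C.T + R

K = Pu @ CB.T @ np.linalg.inv(CB @ Pu @ CB.T + H)
ev = np.linalg.eigvals(np.eye(r) - K @ CB)
# per Lemma 2, these eigenvalues are real and lie strictly inside (0, 1)
```

Repeating the experiment with other seeds and dimensions (keeping $CB$ full-column rank) gives the same qualitative outcome.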
Theorem 3: If $CB$ is full-column rank, then the learning algorithm, presented by (2), (13), and (16), guarantees that $P_{u,k}(t)$ remains symmetric positive definite for all $k$. In addition, $P_{u,k}(t) \to 0$ and $K_k(t) \to 0$ uniformly in $t \in [0, T]$ as $k \to \infty$.

Proof: Since $P_{u,0}(t)$ is bounded in $t$, and $P_{u,k}(t)$ is nonincreasing in $k$ by (16) and Lemma 2, $P_{x,k}(t)$ is also bounded in $t$ and $k$. This can be seen by employing the state error covariance recursion

$$P_{x,k}(t+1) = A\,P_{x,k}(t)\,A^T + B\,P_{u,k}(t)\,B^T + Q(t). \tag{21}$$

Since $P_{x,k}(t)$ and $Q(t)$ are bounded, $H_k(t)$ is symmetric, positive definite, and bounded. Therefore, $(CB)^T H_k(t)^{-1} CB$ is symmetric, positive definite, and bounded away from zero. The latter, together with the Remark and (20), implies that the contraction factors in (16) are bounded away from one uniformly in $k$, and one may conclude that $P_{u,k}(t) \to 0$ uniformly in $t \in [0, T]$ as $k \to \infty$. Since $K_k(t)$ in (13) is continuous in $P_{u,k}(t)$ and vanishes with it (Proposition 1), $K_k(t) \to 0$ as well.
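A quick numerical illustration of Theorem 3's conclusion, with a placeholder system and the Kalman-filter-like gain and covariance recursions assumed in this reconstruction:

```python
import numpy as np

rng = np.random.default_rng(3)
n, r, m = 3, 1, 2
A = 0.5 * np.eye(n)
B = rng.normal(size=(n, r))
C = rng.normal(size=(m, n))
CB = C @ B                          # full-column rank (generic)

Pu = np.eye(r)                      # initial input error covariance
Px = np.eye(n)                      # state error covariance
Q = 0.01 * np.eye(n)
R = 0.01 * np.eye(m)

traces = []
for k in range(200):
    H = C @ A @ Px @ A.T @ C.T + C @ Q @ C.T + R
    K = Pu @ CB.T @ np.linalg.inv(CB @ Pu @ CB.T + H)
    Pu = (np.eye(r) - K @ CB) @ Pu              # update of the form (16)
    Px = A @ Px @ A.T + B @ Pu @ B.T + Q        # state covariance recursion
    traces.append(float(np.trace(Pu)))
```

The trace of the input error covariance decreases monotonically toward zero, mirroring the uniform convergence claimed by the theorem.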
Similarly, the remaining terms converge to zero. Therefore, the claimed uniform convergence follows.

Theorem 5: If $CB$ is a full-column rank matrix, then, in the absence of state disturbances and reinitialization errors (excluding biased measurement noise, provided the measurement noise covariance $R(t)$ is positive definite), the learning algorithm, presented by (2), (13), and (16), guarantees that $P_{u,k}(t) \to 0$. In addition, $K_k(t) \to 0$, and the state error covariance matrix $P_{x,k}(t) \to 0$ uniformly in $t \in [0, T]$ as $k \to \infty$.

Proof: Obviously, the results of the previous theorem still apply. Setting $w_k(t) = 0$ and $\delta x_k(0) = 0$, (11) is reduced to

$$\delta x_k(t) = \sum_{j=0}^{t-1} A^{t-1-j} B\,\delta u_k(j).$$

From the previous results, $P_{u,k}(t) \to 0$ uniformly in $t \in [0, T]$. Therefore, $P_{x,k}(t)$, a finite sum of terms weighted by the $P_{u,k}(j)$, also converges uniformly to zero. Again, the convergence $K_k(t) \to 0$ is straightforward. Consequently, uniform convergence for $P_{x,k}(t)$ follows.

V. APPLICATION

In this section, we go over the steps involved in the application of the proposed algorithm. Pseudocode is presented for both cases, where the term $CA\,P_{x,k}(t)(CA)^T$ is considered and where it is neglected. In addition, a numerical example is presented to illustrate the performance of the proposed algorithm.

A. Computer Algorithm

In order to apply the proposed learning control algorithm, the state error covariance matrix needs to be available. From Section III, one can extract the following recursion [cf. (21)]:

$$P_{x,k}(t+1) = A\,P_{x,k}(t)\,A^T + B\,P_{u,k}(t)\,B^T + Q(t) \tag{22}$$

initialized with

$$P_{x,k}(0) = E\bigl[\delta x_k(0)\,\delta x_k(0)^T\bigr]. \tag{23}$$

We assume that $A$, $B$, $C$, $Q(t)$, $R(t)$, $P_{u,0}(t)$, and $P_{x,k}(0)$ are all available. Starting with $k = 0$, the pseudocode is as follows:

Step 1) For $t = 0, 1, \ldots, T$:
a) apply $u_k(t)$ to the system described by (1), and find the output error $e_k(t+1)$;
b) employing (22), compute $P_{x,k}(t+1)$;
c) using (13), compute the learning gain $K_k(t)$;
d) using (2), update the control $u_{k+1}(t)$;
e) using (16), update $P_{u,k+1}(t)$.
Step 2) Set $k \leftarrow k+1$, and go to Step 1).

In the case where the term $CA\,P_{x,k}(t)(CA)^T$ may be neglected, the pseudocode is the same as the previous one, excluding item b) and using (24) instead of (13) to compute the learning gain $K_k(t)$.

B. Numerical Example

The example uses an integration period of ... second; $w_k(t)$, $v_k(t)$, and the initial input error are normally distributed white random processes with variances ..., ..., and 1, respectively. All the disturbances are assumed to be unbiased except for the measurement noise, whose mean is ... . The desired trajectories, with domain $t \in [0, 1]$ (a one-second interval), are the trajectories associated with ... . The initial input for the generic and proposed algorithms is $u_0(t) = \ldots$, and the initial input error covariance scalar (single input) is $P_{u,0}(t) = \ldots$.

Next, the generic algorithm requires the control gain $K$ to satisfy $|1 - K\,CB| < 1$. It is noted that if $K$ is chosen such that $|1 - K\,CB|$ is close to zero, then, in the iterative domain, a fast transient response is noticed, but at the cost of larger steady-state errors. However, if $K$ is chosen such that $|1 - K\,CB|$ is close to one, then smaller steady-state errors are noticed at the cost of a very slow transient response. Consequently, the generic gain $K$ is chosen such that $|1 - K\,CB| = \ldots$ for all $t$.

TABLE I
PERFORMANCE OF PROPOSED ALGORITHM
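The pseudocode steps above can be rendered as a minimal Python sketch. The plant, noise levels, and desired trajectory here are arbitrary stand-ins (not the paper's numerical example), and the gain/covariance formulas follow the Kalman-filter-like structure assumed in this reconstruction:

```python
import numpy as np

rng = np.random.default_rng(4)
n, r, m, T = 2, 1, 1, 20

A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[0.0, 1.0]])
CB = C @ B                                   # = [[1.0]], full-column rank
Q = 1e-4 * np.eye(n)                         # state disturbance covariance
R = 1e-4 * np.eye(m)                         # measurement noise covariance

# a realizable desired trajectory, generated by a known input u_d
u_d = np.sin(np.linspace(0.0, 2.0 * np.pi, T))[:, None]
x_d = np.zeros((T + 1, n))
for t in range(T):
    x_d[t + 1] = A @ x_d[t] + B @ u_d[t]
y_d = (C @ x_d[1:].T).T                      # desired outputs y_d(t+1)

u = np.zeros((T, r))                         # initial input u_0(t) = 0
Pu = [np.eye(r) for _ in range(T)]           # input error covariance per t
out_err = []                                 # mean |e_k| per iteration

for k in range(30):
    x = np.zeros(n)                          # fixed initial condition
    Px = np.zeros((n, n))                    # state error covariance
    abs_e = []
    for t in range(T):
        # Step 1a) apply u_k(t) and observe the output error e_k(t+1)
        w = rng.multivariate_normal(np.zeros(n), Q)
        v = rng.multivariate_normal(np.zeros(m), R)
        x_next = A @ x + B @ u[t] + w
        e = y_d[t] - (C @ x_next + v)
        # Step 1b) propagate the state error covariance
        Px_next = A @ Px @ A.T + B @ Pu[t] @ B.T + Q
        # Step 1c) learning gain (Kalman-filter-like form)
        H = C @ A @ Px @ A.T @ C.T + C @ Q @ C.T + R
        K = Pu[t] @ CB.T @ np.linalg.inv(CB @ Pu[t] @ CB.T + H)
        # Step 1d) control update; Step 1e) input covariance update
        u[t] = u[t] + K @ e
        Pu[t] = (np.eye(r) - K @ CB) @ Pu[t]
        x, Px = x_next, Px_next
        abs_e.append(float(np.abs(e).max()))
    out_err.append(float(np.mean(abs_e)))
```

After the learning iterations, the applied input tracks the trajectory-generating input up to the noise floor, and the recorded output error shrinks from its initial level.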
Performance: In order to quantify the statistical and deterministic size of the input and state errors after a fixed number of iterations, we use three different measures. For statistical evaluation, we measure the variance and the mean-square error of each variable. For deterministic examination, the infinity norm is measured. Since the convergence rate as the number of iterations increases is also of importance, the accumulation of each of these measures (variance, mean square, and infinity norm) is also computed. The superiority of the proposed algorithm can be detected by examining both Table I and Fig. 1. In Fig. 1, the solid and dashed lines represent the employment of the proposed and generic gains, respectively. The top plot of Fig. 2 shows the evolution of the input error variance, whereas the bottom plot shows the confirmation of Proposition 1. In addition, an examination of Table I and Fig. 2 (top) shows the close conformity between the computed input error variance and the estimated one generated by the proposed algorithm.
VI. CONCLUSION

This paper has formulated a computational algorithm for the learning gain matrix, which stochastically minimizes, in a least-squares sense, trajectory errors in the presence of random disturbances. It is shown that if $CB$ is full-column rank, then the input error covariance matrix converges uniformly to zero in the presence of random disturbances, and the state error covariance matrix converges uniformly to zero in the presence of biased measurement noise.
) with
the sampling period. For a small sampling
period, one may approximate
. There are several other mathematical applications
and the sampling period does not have to
where
be small, e.g., if the rows of the state matrix corresponding to
th output is equal to the associated row of the output coupling
, and only the second
matrix . For example, if
. Therefore, the
row of is equal to , then
learning gain matrix is reduced to
(24)
Note that, by using the learning gain of (24) [instead of (13)], all
the results in Sections IV and V still apply as presented. Consequently, the computer algorithm does not require the update of
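A sketch of a reduced gain of the form (24), which avoids propagating the state error covariance altogether. The matrix values are arbitrary placeholders, and the exact reduced form in the original may differ:

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, m = 4, 2, 3
B = rng.normal(size=(n, r))
C = rng.normal(size=(m, n))
CB = C @ B                                   # full-column rank (generic)

Pu = np.eye(r)                               # current input error covariance
Q = 0.01 * np.eye(n)                         # state disturbance covariance
R = 0.01 * np.eye(m)                         # measurement noise covariance

# reduced gain: the state-error-covariance term is dropped,
# so no P_x recursion has to be maintained between time steps
K = Pu @ CB.T @ np.linalg.inv(CB @ Pu @ CB.T + C @ Q @ C.T + R)
Pu_next = (np.eye(r) - K @ CB) @ Pu
```

The covariance update retains its contraction property: the trace of the input error covariance still strictly decreases at each iteration.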
Consider the nonlinear system

$$x_k(t+1) = f\bigl(x_k(t), t\bigr) + B\,u_k(t) + w_k(t), \qquad y_k(t) = C\,x_k(t) + v_k(t) \tag{25}$$

where $f(\cdot, \cdot)$ is a vector-valued function with range in $\mathbb{R}^n$. Define $\delta f_k(t) = f(x_d(t), t) - f(x_k(t), t)$. We assume that the initial state $x_k(0)$ and a desired trajectory $y_d(t)$ are given and assumed to be realizable; that is, there exist bounded $x_d(t)$ and $u_d(t)$ such that $x_d(t+1) = f(x_d(t), t) + B\,u_d(t)$ and $y_d(t) = C\,x_d(t)$. The restrictions on the noise inputs $w_k(t)$ and $v_k(t)$ are as assumed in Section II. Again, we define the output error $e_k(t) = y_d(t) - y_k(t)$, the state error vector $\delta x_k(t) = x_d(t) - x_k(t)$, and the input error vector $\delta u_k(t) = u_d(t) - u_k(t)$.
It is assumed that the function $f(x, t)$ is uniformly globally Lipschitz in $x$ for $t \in [0, T]$. Borrowing the results from Section III, we find that the learning gain that minimizes the trace of $P_{u,k+1}(t)$ is similar to the one presented by (13). In particular, defining

$$H_k(t) = C\,E\bigl[\delta f_k(t)\,\delta f_k(t)^T\bigr]C^T + C\,Q(t)\,C^T + R(t+1)$$

then

$$K_k(t) = P_{u,k}(t)(CB)^T\bigl[CB\,P_{u,k}(t)(CB)^T + H_k(t)\bigr]^{-1}. \tag{26}$$

Consider next the system

$$x_k(t+1) = f\bigl(x_k(t), t\bigr) + B\,u_k(t) + \bar{w}(t) \tag{27}$$

where $\bar{w}(t)$ may represent a repetitive state disturbance. Let $g(x, t) = f(x, t) + \bar{w}(t)$; then $g(x, t)$ is also uniformly globally Lipschitz in $x$ for $t \in [0, T]$. The error equation corresponding to (27) is given by

$$\delta x_k(t+1) = \delta f_k(t) + B\,\delta u_k(t) \tag{28}$$

where $\delta f_k(t) = f(x_d(t), t) - f(x_k(t), t)$, and since $\delta f_k(t)$ is a function of $\delta x_k(t)$, the development of Section III carries over.
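As an illustration of the nonlinear case, the following sketch reuses the same gain structure with a Lipschitz-bound surrogate for the nonlinear error covariance. Everything here (the function `f`, the constant `c_lip`, the covariance propagation) is an assumed stand-in, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, m, T = 2, 1, 1, 15

B = np.array([[0.0], [1.0]])
C = np.array([[0.0, 1.0]])
CB = C @ B                                   # = [[1.0]]
Q = 1e-4 * np.eye(n)
R = 1e-4 * np.eye(m)
c_lip = 1.0                                  # assumed Lipschitz constant of f

def f(x, t):
    # an illustrative globally Lipschitz nonlinearity (constant <= 1)
    return np.array([0.5 * np.tanh(x[0]) + 0.1 * x[1],
                     0.4 * np.sin(x[0]) + 0.5 * x[1]])

# realizable desired trajectory generated by a known input u_d
u_d = 0.5 * np.cos(np.linspace(0.0, np.pi, T))[:, None]
x_d = np.zeros((T + 1, n))
for t in range(T):
    x_d[t + 1] = f(x_d[t], t) + B @ u_d[t]
y_d = (C @ x_d[1:].T).T

u = np.zeros((T, r))
Pu = [np.eye(r) for _ in range(T)]
out_err = []

for k in range(40):
    x = np.zeros(n)
    Px = np.zeros((n, n))
    abs_e = []
    for t in range(T):
        w = rng.multivariate_normal(np.zeros(n), Q)
        v = rng.multivariate_normal(np.zeros(m), R)
        x_next = f(x, t) + B @ u[t] + w
        e = y_d[t] - (C @ x_next + v)
        # Lipschitz-bound surrogate c^2 * Px for the nonlinear error covariance
        H = (c_lip ** 2) * (C @ Px @ C.T) + C @ Q @ C.T + R
        K = Pu[t] @ CB.T @ np.linalg.inv(CB @ Pu[t] @ CB.T + H)
        u[t] = u[t] + K @ e
        Pu[t] = (np.eye(r) - K @ CB) @ Pu[t]
        Px = (c_lip ** 2) * Px + B @ Pu[t] @ B.T + Q   # conservative propagation
        x = x_next
        abs_e.append(float(np.abs(e).max()))
    out_err.append(float(np.mean(abs_e)))
```

The recorded output error decreases over the learning iterations despite the nonlinearity, consistent with the convergence statements that follow.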
Therefore, borrowing some of the results presented in Sections IV and V, one can conclude the following.

Theorem 6: Consider the system presented in (25). If $CB$ is full-column rank and $H_k(t)$ is nonsingular for all $k$ and $t$, then the learning algorithm, presented by (2), (26), and (16), guarantees that $P_{u,k}(t)$ remains symmetric positive definite. In addition, $P_{u,k}(t) \to 0$ and $K_k(t) \to 0$ uniformly in $t \in [0, T]$ as $k \to \infty$.

Note that a Lipschitz requirement on $f(x, t)$ is not necessary here. If, however, the function $f(x, t)$ is uniformly globally Lipschitz in $x$ for $t \in [0, T]$, then there exists a positive constant $c$ such that $\|\delta f_k(t)\| \le c\,\|\delta x_k(t)\|$.
Theorem 7: Consider the system presented in (25). If $CB$ is full-column rank and $f(x, t)$ is uniformly globally Lipschitz in $x$ for $t \in [0, T]$, then the learning algorithm, presented by (2), (26), and (16), guarantees that $P_{u,k}(t)$ remains symmetric positive definite. In addition, $P_{u,k}(t) \to 0$ and $K_k(t) \to 0$ uniformly in $t \in [0, T]$ as $k \to \infty$.

Proof: The proof is the same as the corresponding part of the proof of Theorem 3, except for the nonsingularity of $H_k(t)$. Since $f(x, t)$ is Lipschitz, there exists a positive constant $c$ such that $\|\delta f_k(t)\| \le c\,\|\delta x_k(t)\|$. Borrowing the results given in [14, Th. 1], the infinite norms of the state errors are bounded. Therefore, $H_k(t)$ is bounded for all $k$ and, since $R(t+1)$ is positive definite, is nonsingular.
Substituting $c^2\,P_{x,k}(t)$ in place of $E[\delta f_k(t)\,\delta f_k(t)^T]$ in the definition of the matrix $H_k(t)$ of (26), the following results are concluded.

Theorem 8: If $CB$ is a full-column rank matrix, then, for any positive constant $c$, the learning algorithm, presented by (2), (26), and (16), will generate a sequence of inputs such that the infinite norms of the input, output, and state errors given by (28) decrease exponentially in the iteration index $k$. In addition, $P_{u,k}(t) \to 0$.
Proof: Borrowing the results of [14, Corollary 2], the infinite norms of the input, output, and state errors given by (28) decrease exponentially in $k$. The results related to the input error covariance matrix can be concluded from the previous subsection.
REFERENCES

[1] Y.-H. Kim and I.-J. Ha, "A learning approach to precision speed control of servomotors and its application to a VCR," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 466-477, July 1999.
[2] M. Pandit and K.-H. Buchheit, "Optimizing iterative learning control of cyclic production process with application to extruders," IEEE Trans. Contr. Syst. Technol., vol. 7, pp. 382-390, May 1999.
[3] S. S. Garimella and K. Srinivasan, "Application of iterative learning control to coil-to-coil control in rolling," IEEE Trans. Contr. Syst. Technol., vol. 6, pp. 281-293, Mar. 1998.
[4] S. Arimoto, S. Kawamura, and F. Miyazaki, "Bettering operation of robots by learning," J. Robot. Syst., vol. 1, pp. 123-140, 1984.
[5] S. Arimoto, "Learning control theory for robotic motion," Int. J. Adaptive Control Signal Processing, vol. 4, pp. 544-564, 1990.
[6] S. Arimoto, T. Naniwa, and H. Suziki, "Robustness of P-type learning control theory with a forgetting factor for robotic motions," in Proc. 29th IEEE Conf. Decision Control, Honolulu, HI, Dec. 1990, pp. 2640-2645.
[7] S. Arimoto, S. Kawamura, and F. Miyazaki, "Convergence, stability, and robustness of learning control schemes for robot manipulators," in Int. Symp. Robot Manipulators: Modeling, Control, Education, Albuquerque, NM, 1986, pp. 307-316.
[8] G. Heinzinger, D. Fenwick, B. Paden, and F. Miyaziki, "Stability of learning control with disturbances and uncertain initial conditions," IEEE Trans. Automat. Contr., vol. 37, pp. 110-114, Jan. 1992.
[9] J. Hauser, "Learning control for a class of nonlinear systems," in Proc. 26th Conf. Decision Control, Los Angeles, CA, Dec. 1987, pp. 859-860.
[10] A. Hac, "Learning control in the presence of measurement noise," in Proc. American Control Conf., 1990, pp. 2846-2851.