
Learning systems: Theory and application

K. Najim

G. Oppenheim

Indexing terms: Control systems, Optimisation, Adaptive control, Automata, Neural networks

Abstract: A survey of the state of the art in learning systems (automata and neural networks), which are of increasing importance in both theory and practice, is presented. Learning systems are a response to engineering design problems arising from nonlinearities and uncertainty. Definitions and properties of learning systems are detailed. An analysis of the reinforcement schemes, which are the heart of learning systems, is given. Some results related to the asymptotic properties of learning automata are presented. A learning system models at the same time the controller (optimiser) and the controlled process (the criterion to be optimised). Two learning schemes for neural network synthesis are presented. Several applications of learning systems are also described.

1  Introduction

Learning systems have been the subject of a considerable amount of attention. These concepts were initially used by psychologists [1] and biologists [2] to describe human behaviour from both psychological and biological viewpoints. The vocabulary and the concepts associated with learning systems and used by psychologists, biologists and automatic control engineers are the same, and will be defined. As with adaptive systems, learning systems improve their behaviour on the basis of the response of the environment in which they operate. In automatic control, the environment (the medium) corresponds to the process to be controlled (the problem to be solved: optimisation, pattern recognition, diagnosis, etc.). Adaptive control systems contain two feedback loops: a classical feedback loop and an adaptation loop. Learning systems used for control purposes contain three feedback loops [3]: they also contain a learning loop. A study concerning the behaviour of finite automata in a stationary random environment was performed by Tsetlin [2, 4]. Different behaviour norms of finite automata operating in random environments were proposed. The concept of games between automata was introduced by Krylov and Tsetlin [5]. This concept was used for modelling biological systems [2]. Tsetlin [2] demonstrated that an automaton game can be described
Paper 8085E (C4), first received 27th October 1989 and in revised form 14th March 1991
K. Najim is with the Ecole Nationale Supérieure d'Ingénieurs de Génie Chimique, CNRS/URA 192, GRECO SARTA, Chemin de la Loge, 31078 Toulouse Cédex, France
G. Oppenheim is with the Université de Paris-Sud, Département de Mathématiques, Bât. 425, 91405 Orsay, France

by a finite Markov chain. The behaviour of automata in periodic random media was also considered. Games between automata have been considered by several authors [5-20]. Varshavskii and Vorontsova [21] have shown that the different concepts introduced by Tsetlin [2], and the results obtained, can be applied to stochastic automata with varying structure. The learning behaviour of variable-structure stochastic automata in unknown random media has been studied by many authors [11, 12, 18, 22-62]. A comparative study of learning automata operating in nonstationary environments is given in Reference 55. The behaviour of learning automata in multiple environments has been considered by Thathachar and Bhakthavathsalam [57]. Another aspect of learning systems concerns connectionist networks [63-83]. They represent an attempt to model the human brain. Neural networks learn mappings between input and output patterns. Information (signals) can only propagate through the network between adjacent layers, and only in a single direction. Connectionist networks exhibit the rule-following behaviour of knowledge-based expert systems without containing any explicit representation of the rules. Neural networks seem to be very attractive for several purposes (modelling, control, pattern recognition, etc.). They are parallel machines. Nonlinear and very complex relationships can be handled by this kind of representation. There is a deep and useful connection between learning automata and neural networks. The first input-output pair is presented to the net as the net just begins to learn; the prediction error is used to adjust the weights; then the second pair of data is used, and so on. Several learning rules [63-84] have been used for neural network synthesis. The problem to be solved in connectionist network synthesis concerns the identification of the network weights. The solution of this problem generally leads to the optimisation of a multimodal function. The backpropagation approach is commonly used for adjusting the parameters of connectionist networks. Four survey papers [11, 12, 18, 81] have been devoted to learning systems. In this paper, some definitions for learning automata are provided. The properties of learning automata are presented. An analysis of the reinforcement schemes is reported. The asymptotic properties of the learning automaton are considered. Neural networks are presented, and the synthesis of neural networks using learning automata is given. The most important applications of learning systems are finally reviewed.
2  Learning systems: definitions

The most useful definitions concerning learning systems will be given.



Learning automata can be classified into three categories: deterministic, fixed-structure stochastic and variable-structure stochastic. In deterministic automata, the transition and output matrices are deterministic. For fixed-structure stochastic automata, the transitions are determined by time-invariant stochastic matrices. Variable-structure automata have stochastic transition matrices whose elements are adjusted as the learning system operates. The use of stochastic automata leads to a reduction in the number of states in comparison with deterministic automata [21]. A learning automaton is a stochastic automaton connected in a feedback loop with a random medium (the environment), as shown in Fig. 1.

Fig. 1  Learning automaton

A stochastic automaton may be of fixed or variable structure [2, 11, 12, 18]. It can be described by the set {W, Φ, Π, P, R, G}, where W is the input set {0, 1}, which is emitted by the performance evaluation unit. W(t) = 1 is called the 'penalty' and W(t) = 0 is called the 'nonpenalty' or 'reward'. In the process control context, the nonpenalty or reward corresponds to a good choice of the control signal (good resulting behaviour of the process), and the penalty is associated with the learning system when the control applied to the process under consideration leads to undesirable behaviour. Φ = {φ_1, ..., φ_s} is the set of internal states. Π = {u_1, ..., u_i, ..., u_N}, with N ≤ s, is the output set, or set of actions. The action chosen by the automaton plays the role of the environment input (the control variable). P(t) = [p_1(t), ..., p_N(t)]^T is the state probability distribution at time t. The probability for the automaton to choose the action u_i at the instant t is p_i(t), and

Pr (u(t) = u_i) = p_i(t),     Σ_{i=1}^{N} p_i(t) = 1   for all t

R is the learning algorithm (reinforcement scheme, or updating scheme), which changes the probability vector from P(t) to P(t + 1). It can generally be written as follows. If the action selected at time t is u_i then, for t = 1, 2, ...:

For W(t) = 0:
p_i(t) = p_i(t − 1) + f_i(P(t − 1))
p_j(t) = p_j(t − 1) − f_j(P(t − 1)),   j ≠ i; i, j = 1, 2, ..., N

For W(t) = 1:
p_i(t) = p_i(t − 1) − g_i(P(t − 1))
p_j(t) = p_j(t − 1) + g_j(P(t − 1)),   j ≠ i; i, j = 1, 2, ..., N        (1)

where f_j(·) and g_j(·) are correcting terms for p_j at the tth stage of operation of the automaton-environment feedback. They are continuous functions such that p_k(t) ∈ [0, 1], k = 1, ..., N. In the case of inaction, the probability vector stays unchanged. To preserve the probability measure, it is necessary that the following conditions be satisfied at all times:

f_i(P(t)) = Σ_{j≠i} f_j(P(t)),     g_i(P(t)) = Σ_{j≠i} g_j(P(t))        (2)

G is the mapping of the set Φ onto the set Π (G: Φ → Π). Generally G may be a stochastic function; it is often assumed that G is deterministic and one-to-one (i.e. N = s and s < ∞) [11]. In adaptive control, the behaviour of the system is slightly improved at every sampling period by estimating in real time the parameters (model or control-law parameters) needed to attain a specified goal. In learning systems, the probability of occurrence of good actions (in the same sense) is increased and the probability of poorer actions is decreased. The performance evaluation unit emits the output W(t) = 1 with probability c_i (i = 1, ..., N) and the output W(t) = 0 with probability 1 − c_i. The parameters c_i (i = 1, ..., N) represent the penalty probabilities. The automaton operates in an environment without knowing very much about how its actions affect the outcome. It is assumed that the c_i are initially unknown, as the problem would be trivial if they were known a priori. The environment (or medium) is a term that can cover just about anything. In automatic control, the environment corresponds to the process to be controlled. The action selected by the automaton plays the role of the environment input, the control action denoted u(t). The environment response y(t), the controlled variable, assigns either reward or penalty to the adaptive unit. Using an example, the behaviour of the automaton-environment pair will later be detailed from a process control point of view. The environment is said to be stationary if the penalty probabilities do not depend on the index t; otherwise it is said to be nonstationary. At every sampling period, the automaton chooses an action on the basis of the probability distribution defined by P(t). The performance evaluation system reaction (penalty, nonpenalty or reward) and a reinforcement scheme are then used to update the probability distribution. The learning automaton, in which the performance evaluation unit appears, is shown in Fig. 2.

Fig. 2  Block diagram of learning controller
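As an illustration of the general scheme of eqns. 1 and 2, the following minimal sketch (not taken from the paper; the step size theta and the reward-inaction choice f_j(P) = theta p_j, g_j = 0 are illustrative assumptions) performs one probability update:

    def update_probabilities(p, i, w, theta=0.1):
        # p: action probabilities (sum to one); i: index of the selected action
        # w: environment response (0 = reward, 1 = penalty); 0 < theta < 1
        if w == 0:                                   # reward
            q = [pj - theta * pj for pj in p]        # f_j(P) = theta * p_j for j != i
            q[i] = p[i] + theta * (1.0 - p[i])       # f_i(P) = sum of f_j over j != i (eqn. 2)
        else:                                        # penalty: inaction, g_j = 0
            q = list(p)
        return q

    p = update_probabilities([0.25, 0.25, 0.25, 0.25], i=2, w=0)

Because f_i is the sum of the corrections applied to the other actions, the updated vector still sums to unity, which is exactly the role of condition 2.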

When the number of actions increases, the behaviour of the automaton becomes slow and the computer memory capacity required for the implementation of the learning system also increases. These problems have been avoided by using a hierarchical structure of automata [45, 50, 84].

The hierarchical system of automata is composed at different levels of single automata with a limited number of actions. The first level comprises a single automaton with N internal states. The second level is composed of N single automata (of N actions each), and the kth level is formed by N^(k−1) automata (Fig. 3).

Fig. 3  Hierarchical system of automata

The behaviour of hierarchical-structure stochastic automata operating in a nonstationary multi-teacher environment has been considered by Baba and Mogami [50]. Based on its probability distribution, the first-level automaton randomly selects an action u_{j1}. This, in turn, activates the automaton A_{j1} at the second level, which chooses an action u_{j1 j2} from its action probability distribution. Consequently, the automaton A_{j1 j2} is activated, and so on. The probability vector P(t) at each level depends only on the corresponding level and on the action selected at the previous level.

Remark: Any knowledge (models, correlations, know-how, etc.) concerning the process to be controlled (the problem to be solved: optimisation, pattern recognition, diagnosis, etc.) can easily be introduced into the performance evaluation unit. Most industrial processes, especially batch and/or polyvalent processes, do not operate under automatic control strategies. In this case, the quality of the manufactured products greatly depends on the know-how and experience of the operator. Learning systems are closely linked to artificial intelligence in so far as human decision making for production monitoring can be introduced by means of heuristic rules in the performance evaluation unit. To solve some optimisation problems (optimal synthesis of industrial processes, etc.), or to make the search for the optimal action finer as the learning process goes on, it seems necessary to select the number of automaton actions automatically. A learning algorithm with a changing number of actions has been developed by Thathachar and Harita [49].
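A minimal sketch of the level-by-level selection described above (illustrative code; the data structure mapping each path of choices to the probability vector of the automaton it activates is an assumption made for the example):

    import random

    def select_path(prob, levels, n):
        # prob: dict mapping a path (tuple of choices already made) to the
        # probability vector of the automaton activated by that path
        path = ()
        for _ in range(levels):
            p = prob.get(path, [1.0 / n] * n)            # uniform if not yet visited
            j = random.choices(range(n), weights=p)[0]   # sample from the local vector
            path = path + (j,)
        return path                                      # (j1, j2, ..., jk)

Only k small probability vectors are consulted per decision, although the hierarchy encodes N^k composite actions; this is the memory and speed advantage mentioned above.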

3  Learning automaton: definitions and properties

3.1 Some important properties
The important properties which can characterise a learning automaton are detailed below. The behaviour of a learning automaton is often specified in terms of the average penalty [2, 11, 12, 14, 18], defined as

A(t) = Σ_{i=1}^{N} p_i(t) c_i

Let c_1 = min {c_i, 1 ≤ i ≤ N}. Suppose that c_1 < c_2 < ... < c_N, and define

A_0 = (1/N) Σ_{i=1}^{N} c_i

the value of A(t) when p_1(t) = p_2(t) = ... = p_N(t) = 1/N. Some definitions are obtained from Reference 14.

Definition 1: The learning automaton is expedient iff

lim_{t→∞} E(A(t)) < A_0

The expected cost is then asymptotically lower than the mean cost A_0.

Definition 2: The learning automaton is optimal iff

lim_{t→∞} E(A(t)) = c_1

In this case, the distribution P(t) has a tendency to load the minimal-cost action. The optimality property is generally unattainable; only ε-optimality is accessible.

Definition 3: Let ε be a real positive scalar. The learning automaton is ε-optimal iff

lim_{t→∞} |E(A(t)) − c_1| < ε

The expected cost is then close to its minimal value c_1.
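As a numerical illustration (the penalty probabilities are invented for this example), take N = 3 and c = (c_1, c_2, c_3) = (0.2, 0.5, 0.8). Then A_0 = (0.2 + 0.5 + 0.8)/3 = 0.5, so the automaton is expedient as soon as lim E(A(t)) < 0.5, optimal if lim E(A(t)) = c_1 = 0.2, and ε-optimal with ε = 0.01 if lim |E(A(t)) − 0.2| < 0.01, i.e. the probability vector asymptotically concentrates most of its mass on the first action.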

A more interesting definition is the following.

Definition 4: The learning automaton is absolutely expedient iff [14], for all cost values such that 0 < c_1 < c_2 < ... < c_N < 1 and for every t ∈ N:
if P(t) belongs to the interior of the simplex S, then E(A(t + 1) | P(t)) < A(t);
if P(t) loads one of the actions with mass 1, then E(A(t + 1) | P(t)) = A(t).

The sequence of costs (A(t), t ∈ N) is then a positive supermartingale, bounded by unity, for the monotone sequence of sigma-algebras generated by the family (P(t), t ∈ N). If the probability P(0) is uniform, the absolute expediency property implies the expediency property.
Definition 5: A learning automaton is said to be ergodic in the mean if the mean action probability is the state probability of an ergodic Markov chain.

Entropy is also used as a performance measure of the behaviour of the automaton. The entropy, which measures the decision disorder, is defined as

H(t) = − Σ_{i=1}^{N} p_i(t) log p_i(t)

In the case of no prior information, the probability distribution is initialised to p_i(0) = 1/N, and the initial entropy is maximal (equal to log N). As the learning automaton improves its behaviour, the entropy decreases.
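For instance (an illustrative calculation, not from the paper), with N = 10 actions and the uniform initialisation p_i(0) = 1/10, H(0) = log 10 ≈ 2.30 in natural logarithms; if learning eventually places a probability of 0.99 on a single action and spreads the remaining 0.01 over the other nine, the entropy falls to roughly 0.08, reflecting the reduction of decision disorder as the automaton converges.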
3.2 Some other definitions
These definitions complete earlier approaches in which the cost concepts were not directly included. The main tool is the operator R (linear or not) which associates a probability q = R(p, w, a) with a probability p, a reinforcement w and an action a. Such an operator possesses good properties if it is a contraction: it draws together the images q and q' of two probabilities p and p'. The following precise definition can be given [85, 86]. The learning automaton is distance-diminishing iff: (1) whatever (w, a, p, p') are in their respective sets, dist(q, q') ≤ dist(p, p'), where dist is a distance on the



space S of the probability distributions over the set of actions; (2) the inequality in condition 1 is strict over an interesting family of probability distributions. As Lakshmivarahan [14] states, this condition is a very useful one, since it can be verified in many applications. The asymptotic behaviour of the sequence P(t) can be very different depending on the algorithm R, even when R is distance-diminishing. In some cases, the sequence can be attracted by some absorbing point p*. For a prespecified criterion this point can be optimal. The point p* is a simple point when a single action carries all the weight; the probability p* is then one corner of the simplex. When a learning automaton has at least one such simple absorbing point, it is called an absorbing-barrier algorithm. When there is no absorbing point, the automaton is called a non-absorbing-barrier algorithm. Recall that p is an absorbing point iff, for every w and for every a,

p = R(p, w, a)

In the presence of absorbing points, the asymptotic behaviour can generally be modified by the initial condition P(0). There is another important family of algorithms, which asymptotically forgets all information on the initial condition P(0) and behaves in the same manner whatever P(0). A full definition of these ergodic algorithms is included in References 85 and 86. The following example shows that, according to the parameter values, the automaton belongs to one family or the other [86]. Suppose that there are two possible actions, noted 1 and 2, and that the updating rules defining R are specified, for every t ∈ N, by a table giving P(t + 1) as a function of the action a(t) and the reinforcement w(t), with parameters θ_ij satisfying 0 ≤ θ_ij ≤ 1 for 1 ≤ i, j ≤ 2. The reinforcement evolution w is defined by the parameters

π_ij = Pr ({w(t) = j} | {a(t) = i}),   1 ≤ i, j ≤ 2

In this simple case the following results display the wide range of possibilities:
(1) The probability p defined by p(1) = 1, p(2) = 0 is absorbing iff θ_12 = 0 or π_12 = 0.
(2) The automaton is distance-diminishing (in a sense very close to that given in Reference 62) iff there exist k and l (k ∈ {1, 2}, l ∈ {1, 2}) such that θ_1k π_1k > 0 and θ_2l π_2l > 0.
(3) Leaving aside simple and insignificant cases, a distance-diminishing automaton is either ergodic or of the absorbing-barrier type [86].

Efficient tools to characterise the properties of an algorithm are not common. Lakshmivarahan [14] provides necessary and sufficient conditions for the algorithm of eqn. 1 to be absolutely expedient. Consider the conditions which ensure that the probability p_i(t + 1), i = 1, 2, ..., N; t = 1, 2, ..., belongs to the simplex (p_i(t + 1) ∈ S): either

0 < f_j(P(t)) < p_j(t),   p_j(t) ∈ [0, 1]

or

0 < Σ_{j≠i} [p_j(t) + g_j(P(t))] < 1,   p_j(t) ∈ [0, 1]        (3)

Then the learning automaton operating according to the reinforcement scheme of eqn. 1 is absolutely expedient if no distinction is made among the actions not selected by the automaton; in other terms, the ratio p_j(t + 1)/p_j(t) should be the same for all the actions u_j, j ≠ i. These conditions are generally formulated as follows. The reinforcement scheme of eqn. 1 is absolutely expedient if

f_j(P(t)) = p_j(t) λ(P(t)),   g_j(P(t)) = p_j(t) μ(P(t))
0 < λ(P(t)) < 1,   0 < μ(P(t)) < min_j {p_j(t)/(1 − p_j(t))}        (4)

for all j = 1, 2, ..., N and all p_j(t) ∈ [0, 1], where λ(·) and μ(·) are arbitrary continuous functions.

4  Reinforcement schemes

The heart of a learning automaton is the reinforcement scheme, which is the mechanism used to adapt the probability distribution. Based on the environment response and the action selected by the automaton at time t, it generates P(t + 1) from P(t). Reinforcement schemes can be classified on the basis of the properties that they induce in the learning automaton, or on the basis of their own characteristics (linearity, nonlinearity, etc.). To judge the effectiveness of a stochastic automaton operating in a random environment, introduce the following average penalty function [44]:

J(c) = Σ_{i=1}^{N} [Φ_i(p_1, ..., p_N) E_i{1 − u_i} − Ψ_i(p_1, ..., p_N) E_i{u_i}]        (5)

where the functions Φ_i(·) and Ψ_i(·) represent the amount of change in the probability vector under reward (W(t) = 0) and penalty (W(t) = 1), and u_i ∈ {0, 1} corresponds to the reward and penalty environment responses. According to the Kiefer-Wolfowitz stochastic approximation method [87], the reinforcement scheme which optimises the function J(c) is derived by setting the gradient of J(c) equal to zero. If the selected action at time t is u_i, the following algorithm is obtained:

p_i(t) = p_i(t − 1) + γ(t) [{∂Φ_i(p_1, ..., p_N)/∂p_i}(1 − u_i) − {∂Ψ_i(p_1, ..., p_N)/∂p_i} u_i]
p_j(t) = p_j(t − 1) − γ(t) [{∂Φ_i(p_1, ..., p_N)/∂p_i}(1 − u_i) − {∂Ψ_i(p_1, ..., p_N)/∂p_i} u_i]/(N − 1),   j ≠ i        (6)

The reinforcement schemes of References 21, 22 and 34 correspond to the following functions Φ_i(·) and Ψ_i(·):

Viswanathan and Narendra [34]: Φ_i(·) = p_i − p_i²/2; Ψ_i(·) = 0
Varshavskii and Vorontsova [21]: Φ_i(·) = const − p_i²/2 − p_i³/3; Ψ_i(·) = Φ_i(·)
Bush and Mosteller [22]: Φ_i(·) = p_i − p_i²/2; Ψ_i(·) = p_i²/2

These schemes [21, 22, 34] are ε-optimal, like most of the reinforcement schemes given in References 11, 12, 14, 18, 27, 38, 39, 59, 83 and 84. The scheme developed by Seret and Macchi [54] is optimal.
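To see ε-optimality at work, the following hedged sketch (illustrative code; the penalty probabilities, step size and horizon are invented for the example) simulates a linear reward-inaction automaton in a stationary environment and reports the empirical average penalty:

    import random

    def run(c, theta=0.05, steps=20000):
        # c: penalty probabilities of the environment; returns the final
        # probability vector and the empirical average penalty
        n = len(c)
        p = [1.0 / n] * n
        penalties = 0
        for t in range(steps):
            i = random.choices(range(n), weights=p)[0]    # choose an action
            w = 1 if random.random() < c[i] else 0        # environment response
            penalties += w
            if w == 0:                                    # reward: reinforce action i
                p = [pj - theta * pj for pj in p]
                p[i] += theta                             # equals p_i + theta*(1 - p_i)
        return p, penalties / steps

    # illustrative penalty probabilities: c_1 = 0.2 is the minimum
    print(run([0.2, 0.5, 0.8]))

With c = (0.2, 0.5, 0.8), the probability vector concentrates on the first action and the average penalty approaches c_1 = 0.2 as the horizon grows.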
-

5  Asymptotic properties

Many results are known about learning automata acting in the following way. Suppose that, at the instant t, the action is a_i. The environment response is random and its law is not known; only the value of w(t) is observed. w(t) is obtained by a random sampling scheme in an urn filled with two kinds of balls, labelled 1 and 0, such that

Pr ({w(t) = 1 | a(t) = a_i}) = c_i     and     Pr ({w(t) = 0 | a(t) = a_i}) = 1 − c_i

Knowing the action, the reinforcement and the value of P(t), P(t + 1) is calculated. Suppose that the algorithm R is a non-stochastic function of its arguments. At the instant (t + 1), the action is obtained by sampling in an urn composed according to the distribution P(t + 1). Several types of results are available:
(1) those concerning P(t) when t → ∞
(2) those concerning the actions, the reinforcements, or both of them, when t → ∞
(3) those concerning the costs when t → ∞
(4) those proving 'ODE type' convergences for some reward-penalty algorithms [14] depending linearly on a parameter which tends toward zero; these results are omitted here.

5.1 Results for P(t) for distance-diminishing algorithms

Let A be an event concerning the space S of probabilities over the set {a_1, a_2, ..., a_N}, and define Q^(t) for every t ∈ N by Q^(0)(p, A) = 1 if p ∈ A and Q^(0)(p, A) = 0 if p ∉ A. Then, for a general distance-diminishing algorithm, (1/n) Σ_{t=1}^{n} Q^(t)(p, ·) converges (uniformly in p), when n → ∞, towards a transition Q^(∞)(p, ·). If the algorithm is an ergodic one (in the sense of Reference 86), it forgets the initial condition and Q^(∞)(p, ·) does not depend on p; it is then a probability over the space S. For any continuous function f on S and for every p_0, when n → ∞,

lim (1/n) Σ_{t=1}^{n} f(P(t)) = ∫ f(u) Q^(∞)(du)        (7)

almost surely, for a probability distribution depending on p_0. If there exists any absorbing point, then the sequence P(t) almost surely converges towards a random variable p_∞.

5.2 Results concerning actions and reinforcements
Let g be a function defined on the Cartesian product of the action and reinforcement spaces. For an ergodic distance-diminishing automaton, a good number of theorems can be proved: strong laws of large numbers (uniform with regard to the initial conditions), central limit theorems and iterated logarithm laws; sometimes the speed of convergence can also be given [85, 86]. The proofs rely mainly on the contraction properties of the operators R.

5.3 Absolutely expedient learning automata
Lakshmivarahan [14] and other authors have studied different absolutely expedient automata, which do not always have the quality of being distance-diminishing algorithms. The convergence results rely on the fact that the cost is a bounded positive supermartingale. The c_i being positive and lower than one, it can be deduced from the definition of absolute expediency that the sequence converges to a random variable p*, taking only a finite number of values, which are some of the corners of the simplex S [14]. Different formulae provide bounds for the probability of the event p* = (0, ..., 0, 1, 0, ..., 0) conditioned by P(0) = p_0 [14].

6  Neural networks

Neural networks are parallel information processing systems. A neural network consists of a set of nodes interacting locally across very low bandwidth connections [63-84]. The architecture of these systems is specified by the node characteristics, the net topology and the learning algorithm (perceptron, adaline, Hopfield nets, feedforward networks (or backpropagation), adaptive resonance theory, counterpropagation network, cognitron and neocognitron, self-organising map, bidirectional associative memory, etc.). A typical node is shown in Fig. 4.

Fig. 4  Typical node

Each of the inputs x_k has a weight w_ik associated with it. The node output is f(A_i), where

A_i = Σ_k w_ik x_k

(the sum running over the inputs of the node) and f(·) is a nonlinear function (e.g. a sigmoidal function).
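A minimal sketch of this node (illustrative code, assuming the usual sigmoid f(A) = 1/(1 + exp(−A))):

    import math

    def node_output(weights, inputs):
        # weighted sum A_i = sum_k w_ik * x_k followed by a sigmoidal nonlinearity
        a = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-a))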



Assume that there are m input-output pairs (x^l, y^l) available for training the net. Consider the mean-squared-error performance index C(W) (the difference between the actual outputs of the output layer and the desired outputs over all the patterns), defined as

C(W) = (1/2m) Σ_{l=1}^{m} C^l,     C^l = Σ_s [ŷ_s^l − y_s^l]²        (8)

The index s is associated with the neurons of the output layer. It is necessary to find the values of the weights w_ik for which C(W) takes on a minimum value. The initial weight values are normally set to small random numbers. Backpropagation is the most popular learning algorithm used for neural network synthesis. In this algorithm the error is propagated backward through the network. The backpropagation algorithm is summarised as follows. For an output neuron s,

E_s = f'(A_s)[ŷ_s − y_s]        (9)

where f'(A_s) = ∂ŷ_s/∂A_s; the error terms of the hidden neurons are obtained by propagating the E_s backward through the weights, and each weight is corrected in proportion to the error term of the neuron it feeds. Backpropagation is a gradient-based algorithm, in the sense that the weight update is performed along the direction of the gradient of the performance index C(W). It is simple and requires a minimal amount of memory from a computational point of view. The extremum associated with the cost function C(W) can be either global or local. Virtually nothing is known about finding global extrema in general. The backpropagation algorithm, which uses the Widrow-Hoff technique, can lead to a non-global optimum. Random search techniques [88, 89], simulated annealing [82] and learning automata [83, 84, 90] have been widely used in the optimisation of functions where more than one local optimum exists. Random search techniques are generally based on random sampling and search region contraction [88], or on stochastic approximation techniques [91]. The simulated annealing method is suitable for the optimisation of large-scale systems and multimodal functions [82]. The essence of simulated annealing is an analogy with thermodynamics, especially with the way in which liquids freeze and crystallise.
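As a concrete illustration of gradient-based training (a minimal sketch rather than the exact algorithm of eqns. 8 and 9; the network size, learning rate and the OR-function data are invented for the illustration):

    import math, random

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    def train(samples, n_in, n_hid, epochs=1000, eta=0.5):
        # one hidden layer, one output; weights start as small random numbers
        W1 = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
        W2 = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
        for _ in range(epochs):
            for x, y in samples:
                h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
                out = sigmoid(sum(w * hi for w, hi in zip(W2, h)))
                delta_out = (out - y) * out * (1.0 - out)            # error term of the output node
                delta_hid = [delta_out * W2[j] * h[j] * (1.0 - h[j]) for j in range(n_hid)]
                for j in range(n_hid):                               # propagate the error backward
                    W2[j] -= eta * delta_out * h[j]
                    for k in range(n_in):
                        W1[j][k] -= eta * delta_hid[j] * x[k]
        return W1, W2

    # illustrative use: learn the OR function
    net = train([([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)], n_in=2, n_hid=2)

Each pass computes the output-layer error term and corrects every weight along the negative gradient of the squared error, which is the behaviour described above.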

7  Neural network synthesis using learning automata

The application of a learning automaton to the optimisation of multimodal functions (neural network synthesis, etc.) [83, 84, 90] is now considered (Fig. 5). The function to be minimised is the sum of squared errors. Let f(x) be a real-valued function (C(W)) of a vector parameter x (x ∈ X, a compact subset of R^M). It is necessary to find the value of x, namely x*, that minimises f(x).

Fig. 5  Optimisation using a learning automaton

Consider a quantification {X_i} of the admissible region X:

X_i ∩ X_j = ∅  (i ≠ j; i, j = 1, ..., N),     R^M ⊃ X = ∪_{i=1}^{N} X_i        (10)

Let y_t be the observation of f(x):

y_t = f(x_t) + ω_t        (11)

where x_t ∈ {x(1), x(2), ..., x(N)}, the x(i) ∈ X_i being fixed points, and ω_t is a random value which characterises the observation noise. The automaton input ξ_t at the instant t is constructed from the current observation y_t and from the averages of the observations collected when the selected action was u_i, using the operator [x]_+ (equal to x if x ≥ 0 and to 0 otherwise) and the indicator χ(A), which is equal to unity if the event A occurs (reward) and zero otherwise (inaction or penalty). The loss function associated with the learning automaton is built from the indicators χ(x_τ = u_i); the detailed construction is given in References 83 and 90.

The use of learning automata for neural network synthesis thus requires the following steps:
(1) discretise the variation domain of the weights
(2) associate the different combinations of these discretised values with the states of the automaton
(3) construct the automaton input ξ_t
(4) adapt the probability distribution using a reinforcement scheme.
If the number of discrete values of the weights is large, hierarchical structures of automata can be used.
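A hedged sketch of this four-step procedure (illustrative code; the reward rule, rewarding an action whenever it improves on the best error seen so far, is an assumption made for the example and not the construction of References 83 and 90):

    import itertools, random

    def automaton_weight_search(error, grid, steps=5000, theta=0.05):
        # error: function returning C(W) for a tuple of weights
        # grid : list of admissible values for each weight (the discretisation)
        actions = list(itertools.product(*grid))   # one action per weight combination
        n = len(actions)
        p = [1.0 / n] * n
        best = float("inf")
        for _ in range(steps):
            i = random.choices(range(n), weights=p)[0]
            e = error(actions[i])
            if e < best:                           # reward: reinforce the selected action
                best = e
                p = [pj - theta * pj for pj in p]
                p[i] += theta
        return actions[max(range(n), key=lambda j: p[j])], best

    # e.g. automaton_weight_search(lambda W: (W[0] - 1)**2 + (W[1] + 2)**2, [[-2, -1, 0, 1, 2]] * 2)

Because every weight combination is a separate action, the action set grows quickly with the number of weights, which is why the hierarchical structures mentioned above become useful.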
8  Application of learning systems

Learning systems are attractive methods for process control, robotics, optimisation, pattern recognition, image processing, telecommunication, scheduling, etc. They are very simple to implement and need little prior knowledge. The number of learning systems applications

has increased with the advent of highly integrated computers, which have made the technology cost-effective. The use of learning systems for process control purposes is briefly presented through the control of a liquid-liquid extraction column [52, 92-95]. In a liquid-liquid extraction column, mass transfer occurs between two immiscible phases (continuous and dispersed). The dispersed phase, Q_d, is dispersed through a perforated pipe distributor at the bottom of the column (Fig. 6). The continuous phase, Q_c, which contains the product to be extracted, is fed at the top of the column.

Fig. 6  Diagram of the column

The control objective is to maintain the conductivity, y(t), of the medium located below the distributor close to a desired value by controlling the pulse frequency, u(t). The control of the extraction column can be modelled by a learning automaton which considers the column as the random environment in which it operates. The controlled variable (the conductivity measurement) and the control variable (the pulse frequency) are, respectively, the response and the input of the environment. The variation domain of the control variable is discretised into a set of N values

[u_1, u_2, ..., u_N];   u_1 = u_min,   u_N = u_max

where u_min and u_max are the lower and upper bounds of the variation domain of the control action (the pulse frequency). No prior information is used initially, i.e. the probability vector is initialised to P(0) = [1/N, 1/N, ..., 1/N]^T. The performance evaluation unit is supplied with rules derived from knowledge of the extraction column (simulations and observations of the column). These rules allow the description of the desired behaviour of the process and are summarised as follows:

W(t) = 0   if {y(t) < y_c and u(t) < u(t − 1)} or {y(t) > y_c and u(t) > u(t − 1)}
W(t) = 1   otherwise

where y_c is the desired value of the conductivity. These rules partly express the reaction of the operator who runs the liquid-liquid extraction column manually. The probability vector was adjusted using the following reinforcement scheme. If W(t) = 0 and u_i is the action selected at the instant t, then

p_i(t + 1) = p_i(t) + β_0 p_i(t)[1 − p_i(t)]
p_j(t + 1) = p_j(t) − β_0 p_i(t) p_j(t),   j ≠ i (j = 1, ..., N)

If W(t) = 1, then

p_i(t + 1) = p_i(t) − β_1 p_i(t)[1 − p_i(t)]
p_j(t + 1) = p_j(t) + β_1 p_i(t) p_j(t),   j ≠ i (j = 1, ..., N)

with 0 < β_0 < 1 and 0 < β_1 < 1.

After performing these first steps (discretisation of the control variation domain and definition of the behaviour of the performance evaluation unit), the control algorithm, based on a single or hierarchical structure of automata, performs the following steps at every sampling period:

Step 1: Choice of one action u_i (related to the control action). The technique used to select one action u_i among the N possibilities is based on the generation of a uniformly distributed random variable z (any standard machine routine, e.g. RANDU, can be used to generate z). The algorithm chooses the action u_i such that i is equal to the least value of j verifying the constraint Σ_{k=1}^{j} p_k(t) ≥ z.
Step 2: Application of this control to the plant to be controlled (the control variable of the process under consideration corresponds to the automaton output).
Step 3: Measurement of the controlled variable.
Step 4: Use of this measurement in the performance evaluation unit to decide, on the basis of heuristic and/or analytical rules, whether the choice of the action u_i complies with the conditions for good process behaviour. This leads the unit to deliver a response which can correspond, in the first case, to a penalty or inaction (nonpenalty) or, in the second case, to a reward.
Step 5: Use of this response to adjust the probability distribution associated with the set of N actions by means of an optimal or ε-optimal reinforcement scheme, which generates a new probability distribution P(t + 1) from the previous P(t) according to the performance evaluation unit response W(t).
Step 6: Return to step 1.
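The loop formed by steps 1 to 6 can be sketched as follows (illustrative code; the plant function standing in for the column and the horizon are assumptions, while the W(t) rule and the β_0/β_1 scheme follow the text above):

    import random

    def learning_controller(plant, u_levels, y_ref, beta0=0.2, beta1=0.2, steps=2000):
        # plant: function returning the measured conductivity for a pulse frequency u
        n = len(u_levels)
        p = [1.0 / n] * n                                 # P(0) = [1/N, ..., 1/N]
        u_prev = u_levels[0]
        for _ in range(steps):
            i = random.choices(range(n), weights=p)[0]    # step 1: choose an action
            u = u_levels[i]                               # step 2: apply the control
            y = plant(u)                                  # step 3: measure the output
            good = (y < y_ref and u < u_prev) or (y > y_ref and u > u_prev)
            w = 0 if good else 1                          # step 4: performance evaluation
            pi = p[i]
            if w == 0:                                    # step 5: reward update
                q = [pj - beta0 * pi * pj for pj in p]
                q[i] = pi + beta0 * pi * (1.0 - pi)
            else:                                         # step 5: penalty update
                q = [pj + beta1 * pi * pj for pj in p]
                q[i] = pi - beta1 * pi * (1.0 - pi)
            p = q
            u_prev = u                                    # step 6: return to step 1
        return p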



The conductivity reference (desired output) was taken to be equal to 0.45 mS/cm. The parameters were chosen to be β_0 = β_1 = 0.2. The output variable was measured at time t, and the control action was calculated and applied to the column at time t + τ, where τ is the computation time related to the learning control algorithm (τ = 2 s). The behaviour of the column under learning control is shown in Fig. 7, which represents the conductivity time variation. Fig. 7 illustrates the ability of the learning algorithm to adapt itself to the variations of the parameters affecting the behaviour of the column. Fig. 8 shows the time variation of the pulse frequency. If a multivariable system with k control variables {u^1(t), u^2(t), ..., u^k(t)} is to be controlled, their variation domains should each be discretised into a set of N values. The choice of an action u_i then corresponds to a set of k values of the control variables {u_1^i, u_2^i, ..., u_k^i} [92-94]. To avoid the numerical problems caused by round-off errors, and to guarantee that p_i(t) ∈ [0, 1], a normalisation [52] or projection [41] procedure can be used.
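As an illustrative count (not taken from the paper), with k = 2 control variables each discretised into N = 10 values the automaton must handle 10² = 100 composite actions, and with k = 3 it must handle 1000; this is precisely the situation in which the hierarchical structures of automata discussed earlier become attractive.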

Fig. 7  Time evolution of conductivity y(t)

Fig. 8  Time evolution of pulse frequency u(t)

The applications of learning systems cover a wide range of problems (process control, optimisation, pattern recognition, diagnosis, etc.), which are listed here:
Bioreactors [96, 97]
Computers [25, 98]
Diagnosis and fault detection [99-101]
Distillation columns [102, 103]
Drying furnaces, VLSI processes [46, 104-108]
Fluidised bed reactors [109, 110]
Image processing [111, 112]
Irrigation canals [46, 113]
Liquid-liquid extraction columns [52, 92-95]
Market price formation [13]
Optimisation [27, 58, 83, 84, 90]
Packet-switching networks [114]
Pattern recognition [29, 31, 115, 116]
Queue control [117, 118]
Resource allocation [13]
Robots [119-121]
Sensors [122, 123]
Speech [115, 124, 125]
Taxicab operation [13]
Telephone traffic routing [126-128]
Thermal reactors [110, 129]
9  Conclusion

Learning systems have made a significant impact on all areas of engineering where problems arise from complexity and uncertainty. The most important results concerning the theory of learning systems and their applications have been presented. Observe that:
(i) few studies deal with nonstationary environments, with stochastic neural networks, or with the stability of connectionist networks
(ii) there exists only one optimal reinforcement scheme [54]
(iii) from a practical point of view, it would be desirable to adapt the reinforcement scheme parameters on-line (as in recursive identification schemes, where the forgetting factor is adapted on-line), to select the number of automaton actions (numbers of nodes and layers) automatically without increasing the size of the action set (as in Reference 49), and to develop fast learning algorithms for neural network synthesis.

10  References

1 ROBINSON, D.N.: Psychology, traditions and perspectives (Van Nostrand, New York, 1972)
2 TSETLIN, M.L.: Automaton theory and modeling of biological systems (Academic Press, New York, 1973)
3 SKLANSKY, J.: Learning systems for automatic control, IEEE Trans., 1966, AC-11
4 TSETLIN, M.L.: On the behavior of finite automata in random media, Autom. Remote Control, 1961, 22, pp. 1210-1219
5 KRYLOV, V.Yu., and TSETLIN, M.L.: Games between automata, Autom. Remote Control, 1963, 24, pp. 889-899
6 McKINSEY, J.: Introduction to the theory of games (McGraw-Hill, New York, 1952)
7 LUCE, R.D., and RAIFFA, H.: Games and decisions (Wiley, New York, 1957)
8 CHANDRASEKARAN, B., and SHEN, D.W.C.: Stochastic automata games, IEEE Trans., 1969, SMC-5, pp. 145-149
9 ZAMIR, S.: On the notion of the value for games with infinitely many stages, Ann. Stat., 1973, 1, pp. 791-796
10 VISWANATHAN, R., and NARENDRA, K.S.: Games of stochastic automata, IEEE Trans., 1974, SMC-4, pp. 145-149
11 NARENDRA, K.S.: Learning automata - a survey, IEEE Trans., 1974, SMC-4, pp. 323-334
12 NARENDRA, K.S., and LAKSHMIVARAHAN, S.: Learning automata - a critique, J. Cybern. Inf. Sci., 1977, 1, pp. 53-65
13 EL FATTAH, Y.M., and FOULARD, C.: Learning systems: decision, simulation and control (Springer-Verlag, Berlin, 1978)
14 LAKSHMIVARAHAN, S.: Learning algorithms: theory and applications (Springer-Verlag, Berlin, 1981)
15 EL FATTAH, Y.M.: Multi-automaton games: a rationale for expedient collective behavior, Syst. Control Lett., 1982, 1, pp. 332-339
16 LAKSHMIVARAHAN, S., and NARENDRA, K.S.: Learning algorithms for two-person zero-sum stochastic games with incomplete information: a unified approach, SIAM J. Control Optim., 1982, 20, pp. 541-552
17 EL FATTAH, Y.M.: Fairness and mutual profitability in collective behavior of automata, IEEE Trans., 1983, SMC-13, pp. 230-241
18 NARENDRA, K.S., and WHEELER, R.M.: Recent developments in learning automata, in NARENDRA, K.S. (Ed.): Adaptive and learning systems: theory and applications (Plenum Press, New York, 1986)
19 THATHACHAR, M.A.L., and SASTRY, P.S.: Learning optimal discriminant functions through a cooperative game of automata, IEEE Trans., 1987, SMC-17, pp. 73-75
20 THATHACHAR, M.A.L., and RAMAKRISHNAN, K.R.: A cooperative game of a pair of learning automata, Automatica, 1984, 20, pp. 797-801
21 VARSHAVSKII, V.I., and VORONTSOVA, I.P.: On the behavior of stochastic automata with variable structure, Autom. Remote Control, 1963, 24, pp. 327-333
22 BUSH, R.R., and MOSTELLER, F.: Stochastic models for learning (J. Wiley, New York, 1958)
23 LUCE, R.D.: Individual choice behavior (J. Wiley, New York, 1959)
24 TSERTVADZE, G.N.: Certain properties of stochastic automata and certain methods for synthesizing them, Autom. Remote Control, 1963, 24, pp. 316-326

25 KASHYAP, R.L.: Optimization of stochastic finite state systems, IEEE Trans., 1966, AC-11, pp. 685-692
26 CHANDRASEKARAN, B., and SHEN, D.W.C.: On expediency and convergence in variable-structure automata, IEEE Trans., 1974, SMC-4, pp. 52-60
27 SHAPIRO, I.J., and NARENDRA, K.S.: Use of stochastic automata for parameter self-optimization with multi-modal performance criteria, IEEE Trans., 1975, SMC-5, pp. 352-360
28 DEVYATERIKOV, I.P., KAPLINSKII, A.I., and TSYPKIN, Ya.Z.: Convergence of learning algorithms, Autom. Remote Control, 1969, 10, pp. 83-89
29 MENDEL, J.M., and FU, K.S.: Adaptive, learning and pattern recognition systems (Academic Press, New York, 1970)
30 TSYPKIN, Ya.Z.: Adaptation and learning in automatic systems (Academic Press, New York, 1971)
31 FU, K.S.: Pattern recognition and machine learning (Plenum Press, New York, 1971)
32 TSYPKIN, Ya.Z.: Foundations of the theory of learning systems (Academic Press, New York, 1973)
33 POZNYAK, A.S.: Learning automata in stochastic programming problems, Autom. Remote Control, 1973, 34, pp. 1608-1619
34 VISWANATHAN, R., and NARENDRA, K.S.: Stochastic automata models with applications to learning systems, IEEE Trans., 1973, SMC-3, pp. 107-111
35 SAWARAGI, Y., and BABA, N.: A note on the learning behavior of variable-structure stochastic automata, IEEE Trans., 1973, SMC-3, pp. 644-647
36 POZNYAK, A.S.: Learning automata in stochastic plant control problems, Autom. Remote Control, 1974, 5, pp. 777-789
37 LYUBCHIK, L.M., and POZNYAK, A.S.: Learning automata in stochastic plant control problems, Autom. Remote Control, 1974, 35, pp. 777-789
38 VISWANATHAN, R., and NARENDRA, K.S.: A note on the linear reinforcement scheme for variable-structure stochastic automata, IEEE Trans., 1974, SMC-4, pp. 292-294
39 SAWARAGI, Y., and BABA, N.: Two ε-optimal nonlinear reinforcement schemes for stochastic automata, IEEE Trans., 1974, SMC-4, pp. 126-131
40 BABA, N., and SAWARAGI, Y.: On the learning behavior of stochastic automata under a nonstationary random environment, IEEE Trans., 1975, SMC-5, pp. 273-275
41 POZNYAK, A.S.: Investigation of the convergence of algorithms for the functioning of learning stochastic automata, Autom. Remote Control, 1975, 36, pp. 77-91
42 LAKSHMIVARAHAN, S., and THATHACHAR, M.A.L.: Bounds on the convergence probabilities of learning automata, IEEE Trans., 1976, SMC-6, pp. 756-763
43 BABA, N.: On the learning behavior of the SL reinforcement scheme for stochastic automata, IEEE Trans., 1976, SMC-6, pp. 580-582
44 TSYPKIN, Ya.Z., and POZNYAK, A.S.: Learning automata, J. Cybern. Inf. Sci., 1977, 1, pp. 128-161
45 THATHACHAR, M.A.L., and RAMAKRISHNAN, K.R.: A hierarchical system of learning automata, IEEE Trans., 1981, SMC-11, pp. 236-241
46 NAJIM, K.: Commande adaptative des processus industriels (Masson, Paris, 1982)
47 BABA, N.: The absolutely expedient nonlinear reinforcement schemes under the unknown multiteacher environment, IEEE Trans., 1983, SMC-13, pp. 100-108
48 THATHACHAR, M.A.L., and OOMMEN, B.J.: Learning automata processing ergodicity of the mean: the two-action case, IEEE Trans., 1983, SMC-13, pp. 1143-1148
49 THATHACHAR, M.A.L., and HARITA, B.R.: Learning automata with changing number of actions, IEEE Trans., 1987, SMC-17
50 BABA, N., and MOGAMI, Y.: Learning behaviors of hierarchical structure stochastic automata operating in a non-stationary multi-teacher environment, Int. J. Syst. Sci., 1988, 19, pp. 1345-1350
51 BABA, N.: On the learning behaviors of stochastic automaton in the general N-teacher environment, IEEE Trans., 1988, SMC-13, pp. 224-231
52 NAJIM, K.: Control of liquid-liquid extraction columns (Gordon and Breach, London, 1988)
53 LAKSHMIVARAHAN, S., and THATHACHAR, M.A.L.: Optimum nonlinear reinforcement schemes for stochastic automata. Report EE-EI, May 1970, Department of Electrical Engineering, Indian Institute of Science, Bangalore-12, India
54 SERET, D., and MACCHI, O.: Automates adaptatifs optimaux, Techniques et Sciences Informatiques, 1982, 1, (2), pp. 143-153

55 LOUI, M.C., and NARENDRA, K.S.: Comparison of learning automata operating in nonstationary environments. Becton Center Technical Report CT-65, 1975
56 FU, K.S.: A class of learning control systems using statistical decision processes. IFAC Symp. on the Theory of Self-Adaptive Control Systems, Teddington, United Kingdom, 14-17 September 1965 (Plenum Press)
57 THATHACHAR, M.A.L., and BHAKTHAVATHSALAM, R.: Learning automaton operating in parallel environments, J. Cybern. Inf. Sci., 1977, 1, pp. 121-127
58 McMURTRY, G.J., and FU, K.S.: A variable structure automaton used as a multimodal search technique, IEEE Trans., 1966, AC-11, pp. 379-387
59 COMBES, M., MARCHAND, D., and MACCHI, C.: Un algorithme de convergence presque sûre pour automate adaptatif, C.R. Acad. Sci. Paris, 1977, Série A, pp. 505-507
60 NIKOLIC, Z.J., and FU, K.S.: An algorithm for learning without external supervision and its application to learning control systems, IEEE Trans., 1966, AC-11, pp. 414-422
61 NARENDRA, K.S., and THATHACHAR, M.A.L.: Learning automata: an introduction (Prentice Hall, London, 1989)
62 NAZIN, A.V., and POZNYAK, A.S.: Adaptive choice of variants (Nauka, Moscow, 1986)
63 HOPFIELD, J.J., and TANK, D.W.: Neural computation of decisions in optimization problems, Biol. Cybern., 1985, 52, pp. 141-152
64 PINEDA, F.J.: Generalization of backpropagation to recurrent neural networks, Phys. Rev. Lett., 1987, 59, pp. 2229-2232
65 ANDERSON, J.A., and ROSENFELD, E.: Neurocomputing: foundations of research (MIT Press, 1988)
66 KOSKO, B.: Bidirectional associative memories, IEEE Trans., 1988, SMC-18, pp. 49-60
67 CHEN, F.C., and PAO, Y.H.: Learning control with neural networks. IEEE Int. Conf. on Robotics and Automation, 1989
68 SONTAG, E.D., and SUSSMANN, H.J.: Backpropagation can give rise to spurious local minima even for networks without hidden layers, Complex Syst., 1989, 3, pp. 91-106
69 RANGWALA, S.S., and DORNFELD, D.A.: Learning and optimization of machining operations using computing abilities of neural networks, IEEE Trans., 1989, SMC-19, pp. 299-314
70 AMARI, S.: Mathematical foundations of neurocomputing, Proc. IEEE, 1990, 78, pp. 1443-1463
71 NARENDRA, K.S., and PARTHASARATHY, K.: Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Netw., 1990, 1, pp. 4-27
72 TAYLOR, J.G.: Noisy neural net states and their time evolution, SIAM J. Appl. Math., 1990, 50, pp. 1073-1087
73 KOHONEN, T.: The self-organizing map, Proc. IEEE, 1990, 78, pp. 1464-1480
74 ALLEN, R., and ALSPECTOR, J.: Learning of stable states in stochastic asymmetric networks, IEEE Trans. Neural Netw., 1990, 1, pp. 233-238
75 LEE, C.C.: A self-learning rule-based controller with approximate reasoning and neural nets. 11th IFAC World Congress, Tallinn, USSR, 13-17 August 1990
76 CHEN, F.C.: Backpropagation neural networks for nonlinear self-tuning adaptive control, IEEE Control Syst. Mag., 1990, 10, pp. 44-48
77 REYNERI, L.M., and FILIPPI, E.: Modified backpropagation algorithm for fast learning in neural networks, Electron. Lett., 1990, 26, pp. 1564-1566
78 WIDROW, B., and LEHR, M.A.: 30 years of adaptive neural networks: perceptron, madaline and backpropagation, Proc. IEEE, 1990, 78, pp. 1415-1442
79 POGGIO, T., and GIROSI, F.: Networks for approximation and learning, Proc. IEEE, 1990, 78, pp. 1481-1497
80 HINTON, G.E.: Special issue on connectionist symbol processing, Artificial Intelligence, 1990, 46
81 THIBAULT, J., and GRANDJEAN, P.A.: Neural networks in process control - a survey. IFAC Symp. ADCHEM 91, Toulouse, France, 14-16 October 1991
82 DOLAN, W.B., CUMMINGS, P.T., and LE VAN, M.D.: Process optimization via simulated annealing: application to network design, AIChE J., 1989, 35, pp. 725-736
83 POZNYAK, A.S., NAJIM, K., and CHTOUROU, M.: A learning automaton with continuous inputs and its application for multimodal functions optimization, Int. J. Syst. Sci., 1991 (to be published)
84 POZNYAK, A.S., NAJIM, K., and CHTOUROU, M.: Multilevel hierarchical system of learning automata, Int. J. Syst. Sci., 1991 (to be published)
85 NORMAN, M.F.: Markov processes and learning models (Academic Press, New York, 1972)


86 IOSIFESCU, M., and THEODORESCU, R.: Random processes and learning (Springer-Verlag, Berlin, 1969)
87 KIEFER, J., and WOLFOWITZ, J.: Stochastic estimation of the maximum of a regression function, Ann. Math. Stat., 1952, 23, pp. 462-466
88 WANG, B.C., and LUUS, R.: Reliability of optimization procedures for obtaining global optimum, AIChE J., 1978, 24, pp. 619-626
89 DOREA, C.C.Y.: Stopping rules for a random optimization method, SIAM J. Control Optim., 1990, 28, pp. 841-850
90 NAJIM, K., LE LANN, M.V., and PIBOULEAU, L.: Optimization technique based on learning automata, J. Optim. Theory Appl., 1990, 64, pp. 331-347
91 KUSHNER, H.J.: Stochastic approximation algorithms for the local optimization of functions with nonunique stationary points, IEEE Trans., 1972, AC-17, pp. 646-654
92 NAJIM, K., LE LANN, M.V., and CASAMATTA, G.: Learning control of a pulsed liquid-liquid extraction column, Chem. Eng. Sci., 1987, 42, pp. 1619-1628
93 NAJIM, K., and LE LANN, M.V.: Multivariable learning control of an extractor, Chem. Eng. Sci., 1988, 43, pp. 1538-1546
94 NAJIM, K., and LE LANN, M.V.: Control of a pulsed liquid-liquid extraction column based on a multilevel system of automata, Chem. Eng. Commun., 1988, 70, pp. 107-126
95 NAJIM, K.: Multivariable control of a liquid-liquid extraction column using a probabilistic automaton, IEE Proc. D, 1988, 135, pp. 479-485
96 NAJIM, K., DAHHOU, B., and BABARY, J.P.: A variable structure automaton used as model and controller for a bioreactor. Advanced Information Processing in Automatic Control, IFAC Symposium, Nancy, France, 1989
97 THIBAULT, J., and VAN BREUSEGEM, V.: Modeling, prediction and control of fermentation processes via neural networks. 1st European Control Conf., Grenoble, France, 2-5 July 1991
98 ANDES, D., WIDROW, B., LEHR, M., and WAN, E.: MRIII: a robust algorithm for training analog neural networks. Int. Joint Conf. on Neural Networks, Washington, DC, 1990
99 KRAMER, M.A., and LEONARD, J.A.: Diagnosis using backpropagation neural networks - analysis and criticism, Comput. Chem. Eng., 1990, 14, pp. 1323-1338
100 VENKATASUBRAMANIAN, V., and CHAN, K.: A neural network methodology for process fault diagnosis, AIChE J., 1989, 35, pp. 1993-2002
101 YDSTIE, B.E.: Forecasting and control using adaptive connectionist networks, Comput. Chem. Eng., 1990, 14, pp. 583-599
102 BHAT, N.V., and McAVOY, T.: Dynamic process modeling via neural computing. AIChE Annual Meeting, 1989
103 HOSKINS, J.C., and HIMMELBLAU, D.M.: Artificial neural network models of knowledge representation in chemical engineering, Comput. Chem. Eng., 1988, 12, pp. 881-890
104 NAJIM, K., and EL FATTAH, Y.M.: Use of a learning automaton in static control of a phosphate drying furnace. 5th IFAC/IFIP Int. Conf. on Digital Computer Applications to Process Control, The Hague, Netherlands, 14-17 June 1977
105 NAJIM, K., and EL FATTAH, Y.M.: Practical problems related to the use of learning models for control of industrial processes. Yale Workshop on Applications of Adaptive Control Systems Theory, New Haven, CT, USA, 1979
106 NAJIM, K.: Modelling and learning control of a rotary phosphate dryer, Int. J. Syst. Sci., 1989, 20, pp. 1627-1636
107 NAJIM, K.: Modelling and hierarchical learning control of a dryer, Appl. Math. Model., 1990, 14, pp. 655-660
108 MARKS, K.M., and GOSER, K.F.: Analysis of VLSI process data based on self-organizing feature maps. Proc. Neuro-Nîmes 88, France, 1988
109 NAJIM, K., KOUTSCHOUKALI, M.S., and LAGUERIE, C.: Control of a fluidized bed reactor based on learning theory. 3rd Yale Workshop on Applications of Adaptive Systems Theory, Yale, USA, 15-17 June 1983
110 BHAT, N.V., MINDERMAN, P.A. Jr., McAVOY, T., and WANG, N.S.: Modeling chemical process systems via neural computation, IEEE Control Syst. Mag., 1990, 10, pp. 24-30
111 HASHIM, A.A., AMIR, S., and MARS, P.: Application of learning automata to image data compression, in NARENDRA, K.S. (Ed.): Adaptive and learning systems: theory and applications (Plenum Press, New York, 1986)
112 BURR, D.J.: Experiments on neural net recognition of spoken and written text, IEEE Trans., 1988, ASSP-36, pp. 1162-1168
113 NAJIM, K.: Application des automates à structure variable à la commande d'un canal d'irrigation, RAIRO Automatique, 1981, 15, pp. 263-270
114 MASON, L.G., and GU, X.D.: Learning automata models for adaptive flow control in packet-switching networks, in NARENDRA, K.S. (Ed.): Adaptive and learning systems: theory and applications (Plenum Press, New York, 1986)
115 LEE, Y., and LIPPMANN, R.P.: Practical characteristics of neural network and conventional pattern classifiers on artificial and speech problems. Conf. on Neural Information Processing Systems, Denver, USA, 1989
116 WIDROW, B., and WINTER, R.G.: Neural nets for adaptive filters and adaptive pattern recognition, IEEE Computer, 1988

117 MEYBODI, M.R., and LAKSHMIVARAHAN, S.: Application of a learning approach to priority assignment in a two-class M/M/1 queueing system with unknown parameters. 3rd Yale Workshop on Applications of Adaptive Systems Theory, New Haven, USA, 1983, pp. 106-109
118 GLORIOSO, R.M.: Engineering intelligent systems (Digital Press, Bedford, 1980)
119 SIMONS, J., VAN BRUSSEL, H., DE SCHUTTER, J., and VERHAERT, J.: A self-learning automaton with variable resolution for high precision assembly by robots, IEEE Trans., 1982, AC-27, pp. 721-730
120 MILLER, W.T.: Real-time application of neural networks for sensor-based control of robots with vision, IEEE Trans., 1989, SMC-19, pp. 825-831
121 KITAMURA, S., and KUREMATSU, Y.: Autonomous motion planning and learning control of a biped locomotive robot. 11th IFAC World Congress, Tallinn, USSR, 13-17 August 1990
122 McAVOY, T., WANG, N.S., and BHAT, N.V.: Use of neural nets for interpreting biosensory data. Int. Joint Conf. on Neural Networks, Washington, DC, June 1989
123 NAIDU, S.R., ZAFIRIOU, E., and McAVOY, T.J.: Use of neural networks for sensor failure detection in a control system, IEEE Control Syst. Mag., 1990, 10, pp. 49-55
124 LIPPMANN, R.P.: Review of neural networks for speech recognition, Neural Comput., 1989, 1, pp. 1-38
125 WAIBEL, A., HANAZAWA, T., HINTON, G., SHIKANO, K., and LANG, K.J.: Phoneme recognition using time delay neural networks, IEEE Trans., 1989, ASSP-37, pp. 328-339
126 RAUCH, H., and WINARSKE, T.: Neural networks for routing communication traffic, IEEE Control Syst. Mag., 1988, 8, pp. 26-30
127 GLORIOSO, R.M., GRUENEICH, G.R., and DUNN, J.C.: Self organization and adaptive routing for communication networks. EASCON Rec., 1969, pp. 243-250
128 NARENDRA, K.S., WRIGHT, E., and MASON, L.G.: Application of learning automata to telephone traffic routing, IEEE Trans., 1977, SMC-7, pp. 785-792
129 RIORDON, J.S.: An adaptive automaton controller for discrete-time Markov processes, Automatica, 1969, 5, pp. 721-730

