
SIMULATED NEUROCONTROL OF AN AUTOGENOUS MILL

WITH EVOLUTIONARY REINFORCEMENT LEARNING

J.W. de V. Groenewald1, C. Aldrich1,*, J.J. Eksteen1, A.v.E. Conradie1 and L.P. Coetzer2

1 Department of Process Engineering, University of Stellenbosch,
Private Bag X1, Matieland 7602, South Africa, Fax +27(21)8082059,
*E-mail: ca1@sun.ac.za (corresponding author)
2 Anglo Platinum Management Services, Anglo Platinum, South Africa.

Abstract: In this investigation, the development of a nonlinear control system for an
autogenous mill was considered. A symbiotic adaptive neuroevolution algorithm was
used in conjunction with a dynamic multilayer perceptron model fitted to actual plant
data to evolve neurocontrol systems. Simulation studies established the potential of the
approach, which yielded satisfactory results, despite having had to learn from a model
that covered only part of the state space. Copyright © 2007 IFAC

Keywords: Neural Networks, Nonlinearity, Neural Control, Time Series Analysis,
System Identification.

1. BACKGROUND

Autogenous mills operate on the same principles as ball mills, except that grinding is accomplished through the impact of larger pieces of material on smaller pieces of equal density. For successful autogenous milling, it is thus essential that the top size range of the ore is capable of generating a grinding load able to crush the finer portion of the feed, as well as its own progeny. At the same time, the large lumps in the feed should not be so durable that they cannot be reduced at a rate equal to that at which feed enters the mill.

The principal variables that affect the control of an autogenous mill are the ore feed rate, water addition rate, ore feed hardness, and ore feed size distribution. The ore feed rate and water addition rate may be deliberately manipulated in order to control the process. In contrast, ore feed hardness and feed particle size distribution are disturbance variables, causing major problems when attempting to run a stable mill under optimal conditions (Napier-Munn et al., 1999).

Although grinding circuits exhibit nonlinear dynamic behaviour, controller design has largely been investigated from a linear perspective (e.g. PID control), which requires the use of linear process models. As rigorous grinding circuit models are invariably nonlinear, these models are typically linearized in the vicinity of a predetermined, economically optimal operating region. Although such linear MIMO control strategies have been applied with success in industrial grinding circuits (Herbst and Rajamani, 1979), linearization becomes ineffective in the presence of severe process nonlinearities.

In this paper, it is shown by way of simulation studies that an advanced control scheme, based on evolutionary reinforcement learning from a dynamic model fitted to historic process data, can lead to significantly improved control of an industrial autogenous mill. The paper is organized as follows. In section 2, the principles of evolutionary reinforcement learning are considered. In section 3, the identification of an industrial mill is explained, and in section 4, the results of the simulation study are presented, followed by some concluding remarks in section 5.
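By way of illustration, the linearization step referred to above can be sketched as follows. The two-state toy mill map and its operating point below are invented purely for illustration (they are not the rigorous circuit models discussed in the text); the finite-difference routine yields the local linear model, x(k+1) ≈ A·x(k) + B·u(k), on which a linear MIMO or PID design would then be based.

```python
import numpy as np

def mill_dynamics(x, u):
    """Toy nonlinear one-step map of two mill states (illustrative only)."""
    return np.array([
        0.9 * x[0] + 0.1 * np.tanh(u[0]) + 0.05 * x[0] * x[1],
        0.8 * x[1] + 0.2 * u[1] - 0.02 * x[0] ** 2,
    ])

def linearize(f, x0, u0, eps=1e-6):
    """Finite-difference Jacobians A = df/dx and B = df/du at the
    operating point (x0, u0) -- the linearization described in the text."""
    n, m = len(x0), len(u0)
    fx0 = f(x0, u0)
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n); dx[i] = eps
        A[:, i] = (f(x0 + dx, u0) - fx0) / eps
    for j in range(m):
        du = np.zeros(m); du[j] = eps
        B[:, j] = (f(x0, u0 + du) - fx0) / eps
    return A, B

# Linear model valid only near the chosen operating point.
A, B = linearize(mill_dynamics, np.array([1.0, 1.0]), np.array([0.0, 0.0]))
```

Far from the chosen operating point the Jacobians A and B no longer describe the map, which is precisely the shortcoming of linearized control under severe nonlinearity.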
2. EVOLUTIONARY REINFORCEMENT LEARNING

2.1 Reinforcement learning

Reinforcement learning is learning, not by specifying specific actions, but by allowing the discovery of actions that yield a maximum reward through actually trying them. This trial-and-error search characteristic, combined with a delayed reward characteristic, is probably the most important distinguishing feature of reinforcement learning (Sutton and Barto, 1998). During learning, the learning agent (e.g. the controller) needs to be able to observe the state of its environment (e.g. the process) and take actions that will affect the current environmental state. In contrast with many other approaches, reinforcement learning explicitly considers the whole problem of a goal-directed agent interacting in an uncertain environment, and therefore requires the learning agent to have an explicit goal.

2.2 Evolutionary algorithms

Evolutionary algorithms (EAs) is the collective term used to describe problem-solving systems that use known mechanisms of evolution in their design and implementation. Whereas standard reinforcement learning focuses on individual structures, EAs maintain a population of structures that evolve according to rules of selection and evolutionary operators. The operators used by EAs to simulate the evolution of individual structures are selection, mutation, recombination and reproduction. These search operators are responsible for the exploitation and exploration found in standard reinforcement learning. EAs, though simplistic, are sufficiently complex to provide robust and powerful adaptive search mechanisms (Heitkötter and Beasley, 2001).

Combining evolutionary algorithms and reinforcement learning results in a technique referred to as evolutionary reinforcement learning, which has several advantages over standard reinforcement learning. One of the more important advantages is that it can balance exploration and exploitation by maintaining a population of different strategies. Exploitation is accomplished by assigning a greater number of evaluations to individuals that display more effective behaviour than the average individual, as well as by transmitting parents with high fitness values unchanged to the next generation for evaluation. Exploration is effected by the evolutionary (search) operators producing offspring and, unlike other random exploration strategies, is directed towards solutions with a greater probability of producing greater rewards (Conradie, 2000).

In this investigation, the SANE (Symbiotic, Adaptive Neuro-Evolution) evolutionary reinforcement learning algorithm, developed by Moriarty and Miikkulainen (1996), was used to develop a neurocontroller with which to control the FAG mill.

2.3 SANE evolutionary reinforcement learning algorithm

With SANE, both a neuron population and a network blueprint population are evolved. In the population of neurons, each neuron specifies a set of weights and connections to the input layer and output layer of a single-hidden-layer neural network. In the population of neural networks, each genotype specifies a grouping of neurons to include in the network. As indicated in Fig. 1, neurons are thus evolved as effective partial solutions, in tandem with the effective combinations of these partial solutions into neurocontrollers (Moriarty & Miikkulainen, 1998).

Fig. 1. Network blueprint population in relation to the neuron population.

Pairs of genes in each neuron genotype encode an even number of connection-weight combinations. The first gene in a pair encodes a neuron connection and the second gene encodes the weight for that particular connection. Each connection gene can assume an integer value ranging from zero to one less than the total number of input and output nodes. Decoding a connection gene assigns a connection to either an input or an output node. Should the integer value in the connection field be less than the total number of input nodes, the connection is made to the corresponding input node; otherwise, the connection is made to the corresponding output node. Connections are thus probabilistically assigned to either input or output nodes, based on the ratio of input to output nodes.

Each weight gene is a floating-point value drawn from a zero-mean Gaussian distribution with a standard deviation of around 2. Connection and weight genes are randomly allocated to each neuron in the population (Moriarty & Miikkulainen, 1998).
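The encoding can be sketched as follows. The node counts and genome length below are assumed for illustration only (the paper does not specify them); the sketch shows how a genotype of connection-weight gene pairs is decoded, with connection labels below the input count mapping to input nodes and the remainder to output nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_OUT = 4, 2    # assumed node counts, for illustration only
GENES_PER_NEURON = 8  # assumed number of connection-weight pairs

def random_neuron():
    """A neuron genotype: connection-weight gene pairs. Connection labels
    are integers in [0, N_IN + N_OUT - 1]; weights are drawn from a
    zero-mean Gaussian with standard deviation 2, as described above."""
    labels = rng.integers(0, N_IN + N_OUT, size=GENES_PER_NEURON)
    weights = rng.normal(0.0, 2.0, size=GENES_PER_NEURON)
    return list(zip(labels, weights))

def decode(neuron):
    """Decode a neuron genotype into input-side and output-side weights
    of one hidden node. A label below N_IN connects to that input node;
    any other label connects to output node (label - N_IN), so a uniform
    label is an input connection with probability N_IN / (N_IN + N_OUT)."""
    w_in = np.zeros(N_IN)
    w_out = np.zeros(N_OUT)
    for label, w in neuron:
        if label < N_IN:
            w_in[label] += w          # connection to an input node
        else:
            w_out[label - N_IN] += w  # connection to an output node
    return w_in, w_out

w_in, w_out = decode(random_neuron())
print(w_in.shape, w_out.shape)  # prints: (4,) (2,)
```

A blueprint genotype would then simply be a tuple of indices into a population of such neurons, each decoded neuron contributing one hidden node to the assembled network.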
Genotypes in the network blueprint population comprise sets of pointers (i.e. address pointers) to neural structures. Initially, these neural address pointers are assigned randomly to neural structures (Moriarty & Miikkulainen, 1998). Moreover, the number of neurons in each network is fixed, depending on the complexity of the problem. SANE's evolutionary generational algorithm operates in two main phases – evaluation and recombination – as briefly explained below.

Evaluation: The evaluation phase determines the fitness of each neuron and network in the populations. Network blueprints are evaluated based on reinforcement from direct interaction with either a dynamic simulation or a real-world environment. A fitness value is assigned to the blueprint network based on the criterion specified by the objective function.

Each neuron genotype contained within a blueprint network is assigned a fitness based on the summed fitness of the five best networks in which the neuron participated. Using only the best five networks prevents an average neuron with several pointers from dominating more effective neuron discoveries that have few pointers (Moriarty & Miikkulainen, 1998).

Recombination of the neuron population: After evaluation, the neuron population and the network blueprint population are ranked based on the assigned fitness. For each neuron in the top 20% of the neuron population (i.e. the elite population), a mate is selected randomly from the neurons that have a higher fitness than that particular neuron. For example, the neuron ranked third may only reproduce with the neurons ranked second and first. Two descendent neurons are created from a one-point crossover operator and one of the descendent neurons is randomly selected to enter the population. The remaining descendent neuron is replaced randomly by one of the parent neurons, after which the descendent neuron is inserted into the population. Copying one of the parent neurons as the second descendent reduces the effect of adverse neuron mutation on the blueprint population. The two descendents replace the most ineffective neurons in the population according to rank. This replaces a number of the most ineffective neurons (i.e. double the number of elite neurons) in the population after each generation. No mutation operator is used on the elite or breeding neurons (Moriarty & Miikkulainen, 1998).

Mutation is applied to the non-elite (non-breeding) portion of the neural population only. Connection genes were randomly reassigned with a 2% probability to either an input or output node. Weight genes were mutated with a probability of 4% for a random Gaussian weight adjustment and a probability of 0.1% for inversion of the sign of the weight (Moriarty & Miikkulainen, 1998).

This aggressive, elitist breeding strategy is normally not incorporated in the evolution of neurocontrollers, as it would generally lead to premature convergence of the population. However, since SANE is resistant to convergence pressure, it performs well with this aggressive strategy (Moriarty & Miikkulainen, 1998).

Recombination for the network population: Crossover in the blueprint population results in the exchange of address pointers to the neuron population. Should a parent point to a specific neuron, one of its children will consequently also point to that particular neuron (Moriarty & Miikkulainen, 1998).

To avoid convergence in the blueprint population, a twofold mutation strategy was incorporated. Pointers were reassigned with a probability of 0.2% to randomly selected neurons in the neural population. This promoted the use of neurons other than those in the elite neuron population. During the evolution of blueprints, breeding neuron pointers were assigned to descendent neuron pointers with a 50% probability. These two mutation operators preserved the neuron pointers in the top blueprints, by not mutating any breeding (elite) networks (Moriarty & Miikkulainen, 1998).

3. IDENTIFICATION OF AN INDUSTRIAL AUTOGENOUS MILL

All the variables from the closed-circuit FAG mill were embedded by use of singular spectrum analysis (Golyandina et al., 2001; Vautard et al., 1992), including the input variables that are also the principal variables affecting the control of an autogenous mill. Four variables were used, viz. the mill power (x1), the mill load (x2), the mill fine feed rate (x3) and the mill inlet water flow rate (x4).

More specifically, each of the four time series variables xj(t), t = 1, 2, … N, and j = 1, 2, … 4, was embedded via construction of a trajectory matrix obtained by sliding a window of length M along the time series to give lagged vectors xi ∈ RM (dropping the subscript j for clarity):

xi = [x(i), x(i+1), … x(i+M-1)]T, for i = 1, 2, … N-M+1 (1)

The length of the sliding window, M, was determined as the lowest lag at which the autocorrelation function of the time series assumed a negligibly small value. The vectors xi thus formed were collected in an augmented multidimensional time series known as a trajectory matrix,

X = [x1, x2, … xN-M+1]T (2)

The trajectory matrices of all the variables were consequently aggregated and trimmed to form a large matrix with Q columns, from which a Q × Q covariance matrix CX was constructed (Broomhead and King, 1986):

CX = XTX/(N-M+1) (3)

The decomposition of the aggregated trajectory matrix was obtained as

CXak = λkak, k = 1, 2, …, M (4)

where ak and λk are the k'th eigenvector and eigenvalue, respectively. The eigenvector is often referred to as an empirical orthogonal function in the time series literature.

The scores of these empirical orthogonal functions were consequently mapped to the target variables with a neural network. This resulted in a multilayer perceptron neural network containing seven nodes in its single hidden layer. The output of this model predicted both the FAG mill power (x1) and the FAG mill load (x2) one step ahead in time (30-second sampling interval). Fig. 2 shows the free-run prediction of the mill power, i.e. the prediction of the model when it uses its predicted outputs as inputs in a moving time window. For comparative purposes, Fig. 3 shows the same predictions obtained with a linear model. From these two figures, it is clear that the linear ARX model was unable to capture the underlying nonlinear dynamics of the data, while the nonlinear MLP neural network model generalized better.

Fig. 2. Free-run prediction of the mill power with a multilayer perceptron neural network.

Fig. 3. Free-run prediction of the mill power with a linear ARX model.

Fig. 4 shows the dynamic behaviour of the mill in terms of the three most important pseudostate space variables of the system (the first three principal components extracted from the aggregated trajectory matrix of the variables). As can be seen from the figure, the reconstructed attractor occupies three distinct areas in the state space, with the largest containing most of the observations. The two smaller clusters in Fig. 4 indicate periods of faulty operation, where the FAG mill discharge flow rate (light gray cluster in the middle) and the rougher feed flow rate (smaller, darker shaded cluster at the bottom) were zero.

Fig. 4. Dynamic behaviour of the mill as portrayed by the attractor of the mill variables in phase space.

The models depicted in Figs 2 and 3 were fitted to the data contained in the main cluster only, and not to the smaller clusters associated with deviations in process operating conditions.

4. RESULTS

Using the aforementioned model with neurocontroller software based on the SANE algorithm, a neurocontroller was developed for a FAG mill at an industrial plant in South Africa. Simulation tests were conducted in which the desired setpoint for the FAG mill power (x1) was set at 3000 kW, although only sparse process data were available within this operating region, with the neurocontroller determining the FAG mill fine feed rate (x3) and the FAG mill inlet water flow rate (x4) to be used in order to achieve the desired setpoint.

During simulation, the desired set points were
achieved within approximately one hour after initiating control (Figs 5 and 6), which appeared to be realistic. The neurocontroller selected the FAG mill inlet water flow rate (x4) as the primary manipulated variable, since the control actions of this variable had a larger effect than those of the FAG mill fine feed rate (x3).

Fig. 5. Process responses for FAG mill power (x1) initially above 3000 kW.

Fig. 6. Process responses for FAG mill power (x1) initially below 3000 kW.

Although the effect of the inlet water flow rate is consistent with the results obtained from both a sensitivity analysis and a decision tree model fitted to the system under investigation, it may not be the ideal manipulated variable to use, particularly in open circuits. Adding large volumes of water to rectify an overfilled mill may lead to intermediate and fine particles being flushed out, thereby leaving only large particles in the mill. This could lead to a reduction in the rate of abrasive grinding and the production of fines. Moreover, flushing of intermediate particle sizes from the mill could result in particles that are too coarse entering the flotation circuit. These coarse particles will not float and, depending on the circuit setup, may even cause sanding in the flotation circuit. These potential problems could easily be circumvented, though, by placing suitable constraints on the manipulated variables at the neurocontroller's disposal during the learning phase.

It should be noted that a drawback of using artificial neural networks for control, as with modelling, is that although the network is capable of learning the input-output response of a given region, it cannot extrapolate, scale up, or transfer the learning to other regions of operation, as is possible with models composed of differential equations (Hoskins and Himmelblau, 1992). It is therefore important that sufficient process data are collected in the region of the desired process setpoint (currently the FAG mill power (x1) at 3000 kW), which should result in improved process model performance in this region of the state space.

5. CONCLUSIONS

An advanced control system capable of controlling a nonlinear FAG mill could be evolved with the SANE algorithm. Although previous work using the same approach on ball mills (Conradie and Aldrich, 2001) also yielded promising results, the significance of this work is that the approach was extended to an industrial system that could not be modelled or simulated from first principles.

Instead, simulation was accomplished by fitting a dynamic process model to historic data that covered only part of the state space. More specifically, the effects of the hydrocyclone parameters, such as density and pressure, as well as those of the coarse feed, were not included in the model, despite the fact that these variables can have a significant influence on the circulating load and the mill power.

Despite these limitations, the simulation studies indicated that the neurocontrollers developed by evolutionary reinforcement learning (the SANE algorithm) were capable of realistically controlling the FAG mill.

REFERENCES

Broomhead, D.S. and King, G.P. (1986). Extracting qualitative dynamics from experimental data. Physica D, 20, 217-236.
Conradie, A.v.E. (2000). Neurocontroller Development for Nonlinear Processes Utilising Evolutionary Reinforcement Learning. Thesis (MSc), University of Stellenbosch, Stellenbosch, South Africa.
Conradie, A.v.E. and Aldrich, C. (2001). Neurocontrol of a ball mill grinding circuit. Minerals Engineering, 14(9), 1277-1294.
Golyandina, N., Nekrutkin, V. and Zhigljavsky, A. (2001). Analysis of Time Series Structure: SSA and Related Techniques. Chapman & Hall/CRC, FL, USA.
Heitkötter, J. and Beasley, D., eds. (2001). The Hitch-Hiker's Guide to Evolutionary Computation: A List of Frequently Asked Questions (FAQ) [online]. USENET: comp.ai.genetic. Available from: ftp://rtfm.mit.edu/pub/usenet/news.answers/ai-faq/genetic/, 12 April 2002.
Herbst, J.A. and Rajamani, K. (1979). Evaluation of optimising control strategies for closed circuit grinding. In: Developments in Mineral Processing, D.W. Fuerstenau (ed.), International Mineral Processing Congress, Warsaw, 1642-1670.
Hoskins, J.C. and Himmelblau, D.M. (1992). Process control via artificial neural networks and reinforcement learning. Computers & Chemical Engineering, 16(4), 241-251.
Moriarty, D.E. and Miikkulainen, R. (1996). Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22, 11-33.
Moriarty, D.E. and Miikkulainen, R. (1998). Forming neural networks through efficient and adaptive coevolution. Evolutionary Computation, 5(4), 373-399.
Napier-Munn, T.J., Morrell, S., Morrison, R.D. and Kojovic, T. (1999). Mineral Comminution Circuits: Their Operation and Optimisation. Julius Kruttschnitt Mineral Research Centre, Australia.
Sutton, R.S. and Barto, A.G. (1998). Reinforcement Learning: An Introduction. MIT Press, London.
Vautard, R., Yiou, P. and Ghil, M. (1992). Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. Physica D, 58, 95-126.
