Managing Editor:
M. HAZEWINKEL
Centre for Mathematics and Computer Science, Amsterdam, The Netherlands
Volume 282
Cellular Automata,
Dynamical Systems
and Neural Networks
edited by
Eric Goles
and
Servet Martinez
Departamento de Ingeniería Matemática,
F.C.F.M.,
Universidad de Chile,
Santiago, Chile
FOREWORD
This book contains the courses given at the Third School on Statistical Physics
and Cooperative Systems held at Santiago, Chile, from 14th to 18th December
1992. The main idea of this periodic school was to bring together scientists working on subjects related to recent trends in Statistical Physics, more precisely to non-linear phenomena, dynamical systems, ergodic theory, cellular automata, symbolic dynamics, large deviation theory and neural networks. Scientists
working in these subjects come from several areas: mathematics, biology, physics,
computer science, electrical engineering and artificial intelligence. Recently, an important cross-fertilization has taken place among these scientific and technological disciplines, giving a new approach to research whose common core remains statistical physics.
The expository text of François Blanchard concerns the study of normal numbers and their preservation under certain symbolic transformations. This work furnishes
the main concepts used in symbolic dynamics and automata theory. Some open
problems dealing with cellular automata are presented.
The survey paper of Artur Lopes is devoted to the study of the relations between the ergodic theory of dynamical systems and large deviation theory. In this paper the main concepts and basic results of ergodic theory are introduced: Birkhoff's theorem, entropy, pressure and the Ruelle-Perron-Frobenius operator; as well as
the formalism of large deviation theory. The main results connecting pressure and
free energy are established.
The exposition of Patricio Perez deals with the storage capacities of artificial
neural networks. He presents in the framework of statistical mechanics the stor-
age of unbiased, biased and correlated patterns as well as some numerical results
concerning the storage capacity of two coupled Hopfield networks.
The editors are grateful to the participants of the School, as well as to the
authors of the individual chapters. They are also indebted to the sponsors and sup-
porters whose interest and help were essential for the success of the meeting: Fonde-
cyt, Conicyt, French Cooperation, Departamento de Relaciones Internacionales
and DTI of the Universidad de Chile and Departamento de Ingeniería Matemática
and CENET of the Facultad de Ciencias Físicas y Matemáticas.
Mrs. Gladys Cavallone deserves a special mention for her very fine and hard
work typing the book.
The Editors
CELLULAR AUTOMATA AND TRANSDUCERS.
A TOPOLOGICAL VIEW
FRANÇOIS BLANCHARD
C.N.R.S.
Laboratoire de Mathématiques Discrètes
Case 930 - 163 avenue de Luminy
13288 Marseille Cedex 9
France
ABSTRACT. In this article we deal with two of the numerous instances in which automata
play a part in Topological or Measurable Dynamics. The first is preservation of normality by
transducers; here we give a detailed account of the main proof in [2], that of normality preservation
under multiplication by rationals. The second instance is an introduction to the dynamical
properties of onto cellular automata - since there are but a few known results about them, we
mainly give definitions, point out some elementary properties and ask questions. These two
applications of automata in the field of Dynamics, though very different in spirit, are strongly
linked, because cellular automata are a particular class of transducers, because entropy and other,
mainly topological, notions from Dynamical Systems play an important part in both, and finally
because one of the underlying aims is to study the transformations of the interval associated to
some of these automata.
1. Introduction
E. Goles and S. Martinez (eds.), Cellular Automata, Dynamical Systems and Neural Networks, 1-22.
© 1994 Kluwer Academic Publishers.
of tools. In the first instance transducers are used as tools: we give an ergodic proof of the fact that normality is preserved under multiplication by rationals. Only the proof is original in the article it comes from [2] - the result was first obtained during the fifties by Harmonic Analysis - but there are some interesting side effects: the method permits us to investigate what happens, under multiplication by rationals, to real numbers that are generic for some non-Lebesgue measures; it also allows one to prove that multiplication by rationals preserves "near" normality.
The second topic is the dynamics of onto cellular automata - which means
in this case their properties are described for their own sake. A possible physical
interpretation is the behaviour of some interacting or cooperative systems at equi-
librium. This field is wide and largely unexplored, and for this reason promising for
beginners. After recalling the definitions of topological conjugacy and entropy, we
introduce two notions for one-sided cellular automata, expansiveness and a "standard symbolic factor", which we feel sure will play an important part in future developments, and then prove that the symbolic factor has the same entropy as the cellular automaton, and that expansiveness is equivalent to the fact that the cellular automaton and its symbolic factor are conjugate. Finally a few open questions are posed.
We have made reference to some of the papers we are aware of, which does not
mean authors of other papers in the field should feel undeservedly neglected; this
is decidedly not a survey, and all the less so in the domain of cellular automata,
where a huge literature already exists. As should be expected, there is infinitely
more in the ergodic theory of compact spaces than is to be found in this paper; only
a few useful ergodic definitions and results are recalled, mainly at the beginning of
the sections in which they are used. Motivated readers can consult [7] or [21] for a
deeper insight. Let us point out that in this article the point of view is rather that of
topological dynamics - considering primarily a compact metric space, such as [0, 1)
or the shift space, endowed with a continuous transformation, and then introducing
one or several invariant measures - than that of metric dynamics - considering a
probability space, and then a measurable measure-preserving transformation.
After defining transducers and introducing our two topics in Section 2, Section
3 is devoted to multiplication transducers and normality preservation, and Section
2. Transducers
Let A be a finite set of symbols, or alphabet. A* denotes the set of all finite sequences on A, and A⁺ the set of all nonempty ones. The sets of infinite sequences on A, A^ℕ and A^ℤ, will be used constantly in this article: their elements are denoted by x = (x_i), i ∈ ℕ or ℤ; we write x(i, j) = x_i x_{i+1} … x_j. They are compact for the usual topology, and usually endowed with the shift transformation σ, defined by (σx)_i = x_{i+1}.
Example 1. For A= A'= {0, 1}, C ={a, b}, consider the following graph as that
of a left-to-right transducer:
[Figure: a two-state transducer graph on C = {a, b}; its arcs carry input/output labels such as 0/0, 1/1, 0/1 and 1/0.]
Let us next introduce some relevant vocabulary. When one considers only one of the labels (the input or the output), the transducer T is reduced to what is usually called an automaton. Because of this, all the usual terms of Automata Theory are applied to transducers, with the added qualification "input" or "output" when necessary for the sake of definiteness.
A transducer is said to be irreducible when its graph is strongly connected.
A word u ∈ A* (resp. A'*) is accepted for input (resp. output) by T if there is a path with input (output) label u in the graph. The set of all input (output) words accepted by T is denoted by L_0(T) (resp. L_1(T)). These two languages define two subshifts. But the only case we are concerned with is when L_0(T) = L_1(T) = A*.
Suppose a transducer is assigned the (too hard) task of translating English into
Spanish. One would expect it to accept more or less correct English words, phrases
or sentences: there is no point in translating rubbish. Then the sets of "words" (in
the sense of Language Theory) accepted by this transducer for input and output
would be proper subshifts. On the other hand, a transducer performing multipli-
cation by 3 on the expansions of all real numbers (or even integers) to base 2 must
accept all words on {0, 1}, since all of them occur in some expansion. The cellular
automata we shall be dealing with possess the same property.
A transducer is said to be input-deterministic if, given c ∈ C and a ∈ A, there exists at most one arc from c with input a (same definition for output). For instance the transducer of Example 1 is input- and output-deterministic. This property is very useful but sometimes too strong. A convenient weaker one is the following: a non-ambiguous transducer is one such that for any u ∈ A⁺ and c, c' ∈ C, there is at most one path in the graph from c to c' with label u. A deterministic transducer is obviously non-ambiguous. Non-ambiguity implies that for any u ∈ A⁺ there are at most #(C) paths from c with label u.
Two particular classes are especially considered in the sequel: cellular au-
tomata, which are a very simple kind of transducers, much easier to handle than
the general type since they are merely maps, and multiplication transducers.
Cellular automata are widely used in statistical physics in order to model the
microscopic equilibrium or evolution of fluids or spin glasses. A cellular automaton
is a map F from the configuration set A^ℤ or A^ℕ to itself, defined in the first case by

(F(x))_i = f(x(i − n, i + n))

for some given map f : A^{2n+1} → A, and in the second by

(F(x))_i = f(x(i, i + r))

for some given map f : A^{r+1} → A; the integer r is called the radius of F.
Not all transducers can be reduced to cellular automata. The ones performing multiplication by k in base p generally cannot [3]. Multiplication transducers and their properties are well known to language theorists, but we do not know of any book in which they are described; we therefore refer the reader to [2, Section 4.A] for proofs.
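The one-sided cellular automaton definition is concrete enough to compute with. A minimal sketch (not from the text; the alphabet, local rule and truncation length are our illustrative choices): a radius-1 automaton on A = {0, 1} with local rule f(a, b) = a XOR b, applied to a finite truncation of a configuration.

```python
# One-sided cellular automaton of radius 1 on A = {0, 1}:
# (F(x))_i = f(x_i, x_{i+1}) with f(a, b) = a XOR b.
# On a finite truncation of length n, only n - 1 coordinates
# of the image are determined.

def local_rule(a, b):
    return a ^ b  # XOR, a simple onto local rule

def apply_ca(x):
    """Apply F to a finite truncation x of a one-sided configuration."""
    return [local_rule(x[i], x[i + 1]) for i in range(len(x) - 1)]

x = [1, 0, 1, 1, 0]
print(apply_ca(x))  # -> [1, 1, 0, 1]
```

Each application shortens a finite truncation by the radius, which is why all statements below are phrased for genuinely infinite configurations.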
The transducer T_{k,p} multiplying by k in base p is just a representation of the usual algorithm. Each of the sets A and A' is equal to {0, 1, …, p − 1}. The state set C is the set of carries {0, 1, …, k − 1}. Like the algorithm, the transducer acts from right to left. Denote by [r] the integer part of the real number r. Suppose the initial carry is c, and one has to multiply the input a: then the output b and the new carry c' are given by the formulas

b = ka + c (mod p)   (1)

and

c' = f(ka + c),   (2)

where f(n) = [n/p]. From these formulas one easily deduces that the set of carries may be restricted to {0, 1, …, k − 1}. The graph of T_{k,p}, together with the input labels, can be deduced from Equation (2); Equation (1) gives the output labels. Equation (2) also testifies that no other carry need be added to the set C, and that T_{k,p} is always input-deterministic (this simply means that given the input and the carry at some time, one can deduce from them the carry at the next time). It is slightly more difficult to check that no carry in C can be dispensed with. The results we need are summed up in the following classical statement.
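Equations (1) and (2) translate directly into code. A sketch (the helper name is ours) of T_{k,p} acting on a finite string of base-p digits, least significant digit first:

```python
def multiply_digits(digits, k, p):
    """Run the transducer T_{k,p} over base-p digits, least significant
    first: b = (k*a + c) mod p, c' = (k*a + c) // p."""
    c = 0                      # initial carry; carries stay in {0, ..., k-1}
    out = []
    for a in digits:
        b = (k * a + c) % p    # Equation (1): output digit
        c = (k * a + c) // p   # Equation (2): new carry
        out.append(b)
    while c > 0:               # flush the remaining carry
        out.append(c % p)
        c //= p
    return out

# 37 is [7, 3] in base 10, least significant digit first
print(multiply_digits([7, 3], 3, 10))  # -> [1, 1, 1], i.e. 111 = 3 * 37
```

One can check on the update rule that the carry never leaves {0, …, k − 1}: c' = [(ka + c)/p] ≤ [(k(p − 1) + k − 1)/p] < k.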
3. Normality Preservation
i.e. random from a purely statistical point of view, do not behave satisfactorily at the "beginning" (say, for the first 10^10 digits!), at least in some sophisticated simulations. This is a strong motivation for developing theoretical research on normality.
Another one is more specific. Implicit in [9] is the question whether any non-atomic measure on the 1-torus, invariant under multiplication by 2 and multiplication by 3, is necessarily Lebesgue. Some progress was made towards a positive answer (see [13] for more on this subject) but the general result is still unknown. Transducers yield a new formulation of the problem - which does not mean they point to a solution.
In [1], [5], [14], [20] the reader can find other aspects of what can be done
about normality with more general kinds of transducers.
Both definitions imply μ is T-invariant; when μ is also ergodic, Birkhoff's theorem states that μ-almost all points are μ-generic. The most interesting cases for this study are when X = [0, 1) and T is multiplication by the integer k (mod 1), or X = A^ℕ and T is the shift; a point x ∈ [0, 1), generic for the Lebesgue measure, is called normal, as well as its expansion, which is generic for the uniform measure.
We only give the definition of the entropy of a shift-invariant measure μ on X = A^ℕ (or some subshift). For u ∈ A^n, denote by [u] the cylinder set {x ∈ X | x(0, n − 1) = u}; the entropy of μ is then

h_μ = lim_{n→∞} −(1/n) Σ_{u∈A^n} μ([u]) log μ([u]).
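The cylinder-set definition of entropy can be illustrated numerically: a sketch (parameter values are our illustrative choices) estimating h_μ for the uniform measure on {0, 1}^ℕ from the empirical frequencies of n-blocks in a long pseudo-random sequence; the limit value is log 2.

```python
import math
import random
from collections import Counter

def block_entropy(seq, n):
    """Empirical version of -(1/n) * sum_u mu([u]) log mu([u]),
    with mu([u]) replaced by the observed frequency of the n-block u."""
    blocks = Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
    total = sum(blocks.values())
    return -sum((c / total) * math.log(c / total)
                for c in blocks.values()) / n

random.seed(0)
seq = [random.randint(0, 1) for _ in range(200_000)]
print(block_entropy(seq, 8))  # close to log 2 ≈ 0.693 for the uniform measure
```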
Proposition 2. Suppose Φ is a factor map from the subshift Y to the subshift X, such that card{Φ^{-1}(x)} is bounded on X; let ν be an invariant measure on Y and μ = Φν. Then h_μ = h_ν.
3.2 MORE ABOUT MULTIPLICATION TRANSDUCERS
All schoolchildren know (or rather should know) how to use multiplication algorithms. So why bother to represent them as transducers? The answer is that this presentation emphasises elementary properties of the associated graphs, and these turn out to be decisive for the proof that normality is preserved under multiplication by rationals, as well as for further results.
It is obvious how to use T_{k,p} to multiply an integer by k in base p, or to do the same mod 1 to a dyadic number. But is the second case really as obvious as it seems? It is if we only think of the canonical expansion of a dyadic number; but if, as is the case, we want the set of expansions to be closed, a dyadic number has two expansions! In this paper we consider that a real number r ∈ [0, 1) has a set of expansions E(r) ⊂ A^ℕ with one or two elements; conversely, a sequence x ∈ A^ℕ has one valuation:

V(x) = Σ_{i=1}^{∞} x_i p^{-i} (mod 1).
Consider all possible infinite paths on the graph of T_{k,p} with input label in E(r), then the set F(r) of their output labels. I claim F(r) = E(k·r (mod 1)), which implies it contains at most two elements. Indeed, since T_{k,p} is deterministic, finite paths in the graph having the input label s(0, n) for some s ∈ E(r) are entirely determined by the carry chosen for the "initial" time n. But changing the carry at time n corresponds to a difference of valuation with modulus less than k·p^{−n}. Letting n tend to infinity, this means the valuations of distinct elements of F(r) are identical.
Formally, what we have been doing is this: to a real number in [0, 1) we substituted the set of its infinite expansions; then we considered all infinite sequences of C^ℕ × A^ℕ × A^ℕ representing a path in the graph of T_{k,p} together with its input label in E(r) and output label in F(r); and finally, from these triple sequences we selected the sequences of output labels, all of them representing the same number k·r (mod 1). This situation is represented in the following commutative diagram:

[Diagram: Y projects by φ and ψ onto two copies of X; the valuation V maps each copy of X onto [0, 1); the bottom arrow from [0, 1) to [0, 1) is multiplication ×k (mod 1), so that V ∘ ψ = (×k mod 1) ∘ V ∘ φ.]

Here Y is the closed subset of C^ℕ × A^ℕ × A^ℕ for which the sequence on C represents a path in the graph and the sequences on A are its input and output labels; φ is the projection of Y onto X corresponding to input labels, and ψ is the one corresponding to output labels.
Proof. Let r ∈ [0, 1), q ∈ ℚ, and let x be an expansion of r to base p. The statement is equivalent to the following claim: for any integer k, the expansion x' of k·r (mod 1) is normal iff x also is. Indeed, assuming q = k/k', the "if" part of the former claim establishes normality of k·r (mod 1) and the "only if" part, the normality of k·r/k' = q·r (mod 1); putting the two results together achieves the proof. Now it is sufficient to prove this for two simple cases: when k and p are coprime, and when k divides p. This is done by using the elementary properties of multiplication transducers (Proposition 1), together with classical results of Symbolic Dynamics.
since λ is the unique measure on X with maximal entropy, so is μ', which implies μ' = λ. ∎
Two other results in [2], generalising the latter, may be quoted here. The first states that given a rational q, the closer to normality r is, the closer q·r is. This may also be obtained by Harmonic Analysis, but of course the topological methods used in [2] are perfectly natural.
The second is the following: suppose the invariant measure μ on X is such that Φ^{-1}(μ) is a singleton, and x ∈ X is generic for μ. Then any transduced image x' of x is ψ(Φ^{-1}(μ))-generic. Non-trivial examples of this situation exist. One important fact about this result is the difference with normality preservation: one does not assume ψ(Φ^{-1}(μ)) = μ; the transduced image of a generic point is still generic, but generally for another invariant measure. In what cases are the two measures identical? This question brings us back to the one asked by Furstenberg.
Assume R is an open cover of the compact metric space X: we denote by H(R) the nonnegative real number inf{log card(R')}, where the infimum is taken over all finite subcovers R' of R. The cover R is said to be finer than S if for any U ∈ R there is V ∈ S with U ⊂ V; this property is denoted by S ≼ R. It implies H(S) ≤ H(R).

Denote by R ∨ S the cover made up of all intersections R ∩ S, R ∈ R, S ∈ S. For n ∈ ℕ write

R^{(n)} = ⋁_{i=0}^{n−1} T^{−i} R.
h(X, T) = lim_{n→∞} (1/n) log #(L(X) ∩ A^n).   (3)
Proposition 6. For any compact space X endowed with a continuous onto map T, one has

h(X, T) = sup_{μ ∈ I(X)} h_μ.
Here are two classical properties concerning the topological entropy of symbolic systems we are going to use:
- factor maps between sofic systems preserve topological entropy if and only if they are bounded-to-1, i.e. the preimages of points under the map have bounded cardinality;
- any proper subshift of A^ℤ has entropy strictly less than log #(A).
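Formula (3) can be exercised on a concrete proper subshift (a sketch, not from the text): the golden mean shift on {0, 1}, whose language consists of the words with no factor 11. Its word counts are Fibonacci numbers, so the limit in (3) is log((1 + √5)/2), indeed strictly less than log 2.

```python
import math

def count_words(n):
    """Number of binary words of length n with no factor '11'
    (the language of the golden mean subshift)."""
    end0, end1 = 1, 1  # words of length 1 ending in 0 / ending in 1
    for _ in range(n - 1):
        # a word ending in 0 may extend by 0 or 1; one ending in 1 only by 0
        end0, end1 = end0 + end1, end0
    return end0 + end1

n = 40
estimate = math.log(count_words(n)) / n   # (1/n) log #(L(X) ∩ A^n)
exact = math.log((1 + math.sqrt(5)) / 2)  # log of the golden ratio
print(estimate, exact)  # the estimate converges to log((1+sqrt(5))/2) < log 2
```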
In this subsection proofs are given for cellular automata acting on A^ℤ, but they are strictly identical for those acting on A^ℕ.
Kari [15] proved that it is undecidable whether a 2-dimensional cellular au-
tomaton is onto. In the one-dimensional case, the same question (which is a key
issue in this context) is fortunately decidable. Since this fact seems not to be
widely known, we shall give a sketch of the proof.
Proof. Denote by Fλ the image of the measure λ under the map F. By our assumption F(A^ℤ) = A^ℤ, therefore h(F(A^ℤ), σ) = h(A^ℤ, σ); since topological entropy is preserved by bounded-to-1 factor maps only, this means F is bounded-to-1. Now a bounded-to-1 map preserves measurable entropy, and h(Fλ, σ) = h(λ, σ). But λ is the unique measure on X having this entropy, hence Fλ = λ. ∎
Example 2. Let p > 1 be an integer and A = {0, …, p − 1} be endowed with addition mod p. Define F by f(x_0, …, x_r) = g(x_0 … x_{r−1}) + x_r, where g is some map from A^r to A.
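In Example 2 the local rule is permutive in its last variable: given the output and the first r letters of the input, the remaining input letters can be reconstructed one by one, which in particular makes F onto. A sketch with p = 2, r = 2 and an arbitrary g (all names and parameter choices are ours):

```python
p, r = 2, 2

def g(u):
    """An arbitrary map from A^r to A (here: logical AND)."""
    return u[0] & u[1]

def f(window):
    # f(x_0, ..., x_r) = g(x_0 ... x_{r-1}) + x_r (mod p)
    return (g(window[:r]) + window[r]) % p

def F(x):
    """Apply the cellular automaton to a finite truncation of x."""
    return [f(x[i:i + r + 1]) for i in range(len(x) - r)]

def invert(y, prefix):
    """Recover x from y = F(x) and the first r letters of x,
    using permutivity: x_{j+r} = (y_j - g(x_j ... x_{j+r-1})) mod p."""
    x = list(prefix)
    for j, yj in enumerate(y):
        x.append((yj - g(x[j:j + r])) % p)
    return x

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = F(x)
print(invert(y, x[:r]) == x)  # -> True
```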
From now on, denote by X the set A^ℕ of simply infinite sequences on A. Every onto cellular automaton (X, F) has a symbolic factor (Z, σ) which plays a primary role in its dynamics (it can also be defined for two-sided cellular automata, though we do not do it here). It is sometimes conjugate to (X, F) (Example 2 above and below) and sometimes not (F = Id, Example 3); it is not canonical, except of course in the first case. It plays an outstanding part in the entropy calculations of [6] and [17], though this is not pointed out in these articles.

We first introduce this factor map. Let π : X → (A^r)^ℕ be defined by πx = (F^i(x)(0, r − 1)), i ∈ ℕ. One may see πx as a set of r different infinite sequences on which the shift acts simultaneously. The reader can check that π is continuous and that π ∘ F = σ ∘ π: putting Z = π(X), π is a factor map from (X, F) to the subshift (Z, σ). To put things heuristically, π shrinks x ∈ X to the sequence of elements of the open cover R = {[u] : u ∈ A^r} (in fact a clopen partition) to which x, Fx, …, F^n x, … belong in their turn. Or else σ acts on Z exactly the way F acts on R and the sequence of its preimages.
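The intertwining relation π ∘ F = σ ∘ π can be checked on finite truncations. A sketch (our illustrative choices: the radius-1 XOR automaton, where r = 1 makes the blocks F^i(x)(0, r − 1) single letters):

```python
def F(x):
    # one-sided CA of radius 1: (F(x))_i = x_i XOR x_{i+1}
    return [x[i] ^ x[i + 1] for i in range(len(x) - 1)]

def pi(x, steps):
    """Truncation of pi(x): first letters of x, F(x), F^2(x), ..."""
    blocks = []
    for _ in range(steps):
        blocks.append(x[0])  # the r-block F^i(x)(0, r-1), here one letter
        x = F(x)
    return blocks

x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
# pi(F(x)) is the shift of pi(x): pi o F = sigma o pi
print(pi(F(x), 6) == pi(x, 7)[1:])  # -> True
```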
In general, the topological entropy of a cellular automaton is undecidable [11].
It can nevertheless be computed for some restricted classes of automata, as shown
by Coven [6] and Lind [17]. In both papers the topological entropy is in fact
computed on the factor (Z, σ) or its equivalent for two-sided cellular automata,
which prompted A. Maass and the author to make the following observations.
Remark. The subshifts Z_n are all conjugate to Z = Z_0, because they are the same set of points endowed with the same transformation. But π_n is never a conjugacy map for n ≠ 0, since no information at all about x(0, n − 1) can be recovered from its image.
Another property of onto cellular automata which we think promising is the following.
Proof. Suppose (Z, σ) is conjugate to (X, F). Then (X, F) is symbolic, therefore expansive.

Conversely, assume F has radius r and is expansive. For x ≠ y, d(F^n x, F^n y) > ε for some n: whatever the chosen distance d, this means there exists q ∈ ℕ⁺ such that F^n x(0, q − 1) ≠ F^n y(0, q − 1), and since ε is universal, q does not depend on x and y. For any p ∈ ℕ⁺ consider the factor map ψ_p : X → (A^p)^ℕ defined by ψ_p(x) = (F^i(x)(0, p − 1)), i ∈ ℕ, and its image Y_p: the previous remark means that ψ_q is 1-to-1, therefore (Y_q, σ) and (X, F) are conjugate and (Y_q, σ) is symbolic.

It remains to replace Y_q by Z = Y_r. If q ≤ r this is very easy: (X, F) is conjugate to (Y_q, σ), which is a factor of (Y_r = Z, σ), which is in its turn a factor of (X, F), so that (Z, σ) and (X, F) are conjugate.
Now call q the smallest possible value such that (Y_q, σ) and (X, F) are conjugate, and assume q is greater than r. As q is the smallest possible value for conjugacy one can find x ≠ y in X with π_{q−1}(x) = π_{q−1}(y), the last equality implying that x(0, q − 2) = y(0, q − 2). Consider the two points x' ≠ y' of X defined by σx' = x, σy' = y and x'(0) = y'(0) = a ∈ A. The equality π_{q−1}(x) = π_{q−1}(y) identifies all but the first infinite sequences of symbols constituting π_q x' and π_q y'; also, since the first q letters of x' and y' are the same, this is also true by induction for F^n x' and F^n y'. So π_q x' = π_q y' whereas x' ≠ y', which contradicts the minimality assumption on q. Hence the result. ∎
F is onto [10]. But to compute the entropy, up to now it has been necessary to make an extra assumption: b = b_1 … b_r must be an aperiodic word, meaning there
References
[1] Blanchard, F., Non Literal Transducers and Some Problems of Normality, J. Théor. Nombres Bordeaux, to appear.
[2] Blanchard, F., Dumont, J.-M., Thomas, A., Generic Sequences, Transducers and Multiplication of Normal Numbers, Israel J. Math. 80, 257-287 (1992).
[3] Blanchard, F., Host, B., Maass, A., Représentation par Automates de Fonctions Continues du Tore, preprint (1993).
[4] Botelho, F., Garzon, M., On Dynamical Properties of Neural Networks, Complex Systems 5, 401-413 (1991).
[5] Broglio, A., Liardet, P., Prediction with Automata, in: Symbolic Dynamics and its Applications, P. Walters (ed.), Contemporary Math. 135, AMS, Providence (1985).
[6] Coven, E.M., Topological Entropy of Block Maps, Proc. Amer. Math. Soc. 78, 590-594 (1980).
[7] Denker, M., Grillenberger, C., Sigmund, K., Ergodic Theory on Compact Spaces, Lecture Notes in Math. 527, Springer, Berlin (1976).
[8] Fischer, R., Sofic Systems and Graphs, Monatsh. Math. 80, 179-186 (1975).
[9] Furstenberg, H., Disjointness in Ergodic Theory, Minimal Sets, and a Problem of Diophantine Approximation, Math. Systems Theory 1, 1-49 (1967).
[10] Hedlund, G.A., Endomorphisms and Automorphisms of the Shift Dynamical System, Math. Systems Theory 3, 320-375 (1969).
[11] Hurd, L.P., Kari, J., Culik, K., The Topological Entropy of Cellular Automata is Undecidable, Ergodic Th. Dynam. Sys. 12, 255-265 (1992).
[12] Goles, E., Martínez, S., Automata Networks, Dynamical Systems and Statistical Physics, Kluwer Academic Pub., Mathematics and its Applications (1992).
[13] Johnson, A.S., Measures on the Circle Invariant under Multiplication by a Nonlacunary Subsemigroup of the Integers, preprint (1991).
[14] Kamae, T., Weiss, B., Normal Numbers and Selection Rules, Israel J. Math. 21, 101-110 (1975).
[15] Kari, J., Decision Problems Concerning Cellular Automata, Thesis, University of Turku, Finland (1990).
[16] Lind, D.A., Applications of Ergodic Theory and Sofic Systems to Cellular Automata, Physica 10 D, 36-44 (1984).
[17] Lind, D.A., Entropies of Automorphisms of a Topological Markov Shift, Proc. Amer. Math. Soc. 99, 589-595 (1987).
[18] Maass, A., On Sofic Limit Sets of Cellular Automata, preprint (1992).
[19] Parry, W., Topics in Ergodic Theory, Cambridge University Press, London (1981).
[20] Thomas, A., Suites Normales et Transducteurs, preprint (1992).
[21] Walters, P., An Introduction to Ergodic Theory, Graduate Texts in Math. 79, Springer, Berlin (1982).
[22] Wolfram, S., Theory and Applications of Cellular Automata, World Scientific, Singapore (1986).
AUTOMATA NETWORK MODELS OF INTERACTING
POPULATIONS
NINO BOCCARA
DRECAM-SPEC
CE-Saclay, France
Department of Physics, University of Illinois
Chicago, USA
1. Introduction
The first task that faces the theoretician who wants to interpret the time evolution
of a complex system is the construction of a model. In the actual system many
features are likely to be important. Not all of them, however, should be included in
the model. Only the few relevant features which are thought to play an essential
role in the interpretation of the observed phenomena should be retained. Such
simplified descriptions should not be criticized on the basis of their omissions and
oversimplifications. The investigation of a simple model is often very helpful in
developing the intuition necessary for the understanding of the behavior of complex
real systems. In many-body physics, for instance, models such as the van der Waals model of a fluid, the Heisenberg model of ferromagnetism, the mass-and-spring model of lattice vibrations, the Landau model of phase transitions, and the Ising model of cooperative phenomena, to mention just a few, have played a major role.
A simple model, if it captures the key elements of a complex system, may elicit
highly relevant questions.
This series of lectures is devoted to the investigation of models of interact-
ing populations such as susceptibles and infectives in epidemiology or competing
species in ecology.
Most models in population dynamics are formulated in terms of differential equations,¹ the classical example being the predator-prey model proposed in the
E. Gales and S. Martinez (eds.), Cellular Automata, Dynamical Systems and Neural Networks, 23-77.
© 1994 Kluwer Academic Publishers.
The different models to be discussed here are extensions of the so-called "gen-
eral epidemic model" (see Bailey 1975). In this model, infection spreads by con-
tact from infectives to susceptibles, and infectives are removed from circulation by
death or isolation. A simple model of this type was proposed by Kermack and
McKendrick (1927). A nice and simple discussion of their model can be found in
Waltman (1974). These authors assumed that infection and removal were governed
by the following rules:
(i) The rate of change in the susceptible population is proportional to the number
of contacts between susceptibles and infectives, where the number of contacts
is taken to be proportional to the product of the number of susceptibles S by
the number of infectives I.
(ii) Infectives are removed at a rate proportional to their number I.
(iii) The total number of individuals S + I + R, where R is the number of removed
infectives, is constant, that is, the model ignores births, deaths by other causes,
immigration, emigration, etc.
If S, I and R are supposed to be real positive functions of time t, (i), (ii) and (iii) yield

dS/dt = -iSI,
dI/dt = iSI - rI,
dR/dt = rI,
where i and r are positive constants representing, respectively, the infection rate and the removal rate. From the first equation, it is clear that S is a nonincreasing function, whereas the second equation implies that I(t) increases with t if S(t) > r/i and decreases otherwise. Therefore, if, at t = 0, the initial number of susceptibles S(0) is less than r/i, then, since S(t) ≤ S(0), the infection dies out, that is, no epidemic occurs. If, on the contrary, S(0) is greater than the critical value r/i, an epidemic occurs, that is, the number of infectives first increases and then decreases once S(t) becomes less than r/i.³ This "threshold phenomenon" shows that an epidemic can occur if, and only if, the initial number of susceptibles is greater than a threshold value.

² Vito Volterra (1860-1940) was stimulated to study this problem by his future son-in-law, Umberto D'Ancona, who, analyzing market statistics of the Adriatic fisheries, found that, during the First World War, certain predaceous species increased when fishing was severely limited. A year before, in 1925, Alfred James Lotka (1880-1949) had come up with an almost identical solution to the predator-prey problem. His method was very general, but, probably because of that, his book, reprinted as Elements of Mathematical Biology (New York: Dover, 1956), did not receive the attention it deserved.
The Kermack-McKendrick model assumes a homogeneous mixing of the population, that is, it neglects the local character of the infection process. This assumption, which is, in general, questionable, may, however, be valid in some limiting cases to be discussed later. The model also neglects the motion of the individuals, which is a factor that clearly affects the spread of the disease.
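The threshold phenomenon of the Kermack-McKendrick equations is easy to observe in a crude Euler integration (a sketch; the function name and parameter values are our illustrative choices, giving a threshold r/i = 250):

```python
def sir(S, I, R, i=0.002, r=0.5, dt=0.01, steps=5000):
    """Euler integration of dS/dt = -iSI, dI/dt = iSI - rI, dR/dt = rI.
    Returns the history of I."""
    history = []
    for _ in range(steps):
        history.append(I)
        dS = -i * S * I
        dI = i * S * I - r * I
        S, I, R = S + dt * dS, I + dt * dI, R + dt * r * I
    return history

above = sir(S=500, I=10, R=0)  # S(0) > r/i = 250: an epidemic occurs
below = sir(S=100, I=10, R=0)  # S(0) < r/i: the infection dies out
print(max(above) > 10, max(below) <= 10)  # -> True True
```

In the first run I(t) first rises above its initial value before decaying; in the second, iS − r is negative from the start, so I(t) decreases monotonically.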
To take into account the motion of the individuals, it is usually assumed that they disperse randomly. This hypothesis amounts to incorporating diffusion terms in the equations. Models of this type help in understanding the spatial spread of epidemics. Consider, for instance, a rabies epidemic among foxes. Rabies is a viral infection of the central nervous system. It is transmitted by contact and is invariably fatal. If the virus enters the limbic system, that is, the part of the brain thought to control behavior, the fox loses its sense of territory and wanders in a more or less random way. To discuss the spatial spread of rabies among foxes in Europe, Källén et al (1985) added a diffusion term to the rate equation of the infectives in the Kermack-McKendrick model in order to take into account the random dispersion of the rabid foxes.⁴ We have then
dS/dt = -iSI,
dI/dt = iSI - rI + D ∂²I/∂x²,
dR/dt = rI,
where D is the diffusion coefficient of the infected foxes. This system of equations admits travelling wavefront solutions of the form S(x − ct), I(x − ct) and R(x − ct), where c is the propagation speed of the epidemic wave. For the epidemic to occur, the average initial susceptible population density, i.e., ahead of the epidemic wave, has to be greater than the threshold value r/i; and, in this case, it is found that c behaves as D^{1/2}. To explain the observed fluctuations in the susceptible fox population density after the passage of the wavefront, Murray et al (1986) have considered a less simple model taking into account fox reproduction and the existence of a rather long incubation period (12 to 150 days).
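A crude explicit finite-difference scheme makes the travelling front visible (a sketch; the grid, parameter values and zero-flux boundary treatment are our illustrative choices, with S ahead of the wave above the threshold r/i = 0.3):

```python
# Explicit scheme for dS/dt = -iSI, dI/dt = iSI - rI + D d2I/dx2,
# with S = 1 ahead of the wave and r/i = 0.3 < 1, so a front propagates.
i_rate, r_rate, D = 1.0, 0.3, 1.0
dx, dt, nx = 1.0, 0.2, 200          # D*dt/dx^2 = 0.2 keeps the scheme stable

S = [1.0] * nx
I = [0.0] * nx
for k in range(5):                   # small initial focus of infection
    I[k] = 0.1

arrived = False
for _ in range(600):                 # integrate up to t = 120
    lap = [0.0] * nx
    for k in range(1, nx - 1):       # discrete Laplacian of I
        lap[k] = (I[k - 1] - 2 * I[k] + I[k + 1]) / dx**2
    lap[0] = (I[1] - I[0]) / dx**2   # zero-flux ends
    lap[-1] = (I[-2] - I[-1]) / dx**2
    for k in range(nx):
        S_new = S[k] + dt * (-i_rate * S[k] * I[k])
        I[k] = I[k] + dt * (i_rate * S[k] * I[k] - r_rate * I[k] + D * lap[k])
        S[k] = S_new
    if I[100] > 0.05:
        arrived = True               # the front has reached x = 100

print(arrived, S[5] < 0.5)  # the wave propagates and depletes S behind it
```

The front advances at a roughly constant speed of order 2(D(iS₀ − r))^{1/2}, the D^{1/2} behaviour quoted above, and leaves a depleted susceptible density in its wake.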
Although the models presented so far have unquestionably contributed to our understanding of the spread of an infectious disease (e.g., Murray's model allows for quantitative comparison with known data), the short-range character of the infection process is not correctly taken into account. This will be manifest when we discuss systems that exhibit bifurcations. In phase transition theory, for instance, it is well known that in the vicinity of a bifurcation point (i.e., a second-order transition point) certain physical quantities have a singular behavior.⁵ It is only above a certain spatial dimensionality, known as the upper critical dimensionality, that the behavior of the system is correctly described by a partial differential equation. For instance, the spatial fluctuations of the order parameter close to a second-order transition point are correctly described by the time-independent Landau-Ginzburg equation above 4 dimensions.⁶
One way to take correctly into account the short-range character of the infection
process is to discretize space, and to represent the spread of an epidemic as
the growth of a random cluster on a lattice. A kinetic model of cluster growth may
be defined as follows (Grassberger 1983, Cardy and Grassberger 1985). Denote, as
usual, by Z² the two-dimensional square lattice. At a time t a site of Z² is either
vacant (healthy), occupied (infected) or immune. An immune site is one which has
been occupied in the past. At time t + 1 a vacant site becomes occupied with a
probability p if at least one of its neighbors is occupied at time t. An occupied site
at time t becomes immune at time t + 1. However, immunisation is not perfect and
an immune site may become reoccupied with a probability q ≤ p if, once again, one
of its neighbors is occupied. More generally, one might assume that the probability
that a site becomes occupied depends on the number of occupied neighbors. If q = 0,
any bond can be tried only once, since at a second try one of the neighboring
sites is completely immune and no infection can pass. A similar model has been
studied by McKay and Jan (1984) to discuss forest fires. Vacant, occupied and
immune sites correspond, respectively, to sites occupied by unburnt, burning and
burnt trees. It is found that there is a critical probability p_c below which only a
finite number of sites are immune. In the vicinity of p_c the system exhibits a
second-order phase transition characterized by a set of critical exponents. The upper
critical dimensionality is equal to 6, and Cardy (1983) has calculated the critical
exponents to first order in ε = 6 - d. Cardy and Grassberger (1985) have shown
that these models are in the same universality class, i.e., have the same critical
exponents, as percolation cluster growth models. The relationship of the general
epidemic model to the percolation process was first noticed by Mollison (1977).
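A minimal simulation of this cluster-growth model might look as follows. The lattice size, number of steps, probabilities and random seed are illustrative choices, not values from the papers cited; q = 0 corresponds to perfect immunisation.

```python
import numpy as np

rng = np.random.default_rng(0)

def epidemic_step(grid, p, q):
    """One synchronous step of the kinetic cluster-growth model on Z^2.
    States: 0 = vacant (healthy), 1 = occupied (infected), 2 = immune."""
    occupied = (grid == 1)
    # count occupied nearest neighbors (open boundaries)
    nbrs = np.zeros(grid.shape, dtype=np.int8)
    nbrs[1:, :] += occupied[:-1, :]
    nbrs[:-1, :] += occupied[1:, :]
    nbrs[:, 1:] += occupied[:, :-1]
    nbrs[:, :-1] += occupied[:, 1:]
    exposed = nbrs > 0
    rand = rng.random(grid.shape)
    new = grid.copy()
    new[(grid == 0) & exposed & (rand < p)] = 1   # infection of vacant sites
    new[(grid == 2) & exposed & (rand < q)] = 1   # imperfect immunisation
    new[occupied] = 2                             # occupied sites become immune
    return new

n = 101
grid = np.zeros((n, n), dtype=np.int8)
grid[n // 2, n // 2] = 1                     # single infected seed
for _ in range(40):
    grid = epidemic_step(grid, p=0.7, q=0.0)  # q = 0: perfect immunisation
n_immune = int((grid == 2).sum())             # once-infected ("burnt") sites
```

For p above the critical probability the immune cluster keeps growing; below it only a finite number of sites ever become immune, as stated in the text.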
The general epidemic model on a lattice may be viewed as a discrete dynamical
system, in space and time. More precisely, it may be defined as a probabilistic
automata network. In simple words, an automata network (Goles and Martinez
1991) consists of a graph where each site takes states in a finite set. The state of a
site changes in time according to a rule which takes into account only the states of
the neighboring sites in the graph. This is the point of view which will be adopted
in these lectures.
To conclude this already rather long introduction, it is probably worthwhile
to give a slightly more general definition of the spatial general epidemic model
since, after the review paper of Mollison (1977) and the introduction of random
graphs (graphs with randomly colored edges) by Gertsbakh (1977),
several papers have appeared in the mathematical literature on this topic. 7 Let V
be a set of sites (usually V = Z^d). At any time t ≥ 0 each site is either empty or has
a healthy or an infected individual. The number of sites with infected individuals
is initially finite. An infected individual emits germs in a Poisson process until
he is removed after a random lifetime. Each germ goes independently to another
site chosen according to a probability distribution attached to the parent site. If a
7 See, e.g., Kuulasmaa (1982), Kuulasmaa and Zachary (1984), and Cox and Durrett (1988).
germ meets an infected individual or goes to an empty site, nothing happens. After
an individual has been removed his site remains empty forever. The infectives
all have the same emission rate and identical lifetime distribution.
All these different versions of the spatial general epidemic model still neglect
the motion of the individuals. The influence of this factor on the spread of the
epidemic is one of the main concerns of this series of lectures. Various models
will be discussed. All of them are site-exchange cellular automata, that is, au-
tomata networks whose local rule consists of two subrules. The first one, applied
synchronously, models the interaction process between the individuals. It is a prob-
abilistic cellular-automaton rule. The second subrule, applied sequentially, models
the motion of the individuals. It is a site-exchange rule. Such models may also
be viewed as interacting particle systems. The interested mathematically-oriented
reader should refer to Liggett (1985).
Cellular automata provide simple models for a variety of complex systems containing
a large number of identical elements with local interactions (Farmer et al.,
Wolfram, Manneville et al., Gutowitz, Boccara et al.). A cellular automaton (CA)
consists of a lattice with a discrete variable at each site. The state of the CA is
specified by the values of the variables at each site. A CA evolves in discrete time
steps. At a given time, the value of the variable at one site is determined by the
values of the variables at the neighboring sites-and the neighborhood of a site
might include the site itself-at the previous time step. The evolution rule is syn-
chronous, that is, all sites are updated simultaneously. CAs are, therefore, discrete
(in space and time) dynamical systems. They may be more precisely defined as
follows. Let s: Z × N → {0, 1} be a function that satisfies the equation

s(i, t + 1) = f(s(i - r, t), s(i - r + 1, t), ..., s(i + r, t)),

where N is the set of nonnegative integers, Z the set of all integers, and
s₀: Z → {0, 1} a given function that specifies the initial condition. Such a system
is a one-dimensional CA. d-dimensional CAs may be defined in a similar way.
The mapping f: {0, 1}^{2r+1} → {0, 1} determines the dynamics. It is referred to as
the local rule of the CA. The positive integer r is the range (or the radius) of the
rule. The function s_t: i ↦ s(i, t) is the state of the CA at time t. S = {0, 1}^Z is the
state space. An element of the state space is called a configuration. Since the state
at time t + 1 is entirely determined by the state at time t and the rule f, f induces
a mapping f: S → S, called the global rule (or the evolution operator), such that
f(s_t) = s_{t+1}. The limit set of f is

Λ_f = ⋂_{t ≥ 0} f^t(S),

where, for any t ∈ N, f^{t+1} = f ∘ f^t with f^1 = f. Λ_f is clearly invariant, that is,
f(Λ_f) = Λ_f. Since any f-invariant subset belongs to Λ_f, the limit set of f is the
maximal f-invariant subset of S.
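The definition above translates directly into code. The sketch below builds the lookup table of a range-r rule from its Wolfram rule number and applies it synchronously; the ring (periodic) boundary is a convenience of the sketch, not part of the definition in the text.

```python
def make_rule(rule_number, r=1):
    """Lookup table of a range-r rule from its Wolfram rule number:
    bit j of the number is the image of the neighborhood encoding j."""
    return [(rule_number >> j) & 1 for j in range(2 ** (2 * r + 1))]

def step(state, table, r=1):
    """One synchronous update of a one-dimensional CA on a ring."""
    n = len(state)
    new = [0] * n
    for i in range(n):
        code = 0
        for k in range(-r, r + 1):            # neighborhood of radius r
            code = (code << 1) | state[(i + k) % n]
        new[i] = table[code]
    return new

rule18 = make_rule(18)       # range-1 Rule 18, studied below
state = [0] * 31
state[15] = 1                # single nonzero seed
for _ in range(5):
    state = step(state, rule18)
# From a single seed, Rule 18 traces the same self-similar pattern
# as additive Rule 90 (ones at positions 10, 12, 18, 20 after 5 steps).
```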
Based on investigations of a large sample of CAs, Wolfram (1984) has shown
that, according to their asymptotic behavior, CA rules appear to fall into four
qualitative classes. Class-1 CAs evolve, from almost all initial states, to a unique
homogeneous state in which all sites have the same value. Class-2 CAs yield
separated simple stable or periodic structures. Class-3 CAs exhibit chaotic patterns.
The statistical properties of these patterns are typically the same for almost all
initial states. In particular, the density of nonzero site variables tends to a fixed
value as time t tends to infinity. The evolution of class-4 CAs leads to complex
localized or propagating structures.
The evolution of a class-1 or -2 CA is rather simple. By contrast, the
evolution of a class-4 CA seems very complex. Gallas and Herrmann (1990) have,
however, argued that class-4 CAs are actually either class-1 or class-2. The only
difference is that they reach their steady state, either homogeneous or periodic in
space, after a long transient.
As far as their statistical properties are concerned, class-3 CAs are somewhat
similar to systems studied in equilibrium statistical physics. The limit set of a class-3
CA contains a strange attractor. That is, after sufficiently many time steps, starting
from almost any initial configuration, the state of a class-3 CA evolves chaotically
on a Cantor-like subset of S. The asymptotic behavior, as the time t tends to infinity,
of, say, the density of nonzero site values is either of the form e^{-αt} or t^{-γ},
where α and γ are constants. As the range of the rule increases, the exponential
behavior is more and more frequent. Most range-1 class-3 rules have, on the contrary,
a power-law behavior. To illustrate these different asymptotic behaviors, we shall
briefly describe the evolution toward their attractors of the range-1 Rules 18, 54 and
22. 8
For this rule, configurations belonging to the attractor consist of sequences of zeros
of odd lengths separated by isolated ones. The average number of sequences of
zeros of length 2n + 1 per site is equal to 1/2^{n+3} (Boccara et al. 1990). 9 With
respect to this background a sequence of two ones or a sequence of zeros of even
length is a "defect" or a "kink" (Grassberger 1983). Since two sequences of zeros of
odd lengths separated by two neighboring ones generate, at the next time step, a
8 Any rule may be specified by its rule number, which, following Wolfram (1983), is defined by

N(f) = Σ_{x₁, x₂, ..., x_{2r+1}} f(x₁, x₂, ..., x_{2r+1}) 2^{e(x₁, x₂, ..., x_{2r+1})},

where

e(x₁, x₂, ..., x_{2r+1}) = Σ_{j=0}^{2r} x_{j+1} 2^{2r-j}.
9 Different rules may have the same attractor (Fig. 2.1b). Due to different time correlations,
their spatiotemporal patterns are, however, different.
10 See Bramson and Lebowitz (1991) for a rigorous study of asymptotic behaviors of densities
of pairwise annihilating particles executing random walks on a d-dimensional cubic lattice, for
any integral value of d.
Here again the evolution toward the attractor may be viewed as annihilation
processes of interacting particlelike structures (Boccara et al. 1991). The background is
periodic in space and time, both periods being equal to 4. Three types of particles
may be distinguished. Two of them are nonpropagating and periodic in time.
Their periods are equal to 4. They may be generated by sequences of zeros whose
lengths are greater than 3. We shall denote them by g_e and g_o (g for gutter)
according to whether they consist of sequences of zeros of even or odd length. There
exists also a propagating particle w (for wall), which may be generated by three
zeros following three ones or the converse. This particle may propagate to the right
or to the left. Its velocity is equal to 1. These particles have a rather rich variety
of interactions (Fig. 2.2). As represented in Figure 2.2c, we have the reactions,
where γ is not larger than 0.2, since less probable events also contribute to the
pairwise annihilation of even gutters.
Figure 2.4. Rule 54. Remaining particles after 3 × 10⁸ time steps. Note that,
in the asymptotic regime, the number of walls is proportional to the number of
even gutters. The figure shows the evolution of 512 lattice sites from a 10⁴-site
lattice during 512 time steps. The periodic background is eliminated through the
mapping η(i, t) = Σ_{k=0}^{3} s(i + k, t) mod 2.
For this rule, the evolution toward the attractor cannot be viewed as annihilation
processes of interacting particlelike structures, and for large t the density of nonzero
sites tends to its stationary value exponentially. This exponential behavior has
been studied in detail by Zabolitzky (1988), who found that the constant α in the
argument of the exponential actually depends on the initial density of nonzero sites
c₂₂(0), and goes to zero as 0.44 c₂₂(0).
Since, in the infinite-time limit, the density of nonzero sites approaches its
stationary value as the decreasing number of particles, we may put forward the
following conjecture: For a class-3 CA, the density of nonzero sites tends to its
stationary value either as t^{-γ} or e^{-αt}, where γ and α are positive. In the case of
a one-dimensional CA, the power-law behavior is observed if, and only if, after a
short transient, the spatiotemporal pattern generated by the evolution of the CA may
be viewed as interacting particlelike structures evolving in a regular background.
Site-exchange CAs are automata networks whose rule consists of two subrules. The
first one is a standard synchronous CA rule, whereas the second is a sequential site-exchange
rule. This last rule, characterized by a parameter m, is defined as follows.
A site, whose value is one, is selected at random and swapped with another site
value (either zero or one) also selected at random. The second site is either a
neighbor of the first one (local site-exchange) or any site of the lattice (nonlocal
site-exchange). This operation is repeated m c_f(m, t) N times, where N is the total
number of sites, c_f(m, t) the density of nonzero sites at time t, and f is the CA
rule. The parameter m is called the degree of mixing. It is important to note that
this mixing process, which will be used to model either short- or long-range moves
of interacting individuals, does not change the value of the density c_f(m, t).
If m = ∞, the correlations created by the application of the CA rule f are
completely destroyed, and the value of the stationary density of nonzero sites
c_f(∞, ∞) is then correctly predicted by a mean-field-type approximation in which
it is assumed that the probability, at time t, for a site value to be equal to one is
c_f(m, t). This approximation is incorrect when m is not sufficiently large.
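The site-exchange subrule can be sketched as follows for a one-dimensional ring; the function name, the seed and the ring boundary are illustrative choices of the sketch.

```python
import random

def site_exchange(state, m, local=True, seed=0):
    """Sequential site-exchange subrule with degree of mixing m on a ring.
    A nonzero site is picked at random and its value is swapped with that
    of a randomly chosen site: a nearest neighbor if local, any site of
    the lattice otherwise.  The operation is repeated m * (number of
    nonzero sites) times; swapping values cannot change the density."""
    rng = random.Random(seed)
    n = len(state)
    for _ in range(int(m * sum(state))):
        ones = [k for k, v in enumerate(state) if v == 1]
        if not ones:
            break
        i = rng.choice(ones)
        j = (i + rng.choice([-1, 1])) % n if local else rng.randrange(n)
        state[i], state[j] = state[j], state[i]
    return state

state = site_exchange([1, 0] * 20, m=2.0, local=False)
```

In a full site-exchange CA this subrule would be interleaved with a synchronous CA rule such as the `step` function of the earlier sketch; only the CA rule, never the mixing, changes the density of nonzero sites.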
In this section we will describe some results, obtained recently by Boccara
and Roger (1993), concerning the behavior of the stationary densities of nonzero
sites of the three one-dimensional range-one CAs whose evolution has been studied
in the preceding section.
2.2.1. Rule 18. The evolution of the density of nonzero sites according to the
mean-field approximation is determined by

c₁₈(t + 1) = 2 c₁₈(t)(1 - c₁₈(t))².
Figure 2.6. Rule 18. Log-log plot of the stationary density of nonzero sites
as a function of the degree of mixing m. (a) local site-exchange. (b) nonlocal
site-exchange.
From the log-log plots we have obtained a₀ = 0.415 ± 0.005 (Fig. 2.5a) and
a∞ = 0.44 ± 0.01 (Fig. 2.5a inset). These results may be understood as follows.
If, starting from the attractor of the cellular automaton evolving according
to Rule 18, we exchange a very small number of sites between two successive
applications of Rule 18, we create defects at a rate which can be assumed to
be proportional to m. Therefore, for small m, the number Δν₊(m, t) of defects
created during a short time interval Δt is given by

Δν₊(m, t) ∝ m Δt.

On the other hand, for large t, due to the annihilation process described in the
preceding section, the number of defects ν(m, t) decreases as t^{-γ}, with γ = 1/2 for
Rule 18. Hence, during Δt, the number of defects decreases by

Δν₋(m, t) ∝ t^{-γ-1} Δt.

In the stationary regime these two rates balance, so that t^{-γ-1} = O(m);
thus, for large t and small m, assuming that Δc₁₈(m, t) = c₁₈(m, t) - c₁₈(0, t) is
proportional to the number of defects, we have

Δc₁₈(m, ∞) ∝ m^{γ/(γ+1)}.
It should be stressed that this simple argument can, at best, give an order of
magnitude. The assumption that defects are created at a rate proportional to m
is the simplest but may not be exact. Moreover, it is assumed that the defects
are created at random in the lattice. However, in a local site-exchange process,
when we move a nonzero site to a neighboring zero site, we create two neighboring
defects which have, therefore, a higher probability to annihilate than more distant
defects. Hence, the exponent a₀ should be slightly greater than the value γ/(γ+1).
This is indeed the case.
When m is large, a given site has moved, on the average, a distance of order √m;
thus, if the correlations created by the application of Rule 18 have a range that is
small compared to this distance, the exponent a∞ should be close to 1/2, since the
local site-exchange process is a random walk in a random environment (De Masi et al. 1989).
If the mixing process is nonlocal, the argument explaining the behavior of
c₁₈(m, ∞) for small m remains valid, but, the motion of nonzero sites being non-diffusive,
the behavior for large m is very different. Figure 2.6b shows that
a₀ = 0.425 ± 0.005, and a∞ = 3.7 ± 0.1 (inset).
2.2.2. Rule 54. For small and large m, the arguments given above are still valid.
However, for this particular rule, the number of defects decreases so slowly (cf.
preceding section) that we should consider values of m of the order of 10⁻⁷, leading
to prohibitive computation times. In the simulations of Boccara and Roger (1993),
the smallest m is of the order of 10⁻³. Around this value it is found that a₀ =
0.11 ± 0.01 (Fig. 2.7a), which is not so far from the m → 0 value γ/(γ+1) = 0.13,
and a∞ = 0.53 ± 0.01 (Fig. 2.7a inset). For nonlocal site-exchange it is found that
a₀ = 0.19 ± 0.01, notably higher than γ/(γ+1), and a∞ = 5.3 ± 0.1 (Fig. 2.7b).
Figure 2.7. Rule 54. Log-log plot of the stationary density of nonzero sites
as a function of the degree of mixing m. (a) local site-exchange. (b) nonlocal
site-exchange.
2.2.3. Rule 22. For this rule, we have seen that the evolution toward the attractor
cannot be viewed as annihilation processes of interacting particlelike structures,
and for large t the density of nonzero sites tends to its stationary value
exponentially, that is, γ = ∞. For small m, the argument given for Rule 18 is still
valid if, for large t, the number of defects ν(m, t) is replaced by the difference
Δc₂₂(m, t) = c₂₂(m, t) - c₂₂(0, t). c₂₂(m, ∞) should, therefore, behave linearly
for small m. This is indeed the case (Fig. 2.7a). For large m, we have found
a∞ = 0.55 ± 0.01 (Fig. 2.7a inset) for local site-exchange, in agreement with the
argument given for Rule 18, whereas a∞ = 4.4 ± 0.1 (Fig. 2.7b) for nonlocal site-exchange.
Note that, in this case, for small m, Δc₂₂(m, ∞) is not exactly linear.
We have a₀ = 0.86 ± 0.01 (Fig. 2.7b).
Probabilistic CAs with an absorbing state exhibit phase transitions. Directed
percolation is a typical example. 11 In this section we study the influence of the mixing
process defined in the preceding section on the critical properties of a probabilistic
cellular-automaton rule whose mean-field mapping,

c(t + 1) = 2p c(t)(1 - c(t))²,

determines the evolution of the density of nonzero sites within the mean-field
approximation. This mapping has two fixed points: 0 and 1 - 1/√(2p).
Since the density of nonzero sites is a nonnegative quantity not greater than 1, the
second fixed point exists if, and only if, 2p ≥ 1. The stability of these two fixed
points is easy to determine. We have:
(i) if p < 1/2, 0 is stable,
(ii) if p > 1/2, 0 is unstable and 1 - 1/√(2p) is stable.
At p = p_c(∞) = 1/2, the system exhibits a transcritical bifurcation similar to a
second-order phase transition characterized by a nonnegative order parameter. In
the neighborhood of the bifurcation point, the stationary density of nonzero sites
c(∞, p, ∞) behaves as p - p_c(∞).
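Assuming the mean-field mapping c ↦ 2pc(1 - c)², the form consistent with the fixed points 0 and 1 - 1/√(2p) quoted above, the transcritical bifurcation at p = 1/2 can be checked by direct iteration (the function name and parameter values are ours):

```python
import math

def mean_field_orbit(p, c0=0.3, steps=2000):
    """Iterate the mean-field map c -> 2*p*c*(1 - c)**2 and return
    the density reached after `steps` iterations."""
    c = c0
    for _ in range(steps):
        c = 2 * p * c * (1 - c) ** 2
    return c

# Below p = 1/2 the absorbing fixed point 0 is stable; above it the
# orbit is attracted by the nonzero fixed point 1 - 1/sqrt(2p).
low = mean_field_orbit(0.4)    # converges to 0
high = mean_field_orbit(0.8)   # converges to 1 - 1/sqrt(1.6)
```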
For a fixed finite value of the degree of mixing m, the system exhibits, at
p = p_c(m), a transcritical bifurcation. The behavior of the stationary density
of nonzero sites c(m, p, ∞) in the neighborhood of the bifurcation point may be
characterized by a critical exponent β(m) defined by

c(m, p, ∞) ∼ (p - p_c(m))^{β(m)}.
Figure 2.8. Typical log-log plot of c(m, p, ∞) as a function of p - p_c(m), for
m = 0.41. A least-squares fit gives p_c(m) = 0.781 and β(m) = 0.275 (local site-exchange).
To determine the values of p_c(m) and β(m), we measure, for a fixed degree
of mixing m, the stationary density of nonzero sites c(m, p, ∞) for different
probabilities p. If p is close to p_c(m), the stationary density is of the form
(p - p_c(m))^{β(m)}.
Figure 2.10. Spatio-temporal pattern for p close to p_c(m) for short-range moves.
(a) m = 0.1, p = 0.8, (b) m = 0, p = 0.81.
Figure 2.12. Spatio-temporal pattern for p close to p_c(m) for long-range moves.
Here m = 0.1, p = 0.68.
3. Interacting Populations
12 During the latent period the individual who has been exposed to the disease is not yet
infectious, whereas during the incubation period, the individual does not present symptoms but
is infectious.
states, and f_i: Q^{|U_i|} → Q a mapping, called the local transition rule associated
to vertex i.
The vertex chosen depends on the range of the move. In a short-range move the chosen
vertex is any one of the four nearest neighbors, whereas in a long-range move
the chosen vertex is any vertex of the graph. Since individuals may only
move to empty sites, the average number of times an individual is selected
to perform a move during one time step is an average number of tentative
moves during a unit of time. This parameter is denoted by m. Even when
m > 1, some individuals do not move. For a given m and in the limit of an
infinite number of individuals, the probability that s given individuals do not
move, i.e., have not been selected to move, is e^{-sm}.
This model is rather crude: it assumes that the system is closed; births, deaths
from other causes, immigration, and emigration are ignored. We shall also study a
slightly more general model in which susceptibles and infectives may both give
birth to susceptibles at neighboring empty sites with respective probabilities b_s
and b_i. Moreover, we may also allow susceptibles to be removed with a probability
d_s.
These epidemic models could also be presented as predator-prey models.
Infectives may in fact be viewed as predators preying on susceptibles. However, small
changes should be made. When a prey is eaten by a predator, it is, of course, not
immediately changed into a predator. One has to take into account the efficiency
with which extra food is turned into extra predators, and predators give birth to
predators and not to prey.
These models are automata networks with mixed transition rules. That is,
at each time step, the evolution results from the application of two subrules. The
first subrule is a probabilistic translation invariant synchronous CA rule, and the
second one is a sequential site-exchange rule.
Conventionally, we may represent a model by a graph in which the vertex i
corresponds to the group G_i and the directed arc (i, j) is labeled by the probability
p_{ij} to transform an individual belonging to G_i into an individual belonging to G_j.
Such a graph is called a transfer diagram. For instance, the simple SIR model
described above corresponds to the transfer diagram

S --(p_i)--> I --(d_i)--> R,
3.2.1. SIR Model. The evolution equations read (Boccara and Cheong 1992)

S_MFA(t + 1) = S_MFA(t) - S_MFA(t)(1 - (1 - p_i I_MFA(t))^z),   (1)
I_MFA(t + 1) = I_MFA(t) + S_MFA(t)(1 - (1 - p_i I_MFA(t))^z) - d_i I_MFA(t),   (2)
R_MFA(t + 1) = R_MFA(t) + d_i I_MFA(t),   (3)

where z is the number of neighboring vertices of a given vertex. For the two-dimensional
square lattice considered in our simulations, z = 4. Note that, within
the framework of this approximation, the "incidence rate", represented by the term
S_MFA(t)(1 - (1 - p_i I_MFA(t))^z), is not, in this model as in most models (Bailey
1975, Waltman 1974, Anderson and May 1991), bilinear. 13
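The threshold theorem discussed below can be illustrated by iterating mean-field equations of this SIR type. The update uses the incidence term S_MFA(t)(1 - (1 - p_i I_MFA(t))^z) quoted above; the function name is ours, and the parameter values mirror those of Figure 3.1 (C = 0.6, I(0) = 0.01, z = 4, p_i = 0.3).

```python
def sir_mfa(S0, I0, p_i, d_i, z=4, steps=100):
    """Iterate mean-field SIR equations of the type quoted in the text;
    returns the trajectory of the density of infectives."""
    S, I, R = S0, I0, 0.0
    traj = [I]
    for _ in range(steps):
        new_inf = S * (1 - (1 - p_i * I) ** z)   # incidence term
        S, I, R = S - new_inf, I + new_inf - d_i * I, R + d_i * I
        traj.append(I)
    return traj

# Threshold theorem: the epidemic develops only if S(0) > d_i / (z p_i).
above = sir_mfa(S0=0.59, I0=0.01, p_i=0.3, d_i=0.5)    # threshold ~ 0.417
below = sir_mfa(S0=0.59, I0=0.01, p_i=0.3, d_i=0.75)   # threshold = 0.625
```

In the first case the density of infectives first grows and then decays; in the second it decreases monotonically, so no epidemic occurs.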
From Equations (1-3), it follows that S_MFA(t) is positive nonincreasing
whereas R_MFA(t) is positive nondecreasing. Therefore, the infinite-time limits
S_MFA(∞) and R_MFA(∞) exist.
13 The influence of incidence rates of the form S^a I^b on the dynamics of different epidemic
models has been studied by various authors (see, for example, Hethcote and van den Driessche
(1991), who also considered incidence rates of the form S g(I), where g is not linear). When
a ≠ 1, the qualitative dynamical behavior of the model is not altered, but for b > 1, multiple
equilibria and limit cycles have been found (see subsection 3). In statistical physics, the existence
of new phases resulting from nonbilinear two-body interactions has been known for a long time
(see, for example, Boccara (1976)).
R_MFA(0) = 0 and

I_MFA(1) - I_MFA(0) = S_MFA(0)(1 - (1 - p_i I_MFA(0))^z) - d_i I_MFA(0)
                    ≈ (z p_i S_MFA(0) - d_i) I_MFA(0).

Hence, according to the initial value of the density of susceptibles, we may
distinguish two cases:
1. If S_MFA(0) < d_i/(z p_i) then I_MFA(1) < I_MFA(0). Since S_MFA(t) is a
nonincreasing function of time, I_MFA(t) goes monotonically to zero as t tends to
infinity. That is, no epidemic occurs.
2. If S_MFA(0) > d_i/(z p_i) then I_MFA(1) > I_MFA(0). The density I_MFA(t) of
infectives increases as long as the density of susceptibles S_MFA(t) is greater than
the threshold d_i/(z p_i), and then tends monotonically to zero.
This shows that the spread of the disease occurs only if the initial density
of susceptibles is greater than a threshold value. This is exactly the threshold
theorem of Kermack and McKendrick. Since I_MFA(t) is, in general, very small,
Equation (3) is well approximated by
Figure 3.1. Time evolution of the density of infectives for the SIR model within
the mean-field approximation. C = 0.6, I_MFA(0) = 0.01, z = 4, p_i = 0.3. (a)
d_i = 0.5 (S_MFA(0) > d_i/(z p_i)). (b) d_i = 0.75 (S_MFA(0) < d_i/(z p_i)).
The SIR model may also be extended to more than one population. For instance, the heterosexual spread of a venereal
disease involves the obligatory switching of infection back and forth between two
distinct populations. In this case, the probability for a susceptible of Population 1
(resp. 2) to become infective by contact with an infective of Population 2 (resp. 1)
is denoted by p_{1,i} (resp. p_{2,i}), and the probability for an infective of Population 1
(resp. 2) to be removed is denoted by d_{1,i} (resp. d_{2,i}). 14 We have then (Boccara
14 A less crude model for the heterosexual spread of a venereal disease may be obtained, for
example, by separating males and females into different age groups and assuming that a male
(resp. female) susceptible belonging to a given age group can catch the disease from a female
(resp. male) infective if, and only if, the infective belongs to neighboring age groups.
S^α_MFA(t + 1) = S^α_MFA(t) - S^α_MFA(t)(1 - (1 - p_{α,i} I^β_MFA(t))^z),   (5)
R^α_MFA(t + 1) = R^α_MFA(t) + d_{α,i} I^α_MFA(t),   (6)
I^α_MFA(t + 1) = I^α_MFA(t) + S^α_MFA(t)(1 - (1 - p_{α,i} I^β_MFA(t))^z) - d_{α,i} I^α_MFA(t),   (7)

where α and β are equal to 1 or 2 with α ≠ β. Hence, according to the initial
values S¹_MFA(0), S²_MFA(0), I¹_MFA(0) and I²_MFA(0), we may observe the following
behaviors:
1. If
and
then
and
then
and
are satisfied. Since the densities of susceptibles decrease with time, the densities
of infectives, after having reached a maximum, tend monotonically to zero.
3. If
and
then
Figure 3.2. Time evolution of the densities of infectives for the two-population
model using the mean-field approximation.
Q₁ = z p_{1,i} S¹_MFA(0) I²_MFA(0) - d_{1,i} I¹_MFA(0),
Q₂ = z p_{2,i} S²_MFA(0) I¹_MFA(0) - d_{2,i} I²_MFA(0). z = 4,
S¹_MFA(0) = S²_MFA(0) = 0.29, I¹_MFA(0) = I²_MFA(0) = 0.01.
(a) Q₁ < 0 and Q₂ < 0, p_{1,i} = 0.37, p_{2,i} = 0.23, d_{1,i} = 0.6, d_{2,i} = 0.3.
(b) Q₁ > 0 and Q₂ > 0, p_{1,i} = 0.5, p_{2,i} = 0.8, d_{1,i} = 0.35, d_{2,i} = 0.25.
(c) Q₁ < 0 and Q₂ > 0, p_{1,i} = 0.13, p_{2,i} = 0.8, d_{1,i} = 0.27, d_{2,i} = 0.35.
(d) Q₁ < 0 and Q₂ > 0, p_{1,i} = 0.15, p_{2,i} = 0.6, d_{1,i} = 0.5, d_{2,i} = 0.3.
becomes positive. The spread of the disease in Population 2 may trigger the
epidemic in Population 1. If, however, the increase of the density of infectives in
Population 2 is not high enough, then the density of infectives in Population 1
will decrease monotonically, whereas the density of infectives in Population 2 will
increase as long as its growth condition remains satisfied, and then tend
monotonically to zero. The disease spreads only in Population 2
whereas no epidemic occurs in Population 1.
Figures 3.2a-2d show some typical time evolutions of the density of infectives
in both populations.
3.2.3. SIS Model. In this model, after recovery, infected individuals become
susceptible to catch the disease again (as, e.g., with the common cold). This model is
interesting because it exhibits a transcritical bifurcation between an endemic state
and a disease-free state.
Here, the state of the system at time t is characterized by the densities
S_MFA(t) and I_MFA(t) of susceptibles and infectives, and the evolution equation
of the density of infectives is (Boccara and Cheong 1993)

I_MFA(t + 1) = I_MFA(t) + S_MFA(t)(1 - (1 - p_i I_MFA(t))^z) - p_r I_MFA(t),

where p_r denotes the probability per unit time for an infective to recover.
Since the population is closed, the total density satisfies

S_MFA(t) + I_MFA(t) = C.   (10)
In the infinite-time limit, the stationary density of infectives I_MFA(∞) is such that

(C - I_MFA(∞))(1 - (1 - p_i I_MFA(∞))^z) = p_r I_MFA(∞).   (12)
I_MFA(∞) = 0 is always a solution of Equation (12). This value characterizes the
disease-free state. It is a stable stationary state if, and only if, zCp_i - p_r ≤ 0. If
zCp_i - p_r > 0, the stable stationary state is given by the unique positive solution
of Equation (12). In this case, a nonzero fraction of the population is infected. The
system is in the endemic state. For zCp_i - p_r = 0 the system, within the framework
of the mean-field approximation, undergoes a transcritical bifurcation similar to
a second-order phase transition characterized by a nonnegative order parameter,
whose role is played, in this model, by the stationary density of infected individuals
I_MFA(∞). This threshold theorem is a well-known result for differential-equation
SIS models (Hethcote 1976).
It is easy to verify that, in the endemic state, when zCp_i - p_r tends to zero
from above, I_MFA(∞) goes continuously to zero as zCp_i - p_r. In the (p_i, p_r)
parameter plane, the line

zCp_i - p_r = 0   (13)

separates the endemic phase from the disease-free phase.
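A quick iteration of the mean-field SIS dynamics, written here with S(t) = C - I(t) eliminated via the closed-population constraint, exhibits the transcritical bifurcation on the line zCp_i - p_r = 0 (the function name and the chosen probabilities are ours; C = 0.6 and z = 4 are the values used in the text's simulations):

```python
def sis_mfa_stationary(p_i, p_r, C=0.6, z=4, I0=0.01, steps=5000):
    """Iterate the mean-field SIS dynamics with S(t) = C - I(t) and
    return the (numerically) stationary density of infectives."""
    I = I0
    for _ in range(steps):
        I = I + (C - I) * (1 - (1 - p_i * I) ** z) - p_r * I
    return I

# Transcritical bifurcation on the line z*C*p_i - p_r = 0:
free = sis_mfa_stationary(p_i=0.1, p_r=0.5)     # z*C*p_i = 0.24 < p_r
endemic = sis_mfa_stationary(p_i=0.3, p_r=0.5)  # z*C*p_i = 0.72 > p_r
```

Below the line the density of infectives decays to zero (disease-free state); above it the iteration settles on the positive solution of the stationarity condition (endemic state).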
3.2.4. Generalized SIR Model. Let us assume that, more generally, susceptibles and
infectives may give birth to susceptibles at neighboring empty sites with respective
probabilities b_s and b_i, and that susceptibles may be removed with a probability
d_s. The evolution equations read
(20a)
d_i > 0. (20b)
(22)
As for the simple SIR model, there exists a threshold value for the density of
susceptibles above which the number of infectives increases.
3. The expression of the Jacobian matrix J(S*, I*) is rather complicated.
We shall discuss the stability of (S*, I*) and the bifurcation corresponding to the
coalescence of (S*, I*) and (S₀, 0) as follows. Let
Then, S* and I* are such that F(S*, I*) = 0 and G(S*, I*) = 0. When, at the
bifurcation point, (S*, I*) and (S₀, 0) coalesce (the bifurcation is transcritical),
S* tends to S₀ and I* to zero. This bifurcation point is the analogue of a
second-order transition point from the endemic to the disease-free state (see above
subsection). Hence, in the vicinity of this point we have
where
0 2 G ( So, 0) = (1 - ds ) Sop;!
2 II ( 0 ) . (25h)
012
I*, which plays the role of the order parameter, goes to zero linearly in the distance
to the bifurcation point, since there exist no values of the parameters b_s, b_i, and d_s
such that

F(S₀, 0) = 0,    ∂F/∂S (S₀, 0) = 0,

that is,
interaction, are no longer probabilities. That is, they may take any nonnegative
value.
In phase transition theory, it is well known that constant-interaction models
exhibit a mean-field behavior (Boccara 1976). In epidemiology, these models are,
however, much less artificial than in phase transition theory. For instance, in a
group of individuals in a vacation resort, the infection process is well approximated
by a constant-interaction model, whereas in many-body physics, constant-interaction
models are not realistic.
We will not analyze constant-interaction models since their qualitative dynam-
ical behavior is identical to the behavior found using the mean-field approximation
(Boccara and Cheong 1992 and 1993).
3.4. SIMULATIONS
In all our simulations, the total density of individuals is above the site percolation
threshold for the square lattice, which is equal to 0.593 (Stauffer 1979), in order
to be able to observe cooperative effects when m = 0.
3.4.1. SIR model. Figure 3.3a shows the influence of the parameter m on
the time evolution of an epidemic with permanent removal for short-range moves.
As m increases, the density of infectives as a function of time tends to the mean-field
result. Figure 3.3b shows that the convergence to the mean-field result is
much faster for long-range moves. Mixing is more effective with long-range moves.
If, instead of permanent removal, infectives recover with the probability Pr and
become permanently immune the convergence to the mean-field result is slower
(Fig. 3.3c) since the presence of the inert immune population on the lattice lowers
the effective mixing.
Note that, since the initial configuration is random, for any type of move and
any value of m, the value of the density of infectives after the first time step is
correctly predicted by the mean-field approximation.
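As a concrete illustration of the two subrules described in the text, the following minimal Python sketch applies a synchronous infection/removal step followed by m tentative short-range exchanges per site. This is not the authors' actual code; the lattice size, the data layout and the parameter values are purely illustrative.

```python
import random

# Minimal sketch of one time step of the SIR automaton with motion.
# States: None = empty site, 'S' = susceptible, 'I' = infective, 'R' = removed.
L, p_i, m = 20, 0.5, 5          # lattice side, infection probability, mixing

def step(lattice):
    # Subrule 1 (synchronous): infection and permanent removal.
    new = {}
    for (x, y), state in lattice.items():
        if state == 'S':
            nbrs = [lattice[((x + dx) % L, (y + dy) % L)]
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            k = nbrs.count('I')   # each infective neighbour acts independently
            new[(x, y)] = 'I' if random.random() < 1 - (1 - p_i) ** k else 'S'
        elif state == 'I':
            new[(x, y)] = 'R'     # permanent removal
        else:
            new[(x, y)] = state   # empty and removed sites are unchanged
    # Subrule 2 (sequential): m tentative short-range exchanges per site.
    sites = list(new)
    for _ in range(m * len(sites)):
        a = random.choice(sites)
        dx, dy = random.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
        b = ((a[0] + dx) % L, (a[1] + dy) % L)
        new[a], new[b] = new[b], new[a]   # exchanging with an empty site = a move
    return new

# Random initial configuration: total density C = 0.6, I(0) = 0.01.
random.seed(0)
lattice = {}
for x in range(L):
    for y in range(L):
        r = random.random()
        lattice[(x, y)] = None if r > 0.6 else ('I' if r < 0.01 else 'S')
for t in range(5):
    lattice = step(lattice)
print(sum(1 for s in lattice.values() if s == 'I') / L ** 2)
```

Increasing m in this sketch mixes the population and drives the time evolution toward the mean-field behavior, which is the effect Figure 3.3 illustrates.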
As shown by Kermack and McKendrick (1927), the spread of the disease does
not stop for lack of a susceptible population. As the time t tends to infinity, the
stationary density of susceptibles S(m, ∞) for a given value of m is positive. The
Figure 3.3. Time evolution of an epidemic for the SIR model for different values
of m. The dashed line corresponds to the mean-field approximation. C = 0.6,
I(0) = 0.01, p_i = 0.5, d_i = 0.3, 100 x 100 lattice. Each point represents the average
of 10 experiments. (a) Short-range moves and permanent removal. +: m = 0,
x: m = 5, o: m = 250. (b) Long-range moves and permanent removal. +: m = 0,
x: m = 0.2, o: m = 2. (c) Short-range moves and permanent recovery. C = 0.6,
I(0) = 0.01, p_i = 0.5, p_r = 0.3, 100 x 100 lattice. Each point represents the average
of 10 experiments. +: m = 0, x: m = 5, o: m = 250.
Figure 3.4. Stationary density of susceptibles for the SIR model as a function of
m in the case of permanent removal and short-range moves.
3.4.2. SIS Model. Figure 3.5 represents the (p_i, p_r) phase diagram for different
values of m in the case of short-range moves. Figure 3.6 shows a typical variation
of the stationary density of infectives I(m, ∞) as a function of p_i for given values of
p_r and m. The slope at the critical point (i.e., the transcritical bifurcation point)
seems to be infinite. If this is indeed the case, the critical exponent β defined by
β = lim_{p_i − p_i^c → 0+} log I(m, ∞) / log(p_i − p_i^c), (10)
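In practice, an exponent of this kind is extracted from simulation data by a linear fit on the log-log plot (cf. Figure 3.7). The following Python sketch illustrates the fitting step on synthetic data with a known exponent standing in for the measured stationary densities; the numbers are placeholders, not the values reported in the text.

```python
import math

# Estimate a critical exponent beta from I(m, inf) ~ (p_i - p_i^c)^beta
# by least-squares fitting log I against log(p_i - p_i^c).
beta_true = 0.58                               # placeholder "true" exponent
xs, ys = [], []
for k in range(20):
    eps = 10 ** (-3 + 2 * k / 19)              # p_i - p_i^c, log-spaced
    xs.append(math.log(eps))
    ys.append(beta_true * math.log(eps))       # log I(m, inf) = beta * log eps

# Slope of the log-log plot = estimate of beta.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(round(slope, 3))  # → 0.58
```

With real simulation data the points scatter around the fitted line, and the quality of the fit degrades away from the critical point, which is why the measurement is done on a window of small p_i − p_i^c.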
Figure 3.5. (p_i, p_r) phase diagram in the case of short-range moves for m = 0 (o),
m = 2 (x), m = 8 (o). The dashed line represents the mean-field approximation.
Total density: C = 0.6. Lattice size: 100 x 100.
Figure 3.6. Typical variation of I(m, ∞) as a function of p_i for given values of
p_r and m in the case of short-range moves. Here p_r = 0.5, m = 0.3. The critical
value of p_i is 0.3018. Total density: C = 0.6. Lattice size: 100 x 100.
For a given value of p_r, the variations of β and p_i^c as functions of m (Fig. 3.8)
exhibit two regimes. In the small-m regime, i.e., for m ≲ 10, p_i^c and particularly
β have their m = 0 values. In the large-m regime, i.e., for m ≳ 300, p_i^c and β
have their mean-field values.
Concerning the value of the critical exponent β, the following two points should
be stressed.
1. The fact that, for small values of m, the exponent β, approximately equal to
0.6, is much less than its mean-field value illustrates how wrong the assumption
of homogeneous mixing is. This model, which takes into account the fluctuations in
the number of contacts in space and time, neglects, however, all other causes of
heterogeneity.
Figure 3.7. Typical log-log plot of I(m, ∞) as a function of p_i − p_i^c for given
values of p_r and m in the case of short-range moves. Here p_r = 0.5, m = 0.3. The
critical value of p_i is 0.3018. Total density: C = 0.6. Lattice size: 100 x 100.
2. When m = 0, the value of β for this model is equal to the value of β for
two-dimensional directed percolation (Bease 1977). This result strongly suggests
that the critical properties of our model are universal, i.e., model-independent (see
Section 2.3).
For given values of p_i and p_r, the asymptotic behavior of the stationary density
of infectives, for both small and large values of m, has also been studied (Boccara
and Cheong 1993). When m tends to zero, I(m, ∞) − I(0, ∞) tends to zero as m^(α_0)
with α_0 = 0.177 ± 0.15. When m tends to ∞, I(m, ∞) tends to I(∞, ∞) as m^(−α_∞),
and, as for the SIR model, it is found that α_∞ is close to 1 (exactly 0.945 ± 0.065).
The fact that α_0 is rather small shows the importance of motion in the spread
of a disease. The stationary number of infectives increases dramatically when
the individuals start to move. In other words, we may say that the response
∂I(m, ∞)/∂m of the stationary density of infectives to motion of the individuals
tends to ∞ when m tends to 0. The asymptotic behavior of I(m, ∞) for small m
is related to the asymptotic behavior of I(0, t) for large t (Section 2.3).
For long-range moves, the variations of β and p_i^c as functions of m, for a fixed
value of p_r, are very different from those for short-range moves. Figure 3.9 shows
that β and p_i^c go very fast to their mean-field values. Whereas for short-range
moves, β and p_i^c do not vary in the small-m regime, here, on the contrary, the
derivatives of β and p_i^c with respect to m tend to ∞ as m tends to 0. For small
m, the asymptotic behaviors of β and p_i^c may, therefore, be characterized by an
exponent. These exponents are not easy to measure. It is found (Boccara and
Cheong 1993) that β(m) − β(0) and p_i^c(0) − p_i^c(m) both behave approximately as
m^(1/2).
Figure 3.9. Variations of β and p_i^c as functions of m for long-range moves.
p_r = 0.5, C = 0.6, lattice size: 100 x 100. For β, typical error bars have been
represented.
3.4.3. Generalized SIR Model. Since the transition from the endemic state to
the disease-free state corresponding to the coalescence of (S*, I*) and (S0, 0) is
the analogue of the transcritical bifurcation studied in the case of the simple SIS
model, we have determined, for m = 0, the value of the exponent β(0) defined by
β(0) = lim_{d_i^c − d_i → 0+} log I(0, ∞) / log(d_i^c − d_i)
for fixed values of all the other parameters.^17 The log-log plot represented in
Figure 3.10 shows that β(0) = 0.568 ± 0.050. This value is equal to the value of
^17 For the simple SIS model, the exponent β was defined (Equation 10) for a fixed value of
p_r (p_i variable). A similar definition for a fixed value of p_i (p_r variable) could have been given.
These two definitions lead to identical values of β since the value of the exponent does not
depend upon the direction along which the transition line is approached. Here, d_i has been
chosen as our variable parameter. Any other choice is equivalent.
β(0) obtained for the simple SIS model, strongly suggesting again that all these
models belong to the same universality class.
Figure 3.10. Log-log plot of I(0, ∞) as a function of d_i^c − d_i for given values of
all other parameters. Here b_s = 0.4, b_i = 0.1, d_s = 0.3, and p_i = 0.01. The critical
value of d_i is 0.01149. Lattice size: 200 x 200.
The most interesting feature of this model is the existence of the Hopf bifur-
cation. Since the emphasis of these lectures is on the importance of motion, we
have studied the influence of m on the stability of (S*, I*).
Our simulations indicate that motion favors cyclic behavior. This is shown
in Figures 3.11a-f. In the case of short-range moves, for m = 300 (Fig. 3.11b),
we observe a cyclic behavior, the cycle in the (S, I) plane being almost identical
to the cycle predicted by the mean-field approximation (Fig. 3.11a). When m is
decreased, the size of the cycle first starts to increase (Fig. 3.11c), then decreases
(Figs. 3.11d-e), to finally disappear completely (Fig. 3.11f).
71
.....................-1......-r-r-r-1,............,...,....1,............,........,1,............,....,..,
0
- ..., r- -
0.11 - - .... - -
G.IO .... - -
I I I
L I 1 I
OM o...._......_.OM......._......._O.._.Mw....o._._._o.oe......._._._.o._.oe~~o.l ....u.............,_..~:_I ...........,O.IM~.........~ !:'-'""""":..-!::.-:-'-'-'::"'o.··
....
0.10- - 0.101- -
Q.ll 1- - .... - 0 -
Q.IO r - .... r- -
uo 1- - 0.10 - -
G.ll 1- 0 - 0.111- -
0.10 - 0.10 -
~-) -
I I I I
c
0..10 - •~·,.
-
?'~I
0.10 1- ...:$
.;;
-
•·
•'~"
0.10 1- -
I I I I 0.1
0.011
0 0.01 0.04 0.011 0.011 0.1
s
OA
0.6
0.4
0.:1
0.1
0.1
0.0 0.8
s
4. Conclusion
We have discussed various automata network models for the spread of infectious
diseases in populations of moving individuals. The local rules of the automaton
consist of two subrules. The first, which is synchronous, is a probabilistic cellular
automaton rule. It models birth, death, infection and recovery. The second, applied
sequentially, is a site-exchange rule. It describes the different types of moves the
individuals may perform. The emphasis has been on the influence of motion, that
is, the degree of mixing which follows from the application of the second subrule.
The degree of mixing is measured by a parameter m representing the average num-
ber of tentative moves per individual. If m goes to ∞, then the time evolution of
the different models is exactly described by a mean-field-type approximation. The
approach of the stationary densities of the different populations to their mean-field
values is characterized by an exponent α_∞ close to 1 if the motion of the individuals
is diffusive, that is, for short-range moves. For long-range moves the approach to
the mean-field value is faster. The asymptotic behavior of these densities for small
values of m has also been studied. For the SIR model, the derivative with respect
to m of S(m, ∞) is negative and very large, showing that as soon as the individuals
start to move, the spread of the disease increases dramatically. This effect is even
more striking in the SIS model, the derivative of the density of infectives I(m, ∞)
being, in this case, infinite.
The SIS model exhibits a transcritical bifurcation similar to a second-order
phase transition. In the neighborhood of the phase transition the system exhibits
a critical behavior due, for any finite value of m, to the local character of the
first subrule modeling infection and recovery. For m = 0, the critical exponent
β has the value found for two-dimensional directed percolation, suggesting that
the critical behavior of the SIS model is universal. β depends, however, on the
References
[1] Bailey, N.T.J., The Mathematical Theory of Infectious Diseases and its Ap-
plications, London, Charles Griffin (1975).
[2] Bease, J., Series Expansions for the Directed-Bond Percolation Problem, J.
Phys. C: Solid State Phys. 10, 917-924 (1977).
[3] Bidaux, R., N. Boccara, H. Chate, Order of the Transition Versus Space
Dimensionality in a Family of Cellular Automata, Phys. Rev. A 39, 3094-
3105 (1989).
[4] Boccara, N., Symetries Brisees, Paris, Hermann (1976).
[5] Boccara, N., K. Cheong, Automata Network SIR Models for the Spread of
Infectious Diseases in Populations of Moving Individuals, J. Phys. A: Math.
Gen. 25, 2447-2461 (1992).
[6] Boccara, N., K. Cheong, Critical Behaviour of a Probabilistic Automata Net-
work SIS Model for the Spread of an Infectious Disease in a Population of
Moving Individuals, J. Phys. A: Math. Gen. 26, 3707-3717 (1993).
[7] Boccara, N., E. Goles, S. Martinez, P. Picco (eds.), Cellular Automata and
Cooperative Phenomena. Proc. of a Workshop, Les Houches, Dordrecht,
Kluwer (1993).
[8] Boccara, N., J. Nasser, M. Roger, Annihilation of Defects During the Evo-
lution of Some One-Dimensional Class-3 Deterministic Cellular Automata,
Europhys. Lett. 13, 489-494 (1990).
[9] Boccara, N., J. Nasser, M. Roger, Particlelike Structures and their Interactions
in Spatio-Temporal Patterns Generated by One-Dimensional Deterministic
Cellular-Automaton Rules, Phys. Rev. A 44, 866-875 (1991).
[10] Boccara, N., J. Nasser, M. Roger, Critical Behavior of a Probabilistic Local
and Nonlocal Site-Exchange Cellular Automaton (1993), to appear.
[11] Boccara, N., M. Roger, Some Properties of Local and Nonlocal Site-Exchange
Deterministic Cellular Automata, (1993), to appear.
[12] Bramson, M., J.L. Lebowitz, Asymptotic Behavior of Densities for Two-
Particle Annihilating Random Walks, J. Stat. Phys. 62, 297-372 (1991).
[13] Cardy, J.L., Field-Theoretic Formulation of an Epidemic Process with Immu-
nisation, J. Phys. A: Math. Gen. 16, L709-L712 (1983).
[14] Cardy, J.L., P. Grassberger, Epidemic Models and Percolation, J. Phys. A:
Math. Gen. 18, L267-L271 (1985).
[15] Cox, J.T., R. Durrett, Limit Theorems for the Spread of Epidemics and Forest
Fires, Stoch. Proc. Appl. 30, 171-191 (1988).
[16] DeMasi, A., P.A. Ferrari, S. Goldstein, W.D. Wick, An Invariance Principle for
Reversible Markov Processes. Applications to Random Motions in Random
Environments, J. Stat. Phys. 55, 787-855 (1989).
[17] Farmer, D., T. Toffoli, S. Wolfram (eds.), Cellular Automata: Proc. of an
Interdisciplinary Workshop, Los Alamos, Amsterdam, North-Holland (1984).
[18] Gallas, J.A.C., H. Herrmann, Investigating an Automaton of Class-4, Int. J.
Mod. Phys. C 1, 181-191 (1990).
[19] Gertsbakh, I.B., Epidemic Process on a Random Graph: Some Preliminary
Results, J. Appl. Prob. 14, 427-438 (1977).
[20] Goles, E., S. Martinez, Neural and Automata Networks, Dordrecht, Kluwer
(1991).
[21] Grassberger, P., New Mechanism for Deterministic Diffusion, Phys. Rev. A
28, 3666-3667 (1983).
[22] Grassberger, P., On the Critical Behavior of the General Epidemic Process
and Dynamical Percolation, Math. Biosci. 63, 157-172 (1983).
[23] Gutowitz, H. (ed.), Cellular Automata: Theory and Experiment, Proc. Work-
shop, Los Alamos, Amsterdam, North-Holland (1990).
[24] Hethcote, H.W., Qualitative Analyses of Communicable Disease Models,
Math. Biosci. 28, 335-356 (1976).
[25] Hethcote, H.W., P. van den Driessche, Some Epidemiological Models with
Nonlinear Incidence, J. Math. Biol. 29, 271-287 (1991).
[26] Källén, A., P. Arcuri, J.D. Murray, A Simple Model for the Spatial Spread
and Control of Rabies, J. theor. Biol. 116, 377-393 (1985).
[27] Kermack, W.O., A.G. McKendrick, A Contribution to the Mathematical The-
ory of Epidemics, Proc. Roy. Soc. A 115, 700-721 (1927).
[28] Kinzel, W., Directed Percolation, Ann. Israel Phys. Soc. 5, 425-445 (1983).
[29] Kinzel, W., Phase Transitions in Cellular Automata, Z. Phys. B 58, 229-244
(1985).
[30] Kuulasmaa, K., The Spatial General Epidemic and Locally Dependent Ran-
dom Graphs, J. Appl. Prob. 19, 745-758 (1982).
[31] Kuulasmaa, K., S. Zachary, On Spatial General Epidemic and Bond Percola-
tion Processes, J. Appl. Prob. 21, 911-914 (1984).
[32] Liggett, T.M., Interacting Particle Systems, Heidelberg, Springer-Verlag
(1985).
[33] Lotka, A.J., Elements of Physical Biology, Baltimore, Williams and Wilkins
(1925).
[34] McKay, G., N. Jan, Forest Fires as Critical Phenomena, J. Phys. A: Math.
Gen. 17, L757-L760 (1984).
[35] Manneville, P., N. Boccara, G. Vichniac, R. Bidaux (eds.), Cellular Automata
and Modeling of Complex Physical Systems. Proc. of a Workshop, Les
Houches, Heidelberg, Springer-Verlag (1989).
[36] Mollison, D., Spatial Contact Models for Ecological and Epidemic Spread,
J.R. Statist. Soc. B 39, 283-326 (1977).
ARTUR LOPES
Instituto de Matemática
Universidade Federal do Rio Grande do Sul
91500 Porto Alegre RS
Brasil
ABSTRACT. We present a brief introduction to Ergodic Theory and equilibrium states of Ther-
modynamic Formalism. We also analyze large deviation properties of the equilibrium states
defined in Thermodynamic Formalism. Several problems related to Statistical Mechanics are
considered.
1. Introduction
Our purpose in the first paragraphs of this text is to present the basic concepts
of Ergodic Theory in the simplest way. We introduce the Ergodic Theorem of
Birkhoff and the concepts of entropy and pressure. Our final goal is to analyze sev-
eral important problems related to Statistical Mechanics in the setting of Ergodic
Theory.
We hope to present some of the main ideas of Ergodic Theory without too
many technicalities. The relation of the concepts of pressure and entropy with
the free energy of Large Deviation Theory will be explored in the last para-
graphs.
Given a space X, a probability P on X is a law that associates to each subset
B of X a real value P(B). The value P(X) is assumed to be one. We also assume in
the definition of probability that for any sequence B_n, n ∈ N, of disjoint subsets of
X (that is, B_n ∩ B_m = ∅ for m different from n), the union of such sets, ∪_{n∈N} B_n,
satisfies P(∪_{n∈N} B_n) = Σ_{n=0}^∞ P(B_n). Finally we require that P(A − B) = P(A) −
P(B) for any subsets A and B of X such that B is contained in A.
Unfortunately, in most cases one can not have all the above properties defined
for all subsets of X. Therefore we define the probability P on a smaller family of
E. Goles and S. Martínez (eds.), Cellular Automata, Dynamical Systems and Neural Networks, 79-146.
© 1994 Kluwer Academic Publishers.
Let Ω = {0, 1}^N be the set of sequences of 0's and 1's, that is, z ∈ Ω if z =
(z_0, z_1, z_2, ..., z_n, ...) where z_i ∈ {0, 1} for all i ∈ N.
We call this set the Bernoulli space. We can think of this set as the set of
events of tossing a coin infinitely many times, in which we associate head with
0 and tail with 1. For example, (0, 1, 0, 1, 0, 1, ...) is the event in which we have
alternately head and tail, beginning with a head at time 0, that is, z_0 = 0.
A cylinder (or a parallelepiped) A is a subset of Ω defined by a finite specification
of elements; the set A = {(0, 1, 1, 0, 1, z_5, z_6, ..., z_n, ...) | z_i ∈ {0, 1}, i ≥ 5}, for
example, is a cylinder, which we denote by (0, 1, 1, 0, 1). In general a cylinder is
given by
(a_0, a_1, ..., a_m) = {z ∈ Ω | z_i = a_i, 0 ≤ i ≤ m},
where m is fixed and a_0, a_1, ..., a_m, belonging to {0, 1}, are also fixed. We should
think of (0, 1, 1, 0, 1) as the event of tossing a coin and having successively head, tail,
tail, head and tail, with no specification about the rest of the other tossings.
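The probability of a cylinder under the product measure is the product of the weights of its specified symbols. The following illustrative Python sketch (the coin and the cylinder are taken from the text; the Monte Carlo check is ours) tests membership in the cylinder (0, 1, 1, 0, 1) and estimates its probability, which for the fair coin is p_0² p_1³ = 1/32.

```python
import random

# The cylinder (0, 1, 1, 0, 1) as an event for the fair coin (head = 0, tail = 1).
p0 = 0.5
pattern = (0, 1, 1, 0, 1)

def in_cylinder(z, pattern):
    # A point of the Bernoulli space (here, a long enough prefix) belongs to
    # the cylinder iff its first coordinates match the specification.
    return tuple(z[:len(pattern)]) == pattern

random.seed(1)
trials = 100_000
hits = sum(
    in_cylinder([0 if random.random() < p0 else 1 for _ in range(5)], pattern)
    for _ in range(trials))
print(hits / trials)  # close to 1/32 = 0.03125
```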
that the above result is true. We explain now more carefully the meaning of the
Ergodic Theorem.
Note first that P depends on p_0 and p_1. The Birkhoff Ergodic Theorem (it
will be formally stated later) claims that there exists a set A such that P(A) = 1,
and such that for all z ∈ A, where z = (z_0, z_1, z_2, ..., z_n, ...), we have that
p_0 = lim_{n→∞} (1/n) (number of heads among z_0, z_1, ..., z_{n−1})
and
p_1 = lim_{n→∞} (1/n) (number of tails among z_0, z_1, ..., z_{n−1}).
The above result claims that the mean proportion of heads that appears in tossing
the coin n times converges to p_0. Before we state the Birkhoff Ergodic Theorem in
precise mathematical terms we need to introduce the concepts of shift and invariant
measure.
The shift map σ from Ω to Ω is the map such that for
z = (z_0, z_1, z_2, ..., z_n, ...)
we have
σ(z) = (z_1, z_2, z_3, ..., z_{n+1}, ...).
Therefore we can express the number of tails we have tossing the coin n times
(as expressed by z ∈ Ω) by
Σ_{j=0}^{n−1} I_A(σ^j(z)),
where I_A is the indicator of the set A = (1). In the same way,
Σ_{j=0}^{n−1} I_B(σ^j(z))
is the number of heads we have for the event z of tossing the coin n times; here
I_B(z) is the indicator of the set B = (0), that is, I_B(z) = 0 if z ∉ (0) and I_B(z) = 1
if z ∈ (0).
In this way we can see that the shift helps us to express the number of
heads and tails in a simple formula.
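These indicator sums can be computed directly. The short Python sketch below (an illustration of ours, working with finite prefixes of points of Ω) applies the shift repeatedly and sums the indicators of the cylinders (0) and (1).

```python
# The shift map on finite prefixes of the Bernoulli space, and the sums of
# indicators counting heads (0) and tails (1) among z_0, ..., z_{n-1}.
def shift(z):
    # sigma(z_0, z_1, z_2, ...) = (z_1, z_2, z_3, ...)
    return z[1:]

def indicator(symbol):
    # indicator of the cylinder (symbol): 1 iff the point starts with symbol
    return lambda z: 1 if z[0] == symbol else 0

I_head, I_tail = indicator(0), indicator(1)
z = (0, 1, 1, 0, 1, 0, 0, 1)
n = len(z)
orbit = [z]
for _ in range(n - 1):
    orbit.append(shift(orbit[-1]))          # z, sigma(z), ..., sigma^{n-1}(z)
heads = sum(I_head(w) for w in orbit)       # sum_j I_(0)(sigma^j(z))
tails = sum(I_tail(w) for w in orbit)       # sum_j I_(1)(sigma^j(z))
print(heads, tails)  # → 4 4
```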
Definition 2.1. The set {z, σ(z), σ²(z), ..., σ^n(z), ...} is called the orbit of z under
the shift map σ. The element σ^n(z) is called the n-th iterate of z.
We will call the Borel σ-algebra of Ω the σ-algebra generated by the cylinders.
The Borel σ-algebra of R is the σ-algebra generated by the finite intervals (see
[16]). We say that f from X to R is measurable if for each set A in the Borel
σ-algebra of R, f^{-1}(A) is in the Borel σ-algebra of Ω.
Given a certain measurable map φ : Ω → R, the mean value of φ on z up to
the n-th iterate is
(1/n) Σ_{j=0}^{n−1} φ(σ^j(z)).
In this way, for φ = I_(0), the mean value of I_(0) on z up to the n-th iterate is the
mean proportion of times we obtain a head, tossing the coin n times. In the case of the
fair coin, that is, p_0 = 0.5 = p_1, and φ = I_(0), one should expect that the mean
number of heads should converge to 0.5 when n goes to infinity.
We will be interested in obtaining the limit of these mean values as n goes to
infinity, that is,
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} φ(σ^j(z)).
Proposition 2.1. The probability P is always invariant for the shift map
σ : Ω → Ω.
Proof. It is enough to show that P(σ^{-1}(A)) = P(A) for the sets A that are
generators (the cylinders) of the σ-algebra.
Consider a cylinder A = (a_0, a_1, ..., a_n); then
σ^{-1}(A) = (0, a_0, a_1, ..., a_n) ∪ (1, a_0, a_1, ..., a_n),
and therefore
P(σ^{-1}(A)) = p_0 p_{a_0} ··· p_{a_n} + p_1 p_{a_0} ··· p_{a_n} = (p_0 + p_1) P(A) = P(A).
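The invariance computation on cylinders can be checked numerically. The following sketch (weights and cylinder chosen by us for illustration) compares the mass of a cylinder with the mass of its preimage under the shift.

```python
# Numerical check of the invariance of the product measure P on cylinders:
# the mass of sigma^{-1}(A) equals the mass of A.
p = {0: 0.3, 1: 0.7}

def P_cyl(cyl):
    # P(a_0, ..., a_n) = p_{a_0} * ... * p_{a_n}
    mass = 1.0
    for a in cyl:
        mass *= p[a]
    return mass

A = (0, 1, 1)
# sigma^{-1}(A) is the disjoint union of the cylinders (a, a_0, ..., a_n).
preimage_mass = sum(P_cyl((a,) + A) for a in (0, 1))
print(abs(preimage_mass - P_cyl(A)) < 1e-12)  # → True
```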
Notation. We introduce the following notation: M(T) is the set of all invariant
probabilities μ for the measurable map T : X → X.
Therefore M(σ) denotes the set of all invariant probabilities for σ. For each
p_0, p_1 such that p_0 + p_1 = 1, p_0, p_1 ≥ 0, we have that the corresponding P = P(p_0, p_1)
belongs to M(σ), as was shown in the proposition above. There exist of course
other probabilities μ ∈ M(σ) that are not of the form P.
The set of probabilities M(T) is a convex simplex in the set of all measures
on the σ-algebra A of the set X. It is well known in Convex Analysis that the
extreme points of a convex set play a very important role.
It is possible to show that the probability measures that are extreme points of the
set of invariant probabilities C = M(T) are the ergodic probabilities.
Definition 2.4. We say that μ ∈ M(T) is ergodic if for all A ∈ A such that
T^{-1}(A) = A, either μ(A) = 0 or μ(A) = 1.
The above definition means that for an ergodic measure the action of the
measurable map T on any non-trivial set A ∈ A (a trivial set being equal to ∅
or X up to a set of μ-measure zero) is so random that it cannot leave the set A
invariant; in other words, the set A has to spread around the set X under iteration
of T.
Note that the empty set ∅ and the total set Ω are always invariant, but they
have respectively measure 0 and 1.
Remark. It can be shown that the shift with the invariant probability P defined
above is ergodic [18].
In Ergodic Theory, most of the proofs of general results follow the recipe:
first prove the result for ergodic measures and then use the ergodic decomposition
theorem [13] to extend the result to other kinds of measures.
Notation. Given a probability μ on the set X, we will say that a property holds
μ-almost everywhere if there exists a subset A contained in X such that μ(A) = 1
and the property is true for all z in the set A.
Theorem 2.1. (Birkhoff) Let (X, A, μ) be a probability space and T : X → X a
measurable transformation that preserves μ, that is, μ ∈ M(T), and suppose that
μ is ergodic. Then for any f ∈ L¹(μ),
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j(z)) = ∫ f(x) dμ(x) (1)
for μ-almost every z.
The above result essentially claims that for ergodic measures, the spatial mean
(the right-hand side of (1)) is equal to the temporal mean (the left-hand side of (1)) for
almost every point z. Therefore, in this case, in order to compute an integral, one
has to estimate the value of a series. In several practical situations this property
brings a simplification to the problem of estimating an integral.
When we consider T = σ, μ = P and X = Ω in the Bernoulli shift example
we mentioned before, then considering f(x) = I_(0)(x), we get
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} I_(0)(σ^j(z)) = ∫ I_(0)(x) dP(x) = p_0
(for P-almost every z), which we mentioned before in our reasoning. This theorem
therefore is a very general result that can, as a particular case, assure the validity
of the Strong Law of Large Numbers.
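This particular case is easy to test numerically. The sketch below (ours, with a fixed random seed standing in for a P-typical point) computes the time average of the indicator of the cylinder (0) along a long orbit and compares it with its integral p_0.

```python
import random

# Birkhoff's theorem for the Bernoulli shift as the strong law of large
# numbers: the time average of I_(0) along a P-typical point approaches p0.
random.seed(2)
p0, n = 0.5, 100_000
z = [0 if random.random() < p0 else 1 for _ in range(n)]   # a "typical" point
time_average = sum(1 for s in z if s == 0) / n
print(abs(time_average - p0) < 0.01)  # → True
```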
In the case p_0 = 0.5 = p_1, the fair coin, the event of obtaining head every
time from 0 to infinity (that is, (0, 0, 0, 0, 0, ...)) is rare (has P-measure zero). For a
set A of measure one the events (z_0, z_1, ..., z_n, ...) ∈ A are such that head and tail
appear with the same frequency.
The questions that people in Probability and Ergodic Theory are concerned
with are not of a deterministic nature. The statements that are relevant and perti-
nent are the ones about events that happen with probability one, in other words,
the statements about sets A such that μ(A) = 1. Sets of measure zero are considered
negligible.
The Birkhoff Ergodic Theorem is one of the most celebrated theorems of
Mathematics and was inspired by Statistical Mechanics, more specifically by the
billiard ball model, which is a model for a particle reflecting on the walls of a closed
compartment [13].
We now state a more general version of Birkhoff's Ergodic Theorem, without
the assumption that the measure is ergodic.
Theorem 2.2. (Birkhoff) Let (X, A, μ) be a probability space and μ ∈ M(T),
where T is measurable, T : X → X. Then for any f ∈ L¹(μ) the limit
lim_{n→∞} (1/n) Σ_{j=0}^{n−1} f(T^j(x)) = f̄(x)
exists for μ-almost every x, and
∫ f̄(x) dμ(x) = ∫ f(x) dμ(x).
Note that the difference between the last result and the previous one is that in the
case the measure is ergodic, f̄ is constant μ-almost everywhere.
The Bernoulli space Ω can be equipped with a distance d_θ : Ω × Ω → R
in the following way: for a fixed value θ with 0 < θ < 1, we define the metric
d_θ(x, y) = θ^N (where N is the largest natural number such that x_i = y_i for all i < N)
if x is different from y. When x is equal to y then we define the distance to be
zero. If we define open sets of Ω in the usual way (product topology) we have that the
σ-algebra generated by the cylinders is the Borel σ-algebra, since the cylinders
form a basis for the topology of Ω.
As an example consider θ = 0.3, z = (1, 1, 0, 1, 0, 0, 1, ...) and ε = 0.0081 = 0.3^4;
then it is easy to see that B(z, ε) (the open ball of center z and radius ε) is equal to
the cylinder (1, 1, 0, 1).
Note that the indicator function I_A is continuous if A is a cylinder.
In the rest of this text we will consider a certain fixed value θ and denote by
d the metric associated with it.
Definition 2.5. A map T from a metric space (X, d) into itself is expanding if
there exists λ > 1 such that for any x there exists ε > 0 such that ∀y ∈ B(x, ε),
d(T(x), T(y)) ≥ λ d(x, y).
Note that if d_θ(x, y) = α, x, y ∈ Ω, then d_θ(σ(x), σ(y)) = α θ^{−1} = θ^{−1} d_θ(x, y).
Therefore the Bernoulli shift σ is expanding with the value λ = θ^{−1} in the notation
of the above definition.
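Both the metric and the expansion property can be checked on finite prefixes of points of Ω. The following sketch (an illustration of ours) computes d_θ and verifies that applying the shift multiplies the distance by θ^{−1}.

```python
# The metric d_theta on (prefixes of) the Bernoulli space, and the expansion
# property d_theta(sigma(x), sigma(y)) = theta^{-1} * d_theta(x, y).
theta = 0.3

def d(x, y):
    # theta^N, N being the first index at which x and y disagree
    for N, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return theta ** N
    return 0.0

x = (1, 1, 0, 1, 0, 0)
y = (1, 1, 0, 1, 1, 0)
# x and y first disagree at index 4, so d(x, y) = theta^4 = 0.0081,
# while the shifted points disagree at index 3: d = theta^3 = 0.027.
print(d(x, y), d(x[1:], y[1:]))
```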
when z = (z_i). For example, for z = (z_i) where z_i = 1 for i even and z_i = 0 for i
odd, σ(z) = (z_{i+1}) = (y_i) where y_i = 1 for i odd and y_i = 0 for i even. Note that
σ²(z) = z in this case.
μ_0(i) = p_i
and
Σ_{i=0}^{k−1} p_i = 1.
Consider also the set of sequences of these symbols, that is, the set of sequences
z = (z_0, z_1, z_2, ..., z_n, ...) where z_i ∈ {0, 1, ..., k − 1}. We will again denote by Ω the
set of all these sequences. Sometimes we denote by z : N → {0, 1, ..., k − 1}
an element of Ω and z(n) by z_n. The shift on Ω is defined in the same way
as before: σ : Ω → Ω is such that for z = (z_0, z_1, z_2, ..., z_{n+1}, ...) ∈ Ω, σ(z) =
(z_1, z_2, ..., z_{n+1}, ...) ∈ Ω.
Definition 2.7. Given finite subsets A_0, A_1, ..., A_m of {0, 1, ..., k − 1} and j ∈ N,
we define the cylinder C(j, A_0, ..., A_m) by
C(j, A_0, ..., A_m) = {z ∈ Ω | z_{j+i} ∈ A_i, 0 ≤ i ≤ m};
its measure is
∏_{i=0}^{m} μ_0(A_i).
With 0 < θ < 1 fixed, the metric we will consider on Ω is d_θ(x, y) = θ^N, where N is the
largest N such that x_i = y_i for all i such that |i| ≤ N, if x is different from y, and zero
otherwise.
We will call such a system the two-sided Bernoulli shift on
and
x_{k−1} = (k − 1, z_0, z_1, ..., z_n, ...)
are such that σ(x_i) = z, i ∈ {0, ..., k − 1}, that is, σ^{-1}(z) = {x_0, ..., x_{k−1}}.
More generally, for z = (z_0, z_1, ...), the set of solutions x of σ^n(x) = z is the
set of points x of the form
x = (x_0, x_1, ..., x_{n−1}, z_0, z_1, ...),
where x_0, x_1, ..., x_{n−1} ∈ {0, 1, ..., k − 1} are arbitrary. Therefore the cardinality of the
set of such solutions x is k^n.
Periodic orbits for σ are also easy to find. The set of all periodic orbits of
period n is obtained in the following way: take z_0, z_1, ..., z_{n−1} in all possible ways
such that z_i ∈ {0, 1, ..., k − 1}, i ∈ {0, 1, ..., n − 1}. For each one of these z_0, z_1, ..., z_{n−1},
repeat the block infinitely many times, in order to obtain the set of all x such that
σ^n(x) = x, where
x = (z_0, z_1, ..., z_{n−1}, z_0, z_1, ..., z_{n−1}, z_0, z_1, ..., z_{n−1}, ...).
Remark. Note that the cardinality of the set of solutions x of σ^n(x) = x and the
cardinality of the set of solutions x of σ^n(x) = z are the same and equal to k^n. In
fact, the procedure of finding the set of solutions is quite similar in both cases.
Proposition 2.2. The set of all periodic points for the shift is dense in Ω with
the d_θ metric.
Proof. Given z = (z_i)_{i∈N}, z_i ∈ {0, ..., k − 1}, and ε > 0, take N such that θ^N < ε.
Now define x as the successive repetition of the string (z_0, z_1, ..., z_N), that is,
x = (z_0, z_1, z_2, ..., z_N, z_0, z_1, z_2, ..., z_N, z_0, z_1, z_2, ..., z_N, ...).
Then d_θ(z, x) < θ^N < ε and σ^{N+1}(x) = x, that is, x is a periodic point of period
at most (N + 1) and ε-close to z. This proves the proposition.
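The construction in the proof is entirely explicit and can be carried out on finite prefixes. The following sketch (ours; z, θ and N are arbitrary illustrative choices) builds the periodic repetition of the first N + 1 symbols of z and verifies both the periodicity and the distance bound.

```python
# Sketch of the proof of Proposition 2.2: repeating the first N + 1 symbols
# of z yields a point x of period N + 1 within distance theta^N of z.
theta, N = 0.3, 3

def d(a, b):
    # d_theta on prefixes: theta^k, k the first index of disagreement
    for k, (s, t) in enumerate(zip(a, b)):
        if s != t:
            return theta ** k
    return 0.0

z = (1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1)
block = z[:N + 1]
x = tuple(block[k % (N + 1)] for k in range(len(z)))   # periodic repetition
print(d(z, x) < theta ** N, all(x[k] == x[k % (N + 1)] for k in range(len(x))))
# → True True
```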
Remark. A similar result for the preimages of a certain point z can be obtained
(the proof is basically the same), that is: any y ∈ X can be approximated by
preimages of z.
Note that the temporal mean f̄(z) of f (in Birkhoff's Theorem) at a point z
belonging to a periodic orbit is the mean value of f on the orbit of z. Therefore,
in most cases (but not all cases, as we can see below), the periodic orbits have to
be excluded from the set A of measure one mentioned in Birkhoff's Theorem.
In an extensive number of cases in Dynamical Systems the periodic orbits are
dense in the region where the dynamics is concentrated [6]. Periodic orbits are
extremely important for understanding the dynamics and the ergodic properties
of a measure μ even if they can have μ-measure zero.
There exist invariant probabilities that are finite sums of Dirac measures in
M(T), but they have to be concentrated on periodic orbits because of the invari-
ance.
μ((0, 0, 1, 0, 0, 1, 0, 0, 1, ...)) = 1/3
Notation. A law η such that for each set A in the Borel σ-algebra of X, η(A)
is a real number (not necessarily positive) or is equal to ∞, and such that:
a) η(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ η(A_i) for disjoint sets A_i,
b) η(∅) = 0,
c) η(A − B) = η(A) − η(B)
when B ⊂ A, is called a signed measure. We denote by S(X) the set of all signed
measures on the Borel σ-algebra of X.
Example. For the set X = R, given a continuous function φ(x) (not necessarily
positive and not necessarily integrable), the law η(A) = ∫_A φ(x) dx is a signed
measure on X.
Given a certain normed space V, the dual of V, denoted by V*, is the set of
all continuous linear functionals on V, that is, the set of all functionals L : V → R
that are linear and continuous. The following theorem claims that the dual of the
set C(X) is the space S(X) [16].
Corollary 2.1. If L is positive (that is, for any f ∈ C(X), L(f) ≥ 0 if f ≥ 0)
and if L(1) = 1, then there exists a unique probability μ ∈ M(X) such that
L(f) = ∫ f dμ for any f ∈ C(X).
The measure w always exists and is well defined by Riesz's Theorem applied
to L(f) = ∫ (f ∘ S)(x) dν(x). The measure w is usually called the pull back of the
measure ν by the map S.
It is easily obtained from well-known properties about approximation of con-
tinuous functions by step functions (finite sums of indicators with different weights),
and vice-versa [16], that 1) and 2) below are equivalent:
1) for any Borel set A,
w(A) = ν(S^{-1}(A));
2) for any f ∈ C(X),
∫ (f ∘ S)(x) dν(x) = ∫ f(x) dw(x).
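On a finite space both characterizations can be computed directly. The sketch below (ours; the three-point space, the map S, the weights and the test function are arbitrary illustrative choices) builds w from ν via preimages and checks the change-of-variables identity.

```python
# Discrete sketch of the measure w obtained from nu and a map S:
# w(A) = nu(S^{-1}(A)), so the integral of f o S against nu equals
# the integral of f against w.
X = [0, 1, 2]
nu = {0: 0.2, 1: 0.5, 2: 0.3}
S = {0: 1, 1: 1, 2: 0}                   # an arbitrary map S : X -> X

w = {x: sum(nu[y] for y in X if S[y] == x) for x in X}   # w({x}) = nu(S^{-1}({x}))

f = {0: 1.0, 1: -2.0, 2: 4.0}            # an arbitrary test function
lhs = sum(f[S[x]] * nu[x] for x in X)    # integral of (f o S) d nu
rhs = sum(f[x] * w[x] for x in X)        # integral of f d w
print(abs(lhs - rhs) < 1e-12)  # → True
```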
One would like to say that a sequence of measures μ_n converges to μ if and
only if, for any Borel set A, the sequence μ_n(A) converges to μ(A). This is almost
true. One has to suppose that the boundary of the set A has μ-measure zero, and
then the claim is true [16]. The more useful definition of convergence is in terms
of the action of the measures on the continuous functions:
Definition 2.10. The Dirac Delta measure at the point z is by definition the
probability measure that associates measure one to each Borel set that contains z
and measure zero otherwise. We will denote such a probability by δ_z.
μ = lim_{n→∞} (1/n) Σ_{j=0}^{n−1} δ_{T^j(z)} (2)
Definition 2.11. The right-hand side of the above equality is called the empirical
measure [7].
Definition 2.12. The support of a measure μ defined on X is the set of points
x ∈ X such that for any ε greater than zero the measure μ(B(x, ε)) of the ball of
center x and radius ε is strictly positive.
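For the shift, the weight that the empirical measure of a finite orbit segment gives to a length-one cylinder is simply the frequency of the corresponding symbol. The sketch below (ours, on a short illustrative prefix) computes these weights.

```python
from collections import Counter

# Empirical measure of a finite orbit segment under the shift: the weight of
# the cylinder (s) is the fraction of times j for which sigma^j(z) starts
# with s, i.e., the frequency of the symbol s in z_0, ..., z_{n-1}.
z = (0, 1, 1, 0, 1, 0, 0, 1, 0, 0)
n = len(z)
counts = Counter(z)                      # first symbol of sigma^j(z) is z_j
empirical = {s: counts[s] / n for s in counts}
print(empirical[0], empirical[1])  # → 0.6 0.4
```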
3. Entropy
Theorem 3.1. (Brin-Katok) [4] - Suppose $\mu$ is ergodic for the transformation $T$ on $(X, \mathcal{A}, \mu)$ and consider $d$ a metric on the compact set $X$. Then the two limits obtained by taking $\limsup$ and $\liminf$ in the expression below coincide for $\mu$-almost every $z$.
Definition 3.1. For an invariant ergodic measure $\mu \in M(T)$ we define the entropy of $\mu$ as the value
$$h(\mu) = -\lim_{\varepsilon\to 0}\Big(\limsup_{n\to\infty}\frac{1}{n}\log\mu(B(z,n,\varepsilon))\Big),$$
where $z$ was chosen in a set of measure one satisfying the above Theorem.
Note that we could alternatively define the entropy by the $\liminf$ (see Theorem 3.1).
Later on we will define the entropy of a measure J.L E M(T) when J.L is not
ergodic.
Note that the larger the entropy of the measure, the faster the indeterminacy of the system decreases. Therefore the larger the entropy, the more chaotic the system is.
Example. A trivial example where we can compute the entropy is the following: consider a periodic point $x$ of period $n$, and the probability $\mu = \sum_{j=0}^{n-1}\frac{1}{n}\delta_{T^j(x)}$. It is easy to see that this measure $\mu$ is ergodic and that the entropy $h(\mu) = 0$.
The above example is in fact not exactly random or chaotic, but, in some sense, totally deterministic.
Proposition 3.1. The entropy of the probability $\mu = P(p_0, p_1)$, with $p_0, p_1 > 0$ and $p_0 + p_1 = 1$, invariant for the shift on $\Omega = B(p_0, p_1)$, is $h(P) = -p_0\log p_0 - p_1\log p_1$.
Proof. As we mentioned before, it can be shown that the probability $P(p_0, p_1)$ under the action of the shift is ergodic (see Remark after Definition 2.4).
Consider $z \in \Omega$ in a set $A$ of $P$-measure one satisfying the Birkhoff Ergodic Theorem. That is, for any $f \in C(X)$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(\sigma^j(z)) = \int f(x)\,dP(x).$$
The intersection of A with the set of full measure of the Definition 3.1 will
also have measure one. Without loss of generality we can suppose that z is in this
intersection.
Fix $\varepsilon > 0$. Remember that we consider on $\Omega$ the metric
$$d_\theta(x,y) = \theta^N,$$
where $N$ is the largest integer such that $x_i = y_i$ for any $0 \le i \le N$. Let $n_0$ be the smallest integer such that $\theta^{n_0} < \varepsilon$. Then, for $n > n_0$, we have
$$B(z,n,\varepsilon) = \{y \in \Omega \mid d_\theta(\sigma^j(z), \sigma^j(y)) < \varepsilon,\; 0 \le j \le n-1\} = \overline{(z_0, z_1, z_2, \ldots, z_{n+n_0-1})}$$
and therefore
$$\lim_{n\to\infty}\frac{1}{n}\log P(B(z,n,\varepsilon)) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n+n_0-1} I_{(0)}(\sigma^j(z))\log p_0 + \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n+n_0-1} I_{(1)}(\sigma^j(z))\log p_1.$$
The limits in the last expression exist because $z$ was chosen satisfying Birkhoff's Ergodic Theorem, and therefore
$$\lim_{n\to\infty}\frac{1}{n+n_0-1}\sum_{j=0}^{n+n_0-1} I_{(0)}(\sigma^j(z)) = \int I_{(0)}(x)\,dP(x) = p_0$$
and
$$\lim_{n\to\infty}\frac{1}{n+n_0-1}\sum_{j=0}^{n+n_0-1} I_{(1)}(\sigma^j(z)) = \int I_{(1)}(x)\,dP(x) = p_1.$$
Therefore
$$\lim_{n\to\infty}\frac{1}{n}\log P(B(z,n,\varepsilon)) = p_0\log p_0 + p_1\log p_1.$$
Finally, since the limit does not depend on $\varepsilon$,
$$h(P) = -p_0\log p_0 - p_1\log p_1. \;\blacksquare$$
The next result can be obtained using an argument similar to the one used in the proof of the last Proposition:
Theorem 3.2. For the probability $P(p_1, p_2, \ldots, p_n)$, invariant for the shift $\sigma$ in $n$ symbols, the entropy is:
$$h(P) = -\sum_{i=1}^{n} p_i\log p_i.$$
Note that from the definition of entropy, in principle the value $h(\mu)$ could depend on the metric $d$ we are using. In fact the entropy $h(\mu)$ does not depend on the metric, but we will not prove this result in the text (see [13]).
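The entropy formula of Theorem 3.2 is easy to evaluate numerically, and the Brin-Katok characterization can be checked empirically for a Bernoulli shift: the measure of the cylinder of length $n$ around a typical point decays like $e^{-nh}$. A small sketch; the weights $(0.3, 0.7)$ and sample size are arbitrary choices for illustration:

```python
import math, random

def shannon_entropy(p):
    """Entropy h(P) = -sum p_i log p_i of a Bernoulli product measure."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Brin-Katok check for the (p0, p1)-Bernoulli shift: for a P-typical z,
# -(1/n) log P([z_0 ... z_{n-1}]) converges to h(P).
p = (0.3, 0.7)
random.seed(0)
n = 20000
z = [0 if random.random() < p[0] else 1 for _ in range(n)]
log_cyl = sum(math.log(p[s]) for s in z)   # log P(cylinder of length n)
print(shannon_entropy(p))    # exact entropy
print(-log_cyl / n)          # empirical estimate, close for large n
```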
$$h(z) = \lim_{\varepsilon\to0}\Big(-\limsup_{n\to\infty}\frac{1}{n}\log\mu(B(z,n,\varepsilon))\Big) = \lim_{\varepsilon\to0}\Big(-\liminf_{n\to\infty}\frac{1}{n}\log\mu(B(z,n,\varepsilon))\Big).$$
The function $h(z)$ is integrable.
The difference between this result and the previous one for ergodic measures is the function $h(z)$. When the measure is not ergodic the "limit sup" can change from point to point even in a set of full measure. When $\mu$ is ergodic, $h(z)$ is constant $\mu$-almost everywhere.
Definition 3.2. The entropy of $\mu \in M(T)$ is the integral $\int h(z)\,d\mu(z)$, where $h(z)$ is defined in the above theorem.
4. Topological Pressure
The entropy of a system $(T, X, \mu)$ measures the randomness of the system. The larger the entropy, the more chaotic the system is.
The concept of entropy appears in Physics and is associated with the principle that Nature tends to maximize entropy. That is, if one considers particles of a gas concentrated at a corner of a closed box at an initial time $t_0$, then after some time the particles will tend to an equilibrium where they are spread in a totally random way. This means that after some time the gas will have a uniform density in the box. As the velocity of the particles is very large, in fact, this is the state that will be observed. Therefore the state that will occur in Nature will be the one that is most random among all possible states.
$$H(p_1, p_2, \ldots, p_m) = -\sum_{i=1}^{m} p_i\log p_i$$
is called the entropy of the distribution $(p_1, p_2, \ldots, p_m)$. Let $E(p_1, p_2, \ldots, p_m) = \sum_{i=1}^{m} p_i E_i$ denote the average energy.
Then we can say that the Gibbs distribution maximizes $H - \beta E$.
The expression $\beta E - H$ is called, in this context, free energy (in fact, there exist several different concepts in Mathematics and Physics also called free energy).
Therefore we can say that Nature minimizes free energy. When the temperature $T = \infty$, that is, $\beta = 0$, Nature maximizes entropy. In this case the Gibbs state is the most random probability, namely, $p_j = 1/m$, $j \in \{1, 2, \ldots, m\}$. Again, using an analogy with Classical Mechanics, $E$ plays the role of potential energy and $H$ plays the role of kinetic energy.
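The finite variational principle described above can be verified numerically: with the standard convention $p_i \propto e^{-\beta E_i}$, the Gibbs weights maximize $H - \beta E$ over all probability vectors, and the maximal value equals $\log\sum_i e^{-\beta E_i}$. A sketch with arbitrary illustrative energies:

```python
import math, random

def gibbs(energies, beta):
    """Gibbs weights p_i proportional to exp(-beta * E_i)."""
    w = [math.exp(-beta * e) for e in energies]
    z = sum(w)                       # partition function
    return [x / z for x in w]

def minus_free_energy(p, energies, beta):
    """H(p) - beta * <E>_p, the quantity the Gibbs state maximizes."""
    h = -sum(pi * math.log(pi) for pi in p if pi > 0)
    avg_e = sum(pi * e for pi, e in zip(p, energies))
    return h - beta * avg_e

E = [0.0, 1.0, 2.5]                  # illustrative energy levels
beta = 1.3
p_gibbs = gibbs(E, beta)

# any other probability vector scores lower
random.seed(1)
for _ in range(1000):
    q = [random.random() for _ in E]
    s = sum(q)
    q = [x / s for x in q]
    assert minus_free_energy(q, E, beta) <= minus_free_energy(p_gibbs, E, beta) + 1e-12
```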
Now, let us return to Gibbs measures. Generalizing the above considerations, Ruelle proposed the following model: consider the one-dimensional lattice $\mathbf{Z}$. Here one has for each integer a physical system with possible states $1, 2, \ldots, m$. A configuration of the system consists of assigning an $x_i \in \{1, 2, \ldots, m\}$ for each $i \in \mathbf{Z}$. Thus a configuration is a point $x = \{x_i\}_{i\in\mathbf{Z}} \in \{1, 2, \ldots, m\}^{\mathbf{Z}} = \Omega$.
Considering now on the space $\Omega$ the shift map $\sigma(\{x_i\}_{i\in\mathbf{Z}}) = \{x_{i+1}\}_{i\in\mathbf{Z}}$, and $M(\sigma)$ the space of probabilities $\nu$ such that $\nu(\sigma^{-1}(A)) = \nu(A)$ for any Borel set $A$, one considers the supremum
$$\sup_{p\in M(\sigma)}\Big\{h(p) + \int\phi\,dp\Big\},$$
where $h(p)$ is the entropy of the probability $p$. We call such a supremum the Topological Pressure (a better name would be Free Energy, but we follow here the terminology of Ruelle) associated with $\phi$ and denote it by $P(\phi)$.
equilibrium state associated to the potential $\phi$. The equilibrium state $\mu$ will be defined by means of a variational formula (see Definition 4.2). In the case of the present example, the solution can be obtained by means of the theory of Markov Chains and the Perron-Frobenius operator (note that we introduce a stochastic matrix), and this will be explained in section 7 (see the example after Theorem 7.5).
The solution for the case of a general $\phi$ (not constant on cylinders) will require a more sophisticated version of the Perron-Frobenius theorem that will be presented in section 7.
Most of the time we will use the word pressure instead of topological pressure. It is natural to ask what properties a probability $p$ that attains such a supremum has.
Definition 4.2. We will call the probability $\mu$ that attains the above supremum (in the case there exists one such $\mu$) the Gibbs state (or equilibrium probability for $\phi$) for the one-dimensional lattice with potential function $\phi$. In other words:
$$h(\mu) + \int\phi(z)\,d\mu(z) = P(\phi)$$
or
$$h(\mu) + \int\phi(z)\,d\mu(z) \ge h(\nu) + \int\phi(z)\,d\nu(z)$$
for any $\nu \in M(T)$.
Notation. Sometimes we will denote this probability $\mu$ by $\mu_\phi$ in order to express the dependence of $\mu$ on $\phi$.
For expanding systems the probability that attains the above supremum is unique, and therefore equilibrium states do exist (see paragraph 7). Non-uniqueness of the probability that attains the supremum is related with Phase Transition of spin-lattices [9], [10], [12]. D. Ruelle [17] was able to obtain a certain function $\phi$ that represents interactions of a certain special kind and such that the probability that attains the above supremum $P(\phi)$ is exactly the "Gibbs state" in the lattice $\mathbf{Z}$ that physicists, using other procedures, already knew a long time ago. Therefore the terminology of Gibbs state that we introduced above is quite proper.
The analogy of the above setting in the lattice $\mathbf{Z}$ with the finite case we mentioned before is transparent.
If we assume a wall effect, then we have to consider the lattice N, that is the
one-sided shift.
The setting we presented above is suitable for analyzing problems in Statistical
Mechanics of the one-dimensional lattice Z. For the two-dimensional case Z 2 (or
for the three-dimensional case Z3 ), one should consider actions of Z2 (or Z3 ) and
the situation is much more complicated (see [17] for references).
Entropy is defined for measures and Pressure for continuous functions. The set of measures and the set of continuous functions are dual to one another. In fact these two concepts are related to each other by means of a Legendre Transform [8]. Some of these properties will be considered in the last part (see section 7) of these notes.
We refer the reader to [5], [7], [8], [11] for a complete description of the above results.
When do two different $\phi$ and $\psi$ determine the same equilibrium state $\mu$? That is, when is $\mu_\phi = \mu_\psi$? This is an important question that will be analyzed more carefully later. The following proposition is an easy consequence of the properties of the probabilities $\nu \in M(T)$.
Proposition 4.1. Criterion of Homology - Suppose $\phi$ and $\psi$ are two continuous functions such that there exist a continuous function $g$ and a constant $k$ satisfying $\phi - \psi = g\circ T - g + k$; then $\mu_\phi = \mu_\psi$.
When $\phi = 0$ the supremum above becomes $\sup_{p\in M(T)} h(p)$, and this value $P(0)$ is called the topological entropy of $T$. We will denote such value by $h(T)$.
We refer the reader to [3], [15], [17], [18] for results about Pressure and Thermodynamic Formalism.
In the case $T = \sigma$ it can be shown that $h(\sigma) = \log d$ (see Definition 4.3) if $(\sigma, \Omega)$ is the shift in $d$ symbols.
More generally, if an expanding map $T$ has the property that for any $a \in X$, $\#\{T^{-1}(a)\} = d$, then $h(T) = \log d$.
From Theorem 3.2 the entropy of the shift $\sigma$ in $d$ symbols, under the probability $P(1/d, 1/d, \ldots, 1/d)$, is equal to $\log d$. Therefore, in this case we can identify very easily the equilibrium state for $\phi = 0$: it is the probability $\mu_0 = P(1/d, 1/d, \ldots, 1/d)$.
This measure will be called later the maximal entropy measure .
In paragraph 7 we will consider very precise results on the existence of equi-
librium states for expanding maps.
5. Large Deviation
In this paragraph and in the next one, we will consider $T$ a continuous map from a compact metric space $(X,d)$ into itself, $\mu$ an ergodic invariant measure on $(X, \mathcal{A})$ and $f$ a continuous function from $X$ to $\mathbf{R}^m$. Some of the proofs will be done for $m = 1$ in order to simplify the notation.
The Ergodic Theorem of Birkhoff claims that for an ergodic measure $\mu \in M(T)$ and a continuous function $f$ from $X$ to $\mathbf{R}^m$, for $\mu$-almost every point $z \in X$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) = \int f(x)\,d\mu(x).$$
The convergence, however, is not uniform: there exist points $z$ (in a set of measure zero) whose Birkhoff averages stay far from the integral. For instance, for the shift with $P = P(1/2, 1/2)$ and $f_0 = I_{(0)}$, the point $z = (0, 0, 0, \ldots)$ satisfies
$$\Big|\frac{1}{n}\sum_{j=0}^{n-1} f_0(\sigma^j(z)) - \int f_0(x)\,dP(x)\Big| = \Big|\frac{1}{n}\sum_{j=0}^{n-1} I_{(0)}(\sigma^j(z)) - \frac{1}{2}\Big| = 1 - \frac{1}{2} = \frac{1}{2} \ge 0.01.$$
Definition 5.1. Given $\varepsilon$ greater than zero and $n \in \mathbf{N}$, then by definition $Q_n(\varepsilon)$ is equal to:
$$Q_n(\varepsilon) = \mu\Big\{z \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) - \int f(x)\,d\mu(x)\Big| \ge \varepsilon\Big\}.$$
Proposition 5.1. For any $\varepsilon > 0$,
$$\lim_{n\to\infty} Q_n(\varepsilon) = 0.$$
Indeed, by Birkhoff's Ergodic Theorem the sets
$$A_n = \Big\{z \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) - \int f(x)\,d\mu(x)\Big| \ge \varepsilon\Big\}$$
have measures converging to zero; equivalently:
Corollary 5.1. For any $\varepsilon > 0$,
$$\lim_{n\to\infty}\mu\Big\{z \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) - \int f(x)\,d\mu(x)\Big| < \varepsilon\Big\} = 1.$$
One would like to be sure that the convergence to zero we consider above in Proposition 5.1 is at least exponential, that is: for any $\varepsilon$, there exists a positive $M$ such that for every $n$
$$Q_n(\varepsilon) \le e^{-nM}.$$
Under suitable assumptions we will show that this property is true (see Prop. 6.8).
It is quite surprising that in the case $\mu$ is an equilibrium state (see Def. 4.2) this result can be obtained using properties related to the Pressure (see paragraphs 7 and 8). We will return to this fact later, but first we need to explain some of the basic properties of Large Deviation Theory.
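For the fair-coin Bernoulli shift with $f = I_{(1)}$, the quantity $Q_n(\varepsilon)$ can be computed exactly as a binomial tail, and its exponential decay observed directly. An illustrative sketch, not from the text:

```python
import math

# For the fair-coin shift and f = I_{(1)}, Q_n(eps) is an exact binomial tail:
# Q_n(eps) = P(|S_n/n - 1/2| >= eps), with S_n ~ Binomial(n, 1/2).
def q_n(n, eps):
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - 0.5) >= eps:
            total += math.comb(n, k) / 2.0 ** n
    return total

eps = 0.1
for n in (50, 100, 200, 400):
    # decreases toward the deviation rate inf I(z) (about 0.020 here)
    print(n, -math.log(q_n(n, eps)) / n)
```

The printed rates decrease toward the value $\inf I(z)$ over $|z - 1/2| \ge 0.1$ that the deviation function of the next paragraphs predicts.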
The relevant question here is how fast, in logarithmic scale, the value $Q_n(\varepsilon)$ goes to zero, that is, how to find the value
$$\lim_{n\to\infty}\frac{1}{n}\log Q_n(\varepsilon).$$
More generally, for a Borel set $A \subset \mathbf{R}^m$ define
$$Q_n(A) = \mu\Big\{z \;\Big|\; \frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) \in A\Big\}.$$
In the same way as before one would like to know the value $\lim_{n\to\infty}\frac{1}{n}\log Q_n(A)$.
Remark. If the set $A$ is an open interval that contains the mean value $\int f(x)\,d\mu(x)$, then the above limit is zero because $\lim_{n\to\infty} Q_n(A) = 1$ (see Corollary 5.1).
First, we will try to give a general idea of how the solution of this problem is obtained, and then later we will show the proofs of the results we state now.
There exists a magic function $I(v)$, defined for $v \in \mathbf{R}^m$ (the set where the function $f$ takes its values), such that the above limit is determined by:
$$\lim_{n\to\infty}\frac{1}{n}\log Q_n(A) = -\inf_{v\in A} I(v)$$
when $A$ is an interval.
The function $I$ will be called the deviation function. The shape of $I$ is basically the shape of $|v - \int f(x)\,d\mu(x)|^2$, $v \in \mathbf{R}^m$, that is, $I(v)$ is a non-negative continuous function that attains a minimum equal to zero at the value $\int f(x)\,d\mu(x)$.
The properties we mentioned before are not always true for general $T$, $\mu$ and $f$, but under reasonable assumptions the above mentioned properties will be true. This will be explained very soon.
The natural question is: how can one obtain such a function $I$? The function $I(v)$, $v \in \mathbf{R}^m$, is obtained as the Legendre Transform (we will present the general definition later) of the free energy $c(t)$, $t \in \mathbf{R}^m$, to be defined below.
Definition 5.4. Suppose that for each $t \in \mathbf{R}^m$ and $n \in \mathbf{N}$ the value $c_n(t)$ is finite; then we define $c(t)$, the free energy, as the limit:
$$c(t) = \lim_{n\to\infty} c_n(t).$$
We say $g$ is strictly convex if for any $0 < \lambda < 1$ the above expression is true with $<$ instead of $\le$.
It is easy to see that a differentiable function $g(t)$ whose second derivative satisfies $g''(t) \ge 0$ for all $t \in \mathbf{R}$ is convex.
where $h$ and $k$ are respectively in $L^p(\mu)$ and $L^q(\mu)$, and $p$ and $q$ are such that $1/p + 1/q = 1$.
Consider $s, t \in \mathbf{R}^m$, $h(x) = e^{\langle\lambda s,\, f(x)+\cdots+f(T^{n-1}(x))\rangle}$, $k(x) = e^{\langle(1-\lambda)t,\, f(x)+\cdots+f(T^{n-1}(x))\rangle}$, $\lambda \in (0,1)$, and then define $p = 1/\lambda$ and $q = 1/(1-\lambda)$. Now, using the Hölder inequality:
$$\int e^{\langle\lambda s+(1-\lambda)t,\, f(x)+\cdots+f(T^{n-1}(x))\rangle}\,d\mu(x) \le \Big(\int e^{\langle s,\, f(x)+\cdots+f(T^{n-1}(x))\rangle}\,d\mu(x)\Big)^{\lambda}\Big(\int e^{\langle t,\, f(x)+\cdots+f(T^{n-1}(x))\rangle}\,d\mu(x)\Big)^{1-\lambda}.$$
Therefore, taking $\frac{1}{n}\log$ on each side of the above inequality, one obtains that $c_n$ (and hence $c$) is convex:
$$c_n(\lambda s + (1-\lambda)t) \le \lambda c_n(s) + (1-\lambda)c_n(t).$$
Definition 5.6. The deviation function $I(v)$, $v \in \mathbf{R}^m$, is by definition the Legendre transform of the function $c(t)$, $t \in \mathbf{R}^m$, that is
$$I(v) = \sup_{t\in\mathbf{R}^m}\{\langle t, v\rangle - c(t)\},$$
where $t_0$ is such that $c'(t_0) = v$ (see Proposition 6.1). Such a $t_0$ is well defined if $c$ is strictly convex and differentiable. In this case the deviation function $I(v)$ is also differentiable in $v$, as it is easy to see. If $c(t)$ is piecewise differentiable (with left and right derivatives), then $I(v)$ also has this property.
In more precise mathematical terms one should say that the deviation function
I(v) of c(t), t E Rm, takes values v in the dual of Rm. The dual of Rm is Rm
itself, and therefore, in the finite dimensional case (m finite) there is no problem to
define the Legendre transform in the way we did above. We will need to consider
Legendre transforms in infinite dimensional vector spaces soon. This will require
some small changes in the definition of Legendre Transform. Before that, we will
consider the main properties that are true in the finite dimensional case. The
key property is the differentiability of the free energy c(t). Assuming piecewise
differentiability (with the existence of right and left derivatives for c(t), t E R),
most results we will state below will be true (Theorem 6.2 and Proposition 6.8
require that the free energy be differentiable).
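Definition 5.6 can be tested numerically by taking the supremum over a fine grid of $t$ values. For the fair-coin example with $f = I_{(1)}$ one has $c(t) = \log((1+e^t)/2)$, whose Legendre transform is $I(v) = v\log v + (1-v)\log(1-v) + \log 2$; this closed form is a standard computation, used here only as a check.

```python
import math

def legendre(c, v, ts):
    """Numerical Legendre transform I(v) = sup_t { t*v - c(t) } over a grid ts."""
    return max(t * v - c(t) for t in ts)

# free energy of the fair-coin shift with f = indicator of symbol 1
c = lambda t: math.log((1.0 + math.exp(t)) / 2.0)
ts = [i * 0.001 - 20.0 for i in range(40001)]   # grid on [-20, 20]

for v in (0.2, 0.5, 0.8):
    exact = v * math.log(v) + (1 - v) * math.log(1 - v) + math.log(2)
    print(v, legendre(c, v, ts), exact)   # the two values agree closely
```

As expected, the minimum of $I$ is $0$ and is attained at $v = c'(0) = 1/2$, the mean value of $f$.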
Theorem 5.1. Assume the free energy $c(t)$, $t \in \mathbf{R}^m$, is well defined and also that $c$ is differentiable; then for an open parallelepiped $A$ contained in $\mathbf{R}^m$
$$\lim_{n\to\infty}\frac{1}{n}\log Q_n(A) = -\inf_{z\in A}\{I(z)\}.$$
The above result is true for much more general sets $A$ contained in $\mathbf{R}^m$, but we will state and prove the general result later.
The main results for the finite dimensional case will be proved for $m = 1$. The general case is not very much different from the case $m = 1$. The infinite dimensional case is however much more difficult than the finite dimensional case [7].
Definition 6.1. Given a convex piecewise differentiable map $g(y)$, $y \in \mathbf{R}^m$, the Legendre transform of $g$, denoted by $g^*(p)$, $p \in \mathbf{R}^m$, is by definition
$$g^*(p) = \sup_{y\in\mathbf{R}^m}\{\langle p, y\rangle - g(y)\}.$$
Proposition 6.1. Suppose $g(y)$ is defined for all $y \in \mathbf{R}$ and that the second derivative is continuous. If there exists $a > 0$ such that $g''(y) > a > 0$, $y \in \mathbf{R}$, then $g^*(p) = p y_0 - g(y_0)$ where $g'(y_0) = p$.
Proof. In the case there exists a value $y_0$ such that $g'(y_0) = p$, then clearly $g^*(p) = y_0 p - g(y_0)$. Therefore, all we have to show is that $g'$ is a global diffeomorphism from $\mathbf{R}$ to $\mathbf{R}$.
Note that for a positive $h$, $g'(x+h) - g'(x) = \int_x^{x+h} g''(y)\,dy > ah$. Therefore the map $g'$ is injective. The map $g'$ is open (that is, the image $g'(A)$ of each open set $A$ is open) because $g'(x+h) - g'(x) > ah$. The map $g'$ is closed (that is, the image $g'(K)$ of each closed set $K$ is closed), because it is continuous. We claim that $g'$ is surjective. This is easy to see: the image by $g'$ of the open and closed set $\mathbf{R}$ is an open and closed interval and therefore equal to $\mathbf{R}$. The conclusion is that $g'$ is bijective from $\mathbf{R}$ to itself. $\blacksquare$
Proposition 6.2. Suppose $g(y)$ defined on $y \in \mathbf{R}$ satisfies $g''(y) > 0$ for all $y \in \mathbf{R}$; then $g^*$ satisfies $(g^*)''(p) > 0$ for all $p \in \mathbf{R}$.
Proof. We will use the following notation: for each value $p$ denote by $y(p)$ the only value $y$ such that $g'(y(p)) = p$. As we saw in the last proposition, $g^*(p) = y(p)\,p - g(y(p))$. Taking derivatives with respect to $p$,
$$\frac{dg^*}{dp}(p) = \frac{dy}{dp}(p)\,p + y(p) - \frac{dg}{dy}(y(p))\frac{dy}{dp}(p) = \frac{dy}{dp}(p)\,p + y(p) - p\,\frac{dy}{dp}(p) = y(p).$$
Remark. We will assume that all maps $g$ to which we apply the Legendre transform satisfy the condition $g''(y) > a$, $y \in \mathbf{R}$, for a certain fixed positive value $a$. When we consider piecewise differentiable maps (with left and right derivatives), then we will also suppose that the left and right derivatives satisfy the same condition in $a$.
Proposition 6.3. Suppose $f(x)$ and $f^*(x)$ are strictly convex and differentiable for every $x$; then the Legendre Transform is an involution, that is, $f^{**} = f$.
Figure 1.
Figure 2.
The last result claims that if $f$ is conjugated to $g$, then $g$ is also conjugated to $f$.
This definition allows one to deal with the case $c(t)$, $t \in \mathbf{R}$, piecewise differentiable (it is differentiable up to a finite set of points $t_i$, $i \in \{1, 2, \ldots, n\}$). At the values $t$ where $c$ is differentiable there is a unique subdifferential $c'(t) = \delta c(t)$, but at the values $t_i$ where $c(t)$ has left and right derivatives (we assume in the definition that this property is true) respectively equal to $u_i$ and $v_i$, $\delta c(t_i)$ is the interval $[u_i, v_i]$.
The next result shows a duality between the subdifferentials of conjugated functions.
By definition, $y \in \delta g(x)$ means that
$$g(z) \ge g(x) + \langle y, z - x\rangle$$
for all $z \in \mathbf{R}$.
The last expression is equivalent to
$$\langle y, x\rangle - g(x) \ge \langle y, z\rangle - g(z)$$
for all $z \in \mathbf{R}$.
Therefore $y \in \delta g(x)$ is equivalent to saying that $x$ realizes the supremum of $\langle y, z\rangle - g(z)$.
We also obtain from the above reasoning that $y \in \delta g(x)$ is equivalent to $g^*(y) = \langle y, x\rangle - g(x)$, and thus equivalent to $\langle x, y\rangle = g^*(y) + g(x)$.
Applying the same result for $g = g^*$, and interchanging the roles of $x$ and $y$, that is, $x = y$ and $y = x$, we conclude that $x \in \delta g^*(y)$ is equivalent to $\langle y, x\rangle = g^{**}(x) + g^*(y)$. The last expression is equivalent to $\langle y, x\rangle = g(x) + g^*(y)$, because from the last proposition $g^{**} = g$.
Hence $y \in \delta g(x)$ is equivalent to $x \in \delta g^*(y)$. $\blacksquare$
Proof. First note that as $I = c^*$, then from the last proposition $v \in \delta c(0)$ if and only if $0 \in \delta I(v)$. In this case, $I(v) = \langle v, 0\rangle - c(0) = 0$.
The proof of the main Theorem 5.1 is done in two separate parts: the upper large deviation inequality and the lower large deviation inequality. First we will show the upper large deviation inequality. This inequality is true in a quite general context, even without the hypothesis of full differentiability of $c(t)$ [7]. In the second inequality we will use differentiability of the free energy.
$$\limsup_{n\to\infty}\frac{1}{n}\log\mu\Big\{x \;\Big|\; \frac{1}{n}\sum_{j=0}^{n-1} f(T^j(x)) \in K\Big\} = \limsup_{n\to\infty}\frac{1}{n}\log Q_n(K) \le -\inf_{z\in K} I(z) \quad (5)$$
$$\mu\{x \mid g(x) \ge d\} \le \frac{\int h(g(x))\,d\mu(x)}{h(d)}.$$
$$\limsup_{n\to\infty}\frac{1}{n}\log Q_n([a,\infty)) \le -\sup_{t\ge0}\{ta - c(t)\}. \quad (6)$$
Proof of the Claim. $c(t)$ is convex, hence $u_0$, the left derivative of $c$ at $0$, satisfies $\frac{c(t)}{t} \le u_0$, $t < 0$. Therefore,
$$ta - c(t) = t\Big(a - \frac{c(t)}{t}\Big) \le t(a - u_0).$$
Before we return to the proof of the Theorem, we will first need to prove another claim.
Now, from equation (6) and using the two claims stated above, we obtain the desired conclusion
$$\limsup_{n\to\infty}\frac{1}{n}\log Q_n(K) \le -\inf_{z\in K} I(z) \quad (7)$$
when $K = [a,\infty)$ and $a$ larger than $v_0$, the right derivative of $c$ at $0$.
The proof for intervals $K$ of the form $(-\infty, a]$, $a < u_0$, is similar.
Now we will prove the claim of the theorem for a general closed set $K$.
First note that if $K$ intersects the set $\delta c(0) = [u_0, v_0]$, then the claim is trivial because $\inf_{z\in K} I(z) = 0$ (remember that $v \in \delta c(0)$ if and only if $I(v) = 0$, by Proposition 6.5).
Hence, we will suppose that $K$ does not intersect the set $[u_0, v_0]$.
Consider $a, b$ two real values such that $(-\infty, a]\cup[b, \infty)$ is the smallest possible set such that $K \subset (-\infty, a]\cup[b, \infty)$. As the set $K$ is closed, then ($a = -\infty$ or $a \in K$) and ($b = \infty$ or $b \in K$). Suppose for simplification of the notation that $a, b \in K$ (the other case can be easily handled by the reader). From the first part we know that $\inf_{z\in(-\infty,a]} I(z) = I(a)$ and $\inf_{z\in[b,\infty)} I(z) = I(b)$. Therefore $\inf_{z\in K} I(z) = \min\{I(a), I(b)\}$, because $a, b \in K$.
Finally from the first part (7):
$$\limsup_{n\to\infty}\frac{1}{n}\log Q_n(K) \le \limsup_{n\to\infty}\frac{1}{n}\log\big(Q_n((-\infty,a]) + Q_n([b,\infty))\big) \le \max\{-I(a), -I(b)\} = -\min\{I(a), I(b)\} = -\inf_{z\in K} I(z).$$
$$\mu\Big\{z \;\Big|\; \frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) \in K\Big\} = \mu\Big\{z \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) - c'(0)\Big| \ge \varepsilon\Big\} \le e^{-nM}. \quad (8)$$
From the last inequality, $\mu$-almost every point $z$ has the property that its temporal mean converges to $c'(0)$, and from the Theorem of Birkhoff, this value $c'(0)$ has to be the spatial mean $\int f(x)\,d\mu(x)$. Hence we obtain a contradiction and the proposition is proved. $\blacksquare$
Definition 6.4. We say that the $\mu$-integrable function $f$ from $X$ to $\mathbf{R}$ has the exponential convergence property if for any $\varepsilon > 0$ there exists $M > 0$ such that:
$$\mu\Big\{y \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(y)) - \int f(x)\,d\mu(x)\Big| \ge \varepsilon\Big\} \le e^{-nM}.$$
Proof. As we have just shown that $c'(0) = \int f(x)\,d\mu(x)$ and $v = c'(0)$ is the only value with $I(v) = 0$, then given $\varepsilon$, there exists
$$M = \inf_{z\notin(\int f(x)\,d\mu(x)-\varepsilon,\;\int f(x)\,d\mu(x)+\varepsilon)} I(z) > 0,$$
such that
$$\mu\Big\{y \;\Big|\; \Big|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(y)) - \int f(x)\,d\mu(x)\Big| \ge \varepsilon\Big\} \le e^{-nM}. \;\blacksquare$$
We will need the very well known definition of distribution in order to simplify the notation in the proof of the next theorem: the distribution $\mu_f$ of a measurable function $f$ is the measure on $\mathbf{R}$ such that for any continuous function $g : \mathbf{R} \to \mathbf{R}$
$$\int g\circ f\,d\mu(z) = \int g(x)\,d\mu_f(x).$$
Such a measure $\mu_f$ always exists (using the notation of the first chapter, $f : X \to Y$ (or $f : X \to \mathbf{R}$); then $\mu_f$ is the pull-back of the measure $\mu$ by the map $f$ as introduced in Definition 2.8).
$$\mu\Big\{z \;\Big|\; \Big|\frac{X_n}{n}(z) - c'(0)\Big| \ge \varepsilon\Big\} \le e^{-nM} \quad (9)$$
for $n$ large enough.
The value $M$ is obtained in the following way:
$$M = \inf_{l\in(-\infty,\,c'(0)-\varepsilon)\cup(c'(0)+\varepsilon,\,\infty)} I(l),$$
where for each value $l$, $I(l) = \sup_{s\in\mathbf{R}}\{sl - c(s)\}$ is the Legendre transform of $c(s)$.
Theorem 6.2. (Lower large deviation inequality) Suppose that the free energy $c(t)$ is differentiable for every $t \in \mathbf{R}$; then for any open set $A$:
$$\liminf_{n\to\infty}\frac{1}{n}\log Q_n(A) \ge -\inf_{z\in A} I(z).$$
Proof. We will assume that for any real value $z \in \mathbf{R}$ there exists a value $t$ such that $c'(t) = z$. If we suppose that $c''(t) > \alpha > 0$, then this assumption is satisfied, as we saw in Proposition 6.1.
The above hypothesis is not necessary for the proof of the theorem, but in order to avoid too many technicalities we will prove the result under this assumption.
Consider $z$ in the open set $A$ and $r$ such that $B(z,r) = (z-r, z+r)$ is contained in $A$. Denote by $t$ a value such that $c'(t) = z$ (there exists such a $t$ by hypothesis).
Now we will need to use the concept of distribution of a $\mu$-measurable function that we introduced before.
We will denote by $\mu_n$ the distribution on $\mathbf{R}$ of the function $\frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z))$, that is,
$$\int_{(a,b)} d\mu_n(x) = \mu_n((a,b)) = \mu\Big\{z \;\Big|\; \frac{1}{n}\sum_{j=0}^{n-1} f(T^j(z)) \in (a,b)\Big\} = Q_n((a,b)).$$
Denote $Z_n(t) = \int e^{tnx}\,d\mu_n(x) = e^{nc_n(t)}$ (see Definition 5.3 and remember the practical rule mentioned in the remark after Definition 6.5 of distribution). The reader familiar with Statistical Mechanics will recognize the Partition function in the definition we introduced.
For each value $t \in \mathbf{R}$ and $n \in \mathbf{N}$, we will now denote by $\mu_n^t$ the probability on $\mathbf{R}$ given by
$$d\mu_n^t(x) = \frac{e^{tnx}}{Z_n(t)}\,d\mu_n(x), \quad (11)$$
and therefore the term $Z_n(t) = e^{nc_n(t)}$ appears only as a normalization term in the definition of the probability $\mu_n^t$ (it does not depend on $x$).
This one-parameter family of probabilities $\mu_n^t$, $t \in \mathbf{R}$, will play a very important role in the proof of the theorem.
One should think of the measure $\mu_n^t$ in the following way: for $t = 0$ the measure $\mu_n^0 = \mu_n$. From the Theorem of Birkhoff, the measure $\mu_n = \mu_n^0$ focalizes on (or has mean value) $v = \int f(x)\,d\mu(x) = c'(0)$, that is, $\lim_{n\to\infty}\mu_n(B(v,r)) = 1$ for any $r > 0$.
For the given value $z \in A$, we choose $t$ such that $c'(t) = z$, and then the measure $\mu_n^t$ will focalize on (or have mean value) $z = c'(t)$, as will be shown:
Proof of the Claim. For the value $t$ and $n \in \mathbf{N}$, let $X_n$ be a measurable function such that $\frac{X_n}{n}$ has distribution $\mu_n^t$ (such measurable functions always exist by trivial arguments). Now we will use the last theorem and the fact that $z = c_t'(0)$, where we define the new free energy
$$c_t(s) = \lim_{n\to\infty}\frac{1}{n}\log\int e^{snx}\,d\mu_n^t(x) = \lim_{n\to\infty}\frac{1}{n}\log\int\frac{e^{nx(s+t)}}{e^{nc_n(t)}}\,d\mu_n(x).$$
Using the fact that we chose $t$ in such a manner that $c_t'(0) = c'(t) = z$, we conclude that
$$\lim_{n\to\infty}\mu_n^t(B(z,r)) = 1.$$
Note that introducing the parameter $t$ in our problem (defining the one-parameter family of measures $\mu_n^t$, $n \in \mathbf{N}$) has the effect of translating by $t$ the free energy $c(s)$ (in the parameter $s$), that is, $c_t(s) = c(s+t) - c(t)$.
Hence
$$\frac{1}{n}\log Q_n(B(z,r)) \ge c_n(t) - tz - |t|r + \frac{1}{n}\log\mu_n^t(B(z,r)).$$
From the claim we know that the last term on the right hand side of the above expression goes to zero. Hence, as $c(t) - tz = -I(z)$, because $c'(t) = z$, and $r$ can be taken arbitrarily small, then
$$\liminf_{n\to\infty}\frac{1}{n}\log Q_n(A) \ge -\inf_{z\in A} I(z). \;\blacksquare$$
Theorem 6.3. Suppose $c(t)$ is differentiable in $t$; then for a given interval $C$ (open or closed)
$$\lim_{n\to\infty}\frac{1}{n}\log Q_n(C) = -\inf_{z\in C} I(z).$$
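The exponential tilting used in the proof of Theorem 6.2 can be observed concretely for the fair-coin case, where $\mu_n$ is the distribution of $S_n/n$ with $S_n \sim \mathrm{Binomial}(n, 1/2)$: reweighting by $e^{tnx}/Z_n(t)$ moves the mean of the tilted measure $\mu_n^t$ to $c'(t) = e^t/(1+e^t)$. An illustrative sketch:

```python
import math

# Exponential tilting of the fair-coin Birkhoff sums S_n = x_0 + ... + x_{n-1}:
# mu_n gives weight C(n,k)/2^n to x = k/n, and mu_n^t reweights each atom by
# e^{t n x}/Z_n(t). The tilted mean moves to c'(t) = e^t/(1 + e^t).
def tilted_mean(n, t):
    logw = [math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            - n * math.log(2) + t * k for k in range(n + 1)]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]      # stable normalization
    z = sum(w)
    return sum((k / n) * wk for k, wk in zip(range(n + 1), w)) / z

t = 1.0
print(tilted_mean(2000, t))   # close to e/(1 + e)
```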
Now we will relate the results we obtained above with the Pressure of Thermodynamic Formalism.
In this chapter we will present several results related to the pressure of expanding maps. For such a class of maps the Ruelle Operator will produce a complete solution for the problem of existence and uniqueness of equilibrium states. Theorem 7.2 will explain how to obtain the equilibrium states in a constructive way. We point out that the Bernoulli shift is a very important case where the results we will present can be applied. In this section we will consider only maps that have the property that for each point $z \in X$, $\#\{T^{-1}(z)\}$ is equal to a fixed value $d > 1$, independent of $z$. Therefore the results will apply directly to the one-sided shift but not to the two-sided shift (see section 2 for definitions). The results presented here can be extended to the two-sided shift, but this requires a certain proposition that we will not present here (see [15]).
Recall the definition:
Definition 7.1. A map $T$ from a compact metric space $(X,d)$ to itself is expanding if there exists $\lambda > 1$ such that, for any $x \in X$, there exists $\varepsilon > 0$ such that $\forall y \in B(x,\varepsilon)$, $d(T(x), T(y)) > \lambda\,d(x,y)$.
Example. Consider $a_0 = 0 < a_1 < a_2 < a_3 < \cdots < a_{n-1} < a_n = 1$ a sequence of distinct numbers on the interval $[0,1]$. Suppose $T$ is a piecewise differentiable ($C^\infty$) map from $[0,1]$ to itself such that $|T'(x)| > \lambda > 1$ for all $x$ different from $a_0, a_1, \ldots, a_n$. Suppose also that for each $i \in \{0, 1, 2, \ldots, n-1\}$, $T([a_i, a_{i+1}]) = [0,1]$. We will also suppose that $T$ has a $C^\infty$ extension to the values $a_i$, $i \in \{0, 1, 2, \ldots, n\}$, with the same properties. This map is expanding and is one of the possible kinds of maps where the results we will present in this section can apply. In Figure 3 we show the graph of a map $T$ where all the above properties hold.
Notation. We will use the following notation: for $\phi \in C(X)$ and $\nu \in M(X)$ (or $S(X)$) we denote the value $\int\phi(x)\,d\nu(x)$ by $\langle\phi, \nu\rangle$.
Definition 7.2. For a given operator $\mathcal{L}$ from $C(X)$ to itself, the dual of $\mathcal{L}$ is the operator $\mathcal{L}^*$ defined from the dual space $C(X)^* = S(X)$ (the space of signed measures) to itself in the following way: $\mathcal{L}^*$ is the only operator from $S(X)$ to itself such that for any $\phi \in C(X)$ and $\nu \in S(X)$
$$\langle\mathcal{L}\phi, \nu\rangle = \langle\phi, \mathcal{L}^*\nu\rangle.$$
Figure 3.
Remark. Remember that $P(0) = \log d = h(T)$ and therefore $\mu$ is the equilibrium state for $\psi = 0$ (see Definition 4.3). The maximal measure for the one-sided shift in $d$ symbols can be obtained also as the probability $P(1/d, 1/d, \ldots, 1/d)$ (see Definition 2.7 and the remark at the end of section 4).
Definition 7.4. The above defined measure $\mu$ is called the maximal measure.
$$\mathcal{L}_\psi\phi(x) = \sum_{y\in T^{-1}x} e^{\psi(y)}\phi(y)$$
for any $\phi \in C(X)$ and $x \in X$. We call this operator the Ruelle-Perron-Frobenius Operator (Ruelle Operator for short).
A function $\psi$ is called Hölder-continuous if there exist $C > 0$ and $\gamma > 0$ such that $\forall x, y \in X$, $|\psi(x) - \psi(y)| \le C\,d(x,y)^\gamma$. We will require in the next theorem that the function $\psi$ be Hölder, and without this hypothesis about $\psi$ the results stated in the theorem will not necessarily be true (see [10] for a counter-example).
Now we will state a fundamental theorem in Thermodynamic Formalism.
Theorem 7.2. (see [3] for a proof) - Let $T : X \to X$ be an expanding map and $\psi : X \to \mathbf{R}$ be Hölder-continuous. Then there exist $h : X \to \mathbf{R}$ Hölder-continuous and strictly positive, $\nu \in M(X)$ and $\lambda > 0$ such that:
(1) $\int h\,d\nu = 1$;
(2) $\mathcal{L}_\psi h = \lambda h$;
(3) $\mathcal{L}_\psi^*\nu = \lambda\nu$;
(4) $\|\lambda^{-n}\mathcal{L}_\psi^n\phi - h\int\phi\,d\nu\|_{C(X)} \to 0$ for any $\phi \in C(X)$;
(5) $h$ is the unique positive eigenfunction of $\mathcal{L}_\psi$, except for multiplication by scalars;
(6) The probability $\mu_\psi = h\nu$ is $T$-invariant (that is, $\mu_\psi \in M(T)$), ergodic, has positive entropy, is positive on open sets and satisfies
$$\log\lambda = h(\mu_\psi) + \int\psi\,d\mu_\psi.$$
In order to explain how one can obtain the equilibrium states $\mu_\psi$ associated to $\psi$ in a more appropriate way, we will need to consider a series of remarks.
Remark. It follows from (6) and (7) of Theorem 7.2 that $P(\psi) = \log\lambda$ and that $\mu_\psi$ is the unique equilibrium state for $\psi$. Therefore the pressure is equal to $\log\lambda$, where $\lambda$ is an eigenvalue of the Ruelle Operator. In fact, it can be shown that $\lambda$ is the largest eigenvalue of the operator $\mathcal{L}_\psi$ [3] [15]. The remainder of the spectrum of $\mathcal{L}_\psi$ is contained in a disc (in $\mathbf{C}$) of radius strictly smaller than $\lambda$. The multiplicity of the eigenvalue $\lambda$ is one.
Note that $\mu_\psi \in M(T)$, but $\nu$ is not necessarily in this set.
Remark. The value $P(\psi)$ can be computed in the following way: fix a certain point $x_0 \in X$ and consider $\phi$ constant and equal to $1$ in (4) of Theorem 7.2. As $h$ is bounded (being continuous on a compact space), then from (4) of Theorem 7.2
$$\lim_{n\to\infty}\frac{1}{n}\log\frac{\mathcal{L}_\psi^n 1(x_0)}{\lambda^n} = 0,$$
that is,
$$\lim_{n\to\infty}\frac{1}{n}\log\mathcal{L}_\psi^n 1(x_0) = \log\lambda = P(\psi) \quad (14)$$
or
$$\lim_{n\to\infty}\frac{1}{n}\log\sum_{T^n(y)=x_0} e^{\psi(y)+\psi(T(y))+\cdots+\psi(T^{n-1}(y))} = P(\psi). \quad (15)$$
In this way we can obtain $\nu$ by means of the limit of a sequence of finite sums of Dirac measures on the preimages of the point $x_0$. In the case of the maximal measure ($\psi = 0$, $P(0) = \log d$, $\lambda = d$, $h = 1$, $\nu = \mu = \mu_\psi$), the weights in the points $x$ such that $T^n(x) = x_0$ are evenly distributed and equal to $d^{-n}$. For a general Hölder continuous $\psi$, it is necessary to distribute the weights in a different form. There is a more appropriate way to obtain directly the equilibrium measure $\mu_\psi$, which will be presented later.
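When the potential $\psi$ depends only on the first two symbols of a one-sided shift, the Ruelle operator acts on functions of one symbol as a positive matrix, and $\lambda$, $h$ and the pressure $\log\lambda$ of Theorem 7.2 can be approximated by power iteration. A sketch; the potential values below are arbitrary illustrative choices:

```python
import math

# Ruelle operator for the one-sided shift on d symbols when psi depends only
# on the first two coordinates: (L phi)(j) = sum_i e^{psi(i,j)} phi(i).
# Power iteration gives the leading eigenvalue lambda and eigenfunction h of
# Theorem 7.2; the pressure is then P(psi) = log(lambda).
def ruelle_leading(psi, iters=200):
    d = len(psi)
    h = [1.0] * d
    lam = 1.0
    for _ in range(iters):
        new = [sum(math.exp(psi[i][j]) * h[i] for i in range(d)) for j in range(d)]
        lam = max(new)
        h = [x / lam for x in new]       # renormalize the eigenfunction
    return lam, h

psi = [[0.2, -0.5], [0.1, 0.3]]          # hypothetical two-symbol potential
lam, h = ruelle_leading(psi)
print(math.log(lam))                      # the pressure P(psi)
```

Since the matrix is strictly positive, the Perron-Frobenius theorem guarantees that the iteration converges to the unique positive eigenpair, mirroring items (2) and (5) of Theorem 7.2.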
Let us see now how Theorem 7.1 follows from Theorem 7.2: take $\psi = 0$ and let $\lambda$, $h$ and $\nu$ be given by Theorem 7.2.
We say that $J$ is the Jacobian of the invariant measure $\mu \in M(X)$ if, for any Borel set $A$ on which $T$ is injective,
$$\mu(T(A)) = \int_A J\,d\mu.$$
Theorem 7.3. Suppose that $J$ (the Jacobian of an invariant measure $\mu$) is Hölder-continuous and strictly positive. Then
(a) $h(\mu) = \int\log J\,d\mu$;
(b) $\mu$ is ergodic.
$$\sum_{T(x)=y}\frac{1}{|J(x)|} = 1 \quad (18)$$
Proof. From (18) and condition (2) of Theorem 7.2, $h = 1$ and $\lambda = 1$ in the last theorem. Hence $P(-\log J) = 0$. $\blacksquare$
Theorem 7.5. Suppose $\psi$ is Hölder continuous, $\mu_\psi$ is the equilibrium state associated with $\psi$ and $h$ is the eigenfunction associated with $\lambda$ in Theorem 7.2; then the Jacobian $J_\psi$ of the probability $\mu_\psi$ is given by:
$$J_\psi(x) = \lambda\,e^{-\psi(x)}\,\frac{h\circ T(x)}{h(x)}. \quad (19)$$
That is, $\psi$ and $-\log J_\psi$ satisfy the homology criterion (Proposition 4.1) and therefore they determine the same equilibrium state, that is, $\mu_\psi = \mu_{-\log J_\psi}$. Remember that $P(-\log J_\psi) = \log 1 = 0$.
It follows from the last claims and from (4) in Theorem 7.2 that the equilibrium state $\mu_\psi$ can be obtained in the following way:
$$\mu_\psi = \lim_{n\to\infty}\sum_{T^n(y)=x} e^{-\log J_\psi(y)-\log J_\psi(T(y))-\cdots-\log J_\psi(T^{n-1}(y))}\,\delta_y. \quad (21)$$
Hence from $\lambda$ and $h$ one can obtain $\mu_\psi$ as the limit of a sum of weights placed in the preimages of a point $x \in X$ ($J_\psi$ is given by (19)).
Example. We will now consider the example mentioned in section 4, just after Definition 4.1. In fact we can analyze a more general example where we will be able to exhibit the equilibrium probability explicitly. Consider $p(+,+)$, $p(+,-)$, $p(-,+)$ and $p(-,-)$ non-negative numbers such that $p(+,+)+p(+,-)=1$ and $p(-,+)+p(-,-)=1$. These numbers $p(i,j)$, $i,j\in\{+,-\}$, express the probability of having spin $j$ at the right of spin $i$ in the lattice $\mathbb{Z}$.
Consider the matrix
$$A=\begin{pmatrix}p(+,+) & p(-,+)\\ p(+,-) & p(-,-)\end{pmatrix}$$
It can be shown [15] that this matrix $A$ has the value 1 as its largest eigenvalue (this result is known in the usual textbooks on Matrix Theory as the Perron-Frobenius Theorem) and we will denote by $(p(+),p(-))$ the normalized eigenvector associated to the eigenvalue 1, that is:
$$A\begin{pmatrix}p(+)\\ p(-)\end{pmatrix}=\begin{pmatrix}p(+)\\ p(-)\end{pmatrix},\qquad p(+)+p(-)=1.$$
Now we can define a measure $\mu$ on cylinders (and then extend it to the more general class of Borel sets) by:
$$\mu(\overline{i_0\, i_1\,\ldots\, i_n})=p(i_0)\,p(i_0,i_1)\,p(i_1,i_2)\cdots p(i_{n-1},i_n),$$
$n\in\mathbb{N}$, $i_0,i_1,\ldots,i_n\in\{+,-\}$. It is quite easy to see that, considering in Theorem 7.2 the potential $\psi$ constant on each one of the four cylinders, given by:
a) $\psi(z)=\log p(+,+)$ $\forall z\in(+,+)$,
b) $\psi(z)=\log p(+,-)$ $\forall z\in(+,-)$,
c) $\psi(z)=\log p(-,+)$ $\forall z\in(-,+)$ and
d) $\psi(z)=\log p(-,-)$ $\forall z\in(-,-)$,
then the eigenfunction $h$ is constant equal to 1 and $\lambda$ equals 1. It is not difficult to see that the measure $\mu$ given above satisfies equation (3) in Theorem 7.2 (see also Definition 7.2), that is, $\mathcal{L}_\psi^*\mu=\mu$ (first show that $\mathcal{L}_\psi^*\mu(B)=\mu(B)$ for the cylinders $B$ depending on the first two coordinates, then for those depending on three coordinates, and so on). Therefore $\mu$ is the equilibrium state for the $\psi$ given above.
This example shows that the Ruelle Operator is in fact an extension of the
Perron-Frobenius Operator of Matrix Theory (finite dimension) to the infinite
dimensional space of functions.
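A small numerical sketch of the finite-dimensional step of this example (the transition probabilities below are hypothetical, chosen only to illustrate the computation): we find the normalized eigenvector $(p(+),p(-))$ of $A$ for the eigenvalue 1 and check the invariance of the measure $\mu$ on cylinders.

```python
def stationary(trans):
    """Normalized eigenvector (p(+), p(-)) of A for the eigenvalue 1, where
    A = ((p(+,+), p(-,+)), (p(+,-), p(-,-))).  For a 2x2 matrix this is
    explicit: p(+) = p(-,+) / (p(+,-) + p(-,+))."""
    plus = trans[('-', '+')] / (trans[('+', '-')] + trans[('-', '+')])
    return {'+': plus, '-': 1.0 - plus}

def cylinder_measure(spins, p, trans):
    """mu(i0 i1 ... in) = p(i0) * p(i0,i1) * p(i1,i2) * ... * p(i_{n-1},i_n)."""
    m = p[spins[0]]
    for a, b in zip(spins, spins[1:]):
        m *= trans[(a, b)]
    return m

# Hypothetical transition probabilities with p(i,+) + p(i,-) = 1 for each i.
trans = {('+', '+'): 0.7, ('+', '-'): 0.3, ('-', '+'): 0.6, ('-', '-'): 0.4}
p = stationary(trans)

# Invariance on cylinders: summing the length-2 cylinders (i, +) over the
# first spin i recovers the length-1 cylinder (+).
total = sum(cylinder_measure((i, '+'), p, trans) for i in '+-')
print(p['+'], total)   # both 2/3 for these numbers
```

The eigenvector computation here is exactly the finite-dimensional Perron-Frobenius problem that the Ruelle operator generalizes.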
The Jacobian of the measure $\mu$ is piecewise constant: it is constant on each cylinder (see Theorem 7.5).
The example described above includes the one we mentioned before in section 4.
(a) $\nu\in M(T)$
and
(b) $\forall\phi\in C(X)$, $\langle\phi,\nu\rangle\le P(\phi)$. (23)
(b) $\to$ (a): Suppose $\nu$ satisfies (b); then we will show first that $\nu$ is a measure, that is, for any non-negative continuous function $\phi$, $\langle\phi,\nu\rangle\ge 0$.
Consider $\phi\in C(X)$ such that $\phi(x)\ge 0$, $\forall x\in X$; then, given $n\in\mathbb{N}$ and $\delta>0$, by assumption (b), by the definition of pressure, and from the fact that $\phi$ is non-negative, one concludes that $\langle\phi,\nu\rangle\ge 0$.
For $n\in\mathbb{Z}$ (taken as a constant function), $n\,\nu(X)=\int n\,d\nu(x)\le P(n)=h(T)+n$; therefore $\nu(X)\le\frac{h(T)}{n}+1$ if $n>0$, and $\nu(X)\ge\frac{h(T)}{n}+1$ if $n<0$.
Now letting $n$ go to $\infty$ in the first expression and $n$ to $-\infty$ in the second, we conclude that $\nu(X)=1$.
This means that $\nu$ is a probability. Finally we will show that $\nu\in M(T)$, that is, we will show that for any $\phi\in C(X)$, $\int\phi(x)\,d\nu(x)=\int\phi(T(x))\,d\nu(x)$. In other words, we have to show that $\langle\phi\circ T-\phi,\nu\rangle=0$.
For a given $n\in\mathbb{Z}$, $n\langle\phi\circ T-\phi,\nu\rangle\le P(n(\phi\circ T-\phi))$ by assumption (b). Now, using the homology criterion, the last term is $P(0)=h(T)$. Hence $\langle\phi\circ T-\phi,\nu\rangle\le\frac{h(T)}{n}$ if $n>0$, and $\langle\phi\circ T-\phi,\nu\rangle\ge\frac{h(T)}{n}$ if $n<0$.
Now letting $n$ go to $\infty$ in the first expression and $n$ to $-\infty$ in the last expression, we conclude that $\langle\phi\circ T-\phi,\nu\rangle=0$. Thus the theorem is proved. ∎
The pressure $P(\psi)$ is a continuous function of $\psi$ (see [18]); one could ask if the entropy $h(\nu)$ is continuous in $\nu\in M(T)$, that is, whether $h(\nu_n)\to h(\nu)$ whenever $\nu_n\to\nu$, $\nu_n\in M(T)$.
Theorem 7.8. (see [18] for a proof) For expanding systems the entropy is upper semicontinuous at any probability $\nu\in M(T)$.
Remark. From the two results presented above one can conclude that a measure $\nu$ is invariant for an expanding map $T$ if and only if $\langle\phi,\nu\rangle\le P(\phi)$ for all $\phi\in C(X)$, so that
$$P(\phi)=\sup_{\nu\in M(T)}\Big\{h(\nu)+\int\phi\,d\nu\Big\}.$$
The disturbing point in the above expression is that we are taking the supremum over the smaller set $M(T)$ and not over the dual of $C(X)$, that is, over the set $S(X)$. If we define the entropy of a signed measure $\eta$ as in (24), then $h(\eta)<0$ for $\eta\in S(X)-M(T)$ (see Theorem 7.6). Hence we can finally state that
$$P(\phi)=\sup_{\eta\in S(X)}\Big\{h(\eta)+\int\phi\,d\eta\Big\},$$
because the entropy of non-invariant measures will not interfere in the supremum, and the analogy with the finite dimensional case is complete.
For results about large deviation properties in this setting (level-2 large deviations) we refer the reader to [8]. In the next paragraph we will consider large deviation properties, but in another setting (level-1 large deviations). The terminology of level-1 and level-2 is explained in more detail in [7]. The reference [7] is an excellent source of results on large deviations, but does not consider the entropy (Kolmogorov-Shannon entropy) and pressure as we are doing here.
We will repeat definition 6.3, but now for the infinite dimensional case.
Definition 7.8. For a given convex function $K$ from $C(X)$ to $\mathbb{R}$, we call a signed measure $\mu\in S(X)$ (the dual of $C(X)$) a subdifferential of $K$ at the value $\eta$, and write $\mu=\delta K(\eta)$, if the following is true: for any $\psi\in C(X)$,
$$K(\psi)\ge K(\eta)+\int(\psi(x)-\eta(x))\,d\mu(x).$$
Notation. As the pressure $P(\psi)$ is convex in $\psi$, we can consider the above definition for the pressure, and we will denote the subset of signed measures $\mu$ that are subdifferentials of $P$ at the value $\eta$ by $t(\eta)$. In other words,
$$t(\eta)=\delta P(\eta)=\Big\{\mu\in S(X)\ \Big|\ P(\psi)\ge P(\eta)+\int(\psi(x)-\eta(x))\,d\mu(x),\ \forall\psi\in C(X)\Big\}. \qquad (26)$$
Remember that for a continuous function $\psi$, the set of probabilities $\mu$ such that $P(\psi)=h(\mu)+\int\psi(x)\,d\mu(x)$ is called the set of equilibrium measures. The main theorem stated at the beginning of this section is that for an expanding map $T$ and a Hölder continuous function $\psi$, equilibrium states exist and are unique.
Theorem 7.9. (see [18]) Suppose $T$ is an expanding map such that $h(T)$ is finite. If $\psi$ is a continuous function on $X$, then $t(\psi)$ is the set of equilibrium states for $\psi$. The set $t(\psi)$ is not empty.
The next result improves the claim that for expanding systems the subdifferential of the pressure $P$ at $\psi$ is $\mu_\psi$ (that is, $\delta P(\psi)=\mu_\psi$).
and therefore $f+tg$ and $-\log J+tg$ are homologous. Hence $\mu_{f+tg}=\mu_{-\log J+tg}$ and furthermore $P(f+tg)=P(-\log J+tg)+P(f)$. We now take the derivative with respect to $t$ on both sides of the last expression. Recall (see (21)) that
$$\int\phi\,d\mu_{-\log J}=\lim_{n\to\infty}\sum_{T^n(y)=x_0}\phi(y)\,e^{-\sum_{j=0}^{n-1}\log J(T^j(y))}. \qquad (27)$$
One of the Remarks after Theorem 7.2 states that (see (15))
$$P(-\log J+tg)=\lim_{n\to\infty}\frac{1}{n}\log\sum_{T^n(y)=x_0}e^{\sum_{j=0}^{n-1}(-\log J+tg)(T^j(y))}, \qquad (28)$$
hence, differentiating term by term (the fact that this is possible is a crucial step that will not be proved here [15] [17]), one obtains:
$$\frac{d}{dt}P(-\log J+tg)=\lim_{n\to\infty}\frac{1}{n}\,\frac{\sum_{T^n(y)=x_0}\sum_{j=0}^{n-1}g(T^j(y))\,e^{\sum_{j=0}^{n-1}(-\log J+tg)(T^j(y))}}{\sum_{T^n(y)=x_0}e^{\sum_{j=0}^{n-1}(-\log J+tg)(T^j(y))}}.$$
Now, taking $t=0$ in the last expression, we obtain
$$\frac{d}{dt}P(-\log J+tg)\Big|_{t=0}=\lim_{n\to\infty}\frac{1}{n}\,\frac{\sum_{T^n(y)=x_0}\sum_{j=0}^{n-1}g(T^j(y))\,e^{-\sum_{j=0}^{n-1}\log J(T^j(y))}}{\sum_{T^n(y)=x_0}e^{-\sum_{j=0}^{n-1}\log J(T^j(y))}}. \qquad (29)$$
Claim. $\displaystyle\sum_{T^n(y)=x_0}e^{-\sum_{j=0}^{n-1}\log J(T^j(y))}=1$, $\forall n\in\mathbb{N}$, $\forall x_0\in X$.
Proof of the Claim. The proof is by induction. The claim is true for $n=1$ by (18). Suppose the claim is true for $n$; we will prove that it is true for $n+1$. In fact,
$$\sum_{T^{n+1}(y)=x_0}e^{-\sum_{j=0}^{n}\log J(T^j(y))}=\sum_{T(z)=x_0}e^{-\log J(z)}\sum_{T^n(y)=z}e^{-\sum_{j=0}^{n-1}\log J(T^j(y))}=\sum_{T(z)=x_0}\frac{1}{J(z)}=1.$$
In the last two equalities we used the fact that the claim is true for $n$ and for $1$. This ends the proof of the claim.
Now we return to the proof of the theorem. It follows from the claim and from (29), (27), (28) (taking $\psi=g\circ T^j$) that:
$$\frac{d}{dt}P(-\log J+tg)\Big|_{t=0}=\lim_{n\to\infty}\frac{1}{n}\sum_{T^n(y)=x}\sum_{j=0}^{n-1}g(T^j(y))\,e^{-\sum_{i=0}^{n-1}\log J(T^i(y))} \qquad (30)$$
$$=\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mathcal{L}_{-\log J}^n(g\circ T^j)(x_0).$$
In this paragraph we will show a result relating large deviations with pressure. It is possible to obtain very precise results about the deviation function for Hölder functions and the maximal measure of an expanding map.
We know that the maximal entropy measure (see Theorem 7.1) $\mu$ can be obtained as
$$\mu=\lim_{n\to\infty}d^{-n}\sum_{i=1}^{d^n}\delta_{z_i^n(x_0)},$$
where $z_i^n(x_0)$, $i=1,\ldots,d^n$, denote the preimages of $x_0$ under $T^n$.
Notation. In this section we will denote by $\mu$ the maximal entropy measure (see Theorem 7.1).
From [3] (at this point the hypotheses of expansivity and Hölder continuity are essential), there exist constants $C_1$, $c_1$ such that the relevant estimate holds for $n$ large enough. Now, from the expression for the pressure that appears as a Remark after Theorem 7.2 (see expression (15)), the claim of the theorem is proved. ∎
Theorem 8.2. The free energy $c(t)$ for a Hölder continuous function $g$ and the maximal measure $\mu$ satisfies
$$c(t)=P(tg)-\log d,$$
from results of Thermodynamic Formalism. The next theorem answers the question of computing the deviation function:
$$I(v)=\log d-h(\mu_{t_0g}),$$
where $\mu_{t_0g}=\mu_\phi$ is the equilibrium state for $\phi=t_0g$ and $t_0$ satisfies $p'(t_0)=v$.
Proof. By definition,
$$I(v)=\sup_{t\in\mathbb{R}}\{tv-c(t)\}=\sup_{t\in\mathbb{R}}\{tv-(P(tg)-\log d)\}=\sup_{t\in\mathbb{R}}\{tv-p(t)\}+\log d,$$
where $p(t)=P(tg)$. It is easy to see that $p(t)$ is convex, and from Theorem 7.10 $p(t)$ is also differentiable. Suppose $t_0$ is the unique value such that $p'(t_0)=v$; then, from the last theorem and the definition of pressure,
$$I(v)=t_0v-p(t_0)+\log d=t_0v-\Big(h(\mu_{t_0g})+t_0\int g\,d\mu_{t_0g}\Big)+\log d.$$
Now, from Theorem 7.10, $v=p'(t_0)=\int g(x)\,d\mu_{t_0g}(x)$, and the claim of the theorem follows. ∎
In conclusion, for $g\in C(X)$ and the maximal measure $\mu$ one can obtain the value of $I(v)$, $v\in\mathbb{R}$, by $I(v)=\log d-h(\mu_{t_0g})$, where $t_0$ satisfies $p'(t_0)=v$.
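As a numerical illustration (the shift, the potential, and all numbers below are assumptions of mine, not from the text): for the full shift on $d=2$ symbols and a potential $g$ depending only on the first symbol, with values $a$ and $b$ on the two cylinders, the pressure is explicit, $P(tg)=\log(e^{ta}+e^{tb})$, so $c(t)=P(tg)-\log d$ and $I(v)=\sup_t\{tv-c(t)\}$ can be evaluated by a crude grid search.

```python
import math

a, b = 1.0, 0.0   # hypothetical values of g on the two cylinders
d = 2

def pressure(t):
    """P(tg) = log(e^{ta} + e^{tb}) for g locally constant on the 2-shift."""
    return math.log(math.exp(t * a) + math.exp(t * b))

def free_energy(t):
    """c(t) = P(tg) - log d, as in Theorem 8.2."""
    return pressure(t) - math.log(d)

def deviation(v):
    """I(v) = sup_t {t v - c(t)}, approximated on a grid of t values."""
    ts = [k / 100.0 for k in range(-2000, 2001)]
    return max(t * v - free_energy(t) for t in ts)

# At v = (a + b)/2, the mean of g under the maximal measure, the supremum is
# attained at t = 0 and I(v) = 0; away from the mean the deviation function
# is strictly positive.
print(deviation(0.5), deviation(0.9))
```

The value $t_0$ at which the grid maximum is attained is the numerical counterpart of the solution of $p'(t_0)=v$ in the theorem above.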
Remark. More general results about large deviations and the free energy of Hölder functions $g$ and equilibrium states $\mu_g$ can be obtained, but we will not consider such questions here. We refer the reader to [5], [8], [9] for interesting results on this subject. Theorem 3 in [8] is not correctly stated, but it is not necessary for the proof of Theorem 7, the main result of [8].
References
[14] Mañé, R., On the Hausdorff Dimension of the Invariant Probabilities of Rational Maps, Lecture Notes in Math. 1331, 86-116, Springer-Verlag (1990).
[15] Parry, W., M. Pollicott, Zeta Functions and the Periodic Orbit Structure of Hyperbolic Dynamics, Astérisque 187-188 (1990).
[16] Rudin, W., Real and Complex Analysis, McGraw-Hill (1974).
[17] Ruelle, D., Thermodynamic Formalism, Addison-Wesley (1978).
[18] Walters, P., An Introduction to Ergodic Theory, Springer-Verlag (1981).
FORMAL NEURAL NETWORKS: FROM SUPERVISED TO
UNSUPERVISED LEARNING
JEAN-PIERRE NADAL
Laboratoire de Physique Statistique*
Ecole Normale Superieure,
24, rue Lhomond, F-75231 Paris Cedex 05
France
ABSTRACT. This lecture is on the study of formal neural networks. The emphasis will be put on the bridges that exist between the analysis of the main tasks and architectures that are usually considered: auto-associative learning by an attractor neural network, hetero-associative learning by a feedforward net, learning a rule by example, and unsupervised learning. In particular, a duality between two architectures will be shown to provide a tool for comparing supervised and unsupervised learning.
1. Introduction
In the study of formal neural networks (for a general review see [21], [31]), one usually distinguishes two main types of learning paradigms, and two main types of architectures. For the learning tasks:
• Supervised learning (the desired output is given for a set of patterns). There are two sub-families:
- learning by heart (that is, realizing an associative memory);
- learning a rule by example: the set of input-output pairs to be learned is a set of examples illustrating a rule. One expects the network to generalize, that is, to give a correct output when a new (unlearned) pattern is presented.
* Laboratory associated with the C.N.R.S. (U.R.A. 1306) and with the universities Paris VI and Paris VII.
E. Goles and S. Martinez (eds.), Cellular Automata, Dynamical Systems and Neural Networks, 147-166.
© 1994 Kluwer Academic Publishers.
Fifty years ago McCulloch and Pitts defined the formal neuron as a binary element [26]. What they showed is that, with such a caricature of the biological neuron, one can build a universal Turing machine. However, this positive result says nothing about how to use (formal) neurons in order to learn a task. But the basic ideas on how learning could take place were proposed at about the same time: in 1949 the neuropsychologist D. O. Hebb published a book [20] where he formulated hypotheses explaining how associative learning might occur in the brain. In fact, almost every neural network model has its roots in this pioneering work of Hebb.
In all these models one basic postulate is that the properties of the synapses might be modified during learning. This was exploited during the 60's in the study of the simplest possible neural networks, in particular the perceptron [28]. This network has an input layer directly connected to an output layer. The couplings (synaptic efficacies) between the two layers are adaptable elements (in the original design of the perceptron there is a preprocessing layer, but of fixed architecture and couplings: one can thus ignore it for all that follows). The simplest perceptron has only one output unit, as on figure 1.
Figure 1. The simplest perceptron: an input layer of $N$ units, $j=1,\ldots,N$, directly connected to a single output unit.
Let me make the notation precise for the case of the perceptron with one binary output. Its state $\sigma$ takes, say, the values 0 or 1. There are $N$ input units, $N$ couplings $J=\{J_1,\ldots,J_N\}$ and a threshold $\theta$. Inputs may be continuous or discrete. In a supervised learning task, one is given a set $X$ of $p$ input patterns, which have to be learned by the perceptron. For a given choice of the couplings, the output $\sigma^\mu$ when the $\mu$th pattern $\xi^\mu$ is presented is given by:
$$\sigma^\mu=\Theta\Big(\sum_{j=1}^N J_j\xi_j^\mu-\theta\Big), \qquad (2)$$
where $\Theta$ is the Heaviside step function.
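Equation (2) can be sketched in a few lines of code (function names and the numbers below are mine, chosen only for illustration):

```python
def theta(x):
    """Heaviside step function."""
    return 1 if x > 0 else 0

def perceptron_output(J, xi, thresh):
    """Equation (2): sigma = Theta(sum_j J_j xi_j - theta)."""
    return theta(sum(j * x for j, x in zip(J, xi)) - thresh)

# Hypothetical couplings and binary input patterns.
J = [0.5, -0.2, 0.8]
print(perceptron_output(J, [1, 0, 1], 0.4))   # 0.5 + 0.8 - 0.4 > 0  ->  1
print(perceptron_output(J, [0, 1, 0], 0.0))   # -0.2 <= 0            ->  0
```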
During the 60's the storage capacity of the perceptron was obtained from geometrical arguments [13]. One considers the space of couplings ($J=\{J_j,\ j=1,\ldots,N\}$ being considered as a point in an $N$-dimensional space). Then each pattern $\mu$ defines a hyperplane, and the output $\sigma^\mu$ is 1 or 0 depending on which side of the hyperplane the point $J$ lies. Hence the $p$ hyperplanes divide the space of couplings into domains (figure 2), each domain being associated with one specific set $\vec\sigma=\{\sigma^1,\ldots,\sigma^p\}$ of outputs. Let us call $\Delta(X)$ the number of domains. Since each $\sigma^\mu$ is either 0 or 1, there are at most $2^p$ different output configurations $\vec\sigma$, that is,
$$\Delta(X)\le 2^p. \qquad (3)$$
If the patterns are "in general position" (that is, every subset of at most $N$ patterns is linearly independent), then a remarkable result is that $\Delta(X)$ is in fact independent of $X$ and a function only of $p$ and $N$ [13]:
Figure 2. The space of couplings divided by $p=3$ hyperplanes (labeled 1, 2, 3) into domains, each labeled by its output configuration $(\sigma^1,\sigma^2,\sigma^3)$.
$$\Delta=\sum_{k=0}^{\min(p,N)}C_p^k, \qquad (4)$$
so that
$$\Delta\ \begin{cases}=2^p & \text{if } p\le N,\\ <2^p & \text{if } p>N.\end{cases} \qquad (5)$$
Hence the VC dimension of the perceptron (see section 3) is
$$d_{VC}=N. \qquad (6)$$
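Formula (4) is easy to check numerically; the sketch below (my code, not from the text) counts the domains and verifies the two regimes of (5).

```python
from math import comb

def n_domains(p, N):
    """Formula (4): number of domains cut out by p hyperplanes in general
    position in the N-dimensional space of couplings."""
    return sum(comb(p, k) for k in range(min(p, N) + 1))

# For p <= N every one of the 2^p output configurations is realized, while
# for p > N the count falls strictly behind 2^p (formula (5), hence (6)).
print(n_domains(3, 5), 2 ** 3)     # 8 8
print(n_domains(10, 5), 2 ** 10)   # 638 1024
```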
If the task is to learn a rule by example, the VC dimension plays a crucial role: generalization will occur if the number of examples $p$ is large compared to $d_{VC}$. For $\alpha=p/N$, the fraction of output configurations which are not realized remains vanishingly small for $\alpha$ greater than 1, up to the "critical storage capacity" ([13], [18]) $\alpha_c$,
$$\alpha_c=2. \qquad (8)$$
The (Shannon) capacity $C$ is related to the number of domains by
$$C=\ln\Delta. \qquad (9)$$
Here (and in the following) logarithms are expressed in base 2, and $S(x)$ is the entropy function (measured in bits):
$$S(x)=-x\log x-(1-x)\log(1-x).$$
The smallest fraction of errors, $\epsilon$, that can be achieved can be computed by writing that the capacity per synapse $c(\alpha)$ is equal to the amount of information stored per synapse (when there are $p\epsilon$ errors), that is, to $\alpha(1-S(\epsilon))$:
$$c(\alpha)=\alpha(1-S(\epsilon)). \qquad (13)$$
The above formula (13) can be seen as an application of Fano's inequality [10], giving the smallest possible error rate that can be achieved by a communication channel of (Shannon) capacity $C$: rewritten as $\alpha S(\epsilon)=\alpha-c(\alpha)$, its r.h.s. is the number of bits (per synapse) that cannot be correctly processed, and its l.h.s. is the amount of information needed to specify where the errors are.
The preceding results for the perceptron appear to be also useful when considering more complex architectures, in fact any learning machine with a binary output. For a general learning machine, the VC dimension and the number of domains are defined as above: $\Delta(X)$ is the number of different possible output configurations $\vec\sigma$. In general it will depend on the choice of $X$ (and not only on $p$ and $N$ as for the perceptron). However, one can consider its maximal value over all possible choices of $X$:
$$\Delta_m=\max_X\Delta(X). \qquad (14)$$
This maximal value $\Delta_m$ is equal to $2^p$ for $p$ up to some number called the VC (Vapnik-Chervonenkis) dimension, $d_{VC}$ (possibly infinite), and is strictly smaller above. As mentioned above for the perceptron, generalization is guaranteed for $p$ much larger than the VC dimension. Vapnik [39] has shown the remarkable result that $\Delta_m$ is bounded above by $\sum_{k=0}^{\min(p,d_{VC})}C_p^k$. That is, there is an upper bound which is precisely the number of domains of a perceptron having the same VC dimension (i.e. with a number of inputs equal to that value of $d_{VC}$, see (4)). Hence this upper bound is optimal (all learning machines with a given value of the VC dimension satisfy the bound, and equality is realized for at least one of these machines, the perceptron).
To conclude this short section, one sees that the results for the simple perceptron give us some insight on any learning machine, if one replaces $N$, the number of couplings, by the VC dimension.
I now turn to attractor neural networks (ANN), relating their study to that of the perceptron.
Hebb also suggested that the associative behavior of the human memory might be the result of a collective behavior. The Attractor Neural Network as introduced by J. J. Hopfield in 1982 [22] can be seen as a direct formalization of Hebb's ideas. In this model, every neuron is connected to every other neuron, as on figure 3. Each neuron is a linear threshold unit as above. With an asynchronous dynamics, the state of neuron $i$ is computed at time $t+\delta t$ according to
$$\sigma_i(t+\delta t)=\Theta\Big(\sum_{j=1}^N J_{ij}\sigma_j(t)-\theta_i\Big). \qquad (15)$$
In the above dynamics, synaptic noise can be incorporated by replacing the deterministic updating rule by a stochastic one, but I will restrict here to the noiseless case. If the couplings are symmetric,
$$J_{ij}=J_{ji}, \qquad (16)$$
then one can associate an "energy" to the dynamics (15) and show that from any initial configuration the network will evolve towards a (possibly local) minimum of the energy. This means that the network behaves like an associative memory: starting from some initial configuration (coding for a stimulus), the network evolves until it settles down to a fixed point; the stable configuration that is reached is the response of the network to the stimulus, and the presented pattern (initial configuration) has been recognized as being the fixed-point pattern. In this context, learning is equivalent to imposing a given set of patterns as fixed points. In the Hopfield model, an empirical (Hebbian) rule fixing the couplings as a function of the patterns is chosen. This particular learning scheme leads to symmetric couplings.
Using statistical mechanics tools (in particular thanks to the analogy with a spin-glass model), the Hopfield model has been studied [4], as well as many variants of it. Very soon it was recognized that the symmetry condition is not necessary, and that attractors other than fixed points can be considered [2]. One of the best known results is the storage capacity of the Hopfield model: in the large $N$ limit, the maximal number of patterns that can be stored is $p_{max}=\alpha_cN$, with $\alpha_c\approx 0.14$. This means that for $\alpha=p/N$ smaller than $\alpha_c$ the system does behave as an associative memory, with, for each stored pattern, the existence of a fixed point which is very close to (although not identical to) that pattern. Since 1982 many studies have been devoted to the Hopfield model and its variants [2] [31] [37], with as main result that they do provide associative memory devices, with a storage capacity proportional to the connectivity of the network (that is, to the typical number of neurons to which each neuron is connected; the connectivity is $N$ in the standard Hopfield model). Moreover, it has been possible to modify the original model in order to take into account biological constraints, and to consider ANN with more realistic neurons and architectures [3] [31], in such a way that comparison with experiments is becoming possible.
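The dynamics (15) with a one-pattern Hebb-type rule can be sketched in a few lines (this toy code is mine; the pattern, the network size, and the update order are arbitrary choices): the stored pattern is a fixed point of the asynchronous dynamics, and a corrupted version of it flows back to the pattern.

```python
N = 50
xi = [1 if i % 2 == 0 else 0 for i in range(N)]   # a stored 0/1 pattern

# Hebb-type couplings for 0/1 units, J_ij ~ (2 xi_i - 1)(2 xi_j - 1), J_ii = 0.
# They are symmetric, as in (16).
J = [[(2 * xi[i] - 1) * (2 * xi[j] - 1) / N if i != j else 0.0
      for j in range(N)] for i in range(N)]

def sweep(state):
    """One asynchronous sweep of the dynamics (15), thresholds theta_i = 0."""
    for i in range(N):
        h = sum(J[i][j] * state[j] for j in range(N))
        state[i] = 1 if h > 0 else 0
    return state

noisy = xi[:]
for i in range(5):            # corrupt 5 of the 50 bits
    noisy[i] = 1 - noisy[i]

for _ in range(3):
    sweep(noisy)
print(noisy == xi)   # True: the corrupted pattern is attracted back to xi
```

With a single stored pattern the basin of attraction is large; with many patterns, interference terms appear, which is the origin of the capacity limit discussed above.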
However, these studies do not tell us how good (or bad) the performances of such models are: are there better ways of computing the synaptic efficacies? Under which conditions is it possible to learn a given set of patterns?
A first answer to the preceding questions was given by E. Gardner in 1987 [17] [18], in a way I explain now. Instead of choosing a particular rule for computing the couplings, one may ask first whether there exists at least one set of couplings which stabilizes the patterns. A simple remark allows to get the answer. Looking for a network that effectively memorizes a set of $p$ patterns, $\{\xi^\mu=(\xi_j^\mu,\ j=1,\ldots,N),\ \mu=1,\ldots,p\}$ (where each $\xi_j^\mu$ is either 0 or 1), means looking for a set of couplings and thresholds that satisfy the $Np$ inequalities
$$\text{for each } i \text{ and each } \mu:\quad \Big(\xi_i^\mu-\tfrac12\Big)\Big(\sum_{j=1}^N J_{ij}\xi_j^\mu-\theta_i\Big)>0, \qquad (18)$$
where usually the self-coupling terms $J_{ii}$ are set to 0 (one wants to avoid the trivial solution $J_{ii}>0$, $J_{ij}=0$ for $i\ne j$, which does not give any associative property). However, if we do not impose any particular symmetry condition, so that the couplings $J_{ij}$ and $J_{ji}$ are independent parameters, one sees that the above inequalities decouple into $N$ independent sets of $p$ inequalities: for each neuron $i$, one has to solve the problem $P_i$ consisting of $p$ inequalities for which the unknowns are the couplings $\{J_{ij},\ j=1,\ldots,N,\ j\ne i\}$ and the threshold $\theta_i$:
$$P_i:\quad \text{for each }\mu,\quad \Big(\xi_i^\mu-\tfrac12\Big)\Big(\sum_{j=1}^N J_{ij}\xi_j^\mu-\theta_i\Big)>0. \qquad (19)$$
Each problem $P_i$ is a perceptron learning problem, and it follows from section 2 that as many as $2N$ patterns can be learned exactly (which is much more than the $0.14N$ patterns imperfectly learned in the Hopfield model). Moreover, the perceptron algorithm ([28], see next section), applied to each neuron $i$ (that is, to each problem $P_i$), allows to effectively compute a set of couplings.
But Elizabeth Gardner went much further by introducing a statistical physics approach to this theoretical study of learning [18]. She introduced a measure in the space of couplings, so that it is possible to ask for the number (or the fraction) of couplings that effectively learn a set of patterns. From that approach, using the techniques developed for the study of spin-glass models, one gets the storage capacity of the perceptron under various conditions (unbiased or biased patterns, continuous or discrete couplings, ...; the critical capacity $\alpha_c=2$ corresponding to the particular case of continuous couplings and unbiased patterns). One also gets the typical behavior of a network taken at random among all the networks which have learned the same set of patterns. Moreover, this approach has been adapted to the study of generalization, that is, to the learning of a rule by example [38]. I will not give more details here on these aspects, and I consider now the algorithmic problem.
In practice one lets the algorithm run for some given, arbitrarily chosen, amount of time, since one does not know in advance whether at least one solution exists.
Since it was realized that learning algorithms for the perceptron can be used in the context of ANN, as explained above, many variants of the basic algorithm have been proposed in order to find couplings having some specific properties [1]. In particular, several algorithms (the "minover" [23], the "adatron" [5] and the "optimal margin classifier" [11]) allow to find the synaptic efficacies which maximize the size of the basins of attraction.
But what if the desired associations are not learnable? There are various algorithms which tend to find couplings such that the number of errors will be as small as possible [15] [32] [40]. In particular, the "pocket" algorithm [15] is a variant of the perceptron algorithm which guarantees to find a solution with the smallest possible number of errors, provided one lets the algorithm run long enough.
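The basic perceptron rule referred to throughout this section can be sketched as follows (my code on a hypothetical toy problem, not the authors'): on each misclassified pattern, the couplings move toward the pattern when its class is 1 and away from it when its class is 0, the threshold moving oppositely.

```python
def output(J, thresh, xi):
    """Binary perceptron output, as in equation (2)."""
    return 1 if sum(j * x for j, x in zip(J, xi)) - thresh > 0 else 0

def perceptron_learn(patterns, targets, epochs=100):
    """Classical perceptron rule: correct the couplings on each error."""
    J = [0.0] * len(patterns[0])
    thresh = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, tau in zip(patterns, targets):
            if output(J, thresh, xi) != tau:
                errors += 1
                sign = 1 if tau == 1 else -1
                J = [j + sign * x for j, x in zip(J, xi)]
                thresh -= sign
        if errors == 0:
            break
    return J, thresh

# A linearly separable toy problem: the class is the first input bit.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 0, 1, 1]
J, thresh = perceptron_learn(X, T)
print([output(J, thresh, xi) for xi in X] == T)   # True once converged
```

When the associations are learnable, the perceptron convergence theorem guarantees that the loop terminates; otherwise one would keep, as in the pocket variant, the couplings with the fewest errors seen so far.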
In most practical applications, where one wants to find a rule hidden behind a set of examples, an architecture more complicated than that of the perceptron is required. The most standard approach is to choose a layered network with an a priori chosen number of hidden layers, and to run the backpropagation algorithm [24] [35]. There exist, however, alternatives to this method: one can also "learn" the architecture. Since 1986 there exists a family of constructive algorithms, which add units until the desired result is obtained [9] [14] [16] [19] [27] [34] [36]. Most of these algorithms are based on perceptron learning. I give here one example, the "Neural Tree" algorithm [19] [36] (also called the "upstart" algorithm in the slightly different version of M. Frean [14]).
Given the "training set", a set of $p$ input patterns with their class 0 or 1 (their desired output $\tau$), one starts by running a perceptron algorithm in order to learn the $p$ associations (pattern, class): in case these associations were learnable by a perceptron, the algorithm will give one solution, and the problem is solved. If not (in practice, if no solution has been found after some given amount of time), then one keeps the couplings given by the algorithm (or the pocket [15] solution, that is, the set of couplings with the least number of errors). These couplings define our first neuron. They define a hyperplane which cuts the input space into two domains (figure 4); input patterns on one side have a $\sigma_1=1$ output, patterns on the other side have a $\sigma_1=0$ output. At least one of these domains contains a mixture of patterns of the two classes. We will say that such a domain is unpure, a pure domain being one which contains patterns of a same class. The goal of the algorithm is to end up with a partition of the input space into pure domains. One considers each unpure domain separately. For a given (unpure) domain, one runs a perceptron algorithm trying to separate the patterns according to the class they belong to. This leads to a new unit, defining a hyperplane which cuts the domain into two new domains. This procedure is repeated until every domain is pure. On figure 4 five domains have been generated.
Figure 4. A Neural Tree. Above: partition of the input space by a Neural Tree. Below: the functional tree architecture.
One should note that every neuron that has been built receives connections from (and only from) the input units. The tree is functional: consider for example the neural tree of figure 4; to read the class of a new pattern, one looks at the output of the first neuron. Depending on its value, 1 or 0, one reads the output of neuron 2 or 3. In the first case, the output of neuron 2 gives the class. In the second case, if the output of neuron 3 is 1, then one reads the output of neuron 4, which gives the class.
One should also note that the perceptron algorithm can be replaced by any learning algorithm (for the perceptron architecture) that one finds convenient. Most importantly, this algorithm can be easily adapted to multiclass problems [36], that is, when the desired output can take more than two values: in the final Neural Tree, each domain will contain patterns of a same class.
In many applications one has noisy data, so that the best performance on generalization may not be obtained when every example of the training set is correctly learned. But with a Neural Tree (as with most constructive algorithms) one can always add units until every output is equal to the desired output. Hence it is likely that the net will in fact "learn by heart" all the examples and will not generalize. Indeed, one has to stop the growth of the tree when generalization, as measured by the number of correct answers on a test set, starts to decrease. Such a strategy can be applied locally, that is, at each leaf of the current tree. This is an advantage of this algorithm: the input space is partitioned in a way that reflects the local density of the data, so that one has good control on the quality of generalization (one acquires more knowledge on the rule where there are more examples).
Let us now come back to the perceptron, and reconsider formula (2) giving the output of the perceptron with a single output unit. One can say, as above, that there are $p$ input-output pairs realized by a perceptron with a single output unit, whose couplings are the $J$'s. But one can as well say that one has a perceptron with $p$ output units, where $J$ is now an input pattern, and the $\xi^\mu$, $\mu=1,\ldots,p$, are the $p$ coupling vectors (figure 5). I will call $A$ the initial perceptron with a unique output, and $A^*$ the dual perceptron with $p$ output units as just explained.
To avoid confusion when considering one of the dual perceptrons, whenever considering $A^*$ I will append a "*" to each ambiguous word: in particular I will write "pattern*" and "couplings*", the * being a reminder that for $A^*$ these denominations refer to $J$ and to the $\xi^\mu$, respectively.
Figure 5. The dual perceptron $A^*$: the vector $J$ is now the input pattern*, and the $\xi^\mu$, $\mu=1,\ldots,p$, are the $p$ coupling* vectors.
Now let us reconsider the geometrical argument from the point of view of the dual perceptron $A^*$. What we have seen in section 2.2 is that, for a given choice of the couplings*, $X$, one explores all the possible different output states $\vec\sigma$ that can be obtained when the input pattern* $J$ varies. If $J$ represents, say, the light intensities on a retina, $\vec\sigma$ is the first neural representation of a visual scene in the visual pathway. Since all visual scenes falling into a same domain are encoded with the same neural representation, $\Delta(X)$ is the maximal number of visual scenes that can be distinguished. This can be said in terms of transmission of information: to specify one domain out of $\Delta(X)$ represents $\ln\Delta(X)$ bits of information. Hence
$$C=\ln\Delta(X). \qquad (20)$$
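A quick numerical sketch of this counting (all numbers below are my choices, and thresholds are set to zero): fix $p$ random coupling* vectors $X$ and sample many inputs* $J$, collecting the distinct neural representations $\vec\sigma$; by (3) their number is bounded by $2^p$, and the logarithm of the count estimates the information the representation can convey.

```python
import random

random.seed(1)
N, p = 4, 3
X = [[random.gauss(0, 1) for _ in range(N)] for _ in range(p)]   # couplings*

def representation(J):
    """The output word sigma = (Theta(xi^mu . J))_{mu=1..p} of the dual net A*,
    with all thresholds set to zero."""
    return tuple(1 if sum(c * j for c, j in zip(xi, J)) > 0 else 0 for xi in X)

words = {representation([random.gauss(0, 1) for _ in range(N)])
         for _ in range(20000)}
print(len(words))   # at most 2^p = 8 distinct representations, by (3)
```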
6. Conclusion
In this lecture I have given a quick overview of the bridges that exist between the study of supervised and unsupervised learning tasks. I have shown the remarkable fact that the study of the simplest architecture, the perceptron, can be useful for understanding more complex architectures, such as fully connected networks and multilayer networks. Moreover, complex architectures can be built by using perceptron algorithms.
The duality between supervised and unsupervised learning needs to be further exploited. One puzzling aspect is the discrepancy between the standard viewpoints that come from the study of the two paradigms: in supervised learning one insists on having distributed representations (the patterns should be made of features distributed as randomly as possible), in order to ensure good associative properties. In unsupervised learning one finds that efficient encoding produces "grand-mother" type cells, each neuron learning to respond to a particular (set of) feature(s). The duality presented above should help in analysing this problem.
Acknowledgements
I thank the organizers of FIESTA92 for inviting me. I thank Nestor Parga for a
fruitful ongoing collaboration on the study of unsupervised learning, on which part
of this talk is based.
References
[1] Abbott, L.F., Learning in Neural Network Memories, Network 1, 105-122 (1990).
[2] Amit, D.J., Modeling Brain Function, Cambridge University Press (1989).
[3] Amit, D.J., M.R. Evans, M. Abeles, Attractor Neural Networks with Biological Probe Neurons, Network 2 (1991).
[4] Amit, D.J., H. Gutfreund, H. Sompolinsky, Storing an Infinite Number of Patterns in a Spin-Glass Model of Neural Networks, Phys. Rev. Lett. 55, 1530-1533 (1985).
[5] Anlauf, J.K., M. Biehl, The Adatron: an Adaptive Perceptron Algorithm, Europhys. Lett. 10, 687 (1989).
[6] Atick, J.J., Could Information Theory Provide an Ecological Theory of Sensory Processing, Network 3, 213-251 (1992).
[21] Hertz, J., A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Com-
putation, Addison-Wesley, Cambridge MA (1990).
[22] Hopfield, J.J., Neural Networks as Physical Systems with Emergent Compu-
tational Abilities, Proc. Natl. Acad. Sci. USA 79, 2554-58 (1982).
[23] Krauth, W., M. Mezard, Learning Algorithms with Optimal Stability in Neu-
ral Networks, J. Phys. A: Math. and Gen. 20, L745 (1987).
[24] Le Cun, Y., A Learning Scheme for Asymmetric Threshold Networks, Proceedings of Cognitiva 85, 599-604, Paris, France (1985). CESTA-AFCET.
[25] Linsker, R., Self-Organization in a Perceptual Network, Computer 21, 105-17
(1988).
[26] McCulloch, W.S., W.A. Pitts, A Logical Calculus of the Ideas Immanent in
Nervous Activity, Bull. of Math. Biophys. 5, 115-133 (1943).
[27] Mezard, M., J.-P. Nadal, Learning in Feedforward Layered Networks: the
Tiling Algorithm, J. Phys. A: Math. and Gen. 22, 2191-203 (1989).
[28] Minsky, M.L., S.A. Papert, Perceptrons, M.I.T. Press, Cambridge MA (1988).
[29] Nadal, J.-P., N. Parga, Duality between Learning Machines: a Bridge between
Supervised and Unsupervised Learning, LPSENS preprint, to appear in Neural
Computation (1993).
[30] Nadal, J.-P., N. Parga, Information Processing by a Perceptron in an Unsu-
pervised Learning Task, Network 4, 295-312 (1993).
[31] Peretto, P., An Introduction to the Modeling of Neural Networks, Cambridge
University Press (1992).
[32] Personnaz, L., I. Guyon, G. Dreyfus, Collective Computationnal Properties
of Neural Networks: New Learning Mechanisms, Phys. Rev. A34, 4217-28
(1986).
[33] Rosenblatt, F., Principles of Neurodynamics, Spartan Books, New York
(1962).
[34] Rujan, P., M. Marchand, Learning by Minimizing Resources in Neural Net-
works, Complex Systems 3, 229-42 (1989).
[35] Rumelhart, D.E., G.E. Hinton, and R.J. Williams, Learning Internal Repre-
sentations by Error Propagation, In McClelland J.L. Rumelhart D.E. and the
PDP research group (eds. ), Parallel Distributed Processing: Explorations in
166
PATRICIO PEREZ
Departamento de Fisica
Universidad de Santiago de Chile
Casilla 307, Correo 2
Santiago
Chile
ABSTRACT. We describe here some ways of storing correlated patterns in neural networks of
two-state neurons. We begin with a calculation of the bounds for the storage capacity in the case
of uncorrelated, unbiased patterns. We extend these results to the case of biased patterns, which
is a form of correlation. We then present some specific models that allow the storage of patterns
with different kinds of correlation. A model based on the segmentation of the network into sub-nets
is described in more detail. By storing patterns in the sub-nets and varying the interaction between
them, we obtain an efficient way to store correlated patterns that can be related to the human
ability to memorize and retrieve words.
1. Introduction
An important class of neural network models is formed by fully connected networks
of Ising-spin-type neurons. In these models, the state S_i = +1 corresponds to a firing
neuron and S_i = -1 to a quiescent neuron. The potential at the membrane of
neuron i, at each instant of time, is assumed to correspond to the local field h_i,
which is given by:

h_i = \sum_{j \neq i} J_{ij} S_j   (1)

where J_{ij} characterizes the synaptic efficacy for action potentials traveling from
neuron j to neuron i. The precise values of these matrix elements or weights
are determined by "learning" a set of patterns which represent the information
E. Goles and S. Martinez (eds.), Cellular Automata, Dynamical Systems and Neural Networks, 167-189.
© 1994 Kluwer Academic Publishers.
where \kappa \geq 0 is a measure of the basin of attraction, the region around S* from which
it is reached.
The network will have associative memory if these stationary states, or an
important fraction of them, correspond to the patterns used to build the J_{ij}'s,
and the basins of attraction are not vanishingly small. The storage capacity of
the network (\alpha_c) is usually defined as the ratio between the maximum number of
stationary states that can be programmed in advance (p) and the total number
of neurons (N).
Several different models have been proposed for the construction of the synaptic
coefficients. In some of them each pattern is memorized in a single learning
event [1,3,8,14], and in others each pattern is learned by repeated presentation
to the network in a sequence of learning steps [2,4,5,7,9]. Most of the learning
rules cited are local [1,2,3,4,5,7,8,9], but sometimes allowing nonlocality leads to
interesting properties [14]. Here locality means that the synapse between two
neurons depends only on the activity of those two neurons when the patterns to be
stored are taken into account. General results for the storage capacity and stability
of the stored patterns are due to E. Gardner [7], a calculation that we summarize in
section 2. Storage depends strongly on whether the patterns are correlated. Two
stored patterns \xi^\mu and \xi^\nu are uncorrelated if they satisfy

< \xi_i^\mu \xi_i^\nu > = 0   (4)

where the brackets mean an average over the statistical distribution of stored patterns.
The Hopfield model is an example of a learning rule that allows the storage of only
uncorrelated patterns. In this case, the synaptic coefficients are given by the Hebb
rule written below in Eq. (46).
The condition of stability for pattern \xi^\mu, and for N \gg 1, is obtained from equation
(3) with J_{ii} = 0:

\xi_i^\mu h_i^\mu = 1 + (1/N) \sum_{\nu \neq \mu} \sum_{j \neq i} \xi_i^\mu \xi_i^\nu \xi_j^\nu \xi_j^\mu > 0.

This condition may be written as 1 + R > 0. If the stored patterns are uncorrelated,
R will average to zero, but it can have deviations of the order of \sqrt{p/N}. In this
manner we can understand why with the Hopfield model we can store of the order
of N uncorrelated patterns. If correlation is present, in general R will not average
to zero and the storage capacity is drastically reduced.
Usually, the components of a prescribed pattern (a pattern we want to store)
are chosen at random with probability:

P(\xi_i^\mu) = (1/2)(1 + m) \delta(\xi_i^\mu - 1) + (1/2)(1 - m) \delta(\xi_i^\mu + 1).   (7)

If m = 0, we say that the patterns are unbiased: every neuron has the same
probability of being active or quiescent. That is the case for the Hopfield model.
If m \neq 0 the prescribed patterns are biased. As a result, the patterns
have a mean correlation < \xi_i^\mu \xi_i^\nu > = m^2. Thus bias implies correlation, although
this is not the only type of correlation assumed in neural network models.
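As a quick numerical check of the distribution (7), the following sketch (plain numpy; the sizes, seed and function name are our own choices) draws biased ±1 patterns and verifies that the empirical bias approaches m and the mean site-by-site correlation between different patterns approaches m².

```python
import numpy as np

def biased_patterns(p, n, m, rng):
    """Draw p patterns of n units, each +1 with probability (1 + m)/2 as in Eq. (7)."""
    return np.where(rng.random((p, n)) < (1 + m) / 2, 1, -1)

rng = np.random.default_rng(0)
m = 0.6
xi = biased_patterns(200, 5000, m, rng)

bias = xi.mean()                 # should approach m
corr = (xi[0] * xi[1:]).mean()   # correlation with the other patterns, approaches m**2

print(f"bias = {bias:.3f} (target {m}), correlation = {corr:.3f} (target {m**2:.2f})")
```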
The best known models for storing correlated patterns are based on a non-local
synaptic matrix or on an iterative learning algorithm. The former are problematic
because they involve the inversion of a very large matrix and are biologically unrealistic.
The latter may have convergence problems. In section 3 we discuss ways to store
correlated patterns in specific neural network models, including a novel approach
using a local, one-presentation learning rule. In section 4, some results of recent
numerical calculations using this last type of model are presented.
We will try to answer the following question: what is the maximum number of
patterns with a given stability that we can store in a network with an optimum
synaptic matrix?
Since multiplying the J_{ij} by any set of constants has no effect on the dynamics
expressed by equation (2), it is convenient to assume a normalization condition

\sum_{j \neq i} J_{ij}^2 = N.

The idea is to calculate the fractional volume of the space of solutions for the
synaptic coefficients. For a given stability \kappa, storage of p patterns will be possible
as long as this volume does not vanish. The maximum storage capacity is obtained
when, upon increasing p/N, the fractional volume goes to zero.
The fraction of phase space V_T that satisfies conditions (3) and (5) for the
embedded patterns of equation (8) can be written as

V_T = [ \int \prod_{i,j} dJ_{ij} \prod_i \delta( \sum_{j \neq i} J_{ij}^2 - N ) \prod_{i,\mu} \Theta( \xi_i^\mu \sum_{j \neq i} J_{ij} \xi_j^\mu / \sqrt{N} - \kappa ) ] / [ \int \prod_{i,j} dJ_{ij} \prod_i \delta( \sum_{j \neq i} J_{ij}^2 - N ) ]   (10)

where \Theta(x) is the step function. If V_i is the fractional volume for fixed i, we can
assume that

V_T = \prod_{i=1}^{N} V_i.   (11)
Since we are interested in the case of N large, we study the thermodynamic limit

\lim_{N \to \infty} (1/N) \ln V_T = \lim_{N \to \infty} (1/N) \sum_i \ln V_i   (12)

and evaluate the average of the logarithm with the replica trick

< \ln V > = \lim_{n \to 0} ( < V^n > - 1 ) / n.   (13)
(14)

(15)

= \exp[ \sum_{j \neq i} \sum_\mu \ln \cos( \sum_a x_\mu^a J_{ij}^a / \sqrt{N} ) ]   (16)
where in the last step only the lowest order term in the 1/N Taylor expansion
of \ln \cos x is kept. We now introduce the parameters

q^{ab} = (1/N) \sum_{j \neq i} J_{ij}^a J_{ij}^b,   a < b   (17)

which correspond to the mutual overlaps between the couplings in the different
replica copies. We will also make use of auxiliary variables F^{ab} and E^a which
satisfy the following relations:

(18)

(19)
with

G_1(q^{ab}) = \ln ( \prod_{a=1}^n \int_\kappa^\infty (d\lambda^a / 2\pi) \int_{-\infty}^\infty dx^a \exp( i \sum_a x^a \lambda^a - (1/2) \sum_a (x^a)^2 - \sum_{a<b} q^{ab} x^a x^b ) )   (21)

and

G_2(F^{ab}, E^a) = \ln ( \prod_{a=1}^n \int dJ^a \exp( -(1/2) \sum_a E^a (J^a)^2 + \sum_{a<b} F^{ab} J^a J^b ) ).   (22)
The integrals in equation (20), in the limit N \to \infty, can be solved with the help
of the saddle point method and the replica-symmetric ansatz

q^{ab} = q,   F^{ab} = F,   E^a = E.   (23)

This gives

G(q, F, E) = \alpha G_1(q) + G_2(F, E) - (1/2) n(n-1) q F + (n/2) E   (24)

where in the limit n \to 0 and using the saddle point conditions we can eliminate
F and E. Finally we are left with

G(q) = \alpha \int Dt \ln H( (\sqrt{q} t + \kappa)/(1 - q)^{1/2} ) + (1/2) \ln(1 - q) + (1/2) q/(1 - q)   (26)

where

Dt = dt \, e^{-t^2/2} / (2\pi)^{1/2}   (27)

H(x) = \int_x^\infty Dz.   (28)

Maximizing G(q) and taking the limit q \to 1 gives the storage capacity at fixed
stability \kappa:

\alpha_c(\kappa) = [ \int_{-\kappa}^\infty Dt (t + \kappa)^2 ]^{-1}.   (30)
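The Gardner bound is easy to check numerically. The sketch below (plain numpy, a simple Riemann sum with our own grid choices) evaluates \alpha_c(\kappa) = [ \int_{-\kappa}^\infty Dt (t + \kappa)^2 ]^{-1} and recovers the classical value \alpha_c(0) = 2 for unbiased patterns.

```python
import numpy as np

def alpha_c(kappa, t_max=10.0, n_grid=200001):
    """Gardner capacity alpha_c(kappa) = [ int_{-kappa}^inf Dt (t + kappa)^2 ]^(-1)."""
    t = np.linspace(-kappa, t_max, n_grid)
    dt = t[1] - t[0]
    gauss = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)   # the Gaussian measure Dt
    integral = np.sum(gauss * (t + kappa) ** 2) * dt
    return 1.0 / integral

print(alpha_c(0.0))   # close to 2: the classical result for unbiased patterns
print(alpha_c(1.0))   # the capacity decreases as the required stability grows
```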
The analysis of the previous section assumed that the stored patterns were random,
with the \xi_i^\mu having equal probability of being +1 or -1. If the patterns are drawn
from a biased random distribution

P(\xi_i^\mu) = (1/2)(1 + m) \delta(\xi_i^\mu - 1) + (1/2)(1 - m) \delta(\xi_i^\mu + 1)   (31)
the calculation of the expectation value < V^n > that appears in eq. (14) leads to
a relevant term of the form

M^a = (1/\sqrt{N}) \sum_{j \neq i} J_{ij}^a.   (34)
Besides the auxiliary variables F^{ab} and E^a defined through equations (18) and
(19), we define K^a from the identity

\int_{-\infty}^\infty dM^a \int_{-\infty}^\infty (N/2\pi) dK^a \exp( i N K^a [ M^a - (1/\sqrt{N}) \sum_{j \neq i} J_{ij}^a ] ) = 1.   (35)
The analogue of G_1 now takes the form

G'_1 = \ln \langle \prod_{a=1}^n \int_{-\infty}^\infty dx^a \int_\kappa^\infty (d\lambda^a / 2\pi) \cdots \rangle   (37)

where the average is taken over the random variable \xi with distribution (31). With
the replica-symmetric ansatz and in the saddle point approximation we get

(38)

where ext_{M,q} means maximum with respect to M and minimum with respect to q,
and G(q, M) comes from the function

G(q, M, F, E) = \alpha G_1(q, M) + G_2(F, E) - (1/2) n(n-1) q F + (n/2) E   (39)
after eliminating F and E. The conditions for q and M, in the limit q \to 1, produce

1 = \alpha_c(m, \kappa) [ (1/2)(1 + m) \int_{-(\kappa - Mm)/(1-m^2)^{1/2}}^\infty Dt ( (\kappa - Mm)/(1 - m^2)^{1/2} + t )^2
+ (1/2)(1 - m) \int_{-(\kappa + Mm)/(1-m^2)^{1/2}}^\infty Dt ( (\kappa + Mm)/(1 - m^2)^{1/2} + t )^2 ]   (40)
and

(41)

(42)

which is in agreement with the result for uncorrelated patterns. For m approaching
one, \alpha_c goes to infinity as:

\alpha_c = - 1 / ( (1 - m) \ln(1 - m) ).   (43)

For intermediate values of m we can solve equations (40) and (41) numerically.
It is interesting to notice that while the storage capacity increases with bias,
the information content does not. According to Shannon's formula, the information
is given by

(44)

If we use the probability distribution (31) and sum over all sites and patterns, we
obtain

I = - \alpha_c N^2 (1/\ln 2) [ (1 - m)/2 \ln( (1 - m)/2 ) + (1 + m)/2 \ln( (1 + m)/2 ) ].   (45)
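The competition between the diverging capacity (43) and the vanishing entropy per bit appearing in (45) can be checked numerically. In the sketch below (names ours; the asymptotic formula for \alpha_c only holds for m close to one) the number of stored bits per synapse stays of order one while \alpha_c diverges.

```python
import numpy as np

def entropy_bits(m):
    """Shannon entropy (bits) of a single biased +/-1 component drawn from (31)."""
    p_plus, p_minus = (1 + m) / 2, (1 - m) / 2
    return -(p_plus * np.log2(p_plus) + p_minus * np.log2(p_minus))

def alpha_c_asymptotic(m):
    """Eq. (43): divergence of the capacity for m -> 1."""
    return -1.0 / ((1 - m) * np.log(1 - m))

products = []
for m in (0.9, 0.99, 0.999):
    products.append(alpha_c_asymptotic(m) * entropy_bits(m))
    print(m, products[-1])   # stored bits per synapse: stays of order one
```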
The bounds for storage capacity calculated above are based on the assumption
of a second order synaptic matrix J_{ij}. We may consider using a higher order
matrix J_{ijk...}, in which case the expression for the local fields (Eq. (1)) should be
modified accordingly. It has been shown that a higher order generalization of the
Hebb rule used in the Hopfield model [8] indeed increases the maximum number of
stored patterns [6,12]. More generally, G. A. Kohring [11] demonstrated that for
an optimal synaptic matrix of order n, the number of uncorrelated patterns that
can be stored is of the order of N^{n-1}. However, if we define the information density
as the number of stored bits per synapse, it is found that it cannot exceed the value
2 of the second order case. For biased patterns, again the storage capacity goes to
infinity as the parameter m approaches one, but the information density decreases
as the bias increases.
Returning to second order synaptic matrices, it is interesting to mention that
with the Hopfield model we can store only uncorrelated patterns [8,1]. For this
model we have:

J_{ij} = (1/N) \sum_{\mu=1}^p \xi_i^\mu \xi_j^\mu   (46)

and its storage capacity for uncorrelated patterns is \alpha_c = 0.144. When we introduce
correlation in the way described by Eq. (31), the number of patterns that
we can store is drastically reduced, due to the noise produced by the embedded
patterns over the pattern that we intend to retrieve [1].
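As an illustration of rule (46), the following sketch (plain numpy, with sizes and seed of our own choosing, well below the capacity) stores a few unbiased random patterns and checks that they are fixed points of the deterministic dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 500, 10                         # p/N = 0.02, far below alpha_c
xi = rng.choice([-1, 1], size=(p, N))

# Hebb rule of Eq. (46), with no self-coupling
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

# one update S -> sign(J S): each stored pattern should map onto itself
stable = all(np.array_equal(np.sign(J @ s), s) for s in xi)
print("all stored patterns are fixed points:", stable)
```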
To improve on this weakness of the Hopfield model, when correlation is introduced
by imposing that the embedded patterns have a mean activity m, Amit
et al. [1] proposed a modification of rule (46) in which the synaptic efficacies are
given by:

J_{ij} = (1/N) \sum_{\mu=1}^p (\xi_i^\mu - m)(\xi_j^\mu - m).   (47)
In this case a signal-to-noise analysis leads them to conclude that the storage
capacity decreases with m. Besides, spurious states are observed to dominate the
dynamics. However, a slight variation of this learning rule allows nearly optimal
storage capacity in the limit of very low activities [3].
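The effect of subtracting the mean activity in (47) is easy to see numerically. In the sketch below (sizes and seed our own), biased patterns lose stability under the plain Hebb rule (46) but stay almost perfectly stable under the centered rule (47).

```python
import numpy as np

rng = np.random.default_rng(2)
N, p, m = 500, 10, 0.5
xi = np.where(rng.random((p, N)) < (1 + m) / 2, 1, -1)

def fraction_stable(J):
    """Fraction of pattern components unchanged by one update S -> sign(J S)."""
    return np.mean([np.sign(J @ s) == s for s in xi])

J_hebb = (xi.T @ xi) / N                   # Eq. (46)
J_centered = ((xi - m).T @ (xi - m)) / N   # Eq. (47)
np.fill_diagonal(J_hebb, 0.0)
np.fill_diagonal(J_centered, 0.0)

print("plain Hebb rule (46):", fraction_stable(J_hebb))
print("centered rule  (47):", fraction_stable(J_centered))
```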
The pseudo-inverse solution [14] has the following synaptic matrix:

J_{ij} = (1/N) \sum_{\mu,\nu=1}^p \xi_i^\mu (C^{-1})_{\mu\nu} \xi_j^\nu,   with C_{\mu\nu} = (1/N) \sum_k \xi_k^\mu \xi_k^\nu.

In this model the synaptic matrix permits the storage of a maximum of N patterns,
which can be correlated but must be linearly independent. A problem here is the
nonlocality of the learning rule, which may be biologically unrealistic.
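A minimal sketch of the rule of [14] (our own notation; the matrix inversion is exactly the non-local step criticized above): correlated but linearly independent patterns become exact fixed points.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, m = 200, 40, 0.4
# strongly biased, hence correlated, patterns
xi = np.where(rng.random((p, N)) < (1 + m) / 2, 1, -1).astype(float)

# pseudo-inverse rule: J = (1/N) xi^T C^{-1} xi with C_{mu nu} = (1/N) xi^mu . xi^nu
C = (xi @ xi.T) / N
J = xi.T @ np.linalg.inv(C) @ xi / N

stable = all(np.array_equal(np.sign(J @ s), s) for s in xi)
print("all correlated patterns are fixed points:", stable)
```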
Recently, several models have been proposed in which the synaptic
matrix is built by repeated Hebbian learning [2,4,5,7,9], where terms of the form

\Delta J_{ij} = (1/N) \xi_i^\mu \xi_j^\mu   (51)

are iteratively added until every pattern \xi^\mu has the desired stability. In this way
we can saturate the Gardner limit and store a large number of uncorrelated or
correlated patterns.
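The iterative scheme can be sketched as follows (a minimal margin-perceptron variant in the spirit of [4,9]; the sweep limit, sizes and stopping rule are our own): the Hebb increment (51) is applied, row by row, only where a pattern's stability is still below the target \kappa.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, kappa = 200, 40, 0.5            # p/N = 0.2, above the one-shot Hopfield limit
xi = rng.choice([-1, 1], size=(p, N)).astype(float)

J = np.zeros((N, N))
converged = False
for sweep in range(500):
    updated = False
    for s in xi:
        weak = s * (J @ s) < kappa    # rows where this pattern is not yet stable
        if weak.any():
            J[weak] += np.outer(s[weak], s) / N   # Hebb increment, Eq. (51)
            np.fill_diagonal(J, 0.0)              # keep J_ii = 0
            updated = True
    if not updated:
        converged = True
        break

min_stability = min((s * (J @ s)).min() for s in xi)
print("converged:", converged, " minimal stability:", min_stability)
```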
For the pseudo-inverse solution and the iterative models, correlation is not necessarily
introduced by assuming a given mean activity for the prescribed patterns;
but even when the correlation is due to a different mechanism, the bounds for the
information content do not change.
where each S^{(i)} has N_i consecutive components different from zero, starting at
position N_1 + N_2 + ... + N_{i-1} + 1. We can understand this as a partition of the
network into k sub-nets, each of them with N_i neurons. The local field acting on
a neuron i which belongs to sub-net i_1 can be written as:
h_i = \sum_j T_{ij} S_j   (53)

with an effective synaptic matrix given by

T_{ij} = \sum_{i_1=1}^k \sum_{i_2=1}^k \lambda_{i_1 i_2} J_{ij}^{(i_1 i_2)}   (54)

where the J_{ij}^{(i_1 i_2)} are defined as before and the \lambda_{i_1 i_2} are real numbers. In the very
special case when \lambda_{i_1 i_2} = \delta_{i_1 i_2} and with a local synaptic matrix J_{ij}, it is easy to see
that we can store and retrieve patterns in each subnet independently. This is so
because the subnets are uncoupled. The storage capacity of the whole network will
be given by all possible combinations of stored segments in the sub-nets when each
of them is at its limiting capacity. Obviously, these patterns are correlated due to
non-negligible mutual overlaps between the N-component vectors. However, the
case of no coupling between the subnets is not very interesting, since the extension
from what is known for the subnets is trivial, and recognition of a pattern that is
noisy in only a few of the segments cannot be helped by the matching of the rest.
We may consider building a synaptic matrix of the form given by
equation (54) which allows interactions between the different subnets. A simple
case is:

\lambda_{i_1 i_2} = \delta_{i_1 i_2} + \epsilon (1 - \delta_{i_1 i_2})   (55)

with \epsilon varying between zero and one. As a further simplification we will assume
that all the sub-nets are of the same size, with N_k neurons, so k N_k = N.
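A numerical sketch of this construction (our own sizes; for the inter-net blocks we take random ±1/\sqrt{N_k} entries as a stand-in, since the coupling scheme above only fixes their overall scale \epsilon): for a small \epsilon, all p^k combined patterns remain fixed points.

```python
import numpy as np

rng = np.random.default_rng(5)
k, Nk, p, eps = 3, 100, 3, 0.05
N = k * Nk

# p 'letters' per sub-net, Hopfield-coupled (Eq. 46) inside each diagonal block
letters = rng.choice([-1, 1], size=(k, p, Nk)).astype(float)
T = np.zeros((N, N))
for b in range(k):
    blk = slice(b * Nk, (b + 1) * Nk)
    T[blk, blk] = letters[b].T @ letters[b] / Nk
np.fill_diagonal(T, 0.0)

# off-diagonal blocks scaled by eps as in (55); random entries as a worst case
for b1 in range(k):
    for b2 in range(k):
        if b1 != b2:
            block = rng.choice([-1, 1], size=(Nk, Nk)) / np.sqrt(Nk)
            T[b1 * Nk:(b1 + 1) * Nk, b2 * Nk:(b2 + 1) * Nk] = eps * block

# every one of the p**k segment combinations should still be a fixed point
ok = True
for idx in np.ndindex(*(p,) * k):
    s = np.concatenate([letters[b][idx[b]] for b in range(k)])
    ok = ok and np.array_equal(np.sign(T @ s), s)
print("all", p ** k, "combined patterns stable:", ok)
```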
In this case the local fields (equation (53)) are of the form:

h_i = \sum_{j \neq i} J_{ij}^{(i_1 i_1)} S_j + \epsilon \sum_{i_2 \neq i_1}^k \sum_j J_{ij}^{(i_1 i_2)} S_j.   (56)

For the patterns stored within each subnet we assume a stability bound

\sum_{j=1, j \neq i}^N J_{ij}^{(i_1 i_1)} \xi_j^{(i_1)} \xi_i^{(i_1)} > c(p) > 0   (57)

where, given that all subnets are of the same size, a given number of stored patterns
p per net is assumed to give the same bounds for stability for all of them. Then,
by combining equations (57) and (56), we see that the following condition ensures
stability for the p^k combinations of segmented patterns:

\epsilon < c(p) | \sum_{i_2 \neq i_1}^k \sum_j J_{ij}^{(i_1 i_2)} \xi_j^{(i_2)} \xi_i^{(i_1)} |^{-1}.   (58)

If the original J_{ij}'s are of the order of unity, we can in addition, consistently with
that property, require that in all subnets

\sum_{j=1, j \neq i}^{N_k} ( J_{ij}^{(i_1 i_1)} )^2 = N_k.   (59)
We can, as a first approximation, assume that J_{ij}^{(i_1 i_2)} \xi_j^{(i_2)} in (58) takes the values +1
and -1 with equal probability. Using the central limit theorem, we observe that
the upper bound for \epsilon in order to store the p^k patterns will be approximately

\epsilon < c(p) / [ (k - 1) N_k ]^{1/2}.   (60)

If the patterns stored in each subnet are unbiased, we can simply relate c(p) to the
number of stored states in them by using equation (30) of the previous section:

(61)

with solution

(62)

From this equation we can solve numerically for c for any p and replace it in (60).
So far we have shown that if the basins of attraction for the patterns stored
within the subnets are of a given size, a certain degree of interaction between the
subnets does not destabilize these patterns. Besides this, it would be desirable
that, due to the interaction between nets, not only are the stored segments not
weakened, but some selected combinations are preferentially recognized. This is
a property of a model introduced by U. Krey and G. Poppel [10], in which they
use the classical one-presentation Hebbian learning of the Hopfield model for
the interaction within the subnets. For the inter-net matrix elements they define
a coupling parameter \epsilon_{\mu_{i_1} \mu_{i_2}}, which is different from zero only for some of the
combinations, and then we have:

(63)
For the case of two subnets (two-letter words) and T = 0 they find a phase diagram
(storage capacity \alpha = p/N as a function of the magnitude of \epsilon) which shows regions
where only the preferred words are retrieved and others where non-preferred words
are also retrieved. Assuming that each letter in subnet 1 forms a unique word with
a letter of subnet 2, if we are in a region where only preferred words are retrieved,
presentation of a pattern where only one of the letters is distinguished should lead
to the retrieval of the complete word.
More interesting is the case of three-letter words. Here, in order to keep the
basic idea of sub-nets and to store and retrieve preferred words, it is necessary to
introduce three-neuron (or spin) interactions. We divide a network with N neurons
into three subnets of the same size, and in each of them we store a few patterns
(letters).
If we start from a fully connected network with three-spin interactions with
the intention of finding an expression for the local fields similar to (53), and then
extrapolating to something like (57), we expect to derive a complicated mixture of
terms and indices. Instead, we think that we can keep the basic ingredients of the
approach if simpler local fields are assumed. For example, for neurons in subnet 1:

h_i = \sum_j J_{ij}^{(11)} S_j + \epsilon \sum_{j,k} J_{ijk} S_j S_k   (64)

where J_{ij}^{(11)} is the usual Hopfield matrix (Eq. (46)) for patterns within subnet 1 and

J_{ijk} = (1/N) \sum_\mu \xi_i^\mu \xi_j^\mu \xi_k^\mu   (65)

with i within subnet 1, j within subnet 2 and k within subnet 3, the sum running
over the preferred words. Similar expressions
apply for subnets 2 and 3, after a cyclic rearrangement of indices. The first term in
Eq. (64) will stabilize the single letters in the subnet and the second will take care
of the collective aspects of the words. In the next section, numerical calculations
on the storage and retrieval of non-preferred combinations in two coupled Hopfield
nets, and of preferred three-letter words with a synaptic matrix including three-neuron
interactions, will be presented.
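The construction of Eqs. (64)-(65) can be sketched in a toy version (sizes, random "letters", and a zero-initialized third segment are our own simplifications of the experiment described in the next section): the three-neuron term, built from the preferred words only, completes a word from two given letters.

```python
import numpy as np

rng = np.random.default_rng(6)
Nk, p, eps = 50, 3, 0.1
letters = rng.choice([-1, 1], size=(3, p, Nk)).astype(float)  # 3 sub-nets, p letters each
words = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]                      # the preferred combinations

# Hopfield matrix within each sub-net, Eq. (46)
J = [lt.T @ lt / Nk for lt in letters]
for Jb in J:
    np.fill_diagonal(Jb, 0.0)

# three-neuron couplings, Eq. (65), summed over the preferred words only
K = np.zeros((Nk, Nk, Nk))
for (a, b, c) in words:
    K += np.einsum('i,j,k->ijk', letters[0][a], letters[1][b], letters[2][c]) / Nk ** 2

def update(S):
    """One synchronous sweep of the local fields of Eq. (64), for each sub-net."""
    S0 = np.sign(J[0] @ S[0] + eps * np.einsum('ijk,j,k->i', K, S[1], S[2]))
    S1 = np.sign(J[1] @ S[1] + eps * np.einsum('ijk,i,k->j', K, S[0], S[2]))
    S2 = np.sign(J[2] @ S[2] + eps * np.einsum('ijk,i,j->k', K, S[0], S[1]))
    return [S0, S1, S2]

# present word (1,1,1) with the third letter left undecided (all zeros):
# the three-neuron term alone must fill it in
S = [letters[0][1].copy(), letters[1][1].copy(), np.zeros(Nk)]
for _ in range(10):
    S = update(S)
retrieved = np.array_equal(S[2], letters[2][1])
print("third letter retrieved:", retrieved)
```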
4. Numerical Calculations
In collaboration with G. Salini, I have studied the storage capacity of two coupled
Hopfield nets, one with N_1 neurons and the other with N_2 = N - N_1 neurons.
The stored patterns are of the form:
(66)
(67)
and
(68)
where we have allowed for asymmetric coupling between the sub-nets, and the
synaptic matrices are of the form:

J_{ij}^{(12)} = \sum_{\mu=1}^{p_1} \sum_{\nu=1}^{p_2} \epsilon_{\mu\nu} \xi_i^{(1)\mu} \xi_j^{(2)\nu},   i = 1, ..., N_1,  j = N_1 + 1, ..., N.   (70)
Figure 1.

Figure 2.
Within the model described by equations (64) and (65), we divided a net with
N = 60 neurons into three subnets of the same size. In each subnet we stored 3
patterns that represented letters of the alphabet: +1 corresponds to an 'x' and -1
to a blank in a two-dimensional array. They were chosen to be different enough in
order to have a low correlation.
In the first subnet the letters were 'U', 'C' and 'J'; in the second, 'O', 'K' and
'I'; and in the third subnet, 'A', 'X' and 'S'. From the 27 possible combinations
we selected 7: 'UOA', 'UKX', 'UIS', 'COX', 'CKA', 'JOS' and 'JIA', which have
the property that any two of them differ in at least two letters. In order to test the
ability of the net to retrieve these words we did the following calculation: with the
initial state corresponding to a pattern in which two of the letters of an embedded
pattern were present and the third segment totally random, the fraction of times
that the complete word was retrieved was plotted against the magnitude of the
coupling parameter \epsilon.
Figure 3.
The results displayed in Figure 3 show that when only the two-neuron interaction
is present (\epsilon = 0), there is no retrieval. For small values of \epsilon, the three-neuron
term has a positive effect, allowing very good recognition. However, when \epsilon increases
beyond 0.15, retrieval ability is lowered, remaining stable at a value around
0.6.
Figure 4 shows what happens when, in the absence of noise, the net, initialized
in a combination of stored letters that does not correspond to a stored word, is
allowed to evolve, as a function of coupling. We observe that the coupling destabilizes
these patterns very rapidly.
Figure 4. (Horizontal axis: coupling strength, 0.00 to 0.07.)
The results of both calculations suggest that in Equation (64) the two terms
on the right side are important and behave cooperatively. The two-neuron
term stabilizes the single letters and the three-neuron term is in charge of giving
meaning to the words; but when it becomes too important in comparison with the
first, the correlations present become an obstacle for retrieval, as in the case of the
Hopfield model and its higher order extensions.
Acknowledgements
This work has been supported by Fondo Nacional de Desarrollo Científico y Tecnológico
de Chile (FONDECYT) through projects 91-0443 and 1930106, and by Departamento
de Investigaciones Científicas y Tecnológicas, Universidad de Santiago de Chile
(DICYT).
References
[1] Amit, D.J., Gutfreund, H., Sompolinsky, H., Information Storage in Neural
Networks with Low Levels of Activity, Phys. Rev. A 35, 2293-2303 (1987).
[2] Blatt, M.G., Vergini, E.G., Neural Networks: a Local Learning Prescription
for Arbitrary Correlated Patterns, Phys. Rev. Lett. 66, 1793-1796 (1991).
[3] Buhmann, J., Divko, R., Schulten, K., Associative Memory with High Infor-
mation Content, Phys. Rev. A 39, 2689-2692 (1989).
[4] Diederich, S., Opper, M., Learning of Correlated Patterns in Spin Glass Net-
works by Local Learning Rules, Phys. Rev. Lett. 58, 949-952 (1987).
[5] Forrest, B.M., Content-addressability and Learning in Neural Networks, J.
Phys. A 21, 245-255 (1988).
[6] Gardner, E., Multiconnected Neural Network Models, J. Phys. A: Math Gen.
20, 3453-3464 (1987).
[7] Gardner, E., The Phase Space of Interactions in Neural Network Models, J.
Phys. A: Math Gen. 21, 257-270 (1988).
[8] Hopfield, J.J., Neural Networks and Physical Systems with Emergent Com-
putational Abilities, Proc. Natl. Acad. Sci., U.S.A. 79, 2554-2558 (1982).
[9] Krauth, W., Mezard, M., Learning Algorithms with Optimal Stability in Neu-
ral Networks, J. Phys. A 20, L745-L751 (1987).
[10] Krey, U., Poppel, G., On the Thermodynamics of Associative recall of Struc-
tured Patterns within a given Context, Z. Phys. B 76, 513-520 (1989).
[11] Kohring, G.A., Neural Networks with Many-Neuron Interactions, J. Physique
51, 145-155 (1990).
[12] Matus, I.J., Perez, P., Generalized Learning Rule for High Order Neural Net-
works, Phys. Rev. A 43, 5683-5688 (1991).
[13] Perez, P., Salini, G., Storage of Words in a Neural Network, Phys. Lett. A.
181, 61-66 (1993).
[14] Personnaz, L., Guyon, I., Dreyfus, G., Information Storage and Retrieval in
Spin-Like Neural Networks, J. Physique Lett. 46, L359-L365 (1985).
[15] Sherrington, D., Kirkpatrick, S., Solvable Model of a Spin-Glass, Phys. Rev.
Lett. 35, 1792-1796 (1975).
[16] Venkatesh, S., Epsilon Capacity of Neural Networks, Proc. Conf. on Neural
Networks for Computing, Snow Bird, UT, J. Denker (ed.), AIP, New York,
440-464 (1986).