
Society of Petroleum Engineers

SPE 27905

Higher-Order Neural Networks in Petroleum Engineering


A.O. Kumoluyi* and T.S. Daltaban, Imperial College
*SPE Member

Copyright 1994, Society of Petroleum Engineers, Inc.

This paper was prepared for presentation at the Western Regional Meeting held in Long Beach, California, U.S.A., 23-25 March 1994.

This paper was selected for presentation by an SPE Program Committee following review of information contained in an abstract submitted by the author(s). Contents of the paper, as presented, have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material, as presented, does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Papers presented at SPE meetings are subject to publication review by Editorial Committees of the Society of Petroleum Engineers. Permission to copy is restricted to an abstract of not more than 300 words. Illustrations may not be copied. The abstract should contain conspicuous acknowledgment of where and by whom the paper is presented. Write Librarian, SPE, P.O. Box 833836, Richardson, TX 75083-3836, U.S.A., Telex 163245 SPEUT.

References and illustrations at end of paper.

1 Abstract

In this paper, we discuss the general application of higher order neural networks; this follows from the successful application of such networks to the identification of well test models. During the course of the research, it was discovered that these networks have a large range of applications in petroleum engineering. Hence the objective of this paper is to give a background of higher order neural networks and their potential uses. Conventional neural networks have activation functions that are linear correlations of their inputs, whereas higher order networks have a non-linear correlation of their inputs. Higher order neural networks do not have wide practical applications due to the enormous number of parameters (weights) associated with them. However, for certain problems this vast number of weights is greatly reduced by constraining the architecture of the network, that is, for problems that need to be classified regardless of some transformation groups such as translation, scaling and rotation. A typical example is the identification of well test models, where standard type curves are translated both horizontally and vertically with respect to field data plots.

2 Introduction

Higher order neural networks are a special case of common conventional feedforward networks (CF), sometimes referred to as backpropagation networks. There are other topologies of neural networks: adaptive resonance theory (ART), weightless systems (WS), Hopfield nets, Boltzmann machines, and the Kohonen feature map. The most widely used of these networks are the CF networks, and they have been applied in extensive areas such as speech recognition, image processing, and medical diagnosis. Despite their robustness, there is an enormous cost associated with their training. Using higher order neural networks, this training time can be reduced substantially for certain classes of problems, e.g. problems with transformation properties, such as translation, scaling, and rotation. If any of these properties are known a priori then they can be built into the architecture of higher order neural networks, thereby providing some of the tools required by the network to execute its job. For example, a child will identify a certain pencil regardless of its position or orientation; in this case the transformation groups are both translation and rotation.

We briefly describe pattern recognition and some of the terms used in this field, describe some of the general properties of neural networks with specific references to CF networks, define higher order neural networks and their implementation, and show how such networks can be extended to multiphase flow regime identification, interpretation of well logs, and seismic data processing.

3 Pattern Recognition

In this section, we give a brief introductory discussion of pattern recognition in general with emphasis on adaptive pattern recognition. The common types of pattern representation formats are examined. Also, the

focus and objectives of pattern recognition are explained.

3.1 General Description

Pattern recognition stems from statistics. Various types of classifiers are employed, for example the Gaussian classifier and the Bayes methodology. In the Gaussian classifier it is presumed that the underlying distribution of the class is Gaussian and that these classes differ only in their mean and variance. Hence classification is based on maximum likelihood. The likelihood of some unknown input is calculated for all known classes and it is then categorised according to the maximum of these values. This is different in style from the neural network approach, although it can be shown that the neural classifiers are somewhat analogous to Bayes methodology [Lippmann, R.P.36 - 1987].

Pattern recognition is the process of determining the classes for sets of features, such as speech utterances or images, in a robust manner despite distortions, variations, or omissions. It is also valid to view this process as the ability to retrieve information (whole patterns) from associated cues, consisting of a subset of the representative features. There are great motivations for the study of pattern recognition. It is useful and interesting to capture the tasks of perception and cognition. The formalisation and explication would lead to the construction of new machines to perform certain mundane jobs. Moreover, correctly applied, these techniques should produce consistent results. Secondly, it is imperative to understand how pattern-formatted information is handled, because an isolated fact or even a body of isolated facts is not very useful. It is the interrelationship of these facts that gives a meaningful interpretation. In some cases, the interrelationships are implicit, in the sense that all the facts represent the same object. For example, a set of railway carriages running under London conveying commuters from destination A to B is a train. In other cases relationships amongst the facts must be explicit. Consider the set of numbers

{3.7, 3.8, 3.9, 4.0}

Standing alone these features do not make much sense; they could represent some aspect of the weather or any other object. In this particular case, it is the average score of a student for four college years. The objective might be to use this set to compute the student's final grade.

Pattern recognition tasks can be executed using artificial intelligence, traditional pattern recognition, and neural networks, or some composite mixture of these techniques.

Artificial intelligence is based on the idea of a representation hypothesis. That is, that universal knowledge can be acquired, manipulated and interpreted in a symbolic fashion. This procedure is limited in that it is practically impossible to capture the universe; notwithstanding, there have been numerous successful applications of this approach to real-life problems. The method contrasts with our knowledge of biological information processing. Although biological systems undoubtedly abstract universals, most learning is incremental, hence the connotation of adaptive pattern recognition.

Traditional pattern recognition uses various statistical methods, and sometimes mathematical linguistics in the classification of syntactic structures. Fuzzy logic has also aroused some interest. This is applicable where a nonnumeric feature value can be represented by a set of quantitative measurements with an associated probability distribution. Such a set is referred to as a fuzzy set; there is a probability associated with membership.

Neural networks model the biological nervous system (see Figure 1) and offer an alternative computing paradigm closer to the biological model. These networks comprise layers of nodes (numerous elemental processors executing simple functions); the simplest form has an input layer and an output layer of nodes. These networks are designed to find mappings (in this case equations, or functional relationships to characterise the system in question) from a larger dimensional input space (e.g. the digitised pressure derivative response to a given flow rate) to a lower dimensional output space (for example, reservoir models) as shown in Figure 2. These processors are interconnected in such a way that tasks are performed in parallel. This approach seems plausible for modelling both perception and cognition. Inspirations for these models are received from other disciplines like neurobiology and psychology.

The following subsections describe some properties of pattern recognition.

3.2 Mappings

Pattern recognition can be described as the determination of a mapping from a pattern space into a class space. The process by which this mapping is performed by an observer in nature is often rather vague, in the sense that the process cannot be written down explicitly. It is just a case of identifying the object based on experience; whereas, for a successful implementation of pattern recognition, it is necessary to be able to describe the details of finding appropriate mappings from the pattern space to the class space. The formation of these mappings involves two distinct steps. First, an instance of the object is described in terms of appropriate features. The second stage uses a distinct procedure to carry out the required mapping. The mapping is the result of identification of relevant and irrelevant features in the feature set. It should be stressed at this stage that the first stage is of paramount importance and also an integral part of the whole exercise. It is difficult
to determine the features which are representative of an object, whereas there is theoretical guidance for constructing the mapping. The choice of inappropriate features can lead to complex decision rules, while adequate features result in simple and comprehensive rules.

3.3 Objectives

There are two valid views of pattern recognition, which could be considered as its objectives - classification and estimation of attributes. Under classification, the concerns are:

1. Performing feature extraction: this is the symbolic representation of instances of an object.
2. Learning a mapping: a training set is used to dichotomise the pattern space into appropriate decision regions.
3. Using the mapping to execute the classification task.

This view is considered by some [Pao, Y.32 - 1989] as limited, in that it does not encompass the diverse uses of pattern recognition in solving real-world problems.

The alternate view only differs in the second stage, in that the mapping is done from a set of input features onto a set of output features.

3.4 Formats

A pattern could be viewed as an instantiation of a data structure of features. The features themselves are also data structures. Each feature has as its structure: feature name, feature value, and relationships with other features. There are two major types of pattern representation formats - numeric and nonnumeric. Numeric representation is illustrated as follows: e.g. students in a class can be classified into three categories - excellent, very good, and good - based on their scores in seven subjects. These subject names represent the feature names and the scores represent the feature values. For instance, student A might be represented thus,

A = {Physics: 90, Chemistry: 75, Biology: 95, Maths: 78, English: 87, Religious Studies: 90, Add Maths: 98}

If the order of the features is fixed and their names are implicit then

A = {90, 75, 95, 78, 87, 90, 98}

This representation format was influenced by decision theoretics and received considerable favour amongst researchers for a number of reasons. The representation is vectorial, hence pattern classification could be regarded as performing a form of metric calculation. For example, two patterns are similar if the Euclidean distance between them is smallest compared to all other patterns. Another reason is that a lot is known about co-ordinate transformations, hence such transformations could be used to obtain more convenient representations. With numeric representation, classification is either deterministic or non-deterministic [Specht, D.38 - 1990]. In the former case, decisions are based on schemes such as nearest neighbours or discriminants. In the latter case decisions are subjective beliefs based on Bayes' theorem.

Numeric representation is very convenient, but there are occasions where it is impossible to represent patterns in this format. In such cases, nonnumeric symbols are used and the well established metric measurements are not applicable.

There are many ways of classifying linguistic features. One method is to ascribe arbitrary values to the nonnumeric symbols, thereby using a metric calculation. For example, we might reasonably describe a fruit with the following attributes: colour, weight, and taste. Appropriate values are ascribed to these features using some heuristics. Another procedure is to determine an exact match to a class on the number of matching features; this is the Hamming distance metric. Some researchers have retained the feature name-feature value pair as a feature and ascribed a certainty value or belief to this new feature.

Finally, the interest of pattern recognition might just be to determine the structural relationships of features, or to determine whether the pattern of symbols has been generated by some production rules. In this case, the feature size or its ordering is irrelevant. One area in which this type of representation has been successful is syntactic pattern recognition. Under this scenario patterns are viewed as sentences, where classification is described as the parsing of these sentences to see whether they satisfy a certain grammar, i.e. they belong to certain classes. The grammars are often context-free. Good examples are the various language compilers used in computing.

4 General properties of neural networks

This section describes a new type of computational model based on the biological nervous system, in terms of certain components and properties common to this class in general. This class of models is referred to as neural networks or parallel distributed processing (PDP) models.

4.1 Common Properties

1. Representation
2. Local learning and distributed representation
3. Attractive properties
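Before these properties are examined in detail, it may help to make the Gaussian maximum-likelihood classification of section 3.1 concrete. The following minimal sketch is illustrative only and is not taken from the paper; the class labels, data and function names are hypothetical assumptions. It fits an independent Gaussian to each class and assigns an unknown feature vector to the class with the highest likelihood.

import numpy as np

def fit_gaussian_classes(training_sets):
    # Estimate a mean vector and a (diagonal) variance for each class.
    params = {}
    for label, samples in training_sets.items():
        x = np.asarray(samples, dtype=float)
        params[label] = (x.mean(axis=0), x.var(axis=0) + 1e-6)
    return params

def log_likelihood(x, mean, var):
    # Log of an independent (diagonal-covariance) Gaussian density.
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def classify(x, params):
    # Maximum-likelihood decision: pick the class with the largest likelihood.
    x = np.asarray(x, dtype=float)
    return max(params, key=lambda label: log_likelihood(x, *params[label]))

# Hypothetical two-class example with two features per pattern.
training = {
    "homogeneous": [[1.0, 0.2], [1.1, 0.25], [0.9, 0.18]],
    "fractured":   [[0.4, 0.9], [0.5, 1.0],  [0.45, 0.85]],
}
model = fit_gaussian_classes(training)
print(classify([0.95, 0.22], model))   # expected: "homogeneous"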
4.1.1 Representation

The knowledge in PDP models is stored in the connection strengths between the units. This information embedded in the strengths is used in the classification and pattern retrieval process. This type of representation contrasts with the conventional storage schemes, where the knowledge of a pattern is stored as a static copy of the pattern itself. In this case, retrieval is the process of searching for the item in the long-term memory and copying it into the working buffer. In such systems there is no difference between the pattern and the stored information.

As a result of the knowledge representation in PDP models, both the processing and learning are affected enormously. The course of processing is influenced by the knowledge, that is, the knowledge is part and parcel of processing; whereas in conventional models, knowledge is used just for finding the right information in memory.

Since knowledge in these models means the strengths of the connections, this implies that these machines can actually learn. This is achieved by tuning their connection strengths to reflect the interdependencies between activation units when exposed to training data. In this scenario, the system could be viewed as plastic, in that the interconnections could change at any time through experience. That is, the weights change as a function of experience. In this way the system is said to develop. What a connection represents can change with experience, and the system can perform in substantially different ways.

4.1.2 Local learning and distributed representation

The knowledge required to store any given pattern is neither stored in a particular unit nor in the connections to a given unit. In parallel distributed processing models, the knowledge is distributed over the learnable parameters; it is the pattern of activations of the system that yields the required information. The units in certain cases may represent conceptual primitives, and in other cases they are only meaningful as a group.

The learning is performed locally in most network paradigms. That is, the adjustment of the connection strength requires just the activations of the connecting units.

4.1.3 Attractive properties

Some PDP models are like the pattern associators - models in which a pattern of activation over one set of units can cause a pattern of activation over another set of activation units. In these models similar patterns tend to reinforce the connection strengths between units, such that if similar patterns are presented over and over again, but for each presentation noise was added randomly, the system would learn to associate the central tendency of the pattern pair and ignore the noise. Whereas if uncorrelated patterns are presented they do not interact with each other in this way.

4.2 Common Components

1. Processing units
2. Activation state
3. Output function for each unit
4. A pattern of connectivity among units
5. Propagation rules
6. Activation rules
7. Learning rules
8. An environment

4.2.1 Processing units

Defining the set of processing units is a vital stage of a PDP model. Typically, for some paradigms (such as feedforward networks) the input and output units might denote conceptual objects such as features, letters, and words, and the hidden units are just artifacts of the model. For other paradigms, these processing units usually do not represent anything, but stand for abstractions, in which case meaningful attributes are ascribed to them during learning. These units do not denote one single concept as found in other conventional models; rather the pertinent information about the whole pattern is distributed among connection weights or learnable parameters.

Each unit does its own simple job - typically, it calculates the weighted sum of its net inputs; this value is passed through a transfer function to compute an output which it sends to other units.

Feedforward networks typically have three layers of units: input, hidden and output layers. The input layer receives data from the environment. This could be sensory data, or other appropriate sets of features representing the pattern. The hidden layer recodes the input data to generate an appropriate internal representation. That is, essential features of the patterns are retained. The essential parameters from the network are related to its environment via the output layer.

4.2.2 Activation state

At any point in time, every unit in the system has a value attributed to it, known as its activation. It is the pattern of activation over the set of units that captures what the system is representing at any given time.

These activation values can either be binary or continuous. They are also either bounded or unbounded. These constraints lead to models with slightly different characteristics.
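As a minimal sketch of the unit computation described in sections 4.2.1 and 4.2.2 (illustrative only; the sigmoid transfer function and all names are assumptions, not prescribed by the paper), each unit forms the weighted sum of its inputs and passes it through a transfer function, and a layer of such units can be updated synchronously.

import numpy as np

def unit_output(weights, inputs, g=lambda net: 1.0 / (1.0 + np.exp(-net))):
    # One processing unit: weighted sum of inputs passed through a transfer
    # function g (identity, step or sigmoid, depending on the model).
    net = float(np.dot(weights, inputs))
    return g(net)

def synchronous_update(weight_matrix, inputs):
    # Synchronous update: every unit in the layer computes its new output
    # from the same input pattern at the same time.
    return np.array([unit_output(w, inputs) for w in weight_matrix])

# Illustrative three-input, two-unit layer with arbitrary weights.
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.4]])
v = np.array([1.0, 0.0, 1.0])
print(synchronous_update(W, v))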
4.2.3 Output function of units

The interaction among units is due to their out-going signals. These signals are functions of the activation values. The relationship between the output signals and the activations can be varied in different models or within a model. For example, in some cases it is a linear relation, where the output is the same as the activation. In others it is a step function, where the output signal is not activated unless the activation is above a certain value. Sometimes it is a stochastic relation, that is, the output is related to the activation by some probabilistic function.

4.2.4 Pattern of connectivity

The pattern of connectivity determines how much knowledge is stored in a network. It is this pattern and the inputs that determine the activation value of each unit. In a given PDP model there are different kinds of inputs coming to a unit. If the associated strength of an input is positive, it is described as an excitatory input; if it is negative it is defined as an inhibitory input; and if this value is zero, it is presumed there are no connections to the unit. The activation of a unit is a function of its input strengths and outputs. It is the combination rules used by the function that determine the type of connectivity.

In many cases, it is presumed that each input has an additive contribution to its unit, such that the activation of the unit is the weighted sum of its inputs - this is one kind of connectivity. In other cases, there might be a more complex rule for the different kinds of inputs. For example, in [Jones, A.J.16 - 1992], in which ellipsoidal and quadratic activation functions are used, the function which computes the activation depends on a number of parameters which are more conveniently associated with the node itself, rather than using a weight matrix related to the connectivity (as in more conventional models).

4.2.5 Propagation rule

This requires a specification of how every unit is connected to every other unit. If there are no feedback loops in the connectivity matrix then the model is a simple feedforward model and is essentially a transfer function from input nodes to output nodes. If feedback loops are present then the model has its own intrinsic temporal behaviour, i.e. it becomes a dynamic system. Such models are extremely interesting but fall outside the scope of this work.

4.2.6 Activation rules and output

The primary function of a node is to make a distinction. This is usually done using, in the first instance, a linear function of the inputs to bisect the input space. If the number of inputs is m then this naturally leads to m weights which can be associated with the node or, more conventionally, with the links leading to the node. In this case the activation function is computed as

net_j = Σ_{i=1}^{m} w_{ji} V_i  . . . . . . . . . . (1)

where w_{ji} is the weight from j to i and V_i is the output of node i. The distinction is then made using an output function g(net_j) which can take binary values.

If the activation function is a non-linear function of the inputs we call the network a higher-order network. For example, in the ellipsoidal model the activation function becomes

net_j = Σ_{i=1}^{m} A_{ji}^2 (V_i - C_{ji})^2  . . . . . . . . . . (2)

where the A_{ji} terms are squared to ensure that the quadratic form is positive definite; thus net ≤ 1 defines a closed ellipsoidal region of the input space.

The net inputs of each kind are combined with the current activation of a unit, a_i(t) = net_i(t), to produce a new activation for the unit at a given state a_i(t+1). The output at time t is generally represented by g(net_i(t)), where, depending on the model being used, g could be the identity function, a step function, or some type of stochastic function.

The output value has two alternative interpretations. In one case, it is valid to consider this value as a degree of confidence that the preferred feature is present. The alternative case is that the unit encodes (within a certain range) the amount of its preferred feature present. This rule is applied in some models to all units simultaneously, that is, new values are determined at a regular time. This is known as synchronous update. In other models the rule is applied to a small number of units at a time. These models are said to perform asynchronous update.

4.2.7 The learning rule

In neural networks, a specific mapping is implemented via the learning process, through the iterative adaptation of weights (or parameters) based on a learning rule and the network's response to a training signal. In general the mapping to be learnt is represented by a training set of input patterns and output patterns. The training set is a subset of possible examples of the mapping.

There are two basic classes of learning in PDP models - associative learning and regularity detectors, although in
certain instances the difference is subtle. In associative learning, the objective is to learn the association between patterns such that if a noisy or good pattern is presented to the network subsequent to training, it will respond with the appropriate output pattern. This association is either hetero-association or auto-association. In the former case, two distinctive patterns are shown to the network, one as the input pattern and the other as the required output. Whereas for auto-associative systems the same pattern is used for both the input and the required output.

For regularity detectors no output is provided; the unit will learn to respond to certain features depending on an internal teaching function and the nature of the input patterns. A system with such properties is said to be undergoing unsupervised learning. Learning takes place in this type of network without the provision of training signals. The problem with this paradigm is that it is not known a priori whether the classification categories generated will be useful or interesting. It is possible to hand-craft these networks with classification types using high-order competitive networks [Giles, C.L. et al.13 - 1988].

The Kohonen feature map is an example of a network that implements this approach. The training rule in these networks is very powerful in that it disallows one unit capturing many features.

The knowledge in parallel distributed processing models is stored in the connections between the units. The modification of the pattern of connectivity implies a change of its knowledge. Basically, learning could be in any of the following forms:

1. Development of a new connection
2. Loss of an existing connection
3. Modification of an existing connection

Very little work has been done on the first two modes of learning; however they may be regarded as special cases of the last one in that, if an existing connection with zero strength changes into a positive or negative value, then it is conceived as development of a new connection. In the case where the value of a connection changes to zero, then it is presumed the connection is lost.

The original learning rule was developed by [Hebb, D.O.34 - 1949]. It simply states that the connection between two simultaneously active units is strengthened, and nothing happens otherwise. This rule is not very practical; however it laid the basis for some of the more sophisticated rules used in most PDP models. There are many variants to this rule:

Let
u_i = unit i
u_j = unit j
w_ij = weight from unit j to unit i
a_i(t) = activation of unit i at time t
t_i(t) = target value of unit i at time t
v_i(t) = output value of unit i at time t
v_j(t) = output value of unit j at time t
η > 0 = learning rate

Original Hebbian learning rule:

Δw_ij = η v_i(t) v_j(t)  . . . . . . . . . . (3)

Extended and generalised Hebbian learning rule:

Δw_ij = f{a_i(t), t_i(t)} g{v_j(t), w_ij}  . . . . . . . . . . (4)

where f is some function of the current activation and target of unit i, and g is a function of the in-coming unit, unit j, and the current weight associated with the two units.

Widrow-Hoff rule:

Δw_ij = η{t_i(t) - a_i(t)}  . . . . . . . . . . (5)

Generalised Delta Rule:

Δw_ij = η{t_i(t) - a_i(t)}{v_j(t)}  . . . . . . . . . . (6)

Grossberg Learning Rule:

Δw_ij = η{a_i(t)}{v_j(t) - w_ij}  . . . . . . . . . . (7)

4.2.8 Environment

PDP models are designed for different kinds of environments. It is important to have a clear model of this environment. In general, this model is a time-varying stochastic function over the set of input patterns. This probabilistic function might also be dependent on the past inputs and outputs of the system. In certain cases, these models are also restricted by the type of inputs. Some will only function properly for an orthogonal set of input patterns, or just a linearly independent set of input patterns. In most cases, they accept arbitrary sets of input patterns.

5 Definition of higher order networks

Higher order networks are characterised by a non-linear net_i which is a higher order polynomial function of the inputs, rather than a simple linear function. If we define the activation (value of node) of node i with inputs V_{k_1}, V_{k_2}, ..., V_{k_N} by

net_i = Σ_{p=0}^{r} Σ_{(k_1,...,k_p)} w_p(k_1,...,k_p) V_{k_1} ... V_{k_p}  . . . . . . . . . . (8)

where the inner sum is taken over all p-tuples (k_1,...,k_p) {vectors of dimension p} with 1 ≤ k_j ≤ N, the network is
said to be of order r. The weight w_0 corresponds to a threshold and in general the weights w_p(k_1,...,k_p) correlate some subset of p inputs (0 < p ≤ N).

Such a unit is referred to by some writers as a higher-order logic unit, HOLU. Layers consisting of at least one of these units are referred to as higher-order slabs, an example of which are the sigma-pi units discussed by [Rumelhart, D.E. et al.19 - 1986] or the ellipsoidal unit mentioned earlier [Jones, A.J.16 - 1992]. Note that if only the first order terms are considered then a higher-order network becomes a conventional single layer network.

5.1 Operation of higher order networks

In general, the operation of a network can be viewed as the finding of appropriate functions to carry out a given task. A network could learn this itself or it could be built into its architecture, if it is known. For instance, a robot could be built with the prior knowledge of a physical law. For higher order networks, it is possible to encode invariance into the network architecture. Consequently functions that are needed by the network in solving its problem are provided, thereby reducing the training time. The high order correlations in these networks capture the invariance in patterns. The invariants of patterns are recognised through the formation of complex mappings from a high dimensional input space (such as the pressure derivative data) to a low dimensional output space (classification class - homogeneous or fractured reservoirs). These mappings are represented internally in the network. These representations are formed adaptively via learning as discussed in the backpropagation algorithm [Rumelhart, D.E. et al.19 - 1986].

The training of higher order networks can, in principle, be done by a modified backpropagation algorithm. The higher order network operations are very similar to conventional networks. Single layer higher order networks can model more than just the simple correlations of inputs which obtain for single layer linear activation (r = 1) networks. However, an apparent significant initial disadvantage is the rapid increase in the number of weights, which is now, in the worst case (r = N), as many as 2^N per node. For a sizeable retina this could easily become impractical.

5.2 Advantages of higher order networks

It is necessary to recognise patterns regardless of position, rotation, and scaling in both x and y directions. Problems of this nature are abundant in the real world; well test model identification is one of them. Well test models are invariant under translation and scaling in both x and y directions. This problem could be solved in a number of ways using neural networks. In one particular case, the pattern is pre-processed such that invariant properties are recoded before being passed to the neural networks. This is achieved using transforms such as Fourier, Hough, and Zernike moments, but they are computationally expensive and also very sensitive to noise [Perantonis, S.J. et al.25 - 1992]. Another way of solving this problem (or problems of this nature in general) is to hard-code the neural network, in the sense that all possible transformed patterns are shown to the network. This has a greater number of disadvantages in that it is pattern specific, it is dependent on the learning rule, and it is also dependent on the weights assumed by the network.

The problems associated with the above two cases are not applicable to higher order networks. Higher order networks can be constructed such that their outputs are invariant to various transformations of the input data or the retina. The inherent advantage of this process is that the network weights are dramatically reduced and the topology reduces to that of a single layer network, which itself reduces the training time.

The methodology of higher order neural networks is given in appendix A.

6 Application of higher order neural networks

In this section, we suggest some of the applications of this technique in petroleum engineering: well log interpretation, multiphase flow analysis and seismic data processing. A full description of this approach to the identification of well test interpretation models is given in [Kumoluyi, A.O. et al.7 - 1993]. Also, the procedures outlined in the preceding paper are applicable to the following cases.

6.1 Well logs interpretation

In interwell correlation, this system can be used to establish the continuity of stratigraphic units. Initially, the network is trained on all anticipated patterns as shown in Figure 3 (spontaneous potential logs). The first well is used to augment the trained network for any new pattern (previously unseen lithology unit). As more wells are correlated, newly seen patterns are also used to update the trained network. Higher order neural networks will play significant roles, because they can be trained in real time. Also, both translation and scaling invariance can be built into their architecture. These constraints are necessary for automatic interwell correlation because stratigraphic units have inherently different sizes and depths, caused by both diagenesis and post-deposition effects. This is illustrated in Figure 4 (where the same unit in 4A, 4B, and 4C has different sizes and is also at varying depths). The following condition,

(x_1 - x_2) / (y_1 - y_2)  . . . . . . . . . . (9)
can be used to constrain the architecture of the higher order neural network, if it is assumed that scaling is the same in both the horizontal and vertical directions. That is, for the higher order neural network to be both translation and scaling invariant, any two points with the same inverse slope must be in the same equivalence class. When different scaling factors are used for the horizontal and vertical directions:

1. All first order weights must be in the same equivalence class.
2. All second order weights must also be in the same equivalence class.
3. The equivalence classes of third order weights can be determined using the following constraint:

(x_1 - x_2) / (y_1 - y_2) = (x_2 - x_3) / (y_2 - y_3)

This means that all three points on the retina that satisfy the above criterion are placed in the same equivalence class. These constraints can be deduced by following the steps highlighted in appendix B.

6.2 Multiphase flow analysis

In multiphase flow analysis, when it is required to determine the flow regime for given flow conditions (rate, pressure, line size, and inclination), this method can be used to automate such a process. The normal scenario of training, evaluating and subsequent testing of patterns is carried out. The training and evaluation data can be obtained from theoretical models proposed by [Taitel, Y. et al.39 - 1976]. These models are translation invariant, caused by the variation of pipe size, fluid properties, and angle of inclination. Hence the condition in appendix B must be applied to training, evaluation, and test data.

6.3 Seismic data

In seismic data processing, one of the primary objectives is to compress the large amount of data without compromising vital information. In this particular instance, the affine mapping plays the central role. Consider the general affine form:

(x, y)^T = [a b; c d] (u, v)^T + (e, f)^T  . . . . . . . . . . (10)

where

x = au + bv + e  . . . . . . . . . . (11)

y = cu + dv + f  . . . . . . . . . . (12)

a, b, c and d are the affine coefficients for rotation, skewing, expansion, and contraction, and e and f are the translation coefficients. Making an appropriate change of variables and following the steps highlighted for translation invariance (appendix B), the pertinent constraint is obtained. This is used to alter the structure of the higher order neural network in order for it to work accordingly for data compression. Finally, higher order neural networks can be an integral part of automated structure map generation. In this case, the network is trained on the different varieties of deformations (faults and folds) as shown in Figure 5. Then, as each different subsection is shown to the network, its knowledge is updated. At the same time, lithology boundaries and deformations are identified as illustrated in Figure 6. This figure shows unit boundaries displaced relative to each other as a result of diagenesis and post-depositional effects. For this classification, translation invariance is sufficient.

7 Conclusion

In this paper, we have discussed:

1. a general overview of pattern recognition
2. properties of neural networks
3. the definition of higher order neural networks and their advantages
4. the application of higher order neural networks to some areas of petroleum engineering

We hope this paper serves as a basis for those interested in adaptive pattern recognition and in particular researchers working on the application of neural networks. Higher order neural networks have diverse applications: image processing and speech recognition (a cohort of mine is currently implementing an embedded system for speech recognition using this technique).

It should be noted that the constraint given for each of the transformation groups is one of many possible constraints. A different constraint can be obtained for the same transformation property [Giles, C.L. et al.13 - 1988]. Finally, incorporating invariance into the architecture of higher order neural networks is mathematically rigorous.

8 Acknowledgements

We would like to express our profound gratitude to OMV (UK) Ltd and in particular to Mr. Mike Lucas for supporting this project. Special thanks to Dr. A.J. Jones of the Department of Computing, Imperial College, London for her assistance.

9 References

1. Horne, R.N. and Allain, O.F.: "Use of Artificial Intelligence in Well-Test Interpretation," JPT (March 1990) 342-349.
2. Al-Kaabi, A.U., McVay, S.A., Holditch, S.A., and Lee, W.J.: "Using an Expert System to Identify the Well Test Interpretation Model," paper SPE 18158 prepared for the 63rd Annual Technical Conference and Exhibition of the SPE held in Houston, Texas, October 2-5, 1988.
3. Erdle, J.C., Archer, D.A., St@jj, T.J., and Callihan, M.: "Well Test Software with Built-in Expert Advice," paper SPE 15309 presented at the 1986 Symposium on Petroleum Industry Application of Microcomputers, Silver Creek, June 18-20, 1989.
4. Stewart, G. and Du, K.F.: "Feature Selection and Extraction for Well Test Interpretation by an Artificial Intelligence Approach," paper SPE 19820 prepared for the 64th Annual Technical Conference and Exhibition of the SPE held in San Antonio, Texas, October 8-11, 1989.
5. Watson, A.T., Gatens III, J.M., and Lane, H.S.: "Model Selection for Well Test and Production Data Analysis," SPEFE (March 1988) 215-221.
6. Earlougher, R.C. Jr.: "Advances in Well Test Analysis," Monograph Series, SPE, Richardson, Texas, 1977.
7. Kumoluyi, A.O., Daltaban, T.S., and Archer, J.S.: "Well Test Model Identification Using Higher Order Neural Networks," paper SPE 27558 prepared for the European Petroleum Computer Conference, Aberdeen, Scotland, March 14-17, 1994.
8. Matthews, C.S. and Russell, D.G.: "Pressure Buildup and Flow Tests in Wells," Monograph Series, SPE, Richardson, Texas, 1967.
9. Nauta, W.J.H. and Feirtag, M.: "Fundamentals of Neuroanatomy," Freeman Press, New York, 1989.
10. Patten, B.M. and Carlson: "Foundations of Embryology," McGraw-Hill, New York, 1974.
11. Carpenter, M.: "Human Neuroanatomy," Williams and Wilkins, 7th ed., Baltimore, 1979.
12. Albus, J.S.: "Brains, Behaviour and Robotics," Byte Publications, Peterborough, N. Hampshire, 1981.
13. Giles, C.L., Griffin, R.D., and Maxwell, T.: "Encoding Geometric Invariances in Higher-Order Neural Networks," American Institute of Physics, 1988.
14. Minsky, M.L., and Papert, S.: "Perceptrons," MIT Press, Cambridge, MA, 1969.
15. Maxwell, T., Giles, C.L., Lee, Y.C., and Chen, H.H.: "Transformation Invariance Using High Order Correlations in Neural Net Architectures," IEEE Trans. Syst. Man Cybernetics, SMC-86CH2364-8, pages 627-631, 1989.
16. Jones, A.J.: "Models of Living Systems: Evolution and Neurology," Department of Computing, Imperial College, London, 1992.
17. Jones, A.J.: "The Modular Construction of Dynamic Nets," Neural Computing and Applications, Springer International, Vol. 1, No. 1, 1993.
18. Huang, W.Y., and Lippmann, R.P.: "Neural Nets and Traditional Classifiers," American Institute of Physics, 1988.
19. Rumelhart, D.E., and McClelland, J.: "Parallel Distributed Processing," Vol. 1, M.I.T. Press, 1986.
20. Veezhinathan, J., and Wagner, D.: "A Neural Network Approach to First Break Picking," IJCNN Int. Joint Conf. on Neural Networks, Pages I-235 thru' I-240, San Diego, CA, USA, 17-21 June, 1990.
21. Kimoto, T., Asakawa, K., Yoda, M., and Takeoka, M.: "Stock Market Prediction System with Modular Neural Networks," IJCNN Int. Joint Conf. on Neural Networks, Pages I-1 thru' I-6, San Diego, CA, USA, 17-21 June, 1990.
22. Yang, H., and Guest, C.C.: "Higher Order Neural Networks with Reduced Numbers of Interconnection Weights," IJCNN Int. Joint Conf. on Neural Networks, Pages III-281 thru' III-284, San Diego, CA, USA, 17-21 June, 1990.
23. Maren, A.J., Harston, C.T., and Pap, R.M.: "Handbook of Neural Computing Applications," Academic Press, London, 1990.
24. Reid, M.B., Spirkovska, L. and Ochoa, E.: "Rapid Training of Higher-Order Neural Networks for Invariant Pattern Recognition," Proc. IJCNN Int. Conf. Neural Networks, Vol. 1, Pages I-689 thru' I-692, 1989.
25. Perantonis, S.J. and Lisboa, P.J.G.: "Translation, Rotation, and Scale Invariant Pattern Recognition by High-Order Neural Networks and Moment Classifiers," IEEE Transactions on Neural Networks, Vol. 3, No. 2, Pages 241 thru' 251, March 1992.
26. Shin, Y. and Ghosh, J.: "The Pi-Sigma Network: An Efficient Higher-Order Neural Network for Pattern Classification and Function Approximation," IJCNN Int. Joint Conf. on Neural Networks, Pages I-13 thru' I-18, Seattle, WA, USA, 8-14 July, 1991.
27. Minai, A.A. and Williams, R.D.: "Acceleration of Back-Propagation through Learning Rate and Momentum Adaptation," IJCNN Int. Joint Conf. on Neural Networks, Pages I-676 thru' I-679, Theory Track, Neural and Cognitive Sciences, Washington DC, USA, 15-19 Jan., 1990.
28. Namatame, A.: "Backpropagation Learning with High-Order Functional Networks and Analyses of Its Internal Representation," IJCNN Int. Joint Conf. on Neural Networks, Pages I-680 thru' I-683, Theory Track, Neural and Cognitive Sciences, Washington DC, USA, 15-19 Jan., 1990.
29. Klassen, T., Pao, Y.H., and Chen, K.: "Characteristics of the Functional Link Nets," IEEE Int. Conf. on Neural Networks, Vol. 1, Pages 509 thru' 513, 1988.
30. Weigend, A.S., Rumelhart, D.E. and Huberman, B.A.: "Generalisation by Weight-Elimination applied to Currency Exchange Rate Prediction," IJCNN Int. Joint Conf. on Neural Networks, Pages III-2374 thru' III-2379, Singapore, 18-21 Nov., 1991.
31. Dobbins, R.W. and Eberhart, R.C.: "Neural Network PC Tools - A Practical Guide," Academic Press, Inc., California, 1990.
32. Pao, Y.: "Adaptive Pattern Recognition and Neural Networks," Addison-Wesley, Reading, 1989.
33. Dayhoff, J.: "Neural Network Architectures: An Introduction," Van Nostrand Reinhold, London, 1990.
34. Hebb, D.O.: "The Organisation of Behavior," Wiley, New York, 1949.
35. McCulloch, W.S. and Pitts, W.: "A Logical Calculus of the Ideas Immanent in Nervous Activity," Bulletin of Mathematical Biophysics, 5, Pages 115 thru' 133, 1943.
36. Lippmann, R.P.: "An Introduction to Computing with Neural Nets," IEEE ASSP Magazine, Pages 4 thru' 22, April, 1987.
37. Lippmann, R.P.: "Pattern Classification Using Neural Networks," IEEE Communications Magazine, Pages 47 thru' 63, November, 1989.
38. Specht, D.: "Probabilistic Neural Networks," Neural Networks 3, Pages 109 thru' 118, 1990.
39. Taitel, Y. and Dukler, A.E.: "A Model for Predicting Flow Regime Transitions in Horizontal and Near Horizontal Gas-Liquid Flow," AIChE Journal, Vol. 22, No. 1, Pages 47 thru' 55, January, 1976.
40. Lee, J.W.: "Well Testing," SPE Textbook Series Vol. 1, New York, 1982.
41. Rosenblatt, F.: "The perceptron: a probabilistic model for information storage and organisation in the brain," Psychological Review 65, Pages 386 thru' 408, 1958.
42. Pitts, W. and McCulloch, W.S.: "How we know universals: the perception of auditory and visual forms," Bulletin of Mathematical Biophysics 9, Pages 127 thru' 147, 1947.
43. Hornik, K.: "Multilayer feedforward networks are universal approximators," Neural Networks 2, Pages 359-366, 1989.
44. Jones, A.J.: "Genetic Algorithms and their applications to the design of neural networks," Neural Computing and Applications, Springer International, Vol. 1, No. 1, 1993.
45. Al-Gheithy, A.A.: "Oil Well Testing in Stratified Reservoirs," Ph.D. Thesis, Mineral Resources Engineering, Imperial College, London, 1993.
46. Gringarten, A.C., Bourdet, D.P., Landel, P.A., and Kniazeff, V.: "A Comparison Between Different Skin and Wellbore Storage Type Curves for Early-Time Transient Analysis," paper SPE 8205 presented at the 54th Annual Technical Conference and Exhibition of the SPE held in Las Vegas, Nevada, September 23-26, 1979.
47. Smith, C.U.M.: "Elements of Molecular Neurobiology," John Wiley & Sons, 1989.
48. Baba, N., Yamashita, Y., and Shiraishi, Y.: "Classification of flow patterns in two phase flow by neural network," ICANN-91 Int. Conf. on Artificial Neural Networks, Pages 1617 thru' 1620, Finland, 24-28 June, 1991.

Figure 1 - Olfactory system neurons.

Figure 2 - Commingled system identified by neural networks.
Figure 3 - SP response to different lithologies (well log panels: spontaneous potential, conductivity, resistivity).

Figure 4 - Interwell correlation.
Figure 5 - Deformations (seismic sections).

Figure 6 - Identification of lithology boundaries and deformations.
10 Appendix A - Higher order neural networks

In this section, we discuss the implementation of higher order neural networks.

10.1 Method

The approach used here is based on an idea pioneered by [Giles, C.L. et al.13 - 1988]. The basic theme is that for the output of a higher order unit to be invariant under a transformation group operating on the input space, the weights must be appropriately constrained. This is achieved conceptually by averaging the inputs over the transformation group, hence applying a member of this group to an input will be invariant. This averaging operation allows the unit to only detect features which are compatible with the imposed group invariance. As a result of this averaging process, equivalence classes of interconnection weights are generated, thereby reducing the total number of higher order terms. To illustrate this methodology, it is necessary to digress and describe concisely the following concepts: equivalence relations, equivalence classes, and groups.

10.2 Equivalence relations

It is necessary that some nodes of, say, the input layer have the same weight, in order to implement translation invariance. This means that some features in the input layer must be in the same equivalence class. In order to determine the equivalence class of a given set (input nodes), the set must have an equivalence relation. Hence, in this section the mathematical concept of equivalence relations is explained as follows.

Let Z be some set. A relation R on Z is any subset of the cartesian product Z × Z. An equivalence relation E on Z is a particular kind of relation which satisfies three basic properties.

(Reflexive) For every a ∈ Z we have (a, a) ∈ E.
(Symmetric) (a, b) ∈ E implies (b, a) ∈ E.
(Transitive) If (a, b) ∈ E and (b, c) ∈ E then (a, c) ∈ E.

The first condition says that every member of Z is related to itself. The second says that if a is related to b then b is related to a. We often write a ~ b to stand for (a, b) ∈ E and speak of the equivalence relation '~'. We denote by

[a] = {b : b ∈ Z and b ~ a}

the set of all elements of Z related to a under ~, and call this the equivalence class of a. For example, if Z is the set of integers, n is some positive integer n > 1, and we define a ~ b (mod n) to mean that n divides a - b, then one can easily verify that this is an equivalence relation and that there are n equivalence classes.

The basic properties of equivalence classes are as follows:

Every element of Z is in some equivalence class.
Two equivalence classes are either equal or disjoint.

10.2.1 Equivalence Relations Proof

The following simple theorem describes the basic properties of equivalence classes.

Theorem.

Every element of Z is in some equivalence class.
Two equivalence classes are either equal or disjoint.

Proof.

1. By the reflexive axiom of the equivalence relation, a ~ a. Hence a ∈ [a].

2. Suppose [a] and [b] are not disjoint. Then there exists c in [a] ∩ [b]. Hence c ∈ [a] and c ∈ [b]. By definition of these sets we have c ~ a and c ~ b. By the symmetric axiom applied to the first of these we obtain a ~ c. We now have a ~ c and c ~ b. Hence by the transitive axiom a ~ b.

Now let x ∈ [a]; then, by definition, x ~ a. But a ~ b, so by the transitive property x ~ b. Hence x ∈ [b]. Hence [a] ⊆ [b]. Similarly, [b] ⊆ [a]. Hence [a] = [b] as required.

What this theorem says, in essence, is that any equivalence relation partitions a set Z into disjoint subsets (equivalence classes) such that their union is the whole of Z. The converse is also true. Given any disjoint partition of Z, say Z = ∪_α Z_α, where α ≠ β implies Z_α ∩ Z_β = ∅ and α ranges over some index set (not necessarily finite or even countable), then this induces an equivalence relation on Z defined by a ~ b if and only if a and b are in the same subset. It is easy to check that ~ so defined is indeed an equivalence relation, since axioms 1-3 are satisfied.

10.3 Groups

The input layer of a neural network consists of a set of features. In order to make the network transformation (translation, scaling, rotation) invariant, certain features, as stated earlier, must have the same weight. This means that the original set must be partitioned into disjoint subsets, each of which is known as a group (an equivalence class can be referred to as a group in this context). These groups are then used in the implementation of higher order networks.

The mathematical definition of a group is given as follows:
Let G = {g} be a finite group such that the following properties hold:

1. For all patterns x in the problem domain and for all g in G,

g x = v

where v is also a pattern in the problem domain.

2. For g_1 ∈ G and g_2 ∈ G,

g_1 g_2 ∈ G

that is, G is closed under composition.

3. There is an inverse transformation, I, in G such that if

g ∈ G then

g^{-1} ∈ G and g g^{-1} x = x

or

I x = x

It also follows that the associative property g_1(g_2 g_3) = (g_1 g_2) g_3 holds, implicit in condition 1.

The equivalence class of a set of patterns, x^s, is defined as

x^s = {g x^s | g ∈ G}

A higher order learning unit is invariant to this equivalence class of patterns if its output is such that

y(x^s) = y(g x^s) for all g in G and arbitrary x^s  . . . . . . . . . . (A.1)

This is true if and only if

y(I_G g_1 x^s) = y(I_G x^s) for g_1 ∈ G  . . . . . . . . . . (A.2)

where I_G is an invariance operator, defined as

I_G = Σ_{g ∈ G} g  . . . . . . . . . . (A.3)

This means that the invariance operator must be applied to all patterns for the output of a higher-order learning unit to be invariant to an arbitrary group.

The following example (the exclusive-or problem) demonstrates the capability of higher order neural networks; it is known that this example is a subset of most real-world problems in pattern recognition. This problem, which could not be formulated by the perceptron (single layer network) and took thousands of iterations by a multi-layer perceptron, is solved by this method in one iteration in most cases. The problem is described here: it is required that the output of a single-unit output layer be 1 if its input has an odd number of 1's, otherwise its value should be 0. There are four patterns in all, each of dimension 2, thus

Input          Output
V_1   V_2      Y
0     0        0
1     0        1
0     1        1
1     1        0

These patterns could be reformulated as

-1    -1       0
 1    -1       1
-1     1       1
 1     1       0

This problem is now invariant under the sign change group, S,

S = {s_0, s_1} = {-, +}

where

s_0{V_j} = -V_j and s_1{V_j} = V_j

The original dynamics of the system are therefore

Y = f{ Σ_j w_1(j) V_j + w_2 V_1 V_2 }  . . . . . . . . . . (A.4)

Summing over the sign change group,

Y = f{ Σ_j w_1(j)(-V_j + V_j) + w_2(V_1 V_2 + V_1 V_2) }  . . . . . . . . . . (A.5)

Y = f{ 2 w_2 V_1 V_2 }  . . . . . . . . . . (A.6)

The new dynamic is now equation (A.6), which could be learnt in one iteration of the patterns. Since there is no correlation between the V_j and Y, the first order weight decreases to zero in the learning, whereas V_1 V_2 and Y are perfectly correlated, hence the second order weight is learnt. This same technique is employed for geometric invariance.
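The exclusive-or example above can be reproduced in a few lines of code. The sketch below is illustrative only: the paper does not state the training rule used for this example, so a simple perceptron-style update on the single surviving second-order weight of equation (A.6), with a step output function, is assumed.

# Exclusive-or with a higher-order unit whose dynamics have been reduced by
# the sign-change group to Y = f(2 * w2 * V1 * V2), as in equation (A.6).
patterns = [((-1, -1), 0), ((1, -1), 1), ((-1, 1), 1), ((1, 1), 0)]

def f(net):
    # Step output function (an assumption; the paper leaves f unspecified).
    return 1 if net > 0 else 0

w2, eta = 0.0, 0.5
for (v1, v2), target in patterns:           # a single pass over the patterns
    y = f(2.0 * w2 * v1 * v2)
    w2 += eta * (target - y) * (v1 * v2)    # perceptron-style update (assumed)

print([f(2.0 * w2 * v1 * v2) for (v1, v2), _ in patterns])  # -> [0, 1, 1, 0]

A single pass over the four patterns drives w2 negative, after which the unit reproduces the exclusive-or outputs, consistent with the one-iteration behaviour described above.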
11 Appendix B - Implementation of translation invariance

Consider a retina of size N; where N is very large, it is valid to change the dynamics as defined in equation 8 into a continuous formulation. Moreover, this makes the derivation more general and it is easier to see the physical significance of the results.

Y = f{ w_0 + ∫dj w_1(j)V_j + ∫∫dj dk w_2(j,k)V_j V_k + ... + ∫...∫dj dk ... dn w_n(j,k,...,n)V_j V_k ... V_n }  . . . . . . . . . . (B.1)

Taking the terms up to the second order term for simplicity,

Y = f{ w_0 + ∫dj w_1(j)V_j + ∫∫dj dk w_2(j,k)V_j V_k }  . . . . . . . . . . (B.2)

Let an input pattern be translated by a distance m such that

V_j = V_{j+m}  . . . . . . . . . . (B.3)

For the output unit i to be invariant under this translation group means that equation A.3 holds, that is

Y = f{ w_0 + ∫w_1(j)V_j dj + ∫∫w_2(j,k)V_j V_k dj dk }
  = f{ w_0 + ∫w_1(j)V_{j+m} d(j+m) + ∫∫w_2(j,k)V_{j+m}V_{k+m} d(j+m) d(k+m) }  . . . . . . . . . . (B.4)

Because the values of the input V_j are arbitrary, term by term equality is imposed on B.4; thus

f{w_0} = f{w_0}  . . . . . . . . . . (B.5)

f{ ∫w_1(j)V_j dj } = f{ ∫w_1(j)V_{j+m} d(j+m) }  . . . . . . . . . . (B.6)

f{ ∫∫w_2(j,k)V_j V_k dj dk } = f{ ∫∫w_2(j,k)V_{j+m}V_{k+m} d(j+m) d(k+m) }  . . . . . . . . . . (B.7)

Periodic boundary conditions are imposed on the retina such that

V_j = V_{j+N}  . . . . . . . . . . (B.8)

Such periodic boundary conditions could be imagined as the sine wave, where the amplitude is the same at a given period.

Making the following change of variables:

jj = j + m  . . . . . . . . . . (B.9)

kk = k + m  . . . . . . . . . . (B.10)

equations (B.6) and (B.7) become

f{ ∫w_1(j)V_j dj } = f{ ∫w_1(jj - m)V_jj djj }  . . . . . . . . . . (B.11)

f{ ∫∫w_2(j,k)V_j V_k dj dk } = f{ ∫∫w_2(jj - m, kk - m)V_jj V_kk djj dkk }  . . . . . . . . . . (B.12)

Since the integrals on both sides of the equations are the same (that is, infinite) and the V_j are arbitrary, the functional form of the weights must be equal or, in the discrete implementation, certain weights must be the same.

w_1(j) = w_1(jj - m)  . . . . . . . . . . (B.13)

w_2(j,k) = w_2(jj - m, kk - m)  . . . . . . . . . . (B.14)

Equation (B.13) means that j = jj - m; this is not feasible because the distance m is not known. Hence all first order weights must be in the same equivalence class.

Equation (B.14) means that

j = jj - m  . . . . . . . . . . (B.15)

k = kk - m  . . . . . . . . . . (B.16)

This means that m could be expressed in terms of the known values j, k, jj and kk. Thus making a unit translation invariant implies that only terms up to the second order weights need be considered.

From equations B.15 and B.16,

j - k = jj - kk  . . . . . . . . . . (B.17)

That is, the second order weight must be dependent on the distance between two points rather than being dependent on the absolute points. Thus,

w_2(j,k) = w_2(j - k)  . . . . . . . . . . (B.18)
The implication of equation (B.18) in the discrete implementation is that the order of the second weight is reduced by N; that is, from a dimension of N x N to N. The equivalence classes of weights that satisfy equation (B.17) are determined. Each equivalence class will be associated with a weight, ECW, that denotes the set of weights in its class, a node, ECN, and an output from this node, ECO. These equivalence class nodes will be the input nodes into the network. They are viewed as pre-processing nodes and their orders are similar to those of ordinary nodes. These are the expected outputs from equivalence class nodes:

11.1 First order equivalence class node output

ECO_1 = Σ_{j=1}^{N} V_j  . . . . . . . . . . (B.19)

where N = size of the retina

11.2 Second order equivalence class node output

ECO_2 = Σ_{j=1}^{nn} V_{j,1} V_{j,2}  . . . . . . . . . . (B.20)

where nn = number of 2-tuple points in this equivalence class,
j,1 = co-ordinate of the first point
j,2 = co-ordinate of the second point

11.3 Nth order equivalence class node output

ECO_n = Σ_{j=1}^{nn} V_{j,1} V_{j,2} ... V_{j,n}  . . . . . . . . . . (B.21)

where nn = number of n-tuple points in this equivalence class,
j,1 = co-ordinate of the first point
j,2 = co-ordinate of the second point
j,n = co-ordinate of the nth point
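As a brief illustration of how the constraint of equation (B.17) is used in the discrete implementation (a sketch under stated assumptions; the variable and function names are not from the paper), the second-order index pairs (j, k) of a one-dimensional retina are grouped into equivalence classes by their separation j - k, and each class node outputs the sum of the corresponding input products as in equation (B.20).

from collections import defaultdict

def second_order_classes(N):
    # Group the index pairs (j, k), j != k, by the separation j - k.  Under the
    # constraint of equation (B.17) every pair with the same separation shares
    # one weight, so the number of distinct second-order weights drops from
    # about N*N to order N.
    classes = defaultdict(list)
    for j in range(N):
        for k in range(N):
            if j != k:
                classes[j - k].append((j, k))
    return classes

def eco2_outputs(V, classes):
    # Second-order equivalence-class node outputs, equation (B.20):
    # each node sums the products V[j] * V[k] over the pairs in its class.
    return {sep: sum(V[j] * V[k] for j, k in pairs)
            for sep, pairs in classes.items()}

V = [0.0, 1.0, 0.5, 0.0, 1.0]            # illustrative retina of size N = 5
classes = second_order_classes(len(V))
print(len(classes))                       # 2*(N-1) = 8 classes instead of ~N*N weights
print(eco2_outputs(V, classes)[1])        # output of the class with separation 1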
