out = f(net), where net = Σ (i = 1 to n) xi·wi
f(net) = 1 if net > s, and f(net) = 0 if net ≤ s
Unit III Artificial Neural Network
A.S.S.Murugan, SL/EEE, KLNCE, Pottapalayam 3.6
A neuron is an abstract model of a natural neuron, as illustrated in Figs. We have inputs x1, x2, ..., xm coming into the neuron. These inputs are the stimulation levels of a natural neuron. Each input xi is multiplied by its corresponding weight wi, then the product xi·wi is fed into the body of the neuron. The weights represent the biological synaptic strengths in a natural neuron. The neuron adds up all the products for i = 1, ..., m. The weighted sum of the products is usually denoted as net in the neural network literature, so we will use this notation. That is, the neuron evaluates net = x1w1 + x2w2 + ... + xmwm.
In mathematical terms, given two vectors x = (x1, x2, ..., xm) and w = (w1, w2, ..., wm), net is the dot (or scalar) product of the two vectors, x·w = x1w1 + x2w2 + ... + xmwm. Finally, the neuron
computes its output y as a certain function of net, i.e., y = f (net). This function is called the
activation (or sometimes transfer) function. We can think of a neuron as a sort of black box,
receiving input vector x then producing a scalar output y. The same output value y can be sent
out through multiple edges emerging from the neuron.
Fig. (a)A neuron model that retains the image of a natural neuron. (b) A further abstraction of Fig. (a).
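The computation just described can be sketched in a few lines of Python; the function and variable names here are illustrative, not taken from the text:

```python
# Sketch of the abstract neuron above: net is the weighted sum of the
# inputs, and the output y is the activation function f applied to net.
def neuron_output(x, w, f):
    net = sum(xi * wi for xi, wi in zip(x, w))  # net = x1*w1 + ... + xm*wm
    return f(net)                               # y = f(net)

# A step (threshold) activation with threshold s = 0.
def step(net):
    return 1 if net > 0 else 0

x = [1.0, 0.5, -0.2]   # input vector (stimulation levels)
w = [0.4, 0.6, 1.0]    # synaptic weights
print(neuron_output(x, w, step))   # net = 0.4 + 0.3 - 0.2 = 0.5, so y = 1
```

The same scalar output y could then be fanned out along any number of outgoing edges, as the text notes.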
Back Propagation Network (BPN)
It is a multi-layer feed-forward network trained with an extended gradient-descent based delta learning rule.
Fig. Structure of biological neuron
The artificial neuron was designed to mimic the first order characteristics of the biological
neuron. McCulloch and Pitts suggested the first synthetic neuron in the early 1940s. In essence, a
set of inputs are applied, each representing the output of another neuron. Each input is multiplied
by a corresponding weight, analogous to a synaptic strength, and all of the weighted inputs are
then summed to determine the activation level of the neuron. If this activation exceeds a certain
AI Applications to Power Systems
threshold, the unit produces an output response. This functionality is captured in the artificial
neuron known as the threshold logic unit (TLU) originally proposed by McCulloch and Pitts.
Fig. Artificial neuron structure (perceptron model)
Figure shows a model that implements this idea. Despite the diversity of network paradigms, nearly all are based upon this neuron configuration. Here a set of inputs labeled X1, X2, ..., Xn is applied from the input space to the artificial neuron. These inputs, collectively referred to as the input vector X, correspond to the signals arriving at the synapses of a biological neuron. Each signal is multiplied by an associated weight W1, W2, ..., Wn before it is applied to the summation block.
The activation a is given by
a = X1W1 + X2W2 + ... + XnWn
which may be represented more compactly as
a = Σ (i = 1 to n) XiWi
The output y is then given by y = f(a), where f is an activation function. In the McCulloch-Pitts perceptron model a hard limiter was used as the activation function, defined as
f(a) = 1 if a > s, and f(a) = 0 otherwise.
The threshold s will often be zero. The activation function is sometimes called a step-function.
Researchers have also tried other non-linear activation functions, such as the sigmoid and the Gaussian; the neuron responses for different activation functions are shown in Fig. 3.3
Network Architectures
Network architectures can be categorized into three main types: feedforward networks, recurrent networks (feedback networks) and self-organizing networks. This classification of networks was proposed by Kohonen [1990]. A network is feedforward if all of the hidden and output neurons receive inputs from the preceding layer only. The input is presented to the input layer and propagated forwards through the network. The output never forms a part of its own
input. A recurrent network has at least one feedback loop, i.e., a cyclic connection, which means that at least one of its neurons feeds its signal back to the inputs of other neurons. The behavior
of such networks may be extremely complex.
Haykin divides networks into four classes [Haykin, 1994]: 1) single-layer feedforward
networks, 2) multilayer feedforward networks, 3) recurrent networks, and 4) lattice structures. A
lattice network is a feedforward network, which has output neurons arranged in rows and
columns.
Layered networks are said to be fully connected if every node in each layer is connected to
all of the nodes in the following layer. If any of the connections is missing, then the network is said to be partially connected. Partially connected networks can be formed if some prior information about
the problem is available and this information supports the use of such a structure. The following
treatment of networks applies mainly to feed forward networks (single layer networks, MLP,
RBF, etc.). The designation n-layer network refers to the number of computational node layers or, equivalently, the number of weight-connection layers. Thus the input node layer is not taken into account.
Feed-forward networks
Feed-forward ANNs allow signals to travel one way only, from input to output. There is no feedback (no loops), i.e. the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs. They are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up
or top-down.
Fig.3.7 An example of a simple feedforward network
Figure A feedforward network
Feedback networks
Feedback networks can have signals travelling in both directions by introducing loops in
the network. Feedback networks are very powerful and can get extremely complicated. Feedback
networks are dynamic; their 'state' is changing continuously until they reach an equilibrium
point. They remain at the equilibrium point until the input changes and a new equilibrium needs
to be found. Feedback architectures are also referred to as interactive or recurrent, although the
latter term is often used to denote feedback connections in single-layer organisations.
Multilayered/non-multilayered - Topology of the network architecture
(i) Multilayered
The back propagation model is multilayered since it has distinct layers such as input, hidden,
and output. The neurons within each layer are connected with the neurons of the adjacent layers
through directed edges. There are no connections among the neurons within the same layer.
(ii) Non-multilayered
We can also build a neural network without such distinct layers as input, output, or hidden. Every neuron can be connected with every other neuron in the network through directed edges, and every neuron may serve as both input and output. A typical example is the Hopfield model.
Non-recurrent/recurrent - Directions of output
(i) Non-recurrent (feedforward only)
In the backpropagation model, the outputs always propagate from left to right in the
diagrams. This type of output propagation is called feedforward. In this type, outputs from the
input layer neurons propagate to the right, becoming inputs to the hidden layer neurons, and then
outputs from the hidden layer neurons propagate to the right becoming inputs to the output layer
neurons. Neural network models with feedforward connections only are called non-recurrent. Incidentally, "backpropagation" in the backpropagation model should not be confused with feedbackward propagation. Backpropagation refers to the backward adjustment of the weights, not to backward movement of outputs from neurons.
(ii) Recurrent (both feedforward and feedbackward)
In some other neural network models, outputs can also propagate backward, i.e., from right
to left. This is called feedbackward. A neural network in which the outputs can propagate in
both directions, forward and backward, is called a recurrent model. Biological systems have
such recurrent structures. A feedback system can be represented by an equivalent feedforward
system.
Single-layer Feed forward Networks
A single layer feed forward network represents the simplest form of neural network. In such a network there are only two layers, an input layer and an output layer. The phrase single layer refers to the output layer of neurons (computation nodes). The input layer is not counted as a layer because no computation is done in it. The inputs are multiplied by weights denoted by W. For instance, the input X1 is multiplied by a weight W1. The same is done for the rest of the inputs as well. Finally a weight vector comprising all the weights is formed. The results of all the multiplications of the inputs and weights are then fed to the summer, where addition is executed. The output of the summer is then fed to the linear threshold unit. If the input to the summer is above the threshold level, an output of 1 is produced; else, an output of 0 occurs. All the data can be presented to the network in binary form (example: 1 and 0) or in bipolar form (example: 1 and -1). Figure illustrates the block diagram of a single layer feed forward network.
Fig. Block Diagram of Single Layer Feed forward Network
The simplest choice of neural network is the following weighted sum:
y(x) = Σ (i = 0 to d) wi·xi
where d is the dimension of the input space, x0 = 1 and w0 is the bias parameter. The input vector x can be considered as a set of activations of the input layer. In classification problems y(x) is called a discriminant function, because y(x) = 0 can be interpreted as a decision boundary. The weight vector w determines the orientation of the decision plane and the bias parameter w0 determines its distance from the origin. In regression problems the use of this kind of network is limited: only (d-1)-dimensional hyperplanes can be modeled. An example of a single-layer network is a linear associative memory, which associates an output vector with an input vector.
yk(x) = Σ (i = 0 to d) wki·xi
where again x0 = 1 and wk0 is the bias parameter. The connection from input i to output k is weighted by the weight parameter wki.
Figure The simplest neural networks. Computation is done in the second layer of nodes.
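The discriminant behaviour of such a single-layer unit can be sketched as follows; the weight values are illustrative, chosen only to show the two sides of the decision boundary:

```python
# Hedged sketch of a single-layer discriminant function:
# y(x) = w0 + w1*x1 + ... + wd*xd, with y(x) = 0 as the decision boundary.
def discriminant(x, w, w0):
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

w, w0 = [1.0, -1.0], 0.5   # orientation of the decision plane, and bias
print(discriminant([2.0, 1.0], w, w0) > 0)   # 1.5 > 0: one side of the plane
print(discriminant([0.0, 2.0], w, w0) > 0)   # -1.5 < 0: the other side
```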
Functions of this form can be generalized by using a (monotonic) linear or nonlinear activation function g, which acts on the weighted sum as
y(x) = g( Σ (i = 0 to d) wi·xi )
where g(v) is usually chosen to be a threshold function, a piecewise linear function, the logistic sigmoid, or the hyperbolic tangent function (tanh). The first neuron model was of this type and was proposed as early as the 1940s by McCulloch and Pitts.
Threshold function (step function): g(v) = 1 if v ≥ 0, and g(v) = 0 if v < 0.
Piecewise linear function (pseudolinear): g(v) = 0 for v ≤ 0, g(v) = v for 0 < v < 1, and g(v) = 1 for v ≥ 1.
Logistic sigmoid: g(v) = 1 / (1 + e^(-v))
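These three activation functions can be sketched in a few lines; this is a minimal illustration, not tied to any particular library:

```python
import math

def threshold(v):            # step function
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):     # pseudolinear: identity clipped between 0 and 1
    return min(1.0, max(0.0, v))

def logistic(v):             # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-v))

for v in (-2.0, 0.5, 2.0):
    print(v, threshold(v), piecewise_linear(v), round(logistic(v), 3))
```

Note how the logistic sigmoid is a smooth, differentiable version of the step function, which is what makes gradient-based training possible later in this unit.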
Multilayer Feed forward Network
The clear distinction between a single-layer and a multilayer feed forward network is the introduction of hidden units. In a single-layer network there is an input layer of source nodes and an output layer of neurons. A multilayer network has, in addition, one or more hidden layers of hidden neurons. Some standard three-layer feed-forward networks are used widely.
The objective of the hidden unit is to intervene between the input and output layer,
enabling the Network to extract higher-order statistics. Figure illustrates the Architecture of the
Multilayer Feed forward Network. The data processing between the input layer and the summer
is similar to the single layer feed forward Network. Apart from formation of a weight vector
between the two layers, another vector between the hidden units and the output layer must be
formed.
A representative feed-forward neural network consists of a three-layer structure: input layer, output layer and hidden layer. Each layer is composed of a variable number of nodes. The number of nodes in the hidden layers is selected to make the network more efficient and to interpret the data more accurately. The relationship between the input and output can be non-linear or linear, and
its characteristics are determined by the weights assigned to the connections between the nodes
in the two adjacent layers. Changing the weight will change the input-to-output behavior of the
network.
Fig. A fully connected feed-forward network with one hidden layer and one output layer
Figure 3.10 The multilayer perceptron network
The summing junction of hidden unit j is obtained by the following weighted linear combination:
aj = Σ (i = 1 to d) wji·xi + wj0
where wji is a weight in the first layer (from input unit i to hidden unit j) and wj0 is the bias for hidden unit j. The activation (output) of hidden unit j is then obtained by
zj = g(aj)
For the output of the whole network the following activation is constructed:
yk = g( Σ (j = 1 to M) wkj·zj + wk0 )
where M is the number of hidden units; the output-layer activation function may differ from that of the hidden layer. The two-layered multilayer perceptron in Fig. can be represented as a function by combining the previous expressions in the form
yk(x) = g( Σ (j = 1 to M) wkj·g( Σ (i = 1 to d) wji·xi + wj0 ) + wk0 )
The activation function for the output unit can be linear. In that case the network becomes a special case of a linear model in which the basis functions are the hidden unit outputs.
If the activation functions in the hidden layer are linear, then such a network can be converted into an equivalent network without hidden units, because two successive linear transformations collapse into a single linear transformation. Networks having non-linear hidden unit activation functions are therefore preferred.
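A minimal forward pass matching the layer equations above might look like this; the weights are illustrative values, and the sigmoid is one possible choice of g (with a linear output unit):

```python
import math

# Two-layer MLP forward pass:
# aj = sum_i wji*xi + wj0 ; zj = g(aj) ; yk = sum_j wkj*zj + wk0 (linear output)
def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    z = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
         for ws, b in zip(hidden_w, hidden_b)]          # hidden activations zj
    return [sum(w * zj for w, zj in zip(ws, z)) + b     # linear outputs yk
            for ws, b in zip(out_w, out_b)]

x = [1.0, 0.5]
hidden_w, hidden_b = [[0.2, -0.4], [0.7, 0.1]], [0.0, -0.3]
out_w, out_b = [[1.0, -1.0]], [0.1]
print(mlp_forward(x, hidden_w, hidden_b, out_w, out_b))
```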
Note: In the literature the equation for the output of a neuron can also be seen written as
yk(x) = g( Σ (i = 1 to d) wki·xi + uk )
where uk is the bias parameter. Mathematically this is equivalent to the former equations, where the bias was included in the summation: we can always set wk0 = uk and x0 = 1. The sign of the bias term can, of course, be included in the weight parameter rather than in the input.
MLPs are mainly used for functional approximation rather than classification problems. They are generally unsuitable for modeling functions with significant local variations. The universal approximation theorem states that an MLP can approximate any continuous function arbitrarily well, although it gives no indication of the required complexity of the MLP. The Vapnik-Chervonenkis dimension dVC gives a rough approximation of the complexity. According to this principle, the amount of training data should be approximately ten times the dVC, or the number of weights in the MLP.
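This rule of thumb amounts to simple arithmetic; as an illustration (the layer sizes below are made up):

```python
# Count the weights of a fully connected MLP (including biases); by the rule
# of thumb above, the training set should be roughly ten times this number.
def num_weights(layer_sizes):
    # each pair of adjacent layers contributes (inputs + 1 bias) * outputs
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layers = [8, 10, 1]                 # 8 inputs, 10 hidden units, 1 output
w = num_weights(layers)             # (8+1)*10 + (10+1)*1 = 101
print(w, "weights -> about", 10 * w, "training samples")
```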
MLPs are suitable for high-dimensional function approximation if the desired function can be approximated by a low number of ridge functions (MLPs employ ridge functions in the hidden layer). They may perform well even when the training data have redundant inputs.
Back-Propagation Algorithm
Multilayer perceptrons have been applied successfully to solve some diverse, difficult problems by training them in a supervised manner with a highly popular algorithm known as the error back-propagation algorithm. This algorithm is based on the error-correction learning rule. It may be viewed as a generalization of an equally popular adaptive filtering algorithm, the least mean square (LMS) algorithm.
Error back-propagation learning consists of two passes through the different layers of the
network: a forward pass and a backward pass. In the forward pass, an input vector is applied to
the nodes of the network, and its effect propagates through the network layer by layer. Finally, a
set of outputs is produced as the actual response of the network. During the forward pass the
weights of the networks are all fixed. During the backward pass, the weights are all adjusted in
accordance with an error correction rule. The actual response of the network is subtracted from a
desired response to produce an error signal. This error signal is then propagated backward
through the network, against the direction of synaptic connections. The weights are adjusted to
make the actual response of the network move closer to the desired response.
Fig.3.11 Multiple layer perceptrons with back-propagation algorithm
A multilayer perceptron has three distinctive characteristics:
1. The model of each neuron in the network includes a nonlinear activation function. The sigmoid function is commonly used, defined by the logistic function
y = 1 / (1 + e^(-v))
Another commonly used function is the hyperbolic tangent, y = tanh(v). The presence of nonlinearities is important because otherwise the input-output relation of the network could be reduced to that of a single layer perceptron.
2. The network contains one or more layers of hidden neurons that are not part of the input or
output of the network. These hidden neurons enable the network to learn complex tasks.
3. The network exhibits a high degree of connectivity. A change in the connectivity of the network requires a change in the population of its synaptic weights.
Learning Process
To illustrate the process, a three layer neural network with two inputs and one output, shown in the picture below, is used.
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron activation function. Signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element, which is also the output signal of the neuron.
Three layer neural network with two inputs and single output
The training data set consists of input signals (x1 and x2) assigned with the corresponding target (desired output) y. The network training is an iterative process. In each iteration the weight coefficients of the nodes are modified using new data from the training data set. Symbol wmn represents the weight of the connection between the output of neuron m and the input of neuron n in the next layer. Symbol yn represents the output signal of neuron n.
Propagation of signals through the output layer.
In the next algorithm step the output signal of the network, y, is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal of the output layer neuron.
It is impossible to compute the error signals for internal neurons directly, because the output values of these neurons are unknown. For many years an effective method for training multilayer networks was unknown; only in the mid-eighties was the backpropagation algorithm worked out. The idea is to propagate the error signal (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron in question.
When the error signal for each neuron has been computed, the weight coefficients of each neuron input node may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are modified.
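The update formulas referred to here appear as figures in the original. The general shape of one weight update, sketched here under the assumption of a logistic activation function (all numeric values are illustrative), is w_new = w_old + η·δ·(df(e)/de)·x, where δ is the error signal propagated back to the neuron:

```python
import math

def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))

def dsigmoid(e):                       # df(e)/de for the logistic function
    s = sigmoid(e)
    return s * (1.0 - s)

# One weight update: w_new = w + eta * delta * df(e)/de * x
def update_weight(w, x, e, delta, eta):
    return w + eta * delta * dsigmoid(e) * x

w, x, e, delta, eta = 0.5, 1.0, 0.2, 0.1, 0.7
print(update_weight(w, x, e, delta, eta))
```

Each weight of each neuron is updated this way, using that neuron's own error signal δ and adder output e.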
The learning-rate coefficient η affects the network teaching speed. There are a few techniques for selecting this parameter. The first method is to start the teaching process with a large value of the parameter; while the weight coefficients are being established, the parameter is gradually decreased. The second, more complicated, method starts teaching with a small parameter value; during the teaching process the parameter is increased as the teaching advances and then decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients.
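The first technique (start large, decrease gradually) can be sketched as a simple geometric decay; the starting value and decay factor below are illustrative choices, not prescribed by the text:

```python
# Geometric learning-rate decay: eta at a given step = eta0 * decay^step.
def decreasing_eta(eta0, decay, step):
    return eta0 * (decay ** step)

for step in range(5):
    print(step, round(decreasing_eta(0.5, 0.8, step), 4))
```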
Fig 3.2 Flowchart showing working of BPA
Recurrent Neural Network (RNN)
A feed forward architecture does not maintain a short-term memory; any memory effects are due to the way past inputs are re-presented to the network.
Fig. 3.11 A simple recurrent network
A simple recurrent network has activation feedback, which embodies short-term memory. A state layer is updated not only with the external input of the network but also with activation from the previous forward propagation. The feedback is modified by a set of weights so as to enable automatic adaptation through learning (e.g. backpropagation).
Fig. 3.12 A simple recurrent network
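A minimal sketch of such a state-layer update (Elman-style) follows; all weights, sizes and values are illustrative assumptions:

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Each state unit sees the external inputs AND the state activations
# from the previous forward propagation (the short-term memory).
def step_state(x, prev_state, w_in, w_rec):
    return [sigmoid(sum(w * xi for w, xi in zip(wi, x)) +
                    sum(w * s for w, s in zip(wr, prev_state)))
            for wi, wr in zip(w_in, w_rec)]

w_in = [[0.5, -0.5], [0.3, 0.3]]     # input -> state weights
w_rec = [[0.2, 0.0], [0.0, 0.2]]     # state -> state (feedback) weights
state = [0.0, 0.0]
for x in ([1.0, 0.0], [0.0, 1.0]):   # a short input sequence
    state = step_state(x, state, w_in, w_rec)
print(state)
```

Because the previous state enters each update, the same input can produce different states depending on what came before, which is the short-term memory the text describes.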
Neural networks with closed paths in their topology are known as recurrent neural
networks (RNNs). RNNs are an improvement on MLPNs, and are characterized by cyclic paths
between neurons. RNNs can propagate data from later processing stages to earlier stages. In
RNNs, the present activation state is a function of the previous activation state as well as the
present inputs. In essence, the recurrent connections allow storing information from the past
input and the past state of the network. Adding feedback from the prior activation step introduces
a form of memory to the process. This enhances the network's ability to learn temporal sequences without fundamentally changing the training process. Therefore, RNNs have the capability of dealing with spatio-temporal problems, which have been found to be difficult for feedforward networks.
A recurrent neural network differs from a feedforward neural network in that there are no restrictions on the placement of synapses in a recurrent network. This makes all kinds of feedback and connections possible and achieves the full computational power of neural
networks. With such a general architecture, recurrent neural networks have important capabilities
not found in feedforward networks, such as attractor dynamics and the ability to identify a time-
varying system.
Various learning algorithms for recurrent neural networks have been proposed. Algorithms for associative memory networks, which are recurrent networks settling to stable states, have been proposed by Hopfield and Pineda, while Jordan, Gallant and King, and Pearlmutter developed algorithms to train recurrent networks to handle time-varying systems. The algorithm considered here is the real-time recurrent learning algorithm for completely recurrent networks running in continually sampled time, devised by R. J. Williams and D. Zipser.
The real-time recurrent learning algorithm exhibits the generality of the backpropagation-through-time approach without the memory requirement that grows with arbitrarily long training sequences. With feedback from the output layer, a small recurrent neural network can simulate a time-varying, nonlinear system well.
A typical real-time recurrent neural network is shown in the figure. It consists of two layers: an output layer and an input layer. The output layer includes output and hidden neurons. Some or all of the output/hidden neurons are delayed and fed back to the input layer. Therefore, the input layer consists of the delayed outputs and the external inputs. The algorithm proceeds as follows:
Fig. A Recurrent Neural Network
1. Forward process: compute the output yj for all j ∈ C.
2. Backward process: compute the error gradient.
3. Weight updates.
where
A : external input neurons
B : feedback output/hidden neurons
O : desired output neurons
C : all output/hidden neurons
Ui : neurons of the input layer, where i ∈ A ∪ B