
UNIT : III

SOFT COMPUTING
II SEMESTER (MCSE 205)



















PREPARED BY ARUN PRATAP SINGH


UNSUPERVISED LEARNING IN NEURAL NETWORKS:





















In machine learning, the problem of unsupervised learning is that of trying to find hidden
structure in unlabeled data. Since the examples given to the learner are unlabeled, there is no
error or reward signal to evaluate a potential solution. This distinguishes unsupervised learning
from supervised learning and reinforcement learning.
Unsupervised learning is closely related to the problem of density estimation in statistics.[1]
However, unsupervised learning also encompasses many other techniques that seek to
summarize and explain key features of the data. Many methods employed in unsupervised
learning are based on data mining methods used to preprocess data.
Approaches to unsupervised learning include:
clustering (e.g., k-means, mixture models, hierarchical clustering),
hidden Markov models,
blind signal separation using feature extraction techniques for dimensionality
reduction (e.g., principal component analysis, independent component analysis, non-negative
matrix factorization, singular value decomposition).
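To make the clustering idea concrete, here is a minimal k-means sketch in plain NumPy; the data points and the choice of k = 2 are invented for illustration:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: alternate nearest-center assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # move each center to the mean of the points assigned to it
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)   # two well-separated groups of two points
```

Because there is no label or reward signal, the algorithm discovers the two groups purely from the structure of the data, which is the defining trait of unsupervised learning.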
Among neural network models, the self-organizing map (SOM) and adaptive resonance
theory (ART) are commonly used unsupervised learning algorithms. The SOM is a topographic
organization in which nearby locations in the map represent inputs with similar properties. The
ART model allows the number of clusters to vary with problem size and lets the user control the
degree of similarity between members of the same clusters by means of a user-defined constant
called the vigilance parameter. ART networks are also used for many pattern recognition tasks,
such as automatic target recognition and seismic signal processing. The first version of ART was
"ART1", developed by Carpenter and Grossberg (1988).


COUNTERPROPAGATION NETWORK:
The counterpropagation network is a hybrid network. It consists of an outstar network and a
competitive filter network. It was developed in 1986 by Robert Hecht-Nielsen. It is guaranteed to
find the correct weights, unlike regular backpropagation networks, which can become trapped in
local minima during training.
The input layer neurodes connect to each neurode in the hidden layer. The hidden layer is a
Kohonen network which categorizes the input pattern. The output layer is an outstar array
which reproduces the correct output pattern for that category.
Training is done in two stages. The hidden layer is first taught to categorize the patterns, and the
weights of that layer are then fixed. Then the output layer is trained. Each pattern that will be
input needs a unique node in the hidden layer, which often makes the network too large for
real-world problems.


The counterpropagation update algorithm updates a net that consists of an input, a hidden and
an output layer. Here the hidden layer is called the Kohonen layer and the output layer is called
the Grossberg layer. At the beginning of the algorithm the output of the input neurons equals the
input vector, which is normalized to unit length. Then the propagation through the Kohonen layer
starts: the neuron with the highest net input is identified, the activation of this winner neuron is
set to 1, and the activation of all other neurons in this layer is set to 0. Now the output of all
output neurons is calculated. Since only one hidden neuron has its activation and output set to 1,
and the output of each output neuron is the weighted sum over the outputs of the hidden
neurons, the output of each output neuron equals the weight of the link between the winner
neuron and that output neuron. This update function makes sense only in combination with the
CPN learning function.
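The winner-take-all forward pass just described can be sketched as follows; the Kohonen and Grossberg weight matrices here are invented for illustration:

```python
import numpy as np

def cpn_forward(x, W_kohonen, W_grossberg):
    """Counterpropagation forward pass: winner-take-all Kohonen layer,
    then a Grossberg (outstar) layer reading out the winner's weights."""
    x = x / np.linalg.norm(x)               # normalize the input to unit length
    net = W_kohonen @ x                     # net input of each Kohonen neuron
    winner = int(np.argmax(net))            # the neuron with the highest net input
    # with the winner's activation 1 and all others 0, the output is just
    # the column of Grossberg weights attached to the winning neuron
    return W_grossberg[:, winner], winner

W_k = np.array([[1.0, 0.0],                 # two prototype rows (Kohonen layer)
                [0.0, 1.0]])
W_g = np.array([[0.9, 0.1],                 # Grossberg (outstar) weights
                [0.2, 0.8]])
y, winner = cpn_forward(np.array([2.0, 0.5]), W_k, W_g)
```

The sketch shows why the output equals a column of the Grossberg weight matrix: only the winning hidden neuron contributes to the weighted sum.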









































ARCHITECTURE OF COUNTERPROPAGATION NETWORK:



















ASSOCIATIVE MEMORY:




































Bidirectional associative memory (BAM) is a type of recurrent neural network. BAM was
introduced by Bart Kosko in 1988. There are two types of associative memory, auto-associative
and hetero-associative. BAM is hetero-associative, meaning given a pattern it can return another
pattern which is potentially of a different size. It is similar to the Hopfield network in that they are
both forms of associative memory. However, Hopfield nets return patterns of the same size.

Procedure-
Learning
Imagine we wish to store two associations, A1:B1 and A2:B2.
A1 = (1, 0, 1, 0, 1, 0), B1 = (1, 1, 0, 0)
A2 = (1, 1, 1, 0, 0, 0), B2 = (1, 0, 1, 0)
These are then transformed into the bipolar forms:
X1 = (1, -1, 1, -1, 1, -1), Y1 = (1, 1, -1, -1)
X2 = (1, 1, 1, -1, -1, -1), Y2 = (1, -1, 1, -1)
From there, we calculate M = X1^T * Y1 + X2^T * Y2, where ^T denotes the transpose (treating
the Xi and Yi as row vectors). So,

M = [  2   0   0  -2
       0  -2   2   0
       2   0   0  -2
      -2   0   0   2
       0   2  -2   0
      -2   0   0   2 ]

Recall
To retrieve the association A1, we multiply it by M to get (4, 2, -2, -4), which, when run through a
threshold, yields (1, 1, 0, 0), which is B1. To find the reverse association, multiply this by the
transpose of M.
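The worked example above can be checked numerically with a short NumPy sketch (same patterns as in the text):

```python
import numpy as np

A1 = np.array([1, 0, 1, 0, 1, 0]); B1 = np.array([1, 1, 0, 0])
A2 = np.array([1, 1, 1, 0, 0, 0]); B2 = np.array([1, 0, 1, 0])

def bipolar(v):
    return 2 * v - 1                     # map {0, 1} to {-1, +1}

X1, Y1, X2, Y2 = (bipolar(v) for v in (A1, B1, A2, B2))

# correlation matrix M = X1^T Y1 + X2^T Y2
M = np.outer(X1, Y1) + np.outer(X2, Y2)

forward = A1 @ M                         # gives (4, 2, -2, -4)
recalled = (forward > 0).astype(int)     # threshold -> (1, 1, 0, 0) == B1
```

Multiplying B1 by the transpose of M and thresholding in the same way would recover the forward association, which is the bidirectional behaviour BAM is named for.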












HOPFIELD NETWORK:
A Hopfield network is a form of recurrent artificial neural network invented by John Hopfield in
1982. Hopfield nets serve as content-addressable memory systems with binary threshold nodes.
They are guaranteed to converge to a local minimum, but convergence to a false pattern (wrong
local minimum) rather than the stored pattern (expected local minimum) can occur. Hopfield
networks also provide a model for understanding human memory.
Structure-
The units in Hopfield nets are binary threshold units, i.e. the units only take on two different values
for their states and the value is determined by whether or not the units' input exceeds their
threshold. Hopfield nets normally have units that take on values of 1 or -1, and this convention
will be used here. However, other literature might use units that take values of 0 and 1.
Every pair of units i and j in a Hopfield network has a connection that is described by the
connectivity weight w_ij. In this sense, the Hopfield network can be formally described as a
complete undirected graph G = (V, f), where V is a set of McCulloch-Pitts neurons
and f : V^2 -> R is a function that links pairs of nodes to a real value, the connectivity
weight.
The connections in a Hopfield net typically have the following restrictions:
w_ii = 0 for all i (no unit has a connection with itself)
w_ij = w_ji for all i, j (connections are symmetric)
The requirement that weights be symmetric is typically used, as it guarantees that the energy
function decreases monotonically while following the activation rules; the network may exhibit
periodic or chaotic behaviour if non-symmetric weights are used. However, Hopfield found
that this chaotic behavior is confined to relatively small parts of the phase space, and does not
impair the network's ability to act as a content-addressable associative memory system.

A Hopfield net with four nodes

Updating-
Updating one unit (node in the graph simulating the artificial neuron) in the Hopfield network is
performed using the following rule:

    s_i <- +1 if sum_j w_ij * s_j >= theta_i, otherwise s_i <- -1

where:
w_ij is the strength of the connection weight from unit j to unit i,
s_j is the state of unit j, and
theta_i is the threshold of unit i.
Updates in the Hopfield network can be performed in two different ways:
Asynchronous: Only one unit is updated at a time. This unit can be picked at random, or a
pre-defined order can be imposed from the very beginning.
Synchronous: All units are updated at the same time. This requires a central clock to the
system in order to maintain synchronization. This method is less realistic, since biological or
physical systems lack a global clock that keeps track of time.
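A minimal sketch of one asynchronous update step; the symmetric weight matrix and zero thresholds are invented for illustration:

```python
import numpy as np

def update_unit(s, W, theta, i):
    """One asynchronous Hopfield update of unit i, with states in {-1, +1}."""
    s = s.copy()
    s[i] = 1 if W[i] @ s >= theta[i] else -1
    return s

W = np.array([[0.0, 1.0, -1.0],      # symmetric weights, zero diagonal
              [1.0, 0.0, 1.0],
              [-1.0, 1.0, 0.0]])
theta = np.zeros(3)                  # thresholds
s = np.array([1, -1, 1])
s = update_unit(s, W, theta, 0)      # unit 0 sees net input -2 and flips to -1
```

A synchronous version would simply apply the same rule to every unit at once from the old state vector.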

Neurons attract or repel each other
The weight between two units has a powerful impact upon the values of the neurons. Consider
the connection weight w_ij between two neurons i and j. If w_ij > 0, the updating rule implies
that:
when s_j = 1, the contribution of j in the weighted sum is positive; thus s_i is pulled by j
towards its value s_j = 1;
when s_j = -1, the contribution of j in the weighted sum is negative; then again s_i is pulled
by j towards its value s_j = -1.
Thus, the values of neurons i and j will converge if the weight between them is positive. Similarly,
they will diverge if the weight is negative.

Training-
Training a Hopfield net involves lowering the energy of states that the net should "remember".
This allows the net to serve as a content addressable memory system, that is to say, the network
will converge to a "remembered" state if it is given only part of the state. The net can be used to
recover from a distorted input to the trained state that is most similar to that input. This is called
associative memory because it recovers memories on the basis of similarity. For example, if we
train a Hopfield net with five units so that the state (1, 0, 1, 0, 1) is an energy minimum, and we
give the network the state (1, 0, 0, 0, 1), it will converge to (1, 0, 1, 0, 1). Thus, the network is
properly trained when the states which the network should remember are local minima of the
energy function.
Learning rules-
There are various different learning rules that can be used to store information in the memory of
the Hopfield Network. It is desirable for a learning rule to have both of the following two properties:
Local: A learning rule is local if each weight is updated using information available to neurons
on either side of the connection that is associated with that particular weight.
Incremental: New patterns can be learned without using information from the old patterns that
have also been used for training. That is, when a new pattern is used for training, the new
values for the weights only depend on the old values and on the new pattern.[1]

These properties are desirable, since a learning rule satisfying them is more biologically plausible.
For example, since the human brain is always learning new concepts, one can reason that human
learning is incremental. A learning system that would not be incremental would generally be
trained only once, with a huge batch of training data.


Hebbian learning rule for Hopfield networks
The Hebbian Theory was introduced by Donald Hebb in 1949, in order to explain "associative
learning", in which simultaneous activation of neuron cells leads to pronounced increases in
synaptic strength between those cells.

It is often summarized as "Neurons that fire together, wire
together. Neurons that fire out of sync, fail to link".
The Hebbian rule is both local and incremental. For Hopfield networks, it is implemented in
the following manner when learning n binary patterns:

    w_ij = (1/n) * sum_{mu=1..n} eps_i^mu * eps_j^mu

where eps_i^mu represents bit i from pattern mu.
If the bits corresponding to neurons i and j are equal in pattern mu, then the product
eps_i^mu * eps_j^mu will be positive. This, in turn, has a positive effect on the weight w_ij, and
the values of i and j will tend to become equal. The opposite happens if the bits corresponding
to neurons i and j are different.
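The Hebbian storage rule, together with the recall behaviour described under Training, can be sketched as follows; the stored pattern is invented for the example:

```python
import numpy as np

def hebbian_weights(patterns):
    """w_ij = (1/n) * sum over patterns of eps_i * eps_j, with zero diagonal."""
    P = np.array(patterns)               # one bipolar pattern per row
    W = P.T @ P / len(P)
    np.fill_diagonal(W, 0)               # no self-connections
    return W

def recall(W, s, steps=10):
    """Repeated synchronous updates with zero thresholds."""
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

stored = [1, -1, 1, -1, 1]
W = hebbian_weights([stored])
noisy = np.array([1, -1, -1, -1, 1])     # the stored pattern with one bit flipped
out = recall(W, noisy)                   # converges back to the stored pattern
```

Starting from the corrupted input, the net falls into the energy minimum carved out by the Hebbian rule, which is exactly the content-addressable recall described above.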

The Storkey learning rule
This rule was introduced by Amos Storkey in 1997 and is both local and incremental. Storkey also
showed that a Hopfield network trained using this rule has a greater capacity than a corresponding
network trained using the Hebbian rule.[3]
The weight matrix of an attractor neural network is said
to follow the Storkey learning rule if it obeys:

where is a form of local field
[1]
at neuron i.
This learning rule is local, since the synapses take into account only neurons at their sides. The
rule makes use of more information from the patterns and weights than the generalized Hebbian
rule, due to the effect of the local field.






ADAPTIVE RESONANCE THEORY:



Basic ART architecture



Grossberg competitive network

Grossberg Network-
The L1-L2 connections are instars, which perform a clustering (or categorization)
operation. When an input pattern is presented, it is multiplied (after normalization) by
the L1-L2 weight matrix.
A competition is performed at Layer 2 to determine which row of the weight matrix is
closest to the input vector. That row is then moved toward the input vector.
After learning is complete, each row of the L1-L2 weight matrix is a prototype
pattern, which represents a cluster (or a category) of input vectors.

ART Networks
Learning of ART networks also occurs in a set of feedback connections from Layer 2
to Layer 1. These connections are outstars which perform pattern recall.
When a node in Layer 2 is activated, this reproduces a prototype pattern (the
expectation) at layer 1.
Layer 1 then performs a comparison between the expectation and the input pattern.
When the expectation and the input pattern are NOT closely matched, the orienting
subsystem causes a reset in Layer 2.
The reset disables the current winning neuron, and the current expectation is removed.
A new competition is then performed in Layer 2, with the previous winning neuron
disabled.
[Figure: Grossberg network. Input -> Layer 1 (Retina) -> Layer 2 (Visual Cortex).
LTM: adaptive weights; STM: normalization and contrast enhancement.]

The new winning neuron in Layer 2 projects a new expectation to Layer 1, through the
L2-L1 connections.
This process continues until the L2-L1 expectation provides a close enough match to the
input pattern.
ART Architecture
Bottom-up weights b_ij and top-down weights t_ij store the class templates.
Input nodes: input normalisation and the vigilance test.
Output nodes: forward matching.
Long-term memory: the ANN weights.
Short-term memory: the ANN activation pattern.



The basic ART system is an unsupervised learning model. It typically consists of:
a comparison field and a recognition field composed of neurons,
a vigilance parameter, and
a reset module
Comparison field
The comparison field takes an input vector (a one-dimensional array of values)
and transfers it to its best match in the recognition field. Its best match is the
single neuron whose set of weights (weight vector) most closely matches the
input vector.
Recognition field
Each recognition field neuron outputs a negative signal, proportional to that
neuron's quality of match to the input vector, to each of the other recognition field
neurons, and inhibits their output accordingly. In this way the recognition field
exhibits lateral inhibition, allowing each neuron in it to represent a category to
which input vectors are classified.
Vigilance parameter
After the input vector is classified, a reset module compares the strength of the
recognition match to a vigilance parameter. The vigilance parameter has
considerable influence on the system.
Reset Module
The reset module compares the strength of the recognition match to the vigilance
parameter.

If the vigilance threshold is met, then training commences.

ART Algorithm

ART Types :
ART-1
Binary input vectors
Unsupervised NN that can be complemented with external changes to the
vigilance parameter
ART-2
Real-valued input vectors
ART-3
Parallel search of compressed or distributed pattern recognition codes in a
NN hierarchy.
The search process leads to the discovery of appropriate representations of a
non-stationary input environment.
Chemical properties of the synapse are emulated in the search process.




The ART-1 Network :


Applications of ART :
Mobile robot control
Facial recognition
Land cover classification
Target recognition
Medical diagnosis
Signature verification





Learning model :
The basic ART system is an unsupervised learning model. It typically consists of a comparison
field and a recognition field composed of neurons, a vigilance parameter (threshold of
recognition), and a reset module. The comparison field takes an input vector (a one-dimensional
array of values) and transfers it to its best match in the recognition field. Its best match is the
single neuron whose set of weights (weight vector) most closely matches the input vector. Each
recognition field neuron outputs a negative signal (proportional to that neuron's quality of match
to the input vector) to each of the other recognition field neurons and thus inhibits their output. In
this way the recognition field exhibits lateral inhibition, allowing each neuron in it to represent a
category to which input vectors are classified. After the input vector is classified, the reset module
compares the strength of the recognition match to the vigilance parameter. If the vigilance
parameter is overcome, training commences: the weights of the winning recognition neuron are
adjusted towards the features of the input vector. Otherwise, if the match level is below the
vigilance parameter the winning recognition neuron is inhibited and a search procedure is carried
out. In this search procedure, recognition neurons are disabled one by one by the reset function
until the vigilance parameter is overcome by a recognition match. In particular, at each cycle of
the search procedure the most active recognition neuron is selected and then switched off if its
activation is below the vigilance parameter (note that it thus releases the remaining recognition
neurons from its inhibition). If no committed recognition neuron's match overcomes the vigilance
parameter, then an uncommitted neuron is committed and its weights are adjusted towards
matching the input vector. The vigilance parameter has considerable influence on the system:
higher vigilance produces highly detailed memories (many, fine-grained categories), while lower
vigilance results in more general memories (fewer, more-general categories).
Training :
There are two basic methods of training ART-based neural networks: slow and fast. In the slow
learning method, the degree of training of the recognition neuron's weights towards the input
vector is calculated to continuous values with differential equations and is thus dependent on the
length of time the input vector is presented. With fast learning, algebraic equations are used to
calculate degree of weight adjustments to be made, and binary values are used. While fast
learning is effective and efficient for a variety of tasks, the slow learning method is more
biologically plausible and can be used with continuous-time networks (i.e. when the input vector
can vary continuously).
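The vigilance/search cycle described in the learning model above can be sketched for ART-1-style binary inputs. This is a simplified illustration under fast learning, not a faithful ART-1 implementation; the match function and template update are reduced to their simplest forms:

```python
import numpy as np

def art_classify(x, prototypes, rho):
    """Return the index of the accepted category for binary vector x,
    committing a new prototype when no existing one passes vigilance."""
    x = np.asarray(x)
    disabled = set()                          # neurons reset during this search
    while True:
        candidates = [j for j in range(len(prototypes)) if j not in disabled]
        if not candidates:
            prototypes.append(x.copy())       # commit an uncommitted neuron
            return len(prototypes) - 1
        # the most active committed neuron not yet reset wins the competition
        j = max(candidates, key=lambda j: int((prototypes[j] & x).sum()))
        match = (prototypes[j] & x).sum() / max(int(x.sum()), 1)
        if match >= rho:                      # vigilance test passed: learn
            prototypes[j] &= x                # fast learning: intersect template
            return j
        disabled.add(j)                       # reset: disable winner, search again

protos = []
c1 = art_classify(np.array([1, 1, 0, 0]), protos, rho=0.7)   # new category 0
c2 = art_classify(np.array([1, 1, 0, 1]), protos, rho=0.7)   # fails vigilance -> new category
c3 = art_classify(np.array([0, 0, 1, 1]), protos, rho=0.7)   # another new category
```

Raising rho toward 1 makes the vigilance test harder to pass, so more categories are committed (finer memories), matching the description of the vigilance parameter above.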

SUPPORT VECTOR MACHINE:
In machine learning, support vector machines (SVMs, also support vector networks)
are supervised learning models with associated learning algorithms that analyze data and
recognize patterns, used for classification and regression analysis. Given a set of training
examples, each marked as belonging to one of two categories, an SVM training algorithm builds
a model that assigns new examples into one category or the other, making it a non-
probabilistic binary linear classifier. An SVM model is a representation of the examples as points
in space, mapped so that the examples of the separate categories are divided by a clear gap that
is as wide as possible. New examples are then mapped into that same space and predicted to
belong to a category based on which side of the gap they fall on.
A Support Vector Machine (SVM) performs classification by constructing an N-dimensional
hyperplane that optimally separates the data into two categories. SVM models are closely related
to neural networks; in fact, an SVM model using a sigmoid kernel function is equivalent to a
two-layer perceptron neural network.
Support Vector Machine (SVM) models are a close cousin to classical multilayer
perceptron neural networks. Using a kernel function, SVMs are an alternative training method for
polynomial, radial basis function and multi-layer perceptron classifiers in which the weights of the
network are found by solving a quadratic programming problem with linear constraints, rather than
by solving a non-convex, unconstrained minimization problem as in standard neural network
training.

In the parlance of SVM literature, a predictor variable is called an attribute, and a transformed
attribute that is used to define the hyperplane is called a feature. The task of choosing the most
suitable representation is known as feature selection. A set of features that describes one case
(i.e., a row of predictor values) is called a vector. So the goal of SVM modeling is to find the
optimal hyperplane that separates clusters of vectors in such a way that cases with one category
of the target variable are on one side of the plane and cases with the other category are on the
other side of the plane. The vectors near the hyperplane are the support vectors. The figure below
presents an overview of the SVM process.

A Two-Dimensional Example
Before considering N-dimensional hyperplanes, let's look at a simple 2-dimensional example.
Assume we wish to perform a classification, and our data has a categorical target variable with
two categories. Also assume that there are two predictor variables with continuous values. If we
plot the data points using the value of one predictor on the X axis and the other on the Y axis we
might end up with an image such as shown below. One category of the target variable is
represented by rectangles while the other category is represented by ovals.


In this idealized example, the cases with one category are in the lower left corner and the cases
with the other category are in the upper right corner; the cases are completely separated. The
SVM analysis attempts to find a 1-dimensional hyperplane (i.e. a line) that separates the cases
based on their target categories. There are an infinite number of possible lines; two candidate
lines are shown above. The question is which line is better, and how do we define the optimal
line.
The dashed lines drawn parallel to the separating line mark the distance between the dividing line
and the closest vectors to the line. The distance between the dashed lines is called the margin.
The vectors (points) that constrain the width of the margin are the support vectors. The following
figure illustrates this.

An SVM analysis finds the line (or, in general, hyperplane) that is oriented so that the margin
between the support vectors is maximized. In the figure above, the line in the right panel is
superior to the line in the left panel.

If all analyses consisted of two-category target variables with two predictor variables, and the
cluster of points could be divided by a straight line, life would be easy. Unfortunately, this is not
generally the case, so SVM must deal with (a) more than two predictor variables, (b) separating
the points with non-linear curves, (c) handling the cases where clusters cannot be completely
separated, and (d) handling classifications with more than two categories.
Flying High on Hyperplanes
In the previous example, we had only two predictor variables, and we were able to plot the points
on a 2-dimensional plane. If we add a third predictor variable, then we can use its value for a third
dimension and plot the points in a 3-dimensional cube. Points on a 2-dimensional plane can be
separated by a 1-dimensional line. Similarly, points in a 3-dimensional cube can be separated by
a 2-dimensional plane.

As we add additional predictor variables (attributes), the data points can be represented in
N-dimensional space, and an (N-1)-dimensional hyperplane can separate them.
When Straight Lines Go Crooked
The simplest way to divide two groups is with a straight line, flat plane or an N-dimensional
hyperplane. But what if the points are separated by a nonlinear region such as shown below?


In this case we need a nonlinear dividing line.
Rather than fitting nonlinear curves to the data, SVM handles this by using a kernel function to
map the data into a different space where a hyperplane can be used to do the separation.

The kernel function may transform the data into a higher dimensional space to make it possible
to perform the separation.
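The effect of such a mapping can be shown with a tiny invented example: points on a line that no single threshold separates become linearly separable after the feature map x -> (x, x^2):

```python
import numpy as np

# 1-D points: the two class-B points flank the class-A points, so no single
# threshold on x can separate the classes
x = np.array([-2.0, -0.5, 0.5, 2.0])
y = np.array([1, -1, -1, 1])               # B, A, A, B

phi = np.column_stack([x, x ** 2])         # lift each point to (x, x^2)

# in the lifted space the horizontal line x2 = 1 separates the two classes
predicted = np.where(phi[:, 1] > 1.0, 1, -1)
```

The kernel trick lets an SVM work with such lifted spaces implicitly, through inner products, without ever constructing phi explicitly.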



Ideally an SVM analysis should produce a hyperplane that completely separates the feature vectors
into two non-overlapping groups. However, perfect separation may not be possible, or it may result
in a model with so many feature vector dimensions that the model does not generalize well to other
data; this is known as overfitting.


The Kernel Trick
Many kernel mapping functions can be used, probably an infinite number, but a few kernel
functions have been found to work well for a wide variety of applications. The default and
recommended kernel function is the Radial Basis Function (RBF).
Kernel functions supported by DTREG:
Linear: u*v
Polynomial: (gamma*u*v + coef0)^degree


Radial basis function: exp(-gamma*|u-v|^2)
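The RBF kernel above is straightforward to write down directly:

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||u - v||^2)."""
    d = np.asarray(u) - np.asarray(v)
    return float(np.exp(-gamma * np.dot(d, d)))

k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points -> 1.0
k_far = rbf_kernel([0.0, 0.0], [3.0, 4.0])    # squared distance 25 -> tiny value
```

The kernel value decays smoothly from 1 toward 0 as the points move apart, with gamma controlling how fast.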

To allow some flexibility in separating the categories, SVM models have a cost parameter, C, that
controls the trade-off between allowing training errors and forcing rigid margins. It creates a soft
margin that permits some misclassifications. Increasing the value of C increases the cost of
misclassifying points and forces the creation of a more accurate model that may not generalize
well. DTREG provides a grid search facility that can be used to find the optimal value of C.
Finding Optimal Parameter Values
The accuracy of an SVM model is largely dependent on the selection of the model parameters.
DTREG provides two methods for finding optimal parameter values, a grid search and a pattern
search. A grid search tries values of each parameter across the specified search range using
geometric steps. A pattern search (also known as a compass search or a line search) starts at
the center of the search range and makes trial steps in each direction for each parameter. If the
fit of the model improves, the search center moves to the new point and the process is repeated.
If no improvement is found, the step size is reduced and the search is tried again. The pattern
search stops when the search step size is reduced to a specified tolerance.
Grid searches are computationally expensive because the model must be evaluated at many
points within the grid for each parameter. For example, if a grid search is used with 10 search
intervals and an RBF kernel function is used with two parameters (C and Gamma), then the model
must be evaluated at 10*10 = 100 grid points. An Epsilon-SVR analysis has three parameters (C,
Gamma and P) so a grid search with 10 intervals would require 10*10*10 = 1000 model
evaluations. If cross-validation is used for each model evaluation, the number of actual SVM
calculations would be further multiplied by the number of cross-validation folds (typically 4 to 10).
For large models, this approach may be computationally infeasible.
A pattern search generally requires far fewer evaluations of the model than a grid search.
Beginning at the geometric center of the search range, a pattern search makes trial steps with
positive and negative step values for each parameter. If a step is found that improves the model,
the center of the search is moved to that point. If no step improves the model, the step size is
reduced and the process is repeated. The search terminates when the step size is reduced to a
specified tolerance. The weakness of a pattern search is that it may find a local rather than global
optimal point for the parameters.
DTREG allows you to use both a grid search and a pattern search. In this case the grid search is
performed first. Once the grid search finishes, a pattern search is performed over a narrow search
range surrounding the best point found by the grid search. Hopefully, the grid search will find a
region near the global optimum point and the pattern search will then find the global optimum by
starting in the right region.
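Both search strategies can be sketched against a stand-in score(C, gamma) function; in DTREG the score would come from cross-validating the model, and the function and ranges here are invented for illustration:

```python
import numpy as np

def score(C, gamma):
    """Stand-in for cross-validated model quality, peaking at C=10, gamma=0.1."""
    return -(np.log10(C) - 1) ** 2 - (np.log10(gamma) + 1) ** 2

def grid_search(c_range, g_range, n=10):
    """Evaluate n geometric steps across each range and keep the best point."""
    Cs = np.geomspace(c_range[0], c_range[1], n)
    Gs = np.geomspace(g_range[0], g_range[1], n)
    return max(((C, g) for C in Cs for g in Gs), key=lambda p: score(*p))

def pattern_search(C, g, step=4.0, tol=1.01):
    """Take multiplicative trial steps from the start point; shrink on failure."""
    while step > tol:
        trials = [(C, g), (C * step, g), (C / step, g), (C, g * step), (C, g / step)]
        best = max(trials, key=lambda p: score(*p))
        if best == (C, g):
            step = step ** 0.5               # no improvement: reduce the step size
        else:
            C, g = best                      # improvement: move the search center
    return C, g

C0, g0 = grid_search((0.1, 1000.0), (1e-4, 1.0))   # coarse grid optimum
C1, g1 = pattern_search(C0, g0)                    # refined optimum
```

This mirrors the combined strategy described above: the grid lands near the right region, and the pattern search refines from there with far fewer evaluations than a finer grid would need.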
Classification With More Than Two Categories
The idea of using a hyperplane to separate the feature vectors into two groups works well when
there are only two target categories, but how does SVM handle the case where the target variable
has more than two categories? Several approaches have been suggested, but two are the most
popular: (1) "one against many", where each category is split out and all of the other categories
are merged; and (2) "one against one", where k(k-1)/2 models are constructed, with k the
number of categories. DTREG uses the more accurate (but more computationally expensive)
technique of one against one.
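For k categories, one-against-one enumerates the k(k-1)/2 unordered pairs, one binary model per pair:

```python
from itertools import combinations

def one_vs_one_pairs(categories):
    """One binary model per unordered pair of categories: k*(k-1)/2 models."""
    return list(combinations(categories, 2))

pairs = one_vs_one_pairs(["a", "b", "c", "d"])   # k = 4 -> 6 models
```

With four categories this already means six binary SVMs, which is why the one-against-one approach is the more computationally expensive of the two.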
Optimal Fitting Without Over Fitting
The accuracy of an SVM model is largely dependent on the selection of the kernel parameters
such as C, Gamma, and P. As described above, DTREG provides a grid search and a pattern
search for finding optimal parameter values.

To avoid overfitting, cross-validation is used to evaluate the fitting provided by each parameter
value set tried during the grid or pattern search process.
The following figure by Florian Markowetz illustrates how different parameter values may cause
underfitting or overfitting:












KOHONEN SELF-ORGANIZING MAPS :

Kohonen's networks are one of the basic types of self-organizing neural networks. The
ability to self-organize provides new possibilities: adaptation to formerly unknown
input data. It seems to be the most natural way of learning, the one used in our brains,
where no patterns are defined in advance. Those patterns take shape during the learning
process, which is combined with normal operation. "Kohonen network" is a synonym for
a whole group of nets which make use of a self-organizing, competitive learning method.
We apply signals to the net's inputs and then choose the winning neuron, the one which
corresponds to the input vector best. The precise scheme of the competition and of the
later modification of synaptic weights may have various forms. There are many
competition-based sub-types, which differ in the precise self-organizing algorithm.
Architecture of self-organizing maps :

The structure of a neural network is a crucial matter. A single neuron is a simple
mechanism and is not able to do much by itself; only an assembly of neurons makes
complicated operations possible. Because of our limited knowledge about the actual
rules of the human brain's functioning, many different architectures have been created,
which try to imitate the structure and behaviour of the human nervous system. Most
often a one-way, one-layer type of network architecture is used. This is determined by
the fact that all neurons must participate in the competition with the same rights;
because of that, each of them must have as many inputs as the whole system.


Neural network

2-D map of neurons




Stages of operations:
The functioning of a self-organizing neural network is divided into three stages:
construction
learning
identification
A system which is supposed to realize the functioning of a self-organizing network
should consist of a few basic elements. The first of them is a matrix of neurons
stimulated by input signals. Those signals should describe some attributes of the events
which occur in the surroundings; thanks to that description the net is able to group
those events. Information about events is translated into impulses which stimulate
neurons. The group of signals transferred to each neuron doesn't have to be identical
(even its number may vary), but it has to satisfy one condition: it must unambiguously
define those events.
Another part of the net is a mechanism which determines the degree of similarity
between each neuron's weights and the input signal, and which selects the unit with
the best match: the winner. At the beginning the weights are small random numbers;
it is important that no symmetry occurs. While learning, those weights are
modified so as to best reflect the internal structure of the input data. However,
there is a risk that neurons could lock onto some values before the groups are correctly
recognized; then the learning process should be repeated with different weights.
Finally, it is absolutely necessary for the self-organizing process that the net can
adapt the weight values of the winning neuron and its neighbours, according to the
response strength. The net topology can be defined in a very simple way by determining
the neighbours of every neuron. Let us call the unit whose response to a stimulation is
maximal the image of this stimulation. Then we can presume that the net is in order
if the topological relations between input signals and their images are identical.
Algorithm of learning:

The name of the whole class of networks came from the designation of the algorithm
called self-organizing Kohonen maps, described in the publication "Self-Organizing
Map". Kohonen proposed two kinds of neighbourhood function: rectangular and
Gaussian. The first is:

    h(d) = 1 if d <= lambda, 0 otherwise

and the second:

    h(d) = exp(-d^2 / (2 * lambda^2))

where d is the distance of a neuron from the winner on the map grid and lambda is the
radius of the neighbourhood, which decreases in time.

Use of Kohonen's method gives better results than the "Winner Takes All" method: the
organization of the net is better (the arrangement of the neurons represents the
distribution of the input data more faithfully) and the convergence of the algorithm is
faster. On the other hand, a single iteration takes a few times longer, since the weights
of many neurons, not only the winner's, have to be modified.
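A compact sketch of the training loop with the Gaussian neighbourhood described above (a 1-D map; the data, learning rate, and radius schedule are invented for illustration):

```python
import numpy as np

def train_som(X, n_units, epochs=50, lr=0.5, lam0=2.0, seed=0):
    """1-D Kohonen map: pull the winner and its grid neighbours toward each input."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_units, X.shape[1]))          # small random initial weights
    for t in range(epochs):
        lam = lam0 * (1 - t / epochs) + 1e-3       # neighbourhood radius decays
        for x in X:
            winner = ((W - x) ** 2).sum(axis=1).argmin()
            d = np.arange(n_units) - winner        # distance to winner on the map
            h = np.exp(-d ** 2 / (2 * lam ** 2))   # Gaussian neighbourhood
            W += lr * h[:, None] * (x - W)         # move weights toward the input
    return W

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
W = train_som(X, n_units=3)
```

Early on the large radius pulls whole stretches of the map together, ordering it; as lambda shrinks the update approaches pure winner-take-all and the units settle over the input data.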










