
Lecture 2 MCOM0079

Neural Networks and Machine Learning

Let the Errors of a wise man make your rule
Not the perfection of fools
William Blake
Artificial Neural Networks (ANNs) are biologically inspired models of computation founded on a
conceptual view of the animal brain. ANNs consist of a number of interconnected neurons, each neuron
being conceived as a simple computational unit.
The simplest model of a neuron is given below.

[Figure: a simple neuron model, with inputs (dendrites) feeding an integrator (the soma) that produces an output (the axon)]

The connection between neurons takes place between an afferent neuron's output (its axon) and an efferent neuron's input (its dendrites). This connection takes place at a junction called the synapse. The synapse either promotes the incoming signal (in which case the afferent neuron is said to be excitatory) or, conversely, it may inhibit such a signal. A diagram is given below.
[Figure: the axon of the afferent neuron meets a dendrite of the efferent neuron at a junction, the synapse]
In the simple models that we shall study, the property of a synapse may change from excitatory to
inhibitory and vice versa. It is these changes that constitute learning.
A typical Neural Network is given below.

It consists of three layers (input, hidden and output); in this picture the output from one layer connects to the input of another. Often we decorate each connection with a number, or weight; the value of this weight reflects the conductance of the synapse.
[Figure: neuron i connected to neuron j by a synapse of weight wij]
In general the weight between neuron i and neuron j is represented by wij.

The neuron acts in the following way: input signals are sent from afferent neurons (along their axons), transmitted across the synapses, passed along the dendrites and finally received at the soma. It is there that the signals are integrated (or summed); if a certain value is exceeded (the neuron's threshold level), then a signal is propagated (from the axon hillock) along the neuron's axon.
[Figure: a neuron receiving inputs x1, x2, x3 across synapses of weight w1, w2, w3 and producing output y]

The total input to a given neuron is modelled as a = x1*w1 + x2*w2 + x3*w3, where xi represents the input from an external source and wi reflects the conductance of the synapse.
The output y is a function of a and θ, the threshold value of the neuron, written y = f(a, θ). The function f is either bivalent (the step or sign function) or one of their continuous equivalents (sigmoid or hyperbolic tangent); the details are not important here.
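As a concrete illustration, the following minimal sketch (not part of the lecture's code, which appears further below) computes the output of a single bipolar threshold neuron using a hard sign activation; the input, weight and threshold values are arbitrary.

// A minimal sketch of a single threshold neuron (illustrative values only).
public class NeuronSketch{
    public static void main(String[] args){
        double[] x = {1, -1, 1};        // inputs x1, x2, x3 (bipolar)
        double[] w = {0.2, -0.5, 0.3};  // synaptic weights w1, w2, w3
        double theta = 0.1;             // threshold of the neuron
        double a = 0.0;                 // total (integrated) input
        for (int i = 0; i < x.length; i++){
            a += x[i]*w[i];
        } // for
        // fire (+1) only if the total input exceeds the threshold
        double y = (a > theta) ? 1 : -1;
        System.out.println("a = " + a + ", y = " + y);
    } // main
} // NeuronSketch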
Learning in neural networks is either supervised or unsupervised. Both involve a number of weight changes; for supervised learning the changes are dictated by a comparison between the output from the network and a training set of the required results. Unsupervised learning involves the network responding to underlying patterns in the input data.
The learning paradigm that we will use here is due to Donald Hebb. Hebbian learning is normally associated with unsupervised learning, but we shall apply the principle to supervised learning. Hebbian learning is based on the following observation in biological neural networks:
when an axon of cell A repeatedly takes part in exciting a cell B, some metabolic change takes place (at the synapse) such that A's efficacy in firing cell B is promoted.
We generalise this observation as follows.

[Figure: neuron A connected to neuron B by a synapse of weight w]
Suppose the output from neuron A is α and that of B is β; then the change in weight, Δw, is given as
Δw = f(α, β)
where f is a suitable function reflecting the Hebbian learning principle above.
The Hebbian rule that we shall adopt in the following lecture is as follows:
Each input/output of the neurons is taken to be bipolar (-1 for false and +1 for true).
The change in weight Δw is then modelled as
Δw = c*α*β
where c is zero or a small positive number (c >= 0) for a correct answer
and a negative number (c < 0) for an incorrect answer.
The effect of this rule is to strengthen the synapse (Δw >= 0) if α and β agree and the answer is correct, or if α and β disagree and the answer is incorrect. It weakens the synapse (Δw < 0) otherwise. On careful consideration, this is just what is required for supervised learning.
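For example, suppose the answer was correct and c = +0.1: if α = 1 and β = 1 (the two outputs agree), then Δw = 0.1*1*1 = +0.1 and the synapse is strengthened, whereas if α = 1 and β = -1 (they disagree), Δw = 0.1*1*(-1) = -0.1 and it is weakened. With an incorrect answer and c = -0.2, the signs reverse: agreement now weakens the synapse (Δw = -0.2) and disagreement strengthens it (Δw = +0.2). (The values of c here are purely illustrative.)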
The following algorithm describes Hebbian Learning

First Some Functions that will be of use


/**
 * class Maths
 * David Smith
 *
 * random(int n)         - returns a random integer from 0 -- n
 * random()              - returns a random number from -0.3 -- +0.3
 * max(int a,int b)      - returns the maximum of two integers
 * sign(double x)        - returns the sign (+1 or -1) of a given number
 * round(int n,double x) - rounds x to n decimal places
 */
public class Maths{
/*
* random(int n)
* pre n > 0
* returns random number from 0 -- n
*/
public static int random(int n){
double rnm = n*Math.random();
return (int) (rnm + 0.5);
} // random
/*
* random()
* returns random number from -0.3 -- +0.3
*/
public static double random(){
double sign;
double scale = 0.3;
if (Math.random() > 0.5){
sign = 1.0;
}else{
sign = -1.0;
}
return sign*scale*Math.random();
} // random
public static int max(int a,int b){
if (a < b){
return b;
} else {
return a;
} // if
} // max
public static int sign(double x){
if (x < 0){
return -1;
}else {
return 1;
} // if
} // sign
public static double round(int n, double x){
double result = (x * Math.pow(10,n)) + sign(x)*0.5;
return ((int) result)/ Math.pow(10,n);
} // round
} // Maths
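Note that the Layer class below calls Maths.sigmoid(n, x) in its fuzz method, but no such method appears in the Maths listing above. A minimal bipolar sigmoid that would satisfy that call could be added to class Maths as sketched here; treating the integer argument n as a gain (steepness) factor is an assumption, not something the original listing confirms.

    /*
     * sigmoid(int n, double x)
     * A bipolar sigmoid with range (-1, +1); n acts as a gain (steepness) factor.
     * This is a sketch supplied so that Layer.fuzz(...) compiles; the exact
     * function used in the original notes is not shown.
     */
    public static double sigmoid(int n, double x){
        return 2.0/(1.0 + Math.exp(-n*x)) - 1.0;
    } // sigmoid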

Next The Class Layer


/**
* class Layer
* David Smith
*
* "Let the errors of wise men make your rule
* Not the perfections of fools"
*
* William Blake
*/
public class Layer{
public double[][] weights;
public double[] input;
public double[] output;
public int nodesIn;
public int nodesOut;
public double rate;       // learning rate
public double percentage; // fraction (0-1) of updated synapses
public int count;         // number of iterations (correct)
public Layer(int nodesIn,int nodesOut){
this.nodesIn = nodesIn;
this.nodesOut = nodesOut;
weights = new double[nodesIn+1][nodesOut];
this.output = new double[nodesOut];
this.input = new double[nodesIn];
initialiseWeights();
this.percentage = 0.2;
this.count = 1;
} // Layer
/*
* learn(boolean correct)
*
* Apply the Hebbian Rule
*/
public void learn(boolean correct){
// training is sensitive to the rate
if (correct){
count++;
rate = 0.01/count; // slight reinforcement;
} else {
rate = -0.2;
// learn by your mistakes!
} // if
// calculate the number of weights to be updated
double tmp = (nodesIn+1)*nodesOut*percentage;
int wtsUpdated = Maths.max((int) tmp,3);
// select an appropriate weight
for (int k = 0; k < wtsUpdated; k++){
int i = Maths.random(nodesIn);
int j = Maths.random(nodesOut - 1);
if (i < nodesIn){ // input
weights[i][j] += rate*Maths.sign(input[i])*Maths.sign(output[j]);
} else {
// threshold
weights[i][j] += rate*Maths.sign(-1)*Maths.sign(output[j]);
} // if
} // for
} // learn

/*
* propogate(double[] data)
*
* sets the attribute input to data
* and calculates the attribute output using the stored weights
*/
public void propogate(double[] data){
input = data;
for (int j = 0; j < nodesOut; j++){
output[j] = 0.0;
for (int i = 0; i < nodesIn; i++){
output[j] += weights[i][j]*input[i];
} // for i
output[j] += (-1)*weights[nodesIn][j];
// threshold
output[j] = Maths.sign(output[j]);
} // for j
} // propogate
public void fuzz(double[] data){
input = data;
for (int j = 0; j < nodesOut; j++){
output[j] = 0.0;
for (int i = 0; i < nodesIn; i++){
output[j] += weights[i][j]*input[i];
} // for i
output[j] += (-1)*weights[nodesIn][j];
// threshold
output[j] = Maths.sigmoid(3,output[j]);
} // for j
} // fuzz

public void prop(double[] data){
} // prop
/*
* display the weights of this layer
*/
public void show(){
for(int i = 0; i < nodesIn+1;i++){
for(int j = 0; j < nodesOut;j++){
System.out.print(Maths.round(3,weights[i][j])+ "\t");
} // for
System.out.println();
} // for
System.out.println("--");
} // show
// -----------------------------------------------------------------------------------------
private void initialiseWeights(){
for (int i = 0;i < nodesIn+1;i++){
for (int j = 0; j<nodesOut; j++){
weights[i][j] = Maths.random();
} // for j
} // for i
} // initialiseWeights
} // Class Layer
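As a quick illustration of how a Layer is used (a sketch only, not part of the original listing; it assumes the Maths and Layer classes above, including the sigmoid sketch, are compiled together):

// Illustrative driver for a single Layer.
public class LayerDemo{
    public static void main(String[] args){
        Layer layer = new Layer(3, 2);   // 3 inputs, 2 outputs, random initial weights
        double[] pattern = {1, -1, 1};   // one bipolar input pattern
        layer.propogate(pattern);        // each output[j] becomes +1 or -1
        layer.show();                    // print the (3+1) x 2 weight matrix
        layer.learn(false);              // pretend the answer was wrong: rate = -0.2
        layer.show();                    // a few randomly chosen weights have changed
    } // main
} // LayerDemo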

The Class Network sticks a number of Layers together


/**
* class Network
* David Smith
*/
public class Network{
public Layer[] network;
public int layers;
public double[] input;
public double[] output;
public Network(int[] architecture){
layers = architecture.length - 1;
network = new Layer[layers];
// create each layer
for (int i = 0; i < layers; i++){
int nodesIn = architecture[i];
int nodesOut = architecture[i+1];
network[i] = new Layer(nodesIn,nodesOut);
} // for i
this.input = new double[architecture[0]];
this.output = new double[architecture[layers]];
}// Network
/*
* propogate(double[] data)
*
* propogate the data through the network
*/
public void propogate(double[] data){
this.input = data;
double[] tmp = data;
for (int i = 0; i < layers; i++){
network[i].propogate(tmp);
tmp = network[i].output;
} // for i
output = network[layers - 1].output;
} // propogate
public void fuzz(double[] data){
this.input = data;
double[] tmp = data;
for (int i = 0; i < layers-1; i++){
network[i].propogate(tmp);
tmp = network[i].output;
} // for i
network[layers-1].fuzz(tmp);
output = network[layers - 1].output;
} // fuzz
/*
* learn(boolean correct)
*/
public void learn(boolean correct){
for (int i = 0; i < layers; i++){
network[i].learn(correct);
} // for i
} // learn

/*
* display the weights of each layer in the network
*/
public void show(){
System.out.println("The Weights");
for(int i = 0; i < layers; i++){
System.out.println("Layer " + (i+1));
network[i].show();
}
} // show
} // class Network

Finally, the class Hebb. Create an object specifying the number of nodes in the hidden layer and call the routine train() (a usage sketch is given after the listing).
/**
* class Hebb
* An Exercise in Supervised Hebbian Learning
* David Smith
*/
public class Hebb{
// one-bit full adder (adds two input bits plus a carry-in)
public String title = "Binary Adder";
// Inputs: Cin, A1, A2
double[][] input = {{1,1,1},
{1,1,-1},
{1,-1,1},
{1,-1,-1},
{-1,1,1},
{-1,1,-1},
{-1,-1,1},
{-1,-1,-1}
};
// Outputs: Sum, Cout
double[][] result = { {1,1},
{-1,1},
{-1,1},
{1,-1},
{-1,1},
{1,-1},
{1,-1},
{-1,-1}
};
public Network net;
public int outputNodes = result[0].length;
public int hiddenNodes;
public int inputNodes = input[0].length;
public int[] architecture;
private int maxEpochs = 100000;
private int epochs;
public Hebb(int hiddenNodes){
this.hiddenNodes = hiddenNodes;
architecture = new int[3];
architecture[0] = inputNodes;
architecture[1] = hiddenNodes;
architecture[2] = outputNodes;
net = new Network(architecture);
} // Hebb
public void train(){
epochs = 0;
go();
showTrainingResults();
} // train

public void test(double[] input){
net.fuzz(input);
showTestResults(input);
} // test
//-----------------------------------------------------------------
private void go(){
boolean correct;
boolean done = false;
while (epochs < maxEpochs){
if (done){
break;
} // if
done = true;
for (int k = 0; k < input.length; k++){ // for each input pattern
net.propogate(input[k]);
for (int j = 0; j < outputNodes; j++){ // for each output pattern
correct = (Maths.sign(net.output[j]*result[k][j]) == 1);
net.learn(correct);
if (!correct){
done = false;
} // if
} // for j
} // for k
epochs++;
} // while
} // go
private void showTrainingResults(){
System.out.println(title);
System.out.print("Network Architecture ");
for (int i = 0; i < architecture.length;i++){
System.out.print(architecture[i] + " ");
}
System.out.println();
System.out.println("Network Input | Output");
for (int i = 0; i < input.length; i++){
net.propogate(input[i%input.length]);
for (int j = 0; j < input[0].length;j++){
System.out.print(Maths.round(1,input[i][j]) + "\t");
} // for j
System.out.print("|" + "\t");
for (int j = 0; j < outputNodes;j++){
System.out.print(Maths.round(1,net.output[j]) + "\t");
} // for j
System.out.println();
} // for
System.out.println("Training Epochs = " + epochs);
System.out.println("--------------------");
net.show();
} // showTrainingResults

private void showTestResults(double[] input){
System.out.println("Test Results");
System.out.println("Input ");
for (int i = 0; i < input.length; i++){
System.out.print(input[i] + ",");
}
System.out.println();
System.out.println("--");
System.out.println("Fuzzy Output ");
for (int i = 0; i < outputNodes; i++){
System.out.print(Maths.round(2,net.output[i]) + ",");
}
System.out.println();
System.out.println("--");
} // showTestResults
} // Hebb
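To run the exercise, a driver along the following lines would do; this is a hypothetical main method, not part of the original listing, and the hidden-layer size and test pattern are arbitrary choices.

// Hypothetical driver for the Hebb exercise.
public class Main{
    public static void main(String[] args){
        Hebb hebb = new Hebb(3);            // 3 nodes in the hidden layer
        hebb.train();                       // train the network and print the results
        hebb.test(new double[]{1, -1, 1});  // show the fuzzy output for one input pattern
    } // main
} // Main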

Example 1:
Binary Adder
Network Architecture 3 3 2
Network Input      |  Output
 1.0   1.0   1.0   |   1.0   1.0
 1.0   1.0  -1.0   |  -1.0   1.0
 1.0  -1.0   1.0   |  -1.0   1.0
 1.0  -1.0  -1.0   |   1.0  -1.0
-1.0   1.0   1.0   |  -1.0   1.0
-1.0   1.0  -1.0   |   1.0  -1.0
-1.0  -1.0   1.0   |   1.0  -1.0
-1.0  -1.0  -1.0   |  -1.0  -1.0
Training Epochs = 12006
--------------------
The Weights
Layer 1
 0.33  -0.04   0.3
 0.29  -0.07   0.31
 0.11   0.2   -0.1
 0.06  -0.04   0.03   (threshold values)
--
Layer 2
-0.08   0.16
 0.04  -0.07
 0.08   0.0
 0.02   0.03   (threshold values)
--

Exercise:
Draw the neural network given by the above output.
Example 2:
XOR Gate
Network Architecture 2 2 1
Network Input   |  Output
-1.0  -1.0      |  -1.0
-1.0   1.0      |   1.0
 1.0  -1.0      |   1.0
 1.0   1.0      |  -1.0
Training Epochs = 46
--------------------
The Weights
Layer 1
 0.055   0.185
 0.05    0.175
-0.041   0.084   (threshold values)
--
Layer 2
 0.16
-0.291
 0.191   (threshold values)
--
Exercise:
Draw the neural network given by the above output.
Check by hand that the network really does behave as an XOR gate.

What does this all mean?


The following is the output from our training algorithm.
XOR Gate
Network Architecture 2 2 1
Network Input   |  Output
-1.0  -1.0      |  -1.0
-1.0   1.0      |   1.0
 1.0  -1.0      |   1.0
 1.0   1.0      |  -1.0
Training Epochs = 16
--------------------
The Weights
Layer 1
 0.115   0.092
-0.234  -0.077
-0.287   0.087
--
Layer 2
-0.294
 0.124
-0.181
--
Here we will give a geometric interpretation. Note that this is only an explanatory framework; it is certainly not believed that Hebbian learning in the animal brain is governed by considerations of Euclidean geometry.
The XOR gate above has the following architecture (2 2 1)
[Figure: the 2-2-1 network, with inputs x1 and x2, hidden-layer outputs z1 and z2, and network output y]
Depicted on the diagram is the output from each node: y is the output from the network, x1 and x2 are the inputs to the network, and z1 and z2 are the outputs from the hidden layer. Note that the outputs from the input layer are simply the inputs themselves.
z1 = sign(0.115x1 - 0.234x2 + 0.287)
z2 = sign(0.092x1 - 0.077x2 - 0.087)
y = sign(-0.294z1 + 0.124z2 + 0.181)
Each of the above functions represents a transformation as given in the following diagram
[Figure: three plots. In the (x1, x2) plane, the lines 0.115x1 - 0.234x2 + 0.287 = 0 and 0.092x1 - 0.077x2 - 0.087 = 0; in the (z1, z2) plane, the line -0.294z1 + 0.124z2 + 0.181 = 0. Each line partitions its plane into a -1 region and a +1 region, and the node output takes the value -1 or +1 accordingly]

Let us consider the output z1 = sign(0.115x1 - 0.234x2 + 0.287) from the first node in the hidden layer.
The function sign separates (by assigning a value of either -1 or +1) the members of the (x1, x2) plane
partitioned by the straight line 0.115x1 - 0.234x2 + 0.287 = 0.
A similar argument applies for the output z2 of the second node in the hidden layer.
The effect of the training algorithm ensures that both (1,1) and (-1,-1) are mapped to a single point (in this case (1,-1)). The points (1,-1) and (-1,1) are mapped to (1,1) and (-1,-1) respectively.
The (z1, z2) plane is partitioned by the line -0.294z1 + 0.124z2 + 0.181 = 0. In particular, the points
(1,1) and (-1,-1) are mapped to +1 and the point (1,-1) is mapped to -1 (as required).
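The same check can also be done mechanically. The short sketch below (not part of the original notes) hard-codes the trained weights quoted above and evaluates z1, z2 and y for all four XOR input patterns; it should print -1, 1, 1, -1.

// Verify the trained 2-2-1 XOR network using the quoted weights.
public class XorCheck{
    static int sign(double x){
        return (x < 0) ? -1 : 1;
    } // sign
    public static void main(String[] args){
        int[][] patterns = {{-1,-1},{-1,1},{1,-1},{1,1}};
        for (int[] p : patterns){
            int x1 = p[0], x2 = p[1];
            int z1 = sign(0.115*x1 - 0.234*x2 + 0.287);  // first hidden node
            int z2 = sign(0.092*x1 - 0.077*x2 - 0.087);  // second hidden node
            int y  = sign(-0.294*z1 + 0.124*z2 + 0.181); // output node
            System.out.println(x1 + " XOR " + x2 + " -> " + y);
        } // for
    } // main
} // XorCheck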