


Artificial Neurons
An artificial neuron is a mathematical function conceived as a counterpart of a
biological neuron. It is alternatively named elementary processing unit, binary
neuron, node, linear threshold function or McCulloch-Pitts (MCP) neuron.
An artificial neuron receives one or more weighted inputs (representing dendrites and their synaptic weights) and sums them up (analogously to the spatio-temporal summation of signals at the level of the soma). Subsequently, the sum is passed through a non-linear function known as an activation or transfer function.
If a certain threshold level is exceeded, the neuron fires and sends a signal to the neighboring cells.
The transfer functions usually have a sigmoid shape, though they may take other non-linear forms, such as piecewise linear or step functions. Generally, transfer functions are monotonically increasing, continuous, differentiable, and bounded.
Simple artificial neurons, such as the McCulloch-Pitts model, are sometimes characterized as caricature models: they are intended to reflect only some neurophysiological characteristics, without aiming at a realistic or full representation of their biological counterparts.

Structure of Artificial Neurons

An artificial neuron is a more or less simplified mathematical (computational) model of a biological neuron.
These models mimic the real-life behavior of neurons with respect to the electro-chemical messages they exchange: input (afferent signals from dendrites), signal processing at the level of the soma, and output (efferent action potentials delivered by the axonal boutons).

Consider an artificial neuron and let there be P + 1 inputs with signals x_0 through x_P and weights w_{k0} through w_{kP}.
Usually, the x_0 input is assigned the value +1, which makes it a bias input with w_{k0} = b_k. This leaves only P actual inputs to the neuron: from x_1 to x_P.

The input signal and the output of a single neuron indexed by k are given by the following equations:

u_k = \sum_{j=1}^{P} w_{kj} x_j

y_k = \varphi(u_k) = \varphi\left( \sum_{j=1}^{P} w_{kj} x_j \right)

Here u_k refers to the spatiotemporal sum of all weighted inputs of the neuron, and \varphi is the transfer function.
In most cases, it is useful to include a threshold \theta_k for each neuron:

y_k = \varphi(u_k) = \varphi\left( \sum_{j=1}^{P} w_{kj} x_j - \theta_k \right)

In vector notation, for each neuron k:

x = (x_1, x_2, \dots, x_P) - vector of input signals

w_k = (w_{k1}, w_{k2}, \dots, w_{kP}) - vector of synaptic weights
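The equations above can be sketched in a few lines of Python. The function name, the particular weights, and the choice of a logistic transfer function below are illustrative assumptions, not part of the original model:

```python
import math

def neuron_output(x, w, theta, phi=lambda u: 1.0 / (1.0 + math.exp(-u))):
    """Single artificial neuron: y_k = phi(sum_j w_kj * x_j - theta_k)."""
    u = sum(wj * xj for wj, xj in zip(w, x))  # u_k: weighted sum of inputs
    return phi(u - theta)                     # apply the transfer function

# Example with P = 2 inputs and the logistic transfer function
y = neuron_output(x=[1.0, 0.5], w=[0.4, -0.2], theta=0.1)
print(y)
```

Equivalently, the threshold can be folded into a bias input x_0 = +1 with weight w_{k0} = -\theta_k, as described above.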

History of Artificial Neurons

The first artificial neuron was the Threshold Logic Unit (TLU), proposed by Warren McCulloch and Walter Pitts in 1943. As a transfer function, it employed a threshold, equivalent to using the Heaviside step function.
Initially, a simple model was considered, with binary inputs and outputs, yet it
was noticed that any Boolean function could be implemented by networks of
such devices.
Cyclic networks, with feedback through neurons, could define dynamical
systems with memory, but most of the research concentrated strictly on
feedforward networks because of easier mathematical tractability.
An artificial neural network (ANN) that used the linear threshold function was the perceptron, developed by Frank Rosenblatt. This model already allowed more flexible weight values in neurons, and it was used in machines with adaptive capabilities.
The representation of the threshold values as a bias term was introduced by
Bernard Widrow in 1960.
In the late 1980s, neurons with continuous (differentiable) transfer functions were considered, and optimization algorithms such as gradient descent were used for adjusting the weights. ANNs also started to be used as general function approximation models.

Transfer Function
The transfer function of a neuron is chosen to have a number of properties
which either enhance or simplify the network containing the neuron.
A sigmoid function is a bounded differentiable real function that is defined for all
real input values and has a positive derivative at each point.

Sigmoid functions are often normalized so that their slope at the origin is 1 (as for the hyperbolic tangent); the standard logistic function, by contrast, has slope 1/4 at the origin.

Sigmoid Function
A sigmoid function is a function having an "S" shape (sigmoid curve).
Often, sigmoid function refers to a special case such as the logistic function, defined by the formula:

S(t) = \frac{1}{1 + \exp(-t)}
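A minimal sketch of the logistic function in Python (the function name is an illustrative choice):

```python
import math

def logistic(t):
    """Logistic sigmoid S(t) = 1 / (1 + exp(-t)); values lie in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

# S is symmetric about (0, 1/2): S(-t) = 1 - S(t)
print(logistic(0.0))   # 0.5
```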

Another example is the Gompertz curve, which is used in modeling systems that saturate at large values of t.

Gompertz Curve or Function

X(t) = a \exp\!\big( b \exp(c t) \big)

where a is the upper asymptote, b and c are negative numbers, b sets the displacement along the x axis (translates the graph to the left or right), and c sets the growth rate (y scaling).
A Gompertz function is a sigmoid function that models time series, where growth
is slowest at start and end of a time period. The Gompertz curve is used in
modeling systems that saturate at large values of t. Gompertz function is a
special case of the generalized logistic function.
Examples of usage are in modeling the growth of tumors or populations in a
confined space where the availability of nutrients is limited.
The future value asymptote of the function is approached much more gradually
by the curve than the left-hand or lower valued asymptote, in contrast to the
simple logistic function in which both asymptotes are approached symmetrically.
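A sketch of the Gompertz function in Python, with illustrative parameter values a = 1, b = -2, c = -1.5 (any negative b and c would do):

```python
import math

def gompertz(t, a=1.0, b=-2.0, c=-1.5):
    """Gompertz curve X(t) = a * exp(b * exp(c*t)), with b, c negative.
    X approaches the upper asymptote a as t -> +inf and 0 as t -> -inf."""
    return a * math.exp(b * math.exp(c * t))
```

With b and c negative, the inner exponential decays to 0 for large t, so X(t) tends to a; for very negative t, the inner exponential blows up and X(t) tends to 0, illustrating the asymmetric approach to the two asymptotes.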

Error Function

The error function (also called the Gauss error function) is a special (non-elementary) function of sigmoid shape:

\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, dt

The complementary error function is defined as:

\mathrm{erfc}(x) = 1 - \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} \exp(-t^2)\, dt
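Both functions are available in Python's standard library; as a small illustration (the connection to the standard normal cdf is a standard identity, shown here as an aside):

```python
import math

# erf and erfc from the standard library satisfy erf(x) + erfc(x) = 1
print(math.erf(1.0))
print(math.erfc(1.0))

# erf relates to the standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
print(phi(0.0))   # 0.5
```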

Gudermannian Function

The Gudermannian function, named after Christoph Gudermann (1798-1852), relates the circular functions and the hyperbolic functions without using complex numbers. The Gudermannian function, denoted by \gamma or gd, and its inverse are defined as:

\mathrm{gd}(x) = \int_0^x \frac{dt}{\cosh(t)}

\mathrm{gd}^{-1}(x) = \int_0^x \frac{dt}{\cos(t)}

Heaviside Function
The Heaviside step function, or the unit step function, usually denoted by H, is a discontinuous function. The output y of the H transfer function is binary, depending on a specified threshold \theta:

y(u) = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}

It seldom matters what value is used for H(0), since H is mostly used as a distribution.
The Heaviside function is the integral of the Dirac delta function \delta, although this expansion may not hold (or even make sense) for u = 0, depending on which formalism is used to give meaning to integrals involving \delta:

H(u) = \int_{-\infty}^{u} \delta(s)\, ds

The Heaviside step function is used in some neuromorphic models as well. It
can be approximated from other sigmoidal functions by assigning large values
to the weights. It performs a division of the space of inputs by a hyperplane.
An affine hyperplane is an affine subspace of codimension 1 in an affine space.
Such a hyperplane in Cartesian coordinates is described by a linear equation (where at least one of the a_i is non-zero):

a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b

In the case of a real affine space (when the coordinates are real numbers), the hyperplane separates the space into two half-spaces, which are the connected components of the complement of the hyperplane, given by the inequalities:

a_1 x_1 + a_2 x_2 + \dots + a_n x_n < b
a_1 x_1 + a_2 x_2 + \dots + a_n x_n > b
The Heaviside function is especially useful in the last layer of a multilayered network intended to perform binary classification of the inputs.
Affine hyperplanes are used to define decision boundaries in many machine learning algorithms, such as decision trees and perceptrons.
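A minimal sketch of a threshold unit in Python, with illustrative weights that happen to realize an AND gate (the decision boundary w . x + b = theta is an affine hyperplane):

```python
def heaviside_neuron(x, w, b, theta=0.0):
    """Threshold unit: fires (outputs 1) iff w . x + b >= theta.
    The set {x : w . x + b = theta} is the separating hyperplane."""
    u = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if u >= theta else 0

# Hypothetical weights realizing an AND gate over binary inputs
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, heaviside_neuron([x1, x2], w=[1.0, 1.0], b=-1.5))
```

Only the input (1, 1) lies on the positive side of the line x1 + x2 = 1.5, so only that input makes the unit fire.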

Random Variables
A random variable (aleatory variable or stochastic variable) is a real-valued function defined on the set of possible outcomes of a random experiment, the sample space \Omega. That is, a random variable is a function that maps from its domain, the sample space \Omega, to its range (the real numbers or a subset of the real numbers).
A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability (in contrast to other mathematical variables).
The mathematical function describing the possible values of a random variable
and their associated probabilities is known as a probability distribution.
Random variables can be discrete, taking any of a specified finite or countable list of values, and hence described by a probability mass function; continuous, taking any numerical value in an interval or collection of intervals, and described by a probability density function; or a mixture of both types.
Random variables with discontinuities in their CDFs can be treated as mixtures
of discrete and continuous random variables.

Probability Density Function

The probability density function (pdf), or density, of a continuous random variable X, denoted f_X, is a function that describes the relative likelihood for this random variable to take on a given value.
A random variable X has density f_X, where f_X is a non-negative Lebesgue-integrable function, if:

\Pr[a \le X \le b] = \int_a^b f_X(x)\, dx

Cumulative Distribution Function

The cumulative distribution function (cdf) describes the probability that a real-valued random variable X with a given probability distribution will be found at a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x.
The cdf of a continuous random variable X can be expressed as the integral of its probability density function f_X as follows:

F_X(x) = \int_{-\infty}^{x} f_X(t)\, dt
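As an illustrative check of this relationship (using the exponential distribution with rate lambda = 2 as an assumed example), the cdf can be recovered by numerically integrating the pdf:

```python
import math

lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)   # exponential density, x >= 0
cdf = lambda x: 1.0 - math.exp(-lam * x)   # its closed-form cdf

def cdf_numeric(x, n=10_000):
    """Approximate F_X(x) = integral of f_X from 0 to x by a midpoint sum."""
    h = x / n
    return sum(pdf((i + 0.5) * h) for i in range(n)) * h

print(abs(cdf_numeric(1.0) - cdf(1.0)))    # very small
```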

Probability Mass Function

A probability mass function (pmf) is a function that gives the probability that a
discrete random variable is exactly equal to some value.
The probability mass function is often the primary means of defining a discrete
probability distribution, and such functions exist for either scalar or multivariate
random variables whose domain is discrete.
A probability mass function (pmf) differs from a probability density function (pdf) in that the latter is associated with continuous rather than discrete random variables; the values of a pdf are not probabilities as such: a pdf must be integrated over an interval to yield a probability.
Consider a random variable X defined on a sample space \Omega; then the probability mass function f_X is defined as follows:

X : \Omega \to A \subseteq \mathbb{R}, \quad f_X : A \to [0,1], \quad f_X(x) = \Pr(X = x) = \Pr(\{s \in \Omega : X(s) = x\})

(Figure: the probability mass function of a fair die.)

Probability Mass Distribution - Example

If the sample space is the set of all possible pairs of numbers rolled on two dice, and the random variable X of interest is the sum S of the numbers on the two dice, then X is a discrete random variable whose distribution is described by the probability mass function plotted as the height of the columns in the figure.

In this case, the sample space is the set of pairs \Omega = \{(n_1, n_2) : n_1, n_2 \in \{1,2,3,4,5,6\}\}, and the random variable of interest X is defined as the function that maps each pair to the sum:

X : \Omega \to S, \quad X((n_1, n_2)) = n_1 + n_2

and has the probability mass function f_X given by:

f_X(s) = \frac{\min(s - 1,\; 13 - s)}{36}, \quad s \in \{2,3,4,5,6,7,8,9,10,11,12\}
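The closed form can be verified by enumerating all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two dice and tally each sum
counts = Counter(n1 + n2 for n1, n2 in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in counts.items()}

# Matches the closed form min(s - 1, 13 - s) / 36 for every sum s
for s in range(2, 13):
    assert pmf[s] == Fraction(min(s - 1, 13 - s), 36)

print(pmf[7])   # 1/6, the most likely sum
```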

Degenerate Distribution
A degenerate distribution is the probability distribution
of a random variable which takes a single value only.
The degenerate distribution is localized at a point k_0 on the real axis. The probability mass function and cumulative distribution function are given by:

Probability mass function:

f(k; k_0) = \begin{cases} 1 & \text{if } k = k_0 \\ 0 & \text{if } k \ne k_0 \end{cases}

Cumulative distribution function:

F(k; k_0) = \begin{cases} 1 & \text{if } k \ge k_0 \\ 0 & \text{if } k < k_0 \end{cases}

(Figures: PMF and CDF for k_0 = 0; the horizontal axis is the index i of k_i.)
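A minimal sketch of these two functions in Python (the function names are illustrative):

```python
def degenerate_pmf(k, k0):
    """f(k; k0) = 1 if k == k0 else 0: all probability mass sits at k0."""
    return 1.0 if k == k0 else 0.0

def degenerate_cdf(k, k0):
    """F(k; k0): a unit step at k0 (compare the Heaviside function above)."""
    return 1.0 if k >= k0 else 0.0
```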

References

McCulloch, W. and Pitts, W. (1943), A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5: 115-133.
Mutihac, R. (2000), Modelarea si Simularea Neuronala - Elemente Fundamentale. Editura Universitatii din Bucuresti.
Werbos, P.J. (1990), Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10): 1550-1560.
Robertson, J.S. (1997), Gudermann and the simple pendulum. The College Mathematics Journal, 28(4): 271-276.
FitzHugh, R. and Izhikevich, E. (2006), FitzHugh-Nagumo model. Scholarpedia, 1(9): 1349.
Haykin, S. (1999), Neural Networks: A Comprehensive Foundation. 2nd ed., Prentice Hall.
Hebb, D.O. (1949), The Organization of Behavior. New York: Wiley.
Hodgkin, A.L. and Huxley, A.F. (1952), A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4): 500-544.
Hoppensteadt, F.C. and Izhikevich, E.M. (1997), Weakly Connected Neural Networks. Springer.
Abbott, L.F. (1999), Lapicque's introduction of the integrate-and-fire model neuron (1907). Brain Research Bulletin, 50(5/6): 303-304.
Koch, C. and Segev, I. (1999), Methods in Neuronal Modeling: From Ions to Networks. 2nd ed., Cambridge, MA: MIT Press.