
Neural Networks and Their Statistical Application

By Clint Hagen
Statistics Senior Seminar 2006
Outline
 What Neural Networks are and why they
are desirable
 How the process works and appropriate
statistical applications
 Basic Architectures and algorithms
 Applications
 Drawbacks and limitations
 Demonstration using “NeuroShell 2”
The original analyst
What are they?
 Computer algorithms designed to mimic
human brain function
 Set of simple computational units which
are highly interconnected
Human Brain Function
Neural Network Function
Some Similarities
Why Neural Networks are desirable
 The human brain can generalize from abstract information
 Recognize patterns in the presence of noise
 Recall memories
 Make decisions for current problems based on prior experience
Why Desirable in Statistics
 Prediction of future events based on past
experience
 Able to classify to the nearest pattern in memory; a match doesn’t have to be exact
 Predict latent variables that are not easily
measured
 Non-linear regression problems
What are Neural Networks?
The computational ability of a digital computer combined with the desirable functions of the human brain.
How the Process Works
Terminology, when to use neural
networks and why they are used
in statistical applications.
Terminology
 Input: Explanatory variables also referred
to as “predictors”.
 Neuron: Individual units in the hidden
layer(s) of a neural network.
 Output: Response variables also called
“predictions”.
 Hidden Layers: Layers between input and output that apply an activation function.
Terminology
 Weights: The network parameters found by minimizing an objective function (usually sum-of-squares error) while training a network.
 Backpropagation: Most popular training
method for neural networks.
 Network training: To find values of network
parameters (weights) for performing a
particular task.
Terminology
 Patterns: A set of predictors with their actual output, used in training the network.
When to use neural networks
 Use for very large data sets (e.g., 50 predictors and 15,000 observations) with unknown distributions
 Also useful for smaller data sets with outliers, as neural networks are very resistant to outliers
Why Neural Networks in Statistics?
 The methodology is seen as a new
paradigm for data analysis where models
are not explicitly stated but rather implicitly
defined by the network.
 Advanced pattern recognition capabilities
 Allows for analysis where traditional
methods might be extremely tedious or
nearly impossible to interpret.
Basic Architectures
Feed Forward
 A feed-forward network trained using backpropagation (a “backpropagation network”) is the most commonly used algorithm, applied most often to time series prediction problems.
 We will see this algorithm in more detail
soon
Adaline Network
 Pattern recognition network
 Essentially a single layer backpropagation
network
 Only recognizes exact training patterns (a minimal sketch follows below)
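To make the idea concrete, here is a minimal Python sketch of an Adaline-style single linear unit trained with the Widrow-Hoff (LMS) rule; the function name, learning rate, and simulated data are illustrative assumptions, not part of the slides.

```python
import numpy as np

def adaline_train(X, d, lr=0.01, epochs=50):
    """Widrow-Hoff (LMS) training of a single linear unit (Adaline-style)."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])  # one weight per predictor
    b = 0.0                                     # bias / threshold term
    for _ in range(epochs):
        for x, target in zip(X, d):
            err = target - (np.dot(w, x) + b)   # error on this training pattern
            w += lr * err * x                   # nudge weights toward the pattern
            b += lr * err
    return w, b

# Hypothetical usage: learn d ≈ 2*x1 - x2 from simulated data
X = np.random.default_rng(1).normal(size=(100, 2))
d = 2 * X[:, 0] - X[:, 1]
print(adaline_train(X, d))
```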
Hopfield Model
 The Hopfield model is used as an auto-
associative memory to store and recall a
set of bitmap images.
 Associative recall of images: given an incomplete or corrupted version of a stored image, the network can recall the original (see the sketch below)
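A minimal sketch of this recall behavior, assuming bipolar (+1/−1) patterns and the classical Hebbian outer-product storage rule; the six-unit "images" are toy stand-ins for real bitmaps.

```python
import numpy as np

def hopfield_store(patterns):
    """Hebbian (outer-product) storage of bipolar (+1/-1) patterns."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)        # no self-connections
    return W

def hopfield_recall(W, x, steps=10):
    """Iteratively settle a corrupted pattern toward a stored memory."""
    for _ in range(steps):
        x = np.sign(W @ x)
        x[x == 0] = 1             # break ties toward +1
    return x

stored = np.array([[1, -1, 1, -1, 1, -1],
                   [1, 1, 1, -1, -1, -1]])
W = hopfield_store(stored)
corrupted = np.array([1, -1, -1, -1, 1, -1])  # first pattern with one flipped bit
print(hopfield_recall(W, corrupted))          # recovers the first stored pattern
```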
Boltzmann Machine
 The Boltzmann machine is a stochastic
version of the Hopfield model.
 Used for optimization problems such as
the classic traveling salesman problem
Note
Those are only a few of the more common network structures. Advanced users can build networks designed for a particular problem in many software packages readily available on the market today.
Feed Forward Network Trained Using Backpropagation
Structure
 One-way only
 Can have multiple hidden layers
 Each layer can have an independent number of neurons
 Each layer is fully connected to the next layer
[Feed-Forward Design diagram: a row of Input units, one or more Hidden Layers, and a row of Output units, with every unit connected to each unit in the next layer; a forward-pass sketch follows below]
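A minimal Python sketch of the one-way pass this diagram describes, using a single hidden layer; the layer sizes, sigmoid activation, and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """One-way pass: inputs -> hidden layer -> outputs,
    each layer fully connected to the next."""
    h = sigmoid(W_hidden @ x + b_hidden)   # hidden-layer activations
    return sigmoid(W_out @ h + b_out)      # network output

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                             # 4 inputs (predictors)
W_hidden, b_hidden = rng.normal(size=(3, 4)), rng.normal(size=3)   # 3 hidden neurons
W_out, b_out = rng.normal(size=(1, 3)), rng.normal(size=1)         # 1 output neuron
print(forward(x, W_hidden, b_hidden, W_out, b_out))
```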
Alternate Structure
[Diagram: Predictor 1, Predictor 2, and Predictor 3 feed neurons i and j; weighted connections W_ik and W_jl lead to neurons k and l, which feed the output t]
Weights
 Each connection (arrow) in the previous
diagram has a weight, also called the
synaptic weight
 These weights are adjusted to reduce the error between the desired output and the actual output
Weights
 Weights are adjustable
 Weight w_ij is interpreted as the strength of the connection between the jth unit and the ith unit
 Weight updates are computed in the opposite direction from the one in which the network runs
 Net input for neuron i: net_i = ∑_j w_ij · output_j + µ_i, where µ_i is a threshold for neuron i
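As a small worked example (the weights, unit outputs, and threshold below are made-up values), the net input is just a weighted sum plus the threshold term:

```python
import numpy as np

w_i = np.array([0.4, -0.2, 0.7])    # weights w_ij from connected units j to neuron i
out_j = np.array([1.0, 0.5, -1.0])  # outputs of those units
mu_i = 0.1                          # threshold for neuron i

net_input_i = np.dot(w_i, out_j) + mu_i
print(net_input_i)                  # 0.4 - 0.1 - 0.7 + 0.1 = -0.3
```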


Threshold
 Each neuron takes its net input and
applies an activation function to it
 The output of the jth neuron (its activation value) is g(∑_i w_ji · x_i), where g(·) is the activation function and x_i is the output of the ith unit connected to j
 If the net input exceeds the threshold, the neuron will “fire”
Activation Function
 The only practical requirement for an
activation function is that it be
differentiable
 The sigmoid function is commonly used:
g(netinput) = 1 / (1 + exp(−netinput))
 Or a simple binary threshold unit:
θ(netinput) = 1 if netinput ≥ 0, and 0 otherwise
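Both activation functions from this slide are one-liners in Python (NumPy assumed):

```python
import numpy as np

def sigmoid(net_input):
    """g(netinput) = 1 / (1 + exp(-netinput)): smooth and differentiable."""
    return 1.0 / (1.0 + np.exp(-net_input))

def binary_threshold(net_input):
    """Binary threshold unit: fires (1) when the net input is at least 0."""
    return np.where(net_input >= 0, 1, 0)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))            # approximately [0.119 0.5 0.881]
print(binary_threshold(z))   # [0 1 1]
```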
Backpropagation
 The backpropagation algorithm is a method to find weights for a multilayered feed-forward network (a sketch follows below).
 It has been shown that a feed-forward network trained using backpropagation with a sufficient number of units can approximate any continuous function to any level of accuracy.
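A hedged sketch of one backpropagation update for a single-hidden-layer network with sigmoid units and squared error; the delta terms use the sigmoid derivative g′(z) = g(z)(1 − g(z)). The layer shapes and learning rate are illustrative, not prescribed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, b1, W2, b2, lr=0.1):
    """One gradient-descent update of all weights on squared error."""
    # Forward pass: input -> hidden -> output
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    # Backward pass: error signals propagate output -> hidden,
    # the opposite direction from the forward run
    delta_out = (y - target) * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Update weights and thresholds in place
    W2 -= lr * np.outer(delta_out, h)
    b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_hid, x)
    b1 -= lr * delta_hid
    return 0.5 * np.sum((y - target) ** 2)   # current squared error
```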
Training the Network
 Neural networks must first be trained before being used to analyze new data
 The process entails running patterns through the network until the network has “learned” the model to apply to future data
 Training can take a long time for noisy data
 Training usually doesn’t converge exactly to the desired output, but an acceptable value close to it can be achieved (a training-loop sketch follows below)
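Reusing backprop_step from the previous sketch, training amounts to running the patterns through the network repeatedly until the total error reaches an acceptable (not necessarily zero) value. The XOR patterns, tolerance, and epoch cap here are illustrative; some random initializations may settle in a local minimum.

```python
import numpy as np
# Assumes sigmoid and backprop_step from the backpropagation sketch above.

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # 2 predictors, 3 hidden neurons
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # single output neuron

# Hypothetical training patterns: predictors paired with their actual output (XOR)
patterns = [(np.array([0.0, 0.0]), np.array([0.0])),
            (np.array([0.0, 1.0]), np.array([1.0])),
            (np.array([1.0, 0.0]), np.array([1.0])),
            (np.array([1.0, 1.0]), np.array([0.0]))]

tolerance = 1e-3              # an acceptable error, since exact convergence is rare
for epoch in range(20000):    # noisy data may need many more passes
    total_error = sum(backprop_step(x, t, W1, b1, W2, b2, lr=0.5)
                      for x, t in patterns)
    if total_error < tolerance:
        break                 # network has "learned" the model well enough
```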
New Data
 Once the network is trained new data can
be run through it
 The network will classify new data based
on the previous data it trained with
 If an exact match cannot be found, it will match the new data to the closest pattern found in memory
Regression and Neural Networks
 The objective of a regression problem is to find coefficients that minimize the sum of squared errors
 To find the coefficients we must have a dataset that includes the independent variables and the associated values of the dependent variable (very similar to training a network)
 Equivalent to a single layer feed forward
network
Regression
 Independent variables correspond to
predictors
 Coefficients β correspond to weights
 The activation function is the identity
function
 To find the weights in a neural network we instead use backpropagation and a cost function (the two approaches are compared in the sketch below)
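The equivalence can be demonstrated directly: on simulated data, the closed-form least-squares solution and an iteratively trained single-layer network with identity activation recover the same coefficients. The data, learning rate, and iteration count below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # 3 predictors
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=200)

# Multiple linear regression: closed-form least-squares solution
Xd = np.column_stack([np.ones(len(X)), X])     # intercept column + predictors
beta = np.linalg.lstsq(Xd, y, rcond=None)[0]

# The same model as a single-layer network with identity activation:
# weights found iteratively by gradient descent on sum-of-squares error
w = np.zeros(4)
for _ in range(5000):
    w -= 0.01 * Xd.T @ (Xd @ w - y) / len(y)

print(beta)   # closed-form coefficients
print(w)      # iterative weights, converging to the same values
```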
Difference in Neural Networks
 The difference in the two approaches is
that multiple linear regression has a closed
form solution for the coefficients, while
neural networks use an iterative process.
 In regression models a functional form is
imposed on the data
 In the case of multiple linear regression this assumption is that the outcome is related to a linear combination of the independent variables.
 If this assumption is not correct, it will lead
to error in the prediction
 An alternate approach is not to assume any functional relationship between the dependent variable and the independent variables (predictors), and to let the data define the functional form.
 This is the basis of the power of the neural
networks
 This is very useful when you have no idea
of the functional relationship between the
dependent and independent variables
 If you had an idea, you’d be better off
using a regression model
Drawbacks and Limitations
 Neural Networks can be extremely hard to
use
 The programs are filled with settings you must specify, and a small mistake in those settings will introduce error into your predictions
 The results can be very hard to interpret as well
Drawbacks and Limitations
 Neural networks should not be used when
traditional methods are appropriate
 Since they are data dependent, performance will improve as sample size increases
 Regression performs better when theory
or experience indicates an underlying
relationship
A short demonstration using
“NeuroShell 2”
