With Matlab
By
Thorolf Horn Tonjum
School of Computing and Technology,
University of Sunderland, The Informatics Centre,
St Peter's Campus, St Peter's Way,
Sunderland, SR6 0DD,
United Kingdom
Email: thorolf.tonjum@sunderland.ac.uk
Introduction
This paper describes a neural network time series prediction project,
applied to forecasting the American S&P 500 stock index.
679 weeks of raw data are preprocessed and used to train a neural network.
The project is built with Matlab (MathWorks Inc.).
Matlab is used for processing and preprocessing the data.
A prediction error of 0.00446 (mean squared error) is achieved.
One of the major goals of the project is to visualize how the network
adapts to the real index curve by approximation. This is
achieved by training the network in series of 500 epochs each,
showing the change of the approximation (green color) after each training.
Remember to push the "Train 500 Epochs" button at least 4 times,
to get good results and a feel for the training. You might have to restart the whole program
several times before it "lets loose" and achieves a good fit;
about one out of five reruns produces a good fit.
To run/rerun the program in matlab, type:
>> preproc
>> tx
Dataset
678 weeks of American S&P 500 index data.
14 basic forecasting variables.
The 14 basic variables are:
1. S&P week highest index.
2. S&P week lowest index.
3. NYSE week volume.
4. NYSE advancing volume / declining volume.
5. NYSE advancing / declining issues.
6. NYSE new highs / new lows.
7. NASDAQ week volume.
8. NASDAQ advancing volume / declining volume.
9. NASDAQ advancing / declining issues.
10. NASDAQ new highs / new lows.
11. 3 months treasury bill.
12. 30 years treasury bond yield.
13. Gold price.
14. S&P weekly closing price.
These are all strong economic indicators.
The indicators have not been subject to re-indexation or other alterations of the measurement
procedures, so the dataset covers an unobstructed span from January 1980 to December 1992.
Interest rates and inflation are not included, as they are reflected in the 30 years treasury bond
and the price of gold. The dataset provides an ample model of the macro economy.
Preprocessing
The weekly change in closing price is used as the output target for the network.
The 14 basic variables are transformed into 54 features by
taking the first 13 variables and producing:
I. The change since last week (delta).
II. The second power (x^2).
III. The third power (x^3).
Using the price change from last week as an input variable the week after then gives
54 feature variables in total (13 x 3 = 39 derived variables, plus the 14 original static
variables and the lagged price change).
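As an illustration, the feature construction could look roughly like this in Matlab (a sketch only: the matrix name raw and the index arithmetic are assumptions, and whether the powers are taken of the raw variables or of their deltas is not spelled out above; see Preproc.m for the actual code):

% raw: (N x 14) matrix, one row per week, column 14 = the weekly closing price.
N      = size(raw,1);
t      = (3:N)';                          % skip two weeks for the delta and the lag
base   = raw(t,1:14);                     % the 14 original static variables
delta  = raw(t,1:13) - raw(t-1,1:13);     % I.   change since last week (delta)
sq     = raw(t,1:13).^2;                  % II.  second power (x^2)
cb     = raw(t,1:13).^3;                  % III. third power (x^3)
lagchg = raw(t-1,14) - raw(t-2,14);       % last week's price change as this week's input
P      = [base delta sq cb lagchg]';      % 14 + 13 + 13 + 13 + 1 = 54 features per case
T      = (raw(t,14) - raw(t-1,14))';      % target: the weekly change in closing price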
All input variables are then normalized, which standardizes the input data
to a mean of zero and a standard deviation of 1.
[Matlab command: prestd]
The dimensionality of the data is then reduced to 28 variables by a principal component
analysis with 0.001 as threshold. The threshold is set low since we want to preserve as much
data as possible for the Elman network to work on. [Matlab command: prepca]
We then scale the variables (including the target data) to fit the [-1,1] range, as we use tansig
output functions. [Matlab command: premnmx]
See matlab file "Preproc.m" for further details.
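A minimal sketch of that preprocessing chain, using the (older) Neural Network Toolbox commands named above, with P and T as built from the feature construction (the exact calls in Preproc.m may differ):

% Standardize inputs and targets: mean zero, standard deviation one.
[Pn,meanp,stdp,Tn,meant,stdt] = prestd(P,T);
% Principal component analysis, discarding components that contribute
% less than 0.001 of the total variation: 54 -> 28 variables.
[Ptrans,transMat] = prepca(Pn,0.001);
% Rescale inputs and targets to the [-1,1] range for the tansig output.
[Pt,minp,maxp,Tt,mint,maxt] = premnmx(Ptrans,Tn);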
Choice of Network architecture and algorithms.
We are doing time series prediction, but we are forecasting a stock index, and rely on current
economic data just as much as on the lagged data from the time series being forecasted;
this gives us a wider spectrum of neural model options:
Multilayer Perceptron networks (MLP),
Tapped Delay-line networks (TDNN), and recurrent network models can all be used.
In our case, detecting cyclic patterns becomes a priority, together with good multivariate
pattern approximation ability.
The Elman network is selected because of its ability to detect both temporal and spatial
patterns. Choosing a recurrent network is favorable, as it accumulates historic data in its
recurrent connections.
Using an Elman network for this problem domain demands a high number of hidden
neurons; 35 is found to be the best trade-off in our example, whereas if we used a normal MLP
network, around 16 hidden neurons would be enough.
The Elman network needs more hidden nodes to respond to the complexity in the data,
as well as having to approximate both temporal and spatial patterns.
We train the network with a gradient descent training algorithm, enhanced with momentum
and an adaptive learning rate; this enables the network to climb past points, performance-wise,
where gradient descent training algorithms without an adaptive learning rate
would get stuck.
We use the matlab learning function learnpn for learning,
as we need robustness to deal with some quite large outliers in the data.
Maximum validation failures [net.trainParam.max_fail=25]
is set arbitrarily high, as this provides the learning algorithm a greater ability to escape local
minima and continue to improve, where it would otherwise get stuck.
The momentum is also set high (0.9) to ensure a high impact of the previous weight change.
This speeds up the gradient descent, helps keep us out of local minima, and
resists memorization.
The learning rate is initially set relatively high at 0.2; this is possible because of the high
momentum, and because it is controlled by the adaptive learning rate rules of the
matlab training method traingdx.
We choose purelin as the transfer function for the output of the hidden layer,
as this provided more approximation power, and tansig for the output layer, as we scaled the
target data to fit the [-1,1] range.
The weight initialization scheme initzero is used to start the weights off from zero;
this provides the best end results, but heightens the trial and error factor, resulting in
having to restart the program between 5 and 8 times to get a "lucky" fit.
Once you have a "lucky" fit, training the network for 3-5 series of 500 epochs usually
yields results in the 0.004 mse range.
Maximum performance increase is set to 1.24, giving the algorithm some leeway to test out
alternative routes before getting called back on the path.
[net.trainParam.max_perf_inc = 1.24]
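Gathered in one place, the network could be set up roughly as follows (a sketch using the older Neural Network Toolbox API; newelm, traingdx, learnpn, initzero and the trainParam fields are toolbox names from the text, but the exact setup in Tx.m may differ):

% Two-layer Elman network: 35 hidden neurons (purelin), 1 output (tansig).
net = newelm(minmax(Pt), [35 1], {'purelin','tansig'}, 'traingdx', 'learnpn');
% Start all weights and biases from zero with initzero.
net.initFcn                   = 'initlay';
net.layers{1}.initFcn         = 'initwb';
net.layers{2}.initFcn         = 'initwb';
net.inputWeights{1,1}.initFcn = 'initzero';
net.layerWeights{1,1}.initFcn = 'initzero';   % the recurrent connections
net.layerWeights{2,1}.initFcn = 'initzero';
net.biases{1}.initFcn         = 'initzero';
net.biases{2}.initFcn         = 'initzero';
net = init(net);
% Training parameters discussed above.
net.trainParam.lr           = 0.2;    % initial learning rate
net.trainParam.mc           = 0.9;    % momentum
net.trainParam.max_fail     = 25;     % maximum validation failures
net.trainParam.max_perf_inc = 1.24;   % maximum performance increase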
With well over 400 training cases to work with, 35 hidden neurons and 28 input variables,
we get 980 hidden layer weights, which is well below the rule of thumb number 4000
(10 x cases).
Results in the 0.004 mse range support the conclusion that the model choice
was not the worst possible. Additional results could have come from adding lagged
data, like exponentially smoothed averages from different time frames and with different
smoothing factors, efficiently accumulating memory over large time scales.
Integrating a tapped delay-line setup could also have been beneficial.
But these alternatives would have added to the curse of dimensionality, probably not
yielding great benefits in return, especially as long as the recurrent memory of the Elman
network seemed to perform with ample sufficiency.
The training set's 400 weeks were taken from the start of the data, then came the 140 weeks of
test set, and finally 139 weeks of validation data,
in effect approximating data at the 0.004 mse level
more than 5 years (279 weeks) into the future.
Training & Visualization.
The data is, as described above, divided in the classic 60/20/20 format for
training set, testing set, and validation set.
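With the old toolbox train API, the split and the two held-out sets could be passed along these lines (a sketch using the names from the earlier sketches; which of the two held-out sets drove the early stopping is an assumption):

Ptr  = Pt(:,1:400);    Ttr  = Tt(:,1:400);    % training set: first 400 weeks
VV.P = Pt(:,401:540);  VV.T = Tt(:,401:540);  % test set: next 140 weeks (early stopping)
TV.P = Pt(:,541:end);  TV.T = Tt(:,541:end);  % validation set: final 139 weeks
[net,tr] = train(net, Ptr, Ttr, [], [], VV, TV);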
The appro"imation is +isuali-ed b. the actual course 'blue( +ersus the appro"imation 'green(#
This is done for the training set, the testing set, and the +alidation set#
This clearl. demonstrates how the neural net is apro"imating the data 5
1rrors are displa.ed as red bars in the bottom of the charts#
The training is done by training 500 epochs, then displaying the results, then training a new
500 epochs, and so forth. Seeing the approximation "live" gives interesting insights into how
the algorithm adapts, and how changes in the model affect adaptation.
Push the button to train a new 500 epochs.
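One press of the button then corresponds roughly to the following (a sketch of what Gui.m does; the plotting details are assumptions):

net.trainParam.epochs = 500;                          % one button press = 500 more epochs
[net,tr] = train(net, Ptr, Ttr, [], [], VV, TV);
Y = sim(net, Ptr);                                    % the network's approximation
plot(Ttr, 'b'); hold on;                              % actual price change (blue)
plot(Y, 'g');                                         % approximation (green)
bar(Ttr - Y, 'r'); hold off;                          % errors as red bars at the bottom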
The effect of the adaptive learning rate is quite intriguing, specifically its effect
on the performance: the learning rate is dynamic, controlled by adaptation rules,
and the changing learning rate produces vivid changes in the performance.
The correlation plot gives ample insight into how closely the model is mapping the data.
To see this, push the "correlation plot" button.
The "Sum(abs(errors))" display shows the sum of the absolute values of all the errors, as a steadfast
and unfiltered measurement.
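In code this measurement is simply (same assumed names as in the earlier sketches):

sae = sum(abs(Tt - sim(net, Pt)));   % sum of the absolute errors over all cases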
"ppendi#$ ". The %atla! code $
Preproc#m 5 Preprocessing the data#
T"#m 5 Setting up the network#
=ui#m 5 Training and displa.ing the network#