Data Prediction
Huaien Gao1,2, Rudolf Sollacher2, Hans-Peter Kriegel1
1- University of Munich, Germany
2- Siemens AG, Corporate Technology, Germany
Abstract— A spiral recurrent neural network (SpiralRNN) has a special structure of the recurrent hidden layer which allows bounding the eigenvalues of the recurrent weight matrix. Thus, the network can learn characteristic temporal correlations online without running into dynamical instabilities. In this paper, SpiralRNN is employed to solve the financial time series prediction problem of the NN5 competition. We use these time series to demonstrate the performance of SpiralRNN on data with pronounced weekly and seasonal periodicities. These periodicities are taken into account by suitable pre-processing and by providing additional sinusoidal input time series with the appropriate periods. The non-regular Easter holidays are taken into account by an additional Gaussian-shaped input signal centered at these holidays. The prediction performance is enhanced by a mixture-of-experts approach consisting of the combined output of 30 online-learning SpiralRNNs, with weights proportional to their temporal average one-step forecast error. The main advantage of this approach is the low configuration effort and the online learning capability. An evaluation based on a forecast of the last 56 data points of the 111 time series is provided.

I. INTRODUCTION

Time series prediction is a common task in various industry sectors, such as robotic control and financial markets. The NN5 competition1 is one of the leading competitions with an emphasis on utilizing computational intelligence methods. The data in question come from the amount of money withdrawn from ATM machines across England. These data exhibit strong periodical (e.g. weekly, seasonal and yearly) behavior. The associated processes have deterministic and stochastic components. In general, they will not be stationary, as for example more tourists are visiting the area or a new shopping mall has opened. In this paper, we apply the online-learning Spiral Recurrent Neural Network (SpiralRNN) [1], [2] to this prediction problem. Our approach focuses on the online learning capability and on an as low as possible configuration and preprocessing effort.

The remainder of this paper is arranged as follows: Section-II introduces the SpiralRNN structure; section-III discusses the adaptation of the SpiralRNN model to the prediction of the NN5 competition data; section-IV presents some evaluation results of forecasting the last 56 data points of the 111 time series.

This paper has been presented to the special section of the time series competition in the World Congress on Computational Intelligence (WCCI) 2008 in Hong Kong.
1 http://www.neural-forecasting-competition.com/

II. SPIRAL RECURRENT NEURAL NETWORK

A. Hidden Units

A SpiralRNN [1], [2] is a recurrent neural network with a special recurrent layer structure which can be broken down into smaller units, namely "hidden units" or "spiral units". Each hidden unit receives signals from the input neurons and provides processed signals to the output neurons. In addition, its neurons receive signals from other hidden neurons in the same unit delayed by one time step. Fig-1(a) illustrates a typical hidden unit with three input neurons and three output neurons, where the hidden layer structure is only shown symbolically. Note that hidden neurons are fully connected to all input neurons and all output neurons. More details of the connections inside the hidden layer are shown in fig-1(b), where the connections from only one particular neuron to all other neurons in the hidden unit are displayed. With all neurons in the hidden unit aligned clockwise on a circle, the values of the connection weights are defined such that the connection from one neuron to its first clockwise neighbor has value β1, the connection to its second clockwise neighbor has value β2, and so on. This definition of connection values is applied to all neurons, so that all connections from neurons to their respective first clockwise neighbors have the identical weight β1, all connections from neurons to their second clockwise neighbors have value β2, and so on.

The corresponding hidden-weight matrix M is shown in eq. (1), together with the permutation matrix P used below:

M = \begin{pmatrix}
      0           & \beta_1     & \beta_2  & \cdots      & \beta_{u-1} \\
      \beta_{u-1} & 0           & \beta_1  & \cdots      & \beta_{u-2} \\
      \beta_{u-2} & \beta_{u-1} & 0        & \cdots      & \beta_{u-3} \\
      \vdots      & \vdots      & \vdots   & \ddots      & \vdots      \\
      \beta_1     & \beta_2     & \cdots   & \beta_{u-1} & 0
    \end{pmatrix},
\quad
P = \begin{pmatrix}
      0      & 1      & 0      & \cdots & 0      \\
      \vdots & 0      & 1      & \ddots & \vdots \\
      \vdots &        & \ddots & \ddots & 0      \\
      0      &        &        & 0      & 1      \\
      1      & 0      & \cdots & \cdots & 0
    \end{pmatrix}    (1)

Its matrix elements are determined by a vector \vec{\beta} \in R^{(u-1)\times 1}, where u refers to the number of hidden neurons in the hidden unit. Furthermore, matrix M can be decomposed into iterated permutations described by the matrix P:

M = \beta_1 P + \beta_2 P^2 + \ldots + \beta_{u-1} P^{u-1},  \quad  P \in R^{u\times u}    (2)

It is obvious that matrix P^2 is also a permutation matrix, shifting a multiplied vector by two positions. Similarly, P^u is the identity matrix.
Fig. 2. The typical structure of SpiralRNNs. Note that all hidden units have the same basic topology (however, the number of hidden neurons in the hidden units can be different), as shown in fig-1, and are separated from each other, whereas the input and output connections are fully connected to the hidden neurons.
ε ← αε + (1 − α) e_t²    (10)
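Eq. (10) is an exponential moving average of the squared one-step forecast error e_t with smoothing factor α. A minimal sketch of the update and of the committee combination (our own code: the function names, the value of α, and the assumption that experts with a smaller accumulated error receive a larger weight are ours, not the paper's):

```python
import numpy as np

ALPHA = 0.9  # assumed smoothing factor; eq. (10) fixes only the form

def update_error_average(eps, e_t, alpha=ALPHA):
    """One step of eq. (10): exponentially weighted average of the
    squared one-step forecast error of a single expert."""
    return alpha * eps + (1.0 - alpha) * e_t ** 2

def committee_forecast(forecasts, error_averages):
    """Combine expert forecasts; experts with a smaller accumulated
    error get a larger weight (our assumption about the weighting)."""
    inv = 1.0 / (np.asarray(error_averages) + 1e-12)
    w = inv / inv.sum()
    return float(w @ np.asarray(forecasts)), w

# usage: three experts with different accumulated errors
combined, w = committee_forecast([1.0, 1.2, 1.4], [0.1, 0.2, 0.4])
```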
Fig. 6. Comparison between result and data, in terms of Easter behavior. Dashed line is the data and solid line is the prediction.
Fig. 4. Comparison between result and data, in terms of weekly behavior. Dashed line with circles is the data and solid line with squares is the prediction.

Table-II shows the SMAPE errors and their variances for the hybrid approach on the testing dataset (i.e. the data from the last 56 time steps) with a varying number of members in the expert committee. The table shows that the number of experts does not noticeably alter the average result; this saves the effort of utilizing a large number of experts and is favourable for the distributed sensor network application.
TABLE II
Statistic results. The average SMAPE error value with its variance of the expert committee on all 111 time series, given different numbers of expert members.

# experts    3       5       10      15      20      30
SMAPE        20.65   20.15   20.41   20.96   20.58   20.38
variance     2.45    2.82    2.78    3.30    3.16    3.30
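For reference, the SMAPE reported in Table-II is commonly computed, for actual values x and forecasts x̂, as the mean of |x̂ − x| / ((|x| + |x̂|)/2) in percent. A sketch assuming this standard definition (the competition's exact variant may treat zero denominators differently):

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    mask = denom != 0   # skip points where both values are zero
    return 100.0 * np.mean(np.abs(forecast - actual)[mask] / denom[mask])
```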
R EFERENCES
[1] H. Gao, R. Sollacher, and H.-P. Kriegel, “Spiral recurrent neural
network for online learning,” in 15th European Symposium On Artificial
Neural Networks Advances in Computational Intelligence and Learning,
Bruges (Belgium), April 2007.
[2] H. Gao and R. Sollacher, "Conditional prediction of time series using
spiral recurrent neural network," in European Symposium on Artificial
Neural Networks Advances in Computational Intelligence and Learning,
2008.
[3] K. Wieand, “Eigenvalue distributions of random permutation matrices,”
The Annals of Probability, vol. 28, no. 4, pp. 1563–1587, 2000.
[4] H. Jaeger, “Adaptive nonlinear system identification with echo state
networks,” Advances in Neural Information Processing Systems, vol. 15,
pp. 593–600, 2003.
[5] J. L. Elman, “Finding structure in time,” Cognitive Science, vol. 14,
no. 2, pp. 179–211, 1990.
[6] R. Kalman, “A new approach to linear filtering and prediction prob-
lems,” Transactions of the ASME–Journal of Basic Engineering, vol. 82,
pp. 35–45, 1960.
[7] F. Lewis, Optimal Estimation: With an Introduction to Stochastic
Control Theory. Wiley-Interscience, 1986, ISBN: 0-471-83741-5.
[8] G. Welch and G. Bishop, “An introduction to the Kalman filter,”
University of North Carolina at Chapel Hill, Department of Computer
Science, Tech. Rep. Technical Report 95-041, 2002.
[9] R. Sollacher and H. Gao, "Efficient online learning with spiral recurrent
neural networks," to appear in: International Joint Conference on Neural
Networks, 2008.