Eleftherios Giovanis
Abstract
This paper examines the estimation and forecasting performance of ARIMA models
in comparison with some of the most popular neural network models. Specifically,
we provide the estimation results of the AR-GRNN (generalized regression neural
network), the AR-RBF (radial basis function) and the AR-MLP (multilayer
perceptron). We show that the neural network models outperform ARIMA in
forecasting. We find that the best model for real US GNP is the AR-GRNN and for
the US unemployment rate the AR-MLP.
1 Introduction
Artificial neural networks are computational networks which aim and attempt
(Graupe, 2007). The difference between neural networks and other estimation
and approximation methods is that neural networks include hidden layers, in
which the input variables or data are transformed by a special function, such as the
logistic or the negative exponential, among many others. With this hidden layer and the
synaptic functions, the approach can prove very efficient for modeling and estimating
nonlinear processes (McNelis, 2005). In this paper we have to deal with two
Aryal and Yao-Wu (2003) applied an MLP network with 3 hidden layers to
forecast the Chinese construction industry and compared the forecasting
performance of the MLP network with that of ARIMA. They found that the RMSE
of the MLP estimation is 49 percent lower than that of its ARIMA counterpart. Maasoumi
et al. (1996) applied a back-propagation ANN model to forecast GDP and the
unemployment rate, among others. The network they apply is a feedforward
network with a single layer of hidden units. Swanson and White (1997a, 1997b)
series and they found that neural networks generally outperform the linear models. Tkacz
and Hu (1999) applied neural networks to forecast Canadian GDP growth at the
4-quarter horizon and found that the gain in forecast accuracy is statistically
significant, while the performance at the 1-quarter horizon is poor. Also they found that the best
quarter horizon. Tkacz (2001) found that neural networks produce lower
forecasting errors for the yearly growth rate of real Canadian GDP relative to
2 Data
The data are quarterly series of real gross national product (GNP) and
the unemployment rate for the US economy over the period 1948-2006. The
data were obtained from the Federal Reserve Bank of St. Louis.
3 Methodology
a. Autoregressive moving average
The first model we estimate is the ARMA, whose process (Gujarati, 2004)
is defined as
This is the ARMA(p,q) process. If the series are not stationary in levels,
which means that they are not I(0), then we have to estimate an ARIMA(p,d,q) process
(Gujarati, 2004).
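For reference, a common textbook way of writing the ARMA(p,q) process, and its
link to ARIMA(p,d,q), is sketched below. The symbols c, \phi_i, \theta_j and
\varepsilon_t are assumed here for illustration and are not necessarily the
notation of Gujarati (2004).

```latex
\[
y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
\]
% ARIMA(p,d,q): the d-th difference of y_t follows an ARMA(p,q) process
\[
\Delta^{d} y_t = c + \sum_{i=1}^{p} \phi_i\, \Delta^{d} y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j}
\]
```

With d = 0 the second expression reduces to the first, which is the case of a
series that is already I(0).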
\[ E[y \mid x] = \frac{\int_{-\infty}^{\infty} y\, g(x, y)\, dy}{\int_{-\infty}^{\infty} g(x, y)\, dy} \tag{2} \]
, where E[y | x] is the expected value of y given x and g(x,y) is the Parzen
\[ \hat{y}(x) = \frac{\sum_{i=1}^{n} y_i \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right)}{\sum_{i=1}^{n} \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right)} \tag{3} \]
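The estimator in relation (3) is a kernel-weighted average of the training
targets, and can be sketched in a few lines. This is a minimal illustration
under stated assumptions, not the estimation code used in the paper: the function
name grnn_predict and the toy data are hypothetical.

```python
import math

def grnn_predict(x, train_x, train_y, sigma):
    """Kernel-weighted average of training targets, as in relation (3)."""
    num = 0.0
    den = 0.0
    for xi, yi in zip(train_x, train_y):
        # squared Euclidean distance between the query and one training input
        dist2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        w = math.exp(-dist2 / (2.0 * sigma ** 2))
        num += yi * w
        den += w
    # note: den can underflow to zero if sigma is tiny and x is far from all data
    return num / den

# Toy AR(1)-style data (made up): the input is the lagged value, the target the current one
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
pred_mid = grnn_predict([1.0], train_x, train_y, sigma=0.5)
pred_sharp = grnn_predict([1.0], train_x, train_y, sigma=0.1)    # close to the nearest target
pred_flat = grnn_predict([1.0], train_x, train_y, sigma=1000.0)  # close to the sample mean
```

The smoothing value sigma controls the trade-off: a small value reproduces the
nearest training target, while a large value pulls every prediction towards the
mean of the training targets.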
Usually the GRNN consists of four layers. In the first layer, which holds the input
data, the synaptic and activation functions are linear. In the second layer, the
pattern layer, the synaptic function is radial and the activation function is the
negative exponential. The third layer, the summation layer, has linear synaptic and
activation functions, like the first layer. The last layer, the output, has division
as its synaptic function and a linear activation function. More specifically, the input
layer receives the input vector X and distributes the data to the pattern layer. Each
neuron in the pattern layer
summation layer. In this layer the numerator and denominator neurons compute the
weighted and simple sums based on the values of w and θ: the numerator is
Sj = Σi wijθi and the denominator is Sd = Σi θi. In the output layer the outputs are
computed as Yj = Sj/Sd. We must mention that the hidden layer consists of 24 units.
The smoothing rate is set at 0.01 for GNP and at 0.05 for the unemployment rate,
based on the lowest train and test errors. In our case we propose the AR-GRNN
model (Li et al., 2007), which means that the output is the vector of data yt and the
inputs are the lagged data yt-1, yt-2, …, yt-p. So the general form of the AR-GRNN is
defined as
probably stationary, as the KPSS test indicates, so we apply the following AR(p)
function
We apply relations (4) and (5) to all neural network models; specifically, we apply
an AR(1) for GNP and an AR(2) for the first differences of the unemployment rate. The
technique we follow is this. Suppose that we have quarterly output data for a
period, e.g. 1948:Q1-2006:Q4, which is the variable yt. If we have an AR(1), then we
obtain yt-1, which is the output data with one lag. But this lag refers again to the
same data, for the period 1948:Q2-2007:Q1, which means that we do not discard the last
observation but carry it forward to the next period. The same process is followed
for AR(2). So in this paper we estimate over the period 1948:Q1-2006:Q4 and then
forecast for the period 2007:Q1-2008:Q1. This definition also applies to
the other two neural network models. Figure 1 presents a general GRNN
architecture. In all neural network estimations the training sample covers the
period 1948:Q1-1990:Q4 and the testing sample the period 1991:Q1-2006:Q4.
[Figure 1. General GRNN architecture: input units X1, X2, ..., Xk; summation layer with
numerator and denominator units; output layer with units Y1, Y2, ..., YJ]
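The construction of lagged inputs for an AR(p) network, and the split into
training and testing samples, can be sketched as follows. This sketch uses the
conventional alignment, in which the first p observations serve only as inputs;
the handling of the sample end points in the paper may differ. The function names
and the placeholder series are hypothetical.

```python
def make_lagged(series, p):
    """Return (inputs, targets) with inputs[k] = [y_{t-1}, ..., y_{t-p}] and targets[k] = y_t."""
    inputs, targets = [], []
    for t in range(p, len(series)):
        inputs.append([series[t - k] for k in range(1, p + 1)])
        targets.append(series[t])
    return inputs, targets

def split_train_test(inputs, targets, n_train):
    """First n_train pairs form the training sample, the rest the testing sample."""
    train = (inputs[:n_train], targets[:n_train])
    test = (inputs[n_train:], targets[n_train:])
    return train, test

y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]           # placeholder values, not the GNP data
X, T = make_lagged(y, p=2)                    # AR(2): two lags per input vector
(train_X, train_T), (test_X, test_T) = split_train_test(X, T, n_train=3)
```

For quarterly data the training and testing samples would be cut at a date, for
example the end of 1990:Q4, rather than at a hard-coded count.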
c. Radial Basis Function
\[ y_k(x) = \sum_{j=1}^{M} w_{kj}\, \phi_j(x) + w_{k0} \tag{6} \]
, where wkj are the weights, wk0 are the biases and φj(x) is given by
\[ \phi_j(x) = \exp\left( -\frac{\| x - \mu_j \|^2}{2\sigma_j^2} \right) \tag{7} \]
The RBF consists of three layers: the input layer, whose synaptic and activation
functions are linear; the hidden layer, where the synaptic and activation functions
are radial and negative exponential respectively; and the third layer, the
output layer, which has linear synaptic and activation functions, as in the case of the
input layer. Figure 2 presents a general RBF illustration. The hidden layer in the RBF
estimation has 11 units. The radius for both GNP and the unemployment rate has been set at
50, based on the lowest train and test errors, as in the case of the GRNN estimation.
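The output of an RBF network, as in relations (6) and (7), is a weighted sum of
Gaussian basis functions plus a bias. The sketch below evaluates one output unit;
it assumes the usual squared Euclidean distance in the exponent, and the centres,
widths and weights are made-up values rather than estimates from the paper.

```python
import math

def rbf_output(x, centers, sigmas, weights, bias):
    """One output unit: sum_j w_j * phi_j(x) + w_0 with Gaussian phi_j."""
    total = bias
    for mu, sigma, w in zip(centers, sigmas, weights):
        dist2 = sum((a - b) ** 2 for a, b in zip(x, mu))
        total += w * math.exp(-dist2 / (2.0 * sigma ** 2))
    return total

# Hypothetical network with two hidden units and a one-dimensional input
centers = [[0.0], [1.0]]
sigmas = [1.0, 1.0]
weights = [2.0, -1.0]
bias = 0.5
y_at_zero = rbf_output([0.0], centers, sigmas, weights, bias)
```

At x = 0 the first basis function equals 1 and the second equals exp(-0.5), so the
output is 0.5 + 2 - exp(-0.5).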
d. Multilayer perceptron
The last model we estimate is the multilayer perceptron (MLP), which differs
from the RBF in two ways (McNelis, 2005). First, the RBF has at most
one hidden layer, while the MLP can have more. Second, the activation function in the RBF
computes the Euclidean distance between the signal from the input vector and the
center of that unit, while the MLP computes the inner products of the inputs and the
[Figure 2. General RBF illustration: input x, weights, linear weights, output]
The first layer, the input, has linear synaptic and activation functions in the MLP,
as does the last layer, the output. The hidden layers, of which there are three in our
case, have linear synaptic functions and hyperbolic activation functions. For networks
with binary units, an MLP with one hidden layer has been shown to suffice. But in our
case we have continuous variables, so we prefer three hidden layers. In the first phase
the back-propagation method is applied. Each layer consists of units which receive their
input from the units of the layer directly below and send their output to the units of
the layer directly above. The Ni inputs are fed into the first layer of Nh,1 hidden
units (Krose & Smagt, 1996).
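The output of unit k for pattern t is
\[ y_k^t = f(s_k^t) \tag{1} \]
, where
\[ s_k^t = \sum_j w_{jk}\, y_j^t + \theta_k \tag{2} \]
\[ \Delta w_{jk} = -\gamma\, \frac{\partial E^t}{\partial w_{jk}} \tag{3} \]
The error measure Et is defined as the total squared error for pattern t at the output
units, and it is
\[ E^t = \frac{1}{2} \sum_{i=1}^{N_o} \left( d_i^t - y_i^t \right)^2 \tag{4} \]
, where d_i^t is the desired output for unit i and pattern t. Then we can write by the
chain rule
\[ \frac{\partial E^t}{\partial w_{jk}} = \frac{\partial E^t}{\partial s_k^t}\, \frac{\partial s_k^t}{\partial w_{jk}} \tag{5} \]
But by equation (2) we find that the second factor on the right-hand side of
equation (5) is equal to
\[ \frac{\partial s_k^t}{\partial w_{jk}} = y_j^t \tag{6} \]
And we define the first factor as
\[ \delta_k^t = -\frac{\partial E^t}{\partial s_k^t} \tag{7} \]
Then to compute δ we write the partial derivative, by applying the chain rule, as the
product of two factors. One factor in relation (9) reflects the change in error as a
function of the output of the unit, while the other reflects the change in the output as
a function of changes in the input. Relation (9) is defined as
\[ \delta_k^t = -\frac{\partial E^t}{\partial s_k^t} = -\frac{\partial E^t}{\partial y_k^t}\, \frac{\partial y_k^t}{\partial s_k^t} \tag{9} \]
For the second factor we have
\[ \frac{\partial y_k^t}{\partial s_k^t} = f'(s_k^t) \tag{10} \]
, which is the derivative of function f for the kth unit. For the computation of the
first factor we assume first that k=i. Then in this case we have
\[ \frac{\partial E^t}{\partial y_i^t} = -\left( d_i^t - y_i^t \right) \tag{11} \]
, for any output unit i. Second, if k is a hidden unit and not an output unit, which means
that k=h, then the error measure can be written as a function of the net inputs from
the hidden to the output layer and we use the chain rule.
\[ \delta_h^t = f'(s_h^t) \sum_{i=1}^{N_o} \delta_i^t\, w_{hi} \tag{14} \]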
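The delta computations of the back-propagation rule can be checked numerically on
a small network. The sketch below is illustrative only: it assumes one hidden layer
with hyperbolic tangent units and linear output units, uses made-up weights, and
compares the analytic derivative implied by the deltas with a finite-difference
estimate. All names are hypothetical.

```python
import math

def forward(x, W1, b1, W2, b2):
    # hidden units: net input s_h, output y_h = tanh(s_h)
    s_h = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]
    y_h = [math.tanh(s) for s in s_h]
    # linear output units: y_o equals the net input s_o
    y_o = [sum(w * yh for w, yh in zip(row, y_h)) + b for row, b in zip(W2, b2)]
    return y_h, y_o

def error(x, d, W1, b1, W2, b2):
    # total squared error for one pattern: 0.5 * sum (d_o - y_o)^2
    _, y_o = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((di - yi) ** 2 for di, yi in zip(d, y_o))

def deltas(x, d, W1, b1, W2, b2):
    y_h, y_o = forward(x, W1, b1, W2, b2)
    # output units: delta_o = (d_o - y_o) * f'(s_o), with f' = 1 for linear units
    delta_o = [di - yi for di, yi in zip(d, y_o)]
    # hidden units: delta_h = f'(s_h) * sum_o delta_o * w_ho, with tanh' = 1 - y_h^2
    delta_h = [(1.0 - y_h[h] ** 2) * sum(delta_o[o] * W2[o][h] for o in range(len(W2)))
               for h in range(len(W1))]
    return y_h, delta_o, delta_h

# Made-up pattern and weights: 2 inputs, 2 hidden units, 1 output
x, d = [0.5, -1.0], [0.3]
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1]
W2, b2 = [[0.7, -0.5]], [0.2]

y_h, delta_o, delta_h = deltas(x, d, W1, b1, W2, b2)
analytic = -delta_h[0] * x[1]        # dE/dW1[0][1] = -delta_h * input

eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (error(x, d, W1_plus, b1, W2, b2) - error(x, d, W1_minus, b1, W2, b2)) / (2 * eps)
```

The two derivatives should agree up to finite-difference error, which is a quick
way to catch sign or indexing mistakes in a hand-written delta rule.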
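In the first phase we use the back-propagation method. In the second phase we use the
Levenberg-Marquardt algorithm (Bishop, 1995). Suppose that we have the error
function
\[ E = \frac{1}{2} \sum_n \left( \varepsilon^n \right)^2 \tag{15} \]
, where ε^n is the error for the nth pattern. We denote by WA the old point in weight
space and by WB the new point. Then we can expand the error vector ε to first order in a
Taylor series,
\[ \varepsilon(W_B) = \varepsilon(W_A) + Z\, (W_B - W_A) \tag{16} \]
, where the matrix Z has elements
\[ Z_{ni} = \frac{\partial \varepsilon^n}{\partial w_i} \tag{17} \]
so that the error function becomes
\[ E = \frac{1}{2} \left\| \varepsilon(W_A) + Z\, (W_B - W_A) \right\|^2 \tag{18} \]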
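A damped least-squares update of the Levenberg-Marquardt type can be sketched on a
two-parameter model. The sketch solves (Z'Z + lam I) step = -Z'eps at each
iteration, where the damping parameter lam, its update rule, the toy model
w0 * tanh(w1 * x) and the synthetic data are all assumptions made for this
illustration, not settings from the paper.

```python
import math

def residuals(w, xs, ds):
    # eps^n = model output minus desired output for pattern n
    return [w[0] * math.tanh(w[1] * x) - d for x, d in zip(xs, ds)]

def jacobian(w, xs):
    # Z[n][i] = d eps^n / d w_i
    rows = []
    for x in xs:
        t = math.tanh(w[1] * x)
        rows.append([t, w[0] * x * (1.0 - t * t)])
    return rows

def sum_sq(eps):
    return 0.5 * sum(e * e for e in eps)

def lm_step(w, xs, ds, lam):
    eps = residuals(w, xs, ds)
    Z = jacobian(w, xs)
    # entries of the 2x2 matrix Z'Z + lam * I and of the vector Z'eps
    a = sum(r[0] * r[0] for r in Z) + lam
    b = sum(r[0] * r[1] for r in Z)
    c = sum(r[1] * r[1] for r in Z) + lam
    g0 = sum(r[0] * e for r, e in zip(Z, eps))
    g1 = sum(r[1] * e for r, e in zip(Z, eps))
    det = a * c - b * b                      # positive whenever lam > 0
    step0 = -(c * g0 - b * g1) / det
    step1 = -(a * g1 - b * g0) / det
    return [w[0] + step0, w[1] + step1]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ds = [2.0 * math.tanh(0.5 * x) for x in xs]  # synthetic targets, true w = (2.0, 0.5)
w, lam = [1.0, 1.0], 0.01
e_start = sum_sq(residuals(w, xs, ds))
for _ in range(50):
    w_new = lm_step(w, xs, ds, lam)
    if sum_sq(residuals(w_new, xs, ds)) < sum_sq(residuals(w, xs, ds)):
        w, lam = w_new, max(lam / 10.0, 1e-12)   # accept the step, trust the linearization more
    else:
        lam *= 10.0                               # reject the step, move towards gradient descent
e_end = sum_sq(residuals(w, xs, ds))
```

Because a step is accepted only when it lowers the error, the error never
increases; a large lam makes the update behave like a short gradient descent step,
while a small lam approaches the Gauss-Newton step implied by relation (18).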
In this paper we estimate an MLP network with three hidden layers and three
units in each of them. The learning rate is set at 0.01 and the momentum at 0.3. In the
first phase the number of epochs is 100 and in the second phase it is 500. The
AR-MLP is defined as in the other two neural network models, the AR-GRNN and
the AR-RBF. Figure 3 presents a general MLP illustration with three hidden
layers.
[Figure 3. General MLP illustration with three hidden layers]
We also apply unit root tests to examine whether the series are I(0) or not,
in other words whether they are stationary in levels or in first differences and
process. We apply two tests, the Dickey-Fuller (Greene, 2003) and the KPSS
(Kwiatkowski et al., 1992) tests. For the DF-GLS test we examine the regression with constant
\[ \Delta y_t = \mu + \phi\, y_{t-1} + \varepsilon_t \tag{19} \]
And we test the hypothesis
\[ H_0: \phi = 0 \quad \text{against} \quad H_1: \phi < 0 \]
, which means that if we accept the null hypothesis then the series are non-stationary
in levels, so they are I(1); otherwise, if we reject the null hypothesis, the time
series are stationary, I(0). For the KPSS test we have the hypotheses.
H0: stationary
H1: non-stationary
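The KPSS test is based on the residuals from the OLS regression of yt on the exogenous
variables
yt = α + βt + γΖt (20)
If γ equals zero, then the process is stationary if β=0 and trend stationary if
β≠0. The test statistic is
\[ KPSS = \frac{\sum_{t=1}^{T} S_t^2}{T^2\, \hat{\sigma}^2} \]
, where \hat{\sigma}^2 is an estimate of the long-run variance of the residuals, while
S_t = \sum_{s=1}^{t} e_s is the partial sum of the OLS residuals e_s.
We apply two statistical measures, the RMSE (root mean square error) and the MAE
(mean absolute error).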
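The two unit root tests described in this section can be sketched numerically as
follows. This is an illustration under stated assumptions, not the procedure used
in the paper: the Dickey-Fuller part is the plain regression with a constant (no
GLS detrending and no lagged differences), the KPSS part tests level stationarity
with a Bartlett-kernel variance, and the function names, lag choice and artificial
series are hypothetical.

```python
import math
import random

def dickey_fuller(y):
    """OLS of (y_t - y_{t-1}) on a constant and y_{t-1}; returns (phi_hat, t_statistic)."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    lag = y[:-1]
    n = len(dy)
    mean_lag, mean_dy = sum(lag) / n, sum(dy) / n
    sxx = sum((x - mean_lag) ** 2 for x in lag)
    sxy = sum((x - mean_lag) * (d - mean_dy) for x, d in zip(lag, dy))
    phi = sxy / sxx
    mu = mean_dy - phi * mean_lag
    rss = sum((d - mu - phi * x) ** 2 for x, d in zip(lag, dy))
    se_phi = math.sqrt(rss / (n - 2) / sxx)
    return phi, phi / se_phi

def kpss_level(y, lags):
    """KPSS statistic: sum of squared partial sums / (T^2 * long-run variance)."""
    T = len(y)
    mean_y = sum(y) / T
    e = [v - mean_y for v in y]                 # residuals from a regression on a constant
    partial, s_sum = 0.0, 0.0
    for v in e:                                  # accumulate S_t and its square
        partial += v
        s_sum += partial ** 2
    lrv = sum(v * v for v in e) / T              # variance term of the long-run variance
    for s in range(1, lags + 1):                 # Bartlett-weighted autocovariances
        gamma = sum(e[t] * e[t - s] for t in range(s, T)) / T
        lrv += 2.0 * (1.0 - s / (lags + 1)) * gamma
    return s_sum / (T ** 2 * lrv)

# Artificial stationary AR(1) series with coefficient 0.5, so phi should be near -0.5
random.seed(0)
stationary = [0.0]
for _ in range(500):
    stationary.append(0.5 * stationary[-1] + random.gauss(0.0, 1.0))
phi_hat, t_stat = dickey_fuller(stationary)

# Two deterministic series: a pure trend and a mean-reverting alternating sequence
trend = [float(t) for t in range(100)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]
kpss_trend = kpss_level(trend, lags=4)
kpss_alt = kpss_level(alternating, lags=4)
```

The Dickey-Fuller t statistic has to be compared with Dickey-Fuller critical values
(roughly -2.86 at the 5 per cent level for the constant-only case in large samples),
not with Student-t values, and the KPSS statistic with its own critical values
(about 0.463 at the 5 per cent level for level stationarity). Note that the two
tests have opposite null hypotheses.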
4 Results
Table 1
Unit root tests for real GNP and unemployment rate of USA
with both tests. For the unemployment rate we conclude that with the KPSS test it is I(1), as we
Table 2
KPSS unit root test for first difference of unemployment rate
unemployment rate. So we apply an AR(1) for the three neural networks in the case of
GNP and an AR(2) for the unemployment rate. From Table 3 we conclude that neural
network modeling is better, with the AR-GRNN having the lowest RMSE and MAE. So
we prefer neural networks for forecasting the real GNP of the USA. Specifically, we
found that the forecasting RMSE of the neural network models is 7 to 17 per cent
lower than that of the ARIMA counterpart, and the MAE is 9 to 22 per cent lower than the
MAE of ARIMA.
Table 3
Forecasting comparison between ARIMA and neural networks for the real GNP of USA for the period
2007:Q1-2008:Q1
In Table 4 the conclusions are almost the same as those for GNP. Neural
network modeling is again more reliable, and these models present lower
RMSE and MAE than ARIMA(2,1,3). In particular, the AR-MLP and then the
AR-GRNN are the best models. In the case of unemployment the RMSE and
the ARIMA counterparts. In Table 5 we present the actual values of real US GNP and
Table 4
Forecasting comparison between ARIMA and neural networks for the unemployment rate of USA
for the period 2007:Q1-2008:Q1
Table 5
Forecasting values for GNP with the four models (actual values and predicted values by model)
Period     Actual   ARMA(1,0)   GRNN    RBF     MLP
2007:Q1    0.164    0.76379     0.860   0.513   0.729
2007:Q2    0.983    0.80734     0.390   0.941   0.893
2007:Q3    1.411    0.82153     1.001   0.695   0.936
2007:Q4    0.462    0.82615     0.209   0.974   0.790
2008:Q1    0.044    0.82765     0.060   0.643   0.864
Table 6
Forecasting values for unemployment rate with ARIMA (2,1,3) and neural networks (actual values and predicted values by model)
Period     Actual   ARIMA(2,1,3)   GRNN     RBF      MLP
2007:Q1    0.567    0.390          0.659    0.629    0.550
2007:Q2    -0.367   -0.176         -0.300   -0.328   -0.280
2007:Q3    0.234    0.391          0.197    0.269    0.384
2007:Q4    -0.100   -0.232         0.100    0.155    -0.144
2008:Q1    0.700    0.342          0.749    0.722    0.667
unemployment with ARIMA (2,1,3) and the three neural network models. In Figure 4
we present the forecasts for US real GNP during the period 2007:Q1-2008:Q1,
while Figure 5 presents the forecasting results for US unemployment for the
same period.
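As an illustration of the two accuracy measures, RMSE is the square root of the
mean squared forecast error and MAE is the mean absolute forecast error. The
sketch below computes both from the rounded unemployment values reported in
Table 6; because the inputs are rounded, the results need not coincide exactly
with the figures behind Table 4. The function and variable names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error of the forecasts."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error of the forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Values copied from Table 6 (unemployment rate, 2007:Q1-2008:Q1)
actual = [0.567, -0.367, 0.234, -0.100, 0.700]
forecasts = {
    "ARIMA(2,1,3)": [0.390, -0.176, 0.391, -0.232, 0.342],
    "GRNN": [0.659, -0.300, 0.197, 0.100, 0.749],
    "RBF": [0.629, -0.328, 0.269, 0.155, 0.722],
    "MLP": [0.550, -0.280, 0.384, -0.144, 0.667],
}

scores = {name: (rmse(actual, f), mae(actual, f)) for name, f in forecasts.items()}
ranking = sorted(scores, key=lambda name: scores[name][0])   # lowest RMSE first

for name in ranking:
    print(f"{name:13s} RMSE={scores[name][0]:.4f} MAE={scores[name][1]:.4f}")
```

On these rounded values the MLP has the lowest RMSE, followed by the GRNN, which is
in line with the ordering described for the unemployment rate.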
Figure 4. Actual against forecast values for US GNP in the period 2007:Q1-2008:Q1 with: (a) ARMA
(1,0), (b) GRNN, (c) RBF and (d) MLP
Figure 5. Actual against forecast values for US unemployment first differences in the period
2007:Q1-2008:Q1 with: (a) ARIMA (2,1,3), (b) GRNN, (c) RBF and (d) MLP
5 Conclusion
We examined the forecasting performance of the traditional time series
method, the ARIMA process, in comparison with three neural network models. We
considered three of the most common models: the generalized regression neural
network (GRNN), the radial basis function (RBF) and the multilayer perceptron
(MLP). We used the autoregressive (AR) versions of these neural models, which means
that the input data are just the output data with time lags. We set the AR(p) order
as determined by the unit root tests, so we have an AR(1) for real gross national
product (GNP) and an AR(2) for the unemployment rate of the US economy. We
show that all neural models outperform the ARIMA process, so we conclude that
traditional time series and econometric methods are not always the best or even the
only choice, and that we should consider more sophisticated approaches, such as neural
network modeling, which is able to capture non-linear processes with great
success.
REFERENCES
Aryal R.D. & Yao-Wu W. (2003). Neural Network Forecasting of the Production
Level of Chinese Construction Industry. Journal of Comparative
International Management, 29, 319-33.
Bishop C.M. (1995). Neural Networks for Pattern Recognition. pp. 164-170,
290-291. Oxford: Clarendon Press.
Graupe D. (2007). Principles of Artificial Neural Networks. 2nd Edition, pp. 1.
USA: World Scientific Publishing.
Greene W.H. (2003). Econometric Analysis. Fifth Edition, pp. 637-640. New
Jersey: Pearson Education.
Gujarati D. (2004). Basic Econometrics. Fourth Edition, pp. 839-840. USA:
McGraw-Hill.
Krose B. & van der Smagt P. (1996). An Introduction to Neural Networks. Eighth
Edition, pp. 33-37. The University of Amsterdam.
Kwiatkowski, D., P.C.B. Phillips, P. Schmidt and Y. Shin (1992). Testing the
Null Hypothesis of Stationarity against the Alternative of a Unit Root.
Journal of Econometrics, 54, 159-178.
Li W., Luo Y., Zhu Q., Liu J. & Le J. (2007). Applications of AR*-GRNN model
for financial time series forecasting. Neural Computing & Applications,
London: Springer.
Maasoumi E., Khotanzad A. & Abaye A. (1996). Artificial neural networks for
some macroeconomic series: a first report. Econometric Reviews, 13 (1),
105-122.
McNelis P.D. (2005). Neural Networks in Finance: Gaining Predictive Edge in the
Market. pp. 21. USA: Elsevier Academic Press.
Swanson, N.R., and White, H. (1997a). A model selection approach to real time
macroeconomic forecasting using linear models and artificial neural
networks. Review of Economics and Statistics, 79, 540-50.
Swanson, N.R., and White, H. (1997b). Forecasting economic time series using
adaptive versus non-adaptive and linear versus nonlinear econometric
models. International Journal of Forecasting, 13, 439-61.
Tkacz G. and Hu, S. (1999). Forecasting GDP Growth Using Artificial Neural
Networks. Working Paper, Bank of Canada, 99-3
Tkacz G. (2001). Neural network forecasting of Canadian GDP growth.
International Journal of Forecasting, 17, 57-69.