
Time Series Analysis and Forecasting using ARIMA Modeling, Neural Networks and a Hybrid Model using ELM


Puneet Singh, Ved Prakash Gupta
Department of Mathematics and Scientific Computing, Indian Institute of Technology, Kanpur

Abstract
For the past three decades, ARIMA modeling has been the most widely used method for forecasting time series data. Recent research has shown that artificial neural networks (ANNs) can improve model fit significantly and give better predictions. On some data sets, however, ARIMA models and ANNs give mixed results in terms of model superiority. In this paper, we apply a hybrid model consisting of an ARIMA model for the linear part of the time series and an ANN for the nonlinear part. We show that the result obtained by this hybrid model is better than either model used alone. Further, we implement the Extreme Learning Machine (ELM) technique as a substitute for the ANN and identify some interesting results. The mean squared error is used as the measure of model strength.








Introduction
Time series forecasting is one of the most important areas of forecasting, in which future values of a variable are predicted based on its past values. A model is estimated from the properties of the actual data, and the obtained model is used to extrapolate the time series into the future. This method works even when there is no significant information about the origin of the data and its nature. A great deal of research is aimed at improving time series forecasting models. One of the most important and widely used time series models is the autoregressive integrated moving average (ARIMA) model. The popularity of the ARIMA model is due to its statistical properties as well as the well-known Box-Jenkins methodology for model building. In addition, various exponential smoothing models can be implemented as ARIMA models. Although ARIMA models are quite flexible in that they can represent several different types of time series, i.e., pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series, their major limitation is the pre-assumed linear form of the model. That is, a linear correlation structure is assumed among the time series values and, therefore, no nonlinear patterns can be captured by an ARIMA model. The approximation of linear models to complex real-world problems is not always satisfactory.
Recently, artificial neural networks (ANNs) have been extensively studied and used in time series forecasting; Zhang et al. presented a review of this area. ANNs are among the most accurate and widely used forecasting models and have enjoyed fruitful applications in forecasting social, economic, engineering, foreign exchange and stock problems, among others. Several distinguishing features of artificial neural networks make them valuable and attractive for a forecasting task. First, as opposed to traditional model-based methods, artificial neural networks are data-driven, self-adaptive methods that make few a priori assumptions about the models for the problems under study. Second, artificial neural networks can generalize: after learning the data presented to them (a sample), ANNs can often correctly infer the unseen part of a population even if the sample data contain noise. Third, ANNs are universal function approximators: it has been shown that a network can approximate any continuous function to any desired accuracy. Finally, artificial neural networks are nonlinear. Traditional approaches to time series prediction, such as the Box-Jenkins or ARIMA methodology, assume that the time series under study is generated from a linear process, and they may therefore be inappropriate if the underlying mechanism is nonlinear. In fact, real-world systems are often nonlinear (Zhang et al.).
In this paper, autoregressive integrated moving average models are combined with neural models to construct a new hybrid model intended to be more accurate than artificial neural networks alone. In our proposed model, the future value of a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first phase, an autoregressive integrated moving average model is used to generate the necessary data from the time series under study. Then, in the second phase, a neural network is used to model the data generated by the ARIMA model and to predict future values of the time series. Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this paper to show the appropriateness and effectiveness of the proposed model for time series forecasting.
Time Series Forecasting Methods
For more than half a century, auto-regressive integrated moving average (ARIMA) models have dominated many
areas of time series forecasting. In an ARIMA(p, d, q) model, the future value of a variable is assumed to be a linear function of several past observations and random errors. That is, the underlying process that generates the time series, with mean \mu, has the form

\phi(B)\,\nabla^d (y_t - \mu) = \theta(B)\,e_t   (1)

where y_t and e_t are the actual value and the random error at time period t, respectively;

\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^i, \qquad \theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^j   (2)

are polynomials in B of degree p and q; \phi_i (i = 1, 2, ..., p) and \theta_j (j = 1, 2, ..., q) are model parameters; \nabla = 1 - B, where B is the backward shift operator; p and q are integers often referred to as the orders of the model; and d is an integer often referred to as the order of differencing. The random errors e_t are assumed to be independently and identically distributed with a mean of zero and a constant variance of \sigma^2.
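As a concrete illustration (an ARIMA(1,1,1) chosen purely for exposition, not a model fitted later in this paper), Eq. (1) with p = d = q = 1 reads

(1 - \phi_1 B)(1 - B)\, y_t = (1 + \theta_1 B)\, e_t,

which, after expanding the backshift polynomials, gives the recursion

y_t = (1 + \phi_1)\, y_{t-1} - \phi_1\, y_{t-2} + e_t + \theta_1\, e_{t-1}.

The mean \mu drops out here because \nabla(y_t - \mu) = \nabla y_t for constant \mu.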
The Box and Jenkins (1976) methodology includes three iterative steps of model identification, parameter
estimation, and diagnostic checking. The basic idea of model identification is that if a time series is generated
from an ARIMA process, it should have some theoretical autocorrelation properties. By matching the empirical
autocorrelation patterns with the theoretical ones, it is often possible to identify one or several potential models
for the given time series. Box and Jenkins (1976) proposed to use the autocorrelation function (ACF) and the
partial autocorrelation function (PACF) of the sample data as the basic tools to identify the order of the ARIMA
model. Other order selection methods have been proposed based on validity criteria, such as the information-theoretic Akaike information criterion (AIC) (Shibata, 1976) and the minimum description length (MDL) (Hurvich & Tsai, 1989; Jones, 1975; Ljung, 1987). In addition, in recent years different approaches based on intelligent paradigms, such as neural networks (Hwang, 2001), genetic algorithms (Minerva & Poli, 2001; Ong, Huang, & Tzeng, 2005) and fuzzy systems (Haseyama & Kitajima, 2001), have been proposed to improve the accuracy of order selection for ARIMA models.
In the identification step, data transformation is often required to make the time series stationary. Stationarity is
a necessary condition in building an ARIMA model used for forecasting. A stationary time series is characterized
by statistical characteristics such as the mean and the autocorrelation structure being constant over time. When
the observed time series presents trend and heteroscedasticity, differencing and power transformation are
applied to the data to remove the trend and to stabilize the variance before an ARIMA model can be fitted. Once
a tentative model is identified, estimation of the model parameters is straightforward. The parameters are
estimated such that an overall measure of errors is minimized. This can be accomplished using a nonlinear
optimization procedure. The last step in model building is the diagnostic checking of model adequacy. This is
basically to check if the model assumptions about the errors e_t are satisfied.
Several diagnostic statistics and plots of the residuals can be used to examine the goodness of fit of the
tentatively entertained model to the historical data. If the model is not adequate, a new tentative model should
be identified, which will again be followed by the steps of parameter estimation and model verification.
Diagnostic information may help suggest alternative model(s). This three-step model building process is typically
repeated several times until a satisfactory model is finally selected. The final selected model can then be used
for prediction purposes.
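To make the three-step cycle concrete, the following is a minimal sketch in Python using statsmodels; the toy random-walk series, the 5% significance threshold, and the tentative order (2, d, 1) are illustrative assumptions rather than choices made in this paper.

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf, adfuller
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=300))   # toy nonstationary series (stand-in for real data)

# Step 1: identification -- difference until stationary, then inspect ACF/PACF
d = 0
while adfuller(np.diff(y, n=d) if d else y)[1] > 0.05:
    d += 1                            # ADF p-value > 0.05: still nonstationary, difference again
z = np.diff(y, n=d) if d else y
print("order of differencing d =", d)
print("ACF :", acf(z, nlags=10).round(2))   # pattern suggests the MA order q
print("PACF:", pacf(z, nlags=10).round(2))  # pattern suggests the AR order p

# Step 2: estimation -- fit a tentative ARIMA(p, d, q) read off the ACF/PACF patterns
model = ARIMA(y, order=(2, d, 1)).fit()

# Step 3: diagnostic checking -- residuals of an adequate model behave like white noise
lb_pvalue = float(acorr_ljungbox(model.resid, lags=[10])["lb_pvalue"].iloc[0])
print("Ljung-Box p-value:", lb_pvalue)      # small p-value => inadequate, return to Step 1
```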

The ANN approach to time series modeling
Recently, computational intelligence systems, among them artificial neural networks (ANNs), which are in fact model-free dynamic systems, have been used widely for function approximation and forecasting. One of the most significant advantages of the ANN models over other classes of nonlinear models is that ANNs are universal approximators that can approximate a large class of functions with a high degree of accuracy (Chen, Leung, & Hazem, 2003; Zhang & Min Qi, 2005). Their power comes from the parallel processing of the information from the data. No prior assumption of the model form is required in the model building process. Instead, the network model is largely determined by the characteristics of the data. The single hidden layer feedforward network is the most widely used model form for time series modeling and forecasting (Zhang et al., 1998). The model is
characterized by a network of three layers of simple processing units connected by acyclic links (Fig. 1). The relationship between the output y_t and the inputs (y_{t-1}, ..., y_{t-p}) has the following mathematical representation:

y_t = w_0 + \sum_{j=1}^{q} w_j \, g\left(w_{0,j} + \sum_{i=1}^{p} w_{i,j}\, y_{t-i}\right) + e_t   (3)

where w_{i,j} (i = 0, 1, 2, ..., p; j = 1, ..., q) and w_j (j = 0, 1, 2, ..., q) are model parameters often called connection weights; p is the number of input nodes; and q is the number of hidden nodes.
Activation functions can take several forms. The appropriate activation function depends on the position of the neuron within the network. In the majority of cases, input layer neurons do not have an activation function, as their role is to transfer the inputs to the hidden layer. The most widely used activation function for the output layer is the linear function, since a nonlinear activation function may introduce distortion into the predicted output. The logistic and hyperbolic tangent functions, shown in Eqs. (4) and (5) respectively, are often used as the hidden layer transfer function. Other activation functions, such as linear and quadratic, can also be used, each with a variety of modeling applications.
\mathrm{Sig}(x) = \frac{1}{1 + \exp(-x)}   (4)

\mathrm{Tanh}(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}   (5)

Hence, the ANN model of Eq. (3) in fact performs a nonlinear functional mapping from past observations to the future value y_t, i.e.,

y_t = f(y_{t-1}, ..., y_{t-p}, w) + e_t   (6)

where w is a vector of all parameters and f(\cdot) is a function determined by the network structure and connection weights. Thus, the neural network is equivalent to a nonlinear autoregressive model. The simple network given by Eq. (3) is surprisingly powerful in that it is able to approximate an arbitrary function when the number of hidden nodes q is sufficiently large. In practice, a simple network structure with a small number of hidden nodes often works well in out-of-sample forecasting. This may be due to the overfitting effect typically found in the neural network modeling process. An overfitted model fits the sample used for model building well but generalizes poorly to data outside the sample (Demuth & Beale, 2004).
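As a sketch of the computation in Eq. (3), the following minimal numpy forward pass may help; the lag count p = 4, the hidden size q = 3, and the random (untrained) weights are illustrative assumptions, since weight training is beyond the scope of this snippet.

```python
import numpy as np

def sigmoid(x):
    # logistic activation of Eq. (4)
    return 1.0 / (1.0 + np.exp(-x))

def ann_forecast(y_lags, W_in, b_in, w_out, b_out):
    """One-step-ahead forecast following Eq. (3).

    y_lags : shape (p,)   lagged observations (y_{t-1}, ..., y_{t-p})
    W_in   : shape (q, p) input-to-hidden weights w_{i,j}
    b_in   : shape (q,)   hidden biases w_{0,j}
    w_out  : shape (q,)   hidden-to-output weights w_j
    b_out  : scalar       output bias w_0
    """
    h = sigmoid(W_in @ y_lags + b_in)  # hidden layer: nonlinear transform of the lags
    return b_out + w_out @ h           # output layer: linear, as noted above

# Illustrative call with random weights (a real model would fit these to data)
rng = np.random.default_rng(1)
p, q = 4, 3
print(ann_forecast(rng.normal(size=p),
                   rng.normal(size=(q, p)), rng.normal(size=q),
                   rng.normal(size=q), 0.0))
```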
There are some similarities between ARIMA and ANN models. Both of them include a rich class of different
models with different model orders. Data transformation is often necessary to obtain the best results. A relatively large
sample is required in order to build a successful model. The iterative experimental nature is common to their
modeling processes and the subjective judgment is sometimes needed in implementing the model. Because of
the potential over fitting effect with both models, parsimony is often a guiding principle in choosing an
appropriate model for forecasting.
Overview of Extreme Learning Machine
Gradient descent based algorithms require all the weights to be updated after every iteration, so they are generally slow and may easily converge to local minima. On the other hand, ELM, proposed by Huang et al. in 2004, randomly assigns the weights connecting the input and hidden layers as well as the hidden biases. It then analytically determines the output weights using the Moore-Penrose generalized inverse. It has been proved that, given randomly assigned input weights and hidden biases and almost any nonzero activation function, such a network can approximate any continuous function on compact sets. Unlike traditional algorithms, ELM not only achieves the minimum training error but also yields the smallest norm of the output weights. The motivation for using the Moore-Penrose inverse is that, according to Bartlett's theory, a smaller norm of the weights results in better generalization of the feedforward neural network. The advantages of ELM over traditional algorithms are as follows:
ELM can be up to 1000 times faster than traditional algorithms.
ELM has better generalization performance, as it not only reaches the smallest error but also yields the smallest norm of weights.
Non-differentiable activation functions can also be used to train SLFNs with the ELM learning algorithm.
Given a training set \{(x_i, t_i) \mid x_i \in \mathbb{R}^n, t_i \in \mathbb{R}^m, i = 1, \dots, N\}, an activation function g(x), and a number of hidden nodes \tilde{N}, the mathematical equation for the neural network can be written as:

\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \qquad j = 1, \dots, N,

where w_i = [w_{i1}, w_{i2}, w_{i3}, \dots, w_{in}]^T is the weight vector connecting the ith hidden neuron and the input neurons, and \beta_i = [\beta_{i1}, \beta_{i2}, \beta_{i3}, \dots, \beta_{im}]^T is the weight vector connecting the ith hidden neuron and the output neurons. t_j denotes the target vector for the input x_j, whereas o_j denotes the output vector obtained from the neural network; w_i \cdot x_j denotes the inner product of w_i and x_j. The output neurons are chosen to be linear.
That standard SLFNs with \tilde{N} hidden neurons and activation function g(x) can approximate these N samples with zero error means that \sum_{j=1}^{N} \lVert o_j - t_j \rVert = 0, i.e., there exist \beta_i, w_i and b_i such that

\sum_{i=1}^{\tilde{N}} \beta_i \, g(w_i \cdot x_j + b_i) = t_j, \qquad j = 1, \dots, N.

The above N equations can be written compactly as H\beta = T, where

H = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}, \quad \beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_{\tilde{N}}^T \end{bmatrix}_{\tilde{N} \times m}, \quad T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}.

The smallest norm least-squares solution of the above linear system is

\hat{\beta} = H^{\dagger} T,

where H^{\dagger} is the Moore-Penrose generalized inverse of H.
The ELM algorithm, for the architecture shown in Fig. 1, can be summarized as follows. Given a training set, an activation function g(x), and a number of hidden neurons \tilde{N}:
1. Randomly assign input weights w_i and biases b_i, i = 1, \dots, \tilde{N}.
2. Calculate the hidden layer output matrix H.
3. Calculate the output weights \beta = H^{\dagger} T.
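The three steps translate almost directly into code. Below is a minimal numpy sketch of ELM training and prediction, assuming a sigmoid activation; the hidden-layer size and the sine-curve toy data are illustrative choices.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, n_hidden, seed=0):
    """X: (N, n) inputs, T: (N, m) targets. Returns (W, b, beta)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_hidden, X.shape[1]))  # Step 1: random input weights w_i
    b = rng.normal(size=n_hidden)                # Step 1: random hidden biases b_i
    H = sigmoid(X @ W.T + b)                     # Step 2: hidden layer output matrix H
    beta = np.linalg.pinv(H) @ T                 # Step 3: beta = pinv(H) @ T (Moore-Penrose)
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W.T + b) @ beta

# Illustrative use: learn y = sin(x) from noisy samples
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
T = np.sin(X) + 0.05 * rng.normal(size=X.shape)
W, b, beta = elm_train(X, T, n_hidden=25)
print("train MSE:", float(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
```

Note that no iterative weight updates occur: the only "training" step is a single matrix pseudo-inversion, which is the source of ELM's speed advantage over gradient-based algorithms.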

The Hybrid Methodology
Both ARIMA and ANN models have achieved successes in their own linear or nonlinear domains. However, none
of them is a universal model that is suitable for all circumstances. The approximation of ARIMA models to
complex nonlinear problems may not be adequate. On the other hand, using ANNs to model linear problems has yielded mixed results. For example, using simulated data, Denton showed that when there are outliers or multicollinearity in the data, neural networks can significantly outperform linear regression models. Markham and Rakes also found that the performance of ANNs on linear regression problems depends on the sample size and noise level. Hence, it is not wise to apply ANNs blindly to any type of data. Since it is difficult to completely know the characteristics of the data in a real problem, a hybrid methodology that has both linear and nonlinear modeling capabilities can be a good strategy for practical use. By combining different models, different aspects
of the underlying patterns may be captured.
It may be reasonable to consider a time series to be composed of a linear autocorrelation structure and a
nonlinear component. That is,
y_t = L_t + N_t,

where L_t denotes the linear component and N_t denotes the nonlinear component. These two components have
to be estimated from the data. First, we let an ARIMA model capture the linear component; the residuals from the linear model will then contain only the nonlinear relationship. Let e_t denote the residual at time t from the linear model, then

e_t = y_t - \hat{L}_t,

where \hat{L}_t is the forecast value for time t from the estimated linear model (1). Residuals are important in the
diagnosis of the sufficiency of linear models. A linear model is not sufficient if there are still linear correlation
structures left in the residuals. However, residual analysis is not able to detect any nonlinear patterns in the
data. In fact, there are currently no general diagnostic statistics for nonlinear autocorrelation relationships.
Therefore, even if a model has passed diagnostic checking, the model may still not be adequate in that nonlinear
relationships have not been appropriately modeled. Any significant nonlinear pattern in the residuals will
indicate the limitation of the ARIMA. By modeling residuals using ANNs, nonlinear relationships can be
discovered. With n input nodes, the ANN model for the residuals will be
e_t = f(e_{t-1}, e_{t-2}, \dots, e_{t-n}) + \varepsilon_t   (7)
where f is a nonlinear function determined by the neural network and \varepsilon_t is the random error. Note that if the model f is not an appropriate one, the error term is not necessarily random; therefore, correct model identification is critical. Denoting the forecast from (7) as \hat{N}_t, the combined forecast will be

\hat{y}_t = \hat{L}_t + \hat{N}_t.
In summary, the proposed methodology of the hybrid system consists of two steps. In the first step, an ARIMA
model is used to analyze the linear part of the problem. In the second step, a neural network model is
developed to model the residuals from the ARIMA model. Since the ARIMA model cannot capture the nonlinear
structure of the data, the residuals of linear model will contain information about the nonlinearity. The results
from the neural network can be used as predictions of the error terms for the ARIMA model. The hybrid model
exploits the unique features and strengths of both the ARIMA model and the ANN model in capturing different
patterns. Thus, it could be advantageous to model linear and nonlinear patterns separately by using different
models and then combine the forecasts to improve the overall modeling and forecasting performance.
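A condensed sketch of the two-step procedure follows; the toy series, the ARIMA order (2, 0, 1), the residual lag count n = 2, and the 10 hidden nodes are all illustrative assumptions, with an ELM standing in for the neural stage (an MLP could be substituted).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: ARIMA models the linear component L_t
rng = np.random.default_rng(0)
y = np.sin(np.arange(300) / 5) + 0.3 * rng.normal(size=300)  # toy series
arima = ARIMA(y, order=(2, 0, 1)).fit()                      # illustrative order
L_hat = np.asarray(arima.fittedvalues)
e = y - L_hat                          # residuals carry any nonlinear structure

# Step 2: an ELM fits Eq. (7) on lagged residuals
n = 2                                  # number of residual lags (illustrative)
X = np.column_stack([e[n - k - 1 : -k - 1] for k in range(n)])  # (e_{t-1}, ..., e_{t-n})
W = rng.normal(size=(10, n))           # random input weights, 10 hidden nodes
b = rng.normal(size=10)                # random hidden biases
H = sigmoid(X @ W.T + b)               # hidden layer output matrix
beta = np.linalg.pinv(H) @ e[n:]       # output weights via Moore-Penrose inverse
N_hat = H @ beta                       # fitted nonlinear component

# Combined forecast: y_hat = L_hat + N_hat
y_hat = L_hat[n:] + N_hat
print("ARIMA MSE :", float(np.mean((y[n:] - L_hat[n:]) ** 2)))
print("hybrid MSE:", float(np.mean((y[n:] - y_hat) ** 2)))
```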
As previously mentioned, in building ARIMA as well as ANN models, subjective judgment of the model order as
well as the model adequacy is often needed. It is possible that suboptimal models may be used in the hybrid
method. For example, the current practice of the Box-Jenkins methodology focuses on low-order autocorrelation: a model is considered adequate if low-order autocorrelations are not significant, even though significant autocorrelations of higher order may still exist. This suboptimality may not affect the usefulness of the
hybrid model. Granger has pointed out that for a hybrid model to produce superior forecasts, the component
model should be suboptimal. In general, it has been observed that it is more effective to combine individual
forecasts that are based on different information sets.



Empirical Results
1) Data Sets
Two well-known data sets, the Wolf's sunspot data and the Canadian lynx data, are used in this study to demonstrate the effectiveness of the hybrid method. These time series come from different areas and have
different statistical characteristics. They have been widely studied in the statistical as well as the neural network
literature. Both linear and nonlinear models have been applied to these data sets, and varying degrees of nonlinearity have been found in these series.



The sunspot data we consider contain the annual number of sunspots from 1700 to 2012, giving a total of 313 observations. The study of sunspot activity has practical importance to geophysicists, environmental scientists, and climatologists. The data series is regarded as nonlinear and non-Gaussian and is often used to evaluate the effectiveness of nonlinear models. The plot of this time series (see Fig. 1) also suggests a cyclical pattern with a mean cycle of about 11 years. The sunspot data have been extensively studied with a vast variety of linear and nonlinear time series models, including ARIMA and ANNs.

The lynx series contains the number of lynx trapped per year in the Mackenzie River district of Northern Canada. The data shows a periodicity of approximately 10 years. The data set has 114 observations, corresponding to the period 1821-1934. It has also been extensively analyzed in the time series literature, with a focus on nonlinear modeling. Following other studies, the logarithms (to the base 10) of the data are used in the analysis.

To assess the forecasting performance of different models, each data set is divided into two samples, one for training and one for testing. The training data set is used exclusively for model development, and the test sample is then used to evaluate the established model. The data compositions for the two data sets are given in Table 1.
2) Results
Only one-step-ahead forecasting is considered. The mean squared error (MSE) is selected as the forecasting accuracy measure.
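For a test set of n out-of-sample points, this measure is computed as

MSE = \frac{1}{n} \sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2,

where \hat{y}_t denotes the one-step-ahead forecast of y_t.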
Table 1 gives the forecasting results for the sunspot data. A subset autoregressive model of order 9 has been found to be the most parsimonious among all the ARIMA models judged adequate by residual analysis. The neural network and the Extreme Learning Machine both use a 2 x 3 x 1 architecture, chosen on the basis of experimental results obtained by comparing mean squared errors. A forecast horizon of 67 periods is used. The results show that applying neural networks alone does not improve the forecasting accuracy over the ARIMA model in the 67-period horizon. The results of the hybrid model show that by combining the two models, the overall forecasting errors can be significantly reduced. The comparison between the actual values and the forecast values for the 67 out-of-sample points is given in the figure below. Although at some data points the hybrid model gives worse predictions than either the ARIMA or the ELM forecasts, its overall forecasting capability is improved.





Table 1: Sunspot data, 67 points ahead

Model                   MSE
ARIMA                   393.73
ANN                     482.02
ELM                     487.64
Hybrid (ARIMA+ELM)      363.57
Hybrid (ARIMA+ANN)      361.69

In a similar fashion, we have fitted a subset AR model of order 12 to the Canadian lynx data. This is a parsimonious model also used by Subba Rao and Gabr, among others. The overall forecasting results for the last 14 years are summarized in Table 2. A neural network/Extreme Learning Machine structure of 2 x 2 x 1 is not able to improve on the results of the ARIMA model. Applying the hybrid method, we find a significant decrease in the MSE on these data. The figure below gives the actual vs. forecast values for the individual Extreme Learning Machine and ARIMA models as well as for the combined model.




Table 2: Lynx data, 14 points ahead

Model                   MSE
ARIMA                   0.128
ANN                     0.148
ELM                     0.154
Hybrid (ARIMA+ELM)      0.118
Hybrid (ARIMA+ANN)      0.123

Conclusion
Applying quantitative methods for forecasting and for assisting investment decision making has become more indispensable in business practice than ever before. Time series forecasting is one of the most important quantitative approaches and has received a considerable amount of attention in the literature. Artificial neural networks (ANNs) have been shown to be an effective, general-purpose approach for pattern recognition, classification, clustering, and especially time series prediction with a high degree of accuracy. Nevertheless, their performance is not always satisfactory. Theoretical as well as empirical evidence in the literature suggests that by using dissimilar models, or models that strongly disagree with each other, the hybrid model will have lower generalization variance or error. Additionally, because of possibly unstable or changing patterns in the data, the hybrid method can reduce model uncertainty, which typically occurs in statistical inference and time series forecasting.

In this paper, autoregressive integrated moving average models and ELM models are applied to propose a new hybrid method for improving the performance of artificial neural networks in time series forecasting. In our proposed model, based on the Box-Jenkins methodology for linear modeling, a time series is considered a nonlinear function of several past observations and random errors. Therefore, in the first stage, an autoregressive integrated moving average model is used to generate the necessary data, and then an Extreme Learning Machine is used to determine a model that captures the underlying data-generating process and predicts the future using the preprocessed data. Empirical results with two well-known real data sets indicate that the proposed model can be an effective way to obtain a more accurate model than traditional artificial neural networks. Thus, it can be used as an appropriate alternative to artificial neural networks.

References
[1] J.M. Bates, C.W.J. Granger, The combination of forecasts, Oper. Res. Q., 20 (1969), pp. 451-468.
[2] G.E.P. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
[3] M.J. Campbell, A.M. Walker, A survey of statistical work on the MacKenzie River series of annual Canadian lynx trappings for the years 1821-1934, and a new analysis, J. R. Statist. Soc. Ser. A, 140 (1977), pp. 411-431.
[4] F.C. Palm, A. Zellner, To combine or not to combine? Issues of combining forecasts, J. Forecasting, 11 (1992), pp. 687-701.
[5] Z. Tang, C. Almeida, P.A. Fishwick, Time series forecasting using neural networks vs Box-Jenkins methodology, Simulation, 57 (1991), pp. 303-310.
[6] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: A New Learning Scheme of Feedforward Neural Networks, International Joint Conference on Neural Networks (2004), Vol. 2, pp. 985-990.
[7] G.B. Huang, Q. Zhu, C. Siew, Extreme Learning Machine: Theory and Applications, Neurocomputing (2006), Vol. 70, pp. 489-501.
[8] G.B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme Learning Machine for Regression and Multiclass Classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics (2012), Vol. 42, No. 2.
