Abstract—This paper presents an auto-regressive network called the Auto-Regressive Multi-Context Recurrent Neural Network (AR-MCRN), which forecasts the daily peak load for two large power plant systems. The auto-regressive network is a combination of both recurrent and non-recurrent networks. Weather component variables are key elements in forecasting because any change in these variables affects the demand for energy load. The AR-MCRN is therefore used to learn the relationship between past, current and future exogenous and endogenous variables. Experimental results show that using the change in weather components and the change in past load as inputs to the AR-MCRN, rather than the basic weather parameters and the past load itself, produces higher accuracy of the predicted load. Experimental results also show that using both exogenous and endogenous variables as inputs is better than using only the exogenous variables as inputs to the network.

Keywords—Daily Peak Load Forecasting, Neural Networks, Recurrent Neural Networks, Auto-Regressive Multi-Context Neural Network

I. INTRODUCTION

The use of these networks [1, 25, 8, 7, 14, 18, 27] avoids the limitations of the earlier techniques by employing non-linear modeling and adaptation.

Artificial Neural Networks (ANN), as in [9, 1], are information processing paradigms inspired by the way biological nervous systems, such as the brain, process information. An ANN can also be viewed as a multiprocessor computer system with simple processing elements, a high degree of interconnection, simple scalar messages and adaptive interaction between elements. ANNs are similar to people in that they learn from experience. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. There are mainly two types of neural networks: conventional neural networks and recurrent neural networks. Conventional neural networks, as in [9, 1], consist of three interconnecting layers: one input layer, one or more hidden layers and one output layer. Conventional neural networks allow signals to travel in one direction only, from input to output.
We then present an online forecasting system as our future work, and finally we outline the conclusion of this paper.

II. AUTO-REGRESSIVE RECURRENT NEURAL NETWORK

In [2], we proved that the modified multi-context recurrent neural network (MCRN) overcomes the limitations of the SRN. The hybrid network introduced here is a combination of the conventional neural network and the SRN with MCRN [2], and is called the auto-regressive multi-context recurrent neural network (AR-MCRN). Two different AR-MCRN structures were designed. As can be seen from Figure 1a, the first network is structured with two hidden layers on the same level; we call it AR-MCRN-a. In Figure 1b, the second network is structured with two hidden layers on different levels; we call it AR-MCRN-b. In each topology, one hidden layer acts as a conventional neural network towards the output layer, while the other hidden layer acts as both a feed-forward connection to the output layer and a feedback connection to the context layers. Logistic sigmoid transfer functions were used for all neurons in the hidden layers, and a linear transfer function was used for the neuron in the output layer. This type of structure improves the speed of the training session and is an appropriate method for approximating the daily peak load [13, 24].
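To make the topology concrete, the following NumPy sketch illustrates one forward pass of an AR-MCRN-a style network under our reading of the description above: two hidden layers feed a single linear output neuron, and the second hidden layer is also copied back into a fixed number of context layers. The class name, layer sizes and the way the context layers are shifted are illustrative assumptions, not the exact implementation used in the paper.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ARMCRNSketch:
    """Illustrative AR-MCRN-a style network: two hidden layers feed one linear
    output neuron; hidden layer 2 is also copied back into the context layers."""

    def __init__(self, n_in, n_h1, n_h2, n_context=2, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h1 = rng.normal(0.0, 0.1, (n_h1, n_in))   # input -> hidden layer 1 (pure feed-forward path)
        self.W_h2 = rng.normal(0.0, 0.1, (n_h2, n_in))   # input -> hidden layer 2 (recurrent path)
        self.W_c = [rng.normal(0.0, 0.1, (n_h2, n_h2))   # context layer k -> hidden layer 2
                    for _ in range(n_context)]
        self.w_o1 = rng.normal(0.0, 0.1, n_h1)           # hidden layer 1 -> output neuron
        self.w_o2 = rng.normal(0.0, 0.1, n_h2)           # hidden layer 2 -> output neuron
        self.contexts = [np.zeros(n_h2) for _ in range(n_context)]  # stored copies of hidden layer 2

    def step(self, x):
        h1 = sigmoid(self.W_h1 @ x)                                   # logistic sigmoid hidden units
        h2 = sigmoid(self.W_h2 @ x +
                     sum(Wc @ c for Wc, c in zip(self.W_c, self.contexts)))
        y = self.w_o1 @ h1 + self.w_o2 @ h2                           # linear output (predicted change in load)
        self.contexts = [h2.copy()] + self.contexts[:-1]              # feed hidden layer 2 back to the context layers
        return y

# Example: 12 inputs (data set A), two context layers of 3 neurons each
net = ARMCRNSketch(n_in=12, n_h1=5, n_h2=3, n_context=2)
prediction = net.step(np.random.default_rng(1).random(12))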
Under the unsupervised learning rule, the training set consists of input training patterns only. As a result, the network is taught without the assistance of a target; the Kohonen network [16] is an example. Fixed-weight networks, as indicated by their name, have fixed weights: no learning takes place, and therefore the weights cannot be modified. An example of this type is the Hopfield network [12]. For supervised learning networks, there are several learning techniques that are widely used by researchers. The main three are dynamic online learning, modified back-propagation and back-propagation through time, all of which were used for our MCRN [2, 23] depending on the application. Dynamic online learning is the most accurate amongst them; however, it is time consuming and slow due to the complexity of the computation. Modified back-propagation is extended to include the recurrent memory [2]. Modified back-propagation was the quickest and produced accurate results.

IV. FORECASTING SYSTEM

The creation of a forecasting system can be described as follows: acquire and analyze the historical data, pre-process and normalise the data, choose the training and testing sets, choose the network architecture and its parameters, choose a suitable learning algorithm, and lastly implement the system.

A. Historical Data
The future load in data set (A) is a function of the calendar, the status of the day, the current change in the temperature T and the past change in the load L. The future load in data set (B) is a function of the calendar, the status of the day, the past and current change in the weather components (such as temperature T, cloud rate C, wind speed W and humidity H) and the past change in the load L.

In the following, details about the size and structure of the inputs of the MCRN network implementing each data set are given and explained. Note that, as the two data sets contain different sets of parameters, the expression of the future load for each data set is given as follows:

1. The future (predicted) load for data set (A) is calculated from the difference between the historical values of the load and its future values. The difference between the two is expressed as follows:

ΔL_t = f(past and current calendar; past and current social events; ΔT_t, ..., ΔT_{t−n}; ΔL_{t−1}, ..., ΔL_{t−n})

2. The difference between the future and historical loads for data set (B) depends on a richer set of parameters than data set (A):

ΔL_t = f(past and current calendar; past and current social events; ΔT_t, ..., ΔT_{t−n}; ΔC_t, ..., ΔC_{t−n}; ΔW_t, ..., ΔW_{t−n}; ΔH_t, ..., ΔH_{t−n}; ΔL_{t−1}, ..., ΔL_{t−n})

where t is the index of the day. The change in the load (the difference between future and historical loads) and the change in the weather components (temperature, cloud rate, wind speed, humidity) can be described as follows:

ΔL_t = (L_t − L_{t−1}) / L_{t−1};  ΔT_t = T_t − T_{t−1};  ΔC_t = C_t − C_{t−1};  ΔW_t = W_t − W_{t−1};  ΔH_t = H_t − H_{t−1}

According to the parameters recorded in each data set, the size of the network input layer is 12 neurons for data set (A) and 18 neurons for data set (B). Let I_v denote input neuron v. The following is the allocation of each input neuron of the network for data set (A):

1) Input neurons I_1...I_4 are allocated to the index of the month, expressed in binary representation.
2) The next three input neurons, I_5, I_6 and I_7, represent the index of the week. Thus the network can identify the seasonal periods of the year and can also distinguish the days with high temperatures from those with low temperatures.
3) Input neuron I_8 indicates whether the forecasted day is a working day or a holiday.
4) Input neuron I_9 indicates whether the day prior to the forecasted day was a working day or a holiday. Usually this will affect the next day's load.
5) Input neuron I_10 is for the change in temperature between the current day and the previous day: ΔT_t = T_t − T_{t−1}.
6) Input neuron I_11 is allocated for inputting the change in temperature over the previous two consecutive days: ΔT_{t−1} = T_{t−1} − T_{t−2}.
7) The last input neuron, I_12, is reserved for inputting the change in the load over the previous two days: ΔL_{t−1} = (L_{t−1} − L_{t−2}) / L_{t−2}.

Figure 2 shows a sample of input data selected from data set (A).

Fig. 2 A sample of the input data selected from data set (A) and presented to the network
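As a concrete illustration of the allocation above, the sketch below assembles the 12-input vector for data set (A): four binary neurons for the month index, three for the index of the week, two day-status flags and the three difference features. The function name, argument names and the exact binary encodings are our assumptions; only the feature definitions follow the list above.

def dataset_a_inputs(month, weekday, is_workday, prev_is_workday,
                     T_t, T_tm1, T_tm2, L_tm1, L_tm2):
    """Assemble the 12 inputs I_1..I_12 of data set (A) for day t.
    month: 1..12, weekday: 0..6; T_* are daily average temperatures and
    L_* are daily peak loads for days t, t-1 and t-2."""
    x = []
    x += [int(b) for b in format(month, "04b")]       # I1..I4: month index, binary coded
    x += [int(b) for b in format(weekday, "03b")]     # I5..I7: index of the week, binary coded
    x.append(1 if is_workday else 0)                  # I8: forecasted day is a working day or a holiday
    x.append(1 if prev_is_workday else 0)             # I9: previous day was a working day or a holiday
    x.append(T_t - T_tm1)                             # I10: dT_t = T_t - T_{t-1}
    x.append(T_tm1 - T_tm2)                           # I11: dT_{t-1} = T_{t-1} - T_{t-2}
    x.append((L_tm1 - L_tm2) / L_tm2)                 # I12: dL_{t-1} = (L_{t-1} - L_{t-2}) / L_{t-2}
    return x                                          # 12 values, before scaling to [0, 1]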
The network input layer for data set (B) consists of 18 neurons. In addition to the 12 inputs described above, another 6 input neurons are needed to represent the other parameters recorded in this data set, namely cloud rate, wind speed and humidity. Therefore, the first 11 neurons are exactly the same as for data set (A), and neuron 12 of network (A) corresponds to neuron 18 of network (B). In the following we describe the additional neurons:

1) Input neuron I_12 indicates the change in the cloud rate between the current day and the previous day: ΔC_t = C_t − C_{t−1}.
2) Input neuron I_13 indicates the change in the cloud rate over the previous two consecutive days: ΔC_{t−1} = C_{t−1} − C_{t−2}.
3) Input neuron I_14 indicates the change in the wind speed between the current day and the previous day: ΔW_t = W_t − W_{t−1}.
4) Input neuron I_15 is allocated for inputting the change in the wind speed over the previous two consecutive days: ΔW_{t−1} = W_{t−1} − W_{t−2}.
5) Input neuron I_16 indicates the change in the humidity between the current day and the previous day: ΔH_t = H_t − H_{t−1}.
6) Input neuron I_17 indicates the change in the humidity over the previous two consecutive days: ΔH_{t−1} = H_{t−1} − H_{t−2}.
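Continuing the previous sketch, data set (B) appends a pair of difference features for each additional weather component before the final load-difference neuron. Again, the names and any ordering details beyond what the two lists state are assumptions.

def dataset_b_inputs(calendar_inputs, T, C, W, H, L_tm1, L_tm2):
    """Assemble the 18 inputs of data set (B).
    calendar_inputs: the 9 binary values I_1..I_9 built as for data set (A);
    T, C, W, H: tuples (value_t, value_tm1, value_tm2) for temperature,
    cloud rate, wind speed and humidity."""
    x = list(calendar_inputs)
    for v_t, v_tm1, v_tm2 in (T, C, W, H):            # I10..I17: current and previous differences,
        x.append(v_t - v_tm1)                         #   e.g. dC_t = C_t - C_{t-1}
        x.append(v_tm1 - v_tm2)                       #   e.g. dC_{t-1} = C_{t-1} - C_{t-2}
    x.append((L_tm1 - L_tm2) / L_tm2)                 # I18: dL_{t-1}, the counterpart of I_12 in network (A)
    return x                                          # 9 + 8 + 1 = 18 values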
For both networks, with data sets (A) and (B), inputs I_1...I_9 are binary coded and inputs I_10...I_18 are scaled to the range [0, 1]. A binary representation is used for each group independently of the others. Alongside some other differences in parameter settings, which are described in the section above, both networks implementing data sets (A) and (B) have the same output layer, which consists of one neuron. The output of the networks is the current change of the daily peak load, which is the relative difference between the forecasted daily peak load and the previous daily peak load: ΔL_t = (L_t − L_{t−1}) / L_{t−1}. Both networks' outputs are also normalised between 0 and 1.
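A minimal sketch of the scaling described here, assuming simple min–max normalisation (the paper does not state which scaling rule was used): the real-valued difference inputs and the target ΔL_t are mapped to [0, 1], and a network output is mapped back and converted into a daily peak load forecast by inverting the relative difference.

def minmax_scale(v, lo, hi):
    """Map a value from [lo, hi] to [0, 1]."""
    return (v - lo) / (hi - lo)

def minmax_unscale(s, lo, hi):
    """Map a value from [0, 1] back to [lo, hi]."""
    return lo + s * (hi - lo)

def forecast_peak_load(net_output, L_prev, dL_lo, dL_hi):
    """Undo the output normalisation and the differencing:
    dL_t = (L_t - L_{t-1}) / L_{t-1}  implies  L_t = L_{t-1} * (1 + dL_t)."""
    dL_t = minmax_unscale(net_output, dL_lo, dL_hi)
    return L_prev * (1.0 + dL_t)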
The forecasting system described in [3] takes into account only the time and the change in previous loads. In comparison, the technique offered in this paper considers more than weather components alone: it considers the change in the weather components between days that are very close in time, and this includes the change in the load and details of the status of the day and the calendar, rather than just pure weather data. These changes are presented to the network as inputs and give the network a significant enhancement in terms of accuracy and stability. The average error of the network performance dropped from approximately 4.5% to 1.9%. This is because the variation of the differences between the loads of two consecutive days is smaller than the variation of the load values themselves over two consecutive days. Consequently, the network takes as inputs time series whose values are close to each other, which allows the network to learn more easily than if it were presented with inputs whose values are not close. The same remark applies to the other variables, such as the weather components. These types of differences between two parameter values that are close in time are shown in Figures 3 and 4. Figure 3 shows the variations in the daily average temperature for January 1997 and January 1998. Figure 4 shows the variations in the daily peak load for January 1997 and January 1998.

Fig. 3 (a) The daily average temperature for January 1997 and 1998; (b) the difference between daily average temperatures over consecutive days for January 1997 and 1998 (data set (A))
2 context layers, each of which has 3 neurons, and 1 output neuron.

3) Present the TS data set to the network and obtain the forecast.
The two ways of testing are compared in terms of the forecasting results and the speed of convergence.

Figure 5 displays the cross-validation procedure.

F. Complexity Computations of the AR-MCRNN

The multi-context layers in recurrent networks provide the potential for the network to store information about previous inputs. If one context layer performs well for the task, it is possible that two or more context layers will make the network better suited to a sequential task, because they hold more accurate information about the previous inputs. The number of context layers, the number of hidden layers and the number of neurons in each hidden layer are user specified. The common practice is to select these parameters so that the best achievable structure with as few parameters as possible is obtained. This is not always very helpful and, in practice, we have to experiment with different structures and evaluate their outcomes to find the most appropriate neural network structure for the task to be tackled. In various applications, one or two hidden layers are adequate. The recommendation is to commence with a linear model, that is, a neural network with no hidden layers, followed by changing over to networks with one hidden layer of no more than five to ten neurons; as a last step, two hidden layers should be tried.

The number of weights for AR-MCRN-a can be calculated by the formula below:

w(o, h_1, h_2, I, c) = c·h_2² + I·h_2 + 2·h_2 + I·h_1 + h_1 ............(1)

and the number of weights for AR-MCRN-b can be calculated by:

w(o, h_1, h_2, I, c) = c·h_2² + I·h_2 + 2·h_2 + h_1·h_2 + h_1 ............(2)

where o, h_1, h_2, I and c are the number of output neurons, first hidden layer neurons, second hidden layer neurons, input neurons and context layers, respectively. As can be seen from the above two equations, they are very similar, except that the fourth terms are different. Therefore, we expect that both AR-MCRN-a and AR-MCRN-b can perform the task equally well.
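For reference, equations (1) and (2) can be evaluated with the small helper below; it simply computes the two weight counts for a given configuration (the example sizes are illustrative only).

def n_weights_ar_mcrn_a(h1, h2, I, c):
    """Equation (1): w = c*h2^2 + I*h2 + 2*h2 + I*h1 + h1."""
    return c * h2**2 + I * h2 + 2 * h2 + I * h1 + h1

def n_weights_ar_mcrn_b(h1, h2, I, c):
    """Equation (2): w = c*h2^2 + I*h2 + 2*h2 + h1*h2 + h1."""
    return c * h2**2 + I * h2 + 2 * h2 + h1 * h2 + h1

# Example: 12 inputs (data set A), h1 = 5, h2 = 3 hidden neurons, 2 context layers
print(n_weights_ar_mcrn_a(5, 3, 12, 2), n_weights_ar_mcrn_b(5, 3, 12, 2))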
V. EXPERIMENTAL RESULTS

The performance of the training and the validation of the network is evaluated by computing the sum of MSE_i averaged over the number of training and validation sets, using the equation below:

MSE(perform.) = (1/n) · Σ_{i=1}^{n} MSE_i ............(3)

The error results of AR-MCRN-a obtained for load forecasting using the cross-validation of 10 training and testing sets on data set (A) are shown in Table 1. The results from using the cross-validation technique are very close to the actual forecasting errors produced by the network on the same data set, as shown in Table 2. The second part of Table 2 shows the results for data set (B) with only the change in the temperature component included as an input to the network. The last part of Table 2 displays the results for data set (B), for which all the changes in the weather components are included as inputs to the network. Clearly, the last part presents better results in terms of accuracy in both training and testing. Figures 6 and 7 display the load forecasting results of AR-MCRN-a and AR-MCRN-b for data set (A) and data set (B), respectively, using only the change in the daily average temperature component, while Figure 8 displays the forecasting results of AR-MCRN-a and AR-MCRN-b for data set (B) with the influence of all weather components. The evaluation of this network implementation of the load forecasting application is realised using two performance measures, namely the Mean Absolute Percentage Error (MAPE) and the Maximum Error (MAX). The expressions of these two functions are given below, in equations (4) and (5):

MAPE = (100/n) · Σ_{i=1}^{n} |L_{r_i} − L_{p_i}| / L_{r_i} ............(4)

MAX = max_i (L_{r_i} − L_{p_i}) ............(5)
where n is the number of outputs forecasted from the network, L_{r_i} and L_{p_i} are the target and the predicted values of the daily peak load, and i is the index of the day.
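The three evaluation measures in equations (3)–(5) are straightforward to compute; the sketch below is one possible NumPy implementation (array names are ours, and the absolute value in MAX is assumed since the measure is described as a maximum error).

import numpy as np

def mse_performance(mse_per_set):
    """Equation (3): average of the per-set MSE values over the n training/validation sets."""
    return float(np.mean(mse_per_set))

def mape(L_real, L_pred):
    """Equation (4): Mean Absolute Percentage Error, in percent."""
    L_real = np.asarray(L_real, dtype=float)
    L_pred = np.asarray(L_pred, dtype=float)
    return 100.0 / len(L_real) * np.sum(np.abs(L_real - L_pred) / L_real)

def max_error(L_real, L_pred):
    """Equation (5): largest deviation between target and predicted daily peak load."""
    return float(np.max(np.abs(np.asarray(L_real, dtype=float) - np.asarray(L_pred, dtype=float))))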
TABLE I
RESULTS OF THE VARIOUS ERROR PERFORMANCES OF THE n TRAINING AND VALIDATION SETS ON THE AR-MCRNN-a FOR DATA SET (A)
Fig. 8 Forecasting results of AR-MCRNN-a and AR-MCRNN-b for data set (B) with the influence of all weather component variables
Fig. 6 Forecasting results of AR-MCRNN-a and AR-MCRNN-b for data set (A) with only the influence of the daily average temperature component variable

have been used on our networks.

VI. CONCLUSION

In this paper an effective approach for predicting the energy load is presented. The approach is mainly based on the neural networks introduced initially in [10, 6, 5, 26, 2, 13]. Because the application of the initial network to the load-forecasting problem was not straightforward, some modifications and improvements in both the network structure and architecture were needed. AR-MCRN-a and AR-MCRN-b are designed to encode past histories and produce comparably accurate forecasts after short training periods. Furthermore, this paper also presented a different approach to modeling the load forecasting application.
Weather components were identified and used in the model. The experimental results showed that the use of these components affected the network performance and therefore its output. More notably, these components helped the network in the learning phase and made it easier and faster than without them.

In addition, the experimental results also showed that a network presented with exogenous and endogenous inputs is better than a network presented with just exogenous inputs, because in the first case some relationships between the various values of the parameters are made explicit, while in the latter case the days are implicitly the same if there is no information to the contrary. The results obtained in the first case were stable, with higher precision than in the second case.

The main result of this paper is the development of neural networks (AR-MCRN) well suited to modeling the load forecasting application, and the demonstration that the change in weather components over time leads to better performance than using the current absolute weather components for power plant peak load forecasting. Finally, the approach presented here compares favorably with other techniques proposed in [4, 11, 15, 17, 19], with maximum values of 1.5 in mean absolute percentage error.
VII. FUTURE WORK

Our future work will continue the study of energy load forecasting using a new auto-regressive multi-context recurrent neural network. This network is characterised by links from both the hidden and output layers to the set of context layers. It has previously been tested on other applications and has proved to be very competent when compared to networks in the same category, such as the Elman and Jordan networks.

We describe a methodology to take full advantage of this network's capabilities. This approach consists of two main phases: off-line training and online training. During off-line training, the network is trained with a few years' data. Then, from all this data, a particular season is chosen and the network is re-trained using the weights of the first training run as initial weights. Again, at the end of this training session, the new weights are obtained and used as initial weights to train for a particular month of that season.

The second phase has two main steps. The first step consists of selecting a day for which one wants to predict the load. According to the inputs of that day (i.e., temperature, other weather parameters, etc.), a clustering technique is used to extract patterns (days) from the historical data that have "similar" features to that day (see the illustrative sketch below). The network is then trained with these patterns. The second step starts just after the completion of the first step: it consists of inputting the selected day to the trained network, and the output should correspond to the energy load of that day. Experimental results show that the network is very efficient and the prediction accuracy is very high.
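The pattern-selection step of the second phase could be realised, for example, with a simple nearest-neighbour search over the historical input vectors; the sketch below only illustrates this idea (the paper does not specify which clustering technique is used), using Euclidean distance on the normalised inputs of Section IV.

import numpy as np

def select_similar_days(target_features, history_features, k=30):
    """Return the indices of the k historical days whose normalised input
    features are closest (Euclidean distance) to the day to be forecast."""
    X = np.asarray(history_features, dtype=float)
    d = np.linalg.norm(X - np.asarray(target_features, dtype=float), axis=1)
    return np.argsort(d)[:k]

# Phase 2, step 1: pick similar days and re-train the network on those patterns.
# Phase 2, step 2: present the selected day's inputs to the re-trained network
# to obtain its forecast daily peak load.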
REFERENCES

[1] A. Bakirtzis, V. Petridis, and S. Klartizis, 'A neural network short term load forecasting model for the Greek power system', IEEE Trans. on Power Systems, 11(2), 858-863, (1996).
[2] B. Q. Huang, T. Rashid, and T. Kechadi, 'A new modified network based on the Elman network', in Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, ed. M. H. Hamza, volume 1, ISBN 088986-404-7, pp. 379-384, Innsbruck, Austria, (2004). ACTA Press.
[3] W. Charytoniuk and M-S. Chen, 'Very short-term load forecasting using neural networks', IEEE Trans. on Power Systems, 15(1), 1558-1572, (2000).
[4] B. J. Chen, M. W. Chang, and C. J. Lin, 'EUNITE network competition: electricity load forecasting', Technical report, EUNITE 2001 symposium, a forecasting competition, (2001).
[5] K. W. Yeung Dit-Yan, 'A locally recurrent neural network model for grammatical inference', in Proceedings of the International Conference on Neural Information Processing, pp. 1468-1473, (1995).
[6] G. Dorffner, 'Neural networks for time series processing', Neural Network World, 6, 447-468, (1996).
[7] I. Drezga and S. Rahman, 'Input variable selection for ANN-based short-term load forecasting', IEEE Trans. on Power Systems, 13(4), 1238-1244, (November 1998).
[8] I. Drezga and S. Rahman, 'Short-term load forecasting with local ANN predictors', IEEE Trans. on Power Systems, 14(3), 844-850, (August 1999).
[9] M. A. El-Sharkawi and D. Niebur (eds.), 'Application of artificial neural networks to power systems', IEEE Press, 96 TP 112-0, (1996).
[10] J. L. Elman, 'Finding structure in time', Cognitive Science, 14(2), 179-211, (1990).
[11] D. Esp, 'Adaptive logic networks for East Slovakian electrical load forecasting', Technical report, EUNITE 2001 symposium, a forecasting competition, (2001).
[12] J. J. Hopfield, 'Learning algorithms and probability distributions in feed-forward and feed-back networks', Proceedings of the National Academy of Sciences of the USA, volume 84, pp. 8429-8433, (1987).
[13] K. Kalaitzakis, G. S. Stavrakakis, and E. M. Anagnostakis, 'Short-term load forecasting based on artificial neural networks parallel implementation', Electric Power Systems Research, 63, pp. 185-196, (2002).
[14] M. S. Kandil, S. M. El-Debeiky, and N. E. Hasanien, 'Overview and comparison of long-term forecasting techniques for a fast developing utility: part I', Electric Power Systems Research, 58, pp. 11-17, (2001).
[15] I. King and J. Tindle, 'Storage of half hourly electric metering data and forecasting with artificial neural network technology', Technical report, EUNITE 2001 symposium, a forecasting competition, (2001).
[16] T. Kohonen, Self-Organizing Maps, Springer-Verlag, Berlin, Heidelberg, (1995).
[17] W. Kowalczyk, 'Averaging and data enrichment: two approaches to electricity load forecasting', Technical report, EUNITE 2001 symposium, a forecasting competition, (2001).
[18] A. Lendasse, M. Cottrell, V. Wertz, and M. Verleysen, 'Prediction of electric load using Kohonen maps - application to the Polish electricity consumption', in Proceedings of the American Control Conference, Anchorage, AK, pp. 8-10, (May 2002).
[19] L. Lewandowski, F. Sandner, and P. Portzel, 'Prediction of electricity load by modeling the temperature dependencies', Technical report, EUNITE 2001 symposium, a forecasting competition, (2001).
[20] A. D. Papalexopoulos and T. C. Hesterberg, 'A regression-based approach to short-term load forecasting', IEEE Trans. on Power Systems, 5(4), 1535-1547, (1990).
[21] D. Park, M. El-Sharkawi, R. Marks, A. Atlas, and M. Damborg, 'Electric load forecasting using an artificial neural network', IEEE Trans. on Power Systems, 6(2), 442-449, (1991).
[22] S. Rahman, 'Formulation and analysis of a rule-based short-term load forecasting algorithm', Proceedings of the IEEE, 39, pp. 161-169, (1994).
[23] T. Rashid, B. Q. Huang, and M.-T. Kechadi, 'A new simple recurrent network with real-time recurrent learning process', in the 14th Irish Conference on Artificial Intelligence and Cognitive Science (AICS'03), volume 1, pp. 169-174, Dublin, Ireland, (2003).
[24] T. Rashid and T. Kechadi, 'A practical approach for electricity load forecasting', in Proceedings of WEC'05, the 3rd World Enformatika Conference, volume 5, pp. 201-205, Istanbul, Turkey, (2005). ACTA Press.
[25] D. Srinivasan, 'Evolving artificial neural networks for short term load forecasting', Neurocomputing, volume 23, pp. 265-276, (1998).
[26] W. H. Wilson, 'Learning performance of networks like Elman's simple recurrent networks but having multiple state vectors', Workshop of the 7th Australian Conference on Neural Networks, Australian National University, Canberra, (1996).
[27] B-L. Zhang and Z-Y. Dong, 'An adaptive neural-wavelet model for short-term load forecasting', Electric Power Systems Research, 59, pp. 121-129, (2001).