Abstract—Short-term load forecasting plays a major role in the operation of electric power systems to ensure instantaneous balance between electricity generation and demand. The accuracy of the forecast generated by a neural network (NN) depends on several factors, including but not limited to the algorithm used to train the network, how much and what kind of data are used in the network's training set, how many hidden layers are in the NN, and the size of the hidden layer(s). We investigate the best combination of these factors to decrease the mean absolute percent error (MAPE) and to give the best forecast possible. Based on system load data from the Electric Reliability Council of Texas (ERCOT), this paper focuses on a comprehensive understanding of the accuracy of forecasts generated by NNs with different algorithms, while varying the length of the network's training set, hidden layer size, number of hidden layers, and the addition of data sets when training to create a forecast with the highest accuracy possible.

Index Terms—Load forecast, deep learning, neural network, Levenberg-Marquardt, Scaled Conjugate Gradient, Bayesian Regularization

I. INTRODUCTION

Load forecasting has become one of the major research fields in electrical engineering. With the frequent changes in weather conditions, electricity prices, and demand-side participation, load forecasting is more necessary now than ever [1]. Forecasting divides into long-term, medium-term, and short-term forecasting. Short-term prediction is based more on weather conditions, such as dry bulb temperature and dew point temperature, and usually covers a one-hour to one-week period. Medium-term and long-term predictions depend more on economic factors and political decisions. Medium-term lasts from a week to a year, and long-term is usually more than a year [1].

There are various methods used for load forecasting these days [2]. Some common ones are fuzzy logic, expert systems, and neural networks [3]. The fuzzy logic method gathers similarities from large amounts of data: instead of saying two values are the same, it determines the degree to which they are similar, and it uses this logic to predict the load. The biggest advantage of this approach is the absence of mathematical models and specific inputs [1]. Expert systems are computer algorithms that have the power to define and prescribe actions [3]. They are based on different rules and steps created by experienced engineers; these are converted into software to forecast load [1].

The most common method, the neural network (NN), is a hybrid method that uses time series and regression [3]. The NN looks at previous load data and finds trends in those data, then uses that knowledge to predict the load with weather forecast data. There are different NN structures used to predict load, such as Hopfield, backpropagation, and Boltzmann machines; the most common design used for load forecasting is backpropagation. Unlike statistical forecasting techniques, a mathematical model does not need to be defined a priori for an NN; one can easily include any relevant parameter as a node in the input layer, and the network will "learn" the relationship. Reference [4] completed an extensive literature review on the NN's place in Short-Term Load Forecasting (STLF) and found that, although the individual conclusions of most papers on the subject are not sufficiently convincing by themselves, the great number of positive results on the subject demonstrates that it is a superior forecasting method. Hippert, Pedreira, and Souza suggest that many early researchers on this topic likely overparameterized their NNs, leading to overfitting of the training set and ultimately less promising results.

The different algorithms used in this paper for training NNs are Bayesian Regularization (BR), Scaled Conjugate Gradient (SCG), and Levenberg-Marquardt (LM). BR trains the network by reducing the sum of squared errors; it uses a Jacobian matrix for its calculations and takes the most time of the three methods [5]. LM uses the sum of mean squares for training and a Jacobian matrix for calculations; it takes more memory to produce the results [6]. SCG differs from the previous two methods in that it uses conjugate directions as a basis for training [7].

There are some challenges associated with load forecasting. First, load forecasting is based on predicted weather conditions [8], and these predicted values are not always accurate, creating a skewed forecast [9]. Second, different seasons affect the forecast, which sometimes requires multiple models to properly perform the forecast. Based on system load data from the Electric Reliability Council of Texas (ERCOT), this paper focuses on a comprehensive understanding of the accuracy of forecasts generated by NNs with different algorithms, while varying the length of the network's training set, hidden layer size, and number of hidden layers, and while adding data sets during training.
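The accuracy metric used throughout the paper, MAPE, is straightforward to compute. A minimal sketch in Python (the function name and the sample load values are illustrative, not from the paper):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percent error between actual and forecast load."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    # Average of |error| / |actual|, expressed as a percentage.
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Example: four hours of system load in MW (made-up values).
actual = [50_000, 52_000, 48_000, 47_000]
forecast = [51_000, 51_500, 49_000, 46_000]
print(round(mape(actual, forecast), 2))  # → 1.79
```

A lower MAPE means a forecast that tracks the actual load more closely, which is the quantity minimized across all experiments in the paper.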
Fig. 1. The MAPE of each training set size for the three available NARX algorithms.
C. Effects of Deep NN on Forecast Accuracy
To study the impact of the number of hidden layers on load forecasting accuracy, we tested deep NNs. Figs. 5, 6, and 7 summarize the accuracy found with the three NARX algorithms. Each hidden layer tested was set to the default size of 20 neurons.
Fig. 2. Boxplot of MAPE vs. hidden layer size when trained with the Levenberg-Marquardt algorithm.

We notice that, for the LM and BR algorithms in the first year, the error increases with the NN's complexity (number of layers). This is caused by the overparameterization mentioned in Section V.A. The error increases beyond three years for these two algorithms, for each NN size. However, multi-hidden-layer networks gave us the best results as the training interval's size increased: a 2-hidden-layer NN trained with BR produced the forecast with the smallest MAPE, 2.61%. Eventually, as the quantity of data increases, the multi-hidden-layer NNs outperform the SLN. Results in Figs. 5-7 suggest that, when deciding on the size of the NN used to forecast, one must consider how much useful data is available.
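The NARX networks evaluated here feed lagged load values back as inputs alongside exogenous weather variables. A minimal sketch of how such a training matrix might be assembled (the lag count and the single temperature column are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

def narx_design_matrix(load, exog, n_lags):
    """Stack lagged load values and current exogenous inputs as predictors.

    load   : 1-D array of hourly load
    exog   : 2-D array, one row per hour, one column per weather variable
    n_lags : number of past load values fed back as inputs
    """
    load = np.asarray(load, dtype=float)
    exog = np.asarray(exog, dtype=float)
    rows, targets = [], []
    for t in range(n_lags, len(load)):
        # Past n_lags loads plus this hour's exogenous inputs.
        rows.append(np.concatenate([load[t - n_lags:t], exog[t]]))
        targets.append(load[t])
    return np.array(rows), np.array(targets)

load = [40, 42, 45, 47, 46, 44]                # hourly load (made-up)
temp = [[30], [31], [33], [35], [34], [32]]    # dry-bulb temperature
X, y = narx_design_matrix(load, temp, n_lags=2)
print(X.shape, y.shape)  # → (4, 3) (4,)
```

Each row pairs the recent load history with the current weather input, which is the structure that lets the network "learn" the load-weather relationship described in the introduction.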
Fig. 3. The time taken for one training epoch of an SLN for the LM algorithm vs. the hidden layer size (the single red line shows the average MAPE over iterations).
Fig. 4. The MAPE of the three NARX algorithms.

Fig. 6. MAPE vs. Number of Hidden Layers for the scaled conjugate gradient algorithm.
obtained from NOAA to get additional variables. We acquired
hourly data for relative humidity, visibility, and wind speed as
additional predictor variables.
To test how each extra variable affected the MAPE, an NN had to be created for each scenario. This included a new NN
for the original variables, addition of relative humidity,
addition of visibility, addition of wind speed, and the addition
of all three. For each new NN, a forecasting simulation ran
eleven times. Within each iteration, the simulation ran for
different years of training data, ranging from one year to
twelve. These iterations were then averaged together to form a
single resulting MAPE for each year of training data for each
combination of fields. Without loss of generality, only the LM
algorithm was tested.
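The experimental grid described above can be sketched directly: five predictor sets, eleven runs each, averaged per training length. A minimal Python sketch (the base variable names are assumptions, and random placeholders stand in for the per-run MAPE results):

```python
import numpy as np

# The five predictor sets tested: the original variables, each extra NOAA
# field added alone, and all three added together (base names are assumed).
base = ["dry_bulb", "dew_point"]
extras = ["rel_humidity", "visibility", "wind_speed"]
scenarios = [base] + [base + [e] for e in extras] + [base + extras]

# For each scenario, run the simulation eleven times per training length
# (1-12 years) and average the runs into a single MAPE per year count.
rng = np.random.default_rng(0)
n_runs, n_years = 11, 12
for predictors in scenarios:
    runs = rng.uniform(2.0, 4.0, size=(n_runs, n_years))  # placeholder MAPEs
    mean_mape = runs.mean(axis=0)   # one averaged MAPE per training length
    print(len(predictors), "predictors:", mean_mape.round(2)[:3], "...")
```

Averaging over the eleven iterations reduces the run-to-run variance of NN training, so each (predictor set, training length) cell reports a single representative MAPE.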
Fig. 9 compares the accuracy results of the added fields: a bar graph along with several stem plots shows the MAPE for different combinations of fields and different amounts of training data when using the LM algorithm.

Fig. 7. MAPE vs. Number of Hidden Layers for the LM algorithm.

D. Effects of Deep NN on Training Time
For a fixed hidden layer size, we see that the number of synapse weights that adjust for each epoch increases by n² each time a hidden layer is added, where n is the number of neurons in the hidden layer. This is only the case for an NN
without feedback synapses. As a result, we can expect a
significant increase in training time as hidden layers are added
to the NN. Fig. 8 outlines the results of training time versus
number of training years for LM. For the other two
algorithms, similar test results can be obtained.
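The n² growth above can be checked by counting the weights of a fully connected feedforward network directly (bias terms are ignored, matching the synapse-weight count in the text; the input and output sizes are illustrative):

```python
def weight_count(layer_sizes):
    """Number of synapse weights in a fully connected feedforward NN
    (no feedback synapses, biases not counted)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

n = 20                    # neurons per hidden layer (the paper's default)
inputs, outputs = 5, 1    # illustrative input/output layer sizes
for hidden_layers in range(1, 5):
    sizes = [inputs] + [n] * hidden_layers + [outputs]
    print(hidden_layers, weight_count(sizes))
# → 1 120
#   2 520
#   3 920
#   4 1320
```

Each added layer contributes one more 20x20 weight matrix, i.e. n² = 400 extra weights per layer, consistent with the sharp growth in training time reported for the deeper networks.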