
Neurocomputing 173 (2016) 958–970

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Electric load forecasting by the SVR model with differential empirical mode decomposition and auto regression
Guo-Feng Fan a, Li-Ling Peng a, Wei-Chiang Hong b,*, Fan Sun a
a College of Mathematics & Information Science, Ping Ding Shan University, Ping Ding Shan 467000, Henan, China
b Department of Information Management, Oriental Institute of Technology, 58 Sec. 2, Sichuan Rd., Panchiao, New Taipei 220, Taiwan

Article info

Article history:
Received 26 December 2014
Received in revised form 5 June 2015
Accepted 20 August 2015
Bijaya Ketan Panigrahi
Available online 1 September 2015

Keywords:
Electric load forecasting
Support vector regression
Differential empirical mode decomposition
Auto regression

Abstract

Electric load forecasting is an important issue for power utilities, associated with the management of daily operations such as energy transfer scheduling, unit commitment, and load dispatch. Inspired by the strong non-linear learning capability of support vector regression (SVR), this paper presents an SVR model hybridized with the differential empirical mode decomposition (DEMD) method and auto regression (AR) for electric load forecasting. The differential EMD method is used to decompose the electric load into several detail parts associated with high frequencies (intrinsic mode function (IMF)) and an approximate part associated with low frequencies. The electric load data from the New South Wales (NSW, Australia) market and the New York Independent System Operator (NYISO, USA) are employed for comparing the forecasting performances of different alternative models. The results illustrate the validity of the idea that the proposed model can simultaneously provide forecasting with good accuracy and interpretability.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Electrical energy can hardly be stored; therefore, electric load forecasting plays a vital role in the daily operational management of power utilities, such as energy transfer scheduling, unit commitment, and load dispatch. With the emergence of load management strategies, it is highly desirable to develop accurate, fast, simple, robust and interpretable load forecasting models so that electric utilities can achieve higher reliability and management efficiency [1].

In the past decades, researchers have proposed many methodologies to improve load forecasting accuracy. For example, Ardakani et al. [2] proposed linear regression models for electricity consumption forecasting; Arisoy et al. [3] applied a Grey prediction model for electricity demand in Turkey; Afshar and Bigdeli [4] proposed an improved singular spectral analysis method for short-term load forecasting (STLF) for the Iranian electricity market; and Kumar and Jain [5] applied three time series models (the Grey–Markov model, the Grey model with rolling mechanism, and singular spectrum analysis) to forecast the consumption of conventional energy in India. By employing artificial neural networks, Refs. [6–9] proposed several useful short-term load forecasting models. By hybridizing popular methods with evolutionary algorithms, the authors of [10–13] demonstrated further performance improvements for energy forecasting. Though these methods can yield significant improvements in forecasting accuracy, they usually lack interpretability. Recently, expert systems, mainly developed by means of linguistic fuzzy rule-based systems, have allowed system modeling with good interpretability [14]. However, these models depend strongly on the expert and often cannot achieve satisfactory forecasting accuracy. Therefore, hybrid models, which are based on existing methods such as expert systems and other techniques, have been proposed to obtain both high accuracy and interpretability.

Based on its advantages in statistical learning capacity for handling high dimensional data, the support vector regression (SVR) model, which is especially suitable for small sample size learning, has become a popular algorithm for many forecasting problems [15–17]. However, the worst shortcoming of an SVR method is that it is easily trapped in a local optimum during the nonlinear optimization process, particularly while its three parameters are being determined. In addition, its robustness also requires some improvement. These issues are still being addressed in SVR forecasting research [18]. On the other hand, in terms of finding the fluctuation tendency of a time series, the wavelet transform is able to construct a good time resolution in the high frequency region of a time series (signal). However, a shortcoming of the wavelet transform is that the computation is somewhat time consuming and, in particular, it cannot achieve fine resolution in both the time domain and the frequency domain simultaneously when analyzing large data sets [19,20].

* Corresponding author.
E-mail address: samuelsonhong@gmail.com (W.-C. Hong).

http://dx.doi.org/10.1016/j.neucom.2015.08.051
0925-2312/© 2015 Elsevier B.V. All rights reserved.
G.-F. Fan et al. / Neurocomputing 173 (2016) 958–970 959

The empirical mode decomposition (EMD) with auto regression (AR), a fast, easy, and reliable unsupervised clustering algorithm, has been successfully applied to many fields, such as communication, economy, and engineering [21–23], with good results. Meanwhile, the EMD method can effectively extract the components of the basic mode from non-linear or non-stationary time series [21,23–26]. By employing EMD, the original complex (multi-scale) time series can be locally separated into a sum of a low frequency part (residual) and a high frequency part (IMF); i.e., the time series can be transformed into a series with more apparent components by reducing noise [26]. However, the sifting process in the EMD modeling phase stops when the residual becomes either over-distorted or a monotonic function from which no further IMF can be extracted [27,28]. Therefore, Bhusana and Chris [23] proposed the differential empirical mode decomposition (DEMD) to overcome the fluctuation problem that the original EMD method handles poorly. In their model, a derivative signal is obtained by differentiating the original signal several times, which eliminates the fluctuating gradient so that the signal better meets the requirements of EMD. The new signal is then used by EMD to integrate and obtain each order of intrinsic mode function (IMF) and the residual of the original signal. The DEMD method is used to decompose the electric load into several detail parts associated with high frequencies (IMFs) and an approximate part associated with low frequencies. It can effectively reduce the interactions among many singular values and improve the forecasting performance of a single kernel function. Thus, it is useful to employ suitable kernel functions for forecasting the medium-and-long-term tendencies of the time series.

In this paper, we present a new hybrid model with clear human-understandable knowledge on training data to achieve satisfactory forecasting accuracy. The principal idea is to hybridize DEMD with SVR and AR, namely the DEMD–SVR–AR model, to obtain better forecasting performance. The rationale of our forecasting model is as follows: (1) the raw data can be divided into two parts by DEMD, one being the high frequency item and the other the residuals; (2) the high frequency item contains less redundant information and trend information than the raw data, because this information goes to the residuals; the SVR model is therefore employed to forecast the high frequency item, and its accuracy is higher than that of the original SVR model, particularly around peak and valley values; (3) the residuals are monotonic and stationary, so the AR model is appropriate for forecasting the residuals; (4) the final forecasting results are obtained from the high frequency item and the residuals. The proposed DEMD–SVR–AR model has the capability of smoothing and reducing the noise (inherited from DEMD), the capability of filtering the dataset and improving forecasting performance (inherited from SVR), and the capability of effectively forecasting future tendencies (inherited from AR). The forecasting outputs of the hybrid method are described in the following section.

To show the applicability, generality and superiority of the proposed model, firstly, half-hourly electric load data (48 data points per day) from New South Wales (NSW, Australia) with two different sample sizes are employed to compare the forecasting performances of the proposed model and four alternative models from the literature, namely the PSO–BP model (BP neural network trained by a particle swarm optimization algorithm), the SVR model, the PSO–SVR model (SVR parameters determined by the PSO algorithm), and the AFCM model (an adaptive fuzzy combination model based on a self-organizing map and support vector regression). Secondly, hourly electric load data (24 data points per day) from the New York Independent System Operator (NYISO, USA), also with two different sample sizes, are used to further compare the forecasting performances of the proposed model with three other alternative models from the literature, such as the ARIMA model, the BPNN model (artificial neural network trained by the back-propagation algorithm), and the GA–ANN model (artificial neural network trained by a genetic algorithm). These experimental results indicate that the proposed DEMD–SVR–AR model has the following advantages: (1) it simultaneously achieves higher accuracy and interpretability; (2) it can tolerate more redundant information than the original SVR model and thus has better generalization ability.

The rest of this paper is organized as follows: in Section 2, the DEMD–SVR–AR forecasting model is introduced and the main steps of the model are given. In Section 3, the data description and the research design are outlined. The numerical results and comparisons are presented and discussed in Section 4. A brief conclusion of this paper and the future research are provided in Section 5.

2. Support vector regression with differential empirical mode decomposition

2.1. Differential empirical mode decomposition (DEMD)

The EMD method is based on the simple assumption that any signal consists of different simple intrinsic modes of oscillation. Each linear or non-linear mode has the same number of extrema and zero-crossings. There is only one extremum between successive zero-crossings. Each mode should be independent of the others. Since the original work on EMD, several studies have been presented to improve EMD. One improvement is the differential EMD [23]. In this section, the differential EMD is described as follows. In this way, each signal can be decomposed into a number of intrinsic mode functions (IMFs), each of which should satisfy the following two definitions [25],

a. In the whole data set, the number of extrema and the number of zero-crossings should either be equal or differ from each other by at most one.
b. At any point, the mean value of the envelope defined by the local maxima and the envelope defined by the local minima is zero.

An IMF represents a simple oscillatory mode compared with the simple harmonic function. With this definition, any signal $x(t)$ can be decomposed by the following steps, and the flowchart is shown in Fig. 1.
1. Identify all local extrema, and then connect all the local maxima by a cubic spline line as the upper envelope.
2. Repeat the procedure for the local minima to produce the lower envelope. The upper and lower envelopes should cover all the data between them.
3. The mean of the upper and lower envelope values is designated as $m_1$, and the difference between the signal $x(t)$ and $m_1$ is the first component, $h_1$, as shown in Eq. (1),

    $h_1 = x(t) - m_1$    (1)

Generally speaking, $h_1$ will not necessarily meet the requirements of an IMF, because $h_1$ is not a standard IMF. The sifting therefore needs to be repeated k times until the mean envelope tends to zero. Then, the first intrinsic mode function $c_1$ is introduced, which stands for the highest-frequency component of the original data sequence. At this point, the data can be represented as Eq. (2),

    $h_{1k} = h_{1(k-1)} - m_{1k}$    (2)

where $h_{1k}$ is the datum after k siftings and $h_{1(k-1)}$ stands for the data after k − 1 siftings. Standard deviation (SD) is used to determine whether the results of each filter component meet the IMF or not. SD is defined as Eq. (3),

    $\mathrm{SD} = \sum_{k=1}^{T} \frac{\left| h_{1(k-1)}(t) - h_{1k}(t) \right|^{2}}{h_{1(k-1)}^{2}(t)}$    (3)

where T is the length of the data.
The value of the standard deviation SD is limited to the range of 0.2 to 0.3; that is, when 0.2 < SD < 0.3, the decomposition process can be finished. The reasoning behind this criterion is that it should not only ensure that $h_k(t)$ meets the IMF requirements, but also control the number of decomposition iterations. In this way, the IMF components retain the amplitude modulation information of the original signal.
4. When $h_{1k}$ has met the SD requirement, then, with $c_1 = h_{1k}$, the first IMF component $c_1$ of the signal $x(t)$ is obtained directly, and a new series $r_1$ is obtained after removing the high frequency component. This relationship is expressed as Eq. (4),

    $r_1 = x_1(t) - c_1$    (4)

The new sequence is treated as the original data, and steps 1 to 3 are repeated to obtain the second intrinsic mode function $c_2$.
5. Repeat steps 1 to 4 until $r_n$ cannot be decomposed into a further IMF. The sequence $r_n$ is called the remainder of the original data $x(t)$. It is a monotonic sequence that indicates the overall trend (or mean) of the raw data $x_1(t)$, and it is usually referred to as the trend item. It has clear physical significance. The process is expressed as Eqs. (5) and (6):

    $r_1 = x_1(t) - c_1, \quad r_2 = r_1 - c_2, \quad \ldots, \quad r_n = r_{n-1} - c_n$    (5)

    $x_1(t) = \sum_{i=1}^{n} c_i + r_n$    (6)

The sifting process stops when the residual $r_n(t)$ becomes either over-distorted or a monotonic function from which no further IMF can be extracted. The power density of white Gaussian noise has a normal distribution, so eliminating the IMF that represents the normal distribution is assumed to cancel the white Gaussian noise. Next, the last IMF, the lagged IMF before the monotonic function emerges, is the most suitable because its local curves have a normal distribution. Subsequently, we subtract the original signals using the last IMF, denoted as $c_0(t)$ in Eq. (1). Finally, the differential EMD is given by Eq. (7),

    $\mathrm{DEMD} = x_n(t) - c_0(t)$    (7)

where $x_n(t)$ refers to dependent variables.
The original data can be expressed as the IMF components and the remainder.

Fig. 1. Differential EMD algorithm flowchart. (Flowchart: Start → input signal x(t) → r = x(t), n = 1 → determination of local maxima and minima of x(t) → fitting the envelopes E1 and E2 → m = (E1 + E2)/2 → h = x(t) − m → if h meets the IMF conditions: n = n + 1, c(n) = h, r = r − c(n); otherwise x(t) = h and repeat → if r is a monotonic function: End; otherwise x(t) = r and repeat.)

2.2. Support vector regression

The notions of support vector machines (SVMs) for the case of regression are introduced briefly. Given a data set of N elements $\{(X_i, y_i),\ i = 1, 2, \ldots, N\}$, where $X_i$ is the i-th element in n-dimensional space, i.e., $X_i = [x_{1i}, \ldots, x_{ni}] \in R^{n}$, and $y_i \in R$ is the actual value corresponding to $X_i$. A non-linear mapping function, $\varphi(\cdot): R^{n} \to R^{n_h}$, is defined to map the training (input) data $X_i$ into the so-called high dimensional feature space $R^{n_h}$ (which may have infinite dimensions). Then, in the high dimensional feature space, there theoretically exists a linear function, $f$, that formulates the non-linear relationship between input data and output data. Such a linear function, namely the SVR function, is shown as Eq. (8),

    $f(X) = W^{T} \varphi(X) + b$    (8)

where $f(X)$ denotes the forecasting values; the coefficients $W$ ($W \in R^{n_h}$) and $b$ ($b \in R$) are adjustable. As mentioned above, the SVM method aims at minimizing the empirical risk, shown as Eq. (9),

    $R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} \Theta_{\varepsilon}\left(y_i, W^{T} \varphi(X_i) + b\right)$    (9)

where $\Theta_{\varepsilon}(y, f(x))$ is the ε-insensitive loss function, defined as Eq. (10),

    $\Theta_{\varepsilon}(Y, f(X)) = \begin{cases} \left| f(X) - Y \right| - \varepsilon, & \text{if } \left| f(X) - Y \right| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases}$    (10)

In addition, $\Theta_{\varepsilon}(Y, f(X))$ is employed to find an optimum hyperplane in the high dimensional feature space (Fig. 1b) to maximize the distance separating the training data into two subsets. Thus, the SVR focuses on finding the optimum hyperplane and minimizing the training error between the training data and the ε-insensitive loss function. Then, the SVR minimizes the overall errors, shown as Eq. (11),

    $\min_{W, b, \xi^{*}, \xi} R_{\varepsilon}(W, \xi^{*}, \xi) = \frac{1}{2} W^{T} W + C \sum_{i=1}^{N} \left( \xi_i^{*} + \xi_i \right)$    (11)

with the constraints:

    $Y_i - W^{T} \varphi(X_i) - b \le \varepsilon + \xi_i^{*}, \quad i = 1, 2, \ldots, N$
    $-Y_i + W^{T} \varphi(X_i) + b \le \varepsilon + \xi_i, \quad i = 1, 2, \ldots, N$
    $\xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, N$
    $\xi_i \ge 0, \quad i = 1, 2, \ldots, N$    (12)

The first term of Eq. (11), employing the concept of maximizing the distance of two separated training data, is used to regularize

weight sizes to penalize large weights, and to maintain regression function flatness. The second term penalizes training errors of $f(x)$ and $y$ by using the ε-insensitive loss function. $C$ is the parameter that trades off these two terms. Training errors above $\varepsilon$ are denoted as $\xi_i^{*}$, whereas training errors below $-\varepsilon$ are denoted as $\xi_i$.
After the quadratic optimization problem with inequality constraints is solved, the parameter vector $W$ in Eq. (8) is obtained as Eq. (13),

    $W = \sum_{i=1}^{N} \left( \beta_i^{*} - \beta_i \right) \varphi(X_i)$    (13)

where $\beta_i^{*}$ and $\beta_i$ are the Lagrangian multipliers, obtained by solving a quadratic program. Finally, the SVR regression function is obtained as Eq. (14) in the dual space:

    $f(X) = \sum_{i=1}^{N} \left( \beta_i^{*} - \beta_i \right) K(X_i, X) + b$    (14)

where $K(X_i, X)$ is called the kernel function, and the value of the kernel equals the inner product of two vectors, $X_i$ and $X_j$, in the feature space, $\varphi(X_i)$ and $\varphi(X_j)$, respectively; that is, $K(X_i, X_j) = \varphi(X_i) \cdot \varphi(X_j)$. Any function that meets Mercer's condition [29] can be used as the kernel function.
There are several types of kernel function. The most used kernel functions are the Gaussian radial basis function (RBF) with a width of $\sigma$, $K(X_i, X_j) = \exp\left(-0.5 \| X_i - X_j \|^{2} / \sigma^{2}\right)$, and the polynomial kernel with an order of $d$ and constants $a_1$ and $a_2$, $K(X_i, X_j) = (a_1 X_i X_j + a_2)^{d}$. The Gaussian RBF kernel is not only easy to implement, but also capable of non-linearly mapping the training data into an infinite dimensional space; thus, it is suitable for non-linear relationship problems. Therefore, the Gaussian RBF kernel function is specified in this study.

2.3. AR Model

Eq. (15) expresses a p-order autoregressive model, referred to as the AR(p) model [30]. A stationary time series $\{X_t\}$ that meets the AR(p) model is called an AR(p) sequence. The vector $a = (a_1, a_2, \ldots, a_p)^{T}$ contains the regression coefficients of the AR(p) model:

    $X_t = \sum_{j=1}^{p} a_j X_{t-j} + \varepsilon_t, \quad t \in Z$    (15)

2.4. The full procedure of DEMD–SVR–AR model

The full procedure of the proposed DEMD–SVR–AR model is summarized as follows and is illustrated in Fig. 2.

Step 1. Decompose the input data by DEMD: each electric load data set (input data) can be decomposed into a number of intrinsic mode functions (IMFs), i.e., into two parts, one being the high frequency item and the other the residuals. Please refer to Section 2.1 and Fig. 1 for more details of the DEMD process.
Step 2. SVR modeling: the SVR model is employed to forecast the high frequency item; to look for the most suitable parameters, different sizes of fed-in/fed-out subsets are set in this stage. Please refer to Section 2.2 for more details of the SVR process.
Step 3. AR modeling: the residuals item is forecasted by the AR model because it is monotonic and stationary. Please refer to Section 2.3 for more details of the AR modeling process. Similarly, when the new parameters yield a smaller MAPE value or the maximum iteration is reached, the new three parameters and their corresponding objective value are the solution in this stage.
Step 4. DEMD–SVR–AR forecasting: after receiving the forecasting values of the high frequency item and the residuals item from the SVR model and the AR model, respectively, the final forecasting results are obtained from the high frequency item and the residuals.

Fig. 2. The full flowchart of DEMD–SVR–AR model. (Flowchart: Input (data) → DEMD → IMF1, IMF2, IMF3, ..., IMFk and residuals; the IMFs are forecast by SVR and the residuals by AR → Prediction.)

3. Numerical examples

To show the applicability, superiority and generality of the proposed model, we employ two different electric markets, the New South Wales (NSW) market in Australia (Case 1) and the New York Independent System Operator (NYISO) in the USA (Case 2). In addition, for each case, we use two sample sizes, a small sample and a large sample.

3.1. The experimental results of Case 1

For Case 1, firstly, the proposed model is trained on electric load data from 2 to 7 May 2007 (i.e., the training data set), and the testing electric load data is from 8 May 2007. The employed electric load data is on a half-hourly basis (i.e., 48 data points per day). The data set covers only 7 days; to distinguish it from the other example with more data, this example is called the small sample size data, and it is illustrated in Fig. 3(a).
Secondly, training sets that are too large should be avoided, to prevent overtraining during the learning process of the SVR model. Therefore, the second experiment with 23 days (1104 data points from 2 to 24 May 2007) is modeled by using part of all the training samples as the training set, i.e., from 2 to 17 May 2007, and the testing electric load data is from 18 to 24 May 2007. This example is called the large sample size data, and it is illustrated in Fig. 3(b).

3.1.1. Results after DEMD in Case 1

After being decomposed by DEMD, the data can be divided into eight groups, which are shown in Fig. 4(a)–(h); the last group (Fig. 4(h)) is a trend term (residuals). The so-called high frequency item is obtained by adding the preceding seven groups. From Fig. 3(a) and (b), the trend of the high frequency item is the same as that of the original data, and the structure is more regular, i.e., it is more stable. The high frequency item (data-I) and the residuals (data-II) are then well fitted by the SVR and AR regressions, respectively, as described below.

3.1.2. Forecasting using SVR for data-I (the high frequency item in Case 1)

As shown in Fig. 3, the high frequency data and raw data have the same characteristics, such as nonlinearity and chaos. The SVR model is well suited to solving such forecasting problems.
Firstly, for both small sample and large sample data, the high-frequency item is simultaneously employed for SVR modeling, and the better performances of the training and testing (forecasting) sets are shown in Fig. 5(a) and (b), respectively. The correlation coefficients of training effects are 0.9935 and 0.9927, respectively, of the

[Figure: panels (a) and (b) plot the original data and DEMD data-I; x-axis: Time (half hour); y-axis: Electric load (MW).]

Fig. 3. (a) Half-hourly electric load in NSW from 2 to 8 May 2007; (b) half-hourly electric load in NSW from 2 to 24 May 2007.

[Figure: panels plotting Electric load (MW) against Time (half hour).]

Fig. 4. For ease of presentation, the graphs (a)–(h) show the plots at different IMFs for the small sample size in Case 1.

forecast effects are 0.9976 and 0.9984, accordingly. This implies that the decomposition is helpful to improve the forecasting accuracy. The parameters of a SVR model for data-I are shown in Table 1, in which the forecasting error for the high-frequency decomposed by the modified DEMD and SVR has been reduced.

3.1.3. Forecasting using AR for data-II (the residuals in Case 1)

As shown in Fig. 4(h), the residuals are linear locally and stable, so the AR technique is very suitable to forecast.
Then, according to the geometric decay of the correlation coefficient and partial correlation coefficients fourth-order truncation

Fig. 5. Comparison of the data-I and the forecasted electric load of training and testing by the SVR model for the small sample and large sample data in Case 1: (a) one-day ahead prediction of 8 May 2007 performed by the model; (b) one-week ahead prediction from 18 to 24 May 2007 performed by the model.

Table 1
The SVR's parameters for data-I in Case 1.

Sample size              m    σ     C    ε       Testing MAPE
The small sample size    20   0.1   100  0.0047  9.72
The large sample size    20   0.24  128  0.0021  4.9

for data-II (the residuals), it can be denoted as an AR(4) model. The parameters of the AR model for data-II are also shown in Table 2.
As shown in Fig. 6(a) and (b), the residuals, for both small sample and large sample data, are almost in a straight line. In addition, it is not difficult to find a straight line in Fig. 4(h), which is also an advantage of the DEMD technique. The good forecasting results are shown in Table 2, and the errors reach the level of 10^−5 for both the small and the large amount of data. This demonstrates the superiority of the AR model. In Table 2, the forecasting error of the residuals obtained with the improved decomposition, DEMD, is significantly reduced.

3.2. The experimental results of Case 2

For Case 2, firstly, the proposed model is trained on electric load data from 1 January 2015 to 12 January 2015 (i.e., the training data set), and the testing electric load data is from 13 to 14 January 2015. The employed electric load data is on an hourly basis (i.e., 24 data points per day). The data set covers only 14 days; to distinguish it from the other example with more data, this example is called the small sample size data, and it is illustrated in Fig. 7(a).
Secondly, the second experiment with 46 days (1104 data points from 1 January to 15 February 2015) is modeled by using part of all the training samples as the training set, i.e., from 1 January to 1 February 2015, and the testing electric load data is from 2 to 15 February 2015. This example is called the large sample size data, and it is illustrated in Fig. 7(b).

3.2.1. Results after DEMD (in Case 2)

After being decomposed by DEMD, similarly, the data can also be divided into eight groups, which are shown in Fig. 8(a) to (h); the last group (Fig. 8(h)) is a trend term (residuals). The high frequency item is also obtained by adding the preceding seven groups. From Fig. 7(a) and (b), the trend of the high frequency item is the same as that of the original data, and the structure is more regular, i.e., it is more stable. The high frequency item (data-I) and the residuals (data-II) are then well fitted by the SVR and AR regressions, respectively, as described below.

3.2.2. Forecasting using SVR for data-I (the high frequency item in Case 2)

As shown in Fig. 7, the high frequency data and raw data have the same characteristics, such as nonlinearity and chaos. The SVR model is well suited to solving such forecasting problems.
Firstly, for both small sample and large sample data, the high-frequency item is simultaneously employed for SVR modeling, and the better performances of the training and testing (forecasting) sets are shown in Fig. 9(a) and (b), respectively. The correlation coefficients of the training effects are 0.9901 and 0.9915, respectively, and those of the forecast effects are 0.9936 and 0.9957, accordingly. This implies that the decomposition helps to improve the forecasting accuracy. The parameters of the SVR model for data-I are shown in Table 3, in which the forecasting error for the high-frequency item decomposed by the modified DEMD and forecast by SVR has been reduced.

3.2.3. Forecasting using AR for data-II (the residuals in Case 2)

As shown in Fig. 8(h), the residuals are locally linear and stable, so the AR technique is very suitable for forecasting them.
Then, according to the geometric decay of the correlation coefficient and the fourth-order truncation of the partial correlation coefficients for data-II (the residuals), it can be denoted as an AR(4) model. The parameters of the AR model for data-II are also shown in Table 4.
As shown in Fig. 10(a) and (b), the residuals, for both small sample and large sample data, are almost in a straight line. In addition, it is not difficult to find a straight line in Fig. 8(h), which is also an advantage of the DEMD technique. The good forecasting results are shown in Table 4, and the errors reach the level of 10^−5 for both the small and the large amount of data. This demonstrates the superiority of the AR model. In Table 4, the forecasting error of the residuals obtained with the improved decomposition, DEMD, is significantly reduced.

4. Results and analysis

This section focuses on the efficiency of the proposed model with respect to computational accuracy and interpretability. To consider the small sample size modeling ability of the SVR model and to conduct fair comparisons, we perform two real experimental cases, as mentioned in Section 3, both of which use a relatively small sample size for the first experiment. The second experiment, with 1104 data points, focuses on illustrating the relationship between sample size and accuracy.
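The accuracy figures reported for the two cases (for example, the testing MAPE in Table 1 and the MAE in Table 2) rely on the standard RMSE, MAE, and MAPE definitions given in Eqs. (16)–(18). As a minimal sketch of those three metrics in Python, with hypothetical function names and made-up sample values that are not taken from the paper:

```python
import math


def rmse(predicted, actual):
    """Root mean square error over paired predictions and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def mae(predicted, actual):
    """Mean absolute error."""
    n = len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / n


def mape(predicted, actual):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    n = len(actual)
    return 100.0 * sum(abs((p - a) / a) for p, a in zip(predicted, actual)) / n


# Illustrative values only (assumed for this example, not data from the paper).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(rmse(predicted, actual), mae(predicted, actual), mape(predicted, actual))
```

MAPE is scale-free, which makes it convenient for comparing the NSW and NYISO cases, but it is undefined whenever an actual value is zero; that caveat may matter for a zero-centered series such as the high frequency item.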

Table 2
Summary of results of the AR forecasting model for data-II in Case 1.

Residuals                MAE              Equation
The small sample size    9.7725 × 10^−5   $x_n = 5523.894 + 1.01 x_{n-1} + 0.372176 x_{n-2} + 0.002791 x_{n-3} - 0.791445 x_{n-4}$
The large sample size    7.5921 × 10^−5   $x_n = 5538.269 + 1.0022 x_{n-1} + 0.369828 x_{n-2} + 0.001914 x_{n-3} - 0.753692 x_{n-4}$
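Equations of the form listed in Table 2 (an intercept plus four lagged terms) can be produced by fitting an AR(p) model to the residual series. The paper does not state the estimation method, so the sketch below assumes ordinary least squares via the normal equations; the function names and the synthetic series are hypothetical and are used only for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar(series, p):
    """Least-squares fit of x_t = c + a_1 x_{t-1} + ... + a_p x_{t-p}.

    Returns [c, a_1, ..., a_p]. Assumption: OLS with an intercept.
    """
    rows = [[1.0] + [series[t - j] for j in range(1, p + 1)]
            for t in range(p, len(series))]
    targets = series[p:]
    k = p + 1
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(gram, rhs)


def forecast_next(series, coeffs):
    """One-step-ahead forecast from the last p observations."""
    p = len(coeffs) - 1
    return coeffs[0] + sum(coeffs[j] * series[-j] for j in range(1, p + 1))


# Synthetic AR(2) series (illustrative only, not the paper's data).
series = [0.0, 1.0]
for _ in range(40):
    series.append(1.0 + 1.2 * series[-1] - 0.9 * series[-2])

coeffs = fit_ar(series, 2)
print(coeffs, forecast_next(series, coeffs))
```

For the AR(4) models of Table 2, the same call would be `fit_ar(residuals, 4)`, where `residuals` stands for the DEMD trend series. A nearly straight-line residual makes the lagged columns almost collinear, so a numerically more robust solver than plain normal equations may be preferable in practice.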

[Figure: panels (a) and (b) plot actual values and predicted values; x-axis: Time (half hour); y-axis: Electric load-Trend (MW).]

Fig. 6. Comparison of the data-II and the forecasted electric load by the AR model for the two experiments in Case 1: (a) one-day ahead prediction of 8 May 2007 performed by the model; (b) one-week ahead prediction from 18 to 24 May 2007 performed by the model.

[Fig. 7 plots, panels (a) and (b): Electric load (MW) versus Time (hour); legend: original, data-I]

Fig. 7. (a) Hourly electric load in NYISO from 1 to 14 January 2015; (b) hourly electric load in NYISO from 1 to 15 February 2015.

4.1. Forecasting evaluation methods

For the purpose of evaluating the forecasting capability, we examine the forecasting accuracy by calculating three statistical metrics: the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE). The definitions of RMSE, MAE and MAPE are expressed as Eqs. (16)–(18):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (P_i - A_i)^2}{n}} \qquad (16)$$

$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} \left| P_i - A_i \right|}{n} \qquad (17)$$

$$\mathrm{MAPE} = \frac{\sum_{i=1}^{n} \left| \frac{P_i - A_i}{A_i} \right|}{n} \times 100 \qquad (18)$$

where P_i and A_i are the i-th predicted and actual values, respectively, and n is the total number of predictions.

4.2. Parameter settings of the employed forecasting models

As mentioned by Taylor [31], and to keep the same comparison conditions as Che et al. [32], some parameters of the forecasting models employed in Case 1 are set as follows. For the PSO–BP model, as in [32], 90% of all collected samples are used as the training set and the rest as the evaluation set. The parameters of the PSO–BP model are: (i) for the BP neural network, the input layer dimension (indim) is 2, the hidden layer dimension (hiddennum) is 3, and the output layer dimension (outdim) is 1; (ii) for the PSO, as in [32], the maximum iteration number (itmax) is 300, the number of particles N is 40, the length of a particle D is 3, and the weights c1 and c2 are both set to 2. Because the PSO–SVR model embeds the construction and prediction algorithm of SVR in the fitness value iteration step of PSO, training the PSO–SVR on the full training dataset takes a long time. For this reason, we draw a small part of all training samples as the training set and use the rest as the evaluation set. The parameters of PSO used in this case are

[Fig. 8 plots: each panel shows Electric load (MW) versus Time (hour)]

Fig. 8. For ease of presentation, the graphs (a)–(h) show the plots of the different IMFs for the small sample size in Case 2.

as follows: for the small sample size, the maximum iteration number (itmax) is 50, the number of particles N is 20, the length of a particle D is 3, and the weights c1 and c2 are set to 2; for the large sample size, the maximum iteration number (itmax) is 20, the number of particles N is 5, the length of a particle D is 3, and the weights c1 and c2 are set to 2.

Regarding Case 2, to further verify the applicability, generality and superiority of the proposed model, the newest electric load data from NYISO are employed for modeling, and three alternative forecasting models from the literature (the ARIMA model, the BPNN model, and the GA–ANN model) are selected for comparison with the proposed model. Some parameters of these models are set as follows. For the BPNN model, the node numbers of its structure differ between the small and large sample sizes: for the former, the input layer dimension is 240, the hidden layer dimension is 12, and the output layer dimension is 48; for the latter, they are 480, 12, and 336, respectively. The parameters of the GA–ANN model used in this case are: the number of generations is 5, the population size is 100, the number of bits is 50, the mutation rate is 0.8, and the crossover rate is 0.05.

4.3. Empirical results and analysis

For the first experiment in Case 1, the forecasting results (the electric load on 8 May 2007) of the original SVR model, the PSO–SVR model and the proposed DEMD–SVR–AR model are shown in Fig. 11(a). Notice that the forecasting curve of the proposed DEMD–SVR–AR model (red solid dot and red curve) fits better than those of the alternative models. For Case 2, the forecasting results (the electric load from 13 to 14 January 2015) of the ARIMA model, the BPNN model, the GA–ANN model, and the proposed DEMD–SVR–AR model are shown in Fig. 12(a). Similarly, the forecasting curve of the proposed DEMD–SVR–AR model (red solid triangle and red curve) also fits better than the others.

The second experiments in Cases 1 and 2 show the one-week-ahead forecasting for the large sample size data. The peak load values of the testing set are larger than those of the training set, as shown in Figs. 5(b) and 9(b), respectively. The detailed forecasting results of this experiment are shown in Figs. 11(b) and 12(b). They indicate that the results obtained from the DEMD–SVR–AR model fit the peak load values exceptionally well. In other words, the DEMD–SVR–AR model has better generalization ability than the three comparison models in both cases. In Case 1, for example, the local enlargements (peaks) of Fig. 11(a) and (b) are shown in Fig. 13(a) and (b), respectively. There it is clearer that the forecasting curve of the proposed DEMD–SVR–AR model (red solid dot and red curve) fits more precisely than those of the alternative models, i.e., it tracks the changing trend of the data, including its fluctuations.
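The forecasts compared above are scored with the metrics of Eqs. (16)–(18). The following sketch computes them with NumPy, assuming the standard definitions (a square root in the RMSE and absolute values in the MAE and MAPE); the function names and the sample numbers are illustrative only.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean square error, Eq. (16)."""
    p, a = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((p - a) ** 2)))

def mae(pred, actual):
    """Mean absolute error, Eq. (17)."""
    p, a = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(p - a)))

def mape(pred, actual):
    """Mean absolute percentage error in percent, Eq. (18).

    Assumes no actual value is zero.
    """
    p, a = np.asarray(pred, dtype=float), np.asarray(actual, dtype=float)
    return float(np.mean(np.abs((p - a) / a)) * 100.0)

# Illustrative load values in MW (not taken from the paper's data).
actual = [8500.0, 9000.0, 9500.0]
pred = [8400.0, 9100.0, 9450.0]
scores = {"RMSE": rmse(pred, actual),
          "MAE": mae(pred, actual),
          "MAPE": mape(pred, actual)}
```

RMSE and MAE are in the units of the load, while MAPE is scale-free, which makes it the most convenient of the three for comparing cases with different load levels.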

[Fig. 9 plots: data-I (MW), original and predicted, versus Time (hour)]

Fig. 9. Comparison of data-I and the electric load forecasted by the SVR model in training and testing for the small sample and large sample data in Case 2: (a) one-day ahead prediction from 13 to 14 January 2015 performed by the model; (b) one-week ahead prediction from 2 to 15 February 2015 performed by the model.

Table 3
The SVR's parameters for data-I in Case 2.

Sample size              m     σ      C     ε        Testing MAPE
The small sample size    24    0.12   113   0.0038   8.19
The large sample size    24    0.21   127   0.0019   5.37

The forecasting results in Cases 1 and 2 are summarized in Tables 5 and 6, respectively. The proposed DEMD–SVR–AR model is compared with four alternative models. It is found that our hybrid model outperforms all other alternatives in terms of all the evaluation criteria. One general observation is that the proposed model tends to fit closer to the actual values, with a smaller forecasting error.

The proposed model shows higher forecasting accuracy in terms of the three statistical metrics. In view of the overall effectiveness and efficiency of the model, we can conclude that the proposed model is quite competitive against the compared models, namely the ARIMA, BPNN, GA–ANN, PSO–BP, SVR, PSO–SVR, and AFCM models. In other words, the hybrid model leads to better accuracy and statistical interpretation.

In particular, as shown in Fig. 13, our method shows higher accuracy and good flexibility at peaks and inflection points, because the small amount of redundant information can be used for statistical learning or regression, and the level of the local optimum increases. Therefore, the forecasting accuracy increases significantly.

Several observations can also be made from the results. Firstly, from the comparisons among these models, we point out that the proposed model outperforms the alternative models. Secondly, the DEMD–SVR–AR model has better generalization ability for different input patterns, as shown in the second experiment. Thirdly, from the comparison between the different sample sizes of the two experiments, we conclude that the hybrid model can tolerate more redundant information and construct the model for the larger sample size data set. Finally, since the proposed model generates results with good accuracy and interpretability, it is robust and effective, as shown in Tables 5 and 6. Overall, the proposed model provides a powerful and easily implemented tool for electric load forecasting.

Furthermore, to verify the significance of the accuracy improvement of the DEMD–SVR–AR model, the forecasting accuracy comparisons in both cases among the original SVR, PSO–SVR, PSO–BP, AFCM, ARIMA, BPNN, GA–ANN and DEMD–SVR–AR models are conducted by a statistical test, namely the Wilcoxon signed-rank test, at the 0.025 and 0.05 significance levels in one-tailed tests. The test results are shown in Tables 7 and 8. Clearly, the proposed DEMD–SVR–AR model is significantly superior (at the 0.05 significance level) to the other alternative models.
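The Wilcoxon signed-rank comparison described above can be sketched in plain Python. The function below computes the statistic W as the smaller of the positive and negative rank sums of the paired differences, dropping zero differences and averaging tied ranks. The function name and the sample errors are hypothetical, and the critical value must still be looked up for the actual number of pairs.

```python
def wilcoxon_signed_rank(errors_a, errors_b):
    """Return (W, W_plus, W_minus) for paired samples.

    W = min(W_plus, W_minus); zero differences are discarded and
    tied absolute differences receive the average of their ranks.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), w_plus, w_minus

# Hypothetical absolute forecasting errors of two models on the same test points.
proposed = [1.0, 2.0, 3.0, 4.0]
baseline = [2.0, 4.0, 6.0, 3.5]
w, w_plus, w_minus = wilcoxon_signed_rank(proposed, baseline)
```

In the convention used in Tables 7 and 8, a difference is marked significant when the computed statistic does not exceed the tabulated critical value W for the chosen significance level.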

Table 4
Summary of results of the AR forecasting model for data-II in Case 2.

Residuals                MAE               Equation
The small sample size    6.7345 × 10^−5    x_n = 10372.441 − 0.998 x_{n−1} + 0.65218 x_{n−2} − 0.3316 x_{n−3} + 0.00072 x_{n−4}
The large sample size    7.8579 × 10^−5    x_n = 11013.26 + 0.9782 x_{n−1} + 0.11 x_{n−2} − 0.4783 x_{n−3} + 0.36437 x_{n−4}

[Fig. 10 plots, panels (a) and (b): Electric load Trend (MW) versus Time (hour); legend: data-II (electric load trend), forecast]

Fig. 10. Comparison of data-II and the electric load forecasted by the AR model for the two experiments in Case 2: (a) one-day ahead prediction from 13 to 14 January 2015 performed by the model; (b) one-week ahead prediction from 2 to 15 February 2015 performed by the model.

[Fig. 11 plots, panels (a) and (b): Electric load (MW) versus Time (half hour); legend: Raw data, Forecasted load by DEMDSVRAR, Forecasted load by SVR, Forecasted load by PSOSVR]

Fig. 11. Comparison of the original data and the electric load forecasted by the DEMD–SVR–AR model, the SVR model and the PSO–SVR model for: (a) the small sample size (one-day ahead prediction of 8 May 2007 performed by the models); (b) the large sample size (one-week ahead prediction from 18 to 24 May 2007 performed by the models). (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

5. Conclusions

The proposed model achieves superiority and significantly outperforms the original SVR model when forecasting based on unbalanced data. In addition, the goal of training the model is not to learn an exact representation of the training set itself, but rather to build a statistical model that generalizes well to new inputs. In practical applications of an SVR model, if the SVR model is overtrained on some sub-classes of overwhelming size, it memorizes the training data and generalizes poorly to other sub-classes of small size. The DEMD term of the proposed DEMD–SVR–AR model has been employed in the present research, details of which have been discussed in the above section.

The interest in applying DEMD forecasting systems arises from the fact that those systems consider both the accuracy and the comprehensibility of the forecast result simultaneously. To this end, a hybrid model has been proposed, and its effectiveness in forecasting electric load data has been compared with that of three other alternative models. In this study, various data characteristics of electric load are identified where the proposed model performs better than

[Fig. 12 plots, panels (a) and (b): Electric load (MW) versus Time (hour); legend: Raw data, ARIMA(4,1,4), BPNN, GANN, DEMDSVRAR]

Fig. 12. Comparison of the original data and the electric load forecasted by the DEMD–SVR–AR model, the ARIMA model, the BPNN model and the GA–ANN model for: (a) the small sample size (one-day ahead prediction from 13 to 14 January 2015 performed by the models); (b) the large sample size (one-week ahead prediction from 2 to 15 February 2015 performed by the models). (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

[Fig. 13 plots, panels (a) and (b): Electric load (MW), enlarged around the peaks]

Fig. 13. The local enlargement (peak) comparison of the DEMD–SVR–AR Model, the SVR model and the PSO–SVR model for (a) the small sample size; (b) the large sample
size. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

Table 5
Summary of results of the forecasting models in Case 1.

Algorithm        MAPE       RMSE       MAE        Running time (s)
For the first experiment (small sample size)
Original SVR     11.6955    145.865    10.9181    180.4
PSO–SVR          11.4189    145.685    10.6739    165.2
PSO–BP           10.9094    142.261    10.1429    159.9
AFCM [24]        9.9524     125.323    9.2588     75.3
EMD–SVR–AR       9.8595     117.159    9.0967     80.7
DEMD–SVR–AR      9.7162     110.159    8.7459     76.8
For the second experiment (large sample size)
Original SVR     12.8765    181.617    12.0528    116.8
PSO–SVR          13.503     271.429    13.0739    192.7
PSO–BP           12.2384    175.235    11.3555    163.1
AFCM [26]        11.1019    158.754    10.4385    160.4
EMD–SVR–AR       5.100      134.201    9.8215     162.0
DEMD–SVR–AR      4.826      130.118    9.5440     163.3

Table 6
Summary of results of the forecasting models in Case 2.

Algorithm        MAPE     RMSE      MAE
For the first experiment (small sample size)
ARIMA(4,1,4)     45.33    320.45    25.72
BP–ANN           31.76    219.43    21.69
GA–ANN           23.89    220.96    23.55
EMD–SVR–AR       14.31    158.11    17.44
DEMD–SVR–AR      8.19     140.16    12.79
For the second experiment (large sample size)
ARIMA(4,1,4)     60.65    733.22    54.05
BP–ANN           42.5     479.48    50.39
GA–ANN           33.12    450.63    44.35
EMD–SVR–AR       11.29    289.21    20.76
DEMD–SVR–AR      5.37     160.58    15.82

the other algorithms in terms of forecasting capability. For example, in Case 2, the electric load from NYISO fluctuates more, and the DEMD algorithm can significantly overcome the fluctuation problem. Based on the obtained experimental results, we conclude that the proposed DEMD–SVR–AR model can generate not only human-understandable rules, but also better forecasting accuracy. Our proposed model also outperforms the alternative models in terms of interpretability, forecasting accuracy and generalization ability, which is especially true for forecasting with unbalanced data and very complex systems. In particular, the analyzed sequence can be decomposed accurately by the improved DEMD, thereby improving

Table 7
Wilcoxon signed-rank test in Case 1.

Compared models                   α = 0.025; W = 4    α = 0.05; W = 6
DEMD–SVR–AR vs. original SVR      8                   3^a
DEMD–SVR–AR vs. PSO–SVR           6                   2^a
DEMD–SVR–AR vs. PSO–BP            6                   2^a
DEMD–SVR–AR vs. AFCM              6                   2^a
DEMD–SVR–AR vs. EMD–SVR–AR        6                   2^a

^a Denotes that the DEMD–SVR–AR model significantly outperforms other alternative models.

Table 8
Wilcoxon signed-rank test in Case 2.

Compared models                   α = 0.025; W = 4    α = 0.05; W = 6
DEMD–SVR–AR vs. ARIMA             6                   2^a
DEMD–SVR–AR vs. BPNN              6                   2^a
DEMD–SVR–AR vs. GA–ANN            6                   2^a
DEMD–SVR–AR vs. EMD–SVR–AR        6                   2^a

^a Denotes that the DEMD–SVR–AR model significantly outperforms other alternative models.

the forecasting accuracy of the SVR model. Meanwhile, even when the interference is decomposed into the residuals, the AR model still achieves good forecasting performance.

Acknowledgments

This work was supported by the Startup Foundation for Doctors (No. PXY-BSQD-2014001), Educational Commission of Henan Province of China (No. 15A530010), The Youth Foundation of Ping Ding Shan University (No. PXY-QNJJ-2014008), and Ministry of Science and Technology, Taiwan (NSC 100-2628-H-161-001-MY4 and MOST 104-2410-H-161-002).

References

[1] J.T. Bernard, D. Bolduc, N.D. Yameogo, S. Rahman, A pseudo-panel data model of household electricity demand, Resour. Energy Econ. 33 (2010) 315–325.
[2] F.J. Ardakani, M.M. Ardehali, Long-term electrical energy consumption forecasting for developing and developed economies based on different optimized models and historical data types, Energy 65 (2014) 452–461.
[3] I. Arisoy, I. Ozturk, Estimating industrial and residential electricity demand in Turkey: A time varying parameter approach, Energy 66 (2014) 959–964.
[4] K. Afshar, N. Bigdeli, Data analysis and short term load forecasting in Iran electricity market using singular spectral analysis (SSA), Energy 36 (2011) 2620–2627.
[5] U. Kumar, V.K. Jain, Time series models (Grey–Markov, Grey Model with rolling mechanism and singular spectrum analysis) to forecast energy consumption in India, Energy 35 (2010) 1709–1716.
[6] P. Li, Y. Li, Q. Xiong, Y. Zhang, Application of a hybrid quantized Elman neural network in short-term load forecasting, Int. J. Electr. Power Energy Syst. 66 (2014) 1–8.
[7] A. Kavousi-Fard, H. Samet, F. Marzbani, A new hybrid modified firefly algorithm and support vector regression model for accurate short term load forecasting, Expert Syst. Appl. 41 (2014) 6047–6056.
[8] F. Rodrigues, The daily and hourly energy consumption and load forecasting using artificial neural network method: a case study using a set of 93 households in Portugal, Energy Procedia 62 (2014) 220–229.
[9] S. Kouhi, F. Keynia, S.N. Ravadanegh, A new short-term load forecast method based on neuro-evolutionary algorithm and chaotic feature selection, Int. J. Electr. Power Energy Syst. 62 (2014) 862–867.
[10] J. Geng, M.-L. Huang, M.-W. Li, W.-C. Hong, Hybridization of seasonal chaotic cloud simulated annealing algorithm in a SVR-based load forecasting model, Neurocomputing 151 (2015) 1362–1373.
[11] W.-C. Hong, Electric load forecasting by seasonal recurrent SVR (support vector regression) with chaotic artificial bee colony algorithm, Energy 36 (2011) 5568–5578.
[12] M. Yesilbudak, S. Sagiroglu, I. Colak, A new approach to very short term wind speed prediction using k-nearest neighbor classification, Energy Convers. Manag. 69 (2013) 77–86.
[13] H. Peng, F. Liu, X. Yang, A hybrid strategy of short term wind power prediction, Renew. Energy 50 (2013) 590–595.
[14] X. An, D. Jiang, C. Liu, M. Zhao, Wind farm power prediction based on wavelet decomposition and chaotic time series, Expert Syst. Appl. 38 (2011) 11280–11285.
[15] Y. Lei, J. Lin, Z. He, M.J. Zuo, A review on empirical mode decomposition in fault diagnosis of rotating machinery, Mech. Syst. Signal Process. 35 (2013) 108–126.
[16] P. Wong, Q. Xu, C. Vong, H. Wong, Rate-dependent hysteresis modeling and control of a piezostage using online support vector machine and relevance vector machine, IEEE Trans. Ind. Electron. 59 (2012) 1988–2001.
[17] Z. Wang, L. Liu, Sensitivity prediction of sensor based on relevance vector machine, J. Inf. Comput. Sci. 9 (2012) 2589–2597.
[18] W.-C. Hong, Intelligent Energy Demand Forecasting, Springer, London, UK, 2013.
[19] Z.K. Peng, P.W. Tse, F.L. Chu, A comparison study of improved Hilbert–Huang transform and wavelet transform: Application to fault diagnosis for rolling bearing, Mech. Syst. Signal Process. 19 (2005) 974–988.
[20] H. Li, B. Xu, Y. Zuo, G. Wu, The comparative study of the signal trend extraction based on Wavelet Transformation and EMD method, Instrum. Anal. Monit. 3 (2013) 28–30.
[21] B. Huang, A. Kunoth, An optimization based empirical mode decomposition scheme, J. Comput. Appl. Math. 240 (2013) 174–183.
[22] G. Fan, S. Qing, Z. Wang, Shi, W.-C. Hong, L. Dai, Study on apparent kinetic prediction model of the smelting reduction based on the time series, Math. Probl. Eng. 2012 (2012) 1–15, http://dx.doi.org/10.1155/2012/720849.
[23] P. Bhusana, T. Chris, Improving prediction of exchange rates using differential EMD, Expert Syst. Appl. 40 (2013) 377–384.
[24] X. An, D. Jiang, M. Zhao, C. Liu, Short-term prediction of wind power using EMD and chaotic theory, Commun. Nonlinear Sci. Numer. Simul. 17 (2012) 1036–1042.
[25] Y. Huang, F.G. Schmitt, Time dependent intrinsic correlation analysis of temperature and dissolved oxygen time series using empirical mode decomposition, J. Mar. Syst. 130 (2014) 90–100.
[26] G. Rilling, P. Flandrin, P. Gonçalvès, On empirical mode decomposition and its algorithms, in: Proceedings of the 6th IEEE/EURASIP Workshop on Nonlinear Signal and Image Processing (NSIP'03), Grado, Italy, 2003.
[27] W. Huang, Z. Shen, N.E. Huang, Y.C. Fung, Nonlinear indicial response of complex nonstationary oscillations as pulmonary hypertension responding to step hypoxia, Proc. Natl. Acad. Sci. USA 96 (1996) 1834–1839.
[28] N.E. Huang, N.O. Attoh-Okine, The Hilbert Transform in Engineering, CRC Press, Taylor & Francis Group, Florida, USA, 2005.
[29] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New York, NY, USA, 1995.
[30] H.L. Koul, X. Zhu, Goodness-of-fit testing of error distribution in nonparametric ARCH(1) models, J. Multivar. Anal. 137 (2015) 141–160.
[31] J.W. Taylor, Short-term load forecasting with exponentially weighted methods, IEEE Trans. Power Syst. 27 (2012) 458–464.
[32] J. Che, J. Wang, G. Wang, An adaptive fuzzy combination model based on self-organizing map and support vector regression for electric load forecasting, Energy 37 (2012) 657–664.

Guo-Feng Fan was born in Shanxi Province, China, on 29 May 1985. He received his doctoral degree from the Engineering Research Center of Metallurgical Energy Conservation and Emission Reduction, Ministry of Education, Kunming University of Science and Technology, Kunming, in 2013. His research interests are ferrous metallurgy, energy forecasting, optimization, and system identification.

Li-Ling Peng was born in Hunan Province, China, on 15 February 1985. She received her master's degree from the Faculty of Science, Kunming University of Science and Technology, Kunming, in 2013. Her research interests are pattern recognition in images and computing. She is especially skilled in the recognition and prediction of meteorological data.

Wei-Chiang Hong received his Ph.D. degree in Management from Da-Yeh University, Taiwan, in 2008. Since September 2006, he has been with the Department of Information Management of the Oriental Institute of Technology, where he is currently a professor. His research interests mainly include applications of forecasting technology and computational intelligence. He is currently the Editor-in-Chief of the International Journal of Applied Evolutionary Computation, and he is also on the Editorial Board of several journals, including Neurocomputing, Applied Soft Computing, The Scientific World Journal, Journal of Applied Mathematics, Energy Sources Part B: Economics, Planning, Policy, etc.

Fan Sun was born in Henan, China, on 13 November 1972. She received her B.S. degree in Mathematics Education from Henan University, China, in 1996. Her research interests are mathematics education and applied mathematics.