Abstract-Electric power consumption is known at fine tempo load at various temporal and spatial scales ([2]). For aggre
ral scales (e.g. hourly) for geographical zones corresponding to gated load forecasting from national down to substation level,
the electric network service divisions (e.g. substations). In several
state-of-art regression methods achieve satisfactory prediction
applications, there is a strong need to estimate the past or to
forecast future consumption at different divisions, for example the
error, with exogeneous variables such as temperature and
town, district or city block levels. The deployment of smart meters calendar variables ([3], [4], [5]). However, there is generally a
only gives a partial answer to this problem because they usually decrease in accuracy of these methods when the aggregation
do not provide exhaustive measures at such temporal scales. We level becomes smaller ([6], [7]), even when historical data of
propose in this paper a generic approach to estimate electric
the target zones are available. Consequently, more complex
consumption on any geographical zones from source zones where
fine-grained consumption data is available, using in addition
strategies are proposed to tackle small-area load forecasting.
socio-demographic information. The approach is evaluated on Reference [8] uses the hierarchy structure in electric grid to
both real and simulated data. uncover patterns in the distribution network.
Index Terms-Spatio-temporal modeling, Spatial load forecast We propose to use socio-demographic information to trans
ing, Clustering, Time series, Electric grid management.
port consumption data from source zones to target zones
with no or little historical consumption data. The idea is
I. INT RODU CTION
to form clusters of both source and target zones based on
Electric consumption at local level - a city, a village or a client information or socio-demographical information from
block - is an important concern for utilities and grid operators. publicly available datasets. We then estimate semi-parametric
A grid planner needs load data at a sufficiently small scale consumption models for each cluster, using consumption data
to perform spatial load forecasting to optimize maintenance only from source zones. These models are then transported
and investments on the grid. With increasingly decentralized to target zones to estimate consumption for past periods, or
generation of renewable energy (for instance, wind or solar to forecast future periods. Clustering approaches ([9], [10],
power), local consumption is becoming more important for [11], [12]) are employed on smart meter time series to form
managing the supply-demand balance at the distribution level. consumer profiles to improve modeling accuracy. Contrary to
Crossed with telecommunication or other utility data at the these methods, we form clusters using socio-demographics or
same level, local electricity consumption can help authorities client information instead of consumption data. This allows
understand local human activities at a fine-grained temporal us to produce estimates for target zones without historical
level. data. The number of public datasets has been exploding
In this paper, we propose a method to estimate past and for a few years. Many governments now provide detailed
future electricity consumption at a small temporal scale for demographic, economic, and sociological statistics collected at
target zones with little or no historical consumption data. Two a local scale ([13], [14], [15]). Our method takes advantage of
cases are of our interest here: 1. for new substations or feeders, these new data to produce new small-area load estimates. The
historical data is insufficient for forecasting; 2. for a block estimation procedure is explained in Section II. The dataset,
or a village, measurements in substations or feeders are not the validation procedure, and an application using real and
spatially fine-grained enough to supply a direct estimate. To simulated consumption is presented in Section III.
solve these two problems, we can use consumption data in
source zones which are similar to the target zones in terms of II. M E T H ODOLOGY
splines, which have identical values for up to the second In the application, we consider a region around Lyon in
derivative at both ends of a year. We estimate the time France. The source zones are geographical zones served by
of-year functions for weekdays and weekends separately. medium-voltage (MY) feeders. Two kinds of target zones are
In our tests, hourly consumption data is used. To account for considered: MV feeder distribution zones with insufficient
intraday variations, we model the 24 hours separately, resulting historical data and IRIS (flots Regroupes pour l'Information
in 24 consumption models. Statistique), an administration zone of approximately 2,000
Computationally, the parametric specification is fitted using habitants, used by the French National Institute of Statistics
classical least square square method (function 1m in the for releasing official socio-economic data. Note that feeder
754
Three years' load Three weeks' load
TABLE I
RULES OF ASSOCIATION OF CLIEN T TYPE AND DISTRIBUTION
TRANSFORMERS. THE + (OR -) SIGN FOR EITHER VARIABLE SIGNIFIES
THAT THE VALUE OF THE VARIABLE IS LARGER (OR SMALLER) THAN THE
MEDIAN OF ALL TRANSFORMERS.
Apartments/CP - Apartments/CP +
Warehouses/CP - rural homes apartment block
Warehouses/CP + industrial client professional
755
IRIS clusters CP clusters IRIS clusters CP clusters
��--------------�
"' o
�
"' o
o
"
0
N
o
�
1:3 0 t5 1:3 tS ci
ci
:l!1 :l!1 :l!1
0 0
w w w 0 W N
"'
"' o o
o I I
I "
o 0
o I
,- T--'---'-'��--�� ,- o
�
Jan Mar May Jul Sep Nov Jan Jan Mar May Jul Sep Nov Jan Jan Mar May Jul Sep Nov Jan Jan Mar May Jul Sep Nov Jan
1 I 1 I
Clusler Clusler
-1 2-3-4-5-6-7 -1 2-3-4-5-6-7
Fig. 2. Cluster model: GAM estimation of annual cycle for the consumption Fig. 3. Cluster model: GAM estimation of annual cycle for the consumption
at 10 a.m. on weekdays. at 10 a.m. on weekend.
� �
sets of variables: 0
1:3 1:3
0
• the age structure and number of offices, shops and factors; :l!1
W
0 :l!1
W
0
0
0
';2
• percentage of contractual power in client types. N
0
I 0
I
This results in two sets of clusters (IRIS clusters and CP 06:00 16:00 02:00 06:00 16:00 02:00
clusters). On each cluster, we use the 30 months of source
Hour Hour
zone consumption data to estimate one consumption model.
1- I
Clusler
Predictions are made for the last three months on source zones
2-3-4-5-6-7
and target zones alike.
Figs. 2 and 3 show the GAM estimation of the annual cycle
for consumption at 10 a.m. for each cluster of both sets of Fig. 4. Cluster model: GAM estimation of the SR effect throughout the day.
with a large decrease in August due to vacations, and a smaller Specification Model type IRIS clus- CP clusters
decrease in May due to the bank holidays. In both sets of ters
clusters, several clusters have a noticeably smaller decrease in Parametric Observed level (source) 18.62 17.90
Parametric Observed level (target) 19. 13 18.54
August (Clusters 3 and 5 in IRIS data clusters, Clusters 1 and Parametric Estimated level (target) 38.20 37.76
7 in CP clusters). This is linked to the fact that these clusters GAM Observed level (source) 18.90 17.59
are generally located in rural area (i. e. they have a lower than GAM Observed level (target) 19.38 18.28
GAM Estimated level (target) 38.41 37.53
average percentage of apartments), where less people go on
vacations in summer. These differences in clusters are also
visible in functions estimated in the parametric specification
Prediction error rates for the different model specifications
(not shown here).
are shown in Table II. In forecasts with observed levels (first
Fig. 4 shows the effect of special rate (SR) days estimated
two rows in each half of the table), the difference of error rate
by the GAM specification. The higher price starts at 6 a.m.
between source and target zones is small. This suggests that
on the SR day, and ends at midnight. The midnight to 6 a.m.
the consumption models obtained by the estimation procedure
part of the graphs corresponds to this time interval of the
are transportable to target zones. In both cases, the error rate
day following SR day. Often clients reduce their consumption
is less than 20%, comparable to the state of art at similar
during high rate hours and increase consumption just after
aggregation level with full observations ([6]). Such error rate
midnight, as can be seen in the figures. Clusters 1 and 5 in
is higher than predictions made with individual feeder models
the IRIS clusters and Cluster 1 in CP clusters have much less
with similar specification in our preliminary studies (1l.7% for
drastic SR effect. This shows that the clusters we obtained are
the 473 feeders on average), but this is not surprising, since
able to capture some differences in electricity consumption
the cluster models are much less specialized than individual
of the clients, although no consumption data is used in the
feeder models. This decrease in model accuracy is a small
clustering step. The parametric specification has qualitatively
price to pay for the increased model transportability, which
similar results.
enables us to estimate hourly consumption for target zones
2 The scales are relative on Figs.2 and 3, since the estimation is applied to without using its historical data.
normalized time series. The CP clustering model has a lower error rate, indicating
3 Results for other hours are qualitatively similar. that direct client information has greater importance than
756
TABLE III IV. CONCLUSION
MEAN PREDICTION ERROR RATE ON SIMULATED CONSUMPTION.
This study proposes a method to estimate hourly elec
IRIS clusters CP clusters tricity consumption for target zones with insufficient or no
MV feeders (source) 13.20 1 1.22
historical consumption data using measurements from source
MV feeders (target) 13.87 1 1.45
IRIS estimation (target) 16.93 14.09 zones and public socio-demographic information. The proce
dure provides state-of-art prediction performance on real and
simulated datasets for target zones with very little data, and
significantly outperforms the benchmark interpolation method
indirect socio-demographic variables. Parametric models have
in simulations.
similar performances in prediction as GAMs.
We also tried to estimate consumption levels by linear REFERENCES
regression using CP variables. The error rates are in the third [ 1] H. L. Willis, Spatial electric load forecasting. CRC Press, 2002.
row of each half of Table II). These predictions have an [2] 1. Melo, A. Padilha-Feltrin, and E. Carreno, "Data issues in spatial elec
tric load forecasting, " in 2014 IEEE PES General Meeting I Conference
average error rate between 35% and 40%. This is because
Exposition, 20 14, pp. 1-5.
the estimation of consumption level by CP is not accurate. [3] H. Hahn, S. Meyer-Nieberg, and S. Pickl, "Electric load forecasting
If yearly aggregated consumption of target zones is available, methods: Tools for decision making, " European Journal of Operational
Research, vol. 199, no. 3, pp. 902-907, Dec. 2009.
this error rate can be reduced.
[4] S. Fan and R. J. Hyndman, "Short-term load forecasting based on a
semi-parametric additive model, " Power Systems, IEEE Transactions on,
vol. 27, no. 1, pp. 134- 141, 2012.
[5] A. Pierrot and Y. Goude, "Short-term electricity load forecasting with
D. Application on synthetic consumption of feeders and IRIS
generalized additive models, " in Proceedings of ISAP power, 20 1 1, pp.
593 - 600.
On the synthetic consumption dataset, we keep the same [6] P. Mirowski, S. Chen, T. Kam Ho, and C.-N. Yu, "Demand Forecasting
five sets of randomly selected 100 feeders as source zones. in Smart Grids, " Bell Labs Technical Journal, vol. 18, no. 4, pp. 135-
158, 20 14.
As target zones, we use the rest of the feeders, and the
[7] R. Sevlian and R. Rajagopal, "A model for the effect of aggregation
1,094 IRIS. As in the real consumption application, models on short term load forecasting, " in P ES General Meeting I Conference
are estimated on source zone load of the first 30 months. Exposition, 2014 IEEE, 20 14, pp. 1-5.
[8] X. Sun, P. B. Luh, K. W. Cheung, W Guan, L. D. Michel, S. S.
Predictions are made for the last 3 months on both source and
Venkata, and M. T. Miller, "An Efficient Approach to Short-Term Load
target zones. We only report the results on GAM specification Forecasting at the Distribution Level, " IEEE Transactions on Power
with observed levels, since the relative difference is similar to Systems, vol. 3 1, no. 4, pp. 2526-2537, 2016.
[9] C. Alzate and M. Sinn, "Improved electricity load forecasting via kernel
the real consumption case. To produce clusters, we use either
spectral clustering of smart meters, " in Data Mining (ICDM). 2013 IEEE
the apartment, warehouse variables used in simulation, or the 13th International Conference on. IEEE, 20 13, pp. 943-948.
percentage of CP in client type. The two sets of clusters are [ 10] J. D. Rhodes, W J. Cole, C. R. Upshaw, T. F. Edgar, and M. E. Webber,
"Clustering analysis of residential electricity demand profiles, " Applied
similarly called IRIS clusters and CP clusters.
Energy, vol. 135, pp. 46 1-471, 2014.
Prediction error rates are shown in Table III. The difference [ 1 1] F. L. Quilumba, W-J. Lee, H. Huang, D. Y. Wang, and R. L. Szabados,
between source and target zones of feeder service area is "Using Smart Meter Data to Improve the Accuracy of Intraday Load
Forecasting Considering Customer Behavior Similarities, " Smart Grid,
small, as in the real consumption case. This confirms that IEEE Transactions on, vol. 6, no. 2, pp. 9 1 1-918, 20 15.
consumptions models obtained by the process are indeed trans [l2] R. Li, c. Gu, F. Li, G. Shaddick, and M. Dale, "Development of
portable to the target zones. The prediction error is generally Low Voltage Network Templates - Part I: Substation Clustering and
Classification, " IEEE Transactions on Power Systems, vol. 30, no. 6,
lower in this case, as consumption is smoother in simulations pp. 3036-3044, 20 15.
than in real data, and the clusters are more homogeneous. We [ 13] "United States Census Bureau, " accessed: 2016-02-10. [Online].
Available: hup:llwww.census.gov/
observe a similar slight decrease in prediction error when the
[l4] "20 1 1 census for England and Wales - Office for National Statistics, "
percentage of CP in client type is used. This is also reassuring, accessed: 2016-02-10. [Online]. Available: hup:llwww.ons.gov.uk/ons/
for it suggests a similarity between real data and simulations. guide-method/census120lllindex.html
[l5] "INSEE - Database - Population census results, " accessed: 2016-
The third row in Table III shows the prediction error 02-10. [Online]. Available: hup://www.inseeJr/en/bases-de-donnees/
on IRIS consumption. We observe an average error rate of default.asp?page=recensements.htm
16.93% and 14.09% for IRIS clusters and CP clusters. This [l6] A. Bruhns, G. Deurveilher, and J.-S. Roy, "A non linear regression
model for mid-term load forecasting and improvements in seasonality, "
is higher, but of the same order as short-term forecasting at in Proceedings of the 15th Power Systems Computation Conference,
this aggregation level with full data ([6]). Such precision is 2005, pp. 22-26.
reasonably good, given that the estimation requires only the [ 17] S. Wood, Generalized additive models: an introduction with R. CRC
press, 2006.
average consumption level for each IRIS. As a benchmark, [ 18] J. Friedman, T. Hastie, and R. Tibshirani, The elements of statistical
we used an interpolation proportional to the population from learning. Springer series in statistics Springer, Berlin, 200 1, vol. 1.
the 100 source zones to produce an estimation for aggregated [l9] "Regles relatives a la Programmation, au Mecanisme d' Ajustement et
au dispositif de Responsable d'Equilibre, Regles chapitre F, " accessed:
consumption of the IRIS. This method does very poorly on 2016-07-08. [Online]. Available: http://clients.rte-france.com/langlfr/
aggregated IRIS consumption, with 49.57% of mean error. clients_producteurs/services_clients/regles.jsp
Our method is thus significantly better than the interpolation
approach.
757