Ieee Smart Car Parking

2014 IEEE Ninth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP)
Symposium on Information Processing

Singapore, 21-24 April 2014
Smart Car Parking: Temporal Clustering and Anomaly Detection in Urban Car Parking
Yanxu Zheng, Sutharshan Rajasegarar, Christopher Leckie, Marimuthu Palaniswami
Dept. of Computing and Information Systems, Dept. of Electrical and Electronic Eng.

The University of Melbourne, Australia;
E-mails: {yanxuz@student., sraja@, caleckie@, palani@}unimelb.edu.au
Abstract: A major challenge for modern cities is how to
maximise the productivity and reliability of urban infrastructure,
such as minimising road congestion by making better use of
the limited car parking facilities that are available. To achieve
this goal, there is growing interest in the capabilities of the
emerging Internet of Things (IoT), which enables a wide range
of physical objects and environments to be monitored in fine
detail by using low-cost, low-power sensing and communication
technologies. While there has been growing interest in the IoT
for smart cities, there have been few systematic studies that can
demonstrate whether practical insights can be extracted from
real-life IoT data using advanced data analytics techniques. In
this work, we consider a smart car parking scenario based on
real-time car parking information that has been collected and
disseminated by the City of San Francisco. We investigate whether
useful trends and patterns can be automatically extracted from
this rich and complex data set. We demonstrate that by using
automated clustering and anomaly detection techniques we can
identify potentially interesting trends and events in the data. To
the best of our knowledge, we provide the first such analysis of
the scope for clustering and anomaly detection on real-time car
parking data in a major urban city.
I. INTRODUCTION
Around 70% of the world's population is expected to live
in cities and surrounding regions by 2050 [5]. Therefore, cities
will need to be better managed, if only to survive as platforms
that enable economic, social and environmental well-being. A
smart city [1], [15], according to Forrester, is one that uses
information and communications technologies (ICT) to make
the critical infrastructure and services of a city, such as public
safety, transportation and utilities, more aware, interactive
and efficient [5]. Smart city management is technologically
predicated on the emergent Internet of Things (IoT) [1], a radical evolution of the current Internet into a network
of interconnected objects, such as sensors, parking meters,
energy measuring devices and actuators, that not only harvests
information from the environment (sensing) and interacts with
the physical world (actuation/command/control), but also uses
existing Internet standards to provide services for information
transfer, analytics, applications and communications [12].
Wireless Sensor Networks (WSNs), seamlessly integrated
into urban infrastructure (transport, health, environment), form
the sensing-actuation core objects of an IoT in a smart city
system and information will be shared across diverse platforms
and applications. The growth in low-cost, low-power sensing
and communication technologies enables a wide range of
physical objects and environments to be monitored in fine
detail. The detailed, dynamic data that can be collected from
devices on the IoT provides the basis for new business and
government applications in areas such as public safety, transport logistics and environmental management. A key challenge
in the development of such applications is how to model
and interpret the large volumes of complex data streams that
will be generated by the IoT. Examples of such large-scale
deployments of sensors include (1) SmartSantander [1], [3],
in the Spanish city of Santander, with around 12,000 sensors
installed in places such as lamp posts for sensing temperature,
CO, noise and light, and buried in the asphalt for parking sensing,
and (2) the city of San Francisco (SF), USA, where around
8,200 wireless parking sensors are installed in on-street spaces
in neighborhoods across the city. These sensors enable real-time
monitoring and also inform drivers of the available vacant
parking lots and the rates in real time.
While there has been much discussion of the potential for
smart cities based on the IoT, there have been few systematic
studies of how data analytics can provide practical insights
from IoT data. The collection of such data is intended to be
used for improving traffic management, energy management,
environment protection, public health and safety. However,
urban authorities are not equipped to make use of this type
of Big Data. Without suitable data analytics to detect and
correlate relevant events in the urban environment, this sensing
infrastructure will not be effectively utilised and these public
services will remain manual tasks.
In this paper, we use the parking data collected from one
of the cities, namely the city of San Francisco (SF), USA,
and apply data analytics to infer interesting events buried
in the data. Although the SF parking data provides real-time
parking availability data to the public, a meaningful analysis
of the data is lacking for interpretation by the authorities. In
particular, we perform data clustering and anomaly detection
on the collected parking data, and present several interesting
practical insights from the data, which are impossible to infer
without performing such machine learning tasks. To the best
of our knowledge, this is the first time such an analysis has
been performed in terms of clustering and anomaly detection
on the SF parking dataset, which has been made available to
the public by the city, and can be accessed from [2].
The rest of the paper is organised as follows. Section II
provides the existing related work in this domain, and Section
III introduces the SF data set, the challenges in the analytics
and our approach. Section IV describes the clustering and
the anomaly detection algorithm, and Section V discusses the
outcomes. In Section VI, we provide a discussion about the
results and conclude highlighting further research directions.
978-1-4799-2843-9/14/$31.00 2014 IEEE
II. RELATED WORK
Analysing parking data in terms of predicting available parking lots has received attention in the literature. The main challenges of parking availability prediction are the accuracy of long-term prediction, the interaction between the parking lots in an area, and how user behaviors affect parking availability. In [9], Caliskan et al. built a decentralized parking guide system based on vehicular ad hoc networks (VANETs) that uses a continuous-time homogeneous Markov model for parking availability prediction. The model only targets a single parking lot and is most effective within 15 minutes. In [17], Klappenecker et al. followed the research of [9] and developed a structural solution that simplifies the computation of transition probabilities. In [8], Caicedo et al. present an aggregated approach that is combined with an Intelligent Parking Reservation (IPR) system, which lets users set their parking preferences and pay in advance, and provides real-time parking information. The whole system receives drivers' parking requests, allocates them to the best parking lot using a calibrated discrete choice model, and estimates future departures based on the requests of all drivers and their habits. Using these results to predict parking availability, this method can reach a reasonable prediction accuracy within a 4 hour window. However, these analyses either consider a single parking lot or consider all of the parking lots together, without grouping them based on their behavior. Further, no anomaly detection has been performed to infer interesting events in the data.
Our analysis on clustering and anomaly detection provides the means to identify interesting parking locations, such as those with extreme occupancy rates during the day, and possible faulty data. Combining our analysis with parking prediction will enable more accurate predictions in the future. Below we describe the SF parking data and our approach to clustering and anomaly detection in detail.
III. SF PARKING DATA SET AND OUR APPROACH
The city of San Francisco has deployed 8200 parking sensors in city parking lots [2]. Sensors are installed in on-street metered parking spaces and gate-controlled off-street facilities [2]. The real-time data are collected and made available online for public use and research activities. These sensor data make it more convenient for drivers, bicyclists, pedestrians, visitors, residents and merchants to find vacant parking spots. In addition to the parking availability map available on the SFpark.org website, information on parking availability is distributed via mobile apps and the regional phone system. By checking parking availability before leaving home, drivers will know where they can expect to find parking and how much it will cost.
We used the public API data feed to collect parking data over a two-week period between 13 August 2013 and 26 August 2013, sampled at 15-minute intervals. The features collected include date and time, type of parking lot, whether it is on-street or off-street parking, parking lot name, number of spaces currently occupied (OCC rate), number of spaces currently operational for this location, and the longitude and latitude values of each location. In order to perform clustering and anomaly detection analysis, we computed the average OCC rate over the two-week period for every 15-minute interval. The resulting data set consists of 570 parking locations and 96 fifteen-minute time instances per location over a day. We perform clustering and anomaly detection on this 96-dimensional data in this paper.
Analysing the data from all 570 parking locations purely using time series plots is cumbersome and makes it difficult to infer any useful information. However, systematic clustering analysis and anomaly detection can limit the scope of such an analysis by providing a focus in terms of potentially interesting locations. Below we provide a brief overview of the clustering and the anomaly detection algorithms that we use for analysing this dataset.
IV. CLUSTERING AND ANOMALY DETECTION
Clustering is the process of finding groups of similar data vectors in a given data set. It can be a non-parametric technique, which does not assume any distribution over the data a priori, and an unsupervised learning technique, which does not need prior labeling of the data as to which class or cluster it belongs. Consequently, it is a suitable data analytics technique for a new dataset like the SF parking data. There is a wide variety of clustering techniques available in the literature with varying pros and cons [6], [18], [19], [27], [33]. We use a simple but effective approach based on automatically separating the farthest data vectors, called farthest first clustering, and an expectation maximisation clustering method for the analysis.
The task of detecting interesting or unusual events in a general manner is an open problem in the data mining community, and is often referred to as the anomaly detection problem. An outlier or anomaly in a set of data is defined by Barnett et al. [4] as "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". Anomaly or outlier detection mechanisms can be categorised into three general approaches depending on the type of background knowledge of the data that is available. The first approach finds outliers without prior knowledge of the underlying data. This approach uses unsupervised learning or clustering, and assumes that the outliers are well separated from the data points that are normal. The second approach uses supervised classification, where a classifier is trained with labeled data, i.e., the training data is marked as normal or abnormal. The trained classifier can then be used to classify new data as either normal or abnormal. This approach requires the classifier to be retrained if the characteristics of normal and abnormal data change in the system. The third approach is novelty detection, which is analogous to semi-supervised recognition. Here a classifier learns a succinct generalisation of a given set of data, which can then be used to recognise anomalies [6], [7], [20]-[23], [25], [26], [28]. In this work we use a novelty detection approach based on the one-class support vector machine to detect anomalies in the SF parking data. Below we provide an overview of each of the algorithms we use for the analysis.
1) Farthest First (FF) Clustering: FF clustering is a simple clustering algorithm, introduced in [10], [11], [14], that clusters the data given a number of clusters k a priori. It uses a farthest first traversal to find the k mutually farthest points. The steps involved in the algorithm are as follows. First, a data vector x is selected arbitrarily. Second, a data vector y is selected that is furthest from the first one. Third, a data vector furthest from the two data vectors x and y is selected. This procedure is continued until k data vectors are obtained. These k data vectors are used as the cluster centers, and the remaining data vectors are assigned to the closest of those cluster centers. We utilised the Weka software [13] for performing FF clustering on the smart car parking data.
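The farthest first traversal can be illustrated with a minimal Python sketch. This is not the Weka implementation used in this work: the function names, the toy two-dimensional profiles and the use of Euclidean distance are assumptions made for illustration, and "furthest from the chosen vectors" is interpreted in the usual way as maximising the distance to the nearest centre chosen so far.

```python
import math

def farthest_first_centers(points, k, start=0):
    """Pick k centre indices by farthest-first traversal, starting from points[start]."""
    centers = [start]
    # Distance from every point to its nearest chosen centre so far.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(centers) < k:
        # The next centre is the point farthest from all centres chosen so far.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        nearest = [min(d, math.dist(p, points[nxt])) for d, p in zip(nearest, points)]
    return centers

def assign_to_centers(points, centers):
    """Label each point with the position (0..k-1) of its closest centre."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, points[centers[c]]))
            for p in points]

# Hypothetical toy profiles (two time slots instead of 96).
profiles = [(0.1, 0.2), (0.15, 0.25), (0.9, 0.95), (0.85, 0.9), (0.5, 0.5)]
centers = farthest_first_centers(profiles, k=2)
labels = assign_to_centers(profiles, centers)
```

Because the first centre is arbitrary, different starting vectors can yield different centres; here the traversal starts from the first profile only to keep the example reproducible.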
2) Expectation Maximisation (EM) Clustering: The EM
algorithm assigns a probability distribution to each data point,
indicating the probability of it belonging to each of the clusters.
It is an unsupervised clustering method that makes use of a
finite Gaussian mixture model, where the number of mixture components
is equal to the number of clusters, and each probability
distribution corresponds to one cluster. The steps involved in
the EM algorithm for clustering are as follows [16].
Fig. 1. Geometry of SVDD: Data vectors are mapped from the input space to a higher dimensional space and a hypersphere (with center c and radius R) is fit to the majority of the data. Data that fall outside the hypersphere are anomalous.
In one-class SVM schemes, the proportion of data vectors considered to be anomalies is controlled by a parameter of the algorithm. Tax et al. [32] formulated the one-class SVM using a hypersphere, called support vector data description (SVDD). In this approach, a minimal radius hypersphere is fixed around the majority of the image vectors in the feature space. The data that fall outside the hypersphere are identified as anomalous. Figure 1 shows the geometry of the SVDD. This hypersphere formulation uses quadratic programming optimisation.
1) Initial values are arbitrarily assigned for the mean and
standard deviation of the normal distribution model.
2) The parameters are iteratively refined using the two
steps of the EM algorithm, namely the Expectation step
(E) and the Maximisation step (M). In the E step, the
membership probabilities for each data vector based on
the above initial parameters are computed. In the M step,
the parameters are recomputed based on the new membership probabilities found in the E step. The algorithm
terminates when the distribution parameters converge or
the algorithm reaches a maximum number of iterations.
3) Each data vector is assigned to a cluster with which it
has the maximum membership probability.
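The three steps above can be sketched for a one-dimensional Gaussian mixture in plain Python. This is an illustrative sketch rather than the Weka EM implementation: the function names, the toy occupancy values, the initial parameters, the convergence tolerance and the small floor on the standard deviation are all assumptions.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def em_gmm_1d(data, means, stds, weights, max_iter=100, tol=1e-8):
    """Fit a 1-D Gaussian mixture by EM; returns parameters and hard cluster labels."""
    k = len(means)
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E step: membership probabilities under the current parameters.
        resp, ll = [], 0.0
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], stds[j]) for j in range(k)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([q / total for q in joint])
        # M step: recompute the parameters from the membership probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            stds[j] = max(math.sqrt(var), 1e-3)  # assumed floor to avoid collapse
        # Stop when the log likelihood no longer changes noticeably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # Step 3: assign each value to the component with maximum membership probability
    # (taken from the final E step).
    labels = [max(range(k), key=r.__getitem__) for r in resp]
    return means, stds, weights, labels

# Hypothetical occupancy values with a low and a high group.
occupancy = [0.10, 0.12, 0.15, 0.11, 0.80, 0.85, 0.82, 0.90]
means, stds, weights, labels = em_gmm_1d(
    occupancy, means=[0.0, 1.0], stds=[0.5, 0.5], weights=[0.5, 0.5])
```

The same E and M steps apply to the 96-dimensional vectors used in this paper, with the scalar mean and standard deviation replaced by their multivariate counterparts.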
Consider a data vector $x_i$ in the input space from a set of data vectors $X = \{x_i : i = 1 \ldots n\}$, mapped to the feature space by some non-linear mapping function $\phi(\cdot)$, resulting in a mapped vector $\phi(x_i)$ (image vector). The aim of fitting a hypersphere with minimal radius $R$, having a center $c$ and encompassing a majority of the image vectors in the feature space, yields the following optimisation problem:
EM determines the number of clusters by cross validation. The cross validation is performed as follows. First, the number of clusters is set to one. Second, the training set is split into a given number of folds, in this case 10. The EM procedure is performed 10 times with the 10 folds. The log likelihood values obtained from the 10-fold procedure are averaged. If the log likelihood increases when the number of clusters is increased by 1, then the above procedure is repeated from the second step. We utilised the Weka software [13] for performing the EM clustering on the smart car parking data.
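The cross-validation loop for choosing the number of clusters can be sketched as follows. This is a simplified stand-in for the Weka procedure, not a reproduction of it: the one-dimensional mixture, the evenly spread initial means, the floor `min_std`, the cap `max_k`, the assumption that the log likelihood is evaluated on the held-out fold, and all function names are assumptions for illustration.

```python
import math

def pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def fit_gmm_1d(data, k, iters=50, min_std=0.05):
    """Tiny EM fit of a k-component 1-D mixture; returns (weight, mean, std) tuples."""
    lo, hi = min(data), max(data)
    params = [(1.0 / k, lo + (j + 0.5) * (hi - lo) / k, max((hi - lo) / k, min_std))
              for j in range(k)]
    for _ in range(iters):
        resp = []
        for x in data:  # E step
            joint = [w * pdf(x, m, s) for w, m, s in params]
            total = sum(joint) or 1e-300
            resp.append([q / total for q in joint])
        new_params = []
        for j in range(k):  # M step
            nj = max(sum(r[j] for r in resp), 1e-12)
            mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
            new_params.append((nj / len(data), mean, max(math.sqrt(var), min_std)))
        params = new_params
    return params

def cv_loglik(data, k, folds=10):
    """Average held-out log likelihood per point over the folds."""
    total = 0.0
    for i in range(folds):
        held_out = data[i::folds]
        train = [x for idx, x in enumerate(data) if idx % folds != i]
        params = fit_gmm_1d(train, k)
        total += sum(math.log(max(sum(w * pdf(x, m, s) for w, m, s in params), 1e-300))
                     for x in held_out) / len(held_out)
    return total / folds

def select_num_clusters(data, folds=10, max_k=6):
    """Start at one cluster and add clusters while the averaged log likelihood improves."""
    best_k, best_ll = 1, cv_loglik(data, 1, folds)
    while best_k < max_k:
        ll = cv_loglik(data, best_k + 1, folds)
        if ll <= best_ll:
            break
        best_k, best_ll = best_k + 1, ll
    return best_k

# Hypothetical occupancy values: ten low and ten high.
data = [0.10, 0.12, 0.15, 0.11, 0.13, 0.09, 0.14, 0.12, 0.10, 0.16,
        0.80, 0.85, 0.82, 0.90, 0.88, 0.79, 0.84, 0.86, 0.83, 0.87]
k = select_num_clusters(data)
```

The stopping rule mirrors the description above: the search halts the first time adding a cluster fails to increase the averaged log likelihood.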
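$$
\min_{R \in \mathbb{R}^{+},\ \xi \in \mathbb{R}^{n}} \quad R^{2} + \frac{1}{\nu n} \sum_{i=1}^{n} \xi_i
$$
$$
\text{subject to:} \quad \|\phi(x_i) - c\|^{2} \le R^{2} + \xi_i, \quad \xi_i \ge 0, \quad \forall i \tag{1}
$$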
where $\{\xi_i : i = 1 \ldots n\}$ are the slack variables that allow some of the image vectors to lie outside the sphere. The parameter $\nu \in (0, 1]$ is the regularisation parameter which controls the fraction of image vectors that lie outside the sphere, i.e., the fraction of image vectors that can be outliers or anomalies. Using the Lagrange technique, the above primal problem (1) is converted to a dual problem as follows, which is a quadratic optimisation problem:
3) One-class SVM: Support Vector Data Description (SVDD): A class of machine learning algorithms, called kernel methods, uses kernel functions to emulate a mapping of data measurements from the input space (the space where the data is collected) to a higher dimensional space called the feature space [24], [29]-[31]. The mapped vectors in the feature space are called image vectors. Linear or smooth surfaces in the feature space are used to classify the data as either normal or anomalous. The linear or smooth surfaces in the feature space usually yield nonlinear surfaces in the input space. The advantage of this method is that the dimension of the mapped feature space is hidden by the kernel function and is not explicitly known. This facilitates highly nonlinear and complex learning tasks without excessive algorithmic complexity. A specific class of algorithms called one-class support vector machines (SVMs) do not require labeled data for training. In this scheme a separating smooth surface such as a hypersphere is found in the feature space, such that the surface automatically separates the data vectors into normal and anomalous.
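$$
\min_{\alpha \in \mathbb{R}^{n}} \quad \sum_{i,j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) - \sum_{i=1}^{n} \alpha_i k(x_i, x_i)
$$
$$
\text{subject to:} \quad \sum_{i=1}^{n} \alpha_i = 1, \quad 0 \le \alpha_i \le \frac{1}{\nu n}, \quad i = 1 \ldots n. \tag{2}
$$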
where $k(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ is the kernel function, and the $\alpha_i$ are the Lagrange multipliers. The data vectors with $\alpha_i > 0$ are called the support vectors. Using the solution for $\alpha_i$, the decision function for a data vector $x$ can be written as
$$
f(x) = \operatorname{sgn}\Big( R^{2} - \sum_{i,j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) + 2 \sum_{i=1}^{n} \alpha_i k(x_i, x) - k(x, x) \Big).
$$
Anomalous data vectors are those with $\alpha_i = \frac{1}{\nu n}$, which fall outside the sphere. Data vectors with $0 \le \alpha_i < \frac{1}{\nu n}$ fall inside or on the sphere, and are considered normal.
The kernel function that we use in this work is the Gaussian function $k(x_i, x_j) = \exp(-\|x_i - x_j\|^{2} / \sigma^{2})$, where $\sigma$ is the kernel width parameter. A larger value for $\sigma$ provides a smoother boundary around the data, while a smaller value provides a rugged boundary. It can be shown that $\nu$ is an upper bound on the fraction of anomalies and a lower bound for the fraction of support vectors. The parameters $\nu$ and $\sigma$ are the two parameters of this algorithm that need to be tuned depending on the data set [32].
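To make the decision function concrete, the following Python sketch evaluates the squared feature-space distance to the sphere center using only Gaussian kernel values. It does not solve the quadratic programme: the multipliers are assumed to be uniform, $\alpha_i = 1/n$ (the only feasible choice when $\nu = 1$), and the squared radius is set by hand to the second-largest training distance. These choices, the toy data and the function names are assumptions for illustration only.

```python
import math

def gaussian_kernel(a, b, sigma):
    """k(a, b) = exp(-||a - b||^2 / sigma^2)."""
    return math.exp(-math.dist(a, b) ** 2 / sigma ** 2)

def squared_distance_to_center(x, train, alphas, sigma):
    """||phi(x) - c||^2 with c = sum_i alpha_i phi(x_i), computed via kernels only."""
    k_xx = gaussian_kernel(x, x, sigma)
    cross = sum(a * gaussian_kernel(xi, x, sigma) for a, xi in zip(alphas, train))
    gram = sum(ai * aj * gaussian_kernel(xi, xj, sigma)
               for ai, xi in zip(alphas, train)
               for aj, xj in zip(alphas, train))
    return k_xx - 2.0 * cross + gram

def svdd_label(x, train, alphas, radius_sq, sigma):
    """+1 if x lies inside or on the sphere (normal), -1 if outside (anomalous)."""
    inside = radius_sq - squared_distance_to_center(x, train, alphas, sigma) >= 0
    return 1 if inside else -1

# Hypothetical two-slot profiles: four similar locations and one extreme one.
train = [(0.10, 0.20), (0.12, 0.22), (0.11, 0.19), (0.13, 0.21), (0.90, 0.95)]
alphas = [1.0 / len(train)] * len(train)   # assumed, not the QP solution
sigma = 0.5                                # assumed kernel width
dists = [squared_distance_to_center(x, train, alphas, sigma) for x in train]
radius_sq = sorted(dists)[-2]              # assumed radius: leaves one point outside
labels = [svdd_label(x, train, alphas, radius_sq, sigma) for x in train]
```

In a full SVDD, both the multipliers and the radius come from solving problem (2); the sketch only shows how the sign of $R^{2}$ minus the kernel distance separates normal from anomalous vectors once those quantities are available.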
V. SF PARKING DATA ANALYSIS
The aim is to analyse the car parking data from a major urban center and reveal any interesting clustering structure that exists in the data. Further, we aim to identify any anomalies present in the data that are indicative of potential sensor failures or unusual behaviour.
In terms of the one-class SVM (the SVDD), exactly the same normal profiles were identified as was the case with FF clustering. However, SVDD identified locations that experienced extreme behaviour: not only abnormally high occupancy (similar to the FF anomalies, cluster 0), but also abnormally low occupancy. These include parking locations with close to zero occupancy in the early morning. An open question for further investigation is the reason for such low occupancy, e.g., being in a business district or having security and safety concerns.
Figure 5 shows the clusters obtained using FF clustering with the number of clusters set to four. The four clusters show different bands of occupancy rates over the 24 hour period in the city. This is evident from Figures 5(b) and 5(c). In addition to the clusters that show consistently lower occupancy (cluster 0) and higher occupancy (cluster 1), it also reveals two more clusters that have different behaviour. Cluster 2 (shown in black) shows very low occupancy rates, around 0.3, during the early morning and higher occupancy rates, around 0.7, during the day time. This shows larger variation between early morning and busy hours of the day. The other cluster, cluster 3 (shown in purple), shows occupancy rates of around 0.6 (on average) throughout the 24 hour period. The identification of these bands of clusters, i.e., the bands of parking locations, helps parking managers to identify consistent occupancy behaviors in the parking lots over the city region, and potentially helps devise appropriate parking strategies. For example, the parking rates for parking locations in cluster 3 and cluster 2 can be readjusted such that a uniform occupancy, and hence better utilisation of parking lots, is achieved throughout the region. This demonstrates the benefit of performing such clustering analysis on parking data for identifying interesting scenarios.
First we aim to identify any anomalous parking locations in the SF region. We considered two approaches to anomaly detection: first, using the farthest first (FF) clustering algorithm with the number of clusters set to two; second, using the one-class SVM algorithm (SVDD). Figure 2 provides a table that shows the number of data vectors assigned to each of the clusters from the FF and SVDD methods along with the parameter values used for each of the algorithms. Figures 3(a) and 4(a) show the parking locations in the SF region corresponding to each of the clusters, denoted using different colours. The time series plots of the occupancy rate (OCC rate) vs time for the 24 hour period are shown along with the location maps. Figures 3(b) and 4(b) show the median and median absolute deviation (MAD) of the data vectors that belong to each of the clusters. Figures 3(c) and 4(c) show the mean and standard deviation of the data vectors that belong to each of the clusters. FF produced a big cluster, cluster 0, with 538 data vectors and a small cluster, cluster 1, with 31 data vectors, whereas the SVDD produced 373 normal and 197 anomalous data vectors. Both methods identified similar normal behaviors (shown in green in Figures 3(c) and 4(c)). In these normal data vectors, the time series demonstrates two periods of high occupancy around the morning and evening peak hours. Further, a higher occupancy during the day time and a lower occupancy during the early morning can be observed in these data vectors.
Fig. 2. Clustering and SVDD results.
Finally, we consider the use of fine-grained clustering in order to identify more specific behaviours. We used EM clustering for this analysis. EM clustering automatically identifies the number of clusters in the data set. In this case it identified 16 distinct clusters in the data. Figures 6(b) and 6(c) show the median and the mean occupancy rates over the 24 hour period for each of the 16 clusters respectively. Note that, for clarity, the graphs omit the MAD and the standard deviation values we computed for each cluster. Figure 6(a) shows the locations of the parking lots for each of the clusters.
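The per-cluster median and MAD summaries referred to in this section can be computed slot by slot across the data vectors of a cluster. The sketch below uses only the Python standard library; the function names and the three-slot toy vectors (standing in for the 96 fifteen-minute slots) are assumptions for illustration.

```python
from statistics import median

def median_profile(vectors):
    """Per-time-slot median across the data vectors of one cluster."""
    return [median(slot) for slot in zip(*vectors)]

def mad_profile(vectors):
    """Per-time-slot median absolute deviation (MAD) about the slot median."""
    meds = median_profile(vectors)
    return [median(abs(v - m) for v in slot)
            for slot, m in zip(zip(*vectors), meds)]

# Hypothetical cluster of three locations with three time slots each.
cluster = [[0.2, 0.6, 0.4],
           [0.3, 0.7, 0.5],
           [0.9, 0.8, 0.6]]
med = median_profile(cluster)
mad = mad_profile(cluster)
```

The median and MAD are less sensitive to a few extreme locations than the mean and standard deviation, which is one reason to report both pairs of summaries.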
When we analyse these clusters, again we can see that they highlight clusters with consistently higher or lower occupancy, and with the typical daily variation in the occupancy profiles. However, there are two clusters that are particularly noteworthy for further investigation. Cluster 2 consistently has a median of zero. The parking lots in this cluster are geographically dispersed, and they are likely to indicate faulty sensors. This identification is useful for fault analysis. Further questions that arise from this analysis are: how do these sensors change over time, and, if they are truly faulty, is there any drift in their profile that could act as an early warning?
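As a simple complementary check, locations whose averaged daily profile has a median of zero can be listed directly. This is not the EM-based identification described above, only a hedged sketch of a direct screen; the function name, the `tolerance` parameter and the location names are hypothetical.

```python
from statistics import median

def flag_zero_median_locations(profiles, tolerance=0.0):
    """Return the names of locations whose occupancy profile median is at or below tolerance."""
    return sorted(name for name, occ in profiles.items() if median(occ) <= tolerance)

# Hypothetical averaged profiles (four slots instead of 96).
profiles = {
    "lot_a": [0.0, 0.0, 0.0, 0.1],
    "lot_b": [0.2, 0.6, 0.7, 0.4],
    "lot_c": [0.0, 0.0, 0.0, 0.0],
}
suspects = flag_zero_median_locations(profiles)
```

A location flagged this way is only a candidate for inspection: a zero median may reflect a faulty sensor, but it may also reflect a space that is genuinely unused.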
In terms of anomalies, FF clustering identified a small number of locations (31 locations) that had consistently higher occupancy even during the early morning. These parking lots are geographically distributed across the city (see Figure 3(a)), but still reasonably concentrated within each geographic region. This may be an indication of special time limitations or parking rates.
Cluster 4 also shows unusual behaviour, in terms of having higher occupancy during the early morning compared to during the day. Further investigation is warranted to see if these locations are affected by particular daytime activities, such as road or building construction works, which limit access to these parking spots during the day. Figure 7 shows the locations of the parking lots in selected clusters (Cluster 2 and Cluster 4).
[Figure 3 plots: OCC rate versus Time (hours.minutes); legend: cluster0, cluster1.]
Fig. 3. Farthest First clustering with two clusters. (a) Spatial locations of the parking lots in each cluster. (b) Median and median absolute deviation of the data vectors in each of the two clusters. (c) Mean and standard deviation of the data vectors in each of the two clusters.
[Figure 4 plots: OCC rate versus Time (hour.minute); legend: Anomalies, Normal.]
Fig. 4. One-class classification (SVDD) with parameters $\nu = 0.1$ and $\sigma = 100$. (a) Spatial locations of normal and anomalous parking lots. (b) Median and median absolute deviation of the normal and anomalous data vectors. (c) Mean and standard deviation of the normal and anomalous data vectors.
[Figure 5 plots: OCC rate versus time of day; legend: cluster0, cluster1, cluster2, cluster3.]
Fig. 5. Farthest First clustering using four clusters. (a) Spatial locations of the parking lots in each cluster. (b) Median and median absolute deviation of the data in each of the four clusters. (c) Mean and standard deviation of the data in each of the four clusters.
[Figure 6 plots: OCC rate versus Time (hour.minute); legend: cluster0 to cluster15.]
Fig. 6. EM clustering. (a) Spatial locations of the parking lots in each cluster. (b) Median values of the data vectors from each cluster. (c) Mean values of the data vectors from each cluster.
Fig. 7. Spatial locations of the parking lots in selected clusters (using EM clustering). (a) Cluster 2: data vectors with a mean/median value of zero (green lines). (b) Cluster 4 (red lines).
VI. DISCUSSION AND CONCLUSION
Data analytics in large data sets collected by smart cities is an important task to enable intelligent management of the infrastructure that has been monitored using IoT devices. In this paper we demonstrated the importance of clustering and anomaly detection for car parking management in a major urban center, the city of San Francisco. In contrast to using simple average profiles, we have shown that we can both characterise normal temporal behavior and identify anomalous behavior. In particular, we showed that farthest first (FF) clustering identifies a small number of heavy usage parking spots, while the one-class SVM (SVDD) identifies extreme behavior (both high and low occupancy). These findings provide a focus for further analysis into external factors that may affect parking behavior, e.g., pricing, land use (business or residential), security and safety, and adjacency to other modes of transport. Furthermore, we identified how finer scale clustering can identify potential operational issues, such as the possibility of faulty sensors, or parking spots that are being affected by external factors during specific periods of the day. Our research has also highlighted how clustering and anomaly detection can provide a focus for more detailed investigation, such as correlating observations with other sources of data, e.g., crime statistics, other modes of transport, construction activity. This type of detailed correlation with other data sources can be impractical if it needs to be applied to all parking locations. However, the cluster analysis can limit the scope of such an analysis by providing a focus in terms of potentially interesting locations. In particular, we have demonstrated that it is possible to find clusters that are indicative of potential sensor faults. In the future, we aim to perform clustering based on both spatial and temporal similarity in the SF data as well as data from other platforms such as Smart Santander.
[15] J. Jin, J. Gubbi, T. Luo, and M. Palaniswami, Network architecture

and QoS issues in the Internet of Things for a Smart City, in Proc. of
the ISCIT, 2012, pp. 974979.
[16] X. Jin and J. Han, Expectation maximization clustering, in Encyc. of
Mach. Learn., C. Sammut and G. Webb, Eds., 2010, pp. 382383.
[17] A. Klappenecker, H. Lee, and J. L. Welch, Finding available parking
spaces made easy, Ad Hoc Nets., vol. 12, no. 0, pp. 243 249, 2014.
[18] M. Moshtaghi, T. Havens, L. Park, J. C. Bezdek, S. Rajasegarar,
C. Leckie, M. Palaniswami, and J. Keller, Clustering ellipses for
anomaly detection, Pattern Recog., vol. 44, no. 1, pp. 5569, 2011.
[19] M. Moshtaghi, S. Rajasegarar, C. Leckie, and S. Karunasekera, An
efcient hyperellipsoidal clustering algorithm for resource-constrained
environments, Pattern Recog., vol. 44, no. 9, pp. 21972209, 2011.
[20] C. OReilly, A. Gluhak, M. A. Imran, and S. Rajasegarar, Anomaly
detection in wireless sensor networks in a non-stationary environment,
IEEE communications, surveys and tutorials, 2013.
[21] , Online anomaly rate parameter tracking for anomaly detection
in wireless sensor networks, in IEEE SECON, 2012.
[22] S. Rajasegarar, J. C. Bezdek, C. Leckie, and M. Palaniswami, Elliptical anomalies in wireless sensor networks, ACM Trans. on Sensor
Networks, vol. 6, no. 1, p. 28, Dec. 2009.
[23] S. Rajasegarar, J. C. Bezdek, M. Moshtaghi, C. Leckie, T. C. Havens,
and M. Palaniswami, Measures for clustering and anomaly detection
in sets of higher dimensional ellipsoids, in IEEE WCCI, 2012.
[24] S. Rajasegarar, C. Leckie, J. C. Bezdek, and M. Palaniswami, Centered
hyperspherical and hyperellipsoidal one-class support vector machines
for anomaly detection in sensor networks, IEEE Trans. on Info.
Forensics and Sec., vol. 5, no. 3, pp. 518533, 2010.
[25] S. Rajasegarar, C. Leckie, and M. Palaniswami, Anomaly detection in
wireless sensor networks, IEEE Wireless Comms., vol. 15, no. 4, pp.
3440, 2008.
[26] , Detecting data anomalies in sensor networks, in Security in Adhoc and Sensor Networks, R. Beyah, J. McNair, and C. Corbett, Eds.
World Scientic Publishing, Inc, ISBN: 978-981-4271-08-0, July 2009,
pp. 231260.
[27] , Hyperspherical cluster based distributed anomaly detection in
wireless sensor networks, Jnl. of Parallel and Distributed Computing,
no. 0, pp. , 2013.
[28] S. Rajasegarar, C. Leckie, M. Palaniswami, and J. C. Bezdek, "Distributed anomaly detection in wireless sensor networks," in IEEE ICCS,
2006.
[29] S. Rajasegarar, A. Shilton, C. Leckie, R. Kotagiri, and M. Palaniswami,
"Distributed training of multiclass conic-segmentation support vector
machines on communication constrained networks," in ISSNIP, 2010,
pp. 211-216.
[30] B. Schölkopf and A. Smola, Learning with Kernels, 2002.
[31] A. Shilton, S. Rajasegarar, and M. Palaniswami, "Combined multiclass
classification and anomaly detection for large-scale wireless sensor
networks," in IEEE ISSNIP, 2013, pp. 491-496.
[32] D. M. J. Tax and R. P. W. Duin, "Support vector data description,"
Machine Learning, vol. 54, no. 1, pp. 45-66, 2004.
[33] H. Wackernagel, Multivariate Geostatistics: An Introduction with Applications, 1998.
ACKNOWLEDGMENT
We acknowledge the support of Australian Research Council
grants LP120100529 and LE120100129.
REFERENCES
[1] IoT,
http://issnip.unimelb.edu.au/research program/Internet of
Things, 2013.
[2] San Francisco parking data, http://sfpark.org, 2013.
[3] Smart Santander, http://www.smartsantander.eu/, 2013.
[4] V. Barnett and T. Lewis, Outliers in Statistical Data, 3rd ed. John
Wiley and Sons, 1994.
[5] J. Belissent, "Getting clever about smart cities: New opportunities require new business models," http://www.forrester.com/rb/Research/getting_clever_about_smart_cities_new_opportunities/q/id/56701/t/2,
2013.
[6] J. C. Bezdek, T. Havens, J. Keller, C. Leckie, L. Park, M. Palaniswami,
and S. Rajasegarar, "Clustering elliptical anomalies in sensor networks,"
in IEEE WCCI, 2010.
[7] J. C. Bezdek, S. Rajasegarar, M. Moshtaghi, C. Leckie, M. Palaniswami,
and T. Havens, "Anomaly detection in environmental monitoring networks," IEEE Comp. Int. Mag., vol. 6, no. 2, pp. 52-58, 2011.
[8] F. Caicedo, C. Blazquez, and P. Miranda, "Prediction of parking space
availability in real time," Expert Systems with Apps., vol. 39, no. 8, pp.
7281-7290, 2012.
[9] M. Caliskan, A. Barthels, B. Scheuermann, and M. Mauve, "Predicting
parking lot occupancy in vehicular ad hoc networks," in IEEE VTC,
2007.
[10] S. Dasgupta and P. M. Long, "Performance guarantees for hierarchical
clustering," Jnl. of Comp. and Sys. Sci., vol. 70, no. 4, pp. 555-569,
2005.
[11] T. F. Gonzalez, "Clustering to minimize the maximum intercluster
distance," Theoretical Comp. Sci., vol. 38, no. 0, pp. 293-306, 1985.
[12] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami, "Internet of
Things (IoT): A Vision, Architectural Elements, and Future Directions,"
Accepted for publ. in Future Generation Computer Systems, Jan 2013.
[13] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and
I. H. Witten, "The WEKA data mining software: An update," SIGKDD
Explorations, vol. 11, no. 1, 2009.
[14] D. S. Hochbaum and D. B. Shmoys, "A best possible heuristic for the
k-center problem," Maths. of Oper. Res., vol. 10, no. 2, pp. 180-184,
1985.
