
An Exploration of Citi Bike, New York City

Spencer Woody
Department of Statistical Science
Duke University
November 30, 2014

Abstract
Bicycle sharing systems have sprouted up across the United States and
around the world in recent years. The main problems these programs face
are the regular redistribution of bikes across stations to avoid imbalances in
the system, and planning where to open new stations. The aim of this
paper is twofold: first, to predict hourly rentals and returns of bikes at
every station, and second, to group stations and trips into
clusters to find spatio-temporal patterns in activity. New York's newly opened
bike share, Citi Bike, is chosen as a case study.

Introduction

Bike sharing systems have surged in popularity over recent years, with large
installations appearing in major cities like London, Paris, and Washington, D.C.
In these systems, users buy a membership, either short term (24 hours) or long
term (one month or more), and in return are able to rent bikes from any station
in the city, as long as they return them within a certain time window. Bike
shares are desirable because they are a clean, efficient, and healthy alternative
means of transportation, and they solve the "last mile" problem of public transit,
whereby commuters get off a bus or the subway and still have a considerable
distance to cover to reach their destination. City planners are particularly keen
on these systems because they have the potential to mitigate automobile traffic
and improve air quality. Many of these bike shares release their usage data
publicly.
Previous work has focused on two main problems. First, asymmetry of
rentals and returns at stations causes stations to occasionally become completely
empty or full, meaning that customers cannot rent or return bikes at these
stations. For instance, many customers may rent bikes and ride downtown
during the morning rush hour, causing an imbalance in the number of bikes
available at stations. Second, there is the issue of planning where to open new
stations. To facilitate this, it is helpful to find space and time patterns of
activity.
This paper aims to tackle these issues by building on previous work done
on bike shares. First, a Poisson model is built to predict hourly rentals and
returns at every station of New York's Citi Bike system. This is similar to the
approach taken in studies such as that of Paris's system (Vélib'; see [7]), which
also constructs a Poisson count model, but here we incorporate weather data
as well as day of the week into a hierarchical linear model. Such a model is
beneficial for balancing bikes throughout the city to avoid build-ups and
depletions within the system. Next, we group stations into communities based
on similar activity patterns throughout the course of the day and week. Finally,
we use spatiotemporal clustering on high-activity routes between stations
i and j via ST-DBSCAN (Spatiotemporal Density-based Spatial Clustering of
Applications with Noise) to gauge traffic flows.

Adviser: Robert Wolpert

The Data

Trip history data are accessed from Citi Bike's website [1]. A trip consists
of a bike being rented out from one station and being returned to another
station. For every data point, a trip's beginning time, duration, origin, and
destination are recorded. These data are available from Citi Bike's launch in
May 2013 through August 2014, comprising approximately 12 million trips in
total. These data are preprocessed by Citi Bike to remove trips lasting less
than one minute, to avoid counting rentals which are the result of a
malfunction or of a member accidentally renting out a damaged bike. The
identity of each rider is kept hidden by Citi Bike as a means to
protect members' privacy, and the number of unique riders over this timespan is
unknown. In addition, every station's geographic coordinates and elevation are
available online. Finally, weather data were obtained from NOAA, including
daily average temperatures, rainfall, and snowfall [2].
Below is an exploratory analysis of Citi Bike using data only from a sampled
week of July 6 through July 12, 2014. This sample was selected purposely
because global activity is highest in the month of July.

2.1 Time dynamics

Figure 1 shows histograms for system-wide hourly rentals for both weekdays
and weekends during the sampled week. There is markedly different behavior
between these two. Weekday rentals show a bimodal distribution with peaks
around 9:00 AM and 5:00 PM, corresponding to the beginning and end of the
average workday. In addition, the morning peak is slightly below the evening
peak. This could be explained by travelers too groggy to bike in the morning
and preferring to wait until the evening to take advantage of the bikeshare. On
the other hand, weekend usage shows a relatively smooth unimodal distribution
with a peak in the mid-afternoon, suggesting that these trips were taken by
users mainly as a leisurely activity.

Figure 1: Weekday and Weekend Usage, July 6-12, 2014 (panels: Weekday Usage, Weekend Usage; x-axis: Start Time)
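
The weekday/weekend split behind Figure 1 can be sketched as follows. This is a minimal illustration in Python, assuming trip start times have already been parsed into datetime objects; the function name hourly_shares and the three toy timestamps are hypothetical and are not drawn from the Citi Bike data.

```python
from collections import Counter
from datetime import datetime


def hourly_shares(start_times):
    """Return {"weekday": [...], "weekend": [...]}; each list holds, for
    hours 0..23, the share of that group's rentals starting in that hour."""
    counts = {"weekday": Counter(), "weekend": Counter()}
    for ts in start_times:
        # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
        group = "weekend" if ts.weekday() >= 5 else "weekday"
        counts[group][ts.hour] += 1
    shares = {}
    for group, by_hour in counts.items():
        total = sum(by_hour.values())
        shares[group] = [by_hour[h] / total if total else 0.0 for h in range(24)]
    return shares


# Toy timestamps, made up for illustration only.
toy_starts = [
    datetime(2014, 7, 7, 9, 5),     # Monday morning
    datetime(2014, 7, 7, 17, 40),   # Monday evening
    datetime(2014, 7, 12, 14, 15),  # Saturday afternoon
]
toy_shares = hourly_shares(toy_starts)
```

Normalizing within each group makes the weekday and weekend curves comparable even though the two groups contain different numbers of days.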

2.2 Weather dynamics

Weather conditions appear to have a large impact on the activity of Citi Bike.
Figure 2 shows the relationship between system-wide daily rentals and
average temperature. The relationship is roughly linear, and interestingly, there
is no immediately noticeable decreasing trend in trips taken at high
temperatures. For extremely low temperatures, below 30 degrees F, the number
of trips taken approaches zero.

Figure 2: Number of Trips Taken in Varying Temperatures (daily no. of trips vs. average temperature in degrees F; points marked by whether precipitation was present)
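
A roughly linear relationship between daily trips and temperature can be summarized with an ordinary least-squares line. The sketch below is illustrative only: fit_line is a hypothetical helper, and the temperature and trip values are made-up numbers chosen to lie exactly on a line, not observations from the Citi Bike or NOAA data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up daily values (degrees F, number of trips), for illustration only.
toy_temps = [30.0, 50.0, 70.0, 90.0]
toy_trips = [5000.0, 15000.0, 25000.0, 35000.0]
toy_intercept, toy_slope = fit_line(toy_temps, toy_trips)
```

On real data, fitting separate lines for days with and without precipitation would show whether the two groups differ in level, slope, or both.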

Methods

3.1 Prediction of Rentals and Returns at Stations

We define $X^{\mathrm{out}}_{slt}$ and $X^{\mathrm{in}}_{slt}$ as outflows and inflows, respectively, through station
$s$ for one hour, where

$l = 0$ for weekends, $l = 1$ for weekdays;

$t$ represents an hour-long block, $t \in \{0, 1, \ldots, 23\}$, with $t = 0$ corresponding to [00:00, 01:00).

Assume $X_{sl(t=i)}$ is independent of $X_{sl(t=j)}$ for $i \neq j$, conditional on $s$ and $l$.
For the sake of simplicity, we will use $X_{slt}$ to represent either inflows
or outflows for a station. Because $X_{slt}$ represents count data, we propose the
Poisson distribution
\[ X_{slt,d} \sim \mathrm{Po}(\lambda_{slt,d}), \tag{1} \]
with a rate parameter following a two-level regression
\[ \log(\lambda_{slt,d}) = \beta_{0,slt} + \beta_{1,slt} T_d + \beta_{2,slt} P_d \tag{2} \]
\[ \beta_{0,slt} = \gamma_{00,st} + \gamma_{01,st} L_d \tag{3} \]
\[ \beta_{1,slt} = \gamma_{10,st} + \gamma_{11,st} L_d \tag{4} \]
\[ \beta_{2,slt} = \gamma_{20,st} + \gamma_{21,st} L_d, \tag{5} \]
for day $d$, given mean-centered average temperature in degrees F ($T_d$), an indicator
variable for precipitation ($P_d$), and an indicator for weekday / weekend ($L_d$). Note that in
this model, the extent to which temperature and precipitation impact rentals /
returns varies for weekends and weekdays.
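
Substituting Equations (3) to (5) into Equation (2) gives a single log-linear expression for the rate at one station and hour. The sketch below evaluates that expression and the resulting Poisson probabilities. It is a minimal illustration, not a fitting procedure: the function names log_rate and poisson_pmf are hypothetical, the six coefficients stand for the terms on the right-hand sides of Equations (3) to (5), and their numerical values are placeholders rather than estimates.

```python
import math


def log_rate(coef, temp_centered, precip, weekday):
    """Log of the Poisson rate for one station and hour on one day.

    coef maps "00", "01", "10", "11", "20", "21" to the second-level
    coefficients; temp_centered is T_d, precip is P_d (0 or 1), and
    weekday is L_d (1 for weekdays, 0 for weekends).
    """
    b0 = coef["00"] + coef["01"] * weekday  # Eq. (3): intercept
    b1 = coef["10"] + coef["11"] * weekday  # Eq. (4): temperature slope
    b2 = coef["20"] + coef["21"] * weekday  # Eq. (5): precipitation effect
    return b0 + b1 * temp_centered + b2 * precip  # Eq. (2)


def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate) with rate > 0, via the log scale."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


# Hypothetical coefficients, chosen only to exercise the code.
toy_coef = {"00": 1.0, "01": 0.5, "10": 0.02, "11": 0.0, "20": -0.7, "21": 0.1}
toy_log_rate = log_rate(toy_coef, temp_centered=10.0, precip=1, weekday=1)
toy_rate = math.exp(toy_log_rate)  # expected count for that hour
```

Because the weekday indicator enters every first-level coefficient, setting weekday=0 recovers a separate intercept, temperature slope, and precipitation effect for weekends.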

3.2 Communities of Stations

We propose a clustering scheme to make a nested community of stations. The
clustering algorithm presented in [5] uses a heuristic method to optimize
Newman's modularity, defined below.
\[ Q = \frac{1}{N(N-1)} \sum_{n,m} \left( T[n,m] - \frac{\sum_{j \neq n} T[j,n] \, \sum_{k \neq m} T[m,k]}{N(N-1)} \right) \tag{6} \]
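
As an illustration of how a modularity score is evaluated for a given partition of stations, the sketch below computes the directed modularity of Leicht and Newman [8] from a matrix of trip counts. This is the standard form, normalized by the total number of trips and restricted to pairs of stations in the same community, so its normalization and index conventions may differ from Equation (6). The function name directed_modularity and the four-station toy matrix are hypothetical.

```python
def directed_modularity(trips, community):
    """Directed modularity of a partition.

    trips[n][m] is the number of trips from station n to station m;
    community[n] is the community label of station n.
    """
    size = len(trips)
    total = sum(sum(row) for row in trips)
    if total == 0:
        raise ValueError("trip matrix has no trips")
    out_strength = [sum(trips[n]) for n in range(size)]
    in_strength = [sum(trips[n][m] for n in range(size)) for m in range(size)]
    q = 0.0
    for n in range(size):
        for m in range(size):
            if community[n] == community[m]:
                # Trips expected from n to m if flows were assigned at random
                # while preserving each station's outflow and inflow totals.
                expected = out_strength[n] * in_strength[m] / total
                q += trips[n][m] - expected
    return q / total


# Toy matrix (made up): stations 0 and 1 trade trips, as do stations 2 and 3.
toy_trips = [
    [0, 5, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 5, 0],
]
toy_q_pairs = directed_modularity(toy_trips, [0, 0, 1, 1])
toy_q_single = directed_modularity(toy_trips, [0, 0, 0, 0])
```

A heuristic such as the one in [5] searches over partitions for a high score; the function above only scores one candidate partition.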

3.3 Clustering of Routes from Station i to Station j

Here we use ST-DBSCAN to cluster routes based on temporal patterns in activity.
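
The core idea of ST-DBSCAN [4] is that two observations are neighbors only if they are close in both space and time. The sketch below is a simplified version under stated assumptions: each observation is a hypothetical (x, y, t) tuple with planar coordinates and an hour of day, distance in space is Euclidean, and the additional check on non-spatial values described in [4] is omitted. The names st_neighbors, st_dbscan, eps_space, eps_time, and min_pts are chosen for illustration.

```python
import math

NOISE = -1


def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space (Euclidean) and eps_time of
    point i, including i itself."""
    xi, yi, ti = points[i]
    return [
        j
        for j, (xj, yj, tj) in enumerate(points)
        if math.hypot(xi - xj, yi - yj) <= eps_space and abs(ti - tj) <= eps_time
    ]


def st_dbscan(points, eps_space, eps_time, min_pts):
    """Label each (x, y, t) point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = st_neighbors(points, i, eps_space, eps_time)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = st_neighbors(points, j, eps_space, eps_time)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels


# Toy points (made up): the same location is busy around 8:00 and around 17:00.
toy_points = [
    (0.0, 0.0, 8), (0.1, 0.0, 8), (0.0, 0.1, 9),
    (0.0, 0.0, 17), (0.1, 0.0, 17), (0.0, 0.1, 18),
    (5.0, 5.0, 12),
]
toy_labels = st_dbscan(toy_points, eps_space=0.5, eps_time=1, min_pts=3)
```

In the toy example the morning and evening observations share a location but fall into separate clusters, which a purely spatial DBSCAN would merge.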

Discussion

Conclusion

References
[1] Citi Bike. System data. http://www.citibikenyc.com/system-data. Accessed: September 15, 2014.
[2] NOAA. Quality controlled local climatological data (QCLCD).
http://www.ncdc.noaa.gov/data-access/land-based-station-data/land-based-datasets/quality-controlled-local-climatological-data-qclcd.
Accessed: September 15, 2014.
[3] Alicia Bargar, Amrita Gupta, Srishti Gupta, and Ding Ma. Interactive visual
analytics for multi-city bikeshare data analysis. The 3rd International
Workshop on Urban Computing, Aug 2014.
[4] Derya Birant and Alp Kut. ST-DBSCAN: An algorithm for clustering
spatial-temporal data. Data Knowl. Eng., 60(1):208-221, January 2007.
[5] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast
unfolding of communities in large networks. J. Stat. Mech., page P10008,
2008.
[6] Pierre Borgnat, Patrice Abry, Patrick Flandrin, Celine Robardet,
Jean-Baptiste Rouquier, and Eric Fleury. Shared Bicycles in a City: A Signal
Processing and Data Analysis Perspective. Advances in Complex Systems,
14(3):1-24, June 2011.
[7] Etienne Côme and Latifa Oukhellou. Model-based count series clustering
for bike-sharing system usage mining, a case study with the Vélib' system
of Paris. ACM TIST, 5(3), 2012.
[8] E. A. Leicht and M. E. J. Newman. Community structure in directed
networks. Phys. Rev. Lett., 100:118703, Mar 2008.
[9] M. E. J. Newman and M. Girvan. Finding and evaluating community
structure in networks. Phys. Rev. E, 69(2):026113, February 2004.
[10] Patrick Vogel, Torsten Greiser, and Dirk Christian Mattfeld. Understanding
bike-sharing systems using data mining: Exploring activity patterns.
Procedia - Social and Behavioral Sciences, 20(0):514-523, 2011.
