Spencer Woody
Department of Statistical Science
Duke University
November 30, 2014
Abstract
Bicycle sharing systems have sprouted up across the United States and
around the world in recent years. The main problems these programs face
are the regular redistribution of bikes across stations to avoid imbalances
in the system and the planning of where to open new stations. The aim of
this paper is twofold: first, to predict rentals and returns of bikes at
every station in every hour, and second, to group stations and trips into
clusters to find spatio-temporal patterns in activity. New York's newly
opened bike share, Citi Bike, is chosen as a case study.
Introduction
Bike sharing systems have surged in popularity over recent years, with large
installations appearing in major cities like London, Paris, and Washington, D.C.
In these systems, users buy a membership, either short term (24 hours) or long
term (one month or more), and in return are able to rent bikes from any station
in the city, as long as they return them within a certain time window. Bike
shares are desirable because they are a clean, efficient, and healthy alternative
means of transportation, and they solve the "last mile" problem of public transit,
whereby commuters get off a bus or the subway and still have a considerable
distance to cover to reach their destination. City planners are particularly keen
on these systems because they have the potential to mitigate automobile traffic
and improve air quality. Many of these bike shares release their usage data
publicly.
Previous work has focused on two main problems. First, asymmetry between
rentals and returns occasionally leaves stations completely empty or completely
full, so that customers cannot rent or return bikes at those stations. For
instance, many customers may rent bikes and ride downtown during the morning
rush hour, draining stations in outlying neighborhoods while filling those
downtown. Second, there is the issue of planning where to open new stations.
Adviser: Robert Wolpert
The Data
Trip history data are accessed from Citi Bike's website [1]. A trip consists
of a bike being rented from one station and returned to another station. For
every trip, the start time, duration, origin, and destination are recorded.
These data are available from Citi Bike's launch in May 2013 through August
2014, comprising approximately 12 million trips in total. Citi Bike preprocesses
the data to remove trips lasting less than one minute, so as not to count rentals
that result from a malfunction or from a member accidentally renting a damaged
bike. Citi Bike keeps each rider's identity hidden to protect members' privacy,
so the number of unique riders over this timespan is unknown. In addition, every
station's geographic coordinates and elevation are available online. Finally,
weather data, including daily average temperatures, rainfall, and snowfall, were
obtained from NOAA [2].
Below is an exploratory analysis of Citi Bike using data from a single sampled
week, July 6 through July 12, 2014. This week was chosen deliberately
because system-wide activity is highest in July, and an analysis.
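To make the sampling concrete, here is a minimal sketch, assuming each trip is held in a hypothetical `Trip` record with the four fields listed above. It keeps trips that start between July 6 and July 12, 2014 (inclusive) and re-applies the one-minute cutoff, which is harmless if Citi Bike has already removed such trips. The names `Trip` and `sample_week` and the field names are illustrative assumptions, not part of Citi Bike's published data format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Trip:
    # Hypothetical record mirroring the fields described in the text.
    start: datetime
    duration: timedelta
    origin: str
    destination: str


SAMPLE_START = datetime(2014, 7, 6)  # Sunday, July 6, 2014, 00:00
SAMPLE_END = datetime(2014, 7, 13)   # exclusive bound: end of Saturday, July 12


def sample_week(trips, min_duration=timedelta(minutes=1)):
    """Keep trips that start in the sampled week and last at least one minute."""
    return [
        t for t in trips
        if SAMPLE_START <= t.start < SAMPLE_END and t.duration >= min_duration
    ]
```

Using a half-open interval (`start < SAMPLE_END`) avoids having to special-case trips that begin in the last second of July 12.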
2.1 Time dynamics
Figure 1 shows histograms of system-wide hourly rentals for weekdays and
weekends during the sampled week. The two show markedly different behavior.
Weekday rentals follow a bimodal distribution with peaks around 9:00 AM and
5:00 PM, corresponding to the beginning and end of the typical workday. In
addition, the morning peak is slightly lower than the evening peak. One possible
explanation is that some commuters are too groggy to bike in the morning and
prefer to use the bike share on their way home in the evening. On the other
hand, weekend usage shows a relatively smooth unimodal distribution
with a peak in the mid-afternoon, suggesting that these trips were taken by
users mainly as a leisure activity.

Figure 1: Histograms of system-wide hourly rentals by start time (00:00 to 24:00) during the sampled week. Panels: Weekday Usage; Weekend Usage. x-axis: Start Time.
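The weekday/weekend split behind Figure 1 can be sketched as a simple tally. This assumes, hypothetically, that trip start times are available as Python `datetime` objects; `hourly_rentals` is an illustrative name, and Saturday and Sunday are treated as the weekend.

```python
from collections import Counter
from datetime import datetime


def hourly_rentals(start_times):
    """Tally rentals by hour of day (0-23), separately for weekdays and weekends.

    start_times: iterable of datetime objects, one per trip (assumed input).
    Returns (weekday_counts, weekend_counts), each a list of 24 integers.
    """
    weekday, weekend = Counter(), Counter()
    for ts in start_times:
        # datetime.weekday(): Monday == 0 ... Saturday == 5, Sunday == 6
        bucket = weekend if ts.weekday() >= 5 else weekday
        bucket[ts.hour] += 1
    return [weekday[h] for h in range(24)], [weekend[h] for h in range(24)]


if __name__ == "__main__":
    wd, we = hourly_rentals([datetime(2014, 7, 7, 8, 15),    # Monday morning
                             datetime(2014, 7, 6, 14, 30)])  # Sunday afternoon
    print(wd[8], we[14])  # 1 1
```

The counts are raw; dividing each list by its sum gives proportions if a normalized histogram is preferred.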
2.2 Weather dynamics
Weather conditions appear to have a large impact on Citi Bike activity.
Figure 2 shows the relationship between system-wide daily rentals and
average daily temperature. The relationship is roughly linear and, interestingly,
shows no noticeable decline in trips at high temperatures. At extremely low
temperatures, below 30°F, the number of trips approaches zero.
Figure 2: System-wide daily rentals (No. of Trips) versus average daily temperature (°F).
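One way to put a number on the "roughly linear" relationship in Figure 2 is an ordinary least squares fit of daily trips on average daily temperature. The sketch below uses only the standard closed-form slope and intercept; `fit_line` is a hypothetical helper, and no Citi Bike figures are included.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y ~ a + b*x; returns (intercept a, slope b)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope
```

Because trips approach zero below 30°F, a single line fitted over all days may predict negative counts at the cold end, so it may be more sensible to fit only over the temperature range where the relationship looks linear.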
Methods
3.1
We define $X^{\mathrm{out}}_{slt}$ and $X^{\mathrm{in}}_{slt}$ as the outflows and inflows, respectively, through station
$s$ for one hour, where
(1)
(2)
(3)
(4)
(5)
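The surviving text defines $X^{\mathrm{out}}_{slt}$ and $X^{\mathrm{in}}_{slt}$ only as hourly outflows and inflows through station $s$; the roles of the indices $l$ and $t$ are not recoverable here. The sketch below is one plausible way to tabulate such counts from trip records, assuming (hypothetically) that a rental counts as an outflow in the hour it starts and a return counts as an inflow in the hour it ends, with counts keyed by station, date, and hour. The function name and tuple layout are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def station_hour_flows(trips):
    """Count outflows and inflows per (station, date, hour).

    trips: iterable of (origin, destination, start, duration) tuples, where
    start is a datetime and duration a timedelta (assumed layout).
    A trip is an outflow at its origin in the hour it starts and an inflow
    at its destination in the hour it ends (start + duration).
    """
    out_counts, in_counts = Counter(), Counter()
    for origin, destination, start, duration in trips:
        end = start + duration
        out_counts[(origin, start.date(), start.hour)] += 1
        in_counts[(destination, end.date(), end.hour)] += 1
    return out_counts, in_counts


if __name__ == "__main__":
    out_c, in_c = station_hour_flows(
        [("A", "B", datetime(2014, 7, 7, 8, 50), timedelta(minutes=20))])
    print(out_c, in_c)
```

Keying on the end time rather than the start time for inflows means a trip that crosses an hour boundary is attributed to the hour in which the bike actually arrives.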
3.2 Communities of Stations
3.3
$$\frac{\sum_{j \neq n} T[j, n] \, \sum_{k \neq m} T[m, k]}{N(N-1)} \tag{6}$$
Discussion
Conclusion
References
[1] Citi Bike. System data. http://www.citibikenyc.com/system-data. Accessed: September 15, 2014.