Abstract: WiFi networks play a significant role in providing today's wireless connectivity; therefore, understanding and improving WiFi network performance is important for today's mobile applications and services. Previous studies conducted to investigate WiFi network performance have generally been performed using specific types of WiFi networks in relatively small areas and have been limited by either the scale of the studied WiFi access points (APs) or by the number of users. This paper describes a country-wide measurement study on WiFi network performance with the goal of determining which factors affect the quality of service (QoS) of high-density WiFi networks. We use a crowdsourced approach to study both the latency and bandwidth experienced by users in different types of WiFi networks. Our findings indicate that (1) WiFi network performance is correlated not only with signal strength but also with factors such as time; (2) the latency and bandwidth experienced by users generally exhibit different patterns and are affected differently by the studied factors; and (3) users experience significantly different QoS from different nearby APs, which suggests that providing users with better information when choosing a WiFi connection can yield more satisfactory results. We also provide insights on how such findings can be utilized to improve AP deployments and optimize association strategies.

Keywords-WiFi; QoS; Data Mining; Latency; Bandwidth

I. INTRODUCTION

Smart mobile devices, such as smartphones and tablets, have become increasingly popular among users. These devices allow users to enjoy the mobile Internet more often than before. WiFi networks have become one of the most popular choices for users to surf the Internet [1] while on the move. Currently, WiFi networks can be found everywhere, from retail stores to hotels, and are used to deliver large amounts of multimedia traffic such as video/audio streaming, online gaming, and so forth. These uses indicate that WiFi networks play a significant role in providing today's wireless connectivity. When many WiFi networks are available in a city, studying network performance and determining which factors affect the WiFi quality of service (QoS) are important problems. Solving these problems will provide us with opportunities to better understand and improve network performance and to recommend better WiFi choices for users so they can achieve better QoS.

In previous studies, network performance measurements and the determination of which factors influence WiFi QoS have generally been performed using specific types of WiFi networks in relatively small areas. Chen et al. [2] measured the network performance of a university campus WiFi network and identified the dominant factors that affect network performance. Divgi et al. [3] focused on commercial WiFi networks in Australia and presented some user activity characteristics. Ghosh et al. [4] studied user behaviors when they were associated with public hotspots using AT&T public datasets. Patro et al. [5] measured the user experience using OpenWrt-based access points. Farshad et al. [6] measured the density of access points and interferences among neighboring access points in the city. Some other large-scale studies have been performed, but, in these studies, the WiFi networks were typically controllable and operational. Biswas et al. [7] monitored large-scale wireless links using customized access points that were controllable with dedicated radio hardware (e.g., Meraki MR18). Sui et al. [8] deployed the lightweight WiFiSeer framework into large-scale operational WiFi networks on a campus to characterize and improve WiFi latency.

However, compared with previous studies, studying network performance and mining the factors that affect WiFi QoS on a large scale in cities remain challenging. There are multiple reasons for this challenge. First, collecting large-scale WiFi datasets in metropolitan areas is difficult because i) WiFi networks are composed of many different types of WiFi networks, including private, public, residen-

[Figure 1 plot: CDF (y-axis) versus number of hotspots in each cell (x-axis, 1 to 300) for Beijing, Shenzhen, Shanghai and Guangzhou.]
Figure 1. The Density of WiFi Networks in Four Cities in China. Using collected WiFi datasets, we divide the areas of these four cities into 100 x 100-meter cells and count the number of WiFi hotspots in each cell to calculate the density and draw the cumulative distribution function (CDF) curves. Nearly 50 percent of the cells in each city have more than 5 WiFi hotspots, indicating that the density of WiFi networks in cities is high.
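The cell-counting procedure described in the Figure 1 caption can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes that hotspot positions have already been projected to planar coordinates in meters and that only cells containing at least one hotspot are counted, neither of which is stated in the text. The function names and sample points are hypothetical.

```python
import math
from collections import Counter

CELL_M = 100.0  # cell edge length in meters, as in the Figure 1 caption


def cell_of(x_m, y_m, cell=CELL_M):
    """Map planar coordinates (meters) to the index of a 100 x 100 m cell."""
    return (math.floor(x_m / cell), math.floor(y_m / cell))


def hotspots_per_cell(points):
    """Count the hotspots that fall into each occupied cell."""
    return Counter(cell_of(x, y) for x, y in points)


def empirical_cdf(counts, thresholds):
    """Fraction of occupied cells whose hotspot count is <= each threshold."""
    values = list(counts.values())
    return {t: sum(v <= t for v in values) / len(values) for t in thresholds}


# Invented hotspot positions (meters); real input would be the crowdsourced data.
sample_points = [(10, 10), (20, 30), (150, 40), (160, 60), (170, 80), (950, 950)]
counts = hotspots_per_cell(sample_points)
print(empirical_cdf(counts, thresholds=[1, 2, 5]))
```

Reading the resulting CDF at a threshold of 5 gives the share of cells with at most 5 hotspots; one minus that value corresponds to the "more than 5 hotspots" share quoted in the caption.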
To deploy this crowdsourced WiFi association system, we collaborate with our industrial partner, the Tencent WiFi team, which runs a popular APP installed on smart mobile devices. By combining the crowdsourced approach with this APP, we can collect users' hotspot associations and WiFi network environment whenever the APP is activated by users. Note that users have allowed the actions of this APP and that we strictly protect user privacy. Specifically, the APP measures the following information: (1) latency, such as the connection time cost, which is the time elapsed between sending an association request and receiving the response, and (2) bandwidth, including both download and upload speeds, which is measured by downloading/uploading a 5 Mb file from/to the testing CDN servers.

B. WiFi Datasets

1) The Whole Datasets: The WiFi datasets were collected from 4 representative cities (i.e., Beijing, Shanghai, Shenzhen and Guangzhou) in China and are composed of two parts: latency data and bandwidth data, as shown in Table I. The latency data records are from Nov. 27 to Dec. 10, 2015, and the bandwidth data records are from Nov. 27 to Dec. 27. These WiFi datasets have as many as ten million records in each class (i.e., users and hotspots), making it possible to study the factors that influence WiFi QoS on a large scale. This information also indirectly indicates that WiFi network density in Chinese cities is high.

Table I
THE INTRODUCTION OF WIFI DATASETS

Items      Users       Hotspots    Sessions    Days
Latency    6,812,933   12,812,420  38,444,958  14
Bandwidth  4,119,049   4,676,806   11,409,178  30
Total      10,931,982  17,489,226  49,854,136  30*
* All days range from Nov. 27 to Dec. 27, 2015.

2) WiFi Coverage: We investigate how representative cities are covered by WiFi networks. Based on the locations of the WiFi hotspots, we calculate the area in these cities covered by WiFi networks. We assume that a WiFi hotspot can cover a circular area with a radius of 30 meters. Table II lists the WiFi-covered areas in different cities, both in km2 (i.e., the size of the area that is covered by WiFi networks) and as a fraction (i.e., the WiFi-covered area divided by the entire city area). This observation suggests that cities with higher population densities usually have higher rates of WiFi availability.

Table II
WIFI COVERAGE IN REPRESENTATIVE CITIES IN CHINA
THE SIZE OF EACH CITY IS UPDATED IN 2015.

City       WiFi covered area (km2)  Coverage fraction
Beijing    1357.47                  0.083
Guangzhou  950.04                   0.128
Shanghai   1396.40                  0.220
Shenzhen   535.19                   0.274

3) Factors: By analyzing the WiFi datasets, we select 6 core factors from the data items to mine the influences: signal strength, SNR, ISP, number of connections, ping speed, and time (Table III). In Table III, the "Used" column indicates whether these factors are used in latency or bandwidth; an "L" indicates latency and a "B" indicates bandwidth. These four factors (i.e., SNR, ISP, number of connections and ping speed) are used separately in latency mining and bandwidth mining because some (i.e., number of connections and ping speed) are recorded in the latency dataset and the others are recorded in the bandwidth dataset.

Table III
THE FACTORS OF WIFI DATASETS FOR MINING

ID  Factors                          Value Ranges       Used
1   Signal Strength                  0,1,2,3,4          L B
2   Signal-to-noise Ratio (SNR)      0,1,2,3,4,5,6,7,8  B
3   Internet Service Provider (ISP)  8 classes*         B
4   Number of Connections            0-255              L
5   Ping Speed                       ≥ 0 ms             L
6   Time                             24 Hours           L B
* Includes 46000,46001,46002,46003,46007,46011,20404,45412.

4) QoS: To study the factors that influence WiFi network QoS, we choose 2 classic metrics, latency and bandwidth, to measure the WiFi network QoS. To measure latency, we select the latency factor, which includes the time cost of pinging from users' smart mobile devices to CDN servers. For bandwidth, download and upload speeds are chosen, which provide information about the transmission speed from users' smart mobile devices to CDN servers.

C. Preprocessing

However, continuous values are not appropriate for data mining algorithms such as decision tree. Therefore, we first discretize the value ranges of the latency and download speed. As illustrated in Figure 6(a), we find that upload speed and download speed have nearly the same distribution; thus, we choose download speed as a representative case.

Table IV
DISCRETIZATIONS OF LATENCY AND BANDWIDTH

ID  Classes       Latency             Bandwidth
0   Ex Fast/Low   < 50 ms (16.7%)     < 40 Kbps (8.3%)
1   Fast/Low      50-130 ms (24.9%)   40-500 Kbps (32.7%)
2   Slow/High     130-300 ms (28.3%)  0.5-1.6 Mbps (36.6%)
3   Ex Slow/High  > 300 ms (30.1%)    > 1.6 Mbps (22.4%)

To perform discretization, we draw the CDF curves of latency and download speed as shown in Figure 3. After comparing equal-frequency discretization and equal-width discretization, we decide to discretize these value ranges based on real-life demands. As shown in Table IV, the
value ranges for latency and download speed are both split into four classes. For latency, we adopt a threshold value of 50 ms to represent extremely fast performance ("Ex Fast"); this performance level allows users to surf the Internet more satisfactorily. For download speed, we find that approximately 22.4 percent of bandwidth records exceed 1.6 Mbps; this speed allows files to be downloaded easily and is represented by "Ex High". Using this discretization approach, network performance can be suitably characterized.

[Figure 3: two panels of CDF curves (latency and download speed); plot not recoverable.]

We must be careful when choosing machine learning algorithms to mine the rules: the selected algorithms should be able to not only tackle the complex relationships and

Table V
MAINSTREAM MULTI-CLASSIFICATION METHOD ACCURACIES

Methods        Latency     Bandwidth
Decision Tree  0.39658362  0.431763571
Extra Trees    0.31634051  0.438247991
AdaBoost       0.39581812  0.432840988
Random Forest  0.39617334  0.432116284
LDA            0.39602068  0.425746223
KNN            0.29447991  0.338264691
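To make the four-class discretization of Table IV concrete, and to give the accuracies in Table V a point of comparison, the sketch below maps raw measurements to class IDs and computes the accuracy of always predicting the most frequent class. It is an illustration under stated assumptions, not the authors' pipeline: the handling of values that fall exactly on a boundary and the conversion 1.6 Mbps = 1600 Kbps are assumptions, and all function names are hypothetical.

```python
from bisect import bisect_right

# Class boundaries read from Table IV. The text does not say which class a
# value exactly on a boundary belongs to; this sketch puts it in the upper
# class (an assumption).
LATENCY_BOUNDS_MS = [50, 130, 300]
BANDWIDTH_BOUNDS_KBPS = [40, 500, 1600]  # assumes 1.6 Mbps = 1600 Kbps

LATENCY_CLASSES = ["Ex Fast", "Fast", "Slow", "Ex Slow"]
BANDWIDTH_CLASSES = ["Ex Low", "Low", "High", "Ex High"]

# Share of records per class as reported in Table IV (class IDs 0..3).
LATENCY_SHARES = [0.167, 0.249, 0.283, 0.301]
BANDWIDTH_SHARES = [0.083, 0.327, 0.366, 0.224]


def discretize(value, bounds):
    """Return the class ID (0..3) of a continuous measurement."""
    return bisect_right(bounds, value)


def majority_baseline(shares):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(shares)


print(LATENCY_CLASSES[discretize(42, LATENCY_BOUNDS_MS)])
print(BANDWIDTH_CLASSES[discretize(2000, BANDWIDTH_BOUNDS_KBPS)])
print(majority_baseline(LATENCY_SHARES), majority_baseline(BANDWIDTH_SHARES))
```

If the evaluation data follow the class shares in Table IV, the majority-class baseline is about 0.30 for latency and 0.37 for bandwidth, which helps put the decision tree accuracies of roughly 0.40 and 0.43 in Table V into perspective.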
[Figure 4 panels: (a) Information Gain of Latency; (b) Ratio of Information Gain for Latency; (c) Pearson Coefficient of Latency; (d) Kendall Coefficient of Latency; (e) Information Gain of Bandwidth; (f) Ratio of Information Gain for Bandwidth; (g) Pearson Coefficient of Bandwidth; (h) Kendall Coefficient of Bandwidth.]
Figure 4. The influences of six core factors on WiFi latency and bandwidth. We use statistical approaches and feature selection methods to calculate these influences. Figures (a), (b), (c), and (d) show the latency results, while (e), (f), (g), and (h) show the bandwidth results. The X-axis in all the images represents the date.
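As a rough illustration of the four measures plotted in Figure 4, the sketch below implements their textbook definitions: information gain (IG), information gain ratio (IGR), the Pearson correlation coefficient (PCC) and the Kendall rank correlation coefficient (KRCC). Several details are assumptions: the tau-b variant of Kendall's coefficient (which corrects for ties) is used because the factors are discrete, but the text does not state which variant was applied; entropy is measured in bits; and the sample records are invented.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, target):
    """IG = H(target) - H(target | feature)."""
    groups = defaultdict(list)
    for x, y in zip(feature, target):
        groups[x].append(y)
    n = len(target)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(target) - conditional


def gain_ratio(feature, target):
    """IGR = IG / H(feature); returned as 0.0 for a constant feature."""
    split_info = entropy(feature)
    return information_gain(feature, target) / split_info if split_info > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation; returned as 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def kendall_tau_b(xs, ys):
    """Kendall tau-b via pairwise comparison; O(n^2), for small samples only."""
    concordant = discordant = only_x = only_y = 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = xs[i] - xs[j], ys[i] - ys[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables
            if dx == 0:
                only_x += 1  # tied only in x
            elif dy == 0:
                only_y += 1  # tied only in y
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + only_x) * (untied + only_y))
    return (concordant - discordant) / denom if denom else 0.0


# Invented records: signal strength level (0..4) and latency class ID (0..3).
signal = [0, 1, 2, 3, 4, 4, 3, 2]
latency_class = [3, 3, 2, 2, 0, 1, 1, 2]
print(information_gain(signal, latency_class), gain_ratio(signal, latency_class))
print(pearson(signal, latency_class), kendall_tau_b(signal, latency_class))
```

IG and IGR capture any statistical dependence between a factor and the class label, whereas PCC and KRCC only capture linear and monotonic relationships, respectively; a factor can therefore rank high on IG while showing a near-zero correlation coefficient.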
C. Metrics of WiFi QoS

In this part, we present two traditional metrics, latency and bandwidth, to measure WiFi QoS.

1) Latency: WiFi latency [8] is a critical performance measure for modern real-time interactive mobile Internet applications, including online games, instant messaging and live collaborative applications. Using the factors in Table III, we study how much influence four factors, namely, signal strength, number of connections, ping speed and time, have on WiFi latency.

2) Bandwidth: Bandwidth is another core metric for measuring WiFi QoS. For users streaming video or audio (activities that consume large amounts of data), bandwidth is more important than latency. In this paper, we select two metrics (download speed and upload speed) to represent bandwidth.

IV. WIFI DATA MINING EXPERIMENTS

In this section, we use IG/IGR feature selection methods, PCC/KRCC statistical approaches and decision tree to perform mining experiments to investigate the effects of the selected factors on latency and bandwidth and show results.

A. Latency

With the latency dataset, this subsection presents studies on the influences of four factors on WiFi latency. As illustrated in Figure 4(a) and Figure 4(b), in both measurements, the four factors influence latency in the following order of importance: ping speed > signal strength, time > number of connections. Ping speed has the largest degree of influence on latency, providing good evidence for the assertion above. The degrees of influence of signal strength and time are nearly the same, whereas the number of connections is nearly insignificant. As we all know, the time cost equals the length of the path divided by the speed. When the number of network hops from users' smart mobile devices that are connected to the same access point to the same CDN server is nearly the same, latency is highly correlated with the ping speed.

From Figure 4(c) and Figure 4(d), we observe that the Pearson and Kendall correlations between the four factors and latency are relatively small. As Figure 4(c) shows, the Pearson correlations are very small, below 0.01, which indicates almost no correlation, while Figure 4(d) shows that the Kendall correlations are also small. The Kendall correlations between latency and signal strength on some days are greater than 0.25, but overall, these values do not indicate a strong positive correlation.

From the above analyses, the latency mining can be summarized as follows: the PCC and KRCC tests show that these factors do not have obvious linear correlations with WiFi latency, whereas the IG/IGR research results indicate that ping speed has a greater influence on WiFi latency than the other three factors.

B. Bandwidth

Bandwidth is another core metric for measuring WiFi QoS. As illustrated in Figure 4(e) and Figure 4(f), the study results show that the four factors influence bandwidth in the following order: time > SNR > signal strength > ISP. Time has the highest correlation with bandwidth because the same access point provides only a certain amount of total bandwidth, and people use WiFi networks regardless of their current behavior (e.g., working, shopping or resting). The reason why the SNR factor has more influence than
the signal strength factor is that the SNR reflects the signal strength relative to the noise that causes bandwidth losses. Although WiFi networks are constructed by different ISPs, the ISP factor has little influence on bandwidth because the total bandwidth offered by the ISP is far greater than the bandwidth available through a WiFi access point; in other words, the bottleneck is the AP, not the ISP. However, the total traffic on a given ISP trunk line might explain the slight influence that ISP has in this study.

In Figure 4(g), the mining result from the PCC statistical approach shows that the Pearson correlation values are very small, indicating that the four factors have no obvious linear correlations with WiFi bandwidth. In Figure 4(h), the KRCC [...] the other factors, indicating that signal strength has a slight [...] bandwidth, while the KRCC results show that signal strength has a small correlation with bandwidth. The IG/IGR research results indicate that time has the largest influence on WiFi bandwidth while SNR has the second largest, and both have more influence than the other factors.

C. Signal Strength

We use the signal strength reported by the users through the WiFi connection session traces and associate it with the reported QoS, which includes users' perceptions of latency and download speed. To study the correlation between signal strength and the QoS metrics, we plot the CDFs of the average latencies experienced by users at different signal strength levels in Figure 5(a) and the CDFs of the average download speeds experienced by users at different signal strength levels in Figure 5(b). The impact of signal strength on bandwidth is obvious (i.e., a higher signal level leads to a larger bandwidth speed), indicating that the bottleneck is usually the last hop, i.e., the wireless hop. Thus, it is important to improve the deployment of WiFi networks to reduce interference. In contrast, the impact of signal strength on the latency experienced by users is less obvious. Compared with download speed, the latency caused in a network has a significant impact on QoS [9].

[Figure 5 panels: (a) The Influence on Latency, CDF versus latency (ms); (b) The Influence on Bandwidth, CDF versus download speed (Kbps); one curve per signal level 0 to 4.]
Figure 5. The Influence of Signal Strength on Latency and Bandwidth.

D. Combining Latency with Bandwidth

The factors shown in Table III could be used in both latency and bandwidth studies by using the same unique BSSID of the same hotspot to combine the WiFi datasets. Based on this idea, we research the correlation between latency and bandwidth at the same access point and obtain the results shown in Figure 6(b). As shown in Figure 6(b), the latency and bandwidth of the same hotspot are nearly independent; in other words, users enjoying good latency/bandwidth are doing so randomly to some extent.

[Figure 6 panels: (a) Distribution of WiFi bandwidth, CDF of download speed and upload speed; (b) Correlation between Latency and Bandwidth, download speed (bps) and ping latency (ms).]
Figure 6. Mining the correlation between latency and bandwidth combined: (a) the distributions of WiFi download speed and upload speed are nearly the same; therefore, we choose download speed to represent bandwidth. We use the same unique BSSID to select the top 72 APs that appear in both the latency and bandwidth datasets to draw the curves in Figure (b).

E. Decision Tree Method

As shown in Table V, we chose the decision tree method (Figure 7) to model the effects of six factors on WiFi QoS.

In Figure 7(a), the decision tree method achieves an accuracy of 0.4318 in classifying the bandwidth based on four factors. We know that signal strength has an obvious impact on bandwidth. However, from Figure 7(a), we can see the impact of other factors as well: (1) The smaller the signal strength is, the lower the bandwidth is, as shown in the left-bottom and right-bottom branches; (2) Time is associated with user behavior. Here, the value 63 means 21:00 hours. As the figure shows, the time mainly affects the bandwidth around the time before 21:00 and after 21:00 (i.e., after 21:00 hours, a signal strength of <= 3.5 results in "Low" bandwidth, otherwise it results in "High" bandwidth); (3) As indicated in the footnote for Table III, the "2" ISP indicates the "46002" class. ISP 3 offered "Ex High" bandwidth when the signal strength was > 2.5 and before approximately 21:00; and (4) The SNR factor provides a chance to diagnose the "Ex Low" and "Low" classes shown in the right-bottom branches. The ways in which these factors affect bandwidth [...] consideration when diagnosing the bandwidth.

In Figure 7(b), the decision tree approach achieves an accuracy of 0.3966. By analyzing Figure 7(b), we can find the following. (1) Ping speed is the deciding factor in many branches, indicating that it is important for latency. (2) In most cases, when the ping speed is large, the latency
is "Slow" or "Ex Slow". (3) However, in the right-bottom branch, when the ping speed <= 1271.5ms (the prerequisite is that signal strength must be > 2.5 and the ping speed > 168.5ms), the latency is classified into the "Ex Fast" class. (4) Finally, in the left-bottom branch, the factor "number of connections" has an important role in determining the latency class, i.e., when the number of connections is <= 29, the latency belongs to the "Ex Fast" class; otherwise it belongs to the "Ex Slow" class. These results indicate that smart mobile devices connecting to the same access point will compete for bandwidth resources. From the above findings, we know that the influences of various factors are quite complex; the same factor will contribute to different results under different conditions.

[Figure 7: decision trees for (a) bandwidth and (b) latency. Internal nodes: Signal Strength, ISP, Time, SNR, Ping Speed, Number of Connections. Leaf classes: Ex Low, Low, High, Ex High (bandwidth); Ex Fast, Fast, Slow, Ex Slow (latency).]

Based on these study results, we summarize some suggestions to improve AP deployments and optimize association strategies. When selecting a WiFi network to achieve high bandwidth, we recommend giving more weight to the time and SNR factors. The influence of SNR on WiFi QoS is larger than the influence of the signal strength. This knowledge provides opportunities to improve access point deployment to decrease interference among high-density WiFi networks in cities. When selecting a WiFi network to achieve low latency, we recommend connecting to the WiFi network that has the fastest real-time ping speed (i.e., the lowest ping time) and the smallest number of connections to achieve better QoS.

V. RELATED WORKS

A. Usage and Measurement of WiFi Networks

802.11-based WiFi networks have emerged as an attractive solution that can provide network connectivity in places where individuals spend considerable amounts of time. Several studies have investigated WiFi network usage. Afanasyev et al. [10] investigated the role that city-wide WiFi deployments play in the increasingly diverse access network spectrum and observed that a diverse set of mobility patterns map well to the archetypal use cases for traditional access technologies. Efstathiou et al. [11] proposed a decentralized approach for WiFi sharing, in which private WiFi networks that are in range but belong to others would provide Internet access to mobile users. Soroush et al. [12] studied how mobile users utilize dense deployments of WiFi APs for concurrent WiFi connections. In particular, the practical issues of access-point discovery and DHCP lease acquisition using a single wireless channel were investigated through a prototype. The access point association decision is generally made by the client locally and selfishly. This behavior has been analyzed through game theory in different network settings [13]. Biswas et al. [14] studied wireless network behaviors using traces collected from a cloud-based network management system.

B. WiFi Network Performance and Improvement

Some efforts have been devoted to WiFi network performance. Gupta et al. [15] studied the factors responsible for the poor performance of dense WiFi networks and found that traffic asymmetry is a major factor in performance degradation in such environments. The limited number of orthogonal channels in 802.11 wireless networks results in overlapped channels among multiple access points, a situation known as co-channel access points [16] [17]. These access points inevitably suffer from higher interference, more collisions and, consequently, sub-optimal throughput. Sundaresan et al. [18] observed that Internet access links can significantly affect the performance users achieve because different ISPs use different policies and traffic-shaping strategies and there is no "best" ISP for all users.

C. AP selection

There are many approaches to AP selection. Currently, most devices only utilize a subset of the available information to make choices based on simple assumptions concerning wireless performance. The traditional preference is to connect to access points with stronger RSSI (a common approach [19]), but stronger signal strength does not ensure better performance [20]. Some implementations utilize historical or actively measured client-side information in addition to RSSI [21]. However, it is time-consuming and bothersome for each user device to test each nearby AP.
VI. CONCLUSION

In this paper, we focus on mining the factors that influence the QoS of urban high-density WiFi networks. We create a crowdsourced approach that is combined with a popular APP installed on smart mobile devices to collect WiFi datasets easily in four representative cities in China. The large scales of these WiFi datasets are helpful in overcoming the limitations of WiFi mining. To the best of our knowledge, we are the first to conduct a country-level measurement study on WiFi network performance and investigate which factors affect the urban high-density WiFi latency and bandwidth. First, to understand the WiFi network performance, we select six core factors that determine the QoS metrics and use feature selection methods and statistical approaches to study the correlations. Second, we choose the decision tree method to model the rules based on these factors in order to guide improvements to AP deployments and the optimization of association strategies. Finally, for users who need to achieve low latency and high bandwidth simultaneously, our mining results show that these two targets cannot be simultaneously guaranteed; however, we will study this problem more deeply in the future.

ACKNOWLEDGMENT

We would like to thank our industrial partner, the Tencent WiFi team, for providing the WiFi datasets. This work is supported in part by funding from the Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology.

[8] K. Sui, M. Zhou, D. Liu, M. Ma, D. Pei, Y. Zhao, Z. Li, and T. Moscibroda, "Characterizing and improving wifi latency in large-scale operational networks," in Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, ser. MobiSys '16, 2016.
[9] C. Ly, C.-H. Hsu, and M. Hefeeda, "Improving Online Gaming Quality using Detour Paths," in ACM International Conference on Multimedia (Multimedia), 2010.
[10] M. Afanasyev, T. Chen, G. M. Voelker, and A. C. Snoeren, "Usage patterns in an urban wifi network," Networking, IEEE/ACM Transactions on, vol. 18, no. 5, 2010.
[11] E. C. Efstathiou, P. A. Frangoudis, and G. C. Polyzos, "Controlled wi-fi sharing in cities: A decentralized approach relying on indirect reciprocity," Mobile Computing, IEEE Transactions on, vol. 9, no. 8, pp. 1147-1160, 2010.
[12] H. Soroush, P. Gilbert, N. Banerjee, B. N. Levine, M. Corner, and L. Cox, "Concurrent wi-fi for mobile users: analysis and measurements," in Proceedings of the Seventh COnference on emerging Networking EXperiments and Technologies. ACM.
[13] W. Xu, C. Hua, and A. Huang, "Channel assignment and user association game in dense 802.11 wireless networks," in IEEE International Conference on Communications, 2011, pp. 1-5.
[14] S. Biswas, J. Bicket, E. Wong, R. Musaloiu-E, A. Bhartia, and D. Aguayo, "Large-scale measurements of wireless network behavior," in Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication. ACM.