
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 26, NO. 11, NOVEMBER 2015

Source Selection and Content Dissemination for Preference-Aware Traffic Offloading
Hsueh-Hung Cheng and Kate Ching-Ju Lin, Member, IEEE

Abstract: As mobile devices become more ubiquitous, the amount of cellular traffic for multimedia content grows explosively. Data dissemination through proximity-based opportunistic communications has therefore attracted the attention of service providers who are eager for traffic offloading solutions. In this paper, we propose PrefCast, a preference-aware opportunistic content dissemination protocol that uses as little cellular bandwidth as possible to maximally satisfy user preferences for content objects. The efficiency of PrefCast depends on 1) how the base-station selects initial sources, and 2) how each user forwards objects within a limited contact duration. Since mobile users typically form communities and have heterogeneous preferences, PrefCast's base-station selects sources that efficiently produce the maximal utility for their communities. We then derive a model to predict how much utility a forwarder can contribute to future contacts. PrefCast's users can hence use this prediction to find, in a distributed way, the optimal forwarding schedule that maximizes the utility contribution. Our trace-based evaluation shows that, without explicit source selection, PrefCast produces 15.7 and 22.6 percent higher average utility than protocols that consider only contact frequency or only the preferences of local contacts, respectively. Enabling source selection in PrefCast further improves the utility by 49.3 percent.

Index Terms: Opportunistic dissemination, source selection, traffic offloading, user preference

The authors are with the Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan. E-mail: {xuehung, katelin}@citi.sinica.edu.tw.
Manuscript received 24 Mar. 2014; revised 7 Oct. 2014; accepted 9 Oct. 2014. Date of publication 16 Oct. 2014; date of current version 7 Oct. 2015.
Recommended for acceptance by Y. Wang.
For information on obtaining reprints of this article, please send e-mail to: reprints@ieee.org, and reference the Digital Object Identifier below.
Digital Object Identifier no. 10.1109/TPDS.2014.2363652
1045-9219 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

1 INTRODUCTION

WITH the growing popularity of personalization applications, clients prefer to access multimedia content based on their personal interests via mobile devices. A previous study [1] has reported that mobile data traffic of AT&T has increased 5,000 percent in the past few years. A major portion of this traffic is contributed by content dissemination. One way to reduce the bandwidth requirement of this application-layer content distribution service is to exploit cellular multicast protocols, e.g., MBMS or eMBMS. The efficiency of MBMS/eMBMS, however, could decrease when users join the system at different times and require the base-station to broadcast again whenever a new user joins. To eliminate this burden, many prior works [2], [3], [4] have shown the benefit of leveraging mobile social networks (MSNs) to help offload cellular traffic via opportunistic communications for content dissemination applications.

Opportunistic communication [5], [6] is a technique that allows users without a permanent connection to communicate over a low-cost proximity-based connection, such as Wi-Fi or Bluetooth, when they encounter each other opportunistically [7]. Thus, instead of unicasting the content object to every subscriber via cellular connection, the cellular base-station can deliver content objects to a subset of the subscribers, called initial sources, and have the initial sources propagate the objects to all the subscribers through opportunistic forwarding in a mobile social network. The efficiency of such opportunistic traffic offloading is mainly determined by two key factors: 1) selection of initial sources, and 2) opportunistic forwarding strategy. Most previous studies [1], [2], [3], [4] address these two challenges only according to the contact frequency between mobile users, and do not consider heterogeneous user preferences for various content objects. Therefore, this paper aims to develop a preference-aware opportunistic content dissemination system that addresses the above two design issues with consideration of user preferences.

Efficient offloading relies on a suitable set of initial sources that can quickly distribute content objects over an MSN. An intuitive idea is to select the users with a high probability of encountering other users as initial sources. This simple solution, however, neglects the structure of a social network. Specifically, if the base-station delivers the same object to sources belonging to the same community, users in different communities could hardly obtain the object through opportunistic communications. A more desirable solution is to deliver objects to as few sources as possible, while ensuring that the selected sources belong to different communities and can help distribute content over the entire mobile social network. In addition, since different communities might be interested in different types of content, source selection should also consider the heterogeneous interests of communities. The first goal of this work is hence to propose a community-based source selection scheme that takes both the community structure of an MSN and user preferences into account.

Once the objects are delivered to initial sources, users can exploit opportunistic communications to carry-and-forward the objects. In an MSN, due to user mobility and varying network conditions, a connection between users usually exists only for a short period of time. To improve forwarding efficiency within a limited contact duration, each user should

carefully schedule when to deliver which content objects. Intuitively, a user can forward objects merely according to the preferences of its local contacts. This strategy is, however, sub-optimal because any local contact can further distribute the received objects to its future contacts until the objects expire. Therefore, a forwarder should consider not only the preferences of its local contacts, but also the overall utility gained by future contacts. Our second goal is hence to design a forwarding strategy that maximizes the utility of all users, including both local contacts and future contacts.

In this paper, we propose PrefCast, a preference-aware content dissemination protocol that offloads increasing cellular traffic via opportunistic communications, while achieving the maximal global utility of all mobile users. The rationale of PrefCast's design is to use as few cellular bandwidth resources as possible, but efficiently exploit limited contact opportunities to maximally satisfy users' preferences. To this end, PrefCast considers both the community structure of an MSN and heterogeneous user preferences to address two issues: initial source selection and forwarding scheduling. Our contributions are as follows.

• PrefCast enables community-based source selection that considers the importance of selected sources in the communities of an MSN. It hence achieves the trade-off between maximizing user preferences and offloading cellular traffic.
• PrefCast allows each user in an MSN to predict the total future utility it can contribute by forwarding an object in a given time period. Given such prediction, each user then finds its optimal forwarding schedule that generates the maximal global utility in a distributed manner.
• We use trace-based simulations to evaluate the performance of PrefCast. The results show that, without explicit source selection, the proposed forwarding scheduling scheme produces 15.7 and 22.6 percent higher average utility than protocols that consider only contact frequency or only the preferences of local contacts, respectively. Enabling the proposed community-based source selection in PrefCast further improves the utility by 49.3 percent.

The remainder of this paper is organized as follows. Section 2 summarizes existing works on cellular traffic offloading and content dissemination through opportunistic communications. Section 3 gives an overview of PrefCast. Sections 4 and 5 describe the details of the proposed source selection algorithm and forwarding scheduling scheme, respectively. In Section 6, we evaluate the performance of PrefCast using real trace data. Finally, Section 7 concludes this work.

2 RELATED WORK

In this section, we review the related work in three categories: cellular traffic offloading, opportunistic communications, and interest-based content distribution.

2.1 Traffic Offloading
The concept of offloading cellular traffic through opportunistic communications is proposed in [1], [2], which study how to select a fixed number of initial sources. These works, however, do not discuss how to efficiently distribute objects in an opportunistic network. Baier et al. [3] utilize movement prediction of mobile users to improve the performance of opportunistic traffic offloading. Lu et al. [8] then propose an approximation algorithm to identify a small number of individuals to diffuse the information to the entire network as soon as possible. Our prior work [4] investigates a source selection scheme that further takes social communities and forwarding probability in an opportunistic network into account. In addition to the above offloading protocols, community detection algorithms, e.g., [9], [10], can also be modified to select sources for traffic offloading. For example, an intuitive modification is to select a few sources from each community to ensure full coverage. The above works consider a scenario where the cellular network wants to opportunistically disseminate a single content object to as many subscribers as possible, without considering users' heterogeneous interests in different content objects. This work, in contrast, focuses on offloading traffic load by disseminating content to interested users through opportunistic communications. On the other hand, all the above works, including our previous work, do not take user incentive into account, which is also one of the design challenges in mobile social networks. While our work takes the first step to consider user preference in joint source selection and content dissemination, incorporating recent incentive strategies, e.g., [11], [12], [13], into our design is a natural next step to improve the efficiency and practicality of our protocol.

2.2 Opportunistic Communications
Opportunistic communication is a routing scheme typically used in Delay Tolerant Networks (DTNs) [14]. Prior works on opportunistic communications can be classified into three classes: unicast, multicast, and content dissemination. Several unicast routing schemes have been proposed to improve the end-to-end delivery ratio or transmission delay in a DTN. For example, Epidemic routing [15] floods messages to every contact until the messages reach the destination. To reduce the overhead of epidemic routing, some later works investigate the trade-off between the number of relays and the delivery probability. In PROPHET [16], each user predicts the contact probability of each node pair, and forwards messages to the node that has the highest contact probability with the destination. Spray and wait routing [17] operates similarly to Epidemic routing, but restricts the number of forwardings for each object in order to reduce the forwarding overhead. Recently, social-based forwarding schemes [18], [19] consider various social network properties, including centrality and communities, and forward data to the nodes playing vital roles in a social network.

Unicast routing protocols have been extended to support multicast in [20], [21], [22]. The approach proposed in [20] attempts to maximize the message delivery ratio for multicast members that join or leave the network dynamically. In [21], [22], given a set of destinations, the authors extend the two-hop relay algorithms used in unicast to support multicast destinations.

Different from the above unicast and multicast protocols that have a set of specific destinations, content dissemination [23], [24], [25], [26], [27], [28] targets distributing objects to multiple destinations. The dissemination protocol proposed in [23] ensures that content delivered to the users is as fresh as possible subject to the limited downlink capacity of the service provider. Another work [24] adapts the forwarding strategy to the temporal utility of subscriptions and events such that information can be delivered to users in time. The work [27] then exploits the concept of geocommunity to improve data broadcasting efficiency. Time-varying channel conditions and user mobility are then jointly considered in [28]. All the above works focus on improving content delivery efficiency without considering user interest.

2.3 Interest-Based Content Distribution
Interest-based content distribution has been considered an important issue in several networks, such as peer-to-peer networks [29], [30] and cellular networks [31]. Recently, some works further study interest-based content distribution in delay tolerant networks [32], [33], [34], [35]. The work [32] investigates how to place data objects in regions where interested users are located. It is, however, not designed from the perspective of cellular traffic offloading, and only leverages random forwarding. Ning et al. [33] consider a network where users are selfish, and propose a credit-based incentive scheme to encourage users to share the objects of interest. A publish/subscribe scheme is proposed in [34] to allow DTN users to advertise or collect content they are interested in. The user-centric scheme [35] attempts to select the minimum number of relays to forward an object to a set of known clients that are interested in that object. Our work, in contrast, focuses on a best effort scenario, where content sources do not explicitly know who the interested users will be, but simply disseminate the objects so that the utility gained by all clients can be maximized. These two types of dissemination designs are both important and necessary for diverse upper-layer applications. Our previous work [36] investigates preference-aware content dissemination, which takes heterogeneous user preferences into account and enables each forwarder to determine its optimal object forwarding schedule that can produce the maximal total utility for all users. Different from our preliminary study, which randomly selects a set of initial sources to disseminate content objects, this work argues that initial sources have a great impact on dissemination efficiency, and jointly solves initial source selection and preference-aware content dissemination.

3 PREFCAST OVERVIEW AND ASSUMPTIONS

We consider an environment where a base-station intends to distribute a set of multimedia content objects $M$ to a set of users $V$ before the deadline. All the users in $V$ form a mobile social network $G = (V, E)$, where $E$ is the set of edges between any two users if they have contact with each other. Each edge $(i, j) \in E$ is associated with the contact frequency $f_{ij}$ between users $i$ and $j$. Each object $m \in M$ has a deadline $T_m$. Suppose each user $i \in V$ has a different preference $u_{i,m}$ for each object $m \in M$. The utility $u_{i,m}$ here is the value of a pre-defined utility function. In particular, an object $m$ is usually associated with several attributes $a_{m,1}, a_{m,2}, \ldots, a_{m,k}$, e.g., singer, genre, language for a music object. Each client might have a different preference $u_{i,m,a}$ for different attributes $a$. An example utility function $g$ estimates the final utility value by calculating the weighted average of the preferences for different attributes, i.e., $u_{i,m} = g(u_{i,m,a}) = \sum_a \gamma_a u_{i,m,a}$, where the weight $\gamma_a$ can be selected by the application developers based on the importance of attribute $a$. Our design only requires the information about the final utility value $u_{i,m}$, but lets the application developers or system operators define their own utility function $g$, which can be a linear or non-linear function. User $i$ can gain the utility $u_{i,m}$ if it retrieves object $m$ before $m$ expires, i.e., before $T_m$. Our goal is to exploit limited cellular bandwidth resources to provide users as high a utility as possible.

3.1 PrefCast's Components
To offload cellular traffic, PrefCast lets the base-station deliver the objects to a subset of mobile users, which are designated as initial sources. Those initial sources then help disseminate the objects to other users in the MSN using opportunistic forwarding. To realize this goal, PrefCast has the following two components: 1) source-object assignment, which is performed by the base-station, and 2) preference-aware forwarding, which is determined by each MSN user in a distributed way. We define the two problems as follows, and describe the detailed design of the two components in Sections 4 and 5, respectively.

Source-object assignment. We consider a scenario where the base-station has multiple objects to distribute. To ensure load balancing, the base-station should select multiple sources to distribute different objects. The tuple of a source-object assignment $(s, m)$ indicates that the base-station selects user $s \in V$ as an initial source of object $m \in M$. Since different content objects typically have different popularity, we allow a popular object to be assigned to multiple initial sources, which can cooperatively disseminate that popular object to more users. The more initial sources the base-station selects to distribute the objects, the more mobile users can obtain the objects opportunistically and gain utility. However, delivering the objects to too many sources also degrades the benefit of cellular traffic offloading. Hence, to balance the trade-off between utility performance and offloading efficiency, the base-station must determine an efficient assignment, i.e., select a suitable set of tuples $A$ from all possible tuples $\{(s, m) : \forall s \in V, \forall m \in M\}$. The objective of source-object assignment is to maximize the total preference utility for all mobile users, while keeping the number of deliveries through cellular connection, i.e., the size of $A$, as small as possible.

Maximum-utility forwarding scheduling. Once the selected sources download the assigned objects via cellular connection, they can disseminate those objects to other mobile users via opportunistic communications. An MSN user becomes a forwarder if it obtains an object from the base-station or any other forwarder. Each forwarder can use proximity-based communication techniques, such as Wi-Fi or Bluetooth, to distribute the objects to its contacts within its transmission range. Each forwarder might have multiple
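As a rough illustration of the example utility function in Section 3, the weighted average of per-attribute preferences can be sketched in Python as follows. The attribute names, preference values and weights are made-up inputs, and dividing by the weight sum is an assumption of this sketch so that the result is a true average; the choice of the utility function is left to the application developer.

```python
def weighted_utility(prefs, weights):
    """Weighted average of per-attribute preferences u_{i,m,a}.

    prefs   -- dict: attribute name -> preference of user i for that
               attribute of object m (hypothetical values)
    weights -- dict: attribute name -> importance weight (hypothetical)
    """
    total_weight = sum(weights[a] for a in prefs)
    if total_weight == 0:
        return 0.0
    # Normalizing by the weight sum is an assumption of this sketch.
    return sum(weights[a] * prefs[a] for a in prefs) / total_weight


# Hypothetical music object described by three attributes.
prefs = {"singer": 0.5, "genre": 1.0, "language": 0.0}
weights = {"singer": 2.0, "genre": 1.0, "language": 1.0}
u_im = weighted_utility(prefs, weights)
```

Only the final scalar `u_im` would be reported to the base-station, so a non-linear function could be swapped in without changing the rest of the protocol.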

objects, but can only broadcast a single object at a time. The problem becomes even more challenging because different contacts of a forwarder might have different contact durations and would overhear different subsets of the forwarded objects. As a result, different forwarding schedules, i.e., the forwarding sequence of the forwarder's objects, could result in various achievable utilities. To maximize the utility that all MSN users can get, each forwarder should carefully determine its forwarding schedule, which specifies which object should be forwarded in each time-slot. We refer to this problem as the forwarding scheduling problem. To solve this problem in a distributed way, we let each forwarder use its local information to predict the utility that can be contributed to all the other users in the future by forwarding an object at a particular time. We then design a preference-aware forwarding scheduling algorithm that allows each forwarder to find its optimal forwarding schedule based on the prediction.

3.2 Assumptions
We assume that each user $i$ can estimate the contact frequency with user $j$ based on the encounter history [37] by $f_{ij} = N_{ij}/T$, where $N_{ij}$ is the number of past contacts within a duration $T$. Without loss of generality, $f_{ii}$ is set to 1 for all $i \in V$. The accuracy of $f_{ij}$ can be improved by computing the moving average over time. A newly-joined user might not obtain an accurate enough estimation of contact frequency. We, however, assume that new users will join the network sequentially, instead of in a burst. The impact of such a new user problem is hence negligible. We will describe our protocol in Section 4.1 by assuming that each client knows the contact frequency with other clients. However, in practice, a client might not have enough memory space or computational power to track the contact frequency of all nearby contacts. We will explain in Section 4.3 that, in practice, our protocol only requires each client to measure the contact frequency of a few most-frequent contacts. Each user $i$ hence only periodically updates the measured contact frequency $f_{ij}$ of its most frequent contacts and its preference $u_{i,m}$ to the base-station.

Some previous studies [22], [23] have observed from several real traces that the inter-contact duration between users is exponentially distributed. We hence further assume that the contact process of each node pair follows a homogeneous Poisson process. Though some other works [38], [39] model the inter-contact duration as a power law distribution, we leave generalizing our design to power law inter-contact times as future work. Finally, this work mainly focuses on improving dissemination efficiency, and assumes that users that join this system are willing to cooperatively forward content. Considering incentive is another challenging issue, and could be a potential future research direction.

4 COMMUNITY-BASED SOURCE-OBJECT ASSIGNMENT

In an MSN, users usually form multiple communities [19], and those communities could be disjoint or partially overlapped. Hence, solely considering the contact frequency is not sufficient for content dissemination applications, because the objects might be distributed to users that have a high contact frequency but belong to the same community. Namely, it is very likely that the objects can only be spread through a single community, but cannot be distributed across different communities. In addition, we also need to assign an object to an initial source with consideration of user preferences. Without considering preferences, we might assign a source with a high contact frequency to deliver an object to many of its contacts who are not interested in that object. Thus, we propose a community-based algorithm to determine the most effective set of source-object assignments, i.e., tuples $(s, m)$. The basic idea behind our design is to select the top $k$ source-object assignments that can generate the highest utility for users belonging to the communities of those sources. We then find a suitable value of $k$ to achieve the best trade-off between preference utility and cellular bandwidth requirement. The notations used in our community-based algorithm are summarized in Table 1.

TABLE 1
Notations Used in Source-Object Assignment

Notation                      Definition
$V$                           the set of users in an MSN
$E$                           the set of edges between any two users if they have contact with each other
$M$                           the set of content objects
$G_m = (V^m, E^m)$            the duplicate MSN graph for object $m$, where $V^m = V$ and $E^m = E$
$T_m$                         the deadline of object $m$
$u_{i,m}$                     the preference (utility) of user $i$ for object $m$
$A$                           the set of source-object assignments
$f_{ij}$, $f^P_{ij}$          the one-hop and multi-hop contact frequency between users $i$ and $j$, respectively
$C_{i,m}$, $C^1_{i,m}$        the community including the users that have multi-hop and one-hop contact, respectively, with user $i$ by deadline $T_m$
$U_t$                         the cumulative utility until the $t$-th iteration of source-object assignment
$\Delta_u$                    the threshold used to determine the number of assignments

4.1 Selecting a Fixed Number of Assignments
We first describe how our community-based scheme selects a fixed number $k$ of source-object assignments. We will describe how to adaptively determine the value of $k$ in Section 4.2. In our design, an object could be assigned to multiple sources if it is popular; on the other hand, a user could be designated as the source of multiple objects. To evaluate how good a source-object assignment $(i, m)$ is, we predict how much utility a source candidate $i$ will contribute to the mobile social network before object $m$ expires. In an MSN, any other user $j$ can get the utility $u_{j,m}$ if it retrieves object $m$ from source $i$ through opportunistic multi-hop forwarding before object $m$ expires. Based on this observation, we duplicate the graph of the MSN $G$ to $G_m = (V^m, E^m)$ for each object $m \in M$, and define $C_{i,m}$ as the community of user $i$ for object $m$, which is the subset of users in $V^m$ that could get object $m$ from user $i$ through multi-hop forwarding before $m$ expires.

We say that user $i$ can deliver an object to user $j$ through multi-hop forwarding if they have a multi-hop contact along a path $P : i \to r_1 \to r_2 \to \cdots \to j$. That is, user $i$ might not encounter user $j$ directly, but can ask its contact $r_1$ to forward the
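The contact frequency estimate from Section 3.2 (the number of past contacts divided by the measurement duration, smoothed over time) can be sketched as below. The class name, the exponentially weighted form of the moving average and the smoothing factor `alpha` are assumptions of this sketch; the text only states that a moving average can improve accuracy. The `top_contacts` helper illustrates reporting only a few most-frequent contacts to the base-station.

```python
class ContactFrequencyEstimator:
    """Tracks f_ij = N_ij / T per peer, smoothed across windows."""

    def __init__(self, window, alpha=0.5):
        self.window = window   # duration T of one measurement window
        self.alpha = alpha     # smoothing factor (assumed, not from the paper)
        self.counts = {}       # peer -> contacts seen in the current window
        self.freq = {}         # peer -> smoothed contact frequency

    def record_contact(self, peer):
        self.counts[peer] = self.counts.get(peer, 0) + 1

    def close_window(self):
        """Fold the finished window into the moving average."""
        for peer in set(self.counts) | set(self.freq):
            sample = self.counts.get(peer, 0) / self.window   # N_ij / T
            if peer in self.freq:
                self.freq[peer] = (self.alpha * sample
                                   + (1 - self.alpha) * self.freq[peer])
            else:
                self.freq[peer] = sample
        self.counts = {}

    def top_contacts(self, d):
        """The d most frequent contacts, as (peer, frequency) pairs."""
        ranked = sorted(self.freq.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:d]
```

A user would call `record_contact` on every encounter, `close_window` once per duration $T$, and upload `top_contacts(d)` together with its preferences.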

object hop-by-hop along path $P$. Let $f^P_{ij}$ denote the multi-hop contact frequency, which predicts how often user $i$ can have a multi-hop contact with $j$ along any path $P$. Since each node pair reports their measured one-hop contact frequency, the base-station can estimate $f^P_{ij}$ by

$$f^P_{ij} = \max_{P \in \mathcal{P}_{ij}} \frac{1}{\sum_{\forall (r,r') \in P} \frac{1}{f_{rr'}}},$$

where $\mathcal{P}_{ij}$ is the set of all possible paths between $i$ and $j$ and $f_{rr'}$ is the one-hop contact frequency between relays $r$ and $r'$ over a path $P$. The rationale is that we assume the expected time to forward an object along the multi-hop path can be estimated by the sum of the average contact intervals, $1/f_{rr'}$, between each node pair along the path. Though an object could opportunistically traverse multiple paths, we only conservatively use the frequency of the single best (shortest) path as the estimate.

Given such prediction, we expect that user $j$ could get object $m$ from $i$ via multi-hop forwarding before deadline $T_m$ if the traversing time $1/f^P_{ij}$ along path $P$ does not exceed the deadline $T_m$, i.e., $T_m \cdot f^P_{ij} \ge 1$. Note that, since each node pair measures their contact frequency by the inverse of the expected inter-contact time, the timescale of $T_m$ here should be the same as the timescale used in contact frequency measurement.

We then define the community of user $i$ for object $m$ as

$$C_{i,m} = \{ j : T_m \cdot f^P_{ij} \ge 1, \ \forall j \in V^m \},$$

which is the set of users that have a multi-hop contact with $i$ at least once before deadline $T_m$. The community $C_{i,m}$ can be found by a breadth-first search mechanism shown in Algorithm 1. Since all the users $j$ in $C_{i,m}$ could get the utility $u_{j,m}$ before deadline $T_m$, the utility contribution of assigning $m$ to source $i$ can be estimated by $\sum_{j \in C_{i,m}} u_{j,m}$. The base-station can hence use such prediction to select the best source-object assignment $(s^*, m^*)$ that generates the maximum utility contribution to its community as follows:

$$(s^*, m^*) = \arg\max_{i \in V^m, m \in M} \sum_{j \in C_{i,m}} u_{j,m}.$$

Algorithm 1. Search Community $C_{i,m}$ of User $i$ on $G_m$
input: user $i$; graph $G_m$ for object $m \in M$; contact frequency $f_{jk}, \forall (j,k) \in E^m$
1  $f^P_{i,j} \leftarrow f_{i,j}, \forall j \in V^m$
2  $C_{i,m} \leftarrow \{\}$
3  $Q \leftarrow \{i\}$
4  while $Q \ne \emptyset$ do
5      $j \leftarrow \mathrm{dequeue}(Q)$
6      $C_{i,m} \leftarrow C_{i,m} \cup \{j\}$
7      forall the $(j,k) \in E^m$ do
8          $f^P_{ik} \leftarrow \max\{ f^P_{ik}, \ 1 / (1/f^P_{ij} + 1/f_{jk}) \}$
9          if $f^P_{ik} \cdot T_m \ge 1$ then
10             $Q \leftarrow Q \cup \{k\}$
11 return $C_{i,m}$

The proposed community-based source selection algorithm, as shown in Algorithm 2, uses the above strategy to select the top $k$ source-object assignments. In our design, an object could be assigned to multiple sources. As a result, a user could get multiple duplicates of an object from different sources through opportunistic communications, but can gain the utility of that object only once. To avoid repeatedly counting the utility of getting the duplicated object, our source selection algorithm removes the users belonging to $C_{i,m}$ and the adjacent edges from $G_m$ if user $i$ has been selected as the source of object $m$, as in line 6 of Algorithm 2. By doing this, we ensure that the preferences of the users in $C_{i,m}$ for object $m$ are not counted again in the following iterations of source-object assignment.

Algorithm 2. Community-Based Source-Object Selection
input: set of objects $M$; $G_m = (V^m, E^m), \forall m \in M$; contact frequency $f_{ij}, \forall (i,j) \in E$; a given number of assignments $k$; termination threshold $\Delta_u$ and a pre-defined checking interval $n$; set of initial source-object assignments $A = \{\}$
1  total $U_t \leftarrow 0, \forall t \in \mathbb{Z}$
2  for $t \leftarrow 1$ to $k$ do
3      find community $C_{i,m}$ and one-hop community $C^1_{i,m}$ over $G_m$ for all $i \in V^m$ and $m \in M$
4      $(s^*, m^*) \leftarrow \arg\max_{i \in V^m, m \in M} \sum_{j \in C^1_{i,m}} u_{j,m}$
5      $A \leftarrow A \cup \{(s^*, m^*)\}$
6      $G_{m^*} \leftarrow (V^{m^*} \setminus C_{s^*,m^*}, \ E^{m^*} \setminus \{(i,j) : i \in C_{s^*,m^*}, j \in V^{m^*}\})$
7      $U_t \leftarrow U_{t-1} + \sum_{j \in C_{s^*,m^*}} u_{j,m^*}$
8      if $(U_t - U_{t-n}) / U_{t-n} \le \Delta_u$ then
9          break
10 return $A$

Fig. 1. An example showing how the proposed community-based algorithm selects three source-object assignments for three objects $m_1$, $m_2$ and $m_3$.

Fig. 1 is an example showing how the proposed algorithm selects three source-object assignments for three objects $m_1$, $m_2$ and $m_3$. The MSN graph is duplicated to $G_{m_1}$, $G_{m_2}$ and $G_{m_3}$ accordingly. User $a$ is selected as the first source of object $m_1$ because, based on the contact frequency, the users in its community $C_{a,m_1}$ can get the highest utility by retrieving object $m_1$. After the first assignment, since all users in $C_{a,m_1}$ could get object $m_1$ from source $a$ before $T_{m_1}$, we exclude those users by removing them and their adjacent edges from $G_{m_1}$. We then choose the tuples $(b, m_2)$ and $(c, m_1)$ as the second and third assignments, respectively. This example illustrates two design ideas. First, even though user $d$ could get object $m_1$ from source $c$ by the
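Because maximizing the inverse of the summed contact intervals over all paths is the same as minimizing the summed intervals, the multi-hop contact frequency can be computed with any shortest-path routine that uses $1/f_{rr'}$ as the edge weight. The sketch below uses Dijkstra's algorithm from a single source; that choice, the function name and the toy graph are assumptions of this example rather than the paper's own search procedure.

```python
import heapq


def multi_hop_frequency(freq, source):
    """Return {j: f^P_{source,j}} for every user reachable from source.

    freq -- dict: user -> {neighbor: one-hop contact frequency f > 0},
            listed symmetrically (hypothetical input format)
    """
    delay = {source: 0.0}          # best summed contact interval so far
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, f in freq.get(u, {}).items():
            nd = d + 1.0 / f       # add the average contact interval 1/f
            if nd < delay.get(v, float("inf")):
                delay[v] = nd
                heapq.heappush(heap, (nd, v))
    # The frequency is the inverse of the expected traversal time.
    return {j: 1.0 / t for j, t in delay.items() if j != source}


# Toy graph: a meets b often, b meets c often, a rarely meets c directly.
freq = {
    "a": {"b": 0.5, "c": 0.1},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 0.1, "b": 0.5},
}
fp = multi_hop_frequency(freq, "a")
```

In this toy graph the two-hop path through `b` gives `c` an expected delay of 4 time units instead of 10 for the direct edge, so the relayed path sets the estimate.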
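The greedy loop of Algorithm 2 can be sketched in Python as follows. This is a simplified reading under stated assumptions, not the authors' implementation: candidates are scored by their one-hop community (counting the candidate itself), the covered multi-hop community is found with a deadline-bounded Dijkstra search rather than the breadth-first search of Algorithm 1, and the terminating condition is replaced by stopping when no candidate adds utility. All function names and the input format are hypothetical.

```python
import heapq


def community(adj, i, deadline):
    """Users j (including i) with deadline * f^P_ij >= 1 on graph adj."""
    delay = {i: 0.0}
    heap = [(0.0, i)]
    members = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in members:
            continue
        members.add(u)
        for v, f in adj[u].items():
            nd = d + 1.0 / f
            if nd <= deadline and nd < delay.get(v, float("inf")):
                delay[v] = nd
                heapq.heappush(heap, (nd, v))
    return members


def select_assignments(freq, utility, deadline, k):
    """Pick up to k (source, object) tuples greedily.

    freq     -- dict: user -> {neighbor: one-hop contact frequency}
    utility  -- dict: (user, object) -> preference u_{j,m}
    deadline -- dict: object -> deadline T_m
    """
    # One private copy of the MSN graph per object (the graphs G_m).
    graphs = {m: {u: dict(nb) for u, nb in freq.items()} for m in deadline}
    assignments = []
    for _ in range(k):
        best, best_gain = None, 0.0
        for m, adj in graphs.items():
            for i in adj:
                # Score by the one-hop community, as in line 4.
                one_hop = [i] + [j for j, f in adj[i].items()
                                 if deadline[m] * f >= 1]
                gain = sum(utility.get((j, m), 0.0) for j in one_hop)
                if gain > best_gain:
                    best, best_gain = (i, m), gain
        if best is None:
            break                  # no remaining candidate adds utility
        s, m = best
        assignments.append(best)
        # Line 6: drop the multi-hop community of s from G_m.
        covered = community(graphs[m], s, deadline[m])
        graphs[m] = {u: {v: f for v, f in nb.items() if v not in covered}
                     for u, nb in graphs[m].items() if u not in covered}
    return assignments
```

Ties are broken by iteration order here, which is an arbitrary choice of this sketch.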

deadline $T_{m_1}$, we do not count its preference for $m_1$ because it might also be able to retrieve $m_1$ from source a and we already account for this utility gain in the first iteration. Second, we select user c as the second source of $m_1$, rather than any source in $G_{m_3}$, because the utility contributed by $C_{c,m_1}$ is higher than any in both $G_{m_2}$ and $G_{m_3}$, even if we have already excluded the users who could get the duplicate of $m_1$ after the first assignment. The same procedure can be repeated until a specific number of assignments is selected or when the terminating condition, which will be introduced in Section 4.2, is reached.

Ideally, we should select a source-object assignment that can produce the maximum utility for the community of the selected source. However, finding the community of each user on $G_m$ for all $m \in M$ incurs a high computational cost. To reduce the computational complexity, we can instead use the one-hop community $C^1_{i,m}$ of each user i to select source-object assignments. Specifically, $C^1_{i,m}$ is a subset of $C_{i,m}$ and only includes the users that directly have contact with a candidate source i before deadline $T_m$. That is, $C^1_{i,m} = \{j : T_m \times f_{ij} \ge 1\}$. The one-hop community can be easily found without the breadth-first search in Algorithm 1. Therefore, to reduce the computational cost, we can alternatively select the source-object tuple that produces the maximum utility to its one-hop community, i.e., $(s^*, m^*) = \arg\max_{i \in V_m, m \in M} \sum_{j \in C^1_{i,m}} u_{j,m}$, as in line 4 of Algorithm 2. After each assignment, we can still remove the users in the multi-hop community $C_{s^*,m^*}$ from $G_{m^*}$, as shown in line 6 of Algorithm 2, because the cost of finding the community of a single source is relatively low.

4.2 Terminating Condition
So far we have described how to select a fixed number of source-object assignments. We next discuss how to determine a suitable number of assignments. Our goal is to balance the trade-off between preference utility and cellular bandwidth requirement. Our design is based on an observation that the total achievable utility usually converges after selecting a sufficient number of sources. More specifically, the incremental utility brought by adding an additional source-object assignment usually decreases with the number of selected assignments. Hence, our algorithm terminates iterative source selection when the incremental utility falls below a pre-defined threshold $\Delta_u$. To avoid undesirable termination due to a single iteration of assignment, we compute the incremental utility of every n assignments, rather than that of a single assignment, and terminate our algorithm when the following condition is satisfied:

$$\frac{U_{t_{curr}} - U_{t_{curr}-n}}{U_{t_{curr}-n}} \le \Delta_u,$$

where $U_t$ is the cumulative utility until the $t$-th iteration of source selection and $t_{curr}$ is the current iteration. If the above condition holds, it implies that the number of assignments is sufficient to achieve a high enough utility, and we do not need to waste more cellular bandwidth to deliver the objects to sources in the MSN. We will investigate how to choose a proper threshold in Section 6.2.

4.3 Reducing Feedback Overhead
To perform the proposed source selection algorithm, the base-station needs to know the contact relationships between MSN users. In principle, each MSN user i should feed back the contact frequency $f_{ij}$ of all its contacts j to the base-station in order for the base-station to construct graph G. However, in practice, a client might not measure the contact frequency of all nearby contacts. In addition, updating such contact information might require a considerable amount of upload bandwidth. To reduce such an overhead, we allow each user to only measure and feed back the contact frequency of its most frequent contacts to the base-station. Specifically, we let each user only report the contact frequency of its d most frequent contacts, i.e., the neighbors with the d highest contact frequencies. The idea is that the probability of content sharing between two users who do not encounter each other frequently is low; hence, we can neglect the expected utility gain of content sharing between those infrequent contacts. The base-station can then use the feedback of partial contact relationships to construct an incomplete graph $G' = (V, E')$, where $E'$ is a subset of E and only contains the edges with a high contact frequency, and performs Algorithm 2 over $G'$. We will evaluate in Section 6.3 how such partial information feedback affects the achievable utility and the amount of upload traffic.

5 PREFERENCE-AWARE OPPORTUNISTIC FORWARDING
We first formulate the maximum-utility forwarding model to maximally satisfy user preferences, and propose an optimal forwarding scheduling algorithm to solve the above model in Section 5.1. We then derive in Section 5.2 how to predict the future utility that each user can contribute by forwarding an object in a specific time-slot. The notations used in our forwarding algorithm are summarized in Table 2.

5.1 Maximum-Utility Forwarding Scheduling
We formulate the forwarding scheduling problem as a discrete-time model, where the length of each time-slot is set to the time required to broadcast an object. Without loss of generality, we assume that each transmission starts at the beginning of a time-slot and occupies the whole slot. Let $M^t_f$ denote the set of objects cached by forwarder $f \in V$ in a given time-slot t, which is a subset of all objects, i.e., $M^t_f \subseteq M$. Each object $m \in M^t_f$ might be an original object downloaded from the base-station or a copy retrieved from other forwarders. We then collect the users that are located within the transmission range of forwarder f in time-slot t as a group $V^t_f$. Since mobile users might join or leave a group arbitrarily, the members in forwarder f's group $V^t_f$ could change with time, i.e., $V^t_f \ne V^{t'}_f$ if $t \ne t'$. This implies that broadcasting the same object $m \in M^t_f$ in different time-slots, e.g., the current time-slot t or any future time-slot $t' > t$, could produce different utilities. Therefore, each forwarder must schedule the transmission sequence of its objects such that the total utility gained by all users can be maximized. In particular, for any forwarder f, we define a binary variable $x_{m,t'}$ to indicate whether f decides to broadcast object m in time-slot $t'$. Namely, $x_{m,t'} = 1$ if object m is broadcast in time-slot $t'$, and

$x_{m,t'} = 0$ otherwise. Our goal is to enable each forwarder to find its optimal forwarding schedule $x_{m,t'}, \forall m \in M^t_f, t' \ge t$, denoted by the vector form x for short, that produces the maximal total utility for all users in the MSN.

TABLE 2
Notations Used in Forwarding Scheduling

Notation                        Definition
$V^t_f$                         the set of users that have contact with forwarder f in time-slot t
$M^t_f$                         the set of objects owned by forwarder f in time-slot t
$d_{fi}$                        the duration of contact between user f and i
$T$                             the set of feasible forwarding time-slots for a forwarder
$v^l_{m,t}$                     total local utility contribution of broadcasting object m by forwarder f in time-slot t
$v^g_{m,t}$                     total global utility contribution of broadcasting object m by forwarder f in time-slot t
$U(i, m, t)$                    the future utility contributed by user i if it gets object m in time-slot t
$x_{m,t}$, $x$                  an indicator denoting if forwarder f sends object m in time-slot t, and its matrix form
$V(x)$, $V^*$                   the total utility contributed by a forwarder schedule x, and the maximal total utility $V^* = \max_x V(x)$
$F_{j,m}(t)$                    the probability that user j has not received object m in time-slot t from any other contacts
$q_{jk}$                        the probability that j and k will not meet in a time-slot
$G_b = (M^t_f \cup T, E)$       the bipartite graph used for forwarder f to solve the forwarding schedule in time-slot t
$M$                             a bipartite matching in graph $G_b$

When forwarder f meets a group of neighboring users $V^t_f$ in time-slot t, it can broadcast any object $m \in M^t_f$ to all its neighbors in $V^t_f$. Each user $i \in V^t_f$ who does not own object m can then obtain the utility $u_{i,m}$. The forwarder f can therefore produce the local utility contribution $v^l_{m,t} = \sum_{i \in V^t_f, m \notin M^t_i} u_{i,m}$ in time-slot t. Since forwarder f can only broadcast a single object in time-slot t, f can choose to forward the object with the maximal local utility contribution, i.e., $m^* = \arg\max_{m \in M^t_f} v^l_{m,t}$, for group $V^t_f$. However, solely maximizing local utility might not produce the maximal global utility, because it ignores the utility contributed by group members in $V^t_f$ to their future contacts. Taking such future contribution into account, we let each user predict the total utility it can contribute by storing and forwarding object m after time t, denoted by the future utility contribution $U(i, m, t)$. The forwarder f can then request each of its neighbors to report the estimate of $U(i, m, t)$, and compute the following global utility contribution $v^g_{m,t}$ that can benefit not only local group members but also their future contacts by broadcasting object m to $V^t_f$ in time-slot t:

$$v^g_{m,t} = \sum_{i \in V^t_f, m \notin M^t_i} U(i, m, t). \quad (1)$$

Given a forwarding schedule x, a forwarder can generate the following total global utility contribution for all time-slots $t' \ge t$:

$$V(x) = \sum_{m \in M^t_f, t' \ge t} x_{m,t'} v^g_{m,t'}. \quad (2)$$

We note that clients can build proximity-based communications via LTE D2D, Wi-Fi Direct or Bluetooth, each of which has specified the discovery and association protocol. We can hence embed the global utility contribution in the association messages. The extra overhead required by utility information exchange is hence negligible. Take Wi-Fi Direct as an example. Device scanning consists of two phases: the searching state, which sends the probe message, and the listening state, which waits for the response message, each lasting 100-300 ms [40]. Say we annotate the response message with our contribution utility, which is represented by a 4-byte floating point and sent at the base-rate (e.g., 6 Mb/s in 802.11a). Then, the airtime required to deliver this information is $32/6 \times |M_m|$ μs, where $|M_m|$ is the number of objects owned by neighbor m. For example, the overhead only increases from 200-600 ms to 205-605 ms if $|M_m| = 1000$.

We defer to Section 5.2 how each user predicts its future utility contribution $U(i, m, t)$, and first address how a forwarder f schedules the forwarding sequence of all its objects $m \in M^t_f$ according to the global utility contribution $v^g_{m,t'}$ for all $t' \ge t$. Intuitively, for each time-slot $t'$, forwarder f can greedily maximize the global utility by broadcasting the object $m^g$ with the maximal global utility contribution, i.e., $m^g = \arg\max_{m \in M^{t'}_f} v^g_{m,t'}$. We let $\tilde{x}$ denote the forwarding schedule selected by such a greedy solution, and hence $\tilde{x}_{m^g,t'} = 1$. The forwarder f can then remove object $m^g$ from the object set, i.e., $M^{t'+1}_f = M^{t'}_f \setminus \{m^g\}$, and repeat the same greedy procedure in every time-slot $t' \ge t$ to produce a total global utility contribution $\tilde{V} = V(\tilde{x})$.

However, the greedy strategy is sometimes suboptimal because the forwarder only considers the future utility contribution of a single time-slot at a time. Different users in $V^t_f$ might meet the forwarder f for different durations and, therefore, should have heterogeneous priorities. Consider a scenario where f has contact with neighbors $n_1$ and $n_2$ in time-slot t for one and two time-slots, respectively. In this case, $n_1$ has a shorter contact duration and can only retrieve a single object from f, while $n_2$ can get two objects. If the forwarder wants to broadcast two objects $m_1$ and $m_2$, it does not matter for $n_2$ which object is sent first. However, the forwarding sequence of $m_1$ and $m_2$ determines which one can be received by $n_1$ and, thus, affects the total utility that can be contributed by the two transmissions.

TABLE 3
Example of Utility Contribution

(a) future utility contribution $U(i, m, t)$
                 user $n_1$    user $n_2$
object $m_1$         1             5
object $m_2$         3             2

(b) global utility $v^g_{m,t'}$
                 time-slot $t$    time-slot $t+1$
object $m_1$     1 + 5 = 6            5
object $m_2$     3 + 2 = 5            2

More specifically, assume that $n_1$ and $n_2$ have a future utility contribution shown in Table 3a. Here, we assume that,

for each user i and object m, the future utility contribution of any two consecutive time-slots, i.e., $U(i, m, t)$ and $U(i, m, t+1)$, does not change much, and thus use $U(i, m, t)$ to approximate $U(i, m, t+1)$ in this example. The global utility contribution $v^g_{m,t'}$ of broadcasting the objects $m \in \{m_1, m_2\}$ in time-slots $t' \in \{t, t+1\}$ can then be computed as shown in Table 3b. The greedy strategy solely considers the global contribution of the first time-slot t, and, as a result, first sends object $m_1$ because forwarding $m_1$ in t produces a higher global contribution than $m_2$, i.e., $v^g_{m_1,t} = 6 > v^g_{m_2,t} = 5$. However, if we compute the total utility contribution of the two time-slots t and $t+1$, we get that the greedy forwarding sequence $m_1 \to m_2$ produces a total global utility $v^g_{m_1,t} + v^g_{m_2,t+1} = 6 + 2 = 8$, lower than the reverse sequence $m_2 \to m_1$, which produces the total global utility $v^g_{m_2,t} + v^g_{m_1,t+1} = 5 + 5 = 10$.

We observe from the above example that forwarder f can achieve the maximal total contribution $V^*$ by considering multiple successive time-slots after t if it can exactly know or predict the contact durations $d_{fi}$ for all neighbors $i \in V^t_f$. That is, given the object set $M^t_f$ in time-slot t, f can schedule which objects should be broadcast during the following available time-slots $[t, t_{max}]$ to maximize the total global utility contribution, where $t_{max} = \max_{i \in V^t_f} (t + d_{fi})$ is the time-slot when group $V^t_f$ is disbanded. For ease of exposition, we define a set of available time-slots $T = \{t, t+1, t+2, \ldots, t_{max}\}$. Such a maximum-utility forwarding problem for forwarder f in time-slot t can then be formalized as follows:

$$V^* = \max_x V(x) = \max_x \sum_{m \in M^t_f, t' \in T} x_{m,t'} v^g_{m,t'} = \max_x \sum_{m \in M^t_f, t' \in T} x_{m,t'} \Big( \sum_{i \in V^t_f, m \notin M^t_i, t' < t + d_{fi}} U(i, m, t') \Big) \quad (3a)$$

subject to

$$\sum_{m \in M^t_f} x_{m,t'} \le 1, \quad \forall t' \in T \quad (3b)$$

$$\sum_{t' \in T} x_{m,t'} \le 1, \quad \forall m \in M^t_f \quad (3c)$$

$$x_{m,t'} \in \{0, 1\}. \quad (3d)$$

Eqs. (3b) and (3c), respectively, restrict forwarder f to broadcasting at most a single object in each time-slot, and to forwarding each object at most once. Since each user i in group $V^t_f$ has a contact with f for a different duration $d_{fi}$ and will leave the group in time-slot $t + d_{fi}$, the system can only gain the future utility contribution $U(i, m, t')$ if user i receives the object m before it leaves the group, i.e., $t' < t + d_{fi}$. Therefore, the global utility of broadcasting object m, i.e., $v^g_{m,t'}$, varies with time-slot $t'$, as shown in Eq. (3a). The objective is hence to find the optimal forwarding schedule $x^*_{m,t'}$ for all $m \in M^t_f$ and $t' \in T$, such that forwarder f can contribute the maximal total global utility $V^* = V(x^*)$.

Fig. 2. The optimal forwarding schedule can be transformed to the maximum weight bipartite matching problem. The red lines indicate the edges with the maximal total weight. Hence, $x_{m_4,t_1} = x_{m_1,t_2} = x_{m_2,t_3} = 1$ is the optimal forwarding schedule, and object $m_3$ is not broadcast due to the limited contact duration.

The above maximum-utility forwarding model can actually be transformed to the maximum weight bipartite matching (MWBM) problem [41]. To represent our maximum-utility forwarding problem as the MWBM problem, we create a bipartite graph $G_b = (M^t_f \cup T, E)$, where the vertex set is the union of the sets of objects and available time-slots and the edge set is $E = \{(m, t') : m \in M^t_f, t' \in T\}$, as shown in Fig. 2. Each edge $e = (m, t') \in E$ is associated with a weight, which is set to the global utility $v^g_{m,t'}$ that f can contribute by broadcasting object m in time-slot $t'$, i.e., setting $x_{m,t'} = 1$.

Note that our model restricts each object $m \in M^t_f$ to at most a single time-slot and, in addition, each time-slot $t' \in T$ to at most a single object. Therefore, any feasible solution of the forwarding schedule x is a matching M in the bipartite graph $G_b$, where M is a subset of E such that no two edges in M share an endpoint, as shown by the red lines in Fig. 2. For any $m \in M^t_f$ and $t' \in T$, the forwarding schedule $x_{m,t'} = 1$ if edge $(m, t')$ is included in the matching M, and $x_{m,t'} = 0$ otherwise. Then, the total weight of the edges in M exactly equals the total global utility contribution $V(x) = \sum_{m \in M^t_f, t' \in T} x_{m,t'} v^g_{m,t'}$. As a result, finding the maximal total global utility $V^*$ is equivalent to solving the maximum weight matching in the bipartite graph $G_b$. Therefore, we can apply the well-known polynomial-time algorithm, called the Hungarian algorithm [42], to solve the maximum weight bipartite matching, i.e., the optimal forwarding schedule $x^*$.

5.2 Future Contribution Prediction
We now describe how a user predicts its future contribution by forwarding an object in a specific time-slot. Since there are no specific destinations in content dissemination applications, our goal is to propagate objects to as many users who are interested in them as possible. In contrast to existing MSN unicast routing protocols that predict the probability of delivering a message to the destinations, our model needs each user i to predict the future utility contribution $U(i, m, t)$ that it can contribute by forwarding object m to all its contacts during $[t, T^{max}_m]$, where $T^{max}_m$ is the expiration time of object m. Note that user i can only contribute object m to any contact j if j does not have object m when they meet after time-slot t. Intuitively, we have $U(i, m, t) \ge U(i, m, t')$ for any time-slots $t < t'$ if i does not drop m after t. This is because the number of users that have not downloaded object m decreases with time. Therefore, the value of $U(i, m, t)$ can be determined

by three factors: 1) the probability that forwarder i can meet other contacts who have not owned object m after time-slot t; 2) the utility of i's contacts on object m; and 3) the probability that user i drops object m due to its limited buffer space.

To compute $U(i, m, t)$, we start by deriving the probability that a contact $j \in V$ has not received object m in time-slot $t'$ from any contacts $k \in V$, denoted by $F_{j,m}(t')$. Recall that we model the contact process of each node pair as a homogeneous Poisson process [22], [23]. We then define the random variable $N_{jk}(y)$ as the cumulative number of contacts between j and k in continuous-time y with the mean contact frequency $\lambda_{jk}$. However, since we formulate the problem as a discrete-time model, we redefine the random variable $N_{jk}(y)$ as $N^d_{jk}(t)$, which equals the cumulative number of contacts between j and k until the $t$-th time-slot, and set $N^d_{jk}(t) = N_{jk}(T_{tx} t)$, where $T_{tx}$ is the length of a time-slot. Without loss of generality, we let any contact in the time interval $(T_{tx}(t-1), T_{tx} t]$ start transmitting in time $T_{tx} t$, i.e., the beginning of time-slot t. According to the characteristics of the homogeneous Poisson process, we can compute the probability $q_{jk}$ that j and k will not meet in a time-slot t by

$$q_{jk} = P\big(N_{jk}(T_{tx} t) - N_{jk}(T_{tx}(t-1)) = 0\big) = e^{-\lambda_{jk} T_{tx}}. \quad (4)$$

Thus, we can set the contact probability between j and k in each time-slot at $1 - q_{jk}$. We observe that user j will not be able to download object m from user k before time-slot t if and only if one of the two following events occurs: 1) j and k have never met before time-slot t, i.e., $N^d_{jk}(t-1) = 0$; or 2) the last contact between j and k was in time-slot $z < t$, but k did not have object m in time-slot z. Therefore, the probability $f_{j,k,m}(t)$ that j cannot download object m from k before time-slot t can be computed by

$$f_{j,k,m}(t) = q_{jk}^{t-1} + \sum_{z=1}^{t-1} (1 - q_{jk}) q_{jk}^{t-1-z} F_{k,m}(z). \quad (5)$$

Then, the probability $F_{j,m}(t)$ that user j has not downloaded object m until time-slot t equals the probability that j cannot download object m from any user $k \in V$, $j \ne k$, before t:

$$F_{j,m}(t) = \prod_{k \ne j} f_{j,k,m}(t). \quad (6)$$

Because Eq. (6) includes the probability $q_{jk}$ of no contact between j and k, user j has a lower probability $F_{j,m}(t)$ if it can meet other users $k \in V$ more frequently, i.e., a lower $q_{jk}$. Moreover, the contact k will only be helpful if it carries object m when it meets j. Consequently, j is more likely to have a lower probability $F_{j,m}(t)$ and cache object m in time-slot t if its contact k has a higher probability of receiving object m before they meet, i.e., with a lower $F_{k,m}(z)$ for all $z < t$.

Since only the data sources have object m in the initial time-slot 1, we have $F_{k,m}(1) = 0$ if k is a data source of object m and $F_{k,m}(1) = 1$ otherwise. Given the initial value of $F_{k,m}(1)$, we can compute the value of $F_{k,m}(z)$ for each time-slot $z = 2, 3, \ldots, t-1$ iteratively for all users $k \in V$. Note that, even though the computation of $F_{j,m}(t)$ in Eq. (6) involves the value of $F_{k,m}$ for all users $k \in V \setminus \{j\}$, j does not really need to solve $F_{k,m}$ for all users. Instead, j only needs to compute $F_{k,m}$ for its historical contacts because, if j and k have never met, the contact probability $1 - q_{jk}$ must be 0. To reduce the complexity, we let each user $j \in V$ compute its $F_{j,m}$ and exchange the information about $F_{j,m}$ with other users $k \in V$ when they meet. Hence, the computational complexity can be shared by all users.

We can compute the cumulative probability that i can forward object m to contact j between t and $T^{max}_m$ as follows:

$$\sum_{z=t}^{T^{max}_m - 1} (1 - q_{ij}) q_{ij}^{T^{max}_m - 1 - z} F_{j,m}(z). \quad (7)$$

However, the object m might be dropped by user i in any time-slot z due to buffer overflow with a probability $d_{i,m}(z)$. The cumulative probability needs to be rewritten as

$$\sum_{z=t}^{T^{max}_m - 1} (1 - q_{ij}) q_{ij}^{T^{max}_m - 1 - z} F_{j,m}(z) \big(1 - d_{i,m}(z)\big). \quad (8)$$

The expected utility that i can contribute to contact j between t and $T^{max}_m$ for object m then equals

$$k_{i,j,m}(t) = u_{j,m} \sum_{z=t}^{T^{max}_m - 1} (1 - q_{ij}) q_{ij}^{T^{max}_m - 1 - z} F_{j,m}(z) \big(1 - d_{i,m}(z)\big).$$

Therefore, the expected total utility that all future contacts can gain by downloading object m from user i during $[t, T^{max}_m]$ can be predicted by

$$U(i, m, t) = \sum_{n=0}^{\infty} n \sum_{F \in \mathcal{F}_n} \Big( \prod_{j \in F} k_{i,j,m}(t) \prod_{j \notin F} \big(1 - k_{i,j,m}(t)\big) \Big), \quad (9)$$

where $\mathcal{F}_n$ collects all possible sets that contain exactly n contacts. We observe that the computational complexity of Eq. (9) is relatively high. However, because we only care about which user can make a larger contribution and are not concerned with the absolute value of $U(i, m, t)$, we use the following approximation as the final contribution metric:

$$U(i, m, t) \approx \sum_{j \in V \setminus \{i\}} k_{i,j,m}(t). \quad (10)$$

In Eq. (8), the dropping probability $d_{i,m}(z)$ depends on the dropping policy of each user. We consider the policy that each user i drops the object m it is least interested in, i.e., with the lowest utility $u_{i,m}$, when its buffer overflows. Specifically, each user sorts its objects by utility in ascending order, and drops the first object when it does not have enough buffer space to cache a new object. Assume that user i ranks object m as the $k$-th object of interest among all its objects. The user will drop object m if it receives k objects with a utility higher than object m after its buffer becomes full. Say that, in the current time-slot t, user i has a residual buffer space that can cache b additional objects. The dropping probability $d_{i,m}(z)$ can then be computed by

the following equation:

$$d_{i,m}(z) = \sum_{t'=t}^{z} P\big(R_i(t' - t, 0) = b\big) P\big(R_i(z - t', u_{i,m}) \ge k\big), \quad (11)$$

where $R_i(t', u)$ denotes the number of objects with a utility higher than u received by user i during a period of $t'$ time-slots. Thus, $P(R_i(t' - t, 0) = b)$ indicates the probability that user i receives any b objects from t to $t'$, which leads to a full buffer, while $P(R_i(z - t', u_{i,m}) \ge k)$ is the probability that user i receives at least k objects with a utility higher than $u_{i,m}$ during time-slots $(t', z]$ and, as a result, drops object m. Again, based on [22], [23], we model the contact process of each node pair as a homogeneous Poisson process, and estimate the above probabilities as follows:

$$P\big(R_i(t' - t, 0) = b\big) = \frac{(\lambda_0 (t' - t))^b e^{-\lambda_0 (t' - t)}}{b!}, \text{ and} \quad (12)$$

$$P\big(R_i(z - t', u) \ge k\big) = 1 - \sum_{b=0}^{k-1} \frac{(\lambda_u (z - t'))^b e^{-\lambda_u (z - t')}}{b!}, \quad (13)$$

where $\lambda_u$ is the frequency of meeting a user who can contribute an object with a utility higher than u. In other words, $\lambda_0$ denotes the frequency of meeting a user who can contribute any object. However, since it is difficult to predict whether a contact can share an object with a utility higher than u, we approximate the value of $\lambda_u$ by the contact frequency multiplied by the normalized utility, i.e., $\lambda_u = \lambda_0 \times u / \sum_{m \in M^t_i} u_{i,m}$, where $M^t_i$ is the set of objects owned by user i in t.

6 PERFORMANCE EVALUATION
We evaluate the performance of PrefCast using four real traces: NUS [43], INFOCOM06 [44], MIT Reality [45], and UCSD [46]. The NUS trace contains the schedules of 4,885 classes and 22,341 students for 77 class hours. We randomly select 500 users from the trace in our trace-based simulations. We assume that two students have contact with each other if and only if they are in the same classroom, which follows the method in [43]. We randomly select some students in each class to be absent or leave early, and generate contact patterns outside the classrooms based on the survey and observations made in [43]. The INFOCOM06 trace includes 78 users who attend the same conference for a few days; the MIT Reality trace includes 97 users who work in the same building; the UCSD trace, which is also used in [11], records the contact history of 275 HP Jornada PDAs carried by students over 77 days. We consider a scenario where each user can use a broadcast technique, such as Wi-Fi multicast, to distribute its objects at the transmission rate of 1 Mb/s. Each user has a buffer that can store 100 objects.

We use log-based user profiles collected from Last.fm [47], a database that tracks music listening habits. We crawl the webpages of Last.fm and collect the profiles of 8,000 randomly-selected users. For each sampled user, the profile records the top 100 songs it had listened to the most. We categorize the songs of each user based on their artists, and let the number of songs owned by a user with the same artist represent its preference (i.e., utility) for any songs of that artist. For example, if a user has five songs of Maroon 5, that user has a preference 5 for Maroon 5. That is, it can gain the utility 5 if it downloads any song of Maroon 5 that is not cached in its buffer. We associate each user in the mobility trace with a randomly-selected listening profile collected from Last.fm, and compute its utility for each object based on the artist of that object. We repeat each simulation with 20 random profile associations, and report the average performance.

6.1 Utility Gain of PrefCast
Our trace-based simulations compare the following schemes.

• Epidemic routing [15]. Each forwarder randomly selects an object to broadcast in each time-slot.
• PROPHET [16]. It finds the forwarder that has the highest probability of delivering a message to the destination using unicast. However, since we consider a content dissemination application, we modify PROPHET to Interest-based PROPHET (or I-PROPHET), which allows a forwarder to find an object that can be distributed to the largest number of users. Specifically, we let each user only have a binary utility for each object, i.e., an interested or non-interested object, and compute its probability of having a contact with other users who have not owned that object. The forwarder then computes the summation of such contact probability for all group members who are interested in an object as the weight of that object, and broadcasts the object with the highest weight.
• Local utility. Each forwarder broadcasts the object with the highest local utility $v^l$ in each time-slot.
• PrefCast w/o SS. It uses our preference-aware forwarding, but applies random source-object selection.
• PrefCast w/ community-based SS [9]. It uses our preference-aware forwarding, and performs preference-oblivious community-based source selection [9]. Specifically, we classify clients into communities based on the method proposed in [9], and determine source-object assignment without considering user preference. We sort the communities in descending order of the number of users and select sources from different communities in a round-robin manner until K sources are picked. To select a proper source from each community, we pick the user with the largest number of overlapped communities and contacts as the source, and assign it a randomly-selected object without considering user preference. For fair comparison, we let clients use our preference-aware forwarding (i.e., Section 5) to distribute content objects.
• PrefCast w/ preference-aware SS. It applies both our preference-aware source-object assignment and forwarding.

The first three schemes simply apply random initial source-object selection. Because the comparison schemes do not have explicit source selection strategies, for fair comparison, we let all the schemes select a fixed number (100 in our

simulations) of source-object assignments. We will evaluate the performance of dynamically adapting the number of sources in Section 6.2.

Fig. 3. Cumulative utility over time-slots.

Fig. 3 plots the average utility obtained by each user in the three traces over time. The figures show that, without explicit source selection, PrefCast can produce a 15.7 and 22.6 percent higher average utility than I-PROPHET and Local Utility, respectively, until the deadline. The improvement is mainly because each forwarder in PrefCast considers the future utility contribution of its members. Specifically, the utility metric $U(i, m, t)$ of a group member i used in PrefCast not only considers heterogeneous user preferences, but, more importantly, predicts how many users (including current and future contacts) can gain utility by obtaining object m from member i. Therefore, a forwarder can select the object that benefits all users in the system, instead of an object that only interests local neighboring users. The figures also show that, in most of the traces, I-PROPHET performs better than Local Utility. This indicates that the number of possible future contacts, i.e., the estimated number of contacts who have not owned the object under consideration, has a greater impact on utility contribution than local preferences.

Moreover, by enabling source selection with consideration of user preferences and the community structure, PrefCast can further increase the total utility by 49.3 percent over PrefCast without explicit source selection. The improvement is mainly because our algorithm assigns an object to the initial sources belonging to the communities that are more likely to be interested in that object. Hence, the object can be spread out in those communities through forwarding, and produce a significant utility gain for users in the MSN. The results also show that, though preference-oblivious community-based source-object selection outperforms random source-object selection, its performance is still far below our preference-aware community-based source-object selection. The main reason is that an object might be diffused through the whole community, but the members in the community are not actually interested in that object. Our preference-aware selection scheme, however, identifies communities with consideration of user preference, and hence can distribute objects to the users who are interested in them.

We next evaluate the impact of the number of source-object assignments on the performance of the comparison schemes. We vary the number from 10 to 100. Fig. 4 plots the average utility of all users until the end of the simulations. Generally, more source-object assignments provide users with a higher average utility. However, by selecting proper initial sources, PrefCast can further achieve a higher utility gain, especially when more users are allowed to retrieve the objects via cellular connection from the base-station. Another thing worth noting is that, when the number of sources is small, preference-oblivious community-based source-object selection might perform even worse than the random selection schemes. This is because only a few large communities can help disseminate the objects, but those communities are not interested in the randomly-assigned objects. This shows that the effectiveness of PrefCast highly relies on how the base-station utilizes limited cellular bandwidth resources to deliver the objects to well-selected initial sources.

6.2 Adaptive Number of Source-Object Assignments
In the above simulations, we show the performance of PrefCast with a given number of source-object assignments. However, as mentioned in Section 4.2, the trade-off between traffic offloading and preference utility in PrefCast is determined by the number of selected assignments, yet a suitable number of assignments usually changes with the scale of a mobile social network and heterogeneous user preferences. Therefore, we now evaluate whether PrefCast can flexibly determine a suitable number of source-object assignments, adapting to a mobile social network. In this simulation, we set the value of n in Algorithm 2 to 5. The deadlines of the NUS, INFOCOM06 and MIT Reality traces are set to 75 hours, 16 hours, and 35 days, respectively.

Fig. 4. Impact of number of source-object assignments.

In Fig. 5, we compute the average utility of PrefCast with a given
number of assignments ranging from 1 to 250, and compare it with the
average utility of PrefCast determined by Algorithm 2. We check the
performance of adaptive source selection when the threshold Δu is set to
0.025 and 0.05, respectively. For each threshold, we plot the
corresponding average utility as a horizontal line in Fig. 5. The
smaller the threshold we use, the more assignments we can select, and
thus the higher the average utility we achieve. However, the achievable
utility converges as the number of assignments increases. In other
words, once the base-station has selected a sufficient number of
assignments, the incremental utility gain of including a new one
decreases. The figures verify our argument that we can balance the
trade-off between cellular bandwidth usage and preference utility by
selecting a suitable number of assignments. More importantly, we can see
that our adaptive algorithm with an appropriate threshold, e.g.,
Δu = 0.025, can achieve a utility close to the convergence point of
preference utility. This shows that our algorithm can adaptively
determine how many source-object assignments should be selected for
different networks (traces).
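
The stopping behavior described above can be illustrated with a short
sketch. It is not the paper's Algorithm 2: it assumes, for simplicity,
that each candidate assignment has a fixed, precomputed marginal utility
gain, and that selection stops once the next gain falls below the
threshold. The function name `select_assignments`, the cap `max_n`, and
the gain values are hypothetical.

```python
from typing import List, Sequence


def select_assignments(marginal_gains: Sequence[float],
                       delta_u: float,
                       max_n: int) -> List[int]:
    """Pick assignments greedily until the marginal gain drops below delta_u.

    marginal_gains[i] is an assumed, precomputed utility gain of adding
    candidate i. This is a simplification: in a real system the gain of a
    candidate would depend on the assignments already selected.
    """
    selected: List[int] = []
    # Visit candidates from the largest gain to the smallest.
    order = sorted(range(len(marginal_gains)),
                   key=lambda i: marginal_gains[i], reverse=True)
    for i in order:
        # Stop at the cap or when one more assignment adds too little utility.
        if len(selected) >= max_n or marginal_gains[i] < delta_u:
            break
        selected.append(i)
    return selected


# Hypothetical diminishing gains (not taken from the paper's traces).
gains = [0.30, 0.12, 0.06, 0.04, 0.02, 0.01]
picked_005 = select_assignments(gains, delta_u=0.05, max_n=250)
picked_0025 = select_assignments(gains, delta_u=0.025, max_n=250)
```

With these made-up gains, the smaller threshold admits one more
assignment than the larger one, which mirrors the qualitative trend
reported for Fig. 5: a smaller threshold spends more cellular bandwidth
in exchange for a utility closer to the convergence point.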

6.3 Overhead of Information Feedback


To reduce the overhead of information feedback, we stipulate that each
user only reports the contact frequency of its d most frequent contacts
to the base-station. Therefore, we now check how much overhead can be
saved by such limited information feedback, and how such incomplete
information affects the achievable utility. Fig. 6a plots the utility of
PrefCast when d is set to 1, 3, 5, and unlimited, respectively. The
figure shows that the utility only decreases slightly when we restrict
the amount of information feedback. This indicates that the base-station
can exploit partial but significant information, i.e., the contact
frequency of a few most frequent contacts, to find a suitable set of
initial sources. Meanwhile, we can observe from Fig. 6b that the total
number of messages required to report the frequency information is
reduced significantly. In particular, setting d = 5 achieves a utility
comparable to that using full information, while saving about 76 percent
of message overhead on average.
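
The limited-feedback rule can be sketched as follows. This is a minimal
illustration under stated assumptions, not the paper's implementation:
it counts one reported (contact, frequency) entry as one unit of
overhead, and the names `feedback_report` and `message_saving` as well
as the contact counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


def feedback_report(contact_counts: Dict[str, int],
                    d: Optional[int]) -> Dict[str, int]:
    """Return the contact frequencies a user would report to the base-station.

    Only the d most frequent contacts are kept; d=None models unlimited
    feedback, in which every contact is reported.
    """
    return dict(Counter(contact_counts).most_common(d))


def message_saving(all_counts: Dict[str, Dict[str, int]], d: int) -> float:
    """Fraction of reported entries saved relative to full feedback."""
    full = sum(len(c) for c in all_counts.values())
    limited = sum(len(feedback_report(c, d)) for c in all_counts.values())
    return 1.0 - limited / full


# Hypothetical contact counts for two users (not from the evaluated traces).
counts = {
    "u1": {"u2": 9, "u3": 4, "u4": 1, "u5": 1},
    "u2": {"u1": 9, "u3": 2},
}
report = feedback_report(counts["u1"], d=2)
saving = message_saving(counts, d=2)
```

Users with fewer than d contacts report everything they have, so the
saving comes only from users with long contact lists. That is
consistent with the observation that a small d removes most of the
overhead while keeping the most informative entries.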

Fig. 5. Adaptive number of initial sources.

Fig. 6. Impact of limited information feedback.

6.4 Sensitivity Study

We next discuss how PrefCast performs in different environments with
respect to the buffer size, the number of users, and the transmission
range. To take a closer look at the impact of the above factors on
forwarding efficiency, we only compare Local Utility to our PrefCast
without explicit source selection.

Fig. 7 compares the performance of PrefCast when the dropping
probability is considered, as in Eq. (8), or not, as in Eq. (7). The
figure shows that the performance gap between PrefCast and Local Utility
is small when the buffer size is too small or too large. When the buffer
size is too large, each user can cache almost all the objects it
downloads from other forwarders and does not need to drop any objects.
On the contrary, when the buffer can only hold a small number of
objects, each user can only cache a few objects and, in general,
acquires a low total utility. The impact of considering the dropping
probability is therefore only obvious when the buffer is large enough
but cannot cache all the objects.

Fig. 7. Impact of considering the dropping probability (INFOCOM06).

Fig. 8 shows the impact of the number of users on the utility
contribution. Since we cannot adjust the number of users in a real
trace, we use in this simulation the synthetic trace generated based on
SLAW, the state-of-the-art human mobility model proposed in [48]. When
there are more users that can help forward objects, each user can
quickly get the objects that it is interested in. Therefore, the figure
shows that the utility improvement is limited when there are more users
in the network.

Fig. 8. Impact of number of users (SLAW).

Fig. 9 shows the impact of the transmission range of each mobile device
on the utility contribution using SLAW. Each synthetic trace includes
150 users. Again, we choose SLAW in this simulation because the real
traces only specify the contact events without explicit geometric
location information, whereas the SLAW mobility model assigns each user
a geometric location. Note that more users can overhear the transmission
of an object from a forwarder if the forwarder has a longer transmission
range. Thus, when there are more members in a forwarder's group, the
forwarder generates a significant local utility, which dominates the
benefit of future contributions. This is the reason that the performance
gap between PrefCast and Local Utility decreases when the transmission
range increases.

Fig. 9. Impact of transmission range (SLAW).

7 CONCLUSION

In this paper, we propose PrefCast, a preference-aware protocol that
offloads cellular traffic for content dissemination applications through
opportunistic communications with consideration of heterogeneous user
preferences. To balance the trade-off between cellular bandwidth
requirement and the utility gain, we investigate two issues, 1)
source-object assignment and 2) forwarding scheduling, such that the
base-station can use as little cellular bandwidth as possible to
maximally satisfy user preferences. Our source-object selection
algorithm exploits the community structure of a mobile social network to
designate suitable users to help disseminate the assigned objects. We
then formulate the maximum-utility forwarding scheduling model that
allows each forwarder to predict its future utility contribution and
determine its optimal forwarding strategy in a distributed manner
accordingly. The trace-based evaluation shows that, without explicit
source selection, PrefCast can provide a higher total utility than the
preference-oblivious and local-preference-based dissemination schemes.
Enabling source selection in PrefCast further improves the total utility
by 49.3 percent.

ACKNOWLEDGMENTS

This work is partially supported by the National Science Council,
National Taiwan University and Intel Corporation under Grants
NSC102-2911-I-002-001, NSC102-2221-E-001-012-MY2 and NTU103R7501.

REFERENCES

[1] B. Han, P. Hui, V. A. Kumar, M. V. Marathe, G. Pei, and A. Srinivasan, "Cellular traffic offloading through opportunistic communications: A case study," in Proc. 5th ACM Workshop Challenged Netw., 2010, pp. 31–38.
[2] B. Han, P. Hui, V. Kumar, M. Marathe, J. Shao, and A. Srinivasan, "Mobile data offloading through opportunistic communications and social participation," IEEE Trans. Mobile Comput., vol. 11, no. 5, pp. 821–834, May 2012.
[3] P. Baier, F. Durr, and K. Rothermel, "TOMP: Opportunistic traffic offloading using movement predictions," in Proc. IEEE Local Comput. Netw., 2012, pp. 50–58.
[4] Y.-J. Chuang and K.-J. Lin, "Cellular traffic offloading through community-based opportunistic dissemination," in Proc. IEEE Wireless Commun. Netw. Conf., 2012, pp. 3188–3193.
[5] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Pocket switched networks: Real-world mobility and its consequences for opportunistic forwarding," Comput. Lab, Univ. Cambridge, Cambridge, England, Tech. Rep. UCAM-CL-TR-617, Feb. 2005.
[6] M. Motani, V. Srinivasan, and P. S. Nuggehalli, "PeopleNet: Engineering a wireless virtual social network," in Proc. Annu. ACM Int. Conf. Mobile Comput. Netw., 2005, pp. 243–257.
[7] A. Aijaz, H. Aghvami, and M. Amani, "A survey on mobile data offloading: Technical and business perspectives," IEEE Wireless Commun., vol. 20, no. 2, pp. 104–112, Apr. 2013.
[8] Z. Lu, Y. Wen, and G. Cao, "Information diffusion in mobile social networks: The speed perspective," in Proc. IEEE Conf. Comput. Commun., 2014, pp. 1932–1940.
[9] N. P. Nguyen, T. N. Dinh, S. Tokala, and M. T. Thai, "Overlapping communities in dynamic networks: Their detection and mobile applications," in Proc. 17th Annu. ACM Int. Conf. Mobile Comput. Netw., 2011, pp. 85–96.
[10] A. Lancichinetti and S. Fortunato, "Erratum: Community detection algorithms: A comparative analysis," Phys. Rev. E, vol. 89, p. 049902, Apr. 2014.
[11] X. Zhuo, W. Gao, G. Cao, and Y. Dai, "Win-Coupon: An incentive framework for 3G traffic offloading," in Proc. 19th IEEE Int. Conf. Netw. Protocols, 2011, pp. 206–215.
[12] L. Anderegg and S. Eidenbenz, "Ad hoc-VCG: A truthful and cost-efficient routing protocol for mobile ad hoc networks with selfish agents," in Proc. 9th Annu. ACM Int. Conf. Mobile Comput. Netw., 2003, pp. 245–259.
[13] W. Wang, X.-Y. Li, and Y. Wang, "Truthful multicast routing in selfish wireless networks," in Proc. 10th Annu. ACM Int. Conf. Mobile Comput. Netw., 2004, pp. 245–259.
[14] K. Fall, "A delay-tolerant network architecture for challenged internets," in Proc. Conf. Appl., Technol., Archit., Protocols Comput. Commun., 2003, pp. 27–34.
[15] A. Vahdat and D. Becker, "Epidemic routing for partially connected ad hoc networks," Dept. Comput. Sci., Duke Univ., Durham, NC, USA, Tech. Rep. CS-200006, 2000.
[16] A. Lindgren, A. Doria, and O. Schelen, "Probabilistic routing in intermittently connected networks," ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 7, no. 3, pp. 19–20, 2003.
[17] T. Spyropoulos, K. Psounis, and C. S. Raghavendra, "Spray and wait: An efficient routing scheme for intermittently connected mobile networks," in Proc. ACM SIGCOMM Workshop Delay-Tolerant Netw., 2005, pp. 252–259.
[18] E. M. Daly and M. Haahr, "Social network analysis for routing in disconnected delay-tolerant MANETs," in Proc. 8th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2007, pp. 32–40.
[19] P. Hui, J. Crowcroft, and E. Yoneki, "Bubble Rap: Social-based forwarding in delay tolerant networks," in Proc. ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2008, pp. 241–250.
[20] W. Zhao, M. Ammar, and E. Zegura, "Multicasting in delay tolerant networks: Semantic models and routing algorithms," in Proc. ACM SIGCOMM Workshop Delay-Tolerant Netw., 2005, pp. 268–275.
[21] U. Lee, S. Y. Oh, K.-W. Lee, and M. Gerla, "RelayCast: Scalable multicast routing in delay tolerant networks," in Proc. IEEE Int. Conf. Netw. Protocols, 2008, pp. 218–227.
[22] W. Gao, Q. Li, B. Zhao, and G. Cao, "Multicasting in delay tolerant networks: A social network perspective," in Proc. ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2009, pp. 299–308.
[23] S. Ioannidis, A. Chaintreau, and L. Massoulie, "Optimal and scalable distribution of content updates over a mobile social network," in Proc. IEEE Conf. Comput. Commun., 2009, pp. 1422–1430.
[24] G. Sollazzo, M. Musolesi, and C. Mascolo, "TACO-DTN: A time-aware content-based dissemination system for delay tolerant networks," in Proc. 1st Int. MobiSys Workshop Mobile Oppor. Netw., 2007, pp. 83–90.
[25] A. Mashhadi, S. Ben Mokhtar, and L. Capra, "Habit: Leveraging human mobility and social network for efficient content dissemination in delay tolerant networks," in Proc. IEEE Int. Symp. World Wireless, Mobile Multimedia Netw. Workshops, 2009, pp. 1–6.
[26] I. Solis and J. J. Garcia-Luna-Aceves, "Robust content dissemination in disrupted environments," in Proc. 3rd ACM Workshop Challenged Netw., 2008, pp. 3–10.
[27] J. Fan, J. Chen, Y. Du, W. Gao, J. Wu, and Y. Sun, "Geocommunity-based broadcasting for data dissemination in mobile social networks," IEEE Trans. Parallel Distrib. Syst., vol. 24, no. 4, pp. 734–743, Apr. 2013.
[28] H. Cai, I. Koprulu, and N. Shroff, "Exploiting double opportunities for deadline based content propagation in wireless networks," in Proc. IEEE Conf. Comput. Commun., 2013, pp. 764–772.
[29] R. Zhang and Y. C. Hu, "Assisted Peer-to-Peer search with partial indexing," IEEE Trans. Parallel Distrib. Syst., vol. 18, no. 8, pp. 1146–1158, Aug. 2007.
[30] K.-J. Lin, C.-P. Wang, C.-F. Chou, and L. Golubchik, "SocioNet: A social-based multimedia access system for unstructured P2P networks," IEEE Trans. Parallel Distrib. Syst., vol. 21, no. 7, pp. 1027–1041, Jul. 2010.
[31] Y. Wang, Y. Guo, and J. Wu, "Making many people happy: Greedy solutions for content distribution," in Proc. Int. Conf. Parallel Process., 2011, pp. 693–702.
[32] C. Boldrini, M. Conti, and A. Passarella, "ContentPlace: Social-aware data dissemination in opportunistic networks," in Proc. 11th Int. Symp. Model., Anal. Simul. Wireless Mobile, 2008, pp. 203–210.
[33] T. Ning, Z. Yang, X. Xie, and H. Wu, "Incentive-aware data dissemination in delay-tolerant mobile networks," in Proc. 8th Annu. IEEE Commun. Soc. Conf. Sensor, Mesh Ad Hoc Commun. Netw., 2011, pp. 539–547.
[34] F. Li and J. Wu, "MOPS: Providing content-based service in disruption-tolerant networks," in Proc. 29th IEEE Int. Conf. Distrib. Comput. Syst., 2009, pp. 526–533.
[35] W. Gao and G. Cao, "User-centric data dissemination in disruption tolerant networks," in Proc. IEEE Conf. Comput. Commun., 2011, pp. 3119–3127.
[36] K.-J. Lin, C.-W. Chen, and C.-F. Chou, "Preference-aware content dissemination in opportunistic mobile social networks," in Proc. IEEE Conf. Comput. Commun., 2012, pp. 1960–1968.
[37] Y. Liao, K. Tan, Z. Zhang, and L. Gao, "Estimation based erasure-coding routing in delay tolerant networks," in Proc. Int. Conf. Wireless Commun. Mobile Comput., 2006, pp. 557–562.
[38] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, "Impact of human mobility on opportunistic forwarding algorithms," IEEE Trans. Mobile Comput., vol. 6, no. 6, pp. 606–620, Jun. 2007.
[39] T. Karagiannis, J.-Y. Le Boudec, and M. Vojnovic, "Power law and exponential decay of intercontact times between mobile devices," IEEE Trans. Mobile Comput., vol. 9, no. 10, pp. 1377–1390, May 2010.
[40] WiFi Direct [Online]. Available: http://www.wi-fi.org/Wi-Fi_Direct.php, 2014.
[41] D. B. West, Introduction to Graph Theory, 2nd ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 1999.
[42] H. Kuhn, "The Hungarian method for the assignment problem," Naval Res. Logistics Quarterly, vol. 2, nos. 1/2, pp. 83–97, 1955.
[43] V. Srinivasan, M. Motani, and W. T. Ooi, "Analysis and implications of student contact patterns derived from campus schedules," in Proc. 13th Annu. ACM Int. Conf. Mobile Comput. Netw., 2006, pp. 86–97.

[44] P. Hui, A. Chaintreau, J. Scott, R. Gass, J. Crowcroft, and C. Diot, "Pocket switched networks and human mobility in conference environments," in Proc. ACM SIGCOMM Workshop Delay-Tolerant Netw., 2005, pp. 244–251.
[45] N. Eagle and A. Pentland, "Reality mining: Sensing complex social systems," Personal Ubiquitous Comput., vol. 10, no. 4, p. 268, 2006.
[46] M. McNett and G. M. Voelker, "Access and mobility of wireless PDA users," ACM SIGMOBILE Mobile Comput. Commun. Rev., vol. 9, no. 2, pp. 40–55, Apr. 2005.
[47] Last.fm [Online]. Available: http://www.last.fm, 2014.
[48] K. Lee, S. Hong, S. Kim, I. Rhee, and S. Chong, "SLAW: A new mobility model for human walks," in Proc. IEEE Conf. Comput. Commun., 2009, pp. 855–863.

Hsueh-Hung Cheng received the BS degree from the Department of Computer
Science and Information Engineering, National Taiwan University in 2011.
After his graduation, he served in the Navy as an Information and
Network Reserve officer from December 2011 to July 2012. Currently, he
is a research assistant in the Research Center for Information
Technology Innovation at Academia Sinica, Taiwan. His areas of research
interest include E-home, traffic offloading and network mobility.

Kate Ching-Ju Lin received the BS degree from the Department of Computer
Science, National Tsing Hua University in 2003, and the PhD degree from
the Graduate Institute of Networking and Multimedia, National Taiwan
University in 2009. She was a visiting scholar at CSAIL, MIT from March
2007 to March 2008 and from October 2010 to March 2011. After her
graduation, she joined the Research Center for Information Technology
Innovation at Academia Sinica, Taiwan. She is currently an associate
research fellow. Her current research interests include MIMO systems,
wireless multimedia networking, and mobile social networks. She is a
member of the IEEE and ACM.