A Dual Framework and Algorithms for Targeted Online Data Delivery

Haggai Roitman, Avigdor Gal, Senior Member, IEEE, and Louiqa Raschid
Abstract: A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users,
applications, or middleware that are accessing data from multiple autonomous servers. In this paper, we develop a framework for
formalizing and comparing pull-based solutions and present dual optimization approaches. The first approach, most commonly used
nowadays, maximizes user utility under the strict setting of meeting a priori constraints on the usage of system resources. We present
an alternative and more flexible approach that maximizes user utility by satisfying all users while minimizing the usage of
system resources. We discuss the benefits of this latter approach and develop an adaptive monitoring solution, Satisfy User Profiles
(SUP). Through formal analysis, we identify sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we
empirically analyze the behavior of SUP under varying conditions. Our experiments show that SUP achieves a high degree of
satisfaction of user utility when its estimations closely match the real event stream, and that it has the potential to save a
significant amount of system resources. We further show that SUP can exploit feedback to improve user utility with only a moderate
increase in resource utilization.
Index Terms: Distributed databases, online information services, client/server multitier systems, online data delivery.

1 INTRODUCTION
The diversity of data sources and Web services currently
available on the Internet and the computational Grid, as
well as the diversity of clients and application requirements,
poses significant infrastructure challenges. In this paper, we
address the task of targeted data delivery. Users may have
specific requirements for data delivery, e.g., how frequently
or under what conditions they wish to be alerted about
update events or update values, or their tolerance to delays
or stale information. The challenge is to deliver relevant data
to a client at the desired time, while conserving system
resources. We consider a number of scenarios including RSS
news feeds, stock prices and auctions on the commercial
Internet, and scientific data sets and Grid computational
resources. We consider an architecture of a proxy server that
is managing a set of user profiles that are specified with
respect to a set of remote autonomous servers.
Push, pull, and hybrid protocols have been used to solve a
variety of data delivery problems. Push-based technologies
include BlackBerry [16] and JMS messaging, push-based
policies for static Web content (e.g., [20]), and push-based
consistency in the context of caching dynamic Web content
(e.g., [30]). Push is typically not scalable, and reaching a large
number of potentially transient clients is expensive. In some
cases, pushing information may overwhelm the client with
unsolicited information. Pull-based freshness policies have,
therefore, been proposed in many contexts such as Web
caching (e.g., [15]) and synchronizing collections of objects,
e.g., Web crawlers (e.g., [4]). Several hybrid push-pull
solutions have also been presented (e.g., [12]). We focus on
pull-based resource monitoring and satisfying user profiles.
As an example, consider the setting of RSS feeds that are
supported by a pull-based protocol. Currently, the burden
of when to probe an RSS resource lies with the client.
Although RSS providers use a Time-To-Live (TTL) measure
to suggest a probing schedule, a study on Web feeds [21]
shows that 55 percent of Web feeds are updated on a
regular hourly rate. Further, due to heavy workloads that
may be imposed by client probes (especially on popular
Web feed providers such as CNN), about 80 percent of the
feeds have an average size smaller than 10 KB, suggesting
that items are promptly removed from the feeds. These
statistics on refresh frequency and volatility illustrate the
challenge faced by a proxy in satisfying user needs. As the
number of users and servers grows, service personalization
through targeted data delivery by a proxy can serve as a
solution for better managing system resources. In addition,
the use of profiles could lower the load on RSS servers by
accessing them only to satisfy a user profile.
Much of the existing research in pull-based data delivery
(e.g., [7], [24]) casts the problem of data delivery as follows:
Given some set of limited system resources, maximize the utility
of a set of user profiles. We refer to this problem as $\mathrm{OptMon}_1$.
We consider the following two examples: a Grid perfor-
mance monitor tracks computational resources and notifies
users of changes in system load and availability. Excessive
probing of these machines may increase their load and hurt
their performance. As another example, consider a data
source that charges users for access. Clearly, minimizing the
number of probes to such a source is important to keep
probing costs low.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 23, NO. 1, JANUARY 2011 5
. H. Roitman is with IBM Haifa Research Lab, Haifa, Israel.
E-mail: haggai@il.ibm.com.
. A. Gal is with the Faculty of Industrial Engineering and Management,
Technion - Israel Institute of Technology, Technion City, Haifa 32000,
Israel. E-mail: avigal@ie.technion.ac.il.
. L. Raschid is with the University of Maryland, College Park, MD 20742.
E-mail: louiqa@umiacs.umd.edu.
Manuscript received 17 Sept. 2008; revised 5 May 2009; accepted 12 Sept.
2009; published online 11 Jan. 2010.
Recommended for acceptance by J. Liu.
For information on obtaining reprints of this article, please send e-mail to:
tkde@computer.org, and reference IEEECS Log Number TKDE-2008-09-0488.
Digital Object Identifier no. 10.1109/TKDE.2010.15.
1041-4347/11/$26.00 © 2011 IEEE. Published by the IEEE Computer Society
A solution to $\mathrm{OptMon}_1$ is accompanied by the need to meet
rigid a priori bounds on system resource constraints. This
may not be adequate in environments where the demand for
monitoring changes dynamically. Examples of changing
needs in the literature include help desks and reservation
systems. A rigid a priori setting may also have the unintended
consequence of forcing excessive resource consumption even
when there is no additional utility to the user.
To address some of the limitations of $\mathrm{OptMon}_1$, we
propose a framework where we consider the dual of the
previous optimization problem as follows: Given some set of
user profiles, minimize the consumption of system resources while
satisfying all user profiles. We label this problem as $\mathrm{OptMon}_2$;
it will be formally defined in Section 2. With this class of
problems, user needs are set as the constraining factor of the
problem (and thus, need to be satisfied), while resource
consumption is dynamic and changes with needs. We
present an optimal algorithm in the $\mathrm{OptMon}_2$ class, namely,
Satisfy User Profiles (SUP). SUP is simple yet powerful in its
ability to generate optimal scheduling of pull requests. SUP
is an online algorithm; at each time point, it can get
additional requests for resource monitoring. Through
formal analysis, we identify sufficient conditions for SUP
to be optimal given a set of updates to resources. Under these
conditions, it satisfies all client needs with minimal resource
consumption. We also show the conditions under which SUP can
optimally solve $\mathrm{OptMon}_1$ as well.
SUP depends on an accurate model of when updates
occur to perform correctly. However, in practice, such
estimations suffer from two problems. First, the underlying
update model that is used is stochastic in nature, and
therefore, updates deviate from the expected update times.
Second, it is possible that the underlying update model is
(temporarily or permanently) incorrect, and the real data
stream behaves differently than expected. To accommodate
changes to source behavior, compensating for stochastic
behavior, correlations, and bursts, SUP exploits feedback
from probes and can adapt the probing schedule in a
dynamic manner and improve scheduling decisions. We
also present SUP($\lambda$), which addresses the second problem and
can locally apply modifications to the update model
parameters. Both SUP and SUP($\lambda$) are shown empirically
to work well under stochastic variations.
We present an extensive evaluation of the solutions to
the two monitoring problems. For our experimental
comparison of $\mathrm{OptMon}_1$, we consider the WIC algorithm
[24], which provides the best solution in the literature. For
$\mathrm{OptMon}_2$, we consider the ubiquitous TTL algorithm [15].
We use real traces from an RSS server and synthetic data,
and several example profiles. Our experiments show that
SUP achieves a high degree of satisfaction of user utility
when its estimations closely match the real event
stream, and that it can save a significant amount of system
resources compared to solutions that have to meet strict a
priori allocations of system resources. We further show that
feedback improves user utility of SUP($\lambda$) with only a
moderate increase in resource utilization.
The rest of the paper is organized as follows: Section 2
describes the dual framework for targeted data
delivery. We next present our model for targeted data
delivery in Section 3. Sections 4-6 introduce SUP, an optimal
dynamic algorithm for solving an $\mathrm{OptMon}_2$ problem,
discuss its properties, and provide a heuristic variation
SUP($\lambda$) that locally applies modifications to the update
model parameters. We present our empirical analysis in
Section 7. We conclude with a discussion of related work
(Section 8) and conclusions (Section 9).
2 DUAL FRAMEWORK FOR TARGETED DATA DELIVERY
In what follows, let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of resources;
let $T$ be an epoch; and let $\{T_1, T_2, \ldots, T_K\}$ be a set of
equidistant chronons (see footnote 1) in $T$. A schedule
$S = \{s_{i,j}\}_{i=1 \ldots n,\ j=1 \ldots K}$
is a set of binary decision variables, where $s_{i,j}$ is set to 1 if resource $r_i$ is
probed at time $T_j$, and 0 otherwise. For example, Fig. 1
illustrates a possible schedule $S$ as a matrix with binary
values, where rows represent resources and columns are
chronons. Thus, for example, at chronon $T_3$, the illustrated
schedule assigns probes to resources $r_2$ and $r_4$. We further
denote by $\mathcal{S}$ the set of all possible schedules. Next, we
define the dual approaches for targeted data delivery,
namely, $\mathrm{OptMon}_1$ and $\mathrm{OptMon}_2$.
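2.1 $\mathrm{OptMon}_1$
$\mathrm{OptMon}_1$ can be roughly described as the following problem:
$$
\begin{aligned}
&\text{maximize} && \text{user utility} \\
&\text{s.t.} && \text{satisfying system constraints.}
\end{aligned}
\tag{1}
$$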
The $\mathrm{OptMon}_1$ formulation assumes that system constraints
are hard constraints whose assignment is, in
general, a priori independent of the specific user utility
maximization task. For example, in [24], [28], $\mathrm{OptMon}_1$
involves a system resource constraint of the maximum
number of probes per chronon for all resources. In [24], this
constraint represents the number of monitoring tasks that a
Web monitoring system can allocate per chronon for the task
of maximizing the utility gained from capturing updates to
Web pages (see Section 7 for a more technical description of
this algorithm). In [28], the same constraint represents the
number of available crawling tasks for maximizing the
freshness of Web repositories. Eckstein et al. [13] present a
politeness constraint, which sets an upper bound on the
total number of probes a proxy client is allowed to have for
the whole epoch. These two constraint types are illustrated
in Fig. 1. The vertical oval represents the per-chronon
constraint and the horizontal oval represents the politeness
constraint. Both constraints are violated in the example
given in Fig. 1.
Fig. 1. Examples of a schedule and system constraints.
Footnote 1. A chronon is an indivisible unit of time. Our model is not restricted to
equidistant chronons, yet such a representation avoids excessive use of
notation.
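The two constraint types described above can be checked mechanically on a schedule matrix. The following is a minimal sketch, assuming the schedule is stored as a list of rows (one per resource) and reading the politeness constraint as a per-resource bound over the epoch, as the horizontal oval suggests. The function names, the bounds, and the example matrix are hypothetical and are not the schedule shown in Fig. 1.

```python
from typing import List

# Rows are resources r_1..r_n, columns are chronons T_1..T_K; entries are 0 or 1.
Schedule = List[List[int]]


def per_chronon_violations(schedule: Schedule, max_probes: int) -> List[int]:
    """Return 0-based chronon indices whose total probe count exceeds max_probes."""
    n_chronons = len(schedule[0])
    return [
        j for j in range(n_chronons)
        if sum(row[j] for row in schedule) > max_probes
    ]


def politeness_violations(schedule: Schedule, max_probes: int) -> List[int]:
    """Return 0-based resource indices probed more than max_probes times in the epoch."""
    return [i for i, row in enumerate(schedule) if sum(row) > max_probes]


# Hypothetical schedule with 4 resources and 5 chronons.
# At chronon T_3 (column index 2) it probes r_2 and r_4, as in the text.
S = [
    [1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0],
]

print(per_chronon_violations(S, max_probes=1))  # chronons with more than one probe
print(politeness_violations(S, max_probes=3))   # resources probed more than three times
```

Under these assumed bounds, both checks report violations, which mirrors the situation the figure is said to depict.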
The benefits of $\mathrm{OptMon}_1$ are apparent whenever there are
hard system constraints on resources, e.g., limited bandwidth
for mobile users. In such a setting, $\mathrm{OptMon}_1$ can
maximize user utility. On the downside, the $\mathrm{OptMon}_1$ formulation
has two main limitations. First, with a diversity of server
and client profiles, we expect that there will be periods of
varying intensity with respect to both the updates at
the server(s) and the probes needed to
satisfy client profiles. The second problem is the rigidity of
$\mathrm{OptMon}_1$ algorithms with respect to system resource allocation.
It is generally not known a priori how many times we
need to probe sources. An estimate that is too low will fail to
satisfy the user profile, while an estimate that is too high may
result in excessive and wasteful probes (e.g., as is the case
with current RSS applications). Solutions to $\mathrm{OptMon}_1$ have
not dynamically attempted to reduce resource consumption,
even if doing so would not negatively impact client utility.
For example, in a solution to $\mathrm{OptMon}_1$ [24], once the upper
bound on bandwidth has been set (given in terms of how
many data sources can be probed in parallel per chronon),
bandwidth can no longer be adjusted and user needs may not
be fully met. Moreover, while $\mathrm{OptMon}_1$ may be allocated
additional system resources over time, this by itself
cannot guarantee an efficient utilization of system resources
that could improve the gain in user utility.
2.2 $\mathrm{OptMon}_2$
We propose a dual formulation, $\mathrm{OptMon}_2$, which reverses the
roles of user utility and system constraints, setting the
fulfillment of user needs as the hard constraint. $\mathrm{OptMon}_2$
assumes that the system resources that will be consumed to
satisfy user profiles should be determined by the specific
profiles and the environment, e.g., the model of updates, and
does not assume an a priori limitation of system resources.
$\mathrm{OptMon}_2$ can be stated as the following general formulation:
$$
\begin{aligned}
&\text{minimize} && \text{system resource usage} \\
&\text{s.t.} && \text{satisfying user profiles.}
\end{aligned}
\tag{2}
$$
2.3 Comparison of $\mathrm{OptMon}_1$ and $\mathrm{OptMon}_2$
The dual problems are inherently different. Therefore, no
solution for one problem can dominate a solution to the
other for all possible problem instances. Each solution has a
better fit for a different application. For example, a crawler
may have dedicated resources that must be consumed or
will go to waste; $\mathrm{OptMon}_1$ fits this scenario. $\mathrm{OptMon}_2$ best
suits scenarios in which system constraints are soft (e.g.,
more bandwidth can be added, or more disk space can be
bought) and where the consumption of these system
resources depends heavily on user specifications. For
example, a proxy serving many clients may procure
resources on demand; here, $\mathrm{OptMon}_2$ works better.
Fig. 2 illustrates the benefit of $\mathrm{OptMon}_2$. In the monitoring
schedule, each vertical bar represents the amount of
needed probes in a chronon to fully satisfy client needs. The
data represented in this figure is taken from one of the
traces we use for our experiments (see Section 7) and is
brought here for illustration purposes only. The two
horizontal lines represent a fixed allocation of probes, one
of a single resource per chronon and the other for three
resources per chronon. For each such allocation, we
compute the amount of missed resources due to insufficient
probes and the wasted probes. For a single resource per
chronon, 95 resources are missed and 29 probes are wasted.
For three resources per chronon, 24 resources are missed
and 158 probes are wasted. This confirms that a flexible
resource allocation, driven by needs, can assist in efficient
resource consumption while catering better to client needs.
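The missed and wasted counts quoted above can be reproduced for any demand series with a few lines of arithmetic. The sketch below assumes that, per chronon, demand above the fixed allocation counts as missed and allocation above the demand counts as wasted; this reading of the figure, the function name, and the demand values are assumptions for illustration and do not come from the traces used in the paper.

```python
from typing import List, Tuple


def missed_and_wasted(needed: List[int], allocation: int) -> Tuple[int, int]:
    """Compare a fixed per-chronon probe allocation against the probes needed.

    needed[j] is the number of probes required at chronon j to satisfy all clients.
    Returns (missed, wasted), both summed over the epoch.
    """
    missed = sum(max(0, demand - allocation) for demand in needed)
    wasted = sum(max(0, allocation - demand) for demand in needed)
    return missed, wasted


# Hypothetical bursty demand over eight chronons.
needed = [0, 2, 5, 1, 0, 3, 0, 1]

for allocation in (1, 3):
    missed, wasted = missed_and_wasted(needed, allocation)
    print(f"allocation={allocation}: missed={missed}, wasted={wasted}")
```

Raising the fixed allocation lowers the missed count but raises the wasted count, which is the trade-off a needs-driven allocation is meant to avoid.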
3 MODEL FOR TARGETED DATA DELIVERY
The centerpiece of our model is the notion of execution
intervals, a simple modeling tool for representing dynami-
cally changing client needs. We discuss user profiles, server
notifications, and monitoring. We also discuss how execu-
tion intervals are generated from user profiles. We then turn
our attention to the formal definition of a schedule and the
utility of probing.
To illustrate our model and algorithms, we present a case
study using RSS, a popular format for publishing informa-
tion summaries on the Web. Diverse data types are
nowadays available as publications in RSS, including news
and weather updates, blog postings, media postings, digital
catalog notifications, promotions, white papers, and soft-
ware updates. The use of RSS feeds is continuously growing
and is supported by a pull-based protocol. RSS customiza-
tion today is provided using specialized RSS readers (also
known as RSS aggregators). A user of such a reader can
customize her profile by specifying the rate of monitoring
each RSS feed. Some readers even allow defining filtering
rules over the RSS feed content which support further
personalization. Recently, the RSS protocol was extended
with special metatags such as server side TTL that hint
when new updates are expected. We note that while this
improves customization, server side hints such as TTL for
static content delivery are not often used in other contexts,
and were shown to be inefficient [17, 95]. Despite these
features, a client who is only interested in being alerted of
updates for a particular item in some news category,
whenever the rate of updates increases to be at least twice
as much as the usual rate, cannot specify such a profile
using standard available RSS readers. This scenario requires
further refined personalization that is currently unavailable.
Our case study is that of RSS monitoring of CNN News,
providing publications of news updates by CNN on various
topics such as world news, sports, finance, etc. Typically,
only news article titles are provided in an RSS item and a
link directs the reader to the original article. Each item also
has a time stamp of its publication date, and sometimes,
CNN also provides TTL tags.
ROITMAN ET AL.: A DUAL FRAMEWORK AND ALGORITHMS FOR TARGETED ONLINE DATA DELIVERY 7
Fig. 2. Resource allocation illustrated.
3.1 User Profiles
Profiles are declarative user specifications for data delivery.
A profile should be easy to specify and sufficiently rich to
capture client requirements. A profile should have clear
semantics and be simple to implement. To illustrate the
basic elements of a profile, we introduce next an example of
a profile template (Fig. 3). The profile is given in a profile
language we have developed and whose full specification
can be found in [25]. The language syntax uses XML and its
various elements are explained below. In particular, we
assume that every resource $r \in R$ has a unique identifier
(e.g., URI) and can be described using some schema (e.g.,
Relational Schema, DTD, XML-Schema, RDF-Schema, etc.). A
resource can be queried using a suitable query language
(e.g., SQL, XQuery, SPARQL, etc.). For example, the profile
in Fig. 3 queries an RSS resource using XQuery (the RSS schema
is available in [27]). This profile can be stated in English as
follows: Return the title and description of items published on
the CNN Top Stories RSS feed channel, once X new updates to
the channel have occurred. A notification received within Y
minutes after each X new updates occurred will have a value of 1,
else 0. Notifications should take place during two months starting
on 24 August 2008, 10:00:00 GMT.
In our model, we assume a setting in which a proxy
monitors a set of resources in $R$ given a set of client profiles
$P = \{p_1, p_2, \ldots, p_m\}$. A profile $p \in P$ contains two element
types, namely, Domain and Notification. $\mathrm{Domain}(p) \subseteq R$ is a
set of resources of interest to the client. A notification rule $\eta$
is a rule defined over a subset of resources in $\mathrm{Domain}(p)$. A
profile $p$ contains a set $N = \{\eta_1, \eta_2, \ldots, \eta_k\}$ of notification
rules. It is worth noting that user profiles can be dynamic,
where the set of notification rules in a given profile may
change over time, expressing changes in user interests.
3.2 Notifications
Clients use notification rules to describe their data needs and
express the utility (see Section 3.4) they assign to data
delivery. A notification rule extends the Event-Condition-Action
(ECA) structure in active databases [1], [11] and can be
modified dynamically by the user. A notification rule $\eta$ is a
quadruple $\langle Q, Tr, T, U \rangle$. $Q$ is a query written in some suitable
query language (e.g., XQuery in the example profile in Fig. 3).
$Tr$ is a trigger. $T$ is the epoch in which rules are evaluated.
Finally, $U$ is a utility expression specifying the utility the client
gains from notifications of $Q$.
A notification query $Q$ is specified over a set of resources
from the profile domain, denoted by $\mathrm{Domain}(Q, \eta)$. Queries
are equivalent to actions in the ECA structure. Fig. 3 has an
XQuery expression that selects all items from the CNN Top
Stories RSS channel element and returns, for each item, its
title and description.
A trigger $Tr$ is an event-condition expression $\langle e, c \rangle$
specifying the triggering event $e$ and a condition $c$. $Tr$ is
also specified over a set of resources from the profile
domain, denoted by $\mathrm{Domain}(Tr, \eta)$. It is worth noting that
$\mathrm{Domain}(Q, \eta)$ and $\mathrm{Domain}(Tr, \eta)$ may overlap, that is,
$\mathrm{Domain}(Q, \eta) \cap \mathrm{Domain}(Tr, \eta) \neq \emptyset$. Once an event $e$ is
detected, the condition $c$ is immediately evaluated (immediate
coupling mode [18]) and if true, the query $Q$ is
scheduled. We note that other coupling modes are available
in the literature [18].
We consider two event types. The first is an update to a
resource in $\mathrm{Domain}(Tr, \eta)$. The second is a temporal event,
e.g., once an hour. The condition $c$ is a Boolean-valued
expression. For example, the trigger in Fig. 3 specifies an
update event
AFTER UPDATE TO $rss/channel
with a condition
NUPDATE($rss/channel) % X = 0
where % is the modulo operator. NUPDATE
returns the total number of times the RSS channel was
updated since the start of the notification epoch.
A notification utility expression $U$ states the utility a client
gains from notifications of $Q$. For example, the
expression in Fig. 3 specifies the following utility:
WITHIN Y minutes 1 ELSE 0
which means that the client assigns a utility of 1 to notifications
of $Q$ that are delivered with a maximum delay of Y minutes
after X update events have occurred. In Section 3.4, we
provide formal specifications of the utility of probing.
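The trigger condition and the utility expression of this profile can be mimicked in a few lines. This is a minimal sketch rather than an implementation of the profile language of [25]: the function names are hypothetical, update times are assumed to be given in minutes since the start of the epoch, and the sample update times are invented.

```python
from typing import List


def trigger_fires(nupdate: int, x: int) -> bool:
    """Condition NUPDATE % X = 0, evaluated immediately after an update event."""
    return nupdate % x == 0


def notification_utility(delay_minutes: float, y: float) -> int:
    """Utility expression WITHIN Y minutes 1 ELSE 0."""
    return 1 if 0 <= delay_minutes <= y else 0


def firing_times(update_times: List[int], x: int) -> List[int]:
    """Update times at which the trigger condition holds (update counts are 1-based)."""
    return [
        t for count, t in enumerate(update_times, start=1)
        if trigger_fires(count, x)
    ]


# Hypothetical update times of the channel, in minutes since the epoch started.
updates = [3, 8, 21, 30, 47]
X, Y = 2, 10

print(firing_times(updates, X))         # every second update fires the trigger
print(notification_utility(4, Y))       # delivered 4 minutes after the event
print(notification_utility(11, Y))      # delivered too late
```

With X = 2 the trigger fires on every other update, and a notification is worth 1 only if it arrives within Y minutes of such an update.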
3.3 Execution Intervals and Monitoring
Once an event, specified in the trigger part of the
notification rule, occurs, the trigger condition is immedi-
ately evaluated and if it is true, the notification is said to be
executable. The period in which a notification rule is
executable was referred to in the literature as life [24]. We
emphasize here the difference between the executable
period of a notification (life) and the period in which rules,
in general, can be evaluated (epoch). We shall use two examples of life
in this paper. The first is life = overwrite, in which an
update is available for monitoring only until the next
update to the same resource occurs. The second is
life = window(Y), for which an update can be monitored
up to $Y$ chronons after it has occurred (where $Y = 0$ denotes
a requirement for immediate notification).
Fig. 3. Profile example for RSS feeds.
The time period in which a notification is executable for
some event defines an execution interval (interval for short),
during which monitoring (i.e., the query part of a notifica-
tion) should take place. An execution interval starts with an
event and its length is determined by the relevant life policy.
Each notification rule $\eta$ is associated with a set of intervals
$EI(\eta)$. For each $I \in EI(\eta)$, we define $t(I)$ as the times $T_j \in T$
in which the notification is executable for interval $I$. If, for
example, $I$ is determined using the life = window(Y) policy,
then $t(I)$ would contain exactly $Y$ chronons. It is worth
noting that execution intervals of a notification rule may
overlap; thus, the execution of a notification query may occur
at the same time for two or more events that cause the
notification to become executable. It is also worth noting that
execution intervals change dynamically once a notification
rule has been modified.
Monitoring can be done using one of three methods,
namely, push-based, pull-based, or hybrid. With push-based
monitoring, the server pushes updates to clients, providing
guarantees with respect to data freshness at a possibly
considerable overhead at the server. With pull-based mon-
itoring, content is delivered upon request, reducing over-
head at servers, with limited effectiveness in estimating
object freshness. The hybrid approach combines push and
pull, either based on resource constraints [12] or role
definition. As an example of role-based hybrid push-pull,
consider an architecture in which a mediator is positioned
between clients and servers. The mediator can monitor
servers by periodically pulling their content, and determine
when to push data to clients based on their content delivery
profiles. In this paper, we focus on pull-based monitoring.
For completeness, the online supplement, which can be found on
the Computer Society Digital Library at
http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15,
describes by example how execution intervals
can be derived from notification rules using update models.
For this purpose, we utilize Poisson update models. In [7],
[14], [19], it was argued that an update model
based on Poisson processes suits updates in a Web
environment well. Poisson processes are suitable for modeling a
another, which seems plausible in data sources with widely
distributed access, such as updates to auction Web sites.
Following [14], we devise an update model, based on
nonhomogeneous Poisson processes, capturing time-vary-
ing update intensities. Such a model reflects well scenarios
in which e-mails arrive more rapidly during work hours
and more bids arrive toward the end of an auction.
Example 1. As an example, we now assume that $X = 2$
and $Y = 10$, that is, the notification rule described in
Fig. 3 requires delivering every other update, assuming
life = window(10). Fig. 4 illustrates an example of an
update event stream realization, estimated using some
update model. The gray intervals are the derived
execution intervals, and the black intervals in Fig. 4
illustrate the derived execution intervals in the case
of life = overwrite.
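The derivation in Example 1 can be sketched in code for a given stream of update times. The sketch assumes inclusive chronon ranges, takes window(Y) to cover the chronon of the event plus the following Y chronons (a convention chosen here so that Y = 0 yields a single chronon; the counting convention in the text may differ), and ends an overwrite interval one chronon before the next update. The update times, the epoch length, and all function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (first chronon, last chronon), inclusive


def trigger_events(update_times: List[int], x: int) -> List[int]:
    """Chronons of every x-th update (update counts are 1-based)."""
    return [t for count, t in enumerate(update_times, start=1) if count % x == 0]


def window_intervals(events: List[int], y: int, epoch_end: int) -> List[Interval]:
    """life = window(Y): monitor from the event up to Y chronons later."""
    return [(t, min(t + y, epoch_end)) for t in events]


def overwrite_intervals(
    events: List[int], update_times: List[int], epoch_end: int
) -> List[Interval]:
    """life = overwrite: monitor from the event until just before the next update."""
    intervals = []
    for t in events:
        later = [u for u in update_times if u > t]
        end = min(later) - 1 if later else epoch_end
        intervals.append((t, end))
    return intervals


# Hypothetical estimated update stream over an epoch of 60 chronons.
updates = [3, 8, 21, 30, 47]
events = trigger_events(updates, x=2)

print(window_intervals(events, y=10, epoch_end=60))
print(overwrite_intervals(events, updates, epoch_end=60))
```

The two policies produce different interval lengths from the same event stream, which is how the life parameter encodes client tolerance.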
Example 1 highlights one of the main aspects of our
model, which is personalization. Note that the life parameter
represents not only different server capabilities but also
client preferences. Some clients are interested in receiving an
update before the next update arrives (e.g., giving a purchase
order before the next price change), while others are tolerant
to some extent (represented by a time-based window). The
SUP algorithm we introduce in this work proposes an
efficient schedule given a variety of user profiles, represented
by an abstract set of execution intervals. From now on,
we assume the availability of a stream of execution intervals,
possibly generated using the method suggested in the online
supplement (http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15).
It is worth noting that the derivation
of execution intervals can be done online. Such online
derivation can be used to delay the generation of execution
intervals, thus utilizing feedback gathered during monitor-
ing to improve future monitoring. We use this observation to
derive adaptive monitoring schemes.
As a final note, in this work, we assume that once an
execution interval is probed, the notification to the user is
immediate. An extension in which notifications may be
delayed is easy to model using execution intervals. In
such cases, an execution interval $I_k$ is computed to be
$I_k = [T_s, T_f - D]$, where $D$ denotes the estimated delay in
notification to the user. The interval is shortened (from
the right) to ensure a timely delivery of update events.
3.4 Schedules and the Utility of Probing
Let $N_k$ be the set of notification rules of profile $p_k \in P$. Let
$\eta \in N_k$ be a notification rule that utilizes resources from the
domain of $p_k$. The satisfiability of a schedule with respect to $\eta$ is
defined next.
Definition 1. Let $S \in \mathcal{S}$ be a schedule, $\eta$ be a notification rule
with $\mathrm{Domain}(Q, \eta)$, and $T$ be an epoch with $K$ chronons. $S$ is
said to satisfy $\eta$ in $T$ (denoted by $S \models_T \eta$) if
$$
\forall I \in EI(\eta),\ \forall r_i \in \mathrm{Domain}(Q, \eta)\ \big(\exists T_j \in t(I) : s_{i,j} = 1\big).
$$
Definition 1 requires that in each execution interval, every
resource referenced by the query $Q$ of $\eta$ is probed at least once. It
is worth noting that each execution interval $I \in EI(\eta)$ is
associated with some (either update or periodical) event,
and therefore, a schedule that satisfies the notification rule $\eta$
actually needs to capture every event required in $\eta$.
Whenever it is clear from the context, we use $S \models \eta$
instead of $S \models_T \eta$. Definition 1 is easily extended to a
profile and a set of profiles, as follows:
Definition 2. Let $S \in \mathcal{S}$ be a schedule, $P = \{p_1, p_2, \ldots, p_m\}$ be
a set of profiles, and $T$ be an epoch with $K$ chronons.
$S$ is said to satisfy $p_k \in P$ (denoted as $S \models p_k$) if for
each notification rule $\eta \in N_k$, $S \models \eta$.
$S$ is said to satisfy $P$ (denoted as $S \models P$) if for each profile
$p_k \in P$, $S \models p_k$.
Fig. 4. Example execution intervals derived from an update model.
Example 2. As an example of profile satisfaction, we
consider Fig. 5, which contains an example user profile
and two possible schedules. On the left side of Fig. 5, we
have an epoch with five chronons and three execution
intervals, each associated with a different notification
rule of the user profile. The first, $I_1$, requires probing
resource $r_1$ during $[T_1, T_2]$; the second, $I_2$, requires
probing resource $r_2$ during $[T_3, T_4]$; and finally, the third, $I_3$,
requires probing resource $r_3$ during chronon $T_5$.
Both Schedule 1 and Schedule 2 in Fig. 5 probe each one
of the execution intervals; thus, they both satisfy the
three notification rules, and thus, satisfy the profile.
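The satisfaction test of Definitions 1 and 2 reduces to checking that every execution interval contains at least one probe of its resource. The sketch below encodes the three intervals of Example 2 and tests them against schedules. It assumes each execution interval is flattened into one (resource, first chronon, last chronon) triple per referenced resource, with 1-based inclusive indices. The concrete schedule matrices are hypothetical, since the contents of Fig. 5 are not reproduced in the text.

```python
from typing import List, Tuple

Schedule = List[List[int]]          # rows: resources, columns: chronons
Interval = Tuple[int, int, int]     # (resource, first chronon, last chronon), 1-based


def probed(schedule: Schedule, resource: int, chronon: int) -> bool:
    """True if s_{i,j} = 1 for the given 1-based resource and chronon."""
    return schedule[resource - 1][chronon - 1] == 1


def satisfies(schedule: Schedule, intervals: List[Interval]) -> bool:
    """Every interval must contain at least one probe of its resource."""
    return all(
        any(probed(schedule, resource, j) for j in range(start, end + 1))
        for resource, start, end in intervals
    )


# Execution intervals of Example 2: r1 in [T1, T2], r2 in [T3, T4], r3 at T5.
intervals = [(1, 1, 2), (2, 3, 4), (3, 5, 5)]

# Hypothetical schedules over three resources and five chronons.
schedule_a = [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 0, 1]]
schedule_b = [[0, 1, 0, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
schedule_c = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 1]]  # probes r2 too early

for name, s in (("a", schedule_a), ("b", schedule_b), ("c", schedule_c)):
    print(name, satisfies(s, intervals))
```

Two different schedules can satisfy the same profile, as long as each probe falls inside the relevant interval.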
Given a notification rule $\eta \in N_k$ and a resource
$r_i \in \mathrm{Domain}(Q, \eta)$, a utility function $u(r_i, \eta, T_j)$ describes
the utility of probing a resource $r_i$ at chronon $T_j$ according
to notification rule $\eta$. Intuitively, probing a resource $r_i$ at
chronon $T_j$ is useful (and therefore, should receive a positive
utility value) if it is referred to in the Query part of the
notification rule and if the condition in the Trigger part of
that profile holds. It is important to emphasize again the
difference of roles between the Query part and the Trigger
part of the notification rule. In particular, probing a resource
$r_i$ is useful only if the data required by a notification rule
(specified in the Query part) can be found at $r_i$.
$u(r_i, \eta, T_j)$ is derived by assigning positive utility when a
condition is satisfied, and a utility of 0 otherwise. $u$ is
defined to be strict if it satisfies the following condition:
$$
u(r_i, \eta, T_j) =
\begin{cases}
u, & T_j \in \bigcup_{I \in EI(\eta)} t(I) \ \wedge\ r_i \in \mathrm{Domain}(Q, \eta), \\
0, & \text{otherwise.}
\end{cases}
\tag{3}
$$
That is, $u(r_i, \eta, T_j)$ assigns a constant value of $u$ whenever
there exists an execution interval for resource $r_i$, derived
from notification rule $\eta$, and the probe of resource $r_i$,
referenced by the query $Q$, coincides with the time
constraints of the execution interval.
From now on, we shall assume the use of a binary utility,
i.e., $u = 1$. Examples of strict utility functions include
uniform (where utility is independent of delay) and sliding
window (where utility is 1 within the window and 0 outside
it). Examples of nonstrict utility functions are linear and
nonlinear decay functions. Nonstrict utilities quantify
tolerance toward delayed data delivery (or latency). We
shall restrict ourselves in this work to strict utility functions.
The case of nonstrict utility functions can be handled in the
scope of $\mathrm{OptMon}_2$ problems by allowing users to define a
threshold for the minimal utility required in the user profile
(e.g., the maximum delay allowed on each notification to
the user). We handle such utilities in our model using the
restrictions of the life = window(Y) parameter.
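The expected utility $U$ accrued by executing monitoring schedule $S$ in an epoch $\mathcal{T}$ is given by
$$U(S) = \sum_{\eta \in \bigcup_{l=1}^{m} \mathcal{N}_l} \; \sum_{I \in EI(\eta)} \; \sum_{r_i \in Domain(Q, \eta)} \min\left(1, \sum_{T_j \in I} s_{i,j} \cdot u(r_i, \eta, T_j)\right). \tag{4}$$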
The innermost summation ensures that utility is accumu-
lated whenever a probe is performed within an execution
interval. This utility cannot be more than 1 since probing a
resource more than once within the same execution interval
does not increase its utility. The utility is summed over all
execution intervals, all relevant resources, and over all
notification rules in a profile.
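The computation in (4) can be illustrated with a minimal sketch, assuming a binary strict utility ($u = 1$) and an encoding of execution intervals as inclusive ranges of chronon indices. The function name `schedule_utility`, the dictionary layout, and the sample rules are hypothetical and are not taken from the prototype.

```python
from typing import Dict, List, Set, Tuple

# An execution interval as an inclusive range of chronon indices (assumed encoding).
Interval = Tuple[int, int]


def schedule_utility(
    rules: Dict[str, Tuple[Set[str], List[Interval]]],
    probes: Set[Tuple[str, int]],
) -> int:
    """Binary strict utility of a schedule, following the shape of (4).

    rules maps a rule name to (resources in its query, its execution intervals).
    probes holds the (resource, chronon) pairs at which the schedule probes.
    """
    total = 0
    for resources, intervals in rules.values():
        for start, end in intervals:
            for resource in resources:
                hits = sum(
                    1 for t in range(start, end + 1) if (resource, t) in probes
                )
                # Probing twice inside one execution interval adds nothing.
                total += min(1, hits)
    return total


# Hypothetical data: two rules, three probes.
rules = {
    "rule_a": ({"r1"}, [(2, 6)]),
    "rule_b": ({"r1", "r2"}, [(5, 8)]),
}
probes = {("r1", 5), ("r1", 6), ("r2", 9)}
print(schedule_utility(rules, probes))
```

In this hypothetical input, the two probes of `r1` fall inside both intervals but each (interval, resource) pair is credited only once, and the probe of `r2` at chronon 9 misses its interval, so it contributes nothing.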
Example 3. As a concrete example of utility calculation, consider once more Fig. 5. In this example, we saw that the two schedules satisfy the profile. Probing within an execution interval credits a schedule with a utility of 1, and since both schedules satisfy the three notification rules, the total utility acquired by each schedule is 3.
4 THE SUP ALGORITHM
Let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of $n$ resources, $\{T_1, T_2, \ldots, T_N\}$ be a set of chronons in an epoch $\mathcal{T}$, and $S = \{s_{i,j}\} \in \mathcal{S}$ be a schedule. Let $\mathcal{P} = \{p_1, p_2, \ldots, p_m\}$ be a set of user profiles. A concrete $OptMon_2$ problem can be formalized as follows:
$$\text{minimize} \sum_{r_i \in R,\; T_j \in \mathcal{T}} s_{i,j} \quad \text{s.t.} \quad S \models \mathcal{P}. \tag{5}$$
Recall that a notification rule $\eta$ is associated with a set of resources $Domain(Q, \eta)$. Given a notification rule $\eta$ and the set of its execution intervals $EI(\eta)$, SUP identifies the set of resources $Q^{\eta}_{I} \subseteq Domain(Q, \eta)$ that must be probed in an execution interval $I \in EI(\eta)$.
The main intuition behind the SUP algorithm is to identify the best candidate chronon in which the assignment of probes to resources maximizes the number of execution intervals that can benefit from each probe. This will then lead to a reduction of the number of resources in $Q^{\eta}_{I}$ that need to be probed during each execution interval $I$. We identify the best candidate chronons by delaying the probes of execution intervals to the last possible chronon in which the utility is still positive (and notifications can be safely delivered to users). As we will prove later in Section 5.2, such probe assignments ensure a perfect elimination order of execution intervals that leads to an optimal solution.
Fig. 5. Example of a user profile and two schedules satisfying it.
Example 4. Fig. 6 provides an illustration of the SUP execution philosophy, with two notification rules $\eta_1$ (shown at the top of the figure) and $\eta_2$ (shown at the bottom). Notification rule $\eta_1$ is executable once every two updates with a life window of four chronons ($life = window(4)$). Notification rule $\eta_2$ is executable once every five updates with $life = overwrite$. In this example, we assume that both queries of notification rules $\eta_1$ and $\eta_2$ refer to the same set of resources, that is, $Domain(Q, \eta_1) = Domain(Q, \eta_2)$. Stars represent estimated update events, using some update model. The execution intervals, defined in Section 3.3 to be the times in which a notification rule is executable, are denoted as rectangles. The upper gray rectangles represent the derived execution intervals of notification rule $\eta_1$, while the lower black rectangles represent the ones derived from notification rule $\eta_2$. We number execution intervals (EIs) for convenience of exposition.
Suppose the last update has occurred during chronon $T_1$ and SUP has also probed at $T_1$ for the last time. After the second update, which occurred during chronon $T_2$, notification rule $\eta_1$ becomes executable and the EI $I_1 = [T_2, T_6]$ is generated and delivered to SUP. SUP delays the probe of each execution interval until the last possible chronon in which the notification rule $\eta_1$ is executable (and still has some value to the user). Thus, EI $I_1$ is scheduled for probing at chronon $T_6$. After the fourth update event at chronon $T_4$, notification rule $\eta_1$ becomes executable again and a new execution interval $I_2 = [T_4, T_8]$ is generated (and scheduled for probing at $T_8$). Meanwhile, at chronon $T_5$, after the occurrence of the fifth update event, notification rule $\eta_2$ also becomes executable for the first time and remains so for the next three chronons until the occurrence of the sixth update event, resulting in the generation of EI $I_3 = [T_5, T_8]$ (also scheduled by SUP at chronon $T_8$). At chronon $T_6$, EI $I_1$ is probed by SUP (according to the schedule). At that chronon, EI $I_1$ overlaps with EIs $I_2$ and $I_3$ (which we prove in Section 5.2 to be the maximum possible overlap with $I_1$), and thus, by probing EI $I_1$, SUP guarantees that the other two EIs are probed (satisfied) as well. Thus, using a single probe, SUP can satisfy three different EIs for the two notification rules. The same process occurs again with EI $I_4$ at chronon $T_{13}$, resulting in a total usage of two probes by SUP that satisfy all six EIs.
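The deferral idea of Example 4 can be sketched for a single resource as follows. This is only an illustrative sketch, not the SUP pseudocode from the online supplement: the function name `latest_chronon_probes` is hypothetical, intervals are encoded as inclusive chronon index pairs, and the last three intervals in the sample input are assumed values, since the example only states $I_1$, $I_2$, $I_3$ and the probe at $T_{13}$.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first chronon, last chronon)


def latest_chronon_probes(intervals: List[Interval]) -> List[int]:
    """Choose probe chronons for one resource by deferring each probe to the
    last chronon of the earliest-ending interval that is not yet covered."""
    probes: List[int] = []
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if probes and start <= probes[-1] <= end:
            continue  # the most recent probe already falls inside this interval
        probes.append(end)
    return probes


# I1, I2, I3 follow Example 4; the remaining three intervals are assumed.
eis = [(2, 6), (4, 8), (5, 8), (9, 13), (10, 14), (12, 15)]
print(latest_chronon_probes(eis))
```

With these assumed intervals the sketch probes at chronons 6 and 13, matching the two probes described in the example. Sorting by the last chronon mirrors the preference for intervals with an earlier termination.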
We now provide a description of the algorithm. The pseudocode of the SUP algorithm and the two routines that served in building our prototype, namely, AdaptiveEIsUpdate and UpdateNotificationEIs, are available in the online supplement, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15. The algorithm builds a schedule iteratively. It starts with an empty schedule ($\forall s_{i,j} \in S,\ s_{i,j} = 0$) and repeatedly adds probes. The algorithm generates an initial probing schedule, where the last chronon in the first $I \in EI(\eta)$ is picked to execute the probe. It then determines the earliest chronon in which to probe, the notification rule associated with this monitoring task, and the specific execution interval. When probed, all resources in the query part of that notification rule are probed.
SUP depends on an accurate set of execution intervals to
perform correctly. However, in practice, determining a set
of execution intervals suffers from two main problems.
First, the underlying update model that is used to compute
the execution intervals is stochastic in nature, and therefore,
updates deviate from the expected update times. Second, it
is possible that the underlying update model is (temporarily
or permanently) incorrect, and the real data stream behaves
differently than expected.
To tackle these two problems, we propose to exploit feedback from probes to revise the probing schedule in a dynamic manner, after each monitoring task. Thus, execution interval generation is deferred to the last possible moment, and SUP responds to deviations in the expected update behavior of sources that it observes as a result of probing feedback, which is used to improve its next scheduling decisions. We first introduce the general scheme of SUP that addresses the first problem and does not require changes to any parameters. In Section 6, we present a heuristic improvement, SUP($\lambda$), that addresses the second problem and adds local online modifications to update model parameters.
Recall that given a notification rule $\eta$ and an execution interval $I \in EI(\eta)$, SUP probes the resources $Q^{\eta}_{I}$ referenced in the notification rule query $Q$. Given a resource $r_i \in Q^{\eta}_{I}$ that is probed by SUP and a notification rule $\eta$, we assume that we can use the feedback from probing $Q^{\eta}_{I}$ to validate whether $\eta$ was actually executable during $I$ or not. For this purpose, we define a Boolean function $C_{\eta}(feedback(r_i), T)$ set to true if $\eta$ is executable at time $T$ given the feedback gathered from probing resource $r_i$. $feedback(r_i)$ is a feedback function, returning feedback data that were gathered from the last probing of resource $r_i$. In this work, we consider a feedback function $feedback(r_i)$ that returns the actual number of update events of resource $r_i$. It is worth noting that such validation from feedback is possible (or required) only when the event $e$ of the notification rule trigger $Tr$ is an update-based event and when $Domain(Tr, \eta) \cap Domain(Q, \eta) \neq \emptyset$, implying that there is at least one resource that is referenced both by the trigger part and the query part of the notification rule. It is also noteworthy that using feedback from probing a resource $r_i \in Q^{\eta}_{I}$, SUP can conserve system resources when the validation of notification rule $\eta$ fails (i.e., $C_{\eta}(feedback(r_i), T) = False$).
Given that a resource $r_i \in Q^{\eta}_{I}$ was probed by SUP, we use the feedback $feedback(r_i)$ to validate the notification rule $\eta$ using $C_{\eta}(feedback(r_i), T)$. When validation fails, SUP prunes the probing of the rest of the resources in $Q^{\eta}_{I} \setminus \{r_i\}$ that were not probed yet, and makes adaptive modifications to the input execution intervals that also require probing resource $r_i$, including execution interval $I$ itself.
Fig. 6. Illustrating example of SUP execution.
SUP uses the AdaptiveEIsUpdate routine to apply the adaptive modifications. This routine first applies an adaptive modification to notification rule $\eta$, by recalculating a new execution interval $I^{+}$ to be scheduled. Then, the routine determines a set of notification rules (denoted by $\mathcal{N}_{dep}(r_i)$) that may be associated with execution intervals that need to be modified by identifying those notification rules that reference resource $r_i$ in their trigger part. For each such notification rule $\eta' \in \mathcal{N}_{dep}(r_i)$, the routine then identifies execution intervals $I' \in EI(\eta')$ that need to be modified according to the feedback from probing resource $r_i$. For each execution interval that needs to be modified (where the notification rule $\eta'$ is found to be invalid), a new execution interval is calculated using the feedback and replaces the invalid one.
The following example illustrates the adaptive nature of the SUP algorithm:
Example 5. As an example, consider our case study notification rule $\eta$ and assume that at the time of monitoring resource $r_i$, only $feedback(r_i) = l < X$ updates have occurred. Here, we define $C_{\eta}(feedback(r_i), T)$ as follows:
$$C_{\eta}(feedback(r_i), T) = True \iff feedback(r_i) = X.$$
SUP will generate a new execution interval, checking for $X - l$ updates ahead. To illustrate the mechanism for generating $I^{+}$, consider Fig. 7. SUP has set the monitoring task for this execution interval to be at chronon $T'_t$. At chronon $T'_t$, the monitoring task has revealed that in the interval $(T_s, T'_t]$, only $l < X$ updates have occurred, and thus, $C_{\eta}(feedback(r_i), T) = False$. The last update has occurred at chronon $T_s < T'_s < T'_t$. Therefore, a new execution interval is now computed.
We provide a more detailed description of this example using a specific update model in the online supplement, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15. It is worth noting that while the example handles the case of $feedback(r_i) = l < X$, where fewer updates than expected occurred, the AdaptiveEIsUpdate routine is general and also handles the case of $feedback(r_i) = l > X$. In this case, there is at least one missed update and the procedure revises the schedule to capture the next update on time.
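The validation step of Example 5 can be sketched as a small helper. This is a hypothetical illustration and not the AdaptiveEIsUpdate routine itself: the names `Validation` and `validate_rule` are invented here, and treating "more updates than required" as executable with missed updates is an assumption made to cover both cases discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Validation:
    executable: bool     # outcome of the (assumed) validation check
    updates_ahead: int   # updates still needed before the rule can fire
    updates_missed: int  # updates observed beyond the required count


def validate_rule(observed: int, required: int) -> Validation:
    """Compare the update count returned by feedback with the rule's threshold."""
    if observed < required:
        # Fewer updates than expected: wait for the remaining ones.
        return Validation(False, required - observed, 0)
    # Enough updates; any surplus means some updates were missed.
    return Validation(True, 0, observed - required)


print(validate_rule(observed=3, required=5))
```

A caller could use `updates_ahead` to decide how far ahead the replacement execution interval should look, and `updates_missed` to decide whether the schedule needs to be tightened.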
The UpdateNotificationEIs routine is called to ensure that resources that belong to overlapping intervals are only probed once. This routine involves rather simple bookkeeping. We next explain the routine logic. Let $\eta$ be the assignment of SUP, where $\eta$ is the notification rule whose execution interval $I$ is processed at time $T_j$, and all resources referenced in $Q^{\eta}_{I}$ are scheduled for probing at time $T_j$. Given an execution interval $I'$ of a notification rule $\eta'$, this procedure removes from $Q^{\eta'}_{I'}$ the (possibly empty) resource set $Q^{\eta}_{I} \cap Q^{\eta'}_{I'}$ if $T_j \in t(I')$. By doing so, we ensure that resources that belong to overlapping execution intervals will be probed only once. In addition, this procedure removes any execution interval $I$ for which $Q^{\eta}_{I} = \emptyset$, allowing SUP to consider only execution intervals for which monitoring is still needed. The process continues until the end of the epoch.
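The bookkeeping described above can be sketched as follows. The sketch assumes that pending work is kept as a mapping from (rule, interval) to the set of resources still to be probed; this layout and the function name `update_notification_eis` are hypothetical and may differ from the routine in the online supplement.

```python
from typing import Dict, Set, Tuple

Interval = Tuple[int, int]    # inclusive chronon range (assumed encoding)
Key = Tuple[str, Interval]    # (notification rule name, execution interval)


def update_notification_eis(
    pending: Dict[Key, Set[str]], probed: Set[str], chronon: int
) -> Dict[Key, Set[str]]:
    """Drop resources probed at `chronon` from every execution interval that
    contains that chronon, and drop intervals with nothing left to probe."""
    remaining: Dict[Key, Set[str]] = {}
    for (rule, (start, end)), resources in pending.items():
        if start <= chronon <= end:
            left = resources - probed
        else:
            left = set(resources)
        if left:
            remaining[(rule, (start, end))] = left
    return remaining


# Hypothetical state before probing r1 and r2 at chronon 6.
pending = {
    ("eta1", (2, 6)): {"r1", "r2"},
    ("eta2", (5, 8)): {"r1", "r3"},
    ("eta3", (9, 13)): {"r1"},
}
after = update_notification_eis(pending, probed={"r1", "r2"}, chronon=6)
print(after)
```

Here the first interval disappears because all of its resources were probed, the second keeps only `r3`, and the third is untouched because chronon 6 lies outside it.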
5 ALGORITHM PROPERTIES
SUP assumes the availability of a stream of execution intervals, generated using some update model. Such an abstraction allows the algorithm to focus on the monitoring of execution intervals; thus, SUP's optimal solution depends only on the number of execution intervals it is required to consider during the monitoring task. This implies that SUP can handle an arbitrary number of user profiles, depending only on the total number of execution intervals of all input profiles.
SUP is executed in an online fashion, where execution
intervals are introduced right before they are required to be
considered by SUP. This adds further flexibility to the
monitoring scheme by allowing user profiles to change over time.
Further, we can exploit the feedback gathered during the
monitoring scheme to better improve the probing of future
scheduled execution intervals by adaptive monitoring.
SUP accesses $O(K)$ execution intervals, where $K$ is the number of total probes in a schedule, bounded by $N \cdot n$ (number of resources multiplied by number of chronons in an epoch). We expect, however, $K$ to be much smaller than $N \cdot n$, since $K$ serves as a measure of the amount of data users expect to receive during the monitoring process.
We next provide a detailed analysis of three of SUP's properties. Section 5.1 analyzes SUP correctness. SUP optimality is given in Section 5.2. Finally, in Section 5.3, we discuss terms under which SUP is also optimal as an $OptMon_1$ solution.
5.1 SUP Correctness
SUP correctness is given by the following theorem:
Theorem 1. Let $S$ be the schedule generated by SUP. Given a set of profiles $\mathcal{P} = \{p_1, p_2, \ldots, p_m\}$, $S \models \mathcal{P}$.
Proof. Let $S$ be the schedule generated by SUP. Let $p \in \mathcal{P}$, $\eta \in \mathcal{N}(p)$, and $I \in EI(\eta)$. We define
$$X(I) = \{I' : I' \neq I \,\wedge\, I \cap I' \neq \emptyset \,\wedge\, \exists \eta' \in \mathcal{N} : I' \in EI(\eta')\},$$
$$t = \min_{I' \in X(I)} \max t(I').$$
Let $r_i \in Q^{\eta}_{I}$. First, let us assume that $r_i \in Q^{\eta}_{I} \cap Q^{\eta'}_{I'}$. If $t \in I$, then according to the algorithm, $s_{i,j} = 1$ where $T_j = t$. Else, the algorithm selects another $I' \in X(I)$ and probes all resources of $Q^{\eta'}_{I'}$, including $r_i$. In both cases, $\exists T_j \in \mathcal{T} : s_{i,j} = 1$ for resource $r_i \in Q^{\eta}_{I}$. Let $\hat{I}$ be the execution interval selected by SUP. In case $\hat{I} = I$, then all resources in $Q^{\eta}_{I}$ were probed and we finish. Else, $I$ has some resources in $Q^{\eta}_{I}$ that were not probed, and thus, still remains in $X(I)$, and we repeat the same process. At every such step, we are guaranteed that at least one execution interval will be removed from $X(I)$. Since according to the algorithm, $I$ will not be removed from $X(I)$ until each resource $r_i \in Q^{\eta}_{I}$ is probed, in the worst case, all resources of $Q^{\eta}_{I}$ will be probed after $|X(I)|$ steps. Thus, we guarantee to probe every resource $r_i \in Q^{\eta}_{I}$ in some chronon inside $I$, and according to Definition 2, we get that $S \models \mathcal{P}$. $\blacksquare$
Fig. 7. Illustrating example of SUP adaptivity.
SUP is surprisingly simple, given its ability to ensure
correctness, and in some clearly defined cases, efficiency
(see below). We attribute the algorithm's simplicity to the $OptMon_2$ formalism and the execution interval abstraction.
Generally speaking, a new probe is set for a resource at the
last possible chronon where a notification remains execu-
table. That is, it is deferred to the last possible chronon
where the utility is still 1. This, combined with the use of
Procedure 2, is needed to develop an optimal schedule in
terms of system resource (probes) utilization.
5.2 SUP Optimality
We now provide an optimality proof based on the graph-
theoretic properties of SUP. We begin with the following
definition:
Definition 3. Given $\mathcal{N} = \bigcup_{k=1}^{m} \mathcal{N}_k$, a resource $r \in R$, and two execution intervals $I$ and $I'$, we say that the intervals r-intersect (denoted by $I \cap_r I'$) if the following two conditions are satisfied:
1. $I \cap I' \neq \emptyset$.
2. $\exists \eta, \eta' \in \mathcal{N} : r \in Q^{\eta}_{I} \,\wedge\, r \in Q^{\eta'}_{I'}$.
According to Definition 3, two execution intervals r-intersect if the same resource $r$ is required to be probed during some shared chronon of both execution intervals.
Given a set of profiles $\mathcal{P} = \{p_1, p_2, \ldots, p_m\}$ and an epoch $\mathcal{T} = \{T_1, T_2, \ldots, T_N\}$, we construct an interval graph $G(V, E)$ from the execution intervals derived from $\mathcal{P}$ during the epoch $\mathcal{T}$, where $V$ is the set of execution intervals of the notification rules in $\mathcal{N}$ and $E = \{(I, I') \mid \exists r \in R : I \cap_r I'\}$. It is worth noting that $G$ can be defined as a union graph $G(V, E) = \bigcup_{i=1}^{n} G_i(V_i, E_i)$, where for each subgraph $G_i(V_i, E_i)$, $V_i = \{I \mid \exists \eta \in \mathcal{N} : r_i \in Q^{\eta}_{I}\}$ and $E_i = \{(I, I') \mid I \cap_{r_i} I'\}$.
It is known that for every interval graph (or a general chordal graph), there always exists a perfect elimination ordering [5]. A perfect elimination ordering of a graph $G$ is an order $\prec_G$ that assures that every vertex $v$ selected by the order and the set of its neighbors (denoted by $N(v)$) that succeed $v$ in $\prec_G$ jointly form a clique.
We first show that SUP provides a perfect elimination ordering for each subgraph $G_i(V_i, E_i)$. Let $\preceq_{SUP}$ be the SUP order, that is, given two execution intervals $I$ and $I'$, SUP prefers the interval with an earlier termination (see lines 8 and 24 of the algorithm pseudocode in the online supplement, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15). Formally:
$$I \preceq_{SUP} I' \iff \max_j t(I) \leq \max_j t(I').$$
Lemma 2. Given $G_i(V_i, E_i)$, $\preceq_{SUP}$ provides a perfect elimination ordering of $G_i$.
Proof. Let $I$ be an interval selected by SUP order $\preceq_{SUP}$, and let $N(I)$ be the set of neighbors of $I$ in $G_i$. According to $\preceq_{SUP}$, every interval $I' \in N(I)$ intersects with $I$ and ends together with or after interval $I$. Thus, every two intervals in $N(I)$ intersect. Therefore, $N(I) \cup \{I\}$ is a clique in $G_i$. Since $I$ is an arbitrary interval selected by SUP, $\preceq_{SUP}$ provides a perfect elimination ordering. $\blacksquare$
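The claim of Lemma 2 can be checked on small inputs with a brute-force sketch. It assumes intervals over a single resource, encoded as inclusive chronon pairs; the helper names `intersects` and `is_perfect_elimination_order` are hypothetical, and the check follows only the textbook definition of a perfect elimination ordering.

```python
from itertools import combinations
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive (first chronon, last chronon)


def intersects(a: Interval, b: Interval) -> bool:
    """Two inclusive intervals share at least one chronon."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_perfect_elimination_order(order: List[Interval]) -> bool:
    """Each vertex's later neighbors must be pairwise adjacent (a clique)."""
    for pos, vertex in enumerate(order):
        later = [w for w in order[pos + 1:] if intersects(vertex, w)]
        if any(not intersects(a, b) for a, b in combinations(later, 2)):
            return False
    return True


# Assumed sample intervals; sorting by the last chronon mimics the SUP order.
intervals = [(2, 5), (1, 2), (5, 6), (4, 8)]
by_end = sorted(intervals, key=lambda iv: iv[1])
print(is_perfect_elimination_order(by_end))
```

Ordering by the last chronon passes the check, whereas an order that starts with an interval bridging two disjoint ones, such as `[(2, 5), (1, 2), (5, 6)]`, fails it, because the two later neighbors of `(2, 5)` do not intersect.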
We next show that the set of neighbors of an interval $I$ that is selected by SUP is the largest possible for $I$, and thus, the clique $N(I) \cup \{I\}$ is the maximal possible clique that contains $I$.
Lemma 3. Let $I$ be an interval selected by SUP for probing at chronon $T = \max_{t \in t(I)} t$. Then the clique formed from $\{I\} \cup N(I)$ at chronon $T$ is a maximal clique containing $I$.
Proof. SUP chooses to probe interval $I$ at the last possible chronon for probing $I$, where at that chronon, $I$ intersects with all of its possible neighbor intervals, and therefore, the clique formed from $N(I) \cup \{I\}$ is maximal. $\blacksquare$
Given a schedule $S \models \mathcal{P}$, we denote by $K_i = \sum_{T_j \in \mathcal{T}} s_{i,j}$ the total number of probes performed by schedule $S$ during the epoch $\mathcal{T}$ by monitoring resource $r_i \in R$. Thus, the total number of probes of schedule $S$ is given by $K = \sum_{r_i \in R} K_i$.
The following concludes the proof of SUP optimality:
Theorem 4. Let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of $n$ resources, $\{T_1, T_2, \ldots, T_N\}$ be a set of chronons in an epoch $\mathcal{T}$, and $S = \{s_{i,j}\}$ be a monitoring schedule, generated by Algorithm SUP, with $K$. Let $S' \in \mathcal{S}$ be a schedule that satisfies $S' \models \mathcal{P}$ with $K'$. Then, $K \leq K'$.
Proof. SUP decision making is independent for each resource $r_i \in R$, and therefore, the problem is separable in the number of resources. Consider a resource $r_i \in R$. Let $I$ be any execution interval probed by $S$ with respect to resource $r_i$. $I$ may or may not be probed in $S'$. Assume that $I$ was probed by $S'$ and let $T$ be the probe chronon. Let $N_S(I)$ and $N_{S'}(I)$ denote the number of $r_i$-intersecting execution intervals captured by probing $I$ by $S$ and $S'$, respectively. Obviously, $T \leq I.T_f$ and according to Lemma 3, we get that $N_S(I) \geq N_{S'}(I)$. Now assume that $I$ was not probed by $S'$. Therefore, since $S' \models \mathcal{P}$, there must exist some other $r_i$-intersecting execution interval $I'$ that was probed by $S'$. Let $T'$ be the chronon in which $S'$ probed $I'$. Again, since $S' \models \mathcal{P}$, we have that $T' \leq I.T_f$ (otherwise, $S'$ will not capture $I$). $I'$ was not probed by $S$, since according to SUP order, the following holds: $I.T_f \leq I'.T_f$. Therefore, for this case, we have again that the following must hold: $N_S(I) \geq N_{S'}(I')$. Using this result, we have
$$K_i = |V_i| - \sum_{I \in S} N_S(I) \leq |V_i| - \left( \sum_{I \in S \cap S'} N_{S'}(I) + \sum_{I' \in S' \setminus S} N_{S'}(I') \right) = K'_i,$$
concluding that $K \leq K'$. $\blacksquare$
Probing at the last possible chronon ensures an optimal
usage of system resources (probes) while still satisfying user
profiles. However, due to the stochastic nature of the process,
probing later may decrease the probability of satisfying the
profile. This is true, for example, with hard deadlines where
once the deadline has passed, the utility is 0. Determining an
optimal chronon for probing, i.e., the one that maximizes the
probability of satisfying the profile, depends on the
stochastic process of choice, and is itself an interesting
optimization problem. We defer this analysis to future work.
5.3 Terms for SUP Dual Optimality
Generally speaking, the dual optimization problems $OptMon_1$ and $OptMon_2$ cannot be compared directly. Satisfying user profiles may violate system constraints and satisfying system constraints may fail to satisfy user profiles. However, the following Theorem 5 provides an interesting observation. Theorem 4 shows that SUP provides a schedule with minimal system resource utilization. The following theorem (whose proof is immediate from (4)) shows that the schedule generated by SUP also has maximum utility for the class of strict utility functions (and hence, can maximize utility while minimizing system resource consumption).
Theorem 5. Let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of $n$ resources, $\{T_1, T_2, \ldots, T_N\}$ be a set of chronons in an epoch $\mathcal{T}$, $\mathcal{P} = \{p_1, \ldots, p_m\}$ be a set of user profiles, and $S = \{s_{i,j}\}$ be a monitoring schedule, generated by SUP, with a utility $U(S)$. If for every notification rule $\eta \in \mathcal{N}$, $u(r_i, \eta, T_j)$ is strict, then $U(S) \geq U(S')$ for any schedule $S' = \{s'_{i,j}\} \neq S$.
Proof. The maximal value of (4) is $\sum_{\eta \in \mathcal{N}} \sum_{I \in EI(\eta)} |Q^{\eta}_{I}|$ and it is achieved whenever a schedule $S$ guarantees that $S \models \mathcal{P}$. According to Theorem 1, SUP generates such a schedule, and thus has maximal utility. $\blacksquare$
Whenever the resources consumed by SUP satisfy the system constraints of $OptMon_1$, then SUP is guaranteed to solve the dual $OptMon_1$ (as well as $OptMon_2$) and maximize user utility, while at the same time minimizing resource utilization. As an example, consider an algorithm that sets an upper limit $M$ on the number of probes in a chronon for all pages. Assume that in the schedule of SUP, the maximum number of probes in any chronon satisfies $M$. Since SUP utilizes in each chronon only the number of probes that is needed to satisfy the profile expressions, the total number of probes will never exceed $N \cdot M$. Whenever strict utility functions are used, SUP can serve as a basis for solving the dual problem $OptMon_1$. A schedule $S$, generated by SUP with no bound on system resource usage, and a set of desired system resource constraints, can be used as a starting point in solving $OptMon_1$, as illustrated in Fig. 2. $S$ can be used to avoid overprobing in chronons when fewer updates are expected. System resources may be allocated to chronons that are more update intensive. In this situation, SUP may serve as a tentative initial solution to the $OptMon_1$ problem, allowing local tuning at the cost of reduced utility. We defer a formal discussion of SUP under system constraints to future research.
6 SUP WITH LOCAL MODEL MODIFICATION
We next illustrate an approach to managing local errors in
the update model. For purpose of illustration, we assume
a piecewise constant Poisson update model [15], as follows:
Let $\bar{J} = (J_1, J_2, \ldots, J_k)$ be a set of $k$ intervals $J_i = [T^{i}_{s}, T^{i}_{f})$ aligned on the epoch $\mathcal{T}$ such that $T^{i+1}_{s} = T^{i}_{f}$, with $T^{1}_{s} = T_1$ and $T^{k}_{f} = T_N$. The update model associates a constant intensity level $\lambda_i$ for each interval $J_i$. Therefore, the model is given as a set of pairs $\bar{\lambda} = \{\langle J_i, \lambda_i \rangle\}_{i=1}^{k}$. Fig. 8 provides an illustration of such a model with $k = 3$. The horizontal thin lines represent the three different intensities of the model.
SUP($\lambda$) is an extension of SUP that utilizes the feedback gathered from the data delivery process to include local adaptive modifications to the update model itself. In particular, if feedback indicates that many updates have been missed, SUP($\lambda$) will locally compensate for such a change in update frequency by locally increasing frequency and vice versa. It is worth noting that this model modification technique is heuristic in nature. More statistically robust techniques will involve methods developed in research areas such as statistical process control (e.g., [29]).
The pseudocode of SUP($\lambda$) is given in the online supplement, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TKDE.2010.15. The algorithm works as follows: First, as in SUP, it validates the notification rule $\eta$ given the feedback $feedback(r_i)$. In case the validation fails, it modifies the update model of resource $r_i$ (denoted by $\bar{\lambda}(r_i)$) by calling procedure AdaptUpdateModel (also available in the online supplement). This procedure uses the feedback about the actual number of events and applies local modifications to the update model adaptively. Finally, as in SUP, SUP($\lambda$) calls the procedure AdaptiveEIsUpdate to determine the revised schedule.
We now describe the operation of the AdaptUpdateModel procedure, which is illustrated in Fig. 8. Using the current schedule $S$, it first locates the last chronon at which a probe was assigned to resource $r_i$ in schedule $S$ (denoted by $T_{prev}$). Then, SUP($\lambda$) finds an interval $J = [T_L, T_U]$ that includes the chronon $T$ on which the current probe of resource $r_i$ took place. The start and end chronons of $J$ define a region in which the intensity remains constant with regard to the current intensity at chronon $T$. It is worth noting that the start point $T_L$ is chosen from the interval $[T_{prev}, T]$; therefore, SUP($\lambda$) adaptively modifies the update model by utilizing only the feedback that falls inside the constant intensity region to which chronon $T$ belongs and does not use feedback that was gathered before chronon $T_{prev}$ (the last chronon on which the resource $r_i$ was probed before $T$). We term the region $[T_L, T]$ the Effective Feedback Region. We first extract the feedback of the actual number of events $n_T$ that occurred during the interval $[T_L, T]$ from $feedback(r_i)$. We then modify the update model by replacing the pair $\langle J, \lambda \rangle$ with two new pairs $\langle J', \lambda' \rangle$ and $\langle J'', \lambda'' \rangle$. For the first pair $\langle J', \lambda' \rangle$, we use the feedback to determine a new estimated intensity $\lambda' = \frac{n_T}{T - T_L}$ during $J' = [T_L, T)$ (see footnote 2). Then, we estimate the intensity $\lambda''$ during $J'' = [T, T_U)$ by smoothing the local intensity $\lambda$ that corresponds to $J$ with the intensity $\lambda'$ that is calculated from the feedback. The smoothing parameter $\alpha$ is defined by the portion of the feedback region out of the whole $J$ region and is used to avoid overfitting the new local intensity to the feedback. The revised intensities are represented in Fig. 8 as boldface lines.
Fig. 8. Illustrating example of SUP($\lambda$) scheme.
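The model revision described above can be sketched as follows. This is a hedged illustration rather than the AdaptUpdateModel procedure from the online supplement: the segment encoding, the function name `adapt_update_model`, and in particular the convex-combination form of the smoothing step are assumptions, since the text only states that the smoothing weight is the share of the feedback region within the constant-intensity region.

```python
from typing import List, Tuple

# (start chronon, end chronon, intensity), half-open [start, end); assumed encoding.
Segment = Tuple[int, int, float]


def adapt_update_model(
    model: List[Segment], t_prev: int, t_probe: int, observed_events: int
) -> List[Segment]:
    """Split the segment containing t_probe and revise its intensities."""
    updated: List[Segment] = []
    for start, end, lam in model:
        t_low = max(start, t_prev)  # start of the effective feedback region
        if not (start <= t_probe < end) or t_probe <= t_low:
            updated.append((start, end, lam))
            continue
        lam_feedback = observed_events / (t_probe - t_low)
        alpha = (t_probe - t_low) / (end - start)
        # Assumed smoothing: weight the feedback intensity by alpha.
        lam_next = alpha * lam_feedback + (1 - alpha) * lam
        if start < t_low:
            updated.append((start, t_low, lam))  # part not covered by feedback
        updated.append((t_low, t_probe, lam_feedback))
        updated.append((t_probe, end, lam_next))
    return updated


# Hypothetical model: previous probe at chronon 12, current probe at 20,
# with 8 update events observed in between.
model = [(0, 10, 1.0), (10, 30, 0.5)]
revised = adapt_update_model(model, t_prev=12, t_probe=20, observed_events=8)
print(revised)
```

In this hypothetical run, the feedback region covers 8 of the 20 chronons of the second segment, so the smoothing weight is 0.4 and the intensity after the probe moves only part of the way from 0.5 toward the observed rate of 1.0.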
7 EXPERIMENTS
We present empirical results, analyzing the behavior of SUP and SUP($\lambda$) under varying settings. We start in Section 7.1 with a description of the trace data sets and the experiment setup. We then analyze the impact of profile selection, life parameter, and update model on SUP performance (Sections 7.2-7.3). Section 7.4 presents an empirical comparison with existing solutions to the dual optimization problems. Finally, we compare SUP($\lambda$) to SUP (Section 7.5) and show that the former improves on the latter's effective utility with a moderate increase in the number of probes.
7.1 Data Sets and Experiment Setup
We implemented SUP in Java (JDK version 1.4) and experimented with it on various trace data sets, profiles, life parameters, and update models. Traces of update events include real RSS feed traces and synthetic traces. We consider two different update models, FPN and Poisson (to be discussed shortly), to model the arrival of new update events to these traces. For comparison purposes, we also implemented WIC as described in [24] to determine a schedule for $OptMon_1$ and TTL [15] as another (yet very simple) $OptMon_2$ solution. We briefly review the two solutions.
Web information collector (WIC). The WIC algorithm gets as input four decision parameters, namely, $p_{i,j}$, $life_i(k, j)$, $urgency_i(j - k)$, and a constraint $M$. $p_{i,j}$ denotes the probability that resource $r_i$ will be updated during chronon $T_j$; $life_i(k, j)$ denotes the probability that an update that occurred to resource $r_i$ at chronon $T_k$ will still be available at the server at chronon $T_j$; $urgency_i(j - k)$ denotes the value of monitoring resource $r_i$ with a delay of $(j - k)$ chronons. We initialized the $p_{i,j}$ probabilities for WIC using the two update models. We used the Overwrite and Window(Y) life instantiations as defined in [24]. We further defined a uniform urgency of updates by setting $urgency_i(j - k) = 1$. WIC is a greedy algorithm, and at each chronon $T_j$, it chooses to probe the $M$ resources with the highest local gained utility, where such utility is given as
$$U_i(k, j) = p_{i,j} \cdot life_i(k, j) \cdot urgency_i(j - k).$$
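The greedy selection step can be sketched as follows. This is a simplified illustration of a single chronon and not the WIC implementation of [24]: the function names are hypothetical, and the per-resource probabilities in the example are assumed values.

```python
import heapq
from typing import Dict, List


def local_utility(p_update: float, life: float, urgency: float) -> float:
    """Product form of the local gained utility for one resource."""
    return p_update * life * urgency


def greedy_probe_choice(utilities: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` resources with the highest local utility."""
    return heapq.nlargest(budget, utilities, key=utilities.get)


# Assumed inputs for one chronon, with uniform urgency set to 1.
utilities = {
    "r1": local_utility(0.9, 1.0, 1.0),
    "r2": local_utility(0.4, 0.5, 1.0),
    "r3": local_utility(0.5, 1.0, 1.0),
}
print(greedy_probe_choice(utilities, budget=2))
```

With a budget of two probes, the sketch selects `r1` and `r3`, since the update on `r2` is both less likely to have occurred and less likely to still be available.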
TTL. Given a TTL parameter, a probe is scheduled for
each resource every TTL chronons. Using TTL, we can
simulate a periodical poll of servers, such as the one
proposed by standard RSS aggregators.
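A periodic TTL schedule of this kind can be sketched in a few lines. The sketch assumes chronons numbered from 1 and a first probe at chronon TTL; the function name `ttl_schedule` is hypothetical.

```python
from typing import Dict, List


def ttl_schedule(resources: List[str], num_chronons: int, ttl: int) -> Dict[str, List[int]]:
    """Probe every resource once every `ttl` chronons within the epoch."""
    if ttl <= 0:
        raise ValueError("ttl must be a positive number of chronons")
    return {r: list(range(ttl, num_chronons + 1, ttl)) for r in resources}


print(ttl_schedule(["feed_a", "feed_b"], num_chronons=10, ttl=4))
```

Note that the number of probes depends only on the TTL value and the epoch length, not on the update pattern of the resources.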
Table 1 summarizes the various dimensions of our experiments. We next discuss each parameter in more detail.
Trace data set. We used data from a real trace of RSS feeds. We collected RSS news feeds from several Web sites such as CNN and Yahoo!. We have recorded the events of insertion of new feeds into the RSS files. In this paper, we present results for 2,873 updates to the CNN Top Stories RSS feed [9] collected for one month during September and October 2005 (see footnote 3). We also generated two types of synthetic data. The first set simulates an epoch with three different exponential interarrival intensities: medium (first half a day), low (next two days), and high (last half a day). This data set can model the arrival of bids in an auction (without the final bid sniping). The second data set has a stochastic cyclic model of one week, separating working days from weekends, and working hours from night hours. Such a model is typical for many applications [14], including posting to newsgroups, reservation data, etc. Here, it can be representative of RSS data with varying update intensity. This data set was generated assuming an exponential (time-dependent) interarrival time. Table 2 summarizes the number of recorded events for each of the three data sets. The epoch size varies from one data set to another. Each epoch was partitioned into $N = 10{,}000$ chronons.
Profile and notification rule. We used the profile template RSS_Monitoring in Fig. 3 as a basis. We use the Num_Update_Watch notification rule. We vary the values of $X = 1, \ldots, 5$. For the life parameter, we have varied window with $Y \in \{0, \ldots, 100\}$ chronons. We also consider a life parameter of overwrite.
TABLE 1. Summary of the Experiment Parameters
TABLE 2. Summary of the Data Sets
2. Note that this estimate may be higher or lower than the current parameter.
3. The trace is available on http://ie.technion.ac.il/~avigal/trace.zip.
Update model. As described in Section 3.3, we use update models to estimate the update pattern at a server, and to trigger monitoring of servers according to profiles. The estimated pattern may not coincide with the actual update events at the server. Thus, the choice of an update model has an impact on profile satisfaction. We used two different update models to represent updates at servers and modeled each one of the three data sets with these models, as follows:
. Poisson update model: Following [14], we devised an update model as a
nonhomogeneous Poisson process. Therefore, we have a Poisson process with
instantaneous arrival rate $\lambda : \mathbb{R} \to [0, \infty)$ as a model
of occurrence of update events. The number of update events occurring in
any interval $(s, f]$ is assumed to be a Poisson random variable with
expected value $\Lambda(s, f) = \int_{s}^{f} \lambda(t)\,dt$.
. False positives and false negatives (FPNs) update model: Following [24],
we devised the FPN update model. Given a stream of updates, a probability
$p_{i,j}$ is assigned the value 1 if a resource $r_i$ is updated at chronon
$T_j$. Once probabilities are defined, we add noise to the probability
model, as follows: Given an error factor $Z \in [0, 1]$, the value of
$p_{i,j}$ is switched from 1 to 0 with probability $1 - Z$, so that
$Z = 1.0$ leaves the model identical to the actual update stream. Then, for
each modified $p_{i,j}$, a new chronon $T_{j'}$ is randomly selected and
the value of $p_{i,j'}$ is set to 1. Note that FPN can be applied to any
data trace, regardless of its true stochastic pattern.
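The FPN construction can be sketched for a single resource as follows. This is an illustrative reading, not the authors' code: the function name and arguments are hypothetical, and the parameter `z` is treated as an accuracy level (each true update is displaced with probability `1 - z`), which is the reading consistent with the statement in Section 7.3 that $Z = 1.0$ estimates all updates accurately.

```python
import random

def fpn_model(updates, num_chronons, z, seed=0):
    """Return a noisy 0/1 update model for one resource.

    updates: set of chronon indices at which the resource truly changes.
    z: value in [0, 1]; each true update is displaced with probability
       1 - z (assumed reading, see the lead-in).
    """
    rng = random.Random(seed)
    # Start from the exact model: 1 at every true update chronon.
    p = [1 if j in updates else 0 for j in range(num_chronons)]
    # Choose which true updates the model will miss (false negatives).
    displaced = [j for j in sorted(updates) if rng.random() < 1.0 - z]
    for j in displaced:
        p[j] = 0
    # For each missed update, predict one at a random chronon instead
    # (a false positive unless it happens to hit a true update).
    for _ in displaced:
        p[rng.randrange(num_chronons)] = 1
    return p

exact = fpn_model({1, 5, 9}, num_chronons=20, z=1.0)
noisy = fpn_model({1, 5, 9}, num_chronons=20, z=0.6, seed=7)
```

Selecting all displaced updates before relocating them avoids a relocated prediction being erased again by a later switch; this ordering is a design choice of the sketch.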
While the Poisson update model can be used to model real-world updates
where updates are predicted based on past observations (e.g., using update
histories), the FPN model is a synthetic model that requires the complete
stream of updates for its construction. Therefore, the purpose of the FPN
model in our experiments is to measure the sensitivity of SUP to update
model noise, i.e., noise that arises because the update model in use
sometimes estimates updates that deviate from the actual updates. Since the
Poisson update model is generated using updates observed in the past in
order to predict future updates, such noise may be present.
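For a nonhomogeneous Poisson update model, the expected number of updates in an interval is the integral of the rate over that interval, and the probability of at least one update follows from the Poisson distribution. The sketch below assumes a piecewise-constant rate given as hypothetical `(start_time, rate)` steps; the paper does not prescribe this representation.

```python
import math

def expected_updates(rate_steps, s, f):
    """Integrate a piecewise-constant rate over the interval (s, f].

    rate_steps: list of (start_time, rate) pairs sorted by start_time; each
    rate holds until the next start_time (the last one holds forever).
    """
    total = 0.0
    for k, (start, rate) in enumerate(rate_steps):
        end = rate_steps[k + 1][0] if k + 1 < len(rate_steps) else math.inf
        lo, hi = max(s, start), min(f, end)
        if hi > lo:
            total += rate * (hi - lo)
    return total

def prob_at_least_one_update(rate_steps, s, f):
    """P[at least one update in (s, f]] = 1 - exp(-expected count)."""
    return 1.0 - math.exp(-expected_updates(rate_steps, s, f))

# Hypothetical rates: 2 updates/hour for 12 hours, then 0.5 updates/hour.
steps = [(0.0, 2.0), (12.0, 0.5)]
mean_count = expected_updates(steps, 10.0, 14.0)   # 2*2 + 0.5*2 = 5.0
```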
With three data sets, two update models (and parameter
variations for FPN), and varying profile parameters, there is
a large number of possible experiment configurations. In
this work, we restrict ourselves to presenting results with
the more interesting configurations.
Recall that an optimal schedule $S^*$ for SUP gives it a maximum utility.
For the variety of update models and profile settings, the actual schedule
$S$ will possibly have a lower utility. We measure the effective utility of
schedule $S$ as the ratio of the utility of $S$ to the utility of $S^*$.
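Writing $U(\cdot)$ for the utility that a schedule achieves (the symbol $U$ is introduced here only for illustration), the effective utility of an actual schedule $S$ relative to the optimal schedule $S^*$ can be written as:

```latex
\[
  \mathrm{EffectiveUtility}(S) \;=\; \frac{U(S)}{U(S^{*})},
  \qquad 0 \le \mathrm{EffectiveUtility}(S) \le 1 ,
\]
```

assuming nonnegative utilities. A value of 1.0 therefore means that $S$ captured as much utility as the optimal schedule.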
7.2 Impact of Profile Selection
In our first experiment, we report on the impact of profile selection on
the online effective utility for SUP. In this set of experiments, we do not
allow the use of feedback, effectively setting
$C_p(\mathit{feedback}(r_i), T)$ to always be True.
Fig. 9 illustrates the results of variations of the profile template in
Fig. 3. We vary the $X$ value (maximum number of updates a client can
tolerate) from 1 to 5; this value is plotted on the x-axis. We also vary
the life parameter, introducing four different life parameters: overwrite
and window($Y$) with $Y = 0$, 10, and 20 chronons. It is worth noting that
$Y = 0$ generates execution intervals with a width of a single chronon,
meaning that the event associated with each interval should be delivered
immediately without further delay, while larger values $Y = 10, 20$
generate wider execution intervals, allowing some (constant) delay in
notifications. Each life parameter is represented by a different curve. We
choose the update model to be FPN with $Z = 0.6$. We present the results
for the three data sets.
For all data sets and all values of $X$, as the value of $Y$ increases,
satisfying the profile is easier since $Y$ controls the window to satisfy
the profile. Hence, for higher $Y$, the effective utility increases. The
value of $X$ reflects the complexity of the profile. For example, for
$X = 4$, the update model must accurately predict four updates. As the
value of $X$ increases, all of the update models will have increasing
cumulative error in estimating consecutive updates. Thus, for larger $X$,
the effective utility decreases.
An interesting observation is that the performance of overwrite for
synthetic data 1 and RSS was worse than window($Y = 10, 20$), while for
synthetic data 2, the performance was better. This indicates that with
synthetic data 2, SUP is allowed more maneuvering space to monitor
properly, probably due to the way updates are spread across the epoch. For
the other two data sets, it seems that an average update event was
overwritten within less than 10 chronons from the time of its occurrence,
while for synthetic data 2, it was above 20 chronons. Finally, as $Y$ keeps
increasing, the effective utility is expected to continue to increase as
well, reaching the value of 1 when $Y \to N$ for any given $X$.
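The effect of the life parameter on execution intervals can be illustrated with a small sketch. The interval semantics used here are assumptions made for illustration: intervals are inclusive chronon ranges, window($Y$) spans the update chronon plus $Y$ further chronons, overwrite lasts until the chronon before the next update, updates occur at distinct chronons, and every update is treated as its own notification (as with $X = 1$). All function names are hypothetical.

```python
def window_intervals(update_chronons, y):
    """window(Y): each update must be delivered within Y chronons."""
    return [(t, t + y) for t in sorted(update_chronons)]

def overwrite_intervals(update_chronons, epoch_end):
    """overwrite: an update stays deliverable until the next one replaces it."""
    ts = sorted(update_chronons)
    ends = [nxt - 1 for nxt in ts[1:]] + [epoch_end]
    return list(zip(ts, ends))

def captured(intervals, probes):
    """Fraction of execution intervals that contain at least one probe."""
    if not intervals:
        return 1.0
    hit = sum(any(lo <= p <= hi for p in probes) for lo, hi in intervals)
    return hit / len(intervals)

updates = [3, 10, 40]   # chronons with a true update (made-up values)
probes = [5, 25]        # chronons at which the resource is probed

narrow = captured(window_intervals(updates, 0), probes)          # 0 of 3
wide = captured(window_intervals(updates, 10), probes)           # 1 of 3
overwrite = captured(overwrite_intervals(updates, 99), probes)   # 2 of 3
```

With $Y = 0$ a probe must land exactly on the update chronon, which is why wider windows are easier to satisfy; whether overwrite beats a given window depends on how far apart consecutive updates are.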
7.3 Impact of Update Model Selection
We study next how various parameter settings for the FPN model and the use
of the Poisson model impact effective utility. Recall that we introduce
stochastic variation in the update model through $Z$ (FPN model parameter),
whenever it is strictly less than 1. We present the performance of SUP for
the Num_Update_Watch notification rule, with $X = 1, \ldots, 5$ and the
overwrite life parameter. We use all three data sets to illustrate our
results. Here, we also do not allow the use of feedback.
In Fig. 10, SUP has 100 percent effective utility for $Z = 1.0$, since the
FPN update model for $Z = 1.0$ accurately estimates all updates. As we
modify the parameter $Z$ from 1.0 to 0.4, more variance is added, and
effective utility is expected to decrease. We observe that for all update
models, for higher $X$ values, the effective utility decreases. This is
because all update models have increasing difficulty in predicting four or
five consecutive updates.
We observe that the Poisson model has differing behavior for the different
data sets, compared to the FPN model. For Synthetic Data 1, the effective
utility of the Poisson model is more or less bounded by $Z = 0.6$ and
$Z = 0.8$; this indicates that the Poisson model that was used reflects
this data trace with an error of about 20-40 percent. For Synthetic Data 2,
the effective utility of the Poisson model is typically below the effective
utility of FPN for $Z = 0.4$. This implies that the Poisson model had about
60 percent error. Finally, for the RSS data, the effective utility of the
Poisson model appears to dominate all variations of the FPN model for which
$Z < 1.0$ and $X > 1$, indicating that the Poisson model may best represent
this RSS trace for complex profiles.
7.4 $\mathit{OptMon}_1$ and $\mathit{OptMon}_2$
Recall that while $\mathit{OptMon}_1$ problems set hard constraints on
system resources, $\mathit{OptMon}_2$ aims at minimizing system resource
utilization. Further, $\mathit{OptMon}_2$ secures the full satisfaction of
user specification (given an accurate update model), while
$\mathit{OptMon}_1$ can only aim at maximizing it. Thus, we cannot compare
solutions to $\mathit{OptMon}_1$ and $\mathit{OptMon}_2$ directly. Instead,
we make the following indirect comparison: 1) We compare the system
resource (probes) utilization of the different solutions. 2) Given some
level of system resource utilization, we compare the effective utility of
the different solutions.
Both SUP and TTL are solutions of $\mathit{OptMon}_2$. The TTL solution
uses the server-provided TTL⁴ to determine when the next probe to a
resource should occur to satisfy a profile. WIC [24] is a solution to the
$\mathit{OptMon}_1$ problem. Fig. 11 provides the system resource
utilization and corresponding utility of the three algorithms. The
experiment uses the Synthetic Data 1 data set, which contains 244
resources, with a profile of $X = 1$ and life = overwrite for each
resource. We add a parameter denoted by $M$, used by WIC, to represent a
system constraint on the total number of probes allowed per chronon.
Fig. 11a provides the analysis results for FPN with $Z = 1.0$, where
updates occur at the expected update time as determined by the update
model. Fig. 11b provides the execution results, assuming a Poisson update
model. It is worth noting that TTL does not take the update model into
consideration, and therefore, its performance remains the same in both
Figs. 11a and 11b.
Fig. 9. SUP performance for various profiles.
Fig. 10. SUP performance for various update models.
4. Such TTL is in the RSS 2.0 specification [28] and is used by RSS servers
to suggest the next update time of an RSS channel.
In Fig. 11, SUP and SUP($\lambda$) are represented by a single point each
in the graph. In Fig. 11a, SUP performs optimally with an effective utility
of 1.0. The optimal number of probes for SUP is 2,462 for this data set. We
study WIC and TTL under various parameter settings; we consider 500, 1,000,
and 2,000 for the number of chronons $N$ in an epoch $T$. The three curves
WIC($N$ = 500, 1,000, 2,000) represent these parameter settings for WIC,
and the curves TTL($N$ = 500, 1,000, 2,000) represent them for TTL. We also
varied the $M$ level for WIC. The x-axis represents the total number of
probes, which is equal to $N \times M$ for WIC. Thus, for $N = 500$
chronons and $M = 20$, WIC consumes 10,000 probes. Similarly, with
$N = 1{,}000$ chronons and $M = 20$, WIC consumes 20,000 probes. Given that
TTL is allowed the same total number of probes as WIC ($N \times M$) and
assuming that there are $n$ resources, all of the same importance, each
resource was allocated $\frac{N M}{n}$ probes. The TTL value (in chronons)
used for monitoring each resource is then given by
$\left\lceil N / \frac{N M}{n} \right\rceil = \left\lceil \frac{n}{M} \right\rceil$.
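The probe budget and the resulting TTL can be checked with a few lines of arithmetic. This is a sketch under assumptions: the function names are hypothetical, the symbols $N$ (chronons per epoch), $M$ (probes allowed per chronon), and $n$ (number of resources) follow the usage in the text, and rounding the TTL up to a whole chronon is an assumption made here so that the budget is not exceeded.

```python
import math

def probe_budget(num_chronons, probes_per_chronon):
    """Total probes available to WIC over an epoch: N * M."""
    return num_chronons * probes_per_chronon

def ttl_per_resource(num_resources, probes_per_chronon):
    """TTL in chronons when the budget is split evenly over n resources.

    N / (N * M / n) simplifies to n / M; rounding up is an assumption.
    """
    return math.ceil(num_resources / probes_per_chronon)

budget_500 = probe_budget(500, 20)      # 10,000 probes
budget_1000 = probe_budget(1000, 20)    # 20,000 probes
ttl = ttl_per_resource(244, 20)         # ceil(12.2) = 13 chronons
```

Note that the TTL depends only on $n$ and $M$, not on the epoch length $N$, because the per-resource share of the budget grows with $N$ at the same rate as the epoch itself.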
We observe that the effective utility of TTL is less than that of SUP, even
with an increasing number of probes. The effective utility of WIC is less
than that of both SUP and TTL.
We now focus on the data set and the Poisson update model of Fig. 11b. For
this data set and model, the effective utility for SUP is about 0.62 (about
62 percent of the optimal). This corresponds to 3,904 probes; the effective
utility is represented by a single point. In this case also, SUP performs
better for the same number of probes than both TTL and WIC.
For all $N$ values, WIC starts with low effective utility (less than 0.2)
and as the number of probes increases, the utility monotonically increases
for values of $N = 500$ and $N = 1{,}000$. For $N = 2{,}000$, the effective
utility of WIC sometimes drops when we increase the number of probes. In
order to reach a utility of 0.62 (equivalent to that of SUP), it requires
more than 20,000 probes, which is approximately five times higher than the
resource consumption of SUP. TTL also starts low, yet higher than WIC, and
its effective utility also increases as the number of probes increases. TTL
requires more than 7,000 probes to reach an effective utility of 0.62,
which is approximately 1.8 times higher than that of SUP. For the two
update models, we can observe that TTL has better effective utility than
WIC for the same number of total probes. The reason is that TTL, unlike
WIC, has no upper bound of $M$ resources per chronon and can actually probe
all resources at once.
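The resource ratios quoted above follow directly from the probe counts reported for Fig. 11b (3,904 probes for SUP; the WIC and TTL figures are the lower bounds stated in the text):

```latex
\[
  \frac{20{,}000}{3{,}904} \approx 5.1,
  \qquad
  \frac{7{,}000}{3{,}904} \approx 1.8 .
\]
```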
The relatively low effective utility indicates that predicting an update
event may not be very accurate, serving as an empirical justification for
the introduction of feedback in SUP. Fig. 11b shows that SUP($\lambda$),
which uses feedback more aggressively than SUP, manages to improve the
effective utility by more than 15 percent with an increase in the number of
probes. We shall compare SUP and SUP($\lambda$) in more detail in
Section 7.6.
7.5 Impact of Adaptiveness
We performed experiments on all three data sets, and the
various update models, comparing SUP with and without
the use of feedback.
Fig. 12 illustrates the impact of feedback in the RSS data set with
life = overwrite. Fig. 12a presents the increase in relative utility when
using feedback for four variations of FPN. For $Z = 1.0$, SUP performance
is optimal, and therefore, feedback cannot improve the schedule. For
smaller $Z$ values, feedback does not improve the performance for $X = 1$.
This is because the performance with and without feedback converges to
generating the same execution intervals. Therefore, the execution intervals
generated using feedback will always coincide with SUP's existing execution
intervals, and no additional monitoring tasks will be issued by the SUP
algorithm. For larger $X$ values, however, feedback improves the effective
utility significantly (for this data set, up to 200 percent for $X = 4$ and
$Z = 0.8$).
The cost of feedback is presented in Fig. 12b. Again, for $Z = 1.0$, no
modification to the schedule is needed and feedback adds no extra probes.
For other models, and for $X > 1$, effective utility improvement comes at a
cost, albeit not a big one. For example, for $Z = 0.4$ and $X = 5$, the
increase in the number of probes was 71 percent (compare with a 90 percent
increase in effective utility). It is noteworthy that the increases in
effective utility and in probing are not necessarily correlated. For
example, for $X = 4$, the effective utility for $Z = 0.4$ is dropping,
while the number of probes slightly increases.
Fig. 11. SUP, WIC, and TTL for Synthetic Data 1 data set for (a) FPN(1) and (b) Poisson.
7.6 SUP versus SUP($\lambda$)
Fig. 13 compares SUP and SUP($\lambda$). We used RSS data and
life = window with $Y = 50$, $1 \le X \le 10$, and the Poisson model. The
light-colored line shows the improvement in terms of effective utility. The
dark line shows the improvement in terms of number of monitoring tasks,
where improvement means fewer probes.
The results show that SUP($\lambda$) consistently improves on SUP. It is
worth noting that even in the case of $X = 1$, SUP($\lambda$) manages to
improve on SUP with a mild increase in the number of probes. Fig. 11b also
shows this improvement. It also shows that SUP($\lambda$) dominates WIC for
all variants but one (WIC 500 with 65,000 probes) and many of the TTL
variants as well.
As for probe increase, we can observe that for $X = 1, 2, 3$,
SUP($\lambda$) requires slightly more probes than SUP, but for $X \ge 4$,
SUP($\lambda$) manages to produce higher effective utility while reducing
SUP cost. Although SUP($\lambda$) is a heuristic solution and no guarantees
on its performance are given, these results remain consistent with other
parameter settings as well (not shown in this paper).
8 RELATED WORK
Pull-based freshness policies require clients to contact
servers to check for updates to objects. Such policies have
been proposed in many contexts such as Web caching and
synchronizing collections of objects, e.g., Web crawlers.
There has been much research in the Web caching
community on pull-based freshness policies [15]. These
policies typically rely on heuristics to estimate the freshness
of a cached object, for example, estimating freshness as a
function of the last time the object was modified. Other
works [19], [14] have proposed the use of an update model
to represent, in stochastic terms, update arrival. Pull-based
freshness has also been addressed in the context of
synchronizing a large collection of objects, e.g., Web
crawlers [4], [8]. These works propose policies for prefetch-
ing objects from remote sources to maximize the freshness
of objects in the cache. The goal is to refresh a collection of
objects offline, rather than handle client requests online.
Quality-driven data delivery involves the design of
efficient algorithms for data delivery subject to system and
user constraints. Designing such algorithms is harder in
pull-based settings than in push-based, since the update
process is known only in stochastic terms. We next present a
set of dimensions (see Fig. 14 for an illustrative comparison)
to classify pull-based approaches, followed by an overview
of some existing approaches. We then classify each
approach along these dimensions and discuss the limita-
tions of existing approaches and research challenges.
The first dimension we consider is when objects are refreshed:
asynchronously, on demand, or some combination of the two. The approaches
in [4], [7], [24] are purely asynchronous and refresh data independently of
client requests. Others, e.g., L-R Profiles [15], [3], are purely on demand
and only refresh objects when they are requested by clients. Finally,
approaches such as Prevalidation [10], [23], [17] lie in between these two
extremes and perform both asynchronous and on-demand data access.
The second dimension is the objective and constraints of the problem. We
group these together along the y-axis in Fig. 14. The objective is the
value to be optimized, e.g., data recency or client utility, and the
constraints are limitations, e.g., bandwidth. By utility, we mean some
client-specified function to measure the value of an object to a client,
based on a metric such as data recency, e.g., [3], or importance to the
client, e.g., [6]. We now present several existing approaches and describe
how we classify them along the above dimensions.
Fig. 12. Impact of feedback for RSS data, life = overwrite.
Fig. 13. Relative performance of SUP and SUP($\lambda$) for RSS data,
life = window(50).
Fig. 14. Classification of existing pull-based policies along several
dimensions.
On-demand approaches.
. TTL: TTL [15] is commonly used to maintain
freshness of object copies for applications such as
on-demand Web access. Each object is assigned a
Time-to-Live (either server-defined or estimated
using heuristics), and any object requested after this
time must be validated at a server to check for
updates. TTL aims to maximize the recency of data
and assumes no bandwidth constraints. Thus, we
classify it as (on demand, recency, none).
. TTL with prevalidation (TTL-Prevalidation): Preva-
lidation [10] extends TTL by asynchronously vali-
dating expired cached objects in the background. As
in TTL, the goal is to maximize data recency. This
approach assumes limits on the amount of band-
width for prevalidation, but as in TTL, it assumes no
bandwidth constraints for on-demand requests.
. Latency-Recency Profiles (L-R Profiles): Latency-
recency profiles [3] are a generalization of TTL that
allow clients to explicitly trade off data recency to
reduce latency using a utility function. The objective
is to maximize the utility of all client requests. This
policy assumes no bandwidth constraints. We
classify this as (on demand, utility, none).
. Profile-Driven Cache Management (PDCM): Profile-
driven cache management [6] enables data rechar-
ging for clients with intermittent connectivity.
Clients specify profiles of the utility of each object.
The objective is to download a set of objects to
maximize client utility, while the client is connected.
PDCM does not consider updates to objects.
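The basic on-demand TTL policy described in the list above can be sketched as a small cache. This is a simplified illustration, not an implementation from any of the cited works: the class and parameter names are hypothetical, and validation at the server is modeled as a plain refetch through a caller-supplied `fetch` function.

```python
import time

class TTLCache:
    """On-demand cache: an entry older than its TTL is refetched on request."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # hypothetical callable: key -> value
        self._ttl = ttl_seconds
        self._clock = clock          # injectable clock, useful for testing
        self._entries = {}           # key -> (value, time of last validation)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # still fresh: no server contact
        value = self._fetch(key)     # expired or missing: contact the server
        self._entries[key] = (value, now)
        return value

# Usage with a fake clock and a fake server that records each contact.
server_calls = []

def fake_fetch(key):
    server_calls.append(key)
    return key.upper()

fake_time = [0.0]
cache = TTLCache(fake_fetch, ttl_seconds=10.0, clock=lambda: fake_time[0])
first = cache.get("feed")     # contacts the server
second = cache.get("feed")    # served from the cache
fake_time[0] = 10.0
third = cache.get("feed")     # TTL elapsed, contacts the server again
```

No probe is issued unless a client asks for the object, which is what distinguishes this policy from the asynchronous approaches.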
Asynchronous approaches.
. Cache Synchronization (Synch): The objective of
cache synchronization [7] is to maximize the average
recency of a set of objects in a cache, subject to
constraints on the number of objects that can be
synchronized (for simplicity, we express this as a
bandwidth constraint). This approach does not
incorporate client utility or preferences into the
decision. Application-aware cache synchronization
(AA-Synch) [4] improves upon this by taking object
popularity into account. In [22], a cooperative approach between a cache
and its data sources is presented that aims at offering best-effort cache
synchronization under bandwidth constraints.
. WIC: WIC [24] aims to monitor updates to a set of
information sources subject to bandwidth con-
straints. The objective is to capture updates to a set
of objects, rather than maximize the average fresh-
ness of a cache as in cache synchronization [7]. This
approach does not consider client requests or client
utility (utility is given only in terms of server ability
to capture updates). Thus, we classify this as
(asynchronous, recency, bandwidth).
SUP is also classified in Fig. 14. It is an asynchronous
algorithm. Following the dual approach, presented in this
paper, SUP is classified as an algorithm that aims at
minimizing bandwidth while keeping an optimal utility as
its constraint.
SUP($\lambda$) uses feedback to modify the model itself using
local and transient changes to the model. Alternating
between predefined models was suggested in [2], where a
mechanism to choose between two possible update models
is established. Such a mechanism was suggested to handle
bursts of updates.
9 CONCLUSIONS
In this work, we focused on pull-based data delivery that supports user
profile diversity. Minimizing the number of probes to sources is important
for pull-based applications to conserve resources and improve scalability.
Solutions that can adapt to changes in source behavior are also important
due to the difficulty of predicting when updates occur. In this paper, we
have addressed these challenges through the use of a new formalism of a
dual optimization problem ($\mathit{OptMon}_2$), reversing the roles of
user utility and system resources. This revised specification leads
naturally to a surprisingly simple, yet powerful algorithm (SUP), which
satisfies user specifications while minimizing system resource consumption.
We have formally shown that SUP is optimal for $\mathit{OptMon}_2$ and
under certain restrictions can be optimal for $\mathit{OptMon}_1$ as well.
We have empirically shown, using RSS data traces as well as synthetic data,
that SUP can satisfy user profiles and capture more updates compared to
existing policies. SUP is adaptive and can dynamically change monitoring
schedules. Our experiments show that using feedback in SUP improves the
performance with a moderate increase in the number of needed probes.
We believe that the main impact of this work will be in what is now known
as the Internet of things, where sensor data are collected, analyzed, and
utilized in many different ways, based on users' needs. With the Internet
of things, user profiles and their satisfaction dictate the way data are
utilized, and monitoring sensor data efficiently is a mandatory
prerequisite to the creation of any information system that is based on
such data.
$\mathit{OptMon}_2$ is defined in such a way that satisfaction of a user
profile is a hard constraint. However, a profile may sometimes state
preferences rather than hard constraints. Extending the problem to handle
profile preferences poses a new challenge. Adding preferences was discussed
in [26], where a trade-off was suggested between completeness (which is
defined as a hard constraint in this work) and delay of information
delivery. This specification yields a biobjective problem definition (both
client satisfaction and utility maximization). The algorithmic solution
changes to identify the Pareto curve of feasible, pairwise nondominated
solutions. Another way of adding preferences to this work is by redefining
utility (set to be completeness in this work) to include a variety of
dimensions, combined through some linear or other combination. We consider
this problem as another challenge and an avenue for future research.
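The notion of pairwise nondominated solutions can be made concrete with a short filter. This is a generic sketch and not the algorithm of [26]: candidate solutions are represented as hypothetical `(completeness, delay)` pairs, where higher completeness and lower delay are both preferred.

```python
def pareto_front(solutions):
    """Keep the solutions that no other solution dominates.

    Each solution is a (completeness, delay) pair: higher completeness and
    lower delay are better.
    """
    def dominates(a, b):
        # a is at least as good as b on both objectives and differs from b
        return a[0] >= b[0] and a[1] <= b[1] and a != b

    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Made-up candidates: (completeness, delay in chronons).
candidates = [(1.0, 30), (0.9, 10), (0.8, 12), (0.7, 5), (0.9, 20)]
front = pareto_front(candidates)
```

Here `(0.8, 12)` and `(0.9, 20)` are both dominated by `(0.9, 10)`, so the front keeps only the three remaining trade-offs.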
In future work, we shall also consider how to incorporate resource
constraints into SUP. We shall investigate the optimal positioning of
monitoring tasks in an execution interval, maximizing the probability of
satisfying user profiles, given the stochastic nature of the update model.
Finally, we shall investigate the changes to our algorithmic solution
whenever nonstrict utilities are present.
REFERENCES
[1] A. Adi and O. Etzion, "Amit - The Situation Manager," Int'l J. Very Large Data Bases, vol. 13, no. 2, pp. 177-203, May 2004.
[2] L. Bright, A. Gal, and L. Raschid, "Adaptive Pull-Based Policies for Wide Area Data Delivery," ACM Trans. Database Systems, vol. 31, no. 2, pp. 631-671, 2006.
[3] L. Bright and L. Raschid, "Using Latency-Recency Profiles for Data Delivery on the Web," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 550-561, Aug. 2002.
[4] D. Carney, S. Lee, and S. Zdonik, "Scalable Application-Aware Data Freshening," Proc. IEEE CS Int'l Conf. Data Eng., pp. 481-492, Mar. 2003.
[5] L.S. Chandran, L. Ibarra, F. Ruskey, and J. Sawada, "Generating and Characterizing the Perfect Elimination Orderings of a Chordal Graph," Theoretical Computer Science, vol. 307, no. 2, pp. 303-317, 2003.
[6] M. Cherniack, E. Galvez, M. Franklin, and S. Zdonik, "Profile-Driven Cache Management," Proc. IEEE CS Int'l Conf. Data Eng., pp. 645-656, Mar. 2003.
[7] J. Cho and H. Garcia-Molina, "Synchronizing a Database to Improve Freshness," Proc. ACM SIGMOD, pp. 117-128, May 2000.
[8] J. Cho and A. Ntoulas, "Effective Change Detection Using Sampling," Proc. Int'l Conf. Very Large Data Bases (VLDB), 2002.
[9] CNN Top Stories RSS Feed, http://rss.cnn.com/services/rss/cnn_topstories.rss, 2010.
[10] E. Cohen and H. Kaplan, "Refreshment Policies for Web Content Caches," Proc. IEEE INFOCOM, pp. 1398-1406, Apr. 2001.
[11] U. Dayal et al., "The HiPAC Project: Combining Active Databases and Timing Constraints," SIGMOD Record, vol. 17, no. 1, pp. 51-70, Mar. 1988.
[12] P. Deolasee, A. Katkar, P. Panchbudhe, K. Ramamritham, and P. Shenoy, "Adaptive Push-Pull: Disseminating Dynamic Web Data," Proc. Int'l World Wide Web Conf. (WWW), pp. 265-274, May 2001.
[13] J. Eckstein, A. Gal, and S. Reiner, "Optimal Information Monitoring under a Politeness Constraint," Technical Report RRR 16-2005, RUTCOR, Rutgers Univ., May 2005.
[14] A. Gal and J. Eckstein, "Managing Periodically Updated Data in Relational Databases: A Stochastic Modeling Approach," J. ACM, vol. 48, no. 6, pp. 1141-1183, 2001.
[15] J. Gwertzman and M. Seltzer, "World Wide Web Cache Consistency," Proc. USENIX Ann. Technical Conf., pp. 141-152, Jan. 1996.
[16] BlackBerry Wireless Handhelds, http://www.blackberry.com, 2010.
[17] Z. Jiang and L. Kleinrock, "Prefetching Links on the WWW," Proc. IEEE Int'l Conf. Comm., 1997.
[18] G. Kappel, S. Rausch-Schott, and Retschitzegger, "Beyond Coupling Modes: Implementing Active Concepts on Top of a Commercial OODBMS," Object-Oriented Methodologies and Systems, S. Urban and E. Bertino, eds., pp. 189-204. Springer-Verlag, 1994.
[19] J.-J. Lee, K.-Y. Whang, B.S. Lee, and J.-W. Chang, "An Update-Risk Based Approach to TTL Estimation in Web Caching," Proc. Conf. Web Information Systems Eng. (WISE), pp. 21-29, Dec. 2002.
[20] C. Liu and P. Cao, "Maintaining Strong Cache Consistency on the World Wide Web," Proc. Int'l Conf. Distributed Computing Systems (ICDCS), 1997.
[21] H. Liu, V. Ramasubramanian, and E.G. Sirer, "Client and Feed Characteristics of RSS, a Publish-Subscribe System for Web Micronews," Proc. Internet Measurement Conf. (IMC), Oct. 2005.
[22] C. Olston and J. Widom, "Best-Effort Cache Synchronization with Source Cooperation," Proc. ACM SIGMOD, pp. 73-84, 2002.
[23] V. Padmanabhan and J. Mogul, "Using Predictive Prefetching to Improve World Wide Web Latency," ACM SIGCOMM Computer Comm. Rev., vol. 26, no. 3, pp. 22-36, July 1996.
[24] S. Pandey, K. Dhamdhere, and C. Olston, "WIC: A General-Purpose Algorithm for Monitoring Web Information Sources," Proc. Int'l Conf. Very Large Data Bases (VLDB), pp. 360-371, Sept. 2004.
[25] Promo Language Specification, http://ie.technion.ac.il/~avigal/ProMoLang.pdf, 2010.
[26] H. Roitman, A. Gal, and L. Raschid, "Capturing Approximated Data Delivery Tradeoffs," Proc. IEEE CS Int'l Conf. Data Eng., 2008.
[27] RSS, http://www.rss-specifications.com, 2010.
[28] J.L. Wolf, M.S. Squillante, P.S. Yu, J. Sethuraman, and L. Ozsen, "Optimal Crawling Strategies for Web Search Engines," Proc. Int'l World Wide Web Conf. (WWW), pp. 136-147, 2002.
[29] E. Yashchin, "Change-Point Models in Industrial Applications," Nonlinear Analysis, vol. 30, pp. 3997-4006, 1997.
[30] J. Yin, L. Alvisi, M. Dahlin, and A. Iyengar, "Engineering Server-Driven Consistency for Large Scale Dynamic Web Services," Proc. Int'l World Wide Web Conf. (WWW), pp. 45-57, May 2001.
Haggai Roitman received the BSc degree in
information systems engineering and the PhD
degree in information management engineering
from the Technion in 2004 and 2009, respec-
tively. He is a research staff member at IBM
Haifa Research Lab (HRL). He works in the
Information Retrieval Solutions Group. His main
research interests are in the boundary between
dynamic data management (e.g., Web monitor-
ing) and content management (e.g., content
analysis and content dissemination networks), Web 2.0 data manage-
ment, and data integration. He is also an adjunct lecturer in the William
Davidson Faculty of Industrial Engineering and Management, Technion.
He has published several papers in leading conferences (e.g., VLDB,
ICDE, SIGIR, CIKM, and JCDL). In his free time, he enjoys mastering
his DJ skills.
Avigdor Gal received the DSc degree in the
area of temporal active databases in 1995 from
the Technion - Israel Institute of Technology,
where he is an associate professor. He has
published more than 80 papers in journals (e.g.,
Journal of the ACM (JACM), ACM Transactions
on Database Systems (TODS), IEEE Transac-
tions on Knowledge and Data Engineering
(TKDE), ACM Transactions on Internet Technol-
ogy (TOIT), and VLDB Journal), books (Tem-
poral Databases: Research and Practice), and conferences (ICDE, ER,
CoopIS, and BPM) on the topics of data integration, temporal
databases, information systems architectures, and active databases.
He is a steering committee member of IFCIS, a member of IFIP WG 2.6,
and a recipient of the IBM Faculty Award for 2002-2004. He is a member
of the ACM and a senior member of the IEEE.
Louiqa Raschid received the bachelor's degree
from the Indian Institute of Technology, Chennai,
in 1980, and the PhD degree from the University
of Florida in 1987. She is a professor at the
University of Maryland. She has published more
than 140 papers in the leading conferences and
journals in databases, scientific computing, Web
data management, bioinformatics, and AI in-
cluding the ACM SIGMOD, VLDB, AAAI, IEEE
ICDE, ACM Transactions on Database Systems,
IEEE Transactions on Knowledge and Data Engineering, IEEE
Transactions on Computers, and the Journal of Logic Programming.
Her research has received multiple awards including more than 25 grants
from the US National Science Foundation (NSF) and US Defense
Advanced Research Projects Agency (DARPA). Papers that she
coauthored have been nominated for or won the Best Paper Awards
at the 1996 International Conference on Distributed Computing
Systems, the 1998 International Conference on Cooperative Information
Systems, and the 2008 International Conference on Data Integration in
the Life Sciences. She has been recognized as an ACM distinguished
scientist. She has chaired or served on multiple IEEE and ACM program
committees and the editorial boards of the VLDB Journal, ACM
Computing Surveys, ACM Journal on Data and Information Quality,
Proceedings of the VLDB, INFORMS Journal of Computing, and the
IEEE Transactions on Knowledge and Data Engineering. She has
played a key role in the Sahana FOSS project for disaster information
management including serving as the chief database architect and
board chair (2006-2008). Sahana is the only comprehensive product for
disaster information management.