
Distributed Caching Algorithms for Content Distribution Networks

Sem Borst

INRIA - Bell Labs Workshop

Murray Hill, June 1, 2009

Joint work with Anwar Walid, Varun Gupta (CMU)


Introduction

Scope: personalized/on-demand delivery of high-definition video through service provider

• Live broadcast TV

• CatchUp TV / PauseLive TV features

• NPVR (Network Personal Video Recorder) capabilities

• Movie libraries / VoD (Video-on-Demand)

• User-generated content

Unicast nature defies conventional broadcast TV paradigm


Caching strategies

Focus: ‘hierarchical’ network architecture

Store popular content closer to network edge to reduce traffic load, capital expense and performance bottlenecks

Two interrelated problems

• Design: optimal cache locations and sizes (joint work with Marty Reiman)

• Operation: efficient (dynamic) placement of content items

Optimal content placement

Consider symmetric scenario (cache sizes, popularity distributions)

For now, assume strictly hierarchical topology: content can only be requested from parent node

Caches should be filled with most popular content items from lowest level up

[Figure: network hierarchy, from top to bottom: VHO, IO, CO, DSLAM, STB]

Greedy content placement strategy

Whenever node receives request for item, its local ‘popularity estimate’ for that item is updated

If requested item is not stored in local cache, then

• Request is forwarded to parent node

• Popularity estimate for requested item is compared with that for ‘marginal’ item, which may then be evicted and replaced

Provable ‘convergence’ to optimal content placement
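The request handling described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the class name `GreedyNode`, the use of raw request counts as the 'popularity estimate', unit-size items, and the choice of the least popular cached item as the 'marginal' item are all assumptions made for the example.

```python
from collections import Counter


class GreedyNode:
    """One cache node in a strict hierarchy (hypothetical helper class)."""

    def __init__(self, capacity, parent=None):
        self.capacity = capacity      # number of unit-size items the cache holds
        self.parent = parent          # None: next hop is the origin above the top cache
        self.cache = set()
        self.popularity = Counter()   # request counts, used as popularity estimates

    def request(self, item):
        """Serve a request; return how many levels up it travelled (0 = local hit)."""
        self.popularity[item] += 1
        if item in self.cache:
            return 0
        # Miss: forward the request to the parent node (or to the origin).
        hops = 1 + (self.parent.request(item) if self.parent else 0)
        if len(self.cache) < self.capacity:
            self.cache.add(item)
        elif self.cache:
            # Assumed rule: the marginal item is the cached item with the lowest estimate.
            marginal = min(self.cache, key=lambda x: self.popularity[x])
            if self.popularity[item] > self.popularity[marginal]:
                self.cache.remove(marginal)
                self.cache.add(item)
        return hops
```

With a single-slot cache, an item displaces the cached one only once its request count strictly exceeds that of the cached item, so a one-off request does not evict established content.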

Optimal content placement (cont’d)

Relies on two strong (though reasonable) assumptions

• Symmetric popularity distributions and cache sizes

• Strictly hierarchical topology

What if popularity distributions are spatially heterogeneous?

Or what if content can be requested from peers as well?

Optimal content placement (cont’d)

Assume there are caches installed at only two levels

[Figure: network hierarchy VHO, IO, CO, DSLAM, STB, with caches at two levels]

Optimal content placement (cont’d)

Consider cluster of M nodes at same level in hierarchy

Cluster nodes are either directly connected or indirectly via common parent node

[Figure: tree with root node, parent node, and leaf nodes 1, 2, ..., M]

Optimal content placement (cont’d)

Some notation

• c_0: transfer cost from root node −1 to parent node 0

• c_i: transfer cost from parent node 0 to node i

• c_ij: transfer cost from leaf node j to leaf node i

Then

\[
f_{ij} :=
\begin{cases}
c_0 + c_i & j = i \\
c_0 & j = 0 \\
0 & j = -1 \\
c_0 + c_i - c_{ij} & j \neq -1, 0, i
\end{cases}
\]

represents transport cost savings achieved by transferring data to leaf node i from node j instead of root node
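The case distinction for the savings f_ij translates directly into a small function. The sketch below assumes the node numbering used on this slide (−1 for the root, 0 for the parent, 1..M for the leaves); the function name `savings` and the dictionary-based cost arguments are hypothetical.

```python
def savings(i, j, c0, c, c_peer):
    """Transport cost saved when leaf i fetches from node j instead of the root.

    c0 is the root-to-parent cost, c[i] the parent-to-leaf cost c_i,
    and c_peer[(i, j)] the leaf-to-leaf cost c_ij.
    """
    if j == -1:                 # fetched from the root itself: nothing saved
        return 0
    if j == 0:                  # fetched from the parent: root-to-parent hop saved
        return c0
    if j == i:                  # local hit: both hops saved
        return c0 + c[i]
    return c0 + c[i] - c_peer[(i, j)]   # peer fetch: both hops saved, peer transfer paid
```

For example, with c_0 = 2, c_1 = 1 and c_12 = 1, a local hit at leaf 1 saves 3, while a fetch from the parent or from peer 2 saves 2.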
Optimal content placement (cont’d)

Problem of maximizing cost savings may be formulated as

\begin{align}
\max \quad & \sum_{i=1}^{M} \sum_{n=1}^{N} \sum_{j=0}^{M} s_n d_{in} f_{ij} x_{jin} \tag{1} \\
\text{sub} \quad & \sum_{n=1}^{N} s_n x_{in} \le B_i, \quad i = 0, 1, \dots, M \tag{2} \\
& x_{jin} \le x_{jn}, \quad i = 1, \dots, M, \; j = 0, 1, \dots, M, \; n = 1, \dots, N \tag{3} \\
& \sum_{j=0}^{M} x_{jin} \le 1, \quad i = 1, \dots, M, \; n = 1, \dots, N \tag{4}
\end{align}

with B_i denoting cache size of i-th node, s_n size of n-th item, d_in demand for n-th item at i-th node
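For toy instances, problem (1)–(4) can be checked by exhaustive enumeration. The sketch below is illustrative only and rests on an assumed reading of the variables: x_jn indicates that item n is cached at node j, and x_jin that leaf i fetches item n from node j. Under that reading, each leaf simply fetches from the holder with the largest savings f_ij. The function names and data layout are hypothetical.

```python
from itertools import product


def placement_savings(placement, demand, size, f):
    """Objective (1) for a fixed placement, each leaf fetching from its best source.

    placement[j] is the set of items cached at node j (0 = parent, 1..M = leaves),
    demand[i][n] is d_in, size[n] is s_n, and f(i, j) returns f_ij.
    """
    total = 0.0
    for i in demand:                                  # leaves 1..M
        for n, d in demand[i].items():
            holders = [j for j, items in placement.items() if n in items]
            best = max((f(i, j) for j in holders), default=0.0)
            total += size[n] * d * max(best, 0.0)     # fetching from the root saves 0
    return total


def brute_force(capacity, demand, size, f):
    """Enumerate all placements that respect (2); only viable for tiny instances."""
    nodes, items = sorted(capacity), sorted(size)
    best_value, best_placement = -1.0, None
    for bits in product([0, 1], repeat=len(nodes) * len(items)):
        placement = {j: set() for j in nodes}
        for k, bit in enumerate(bits):
            if bit:
                placement[nodes[k // len(items)]].add(items[k % len(items)])
        if any(sum(size[n] for n in placement[j]) > capacity[j] for j in nodes):
            continue                                  # violates capacity constraint (2)
        value = placement_savings(placement, demand, size, f)
        if value > best_value:
            best_value, best_placement = value, placement
    return best_value, best_placement
```

With two leaves of one slot each, no parent cache, demands 3 and 1 for items "a" and "b" at both leaves, f_ii = 3 and f_ij = 2 otherwise, storing "a" in both leaves yields savings 18, whereas storing one copy of each item yields 20.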

Inter-level cache cooperation

Allow for heterogeneous popularity distributions, but assume c_ij = ∞, i.e., content can only be fetched from parent node and not from peers

For compactness, denote c_min := min_{i=1,...,M} c_i

Proposition

For arbitrary popularity distributions, greedy content placement strategy is guaranteed to achieve at least fraction

\[
\frac{(M-1) c_{\min} + M c_0}{(M-1) c_{\min} + (2M-1) c_0} \ge \frac{M}{2M-1}
\]

of maximum achievable cost savings
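To get a feel for the guarantee, the fraction ((M − 1)c_min + M c_0) / ((M − 1)c_min + (2M − 1)c_0) can be evaluated exactly and compared with its lower bound M/(2M − 1). The sketch below uses exact rational arithmetic; the function name `greedy_guarantee` is hypothetical and the inputs are assumed to be integers or `Fraction` values.

```python
from fractions import Fraction


def greedy_guarantee(M, c_min, c0):
    """Guaranteed fraction of the maximum cost savings, as an exact rational."""
    numerator = (M - 1) * c_min + M * c0
    denominator = (M - 1) * c_min + (2 * M - 1) * c0
    return Fraction(numerator, denominator)


# Example values (chosen for illustration only): M = 10, c_min = 1, c_0 = 2.
example = greedy_guarantee(10, 1, 2)      # 29/47
lower_bound = Fraction(10, 2 * 10 - 1)    # M / (2M - 1) = 10/19
```

The bound M/(2M − 1) is attained when c_min = 0, and the guarantee improves as c_min grows relative to c_0.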

Intra-level cache collaboration

Now suppose content can be requested from peers as well

Intra-level connectivity allows distributed caches to cooperate and act as single logical cache, and makes caching at lower levels more cost-effective

Greedy optimization of local hit rate will lead to complete replication of cache content

Cache cooperation improves aggregate hit rate across cache cluster, at expense of lower local hit rate

Optimal trade-off and degree of replication depend on cost of intra-level transfers relative to transfers from parent or root node

Intra-level cache cooperation (cont’d)

Assume symmetric transport cost, cache sizes and popularity distributions: B_i ≡ B, c_i ≡ c, c_ij ≡ c', and d_in ≡ d_n

For compactness, denote c'' := M(c + c_0) − (M − 1)c' > c'

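Problem (1)–(4) may be simplified to

\begin{align*}
\max \quad & \sum_{n=1}^{N} s_n d_n \left( c'' p_n + (M-1) c' q'_n + M c_0 x_{0n} \right) \\
\text{sub} \quad & \sum_{n=1}^{N} s_n x_{0n} \le B_0 \\
& \sum_{n=1}^{N} s_n \left( p_n + (M-1) q'_n \right) \le M B \\
& p_n + x_{0n} \le 1, \quad n = 1, \dots, N \\
& q'_n + x_{0n} \le 1, \quad n = 1, \dots, N
\end{align*}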

Knapsack-type problem structure


Intra-level cache collaboration (cont’d)

Optimal solution of content placement problem has relatively simple structure

Distinguish between two cases

• Mc ≥ (M − 1)c': more advantageous to store un-replicated content in leaf nodes than in parent node

• Mc ≤ (M − 1)c': more attractive to store un-replicated content in parent node than in leaf nodes
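The two cases follow from comparing the per-unit-demand value of a single copy in a leaf, c'' = M(c + c_0) − (M − 1)c', with that of a copy in the parent, M c_0: the former is at least the latter exactly when Mc ≥ (M − 1)c'. A small sketch of that comparison, with hypothetical function names and c' passed as `c_peer`:

```python
def single_copy_value(M, c, c0, c_peer):
    """c'' = M (c + c_0) - (M - 1) c': value per unit demand of one copy in one leaf."""
    return M * (c + c0) - (M - 1) * c_peer


def leaf_preferred(M, c, c_peer):
    """True when un-replicated content is better kept in a leaf than in the parent."""
    return M * c >= (M - 1) * c_peer


def parent_copy_value(M, c0):
    """M c_0: value per unit demand of one copy in the parent node."""
    return M * c0
```

For instance, with M = 10 and c = 1, leaves are preferred for c' = 1 (10 ≥ 9) but the parent is preferred for c' = 2 (10 < 18).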

Case Mc ≥ (M − 1)c'

[Figure: tree with root node, parent node, and leaf nodes 1, 2, ..., M]

Four popularity ‘tiers’

• Highly popular (red): replicated in all leaves, p_n = 1, q'_n = 1

• Fairly popular (pink): stored in single leaf, p_n = 1

• Mildly popular (yellow): stored in parent node, x_0n = 1

• Hardly popular (green): stored in root node only


Case Mc ≤ (M − 1)c'

[Figure: tree with root node, parent node, and leaf nodes 1, 2, ..., M]

Four popularity ‘tiers’

• Highly popular (red): replicated in all leaves, p_n = 1, q'_n = 1

• Fairly popular (pink): stored in common parent, x_0n = 1

• Mildly popular (yellow): stored in single leaf, p_n = 1

• Hardly popular (green): stored in root node only


Local-Greedy algorithm

For convenience, assume B_0 = 0, s_n = 1 for all n = 1, ..., N

If requested item is not stored in local cache, then

• Item is fetched from peer if cached elsewhere in cluster and otherwise from root node

• Value of requested item is compared with ‘marginal’ cache value, i.e., value provided by marginal item in local cache, which may then be evicted and replaced

\[
\text{Value of item } n =
\begin{cases}
c' d_n & \text{if stored elsewhere in cluster} \\
c'' d_n & \text{otherwise}
\end{cases}
\]
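One request step of the Local-Greedy rule can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: demands d_n are passed in as known numbers (in practice they would be estimates), items have unit size, the 'marginal' item is taken to be the lowest-valued item in the local cache, and `c_peer` and `c_single` stand for c' and c''. All function and parameter names are hypothetical.

```python
def item_value(n, node, caches, demand, c_peer, c_single):
    """c' d_n if another node in the cluster holds item n, otherwise c'' d_n."""
    elsewhere = any(n in cache for k, cache in caches.items() if k != node)
    return (c_peer if elsewhere else c_single) * demand[n]


def local_greedy_request(n, node, caches, capacity, demand, c_peer, c_single):
    """Handle one request for item n at `node`; return 'local', 'peer' or 'root'."""
    cache = caches[node]
    if n in cache:
        return "local"
    held_elsewhere = any(n in other for k, other in caches.items() if k != node)
    source = "peer" if held_elsewhere else "root"
    if len(cache) < capacity:
        cache.add(n)                      # free slot: store without eviction
    elif cache:
        def value(m):
            return item_value(m, node, caches, demand, c_peer, c_single)
        marginal = min(cache, key=value)  # assumed: lowest-valued cached item
        if value(n) > value(marginal):
            cache.remove(marginal)
            cache.add(n)
    return source
```

With two single-slot leaves both holding item "a" (d = 3), c' = 1 and c'' = 5, a request for item "b" (d = 1) at leaf 2 replaces the duplicate "a" (value 3) with "b" (value 5); a later request for "a" at leaf 2 is served from the peer and leaves the cache unchanged.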

Local-Greedy algorithm (cont’d)

May get stuck in suboptimal configuration

[Figure: two cache configurations, globally optimal configuration and local optimum]

• Duplicating red item less valuable than single yellow item

• Duplicating yellow item less valuable than single green item

Local-Greedy algorithm (cont’d)

Performance guarantees (competitive ratios)

• Symmetric popularities: within factor 4/3 from optimal

• Arbitrary popularities: within factor 2 from optimal

Modification: Global-Non-Greedy algorithm

Numerical experiments

• M = 10 leaf nodes, each with cache of size B = 1 TB

• Unit transport cost c_0 = 2, c = 1, c' = 1

• Collection of N = 10,000 content items, with common size of S = 2 GB

• Each leaf node can store K = B/S = 500 content items

• Zipf-Mandelbrot popularity distribution with shape parameter α = 0.8 and shift parameter q = 10

In optimal placement, items 1 through 165 are fully replicated, and single copies of items 166 through 3515 are stored
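The setup above can be reproduced in a few lines. The sketch assumes the common form of the Zipf-Mandelbrot law, with item n requested with probability proportional to (n + q)^(−α), and takes 1 TB as 1000 GB; the slides do not spell out either convention. It also checks the slot accounting behind the stated placement: 165 items replicated in all 10 leaves leave 335 slots per leaf, i.e. 3350 single copies, so single copies run through item 165 + 3350 = 3515.

```python
def zipf_mandelbrot(N, alpha, q):
    """Request probabilities p_n proportional to (n + q)^(-alpha), n = 1..N (assumed form)."""
    weights = [(n + q) ** -alpha for n in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


M = 10                      # leaf nodes
B_GB, S_GB = 1000, 2        # cache size and item size, assuming 1 TB = 1000 GB
K = B_GB // S_GB            # items per leaf cache: 500

popularity = zipf_mandelbrot(N=10_000, alpha=0.8, q=10)

replicated = 165                         # items held by every leaf (from the slide)
single_copies = M * (K - replicated)     # remaining slots across the cluster: 3350
last_single_item = replicated + single_copies   # 3515, matching the slide
```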
Performance of Local-Greedy algorithm

Various leaf nodes receive requests over time, sampled from above-described popularity distribution

[Figure: performance ratio (0 to 1) versus number of requests received (0 to 5000), for three initial states (full replication, no replication, random); two panels: static popularity and content aging]

Performance ratio as function of number of requests

Some observations

• Local-Greedy algorithm gets progressively closer to optimum as system receives more requests and replaces items over time

• After only 3000 requests (for collection of 10,000 items) Local-Greedy algorithm has come to within 1% of optimum, and stays there

• Performs markedly better than worst-case ratio of 3/4 might suggest

• While algorithm seems to ‘converge’ for all three initial states, scenario with no replication appears to be most favorable one, due to fact that in optimal placement only items 1 through 165 are fully replicated
