Distributed Caching Algorithms For Content Distribution Networks

Sem Borst
INRIA - Bell Labs Workshop
Murray Hill, June 1, 2009
Joint work with Anwar Walid, Varun Gupta (CMU)

Introduction
Scope: personalized/on-demand delivery of high-definition video through service provider
• Live broadcast TV
• CatchUp TV / PauseLive TV features
• NPVR (Network Personal Video Recorder) capabilities
• Movie libraries / VoD (Video-on-Demand)
• User-generated content
Unicast nature defies conventional broadcast TV paradigm

Caching strategies
Focus: ‘hierarchical’ network architecture
Store popular content closer to network edge to reduce traffic load, capital expense and performance bottlenecks
Two interrelated problems
• Design: optimal cache locations and sizes (joint work with Marty Reiman)
• Operation: efficient (dynamic) placement of content items

Optimal content placement
Consider symmetric scenario (cache sizes, popularity distributions)
For now, assume strictly hierarchical topology: content can only be requested from parent node
Caches should be filled with most popular content items from lowest level up
[Figure: network hierarchy with levels VHO, IO, CO, DSLAM, STB]

Greedy content placement strategy
Whenever node receives request for item, its local ‘popularity estimate’ for that item is updated
If requested item is not stored in local cache, then
• Request is forwarded to parent node
• Popularity estimate for requested item is compared with that for ‘marginal’ item, which may then be evicted and replaced
Provable ‘convergence’ to optimal content placement

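The greedy rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the class name `CacheNode`, the use of raw request counts as the 'popularity estimate', unit-size items, and the strict `>` test for eviction are all assumptions made for the example.

```python
from collections import Counter


class CacheNode:
    """One cache in a strictly hierarchical tree (hypothetical helper class)."""

    def __init__(self, capacity, parent=None):
        self.capacity = capacity   # number of unit-size items the cache can hold
        self.parent = parent       # None means the next hop is the origin
        self.counts = Counter()    # request counts, used as popularity estimates
        self.cache = set()

    def request(self, item):
        """Serve a request and return the number of hops travelled above this node."""
        self.counts[item] += 1     # update the local popularity estimate
        if item in self.cache:
            return 0
        # Miss: forward the request to the parent (or to the origin).
        hops = 1 + (self.parent.request(item) if self.parent else 0)
        self._maybe_admit(item)
        return hops

    def _maybe_admit(self, item):
        if self.capacity == 0:
            return
        if len(self.cache) < self.capacity:
            self.cache.add(item)
            return
        # The 'marginal' item is the cached item with the lowest estimate.
        marginal = min(self.cache, key=lambda it: self.counts[it])
        if self.counts[item] > self.counts[marginal]:
            self.cache.remove(marginal)
            self.cache.add(item)
```

With a one-slot leaf below a one-slot parent, repeated requests for a new item eventually displace the old one at both levels, since each node only compares its own estimates.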
Optimal content placement (cont’d)
Relies on two strong (though reasonable) assumptions
• Symmetric popularity distributions and cache sizes
• Strictly hierarchical topology
What if popularity distributions are spatially heterogeneous?
Or what if content can be requested from peers as well?
Assume there are caches installed at only two levels
[Figure: network hierarchy with levels VHO, IO, CO, DSLAM, STB]

Consider cluster of M nodes at same level in hierarchy
Cluster nodes are either directly connected or indirectly via common parent node
[Figure: root node, parent node, and leaf nodes 1, 2, ..., M]

Some notation
• c0: transfer cost from root node −1 to parent node 0
• ci: transfer cost from parent node 0 to node i
• cij: transfer cost from leaf node j to leaf node i
Then
\[
f_{ij} :=
\begin{cases}
c_0 + c_i & j = i \\
c_0 & j = 0 \\
0 & j = -1 \\
c_0 + c_i - c_{ij} & j \neq -1, 0, i
\end{cases}
\]
represents transport cost savings achieved by transferring data to leaf node i from node j instead of root node

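As an illustration, the savings coefficients can be computed directly from the three kinds of transfer cost. The function name `savings` and the dictionary-based arguments are hypothetical choices for this sketch; a missing peer link is treated as infinite cost, which is an assumption.

```python
import math

ROOT, PARENT = -1, 0  # node indices as used in the notation above


def savings(i, j, c0, c, cpeer):
    """Transport cost saved when leaf i fetches from node j instead of the root.

    c0    -- cost from root to parent
    c     -- dict: leaf index -> cost from parent to that leaf (c_i)
    cpeer -- dict: (i, j) -> cost from leaf j to leaf i (c_ij);
             a missing entry is treated as "no link" (assumption)
    """
    if j == ROOT:
        return 0.0
    if j == PARENT:
        return c0
    if j == i:
        return c0 + c[i]
    return c0 + c[i] - cpeer.get((i, j), math.inf)
```

For example, with c0 = 2, ci = 1 and cij = 1, a local hit saves 3, while a hit at the parent or at a connected peer saves 2.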
Problem of maximizing cost savings may be formulated as
\[
\begin{aligned}
\max\ & \sum_{i=1}^{M} \sum_{n=1}^{N} \sum_{j=0}^{M} s_n d_{in} f_{ij} x_{jin} && (1) \\
\text{s.t.}\ & \sum_{n=1}^{N} s_n x_{in} \le B_i, \quad i = 0, 1, \dots, M && (2) \\
& x_{jin} \le x_{jn}, \quad i = 1, \dots, M,\ j = 0, 1, \dots, M,\ n = 1, \dots, N && (3) \\
& \sum_{j=0}^{M} x_{jin} \le 1, \quad i = 1, \dots, M,\ n = 1, \dots, N, && (4)
\end{aligned}
\]
with Bi denoting cache size of i-th node, sn size of n-th item, din demand for n-th item at i-th node

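To make the objective concrete, the sketch below evaluates the cost savings of a given placement: each leaf fetches each item from whichever holder gives the largest saving, or from the root when no cache holds it. The function names `total_savings` and `feasible` and the data layout are hypothetical; the savings function `f` is passed in by the caller.

```python
def total_savings(placement, demand, size, f):
    """Value of objective (1) for a fixed placement, with the best source per request.

    placement -- dict: node index (0 = parent, 1..M = leaves) -> set of items
    demand    -- dict: (leaf, item) -> demand d_in
    size      -- dict: item -> size s_n
    f         -- function (i, j) -> savings f_ij
    """
    total = 0.0
    for (i, n), d in demand.items():
        holders = [j for j, items in placement.items() if n in items]
        # Fetching from the root is always possible and saves nothing.
        best = max((f(i, j) for j in holders), default=0.0)
        total += size[n] * d * max(best, 0.0)
    return total


def feasible(placement, size, capacity):
    """Check the cache size constraint (2) for every node."""
    return all(
        sum(size[n] for n in items) <= capacity[j]
        for j, items in placement.items()
    )
```

With two leaves, unit demands and the symmetric savings 3 (local), 2 (parent or peer), storing two different items yields savings of 10, whereas replicating one item in both leaves yields only 6.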
Inter-level cache cooperation
Allow for heterogeneous popularity distributions, but assume cij = ∞, i.e., content can only be fetched from parent node and not from peers
For compactness, denote cmin := min_{i=1,...,M} ci
Proposition
For arbitrary popularity distributions, greedy content placement strategy is guaranteed to achieve at least fraction
\[
\frac{(M-1) c_{\min} + M c_0}{(M-1) c_{\min} + (2M-1) c_0} \ge \frac{M}{2M-1}
\]
of maximum achievable cost savings

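A quick numerical check of the bound may help. The helper names below are hypothetical; the code only evaluates the two expressions in the proposition.

```python
def greedy_guarantee(M, c0, cmin):
    """Guaranteed fraction of the maximum cost savings, as stated in the proposition."""
    shared = (M - 1) * cmin
    return (shared + M * c0) / (shared + (2 * M - 1) * c0)


def worst_case_guarantee(M):
    """Value of the guarantee when cmin = 0, i.e. M / (2M - 1)."""
    return M / (2 * M - 1)
```

Since M ≤ 2M − 1, adding the same non-negative term (M − 1)cmin to numerator and denominator can only raise the fraction, so the guarantee is smallest at cmin = 0 and always exceeds 1/2.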
Intra-level cache collaboration
Now suppose content can be requested from peers as well
Intra-level connectivity allows distributed caches to cooperate and act as single logical cache, and makes caching at lower levels more cost-effective
Greedy optimization of local hit rate will lead to complete replication of cache content
Cache cooperation improves aggregate hit rate across cache cluster, at expense of lower local hit rate
Optimal trade-off and degree of replication depend on cost of intra-level transfers relative to transfers from parent or root node

Intra-level cache cooperation (cont’d)
Assume symmetric transport cost, cache sizes and popularity distributions: Bi ≡ B, ci ≡ c, cij ≡ c′, and din ≡ dn
For compactness, denote c″ := M(c + c0) − (M − 1)c′ > c′
Problem (1)–(4) may be simplified to
\[
\begin{aligned}
\max\ & \sum_{n=1}^{N} s_n d_n \bigl( c'' p_n + (M-1) c' q'_n + M c_0 x_{0n} \bigr) \\
\text{s.t.}\ & \sum_{n=1}^{N} s_n x_{0n} \le B_0 \\
& \sum_{n=1}^{N} s_n \bigl( p_n + (M-1) q'_n \bigr) \le M B \\
& p_n + x_{0n} \le 1, \quad n = 1, \dots, N \\
& q'_n + x_{0n} \le 1, \quad n = 1, \dots, N
\end{aligned}
\]
Knapsack problem type structure

Intra-level cache collaboration (cont’d)
Optimal solution of content placement problem has relatively simple structure
Distinguish between two cases
• Mc ≥ (M − 1)c′: more advantageous to store un-replicated content in leaf nodes than in parent node
• Mc ≤ (M − 1)c′: more attractive to store un-replicated content in parent node than in leaf nodes

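The case split can be traced back to the coefficients of the simplified objective. A single un-replicated copy in a leaf is worth c″ per unit of demand, a copy in the parent is worth M c0, and comparing the two gives the condition above. The derivation below is a sketch that reads c′ as the symmetric leaf-to-leaf transfer cost and c″ as M(c + c0) − (M − 1)c′.

```latex
\begin{align*}
c'' \ge M c_0
  &\iff M(c + c_0) - (M-1)c' \ge M c_0 \\
  &\iff M c \ge (M-1)c'.
\end{align*}
```

The reverse inequality gives the second case in the same way.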
Case Mc ≥ (M − 1)c′
[Figure: root node, parent node, and leaf nodes 1, 2, ..., M]
Four popularity ‘tiers’
• Highly popular (red): replicated in all leaves, pn = 1, q′n = 1
• Fairly popular (pink): stored in single leaf, pn = 1
• Mildly popular (yellow): stored in parent node, x0n = 1
• Hardly popular (green): stored in root node only

Case Mc ≤ (M − 1)c′
[Figure: root node, parent node, and leaf nodes 1, 2, ..., M]
Four popularity ‘tiers’
• Highly popular (red): replicated in all leaves, pn = 1, q′n = 1
• Fairly popular (pink): stored in common parent, x0n = 1
• Mildly popular (yellow): stored in single leaf, pn = 1
• Hardly popular (green): stored in root node only

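The tier structure suggests a simple way to compute a placement in the special case without a parent cache (B0 = 0) and with unit-size items: replicate the R most popular items in every leaf, store single copies of the next most popular items in the remaining slots, and search over R. The sketch below does exactly that. The function name and return format are hypothetical, `cpeer` stands for the symmetric leaf-to-leaf cost, and the restriction to this tiered form is an assumption taken from the structure described above, not a general solver for (1)–(4).

```python
def best_symmetric_placement(d, M, K, c0, c, cpeer):
    """Search over the number R of fully replicated items (B0 = 0, unit sizes).

    d     -- demands sorted in decreasing order (d[0] is the most popular item)
    M, K  -- number of leaves and number of items each leaf can store
    Returns (R, L, value): items 1..R are replicated in all leaves,
    items R+1..L are stored as single copies, value is the total cost savings.
    """
    c2 = M * (c + c0) - (M - 1) * cpeer    # savings per unit demand of a first copy
    slots = M * K                          # total cache slots in the cluster
    best = (0, 0, float("-inf"))
    for R in range(K + 1):
        singles = min(slots - M * R, len(d) - R)
        if singles < 0:                    # fewer than R items exist
            break
        L = R + singles
        value = sum(d[:R]) * (c2 + (M - 1) * cpeer) + sum(d[R:L]) * c2
        if value > best[2]:
            best = (R, L, value)
    return best
```

For demands 8, 4, 2, 1 with M = 2, K = 2, c0 = 2, c = 1 and a peer cost of 1, replicating only the most popular item and storing single copies of the next two gives savings of 78, compared with 75 for no replication and 72 for full replication.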
Local-Greedy algorithm
For convenience, assume B0 = 0, sn = 1 for all n = 1, ..., N
If requested item is not stored in local cache, then
• Item is fetched from peer if cached elsewhere in cluster and otherwise from root node
• Value of requested item is compared with ‘marginal’ cache value, i.e., value provided by marginal item in local cache, which may then be evicted and replaced
\[
\text{Value of item } n =
\begin{cases}
c' d_n & \text{if stored elsewhere in cluster} \\
c'' d_n & \text{otherwise}
\end{cases}
\]

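The Local-Greedy rule can be sketched as below. Several details are assumptions made only for this illustration: the class name `LocalGreedyCluster`, the use of each leaf's own request counts as the estimate of dn, and the strict `>` comparison. Here `cpeer` is the symmetric leaf-to-leaf transfer cost and `c2` is M(c + c0) − (M − 1)·cpeer.

```python
from collections import Counter


class LocalGreedyCluster:
    """Cluster of M leaf caches without a parent cache (B0 = 0), unit-size items."""

    def __init__(self, M, K, c0, c, cpeer):
        self.caches = [set() for _ in range(M)]
        self.counts = [Counter() for _ in range(M)]   # per-leaf demand estimates
        self.K = K
        self.cpeer = cpeer
        self.c2 = M * (c + c0) - (M - 1) * cpeer

    def value(self, leaf, item):
        """Value of holding `item` at `leaf` under the rule stated above."""
        elsewhere = any(
            item in cache for k, cache in enumerate(self.caches) if k != leaf
        )
        unit = self.cpeer if elsewhere else self.c2
        return unit * self.counts[leaf][item]

    def request(self, leaf, item):
        """Handle one request and return where it was served from."""
        self.counts[leaf][item] += 1
        cache = self.caches[leaf]
        if item in cache:
            return "local"
        source = "peer" if any(item in c for c in self.caches) else "root"
        if len(cache) < self.K:
            cache.add(item)
        elif cache:
            # Compare against the 'marginal' (least valuable) cached item.
            marginal = min(cache, key=lambda it: self.value(leaf, it))
            if self.value(leaf, item) > self.value(leaf, marginal):
                cache.remove(marginal)
                cache.add(item)
        return source
```

In a two-leaf cluster where both leaves hold the same item, a request for a new item at one leaf evicts the duplicate, because a duplicate is valued at the peer cost while the only copy of an item is valued at the larger c2.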
Local-Greedy algorithm (cont’d)
May get stuck in suboptimal configuration
[Figure: globally optimal configuration vs. local optimum]
• Duplicating red item less valuable than single yellow item
• Duplicating yellow item less valuable than single green item

Local-Greedy algorithm (cont’d)
Performance guarantees (competitive ratios)
• Symmetric popularities: within factor 4/3 from optimal
• Arbitrary popularities: within factor 2 from optimal
Modification: Global-Non-Greedy algorithm
Numerical experiments
• M = 10 leaf nodes, each with cache of size B = 1 TB
• Unit transport cost c0 = 2, c = 1, c′ = 1
• Collection of N = 10,000 content items, with common size of S = 2 GB
• Each leaf node can store K = B/S = 500 content items
• Zipf-Mandelbrot popularity distribution with shape parameter α = 0.8 and shift parameter q = 10
In optimal placement, items 1 through 165 are fully replicated, and single copies of items 166 through 3515 are stored

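The experimental setup can be reproduced in outline as follows. The sketch assumes the usual form of the Zipf-Mandelbrot law, with the popularity of item n proportional to (n + q)^(−α), takes 1 TB as 1000 GB, and picks the requesting leaf uniformly at random; the function names and the seed are hypothetical.

```python
import random


def zipf_mandelbrot(N, alpha, q):
    """Normalized popularities with weight (n + q) ** -alpha for n = 1..N."""
    weights = [(n + q) ** -alpha for n in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_requests(popularity, M, count, seed=0):
    """Draw (leaf, item) pairs: uniform leaf in 0..M-1, item in 1..N by popularity."""
    rng = random.Random(seed)
    items = rng.choices(range(1, len(popularity) + 1), weights=popularity, k=count)
    return [(rng.randrange(M), item) for item in items]


B_GB, S_GB = 1000, 2          # cache size and item size in GB (1 TB taken as 1000 GB)
K = B_GB // S_GB              # items per leaf cache
popularity = zipf_mandelbrot(N=10_000, alpha=0.8, q=10)
```

A request stream generated this way can be fed to any placement algorithm to measure its cost savings against a reference placement.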
Performance of Local-Greedy algorithm
Various leaf nodes receive requests over time, sampled from above-described popularity distribution
[Figure: performance ratio (0 to 1) versus number of requests received (0 to 5000), for initial states with full replication, no replication, and random placement; two panels: static popularity and content aging]
Performance ratio as function of number of requests

Some observations
• Local-Greedy algorithm gets progressively closer to optimum as system receives more requests and replaces items over time
• After only 3000 requests (with collection of 10,000 items in total) Local-Greedy algorithm has come to within 1% of optimum, and stays there
• Performs markedly better than worst-case ratio of 3/4 might suggest
• While algorithm seems to ‘converge’ for all three initial states, scenario with no replication appears to be most favorable one, due to fact that in optimal placement only items 1 through 165 are fully replicated

