
Networks: Theory and Applications

Autumn 2011
Dr. Michael T. Gastner
References:
M. E. J. Newman, Networks: An Introduction, Oxford University Press, Oxford
(2010).
C. D. Meyer, Matrix analysis and applied linear algebra, Society for Industrial and
Applied Mathematics (SIAM), Philadelphia (2000).
T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms
(3rd ed.), MIT Press, Cambridge (2009).
R. K. Ahuja, T. L. Magnanti, J. B. Orlin, Network Flows, Prentice Hall, Upper
Saddle River (1993).
T. Roughgarden, Selfish Routing and the Price of Anarchy, The MIT Press, Cambridge (2005).
1 Introduction
A network is a set of points connected by lines. I will refer to the points as nodes and to
the lines as links.
node
link
Figure 1: A small network composed of 10 nodes and 9 links.
In different fields, nodes and links are called by different names.

            engineering   mathematics and      physics   social sciences
                          computer science
point       node          vertex               site      actor
line        link          edge                 bond      tie
network     network       graph                network   network
Figure 2: The structure of the Internet. The positions of the nodes in the figure are not representative of their real geographic locations. Figure created by the Opte project (www.opte.org).
Example: the Internet (Fig. 2)
node: class C subnet (a group of computers with similar IP addresses, usually administered by a single organisation)
link: routes taken by IP packets, usually optical fibre
Example: the World Wide Web (Fig. 3)
Not to be confused with the Internet, which is a physical network of computers, the World
Wide Web is an information network.
node: web page
link: hyperlink (i.e. the fields to click on to navigate from one page to another)
Note: links are directed (i.e. they can be traversed in one direction, but not necessarily in the opposite direction).
Example: social network (Fig. 4)
node: person
link: friendship, business relationship
Example: scientific collaborations (Fig. 5)
node: scientist
Figure 3: The network of 180 web pages of a large corporation. From M. E. J. Newman and M. Girvan, Physical Review E 69, 026113 (2004).
Figure 4: Friendship network of children at a US school. Node colours represent ethnicity. From James Moody, American Journal of Sociology 107, 679-716 (2001).
Figure 5: A network of scientific collaborations at the Santa Fe Institute. From M. Girvan and M. E. J. Newman, Proc. Natl. Acad. Sci. USA 99, 8271-8276 (2002).
link: shared authorship on a scientific publication
Note: publications can be co-authored by more than two scientists, but we cannot tell this from the above network. This is better represented with two types of nodes, scientists and publications, and with links only between scientists and the papers they co-authored: a bipartite network.
Example: scientific citation network (Fig. 6)
node: scientific publication
link: there is a link from publication A to publication B if A cites B in its bibliography.
Note: citation networks are (almost) acyclic (i.e. all directed links point backward in time). One cannot cite a paper that is not yet published.
Example: mobile phone call network (Fig. 7)
node: mobile phone user
link: call between two users
Example: food web (Fig. 8)
node: species
link: predator-prey relationship
Example: brain (Fig. 9)
node: neurons
Figure 6: Citation network of early DNA articles. Image from http://www.garfield.library.upenn.edu/papers/vladivostok.html
Figure 7: Part of a large network of mobile phone calls. Image from Wang et al., Science 324,
1071-1076 (2009).
Figure 8: Food web of a Caribbean coral reef. Image by Neo Martinez (Pacific Ecoinformatics and Computational Ecology Lab).
Figure 9: Anatomical representation of brain regions and their connection. From Meunier et
al., Frontiers in Neuroinformatics 3, 37 (2009).
Figure 10: A wallchart showing the network formed by major metabolic pathways. Created by
David Nicholson.
link: synchronised activity
Example: metabolic network (Fig. 10)
node: metabolite
link: chemical reaction
Example: urban railroads (Fig. 11)
node: station
link: train connection
Example: road map (Fig. 12)
node: junction
link: street
Typical questions in network analysis:
paths and distances: what is the shortest route between two nodes?
centrality: who is the most influential person in a social network?
community structure: can we identify groups of like-minded individuals in a social network?
Figure 11: Rail service map of London. Image by Transport for London.
Figure 12: The road network near Imperial College.
flow: how can traffic be routed to avoid congestion?
2 Networks represented by their adjacency matrix
2.1 Undirected networks
Definition 2.1:
An undirected, simple network G = (N, L) is an ordered pair of a set of nodes N and a set of links L. The links are subsets of N with exactly two distinct elements.
Note: In a simple network there cannot be multiple links (a multiedge) between two nodes, and no node can be connected to itself (i.e. there are no self-loops).
Figure 13: (a) An undirected, simple network (i.e. a network without multiple links between the same pair of nodes or self-loops), with nodes labelled 1, ..., 6. (b) An example of a network with multiple links and self-loops.
If we allow multiple links, the network is called a multigraph. (In this course, we will
mostly deal with simple networks.)
Let us label the nodes 1, . . . , n. The order does not matter as long as every node label is
unique.
The network can be represented by specifying the number of nodes n and the edge list.
For example in Fig. 13a, n = 6 and the links are (1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (3, 5)
and (4, 5).
Another representation is the adjacency matrix.
Definition 2.2:
The adjacency matrix A of a simple network is the matrix with elements A_{ij} such that

A_{ij} = \begin{cases} 1 & \text{if there is a link between nodes } i \text{ and } j \ (i \text{ and } j \text{ are adjacent}), \\ 0 & \text{otherwise.} \end{cases}

Example:
The adjacency matrix of the network in Fig. 13a is

A = \begin{pmatrix} 0 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

Note:
The diagonal elements A_{ii} are all zero (no self-loops).
A is symmetric (if there is a link between i and j, then there is also a link between j and i).
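The construction of A from an edge list is easy to automate. The following minimal sketch (Python with numpy; not part of the original notes) builds the adjacency matrix of Fig. 13a from its edge list and checks the two properties noted above.

```python
import numpy as np

# Edge list of the undirected simple network in Fig. 13a (nodes labelled 1..6).
n = 6
links = [(1, 2), (1, 5), (1, 6), (2, 3), (3, 4), (3, 5), (4, 5)]

A = np.zeros((n, n), dtype=int)
for i, j in links:
    A[i - 1, j - 1] = 1   # shift to 0-based indices
    A[j - 1, i - 1] = 1   # undirected: store both directions

print(A)
print("symmetric:", np.array_equal(A, A.T))       # True
print("zero diagonal:", not A.diagonal().any())   # True: no self-loops
```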
2.2 Directed networks
A directed network (also called a directed graph or digraph) is a network where the links
only go in one direction.
Formally, in Definition 2.1, the elements of the link set L are now ordered (instead of unordered) pairs of nodes.
Examples: the World Wide Web, food webs, citation networks.
The links can be represented by lines with arrows on them.

Figure 14: A directed network.
Definition 2.3:
The adjacency matrix of a directed network has matrix elements

A_{ij} = \begin{cases} 1 & \text{if there is a link from } j \text{ to } i, \\ 0 & \text{otherwise.} \end{cases}

Note: the direction of the link is counter-intuitive, but this notation will be convenient later on.
Example:
The adjacency matrix of the network in Fig. 14 is

A = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.

Note: A is asymmetric.
2.3 Weighted networks
In some networks, it is useful to assign different weights to links.
Examples:
Traffic in a transportation network.
Frequency of contacts in a social network.
Total energy flow from prey to predator in a food web.
This information can be represented by an adjacency matrix whose entries are not all either 0 or 1.
If weights are non-negative, they can be represented by line thickness.
Example:
The network with the weighted adjacency matrix

A = \begin{pmatrix} 0 & 0 & 0 & 0 & 2 & 0 \\ 2 & 0 & 0.5 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1.5 & 0 & 0.5 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}

looks like Fig. 15.
Sometimes it is useful to consider negative weights.
Example:
In a social network,
positive weight: friendship,
negative weight: animosity.
A special case are signed networks, where all weights are either +1 or -1 (or 0 if there is no link).
Structural balance theory states that signed social networks are stable if and only if either
two friends have the same friends (my friend's friend is my friend), or
my enemy's enemy is my friend.

Figure 15: A weighted network.
A recent study of interactions in a virtual-life game (Szell et al., PNAS 107, 13636 [2010]) with 300,000 participants confirmed that most triads (i.e. sub-networks of three mutually connected players) satisfy these two rules. Triads with exactly two positive links were less likely than in a null model where the total number of +'s and -'s was fixed, but randomly redistributed over the links. The case of three negative links in a triad is more complicated: there were relatively few such triads, but their number was not much smaller than in the null model.
triad signs                  (+,+,+)    (+,+,-)    (+,-,-)    (-,-,-)
structural balance theory    stable     unstable   stable     unstable
N                            26,329     4,428      39,519     8,032
N_rand                       10,608     28,545     30,145     9,009

Table 1: Possible triad configurations in a signed network. N: empirical number of triads in a large virtual-life community. N_rand: expectation value for sign randomisation. Data from Szell et al., PNAS 107, 13636 (2010).
2.4 Cocitation and bibliographic coupling
Cocitation and bibliographic coupling are two different ways of turning a simple unweighted directed network into a weighted undirected network.
Definition 2.4:
The cocitation C_{ij} of two nodes i and j in a directed network is the number of nodes with links pointing to both i and j.
Example: Academic citation network
Figure 16: Papers i and j are cited together by three papers, so C_{ij} = 3.
Cocitation and the adjacency matrix:
From the definition of the adjacency matrix A,

C_{ij} = \sum_{k=1}^n A_{ik} A_{jk},

or, expressed as the cocitation matrix C,

C = AA^T.

Interpretation of cocitation:
In citation networks, a large cocitation is an indicator that two papers deal with related topics.
C is similar to an adjacency matrix, but it will generally have non-zero entries on the diagonal,

C_{ii} = \sum_{k=1}^n A_{ik}^2 = \sum_{k=1}^n A_{ik},

thus C_{ii} is equal to the total number of links pointing to i.
Definition 2.5:
The bibliographic coupling B_{ij} of two nodes i and j is the number of other nodes to which both point.
Example: Academic citation network
Bibliographic coupling and the adjacency matrix:

B_{ij} = \sum_{k=1}^n A_{ki} A_{kj},

or, expressed as the bibliographic coupling matrix B,

B = A^T A.
Interpretation of bibliographic coupling:

Figure 17: Papers i and j cite three of the same papers, so B_{ij} = 3.

Similar to cocitation, a large value B_{ij} indicates that papers i and j are about a similar subject.
Difference:
A strong C_{ij} requires both i and j to be highly cited.
A strong B_{ij} requires both i and j to cite many papers.
In practice B_{ij} works better because the bibliography sizes of papers are more uniform than the numbers of citations received by papers.
B_{ij} is used, for example, by the Science Citation Index in its "Related Records" feature.
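As a quick illustration of the two formulas C = AA^T and B = A^T A, here is a small Python/numpy sketch (not part of the original notes; the toy citation data are made up).

```python
import numpy as np

# Directed citation network on 4 papers: A[i, j] = 1 if j cites i
# (the column-to-row convention of Definition 2.3).
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

C = A @ A.T   # cocitation: C[i, j] = number of papers citing both i and j
B = A.T @ A   # bibliographic coupling: B[i, j] = number of papers cited by both i and j

print(C)      # diagonal C[i, i] = number of citations received by paper i
print(B)      # diagonal B[i, i] = number of references in paper i's bibliography
```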
3 Degree
3.1 Definitions
Definition 3.1:
The degree k_i of a node i in a simple, undirected, unweighted network is the number of links connected to i.

Figure 18: An undirected network. The node in the centre has degree 4.

Remarks:
The degree can be computed from the adjacency matrix, k_i = \sum_{j=1}^n A_{ij}.
The total number m of links in the network satisfies m = \frac{1}{2}\sum_{i=1}^n k_i.
Definition 3.2:
In a directed, unweighted network, the in-degree k^{in}_i of a node i is the number of ingoing links and the out-degree k^{out}_i the number of outgoing links.

Figure 19: A directed network. The node in the centre has in-degree 1 and out-degree 4.

Remarks:
k^{in}_i = \sum_{j=1}^n A_{ij}, \qquad k^{out}_j = \sum_{i=1}^n A_{ij}.
m = \sum_{i=1}^n k^{in}_i = \sum_{j=1}^n k^{out}_j.
3.2 Degree Distributions
Definition 3.3:
In an undirected network, the degree distribution is the sequence p_0, p_1, p_2, ..., where p_k is the fraction of nodes in the network with degree k.
Example:
p_0 = 1/10, p_1 = 3/10, p_2 = 3/10, p_3 = 2/10, p_4 = 0, p_5 = 1/10.
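A degree distribution is easily computed from an adjacency matrix. The following Python/numpy sketch (not part of the notes; the 10-node network is randomly generated, not the unspecified example above) illustrates the definition.

```python
import numpy as np

# Degree distribution of an undirected network from its adjacency matrix.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(10, 10))
A = np.triu(A, 1)          # keep the upper triangle: no self-loops
A = A + A.T                # symmetrise

k = A.sum(axis=1)                  # degrees k_i = sum_j A_ij
p = np.bincount(k) / len(k)        # p_k = fraction of nodes with degree k
for degree, frac in enumerate(p):
    print(f"p_{degree} = {frac:.2f}")
```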
Remark: In a directed network, we can similarly define the in-degree distribution and the out-degree distribution.
Example:
Figure 20: The in- and out-degree distributions of the World Wide Web. From Broder et al., Comput. Netw. 33, 309-320 (2000).
The distributions are often heavy-tailed: there are some nodes (hubs) with very high degree. As a first approximation, the distributions can be fit by power laws. But how to make power-law fits statistically sound is a matter of controversy and current research.
4 Walks, cycles and paths
4.1 Definitions
Here we consider simple, unweighted networks. They may be undirected or directed.
Definition 4.1:
A walk is a sequence of nodes v_1 → v_2 → ... → v_k in which every consecutive pair of nodes in the sequence is connected by a link in the network (i.e. A_{v_{i+1}, v_i} = 1 for i = 1, ..., k - 1).
The length of a walk is the number of links traversed along the walk (i.e. k - 1).
A cycle is a walk that begins and ends at the same node (i.e. v_1 = v_k).
A path is a walk that does not contain any cycles.
Remark:
Links and nodes in a walk and in a cycle can be traversed more than once, but in a path multiple traversals are forbidden.
Example:
Figure 21: A walk of length 6, a cycle of length 3 and a path of length 3.
4.2 A reminder: Jordan normal form
We want to relate walks and cycles to the adjacency matrix. For this purpose (and some
applications later in the course), it will be convenient to transform the adjacency matrix
into Jordan normal form. Here is a brief summary of the properties of the Jordan normal
form. Proofs can be found in most linear algebra textbooks.
Theorem 4.2:
For every complex square matrix M, there exists a non-singular matrix P such that J = P^{-1}MP is upper triangular and block diagonal,

J = \begin{pmatrix} J_1 & 0 & \dots & 0 \\ 0 & J_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_p \end{pmatrix},

where each Jordan block J_i is an upper triangular square matrix of the form

J_i = \begin{pmatrix} \lambda_i & 1 & 0 & \dots & 0 \\ 0 & \lambda_i & 1 & \dots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \dots & 0 & \lambda_i & 1 \\ 0 & \dots & 0 & 0 & \lambda_i \end{pmatrix}.

The diagonal entry λ_i is an eigenvalue of M.
The Jordan normal form J is unique up to the order of the Jordan blocks.
Definition 4.3:
The index of the eigenvalue λ_i, index(λ_i), is the size of the largest Jordan block with diagonal entries λ_i.
The algebraic multiplicity of λ_i, alg mul_M(λ_i), is the number of times λ_i is repeated on the diagonal of J.
The geometric multiplicity of λ_i, geo mul_M(λ_i), is the number of Jordan blocks with λ_i on the diagonal.
The spectral radius ρ(M) of the matrix M is the maximum absolute value of all diagonal entries in J, i.e. ρ(M) = max_i |λ_i|.
Example:

M = \begin{pmatrix} 8 & 1/2 & 5 & 5 \\ 0 & 12 & 0 & 0 \\ 0 & 1/2 & 3 & 5 \\ 0 & 3/2 & 15 & 7 \end{pmatrix}

can be brought into Jordan normal form

J = P^{-1}MP = \begin{pmatrix} 8 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 12 & 1 \\ 0 & 0 & 0 & 12 \end{pmatrix}

with

P = \frac{1}{4}\begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 3 & 1 & 1 & 0 \\ 3 & 1 & 3 & 0 \end{pmatrix}.

index(8) = 1, index(12) = 2,
alg mul_M(8) = alg mul_M(12) = 2,
geo mul_M(8) = 2, geo mul_M(12) = 1,
ρ(M) = 12.
4.3 Relating walks and cycles to the adjacency matrix
Proposition 4.4:
Let us denote by N^{(r)}_{ij} the number of walks of length r from node j to node i. If A is the adjacency matrix, then

N^{(r)}_{ij} = [A^r]_{ij},

i.e. N^{(r)}_{ij} is the (i, j)-th entry of the r-th power of the adjacency matrix.
Proof:
r = 1: There is a walk of length 1 from j to i if and only if there is a (directed) link between these two nodes. ⇒ N^{(1)}_{ij} = A_{ij}.
Induction from r to r + 1: If there are N^{(r)}_{ik} walks of length r from k to i, then the number of walks of length r + 1 from j to i visiting k as the second node is equal to N^{(r)}_{ik} A_{kj}. Summing over k yields the number of all walks,

N^{(r+1)}_{ij} = \sum_{k=1}^n N^{(r)}_{ik} A_{kj} = \sum_{k=1}^n [A^r]_{ik} A_{kj} = [A^{r+1}]_{ij}.  □
Let us denote by C_r the number of all cycles of length r anywhere in the network. Note that C_r counts, for example, the cycles
1 → 2 → 3 → 1,
2 → 3 → 1 → 2 and
1 → 3 → 2 → 1
as separate cycles.
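Before turning to the next proposition, here is a numerical illustration (Python/numpy, not part of the notes): it counts walks via Prop. 4.4 and closed walks via the trace, for the directed network of Fig. 14 whose adjacency matrix was given in Sec. 2.2.

```python
import numpy as np

A = np.array([[0, 0, 0, 0, 1, 0],
              [1, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 1, 0, 1, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0]])

r = 2
Ar = np.linalg.matrix_power(A, r)
print(Ar)                        # entry (i, j): number of walks of length r from j to i
print(np.trace(Ar))              # C_r, the number of cycles of length r

eigenvalues = np.linalg.eigvals(A)
print(sum(eigenvalues**r).real)  # same number via C_r = sum_i lambda_i^r (Prop. 4.5 below)
```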
Proposition 4.5:
Consider an arbitrary (directed or undirected) network with n nodes. Let the (generally complex) eigenvalues of its adjacency matrix A be λ_1, ..., λ_n. (Note: if eigenvalue λ_i has algebraic multiplicity a_i, it appears a_i times in this sequence.) Then the number of cycles of length r is

C_r = \sum_{i=1}^n \lambda_i^r.

Proof:
From Prop. 4.4,

C_r = \sum_{i=1}^n [A^r]_{ii} = \mathrm{Tr}(A^r).   (1)

Viewing A as a complex matrix, we can transform it into Jordan normal form: J = P^{-1}AP. (Footnote: If the network is undirected, A is symmetric so that we can even assume J to be diagonal. But for directed networks the general Jordan normal form is the best we can do.)
Because of the upper triangular form of J, T = J^r is upper triangular for any positive integer r and the diagonal entries are λ_i^r.
T = J^r = \begin{pmatrix} \lambda_1^r & T_{12} & T_{13} & \dots & T_{1n} \\ 0 & \lambda_2^r & T_{23} & \dots & T_{2n} \\ 0 & 0 & \ddots & & \vdots \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & 0 & \lambda_n^r \end{pmatrix}.

Now plug this into Eq. 1,

C_r = \mathrm{Tr}(PJ^rP^{-1}) \overset{(*)}{=} \mathrm{Tr}(P^{-1}PJ^r) = \mathrm{Tr}(J^r) = \sum_{i=1}^n \lambda_i^r.

In step (*) we have used that Tr(M_1 M_2) = Tr(M_2 M_1) for any square matrices M_1, M_2.  □
4.4 Directed acyclic networks
Definition 4.6:
A directed network with no cycles is called acyclic.
Example: scientific citation network
A paper can only cite another paper if it has already been written. ⇒ All directed links point backward in time. (Footnote: Rare exceptions exist, for example if an author publishes two papers simultaneously in the same journal and each paper cites the other. Thus, real citation networks have a small number of short cycles.)

Figure 22: An example of a directed acyclic network.

Proposition 4.7:
Consider a directed network whose nodes are labeled 1, ..., n. Then the following two statements are equivalent.
(A) The network is acyclic.
(B) There exists a sequence t_i ∈ R, i = 1, ..., n, so that t_j > t_k for all links j → k.
Remark: t_i plays the role of the publication date in citation networks.
Proof: (A) ⇒ (B)
There must be at least one node with out-degree 0. To see this, consider the following path across the network.
(i) Start at an arbitrary node.
(ii) If this node has out-degree 0, we are done.
(iii) Otherwise choose one of the directed outgoing links and follow it to a new node. Go back to step (ii).
If we pass through step (ii) more than n times, we must have revisited a node that has already been on the path. But then we have found a cycle, contradicting (A). ⇒ The above algorithm must terminate. ⇒ There is at least one node i_1 with out-degree 0. Assign t_{i_1} = 1.
Now remove i_1 and all of the links attached to it from the network. The remaining network of n - 1 nodes must again have one node i_2 with no outgoing links. Set t_{i_2} = 2. Remove i_2 from the network and repeat this procedure to assign t_{i_3} = 3, ..., t_{i_n} = n. The sequence t_i satisfies (B).
Note: t_i is not unique. For example, if there is more than one node without outgoing links, we can choose arbitrarily which one we remove next.
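The removal procedure in this proof is easy to turn into code. Below is a minimal Python/numpy sketch (not part of the notes); it assumes the column-to-row convention of Definition 2.3, so the out-degree of node j is the j-th column sum.

```python
import numpy as np

def acyclic_ordering(A):
    """Assign t-values as in the proof of Prop. 4.7; raise an error if a cycle exists."""
    A = A.copy().astype(bool)
    n = A.shape[0]
    remaining = set(range(n))
    t = {}
    for step in range(1, n + 1):
        # nodes with out-degree 0: their column of A is all zero
        sinks = [j for j in remaining if not A[:, j].any()]
        if not sinks:
            raise ValueError("network is not acyclic")
        j = sinks[0]
        t[j] = step
        A[j, :] = False        # remove the links pointing into j
        remaining.discard(j)
    return t

# Example: links 2 -> 1 and 3 -> 2 (A[i, j] = 1 for a link j -> i).
A_example = np.array([[0, 1, 0],
                      [0, 0, 1],
                      [0, 0, 0]])
print(acyclic_ordering(A_example))   # {0: 1, 1: 2, 2: 3}
```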
Proof: (B) ⇒ (A)
Suppose we found a cycle of nodes n_1 → n_2 → ... → n_i → n_1. From (B) and the first i - 1 steps in the cycle, we know that t_{n_1} > t_{n_2} > ... > t_{n_i}. The last step in the cycle, n_i → n_1, demands t_{n_i} > t_{n_1}, in contradiction to the previous inequality.  □
Proposition 4.8:
Consider a network with n nodes. The following three statements are equivalent.
(A) The network is acyclic.
(B) The adjacency matrix A satisfies A^n = 0. (This implies that A is nilpotent.)
(C) All (complex) eigenvalues of A are zero.
Proof: (A) ⇒ (B)
Use the algorithm developed in the proof of Prop. 4.7 to find a sequence t_i ∈ {1, ..., n} so that t_j > t_k for all links j → k.
Define the permutation σ so that σ(i) = t_i and the n × n permutation matrix

P = \begin{pmatrix} e_{\sigma(1)} \\ \vdots \\ e_{\sigma(n)} \end{pmatrix},

where e_i = (0, ..., 0, 1, 0, ..., 0) with the 1 in the i-th position.
⇒ P^{-1}AP is strictly upper triangular (i.e. it has only zeros on the diagonal),

P^{-1}AP = \begin{pmatrix} 0 & x_{12} & \dots & x_{1n} \\ 0 & 0 & \dots & x_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \dots & 0 \end{pmatrix}.

⇒ (P^{-1}AP)^n = 0 ⇒ P^{-1}A^nP = 0 ⇒ A^n = 0.
Proof: (B) ⇒ (C)
Let λ be an eigenvalue of A with eigenvector v,
λv = Av ⇒ λ^n v = A^n v = 0 ⇒ λ = 0.
Proof: (C) ⇒ (A)
This follows from Prop. 4.5.  □
5 Components
Definition 5.1:
An undirected network is connected if there is a path between every pair of nodes.
An undirected network that is not connected can be divided into components, defined as maximal connected subsets.
Figure 23: An undirected network with three components.
In directed networks the situation is more complicated. If there is a path from node i to
j, there may not be a path from j to i.
Weakly connected components: these are the components in the network if all directed
links are replaced by undirected links.
Strongly connected components: two nodes i and j belong to the same strongly connected
component if there are directed paths from i to j and from j to i.
Figure 24: A directed network with two weakly and four strongly (shaded) connected compo-
nents.
Example:
Directed acyclic networks have no strongly connected component with more than one
node.
Definition 5.2:
The out-component of a node i is the set of all nodes reachable from node i via directed paths, including i itself.
The in-component of i is the set of all nodes from which i can be reached via directed paths, including i itself.
Figure 25: The in- and out-component of a node i in a directed network.
Remark: If node j is in both the in- and out-component of i, then i and j are in the same
strongly connected component.
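In- and out-components can be read off from a reachability matrix. The following Python/numpy sketch (not part of the notes; the three-node network is a made-up example) computes them with the A_{ij} convention of Definition 2.3.

```python
import numpy as np

def reach(A):
    """R[i, j] is True iff there is a directed path from node j to node i (or i == j)."""
    n = A.shape[0]
    M = np.eye(n, dtype=int) + A             # allow staying put or moving one step
    R = np.linalg.matrix_power(M, n - 1)     # any path has length <= n - 1
    return R > 0

A = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0]])   # toy network: 1 -> 2, 1 -> 3, 2 -> 3
R = reach(A)
i = 0
print("out-component of node 1:", np.where(R[:, i])[0] + 1)   # nodes reachable from 1
print("in-component of node 1:",  np.where(R[i, :])[0] + 1)   # nodes that can reach 1
```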
The component structure of directed networks is sometimes visualised in the form of a bow-tie diagram. Below is the diagram for the World Wide Web.
Figure 26: From Broder et al., Comput. Netw. 33, 309-320 (2000).
6 Cycles in bipartite and signed networks
6.1 Bipartite networks
Definition 6.1:
An undirected network is called bipartite if the nodes can be divided into two disjoint sets N_1, N_2 so that every link connects one node in N_1 to one node in N_2.

Figure 27: A small bipartite network. The nodes 1, ..., 5 form the set N_1 and the nodes A, B, C the set N_2.
Examples:

network                              N_1              N_2
scientific co-authorship             author           co-authored publication
board of directors                   director         board of a company
recommender systems (e.g. Amazon)    customer         people who bought this book, movie etc.
public transport                     station, stop    train, tram, bus route
film actors (Kevin Bacon game)       actor            cast of a film
Theorem 6.2:
The following two statements are equivalent:
(A) A network is bipartite.
(B) The length of every cycle is an even number.
Proof: (A) ⇒ (B)
Consider an arbitrary cycle v_1 → v_2 → ... → v_k → v_1. Because the network is bipartite, v_i and v_{i+1} must be in different sets.
Without loss of generality, assume v_1 ∈ N_1. (*)
Then v_3, v_5, v_7, ... ∈ N_1 and v_2, v_4, v_6, ... ∈ N_2. If k were odd, then v_k ∈ N_1 and, because v_1 is adjacent to v_k, v_1 ∈ N_2, in contradiction with (*).
⇒ The cycle length k is even.
Proof: (B) ⇒ (A)
Let us assume that the network is connected. Choose a node v and define
X = {node x | the shortest path from v to x has even length},
Y = {node y | the shortest path from v to y has odd length}.
We will show that X and Y play the role of N_1 and N_2 in Def. 6.1.
Let x_1, x_2 be in X and suppose they are adjacent. v is not adjacent to x_1; otherwise the shortest path from v to x_1 would have length 1 and thus would not be even. Therefore v ≠ x_2. Repeating the same argument with the indices 1 and 2 interchanged, we also know v ≠ x_1.
Let P_1 : v → v_1 → ... → v_{2k} be a shortest path from v to v_{2k} = x_1 and let P_2 : v → w_1 → ... → w_{2l} be a shortest path from v to w_{2l} = x_2. Note that both P_1 and P_2 are of even length.
Then the cycle v → v_1 → ... → x_1 → x_2 → ... → w_1 → v has odd length, in contradiction to (B). Hence no two nodes in X are adjacent, and the same argument shows that no two nodes in Y are adjacent, so every link connects a node in X to a node in Y.
If the network is not connected, we can apply the above argument to every component. Because a network is bipartite if and only if each component is bipartite, the proof is finished.  □
Definition 6.3:
The incidence matrix B of a bipartite network is a |N_2| × |N_1| matrix with entries

B_{ij} = \begin{cases} 1 & \text{if node } j \in N_1 \text{ is linked to } i \in N_2, \\ 0 & \text{otherwise.} \end{cases}

Example: In Fig. 27,

B = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 1 \end{pmatrix}.
Although a bipartite network represents the complete information, it is sometimes more convenient to eliminate either N_1 or N_2 and only work with links between the same type of nodes.
Example: In the Kevin Bacon game, we try to find the degree of separation (i.e. the minimum number of links, a.k.a. the Bacon number) between Kevin Bacon and some other actor. For example, the Bacon number of Clint Eastwood is 2, because Eastwood played with Glenn Morshower in Blood Work (2002) and Morshower with Bacon in The River Wild (1994). But to determine the Bacon number, it is enough to know that there is a connection Eastwood-Morshower and Morshower-Bacon. The names of the movies do not matter. This motivates the next definition.
Definition 6.4:
The one-mode projection of a bipartite network on the set N_1 is the weighted network with node set N_1 whose adjacency matrix A has elements

A_{ij} = \begin{cases} \sum_{k=1}^{|N_2|} B_{ki} B_{kj} & \text{if } i \neq j, \\ 0 & \text{otherwise.} \end{cases}

(Footnote: There is a similar game, called the Erdős number, for mathematicians. Here mathematicians are linked if they have co-authored a paper. The Erdős number is the distance from Paul Erdős (1913-1996), a famous Hungarian mathematician, in the one-mode projection. For example, my Erdős number is 4 (to the best of my knowledge). We will encounter the work of Paul Erdős in random network theory later in this course.)
Remarks:
If we define D_1 to be the diagonal matrix containing the degrees of the nodes in N_1,

D_1 = \begin{pmatrix} k_1 & 0 & 0 & \dots \\ 0 & k_2 & 0 & \dots \\ 0 & 0 & k_3 & \dots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix},

then A = B^T B - D_1.
Similarly we can define the one-mode projection on N_2 (instead of N_1). If D_2 contains the degrees in N_2,

D_2 = \begin{pmatrix} k_A & 0 & 0 & \dots \\ 0 & k_B & 0 & \dots \\ 0 & 0 & k_C & \dots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix},

then A = B B^T - D_2.
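The two projection formulas are one-liners in code. The sketch below (Python/numpy, not part of the notes) applies them to the incidence matrix of Fig. 27 and should reproduce the link weights of Fig. 28.

```python
import numpy as np

# Incidence matrix of Fig. 27: rows = N_2 (A, B, C), columns = N_1 (1..5).
B = np.array([[1, 0, 0, 1, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 0, 1]])

D1 = np.diag(B.sum(axis=0))     # degrees of the nodes in N_1
A1 = B.T @ B - D1               # one-mode projection on N_1 (Fig. 28a)

D2 = np.diag(B.sum(axis=1))     # degrees of the nodes in N_2
A2 = B @ B.T - D2               # one-mode projection on N_2 (Fig. 28b)

print(A1)
print(A2)
```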
Example:
Figure 28: (a) One-mode projection of the bipartite network in Fig. 27 on N_1 (link weights 1, 1, 1, 2, 1, 1). (b) One-mode projection on N_2 (link weights 1 and 2).
6.2 Structural balance in signed networks
Recall from Sec. 2.3 that a signed network is a simple weighted network whose weights are all equal to either +1 or -1. In this section, we consider only undirected networks.
In social networks:
+1: friendship,
-1: animosity.
Definition 6.5:
An undirected signed network whose nodes can be partitioned into two (possibly empty) sets N_1 and N_2 so that
each link v-w with v, w ∈ N_1 or v, w ∈ N_2 has weight +1,
each link v-w with v ∈ N_1, w ∈ N_2 has weight -1,
is called structurally balanced.

Figure 29: A small structurally balanced network.
Theorem 6.6:
The following statements are equivalent.
(A) A signed network is balanced.
(B) The product of the signs around each cycle is positive.
Remark: (B) is a generalisation of the two rules:
my friend's friend is my friend,
my enemy's enemy is my friend.
See Table 1 for balanced (stable) and unbalanced (unstable) triads.
Proof: (A) ⇒ (B)
Consider an arbitrary cycle v_1 - v_2 - ... - v_k - v_1. Every time two consecutive nodes are not in the same set, the sign changes. Because the first and last node are identical, namely v_1, the sign must change an even number of times. Otherwise v_1 would be simultaneously in sets N_1 and N_2, which is impossible because they partition the node set and thus N_1 ∩ N_2 = ∅.
Proof: (B) ⇒ (A)
Let us assume that the network is connected. We will assign the nodes to either N_1 or N_2 according to the following algorithm (see the sketch after this list):
1. Initially N_1 = N_2 = ∅. Assign a variable p(v) = -1 to every node v.
2. Choose a node u and assign it to set N_1.
3. If all nodes have already been assigned to either N_1 or N_2, then terminate.
4. Choose a node v that has not yet been assigned to either N_1 or N_2, but one of whose neighbours w has been assigned to one of the two sets. Change p(v) to w and
   if w ∈ N_1 and the link v-w has weight +1, then assign v to N_1,
   otherwise if w ∈ N_2 and the link v-w has weight +1, then assign v to N_2,
   otherwise if w ∈ N_1 and the link v-w has weight -1, then assign v to N_2,
   otherwise assign v to N_1.
5. Go to step 3.
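A minimal Python sketch of this assignment algorithm (not part of the notes; the three-node network at the end is a made-up example). It performs the assignment of step 4 in breadth-first order and returns the two sets as labels +1 (N_1) and -1 (N_2); on an unbalanced network the returned partition would simply violate conditions (a) and (b) below.

```python
from collections import deque

def partition(weights, n):
    """weights: dict {(i, j): +1 or -1}, nodes labelled 0..n-1."""
    neighbours = {v: [] for v in range(n)}
    for (i, j), w in weights.items():
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))
    side = {0: 1}                 # node 0 plays the role of u, assigned to N_1
    queue = deque([0])
    while queue:
        w_node = queue.popleft()
        for v, sign in neighbours[w_node]:
            if v not in side:
                # same set for a +1 link, opposite set for a -1 link
                side[v] = side[w_node] if sign == +1 else -side[w_node]
                queue.append(v)
    return side                   # side[v] = +1 for N_1, -1 for N_2

print(partition({(0, 1): +1, (0, 2): -1, (1, 2): -1}, 3))
```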
We must show that the algorithm assigns nodes to N_1 and N_2 so that
(a) all nodes linked to a node v by a link with weight +1 are in the same set as v,
(b) all nodes linked to v by a link with weight -1 are in the opposite set.
First case: v ∈ N_1, w adjacent to v and the link v-w has weight +1.
Assume w ∈ N_2. (*)
Let P_1 be the path v → [p(v) = v_1] → [p(v_1) = v_2] → ... → [p(v_i) = u], where u was the first node assigned to N_1 in the algorithm above. Let P_2 be the path w → [p(w) = w_1] → [p(w_1) = w_2] → ... → [p(w_j) = u]. Consider the cycle

C: v → v_1 → ... → v_i → u → w_j → ... → w_1 → w → v,   (2)

which follows P_1 from v to u and then P_2 in the opposite direction from u to w.
On our way from v to u, we must encounter an even number of links with weight -1; otherwise v would not be in N_1. Similarly, there is an odd number of -1s between u and w because of assumption (*). This implies that there is an odd number of -1s along C, contradicting (B). Thus w must be in N_1.
Second case: v ∈ N_1, w adjacent to v and the link v-w has weight -1.
Assume w ∈ N_1. (*)
Define P_1 and P_2 as in the first case by tracing back our paths from v and w to u. Form the cycle C as in Eq. 2. This time there is an even number of -1s along P_1 and, because of (*), also along P_2, so that C has an odd number of -1s, in contradiction to (B). Thus (*) must be false.
The remaining two cases, namely
v ∈ N_2, w adjacent to v and the link v-w has weight +1,
v ∈ N_2, w adjacent to v and the link v-w has weight -1,
can similarly be shown to imply w ∈ N_2 and w ∈ N_1, respectively.
If the network is not connected, we can apply the above argument to every component. Because a network is structurally balanced if and only if each component is structurally balanced, the proof is finished.  □
7 Models of spread in networks
In this chapter, we only consider simple undirected networks. The generalisation to directed networks is not straightforward.
7.1 Diffusion
Assume
there is some commodity distributed on the nodes,
there is an amount ψ_i on node i,
the commodity flows along the links,
the flow on j → i is at a rate C(ψ_j - ψ_i), where C is the so-called diffusion constant.

⇒ dψ_i/dt = C \sum_j A_{ij}(\psi_j - \psi_i),   (3)

where A is the adjacency matrix. We can rewrite Eq. 3 as

dψ_i/dt = C \sum_j A_{ij}\psi_j - C\psi_i \sum_j A_{ij} = C \sum_j A_{ij}\psi_j - C\psi_i k_i = C \sum_j (A_{ij} - \delta_{ij}k_i)\psi_j,   (4)

where k_i is the degree of i and δ_{ij} is the Kronecker delta. In matrix form, Eq. 4 becomes

dψ/dt = C(A - D)ψ,   (5)

where

D = \begin{pmatrix} k_1 & 0 & 0 & \dots \\ 0 & k_2 & 0 & \dots \\ 0 & 0 & k_3 & \dots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}.
Definition 7.1:
The matrix L = D - A is called the graph Laplacian.
The diffusion equation, Eq. 5, can be written as

dψ/dt = -CLψ.   (6)

Remark: In continuous space, the diffusion (or heat) equation is ∂ψ/∂t = C∇²ψ. So L plays the same role as the ordinary Laplacian ∇², apart from the minus sign in Eq. 6. We could absorb the minus sign in the definition of L, but unfortunately this is not standard practice.
Because L is symmetric, we can find an orthonormal basis of eigenvectors v_1, ..., v_n. We can express any solution of Eq. 6 as

ψ(t) = \sum_i a_i(t) v_i,

where the a_i(t) are time-dependent coefficients. Let λ_i be the eigenvalue corresponding to the eigenvector v_i. Then it follows from Eq. 6 that

\sum_i \frac{da_i}{dt} v_i = -C \sum_i \lambda_i a_i v_i.   (7)

Because the v_i form a basis, the coefficients on both sides of Eq. 7 must be equal, thus

\frac{da_i}{dt} = -C\lambda_i a_i.

The solution is a_i(t) = a_i(0) exp(-Cλ_i t), so

ψ(t) = \sum_i a_i(0)\exp(-C\lambda_i t)\, v_i.   (8)

In summary, given the initial conditions and the eigenvalues and eigenvectors of L, we can calculate the diffusion dynamics on a network.
7.2 Eigenvalues of the graph Laplacian
Proposition 7.2:
All eigenvalues of the graph Laplacian are non-negative.
Proof:
For every link in the network, arbitrarily designate one end of the link to be "end 1" and the other "end 2". If there are m links in total, define the m × n node-link incidence matrix B with elements

B_{ij} = \begin{cases} +1 & \text{if end 1 of link } i \text{ is attached to node } j, \\ -1 & \text{if end 2 of link } i \text{ is attached to node } j, \\ 0 & \text{otherwise.} \end{cases}

Consider \sum_k B_{ki} B_{kj}.
Case i ≠ j:
B_{ki} B_{kj} = -1 if link k connects nodes i and j, and 0 otherwise. In a simple network, there is at most one link between two nodes, so

\sum_k B_{ki} B_{kj} = \begin{cases} -1 & \text{if } i \text{ and } j \text{ are connected}, \\ 0 & \text{otherwise.} \end{cases}   (9)

Case i = j:
B_{ki}^2 = 1 if link k is connected to node i, and 0 otherwise.

⇒ \sum_k B_{ki}^2 = k_i.   (10)

From Eq. 9 and 10,

B^T B = L.   (11)

Let v_i be a normalised eigenvector of L with eigenvalue λ_i. Then

v_i^T B^T B v_i = v_i^T L v_i = \lambda_i v_i^T v_i = \lambda_i.

Because v_i^T B^T B v_i = |Bv_i|^2 ≥ 0, λ_i cannot be negative.  □
Proposition 7.3:
The graph Laplacian has at least one eigenvalue 0.
Proof:
Multiply L with the vector 1 = (1, 1, ..., 1)^T. The i-th element of the product is

\sum_j L_{ij} \cdot 1 = \sum_j (\delta_{ij}k_i - A_{ij}) = k_i - \sum_j A_{ij} = k_i - k_i = 0.

In matrix notation, L·1 = 0. ⇒ 1 is an eigenvector with eigenvalue 0.  □
Proposition 7.4:
The multiplicity of the eigenvalue 0 equals the number of connected components in the network.
Proof:
Assume the network consists of c components of sizes n_1, ..., n_c and the nodes are labeled so that the nodes
1, ..., n_1 belong to the first component,
n_1 + 1, ..., n_1 + n_2 to the second component, etc.
Then L is block diagonal,

L = \begin{pmatrix} L_1 & 0 & \dots \\ 0 & L_2 & \dots \\ \vdots & \vdots & \ddots \end{pmatrix},

and the blocks are the Laplacians of the individual components. We can use the same argument as in Prop. 7.3 to show that

v_1 = (\underbrace{1, \dots, 1}_{n_1 \text{ ones}}, 0, \dots, 0)^T, \quad v_2 = (\underbrace{0, \dots, 0}_{n_1 \text{ zeros}}, \underbrace{1, \dots, 1}_{n_2 \text{ ones}}, 0, \dots, 0)^T, \quad \dots

are c linearly independent eigenvectors of L with eigenvalue 0.
We now have to prove that all vectors u satisfying Lu = 0 are linear combinations of v_1, ..., v_c.

Lu = 0 \overset{\text{Eq. 11}}{\Longrightarrow} u^T B^T B u = 0 \Longrightarrow |Bu| = 0 \Longrightarrow Bu = 0.

From the definition of B, Bu = 0 implies that u_i = u_j for every link i-j. By induction on the path length, we can show that u_i is constant for all nodes i on a path and hence for all i in the same component. The vector u must then be of the form

u = (\underbrace{a_1, \dots, a_1}_{n_1 \text{ times}}, \underbrace{a_2, \dots, a_2}_{n_2 \text{ times}}, \dots, \underbrace{a_c, \dots, a_c}_{n_c \text{ times}})^T = a_1 v_1 + \dots + a_c v_c.  □
Remark: In Eq. 8, λ_i ≥ 0 implies that diffusion tends to a stationary solution as t → ∞. In this limit, the only non-zero terms in the sum come from λ_i = 0, so that lim_{t→∞} ψ_j(t) is equal for all nodes j in the same component (i.e. in each component, the commodity is equally spread over all nodes).
7.3 Random walks: stationary distribution
Definition 7.5:
A random walk starting from a specified initial node n_1 is a sequence of nodes (n_1, n_2, ...) where the node n_{i+1} is chosen uniformly at random among the nodes linked to n_i.
Proposition 7.6:
Assume the network is connected and has m links, and let p_i(t) be the probability that the walk is at node i at the t-th step. There is a unique stationary distribution satisfying p_i(t) = p_i(t - 1) for all i and t, namely p_i = k_i/(2m).
Proof:
From Def. 7.5,

p_i(t) = \sum_{j=1}^n \frac{A_{ij}}{k_j} p_j(t - 1),   (12)

or in matrix form p(t) = AD^{-1}p(t - 1).
We are looking for a stationary distribution, i.e. p(t - 1) = p(t) = p, so that p = AD^{-1}p, or

(I - AD^{-1})p = (D - A)D^{-1}p = LD^{-1}p = 0.   (13)

Equation 13 implies that D^{-1}p is an eigenvector of L with eigenvalue 0. From the proof of Prop. 7.4 we know that for a connected network the only such eigenvectors are a·1 = a·(1, ..., 1)^T, where a is a constant.
⇒ p = aD1 ⇒ p_i = a k_i.
Because \sum_i p_i = 1 and \sum_i k_i = 2m, a = 1/(2m).  □
Remark: The stationary solution of the random walk is not equal to the flat stationary solution of diffusion.
The random walk spends time on nodes ∝ k_i because the higher the degree, the more ways there are of reaching the node.
Diffusion has a flat stationary distribution because particles will leave nodes with higher degree more quickly.
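The stationary distribution of Prop. 7.6 can be checked by iterating Eq. 12. A Python/numpy sketch (not part of the notes; the four-node network is a made-up example):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
k = A.sum(axis=1)
m = k.sum() / 2
T = A / k                              # column j is A[:, j] / k_j, i.e. A D^{-1}

p = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(1000):
    p = T @ p                          # Eq. 12

print(p)            # converges here (the network is not bipartite)
print(k / (2 * m))  # stationary distribution k_i / (2m)
```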
7.4 Random walks: mean first passage time
We now want to calculate the mean first passage time from a node u to a node v, i.e. the average time needed for a random walk starting at u to reach v. The next definition will be useful.
Definition 7.7:
Let p be a vector in R^n. Define p^{(v)} to be the (n - 1)-dimensional vector where the v-th entry is removed,

p^{(v)} = (p_1, \dots, p_{v-1}, p_{v+1}, \dots, p_n)^T.

Let M be an n × n matrix and 1 ≤ v ≤ n. Define M^{(v)} to be the (n - 1) × (n - 1) matrix obtained from M by removing the v-th column and the v-th row. In the special case where M is the graph Laplacian L, L^{(v)} is called the v-th reduced Laplacian.
Let N be an (n - 1) × (n - 1) matrix. Define N^{(v+)} to be the n × n matrix that is equal to N except that a v-th row and a v-th column of zeros are inserted.
To calculate the mean first passage time we also need the following proposition.
Proposition 7.8:
Let M be a symmetric matrix. The series \sum_{t=1}^\infty t(M^{t-1} - M^t) converges if and only if all eigenvalues satisfy |λ_i| < 1. In that case,

\sum_{t=1}^\infty t(M^{t-1} - M^t) = (I - M)^{-1}.

Proof:
Because M is symmetric, there exists an orthogonal matrix Q so that

QMQ^{-1} = \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \dots & 0 & \lambda_n \end{pmatrix}.

⇒ Q\left[\sum_{t=1}^\infty t(M^{t-1} - M^t)\right]Q^{-1} = \sum_{t=1}^\infty t\left[(QMQ^{-1})^{t-1} - (QMQ^{-1})^t\right] = \mathrm{diag}\!\left(\sum_t t(\lambda_1^{t-1} - \lambda_1^t), \dots, \sum_t t(\lambda_n^{t-1} - \lambda_n^t)\right).   (14)

Let us have a closer look at the non-zero entries,

\sum_{t=1}^\infty t(\lambda_i^{t-1} - \lambda_i^t) = \lim_{N\to\infty}\left(\lambda_i^0 - \lambda_i^1 + 2\lambda_i^1 - 2\lambda_i^2 + 3\lambda_i^2 - 3\lambda_i^3 + \dots + N\lambda_i^{N-1} - N\lambda_i^N\right) = \underbrace{\sum_{t=0}^\infty \lambda_i^t}_{\text{geometric series}} - \underbrace{\lim_{N\to\infty} N\lambda_i^N}_{0 \text{ iff } |\lambda_i| < 1} = \frac{1}{1 - \lambda_i}.   (15)

Inserting Eq. 15 in Eq. 14,

Q\left[\sum_{t=1}^\infty t(M^{t-1} - M^t)\right]Q^{-1} = \mathrm{diag}\!\left((1 - \lambda_1)^{-1}, \dots, (1 - \lambda_n)^{-1}\right) = (I - QMQ^{-1})^{-1} = [Q(I - M)Q^{-1}]^{-1} = Q(I - M)^{-1}Q^{-1}.  □
Proposition 7.9:
If all eigenvalues λ_i of M = A^{(v)}(D^{(v)})^{-1} satisfy |λ_i| < 1, then the mean first passage time for a random walk from node u to v is given by

\tau = \sum_{i=1}^n k_i \Lambda_{iu},

where Λ = [(L^{(v)})^{-1}]^{(v+)}.
36
Proof:
We change the rules of the random walk slightly to make it absorbing: as soon as the walk reaches v, it cannot leave again. That is, we set A_{iv} = 0 for all i, rendering A asymmetric.
Define p_v(t) as the probability that the walk has reached v within the first t steps. The probability that the first passage time is equal to t is p_v(t) - p_v(t - 1), and the mean is

\tau = \sum_{t=1}^\infty t\,[p_v(t) - p_v(t - 1)].   (18)

(Footnote: The sum in Eq. 18 is not absolutely convergent, so that we cannot change the order of the individual terms.)
Consider Eq. 12 for i ≠ v: because A_{iv} = 0,

p_i(t) = \sum_{j=1}^n \frac{A_{ij}}{k_j} p_j(t - 1) = \sum_{j \neq v} \frac{A_{ij}}{k_j} p_j(t - 1).

As long as we concentrate on i ≠ v, we can simply remove the v-th column and row from the vectors and matrices,

p^{(v)}(t) = \underbrace{A^{(v)}(D^{(v)})^{-1}}_{M}\, p^{(v)}(t - 1).   (19)

By iterating Eq. 19, we obtain

p^{(v)}(t) = M^t p^{(v)}(0).   (20)

Next we observe that

p_v(t) = 1 - \sum_{i \neq v} p_i(t) = 1 - \mathbf{1}^T p^{(v)}(t),   (21)

where 1 = (1, 1, 1, ...)^T.

\tau \overset{\text{Eq. 18, 21}}{=} \sum_{t=1}^\infty t\,\mathbf{1}^T[p^{(v)}(t - 1) - p^{(v)}(t)] \overset{\text{Eq. 20}}{=} \mathbf{1}^T\left[\sum_{t=1}^\infty t(M^{t-1} - M^t)\right] p^{(v)}(0) \overset{\text{Prop. 7.8}}{=} \mathbf{1}^T (I - M)^{-1} p^{(v)}(0).   (22)

From the definition of M,

(I - M)^{-1} = [I - A^{(v)}(D^{(v)})^{-1}]^{-1} = D^{(v)}[D^{(v)} - A^{(v)}]^{-1} = D^{(v)}(L^{(v)})^{-1}.   (23)

Inserting Eq. 23 in Eq. 22,

\tau = \mathbf{1}^T D^{(v)}(L^{(v)})^{-1} p^{(v)}(0).

The only non-zero entry in p^{(v)}(0) is p^{(v)}_u(0) = 1 because the random walk is initially at u. Furthermore, the only non-zero entries in D^{(v)} are the degrees k_i:

\tau = \sum_{i=1}^n k_i \left\{[(L^{(v)})^{-1}]^{(v+)}\right\}_{iu}.  □
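Prop. 7.9 translates directly into a few lines of linear algebra. The sketch below (Python/numpy, not part of the notes; the four-node network is a made-up example) computes the mean first passage time via the reduced Laplacian.

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
k = A.sum(axis=1)
L = np.diag(k) - A

u, v = 0, 3                                  # start at node 1, target node 4 (0-based indices)
keep = [i for i in range(len(A)) if i != v]
Lv = L[np.ix_(keep, keep)]                   # v-th reduced Laplacian L^(v)
Lv_inv = np.linalg.inv(Lv)

# tau = sum_i k_i * Lambda_iu, where Lambda is (L^(v))^{-1} padded with zeros at row/column v
tau = sum(k[i] * Lv_inv[keep.index(i), keep.index(u)] for i in keep)
print(tau)
```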
8 The leading eigenvalue of the adjacency matrix
8.1 Statement of the Perron-Frobenius theorem
The results in this section apply to directed networks (with undirected networks as a special case).
Definition 8.1:
An n × n matrix M is reducible if there exists some permutation matrix P so that P^T M P is block upper triangular,

P^T M P = \begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix},   (24)

where X and Z are square matrices. Otherwise M is called irreducible.
Proposition 8.2:
Let A be the adjacency matrix of a directed network. (The network may be weighted with link weights ≥ 0.) Then the following three statements are equivalent:
(A) A is irreducible.
(B) The directed network is strongly connected.
(C) For each i and j there exists a k so that (A^k)_{ij} > 0.
Proof: (A) ⇒ (B)
Suppose A is irreducible, but that the network is not strongly connected.
⇒ There exist nodes i and j so that there is no directed path from i to j. (*)
Define S_1 = {node k | there is a path from i to k} and let S_2 be its complement. For any node p in S_1 and q in S_2, there is no path from p to q; otherwise q would have been in S_1.
Define r = card(S_1). Because of (*), r ≠ 0 and r ≠ n, since i ∈ S_1 and j ∈ S_2. Without loss of generality assume that the nodes in S_1 are labeled 1, ..., r and thus r + 1, ..., n are in S_2. (Footnote: We can make this assumption because we can otherwise apply a permutation transformation Ã = P^T A P which relabels the nodes accordingly in the new adjacency matrix Ã.)
There is no link from k to l for any k = 1, ..., r and l = r + 1, ..., n. ⇒ A_{lk} = 0 for all l = r + 1, ..., n and k = 1, ..., r, that is, A has the block upper triangular form of the right-hand side of Eq. 24.
This contradicts that A is irreducible and, hence, the network must be strongly connected.
Proof: (B) ⇒ (C)
This follows from Prop. 4.4.
Proof: (C) ⇒ (A)
We will prove the contrapositive version. Suppose A is reducible and, without loss of generality, upper block triangular as on the right-hand side of Eq. 24.
Generally, if two upper block triangular matrices whose blocks have identical dimensions are multiplied, the result is another upper block triangular matrix with the same block dimensions.
That is, if

M_1 = \begin{pmatrix} X_1 & Y_1 \\ 0 & Z_1 \end{pmatrix}, \quad M_2 = \begin{pmatrix} X_2 & Y_2 \\ 0 & Z_2 \end{pmatrix}

with r × r matrices X_1, X_2 and (n - r) × (n - r) matrices Z_1, Z_2, then

M_1 M_2 = \begin{pmatrix} X_1 X_2 & X_1 Y_2 + Y_1 Z_2 \\ 0 & Z_1 Z_2 \end{pmatrix}.

If M_1 = M_2 = A, then we know that A^2 has the same block dimensions. Applying this argument repeatedly, A^k also has the same form. In particular, it keeps an (n - r) × r matrix 0 as its lower left block for any k. Hence, (C) does not hold.  □
Notation:
A matrix M or vector v is positive, denoted by M > 0 or v > 0, if all its elements are positive.
A matrix M or vector v is non-negative, denoted by M ≥ 0 or v ≥ 0, if it does not contain any negative elements.
Let M be an n × n matrix. An eigenvalue λ_i that maximises max_{j=1,...,n} |λ_j| is called a leading eigenvalue of M. In other words, λ_i is a leading eigenvalue if and only if its absolute value is equal to the spectral radius of M, i.e. |λ_i| = ρ(M). (See Def. 4.3 for the definition of the spectral radius.)
The 1-norm of an n-dimensional vector v is defined as ||v||_1 = \sum_{i=1}^n |v_i|.
If a network is strongly connected, we can apply the next theorem to the adjacency matrix.
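Whether a given adjacency matrix is irreducible can be checked with a simple reachability test in the spirit of Prop. 8.2: the network is strongly connected exactly when every node can reach every other node along a path of at most n - 1 links, i.e. when (I + A)^{n-1} has no zero entries (this matrix reappears in Lemma 8.15 below). A Python/numpy sketch (not part of the notes):

```python
import numpy as np

def strongly_connected(A):
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n, dtype=int) + (A > 0), n - 1)
    return bool((M > 0).all())

# Adjacency matrix of the directed network of Fig. 14.
A_fig14 = np.array([[0, 0, 0, 0, 1, 0],
                    [1, 0, 1, 0, 0, 0],
                    [0, 0, 0, 0, 1, 0],
                    [0, 0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 0, 0],
                    [1, 0, 0, 0, 0, 0]])
print(strongly_connected(A_fig14))   # False: e.g. no path leaves node 2
```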
Theorem 8.3:
Perron-Frobenius theorem: If the matrix M ≥ 0 is irreducible and its spectral radius is ρ(M) = r, then
(A) r is an eigenvalue of M,
(B) alg mul_M(r) = 1,
(C) there exists an eigenvector x > 0 of M with eigenvalue r (i.e. Mx = rx),
(D) r > 0,
(E) let p be the unique vector defined by Mp = rp, p > 0 and ||p||_1 = 1; there are no non-negative eigenvectors of M, regardless of their eigenvalue, except positive multiples of p.
8.2 Proof for strictly positive matrices
We will first prove Theorem 8.3 for the special case where M > 0. The proof follows C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, 2000.
Without loss of generality, we can assume |λ_1| = 1 because, if this is not the case, we can replace M by M̃ = M/|λ_1|. (Footnote: We can rule out λ_1 = 0. Otherwise all eigenvalues are 0, which makes the matrix nilpotent (see Prop. 4.8). But if all M_{ij} > 0, M cannot be nilpotent.)
We will furthermore use the notation |M| to represent the matrix with entries |M_{ij}| (i.e. we take the absolute values of the entries in M). Note that the notation |...| here indicates absolute values, not determinants.
We will need the following lemma for the proof.
Lemma 8.4:
For any complex square matrix M, lim_{k→∞} M^k = 0 if and only if the spectral radius satisfies ρ(M) < 1.
Proof:
If J = P^{-1}MP is the Jordan normal form of M, then

M^k = PJ^kP^{-1} = P \begin{pmatrix} J_1^k & 0 & \dots & 0 \\ 0 & J_2^k & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_p^k \end{pmatrix} P^{-1},   (25)

where all Jordan blocks J_i are of the upper triangular form

J_* = \begin{pmatrix} \lambda & 1 & & \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ & & & \lambda \end{pmatrix}.

From Eq. 25, lim_{k→∞} M^k = 0 if and only if lim_{k→∞} J_*^k = 0 for every Jordan block, so it suffices to prove that lim_{k→∞} J_*^k = 0 if and only if |λ| < 1.
Suppose J_* is an m × m matrix. Induction on k proves that

J_*^k = \begin{pmatrix} \lambda^k & \binom{k}{1}\lambda^{k-1} & \binom{k}{2}\lambda^{k-2} & \dots & \binom{k}{m-1}\lambda^{k-m+1} \\ & \lambda^k & \binom{k}{1}\lambda^{k-1} & \ddots & \vdots \\ & & \ddots & \ddots & \binom{k}{2}\lambda^{k-2} \\ & & & \lambda^k & \binom{k}{1}\lambda^{k-1} \\ & & & & \lambda^k \end{pmatrix}.

From the diagonal entries, we can tell that J_*^k → 0 implies λ^k → 0 and thus |λ| < 1.
We then only need to show that, conversely, |λ| < 1 implies that all entries in J_*^k go to zero. The binomial coefficient can be bounded by

\binom{k}{j} = \frac{k(k-1)\dots(k-j+1)}{j!} \le \frac{k^j}{j!} \quad\Longrightarrow\quad \left|\binom{k}{j}\lambda^{k-j}\right| \le \frac{k^j}{j!}|\lambda|^{k-j} \xrightarrow{k\to\infty} 0.

The last term goes to zero because k^j increases polynomially, but |λ|^k decays exponentially.  □
Lemma 8.5:
If M > 0 and λ_1 is a leading eigenvalue, then the following statements are true.
M has an eigenvalue equal to |λ_1| > 0.
If Mx = λ_1 x, then M|x| = |λ_1||x| and |x| > 0.
In other words, M has a strictly positive eigenvector whose eigenvalue is the spectral radius ρ(M).
Proof:
Without loss of generality, we can assume |λ_1| = 1. Let x be an eigenvector (and hence x ≠ 0) of M with eigenvalue λ_1. Then

|x| = |\lambda_1||x| = |\lambda_1 x| = |Mx| \overset{(*)}{\le} |M||x| = M|x| \quad\Longrightarrow\quad |x| \le M|x|,   (26)

where (*) follows from the triangle inequality.
We want to show that equality holds. For convenience, define z = M|x| and

y = z - |x|.   (27)

From Eq. 26, y ≥ 0. Suppose that y ≠ 0, that is, suppose that some y_i > 0. Because M > 0, we must then have My > 0 and, since |x| ≠ 0, z = M|x| > 0. This implies that there exists a number ε > 0 such that My > εz. Then

My = Mz - M|x| = Mz - z > \varepsilon z \quad\Longrightarrow\quad \frac{M}{1 + \varepsilon} z > z.

Define B = M/(1 + ε), so Bz > z. Successively multiplying with B > 0, we find

B^2 z > Bz > z, \quad B^3 z > B^2 z > Bz > z, \quad \dots \quad\Longrightarrow\quad B^k z > z.   (28)

Because λ_1/(1 + ε) is a leading eigenvalue of B, the spectral radius satisfies ρ(B) = |λ_1/(1 + ε)| = 1/(1 + ε) < 1. According to Lemma 8.4, lim_{k→∞} B^k = 0. Taking the limit in Eq. 28, we find 0 ≥ z, in contradiction to z > 0, so the assumption y ≠ 0 was false.
⇒ 0 = y = M|x| - |x|,
so |x| is an eigenvector with eigenvalue 1. The proof is completed by observing |x| = M|x| > 0, where the inequality follows from M > 0 and x ≠ 0.  □
Next we want to show that there is only one eigenvalue with absolute value ρ(M). In the proof, we will use the ∞-norm for vectors and matrices.
Definition 8.6:
For a complex n-dimensional vector x, ||x||_∞ = max_i |x_i|.
For a complex n × n matrix M, ||M||_∞ = max_i \sum_{j=1}^n |M_{ij}|.
Proposition 8.7:
The matrix ∞-norm is submultiplicative, i.e. ||AB||_∞ ≤ ||A||_∞ ||B||_∞.
Proof:
One can easily show that

||A||_\infty = \max_{||x||_\infty = 1} ||Ax||_\infty = \max_{x \neq 0} \frac{||Ax||_\infty}{||x||_\infty}.

⇒ ||AB||_\infty = \max_{x \neq 0}\left(\frac{||ABx||_\infty}{||Bx||_\infty} \cdot \frac{||Bx||_\infty}{||x||_\infty}\right) \le \left(\max_{x \neq 0}\frac{||Ax||_\infty}{||x||_\infty}\right)\left(\max_{x \neq 0}\frac{||Bx||_\infty}{||x||_\infty}\right) = ||A||_\infty ||B||_\infty.  □
Lemma 8.8:
If M > 0 and λ_1 is a leading eigenvalue with |λ_1| = ρ(M), then
(A) λ_1 = ρ(M) (i.e. there is no other eigenvalue with the same absolute value),
(B) index(λ_1) = 1. (See Def. 4.3 for the definition of the index.)
Proof: (A)
Assume without loss of generality ρ(M) = 1. Let x be an eigenvector with eigenvalue λ_1 and |λ_1| = 1.

M|x| = |Mx| = |\lambda_1 x| = |\lambda_1||x| = |x| \quad\Longrightarrow\quad M|x| = |x|.

From Lemma 8.5, we can deduce that

|x| > 0.   (29)

We can write the k-th entry of |x| as

|x_k| = (M|x|)_k = \sum_{j=1}^n M_{kj}|x_j|.   (30)

But x_k also satisfies

|x_k| = |\lambda_1||x_k| = |(\lambda_1 x)_k| = |(Mx)_k| = \left|\sum_{j=1}^n M_{kj} x_j\right|.   (31)

Combining Eq. 30 and 31,

\left|\sum_{j=1}^n M_{kj} x_j\right| = \sum_{j=1}^n M_{kj}|x_j|,   (32)

which implies equality in the triangle inequality. From Eq. 29 we know that all terms in the sums are different from zero. Therefore, the equality in Eq. 32 implies that all terms M_{kj} x_j must have the same sign (otherwise the triangle inequality is strict). Because M_{kj} > 0 for all k and j, all x_j must have the same sign.
In other words, there must be a vector p > 0 so that x = ap for some constant a ≠ 0. From Mx = λ_1 x, we can now deduce

\lambda_1 p = Mp = |Mp| = |\lambda_1 p| = |\lambda_1|p = p

and thus λ_1 = 1.
Proof: (B)
Suppose that index(1) = m > 1. The Jordan normal form J = P^{-1}MP must then contain an m × m Jordan block J_* with 1s on the diagonal (see Thm. 4.2). We know the general shape of J_*^k from the proof of Lemma 8.4. If m > 1, then

||J_*^k||_\infty = \max_{1 \le i \le m} \sum_{j=1}^m |(J_*^k)_{ij}| = 1 + \binom{k}{1} + \binom{k}{2} + \dots + \binom{k}{m-1}.

If m is fixed, the right-hand side diverges for k → ∞ and thus ||J_*^k||_∞ → ∞, which in turn means ||J^k||_∞ → ∞.
From Prop. 8.7 we know that ||J^k||_∞ = ||P^{-1}M^kP||_∞ ≤ ||P^{-1}||_∞ ||M^k||_∞ ||P||_∞, or

||M^k||_\infty \ge \frac{||J^k||_\infty}{||P^{-1}||_\infty ||P||_\infty}.

The matrices in the denominator are constants, and thus ||J^k||_∞ → ∞ implies ||M^k||_∞ → ∞.
Let m^{(k)}_{ij} be the (i, j)-th entry of M^k and let i_k denote the row index for which ||M^k||_∞ = \sum_j m^{(k)}_{i_k j}. From the proof of (A) we know that there exists a vector p > 0 such that p = Mp and consequently p = M^k p. For such a p,

||p||_\infty \ge p_{i_k} = \sum_j m^{(k)}_{i_k j}\, p_j \ge \left(\sum_j m^{(k)}_{i_k j}\right)\left(\min_i p_i\right) = ||M^k||_\infty \left(\min_i p_i\right) \xrightarrow{k\to\infty} \infty.

But this is impossible because p is a constant vector, so the supposition that index(1) > 1 must be false.  □
Lemma 8.9:
If M > 0, then alg mul_M(ρ(M)) = 1.
Proof:
Assume without loss of generality ρ(M) = 1. Suppose alg mul_M(1) = m > 1. We know from Lemma 8.8 that alg mul_M(1) = geo mul_M(1), so there are m linearly independent eigenvectors with eigenvalue 1. Let x and y be two such independent eigenvectors, i.e. x ≠ αy for all complex numbers α. Select a non-zero component y_i from y and set z = x - (x_i/y_i)y. Because Mz = z, we know from Lemma 8.5 that M|z| = |z| > 0. But this contradicts z_i = x_i - (x_i/y_i)y_i = 0. The supposition alg mul_M(1) > 1 must thus be false.  □
Definition 8.10:
Let M > 0. The unique vector p satisfying
Mp = ρ(M)p,
p > 0 and
||p||_1 = \sum_i |p_i| = 1
is called the Perron vector of M.
Because M > 0 ⇒ M^T > 0, there is also a Perron vector q of M^T, called the left-hand Perron vector. Since ρ(M) = ρ(M^T), it satisfies q^T M = ρ(M) q^T.
Lemma 8.11:
If M > 0, then there are no non-negative eigenvectors of M, regardless of their eigenvalue, except for positive multiples of the Perron vector p.
Proof:
Let y ≥ 0 be an eigenvector (and thus y ≠ 0) with eigenvalue λ, and let x > 0 be the left-hand Perron vector of M.

\rho(M)x^T = x^T M \quad\Longrightarrow\quad \rho(M)x^T y = x^T M y = \lambda x^T y.   (33)

Because x > 0 and y ≠ 0, we must have x^T y > 0. From this and Eq. 33 we can conclude λ = ρ(M). So y must be an eigenvector with eigenvalue ρ(M). From Lemma 8.9, we know that the eigenspace corresponding to this eigenvalue is one-dimensional, hence the lemma is proved.  □
Combining Lemmas 8.5, 8.8, 8.9 and 8.11 yields Perron's theorem, an important special case of the Perron-Frobenius theorem.
Theorem 8.12:
Perron's theorem: If M > 0 and r = ρ(M), then
r > 0,
r is a leading eigenvalue of M,
alg mul_M(r) = 1,
r is the only eigenvalue with absolute value r,
there exists an eigenvector x > 0 such that Mx = rx,
the Perron vector p defined in Def. 8.10 is unique and, except for positive multiples of p, there are no other non-negative eigenvectors of M, regardless of the eigenvalue.
Remark: Perron's theorem only applies to the leading eigenvalue. Non-leading eigenvalues can be negative. For example,

M = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}

has an eigenvalue -1. But the elements of the corresponding eigenvectors must have different signs. In this example, the eigenvectors are non-zero multiples of x = (1, -1)^T.
Remark: Perron's theorem does not apply to the adjacency matrices of simple networks because their diagonal entries are zero. So we still have some work to do in order to obtain the more general Perron-Frobenius theorem 8.3.
8.3 Proof for non-negative matrices
For the proof of the next theorem, we need the following lemma.
Lemma 8.13:
(A) For any complex square matrix M, ρ(M) ≤ ||M||_∞.
(B) ρ(M) = lim_{k→∞} (||M^k||_∞)^{1/k}.
(C) If |M| ≤ N, then ρ(M) ≤ ρ(|M|) ≤ ρ(N).
Proof: (A)
Let x = (x_1, ..., x_n)^T be an eigenvector with eigenvalue λ. Then the n × n matrix

X = \begin{pmatrix} x_1 & 0 & \dots & 0 \\ \vdots & \vdots & & \vdots \\ x_n & 0 & \dots & 0 \end{pmatrix}

satisfies λX = MX. ⇒ |λ| ||X||_∞ = ||λX||_∞ = ||MX||_∞ ≤ ||M||_∞ ||X||_∞.
Since X ≠ 0, |λ| ≤ ||M||_∞ for all eigenvalues λ of M.
Proof: (B)
From the Jordan normal form, we can derive ρ(M)^k = ρ(M^k) and, from (A), ρ(M^k) ≤ ||M^k||_∞. Combining these two inequalities, ρ(M) ≤ (||M^k||_∞)^{1/k}.
Furthermore, ρ(M/(ρ(M) + ε)) < 1 for every ε > 0, so according to Lemma 8.4,

\lim_{k\to\infty}\left(\frac{M}{\rho(M) + \varepsilon}\right)^k = 0 \quad\Longrightarrow\quad \lim_{k\to\infty}\frac{||M^k||_\infty}{(\rho(M) + \varepsilon)^k} = 0.

This implies that there is a K_ε > 0 such that ||M^k||_∞/(ρ(M) + ε)^k < 1 and hence (||M^k||_∞)^{1/k} < ρ(M) + ε for all k ≥ K_ε.
In summary,

\rho(M) \le (||M^k||_\infty)^{1/k} < \rho(M) + \varepsilon \quad \text{for } k \ge K_\varepsilon

for all ε > 0, and thus lim_{k→∞} (||M^k||_∞)^{1/k} = ρ(M).
Proof: (C)
The triangle inequality implies |M^k| ≤ |M|^k for all k ∈ N. From |M| ≤ N we can further derive |M|^k ≤ N^k. These two inequalities together with (B) yield

||M^k||_\infty \le \|\,|M|^k\,\|_\infty \le ||N^k||_\infty
\;\Longrightarrow\; (||M^k||_\infty)^{1/k} \le (\|\,|M|^k\,\|_\infty)^{1/k} \le (||N^k||_\infty)^{1/k}
\;\Longrightarrow\; \rho(M) \le \rho(|M|) \le \rho(N).  □
Now we have the necessary tools to generalise Perron's theorem to non-negative matrices.
Theorem 8.14:
For any non-negative square matrix M with r = ρ(M), the following statements are true.
M has an eigenvalue r (but r = 0 is possible),
there exists a vector z ≥ 0, z ≠ 0 so that Mz = rz.
Proof:
Let us define E to be the matrix with 1 in every entry and define the sequence

M_k = M + (1/k)E.

Because all M_k are positive, we can apply Perron's theorem 8.12. Let r_k > 0 be the spectral radius of M_k and p_k the Perron vector. The set {p_k}_{k=1}^∞ is bounded by the unit sphere. The Bolzano-Weierstrass theorem states that each bounded sequence has a convergent subsequence, so there must be a subsequence {p_{k_i}}_{i=1}^∞ → z for some vector z. We know that z ≥ 0 because p_{k_i} > 0. We also know that z ≠ 0 because ||p_{k_i}||_1 = 1.
Because M_1 > M_2 > ... > M, Lemma 8.13(C) implies r_1 ≥ r_2 ≥ ... ≥ r, so the sequence r_k is monotonically decreasing and bounded from below by r. Therefore, lim_{k→∞} r_k = r* exists and

r^* \ge r.   (34)

On the other hand, lim_{k→∞} M_k = M, so that also lim_{i→∞} M_{k_i} = M and thus

Mz = \lim_{i\to\infty} M_{k_i} \lim_{i\to\infty} p_{k_i} = \lim_{i\to\infty}(M_{k_i} p_{k_i}) = \lim_{i\to\infty}(r_{k_i} p_{k_i}) = \lim_{i\to\infty} r_{k_i} \lim_{i\to\infty} p_{k_i} = r^* z.

This implies that r* is an eigenvalue of M. Since r is the spectral radius of M, r* ≤ r. Because of Eq. 34, r* = r.  □
Theorem 8.14 is as much as we can prove for general non-negative matrices. In the special case where M is irreducible, however, we can recover almost all of Perron's theorem 8.12. The proof requires the following lemma.
Lemma 8.15:
If M is irreducible, then (I + M)^{n-1} > 0, where I denotes the identity matrix.
Proof:
Let m^{(k)}_{ij} be the (i, j)-th entry of M^k. From Prop. 8.2(C) we know that for every pair (i, j) there is a k so that m^{(k)}_{ij} > 0.

\left[(I + M)^{n-1}\right]_{ij} = \left[\sum_{k=0}^{n-1}\binom{n-1}{k}M^k\right]_{ij} = \sum_{k=0}^{n-1}\binom{n-1}{k}m^{(k)}_{ij} > 0.  □
Now we are prepared for the proof of the Perron-Frobenius theorem 8.3.
Proof of Thm. 8.3: (A)
This follows from Thm. 8.14.
Proof of Thm. 8.3: (B)
Let B = (I + M)^{n-1} > 0 be the matrix from Lemma 8.15. Furthermore, let

J = P^{-1}MP = \begin{pmatrix} J_1 & 0 & \dots & 0 \\ 0 & J_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & J_p \end{pmatrix}

be the Jordan normal form of M. Then

P^{-1}BP = P^{-1}\left[\sum_{k=0}^{n-1}\binom{n-1}{k}M^k\right]P = \sum_{k=0}^{n-1}\binom{n-1}{k}(P^{-1}MP)^k = \sum_{k=0}^{n-1}\binom{n-1}{k}J^k.

We have calculated the general shape of J^k in the proof of Lemma 8.4. From this we can conclude that

P^{-1}BP = \sum_{k=0}^{n-1}\binom{n-1}{k}\begin{pmatrix} J_1^k & & \\ & \ddots & \\ & & J_p^k \end{pmatrix} = \begin{pmatrix} B_1 & & \\ & \ddots & \\ & & B_p \end{pmatrix},

where

B_i = \sum_{k=0}^{n-1}\binom{n-1}{k}J_i^k = \begin{pmatrix} (1 + \lambda_i)^{n-1} & x_{12} & \dots & x_{1m} \\ 0 & \ddots & \ddots & \vdots \\ \vdots & & \ddots & x_{m-1,m} \\ 0 & \dots & 0 & (1 + \lambda_i)^{n-1} \end{pmatrix}

and we have assumed that J_i is an m × m matrix. So λ is an eigenvalue of M if and only if (1 + λ)^{n-1} is an eigenvalue of B, and alg mul_M(λ) = alg mul_B[(1 + λ)^{n-1}].
Set r = ρ(M) and b = ρ(B). Since r is an eigenvalue of M,

b = \max_{i=1,\dots,p} |1 + \lambda_i|^{n-1} = \left(\max_{i=1,\dots,p} |1 + \lambda_i|\right)^{n-1} = (1 + r)^{n-1}.

Suppose alg mul_M(r) > 1. Then alg mul_B(b) > 1, in contradiction to B > 0 and Thm. 8.12. Therefore the supposition was wrong, and instead alg mul_M(r) = 1.
Proof of Thm. 8.3: (C)
We know from Thm. 8.14 that there is an eigenvector x ≥ 0 with eigenvalue r, Mx = rx.

Bx = (I + M)^{n-1}x = \sum_{k=0}^{n-1}\binom{n-1}{k}M^k x = \sum_{k=0}^{n-1}\binom{n-1}{k}r^k x = (1 + r)^{n-1}x,

which implies that x is a non-negative eigenvector for the leading eigenvalue of B > 0. It follows from Thm. 8.12 that x > 0.
Proof of Thm. 8.3: (D)
Let x > 0 be the eigenvector with eigenvalue r from (C). Suppose r = 0. Then Mx = 0 with M ≥ 0 and x > 0. This can only be true if M = 0. But a matrix with all zeros is reducible, so we must have r > 0.
Proof of Thm. 8.3: (E)
This can be proved with the same arguments as Lemma 8.11.  □
Remark: There is one property of Thm. 8.12 that the Perron-Frobenius theorem 8.3 does not recover, namely that
(⋆) an eigenvalue λ with |λ| = ρ(M) must satisfy λ = ρ(M).
For example,

M = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

has eigenvalues +1 and −1.
Irreducible matrices with the additional property (⋆) are called primitive. Primitive matrices play an important role for random walks on directed networks: if the adjacency matrix is primitive, then the random walk does not have a periodic solution.
9 Centrality measures
There are several measures to quantify how central or important a node is in a network.
We have already encountered one simple, but useful, centrality measure: the degree, also
sometimes called degree centrality. It is plausible that a hub, i.e. a node with a high
(in-)degree, is more important than a node with only few neighbours.
However, the degree is in many applications a very crude measure. Usually not all
neighbours are equally important and, therefore, the number of neighbours alone is not
enough to assess centrality. This idea leads to several more advanced centrality measures.
9.1 Eigenvector centrality
Motivation:
Consider the example in Fig. 30. Node M has a smaller degree than L and R, but is M
really less central? After all, M is connected to the two nodes of highest degree in the
network which should boost its importance. In contrast, L and R are mostly linked to
nodes of low degree and thus should be relatively less important than their own degree
suggests.
A self-consistent measure of the centrality would be to make it proportional to the sum of the neighbours' centralities. If x_i is the centrality of node i, then we need to solve

x_i = C \sum_{j=1}^{n} A_{ij} x_j    (35)

self-consistently for some constant C. In matrix form, this is x = CAx. In other words, x is an eigenvector of the adjacency matrix.
If we choose x to be the Perron vector, then M in Fig. 30 indeed receives the same centrality as L and R.
Definition 9.1:
If A is the adjacency matrix of a strongly connected network with n nodes, then the eigenvector centralities of the nodes 1, . . . , n are the elements of the Perron vector of A (see Def. 8.10 for the definition of the Perron vector).
Motivation:
So why do we choose the Perron vector p and not one of the other eigenvectors of A?
There are several reasons:
Figure 30: A small illustrative undirected network. Node M has a smaller degree than L and
R, but the same eigenvector centrality (indicated by the decimal numbers).
p has a positive eigenvalue (at least for a strongly connected network) so that C > 0 in Eq. 35, which is sensible.
p > 0, so that all centralities are positive, which is also reasonable.
As we will show in the next theorem, the Perron vector is (usually) the asymptotic result of the following iterative procedure, known as von Mises iteration or power method.
(i) Set t = 0.
(ii) Let us make an initial guess about the importance x^{(0)}_i > 0 for all nodes i = 1, . . . , n (e.g. x^{(0)}_i = 1 for all i).
(iii) An improved measure of centrality x'_i is the sum of the importance of all nodes pointing towards i,

x'_i = \sum_{j=1}^{n} A_{ij} x^{(t)}_j,

or in matrix form x' = A x^{(t)}.
(iv) Increment t by 1 and define x^{(t)} to be the normalised vector pointing in the direction of x',

x^{(t)} = x' / ||x'||_1.

Go back to step (iii).
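To make the iteration concrete, here is a minimal sketch in Python with NumPy (not part of the original notes); the small symmetric adjacency matrix at the bottom is a made-up example, not the network of Fig. 30.

import numpy as np

def eigenvector_centrality(A, max_iter=1000, tol=1e-12):
    """von Mises (power) iteration with 1-norm normalisation."""
    n = A.shape[0]
    x = np.ones(n) / n            # initial guess x^(0), already normalised
    for _ in range(max_iter):
        x_new = A @ x             # x' = A x^(t)
        x_new /= np.abs(x_new).sum()   # normalise in the 1-norm
        if np.abs(x_new - x).sum() < tol:
            break
        x = x_new
    return x_new

# Hypothetical strongly connected example: an undirected triangle
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
print(eigenvector_centrality(A))  # approximately (1/3, 1/3, 1/3)

For a strongly connected network with a unique eigenvalue on the spectral circle, the loop converges to the Perron vector, in line with Theorem 9.2 below.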
Theorem 9.2:
If A ≥ 0 is the adjacency matrix of a strongly connected network, x^{(0)} > 0,

x^{(t)} = A^t x^{(0)} / ||A^t x^{(0)}||_1,

and ρ(A) is the only eigenvalue on the spectral circle, then

x^{(∞)} = lim_{t→∞} x^{(t)} = p    (36)

where p is the Perron vector of A.
Proof:
Let J = P^{-1} A P be the Jordan normal form of A with the leading eigenvalue in the upper left corner. From the Perron-Frobenius theorem 8.3, we know that the leading eigenvalue is ρ(A) > 0 with alg mult(ρ(A)) = 1, which gives J the general form

J = \begin{pmatrix} \rho(A) & 0 & \cdots & 0 \\ 0 & J_2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & J_p \end{pmatrix}.    (37)

Because P is non-singular, the column vectors P e_1, . . . , P e_n with

e_i = (0, . . . , 0, 1, 0, . . . , 0)^T  (the 1 in the i-th position)

form a basis of C^n, so that we can express our initial guess x^{(0)} as

x^{(0)} = \sum_{i=1}^{n} b_i P e_i    (38)

for some coefficients b_i ∈ C. We will later on need b_1 ≠ 0, which can be seen as follows. From Eq. 37, ρ(A) e_1^T P^{-1} = e_1^T J P^{-1} = e_1^T P^{-1} A, so e_1^T P^{-1} is a multiple of the left-hand Perron vector. It cannot be zero because otherwise P would be singular. So we can conclude that the elements of e_1^T P^{-1} are either all positive or all negative. Since we have chosen x^{(0)} > 0, e_1^T P^{-1} x^{(0)} ≠ 0. Now we insert Eq. 38,

0 ≠ e_1^T P^{-1} \sum_{i=1}^{n} b_i P e_i = e_1^T \sum_{i=1}^{n} b_i e_i = b_1  ⇒  b_1 ≠ 0.

Multiplying x^{(0)} with A^t, we obtain

A^t x^{(0)} = P J^t P^{-1} \sum_{i=1}^{n} b_i P e_i = P J^t \sum_{i=1}^{n} b_i e_i.

From Eq. 37, J e_1 = ρ(A) e_1, so that

A^t x^{(0)} = b_1 (ρ(A))^t P e_1 + P J^t \sum_{i=2}^{n} b_i e_i = b_1 (ρ(A))^t P [ e_1 + (1/b_1) (J/ρ(A))^t \sum_{i=2}^{n} b_i e_i ].

In the t-th step of the von Mises iteration, the centrality vector is

x^{(t)} = A^t x^{(0)} / ||A^t x^{(0)}||_1 = (b_1/|b_1|) ((ρ(A))^t/|ρ(A)|^t) · P[ e_1 + (1/b_1)(J/ρ(A))^t \sum_{i=2}^{n} b_i e_i ] / || P[ e_1 + (1/b_1)(J/ρ(A))^t \sum_{i=2}^{n} b_i e_i ] ||_1.

The matrix J/ρ(A) has an entry 1 in the top left corner, but all other diagonal entries have absolute value < 1. Using the arguments in the proof of Thm. 8.4, we find

lim_{t→∞} (J/ρ(A))^t = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}.

Because of this and ρ(A) > 0,

x^{(∞)} = lim_{t→∞} x^{(t)} = (b_1/|b_1|) P e_1 / ||P e_1||_1.    (39)

Since A P e_1 = P J e_1 = ρ(A) P e_1, P e_1 is an eigenvector of A with eigenvalue ρ(A). Additionally, we know

A ≥ 0, x^{(0)} > 0  ⇒  A x^{(0)} > 0,

because zeros in A x^{(0)} could only appear if A contained a row of zeros and thus a node of in-degree zero, but then the network would not be strongly connected because there would not be any path to this node. It follows by induction that

A^t x^{(0)} > 0  ⇒  x^{(t)} > 0  ⇒  x^{(∞)} ≥ 0.    (40)

Furthermore, x^{(∞)} ≠ 0 because ||x^{(t)}||_1 = 1 for all t, hence ||x^{(∞)}||_1 = 1. Together with Eq. 39, this implies that x^{(∞)} is an eigenvector with eigenvalue ρ(A).
In summary, x^{(∞)} is a non-negative, normalised eigenvector for the leading eigenvalue ρ(A) of the irreducible matrix A. From the Perron-Frobenius theorem 8.3 we know that x^{(∞)} must then be the Perron vector.  □
If the adjacency matrix has more than one eigenvalue on the spectral circle, the von Mises
iteration may not converge.
Example: The network in Fig. 31 has the adjacency matrix

A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}

with eigenvalues +1 and −1. If you start the von Mises iteration for example with the vector x^{(0)} = (1/4, 3/4)^T, then the solution oscillates,

x^{(t)} = (1/4, 3/4)^T if t is even,  (3/4, 1/4)^T otherwise.

However, the network is strongly connected and therefore the eigenvector centrality (i.e. the Perron vector) is unique, p = (1/2, 1/2)^T.

Figure 31: A network for which the von Mises iteration does not converge.
If the network is not strongly connected, then the Perron-Frobenius theorem does not apply. In this case, we can still find a normalised non-negative eigenvector, but it may not be unique.
Example:
Figure 32: A network on four nodes (labelled 1 to 4) which is not strongly connected, with no unique eigenvector centrality.
The adjacency matrix of the network depicted in Fig. 32,

A = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix},

has two orthogonal, normalised, non-negative eigenvectors: p_1 = (0, 1, 0, 0)^T and p_2 = (0, 0, 0, 1)^T. Any convex combination a p_1 + (1 − a) p_2, a ∈ [0, 1], is an eigenvector with eigenvalue ρ(A) = 0.
There are also cases of networks that are not strongly connected, but the Perron vector
is unique.
Example:
Consider a directed chain of three nodes in which node 3 points at node 2 and node 2 points at node 1. Its adjacency matrix and Perron vector are

A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},   p = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}.
However, do we really want node 2 to have zero centrality? After all, there is one node
pointing at it. Intuitively, one would therefore assign a higher importance to 2 than to
3. This motivates the search for alternatives to the eigenvector centrality.
9.2 Katz centrality
One idea to prevent zeros in the last example is to give every node a minimum centrality of β in the following variation of the von Mises iteration:
(i) Set t = 0.
(ii) Make an initial guess for the centrality vector x^{(0)} ≥ 0.
(iii) Assign an improved centrality x'_i which is a mix of the centrality of the neighbours and an intrinsic centrality β,

x'_i = α \sum_{j=1}^{n} A_{ij} x^{(t)}_j + β,   α > 0, β > 0,

or in matrix form x' = α A x^{(t)} + β 1, where 1 = (1, 1, . . . , 1)^T.
(iv) Increment t by 1 and define x^{(t)} to be the normalised vector pointing in the direction of x',

x^{(t)} = x' / ||x'||_1.

Go back to step (iii).
Theorem 9.3:
If A ≥ 0 and

0 < α < 1/ρ(A) if ρ(A) > 0,    α > 0 otherwise,

the modified von Mises iteration converges to

x^{Katz}(α) = lim_{t→∞} x^{(t)} = (I − αA)^{-1} 1 / ||(I − αA)^{-1} 1||_1.    (41)

This limit, called Katz centrality, exists even if the network is not strongly connected.
Proof:
By induction, one can show that

x^{(t)} = [ (αA)^t x^{(0)} + β \sum_{k=0}^{t-1} (αA)^k 1 ] / || (αA)^t x^{(0)} + β \sum_{k=0}^{t-1} (αA)^k 1 ||_1.    (42)

The spectral radius of αA is ρ(αA) = αρ(A) < 1. According to Thm. 8.4, lim_{t→∞} (αA)^t = 0. Furthermore, (I − αA) is non-singular. This can be seen from the determinant

det(I − αA) = (−α)^n det(A − α^{-1} I) = (−α)^n p_A(α^{-1}),

where p_A is the characteristic polynomial. For p_A(α^{-1}) = 0, the number α^{-1} would have to be an eigenvalue of A and hence at most as large as the largest eigenvalue, but this is outside the permitted range. Hence det(I − αA) ≠ 0, which implies that (I − αA)^{-1} exists. This allows us to rewrite the sums in Eq. 42 as follows. First, it follows from straightforward induction on t that (I − αA) \sum_{k=0}^{t-1} (αA)^k = I − (αA)^t. Then we multiply with (I − αA)^{-1} from the left to obtain

\sum_{k=0}^{t-1} (αA)^k = (I − αA)^{-1} (I − (αA)^t).    (43)

Taking the limit t → ∞, we obtain Eq. 41.  □
Remark: The Katz centrality depends on the parameter α. But the second parameter β in the modified von Mises iteration cancels out because of the normalisation.
What value of α should we choose? A common practice is to pick α close to the maximum. In the limit α → 1/ρ(A), the Katz centrality becomes the Perron vector (proof: homework problem). So, if α is near (but not exactly equal to) this limit, the Katz centrality has a similar interpretation as the eigenvector centrality, but does not suffer from the same problems if the network is not strongly connected.
Example:
The network in Fig. 32 has Katz centralities x^{Katz}(α) = (1/(4+3α)) (1, 1+2α, 1, 1+α)^T. In the limit α → ∞, the centralities are (0, 2/3, 0, 1/3)^T, which are the in-degree centralities. The limit of the Katz centrality is a sensible way to bypass the ambiguity of the eigenvector centrality.
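As a quick numerical check of this example, one can evaluate Eq. 41 directly. The sketch below (Python with NumPy, not part of the original notes) uses the adjacency matrix of Fig. 32 as given above.

import numpy as np

A = np.array([[0, 0, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 0, 0],
              [0, 0, 1, 0]], dtype=float)

def katz(A, alpha):
    """Katz centrality (I - alpha A)^(-1) 1, normalised in the 1-norm (Eq. 41)."""
    n = A.shape[0]
    x = np.linalg.solve(np.eye(n) - alpha * A, np.ones(n))
    return x / np.abs(x).sum()

print(katz(A, 1.0))    # (1, 3, 1, 2)/7, i.e. the formula above with alpha = 1
print(katz(A, 100.0))  # close to the alpha -> infinity limit (0, 2/3, 0, 1/3)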
Interpretation of the Katz centrality:
We can use Eq. 43 to rewrite the Katz centrality of Eq. 41,

x^{Katz}_i(α) = \sum_{k=0}^{∞} α^k ( \sum_{j=1}^{n} [A^k]_{ij} ) / || \sum_{k=0}^{∞} α^k A^k 1 ||_1.

From Prop. 4.4 we know that [A^k]_{ij} equals the number of walks from j to i of length k. The Katz centrality x^{Katz}_i counts the number of possible ways to reach i, weighting each walk of length k by a factor α^k. (Because the infinite series must converge for α ∈ [0, 1/ρ(A)), this observation can be used to determine bounds for the spectral radii of A.)
9.3 PageRank
Both eigenvector and Katz centrality, by design, give nodes a large boost in centrality
if another central node points at them. In certain contexts this may not be desirable.
For example, a central web directory like Yahoo! points rather indiscriminately at
many web sites, including my own, but should my web site receive a disproportionately
large centrality in return? In some sense, links from Yahoo! should count relatively little
exactly because Yahoo! has so many outgoing links that one particular connection does
not have much meaning.
How can we reduce the relative influence of hubs like Yahoo! on the centrality a node i gains from each of its neighbours j? We can keep the idea of the intrinsic importance β from the Katz centrality, but divide neighbour j's centrality x_j by its out-degree k^{out}_j,

x_i = α \sum_{j=1}^{n} A_{ij} x_j / k^{out}_j + β.    (44)
However, Equation 44 is strictly speaking undefined if the denominator k^{out}_j equals zero. This can be easily cured by replacing k^{out}_j = 0 by \tilde{k}^{out}_j = 1 because, for a node j with out-degree zero, A_{ij} = 0 for all i and thus A_{ij} x_j / \tilde{k}^{out}_j = 0. In other words, j does not contribute to the centrality of any other node i, just as it intuitively ought to be.
We can express this idea in matrix notation by introducing the diagonal matrix \tilde{D} with elements \tilde{D}_{ii} = max(k^{out}_i, 1) so that

x = α A \tilde{D}^{-1} x + β 1.

Rearranging this equation,

(I − α A \tilde{D}^{-1}) x = β 1  ⇒  x = β (I − α A \tilde{D}^{-1})^{-1} 1 = β \tilde{D} (\tilde{D} − αA)^{-1} 1,

motivates the next definition.
Definition 9.4:
The centrality measure

x^{PR}(α) = \tilde{D} (\tilde{D} − αA)^{-1} 1 / || \tilde{D} (\tilde{D} − αA)^{-1} 1 ||_1

is called PageRank.
Remark:
PageRank is one of the main ingredients of the search engine Google.
Google uses α = 0.85, but this choice is apparently based on experimentation rather than rigorous theory.
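For illustration, Def. 9.4 can be evaluated in a few lines. The following is only a sketch (Python with NumPy, not part of the original notes), and the three-node adjacency matrix is a made-up example.

import numpy as np

def pagerank(A, alpha=0.85):
    """x^PR = D~ (D~ - alpha A)^(-1) 1, normalised in the 1-norm (Def. 9.4)."""
    n = A.shape[0]
    k_out = A.sum(axis=0)                  # out-degrees (column j sums, since A_ij = 1 for a link j -> i)
    D = np.diag(np.maximum(k_out, 1.0))    # D~_ii = max(k_i^out, 1)
    x = D @ np.linalg.solve(D - alpha * A, np.ones(n))
    return x / np.abs(x).sum()

# Made-up example with links 1 -> 2, 2 -> 3, 3 -> 1 and 3 -> 2
A = np.array([[0, 0, 1],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
print(pagerank(A))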
Interpretation of PageRank as a random walk:
Equation 44 can be interpreted as the stationary distribution of the following stochastic process. A random surfer on the World-Wide Web begins surfing at some specified web page. Then the surfer iterates the following steps:
If the web site has an out-degree k^{out}_i > 0, then
(i) with probability α the surfer follows one of the outgoing links, chosen uniformly at random, to a new web page,
(ii) with probability (1 − α) the surfer types a new URL, chosen uniformly at random among all existing web pages, into the browser address bar.
If k^{out}_i = 0, the surfer performs the teleportation described under (ii) above with probability 1.
10 Spectral network partitioning
Note: In this section we focus on undirected networks. The generalisation to directed
networks is not straightforward.
10.1 What is network partitioning?
Networks can often be divided into groups of nodes so that
there are many links within a group,
there are few links between different groups.
Examples of networks with a clear group structure are shown in Fig. 3, 4 and 5. Sometimes
there is additional information about the nodes (e.g. the research area of scientists in
collaboration networks) that can be used to partition the network into groups. But often
such information is missing and the task is to infer the groups from the adjacency matrix.
There are many different versions of this problem. Here we only look at the specific case
of
Network bisection:
Suppose the network consists of n nodes. We want to partition the nodes into two sets N_1 and N_2 consisting of n_1 and n_2 = n − n_1 nodes respectively, so that the number R of links connecting different sets is minimised.
links connecting dierent sets is minimised.
The number of possible bisections:
There are
_
n
n
1
_
dierent ways to partition the network. For large n, n
1
, n
2
we can use
Stirlings formula lim
n
n!
n
n+1/2
exp(n)
=

2 to nd the approximate relationship


_
n
n
1
_
=
n!
n
1
!n
2
!

2n
n+1/2
exp(n)

2n
n
1
+1/2
1
exp(n
1
)

2n
n
2
+1/2
2
exp(n
2
)
=
n
n+1/2

2n
n
1
+1/2
1
n
n
2
+1/2
2
.
If n
1
n
2
, this is approximately
n
n+1/2

2(n/2)
n+1
=
2
n+1/2

n
,
which grows almost exponentially in n. Even for medium-size networks, the number of
possible partitions becomes too big to investigate every individual case. In practice, one
has to resort to heuristic algorithms which, although not strictly exact, typically return
near-optimal solutions.
10.2 The relaxed problem
Before we develop one such heuristic method, let us write the number R of links between the sets N_1 and N_2 in terms of the adjacency matrix,

R = (1/2) \sum_{i,j in different sets} A_{ij},

where we need the factor of 1/2 because the sum contains every pair twice.
We can represent the set to which node i belongs by the auxiliary variable

s_i = +1 if i ∈ N_1,   s_i = −1 if i ∈ N_2.    (45)

It follows that

(1/2)(1 − s_i s_j) = 1 if i and j are in different sets, 0 otherwise,

and thus

R = (1/4) \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} (1 − s_i s_j) = (1/4) [ \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} − \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} s_i s_j ].

The first term in the parentheses can be rewritten as

\sum_{ij} A_{ij} = \sum_i k_i = \sum_i k_i s_i^2 = \sum_{ij} k_i δ_{ij} s_i s_j,

where δ_{ij} is the Kronecker delta. Then

R = (1/4) \sum_{ij} (k_i δ_{ij} − A_{ij}) s_i s_j,

or in matrix form

R = (1/4) s^T (D − A) s = (1/4) s^T L s,

where L is the graph Laplacian.
Network bisection in matrix notation:
Minimise s^T L s subject to
(i) s_i ∈ {+1, −1} and
(ii) \sum_i s_i = n_1 − n_2. (This constraint fixes the cardinalities of N_1 and N_2 to be n_1 and n_2, respectively.)
The difficulty of this problem lies in the restriction of s_i to two discrete values. If s_i could take real values, the situation would simplify tremendously because we could then use derivatives to find the minimum. We still keep \sum_i s_i^2 = n, which is implicit in constraint (i) above, and constraint (ii), but otherwise allow s_i to have any real value.
Relaxed version of network bisection:
Minimise s^T L s subject to
(i) \sum_i s_i^2 = n and
(ii) \sum_i s_i = n_1 − n_2.
10.3 Spectral bisection
The relaxed problem can be solved with the usual methods of constrained optimisation. We introduce two Lagrange multipliers λ and 2μ (the additional factor of 2 will be convenient later on) and form the Lagrange function

L(s_1, . . . , s_n, λ, μ) = \sum_{jk} L_{jk} s_j s_k  [objective function]  + λ ( n − \sum_j s_j^2 )  [constraint (i): = 0]  + 2μ ( (n_1 − n_2) − \sum_j s_j )  [constraint (ii): = 0].

An extremum then satisfies

∂L/∂s_i = 0  ⇒  \sum_j L_{ij} s_j = λ s_i + μ  ⇒  Ls = λs + μ1.    (46)

If we multiply the last equation with 1^T, we can eliminate μ,

1^T L s = λ 1^T s + nμ  ⇒(A)  0 = λ \sum_i s_i + nμ  ⇒(B)  μ = −λ (n_1 − n_2)/n,

where we have used in (A) that 1 is an eigenvector of L with eigenvalue 0 (see Prop. 7.3) and in (B) that we impose constraint (ii). Let us define the new vector

x = s + (μ/λ) 1 = s − ((n_1 − n_2)/n) 1.    (47)

It follows from Eq. 46 that

Lx = L ( s + (μ/λ) 1 ) = Ls = λs + μ1 = λx,

which shows that x is an eigenvector of L with eigenvalue λ. We can, however, rule out that x = a1, a ∈ R, because

1^T x = 1^T s + (μ/λ) 1^T 1 = \sum_i s_i − ((n_1 − n_2)/n) n = n_1 − n_2 − (n_1 − n_2) = 0.
This still leaves us with many possible eigenvectors and it is not immediately clear which one is the best candidate. To shed light on this, we note that

R = (1/4) s^T L s = (1/4) x^T L x = (1/4) λ x^T x

and, from Eq. 47,

x^T x = s^T s + (μ/λ)(s^T 1 + 1^T s) + (μ/λ)^2 1^T 1 = n − 2 ((n_1 − n_2)/n)(n_1 − n_2) + (n_1 − n_2)^2/n = 4 n_1 n_2 / n,

thus

R = λ n_1 n_2 / n.
Since we want to minimise R, we are looking for an eigenvector x which has minimal eigenvalue λ, but is not a multiple of 1. We know from Prop. 7.2 that all eigenvalues are ≥ 0. If we sort the eigenvalues λ_1 = 0 ≤ λ_2 ≤ . . . ≤ λ_n and if v_1 = 1, v_2, . . . , v_n is an orthogonal basis of eigenvectors with L v_i = λ_i v_i, then we are looking for the basis vector v_2. (The second smallest eigenvalue λ_2 may be degenerate, for example λ_2 = λ_3. In this case we should in principle investigate all linear combinations v_2 + a v_3, but because the relaxed problem is only a heuristic for bisection anyway, let us not become too obsessed by details at this point.)
We can obtain the solution s_rel of the relaxed problem from Eq. 47,

s_rel = v_2 + ((n_1 − n_2)/n) 1.

Generally, none of its elements will be +1 or −1, so it is not an exact solution of the original bisection problem. However, one plausible heuristic is to look for the vector s ∈ {−1, +1}^n that is closest to s_rel. We are then looking for a minimum of

||s − s_rel||_2^2 = s^T s + s_rel^T s_rel − 2 s^T s_rel = 2n − 2 s^T s_rel,

which is minimised by maximising s^T s_rel = \sum_i s_i s_{rel,i}. Since we fixed the total number n_1 of elements +1 in s, the sum is maximised by assigning s_i = +1 to those nodes i with the largest value of s_{rel,i}. But s_{rel,i} and the i-th element of v_2 only differ by the constant term (n_1 − n_2)/n, so that we can equivalently assign s_i = +1 to the n_1 largest entries in v_2. (If the n_1-th largest entry is equal to the (n_1+1)-th, (n_1+2)-th, . . . largest entry, then we have a choice which entry s_i we want to make +1. Ideally, we would then investigate all possible cases, but again let us not become distracted by details.)
Clearly, if v_2 is an eigenvector of L with eigenvalue λ_2, then −v_2 is an eigenvector with the same eigenvalue. So another heuristic solution is to assign s_i = +1 to the n_1 smallest entries in v_2. This is tantamount to swapping the group labels 1 and 2 in Eq. 45, and is of course also permitted as a candidate solution. Because the first heuristic solution gives us the second one almost for free, we should investigate both and choose the one with the smaller R.
Spectral bisection algorithm:
(i) Calculate an eigenvector v_2 of the graph Laplacian with the second smallest eigenvalue λ_2. (λ_2 is sometimes called algebraic connectivity.)
(ii) Sort the elements of v_2 in descending order.
(iii) Assign the n_1 nodes corresponding to the largest elements to set N_1, the rest to N_2, and calculate R.
(iv) Then assign the n_1 nodes corresponding to the smallest elements to set N_1, the rest to N_2, and recalculate R.
(v) Between the bisections in steps (iii) and (iv), choose the one with the smaller R.
Example: The network depicted in Fig. 33 has the Laplacian

L = \begin{pmatrix} 2 & -1 & 0 & 0 & 0 & 0 & -1 \\ -1 & 3 & -1 & 0 & 0 & 0 & -1 \\ 0 & -1 & 4 & -1 & -1 & -1 & 0 \\ 0 & 0 & -1 & 3 & -1 & -1 & 0 \\ 0 & 0 & -1 & -1 & 3 & -1 & 0 \\ 0 & 0 & -1 & -1 & -1 & 4 & -1 \\ -1 & -1 & 0 & 0 & 0 & -1 & 3 \end{pmatrix}

with algebraic connectivity λ_2 ≈ 0.885. The corresponding eigenvector is

v_2 = (1.794, 1.000, −0.679, −1.218, −1.218, −0.679, 1.000)^T.

If we want to split the network into groups of size n_1 = 3 and n_2 = 4, then spectral bisection puts the nodes 1, 2 and 7 in one group and the rest in the other group. This is indeed the optimal split as one can, for this small example, verify by inspection.

Figure 33: A small illustrative network with seven nodes, split into groups N_1 and N_2 of 3 and 4 nodes, respectively.
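The example can be reproduced numerically with a short sketch (Python with NumPy, not part of the original notes); it uses the Laplacian of Fig. 33 exactly as given above and follows steps (i)-(v) of the spectral bisection algorithm.

import numpy as np

L = np.array([[ 2, -1,  0,  0,  0,  0, -1],
              [-1,  3, -1,  0,  0,  0, -1],
              [ 0, -1,  4, -1, -1, -1,  0],
              [ 0,  0, -1,  3, -1, -1,  0],
              [ 0,  0, -1, -1,  3, -1,  0],
              [ 0,  0, -1, -1, -1,  4, -1],
              [-1, -1,  0,  0,  0, -1,  3]], dtype=float)

eigval, eigvec = np.linalg.eigh(L)       # eigenvalues in ascending order
lambda2, v2 = eigval[1], eigvec[:, 1]    # algebraic connectivity and its eigenvector
print(round(lambda2, 3))                 # approximately 0.885

n1 = 3
def cut_size(group):                     # R = s^T L s / 4 for a candidate group N_1
    s = np.array([1.0 if i in group else -1.0 for i in range(len(v2))])
    return s @ L @ s / 4

N1_a = set(np.argsort(-v2)[:n1])         # step (iii): n1 largest entries of v2
N1_b = set(np.argsort(v2)[:n1])          # step (iv): n1 smallest entries of v2
best = min([N1_a, N1_b], key=cut_size)   # step (v): keep the split with smaller R
print(sorted(i + 1 for i in best))       # [1, 2, 7], with R = 2

Trying both sign conventions in steps (iii) and (iv) also removes the ambiguity that eigensolvers return v_2 only up to an overall sign.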
Partitioning a network into more than two groups:
So far we have only looked at bisection, that is splitting the network in two. This appears
at first sight to be only a special case of the more general problem of dividing the nodes
into multiple groups. However, in practice the vast majority of heuristic algorithms to
perform the latter task apply repeated bisection of groups. First the network is split into
two groups, then one or both of the groups are bisected, etc.
11 Shortest-path algorithm - the unweighted case
In this section we will develop algorithms to determine the shortest path from a node
s to another node t. These algorithms can be implemented as computer programmes
applicable to directed networks (with undirected networks as a special case).
[Figure: a source node s and a target node t connected by several routes; one of them is a shortest path, another one is not a shortest path.]
11.1 Network representations
How can we represent networks in computer memory? We have already encountered one
important representation, the adjacency matrix A. Many of our theorems and equations
were expressed in terms of A. If n is the number of nodes, A can be declared in a
computer programme as a two-dimensional n × n array. However, storing the network
as a two-dimensional array is often costly in terms of memory. Consider the network in
Figure 34: A small sparse network.
Fig. 34 and let n be the number of nodes and m be the number of links. Because n = 5, the adjacency matrix has 5^2 = 25 elements, but only m = 5 of them are equal to 1, whereas everything else equals 0. This feels like an enormous waste of memory, and for sparse networks, where the number of links is much less than the maximum n(n − 1), we can indeed find much less expensive data structures.
Definition 11.1:
Let {G_i = (N_i, L_i), i ∈ N} be a family of networks where the number of nodes n_i = card(N_i) is unbounded. If m_i = card(L_i) = O(n_i), the members of this family are called sparse.
The O-notation in Def. 11.1 is defined as follows.
Definition 11.2:
Let f and g be functions N → R. The notation f(n) = O(g(n)) means that there exist positive constants c and n_0 so that 0 ≤ f(n) ≤ c g(n) for all n ≥ n_0.
In this notation, the adjacency matrix of a sparse network needs O(n^2) memory to store information for O(n) links. An alternative data structure that requires only O(n) memory is the adjacency list. This is actually not a single list, but consists of one separate list for every node. The list of node i contains the labels of those nodes j for which there is a link i → j.
Example: The adjacency list of the network in Fig. 34 is

node   linked to
1      5
2      1, 5
3      2
4      (none)
5      2
It is usually a good idea to also store the out-degree k^{out}_i of each node i in a separate array so that we know how many entries there are in i's list. The out-degrees are n integers and, consequently, the total need for memory for the adjacency list plus the out-degrees is still O(n). If n is large (for calculations related to the world-wide web it is not uncommon to encounter n > 10^7), the adjacency-list representation saves us a lot of memory compared to the adjacency matrix, but it may cost us in terms of time.
Example: Determine if there is a link between i and j.
Adjacency matrix: look up A_{ji}. If we have random access to the location in memory, this takes O(1) time.
Adjacency list: go through all entries in the list of node i. In the worst case, there can be n − 1 entries in the list and, if j is not linked to i, we must investigate all n − 1 entries, i.e. O(n) time.
It is always a good idea to assess the advantages and disadvantages of both representations
before writing a computer programme. However, as a rule of thumb, the adjacency-list
representation is usually the better choice.
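As an illustration of the adjacency-list representation, a sketch in Python might look as follows (not part of the original notes; the link list is the one from the example table above, and the dictionary-based layout is just one possible choice).

# Directed links of the network in Fig. 34, written as (source, target) pairs
links = [(1, 5), (2, 1), (2, 5), (3, 2), (5, 2)]
n = 5

adj = {u: [] for u in range(1, n + 1)}    # one list per node
k_out = {u: 0 for u in range(1, n + 1)}   # out-degrees stored separately
for u, v in links:
    adj[u].append(v)
    k_out[u] += 1

print(adj[2], k_out[2])   # [1, 5] 2
print(adj[4], k_out[4])   # [] 0
# Testing whether a particular link exists means scanning a list, O(n) in the worst case:
print(5 in adj[1])        # True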
11.2 Queues
Another important consideration when writing computer code is the appropriate way
to temporarily store and retrieve information during the execution of the programme.
One rather simple data structure for this purpose is a queue. This is an ordered set Q
that maintains a list of numbers in a FIFO (i.e. first-in, first-out) order. There are two
basic operations, Enqueue and Dequeue. If we Enqueue a number, it is added at the
last position of Q. If we Dequeue, the number in the front position of Q is returned
as function value and subsequently deleted from Q. After the deletion, the previously
second number moves to the front position and all other numbers also proceed one step
closer to the start of the queue, similar to customers waiting at one single supermarket
till.
Figure 35: A queue implemented using an array Q[1 . . . 10]. (a) The queue has 5 elements in locations Q[3 . . . 7]. (b) The configuration of the queue after calling Enqueue(Q, 3). (c) The configuration after calling Dequeue(Q), which returns 11.
Figure 35 shows an example. The queue consists of an array Q[1 . . . n] where n is the maximum number of elements we wish to store. The queue has two attributes, Q.head and Q.tail. The elements currently in the queue are stored in Q[Q.head . . . Q.tail − 1]. The queue is empty if Q.head = Q.tail. Initially Q.head = Q.tail = 1. If we attempt to dequeue an element from an empty queue, the programme should exit with an error message. Conversely, when Q.tail = n + 1 and we try to enqueue an element, Q overflows and again the code should exit. In the pseudocode below I define one more attribute, Q.length, which takes on the role of n.
Initialise-Queue(Q)
1 Q.head = Q.tail = 1
Enqueue(Q, x)
1 if Q.tail == Q.length + 1
2 error "Queue overflow."
3 Q[Q.tail] = x
4 Q.tail = Q.tail + 1
Dequeue(Q)
1 if Q.head == Q.tail
2 error "Queue underflow."
3 x = Q[Q.head]
4 Q.head = Q.head + 1
5 return x
These three subroutines require O(1) time.
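For concreteness, the array-based queue above could be written in Python roughly as follows (a sketch only, not part of the original notes; in practice one would typically use collections.deque, which offers the same FIFO behaviour).

class Queue:
    def __init__(self, length):
        self.Q = [None] * (length + 1)   # storage Q[1..length]; index 0 is unused
        self.length = length
        self.head = 1
        self.tail = 1

    def enqueue(self, x):
        if self.tail == self.length + 1:
            raise OverflowError("Queue overflow.")
        self.Q[self.tail] = x
        self.tail += 1

    def dequeue(self):
        if self.head == self.tail:
            raise IndexError("Queue underflow.")
        x = self.Q[self.head]
        self.head += 1
        return x

q = Queue(10)
q.enqueue(11); q.enqueue(6)
print(q.dequeue())   # 11 (first in, first out)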
11.3 Breadth-first search
We now develop an algorithm that can find the shortest path from a specified source node s to every possible target node t in the out-component of s. The algorithm will
also be able to tell us if t is not in the out-component.
Figure 36: Upon initialisation, a breadth-first search gives the source node a distance label 0. Then it explores previously unlabelled nodes by preferentially searching for nodes that have a small distance from the source. In this manner, a breadth-first search labels nodes in the shell 1 before continuing to shell 2. After shell 2 is finished, the algorithm explores shell 3, etc.
The strategy of this algorithm is to explore the network by stepping from one node whose
distance from s is known to another node whose distance is still unknown. Because
the search for nodes with undiscovered distances proceeds preferentially from nodes with
small established distances, the algorithm tends to search along the breadth of the known frontier rather than penetrating deeper into unknown territory (Fig. 36). For this reason the algorithm is called breadth-first search.
To maintain breadth-first order, the algorithm maintains a queue Q. Initially Q is empty and all nodes u are given a nominal distance u.d = ∞ until they are discovered, except the source node s to which we assign s.d = 0. When a node i is discovered via a link from a node j, i is given a distance i.d = j.d + 1 and we store the information that we reached i via j as follows. We say that j is the predecessor of i, and keep this information as a node attribute i.π = j. This attribute will later on allow us to construct the shortest path from the source s to i. The following pseudocode denotes the network by G, the set of nodes by G.N and the adjacency list of node u as G.Adj[u].
BFS(G, s)
 1 for each node u ∈ G.N // Initially all nodes are undiscovered.
 2     u.d = ∞
 3     u.π = NIL
 4 s.d = 0 // Discover the source.
 5 Initialise-Queue(Q)
 6 Enqueue(Q, s)
 7 while Q.head ≠ Q.tail // Iterate until the queue is empty.
 8     u = Dequeue(Q)
 9     for each v ∈ G.Adj[u]
10         if v.d == ∞
11             v.d = u.d + 1 // Discover v.
12             v.π = u
13             Enqueue(Q, v)
Figure 37 shows how BFS operates on a sample network.
How long will Q have to be in the worst case? If s is connected to each of the other n − 1 nodes, Q will hold n − 1 elements after the first iteration of the while loop. Together with s, which still occupies the first entry in the array, Q.length = n is a safe choice. The
memory requirements of BFS are then O(n) for the queue plus O(m) for the adjacency
list, which is in total O(n) for a sparse network.
The scaling of the running time of BFS is determined by the sum of the running times
of the for loop in lines 1-3 and the while loop in lines 7-13. The assignments in lines
4-6 only require O(1) time each and will therefore play no role in the limit n → ∞. The
for loop initialises the node distances and predecessors which are all O(1) operations and
there are n iterations, so this loop requires O(n) time. The O(1) queue operations in the
while loop are performed at the most n times, and hence are altogether O(n), because
no node can enter the queue more than once. In the for subloop of line 9, we also have
to go through the adjacency list of the dequeued node, which for all nodes together takes
O(m) time. Altogether BFS runs in O(m) + O(n) time, which for sparse networks is
O(n).
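A direct translation of BFS into Python might look like this (a sketch, not part of the original notes; it assumes the network is stored as an adjacency list, and the small directed network at the bottom is a made-up example).

from collections import deque
import math

def bfs(adj, s):
    """Breadth-first search; returns distance d and predecessor pi for every node."""
    d = {u: math.inf for u in adj}   # initially all nodes are undiscovered
    pi = {u: None for u in adj}
    d[s] = 0                         # discover the source
    Q = deque([s])
    while Q:                         # iterate until the queue is empty
        u = Q.popleft()
        for v in adj[u]:
            if d[v] == math.inf:     # v is discovered for the first time
                d[v] = d[u] + 1
                pi[v] = u
                Q.append(v)
    return d, pi

# Hypothetical directed network
adj = {'s': ['r', 'w'], 'r': ['v'], 'w': ['t', 'x'], 't': [], 'x': [], 'v': []}
d, pi = bfs(adj, 's')
print(d['x'], pi['x'])   # 2 w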
Let us convince ourselves that the values of v.d calculated by BFS are indeed the shortest distances. Let us denote by δ(s, v) the shortest length of any possible path from s to v. We begin by establishing the following important property of the shortest-path length.
Figure 37: The steps carried out by BFS. Undiscovered nodes are white, nodes in the queue grey and discovered nodes that have left the queue are black. The link from a node to its predecessor is indicated by a light grey arrow. The numbers in the nodes are the d values. The queue is shown at the beginning of each iteration of the while loop. The numbers below the queue are established d values.
Lemma 11.3:
For any node s and any arbitrary link u → v,

δ(s, v) ≤ δ(s, u) + 1.

Proof:
If u is in the out-component of s, then v must also be reachable from s. In this case, one possible walk from s to v is the one that follows a shortest path from s to u and then uses the link u → v. This walk of length δ(s, u) + 1 is at least as long as the shortest path from s to v.
If u is not in the out-component of s, then δ(s, u) = ∞, which is certainly at least as large as δ(s, v).  □
Next we show that v.d is an upper bound for δ(s, v).
Lemma 11.4:
Suppose BFS is run from a source node s. Then at the end of the programme, the computed value v.d satisfies v.d ≥ δ(s, v) for all nodes v.
Proof:
The proof is by induction on the number of Enqueue operations. The induction hypothesis is that v.d ≥ δ(s, v) for all v after each Enqueue.
The basis of induction is the first time we encounter Enqueue, which occurs in line 6. The induction hypothesis is true because s.d = 0 = δ(s, s) and, for all v ≠ s, v.d = ∞ ≥ δ(s, v).
For the induction step, consider a node v that is discovered from u and then enqueued in line 13. Because of the induction hypothesis, u.d ≥ δ(s, u). Then

v.d =(A) u.d + 1 ≥ δ(s, u) + 1 ≥(B) δ(s, v),

where (A) follows from line 11 in the pseudocode and (B) from Lemma 11.3. The d values of all nodes w ≠ v remained unchanged since the last Enqueue, so that the induction hypothesis remains true.  □
Before we can establish that v.d = δ(s, v), we first have to show that the queue can at all times only contain nodes with at most two distinct d values.
Lemma 11.5:
Suppose that the queue Q contains during the execution of BFS the nodes (v_1, v_2, . . . , v_r) in this particular order. Then v_r.d ≤ v_1.d + 1 and v_i.d ≤ v_{i+1}.d for i = 1, 2, . . . , r − 1.
Proof:
We use induction on the number of queue operations. The induction basis is the situation after the first Enqueue in line 6, when only s is in the queue and the lemma is consequently valid.
For the induction step, we must consider the situation immediately after Dequeue and Enqueue.
DEQUEUE: If the queue becomes empty after dequeuing v_1, the lemma certainly holds. Let us then assume that there is still an element v_2 left in the queue. From the induction hypothesis v_1.d ≤ v_2.d and therefore v_r.d ≤ v_1.d + 1 ≤ v_2.d + 1. None of the other inequalities are affected, so that the lemma remains true.
ENQUEUE: When a node v is enqueued in line 13, it becomes v_{r+1}. At this point in time, its predecessor u is already removed from Q. The new queue head v_1 was either in the queue together with u at some point in the past or v_1 was discovered from u. In both cases v_1.d ≥ u.d. The d value of the new entry in the queue satisfies v_{r+1}.d = v.d = u.d + 1 ≤ v_1.d + 1. We also have v_r.d ≤ u.d + 1 because of the induction hypothesis, hence v_r.d ≤ u.d + 1 = v.d = v_{r+1}.d. All other inequalities needed for the lemma follow immediately from the induction hypothesis.
Corollary 11.6:
Suppose v_i and v_j are enqueued during BFS and that v_i is enqueued before v_j. Then v_i.d ≤ v_j.d at the time when v_j is enqueued.
Proof:
This follows immediately from Lemma 11.5 and the fact that each node only receives at most one finite d value during the execution of BFS.  □
Now we are ready to prove that breadth-first search correctly finds all shortest-path distances.
Theorem 11.7:
Suppose BFS is run from a source node s. Then
(A) upon termination, v.d = δ(s, v) for all nodes v,
(B) for any node v ≠ s that is reachable from s, one of the shortest paths from s to v is a shortest path from s to v.π followed by the link v.π → v.
Proof: (A)
We try to establish a contradiction, so
(⋆) assume there exists a node v with v.d ≠ δ(s, v). If there are several, choose a node with minimal δ(s, v).
We know that s.d = 0 is correct, so v ≠ s. From Lemma 11.4, we know v.d ≥ δ(s, v) and thus v.d > δ(s, v). We can also conclude that v is reachable from s because otherwise δ(s, v) = ∞ and hence v.d = ∞ = δ(s, v). Let u be the node immediately preceding v on a shortest path from s to v, so that δ(s, v) = δ(s, u) + 1. Because δ(s, u) < δ(s, v), we must have u.d = δ(s, u); otherwise v would not have been a misclassified node with minimal distance. Combining these results,

v.d > δ(s, v) = δ(s, u) + 1 = u.d + 1.    (48)

The node u is, because of its definition, reachable from s and has the correct, thus finite, d value. During the execution, BFS hence must dequeue u. At this time, v can be in three different states.
v.d == ∞: Line 11 in the pseudocode then sets v.d = u.d + 1, contradicting Eq. 48.
v.d < ∞ and v ∉ Q: The algorithm must have already dequeued v previously. From Corollary 11.6, v.d ≤ u.d, contradicting Eq. 48.
v.d < ∞ and v ∈ Q: The algorithm must have discovered v from a node w ≠ u that is already dequeued. At the time of v's first discovery we have set v.d = w.d + 1. From Corollary 11.6, we also know w.d ≤ u.d. Putting these properties together, v.d = w.d + 1 ≤ u.d + 1, which again contradicts Eq. 48.
As a consequence, assumption (⋆) must be wrong. BFS assigns the correct distances to all nodes.
Proof: (B)
If v is reachable, we know from (A) that v.d < ∞ and, therefore, v must have been discovered from some node v.π = u with v.d = u.d + 1. Thus, we can obtain a shortest path from s to v by following a shortest path from s to v.π and then taking the link v.π → v.  □
We now know that the distances established during BFS are those of the shortest paths,
but how do we actually obtain the shortest paths? The next lemma provides an important
clue.
Lemma 11.8:
We define N* as the set of all nodes that have been enqueued during BFS. Then:
(A) We have for every node v ∈ N* \ {s} that v.π ∈ N*, so that we can properly define the auxiliary network G* = (N*, L*) with links L* = {link v → v.π : v ∈ N* \ {s}}. (In a directed network, the link v → v.π may not be part of the original network, but v.π → v is guaranteed to exist because v was discovered via this link.)
(B) The out-degree in G* of all nodes in N* \ {s} equals 1. The out-degree of s equals 0.
(C) G* is a directed acyclic network.
(D) There is exactly one path from t ∈ N* \ {s} to s in (N*, L*). This is a shortest path from s to t in the original network in reverse order.
Proof: (A)
All nodes in N* are, by definition, enqueued at some point during the execution of BFS. Except s, all of these must have undergone the assignment v.π = u in line 12, where u is a previously enqueued node.
Proof: (B)
Follows from (A) and the fact that for every v there is exactly one v.π. The link s → s.π = NIL is explicitly excluded by the definition of L*.
Proof: (C)
Before v is enqueued, lines 11 and 12 in BFS have set v.d = (v.π).d + 1 > (v.π).d. Because all nodes in N* are enqueued exactly once, this inequality stays intact until termination. Thus, the distance labels satisfy the conditions of Prop. 4.7(B). The network must therefore be acyclic.
Proof: (D)
Consider the following algorithm.
Shortest-BFS-Path(G, s, t)
1 BFS(G, s)
2 if t ≠ s and t.π == NIL
3     print "t is not in the out-component of s"
4 else u = t.π
5     print "The predecessor of t is u"
6     while u.π ≠ NIL
7         v = u.π
8         print "The predecessor of u is v"
9         u = v
The while loop repeatedly steps from a node to its predecessor if it exists. The loop must terminate because G* is acyclic; otherwise, if we ran into an endless loop, we would have to revisit one of the nodes u and could thus construct a cycle in N*. We know from (B) that the only node in N* without a predecessor is s, so this must be the node where the while loop terminates. At all previous steps, there was no alternative link in L* from the current node, so the path to s is unique. Using Thm. 11.7(B) inductively proves that the discovered path must indeed be a shortest path.

Remark: BFS calculates the shortest paths from s to every other node in the network.
This may look like overkill if all we want is the shortest path from s to one specific
target node t. We can of course terminate BFS earlier, namely as soon as we discover t.
However, this does not change the worst-case run time O(m) +O(n). In fact, there is no
algorithm known to find a single shortest path that has a better performance.
12 Shortest-path algorithm - the weighted case
In Sec. 11 we implicitly used the minimum number of links between two nodes as a
measure of distance. This is appropriate in many, but not all networks. Especially in
networks where some commodity is transported across the links, there are usually different costs associated with different links. For example, these may be travel times or ticket costs in a passenger network or transmission delays on the Internet. One important case is costs proportional to geometric distances, measured in kilometres rather than in the
number of traversed links (Fig. 38). But even if the costs are not directly determined by
Figure 38: If links are weighted, the path with the smallest number of links may not be the path with the smallest sum of weights. Depicted is an example where links are weighted by Euclidean distance. Obviously, the path with the smallest number of links takes a big geometric detour. Conversely, the path with the shortest geometric distance traverses many different links.
geometry, it is often convenient to interpret them as some kind of distance between the
nodes that we would like to minimise over a path from a node s to another node t.
Let us denote by c_{ij} the cost or weight of a link from j to i. We can store the cost as
an additional attribute in the adjacency list, so that the memory requirements remain
O(m) + O(n).
Weighted shortest-path problem:
For two given nodes s and t, find a path P : s = v_0 → v_1 → . . . → v_k = t so that

C(P) = \sum_{i=1}^{k} c_{v_i, v_{i-1}}

is minimised.
We will investigate only the case where c_{ij} ≥ 0 for all links j → i, which covers the most common problems.[10] For example, the shortest-path problem in Sec. 11 is the special case where all c_{ij} = 1.
[10] If negative costs are permitted, the problem becomes considerably more difficult. For example, if there is a cycle of negative weight, we may be able to minimise the cost by going around the cycle infinitely often.
12.1 Dijkstra's algorithm
If the c_{ij} do not have a constant value, breadth-first search does not give the correct answer (Fig. 38). The problem is that, when we discover a node v from another node t, we no longer know with certainty that a shortest path to v will pass through the link t → v (Fig. 39). At the moment of discovery, the best we can do is to provide an upper
Figure 39: A network with weighted distances (numbers next to the links). Suppose we have already determined a shortest path from s to t and we know it is of length 9. Exploring the neighbours of t, we can establish upper bounds (red) of their distance from s, but these are generally overestimates, as seen here in the case of v.
bound on the distance by adding the link distances of the neighbours to an established distance. However, we will prove that the smallest estimated distance is exact.
The argument in short is this.
Figure 40: Paths in Dijkstra's algorithm. (The figure shows the set S of nodes with known distances and its complement S̄ of nodes with estimated distances, the nodes s, x, y, u, and the path segments P_1 and P_2.)
Consider the situation depicted in Fig. 40. Suppose that we know the shortest path from s to x and we also know that u is the node with the smallest estimated (but not yet certain) distance. If u's estimated distance is not the exact shortest-path distance, then there must be another shorter path s, . . . , x, y, . . . , u. Because all distances are non-negative, the sub-path from s to y via x must be shorter than the path along which we have first discovered u (the upper path in the figure). But this contradicts that u's estimated distance is smaller than y's.
This idea leads to the following procedure, known as Dijkstra's algorithm.
(i) Initialisation: Set S = ∅ and S̄ = N. For all nodes v, set v.d = ∞ and v.π = NIL. (We will prove that S is the set of nodes with known distance from s and S̄ its complement. But let us assume we do not know this yet.)
(ii) Set s.d = 0, but do not yet move it to S.
(iii) Let u ∈ S̄ be a node in S̄ for which u.d = min{d(j) : j ∈ S̄}. Insert u into S and remove it from S̄.
(iv) Go through all neighbours v of u. If v.d > u.d + c_{vu}, then update our distance estimate: v.d = u.d + c_{vu}. In this case also set v.π = u.
(v) If S̄ is not yet equal to ∅, go back to step (iii).
In Fig. 41, a numerical example illustrates how Dijkstra's algorithm works.
Figure 41: An illustration of Dijkstra's algorithm. Undiscovered nodes with d value equal to ∞ are white. Grey nodes are discovered but their distances are only estimates. Black nodes are moved to the set S. The link from a node to its predecessor is indicated by a light grey arrow. The situations depicted are at the beginning of step (iii).
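Before turning to the correctness proof, here is a direct translation of steps (i)-(v) into Python (a sketch, not part of the original notes; it scans all of S̄ for the minimum in every iteration, i.e. the naive variant whose minimum search costs O(n) per iteration, and the weighted adjacency list is a made-up example).

import math

def dijkstra_naive(adj, s):
    """adj[u] is a list of (v, c_vu) pairs for links u -> v; returns d and pi."""
    d = {u: math.inf for u in adj}
    pi = {u: None for u in adj}
    d[s] = 0
    S_bar = set(adj)                        # step (i): all nodes still unprocessed
    while S_bar:                            # step (v)
        u = min(S_bar, key=lambda j: d[j])  # step (iii): smallest estimated distance
        S_bar.remove(u)
        if d[u] == math.inf:
            break                           # remaining nodes are unreachable
        for v, c in adj[u]:                 # step (iv): relax all links u -> v
            if d[v] > d[u] + c:
                d[v] = d[u] + c
                pi[v] = u
    return d, pi

adj = {'s': [('u', 4), ('v', 1)], 'v': [('u', 2)], 'u': [('t', 1)], 't': []}
d, pi = dijkstra_naive(adj, 's')
print(d['t'], pi['t'])   # 4 u  (s -> v -> u -> t has total weight 1 + 2 + 1 = 4)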
12.2 Proof that Dijkstra's algorithm is correct
Let us now formally prove that the u.d values in Dijkstra's algorithm return the correct shortest-path distances δ(s, u). We first need to establish that subpaths of shortest paths are themselves shortest paths.
Lemma 12.1:
Let P : v_0 → v_1 → . . . → v_k be a shortest path from v_0 to v_k and, for any i, j with 0 ≤ i ≤ j ≤ k, let P_{ij} : v_i → v_{i+1} → . . . → v_j be the subpath of P from v_i to v_j. Then P_{ij} is a shortest path from v_i to v_j.
Proof:
If there is another path \tilde{P}_{ij} from v_i to v_j with less weight than P_{ij}, then we could go from v_0 to v_k along the following path \tilde{P}:
Follow P from v_0 to v_i.
Follow \tilde{P}_{ij} from v_i to v_j.
Follow P from v_j to v_k.
This would be a path of smaller weight than P, which contradicts the conditions in the lemma.  □
Another fundamental property of shortest paths is a network equivalent of the triangle inequality.
Lemma 12.2:
Consider a weighted, directed network with c_{vu} ≥ 0 for all links u → v and source node s. Then the inequality

δ(s, v) ≤ δ(s, u) + c_{vu}    (49)

holds for all links u → v.
Proof:
Case 1: u is not in the out-component of s. Then δ(s, u) = ∞ and, regardless of whether δ(s, v) is finite or not, the inequality in Eq. 49 is satisfied.
Case 2: u is in the out-component of s. Then v is also in the out-component. Let P be a shortest path from s to v. This shortest path must, by definition, have no more weight than the particular path that takes a shortest path from s to u followed by the link u → v.  □
Next we show that the d labels assigned during Dijkstra's algorithm are upper bounds of the shortest-path distances.
Lemma 12.3:
At any moment during the execution of Dijkstra's algorithm, v.d ≥ δ(s, v) for all nodes v.
Proof:
The proof is by induction over the number of distance updates in steps (ii) and (iv) of Dijkstra's algorithm.
Induction hypothesis: v.d ≥ δ(s, v) is true for all v after a distance update.
Induction basis: The first update is in step (ii) where we set s.d = 0, which is the correct shortest-path distance δ(s, s). All other distances are at this point v.d = ∞, which is certainly an upper bound for δ(s, v).
Induction step: Consider what happens in step (iv) to a node v whose distance we are about to update because we have discovered a link u → v that improves our estimate. Then

v.d = u.d + c_{vu} ≥(A) δ(s, u) + c_{vu} ≥(B) δ(s, v),

where we have used (A) the induction hypothesis and (B) the triangle inequality (Eq. 49). All other distances x.d remain unchanged and satisfy x.d ≥ δ(s, x) because of the induction hypothesis.  □
Corollary 12.4:
If Dijkstra's algorithm sets v.d = δ(s, v) at any point during its execution, then this equality is maintained until termination.
Proof:
In Dijkstra's algorithm, any distance update can only decrease, but never increase, the label v.d. The corollary then follows from Lemma 12.3.  □
Now we are prepared for the proof that Dijkstra's algorithm is correct.
Theorem 12.5:
Dijkstra's algorithm, run on a weighted, directed network with weights c_{vu} ≥ 0 for all links u → v and source s, terminates with u.d = δ(s, u) for all nodes u.
Proof:
The proof is by induction on the number of iterations of step (iii).
Induction hypothesis: The distance label of each node in S is correct. It suffices to show that the newly added node u in step (iii) satisfies u.d = δ(s, u) immediately after step (iii). Because of Corollary 12.4 we then know that the d value of this node will not change during the rest of the algorithm.
Induction basis: Initially S = {s}, which has the correct d value s.d = δ(s, s) = 0.
Induction step:
Suppose there exists a node u that has u.d ≠ δ(s, u) when it is added to S in step (iii). If there are several such nodes, we take the first misclassified node u encountered during the execution of the algorithm. We know u ≠ s, because we have already established that s is given the correct d value. Therefore, S ≠ ∅ just before u is added. We also know that there must be a path from s to u because otherwise δ(s, u) = ∞ and, from u.d ≥ δ(s, u), we would have u.d = ∞ = δ(s, u). Let us then choose a shortest path P from s to u. Before adding u to S, P connects a node in S (namely s) to a node in the complement S̄ (namely u). Let us consider the first node y along P such that y ∈ S̄ and let x ∈ S be y's predecessor along P. Thus, as Fig. 40 illustrates, we can decompose P into
a path P_1 from s to x that is completely in S,
the link x → y,
the rest of the path, P_2.
It is possible that P_1 or P_2 consist of zero links.
We now want to show that y.d = δ(s, y) when u is added to S. To see that this is the case, we note that x.d = δ(s, x) because of the induction hypothesis. We then must have

y.d ≤ x.d + c_{yx}    (50)

because we must have already scanned y as a neighbour of x in step (iv), where we have either set y.d = x.d + c_{yx} or we have found at that point y.d ≤ x.d + c_{yx} and did not change y.d. During any subsequent encounter of y in step (iv), y.d cannot have increased, so that Eq. 50 must be true. Because x.d = δ(s, x), we can deduce

y.d ≤ δ(s, x) + c_{yx}.

Moreover, from Lemma 12.1, δ(s, x) + c_{yx} = δ(s, y), thus

y.d ≤ δ(s, y).

But we already know from Lemma 12.3 that y.d ≥ δ(s, y) and therefore

y.d = δ(s, y).

This now allows us to construct a contradiction to prove u.d = δ(s, u). Because y appears before u on a shortest path from s to u and all link weights are non-negative (including those on path P_2), we have

y.d = δ(s, y) ≤ δ(s, u) ≤ u.d,

where we have again used Lemma 12.3 in the last inequality. However, y is in S̄ and, if y.d < u.d, we should have chosen y instead of u to be added to S first. The only way out of this contradiction is y.d = u.d, but then

y.d = δ(s, y) = δ(s, u) = u.d

and thus δ(s, u) = u.d, violating our assumption about u. In summary, all nodes added to S have the correct distance labels.  □
For the sake of completeness, we would still have to show the equivalent of Lemma 11.8 for Dijkstra's algorithm, namely that v.π is indeed a predecessor on a shortest path to v. I will leave this as a homework problem.
12.3 Binary heaps
Dijkstra's algorithm is based on finding the minimum of the d values. Implemented naively, we would in the worst case read through a list of n items to find the minimum, which requires O(n) time for every iteration of step (iii). But we can do much better if we implement Dijkstra's algorithm using a binary heap. It reduces the time per iteration of (iii) to O(log n), which is much less than O(n) because the logarithm increases quite slowly.
A binary heap is a special version of a binary tree with an associated index. Every element in the tree consists of a unique label ∈ N and a value ∈ R. All levels of the tree are completely filled except possibly the lowest level, which is filled contiguously starting from the left.
The tree is stored in memory as an array H whose elements are the same as those in the tree. The array order is determined by going through the tree from left to right and top to bottom. An example is shown in Fig. 42.
The index I is in some sense the inverse of H. If the element labeled i is at the j-th
position of H, then j is at the i-th position of I (Fig. 42).
The defining property of the binary heap is that the tree is partially ordered. This
means that the value of every element is greater than or equal to the value of the element
above. As a consequence, the element with the smallest value is at the top of the tree.
This property allows us to quickly identify the minimum value in a set and is the main
reason why binary heaps are used in practice.
We can perform the following operations on the heap.
76
Tree
5
0.4
6
1.9
1
1.3
12
5.6
3
2.0
11
5.7
8
1.7
4
8.4
7
9.1
10
7.4
2
3.8
9
6.6
Label
Value
Next
available
space
Array H
5
0.4
6
1.9
1
1.3
12
5.6
3
2.0
11
5.7
8
1.7
4
8.4
7
9.1
10
7.4
2
3.8
9
6.6
1 2 3 4 5 6 7 8 9 10 11 12 ...
Index I
Label
Array position
1 2 3 4 5 6 7 8 9 10 11 12
3 11 5 8 1 2 9 7 12 10 6 4
Figure 42: Illustration of a binary heap.
Figure 43: After inserting the element with label 13 into the heap of Fig. 42, we need two sift-up operations to restore partial order in the heap.
Figure 44: After deleting the minimum, we need two sift-down operations.
Inserting an element:
When we add an item to the heap, it is placed in the first available space at the
bottom of the tree. If the bottom row is full, we start a new row. The new value
may violate the heap property if it is smaller than the value above. To restore the
order, we perform a sift-up operation: we swap the element with the one above
it. If the tree is still not ordered, we repeat the sift-up operation until the new
item has either an upper neighbour of smaller value or has reached the top of the
tree (Fig. 43). If there are n elements in the tree, the maximum number of sift-up
operations is the depth of the tree which scales O(log n).
Decreasing a value in the heap:
If we decrease the value of an element that is already in the heap, we may violate
the partial order. To restore it, we perform the same sift-up operation as described
above. In the worst case, we need O(log n) iterations until the element has reached
its correct position.
Deleting the minimum value:
It is easy to find the minimum: it must be at the top of the tree. What follows after we remove this element is a little more complicated. We first fill the empty
space with the last element in the tree. This element usually does not have the
minimum value and thus violates the partial order. To move it to the right place,
we perform sift-down operations: if the value is bigger than one of the neighbours
below, it trades position with the smallest such neighbour. In the worst-case, we
may have to iterate O(log n) sift-down operations until the element is back at the
bottom of the tree.
In pseudo-code notation, we initialise the heap simply by setting its length equal to 0.
Initialise-Heap(H)
1 H.length = 0
We need functions that can determine which elements are above or below a certain ele-
ment. I will call the element above parent and the two elements below left and right
child. The following functions use the positions in H as input and output.
Parent(c)
1 return ⌊c/2⌋
Left-Child(p)
1 return 2p
Right-Child(p)
1 return 2p + 1
1
2 3
4 5 6 7
8 9 10 11 12 13 14 15
The next function returns the array position of the child with the smaller value. Here
and in the rest of the pseudo-code, it is prudent to check if the called heap position is
indeed in the heap. For the sake of simplicity, I omit such sanity checks here.
Min-Child(H, p)
1 l = Left-Child(p)
2 r = Right-Child(p)
3 if H[l].v H[r].v
4 return l
5 else return r
We swap two elements as follows.
Swap(H, I, pos1, pos2)
1 auxl = H[pos1].l // Swap the elements at pos1-th and pos2-th position in the tree.
2 auxv = H[pos1].v
3 H[pos1].l = H[pos2].l
4 H[pos1].v = H[pos2].v
5 H[pos2].l = auxl
6 H[pos2].v = auxv
7 I[H[pos1].l] = pos1 // Update the index.
8 I[H[pos2].l] = pos2
The sift-up operations perform repeated swaps on a tree element and its parent.
Sift-Up(H, I, pos)
1 c = pos // Child.
2 p = Parent(c) // Parent.
3 while c > 1 and H[c].v < H[p].v // Iterate until child is at top of tree or order is restored.
4 Swap(H, I, c, p)
5 c = p // New child.
6 p = Parent(c) // New parent.
Sifting down involves swaps with the smaller-valued child.
Sift-Down(H, I, pos)
1 p = pos // Parent.
2 c = Min-Child(H, p) // Child.
3 while c H.length and H[p].v > H[c].v // Iterate until parent is at bottom or order is restored.
4 Swap(H, I, p, c)
5 p = c // New parent.
6 c = Min-Child(H, p) // New child.
Insertion adds a new element at the end of the heap which is then sifted up.
Insert(H, I, label, value)
1 H.length = H.length + 1
2 H[H.length].l = label
3 H[H.length].v = value
4 I[label] = H.length
5 Sift-Up(H, I, H.length)
Decreasing the value of an existing node must also be followed by iterative sift-up oper-
ations.
Decrease-Value(H, I, label, value)
1 if H[I[label]].v < value
2 error New value greater than current value
3 H[I[label]].v = value
4 Sift-Up(H, I, I[label])
Deleting the minimum, on the other hand, requires the sift-down routine.
Delete-Min(H, I)
1 if H.length == 0
2     error "Heap empty"
3 minl = H[1].l // The minimum is at the top of the tree.
4 minv = H[1].v
5 H[1].l = H[H.length].l // Move last element to the top.
6 H[1].v = H[H.length].v
7 H[H.length].v = ∞ // Make sure last heap position is never returned as minimum child.
8 H.length = H.length − 1 // Reduce heap size.
9 I[H[1].l] = 1 // Update index.
10 Sift-Down(H, I, 1)
11 return minl
The procedures Parent, Left-Child, Right-Child, Min-Child and Swap are all
O(1) in time. The while-loops in Sift-Up and Sift-Down are carried out at most O(log n)
times, so that Sift-Up, Sift-Down, Insert, Decrease-Value and Delete-Min are
all O(log n) procedures.
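For readers who prefer working code to pseudo-code, here is a minimal Python sketch of such an indexed binary min-heap. It is only an illustration: the class and method names are my own, and Python lists are 0-indexed, so the parent of position i is (i − 1)//2 rather than ⌊i/2⌋.

# A minimal indexed binary min-heap (illustrative sketch, not from the notes).
# "heap" stores [value, label] pairs; "pos" maps label -> current heap position.
class IndexedMinHeap:
    def __init__(self):
        self.heap = []
        self.pos = {}

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            left, right, smallest = 2 * i + 1, 2 * i + 2, i
            if left < n and self.heap[left][0] < self.heap[smallest][0]:
                smallest = left
            if right < n and self.heap[right][0] < self.heap[smallest][0]:
                smallest = right
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, label, value):
        self.heap.append([value, label])
        self.pos[label] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def decrease_value(self, label, value):
        i = self.pos[label]
        assert value <= self.heap[i][0], "new value greater than current value"
        self.heap[i][0] = value
        self._sift_up(i)

    def delete_min(self):
        minv, minl = self.heap[0]
        last = self.heap.pop()
        del self.pos[minl]
        if self.heap:
            self.heap[0] = last
            self.pos[last[1]] = 0
            self._sift_down(0)
        return minl, minv

The dictionary pos plays the role of the index array I above: it lets decrease_value find an element's current position in O(1) time.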
12.4 Heap implementation of Dijkstra's algorithm
Now we are prepared to implement step (iii) in Dijkstra's algorithm, where we need to
find the minimum of all estimated distances, using a binary heap. This is admittedly
more difficult to code than a straightforward scan through all estimated distances, but
for sparse networks it saves us a substantial amount of time.
There is one final subtlety that saves us a little more time. We do not need to keep the
entire set N \ S stored in the heap, because nodes with an estimated distance ∞ would only
be returned as a minimum after the distances in the complete out-component of s are
exactly known. But then we can stop the whole process because the remaining infinite
distances are correct. We use this observation in the pseudo-code below, where we only
transfer nodes with a finite estimated distance to the heap. We denote the network by
G, the set of nodes by G.N, the adjacency list of node u by G.Adj[u] and the set of link
weights by c.
Dijkstra(G, c, s)
1 for each node u ∈ G.N // Initially all nodes are undiscovered.
2     u.d = ∞
3     u.π = NIL
4 s.d = 0 // Discover the source.
5 Initialise-Heap(H)
6 Insert(H, I, s, s.d)
7 while H.length ≠ 0 // Iterate until the heap is empty.
8     u = Delete-Min(H, I)
9     for each v ∈ G.Adj[u]
10        estimate = u.d + c_vu // New distance estimate.
11        if v.d > estimate // Only proceed if estimate is an improvement.
12            if v.d == ∞
13                v.d = estimate // Discover v.
14                v.π = u
15                Insert(H, I, v, v.d)
16            else v.d = estimate // We have found a better estimate.
17                v.π = u
18                Decrease-Value(H, I, v, v.d)
If we label the nodes 1, . . . , n, the heap H and the index I are both arrays of length n. Including the
space needed for the adjacency list, we need a total memory of O(m) + O(n).
The run time is determined by the number of heap operations Delete-Min, Insert
and Decrease-Value. We encounter the first two at most n times and the third
at most m times. Since every single heap operation needs O(log n) time, Dijkstra's
algorithm runs in O((m + n) log n). For sparse networks, this simplifies to O(n log n), which
is the fastest weighted shortest-path algorithm known to date.[11]
If we are interested in the shortest paths as well as the distances, we should run the
following code, which is almost identical to the one we have seen in the unweighted case.
The only difference is that we call Dijkstra instead of BFS in the first line.
Shortest-Dijkstra-Path(G, c, s, t)
1 Dijkstra(G, c, s)
2 if t ≠ s and t.π == NIL
3     print "t is not in the out-component of s"
4 else u = t.π
5     print "The predecessor of t is u"
6     while u.π ≠ NIL
7         v = u.π
8         print "The predecessor of u is v"
9         u = v
[11] If the network is not sparse, one can achieve a better asymptotic run time, O(m + n log n), with a
data structure known as a Fibonacci heap. In practice, most networks are sparse, so that a Fibonacci heap
does not accelerate the computation compared to a binary heap. For sparse networks, the Fibonacci
heap requires so much computational overhead that it is usually even slower.
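As a sketch of how this looks in practice, the following Python function implements Dijkstra's algorithm with the standard library module heapq. Instead of a Decrease-Value operation it uses the common "lazy deletion" trick (push a duplicate entry and skip stale ones when they are popped); the function name, the adjacency-dictionary format and the toy network are my own choices, not part of the notes.

import heapq

def dijkstra(adj, source):
    """Shortest distances from source in a weighted directed graph.
    adj: dict mapping node -> list of (neighbour, weight) pairs."""
    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]                  # (estimated distance, node)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):   # stale entry, already improved
            continue
        for v, w in adj.get(u, []):
            est = d + w
            if est < dist.get(v, float("inf")):
                dist[v] = est
                pred[v] = u
                heapq.heappush(heap, (est, v))  # lazy "decrease-value"
    return dist, pred

# Example usage on a small invented network:
adj = {"a": [("b", 2), ("c", 5)], "b": [("c", 1), ("d", 4)], "c": [("d", 1)]}
print(dijkstra(adj, "a")[0])  # {'a': 0.0, 'b': 2.0, 'c': 3.0, 'd': 4.0}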
13 Minimum cost flows: basic algorithms
13.1 Introduction
In a minimum cost flow problem, we wish to find a flow of a commodity from a set of
supply nodes to a set of demand nodes that minimises the total cost caused by trans-
porting the commodity across the network. Minimum cost flow problems arise in many
industrial applications.
Example:
A car manufacturer has two production plants, delivers to two retail centres and offers
three different car models. The retail centres request a specific number of cars of each
model. The firm must
- determine the production plan of each model at each plant,
- find a shipping pattern that satisfies the demands of each retail centre,
- minimise the overall cost of production and transportation.
Figure 45: Production-distribution model. Four columns of nodes: plant nodes (p_1, p_2), plant/model nodes (p_i/m_j), retailer/model nodes (r_i/m_j) and retailer nodes (r_1, r_2).
We can map this problem onto a network by introducing four kinds of nodes (Fig. 45):
- plant nodes, representing the various plants,
- plant/model nodes, corresponding to each model made at a plant,
- retailer/model nodes, corresponding to the models required by each retailer,
- retailer nodes, representing each retailer.
There are three types of links.
- Production links, connecting a plant to a plant/model node. The cost of such a link is the cost of producing the model at this plant.
- Transportation links, connecting plant/model nodes to retailer/model nodes. The cost of such a link is the cost of shipping one car from the plant to the retail centre.
- Demand links, connecting retailer/model nodes to the retailer nodes. These arcs have zero cost.
An important feature of such distribution problems is capacity constraints.
- Maximum capacity for production links: Production plants can only manufacture a limited number of cars per unit time.
- Maximum capacity for transportation links: The number of available trains/ships etc. to deliver the products to the retail centres is limited.
- Maximum capacity for demand links: The retail centre can only sell as many cars as demanded by the customers.
The optimal solution for the firm is a minimum cost flow of cars from the plant nodes to
the retailer nodes that satisfies these capacity constraints.
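Purely as an illustration of how such an instance might be written down before handing it to a minimum cost flow solver, here is a small Python encoding; every node name, cost and capacity below is invented and does not come from the example.

# Hypothetical encoding of a (much smaller) production-distribution instance.
supply = {"p1": 30, "p2": 25, "r1": -20, "r2": -35}   # r_i: supply > 0, demand < 0

# link (tail, head): (cost per car, capacity)
links = {
    ("p1", "p1/m1"): (100, 20), ("p1", "p1/m2"): (120, 20),    # production links
    ("p2", "p2/m1"): (110, 15), ("p2", "p2/m2"): (105, 15),
    ("p1/m1", "r1/m1"): (8, 10), ("p1/m1", "r2/m1"): (12, 10),  # transportation links
    ("p2/m2", "r1/m2"): (9, 10), ("p2/m2", "r2/m2"): (7, 10),
    ("r1/m1", "r1"): (0, 12), ("r1/m2", "r1"): (0, 8),          # demand links, zero cost
    ("r2/m1", "r2"): (0, 20), ("r2/m2", "r2"): (0, 15),
}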
13.2 Notation and assumptions
Let G = (N, L) be a directed network with a cost c_l and a capacity u_l associated with
every link l ∈ L. We associate with each node i ∈ N a number r_i which indicates
- the supply if r_i > 0,
- the demand if r_i < 0.
Definition 13.1:
Let G = (N, L) be a directed network. A vector f = (f_l)_{l∈L} that satisfies the constraints

(flow balance)   ∑_{l points out of i} f_l − ∑_{l points into i} f_l = r_i   for all i ∈ N,   (51)
                 (the first sum is the out-flow of node i, the second its in-flow)

(capacity constraints)   0 ≤ f_l ≤ u_l   for all l ∈ L,   (52)

is called a feasible flow.

Minimum cost flow problem:
Find the feasible flow that minimises
    C(f) = ∑_{l∈L} c_l f_l.   (53)
Assumptions:
(A) All input data (cost, supply/demand, capacity) are integers.
(B) There exists a feasible flow.
(C) The total supply equals the total demand, ∑_{i∈N} r_i = 0.
(D) All costs are non-negative, c_l ≥ 0 for all l ∈ L.
(E) If L contains a link i → j, then it does not contain a link in the opposite direction j → i.
Figure 46: Converting a network (a) with antiparallel links to an equivalent one (b) without
antiparallel links. The numbers indicate capacities. We add an auxiliary node k and replace
the link j → i by the pair of links j → k and k → i, both with the same capacity u_2 as the original
link.
The last assumption is primarily to make the notation simpler. It does not actually cause
any loss of generality because, if there are antiparallel links, we can perform the network
transformation depicted in Fig. 46.
Definition 13.2:
Let G = (N, L) be a directed network and f a vector satisfying the capacity constraints
0 ≤ f_l ≤ u_l. Such a vector is called a pseudo-flow because it may not satisfy the flow
balance equation (51). Define an additional set of links L_mirror by

    link i → j ∈ L_mirror  ⟺  link j → i ∈ L,

so that L_mirror contains the antiparallel links of L. Because of assumption (E), L and
L_mirror are disjoint sets. This allows us to define the function mirror : (L ∪ L_mirror) → (L ∪ L_mirror) with

    mirror(l) = link j → i ∈ L         if l : i → j ∈ L_mirror,
    mirror(l) = link j → i ∈ L_mirror  if l : i → j ∈ L,

so mirror(l) is the antiparallel link of l.
The residual cost c^res_l is defined for all l ∈ (L ∪ L_mirror) by

    c^res_l = c_l            if l ∈ L,
    c^res_l = −c_mirror(l)   if l ∈ L_mirror,

and the residual capacity is defined by

    u^res_l = u_l − f_l      if l ∈ L,
    u^res_l = f_mirror(l)    if l ∈ L_mirror.
Definition 13.3:
The residual network for a given network G = (N, L) and a given pseudo-flow f is the
network G(f) = (N, L(f)) with L(f) = {l ∈ L ∪ L_mirror : u^res_l > 0}.
Example:
In Fig. 47(a), the black arrows show a directed network G with costs c_l and capacities u_l
indicated near the links. The flows f_l are given by the red numbers in (b). The
corresponding residual network G(f) is shown in Fig. 47(c), where the numbers near the
links are now the residual costs and capacities.

Figure 47: (a) Original network, each link labelled with (cost, capacity). (b) Flow. (c) Residual network, each link labelled with (residual cost, residual capacity).
Motivation behind defining the residual network:
Most algorithms to find minimum cost flows are iterative and construct an intermediate
solution f. In the next iteration, the algorithm can only add flow to f on the links in the
residual network G(f):
- On a link l where f_l is at the maximum capacity, we cannot add more flow on l. However, we can send flow on the antiparallel link, which cancels out some of the flow on l.
- If f_l = 0, we can add flow to l in the next iteration as long as u_l > 0. However, we cannot reduce the flow on l by adding flow on the antiparallel link.
- If 0 < f_l < u_l, we can either add flow in the direction of l or reduce it by adding flow in the opposite direction.
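The following short Python sketch turns Definitions 13.2 and 13.3 into code: given costs, capacities and a pseudo-flow, it returns the residual costs and residual capacities. The function name and the data layout (links as (i, j) tuples, dictionaries keyed by link) are my own.

# Minimal sketch of the residual network of a pseudo-flow (Defs. 13.2 and 13.3).
def residual_network(links, cost, cap, flow):
    res_cost, res_cap = {}, {}
    for (i, j) in links:
        if cap[(i, j)] - flow[(i, j)] > 0:      # forward residual link
            res_cost[(i, j)] = cost[(i, j)]
            res_cap[(i, j)] = cap[(i, j)] - flow[(i, j)]
        if flow[(i, j)] > 0:                    # mirror (antiparallel) link
            res_cost[(j, i)] = -cost[(i, j)]
            res_cap[(j, i)] = flow[(i, j)]
    return res_cost, res_cap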
We need two more definitions, namely node potentials and reduced costs, before we can
present our minimum cost flow algorithm.
Definition 13.4:
- Any set of real values π = (π_1, . . . , π_n) associated with the nodes 1, . . . , n is called a node potential.
- If the link l points from node i to j, its reduced cost with respect to the node potential π is defined by c^π_l = c^res_l − π_i + π_j.
The following pseudo-code implements one possible technique to solve the minimum cost
flow problem, namely the successive shortest path algorithm.
Successive-Shortest-Path(G, {c_l}_{l∈L}, {u_l}_{l∈L}, {r_i}_{i∈N})
1  for each link l
2      f_l = 0
3  for each node i
4      π_i = 0
5      e_i = r_i // Initialise supply and demand.
6  Initialise the sets E = {i : e_i > 0} and D = {i : e_i < 0}.
7  while E ≠ ∅
8      Select a node p ∈ E and a node q ∈ D.
9      Determine the shortest path distances δ(p, i) from node p to all other nodes i
       in the residual network G(f), where the link weights are the reduced costs c^π_l.
       Let P be the shortest path from node p to node q.
10     for each node i
11         π_i = π_i − δ(p, i)
12     Determine Δ = min({e_p, −e_q} ∪ {u^res_l : l ∈ P}).
13     Augment Δ units of flow along the path P.
14     Update f, G(f), E, D and all reduced costs c^π_l.
Example:
In Fig. 48(a), the only supply node is a and the only demand node is d. Thus,
initially E = {a} and D = {d}. The shortest path distances with respect to the reduced
costs are δ(a, b) = 2, δ(a, c) = 2 and δ(a, d) = 3. The shortest path is P : a → c → d.
Figure 48(b) shows the updated node potentials and reduced costs. We can send
Δ = min{e_a, −e_d, u^res_ac, u^res_cd} = min{4, 4, 2, 5} = 2 units of flow along P. Afterwards, the
updated residual network looks as in Fig. 48(c).
In the second iteration, we have again E = {a} and D = {d}, but the distances are now
δ(a, b) = 0, δ(a, c) = 1 and δ(a, d) = 1. The shortest path is P : a → b → c → d. The
resulting node potentials and reduced costs are shown in Fig. 48(d). We can augment
the flow by min{e_a, −e_d, u^res_ab, u^res_bc, u^res_cd} = min{2, 2, 4, 2, 3} = 2 units. At the end of this
iteration, e_a = e_b = e_c = e_d = 0 and the algorithm terminates.
Remark:
The successive shortest path algorithm is relatively easy to implement and adequate
for many purposes. If U is an upper bound on the largest supply r_i of a node and if
Dijkstra's algorithm is implemented using binary heaps, the run time scales as O(U(m +
n)n log n). However, there are alternative methods (known as capacity scaling or cost
scaling algorithms) that achieve better worst-case run times.
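To make the pseudo-code concrete, here is a compact and deliberately unoptimised Python sketch of the successive shortest path algorithm. It assumes the input satisfies assumptions (A)-(E), recomputes the residual network from scratch in every iteration, and does not handle corner cases such as unreachable demand nodes; all names are my own.

import heapq

def successive_shortest_path(nodes, links, c, u, r):
    """Sketch only: links are (i, j) tuples, c/u are cost/capacity dicts, r supplies."""
    flow = {l: 0 for l in links}
    pi = {i: 0 for i in nodes}              # node potentials
    e = dict(r)                             # remaining supply (>0) / demand (<0)

    def residual():                         # residual links: (reduced cost, capacity)
        res = {}
        for (i, j) in links:
            if u[(i, j)] - flow[(i, j)] > 0:
                res[(i, j)] = (c[(i, j)] - pi[i] + pi[j], u[(i, j)] - flow[(i, j)])
            if flow[(i, j)] > 0:
                res[(j, i)] = (-c[(i, j)] - pi[j] + pi[i], flow[(i, j)])
        return res

    def dijkstra(src, res):                 # shortest paths w.r.t. reduced costs
        dist, pred = {src: 0}, {src: None}
        heap = [(0, src)]
        while heap:
            d, i = heapq.heappop(heap)
            if d > dist.get(i, float("inf")):
                continue
            for (a, b), (rc, cap) in res.items():
                if a == i and d + rc < dist.get(b, float("inf")):
                    dist[b] = d + rc
                    pred[b] = a
                    heapq.heappush(heap, (dist[b], b))
        return dist, pred

    while any(v > 0 for v in e.values()):
        p = next(i for i in nodes if e[i] > 0)
        q = next(i for i in nodes if e[i] < 0)
        res = residual()
        dist, pred = dijkstra(p, res)
        dq = dist[q]                        # q is reachable if a feasible flow exists
        for i in nodes:                     # update potentials (distances capped at dq)
            pi[i] -= min(dist.get(i, dq), dq)
        path, j = [], q                     # recover the p-q shortest path
        while pred[j] is not None:
            path.append((pred[j], j))
            j = pred[j]
        delta = min([e[p], -e[q]] + [res[l][1] for l in path])
        for (a, b) in path:                 # augment delta units along the path
            if (a, b) in flow:
                flow[(a, b)] += delta
            else:
                flow[(b, a)] -= delta
        e[p] -= delta
        e[q] += delta
    return flow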
Convex cost flows:
In Equation (53), we have assumed that the cost c_l is independent of the flow. In some
applications this is not true. For example, in electrical resistor networks the current f
minimises the function C(f) = ∑_l R_l f_l², where R_l is the Ohmic resistance. More generally,
we would have
    C(f) = ∑_l h_l(f_l).
If h_l is a monotonic, convex, piecewise linear function with h_l(0) = 0, there is a quick
and dirty way to apply the successive shortest path algorithm.
Figure 48: Illustration of the successive shortest path algorithm. Links are labelled with (reduced cost, residual capacity); each node i is labelled with its excess e_i and its potential π_i. (a) Initial network. (b) Network after updating the node potentials π. (c) Network after augmenting two units of flow along the path a → c → d. (d) Network after updating the node potentials π. (e) Network after augmenting two units along a → b → c → d.
Consider a link whose cost is given by the function depicted in Fig. 49. If we replace the single link l with 4
different links of capacity 1 and costs equal to the different slopes (Fig. 50), then we can
apply exactly the same algorithm as before to this extended network. There are better
tailor-made algorithms for convex cost flow problems, but this network transformation is
particularly simple to programme.

Figure 49: Example of a piecewise linear, monotonic, convex function h(x) with slopes 1, 3, 6, 9 on the four unit intervals between x = 0 and x = 4.

Figure 50: Illustrating the network transformation from (a) a flow-dependent cost to (b) a flow-independent cost for the function shown in Fig. 49. The single link i → j with cost h(f) and capacity 4 is replaced by four parallel links with (cost, capacity) equal to (1,1), (3,1), (6,1) and (9,1).
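A minimal Python sketch of this link-splitting transformation (the function name is my own; it assumes the piecewise linear cost is given by its list of slopes, one per unit of capacity):

def split_convex_link(i, j, slopes):
    """Replace one link i -> j of capacity len(slopes) by parallel unit-capacity links."""
    return [(i, j, cost, 1) for cost in sorted(slopes)]

print(split_convex_link("i", "j", [1, 3, 6, 9]))
# [('i', 'j', 1, 1), ('i', 'j', 3, 1), ('i', 'j', 6, 1), ('i', 'j', 9, 1)]

Because the cost function is convex, the slopes are non-decreasing, so any minimum cost flow automatically fills the cheaper parallel links first; this is why the transformed network reproduces the original convex cost.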
14 The Price of Anarchy
In this section, we look at uncapacitated, directed networks. We drop the assumption
that flows and costs must be integers.
14.1 Introduction
Example 14.1 (Pigou, 1920):

Figure 51: Pigou's example. Two parallel paths lead from s to t.

Suppose a total of r = 10 vehicles/minute travel from s to t,
    r = f_1 + f_2.
Travel time (in minutes):
    c_1(f_1) = f_1,
    c_2(f_2) = 10.
How do drivers decide if they should take path 1 or 2?
Wardrop's principles (1952):
First principle (Nash equilibrium):
Travel times on all paths with non-zero traffic ≤ travel time on any (used or unused) path.
Second principle (social optimum):
The sum of all travel times is minimised.
Are both principles equivalent?
Let us call the sum of all travel times in the example above S.
Nash equilibrium:
c_1(f_1) = c_2(f_2)  ⇒  f_1 = 10,  f_2 = r − f_1 = 0  ⇒  S_NE = 100.

Social optimum:
S = f_1 c_1(f_1) + f_2 c_2(f_2) = f_1² + 10 f_2 = f_1² + 10(r − f_1).
The minimum satisfies
    dS/df_1 = 0  ⇒  2 f_1 − 10 = 0  ⇒  f_1 = 5.
(Check: d²S/df_1² = 2 > 0.)
At the social optimum S_SO = 75.
⇒ Wardrop's first and second principles lead to different flows and different travel times.
The ratio ρ := S_NE / S_SO is called the Price of Anarchy. In Pigou's example ρ = 4/3.
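A quick numerical sanity check of Pigou's example in Python (purely illustrative, using a brute-force grid search instead of the derivative computed above):

r = 10
def total_cost(f1):
    # S = f1*c1(f1) + f2*c2(f2) with f2 = r - f1
    return f1 * f1 + (r - f1) * 10

grid = [i / 1000 for i in range(10 * 1000 + 1)]
f1_so = min(grid, key=total_cost)
print(f1_so, total_cost(f1_so))              # ~5.0 and 75.0: the social optimum
print(total_cost(10.0) / total_cost(f1_so))  # S_NE / S_SO = 100/75 ~ 1.333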
We will prove: for every network with non-negative, linearly increasing costs on all links
l (i.e. c_l = a_l f_l + b_l with a_l, b_l ≥ 0) the Price of Anarchy satisfies ρ ∈ [1, 4/3]. The lower and upper
bounds are tight.
⇒ The Nash equilibrium cost can be at most 4/3 times the social optimum cost, regardless of
the exact network topology.
The lower bound immediately follows from the definition of the social optimum as the
flow minimising S. The upper bound needs more work ...
14.2 Notation
Consider a network with node set N and link set L.
- There are k source nodes s_1, . . . , s_k ∈ N and k sink nodes t_1, . . . , t_k ∈ N.
- Traffic from s_i is destined for t_i and has rate r_i.
- An origin-destination pair {s_i, t_i} is called a commodity.
- P_i: set of all paths from s_i to t_i.
- P := ∪_i P_i.
  (Reminder: A path is a walk that contains no cycles.)
- A flow Φ is a non-negative real vector indexed by P.
- A feasible flow satisfies ∑_{P∈P_i} Φ_P = r_i for all i ∈ {1, . . . , k}.
- A flow Φ induces a flow on the links {f_l}_{l∈L}, where f_l = ∑_{P∈P: l∈P} Φ_P.
  Let us call f the link-representation of Φ.
- Each link l has a cost c_l(f_l) = a_l f_l + b_l with a_l, b_l ≥ 0.
- The cost of a path P with respect to a flow Φ is c_P(Φ) := ∑_{l∈P} c_l(f_l).
- The total cost of Φ is S(Φ) := ∑_{P∈P} c_P(Φ) Φ_P.

Note: we can express S also as a function of the flows on the links:

    S(Φ) = ∑_{P∈P} [ ∑_{l∈P} c_l(f_l) ] Φ_P = ∑_{l∈L} [ ∑_{P∈P: l∈P} Φ_P ] c_l(f_l) = ∑_{l∈L} f_l c_l(f_l).
Figure 52: A two-commodity network with sources s_1, s_2, sinks t_1, t_2 and links numbered 1, . . . , 8. The induced link flows are f_1 = r_2, f_2 = f_3 = r_1 + (1/3) r_2, f_4 = f_5 = r_1, f_6 = r_2 and f_7 = f_8 = (2/3) r_2.
14.3 Flows at a social optimum
Additional notation:
    h_l(f_l) := f_l c_l(f_l) = a_l f_l² + b_l f_l.
The derivative h'_l(f_l) = 2 a_l f_l + b_l is called the marginal cost function. The cost of
adding an infinitesimal amount ε of flow on link l equals ε h'_l(f_l) + O(ε²).
    h'_P(Φ) := ∑_{l∈P} h'_l(f_l) = ∑_{l∈P} (2 a_l f_l + b_l).

Definition 14.2:
A feasible flow Φ is a social optimum if it minimises S(Φ) = ∑_{l∈L} h_l(f_l).

Existence of a social optimum:
S(Φ) is continuous and defined on a closed, bounded region (namely the set of feasible
flows). ⇒ There is always a social optimum.
Proposition 14.3:
Let Φ* be a feasible flow and f* its link-representation. Then the next three statements
are equivalent.
(A) Φ* is a social optimum.
(B) h'_{P_1}(Φ*) ≤ h'_{P_2}(Φ*) for every i ∈ {1, . . . , k} and P_1, P_2 ∈ P_i with Φ*_{P_1} > 0.
(C) For every feasible flow Φ with link-representation f,
        ∑_{l∈L} h'_l(f*_l) f*_l ≤ ∑_{l∈L} h'_l(f*_l) f_l.
Note: (B) implies that all paths P with non-zero traffic have equal h'_P(Φ*).
Proof (A) ⇒ (B):
Suppose Φ* is an optimal flow. Consider an s_i-t_i path P_1 ∈ P_i with Φ*_{P_1} > 0 and another
s_i-t_i path P_2 ∈ P_i. (Note: there must be traffic on P_1, but not necessarily on P_2.)
Transfer a small amount of flow δ ∈ (0, Φ*_{P_1}] from P_1 to P_2. This yields a feasible flow Φ
with total cost

    S(Φ) = ∑_{l∈L} h_l(f*_l) + δ [ ∑_{l∈P_2} h'_l(f*_l) − ∑_{l∈P_1} h'_l(f*_l) ] + δ² [ ∑_{l∈P_2} a_l + ∑_{l∈P_1} a_l ],

where the first sum equals S(Φ*) and where we have used the fact that all h_l are quadratic.
Because Φ* is optimal, we must have S(Φ) ≥ S(Φ*). Since δ > 0,

    ∑_{l∈P_2} h'_l(f*_l) − ∑_{l∈P_1} h'_l(f*_l) ≥ −δ [ ∑_{l∈P_2} a_l + ∑_{l∈P_1} a_l ].

(B) follows for δ → 0+.

Figure 53: A convex function h (solid curve) and its linear approximation (dashed line) at the point x*.
Proof (B) ⇒ (C):
Consider
    H(Φ) := ∑_{P∈P} h'_P(Φ*) Φ_P,
where Φ is an arbitrary feasible flow.
Note: Φ* is fixed. ⇒ h'_P(Φ*) is independent of Φ.
⇒ Finding the minimum of H(Φ) is a congestion-independent min-cost flow problem.
The problem can be solved by independently minimising the cost for every commodity
i. Given (B), the best choice is to route the flow from s_i to t_i on one of those paths P
where Φ*_P > 0. Because the cost is equal on all these paths, we can, for example, obtain
a minimum of H(Φ) by routing all flow exactly as in Φ*:
    ∑_{P∈P} h'_P(Φ*) Φ_P ≥ ∑_{P∈P} h'_P(Φ*) Φ*_P.
Rearranging the terms in the sums,
    ∑_{l∈L} h'_l(f*_l) f_l ≥ ∑_{l∈L} h'_l(f*_l) f*_l.
Figure 54: A simple example where the social optimum is not unique: two parallel links from s to t with constant costs c_1(f_1) = 1 and c_2(f_2) = 1.
Proof (C) ⇒ (A):
a_l ≥ 0 for all l ∈ L. ⇒ h_l(f_l) = a_l f_l² + b_l f_l is convex.
⇒ h_l(f_l) ≥ h_l(f*_l) + (f_l − f*_l) h'_l(f*_l), see Fig. 53.
⇒ S(Φ) = ∑_{l∈L} h_l(f_l) ≥ ∑_{l∈L} [ h_l(f*_l) + h'_l(f*_l)(f_l − f*_l) ]
        = ∑_{l∈L} h_l(f*_l) + ∑_{l∈L} h'_l(f*_l)(f_l − f*_l)
        ≥ ∑_{l∈L} h_l(f*_l) = S(Φ*),
where the last inequality follows from (C). ∎
Remark 14.4:
- If Φ and Ψ are two social optima, their costs must be the same: S(Φ) = S(Ψ). Otherwise one cost would be larger and thus not a social optimum.
- However, the flow is not unique. Consider the network in Fig. 54. If r is the traffic demand between s and t, you can distribute the traffic arbitrarily on the two links.
- On the other hand, it is possible to show that for all social optima the flows on links l with a_l > 0 are equal. (Hint: use convexity of h_l.)
14.4 Flows at Nash equilibrium
A flow is at Nash equilibrium if no user can reduce his/her travel time by unilaterally
changing paths.
We assume that all users are only responsible for an infinitesimal amount of traffic.

Definition 14.5:
A feasible flow Φ is at Nash equilibrium if for all
- commodities i ∈ {1, . . . , k},
- s_i-t_i paths P_1, P_2 ∈ P_i with Φ_{P_1} > 0,
- amounts δ ∈ (0, Φ_{P_1}] of traffic on P_1,
the costs satisfy c_{P_1}(Φ) ≤ c_{P_2}(Φ^δ), where

    Φ^δ_P = Φ_P − δ   if P = P_1,
    Φ^δ_P = Φ_P + δ   if P = P_2,
    Φ^δ_P = Φ_P       otherwise,

is the flow obtained by moving δ units of flow from P_1 to P_2.
Proposition 14.6 (Wardrop's first principle):
Let Φ be a feasible flow. The following two statements are equivalent.
(A) Φ is at Nash equilibrium.
(B) c_{P_1}(Φ) ≤ c_{P_2}(Φ) for every i ∈ {1, . . . , k} and P_1, P_2 ∈ P_i with Φ_{P_1} > 0.

Proof (A) ⇒ (B):
Let δ → 0 so that Φ^δ → Φ.
c_{P_1}(Φ) ≤ lim_{δ→0} c_{P_2}(Φ^δ) = c_{P_2}(lim_{δ→0} Φ^δ) = c_{P_2}(Φ), where the first equality uses the continuity of c_{P_2}.

Proof (B) ⇒ (A):
The cost functions c_l(f_l) are monotonically increasing.
⇒ When moving more flow to P_2, c_{P_2} cannot decrease: c_{P_2}(Φ) ≤ c_{P_2}(Φ^δ).
⇒ c_{P_1}(Φ) ≤ c_{P_2}(Φ) ≤ c_{P_2}(Φ^δ), so Φ is at Nash equilibrium. ∎
Note the similarity between the statements (B) in Propositions 14.3 and 14.6. This motivates

Proposition 14.7:
Let Φ be a feasible flow. The following two statements are equivalent.
(A) Φ is at Nash equilibrium.
(B) Φ is a minimum of S̃(Φ) := ∑_{l∈L} h̃_l(f_l), where h̃_l(f_l) = (1/2) a_l f_l² + b_l f_l.

Proof:
Because c_P(Φ) = ∑_{l∈P} c_l(f_l) = ∑_{l∈P} h̃'_l(f_l) = h̃'_P(Φ), condition (B) of Proposition 14.6 states that h̃'_{P_1}(Φ) ≤ h̃'_{P_2}(Φ) whenever Φ_{P_1} > 0.
The situation is the same as in Proposition 14.3, only with a tilde on all the function
names and constants. ∎
Remark 14.8:
In a Nash equilibrium, the cost
    c_l(f_l) = a_l f_l + b_l
is replaced by
    c̃_l(f_l) := ã_l f_l + b̃_l,
where ã_l = (1/2) a_l and b̃_l = b_l.
⇒ The flow-dependent term is given less weight.
Nash flows are socially optimal flows, but not for the correct cost functions!
                                              social optimum            Nash equilibrium
auxiliary congestion-dependent coefficient    a_l                       ã_l = (1/2) a_l
auxiliary congestion-independent coefficient  b_l                       b̃_l = b_l
auxiliary cost per unit traffic               c_l = a_l f_l + b_l       c̃_l = ã_l f_l + b̃_l
auxiliary cost for all traffic on link        h_l = c_l f_l             h̃_l = c̃_l f_l
auxiliary function minimised                  S = ∑_{l∈L} h_l(f_l)      S̃ = ∑_{l∈L} h̃_l(f_l)
real cost paid by all users                   S = ∑_{l∈L} h_l(f_l)      S = ∑_{l∈L} h_l(f_l)
Pigou's example (see Ex. 14.1):
    c̃_1(f_1) = (1/2) f_1,   c̃_2(f_2) = 10.
    S̃ = (1/2) f_1² + 10 f_2.
Because r = f_1 + f_2 = 10,
    S̃ = (1/2) f_1² + 10(10 − f_1).
Minimum:
    dS̃/df_1 = f_1 − 10 = 0.
⇒ f_1 = 10, f_2 = 0, in agreement with our results in Ex. 14.1.
Lemma 14.9:
Suppose Φ is a flow at Nash equilibrium for traffic rates r_1, . . . , r_k.
If the traffic rates are replaced by r_1/2, . . . , r_k/2, the flow Φ/2 is a social optimum for
these new rates.

Proof:
Let f be the link-representation of the flow Φ. The flow Φ/2 then has link-representation
f/2 on all the links.
    h_l(f_l/2) = [ (1/2) a_l f_l + b_l ] (f_l/2) = (1/2) h̃_l(f_l).
Because Φ is a Nash flow and thus minimises S̃ = ∑_{l∈L} h̃_l(f_l), the flow Φ/2 minimises
S = ∑_{l∈L} h_l(f_l/2) = (1/2) S̃. ∎
Corollary 14.10:
There always exists a Nash equilibrium and its cost is unique.

Proof: Existence
S̃ is continuous and the space of feasible flows is closed and bounded. ∎

Proof: Uniqueness
Suppose Φ_0 and Φ_1 are flows at Nash equilibrium.
From Prop. 14.7: Φ_0 and Φ_1 are global minima of S̃.
In particular, S̃(Φ_0) = S̃(Φ_1).
Consider Φ_λ = (1 − λ) Φ_0 + λ Φ_1, λ ∈ [0, 1].
Because Φ_0 is a global minimum and S̃ is convex,
    S̃(Φ_0) ≤ S̃(Φ_λ) ≤ (1 − λ) S̃(Φ_0) + λ S̃(Φ_1) = S̃(Φ_0)
⇒ S̃(Φ_λ) = (1 − λ) S̃(Φ_0) + λ S̃(Φ_1) for all λ ∈ [0, 1].
Let f_0, f_1 be the induced flows on the links.
⇒ ∑_{l∈L} h̃_l((1 − λ) f_{0,l} + λ f_{1,l}) = ∑_{l∈L} [ (1 − λ) h̃_l(f_{0,l}) + λ h̃_l(f_{1,l}) ].
Because all h̃_l are convex, equality can only hold if
    h̃_l((1 − λ) f_{0,l} + λ f_{1,l}) = (1 − λ) h̃_l(f_{0,l}) + λ h̃_l(f_{1,l}) for all l.
(Otherwise the = would turn into <.)
⇒ All h̃_l(f_l) = f_l c̃_l(f_l) must be linear between f_0 and f_1.
⇒ c̃_l(f_{0,l}) = c̃_l(f_{1,l}). ⇒ f_{0,l} = f_{1,l} or a_l = 0. ⇒ c_l(f_{0,l}) = c_l(f_{1,l}).
⇒ S(Φ_0) = S(Φ_1). ∎
14.5 How bad is selfish routing?
Definition 14.11:
Let Φ_SO be a socially optimal flow and Φ_NE a flow at Nash equilibrium for the same
network. The ratio
    ρ = S(Φ_NE) / S(Φ_SO)
is called the Price of Anarchy.

Remark:
Because of Remark 14.4 and Corollary 14.10, ρ only depends on the network, not on Φ_SO or Φ_NE.
We now want to give an upper bound for ρ. The next two lemmas will be helpful.
Lemma 14.12:
Every flow Φ satisfies S(Φ/2) ≥ (1/4) S(Φ).

Proof:
Let f be the flow induced on the links.
    h_l(f_l) = a_l f_l² + b_l f_l with b_l ≥ 0
⇒  h_l(f_l/2) = (1/4) a_l f_l² + (1/2) b_l f_l ≥ (1/4) a_l f_l² + (1/4) b_l f_l = (1/4) h_l(f_l)
⇒  S(Φ/2) = ∑_{l∈L} h_l(f_l/2) ≥ (1/4) ∑_{l∈L} h_l(f_l) = (1/4) S(Φ). ∎
Lemma 14.13:
Let r_i be the traffic rate from s_i to t_i and Φ* a socially optimal flow. Let f* be the induced
flow on the links.
Now consider traffic on the same network with the increased rates (1 + ε) r_i, ε > 0.
Every flow Φ feasible for the augmented rates satisfies
    S(Φ) ≥ S(Φ*) + ε ∑_{l∈L} h'_l(f*_l) f*_l,
where h'_l(f_l) = 2 a_l f_l + b_l is the marginal cost function defined at the beginning of Sec. 14.3.
Proof:
Let f be the flow on the links induced by Φ.
All h_l are convex. ⇒ h_l(f_l) ≥ h_l(f*_l) + (f_l − f*_l) h'_l(f*_l), see Fig. 53.

⇒ S(Φ) = ∑_{l∈L} h_l(f_l) ≥ S(Φ*) + ∑_{l∈L} (f_l − f*_l) h'_l(f*_l).   (54)

Apply Proposition 14.3(C) to the flow Φ/(1 + ε):

    (1/(1 + ε)) ∑_{l∈L} h'_l(f*_l) f_l ≥ ∑_{l∈L} h'_l(f*_l) f*_l.   (55)

Inserting Eq. (55) into Eq. (54) proves the lemma.
Theorem 14.14:
The Price of Anarchy has an upper bound of 4/3,
    1 ≤ ρ ≤ 4/3.

Proof:
Let Φ_NE be a flow at Nash equilibrium for traffic rates r_1, . . . , r_k.
Let Φ be an arbitrary feasible flow for the same rates.
From Lemma 14.9, Φ* = (1/2) Φ_NE is optimal for traffic rates r_1/2, . . . , r_k/2.
Let f_NE, f, f* be the link representations of Φ_NE, Φ, Φ*.
We can apply Lemma 14.13 with ε = 1,

    S(Φ) ≥ S(Φ*) + ∑_{l∈L} h'_l(f*_l) f*_l = S(Φ_NE/2) + ∑_{l∈L} h'_l(f_NE,l / 2) (f_NE,l / 2).

Next, apply Lemma 14.12 to the first term,

    S(Φ) ≥ (1/4) S(Φ_NE) + ∑_{l∈L} h'_l(f_NE,l / 2) (f_NE,l / 2).

Finally, we use that h'_l(f_l) = 2 a_l f_l + b_l and c_l(f_l) = a_l f_l + b_l,

    S(Φ) ≥ (1/4) S(Φ_NE) + (1/2) ∑_{l∈L} c_l(f_NE,l) f_NE,l = (3/4) S(Φ_NE).

In particular, choosing Φ = Φ_SO gives S(Φ_SO) ≥ (3/4) S(Φ_NE), i.e. ρ = S(Φ_NE)/S(Φ_SO) ≤ 4/3. ∎
14.6 Braess paradox
Nash equilibrium in Fig. 55a:
Suppose there is a traffic rate r = 10 between the left and right nodes.
⇒ f_a = f_b = f_c = f_d = 5
⇒ cost: S_a = f_a² + 10 f_b + 10 f_c + f_d² = 150.
Nash equilibrium in Fig. 55b:
Figure 55: The Braess paradox. The added link in (b) causes additional cost for everybody.
Let us try to reduce the cost by inserting a perfect road with cost c_e = 0 from top to
bottom.
In the Nash equilibrium, all vehicles will follow the path marked in red (i.e. f_a = f_d = f_e = 10, f_b = f_c = 0).
⇒ cost: S_b = 10 f_a + 10 f_d = 200.
⇒ Counterintuitively, S_b > S_a.
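A tiny numerical check of the two equilibria in Python (my own encoding of Fig. 55: links a and d have flow-dependent travel time f, links b and c have constant travel time 10, the added link e has travel time 0):

def total_cost(fa, fb, fc, fd, fe=0.0):
    # S = sum over links of (flow on link) * (travel time on link)
    return fa * fa + 10 * fb + 10 * fc + fd * fd + 0.0 * fe

print(total_cost(5, 5, 5, 5))        # 150: Nash equilibrium without link e
print(total_cost(10, 0, 0, 10, 10))  # 200: Nash equilibrium with link e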
Braess paradox: in a Nash equilibrium, network improvements can degrade network performance!
For linear costs, Theorem 14.14 gives an upper bound on the severity of the Braess
paradox: added links can increase travel times by at most a factor 4/3.
Remark:
- If costs are not linear, the Price of Anarchy is in general not bounded and the Braess paradox can be arbitrarily severe.
- For certain classes of non-linear convex cost functions, upper bounds for the Price of Anarchy can be proved. For details, see Tim Roughgarden, Selfish Routing and the Price of Anarchy, The MIT Press (2005).