
COMPLEX NETWORKS

Diego Garlaschelli, Frank den Hollander, Aske Plaat


October 28, 2015

Preface
Transportation networks, communication networks and social networks form the
backbone of modern society. In recent years there has been a growing fascination
with the complex connectedness such networks provide. This connectedness
manifests itself in many ways: in the rapid growth of Internet and the World-Wide Web, in the ease with which global communication takes place, in the
speed at which news and information travel around the world, and in the fast
spread of an epidemic or a financial crisis. These phenomena involve networks,
incentives, and the aggregate behaviour of groups of people. They are based on
the links that connect people and their decisions, with global consequences.
A network can be viewed as a graph: the vertices in the graph represent the
nodes of the network, and the edges connecting the vertices represent the links between the nodes. Neurons in the brain are connected by synapses, proteins
inside the cell by physical contacts, people within social groups by common interests, countries of the world by economic relationships and financial markets,
companies by trade, and computers in the Internet by cables transferring data.
Despite the very different origin and nature of these links, all systems share an
underlying networked structure representing their large-scale organisation. This
organisation in turn leads to self-organisation, and other emergent properties,
which can only be understood by analysing the overall architecture of the system
rather than its constituent elements alone.
The Science of Complex Networks constitutes a young and active area of research, inspired by the empirical study of real-world networks (whether physical, chemical, biological, economic or social). Big Data are continuously being
recorded and stored into large data sets, from biological data resulting from
DNA sequencing and the investigation of protein interactions and function, via
financial data reporting the high-frequency behaviour of stock markets, to informatics data mapping the structure and dynamics of the Internet and the
World-Wide Web. Each such data set is the analogue of the outcome of a
large-scale experiment that is rather different from the experiments carried out
in a laboratory. We are therefore experiencing an unprecedented possibility of
analysing experimental data and using them to formulate and test theoretical
models of complex networks.
Most complex networks display non-trivial topological features, with patterns
of connection that are neither purely regular nor purely random. Such features
include a heavy tail in the empirical distribution of the number of edges incident to a vertex (scale freeness), insensitivity of this distribution to the size
of the network (sparseness), small distances between most vertices (small

world), the likelihood that two neighbours of a vertex are also neighbours of each
other (high clustering), positivity of the correlation coefficient between the
numbers of edges incident to two neighbouring vertices (assortativity), community structure and hierarchical structure. The challenge is to understand the
effect such features have on the performance of the network, via the study of
models that allow for computation, prediction and control.
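Two of the quantities informally introduced above, the degree distribution and the clustering of a vertex, can be made concrete with a short sketch in Python (one of the course languages). The toy graph and the helper names are illustrative only, not part of the course code:

```python
from collections import Counter

# Toy undirected graph, stored as a dictionary of neighbour sets.
graph = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1},
    3: {0},
}

# Degree of a vertex = number of edges incident to it.
degrees = {v: len(nbrs) for v, nbrs in graph.items()}

# Empirical degree distribution: fraction of vertices with degree k.
n = len(graph)
degree_distribution = {k: c / n for k, c in Counter(degrees.values()).items()}

def local_clustering(g, v):
    """Fraction of pairs of neighbours of v that are themselves neighbours."""
    nbrs = list(g[v])
    d = len(nbrs)
    if d < 2:
        return 0.0
    links = sum(1 for i in range(d) for j in range(i + 1, d)
                if nbrs[j] in g[nbrs[i]])
    return 2 * links / (d * (d - 1))

print(degree_distribution)         # {3: 0.25, 2: 0.5, 1: 0.25}
print(local_clustering(graph, 0))  # 1/3: one of the three neighbour pairs is linked
```

Formal definitions of these quantities, and of assortativity, follow in Chapters 2 and 4.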
The present document contains the notes for the course on Complex Networks
offered by the Departments of Mathematics, Physics and Computer Science of
the Faculty of Science at Leiden University, The Netherlands. This course is
intended for third-year bachelor students and first-year master students. Its aim
is to provide an introduction to the area, covering both theoretical principles
and practical applications from various directions (see the course schedule and the table of contents below). Complex Networks is a multi-disciplinary
course: it exposes views on the area from mathematics, physics and computer
science, and is open to students from all programs in these three disciplines.
At the same time it assumes basic knowledge at the bachelor level in each of
these disciplines, including key concepts from calculus (differentiation, integration, limits), probability theory (probability distributions, random variables,
stochastic processes), statistical physics (ensembles, entropy), and computer
programming (C, Java or Python). The course is both challenging in terms of
panorama and rewarding in terms of insight.
In the course we highlight some of the many fruitful ways in which mathematics,
physics and computer science come together in the study of complex networks.
The course is divided into two parts:
(I) Theory of Networks. In Chapter 1 we provide a general introduction
to complex networks by reporting on some of the empirically observed
properties of real-world networks, highlighting the universal behaviour
observed across many of them. In Chapters 2-3 we introduce some of the
most important mathematical models of networks, from simple models
to more difficult models aimed at reproducing the empirical properties of
real-world networks. In Chapters 4-5 we offer an empirical characterization of real-world networks and a statistical-physics description of some
of the aforementioned models, with the aim of providing tools to identify
structural patterns in real-world networks. In Chapters 6-7, finally, we
review various key contributions of computer science, including algorithms
to generate random graphs and measure their properties, as well as the use
of visualization tools to gain insight into the structure of random graphs.
(II) Applications of Networks. We exploit the theoretical and methodological tools introduced in (I) to illustrate important applications in Percolation (Chapter 8), Epidemiology (Chapter 9), Pattern detection (Chapter 10), Self-organisation (Chapter 11), Network dynamics and Network
properties (Chapter 12) and Real Networks (Chapter 13). Much of (II)
deals with the interplay between structure and functionality of random
graphs, i.e., with the question how the topology of a network affects the
behaviour of a process taking place on it.
As a red thread through the course we use the so-called Configuration Model,
a random graph with a prescribed degree sequence. This allows us to link up
concepts and tools.
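Since the Configuration Model serves as the red thread, here is a minimal sketch of its construction (stub matching): each vertex i receives d_i half-edges, or stubs, and the stubs are paired uniformly at random. A careful implementation is the subject of Chapter 7; the function below is only an illustration, and it deliberately keeps whatever self-loops and multi-edges the random pairing produces:

```python
import random

def configuration_model(degrees, seed=None):
    """Pair half-edges ("stubs") uniformly at random.

    Vertex i receives degrees[i] stubs; a uniform pairing of all stubs
    yields the edge list. Self-loops and multi-edges may occur."""
    if sum(degrees) % 2 != 0:
        raise ValueError("the total degree must be even")
    rng = random.Random(seed)
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)  # a uniform shuffle induces a uniform pairing
    return [(stubs[i], stubs[i + 1]) for i in range(0, len(stubs), 2)]

edges = configuration_model([3, 2, 2, 1], seed=7)
# Whatever the random pairing, every vertex keeps its prescribed degree.
```

The key property, exploited throughout the course, is that the degree sequence is fixed by construction while everything else about the graph is as random as possible.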

The course consists of 13 lectures (1 introduction, 6 theory, 6 applications) and


6 exercise sessions. Each of lectures 2-13 contains two types of exercises:
Exercise:
Routine exercise that supports the presentation in the text.
Homework:
In-depth exercise that needs to be handed in one week after the lecture.
It is allowed and may be helpful to form teams of up to three students to work
on the homework exercises, preferably teams with a student from Mathematics,
Physics, and Computer Science each. In this case, a single file per group should
be submitted, clearly listing all names of the students whose work it represents.
At the end of the course there is a 3-hour open-book written examination.
The final grade is a weighted average of the grades for the homework (30%) and
the written examination (70%).
Dr. Diego Garlaschelli (Physics)
Prof. Dr. Frank den Hollander (Mathematics)
Prof. Dr. Aske Plaat (Computer Science)

Course overview

Chapter            Teacher      Topic

Introduction
1                  DG+FdH+AP    real-world networks, examples,
                                topological features, challenges

Theory
2                  FdH          random graphs, degree distribution, sparseness, scale freeness,
                                small world, Erdős-Rényi random graph
3                  FdH          configuration model, preferential attachment models
4                  DG           empirical topological properties of real-world networks
5                  DG           maximum-entropy, network ensembles
6                  AP           implementation basics, adjacency matrix, visualization
7                  AP           implementing the Configuration Model

Applications
8                  FdH          ordinary percolation, invasion percolation, vulnerability
9                  FdH          contact process, epidemic, spread of rumour
10                 DG           pattern detection in networks
11                 DG           self-organised networks
12                 AP           network dynamics and higher order properties
13                 AP           implementing real networks, adjacency lists

Contents

I  Theory of Networks

1  Real-world Networks                                             10
   1.1  Complex networks                                           10
   1.2  Social Networks                                            11
        1.2.1  Acquaintance Networks                               11
        1.2.2  Collaboration Networks                              12
        1.2.3  World-Wide Web                                      15
   1.3  Technological Networks                                     18
        1.3.1  Internet                                            18
        1.3.2  Transportation Networks                             21
        1.3.3  Energy Networks                                     22
   1.4  Economic Networks                                          22
        1.4.1  Financial Networks                                  22
        1.4.2  Shareholding Networks                               23
        1.4.3  World Trade Web                                     23
   1.5  Biological Networks                                        25
        1.5.1  Metabolic Networks                                  25
        1.5.2  Protein Interaction Networks and Genetic Networks   25
        1.5.3  Neural Networks and Vascular Networks               25
        1.5.4  Food Webs                                           26
   1.6  Still other types of networks                              26
        1.6.1  Semantic Networks                                   26
        1.6.2  Co-occurrence Networks                              26

2  Random Graphs                                                   31
   2.1  Graphs, random graphs, four scaling features               31
   2.2  Erdős-Rényi random graph                                   34
        2.2.1  Percolation transition                              35
        2.2.2  Scaling features                                    38

3  Network Models                                                  40
   3.1  The configuration model                                    40
        3.1.1  Motivation                                          40
        3.1.2  Construction                                        40
        3.1.3  Graphical degree sequences                          41
        3.1.4  Percolation transition                              44
        3.1.5  Scaling features                                    45
   3.2  Preferential attachment model                              45
        3.2.1  Motivation                                          45
        3.2.2  Construction                                        46
        3.2.3  Scaling features                                    49
        3.2.4  Dynamic robustness                                  50

4  Network Topology                                                53
   4.1  Basic notions                                              53
   4.2  Empirical topological properties                           56
        4.2.1  First-order properties                              57
        4.2.2  Second-order properties                             61
        4.2.3  Third-order properties                              66
        4.2.4  Global properties                                   70

5  Network Ensembles                                               77
   5.1  Equiprobability in the Erdős-Rényi model                   78
   5.2  Implementations of the Configuration Model                 79
        5.2.1  Link stub reconnection                              80
        5.2.2  The local rewiring algorithm                        81
        5.2.3  The Chung-Lu model                                  82
        5.2.4  The Park-Newman model                               84
   5.3  Maximum-entropy ensembles                                  88
        5.3.1  The Maximum Entropy Principle                       89
        5.3.2  Simple undirected graphs                            91
        5.3.3  Directed graphs                                     92
        5.3.4  Weighted graphs                                     93

6  Random Graph Implementation                                     95
   6.1  Random graph                                               96
        6.1.1  Adjacency Matrix                                    96
        6.1.2  Random Graph                                        97
        6.1.3  Mean and Variance                                   97
        6.1.4  Computing Graph Properties                          98
   6.2  Visualization                                              99
        6.2.1  Downloading Gephi                                   99
        6.2.2  Running Gephi                                       99

7  Configuration Model Implementation                              102
   7.1  Configuration Model Implementation                         102
        7.1.1  Pre-specified degree sequence                       102
        7.1.2  Visualization                                       103
   7.2  Repeated Configuration Model                               104
        7.2.1  Self-Loops and Multi-Edges                          104
        7.2.2  Check Routines                                      104
        7.2.3  Repeated Configuration Model                        105

II  Applications of Networks                                       106

8  Percolation                                                     107
   8.1  Ordinary percolation                                       107
   8.2  Invasion percolation                                       109
   8.3  Vulnerability of the configuration model                   112

9  Epidemiology                                                    116
   9.1  The contact process on infinite lattices                   116
        9.1.1  Construction                                        116
        9.1.2  Shift-invariance and attractiveness                 117
        9.1.3  Convergence to equilibrium                          118
        9.1.4  Critical infection threshold                        118
   9.2  The contact process on large finite lattices               119
   9.3  The contact process on random graphs                       120
   9.4  Spread of a rumour on random graphs                        121

10  Pattern detection in networks                                  124
    10.1  The maximum-likelihood principle                         124
          10.1.1  Motivation                                       125
          10.1.2  Generalities                                     126
          10.1.3  Erdős-Rényi random graph                         127
          10.1.4  More complicated models                          127
    10.2  Detecting structural patterns in networks                129
          10.2.1  Maximum likelihood in the configuration model    129
          10.2.2  Directed graphs                                  134
          10.2.3  General case                                     141

11  Self-Organized Networks                                        145
    11.1  Introduction                                             145
    11.2  Scale invariance and self-organization                   146
          11.2.1  Geometric fractals                               146
          11.2.2  Self-Organized Criticality                       147
    11.3  The fitness model                                        151
          11.3.1  Particular cases                                 152
    11.4  A self-organized network model                           153
          11.4.1  Motivation                                       154
          11.4.2  Definition                                       155
          11.4.3  Analytical solution                              155
          11.4.4  Particular cases                                 159
    11.5  Conclusions                                              163

12  Visualizing Dynamics and Higher Order Properties               170
    12.1  Network Dynamics                                         170
          12.1.1  Netlogo                                          170
          12.1.2  Preferential Attachment                          173
          12.1.3  Percolation Transition                           174
    12.2  Network Properties                                       175
          12.2.1  Gnuplot                                          175
          12.2.2  Empirical Network properties                     176

13  Analysing Real Networks                                        179
    13.1  Adjacency Lists                                          179
    13.2  SNAP real networks                                       180

Part I

Theory of Networks

Chapter 1

Real-world Networks
1.1     Complex networks

The advent of the computer age has incited a mounting interest in the fundamental properties of real-world networks. Due to the vast computational power
that is presently available, large data sets can be easily stored and analysed.
This has had a profound impact on the empirical study of large networks. A
striking conclusion from this empirical work is that real-world networks share
fascinating features. Many are small worlds, which means that most nodes are
separated from each other by relatively short chains of links. Because networks
tend to operate efficiently, this property was perhaps to be expected. More surprisingly, however, many networks are sparse, which means that the empirical
distribution of the degree (= number of links to other nodes) of the nodes is
almost independent of the size of the network. In addition, they are scale-free,
which means that the fraction of nodes with degree k is approximately proportional to k^(-τ) for some exponent τ > 1, i.e., many real-world networks appear to have
power-law degree distributions.1 The above observations have had fundamental
implications for scientific research on networks. The aim of this research is to
understand why networks share these features, and what the qualitative and the
quantitative aspects of these features are.
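A quick way to get a feeling for scale freeness is to sample degrees from a power law and inspect the tail. The sketch below uses the continuous inverse-transform method as an approximation (an illustration of heavy tails, not a network model; τ = 2.5 is in the empirically common range):

```python
import random

def power_law_degree(tau, rng, k_min=1):
    """Draw an integer degree whose tail P(K >= k) is close to k^(1 - tau),
    via the inverse-transform method for the continuous power law."""
    u = rng.random()
    return int(k_min * (1.0 - u) ** (-1.0 / (tau - 1.0)))

rng = random.Random(0)
sample = [power_law_degree(2.5, rng) for _ in range(100_000)]

# Heavy tail: the mean stays modest while the maximum is huge, and the
# fraction of degrees >= 10 is close to 10^(1 - 2.5), i.e. about 0.03.
print(sum(sample) / len(sample), max(sample))
print(sum(d >= 10 for d in sample) / len(sample))
```

In contrast, for a light-tailed (e.g. Poisson) degree distribution the maximum degree would stay within a small factor of the mean.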
Complex networks play an increasingly important role in science. Examples
include electrical power grids, transportation and traffic networks, telephony
networks, Internet and the World-Wide Web, Facebook and Twitter, as well as
collaboration and citation networks of scientists. The structure of such networks
affects their performance. For instance, the topology of social networks affects
the spread of information and disease, while the topology of Internet affects
its success as a means of communication. See Barabási [4], Watts [39] and
Newman, Watts and Barabási [32] for expository accounts on the discovery of
network properties and the empirical measurement of these properties.
Networks are modelled as graphs, i.e., a set of vertices connected by a set
of edges. A common feature of real-world networks is that they are large and
complex. Consequently, a global description of their topology is impossible,
which is why researchers have turned to local properties: How many vertices
1 In Chapters 2, 3 and 4 we provide rigorous definitions of the above structural properties,
as well as extensive supporting empirical evidence.


does the network have? According to what rules are the vertices connected to
one another by edges? What cluster sizes and cluster shapes are most common?
What is the average distance between two vertices? What is the maximal distance between two vertices? These local properties are typically probabilistic,
which leads to the study of random graphs.
The observation that many real-world networks share the properties mentioned above has incited a burst of activity in network modelling. In this course
we survey some of the proposals made for network models. Most models use
random graphs as a way to model the uncertainty and the lack of regularity
in real-world networks. These models can be divided into two distinct types:
(1) static, where the aim is to describe networks and their topology at a given
instant of time; (2) dynamic, where the aim is to explain how networks came
to be as they are. The goal is to explain the universal behaviour exhibited by
real-world networks. Dynamic explanations often focus on the growth of the
network as a way to explain power-law degree distributions by means of preferential attachment growth, where new vertices are more likely to be attached to
vertices that already have large degrees.
Most real-world networks can be classified into four broad classes:
(I) Social Networks: WWW, Facebook, Twitter, WhatsApp.
(II) Technological Networks: Internet, power grids, traffic, transportation.
(III) Economic Networks: trade, interbank, interfirm, input/output.
(IV) Biological Networks: metabolic, neural, protein interaction.
In Sections 1.2-1.5 we describe examples drawn from each of these classes. In
Section 1.6 we mention a few further examples that lie beyond.
Reviews on the subject can be found in Albert and Barabási [2], Dorogovtsev
and Mendes [18], Newman [30], and van der Hofstad [22]. The exposition below
borrows from Chapter 1 of the latter reference.

1.2     Social Networks

1.2.1   Acquaintance Networks

In 1967, psychologist Stanley Milgram performed the following experiment. He
sent 60 letters to various people in Wichita, Kansas, USA, who were asked to
forward the letter to a specific person in Cambridge, Massachusetts, USA. The
participants could only pass the letters (by hand) to personal acquaintances who
they thought might be able to reach the target, either directly or via friends of
friends. While 50 people responded to the challenge, only 3 letters (roughly 5%)
reached their destination. In later experiments, Milgram managed to increase
the success rate to 35%, respectively, 95% by pretending that the value of the
package was high, and by providing more clues about the recipient, such as
his/her occupation. The main conclusion from the work of Milgram was that
most people are connected by a chain of at most 6 friends of friends, and this
fact was dubbed "Six Degrees of Separation".
The idea of close connectedness was first proposed in 1929 by the Hungarian writer Frigyes Karinthy, in a short story called "Chains". Later playwright


John Guare popularised the phrase when he chose it as the title for his 1990
play. In this play, Ouisa, one of the main characters, says:
Everybody on this planet is separated only by six other people. Six degrees
of separation. Between us and everybody else on this planet. The president
of the United States. A gondolier in Venice ... It's not just the big names.
It's anyone. A native in the rain forest. (...) An Eskimo. I am bound to
everyone on this planet by a trail of six people. It is a profound thought.
The fact that, on average, people can be reached by a chain of at most 6 intermediaries is rather striking. It implies that any two people in remote areas
such as Greenland and the Amazon can be linked by a sequence of on average
6 intermediaries. This makes the phrase "It is a small world we live in!" very
appropriate indeed.
The idea of Milgram was taken up afresh in 2001, with the added possibilities
of the computer era. In 2001, Duncan Watts, a professor at Columbia University,
recreated Milgram's experiment using an e-mail message as the package that
needed to be delivered. Surprisingly, after reviewing the data collected by 48,000
senders and 19 targets in 157 different countries, Watts again found that the
average number of intermediaries was 6. The research of Watts and the advent
of the computer age have opened up new areas of inquiry related to Six Degrees
of Separation in diverse areas of network theory, such as electrical power grids,
disease transmission, corporate communication, and computer circuitry.
To put the idea of a small world into network language, we define the vertices
of the social graph to be the inhabitants of the world (n ≈ 7 × 10^9), and we
draw an edge between two people when they know each other. Of course, we
should make precise what the latter means. Possibilities are various: it could
mean that the two people involved have shaken hands at some point, or meet
regularly, or address each other on a first-name basis, etc. The precise choice
affects the connectivity of the social graph and hence the conclusions we may
draw about its topology.
One of the main difficulties with social networks is that they are notoriously
hard to measure. Questionnaires cannot always be trusted, because people have
different ideas about what a certain social relation is. Also, questionnaires take
time to fill out and to collect. As a result, researchers are interested in examples
of social networks that can be more easily measured, for instance, because they
are electronic. Examples are e-mail networks, or social networks such as Hyves
and Facebook (see Fig. 1.1).

1.2.2   Collaboration Networks

An interesting example of a complex network that has drawn attention is the collaboration graph in mathematics, popularized under the name Erdős number project (see http://www.oakland.edu/enp). In this graph, the vertices are mathematicians, and there is
an edge between two mathematicians when they have co-authored a paper.
The Erdős number of a mathematician counts how many co-authorship links that mathematician is away from the legendary mathematician Paul Erdős, who was extremely prolific and wrote around 1500 papers with 511 collaborators. Of all
the mathematicians who are connected to Erdős by a trail of collaborators, the
maximal Erdős number is claimed to be 15. On that website, you can find

out how far your own professors are away from Erdős. Also, it is possible to
find the distance between any two mathematicians worldwide.

Figure 1.1: A map of the network of all friendships formed on Facebook across the
world (from https://www.facebook.com/zuck).
The distribution of the Erdős numbers is given in the following table (based
on data collected in July 2004):
Erdős number    number of mathematicians
 0                        1
 1                      504
 2                     6593
 3                    33605
 4                    83642
 5                    87760
 6                    40014
 7                    11591
 8                     3146
 9                      819
10                      244
11                       68
12                       23
13                        5

The median is 5, the mean is 4.65, and the standard deviation is 1.21. We
note that the Erdős number is finite if and only if the corresponding mathematician is in the largest connected component of the collaboration graph. See
Fig. 1.2 for an artistic impression of the collaboration graph in mathematics
taken from
http://www.orgnet.com/Erdos.html


and Fig. 1.3 for the degree distribution in the collaboration graph.

Figure 1.2: An artist's impression of the collaboration graph in mathematics.
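The Erdős number of a mathematician is simply his or her graph distance to Erdős in the collaboration graph, so it can be computed by breadth-first search. A sketch on a hypothetical toy graph (all names invented for illustration):

```python
from collections import deque

def distances_from(graph, root):
    """Breadth-first search: graph distance from root to each reachable vertex.
    Vertices outside root's connected component get no entry (distance infinite)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Hypothetical toy collaboration graph: an edge means a joint paper.
coauthors = {
    "Erdos": {"A", "B"},
    "A": {"Erdos", "C"},
    "B": {"Erdos"},
    "C": {"A"},
    "D": {"E"},   # D and E are not connected to Erdos:
    "E": {"D"},   # their Erdos number is infinite
}
print(distances_from(coauthors, "Erdos"))
# {'Erdos': 0, 'A': 1, 'B': 1, 'C': 2} (up to ordering); D and E are absent
```

The same routine, started from any vertex, gives the distance between any two mathematicians in the graph.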

De Castro and Grossman [15, 16] investigated the Erdős numbers of Nobel
prize laureates and Fields medal winners. They found that Nobel prize laureates
have Erdős numbers of at most 8 with an average of 4-5, while Fields medal
winners have Erdős numbers of at most 5 with an average of 3-4.
In July 2004, the collaboration graph consisted of about 1.9 million authored
papers in the Mathematical Reviews database, with a total of about 401,000
different authors. The percentage of papers with a given number of authors is:
number of authors    percentage
1                    62.4%
2                    27.4%
3                    8.0%
4                    1.7%
5                    0.4%
≥ 6                  0.1%

The largest number of authors shown for a single item lies in the 20s. Sometimes
the author list includes et al., in which case the number of co-authors is not
known precisely.
The fraction of items authored by just one person has steadily decreased
over time, starting out above 90% in the 1940s and currently standing at under
50%. The entire graph has about 676,000 edges, so that the average number of
collaborators per person is 3.36. See
http://www.oakland.edu/enp


[Log-log plot: number of vertices with a given degree, versus degree.]
Figure 1.3: The degree sequence in the collaboration graph.

In the collaboration graph, the average number of collaborators for people who
have collaborated is 4.25. There are only 5 mathematicians with degree at least
200. The largest degree is for Erdős, who has 511 co-authors.
The clustering coefficient of a graph is equal to the number of ordered triples
of vertices a, b, c in which the edges ab, bc and ac are present, divided by the
number in which ab and bc are present. In other words, the clustering coefficient
describes how often two neighbors of a vertex are adjacent to each other. The
clustering coefficient of the collaboration graph is 1308045/9125801 = 0.14. The
relatively high value of this number, together with the fact that average path
lengths are small, indicates that the collaboration graph is a small-world graph.
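The clustering coefficient as defined above can be computed by counting ordered triples directly. A sketch (graphs again stored as dictionaries of neighbour sets, for illustration):

```python
def clustering_coefficient(g):
    """Ordered triples a, b, c with edges ab, bc, ac all present, divided by
    ordered triples with ab and bc present (i.e. paths of length two)."""
    closed = 0
    for b in g:
        for a in g[b]:
            for c in g[b]:
                if a != c and c in g[a]:
                    closed += 1  # each triangle is counted 6 times in total
    # Middle vertex b contributes deg(b) * (deg(b) - 1) ordered paths a-b-c.
    paths = sum(len(nbrs) * (len(nbrs) - 1) for nbrs in g.values())
    return closed / paths if paths else 0.0

# A triangle with one pendant vertex attached:
g = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(g))  # 6 / 10 = 0.6
```

For the collaboration graph the same count gives the quoted 1308045/9125801 = 0.14 (the triple loop above is only feasible for small graphs; for large sparse graphs one iterates over neighbour pairs instead).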

1.2.3   World-Wide Web

The vertices of the WWW are electronic web pages, the edges are hyperlinks
(or URLs) pointing from one web page to another. The WWW is therefore
a directed network, since hyperlinks are not necessarily reciprocated. The
properties of the WWW have been studied by a number of authors: see e.g.
Albert, Jeong and Barabási [3], Kleinberg, Kumar, Raghavan, Rajagopalan
and Tomkins [25], Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata,
Tomkins and Wiener [7], and the reviews cited at the end of Section 1.1.
While Internet is physical, the WWW is virtual. With the rapid growth
of the WWW, the interest in its properties is growing as well. It is of great
practical importance to know what the structure of the WWW is, for example,
to allow search engines to explore it efficiently. Notorious is the Page-Rank
problem: to rank web pages in such a way that the most important pages come
up first. The Page-Rank algorithm is claimed to be the main reason behind the
success of Google, and its inventors were also the founders of Google (see Brin


and Page [6] for the original reference).

Figure 1.4: The in-degree sequence in the WWW.

Albert, Jeong and Barabási [3] studied the degree distribution of the WWW.
They found that the in-degrees obey a power-law distribution with exponent
τ_in ≈ 2.1, while the out-degrees obey a power-law distribution with exponent
τ_out ≈ 2.5. Their analysis was based on several Web domains, such as nd.edu,
mit.edu and whitehouse.gov (the Web domains of Notre Dame University, Massachusetts Institute of Technology and the White House). Furthermore, they
investigated the average distance d between the vertices in these domains, and
found it to grow linearly with the logarithm of the size n of the domain, with
an estimated dependence of the form
d = 0.35 + 2.06 log n.
Extrapolating this relation to the estimated size of the WWW at the time
(n = 8 × 10^8), they concluded that the diameter of the WWW was 19, which
prompted them to the following quote:
Fortunately, the surprisingly small diameter of the web means that all
information is just a few clicks away.
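The quoted diameter follows by plugging the estimated size of the WWW into the fitted relation; note that it reproduces the value 19 only when log is read as the base-10 logarithm:

```python
import math

def mean_distance(n):
    # Fitted relation d = 0.35 + 2.06 log10(n) for a domain of n pages.
    return 0.35 + 2.06 * math.log10(n)

# Extrapolation to the estimated size of the WWW at the time:
print(mean_distance(8e8))         # about 18.7
print(round(mean_distance(8e8)))  # 19
```

The logarithmic growth of d with n is exactly the small-world property: multiplying the network size by 10 adds only about two clicks.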
Kumar, Raghavan, Rajagopalan and Tomkins [27] were the first to observe that
the WWW has a power-law degree distribution (see Fig. 1.4).
The most extensive analysis of the WWW was performed by Broder, Kumar,
Maghoul, Raghavan, Rajagopalan, Stata, Tomkins and Wiener [7]. They divide
the WWW into four parts (see Fig. 1.5):


[Bow-tie diagram: IN (44 million nodes), SCC (56 million nodes), OUT (44 million
nodes), Tendrils (44 million nodes), plus Tubes and Disconnected components.]
Figure 1.5: The structure of the WWW.

(a) The Strongly Connected Component (SCC), the central core consisting
of those pages that can reach each other along the directed links (28% of
the pages).
(b) The IN part, consisting of pages that can reach the SCC, but cannot be
reached from it (21% of the pages).
(c) The OUT part, consisting of pages that can be reached from the SCC,
but do not link back to it (21% of the pages).
(d) The TENDRILS and other components, consisting of pages that can neither reach the SCC, nor be reached from it (30% of the pages).
It was found that the SCC has diameter at least 28, while the WWW as a whole
has diameter at least 500. The relatively high values of these numbers are due
in part to the fact that the graph for the WWW is directed. When the WWW
is considered as an undirected graph, the average distance between vertices
decreases to around 7. Furthermore, it was found that both the in-degrees and
the out-degrees in the WWW follow a power-law distribution, with exponents
τ_in ≈ 2.1 and τ_out ≈ 2.5, in accordance with the rough findings obtained earlier.
When the WWW is considered as a directed graph, the distances between
most pairs of vertices within the SCC are at most 7, similar to the Six Degrees
of Separation found in social networks. See Fig. 1.6 for a histogram of pairwise
distances in the sample.
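The four-part division above can be made concrete on a toy directed graph. The following sketch (plain Python, illustrative only: this is not the algorithm or data of Broder et al., and all names are ours) classifies nodes into the bow-tie parts by reachability to and from the core:

```python
# Classify the nodes of a small directed graph into the bow-tie parts
# SCC, IN, OUT and OTHER (tendrils, tubes, disconnected components).
def reachable(adj, start):
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def bowtie(nodes, edges):
    adj, radj = {}, {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        radj.setdefault(v, []).append(u)
    # largest set of nodes that can all reach each other (naive SCC search)
    scc = max(({w for w in nodes
                if w in reachable(adj, u) and u in reachable(adj, w)}
               for u in nodes), key=len)
    rep = next(iter(scc))
    fwd = reachable(adj, rep)    # pages reachable from the core
    bwd = reachable(radj, rep)   # pages that can reach the core
    return {"SCC": scc,
            "IN": bwd - scc,     # reach the SCC, not reachable from it
            "OUT": fwd - scc,    # reachable from the SCC, no link back
            "OTHER": set(nodes) - fwd - bwd}

nodes = {1, 2, 3, 4, 5, 6}
edges = [(1, 2), (2, 3), (3, 2), (3, 4), (5, 6)]  # 2 and 3 form the core
parts = bowtie(nodes, edges)
print(parts)
```

On this toy graph the core is {2, 3}, vertex 1 is IN, vertex 4 is OUT, and the disconnected pair {5, 6} falls into OTHER.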

CHAPTER 1. REAL-WORLD NETWORKS

Figure 1.6: Average distances in the Strongly Connected Component of the WWW
(Adamic [1]).

1.3 Technological Networks

1.3.1 Internet

Internet is a physical network of computers, connected by cables transferring
data. It is an undirected network: information can travel both ways along the
cables. A snapshot of Internet is portrayed in Fig. 1.7.
The fine structure of Internet changes continuously, due to local rearrangements of networked computers within, for instance, organisations. Therefore the
network is usually studied at a coarse-grained level, treating as vertices whole
groups of computers, within which rearrangements may occur frequently due
to local handling, but between which there are large-scale stable connections.
These groups of computers are called autonomous systems, which approximately
correspond to domain names. The properties of Internet have been studied
in many references: see e.g. Caldarelli, Marchetti and Pietronero [9], Pastor-Satorras, Vazquez and Vespignani [34], Chen, Chang, Govindan, Jamin, Shenker
and Willinger [10], and the book by Pastor-Satorras and Vespignani [35].
In Internet, IP-packets cannot use more than a certain threshold of physical
links. If distances in the Internet were larger than this threshold, e-mail
service would break down. Consequently, the graph of the Internet has evolved
in such a way that typical distances are relatively small, even though the Internet
itself is rather large. Fig. 1.8 depicts the hopcount, which is the number of
routers traversed by an e-mail message between two uniformly chosen routers,
and the AS-count (= the number of Autonomous Systems traversed by an
e-mail message), which is typically bounded by 7.
Fig. 1.9 plots the degree distribution on a log-log scale, i.e., log k ↦ log N_k,

Figure 1.7: Portrait of a particular snapshot of Internet (from http://www.watblog.com/wp-content/uploads/2013/01/1069524880.2D.2048x2048.png).

Figure 1.8: Internet hopcount data and number of AS traversed in hopcount data.
Data courtesy Hongsuda Tangmunarunkit.

Figure 1.9: Degree distributions of AS domains in 11/1997 and 12/1998, on a log-log
scale (Faloutsos, Faloutsos and Faloutsos [20]): power-law distribution with exponent
τ ≈ 2.15−2.20.

with N_k the number of vertices of degree k. When N_k is proportional to an
inverse power of k, i.e.,

    N_k ≈ c k^{−τ},    (1.1)

for some constant c and some exponent τ, then

    log N_k ≈ log c − τ log k.    (1.2)

Here, ≈ denotes an uncontrolled approximation. The exponent τ can be estimated
by the slope of the line in the log-log plot, and this gives for the AS-data
the estimate τ ≈ 2.15−2.20. Naturally, we must have Σ_k N_k = n, so
that it is reasonable to assume that τ > 1.
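The slope estimate can be reproduced with an ordinary least-squares fit on the log-log data. A minimal sketch (Python, on synthetic counts rather than the actual AS data; the function name fit_exponent is ours):

```python
import math

# Estimate the power-law exponent tau of N_k ~ c * k^(-tau) by fitting a
# straight line through (log k, log N_k), as read off a log-log plot.
def fit_exponent(degree_counts):  # degree_counts: {k: N_k}
    xs = [math.log(k) for k in degree_counts]
    ys = [math.log(nk) for nk in degree_counts.values()]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # tau is minus the slope of the log-log line

# synthetic counts N_k = c * k^(-2.2): the fit recovers tau exactly
counts = {k: 1000 * k ** -2.2 for k in range(1, 50)}
print(round(fit_exponent(counts), 2))  # -> 2.2
```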
Figure 1.10: Number of AS traversed in various data sets: RIPE (E[h_AS] = 2.81,
Var[h_AS] = 1.04, 1163687 points), AMSIX (E[h_AS] = 3.13, Var[h_AS] = 1.06,
366075 points) and LINX (E[h_AS] = 2.91, Var[h_AS] = 0.98, 168398 points).
Data courtesy Piet van Mieghem.
Interestingly, the AS-counts of various different data sets (focussing on different
parts of the Internet) yield roughly the same picture. As shown in Fig. 1.10,
the AS-counts between ASs in North-America, respectively, between ASs in
Europe, are quite close. This indicates that the AS-count is robust, and hints
at the fact that the AS-graph is homogeneous. In other words,
the dependence of the AS-count on the geometry of the network is fairly weak, even though a

priori we might expect geometry to play a role. As a result, most models for
the Internet, as well as for the AS-graph, ignore geometry altogether.
A topic of research that is receiving considerable attention is how the Internet
behaves under random breakdown or malicious attacks. The conclusion is that
the topology of the Internet is critical for its vulnerability. When vertices with
high degrees are taken out, the random graph models for the Internet cease to
have the necessary connectivity properties. See Cohen, Erez, ben Avraham and
Havlin [12, 13].

1.3.2 Transportation Networks

Transportation networks such as road, railway or airline networks (see Fig. 1.11)
tend to become increasingly complex. To allow these networks to function efficiently, traffic controllers need to deal with disruptions (for instance, due to bad
weather conditions). One objective is to develop robust scheduling algorithms
that take the random nature of traffic into account and can properly cope with
disturbances. Another objective is to be able to provide on-the-fly information
to travelers, so that they can adapt their travel plans to changing circumstances.

Figure 1.11: Flights worldwide on a single day.

The most important issue in the Dutch railway network is that it is tight, due
to the scarce space that is available for extending the network where needed.
As a result, the timetable is not sufficiently robust with respect to modifications in the circumstances caused by accidents, weather conditions, or signaling
breakdown.
Most of the scheduling and planning problems in railway and airline traffic
are very hard (so-called NP-complete), but in practice good approximation
algorithms may do a great deal. Also randomized algorithms can be very useful,


as they may yield an optimal profit on the average.

1.3.3 Energy Networks

Energy networks transport energy from providers to users. Examples are electricity grids. Because of their vital interest, these grids need to be designed to
achieve consistently high levels of performance and reliability, and yet need to
be cost-effective to operate. In order to prevent overflow of buffers, mechanisms
must be put in place to ensure that it is highly unlikely for the aggregate arrival rate to exceed the service rate for any length of time. It is critical that
the aggregate production rate of the energy sources is sufficient to meet the
consumption rate of the users with extremely high probability.
With the rising deployment of renewable resources such as wind farms and
solar panels, the generation of energy increasingly exhibits random fluctuations
over time. In addition, the production rate of conventional energy resources
and power plants is subject to uncertainty and variability, due to supply disruptions, technical failures or calamities. These phenomena give rise to very
distinct characteristics, rendering centralised operation impractical, and creating a strong need for distributed control mechanisms. At the same time, the
rapid advance of smart-grid technology offers growing opportunities for actively
controlling energy supply and demand.

1.4 Economic Networks

The economy is a large, complex and networked system operating at different
scales and with extremely heterogeneous components. From the level of individuals (which can be thought of as the basic agents of the economy) up to the
level of firms, organizations, stock markets, industries and whole countries, the
economic system is in fact an intricately connected network with many layers
and degrees of complexity. The availability of different sources of data allows
us to represent various projections of this network. Examples are reviewed for
instance in Caldarelli, Battiston, Garlaschelli and Catanzaro [8].

1.4.1 Financial Networks

In financial markets, a large number of people (individual investors as well as
companies) interact through financial transactions. The main empirical signals
associated with such transactions are the (highly fluctuating) time series of
prices of the financial entities being traded. Whether market agents are willing
to buy or sell depends on the prices of the financial assets that are being traded.
In turn, once they occur, transactions modify the price of such assets. Therefore
the time series of financial prices are at the same time the input and the output
of a collective process involving the actions of a large number of people.
Networks of market traders can in principle be defined, where nodes are
individual traders in a financial market and links represent transactions among
these traders. Such links can be defined either dynamically, i.e., in such a way
that they appear and disappear over time, or statically, e.g. as an aggregation
of all the transactions that occurred during a given time window. However,


these networks are extremely difficult to observe and analyse, because of the
high confidentiality of the data that are required as input.
What is much easier to obtain are the (publicly available) time series of price
increments² of stocks. From a set of n synchronous time series, it is possible to
calculate the n × n matrix of pairwise correlation coefficients between each pair
of stocks. The correlation coefficient ρ_ij between two time series x_i(t) and x_j(t)
(where x_i(t) denotes the increment of the i-th time series at time t) is defined
as

    ρ_ij = (⟨x_i x_j⟩ − ⟨x_i⟩⟨x_j⟩) / √( (⟨x_i²⟩ − ⟨x_i⟩²) (⟨x_j²⟩ − ⟨x_j⟩²) ),    (1.3)

where, if f(t) is a time series defined for t = 1, . . . , T, the time average ⟨f⟩ is defined as ⟨f⟩ ≡ (1/T) Σ_{t=1}^{T} f(t). After the empirical correlation matrix is calculated,
networks of financial correlations can be defined by representing the financial
entities (e.g. stocks) as vertices and the strongest correlations as links. The
strongest correlations are defined either as the set {ρ_ij} of correlations exceeding a given global threshold, or as the minimum set of correlations (taken in
decreasing value) that ensures some global connectivity property in the output
network (such as the existence of paths connecting all pairs of stocks, while
avoiding the creation of loops), or finally as the set of correlations that exceed
a reference value calculated under some null hypothesis. Networks of financial
correlations have been used to study the returns of assets in a stock market
[28, 5, 33] and interest rates [17]. See Fig. 1.12(a) for an example.
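For concreteness, the following sketch (plain Python, on short synthetic series; the helper names mean and rho are ours) evaluates the correlation coefficient of eq. (1.3):

```python
import math

# Time average <f> = (1/T) * sum_t f(t), then the correlation rho_ij of
# eq. (1.3) between two increment series x and y.
def mean(f):
    return sum(f) / len(f)

def rho(x, y):
    num = mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)
    den = math.sqrt((mean([a * a for a in x]) - mean(x) ** 2) *
                    (mean([b * b for b in y]) - mean(y) ** 2))
    return num / den

x = [1.0, 2.0, 3.0, 4.0]
print(rho(x, x))                # a series with itself -> 1.0
print(rho(x, [-a for a in x]))  # anti-correlated series -> -1.0
```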

1.4.2 Shareholding Networks

Another possibility is to define firm ownership and shareholding networks (Kogut
and Walker [26]), where the vertices are companies and/or shareholders and the
edges represent the ownership relations between the corresponding vertices. See
Fig. 1.12(b) for an example. Since the vertices often represent individual persons, these networks are in some sense also social networks.
We note that corporate board and director networks can be re-defined in
order to obtain economic networks, where boards are connected when they have
at least one director in common (Newman, Strogatz and Watts [31], Davis, Yoo
and Baker [14]).

1.4.3 World Trade Web

Yet another important networked economic system is the World Trade Web, describing the trade relationships among the world countries (Serrano and Boguñá [38]).
2 In the simplest case, the t-th increment of a financial time series is defined as the difference
between the price at time t and the price at time t − 1. However, for technical reasons that we
do not discuss here, an alternative and frequently used definition of increment is the difference
between the logarithms of the prices.


Figure 1.12: (a) Network formed by the strongest correlations among the stocks of the
S&P500 index, based on correlations between the log-returns of daily closing prices
from 2001 to 2011. Stocks are coloured according to their industrial classification
(from MacMahon and Garlaschelli 2014). (b) Snapshot of the shareholding network
in Italy in 2001. Vertices are companies and edges represent who owns whom, i.e.,
the ownership relations among companies (from Garlaschelli et al. 2005).

1.5 Biological Networks

1.5.1 Metabolic Networks

Biological networks are shaped by natural evolution. Therefore their structure
can shed light on how a specific function selects a particular topology.

Figure 1.13: A functional network for a yeast cell of correlated genetic interaction
profiles. Genes sharing similar genetic interaction profiles are proximal to one another.
Less similar genes are positioned further apart. Colored genes are enriched for GO
biological processes as indicated (from Costanzo et al. 2010).

1.5.2 Protein Interaction Networks and Genetic Networks

Examples at the cellular level include metabolic networks (Jeong, Tombor, Albert, Oltvai and Barabási [24]), where metabolic substrates are linked by directed edges when a known biochemical reaction exists between them, and
protein interaction networks (Jeong, Mason, Barabasi and Oltvai [23]), where
proteins are connected by an undirected edge when they interact by physical
contact. Similarly, genetic networks represent the correlations among the expression profiles of different genes in a cell (see Fig. 1.13 for an example on a
yeast cell).

1.5.3 Neural Networks and Vascular Networks

Examples at the organism level include neural networks (White, Southgate,
Thomson and Brenner [42]), describing the directed synaptic connections among
neurons in the brain, and vascular networks (West, Brown and Enquist [40],
[41]), such as blood vessels in animals and vessels in plants, describing the
(directed) transportation of nutrients between the various regions and tissues of
an organism.

1.5.4 Food Webs

Examples at the community level include food webs (Elton [19], Pimm [36],
Cohen, Briand and Newman [11]), where two biological species are connected
by a directed edge when a predator-prey relation exists between them.

1.6 Still other types of networks

The four classes (I)–(IV) in Section 1.1, of which examples were listed in Sections 1.2–1.5, are not exhaustive. We give two further examples.

1.6.1 Semantic Networks

In word networks, words are represented by vertices and edges are placed between words when some linguistic relation exists between them. Two examples
of undirected networks are word synonymy networks (Ravasz and Barabasi [37]),
where words are connected when they are listed as synonyms in a dictionary,
and word co-occurrence networks (Ferrer i Cancho and Sole [21]), where words
are connected when they appear one or two words apart from each other in the
sentences of a given text.
Examples of directed networks are given by networks of dictionary terms,
where words are connected when a (directed) link between them is reported in
a given dictionary, and of free associations, reporting the outcomes of psychological experiments where people are asked to associate input words to freely
chosen output words.

1.6.2 Co-occurrence Networks

Co-occurrence networks, where nodes represent events and edges are established
between events that co-occur together (possibly with a weight that quantifies
the frequency of co-occurrence), form a huge class of networks. Examples include the aforementioned word co-occurrence networks, as well as the collaboration networks discussed in Section 1.2.2, viewed as examples of social networks.
For instance, examples in the field of scientometrics are co-authorship and cocitation networks, where nodes are scientific articles and edges indicate that two
articles have been co-authored by the same author, respectively, co-cited by the
same paper.
Yet another example is given by networks of co-purchased products, where
two products are linked when they have been frequently purchased together.
Such networks are at the basis of the automatic recommendation systems routinely used e.g. by online shops. In Fig. 1.14 we show The Political Books Network compiled by Valdis Krebs [43]. This network represents books about US
politics sold by Amazon.com. Edges represent frequent co-purchasing of books
by the same buyers, as indicated by the "customers who bought this book also
bought these other books" feature on Amazon.


Figure 1.14: The network of frequently co-purchased (on Amazon.com) books
about US politics. The political viewpoints of these books are given by liberal
(circles), neutral (triangles) and conservative (squares), respectively. The
colour of vertices is assigned by a so-called community detection algorithm that
finds groups of vertices that are more densely connected among themselves than
with the rest of the network (see Chapter 4, Section 4.2.4). The more central
vertices in the liberal and conservative communities are surrounded by black
boxes. Modified from [43].

Bibliography
[1] L.A. Adamic, The small world web, in: Lecture Notes in Computer Science
1696, Springer, 1999, pp. 443–454.
[2] R. Albert and A.-L. Barabasi, Rev. Mod. Phys. 74 (2002) 47.
[3] R. Albert, H. Jeong and A.-L. Barabasi, Internet: Diameter of the world-wide web, Nature 401 (1999) 130–131.
[4] A.-L. Barabási, Linked: The New Science of Networks, Perseus Publishing,
Cambridge, Massachusetts, 2002.
[5] G. Bonanno, F. Lillo and R.N. Mantegna, Quantitative Finance 1 (2001)
96.
[6] S. Brin and L. Page, The anatomy of a large-scale hypertextual web search
engine, Computer Networks and ISDN Systems 33 (1998) 107–117.
[7] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,
A. Tomkins and J. Wiener, Graph structure in the web, Computer Networks 33 (2000) 309–320.
[8] G. Caldarelli, S. Battiston, D. Garlaschelli and M. Catanzaro, book chapter
in: Complex Networks (eds. E. Ben-Naim, H. Frauenfelder, Z. Toroczkai),
Lecture Notes in Physics 650, Springer, 2004, pp. 399–423.
[9] G. Caldarelli, R. Marchetti and L. Pietronero, Europhys. Lett. 52 (2000)
386.
[10] Q. Chen, H. Chang, R. Govindan, S. Jamin, S.J. Shenker and W. Willinger,
Proceedings of the 21st Annual Joint Conference of the IEEE Computer
and Communications Societies, IEEE Computer Society, 2002.
[11] J.E. Cohen, F. Briand and C.M. Newman, Community Food Webs: Data
and Theory, Springer, Berlin, 1990.
[12] R. Cohen, K. Erez, D. ben Avraham and S. Havlin, Resilience of the internet to random breakdowns, Phys. Rev. Lett. 85 (2000) 4626.
[13] R. Cohen, K. Erez, D. ben Avraham and S. Havlin, Breakdown of the
internet under intentional attack, Phys. Rev. Lett. 86 (2001) 3682.
[14] G.F. Davis, M. Yoo and W.E. Baker, Strategic Organization 1 (2003) 301.

[15] R. De Castro and J.W. Grossman, Famous trails to Paul Erdős, Rev. Acad.
Colombiana Cienc. Exact. Fís. Natur. 23 (1999) 563–582. Translated and
revised from the English.
[16] R. De Castro and J.W. Grossman, Famous trails to Paul Erdős, Math.
Intelligencer 21 (1999) 51–63. With a sidebar by P.M.B. Vitányi.
[17] T. Di Matteo, T. Aste, S.T. Hyde and S. Ramsden, Proceedings of the
First Bonzenfreies Colloquium on Market Dynamics and Quantitative Economics, Physica A 355 (2005) 2135.
[18] S.N. Dorogovtsev and J.F.F. Mendes, Advances in Physics 51 (2002) 1079.
[19] C.S. Elton, Animal Ecology, Sidgwick & Jackson, London, 1927.
[20] C. Faloutsos, P. Faloutsos and M. Faloutsos, On power-law relationships of
the internet topology, Computer Communications Rev. 29 (1999) 251–262.
[21] R. Ferrer i Cancho and R.V. Sole, Proceedings of the Royal Society of
London B268 (2001) 2261.
[22] R. van der Hofstad, Random Graphs and Complex Networks, Volume I,
monograph in preparation. File can be downloaded from http://www.win.tue.nl/~rhofstad/
[23] H. Jeong, S. Mason, A.-L. Barabasi and Z.N. Oltvai, Nature 411 (2001) 41.
[24] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai and A.-L. Barabasi, Nature
407 (2000) 651.
[25] J.M. Kleinberg, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins,
in: Proceedings of the International Conference on Combinatorics and
Computing, Lecture Notes in Computer Science 1627, Springer, Berlin,
1999, pp. 118.
[26] B. Kogut and G. Walker, American Sociological Review 66 (2001) 317.
[27] R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins, Trawling the web
for emerging cyber communities, Computer Networks 31 (1999) 1481–1493.
[28] R.N. Mantegna, Eur. Phys. J. B 25 (1999) 193.
[29] S. Maslov, K. Sneppen and A. Zaliznyak, Physica A 333 (2004) 529–540.
[30] M.E.J. Newman, SIAM Review 45 (2003) 167.
[31] M.E.J. Newman, S.H. Strogatz and D.J. Watts, Phys. Rev. E 64 (2001)
026118.
[32] M.E.J. Newman, D.J. Watts and A.-L. Barabasi, The Structure and Dynamics of Networks, Princeton Studies in Complexity, Princeton University
Press, 2006.
[33] J.-P. Onnela, A. Chakraborti, K. Kaski and J. Kertesz, Eur. Phys. J. B 30
(2002) 285.


[34] R. Pastor-Satorras, A. Vazquez and A. Vespignani, Phys. Rev. Lett. 87
(2001) 258701.
[35] R. Pastor-Satorras and A. Vespignani, Evolution and Structure of the Internet. A Statistical Physics Approach, Cambridge University Press, Cambridge, 2004.
[36] S.L. Pimm, Food Webs, Chapman & Hall, London, 1982.
[37] E. Ravasz and A.-L. Barabasi, Phys. Rev. E67 (2003) 026112.
[38] M.Á. Serrano and M. Boguñá, Phys. Rev. E 68 (2003) 015101(R).
[39] D.J. Watts, Six Degrees. The Science of a Connected Age, W.W. Norton &
Co. Inc., New York, 2003.
[40] G.B. West, J.H. Brown and B.J. Enquist, Science 276 (1997) 122.
[41] G.B. West, J.H. Brown and B.J. Enquist, Science 284 (1999) 1677.
[42] J.G. White, E. Southgate, J.N. Thomson and S. Brenner, Phil. Trans. R.
Soc. London B 314 (1986) 1.
[43] X. Cao, X. Wang, D. Jin, Y. Cao & D. He. Scientific Reports 3, 2993
(2013).

Chapter 2

Random Graphs
In this chapter we describe some key concepts in graph theory. In Section 2.1
we introduce graphs and random graphs, and look at four particular scaling
features as these graphs become large. (More detailed scaling features will be
discussed in Chapter 4.) In Section 2.2 we analyse the simplest random graph
model, due to Erdős and Rényi, where edges occur randomly and independently.
Random graphs are models for complex networks (randomness is often synonymous with complexity). They are inspired by real-world networks, and are
used as null-models. They play an important role in analysing and explaining
the empirical properties observed in real-world networks. They can also be used
to make predictions.

2.1 Graphs, random graphs, four scaling features

A graph G = (V, E) consists of a set of vertices V (also called nodes or sites) and
a set of edges E (also called links or bonds) connecting pairs of vertices. A graph
is called simple when there are no self-edges (= no edges between a vertex and
itself) and no multiple edges (= at most one edge between a pair of vertices). A
graph that is not simple is called a multi-graph. Edges are undirected. Graphs
with directed edges are called directed graphs. See Fig. 2.1.

Figure 2.1: Examples 1–3 are complete graphs. Examples 1–3 and 5 are simple
graphs, examples 4 and 6 are multi-graphs. Examples 1 and 5 contain isolated vertices.
Example 5 has two clusters.

Not all pairs of vertices need to be connected by an edge. A graph that is
simple and has all pairs of vertices connected by an edge is called a complete
graph. Some vertices may have no edge at all. Such vertices are called isolated. A
cluster or connected component is any maximal subset of vertices whose vertices
are connected by edges (= maximally connected component). The size of a

cluster is the number of vertices it contains. An isolated vertex is a cluster of
size 1. The degree of a vertex is the number of edges attached to it. An isolated
vertex has degree 0. A vertex with a loop has degree 2.
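A cluster can be computed mechanically: a flood fill starting from each unvisited vertex recovers all clusters. A minimal sketch (plain Python, toy graph; the function name is ours):

```python
# Find the clusters (connected components) of an undirected graph by flood fill.
def clusters(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps

# a triangle plus one isolated vertex: two clusters, of sizes 3 and 1
print(clusters([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)]))  # -> [{1, 2, 3}, {4}]
```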
The degree sequence of a graph G is the vector

    ⃗k = (k_i)_{i∈V}    (2.1)

with k_i the degree of vertex i. The degree distribution is the probability distribution

    f_G = |V|^{−1} Σ_{i∈V} δ_{k_i},    (2.2)

where |V| is the cardinality of V and δ_{k_i} is the point distribution concentrated
at k_i, i.e.,

    δ_{k_i}(k) = 1_{{k = k_i}},    k ∈ N_0,    (2.3)

with N_0 the set of non-negative integers. Note that f_G is a probability distribution on N_0, whose weights

    f_G(k) = |V|^{−1} |{i ∈ V : k_i = k}|,    k ∈ N_0,    (2.4)
represent the fraction of vertices with degree k.
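The weights (2.4) can be computed directly from an edge list. A minimal sketch (plain Python, toy graph; the function name is ours):

```python
from collections import Counter

# f_G(k): the fraction of vertices of degree k, computed from an edge list.
def degree_distribution(vertices, edges):
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(vertices)
    # Counter returns 0 for vertices that appear in no edge (isolated)
    return {k: c / n for k, c in Counter(deg[v] for v in vertices).items()}

# a triangle plus one isolated vertex
f = degree_distribution([1, 2, 3, 4], [(1, 2), (2, 3), (1, 3)])
print(f)  # -> {2: 0.75, 0: 0.25}
```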


A triple of distinct vertices i_1, i_2, i_3 forms a wedge when the edges i_1i_2 and
i_2i_3 are present, and a triangle when the edges i_1i_2, i_2i_3 and i_3i_1 are present.
The clustering coefficient of G is the ratio

    C_G = Δ_G / W_G ∈ [0, 1],    (2.5)

where

    Δ_G = Σ_{i_1,i_2,i_3∈V} 1_{{i_1i_2, i_2i_3, i_3i_1 are present}},    W_G = Σ_{i_1,i_2,i_3∈V} 1_{{i_1i_2, i_2i_3 are present}},    (2.6)

i.e., Δ_G is 3! = 6 times the number of triangles in G and W_G is 2! = 2 times
the number of wedges in G. This definition is sometimes referred to as the
wedge-triangle clustering coefficient. (In Section 4.2.3 a different definition will
be used, but with the same flavour.) A complete graph has clustering coefficient
1, a tree graph has clustering coefficient 0.
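Definition (2.5)–(2.6) can be checked by brute force over ordered triples. A minimal sketch (plain Python, toy graphs; the function name is ours):

```python
from itertools import permutations

# C_G = Delta_G / W_G, counting ordered triples as in eq. (2.6):
# Delta_G counts triangles (6 per unordered triangle),
# W_G counts wedges (2 per unordered wedge).
def clustering(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    tri = wed = 0
    for i1, i2, i3 in permutations(vertices, 3):
        if i2 in adj[i1] and i3 in adj[i2]:
            wed += 1            # wedge: edges i1i2 and i2i3 present
            if i1 in adj[i3]:
                tri += 1        # triangle: edge i3i1 also present
    return tri / wed

# complete graph on 3 vertices: every wedge closes into a triangle
print(clustering([1, 2, 3], [(1, 2), (2, 3), (1, 3)]))  # -> 1.0
# path 1-2-3: a wedge but no triangle
print(clustering([1, 2, 3], [(1, 2), (2, 3)]))          # -> 0.0
```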
The typical distance in G is the ratio

    H_G = ( Σ_{i,j∈V: i↔j, i≠j} d(i,j) ) / ( Σ_{i,j∈V: i↔j, i≠j} 1 ) ∈ [1, ∞),    (2.7)

where i ↔ j means that i and j are connected, and d(i, j) denotes the graph
distance between i and j (= the minimal number of edges in a path between
i and j). In words, H_G is the average distance between two vertices drawn uniformly
from all pairs of connected vertices. The complete graph has typical distance 1,
a linear graph has typical distance roughly one third of its length.
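The ratio (2.7) is an average of breadth-first-search distances over connected pairs. A minimal sketch (plain Python, toy graph; the function names are ours):

```python
from collections import deque

# Graph distances d(i, j) from a source s by breadth-first search.
def distances_from(adj, s):
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

# H_G of eq. (2.7): average d(i, j) over all ordered connected pairs i != j.
def typical_distance(vertices, edges):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = pairs = 0
    for i in vertices:
        for j, dij in distances_from(adj, i).items():
            if j != i:
                total += dij
                pairs += 1
    return total / pairs

# linear graph 1-2-3-4: H_G = 5/3, roughly one third of its length
print(typical_distance([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```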
A random graph is a graph where the vertices and/or edges are chosen randomly. There are many possible ways in which this can be done, and various
different choices have been made with the aim to model real-world networks of


different types. Since networks tend to grow, it is natural to consider sequences


of random graphs
G = (Gn )nN ,
(2.8)
where n denotes the number of vertices in Gn . This is referred to as a random
graph process. We use the symbol P to denote the probability distribution of G.
In what follows we give a precise mathematical definition of four scaling features
of random graph processes, following van der Hofstad [2, Chapter 1]:
(1) $G$ is called sparse when
$$\lim_{n \to \infty} \mathbb{E}(\|f_{G_n} - f\|_\infty) = 0 \qquad (2.9)$$
for some non-random probability distribution $f$ on $\mathbb{N}_0$, where $\mathbb{E}$ denotes expectation and
$$\|f_{G_n} - f\|_\infty = \sup_{k \in \mathbb{N}_0} |f_{G_n}(k) - f(k)| \qquad (2.10)$$
is a distance between $f_{G_n}$ and $f$ (called the supremum-norm on the space of probability distributions on $\mathbb{N}_0$). Sparse means that most vertices have a degree that stays bounded as $n \to \infty$.
(2) $G$ is called scale free with exponent $\tau$ when it is sparse and
$$\lim_{k \to \infty} \frac{\log f(k)}{\log(1/k)} = \tau \qquad (2.11)$$
for some $\tau \in (1, \infty)$, i.e., $f(k) = k^{-\tau + o(1)}$ as $k \to \infty$. Scale free means that the graph looks similar on all scales.
(3) $G$ is called highly clustered when
$$\lim_{n \to \infty} \frac{\mathbb{E}(\Delta_{G_n})}{\mathbb{E}(W_{G_n})} = C \qquad (2.12)$$
for some $C \in (0, 1]$. Not highly clustered means that locally the graph looks like a tree.
(4) $G$ is called a small world when
$$\lim_{n \to \infty} \mathbb{P}(H_{G_n} \leq K \log n) = 1 \qquad (2.13)$$
for some $K \in (0, \infty)$. If the latter holds with $K \log n$ replaced by an upper bound that is $o(\log n)$, then $G$ is called an ultra-small world. In such cases the upper bound is often $K \log\log n$. Small world means that typical distances grow only very slowly with the size of the graph (and are almost independent of the size).
Chapter 4 contains several exercises where the reader is requested to compute
empirical properties of examples of graphs.
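As a warm-up for those exercises, the quantities (2.2), (2.5) and (2.7) can be computed by brute force for small concrete graphs. The following sketch (our own helper functions, with a graph stored as a Python dict of neighbour sets; not part of any standard library) does exactly that:

```python
from collections import deque
from itertools import permutations

def degree_distribution(adj):
    """f_G(k): fraction of vertices with degree k, as in (2.2)."""
    n = len(adj)
    f = {}
    for nbrs in adj.values():
        f[len(nbrs)] = f.get(len(nbrs), 0) + 1 / n
    return f

def clustering_coefficient(adj):
    """C_G = Delta_G / W_G: ordered triangles over ordered wedges, as in (2.5)-(2.6)."""
    tri = wed = 0
    for i1, i2, i3 in permutations(adj, 3):   # all ordered triples of distinct vertices
        if i2 in adj[i1] and i3 in adj[i2]:
            wed += 1
            if i1 in adj[i3]:
                tri += 1
    return tri / wed if wed else 0.0

def typical_distance(adj):
    """H_G: average graph distance over ordered pairs of distinct connected vertices, (2.7)."""
    total = pairs = 0
    for s in adj:                              # breadth-first search from every vertex
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for v, d in dist.items() if v != s)
        pairs += len(dist) - 1
    return total / pairs

# A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
# The complete graph K_4, for comparison.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
```

For the example graph, $C_G = 6/10$ and $H_G = 16/12$; for $K_4$ both the clustering coefficient and the typical distance equal 1, as stated above. The triple enumeration is $O(|V|^3)$, so this sketch is only meant for small examples.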
[Begin intermezzo]
In asymptotic analysis, three symbols are used frequently: $o$, $O$ and $\Theta$. The symbol $o$ stands for "is of smaller order than": $a_n = o(b_n)$ when $\lim_{n \to \infty} a_n/b_n = 0$. The symbol $O$ stands for "is at most of the same order as": $a_n = O(b_n)$ when $\limsup_{n \to \infty} |a_n/b_n| < \infty$. The symbol $\Theta$ stands for "is of the same order as": $a_n = \Theta(b_n)$ when both $a_n = O(b_n)$ and $b_n = O(a_n)$. This is also written as $a_n \asymp b_n$.
[End intermezzo]

2.2 Erdős-Rényi random graph

The simplest example of a random graph is the Erdős-Rényi random graph. Here, for each $n \in \mathbb{N}$, we consider the complete graph $K_n$ on $n$ vertices, and for each of the $\binom{n}{2}$ edges we decide to retain it with probability $p \in (0,1)$ and remove it with probability $1-p$, independently for different edges. (The retained edges are called open, the removed edges are called closed.) The resulting graph is a random subgraph of $K_n$, and is denoted by $\mathrm{ER}_n(p)$. It was introduced in 1959 by the Hungarian mathematicians Paul Erdős and Alfréd Rényi [1], and marked the beginning of random graph theory. Chapter 6 describes algorithms to simulate $\mathrm{ER}_n(p)$. See Fig. 2.2 for two realizations of $\mathrm{ER}_{100}(1/200)$ and $\mathrm{ER}_{100}(3/200)$.

Figure 2.2: Two realizations of Erdős-Rényi random graphs with 100 vertices and edge probabilities 1/200, respectively, 3/200. The three largest clusters are ordered by the darkness of their edge colors (dark blue, blue, light blue). The remaining edges all have the lightest shade (grey). Courtesy Remco van der Hofstad.

Homework 2.1 Find the distribution of the number of edges in $\mathrm{ER}_n(p)$. Compute its mean and its variance, and show that it satisfies the law of large numbers and the central limit theorem in the limit as $n \to \infty$ (look up on Wikipedia what this means). Hint: Use that the number of edges is $\sum_e Y_e$, where the sum runs over the $\binom{n}{2}$ edges of the complete graph $K_n$, and $Y_e = 1_{\{e \text{ is retained}\}}$ are i.i.d. (= independent and identically distributed) random variables taking the value 1 with probability $p$ and the value 0 with probability $1-p$. Note that $\mathbb{E}(Y_e) = \mathbb{P}(e \text{ is retained}) = p$.
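The answer to Homework 2.1 can be sanity-checked by Monte Carlo: sample the edge count of $\mathrm{ER}_n(p)$ repeatedly and compare the empirical mean and variance with those of the $\mathrm{BINOMIAL}(\binom{n}{2}, p)$ distribution. A minimal sketch (the helper function is our own, not from the text):

```python
import random

def er_edge_count(n, p, rng):
    """Flip one coin per edge of K_n; return the number of retained edges of ER_n(p)."""
    return sum(1 for i in range(n) for j in range(i + 1, n) if rng.random() < p)

rng = random.Random(0)
n, p = 60, 0.05
N = n * (n - 1) // 2                         # number of edges of K_n
samples = [er_edge_count(n, p, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# The edge count is BINOMIAL(N, p): mean N*p, variance N*p*(1-p).
```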

The Erdős-Rényi random graph is not really suitable as a model of a real-world network, for which typically neither the number of vertices is fixed nor are the edges retained or removed independently. Yet, it captures a basic feature of a real-world network: complexity.

2.2.1 Percolation transition

We follow the exposition in van der Hofstad [2, Chapter 4]. The Erdős-Rényi random graph exhibits an interesting phenomenon: $\mathrm{ER}_n(p)$ has a percolation transition when we pick $p = \lambda/n$ with $\lambda \in (0, \infty)$ and let $n \to \infty$. Namely, the largest cluster has size
$\Theta(\log n)$ when $\lambda < 1$,
$\Theta(n^{2/3})$ when $\lambda = 1$,
$\Theta(n)$ when $\lambda > 1$.
Thus, there is a critical value $\lambda_c = 1$ such that $\mathrm{ER}_n(\lambda/n)$ consists of a large number of small disconnected components when $\lambda < \lambda_c$ (subcritical regime), but has a large connected component containing a positive fraction of all the vertices when $\lambda > \lambda_c$ (supercritical regime). At $\lambda = \lambda_c$ there is a percolation transition: the small clusters coagulate into a large cluster. It can be shown that for $\lambda > \lambda_c$ there is only one cluster of size $\Theta(n)$, while all the other clusters are of size $\Theta(\log n)$. It can also be shown that for $\lambda = \lambda_c$ there are multiple clusters of size $\Theta(n^{2/3})$.
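The subcritical and supercritical regimes can be observed directly in simulation. The sketch below (a union-find based helper of our own devising, not one of the algorithms of Chapter 6) grows $\mathrm{ER}_n(\lambda/n)$ and measures the largest cluster for one $\lambda < 1$ and one $\lambda > 1$:

```python
import random
from collections import Counter

def largest_cluster(n, lam, rng):
    """Size of the largest connected component of ER_n(lam/n), via union-find."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x
    p = lam / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)    # merge the two clusters
    return max(Counter(find(i) for i in range(n)).values())

rng = random.Random(1)
n = 1000
sub = largest_cluster(n, 0.5, rng)   # subcritical: Theta(log n), a few dozen vertices
sup = largest_cluster(n, 2.0, rng)   # supercritical: Theta(n), roughly 0.8 n vertices
```

The pair loop costs $O(n^2)$ coin flips, so this brute-force sketch is only usable for moderate $n$.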
Before we explain the intuition behind the above percolation transition, we
make a brief digression into the mathematics of branching processes.
[Begin intermezzo]
A branching process is a simple model for a population evolving over time. Suppose that, in each generation, each individual in the population independently gives birth to a random number of children, chosen according to a prescribed probability distribution $f = (f(k))_{k \in \mathbb{N}_0}$ called the offspring distribution, i.e., $f(k)$ is the probability that an individual has $k$ children. Let $Z_n$ denote the number of individuals in the $n$-th generation, where for convenience we pick $Z_0 = 1$. Then $Z_n$ satisfies the recursion relation
$$Z_{n+1} = \sum_{i=1}^{Z_n} X_{i,n}, \qquad n \in \mathbb{N}_0, \qquad (2.14)$$
where $(X_{i,n})_{i \in \mathbb{N}, n \in \mathbb{N}_0}$ is an array of i.i.d. random variables with common distribution $f$ (i.e., $X_{i,n}$ is the number of children of individual $i$ in generation $n$). Let
$$m = \sum_{k \in \mathbb{N}_0} k f(k). \qquad (2.15)$$

One of the key results for branching processes is that if m 1, then the population dies out with probability 1 (unless f = 1 ), while if m > 1, then the
population has a strictly positive probability to survive forever. In fact, it turns
out that the extinction probability
= P( n N : Zn = 0)

(2.16)

36

CHAPTER 2. RANDOM GRAPHS

is the smallest solution of the equation (see Fig. 2.3)


= Gf (),

Gf (x) =

xk f (k),

x [0, 1].

(2.17)

kN0

A branching process is subcritical when m < 1, critical when m = 1, and


supercritical when m > 1. (The case f = 1 is uninteresting and is excluded.)
Gf (x)
s

f (0) s

s
x

Figure 2.3: Plot of the generating function x 7 Gf (x) for the case where m =
G0f (1) > 1.

Exercise 2.1 Show that $\eta = 0$ if and only if $f(0) = 0$.

Exercise 2.2 When the offspring distribution is given by
$$f(k) = (1-p)\, 1_{\{k=0\}} + p\, 1_{\{k=2\}}$$
for some $p \in (0,1)$, we speak of binary branching. Compute $G_f(x)$, and show that $\eta = 1$ when $0 < p \leq \frac{1}{2}$ and $\eta = (1-p)/p$ when $\frac{1}{2} < p < 1$.
[End intermezzo]
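The extinction probability of Exercise 2.2 can be checked by simulating the recursion (2.14). The sketch below runs binary branching with $p = 3/4$, where $\eta = (1-p)/p = 1/3$; the truncation parameters max_gen and cap are our own pragmatic choices for declaring survival (once the population is large, extinction is astronomically unlikely):

```python
import random

def dies_out(p, rng, max_gen=60, cap=500):
    """One run of binary branching: each individual has 2 children w.p. p, else 0."""
    z = 1                           # Z_0 = 1
    for _ in range(max_gen):
        if z == 0:
            return True             # extinct
        if z > cap:
            return False            # so large that extinction is no longer plausible
        z = sum(2 for _ in range(z) if rng.random() < p)
    return z == 0

rng = random.Random(2)
p = 0.75                            # supercritical: eta = (1 - p)/p = 1/3
trials = 1500
est = sum(dies_out(p, rng) for _ in range(trials)) / trials
```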
We are now ready to explain the intuition behind the percolation transition in the Erdős-Rényi random graph. Pick any vertex, call this vertex $\star$, and paint it green. Next, consider the vertices that are connected to $\star$. These vertices all lie at distance 1 from $\star$, and are painted green as well. Write $N_1$ to denote their number. Next, consider the vertices connected to the $N_1$ vertices just painted green, but exclude $\star$. These vertices all lie at distance 2 from $\star$, and are painted green as well. Write $N_2$ to denote their number where, in order to avoid complications, a vertex is counted each time it has a connection to a green vertex. Repeat this procedure. The result is a random sequence
$$(N_d)_{d \in \mathbb{N}} \qquad (2.18)$$
that we can think of as an exploration process, counting the vertices that are connected to $\star$ at successive distances (with multiplicities) and painting them green. The idea is that this exploration process is close to a branching process when $n$ is large, because the exploration process rarely creates loops.


Homework 2.2 The distribution of $N_1$ is $\mathrm{BINOMIAL}(n-1, p)$, the binomial distribution with parameters $(n-1, p)$ given by
$$f_{n-1,p}(k) = \mathbb{P}(N_1 = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}, \qquad k = 0, \ldots, n-1. \qquad (2.19)$$
Show that, with $p = \lambda/n$, as $n \to \infty$:
(a) $\mathbb{E}(N_1) = p(n-1) \to \lambda$ and $\mathrm{Var}(N_1) = p(1-p)(n-1) \to \lambda$.
(b) $\mathbb{E}(N_2) = \mathbb{E}(N_1\, p(n-1-N_1)) = p(n-1)\mathbb{E}(N_1) - p\,\mathbb{E}(N_1^2) \to \lambda^2$.
(c) $\mathbb{E}(N_d) \to \lambda^d$ for every $d \in \mathbb{N}$.
Thus, for large $n$, the random sequence $(N_d)_{d \in \mathbb{N}}$ is close to a branching process $(\hat{N}_d)_{d \in \mathbb{N}}$ whose offspring distribution has mean $\lambda$. According to the above intermezzo, $\lim_{d \to \infty} \hat{N}_d = 0$ with probability 1 when $\lambda < 1$, and $\lim_{d \to \infty} \hat{N}_d = \infty$ with positive probability when $\lambda > 1$. Hence we see that $\star$ lies in a small cluster when $\lambda < 1$, but has a positive probability of lying in a large cluster when $\lambda > 1$.
The exploration process eventually covers the entire set of vertices. Homework 2.2(b) shows that the set of vertices not yet covered gradually depletes as more and more vertices are covered, but that for large $n$ this effect is hardly noticeable.
Homework 2.3 The probability that $\star$ has degree $k$ equals $f_{n-1,p}(k) = \mathbb{P}(N_1 = k)$. Show that
$$\lim_{n \to \infty} f_{n-1,\lambda/n}(k) = f_\lambda(k), \qquad k \in \mathbb{N}_0,$$
with $f_\lambda = \mathrm{POISSON}(\lambda)$ the Poisson distribution with parameter $\lambda$ given by
$$f_\lambda(k) = e^{-\lambda}\, \frac{\lambda^k}{k!}, \qquad k \in \mathbb{N}_0. \qquad (2.20)$$
Show that the mean of $f_\lambda$ equals $\lambda$. Hint: Use that $\lim_{n \to \infty} (1 - \frac{\lambda}{n})^n = e^{-\lambda}$.

Thus, the branching process $(\hat{N}_d)_{d \in \mathbb{N}}$ has offspring distribution $f_\lambda$.
The above argumentation is heuristic: for finite $n$ both the exploration processes and the degrees associated with different vertices are dependent. However, for large $n$ this dependence is weak, and we may approximate the exploration processes from different vertices as independent branching processes in which each vertex at distance $d$ has a number of vertices at distance $d+1$ attached to it whose distribution is close to $\mathrm{POISSON}(\lambda)$. In other words, for large $n$ the Erdős-Rényi random graph locally looks like a random tree.
$\lambda > 1$: As long as $d = o(\log n)$, the approximation $N_d \approx \hat{N}_d$ is good (recall Homework 2.2) and we have $\mathbb{E}(N_d) \approx \mathbb{E}(\hat{N}_d) = \lambda^d$. When $d$ reaches values in the range $\Theta(\log n)$, $\mathbb{E}(\hat{N}_d)$ reaches values in the range $\Theta(n)$ (after which the approximation $N_d \approx \hat{N}_d$ begins to break down). Hence, $\star$ has a strictly positive probability to lie in a cluster of size $\Theta(n)$.
$\lambda < 1$: When $d$ reaches values in the range $\Theta(\log n)$, $\mathbb{E}(\hat{N}_d)$ reaches values in the range $\Theta(1/n)$. Hence, among the $n$ vertices of the graph there are $\Theta(1)$ vertices that lie in a cluster of size $\Theta(\log n)$.


The following comparison is valid for any n without approximation.


Exercise 2.3 Show that $(N_d)_{d \in \mathbb{N}}$ is stochastically smaller than $(\hat{N}_d)_{d \in \mathbb{N}}$, i.e., there exists a coupling of the two random sequences such that $N_d \leq \hat{N}_d$ for all $d \in \mathbb{N}$ with probability 1.
[Begin intermezzo]
A coupling of two random variables $X_1$ and $X_2$ is any pair of random variables $(\hat{X}_1, \hat{X}_2)$ such that the marginal probability distributions of $(\hat{X}_1, \hat{X}_2)$ coincide with the probability distributions of $X_1$ and $X_2$, respectively. Given $X_1$ and $X_2$, there are many ways to construct a coupling. For instance, if $X_1$ and $X_2$ have the same distribution, then the pair $(X_1, X_2)$ with independent components is a coupling, but so are the pairs $(X_1, X_1)$ and $(X_2, X_2)$ with identical components. We say that $X_1$ is stochastically smaller than $X_2$ when there exists a coupling such that $\hat{X}_1 \leq \hat{X}_2$ with probability 1.
[End intermezzo]
We will encounter coupling again in Chapters 8-9.
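A standard construction is the monotone coupling, which drives both random variables with one common uniform. A toy illustration for Bernoulli distributions (our own example, not from the text):

```python
import random

def coupled_bernoulli(p1, p2, rng):
    """Monotone coupling: one uniform U drives both indicators, so X1 <= X2 when p1 <= p2."""
    u = rng.random()
    return u < p1, u < p2

rng = random.Random(3)
pairs = [coupled_bernoulli(0.3, 0.7, rng) for _ in range(10000)]
freq1 = sum(x1 for x1, _ in pairs) / len(pairs)   # marginal of X1: close to 0.3
freq2 = sum(x2 for _, x2 in pairs) / len(pairs)   # marginal of X2: close to 0.7
```

Each component on its own has the correct marginal distribution, yet the pair satisfies $X_1 \leq X_2$ with probability 1, which is exactly the definition of stochastic domination above.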

2.2.2 Scaling features

Since the degree distribution of $\mathrm{ER}_n(\lambda/n)$ converges to $f_\lambda$, the Erdős-Rényi random graph is sparse. Since $f_\lambda$ has a thin tail, i.e., $f_\lambda(k)$ decays faster than polynomially in $k$ as $k \to \infty$, the Erdős-Rényi random graph is not scale free.
Exercise 2.4 Compute the average number of wedges $\mathbb{E}(W_{\mathrm{ER}_n(\lambda/n)})$ and the average number of triangles $\mathbb{E}(\Delta_{\mathrm{ER}_n(\lambda/n)})$ in the Erdős-Rényi random graph. Show that
$$\lim_{n \to \infty} n^{-1}\, \mathbb{E}(W_{\mathrm{ER}_n(\lambda/n)}) = \lambda^2, \qquad \lim_{n \to \infty} \mathbb{E}(\Delta_{\mathrm{ER}_n(\lambda/n)}) = \lambda^3, \qquad (2.21)$$
which implies that $C = 0$.

Consequently, the Erdős-Rényi random graph is not highly clustered.
The Erdős-Rényi random graph is a small world when $\lambda \neq 1$. Indeed, in the subcritical regime $\lambda < 1$, this fact is obvious because the largest cluster has size $\Theta(\log n)$. In the supercritical regime $\lambda > 1$, typical distances are at most $K \log n$ with $K = 1/\log \lambda$ by the following heuristic argument. As long as $N_d$ is small compared to $n$, we know that $N_d$ is close to $\hat{N}_d$ in distribution. Since $\mathbb{E}(\hat{N}_d) = \lambda^d$, it follows that $N_d = \Theta(n)$ when $d = \Theta(K \log n)$. Since there are not more than $n$ vertices, the exploration process from vertex $\star$ must stop after at most $\Theta(K \log n)$ iterations.
Finally, it is possible to consider a generalised Erdős-Rényi random graph in which the parameter $\lambda$ is chosen randomly according to a distribution with a power-law tail. In this way the random graph can be made to be scale free and highly clustered as well. In Chapter 3 we look at more realistic models to construct random graphs with these properties.

Bibliography

[1] P. Erdős and A. Rényi, On random graphs I, Publ. Math. Debrecen 6 (1959) 290-297.

[2] R. van der Hofstad, Random Graphs and Complex Networks, Volume I, monograph in preparation. File can be downloaded from http://www.win.tue.nl/~rhofstad/

Chapter 3

Network Models
In this chapter we describe two examples of random graphs that are more realistic models of real-world networks than the Erdős-Rényi random graph encountered in Chapter 2. In Section 3.1 we look at the configuration model, in Section 3.2 at the preferential attachment model. The former is a static realisation of a random graph (like the Erdős-Rényi random graph), the latter is a dynamic realisation, i.e., it is the result of a growth process.

3.1 The configuration model

3.1.1 Motivation

In this section we investigate random graphs with a prescribed degree sequence, i.e., the degrees are given to us beforehand. A practical situation may arise from a real-world network of which we know the degrees but not the topology, and we are interested in generating a uniformly random graph with precisely the same degrees (where uniformly random means that all realisations have the same probability). An interesting question we may want to settle is: Does the real-world network resemble a uniformly random graph with the same degree sequence, or does it inherently have more structure?
The configuration model described below was introduced by Bollobás [5], inspired by earlier work of Bender and Canfield [4]. It generates the desired random graph by matching half-edges in a uniformly random manner. This comes at the expense of possibly creating self-loops and multiple edges, but these can be removed afterwards.

3.1.2 Construction

We follow van der Hofstad [9, Chapter 7]. Suppose that we take the degree sequence as the starting point of our model, i.e., for $n \in \mathbb{N}$ we associate with each vertex $i \in V = \{1, \ldots, n\}$ a pre-specified degree $k_i \in \mathbb{N}_0$, forming a pre-specified degree sequence
$$\vec{k} = (k_1, \ldots, k_n), \qquad (3.1)$$
and we connect the vertices with edges in some way so as to realise these degrees. To that end, we think of placing $k_i$ half-edges (stubs) incident to vertex $i$, and matching the different half-edges in some way so as to form full edges. One way to do this is to match the half-edges in a uniformly random manner. This leads to what is called the configuration model (see Fig. 3.1). The resulting random multi-graph is denoted by $\mathrm{CM}_n(\vec{k})$ and is referred to as the configuration model. Chapter 7 describes algorithms to simulate $\mathrm{CM}_n(\vec{k})$.
It does not matter in which order the half-edges are paired in the pairing procedure. As long as, conditionally on the paired half-edges so far, the next half-edge is paired to any of the remaining half-edges with equal probability, the final outcome is the same in distribution. The total degree is $\sum_{i=1}^n k_i$, and the total number of edges is $\frac{1}{2} \sum_{i=1}^n k_i$, which is why the total degree must be even.

Exercise 3.1 Show that there are $(2m-1)!! = (2m-1) \cdot (2m-3) \cdots 3 \cdot 1$ different ways of pairing $2m$ half-edges. Show that not all pairings give rise to a different graph.
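Because the order of pairing does not matter, one simple way to produce a uniform pairing is to lay out all half-edges in a list, shuffle the list uniformly, and read it off in consecutive pairs. A sketch (our own helper, not one of the algorithms of Chapter 7, using the degree sequence of Fig. 3.1):

```python
import random

def configuration_model(degrees, rng):
    """Uniform pairing of half-edges; returns the edge list of the resulting multigraph."""
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    assert len(stubs) % 2 == 0, "total degree must be even"
    rng.shuffle(stubs)                          # a uniform shuffle induces a uniform pairing
    return [(stubs[t], stubs[t + 1]) for t in range(0, len(stubs), 2)]

rng = random.Random(4)
degrees = [5, 5, 4, 5, 5, 3, 5]                 # the degree sequence of Fig. 3.1
edges = configuration_model(degrees, rng)
# each vertex appears among the edge endpoints exactly as often as its degree
realised = [sum((a == i) + (b == i) for a, b in edges) for i in range(len(degrees))]
```

The realised multigraph may contain self-loops (both endpoints equal) and multiple edges, exactly as discussed below.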
The degree distribution associated with $\mathrm{CM}_n(\vec{k})$ is (recall (2.2))
$$f_{\mathrm{CM}_n(\vec{k})} = n^{-1} \sum_{i=1}^n \delta_{k_i}. \qquad (3.2)$$
By choosing $\vec{k}$ such that (recall (2.9))
$$\lim_{n \to \infty} \mathbb{E}(\|f_{\mathrm{CM}_n(\vec{k})} - f\|_\infty) = 0 \qquad (3.3)$$
for some pre-specified probability distribution $f$ (where $\mathbb{E}$ is the average with respect to the randomness of the graph), we can use the configuration model as a particular way to realise a sequence of random graphs that is sparse and scale free with any desired exponent.
The pairing procedure may not lead to a simple graph: self-loops and multiple edges may occur. However, we will see that if the degrees are not too large, more precisely, if (see (3.11)-(3.12) below)
$$\lim_{n \to \infty} \mathrm{Var}(f_{\mathrm{CM}_n(\vec{k})}) = \mathrm{Var}(f) < \infty \qquad (3.4)$$
with $\mathrm{Var}(f)$ the variance of $f$, then the resulting graph is simple with a strictly positive probability. By conditioning on the graph being simple, we end up with a random graph that has the pre-specified degree sequence. Sometimes this is referred to as the repeated configuration model, since we may think of the conditioning as repeatedly forming the graph until it is simple. Another approach is to remove the self-loops and multiple edges afterwards, which is referred to as the erased configuration model. It can be shown that when $n \to \infty$, the degree distributions in these two models also converge to $f$. Hence, for large $n$ the conditioning and the erasing do not alter the degrees by much, and they are completely harmless in the limit as $n \to \infty$. To keep the computations simple we stick to the original construction.

Figure 3.1: Simulation of the configuration model with $n = 7$ vertices and degree sequence $\vec{k} = (5, 5, 4, 5, 5, 3, 5)$. The pictures show how 16 pairs of half-edges are randomly matched to become 16 edges. Courtesy Oliver Jovanovski.

3.1.3 Graphical degree sequences

A natural question is: Which sequences of numbers can occur as the degree sequence of a simple graph? A sequence $\vec{k} = (k_1, \ldots, k_n)$ with $k_1 \geq k_2 \geq \ldots \geq k_n$ is called graphical when it is the degree sequence of some simple graph. Erdős and Gallai [8] proved that a sequence $\vec{k}$ is graphical if and only if $\sum_{i=1}^n k_i$ is even and
$$\sum_{i=1}^l k_i \leq l(l-1) + \sum_{i=l+1}^n \min(l, k_i), \qquad l = 1, \ldots, n-1. \qquad (3.5)$$
The necessity of this condition is easy to see. Indeed, the left-hand side is the total degree of the first $l$ vertices. The first term on the right-hand side is the maximal total degree of the first $l$ vertices coming from edges between them, while the second term is a bound on the total degree of the first $l$ vertices coming from edges that connect to the other vertices. The sufficiency is harder to see, and we refer to Choudum [7] for a proof.
Exercise 3.2 Give an example of a non-graphical sequence $\vec{k} = (k_1, \ldots, k_4)$ for which $k_1 + \ldots + k_4$ is even, and explain in a picture why it is non-graphical.
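Condition (3.5) translates directly into a graphicality test. A sketch (our own helper, assuming non-negative integer degrees):

```python
def is_graphical(seq):
    """Erdos-Gallai test (3.5): can seq be the degree sequence of a simple graph?"""
    k = sorted(seq, reverse=True)          # sort non-increasingly, as in the theorem
    n = len(k)
    if sum(k) % 2 != 0:                    # total degree must be even
        return False
    for l in range(1, n):
        lhs = sum(k[:l])
        rhs = l * (l - 1) + sum(min(l, ki) for ki in k[l:])
        if lhs > rhs:
            return False
    return True
```

For instance, (3, 3, 3, 3) is graphical (it is realised by the complete graph $K_4$), while (3, 3, 1, 1) has even sum but violates the inequality at $l = 2$.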
Arratia and Liggett [1] investigate the probability that an i.i.d. sequence
$$\vec{D} = (D_1, \ldots, D_n) \qquad (3.6)$$
is graphical. This becomes relevant when the degree sequence $\vec{k}$ in the configuration model is itself drawn as an i.i.d. sequence, say according to a pre-specified probability distribution $f$ on $\mathbb{N}_0$. In that case automatically
$$\lim_{n \to \infty} \mathbb{E}(\|f_{\mathrm{CM}_n(\vec{D})} - f\|_\infty) = 0. \qquad (3.7)$$
It turns out that, under the assumption that $0 < \sum_{k \text{ even}} f(k) < 1$ (i.e., both even and odd degrees are possible),
$$\lim_{n \to \infty} \mathbb{P}(\vec{D} \text{ is graphical}) = \begin{cases} 0, & \text{if } \lim_{n \to \infty} n F(n) = \infty, \\ \tfrac{1}{2}, & \text{if } \lim_{n \to \infty} n F(n) = 0, \end{cases} \qquad (3.8)$$
where $F(n) = \sum_{k \geq n} f(k)$. It is not hard to show that if $\sum_{k \text{ even}} f(k) < 1$, then
$$\lim_{n \to \infty} \mathbb{P}\Big(\sum_{i=1}^n D_i \text{ even}\Big) = \tfrac{1}{2}. \qquad (3.9)$$

Consequently, the tail condition $\lim_{n \to \infty} n F(n) = 0$ in the second line of (3.8) (which incidentally is slightly weaker than the condition $\sum_{k \in \mathbb{N}_0} k^2 f(k) < \infty$ of finite second moment) guarantees that
$$\lim_{n \to \infty} \mathbb{P}\Big(\vec{D} \text{ is graphical} \,\Big|\, \sum_{i=1}^n D_i \text{ even}\Big) = 1. \qquad (3.10)$$
In other words, by retaining only those realisations of $\vec{D}$ for which $\sum_{i=1}^n D_i$ is even, we make it possible for $\mathrm{CM}_n(\vec{D})$ to be simple, i.e., the probability that $\mathrm{CM}_n(\vec{D})$ is simple is strictly positive. It can be shown that if $f(0) = 0$ and $\sum_{k \in \mathbb{N}_0} k^2 f(k) < \infty$, then
$$\lim_{n \to \infty} \mathbb{P}\Big(\mathrm{CM}_n(\vec{D}) \text{ is simple} \,\Big|\, \sum_{i=1}^n D_i \text{ even}\Big) = \exp\big[-\tfrac{1}{2}\nu - \tfrac{1}{4}\nu^2\big] \in (0,1] \qquad (3.11)$$
with
$$\nu = \frac{\sum_{k \in \mathbb{N}} k(k-1) f(k)}{\sum_{k \in \mathbb{N}} k f(k)} \in [0, \infty). \qquad (3.12)$$
This shows that the repeated configuration model is a feasible way to generate simple random graphs with a prescribed degree distribution.

3.1.4 Percolation transition

Like the Erdős-Rényi random graph, the configuration model has a percolation transition. We again consider the case where the degree sequence $\vec{D}$ is i.i.d. with distribution $f$ having a finite second moment. Then the largest cluster of $\mathrm{CM}_n(\vec{D})$ has size
$\Theta(\log n)$ when $\nu < 1$,
$\Theta(n^{2/3})$ when $\nu = 1$,
$\Theta(n)$ when $\nu > 1$.
(The critical scaling actually requires that $f$ has a finite third moment.)
The intuition behind the above result is as follows. The offspring distribution of a given vertex $\star$ is equal to $f$. However, the offspring distribution of the vertices at distance 1 from $\star$ is different, namely, it equals $\hat{f}$ given by
$$\hat{f}(k) = \frac{1}{N}\, (k+1) f(k+1), \qquad k \in \mathbb{N}_0, \qquad (3.13)$$
with $N$ the normalisation constant.


Figure 3.2: The vertex $\star$ linked to a neighbour $\sharp$ that has $k$ neighbours not linked to $\star$.

Indeed, the fraction of vertices with $k+1$ edges is $f(k+1)$. By the uniform matching of half-edges, the probability that a vertex with $k+1$ half-edges is linked to $\star$ is proportional to $k+1$. The probability that the other $k$ half-edges end up being linked to half-edges of vertices at distance 2 from $\star$ is 1 (in the limit as $n \to \infty$). Hence, the probability that a vertex at distance 1 from $\star$ has $k$ vertices not linked to $\star$ equals $\hat{f}(k)$ (see Fig. 3.2). The same is true for vertices at distances $\geq 2$ from $\star$, except that during the exploration process vertices and half-edges get gradually depleted (a phenomenon we already encountered in the Erdős-Rényi random graph). But for large $n$ this effect is minor, and so we can think of $\hat{f}$ as the forward degree distribution of vertices in the exploration process. Since (compare (3.12) and (3.13))
$$\nu = \sum_{k \in \mathbb{N}_0} k \hat{f}(k) \qquad (3.14)$$
is the average forward degree, this explains why the percolation transition occurs at $\nu = 1$ (recall Homework 2.2 and the intermezzo on branching processes in Chapter 2).
Note that if $f$ is not concentrated on a single degree, then $\nu + 1 = \sum_{k \in \mathbb{N}} k^2 f(k) / \sum_{k \in \mathbb{N}} k f(k)$, the average degree of a randomly chosen neighbour, is strictly larger than the average degree $\sum_{k \in \mathbb{N}} k f(k)$. In the language of social networks this inequality can be expressed as:
On average your friends have more friends than you do!
This sounds paradoxical, but it is not. You are more likely to be friends with a person who has many friends than with a person who has few friends. This causes a bias, which is precisely what (3.13) captures.
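The bias is easy to quantify for a concrete degree distribution: compute $\nu$ from (3.13)-(3.14) and compare the friends' average degree $\nu + 1$ with the plain average degree. A sketch with a hypothetical two-point distribution (the example and helper names are ours):

```python
def average_degree(f):
    """Mean of a degree distribution given as a dict {k: f(k)}."""
    return sum(k * p for k, p in f.items())

def forward_degree_mean(f):
    """nu = sum_k k * fhat(k), with fhat(k) proportional to (k+1) f(k+1), as in (3.13)-(3.14)."""
    return sum(k * (k + 1) * f.get(k + 1, 0.0) for k in range(max(f))) / average_degree(f)

# Hypothetical example: half the vertices have degree 1, half have degree 3.
f = {1: 0.5, 3: 0.5}
nu = forward_degree_mean(f)          # average forward degree: 1.5
friend_mean = nu + 1                 # average degree of a uniformly chosen friend: 2.5
```

Here the average degree is 2, while a randomly chosen friend has average degree 2.5: the size bias of (3.13) in action.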

3.1.5 Scaling features

The configuration model can be made sparse and scale free by construction: since the degree distribution is prescribed, it can be chosen so as to satisfy the conditions in (2.9) and (2.11).
In van der Hofstad [10, Chapter 5] it is shown that the configuration model with i.i.d. degrees is small-world, namely, for any $\nu > 1$,
$$\lim_{n \to \infty} \mathbb{P}\big(H_{\mathrm{CM}_n(\vec{D})} \leq K \log n\big) = 1 \qquad \text{for every } K > \nu/(\nu - 1), \qquad (3.15)$$
where we recall (2.7). The intuition behind this result is similar to that for the Erdős-Rényi random graph, with $\nu$ taking over the role of $\lambda$. If the degree distribution $f$ has exponent $\tau \in (2,3)$ (recall (2.11)), so that
$$\sum_{k \in \mathbb{N}_0} k f(k) < \infty, \qquad \sum_{k \in \mathbb{N}_0} k^2 f(k) = \infty, \qquad (3.16)$$
then the configuration model is even ultra small-world: distances are at most of order $\log\log n$.
Homework 3.1 Is the configuration model with i.i.d. degrees highly clustered? Hint: Recall (2.12), and compute the probability that $\star$ lies in a wedge, respectively, in a triangle, in the limit as $n \to \infty$. Use Fig. 3.2.

Homework 3.1 shows that $\mathrm{CM}_n(\vec{D})$ is locally tree-like, i.e., the number of triangles grows much more slowly with $n$ than the number of wedges.

3.2 Preferential attachment model

3.2.1 Motivation

The configuration model describes networks satisfactorily, in the sense that it gives rise to random graphs with degree distributions that can be matched to the power-law degree distributions found in real-world networks. However, it does not explain how these networks came to be the way they are. A possible explanation for the occurrence of scale-free behaviour was given by Barabási and Albert [3], via a feature called preferential attachment. Most real-world networks grow. For example, the WWW has increased from a few web pages in 1990 to several billion web pages at present. Growth is an aspect that is not taken into account in the Erdős-Rényi random graph or the configuration model, which are static models of random graphs (even though it would not be hard to reformulate these graphs as a growth process where vertices and edges are added successively). Thus, viewing networks as evolving in time is not in itself enough to explain the occurrence of power laws, but it does give us the possibility to investigate and model how they grow.
So, how do real-world networks grow? Think of a social network describing a
population in which new individuals arrive one by one, each time enlarging the
network by one vertex. A newcomer will start to socialise with other individuals
in the population, and this is responsible for new connections to the newcomer.
In an Erdős-Rényi random graph, the connections to the newcomer are spread uniformly over the population. But is this realistic? Is the newcomer not more
uniformly over the population. But is this realistic? Is the newcomer not more
likely to get to know individuals who are socially active and therefore already
have a large degree? Probably so! We do not live in an egalitarian world.
Rather, we live in a self-reinforcing world, where people who are well known
are likely to become even more known. Therefore, rather than taking equal
probabilities for our newcomer to become acquainted with other individuals in
the population, we should allow for a bias towards individuals who already know
many other individuals.
Phrased in a more mathematical way, a preferential attachment model is
such that new vertices are more likely to attach to old vertices with a high degree
than to old vertices with a low degree. For example, suppose that new vertices
are added (each carrying a fixed number of edges, say 1) that want to connect
to older vertices. Each edge is connected to a specific older vertex with a
probability that is proportional to the current degree of that older vertex.
Below we will argue that preferential attachment naturally leads to sparse and scale-free random graphs. The power-law exponent of the degree distribution depends on the parameters of the model. We will also argue that preferential attachment leads to small-world random graphs that are locally tree-like, like the Erdős-Rényi random graph and the configuration model.

3.2.2 Construction

We follow van der Hofstad [9, Chapter 8]. The preferential attachment model we consider depends on two parameters, $m \in \mathbb{N}$ and $\delta \in [-m, \infty)$, and produces a random multi-graph process, denoted by
$$\big(\mathrm{PA}_n(m, \delta)\big)_{n \in \mathbb{N}}, \qquad (3.17)$$
such that for every $n$ the graph has $n$ vertices, $mn$ edges and total degree $2mn$ (see Exercise 3.6 below).
We begin by defining the model for $m = 1$ (see Fig. 3.3). In this case, $\mathrm{PA}_1(1, \delta)$ consists of a single vertex $v_1$ with a single self-loop (which has degree 2). Let
$$\{v_1, \ldots, v_n\} \qquad (3.18)$$

Figure 3.3: The first two iterations in the construction of $\mathrm{PA}_n(1, \delta)$. The first iteration is a single vertex $v_1$ with a single self-loop. The second iteration adds a vertex $v_2$ and links this via a single edge either to itself, with probability $(1+\delta)/(3+2\delta)$, or to $v_1$, with probability $(2+\delta)/(3+2\delta)$; these probabilities depend on the degree of $v_1$ (which is 2 after the first iteration). Subsequent iterations involve adding vertices one by one and linking them via a single edge to the already existing vertices with probabilities that depend on the current degrees of these vertices.

denote the vertices of $\mathrm{PA}_n(1, \delta)$, and let
$$\{D_1(n), \ldots, D_n(n)\} \qquad (3.19)$$
denote their degrees (a self-loop raises the degree by 2). Conditionally on $\mathrm{PA}_n(1, \delta)$, the growth rule to obtain $\mathrm{PA}_{n+1}(1, \delta)$ is as follows. We add a single vertex $v_{n+1}$ carrying a single edge. This edge is connected to a second endpoint, drawn from
$$\{v_1, \ldots, v_n, v_{n+1}\} \qquad (3.20)$$
with probabilities
$$\mathbb{P}\big(v_{n+1} \to v_i \mid \mathrm{PA}_n(1, \delta)\big) = \begin{cases} \dfrac{1+\delta}{n(2+\delta)+(1+\delta)}, & i = n+1, \\[1ex] \dfrac{D_i(n)+\delta}{n(2+\delta)+(1+\delta)}, & i = 1, \ldots, n. \end{cases} \qquad (3.21)$$
Note that the degrees in (3.19) are random and typically change as more vertices are added: $D_i(n)$ depends on the vertex label $i$ and on the stage $n$ of the iteration. Note that the parameter $\delta$ is added to the degrees, which amounts to a shift of the proportionality in the preferential attachment.

Exercise 3.3 Verify that $D_i(n) \geq 1$ for all $n \geq i$, so that $D_i(n) + \delta \geq 0$ for all $n \geq i$ because $\delta \geq -1$. Also verify that $\sum_{i=1}^n D_i(n) = 2n$ for all $n$.
Exercise 3.4 Verify that the attachment probabilities in (3.21) sum up to 1.

Exercise 3.5 Show that $\mathrm{PA}_n(1, -1)$ consists of a self-loop at vertex $v_1$ while each other vertex is connected to $v_1$ by precisely one edge.
Homework 3.2 Fix $i \in \mathbb{N}$. Show that
$$\mathbb{P}\big(\lim_{n \to \infty} D_i(n) = \infty\big) = 1. \qquad (3.22)$$
Hint: Show that if $(I_j)_{j=i}^\infty$ is a sequence of independent $\{0,1\}$-valued random variables with $\mathbb{P}(I_j = 1) = (1+\delta)/[j(2+\delta)+(1+\delta)]$, then $\sum_{j=i}^n I_j$ is stochastically smaller than $D_i(n)$, yet tends to infinity with probability 1 as $n \to \infty$.

For $\delta = 0$ the probabilities in (3.21) simplify to
$$\mathbb{P}\big(v_{n+1} \to v_i \mid \mathrm{PA}_n(1, 0)\big) = \begin{cases} \dfrac{1}{2n+1}, & i = n+1, \\[1ex] \dfrac{D_i(n)}{2n+1}, & i = 1, \ldots, n, \end{cases} \qquad (3.23)$$
and for $\delta = -1$ to
$$\mathbb{P}\big(v_{n+1} \to v_i \mid \mathrm{PA}_n(1, -1)\big) = \begin{cases} 0, & i = n+1, \\[1ex] \dfrac{D_i(n)-1}{n}, & i = 1, \ldots, n. \end{cases} \qquad (3.24)$$
The preferential attachment mechanism in (3.21) is called affine, because the attachment probabilities depend linearly on the degrees of the random graph $\mathrm{PA}_n(1, \delta)$.
We continue by defining the model for $m \in \mathbb{N} \setminus \{1\}$, which uses the model for $m = 1$ as follows. We start with $\mathrm{PA}_{mn}(1, \delta/m)$, and denote its vertices by
$$\{v_1, \ldots, v_{mn}\}. \qquad (3.25)$$
We collapse $\{v_1, \ldots, v_m\}$ into a single vertex $v_1[m]$, collapse $\{v_{m+1}, \ldots, v_{2m}\}$ into a single vertex $v_2[m]$, etc. After all vertices are collapsed, we obtain $\mathrm{PA}_n(m, \delta)$ with vertices
$$\{v_1[m], \ldots, v_n[m]\}. \qquad (3.26)$$

Exercise 3.6 Show that $\mathrm{PA}_n(m, \delta)$ is a multi-graph with $n$ vertices and $mn$ edges, such that the total degree is equal to $2mn$. What do Exercises 3.3-3.5 and Homework 3.2 imply for $m \in \mathbb{N} \setminus \{1\}$?

Simulations are shown in Figs. 3.4 and 3.5.

Figure 3.4: Preferential attachment random graph with $m = 2$, $\delta = 0$ and $n = 10, 30, 100$. Courtesy Remco van der Hofstad.

Note that an edge in $\mathrm{PA}_{mn}(1, \delta/m)$ is attached to vertex $v_i$ with a probability proportional to the weight of vertex $v_i$, which according to the second line of (3.21) is equal to the degree of vertex $v_i$ plus $\delta/m$. Since for each $j \in \{1, \ldots, n\}$ the vertices $\{v_{(j-1)m+1}, \ldots, v_{jm}\}$ in $\mathrm{PA}_{mn}(1, \delta/m)$ are collapsed into a single vertex $v_j[m]$ in $\mathrm{PA}_n(m, \delta)$, an edge in $\mathrm{PA}_n(m, \delta)$ is attached to vertex $v_j[m]$ with a probability proportional to the total weight of the vertices $\{v_{(j-1)m+1}, \ldots, v_{jm}\}$. Since the sum of the degrees of these vertices is equal to the degree of vertex $v_j[m]$, this probability in turn is proportional to the degree of vertex $v_j[m]$ in $\mathrm{PA}_n(m, \delta)$ plus $\delta$. Thus, also $\mathrm{PA}_n(m, \delta)$ grows in an affine manner.

Figure 3.5: Preferential attachment random graph with $m = 2$, $\delta = -1$ and $n = 10, 30, 100$. Courtesy Remco van der Hofstad.
In the above construction the degrees are updated each time an edge is attached. This is referred to as intermediate updating of the degrees. It is possible to define the model with $m \in \mathbb{N} \setminus \{1\}$ directly, without the help of the model with $m = 1$, but the construction is a bit more involved.
The model with $\delta = 0$ is the Barabási-Albert model, which has received a lot of attention in the literature and was formally defined in Bollobás and Riordan [6]. The extra parameter $\delta$ was introduced by van der Hofstad [9, Chapter 8] and makes the model more flexible.
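The growth rule (3.21) for $m = 1$ can be simulated directly by sampling the attachment point with weights $D_i(n) + \delta$. A sketch (naive linear-scan sampling of our own devising, adequate for small $n$):

```python
import random

def preferential_attachment(n, delta, rng):
    """Grow PA_n(1, delta) by rule (3.21); returns the list of degrees D_1(n), ..., D_n(n)."""
    deg = [2]                                      # v1 starts with a single self-loop
    for size in range(1, n):                       # add v_{size+1} to a graph of `size` vertices
        total = size * (2 + delta) + (1 + delta)   # normalisation in (3.21)
        u = rng.random() * total
        if u < 1 + delta:
            deg.append(2)                          # self-loop at the newcomer
        else:
            u -= 1 + delta
            i = 0                                  # linear scan through weights deg[i] + delta
            while u >= deg[i] + delta and i + 1 < size:
                u -= deg[i] + delta
                i += 1
            deg[i] += 1
            deg.append(1)
    return deg

rng = random.Random(5)
deg = preferential_attachment(500, 0.0, rng)       # total degree is 2n at every stage
```

Each iteration adds one vertex and one edge, so the total degree is $2n$ after $n$ steps, in line with Exercise 3.3.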

3.2.3 Scaling features

The following two results are taken from van der Hofstad [9, Chapter 8] and are
valid for any m ∈ N and δ > −m.

(1) The random graph process (PAn(m, δ))n∈N is sparse with limiting degree
distribution fPA given by

    fPA(k) = 0 for k = 0, . . . , m − 1, and

    fPA(k) = (2 + δ/m) Γ(k + δ) Γ(m + 2 + δ + (δ/m)) / [Γ(k + 3 + δ + (δ/m)) Γ(m + δ)] for k ≥ m,     (3.27)

where Γ is the Gamma-function defined by Γ(t) = ∫0^∞ x^(t−1) e^(−x) dx, t > 0.

(2) The tail behaviour of fPA is given by

    fPA(k) = cm,δ k^(−τ) [1 + O(1/k)],   k → ∞,     (3.28)

with

    τ = 3 + (δ/m),   cm,δ = (τ − 1) Γ(m + δ + (τ − 1)) / Γ(m + δ).     (3.29)

Hence (PAn(m, δ))n∈N is scale free with exponent τ = 3 + (δ/m) ∈ (2, ∞).
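Formulas (3.27)–(3.29) are easy to check numerically. The sketch below evaluates fPA via log-Gamma values (to avoid floating-point overflow for large k) and, for m = 1 and δ = 0, confirms that the distribution sums to 1 and that the tail constant is c1,0 = (τ − 1)Γ(3)/Γ(1) = 4 with τ = 3.

```python
from math import lgamma, exp

def f_pa(k, m, delta):
    """Limiting degree distribution (3.27), computed via log-Gamma."""
    if k < m:
        return 0.0
    log_val = (lgamma(k + delta) + lgamma(m + 2 + delta + delta / m)
               - lgamma(k + 3 + delta + delta / m) - lgamma(m + delta))
    return (2 + delta / m) * exp(log_val)

# For m = 1, delta = 0 the formula reduces to f(k) = 4 / (k (k+1) (k+2)).
print(f_pa(1, 1, 0.0))                                 # 2/3
print(sum(f_pa(k, 1, 0.0) for k in range(1, 10**5)))   # close to 1
print(f_pa(1000, 1, 0.0) * 1000**3)                    # approaches c = 4
```

The last line illustrates the tail behaviour (3.28): fPA(k) k^τ tends to cm,δ as k → ∞.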

Exercise 3.7 Look up the properties of t ↦ Γ(t) on Wikipedia. With the help
of partial integration and induction it can be shown that Γ(j) = (j − 1)! for
j ∈ N.


CHAPTER 3. NETWORK MODELS

Exercise 3.8 Why is the result in Homework 3.2 (= every vertex eventually
sees its degree tend to infinity) not in contradiction with the fact that the degree
distribution converges to fPA (= sparseness)?
For m = 1 the above formulas simplify to

    fPA(k) = 0 for k = 0, and

    fPA(k) = (2 + δ) Γ(k + δ) Γ(3 + 2δ) / [Γ(k + 3 + 2δ) Γ(1 + δ)] for k ≥ 1,     (3.30)

and

    fPA(k) = c1,δ k^(−τ) [1 + O(1/k)],   k → ∞,     (3.31)

with

    τ = 3 + δ,   c1,δ = (2 + δ) Γ(3 + 2δ) / Γ(1 + δ).     (3.32)

Figure 3.6 shows a realisation of the degree sequence of PAn(2, 0) for n =
300,000 and n = 1,000,000. The horizontal axis is the degree k, the vertical
axis is the number of vertices with degree k, corresponding to nfPA(k).

Figure 3.6: The degree sequence of a preferential attachment random graph with
m = 2, δ = 0 and n = 300,000, respectively, n = 1,000,000 on a log-log scale.
Courtesy Remco van der Hofstad.


In van der Hofstad [10, Chapter 7] it is shown that (PAn (m, ) nN is smallworld for any m N and > m. Unfortunately, the proof is not easy and
there is no good control on the constant K. For m N\{1} and  (m, 0)
it is even ultra small-world. It can also be shown that (PAn (m, ) nN is not
highly clustered because the random graph is locally tree-like, i.e., the number
of triangles grows much slower with n than the number of wedges.
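The comparison between triangles and wedges can be made concrete. The sketch below counts both quantities in a small undirected graph given as neighbour sets (a hypothetical toy example; on a locally tree-like graph the triangle count would be negligible compared to the wedge count).

```python
def triangles_and_wedges(adj):
    """Count triangles and wedges in a simple undirected graph,
    given as one set of neighbours per vertex."""
    n = len(adj)
    # a wedge is a pair of distinct neighbours of the same vertex
    wedges = sum(len(adj[i]) * (len(adj[i]) - 1) // 2 for i in range(n))
    triangles = 0
    for i in range(n):
        for j in adj[i]:
            if j > i:
                # common neighbours k > j of i and j close a triangle
                triangles += sum(1 for k in adj[i] & adj[j] if k > j)
    return triangles, wedges

# Toy graph: a triangle on vertices 0, 1, 2 plus a pendant edge (2, 3).
adj = [{1, 2}, {0, 2}, {0, 1, 3}, {2}]
print(triangles_and_wedges(adj))   # (1, 5)
```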

3.2.4 Dynamic robustness

The important feature of the preferential attachment model is that, unlike the
configuration model, the power-law degree distribution is explained via a mechanism for the growth of the graph. Therefore, preferential attachment offers
a possible explanation as to why power-law degree distributions occur in real-world networks. As Barabási [2] puts it:
... the scale-free topology is evidence of organising principles acting at
each stage of the network formation. (...) No matter how large and complex a network becomes, as long as preferential attachment and growth are
present it will maintain its hub-dominated scale-free topology.


This is correct, but it is overstating the point a bit, since power laws are intimately related to the affineness of the attachment probabilities. Indeed, it
turns out that if the probability for a new vertex to attach itself to an old vertex with degree k is chosen proportional to k^α with α ∈ (0, 1), then fPA falls off
like a stretched exponential and scale freeness is lost (Krapivsky, Redner and
Leyvraz [12]). On the other hand, if α ∈ (1, ∞), then there is a single vertex
that is connected to nearly all the other vertices (Krapivsky and Redner [11]).
Moreover, if 1/(α − 1) is non-integer, then there are finitely many vertices with
degree > 1/(α − 1) and infinitely many vertices with degree < 1/(α − 1) (Oliveira
and Spencer [13]).
Many more possible explanations have been given for why power laws occur
in real-world networks, and many adaptations of the above simple preferential
attachment model have been studied in the literature, all giving rise to power-law degree distributions.
While preferential attachment is natural in social networks, in other
examples of real-world networks some form of preferential attachment is also likely
to be present. For example, in the WWW when a new webpage is created it
is more likely to link to an already popular site, such as Google, than to the
personal web page of a single individual. For the Internet it may be profitable
for new routers to be connected to highly connected routers, since these give
rise to short distances. Even in biological networks some form of preferential
attachment exists. In fact, the idea of preferential attachment in the context of
the evolution of species dates back to Yule [14] in 1925.

Bibliography
[1] R. Arratia and T.M. Liggett, How likely is an i.i.d. degree sequence to be
graphical? Ann. Appl. Probab. 15 (2005) 652–670.
[2] A.-L. Barabási, Linked: The New Science of Networks, Perseus Publishing,
Cambridge, Massachusetts, 2002.
[3] A.-L. Barabási and R. Albert, Emergence of scaling in random networks,
Science 286 (1999) 509–512.
[4] E.A. Bender and E.R. Canfield, The asymptotic number of labelled graphs
with given degree sequences, Journal of Combinatorial Theory (A) 24
(1978) 296–307.
[5] B. Bollobás, A probabilistic proof of an asymptotic formula for the number
of labelled regular graphs, European J. Combin. 1 (1980) 311–316.
[6] B. Bollobás and O. Riordan, The diameter of a scale-free random graph,
Combinatorica 24 (2004) 5–34.
[7] S.A. Choudum, A simple proof of the Erdős–Gallai theorem on graph sequences, Bull. Austral. Math. Soc. 33 (1986) 67–70.
[8] P. Erdős and T. Gallai, Graphs with points of prescribed degrees (in Hungarian), Mat. Lapok 11 (1960) 264–274.
[9] R. van der Hofstad, Random Graphs and Complex Networks, Volume I,
monograph in preparation. File can be downloaded from http://www.win.tue.nl/~rhofstad/
[10] R. van der Hofstad, Random Graphs and Complex Networks, Volume II,
monograph in preparation. File can be downloaded from http://www.win.tue.nl/~rhofstad/
[11] P.L. Krapivsky and S. Redner, Organization of growing random networks,
Phys. Rev. E 63 (2001) 066123.
[12] P.L. Krapivsky, S. Redner and F. Leyvraz, Connectivity of growing random
networks, Phys. Rev. Lett. 85 (2000) 4629.
[13] R. Oliveira and J. Spencer, Connectivity transitions in networks with superlinear preferential attachment, Internet Math. 2 (2005) 121–163.
[14] G.U. Yule, A mathematical theory of evolution, based on the conclusions of
Dr. J.C. Willis, F.R.S., Phil. Trans. Roy. Soc. London B 213 (1925) 21–87.


Chapter 4

Network Topology
In this chapter we discuss some of the most important empirical properties
observed in real-world networks. To this end, we first introduce some basic
notions in Section 4.1 and then discuss various empirical properties in some
detail throughout Section 4.2. Most of these empirical properties are computed
on real-world examples of the type discussed in Chapter 1.
This chapter is meant as a general introduction to the structural characterization of real-world networks, and also as a compact summary of the most
commonly observed empirical properties. The chapter puts some of the definitions already introduced in Chapters 2 and 3 to work on empirical networks,
and at the same time it introduces a series of new definitions aimed at capturing
more structural details. The aim is to complement the mathematical definitions
of the previous chapters with a phenomenological basis and to provide a solid
empirical reference for the following chapters.
Other empirical introductions to networks from a rather general
point of view can be found in the review articles [1, 2, 3, 4] and books
[5, 6, 7, 8, 9].

4.1 Basic notions

First we introduce some basic definitions. In some cases, these definitions (or
the notation) are slightly different from the corresponding definitions we gave in
Chapter 2, because, for instance, here we need to distinguish between directed
and undirected networks, and between so-called local and global properties.
This should not alarm readers: being aware of the existence of different quantitative expressions for the same abstract notion is actually an instructive exercise. These different expressions reflect the existing variety of definitions in the
scientific literature about complex networks (and in most other fields as well).
While the terms graph and edge are preferred in the definition
of abstract mathematical models, the terms network and link are more commonly used
when referring to real-world objects. In this chapter we will therefore prefer
the latter choice, even if we will occasionally employ the former one as well.
In general, the links of a network can be either directed, if an orientation (i.e.
an arrow) is specified along them, or undirected, if no orientation is specified.
Correspondingly, the whole network is denoted as directed (see Fig. 4.1a) or

Figure 4.1: Simple examples of networks, each with n = 6 vertices. a) A directed
network. Here the links between vertices 1 and 2 and between 1 and 4 are reciprocated.
b) An undirected network, which is also the undirected version of network a). c) The
directed version of network b). Here all links are reciprocated.

undirected (see Fig. 4.1b). More precisely, undirected links are bidirectional
ones, since they allow transit in both directions. For this reason, an undirected
network can always be thought of as a directed one where each undirected link
is replaced by two directed ones pointing in opposite directions (see Fig. 4.1c).
A link in a directed network is said to be reciprocated if another link between
the same pair of vertices, but with opposite direction, is present. Therefore, an
undirected network can be regarded as a special case of a directed network where
all links are reciprocated.
In a real-world network, the identity of each vertex matters. For this reason,
if n is the number of vertices, each vertex is explicitly labelled with an integer
number i = 1, . . . , n. All the topological information can then be compactly
expressed by defining the n n adjacency matrix of the network, whose
entries tell us whether a link is present between two vertices (this is what is
ordinarily done, for instance, to store network data in a computable form). For
directed networks, we denote the adjacency matrix elements by aij and define
them as follows:

    aij ≡ { 1 if there is a link from i to j,
          { 0 else.                              (4.1)

For undirected networks, we denote the adjacency matrix elements by the different symbol bij and use the definition

    bij ≡ { 1 if there is a link between i and j,
          { 0 else.                              (4.2)
Note that for undirected networks bij = bji, while in general aij ≠ aji for directed networks (aij = aji = 1 if and only if the links between i and j are reciprocated).
Exercise 4.1 Write the adjacency matrices corresponding to the three networks in Fig. 4.1. □
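In practice, definitions (4.1) and (4.2) are exactly how network data are stored. The sketch below builds the adjacency matrix of a small hypothetical directed network (a 3-vertex example of our own, not one of the networks in Fig. 4.1) and checks reciprocity.

```python
# A hypothetical directed network on vertices 0, 1, 2:
# links 0->1, 1->0 and 1->2.
n = 3
links = [(0, 1), (1, 0), (1, 2)]

# Adjacency matrix: a[i][j] = 1 iff there is a link from i to j, cf. (4.1).
a = [[0] * n for _ in range(n)]
for (i, j) in links:
    a[i][j] = 1

# The link 0->1 is reciprocated (a01 = a10 = 1); the link 1->2 is not.
print(a[0][1] == a[1][0] == 1)   # True
print(a[1][2], a[2][1])          # 1 0
```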
As mentioned above, an undirected network can be regarded as a directed
one; in this case, the adjacency matrix aij of the resulting directed network is


Figure 4.2: Examples of simple, familiar undirected networks. a) Periodic one-dimensional chain (ring) with first- and second-neighbour interactions. b) Two-dimensional lattice with only first-neighbour interactions. c) Fully connected network (mean-field approximation). All these networks are regular, and no disorder is
introduced.

simply given by

    aij ≡ bij,     (4.3)

where bij is that of the original undirected network. In this particular case,
aij is a symmetric matrix. Note that this mapping can be reversed in order to
recover the original undirected network: from Fig. 4.1b we can always obtain
Fig. 4.1c, and vice versa. By contrast, the mapping of a directed network
onto an undirected one - where an undirected link is placed between vertices
connected by at least one directed link - is also possible, even if in general it
cannot be reversed due to a loss of information. For instance, the network shown
in Fig. 4.1b is the undirected version of that shown in Fig. 4.1a. From Fig. 4.1a
we can obtain Fig. 4.1b, but from the latter we cannot go back to Fig. 4.1a
unless we are given more information.
Homework 4.1 Imagine a generic directed network where not all links are
reciprocated, and then consider its undirected projection. Find the mathematical relation between the entries {bij } of the adjacency matrix of the projected
undirected network and the entries {aij } of the adjacency matrix of the original
directed network. Test your relation on the networks shown in Figs. 4.1a and
4.1b (by assuming that the former is the original directed network), and then
for the networks shown in Figs. 4.1b and 4.1c (by assuming that the latter is
the original directed network). □
Before introducing some specific real-world networks and presenting their
empirical properties, we briefly mention the simplest and most familiar kind of
networks that scientists from different fields have traditionally worked
with, namely the class of (deterministic) regular networks. These are networks where all vertices are connected to the same number z of neighbours, in
a highly ordered fashion. A different class of regular networks is that of random
regular graphs, where all vertices still have the same number of neighbours, but
the connections are randomly established between vertices. It should be noted
that such graphs are another example of random graphs, different from the


Erdős–Rényi model discussed in Chapter 2. In fact, they are a particular case
of the Configuration Model introduced in Chapter 3, where the degree sequence
is now chosen to be the constant vector ~k = (z, . . . , z).
In Fig. 4.2 we show three examples of regular (undirected) networks: a
periodic chain with first- and second-neighbour interactions (z = 4), a two-dimensional lattice with only first-neighbour interactions (z = 4) and a complete network (where each vertex is connected to all the others: z = n − 1).
Chains and square lattices are particular examples of the more general class
of D-dimensional discrete lattices, used whenever a set of elements is assumed
to be connected to its first, second, . . . and l-th neighbours (nearest-neighbour
connections). In this case, each vertex is connected to z = 2Dl other vertices.
Complete networks are instead used when infinite-range connections are assumed, resulting in what is sometimes referred to as the mean-field scenario,
i.e. z = n − 1. The highly ordered structure of these networks translates into
certain regularities of their adjacency matrices.
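Those regularities are easy to see in code. The sketch below builds the adjacency matrix of a periodic chain where each vertex is linked to its l nearest neighbours on each side, the D = 1 case of z = 2Dl (function name and labelling are our own choices).

```python
def ring_lattice(n, l):
    """Adjacency matrix of a periodic one-dimensional chain where each
    vertex is linked to its l nearest neighbours on each side (z = 2l).
    Vertices are labelled 0, ..., n-1 around the ring."""
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for step in range(1, l + 1):
            j = (i + step) % n          # wrap around the ring
            b[i][j] = b[j][i] = 1
    return b

b = ring_lattice(n=6, l=2)
print([sum(row) for row in b])   # every vertex has degree z = 2*l = 4
```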

Exercise 4.2 Write the adjacency matrices for the three networks shown in
Fig. 4.2. Before doing that, find out what is the most convenient labelling of
vertices in each network and describe it. □
The examples of regular networks considered above can be built deterministically, or in other words without introducing randomness. They are among
the simplest specifications of networks and represent only a small subset of
the full range of possible topological configurations. The rest of this chapter
aims at showing that real networks are consistent with neither ER random graphs nor regular networks. Therefore, traditional assumptions such
as nearest-neighbour or mean-field connections cannot be considered as good
choices for most real-world networks, along with their predictions for the dynamical behaviour of any process defined on them. This problem persists even
after introducing randomness in regular networks: random regular graphs, while
exhibiting a behaviour that is much richer than that of their deterministic counterparts, are still not good models of real-world networks. The failure of regular
networks motivates the introduction of more complex models, some of which
have been presented in Chapter 3 and some of which will be introduced in
Chapter 5.

4.2 Empirical topological properties

We now come to the description of various empirical topological properties of
real networks. Several reviews exist in the literature [1, 2, 3, 4, 5, 6, 7] presenting
this subject from various viewpoints. Here we follow an approach in which we
characterize the topology of real networks progressively, from local to global
properties. More precisely, we first consider the properties specified by the first
neighbours of a vertex (first-order properties), then those specified by the first
and second neighbours (second-order properties), and so on until we come to
those relative to the whole network (global properties).

4.2.1 First-order properties

By first-order properties we mean the set of topological quantities that can be
specified by starting from a vertex and considering its first neighbours. This
information is captured by simply considering the elements of the adjacency
matrix, or functions of them.
(In- and Out-)Degree
In an undirected network, the simplest first-order property is the number ki of
neighbours of a vertex i, or equivalently the number of links attached to i. The
quantity ki is called the degree of vertex i. In terms of the adjacency matrix
bij, the degree can be defined as

    ki ≡ Σj≠i bij.     (4.4)

In a directed network, it is possible to distinguish between the in-degree ki^in
and the out-degree ki^out of each vertex i, defined as the number of in-coming
and out-going links of i respectively. In this case, if aij denotes an entry of the
adjacency matrix, then the degrees read

    ki^in ≡ Σj≠i aji,   ki^out ≡ Σj≠i aij.     (4.5)

In an undirected network, the n-dimensional vector ~k of vertex degrees is
called the degree sequence (recall the definition (2.1) in Chapter 2). In a
directed network, the n-dimensional vectors ~k^in and ~k^out are called the in-degree sequence and out-degree sequence respectively.
Exercise 4.3 Calculate the (in- and out- where applicable) degree sequences of
the three networks shown in Fig. 4.1. Then calculate the degree sequences of the
three networks shown in Fig. 4.2. Describe the effects of a possible relabelling
of vertices in the two sets of networks. □
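Definitions (4.4) and (4.5) translate directly into sums over rows and columns of the adjacency matrix. The sketch below uses a hypothetical 4-vertex directed network of our own, not one of the figures' networks.

```python
# A hypothetical directed network: links 0->1, 0->2, 1->0, 2->1, 2->3.
a = [[0, 1, 1, 0],
     [1, 0, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
n = len(a)

# In-degree (4.5): sum over the i-th column; out-degree: the i-th row.
k_in  = [sum(a[j][i] for j in range(n) if j != i) for i in range(n)]
k_out = [sum(a[i][j] for j in range(n) if j != i) for i in range(n)]

print(k_in)    # [1, 2, 1, 1]
print(k_out)   # [2, 1, 2, 0]
print(sum(k_in) == sum(k_out))   # True: both sums count the links once
```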
Empirical degree distribution
A very important quantity for the characterization of the first-order topological
properties of a real-world network is the (normalized) histogram of the values
{ki }ni=1 , i.e. the empirical degree distribution P (k) expressing the fraction
of vertices with degree k, or equivalently the probability that a randomly chosen
vertex has degree k (recall the definition (2.2)). In a directed network, the
corresponding objects are the empirical in-degree distribution P in (k in ) and the
empirical out-degree distribution P out (k out ).
It turns out that, for a large number of real-world networks, the empirical
degree distribution displays the power-law form

    P(k) ∝ k^(−γ)     (4.6)


with 2 ≤ γ ≤ 3. In directed networks, the in- and out-degree distributions often
display the same form:

    P^in(k^in) ∝ (k^in)^(−γin),   P^out(k^out) ∝ (k^out)^(−γout),     (4.7)

where γin and γout have in general different values, both typically between 2
and 3.
For the practical purpose of plotting empirical degree distributions and estimating their exponents, the (empirical) cumulative distributions are commonly
used:

    P>(k) ≡ Σk′≥k P(k′),   P>^in(k^in) ≡ Σk′≥k^in P^in(k′),   P>^out(k^out) ≡ Σk′≥k^out P^out(k′).     (4.8)

In this way, the statistical noise is reduced by summing over k′.
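A minimal sketch of how P>(k) of (4.8) is computed from a degree sequence (function name and toy data are our own):

```python
from collections import Counter

def empirical_ccdf(degrees):
    """Empirical cumulative distribution P_>(k): the fraction of
    vertices with degree >= k, for each observed degree k."""
    n = len(degrees)
    counts = Counter(degrees)
    ccdf, remaining = {}, n
    for k in sorted(counts):
        ccdf[k] = remaining / n    # vertices with degree >= k
        remaining -= counts[k]
    return ccdf

degrees = [1, 1, 1, 2, 2, 3]       # toy degree sequence
print(empirical_ccdf(degrees))     # k=1 -> 1.0, k=2 -> 0.5, k=3 -> 1/6
```

Plotting these values against k on log-log axes gives the kind of curve shown in Fig. 4.3.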
Exercise 4.4 Plot the empirical cumulative (in- and out- where applicable)
degree distributions for the three networks shown in Fig. 4.1. □
If the empirical degree distribution has the power-law behaviour of Eq. (4.6)
or (4.7), then the empirical cumulative distributions are again power-laws, but
with an exponent reduced by one:

    P>(k) ∝ k^(−γ+1),   P>^in(k^in) ∝ (k^in)^(−γin+1),   P>^out(k^out) ∝ (k^out)^(−γout+1).     (4.9)

Homework 4.2 Prove the above statement, approximating k, k^in, k^out with
continuous variables. In contrast, consider an exponential empirical degree distribution of the form P(k) ∝ e^(−ak) (with a > 0) and find the large-degree expression for the empirical cumulative distribution P>(k) (approximating k again
with a continuous variable). □
In Fig. 4.3 we show the empirical cumulative degree distribution for three
networks: a snapshot of the Internet, a protein network and a portion of the
WWW. The power-law behaviour is witnessed by their approximate straight-line
trend in log-log scale. As we mentioned, the exponent of the empirical degree
distribution of real-world networks is systematically found to be in the range
2 ≤ γ ≤ 3, a fact that is verified by all plots in the figure (where 1 ≤ γ − 1 ≤ 2).
A note on scale-free distributions
Power-law distributions are very important from a general point of view since
they lack a typical scale [13, 14]. More precisely, they are the only distributions
satisfying the scaling condition

    P(ak) = f(a) P(k),   k ∈ N,   a > 0.     (4.10)

Homework 4.3 Prove the above statement. □
The functional form of a power-law distribution is therefore unchanged, apart
from a multiplicative factor, under a rescaling of the variable k. Due to this
absence of a characteristic scale, power-law distributions are also called scale-free distributions. An important consequence of scale-free behaviour is the


Figure 4.3: Empirical cumulative degree distribution for three different networks. a)
P>(k) for the Internet at the autonomous system level in 1999 [10]. b) P>(k) for
the protein interaction network of the yeast Saccharomyces cerevisiae [11]. c) P>^in(k^in)
for a 300 million vertex subset of the WWW in 1999 [12]. All curves are approximately straight lines in log-log scale, indicating that they are power-law distributions
(modified from ref. [4]).

presence of so-called fat tails: compared to distributions that decay (at least)
exponentially (e.g. Gaussian or Poisson distributions), power-law distributions
decay much more slowly and assign a much bigger probability to rare events (the
outcomes in the tail of the distribution).
The empirical scale-free behaviour means that in real-world networks there
are predominantly many low-degree vertices but also a fair number of high-degree ones, which are connected to a significant fraction of the other vertices.
The fraction of vertices with very large degrees (i.e. hubs) is not negligible
and gives rise to a whole hierarchy of connectivities, from small to large. If we
imagine that a dynamical process takes place on the network (see for instance
Chapters 8, 9, 11, 12), the scale-free property has a remarkable effect: once the
process reaches a high-degree vertex, it then propagates to a large portion of
the entire network, resulting in extremely fast dynamics.
By contrast, note that the regular networks introduced in Section 4.1 have a
delta-like empirical degree distribution of the form P(k) = δk,z where z is
the degree of every vertex (see Fig. 4.2). For D-dimensional lattices with l-th-neighbour interactions, z = 2Dl and no vertex is connected to a significant
fraction of the other vertices. For fully connected networks, z = n − 1 and every
vertex is connected to all the others. In all these cases, no hierarchy is present
and the network is perfectly homogeneous.
Average degree and number of links
It is possible to consider the average degree ⟨k⟩ as a single quantity characterizing the overall first-order properties of a network, and then compare different
networks with respect to it. In an undirected network the average degree can
be expressed as

    ⟨k⟩ ≡ Σi ki / n = 2Lu / n,     (4.11)

where

    Lu ≡ Σi Σj<i bij = (1/2) Σi Σj≠i bij     (4.12)


is the total number of undirected links in the network, expressed in terms of the
entries of the adjacency matrix. Note that in principle the total number of links
also includes self-loops, which are links starting and ending at the same vertex
(corresponding to nonzero diagonal entries of the adjacency matrix). However,
here and in the following we assume that there are no self-loops in the network,
and this is reflected in the requirement i ≠ j in Eq. (4.12).
Exercise 4.5 Using the result of Exercise 4.3, calculate ⟨k⟩ and Lu for the undirected network in Fig. 4.1. Check that ⟨k⟩ = 2Lu/n as stated in Eq. (4.11). □
For directed networks, it is easy to see that the average in-degree ⟨k^in⟩ equals
the average out-degree ⟨k^out⟩, and both quantities can be expressed as

    ⟨k^in⟩ = Σi ki^in / n = Σi ki^out / n = ⟨k^out⟩ = L / n,     (4.13)

where

    L ≡ Σi Σj≠i aij     (4.14)

is the total number of directed links, expressed in terms of the adjacency matrix
entries.
Exercise 4.6 Using the result of Exercise 4.3, calculate ⟨k^in⟩, ⟨k^out⟩ and L for the
two directed networks in Fig. 4.1. For both networks, check that ⟨k^in⟩ = ⟨k^out⟩ =
L/n as stated in Eq. (4.13). □
It should be noted that we chose a different notation for Lu and L to avoid
confusion when an undirected network is regarded as directed, with two directed
links replacing each undirected one. In that case the mapping described by
Eq. (4.3) allows us to recover Eq. (4.14) consistently from Eq. (4.12), and our
notation yields L = 2Lu as expected.
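A minimal sketch of (4.11)–(4.14) and of the consistency L = 2Lu, on a hypothetical 4-vertex undirected network of our own:

```python
# Undirected toy network: edges (0,1), (0,2), (1,2), (2,3).
b = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
n = len(b)

L_u = sum(b[i][j] for i in range(n) for j in range(i))   # (4.12)
k_avg = 2 * L_u / n                                      # (4.11)

# Regard the network as directed via a_ij = b_ij, cf. (4.3);
# then (4.14) counts every undirected link twice: L = 2 L_u.
L = sum(b[i][j] for i in range(n) for j in range(n) if j != i)

print(L_u, L, k_avg)   # 4 8 2.0
```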
Note that, in terms of the degree distribution, the average degree ⟨k⟩ reads

    ⟨k⟩ = Σk′ k′ P(k′),     (4.15)

    ⟨k^in⟩ = Σk′ k′ P^in(k′) = Σk′ k′ P^out(k′) = ⟨k^out⟩,     (4.16)

for undirected and directed networks respectively.
Homework 4.4 Prove that, for an empirical undirected degree distribution of
the form P(k) ∝ k^(−γ) where 2 < γ ≤ 3 (as in most real-world networks), the
mean value of the degree is finite while the variance of the degree is infinite.
[Hint: rewrite Eq. (4.15) by approximating P(k) with a continuous probability
density defined for k ∈ (1, +∞), and recall that the variance is defined as ⟨k²⟩ − ⟨k⟩²
where ⟨k⟩ and ⟨k²⟩ are the first and second moment respectively]. □


Link density
The number of links is an interesting property in itself, being a measure of the
density of connections in the network. In order to compare networks with different numbers of vertices, the number of links is usually divided by its maximum
possible value in order to obtain the link density. In an undirected network, the
maximum possible number of links (with self-loops excluded) is given by the
total number of vertex pairs, which is (n choose 2) = n(n − 1)/2 if the number of vertices
is n. Therefore the link density is defined as

    cu ≡ 2Lu / [n(n − 1)].     (4.17)

By contrast, in a directed network the maximum number of links is given by
twice the number of possible vertex pairs (since each pair can be occupied by
two links with opposite directions) and the link density is therefore defined as

    c ≡ L / [n(n − 1)].     (4.18)

It is instructive to plot the values of the link density c for different real-world networks in a single figure, as a function of the number n of vertices. An
example of such a plot is shown in Fig. 4.4. We see that networks of the same
type tend to be clustered together, and that all points approximately follow the
trend

    c(n) ∝ n^(−1).     (4.19)

There is however one important exception: the temporal snapshots of the World
Trade Web lie off the curve and tend to have a constant link density, independent of n.
Homework 4.5 Show that, in the limit n → ∞ of large network size, D-dimensional lattices display cu(n) ∝ n^(−1) → 0 (note that these networks are undirected, unlike those in Fig. 4.4). □
In the limit n → ∞, the case cu(n) → 0 is often referred to as the sparse
network limit, while cu(n) → cu(∞) > 0 (where cu(∞) is a finite constant) is
the dense network limit. In graph theory, these limits can be defined rigorously
for mathematical models (see for instance Chapter 2). Real networks are however of finite size, therefore in principle we should speak of a large-size regime
rather than an infinite-size limit, the latter being only a formal extrapolation of
Eq. (4.19). Bearing this warning in mind, we conclude that all the real-world
networks in Fig. 4.4 are sparse, except the WTW which is a dense network.
Indeed, most real-world networks are found to be sparse.

4.2.2 Second-order properties

By second-order topological properties we denote those properties which depend not only on the direct connections between a vertex and its nearest neighbours, but also on the indirect connections from a vertex to the neighbours of
its neighbours. Therefore the computation of these properties involves products of two adjacency matrix elements bij bjk. In Section 3.1.4 we encountered a
second-order property when we looked at the distribution of vertices at distance
2 from a given vertex.


Figure 4.4: Link density c versus number of vertices n for several real-world directed
networks. Except for the WTW, all points roughly follow the dashed line ∝ n^(−1).

Degree-degree correlations
An important example of second-order structure is given by the degree correlations: is the degree of a vertex correlated with that of its first neighbours?
Statistically speaking, the most complete way to describe second-order topological properties is to consider the two-vertices conditional empirical degree
distribution P(k′|k) specifying the probability that a vertex with degree k
is connected to a vertex with degree k′. In the trivial case with no correlation between the degrees of connected vertices, the second-order properties can
be obtained in terms of the first-order ones, or in other words the conditional
probability is equal to the unconditional (marginal) probability that a vertex is
connected to a vertex of degree k′:

    P(k′|k) = k′ P(k′) / ⟨k⟩.     (4.20)

However, as we will show in the following, real networks display a more complex
behaviour and are characterized by nontrivial degree correlations which make
the form of P (k 0 |k) deviate from Eq. (4.20).

Average nearest-neighbour degree
Estimating the empirical form of the conditional probability directly from real
data is difficult, since P(k′|k) is a two-parameter curve and is affected by statistical fluctuations (however two-parameter plots of this type have been studied
[15]).
A more compact description, which also partly averages out the statistical
noise, is given by defining the average nearest-neighbour degree (ANND


Figure 4.5: Assortative and disassortative mixing in a generic network, as measured by the increasing or decreasing trend of the average nearest-neighbour
degree knn(k) as a function of the degree k.

in the following) of a vertex i as the average degree of the neighbours of i. For
an undirected network, the ANND is denoted by ki^nn and defined in terms of
the adjacency matrix as

    ki^nn ≡ Σj≠i bij kj / ki = Σj≠i Σk≠j bij bjk / Σj≠i bij.     (4.21)
It is then possible to average ki^nn over all vertices with the same degree k and
plot the result against k to obtain the one-parameter curve knn(k). The slope
of this curve gives information about the nature of the degree correlations: if,
when considered as a function of k, knn(k) is an increasing function, this means
that the degrees are positively correlated (high-degree vertices are on average
linked to high-degree ones) and the network is said to display assortative
mixing, while if knn(k) decreases the degrees are negatively correlated and
the network is said to display disassortative mixing. These behaviours are
schematically depicted in Fig. 4.5. In the uncorrelated or neutral case, the
ANND is independent of k.
Exercise 4.7 Compute the value of ki^nn for each vertex of the network shown
in Fig. 4.1b. □
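The first form of (4.21) is straightforward to compute. The sketch below uses a hypothetical hub-and-leaves network of our own (not Fig. 4.1b): the hub's neighbours have low degree while the leaves neighbour the hub, the disassortative pattern.

```python
# Hypothetical undirected network: hub 0 linked to 1, 2, 3; extra edge (3, 4).
b = [[0, 1, 1, 1, 0],
     [1, 0, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [1, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
n = len(b)
k = [sum(row) for row in b]          # degrees, cf. (4.4)

# ANND (4.21): k_i^nn = (sum of neighbours' degrees) / k_i
k_nn = [sum(b[i][j] * k[j] for j in range(n)) / k[i] for i in range(n)]
print(k_nn)   # the hub sees low-degree neighbours, the leaves see the hub
```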
Note that in regular networks (see Section 4.1) ki^nn = z for all i and the degrees
are perfectly correlated; however, the knn(k) plot reduces to the single point
(z, z). Real networks are systematically found to be either assortative or disassortative. This means that the first-order topological properties such as the
degree distribution, even though interesting by themselves, still do not capture
the relevant complexity of real networks.
We note that the quantity knn(k) can be expressed in terms of the conditional
probability P(k′|k) as

    knn(k) = Σk′ k′ P(k′|k).     (4.22)


Figure 4.6: Plots of the average nearest-neighbour degree for two real networks. a)
The knn(k) plot for the 1998 snapshot of the Internet (circles); the solid line is proportional to k^(−0.5) (modified from Ref. [16]). b) The three plots knn,in(k^in), knn,out(k^out)
and knn(k) for a snapshot of the World Trade Web in 2000 (the solid line is again proportional to k^(−0.5)); the inset reports the knn(k) curve for the subset of the undirected
network defined only by the reciprocated links (after Ref. [17]).

From the above expression we recover the expected constant trend for the
uncorrelated networks described by Eq. (4.20): inserting Eq. (4.20) into Eq. (4.22)
yields $k^{nn}(k) = \langle k^2 \rangle / \langle k \rangle$, independently of $k$.
The $k^{nn}(k)$-curve is particularly interesting when it displays the empirical
form
$$k^{nn}(k) \sim k^{-\nu}. \qquad (4.23)$$
For instance, the Internet topology displays the above trend with $\nu = 0.5$
(see Fig. 4.6a) [16] and is therefore a disassortative network, meaning that
high-degree autonomous systems are on average connected to low-degree ones and
vice versa.
Relations similar to (4.21) hold for directed networks as well. More specifically,
it is possible to define the average nearest-neighbour in-degree $k_i^{nn,in}$ and
the average nearest-neighbour out-degree $k_i^{nn,out}$ as
$$k_i^{nn,in} \equiv \frac{\sum_{j\neq i}\sum_{k\neq j} a_{ji} a_{kj}}{\sum_{j\neq i} a_{ji}}, \qquad k_i^{nn,out} \equiv \frac{\sum_{j\neq i}\sum_{k\neq j} a_{ij} a_{jk}}{\sum_{j\neq i} a_{ij}}, \qquad (4.24)$$
respectively, and correspondingly the $k^{nn,in}(k^{in})$-curve and the $k^{nn,out}(k^{out})$-curve.
However, it is also possible to regard the directed network as undirected
(using the mapping you have found in Exercise 4.1) and then consider the undirected
ANND defined in Eq. (4.21) and the corresponding $k^{nn}(k)$-curve. For
instance, the quantities $k^{nn,in}(k^{in})$, $k^{nn,out}(k^{out})$ and $k^{nn}(k)$ calculated on a
snapshot of the World Trade Web in the year 2000 are reported in Fig. 4.6b
[17]. The power-law scaling holds for all three of them. In particular, the
undirected ANND obeys Eq. (4.23) with $\nu = 0.5$, just like the Internet. The inset
of the same figure shows the $k^{nn}(k)$ curve computed on a subnetwork of the
undirected WTW where pairs of vertices are connected only if in the original


directed network they are joined by two reciprocated directed links pointing in
opposite directions (see Section 4.1). The trend is similar to the other trends,
and the WTW is therefore a disassortative network in all the above representations.
Another extensive analysis of the WTW [18], based on a more detailed
data set than that used in Ref. [17], confirms the disassortative behaviour but
questions the actual occurrence of a scaling form as described by Eq. (4.23).

Assortativity coefficient
As for the first-order properties, it is possible to define single quantities
characterizing the overall second-order properties of the network as a whole. For
instance, one can introduce the assortativity coefficient [19, 20] as the
correlation coefficient between the degrees at either end of a link. To this end,
let us define $\langle kk' \rangle$ as the average, over all links of an undirected network, of the
product of the degrees of the nodes at the two ends of a link:
$$\langle kk' \rangle \equiv \frac{1}{L} \sum_{i} \sum_{j<i} b_{ij} k_i k_j. \qquad (4.25)$$

We can then define the assortativity coefficient as
$$r_a \equiv \frac{\langle kk' \rangle - \langle k \rangle^2}{\langle k^2 \rangle - \langle k \rangle^2}. \qquad (4.26)$$

A similar expression can be derived for directed networks. Consistently with
the analysis of the ANND curve, real-world networks are generally found to be
either assortative ($r_a > 0$) or disassortative ($r_a < 0$), while it is rare to find
an uncorrelated network ($r_a \approx 0$). Interestingly, social networks turn out to be
assortative, while biological networks, the WWW and the Internet turn out to
be disassortative [19, 20].
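A minimal sketch of Eqs. (4.25)-(4.26) follows. One convention detail is ours: the moments $\langle k \rangle$ and $\langle k^2 \rangle$ are taken over the $2L$ link ends (not over the vertices), which makes $r_a$ a genuine Pearson correlation coefficient bounded in $[-1, 1]$; conventions differ slightly in the literature. The toy star graph is hypothetical.

```python
# Sketch of the assortativity coefficient r_a of Eq. (4.26); moments are
# taken over the 2L link ends (a convention choice, see text above).

def assortativity(b):
    n = len(b)
    k = [sum(row) for row in b]
    # Degree pairs (k_i, k_j) at the two ends of every link, each link once.
    pairs = [(k[i], k[j]) for i in range(n) for j in range(i) if b[i][j]]
    L = len(pairs)
    ends = [d for pair in pairs for d in pair]       # 2L degree values
    mean_k = sum(ends) / (2 * L)
    mean_k2 = sum(d * d for d in ends) / (2 * L)
    mean_kk = sum(ki * kj for ki, kj in pairs) / L   # Eq. (4.25)
    return (mean_kk - mean_k ** 2) / (mean_k2 - mean_k ** 2)

# Hypothetical star graph: every link joins the hub (degree 3) to a leaf
# (degree 1), so the degrees at the link ends are perfectly anticorrelated.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
```

On the star the coefficient evaluates to $r_a = -1$, the extreme disassortative case.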

Reciprocity
We conclude our discussion of the second-order properties with the notion of
reciprocity, which is a characteristic of directed networks. As anticipated in
Section 4.1, a link from a vertex $i$ to a vertex $j$ is said to be reciprocated if the
link from $j$ to $i$ is also present. The number $L^\leftrightarrow$ of reciprocated links can be
defined in terms of the adjacency matrix as
$$L^\leftrightarrow \equiv \sum_{i=1}^{n} \sum_{j\neq i} a_{ij} a_{ji}. \qquad (4.27)$$

It is interesting to compare the above expression with Eq. (4.14). As expected,
while each pair of connected vertices ($a_{ij} = 1$) gives a contribution to the number
of directed links, only the pairs of vertices for which two reciprocated links exist
($a_{ij} a_{ji} = 1$) contribute to $L^\leftrightarrow$. Since $0 \le L^\leftrightarrow \le L$, it is possible to define the
reciprocity $r$ of the network as
$$r \equiv \frac{L^\leftrightarrow}{L}, \qquad (4.28)$$


so that $0 \le r \le 1$.
The measured value of r allows us to assess if the presence of reciprocated
links in a network occurs completely by chance or not. To see this, note that r
represents the average probability of finding a link between two vertices already
connected by the reciprocal one. If reciprocated links occurred by chance, then
this probability would be simply equal to the average probability of finding a
link between any two vertices, which is the link density c. Therefore if r = c
the reciprocity structure is trivial, while if r > c (or r < c) reciprocated links
occur more (or less) often than predicted by chance.
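The comparison between $r$ and $c$ is easy to carry out in code; the sketch below (function names ours) uses a hypothetical 3-vertex digraph, not one of the networks of Fig. 4.1.

```python
# Sketch: reciprocity r (Eq. (4.28)) and directed link density c
# for a directed adjacency matrix a.

def reciprocity(a):
    n = len(a)
    L = sum(a[i][j] for i in range(n) for j in range(n) if i != j)
    L_recip = sum(a[i][j] * a[j][i] for i in range(n) for j in range(n) if i != j)
    return L_recip / L

def link_density(a):
    n = len(a)
    L = sum(a[i][j] for i in range(n) for j in range(n) if i != j)
    return L / (n * (n - 1))

# Hypothetical 3-vertex digraph: 0->1 and 1->0 are reciprocated, 0->2 is not.
a = [[0, 1, 1],
     [1, 0, 0],
     [0, 0, 0]]
# Here r = 2/3 > c = 1/2, i.e. reciprocation above the chance level.
```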
Homework 4.6 For the two directed networks in Fig. 4.1, calculate the reciprocity r and the link density c. Compare these two numbers and conclude
whether there is a tendency towards or against reciprocation in the two networks. 
Real-world networks generally exhibit a nontrivial degree of reciprocity [21].
For instance, citation networks always display c > 0 and r = 0, since recent
papers can cite less recent ones while the opposite cannot occur. Foodwebs
and shareholding networks display 0 < r < c [21], while social networks [22],
email networks [23], the WWW [12], the World Trade Web [17, 21] and cellular
networks [21] generally display c < r < 1. Finally, the extreme case c <
r = 1 corresponds to (not fully connected) undirected networks, where all links
are reciprocated (such as the Internet, where information always travels both
ways along computer cables). In conclusion, real-world networks systematically
display a nontrivial degree of reciprocity.

4.2.3 Third-order properties

The third-order topological properties of a network are those which go the next
step beyond the second-order ones, since they regard the structure of the
connections between a vertex and its first, second and third neighbours. The
computation of third-order properties involves products of three adjacency matrix
elements $b_{ij} b_{jk} b_{kl}$. In the general language of conditional degree distributions,
the relevant quantity for an undirected network is now the three-vertex probability
$P(k', k''|k)$ that a vertex with degree $k$ is simultaneously connected to
a vertex with degree $k'$ and to a vertex with degree $k''$. In this case too, the
analysis of real networks reveals interesting properties that we report below.
Local clustering coefficient
The most studied third-order property of a vertex $i$ is the (local) clustering
coefficient $C_i$, defined (for an undirected network) as the number of links
connecting the neighbours of vertex $i$ to each other, divided by the total number
of pairs of neighbours of $i$ (therefore $0 \le C_i \le 1$). In other words, $C_i$ is the link
density (see Section 4.2.1) of the subnetwork defined by the neighbours of $i$, and
can therefore be thought of as a local link density. It can also be regarded as
the probability of finding a link between two randomly chosen neighbours of $i$.
The clustering coefficient is a third-order property since it measures the number of triangles a vertex belongs to, and is therefore related to the occurrence
of (closed) paths of three links. Indeed, if bij denotes an entry of the adjacency matrix of the network, then the number of interconnections between the


Figure 4.7: Plot of the $C(k)$-curve for two real networks. a) Network of synonymy
between English words (circles); the dashed line is proportional to $k^{-1}$ (after Ref. [24]).
b) The undirected versions of the World Trade Web described in Section 4.2.2 (the
inset shows the subnetwork with only reciprocated links); the solid line is proportional
to $k^{-0.7}$ (after Ref. [17]).

neighbours of $i$ is given by $\frac{1}{2}\sum_{j\neq i}\sum_{k\neq i,j} b_{ij} b_{jk} b_{ki}$. The clustering coefficient
$C_i$ is then obtained by dividing this number by the number of possible pairs of
neighbours of $i$, which equals $k_i(k_i - 1)/2$ if $k_i$ is the degree of $i$. It follows that
$$C_i \equiv \frac{\sum_{j\neq i}\sum_{k\neq i,j} b_{ij} b_{jk} b_{ki}}{k_i (k_i - 1)}. \qquad (4.29)$$
The above expression is a local (vertex-specific) version of Eq. (2.5).
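Eq. (4.29) translates directly into code. One caveat is ours: for vertices with degree $k_i < 2$ the formula is undefined, and the sketch below adopts the common convention $C_i = 0$ in that case. The toy graph (a triangle with a pendant vertex) is hypothetical, not the network of Fig. 4.1b.

```python
# Sketch of the local clustering coefficient C_i of Eq. (4.29), with the
# convention (ours) that C_i = 0 when k_i < 2 and the formula is undefined.

def clustering(b):
    n = len(b)
    k = [sum(row) for row in b]
    C = []
    for i in range(n):
        if k[i] < 2:
            C.append(0.0)
            continue
        # Sum over ordered pairs (j, m): twice the number of triangles at i.
        tri2 = sum(b[i][j] * b[j][m] * b[m][i]
                   for j in range(n) if j != i
                   for m in range(n) if m != i and m != j)
        C.append(tri2 / (k[i] * (k[i] - 1)))
    return C

# Hypothetical toy graph: a triangle 0-1-2 plus a pendant vertex 3 on vertex 0.
g = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]]
# Vertices 1 and 2 have their single neighbour pair connected (C = 1), the
# pendant vertex has no neighbour pair (C = 0 by convention), and vertex 0
# has one connected pair out of three (C = 1/3).
```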
Homework 4.7 Show that Eq. (4.29) can be rewritten as
$$C_i = \frac{\sum_{j\neq i}\sum_{k\neq i,j} b_{ij} b_{jk} b_{ki}}{\sum_{j\neq i}\sum_{k\neq i,j} b_{ij} b_{ki}}, \qquad (4.30)$$
where it becomes manifest that the numerator counts the number of triangles in
which vertex $i$ participates, and the denominator counts the number of wedges
in which vertex $i$ participates. [Hint: use the fact that $b_{ij}^2 = b_{ij}$.] Compute the
value of $C_i$ for each vertex of the network shown in Fig. 4.1b and compare the
calculated values with the (single) value obtained using Eq. (2.5).
For directed networks, the computation of the clustering coefficient can be
carried out on the undirected version of the network. Therefore Eq. (4.29)
holds for directed networks as well, with bij given by the expression you found
in Homework 4.1.
Clustering coefficient versus degree
A statistical way to consider the clustering properties of real networks is similar
to that introduced for the degree correlations. By computing the average value
of $C_i$ over all vertices with a given degree $k$ and plotting it versus $k$, it is
possible to obtain a $C(k)$-curve whose trend gives information on the scaling of
the clustering coefficient with the degree [24].


Remarkably, the analysis of real networks reveals that in many cases the
average clustering of $k$-degree vertices decreases as $k$ increases, and that this
trend is sometimes consistent with a power-law behaviour of the form
$$C(k) \sim k^{-\beta}. \qquad (4.31)$$
For instance, the word network of English synonyms [24] and the aforementioned
(incomplete) representation of the World Trade Web [17] display the above
power-law trend with $\beta = 1$ and $\beta = 0.7$ respectively (see Fig. 4.7). For the
WTW we note however that, as for the $k^{nn}(k)$-curve, the analysis of a more
comprehensive version of the network reveals that the $C(k)$-plot deviates from
the functional form of Eq. (4.31), even if its decreasing trend is confirmed.
The decrease of $C_i$ with the degree $k_i$ is a topological property often referred
to as clustering hierarchy [24], since it signals that the network is
hierarchically organized in such a way that low-degree vertices are surrounded
by highly interconnected neighbours forming dense subnetworks, while high-degree
vertices are surrounded by loosely connected neighbours forming sparse
subnetworks. Dense subnetworks can be thought of as modules into which the
whole network is subdivided. Low-degree vertices are more likely to belong to
such modules, while high-degree vertices are more likely to connect different
modules together.

By contrast, note that for regular networks the $C(k)$-curve, like the $k^{nn}(k)$-curve,
reduces to a single point, with coordinates $(z, C_i)$.
Average clustering coefficient
It is of course possible to compute the average (network-wide) clustering
coefficient $\bar{C}$ over all vertices:
$$\bar{C} \equiv \frac{1}{n} \sum_{i=1}^{n} C_i. \qquad (4.32)$$
This quantity represents the average probability of finding a link between two
randomly chosen neighbours of a vertex (clearly $0 \le \bar{C} \le 1$). Note that it
is different from the (also network-wide) definition of the clustering coefficient in
Eq. (2.5) of Chapter 2.
The empirical analysis of most real networks reveals a large (i.e. finite
for large $n$) value of $\bar{C}$. An analysis of some real networks also reveals that
the rescaled quantity $\bar{C}/c_u$ displays an approximate linear dependence on the
number of vertices $n$:
$$\bar{C}/c_u \sim n. \qquad (4.33)$$
This is shown in Fig. 4.8, reporting the data with best power-law fit $\bar{C}/c_u \sim n^{0.96}$.
Exercise 4.8 Show that, for regular $D$-dimensional lattices with up to $l$-th
neighbour connections, $\bar{C}/c_u = 0$ if $l = 1$ and $\bar{C}/c_u = \frac{3(n-1)(z-2D)}{4Dz(z-1)} \sim n$ if $l > 1$.

In conclusion, just like regular lattices with $l > 1$, most real networks are on
average highly clustered. Both classes of networks display a qualitatively linear
scaling of $\bar{C}/c_u$ with $n$.


Figure 4.8: Log-log plot of the ratio between the average clustering coefficient $\bar{C}$ and
the link density $c_u$ as a function of the size $n$ of the network. Full circles represent data
from the 18 networks summarized in Ref. [2]: 2 food webs, the substrate network and
the reaction network of the bacterium E. coli, the neural network of the nematode C.
elegans, the collaboration network between movie actors, the power grid, 6 scientific
coauthorship data sets, 2 maps of the Internet, the WWW, the networks of word
co-occurrence and word synonymy. Empty circles represent data from 16 additional food
webs [25]. The solid line represents the best power-law fit to the data, having slope
0.96 (modified from Ref. [25]).

4.2.4 Global properties

Although it is in principle possible to proceed with the analysis of fourth-order
properties and so on, the study of higher-order properties of real networks
generally goes directly to global properties, i.e. those that (at least in principle)
require the exploration of the entire network to be computed. Since in a network
with $n$ vertices the longest path required to go from a vertex to any other
vertex contains at most $n - 1$ links, or $n$ if one includes loops of length $n$, it
follows that global properties involve products of at most $n$ adjacency matrix
elements:
$$\underbrace{b_{i_1 i_2} b_{i_2 i_3} \cdots b_{i_{n-1} i_n} b_{i_n i_1}}_{n\ \mathrm{factors}}. \qquad (4.34)$$

Global properties often have the most important effect on processes taking
place on networks, since they are responsible for the way information spreads
over the network and for the possible emergence of collective behaviour of vertices (some of these aspects will be covered in Part II). Here we consider two
(out of the many) examples of global network properties: connected components
and average distance, which are intimately related to each other.
Connected components
In Section 2.1 we have already mentioned that two vertices in an undirected
network are said to belong to the same connected component (or cluster) if
a path exists connecting them through a finite number of steps. The size of a
connected component is the number of vertices in it. Note that for each of the
regular networks shown in Fig. 4.2 all vertices belong to the same connected
component.
For directed networks, it is possible that a path going from a vertex $i$ to a
vertex $j$ exists, while no path from $j$ to $i$ exists. Accordingly, it is possible
to define the in-component of vertex $i$ as the set of vertices from which a path
exists to $i$, and the out-component of $i$ as the set of vertices to which a path
exists from $i$.
connected component (SCC) if it is possible to go both from i to j and from j
to i. We have already encountered the SCC in our discussion of the WWW in
Subsection 1.2.3.
There is in principle no limit to the number and size of connected components
in a network. However, an empirical property of most real networks is the
presence of one very large component containing most of the vertices, plus a
number of much smaller components containing the few remaining vertices.
This means that the spread of information on real networks is efficient, since
starting from a vertex in the largest component it is possible to reach a large
number of other vertices in the same component. The presence of the largest
component is interesting also for theoretical reasons, since it is related to the
occurrence of a phase transition in models where links are drawn with a specified
probability (see Chapter 8).
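The decomposition of an undirected network into connected components can be obtained with a standard breadth-first search; a self-contained sketch (the toy adjacency matrix and function names are ours):

```python
# Sketch: connected components of an undirected network via breadth-first
# search over the adjacency matrix b.
from collections import deque

def connected_components(b):
    n = len(b)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, comp = deque([start]), [start]
        while queue:
            i = queue.popleft()
            for j in range(n):
                if b[i][j] and not seen[j]:
                    seen[j] = True
                    comp.append(j)
                    queue.append(j)
        components.append(comp)
    return components

# Hypothetical toy network: a path 0-1-2 plus a separate edge 3-4.
b = [[0, 1, 0, 0, 0],
     [1, 0, 1, 0, 0],
     [0, 1, 0, 0, 0],
     [0, 0, 0, 0, 1],
     [0, 0, 0, 1, 0]]
```

On this toy network the routine returns two components, {0, 1, 2} and {3, 4}; on most real networks the largest component would instead contain the vast majority of the vertices.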
Shortest distance
Another important property, which better characterizes the communication
properties in a network, is the shortest distance between vertices. For each

4.2. EMPIRICAL TOPOLOGICAL PROPERTIES

71

pair of vertices $i$ and $j$ in a (strongly) connected component of a (directed)
network, the shortest distance $d_{ij}$ is defined as the minimum number of links
that must be crossed to go from $i$ to $j$. Note that for directed networks in
general $d_{ij} \neq d_{ji}$ (which means that, actually, $d_{ij}$ becomes a quasimetric). Then,
by considering all $2\binom{m}{2} = m(m-1)$ ordered pairs of vertices in a (strongly)
connected component $C$ of size $m$, the average distance of the component is
defined as the arithmetic average
$$\bar{d} \equiv \frac{1}{m(m-1)} \sum_{i \in C} \sum_{j \in C} d_{ij}. \qquad (4.35)$$

The shortest distance between two vertices belonging to different (strongly)
connected components can be formally defined as infinite. Then, in principle
the definition (4.35) can be extended to the entire network by performing an
average over all the $n(n-1)/2$ pairs of the $n$ vertices. However, this would yield
$\bar{d} = \infty$ for all networks where the (strongly) connected component does not span
the entire network, a result that is not very informative about the differences
in other aspects of the topology of such networks. To prevent this outcome,
the average distance can be alternatively defined as the harmonic mean over all
$n(n-1)$ ordered pairs of vertices, via the expression
$$\bar{d}^{-1} \equiv \frac{1}{n(n-1)} \sum_{i} \sum_{j \neq i} d_{ij}^{-1}, \qquad (4.36)$$
where now $i$ and $j$ run over the entire set of $n$ vertices. In such a way, $\bar{d}$ will
be finite even for networks where the (strongly) connected component does not
coincide with the whole network, and its value will discriminate among different
topologies.
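The harmonic-mean definition of Eq. (4.36) can be sketched with BFS shortest distances; unreachable pairs have $d_{ij} = \infty$ and contribute $d_{ij}^{-1} = 0$, which is exactly what keeps $\bar{d}$ finite on disconnected networks. The toy graph and function names are ours.

```python
# Sketch: average distance via the harmonic mean of Eq. (4.36), with BFS
# shortest distances; unreachable pairs contribute 1/d = 0.
import math
from collections import deque

def bfs_distances(b, source):
    n = len(b)
    d = [math.inf] * n
    d[source] = 0
    queue = deque([source])
    while queue:
        i = queue.popleft()
        for j in range(n):
            if b[i][j] and d[j] == math.inf:
                d[j] = d[i] + 1
                queue.append(j)
    return d

def harmonic_mean_distance(b):
    n = len(b)
    inv_sum = 0.0
    for i in range(n):
        d = bfs_distances(b, i)
        inv_sum += sum(1.0 / d[j] for j in range(n) if j != i and d[j] != math.inf)
    return n * (n - 1) / inv_sum

# Hypothetical toy graph: a path 0-1-2 plus an isolated vertex 3.
b = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
```

On this toy graph the arithmetic average over all pairs would be infinite because vertex 3 is isolated, while the harmonic mean evaluates to $12/5 = 2.4$.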
The empirical behaviour of $\bar{d}$ is very important. It turns out that, even in
a network with an extremely large number of vertices, the average distance is
generally very small. This property, known as the small-world property, is
shown in Fig. 4.9, where a plot of $\bar{d} \ln \langle k \rangle$ versus $n$ is reported for a set of real
networks. A rough logarithmic trend is observed, meaning that $\bar{d}$ scales with $n$
according to the approximate law
$$\bar{d} \sim \frac{\ln n}{\ln \langle k \rangle}. \qquad (4.37)$$
The above equation is usually taken as a quantitative statement of the small-world
effect (see also Chapter 2, Eq. (2.13)). Its importance lies in the remarkable
deviation from the behaviour of regular networks in any Euclidean
dimension $D$, which instead display $\bar{d} \sim n^{1/D}$ and are therefore characterized
by a much larger average distance.
The small-world effect is sometimes defined (in a stronger sense) as the
simultaneous presence of a small average distance and a large average clustering
coefficient. As we mentioned above, both properties are typically observed in
real-world networks.
Betweenness centrality
So far, we have seen various ways to define a measure of importance for the
vertices in a network. Most of them are based on different versions of the


Figure 4.9: Log-linear plot of the product between the average distance $\bar{d}$ and the
logarithm of the average degree $\ln \langle k \rangle$ as a function of the number $n$ of vertices for a set of
real networks studied in Ref. [2] (see the cited reference for the symbol legend). The
dashed line represents the curve $\ln n$, showing that real data approximately follow the
law $\bar{d} = \ln n / \ln \langle k \rangle$, even if with some exceptions (modified from Ref. [2]).

concept of centrality: a vertex is more important if it is close to many other
vertices. We have already considered the degree as a direct and completely local
measure of centrality in terms of the number of first neighbours of a vertex.
Another widely used, but non-local, choice is the notion of betweenness
centrality (or betweenness for short). The betweenness of a vertex $i$ in a
network measures the number of shortest paths between all possible pairs of
vertices that pass through $i$ [8]. Formally, we can write the betweenness $B_i$ of
a vertex $i$ as
$$B_i = \sum_{k \neq i} \sum_{j \neq k, i} \frac{N_{jk}(i)}{N_{jk}}, \qquad (4.38)$$

where the sums run from 1 to the total number n of vertices and one must
take care that j is different from k and that both j and k are different from
i. The quantity Njk counts the total number of shortest paths between j and
k, while Njk (i) counts how many such paths pass through i. Whenever two or
more shortest paths of equal length exist between the same two vertices, the
contribution to the betweenness centrality of a third vertex i will be the number
of shortest paths (between the two given vertices) that pass through i, divided
by the total number of shortest paths between the two vertices.
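Eq. (4.38) can be implemented by brute force: for an undirected network, a vertex $i$ lies on a shortest $j$-$k$ path iff $d_{ji} + d_{ik} = d_{jk}$, in which case the number of such paths through $i$ factorizes as $N_{ji} N_{ik}$. The sketch below (our own function names) is suitable for small graphs only; an efficient computation would use Brandes' algorithm. The toy graphs are hypothetical.

```python
# Sketch: betweenness B_i of Eq. (4.38) by brute force, counting shortest
# paths with a BFS from every source (fine only for small graphs).
import math
from collections import deque

def shortest_path_counts(b, s):
    """Return (dist, sigma): sigma[v] = number of shortest s-v paths."""
    n = len(b)
    dist, sigma = [math.inf] * n, [0] * n
    dist[s], sigma[s] = 0, 1
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in range(n):
            if not b[u][v]:
                continue
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(b):
    n = len(b)
    dist, sigma = zip(*[shortest_path_counts(b, s) for s in range(n)])
    B = [0.0] * n
    for j in range(n):
        for k in range(n):
            if k == j or dist[j][k] == math.inf:
                continue
            for i in range(n):
                # i is on a shortest j-k path iff the distances add up.
                if i not in (j, k) and dist[j][i] + dist[i][k] == dist[j][k]:
                    B[i] += sigma[j][i] * sigma[i][k] / sigma[j][k]
    return B

# Hypothetical toys: a path (all 0-2 routes cross vertex 1) and a 4-cycle
# (two equal shortest paths between opposite vertices, as in Fig. 4.10).
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
cycle = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
```

On the path, every shortest 0-2 route crosses vertex 1, which collects one unit per ordered pair; on the 4-cycle, each pair of opposite vertices has two shortest paths, so each intermediate vertex collects the 1/2 contributions discussed above.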
As shown in the example in Fig. 4.10, the vertex that is crossed the most
times is also the most central in terms of its betweenness. As a result, vertices
with high betweenness play the role of bridges between regions of the network
that are highly connected internally but sparsely connected to each other.
In real-world networks, the presence of such regions is typically observed
(see next subsection). Correspondingly, a few bridging vertices with very high
betweenness are typically detected, along with several internal vertices with


Figure 4.10: The betweenness of the central black vertex is computed by considering all shortest paths (distances) between all the possible pairs of vertices.
Between the two grey vertices (A and B) in the figure there are two different
shortest paths, one of which contributes 1/2 to the betweenness of the black
vertex and one of which does not contribute to it.

lower betweenness.
Community structure
Very important structures that can be identified in a network are communities
of densely connected vertices. Communities are subsets of vertices whose internal
link density is higher than the average density across the entire network, or
higher than an expected value (obtained under certain null hypotheses). Detecting
communities is a non-local task, as it typically requires quantities that are
computed through repeated iterations across the whole network. A community
can consist of any number of vertices (from a few vertices up to a large
fraction of the network), and a network can therefore be partitioned into
heterogeneously sized communities.
There is no unique definition of a community, and even when a single definition is adopted, there are various methods to identify the communities of
a particular network [26]. For instance, some definitions allow for overlapping
communities that share one or more vertices, while others do not. Similarly,
some definitions allow for hierarchical communities that can be further resolved
into smaller sub-communities, while others do not.
A simple approach employs the concept of betweenness centrality (see previous subsection) to define and detect communities in large networks [26]. This
method starts by computing the betweenness of all nodes (or of all links, via an
appropriate modification of the definition (4.38)) and iteratively removing the
nodes (or links) with the largest betweenness, recalculating all the values of the
betweenness after each removal. In such a way, the bridges between communities are cut and the network gets partitioned hierarchically into smaller and
smaller communities.
Other methods are based on the comparison of the real network with a null
model, i.e. a mathematical model where some topological property is taken as
input from the data, but where communities are absent by construction [26].
The best partition of the network into communities is sought for by maximizing
a so-called modularity function defined as a sort of difference between the real
network and its null model. Some null models of networks will be introduced


in Chapters 5 and 10. The Configuration Model introduced in Chapter 3, if


implemented in such a way that the empirical degree sequence of the real-world
network can be taken as input, is a convenient and widely used null model to
detect communities.
Other methods are based on the spectral properties of empirical adjacency
matrices or on matrix algebra [26].
Real-world networks typically display strong community structure. An example is provided in Fig. 1.14 in Chapter 1 for a network of books (about US
politics) that are frequently co-purchased online. The particular method used
to resolve the communities in the figure is based on nonnegative matrix factorization [27]. Vertices in the same community are assigned the same colour.
The communities are allowed to overlap (some vertices in the figure are of mixed
colour). We see that the community detection method identifies three communities that largely reflect the political viewpoints of the books: liberal (circles),
neutral (triangles) and conservative (squares). The method also detects
vertices (coloured in pink) that are outliers, i.e. that do not belong to any
community.

Bibliography
[1] S.H. Strogatz, Nature 410, 268 (2001).
[2] R. Albert and A.-L. Barabási, Rev. Mod. Phys. 74, 47 (2002).
[3] S.N. Dorogovtsev and J.F.F. Mendes, Advances in Physics 51, 1079 (2002).
[4] M.E.J. Newman, SIAM Review 45, 167 (2003).
[5] A.-L. Barabási, Linked: The New Science of Networks, Perseus, Cambridge,
MA (2002).
[6] M. Buchanan, Nexus: Small Worlds and the Groundbreaking Science of
Networks, Norton, New York (2002).
[7] D.J. Watts, Six Degrees: The Science of a Connected Age, Norton, New
York (2003).
[8] G. Caldarelli, Scale-Free Networks: Complex Webs in Nature and Technology,
Oxford University Press, Oxford (2007).
[9] M.E.J. Newman, Networks: An Introduction, Oxford University Press,
Oxford (2010).
[10] Q. Chen, H. Chang, R. Govindan, S. Jamin, S.J. Shenker and W. Willinger,
Proceedings of the 21st Annual Joint Conference of the IEEE Computer
and Communications Societies, IEEE Computer Society (2002).
[11] H. Jeong, S. Mason, A.-L. Barabási and Z.N. Oltvai, Nature 411, 41 (2001).
[12] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata,
A. Tomkins and J. Wiener, Computer Networks 33, 309 (2000).
[13] M.E.J. Newman, Power laws, Pareto distributions and Zipf's law,
Contemporary Physics 46, 323-351 (2005).
[14] B.B. Mandelbrot, The Fractal Geometry of Nature, Freeman, San Francisco
(1983).
[15] S. Maslov, K. Sneppen, and A. Zaliznyak, Physica A 333, 529-540 (2004).
[16] R. Pastor-Satorras, A. Vázquez and A. Vespignani, Phys. Rev. Lett. 87,
258701 (2001).
[17] M.Á. Serrano and M. Boguñá, Phys. Rev. E 68, 015101(R) (2003).
[18] D. Garlaschelli and M.I. Loffredo, Phys. Rev. Lett. 93, 188701 (2004).

[19] M.E.J. Newman, Phys. Rev. Lett. 89, 208701 (2002).


[20] M.E.J. Newman, Phys. Rev. E 67, 026126 (2003).
[21] D. Garlaschelli and M.I. Loffredo, Phys. Rev. Lett. 93, 268701 (2004).
[22] S. Wasserman and K. Faust, Social Network Analysis (Cambridge University Press, Cambridge, 1994).
[23] M.E.J. Newman, S. Forrest and J. Balthrop, Phys. Rev. E 66, 035101(R)
(2002).
[24] E. Ravasz and A.-L. Barabási, Phys. Rev. E 67, 026112 (2003).
[25] J.A. Dunne, R.J. Williams and N.D. Martinez, Proc. Natl. Acad. Sci. USA
99, 12917 (2002).
[26] S. Fortunato, Community detection in graphs, Physics Reports 486,
75-174 (2010).
[27] X. Cao, X. Wang, D. Jin, Y. Cao and D. He, Scientific Reports 3, 2993
(2013).

Chapter 5

Network Ensembles
In Chapters 2 and 3 we have already encountered various network models. All
these models have one feature in common: they are stochastic, i.e. they are
based on some degree of randomness. If we fix the parameters of a stochastic
network model, all the possible realizations (i.e. graphs) of the model itself
define a so-called ensemble of random graphs. Such an ensemble is a collection$^1$
$\mathcal{G} \equiv \{G_1, \ldots, G_M\}$ of $M$ graphs (i.e. adjacency matrices), where each graph $G_a$
is assigned a probability $P(G_a)$ such that
$$\sum_{G \in \mathcal{G}} P(G) = \sum_{a=1}^{M} P(G_a) = 1. \qquad (5.1)$$

The number $M$ is known as the cardinality of the ensemble $\mathcal{G}$. Since, as we
discussed in Chapter 4, any graph $G$ is uniquely specified by its adjacency
matrix, we may think of $G$ as an adjacency matrix with entries $g_{ij}$. This notation
applies to both directed and undirected graphs, and generalizes the notation $a_{ij}$
and $b_{ij}$ introduced in Eqs. (4.1) and (4.2), respectively, for these two classes of
networks.
In this chapter we study graph ensembles in more detail. Starting with
some preliminary observations about the Erdős–Rényi random graph and the
Configuration Model introduced in Chapters 2 and 3, respectively, we gradually
arrive at the definition of so-called maximum-entropy ensembles of networks.
The importance of maximum-entropy ensembles lies in the fact that, starting from
local information, they sample the space of graphs uniformly (under some
constraint). This leads to unbiased expectations for the higher-order properties of
a network. This property is of crucial importance not only for the theoretical
reason of redefining the Configuration Model in order to allow for strongly
heterogeneous degree sequences (as we show in Section 5.3), but also for the
practical problem of pattern detection in real-world networks (as we will discuss
in Chapter 10).
$^1$ Note that, for our purposes here, the symbol $\mathcal{G}$ denotes a different family of graphs than
that denoted in Chapter 2.

5.1 Equiprobability in the Erdős–Rényi model

To realize the importance of unbiasedness, we first highlight some important
properties of the Erdős–Rényi (ER for short) model introduced in Chapter 2.
Using the adjacency matrix notation introduced in Chapter 4, we restate the
model as follows. Given a set of $n$ vertices, each pair of vertices is connected
by an undirected link with probability $p$ (independently of all other pairs). No
self-loops are created. This implies that, given $p$, the expected value of the entry
$g_{ij}$ of the $n \times n$ adjacency matrix of a graph $G$ generated by the model is
$$E(g_{ij}) \equiv \langle g_{ij} \rangle = \begin{cases} p & i \neq j \\ 0 & i = j \end{cases} \qquad (5.2)$$
where, here and in what follows, the expectation value $\sum_x x\, P(X = x)$ of a
(discrete) random variable $X$ is denoted by $E(X)$ or $\langle X \rangle$. It therefore follows
from Eq. (4.12) that the expected number of undirected links is
$$E(L_u) = p\, \frac{n(n-1)}{2}. \qquad (5.3)$$

So, once $n$ is fixed, each given value of $p$ produces a corresponding expected
number of links. We might reverse the point of view and say that, if we would
like the network to have a given expected number of links $E(L_u)$, then we should
set the probability $p$ to the corresponding value
$$p = \frac{2 E(L_u)}{n(n-1)}. \qquad (5.4)$$

This strategy is useful if, for instance, we want to compare the predictions
of the ER model with the observed properties of a real-world network with a
given number $n$ of vertices and a given number $L_u$ of undirected links. In this
perspective, the empirical values of $n$ and $L_u$ are treated as constraints, and
the model is fitted to these constraints by choosing the same $n$ as in the real
network and $p$ as in Eq. (5.4), with $E(L_u) \equiv L_u$. Note that $n$ will necessarily
be finite, so we cannot use the results obtained for $n \to \infty$ in Chapter 2. However,
since real-world networks are large, many asymptotic results will hold at least
approximately.
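The fitting strategy of Eq. (5.4) can be checked numerically by sampling the ER ensemble; the sketch below (function names and sizes ours) draws many realizations and compares the sampled mean number of links with the target $E(L_u)$. Sampled averages fluctuate, so only approximate agreement is expected.

```python
# Sketch: fit the ER model to a target expected number of links via
# Eq. (5.4) and check Eq. (5.3) by Monte Carlo sampling.
import random

def sample_er(n, p, rng):
    """One ER realization: each of the n(n-1)/2 pairs is linked with prob. p."""
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if rng.random() < p:
                b[i][j] = b[j][i] = 1
    return b

def count_links(b):
    n = len(b)
    return sum(b[i][j] for i in range(n) for j in range(i))

n, target = 50, 300              # hypothetical sizes for illustration
p = 2 * target / (n * (n - 1))   # Eq. (5.4)
rng = random.Random(0)
avg = sum(count_links(sample_er(n, p, rng)) for _ in range(500)) / 500
# avg fluctuates around the target E(L_u) = 300.
```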
We already know, from the results of Chapter 4, that the comparison between the real network and the ER model will be unsuccessful: the ER model
is not able to reproduce many properties of most real-world networks, in particular their broad degree distribution and their large clustering. However, the
ER model has an important and desirable property: all the graphs with the
same value of n and Lu are generated with the same probability, i.e. they are
equiprobable. The proof of this result is the goal of the following series of homeworks.
Homework 5.1 Write the cardinality $M_n$ of the ER ensemble, when $n$ is
the (fixed) number of vertices. Calculate the number $M_n(L_u)$ of simple and
undirected graphs (without self-loops) with $n$ vertices and $L_u$ edges. Check that
$\sum_{L_u} M_n(L_u) = M_n$, where the sum runs over all the possible values of $L_u$ in
the ensemble of graphs with $n$ vertices.


Homework 5.2 Calculate the probability $P(G)$ to generate a particular graph
with binary adjacency matrix $G$ as a function of $p$ in the ER model (write your
answer explicitly in terms of the entries $\{g_{ij}\}$ of the matrix $G$). Show that $P(G)$
depends on $G$ only through $L_u(G) \equiv \sum_{i<j} g_{ij}$. Note that the latter expression
coincides with Eq. (4.12) adapted to the notation used here. Write the resulting
expression for $P(G)$ as a function of $p$, $n$ and $L_u(G)$ explicitly.
Homework 5.3 The above homeworks allow you to conclude that, in the ER
model, all graphs with the same value of n and L_u are generated with equal
probability. Use this result to write the probability of generating any graph with
n vertices and L_u undirected links, among the M_n(L_u) possible ones. □
Another important property of the ER model is that its natural implementation (connecting each pair of initially disconnected vertices with probability
p, or equivalently disconnecting each pair of initially connected vertices with
probability 1 - p) is always feasible, i.e. it is possible to define computational
algorithms that do not get stuck and always lead to a realization of the model
(see Chapter 6).
In Section 5.2 we extend the discussion of Chapter 3 by showing that both
the equiprobability and the feasibility properties are violated in the simplest
implementations of the Configuration Model. This limitation, which will be
further illustrated in Chapter 7, calls for a more sophisticated implementation
of this model. This is provided in Section 5.3 in the context of maximum-entropy
ensembles of graphs.

5.2

Implementations of the Configuration Model

With this section we return to the so-called Configuration Model [15, 1, 2] (CM
for short) introduced in Chapter 3. We first recall the abstract idea behind
the model and make general considerations. We then focus on the specific
implementations that have been proposed to realize this idea.
Let us consider undirected networks first. The idea of the CM model is
to assign each vertex i a desired degree ki and then generate the ensemble of
all graphs compatible with the resulting degree sequence {k_i}_{i=1}^n, by drawing
links at random between vertices in such a way that the desired degree sequence
is realized. All graphs compatible with the desired degree sequence should be
realized with the same probability. The degree sequence can be picked from any
desired degree distribution P (k), e.g. a scale-free one with desired exponent.
However, as discussed in Chapter 3 (Subsection 3.1.3), the degree sequence
must be graphical, i.e. realizable by at least one graph.
For directed networks, the CM is easily generalized by assigning each vertex
i a given in-degree k_i^in and a given out-degree k_i^out. Now the idea is to generate
an ensemble of random directed graphs in such a way that the desired in- and
out-degree sequences are simultaneously realized.
Note that, unlike the preferential attachment model (see Chapter 3), the
CM does not make explicit hypotheses on how networks organize themselves in
a given structure. It is rather a null model, generating a suitably randomized
ensemble of graphs once some low-level information is assumed as an input. As


we will see in detail in Chapter 10, the CM can also be used as a benchmark
for empirical data: a comparison between the CM and a real-world network
allows us to check whether some of the higher-order properties observed in the
real-world network are consistent with those generated by the CM using the
same degree sequence (which is a first-order property, see Chapter 4) as the real
network. If this is the case, then one can conclude that the observed higher-order
properties are a mere outcome of the specified form of the degree distribution,
being consistent with a random assignment of links compatible with the degree
sequence. If this is not the case, then the observed deviations from the model
indicate interesting structural patterns that cannot be traced back to the null
hypothesis (i.e. they are not explained by the degree sequence alone).
It is important to notice that, in the above abstract formulation, the CM
model can be regarded as a generalization of the ER model in the following
sense. The ensemble of networks generated by the ER model is completely
random except for the number of links, which is specified by fixing the connection
probability p. In a similar manner, the ensemble of networks generated by
the CM model is completely random except for the degree sequence, which is
specified from the beginning.
In other words, while in the ER model the only constraint (besides the number n of vertices) is given by the number of links Lu , in the CM model the
constraint is the entire degree sequence {k_i}_{i=1}^n. However, as we will show below, it makes a big difference whether the constraint is soft, i.e. enforced only
as an average over all realizations (as in the ER) or sharp, i.e. enforced on each
individual realization separately (as in the implementation of the CM model
discussed in Chapter 3). In the first case one speaks of canonical ensembles,
while in the second case one speaks of microcanonical ensembles. It turns out
that microcanonical ensembles are much more difficult to deal with mathematically and are prone to bias. By contrast, (the simplest) canonical ensembles are
analytically tractable and unbiased.
In what follows, we are going to describe various implementations of the CM
model. These variants will gradually lead us from microcanonical to canonical
implementations of the model.

5.2.1

Link stub reconnection

The implementation of the CM that we already discussed in Chapter 3 is the


so-called link stub reconnection process. Here, as many link stubs (half links)
as the prescribed degree ki are initially attached to each vertex i. All these
stubs are then randomly matched, with the aim of realizing a random graph
with the desired degree sequence. In principle, iterating this process generates
as many realizations of the network as desired, and samples the graph ensemble
defined by the given degree sequence.
Note that, once the desired degree sequence is specified, the number of links
L_u = ∑_i k_i / 2 is automatically fixed as well. Therefore the actual values of the
number of links and of the vertex degrees are fixed, not simply their expected
values as in the ER model (see Chapter 2). This implementation of the CM is
therefore microcanonical.
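The stub-matching procedure described above can be sketched as follows (an illustrative plain-Python sketch, not the authors' code; self-loops and multiple links are deliberately kept, as in the text):

```python
import random

def stub_matching(degrees, seed=None):
    """One link stub reconnection realization of the CM: attach k_i
    half-links (stubs) to vertex i, shuffle them, and pair consecutive
    stubs.  Self-loops and multiple links may occur and are returned
    as-is."""
    assert sum(degrees) % 2 == 0, "degree sequence must have even sum"
    rng = random.Random(seed)
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    # pair consecutive stubs after the random shuffle
    return list(zip(stubs[::2], stubs[1::2]))

edges = stub_matching([3, 2, 2, 2, 1], seed=1)
```

Every realization matches the prescribed degree sequence exactly (counting a self-loop twice), which is precisely what makes this implementation microcanonical.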
As already anticipated in Chapter 3, a problem with the above implementation of the CM is that it typically gives rise to undesired self-loops and multiple
links between two vertices. If these extra links are not discarded, the resulting


Figure 5.1: Elementary step of the local rewiring algorithm for a) undirected and
b) directed networks. Two edges, here (A, B) and (D, C), are randomly chosen from
graph G1 and the vertices at their ends are exchanged to obtain the edges (A, C) and
(D, B) in graph G2 . Note that the degree of each vertex is unchanged (in the directed
case, the in- and out-degrees are separately conserved).

graph ensemble cannot be consistently compared with real networks that do


not admit such kinds of links, since this comparison might highlight patterns
that are merely due to the differences between the two topological classes. At a
mathematical level, one can identify degree sequences for which these problems
can be avoided (e.g. for sparse graphs in the asymptotic limit n → ∞; see Chapter 3). However, in practice, if one wants to define the CM using the empirical
degree sequence of a real-world network, then the above problems cannot be
easily avoided.
If double links and self-loops are deliberately removed, then the original
degree sequence can no longer be realized. Computationally, if one rejects the
attempted matching of two link stubs that would result in double links or self-loops, the algorithm will typically get stuck in configurations where large-degree vertices have no more eligible partners to connect to (see Chapter 7).
So, operatively, the link stub reconnection method is unsatisfactory when broad
empirical degree sequences are taken as an input.

5.2.2

The local rewiring algorithm

In an attempt to overcome the above limitation, some important variants of


the CM have been introduced. Maslov, Sneppen and Zaliznyak [15] proposed
to start from the whole real-world network (not just its degree sequence) and
iteratively randomize it in a degree-preserving manner, thereby generating an
ensemble of random graphs with the same degree sequence as the original network. Their randomization process consists of what they call the local rewiring
algorithm, which deliberately avoids the occurrence of multiple links and self-loops in the randomized networks.
The local rewiring algorithm consists of the iteration of the elementary step
shown in Fig. 5.1a and Fig. 5.1b for undirected and directed networks, respectively: two links are randomly chosen from the initial graph G1 and the vertices
at their ends are exchanged in such a way that the new graph G2 has the same
degree sequence as the initial one. If the new links already exist in the network,
then the step is aborted and two different links are randomly chosen again. In
this way an ensemble of randomized networks is generated, having the same
degree sequence as the original network and no multiple links or self-loops.
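The elementary step can be sketched as follows (an illustrative plain-Python sketch, not the authors' code; the orientation randomization and the list-based edge storage are our own implementation choices):

```python
import random

def rewire_step(edges, rng):
    """One elementary step of the local rewiring algorithm (Fig. 5.1a):
    pick two edges (A,B) and (D,C), and try to replace them with (A,C)
    and (D,B).  The step is aborted (returns False) if it would create
    a self-loop or a double link."""
    (a, b), (d, c) = rng.sample(edges, 2)
    if rng.random() < 0.5:              # randomize the orientation of one edge
        d, c = c, d
    if a == c or d == b:                # new edges would be self-loops
        return False
    new1, new2 = tuple(sorted((a, c))), tuple(sorted((d, b)))
    if new1 in edges or new2 in edges:  # would create a double link
        return False
    edges.remove((a, b) if a < b else (b, a))
    edges.remove((d, c) if d < c else (c, d))
    edges.extend([new1, new2])
    return True

rng = random.Random(0)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
for _ in range(200):
    rewire_step(edges, rng)
```

Whatever sequence of accepted and aborted steps occurs, the degree of every vertex is conserved and the graph stays simple.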
Note that this ensemble is microcanonical, like in the link stub reconnection


method. However, the two ensembles are different, since double links and self-loops are absent here, while they might be present in the link stub reconnection
method. In particular, since two vertices cannot be connected more than once,
in the local rewiring algorithm the presence of links between high-degree vertices is suppressed, determining a certain degree of spurious disassortativity
which is not due to a basic anticorrelation between vertex degrees. This important point, highlighted by Maslov, Sneppen and Zaliznyak, led them to show
that much of the disassortativity observed in the Internet (see Section 4.2.2)
can be accounted for in this way, while other patterns such as the clustering
properties are instead genuine [15]. Considerations of this type, which require
the comparison between a real-world network and some implementation of the
CM, will be made extensively in Chapter 10.
Recently, it has been proved mathematically that the local rewiring algorithm is biased, i.e. it does not explore the space of graphs compatible with the
degree constraints uniformly [3, 4]. Roughly speaking, the root of the problem is that the algorithm explores with higher probability the graph
configurations that are closer to the original network. In order to overcome
this problem, one should introduce a suitable acceptance probability for each
attempted configuration. However, the calculation of this probability is computationally demanding because it depends on the current configuration, and
should therefore be repeated at each step of the algorithm.

5.2.3

The Chung-Lu model

Microcanonical ensembles are difficult to deal with analytically. Indeed, in the


implementations of the CM discussed so far, it is not obvious how to write down
an exact expression for the probability of occurrence of a given graph G in the ensemble,
or equivalently the probability that two vertices are connected, given the degree
sequence.
Chung and Lu [14] proposed a completely different variant of the CM where
the graph ensemble is turned from microcanonical into canonical. The main aim
of their approach is that of obtaining an explicit mathematical expression for
the probability that two vertices are connected in the ensemble. Since imposing
a fixed number of links seriously complicates the analytical calculations in the
microcanonical approach, Chung and Lu reinterpreted the configuration model
in a canonical form, as a natural extension of the ER model. In this extension,
not only the expected number of links, but the whole expected degree sequence
is specified.
More precisely, in the case of an undirected network, a link between two
vertices i and j is drawn with probability
	p_ij = k_i k_j / (2 L_u),	(5.5)

where Lu is the observed number of links and ki , kj are the observed degrees
of vertices i, j. This choice is (at least apparently) reasonable, because the
ensemble averages of the degrees converge to their observed values:
	E(k_i) = ∑_j p_ij = k_i (∑_j k_j) / (2 L_u) = k_i.	(5.6)


Homework 5.4 Equation (5.6) is not entirely correct. Find out why and
write the corresponding correct expression. Discuss in which limit the correct
expression reduces to Eq. (5.6). 
Homework 5.5 Using the correct expression, write the relationship between
the observed number of links L_u and the expected number of links ⟨L_u⟩ generated
by the Chung-Lu model. Rearrange this expression to write the difference L_u - ⟨L_u⟩ as a function of only the first and second moments of the degree distribution
P(k) of the real-world network. Discuss the effects of the heterogeneity (i.e. the
breadth) of the degree distribution. Discuss these effects for a scale-free degree
distribution of the form P(k) ∼ k^(-γ) with 2 < γ < 3. □
Note that the factorized form of p_ij in Eq. (5.5) also implies that no degree
correlations are introduced: the expected average nearest-neighbour degree (see
Chapter 4) reads

	k̄_i^nn ≡ ∑_j p_ij k_j / E(k_i) = ∑_j k_j² / ∑_j k_j,	(5.7)

which is independent of i and has the expected form ⟨k²⟩/⟨k⟩ valid for uncorrelated
networks (see Section 4.2.2).
Exercise 5.1 Equation (5.7) is not entirely correct. Find out why and try
to write a more refined expression. Try to write a similar expression for the
expected value E(Ci ) of the clustering coefficient of vertex i (see definition in
Chapter 4, Section 4.2.3). 
Note that the model can also be formulated for directed graphs, by establishing a directed link from vertex i to vertex j with probability

	p_ij = k_i^out k_j^in / L,	(5.8)

where L = ∑_i k_i^in = ∑_i k_i^out is the observed number of directed links, and k_i^out, k_j^in
are the observed out-degree of vertex i and the observed in-degree of vertex j,
respectively. This choice ensures that ⟨k_i^in⟩ = k_i^in and ⟨k_i^out⟩ = k_i^out for all vertices, generalizing Eq. (5.6). Note, however, that this is not entirely correct, and
subject to the same limitation discussed in Homework 5.4.
Homework 5.6 Use Eq. (5.8) to write the expected number E(L↔) ≡ ⟨L↔⟩ of
reciprocated links (see Chapter 4), and use the result to approximate the expected
reciprocity ⟨r⟩ in the Chung-Lu model. Compare this value with the reciprocity
of a random graph with the same number of vertices and directed links. □
The Chung-Lu model avoids by construction the occurrence of multiple links
and self-loops, since each pair of (distinct) vertices is considered only once.
However, to ensure (as we should) that 0 ≤ p_ij ≤ 1 for all i, j in Eqs. (5.5)
and (5.8), we are forced to consider only those degree sequences satisfying the
constraint

	max_i {k_i} ≤ √(2 L_u) = √(∑_{j=1}^n k_j),	(5.9)


and similarly max_i {k_i^in} ≤ √L and max_i {k_i^out} ≤ √L for directed graphs. A
connection probability p_ij > 1 can be regarded as the establishment of multiple
links between i and j, and this possibility is avoided only by imposing the above
constraint. Therefore, in the Chung-Lu model the problem of the occurrence
of multiple links is circumvented by restricting the possible degree sequences to
those satisfying Eq. (5.9).
Unfortunately, the constraint expressed by Eq. (5.9) is very strong and is
violated by most empirical degree distributions where a few hubs with very large
degree are present. This limitation prevents us from using the Chung-Lu model
for most empirical degree sequences. Since, as we mentioned, the violation of
Eq. (5.9) can be thought of as leading to multiple links, the problem of the
Chung-Lu method is in some sense the canonical counterpart of the problem
encountered in the link stub reconnection method.
Exercise 5.2 Consider a marginal degree distribution where max_i {k_i} = √(2 L_u). Discuss whether this is enough to ensure that Eq. (5.6) is a good approximation to the correct expression you found in Homework 5.4. □
Homework 5.7 Consider a regular network with n vertices where k_i = z for all i,
and check whether the condition (5.9) holds. Discuss what you obtain if you
use Eq. (5.5) to generate the graph ensemble in this case. Use your result in
Homework 5.5 to write the expression for the difference L_u - ⟨L_u⟩ in this case.
□
Homework 5.8 Now consider a star graph with n vertices, where a central
vertex is connected to all the other vertices (and these vertices are not directly
connected to each other), and check whether the condition (5.9) holds. Discuss
what you obtain if you use Eq. (5.5) to generate the graph ensemble in this case.


5.2.4

The Park-Newman model

The limitation of the Chung-Lu model led Park and Newman [68] to modify
the canonical approach in such a way that no restriction on the desired degree
sequence is imposed, and at the same time no multiple links are generated.
Park and Newman started from the general problem of finding the form of the
connection probability pij that generates a canonical ensemble of graphs with
no multiple links and such that two graphs with the same degree sequence are
equiprobable, in the general spirit of the Configuration Model.
As for the Chung-Lu model, we want the connection probability to be a
function pij = p(xi , xj ) of some quantities xi , xj controlling the expected degrees
of vertices i and j. The quantities {x_i}_{i=1}^n play a role similar to that of the
desired degrees {k_i}_{i=1}^n in the Chung-Lu model, even if they turn out to be
in general very different from the expected degrees {⟨k_i⟩}_{i=1}^n and are therefore
denoted by a different symbol.
The starting point is to write the probability P(G) of occurrence of a given
graph G (with adjacency matrix entries {g_ij}) in the ensemble as a product,
over all pairs of vertices, of either p_ij (if the link is realized, i.e. g_ij = 1) or
(1 - p_ij) (if the link is not realized, i.e. g_ij = 0):

	P(G) = ∏_{i<j: g_ij=1} p_ij ∏_{i<j: g_ij=0} (1 - p_ij)
	     = P_0 ∏_{i<j: g_ij=1} p_ij / (1 - p_ij),	(5.10)

where P_0 ≡ ∏_{i<j} (1 - p_ij) is a product over all vertex pairs and is therefore
independent of the particular graph G.
The above expression can be used to find the form of pij warranting that two
graphs G1 and G2 with the same degree sequence are equiprobable. Looking
again at Fig. 5.1, the requirement that the graphs G1 and G2 occur with the
same probability P(G1 ) = P(G2 ) translates into the requirement
	[p_AB/(1 - p_AB)] [p_DC/(1 - p_DC)] = [p_AC/(1 - p_AC)] [p_DB/(1 - p_DB)],	(5.11)

since the two graphs are identical except for the subgraphs defined by the
four vertices A, B, C, D. For the above expression to hold for all quadruples
A, B, C, D, the form of p_ij must be such that p_ij/(1 - p_ij) = f_i f_j, where f_i is
a quantity depending on i alone. Recalling that p_ij = p(x_i, x_j), we see that
f_i = f(x_i). Rearranging for p_ij, we have

	p_ij = p(x_i, x_j) = f(x_i) f(x_j) / [1 + f(x_i) f(x_j)].	(5.12)

Any form of f(x) is compatible with the requirement in Eq. (5.11). Since different choices can be mapped to each other via a redefinition of x, we can choose
the simplest nontrivial² function f(x) = x for later convenience. This yields

	p(x_i, x_j) = x_i x_j / (1 + x_i x_j).	(5.13)

Equation (5.13) is of fundamental importance. It ensures that 0 ≤ p_ij ≤ 1
with no restriction on the degree sequence, thus overcoming the limitation of
the Chung-Lu approach. Moreover, it ensures that the desired property of
unbiasedness, which we proved for the ER model, is extended to the CM. In
other words, the probability to generate a graph G depends only on the degree
sequence of G. This implies that graphs with the same degree sequence are
equiprobable, i.e. they are sampled uniformly. This nicely extends the properties
we proved for the ER model (see Homeworks 5.1-5.3) to the CM.
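Sampling from the Park-Newman connection probability is as simple as for Chung-Lu, but needs no guard on the inputs (a plain-Python sketch; the function name and the particular x values are our own illustrative choices):

```python
import random

def park_newman(x, seed=None):
    """Sample a graph with p_ij = x_i x_j / (1 + x_i x_j), Eq. (5.13).
    Unlike Eq. (5.5), this is a valid probability for any x_i >= 0."""
    rng = random.Random(seed)
    n = len(x)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < x[i] * x[j] / (1.0 + x[i] * x[j])]

# Even a very heterogeneous x poses no problem: p_ij saturates below 1.
edges = park_newman([50.0, 0.1, 0.1, 0.1, 0.1], seed=3)
```

The saturation of p_ij below 1 for large x_i x_j is exactly what replaces the restriction of Eq. (5.9).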
Homework 5.9 Show that the graph probability P(G) depends on G only
through {k_i(G)} ≡ {∑_{j≠i} g_ij}, implying that graphs with the same degree sequence are equiprobable. Show that this is not true if the probability p_ij of the
Chung-Lu model is used. □
² Note that the simplest choice f(x) = const would lead to the result p_ij = const as in
the ER random graph model, where we lose control over the expected degree sequence. This
still ensures that any two graphs with the same degree sequence are equiprobable, but in a
trivial way, since in the ER model actually any two graphs with the same number of links are
equiprobable (see Section 5.1). So the simplest nontrivial choice for the CM is f(x) = x.


Figure 5.2: Average degree ⟨k(x)⟩ of vertices versus x corresponding to the choice
ρ(x) ∼ x^(-λ) for three values of λ. The trend is initially linear and then saturates to
the asymptotic value n - 1 (after ref. [68]).

We now come to an important result. The expected degrees read

	E(k_i) = ∑_j p_ij = ∑_j x_i x_j / (1 + x_i x_j)	(5.14)

and the above expression can be used to either generate an ensemble of networks
with degree distribution determined by the (free) parameters {xi }, or to find the
particular values of {xi } that produce an expected degree sequence equal to the
observed one. In the latter case, the above expression is intended as a system
of n nonlinear coupled equations where {⟨k_i⟩} = {k_i} are known quantities and
{x_i} are the unknowns (we will consider this case explicitly in Chapter 10).
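One way to solve this system numerically is a damped fixed-point iteration (a sketch; the iteration scheme, damping, starting point and tolerances are our own choices, and convergence is assumed here rather than proved):

```python
import math

def solve_x(degrees, tol=1e-12, max_iter=10000):
    """Solve <k_i> = k_i, i.e. Eq. (5.14) with the expected degrees fixed
    to the observed ones, via x_i <- k_i / sum_{j!=i} x_j/(1 + x_i x_j),
    averaged with the previous iterate for stability."""
    n = len(degrees)
    s = math.sqrt(sum(degrees))
    x = [k / s for k in degrees]          # sparse-limit initial guess
    for _ in range(max_iter):
        x_new = [degrees[i] / sum(x[j] / (1.0 + x[i] * x[j])
                                  for j in range(n) if j != i)
                 for i in range(n)]
        err = max(abs(a - b) for a, b in zip(x, x_new))
        x = [(a + b) / 2.0 for a, b in zip(x, x_new)]  # damped update
        if err < tol:
            break
    return x

x = solve_x([3, 2, 2, 2, 1])
expected_k = [sum(x[i] * x[j] / (1.0 + x[i] * x[j])
                  for j in range(len(x)) if j != i) for i in range(len(x))]
```

At the fixed point, the expected degree of every vertex reproduces the observed one.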
Equation (5.14) shows that the expected degree of a vertex with a given
value of x can be written as a function of x, after integrating out the variables
for the other vertices. This can be best appreciated by considering a continuous
approximation where the distribution of x over all vertices is assumed to be the
continuous density ρ(x):

	E_x(k) ≡ ⟨k(x)⟩ = (n - 1) ∫_0^∞ dy ρ(y) x y / (1 + x y).	(5.15)
The behaviour of ⟨k(x)⟩ is proportional to x for small values of x and then saturates to the maximum value n - 1 for large x, consistently with the requirement
of no multiple links or self-loops.
Park and Newman [68] studied the model assuming a power-law distribution
ρ(x) ∼ x^(-λ) with various values of the exponent λ (see Fig. 5.2). They found
that this assumption has two important consequences for the topology: firstly,
the degree distribution P(k) behaves as a power-law with the same exponent
as ρ(x) for small values of k, but then displays a cut-off ensuring k ≤ n - 1
(see Fig. 5.3a). Secondly, the average nearest neighbour degree turns out to be
a decreasing function of the degree (see Fig. 5.3b). As expected, the absence


Figure 5.3: a) Cumulative degree distribution P_>(k) corresponding to the choice
ρ(x) ∼ x^(-λ) for three values of λ. b) Average nearest neighbour degree k̄^nn(k) for the
same three choices of the exponent λ. Here isolated symbols correspond to numerical
simulations, while solid lines are the analytical predictions (after ref. [68]).

of multiple links generates an effective repulsion between high-degree vertices,


resulting in some spurious disassortativity. Moreover, this mechanism
generates a k̄^nn(k) which is not strictly a power-law, even if it approaches a
power-law behaviour asymptotically (see Fig. 5.3b). These results allowed Park
and Newman to confirm that, as suggested by Maslov et al. [15] by making use
of the local rewiring algorithm (see Section 5.2.2), part of the disassortativity
displayed by the Internet can be accounted for by this mechanism.
The Park-Newman model leads to a very interesting analogy with the statistical physics of quantum particles, i.e. microscopic particles that obey quantum
mechanics and are therefore subject to some discreteness constraint. The analogy relates links of binary graphs to so-called Fermi particles or fermions. In
quantum physics, it is found that each microstate of a system can only be occupied by one fermion at a time (this is known as the exclusion principle). In
jargon, the occupation number of a microstate in a system of fermions can
be only 0 (no particles in that state) or 1 (one particle in that state). In this
analogy, each pair of vertices i and j can be regarded as a microstate whose
occupation number is gij = 1 if a particle (link) is there, and gij = 0 if not.
The requirement of no multiple links (g_ij ≤ 1) is equivalent to the exclusion
principle that there is at most one particle per state, and leads to Eq. (5.13),
which is the analogue of the so-called Fermi distribution.
Homework 5.10 Let us consider the sparse graph limit where x_i x_j ≪ 1 for all
i, j. In this regime, Eq. (5.13) can be approximated as

	p(x_i, x_j) ≈ x_i x_j.	(5.16)

Show that, with the above approximation, the system of equations (5.14) decouples and the solution is x_i = k_i / √(2 L_u) for all i. □


The above homework implies that the model defined by Eq. (5.16) is equivalent to the Chung-Lu model defined by Eq. (5.5). As already discussed in
Section 5.2.3, in this limit the expected degrees converge to the desired values,
and no degree correlations are introduced. Therefore the spurious disassortativity disappears in this limit. It is also easy to show that the degree
distribution P(k) becomes a rescaled form of ρ(x). Curiously, in the above
quantum analogy this regime corresponds to the classical limit, where the discreteness of the quantum world can be neglected and the Fermi distribution can
be replaced by the Boltzmann distribution, which (in a suitable representation)
has the expression (5.16).
We finally briefly describe the directed case. Now the probability that a
directed link from i to j exists is a function p_ij = p(x_i, y_j) of two quantities
x_i and y_j, playing a role analogous to that of the desired out- and in-degrees
k_i^out, k_j^in in the directed version of the Chung-Lu model defined in Eq. (5.8). By
looking at Fig. 5.1b and requiring that graphs with the same in- and out-degree
sequences are equiprobable, we are led to a condition analogous to Eq. (5.11),
now written for ordered pairs of vertices. This implies that in this case
p_ij/(1 - p_ij) = f_i g_j, where f_i = f(x_i) and g_j = g(y_j) are functions of x_i and y_j alone, respectively. Again,
all nontrivial choices can be mapped onto the linear case through a suitable
redefinition of x and y. Therefore we have
	p_ij = p(x_i, y_j) = x_i y_j / (1 + x_i y_j),	(5.17)

and in this case the sparse graph limit yields

	p_ij ≈ x_i y_j,	(5.18)

which is equivalent to the directed version of the Chung-Lu model defined in
Eq. (5.8), with x_i = k_i^out/√L and y_j = k_j^in/√L.

5.3

Maximum-entropy ensembles

All the examples in the last section show that even a conceptually simple idea
(generating a random ensemble of networks with a specified degree sequence)
can encounter big difficulties when naively implemented. The Park-Newman
approach solves the practical and conceptual problems of other implementations
of the CM by ensuring that the ensemble is properly sampled, so that any
inference about the higher-order properties (e.g. the assortativity) is unbiased,
just like for the ER model.
At this point, we might ask a natural question: if we consider a different
constraint (other than the number of links or the degree sequence), how can
we be sure that we end up with an appropriate method to generate the ensemble? Is there some constructive method to generate graph ensembles with given
constraints?
In this section, we show that such a method exists and is based on the
Maximum Entropy Principle.

5.3.1

The Maximum Entropy Principle

In information theory, an important measure of the uncertainty, or unpredictability, of a random process is provided by Shannon's entropy. If P(G) is
the probability of the outcome G, Shannon's entropy is defined (up to a proportionality constant which is irrelevant for our later purposes) as

	S ≡ - ∑_G P(G) ln P(G),	(5.19)

where the sum runs over all the M possible outcomes of the process.
A deterministic (certain) process, i.e. one for which one outcome has probability one while all other outcomes have probability zero, gives S = 0, which is
the minimum possible entropy. By contrast, a completely unpredictable (uniform) process, i.e. one where all the outcomes have exactly the same probability
P(G) = M^(-1), gives the maximum value S = ln M.
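These extreme cases are easy to check numerically (a minimal sketch; the ensemble size M = 8 is an arbitrary illustrative choice of ours):

```python
import math

def shannon_entropy(probs):
    """S = -sum_G P(G) ln P(G), Eq. (5.19); zero-probability outcomes
    contribute nothing to the sum."""
    return -sum(p * math.log(p) for p in probs if p > 0)

M = 8
uniform = [1.0 / M] * M            # completely unpredictable: S = ln M
certain = [1.0] + [0.0] * (M - 1)  # deterministic: S = 0
```

The same function also illustrates the additivity property discussed next: for two independent processes, the entropy of the joint (product) distribution is the sum of the individual entropies.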
Another important property of the entropy is additivity: if the event G
requires the simultaneous occurrence (intersection) of two events G1 and G2
(i.e. G = G1 ∩ G2) and if these events are independent, i.e. the joint probability
P(G) = P(G1 ∩ G2) can be factorized as P(G) = P1(G1) P2(G2), where P1 and
P2 are the marginal probabilities for the individual events G1 and G2, then
S = S1 + S2, where S_i = - ∑_{G_i} P_i(G_i) ln P_i(G_i) denotes the entropy of the
individual event G_i (i = 1, 2) and the sum runs over the possible outcomes of
that event.
Exercise 5.3 Prove the last statement. Use the result to calculate the entropy
of the ER model with n vertices as a function of the probability p. 
If we measure Shannon's entropy on a graph ensemble, it will provide us
with a measure of the degree of randomness that we are left with, once we enforce the constraints that define the model itself. If the constraints represent
structural properties taken from observations (like the number of links or the
degree sequence in the ER model and the CM, respectively), Shannon's entropy will quantify the residual uncertainty that we are left with about the network, after we
measure those properties.
An important application of Shannon's entropy is the Maximum Entropy
Principle, especially as developed by Jaynes [5]. According to this principle,
whenever we have only partial information (summarized in the knowledge of a
set of m observables {x_α}_{α=1}^m) about a system, our least biased inference
or best guess about the (unknown) rest of the system should be obtained by
finding the probability P(G) that maximizes S, subject to the known constraints.
This reflects the fact that, except for the known constraints, we are maximally
ignorant about the system. The constraints are expressed in the form
	E(x_α) ≡ ∑_G x_α(G) P(G) = x_α^*,	(5.20)

where x_α^* is the observed (known) value of the α-th property. Note that the constraints are enforced canonically, i.e. as ensemble averages. An additional constraint is given by the normalization of the probability, expressed by Eq. (5.1).
Introducing one Lagrange multiplier θ_α for each constraint x_α (plus an additional multiplier for the normalization constraint) and taking the functional


derivative of S with respect to P, one can show that the result of the constrained
maximization of S is

	P(G) = e^(-H(G)) / Z,	(5.21)

where

	H(G) = ∑_{α=1}^m θ_α x_α(G)	(5.22)

is a linear combination of the constraints that, by analogy with (statistical)
physics, we will call energy or Hamiltonian, while

	Z ≡ ∑_G e^(-H(G))	(5.23)

is the so-called partition function, which enforces the normalization constraint for the probability.
The above expressions coincide with those of traditional statistical physics.
Indeed, the beauty of the Maximum Entropy approach is that it shows that
the entirety of statistical physics can be reformulated exactly as an inference
problem from limited information: from the knowledge of only a few
macroscopic quantities (like the total energy) of a system, statistical physics
looks for a least biased estimate of the microscopic properties of the system,
in terms of the probability P(G) of the microscopic configurations. This consideration establishes a fascinating connection between information theory and
statistical physics.
In the context of networks, we are interested in doing the same operation.
This establishes another beautiful connection, this time to graph theory. From
the knowledge of a few aggregate properties {x_α}_{α=1}^m (such as the degree sequence), we want to construct completely random ensembles of graphs. Applying the Maximum Entropy Principle to graph ensembles leads to the so-called
Exponential Random Graph (ERG) models, which were first introduced in social network analysis [22, 8] to generate ensembles of graphs matching a given
set of observed topological properties. ERGs were then rediscovered within
an explicit statistical-mechanics framework [19, 6, 7], where it was shown that
traditional tools borrowed from statistical physics could successfully contribute
to investigating, and sometimes even solving, them explicitly [19].
Homework 5.11 Prove the following relation (useful in the following):

E(x_\alpha) = -\frac{1}{Z} \frac{\partial Z}{\partial \theta_\alpha} = \frac{\partial F}{\partial \theta_\alpha} ,    (5.24)

where we have introduced the free energy

F \equiv -\ln Z .    (5.25)
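Though the proof is left as homework, the relation itself can be checked numerically by brute force (a sketch of ours, not part of the notes): enumerate all undirected graphs on 3 vertices, take the number of links as the single constrained quantity, and compare a finite-difference derivative of F = -ln Z with the directly computed ensemble average.

```python
import math
from itertools import product

def ensemble_stats(theta, n=3):
    """Enumerate all graphs on n vertices with Hamiltonian H(G) = theta * L(G).
    Return the partition function Z and the ensemble average E(L)."""
    pairs = n * (n - 1) // 2
    Z, EL = 0.0, 0.0
    for g in product([0, 1], repeat=pairs):
        L = sum(g)                # number of links of this graph
        w = math.exp(-theta * L)  # Boltzmann weight e^{-H(G)}
        Z += w
        EL += L * w
    return Z, EL / Z

theta = 0.7
Z, EL = ensemble_stats(theta)

# Central finite difference of the free energy F = -ln Z:
h = 1e-6
Zp, _ = ensemble_stats(theta + h)
Zm, _ = ensemble_stats(theta - h)
dF = (-math.log(Zp) + math.log(Zm)) / (2 * h)
# dF and EL should agree, as Eq. (5.24) states.
```

For 3 vertices the check can also be done analytically, since Z factorizes as (1 + e^{-theta})^3, which is the structure derived in the next subsection.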


In the following, we consider specific examples of ERGs and we show that the ER model and the Park-Newman implementation of the CM can be recovered as particular cases of maximum-entropy ensembles. Moreover, we will show that the general method allows the approach to be extended to different constraints.

5.3.2 Simple undirected graphs

We start by considering the rather simple, yet quite general, case in which the Hamiltonian can be expressed in the form

H(G) = \sum_{i<j} \epsilon_{ij} g_{ij} .    (5.26)

Note that in this model the total energy H(G) of the graph is the sum of the energies \epsilon_{ij} corresponding to its individual links. Each energy \epsilon_{ij} can be regarded as the cost of placing a link between i and j. With this choice the partition function reads

Z = \sum_{\{g_{ij}\}} e^{-H(G)} = \sum_{\{g_{ij}\}} e^{-\sum_{i<j} \epsilon_{ij} g_{ij}} = \sum_{\{g_{ij}\}} \prod_{i<j} e^{-\epsilon_{ij} g_{ij}}
  = \prod_{i<j} \sum_{g_{ij}=0,1} e^{-\epsilon_{ij} g_{ij}} = \prod_{i<j} (1 + e^{-\epsilon_{ij}}) = \prod_{i<j} Z_{ij} ,    (5.27)

where we have introduced the vertex-pair partition function

Z_{ij} = 1 + e^{-\epsilon_{ij}} .    (5.28)

Finally, the free energy is

F = -\ln Z = -\sum_{i<j} \ln Z_{ij} = \sum_{i<j} F_{ij} ,    (5.29)

where

F_{ij} = -\ln Z_{ij} .    (5.30)

Equations (5.26)-(5.30) completely define the model. From the free energy it is possible to compute all the relevant quantities. For instance, the expected occupation number of the pair of vertices i, j, representing the probability that such vertices are connected, is

p_{ij} = E(g_{ij}) = \frac{\partial F_{ij}}{\partial \epsilon_{ij}} = \frac{1}{1 + e^{\epsilon_{ij}}}    (5.31)

and the expected total number of links in the network is

E(L_u) = E\Big(\sum_{i<j} g_{ij}\Big) = \sum_{i<j} p_{ij} .    (5.32)

ER Random Graph
We now show that the ER random graph model can be recovered as a particular case of the exponential model defined by the Hamiltonian (5.26). This is obtained when all energies are equal:

\epsilon_{ij} = \epsilon .    (5.33)

With such a choice, the Hamiltonian reads

H(G) = \epsilon \sum_{i<j} g_{ij} = \epsilon L_u(G).    (5.34)
This corresponds to m = 1, x_1 = L_u and \theta_1 = \epsilon in Eq. (5.22), and we are therefore only requiring that the expected number of links E(L_u) can be set to any desired value L_u^* by tuning the parameter \epsilon. This requirement corresponds to the ER random graph model. Looking at Eq. (5.31), we have

p_{ij} = p = \frac{1}{1 + e^{\epsilon}}    (5.35)

and, as expected, we recover the constant form for the connection probability characterizing the ER model.
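As an aside (our own sketch, in Python, one of the languages used in the implementation chapters), Eq. (5.35) is easy to turn into a generator of maximum-entropy ER graphs; the function names are ours:

```python
import math
import random

def er_probability(eps):
    """Connection probability of the maximum-entropy ER ensemble, Eq. (5.35)."""
    return 1.0 / (1.0 + math.exp(eps))

def sample_er(n, eps, rng):
    """Draw one graph: each of the n(n-1)/2 pairs is linked independently
    with probability p, returned as a set of pairs (i, j) with i < j."""
    p = er_probability(eps)
    return {(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p}

# eps = 0 gives p = 1/2; a large positive eps (a high "cost" per link)
# gives a sparse graph, so E(L_u) = p n(n-1)/2 can be tuned via eps.
```

Note the sign convention: larger \epsilon means a costlier, and therefore less probable, link.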
Configuration Model
We now consider the additive case

\epsilon_{ij} = \theta_i + \theta_j ,    (5.36)

which results in

H(G) = \sum_{i<j} (\theta_i + \theta_j) g_{ij} = \sum_{i \ne j} \theta_i g_{ij} = \sum_i \theta_i k_i(G).    (5.37)

Note that in this case we are requiring that the expected value of each degree E(k_i) can be set to any desired value k_i^* by tuning the corresponding parameter \theta_i. In other words, we are fixing the desired degree sequence, and we expect this case to be equivalent to the version of the configuration model described in Section 5.2.4. Indeed, Eq. (5.31) now reads

p_{ij} = \frac{1}{1 + e^{\theta_i + \theta_j}}    (5.38)

and by introducing x_i \equiv e^{-\theta_i}, it is easy to see that the above equation is the same as Eq. (5.12), corresponding to the Park-Newman version of the configuration model. This confirms that that version is unbiased, in accordance with the Maximum Entropy Principle. Other variants (such as the Chung-Lu one) cannot be derived from a Maximum Entropy approach, thus showing that they are biased.
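Substituting x_i = e^{-\theta_i} into Eq. (5.38) gives p_ij = x_i x_j / (1 + x_i x_j), which can be sketched in code as follows (an illustration of ours; the fitness values in the comments are arbitrary):

```python
def cm_probability(xi, xj):
    """Park-Newman connection probability p_ij = x_i x_j / (1 + x_i x_j),
    i.e. Eq. (5.38) rewritten with x_i = exp(-theta_i)."""
    return xi * xj / (1.0 + xi * xj)

def expected_degrees(x):
    """Expected degree E(k_i) = sum over j != i of p_ij, for fitness vector x."""
    n = len(x)
    return [sum(cm_probability(x[i], x[j]) for j in range(n) if j != i)
            for i in range(n)]

# e.g. expected_degrees([0.5, 1.0, 2.0]) gives one expected degree per vertex.
```

Tuning the x_i (equivalently the \theta_i) so that expected_degrees(x) matches a target degree sequence is precisely the constraint-fitting step described above.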

5.3.3 Directed graphs

We now briefly consider the directed case. The Hamiltonian (5.26) becomes

H(G) = \sum_{i \ne j} \epsilon_{ij} g_{ij}    (5.39)

and calculations analogous to those presented above allow us to write the partition function as

Z = \prod_{i \ne j} Z_{ij} = \prod_{i \ne j} (1 + e^{-\epsilon_{ij}})    (5.40)

and the free energy as

F = -\ln Z = -\sum_{i \ne j} \ln Z_{ij} = \sum_{i \ne j} F_{ij} .    (5.41)

The probability that a directed link from i to j is present is

p_{ij} = E(g_{ij}) = \frac{\partial F_{ij}}{\partial \epsilon_{ij}} = \frac{1}{1 + e^{\epsilon_{ij}}} ,    (5.42)

and the expected number of directed links is

E(L) \equiv \langle L \rangle = \sum_{i \ne j} p_{ij} .    (5.43)

In the constant case \epsilon_{ij} = \epsilon, we recover a directed version of the ER random graph model:

H(G) = \epsilon \sum_{i \ne j} g_{ij} = \epsilon L(G),   p_{ij} = p = \frac{1}{1 + e^{\epsilon}} .    (5.44)

In the more general additive case, we have \epsilon_{ij} = \alpha_i + \beta_j, since for directed graphs \epsilon_{ij} can be asymmetric. These two parameters control the in- and the out-degree of each vertex separately:

H(G) = \sum_{i \ne j} (\alpha_i + \beta_j) g_{ij} = \sum_i [\alpha_i k_i^{out}(G) + \beta_i k_i^{in}(G)],   p_{ij} = \frac{1}{1 + e^{\alpha_i + \beta_j}} .    (5.45)

This choice is equivalent to the directed version of the configuration model defined in Eq. (5.17), where x_i \equiv e^{-\alpha_i} and y_j \equiv e^{-\beta_j}.
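In the same spirit, here is a sketch (ours) of the directed probabilities of Eq. (5.45). Summing p_ij over all ordered pairs reproduces E(L) of Eq. (5.43), so the total expected out-degree must equal the total expected in-degree:

```python
import math

def directed_probability(alpha_i, beta_j):
    """p_ij = 1 / (1 + e^{alpha_i + beta_j}), from Eq. (5.45)."""
    return 1.0 / (1.0 + math.exp(alpha_i + beta_j))

def expected_out_in_degrees(alpha, beta):
    """E(k_i^out) = sum_{j != i} p_ij and E(k_i^in) = sum_{j != i} p_ji."""
    n = len(alpha)
    out = [sum(directed_probability(alpha[i], beta[j]) for j in range(n) if j != i)
           for i in range(n)]
    inn = [sum(directed_probability(alpha[j], beta[i]) for j in range(n) if j != i)
           for i in range(n)]
    return out, inn
```

Because \alpha_i and \beta_i are tuned independently, a vertex can have, say, a large expected out-degree and a small expected in-degree.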

5.3.4 Weighted graphs

Using the same general recipe, one might also consider other ensembles of graphs. As a last example, we consider an exercise involving an ensemble of weighted graphs. A weighted graph G is still described by the entries \{g_{ij}\} of an adjacency matrix. However, these entries are now assumed to be non-negative integers.

Exercise 5.4 Consider the Hamiltonian

H(G) = \sum_{i<j} \epsilon_{ij} g_{ij} ,    (5.46)

where G is a generic weighted graph. Following the derivation leading to Eq. (5.31), write the expected weight E(g_{ij}) of the connection between vertices i and j. Assume that the possible values of g_{ij} are non-negative integers, ranging from 0 to +\infty.

Bibliography
[1] M.E.J. Newman, S.H. Strogatz and D.J. Watts, Phys. Rev. E 64, 026118
(2001).
[2] M. Molloy and B. Reed, Random Structures and Algorithms 6, 161 (1995).
[3] A.C.C. Coolen, A. De Martino and A. Annibale, J. Stat. Phys. 136, 1035 (2009).
[4] E.S. Roberts, A.C.C. Coolen, Phys. Rev. E 85, 046103 (2012).
[5] E.T. Jaynes, Phys. Rev. 106, 620 (1957).
[6] J. Berg and M. Lässig, Phys. Rev. Lett. 89, 228701 (2002).
[7] Z. Burda, J. Jurkiewicz and A. Krzywicki, Phys. Rev. E 69, 026106 (2004).

Chapter 6

Random Graph Implementation
In the previous chapters we have discussed at some length examples of graphs
and of their properties. The main idea is to come closer to an understanding
of the nature of real world graphs. In the history of our understanding of these graphs, computer models and computer analysis tools have played an important role. In the computer science chapters we will delve a bit deeper into these tools.
These analyses can help us to:
- visualize static graphs to gain a visual impression of the shape of the graph
- compute static network properties of graphs, such as the number of edges, the shortest path between two vertices, the degree of a vertex, or the clustering coefficient of a graph
- understand dynamic properties, such as how graphs grow or shrink in time
In this chapter we will look deeper at how we can implement graphs in the computer, so that we can perform computer analyses of the graphs. Given the limited time available in part of a one-semester course, we will only cover some basic notions. This will be, however, quite useful, as the basics are enough to give an idea of how more elaborate systems work.
In the remainder of these notes we will use three computer tools. We will use Gephi for graph visualization, a programming language such as C, C++, Java, or Python to compute network properties, and NetLogo to study the dynamic behavior of graphs.
Gephi and NetLogo are high level tools; C, C++, Java, and Python are lower level tools. With the lower level languages, the level of abstraction that you work at is more detailed, and it takes more effort to achieve things. The high level tools are built using the low level languages, so to understand the limitations of the high level tools, it pays to learn a bit about the low level languages. Therefore, the homework exercises will concentrate on implementing graphs using a low level language.
In all programming homework exercises of this course, hand in the source
code, the input to the program (if any), and the output (if any). Your program
should contain documentation in the source code (comments). Provide in your
report a short description of what you did, and how essential aspects of your approach work (except for very small exercises).

6.1 Random graph

We will start with implementing a small regular network so that we can compute
a few important properties of the network. In later chapters we see how to
work with larger graphs, with the Configuration Model, and with real world
graphs. We will see how the graphs can be constructed, and how certain network
properties can be checked to see if the graph exhibits regular or real world
properties.

6.1.1 Adjacency Matrix

The programs in this chapter are used to study network properties such as counting edges or computing shortest paths. Our first program will generate a graph. The program will store it in memory, and will compute network properties. The aim of these exercises is to gain proficiency in manipulating random graphs and, later on, real world graphs. By following the exercises, you will create a functioning program that will be able to do increasingly useful things.
As has been noted before, graphs can be represented in a computer program by an adjacency matrix. The adjacency matrix of a graph of N vertices can be represented in a programming language as a two-dimensional array of boolean values. The boolean values represent the presence of an edge: TRUE means an edge is present, FALSE means there is no edge. An edge between vertex i and vertex j can be represented as m[i][j] = TRUE in a C-like language, where m is the variable holding the matrix. A logical choice for the datatype of m could be a (two-dimensional) array of BOOLEAN or int values.
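For instance, in Python (one of the languages suggested above) the data structure could be sketched as follows; the helper names are our own, not prescribed by the exercises:

```python
N = 10  # number of vertices

def make_empty_graph(n):
    """Adjacency matrix: an n x n array of booleans, False meaning 'no edge'."""
    return [[False] * n for _ in range(n)]

def add_edge(m, i, j):
    """Store an undirected edge symmetrically, in both m[i][j] and m[j][i]."""
    m[i][j] = True
    m[j][i] = True

m = make_empty_graph(N)
add_edge(m, 0, 1)  # vertex 0 and vertex 1 are now connected
```

Storing the edge in both m[i][j] and m[j][i] keeps the matrix symmetric, which is what "undirected" means for this representation.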
Exercise 6.1 Question: How much memory does a boolean data type take in
your favorite language? How much does an integer take? How much memory
does it take to store a graph of 1024 vertices? How can this be reduced? At what
cost?
Now that we have discussed the data structure, it is time to start writing
the program.
First, arrange to get access to a machine on which you can program. You
can use your own computer, or one in the computer labs.1 We will only be using
simple text-based command-line programs in this course. Your basic skills from
first year programming classes should be all you need.
Next, write a program in C, C++, Java, or Python using an adjacency
matrix.
1 Just as with your introduction to programming class, the machine should have a text editor and a compiler, or a development environment such as Eclipse, Visual Studio, or Xcode. (Word processors such as Word and Pages save their text in ways that a compiler cannot read. WordPad, Emacs, VI, and TextEdit are text editors that save their output as plain text that can be read by a compiler.)

Using the principles of modular or object oriented programming will result in cleaner code, and will make it easier to change your data structure implementation in the final chapter. (Refer to your introductory programming courses for help on this.)
Homework 6.1 (1) On a piece of paper, draw an unweighted undirected graph with 10 vertices and 20 edges. (2) In your program, globally declare the matrix m and initialize this matrix to the graph that you drew in the previous step. (3) Write a print routine that prints out the graph. Print all vertex/vertex combinations: print a 0 if no edge is present, and a 1 if an edge is present. The printout should be formatted in a readable way. Hint: you may want to use a doubly nested loop to achieve this. (4) Compile and run the program, and check if the output graph matches the graph on paper in step (1) in a logical sense (1s represent edges; your program does not have to provide a graphical representation of the graph). If the program does not correctly display the graph, repair your program.
Congratulations. You now have a working graph-manipulation program.

6.1.2 Random Graph

This program uses a hardcoded initialization of the adjacency matrix, which is not very flexible. To print another graph, you would have to edit and recompile the program. Next, we will modify the program so that it generates a random graph and stores it in the adjacency matrix. For the theory of this section, refer to Section 2.1.
Make a copy of your program under a new name (such as: exercise2.c, or
randomgraph.c).
Homework 6.2 Write an initialization routine that creates a random graph.
First declare the graph size and the connection probability as global constants.
Start with a small number of vertices, say 5-10. The initialization routine takes
as input a floating point value, which is the probability p that an edge is present
between vertex i and vertex j. Write a small function that reads the value of p
as input, either interactively, or from the command line. Then, for all pairs i
and j, assign with probability p the presence of an edge to the adjacency matrix.
Use the standard random number generator that draws numbers from a uniform
distribution. Compile and run the program for p = 0.0, p = 0.05, p = 0.2 and
p = 1.0. Print the resulting graphs.

6.1.3 Mean and Variance

Next, we will compute a few properties of the network. We will work towards
properties that allow us to discern a difference between regular networks and
complex networks. We will start with computing the mean and variance of the
edge distribution of the networks that we have.
Exercise 6.2 What do you expect is the mean of the edge distribution of the
regular graphs for the values of p that you have used in the previous Homework
Exercise? What is the expected variance?

Homework 6.3 Write two routines to compute (1) the mean and (2) the variance of the number of edges in a series of graphs. (Definitions and algorithms for mean and variance can be found in the usual places, such as your statistics books, stackoverflow.com, or Wikipedia.) Generate 10 random graphs, and compute the mean and variance of the number of edges. Do not print the edges, only the mean and variance of the number of edges for the graphs. Generate 100 random graphs, and compute the mean and variance of the number of edges. Is the mean of 100 graphs different from 10 graphs? What about the variance? Why?
Exercise 6.3 Is your answer correct? Why do you believe that your answer is correct? Name two strategies you can use to gain more faith in the correctness of your program.

6.1.4 Computing Graph Properties

Next we will consider computing the distribution of the degree of the vertices
of a graph.
Homework 6.4 Write a program to compute the degree distribution of a random graph with n = 10 vertices and p = 0, 0.05, 0.5, 1.0 edge probabilities. The program should compute and print the following numbers: the min, max, mean, median, and variance of the degree distribution of each graph. Repeat this exercise with n = 100. Do the values of min, max, mean, median, and variance each increase or decrease compared to n = 10? Why do you think that is the case (provide an argument for each case)?
Time and Space Complexity
For an algorithm to be useful it has to be correct (it computes what it is supposed to compute) and it has to be efficient (it should compute its result in as little time and space as possible). This notion of efficiency is formally called computational complexity, and often, when the meaning is clear, just complexity.2 For most algorithms the computational complexity depends on the size of the input problem, and complexity is expressed in big O notation, where the O is followed by some function, usually a linear, exponential, or logarithmic function. O(n) should be read as: the complexity of the algorithm is of the order of n, which means that there is a constant C such that the complexity is bounded by C·n.
The time complexity of an algorithm indicates how the running time scales with the size n of the input. A linear time complexity means that when the size of the input increases by a factor of 10, the run time of the algorithm also increases by that factor. An algorithm with a quadratic time complexity takes 100 times longer to run. A single for loop, running from 1 to n, has a run time that is linear in n (assuming the operations inside the for loop take constant time). A doubly nested for loop from 1 to n has quadratic time complexity (since the operations inside the outer-most loop now also take linear time, not constant).
2 Note that the word complex is used in many contexts. To put it differently, we are here
dealing with the computational complexity of complex network computations.
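To make the scaling concrete, here is a small illustration (ours): counting the edges stored in an adjacency matrix visits every entry once, so the operation count grows quadratically with the number of vertices n.

```python
def count_edges(m):
    """Count undirected edges in an adjacency matrix; also count operations.
    Each edge is stored twice (m[i][j] and m[j][i]), hence the division by 2."""
    n = len(m)
    ops = 0
    edges = 0
    for i in range(n):        # outer loop: n iterations
        for j in range(n):    # inner loop: n iterations for each i
            ops += 1          # so ops ends up being n * n
            if m[i][j]:
                edges += 1
    return edges // 2, ops

# Doubling n quadruples ops: the doubly nested loop is O(n^2).
```

Restricting the inner loop to j > i would halve the work, but the complexity would still be quadratic: constant factors do not change the big O class.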

6.2. VISUALIZATION

99

The space complexity of an algorithm indicates how much storage an algorithm needs. An adjacency matrix is a two dimensional array, and thus has
quadratic space complexity in n, the number of vertices.
Some network properties can be computed quickly, such as counting the
number of vertices or edges. We say that the run-time of the algorithm is linearly
(or polynomially) related to the size of the network. However, some network
properties are harder to compute exactly, and require a run-time that has a
quadratic or cubic relation to the size of the network, such as computing the
shortest path between all pairs of vertices. For these properties, approximation algorithms may have been devised that run in linear time, or even faster, in sub-linear time (see [Russel and Norvig(1995)] for examples).

6.2 Visualization

Understanding the structure of complex networks is facilitated by visualization tools, which draw a graphical depiction of a graph. A popular visualization tool is Gephi. Gephi draws a colored diagram of the network, and can be used to compute network properties, such as the ones that we have written our own algorithms for.
Visualization is an important method for understanding the structure of your network. In the remainder of this course you will from time to time be asked to use Gephi to create a picture of the network that you are working with, or that you have generated with your program.

6.2.1 Downloading Gephi

In order to use Gephi, you must download it from the usual download site. Google for Gephi, or go to http://gephi.github.io and download the package that is suited for your operating system (Linux, Mac OS X, Windows). Everything should work right away under Linux and Mac OS X; for Windows you may need to install the latest Java version (available at https://java.com/en/download/).

6.2.2 Running Gephi

Start up the program, and launch a sample network, Les Miserables. Your screen should look like the screenshot in Figure 6.1. Your sample file is imported; just click OK on the import report. You should now see a nice graph of the relations between the characters in this famous novel.
In order to reduce the homework workload, there are no compulsory exercises for Gephi. Gephi is a nice tool to play around with that can be quite useful to get a feel for the graphs that you are working with.
Exercise 6.4 Click on Data Laboratory, go back to the Overview, right click
on the large red circle, switch to Data Laboratory, and you will see that the best
connected character in Les Miserables is Valjean.
To the right of the pane are buttons to compute network properties such as
Average Degree, Network Diameter, etc. Click on the Run buttons, and see what
happens.


Figure 6.1: Gephi


Exercise 6.5 Create a new Project (under the File menu), go to Data Laboratory, and add nodes and edges, to recreate the graph that you used in the
previous exercises. Go to Overview, and look at your graph.

Bibliography
[Russel and Norvig(1995)] Russel, S.J. and P. Norvig (1995): Artificial intelligence: a modern approach. Series in Artificial Intelligence, Prentice Hall.

Chapter 7

Configuration Model Implementation
In the previous chapter we have discussed how to write a program to generate and manipulate Erdős–Rényi networks using an adjacency matrix. We will now write some more code to compute network properties for the random networks. These will hopefully show us the obvious: that the network is regular, and does not display real-world properties.
In this chapter we will also take further steps on the road towards real-world networks by writing a program to generate networks using the Configuration Model.

7.1 Configuration Model Implementation

In the previous chapter we created a program for generating regular random graphs. We also experimented with computing properties of the graphs, such as the clustering coefficient, to see if the graphs exhibited real world properties. For the regular graphs this was not the case. In this chapter we will revisit the configuration model. The configuration model can be used to generate graphs that exhibit more real world properties. See also Section 3.1 for more background on the Configuration Model.

7.1.1 Pre-specified degree sequence

In the previous chapter the edges were assigned randomly to the pair i and j.
As you will recall, the configuration model assigns the edges differently, using
a pre-specified sequence that defines the degree of each vertex. The edges are
then assigned randomly, provided that the degree sequence is satisfied. The
next exercise is to modify our program to generate edges randomly according
to the pre-specified degree sequence.
Exercise 7.1 Degree Sequence Histogram
Print a histogram of the degrees of the vertices in the graph. In a histogram there should be a bar for each degree value. You can print a histogram by declaring an array to store the counts of the vertex degrees. The array is indexed
by the degree k of each vertex. Loop through all vertices, and increment the array element at the index corresponding to each vertex's degree. Then print each array value as a number, or print it as a series of asterisks.
Now that we have worked with a degree sequence, we are ready to implement
the configuration model.
Homework 7.1 (1) On a piece of paper, draw a graph of 10 vertices. You
can choose how you draw the edges. (2) Write down the degree sequence of this
graph. (3) In your program, declare an array of integers (of size 10) for the
degree sequence. (4) Initialize the array to the degree sequence.
The program will now, for each vertex, attempt to create the correct number
of edges, connecting to a randomly chosen destination vertex, if that destination
vertex still has room (i.e., the prespecified degree has not yet been reached).
(5) So: For each (source) vertex,
- randomly choose a destination vertex whose degree is too low according to its pre-specified degree, and create an edge between the two vertices
- (update the degree-count of the vertices)
- (if there is no destination vertex left whose degree still allows incoming edges, stop)
- repeat this process until the actual degree of the source vertex matches that of the pre-specified degree
In the end, print the graph's edges.
(6) Run this program a few times and give its output (the graph) in your report.

7.1.2 Visualization

How can you know that your program is working correctly? One way to see if your program behaves as expected is to have your program output the edges, and import that into Gephi, to have it visualize the graph.
Exercise 7.2 To do so, you can write a small routine that prints (source, destination) pairs of the vertex ids. Write the output to a file, or print it to standard output and redirect the output to a file, which you then import into Gephi: program > file.
There are other ways to see if your program behaves correctly, such as by
writing a routine to check the consistency of the output. We will do so shortly.

7.2 Repeated Configuration Model

Previously, we have made an implementation of the Configuration Model, which allowed random graphs to be generated according to a pre-specified degree sequence. In this way, we could make sure that a graph is generated that exhibits some properties of real world graphs (such as, obviously, a certain degree distribution).
Due to the way that the edge destinations were chosen, multi-edges and self-loops may have occurred, and the checking routine may have found mismatches between the actual and the pre-specified degree sequence.

7.2.1 Self-Loops and Multi-Edges

Due to the way that the edge destinations were chosen, multi-edges and self-loops may have occurred. We can write a checking routine to find mismatches between the actual and the pre-specified degree sequence. Multi-edges and self-loops are undesirable; however, removing them would subtly change the randomness of the graph. We need another solution, as Chapter 5 discussed in depth.
One way to address this issue is the Repeated Configuration Model, which we will implement in this chapter. We will start with a small warmup to check the degrees and to count the number of self-loops and multi-edges.

7.2.2 Check Routines

Does the output of your program match the pre-specified degree sequence for
the graphs that you tried?
Homework 7.2 Check routine
Write a routine that checks the degrees of the vertices against the degree sequence. Have the routine report the number of vertices with mismatching degrees, and the number of missing edges. Run the routine a few times on the
graph. Does the graph have the required degree sequence? If so, why? If not,
why not?

Homework 7.3 Rewiring to remove self-loops and multi-edges
Write a program that reports the number of self-loops and the number of multi-edges. Do the reported results match the results of the degree sequence checker? How are they related?
Now draw two more graphs on a piece of paper. The first one is to have 10 vertices and resemble a real world network (hubs, clusters, small world, power law degrees). The second network is to have 20 vertices and resemble a real network, and its degree sequence should follow a power law.
Exercise 7.3 Clustering Coefficient
Compute the clustering coefficient of the graphs. Compare the clustering coefficient of the regular Erdős–Rényi graph and of the Configuration Model graph. What do you see?

Exercise 7.4 Degree Sequences
Print a histogram of the degree sequences. Compare the histogram with the degree sequence histogram of the Erdős–Rényi graph of Exercise 7.1. What do you see?

7.2.3 Repeated Configuration Model

You have most likely found that the resulting degree sequences were not always
realized by the program. We will now implement the repeated configuration
model to remedy this problem.
Homework 7.4 Repeated Configuration Model
Create a new copy of your program. Change your program so that when a graph with self-loops and multi-edges is discovered, it is discarded, and the program starts over to try with a new graph. Run the repeated configuration model for the two graphs (10 vertices, 20 vertices). How often does your program have to restart (for each graph)? Is termination guaranteed?
At this point, you may want to output the graph to visualize it with Gephi, and to compare it with the un-repeated configuration model.
The repeated configuration model addresses a shortcoming of the original configuration model by calling the model repeatedly. Thus, the time complexity of the repeated configuration model is larger than that of the original configuration model.
The time complexity can be assessed in at least two ways: by inspection,
and by experiment.
Exercise 7.5 (a) Assess the time complexity of the repeated configuration
model by code inspection/logical reasoning. Does the time complexity depend
on n, the number of vertices of the graph? Does it depend on other factors?
(b) Assess the time complexity of the repeated configuration model experimentally. Run the repeated configuration model, and count how many times it restarts, i.e. how often the original configuration model is called. Do this experiment 20 times, and average the result.
Section 5.2 contains a more elaborate description of alternative implementations. It also contains a remark about the computational complexity of the
repeated configuration model (Edge stub reconnection).

Part II

Applications of Networks


Chapter 8

Percolation
This chapter is devoted to percolation theory, which is the study of connectivity
in large networks. In Section 8.1 we look at ordinary percolation on an infinite
lattice, which is a model for connectedness of a large regular network in which
edges are randomly removed. In Section 8.2 we look at invasion percolation
on an infinite lattice, which is a model for the spread of a virus through a
large regular network in which edges have random transmission capacities. In
Section 8.3 we look at the configuration model and investigate how vulnerable
its connectedness is when vertices are being removed, either deterministically or
randomly. This is a model for a malicious attack on a network by a hacker.
A standard reference for percolation theory is Grimmett [2].

8.1 Ordinary percolation

Consider the d-dimensional integer lattice Z^d, d \ge 2. Draw edges between neighbouring vertices. Associate with each edge e a random variable w(e), drawn independently from the uniform distribution on (0, 1). This gives a random field of weights

w = (w(e))_{e \in E(Z^d)} ,    (8.1)

where E(Z^d) is the set of edges (see Fig. 8.1).

Figure 8.1: Z^2 (vertices) and E(Z^2) (edges) with random weights.

Pick p \in [0, 1], and partition Z^d into p-clusters by connecting all vertices that
are connected by edges whose weight is at most p, i.e.,

x \leftrightarrow_p y    (8.2)

if and only if there is a path \pi connecting x and y such that w(e) \le p for all e \in \pi. (A path is a collection of neighbouring vertices connected by edges.) Let C_p(0) denote the p-cluster containing the origin, and define

\theta(p) = P(|C_p(0)| = \infty)    (8.3)

with P denoting the law of w, i.e., \theta(p) is the probability that the origin is connected to infinity via edges with weight at most p. This is called the percolation function. We have

C_0(0) = \{0\},   C_1(0) = Z^d,   p \mapsto C_p(0) is non-decreasing,    (8.4)

so that (see Fig. 8.2)

\theta(0) = 0,   \theta(1) = 1,   p \mapsto \theta(p) is non-decreasing.    (8.5)

Exercise 8.1 Prove (8.4).

Figure 8.2: Qualitative plot of the percolation function.


Ordinary percolation may serve as a model for the connectedness of a network in which edges are randomly removed.
Define

p_c = \sup\{p \in [0, 1] : \theta(p) = 0\}.    (8.6)

It is known that p_c \in (0, 1), and that p \mapsto \theta(p) is continuous for all p \ne p_c and strictly increasing on (p_c, 1). Continuity is expected to hold also at p = p_c, but this has only been proved for d = 2 and d \ge 11. It is further known that p_c = 1/2 for d = 2 (see Fig. 8.3), while no explicit expression for p_c is known for d \ge 3. However, good numerical approximations are available for p_c, as well as expansions in powers of 1/(2d) for d large.
At p = p_c, called the critical percolation threshold, a phase transition occurs:

p < p_c : all clusters are finite,
p > p_c : there are infinite clusters.    (8.7)

It is known that in the supercritical phase there is a unique infinite cluster with P-probability 1.
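A simulation in the spirit of Fig. 8.3 can be sketched in a few lines (our own illustration, on a finite N x N block rather than the infinite lattice): draw a uniform weight for every edge and find the p-cluster of the origin by breadth-first search.

```python
import random
from collections import deque

def p_cluster(N, p, seed=0):
    """Return the p-cluster of vertex (0, 0) on the N x N block of Z^2.
    An edge is kept when its uniform weight satisfies w(e) <= p."""
    rng = random.Random(seed)
    # One weight per edge between neighbouring vertices of the block.
    w = {}
    for x in range(N):
        for y in range(N):
            if x + 1 < N:
                w[((x, y), (x + 1, y))] = rng.random()
            if y + 1 < N:
                w[((x, y), (x, y + 1))] = rng.random()

    def kept(a, b):
        e = (a, b) if (a, b) in w else (b, a)
        return e in w and w[e] <= p

    cluster, queue = {(0, 0)}, deque([(0, 0)])
    while queue:  # breadth-first search along kept edges
        x, y = queue.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nb[0] < N and 0 <= nb[1] < N
                    and nb not in cluster and kept((x, y), nb)):
                cluster.add(nb)
                queue.append(nb)
    return cluster

# For p well above p_c = 1/2 the cluster of the origin tends to span the
# block; well below p_c it stays small.
```

Because the same weights are reused for every p, the cluster at a smaller p is always contained in the cluster at a larger p, mirroring the monotonicity in (8.4).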

Exercise 8.2 Why is this uniqueness not obvious?

8.2 Invasion percolation

Again consider Z^d and E(Z^d) with the random field of weights w. Grow a cluster from 0 as follows (\| \cdot \| denotes the Euclidean distance on Z^d):
1. Invade the origin: I(0) = \{0\}.
2. Look at all the edges touching I(0), choose the edge with the smallest weight, and invade both that edge and the vertex at the other end: I(1) = \{0, x\}, with x = argmin_{y \in Z^d : \|y\| = 1} w(\{0, y\}).
3. Repeat 2 with I(1) replacing I(0), etc. (see Fig. 8.4).
In this way we obtain a sequence of growing sets I = (I(n))_{n \in N_0} with I(n) \subset Z^d the set of invaded vertices at time n and |I(n)| \le n + 1. (The reason for the inequality is that the vertex at the other end may have been invaded before. The set of invaded edges at time n has cardinality n. An invaded edge no longer counts for the probing of the weights, because it cannot be invaded a second time.) The invasion percolation cluster is defined as

C_{IPC} = \lim_{n \to \infty} I(n).    (8.8)

This is an infinite subset of Z^d, which is random because w is random. Note
that the sequence I is uniquely determined by w (because no two edges have
the same weight).
Invasion percolation may serve as a model for the spread of a virus through
a computer network: the virus is greedy and invades the network along the
weakest links.
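Steps 1–3 above translate directly into a greedy exploration with a priority queue. The sketch below is our own illustration on Z² (function names are hypothetical): each boundary edge gets its weight the first time it is exposed, and at each step the smallest boundary weight is invaded.

```python
import heapq
import random

def invade(steps, seed=0):
    """Run `steps` invasion steps from the origin on Z^2.

    Boundary edges get i.i.d. uniform(0,1) weights, sampled lazily; at each
    step the boundary edge of smallest weight is invaded, together with the
    vertex at its other end. Returns the invaded vertex set and the list
    (W_n)_n of invaded edge weights.
    """
    rng = random.Random(seed)
    invaded = {(0, 0)}
    boundary = []  # min-heap of (edge weight, outer endpoint of the edge)

    def expose(v):
        x, y = v
        for u in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if u not in invaded:
                heapq.heappush(boundary, (rng.random(), u))

    expose((0, 0))
    weights = []
    while len(weights) < steps:
        w, v = heapq.heappop(boundary)
        if v in invaded:
            continue  # stale entry: this edge now joins two invaded vertices
        invaded.add(v)
        weights.append(w)
        expose(v)
    return invaded, weights
```

By (8.11) below, on Z² the sequence of invaded weights should eventually stay below pc = 1/2, apart from finitely many excursions.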
The first question we may ask is whether C_IPC = Z^d with probability 1. The
answer is no:

C_IPC ⊊ Z^d a.s.   (8.9)

(a.s. is an abbreviation for almost surely, which means that the statement is
true with P-probability 1). In fact, C_IPC turns out to be an asymptotically thin
set, in the sense that

lim_{N→∞} (1/|B_N|) |B_N ∩ C_IPC| = 0 a.s.,   with B_N = [−N, N]^d ∩ Z^d.   (8.10)

Thus, the virus affects only a small part of the network.


An interesting fact about invasion percolation is the following. Let W_n
denote the weight of the edge that is traversed in the n-th step of the growth of
C_IPC, i.e., in going from I(n−1) to I(n). Then

lim sup_{n→∞} W_n = pc   a.s.   (8.11)

The intuition behind this fact is the following.


First we argue why the limsup is ≤ pc. Pick p ∈ (pc, 1). Then the union
of all the p-clusters contains a unique infinite component (recall Exercise 8.2),

Figure 8.3: Simulation of ordinary percolation on a 25 × 25 block in Z^2 for p = 0.45
and p = 0.55. The largest cluster is colored red, the second largest cluster blue, the
third largest cluster green. Note that the red cluster spans across the block for p = 0.55
but not for p = 0.45.

Figure 8.4: The first three steps of the invasion: I(n) for n = 0, 1, 2.


which we denote by C_p. Note that the asymptotic density of C_p is θ(p) ∈ (0, 1)
(see Fig. 8.2) and that C_p does not necessarily contain the origin (see Fig. 8.5).
All edges incident to C_p have weight > p. Let τ_p denote the first time a vertex
in C_p is invaded:

τ_p = inf{n ∈ N0 : I(n) ∩ C_p ≠ ∅}.   (8.12)

We claim that P(τ_p < ∞) = 1. Indeed, each time I breaks out of the box
around 0 it is currently contained in, it sees a never-before-explored region
containing a half-space. There is an independent probability θ(p) ∈ (0, 1) that
it hits C_p at such a break-out time. Therefore it will eventually hit C_p with
probability 1.

Figure 8.5: Picture of invasion percolation becoming trapped inside C_p (shaded region). All edge weights in C_p are ≤ p, all edge weights incident to the boundary of
C_p are > p. The lower black circle is the origin (the starting location of the invasion).
The upper black circle is the vertex where the invasion enters C_p, which occurs at time
τ_p.

Homework 8.1 Work out the details of the above argument. (Note that we
tacitly use that the critical percolation threshold of the full space is the same as
that of the half-space, which is not obvious but is true.)
Now, the edge invaded at time τ_p, being incident to C_p, has weight > p. Since
the invasion took place along this edge, all edges incident to I(τ_p − 1) (which
includes this edge) have weight > p too. Thus, all edges incident to I(τ_p) ∪ C_p
have weight > p. However, all edges connecting the vertices of C_p have weight
≤ p, and so after time τ_p the invasion will be stuck inside C_p forever. Not
only does this show that C_IPC = I(τ_p) ∪ C_p ⊊ Z^d, it also shows that W_n ≤ p
for all n large enough a.s. Since p > pc is arbitrary, it follows that

lim sup_{n→∞} W_n ≤ pc   a.s.   (8.13)

Next we argue why the reverse inequality holds as well. Indeed, suppose
that W_n ≤ p for all n large enough, for some p ∈ (0, pc). Then

C_IPC ⊂ I(σ_p) ∪ C_p(I(σ_p))   (8.14)

with

σ_p = inf{m ∈ N0 : W_n ≤ p ∀ n ≥ m}   (8.15)

the first time from which onwards W_n stays below p. Indeed, from time σ_p
onwards the invasion must stay inside C_p(I(σ_p)), the union of the p-clusters of
the vertices in I(σ_p). But |C_p(I(σ_p))| < ∞ and |I(σ_p)| < ∞ a.s., and this
contradicts |C_IPC| = ∞.
Note that

lim sup_{N→∞} (1/|B_N|) |B_N ∩ C_IPC| ≤ θ(p) a.s.   ∀ p > pc,   (8.16)

which proves (8.10) because lim_{p↓pc} θ(p) = 0 by the (supposed) continuity of
p ↦ θ(p).
The above argument shows that invasion percolation exhibits self-organized
criticality: C_IPC is in some sense close to C_pc for ordinary percolation (see
Fig. 8.6). Informally this can be expressed by writing

C_IPC = lim_{p↓pc} C_p.   (8.17)

Thus, even though invasion percolation has no parameter, it behaves as if it
were critical. Very little is known about the probability distribution of C_IPC.
For further background, see Angel, Goodman, den Hollander and Slade [1].
In Chapter 11 we will encounter two further examples of models that exhibit
self-organized criticality: the sandpile model and the Bak-Sneppen model.

Figure 8.6: Simulation of C_IPC on Z^2.

8.3

Vulnerability of the configuration model

Let us leave the world of infinite lattices and return to the world of finite random graphs. In Chapters 2 and 3 we saw that percolation may occur in the
Erdős-Rényi model and in the configuration model. We found that the critical percolation thresholds in these models are λ = 1 and ν = 1, respectively.
Namely, for λ > 1 and ν > 1, respectively, the largest cluster has size Θ(n) as
n → ∞, with n the number of vertices, while for λ < 1 and ν < 1, respectively,
it has size Θ(log n).


We focus on the configuration model with n vertices and with vertex degrees
D1, . . . , Dn that are i.i.d. random variables drawn from a prescribed probability distribution f. We will be particularly interested in choices of f having a
polynomial tail f(k) ∼ C k^{−τ}, k → ∞, with exponent τ ∈ (3, ∞).
The key quantity determining the occurrence of percolation is (recall (3.12))

ν = Σ_{k∈N} k(k−1) f(k) / Σ_{k∈N} k f(k) ∈ (0, ∞).   (8.18)
Suppose that ν > 1. We ask ourselves the following question:
Given is a sequence π = (π(k))_{k∈N0} of probabilities, each taking values in
[0, 1]. Suppose that a hacker attacks the network by randomly removing
vertices: vertex i is retained with probability π(Di) and is removed with
probability 1 − π(Di) (together with all its incident edges), independently
for different vertices. After the attack, does the largest cluster still
have size Θ(n) or not?
This question was answered by Janson [3]: The answer is yes if and only if

ν_π = Σ_{k∈N0} k(k−1) π(k) f(k) / Σ_{l∈N0} l f(l) > 1.   (8.19)

Thus, if ν_π ≤ 1, then the answer is no and we say that the attack was successful.
Homework 8.2 Give a heuristic explanation why the denominator in (8.19)
is not Σ_{k∈N0} k π(k) f(k). Hint: Recall the argument in Section 3.1.4.
Two choices are of interest:
(1) π(k) = π ∈ (0, 1) for all k ∈ N. This corresponds to a random attack,
where the terrorist removes a fraction 1 − π of the vertices without looking
at the degrees.
(2) π(k) = 0 for k > k* and π(k) = 1 for k ≤ k*, with k* ∈ N some threshold
value. This corresponds to a deterministic attack, where the terrorist
removes all vertices with degree larger than k*.
In case (1) we have ν_π = πν, and so the attack is successful if and only if
π ≤ 1/ν, i.e., a fraction ≥ 1 − (1/ν) of the vertices is removed. In case (2) we have

ν_π = Σ_{1≤k≤k*} k(k−1) f(k) / Σ_{k∈N} k f(k),   (8.20)

and so the attack is successful if and only if

Σ_{1≤k≤k*} k(k−1) f(k) / Σ_{k∈N} k(k−1) f(k) ≤ 1/ν.   (8.21)
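For a degree distribution with finite support, the criterion (8.19) is a one-line computation. The sketch below is our own illustration (helper names are hypothetical), with f given as a dictionary and π as a function:

```python
def nu_pi(f, pi):
    """Evaluate nu_pi of Eq. (8.19).

    f: dict mapping degree k to probability f(k) (finite support assumed).
    pi: function mapping degree k to the retention probability pi(k).
    The attacked graph keeps a cluster of size Theta(n) iff the result is > 1.
    """
    numerator = sum(k * (k - 1) * pi(k) * fk for k, fk in f.items())
    denominator = sum(k * fk for k, fk in f.items())
    return numerator / denominator

# toy degree distribution with nu = 2.6 / 2.0 = 1.3 > 1
f = {1: 0.3, 2: 0.4, 3: 0.3}
nu = nu_pi(f, lambda k: 1.0)             # no attack: recovers nu itself
random_attack = nu_pi(f, lambda k: 0.5)  # case (1) with pi = 0.5: pi * nu = 0.65
```

Here the random attack with π = 0.5 pushes ν_π below 1, so by (8.19) it is successful, while the unattacked network is supercritical.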

For the special case where f(k) = k^{−τ}/ζ(τ), k ∈ N, with the Riemann zeta-function given by ζ(τ) = Σ_{k∈N} k^{−τ}, the denominator in the left-hand side of
(8.21) equals

[ζ(τ−2) − ζ(τ−1)] / ζ(τ),   (8.22)


while the denominator minus the numerator scales like

[1 + o(1)] (k*)^{−(τ−3)} / [(τ−3) ζ(τ)],   k* → ∞.   (8.23)

This gives us the approximate criterion that the attack is successful if and only
if k* ≲ k(τ) with

k(τ) = [ (τ−3) (ζ(τ−2) − ζ(τ−1)) (1 − 1/ν) ]^{−1/(τ−3)}.   (8.24)

We have (see Fig. 8.7)

lim_{τ↓3} k(τ) = ∞,   lim_{τ→∞} k(τ) = 2.   (8.25)

Thus, for τ ↓ 3 the network becomes extremely vulnerable because only few vertices with a high degree need to be removed in order to take down the network,
while for τ → ∞ the network becomes extremely robust because all vertices with
degree > 2 need to be removed in order to take down the network.

Figure 8.7: Plot of τ ↦ k(τ).

Exercise 8.3 Prove (8.25). Hint: Show that lim_{u↓1} (u−1) ζ(u) = 1 and
lim_{u→∞} 2^u [ζ(u−1) − ζ(u)] = 1.
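The threshold (8.24) can be evaluated numerically. The sketch below is our own illustration: it approximates ζ by a partial sum plus Euler–Maclaurin tail terms and treats ν > 1 as a given parameter (function names are hypothetical):

```python
def zeta(s, M=1000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(k ** (-s) for k in range(1, M + 1))
    return head + M ** (1 - s) / (s - 1) + 0.5 * M ** (-s)

def k_threshold(tau, nu):
    """Approximate attack threshold k(tau) of Eq. (8.24), for a given nu > 1."""
    base = (tau - 3) * (zeta(tau - 2) - zeta(tau - 1)) * (1 - 1 / nu)
    return base ** (-1.0 / (tau - 3))
```

Consistent with (8.25), with ν = 2 the values decrease towards 2 as τ grows: roughly 86 at τ = 3.2, roughly 4.5 at τ = 4, and roughly 2.1 at τ = 8.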

Bibliography
[1] O. Angel, J. Goodman, F. den Hollander and G. Slade, Invasion percolation
on regular trees, Ann. Probab. 36 (2008) 420–466.
[2] G.R. Grimmett, Percolation, Springer, Berlin, 1989.
[3] S. Janson, On percolation in random graphs with given vertex degrees,
Electronic J. Probab. 14 (2009) 86–118.


Chapter 9

Epidemiology
In this chapter we look at a model for the spread of an infection over a network,
called the contact process. This process is an example of a larger class of random
processes referred to as interacting particle systems. In Section 9.1 we look at
infinite lattices, in Section 9.2 at finite lattices, in Section 9.3 at finite random
graphs. In Section 9.4 we investigate a closely related problem, namely, how a
rumour spreads through a network.
A standard reference for interacting particle systems is Liggett [8].
[begin intermezzo]
The Poisson process with rate λ ∈ (0, ∞) is defined as the increasing sequence
of random times (Ti)_{i∈N0} such that T0 = 0 and Ti − Ti−1, i ∈ N, are i.i.d. random
variables with common distribution EXP(λ), i.e.,

P(T1 > t) = e^{−λt},   t ≥ 0.   (9.1)

We may think of Ti as the time at which a random clock rings for the i-th
time. We imagine that the clock rings at rate λ, i.e., the probability that a
ring occurs in an infinitesimally small interval dt is λ dt. Indeed, by dividing up
the time interval [0, t] into pieces of length Δt each and letting Δt ↓ 0, we see
that

P(T1 > t) = P(no ring occurs in [0, t]) = lim_{Δt↓0} (1 − λΔt)^{t/Δt} = e^{−λt},   t ≥ 0,   (9.2)

which matches the expression in (9.1).
[end intermezzo]
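The construction in the intermezzo can be sampled directly by accumulating i.i.d. EXP(λ) gaps. This is a minimal sketch with a hypothetical function name:

```python
import random

def poisson_ring_times(rate, horizon, seed=0):
    """Ring times of a rate-`rate` Poisson clock in [0, horizon].

    Successive gaps T_i - T_{i-1} are i.i.d. EXP(rate), as in (9.1).
    """
    rng = random.Random(seed)
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # one EXP(rate) gap
        if t > horizon:
            return times
        times.append(t)
```

The number of rings in [0, horizon] then concentrates around rate × horizon, the mean of a Poisson(rate × horizon) random variable.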

9.1

The contact process on infinite lattices

9.1.1

Construction

The contact process is a Markov process (η_t)_{t≥0} with state space Ω = {0, 1}^{Z^d},
d ≥ 1, where

η_t = {η_t(x) : x ∈ Z^d}   (9.3)

denotes the configuration at time t, η_t(x) = 1 means that vertex x is infected at
time t, and η_t(x) = 0 means that vertex x is healthy at time t. The configuration η_t changes with time, and this models how an infection spreads through a
population of individuals: the individuals form the vertices of the network, the
contacts between the individuals form the edges of the network.
The evolution is modelled by specifying a set of local transition rates

c(x, η),   x ∈ Z^d, η ∈ Ω,   (9.4)

playing the role of the rate at which the state of vertex x flips in the configuration η, i.e., the rate of the transition

η → η^x,   (9.5)

with η^x the configuration obtained from η by changing the state at vertex x
(either 0 → 1 or 1 → 0). In the contact process these rates are chosen as

c(x, η) = λ Σ_{y∼x} η(y),   if η(x) = 0,
c(x, η) = 1,                if η(x) = 1,        λ ∈ (0, ∞),   (9.6)

i.e., infected vertices become healthy at rate 1 and healthy vertices become
infected at rate λ times the number of infected neighbours. The parameter λ
measures how contagious the infection is. (Because Z^d is infinite, a little work
is needed to show that the contact process is well-defined. Details can be found
in Liggett [8].)
The configuration space Ω comes with a natural partial order ≼: we say that
η is everywhere smaller than η′, written η ≼ η′, when

η(x) ≤ η′(x)   ∀ x ∈ Z^d.   (9.7)

This order is called partial because some pairs of configurations are ordered
while others are not.
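On the infinite lattice the process can only be constructed abstractly, but the rates (9.6) are easy to simulate on a finite torus (anticipating Section 9.2). The sketch below is a naive Gillespie-type simulation of our own; all names are hypothetical, and the linear scan over events is chosen for clarity, not speed.

```python
import random

def contact_process(N, lam, t_max, seed=0):
    """Contact process on the N x N torus, started from all-infected.

    Infected sites recover at rate 1; healthy sites get infected at rate
    lam per infected neighbour, as in (9.6). Gillespie simulation: returns
    the set of infected sites at time t_max (empty if extinction occurred).
    """
    rng = random.Random(seed)
    infected = {(x, y) for x in range(N) for y in range(N)}
    t = 0.0

    def nbrs(x, y):
        return (((x + 1) % N, y), ((x - 1) % N, y),
                (x, (y + 1) % N), (x, (y - 1) % N))

    while infected:
        events, total = [], 0.0
        for v in infected:
            events.append((1.0, v, "recover"))
            total += 1.0
            if lam > 0.0:
                for u in nbrs(*v):
                    if u not in infected:
                        events.append((lam, u, "infect"))
                        total += lam
        t += rng.expovariate(total)  # EXP(total rate) waiting time
        if t > t_max:
            break
        r, acc = rng.uniform(0.0, total), 0.0
        for rate, site, kind in events:  # pick one event prop. to its rate
            acc += rate
            if r <= acc:
                (infected.discard if kind == "recover" else infected.add)(site)
                break
    return infected
```

With λ = 0 the infection can only die out, while for λ well above the critical value survival up to moderate times is overwhelmingly likely when starting from all-infected.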

9.1.2

Shift-invariance and attractiveness

Note that

c(x, η) = c(x + y, θ_y η)   ∀ y ∈ Z^d,   (9.8)

with θ_y the shift of space over y, i.e., θ_y η is the configuration η viewed relative
to vertex y:

(θ_y η)(x) = η(x − y),   x ∈ Z^d.   (9.9)

Property (9.8) says that the flip rate at x only depends on the configuration
as seen relative to x, which is natural when the interaction between individuals
is shift-invariant. Also note that

η ≼ η′  ⟹  c(x, η) ≤ c(x, η′) if η(x) = η′(x) = 0,
            c(x, η) ≥ c(x, η′) if η(x) = η′(x) = 1.   (9.10)

Property (9.10) says that when η is everywhere smaller than η′, the state at x
flips up faster in η′ than in η, but flips down slower. Systems with this property
are called attractive.


Homework 9.1 Show that the contact process preserves the partial order ≼,
i.e., two realisations of the contact process starting from η, η′ with η ≼ η′ can
be coupled in such a way that an infinitesimally small time later the two configurations are still everywhere ordered (with probability 1). Hint: Recall the
intermezzo about coupling in Chapter 2.
In what follows we will see that properties (9.8) and (9.10) allow for a number
of interesting conclusions about the equilibrium behaviour of the contact process,
as well as its convergence to equilibrium.

9.1.3

Convergence to equilibrium

Write [0] and [1] to denote the configurations η ≡ 0 and η ≡ 1, respectively.
These are the smallest, respectively, the largest configurations in the partial
order ≼, and hence

[0] ≼ η ≼ [1],   η ∈ Ω.   (9.11)
Since the contact process preserves the partial order ≼, we can obtain information about what happens when the system starts from any η ∈ Ω by comparing
with what happens when it starts from [0] or [1]. Indeed, writing

P_t^η(·) = P(η_t ∈ · | η_0 = η)   (9.12)

for the probability distribution of η_t given that η_0 = η, we have

t ↦ P_t^{[0]} is stochastically non-decreasing,
t ↦ P_t^{[1]} is stochastically non-increasing,   (9.13)

i.e., if s < t and η_s, η_t have probability distributions P_s^{[0]}, P_t^{[0]}, respectively, then
there exists a coupling of η_s and η_t such that η_s ≼ η_t with probability 1.
Homework 9.2 Give a heuristic argument for this monotonicity based on the
result of Homework 9.1.
An immediate consequence of (9.13) is the existence of the limits

ν̲ = lim_{t→∞} P_t^{[0]},   ν̄ = lim_{t→∞} P_t^{[1]},   (9.14)

which are referred to as the lower equilibrium, respectively, the upper equilibrium.

9.1.4

Critical infection threshold

Note that [0] is a trap for the dynamics (if all sites are healthy, then no infection
will ever occur), and so we have

ν̲ = δ_{[0]}.   (9.15)

An immediate consequence of (9.13) is that there is a critical infection threshold
λ_d ∈ [0, ∞] such that

λ ≤ λ_d:   ν̄ = δ_{[0]}   (extinction of infection),
λ > λ_d:   ν̄ ≠ δ_{[0]}   (survival of infection).   (9.16)


Thus, for large λ there is an epidemic while for small λ there is not.
Let p(λ) denote the density of the infections in ν̄. The critical infection
threshold

λ_d = inf{λ ∈ (0, ∞) : p(λ) > 0} = sup{λ ∈ (0, ∞) : p(λ) = 0}   (9.17)

separates the phase of extinction of the infection from the phase of survival of
the infection. The function λ ↦ p(λ) is non-decreasing and continuous (see
Fig. 9.1). The continuity at λ = λ_d is hard to prove.

Figure 9.1: Qualitative plot of the density function.


Here are three facts about the critical infection threshold, the proof of which
requires delicate coupling arguments:

d λ_d ≤ λ_1,   2d λ_d ≥ 1,   λ_1 < ∞.   (9.18)

These inequalities combine to yield that λ_d ∈ (0, ∞) for all d ≥ 1, so that the
phase transition occurs at a non-trivial value of the infection rate parameter.
Sharp estimates are available for λ_1, but these require heavy machinery. For
instance, it can be shown that the one-dimensional contact process survives
when

(λ/(λ+1))^2 − (1/(λ+1))^2 > 80/81.   (9.19)

This yields the bound λ_1 ≤ 1318 (see Durrett [4] for details). The true value
is λ_1 ≈ 1.6494, which can be shown with the help of simulations and with the
help of approximation techniques.

9.2

The contact process on large finite lattices

Suppose that we consider the contact process on a large finite lattice, say

Λ_N = [0, N)^d ∩ Z^d,   N ∈ N.   (9.20)

For convenience we may endow Λ_N with periodic boundary conditions, so that
every vertex has the same environment. The contact process (η_t)_{t≥0} on Λ_N is
again well-defined, and is again shift-invariant and attractive. However, since


Λ_N is finite, we have ν̄ = ν̲ = δ_{[0]}, i.e., the critical infection threshold is infinite
for all N ∈ N: on a finite lattice every infection eventually becomes extinct.
An interesting question is the following. Starting from the configuration [1]_N
where every individual is infected, how long does it take the dynamics to reach
the configuration [0]_N where every individual is healthy? In particular, we can
ask how large is the average extinction time

E_{[1]_N}(τ_{[0]_N}),   τ_{[0]_N} = inf{t ≥ 0 : η_t = [0]_N}.   (9.21)

We expect this time to be growing slowly with N when λ < λ_d and rapidly
with N when λ > λ_d, where λ_d is the critical infection threshold for Z^d. The
following results are shown in Durrett and Liu [5], respectively, Durrett and
Schonmann [6]: There exist C_−(λ), C_+(λ) ∈ (0, ∞) such that

λ < λ_d:   lim_{N→∞} E_{[1]_N}(τ_{[0]_N}) / log |Λ_N| = C_−(λ),
λ > λ_d:   lim_{N→∞} log E_{[1]_N}(τ_{[0]_N}) / |Λ_N| = C_+(λ).   (9.22)

Thus, in the subcritical phase the time to extinction is logarithmic in the volume
of the lattice (i.e., very slowly increasing with the volume), in the supercritical
phase it is exponential (i.e., very rapidly increasing with the volume). This is a
rather dramatic difference.
Homework 9.3 Give a heuristic explanation for the scaling of the average
extinction time in the two phases.
It can be shown that in the supercritical phase

lim_{N→∞} P_{[1]_N}( τ_{[0]_N} / E_{[1]_N}(τ_{[0]_N}) > t ) = e^{−t},   t > 0,   (9.23)

i.e., the extinction time is asymptotically exponentially distributed.


Homework 9.4 Why is (9.23) reasonable? Hint: Because in a unit time interval extinction occurs with only a very small probability, the time until extinction is exponentially distributed on the scale of its average (recall the explanation
of (9.1)).

9.3

The contact process on random graphs

Chatterjee and Durrett [3], Mountford, Mourrat, Valesin and Yao [9] look at the
contact process on the configuration model and show that, for every λ ∈ (0, ∞)
and every f with τ ∈ (2, ∞), the average time to extinction grows exponentially
fast with n (the number of vertices). This says that the contact process on the
configuration model with a power-law degree distribution is always supercritical: regardless of the value of λ the average extinction time grows very rapidly
with the size. Apparently, the presence of vertices with large degrees makes it
easy for the infection to survive: hubs easily transmit the infection. In Hao and
Schapira [7] it is shown that the same behaviour occurs for τ ∈ (1, 2].
Similar results have been obtained for a selected class of other random
graphs, such as regular trees and the supercritical Erdős-Rényi random graph.
However, it turns out to be hard to obtain sharp estimates. It would be interesting to understand what happens for the preferential attachment model.
Partial results have been obtained by Berger, Borgs, Chayes and Saberi [1].

9.4

Spread of a rumour on random graphs

Suppose that we consider a modified contact process in which we do not allow
infected sites to become healthy, i.e., once a site is infected it stays infected forever and passes its infection on to neighbouring sites. This can also be viewed as
a model for the spread of a rumour through a social network: once an individual
picks up a rumour, he/she transmits this rumour to his/her friends, who in turn
transmit the rumour to their friends, and so on.
We model this situation as follows. Assign to each edge e of the configuration
model a random time Y_e, which represents the time that is needed by the rumour
to travel along the edge (in either direction). Assume that (Y_e)_{e∈E} are i.i.d. with
probability distribution EXP(1), i.e., exponential with mean 1. The rumour
is inserted at a randomly drawn vertex V_1, and spreads through the graph
employing time Y_e to travel across edge e. Pick a randomly drawn vertex V_2
and ask how long it takes for the rumour to reach V_2 starting from V_1, i.e., the
typical travel time between two vertices

T_n = inf_{π: V_1→V_2} Σ_{e∈π} Y_e,   (9.24)

where the infimum runs over all paths π from V_1 to V_2. It is shown in Bhamidi,
van der Hofstad and Hooghiemstra [2] that in the supercritical regime ν > 1,
for every τ ∈ (3, ∞),

lim_{n→∞} [T_n − α log n] = Z   in distribution,   (9.25)

where α = 1/(ν − 1) and Z is a non-degenerate R-valued random variable. Thus,
the rumour needs a time of order log n to spread through the network, which is
plausible because of the small-world property of the configuration model.
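The minimisation in (9.24) is a shortest-path problem for the weighted graph, so a realisation of T_n can be computed with Dijkstra's algorithm. The sketch below is our own illustration, using a hypothetical adjacency-list format:

```python
import heapq

def travel_time(adj, source, target):
    """First-passage time of (9.24): the minimal total edge weight over all
    paths from source to target. adj[v] is a list of (neighbour, Y_e) pairs."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    seen = set()
    while heap:
        d, v = heapq.heappop(heap)
        if v in seen:
            continue
        seen.add(v)
        if v == target:
            return d
        for u, w in adj[v]:
            nd = d + w
            if nd < best.get(u, float("inf")):
                best[u] = nd
                heapq.heappush(heap, (nd, u))
    return float("inf")  # target not reachable

# tiny example: the direct edge 0-2 is slower than the two-hop route via 1
adj = {0: [(1, 1.0), (2, 5.0)],
       1: [(0, 1.0), (2, 1.0)],
       2: [(0, 5.0), (1, 1.0)]}
```

With i.i.d. EXP(1) weights on the edges of a configuration model, travel_time(adj, V1, V2) samples T_n; the toy example already shows the mechanism behind (9.25): the optimal route may use more edges, each with a smaller crossing time.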
It is further shown in [2] that in the supercritical regime ν > 1, for every
τ ∈ (2, 3),

lim_{n→∞} T_n = Z_1 + Z_2   in distribution,   (9.26)

with Z_1, Z_2 i.i.d. non-degenerate (0, ∞)-valued random variables. Thus, when
the degrees have an infinite second moment the rumour spreads in a
bounded time. In fact, it turns out that the rumour passes through a small
set of hubs that connect up with almost every other vertex in the network.
For the configuration model it is known that the typical distance H_n (defined
in Chapter 2) scales like β log n when τ ∈ (3, ∞) and like β log log n when
τ ∈ (2, 3), with

β = 1/log ν,          τ ∈ (3, ∞),
β = 2/|log(τ − 2)|,   τ ∈ (2, 3).   (9.27)

Since β > α when ν > 1 and τ ∈ (3, ∞), we see that the rumour has a tendency
to spread along edges with an atypically small crossing time.
Interestingly, both α and β decrease with ν and increase with τ. Indeed, as τ
decreases, the tail of the degree distribution gets thicker and thicker and the network acquires more and more hubs. Consequently both the typical distance and
the typical travel time decrease. For the special case where f(k) = k^{−τ}/ζ(τ),
k ∈ N, we have

ν = ν(τ) = ζ(τ − 2)/ζ(τ − 1) − 1.   (9.28)


Exercise 9.1 Show that τ ↦ ν(τ) is non-increasing.
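Exercise 9.1 can be sanity-checked numerically. The sketch below is our own illustration of (9.28), evaluating ζ by a partial sum plus Euler–Maclaurin tail terms (helper names are hypothetical):

```python
def zeta(s, M=1000):
    """Riemann zeta for s > 1: partial sum plus Euler-Maclaurin tail terms."""
    head = sum(k ** (-s) for k in range(1, M + 1))
    return head + M ** (1 - s) / (s - 1) + 0.5 * M ** (-s)

def nu(tau):
    """nu(tau) = zeta(tau-2)/zeta(tau-1) - 1, as in Eq. (9.28)."""
    return zeta(tau - 2) / zeta(tau - 1) - 1.0
```

On a grid of τ values the sequence ν(τ) is indeed non-increasing, diverging as τ ↓ 3 and tending to 0 as τ → ∞.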

Bibliography
[1] N. Berger, C. Borgs, J.T. Chayes and A. Saberi, Asymptotic behavior and
distributional limits of preferential attachment graphs, Ann. Probab. 42
(2014) 1–40.
[2] S. Bhamidi, R. van der Hofstad and G. Hooghiemstra, First passage percolation on random graphs with finite mean degrees, Ann. Appl. Probab.
20 (2010) 1907–1965.
[3] S. Chatterjee and R. Durrett, Contact processes on random graphs with
power law degree distributions have critical threshold 0, Ann. Probab. 37
(2009) 2332–2356.
[4] R. Durrett, Lecture Notes on Particle Systems and Percolation, Wadsworth
Pub. Co., Belmont CA, USA, 1988.
[5] R. Durrett and X. Liu, The contact process on a finite set, Ann. Probab.
16 (1988) 1158–1173.
[6] R. Durrett and R. Schonmann, The contact process on a finite set II, Ann.
Probab. 16 (1988) 1570–1583.
[7] C.V. Hao and B. Schapira, Metastability for the contact process on the
configuration model with infinite mean degree.
[8] T.M. Liggett, Interacting Particle Systems, Grundlehren der mathematischen Wissenschaften 276, Springer, New York, 1985.
[9] T. Mountford, J.-C. Mourrat, D. Valesin and Q. Yao, Exponential extinction time of the contact process on finite graphs, to appear.


Chapter 10

Pattern detection in
networks
In Chapter 5 we introduced various network ensembles built according to the
Maximum Entropy principle. In this chapter, we are going to use those ensembles as null models that allow us to detect empirical patterns in real-world networks. Such patterns are defined as statistically significant deviations from the
prediction of maximum-entropy ensembles, and reveal the presence of higher-order mechanisms that cannot be explained by the null models themselves.
Since this procedure requires maximum-entropy models to be fitted to empirical data, we will first introduce an important and powerful statistical criterion,
namely the Maximum Likelihood principle, and apply it to network models. We
will then describe a pattern detection method based on this principle.

10.1

The maximum-likelihood principle

As we have already discussed a number of times, one of the main goals in the
study of complex networks is that of reproducing the empirical topological properties of real-world networks by means of relatively simple theoretical models.
In general, given a real-world network and a mathematical model of a graph,
we need to tune the free parameters of the model to those values that optimally
reproduce the empirical properties of the network. Usually, this is done by selecting one or more target topological properties and looking for the parameter
values that make the expected value of these properties match the corresponding
observed value. But since we can target virtually as many topological properties as we want, and surely many more than the number of model parameters,
it is important to understand whether this choice is really arbitrary, or whether a
statistically correct criterion exists which selects a unique parameter value.
In this section we show that the Maximum Likelihood (ML) method, which
has a rigorous statistical basis, allows one to address this problem successfully.
We show that the ML criterion also yields an unbiased way to correctly randomize a network, overcoming the structural bias introduced by other methods.

10.1.1

Motivation

In general, any network model depends on a set of parameters that we collectively denote by the vector θ. Let P(G|θ) be the conditional probability of
occurrence of a graph with adjacency matrix G, in the set of graphs spanned
by the model, once the parameters are set to the value θ. For a given target
topological property Π(G) displayed by a graph G (in general a function of the
matrix G), or a set of target properties {Π_i(G)}_i, network models provide us
with the expected values ⟨Π_i⟩_θ obtained as ensemble averages:

E_θ(Π_i) ≡ ⟨Π_i⟩_θ = Σ_G Π_i(G) P(G|θ).   (10.1)

When comparing the model with a particular real-world network G*, one might
in principle derive (analytically or via numerical simulations) the dependence
of E_θ(Π_i) on θ and then look for the matching value θ_M of the parameters θ
that realizes the equality

E_{θ_M}(Π_i) = Π_i(G*)   ∀i.   (10.2)

In general, the above system of equations might not admit a (unique) solution.
And even if it does, is the criterion leading to Eq. (10.2) statistically correct?
Finally, which target properties have to be chosen anyway?
To concretely illustrate some of the above limitations, we can use again the
simple example we considered in Sec. 5.1. We assume that a real network G*
with n vertices and L_u* ≡ L_u(G*) undirected links (see Eq. (4.12)) is compared
with an Erdős-Rényi random graph model where the only (unknown) parameter
is the uniform connection probability p. In the literature, a common choice
for the matching value p_M of the parameter p is the one ensuring that the
expected number of links ⟨L_u⟩_p = n(n−1)p/2 equals the empirical value L_u*,
which yields

p_M = 2L_u* / (n(n−1))   (10.3)

as in Eq. (5.1). Clearly, choosing the average empirical degree k̄* = 2L_u*/n (see
Eq. (4.11)) or the link density c_u* = 2L_u*/(n(n−1)) (see Eq. (4.17)) as target
properties yields exactly the same value for p_M. However, different choices of
target properties would in general result in a different value for p_M. For instance,
if the target property was taken to be the average clustering coefficient C̄* defined
in Eq. (4.32), then one would get

p_M = C̄*,   (10.4)

since the expected value of the clustering coefficient in the Erdős-Rényi model
coincides with the connection probability p.¹
¹ We recall from Chapter 4 that the clustering coefficient C_i defined in Eqs. (4.29) and (4.30)
can be viewed as the probability that two randomly chosen neighbours of vertex i are connected
to each other. Thus the average clustering coefficient C̄ defined in Eq. (4.32) can be viewed
as the probability that any two vertices sharing a common neighbour are mutually connected.
Since in the Erdős-Rényi model the probability that any two vertices are connected is p,
independently of whether the vertices share a common neighbour, it follows that the expected
value of C̄ under the Erdős-Rényi model is simply p.


As we next show, the Maximum Likelihood (ML in the following) criterion,
which is a statistically rigorous and widely used concept, indicates a unique
choice for the optimal parameter value in the above example as well as in general,
and is also recognized as more reliable than other fitting methods.

10.1.2

Generalities

In general, consider a (discrete for simplicity) random variable V whose probability distribution f(v|θ) (defined as the probability that V = v) depends on
a parameter θ. For a physically realized outcome V = v*, f(v*|θ) represents
the likelihood that such an outcome is generated by the model with parameter
choice θ. Therefore, for fixed v*, the optimal choice for θ is the value θ*
maximizing f(v*|θ). It is often simpler to define the log-likelihood function
ℒ(θ) ≡ log f(v*|θ) and maximize it, which gives the same value θ* for the maximum.
The ML approach reverses the role of data and parameters, and makes the
latter subject to the former, thus achieving optimal inference from the empirical knowledge available. This method avoids the drawbacks of other fitting
methods, such as the subjective choice of fitting curves and of the region where
the fit is performed. This is particularly important in the case of networks
and other systems exhibiting broad empirical distributions, which may look like
power laws with a certain exponent (which is also subject to statistical error)
in some region, but which may be more closely reproduced by a different value
of the exponent or even by different curves as the fitting region is changed.
By contrast, the ML approach always yields a unique and statistically rigorous
parameter choice.
In the context of network modelling, where we typically have a model generating a graph G with probability P(G|θ), the log-likelihood that a real network
G* is generated by the model with parameter choice θ is

ℒ(θ) ≡ log P(G*|θ).   (10.5)

The ML condition for the optimal choice θ* is found by requiring

∇ℒ(θ*) = [∂ℒ(θ)/∂θ]_{θ=θ*} = 0   (10.6)

and checking the second derivatives to be negative in order to ensure that this
indeed corresponds to a maximum. Among all the possible matching values
{θ_M} for the parameters θ, the one preferred by the ML principle is θ*.
Throughout the rest of this chapter, the empirical value of a network property X(G) measured on a real network G* is denoted with an asterisk, i.e.
X* ≡ X(G*), and the value of the parameter that maximizes the likelihood,
given the data, is also denoted with an asterisk, i.e. θ*. This reminds us
that the parameters are fixed by the data, and simplifies the full notation
θ* = arg max_θ ln P(G*|θ), which illustrates that θ* is ultimately a function of
G*. Finally, the expected value of a quantity X(G), evaluated at the particular
parameter value θ*, will also be denoted by an asterisk, i.e. ⟨X⟩*.

10.1.3

Erdős-Rényi random graph

Homework 10.1 Consider the Erdős-Rényi random graph model with connection probability p, where 0 < p < 1. Write the log-likelihood function
ℒ(p) = log P(G*|p) to generate a real-world network with adjacency matrix G*
(refer to Exercise 5.2 if useful) and show that the ML value p* that maximizes
ℒ(p) is given by

p* = 2L_u* / (n(n−1)).   (10.7)

The above exercise shows that, in the Erdős-Rényi model, the ML value for
p is the one we obtain by requiring ⟨L_u⟩ = L_u*. This coincides with the criterion
we used in Eq. (5.4) in Chapter 5. In general, choosing different reference
quantities would not yield a statistically correct value consistent with the ML
principle. For instance, the ML condition rules out the possibility to construct
a random graph with the same value of the average clustering coefficient as the
real network, which would be obtained by choosing p = C̄* as an alternative
matching value for the parameter p, as in Eq. (10.4). For the Erdős-Rényi model,
the above correct choice is also the simplest and most frequently used one.
However, as we now show, more complicated models may be intrinsically ill-defined, as there may be no possibility to match expected and observed values
of the desired reference properties without violating the ML condition.
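The conclusion of Homework 10.1 can be checked numerically: write ℒ(p) explicitly and maximize it over a grid. The sketch below is our own illustration with a small hypothetical adjacency matrix; the grid maximizer lands on the analytic value p* = 2L_u*/(n(n−1)).

```python
import math

def log_likelihood(A, p):
    """log P(G*|p) for the Erdos-Renyi model: each of the n(n-1)/2 vertex
    pairs is linked independently with probability p."""
    n = len(A)
    L = sum(A[i][j] for i in range(n) for j in range(i + 1, n))
    pairs = n * (n - 1) // 2
    return L * math.log(p) + (pairs - L) * math.log(1.0 - p)

# hypothetical 4-vertex network with L_u* = 4 links, so p* = 2*4/(4*3) = 2/3
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
grid = [k / 1000.0 for k in range(1, 1000)]
p_star = max(grid, key=lambda p: log_likelihood(A, p))
```

The grid search only approximates the maximizer, of course; the analytic derivation in the homework gives the exact value.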

10.1.4

More complicated models

In the rest of the chapter, we will often consider a more general class of models obtained when the links between all pairs of vertices i, j are drawn with different and independent probabilities $p_{ij}(\vec{\theta})$, where $0 < p_{ij}(\vec{\theta}) < 1$. Note that this class includes some examples discussed in Chapter 5. In this case
$$P(G|\vec{\theta}) = \prod_{i<j} p_{ij}(\vec{\theta})^{a_{ij}} \, [1 - p_{ij}(\vec{\theta})]^{1-a_{ij}}, \qquad (10.8)$$

where the $a_{ij}$'s are the entries of the adjacency matrix of graph $G$, and Eq. (10.5) becomes
$$\lambda(\vec{\theta}) = \sum_{i<j} a^*_{ij} \log \frac{p_{ij}(\vec{\theta})}{1 - p_{ij}(\vec{\theta})} + \sum_{i<j} \log[1 - p_{ij}(\vec{\theta})]. \qquad (10.9)$$

A biased example

For instance, let us consider a modified version of the Chung–Lu model we introduced in Eq. (5.5), where now $(2L_u^*)^{-1}$ is replaced by a free parameter z:
$$p_{ij}(z) = z\,k_i^* k_j^*, \qquad 0 < z < (k^*_{\max})^{-2}. \qquad (10.10)$$

In principle, we would like to find that, among the possible values of the parameter z, the optimal one is precisely $z = (2L_u^*)^{-1}$, because we already know from Section 5.2.3 that this choice would ensure that the expected number of links $\langle L_u \rangle$ coincides with the observed value $L_u^*$, just like in the previous example for the Erdős–Rényi model, and also that the expected degree sequence will coincide with the observed one. Let us see whether this can actually be achieved.


CHAPTER 10. PATTERN DETECTION IN NETWORKS

Homework 10.2 For the model defined by Eq. (10.10), write the log-likelihood $\lambda(z) \equiv \log P(G^*|z)$ as in Eq. (10.9) and show that the ML criterion leads to the parameter value $z^*$ defined by the equation
$$L_u^* = \sum_{i<j} (1 - g^*_{ij}) \frac{z^* k_i^* k_j^*}{1 - z^* k_i^* k_j^*}, \qquad (10.11)$$
which is the counterpart of Eq. (10.7) for this model.


The above exercise shows that the ML value of z is set by Eq. (10.11) and
not by z = (2Lu )1 . Given the value z set by Eq. (10.11), the same equation
automatically shows that the observed number of links Lu is in general different
from the expected value generated by the model:
hLu i =

pij (z ) =

i<j

z ki kj .

(10.12)

i<j

This means that if we want the ML condition to be fulfilled, we cannot match the expected number of links with the observed number! Vice versa, if we want the expected number of links to match the empirical one, we have to force z away from the value $z^*$ indicated by the ML principle. Similar considerations apply to the degree sequence $\vec{k}$, whose empirical value $\vec{k}^*$ is no longer replicated by the expected value $\langle \vec{k} \rangle$ since $z^* \neq (2L_u^*)^{-1}$. In other words, Eq. (10.10) is an example of a biased model, where the use of Eq. (10.2) to match expected and observed properties violates the ML condition. This highlights another limitation of the multiplicative form of the Chung–Lu model, which adds to the problematic aspects already discussed in Section 5.2.3.
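The bias of this model can be checked numerically. The sketch below (an illustrative toy graph with two non-adjacent hubs, not one of the book's datasets) solves Eq. (10.11) for $z^*$ by bisection and then evaluates Eq. (10.12), showing that $\langle L_u \rangle^*$ differs from $L_u^*$.

```python
import numpy as np

# Toy network: two non-adjacent hubs (0 and 1), each linked to nodes 2..5.
n = 6
G = np.zeros((n, n), dtype=int)
for j in range(2, 6):
    G[0, j] = G[j, 0] = 1
    G[1, j] = G[j, 1] = 1

k = G.sum(axis=1)                      # observed degrees k_i
iu = np.triu_indices(n, k=1)
L_u = G[iu].sum()                      # observed number of links
prod = np.outer(k, k)[iu]              # k_i k_j for each pair i < j
linked = G[iu].astype(bool)

def ml_residual(z):
    """Right-hand side of Eq. (10.11) minus the observed L_u."""
    terms = z * prod[~linked] / (1.0 - z * prod[~linked])
    return terms.sum() - L_u

# Bisection for z* on (0, kmax^-2); the residual is increasing in z.
lo, hi = 1e-12, 1.0 / k.max() ** 2 - 1e-12
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if ml_residual(mid) < 0:
        lo = mid
    else:
        hi = mid
z_star = 0.5 * (lo + hi)

exp_L = (z_star * prod).sum()          # <L_u>* = sum_{i<j} z* k_i k_j, Eq. (10.12)
# exp_L differs visibly from L_u: the model is biased.
```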
An unbiased example

As another example, consider the following variant of Eq. (5.13) [5, 15, 7]:
$$p_{ij}(z) = \frac{z x_i x_j}{1 + z x_i x_j}, \qquad z > 0, \quad x_i > 0, \qquad (10.13)$$

where the positive values {xi } are assumed to be fixed for the moment, while z
is a free parameter.
Homework 10.3 For the model defined by Eq. (10.13), show that the ML criterion leads to the parameter choice $z^*$ defined by the equation
$$L_u^* = \sum_{i<j} \frac{z^* x_i x_j}{1 + z^* x_i x_j}. \qquad (10.14)$$
As a result, the expected value $\langle L_u \rangle^* = \sum_{i<j} p_{ij}(z^*)$ now coincides with the observed value $L_u^*$.

The above exercise shows that the model defined by Eq. (10.13) is unbiased: the ML condition (10.6) and the requirement $\langle L_u \rangle = L_u^*$ are now equivalent, just like in the Erdős–Rényi model.
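Numerically, fitting this one-parameter model is a one-dimensional root-finding problem, since the right-hand side of Eq. (10.14) is increasing in z. The sketch below solves it by bisection on a toy graph; the choice $x_i = k_i^*$ is purely illustrative (the $\{x_i\}$ are assumed fixed, as in the text).

```python
import numpy as np

# Toy observed network: a 4-node path graph.
n = 4
G = np.zeros((n, n), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3)]:
    G[i, j] = G[j, i] = 1

k = G.sum(axis=1)
iu = np.triu_indices(n, k=1)
L_u = G[iu].sum()
x = k.astype(float)        # illustrative choice of the fixed values x_i
xx = np.outer(x, x)[iu]

def expected_links(z):
    """<L_u>(z) = sum_{i<j} z x_i x_j / (1 + z x_i x_j), cf. Eq. (10.14)."""
    return (z * xx / (1.0 + z * xx)).sum()

# Bisection: expected_links is increasing in z, so the root of
# expected_links(z) - L_u is unique.
lo, hi = 1e-12, 1e6
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if expected_links(mid) < L_u:
        lo = mid
    else:
        hi = mid
z_star = 0.5 * (lo + hi)
# By construction, the ML condition and <L_u> = L_u now coincide.
```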

10.2 Detecting structural patterns in networks

Detecting patterns in real-world networks means identifying nontrivial structural properties, i.e. properties that cannot be explained by simple random
graph models and therefore indicate the presence of nontrivial mechanisms
of network formation. One way of detecting such patterns is by isolating
the higher-order empirical topological properties that are not simply explained
by lower-order ones (for a discussion of the notion of first-order properties,
second-order properties, and so on, please refer to chapter 4). In principle,
there is no a priori preferred level of organization separating higher-order from
lower-order properties. However, an accepted criterion is that of considering
local (i.e. first-order) properties as the fundamental building blocks of the network organization, because these properties (such as node degree, see chapter
4) are likely to be directly affected by basic properties of nodes, including nontopological properties such as (depending on the nature of the network) size,
wealth, importance, popularity, etc. Since such properties are usually very heterogeneously distributed over nodes, it is important to control for their effects
by comparing the real network with a maximum-entropy ensemble having the
same local properties.
For this reason, the maximum-entropy ensembles of graphs introduced in
Chapter 5 play an important role in network analysis, and are systematically
used as a benchmark, or null model, for real-world networks. However, this
requires that we make an important step. While a purely mathematical approach to the models introduced in Chapter 5 allows us to easily generate and
characterize such graph ensembles analytically (for instance by assuming some
probability distribution for the model parameters and studying the resulting
network properties), such an approach does not allow us to study ensembles of
graphs whose realized properties coincide precisely with the observed properties
of a real-world network. In this section we are therefore forced to reverse the
perspective and take a statistical approach, where we use the ML principle to
make rigorous inference starting from the empirical knowledge available.
This will be achieved by first fitting the model ensemble to the data using
the ML principle, and then using the fitted ensemble to provide the expectation
values and standard deviations of various higher-order properties. We will then
compare these model predictions with the empirical data. Empirical properties that are significantly (i.e. by many standard deviations) different from the
expectations are interpreted as nontrivial patterns, not explained by the local
properties defining the maximum-entropy null model. An important feature of
this method is that, although one can in principle measure the averages and
standard deviations of higher-order properties by sampling many graphs from
the maximum-entropy ensemble using the graph probability $P(G|\vec{\theta}^*)$, the average and standard deviation are instead calculated analytically (with no need to
sample the ensemble), resulting in a method that is both unbiased and fast.

10.2.1 Maximum likelihood in the configuration model

The simplest ensemble where the local properties of all nodes are controlled for
is the configuration model (CM), that we have introduced in Chapters 3 and 5.
As we discussed in detail in Chapter 5, it is not easy to ensure that, starting
from an empirical degree sequence, the CM is implemented in a way that does


not lead to a biased construction. In the rest of this chapter, we show how the
results discussed in Chapter 5, together with the maximum likelihood principle
discussed in sec.10.1, allow us to define a method to compare the observed
properties of a real-world network with those of a CM that is fitted precisely
on the empirical node degrees.
To this end, we consider two exercises that help us appreciate once more the
subtleties of the implementation of the CM.
Homework 10.4 Consider the model defined by
$$p_{ij}(\vec{x}) = x_i x_j, \qquad x_i > 0 \;\; \forall i, \qquad (10.15)$$
where $\vec{x} = \{x_i\}$ is an n-dimensional vector of positive free parameters.² Write the log-likelihood $\lambda(\vec{x}) \equiv \ln P(G^*|\vec{x})$ of the model (where $G^*$ is the empirical network) and write the equation fixing the ML values $\vec{x}^* = \{x_i^*\}$ of the parameters as a function of the entries $\{g^*_{ij}\}$ of the adjacency matrix of $G^*$. Show that $\vec{x}^*$ is such that the expected degree $\langle k_i \rangle^* = \sum_{j \neq i} p_{ij}(\vec{x}^*)$ of each node i is in general different from the observed degree $k_i^* = k_i(G^*)$.
Homework 10.5 Repeat the previous exercise for the model defined by
$$p_{ij}(\vec{x}) = \frac{x_i x_j}{1 + x_i x_j}, \qquad x_i > 0 \;\; \forall i. \qquad (10.16)$$
Show that the ML principle leads to the parameter value $\vec{x}^*$ defined as the solution of the following set of n nonlinear coupled equations:
$$k_i^* = \sum_{j \neq i} \frac{x_i^* x_j^*}{1 + x_i^* x_j^*} \qquad \forall i, \qquad (10.17)$$
thus proving that in this model the expected degree sequence $\langle \vec{k} \rangle^*$ coincides precisely with the empirical one $\vec{k}^*$.
Equations (10.15) and (10.16) in the above two exercises are in some sense a generalization, to the case of n free parameters, of Eqs. (10.10) and (10.13) respectively. The presence of n parameters allows us (at least in principle) to tune the expected value of each degree of the network to the observed value. However, the exercises above instruct us that only the model defined by Eq. (10.16) can do so in a manner that is compatible with the ML principle. Importantly, Eq. (10.17) is formally identical to Eq. (5.14), but (as already noted when discussing the latter) there can be opposite interpretations of such formulae. In Chapter 5 we mainly used Eq. (5.14) as a way to infer the expected degrees, given some theoretical probability distribution of the values of $\vec{x}$. Note that this operation is very simple, as it only requires summing or integrating the values of the connection probability over all nodes except i (see Section 5.2). By contrast, in Eq. (10.17) the degrees are fixed by observation, and the parameters $\vec{x}$ have to be found accordingly. This complicates the problem significantly, because now $\vec{x}^*$ is the solution of n nonlinear coupled equations.
² Note that this model coincides with the Chung–Lu implementation of the CM, as already noted in our discussion of Eq. (5.16).


However, we can reduce the number of such equations by noting that, in Eq. (10.17), any two vertices i, j with the same degree $k_i^* = k_j^*$ give rise to the same equations for $x_i^*$ and $x_j^*$, implying $x_i^* = x_j^*$. In general, all vertices with the same degree k have the same value $x_k^*$. So Eqs. (10.17) reduce to as many nonequivalent equations as the number of distinct degrees actually displayed in the network, which is generally much less than n:
$$k = n \sum_{k'} P(k') \frac{x_k x_{k'}}{1 + x_k x_{k'}} - \frac{(x_k)^2}{1 + (x_k)^2}, \qquad (10.18)$$
where P(k) is the empirical degree distribution, so that nP(k) is the number of vertices with degree k, and k, k' take only the empirical values of the degrees. The last term in the above equation removes the self-contribution of a vertex to its own degree.
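In practice, a damped fixed-point iteration on Eq. (10.18), with one unknown per distinct degree, is usually enough to solve these equations. The following sketch (toy graph; the starting point and damping factor are illustrative choices) iterates until the expected degree of each class matches the observed one.

```python
import numpy as np

# Toy network: two non-adjacent hubs (0 and 1), each linked to nodes 2..5.
n = 6
G = np.zeros((n, n), dtype=int)
for j in range(2, 6):
    G[0, j] = G[j, 0] = 1
    G[1, j] = G[j, 1] = 1
k = G.sum(axis=1)

# One unknown x_k per *distinct* degree, as in Eq. (10.18).
k_vals, counts = np.unique(k, return_counts=True)
k_vals = k_vals.astype(float)
x = k_vals / np.sqrt(k.sum())          # rough (illustrative) starting point

for _ in range(5000):
    P = 1.0 + np.outer(x, x)           # 1 + x_k x_k'
    # denom[a] = sum_b (n P(k_b)) x_b / (1 + x_a x_b) - x_a / (1 + x_a^2);
    # the last term removes the self-contribution of a vertex.
    denom = (counts * x / P).sum(axis=1) - x / (1.0 + x * x)
    x = 0.5 * x + 0.5 * k_vals / denom  # damped fixed-point update

# Expected degree of a vertex in each degree class.
pij = np.outer(x, x) / (1.0 + np.outer(x, x))
exp_deg = (counts * pij).sum(axis=1) - np.diag(pij)
```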
Calculating averages and standard deviations
Once the parameters $\vec{x}^*$ are found, we can put them back into $p_{ij}(\vec{x})$ and use the resulting $p^*_{ij} \equiv p_{ij}(\vec{x}^*)$ to calculate the expected value $\langle X \rangle^*$ of any higher-order property X(G) of interest. In general, this can be done analytically only if X(G) is a linear function of the entries $\{g_{ij}\}$ of the adjacency matrix of G. However, if X(G) has some nonlinear dependence on the constraints (i.e. the degrees $\{k_i(G)\}$ in the case of the CM), then we can approximate the value of such constraints as constant and equal to the empirical value (i.e. $k_i(G) \approx k_i^*$). This is because the maximum-entropy ensemble has been constructed precisely in order to keep the value of those constraints as close as possible to the empirical value. While this is rigorously true only in microcanonical ensembles, one expects that the fluctuations of the constraints in canonical ensembles, although nonzero, are still much smaller than the fluctuations of any other (unconstrained) quantity.
Therefore, to calculate the expectation values of the higher-order properties of interest, we will treat the value of the degrees as fixed and equal to the empirical value. In particular, the expectation value of the ANND defined in Eq. (4.21) will be approximated as
$$E^*(k_i^{nn}) \equiv \langle k_i^{nn} \rangle^* \approx \frac{\sum_{j \neq i} \langle g_{ij} \rangle^* \sum_{k \neq j} \langle g_{jk} \rangle^*}{\sum_{j \neq i} \langle g_{ij} \rangle^*}, \qquad (10.19)$$
where $\langle g_{ij} \rangle^* = p^*_{ij}$. Similarly, the expectation value of the clustering coefficient defined in Eq. (4.30) is
$$E^*(C_i) \equiv \langle C_i \rangle^* \approx \frac{\sum_{j \neq i} \sum_{k \neq i,j} \langle g_{ij} \rangle^* \langle g_{jk} \rangle^* \langle g_{ki} \rangle^*}{\sum_{j \neq i} \sum_{k \neq i,j} \langle g_{ij} \rangle^* \langle g_{ki} \rangle^*}. \qquad (10.20)$$
Using the same sort of approximation, it is also possible to estimate the standard deviation
$$\sigma^*[X] \equiv \sqrt{\langle X^2 \rangle^* - (\langle X \rangle^*)^2}$$
of the properties of interest. This provides us with an error bar with which we can distinguish between properties that are statistically consistent with the expectations of the null model and properties that are not, the latter representing higher-order patterns.
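Once the matrix $p^*_{ij}$ is available, Eqs. (10.19) and (10.20) reduce to a few matrix operations. A minimal sketch, illustrated on a homogeneous probability matrix where the expectations can be checked by hand:

```python
import numpy as np

def expected_annd_clustering(p):
    """Approximate <k_i^nn> (Eq. 10.19) and <C_i> (Eq. 10.20) from a
    matrix of fitted connection probabilities p (zero diagonal)."""
    p = np.asarray(p, dtype=float)
    k_exp = p.sum(axis=1)                      # expected degrees <k_i>
    annd = (p @ k_exp) / k_exp                 # Eq. (10.19)
    tri = np.einsum('ij,jk,ki->i', p, p, p)    # expected closed triangles at i
    pairs = k_exp ** 2 - (p ** 2).sum(axis=1)  # expected neighbour pairs at i
    return annd, tri / pairs                   # Eq. (10.20)

# Homogeneous check: p_ij = 0.5 for all i != j on 5 nodes, so every node
# has <k_i> = 2, <k_i^nn> = 2 and <C_i> = 0.5.
p = np.full((5, 5), 0.5)
np.fill_diagonal(p, 0.0)
annd, clust = expected_annd_clustering(p)
```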


[Figure 10.1 appears here: panels a-j plot $\langle k^{nn} \rangle$ and $\langle C \rangle$, together with their empirical counterparts, versus the degree $k$ for the networks listed in the caption below.]

Figure 10.1: Application of the ML method to binary undirected networks. The red points are the empirical data, the black solid curves are averages over the configuration model obtained using the local rewiring algorithm [11, 12], and the blue dashed curves are the analytical expectations (± one standard deviation). The green curves are the flat expectations under the random graph model, and highlight the average level of correlation in the case when there is no dependence on the degree. The panels report $k_i^{nn}$ versus $k_i$ (left) and $C_i$ versus $k_i$ (right) for: a) and b) the network of the largest US airports (n = 500) [33], c) and d) the synaptic network of Caenorhabditis elegans (n = 264) [34], e) and f) the protein-protein interaction network of Helicobacter pylori (n = 732) [35], g) and h) the network of liquidity reserves exchanges between Italian banks in 1999 [36] (n = 215), i) the Internet at the AS level (n = 11,174) [37] and j) the protein-protein interaction network of Saccharomyces cerevisiae (n = 4,142) [35].


Empirical results
In fig.10.1 we show an application of the ML method to the analysis of various
networks, namely the network of the 500 largest US airports [33], a synaptic
network [34], two protein interaction networks [35], an interbank network [36]
and the Internet at the Autonomous Systems level [37]. These are among the
most studied networks of this type.
We compare the correlation structure of the original networks, as ordinarily measured by the dependence of $k_i^{nn}(G^*)$ and $C_i(G^*)$ on $k_i(G^*)$ (see Chapter 4), with the expected values $\langle k_i^{nn} \rangle^*$ and $\langle C_i \rangle^*$ obtained analytically using the ML method. We also highlight the region within one standard deviation around the average by plotting the curves $\langle k_i^{nn} \rangle^* \pm \sigma^*[k_i^{nn}]$ and $\langle C_i \rangle^* \pm \sigma^*[C_i]$.
For the sake of comparison, we also report the average values obtained by sampling the microcanonical ensemble with the local rewiring algorithm [11, 12] (see Chapter 5), and the expected values over the ensemble of random graphs with the same number of links (corresponding to the Erdős–Rényi random graph model). It should be noted that the microcanonical method requires the generation of many randomized networks (each obtained after many rewiring steps), the measurement of $k_i^{nn}$ and $C_i$ on each network separately, plus a final averaging. By contrast, the ML method only requires the preliminary estimation of the $\{x_i^*\}$. Then the calculation of $\langle k_i^{nn} \rangle^*$ and $\langle C_i \rangle^*$ is analytical and takes exactly the same time as that of the empirical values. As can be seen, the two
approaches yield very similar results most of the time. When they differ, the
deviations are presumably due to the fact that, as we mentioned in Chapter 5,
the microcanonical implementation of the CM is biased. For the two largest
networks (the protein interactions in S. cerevisiae and the Internet), we only
report the expectations obtained using the ML method, as the microcanonical
approach would require too much computing time.
The results shown in fig.10.1 allow us to interpret the effect of the degree sequence on higher-order properties. Firstly, the trends displayed by the CM are not flat like those expected in the Erdős–Rényi case. This confirms that residual structural correlations, simply due to the enforced constraint, are still present after the rewiring has taken place. The presence of these correlations does not require any additional explanation besides the existence of the constraints themselves. This is very different from the picture one would get by using the (wrong) expectation of Eq. (10.15), which would yield flat trends as well, naively suggesting that correlations can never be traced back to the degree sequence alone.
Secondly, while the trends observed in all the networks considered are always decreasing, they unveil different correlation patterns when compared to the randomized trends. The real interbank data are almost indistinguishable from the randomized curves, meaning that structural constraints can fully explain the observed behaviour of higher-order network properties. Instead, in the airport network the randomized curves lie below the real data (except for an opposite trend of $\langle k_i^{nn} \rangle^*$ for low degrees). This means that the real network is more correlated than the baseline randomized expectation, and indicates that additional mechanisms producing positive correlations must be present on top of structural effects. By contrast, in the H. pylori protein network the expected curves lie above the real data, suggesting the presence of mechanisms producing negative correlations. The same is true for the correlation structure

of the Internet, confirming previous results [12], while S. cerevisiae's protein network is completely different from its randomized variants.

All these considerations show that seemingly similar trends can actually reveal very different types of structural organization. This means that the mere measurement of the topological properties is uninformative, and makes the comparison between real data and randomized ensembles essential.

10.2.2 Directed graphs

We now consider binary directed networks, which are specified by an (in general) asymmetric adjacency matrix. The local constraints are now represented by the joint sequence of out-degrees and in-degrees $\{k_i^{out}, k_i^{in}\}$. Given a particular real-world network $G^*$ and a measured topological property $X(G^*)$, the ML method allows us to analytically obtain the expectation value $\langle X \rangle^*$ and standard deviation $\sigma^*[X]$ across the ensemble of binary directed graphs with, on average, the same directed degree sequences $\vec{k}^{out}(G^*)$ and $\vec{k}^{in}(G^*)$ as $G^*$. We will denote this model as the directed configuration model (DCM).
Exercise 10.1 The DCM is defined in terms of two n-dimensional vectors $\vec{x}$, $\vec{y}$ of parameters:
$$p_{ij}(\vec{x}, \vec{y}) = \frac{x_i y_j}{1 + x_i y_j}, \qquad x_i > 0, \; y_i > 0 \;\; \forall i. \qquad (10.21)$$
Write the log-likelihood of the model (taking care of the fact that the network is directed) and show that the ML principle requires that these parameters are set to the particular values $\vec{x}^*$, $\vec{y}^*$ that solve the following set of 2n coupled nonlinear equations:
$$\sum_{j \neq i} \frac{x_i^* y_j^*}{1 + x_i^* y_j^*} = k_i^{out}(G^*) \quad \forall i, \qquad (10.22)$$
$$\sum_{j \neq i} \frac{x_j^* y_i^*}{1 + x_j^* y_i^*} = k_i^{in}(G^*) \quad \forall i, \qquad (10.23)$$
thus proving that the expected in- and out-degree sequences coincide precisely with the empirical ones.
As in the undirected case, the quantities $\vec{x}^*$, $\vec{y}^*$ allow us to obtain $\langle X \rangle^*$ and $\sigma^*[X]$ analytically and quickly, without sampling the ensemble explicitly.
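Eqs. (10.22)-(10.23) can be solved by the same kind of damped fixed-point iteration used in the undirected case, now updating $\vec{x}$ and $\vec{y}$ together. A sketch on a toy digraph (the starting point, damping factor and iteration count are illustrative choices):

```python
import numpy as np

# Toy digraph: hubs 0,1 linked in both directions with nodes 2..5,
# plus the non-reciprocated links 2 -> 3 and 4 -> 5.
n = 6
A = np.zeros((n, n), dtype=int)
for j in range(2, 6):
    A[0, j] = A[j, 0] = 1
    A[1, j] = A[j, 1] = 1
A[2, 3] = 1
A[4, 5] = 1
k_out = A.sum(axis=1).astype(float)
k_in = A.sum(axis=0).astype(float)

x = k_out / np.sqrt(k_out.sum())       # illustrative starting point
y = k_in / np.sqrt(k_in.sum())
mask = 1.0 - np.eye(n)                 # excludes j = i from all sums

for _ in range(20000):
    P = 1.0 + np.outer(x, y)           # P[i, j] = 1 + x_i y_j
    x_new = k_out / (mask * y / P).sum(axis=1)    # from Eq. (10.22)
    y_new = k_in / (mask * x / P.T).sum(axis=1)   # from Eq. (10.23)
    x = 0.5 * x + 0.5 * x_new          # damped updates
    y = 0.5 * y + 0.5 * y_new

p = np.outer(x, y) / (1.0 + np.outer(x, y))
np.fill_diagonal(p, 0.0)
exp_out, exp_in = p.sum(axis=1), p.sum(axis=0)
```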

Empirical results
We can now apply the method to various directed networks, by studying the second-order topological properties measured by the outward ANND and the inward ANND defined in Eq. (4.24):
$$E^*(k_i^{nn,out}) \equiv \langle k_i^{nn,out} \rangle^* \approx \frac{\sum_{j \neq i} \langle a_{ij} \rangle^* \sum_{k \neq j} \langle a_{jk} \rangle^*}{\sum_{j \neq i} \langle a_{ij} \rangle^*}, \qquad (10.24)$$
$$E^*(k_i^{nn,in}) \equiv \langle k_i^{nn,in} \rangle^* \approx \frac{\sum_{j \neq i} \langle a_{ji} \rangle^* \sum_{k \neq j} \langle a_{kj} \rangle^*}{\sum_{j \neq i} \langle a_{ji} \rangle^*}. \qquad (10.25)$$


[Figure 10.2 appears here: panels plot $k^{nn,in}$ versus $k^{in}$ and $k^{nn,out}$ versus $k^{out}$ for the networks listed in the caption below.]

Figure 10.2: Application of the ML method to directed networks, using the directed configuration model. Red points are the empirical data, the black solid curves are expectations under the directed configuration model using the local rewiring algorithm, and the blue dashed curves are the exact expectations obtained using the ML method (± one standard deviation). The green curves are the flat expectations under the directed version of the Erdős–Rényi model. The panels report $k_i^{nn,in}$ versus $k_i^{in}$ (left) and $k_i^{nn,out}$ versus $k_i^{out}$ (right) for: a) and b) the directed neural network of Caenorhabditis elegans (n = 264) [34], c) and d) the metabolic network of Escherichia coli (n = 1078) [38], e) and f) the Little Rock Lake food web (n = 183) [39].


In fig.10.2 we plot the observed values $k_i^{nn,in}(G^*)$ versus $k_i^{in}(G^*)$ and $k_i^{nn,out}(G^*)$ versus $k_i^{out}(G^*)$, as well as the expectations $\langle k_i^{nn,in} \rangle^* \pm \sigma^*[k_i^{nn,in}]$ and $\langle k_i^{nn,out} \rangle^* \pm \sigma^*[k_i^{nn,out}]$ obtained using the ML method, for three real directed networks: the neural network of C. elegans [34] (now in its directed version), the metabolic network of E. coli [38], and the Little Rock Lake food web [39]. As before, we also show the microcanonical average obtained using the LRA and the expectation under the directed Erdős–Rényi random graph model (DRG) with the same number of links. Again, we find a very good agreement between the two approaches, but the ML method yields the correct prediction in far shorter time. We also confirm that while some networks (C. elegans and E. coli) are almost consistent with the null model, others (Little Rock Lake) deviate significantly.

However, the most interesting point for the present analysis is that, while for the undirected networks considered above all randomized trends were decreasing, in this case we find that the three randomized trends behave in totally different ways. In the neural network, both $\langle k_i^{nn,in} \rangle^*$ and $\langle k_i^{nn,out} \rangle^*$ are approximately constant. This means that the baseline behavior for both quantities is flat and uncorrelated (as in the directed random graph, but at a different level). By contrast, in the metabolic network the expected curves are decreasing, and thus the ensemble of randomized networks is disassortative, as for the undirected graphs considered above. Finally, in the food web the constraints enforce unusual positive correlations, and the randomized ensemble is even assortative. Interestingly, while it is expected that random networks with specified degrees display a disassortative behavior [12, 15], the assortative trend is totally surprising. This is because the ML method extracts the hidden variables directly from the specific real-world network, rather than drawing them from ad hoc distributions. The resulting values can be distributed in a very complicated fashion, invalidating the results obtained under other hypotheses.
To further highlight this important point, we can select three more food webs characterized by a particularly small size (see fig.10.3). Small networks cannot be described by approximating the probability mass function of their topological properties (such as the degree) with a continuous probability density, as in the Park–Newman approach described in Chapter 5. Therefore in this case the difference between the expectations obtained by drawing the $\vec{x}$ and $\vec{y}$ values from analytically tractable continuous distributions and those obtained by solving Eqs. (10.22)-(10.23) using the empirical degrees is particularly evident. As we show in fig.10.3 (where for simplicity we omit the comparison with the LRA), we confirm that the (directed) CM can display not only flat or decreasing trends, but also increasing ones. Importantly, in this case all three webs do not deviate dramatically from the null model. This means that while one would be tempted to interpret the three observed trends as signatures of different patterns (zero, negative and positive correlation), actually in all three cases the observed behavior can be roughly replicated by the same mechanism and almost entirely traced back to the degree sequence alone. This unexpected result highlights once again that the measured values of any topological property are per se entirely uninformative, and can only be interpreted in relation to a null model.
Reciprocity and motifs
So far, in our analysis of directed networks we have considered second-order
topological properties. In principle, third-order properties can be studied by in-


[Figure 10.3 appears here: panels a-h plot $k^{nn,in}$ versus $k^{in}$ and $k^{nn,out}$ versus $k^{out}$ for the food webs listed in the caption below.]
Figure 10.3: Application of the ML method to small-sized, directed food webs. Red points are the empirical data and the blue dashed curves are the exact expectations (± one standard deviation) under the directed configuration model obtained using the ML method. The green curves are the flat expectations under the directed Erdős–Rényi model. The panels report $k_i^{nn,in}$ versus $k_i^{in}$ (left) and $k_i^{nn,out}$ versus $k_i^{out}$ (right) for: a) and b) the Narragansett Bay web (n = 35) [40], c) and d) the Mondego Estuary web (n = 46) [40], e) and f) the St. Marks River web (n = 54) [40]. For the latter, in g) and h) we also compare the empirical data with the expectations under the reciprocal configuration model, where also the number of reciprocated links of each vertex is specified.


Figure 10.4: The 13 triadic motifs, defined as the possible non-isomorphic connected subgraphs of 3 vertices in a directed graph.

troducing directed generalizations of the clustering coefficient [41, 42]. However, there is a proliferation of possible third-order patterns, due to the directionality of links. For this reason, a more complete analysis consists in counting (across the entire network) the occurrence of all the 13 possible directed motifs [10] involving three vertices (see fig.10.4), and comparing the empirical abundances with the expected ones under the null model. As we show below, the ML method is very effective in such a case.
Before presenting the results, we note that directionality also implies that, besides the DCM considered above, a more refined way to randomize directed networks includes the possibility to enforce additional constraints on the reciprocity structure [10, 9]. In other words, it is possible (and important in many applications [10, 17]) to preserve not only the total numbers $k_i^{in}$ and $k_i^{out}$ of incoming and outgoing links of each vertex, but also the number $k_i^{\leftrightarrow} \equiv \sum_j a_{ij} a_{ji}$ of reciprocated links (pairs of links in both directions) [43, 44]. This specification is equivalent to enforcing, for each vertex i, the three quantities [43, 9]
$$k_i^{\rightarrow} \equiv \sum_{j \neq i} a_{ij}^{\rightarrow} \;\; \text{(number of non-reciprocated outgoing links)},$$
$$k_i^{\leftarrow} \equiv \sum_{j \neq i} a_{ij}^{\leftarrow} \;\; \text{(number of non-reciprocated incoming links)},$$
$$k_i^{\leftrightarrow} \equiv \sum_{j \neq i} a_{ij}^{\leftrightarrow} \;\; \text{(number of reciprocated links)},$$
where $a_{ij}^{\rightarrow} \equiv a_{ij}(1 - a_{ji})$, $a_{ij}^{\leftarrow} \equiv a_{ji}(1 - a_{ij})$ and $a_{ij}^{\leftrightarrow} \equiv a_{ij} a_{ji}$.
Given a real directed network $G^*$, we denote the null model with specified joint reciprocal degree sequences $\{k_i^{\rightarrow}(G^*), k_i^{\leftarrow}(G^*), k_i^{\leftrightarrow}(G^*)\}$ as the reciprocal configuration model (RCM). This is an example of a model with nonlocal (second-order) constraints which can still be treated analytically using the ML method. One can show that in this case one needs to solve the following 3n coupled

equations:
$$\sum_{j \neq i} \frac{x_i y_j}{1 + x_i y_j + x_j y_i + z_i z_j} = k_i^{\rightarrow}(G^*), \qquad (10.26)$$
$$\sum_{j \neq i} \frac{x_j y_i}{1 + x_i y_j + x_j y_i + z_i z_j} = k_i^{\leftarrow}(G^*), \qquad (10.27)$$
$$\sum_{j \neq i} \frac{z_i z_j}{1 + x_i y_j + x_j y_i + z_i z_j} = k_i^{\leftrightarrow}(G^*). \qquad (10.28)$$

The expectation value of any topological property, as well as its standard deviation, can now be calculated analytically in terms of the three n-dimensional vectors $\vec{x}^*$, $\vec{y}^*$, $\vec{z}^*$. For instance, in fig.10.3g-h we repeat the analysis of the directed ANND of the St. Marks River food web, now comparing the observed trend against the RCM. In this case, we find no significant difference with respect to the DCM considered above (fig.10.3e-f). However, as we now show, the analysis of motifs reveals a dramatic difference between the predictions of the two null models.
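The same damped fixed-point strategy extends to the 3n equations (10.26)-(10.28): iterate $x_i \leftarrow k_i^{\rightarrow} / \sum_{j \neq i} y_j / D_{ij}$, and similarly for $y_i$ and $z_i$, where $D_{ij} = 1 + x_i y_j + x_j y_i + z_i z_j$ (note that $D_{ij} = D_{ji}$). The toy digraph below is illustrative; parameters of vertices with a zero constraint are kept at zero.

```python
import numpy as np

# Toy digraph: reciprocal hub links (0,1 to 2..5) plus 2 -> 3 and 4 -> 5.
n = 6
A = np.zeros((n, n), dtype=int)
for j in range(2, 6):
    A[0, j] = A[j, 0] = 1
    A[1, j] = A[j, 1] = 1
A[2, 3] = 1
A[4, 5] = 1

rec = A * A.T                                        # reciprocated links
k_rec = rec.sum(axis=1).astype(float)                # k_i^<->
k_onr = (A * (1 - A.T)).sum(axis=1).astype(float)    # k_i^->
k_inr = (A.T * (1 - A)).sum(axis=1).astype(float)    # k_i^<-

# Parameters of vertices with a zero constraint stay at zero.
x = np.where(k_onr > 0, 1.0, 0.0)
y = np.where(k_inr > 0, 1.0, 0.0)
z = np.where(k_rec > 0, 1.0, 0.0)
mask = 1.0 - np.eye(n)

for _ in range(20000):
    D = 1.0 + np.outer(x, y) + np.outer(y, x) + np.outer(z, z)  # D_ij = D_ji
    sx = (mask * y / D).sum(axis=1)    # sum_j y_j / D_ij
    sy = (mask * x / D).sum(axis=1)    # sum_j x_j / D_ij
    sz = (mask * z / D).sum(axis=1)    # sum_j z_j / D_ij
    x = 0.5 * x + 0.5 * k_onr / np.maximum(sx, 1e-12)
    y = 0.5 * y + 0.5 * k_inr / np.maximum(sy, 1e-12)
    z = 0.5 * z + 0.5 * k_rec / np.maximum(sz, 1e-12)

D = 1.0 + np.outer(x, y) + np.outer(y, x) + np.outer(z, z)
exp_onr = (mask * np.outer(x, y) / D).sum(axis=1)    # expected k_i^->
exp_inr = (mask * np.outer(x, y) / D).sum(axis=0)    # expected k_i^<-
exp_rec = (mask * np.outer(z, z) / D).sum(axis=1)    # expected k_i^<->
```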
If $N_m$ denotes the number of occurrences of a particular motif m, the ML method allows us to calculate the expected number $\langle N_m \rangle^*$ and standard deviation $\sigma^*[N_m]$ exactly, and thus to obtain the z-score
$$z[N_m] \equiv \frac{N_m(G^*) - \langle N_m \rangle^*}{\sigma^*[N_m]} \qquad (10.29)$$
analytically. This can be done for both the DCM and the RCM. The value of $z[N_m]$ indicates by how many standard deviations the observed and expected numbers of occurrences of motif m differ. Large values of $|z[N_m]|$ indicate motifs that are either over- or under-represented under the particular null model considered, and that are therefore not explained by the lower-order constraints enforced.
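Although the text computes $\langle N_m \rangle^*$ and $\sigma^*[N_m]$ analytically, a sampling check of Eq. (10.29) is easy to write and useful for validation. The sketch below counts directed 3-loops (motif m = 9) in graphs drawn with independent link probabilities; the matrix p is a stand-in for the fitted $p^*_{ij}$, not an actual fit to any of the food webs.

```python
import numpy as np

def count_3loops(A):
    """Number of directed 3-loops (motif m = 9): i -> j -> k -> i."""
    return int(np.trace(A @ A @ A)) // 3

rng = np.random.default_rng(0)
n = 20
p = np.full((n, n), 0.15)              # stand-in for the fitted p*_ij
np.fill_diagonal(p, 0.0)

# "Observed" network: here a single draw from the same ensemble,
# so its z-score should come out small.
A_obs = (rng.random((n, n)) < p).astype(int)
N_obs = count_3loops(A_obs)

# Sampling estimates of <N_m> and sigma[N_m], as a numerical check of
# the analytical values entering Eq. (10.29).
counts = [count_3loops((rng.random((n, n)) < p).astype(int))
          for _ in range(2000)]
mean, sigma = np.mean(counts), np.std(counts)
z_score = (N_obs - mean) / sigma       # Eq. (10.29)
```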
In fig.10.5 we show the z-scores for all the possible 13 non-isomorphic connected motifs with three vertices in 8 real food webs, for both null models. We also show the two lines $z = \pm 2$ to highlight the region within 2 standard deviations from the models' expectations. The food webs considered here are from different ecosystems (lagoons, marshes, lakes, bays, estuaries, grasses), with a prevalence of aquatic habitats. The presence of (intrinsically directed) predator-prey relationships implies that reciprocity is a very important quantity in food webs [17]. Thus the RCM should fluctuate less than the DCM. Indeed, this is confirmed by our analysis. The z-scores for the motifs m = 2, 3, 13 are significantly reduced from the DCM to the RCM. Also, while the motifs m = 1, 6, 10, 11 display large values of z with opposite signs across different webs under the DCM, the signs of all statistically surprising motifs (i.e. those with $|z| \gtrsim 2$) become consistent with each other under the RCM (except for m = 13). As a consequence, under the RCM all networks display a very similar pattern, and the most striking features of real webs become the over-representation of motifs m = 2, 10 (plus m = 6, 11, 13 for the Little Rock Lake web) and the under-representation of motifs m = 5, 9, 13 (plus m = 3, 7, 8 for Little Rock Lake). In particular, the under-representation of motif m = 9 (the 3-loop) is the most common pattern across all webs, and becomes stronger as the reciprocity of the web increases. Also note that in a network with no reciprocated


[Figure 10.5 appears here: z-scores $z[N_m]$ for motifs m = 1, ..., 13 in the webs Chesapeake Bay, Little Rock Lake, Maspalomas Lagoon, Florida Bay, St Marks Seagrass, Everglades Marshes, Grassland and Ythan Estuary, with the lines z = ±2 marked.]

Figure 10.5: Application of the ML method to the analysis of directed motifs involving three vertices in 8 real food webs. Top panel: z-scores obtained enforcing only the in-degree and out-degree sequences (directed configuration model). Bottom panel: z-scores obtained enforcing also the reciprocal degree sequence (reciprocal configuration model).


links, the number of motifs with at least one pair of reciprocated links is zero.
Under the RCM, the expected number of these motifs remains zero. By contrast,
their expected number under the DCM is always positive. Thus we confirm
that the upgrade to the RCM is necessary, as its stricter constraints allow one to
analyse 3-vertex motifs once 2-vertex motifs (i.e. all possible dyadic patterns)
are correctly accounted for. The possibility of treating the RCM analytically using
the ML method is an important ingredient of this analysis.
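When closed-form expressions for ⟨N_m⟩ and σ[N_m] are not available, the z-scores used above can be estimated by sampling motif counts from the null model. A minimal sketch of this estimate (the Gaussian null counts below are a hypothetical stand-in for counts measured on graphs sampled from the DCM or RCM):

```python
import math
import random

def motif_zscore(observed, null_counts):
    """z = (N_obs - <N>) / sigma, with <N> and sigma estimated
    from an ensemble of null-model motif counts."""
    mean = sum(null_counts) / len(null_counts)
    var = sum((c - mean) ** 2 for c in null_counts) / (len(null_counts) - 1)
    return (observed - mean) / math.sqrt(var)

random.seed(0)
# Stand-in for motif counts sampled from a null-model ensemble:
null = [random.gauss(100.0, 10.0) for _ in range(1000)]
z = motif_zscore(150.0, null)
print(z)
```

With |z| ≳ 2 as the significance criterion used in the text, this motif would count as strongly over-represented.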

10.2.3 General case

We conclude with an exercise aimed at generalizing the results that we have
enumerated so far for specific ensembles.
Let {π_α} be our desired set of reference properties (constraints), and let us
define in terms of them the maximum-entropy model (see Chapter 5)

P(G|θ⃗) = e^{−H(G|θ⃗)} / Z(θ⃗)   (10.30)

where H(G|θ⃗) ≡ Σ_α θ_α π_α(G) is the graph Hamiltonian and Z(θ⃗) ≡ Σ_G exp[−H(G|θ⃗)]
is the partition function [7]. The log-likelihood function is λ(θ⃗) ≡ log P(G*|θ⃗) =
−H(G*|θ⃗) − log Z(θ⃗).

Homework 10.6 Show that the ML principle implies that the optimal choice
θ⃗* for the parameter vector θ⃗ in the model defined by Eq. (10.30) is given by the
solution to the following set of coupled equations:

π_α(G*) = Σ_G π_α(G) e^{−H(G|θ⃗*)} / Z(θ⃗*) = ⟨π_α⟩_{θ⃗*}   (10.31)
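For reference, the computation behind Homework 10.6 is a one-line differentiation of the log-likelihood (a sketch in LaTeX notation, with θ⃗* denoting the maximizer):

```latex
\frac{\partial \lambda(\vec\theta)}{\partial \theta_\alpha}
  = -\pi_\alpha(G^*) - \frac{\partial \log Z(\vec\theta)}{\partial \theta_\alpha}
  = -\pi_\alpha(G^*) + \sum_G \pi_\alpha(G)\,\frac{e^{-H(G|\vec\theta)}}{Z(\vec\theta)}
  = -\pi_\alpha(G^*) + \langle \pi_\alpha \rangle_{\vec\theta} .
% Setting the gradient to zero at \vec\theta = \vec\theta^* yields
% \pi_\alpha(G^*) = \langle \pi_\alpha \rangle_{\vec\theta^*}, i.e. Eq. (10.31).
```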


The above exercise shows that, in this class of models, the ML condition
is equivalent to Eq. (10.2), i.e. θ⃗* = θ⃗_M. This means that the whole class of
maximum-entropy ensembles is unbiased. This gives us the following recipe: if
we wish to define a model whose predictions will then be matched to a set of
properties {π_α(G*)} observed in a real-world network G*, we should decide
from the beginning what these reference properties are, include them in H(G|θ⃗),
and define P(G|θ⃗) as in Eq. (10.30). In this way we are sure to obtain an unbiased
model. The random graph is a trivial special case where π(G) = L^u(G) and
H(G|θ) = θ L^u(G) [7], and this is the reason why it is unbiased, if L^u is chosen as
reference. Similarly, the hidden-variable model defined by Eq. (10.13) is another
special case where π⃗(G) = k⃗(G) and H(G|θ⃗) = Σ_i θ_i k_i(G), with x_i ≡ e^{−θ_i} [7],
and so it is unbiased too. By contrast, Eq. (5.5) cannot be traced back to
Eq. (10.30), and the model is biased.
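As a concrete illustration of this recipe, the coupled equations (10.31) can be solved numerically. The sketch below does this for the undirected configuration model, where ⟨k_i⟩ = Σ_{j≠i} x_i x_j/(1 + x_i x_j); the damped fixed-point scheme and the target degree sequence are illustrative choices, not the only possible ones:

```python
def solve_configuration_model(k, n_iter=5000):
    """Damped fixed-point iteration for the hidden variables x_i such that
    the expected degrees <k_i> = sum_{j != i} x_i x_j / (1 + x_i x_j)
    match the target degrees k_i (the ML condition of Eq. (10.31))."""
    n = len(k)
    x = [max(float(ki), 0.1) for ki in k]  # initial guess
    for _ in range(n_iter):
        new = [k[i] / sum(x[j] / (1.0 + x[i] * x[j]) for j in range(n) if j != i)
               for i in range(n)]
        x = [0.5 * (xi + ni) for xi, ni in zip(x, new)]  # damping for stability
    return x

k_target = [1, 2, 2, 3, 3]  # arbitrary example degree sequence
x = solve_configuration_model(k_target)
expected = [sum(x[i] * x[j] / (1.0 + x[i] * x[j]) for j in range(len(x)) if j != i)
            for i in range(len(x))]
```

At convergence, `expected` reproduces `k_target`, i.e. the constraints are matched exactly on average, which is precisely the unbiasedness property discussed above.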

Bibliography
[1] G. Caldarelli, A. Capocci, P. De Los Rios and M.A. Muñoz, Phys. Rev.
Lett. 89, 258702 (2002).
[2] B. Söderberg, Phys. Rev. E 66, 066121 (2002).
[3] M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 68, 036112 (2003).
[4] F. Chung and L. Lu, Ann. of Combin. 6, 125 (2002).
[5] D. Garlaschelli and M.I. Loffredo, Phys. Rev. Lett. 93, 188701 (2004).
[6] D. Garlaschelli, S. Battiston, M. Castri, V.D.P. Servedio and G. Caldarelli,
Physica A 350, 491 (2005).
[7] J. Park and M.E.J. Newman, Phys. Rev. E 70, 066117 (2004) and references therein.
[8] P.W. Holland and S. Leinhardt, J. Amer. Stat. Assoc. 76, 33 (1981).
[9] D. Garlaschelli and M. I. Loffredo, Phys. Rev. E 73, 015101(R) (2006).
[10] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, U. Alon
Science 298, 824-827 (2002).
[11] S. Maslov and K. Sneppen, Science 296, 910 (2002).
[12] S. Maslov, K. Sneppen and A. Zaliznyak, Physica A 333, 529-540 (2004).
[13] M.E.J. Newman, S.H. Strogatz and D.J. Watts, Phys. Rev. E 64, 026118
(2001).
[14] Chung, F. & Lu, L. Ann. of Combin. 6, 125 (2002).
[15] Park, J. & Newman, M.E.J. Phys. Rev. E 68, 026112 (2003).
[16] M. Catanzaro, M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 71, 027103
(2005).
[17] D.B. Stouffer, J. Camacho, W. Jiang & L.A.N. Amaral, Proc. R. Soc. B
274, 1931-1940 (2007).
[18] R. Guimerà, M. Sales-Pardo and L.A.N. Amaral, Nat. Phys. 3, 63 (2007).
[19] Park, J. & Newman, M.E.J. Phys. Rev. E 70, 066117 (2004).
[20] M.A. Serrano and M. Boguñá, AIP Conf. Proc. 776, 101 (2005).
[21] M.A. Serrano, M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 74,
055101(R) (2006).
[22] M.-A. Serrano, Phys. Rev. E 78, 026101 (2008).
[23] A. Barrat, M. Barthélemy, R. Pastor-Satorras and A. Vespignani, PNAS
101, 3747-3752 (2004).
[24] T. Opsahl, V. Colizza, P. Panzarasa and J.J. Ramasco, Phys. Rev. Lett. 101,
168702 (2008).
[25] K. Bhattacharya, G. Mukherjee, J. Saramäki, K. Kaski and S.S. Manna,
J. Stat. Mech. P02002 (2008).
[26] Bianconi, G. Phys. Rev. E 79, 036114 (2009).
[27] D. Garlaschelli & M.I. Loffredo Phys. Rev. Lett. 102, 038701 (2009).
[28] D. Garlaschelli, New J. of Phys. 11, 073005 (2009).
[29] S. Melnik, A. Hackett, M. A. Porter, P. J. Mucha, J. P. Gleeson.
http://arxiv.org/abs/1001.1439.
[30] M.E.J. Newman, Phys. Rev. Lett. 103, 058701 (2009).
[31] M. Boguñá, R. Pastor-Satorras and A. Vespignani, Eur. Phys. J. B 38, 205-209
(2004).
[32] D. Garlaschelli & M.I. Loffredo, Phys. Rev. E 78, 015101(R) (2008).
[33] V. Colizza, R. Pastor-Satorras & A. Vespignani, Nat. Phys. 3, 276 - 282
(2007).
[34] K. Oshio, Y. Iwasaki, S. Morita, Y. Osana, S. Gomi, E. Akiyama, K.
Omata, K. Oka and K. Kawamura, Tech. Rep. of CCeP, Keio Future 3,
(Keio University, 2003).
[35] http://dip.doe-mbi.ucla.edu/dip/Main.cgi
[36] G. De Masi, G. Iori & G. Caldarelli, Phys. Rev. E 74, 066112 (2006).
[37] V. Colizza, A. Flammini, M.A. Serrano & A. Vespignani, Nat. Phys. 2,
110-115 (2006).
[38] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai and A.-L. Barabási, Nature
407, 651 (2000).
[39] N.D. Martinez, Ecological Monographs 61, 367-392 (1991).
[40] http://vlado.fmf.uni-lj.si/pub/networks/data/bio/
foodweb/foodweb.htm
[41] G. Fagiolo, Phys. Rev. E 76, 026107 (2007).
[42] S. E. Ahnert, T. M. A. Fink, Phys. Rev. E 78, 036112 (2008).

[43] D. Garlaschelli and M. I. Loffredo, Phys. Rev. Lett. 93, 268701 (2004).
[44] V. Zlatić and H. Štefančić, Phys. Rev. E 80, 016117 (2009).
[45] M.E.J. Newman, Phys. Rev. E 70, 056131 (2004).
[46] S. E. Ahnert, D. Garlaschelli, T. M. Fink & G. Caldarelli, Phys. Rev. E
73, 015101(R) (2006).
[47] J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski and J. Kertész, Phys. Rev.
E 75, 027105 (2007).
[48] S. Fortunato, Physics Reports 486(3-5), 75-174 (2010).
[49] P. Holland, S. Leinhardt, in Sociological Methodology, D. Heise, Ed. (Jossey-Bass, San Francisco, 1975), pp. 1-45.

Chapter 11

Self-Organized Networks

11.1 Introduction

So far, we have approached networks from different points of view: i) the
definition and computation of the static topological properties of real-world networks;
ii) the mathematical modelling of (either static or growing) network formation;
iii) the study of the effects that the topology has on various dynamical processes
(such as the spread of epidemics) taking place on static networks; iv) networks
as computing entities. These points of view reflect the main approaches in the
literature; refs. [1, 2, 3, 4, 5] present reviews of these results. More recently, a few
attempts to provide a unified approach to networks and their dynamics have
been proposed, exploiting the idea that all these aspects should in the end be
related to each other. In particular, it has been argued that the complexity
of real-world networks is in the most general case the result of the interplay
between topology and dynamics, because networks can process the information
about their environment and respond to it adaptively. So, while most studies
have focused either on the effects that topological properties have on dynamical
processes, or on the reverse effects that vertex-specific dynamical variables have
on network structure, it has been suggested that one should consider the mutual
influence that these processes have on each other. This amounts to relaxing
the (often implicit) hypothesis that dynamical processes and network growth
take place at well separated timescales, and that one is therefore allowed to
consider the evolution of the fast variables while the slower ones are quenched.
Remarkably, one finds that the feedback between topology and dynamics can
drive the system to a steady state that differs from the one obtained when the
two processes are considered separately [6]. These results imply that adaptive
networks generated by this interplay may represent an entirely novel class of
self-organized complex systems, whose properties cannot be straightforwardly
understood in terms of what we have learnt so far.
In this chapter we shall present a self-organized model [6] where an otherwise
static model of network formation driven by a so-called vertex fitness
[7] is explicitly coupled to a so-called extremal dynamics process [8] providing
a dynamical rule for the evolution of the fitness itself. In order to highlight
the novel phenomena that originate from the interplay between the two mechanisms,
we first review the main properties of such mechanisms when they are
considered separately. In section 11.2 we recall some aspects of scale invariance
and Self-Organized Criticality (SOC), and in particular the biologically-inspired
Bak–Sneppen model [8] where the extremal dynamics for the fitness was originally
defined on static graphs. In section 11.3 we briefly review the so-called
fitness model of network formation [7], where the idea that network properties
may depend on some fitness parameter associated with each vertex was proposed.
Finally, in section 11.4 we present the self-organized model obtained by coupling
these mechanisms.

Figure 11.1: First steps in the iteration procedure defining the Sierpiński triangle.

11.2 Scale invariance and self-organization

Self-similarity, or fractality, is the property of an object whose subparts have
the same shape as the whole. At first, self-similarity appeared as a peculiar
property of a limited class of objects. Only later, due to the activity of Benoît
Mandelbrot [9, 10], did it turn out that examples of fractal structures (even if
approximate due to natural cutoffs) are actually ubiquitous in nature. Indeed,
in an incredible number of situations the objects of interest can be represented
by self-similar structures over a large, even if finite, range of scales. Examples
include commodity price fluctuations [9], the shape of coastlines [10], the
discharge of electric fields [11], the branching of rivers [12], deposition processes
[13], the growth of cities [14], fractures [15], and a variety of biological structures
[16].

11.2.1 Geometric fractals

Due to this ubiquity, scientists have tried to understand the possible origins
of fractal behaviour. The first preliminary studies focused on mathematical
functions built by recursion (Koch's snowflake, the Sierpiński triangle and
carpet, etc.). Based on these examples, where self-similar geometric objects
are constructed iteratively, mathematicians introduced quantities in order to
distinguish rigorously between fractals and ordinary compact objects.

For instance, one of the simplest fractals defined by recursion is the
Sierpiński triangle, named after the Polish mathematician Wacław Sierpiński who
introduced it in 1915 [17]. When the procedure shown in Fig. 11.1 is iterated
an infinite number of times, one obtains an object whose empty regions extend
at any scale (up to the maximum area delimited by the whole triangle). It is
therefore difficult to measure its area in the usual way, i.e. by comparison with
another area chosen as the unit of measure. A way to solve this problem is to


consider a limit process not only for the generation of the fractal, but also for
the measurement of its area. Note that at the first iteration we only need three
triangles of side length 1/2 to cover the object (while for the whole triangle
we would need four of them). At the second iteration we need nine covering
triangles of side 1/4 (while for the whole triangle we would need sixteen of
them).
The (scale-dependent) number of objects required to cover a fractal is at
the basis of the definition of the fractal dimension D. Formally, if N(ε) is
the number of D_E-dimensional volumes of linear size ε required to cover an
object embedded in a metric space of Euclidean dimension D_E, then the fractal
dimension is defined as

D = lim_{ε→0} ln N(ε) / ln(1/ε),   (11.1)

which approaches an asymptotic value giving a measure of the region occupied
by the fractal. For a compact (non-fractal) object, the fractal dimension D has
the same value as the Euclidean dimension D_E.
Homework 11.1 Prove that, for a compact 2-dimensional triangle, D = D_E = 2.

Homework 11.2 Prove that, for the Sierpiński triangle, D = ln 3 / ln 2 ≈ 1.58496...
Note that now D < D_E = 2.
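A sketch of the counting argument behind both exercises, in LaTeX notation: at iteration n, the Sierpiński triangle is covered by N(ε) = 3^n triangles of side ε = 2^{−n}, while a compact triangle requires 4^n of them.

```latex
% Compact triangle: N(\epsilon) = 4^n with \epsilon = 2^{-n}
D = \lim_{n\to\infty} \frac{\ln 4^n}{\ln 2^n} = \frac{n \ln 4}{n \ln 2} = 2 = D_E .
% Sierpinski triangle: N(\epsilon) = 3^n with \epsilon = 2^{-n}
D = \lim_{n\to\infty} \frac{\ln 3^n}{\ln 2^n} = \frac{\ln 3}{\ln 2} \simeq 1.58496 .
```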

Therefore the fractal dimension measures the difference between the compactness
of a fractal and that of a regular object embedded in a space of equal
dimensionality. In the present example, D is lower than 2 because the Sierpiński
triangle is less dense than a compact bidimensional triangle. D is also larger
than 1 because it is denser than a one-dimensional object (a line). Note that the
above formula can be rewritten in the familiar form of a power law by writing,
for small ε,

N(ε) ∼ ε^{−D}.   (11.2)

This highlights the correspondence between the geometry of a fractal and scale-invariant laws.
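The limit in Eq. (11.1) can also be checked numerically. The sketch below counts covering boxes for the Sierpiński triangle, using the standard fact that cell (i, j) of a 2^m × 2^m grid is occupied exactly when the bitwise AND of i and j is zero (Pascal's triangle mod 2); the estimate converges to ln 3 / ln 2:

```python
import math

def sierpinski_box_count(m):
    """Number N(eps) of boxes of side eps = 2^-m needed to cover the
    Sierpinski triangle, via the i & j == 0 cell characterization."""
    n = 1 << m  # 2^m boxes per side
    return sum(1 for i in range(n) for j in range(n) if i & j == 0)

m = 8
eps = 2.0 ** -m
d_estimate = math.log(sierpinski_box_count(m)) / math.log(1.0 / eps)
print(d_estimate)  # ln 3 / ln 2, i.e. about 1.585
```

Here the count is exactly 3^m, so the estimate matches the analytical value at every scale; for empirically sampled fractals one would instead fit the slope of ln N(ε) against ln(1/ε).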

11.2.2 Self-Organized Criticality

Despite their importance in characterizing the geometry of fractals, purely
iterative models are not helpful in order to understand whether a few common
mechanisms might be responsible for the fractal behaviour observed in so many
different, and seemingly unrelated, real-world situations. This has shifted the
interest towards dynamical models. Indeed, open dissipative systems are in
many cases associated with fractals for more than one reason. Firstly,
attractors in the phase space of a nonlinear dynamical system can have a fractal
geometry; secondly, their evolution can proceed by means of scale-invariant
bursts of intermittent activity [18] extending over both time and space. In
general, these features are obtained when a driving parameter of the nonlinear
dynamical system is set to a crossover value at which chaotic behaviour sets in.
When this occurs, the nonlinear system is said to be at the edge of chaos. Another
situation where self-similarity is observed is at the critical point of phase
transitions. For instance, magnetic systems display a sharp transition from a


high-temperature disordered phase, where microscopic spins point in random
directions and generate no macroscopic magnetization, to a low-temperature
ordered phase where almost all spins point in the same direction, determining
a nonzero overall magnetization. Exactly at the critical transition temperature,
spins are spatially arranged in aligned domains whose size is power-law
distributed. This means that domains of all sizes are present, with a scale-invariant
pattern.

In both cases, in order to explain the ubiquity of self-similar systems one
should understand why they appear to behave as if their control parameter(s)
were systematically fine-tuned to the critical value(s). This point led to the idea
that feedback effects might exist that drive the control parameter to the critical
value as a spontaneous outcome of the dynamics. In this scenario, it is the
system itself that evolves autonomously towards the critical state, with no need
for an external fine-tuning. This paradigm is termed Self-Organized Criticality
(SOC) (for a review see Ref. [19] and references therein). At a phenomenological
level, SOC aims at explaining the tendency of open dissipative systems to
rearrange themselves in such a way as to develop long-range temporal and spatial
correlations. Why this happens is still a matter of debate, even if some authors
have claimed that this behaviour may be based on the minimization of some energy
potential [20, 21, 22]¹. Also, it has been proposed that a temperature-like
parameter can actually be introduced for these systems [24, 25], and shown to lead
to SOC only if fine-tuned to zero. This supports the hypothesis that SOC models
are closely related to ordinary critical systems, where parameters have to be
tuned to their critical value, the fundamental difference being the feasibility of
this tuning.
There are several examples of simplified models showing SOC, and most of
them have a common structure. In practice, two classes of SOC models have attracted
many studies: the class of sandpile models [26] and the class of models based on
extremal dynamics, such as the Bak–Sneppen [8] and Invasion Percolation [27]
models. In what follows we briefly review these examples.
Sandpiles
One prototype is represented by sandpile models [26], a class of open dissipative
systems defined over a finite box Λ in a d-dimensional hypercubic lattice. In
d = 2 dimensions, one considers a simple square lattice. Any site i of the lattice
is assumed to store an integer amount z_i of sand grains, corresponding to the
height reached by the sandpile at that site. At every time step one grain of sand
is added on a randomly chosen site i, so that the height z_i is increased by one.
As long as z_i remains below a fixed threshold, nothing happens². But as soon
as z_i exceeds the threshold, the column of sand becomes unstable and topples
on its nearest neighbours. Therefore the heights evolve according to
z_k → z_k − Δ_{ik},   (11.3)

where

Δ_{ik} = 2d if k = i;   Δ_{ik} = −1 if k is a nearest neighbour of i;   Δ_{ik} = 0 otherwise.   (11.4)

¹Interestingly, a similar claim has been made for networks as well [23].
²Different functions of the height z_i can be defined: for example the height itself, the
difference of heights between nearest neighbours (first discrete derivative of the height), the
discrete Laplacian of the height (second discrete derivative), and so on.

This process is called toppling. As the neighbouring sites acquire new grains,
they may topple in their turn, and this effect can propagate throughout the
system until no updated site is active, in which case the procedure starts again
with the addition of a new grain. While the amount of sand remains constant
when toppling occurs in the bulk, for topplings at the boundary sites (i ∈ ∂Λ)
some amount of sand falls outside and disappears from the system. In the steady
state of the process, this loss balances the continuous random addition of sand.
All the toppling events occurring between two consecutive sand additions
are said to form an avalanche. One can define both a size and a characteristic
time for an avalanche. The size of an avalanche can be defined, for instance, as
the total number of toppling sites (one site can topple more than once) or the
total number of topplings (it is clear that these two definitions give more and
more similar results as the space dimension increases). In order to define the
lifetime of an avalanche, one must first define the unit time-step. The latter is
the duration of the fundamental event defined by these two processes:

• a set of sites becomes critical due to the previous toppling event;
• all such critical sites undergo a toppling process, and the heights of their
neighbours are updated.

Then the lifetime of an avalanche can be defined as the number of unit time-steps
between two sand additions.
The fundamental result of the sandpile model is that, at the steady state,
both the size s and the lifetime t of avalanches are characterized by power-law
distributions P(s) ∼ s^{−τ} and Q(t) ∼ t^{−τ′} [26]. Therefore the model succeeds in
reproducing the critical behaviour, often associated with phase transitions, but
with a self-organized mechanism requiring no external fine tuning of the control
parameter. Note that the grain addition can be viewed as the action of an
external field over the system. Similarly, the avalanche processes can be viewed
as the response (relaxation) of the system to this field. The spatial correlations
that develop spontaneously at all scales indicate that the system reacts
macroscopically even to a microscopic external perturbation, a behaviour reminiscent
of the diverging susceptibility characterizing critical phenomena.
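The toppling rule of Eqs. (11.3)-(11.4) is straightforward to simulate directly. A minimal sketch in d = 2 (the lattice size and number of added grains are arbitrary illustrative choices; the avalanche size is measured as the total number of topplings):

```python
import random

def relax(z, L):
    """Topple every unstable site (z >= 4 = 2d) until the lattice is stable;
    return the avalanche size, i.e. the total number of topplings."""
    size = 0
    unstable = [(i, j) for i in range(L) for j in range(L) if z[i][j] >= 4]
    while unstable:
        i, j = unstable.pop()
        if z[i][j] < 4:
            continue          # already relaxed by an earlier toppling
        z[i][j] -= 4          # Delta_ii = 2d = 4
        size += 1
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < L and 0 <= nj < L:
                z[ni][nj] += 1    # Delta_ik = -1 for nearest neighbours
                if z[ni][nj] >= 4:
                    unstable.append((ni, nj))
            # grains pushed across the boundary are dissipated
        if z[i][j] >= 4:
            unstable.append((i, j))
    return size

random.seed(0)
L = 20
z = [[0] * L for _ in range(L)]
sizes = []
for _ in range(5000):
    i, j = random.randrange(L), random.randrange(L)
    z[i][j] += 1              # random grain addition (the external driving)
    sizes.append(relax(z, L))
```

After a transient, a histogram of `sizes` (excluding the zero-size events) approximates the power-law avalanche distribution discussed above.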
The Bak–Sneppen model

A model that attempts to explain some key properties of biological evolution,
even if with strong simplifications, is the Bak–Sneppen (BS) model [8, 28]. In
its simplest formulation, the model is defined by the following steps:

• N species are arranged on the sites of a 1-dimensional lattice (a chain, or
a ring if periodic boundary conditions are enforced);
• a fitness value x_i (sometimes interpreted as a fitness barrier) is assigned to
each species i, drawn randomly from a uniform distribution in the interval
[0, 1];
• the site with the lowest barrier and its nearest neighbours are updated:
new random fitness values, drawn from the same uniform distribution on
the unit interval, are assigned to them.
The basic idea behind the model is that the species with the lowest fitness
is the one that is most likely to go extinct and be replaced by a new one.
Alternatively, the update is interpreted as a mutation of the least fit species towards
an evolved species representing its descendant or offspring. Finally, one can
interpret x_i as the barrier against mutation for the genotype of species i: the
higher the barrier, the longer the time between two modifications of the genetic
code. The species with the lowest barrier is therefore the first to evolve. In any
case, the reason for updating the nearest neighbours is the same: the mutation
of one species changes the state of all the interacting species (for instance, both
predator and prey along the food chain). The effect of this change on the fitness
of the nearest neighbours is not known a priori (it may be beneficial or not),
and is modelled as a random update of their fitness as well.

If the procedure described above is iterated, the system self-organizes to
a critical stationary state in which almost all the barriers are uniformly
distributed above a certain threshold value x_c = 0.66702 ± 0.00008 [29] (see Fig. 11.2,
left panel). In other words, the fitness distribution evolves from a uniform one
in the interval [0, 1] to a uniform one in the interval [x_c, 1]. In this model an
(evolutionary) x-avalanche is defined as a causally connected sequence of mutations
of barriers, all below a fixed value x. In this way the size of an x-avalanche is
uniquely defined as the number of mutations between two consecutive
configurations where all barriers are above x. For x = x_c the avalanche distribution is
a power law P(s) ∼ s^{−τ} with an exponent τ = 1.073 ± 0.003 [29] (see Fig. 11.2,
right panel).
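The emergence of the threshold is easy to reproduce numerically. A minimal sketch of the ring version of the dynamics (the number of species and of update steps are arbitrary illustrative choices):

```python
import random

random.seed(42)
N = 64
x = [random.random() for _ in range(N)]    # initial barriers, uniform in [0, 1]

for _ in range(30_000):
    i = min(range(N), key=x.__getitem__)   # species with the lowest barrier
    for j in (i - 1, i, (i + 1) % N):      # replace it and its two ring neighbours
        x[j] = random.random()             # (x[-1] wraps around in Python)

# At the SOC stationary state, barriers accumulate above x_c ~ 0.667.
frac_above_half = sum(1 for v in x if v > 0.5) / N
print(frac_above_half)
```

After the transient, almost all barriers sit above the threshold, so the fraction above 0.5 is close to one, in contrast with the initial uniform distribution where it would be about 0.5.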
The Bak–Sneppen model is a prototype mechanism generating fractal
phenomena as an effect of extremal dynamics [30]. It also provides a possible
explanation for the phenomenon of mass extinctions observed in the fossil record
[31], some analyses of which have indicated that extinction sizes are power-law
distributed. Rather than considering large-scale extinctions as triggered by
external catastrophic events (such as meteorites or major environmental changes)
and small-scale extinctions as caused by evolutionary factors, the model shows
that a power-law distribution of extinction events may be interpreted as the
outcome of a single internal macroevolutionary process acting at all scales.

The Bak–Sneppen model has been studied within a variety of different
frameworks, ranging from numerical simulation [29, 32], theoretical analysis [33],
renormalization group techniques [34, 35], field theory [36] and mean-field
approximations [28, 30] to probabilistic approaches (run time statistics) [37, 38]. It has
also been defined on higher-dimensional lattices and more general graphs, including
complex networks [8, 28, 38, 39, 40, 41, 42, 43]. For a recent review on this
model see Ref. [44] and references therein. Being so well studied, the Bak–Sneppen
model is ideal for studying the effects introduced by an additional

Figure 11.2: Left: plot of the probability distribution of fitness values at the
steady state in the Bak–Sneppen model with 500 species. Right: the probability
distribution P(s) for the size of a critical x_c-avalanche.
feedback mechanism between fitness dynamics and topological restructuring.
For this reason, it is at the basis of the adaptive model [6] that we shall present
in detail in section 11.4.

11.3 The fitness model

In chapter 3 we have already discussed various models aimed at generating
realistic network topologies. Another successful model, the so-called Fitness
Model, is based on a suitable extension of the Erdős–Rényi random graph defined
in chapter 2. In the latter, all vertices are assumed to be statistically equivalent,
so (unsurprisingly) no heterogeneity emerges. By contrast, one can define a
static model where heterogeneity is explicitly introduced at the level of vertices.
In particular, Caldarelli et al. [7] have proposed a model where each vertex i
(i = 1, . . . , N) is assigned a fitness x_i drawn from a specified distribution ρ(x).
Then, each pair of vertices i and j is sampled, and a link is drawn between them
with a fitness-dependent probability p_ij = f(x_i, x_j). The expected topological
properties of the network can be easily computed in terms of ρ(x) and f(x, y)
[7, 66, 67]. For instance, the expected degree of vertex i is
⟨k_i⟩ = Σ_{j≠i} p_ij = Σ_{j≠i} f(x_i, x_j).   (11.5)

For N large, the discrete sum can be approximated by an integral. Thus the
expected degree of a vertex with fitness x is

k(x) = N ∫ f(x, y) ρ(y) dy,   (11.6)


where the integration extends over the support of ρ(x). If one considers the
cumulative fitness distribution and the cumulative degree distribution defined
as

ρ_>(x) ≡ ∫_x^{+∞} ρ(x′) dx′,   P_>(k) ≡ ∫_k^{+∞} P(k′) dk′,   (11.7)

then the latter can be easily obtained in terms of the former as

P_>(k) = ρ_>[x(k)],   (11.8)

where x(k) is the inverse of the function k(x) defined in eq. (11.6).


Similarly, the expected value of the average nearest neighbours degree defined in eq.(4.21) is
P
P
j pij hkj i
jk pij pjk
nn
hki i
= P
(11.9)
hki i
j pij
and the expected value of the clustering coefficient defined in eqs.(4.29-4.30) is
P
P
jk pij pjk pki
jk pij pjk pki
= P
hCi i
(11.10)
hki i(hki i 1)
jk pij pki
As for eq.(11.5), the above expressions can be easily rephrased in terms of integrals involving only the functions f (x, y) and (x), upon which all the results
depend.
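These expectation values can be checked against sampled graphs. A minimal sketch comparing the expected degrees of eq. (11.5) with one sampled realization (the exponential fitness distribution and the particular linking function below are arbitrary illustrative choices):

```python
import random

random.seed(1)
N = 400
x = [random.expovariate(1.0) for _ in range(N)]   # fitnesses, rho(x) = e^{-x}

def f(a, b):
    """Example fitness-dependent linking probability p_ij = f(x_i, x_j)."""
    t = 0.05 * a * b
    return t / (1.0 + t)

# Expected degrees from eq. (11.5)
expected_k = [sum(f(x[i], x[j]) for j in range(N) if j != i) for i in range(N)]

# Degrees of one sampled realization
k = [0] * N
for i in range(N):
    for j in range(i + 1, N):
        if random.random() < f(x[i], x[j]):
            k[i] += 1
            k[j] += 1

mean_expected = sum(expected_k) / N
mean_sampled = sum(k) / N
print(mean_expected, mean_sampled)
```

The sampled mean degree fluctuates around the analytical expectation, with relative deviations shrinking as N grows.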

11.3.1 Particular cases

The constant choice f(x, y) = p is the trivial case corresponding to an
Erdős–Rényi random graph (see chapter 2), irrespective of the form of ρ(x).

The simplest nontrivial choice coincides with the configuration model (see
Chapter 5), which is obtained by requiring that all the realizations of the
fitness-dependent network having the same degree sequence occur with the same
probability. This leads to [68, 69]

f(x, y) = z x y / (1 + z x y),   (11.11)

where z is a positive parameter controlling the number of links. We know from
chapter 5 that the resulting network is completely random, apart from its degree
sequence having a definite (fitness-dependent in this case) expected value. We
also know that, when z ≪ 1, the above connection probability reduces to the
bilinear choice

f(x, y) = z x y.   (11.12)

In this case, a sparse graph is obtained where structural correlations disappear.
Also, from eq. (11.5) one finds that ⟨k_i⟩ ∝ x_i. If one chooses a power-law
fitness distribution ρ(x) ∼ x^{−γ}, it is therefore clear that the degree distribution
will have exactly the same shape: P(k) ∼ k^{−γ}. In the more general case
corresponding to eq. (11.11), the same choice for ρ(x) yields again a power-law
degree distribution, with a cutoff at large degree values that correctly takes


into account the requirement k ≤ N for dense networks. Equation (11.11) also
generates disassortativity and hierarchically distributed clustering, both arising
as structural correlations imposed by the local constraints. For sparse networks,
corresponding to eq. (11.12), these correlations disappear.

Another interesting choice is given by

f(x, y) = Θ(x + y − z),   ρ(x) = e^{−x},   (11.13)

where z, which again controls the number of links, now plays the role of a positive
threshold, and Θ denotes the Heaviside step function. This choice yields again a
power-law degree distribution P(k) ∼ k^{−γ} (where now γ = 2), anticorrelated
degrees with k^nn(k) ∼ k^{−1}, and hierarchically distributed clustering c(k) ∼ k^{−2}
(times logarithmic corrections) [7, 66, 67]. Remarkably, it has been shown that
both eq. (11.11) and eq. (11.13) are particular cases of a more general expression
obtained by introducing a temperature-like parameter [71]. Equation (11.11),
with ρ(x) ∼ x^{−γ}, corresponds to the finite-temperature regime, where the
temperature can be reabsorbed in a redefinition of x and z. By contrast,
eq. (11.13) corresponds to the zero-temperature regime where the structural
correlations disappear and the graph reaches a sort of optimized topology [71].
In all these cases, the average distance is small.
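The exponent γ = 2 for the choice (11.13) follows directly from eqs. (11.6)-(11.8). A sketch of the computation (for x < z, so that the step function truncates the integral):

```latex
k(x) = N \int_0^{\infty} \Theta(x + y - z)\, e^{-y}\, dy
     = N \int_{z - x}^{\infty} e^{-y}\, dy = N e^{x - z},
\qquad x(k) = z + \ln(k/N).
% Hence, using P_>(k) = \rho_>[x(k)] with \rho_>(x) = e^{-x}:
P_>(k) = e^{-x(k)} = e^{-z}\, \frac{N}{k} \propto k^{-1}
\quad \Longrightarrow \quad P(k) = -\frac{dP_>(k)}{dk} \propto k^{-2}.
```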
In summary, for a series of reasonable choices the networks generated by the
fitness model display:

• a scale-invariant degree distribution;
• correlations between neighbouring degrees;
• hierarchically organized clustering coefficients;
• a small-world effect.

11.4 A self-organized network model

As we have anticipated, recent approaches to the modelling of complex networks
have considered the idea that the topology evolves under a feedback with
some dynamical process taking place on the network itself (see for instance refs.
[6, 47, 72, 73, 74, 75, 76, 77]). Among the various contributions, three groups
have considered a possible connection with Self-Organized Criticality [6, 73, 74].

Bianconi and Marsili [73] have defined a model where slow network growth,
defined as the gradual addition of links between randomly chosen vertices, is
combined with fast relaxation, defined as the random rewiring of links connected
to congested (toppling) vertices. To avoid the collapse to a complete graph,
dissipation is also introduced, allowing toppling nodes to lose all their links at
a given rate. The outcomes of the model depend on the dissipation rate and
on the probability density function for the toppling probabilities to be assigned
at each vertex. A particular choice of these quantities drives the system to a
stationary state characterized by a scale-free topology and a power-law
distribution for toppling avalanches.


Fronczak, Fronczak and Hołyst [74] have proposed a model where no parameter
choice is required in order to drive the system to the critical region. They
considered the sandpile dynamics defined in section 11.2.2, but where each vertex
has a different critical height equal to its degree, as in other previous studies
[61]. In addition, they assumed that after an avalanche of size A, the A ends of
links in the network that have not been rewired for the longest time are rewired
to the initiator of the avalanche. In this way, the avalanche area distribution
and the degree distribution evolve in time, and at the stationary state they become
very similar and scale-free.

Garlaschelli, Capocci and Caldarelli [6] have introduced another fully
self-organized model where the Bak–Sneppen dynamics defined in section 11.2.2
takes place on a network whose topology is in turn continuously shaped by
the fitness model presented in section 11.3. They find that the mutual interplay
between topology and dynamics drives the system to a state characterized
by scale-free distributions for both the degrees and the fitness values. These
unexpected properties differ from what is obtained when the two models are
considered separately. The rest of the chapter is devoted to a detailed description
of this model.

11.4.1

Motivation

We have already mentioned that the topology of a network dramatically affects the outcomes of dynamical processes taking place on it [1, 2, 4, 5]. On the other hand, the idea behind the fitness model presented in section 11.3 captures the empirically observed result [52, 53, 78] that the topology of many real networks depends strongly on some vertex-specific quantity. Clearly, these results imply that in general one should consider the mutual effects that dynamics and topology have on each other. Unfortunately, the overwhelming majority of studies have instead considered the two processes separately, by postulating either a scenario where the topology evolves over a much longer timescale than the dynamics, or the opposite situation where the dynamical variables evolve much more slowly than the topology (and are therefore assumed fixed, as in the fitness model itself). In cases when there is indeed such a sharp separation of timescales, these approaches are helpful. But in many cases the topological evolution and the dynamics may occur at comparable rates, in which case the decoupled approach gives no insight into the real process. Moreover, even when the timescales are indeed well separated, the variables involved in the slower of the two processes must be specified as external parameters, and ad hoc assumptions must therefore be made. For instance, when considering the spreading of epidemics on a network one must assume an arbitrary fixed topology. Similarly, when a network is formed according to the fitness model, one must assume an arbitrary distribution for the fitness variables.
11.4. A SELF-ORGANIZED NETWORK MODEL (page 155)

These motivations lead to the definition of a self-organized model [6] where ad hoc specifications of any fixed structure, either in the topology or in the dynamical variables, are unnecessary. Rather, it is the interplay between dynamics and topology that autonomously drives the system to a stationary state. The choice of both the dynamical rule and the graph formation process was driven by the interest to highlight the novel effects arising uniquely from the feedback

introduced between them. Therefore, two extremely well understood models were chosen: on one hand, the extremal fitness dynamics of the Bak-Sneppen model (see section 11.2.2), and on the other hand the fitness network model (see section 11.3). As we have shown in section 11.3, the topology generated by the fitness model can be completely calculated for any distribution of the fitness values. Similarly, the outcomes of the Bak-Sneppen model on several static networks are well studied [8, 28, 38, 39, 40, 41, 42, 43]. On a generic graph, each of the N vertices is assigned a fitness value x_i, initially drawn from a uniform distribution between 0 and 1, as in the one-dimensional case. At each timestep the species i with lowest fitness and all its k_i neighbours undergo a mutation, and k_i + 1 new fitness values (drawn from the same uniform distribution) are assigned to them. On regular lattices [8, 39], random graphs [28], small-world [40] and scale-free [41, 42, 43] networks it has been shown that, as for the one-dimensional model, at the stationary state the fitness values are uniformly distributed above a critical threshold τ. The only dependence on the particular topology is the value of τ [8, 28, 39, 40, 41, 42, 43]. In particular, τ vanishes for scale-free degree distributions with diverging second moment [41, 42, 43].
While these more complicated networks are closer to realistic food webs (i.e. the real-world predator-prey networks [48]), the assumption of a static graph leads to the ecological paradox that, after a mutation, the evolved species inherits the same connections as the previous species. By contrast, macroevolution is believed to be at the same time the cause and the effect of food web dynamics [47]. In particular, after a mutation, a species is expected to develop a new set of interactions with the other species.

11.4.2 Definition

In order to define an improved, evolving model, one can assume that the Bak-Sneppen dynamics is combined with a fitness-driven link updating. At the initial state the network is generated as in the fitness model: between all pairs of vertices i and j a link is drawn with probability f(x_i, x_j) (where the x_i are the initial fitness values). Then, whenever a species i is assigned a new fitness x'_i, the whole set of connections between i and the other vertices j ≠ i is drawn anew with updated probability f(x'_i, x_j). This automatically implies that major mutations (a large change in x_i) are associated with very different connection probabilities, while small changes leave the connection probabilities almost unchanged. An example of this evolution rule is depicted in figure 11.3.

If at time t the vertex i has the minimum fitness, at time t + 1 the fitness of i as well as that of its neighbours is updated, i.e. drawn anew from the uniform distribution on the unit interval. This means

    x_j(t + 1) = η_j    if j = i or a_ij(t) = 1,    (11.14)

where η_j is uniformly distributed between 0 and 1.
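The update rule can be sketched in a few lines of code. The following is our own minimal illustration, not code from the original study [6]: the function names are ours, and we take f(x, y) = zxy/(1 + zxy) (eq. 11.11) with arbitrary N and z.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def connect_prob(x, y, z=1.0):
    """Fitness-dependent connection probability; here eq. (11.11),
    f(x, y) = z x y / (1 + z x y), as in the configuration-model case."""
    return z * x * y / (1.0 + z * x * y)

def step(fitness, adj, z=1.0):
    """One time step of the self-organized model: the minimum-fitness
    vertex and its neighbours get new fitness values, and all their
    links are redrawn with the updated probabilities (cf. eq. 11.14)."""
    n = len(fitness)
    i = min(range(n), key=fitness.__getitem__)   # extremal (Bak-Sneppen) rule
    for j in {i} | adj[i]:                       # the minimum and its neighbours
        fitness[j] = random.random()             # new fitness, uniform in (0, 1)
        for k in range(n):                       # redraw every link of vertex j
            if k == j:
                continue
            adj[j].discard(k)
            adj[k].discard(j)
            if random.random() < connect_prob(fitness[j], fitness[k], z):
                adj[j].add(k)
                adj[k].add(j)

# initial state: uniform fitness values and a graph drawn from the fitness model
N, z = 30, 1.0
fitness = [random.random() for _ in range(N)]
adj = [set() for _ in range(N)]
for a in range(N):
    for b in range(a + 1, N):
        if random.random() < connect_prob(fitness[a], fitness[b], z):
            adj[a].add(b)
            adj[b].add(a)

for _ in range(300):
    step(fitness, adj, z)
```

Links between two simultaneously mutated vertices are simply redrawn last with both new fitness values; for a sketch this detail is immaterial.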

11.4.3 Analytical solution

CHAPTER 11. SELF-ORGANIZED NETWORKS (page 156)

Figure 11.3: Example of graph evolution in the self-organized model. The minimum-fitness vertex (black) and its two neighbours (grey) undergo a mutation: three new fitness values are assigned to them (light grey), and new links are drawn between them and all the other vertices.

Despite its complexity, the model is exactly solvable for any choice of the connection probability f(x, y) [6]. Indeed, one can write down a so-called master equation for the evolution of the fitness distribution ρ(x, t) at time t:

    ∂ρ(x, t)/∂t = r_in(x, t) − r_out(x, t),    (11.15)

where r_in(x, t) and r_out(x, t) are the fractions (strictly speaking, the probability densities) of vertices with fitness x entering and exiting the system at time t, respectively. If a stationary (time-independent) distribution ρ(x) exists, it is found by requiring

    ∂ρ(x, t)/∂t = 0  ⟺  r_in(x) = r_out(x),    (11.16)

where at the stationary state the quantities no longer depend on time. If one manages to write down r_in(x) and r_out(x) in terms of f(x, y) and ρ(x), then the above condition will give the stationary form of ρ(x) for any choice of f(x, y).

To this end, it is useful to introduce the probability density q(m) that the minimum fitness takes the value x_min = m. For x small enough, ρ(x) must be very close to q(x)/N (the distribution of all fitness values must be approximated by the correctly renormalized distribution of the minimum). The range where ρ(x) ≈ q(x)/N holds can be defined more formally by introducing the fitness value τ such that

    lim_{N→∞} N ρ(x)/q(x) = 1 for x < τ,  > 1 for x > τ.    (11.17)

This means that in the large size limit the fitness distribution for x < τ is determined by the distribution of the minimum. After an expression for ρ(x) is derived, the value of τ can be determined by the normalization condition

    ∫₀¹ ρ(x) dx = 1    (11.18)

as we show below. Note that we are not assuming from the beginning that τ > 0, as is observed for the Bak-Sneppen model on other networks (with finite second moment of the degree distribution). It may well be that for a particular choice of f(x, y) eq. (11.18) yields τ = 0, signalling the absence of a nonzero threshold. Also, note that

    lim_{N→∞} q(x) = 0    for x > τ,

since eq. (11.17) implies that the minimum is surely below τ. Thus the normalization condition for q(x) reads

    ∫₀^τ q(x) dx = 1    as N → ∞.

The knowledge of q(m) allows one to rewrite r_in(x) and r_out(x) as

    r_in(x) = ∫₀^τ q(m) r_in(x|m) dm

and

    r_out(x) = ∫₀^τ q(m) r_out(x|m) dm

respectively, where r_in(x|m) and r_out(x|m) are conditional probabilities corresponding to the densities of vertices with fitness x which are added and removed when the value of the minimum fitness is m.
Let us consider r_in(x) first. If the minimum fitness is m, then on average 1 + k(m) new fitness values are updated, where k(m) is the expected degree of the minimum-fitness vertex, which can be calculated in a way similar to eq. (11.6). Since each of these 1 + k(m) values is uniformly drawn between 0 and 1, one has

    r_in(x|m) = (1 + k(m))/N    (11.19)

independently of x. This directly implies

    r_in(x) = ∫₀^τ q(m) r_in(x|m) dm = (1 + ⟨k_min⟩)/N,    (11.20)

where ⟨k_min⟩ ≡ ∫₀^τ q(m) k(m) dm is the expected degree of the vertex with minimum fitness (irrespective of the value of the minimum fitness itself), a quantity that can be derived independently of k(m) as we show below.
Now consider r_out(x), for which the independence of x does not hold. For x ≤ τ, r_out(x|m) = δ(x − m)/N (where δ(x) is the Dirac delta function), since the minimum is surely replaced and the probability of having other vertices with fitness in this range is zero. For x > τ, the fraction of vertices with fitness x that are removed equals ρ(x) times the probability f(x, m) that a vertex with fitness x is connected to the vertex with minimum fitness m [6]. This means

    r_out(x|m) = θ(τ − x) δ(x − m)/N + θ(x − τ) ρ(x) f(x, m),    (11.21)

where θ(x) = 1 if x > 0 and θ(x) = 0 otherwise. An integration over q(m) dm


yields

    r_out(x) = ∫₀^τ q(m) r_out(x|m) dm = q(x)/N                       for x ≤ τ,
                                         ρ(x) ∫₀^τ q(m) f(x, m) dm    for x > τ.    (11.22)

Finally, one can impose eq. (11.16) at the stationary state. For x ≤ τ, this yields q(x) = 1 + ⟨k_min⟩ independently of x. Combining this result with q(x) = 0 for x > τ as N → ∞, one finds that the distribution of the minimum fitness m is uniform between 0 and τ:

    q(m) = (1 + ⟨k_min⟩) θ(τ − m).    (11.23)

Requiring that q(m) is normalized yields

    ⟨k_min⟩ = (1 − τ)/τ    and    q(m) = θ(τ − m)/τ,    (11.24)

which implies

    ρ(x) = q(x)/N = 1/(τN)    for x ≤ τ

and, from eq. (11.20),

    r_in(x) = 1/(τN)    for all x.    (11.25)

For x > τ, eq. (11.16) implies

    ρ(x) = r_in(x) / ∫₀^τ q(m) f(x, m) dm
         = 1 / (τN ∫₀^τ q(m) f(x, m) dm)
         = 1 / (N ∫₀^τ f(x, m) dm).    (11.26)

Therefore the exact solution for ρ(x) at the stationary state is found [6]:

    ρ(x) = (τN)⁻¹                     for x ≤ τ,
           [N ∫₀^τ f(x, m) dm]⁻¹      for x > τ,    (11.27)

where τ is determined using eq. (11.18), which now reads

    ∫_τ^1 dx / ∫₀^τ f(x, m) dm = N − 1.    (11.28)


The above analytical solution holds for any form of f(x, y). As a novel result, one finds that ρ(x) is in general no longer uniform for x > τ. This unexpected result, which contrasts with the outcomes of the Bak-Sneppen model on any static network, is solely due to the feedback between topology and dynamics. At the stationary state the fitness values and the network topology continue to evolve, but the knowledge of ρ(x) allows one to compute the expected topological properties as shown in section 11.3 for the static fitness model.

11.4.4 Particular cases

In what follows we consider specific choices of the connection probability f(x, y). In particular, we consider two forms already presented in section 11.3. Once a choice for f(x, y) is made, one can also confirm the theoretical results with numerical simulations. As we show below, the agreement is excellent.

The random neighbour model

As we have noted, the trivial choice for the fitness model is f(x, y) = p, which is equivalent to the Erdős-Rényi model. When the Bak-Sneppen dynamics takes place on the network, this choice removes the feedback with the topology, since the evolution of the fitness does not influence the connection probability. Indeed, this choice is asymptotically equivalent to the so-called random neighbour variant [28] of the Bak-Sneppen model. In this variant each vertex has exactly d neighbours, which are uniformly chosen anew at each timestep. Here, we know that for an Erdős-Rényi graph the degree is peaked about the average value p(N − 1), thus we expect to recover the same results found for d = p(N − 1) in the random neighbour model.
Homework 11.3 Show that, when f(x, y) = p,

    ρ(x) = (τN)⁻¹     for x ≤ τ,
           (pτN)⁻¹    for x > τ,    (11.29)

where, for N → ∞,

    τ ≈ 1/(1 + pN) → 1            if pN → 0,
                     (1 + d)⁻¹    if pN → d > 0,
                     0            if pN → ∞.    (11.30)


The reason for the onset of these three dynamical regimes must be sought in the topological phases of the underlying network. For p large, there is one large connected component that spans almost all vertices. As p decreases, this giant cluster becomes smaller, and several separate clusters form. Below the critical percolation threshold p_c ≈ 1/N [4, 5], the graph is split into many small clusters. Exactly at the percolation threshold p_c, the sizes of clusters are power-law distributed according to P(s) ~ s^(−α) with α = 2.5 [4]. Here we find that the dense regime pN → ∞ is qualitatively similar to a complete graph, where many fitness values are continuously updated and therefore τ → 0 as in the initial state (thus ρ(x) is not step-like). In the sparse case where pN = d with finite d > 1 as N → ∞, each vertex has a finite number of neighbours exactly as in the random neighbour model, and one correctly recovers the finite value τ = (1 + d)⁻¹ found in ref. [28]. The subcritical case when p falls faster than 1/N yields a fragmented graph below the percolation threshold. This is qualitatively similar to a set of N isolated vertices, for which τ → 1. It is instructive to notice from eq. (11.27) that the choice f(x, y) = p is the only one for which ρ(x) is still uniform above the threshold. This confirms that, as soon as the feedback is removed, the novel effects disappear.
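The threshold condition can also be checked numerically. The sketch below is our own illustration (the bisection solver, the integration scheme and the parameter values are not from the text): it solves eq. (11.28) for a generic connection probability f and, for the random-neighbour choice f(x, y) = p, recovers the finite-N value τ = 1/(1 + p(N − 1)) behind eq. (11.30).

```python
def solve_tau(f, N, steps=100, tol=1e-9):
    """Solve eq. (11.28), integral_tau^1 dx / integral_0^tau f(x,m) dm = N - 1,
    for the threshold tau by bisection; f(x, m) is the connection probability."""
    def inner(x, tau):
        # trapezoidal rule for integral_0^tau f(x, m) dm
        h = tau / steps
        s = 0.5 * (f(x, 0.0) + f(x, tau)) + sum(f(x, k * h) for k in range(1, steps))
        return s * h
    def outer(tau):
        # trapezoidal rule for integral_tau^1 dx / inner(x, tau)
        h = (1.0 - tau) / steps
        g = lambda x: 1.0 / inner(x, tau)
        s = 0.5 * (g(tau) + g(1.0)) + sum(g(tau + k * h) for k in range(1, steps))
        return s * h
    lo, hi = 1e-6, 1.0 - 1e-6        # outer() decreases as tau grows
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if outer(mid) > N - 1:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# random-neighbour case f(x, y) = p: the solution should be 1/(1 + p(N-1))
N, p = 1000, 0.005
tau = solve_tau(lambda x, m: p, N)
```

For constant f the quadrature is exact, so the numerical τ agrees with the closed form to the bisection tolerance; the same routine works unchanged for any other f(x, y).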
Homework 11.4 Show that, in the stationary state of this specification of the model, the average fitness over all nodes (defined as ⟨x⟩ ≡ ∫₀¹ x ρ(x) dx) equals

    ⟨x⟩ = (1 + τ)/2.


The above exercise shows that, while at the initial state (t = 0) the average fitness is ⟨x(0)⟩ = 1/2 (since the distribution is uniform in the unit interval), at the stationary state (t = ∞) the average fitness increases to the asymptotic value ⟨x(∞)⟩ = (1 + τ)/2 ≥ 1/2.
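As a numerical cross-check of the result of Homework 11.4 (our own sketch; the parameter values are arbitrary), one can integrate x ρ(x) directly for the stationary distribution of eq. (11.29), using the finite-N threshold τ = 1/(1 + p(N − 1)):

```python
def avg_fitness(N, p, steps=200_000):
    """<x> = integral_0^1 x rho(x) dx for the stationary rho(x) of eq. (11.29),
    with the finite-N threshold tau = 1/(1 + p(N-1)) behind eq. (11.30)."""
    tau = 1.0 / (1.0 + p * (N - 1))
    rho = lambda x: 1.0 / (tau * N) if x <= tau else 1.0 / (p * tau * N)
    h = 1.0 / steps
    # midpoint rule for the integral of x * rho(x)
    return sum((k + 0.5) * h * rho((k + 0.5) * h) * h for k in range(steps))

N, p = 10_000, 0.01
tau = 1.0 / (1.0 + p * (N - 1))
mean_fitness = avg_fitness(N, p)   # should approach (1 + tau)/2 for large N
```

The numerical value sits slightly above 1/2, as the exercise predicts.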
The self-organized configuration model

Following the considerations in section 11.3, the simplest nontrivial choice for f(x, y) is given by eq. (11.11). For a fixed ρ(x), this choice generates a fitness-dependent version of the configuration model [4, 70], where all graphs with the same degree sequence are equiprobable. All higher-order properties besides the structural correlations induced by the degree sequence are completely random [68, 69]. In this self-organized case, the degree sequence is not specified a priori and is determined by the fitness distribution at the stationary state. Inserting eq. (11.11) into eq. (11.27) one finds a solution that for N → ∞ is equivalent to [6]

    ρ(x) = (τN)⁻¹                 for x ≤ τ,
           (τN)⁻¹ + 2/(zNτ²x)     for x > τ,    (11.31)
where τ, again obtained using eq. (11.28), is

    τ ≈ 1                    if zN → 0,
        √(W(d)/d)            if zN → d > 0,
        √(W(zN)/(zN)) → 0    if zN → ∞.    (11.32)

Here W(x) denotes the ProductLog (Lambert W) function, defined as the solution w of w e^w = x.

Again, the above dynamical regimes are related to three (subcritical, sparse and dense) underlying topological phases. This can be ascertained by monitoring the cluster size distribution P(s). It is found that P(s) develops a power-law shape P(s) ~ s^(−α) (with α = 2.45 ± 0.05) when d ≡ zN is set to the critical

Figure 11.4: Cluster size distribution. Far from the critical threshold (d = 0.1 and d = 4), P(s) is well peaked. At d_c = 1.32, P(s) ~ s^(−α) with α = 2.45 ± 0.05. Here N = 3200. (After ref. [6].)
value d_c = 1.32 ± 0.05 [6] (see fig. 11.4), which therefore represents the percolation threshold. This behaviour can also be explored by measuring the fraction of vertices spanned by the giant cluster as a function of d (see fig. 11.5). This quantity is negligible for d < d_c, while for d > d_c it takes increasing finite values. Also, one can plot the average size fraction of non-giant components. As shown in the inset of fig. 11.5, this quantity diverges at the critical point where P(s) is a power law.
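In the sparse regime, eq. (11.32) is equivalent to the transcendental condition zNτ² = 2 ln(1/τ), which follows from eq. (11.28) with f(x, y) ≈ zxy. The sketch below is our own illustration (the Newton-based Lambert W evaluation and the test values of d are ours): it compares the closed form τ = √(W(d)/d) with a direct bisection solution of that condition.

```python
import math

def lambert_w(x, iters=60):
    """Principal branch of the Lambert W function (w e^w = x, x >= 0),
    evaluated by Newton's method."""
    w = math.log(1.0 + x)                  # decent starting guess for x >= 0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (1.0 + w))
    return w

def tau_closed(d):
    """tau = sqrt(W(d)/d), the sparse-regime form appearing in eq. (11.32)."""
    return math.sqrt(lambert_w(d) / d)

def tau_direct(d, tol=1e-12):
    """Solve d tau^2 = 2 ln(1/tau) by bisection; the left side grows and the
    right side falls with tau, so the crossing in (0, 1) is unique."""
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d * mid * mid < 2.0 * math.log(1.0 / mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The two routes agree to machine precision, confirming that the Lambert-W expression is just the closed-form solution of the normalization condition.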
The analytical results in eq. (11.31) mean that ρ(x) is the superposition of a uniform distribution and a power law with exponent −1. The decay of ρ(x) for x > τ is entirely due to the coupling between extremal dynamics and topological restructuring. It originates from the fact that at any time the fittest species is also the most likely to be selected for mutation, since it has the largest probability of being connected to the least fit species. This is opposite to what happens on fixed networks. The theoretical predictions in eqs. (11.31) and (11.32) can be confirmed by large numerical simulations. This is shown in fig. 11.6, where the cumulative fitness distribution ρ_>(x) defined in eq. (11.7) and the behaviour of τ(zN) are plotted. Indeed, the simulations are in very good accordance with the analytical solution. Note that, as we have discussed in section 11.3, in the sparse regime z ≪ 1 one has f(x, y) ≈ zxy. Here, this implies a purely power-law behaviour ρ(x) ~ x⁻¹ for x > τ. Therefore ρ_>(x) is a logarithmic curve that looks like a straight line in log-linear axes. In the dense regime obtained for large z, the uniform part gives instead a significant deviation from the power-law trend. This shows one effect of structural correlations.



Figure 11.5: Main panel: the fraction of nodes in the giant component for different network sizes (N = 100, 200, 400, 800, 1600, 6400) as a function of d = Nz. Inset: the non-giant component average size as a function of d for N = 6400. (After ref. [6].)

Other effects are evident when considering the degree distribution P(k). Using eq. (11.6) one can obtain the analytic expression of the expected degree k(x) of a vertex with fitness x:

    k(x) = (2/(zτ²)) ln[(1 + zx)/(1 + zτx)] + [zx − ln(1 + zx)]/(zτx).    (11.33)

Computing the inverse function x(k) and plugging it into eq. (11.8) allows one to obtain the cumulative degree distribution P_>(k). Both quantities are shown in fig. 11.7, and again the agreement between theory and simulations is excellent. For small z, k(x) is linear, while for large z a saturation to the maximum value k_max = k(1) takes place. As discussed in section 11.3, this implies that in the sparse regime P(k) has the same shape as ρ(x). Another difference from static networks is that here τ remains finite even if P(k) ~ k^(−γ) with γ < 3 [41, 42, 43]. For large z the presence of structural correlations introduces a sharp cutoff in P(k).
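Equation (11.33) can be cross-checked against the defining integral k(x) = N ∫₀¹ f(x, y) ρ(y) dy (cf. eq. (11.6)), with ρ(y) taken from eq. (11.31); the explicit factors of N cancel. The following is our own sketch, with arbitrary values of z, τ and x:

```python
import math

def k_closed(x, z, tau):
    """Expected degree of a vertex with fitness x, eq. (11.33)."""
    return (2.0 / (z * tau ** 2)) * math.log((1.0 + z * x) / (1.0 + z * tau * x)) \
        + (z * x - math.log(1.0 + z * x)) / (z * tau * x)

def k_integral(x, z, tau, steps=200_000):
    """k(x) = N * integral_0^1 f(x,y) rho(y) dy with rho(y) from eq. (11.31);
    the factors of N cancel, so we integrate N*rho(y) directly (midpoint rule)."""
    f = lambda y: z * x * y / (1.0 + z * x * y)
    n_rho = lambda y: 1.0 / tau if y <= tau else 1.0 / tau + 2.0 / (z * tau ** 2 * y)
    h = 1.0 / steps
    return sum(f((k + 0.5) * h) * n_rho((k + 0.5) * h) * h for k in range(steps))
```

The closed form and the quadrature agree to the discretization error, which is a quick way to catch transcription mistakes in such formulas.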
Homework 11.5 Show that, in the stationary state of this specification of the model, the average fitness over all nodes equals

    ⟨x⟩ = 1/(2τN) + 2(1 − τ)/(zNτ²).


Figure 11.6: Main panel: cumulative density function ρ_>(x) in log-linear axes. From right to left, z = 0.01, z = 0.1, z = 1, z = 10, z = 100, z = 1000 (N = 5000). Inset: log-log plot of τ(zN). Solid lines: theoretical curves; points: simulation results. (After ref. [6].)

The above exercise shows that at the stationary state the average fitness can now be smaller than the initial value 1/2. This effect is due to the non-uniform character of ρ(x) at the stationary state, with a higher density of values just above the threshold τ.

11.5 Conclusions

We have presented a brief, and by no means complete, summary of the ideas that inspired much of the research on scale invariance and self-similarity, from the early discovery of fractal behaviour to the more recent study of scale-free networks. We have highlighted the importance of understanding the emergence of the ubiquitously observed patterns in terms of dynamical models. In particular, the framework of Self-Organized Criticality succeeds in explaining the onset of fractal behaviour without external fine-tuning. According to the SOC paradigm, open dissipative systems appear to evolve spontaneously to a state where the response to an infinitesimal perturbation is characterized by avalanches of all sizes. We have emphasized the importance of introducing similar mechanisms in the study of networks. In particular, we have argued that in many cases of interest it is not justified to decouple the formation of a network from the dynamics taking place on it. In both cases, one is forced to introduce ad hoc specifications for the process assumed to be slower. Indeed, by presenting an extensive study of a self-organized network model, we have shown that if the feedback between topology and dynamics is restored, novel and unpredictable results are found. This indicates that adaptive networks provide a more complete explanation for the spontaneous emergence of complex topological properties in real networks.

Figure 11.7: Left: k(x) (N = 5000; from right to left, z = 0.01, z = 0.1, z = 1, z = 10, z = 100, z = 1000). Right: P_>(k) (same parameter values, inverse order from left to right). Solid lines: theoretical curves; points: simulation results. (After ref. [6].)

Bibliography
[1] Caldarelli G. Scale-Free Networks Oxford University Press, Oxford (2007).
[2] Caldarelli G., Vespignani A. (eds), Large Scale Structure and Dynamics of
Complex Networks (World Scientific Press, Singapore 2007).
[3] Dorogovtsev S.N. Mendes J.F.F. Evolution of Networks: From Biological
Nets to the Internet and WWW, Oxford University Press, Oxford (2003).
[4] M.E.J. Newman, SIAM Rev. 45, (2003) 167.
[5] Albert R., Barabási A.-L., Rev. Mod. Phys., 74, (2002) 47-97.
[6] Garlaschelli D., Capocci A., Caldarelli G., Nature Physics, 3 813-817
(2007).
[7] Caldarelli G., Capocci A., De Los Rios P., Muñoz M. A., Phys. Rev. Lett., 89, (2002) 258702.
[8] Bak P., Sneppen K., Phys. Rev. Lett., 71, 4083-4086 (1993).
[9] Mandelbrot B.B., The variation of certain speculative prices, J. Business 36, 394-419 (1963).
[10] Mandelbrot B.B., How Long Is the Coast of Britain? Statistical Self-Similarity and Fractional Dimension, Science 156, 636-638 (1967).
[11] Niemeyer L., Pietronero L., and Wiesmann H.J., Fractal Dimension of
Dielectric Breakdown, Phys. Rev. Lett. 52, 1033 (1984)
[12] Rodriguez-Iturbe, I., Rinaldo A., Fractal River Networks: Chance and
Self-Organization, Cambridge University Press, New York, (1997).
[13] Brady R.M., Ball, R.C. Fractal growth of Copper electrodeposits Nature
309, 225 (1984).
[14] Batty M., Longley P.A. Fractal Cities: a Geometry of Form and Functions
Academic Press, San Diego (1994)
[15] Mandelbrot B.B., Passoja D.E., Paullay A.J. Fractal character of fracture
surface in metals, Nature 308 721 (1984).
[16] Brown J.H., West G.B. (eds.), Scaling in biology (Oxford University Press,
2000).

[17] Sierpiński W., Sur une courbe dont tout point est un point de ramification, C. R. Acad. Sci. Paris 160, 302-305 (1915).
[18] Eldredge N., Gould S.J., Punctuated equilibria: an alternative to phyletic
gradualism, In T.J.M. Schopf, ed., Models in Paleobiology. San Francisco:
Freeman Cooper. pp. 82-115 (1972). Reprinted in N. Eldredge Time frames.
Princeton: Princeton Univ. Press. 1985
[19] Jensen H. J., Self-Organized Criticality, Cambridge University Press, Cambridge (1998).
[20] Rigon R., Rodríguez-Iturbe I., Rinaldo A., Feasible optimality implies Hack's law, Water Res. Res., 34, 3181-3190 (1998).
[21] Marani M., Maritan A., Caldarelli G., Banavar J.A., Rinaldo A., Stationary self-organized fractal structures in potential force fields, J. Phys. A 31,
337-343, (1998).
[22] Caylor K.K., Scanlon T.M., Rodríguez-Iturbe I., Feasible optimality of vegetation patterns in river basins, Geoph. Res. Lett., 31, L13502 (2004).
[23] Ferrer i Cancho R. and Solé R.V., Optimisation in Complex Networks, Lect. Not. in Phys., 625, 114-126 (2003).
[24] Caldarelli G., Maritan A., Vendruscolo M., Hot sandpiles, Europhys. Lett.
35 481-486 (1996).
[25] Caldarelli G., Mean Field Theory for Ordinary and Hot sandpiles, Physica
A, 252, 295-307 (1998).
[26] Bak P., Tang C. Weisenfeld K., Phys. Rev. Lett. 59, 381 (1987).
[27] Wilkinson D. Willemsen J.F., Invasion Percolation: a new form of Percolation Theory, J. Phys. A 16, 3365-3376 (1983).
[28] Flyvbjerg H., Sneppen K., Bak P., Phys. Rev. Lett. 71, 4087 (1993).
[29] Grassberger P., Phys. Lett. A 200 277 (1995).
[30] Dickman R., Muñoz M.A., Vespignani A., Zapperi S., Braz. J. Phys. 30, 27 (2000).
[31] Benton M.J., The fossil record 2, Chapman and Hall, London. (1993).
[32] De Los Rios P., Marsili M., Vendruscolo M., Phys. Rev. Lett. 80 5746
(1998).
[33] Dorogovtsev S.N., Mendes J.F.F., Pogorelov Y.G., Phys. Rev. E 62 295
(2000).
[34] Marsili M., Europhys. Lett. 28, 385 (1994).
[35] Mikeska B., Phys. Rev. E 55 3708 (1997).
[36] Paczuski M., Maslov S., Bak P., Europhys. Lett. 27 97 (1994).


[37] Caldarelli G., Felici M., Gabrielli A., Pietronero L., Phys. Rev. E 65 (2002)
046101.
[38] M. Felici, G. Caldarelli, A. Gabrielli, L. Pietronero, Phys. Rev. Lett., 86,
(2001) 1896-1899.
[39] P. De Los Rios, M. Marsili and M. Vendruscolo, Phys. Rev. Lett., 80,
(1998) 5746-5749.
[40] Kulkarni R. V., Almaas E. & Stroud D., Evolutionary dynamics in the Bak-Sneppen model on small-world networks, arXiv:cond-mat/9905066.
[41] Moreno Y. & Vazquez A., The Bak-Sneppen model on scale-free networks, Europhys. Lett. 57(5), 765-771 (2002).
[42] Lee, S. & Kim, Y. Coevolutionary dynamics on scale-free networks. Phys.
Rev. E 71, 057102 (2005).
[43] Masuda, N., Goh, K.-I. & Kahng, B. Extremal dynamics on complex networks: Analytic solutions. Phys. Rev. E 72, 066106 (2005).
[44] Garcia G.J.M., Dickman R. Asymmetric dynamics and critical behavior
in the Bak-Sneppen model, Physica A 342, 516-528 (2004).
[45] Middendorf M., Ziv E., Wiggins C.H. Inferring network mechanisms: The
Drosophila melanogaster protein interaction network, Proc. Nat. Acad.
Sci. 102, 3192-3197 (2005).
[46] Giot L et al, A protein interaction map of Drosophila melanogaster, Science
302 1727-36 (2003).
[47] G. Caldarelli, P.G. Higgs and A.J. McKane, Journ. Theor. Biol. 193,
(1998) 345.
[48] Garlaschelli D., Caldarelli G. Pietronero L. Universal scaling relations in
food webs, Nature 423, 165-168 (2003).
[49] Burlando B, Journal Theoretical Biology 146 99-114 (1990).
[50] Burlando B, Journal Theoretical Biology 163 161-172 (1993).
[51] Caretta Cartozo C., Garlaschelli D., Ricotta C., Barthelemy M., Caldarelli
G. J. Phys. A: Math. Theor. 41, 224012 (2008).
[52] D. Garlaschelli, S. Battiston, M. Castri, V.D.P. Servedio and G. Caldarelli,
Phys. A 350, (2005) 491-499.
[53] D. Garlaschelli and M.I. Loffredo, Phys. Rev. Lett. 93, (2004) 188701.
[54] Faloutsos M., Faloutsos P., Faloutsos C., On Power-law Relationships of the Internet Topology, Proc. ACM SIGCOMM, Comp. Comm. Rev., 29, 251-262 (1999).
[55] Adamic L.A., Huberman B.A, Power-Law Distribution of the World Wide
Web, Science 287, 2115 (2000).


[56] Caldarelli G., R. Marchetti R., and Pietronero L., Europhys. Lett. 52, 386
(2000).
[57] R. Pastor-Satorras, A. Vespignani, Phys. Rev. Lett. 86, 3200 (2001).
[58] S. N. Dorogovtsev, A. V. Goltsev, J. F. F. Mendes, Critical phenomena in
complex networks, arXiv:0705.0010v6.
[59] D. Garlaschelli and M.I. Loffredo, Physica A 338(1-2), 113-118 (2004).
[60] D. Garlaschelli and M.I. Loffredo, J. Phys. A: Math. Theor. 41, 224018
(2008).
[61] K.-I. Goh, D.-S. Lee, B. Kahng, D. Kim, Phys. Rev. Lett. 91, 148701
(2003).
[62] Barabási A.-L., Albert R., Emergence of scaling in random networks, Science 286, 509-512 (1999).
[63] Fronczak A., Fronczak P., Holyst J.A., Mean-Field theory for clustering
coefficient in Barabasi-Albert networks, Phys. Rev. E, 68, 046126 (2003).
[64] Barrat A., Pastor-Satorras R., Rate equation approach for correlations in
growing network models, Phys. Rev. E, 71, 036127 (2005).
[65] Bollobás B., Riordan O., The diameter of a scale-free random graph, Combinatorica, 24, 5-34 (2004).
[66] M. Boguñá and R. Pastor-Satorras, Phys. Rev. E 68, (2003) 036112.
[67] V.D.P. Servedio, G. Caldarelli and P. Buttà, Phys. Rev. E 70, (2004) 056126.
[68] J. Park and M.E.J. Newman, Phys. Rev. E 68, (2003) 026112.
[69] D. Garlaschelli and M.I. Loffredo, ArXiv:cond-mat/0609015.
[70] S. Maslov, K. Sneppen, and A. Zaliznyak, Physica A 333, (2004) 529.
[71] D. Garlaschelli, S. E. Ahnert, T. M. A. Fink, G. Caldarelli, arXiv:cond-mat/0606805v1.
[72] Jain S. & Krishna S., Autocatalytic Sets and the Growth of Complexity in an Evolutionary Model, Phys. Rev. Lett. 81, 5684-5687 (1998).
[73] Bianconi G. & Marsili M., Clogging and self-organized criticality in complex networks, Phys. Rev. E 70, 035105(R) (2004).
[74] Fronczak P., Fronczak A. & Holyst J. A., Self-organized criticality and coevolution of network structure and dynamics, Phys. Rev. E 73, 046117 (2006).
[75] Zanette D. H. & Gil S., Opinion spreading and agent segregation on evolving networks, Physica D 224(1-2), 156-165 (2006).


[76] Santos F. C., Pacheco J. M. & Lenaerts T., Cooperation Prevails When Individuals Adjust Their Social Ties, PLoS Comput. Biol. 2(10): e140 (2006).
[77] B. Kozma, A. Barrat, Phys. Rev. E 77, 016102 (2008).
[78] Balcan, D. & Erzan, A. Content-based networks: A pedagogical overview.
CHAOS 17, 026108 (2007).
[79] G. Caldarelli, A. Capocci, D. Garlaschelli, A self-organized model for network evolution, European Physical Journal B, in press (2008).

Chapter 12

Visualizing Dynamics and Higher Order Properties

This chapter is about visualization.
Most real-world networks evolve. The Internet is a dynamic entity; friendship networks evolve; business networks evolve. In a previous chapter we have seen the principle of preferential attachment. In this chapter we will see examples of how networks change in time, how they grow, and how we can observe these changes. In this chapter we will introduce the second high-level tool to study networks, NetLogo.

A second topic that will be covered in this chapter is the implementation of higher-order properties. As has been noted in a previous chapter, the degree of vertices is one of the most useful properties. Vertex degree is a first-order property, since it depends on a single vertex. Higher-order properties depend on neighbors of vertices, and can tell us more about properties of the network as a whole. We will use Gnuplot to visualize these lower- and higher-order properties.

12.1 Network Dynamics

We will start with network dynamics. Recall from the beginning of this course the random graph process. In the next exercise, you will implement the random graph process.

Exercise 12.1 Random graph process. Generate G, a sequence of random graphs, for n = 0, . . . , 100 vertices, using the same four edge probabilities p = 0.0, 0.1, 0.5, 1.0. Do not print the graphs; only print the mean and variance of the number of edges.
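As a warm-up for the exercise (not a full solution), recall that in G(n, p) each of the n(n − 1)/2 possible edges is present independently with probability p, so the number of edges is binomially distributed with mean p n(n − 1)/2 and variance p(1 − p) n(n − 1)/2. A minimal sampling check for a single (n, p) pair, our own sketch:

```python
import random

random.seed(0)  # fixed seed so the sampling check is reproducible

def gnp_edge_count(n, p):
    """Number of edges of one sampled Erdős-Rényi graph G(n, p)."""
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if random.random() < p)

n, p, runs = 100, 0.1, 300
counts = [gnp_edge_count(n, p) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
# theory: E[m] = p*n*(n-1)/2 = 495,  Var[m] = p*(1-p)*n*(n-1)/2 = 445.5
```

Looping this over n = 0, . . . , 100 and the four values of p gives the full exercise.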

12.1.1 NetLogo

Many computer scientists are fascinated by the dynamics of complex networks. One of the tools that can be used to visualize the dynamics of unfolding complex processes is NetLogo. NetLogo is a free agent-based programming language and integrated modeling environment. The NetLogo environment enables exploration of emergent phenomena. It comes with an extensive models library including models in a variety of domains, such as economics, biology, physics, chemistry, psychology, and system dynamics. NetLogo was created by Uri Wilensky of Northwestern University, Evanston, Illinois, USA [1]. It is a wonderful system for exploring and creating an understanding of the dynamics of complex systems, which we gratefully acknowledge.

Installing NetLogo

You should now download and install NetLogo onto your computer. The homepage for NetLogo is http://ccl.northwestern.edu/netlogo/. NetLogo is freely downloadable, and comes with an extensive library of example models. Now go to the webpage (or search for NetLogo, which should get you to the homepage). When downloading, choose version 5.0.5, fill in your name if you wish, and choose your operating system: Linux, Mac OS X, or Windows. If you run into problems, read the manual, look at the tutorial, or ask the course assistants for help. You really want to have NetLogo running on your machine, and to have played a bit with it, before the lecture starts.
To quote the documentation:
NetLogo is a programmable modeling environment for simulating
natural and social phenomena. It was authored by Uri Wilensky
in 1999 and has been in continuous development ever since at the
Center for Connected Learning and Computer-Based Modeling.
NetLogo is particularly well suited for modeling complex systems
developing over time. Modelers can give instructions to hundreds or
thousands of agents all operating independently. This makes it
possible to explore the connection between the micro-level behavior
of individuals and the macro-level patterns that emerge from their
interaction.
NetLogo lets students open simulations and play with them,
exploring their behavior under various conditions. It is also an
authoring environment which enables students, teachers and curriculum developers to create their own models. NetLogo is simple
enough for students and teachers, yet advanced enough to serve as
a powerful tool for researchers in many fields.
NetLogo has extensive documentation and tutorials. It also comes
with the Models Library, a large collection of pre-written simulations that can be used and modified. These simulations address
content areas in the natural and social sciences including biology
and medicine, physics and chemistry, mathematics and computer
science, and economics and social psychology.
Now please start up NetLogo (the normal version, we will not use 3D), and
go to the Models Library (Command-M, Ctrl-M, Alt-M, or File/Models) to
browse through some example models. (Please refer to figure 12.1 for a related
screenshot.)

CHAPTER 12. VISUALIZING DYNAMICS AND HIGHER ORDER PROPERTIES

Figure 12.1: Open Fireworks

Figure 12.2: Preferential Attachment

12.1.2 Preferential Attachment

Preferential attachment is a basic concept of our field. It has been covered


at some depth in section 3.2. We will now load an instructive animation of
preferential attachment. Load the model (see figure 12.2). We repeat the text
from the Info tab of NetLogo.
From chapter 2, recall that in some networks, a few hubs have many connections, while everybody else only has a few. This model shows one way such
networks can arise. Such networks can be found in a surprisingly large range
of real world situations, ranging from the connections between websites to the
collaborations between actors. This model generates these networks by a process of preferential attachment, in which new network members prefer to make
a connection to the more popular existing members. The model starts with two
nodes connected by an edge. At each step, a new node is added. A new node
picks an existing node to connect to randomly, but with some bias. More specifically, a node's chance of being selected is directly proportional to the number of
connections it already has, or its degree. This is the mechanism which is called
preferential attachment.
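The growth rule just described can be sketched in a few lines of Python (a minimal illustration of the mechanism, not the NetLogo implementation; the function name is our own):

```python
import random

def preferential_attachment(n):
    """Grow a network of n nodes by preferential attachment.

    Start with two nodes joined by an edge; every new node attaches to
    one existing node chosen with probability proportional to its degree.
    Returns the list of edges.
    """
    edges = [(0, 1)]
    for new in range(2, n):
        # A uniformly random endpoint of a uniformly random edge is node i
        # with probability degree(i) / (2 * number of edges): exactly the
        # degree-proportional bias described above.
        target = random.choice(random.choice(edges))
        edges.append((new, target))
    return edges

edges = preferential_attachment(1000)
# A handful of early nodes end up as hubs; most nodes keep degree 1.
```

The trick of sampling a random endpoint of a random edge avoids recomputing the full degree distribution at every step.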
Pressing the GO ONCE button adds one new node. To continuously add
nodes, press GO. The LAYOUT? switch controls whether or not the layout
procedure is run. This procedure attempts to move the nodes around to make
the structure of the network easier to see. The PLOT? switch turns off the plots, which speeds up the model. The RESIZE-NODES button will make all of the
nodes take on a size representative of their degree distribution. If you press it
again the nodes will return to equal size. If you want the model to run faster,
you can turn off the LAYOUT? and PLOT? switches and/or freeze the view
(using the on/off button in the control strip over the view). The LAYOUT?
switch has the greatest effect on the speed of the model. If you have LAYOUT?
switched off, and then want the network to have a more appealing layout, press
the REDO-LAYOUT button which will run the layout-step procedure until you
press the button again. You can press REDO-LAYOUT at any time even if you
had LAYOUT? switched on and it will try to make the network easier to see.
Things to Notice
The networks that result from running this model are often called scale-free or power-law networks. These are networks in which the distribution of the number of connections of each node is not a normal distribution; instead it follows what is called a power-law distribution. Power-law distributions are different from normal distributions in that they do not have a peak at the average, and they are more likely to contain extreme values (see Albert & Barabási 2002 for a further description of the frequency and significance of scale-free networks). Barabási and Albert originally described this mechanism for creating networks, but there are other mechanisms of creating scale-free networks, and so the networks created by the mechanism implemented in this model are referred to as Barabási scale-free networks.
You can see the degree distribution of the network in this model by looking
at the plots. The top plot is a histogram of the degree of each node. The
bottom plot shows the same data, but both axes are on a logarithmic scale.
When the degree distribution follows a power law, it appears as a straight line on the log-log plot. One simple way to think about power laws is that if there is one node with a degree of 1000, then there will be ten nodes with a degree of 100, and 100 nodes with a degree of 10.
Exercise 12.2 Let the model run a little while. How many nodes are hubs,
that is, have many connections? How many have only a few? Does some low
degree node ever become a hub? How often? Turn off the LAYOUT? switch and
freeze the view to speed up the model, then allow a large network to form. What
is the shape of the histogram in the top plot? What do you see in log-log plot?
Notice that the log-log plot is only a straight line for a limited range of values.
Why is this? Does the degree to which the log-log plot resembles a straight line grow as you add more nodes to the network?

12.1.3 Percolation Transition

We can use our graph program to study the percolation transition in the clustering of the network that was discussed in chapter 8.


Exercise 12.3 For n = 100, choose three values for p: pick p < 1/n, p = 1/n,
and p > 1/n. Compute the clustering coefficient for all three values of p. Explain
what you see using the concept of percolation transition (refer to the chapter on
random graphs).
Now we turn to NetLogo.
Exercise 12.4 From the Model Library, load Earth Science/Percolation. Run
the Simulation. Read the Info tab.
Try different settings for the porosity. What do you notice about the pattern
of affected soil? Can you find a setting where the oil just keeps sinking, and a
setting where it just stops?
If percolation stops at a certain porosity, it's still possible that it would percolate further at that porosity given a wider view. Note the plot of the size of
the leading edge of oil. Does the value settle down roughly to a constant? How
does this value depend on the porosity?
Give the soil different porosity at different depths. How does it affect the
flow? In a real situation, if you took soil samples, could you reliably predict
how deep an oil spill would go or be likely to go?

12.2 Network Properties

Now that we have found a way to create a network that conforms to a pre-specified degree sequence, it is time to study some network properties. We do so in order to see if the network does indeed have the properties of real-world networks.
The system that you can use to create the plots can be either a generic
spreadsheet such as Microsoft Excel, Apple Numbers, Google Sheets, OpenOffice
Calc, or a dedicated plotting tool such as Gnuplot.

12.2.1

Gnuplot

Gnuplot is a tool that makes it easy to produce statistical plots. This subsection contains a small Gnuplot example to get you going. Refer to your first-year introduction to programming course for more (or use Excel, OpenOffice, etc.). The plots that we will make are quite simple: a sequence of pairs of numbers (x, y) is plotted on a two-dimensional plane. Gnuplot expects an input file with some commands. As an example, let us create a command file example.gpl that looks like the following:
set term postscript enhanced color 20
set output "example.eps"
set xlabel "degree"
set ylabel "frequency"
set xrange [0 : 10]
set yrange [0 : 20]
plot "example.input" using 1:2 title "degree distribution" with lines axes x1y1 ls 1
set term x11
set output

Figure 12.3: Gnuplot of the example file
Assume that you have an input file named example.input that looks as
follows:
1 4
2 8
3 9
4 3
5 2

When you run the command $ gnuplot example.gpl, you should get an output file named example.eps that looks like Figure 12.3.
Homework 12.1 In the chapter on random graphs the clustering coefficient of a graph has been defined as the ratio of triangles to wedges in the graph ($C_G = \Delta_G / W_G$). As formulas for the number of triangles and wedges we recall
$$
\Delta_G = \sum_{i_1,i_2,i_3 \in V} 1_{\{i_1 i_2,\, i_2 i_3,\, i_3 i_1 \text{ are present}\}}, \qquad
W_G = \sum_{i_1,i_2,i_3 \in V} 1_{\{i_1 i_2,\, i_2 i_3 \text{ are present}\}}. \qquad (12.1)
$$
Now add to your program the computation of this wedge-triangle clustering coefficient.
Hint: the definitions of $\Delta_G$ and $W_G$ suggest the use of nested for-loops for a straightforward algorithm.
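That nested-loop algorithm can be sketched in Python as follows (illustrative only; we use an adjacency-matrix representation, skip the degenerate case i3 = i1 so that a wedge is a genuine path of length two, and the names are our own):

```python
def wedge_triangle_clustering(adj):
    """Clustering coefficient C_G = Delta_G / W_G by brute force.

    adj is a symmetric 0/1 adjacency matrix (list of lists) with zero
    diagonal. Delta_G counts ordered triples forming a triangle; W_G
    counts ordered triples forming a wedge, as in equation (12.1).
    """
    n = len(adj)
    triangles = 0  # Delta_G
    wedges = 0     # W_G
    for i1 in range(n):
        for i2 in range(n):
            for i3 in range(n):
                if i3 == i1:
                    continue  # skip the degenerate back-and-forth "wedge"
                if adj[i1][i2] and adj[i2][i3]:
                    wedges += 1
                    if adj[i3][i1]:
                        triangles += 1
    return triangles / wedges if wedges else 0.0

# A triangle on vertices 0, 1, 2 plus a pendant vertex 3 attached to 2:
adj = [[0, 1, 1, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 1],
       [0, 0, 1, 0]]
print(wedge_triangle_clustering(adj))  # → 0.6
```

Each triangle is counted six times (once per ordering) in both counts, so the ratio agrees with the unordered definition; the O(V^3) triple loop is fine for the small graphs in these exercises.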

12.2.2 Empirical Network Properties

We can now start preparing graphs of network properties. In the following exercise you will create plots for three graphs: for the 10-vertex Erdős-Rényi
graph, and for the 10-vertex and the 20-vertex Repeated Configuration Model graphs. Create plots for each of the following network properties: empirical degree distribution, empirical average nearest-neighbour degree, empirical local clustering coefficient, and empirical clustering coefficient versus degree. See section 4.2.2.
1. Empirical Degree Distribution
The degree distribution is a first-order topological property. Plot on the x-axis the value k of the degree, from 0 to the highest occurring degree value, and on the y-axis the count of the number of nodes that have that degree.
2. Empirical Average Nearest-Neighbour Degree
A second-order topological property is the average nearest-neighbour degree of a vertex i, or k_i^nn. It can be computed by averaging k_i^nn over all vertices with the same degree k and plotting this value on the y-axis against the value of k on the x-axis. Are degrees positively correlated (are high-degree vertices on average linked to high-degree vertices, i.e. does the network display assortative mixing)?
3. Empirical Local Clustering Coefficient
The most studied third-order property of a vertex is the local clustering
coefficient C_i. It is defined as the number of links connecting the neighbours of node i to each other, divided by the total number of pairs of
neighbours of node i.
4. Empirical Clustering Coefficient versus Degree
As chapter 4 noted, a statistical way to consider the clustering properties of real networks is to compute the average value of C_i over all vertices with a given degree k and to plot this value (on the y-axis) against k (on the x-axis). Note that in many real networks the average clustering decreases as k increases. What do your three networks do?
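The per-degree statistics above can be computed with a few short loops. The following Python sketch is one possible shape, using an adjacency-set representation (the representation and all names are our own, not the course's reference code):

```python
from collections import defaultdict

def empirical_properties(adj):
    """Per-degree statistics of a graph given as {vertex: set(neighbours)}.

    Returns three dicts keyed by degree k: the number of nodes of
    degree k, the average nearest-neighbour degree, and the average
    local clustering coefficient.
    """
    degree = {i: len(adj[i]) for i in adj}
    count = defaultdict(int)
    knn_sum = defaultdict(float)
    clust_sum = defaultdict(float)
    for i in adj:
        k = degree[i]
        count[k] += 1
        if k > 0:
            # average degree of i's neighbours
            knn_sum[k] += sum(degree[j] for j in adj[i]) / k
        if k > 1:
            # links among i's neighbours, over possible neighbour pairs
            links = sum(1 for j in adj[i] for l in adj[i]
                        if j < l and l in adj[j])
            clust_sum[k] += links / (k * (k - 1) / 2)
    knn = {k: knn_sum[k] / count[k] for k in count if k > 0}
    clustering = {k: clust_sum[k] / count[k] for k in count if k > 1}
    return count, knn, clustering

# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2:
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
count, knn, clustering = empirical_properties(adj)
```

The three dictionaries map directly onto the x- and y-axes of plots 1, 2, and 4; plot 3 uses the per-vertex C_i values before the averaging step.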
Homework 12.2 Plotting Network Properties
From the above list, create the program to produce the four plots of the network properties for each of the three networks, 12 plots in total. Answer the questions that were asked at plots 2 and 4.
Note that chapter 4 contains more higher-order properties of networks. Some are easy to implement, some take a bit more work.
Exercise 12.5 It is instructive to see the behavior of these other properties on your networks (and you can of course define your own networks by defining new degree sequences).

Bibliography
[1] Wilensky, U. (1999). NetLogo. http://ccl.northwestern.edu/netlogo/.
Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.


Chapter 13

Analysing Real Networks


In the previous chapters we have studied the behavior of regular random networks, of the configuration model, and of their network properties. All networks
so far have been artificially generated networks. In a sense, this was our goal:
to see if we could find a way to generate synthetic networks that nevertheless
display the same properties as real-world networks.
In this final chapter we will have a look at the properties of real real-world
networks. From a repository of real-world networks we will download a few
networks, load them into our program, and study their properties.

13.1 Adjacency Lists

To do so, our programs must be able to handle real networks. The networks that we generated so far consist of modest numbers of vertices and edges that fit easily in the memory of our computer. Real real-world networks consist of larger numbers of vertices and edges. Real real-world networks are also invariably sparse.
Storing large sparse networks in an adjacency matrix would mean allocating storage for V^2 entries that would mostly be zero. A more memory-efficient method is to store only the edges that are present. For sparse networks with O(E) ≈ O(V) edges, this saves O(V^2) space for large V.
Exercise 13.1 Why?
We will now change the primary data structure of the programs that we have
used so far from an adjacency matrix to an adjacency list. If you have applied the
principles of modularity and abstraction in the data structure implementation,
then it should now be easy to replace the implementation by another one.
An adjacency list is a linked list, which you may recall from your introductory programming classes.1 In C/C++ you may implement a linked list from
scratch, or you may use a library implementation such as the Boost library.2
1. You may want to read up on adjacency lists and linked-list data structures on Wikipedia or on programming websites such as Stack Overflow.
2. www.boost.org

Python and Java provide similar facilities. The web abounds with suitable implementations. (In fact, there are so many out there that some people may find implementing their own linked list quicker.)
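In Python, for instance, the role of the linked list can be played by built-in containers; a dictionary mapping each vertex to a set of neighbours is one minimal sketch (the class and method names are our own):

```python
class AdjacencyList:
    """Sparse undirected graph stored as vertex -> set of neighbours."""

    def __init__(self):
        self.adj = {}

    def add_edge(self, u, v):
        # setdefault creates the neighbour set on first use, so vertices
        # need not be declared in advance.
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def degree(self, v):
        return len(self.adj.get(v, set()))

    def num_edges(self):
        # each undirected edge is stored twice, once per endpoint
        return sum(len(s) for s in self.adj.values()) // 2

g = AdjacencyList()
g.add_edge(1, 2)
g.add_edge(2, 3)
print(g.num_edges(), g.degree(2))  # → 2 2
```

Storage is proportional to V + E rather than V^2, which is what makes the large SNAP networks below fit in memory.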
Exercise 13.2 Implement an adjacency list data structure for your program.
Check that it works correctly with the graphs that the program generates.
Homework 13.1 Recompile and test your Erdős-Rényi program using adjacency lists. Use as a test case one hundred thousand vertices and a connection probability of p = 0.001. Print the number of edges. Does your program work?
Use progressively larger values of p. For which p value does the program stop
working? For how many edges does it still work?
Exercise 13.3 Mean and variance of edges
Incorporate the adjacency list implementation in your Erdős-Rényi program from Homework Exercise 6.2, where you computed the mean and variance of 10 random graphs. Choose a large value V = 1000000 and a small value p = 0.001. Compute and print the mean number of edges and the variance.

13.2 SNAP Real Networks

There are many repositories of real-world networks. One such repository is the
Stanford Network Analysis Project, or SNAP, at http://snap.stanford.edu.
Go to SNAP, and choose the Stanford Large Network Dataset Collection (the
third big blue entry).
Download cit-HepPh and roadNet-CA. The *.gz file type is gzip; you can unpack the text file with gunzip.
The structure of the files is simple. Use a viewer or an editor (such as
Wordpad, TextEdit, VI, or Emacs) to view the files. Note that the files contain
a few lines of commentary (preceded by a hash-mark #) and then lines in which
the edges are specified as vertex-vertex pairs.
We will now add an input routine to the program, which reads the graph
from an input file.
Homework 13.2 Write a short input routine for reading in the graph from the input file. You can choose your own solution: recognize comments and skip those lines, edit the files and manually remove the comments before feeding them to your program, or another solution that works. Create a short 5-line example graph to test your input routine. Test the program.
Go back to your programming course notes for tips on how to implement file
management and reading and parsing input.
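One possible shape for such a routine in Python, skipping the '#' comment lines (a sketch, not the required solution; the function name is our own):

```python
def read_edge_list(path):
    """Read a SNAP-style edge list: '#' comment lines, then 'u v' pairs.

    Returns the edges as a list of (int, int) tuples.
    """
    edges = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip comments and blank lines
            u, v = line.split()[:2]
            edges.append((int(u), int(v)))
    return edges
```

The returned pairs can then be fed into the adjacency list structure edge by edge.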
Homework 13.3 Properties of citation network
Incorporate the input routine in your adjacency list program. Read in real real-world networks from the Stanford Large Network Dataset Collection, specifically the citation network cit-HepPh. Plot the same four network properties as for Homework Exercise 12.2 for the Repeated Configuration Model: Degree Distribution, Average Nearest-Neighbour Degree, Local Clustering Coefficient, and Clustering Coefficient versus Degree.


Homework 13.4 Are the plots different? How are they different? Is the Repeated Configuration Model a good approximation of a real real-world network?
Homework 13.5 Create the same plots for the roadNet-CA network. How is
the roadmap network different from the citation network?
Homework 13.6 Now that you have seen the properties of real real-world networks, can you create a pre-specified degree sequence for the repeated configuration model that comes closer to a real real-world network than the previous network from exercise 7.3? Draw up this sequence, and provide the four plots.
