
UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL

INSTITUTO DE INFORMÁTICA
PROGRAMA DE PÓS-GRADUAÇÃO EM COMPUTAÇÃO

MARCELO CAGGIANI LUIZELLI

Scalable Cost-Efficient Placement and
Chaining of Virtual Network Functions

Thesis presented in partial fulfillment
of the requirements for the degree of
Doctor of Computer Science

Advisor: Prof. Dr. Luciano Paschoal Gaspary


Coadvisor: Prof. Dr. Luciana Salete Buriol

Porto Alegre
August 2017
CIP — CATALOGING-IN-PUBLICATION

Luizelli, Marcelo Caggiani

Scalable Cost-Efficient Placement and Chaining of Virtual
Network Functions / Marcelo Caggiani Luizelli. – Porto Alegre:
PPGC da UFRGS, 2017.

157 f.: il.

Thesis (Ph.D.) – Universidade Federal do Rio Grande do Sul.
Programa de Pós-Graduação em Computação, Porto Alegre, BR–RS, 2017. Advisor: Luciano
Paschoal Gaspary; Coadvisor: Luciana Salete Buriol.

1. Network Function Virtualization. 2. Service Chaining.
3. NFV Orchestration. 4. Combinatorial Optimization. 5. Mathematical
Programming. 6. Math-heuristic. 7. Variable Neighborhood
Search. 8. Operational Cost. 9. Open vSwitch. 10. Performance
Evaluation. I. Gaspary, Luciano Paschoal. II. Buriol,
Luciana Salete. III. Title.

UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL


Rector: Prof. Rui Vicente Oppermann
Vice-Rector: Prof. Jane Fraga Tutikian
Dean of Graduate Studies: Prof. Celso Giannetti Loureiro Chaves
Director of the Instituto de Informática: Prof. Carla Maria Dal Sasso Freitas
PPGC Coordinator: Prof. João Luiz Dihl Comba
Chief Librarian of the Instituto de Informática: Beatriz Regina Bastos Haro
“The formulation of a problem is often more essential than its solution, which may be
merely a matter of mathematical or experimental skill. To raise new questions, new
possibilities, to regard old problems from a new angle, requires creative imagination
and marks real advance in science.”
– ALBERT EINSTEIN

“Science is a way of life. Science is a perspective. Science is the process
that takes us from confusion to understanding in a manner that’s precise,
predictive and reliable – a transformation, for those lucky enough to
experience it, that is empowering and emotional.”
– BRIAN GREENE
ACKNOWLEDGMENTS

First of all, I would like to thank God for giving me such an amazing opportunity in life.
I would like to thank my parents and brothers for their unconditional support. I am well
aware that, while I was a student, our time together was short and our joyful moments
sporadic. If I am taking one more step ahead, this is mainly because you have always
supported me. For that, I will always be thankful.
I would especially like to thank Fernanda, let’s say my "wife", above all for her patience
and companionship during all these years of study. Certainly, I would not have gotten this far
without all your support and care. I love you!
I would like to thank my advisors – Prof. Luciano Paschoal and Prof. Luciana Buriol –
who have always been incredibly available when I needed them most. Thanks a lot for the
technical and philosophical discussions, as well as for that "little push" whenever idle times
came about. Although this time is coming to an end, I will try to stick around.
I am also really thankful to Prof. Danny Raz, who hosted me incredibly well during my time
abroad. My time in Israel would not have been the same without all the support given by Jose
Yallouz and his family. Thanks a lot to Itzik Ashkenazi (from the Technion) for the very good
moments during our working time – it was a pleasure to work together. Further, I would also
like to thank all the guys from NOKIA Bell Labs Israel whom I had the pleasure to work with
– Yaniv Saar, Gil Einziger, Erez Waisbard and Shachar Beiser.
Last, my sincere gratitude goes as well to all members of the computer networking group at
UFRGS.
ABSTRACT

Network Function Virtualization (NFV) is a novel concept that is reshaping the middlebox
arena, shifting network functions (e.g., firewalls, gateways, proxies) from specialized hardware
appliances to software images running on commodity hardware. This concept has the potential to
make network function provision and operation more flexible and cost-effective, paramount in
a world where deployed middleboxes may easily reach the order of hundreds. Despite recent
research activity in the field, little has been done towards scalable and cost-efficient placement
& chaining of virtual network functions (VNFs) – a key feature for the effective success of
NFV. More specifically, existing strategies have neglected the chaining aspect of NFV (focus-
ing on efficient placement only), failed to scale to hundreds of network functions and relied
on unrealistic operational costs. In this thesis, we approach VNF placement and chaining as
an optimization problem in both inter- and intra-datacenter contexts. First, we formalize the
Virtual Network Function Placement and Chaining (VNFPC) problem and propose an Integer
Linear Programming (ILP) model to solve it. The goal is to minimize the required resource
allocation while meeting network flow requirements and constraints. Then, we address the
scalability of the VNFPC problem to solve large instances (i.e., thousands of NFV nodes) by
proposing a fix-and-optimize-based heuristic algorithm. Our algorithm incorporates a Variable
Neighborhood Search (VNS) meta-heuristic for efficiently exploring the placement and chaining
solution space. Further, we assess the performance limitations of typical NFV-based deployments
and the incurred operational costs of commodity servers, and propose an analytical model
that accurately predicts the operational costs for arbitrary service chain requirements. Then, we
develop a general service chain intra-datacenter deployment mechanism (named OCM – Operational
Cost Minimization) that considers both the actual performance of the service chains
(e.g., CPU requirements) and the incurred operational cost. Our novel algorithm is based
on an extension of the well-known reduction from weighted matching to the min-cost flow
problem. Finally, we tackle the problem of monitoring service chains in NFV-based environments.
For that, we introduce the DNM (Distributed Network Monitoring) problem and propose an
optimization model to solve it. DNM allows service chain segments to be independently
monitored, enabling specialized network monitoring requirements to be met in an efficient and
coordinated way. Results show that the proposed ILP model for the VNFPC problem leads to a
reduction of up to 25% in end-to-end delays (in comparison to chainings observed in traditional
infrastructures) and an acceptable resource over-provisioning limited to 4%. Also, we provide
strong evidence that our fix-and-optimize-based heuristic is able to find feasible, high-quality
solutions efficiently, even in scenarios scaling to thousands of VNFs. Further, we provide
in-depth insights on network performance metrics (such as throughput, CPU utilization and packet
processing) and their current limitations under typical deployment strategies. Our
OCM algorithm significantly reduces operational costs when compared to the de facto standard
placement mechanisms used in Cloud systems. Last, our DNM model allows finer-grained network
monitoring with limited overheads. By coordinating the placement of monitoring sinks
and the forwarding of network monitoring traffic, DNM can reduce the number of monitoring
sinks and the network resource consumption (54% lower than a traditional method).

Keywords: Network Function Virtualization. Service Chaining. NFV Orchestration. Combinatorial
Optimization. Mathematical Programming. Math-heuristic. Variable Neighborhood
Search. Operational Cost. Open vSwitch. Performance Evaluation.
Scalable and Low-Cost Placement and Chaining of Virtualized Network Functions

RESUMO

Network Function Virtualization (NFV) is a new architectural concept that is reshaping the
operation of network functions (e.g., firewalls, gateways and proxies). The main idea of NFV is
to decouple the logic of network functions from specialized hardware devices, thereby allowing
software images to run on commodity hardware (COTS – Commercial Off-The-Shelf). NFV has
the potential to make the operation of network functions more flexible and cost-effective, which
is paramount in environments where the number of deployed functions can easily reach the order
of hundreds. Despite intense research activity in the field, the problem of placing and chaining
virtual network functions (VNF – Virtual Network Functions) in a scalable and low-cost manner
still presents a number of limitations. More specifically, existing strategies in the literature
neglect the chaining aspect of VNFs (i.e., they focus mainly on placement), do not scale to the
size of NFV infrastructures (i.e., thousands of nodes with computing capacity) and, lastly, base
the quality of the obtained solutions on unrepresentative operational costs. In this thesis, we
approach the placement and chaining of virtualized network functions (VNFPC – Virtual Network
Function Placement and Chaining) as an optimization problem in the intra- and inter-datacenter
context. First, we formalize the VNFPC problem and propose an Integer Linear Programming (ILP)
model to solve it. The goal is to minimize resource allocation while meeting network flow
requirements and constraints. Second, we address the scalability of the VNFPC problem in order
to solve large instances (i.e., thousands of NFV nodes). We propose a fix-and-optimize-based
heuristic algorithm that incorporates the Variable Neighborhood Search (VNS) meta-heuristic to
efficiently explore the solution space of the VNFPC problem. Third, we assess the performance
limitations and operational costs of typical provisioning strategies in real NFV environments.
Based on the collected empirical results, we propose an analytical model that estimates with high
accuracy the operational costs for arbitrary VNF requirements. Fourth, we develop a mechanism
for deploying VNF chains in the intra-datacenter context. The proposed algorithm (OCM –
Operational Cost Minimization) is based on an extension of the well-known reduction from the
weighted matching problem (i.e., weighted perfect matching problem) to the minimum-cost flow
problem (i.e., min-cost flow problem) and considers both the performance of VNFs (e.g., CPU
requirements) and the estimated operational costs. The results show that the proposed ILP model
for the VNFPC problem reduces end-to-end delays by up to 25% (compared to the chainings
observed in traditional infrastructures) with an acceptable resource over-provisioning, limited to
4%. Moreover, the results show that the proposed heuristic (based on fix-and-optimize) is able
to efficiently find high-quality feasible solutions, even in scenarios with thousands of VNFs. In
addition, we provide a better understanding of network performance metrics (e.g., throughput,
CPU consumption and packet processing capacity) for the typical VNF deployment strategies
adopted in NFV infrastructures. Finally, the algorithm proposed for the intra-datacenter context
(i.e., OCM) significantly reduces operational costs when compared to the typical placement
mechanisms used in NFV environments.
Keywords: Virtualized Network Functions. Service Chaining. NFV Orchestration. Combinatorial
Optimization. Mathematical Programming. Operational Cost. Performance Evaluation.
LIST OF ABBREVIATIONS AND ACRONYMS

BA-2 Barabási-Albert Model

BPP Bin Packing Problem

CAPEX Capital Expenditures

DAG Directed Acyclic Graph

DDoS Distributed Denial of Service

DNM Distributed Network Monitoring

DPDK Data Plane Development Kit

DPI Deep Packet Inspection

DUT Device Under Test

ETSI European Telecommunications Standards Institute

GRE Generic Routing Encapsulation

GSO Generic Segmentation Offload

HH Heavy Hitter

HTTP Hypertext Transfer Protocol

HV Hypervisor

IETF Internet Engineering Task Force

ILP Integer Linear Programming

IMS IP Multimedia Subsystem

ISP Internet Service Provider

KVM Kernel-based Virtual Machine

MANO NFV Management and Orchestration

MPLS Multiprotocol Label Switching

N-PoP Network Point of Presence

NF Network Function

NFV Network Function Virtualization

NFVI NFV Infrastructure

NFVO NFV Orchestrator


NIC Network Interface Card

NUMA Non-Uniform Memory Access

NSH Network Service Header

OCM Operational Cost Minimization

OPEX Operational Expenditures

OPNFV Open Platform for NFV

OVS Open vSwitch

OVS-DPDK DPDK-enabled OVS

PCIe Peripheral Component Interconnect Express

PMD Poll Mode Driver

RAN Radio Access Network

RSS Receive Side Scaling

SDN Software Defined Networking

SFC Service Function Chaining

TLB Translation Lookaside Buffer

TSO TCP Segmentation Offload

VIM Virtualized Infrastructure Manager

VNE Virtual Network Embedding Problem

VNF Virtualized Network Function

VNFM VNF Manager

VNFPC Virtual Network Function Placement and Chaining Problem

vNIC Virtual NIC

VNS Variable Neighborhood Search

VPN Virtual Private Network

VXLAN Virtual Extensible LAN


LIST OF FIGURES

1.1 Examples of SFCs, and partial view of the network backbone (focusing on the set of N-PoPs available for placing VNFs) considered in our scenario.
1.2 Example of strategies to deploy three given Service Function Chains.

2.1 General NFV architecture proposed by the ETSI.
2.2 Overview of SFC deployment.

3.1 Example SFC deployment on a physical infrastructure to fulfill a number of requests.
3.2 Basic topological components of SFC requests.
3.3 Bin Packing instance reduction to VNFPC Problem.
3.4 Average number of network function instances.
3.5 Average CPU overhead of network function instances.
3.6 Average bandwidth overhead of SFCs deployed in the infrastructure.
3.7 Average end-to-end delay of SFCs deployed in the infrastructure.
3.8 Mixed scenario including Components 1, 2, and 4.
3.9 Scenario considering a medium-size NFV infrastructure and components of type 4.

4.1 A step-by-step run of our algorithm, for the scenario shown in Figure 1.1.
4.2 Number of deployed VNFs and required time to compute a feasible placement and chaining.
4.3 Analysis of resource commitment (computing power and bandwidth) of each solution generated.

5.1 Example of strategies to deploy three given Service Function Chains.
5.2 Given a set of m service chains Φ, illustrating deployment of VNFs on a single server (our DUT).
5.3 Experiment results showing throughput, packet processing and CPU consumption for traffic generated in 100-byte packets, examined over increasing chain size, on a DUT installed with kernel OVS.
5.4 Experiment results showing throughput, packet processing and CPU consumption for traffic generated in 100-byte packets, examined over increasing chain size, on a DUT installed with DPDK-OVS.
5.5 Throughput for traffic generated in 1500-byte packets, examined over increasing chain size, on a DUT installed with kernel OVS.
5.6 Throughput for traffic generated in 1500-byte packets, examined over increasing chain size, on a DUT installed with DPDK-OVS.
5.7 Analysis of multiple servers, showing total throughput for traffic generated in 1500-byte packets, examined over increasing chain size.
5.8 CPU cost over different service chain lengths (ϕ|n), while receiving traffic generated in 1500-byte packets, for both placement functions on servers installed with kernel OVS.
5.9 CPU cost over different service chain lengths (ϕ|n), while receiving traffic generated in 1500-byte packets, for both placement functions on servers installed with DPDK-OVS.
5.10 CPU cost over different packet processing requirements (ϕ|p) with 1500-byte packets, for both placement functions.

6.1 Given a set of five service chains Φ and a set of k servers S, illustrating deployment strategies.
6.2 Server Sj deployed with r sub-chains from different network services.
6.3 The extra effort overhead cost as a function of the number of sub-chains that exist in the server.
6.4 Run time analysis of OCM compared to optimal solution.
6.5 Analysis of service chains deployment on NFV servers with Open vSwitch in kernel mode.
6.6 Analysis of service chains deployment on NFV servers with Open vSwitch in DPDK mode.

7.1 Coordinated network monitoring of SFC-based network services.
7.2 Performance degradation when sampling in software switching.
7.3 Example of a solution for the Distributed Network Monitoring (DNM) problem.
7.4 Number of monitoring sinks required for different sampling rates. For this evaluation, we consider α = 1 and β = 1.
7.5 Network monitoring traffic demanded, consumed and saved when applying DNM solutions.
7.6 Required time to compute the optimal solution for the DNM problem.

A.1 Example of SFCs and a partial view of the network infrastructure.
A.2 Example of strategies to provision SFCs.
LIST OF TABLES

2.1 Summary of related approaches to the VNFPC problem.

3.1 Glossary of symbols and functions related to the optimization model.
3.2 Processing times of physical and virtual network functions used in our evaluation.

4.1 Assessment of the quality of generated solutions, and distance to lower bound, under various parameter settings.

5.1 Coefficients α and β, and the constant factor γ, for each CPU-cost sub-function.
LIST OF ALGORITHMS

1 Overview of the proposed heuristic.
2 Overview of the fix-and-optimize heuristic for the VNF placement and chaining problem.
3 Optimal service chain placement
CONTENTS

1 INTRODUCTION
1.1 Problem Statement
1.2 Hypothesis
1.3 Goals and Contributions
1.4 Organization

2 BACKGROUND AND STATE-OF-THE-ART
2.1 Network Function Virtualization
2.2 Service Function Chaining
2.3 NFV Enabling Technologies
2.3.1 Performance Limitations of Commodity Hardware
2.3.2 Hardware Acceleration Technologies
2.3.3 Virtual Switching
2.4 Related Work

3 PIECING TOGETHER THE NFV PROVISIONING PUZZLE: EFFICIENT PLACEMENT AND CHAINING OF VIRTUAL NETWORK FUNCTIONS
3.1 Problem Overview and Optimization Model
3.1.1 Topological Components of SFC Requests
3.1.2 Proof of NP-completeness
3.1.3 Model Description and Notation
3.1.4 Model Formulation
3.2 Proposed Heuristic
3.3 Evaluation
3.3.1 Setup
3.3.2 Results

4 A FIX-AND-OPTIMIZE APPROACH FOR EFFICIENT AND LARGE SCALE VIRTUAL NETWORK FUNCTION PLACEMENT AND CHAINING
4.1 Fix-and-Optimize Heuristic for the VNF Placement & Chaining Problem
4.1.1 Overview
4.1.2 Inputs and Output
4.1.3 Obtaining an Initial Configuration
4.1.4 Variable Neighborhood Search
4.1.5 Neighborhood Selection and Prioritization
4.1.6 Configuration Decomposition and Optimization
4.2 Evaluation
4.2.1 Setup
4.2.2 Our Heuristic Algorithm Compared to Existing Approaches
4.2.3 Qualitative Analysis of Generated Solutions
4.2.4 Sensitivity Analysis

5 THE ACTUAL COST OF SOFTWARE SWITCHING FOR NFV CHAINING
5.1 Problem Overview
5.2 Model Definition
5.3 Deployment Evaluation
5.3.1 Setup
5.3.2 Evaluating Packet Intense Traffic
5.3.3 Evaluating Throughput Intense Traffic
5.4 Monolithic Cost Function
5.4.1 Building an Abstract Cost Function
5.4.2 Insights

6 OPTIMIZING OPERATIONAL COSTS OF NFV SERVICE CHAINING
6.1 Model and Problem Definition
6.2 Operational Cost Optimized Placement
6.2.1 The Operational Cost of Switching
6.2.2 Chain Partitioning
6.2.3 Using Matching to Find Optimal Cost Placement
6.2.4 Operational Cost Minimization Algorithm
6.3 Evaluation
6.3.1 Setup
6.3.2 Comparison to OpenStack
6.3.3 Results

7 OPTIMIZING DISTRIBUTED NETWORK MONITORING IN NFV
7.1 Problem Motivation
7.2 The Cost of Packet-Level Monitoring in Software Switching
7.3 Distributed Network Monitoring: Problem Overview and Optimization Model
7.3.1 Problem Overview
7.3.2 Model Description and Notation
7.3.3 Model Formulation
7.4 Evaluation
7.4.1 Setup
7.4.2 Results

8 FINAL CONSIDERATIONS
8.1 Conclusions
8.2 Future Research Directions
8.3 Achievements

REFERENCES

APPENDIX A EXTENDED SUMMARY OF THE THESIS
A.1 Problem Definition
A.2 Goals and Contributions

APPENDIX B PAPER PUBLISHED AT IFIP/IEEE IM 2015

APPENDIX C JOURNAL PAPER PUBLISHED AT ELSEVIER COMPUTER COMMUNICATIONS

APPENDIX D PAPER PUBLISHED AT IFIP/IEEE IM 2017



1 INTRODUCTION

Network Functions (NF) play an essential role in today’s networks, as they support a diverse
set of functions ranging from security (e.g., firewalling and intrusion detection) to performance
(e.g., caching and proxying) (MARTINS et al., 2014). As currently implemented, middleboxes
are difficult to deploy and maintain. This is mainly because cumbersome procedures need to
be followed, such as dealing with a variety of custom-made hardware interfaces and manually
chaining middleboxes to ensure the desired network behavior. Further, studies show that the
number of middleboxes in enterprise networks (as well as in datacenter and ISP networks) is
similar to the number of forwarding devices (BENSON; AKELLA; SHAIKH, 2011; SEKAR
et al., 2012). Thus, the aforementioned difficulties are exacerbated by the complexity imposed
by the high number of network functions that a network provider has to cope with, leading to
high operational expenditures. Moreover, in addition to the costs of manually deploying and
chaining middleboxes, the need for frequent hardware upgrades requires substantial capital
investments.
Network Function Virtualization (NFV) has been proposed to shift middlebox processing
from specialized hardware appliances to software running on commoditized hardware (GROUP,
2012). In addition to potentially reducing acquisition and maintenance costs, NFV is expected
to allow network providers to make the most of the benefits of virtualization in the management
of network functions (e.g., elasticity, performance, and flexibility). In this context,
Software-Defined Networking (SDN) can be considered a convenient complementary technology, which,
if available, has the potential to make the chaining of the aforementioned network functions
much easier. In fact, it is not unreasonable to state that SDN has the potential to revamp the
Service Function Chaining (SFC) problem.
In short, the problem consists of making sure network flows go efficiently through end-to-
end paths traversing sets of network functions (MECHTRI et al., 2017). In the NFV/SDN realm
and considering the flexibility offered by this environment, the problem consists of (sub)optimally
defining how many instances of virtual network functions (VNF) are necessary and where to
place them in the infrastructure. Furthermore, the problem encompasses the determination of
end-to-end paths over which known network flows have to be transmitted so as to pass through
the required placed network functions.
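The chaining requirement described above can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the optimization model proposed in this thesis: the function name `chain_is_satisfied`, the N-PoP labels, and the simplification that a flow uses exactly one instance of each VNF type are assumptions made for this example. Given where each required VNF was placed, it checks whether a routed path visits the hosting N-PoPs in the order the SFC prescribes.

```python
from typing import Dict, List, Sequence


def chain_is_satisfied(
    chain: Sequence[str],
    placement: Dict[str, str],
    path: List[str],
) -> bool:
    """Return True if `path` visits the N-PoP of each VNF in `chain`, in order.

    Hypothetical helper for illustration only.
    chain:     ordered VNF types the flow must traverse, e.g. ["VNF1", "VNF2"]
    placement: VNF type -> N-PoP hosting the instance used by this flow
    path:      ordered N-PoPs/forwarding devices traversed by the flow
    """
    position = 0  # earliest index in `path` where the next VNF may be served
    for vnf in chain:
        host = placement.get(vnf)
        if host is None:
            return False  # the required VNF was not placed anywhere
        try:
            # Searching from `position` onward enforces the chain order;
            # consecutive VNFs may share the same N-PoP.
            position = path.index(host, position)
        except ValueError:
            return False  # path never reaches the host after the previous VNF
    return True
```

For example, with VNF 1 on N-PoP `N2` and VNF 2 on `N5`, the path `N1, N2, N5, N4` satisfies the chain, whereas `N1, N5, N2, N4` does not, because it reaches VNF 2 before VNF 1.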

1.1 Problem Statement

There have been significant achievements in NFV, addressing aspects from effective plan-
ning and deployment (LEWIN-EYTAN et al., 2015; KUO et al., 2016; LUIZELLI et al.,
2017a) to efficient operation and management (HWANG; RAMAKRISHNAN; WOOD, 2014;
GEMBER-JACOBSON et al., 2014; ZHANG et al., 2016). Nevertheless, NFV is a relatively
new and still maturing paradigm, with various open research questions. As mentioned, one of
the most challenging aspects is how to efficiently find a proper VNF placement and chaining.
This problem is particularly challenging for many reasons. First, NFV is an inherently
distributed network design based on small cloud nodes spread over the network infrastructure.
Therefore, depending on how VNFs are positioned and chained in the infrastructure, end-to-
end latencies may become intolerable. This problem is aggravated by the fact that processing
times tend to be higher, due to the use of virtualization, and may vary, depending on the type
of network function and the hardware configuration of the device hosting it. Second, even
when deploying VNFs in a single (small) datacenter, network services might face performance
penalties (and limitations) on critical network metrics (such as throughput, latency, and jitter)
depending on how network functions are chained and deployed onto physical servers. Third,
resource allocation must be performed in a cost-effective manner, preventing over- or under-
provisioning of resources. Therefore, placing network functions and programming network
flows in a cost-effective manner while ensuring the required network service performance (e.g.,
maximum tolerable end-to-end delays) represent an essential step towards enabling the use of
NFV in production environments. In the following paragraphs, we provide an overview of the
virtual network function placement and chaining (VNFPC) problems tackled in this thesis.
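The latency concern above can be made concrete with a simple additive delay model, sketched below in Python under stated assumptions: end-to-end delay is taken as the sum of link delays along the routed path plus the processing time of each traversed VNF, and queueing and contention are ignored. The function names and all delay values are hypothetical; a virtualized function with a higher processing time would simply carry a larger entry in `processing_time`.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[str, str]


def link_delay_of(link_delay: Dict[Link, float], u: str, v: str) -> float:
    """Look up an undirected link in either orientation (KeyError if absent)."""
    if (u, v) in link_delay:
        return link_delay[(u, v)]
    return link_delay[(v, u)]


def end_to_end_delay(
    path: List[str],
    link_delay: Dict[Link, float],
    chain: Sequence[str],
    processing_time: Dict[str, float],
) -> float:
    """Assumed additive model: link delays along `path` plus the processing
    time of every VNF in `chain`. Queueing and contention are ignored."""
    transmission = sum(link_delay_of(link_delay, u, v) for u, v in zip(path, path[1:]))
    processing = sum(processing_time[vnf] for vnf in chain)
    return transmission + processing


# Hypothetical values in milliseconds, for illustration only.
links = {("A", "N2"): 1.0, ("N2", "N5"): 2.0, ("N5", "B"): 1.5}
proc = {"VNF1": 0.5, "VNF2": 0.25}
print(end_to_end_delay(["A", "N2", "N5", "B"], links, ["VNF1", "VNF2"], proc))  # 5.25
```

Under this model, placing a VNF on a distant N-PoP lengthens the path term, while a slower (e.g., virtualized) function inflates the processing term; both effects add up directly in the end-to-end delay.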
Inter-datacenter Virtual Network Function Placement and Chaining Problem. We begin
with a general description of the inter-datacenter placement and chaining problem through the
example illustrated in Figure 1.1. It involves the deployment of two Service Function Chaining
requests (SFCs, also referred to as “composite services”) onto a backbone network. For the first
one, incoming flows must be passed through (an instance of) virtual network function (VNF) 1
(e.g., a firewall), and then VNF 2 (e.g., a load balancer). The second specifies that incoming
flows must also be passed through VNF 1, and then VNF 3 (e.g., a proxy). Both SFCs are
sketched in Figures 1.1(a) and 1.1(b), respectively.
The VNF instances required for each composite service must be placed onto Network Points
of Presence (N-PoPs). N-PoPs are infrastructures (e.g., servers, clusters or even datacenters)
spread in the network and on top of which (virtual) network functions can be provisioned.
Without loss of generality, we assume the existence of one N-PoP associated with every major
forwarding device comprising a backbone network (see circles in Figure 1.1(c)). Each N-PoP
has a certain amount of resources (e.g., computing power) available. Likewise, each SFC has
its own requirements. For example, functions composing an SFC (e.g., caching) are expected to
sustain a given load, and thus are associated with a computing power requirement. Also, traffic
between functions can be expected to reach some peak throughput, which must be handled by
the physical path connecting the N-PoPs hosting those functions.
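To make these requirements concrete, the Python sketch below encodes a toy instance and checks whether a candidate placement respects three limits: N-PoP computing capacity, link bandwidth, and a maximum end-to-end delay. It is a simplified illustration under our own assumptions, not the formal model of Chapter 3. The names `Chain` and `is_feasible` are hypothetical. The sketch also assumes that each VNF request consumes its own CPU (no instance sharing), that endpoints can be ignored, and that each pair of N-PoPs is joined by at most one direct path with a fixed delay.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chain:
    """Toy SFC request (hypothetical fields)."""
    cpu: list[float]   # computing power required by each VNF, in chain order
    throughput: float  # peak traffic exchanged between consecutive VNFs
    max_delay: float   # maximum tolerable end-to-end delay


def is_feasible(capacity, links, chains, placement):
    """Check a candidate placement against N-PoP, link, and delay limits.

    capacity:  {npop: available computing power}
    links:     {(u, v): (bandwidth, delay)}; one direct path per N-PoP pair
    chains:    {name: Chain}
    placement: {name: [N-PoP hosting each VNF of the chain]}
    """
    cpu_used = defaultdict(float)
    bw_used = defaultdict(float)
    for name, chain in chains.items():
        hosts = placement[name]
        for demand, host in zip(chain.cpu, hosts):
            cpu_used[host] += demand
        delay = 0.0
        for u, v in zip(hosts, hosts[1:]):
            if u == v:
                continue  # consecutive VNFs co-located on the same N-PoP
            if (u, v) not in links:
                return False  # no path between these N-PoPs in this toy topology
            bw_used[(u, v)] += chain.throughput
            delay += links[(u, v)][1]
        if delay > chain.max_delay:
            return False
    if any(used > capacity[npop] for npop, used in cpu_used.items()):
        return False
    return all(used <= links[edge][0] for edge, used in bw_used.items())


if __name__ == "__main__":
    capacity = {1: 10.0, 2: 10.0}
    links = {(1, 2): (100.0, 5.0), (2, 1): (100.0, 5.0)}
    chains = {"sfc1": Chain(cpu=[4.0, 4.0], throughput=50.0, max_delay=10.0)}
    print(is_feasible(capacity, links, chains, {"sfc1": [1, 2]}))  # True
```

A placement problem then amounts to searching over `placement` maps for one that passes this check at minimum cost. That search is the hard part.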
Given the context above, the first problem we approach is finding a proper placement of
VNFs onto distributed N-PoPs and chaining of placed functions, so that overall network re-
source commitment is minimized. The placement and chaining must ensure that each of the
SFC requirements, as well as network constraints, are met. In our illustrative example, a possible deployment is shown in Figure 1.1(c). The endpoints, represented as filled circles, denote flows originating from or destined to devices/networks attached to core forwarding devices. Observe that both composite services share the same instance of VNF 1, placed on N-PoP (2), therefore minimizing resource allocation as desired. Although small instances are relatively simple to solve, the VNF placement and chaining problem is NP-complete (LUIZELLI et al., 2017a), as later discussed in Chapter 3.

Figure 1.1: Examples of SFCs, and partial view of the network backbone (focusing on the set of N-PoPs available for placing VNFs) considered in our scenario.

[Figure panels: (a) Branch flow SFC, with endpoints A, B, and C and functions VNF 1 and VNF 2; (b) Single flow SFC, with endpoints A and D and functions VNF 1 and VNF 3; (c) Network backbone, and deployment view of SFCs, showing endpoints A to D and forwarding devices/N-PoPs (1) to (8).]

Source: by author (2016).
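The sketch below illustrates both the benefit of sharing and the cost of exact search on a deliberately tiny, hypothetical variant of the problem. It enumerates every assignment of VNF requests to N-PoPs. Whenever two chains place the same VNF type on the same N-PoP (as with VNF 1 in Figure 1.1), it counts that as a single shared instance, and it returns the minimum number of instances. The per-N-PoP instance limit (`slots`) and the objective are simplifying assumptions of ours. The point is that the number of candidates, |N-PoPs| raised to the number of VNF requests, grows exponentially.

```python
from itertools import product


def min_instances(chains, npops, slots):
    """Exhaustive search for a toy VNF placement that minimizes VNF instances.

    chains: list of SFCs, each an ordered list of VNF types, e.g. [[1, 2], [1, 3]]
    npops:  list of N-PoP identifiers
    slots:  hypothetical limit on VNF instances per N-PoP
    Requests of the same VNF type mapped to the same N-PoP share one instance.
    """
    requests = [(c, pos, vnf) for c, chain in enumerate(chains)
                for pos, vnf in enumerate(chain)]
    best, best_map, explored = None, None, 0
    # len(npops) ** len(requests) candidates: exponential in the instance size.
    for hosts in product(npops, repeat=len(requests)):
        explored += 1
        instances = {(vnf, host) for (_, _, vnf), host in zip(requests, hosts)}
        per_npop = {}
        for _, host in instances:
            per_npop[host] = per_npop.get(host, 0) + 1
        if any(count > slots for count in per_npop.values()):
            continue  # some N-PoP would host too many instances
        if best is None or len(instances) < best:
            best, best_map = len(instances), dict(zip(requests, hosts))
    return best, best_map, explored


best, mapping, explored = min_instances([[1, 2], [1, 3]], npops=[1, 2, 3], slots=2)
print(best, explored)  # 3 81
```

With the two SFCs of Figure 1.1 and three N-PoPs, 81 candidates are inspected. The optimum uses three instances, so both chains share VNF 1. Adding a handful of N-PoPs or VNFs quickly makes this enumeration impractical.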
Intra-datacenter Virtual Network Function Placement and Chaining Problem. The
second problem we approach deals specifically with the placement of VNFs onto servers in datacenters (N-PoPs). A solution to the above problem (i.e., the Inter-datacenter VNFPC) involves the placement and chaining of SFC requests across multiple (distributed) locations. This is particularly the case when SFCs have stringent network requirements (e.g., very low end-to-end delays). As a result of this planning, SFCs are broken down into sub-chains (i.e., subgraphs) that are individually placed and chained onto specific datacenters. Then, these partial SFC requests are deployed on top of available commodity servers.
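A minimal sketch of this decomposition step follows. It assumes linear chains and assumes that the inter-datacenter solution is available as one datacenter label per VNF. The function name and the VNF and datacenter labels are hypothetical. Maximal runs of consecutive VNFs mapped to the same datacenter form one sub-chain.

```python
from itertools import groupby


def split_into_subchains(chain, datacenter_of):
    """Break an ordered SFC into maximal runs of consecutive VNFs in the same datacenter.

    chain:         VNFs in processing order
    datacenter_of: datacenter assigned to each VNF (same order as chain)
    """
    return [
        (dc, [vnf for vnf, _ in run])
        for dc, run in groupby(zip(chain, datacenter_of), key=lambda pair: pair[1])
    ]


print(split_into_subchains(["fw", "ids", "nat", "proxy"], ["DC1", "DC1", "DC2", "DC1"]))
# [('DC1', ['fw', 'ids']), ('DC2', ['nat']), ('DC1', ['proxy'])]
```

Note that the same datacenter can receive more than one sub-chain of a given SFC when traffic leaves and later returns to it, as `DC1` does in the example.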
In the context of intra-datacenter VNFPC, identifying deployment mechanisms that mini-
mize the provisioning cost of service chains has recently received significant attention from both
academia and industry (CLAYMAN et al., 2014; GHAZNAVI et al., 2015; LEWIN-EYTAN et
al., 2015; LUIZELLI et al., 2015; BOUET; LEGUAY; CONAN, 2015; LUKOVSZKI; SCHMID, 2015; LUKOVSZKI; ROST; SCHMID, 2016; LUIZELLI et al., 2017a). However, to the best of our knowledge, existing studies have neglected the actual operational cost of NFV deployments. Therefore, typically proposed models (e.g., those implemented in NFV orchestrators) might either lead to infeasible solutions (e.g., in terms of CPU requirements) or suffer high penalties on the expected performance.

Figure 1.2: Examples of strategies to deploy three given Service Function Chains.

[Figure panels: (a) Gather placement strategy: server A hosts φ11, φ12, φ13; server B hosts φ21, φ22, φ23; server C hosts φ31, φ32, φ33. (b) Distribute placement strategy: server A hosts φ11, φ21, φ31; server B hosts φ12, φ22, φ32; server C hosts φ13, φ23, φ33. In both panels, traffic flows from a starting point to an end point.]

Source: by author (2017).
In an attempt to address this gap, we focused on evaluating and modeling the virtual switch-
ing cost in an NFV-based infrastructure (LUIZELLI et al., 2017b). In this environment, virtual
switching is an essential building block that enables flexible communication between VNFs.
However, its operation comes with an extra cost in terms of computing resources that are allo-
cated specifically to software switching in order to steer the traffic through running services (in
addition to computing resources required by VNFs). This cost depends primarily on the way
VNFs are internally chained, packet processing requirements, and accelerating technologies
(e.g., packet acceleration such as Intel DPDK (INTEL, 2016a)).
Figure 1.2 illustrates possible deployments of service chains on three identical physical servers (A, B, and C). As one can see, all service chains are composed of three VNFs: $\varphi_{1,2,3} = \langle \varphi_1 \rightarrow \varphi_2 \rightarrow \varphi_3 \rangle$. For simplicity, we assume that all traffic steering is done by the servers' internal virtual switching, that all VNFs require the same amount of processing power to perform their tasks, and that each service chain is associated with a known amount of traffic it has to process. Figures 1.2(a) and 1.2(b) illustrate two widely applied VNF deployment strategies in the OpenStack Cloud Orchestrator (ONF, 2015). In Figure 1.2(a), we depict a deployment strategy where all VNFs of a single SFC are deployed on the same server (referred to as “gather”). In contrast, Figure 1.2(b) illustrates a deployment strategy where each VNF of the same SFC is deployed onto a different server (referred to as “distribute”). Observe that all servers have the same number of deployed VNFs and, therefore, the same CPU processing requirements. However, determining the amount of processing resources required for switching
inside each server is far from being straightforward even when considering simple deployment
strategies. Therefore, understanding these operational overheads in real NFV deployments is
of paramount importance for three main reasons: (i) to ensure performance requirements of
deployed network services in light of performance limitations (e.g., maximum throughput of a
given SFC); (ii) to design efficient placement strategies and accurately estimate their operational
cost for any arbitrary deployments; and (iii) to reduce the operational cost (in particular, CPU
consumption of software switching) of NFV providers.
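The two strategies of Figure 1.2 can be reproduced with a few lines of Python. In this sketch, servers A, B, and C are labeled 0, 1, and 2, and the j-th VNF of chain c is written as `(c, j)`. Both labelings and all function names are our own. The sketch confirms that every server hosts the same number of VNFs under both strategies. However, the traffic patterns differ: under gather every chain edge stays inside one server, whereas under distribute every edge crosses servers. This difference is why the switching cost cannot be read off the VNF count alone.

```python
def gather(num_chains, chain_len):
    """Gather: every VNF of chain c runs on server c."""
    return {(c, j): c for c in range(num_chains) for j in range(chain_len)}


def distribute(num_chains, chain_len):
    """Distribute: the j-th VNF of every chain runs on server j."""
    return {(c, j): j for c in range(num_chains) for j in range(chain_len)}


def vnfs_per_server(placement):
    """Number of VNFs hosted on each server."""
    counts = {}
    for server in placement.values():
        counts[server] = counts.get(server, 0) + 1
    return counts


def inter_server_hops(placement, num_chains, chain_len):
    """Chain edges (VNF j -> VNF j+1) whose endpoints sit on different servers."""
    return sum(placement[(c, j)] != placement[(c, j + 1)]
               for c in range(num_chains) for j in range(chain_len - 1))


for name, strategy in (("gather", gather), ("distribute", distribute)):
    p = strategy(3, 3)
    print(name, vnfs_per_server(p), inter_server_hops(p, 3, 3))
# gather {0: 3, 1: 3, 2: 3} 0
# distribute {0: 3, 1: 3, 2: 3} 6
```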

1.2 Hypothesis

The hypothesis we formulate as the fundamental research issue to guide this Ph.D. research
work is the following:

In order to take full advantage of the benefits provided by NFV, efficient and scalable
strategies should be used to orchestrate the deployment of Service Function Chains.

Previous work on similar optimization problems has failed to properly consider the placement and chaining of SFCs. In particular, existing solutions do not simultaneously achieve quality, efficiency, and scalability. We advocate that the placement and chaining steps of SFC deployment must be optimized jointly, so that each subproblem (placing VNFs and chaining network flows) can benefit from the other. Additionally, existing approaches to place VNFs do not scale to the envisioned extent of NFV deployments (i.e., thousands of N-PoPs). Further, as previously mentioned, existing approaches also disregard operational costs and real constraints of NFV environments, potentially threatening the correct operation of deployed network services and increasing their operational cost. We emphasize that scalability in the context of this thesis refers to the ability to solve large instances of the problem in a timely manner (e.g., computing time on the order of minutes/hours). In turn, cost-efficient deployment refers to the ability to come up with optimized solutions in terms of operational cost while meeting quality requirements (such as end-to-end delay).
The objective of this research is to confirm this hypothesis and answer the research questions
presented below.
Research Question 1. When performing the deployment of NFV-based SFCs, what are the
gains regarding network metrics and resource consumption that can be attained in comparison
to traditional network service deployments (i.e., based on physical network functions)?
Research Question 2. Since the SFPC problem is NP-complete, how can efficient and near-optimal solutions be provided for SFPC instances on large-scale NFV infrastructures?
Research Question 3. When deploying SFCs onto physical servers, there is a non-negligible CPU cost associated with the service operation. What are these costs, and how can they be properly estimated in an NFV environment?
Research Question 4. How can operational costs be minimized in SFC deployments? Is it possible to efficiently guide NFV orchestrators in deploying VNFs intra- and inter-server?
Research Question 5. The proper operation of network services requires constant monitoring. Is it possible to take advantage of NFV/SDN technologies to efficiently deploy virtualized monitoring services?

1.3 Goals and Contributions

The study proposed here has five main goals: (i) formalize the Inter- and Intra-datacenter
VNFPC problem; (ii) design efficient and scalable algorithmic methods to compute high-quality solutions to the VNFPC problem in a timely manner; (iii) measure and model operational costs of
SFC deployments, as well as network performance limitations, on an NFV-based environment;
(iv) minimize incurred operational costs in NFV infrastructures; and (v) optimize how network
traffic steered through service chains is monitored in NFV-based deployments.
The aforementioned goals unfold into a set of contributions of this thesis, described below.
The first set of contributions encompasses the formalization of the virtual network function
placement and chaining problem (VNFPC) by means of an Integer Linear Programming model.
The devised model considers a wide range of NFV requirements (e.g., network function computing power and flow capacity requirements) (LUIZELLI et al., 2015). Additionally, in order
to cope with medium-size NFV infrastructures, we also propose a heuristic procedure which ef-
ficiently guides commercial solvers throughout the exploration of good solutions. We compare
both optimal and heuristic approaches considering different use cases and metrics, such as the
number of instantiated virtual network functions, physical and virtual resource consumption,
and end-to-end latencies.
As the second set of contributions, we address the scalability of the VNFPC problem by
proposing a novel fix-and-optimize-based heuristic algorithm (LUIZELLI et al., 2017a). It combines mathematical programming (Integer Linear Programming) and a heuristic method (Variable Neighborhood Search (HANSEN; MLADENOVIĆ, 2001)) to produce high-quality solutions for large-scale network setups in a timely fashion. We provide strong evidence, supported by an extensive set of experiments, that our heuristic algorithm scales to environments composed of hundreds of network functions. It produces timely results that are, on average, only 20% away from a computed lower bound, and outperforms existing algorithmic solutions, quality-wise, by a factor of 5. As another important contribution, we prove that the VNFPC problem is NP-complete. As far as we are aware, this is the first study to formally validate such an important claim.
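For readers unfamiliar with Variable Neighborhood Search, the sketch below shows the textbook basic VNS loop: shake in the k-th neighborhood, run a local search, then either move or enlarge k. It applies this loop to a deliberately simple, hypothetical objective: balancing VNF CPU demands across N-PoPs so as to minimize the maximum load. It is not the fix-and-optimize math-heuristic of Chapter 4, which couples VNS with an ILP solver. The objective, the neighborhood definition, and all names are assumptions made for illustration.

```python
import random


def max_load(assignment, demands, num_npops):
    """Objective: the highest total CPU demand placed on any single N-PoP."""
    loads = [0.0] * num_npops
    for vnf, npop in enumerate(assignment):
        loads[npop] += demands[vnf]
    return max(loads)


def local_search(assignment, demands, num_npops):
    """First-improvement descent over single-VNF moves (modifies assignment in place)."""
    best = max_load(assignment, demands, num_npops)
    improved = True
    while improved:
        improved = False
        for vnf in range(len(assignment)):
            current = assignment[vnf]
            for npop in range(num_npops):
                if npop == current:
                    continue
                assignment[vnf] = npop
                cost = max_load(assignment, demands, num_npops)
                if cost < best:
                    best, current, improved = cost, npop, True  # keep the move
                else:
                    assignment[vnf] = current  # undo the move
    return assignment, best


def vns(demands, num_npops, k_max=3, rounds=50, seed=0):
    """Basic VNS: shake in neighborhood N_k, descend, then move or enlarge k."""
    rng = random.Random(seed)
    incumbent, best = local_search([0] * len(demands), demands, num_npops)
    for _ in range(rounds):
        k = 1
        while k <= k_max:
            candidate = incumbent[:]
            # Shaking: N_k reassigns k randomly chosen VNFs to random N-PoPs.
            for vnf in rng.sample(range(len(demands)), min(k, len(demands))):
                candidate[vnf] = rng.randrange(num_npops)
            candidate, cost = local_search(candidate, demands, num_npops)
            if cost < best:
                incumbent, best, k = candidate, cost, 1  # move, restart from N_1
            else:
                k += 1
    return incumbent, best


print(vns([5, 4, 3, 3, 3], num_npops=2)[1])  # 9.0
```

Shaking lets the search escape the local optima of the descent step. Restarting from the smallest neighborhood after each improvement keeps most of the effort close to the incumbent solution.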
The third set of contributions of this thesis comprises measuring and modeling operational costs and network metrics of different SFC deployment strategies in real NFV infrastructures (LUIZELLI et al., 2017b). We conduct an extensive and in-depth evaluation, measuring the performance and analyzing the impact of SFC deployment on Open vSwitch, the de facto standard software switch for cloud environments (PFAFF et al., 2009; PFAFF et al., 2015; ONF, 2016). Based on our evaluation, we then craft a generalized and abstract cost function that accurately captures the CPU cost of network switching. We measure and estimate the software switching costs for two widely applied VNF deployment strategies, namely, distribute and gather. In the distribute deployment, each VNF of an SFC is deployed on a different server. In contrast, in the gather deployment, the entire SFC (i.e., all VNFs) is deployed on the same server.
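To see why the two strategies can differ in switching cost, consider a first-order approximation that we make here purely for illustration. It is not the cost function derived in Chapter 5. The model counts how many times one packet crosses each server's virtual switch, assuming every traversal costs the same CPU. Real costs additionally depend on packet rates, packet sizes, and acceleration technologies. Under this model, a hop between co-located VNFs is handled by one vSwitch, while a hop between servers is handled by the vSwitches of both servers. The function names are hypothetical.

```python
from collections import Counter


def vswitch_traversals(hosts):
    """Per-server vSwitch traversals for one packet crossing one chain (toy model).

    hosts[i] is the server running the i-th VNF. Every hop (NIC -> first VNF,
    VNF -> VNF, last VNF -> NIC) is forwarded once by the vSwitch of each
    server it touches: once if both ends share a server, twice otherwise.
    """
    counts = Counter()
    counts[hosts[0]] += 1  # ingress: physical NIC -> first VNF
    for u, v in zip(hosts, hosts[1:]):
        counts[u] += 1      # leaves the VNF on server u through u's vSwitch
        if v != u:
            counts[v] += 1  # enters server v through v's vSwitch
    counts[hosts[-1]] += 1  # egress: last VNF -> physical NIC
    return counts


def total_traversals(chains_hosts):
    """Aggregate traversals over several chains (one packet per chain)."""
    total = Counter()
    for hosts in chains_hosts:
        total.update(vswitch_traversals(hosts))
    return total


gathered = [["A"] * 3, ["B"] * 3, ["C"] * 3]   # layout of Figure 1.2(a)
distributed = [["A", "B", "C"]] * 3           # layout of Figure 1.2(b)
print(dict(total_traversals(gathered)))     # {'A': 4, 'B': 4, 'C': 4}
print(dict(total_traversals(distributed)))  # {'A': 6, 'B': 6, 'C': 6}
```

Even in this crude model, servers hosting the same number of VNFs see different switching loads. Measurements such as those in Chapter 5 are what actually quantify these costs.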
As the fourth set of contributions, we develop a general algorithmic mechanism for service chain deployment that considers both the actual performance of the service chains and the extra internal switching resources they require. This is done by decomposing service chains into sub-chains and deploying each sub-chain on a (possibly different) physical server, in a way that minimizes the total switching overhead cost. We introduce a novel algorithm based on an extension of the well-known reduction from weighted matching to the min-cost flow problem, and show that it gives an almost optimal solution, with a much more efficient running time than exhaustive search. We evaluate the performance of this algorithm against the fully distribute and fully gather solutions, which closely resemble the standard placement mechanisms commonly used by cloud schedulers (e.g., the nova-scheduler module in OpenStack with load balancing or energy conserving weights), and show that our algorithm significantly outperforms OpenStack (by up to a factor of 4 in some cases) with respect to operational costs.
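The algorithm builds on the classical reduction from weighted bipartite matching to min-cost flow. The sketch below shows only that textbook reduction, not the extension developed in this thesis. It builds a network running from a source to the sub-chains, then to the servers, then to a sink, with unit capacities, and solves it by successive shortest paths. The cost matrix and the restriction that each server hosts at most one sub-chain are hypothetical simplifications.

```python
def assign_subchains(cost):
    """Assign sub-chains to servers at minimum total cost via min-cost flow.

    cost[i][j] is the (hypothetical) switching cost of placing sub-chain i on
    server j; each server hosts at most one sub-chain in this toy version.
    Nodes: 0 = source, 1..n = sub-chains, n+1..n+m = servers, n+m+1 = sink.
    """
    n, m = len(cost), len(cost[0])
    src, sink = 0, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # edge = [to, residual_cap, cost, reverse_index]

    def add_edge(u, v, c):
        graph[u].append([v, 1, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for i in range(n):
        add_edge(src, 1 + i, 0)
        for j in range(m):
            add_edge(1 + i, 1 + n + j, cost[i][j])
    for j in range(m):
        add_edge(1 + n + j, sink, 0)

    total = 0
    for _ in range(n):  # one unit of flow per sub-chain
        # Bellman-Ford on the residual graph (reverse edges may have negative cost).
        dist = [float("inf")] * len(graph)
        parent = [None] * len(graph)
        dist[src] = 0
        for _ in range(len(graph) - 1):
            changed = False
            for u, edges in enumerate(graph):
                if dist[u] == float("inf"):
                    continue
                for k, (v, cap, c, _) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], parent[v] = dist[u] + c, (u, k)
                        changed = True
            if not changed:
                break
        if dist[sink] == float("inf"):
            raise ValueError("more sub-chains than servers")
        v = sink
        while v != src:  # push one unit of flow along the shortest path
            u, k = parent[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u
        total += dist[sink]

    # A saturated sub-chain -> server edge means that sub-chain was placed there.
    placement = {i: v - n - 1 for i in range(n)
                 for v, cap, _, _ in graph[1 + i] if v > n and cap == 0}
    return total, placement


print(assign_subchains([[4, 1, 3], [2, 0, 5], [3, 2, 2]]))  # (5, {0: 1, 1: 0, 2: 2})
```

Each augmentation costs one Bellman-Ford run, so the procedure is polynomial. Exhaustive search over server permutations, by contrast, grows factorially with the number of sub-chains.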
As the fifth set of contributions, we tackle the problem of monitoring network traffic in ser-
vice chains – another important building block for the proper operation of NFV-based network
services. We first formalize the DNM (Distributed Network Monitoring) problem, and then we
propose an ILP model to solve it. Our optimization model is able to effectively coordinate the
monitoring of service chains in NFV environments. The model is aware of SFC topological components, which allows it to independently monitor elements within a network service with low overheads in terms of deployed monitoring and network traffic consumption.
We group the main contributions of this thesis into the five areas below:

1. Inter-Datacenter Placement & Chaining of VNF.
• We formalize the virtual network function placement and chaining problem (VNFPC) by means of an Integer Linear Programming model.
• We prove that the VNFPC problem is NP-complete.
• We propose a heuristic procedure that dynamically and efficiently guides the search for solutions performed by commercial solvers. Further, we demonstrate that our heuristic scales to medium-size infrastructures while still finding solutions that are close to optimality.
2. Limited Scalability of State-of-the-Art Solutions.
• We address scalability of VNFPC by proposing a novel fix-and-optimize-based
heuristic algorithm.
• We demonstrate that our proposed method scales to large NFV infrastructures (i.e., thousands of NFV nodes). Results demonstrate that our proposed method comes up with solutions that are, on average, 5x better than the considered baseline.
3. Operational Costs.
• We provide in-depth insights into SFC deployment strategies in a real NFV infrastructure by measuring overheads and incurred costs.
• We develop an analytical model to properly estimate the cost of software switch-
ing (operational cost) for different placement strategies (and requirements) in NFV
infrastructures.
4. Intra-Datacenter Placement & Chaining of VNF.
• We generalize our previous analytical model in order to properly estimate the operational cost of any arbitrary SFC placement strategy.
• We develop an online algorithm aiming at minimizing operational costs (software
switching overheads) for incoming service chain requests.
5. Efficient Network Traffic Monitoring of SFCs.
• We formalize the Distributed Network Monitoring problem and propose an Integer Linear Programming model.
• We conduct a set of experiments in an NFV-based environment in order to assess the cost of sampling network traffic to monitoring systems.
• We provide insights into the gains attained by DNM solutions.

1.4 Organization

The remainder of this thesis is organized as follows.

• Chapter 2 presents the fundamental concepts needed to fully comprehend this thesis. This
background reviews the topics of Network Function Virtualization, Service Function
Chaining and NFV enabling technologies. Further, it also discusses state-of-the-art solu-
tions regarding SFC deployment.
• Chapter 3 formally defines the Service Function Placement and Chaining (SFPC) problem as an Integer Linear Programming model and introduces a heuristic algorithm to solve medium-size problem instances.
• Chapter 4 proposes a novel fix-and-optimize math-heuristic for the SFPC problem so as to scale the resolution to the order of thousands of NFV nodes.
• Chapter 5 assesses and accurately estimates the operational costs of SFC deployments in
real NFV infrastructures.
• Chapter 6 explores operational cost functions to guide the efficient intra-server deploy-
ment of SFCs.
• Chapter 7 formally defines the Distributed Network Monitoring problem as an Integer Linear Programming model.
• Chapter 8 presents the final considerations and directions for future work. The chapter
also presents additional academic contributions originated from this thesis.

2 BACKGROUND AND STATE-OF-THE-ART

In this chapter, we first provide an overview of Network Function Virtualization (NFV), Service Function Chaining (SFC), and NFV enabling technologies. Afterwards, we revisit the most prominent approaches regarding SFC deployment in the context of NFV environments.

2.1 Network Function Virtualization

Network services are increasingly becoming difficult to deploy and maintain (HAN et al.,
2015). Particularly in the telecommunication industry, the provisioning of network services has
been based on deploying proprietary physical devices for each function that is part of a given service (MIJUMBI et al., 2016). As the network demand for services is steadily increasing (CISCO, 2016) (e.g., due to video streaming), network operators are required to constantly deploy and/or expand running network services. The wide adoption of hardware-based middleboxes as the core of any network service brings many drawbacks to the operation and growth of current network infrastructure. For example, in today's infrastructure, it is nearly impossible to realize on-demand service provisioning (or scaling up/down) according to traffic fluctuations. These drawbacks directly contribute to increasing the capital and operational expenditures (i.e., CAPEX and OPEX) of network operators/providers, as well as limiting their ability to innovate and/or optimize their operations.
Network Function Virtualization (NFV) has been proposed to address these problems by
leveraging virtualization technologies as a new way to design, deploy, and manage network
services (MIJUMBI et al., 2016). The main concept of NFV consists of decoupling network
functions from the hardware they have been embedded into. Therefore, NFV allows software-based network functions to run over standard high-volume equipment (e.g., servers, switches, and storage). By running network functions on general-purpose hardware, NFV breaks the tight dependency on specialized hardware and allows NFs to be fully virtualized. Hence, NFV has the potential to reap the benefits of traditional virtualization, such as flexibility, dynamic resource scaling, migration, and energy efficiency (to name a few). Apart from these benefits, NFV is also expected to foster innovation in third-party network function solutions and, therefore, encourage free competition in the network market. Consequently, NFV is expected to help lower the cost of acquiring and operating network solutions and services.
The NFV architecture has recently been defined by the ETSI (European Telecommunica-
tions Standards Institute) and by many efforts from the IETF community (Internet Engineering
Task Force). The main goal of defining a general architecture is to enable open standardized
interfaces amongst the NFV components. Figure 2.1 illustrates the components and their rela-
tionships. In essence, the proposed architecture consists of three main components: (i) the Net-
work Function Virtualization Infrastructure (NFVI), (ii) the Virtual Network Functions (VNF),
and (iii) the NFV Management and Orchestration (MANO). The NFVI comprises all hardware (e.g., servers, storage, and network) and software components (e.g., virtualization hypervisor and software switching) that build an NFV environment. Such an NFV infrastructure is intended to be organized as a distributed set of NFV-enabled nodes, also known as N-PoPs. Each N-PoP is an NFV-enabled node located in the network infrastructure that has limited computational power (i.e., it can host only a limited number of network functions).

Figure 2.1: General NFV architecture proposed by the ETSI.

[Figure blocks: Virtual Network Functions (VNF1, VNF2, ..., VNFn); Network Services; Management and Orchestration; Network Function Virtualization Infrastructure, comprising Virtual Resources (computing, storage, and network resources) and Physical Resources (computing, storage, and network resources).]

Source: adapted from (MIJUMBI et al., 2016).

In turn, VNFs are the software implementation of network functions (e.g., firewall, Deep Packet Inspection (DPI), IP Multimedia Subsystem (IMS), Radio Access Network (RAN)) and are usually deployed as virtual resources, that is, using virtualization technologies such as traditional virtual machines, containers, or micro-kernels. A VNF implementation can even be decomposed into multiple functional blocks, which might run individually on different virtualization platforms. This decomposition enables, for instance, a particular functional block to run on different physical devices and/or to be shared with other NFs. As an example, consider the packet classification component (or packet parser) that is usually present in many NFs. With such decomposition, a single component can be shared by different NFs.

Finally, the NFV MANO defines the Management and Orchestration layer of the NFV en-
vironment. The main roles of the MANO are delegated to the NFV Orchestrator (NFVO), to
the VNF Manager (VNFM) and to the Virtualized Infrastructure Manager (VIM). In short, the
NFVO manages the life-cycle of network services (e.g., (de-)instantiation and service reposito-
ries). The VNFM controls the life-cycle of VNFs (e.g., instantiation, updates, scaling in/out).
In turn, the VIM is in charge of managing and monitoring the available physical and virtual resources in the NFVI.

2.2 Service Function Chaining

Network services often require various network functions (NFs), ranging from traditional network functions (e.g., firewalls and load balancers) to application-specific features (e.g., HTTP header manipulation, WAN and application acceleration) (HALPERN; PIGNATARO, 2015). The realization of an end-to-end network service encompasses the interconnection (or traffic steering) of the required NFs in the infrastructure, which is known as Service Function Chaining (SFC).
A Service Function Chain (SFC) is formally defined as an ordered or partially ordered set of service functions (also referred to as NFs) that must be applied to packets, frames, and/or flows according to a prior classification (HALPERN; PIGNATARO, 2015). It is important to emphasize that the implied order of an SFC might not be a linear progression. For instance, the current SFC architecture allows a network service to be defined as a forwarding graph (i.e., a Directed Acyclic Graph (DAG)). When branches occur in the DAG (i.e., NFs connected to two different endpoints/NFs), packets (or flows) can be forwarded to one, both, or none of the following NFs. Figure 2.2 depicts an NFV-based service being deployed through a combination of VNFs. Note that VNF-2 is decomposed into three modular components (i.e., VNF-2a, VNF-2b, and VNF-2c), which allows each component to run at different locations and over different virtualization platforms.
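Since a forwarding graph must be acyclic, an orchestrator can validate a requested SFC and derive a valid processing order with a topological sort. The following is a minimal sketch using Python's standard `graphlib` module (Python 3.9 or later). The function name is our own. The branching topology in the example is hypothetical and only loosely inspired by the decomposed VNF-2 of Figure 2.2.

```python
from graphlib import CycleError, TopologicalSorter


def processing_order(forwarding_graph):
    """Return one valid processing order of a VNF forwarding graph.

    forwarding_graph maps each VNF to the set of VNFs that must process
    traffic before it. Raises ValueError if the graph is not a DAG.
    """
    try:
        return list(TopologicalSorter(forwarding_graph).static_order())
    except CycleError as err:
        raise ValueError(f"not a DAG, cycle: {err.args[1]}") from err


# Hypothetical branching SFC: VNF-2a feeds two branches that rejoin at VNF-3.
sfc = {
    "VNF-1": set(),
    "VNF-2a": {"VNF-1"},
    "VNF-2b": {"VNF-2a"},
    "VNF-2c": {"VNF-2a"},
    "VNF-3": {"VNF-2b", "VNF-2c"},
}
print(processing_order(sfc))
```

When several orders are valid (here, VNF-2b and VNF-2c may come in either order), `static_order` returns one of them. The guarantee is that every VNF appears after all of its predecessors.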

Figure 2.2: Overview of SFC deployment.


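[Figure elements: an End-to-end Network Service between two End Points, realized by a VNF-FG composed of VNF-1, VNF-3, and a nested VNF-FG-2 containing VNF-2a, VNF-2b, and VNF-2c; a Virtualization Layer; and N-PoPs built on Hardware Resources in Physical Locations. Legend: Physical Link, Logical Link, Virtualization.]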

Source: adapted from (MIJUMBI et al., 2016).

In current network service deployments, there is a tight dependency between the SFC and the underlying physical infrastructure. Network traffic is steered through NFs by a cumbersome, manual, and error-prone process of interconnecting network cables and crafting routing tables. According to (HALPERN; PIGNATARO, 2015), such topological dependency imposes many constraints on network service delivery, including (but not limited to):

1. Limited ability to (optimally) utilize (and optimize) infrastructure resources. Over time, SFC deployments might become inefficient due to changes in traffic patterns. Therefore, there is a high operational cost associated with re-planning the chaining of existing NFs in a production environment, making it impractical to do frequently.
2. Configuration complexity. Due to the high dependency on the network infrastructure, simple modifications to currently deployed SFCs (e.g., adding or removing an NF) require changing the logical and/or physical topology. Therefore, it hinders dynamic network service reconfiguration and slows down the provisioning of new services.
3. Constrained high availability. Redundant NFs or SFCs must be provisioned in the same
topology as the primary network service. Consequently, it limits the ability to (dynami-
cally) re-route network traffic to backup instances of NFs on the occurrence of failures or
disruption.
4. Transport dependency. There is a wide variety of network transport technologies in current network infrastructures (e.g., VXLAN, MPLS, Ethernet, GRE), which requires NFs to support many technologies simultaneously.
5. Elastic service delivery. There is very little room for maneuver to adjust the infrastructure
to future and ongoing demands due to high configuration complexity of changing SFCs.
Therefore, it is hard to realize any dynamic adjustment due to the risk and complexity.

By virtualizing NFs, NFV is expected to bring more agility and flexibility to the life cycle
management of network services. To overcome technical limitations of traffic steering, Software-Defined Networking can be seen as a convenient ally, due to its flexible flow handling capability, thus making placement and chaining technically easier. It is important to mention that other alternative solutions in the context of traffic steering and SFCs have recently been proposed, such as NSH (Network Service Header) (QUINN; ELZUR, 2016). For additional
information regarding Service Function Chaining, the interested reader is referred to Bhamare
et al. (BHAMARE et al., 2016) and Mijumbi et al. (MIJUMBI et al., 2016).

2.3 NFV Enabling Technologies

When the execution of network function applications is shifted from specialized and dedicated hardware to commodity (and shared) hardware, many technical challenges and limitations come to light. We start by discussing these technical limitations of running network functions on commodity hardware. Then, we overview how hardware acceleration technologies have overcome current limitations.

2.3.1 Performance Limitations of Commodity Hardware

Networking applications require intensive interaction between the user's application (i.e., the user-space VNF implementation) and the kernel's network stack (which owns and manages the hardware NIC through a kernel-space module (CORBET; RUBINI; KROAH-HARTMAN, 2005)). This continuous interaction leads to several well-known performance drawbacks regarding user/kernel synchronization and memory management, which ultimately affect the desired performance of virtualized NFV deployments (particularly when processing packets at line rate).

We start with the synchronization between user space and kernel space. One of the major performance obstacles of software-based network stack implementations is the communication sequence between the hardware (i.e., the Network Interface Controller, NIC) and the user application. As soon as network traffic arrives at the NIC, the user-space application should be notified by the kernel. Once the application has obtained the network packets, it has to release hardware resources as soon as possible so that the hardware can keep working. The two classical approaches to this communication sequence are poll mode and interrupt mode. In poll mode, the user application polls for available work elements in the NIC's queue. In contrast, in interrupt mode, when the hardware adds new work elements to the queue, it triggers an interrupt that invokes a callback function. A poll mode driver (PMD) implementation requires allocating at least one CPU core to constantly query the queue for available work elements. Since CPU is a limited resource in commodity hardware, the interrupt mode driver has been the default kernel implementation (and, until recently, the only one). On the other hand, the major drawback of interrupt mode is the intense CPU context switching it entails, which dramatically lowers the performance of network-intensive applications.
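The trade-off can be illustrated with a toy discrete-time model. This is our own simplification and not a model of any real driver: interrupt coalescing and hybrid schemes (such as Linux NAPI) are ignored. Interrupt mode raises one interrupt per packet. Poll mode checks the queue once per tick on a dedicated core and drains up to `burst` packets per check. The traffic pattern and the burst size are hypothetical.

```python
def interrupt_mode(arrivals):
    """Interrupts raised: one per packet (interrupt coalescing ignored)."""
    return sum(arrivals)


def poll_mode(arrivals, burst=32):
    """Busy-polling loop on a dedicated core: one queue check per time tick.

    Returns (polls, empty_polls, backlog); each poll drains up to `burst` packets.
    """
    queued = polls = empty = 0
    for new_packets in arrivals:
        queued += new_packets
        polls += 1
        if queued == 0:
            empty += 1  # CPU cycles spent finding nothing to do
        queued -= min(queued, burst)
    return polls, empty, queued


traffic = [0, 0, 64, 0, 0]        # packets arriving per tick (hypothetical)
print(interrupt_mode(traffic))     # 64
print(poll_mode(traffic))          # (5, 3, 0)
```

Interrupt mode costs nothing when the link is idle, but it pays a context switch per packet. Poll mode burns a core even on empty polls, but it amortizes each check over a whole burst, which is why it wins at high packet rates.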

Another major drawback of using commodity hardware to run network functions concerns memory management in modern operating systems. Upon receiving network traffic, the kernel allocates enough memory to handle the received network packets. As this procedure is done by the kernel (i.e., in kernel space), the kernel cannot grant a user-space application access to that specific memory region. Consequently, the kernel must copy all network traffic to a different memory region that is accessible to user-space applications. When performing such intensive memory operations, environments naturally suffer performance degradation. Observe that in virtualized environments, especially those running virtual switching, this memory overhead also appears in the communication between vNICs (i.e., whenever a VNF sends/receives network traffic). Specifically in NFV environments, a well-known countermeasure consists of using huge pages. Huge pages reduce the number of entries needed in the translation lookaside buffer (TLB) and, therefore, make it faster for the CPU to translate memory addresses, which reduces the overhead involved in performing memory-intensive operations.
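To put the TLB argument in numbers, the snippet below computes how many page mappings are needed to cover a 1 GiB packet buffer. It compares standard 4 KiB pages with the 2 MiB and 1 GiB huge pages supported on x86-64. The results are 262,144, 512, and 1 mapping, respectively. The buffer size is an arbitrary example, and TLB capacity varies per CPU, so only the ratios matter here.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30


def pages_needed(region_bytes, page_bytes):
    """Pages (hence page-table/TLB entries) needed to map a contiguous region."""
    return -(-region_bytes // page_bytes)  # ceiling division


for page in (4 * KIB, 2 * MIB, 1 * GIB):
    print(f"{page // KIB:>8} KiB pages: {pages_needed(1 * GIB, page):>7} entries")
```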

2.3.2 Hardware Acceleration Technologies

Many hardware acceleration technologies have been developed over the past years (e.g.,
(BONELLI et al., 2012; L. Deri, 2016; RIZZO, 2012)). Accelerating packet processing enables
applications such as software-based switching (RIZZO; LETTIERI, 2012; ZHOU et al., 2013;
MARTINS et al., 2014; PFAFF et al., 2015; HWANG; RAMAKRISHNAN; WOOD, 2015) and
network stack implementations (BERNAL et al., 2016) to process packets quickly and efficiently, thereby mitigating the aforementioned drawbacks of current operating systems.
Acceleration technologies implement new hardware interfaces that are tailored to bypass
performance weaknesses in a specific domain of acceleration (e.g., packet processing flow).
Despite providing performance gains (e.g., in terms of throughput or latency), such technologies
are usually cumbersome to configure and deploy, besides being harder to program for. More
importantly, these acceleration technologies usually introduce security risks. In this context,
vulnerabilities are related to developing applications in error-prone environments that might be
exploited by malicious parties.
In the context of NFV, the Intel DPDK (Data Plane Development Kit) (INTEL, 2016a) hardware-acceleration technology has drawn attention from both academia and industry. In short, DPDK is a set of user-space libraries that enables a user-space application to fully manage and own the hardware NIC, thereby completely bypassing the kernel networking stack (for instance, avoiding the overhead of copying packets to user-space memory regions). Shifting the entire networking stack to the user space and implementing a poll mode driver addresses two major performance weaknesses of current operating systems: (i) it enables zero-copy packet forwarding from the NIC to the user-space application, and (ii) it removes the need for context switches to handle interrupts. On the other hand, since DPDK requires the user to manage and own the physical hardware, the user must be granted privileged rights to install and run a complete network stack (in the user space) that processes all network traffic.
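The contrast between interrupt-driven reception and a poll mode driver can be illustrated with a toy Python model. It is not DPDK's C API: the ring, `rx_burst`, `poll_loop` and `BURST_SIZE` are hypothetical names, although the burst-oriented, non-blocking receive loosely mirrors the style of DPDK's `rte_eth_rx_burst`. The point is that the loop never blocks or waits for an interrupt; it repeatedly asks the queue for whatever packets are available.

```python
from collections import deque

BURST_SIZE = 32  # hypothetical maximum number of packets fetched per poll


def rx_burst(rx_ring: deque, max_pkts: int) -> list:
    """Non-blocking receive: dequeue up to max_pkts packets, possibly none."""
    burst = []
    while rx_ring and len(burst) < max_pkts:
        burst.append(rx_ring.popleft())
    return burst


def poll_loop(rx_ring: deque, process, max_polls: int) -> int:
    """Busy-poll the RX ring instead of sleeping until an interrupt arrives.

    A real poll mode driver spins forever on a dedicated core; max_polls only
    bounds this toy loop. Returns the number of packets handed to process().
    """
    processed = 0
    for _ in range(max_polls):
        burst = rx_burst(rx_ring, BURST_SIZE)
        for pkt in burst:  # an empty burst simply means "poll again"
            process(pkt)
        processed += len(burst)
    return processed


ring = deque(range(100))  # 100 fake packets already sitting in the RX ring
seen = []
handled = poll_loop(ring, seen.append, max_polls=5)
print(handled, len(ring))
```

Because the polling core never sleeps, it stays fully busy even when idle, which is one reason the cores dedicated to poll mode drivers must be chosen carefully.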

2.3.3 Virtual Switching

Virtual switching is an essential building block in NFV environments, allowing the inter-
connection of multiple Virtual Network Functions in a flexible, isolated, and scalable manner.
Early approaches to virtual switching (e.g., L2 bridges) were static in nature and, therefore, not suitable to cope with the requirements of NFV environments.
Open vSwitch (OVS) is the current standard virtual switching implementation in most cloud environments (PFAFF et al., 2015). The reasons for its wide adoption include: (i) native support for OpenFlow (MCKEOWN et al., 2008); (ii) integration into most virtualization environments (e.g., Xen and KVM); and (iii) being part of a successful ecosystem that develops open-source cloud solutions (e.g., OpenStack (ONF, 2015)).
Due to its wide adoption, there are many initiatives to enable hardware acceleration technologies in OVS, ranging from proprietary solutions (e.g., EZchip) to open-source libraries such as Intel DPDK. In particular, the OVS-DPDK implementation makes it possible to shift the entire packet processing pipeline to the user space, including the NIC poll mode driver and the datapath, which eliminates the overhead of context switches. In contrast, in the kernel implementation each arriving packet triggers an interrupt to the operating system and requires multiple copy operations (e.g., between the network interface card (NIC) and the virtual switch, or between the virtual switch and VNFs), which degrades the performance of high-speed network processing. The interested reader is referred to (PFAFF et al., 2015) for additional information.
In addition to acceleration technologies, high-speed virtual switching requires proper tuning to boost its performance in NFV environments (INTEL, 2016b). A well-known configuration for virtualization-intensive environments consists of logically separating CPU cores into disjoint sets. One set of physical cores is assigned to the hypervisor (i.e., kernel and KVM) in order to manage and provision resources (and, therefore, enable networking between all virtual and physical ports). In the case of OVS-DPDK, the set of cores assigned to the hypervisor also includes the cores that run the DPDK poll mode driver. In turn, the second set of physical CPU cores is used to run VNFs. Misconfiguration of these disjoint sets of CPU cores may lead to overuse of specific cores and performance degradation of OVS. It is important to emphasize that the internal architecture of processors (i.e., the way physical cores are interconnected) might lead to other limitations/implications on network performance (LEPERS; QUÉMA; FEDOROVA, 2015) (e.g., some CPU cores might be closer to the PCIe bus to which the physical NIC is attached).
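A minimal sketch of the core partitioning just described, assuming an illustrative 8-core host: it checks that the hypervisor and VNF sets are disjoint, that the PMD cores lie inside the hypervisor set, and encodes the PMD cores as a hexadecimal CPU bitmask. The function names are hypothetical, and the NUMA/PCIe locality issues mentioned above are not modeled.

```python
def core_mask(cores) -> str:
    """Hex bitmask with bit k set for every core k in `cores`."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return hex(mask)


def plan_cores(hypervisor_cores, pmd_cores, vnf_cores) -> dict:
    """Check the two-set core partition and derive a PMD core mask.

    The hypervisor set (which also hosts the DPDK PMD threads) and the VNF set
    must be disjoint; the PMD cores must be a subset of the hypervisor set.
    """
    hyp, pmd, vnf = set(hypervisor_cores), set(pmd_cores), set(vnf_cores)
    if hyp & vnf:
        raise ValueError(f"cores {sorted(hyp & vnf)} shared by hypervisor and VNFs")
    if not pmd <= hyp:
        raise ValueError(f"PMD cores {sorted(pmd - hyp)} outside the hypervisor set")
    return {"pmd-cpu-mask": core_mask(pmd), "vnf-cores": sorted(vnf)}


# Illustrative 8-core host: cores 0-3 for the hypervisor (2 and 3 run PMD
# threads), cores 4-7 pinned to VNFs.
plan = plan_cores(range(0, 4), {2, 3}, range(4, 8))
print(plan)
```

In OVS-DPDK, a mask of this form is what the `other_config:pmd-cpu-mask` setting expects (e.g., `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0xc`); pinning VNF vCPUs to the remaining cores is done through the hypervisor and is outside this sketch.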

2.4 Related Work

We now review some of the most prominent research work related to network function
virtualization and the network function placement and chaining problem. We start the section
by discussing recent efforts aimed at evaluating the technical feasibility of deploying network
functions on top of commodity hardware. Then, we review studies carried out to solve different
aspects of the virtual network function placement and chaining problem.
Hwang et al. (HWANG; RAMAKRISHNAN; WOOD, 2014) propose the NetVM platform
to allow network functions based on Intel DPDK technology to be executed at line-speed (i.e.,
10 Gb/s) on top of commodity hardware. According to the authors, it is possible to acceler-
ate network processing by mapping NIC buffers to user space memory. In another investiga-
tion, Martins et al. (MARTINS et al., 2014) introduce a high-performance middlebox platform
named ClickOS. It consists of a Xen-based middlebox software, which, by means of alterations
in I/O subsystems (back-end switch, virtual net devices and back and front-end drivers), can
sustain a throughput of up to 10 Gb/s. The authors show that ClickOS enables the execution
of hundreds of virtual network functions concurrently without incurring significant overhead
(in terms of delay) in packet processing. The results obtained by Hwang et al. and Martins et al. are promising and represent an important milestone toward making the idea of virtual network functions a reality.
To the best of our knowledge, network function placement and chaining has not been inves-
tigated before the inception of NFV, e.g., for planning and deployment of physical middleboxes.
One of the most closely related problems in the networking literature is Virtual Network Embedding (VNE): how to deploy virtual network requests on top of a physical substrate (ZHU; AMMAR, 2006; YU et al., 2008; LUIZELLI et al., 2016). In spite of the similarities, solutions to VNE are not appropriate for the SFPC. The reason is twofold according to (HERRERA; BOTERO, 2016). First, while in VNE we observe one-level mappings (virtual network requests → physical network), in NFV environments we have two-level mappings (service function chaining requests → virtual network function instances → physical network). Second, while the VNE problem considers only one type of physical device (i.e., routers), a much wider variety of network functions coexist in NFV environments.
With respect to placement and chaining of network functions, Barkai et al. (BARKAI et
al., 2013) and Basta et al. (BASTA et al., 2014) have taken a first step toward modeling this
problem. Barkai et al., for example, propose mechanisms to program network flows onto an SDN substrate, taking into account the virtual network functions through which packets from these flows need to pass. In short, the problem consists of mapping SDN traffic flows properly (i.e., in the right sequence) to virtual network functions. To solve it in a scalable manner, the authors propose a more efficient topology awareness component, which can be used to rapidly program network flows. Note that they do not aim at providing a (sub)optimal solution to the network function placement and chaining problem as we do in this thesis. Instead, the scope of their work is more operational, i.e., building an OpenFlow-based substrate that is efficient enough to allow flows (potentially hundreds of millions, with specific function processing requirements) to be mapped and programmed correctly and in a timely manner. Our solution could be used together with Barkai's and therefore help decide where to optimally place network functions and how to correctly map network flows.
The work by Basta et al., in turn, proposes an ILP model for network function placement in the context of cellular networks and crowd events. More specifically, the problem addressed is whether or not to virtualize and migrate mobile gateway functions to datacenters. When applicable, the model also encompasses the optimal selection of datacenters that will host the virtualized functions and SDN controllers. Although the paper covers optimal virtual function placement, the proposed model is restricted, as it does not deal with function chaining. Our proposal is, in comparison, a broader, optimal solution. It can be applied to plan
not only the placement of multiple instances of virtual network functions on demand, but also
to map and chain service functions.
Moens et al. (MOENS; TURCK, 2014) were the first to address VNF placement and chain-
ing, by formalizing it as an optimization problem. The authors consider a hybrid scenario,
where SFCs can be instantiated using existing middlebox hardware and/or virtual functions.

Lewin-Eytan et al. (LEWIN-EYTAN et al., 2015) follow a similar direction, and use an optimization model along with approximation algorithms to solve the problem. The focus, however, is on VNF placement: where to deploy VNFs, and how to assign traffic flows to them. Their work is relevant for having established a theoretical background for NFV placement, building on two classical optimization problems, namely facility location and generalized assignment. Nonetheless, network function chaining (key for NFV planning and deployment) is left out of scope. In turn, Ghaznavi et al. (GHAZNAVI et al., 2015) focus on cost-oriented placement of elastic demands. They propose a dynamic mechanism to place, migrate and/or reassign network traffic to cope with traffic fluctuations. However, the operational costs they consider (e.g., the number of running VNFs) are of limited practical relevance for real deployments. Indiscriminate migration/reallocation of VNFs (e.g., to save energy) might even increase the provider's operational costs (with respect to switching inside servers).
Mehraghdam et al. (MEHRAGHDAM; KELLER; KARL, 2014), Luizelli et al. (LUIZELLI et al., 2015) and Bari et al. (BARI et al., 2015) introduce joint optimization problems for placement and chaining of VNFs. Mehraghdam et al. (MEHRAGHDAM; KELLER; KARL, 2014) focus on formally specifying service chains and on analyzing the proposed ILP model under different objective functions. Luizelli et al. (LUIZELLI et al., 2015) also approach the problem from an optimization perspective. The authors introduce a general ILP model that takes into account end-to-end delay and resource constraints. They analyze the effect on resource consumption when deploying different service chains. Their proposed heuristic prunes the search space, reducing the complexity of finding feasible solutions. Subsequently, Bari et al. (BARI et al., 2015) propose a similar model focusing on reducing general operational expenditures in datacenters over time (e.g., energy consumption of deployed services). However, as they do not consider which VNFs to bring up/down (according to demand), their deployment solution may end up increasing operational expenditures by more than the expected footprint savings. Kuo et al. (KUO et al., 2016) explore the relation between resource consumption on physical servers and links.
Rost et al. (ROST; SCHMID, 2016) and Lukovszki et al. (LUKOVSZKI; ROST; SCHMID, 2016), in turn, were the first to introduce approximation algorithms for the joint NFV optimization problem. Rost et al. (ROST; SCHMID, 2016) present the first polynomial-time service chain approximation considering request admission control. The proposed solution is based on classical rounding techniques. Lukovszki et al. (LUKOVSZKI; ROST; SCHMID, 2016) propose a deterministic approximation algorithm based on submodular functions covering incremental deployment. Finally, Lukovszki and Schmid (LUKOVSZKI; SCHMID, 2015) introduce a deterministic online algorithm with a competitive ratio logarithmic in the length of service chains.
The work discussed thus far is summarized in Table 2.1. As one can observe from the state of the art, the area of network function virtualization has recently received much attention from academia and industry. Most of the effort has been focused on engineering ways of running network functions on top of commodity hardware and, more recently, on optimizing aspects of NFV-based environments. Regarding optimization, many studies have been proposed in the context of the VNF placement (CLAYMAN et al., 2014; GHAZNAVI et al., 2015; LEWIN-EYTAN et al., 2015) and chaining (MEHRAGHDAM; KELLER; KARL, 2014; LUIZELLI et al., 2015; BARI et al., 2015; RANKOTHGE et al., 2015; BOUET; LEGUAY; CONAN, 2015) problems.

Table 2.1: Summary of related approaches to the VNFPC problem.

| Authors | Placement and/or Chaining | Objective | Optimization Method | Infrastructure Size (Scalability) | Evaluation Scenario |
|---|---|---|---|---|---|
| (BASTA et al., 2014) | Placement | Min. network load | Exact | up to 20 nodes | Inter-datacenter |
| (MOENS; TURCK, 2014) | Placement (limited chaining) | Min. resource utilization | Exact | up to 7 nodes | Inter-datacenter |
| (LEWIN-EYTAN et al., 2015) | Placement | Min. number of VNFs | Approximation techniques | up to 100 nodes | Inter-datacenter |
| (GHAZNAVI et al., 2015) | Placement (limited chaining) | Min. number of VNFs / cost of assignment | Heuristic | up to 100 nodes | Intra-datacenter |
| (MEHRAGHDAM; KELLER; KARL, 2014) | Placement (limited chaining) | Min. number of active nodes / latency | Exact | up to 12 nodes | Inter-datacenter |
| Our proposal (LUIZELLI et al., 2015) | Placement and chaining | Min. number of active VNFs | Exact / Heuristic | up to 200 nodes | Inter-datacenter |
| (BARI et al., 2015) | Placement and chaining | Min. general expenditures (deployment and energy costs) | Exact / Heuristic | up to 80 nodes | Inter-datacenter |
| (KUO et al., 2016) | Placement and chaining | Max. admitted demands | Exact / Dynamic programming | up to 200 nodes | Intra-datacenter |
| (ROST; SCHMID, 2016) | Placement and chaining | Max. profit | Approximation techniques | – | Theoretical analysis |
| (LUKOVSZKI; ROST; SCHMID, 2016) | Placement | Min. number of active VNFs | Approximation algorithms | up to 100 nodes | Inter-datacenter |
| Our proposal (LUIZELLI et al., 2017a) | Placement and chaining | Min. number of active VNFs | Math-Heuristic | > 1000 nodes | Inter-datacenter |
| Our proposal OCM | Placement and chaining | Min. software switching cost | Heuristic | > 1000 nodes | Intra-datacenter |

Source: by author.

Despite their potential, the investigations referred to above do not properly scale to large network settings with several SFCs submitted in parallel. Moens et al. (MOENS; TURCK, 2014), for example, were limited to small-scale scenarios. Lewin-Eytan et al. (LEWIN-EYTAN et al., 2015) and Luizelli et al. (LUIZELLI et al., 2015) have been shown to be only partially effective in scaling to scenarios with a few hundred nodes and allocating network resources wisely (and the former does not approach chaining, as mentioned earlier). Our proposal in this thesis enhances the state of the art in that it outperforms the solutions of (LEWIN-EYTAN et al., 2015; LUIZELLI et al., 2015), coming up with feasible, high-quality solutions for larger scenarios in a timely fashion.
Additionally, none of these studies optimizes the operational costs of NFV deployments. Therefore, most of these solutions are not practical in NFV deployments, as they might lead to either infeasible or low-performance solutions. Perhaps the closest approaches to our proposal are (GHAZNAVI et al., 2015) and (BARI et al., 2015), where the authors propose models to minimize general operational expenditures. However, as discussed later, the operational cost depends to a large extent on many factors, including the way VNFs are deployed on physical servers (intra- and inter-server). Since most of these works focus on arbitrary cost functions (e.g., reducing the number of deployed VNFs), the operational cost of maintaining deployed network services is still an open research field. In this regard, we propose a general deployment mechanism specifically tailored to provide performance-oriented deployment.

3 PIECING TOGETHER THE NFV PROVISIONING PUZZLE: EFFICIENT PLACEMENT AND CHAINING OF VIRTUAL NETWORK FUNCTIONS

In the previous chapter, we presented the fundamental concepts of NFV and discussed the most prominent solutions to the placement and chaining of VNFs. In this thesis, we first approach the Inter-datacenter Virtual Network Function Placement and Chaining (VNFPC) problem¹. We start this chapter by formalizing the Inter-datacenter VNFPC problem and proposing an optimization model. Further, to cope with medium-size NFV infrastructures, we design a heuristic procedure that prunes the search space and, therefore, reduces the time needed to find feasible solutions. Then, we evaluate both the optimal and heuristic approaches considering different use cases and metrics, such as the number of instantiated virtual network functions, physical and virtual resource consumption, and end-to-end latencies.

The remainder of this chapter is organized as follows. In Section 3.1, we start with an overview of the VNFPC problem and a brief discussion of topological components of SFC requests. We then formalize the problem using an ILP model and prove that it is NP-complete. In Section 3.2, we describe the design of the proposed heuristic procedure. Last, in Section 3.3, we present the performance evaluation of both solutions.

3.1 Problem Overview and Optimization Model

As briefly explained in Chapter 1, network function placement and chaining consists of in-
terconnecting a set of network functions (e.g., firewall, load balancer, etc.) through the network
to ensure network flows are given the correct treatment. These flows must go through end-to-
end paths traversing a specific set of functions. In essence, this problem can be decomposed
into three phases: (i) placement, (ii) assignment, and (iii) chaining.
The placement phase consists of determining how many network function instances are
necessary to meet the current/expected demand and where to place them in the infrastructure.
Virtual network functions are expected to be placed on network points of presence (N-PoPs), which represent groups of (commodity) servers in specific locations of the infrastructure (with processing capacity). N-PoPs, in turn, could potentially be set up either in locations with previously installed switching and/or routing devices or in facilities such as datacenters.
¹ This chapter is based on the following publications:
• Marcelo Caggiani Luizelli, Leonardo Richter Bays, Marinho Pilla Barcellos, Luciana Salete Buriol, Luciano Paschoal Gaspary. Piecing Together the NFV Provisioning Puzzle: Efficient Placement and Chaining of Virtual Network Functions. In: Proceedings of IFIP/IEEE International Symposium on Integrated Network Management, 2015.
• Marcelo Caggiani Luizelli, Weverton Luis Cordeiro, Luciana Salete Buriol, Luciano Paschoal Gaspary. Fix-and-Optimize Approach for Efficient and Large Scale Virtual Network Function Placement and Chaining. Elsevier Computer Communications, 2017.

The assignment phase defines which placed virtual network function instances (in the N-PoPs) will be in charge of each flow. Based on the source and destination of a flow, instances are assigned to it in a way that prevents processing times from causing intolerable latencies. For example, it may be more efficient to assign network function requests to the nearest virtual network function instance or to simply split the requested demand between two or more virtual network functions (when possible).
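To make the two assignment options concrete, the following sketch implements a simple greedy rule for one flow: pick the nearest instance that can absorb the whole demand, and otherwise split the demand over instances in order of proximity. This is an illustrative heuristic with hypothetical names and data, not the assignment performed by the optimization model presented later in this chapter, which decides assignment jointly with placement and chaining.

```python
def assign(demand, instances, delay_to):
    """Greedy assignment of one network function request (illustrative only).

    demand:    computing power requested by the flow (integers, for exactness)
    instances: {instance_id: (npop, residual_capacity)} of the required VNF type
    delay_to:  {npop: delay from the flow's source to that N-PoP}
    Returns a list of (instance_id, share) pairs, or None if total capacity
    is insufficient. Residual capacities are not updated here.
    """
    # Candidate instances ordered from nearest to farthest.
    ordered = sorted(instances, key=lambda k: delay_to[instances[k][0]])
    # 1) Prefer the nearest single instance that can absorb the whole demand.
    for inst in ordered:
        if instances[inst][1] >= demand:
            return [(inst, demand)]
    # 2) Otherwise split the demand across instances, nearest first.
    plan, remaining = [], demand
    for inst in ordered:
        share = min(instances[inst][1], remaining)
        if share > 0:
            plan.append((inst, share))
            remaining -= share
    return plan if remaining == 0 else None


instances = {"fw-1": ("A", 10), "fw-2": ("B", 50), "fw-3": ("C", 30)}
delay_to = {"A": 1, "B": 5, "C": 3}
print(assign(20, instances, delay_to))
print(assign(70, instances, delay_to))
```

A demand of 20 goes entirely to fw-3 (the nearest instance with enough room), whereas a demand of 70 fits no single instance and is split as 10, 30 and 30 over fw-1, fw-3 and fw-2.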

In the third and final phase, the requested functions are chained. This process consists of cre-
ating paths that interconnect the network functions placed and assigned in the previous phases.
This phase takes into account two crucial factors, namely end-to-end path latencies and distinct
processing delays added by different virtual network functions. Figure 3.1 depicts the main
elements involved in virtual network function placement and chaining. The physical network is
composed of N-PoPs interconnected through physical links. There is a set of SFC requests that
contain logical sequences of network functions as well as the endpoints, which implicitly define
the paths. Additionally, the provider has a set of virtual network function images that it can
instantiate. In the figure, larger semicircles represent instances of network functions running
on top of an N-PoP, whereas the circumscribed semicircles represent network function requests
assigned to the placed instances. The gray area in the larger semicircles represents processing
capacity allocated to network functions that is not currently in use. Dashed lines represent paths
chaining the requested endpoints and network functions.
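The end-to-end latency that the chaining phase must respect combines link delays along the physical paths with the processing delay of each traversed function. The sketch below computes it for one forwarding path, under the simplifying assumption (not made by the model, which chooses routes itself) that each virtual link is routed along a minimum-delay physical path found with Dijkstra's algorithm. Topology, delays and the 8 ms bound are illustrative.

```python
import heapq


def shortest_delay(links, src, dst):
    """Dijkstra on one-way link delays; links maps (i, j) -> d_ij."""
    adj = {}
    for (i, j), d in links.items():
        adj.setdefault(i, []).append((j, d))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return float("inf")  # dst unreachable


def chain_delay(links, hosts, proc_delays):
    """End-to-end delay of one SFC forwarding path.

    hosts:       N-PoPs hosting [source endpoint, NF_1, ..., NF_k, sink endpoint]
    proc_delays: processing delay of NF_1, ..., NF_k
    """
    transit = sum(shortest_delay(links, a, b) for a, b in zip(hosts, hosts[1:]))
    return transit + sum(proc_delays)


# Illustrative topology: A-B (2 ms), B-C (3 ms), A-C (10 ms), all bidirectional.
links = {}
for i, j, d in [("A", "B", 2.0), ("B", "C", 3.0), ("A", "C", 10.0)]:
    links[(i, j)] = links[(j, i)] = d

delay = chain_delay(links, ["A", "B", "C"], [1.5])  # single NF hosted at B
print(delay, delay <= 8.0)  # compare with a hypothetical bound of 8 ms
```

Here the path A → B → C contributes 5 ms of transit and the function adds 1.5 ms, so the chain meets the assumed 8 ms bound.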

Figure 3.1: Example SFC deployment on a physical infrastructure to fulfill a number of re-
quests.

[Figure: (a) Physical infrastructure: N-PoPs in Regions A, B and C interconnected by physical links, hosting NF instances (NF1 to NF4) and assigned NF instances. (b) SFC requests: SFC-1, SFC-2 and SFC-3, each chaining network functions between endpoints A, B and C.]

Source: by author (2015).



3.1.1 Topological Components of SFC Requests

SFC requests may exhibit different characteristics depending on the application or flow they
must handle. More specifically, such requests may differ topologically and/or in size. In this chapter, we consider three basic types of SFC components, which may be combined with one another to form more complex requests. These three variations, namely (i) line, (ii) bifurcated path with different endpoints, and (iii) bifurcated path with a single endpoint, are explained next.
The simplest topological component that may be part of an SFC request is a line with two
endpoints and one or more network functions. This kind of component is suitable for handling
flows between two endpoints that have to pass through a particular sequence of network func-
tions, such as a firewall and a Wide Area Network (WAN) accelerator. The second and third
topological components are based on bifurcated paths. Network flows passing through bifur-
cated paths may end up at the same endpoint or not. Considering flows with different endpoints,
the most basic component contains three endpoints (one source and two destinations). Between
them, there is a network function that splits the traffic into different paths according to a certain
policy. A classical example that fits this topological component is a load balancer connected to
two servers. As for bifurcated paths with a single endpoint, we consider a scenario in which
different portions of traffic between two endpoints must be treated differently. For example,
part of the traffic has to pass through a specific firewall, while the other part, through an encryp-
tion function. Figure 3.2 illustrates these topological components. As previously mentioned,
more sophisticated SFC requests may be created by freely combining these basic topological
components among themselves or in a recursive manner.
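One way to encode these components is as directed acyclic graphs over SFC nodes, from which the distinct source-to-sink forwarding paths can be enumerated by depth-first search. The adjacency-dictionary encoding and the `forwarding_paths` helper are assumptions for illustration; concrete NF names replace the generic NF2...NFn chains of the figure.

```python
def forwarding_paths(graph, source):
    """All source-to-sink paths of an acyclic SFC forwarding graph.

    graph maps each SFC node to its successors; nodes without successors
    are treated as sink endpoints.
    """
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        successors = graph.get(path[-1], [])
        if not successors:
            paths.append(path)
        # Push in reverse so that branches are explored in their listed order.
        for nxt in reversed(successors):
            stack.append(path + [nxt])
    return paths


line = {"A": ["NF1"], "NF1": ["NF2"], "NF2": ["B"]}
split_endpoints = {"A": ["NF1"], "NF1": ["NF2", "NF3"], "NF2": ["B"], "NF3": ["C"]}
single_endpoint = {"A": ["NF1"], "NF1": ["NF2", "NF3"], "NF2": ["B"], "NF3": ["B"]}
for name, g in [("line", line),
                ("bifurcated, different endpoints", split_endpoints),
                ("bifurcated, single endpoint", single_endpoint)]:
    print(name, forwarding_paths(g, "A"))
```

The line yields a single path, while each bifurcated component yields two (ending at B and C, or both at B). Such per-path sequences correspond to the forwarding paths of an SFC that the model in Section 3.1.3 assumes are known in advance.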

Figure 3.2: Basic topological components of SFC requests.

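[Figure: (a) Line: A → NF1 → ... → NFn → B. (b) Bifurcated path with different endpoints: A → NF1, branching into NF2 ... NFn → B and NF3 ... NFm → C. (c) Bifurcated path with a single endpoint: A → NF1, branching into NF2 ... NFn and NF3 ... NFm, both ending at B.]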

Source: by author (2015).



3.1.2 Proof of NP-completeness

Next, we show that the VNF placement and chaining (VNFPC) problem is NP-complete.

Figure 3.3: Bin Packing instance reduction to VNFPC Problem.

[Figure: (a) BPP items represented as SFCs: SFC 1, ..., SFC |Q|, each going from endpoint o to endpoint t through one VNF with $c^S_{q,1} = c_q$. (b) NFV infrastructure created from the Bin Packing instance: special-purpose N-PoPs o and t ($c^P_o = c^P_t = 0$) connected by logical links to the bin N-PoPs 1, 2, 3, ... of capacity B, with $d^P_{i,j} = 1$ and $b^P_{i,j} = \infty$.]

Source: by author (2016).

Lemma 1. The VNFPC problem belongs to the class NP.

Proof. A solution to the problem consists of the sequence of links used to map the SFC, as well as the nodes where the functions are placed. All directed paths between two endpoints (at most $\binom{n/2}{2}$ pairs) of an SFC must be mapped to a valid path in the infrastructure. The solution can be guessed (by means of a nondeterministic Turing machine) and verified in polynomial time, while also accounting for the delay. Endpoints have to be mapped to pre-defined nodes, and checking this is trivial. Moreover, the resources consumed by all functions installed in each N-PoP cannot surpass its capacity; the resource consumption of each node can be accounted for while traversing the mapped paths.
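The verification step of Lemma 1 can be sketched as a function that checks a candidate solution in time linear in its size. For brevity, the sketch assumes a single line-shaped SFC (so the sum of all hop and processing delays is the end-to-end delay) and omits the bandwidth check, which is analogous; the dictionary layout and all names are hypothetical.

```python
def verify(sfc, infra, solution):
    """Polynomial-time check of a VNFPC certificate for one line-shaped SFC.

    sfc:      {"endpoints": {node: required_npop}, "cpu": {nf: demand},
               "proc": {nf: delay}, "links": [(k, l), ...], "max_delay": float}
    infra:    {"cpu": {npop: capacity}, "delay": {(i, j): link_delay}}
    solution: {"host": {sfc_node: npop}, "route": {(k, l): [npop, ..., npop]}}
    """
    host, route = solution["host"], solution["route"]
    # 1) Endpoints must be mapped to their pre-defined N-PoPs.
    if any(host[e] != loc for e, loc in sfc["endpoints"].items()):
        return False
    # 2) Total CPU demand placed on each N-PoP must not exceed its capacity.
    used = {}
    for nf, demand in sfc["cpu"].items():
        used[host[nf]] = used.get(host[nf], 0) + demand
    if any(load > infra["cpu"][npop] for npop, load in used.items()):
        return False
    # 3) Every virtual link must follow existing physical links between its hosts.
    total = sum(sfc["proc"].values())
    for k, l in sfc["links"]:
        path = route[(k, l)]
        if path[0] != host[k] or path[-1] != host[l]:
            return False
        for hop in zip(path, path[1:]):
            if hop not in infra["delay"]:
                return False
            total += infra["delay"][hop]
    # 4) For a line SFC, summing all hops gives the end-to-end delay.
    return total <= sfc["max_delay"]


infra = {"cpu": {"o": 0, "x": 10, "t": 0},
         "delay": {("o", "x"): 1.0, ("x", "t"): 1.0}}
sfc = {"endpoints": {"s": "o", "d": "t"}, "cpu": {"f": 6}, "proc": {"f": 0.5},
       "links": [("s", "f"), ("f", "d")], "max_delay": 3.0}
solution = {"host": {"s": "o", "f": "x", "d": "t"},
            "route": {("s", "f"): ["o", "x"], ("f", "d"): ["x", "t"]}}
print(verify(sfc, infra, solution))
```

Each check touches every element of the certificate a constant number of times, which is the polynomial bound the proof relies on.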

Lemma 2. Any Bin Packing instance can be reduced to an instance of the VNFPC problem.

Proof. An instance of the Bin Packing Problem (BPP), which is a classical NP-Complete
problem (GAREY; JOHNSON, 1979), comprises a set Q of items, a size cq for each item
q = 1, . . . , |Q|, a positive integer bin capacity B, and a positive integer n. The decision version
of the BPP asks if there is a partition of Q into disjoint sets Q1 , Q2 , . . . , Qn such that the sum of
sizes of the items in each subset is at most B. We reduce any instance of the BPP to an instance
of the VNFPC using the following procedure:
1. $|Q|$ SFCs are created, each with exactly three nodes and two links. Each SFC has one source endpoint, which must be mapped to N-PoP $o$, and one sink endpoint, which must be mapped to $t$. The middle node is a VNF, which requires a computing power of $c_q$. Each link demands unitary bandwidth, and the maximum delay requirement of each SFC is set to two (see Figure 3.3(a)).
2. An NFV infrastructure is created with $n + 2$ N-PoPs (nodes) and $2 \cdot n$ logical links (arcs). The $n$ central N-PoPs have a capacity $B$, and are linked to two other special-purpose N-PoPs, called $o$ and $t$. The logical links have an arbitrarily large bandwidth and an insignificant delay (see Figure 3.3(b)).
The reduction has polynomial time complexity O(|Q|).
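The construction of Lemma 2 and the correspondence used in Theorem 1 can be sketched as follows. The data layout and function names are hypothetical; link delays are set to 1, as labeled in Figure 3.3(b), so any route o → j → t meets the delay bound of two. The example packs four items of sizes 4, 3, 3 and 2 into two bins of capacity 6.

```python
def bpp_to_vnfpc(sizes, capacity, n_bins):
    """Build the VNFPC instance of Lemma 2 from a Bin Packing instance.

    sizes: item sizes c_q; capacity: bin capacity B; n_bins: number of bins n.
    """
    bins = [f"bin{j}" for j in range(1, n_bins + 1)]
    infra = {
        # n central N-PoPs with capacity B plus the special-purpose N-PoPs o and t
        "cpu": {"o": 0, "t": 0, **{b: capacity for b in bins}},
        # 2n logical links o->j and j->t with unbounded bandwidth and unit delay
        "links": {arc: {"bw": float("inf"), "delay": 1}
                  for b in bins for arc in (("o", b), (b, "t"))},
    }
    # One SFC per item: src -> vnf -> dst, unit bandwidth, delay bound of two.
    sfcs = [{"endpoints": {"src": "o", "dst": "t"}, "cpu": {"vnf": c},
             "bw": {("src", "vnf"): 1, ("vnf", "dst"): 1}, "max_delay": 2}
            for c in sizes]
    return infra, sfcs


def placement_from_packing(packing):
    """Theorem 1: item q packed in bin j <=> VNF of SFC q placed on N-PoP j."""
    return {q: f"bin{j}" for q, j in packing.items()}


def placement_fits(infra, sfcs, placement):
    """True if no N-PoP receives more VNF demand than its capacity."""
    load = {}
    for q, npop in placement.items():
        load[npop] = load.get(npop, 0) + sfcs[q]["cpu"]["vnf"]
    return all(load[npop] <= infra["cpu"][npop] for npop in load)


infra, sfcs = bpp_to_vnfpc(sizes=[4, 3, 3, 2], capacity=6, n_bins=2)
good = placement_from_packing({0: 1, 1: 2, 2: 2, 3: 1})  # bins hold 4+2 and 3+3
bad = placement_from_packing({0: 1, 1: 1, 2: 2, 3: 2})   # bin 1 would hold 4+3 > 6
print(placement_fits(infra, sfcs, good), placement_fits(infra, sfcs, bad))
```

Building the instance takes one pass over the items and the bins, and a packing is valid exactly when the corresponding placement respects the N-PoP capacities.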

Theorem 1. VNFPC is an NP-Complete problem.

Proof. By the instance reduction presented, if the BPP instance has a solution using $n$ bins, then the VNFPC instance has a solution using $n$ N-PoPs: each item of size $c_q$ allocated to bin $j$ corresponds to placing the function of SFC $q$ on N-PoP $j$. Conversely, if the VNFPC instance has a solution using $n$ N-PoPs, then the corresponding BPP instance has a solution using $n$ bins: placing the function of SFC $q$ on N-PoP $j$ corresponds to allocating the item of size $c_q$ to bin $j$. Lemmas 1 and 2 complete the proof.

3.1.3 Model Description and Notation

The optimization model presented here is an important building block throughout this thesis.
We adopt a revised version of the model proposed by Luizelli et al. (LUIZELLI et al., 2015),
which captures the placement and chaining aspects we are currently interested in.
We start by describing both the input and output of the model, and establishing a supporting
notation. We use superscript letters P and S to indicate symbols that refer to physical resources
and SFC requests, respectively. Similarly, superscript letters N and L indicate references to N-
PoPs/endpoints, and the links that connect them. We also use superscript H to denote symbols
that refer to a subset (sub-graph) of an SFC request.
The optimization model we use for solving the VNFPC problem considers a set of composite services $Q$ and a physical infrastructure $p$, the latter a triple $p = (N^P, L^P, S^P)$. $N^P$ is a set of network nodes (either an N-PoP or a packet forwarding/switching device), and pairs $(i, j) \in L^P$ denote unidirectional physical links. We use two pairs in opposite directions (e.g., $(i, j)$ and $(j, i)$) to denote bidirectional links. The set of tuples $S^P = \{\langle i, r \rangle \mid i \in N^P \land r \in \mathbb{N}^*\}$ contains the actual location (represented as an integer identifier) of N-PoP $i$. Observe that more than one N-PoP may be associated to the same location (e.g., N-PoPs in a specific room or datacenter). The model captures the following resource constraints: computing power for N-PoPs ($c^P_i$), and one-way bandwidth and delay for physical links ($b^P_{i,j}$ and $d^P_{i,j}$, respectively).
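A possible in-memory representation of the physical infrastructure $p$, assuming for simplicity that each node has exactly one location identifier, so that $S^P$ can be stored as a mapping from $i$ to $r$. Class and method names, and the units in the example (Mb/s and ms), are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalInfra:
    """Hypothetical container for p = (N^P, L^P, S^P) and its capacities."""
    cpu: dict = field(default_factory=dict)       # i in N^P -> c^P_i
    links: dict = field(default_factory=dict)     # (i, j) in L^P -> (b^P_ij, d^P_ij)
    location: dict = field(default_factory=dict)  # <i, r> in S^P, stored as i -> r

    def add_node(self, i, cpu, location):
        """Add an N-PoP (or a forwarding device with cpu=0) at location r."""
        self.cpu[i] = cpu
        self.location[i] = location

    def add_bidirectional_link(self, i, j, bandwidth, delay):
        """A bidirectional link is stored as two opposite unidirectional pairs."""
        self.links[(i, j)] = (bandwidth, delay)
        self.links[(j, i)] = (bandwidth, delay)

    def nodes_at(self, r):
        """All nodes sharing location r (e.g., N-PoPs in the same datacenter)."""
        return sorted(i for i, loc in self.location.items() if loc == r)


p = PhysicalInfra()
p.add_node("npop1", cpu=100, location=1)
p.add_node("npop2", cpu=100, location=1)  # same datacenter as npop1
p.add_node("sw1", cpu=0, location=2)      # switching device, no compute
p.add_bidirectional_link("npop1", "sw1", bandwidth=10_000, delay=0.5)
print(sorted(p.links), p.nodes_at(1))
```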
Table 3.1: Glossary of symbols and functions related to the optimization model.

| Symbol | Formal specification | Definition |
|---|---|---|
| Sets and set objects | | |
| $p$ | $p = (N^P, L^P, S^P)$ | Physical network infrastructure, composed of nodes and links |
| $i \in N^P$ | $N^P = \{i \mid i \text{ is an N-PoP}\}$ | Network points of presence (N-PoPs) in the physical infrastructure |
| $(i, j) \in L^P$ | $L^P = \{(i, j) \mid i, j \in N^P\}$ | Unidirectional links connecting pairs of N-PoPs $i$ and $j$ |
| $\langle i, r \rangle \in S^P$ | $S^P = \{\langle i, r \rangle \mid i \in N^P \land r \in \mathbb{N}^*\}$ | Identifier $r$ of the actual location of N-PoP $i$ |
| $m \in F$ | $F = \{m \mid m \text{ is a function type}\}$ | Types of virtual network functions available |
| $j \in U_m$ | $U_m = \{j \mid j \text{ is an instance of } m \in F\}$ | Instances of virtual network function $m$ available |
| $q \in Q$ | – | Service function chaining (SFC) requests that must be deployed |
| $q$ | $q = (N^S_q, L^S_q, S^S_q)$ | A single SFC request, composed of VNFs and their chainings |
| $i \in N^S_q$ | $N^S = \{i \mid i \text{ is a VNF instance or endpoint}\}$ | SFC nodes (either a network function instance or an endpoint) |
| $(i, j) \in L^S_q$ | $L^S_q = \{(i, j) \mid i, j \in N^S\}$ | Unidirectional links connecting SFC nodes |
| $\langle i, r \rangle \in S^S_q$ | $S^S_q = \{\langle i, r \rangle \mid i \in N^S \land r \in \mathbb{N}^*\}$ | Required physical location $r$ of SFC endpoint $i$ |
| $H^H_{q,i} \in H^S_q$ | – | Distinct forwarding paths (subgraphs) contained in a given SFC $q$ |
| $H^H_{q,i}$ | $H^H_{q,i} = (N^H_{q,i}, L^H_{q,i})$ | A possible subgraph (with two endpoints only) of SFC $q$ |
| $N^H_{q,i}$ | $N^H_{q,i} \subseteq N^S_q$ | VNFs that compose the SFC subgraph $H^H_{q,i}$ |
| $L^H_{q,i}$ | $L^H_{q,i} \subseteq L^S_q$ | Links that compose the SFC subgraph $H^H_{q,i}$ |
| Parameters | | |
| $c^P_i \in \mathbb{R}^+$ | – | Computing power capacity of N-PoP $i$ |
| $b^P_{i,j} \in \mathbb{R}^+$ | – | One-way link bandwidth between N-PoPs $i$ and $j$ |
| $d^P_{i,j} \in \mathbb{R}^+$ | – | One-way link delay between N-PoPs $i$ and $j$ |
| $c^S_{q,i} \in \mathbb{R}^+$ | – | Computing power required for network function $i$ of SFC $q$ |
| $b^S_{q,i,j} \in \mathbb{R}^+$ | – | One-way link bandwidth required between nodes $i$ and $j$ of SFC $q$ |
| $d^S_q \in \mathbb{R}^+$ | – | Maximum tolerable end-to-end delay of SFC $q$ |
| Functions | | |
| $f_{type}(m)$ | $f_{type}: N^P \cup N^S \to F$ | Type of some given virtual network function (VNF) |
| $f_{cpu}(m, j)$ | $f_{cpu}: (F \times U_m) \to \mathbb{R}^+$ | Computing power associated to instance $j$ of VNF type $m$ |
| $f_{delay}(m)$ | $f_{delay}: F \to \mathbb{R}^+$ | Processing delay associated to VNF type $m$ |
| Variables | | |
| $y_{i,m,j} \in Y$ | $Y = \{y_{i,m,j}, \forall i \in N^P, m \in F, j \in U_m\}$ | VNF placement |
| $a^N_{i,q,j} \in A^N$ | $A^N = \{a^N_{i,q,j}, \forall i \in N^P, q \in Q, j \in N^S_q\}$ | Assignment of required network functions/endpoints |
| $a^L_{i,j,q,k,l} \in A^L$ | $A^L = \{a^L_{i,j,q,k,l}, \forall (i, j) \in L^P, q \in Q, (k, l) \in L^S_q\}$ | Chaining allocation |

Observe that the forwarding graph of a composite service may represent any topology. Figures 3.2(a) and 3.2(b) illustrate topologies containing simple transitions and flow branches (note that flow joins may also be used, as illustrated in Figure 3.2(c)). We assume, for simplicity, that
the set of virtual paths available to carry traffic flows is known in advance, as such paths are
convenient for determining end-to-end delays among pairs of endpoints in our model. In Fig-
ure 3.2(b), there are two virtual paths: one starting in A and ending in B, and another starting in
A and ending in C (both traversing NFs 1 to n). It is important to emphasize here that such a set of virtual paths is defined according to the network policy being implemented. For example, a network policy that allows traffic to go from and to all endpoints requires all paths to be known in advance. In this context, a path refers to the sequence of virtual network functions and endpoints that a specific traffic flow should pass through in a particular composite service, rather than to a routing path in the physical infrastructure (the latter is determined by the model, as shown next). This assumption neither restricts the model nor increases its complexity, since finding paths (e.g., the shortest one) is known to have polynomial time complexity.
The set of virtual paths of a composite service $q$ is denoted by $H^S_q$. Each element $H^H_{q,i} \in H^S_q$ is one possible sub-graph of $q$, and contains one source and one sink endpoint, and only one possible forward path. The subsets $N^H_{q,i} \subseteq N^S_q$ and $L^H_{q,i} \subseteq L^S_q$ contain the VNFs and links that belong to $H^H_{q,i}$.
A composite service $q \in Q$ is an aggregation of network functions and the chaining between them. It is represented as a triple $q = (N^S_q, L^S_q, S^S_q)$. Sets $N^S_q$ and $L^S_q$ contain the SFC nodes and the virtual links connecting them, respectively. Each SFC has at least two endpoints, denoting specific locations in the infrastructure. The required locations of SFC endpoints are determined in advance, and given by $S^S_q = \{\langle i, r \rangle \mid i \in N^S_q \land r \in \mathbb{N}^*\}$. For each composite service $q$, we capture the following resource requirements: computing power required by a network function $i$ ($c^S_{q,i}$), minimum bandwidth required for traffic flows between functions $i$ and $j$ ($b^S_{q,i,j}$), and maximum tolerable end-to-end delay ($d^S_q$).
F denotes the set of VNF types (e.g., firewall, gateway) available for deployment. Each VNF type $m \in F$ has a set $U_m$ of instances, and may be instantiated at most $|U_m|$ times (e.g., due to the number of licenses purchased or available). We denote by $f_{type}: N^P \cup N^S \to F$ the function that indicates the type of a given VNF, which can be either one instantiated in some N-PoP ($N^P$) or one requested in an SFC ($N^S$). We also use functions $f_{cpu}: (F \times U_m) \to \mathbb{R}^+$ and $f_{delay}: F \to \mathbb{R}^+$ to denote the computing power requirement and the processing delay of a VNF, respectively.
The model output is denoted by a 3-tuple $\chi = \{Y, A^N, A^L\}$. Variables from $Y = \{y_{i,m,j}, \forall i \in N^P, m \in F, j \in U_m\}$ indicate a VNF placement, i.e., whether instance j of network function m is mapped to N-PoP i. The variables from $A^N = \{a^N_{i,q,j}, \forall i \in N^P, q \in Q, j \in N^S_q\}$, in turn, represent an assignment of required network functions/endpoints. They indicate whether node j (either a network function or an endpoint), required by SFC q, is assigned to node i (either an N-PoP or another device in the network, respectively). Finally, variables from $A^L = \{a^L_{i,j,q,k,l}, \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q\}$ indicate a chaining allocation, i.e., whether the virtual link (k, l) from SFC q is hosted by physical link (i, j). Each of these variables assumes a value in {0, 1}.

3.1.4 Model Formulation

Next, we describe the integer linear programming formulation for the VNFPC problem. For convenience, Table 3.1 presents the complete notation used in the formulation. The objective function minimizes the number of VNF instances mapped on the infrastructure. This choice was based on the fact that resource allocation has a significant and direct impact on operational costs. It is important to emphasize, however, that other objective functions could be adopted (either individually or in combination). Examples include overall bandwidth commitment, end-to-end delays, energy-aware deployments, survivability of SFCs, and VNF load balancing, to name a few. Since constraints (3.1)-(3.11) (detailed next) ensure a feasible solution, any objective function that does not depend on additional constraints should work properly. However, some objective functions might require minor modifications to the model (e.g., additional constraints) in order to work as expected, which is out of the scope of this thesis.

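\begin{align*}
\text{minimize} \quad & \sum_{i \in N^P} \sum_{m \in F} \sum_{j \in U_m} y_{i,m,j} \\
\text{subject to} \quad & \sum_{m \in F} \sum_{j \in U_m} y_{i,m,j} \cdot f_{cpu}(m, j) \le c^P_i & & \forall i \in N^P \tag{3.1} \\
& \sum_{q \in Q} \sum_{j \in N^S_q : f_{type}(j) = f_{type}(m)} c^S_{q,j} \cdot a^N_{i,q,j} \le \sum_{j \in U_m} y_{i,m,j} \cdot f_{cpu}(m, j) & & \forall i \in N^P, m \in F \tag{3.2} \\
& \sum_{q \in Q} \sum_{(k,l) \in L^S_q} b^S_{q,k,l} \cdot a^L_{i,j,q,k,l} \le b^P_{i,j} & & \forall (i,j) \in L^P \tag{3.3} \\
& \sum_{i \in N^P} a^N_{i,q,j} = 1 & & \forall q \in Q, j \in N^S_q \tag{3.4} \\
& a^N_{i,q,k} \cdot l = a^N_{i,q,k} \cdot j & & \forall \langle i,j \rangle \in S^P, q \in Q, \langle k,l \rangle \in S^S_q \tag{3.5} \\
& a^N_{i,q,k} \le \sum_{m \in F} \sum_{j \in U_m : m = f_{type}(k)} y_{i,m,j} & & \forall i \in N^P, q \in Q, k \in N^S_q \tag{3.6} \\
& \sum_{j \in N^P} a^L_{i,j,q,k,l} - \sum_{j \in N^P} a^L_{j,i,q,k,l} = a^N_{i,q,k} - a^N_{i,q,l} & & \forall q \in Q, i \in N^P, (k,l) \in L^S_q \tag{3.7} \\
& \sum_{(i,j) \in L^P} \sum_{(k,l) \in L^H_{q,t}} a^L_{i,j,q,k,l} \cdot d^P_{i,j} + \sum_{i \in N^P} \sum_{k \in N^H_{q,t}} a^N_{i,q,k} \cdot f_{delay}(k) \le d^S_q & & \forall q \in Q, (N^H_{q,t}, L^H_{q,t}) \in H_q \tag{3.8} \\
& y_{i,m,j} \in \{0, 1\} & & \forall i \in N^P, m \in F, j \in U_m \tag{3.9} \\
& a^N_{i,q,j} \in \{0, 1\} & & \forall i \in N^P, q \in Q, j \in N^S_q \tag{3.10} \\
& a^L_{i,j,q,k,l} \in \{0, 1\} & & \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q \tag{3.11}
\end{align*}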

The first three constraint sets refer to limitations of physical resources. Constraint set (3.1) ensures that, for each N-PoP, the sum of computing power required by all VNF instances mapped to it does not exceed its available capacity. Constraint set (3.2) ensures that, on each N-PoP, the processing capacity required by the flows assigned to VNFs of a given type does not exceed the capacity of the instances of that type deployed there. Finally, constraint set (3.3) ensures that the bandwidth required by the virtual links hosted on each physical link does not exceed that link's capacity.
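As an illustration of what constraint sets (3.1) and (3.3) enforce, the following Python sketch checks a candidate solution against them. It is a minimal sketch, not part of the thesis implementation: the sparse encoding (only variables equal to 1 are stored), the function name, and the example values are hypothetical, and constraint set (3.2) is omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def check_resource_constraints(
    y: Set[Tuple[str, str, int]],
    f_cpu: Dict[Tuple[str, int], float],
    c_P: Dict[str, float],
    a_L: Set[Tuple[str, str, str, str, str]],
    b_S: Dict[Tuple[str, str, str], float],
    b_P: Dict[Tuple[str, str], float],
) -> List[tuple]:
    """Return the violated instances of constraint sets (3.1) and (3.3).

    y   : (i, m, j) triples with y_{i,m,j} = 1
    a_L : (i, j, q, k, l) tuples with a^L_{i,j,q,k,l} = 1
    """
    violations: List[tuple] = []

    # (3.1): CPU of all instances placed on N-PoP i must fit its capacity c^P_i.
    cpu_used: Dict[str, float] = defaultdict(float)
    for i, m, j in y:
        cpu_used[i] += f_cpu[(m, j)]
    for i, used in cpu_used.items():
        if used > c_P[i]:
            violations.append(("3.1", i, used, c_P[i]))

    # (3.3): bandwidth of all virtual links hosted on physical link (i, j) must fit b^P_{i,j}.
    bw_used: Dict[Tuple[str, str], float] = defaultdict(float)
    for i, j, q, k, l in a_L:
        bw_used[(i, j)] += b_S[(q, k, l)]
    for link, used in bw_used.items():
        if used > b_P[link]:
            violations.append(("3.3", link, used, b_P[link]))

    return violations


# Hypothetical example: two firewall instances (25% and 100% CPU) on the same N-PoP.
c_P = {"p1": 1.0, "p2": 1.0}
f_cpu = {("fw", 0): 0.25, ("fw", 1): 1.0}
y = {("p1", "fw", 0), ("p1", "fw", 1)}
b_P = {("p1", "p2"): 10.0}
b_S = {("q1", "A", "FW"): 1.0}
a_L = {("p1", "p2", "q1", "A", "FW")}
print(check_resource_constraints(y, f_cpu, c_P, a_L, b_S, b_P))
```

Here N-PoP p1 would need 125% of its CPU, so (3.1) is reported as violated, while the 1 Gbps virtual link fits comfortably on the 10 Gbps physical link.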
Constraint sets (3.4)-(3.6) ensure the mandatory placement of all virtual resources. Constraint set (3.4) ensures that each SFC (and each of its network functions and endpoints) is mapped to the infrastructure exactly once. Constraint set (3.5), in turn, guarantees that required endpoints are mapped to network devices in the requested physical locations. Constraint set (3.6) ensures that, if a VNF requested by an SFC is assigned to a given N-PoP, then at least one instance of that VNF type is running (placed) on that N-PoP.
Constraint sets (3.7) and (3.8) refer to VNF chaining. Constraint set (3.7) is a flow conservation constraint: for each virtual link, it ensures that there is a continuous physical path between the nodes hosting its two ends, and hence an end-to-end path between required endpoints. Constraint set (3.8) ensures that the latency constraints of mapped SFC requests are met for each virtual path. The first term of the inequality sums the delays of the physical links that host the virtual links of the path. The second term accounts for the processing delay of the VNFs traversed by flows along that path.
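The left-hand side of (3.8) can be evaluated directly for a given mapping. The Python sketch below is illustrative only: it considers a single SFC (so the index q is dropped from the chaining variables), and the mapping, the link delays, and the bound $d^S_q$ are hypothetical; only the 7.0771 ms virtual firewall processing time is taken from Table 3.2, used later in the evaluation.

```python
from typing import Dict, Iterable, Set, Tuple


def path_delay(
    a_L: Set[Tuple[str, str, str, str]],
    d_P: Dict[Tuple[str, str], float],
    path_links: Set[Tuple[str, str]],
    path_nodes: Iterable[str],
    proc_delay: Dict[str, float],
) -> float:
    """Left-hand side of (3.8) for one virtual path H_{q,t} of a single SFC.

    a_L        : (i, j, k, l) tuples: physical link (i, j) hosts virtual link (k, l)
    path_links : L^H_{q,t};  path_nodes : N^H_{q,t}
    proc_delay : processing delay per requested node (endpoints default to 0)
    """
    # First term: delay of every physical link carrying a virtual link of this path.
    links = sum(d_P[(i, j)] for (i, j, k, l) in a_L if (k, l) in path_links)
    # Second term: by (3.4) each node is assigned exactly once, so the sum over
    # N-PoPs collapses to one processing delay per node on the path.
    processing = sum(proc_delay.get(k, 0.0) for k in path_nodes)
    return links + processing


# Hypothetical Component 1 mapping: A on p1, FW on p2, B on p3 (delays in ms).
a_L = {("p1", "p2", "A", "FW"), ("p2", "p3", "FW", "B")}
d_P = {("p1", "p2"): 5.0, ("p2", "p3"): 10.0}
delay = path_delay(a_L, d_P, {("A", "FW"), ("FW", "B")}, ["A", "FW", "B"], {"FW": 7.0771})
d_S = 30.0  # hypothetical bound d^S_q
print(f"{delay:.4f} ms, meets bound: {delay <= d_S}")
```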

3.2 Proposed Heuristic

In this section, we present our heuristic approach for efficiently placing, assigning, and chaining virtual network functions. We detail each procedure it uses to build a feasible solution and present an overview of its algorithmic process.
In this particular problem, the search performed by the integer programming solver explores a large number of symmetric feasible solutions. This is mainly because a considerable number of network function mappings/assignments satisfy all constraints, and because the search schemes of commercial solvers are not specialized for the problem at hand.
To address these issues, our heuristic dynamically guides the search performed by the solver so that it quickly arrives at high-quality, feasible solutions. It does so by searching for the lowest number of network function instances that meets the current demands. In each iteration, the heuristic employs a modified version of the proposed ILP model in which the objective function is removed and turned into a constraint, resulting in a more constrained version of the original model. This strategy takes advantage of two facts: first, there tends to be a significant number of feasible, symmetric solutions that meet our optimality criterion, and once the lowest possible number of network function instances is determined, only one such solution needs to be found; and second, commercial solvers are very efficient at finding feasible solutions.
Algorithm 1 presents a simplified pseudocode version of our heuristic approach; its details are explained next.

Algorithm 1 Overview of the proposed heuristic.

Input: Infrastructure G, set Q of SFCs, set VNF of network functions, timeLimit
1: s, s′ ← ∅
2: upperBound ← |F|
3: lowerBound ← 1
4: nf ← ⌊(upperBound + lowerBound)/2⌋
5: while lowerBound ≤ upperBound do
6:   nf ← ⌊(upperBound + lowerBound)/2⌋
7:   Remove objective function
8:   Add constraint: $\sum_{i \in N^P, m \in F, j \in U_m} y_{i,m,j} \le nf$
9:   s ← solveAlteredModel(timeLimit)
10:  if s is feasible then
11:    s′ ← s
12:    upperBound ← nf − 1
13:  else
14:    lowerBound ← nf + 1
15:  end if
16: end while
17: if s′ = ∅ then
18:   return infeasible solution
19: else
20:   return s′
21: end if

The heuristic iteratively attempts to find a more constrained model by dynamically adjusting the number of network functions that must be instantiated on the infrastructure. The upper bound of this search is initially set to the maximum number of network functions that may be instantiated on the infrastructure (line 2), while the lower bound is initialized as 1 (line 3). In each iteration, the maximum number of network function instances allowed is represented by variable nf, which is set to the midpoint between the current upper and lower bounds (line 6). After nf is updated, the algorithm transforms the original model into the bounded one by removing the objective function (line 7) and adding a new constraint (line 8) that uses the computed value of nf. The added constraint is shown in Equation (3.12).

\begin{equation*}
\sum_{i \in N^P,\, m \in F,\, j \in U_m} y_{i,m,j} \le nf \tag{3.12}
\end{equation*}

In line 9, a commercial solver is used to obtain a solution for the bounded model within the given time limit (timeLimit). In each iteration, the algorithm stores the best solution found so far (i.e., the feasible solution s with the lowest value of nf; line 11). Afterwards, it adjusts the upper or lower bound depending on whether the current solution is feasible or not (lines 12 and 14). Last, it returns the best solution found (s′, which represents variables $Y$, $A^N$, and $A^L$).
Although the proposed heuristic relies on an exact solver to find feasible solutions, timeLimit (line 9) should be tuned according to the size of the instance being handled, so that the solver has enough time to find feasible solutions for the tightest values of nf. In our experience, for example, a time limit on the order of minutes is sufficient for infrastructures with 200 N-PoPs.
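The search over nf in Algorithm 1 is a binary search on the instance budget. The Python sketch below shows only that outer loop; it assumes feasibility is monotone in nf (if a budget admits a solution, any larger budget does too, since constraint (3.12) only loosens). The callback `solve_bounded` and the toy oracle are hypothetical stand-ins for solving the altered ILP with a solver such as CPLEX under timeLimit; as in Algorithm 1, a budget for which no solution is found in time is treated as infeasible, so a too-short time limit can make the result less tight.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")


def min_instance_budget(
    solve_bounded: Callable[[int], Optional[S]], upper: int, lower: int = 1
) -> Tuple[Optional[int], Optional[S]]:
    """Binary search for the smallest instance budget nf with a feasible solution.

    solve_bounded(nf) stands in for solving the model without its objective and
    with the extra constraint sum(y) <= nf (Eq. 3.12) under a time limit. It must
    return a solution, or None when none is found.
    """
    best_nf: Optional[int] = None
    best: Optional[S] = None
    while lower <= upper:
        nf = (lower + upper) // 2
        s = solve_bounded(nf)
        if s is not None:
            best_nf, best = nf, s  # keep the tightest feasible solution so far
            upper = nf - 1         # try an even smaller budget
        else:
            lower = nf + 1         # no solution within this budget: relax it
    return best_nf, best


# Toy oracle standing in for the solver: feasible whenever at least 7 instances are allowed.
def toy_solver(nf: int) -> Optional[str]:
    return f"placement using <= {nf} instances" if nf >= 7 else None


print(min_instance_budget(toy_solver, upper=20))
```

With 20 as the upper bound, the toy search probes budgets 10, 5, 7, and 6 before settling on 7, i.e., it needs a logarithmic number of solver calls rather than one per candidate budget.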

3.3 Evaluation

In order to evaluate the provisioning of different types of SFCs, the ILP model formalized in the previous section was implemented and run in CPLEX Optimization Studio² version 12.4. The heuristic, in turn, was implemented and run in Python. All experiments were performed on a machine with four Intel Xeon E5-2670 processors and 56 GB of RAM, running the Ubuntu GNU/Linux Server 11.10 x86_64 operating system.

3.3.1 Setup

We consider four different types of SFC components. Each type uses either one of the topological components described in Subsection 3.1.1 or a combination of them. The first component is a line composed of a single firewall between the two endpoints (Figure 3.2(a)). The second component consists of a bifurcated path with different endpoints (Figure 3.2(b)), in which a load balancer splits the traffic between two servers. These two types of components are comparable, since their end-to-end paths pass through exactly one network function. The third and fourth components use the same topologies as the first and second ones, respectively, but are larger. The third component is a line (like Component 1) composed of two chained network functions: a firewall followed by an encryption network function (e.g., VPN). The fourth component is a bifurcated path (like Component 2), but after the load balancer, traffic is forwarded to one more network function, a firewall. These particular network functions were chosen because they are commonly referenced in recent literature; however, they could easily be replaced with any other functions if desired. All network functions requested by SFCs have the same requirements in terms of CPU and bandwidth: each network function requires 12.5% of CPU, while the chaining between network functions requires 1 Gbps of bandwidth. When traffic passes through a load balancer, the required bandwidth is split between the paths. The values for CPU and bandwidth requirements were fixed after a preliminary evaluation, which revealed that they did not have a significant impact on the obtained results. Moreover, setting static values for these parameters facilitates assessing the impact of other, more important factors.
The processing times of virtual network functions (i.e., the time these functions require to process each incoming packet) considered in our evaluation are shown in Table 3.2. These values are based on the study conducted by Dobrescu et al. (DOBRESCU; ARGYRAKI; RATNASAMY, 2012), in which the authors determine the average processing time of a number of software-implemented network functions.

² http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/

Table 3.2: Processing times of physical and virtual network functions used in our evaluation.

Network Function | Processing Time (physical) | Processing Time (virtual)
Load Balancer    | 0.2158 ms                  | 0.6475 ms
Firewall         | 2.3590 ms                  | 7.0771 ms
VPN Function     | 0.5462 ms                  | 1.6385 ms

Source: by author (2015).

Networks used as physical substrates were generated with Brite³. The topology of these networks follows the Barabási-Albert (BA-2) model (ALBERT; BARABÁSI, 2000). This type of topology was chosen as an approximation of those observed in real ISP environments. Physical networks have a total of 50 N-PoPs, each with a total CPU capacity of 100%, while the bandwidth of physical links is 10 Gbps. The average delay of physical links is 30 ms. This value is based on the study conducted by Choi et al. (CHOI et al., 2007), which characterizes typical packet delays in ISP networks.
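For readers unfamiliar with the Barabási-Albert model, the Python sketch below illustrates the preferential-attachment process it is built on. It is not Brite and does not reproduce Brite's output: reading "BA-2" as two links per new node, starting from a small clique, and the function name are all our assumptions. Link attributes such as the 10 Gbps capacity and the delays would be attached to the resulting edges afterwards.

```python
import random
from typing import List, Set, Tuple


def barabasi_albert(n: int, m: int = 2, seed: int = 0) -> List[Tuple[int, int]]:
    """Edges of an undirected Barabasi-Albert graph with n nodes.

    Each new node connects to m distinct existing nodes, chosen with probability
    proportional to their current degree (preferential attachment).
    """
    rng = random.Random(seed)
    # Seed graph: a clique of m + 1 nodes, so every node starts with degree >= m.
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Every node appears once per incident edge, so a uniform draw from this
    # list selects nodes proportionally to their degree.
    pool = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        chosen: Set[int] = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in sorted(chosen):
            edges.append((t, new))
            pool.extend((t, new))
    return edges


edges = barabasi_albert(50, m=2, seed=42)
print(len(edges))  # 3 seed edges + 47 new nodes * 2 links = 97
```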
In order to compare virtualized network functions with non-virtualized ones, we consider baseline scenarios for each type of SFC. These scenarios aim at reproducing the behavior of environments that employ physical middleboxes rather than NFV. Our baseline consists of a modified version of our model, in which the total number of network functions is exactly the number of different functions being requested. Moreover, its objective function seeks the minimum chaining length between endpoints and network functions. In baseline scenarios, function capacities are adjusted to meet all demands and, therefore, we do not consider capacity constraints. Further, processing times are one third of those in virtualized environments, in line with the study of Basta et al. (BASTA et al., 2014). These processing times, like the ones of virtual network functions, are shown in Table 3.2.
In our experiments, we consider two different profiles of network function instances. In the first one, instances may require either 12.5% or 25% of CPU, leading to smaller instance sizes. In the second profile, instances may require either 12.5% or 100% of CPU, leading to larger instances overall. We first evaluate our optimal (ILP) approach considering individual types of requests. Next, we evaluate a mixed scenario with multiple types of SFCs. Last, we evaluate our proposed heuristic on a larger infrastructure. Each experiment was repeated 30 times, with each repetition using a different physical network topology. All results have a confidence level of 90% or higher.

³ http://www.cs.bu.edu/brite/

3.3.2 Results

First, we analyze the number of network function instances needed to cope with an increasing number of SFC requests. Figure 3.4 depicts the average number of instantiated network functions with the number of SFC requests varying from 1 to 20. At each point on the graph, all previous SFC requests are deployed together. The number of instances grows proportionally to the number of SFC requests. Further, we observe that smaller instance sizes lead to a higher number of network functions being instantiated. Considering small instances, scenarios with Components 1 and 2 require, on average, 10 network function instances (Figure 3.4(a)). In contrast, scenarios with Components 3 and 4 require, on average, 20 and 30 instances, respectively (Figure 3.4(b)). For large instances, scenarios with Components 1 and 2 require, respectively, 4 and 3 network function instances, while those with Components 3 and 4 require 9 and 12 instances on average. These results indicate that the number of virtual network functions in an SFC request has a much more significant impact on the number of instances needed to serve such requests than the chaining between network functions and endpoints. This can be observed, for example, in Figure 3.4(a), in which Components 1 and 2 differ only topologically and lead, on average, to the same number of instances. In contrast, Figure 3.4(b) shows that handling components of type 4 (which have a higher number of network functions than those of type 3) requires a significantly higher number of network function instances.
Figure 3.5 illustrates the average CPU overhead (i.e., allocated but unused CPU resources) in all experiments. Each point on the graph represents the average overhead from the beginning of the experiment up to that point. In all experiments, CPU overheads tend to be lower when small instances are used; when large instances are allocated, more resources stay idle. Considering small instances, Components 1 and 2 (Figure 3.5(a)) lead to average CPU overheads of 7.80% and 7.28%, respectively, while Components 3 and 4 (Figure 3.5(b)) lead to 6.58% and 3.18%. In turn, for large instances, Components 1 and 2 lead to average CPU overheads of 45.61% and 38.68%, respectively, while Components 3 and 4 lead to 40.21% and 40.36%. These averages show that the impact of instance sizes is notably high, with smaller instances leading to significantly lower overheads.
Further, we observe that, in general, CPU overheads tend to be lower when more SFCs are deployed. As more requests are serviced simultaneously, network function instances can be shared among multiple requests, increasing the efficiency of CPU allocation. In these experiments, the baseline has 0% CPU overhead, as its network functions are planned in advance to support the exact demand. Since, in NFV environments, network function instances are hosted on commodity hardware (as opposed to specialized middleboxes), these overheads (especially those observed for small instances) are deemed acceptable, as they do not incur high additional costs.

Figure 3.4: Average number of network function instances. (a) Components 1 and 2; (b) Components 3 and 4. Axes: SFC requests vs. number of network functions instantiated; series: small instances, large instances, and baseline for each component.

Source: by author (2015).
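To make the CPU overhead metric and the effect of instance size concrete, the Python sketch below computes the idle share of allocated CPU when a given demand is served by the fewest instances of one size, together with the running average plotted at each point of Figure 3.5. The exact definition (idle CPU divided by allocated CPU) is our reading of "allocated but unused CPU resources", and the single-location, single-type setting and the example numbers are hypothetical simplifications.

```python
import math
from itertools import accumulate
from typing import List


def cpu_overhead(demand: float, instance_size: float) -> float:
    """Fraction of allocated CPU left idle when `demand` is served by the
    fewest instances of `instance_size` (assumed definition of overhead)."""
    instances = math.ceil(demand / instance_size)
    allocated = instances * instance_size
    return (allocated - demand) / allocated


def running_average(values: List[float]) -> List[float]:
    """Average from the first point up to each point, as described for Figure 3.5."""
    return [total / (k + 1) for k, total in enumerate(accumulate(values))]


# Three hypothetical firewall requests of 12.5% CPU each (37.5% in total).
print(cpu_overhead(0.375, 0.25))  # -> 0.25 (two 25% instances)
print(cpu_overhead(0.375, 1.0))   # -> 0.625 (one 100% instance)
print(running_average([0.5, 0.25, 0.0]))  # -> [0.5, 0.375, 0.25]
```

The same demand leaves a quarter of the allocated CPU idle with 25% instances but more than half with a single 100% instance, which mirrors the gap between small and large instances reported above.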


Next, Figure 3.6 shows the average overhead caused by chaining network functions (through virtual links) in each experiment. This overhead is measured as the ratio between the effective bandwidth consumed by SFC virtual links hosted on the physical substrate and the bandwidth requested by such links. In general, the actual bandwidth consumption is higher than the total bandwidth required by SFCs, due to the frequent need to chain network functions through paths composed of multiple physical links. There is no overhead when each virtual link is mapped to a single physical link (ratio of 1.0), or when network functions are mapped to the same devices as the requested endpoints (ratio < 1.0, since the corresponding virtual links consume no physical bandwidth). Lower overhead rates may lead to lower costs and allow more SFC requests to be serviced.
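The bandwidth overhead ratio can be computed from the number of physical links hosting each virtual link, since, by constraint (3.3), every hosting physical link carries the full bandwidth of the virtual link. The Python sketch below uses hypothetical hop counts for Components 1 and 2; the input encoding and the function name are illustrative assumptions.

```python
from typing import Iterable, Tuple


def bandwidth_overhead(virtual_links: Iterable[Tuple[float, int]]) -> float:
    """Ratio between bandwidth consumed on the substrate and bandwidth requested.

    Each item is (requested bandwidth, number of physical links hosting it);
    a virtual link whose ends share a device uses 0 physical links.
    """
    links = list(virtual_links)
    consumed = sum(bw * hops for bw, hops in links)
    requested = sum(bw for bw, _ in links)
    return consumed / requested


# Hypothetical Component 1 chains (1 Gbps per virtual link):
print(bandwidth_overhead([(1.0, 1), (1.0, 2)]))  # A->FW over 1 hop, FW->B over 2 hops -> 1.5
print(bandwidth_overhead([(1.0, 0), (1.0, 1)]))  # FW on A's device -> 0.5
# Component 2 after a load balancer: 1 Gbps split into two 0.5 Gbps branches.
print(bandwidth_overhead([(1.0, 2), (0.5, 1), (0.5, 3)]))  # -> 2.0
```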
Considering large instances, the observed average overhead is 50.49% and 69.87% for scenarios with Components 1 and 2, respectively. In turn, Components 3 and 4 lead to overhead ratios of 116% and 72.01%. This is due to the low number of instantiated network functions (Figure 3.4), which forces instances to be chained through long paths. In contrast, when small instances are considered (i.e., more instances running in a distributed way), overheads tend to be lower: Components 1 and 2 lead, on average, to 44.30% and 57.60% bandwidth overhead, while Components 3 and 4 lead to 44.53% and 53.41%, respectively. When evaluating bandwidth overheads, we observe that the topological structure of SFC requests has the most significant impact on the results (in contrast to the previously discussed experiments). More complex chainings tend to lead to higher bandwidth overheads, although these results are also influenced by other factors, such as instance sizes and the number of instantiated functions. In these experiments, the baseline overhead tends to be lower than the others, as its objective function prioritizes shortest paths (in terms of number of hops) between endpoints and network functions.

Figure 3.5: Average CPU overhead of network function instances. (a) Components 1 and 2; (b) Components 3 and 4. Axes: SFC requests vs. CPU overhead; series: small instances, large instances, and baseline for each component.

Source: by author (2015).
Figure 3.7 depicts the average end-to-end delay, in milliseconds, observed between endpoints in all experiments. The end-to-end delay is computed as the sum of the path delays and the network function processing times. In this figure, results for scenarios with small and large instances are grouped together, as their average delays are the same. The observed end-to-end delay for all components tends to be lower than that of the baseline scenario. This is mainly due to the better positioning of network functions and of the chaining between them. Furthermore, the model makes better use of the variety of existing paths in the infrastructure. Although the baseline scenario aims at building minimum chainings (in terms of hops), we observe that: (i) minimum chaining does not always lead to the global minimum delay; and (ii) baseline scenarios overuse the shortest paths, depleting resources in specific locations (mainly in the vicinity of highly interconnected nodes), while alternative paths remain unused. In comparison with baseline scenarios, Component 1 leads, on average, to about 26% lower delay (21.55 ms compared to 29.07 ms), while Component 2 leads, on average, to 15.40% lower delay (19.28 ms compared to 22.79 ms). In turn, Component 3 leads, on average, to 13.86% lower delay than its baseline (25.15 ms compared to 29.20 ms), while Component 4 leads to 15.75% lower delay (24.89 ms compared to 29.55 ms). In summary, even though our baseline scenarios are planned in advance to support exact demands and we consider processing times of virtual network functions to be three times those of physical ones, end-to-end delays are still lower in virtualized scenarios. This advantage may become even more significant as the processing times of virtual network functions approach those observed in physical middleboxes.

Figure 3.6: Average bandwidth overhead of SFCs deployed in the infrastructure. (a) Components 1 and 2; (b) Components 3 and 4. Axes: SFC requests vs. bandwidth overhead; series: small instances, large instances, and baseline for each component.

Source: by author (2015).
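The delay comparisons with the baseline are relative reductions, computed as (baseline − ours)/baseline. The short Python helper below is a sketch rather than the evaluation scripts: it applies that formula to the rounded average delays reported above, so its results may differ slightly from the reported percentages, which were presumably computed from unrounded values.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# Average delays (ms) reported above: (proposed approach, baseline).
delays = {
    "Component 1": (21.55, 29.07),
    "Component 2": (19.28, 22.79),
    "Component 3": (25.15, 29.20),
    "Component 4": (24.89, 29.55),
}
for name, (ours, base) in delays.items():
    print(f"{name}: {relative_reduction(ours, base):.2f}% lower than baseline")
```

Dividing by the baseline keeps every comparison on the same reference; dividing by the proposed approach's delay instead would express how much higher the baseline is, which yields a larger number for the same pair of values.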
After analyzing the behavior of SFCs considering homogeneous components, we now analyze the impact of a mixed scenario, in which Components 1, 2, and 4 are deployed sequentially and repeatedly in the infrastructure. Figure 3.8 presents the results for the mixed scenario. Although different topological SFC components are deployed together in the same infrastructure, the results exhibit tendencies similar to those of homogeneous scenarios. In Figure 3.8(a), we observe that the average number of network functions (17 when considering small instances and 9 when considering large instances) is in line with the average values depicted in Figures 3.4(a) and 3.4(b). The average CPU overhead also remains similar (9.12% considering small instances and 45.37% considering large ones). In turn, the average overhead caused by chaining network functions in the mixed scenario is 59.06% and 47.51% for small and large instances, respectively. Despite these similarities, end-to-end delays tend to be lower than those observed in homogeneous scenarios. The delay observed with the proposed chaining approach is 8.57% lower than that of the baseline (22.93 ms compared to 25.08 ms). This is due to the combination of requests with different topological structures, which promotes the use of a wider variety of physical paths (and, in turn, reduces the overutilization of paths). As in homogeneous scenarios, average end-to-end delays are the same for small and large instances.

Figure 3.7: Average end-to-end delay of SFCs deployed in the infrastructure. (a) Components 1 and 2; (b) Components 3 and 4. Axes: SFC requests vs. average end-to-end delay (in milliseconds); series: each component and its baseline.

Source: by author (2015).
We now proceed to the evaluation of our proposed heuristic approach. The heuristic was subjected to the same scenarios as the ILP model, in addition to scenarios with a larger infrastructure. Considering the scenarios presented so far (i.e., physical infrastructures with 50 nodes and 20 SFC requests), our heuristic found an optimal solution in all cases. We omit these results due to space constraints. We emphasize, however, that the heuristic found optimal solutions in a substantially shorter time than the ILP model, although the solution times of both approaches remained on the order of minutes.
Figure 3.8: Mixed scenario including Components 1, 2, and 4. (a) Average number of network function instances; (b) Average CPU overhead of network function instances; (c) Average bandwidth overhead; (d) Average end-to-end delay (in milliseconds). x-axis: SFC requests; series: small instances, large instances, and baseline.

Source: by author (2015).

The average solution times of the ILP model and the heuristic across all scenarios were, respectively, 8 minutes and 41 seconds and 1 minute and 21 seconds.
Last, we evaluate our heuristic approach on a large NFV infrastructure. In this experiment, we consider a physical network with 200 N-PoPs and up to 60 SFC requests with components of type 4. The delay limit was scaled up to 90 ms to account for the larger network size. Figure 3.9(a) depicts the average time needed to find a solution using both the ILP model and the heuristic. The ILP model was not able to find a solution in a reasonable time in scenarios with more than 18 SFCs (the solution time exceeded 48 hours). The heuristic approach, in turn, scales to this large infrastructure, delivering feasible, high-quality solutions in less than 30 minutes. As in previous experiments, small network function instances lead to higher solution times than large ones, mainly because smaller instances lead to a larger space of potential solutions to be explored.
Although the heuristic does not necessarily find the optimal solution (due to time constraints), Figures 3.9(b), 3.9(c), 3.9(d), and 3.9(e) show that its solutions present a level of quality similar to that of the ones obtained optimally. Figure 3.9(b) depicts the average number of instantiated network functions with the number of SFC requests varying from 1 to 60. As in previous experiments, the number of instances remains proportional to the number of SFC requests, and smaller instance sizes lead to a higher number of network functions being instantiated. Considering small sizes, 75 network function instances are required on average; for large sizes, 40 instances are required on average. Figure 3.9(c), in turn, illustrates the average CPU overhead. For small instances, CPU overhead is limited to 18.77%, while for large instances it reaches 48.65%. As with the number of network function instances, CPU overheads in these experiments follow the trends observed in previous ones. Next, Figure 3.9(d) presents bandwidth overheads. Small instances lead to a bandwidth overhead of 300%, while for large instances this overhead is, on average, 410%. These particularly high overheads are mainly due to the increase in the average length of end-to-end paths, as the physical network is significantly larger. Note that the bandwidth overhead observed in the baseline scenario (198%) is also significantly higher than those observed in experiments on the small infrastructure. Last, Figure 3.9(e) depicts the average end-to-end delay observed in large infrastructures. In line with previous results, the end-to-end delay tends to be lower than the delay observed in the baseline scenario: the baseline presents, on average, 17.72% higher delay than the scenario considering Component 4 (82.76 ms compared to 70.30 ms). In short, these results demonstrate that: (i) the heuristic finds solutions with a level of quality very similar to that of the optimization model for small infrastructures; and (ii) as both infrastructure sizes and the number of requests increase, the heuristic maintains the expected level of quality while still finding solutions in a short time frame.
Figure 3.9: Scenario considering a medium-size NFV infrastructure and components of type 4. (a) Average time needed to find a solution (in seconds, log scale); (b) Average number of network function instances; (c) Average CPU overhead of network function instances; (d) Average bandwidth overhead of SFCs deployed in the infrastructure; (e) Average end-to-end delay of SFCs deployed in the infrastructure (in milliseconds). x-axis: SFC requests; series: heuristic and ILP with small and large instances, and baseline.

Source: by author (2015).



4 A FIX-AND-OPTIMIZE APPROACH FOR EFFICIENT AND LARGE SCALE VIRTUAL NETWORK FUNCTION PLACEMENT AND CHAINING

In the previous chapter, we formally defined the Inter-datacenter VNFPC problem and proposed both an optimization model and a heuristic procedure to cope with medium-size infrastructures. In this chapter, we go one step further and address the scalability of the Inter-datacenter VNFPC problem in order to solve large instances (i.e., thousands of NFV nodes). We propose a novel fix-and-optimize approach that combines the previously defined optimization model with Variable Neighborhood Search (VNS)¹.
The remainder of this chapter is organized as follows. In Section 4.1, we describe the design of our proposed math-heuristic solution and, in Section 4.2, we extensively evaluate it in comparison to state-of-the-art approaches.

4.1 Fix-and-Optimize Heuristic for the VNF Placement & Chaining Problem

As previously shown in Chapter 3, the VNFPC optimization problem is NP-complete. To tackle this complexity and obtain high-quality solutions efficiently, we introduce an algorithm that combines mathematical programming with heuristic search. Our optimization model (defined in Section 3.1.4) is therefore an important building block for the design of such a solution. Our algorithm is based on Fix-and-Optimize and Variable Neighborhood Search (VNS), techniques that have been successfully applied to solve large and hard optimization problems (HANSEN; MLADENOVIĆ, 2001; HELBER; SAHLING, 2010; SAHLING et al., 2009). A comprehensive view of our proposal is shown in Algorithm 2.

4.1.1 Overview

In this subsection we provide a general view of our algorithm, using Figure 4.1 as basis. It
illustrates the search for a solution to the VNFPC problem, considering the composite services
and network infrastructure shown in Figure 1.1.
In our algorithm, we first compute an initial, feasible solution (or configuration) χ to the
optimization problem (see Subsection 4.1.3). In the example shown in Figure 4.1, the initial
configuration (1) has VNF 1 placed on N-PoPs 3 and 5; VNF 2 placed on N-PoP 5; and VNF 3
placed on N-PoP 3.
We then iteratively select a subset of N-PoPs and enumerate the list of variables y_{i,m,j} ∈ D ⊆ Y related to them. Variables listed in D will be subject to optimization, while the others will remain unchanged (or fixed, hence fix-and-optimize). We take advantage of VNS to systematically build subsets of N-PoPs (Subsection 4.1.4). We also use a prioritization scheme to give preference to those subsets with higher potential for improvement (Subsection 4.1.5).

¹ This chapter is based on the following publication:
• Marcelo Caggiani Luizelli, Weverton Luis Cordeiro, Luciana Salete Buriol, Luciano Paschoal Gaspary. Fix-and-Optimize Approach for Efficient and Large Scale Virtual Network Function Placement and Chaining. Elsevier Computer Communications, 2017.

Figure 4.1: A step-by-step run of our algorithm, for the scenario shown in Figure 1.1.
[Fifteen snapshots of the same topology (endpoints A to D and N-PoPs hosting VNF 1, VNF 2, and VNF 3): (1) initial solution; (2)-(3) k = 2 (Vadj), rounds 1 and 2; (4)-(6) k = 2 (Vany), rounds 1 to 3; (7)-(8) k = 3 (Vadj), rounds 1 and 2; (9) k = 3 (Vany), round 3; (10) improved solution; (11)-(13) k = 2 (Vadj), rounds 1 to 3; (14)-(15) k = 2 (Vany).]
Source: by author (2016).

For each candidate subset of N-PoPs (processed in order of priority), we submit its decomposed set of variables D, along with χ, to a mathematical programming solver (Subsection 4.1.6). The goal is to obtain values for the variables listed in D that yield a better configuration. We evaluate configurations according to the objective function of the model. If there is no improvement, we roll back and pick the next N-PoP subset. We run this process iteratively until a better configuration χ′ is found. Once this happens, we replace the current configuration with χ′ and restart the search process, using χ′ as the basis. This loop continues until either the most promising combinations of N-PoP subsets have been explored or a time limit is exceeded.

The loop explained above is illustrated in Figure 4.1 through instances (1)-(10). For each instance, a different subset of N-PoPs (determined using VNS) is picked. The resulting configuration, after being optimized by the solver, is then evaluated using the objective function of the model. For example, instance (6) failed because it violated delay constraints, whereas the other instances did not reduce resource commitment. When an improved configuration is found (10), the search process is restarted, now taking the configuration found as its basis.

4.1.2 Inputs and Output

In addition to the input of the optimization model (described in the previous chapter), our algorithm takes five additional parameters. The first three are 1) a global execution time limit Tglobal for the algorithm itself, 2) an initial neighborhood size Ninit for the VNS meta-heuristic, and 3) an increment step Ninc for increasing the neighborhood size. These parameters are discussed in Subsection 4.1.4. The other parameters are 4) NoImprovmax, which represents the maximum number of rounds without improvement allowed per neighborhood, and 5) Tlocal, which indicates the maximum amount of time the solver may take for a single optimization run. They are discussed in Subsections 4.1.5 and 4.1.6, respectively.
Our algorithm produces as output a configuration χ = {Y, A^N, A^L} for the VNFPC problem. This configuration may be null if no feasible solution exists for the given network topology and submitted composite services.

4.1.3 Obtaining an Initial Configuration

The first step of our algorithm (line 1) is generating a configuration χ to serve as the basis for the fix-and-optimize search. To this end, we remove the objective function from the model introduced in Chapter 3, thereby turning the optimization problem into a feasibility problem. We then use a commercial solver (CPLEX²) to solve it. The resulting configuration satisfies all constraints, though it is not necessarily a high-quality one in terms of resources required. Observe that a solution to the VNFPC problem may not exist (line 2); in this case, no solution is returned. Other polynomial-time heuristic procedures could be used instead of a solver to generate χ. We argue, however, that CPLEX is very efficient at finding good initial solutions for this problem. This is because it iteratively applies a set of highly specialized heuristics based on similarities to well-known optimization problems.
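As the text notes, a polynomial-time heuristic could replace the solver when building the initial configuration χ. The sketch below shows one such hypothetical alternative; it is not the procedure used in this thesis. It is a first-fit greedy pass that enforces only N-PoP computing capacity, assigning each requested function to the first N-PoP with enough residual capacity. It deliberately ignores bandwidth, delay, and instance sharing, so any configuration it returns would still have to be checked against the remaining constraints of the model. The names `first_fit_initial`, `capacity`, and `demands` are illustrative.

```python
from typing import Dict, List, Optional, Tuple


def first_fit_initial(
    capacity: Dict[int, float], demands: List[List[float]]
) -> Optional[Dict[Tuple[int, int], int]]:
    """Greedy first-fit: map (SFC q, function f) to an N-PoP, or return None.

    Only the normalized computing capacity of each N-PoP is enforced;
    chaining, bandwidth, and delay constraints are not modeled here.
    """
    residual = dict(capacity)  # remaining computing power per N-PoP
    placement: Dict[Tuple[int, int], int] = {}
    for q, chain in enumerate(demands):
        for f, need in enumerate(chain):
            for node in sorted(residual):  # deterministic "first" N-PoP
                if residual[node] >= need:
                    residual[node] -= need
                    placement[(q, f)] = node
                    break
            else:  # no N-PoP can host this function: report infeasibility
                return None
    return placement


# One SFC with a large-load VNF X (1.0) and two VNF Y instances (0.25 each).
print(first_fit_initial({1: 1.0, 2: 1.0}, [[1.0, 0.25, 0.25]]))
# {(0, 0): 1, (0, 1): 2, (0, 2): 2}
```

Returning `None` mirrors line 2 of Algorithm 2: when no feasible starting point is found, the search does not start.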

4.1.4 Variable Neighborhood Search

One important aspect of our algorithm is how to choose which subset of variables D ⊆ Y will be passed to the solver for optimization. As mentioned earlier, we approach it using VNS.
² http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/

It enables organizing the search space into k-neighborhoods. Each neighborhood is determined as a function of the incumbent solution and a neighborhood size k. In each round, a neighbor (a tuple having k elements) is picked and used to determine which subset of the incumbent solution will be subject to optimization. If an improvement is made, VNS uses the improved solution as the basis, thus moving to a potentially more promising neighborhood for further improvements.
We build a neighborhood as a combination (unordered tuple) of any k N-PoPs, with at least one having a VNF assignment (line 5). Each tuple is then decomposed, resulting in the subset of variables D ⊆ Y that will be subject to optimization (variable decomposition is discussed in Subsection 4.1.6). We focus only on N-PoPs to build these tuples because the assignment of required network functions (a^N_{i,q,j}) and the chaining allocations (a^L_{i,j,q,k,l}) can be determined straightforwardly once the VNF placements (y_{i,m,j}) are defined.
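The following is a minimal sketch of how the k-neighborhood of line 5 could be enumerated. It assumes that the incumbent configuration is summarized by the set of N-PoPs currently hosting at least one VNF; the names `k_neighborhood` and `hosting` are hypothetical. The number of candidate tuples grows as the binomial coefficient of |N^P| over k, so exhaustive processing quickly becomes expensive. This growth is what motivates the prioritization discussed in Subsection 4.1.5.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def k_neighborhood(npops: Iterable[int], hosting: Set[int], k: int) -> List[FrozenSet[int]]:
    """Unordered k-tuples of N-PoPs in which at least one N-PoP hosts a VNF."""
    return [
        frozenset(t)
        for t in combinations(sorted(npops), k)
        if any(i in hosting for i in t)
    ]


# N-PoPs 2..5, with VNFs currently placed on N-PoPs 3 and 5.
v2 = k_neighborhood([2, 3, 4, 5], {3, 5}, 2)
print(len(v2))  # 5: only the pair {2, 4} hosts no VNF and is excluded
```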


The initial value for k, Ninit, is given as input to the algorithm (line 3). In the example shown in Figure 4.1, we start with k = 2. In each iteration, we explore a neighborhood of size k (line 4). If no improvement is made, we increment k by Ninc units (line 21). Observe in Figure 4.1 that, after exploring the 2-neighborhood space (instances (2)-(6)), we set k ← k + 1 and start exploring a neighborhood composed of 3-tuples of N-PoPs (instance (7)). The highest value k may assume is limited by the number of N-PoPs.
In the event of an improvement, we reset the neighborhood size to Ninit and restart the search process. This is illustrated in Figure 4.1, in the transition from instance (10) to (11). It is important to emphasize that neighborhoods of the same size are not necessarily the same neighborhood. For example, consider instances (2) and (11) in Figure 4.1. Although the 2-tuples are composed of the same N-PoPs, each belongs to a different 2-neighborhood, as each is associated with a distinct configuration χ. The iteration over neighborhoods, and the evaluation of k-tuples within a neighborhood, continues until either Tglobal is exceeded or k exceeds the number of N-PoPs in the infrastructure (line 4).

4.1.5 Neighborhood Selection and Prioritization

The time required by the solver to process a configuration χ and a subset D ⊆ Y is often
small. However, processing every candidate subset D from the entire k-neighborhood can be
cumbersome. For this reason, we prioritize those neighbor tuples that might lead to a better
configuration, or whose processing might be relatively less complex.
Our heuristic for prioritizing tuples in the k-neighborhood set Vk relies on three key observations. First, solving VNF allocations/relocations becomes less complex when N-PoPs are adjacent. In other words, setting values for variables a^N_{i,q,j} and a^L_{i,j,q,k,l} is relatively less complex when N-PoPs i and j are directly connected through a physical path.


To exploit the observation above in our algorithm, we break down a k-neighborhood set into two distinct sets. The first one (line 6) is formed by tuples whose participating N-PoPs form a connected graph. The second set (line 7) is formed by the remaining tuples in Vk (Vk,any = Vk \ Vk,adj), i.e., those tuples whose participating nodes form a disconnected graph. Observe in Figure 4.1 that, for each k-neighborhood, we first process the tuples of adjacent N-PoPs (Vadj): instances (2)-(3), (7)-(8), and (11)-(13). Then, we process the remaining tuples (Vany): instances (4)-(6), (9), and (14)-(15).
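One way to form Vk,adj and Vk,any (lines 6 and 7) is to test whether the N-PoPs of each tuple induce a connected subgraph of the physical topology. A breadth-first search restricted to those N-PoPs is enough for this test. The sketch below assumes the topology is given as an adjacency dictionary. The path topology in the usage example is hypothetical and unrelated to Figure 1.1.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set, Tuple


def is_connected(nodes: FrozenSet[int], adj: Dict[int, Set[int]]) -> bool:
    """True if `nodes` induce a connected subgraph of the physical topology."""
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, set()):
            if w in nodes and w not in seen:  # only walk inside the tuple
                seen.add(w)
                queue.append(w)
    return seen == set(nodes)


def split_neighborhood(
    vk: List[FrozenSet[int]], adj: Dict[int, Set[int]]
) -> Tuple[List[FrozenSet[int]], List[FrozenSet[int]]]:
    """Return (Vk,adj, Vk,any), preserving the original tuple order."""
    v_adj: List[FrozenSet[int]] = []
    v_any: List[FrozenSet[int]] = []
    for t in vk:
        (v_adj if is_connected(t, adj) else v_any).append(t)
    return v_adj, v_any


# Hypothetical path topology 2 - 3 - 4 - 5.
path = {2: {3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
v_adj, v_any = split_neighborhood(
    [frozenset({2, 3}), frozenset({2, 5}), frozenset({3, 4, 5})], path
)
print(len(v_adj), len(v_any))  # 2 1
```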
The second key observation is that N-PoPs with a higher number of VNFs allocated, but fewer SFC requests assigned, are more likely candidates for optimization. Examples of such optimizations are the removal of (some) VNFs allocated to those N-PoPs, or the merging of those VNFs into a single N-PoP. We exploit this observation by assigning each tuple a priority as a function of its residual capacity and processing higher-priority tuples first (line 11). The residual capacity of a tuple v is given by r : Vk → ℝ, defined as a ratio of assigned VNFs to placed ones (according to Eq. 4.1).

 
$$ r(v) = \exp\left( \sum_{i \in N^P} \sum_{q \in Q} \sum_{j \in N^S_q} a^N_{i,q,j} \;-\; \sum_{i \in N^P} \sum_{m \in F} \sum_{j \in U_m} y_{i,m,j} \right) \qquad (4.1) $$
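The sketch below computes r(v) and orders a neighborhood by it. Two points are our own assumptions rather than statements of the model. First, the sums are restricted to the N-PoPs of tuple v, so that r actually depends on v. Second, tuples with a smaller r (i.e., more placed instances relative to assigned functions) are treated as higher priority, following the second observation. Because exp is monotonic, sorting by r gives the same order as sorting by its exponent. The dictionaries `assigned` and `placed` are hypothetical per-N-PoP aggregates of the a^N and y variables.

```python
import math
from typing import Dict, FrozenSet, List


def residual_capacity(v: FrozenSet[int], assigned: Dict[int, int], placed: Dict[int, int]) -> float:
    """r(v) = exp(assigned functions - placed VNF instances), summed over N-PoPs in v."""
    a = sum(assigned.get(i, 0) for i in v)  # aggregate of a^N_{i,q,j} for i in v
    y = sum(placed.get(i, 0) for i in v)    # aggregate of y_{i,m,j} for i in v
    return math.exp(a - y)


def prioritize(
    vk: List[FrozenSet[int]], assigned: Dict[int, int], placed: Dict[int, int]
) -> List[FrozenSet[int]]:
    """Lowest r(v) first: many instances placed, few functions assigned to them."""
    return sorted(vk, key=lambda v: residual_capacity(v, assigned, placed))


assigned = {2: 4, 3: 1, 4: 0, 5: 3}
placed = {2: 1, 3: 2, 4: 0, 5: 1}
order = prioritize([frozenset({2, 3}), frozenset({3, 4}), frozenset({4, 5})], assigned, placed)
print(sorted(order[0]))  # [3, 4]
```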

The third observation complements the previous one. As the priority of a neighbor decreases, it becomes less likely that it will lead to any optimization at all. The reason is simple: lower-priority tuples are often composed of overcommitted N-PoPs, for which the removal or relocation of allocated/assigned VNFs is less likely. Therefore, when processing a neighborhood, we can skip a certain fraction of low-priority neighbors with high confidence that no optimization opportunity will be missed.
To this end, our algorithm takes NoImprovmax as input. It indicates the maximum number of rounds without improvement allowed over a neighborhood subset. Before processing a given subset, we reset the counter of rounds without improvement, NoImprov (line 9). This counter is incremented for every round in which no improvement is made (lines 14-17). We stop processing the current neighborhood subset once NoImprov exceeds NoImprovmax (line 10).

4.1.6 Configuration Decomposition and Optimization

Decomposing a configuration (i.e., building a subset of variables D ⊆ Y from χ for optimization) means enumerating all y_{i,m,j} variables that can be optimized (line 12). It consists of listing all y_{i,m,j} variables related to each N-PoP in the current neighbor tuple v, considering all network function types m ∈ F and all available function instances j ∈ Um. Formally, we have D = { y_{i,m,j} ∈ Y | i ∈ v }.
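Decomposition itself is a simple filter over the placement variables. In the sketch below, Y is represented as a dictionary from hypothetical (i, m, j) keys to their current 0/1 values. The variables in D remain free. The fixed part would be imposed on the solver, for instance as equality bounds. This representation is our assumption and is not tied to any particular solver API.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical encoding of a placement variable y_{i,m,j}: (N-PoP i, VNF type m, instance j).
Var = Tuple[int, str, int]


def decompose(y: Dict[Var, int], v: FrozenSet[int]) -> Tuple[Set[Var], Dict[Var, int]]:
    """Split Y into D = {y_{i,m,j} | i in v} (free) and the remaining, fixed variables."""
    free = {key for key in y if key[0] in v}
    fixed = {key: val for key, val in y.items() if key[0] not in v}
    return free, fixed


y = {(3, "X", 0): 1, (5, "X", 1): 1, (5, "Y", 0): 1, (2, "Y", 1): 0}
D, fixed = decompose(y, frozenset({5}))
print(sorted(D))  # [(5, 'X', 1), (5, 'Y', 0)]
```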
Once the decomposition is done, the incumbent configuration χ and the set D are submitted to the solver (line 13). As mentioned earlier, the solver treats the variables listed in D as free (for optimization) and those not listed as fixed (i.e., their values cannot change). Observe that this restriction does not affect the variables related to the assignment of required network functions (a^N_{i,q,j}) and to chaining allocation (a^L_{i,j,q,k,l}).

We also limit each optimization run to a certain amount of time Tlocal (passed as a parameter). This keeps a few extremely complex χ and D instances from consuming a significant share of the global time limit Tglobal.
Algorithm 2 Overview of the fix-and-optimize heuristic for the VNF placement and chaining problem.
Input: Tglobal: global time limit
       Tlocal: time limit for each solver run
       Ninit: initial neighborhood size
       Ninc: increment for the neighborhood size
       NoImprovmax: maximum number of rounds without improvement
Output: χ: best solution found to the optimization model
1: χ ← initial feasible configuration
2: if a feasible configuration does not exist then fail else
3:   k ← Ninit
4:   while Tglobal is not exceeded and k ≤ |N^P| do
5:     Vk ← current neighborhood, formed of unordered tuples of k nodes only, at least one of them having a VNF assignment in the incumbent configuration χ
6:     Vk,adj ← neighbor tuples from Vk whose nodes (for each tuple) form a connected graph
7:     Vk,any ← Vk \ Vk,adj
8:     for each list V ∈ {Vk,adj, Vk,any}, while a better configuration is not found do
9:       NoImprov ← 0
10:      while V ≠ ∅ and NoImprov ≤ NoImprovmax and a better configuration is not found do
11:        v ← next unvisited, highest-priority neighbor tuple from V
12:        D ← decomposed list of variables y_{i,m,j} from the nodes listed in neighbor tuple v
13:        χ′ ← configuration χ optimized by the solver under time limit Tlocal, with the variables not listed in D fixed
14:        if χ′ is a better configuration than χ then
15:          update χ to reflect configuration χ′
16:        else
17:          NoImprov ← NoImprov + 1
18:        end if
19:      end while
20:    end for
21:    if no improvement was made then k ← k + Ninc else k ← Ninit end if
22:  end while
23:  return χ
24: end if
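The control flow of Algorithm 2 can be sketched compactly. This is not the authors' Java implementation. It is a minimal illustration in which every problem-specific step is passed in as a callable: the neighborhood of line 5, the split of lines 6 and 7, the prioritization of line 11, and the solver call of lines 12 and 13. A configuration is a hypothetical dictionary. The default parameters are the settings reported in Section 4.2.1. As a simplification, the global time limit is checked only between neighborhoods. The toy solver in the usage example merges the VNFs of a tuple onto one N-PoP; it stands in for CPLEX purely for illustration.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Config = Dict[int, int]      # hypothetical stand-in for a configuration chi
Neighbor = FrozenSet[int]    # a neighbor tuple v of N-PoPs


@dataclass(frozen=True)
class Params:
    t_global: float = 5000.0  # seconds (setting used in Section 4.2.1)
    t_local: float = 200.0    # seconds per solver run
    n_init: int = 2
    n_inc: int = 1
    no_improv_max: int = 15


def fix_and_optimize(
    chi: Optional[Config],
    npops: Sequence[int],
    cost: Callable[[Config], float],
    neighborhood: Callable[[Config, int], List[Neighbor]],
    split: Callable[[List[Neighbor]], Tuple[List[Neighbor], List[Neighbor]]],
    prioritize: Callable[[Config, List[Neighbor]], List[Neighbor]],
    solve: Callable[[Config, Neighbor, float], Optional[Config]],
    p: Params = Params(),
) -> Optional[Config]:
    if chi is None:                                          # lines 1-2
        return None
    deadline = time.monotonic() + p.t_global
    k = p.n_init                                             # line 3
    while time.monotonic() < deadline and k <= len(npops):  # line 4
        improved = False
        for subset in split(neighborhood(chi, k)):           # lines 5-8
            no_improv = 0                                    # line 9
            for v in prioritize(chi, subset):                # lines 10-11
                candidate = solve(chi, v, p.t_local)         # lines 12-13
                if candidate is not None and cost(candidate) < cost(chi):
                    chi, improved = candidate, True          # lines 14-15
                    break
                no_improv += 1                               # line 17
                if no_improv > p.no_improv_max:
                    break
            if improved:
                break
        k = p.n_init if improved else k + p.n_inc            # line 21
    return chi                                               # line 23


# Toy usage: a hypothetical "solver" that consolidates the VNFs of the
# N-PoPs in v onto the lowest-numbered one (a single instance suffices).
def toy_solve(chi: Config, v: Neighbor, t_local: float) -> Config:
    new = dict(chi)
    total = sum(chi[i] for i in v)
    for i in v:
        new[i] = 0
    new[min(v)] = 1 if total > 0 else 0
    return new


npops = [2, 3, 4, 5]
result = fix_and_optimize(
    chi={2: 1, 3: 1, 4: 1, 5: 0},
    npops=npops,
    cost=lambda c: sum(c.values()),
    neighborhood=lambda c, k: [frozenset(t) for t in itertools.combinations(npops, k)
                               if any(c[i] > 0 for i in t)],
    split=lambda vk: (vk, []),
    prioritize=lambda c, vs: vs,
    solve=toy_solve,
)
print(result)  # {2: 1, 3: 0, 4: 0, 5: 0}
```

Passing the problem-specific steps as callables keeps the loop independent of how the neighborhoods are built and of which solver is used.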

4.2 Evaluation

In this section, we evaluate the effectiveness of our math-heuristic algorithm in generating feasible solutions to the VNFPC problem. We compare the solutions achieved with those of Lewin-Eytan et al. (LEWIN-EYTAN et al., 2015) and Luizelli et al. (LUIZELLI et al., 2015), and with a globally optimal one obtained with CPLEX (whenever it is feasible to compute). We also establish a lower bound for the VNFPC optimization. It is computed by relaxing the output variables of the mathematical model so that they are continuous instead of discrete (i.e., a linear relaxation). Such a lower bound is a solid reference for what the optimal solution to the VNFPC problem looks like in a given large scenario.
We used CPLEX v12.4 for solving the optimization models, and Java for implementing the
algorithms. The experiments were run on Windows Azure Platform – more specifically, an
Ubuntu 14.04 LTS VM instance, featuring an Intel Xeon processor E5 v3 family, with 32 cores,
448 GB of RAM, and 6 TB of SSD storage. Next we describe our experiment setup, followed
by results obtained.

4.2.1 Setup

The physical network substrate was generated with Brite³, following the Barabási-Albert (BA-2) (ALBERT; BARABÁSI, 2000) model. The number of N-PoPs and physical paths between them varied in each scenario, ranging from 200 to 1,000 N-PoPs and from 793 to 3,993 links. The computing power of each N-PoP is normalized and set to 1, and each link has a bandwidth capacity of 10 Gbps. The average delay between any pair of N-PoPs is 90 ms, a value adopted from a study by Choi et al. (CHOI et al., 2007).
To fully grasp the efficiency and effectiveness of our approach, we carried out experiments considering a wide variety of scenarios and parameter settings. Due to space constraints, we concentrate on a subset of them. Our workload, for example, comprised between 1 and 80 SFCs. Each SFC has the following topology. A source endpoint is linked (supporting 1 Gbps of traffic flow) to an instance of VNF X. The flow then branches after VNF X, and each branch, with a 500 Mbps output flow, links to a distinct instance of VNF Y. Finally, each flow links to a distinct sink endpoint. Instances of VNF X require a normalized computing power of 0.125 for small loads and 1 for large loads, and have a processing delay of 0.6475 seconds. Instances of VNF Y require, in turn, 0.125 and 0.25 for small and large loads, respectively, and impose a processing delay of 7.0771 seconds. VNF processing delays were based on a study by Dobrescu et al. (DOBRESCU; ARGYRAKI; RATNASAMY, 2012).
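The per-SFC demands above can be totaled with a short calculation. The sketch assumes that the 1 Gbps flow leaving VNF X splits into exactly two 500 Mbps branches, which is our reading of the text. Under this assumption, each SFC requests three functions. With the stated normalized demands, each SFC needs 0.375 units of computing power under small loads and 1.5 units under large loads. Under the same assumption, 80 SFCs correspond to 240 requested functions, the largest value on the horizontal axes of Figures 4.2 and 4.3. The names `VnfType`, `sfc_demand`, and `workload_demand` are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VnfType:
    name: str
    cpu_small: float  # normalized computing power under small loads
    cpu_large: float  # normalized computing power under large loads


VNF_X = VnfType("X", 0.125, 1.0)
VNF_Y = VnfType("Y", 0.125, 0.25)
BRANCHES = 2  # assumption: the 1 Gbps flow splits into two 500 Mbps branches


def sfc_demand(large: bool, branches: int = BRANCHES) -> Tuple[int, float]:
    """(functions requested, normalized computing power) for one SFC."""
    x = VNF_X.cpu_large if large else VNF_X.cpu_small
    y = VNF_Y.cpu_large if large else VNF_Y.cpu_small
    return 1 + branches, x + branches * y


def workload_demand(num_sfcs: int, large: bool) -> Tuple[int, float]:
    """Totals for a workload of identical SFCs."""
    functions, cpu = sfc_demand(large)
    return num_sfcs * functions, num_sfcs * cpu


print(workload_demand(80, large=True))   # (240, 120.0)
print(workload_demand(20, large=False))  # (60, 7.5)
```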
In the subsections that follow, we adopted the following parameter setting for our approach:
Tglobal = 5, 000 seconds, Tlocal = 200 seconds, Ninit = 2, Ninc = 1, and NoImprovmax = 15. In
Subsection 4.2.4 we present an analysis of our approach considering other parameter settings.
³ http://www.cs.bu.edu/brite/

4.2.2 Our Heuristic Algorithm Compared to Existing Approaches

Figure 4.2: Number of deployed VNFs and required time to compute a feasible placement and chaining.
[Four plots versus the number of network functions requested (60 to 240): (a) VNFs deployed (log scale) by Fix-&-Opt with 200, 400, 600, 800, and 1,000 N-PoPs, with the lower bound; (b) VNFs deployed by Lewin-Eytan et al. with 200 to 1,000 N-PoPs, and by Luizelli et al. and CPLEX with 200 N-PoPs, with the lower bound; (c) time (seconds, log scale) required by Fix-&-Opt with 200 to 1,000 N-PoPs; (d) time (seconds, log scale) required by Lewin-Eytan et al. with 200 to 1,000 N-PoPs, and by Luizelli et al. and CPLEX with 200 N-PoPs.]
Source: by author (2016).

Figure 4.2 provides an overview of the achieved results, focusing on the resource commitment of the computed solutions and the time required to generate them. The main takeaway is that, compared to existing approaches, ours generates significantly better solutions (closer to the lower bound) in a reasonable time. Observe in Figure 4.2(a), for example, that our approach generated a solution at most 2.1 times the lower bound, even in an extreme case with 1,000 N-PoPs and 180 requested functions. This gap is significantly smaller than those of the other approaches, even in their best cases. The measured distance was 4.97 times for Lewin-Eytan et al. (LEWIN-EYTAN et al., 2015), in the scenario with 200 N-PoPs and 120 requested functions. For Luizelli et al. (LUIZELLI et al., 2015), it was 6.7 times, considering 200 N-PoPs and 60 functions.
Observe that finding an optimal solution (with CPLEX) is computationally infeasible even for very small instances (200 N-PoPs and fewer than 20 requested functions). Luizelli et al. (LUIZELLI et al., 2015) also fell short of scaling even to such small instances, requiring more than 10k seconds for those with around 60 requested functions.
An important caveat regarding the time required to obtain a solution is that, for our approach,
we measured the time elapsed until it generated the highest quality solution. Note however that
our algorithm is allowed to continue running, until Tglobal is passed or all neighborhoods are
explored.

4.2.3 Qualitative Analysis of Generated Solutions

Figure 4.3: Analysis of resource commitment (computing power and bandwidth) of each solution generated.
[Two plots versus the number of network functions requested (60 to 240): (a) unused allocated CPU (%) for Fix-&-Opt and Lewin-Eytan et al. with 200, 600, and 1,000 N-PoPs, and Luizelli et al. with 200 N-PoPs; (b) bandwidth overhead (%) for Fix-&-Opt with 200, 600, and 1,000 N-PoPs, and Luizelli et al. with 200 N-PoPs.]
Source: by author (2016).

Here we dive deeper into the quality of the generated solutions, analyzing their overprovisioning with regard to computing power and bandwidth. Observe from Figure 4.3 that our proposal led to comparatively higher-quality solutions, minimizing the amount of allocated but unused computing power in N-PoPs. By "unused" we mean that resources were allocated to VNFs deployed on N-PoPs, but the capacity of those VNFs is not fully consumed by the assigned SFCs.
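To make this concrete, "unused" computing power can be expressed as the share of capacity allocated to deployed VNF instances that the assigned SFCs do not consume. The sketch below is an assumed formalization that normalizes by the total allocated capacity. It is not necessarily the exact definition used to produce Figure 4.3, and `unused_allocated_cpu` is a hypothetical helper.

```python
from typing import Sequence, Tuple


def unused_allocated_cpu(instances: Sequence[Tuple[float, float]]) -> float:
    """Percentage of allocated computing power left idle by assigned SFCs.

    Each item is (capacity allocated to a VNF instance, capacity consumed by
    the SFCs assigned to it); consumption is capped at the allocation.
    """
    allocated = sum(a for a, _ in instances)
    if allocated == 0:
        return 0.0
    consumed = sum(min(c, a) for a, c in instances)
    return 100.0 * (allocated - consumed) / allocated


print(unused_allocated_cpu([(1.0, 0.5), (0.25, 0.125)]))  # 50.0
```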
The performance of Lewin-Eytan et al. (LEWIN-EYTAN et al., 2015) is worse in this respect mostly because their goal is to minimize not only resource commitment, but also the distance (in hops) a flow must traverse to reach a required VNF. It is also important to emphasize that their theoretical model includes simplifications that often lead to solutions that overcommit resources. Since the authors did not address VNF chaining in their model, bandwidth overhead could not be measured in their case.

Table 4.1: Assessment of the quality of generated solutions, and distance to lower bound, under various parameter settings (VNFs: number of VNFs allocated; factor: distance to the lower bound).

                        Tlocal = 50 s    Tlocal = 60 s    Tlocal = 100 s
Tglobal      Ninit      VNFs  factor     VNFs  factor     VNFs  factor
600 s        2          59    5.24       51    4.53       51    4.53
600 s        4          59    5.24       59    5.24       42    3.73
600 s        6          59    5.24       59    5.24       33    2.93
1,000 s      2          32    2.84       16    1.42       15    1.33
1,000 s      4          44    3.91       12    1.06       12    1.06
1,000 s      6          40    3.55       18    1.60       12    1.06
1,500 s      2          20    1.77       12    1.06       12    1.06
1,500 s      4          36    3.20       14    1.24       12    1.06
1,500 s      6          36    3.20       15    1.33       12    1.06

Source: by author (2016).

4.2.4 Sensitivity Analysis

We conclude our analysis with a brief look at experiments varying the input parameter settings, whose results are summarized in Table 4.1. The factor columns show how far our approach is from the lower bound (in terms of resources allocated), in a scenario with 600 N-PoPs and 120 functions. We focused on the following values for Tglobal: 600, 1,000, and 1,500 s; for Tlocal: 50, 60, and 100 s; and for Ninit: 2, 4, and 6.
Observe in Table 4.1 that our approach was at most 5.24 times the lower bound, having allocated 59 VNFs (compared to a lower bound of 11.75 VNFs). More importantly, this distance decreases as we provide more execution time both to the algorithm itself (Tglobal) and to each optimization instance (Tlocal). In the best case, our approach was only 1.06 times the lower bound, allocating the smallest number of VNFs (12).
Observe also a trade-off when setting Tlocal. On the one hand, larger values enable solving each sub-problem instance with higher quality, as Table 4.1 shows. On the other hand, smaller values avoid spending time on the more complex sub-problem instances, for which CPLEX requires far more memory. Smaller values also leave the algorithm with more global time for exploring other promising sub-problem instances.

5 THE ACTUAL COST OF SOFTWARE SWITCHING FOR NFV CHAINING

In Chapters 3 and 4, we focused on the Inter-datacenter VNFPC problem. We mathematically formalized it and designed efficient and scalable solutions so that large, distributed NFV infrastructures can be handled in a timely manner. As previously discussed, a solution to the Inter-datacenter VNFPC problem involves the placement and chaining of SFC requests across multiple locations. This is particularly important for SFC requests with stringent network requirements (e.g., very low end-to-end delays). In this case, SFCs are carefully broken down into sub-chains (i.e., subgraphs) that are individually placed and chained into datacenters. These partial SFC requests are then deployed on top of commodity servers. Up to this point, we have focused on planning the placement and chaining of SFCs in distributed NFV environments. We have not yet considered how VNFs are deployed onto servers or the performance limitations this incurs.
For that reason, we take a step further in the resolution of this problem by specifically tackling the intra-datacenter VNFPC problem¹. In this chapter, we focus on paving the way towards cost-efficient intra-datacenter deployments. NFV is still an emerging research area, and little has been done to assess the performance limitations of real NFV-based deployments and the operational costs they incur on commodity servers.
To fill in this gap, we conduct an extensive and in-depth evaluation, measuring the perfor-
mance and analyzing the impact of deploying service chains on NFV servers with Open vSwitch
(PFAFF et al., 2015). We analyze different intra-datacenter placement strategies and assess per-
formance metrics such as throughput, packet processing, and resource consumption. Based on
the performance evaluation, we then define a generalized abstract cost function that accurately
captures the CPU cost of network switching. Our proposed model can predict the operational
cost incurred by different deployment strategies given arbitrary requirements.
The remainder of this chapter is organized as follows. In Section 5.1, we start by describing the intra-datacenter VNFPC problem. Then, in Section 5.2, we formalize the model used throughout this chapter. In Section 5.3, we provide an extensive evaluation of service chain deployments on a real NFV environment. Finally, in Section 5.4, we define an abstract CPU-cost function that estimates the operational costs of NFV deployments.

5.1 Problem Overview

Placing service chains in an NFV-based infrastructure introduces nontrivial challenges that need to be addressed. A key factor in the long-term success of NFV is the optimization of operational costs (OPEX) when deploying network services. When service chains are deployed onto physical servers (i.e., intra-datacenter VNFPC), network services might face performance penalties and limitations on critical network metrics, such as throughput, latency, and jitter, depending on how network functions are chained and deployed. For network providers shifting to NFV, one of the most interesting challenges is therefore to identify good placement strategies that minimize the provisioning and operational cost of service chains.

¹ This chapter is based on the following publication:
• Marcelo Caggiani Luizelli, Danny Raz, Yaniv Saar and Jose Yallouz. The Actual Cost of Software Switching for NFV Chaining. In: Proceedings of IFIP/IEEE International Symposium on Integrated Network Management, 2017.

Figure 5.1: Example of strategies to deploy three given service function chains.
[Two panels, each with servers A, B, and C between a starting point and an end point: (a) gather placement strategy, where server A hosts ϕ11, ϕ12, ϕ13, server B hosts ϕ21, ϕ22, ϕ23, and server C hosts ϕ31, ϕ32, ϕ33; (b) distribute placement strategy, where server A hosts ϕ11, ϕ21, ϕ31, server B hosts ϕ12, ϕ22, ϕ32, and server C hosts ϕ13, ϕ23, ϕ33.]
Source: by author (2017).

However, the operational costs of NFV are still far from straightforward to estimate, even for relatively simple deployment strategies. In practical NFV deployments, OpenStack (ONF, 2015) is the most popular open-source orchestration system (and part of the OPNFV project). Its scheduling/placement mechanisms can be used to place service chains. The currently available placement mechanism follows one of two deployment strategies (or objective functions): load balancing (i.e., distributing VNFs among resources) or energy conserving (i.e., gathering VNFs on a selected resource).
Figure 5.1 illustrates these two placement strategies. In this example, we are given three identical servers A, B, and C, and three identical service chains {ϕ1, ϕ2, ϕ3}. Each service chain ϕi (for i = 1, 2, 3) is tagged with a fixed amount of required traffic and is composed of three chained VNFs: ϕi = ⟨ϕi1 → ϕi2 → ϕi3⟩. Figure 5.1(a) depicts the first placement strategy (referred to as "gather"), where all VNFs composing a single chain are deployed on the same server. In the gather case, most of the traffic steering is done by the server's internal virtual switching. Figure 5.1(b) illustrates the second placement strategy (referred to as "distribute"). Here each VNF is deployed on a different server, so most of the traffic steering is done between servers by external switches.
One can immediately see that, from the VNF perspective, these two placement strategies require the same amount of resources to operate. However, it is not clear what operational costs each deployment incurs from the provider's perspective, especially regarding the cost of software switching (CPU consumption). Additionally, network metrics (e.g., throughput and latency) might behave differently in each deployment. To decide which placement strategy is better, we need to identify the specific optimization criteria of interest. In this chapter, we take the first steps towards accurately estimating the virtual switching cost, which ultimately represents the operational cost.
As previously discussed, virtual switching is an essential building block in NFV-based infrastructure that enables flexible communication between VNFs. However, it also consumes resources in the infrastructure. Measuring and modeling these costs is therefore an essential step to (i) efficiently guide VNF placement in a physical infrastructure, and (ii) understand how software switching impacts deployed network services with regard to network metrics.

5.2 Model Definition

In this section, we define our model to estimate the cost of software switching for different
placement strategies.
For a server S, we denote by S^t the maximum throughput that server S can support (i.e., the wire limit of the installed NIC). Following the recommended best practice for virtualization-intensive environments (INTEL, 2016b), NFV servers should be configured with two disjoint sets of CPU-cores, one for the hypervisor and the other for the VNFs. We denote by S^h (and S^v) the number of CPU-cores that are allocated and reserved for the hypervisor (and guests, respectively) to operate. The total number of CPU-cores installed in server S is denoted by S^c. Throughout this chapter, unless explicitly stated otherwise, we assume that we are given a set of k servers S = {S1, S2, ..., Sk}.
A service chain is defined to be an ordered set of VNFs ϕ = ⟨ϕ1 → ϕ2 → ... → ϕn⟩. We denote by ϕ|n the length of the service chain, and define ϕ|p (and ϕ|s) to be the number of packets per second (and average packet size, respectively) that service chain ϕ is required to process. Note that ϕ|p and ϕ|s are properties of service chain ϕ. Namely, given two service chains ϕ and ψ, if ϕ|p ≠ ψ|p or ϕ|s ≠ ψ|s, then ϕ ≠ ψ. Throughout this chapter, unless explicitly stated otherwise, we assume that we are given a set of m identical service chains Φ = {ϕ1, ϕ2, ϕ3, ..., ϕm}, i.e., ∀ i, j = 1..m : ϕi = ϕj.
We define P : Φ → S^k to be a placement function that, for every service chain ϕ ∈ Φ, maps every VNF ϕi ∈ ϕ to a server Sj ∈ S. We identify two particular placement functions:

1. Pg, which we call gather placement: for every service chain, the placement function deploys all VNFs ϕi ∈ ϕ on the same server, namely:

$$ \forall \phi \in \Phi,\ \forall \phi_i, \phi_j \in \phi:\quad P_g(\phi_i) = P_g(\phi_j) $$

2. Pd, which we call distribute placement: for every service chain, the placement function deploys each VNF ϕi ∈ ϕ on a different server, namely:

$$ \forall \phi \in \Phi,\ \forall \phi_i, \phi_j \in \phi:\quad i \neq j \rightarrow P_d(\phi_i) \neq P_d(\phi_j) $$
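The two placement functions can be made concrete for m identical chains of length n on k servers. The sketch below picks one specific instance of each: under P_g, chain i goes to server i mod k; under P_d, the j-th VNF of every chain goes to server j. This choice reproduces the layout of Figure 5.1, but other assignments satisfying the same definitions exist. All names are hypothetical. The two checks test exactly the defining conditions above.

```python
from typing import Dict, Tuple

Placement = Dict[Tuple[int, int], int]  # (chain i, VNF position j) -> server index


def gather(m: int, n: int, k: int) -> Placement:
    """P_g: every VNF of chain i goes to server i mod k."""
    return {(i, j): i % k for i in range(m) for j in range(n)}


def distribute(m: int, n: int, k: int) -> Placement:
    """P_d: the j-th VNF of every chain goes to server j (needs n <= k)."""
    if n > k:
        raise ValueError("distribute placement needs at least n servers")
    return {(i, j): j for i in range(m) for j in range(n)}


def satisfies_gather(p: Placement, m: int, n: int) -> bool:
    """All VNFs of each chain share one server."""
    return all(len({p[(i, j)] for j in range(n)}) == 1 for i in range(m))


def satisfies_distribute(p: Placement, m: int, n: int) -> bool:
    """All VNFs of each chain sit on pairwise different servers."""
    return all(len({p[(i, j)] for j in range(n)}) == n for i in range(m))


# Figure 5.1: three chains of three VNFs on servers A, B, C (indices 0, 1, 2).
pg, pd = gather(3, 3, 3), distribute(3, 3, 3)
print(satisfies_gather(pg, 3, 3), satisfies_distribute(pd, 3, 3))  # True True
```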

As explained before, both placement functions mirror the objective functions implemented by the OpenStack scheduler (i.e., load balancing and energy conservation). Figure 5.1 illustrates these deployment strategies: Figure 5.1(a) depicts a deployment of VNFs that follows the gather placement function $\mathcal{P}_g$, and Figure 5.1(b) depicts a deployment that follows the distribute placement function $\mathcal{P}_d$.
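To make the two definitions concrete, the sketch below encodes a placement as a dictionary from (chain index, VNF index) pairs to server indices and checks the gather and distribute conditions. The encoding, the round-robin choice of server in `gather_placement`, and the fixed "VNF $i$ goes to server $i$" mapping in `distribute_placement` are illustrative assumptions; the definitions above only constrain whether the VNFs of a chain share a server.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: (chain index c, VNF index i) -> server index.
Placement = Dict[Tuple[int, int], int]


def gather_placement(m: int, n: int, k: int) -> Placement:
    """P_g: all n VNFs of chain c share one server; chains are spread round-robin (assumption)."""
    return {(c, i): c % k for c in range(m) for i in range(n)}


def distribute_placement(m: int, n: int, k: int) -> Placement:
    """P_d: VNF i of every chain goes to server i, so no two VNFs of a chain share a server."""
    if n > k:
        raise ValueError("distribute placement needs at least n servers")
    return {(c, i): i for c in range(m) for i in range(n)}


def servers_of_chain(p: Placement, c: int, n: int) -> Set[int]:
    """Set of servers used by the VNFs of chain c."""
    return {p[(c, i)] for i in range(n)}


def is_gather(p: Placement, m: int, n: int) -> bool:
    """P_g condition: every chain uses exactly one server."""
    return all(len(servers_of_chain(p, c, n)) == 1 for c in range(m))


def is_distribute(p: Placement, m: int, n: int) -> bool:
    """P_d condition: every chain uses n pairwise-distinct servers."""
    return all(len(servers_of_chain(p, c, n)) == n for c in range(m))


pg = gather_placement(m=4, n=3, k=2)
pd = distribute_placement(m=4, n=3, k=3)
print(is_gather(pg, 4, 3), is_distribute(pd, 4, 3))  # True True
print(is_distribute(pg, 4, 3))                        # False
```

The `n > k` guard reflects that a distribute placement of a chain of length $n$ is only possible with at least $n$ servers.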
For a given placement function $\mathcal{P}$ that deploys all service chains in $\Phi$ on the set of servers $\mathcal{S}$, we define $C: \mathcal{P} \rightarrow (\mathbb{R}^+)^k$ to be a cpu-cost function that maps placement $\mathcal{P}$ to the CPU required on each server. We say the cpu-cost function is feasible if there are enough resources to implement it. Namely, cpu-cost function $C$ is feasible with respect to a given placement $\mathcal{P}$ if, for every server $S_j \in \mathcal{S}$, the required CPU does not exceed the number of CPU cores installed in the server: $\forall S_j \in \mathcal{S}: C(\mathcal{P})_j \leq S_j^c$. Note that our main focus is evaluating the cost functions with respect to hypervisor performance; therefore, we evaluate deployments of VNFs that are fixed to do the minimum required processing, i.e., forwarding traffic from one virtual interface to another.

5.3 Deployment Evaluation

In this section, we evaluate the performance of software switching for NFV chaining, considering metrics such as throughput, CPU utilization, and packet processing. We describe our environment setup, followed by a discussion of performance metrics and possible bottlenecks. We evaluate two types of OVS installations, Linux kernel and DPDK, and compare the two types of VNF placement: the gather placement function $\mathcal{P}_g$ versus the distribute placement function $\mathcal{P}_d$.

5.3.1 Setup

Our environment consists of two high-end HP ProLiant DL380p Gen8 servers. Each server has two Intel Xeon E5-2697v2 processors, each with 12 physical cores at 2.7 GHz. One server is our Device Under Test (DUT), and the other is our traffic generator. Each server has two NUMA (Non-Uniform Memory Access) nodes, each with 192 GBytes of RAM (384 GBytes in total), as well as an Intel 82599ES 10 Gbit/s network device with two network interfaces (physical ports). We disabled Hyper-Threading on our servers to have more control over core utilization.
In both types of OVS installations (Linux kernel and DPDK), we isolate the CPU cores into two disjoint sets, $S^c = 24 = S^h + S^v$, i.e., the CPU cores used for management and those allocated for the VNFs to operate. This a priori separation between the two disjoint sets of CPU cores plays an important role in the behavior of CPU consumption (and packet processing). In our experiments, the number of cores given to the hypervisor ($S^h$) varies from 2 to 12 physical cores, while the remainder is given to the VNFs (i.e., $S^v$ ranges from 22 to 12). The exact values depend on the type of installation (Linux kernel or DPDK).

Figure 5.2: Given a set of $m$ service chains $\Phi$, illustrating deployment of VNFs on a single server (our DUT). (a) Following the gather placement function $\mathcal{P}_g$: the VNFs $\varphi^1_1, \varphi^1_2, \varphi^1_3, \ldots, \varphi^1_n$ of one chain are attached to Open vSwitch on the DUT. (b) Following the distribute placement function $\mathcal{P}_d$: the VNFs $\varphi^1_1, \varphi^2_1, \varphi^3_1, \ldots, \varphi^m_1$ of different chains are attached to Open vSwitch on the DUT.

Source: by author (2017).
The DUT runs CentOS version 7.1.1503 with Linux kernel version 3.10.0. All guest operating systems run Fedora 22 with Linux kernel version 4.0.4, on top of QEMU version 2.5.50. Each VNF is configured with a single virtual CPU pinned to a specific physical core and 4 GBytes of RAM. Network offload optimizations (e.g., TSO, GSO) are disabled on all network interfaces to avoid noise in the analysis and to provide a fair comparison between the two types of OVS installations. We evaluated Open vSwitch version 2.4 in both kernel mode and DPDK mode. For the latter, we compiled OVS against DPDK version 2.2.0 (without shared memory mechanisms, in compliance with the rigid security requirements imposed by NFV). Packet generation is performed on a dedicated server that is interconnected to the DUT with one network interface to send data and another network interface to receive it. In all experiments, we generate traffic with Sockperf version 2.7 (CORPORATION, 2016), using 50 TCP flows.
Note that our objective is to provide an analytic model that captures the cost of software switching. To examine and clearly understand the parameters that impact this cost, we simplify our environment by removing optimizations such as network offloads and by fixing resource allocation. For achieving optimal performance, the reader is referred to (INTEL, 2016b).
We evaluate the placement functions defined in Section 5.2 (i.e., the gather placement function $\mathcal{P}_g$ and the distribute placement function $\mathcal{P}_d$) on our DUT as follows. For each placement function, we vary the number of simultaneously deployed VNFs from 1 to 30. All VNFs forward their traffic between two virtual interfaces, each on a different sub-domain, relying on IP route forwarding.
Figure 5.2 illustrates deployments of VNFs on a single server (our DUT). Figure 5.2(a) depicts the traffic flow of the gather placement function $\mathcal{P}_g$. Ingress traffic arrives from the physical NIC at the OVS, which forwards it to the first vNIC of the first VNF. The VNF then returns the traffic to the OVS through its second vNIC, and the OVS forwards the traffic to the following VNF, composing a chain that eventually egresses the traffic through the second physical NIC. In contrast, Figure 5.2(b) depicts the traffic flow of the distribute placement function $\mathcal{P}_d$. For this placement, the traffic generator is responsible for spreading the workload among the VNFs. Ingress traffic arrives from the physical NIC at the OVS, which forwards it to one of the VNFs through the VNF's first vNIC. The VNF then returns the traffic to the OVS through its second vNIC, and the OVS egresses the traffic through the second physical NIC.

Figure 5.3: Experiment results showing throughput, packet processing, and CPU consumption for traffic generated in 100-byte packets, examined over increasing chain size, on a DUT installed with kernel OVS. Panels: (a) BW for placement $\mathcal{P}_g$; (b) PPS for placement $\mathcal{P}_g$; (c) CPU for placement $\mathcal{P}_g$; (d) BW for placement $\mathcal{P}_d$; (e) PPS for placement $\mathcal{P}_d$; (f) CPU for placement $\mathcal{P}_d$. Axes: # VNF (1 to 30) versus bandwidth (Mbps), # packets in OVS (pps), and % CPU utilization, respectively; series for HV = 4, 6, and 8 hypervisor cores (with VNF = 20, 18, and 16 cores in the CPU panels).

Source: by author (2017).

5.3.2 Evaluating Packet Intense Traffic

To discuss resource utilization, we measure and analyze the results of our DUT for several configurations while it receives an intense network workload. To generate this workload, we send traffic in which each packet is 100 bytes (to avoid reaching the NIC's wire limit). We examine the behavior of throughput, packet processing, and CPU consumption for increasing chain size, using both placement functions, i.e., $\mathcal{P}_g$ and $\mathcal{P}_d$.
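A back-of-the-envelope bound shows why 100-byte packets keep a 10 Gbit/s port far from its wire limit in packets per second, whereas 1500-byte packets do not. The sketch assumes that the stated packet size is the Ethernet frame size and optionally adds the 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap that each frame occupies on the wire; the function name `max_packet_rate` is hypothetical.

```python
def max_packet_rate(link_bps: float, frame_bytes: int, overhead_bytes: int = 0) -> float:
    """Upper bound on packets per second a link can carry at a fixed frame size.

    overhead_bytes models per-frame on-wire overhead (Ethernet preamble, SFD and
    inter-frame gap add 20 bytes); the default of 0 ignores it.
    """
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


LINK_BPS = 10e9  # one port of the Intel 82599ES 10 Gbit/s device used in the setup
for size in (100, 1500):
    raw = max_packet_rate(LINK_BPS, size)
    on_wire = max_packet_rate(LINK_BPS, size, overhead_bytes=20)
    print(f"{size} B: {raw / 1e6:.2f} Mpps (no overhead), {on_wire / 1e6:.2f} Mpps (with overhead)")
```

At 100 bytes the bound is on the order of 10 Mpps per port, consistent with the choice of small packets to stress packet processing rather than the wire; at 1500 bytes it drops to roughly 0.83 Mpps, which is simply the 10 Gbit/s wire rate expressed in packets.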
The six graphs in Figure 5.3 (and Figure 5.4) depict CPU consumption, packet processing, and throughput performance on a DUT installed with kernel OVS (and DPDK-OVS, respectively). The three graphs on the top (Figures 5.3(a), 5.3(b), and 5.3(c) for kernel OVS and Figures 5.4(a), 5.4(b), and 5.4(c) for DPDK-OVS) present the results for the gather placement function $\mathcal{P}_g$, and the three graphs on the bottom (Figures 5.3(d), 5.3(e), and 5.3(f) for kernel OVS and Figures 5.4(d), 5.4(e), and 5.4(f) for DPDK-OVS) present the results for the distribute placement function $\mathcal{P}_d$.

Figure 5.4: Experiment results showing throughput, packet processing, and CPU consumption for traffic generated in 100-byte packets, examined over increasing chain size, on a DUT installed with DPDK-OVS. Panels: (a) BW for placement $\mathcal{P}_g$; (b) PPS for placement $\mathcal{P}_g$; (c) CPU for placement $\mathcal{P}_g$; (d) BW for placement $\mathcal{P}_d$; (e) PPS for placement $\mathcal{P}_d$; (f) CPU for placement $\mathcal{P}_d$. Axes: # VNF (1 to 30) versus bandwidth (Mbps), # packets in OVS (pps), and % CPU utilization, respectively; series for HV = 1 + 1, 2 + 1, and 4 + 1 (with VNF = 22, 21, and 19 cores in the CPU panels).

Source: by author (2017).
We measure throughput as the total traffic that the traffic generator successfully sends and receives after routing through the DUT. To measure packet processing, we aggregate the number of received packets (including TCP acknowledgments) over all interfaces of the OVS (including both NICs and vNICs). For CPU consumption, we present the results for both the CPU cores allocated to the hypervisor to manage and support the VNFs and the CPU cores allocated for the VNFs to operate.

5.3.2.1 Packet Processing and Throughput

A key factor in the behavior of our environment is the a priori separation between the two disjoint sets of CPU cores ($S^h$ and $S^v$). Thus, in our experiments, we vary the number of CPU cores allocated to the OVS (hypervisor). For ease of presentation, we show only a few selected values (annotated HV in the figures).

Figure 5.3(a) depicts the average throughput for the gather placement function $\mathcal{P}_g$, and Figure 5.3(d) for the distribute placement function $\mathcal{P}_d$. For the distribute placement function $\mathcal{P}_d$, the more VNFs we deploy on the server, the higher the average throughput, while for the gather placement function $\mathcal{P}_g$ the average throughput decreases. Figures 5.3(b) and 5.3(e) depict packets per second for placement functions $\mathcal{P}_g$ and $\mathcal{P}_d$, respectively. The results show that OVS can seamlessly scale to support 5-10 VNFs; at that point, however, OVS reaches saturation, and we observe mild degradation as we continue to increase the number of deployed VNFs.

Figures 5.4(a) and 5.4(d) (and Figures 5.4(b) and 5.4(e)) depict the average throughput (and packets per second, respectively) of our two placement functions $\mathcal{P}_g$ and $\mathcal{P}_d$ on a DUT installed with DPDK-OVS. The behavior of DPDK-OVS is similar to that of kernel OVS, except that it achieves better network performance.

5.3.2.2 CPU Consumption

Figures 5.3(c) and 5.3(f) depict the average CPU consumption of OVS in kernel mode. For both placement functions, the CPU consumption of the VNFs is bounded by the number of cores allocated for management, $S^h$. A tighter bound on the CPU consumed by the VNFs (for networking) is a function of the CPU consumed by the hypervisor to manage the traffic; namely, the total CPU consumed by the VNFs is bounded by the total CPU consumed by the hypervisor to steer the traffic to the VNFs.

Comparing CPU utilization between the two placement functions, we observe that both placement strategies behave similarly for short service chains with little traffic. However, for longer service chains with increasing traffic requirements, the behavior of the two placements differs. Namely, in the gather placement, the CPU consumed by the VNFs is almost identical to the CPU consumed by the hypervisor, whereas in the distribute placement the CPU consumed by the VNFs is between 50% and 70% of the CPU consumed by the hypervisor. This observation suggests that, to achieve a CPU-cost-efficient placement, different traffic might require different placement strategies (as the results of our analysis in Section 5.4 will show).

Figures 5.4(c) and 5.4(f) depict the average CPU consumption of DPDK-OVS. For both placement functions, the CPU consumed by the hypervisor is fixed and derived from the DPDK poll-mode driver (with an additional CPU core for managing other non-networking provisioning tasks). Comparing CPU utilization between the two placement functions again shows that both behave the same for small chains with little traffic; however, for bigger chains with increasing traffic requirements, the behavior of the two placements differs.

Figure 5.5: Throughput for traffic generated in 1500-byte packets, examined over increasing chain size, on a DUT installed with kernel OVS. (a) BW for placement $\mathcal{P}_g$; (b) BW for placement $\mathcal{P}_d$. Axes: # VNF (1 to 30) versus bandwidth (Mbps); series HV = 4, 6, and 8.

Figure 5.6: Throughput for traffic generated in 1500-byte packets, examined over increasing chain size, on a DUT installed with DPDK-OVS. (a) BW for placement $\mathcal{P}_g$; (b) BW for placement $\mathcal{P}_d$. Axes: # VNF (1 to 30) versus bandwidth (Mbps); series HV = 1 + 1, 2 + 1, and 4 + 1.

Figure 5.7: Analysis of multiple servers, showing total throughput for traffic generated in 1500-byte packets, examined over increasing chain size. (a) Kernel OVS (series HV = 4 and HV = 8, each for Gather and Distribute); (b) DPDK-OVS (series HV = 1 + 1 and HV = 4 + 1, each for Gather and Distribute). Axes: # VNF ($1^2$ to $30^2$) versus bandwidth (Mbps).

Source: by author (2017).

5.3.3 Evaluating Throughput Intense Traffic

We repeat the experiment presented in Subsection 5.3.2, this time aiming at maximum throughput. As opposed to the previous subsection, where we examined the CPU cost of the transferred traffic, here our goal is solely to maximize throughput. To discuss maximum throughput, we measure and analyze the results of our DUT for several configurations while it receives packets of the maximum transmission unit size, i.e., we generate traffic in which each packet is 1500 bytes. We examine the behavior of throughput for increasing chain size, using both placement functions $\mathcal{P}_g$ and $\mathcal{P}_d$. Since CPU consumption and packet processing behave similarly to what we observed for 100-byte packets (at a larger scale), we omit their presentation.
Figure 5.5 (and Figure 5.6) depicts throughput performance on a DUT installed with kernel OVS (and DPDK-OVS, respectively). Both Figure 5.5(b) for kernel OVS and Figure 5.6(b) for DPDK-OVS show that the average throughput scales up as long as the server's resources are not oversubscribed (when deploying 1-5 VNFs). For kernel OVS, the bottleneck is the packet processing limit, while for DPDK-OVS the bottleneck is the NIC's wire limit. The same effect can be seen in Figures 5.5(a) and 5.6(a), where, again, as long as the server's resources are not oversubscribed, the chain of VNFs is able to forward ∼0.8-1.5 Gbit/s with kernel OVS and ∼6 Gbit/s with DPDK-OVS. Note that DPDK-OVS does not reach its packet processing limit (as it did with 100-byte packets in Subsection 5.3.2). Instead, the observed limit is induced by the amount of packet processing that a single CPU core (allocated to the VNF) can perform (∼6 Gbit/s). This bottleneck can be mitigated by configuring VNFs with more than a single vCPU and enabling Receive Side Scaling (RSS).
So far, we have presented the average throughput of a single server (our DUT). A naive analysis might lead to the wrong conclusion that the distribute placement function $\mathcal{P}_d$ outperforms the gather placement function $\mathcal{P}_g$. As Figure 5.7 shows, this is not the case. Figures 5.7(a) and 5.7(b) present the average overall throughput, estimated by the model defined in Section 5.2, on a set of many servers installed with kernel OVS (and DPDK-OVS, respectively). The gather placement function $\mathcal{P}_g$ has the advantage of being able to split the traffic among several independent servers, whereas the distribute placement function $\mathcal{P}_d$ is bound by the wire capacity.

5.4 Monolithic Cost Function

Section 5.3 focuses on exhaustive evaluations, analyzing throughput, OVS packet processing, and CPU consumption for different placement functions and software switching technologies (kernel OVS and DPDK-OVS). In the following, we describe how we build the generalized abstract cpu-cost functions, and then present and discuss the results.

5.4.1 Building an Abstract Cost Function

Based on the measured results on a single server (presented in Figures 5.3 and 5.4), we are
now ready to craft an abstract generalized cost function that accurately captures the CPU cost
of network switching.
We repeat the following process for both kernel OVS and DPDK-OVS. For each placement function ($\mathcal{P}_g$ or $\mathcal{P}_d$) and each packet size (100 bytes or 1500 bytes), we split the construction into a set of sub-functions that compose the cpu-cost function, namely $C = \{C_g^s, C_d^s, C_g^b, C_d^b\}$, where $C_g^s$ and $C_g^b$ are the sub-functions for placement $\mathcal{P}_g$ with packet sizes of 100 bytes and 1500 bytes, respectively, and $C_d^s$ and $C_d^b$ are the sub-functions for placement $\mathcal{P}_d$ with packet sizes of 100 bytes and 1500 bytes, respectively.
For each sub-function and for every measured service chain length, we sample the throughput (Figures 5.3(a)-5.3(d), 5.4(a)-5.4(d), 5.5(a)-5.5(b), and 5.6(a)-5.6(b)) to estimate the number of packets that a single server can process, and correlate the service chain length with the number of packets. Next, we correlate the resulting packet processing with the measured CPU consumption (Figures 5.3(c)-5.3(f), 5.4(c)-5.4(f)).

Finally, after extracting a three-dimensional correlation between (i) service chain length, (ii) packet processing, and (iii) CPU consumption, we use the results to extract a set of cpu-cost functions using logarithmic regression, as follows:

$$\log(C_X) = \alpha \cdot \log(\varphi|_n) + \beta \cdot \log(\varphi|_p) + \gamma \qquad (5.1)$$

where $X \in \{gs, ds, gb, db\}$, namely the gather or distribute placement for 100-byte or 1500-byte packets. Table 5.1 lists, for each sub-function, the coefficients $\alpha$ and $\beta$ and the constant factor $\gamma$. Given a service chain $\varphi$, the set of sub-functions $C$ estimates the total required CPU consumption on all servers.

Table 5.1: Coefficients α and β, and the constant factor γ, for each cpu-cost sub-function.

                  Kernel OVS                  DPDK-OVS
Sub-function      α       β       γ           α       β       γ
$C_g^s$           0.586   0.858   -1.789      0.370   0.467   1.543
$C_d^s$           0.660   0.243   -2.661      0.217   0.091   3.795
$C_g^b$           0.752   0.979   -3.856      0.478   0.578   0.194
$C_d^b$           1.009   0.268   -7.176      0.157   0.109   4.718

Source: by author (2017).
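A minimal sketch of how coefficients like those in Table 5.1 can be obtained: an ordinary least-squares fit of Eq. (5.1) on log-transformed samples. This is a generic technique assumed for illustration, not the authors' tooling. The thesis does not state the logarithm base or the units of $\varphi|_p$, so the sketch assumes natural logarithms (changing the base rescales only $\gamma$, leaving $\alpha$ and $\beta$ unchanged). The helper names `fit_log_model`, `predict_cost`, and `_solve3` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for j in range(col, 4):
                m[r][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][j] * x[j] for j in range(r + 1, 3))) / m[r][r]
    return x


def fit_log_model(n: Sequence[float], p: Sequence[float],
                  cpu: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of log(cpu) = alpha*log(n) + beta*log(p) + gamma (natural log assumed)."""
    rows = [(math.log(ni), math.log(pi), 1.0) for ni, pi in zip(n, p)]
    y = [math.log(ci) for ci in cpu]
    # Normal equations (X^T X) theta = X^T y, with theta = (alpha, beta, gamma).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    alpha, beta, gamma = _solve3(xtx, xty)
    return alpha, beta, gamma


def predict_cost(n: float, p: float, alpha: float, beta: float, gamma: float) -> float:
    """Invert Eq. (5.1) with natural logs: C = e^gamma * n^alpha * p^beta."""
    return math.exp(gamma) * n ** alpha * p ** beta


if __name__ == "__main__":
    # Synthetic samples generated from known coefficients (not measured data).
    ns, ps, cs = [], [], []
    for n in (1, 5, 10, 20, 30):
        for p in (1e4, 1e5, 1e6):
            ns.append(n)
            ps.append(p)
            cs.append(predict_cost(n, p, 0.586, 0.858, -1.789))
    print(fit_log_model(ns, ps, cs))  # approximately (0.586, 0.858, -1.789)
```

Fitting noise-free synthetic data recovers the generating coefficients, which is a quick sanity check before applying the same fit to measured samples; chain lengths and packet rates must both vary across samples, otherwise the normal equations become singular.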

5.4.2 Insights

The values presented in Table 5.1 reflect the real CPU cost of the various deployments, but they provide very little insight into our motivating question (see Figure 5.1). To better understand this cost, we provide graphs that depict the CPU cost for various service chains, characterized by the chain length $\varphi|_n$ and the number of packets to process $\varphi|_p$.
Figures 5.8(a), 5.8(b), and 5.8(c) depict the CPU cost for both placement functions when increasing the number of VNFs (and also the number of service chains) on servers installed with kernel OVS, with traffic received in large packets (1500 bytes per packet). In all graphs, the CPU consumption is the total amount of CPU required on all physical machines to support the service chains (where 100% represents all CPU cores on all servers). For service chains with low packet processing requirements (10 Kpps-50 Kpps), the cpu-cost function of the distribute placement, $C_d^b$, outperforms its gather placement counterpart, $C_g^b$. However, as the packet processing requirement increases (100 Kpps to 1.5 Mpps), the behavior reverses and favors the gather placement cpu-cost function $C_g^b$.

Figure 5.8: Cpu-cost over different service chain lengths ($\varphi|_n$), while receiving traffic generated in 1500-byte packets, for both placement functions on servers installed with kernel OVS. (a) 10 Kpps and 50 Kpps; (b) 100 Kpps and 200 Kpps; (c) 1 Mpps and 1.5 Mpps. Axes: # VNF ($1^2$ to $30^2$) versus % CPU utilization; Gather and Distribute series for each packet rate.

Source: by author (2017).
In turn, Figures 5.9(a), 5.9(b), and 5.9(c) depict the CPU cost for both placement functions when increasing the number of VNFs (and also the number of service chains) on servers installed with DPDK-OVS, with traffic received in 1500-byte packets. In this case, the behavior changes. For service chains with low packet processing requirements (10 Kpps-50 Kpps), the cpu-cost function of the gather placement, $C_g^b$, outperforms its distribute placement counterpart, $C_d^b$. However, as the packet processing requirement increases (1 Mpps to 6 Mpps), the behavior reverses and favors the distribute placement cpu-cost function $C_d^b$. Both results show that deciding which placement strategy is better depends on the required packet processing demand, and that the exact dependency varies with the technology used.
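The dependency on demand can be made explicit with a short derivation from Eq. (5.1); it is added here for illustration and is not part of the original analysis. Fix a packet size and a chain length $\varphi|_n$, and let $(\alpha_g, \beta_g, \gamma_g)$ and $(\alpha_d, \beta_d, \gamma_d)$ be the coefficients of the gather and distribute sub-functions from the same column of Table 5.1. Equating the two sub-functions gives the crossover demand $\varphi|_p^{*}$:

$$
\begin{aligned}
\alpha_g \log(\varphi|_n) + \beta_g \log(\varphi|_p^{*}) + \gamma_g &= \alpha_d \log(\varphi|_n) + \beta_d \log(\varphi|_p^{*}) + \gamma_d \\
\Rightarrow \quad \log(\varphi|_p^{*}) &= \frac{(\alpha_d - \alpha_g)\log(\varphi|_n) + (\gamma_d - \gamma_g)}{\beta_g - \beta_d}, \qquad \beta_g \neq \beta_d
\end{aligned}
$$

Since $\log C_g - \log C_d = (\alpha_g - \alpha_d)\log(\varphi|_n) + (\beta_g - \beta_d)\log(\varphi|_p) + (\gamma_g - \gamma_d)$ is linear in $\log(\varphi|_p)$ with slope $\beta_g - \beta_d$, one placement is predicted to be cheaper below $\varphi|_p^{*}$ and the other above it; for instance, if $\beta_g > \beta_d$, gather is cheaper below the crossover and distribute above it. This compares the fitted sub-functions directly, assuming both are expressed in the same units; it does not reproduce the normalization used in the figures (100% = all CPU cores on all servers).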
Figure 5.9: Cpu-cost over different service chain lengths ($\varphi|_n$), while receiving traffic generated in 1500-byte packets, for both placement functions on servers installed with DPDK-OVS. (a) 10 Kpps and 50 Kpps; (b) 1 Mpps and 1.5 Mpps; (c) 3 Mpps and 6 Mpps. Axes: # VNF ($1^2$ to $30^2$) versus % CPU utilization; Gather and Distribute series for each packet rate.

Source: by author (2017).

Next, we examine the cpu-cost function by varying the demand (i.e., the required packets to process) of a service chain, for a few arbitrarily selected service chain lengths. Figure 5.10 depicts the CPU cost for both placement functions when increasing the required number of packets to process (1500 bytes per packet) on servers installed with kernel OVS (Figure 5.10(a)) and DPDK-OVS (Figure 5.10(b)). In these graphs, we focus on the definition of a feasible cpu-cost function. Again, we normalize the cpu-cost values (i.e., 100% represents all CPU cores on all servers) and scale the number of packets to process in order to illustrate the bounds. All values in the graphs are presented in log scale. The graphs reaffirm the results discussed for Figures 5.8 and 5.9: the appropriate placement strategy depends on the network traffic demand to process.
Recall the definition of a feasible cpu-cost function from Section 5.2: a cpu-cost function is feasible if there are enough resources to implement it. In the results presented in Figures 5.8, 5.9, and 5.10, the value 100% indicates that we have reached the processing bound and cannot process more packets. We can observe that the infeasibility point is reached differently for each deployment strategy. For instance, the cpu-cost function of the distribute placement, $C_d^b$, reaches its infeasibility point sooner than that of the gather placement, $C_g^b$, when using OVS in kernel mode (Figure 5.10(a)).

Figure 5.10: Cpu-cost over different packet processing requirements ($\varphi|_p$), with 1500 bytes per packet, for both placement functions. (a) Servers installed with kernel OVS; (b) servers installed with DPDK-OVS. Axes: # packets to be processed (10 K to 10 M pps) versus % of CPU utilization, both in log scale; Gather and Distribute series for chain lengths 5 and 30.

Source: by author (2017).

6 OPTIMIZING OPERATIONAL COSTS OF NFV SERVICE CHAINING

In Chapter 5, we paved the way toward cost-efficient intra-datacenter deployment of service chains. We performed an extensive experimental evaluation and analyzed the impact on operational cost. Then, we developed a cost function that accurately predicts the CPU required to sustain the network traffic (with a guaranteed level of performance) for two baseline placement strategies (i.e., gather and distribute). In essence, the results show that the operational cost of deploying service chains depends on many factors, including the installed OVS (i.e., whether or not it supports hardware acceleration), the required number of packets to process, the length of the service chain, and (more importantly) the placement strategy. Further, we showed that no single deployment strategy is always better in terms of minimum operational cost; the best strategy depends on the system parameters and the characteristics of the deployed service chains.
In this chapter, we develop a general service chain deployment mechanism that considers both the actual performance of the service chains and the extra internal switching resources they require.¹ This is done by decomposing service chains into sub-chains and deploying each one on a (possibly different) physical server, in a way that minimizes the total switching overhead cost. We then introduce a novel algorithm based on an extension of the well-known reduction from weighted matching to the min-cost flow problem, and show that it gives an almost optimal solution. Additionally, we extend the switching cost formulation defined in Chapter 5 from considering only the gather and distribute placements to a general deployment scheme that can compute the switching-related CPU cost of arbitrary deployments.
The remainder of this chapter is organized as follows. In Section 6.1 we define our model
and then in Section 6.2 we describe our proposed intra-datacenter algorithm. Last, in Section
6.3, we evaluate the performance of our proposal.

6.1 Model and Problem Definition

In this section, we extend the model and notation previously defined in Chapter 5. As mentioned, the recommended best practice for virtualization-intensive environments requires each server to allocate two disjoint sets of CPU cores: one dedicated to the hypervisor (which manages VNF provisioning) and the other to properly operating the VNFs. Given a server $S$, we denote by $S^h$ the number of CPU cores allocated and reserved solely for the hypervisor to operate, and by $S^v$ the number of CPU cores required by the VNFs to operate. Throughout this chapter, unless explicitly stated otherwise, we assume that we are given a set of $k$ servers $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$.
¹ This chapter is based on the following publication:
• Marcelo Caggiani Luizelli, Danny Raz, Yaniv Saar. Optimizing NFV Chain Deployment Through Minimizing the Cost of Virtual Switching. In: Submitted to IEEE INFOCOM 2018.

Figure 6.1: Given a set of five service chains $\Phi$ and a set of $k$ servers $\mathcal{S}$, illustrating deployment strategies. (a) Gather placement $\mathcal{P}_g$: each chain is deployed entirely on one of the servers 1 to $k$. (b) Distribute placement $\mathcal{P}_d$: the VNFs of each chain are spread across servers 1 to $k$. (c) Arbitrary placement $\mathcal{P}$: chains are split into sub-chains mapped to different servers.

Source: by author (2017).

We define a service chain to be an ordered set of VNFs $\varphi = \langle \varphi_1 \rightarrow \varphi_2 \rightarrow \ldots \rightarrow \varphi_n \rangle$, and denote by $\varphi|_n$ its length. We denote by $\varphi|_p$ (and $\varphi|_s$) the number of packets per second (and the average packet size, respectively) that service chain $\varphi$ is required to process. For a VNF $\varphi_i \in \varphi$, we denote by $\varphi_i|_c$ the CPU required for its operation. Throughout the chapter, unless explicitly stated otherwise, we assume that we are given a sequence of service chains $\Phi = \{\varphi^1, \varphi^2, \ldots, \varphi^m\}$.
A sub-chain of VNFs is a sub-sequence $\varphi[i..j] = \langle \varphi_i \rightarrow \ldots \rightarrow \varphi_j \rangle$, where $1 \leq i \leq j \leq \varphi|_n$. A set of sub-chains $\{\varphi^1[1..i_1], \ldots, \varphi^r[1..i_r]\}$ is a decomposition of $\varphi$ if the concatenation of all sub-chains recomposes $\varphi$, i.e., $\varphi = \langle \varphi^1[1..i_1] \rightarrow \ldots \rightarrow \varphi^r[1..i_r] \rangle$. We say that a set of VNFs is an affinity group (and an anti-affinity group) if all its VNFs are mapped to the same server (and each to a different server, respectively). We say that a set of sub-chains is a strict decomposition if the VNFs of each sub-chain form an affinity group, and exactly one (arbitrary) VNF from each sub-chain belongs to a common anti-affinity group. Intuitively, a strict decomposition of $\varphi$ is a constraint that ties VNFs together so as to implement a placement in which each server is visited at most once. We adopt the placement functions already defined in Chapter 5. To that end, we use the same definition of a general placement function, that is, $\mathcal{P}: \Phi \rightarrow \mathcal{S}^k$ is a placement function that, for every service chain $\varphi \in \Phi$, maps every VNF $\varphi_i \in \varphi$ to a server $S_j \in \mathcal{S}$. In this context, we use the gather and distribute placement functions ($\mathcal{P}_g$ and $\mathcal{P}_d$, respectively).
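The definitions of decomposition and strict decomposition can be checked mechanically. In the sketch below, VNFs are identified by hypothetical names (`fw`, `nat`, `ids`, `lb`) assumed unique within the chain, a decomposition is described by cut positions, and `server_of` is a hypothetical mapping from VNF to server index. Given that each sub-chain is an affinity group, requiring the sub-chains to occupy pairwise-distinct servers is equivalent to picking one VNF per sub-chain into a common anti-affinity group, as in the definition.

```python
from typing import Dict, List, Sequence


def decompose(chain: Sequence[str], cuts: Sequence[int]) -> List[List[str]]:
    """Split a chain into contiguous sub-chains at the given 0-based cut positions."""
    if any(not 0 < c < len(chain) for c in cuts) or len(set(cuts)) != len(cuts):
        raise ValueError("cuts must be distinct positions strictly inside the chain")
    bounds = [0, *sorted(cuts), len(chain)]
    parts = [list(chain[a:b]) for a, b in zip(bounds, bounds[1:])]
    # Sanity check: concatenating the sub-chains recomposes the original chain.
    assert [v for part in parts for v in part] == list(chain)
    return parts


def is_strict_decomposition(parts: List[List[str]], server_of: Dict[str, int]) -> bool:
    """Each sub-chain is an affinity group and sub-chains use pairwise-distinct servers."""
    used = []
    for part in parts:
        hosts = {server_of[vnf] for vnf in part}
        if len(hosts) != 1:  # sub-chain split over several servers
            return False
        used.append(hosts.pop())
    return len(set(used)) == len(used)  # no server is visited twice


chain = ["fw", "nat", "ids", "lb"]
parts = decompose(chain, [2])  # [['fw', 'nat'], ['ids', 'lb']]
print(is_strict_decomposition(parts, {"fw": 0, "nat": 0, "ids": 1, "lb": 1}))  # True
print(is_strict_decomposition(parts, {"fw": 0, "nat": 0, "ids": 1, "lb": 0}))  # False
```

A placement that sends the chain back to a server it has already left (for example, sub-chains on servers 0, 1, and then 0 again) is rejected, which matches the "each server is visited at most once" intuition.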
Figure 6.1 illustrates possible deployment strategies. Figure 6.1(a) depicts a deployment of VNFs that follows the gather placement function $\mathcal{P}_g$, while Figure 6.1(b) illustrates a deployment of VNFs following the distribute placement function $\mathcal{P}_d$. Last, Figure 6.1(c) shows a possible deployment of VNFs that follows an arbitrary mixed placement function $\mathcal{P}$. Note that, given a service chain $\varphi$, the distribute placement function requires at least $\varphi|_n$ servers for the deployment to be feasible. Also note that the gather placement function requires at least one server that can host all VNFs in $\varphi$ for the deployment to be feasible. In Section 6.2, we develop a new algorithm that allows any arbitrary segmentation of the service chain while minimizing the operational cost of network traffic.
For a given placement function $\mathcal{P}$ that deploys all service chains in $\Phi$ on the set of servers $\mathcal{S}$, we define two cpu-cost functions as follows:

1. $C^v: \mathcal{P} \rightarrow (\mathbb{R}^+)^k$ is the guest cpu-cost function that, per server, outputs the CPU required to operate the VNFs allocated on that server. For simplicity, we use $C_i^v$ to refer to the $i$-th value in the output, and $C^v$ to denote the sum over all $k$ servers, i.e., $C^v = \sum_{i=1}^{k} C_i^v$.
2. $C^h: \mathcal{P} \rightarrow (\mathbb{R}^+)^k$ is the hypervisor cpu-cost function that, per server, outputs the CPU required to manage its network traffic. For simplicity, we use $C_i^h$ to refer to the $i$-th value in the output, and $C^h$ to denote the sum over all $k$ servers, i.e., $C^h = \sum_{i=1}^{k} C_i^h$.

We say that a placement function P is feasible with respect to the set of servers S if, for every server in S, the sum of the hypervisor cpu-cost function and the guest cpu-cost function does not exceed the number of CPU cores available on that server, i.e.,

$$\forall S_j \in S:\quad C_j^h(P) + C_j^v(P) \le S_j^h + S_j^v \qquad (6.1)$$
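Eq. (6.1) is a simple per-server check. The Python sketch below evaluates it for a candidate placement whose per-server outputs of $C^h$ and $C^v$ have already been computed elsewhere; the `Server` record and its field names are hypothetical stand-ins for $S_j^h$ and $S_j^v$, not part of the original formulation.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical split of a server's cores, mirroring S_j^h and S_j^v in Eq. (6.1).
    hyp_cores: float
    guest_cores: float


def is_feasible(servers, host_cost, guest_cost):
    """Eq. (6.1): C_j^h + C_j^v <= S_j^h + S_j^v must hold on every server j.

    host_cost[j] and guest_cost[j] are the j-th outputs of C^h and C^v
    for the placement under evaluation.
    """
    return all(
        host_cost[j] + guest_cost[j] <= s.hyp_cores + s.guest_cores
        for j, s in enumerate(servers)
    )


def total(per_server_cost):
    """Aggregate C = sum_{i=1}^{k} C_i, used for both C^v and C^h."""
    return sum(per_server_cost)
```

For two servers with 24 cores each, a placement needing 3 + 20 and 1.5 + 22 cores is feasible, while raising the second hypervisor cost to 2.5 cores makes it infeasible.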

We can now formally define the algorithmic problem of interest. We are given a set of k servers and an online sequence of service chains. For each service chain in the sequence, we need to find a feasible placement for all the VNFs in the chain such that the total cpu-cost increase is minimized. In a sense, finding a feasible placement for a given service chain is a zero-one acceptance problem, since we are not allowed to place a partial service chain (i.e., either the entire chain is deployed or nothing is). Minimizing the total (hypervisor) cpu-cost releases additional resources that can potentially be used to allocate additional network functions, which directly improves the acceptance rate of service chains over time. Our evaluation of the acceptance rate (comparing our algorithm to the OpenStack scheduler) shows that minimizing the cpu-cost locally leads to a near-optimal global point at the end of the sequence.
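The online, all-or-nothing admission described above can be sketched as a simple loop. This is an illustration under stated assumptions: `place` is a hypothetical callback that returns the cpu-cost increase of the best feasible placement of a chain (or infinity when none exists), and committing the chosen placement to the servers is left out.

```python
import math


def run_online(chains, place):
    """Process service chain requests one at a time (zero-one acceptance).

    place(chain) is assumed to return the cpu-cost increase of the cheapest
    feasible placement, or math.inf when no feasible placement exists.
    """
    accepted = []
    for chain in chains:
        cost = place(chain)
        if math.isinf(cost):
            continue  # reject the whole chain; partial deployments are not allowed
        accepted.append(chain)
    ratio = len(accepted) / len(chains) if chains else 0.0
    return accepted, ratio
```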

6.2 Operational Cost Optimized Placement

In this section, we introduce our Operational Cost Minimization algorithm (OCM) for the problem described above. Given a sequence of service chains and a set of physical servers, our algorithm determines how to deploy the service chains onto the physical servers. Resource scheduling algorithms usually decide their deployment strategy based on an analysis of the current state of the infrastructure and of the computation resources required for the VNFs to operate. Our approach extends these typical resource scheduling algorithms by also reasoning about the operational cost of managing the network traffic to the VNFs. As explained above, the idea is that minimizing this cost releases additional resources that can potentially be used to allocate additional network functions. The first step, then, is to compute the operational cost; as explained in the previous sections, we focus on the cpu-cost of switching inside the server.

Figure 6.2: Server Sj deployed with r sub-chains from different network services.

[Figure: sub-chains ϕ1[1..i1], …, ϕw[1..iw], …, ϕr[1..ir], each connected to the server's virtual switch.]

Source: by author (2017).

6.2.1 The Operational Cost of Switching

Recall that the cost function is composed of two parts: (i) the guest cpu-cost function C^v, which describes the VNF computation cost and is simply the sum of the VNFs' CPU requirements, and (ii) the more interesting hypervisor cpu-cost function C^h, which describes the CPU cost required to manage the network traffic. Given a placement function P, our goal in this subsection is to evaluate the CPU cost Cjh(P).
In Chapter 5, we studied and evaluated the feasibility and the hypervisor cpu-cost function of gather and distribute placements. For a service chain ϕ composed of n VNFs (i.e., ϕ|n), we showed how to calculate Fg(ϕ), the cpu-cost of a gather deployment on a server (infinity if the deployment is not feasible), and Fd(ϕ), the cpu-cost of each server when l copies of ϕ are deployed in a distributed way on l identical servers (infinity if such a deployment is not feasible). This is done separately for both types of OVS installation, kernel OVS and DPDK-OVS.
Using these functions as building blocks, we are now ready to define the function Cjh that captures the cpu-cost of server j. Let $\{\phi^1[1..i_1], \ldots, \phi^w[1..i_w], \ldots, \phi^r[1..i_r]\}$ be the set of sub-chains deployed on server Sj, as illustrated in Figure 6.2. Note that each sub-chain may be part of a different service chain decomposition, with different traffic properties.
The function Cjh should take into account both the switching work needed to support the gathering of each sub-chain and the effort required for the parallel deployment of all r sub-chains together. As explained above, the cpu-cost associated with gathering each sub-chain $\phi^w[1..i_w]$ can be obtained directly from $F_g(\phi^w[1..i_w])$ (defined in Chapter 5). On the other hand, the parallel deployment of several (here, r) sub-chains incurs an extra effort that is not covered by the definitions of Chapter 5, and thus we approximate this value using $r \cdot F_d(\phi^r_{i_r}) - F_d(\phi^r[1..i_r])$. Note that this value is the difference between r times the case in which one VNF is deployed on the server and the case in which r VNFs are deployed. When the VNFs that compose the chain have different characteristics, we have to select one of them for this computation; we select the last one, $\phi^r_{i_r}$.
Figure 6.3 depicts the overhead cost as a function of the number of sub-chains that exist on the server. The overhead (in CPUs) represents the amount of resources each physical server has to dedicate to properly operate each service chain (which includes the hypervisor and the software switching layer). We vary the amount of network traffic each sub-chain processes from 10Kpps to 500Kpps. This is done separately for environments installed with kernel OVS (Figure 6.3(b) for big packets and Figure 6.3(c) for small packets) and with DPDK-OVS (Figure 6.3(a), for big packets)². The overall computation of Cjh is then done as follows:

$$C_j^h = r \cdot F_d(\phi^r_{i_r}) - F_d(\phi^r[1..i_r]) + \sum_{w=1}^{r} F_g(\phi^w[1..i_w]) \qquad (6.2)$$
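To make the structure of Eq. (6.2) concrete, the sketch below evaluates it literally for one server. $F_g$ and $F_d$ are passed in as plain Python callables; the linear toy models used in the example are hypothetical and are not the measured Chapter 5 curves.

```python
def hypervisor_cost(subchains, F_g, F_d):
    """Evaluate Eq. (6.2) for one server j.

    subchains: the r sub-chains already on server j, each a non-empty list of
    VNFs, in the order phi^1, ..., phi^r. F_g and F_d are caller-supplied
    gather / distribute cost models taking a (sub-)chain as a list.
    """
    r = len(subchains)
    if r == 0:
        return 0.0
    last_subchain = subchains[-1]            # phi^r[1..i_r]
    last_vnf = last_subchain[-1:]            # phi^r_{i_r}, as a one-VNF chain
    extra_effort = r * F_d(last_vnf) - F_d(last_subchain)
    gather_cost = sum(F_g(sc) for sc in subchains)
    return extra_effort + gather_cost
```

With the toy models F_g(ϕ) = 0.5·|ϕ| and F_d(ϕ) = 0.2·|ϕ| and sub-chains of lengths 2, 1 and 3, the extra-effort term is 3·0.2 - 0.6 = 0 and the gather term is 1.0 + 0.5 + 1.5, so the sketch returns 3.0.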

6.2.2 Chain Partitioning

When receiving a new service chain, we need to find servers for each of the VNFs such that
the overall placement is feasible, and has the minimal switching cost (as described above).
A naive approach would be to go over all VNFs in the service chain, filter the feasible servers for each one, and then compute the cost of the placement. This is essentially what OpenStack nova-scheduler does for a single VNF. With k servers and N VNFs in the service chain, the complexity of this method is $O(k^N)$. Even for small chains (i.e., of length four or five), this easily becomes impractical with a thousand servers. The main reason is that we need to check the cost of the placement for the entire chain rather than for one VNF at a time.
To address this deployment problem we take a different approach. We first consider all pos-
sible set partitions of the service chain into subsets of VNFs (not necessarily a decomposition
of the service chain) that are co-located on the same server, find the optimal cost for each such
partition (we show how to do this in polynomial time), and select the one with the minimal cost.
² The behavior for small packets is similar to big packets.

Figure 6.3: The extra effort overhead cost as a function of the number of sub-chains that exist on the server.

[Figure panels: (a) DPDK-OVS with big packets; (b) Kernel OVS with big packets; (c) Kernel OVS with small packets. Each panel plots overhead cost (in CPUs) vs. # deployed sub-chains (1 to 10), for 10K, 50K, 100K, 200K and 500K pps per sub-chain.]

Source: by author (2017).

Identifying all possible subsets of VNFs in a service chain ϕ is equivalent to solving the combinatorial problem of set partitioning. Namely, given a set of VNFs {ϕ1, …, ϕn} (all members of service chain ϕ), enumerate all ways of dividing the set into non-empty subsets such that every element is included in exactly one of the subsets. Given an input N = ϕ|n, the number of all possible partitions of a set is the Bell number B(N). A recent study (BEREND; TASSA, 2010) established the upper bound $B(N) < \left(\frac{0.792N}{\ln(N+1)}\right)^N$, i.e., enumeration requires $O(N^N)$ iterations. This is already much better than $k^N$, as the typical chain length is less than five and the typical number of servers can be 1K or more. Note, however, that the complete enumeration suggested above also includes placements that allow traffic to enter a server, be processed by a certain VNF, leave the server, and then enter it again to be processed by a different VNF. Since in practice this is not a reasonable placement, in the following we suggest a relaxed version of the problem.
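A compact way to enumerate all set partitions of the VNFs in a chain is the standard recursion that either inserts the first element into an existing block or opens a new block for it. The Python sketch below does that and also evaluates the Berend and Tassa bound quoted above; VNFs are arbitrary labels, and the sketch makes no attempt to filter out the re-entry placements just described.

```python
import math


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty, unordered blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partial in set_partitions(rest):
        for i in range(len(partial)):
            # put `first` into the i-th existing block
            yield partial[:i] + [[first] + partial[i]] + partial[i + 1:]
        # or give `first` a block of its own
        yield [[first]] + partial


def berend_tassa_bound(n):
    """Upper bound (0.792 n / ln(n + 1))**n on the Bell number B(n), for n >= 1."""
    return (0.792 * n / math.log(n + 1)) ** n
```

Counting the partitions for chains of 1 to 6 VNFs reproduces the Bell numbers 1, 2, 5, 15, 52 and 203, each below the bound (for N = 4 the bound is about 15.01, so it is quite tight).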
Assuming that service chain ϕ goes through a server at most once, we can scan all possible decompositions of ϕ. Enumerating all possible decompositions of ϕ is equivalent to solving the combinatorial problem of integer composition. Namely, given a positive integer N, enumerate all ordered lists of positive integers n1, …, nr whose sum equals N, i.e., $N = \sum_{i=1}^{r} n_i$. The problem of integer composition has been widely studied in the literature, including fast algorithms for its solution (e.g., MERCA, 2012; KELLEHER; O'SULLIVAN, 2009). Given an input N, the number of all possible integer compositions is exactly $2^{N-1}$ (i.e., enumeration requires $O(2^N)$ iterations).
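Decompositions of a chain of N VNFs correspond one-to-one to integer compositions of N: each of the N - 1 gaps between consecutive VNFs is either cut or not. The Python sketch below encodes the cuts as the bits of an integer; it is a straightforward enumeration for illustration, not one of the fast algorithms cited above, and the VNF names in the example are hypothetical.

```python
def decompositions(chain):
    """Yield all ways to cut a non-empty chain into contiguous sub-chains.

    Bit (cut - 1) of `mask` decides whether to cut before position `cut`,
    so a chain of n VNFs has exactly 2**(n - 1) decompositions.
    """
    n = len(chain)
    for mask in range(1 << (n - 1)):
        parts, start = [], 0
        for cut in range(1, n):
            if (mask >> (cut - 1)) & 1:
                parts.append(chain[start:cut])
                start = cut
        parts.append(chain[start:])
        yield parts


def part_lengths(parts):
    """The integer composition (n_1, ..., n_r) that a decomposition corresponds to."""
    return tuple(len(p) for p in parts)
```

For N = 5 this yields 2^4 = 16 decompositions, whereas exhaustive assignment to k = 1,000 servers would examine 1,000^5 = 10^15 placements.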

Figure 6.4: Run time analysis of OCM compared to optimal solution.

[Figure: time (seconds, log scale) vs. service chain size (1 to 10) for Nova Scheduler, OCM Optimal and OCM.]

Source: by author (2017).

Since in our case the input to the problem is the length of the service chain ϕ|n, which is bounded in practice, the enumeration of all integer compositions is bounded in practice as well. Our evaluations show that the enumeration of all integer compositions scales to service chains of length ∼20 (executing in a few seconds), whereas the enumeration of all set partitions scales to chains of length ∼6 (up to a minute of execution).
Figure 6.4 compares the running time (in log scale) of our approach with a nova-scheduler implementation that performs an exhaustive search over all possible assignments of servers to the VNFs. We ran both methods on a reduced infrastructure setting with 100 servers, since the exhaustive search algorithm cannot handle bigger sizes. On this input, our algorithm takes no more than 2 seconds (in the worst case, i.e., long service chains) and 1 second on average to compute the solution, while the optimum takes more than 10K seconds. It is important to emphasize that we significantly improved the running time while delivering high-quality solutions: in all evaluated instances, our solution based on integer composition found the optimal solution. Furthermore, when scaling the infrastructure to more than 1K servers, it still finds solutions in a reasonable time (on the order of a few seconds in the worst case).

6.2.3 Using Matching to Find Optimal Cost Placement

To simplify the presentation, in the following we refer to decompositions of a service chain, but the algorithmic steps also apply to the case of set partitions. For every decomposition {ϕ1[1..i1], …, ϕr[1..ir]} of the service chain ϕ, we need to find an optimal placement over the set of servers S = {S1, S2, …, Sk}, taking the current load into account. Again, we could try all possible assignments and select the one with minimal cost, but this would bring us back to the exhaustive search complexity. Instead, we use matching to find, in polynomial time, the optimal server to assign to each sub-chain ϕw[1..iw]. The idea is to construct a bipartite graph in which the nodes on one side are the sub-chains ϕ1, …, ϕr, and the nodes on the other side are all the servers S1, S2, …, Sk. Sub-chain ϕw is connected to server Sj by an edge whose weight is the extra cost induced by placing ϕw on Sj on top of the existing sub-chains (i.e., Cjh with ϕw minus Cjh without it). This cost is infinity if it is not feasible to place sub-chain ϕw on server Sj. It is straightforward to verify that a feasible minimal-cost placement of this service chain decomposition corresponds exactly to a minimum-weight matching in the bipartite graph that covers every sub-chain (i.e., a matching that is perfect on the sub-chain side).
Finding such a minimum-weight matching in a bipartite graph can be reduced to the problem of finding a minimum cost flow in a graph (AHUJA; MAGNANTI; ORLIN, 1993). The complexity is polynomial: for a given service chain ϕ of length N = ϕ|n and a set of k servers S, the problem can be solved in time $O((N + k) \cdot N^2 \cdot k^2)$.
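The reduction can be made concrete with a small flow network: the source connects to every sub-chain, every sub-chain connects to each server on which it can feasibly be placed (weighted by the extra cost), and every server connects to the sink, all with capacity 1 so that no server receives two sub-chains of the same chain. The Python sketch below solves it by successive shortest paths with Bellman-Ford; it is an illustrative textbook method assumed here, not the solver used in this thesis, and it does not aim to match the complexity bound stated above.

```python
import math


def min_cost_assignment(cost):
    """Min-weight matching of sub-chains to distinct servers via min-cost flow.

    cost[w][j] is the extra cost of placing sub-chain w on server j, or
    math.inf if infeasible. Returns (total_cost, server_of_each_subchain),
    or (math.inf, None) when no matching covers every sub-chain.
    """
    r = len(cost)
    k = len(cost[0]) if r else 0
    src, sink = 0, r + k + 1              # sub-chains are 1..r, servers r+1..r+k
    n = r + k + 2
    graph = [[] for _ in range(n)]        # edge = [to, capacity, cost, index_of_reverse]

    def add_edge(u, v, c):
        graph[u].append([v, 1, c, len(graph[v])])
        graph[v].append([u, 0, -c, len(graph[u]) - 1])

    for w in range(r):
        add_edge(src, 1 + w, 0)
        for j in range(k):
            if cost[w][j] != math.inf:    # infeasible pairs get no edge at all
                add_edge(1 + w, r + 1 + j, cost[w][j])
    for j in range(k):
        add_edge(r + 1 + j, sink, 0)      # capacity 1: at most one sub-chain per server

    total = 0.0
    for _unit in range(r):                # one augmenting path per sub-chain
        dist = [math.inf] * n
        prev = [None] * n                 # (node, edge index) that reached each node
        dist[src] = 0.0
        for _round in range(n - 1):       # Bellman-Ford: residual costs may be negative
            changed = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for i, (v, cap, c, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v], prev[v], changed = dist[u] + c, (u, i), True
            if not changed:
                break
        if dist[sink] == math.inf:
            return math.inf, None
        v = sink
        while v != src:                   # push one unit of flow along the path
            u, i = prev[v]
            graph[u][i][1] -= 1
            graph[v][graph[u][i][3]][1] += 1
            v = u
        total += dist[sink]

    assignment = []
    for w in range(r):                    # the saturated sub-chain -> server edge
        for v, cap, _c, _rev in graph[1 + w]:
            if r + 1 <= v <= r + k and cap == 0:
                assignment.append(v - r - 1)
    return total, assignment
```

For two sub-chains and three servers with costs [[5, 1, ∞], [2, 4, 3]], the sketch places the first sub-chain on server 1 and the second on server 0, for a total extra cost of 3.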

6.2.4 Operational Cost Minimization Algorithm

We are now ready to present our placement algorithm. Roughly, our algorithm has three building blocks: (i) list all decompositions of ϕ or, in the extended version of the algorithm, all partitions of the set of VNFs in ϕ (Subsection 6.2.2); (ii) for a given decomposition, build the objective function suggested above, i.e., a cost function that measures the operational cost of network traffic management on each server (Subsection 6.2.1); (iii) for a given decomposition and objective function, build a reduction to minimum-weight matching in a bipartite graph between sub-chains and servers, where the weights are given by the objective function (Subsection 6.2.3).
The operational cost minimization algorithm receives as input a set of servers S = {S1, S2, …, Sk} and a sequence of service chains Φ = {ϕ1, ϕ2, …, ϕm}. For each service chain ϕ ∈ Φ, the algorithm invokes the optimal service chain placement step (presented in Algorithm 3) on input ϕ and S.
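A toy version of the per-chain placement step of Algorithm 3 is sketched below in Python. It enumerates decompositions through `itertools.combinations` over the cut positions and, for brevity, replaces the matching step with brute force over distinct servers, which is only viable for a handful of servers; `extra_cost` is a hypothetical callback standing in for the increase in Cjh given the current load of each server.

```python
import itertools
import math


def place_chain(chain, num_servers, extra_cost):
    """Cheapest placement of one service chain, one distinct server per sub-chain.

    extra_cost(sub_chain, server) stands in for the increase of C_j^h on that
    server (math.inf when infeasible). Returns (cost, [(sub_chain, server), ...])
    or (math.inf, None), meaning the whole chain is rejected.
    """
    n = len(chain)
    best_cost, best_pairs = math.inf, None
    for r in range(1, min(n, num_servers) + 1):             # number of sub-chains
        for cuts in itertools.combinations(range(1, n), r - 1):
            bounds = (0, *cuts, n)                          # one decomposition of the chain
            parts = [tuple(chain[a:b]) for a, b in zip(bounds, bounds[1:])]
            # Brute force over distinct servers stands in for the matching step.
            for servers in itertools.permutations(range(num_servers), r):
                cost = sum(extra_cost(p, s) for p, s in zip(parts, servers))
                if cost < best_cost:
                    best_cost, best_pairs = cost, list(zip(parts, servers))
    return best_cost, best_pairs
```

In a full system, each accepted placement would update the server loads before the next chain is processed, so `extra_cost` would change from one request to the next.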

6.3 Evaluation

To assess the expected performance of the OCM algorithm and its ability to increase utilization compared to commonly used placement methods, we implemented the OCM algorithm and performed an extensive set of simulation-based evaluations. All experiments were performed on an Intel i7 machine with 8 cores at 2.6 GHz and 16 GB of RAM, running MacOS 10.11.

6.3.1 Setup

The physical infrastructure, representing a typical NFV node (i.e., a small datacenter), is composed of 100 physical servers, each with 24 physical cores (again representing typical servers in this environment). Service chains are composed of sequences of VNFs, where the number of VNFs is chosen uniformly at random between 2 and 10 (the expected length of service chains in real deployments). We set the VNF computation requirement to a fixed amount of CPU (1 vCPU), since our main goal is to assess the operational costs of deployments. Note that increasing the amount of CPU required per VNF does not change the operational costs; it only impacts the acceptance ratio and overall resource utilization. We vary the traffic to be processed by each service from 10K to 1.5M packets per second. We also consider both kernel mode and DPDK packet acceleration when evaluating the operational costs of deploying service chains using Open vSwitch.

Algorithm 3 Optimal service chain placement

Input: S ← {S1, S2, …, Sk}: set of servers
       ϕ ← ⟨ϕ1 → ϕ2 → … → ϕn⟩: service chain
1: min-cost ← ∞: minimum cost found so far
2: deploy-map ← NIL: maps all VNFs to servers in S
3: A ← all set partitions (or decompositions) of ϕ
4: for every partition a ∈ A do
5:   for every block p ∈ a and every server Sj ∈ S do
6:     Cjh, Cjv ← build cost function deploying p on Sj
7:   end for
8:   G ← bipartite graph between the blocks of a and S, reduced to minimum cost flow
9:   if min(G) ≤ min-cost then
10:    min-cost ← min(G)
11:    deploy-map ← extract solution from G
12:  end if
13: end for
14: return deploy-map

For the purpose of this evaluation, we developed a generator of service chain requests. Service chain requests are handled one by one, as described in Section 6.1. Each experiment was repeated 30 times with different service chain requests.
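The request generator is not specified beyond the parameters above. A minimal Python sketch consistent with them might look as follows; the dictionary layout, the seeding, and the use of a single fixed traffic level per experiment (taken from the x-axis values of Figures 6.5 and 6.6) are assumptions.

```python
import random

# Traffic levels read from the x-axis of Figures 6.5 and 6.6 (packets per second).
TRAFFIC_LEVELS_PPS = [10_000, 50_000, 100_000, 500_000, 1_000_000, 1_500_000]


def generate_requests(n, pps, seed=0):
    """Hypothetical generator: n chains of 2..10 VNFs, 1 vCPU each, fixed traffic."""
    rng = random.Random(seed)                # seeded so that each repetition is reproducible
    requests = []
    for i in range(n):
        length = rng.randint(2, 10)          # uniform over 2..10, both ends inclusive
        vnfs = [{"name": f"vnf{i}_{j}", "vcpu": 1} for j in range(length)]
        requests.append({"id": i, "pps": pps, "vnfs": vnfs})
    return requests
```

One experiment would then loop over `TRAFFIC_LEVELS_PPS`, drawing 30 request sequences per level with different seeds.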

6.3.2 Comparison to OpenStack

We compare the OCM algorithm to two possible deployment strategies in OpenStack/Nova: (i) a load-balancing policy (distribute placement) and (ii) an energy-saving policy (gather placement). Observe that we mimic the nova-scheduler engine in our simulator by using the gather and distribute policies, which assume hard requirements (instead of soft constraints); therefore, requests are rejected whenever these requirements cannot be met.

6.3.3 Results

NFV operational cost. Figures 6.5(a) and 6.6(a) depict the average additional CPU required to deploy VNFs. This additional cost represents resources specifically allocated to sustain the network performance (e.g., throughput) of software switching. As we observe, the number of additional physical cores needed for the virtualized switching functionality is not negligible and, as expected, the cost tends to increase with the traffic requirement (in packets per second). For high network requirements (1M to 1.5M pps), the operational cost reaches up to half of the resources available on a whole physical server (≈ 10 to 12 cores; see Figure 6.5(a)). Observe that the operational cost of nova-scheduler/distribute is substantially higher on DPDK-enabled OVS servers for low network traffic requirements. In OVS-DPDK, the hypervisor runs at least a subset of poll-mode threads (doing busy-waiting), which consume resources independently of the network traffic. As the nova-scheduler/distribute deployment tries to use the available servers evenly, low-requirement traffic leads to little sharing of those resources (and thus to higher operational costs). Observe that as the network traffic requirement increases, this cost tends to decrease in comparison to kernel-based implementations.

Figure 6.5: Analysis of service chains deployment on NFV servers with Open vSwitch in kernel mode.

[Figure panels: (a) Operational cost: extra CPUs allocated per VNF vs. # packets to be processed (in pps, 10K to 1.5M); (b) Acceptance ratio (%) vs. # packets to be processed (in pps); (c) Resource utilization: unused resources (%) vs. # service chain requests (0 to 600). Series: Distribute, Gather, OCM.]

Source: by author (2017).

When comparing the OCM algorithm to the other two methods, one can see that the amount of additional resources required when placement is done by the OCM algorithm is always below that of the other two methods. The amount of improvement depends on the specific parameters and, in extreme cases, can be as high as a factor of 4; in most cases, however, the improvement is between 20% and 40%. Note that reducing the switching cost is not the ultimate goal. The goal is to free resources in order to support additional functionality, so the real benefit is the ability to accept additional services, as shown next.
Service chain acceptance ratio. Figures 6.5(b) and 6.6(b) depict the average acceptance ratio (the ratio between the number of deployed chains and the number of requested chains) for different network traffic requirements. On average, OCM deploys a higher number of requests; in the worst case, it behaves similarly to the nova-scheduler/gather deployment strategy. Due to the higher operational cost of deploying services in a distributed way (e.g., in DPDK mode), the acceptance ratio is substantially affected in those cases (see Figure 6.6(b), where the acceptance ratio is ≈ 30%). However, note that for higher network traffic requirements, nova-scheduler/distribute shows a slight improvement over nova-scheduler/gather. OCM, in turn, makes better use of the available resources even for high requirements. This is because OCM provides flexible solutions that better fit the available physical resources and, therefore, achieves a higher acceptance rate. The improvement of OCM over standard deployment policies is much more noticeable if we look only at large requests.
Physical infrastructure resource utilization. Figures 6.5(c) and 6.6(c) illustrate the average unused resources over time (i.e., the lower the average value, the better). Observe that OCM can utilize more resources in the physical infrastructure (up to 18% of unused resources) with comparatively lower operational cost (see Figures 6.5(a) and 6.6(a)). The average of unused resources may be lower depending on the workload (for longer service chains, OCM can better take advantage of chain decomposition). With OVS-DPDK, the average percentage of unused resources is slightly higher than in kernel mode. This happens because DPDK requires a fixed amount of resources to operate (threads required by poll-mode drivers), which results in higher operational costs and less flexibility in placing service chains.
Overall, one can see that trying to reduce the switching CPU cost locally at each deployment of a new request proves to be a very good strategy, as it improves the acceptance rate and allows a much better utilization of the resources.

Figure 6.6: Analysis of service chains deployment on NFV servers with Open vSwitch in DPDK mode.

[Figure panels: (a) Operational cost: extra CPUs allocated per VNF vs. # packets to be processed (in pps, 10K to 1.5M); (b) Acceptance ratio (%) vs. # packets to be processed (in pps); (c) Resource utilization: unused resources (%) vs. # service chain requests (0 to 600). Series: Distribute, Gather, OCM.]

Source: by author (2017).

7 OPTIMIZING DISTRIBUTED NETWORK MONITORING IN NFV

In the previous chapters, we approached the inter- and intra-datacenter VNFPC problems. The interplay between these two optimization problems ensures an efficient and scalable provisioning of service chains. In this chapter, we tackle the problem of monitoring network traffic in service chains, another important building block for the proper operation of NFV-based network services¹. We propose an optimization model that is able to effectively coordinate the monitoring of service chains in the context of NFV deployments. The model is aware of SFC topological components, which allows elements within a network service to be monitored independently.
The remainder of this chapter is organized as follows. In Sections 7.1 and 7.2, we moti-
vate the Distributed Network Monitoring (DNM) problem and assess the impact of forwarding
network traffic (to monitoring services) on network performance. We then formalize the DNM
problem using an ILP model in Section 7.3. Last, in Section 7.4, we present the performance
evaluation of our proposed model.

7.1 Problem Motivation

Collecting and analyzing flow measurement data are essential tasks to properly operate a network infrastructure. By continuously analyzing this data, network operators can diagnose performance problems (e.g., congested links (ZHU et al., 2015)), detect network attacks (e.g., superspreader attacks (YU; JOSE; MIAO, 2013)), or even perform traffic engineering (e.g., detect large traffic aggregates (BEN-BASAT et al., 2017) or traffic changes), to name just a few possible monitoring operations. In a software-based network infrastructure (i.e., an NFV- and SDN-based one), network operators have increased flexibility to dynamically collect network traffic and, consequently, to monitor network services thoroughly.
The flexibility offered by software-based environments allows monitoring services to be dynamically and efficiently deployed. In the context of NFV, monitoring services are orthogonal to the deployment of SFCs, as network monitoring must cope with highly dynamic events. As previously defined in Chapters 3 and 5, a service chain is an ordered set of network functions through which the network traffic is steered. As an example, consider a service chain ϕ1 composed of three VNFs, ϕ1 = ⟨ϕ11 → ϕ12 → ϕ13⟩. In this particular service chain, there might exist independent monitoring requirements for each segment. In other words, segments ϕ11 → ϕ12 and ϕ12 → ϕ13 might be monitored according to multiple policies so as to fulfill different network monitoring demands at the packet or flow level.
¹ This chapter is based on the following publication:
• Marcelo Caggiani Luizelli, Luciana Salete Buriol, Luciano Paschoal Gaspary. In the Right Place at the Right Time: Optimizing Distributed Network Monitoring for NFV Service Chains. To be submitted to IEEE NOMS 2018.

For instance, the egress network traffic of VNF ϕ11 could be forwarded to a specific monitoring system at a predefined traffic rate r(.), while the egress traffic of VNF ϕ13 is forwarded to a different one. Figure 7.1 illustrates an example in which SFC ϕ1 is deployed across multiple locations of an NFV-based infrastructure. Consider that SFC ϕ1 is composed of a firewall (VNF ϕ11), an encryption function (VNF ϕ12), and a user accounting function (VNF ϕ13), in this order. When monitoring SFC ϕ1, the network traffic should be carefully collected so as to ensure the correct analysis of network data. In the example, SFC ϕ1 is constantly monitored by a DPI and a heavy hitter (HH) solution. As the DPI requires the raw (i.e., unencrypted) network traffic to operate properly, the network traffic should be sent to it right after being processed by VNF ϕ11 and right before VNF ϕ12. In turn, the HH can receive the network traffic at any point after ϕ13, assuming that the operator wants to know the top-k flows authorized by the accounting system. Figure 7.1 illustrates a possible solution to the problem of steering the monitoring traffic to a set of monitoring sinks. From this point on, we use the term monitoring sink to refer to a monitoring function.

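Figure 7.1: Coordinated network monitoring of SFC-based network services.

[Figure: SFC ϕ1 = ⟨φ1 → φ2 → φ3⟩ deployed across locations A and B; sampled traffic from segments φ1-φ2 and φ2-φ3, at rates r(φ1-φ2), r(φ1-φ2)/2 and r(φ2-φ3)/2, is steered to the monitoring sinks DPI and HH. Legend: monitoring sinks, network functions, network traffic, sampling.]

Source: by author (2017).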

While some SFC segments are required to collect/aggregate flow-level statistics (e.g., the number of packets or the volume of a given flow), other segments are required to collect specific packet-level information to be sent to complex monitoring sinks. Packet-level monitoring enables network operators to deepen their understanding of a significant number of network behaviors, e.g., to understand flow properties (such as entropy), measure delay or throughput per flow (YU, 2014), and perform complex network traffic inferences (such as identifying network attacks). Despite these benefits, packet-level monitoring incurs major network overheads due to the collection, storage, transmission, and analysis of an enormous amount of packets (particularly at line rate). As illustrated, examples of packet-level monitoring applications include DPI and HH solutions. While DPI requires the inspection of the entire packet contents (headers and payload), HH solutions (BEN-BASAT et al., 2016) only require header checking (packet size). To reduce the burden of packet-level monitoring, complex monitoring systems rely on network sampling mechanisms (e.g., BEN-BASAT et al., 2017). Consequently, network devices can keep up with network performance requirements while the monitoring system can still provide accurate results.

To fulfill monitoring requirements (at the flow and packet level), network devices rely extensively on flow export protocols such as sFlow (PHAAL; PANCHEN; PANCHEN, 2001) and NetFlow (CLAISE, 2004). Despite their wide adoption in traditional infrastructures, these protocols are rigidly designed to either mirror or uniformly sample network traffic to monitoring applications. This rigidity leads to a few drawbacks when monitoring NFV service chains, including the following.

1. Monitoring applications are bound to run at predefined locations in the network infrastructure.
2. Network flows are treated equally, being sampled at the same rate.
3. Network flows are potentially sampled multiple times along the path over which a service chain is deployed, in case many routers are enabled to perform sFlow/NetFlow.

As one can observe, traditional monitoring mechanisms are not flexible enough to cope with the monitoring requirements imposed by NFV-based service chains, namely, independent and coordinated monitoring of service chain segments. Fortunately, NFV infrastructures have largely adopted software switching (e.g., Open vSwitch or P4-based dataplanes (BOSSHART et al., 2014; SHAHBAZ et al., 2016)) as a building block to interconnect VNFs. In addition to interconnecting VNFs, software switching allows network traffic to be monitored (by sampling and/or forwarding network traffic to monitoring sinks) at any granularity (by filtering network flows) and at any position of a given deployed service chain. In summary, the benefit is a fully customized monitoring system that can be orchestrated orthogonally to SFC deployments in a flexible way.

7.2 The Cost of Packet-Level Monitoring in Software Switching

The drawbacks of traditional monitoring mechanisms discussed above are overcome in NFV environments by taking advantage of existing capabilities of (programmable) software dataplanes. These capabilities mainly include (i) fine-grained traffic matching and (ii) flexible mechanisms to sample network traffic. Therefore, such mechanisms enable service chains to be monitored at any given point (e.g., egress/ingress ports) by as many monitoring sinks as necessary.
However, providing monitoring flexibility directly on the dataplane also incurs network performance overheads. In order to simultaneously forward network traffic (coming from different service chains) and sample it to monitoring sinks, the software switching layer requires an additional amount of computing resources to keep up with performance requirements. Despite this resource overhead, sampling mechanisms are much more efficient than sending all packets out to monitoring sinks, for two main reasons: (i) network links are not overwhelmed by monitoring packets, and (ii) dataplane resources are saved, as sampling substantially reduces the number of packet copies performed between network interfaces (physical and virtual ones). This is particularly important as the dataplane is upper-bounded by the number of packets it can handle (LUIZELLI et al., 2017b).

Performing per-packet sampling directly in the dataplane introduces greater flexibility to monitor service chains. However, it comes at a cost. In order to quantify network performance degradation when performing network sampling, we conduct a set of experiments in a typical NFV environment. For that, we deployed one service chain composed of a single VNF (i.e., ϕ1 = ⟨ϕ11⟩). All the traffic processed by VNF ϕ11 is then sampled to a monitoring sink. We vary the sampling rate from 10⁻⁴ to 10⁻¹ and compare it against two baseline cases: (i) mirroring network traffic (i.e., copying all packets to the monitoring sink), and (ii) no sampling at all. We highlight that our main goal is not to provide the best performance on sampling packets, but rather to illustrate the potential benefits and incurred costs of sampling.

Our NFV evaluation environment consists of two HP ProLiant servers. Each server is equipped with an Intel Xeon E3-1220v2 processor with four physical cores running at 3.1 GHz. Each server has 8 GB RAM and a DPDK-enabled Intel 82599ES 10 Gbit/s NIC with two physical ports, which are used to interconnect both servers. The operating system running on each server is CentOS version 7.2.1511, Linux kernel version 3.10.0. One of the servers is used as the Design Under Test (DUT), and the other is used as a traffic generator. In our DUT, we use Open vSwitch 2.6², compiled with Intel's DPDK 16.07. The VNF and the monitoring sink are configured with the Fedora 22 operating system, Linux kernel version 4.0.4, running on top of QEMU version 2.6. Further, they are configured to operate with a single virtual CPU (pinned to a specific physical core) and 512 MB of RAM. In turn, our traffic generator server runs MoonGen (EMMERICH et al., 2015) to send UDP network traffic at line rate through our deployed service chain ϕ1.

Figure 7.2 illustrates the maximum achieved throughput in the evaluated scenario. When sampling is enabled, the Open vSwitch dataplane first copies each packet to the egress port (in compliance with the service chain specification) and then, if the packet is selected, to the monitoring sink. As one can observe, the more packets are copied from VNF ϕ11 to the monitoring sink (e.g., 1 out of 10, a sampling rate of $10^{-1}$), the lower the network performance observed on the dataplane. Further, observe that full mirroring (i.e., a sampling rate of $10^{0}$) incurs a considerably high throughput overhead (about 40% lower throughput) in comparison to the case in which there is no sampling (i.e., VNF ϕ11 just sends the network traffic to the egress port). Note, however, that sampling leads to an acceptable overhead, as low as 5%. This performance degradation is due to (i) more packet copies (according to the sampling rate) and (ii) more CPU cycles spent on each packet (for each packet, the dataplane checks whether or not to sample it).
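The per-packet decision described above can be illustrated with a minimal Python simulation, assuming each packet is selected independently with probability τ (a Bernoulli trial). This is a hypothetical sketch, not the Open vSwitch extension used in the experiments; the function `simulate_sampling` and its parameters are illustrative names. It only counts packet copies, i.e., the first of the two overhead sources listed above, and ignores CPU cycles.

```python
import random


def simulate_sampling(num_packets: int, tau: float, seed: int = 42) -> tuple[int, int]:
    """Simulate per-packet sampling with probability tau.

    Returns (egress_copies, sink_copies): every packet is forwarded to the
    egress port once, and each packet is additionally copied to the
    monitoring sink with probability tau (one Bernoulli trial per packet).
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    rng = random.Random(seed)  # seeded for reproducible runs
    egress_copies = 0
    sink_copies = 0
    for _ in range(num_packets):
        egress_copies += 1          # copy required by the service chain
        if rng.random() < tau:      # per-packet sampling decision
            sink_copies += 1        # extra copy toward the monitoring sink
    return egress_copies, sink_copies


if __name__ == "__main__":
    n = 100_000
    for tau in (1.0, 1e-1, 1e-2, 1e-3, 1e-4, 0.0):
        egress, sink = simulate_sampling(n, tau)
        print(f"tau={tau:g}: {sink} sampled copies, {egress + sink} total copies")
```

With τ = 1 (full mirroring) every packet is copied twice, once to the egress port and once to the sink, whereas τ = $10^{-4}$ adds only about one extra copy per 10,000 packets.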

² We extended the Open vSwitch dataplane capabilities in order to support sampling. Our code is publicly available at https://github.com/ovssample

Figure 7.2: Performance degradation when sampling in software switching.

[Plot: throughput (Gbit/s, 0 to 6) versus sample probability τ ($10^{0}$, $10^{-1}$, $10^{-2}$, $10^{-3}$, $10^{-4}$, and no sampling); series: Sampling and Baseline.]

Source: by author (2017).

7.3 Distributed Network Monitoring: Problem Overview and Optimization Model

In this section, we first introduce the Distributed Network Monitoring (DNM) problem, and
then we propose an optimization model to solve it.

7.3.1 Problem Overview

We start by introducing the DNM problem with an example. The problem at hand relies extensively on the sampling mechanisms provided by NFV infrastructures to efficiently monitor service chains.
Consider an NFV infrastructure composed of N-PoPs interconnected by physical links. Each N-PoP has a limited CPU capacity to support the operation of VNFs, while links have limited bandwidth capacity. Attached to each N-PoP, there is a forwarding device with a limited capacity to sample network traffic. On top of this NFV infrastructure, there are many deployed service chains. For the sake of simplicity, consider three service chains ϕ1, ϕ2 and ϕ3 that are identical in terms of VNF composition, that is, ϕ1,2,3 = ⟨ϕ1 → ϕ2 → ϕ3⟩. Suppose each service chain is specialized in handling a different type of network traffic: HTTP network traffic is steered through service chain ϕ1, video streaming traffic through service chain ϕ2, and all non-classified network traffic through service chain ϕ3.
As previously mentioned, there exist complex network monitoring sinks that might be applied individually (or simultaneously) to each network flow. In this example, we consider that the HTTP network traffic (steered through service chain ϕ1) is monitored by a monitoring sink that implements DDoS detection/prevention. Similarly, the network traffic steered through service chain ϕ2 (i.e., video streaming) is monitored by an HH monitoring sink. Last, all non-classified network traffic is monitored by a DPI. In addition, we assume that service chains ϕ1 and ϕ2 are simultaneously monitored by a Traffic Engineering (TE) solution.

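Figure 7.3: Example of a solution for the Distributed Network Monitoring (DNM) problem.

[Diagram: example NFV infrastructure (nodes A to Q) with three deployed service chains; fractions of the required sampling rates (r(f1)/2, r(f1)/4, r(f1)/8, r(f2)/2, r(f3)/2, r(f3)/8) are sampled at different forwarding devices and forwarded to monitoring sinks (DPI, HH, DDoS, TE). Legend: network traffic, sampling.]

Source: by author (2017).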

In our example, we have a set of monitoring applications, represented by M. Set M is composed of the DDoS, HH, DPI and TE monitoring functions. Each monitoring application m ∈ M requires a specific portion of the network traffic to process its input accurately. Let the sampling rate required by each monitoring function be given by a function r(·); therefore, for all m ∈ M, there is a sampling rate r(m) known in advance. Note that the sampling rate required by monitoring sinks m ∈ M might change over time; in such cases, we assume that the sampling rate is known in advance at each time slot.
As mentioned, the DNM problem relies on the ability to sample network traffic along the path on which each service chain is deployed. A valid solution for the DNM problem may therefore distribute the sampling rate required by a given monitoring service across arbitrary network devices along that path. Hence, a solution for the DNM problem determines (i) how to split and forward sampled network traffic efficiently and (ii) where to place network monitoring sinks according to the demand. Note that the NFV infrastructure is constrained by the number of packets/s each forwarding device can handle and, therefore, sampling all network traffic at one location might incur high performance degradation. Figure 7.3 illustrates a possible solution for the DNM problem. In the figure, the three service chains are deployed (represented by the bold dashed lines) on top of the NFV infrastructure. For simplicity, we omit where the network functions of service chains ϕ1, ϕ2 and ϕ3 are deployed and focus on showing the position of the monitoring functions (i.e., m ∈ M). Note that the network traffic steered through the service chains is sampled according to the required sampling rate r(m) of each monitoring application. Further, each network monitoring sink m ∈ M can be deployed multiple times (as is the case of DPI and HH in the example).
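To make the idea of distributing a sampling requirement along a path concrete, the sketch below splits a required sampled volume (e.g., in Mbit/s) across the forwarding devices on a chain's path, filling each device up to its residual sampling capacity. This greedy rule is a hypothetical illustration only, not the DNM model, which decides the split jointly with sink placement and forwarding cost; the function `split_sampling`, the node names (borrowed from Figure 7.3) and the capacities are assumptions.

```python
from typing import Optional


def split_sampling(required: float, path: list[str],
                   residual_capacity: dict[str, float]) -> Optional[dict[str, float]]:
    """Greedily split a required sampled volume across devices on a path.

    Devices are visited in path order; each contributes as much as its
    residual sampling capacity allows. Returns a device -> volume map, or
    None if the devices on the path cannot jointly sample `required`.
    """
    remaining = required
    split: dict[str, float] = {}
    for device in path:
        if remaining <= 0:
            break
        share = min(remaining, residual_capacity.get(device, 0.0))
        if share > 0:
            split[device] = share
            remaining -= share
    if remaining > 1e-9:
        return None  # infeasible: the path lacks sampling capacity
    return split


if __name__ == "__main__":
    # Hypothetical residual sampling capacities along a chain's path.
    capacity = {"A": 20.0, "M": 10.0, "N": 50.0}
    print(split_sampling(60.0, ["A", "M", "N"], capacity))
    # {'A': 20.0, 'M': 10.0, 'N': 30.0}
    print(split_sampling(100.0, ["A", "M", "N"], capacity))
    # None
```

A greedy split ignores where the sinks are placed; in the DNM model, the same split would also be driven by the forwarding cost term of the objective.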

7.3.2 Model Description and Notation

We start by describing both the input and output of the model, and establishing a supporting
notation. We use superscript letters P and S to indicate symbols that refer to physical resources
and SFC requests, respectively. Similarly, superscript letters N and L indicate references to
N-PoPs/endpoints and the links that connect them.
The optimization model we use for solving the DNM problem considers a set of already deployed service chains $Q$ and a physical infrastructure $p$, the latter a tuple $p = (N^P, L^P)$. $N^P$ is a set of network nodes, and pairs $(i, j) \in L^P$ denote unidirectional physical links. We consider each node $i \in N^P$ as a computing node directly connected to a forwarding device. We use two pairs in opposite directions (e.g., $(i, j)$ and $(j, i)$) to denote bidirectional links. The model captures the following resource constraints: computing power of N-PoPs ($c^P_i$), sampling capacity of forwarding devices ($s^P_i$), and one-way bandwidth of physical links ($b^P_{i,j}$).
A service chain $q \in Q$ is an aggregation of network functions and the chaining between them. It is represented as a tuple $q = (N^S_q, L^S_q)$. Sets $N^S_q$ and $L^S_q$ contain the SFC nodes and the virtual links connecting them, respectively. Each virtual link $(i, j) \in L^S_q$ is potentially chained through a path in the physical infrastructure $p$. We denote the deployed path of a virtual link $(i, j)$ by a set of physical links $P^S_{q,i,j} \subseteq L^P$.
$M$ denotes the set of monitoring sinks (e.g., HH, DPI, DDoS) available for deployment in the infrastructure. Each monitoring sink type $m \in M$ has a set of instances $U_m$ and may be instantiated at most $|U_m|$ times. We denote as $f_{type} : N^P \cup N^S \rightarrow M$ the function that indicates the type of a given VNF, which can be either one instantiated in an N-PoP ($N^P$) or one requested in an SFC ($N^S$). We also use functions $f_{cpu} : (M \times U_m) \rightarrow \mathbb{R}^+$ and $f_{pkt} : (M \times U_m) \rightarrow \mathbb{R}^+$ to denote the computing power requirement and the packet processing capability of a given VNF.
As previously discussed, service chains $q \in Q$ might have multiple independent monitoring requirements. We define monitoring requirements for individual virtual links $(i, j) \in L^S_q$. In other words, each link $(i, j)$ is associated with a set of monitoring sinks, denoted by $M^S_{q,i,j} \subseteq M$. By monitoring virtual links independently, the model can cope with topological dependencies of already deployed network services. This is particularly important for monitoring applications that require network traffic right after (or before) it is handled by specific network functions (e.g., packets sent to a DPI must not be encrypted or compressed). Last, for each monitoring sink $m \in M^S_{q,i,j}$, we denote the sampling rate it requires to operate properly as $r^{q,i,j}_m$.
The model output is denoted by a 4-tuple $\chi = \{Y, F, D, S\}$. Variables from $Y = \{ y_{i,m,j}, \forall i \in N^P, m \in M, j \in U_m \}$ indicate a monitoring sink placement, i.e., whether instance $j$ of monitoring sink $m$ is mapped to N-PoP $i$. The variables from $F = \{ f^{q,k,l,m}_{i,j}, \forall q \in Q, (k, l) \in L^S_q, m \in M, (i, j) \in L^P \}$, in turn, represent how the network traffic is split (and forwarded) to the required set of monitoring services. Specifically, they indicate how much network traffic, required by monitoring service $m$ on virtual link $(k, l)$ belonging to service chain $q$, is forwarded through physical link $(i, j)$. The variables from $D = \{ d^{q,k,l}_{i,m}, \forall q \in Q, (k, l) \in L^S_q, m \in M, i \in R^P \}$ represent how much monitoring traffic of a given virtual link $(k, l) \in L^S_q$ is delivered at node $i$, where $R^P$ denotes the set of forwarding devices. Similarly, the variables from $S = \{ s^{q,k,l}_{i,m}, \forall q \in Q, (k, l) \in L^S_q, m \in M, i \in R^P \}$ represent the amount of network traffic that is sampled by forwarding device $i$ for a given virtual link $(k, l) \in L^S_q$. Variables in $Y$ assume values in $\{0, 1\}$, while variables in $F$, $D$, and $S$ take values in $\mathbb{R}^+$.

7.3.3 Model Formulation

Next, we describe the mixed integer linear programming formulation for the DNM problem. The objective function simultaneously minimizes the number of deployed monitoring sinks and the cost of forwarding monitoring flows, weighted by α and β, respectively. The objective is given in Equation 7.1.

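Objective:

$$\text{Minimize} \quad \alpha \cdot \sum_{i \in N^P} \sum_{m \in M} \sum_{j \in U_m} y_{i,m,j} \; + \; \beta \cdot \sum_{q \in Q} \sum_{(k,l) \in L^S_q} \sum_{m \in M} \sum_{(i,j) \in L^P} f^{q,k,l,m}_{i,j} \tag{7.1}$$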

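Subject to:

$$\sum_{j \in R^P} f^{q,k,l,m}_{i,j} - \sum_{j \in R^P} f^{q,k,l,m}_{j,i} = s^{q,k,l}_{i,m} - d^{q,k,l}_{i,m} \quad \forall q \in Q, (k,l) \in L^S_q, i \in R^P, m \in M^S_{q,k,l} \tag{7.2}$$

$$\sum_{i \in P^S_{q,k,l}} s^{q,k,l}_{i,m} = r^{q,k,l}_m \quad \forall q \in Q, (k,l) \in L^S_q, m \in M^S_{q,k,l} \tag{7.3}$$

$$\sum_{q \in Q} \sum_{(k,l) \in L^S_q} \sum_{m \in M^S_{q,k,l}} s^{q,k,l}_{i,m} \le s^P_i \quad \forall i \in R^P \tag{7.4}$$

$$\sum_{q \in Q} \sum_{(k,l) \in L^S_q} \sum_{m \in M^S_{q,k,l}} f^{q,k,l,m}_{i,j} \le b^P_{i,j} \quad \forall (i,j) \in L^P \tag{7.5}$$

$$\sum_{m \in M} \sum_{j \in U_m} y_{i,m,j} \cdot f_{cpu}(m,j) \le c^P_i \quad \forall i \in N^P \tag{7.6}$$

$$\sum_{q \in Q} \sum_{(k,l) \in L^S_q} d^{q,k,l}_{i,m} \le y_{i,m,j} \cdot f_{pkt}(m,j) \quad \forall i \in R^P, m \in M^S_{q,k,l}, j \in U_m \tag{7.7}$$

$$f^{q,k,l,m}_{i,j} \in \mathbb{R}^+ \quad \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q, m \in M^S_{q,k,l} \tag{7.8}$$

$$s^{q,k,l}_{i,m}, \; d^{q,k,l}_{i,m} \in \mathbb{R}^+ \quad \forall i \in R^P, q \in Q, (k,l) \in L^S_q, m \in M^S_{q,k,l} \tag{7.9}$$

$$y_{i,m,j} \in \{0, 1\} \quad \forall i \in R^P, m \in M, j \in U_m \tag{7.10}$$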

The first two constraint sets refer to the forwarding of network monitoring traffic. Constraint set (7.2) ensures (i) that network traffic is sampled by forwarding devices; (ii) flow conservation along the path through which monitoring traffic is routed; and (iii) that N-PoPs (or network nodes) receive the sampled network traffic. In turn, constraint set (7.3) guarantees that only a subset of forwarding devices samples network traffic for a given SFC, namely the ones belonging to the path on which the SFC is currently deployed. In addition, it ensures that the sampled network traffic meets the sampling rate required by a given monitoring sink. Constraint sets (7.4), (7.5), (7.6), and (7.7) are related to capacity aspects. Constraint set (7.4) imposes an upper bound on the amount of network traffic sampled by each forwarding device. Constraint set (7.5) ensures that the monitoring traffic does not exceed the available bandwidth on the links used to forward packets to monitoring sinks. Constraint set (7.6) ensures that the monitoring sinks do not exceed the available computing capacity of a particular N-PoP. Last, constraint set (7.7) ensures that the monitoring sinks have enough capacity to process incoming monitoring traffic. Constraint sets (7.8), (7.9), and (7.10) define the domains of output variables $F$, $S$, $D$, and $Y$.
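To make the role of each constraint concrete, the sketch below evaluates objective (7.1) and checks constraints (7.2) to (7.5) for a hand-made candidate solution. It is a hypothetical Python illustration, not the CPLEX implementation used in the evaluation: the dictionary encoding of the variables, the function names, and the toy instance (one monitored segment, sampled at nodes A and B and delivered to a DPI sink at node C) are assumptions. Constraints (7.6) and (7.7) are omitted for brevity.

```python
from collections import defaultdict

# Hypothetical encoding of the decision variables:
#   seg = (q, (k, l), m): sink type m monitoring virtual link (k, l) of chain q
#   y: set of deployed sink instances (i, m, j), i.e., entries with y_{i,m,j} = 1
#   f[(seg, (i, j))]: monitoring flow of seg on physical link (i, j)
#   s[(seg, i)]: volume sampled by forwarding device i
#   d[(seg, i)]: volume delivered to the sink at node i


def objective(y, f, alpha=1.0, beta=1.0):
    """Equation (7.1): alpha * number of sinks + beta * total monitoring flow."""
    return alpha * len(y) + beta * sum(f.values())


def check_solution(f, s, d, rate, path_nodes, samp_cap, bw, tol=1e-9):
    """Return the list of violated constraints among (7.2)-(7.5).

    samp_cap maps each forwarding device i (the set R^P) to s^P_i,
    bw maps each physical link (i, j) to b^P_{i,j}, and rate maps each
    seg to its required sampled volume r^{q,k,l}_m.
    """
    violations = []
    for seg, r_m in rate.items():
        # (7.2) flow conservation: outflow - inflow = sampled - delivered.
        for i in samp_cap:
            out_flow = sum(v for (sg, (a, _)), v in f.items() if sg == seg and a == i)
            in_flow = sum(v for (sg, (_, b)), v in f.items() if sg == seg and b == i)
            net = s.get((seg, i), 0.0) - d.get((seg, i), 0.0)
            if abs(out_flow - in_flow - net) > tol:
                violations.append(("7.2", seg, i))
        # (7.3) devices on the chain's path must jointly sample r_m.
        q, kl, _ = seg
        on_path = path_nodes[(q, kl)]
        sampled = sum(v for (sg, i), v in s.items() if sg == seg and i in on_path)
        if abs(sampled - r_m) > tol:
            violations.append(("7.3", seg))
    # (7.4) sampling capacity of each forwarding device.
    per_device = defaultdict(float)
    for (_, i), v in s.items():
        per_device[i] += v
    for i, v in per_device.items():
        if v > samp_cap[i] + tol:
            violations.append(("7.4", i))
    # (7.5) bandwidth of each physical link.
    per_link = defaultdict(float)
    for (_, link), v in f.items():
        per_link[link] += v
    for link, v in per_link.items():
        if v > bw[link] + tol:
            violations.append(("7.5", link))
    return violations


if __name__ == "__main__":
    seg = ("q1", ("k", "l"), "DPI")
    rate = {seg: 10.0}                       # required sampled volume
    path_nodes = {("q1", ("k", "l")): {"A", "B"}}
    samp_cap = {"A": 6.0, "B": 6.0, "C": 0.0}
    bw = {("A", "C"): 100.0, ("B", "C"): 100.0}
    s = {(seg, "A"): 5.0, (seg, "B"): 5.0}  # sampling split between A and B
    d = {(seg, "C"): 10.0}                   # delivered to the DPI at C
    f = {(seg, ("A", "C")): 5.0, (seg, ("B", "C")): 5.0}
    y = {("C", "DPI", 0)}
    print(check_solution(f, s, d, rate, path_nodes, samp_cap, bw))  # []
    print(objective(y, f))                                          # 11.0
```

Lowering the sampling capacity of device A below 5.0 in this toy instance makes the checker report a (7.4) violation, since A alone would then be sampling more than it can handle.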

7.4 Evaluation

In this section, we evaluate the effectiveness of our mathematical model in generating feasible solutions to the DNM problem. The model formalized in the previous section was implemented and run in CPLEX Optimization Studio³ version 12.7.1. All experiments were performed on a machine with four Intel Xeon E5-2670 processors and 56 GB of RAM, running the Ubuntu GNU/Linux Server 11.10 x86_64 operating system. Next, we describe our experimental workload, followed by the results obtained.

³ http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/

7.4.1 Setup

To perform the experiments, we adopted a workload similar to those used in recent literature (BARI et al., 2015; LUIZELLI et al., 2017a). The physical network substrate was generated with Brite⁴, following the Barabási-Albert (BA-2) (ALBERT; BARABÁSI, 2000) model. We consider physical infrastructures consisting of 100 N-PoPs and 393 physical links. The computing power of each N-PoP is normalized and set to 1, and each link has a bandwidth capacity of 10 Gbps. Attached to each N-PoP, there is a forwarding device with limited sampling capability. We set the sampling capability ($s^P_i$) of each forwarding device to 5% of its line rate.
Our workload consists of a set of already deployed SFCs. For simplicity, we consider that all deployed SFCs are uniform, i.e., they have the same requirements in terms of VNFs and bandwidth. The topology of each SFC consists of a source endpoint, a sequence of three VNFs interconnected by links (supporting 1 Gbps of traffic flow), and a destination endpoint. Source and destination endpoints are placed randomly at predefined locations in the infrastructure (i.e., the location constraints adopted by VNFPC (LUIZELLI et al., 2017a)). All VNF instances require a normalized computing power capacity of 0.25. Our model takes as input the output of the placement & chaining mechanism for SFCs (VNFPC model) defined in Chapter 3. The output of VNFPC determines the set $P^S_{q,i,j}$, for all $q \in Q$ and $(i, j) \in L^S_q$, which specifies the physical links used to steer network traffic from $i$ to $j$ in SFC $q$.
We consider a set of five types of arbitrary monitoring sinks (defined in our model as the set $M$). Monitoring sinks require a normalized computing power capacity of 0.25 to properly analyze incoming network traffic. Each monitoring sink is set to support an incoming traffic rate of up to 1 Gbps. As previously defined, SFC segments are monitored independently, that is, each segment $(i, j) \in L^S_q$, for all $q \in Q$, is potentially monitored with different requirements. We vary the network sampling requirement ($r^{q,i,j}_m$) of each segment $(i, j)$ from $10^{-5}$ to $10^{-1}$, uniformly across all SFCs $q \in Q$. Each experiment is repeated 30 times, considering different deployed service chains and physical infrastructures, to ensure a confidence level of 90% or higher.

7.4.2 Results

First, we analyze the number of monitoring sinks needed to cope with an increasing sampling rate required by deployed SFCs. Figure 7.4 depicts the average number of instantiated monitoring sinks with sampling rates varying from $10^{-5}$ to $10^{-1}$ (in log scale). Each dashed line represents a scenario with a different number of simultaneously deployed SFCs (from 5 to 40). We observe that the higher the sampling rate (i.e., towards rate $10^{-1}$), the more monitoring sinks are deployed in the infrastructure, up to 30 in the worst case. In contrast, as the sampling rate decreases (i.e., towards rate $10^{-5}$), the number of sinks drops dramatically, by up to a factor of 6. This is because, as the sampling rate increases, more network packets must be forwarded to monitoring sinks for further analysis. Observe that the baseline cases (for the conducted evaluation) consist of forwarding all network traffic (or none of it) to monitoring sinks. When all network traffic is forwarded to monitoring sinks, the DNM problem is infeasible for most of the considered scenarios: the amount of demanded resources far exceeds the physical resources available in the infrastructure. For instance, considering 40 deployed SFCs, each with five potential segments to be monitored (of 1 Gbps each), full mirroring would lead to a theoretical lower bound of 200 deployed sink instances.

⁴ http://www.cs.bu.edu/brite/
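The lower bound quoted above follows from simple arithmetic, assuming it only accounts for sink processing capacity (1 Gbps per sink, as set in Section 7.4.1) and full mirroring of every monitored segment. The helper `sink_lower_bound` is a hypothetical name used for illustration.

```python
import math


def sink_lower_bound(num_sfcs: int, segments_per_sfc: int,
                     segment_rate_gbps: float, sink_capacity_gbps: float) -> int:
    """Minimum number of sinks when every monitored segment is fully mirrored.

    Total mirrored traffic divided by the per-sink processing capacity,
    rounded up to a whole number of sink instances.
    """
    total_gbps = num_sfcs * segments_per_sfc * segment_rate_gbps
    return math.ceil(total_gbps / sink_capacity_gbps)


print(sink_lower_bound(40, 5, 1.0, 1.0))  # 200
```

The bound ignores CPU, link bandwidth and sampling capacity, so the actual number of sinks needed for full mirroring can only be higher.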
Figure 7.4: Number of monitoring sinks required for different sampling rates. For this evaluation, we consider α = 1 and β = 1.

[Plot: number of monitoring sinks (0 to 30) versus sampling rate ($10^{-5}$ to $10^{-1}$); one line per number of deployed SFCs (5, 10, 20, 30, 40).]

Source: by author (2017).

Next, we evaluate the volume of network monitoring traffic that is routed in the network infrastructure. Figure 7.5(a) depicts the aggregate amount of network monitoring traffic demanded by all SFC segments being monitored. Observe that higher sampling rates lead to a higher volume of network monitoring traffic, as expected. The observed behavior is linear since we are evaluating a set of homogeneous scenarios. In turn, Figure 7.5(b) illustrates the aggregate amount of network monitoring traffic that is routed through physical links in the network infrastructure, in comparison to the demanded network monitoring traffic (illustrated in Figure 7.5(a)). The volume of network monitoring traffic steered in the infrastructure depends primarily (i) on the required sampling rate and (ii) on how far monitoring sinks are placed from SFC segments and sampling devices. When a monitoring sink is instantiated on top of an N-PoP attached to the network device that is sampling the network traffic, there is no need to route network monitoring traffic through the physical infrastructure. In this case, the DNM solution leads to a substantial gain regarding the usage of physical resources. However, in most cases, due to the lack of available resources at specific network devices (i.e., forwarding devices, physical links and/or monitoring sinks), the sampling of network traffic is performed at multiple forwarding devices (incurring network overhead). As an example, the scenario with 10 SFCs and a sampling rate of $10^{-3}$ requires around 50 Mbit of network monitoring traffic to be steered in the network infrastructure (illustrated in Figure 7.5(a)). For the same setting, using DNM, we observe a network consumption of 9 Mbit. To clearly illustrate the benefits of DNM solutions, Figure 7.5(c) depicts the percentage gain in terms of network monitoring traffic that is saved from being routed in the physical infrastructure. In the figure, higher percentage gains are interpreted as lower overheads, since they correspond to a reduced consumption of physical resources. The average observed gain is around 54% in relation to the total amount of demanded network monitoring traffic (illustrated in Figure 7.5(a)). Note that these gains may reach as much as 80% for lower sampling rates ($10^{-5}$ and $10^{-4}$), and range from 20% to 50% for higher ones ($10^{-1}$). This happens mainly because DNM aims at placing monitoring sinks, whenever possible, as close as possible to sampling devices. As mentioned, the closer they are placed, the fewer resources are consumed to steer network monitoring traffic to them.
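Assuming the gain in Figure 7.5(c) is the fraction of demanded monitoring traffic that is not routed through the physical infrastructure, as described above, the 10-SFC example at a sampling rate of $10^{-3}$ can be checked as follows. The inputs (about 50 Mbit demanded, 9 Mbit consumed) are the approximate values given in the text, so the result is only a rough estimate; `bandwidth_gain` is a hypothetical helper.

```python
def bandwidth_gain(demanded_mbit: float, consumed_mbit: float) -> float:
    """Fraction of demanded monitoring traffic not routed through physical links."""
    if demanded_mbit <= 0:
        raise ValueError("demanded traffic must be positive")
    return 1.0 - consumed_mbit / demanded_mbit


gain = bandwidth_gain(50.0, 9.0)
print(f"{gain:.0%}")  # 82%
```

A gain of 0 means all demanded monitoring traffic crosses physical links, while a gain of 1 means every sink sits next to the device sampling for it.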

Figure 7.5: Network monitoring traffic demanded, consumed and saved when applying DNM solutions.

[Panels: (a) Aggregate amount of demanded network monitoring traffic; (b) Average amount of consumed physical network resources. Both plot network traffic (Mbit, log scale) versus sampling rate ($10^{-5}$ to $10^{-1}$), one line per number of deployed SFCs (5, 10, 20, 30, 40). (c) Average amount of network resources that are saved when using the DNM model: percentage gain in bandwidth consumption (0.0 to 1.0) versus number of deployed SFCs (5, 10, 20, 30, 40), one series per sampling rate ($10^{-5}$ to $10^{-1}$).]

Source: by author (2017).

Last, Figure 7.6 depicts the average time needed to find the optimal solution for different DNM problem instances. We vary the sampling rate and the number of simultaneously deployed SFCs. We can observe that, even for small instances, the time required to find a solution is non-negligible. For example, for DNM instances with up to 20 SFCs, the time spent by CPLEX is on the order of minutes (up to 1,000 s). As the number of SFCs increases (i.e., scenarios with 20 SFCs or more), the average time required to find the optimal solution reaches the order of hours (up to 5,200 s). Further, we also observe a perceptible difference in the required computing time when we vary the sampling rate: for higher sampling rates, the required time tends to be lower than for lower sampling rates. This is due to the different volume of monitoring traffic to be steered in each scenario, which directly impacts the hardness of finding the optimal solution to the DNM instance at hand.

Figure 7.6: Required time to compute the optimal solution for the DNM problem.

[Plot: time (seconds, log scale, up to 5,000) versus number of deployed SFCs (0 to 40); one line per sampling rate (0.00001, 0.0001, 0.001, 0.01, 0.1).]

Source: by author (2017).

8 FINAL CONSIDERATIONS

In this chapter, we present the conclusions and contributions of the work developed in the context of this thesis. We also briefly present the publications and other achievements of this thesis.

8.1 Conclusions

While Network Function Virtualization (NFV) is increasingly gaining momentum, with promising benefits of flexible service function deployment and reduced operations and management costs, there are several challenges that remain to be properly tackled so that it can realize its full potential. One of these challenges, which has a significant impact on the NFV production chain, is effectively and (cost) efficiently deploying service functions, while ensuring that service level agreements are satisfied and making wise allocations of network resources.
Amid the various other aspects involved, Virtual Network Function Placement & Chaining (VNFPC) is key to fulfilling this challenge. VNFPC poses, however, an important trade-off between quality, efficiency, and scalability that previously proposed solutions (MOENS; TURCK, 2014; LEWIN-EYTAN et al., 2015; LUIZELLI et al., 2015) have failed to satisfy simultaneously. This is no surprise, though, given that this problem is NP-complete, as we have demonstrated in Chapter 3.
In this thesis, we tackled the VNFPC problem in both Inter- and Intra-datacenter contexts. As a major contribution to the state of the art, we first formalized the Inter-datacenter VNFPC problem and proposed an optimization model to solve it (LUIZELLI et al., 2015). Our mathematical model established one of the first baselines for comparison in the field of resource management in NFV and has been widely used in the recent related literature. Further, we devised a novel Fix-and-Optimize-based approach for efficiently solving realistic, large instances of the VNFPC problem (LUIZELLI et al., 2017a). Our proposed approach combines mathematical programming and meta-heuristic algorithms to systematically explore the VNFPC solution space.
Then, we tackled the VNFPC problem in the context of Intra-datacenter deployments. First, we conducted an extensive set of experiments to better understand, empirically, the performance of network services deployed in a typical NFV environment (LUIZELLI et al., 2017b). From this set of experiments, we developed a generalized cost function that accurately captures and estimates the CPU cost of software switching for arbitrary NFV setups. Then, we designed the Operational Cost Minimization (OCM) algorithm for the efficient and cost-oriented Intra-datacenter placement of service chains. The OCM mechanism relies heavily on our previously defined CPU cost function to guide the search for optimal solutions with minimum operational cost.
Finally, after diving specifically into the provisioning of NFV services in both Inter- and Intra-datacenter contexts, we tackled the problem of monitoring the network traffic of service chains, another essential building block for the proper operation of NFV-based network infrastructures. For that, we formalized the Distributed Network Monitoring (DNM) problem and proposed a mathematical optimization model to effectively coordinate the monitoring of service chains in the context of NFV/SDN deployments.
In Chapter 1, we presented the following hypothesis based on the limitations of state-of-the-
art proposals in the context of NFV resource management:

Hypothesis: In order to take full advantage of the benefits provided by NFV, efficient and
scalable strategies should be used to orchestrate the deployment of Service Function Chaining.

Based on the work presented in this thesis, it is possible to identify evidence to answer the
research questions associated with the hypothesis that has been posed to guide this study. The
answer to each question is detailed as follows.

Research Question 1. When performing the deployment of NFV-based SFCs, what are the
gains regarding network metrics and resource consumption that can be attained in comparison
to traditional network service deployments (i.e., based on physical network functions)?
Answer. In this thesis, we formalized the Inter-datacenter VNFPC problem taking into account the main constraints involved in the process of deploying SFC requests on top of NFV-based environments. Our proposed optimization model for this problem minimizes the number of VNFs instantiated in the infrastructure while ensuring quality-related metrics such as end-to-end delay. We evaluated our approach to the VNFPC problem considering realistic workloads and different use cases. Results show that solutions for the problem lead to a reduction of up to 25% in end-to-end delays and an acceptable resource over-provisioning limited to 4% in comparison to traditional middlebox-based deployments.

Research Question 2. Since the VNFPC problem is NP-complete, how to provide efficient and near-optimal solutions to VNFPC instances on large-scale NFV infrastructures?
Answer. We proposed a novel fix-and-optimize-based heuristic that is able to timely solve large instances of the VNFPC problem, in the order of a couple of hours in the worst case. The results not only evidenced the potential of fix-and-optimize and Variable Neighborhood Search as building blocks for systematically exploring the VNFPC solution space but, more importantly, showed the significant improvement of our approach over the state of the art in terms of quality (500% better solutions on average), efficiency (in the order of a couple of hours in the worst case), and scalability (to the order of thousands of NFV nodes).

Research Question 3. On deploying SFCs onto physical servers, there is a non-negligible CPU cost associated with the service operation. What are these costs and how can they be properly estimated in an NFV environment?

Answer. In NFV-based environments (and any virtualization-intensive one), virtual switching is an essential functionality that provides isolation, scalability, and mainly flexibility to the environment. These functionalities provided by virtual switching introduce (in general) a non-negligible operational cost, which makes it much harder to guarantee a reasonable level of network performance to network services, considered a key requirement for the successful operation of NFV. To fully grasp the costs and limitations of typical NFV deployments, we conducted an extensive experimental evaluation measuring the performance and analyzing the impact of virtual switching (i.e., Open vSwitch). Our results indicate that the operational cost of deploying service chains depends on the installed Open vSwitch (either kernel OvS or DPDK-OvS), the required amount of traffic to process, the length of the service chain, and the placement strategy. Based on this evaluation, we developed a generalized cost function that accurately captures the CPU cost of software switching in this setting. Our cost function is based on logarithmic regression and is able to estimate the operational cost for arbitrary input parameters. By fully understanding incurred costs and potential limitations in NFV environments, we were able to design tailored solutions for the management and orchestration of VNFs.

Research Question 4. How can operational costs be minimized in SFC deployments? Is it possible to efficiently guide NFV orchestrators in deploying VNFs Intra- and Inter-server?

Answer. Understanding the costs and potential limitations of software switching in NFV environments is a key ingredient in the ability to design efficient solutions for VNF management and orchestration, possibly leading to lower operational costs. We proposed the OCM algorithm, which relies on the previously defined CPU cost function. The OCM design relies on the well-known minimum cost matching problem in order to efficiently (i.e., in polynomial time) place a service chain onto a set of physical servers with minimum cost to the NFV provider. The obtained results show that the proposed algorithm can reduce the operational costs (i.e., the additional resource consumption of the hypervisor) significantly (up to a factor of 4 in the extreme case and 20% to 40% in typical cases) when compared to traditional deployment policies implemented in OpenStack. Moreover, OCM makes better use of the available resources, which contributes to a higher acceptance ratio of new network services over time.

Research Question 5. The proper operation of network services requires constantly monitoring them. Is it possible to take advantage of NFV/SDN technologies to efficiently deploy virtualized monitoring services?
Answer. We formalized the Distributed Network Monitoring (DNM) problem in order to monitor the network traffic of service chains. DNM is able to effectively coordinate the monitoring of service chains in the context of NFV/SDN deployments. By coordinating the placement of monitoring sinks (also referred to as monitoring functions) and the efficient forwarding of network traffic to them, DNM can outperform traditional monitoring mechanisms. Results for our proposed optimization model indicate a reduction of up to 80% in the network monitoring traffic that is routed through the network infrastructure, while placing a reasonable number of monitoring sinks.

8.2 Future Research Directions

As NFV is a recent network paradigm that is still maturing, there are many open research problems that were out of the scope of this thesis. Throughout this thesis, we have focused on the VNFPC problem in the context of Inter- and Intra-datacenter deployments. In this realm, we consider extending this thesis in the following manner.
First, we plan to evaluate the impact of complex SFC requests (e.g., non-linear ones) on deploying them in both Inter- and Intra-datacenter scenarios. This is particularly important since VNFs can be decomposed into independent subcomponents, potentially leading to even more complex requests. Second, we plan to study other operational costs that were not covered in this thesis, such as energy consumption, and their correlation with current performance limitations. The interplay between different operational costs can open up opportunities for NFV providers to design even more accurate pricing models. Third, we intend to integrate our devised models/algorithms into the OpenStack scheduler and measure their performance in a real NFV-based deployment. For that purpose, many existing technological limitations should be tackled in advance. For instance, the current limitation of OpenStack regarding its computing (responsible for managing VNFs) and networking (responsible for managing the routing) modules is well known: in order to instantiate an SFC in an NFV environment, these two modules should be further integrated. Fourth, we plan to extend our proposed solutions in order to cope with online scheduling of SFC requests. The scheduling of SFCs is another cornerstone to the effective success of NFV. We envision that, in the near future, NFV orchestrators will be able to instantiate a network service within a few milliseconds, which would allow, for instance, a network service to be provisioned per customer or even (more challenging) per network flow. Fifth, we intend to extend our approach to deal with constantly evolving network conditions. In such cases, assigned composite services need to be reorganized (reassigned) in response to fluctuations in traffic demand (for example), so that service level agreements are not violated. Lastly, we plan to extend our evaluation in order to fully understand the impact of different parameters in our models, as well as to design efficient online strategies to cope with highly dynamic scenarios.

8.3 Achievements

The development of this thesis has led to the publication of the following peer-reviewed conference and journal papers:

• Piecing Together the NFV Provisioning Puzzle: Efficient Placement and Chaining of
Virtual Network Functions. (Best Student Paper Award)
Marcelo Caggiani Luizelli, Leonardo Richter Bays, Marinho Pilla Barcellos, Luciana Salete Buriol, and Luciano Paschoal Gaspary.
IFIP/IEEE International Symposium on Integrated Network Management 2015 (IM 2015).
• A Fix-and-Optimize Approach for Efficient and Large Scale Virtual Network Function Placement and Chaining.
Marcelo Caggiani Luizelli, Weverton Luis da Costa Cordeiro, Luciana Salete Buriol, and Luciano Paschoal Gaspary.
Elsevier Computer Communications, 2016.
• The Actual Cost of Software Switching for NFV Chaining
Marcelo Caggiani Luizelli, Danny Raz, Yaniv Saar, and Jose Yallouz.
IFIP/IEEE International Symposium on Integrated Network Management 2017 (IM 2017).

In addition to the aforementioned main outcomes of this thesis, we also authored or co-authored other studies on related optimization problems in the context of NFV, SDN, and Network Virtualization. These publications are listed next.

• Constant Time Updates in Hierarchical Heavy Hitters
Ran Ben Basat, Gil Einziger, Roy Friedman, Marcelo Caggiani Luizelli, and Erez Waisbard.
ACM SIGCOMM, 2017.
• Constant Time Weighted Frequency Estimation for Virtual Network Functionalities
Gil Einziger, Marcelo Caggiani Luizelli, and Erez Waisbard.
IEEE International Conference on Computer Communications and Networks (ICCCN),
2017.
• NFV-PEAR: Posicionamento e Encadeamento Adaptativo de Funções Virtuais de Rede
Gustavo Miotto, Marcelo Caggiani Luizelli, Weverton Luis da Costa Cordeiro, and Luciano Paschoal Gaspary.
Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC), 2017.
• How physical network topologies affect virtual network embedding quality: A characterization study based on ISP and datacenter networks
Marcelo Caggiani Luizelli, Leonardo Richter Bays, Marinho Pilla Barcellos, Luciana Salete Buriol, and Luciano Paschoal Gaspary.
Journal of Network and Computer Applications, 2016.
• QoS Aware Schedulers for Multi-users on OFDMA Downlink: Optimal and Heuristic.
Matheus Cadori Nogueira, Marcelo Caggiani Luizelli, Samuel Marini, Cristiano Both, Juergen Rochol, Armando Ordonez, and Oscar Caicedo.
IEEE Latin-American Conference on Communications, 2016.
• HIPER: Heuristic-based Infrastructure Expansion through Partition Reconnection
for Efficient Virtual Network Embedding.
Marcelo Caggiani Luizelli, Leonardo Richter Bays, Marinho Pilla Barcellos, and Luciano Paschoal Gaspary.
IFIP/IEEE/ACM International Conference on Network and Service Management (CNSM),
2014.
• Survivor: an Enhanced Controller Placement Strategy for Improving SDN Survivability
Lucas Fernando Muller, Rodrigo Ruas de Oliveira, Marcelo Caggiani Luizelli, Luciano Paschoal Gaspary, and Marinho Pilla Barcellos.
IEEE Global Communications Conference (GLOBECOM), 2014.
• Reconectando Partições de Infraestruturas Físicas: Rumo a uma Estratégia de Expansão para o Mapeamento Eficiente de Redes Virtuais
Marcelo Caggiani Luizelli, Leonardo Richter Bays, Marinho Pilla Barcellos, Luciana Salete Buriol, and Luciano Paschoal Gaspary.
Simpósio Brasileiro de Redes de Computadores e Sistemas Distribuídos (SBRC), 2014.

REFERENCES

AHUJA, R. K.; MAGNANTI, T. L.; ORLIN, J. B. Network flows - theory, algorithms and
applications. New Jersey, USA: Prentice Hall, 1993. ISBN 978-0-13-617549-0.

ALBERT, R.; BARABÁSI, A.-L. Topology of evolving networks: Local events and
universality. Physical Review Letters, American Physical Society, v. 85, p. 5234 – 5237, Dec
2000.

BARI, M. F.; CHOWDHURY, S. R.; AHMED, R.; BOUTABA, R. On orchestrating virtual
network functions. In: INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE
MANAGEMENT. Proceedings... Washington, DC, USA: IEEE Computer Society, 2015.
(CNSM ’15), p. 50–56.

BARKAI, S.; KATZ, R.; FARINACCI, D.; MEYER, D. Software defined flow-mapping for
scaling virtualized network functions. In: ACM SIGCOMM WORKSHOP ON HOT TOPICS
IN SOFTWARE DEFINED NETWORKING. Proceedings... New York, NY, USA: ACM,
2013. (ACM SIGCOMM HotSDN’13), p. 149–150.

BASTA, A.; KELLERER, W.; HOFFMANN, M.; MORPER, H. J.; HOFFMANN, K. Applying
nfv and sdn to lte mobile core gateways, the functions placement problem. In: WORKSHOP
ON ALL THINGS CELLULAR: OPERATIONS, APPLICATIONS AND CHALLENGES.
Proceedings... New York, NY, USA: ACM, 2014. (AllThingsCellular ’14), p. 33–38.

BEN-BASAT, R.; EINZIGER, G.; FRIEDMAN, R.; KASSNER, Y. Heavy hitters in streams
and sliding windows. In: IEEE INFOCOM. Proceedings... Piscataway, NJ, USA: IEEE Press,
2016. (INFOCOM ’16), p. 1–9.

BEN-BASAT, R.; EINZIGER, G.; FRIEDMAN, R.; LUIZELLI, M. C.; WAISBARD, E.
Constant time updates in hierarchical heavy hitters. In: ACM SIGCOMM CONFERENCE ON
DATA COMMUNICATION. Proceedings... New York, NY, USA: ACM, 2017. (SIGCOMM
’17), p. 1–14.

BENSON, T.; AKELLA, A.; SHAIKH, A. Demystifying configuration challenges and
trade-offs in network-based isp services. In: ACM SIGCOMM CONFERENCE ON DATA
COMMUNICATION. Proceedings... New York, NY, USA: ACM, 2011. (SIGCOMM ’11), p.
302–313.

BEREND, D.; TASSA, T. Improved bounds on bell numbers and on moments of sums of
random variables. Probability and Mathematical Statistics, v. 30, n. 2, p. 185–205, 2010.

BERNAL, M. V.; CERRATO, I.; RISSO, F.; VERBEIREN, D. Transparent optimization
of inter-virtual network function communication in open vswitch. In: INTERNATIONAL
CONFERENCE ON CLOUD NETWORKING. Proceedings... Piscataway, NJ, USA: IEEE
Press, 2016. (Cloudnet ’16), p. 76–82.

BHAMARE, D.; JAIN, R.; SAMAKA, M.; ERBAD, A. A survey on service function chaining.
Journal of Network and Computer Applications, v. 75, p. 138–155, 2016.

BONELLI, N.; PIETRO, A. D.; GIORDANO, S.; PROCISSI, G. On multi gigabit packet
capturing with multi core commodity hardware. In: INTERNATIONAL CONFERENCE
ON PASSIVE AND ACTIVE MEASUREMENT. Proceedings... Berlin, Heidelberg:
Springer-Verlag, 2012. (PAM’12), p. 64–73.

BOSSHART, P.; DALY, D.; GIBB, G.; IZZARD, M.; MCKEOWN, N.; REXFORD,
J.; SCHLESINGER, C.; TALAYCO, D.; VAHDAT, A.; VARGHESE, G.; WALKER,
D. P4: Programming protocol-independent packet processors. SIGCOMM Computer
Communication Review, ACM, New York, NY, USA, v. 44, n. 3, p. 87–95, jul 2014.

BOUET, M.; LEGUAY, J.; CONAN, V. Cost-based placement of vdpi functions in nfv
infrastructures. In: CONFERENCE ON NETWORK SOFTWARIZATION. Proceedings...
Piscataway, NJ, USA: IEEE Press, 2015. (NetSoft’15), p. 1–9.

CHOI, B.-Y.; MOON, S.; ZHANG, Z.-L.; PAPAGIANNAKI, K.; DIOT, C. Analysis of
point-to-point packet delay in an operational network. Computer Networks, Elsevier
North-Holland, Inc., New York, NY, USA, v. 51, p. 3812–3827, 2007.

CISCO. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update,
2015-2020. 2016. Available at: http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/mobile-white-paper-c11-520862.html. Accessed on: Jan.
30, 2016.

CLAISE, B. Cisco Systems NetFlow Services Export Version 9. 2004. Available at:
https://www.ietf.org/rfc/rfc3954.txt. Accessed on: Jun. 1, 2016.

CLAYMAN, S.; MAINI, E.; GALIS, A.; MANZALINI, A.; MAZZOCCA, N. The dynamic
placement of virtual network functions. In: IEEE NETWORK OPERATIONS AND
MANAGEMENT SYMPOSIUM. Proceedings... Piscataway, NJ, USA: IEEE Press, 2014.
(NOMS’14), p. 1–9.

CORBET, J.; RUBINI, A.; KROAH-HARTMAN, G. Linux device drivers – where the
Kernel meets the hardware. Sebastopol, USA: O’Reilly, 2005.

CORPORATION, M. Sockperf Traffic Generator. 2016. Available at:
https://github.com/Mellanox/sockperf/. Accessed on: Oct. 20, 2015.

DOBRESCU, M.; ARGYRAKI, K.; RATNASAMY, S. Toward predictable performance in
software packet-processing platforms. In: USENIX CONFERENCE ON NETWORKED
SYSTEMS DESIGN AND IMPLEMENTATION. Proceedings... San Jose, CA: USENIX,
2012. (NSDI’12), p. 141–154.

EMMERICH, P.; GALLENMÜLLER, S.; RAUMER, D.; WOHLFART, F.; CARLE, G.
Moongen: A scriptable high-speed packet generator. In: ACM INTERNET MEASUREMENT
CONFERENCE. Proceedings... New York, NY, USA: ACM, 2015. (IMC ’15), p. 275–287.

GAREY, M. R.; JOHNSON, D. S. Computers and Intractability: A Guide to the Theory of
NP-Completeness. New York, NY, USA: W. H. Freeman, 1979.

GEMBER-JACOBSON, A.; VISWANATHAN, R.; PRAKASH, C.; GRANDL, R.; KHALID,
J.; DAS, S.; AKELLA, A. Opennf: Enabling innovation in network function control. In: ACM
CONFERENCE ON COMPUTER COMMUNICATIONS. Proceedings... New York, NY,
USA: ACM, 2014. (ACM SIGCOMM’14), p. 163–174.

GHAZNAVI, M.; KHAN, A.; SHAHRIAR, N.; ALSUBHI, K.; AHMED, R.; BOUTABA,
R. Elastic virtual network function placement. In: IEEE INTERNATIONAL CONFERENCE
ON CLOUD NETWORKING. Proceedings... Piscataway, NJ, USA: IEEE Press, 2015.
(CloudNet’15), p. 255–260.

GROUP, N. F. I. S. Network Function Virtualisation (NFV): An Introduction,
Benefits, Enablers, Challenges and Call for Action. 2012. 1–16 p. Available at:
https://portal.etsi.org/nfv/. Accessed on: Jan. 20, 2015.

HALPERN, J.; PIGNATARO, C. Service Function Chaining (SFC) Architecture. 2015.
Available at: https://datatracker.ietf.org/doc/draft-ietf-sfc-architecture/. Accessed on: Dec. 1,
2015.

HAN, B.; GOPALAKRISHNAN, V.; JI, L.; LEE, S. Network function virtualization:
Challenges and opportunities for innovations. IEEE Communications Magazine, v. 53, n. 2,
p. 90–97, Feb 2015.

HANSEN, P.; MLADENOVIĆ, N. Variable neighborhood search: Principles and applications.
European Journal of Operational Research, v. 130, n. 3, p. 449–467, 2001.

HELBER, S.; SAHLING, F. A fix-and-optimize approach for the multi-level capacitated lot
sizing problem. International Journal of Production Economics, v. 123, n. 2, p. 247–256,
2010.

HERRERA, J. G.; BOTERO, J. F. Resource allocation in nfv: A comprehensive survey. IEEE
Transactions on Network and Service Management, v. 13, n. 3, p. 518–532, Sept 2016.

HWANG, J.; RAMAKRISHNAN, K. K.; WOOD, T. Netvm: High performance and flexible
networking using virtualization on commodity platforms. In: USENIX CONFERENCE ON
NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION. Proceedings... Oakland,
CA: USENIX Association, 2014. (NSDI’14), p. 34–47.

HWANG, J.; RAMAKRISHNAN, K. K.; WOOD, T. Netvm: High performance and flexible
networking using virtualization on commodity platforms. IEEE Transactions on Network
and Service Management, v. 12, n. 1, p. 34–47, March 2015.

INTEL. Intel DPDK API. 2016a. Available at: http://dpdk.org/. Accessed on: May 19, 2016.

INTEL. Intel Open Network Platform Release 2.1: Performance Test Report. 2016b.
Available at: https://01.org/packet-processing/intel/textregistered-onp. Accessed on: Jun. 15,
2016.

KELLEHER, J.; O’SULLIVAN, B. Generating All Partitions: A Comparison Of Two
Encodings. 2009. Available at: https://arxiv.org/abs/0909.2331. Accessed on: Jul. 15, 2016.

KUO, T.; LIOU, B.; LIN, J.; TSAI, M. Deploying chains of virtual network functions: On the
relation between link and server usage. In: IEEE INTERNATIONAL CONFERENCE ON
COMPUTER COMMUNICATIONS. Proceedings... San Francisco, USA: IEEE Press, 2016.
(INFOCOM’16), p. 1–9.

DERI, L. Direct NIC Access. 2016. Available at: http://www.ntop.org/products/packet-capture/pf_ring. Accessed on: Jun. 1, 2016.

LEPERS, B.; QUÉMA, V.; FEDOROVA, A. Thread and memory placement on numa systems:
Asymmetry matters. In: USENIX CONFERENCE ON USENIX ANNUAL TECHNICAL
CONFERENCE. Proceedings... Berkeley, CA, USA: USENIX Association, 2015. (USENIX
ATC’15), p. 277–289.

LEWIN-EYTAN, L.; NAOR, J.; COHEN, R.; RAZ, D. Near optimal placement of virtual
network functions. In: IEEE INFOCOM. Proceedings... New York, NY, USA: IEEE, 2015.
(INFOCOM’15), p. 1346–1354.

LUIZELLI, M. C.; BAYS, L. R.; BURIOL, L. S.; BARCELLOS, M. P.; GASPARY, L. P.
Piecing together the nfv provisioning puzzle: Efficient placement and chaining of virtual
network functions. In: IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED
NETWORK MANAGEMENT. Proceedings... New York, NY, USA: IEEE, 2015. (IM’15), p.
98–106.

LUIZELLI, M. C.; BAYS, L. R.; BURIOL, L. S.; BARCELLOS, M. P.; GASPARY, L. P. How
physical network topologies affect virtual network embedding quality: A characterization study
based on ISP and datacenter networks. Journal of Network and Computer Applications,
v. 70, p. 1 – 16, 2016.

LUIZELLI, M. C.; CORDEIRO, W. L. da C.; BURIOL, L. S.; GASPARY, L. P. A
fix-and-optimize approach for efficient and large scale virtual network function placement and
chaining. Computer Communications, v. 102, p. 67 – 77, 2017a.

LUIZELLI, M. C.; RAZ, D.; SAAR, Y.; YALLOUZ, J. The actual cost of software switching
for nfv chaining. In: IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED
NETWORK MANAGEMENT. Proceedings... New York, NY, USA: IEEE, 2017b. (IM’17), p.
335–343.

LUKOVSZKI, T.; ROST, M.; SCHMID, S. It’s a match!: Near-optimal and incremental
middlebox deployment. SIGCOMM Computer Communication Review, v. 46, n. 1, p.
30–36, jan. 2016.

LUKOVSZKI, T.; SCHMID, S. Online admission control and embedding of service chains.
In: INTERNATIONAL COLLOQUIUM ON STRUCTURAL INFORMATION AND
COMMUNICATION COMPLEXITY. Proceedings... New York, USA: Springer-Verlag, 2015.
(SIROCCO’15).

MARTINS, J.; AHMED, M.; RAICIU, C.; OLTEANU, V.; HONDA, M.; BIFULCO, R.;
HUICI, F. Clickos and the art of network function virtualization. In: USENIX CONFERENCE
ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION. Proceedings...
Berkeley, CA, USA: USENIX Association, 2014. (NSDI’14).

MCKEOWN, N.; ANDERSON, T.; BALAKRISHNAN, H.; PARULKAR, G.; PETERSON,
L.; REXFORD, J.; SHENKER, S.; TURNER, J. Openflow: Enabling innovation in campus
networks. SIGCOMM Computer Communication Review, ACM, New York, NY, USA,
v. 38, n. 2, p. 69–74, mar. 2008.

MECHTRI, M.; GHRIBI, C.; SOUALAH, O.; ZEGHLACHE, D. Nfv orchestration framework
addressing sfc challenges. IEEE Communications Magazine, v. 55, n. 6, p. 16–23, Jun. 2017.

MEHRAGHDAM, S.; KELLER, M.; KARL, H. Specifying and placing chains of
virtual network functions. In: IEEE INTERNATIONAL CONFERENCE ON CLOUD
NETWORKING. Proceedings... Piscataway, NJ, USA: IEEE Press, 2014. (CloudNet’14), p.
7–13.

MERCA, M. Fast algorithm for generating ascending compositions. Journal of Mathematical
Modelling and Algorithms, Springer-Verlag, Berlin, Heidelberg, v. 11, n. 1, p. 89–104, 2012.

MIJUMBI, R.; SERRAT, J.; GORRICHO, J. L.; BOUTEN, N.; TURCK, F. D.; BOUTABA,
R. Network function virtualization: State-of-the-art and research challenges. IEEE
Communications Surveys Tutorials, v. 18, n. 1, p. 236–262, Jun 2016.

MOENS, H.; TURCK, F. D. Vnf-p: A model for efficient placement of virtualized network
functions. In: INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE
MANAGEMENT. Proceedings... Piscataway, NJ, USA: IEEE Press, 2014. (CNSM’14
MiniConf), p. 418–423.

ONF. OpenStack. 2015. Available at: http://www.openstack.org. Accessed on: Jul. 30, 2016.

ONF. Open vSwitch. 2016. Available at: http://www.openvswitch.org. Accessed on: Jul. 30,
2016.

PFAFF, B.; PETTIT, J.; KOPONEN, T.; AMIDON, K.; CASADO, M.; SHENKER, S.
Extending networking into the virtualization layer. In: ACM WORKSHOP ON HOT TOPICS
IN NETWORKS. Proceedings... New York, NY, USA: ACM, 2009. (HotNets’09), p. 1–6.

PFAFF, B.; PETTIT, J.; KOPONEN, T.; JACKSON, E.; ZHOU, A.; RAJAHALME, J.;
GROSS, J.; WANG, A.; STRINGER, J.; SHELAR, P.; AMIDON, K.; CASADO, M. The
design and implementation of open vswitch. In: USENIX SYMPOSIUM ON NETWORKED
SYSTEMS DESIGN AND IMPLEMENTATION. Proceedings... Oakland, CA: USENIX
Association, 2015. (NSDI’15), p. 117–130.

PHAAL, P.; PANCHEN, S.; MCKEE, N. InMon Corporation’s sFlow: A Method
for Monitoring Traffic in Switched and Routed Networks. 2001. Available at:
https://www.ietf.org/rfc/rfc3176.txt. Accessed on: Jan. 1, 2016.

QUINN, P.; ELZUR, U. Network Service Header. 2016. Available at:
https://datatracker.ietf.org/doc/draft-ietf-sfc-nsh/. Accessed on: Oct. 1, 2016.

RANKOTHGE, W.; MA, J.; LE, F.; RUSSO, A.; LOBO, J. Towards making network function
virtualization a cloud computing service. In: IFIP/IEEE INTERNATIONAL SYMPOSIUM
ON INTEGRATED NETWORK MANAGEMENT. Proceedings... Piscataway, NJ, USA:
IEEE Press, 2015. (IM’15), p. 89–97.

RIZZO, L. Netmap: A novel framework for fast packet i/o. In: USENIX CONFERENCE
ON ANNUAL TECHNICAL CONFERENCE. Proceedings... Berkeley, CA, USA: USENIX
Association, 2012. (ATC’12), p. 9–21.

RIZZO, L.; LETTIERI, G. Vale, a switched ethernet for virtual machines. In: INTERNATIONAL
CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND
TECHNOLOGIES. Proceedings... New York, NY, USA: ACM, 2012. (CoNEXT’12), p.
61–72.

ROST, M.; SCHMID, S. Service Chain and Virtual Network Embeddings: Approximations
using Randomized Rounding. 2016. Available at: https://arxiv.org/abs/1604.02180. Accessed
on: Apr. 15, 2016.

SAHLING, F.; BUSCHKÜHL, L.; TEMPELMEIER, H.; HELBER, S. Solving a multi-level
capacitated lot sizing problem with multi-period setup carry-over via a fix-and-optimize
heuristic. Computers & Operations Research, v. 36, n. 9, p. 2546 – 2553, 2009.

SEKAR, V.; EGI, N.; RATNASAMY, S.; REITER, M. K.; SHI, G. Design and implementation
of a consolidated middlebox architecture. In: USENIX CONFERENCE ON NETWORKED
SYSTEMS DESIGN AND IMPLEMENTATION. Proceedings... Berkeley, CA, USA:
USENIX Association, 2012. (NSDI’12), p. 24.

SHAHBAZ, M.; CHOI, S.; PFAFF, B.; KIM, C.; FEAMSTER, N.; MCKEOWN, N.;
REXFORD, J. Pisces: A programmable, protocol-independent software switch. In: ACM
SIGCOMM CONFERENCE. Proceedings... New York, NY, USA: ACM, 2016. (SIGCOMM
’16), p. 525–538.

YU, M. Programmable measurement architecture. In: ACM/IEEE SYMPOSIUM ON
ARCHITECTURES FOR NETWORKING AND COMMUNICATIONS SYSTEMS.
Proceedings... [S.l.], 2014. (ANCS’14).

YU, M.; JOSE, L.; MIAO, R. Software defined traffic measurement with opensketch. In:
USENIX CONFERENCE ON NETWORKED SYSTEMS DESIGN AND IMPLEMENTATION.
Proceedings... Berkeley, CA, USA: USENIX Association, 2013. (NSDI’13), p.
29–42.

YU, M.; YI, Y.; REXFORD, J.; CHIANG, M. Rethinking virtual network embedding: Substrate
support for path splitting and migration. ACM SIGCOMM Computer Communication
Review (CCR), ACM, New York, NY, USA, v. 38, n. 2, p. 17–29, mar. 2008. ISSN 0146-4833.

ZHANG, W.; HWANG, J.; RAJAGOPALAN, S.; RAMAKRISHNAN, K.; WOOD, T.
Flurries: Countless fine-grained nfs for flexible per-flow customization. In: INTERNATIONAL
CONFERENCE ON EMERGING NETWORKING EXPERIMENTS AND
TECHNOLOGIES. Proceedings... New York, NY, USA: ACM, 2016. (CoNEXT’16), p. 3–17.

ZHOU, D.; FAN, B.; LIM, H.; KAMINSKY, M.; ANDERSEN, D. G. Scalable, high
performance ethernet forwarding with cuckooswitch. In: Proceedings... New York, NY, USA:
ACM, 2013. (CoNEXT’13), p. 97–108.

ZHU, Y.; AMMAR, M. Algorithms for assigning substrate network resources to
virtual network components. In: INTERNATIONAL CONFERENCE ON COMPUTER
COMMUNICATIONS. Proceedings... Piscataway, NJ, USA: IEEE Press, 2006.
(INFOCOM’06), p. 1–12.

ZHU, Y.; KANG, N.; CAO, J.; GREENBERG, A.; LU, G.; MAHAJAN, R.; MALTZ, D.;
YUAN, L.; ZHANG, M.; ZHAO, B. Y.; ZHENG, H. Packet-level telemetry in large datacenter
networks. In: Proceedings... New York, NY, USA: ACM, 2015. (SIGCOMM ’15), p. 479–491.

APPENDIX A EXTENDED ABSTRACT OF THE THESIS

Middleboxes (or network functions, NFs) play an essential role in computer networks, since they provide a diverse set of functionalities ranging from security (e.g., firewalls and intrusion detection) to performance (e.g., caching and proxying) (MARTINS et al., 2014). As currently implemented, middleboxes are difficult to deploy and operate. This is mainly due to the procedures that must be followed, such as dealing with a wide variety of custom hardware interfaces and manually chaining NFs to ensure the desired behavior of a network service. Moreover, studies show that the number of middleboxes in enterprise networks (as well as in datacenter and ISP networks) is similar to the number of forwarding devices (BENSON; AKELLA; SHAIKH, 2011; SEKAR et al., 2012). Thus, the aforementioned difficulties are exacerbated by the complexity imposed by the large number of network functions a network provider has to manage, which directly leads to higher operational costs. Beyond the costs of deploying and chaining NFs, the frequent need for hardware upgrades substantially increases the investment required to operate network services.

Network Function Virtualization (NFV) is a new computer networking paradigm proposed to migrate NF processing from specialized hardware devices to software running on commodity hardware (GROUP, 2012). Besides potentially reducing equipment acquisition and maintenance costs, NFV is expected to allow network providers to leverage the benefits of virtualization in managing network functions (e.g., elasticity, performance, and flexibility). In this context, Software-Defined Networking (SDN) can be considered a convenient complementary technology which, if available, has the potential to ease the chaining of these network functions, that is, the Service Function Chaining (SFC) problem.

In short, the NF chaining problem consists of ensuring that network flows are efficiently steered through paths that traverse a given set of middleboxes (MECHTRI et al., 2017). In NFV/SDN networks, and considering the flexibility these environments offer, the problem consists of optimally deciding how many Virtual Network Function (VNF) instances are needed and where to place them in the infrastructure. In addition, the problem consists of determining the end-to-end paths along which network flows must be forwarded so that they traverse the required network functions.
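The chaining requirement above (a flow must traverse the requested functions in order on its way from ingress to egress) can be made concrete with a small check. This is a minimal sketch under stated assumptions, not the thesis's formulation: the `SFCRequest` class, the `hosted` mapping from nodes to placed VNF types, and the node names are all hypothetical, and only linear chains are considered.

```python
from dataclasses import dataclass


@dataclass
class SFCRequest:
    """Hypothetical linear SFC request: ingress/egress endpoints plus an ordered chain."""
    src: str
    dst: str
    chain: list[str]


def path_satisfies_chain(path: list[str], hosted: dict[str, set[str]], request: SFCRequest) -> bool:
    """Check that `path` connects the request's endpoints and visits the chain in order.

    `hosted` maps each node to the set of VNF types placed on it. One node may
    serve several consecutive functions of the chain (e.g., when VNFs share an N-PoP).
    """
    if not path or path[0] != request.src or path[-1] != request.dst:
        return False
    next_fn = 0  # index of the next chain function still to be traversed
    for node in path:
        available = hosted.get(node, set())
        # Greedily consume every consecutive chain function hosted at this node.
        while next_fn < len(request.chain) and request.chain[next_fn] in available:
            next_fn += 1
    return next_fn == len(request.chain)


req = SFCRequest(src="A", dst="D", chain=["vnf1", "vnf3"])
hosted = {"n2": {"vnf1"}, "n5": {"vnf3"}}
print(path_satisfies_chain(["A", "n2", "n5", "D"], hosted, req))  # True
print(path_satisfies_chain(["A", "n5", "n2", "D"], hosted, req))  # False: wrong order
```

Consuming functions greedily is safe here: matching a chain function as early as possible along the path never prevents the remaining functions from being matched later.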

A.1 Problem Definition

In recent years, there have been significant advances in NFV, mainly addressing aspects related to planning and deployment (LEWIN-EYTAN et al., 2015; KUO et al., 2016; LUIZELLI et al., 2017a) and to the operation and management of such environments (HWANG; RAMAKRISHNAN; WOOD, 2014; GEMBER-JACOBSON et al., 2014; ZHANG et al., 2016). However, NFV is a relatively new paradigm that is still maturing, with several open research questions. As mentioned, one of the most challenging problems is placing and chaining VNFs efficiently.
This problem is particularly challenging for several reasons. First, NFV is an inherently distributed network paradigm based on nodes with computing power (e.g., small clouds) spread across the infrastructure. Therefore, depending on how VNFs are placed and chained in the infrastructure, the end-to-end delays observed by a network service may become intolerable. This is aggravated by the fact that processing times tend to be higher due to virtualization and may vary depending on the type of network function and the hardware configuration of the device hosting it. Second, even when VNFs are deployed within a single datacenter, network services may often suffer degradation in network metrics (e.g., throughput, latency, and jitter), depending on how network functions are chained and deployed on physical devices. Third, resource allocation must be cost-efficient, avoiding waste of physical resources. Consequently, network function placement and flow chaining are an essential step toward enabling NFV in production environments. The following paragraphs give an overview of the Virtual Network Function Placement and Chaining (VNFPC) problems addressed in this thesis.
Virtual Network Function Placement and Chaining in the Inter-datacenter Context. Figure A.1 illustrates the problem of placing and chaining virtualized network functions in the inter-datacenter context. The problem involves deploying two SFC requests. For the first request, incoming flows must be sent to an instance of virtual network function (VNF) 1 (e.g., a firewall) and then to VNF 2 (e.g., a load balancer). The second request specifies that incoming flows must also be forwarded to VNF 1 and then through VNF 3 (e.g., a proxy). Both SFCs are illustrated in Figures A.1(a) and A.1(b), respectively.
The VNF instances required by each service must be placed at network points of presence (N-PoPs). N-PoPs are infrastructures (e.g., servers, clusters, or even datacenters) spread across the network on which network functions can be provisioned. Without loss of generality, we assume that there is one N-PoP associated with each forwarding device in a backbone infrastructure (represented by the circles in Figure A.1(c)). Each N-PoP has a certain amount of resources (e.g., available computing power). Likewise, each SFC has its own requirements. For example, the functions composing an SFC must sustain a given workload and are therefore associated with computing requirements. In addition, the traffic exchanged between functions has an expected peak throughput, which must be supported by the physical path connecting the N-PoPs that host these functions.

Figure A.1: Example of SFCs and a partial view of the network infrastructure. (a) SFC with a bifurcated flow; (b) SFC with a single flow; (c) network infrastructure and provisioning of the SFCs (legend: endpoint; forwarding device/N-PoP).
Source: the author (2016).
Given the above context, the first problem addressed in this thesis consists of finding a suitable placement of VNFs in distributed N-PoPs and chaining them so that the utilization of infrastructure resources is minimized. The placement and chaining must satisfy each SFC's requirements as well as the infrastructure constraints. A possible deployment is illustrated in Figure A.1. The endpoints, represented as filled circles, denote flows originating from or destined to devices/networks connected to the infrastructure's forwarding devices. Note that both SFCs share the same instance of VNF 1, placed at N-PoP (2), minimizing resource allocation as desired. Although relatively simple for small instances, the VNF placement and chaining problem is NP-complete (LUIZELLI et al., 2017a), as discussed in Chapter 3.
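To illustrate why sharing an instance (as with VNF 1 at N-PoP (2)) reduces allocation, and why exhaustive search only works for toy inputs, the sketch below enumerates every assignment of VNF demands to N-PoPs. It is a hypothetical simplification, not the thesis's optimization model: it ignores routing and link capacities, assumes that demands of the same VNF type on the same N-PoP share one instance whose CPU load is the sum of those demands, and uses made-up N-PoP names and CPU values.

```python
from itertools import product


def min_instances(demands, capacity):
    """Exhaustive search for a placement that minimizes the number of VNF instances.

    demands:  list of (sfc_id, vnf_type, cpu) tuples, one per function of each SFC.
    capacity: dict mapping N-PoP -> available CPU.
    Demands of the same VNF type assigned to the same N-PoP share one instance,
    whose CPU load is the sum of the shared demands (a simplifying assumption).
    Returns (instances, assignment), or None if no feasible placement exists.
    """
    npops = sorted(capacity)
    best = None
    # |N|^k candidate assignments for k demands: only viable for toy inputs.
    for choice in product(npops, repeat=len(demands)):
        load = dict.fromkeys(npops, 0)
        for (_, _, cpu), npop in zip(demands, choice):
            load[npop] += cpu
        if any(load[n] > capacity[n] for n in npops):
            continue  # violates some N-PoP's CPU capacity
        instances = {(vnf, npop) for (_, vnf, _), npop in zip(demands, choice)}
        if best is None or len(instances) < best[0]:
            assignment = {(sfc, vnf): npop for (sfc, vnf, _), npop in zip(demands, choice)}
            best = (len(instances), assignment)
    return best


demands = [("sfc1", "vnf1", 2), ("sfc1", "vnf2", 2),
           ("sfc2", "vnf1", 2), ("sfc2", "vnf3", 2)]
capacity = {"n1": 4, "n2": 4, "n3": 4}
count, placement = min_instances(demands, capacity)
print(count)  # 3: both SFCs share one vnf1 instance
```

The search space grows as |N|^k, which is why the thesis relies on mathematical programming and heuristics rather than enumeration for realistic instances.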
Figure A.2: Example strategies for provisioning SFCs on three servers (A, B, and C); φij denotes VNF j of chain i. (a) “Gather” placement strategy: Server A hosts φ11, φ12, φ13; Server B hosts φ21, φ22, φ23; Server C hosts φ31, φ32, φ33. (b) “Distribute” placement strategy: Server A hosts φ11, φ21, φ31; Server B hosts φ12, φ22, φ32; Server C hosts φ13, φ23, φ33.
Source: the author (2017).

Virtual Network Function Placement and Chaining in the Intra-datacenter Context. The second problem addressed deals specifically with the placement of VNFs on datacenter servers (N-PoPs). A solution to the problem above (i.e., the inter-datacenter VNFPC) involves placing and chaining SFC requests across multiple (distributed) locations. This is particularly the case when SFCs have stringent network requirements (e.g., very low end-to-end delays). As a result of this global (i.e., inter-datacenter) planning, SFCs can potentially be split into sub-chains, each placed and chained in a specific datacenter. These partial SFC requests are then deployed on the available servers.
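The step from a global (inter-datacenter) plan to partial requests can be sketched as grouping consecutive VNFs that the plan assigns to the same datacenter. This is an illustrative assumption about how such sub-chains could be derived, not the thesis's procedure; the VNF and datacenter names are hypothetical.

```python
from itertools import groupby


def split_into_subchains(chain, site_of):
    """Split an ordered chain into maximal runs of consecutive VNFs at the same datacenter.

    chain:   ordered list of VNF identifiers of one SFC.
    site_of: dict mapping each VNF to the datacenter chosen by the global planning step.
    Returns a list of (datacenter, sub_chain) pairs in traversal order.
    """
    return [(site, list(run)) for site, run in groupby(chain, key=lambda vnf: site_of[vnf])]


chain = ["fw", "ids", "lb", "proxy"]
site_of = {"fw": "DC1", "ids": "DC1", "lb": "DC2", "proxy": "DC1"}
print(split_into_subchains(chain, site_of))
# [('DC1', ['fw', 'ids']), ('DC2', ['lb']), ('DC1', ['proxy'])]
```

Note that a datacenter may appear more than once when traffic returns to it, yielding several independent partial requests for the same site.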
In the context of the intra-datacenter VNFPC problem, identifying deployment mechanisms that minimize the provisioning cost of service chains has received significant attention from academia and industry (MEHRAGHDAM; KELLER; KARL, 2014; CLAYMAN et al., 2014; GHAZNAVI et al., 2015; LEWIN-EYTAN et al., 2015; LUIZELLI et al., 2015; BARI et al., 2015; RANKOTHGE et al., 2015; BOUET; LEGUAY; CONAN, 2015; LUKOVSZKI; SCHMID, 2015; KUO et al., 2016; ROST; SCHMID, 2016; LUKOVSZKI; ROST; SCHMID, 2016; LUIZELLI et al., 2017a). However, existing studies neglect the actual operational cost of SFC deployments in typical NFV environments. Therefore, the proposed models (e.g., those used in NFV orchestrators) may lead to infeasible solutions (in terms of CPU requirements) or suffer penalties in expected performance.
In an attempt to address this gap, we evaluate and model the operational costs of virtualized forwarding elements (virtual switching) in an NFV infrastructure (LUIZELLI et al., 2017b). In this environment, virtual switching is an essential building block that enables flexible communication between VNFs. However, its operation comes at a cost in terms of the computing resources that must be allocated to the forwarding layer in order to steer traffic through running services (in addition to the computing resources required by the VNFs themselves). This cost depends mainly on how VNFs are internally chained, on packet processing requirements, and on acceleration technologies (e.g., Intel DPDK (INTEL, 2016a)).
Figure A.2 illustrates possible implementations of service chains on three identical physical servers (A, B, and C). As can be observed, all service chains ϕ are composed of the same VNFs: ϕ1,2,3 = ⟨ϕ1 → ϕ2 → ϕ3⟩. For simplicity, we assume that all traffic routing is performed by the (virtual) switching layer present in each server. Moreover, all VNFs require the same amount of CPU to perform their tasks and are associated with the same known amount of traffic. Figures A.2(a) and A.2(b) illustrate two VNF deployment strategies widely applied in the context of the OpenStack Cloud Orchestrator (ONF, 2015). In Figure A.2(a), all VNFs of a single SFC request are deployed on the same server, a strategy referred to as “gather”. In contrast, Figure A.2(b) illustrates a deployment strategy in which each VNF of the same SFC is deployed on a different server (referred to as “distribute”). Note that all servers host the same number of VNFs and, therefore, have the same CPU processing requirements. However, determining the amount of processing resources needed to operate the virtual forwarding layer (virtual switching) within each server is far from straightforward, even for simple deployment strategies. Understanding these operational costs in real NFV deployments is therefore of paramount importance for three main reasons: (i) to ensure that the performance requirements of deployed network services (e.g., the maximum throughput of an SFC) are met; (ii) to design efficient placement strategies and accurately estimate the operational cost of arbitrary strategies; and (iii) to reduce the operational cost (in particular, the CPU consumption of the forwarding layer) of NFV providers.
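To make the dependence on internal chaining concrete, the sketch below counts how often packets of each chain cross the virtual switch of each server in the scenario of Figure A.2 (three chains of three VNFs on servers A, B, and C). This is a toy model assumed purely for illustration, not the measured cost function of (LUIZELLI et al., 2017b): every entry into or exit from a VNF through the local virtual switch counts as one traversal, and the function name `vswitch_traversals` is hypothetical.

```python
from collections import Counter


def vswitch_traversals(chains):
    """Count virtual-switch traversals per server (toy model, an assumption).

    chains: one list per SFC, giving the server that hosts each VNF in
    chain order. Each entry into or exit from a VNF through the server's
    virtual switch counts as one traversal on that server.
    """
    load = Counter()
    for servers in chains:
        load[servers[0]] += 1          # NIC -> first VNF
        for here, nxt in zip(servers, servers[1:]):
            if here == nxt:
                load[here] += 1        # VNF -> next VNF on the same server
            else:
                load[here] += 1        # VNF -> NIC on the current server
                load[nxt] += 1         # NIC -> next VNF on the next server
        load[servers[-1]] += 1         # last VNF -> NIC
    return dict(load)


gather = [["A", "A", "A"], ["B", "B", "B"], ["C", "C", "C"]]
distribute = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "C"]]
print(vswitch_traversals(gather))      # {'A': 4, 'B': 4, 'C': 4}
print(vswitch_traversals(distribute))  # {'A': 6, 'B': 6, 'C': 6}
```

Under this assumption, “gather” costs 4 traversals per server and “distribute” costs 6, because every inter-server hop is paid twice (when leaving one server and when entering the next). Actual CPU costs also depend on packet processing requirements and acceleration technologies such as DPDK, which is why they have to be measured rather than counted.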

A.2 Objectives and Contributions

The study developed in this thesis has five main objectives: (i) to formalize the virtual network function placement and chaining problem (inter- and intra-datacenter); (ii) to design efficient and scalable algorithmic methods that provide high-quality solutions to the VNFPC problem; (iii) to measure and model the operational costs of SFC deployments, as well as the performance limitations of the network infrastructure, in a typical NFV environment; (iv) to minimize the operational costs incurred in NFV infrastructures; and (v) to optimize how SFCs are monitored in NFV infrastructures.
The objectives described above unfold into the main scientific contributions of this thesis, described below.
The first contribution comprises the formalization of the virtual network function placement and chaining (VNFPC) problem by means of an Integer Linear Programming model (LUIZELLI et al., 2015). The developed model considers a wide range of requirements typical of NFV environments (e.g., computing power of network functions, flow capacity requirements, etc.). Furthermore, to cope with medium-sized NFV infrastructures, we propose a heuristic procedure that efficiently guides commercial solvers throughout the exploration of solutions. We compare the optimal and heuristic approaches considering different use cases and metrics, such as the number of instantiated virtual network functions, physical and virtual resource consumption, and end-to-end latencies.
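To illustrate what such a model trades off, the following sketch evaluates one candidate solution on a hypothetical two-N-PoP instance. It mirrors the textual description of the model in Appendix B: the objective counts placed instances $y_{i,m,j}$, one check keeps per-N-PoP CPU use within $C_i^P$, and another keeps link delays plus VNF processing delays within $D^S$. All data and function names are assumptions; the actual ILP also decides assignment and chaining, which this sketch does not model.

```python
# Hypothetical toy instance; names mirror the model's notation.
C_P = {"a": 8, "b": 4}                    # CPU capacity C_i^P of each N-PoP
F_CPU = {"fw": 4, "nat": 2}               # CPU size F_m^cpu of one instance of m
F_DELAY = {"fw": 2.0, "nat": 1.0}         # processing delay F_m^delay (ms)
D_P = {("a", "b"): 5.0, ("b", "a"): 5.0}  # delay D_ij^P of each directed link (ms)


def objective(y):
    """Number of placed instances; y maps (i, m, j) to 0 or 1."""
    return sum(y.values())


def cpu_ok(y):
    """True if the instances placed on each N-PoP fit in its CPU capacity."""
    used = {i: 0 for i in C_P}
    for (i, m, _j), placed in y.items():
        used[i] += placed * F_CPU[m]
    return all(used[i] <= C_P[i] for i in C_P)


def delay_ok(path, functions, max_delay):
    """True if link delays along path plus processing delays fit in max_delay (D^S)."""
    links = sum(D_P[(u, v)] for u, v in zip(path, path[1:]))
    processing = sum(F_DELAY[m] for m in functions)
    return links + processing <= max_delay


y = {("a", "fw", 0): 1, ("b", "nat", 0): 1}
print(objective(y), cpu_ok(y))                    # 2 True
print(delay_ok(["a", "b"], ["fw", "nat"], 10.0))  # True (5 + 2 + 1 = 8 ms)
print(delay_ok(["a", "b"], ["fw", "nat"], 7.0))   # False
```

An ILP solver explores all such candidates implicitly; the sketch only shows how one candidate is scored and checked.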


As the second major contribution, we address the scalability of the VNFPC problem by proposing a heuristic algorithm based on fix-and-optimize (LUIZELLI et al., 2017a). The method combines mathematical programming (Integer Linear Programming) with a heuristic method (Variable Neighborhood Search (HANSEN; MLADENOVIĆ, 2001)) so as to produce high-quality solutions for large-scale problem instances in a timely manner. We provide strong evidence, supported by an extensive set of experiments, that the heuristic algorithm scales to environments comprising hundreds of network functions. Results show that the developed method is able to generate solutions only 20% away from the computed lower bound, and outperforms existing algorithmic solutions, in quality, by a factor of 5. Furthermore, we prove that the problem at hand is NP-complete; this is the first study to formally establish the NP-complete nature of the problem.
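The Variable Neighborhood Search component can be sketched as a generic basic-VNS loop. This is not the thesis's fix-and-optimize algorithm: as an assumption for illustration, the ILP subproblems are replaced by a simple single-move local search on a hypothetical toy problem (packing VNFs with given CPU demands onto as few servers as possible). Neighborhood $N_k$ reassigns $k$ random VNFs; the search returns to $N_1$ after each improvement and moves to a larger neighborhood otherwise.

```python
import random

DEMANDS = [4, 3, 3, 2, 2, 2]  # hypothetical CPU demand of each VNF
CAPACITY = 8                  # hypothetical CPU capacity of each server
SERVERS = 4


def cost(x):
    """Servers in use, plus a large penalty per unit of CPU overload."""
    loads = [0] * SERVERS
    for vnf, server in enumerate(x):
        loads[server] += DEMANDS[vnf]
    overload = sum(max(0, load - CAPACITY) for load in loads)
    return sum(1 for load in loads if load > 0) + 100 * overload


def local_search(x):
    """First-improvement descent over single-VNF moves."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for vnf in range(len(x)):
            for server in range(SERVERS):
                candidate = x[:]
                candidate[vnf] = server
                if cost(candidate) < cost(x):
                    x, improved = candidate, True
    return x


def shake(k):
    """Neighborhood N_k: reassign k randomly chosen VNFs to random servers."""
    def move(x, rng):
        y = list(x)
        for vnf in rng.sample(range(len(y)), k):
            y[vnf] = rng.randrange(SERVERS)
        return y
    return move


def vns(initial, neighborhoods, max_iters=100, seed=0):
    rng = random.Random(seed)
    best = local_search(initial)
    for _ in range(max_iters):
        k = 0
        while k < len(neighborhoods):
            candidate = local_search(neighborhoods[k](best, rng))
            if cost(candidate) < cost(best):
                best, k = candidate, 0  # improvement: restart from N_1
            else:
                k += 1                  # no improvement: try a larger neighborhood
    return best


best = vns([0, 1, 2, 3, 0, 1], [shake(1), shake(2), shake(3)])
print(cost(best))  # 2
```

On this tiny instance the initial descent already reaches two fully loaded servers (4+2+2 and 3+3+2 CPU units), which is optimal because a total demand of 16 units cannot fit on one server of capacity 8; the shaking step matters on larger instances where single moves get stuck.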
The third contribution of our work comprises measuring and modeling the operational costs and network metrics of different SFC deployment strategies in real NFV infrastructures (LUIZELLI et al., 2017b). We carry out an extensive, in-depth evaluation, measuring the performance and analyzing the impact of SFC deployment in an environment running Open vSwitch, the default virtual switch for cloud environments (PFAFF et al., 2009; PFAFF et al., 2015; ONF, 2016). Based on this evaluation, we define a generalized cost function that accurately captures the CPU cost associated with the virtual forwarding layer. To this end, we measured the forwarding costs of two widely applied VNF deployment strategies, namely “distributed” and “gather”.
As the fourth contribution, we develop an algorithmic SFC deployment mechanism that takes into account both the actual performance of services and the resources required for their operation (i.e., for the forwarding layer). The method decomposes each SFC into sub-chains and deploys each sub-chain on a (possibly different) physical server so as to minimize the total forwarding cost. To this end, we designed a new algorithm based on the well-known reduction to the minimum-cost flow problem in networks. A comparison of this algorithm with the mechanisms available in NFV orchestrators (e.g., the nova-scheduler module in OpenStack) shows that the developed algorithm significantly outperforms OpenStack (by a factor of up to 4) with respect to operational costs.
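How sub-chains and servers are turned into nodes and arcs is not detailed here, so the following is only a generic sketch of the min-cost flow technique the algorithm relies on: a successive-shortest-path solver that repeatedly augments flow along the cheapest residual path, found with Bellman-Ford because reverse residual arcs carry negative costs. The network in the usage example is a hypothetical toy, not the thesis's construction.

```python
def min_cost_flow(n, arcs, source, sink, demand):
    """Successive shortest paths; arcs are (tail, head, capacity, unit_cost).

    Returns (flow_sent, total_cost); flow_sent < demand if the network
    cannot carry the full demand.
    """
    # Residual graph: each entry is [head, residual_capacity, cost, reverse_index].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in arcs:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    flow = total_cost = 0
    while flow < demand:
        # Bellman-Ford, since reverse residual arcs have negative costs.
        dist = [float("inf")] * n
        parent = [None] * n  # (node, arc index) used to reach each node
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, idx)
                        updated = True
            if not updated:
                break
        if dist[sink] == float("inf"):
            break  # no augmenting path left

        # Bottleneck capacity along the cheapest path, capped by remaining demand.
        push, v = demand - flow, sink
        while v != source:
            u, idx = parent[v]
            push = min(push, graph[u][idx][1])
            v = u
        # Augment: decrease forward residuals, increase reverse residuals.
        v = sink
        while v != source:
            u, idx = parent[v]
            arc = graph[u][idx]
            arc[1] -= push
            graph[v][arc[3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost


# Hypothetical toy network with arcs (tail, head, capacity, unit cost).
arcs = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 3, 1, 1), (2, 3, 2, 1), (1, 2, 1, 1)]
print(min_cost_flow(4, arcs, source=0, sink=3, demand=3))  # (3, 8)
```

The three units are routed over 0→1→3 (cost 2) and then over two paths of cost 3 (0→2→3 and 0→1→2→3), for a total cost of 8.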
As the fifth contribution, we address the problem of efficiently monitoring SFCs, another important building block for the proper operation of network services in NFV environments. First, we formalize the DNM (Distributed Network Monitoring) problem and then propose an ILP model to solve it. The optimization model is able to effectively coordinate the monitoring of service chains in NFV environments. The model is aware of the topological components of SFCs, which allows elements within a network service to be monitored independently with low overhead in terms of network traffic consumption and deployed collectors.

The five main contributions of this thesis are thus summarized as follows:


1. VNF Placement and Chaining in the Inter-datacenter Context.
• We formalize the virtual network function placement and chaining (VNFPC) problem by means of an integer linear programming model.
• We prove that the VNFPC problem is NP-complete.
• We propose a heuristic procedure that dynamically and efficiently guides the solution search performed by commercial solvers. Furthermore, we show that the developed heuristic scales to medium-sized infrastructures, with near-optimal solutions.
2. Limited Scalability of Existing Solutions.
• We address the scalability of VNFPC by proposing a new heuristic algorithm based on fix-and-optimize. We show that the proposed method scales to large NFV infrastructures (i.e., thousands of NFV nodes).
3. Unrealistic Operational Costs.
• We measure the operational costs incurred by different SFC deployment strategies in typical NFV infrastructures.
• We develop an analytical model to properly estimate the cost of the switching layer (operational cost) for different placement strategies in NFV infrastructures.
4. VNF Placement and Chaining in the Intra-datacenter Context.
• We generalize the previous analytical model to correctly estimate the operational cost of arbitrary VNF placement strategies.
• We design an online algorithm aimed at minimizing operational costs (i.e., software switching) for SFC requests.
5. Efficient Monitoring of SFCs.
• We formalize the Distributed Network Monitoring (DNM) problem and propose an Integer Linear Programming model.
• We evaluate the gains achieved by DNM solutions relative to traditional monitoring approaches.

APPENDIX B PAPER PUBLISHED AT IFIP/IEEE IM 2015

• Title: Piecing together the NFV provisioning puzzle: Efficient placement and chaining of
virtual network functions
• Conference: IFIP/IEEE Integrated Network Management Symposium (IM 2015)
• Type: Main track (full-paper)
• Qualis: A2
• Date: May 11-15, 2015
• Held at: Ottawa, CA
Abstract. Network Function Virtualization (NFV) is a promising network architecture con-
cept, in which virtualization technologies are employed to manage networking functions via
software as opposed to having to rely on hardware to handle these functions. By shifting dedi-
cated, hardware-based network function processing to software running on commoditized hard-
ware, NFV has the potential to make the provisioning of network functions more flexible and
cost-effective, to mention just a few anticipated benefits. Despite consistent initial efforts to
make NFV a reality, little has been done towards efficiently placing virtual network functions
and deploying service function chains (SFC). With respect to this particular research problem,
it is important to make sure resource allocation is carefully performed and orchestrated, pre-
venting over- or under-provisioning of resources and keeping end-to-end delays comparable to
those observed in traditional middlebox-based networks. In this paper, we formalize the net-
work function placement and chaining problem and propose an Integer Linear Programming
(ILP) model to solve it. Additionally, in order to cope with large infrastructures, we propose a
heuristic procedure for efficiently guiding the ILP solver towards feasible, near-optimal solu-
tions. Results show that the proposed model leads to a reduction of up to 25% in end-to-end
delays (in comparison to chainings observed in traditional infrastructures) and an acceptable
resource over-provisioning limited to 4%. Further, we demonstrate that our heuristic approach
is able to find solutions that are very close to optimality while delivering results in a timely
manner.
Piecing Together the NFV Provisioning Puzzle:
Efficient Placement and Chaining of
Virtual Network Functions

Marcelo Caggiani Luizelli, Leonardo Richter Bays, Luciana Salete Buriol


Marinho Pilla Barcellos, Luciano Paschoal Gaspary
Institute of Informatics – Federal University of Rio Grande do Sul (UFRGS)
{mcluizelli,lrbays,buriol,marinho,paschoal}@inf.ufrgs.br

I. INTRODUCTION

Middleboxes (or Network Functions - NF) play an essential role in today's networks, as they support a diverse set of functions ranging from security (e.g., firewalling and intrusion detection) to performance (e.g., caching and proxying) [1]. As currently implemented, middleboxes are difficult to deploy and maintain. This is mainly because cumbersome procedures need to be followed, such as dealing with a variety of custom-made hardware interfaces and manually chaining middleboxes to ensure the desired network behavior. Further, recent studies show that the number of middleboxes in enterprise networks (as well as in datacenter and ISP networks) is similar to the number of physical routers [2]–[4]. Thus, the aforementioned difficulties are exacerbated by the complexity imposed by the high number of network functions that a network provider has to cope with, leading to high operational expenditures. Moreover, in addition to costs related to manually deploying and chaining middleboxes, the need for frequent hardware upgrades adds up to substantial capital investments.
Network Function Virtualization (NFV) has been proposed to shift middlebox processing from specialized hardware appliances to software running on commoditized hardware [5]. In addition to potentially reducing acquisition and maintenance costs, NFV is expected to allow network providers to make the most of the benefits of virtualization on the management of network functions (e.g., elasticity, performance, flexibility, etc.). In this context, Software-Defined Networking (SDN) can be considered a convenient complementary technology, which, if available, has the potential to make the chaining of the aforementioned network functions much easier. In fact, it is not unreasonable to state that SDN has the potential to revamp the Service Function Chaining (SFC)¹ problem. In short, the problem consists of making sure network flows go efficiently through end-to-end paths traversing sets of middleboxes. In the NFV/SDN realm and considering the flexibility offered by this environment, the problem consists of (sub)optimally defining how many instances of virtual network functions are necessary and where to place them in the infrastructure. Furthermore, the problem encompasses the determination of end-to-end paths over which known network flows have to be transmitted so as to pass through the required placed network functions.
¹ In this paper, the terms network function and service function are used interchangeably.
Despite consistent efforts to make NFV a reality [1], [6], little has been done to efficiently perform the placement and chaining of virtual network functions on physical infrastructures. This is particularly challenging mainly for two reasons. First, depending on how virtual network functions are positioned and chained, end-to-end latencies may become intolerable. This problem is aggravated by the fact that processing times tend to be higher, due to the use of virtualization, and may vary, depending on the type of network function and the hardware configuration of the device hosting it. Second, resource allocation must be performed in a cost-effective manner, preventing over- or under-provisioning of resources. Therefore, placing network functions and programming network flows in a cost-effective manner while ensuring acceptable end-to-end delays represents an essential step toward enabling the use of NFV in production environments.
In this paper, we formalize the network function placement and chaining problem and propose an optimization model to solve it. Additionally, in order to cope with large infrastructures, we propose a heuristic procedure. Both optimal and heuristic approaches are evaluated considering different use cases and metrics, such as the number of instantiated virtual network functions, physical and virtual resource consumption, and end-to-end latencies. The main contributions of this paper are then: (i) the formalization of the network function placement and chaining problem by means of an ILP model; (ii) the proposal of a heuristic solution; and (iii) the evaluation of both proposed approaches and discussion of the obtained results.
The remainder of this paper is organized as follows. In Section 2, we discuss related work in the area of network function virtualization. In Section 3, we formalize the network function placement and chaining problem and propose both an optimal ILP model and a heuristic approach to solve it. In

978-3-901882-76-0 @2015 IFIP 98


Section 4, we present and discuss the results of an evaluation of the model and heuristic. Last, in Section 5 we conclude the paper with final remarks and perspectives for future work.

II. RELATED WORK

We now review some of the most prominent research work related to network function virtualization and the network function placement and chaining problem. We start the section by discussing recent efforts aimed at evaluating the technical feasibility of deploying network functions on top of commodity hardware. Then, we review preliminary studies carried out to solve different aspects of the virtual network function placement and chaining problem.
Hwang et al. [6] propose the NetVM platform to allow network functions based on Intel DPDK technology to be executed at line-speed (i.e., 10 Gb/s) on top of commodity hardware. According to the authors, it is possible to accelerate network processing by mapping NIC buffers to user space memory. In another investigation, Martins et al. [1] introduce a high-performance middlebox platform named ClickOS. It consists of Xen-based middlebox software which, by means of alterations in I/O subsystems (back-end switch, virtual net devices, and back- and front-end drivers), can sustain a throughput of up to 10 Gb/s. The authors show that ClickOS enables the execution of hundreds of virtual network functions concurrently without incurring significant overhead (in terms of delay) in packet processing. The results obtained by Hwang et al. and Martins et al. are promising and definitely represent an important milestone to make the idea of virtual network functions a reality.
With respect to efficient placement and chaining of network functions, the main contribution of this paper, Barkai et al. [7] and Basta et al. [8] have recently taken a first step toward modeling this problem. Barkai et al., for example, propose mechanisms to program network flows to an SDN substrate taking into account virtual network functions through which packets from these flows need to pass. In short, the problem consists of mapping SDN traffic flows properly (i.e., in the right sequence) to virtual network functions. To solve it in a scalable manner, the authors propose a more efficient topology awareness component, which can be used to rapidly program network flows. Note that they do not aim at providing a (sub)optimal solution to the network function placement and chaining problem as we do in this paper. Instead, the scope of their work is more of an operational nature, i.e., building an OpenFlow-based substrate that is efficient enough to allow flows – potentially hundreds of millions, with specific function processing requirements – to be correctly and timely mapped and programmed. Our solution could be used together with Barkai's and therefore help the decision on where to optimally place network functions and how to correctly map network flows.
The work by Basta et al., in turn, proposes an ILP model for network function placement in the context of cellular networks and crowd events. More specifically, the problem addressed is the question of whether or not to virtualize and migrate mobile gateway functions to datacenters. When applicable, the model also encompasses the optimal selection of datacenters that will host the virtualized functions and SDN controllers. Although the paper covers optimal virtual function placement, the proposed model is restricted, as it does not deal with function chaining. Our proposal is, in comparison, a broader, optimal solution. It can be applied to plan not only the placement of multiple instances of virtual network functions on demand, but also to map and chain service functions.
Before summarizing this section, we add a note on the relation between the network function placement and chaining problem and the Virtual Network Embedding (VNE) problem [9]–[12]. Despite some similarities, solutions to the latter are not appropriate to the former. The reason is twofold. First, while in VNE we observe one-level mappings (virtual network requests → physical network), in NFV environments we have two-level mappings (service function chaining requests → virtual network function instances → physical network). Second, while the VNE problem considers only one type of physical device (i.e., routers), a much wider variety of network functions coexist in NFV environments.
As one can observe from the state of the art, the area of network function virtualization is still in its early stages. Most of the effort has been focused on engineering ways of running network functions on top of commodity hardware while keeping performance roughly the same as the one obtained when deploying traditional middlebox-based setups. As far as we are aware, this paper consolidates a first consistent step towards placing virtual network functions and mapping service function chains. Besides, it captures and discusses the trade-off between resources employed and performance gains in the particular context of NFV.

III. THE NETWORK FUNCTION PLACEMENT AND CHAINING PROBLEM

In this section, we describe the network function placement and chaining problem and introduce our proposed solution. Next, we formalize it as an Integer Linear Programming model, followed by an algorithmic approach.
A. Problem Overview
As briefly explained earlier, network function placement and chaining consists of interconnecting a set of network functions (e.g., firewall, load balancer, etc.) through the network to ensure network flows are given the correct treatment. These flows must go through end-to-end paths traversing a specific set of functions. In essence, this problem can be decomposed into three phases: (i) placement, (ii) assignment, and (iii) chaining.
The placement phase consists of determining how many network function instances are necessary to meet the current/expected demand and where to place them in the infrastructure. Virtual network functions are expected to be placed on network points of presence (N-PoPs), which represent groups of (commodity) servers in specific locations of the infrastructure (with processing capacity). N-PoPs, in turn, would potentially be set up either in locations with previously installed switching and/or routing devices or in facilities such as datacenters.
The assignment phase defines which placed virtual network function instances (in the N-PoPs) will be in charge of each flow. Based on the source and destination of a flow, instances are assigned to it in a way that prevents processing times from causing intolerable latencies. For example, it may be more efficient to assign network function requests to the nearest virtual network function instance or to simply split the requested demand between two or more virtual network functions (when possible).
In the third and final phase, the requested functions are chained. This process consists of creating paths that interconnect the network functions placed and assigned in the previous phases. This phase takes into account two crucial factors, namely end-to-end path latencies and distinct processing delays added by different virtual network functions. Figure 1 depicts the main elements involved in virtual network function placement and chaining. The physical network is composed of N-PoPs interconnected through physical links. There is a set of SFC requests that contain logical sequences of network

2015 IFIP/IEEE International Symposium on Integrated Network Management (IM2015) 99


functions as well as the endpoints, which implicitly define the paths. Additionally, the provider has a set of virtual network function images that it can instantiate. In the figure, larger semicircles represent instances of network functions running on top of an N-PoP, whereas the circumscribed semicircles represent network function requests assigned to the placed instances. The gray area in the larger semicircles represents processing capacity allocated to network functions that is not currently in use. Dashed lines represent paths chaining the requested endpoints and network functions.

Fig. 1. Example SFC deployment on a physical infrastructure to fulfill a number of requests. (a) Physical infrastructure. (b) SFC requests.

B. Topological Components of SFC Requests
SFC requests may exhibit different characteristics depending on the application or flow they must handle. More specifically, such requests may differ topologically and/or in size. In this paper, we consider three basic types of SFC components, which may be combined with one another to form more complex requests. These three variations – (i) line, (ii) bifurcated path with different endpoints, and (iii) bifurcated path with a single endpoint – are explained next.
The simplest topological component that may be part of an SFC request is a line with two endpoints and one or more network functions. This kind of component is suitable for handling flows between two endpoints that have to pass through a particular sequence of network functions, such as a firewall and a Wide Area Network (WAN) accelerator.
The second and third topological components are based on bifurcated paths. Network flows passing through bifurcated paths may or may not end up at the same endpoint. Considering flows with different endpoints, the most basic component contains three endpoints (one source and two destinations). Between them, there is a network function that splits the traffic into different paths according to a certain policy. A classical example that fits this topological component is a load balancer connected to two servers. As for bifurcated paths with a single endpoint, we consider a scenario in which different portions of traffic between two endpoints must be treated differently. For example, part of the traffic has to pass through a specific firewall, while the other part passes through an encryption function. Figure 2 illustrates these topological components.
As previously mentioned, more sophisticated SFC requests may be created by freely combining these basic topological components among themselves or in a recursive manner.

Fig. 2. Basic topological components of SFC requests. (a) Line. (b) Bifurcated path with different endpoints. (c) Bifurcated path with a single endpoint.

C. Definitions and Modeling
Next, we detail the inputs, variables, and constraints of our optimization model. Superscript letters represent whether a set or variable refers to service chaining requests ($S$) or physical ($P$) resources, or whether it relates to nodes ($N$) or links ($L$).
NFV Infrastructure and Service Function Chaining. The topology of the NFV infrastructure, as well as that of each SFC, is represented as a directed graph $G = (N, L)$. Vertices $N$ represent network points of presence (N-PoPs) in physical infrastructures or network functions in SFCs. Each N-PoP represents a location where a network function may be implemented. In turn, each edge $(i, j) \in L$ represents a unidirectional link. Bidirectional links are represented as a pair of edges in opposite directions (e.g., $(a, b)$ and $(b, a)$). Thus, the model allows the representation of any type of physical topology, as well as any SFC forwarding graph.
In real environments, physical devices have a limited amount of resources. In our model, the CPU capacity of each N-PoP is represented as $C_i^P$. In turn, each physical link in the infrastructure has a limited bandwidth capacity and a particular delay, represented by $B_{i,j}^P$ and $D_{i,j}^P$, respectively. Similarly, SFCs require a given amount of resources. Network functions require a specific amount of CPU, represented as $C_i^S$. Additionally, each SFC being requested has a bandwidth demand and a maximum delay allowed between its endpoints, represented as $B_{i,j}^S$ and $D^S$, respectively.
Virtual Network Functions. Set $F$ represents possible virtual network functions (e.g., firewall, load balancer, NAT, etc.) that may be instantiated/placed by the infrastructure operator on top of N-PoPs. Each network function can be instantiated at most $U_m$ times, which is determined by the number of licenses the provider has purchased. Each virtual function instance requires a given amount of physical resources (which are used by SFCs mapped to that instance). Each instance provides a limited amount of resources represented by $F_m^{cpu}$. This enables our model to represent instances of the same type of network function with different sizes (e.g., pre-configured instances for services with higher or lower demand). Each function $m \in F$ has a processing delay associated with it, which is represented by $F_m^{delay}$. Moreover, we consider that each mapped network function instance may be shared by one or more SFCs whenever possible.
SFC requests. Set $Q$ represents SFCs that must be properly assigned to network functions. SFCs are composed of chains of network functions and each requested function is represented by $q$. Each link interconnecting the chained functions requires a given amount of bandwidth, represented by $B_{i,j}^V$. Furthermore, each request has at least two endpoints, representing specific locations on the infrastructure. The required locations of SFC endpoints are stored in set $S^C$. Likewise, the physical location of each N-PoP $i$ is represented in $S^P$. Since the graph of an SFC request may represent any topology, we assume that the set of virtual paths available to carry data between pairs of endpoints is known in advance. As there are efficient algorithms to compute paths, we opted to compute them in advance without loss of generality to the model. This set is represented by $P$.
Variables. The variables are the outputs of our model, and represent the optimal solution of the service function chaining problem for the given set of inputs. These variables indicate in which N-PoP virtual network functions are instantiated (placed). Further, these variables indicate the assignment of SFCs being requested to virtual network functions placed in the infrastructure. If a request is accepted, each of its virtual functions is mapped to an N-PoP, whereas each link in the



chain is mapped to one or more consecutive physical links (i.e., a physical path).

• $y_{i,m,j} \in \{0, 1\}$ – Virtual network function placement; indicates whether instance j of network function m is mapped to N-PoP i.

• $A^N_{i,q,j} \in \{0, 1\}$ – Assignment of required network functions; indicates whether virtual network function j, required by SFC q, is serviced by a network function placed on N-PoP i.

• $A^L_{i,j,q,k,l} \in \{0, 1\}$ – Chaining allocation; indicates whether physical link (i, j) is hosting virtual link (k, l) from SFC q.

Based on the above inputs and outputs, we now present the objective function and its constraints. The objective function of our model aims at minimizing the number of virtual network function instances mapped on the infrastructure. This objective was chosen because the number of instances has the most significant and direct impact on the network provider's costs. However, our model could easily be adapted to use other objective functions, such as multi-objective ones that consider different factors simultaneously (e.g., number of network function instances and end-to-end delays). The purpose of each constraint of our model is explained next.

Objective:

$$\min \sum_{i \in R^P,\, m \in F,\, j \in U_m} y_{i,m,j} \qquad (1)$$

Subject to:

$$\sum_{m \in F,\, j \in U_m} y_{i,m,j} \cdot F^{cpu}_{m,j} \le C^P_i \qquad \forall i \in R^P \qquad (2)$$

$$\sum_{q \in Q,\, j \in R^S_q \,:\, m = F_j} C^S_{q,j} \cdot A^N_{i,q,j} \le \sum_{j \in U_m} y_{i,m,j} \cdot F^{cpu}_{m,j} \qquad \forall i \in R^P,\ m \in F \qquad (3)$$

$$A^N_{i,q,k} \le \sum_{m \in F,\, j \in U_m \,:\, m = F_k} y_{i,m,j} \qquad \forall i \in R^P,\ q \in Q,\ k \in R^S_q \qquad (4)$$

$$\sum_{q \in Q,\, (k,l) \in L^S_q} B^S_{q,k,l} \cdot A^L_{i,j,q,k,l} \le B^P_{i,j} \qquad \forall (i,j) \in L^P \qquad (5)$$

$$\sum_{i \in R^P} A^N_{i,q,k} = 1 \qquad \forall q \in Q,\ k \in R^S_q \qquad (6)$$

$$\sum_{j \in R^P} A^L_{i,j,q,k,l} - \sum_{j \in R^P} A^L_{j,i,q,k,l} = A^N_{i,q,k} - A^N_{i,q,l} \qquad \forall q \in Q,\ i \in R^P,\ (k,l) \in L^S_q \qquad (7)$$

$$A^N_{i,q,k} \cdot j = A^N_{i,q,k} \cdot l \qquad \forall (i,j) \in S^P,\ q \in Q,\ (k,l) \in S^S_q \qquad (8)$$

$$\sum_{(i,j) \in L^P,\, (k,l) \in L^S_q \,:\, (k,l) \in p} A^L_{i,j,q,k,l} \cdot D^P_{i,j} + \sum_{i \in R^P,\, k \in R^S_q \,:\, k \in p} A^N_{i,q,k} \cdot F^{delay}_k \le D^S_q \qquad \forall q \in Q,\ p \in P_q \qquad (9)$$

Constraint 2 ensures that the sum of CPU capacities required by network function instances mapped to N-PoP i does not exceed the amount of available physical resources. In turn, constraint 3 ensures that the sum of processing capacities required by elements of SFCs does not exceed the amount of virtual resources available on network function m mapped to N-PoP i. Constraint 4 ensures that, if a network function requested by an SFC is assigned to N-PoP i, then at least one network function instance is running (placed) on i. Constraint 5 ensures that the physical links hosting the virtual links between the required endpoints have enough available bandwidth to carry the amount of flow required by SFCs (i.e., the load on each physical link (i, j) does not exceed $B^P_{i,j}$). Constraint 6 ensures that every required SFC (and its respective network functions) is mapped to the infrastructure. Constraint 7 builds the virtual paths between the required endpoints (it is a flow-conservation constraint for each virtual link (k, l)). Constraint 8 ensures that the required endpoints are mapped to devices in the requested physical locations. Last, constraint 9 ensures that end-to-end latency constraints on mapped SFC requests are met (the first part of the equation is the delay incurred by the physical paths between mapped endpoints, while the second part is the delay incurred by packet processing on virtual network functions).

D. Proposed Heuristic

In this subsection, we present our heuristic approach for efficiently placing, assigning, and chaining virtual network functions. We detail each specific procedure it uses to build a feasible solution, and present an overview of its algorithmic process.

In this particular problem, the search procedure performed by the integer programming solver leads to an extensive number of symmetrical feasible solutions. This is mainly because there is a considerable number of potential network function mappings/assignments that satisfy all constraints, in addition to the fact that the search schemes of commercial solvers are not specialized for the problem at hand.

To address these issues, our heuristic approach dynamically and efficiently guides the search for solutions performed by solvers in order to quickly arrive at high-quality, feasible ones. This is done by performing a binary search to find the lowest possible number of network function instances that meets the current demands. In each iteration, the heuristic employs a modified version of the proposed ILP model in which the objective function is removed and transformed into a constraint, resulting in a more bounded version of the original model. This strategy takes advantage of two facts: first, there tends to be a significant number of feasible, symmetrical solutions that meet our criteria for optimality, and once the lowest possible number of network function instances is determined, only one such solution needs to be found; and second, commercial solvers are extremely efficient in finding feasible solutions.

Algorithm 1 presents a simplified pseudocode version of our heuristic approach, and its details are explained next. The heuristic performs a binary search that attempts to find a more constrained model by dynamically adjusting the number of network functions that must be instantiated on the infrastructure. The upper bound of this search is initially set to the maximum number of network functions that may be instantiated on the infrastructure (line 2), while the lower bound is initialized as 1 (line 3). In each iteration, the maximum number of network function instances allowed is represented by variable nf, which is increased or decreased based on the aforementioned upper and lower bounds (line 6). After nf is updated, the algorithm transforms the original model into the bounded one by removing the objective function (line 7) and adding a new constraint (line 8), considering the computed value for nf. The added constraint is shown in Equation 10.
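The binary search over nf described above (Algorithm 1) can be sketched in Python, the language in which the heuristic was implemented. This is a minimal sketch under stated assumptions, not the authors' code. The names `minimize_instance_bound` and `solve_bounded_model` are hypothetical. `solve_bounded_model(nf)` stands in for line 9: it is assumed to solve the ILP with the objective removed and the bound of Equation 10 added, and to return a feasible solution or `None` within the time limit. The pseudocode moves the bounds to `nf` itself. The sketch instead moves them to `nf - 1` and `nf + 1`, so that the loop is guaranteed to terminate with integer midpoints.

```python
from typing import Callable, Optional, TypeVar

Solution = TypeVar("Solution")


def minimize_instance_bound(
    max_instances: int,
    solve_bounded_model: Callable[[int], Optional[Solution]],
) -> Optional[Solution]:
    """Binary search for the smallest instance budget nf with a feasible model.

    max_instances plays the role of upperBound (line 2 of Algorithm 1).
    solve_bounded_model(nf) is a hypothetical stand-in for line 9: it returns
    a feasible solution of the objective-free model with sum(y) <= nf added,
    or None if no solution is found (e.g., the time limit expired).
    """
    lower, upper = 1, max_instances
    best: Optional[Solution] = None  # s' in Algorithm 1
    while lower <= upper:
        nf = (lower + upper) // 2
        solution = solve_bounded_model(nf)
        if solution is not None:
            best = solution  # keep the tightest feasible solution so far
            upper = nf - 1   # try a smaller instance budget
        else:
            lower = nf + 1   # nf instances were not enough (or time ran out)
    return best  # None plays the role of "infeasible solution"


if __name__ == "__main__":
    # Toy oracle (hypothetical; ignores chaining and latency): 14 SFC elements
    # needing 12.5% CPU each must fit into instances offering 25% CPU each.
    demands = [12.5] * 14
    instance_cpu = 25.0

    def toy_oracle(nf: int) -> Optional[int]:
        return nf if sum(demands) <= nf * instance_cpu else None

    print(minimize_instance_bound(20, toy_oracle))  # → 7
```

Each probe solves a complete (objective-free) ILP, so the sketch issues O(log max_instances) solver calls. The sketch returns the minimum budget only if feasibility is monotone in nf. When the solver times out on a feasible budget, the result is, as in the heuristic itself, an upper estimate.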



$$\sum_{i \in R^P,\, m \in F,\, j \in U_m} y_{i,m,j} \le nf \qquad (10)$$

In line 9, a commercial solver is used to obtain a solution for the bounded model within an acceptable time limit. In each iteration, the algorithm stores the best solution found so far (i.e., the solution s with the lowest value for nf, line 11). Afterwards, it adjusts the upper or lower bound depending on whether the current solution is feasible or not (lines 12 and 14). Last, it returns the best solution found (s′, which represents variables $y$, $A^N$, and $A^L$).

Although the proposed heuristic uses an exact approach to find a feasible solution for the problem, timeLimit (in line 9) should be fine-tuned considering the size of the instance being handled to ensure that the tightest solution is found. In our experience, for example, a time limit in the order of minutes is sufficient for dealing with infrastructures with 200 N-PoPs.

Algorithm 1: Overview of the proposed heuristic.
Input: Infrastructure G, set Q of SFCs, set VNF of network functions, timeLimit
Output: Variables $y_{i,m,j}$, $A^N_{i,q,j}$, $A^L_{i,j,q,k,l}$
1   s, s′ ← ∅
2   upperBound ← |F|
3   lowerBound ← 1
4   nf ← (upperBound + lowerBound)/2
5   while nf ≥ lowerBound and nf ≤ upperBound do
6       nf ← (upperBound + lowerBound)/2
7       Remove objective function
8       Add constraint: $\sum_{i \in R^P, m \in F, j \in U_m} y_{i,m,j} \le nf$
9       s ← solveAlteredModel(timeLimit)
10      if s is feasible then
11          s′ ← s
12          upperBound ← nf
13      else
14          lowerBound ← nf
15      end
16  end
17  if s′ = ∅ then
18      return infeasible solution
19  else
20      return s′
21  end

IV. EVALUATION

In order to evaluate the provisioning of different types of SFCs, the ILP model formalized in the previous section was implemented and run in CPLEX Optimization Studio² version 12.4. The heuristic, in turn, was implemented and run in Python. All experiments were performed on a machine with four Intel Xeon E5-2670 processors and 56 GB of RAM, using the Ubuntu GNU/Linux Server 11.10 x86_64 operating system.

A. Workloads

We consider four different types of SFC components. Each type uses either one of the topological components described in Subsection III-B or a combination of them. The first component is a line composed of a single firewall between the two endpoints (Figure 2(a)). The second component consists of a bifurcated path with different endpoints (Figure 2(b)). This component is composed of a load balancer splitting the traffic between two servers. These two types of components are comparable since their end-to-end paths pass through exactly one network function. The third and fourth components use the same topologies as the previously described ones, but vary in size. The third component is a line (like Component 1) composed of two chained network functions: a firewall followed by an encryption network function (e.g., VPN). The fourth component is a bifurcated path (like Component 2), but after the load balancer, traffic is forwarded to one more network function, a firewall. These particular network functions were chosen because they are commonly referenced in recent literature; however, they could easily be replaced with any other functions if so desired. All network functions requested by SFCs have the same requirements in terms of CPU and bandwidth. Each network function requires 12.5% of CPU, while the chainings between network functions require 1 Gbps of bandwidth. When traffic passes through a load balancer, the required bandwidth is split between the paths. The values for CPU and bandwidth requirements were fixed after a preliminary evaluation, which revealed that they did not have a significant impact on the obtained results. Moreover, the establishment of static values for these parameters facilitates the assessment of the impact of other, more important factors.

The processing times of virtual network functions (i.e., the time required by these functions to process each incoming packet) considered in our evaluation are shown in Table I. These values are based on the study conducted by Dobrescu et al. [13], in which the authors determine the average processing time of a number of software-implemented network functions.

TABLE I. PROCESSING TIMES OF PHYSICAL AND VIRTUAL NETWORK FUNCTIONS USED IN OUR EVALUATION.

Network Function | Processing Time (physical) | Processing Time (virtual)
Load Balancer    | 0.2158 sec                 | 0.6475 sec
Firewall         | 2.3590 sec                 | 7.0771 sec
VPN Function     | 0.5462 sec                 | 1.6385 sec

Networks used as physical substrates were generated with Brite³. The topology of these networks follows the Barabási-Albert (BA-2) [14] model. This type of topology was chosen as an approximation of those observed in real ISP environments. Physical networks have a total of 50 N-PoPs, each with a total CPU capacity of 100%, while the bandwidth of physical links is 10 Gbps. The average delay of physical links is 30 ms. This value is based on the study conducted by Choi et al. [15], which characterizes typical packet delays in ISP networks.

² http://www-01.ibm.com/software/integration/optimization/cplex-optimization-studio/
³ http://www.cs.bu.edu/brite/

In order to provide a comparison between virtualized network functions and non-virtualized ones, we consider baseline scenarios for each type of SFC. These scenarios aim at reproducing the behavior of environments that employ physical middleboxes rather than NFV. Our baseline consists of a modified version of our model, in which the total number of network functions is exactly the number of different functions being requested. Moreover, the objective function attempts to find the minimum chaining length between endpoints and network functions. In baseline scenarios, function capacities are adjusted to meet all demands and, therefore, we do not consider capacity constraints. Further, processing times are three times lower than those in virtualized environments. This is in line with the study of Basta et al. [8]. These processing



times, like the ones related to virtual network functions, are shown in Table I.

In our experiments, we consider two different profiles of network function instances. In the first one, instances may require either 12.5% or 25% of CPU, leading to smaller instance sizes. In the second profile, instances may require 12.5% or 100% of CPU, leading to larger instances overall. We first evaluate our optimal approach considering individual types of requests. Next, we evaluate the effect of a mixed scenario with multiple types of SFCs. Last, we evaluate our proposed heuristic using large instances. Each experiment was repeated 30 times, with each repetition using a different physical network topology. All results have a confidence level of 90% or higher.

B. Results

First, we analyze the number of network function instances needed to cope with an increasing number of SFC requests. Figure 3 depicts the average number of instantiated network functions with the number of SFC requests varying from 1 to 20. At each point on the graph, all previous SFC requests are deployed together. The number of instances grows proportionally to the number of SFC requests. Further, we observe that smaller instance sizes lead to a higher number of network functions being instantiated. Considering small instances, scenarios with Components 1 and 2 require, on average, 10 network function instances (Figure 3(a)). In contrast, scenarios with Components 3 and 4 require, on average, 20 and 30 instances (Figure 3(b)), respectively. For large instances, scenarios with Components 1 and 2 require, respectively, 4 and 3 network function instances, while those with Components 3 and 4 require 9 and 12 instances on average. These results demonstrate that the number of virtual network functions in an SFC request has a much more significant impact on the number of instances needed to service such requests than the chainings between network functions and endpoints. This can be observed, for example, in Figure 3(a), in which Components 1 and 2 only differ topologically and lead to, on average, the same number of instances. In contrast, Figure 3(b) shows that when handling components of type 4 (which have a higher number of network functions than those of type 3), a significantly higher number of network function instances is required.

Fig. 3. Average number of network function instances. (a) Components 1 and 2. (b) Components 3 and 4. [Plots: number of network functions instantiated vs. SFC requests, for each component with small instances, large instances, and baseline.]

Figure 4 illustrates the average CPU overhead (i.e., allocated but unused CPU resources) in all experiments. Each point on the graph represents the average overhead from the beginning of the experiment until the current point. In all experiments, CPU overheads tend to be lower when small instances are used. When large instances are allocated, more resources stay idle. Considering small instances, Components 1 and 2 (Figure 4(a)) lead to, on average, CPU overheads of 7.80% and 7.28%, respectively. Components 3 and 4 (Figure 4(b)) lead to, on average, 6.58% and 3.18% CPU overhead. In turn, for large instances, Components 1 and 2 lead to average CPU overheads of 45.61% and 38.68%, respectively. Components 3 and 4 lead to, on average, 40.21% and 40.36% CPU overhead. Observed averages demonstrate that the impact of instance sizes is notably high, with smaller instances leading to significantly lower overheads. Further, we can observe that, in general, CPU overheads tend to be lower when higher numbers of SFCs are being deployed. As more requests are serviced simultaneously, network function instances can be shared among multiple requests, increasing the efficiency of CPU allocations. In these experiments, the baseline has 0% of CPU overhead, as network functions are planned in advance to support the exact demand. Since in an NFV environment network function instances are hosted on top of commodity hardware (as opposed to specialized middleboxes), these overheads – especially those observed for small instances – are deemed acceptable, as they do not incur high additional costs.

Fig. 4. Average CPU overhead of network function instances. (a) Components 1 and 2. (b) Components 3 and 4. [Plots: CPU overhead (0 to 1) vs. SFC requests, for each component with small instances, large instances, and baseline.]

Next, Figure 5 shows the average overhead caused by chaining network functions (through virtual links) in each experiment. This overhead is measured as the ratio between the effective bandwidth consumed by SFC virtual links hosted on the physical substrate and the bandwidth requested by such links. In general, the actual bandwidth consumption is higher than the total bandwidth required by SFCs, due to the frequent need to chain network functions through paths composed of multiple physical links. The absence of overhead is observed only when each virtual link is mapped to a single physical link (ratio of 1.0), or when network functions are mapped to the same devices as the requested endpoints (ratio < 1.0). Lower overhead rates may potentially lead to lower costs and allow more SFC requests to be serviced.

Considering large instances, the observed average overhead is 50.49% and 69.87% for scenarios with Components 1 and 2, respectively. In turn, Components 3 and 4 lead to overheads of 116% and 72.01%. This is due to the low number of instantiated network functions (Figure 3), which forces instances to be chained through long paths. Instead, when small instances are considered (i.e., more instances running in a distributed way), overheads tend to be lower: Components 1 and 2 lead to, on average, 44.30% and 57.60% bandwidth overhead, while Components 3 and 4 lead to 44.53% and 53.41%, respectively. When evaluating bandwidth overheads, we can observe that the topological structure of SFC requests has the most significant impact on the results (in contrast to previously discussed experiments). More complex chainings tend to lead to higher bandwidth overheads, although these results are also influenced by other factors such as instance sizes and the number of instantiated functions. In these experiments, the baseline overhead tends to be lower than the others as the



objective function prioritizes shortest paths (in terms of number of hops) between endpoints and network functions.

Fig. 5. Average bandwidth overhead of SFCs deployed in the infrastructure. (a) Components 1 and 2. (b) Components 3 and 4. [Plots: bandwidth overhead vs. SFC requests, for each component with small instances, large instances, and baseline.]

Figure 6 depicts the average end-to-end delay, in milliseconds, observed between endpoints in all experiments. The end-to-end delay is computed as the sum of the path delays and network function processing times. In this figure, results for scenarios with small and large instances are grouped together, as average delays are the same. The observed end-to-end delay for all components tends to be lower than the delay observed for the baseline scenario. This is mainly due to the better positioning of network functions and chainings between them. Furthermore, the model promotes a better utilization of the variety of existing paths in the infrastructure. Although the baseline scenario aims at building minimum chainings (in terms of hops), we observe that: (i) minimum chaining does not always lead to global minimum delay; and (ii) baseline scenarios overuse the shortest paths, depleting resources in specific locations (mainly in the vicinity of highly interconnected nodes) while alternative paths remain unused.

In comparison with baseline scenarios, Component 1 leads to, on average, 25% lower delay (21.55ms compared to 29.07ms), while Component 2 leads to, on average, 15.40% lower delay (19.28ms compared to 22.79ms). In turn, Component 3 leads to, on average, 13.86% lower delay than its baseline (25.15ms compared to 29.20ms), while Component 4 leads to 15.75% lower delay (24.89ms compared to 29.55ms). In summary, even though our baseline scenarios are planned in advance to support exact demands and we consider processing times of virtual network functions to be three times those of physical ones, end-to-end delays are still lower in virtualized scenarios. This advantage may become even more significant as the processing times of virtual network functions get closer in the future to those observed in physical middleboxes.

Fig. 6. Average end-to-end delay of SFCs deployed in the infrastructure. (a) Components 1 and 2. (b) Components 3 and 4. [Plots: average end-to-end delay (in milliseconds) vs. SFC requests, for each component and its baseline.]

After analyzing the behavior of SFCs considering homogeneous components, we now analyze the impact of a mixed scenario. In this scenario, Components 1, 2, and 4 are deployed sequentially and repeatedly in the infrastructure. Figure 7 presents the results for the mixed scenario. Although different topological SFC components are deployed together in the same infrastructure, the results exhibit tendencies similar to those of homogeneous scenarios. In Figure 7(a), we observe that the average number of network functions (on average, 17 network functions when considering small instances and 9 functions considering large instances) is proportional to the average values depicted in Figures 3(a) and 3(b). The average CPU overhead also remains similar (9.12% considering small instances and 45.37% considering large ones). In turn, the average overhead caused by chaining network functions in the mixed scenario is 59.06% and 47.51% for small and large instances, respectively. Despite these similarities, end-to-end delays tend to be comparatively lower than the ones observed in homogeneous scenarios. The delay observed in the proposed chaining approach is 8.57% lower than that of the baseline (22.93ms in comparison to 25.08ms). This is due to the combination of requests with different topological structures, which promotes the use of a wider variety of physical paths (which, in turn, leads to lower overutilization of paths). Similarly to homogeneous scenarios, average end-to-end delays are the same considering small and large instances.

Fig. 7. Mixed scenario including Components 1, 2, and 4. (a) Average number of network function instances. (b) Average CPU overhead of network function instances. (c) Average bandwidth overhead. (d) Average end-to-end delay. [Plots vs. SFC requests, for small instances, large instances, and baseline.]

We now proceed to the evaluation of our proposed heuristic approach. The heuristic was subjected to the same scenarios as the ILP model, in addition to ones with a larger infrastructure. Considering the scenarios presented so far (i.e., physical infrastructures with 50 nodes and 20 SFC requests), our heuristic was able to find an optimal solution in all cases. We omit such results due to space constraints. We emphasize, however, that the heuristic approach was able to find an optimal solution in a substantially shorter time frame in comparison to the ILP model, although the solution times of both approaches remained in the order of minutes. The average solution times of the ILP model and the heuristic considering all scenarios were, respectively, 8 minutes and 41 seconds and 1 minute
were of, respectively, 8 minutes and 41 seconds and 1 minute



and 21 seconds.

Last, we evaluate our heuristic approach on a large NFV infrastructure. In this experiment, we consider a physical network with 200 N-PoPs and a maximum of 60 SFC components of type 4. The delay limit was scaled up to 90ms in order to account for the larger network size. Figure 8(a) depicts the average time needed to find a solution using both the ILP model and the heuristic. The ILP model was not able to find a solution in a reasonable time in scenarios with more than 18 SFCs (the solution time was longer than 48 hours). The heuristic approach, in turn, properly scales to cope with this large infrastructure, delivering feasible, high-quality solutions in less than 30 minutes. As in previous experiments, small network function instances lead to higher solution times than large ones. This is mainly because smaller instances lead to a larger space of potential solutions to be explored.

Although the heuristic is not guaranteed to find the optimal solution (due to time constraints), Figures 8(b), 8(c), 8(d), and 8(e) show that the solutions obtained through this approach present a level of quality similar to the ones obtained optimally. Figure 8(b) depicts the average number of instantiated network functions with the number of SFC requests varying from 1 to 60. As in previous experiments, the number of instances remains proportional to the number of SFC requests, and smaller instance sizes lead to a higher number of network functions being instantiated. Considering small sizes, 75 network function instances are required on average. In contrast, for large sizes, 40 instances are required on average. Figure 8(c), in turn, illustrates the average CPU overhead. For small instances, CPU overhead is limited to 18.77%, while for large instances it reaches 48.65%. Similarly to the results concerning the number of network function instances, CPU overheads in these experiments also follow the trends observed in previous ones. Next, Figure 8(d) presents bandwidth overheads. Small instances lead to a bandwidth overhead of 300%, while for large instances this overhead is, on average, 410%. These particularly high overheads are mainly due to the increase in the average length of end-to-end paths, as the physical network is significantly larger. Note that the bandwidth overhead observed in the baseline scenario (198%) is also significantly higher than those observed in experiments on the small infrastructure. Last, Figure 8(e) depicts the average end-to-end delay observed in large infrastructures. In line with previous results, the end-to-end delay tends to be lower than the delay observed in the baseline scenario: the baseline presents, on average, 17.72% higher delay than the scenario considering Component 4 (82.76ms compared to 70.30ms). In short, these results demonstrate that: (i) the heuristic is able to find solutions with a level of quality very similar to that of the optimization model for small infrastructures; and (ii) as both infrastructure sizes and the number of requests increase, the heuristic is able to maintain the expected level of quality while still finding solutions in a short time frame.

Fig. 8. Scenario considering a large infrastructure and components of type 4. (a) Average time needed to find a solution. (b) Average number of network function instances. (c) Average CPU overhead of network function instances. (d) Average bandwidth overhead of SFCs deployed in the infrastructure. (e) Average end-to-end delay of SFCs deployed in the infrastructure. [Plots vs. SFC requests (10 to 60), for the heuristic and the ILP model with small and large instances, and the baseline; time in seconds on a logarithmic scale, delay in milliseconds.]

V. CONCLUSION

NFV is a prominent network architecture concept that has the potential to revamp the management of network functions. Its wide adoption depends primarily on ensuring that resource allocation is performed efficiently, so as to prevent over- or under-provisioning of resources. Thus, placing network functions and programming network flows in a cost-effective manner while guaranteeing acceptable end-to-end delays represents an essential step towards a broader adoption of this concept.

In this paper, we formalized the network function placement and chaining problem and proposed an optimization model to solve it. Additionally, in order to cope with large infrastructures, we proposed a heuristic procedure that dynamically and efficiently guides the search for solutions performed by commercial solvers. We evaluated both optimal and heuristic approaches considering realistic workloads and different use cases. The obtained results show that the ILP model leads to a reduction of up to 25% in end-to-end delays and an acceptable resource over-provisioning limited to 4%. Further, we demonstrate that our heuristic scales to larger infrastructures while still finding solutions that are very close to optimality in a timely manner.

As perspectives for future work, we envision extending the evaluation of the proposed solutions by applying them to other types of SFCs and ISP topologies, as well as conducting an in-depth analysis of the inter-relationships between their parameters. Moreover, we intend to explore mechanisms to reoptimize network function placements, assignments, and chainings. Further, we intend to explore solution approaches that build on exact methods, such as matheuristics.



REFERENCES

[1] J. Martins, M. Ahmed, C. Raiciu, V. Olteanu, M. Honda, R. Bifulco, and F. Huici, "ClickOS and the art of network function virtualization," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, 2014.
[2] D. A. Joseph, A. Tavakoli, and I. Stoica, "A policy-aware switching layer for data centers," in Proceedings of the ACM SIGCOMM Conference on Data Communication, 2008.
[3] T. Benson, A. Akella, and A. Shaikh, "Demystifying configuration challenges and trade-offs in network-based ISP services," in Proceedings of the ACM SIGCOMM Conference on Data Communication, 2011.
[4] V. Sekar, N. Egi, S. Ratnasamy, M. K. Reiter, and G. Shi, "Design and implementation of a consolidated middlebox architecture," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, 2012.
[5] Network Functions Industry Specification Group, "Network function virtualisation (NFV): An introduction, benefits, enablers, challenges and call for action," in SDN and OpenFlow World Congress, 2012, pp. 1–16.
[6] J. Hwang, K. K. Ramakrishnan, and T. Wood, "NetVM: High performance and flexible networking using virtualization on commodity platforms," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, 2014.
[7] S. Barkai, R. Katz, D. Farinacci, and D. Meyer, "Software defined flow-mapping for scaling virtualized network functions," in Proceedings of the Second ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking, 2013.
[8] A. Basta, W. Kellerer, M. Hoffmann, H. J. Morper, and K. Hoffmann, "Applying NFV and SDN to LTE mobile core gateways, the functions placement problem," in Proceedings of the 4th Workshop on All Things Cellular: Operations, Applications and Challenges, 2014.
[9] M. Yu, Y. Yi, J. Rexford, and M. Chiang, "Rethinking virtual network embedding: Substrate support for path splitting and migration," SIGCOMM Computer Communication Review, vol. 38, no. 2, pp. 17–29, Mar. 2008.
[10] M. Chowdhury, M. R. Rahman, and R. Boutaba, "ViNEYard: Virtual network embedding algorithms with coordinated node and link mapping," IEEE/ACM Transactions on Networking, vol. 20, no. 99, pp. 206–219, 2012.
[11] M. Rabbani, R. Pereira Esteves, M. Podlesny, G. Simon, L. Zambenedetti Granville, and R. Boutaba, "On tackling virtual data center embedding problem," in Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on, May 2013, pp. 177–184.
[12] L. R. Bays, R. R. Oliveira, L. S. Buriol, M. P. Barcellos, and L. P. Gaspary, "A heuristic-based algorithm for privacy-oriented virtual network embedding," in IEEE/IFIP Network Operations and Management Symposium (NOMS), Krakow, Poland, May 2014.
[13] M. Dobrescu, K. Argyraki, and S. Ratnasamy, "Toward predictable performance in software packet-processing platforms," in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, 2012.
[14] R. Albert and A.-L. Barabási, "Topology of evolving networks: Local events and universality," Physical Review Letters, vol. 85, pp. 5234–5237, Dec. 2000.
[15] B.-Y. Choi, S. Moon, Z.-L. Zhang, K. Papagiannaki, and C. Diot, "Analysis of point-to-point packet delay in an operational network," Computer Networks, vol. 51, pp. 3812–3827, 2007.


APPENDIX C JOURNAL PAPER PUBLISHED AT ELSEVIER COMPUTER COMMUNICATIONS

• Title: A fix-and-optimize approach for efficient and large scale virtual network function placement and chaining
• Journal: Elsevier Computer Communications
• Qualis: A2
Abstract. Network Function Virtualization (NFV) is a novel concept that is reshaping the
middlebox arena, shifting network functions (e.g. firewall, gateways, proxies) from specialized
hardware appliances to software images running on commodity hardware. This concept has
potential to make network function provision and operation more flexible and cost-effective,
paramount in a world where deployed middleboxes may easily reach the order of hundreds. In
spite of recent research activity in the field, little has been done towards efficient and scalable
placement & chaining of virtual network functions (VNFs) – a key feature for the effective suc-
cess of NFV. More specifically, existing strategies have either neglected the chaining aspect of
NFV, focusing on efficient placement only, or failed to scale to hundreds of network functions.
In this paper, we approach VNF placement and chaining as an optimization problem, and pro-
pose a fix-and-optimize-based heuristic algorithm for tackling it. Our algorithm incorporates
a Variable Neighborhood Search (VNS) meta-heuristic, for efficiently exploring the placement
and chaining solution space. The goal is to minimize required resource allocation, while meet-
ing network flow requirements and constraints. We provide evidence that our algorithm is able
to find feasible, high quality solutions efficiently, even in scenarios scaling to hundreds of VNFs.
Computer Communications 102 (2017) 67–77

Contents lists available at ScienceDirect

Computer Communications
journal homepage: www.elsevier.com/locate/comcom

A fix-and-optimize approach for efficient and large scale virtual network function placement and chaining
Marcelo Caggiani Luizelli, Weverton Luis da Costa Cordeiro∗, Luciana S. Buriol,
Luciano Paschoal Gaspary
Institute of Informatics – Federal University of Rio Grande do Sul, Av. Bento Gonçalves, 9500, 91.501-970 – Porto Alegre, RS, Brazil

Article info
Article history: Received 31 March 2016; Revised 9 August 2016; Accepted 12 November 2016; Available online 14 November 2016
Keywords: NFV; Network function placement; Service Function Chaining; Optimization; Fix-and-optimize; Variable Neighborhood Search

Abstract
Network Function Virtualization (NFV) is a novel concept that is reshaping the middlebox arena, shifting network functions (e.g. firewall, gateways, proxies) from specialized hardware appliances to software images running on commodity hardware. This concept has potential to make network function provision and operation more flexible and cost-effective, paramount in a world where deployed middleboxes may easily reach the order of hundreds. In spite of recent research activity in the field, little has been done towards efficient and scalable placement & chaining of virtual network functions (VNFs) – a key feature for the effective success of NFV. More specifically, existing strategies have either neglected the chaining aspect of NFV, focusing on efficient placement only, or failed to scale to hundreds of network functions. In this paper, we approach VNF placement and chaining as an optimization problem, and propose a fix-and-optimize-based heuristic algorithm for tackling it. Our algorithm incorporates a Variable Neighborhood Search (VNS) meta-heuristic, for efficiently exploring the placement and chaining solution space. The goal is to minimize required resource allocation, while meeting network flow requirements and constraints. We provide evidence that our algorithm is able to find feasible, high quality solutions efficiently, even in scenarios scaling to hundreds of VNFs.
© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Network Function Virtualization (NFV) is a novel trend in which middlebox functions (firewalling, caching, encryption, etc.) are shifted from specialized hardware to software-centric solutions. These, in turn, run on top of off-the-shelf commodity hardware. The benefits of NFV are manifold, reaching middlebox consumers, vendors, and operators. For example, it has the potential to enable companies and organizations (consumers) to reduce capital expenditures on middlebox hardware purchases. Network operators also benefit from flexible placement of virtual network functions (VNFs) across the infrastructure. Examples include smooth reconfiguration of flow chaining, given processing requirements, and resource (de)allocation/dimensioning, in response to variations in traffic demand. NFV even enables small industry players (vendors) to easily step in and develop customized, software-centric middlebox solutions, reducing their time-to-market.

There have been significant achievements in NFV, addressing aspects from effective planning and deployment [11,17] to efficient operation and management [6,9]. Nevertheless, NFV is a relatively new and still maturing paradigm, with various research questions open. One of the most challenging is how to efficiently find a proper VNF placement and chaining. Each instance of this problem, also known as Service Function Chaining (SFC), involves ensuring that allocated functions satisfy flow processing requirements (placement), and steering flows across functions in a specific order (chaining). In this context, Software Defined Networking (SDN) can be seen as a convenient ally, due to its flexible flow handling capability, thus making placement and chaining technically easier.

In spite of the convenience of bringing together NFV and SDN, the VNF placement and chaining problem, under certain requirements and constraints, is NP-complete. This aspect becomes especially relevant if we consider that the number of middleboxes may scale to the order of hundreds. Some surveys have shown that certain very large networks have around 2000 middleboxes deployed [21], and that this number is often comparable to that of routers and switches in place [20]. Recent investigations have targeted the placement and chaining problem [11,12,17]; however, they only scale to a few dozen VNFs, or to a handful of SFCs submitted in parallel (as will be discussed later).

∗ Corresponding author.
E-mail addresses: mcluizelli@inf.ufrgs.br (M.C. Luizelli), weverton.cordeiro@inf.ufrgs.br, weverton.cordeiro@gmail.com (W.L. da Costa Cordeiro), buriol@inf.ufrgs.br (L.S. Buriol), paschoal@inf.ufrgs.br (L.P. Gaspary).

http://dx.doi.org/10.1016/j.comcom.2016.11.002
0140-3664/© 2016 Elsevier B.V. All rights reserved.
68 M.C. Luizelli et al. / Computer Communications 102 (2017) 67–77

As a major contribution to the state-of-the-art, in this paper we address scalability of VNF placement and chaining (VNFPC), deemed a very important design requirement of real operational networks, by proposing a novel fix-and-optimize-based heuristic algorithm. It combines mathematical programming (Integer Linear Programming, ILP) and a meta-heuristic method (Variable Neighborhood Search, VNS [7]), generating a math-heuristic method [15], so as to produce high quality solutions for large scale network setups in a timely fashion. We provide evidence, supported by an extensive set of experiments, that our heuristic algorithm scales to environments comprising hundreds of network functions. It produces timely results that are, on average, only 20% away from a computed lower bound, and outperforms the best existing algorithmic solution, quality-wise, by a factor of 5. As another important contribution, we prove that the VNFPC problem is NP-complete. As far as we are aware, this is the first paper to formally validate such an important claim.
The remainder of this paper is organized as follows. In Section 2 we cover related work. In Section 3 we introduce a formal definition of the VNFPC problem, along with the optimization model that approaches it. Our math-heuristic algorithm for VNF placement and chaining is presented in Section 4. In Section 5 we discuss evaluation results, and also compare our approach with the state-of-the-art, whereas in Section 6 we close the paper with concluding remarks.

2. Related work

To the best of our knowledge, network function placement and chaining had not been investigated before the inception of NFV, e.g. for planning and deployment of physical middleboxes. One of the most similar problems in the networking literature is Virtual Network Embedding (VNE): how to deploy virtual network requests on top of a physical substrate [22,23]. In spite of the similarities, VNF placement and chaining is a broader, two-phase problem, as it also encompasses the assignment of SFCs to deployed functions.

Moens et al. [17] were the first to address VNF placement and chaining, by formalizing it as an optimization problem. The authors considered a hybrid scenario, where SFCs can be instantiated using existing middlebox hardware and/or virtual functions. Luizelli et al. [12] also approached the problem from an optimization perspective. The authors introduced a set of heuristics for pruning the search space, thus reducing the complexity of finding feasible solutions.

Lewin-Eytan et al. [11] followed a similar direction, and used an optimization model along with approximation algorithms to solve the problem. The focus, however, was VNF placement: where to deploy VNFs, and how to assign traffic flows to them. Their work is relevant for having established a theoretical background for NFV placement, building on two classical optimization problems, namely facility location and generalized assignment. Nonetheless, network function chaining (key for NFV planning and deployment) was left out of scope. Following the same line as the seminal work by Lewin-Eytan et al., Rost et al. [18] and Lukovszki et al. [13] also leveraged approximation algorithms for the VNFPC optimization problem, the former considering SFC admission control, and the latter covering incremental deployment.

There have been other proposals that approached specific versions of the VNFPC problem [2,10,14,16]. Mehraghdam et al. [16] and Bari et al. [2] introduced joint optimization problems for VNFPC, the former analyzing the optimization model under different objective functions, and the latter focusing on reducing operational expenditures in datacenters. Kuo et al. [10] analyzed resource consumption on physical servers and links, and Lukovszki and Schmid [14] presented a deterministic online algorithm with a logarithmic competitive ratio on the length of service chains.

In spite of their potential, the investigations above either ignore a number of aspects vital for the placement & chaining problem (e.g. network functions deployed as part of a service chain, capacity constraints of network links, etc.) [13,18], or do not properly scale to large network settings and several SFCs submitted in parallel. Moens et al. [17], for example, were limited to small scale scenarios. Lewin-Eytan et al. [11] and Luizelli et al. [12] have been shown to be only partially effective in scaling to scenarios with a few hundred nodes and allocating network resources wisely (and the former does not approach chaining, as mentioned earlier). As will be shown later, our approach is able to outperform both, coming up with feasible, high quality solutions for larger scenarios in a timely fashion.

3. Problem overview and optimization model

We begin a more in-depth description of the placement and chaining problem with an example, illustrated in Fig. 1. It involves the deployment of two Service Function Chaining requests (SFCs, also referred to as “composite services”) onto a backbone network. For the first one, incoming flows must be passed through (an instance of) virtual network function (VNF) 1 (e.g. a firewall), and then VNF 2 (e.g. a load balancer). The second specifies that incoming flows must also be passed through VNF 1, and then VNF 3 (e.g. a proxy). Both SFCs are sketched in Figs. 1(a) and (b), respectively.

Fig. 1. Examples of SFCs, and partial view of the network backbone (focusing on the set of N-PoPs available for placing VNFs) considered in our scenario.

The VNF instances required for each SFC must be placed onto Network Points of Presence (N-PoPs). N-PoPs are infrastructures (e.g. servers, clusters or even datacenters) spread in the network and on top of which (virtual) network functions can be provisioned. Without loss of generality, we assume the existence of one N-PoP associated with every major forwarding device comprising a backbone network (see circles in Fig. 1(c)). Each N-PoP has a certain amount of resources (e.g. computing power) available. Likewise, each SFC has its own requirements. For example, functions composing an SFC (e.g. caching) are expected to sustain a given load, and thus are associated with a computing power requirement. Also, traffic between functions can be expected to reach some peak throughput, which must be handled by the physical path connecting the N-PoPs hosting those functions.

Given the context above, the problem we approach is finding a proper placement of VNFs onto N-PoPs and chaining of placed functions, so that overall network resource commitment is minimized. The placement and chaining must ensure that each of the

SFC requirements, as well as network constraints, are met. In our illustrating example, a possible deployment is shown in Fig. 1(c). The endpoints, represented as filled circles, denote flows originating from or destined to devices/networks attached to core forwarding devices. Observe that both SFCs share the same instance of VNF 1, placed on N-PoP (2), therefore minimizing resource allocation as desired.

3.1. Model description and notation

The optimization model is an important building block for the math-heuristic algorithm proposed in this paper. In this iteration of our work, we adopt a revised version of the model proposed by Luizelli et al. [12], which captures the placement and chaining aspects we are currently interested in. We start by describing both the input and output of the model, and establishing a supporting notation. We use superscript letters $P$ and $S$ to indicate symbols that refer to physical resources and SFC requests, respectively. Similarly, superscript letters $N$ and $L$ indicate references to N-PoPs/endpoints, and the links that connect them. We also use superscript $H$ to denote symbols that refer to a subset (sub-graph) of an SFC request.

The optimization model we use for solving the VNFPC problem considers a set of SFCs $Q$ and a physical infrastructure $p$, the latter being a triple $p = (N^P, L^P, S^P)$. $N^P$ is a set of network nodes (either an N-PoP or a packet forwarding/switching device), and pairs $(i, j) \in L^P$ denote unidirectional physical links. We use two pairs in opposite directions (e.g., $(i, j)$ and $(j, i)$) to denote bidirectional links. The set of tuples $S^P = \{\langle i, r \rangle \mid i \in N^P \wedge r \in \mathbb{N}^*\}$ contains the actual location (represented as an integer identifier) of N-PoP $i$. Observe that more than one N-PoP may be associated with the same location (e.g. N-PoPs in a specific room or datacenter). The model captures the following resource constraints: computing power for N-PoPs ($c^P_i$), and one-way bandwidth and delay for physical links ($b^P_{i,j}$ and $d^P_{i,j}$, respectively).

Observe that the forwarding graph of an SFC may represent any topology. Figs. 1(a) and (b) illustrate topologies containing simple transitions and flow branches (note that flow joins may also be used). We assume, for simplicity, that the set of virtual paths available to carry traffic flows is known in advance. This is important because, in our model, these paths enable determining end-to-end delays among pairs of endpoints. To illustrate, Fig. 1(a) shows two virtual paths: one starting in A and ending in B, and another starting in A and ending in C (both traversing VNFs 1 and 2).

It is important to emphasize that the set of virtual paths can be defined, for example, according to the network policy determined by the network operator. For example, a policy may allow traffic to go from and to all endpoints. In this case, our model requires all such paths to be known in advance. Observe that a path here refers to the sequence of virtual network functions and endpoints that a specific traffic flow should pass through in a particular SFC. This is not to be confused with the routing path in the physical infrastructure, which is determined by the model, as shown next. This assumption neither restricts the model nor increases its complexity, since finding paths (e.g., the shortest one) is known to have polynomial time complexity.

The set of virtual paths of an SFC $q$ is denoted by $H_q$. Each element $H_{q,i} \in H_q$ is one possible sub-graph of $q$, and contains one source and one sink endpoint, and only one possible forward path (as discussed above). The subsets $N^H_{q,i} \subseteq N^S_q$ and $L^H_{q,i} \subseteq L^S_q$ contain the VNFs and links that belong to $H_{q,i}$.

An SFC $q \in Q$ is an aggregation of network functions and chaining between them. It is represented as a triple $q = (N^S_q, L^S_q, S^S_q)$. Sets $N^S_q$ and $L^S_q$ contain the SFC nodes and virtual links connecting them, respectively. Each SFC has at least two endpoints, denoting specific locations in the infrastructure. The required locations of SFC endpoints are determined in advance, and given by a function $S^S_q : N^S_q \to \mathbb{N}^*$, where $N^S_q$ contains endpoints and $\mathbb{N}^*$ represents possible identifiers of physical world locations. For each SFC $q$, we capture the following resource requirements: computing power required by a network function $i$ ($c^S_{q,i}$), minimum bandwidth required for traffic flows between functions $i$ and $j$ ($b^S_{q,i,j}$), and maximum tolerable end-to-end delay ($d^S_q$).

$F$ denotes the set of types of VNFs (e.g., firewall, gateway) available for deployment. Each VNF type $m \in F$ has a set $U_m$ of instances, and may be instantiated at most $|U_m|$ times (e.g. the number of network functions of the same type available for deployment, having the same requirements with respect to CPU usage, etc.). We denote as $f_{type} : N^P \cup N^S \to F$ the function that indicates the type of some given VNF, which can be either one instantiated in some N-PoP ($N^P$) or one requested in an SFC ($N^S$). We also use functions $f_{cpu} : (F \times U_m) \to \mathbb{R}^+$ and $f_{delay} : F \to \mathbb{R}^+$ to denote the computing power requirement and processing delay of a VNF.

The model output is denoted by a 3-tuple $\chi = \{Y, A^N, A^L\}$. Variables from $Y = \{y_{i,m,j}, \forall i \in N^P, m \in F, j \in U_m\}$ indicate a VNF placement, i.e. whether instance $j$ of network function $m$ is mapped to N-PoP $i$. The variables from $A^N = \{a^N_{i,q,j}, \forall i \in N^P, q \in Q, j \in N^S_q\}$, in turn, represent an assignment of required network functions/endpoints. They indicate whether node $j$ (either a network function or an endpoint), required by SFC $q$, is assigned to node $i$ (either an N-PoP or another device in the network, respectively). Finally, variables from $A^L = \{a^L_{i,j,q,k,l}, \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q\}$ indicate a chaining allocation, i.e. whether the virtual link $(k, l)$ from SFC $q$ is being hosted by physical link $(i, j)$. Each of these variables may assume a value in $\{0, 1\}$.

3.2. Model formulation

Next we describe the ILP formulation for the VNFPC problem. For convenience, Table 1 presents the complete notation used in the formulation. The goal of the objective function is to minimize the number of VNF instances mapped on the infrastructure. That choice was based on the fact that resource allocation has a significant and direct impact on operational costs. It is important to emphasize, however, that other objective functions could be straightforwardly adopted (either exclusively or several functions combined). Examples include number of VNF instances deployed, overall bandwidth commitment, end-to-end delays, energy-aware deployments, survivability of SFCs, and VNF load balancing, just to name a few. As constraint sets (1)–(11) (detailed next) ensure a feasible solution, any objective function being considered that does not depend on any other constraints should work properly. However, there are objective functions that might require some minor modifications to the model (e.g., additional constraints) in order to work as expected (which is out of the scope of this work).

$$\text{minimize} \quad \sum_{i \in N^P} \sum_{m \in F} \sum_{j \in U_m} y_{i,m,j}$$

subject to

$$\sum_{m \in F} \sum_{j \in U_m} y_{i,m,j} \cdot f_{cpu}(m, j) \le c^P_i \qquad \forall i \in N^P \tag{1}$$

$$\sum_{q \in Q} \sum_{j \in N^S_q : f_{type}(j) = f_{type}(m)} c^S_{q,j} \cdot a^N_{i,q,j} \le \sum_{j \in U_m} y_{i,m,j} \cdot f_{cpu}(m, j) \qquad \forall i \in N^P, m \in F \tag{2}$$

$$\sum_{q \in Q} \sum_{(k,l) \in L^S_q} b^S_{q,k,l} \cdot a^L_{i,j,q,k,l} \le b^P_{i,j} \qquad \forall (i,j) \in L^P \tag{3}$$

Table 1
Glossary of symbols and functions related to the optimization model.

| Symbol | Formal specification | Definition |
|---|---|---|
| Sets and set objects | | |
| $p$ | $p = (N^P, L^P, S^P)$ | Physical network infrastructure, composed of nodes and links |
| $i \in N^P$ | $N^P = \{i \mid i \text{ is an N-PoP}\}$ | Network points of presence (N-PoPs) in the physical infrastructure |
| $(i,j) \in L^P$ | $L^P = \{(i,j) \mid i,j \in N^P\}$ | Unidirectional links connecting pairs of N-PoPs $i$ and $j$ |
| $\langle i,r \rangle \in S^P$ | $S^P = \{\langle i,r \rangle \mid i \in N^P \wedge r \in \mathbb{N}^*\}$ | Identifier $r$ of the actual location of N-PoP $i$ |
| $m \in F$ | $F = \{m \mid m \text{ is a function type}\}$ | Types of virtual network functions available |
| $j \in U_m$ | $U_m = \{j \mid j \text{ is an instance of } m \in F\}$ | Instances of virtual network function $m$ available |
| $q \in Q$ | | Service Function Chaining (SFC) requests that must be deployed |
| $q$ | $q = (N^S_q, L^S_q, S^S_q)$ | A single SFC request, composed of VNFs and their chainings |
| $i \in N^S_q$ | $N^S_q = \{i \mid i \text{ is a VNF instance or endpoint}\}$ | SFC nodes (either a network function instance or an endpoint) |
| $(i,j) \in L^S_q$ | $L^S_q = \{(i,j) \mid i,j \in N^S_q\}$ | Unidirectional links connecting SFC nodes |
| $S^S_q$ | $S^S_q : N^S_q \to \mathbb{N}^*$ | Function that determines the physical location of SFC endpoints |
| $H_{q,i} \in H_q$ | | Distinct forwarding paths (subgraphs) contained in a given SFC $q$ |
| $H_{q,i}$ | $H_{q,i} = (N^H_{q,i}, L^H_{q,i})$ | A possible subgraph (with two endpoints only) of SFC $q$ |
| $N^H_{q,i}$ | $N^H_{q,i} \subseteq N^S_q$ | VNFs that compose the SFC subgraph $H_{q,i}$ |
| $L^H_{q,i}$ | $L^H_{q,i} \subseteq L^S_q$ | Links that compose the SFC subgraph $H_{q,i}$ |
| Parameters | | |
| $c^P_i$ | $c^P_i \in \mathbb{R}^+$ | Computing power capacity of N-PoP $i$ |
| $b^P_{i,j}$ | $b^P_{i,j} \in \mathbb{R}^+$ | One-way link bandwidth between N-PoPs $i$ and $j$ |
| $d^P_{i,j}$ | $d^P_{i,j} \in \mathbb{R}^+$ | One-way link delay between N-PoPs $i$ and $j$ |
| $c^S_{q,i}$ | $c^S_{q,i} \in \mathbb{R}^+$ | Computing power required for network function $i$ of SFC $q$ |
| $b^S_{q,i,j}$ | $b^S_{q,i,j} \in \mathbb{R}^+$ | One-way link bandwidth required between nodes $i$ and $j$ of SFC $q$ |
| $d^S_q$ | $d^S_q \in \mathbb{R}^+$ | Maximum tolerable end-to-end delay of SFC $q$ |
| Functions | | |
| $f_{type}(m)$ | $f_{type} : N^P \cup N^S \to F$ | Type of some given virtual network function (VNF) |
| $f_{cpu}(m, j)$ | $f_{cpu} : (F \times U_m) \to \mathbb{R}^+$ | Computing power associated with instance $j$ of VNF type $m$ |
| $f_{delay}(m)$ | $f_{delay} : F \to \mathbb{R}^+$ | Processing delay associated with VNF type $m$ |
| Variables | | |
| $y_{i,m,j} \in Y$ | $Y = \{y_{i,m,j}, \forall i \in N^P, m \in F, j \in U_m\}$ | VNF placement |
| $a^N_{i,q,j} \in A^N$ | $A^N = \{a^N_{i,q,j}, \forall i \in N^P, q \in Q, j \in N^S_q\}$ | Assignment of required network functions/endpoints |
| $a^L_{i,j,q,k,l} \in A^L$ | $A^L = \{a^L_{i,j,q,k,l}, \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q\}$ | Chaining allocation |

$$\sum_{i \in N^P} a^N_{i,q,k} = 1 \qquad \forall q \in Q, k \in N^S_q \tag{4}$$

$$a^N_{i,q,k} \cdot l = a^N_{i,q,k} \cdot j \qquad \forall \langle i,j \rangle \in S^P, q \in Q, \langle k,l \rangle \in S^S_q \tag{5}$$

$$a^N_{i,q,k} \le \sum_{m \in F} \sum_{j \in U_m : m = f_{type}(k)} y_{i,m,j} \qquad \forall i \in N^P, q \in Q, k \in N^S_q \tag{6}$$

$$\sum_{j \in N^P} a^L_{i,j,q,k,l} - \sum_{j \in N^P} a^L_{j,i,q,k,l} = a^N_{i,q,k} - a^N_{i,q,l} \qquad \forall q \in Q, i \in N^P, (k,l) \in L^S_q \tag{7}$$

$$\sum_{(i,j) \in L^P} \sum_{(k,l) \in L^H_{q,t}} a^L_{i,j,q,k,l} \cdot d^P_{i,j} + \sum_{i \in N^P} \sum_{k \in N^H_{q,t}} a^N_{i,q,k} \cdot f_{delay}(k) \le d^S_q \qquad \forall q \in Q, (N^H_{q,t}, L^H_{q,t}) \in H_q \tag{8}$$

$$y_{i,m,j} \in \{0,1\} \qquad \forall i \in N^P, m \in F, j \in U_m \tag{9}$$

$$a^N_{i,q,j} \in \{0,1\} \qquad \forall i \in N^P, q \in Q, j \in N^S_q \tag{10}$$

$$a^L_{i,j,q,k,l} \in \{0,1\} \qquad \forall (i,j) \in L^P, q \in Q, (k,l) \in L^S_q \tag{11}$$

The first three constraint sets refer to limitations of physical resources. Constraint set (1) ensures that, for each N-PoP, the sum of computing power required by all VNF instances mapped to it does not exceed its available capacity. Constraint set (2) certifies that the sum of required flow processing capacities does not exceed the amount available on a VNF instance deployed on a given N-PoP. Finally, constraint set (3) ensures that the physical path between the required endpoints has enough bandwidth.

Constraint sets (4)–(6) ensure the mandatory placement of all virtual resources. Constraint set (4) certifies that each SFC (and its respective network functions) is mapped to the infrastructure exactly once. Constraint set (5), in turn, seeks to guarantee that required endpoints are mapped to network devices in the requested physical locations. Constraint set (6) certifies that, if a VNF requested by an SFC is assigned to a given N-PoP, then at least one VNF instance of that type should be running (placed) on that N-PoP.

Constraint set (7) refers to VNF chaining: it ensures that there is an end-to-end path between required endpoints. Constraint set (8) certifies that latency constraints on mapped SFC requests are met for each path. The first part of the equation is the sum of the delays of the physical links used to connect mapped endpoints belonging to the same path. The second part defines the delay incurred by packet processing on VNFs that are traversed by flows in the same path.

3.3. Proof of NP-completeness

Although the VNF placement and chaining (VNFPC) problem is relatively simple to solve for small instances, it belongs to complexity class NP-complete. This aspect is further discussed next.

Lemma 1. The VNFPC problem belongs to the class NP.

Proof. A solution for the problem is the set of paths used to map the virtual links, as well as the nodes where the functions were placed. Endpoints have to be mapped to pre-defined nodes, and checking this is trivial. Moreover, resources consumed by all functions installed in each N-PoP cannot surpass their capacity. All directed paths between two endpoints of an SFC must be mapped to a valid path in the infrastructure. Given the link mapping of the solution, it can be verified whether each path starts and ends at the pre-defined nodes, forming a valid path. While traversing the mapped paths, the flow processing capacities of nodes can be accounted for, the path delay computed, and it can be verified whether the path has enough bandwidth. All these checks can be done in $O(n^3)$ per virtual network, since $O(n)$ is spent traversing each path, and there are at most $n^2$ paths per virtual network. Thus, a VNFPC solution can be verified in polynomial time, and the problem is then classified as NP. □

Lemma 2. Any Bin Packing instance can be reduced to an instance of the VNFPC problem.

Proof. An instance of the Bin Packing Problem (BPP), which is a classical NP-complete problem [5], comprises a set $Q$ of items, a size $c_q$ for each item $q = 1, \ldots, |Q|$, a positive integer bin capacity $B$, and a positive integer $k$. The decision version of the BPP asks if there is a partition of $Q$ into $k$ disjoint sets $Q_1, Q_2, \ldots, Q_k$ such that the sum of sizes of the items in each subset is at most $B$. We reduce any instance of the BPP to an instance of the VNFPC using the following procedure:

1. $|Q|$ SFCs are created, each with exactly three nodes and two links. Each SFC has one origin endpoint, which must be mapped to N-PoP $o$, and one target endpoint, which must be mapped to $t$. The middle node is a VNF, which requires a computing power of $c_q$. Each link demands unitary bandwidth, and the maximum delay requirement of each SFC is set to two (see Fig. 2(a)).
2. An NFV infrastructure is created with $k + 2$ N-PoPs (nodes) and $2 \cdot k$ logical links (arcs). The $k$ central N-PoPs have a capacity $B$, and are linked to two other special purpose N-PoPs, called $o$ and $t$. The logical links have an arbitrarily large bandwidth and an insignificant delay (see Fig. 2(b)).

Fig. 2. Bin Packing instance reduction to the VNFPC problem.

The reduction has polynomial time complexity $O(|Q|)$. □

Theorem 1. VNFPC is an NP-complete problem.

Proof. By the instance reduction presented, if the BPP instance has a solution using $k$ bins, then the VNFPC has a solution using $k$ N-PoPs: each item of size $c_q$ allocated to a bin $j$ corresponds to placing the function of SFC $q$ into N-PoP $j$. Conversely, if the VNFPC has a solution using $k$ N-PoPs, then the corresponding BPP instance has a solution using $k$ bins: placing the function of SFC $q$ into N-PoP $j$ corresponds to allocating the item of size $c_q$ to bin $j$. Lemmas 1 and 2 complete the proof. □

4. Fix-and-optimize heuristic for the VNF placement & chaining problem

As shown in Section 3.3, the VNFPC optimization problem is NP-complete. To tackle this complexity and come up with high quality solutions efficiently, we introduce an algorithm that combines mathematical programming with heuristic search. Our algorithm is based on fix-and-optimize and Variable Neighborhood Search (VNS), techniques that have been successfully applied to solve large/hard optimization problems [7,8,19]. A comprehensive view of our proposal is shown in Algorithm 1.

4.1. Overview

In this section we provide a general view of our algorithm, using Fig. 3 as basis. It illustrates the search for a solution to the VNFPC problem, considering the SFCs and network infrastructure shown in Fig. 1.

In our algorithm, we first compute an initial, feasible solution (or configuration) $\chi$ to the optimization problem (see Section 4.3). In the example shown in Fig. 3, the initial configuration (1) has VNF 1 placed on N-PoPs 3 and 5; VNF 2 placed on N-PoP 5; and VNF 3 placed on N-PoP 3.

We then iteratively select a subset of N-PoPs, and enumerate the list of variables $y_{i,m,j} \in D \subseteq Y$ related to them; variables listed in $D$ will be subject to optimization, while others will remain unchanged (or fixed, hence fix-and-optimize). We take advantage of VNS to systematically build subsets of N-PoPs (Section 4.4). We also use a prioritization scheme to give preference to those subsets with higher potential for improvement (Section 4.5).

For each candidate subset of N-PoPs (processed in order of priority), we submit its decomposed set of variables $D$ along with $\chi$ to a mathematical programming solver (Section 4.6). Here the goal is to obtain a set of values for the variables listed in $D$, so that a better configuration is reached. We evaluate configurations according to the objective function of the model. In case there is no improvement, we roll back and pick the N-PoP subset that follows. We run this process iteratively until a better configuration $\chi'$ is found. Once that happens, we replace the current configuration with $\chi'$, and restart the search process (using $\chi'$ as basis). This loop continues until either we have explored the most promising combinations of N-PoP subsets, or a time limit is exceeded.

The loop explained above is illustrated in Fig. 3 through instances (1)–(10). Note that, for each instance, a different subset of N-PoPs (determined using VNS) is picked, and the resulting configuration (after being optimized by the solver) is evaluated using the model objective function. For example, instance (6) failed as it violated delay constraints, whereas the other instances did not reduce resource commitment. When an improved configuration is found

Fig. 3. We start with an initial solution (1), which satisfies both SFCs: VNF 1 is placed on N-PoPs 3 and 5, VNF 2 is placed on N-PoP 5, and VNF 3 on N-PoP 3. We then explore the space of possible VNF assignments using VNS, starting with neighborhood size k = 2 and incrementing it in each iteration. For each round within a given iteration, we pick a 2-tuple of N-PoPs from the 2-neighborhood (gray nodes in each round) and evaluate those N-PoPs as a prospective better solution. Observe that we first evaluate tuples of N-PoPs that form a connected graph (instances (2) and (3)), and then the remaining tuples ((4)–(6)). We increment the neighborhood size to k = 3 after evaluating all tuples in the 2-neighborhood (note the transition from instance (6) to (7)). An improvement (i.e., a solution with lower resource commitment) is reached in instance (9). The improved solution is shown in (10). It is obtained by removing the instances of VNF 1 from N-PoPs 3 and 5 and placing a single instance of VNF 1 on N-PoP 2, which now serves both SFCs. After that, we reset k and use the improved solution as basis for exploring another neighborhood.

(10), the search process is restarted, now taking as basis the configuration found.

4.2. Inputs and output

In addition to the optimization model input (described in the previous section), our algorithm takes five additional parameters. The first three are 1) a global execution time limit Tglobal for the algorithm itself, 2) an initial neighborhood size Ninit for the VNS meta-heuristic, and 3) an increment step Ninc for increasing the neighborhood size. These parameters are discussed in Section 4.4. The other parameters are 4) NoImprovmax, which represents the maximum number of rounds without improvement allowed per neighborhood, and 5) Tlocal, which indicates the maximum amount of time the solver may take for a single optimization run. They are discussed in Sections 4.5 and 4.6, respectively.
Our algorithm produces as output a configuration χ = {Y, A^N, A^L} for the VNFPC problem. This configuration may be null if a feasible solution does not exist, given the network topology in place and the SFCs submitted.

4.3. Obtaining an initial configuration

The first step of our algorithm (line 1) is generating a configuration χ to serve as basis for the fix-and-optimize search. To this end, we remove the objective function, thereby turning the optimization problem into a feasibility one. Then we use a commercial solver (CPLEX¹) to solve it. The resulting configuration satisfies all constraints, though it is not necessarily a high-quality one in terms of resources required. Observe that a solution to the VNFPC problem may not exist (line 2). In this case, no solution is returned. Note that heuristic procedures could be used for generating χ instead of a solver. However, we observed in practice that CPLEX quickly finds an initial solution for this problem.

¹ http://www-01.ibm.com/software/commerce/optimization/cplex-optimizer/

4.4. Variable Neighborhood Search

One important aspect of our algorithm is how to choose which subset of variables D ⊆ Y will be passed to the solver for optimization. As mentioned earlier, we approach this with VNS, which organizes the search space in k-neighborhoods. Each neighborhood is determined as a function of the incumbent solution and a neighborhood size k. In each round, a neighbor (a tuple having k elements) is picked and used to determine which subset of the incumbent solution will be subject to optimization. When an improvement is made, VNS uses the improved solution as basis, thus moving to a potentially more promising neighborhood (for achieving further improvements).
We build a neighborhood as a combination (unordered tuple) of any k N-PoPs, with at least one having a VNF assignment (line 5). Each tuple is then decomposed, resulting in the subset of variables D ⊆ Y that will be subject to optimization (variable decomposition is discussed in Section 4.6). We focus only on N-PoPs to build these tuples because the assignment of required network functions (a^N_{i,q,j}) and the chaining allocations (a^L_{i,j,q,k,l}) can be determined straightforwardly once the VNF placements (y_{i,m,j}) are defined.
The initial value for k, Ninit, is given as input to the algorithm (line 3). In the example shown in Fig. 3, we start with k = 2. In each iteration, we explore a neighborhood of size k (line 4) and, if no improvement is made, we increment k by Ninc units (line 21). Observe in Fig. 3 that, after exploring the 2-neighborhood space (sub-figures (2)–(6)), we make k ← k + 1 and start exploring

a neighborhood composed of 3-tuples of N-PoPs (instance (7)). The highest value k may assume is limited by the number of N-PoPs.
When an improvement occurs, we reset the neighborhood size (to Ninit) and restart the search process. This is illustrated in Fig. 3, in the transition from instance (10) to (11). Neighborhoods of the same size are not necessarily the same neighborhood. For example, consider sub-figures (2) and (11) in Fig. 3. Although the 2-tuples are composed of the same N-PoPs, each belongs to a different 2-neighborhood, because each is associated with a distinct configuration χ. The iteration over neighborhoods, and the evaluation of k-tuples within a neighborhood, continues until either Tglobal is exceeded or k exceeds the number of N-PoPs in the infrastructure (line 4).

Algorithm 1 Overview of the fix-and-optimize heuristic for the VNF placement and chaining problem.

Input:  Tglobal        global time limit
        Tlocal         time limit for each solver run
        Ninit          initial neighborhood size
        Ninc           increments for neighborhood size
        NoImprovmax    max. rounds without improvement
Output: χ: best solution found to the optimization model
 1: χ ← initial feasible configuration
 2: if a feasible configuration does not exist then fail else
 3:     k ← Ninit
 4:     while Tglobal is not exceeded and k ≤ |N^P| do
 5:         Vk ← current neighborhood, formed of unordered tuples of k nodes only,
                one of them (at least) having a VNF assignment in the incumbent configuration χ
 6:         Vk,adj ← neighbor tuples from Vk whose nodes (for each tuple) form a connected graph
 7:         Vk,any ← Vk \ Vk,adj
 8:         for each list V ∈ {Vk,adj, Vk,any}, while a better configuration is not found do
 9:             NoImprov ← 0
10:             while V ≠ ∅ and NoImprov ≤ NoImprovmax and a better configuration is not found do
11:                 v ← next unvisited, highest-priority neighbor tuple from V
12:                 D ← decomposed list of variables y_{i,m,j} from the nodes listed in neighbor tuple v
13:                 χ' ← configuration χ optimized by the solver, performed under time limit Tlocal,
                        with the variables not listed in D fixed
14:                 if χ' is a better configuration than χ then
15:                     update χ to reflect configuration χ'
16:                 else
17:                     NoImprov ← NoImprov + 1
18:                 end if
19:             end while
20:         end for
21:         if no improvement was made then k ← k + Ninc
22:         else k ← Ninit end if
23:     end while
24:     return χ
25: end if

4.5. Neighborhood selection and prioritization

The time the solver needs to process a configuration χ and a subset D ⊆ Y is often small. However, processing every candidate subset D from the entire k-neighborhood can be cumbersome. For this reason, we prioritize the neighbor tuples that might lead to a better configuration, or whose processing might be relatively less complex.
Our heuristic for prioritizing tuples in the k-neighborhood set Vk relies on three key observations. First, solving VNF allocations/relocations becomes less complex when N-PoPs are adjacent. In other words, setting values for variables a^N_{i,q,j} and a^L_{i,j,q,k,l} is relatively less complex when N-PoPs i and j are directly connected through a physical path.
To exploit this observation in our algorithm, we split a k-neighborhood set into two distinct ones. The first one (line 6) is formed by tuples whose participating N-PoPs form a connected graph. The second set (line 7) is formed by the remaining tuples in Vk (Vk,any = Vk \ Vk,adj), i.e., those tuples whose participating nodes form a disconnected graph. Observe in Fig. 3 that, for each k-neighborhood, we first process the tuples of adjacent N-PoPs (Vk,adj): sub-figures (2) and (3); (7) and (8); and (11)–(13). Then, we process the remaining tuples (Vk,any): sub-figures (4)–(6); (9); and (14)–(15).
The second key observation is that N-PoPs with a higher number of VNFs allocated, but fewer SFC requests assigned, are more likely candidates for optimization. Examples of such optimizations are removing (some) VNFs allocated to those N-PoPs, or merging those VNFs into a single N-PoP. We exploit this observation by establishing a tuple priority as a function of its residual capacity and processing higher-priority tuples first (line 11). The residual capacity of a tuple v is given by r : Vk → ℝ, defined as a ratio of assigned VNFs to placed ones (according to Eq. (9)).

$$ r(v) = \exp\Big( \sum_{i \in N^P} \sum_{q \in Q} \sum_{j \in N^S_q} a^N_{i,q,j} \; - \; \sum_{i \in N^P} \sum_{m \in F} \sum_{j \in U_m} y_{i,m,j} \Big) \qquad (9) $$

The third observation complements the previous one. As the priority of a neighbor decreases, it becomes less likely that it will lead to any optimization at all. The reason is simple: lower-priority tuples are often composed of overcommitted N-PoPs, for which removing or relocating allocated/assigned VNFs is less likely. Therefore, when processing a neighborhood, we can skip a certain fraction of low-priority neighbors with high confidence that no optimization opportunity will be missed.
To this end, our algorithm takes NoImprovmax as input. It indicates the maximum number of rounds without improvement that is allowed over a neighborhood subset. Before processing a given subset, we reset the counter of rounds without improvement, NoImprov (line 9). This counter is incremented for every round in which no improvement is made (lines 14–17). We stop processing the current neighborhood subset once NoImprov exceeds NoImprovmax (line 10).

4.6. Configuration decomposition and optimization

Decomposing a configuration (i.e., building a subset of variables D ⊆ Y from χ for optimization) means enumerating all y_{i,m,j} variables that can be optimized (line 12). It consists of listing all y_{i,m,j} variables related to each N-PoP in the current neighbor tuple v, considering all network function types m ∈ F and available function instances j ∈ U_m. Formally, we have D = { y_{i,m,j} ∈ Y | i ∈ v }.
Once the decomposition is done, the incumbent configuration χ and the set D are submitted to the solver (line 13). As mentioned earlier, the solver will consider the variables listed in D as free (for optimization), and those not listed as fixed (i.e., their values may not change). Observe that this restriction does not affect the variables related to the assignment of required network functions (a^N_{i,q,j}) and to chaining allocation (a^L_{i,j,q,k,l}).
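The neighborhood construction of Sections 4.4 and 4.5 and the decomposition of Section 4.6 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' Java implementation. The function names `build_neighborhood`, `priority` and `decompose` are hypothetical, and so is the toy topology. In the sketch, the sums of Eq. (9) are evaluated over the N-PoPs of the tuple v. This is an assumption, made because the priority is meant to differ between tuples. Following the third observation, we treat tuples with a lower r(v) (many placed VNFs, few assignments) as higher priority, which is our reading of the text.

```python
from itertools import combinations
from math import exp


def is_connected(nodes, edges):
    """True if the subgraph induced by `nodes` is connected (iterative DFS)."""
    nodes = set(nodes)
    if not nodes:
        return False
    adj = {n: set() for n in nodes}
    for u, w in edges:
        if u in nodes and w in nodes:
            adj[u].add(w)
            adj[w].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == nodes


def build_neighborhood(npops, edges, placed, k):
    """Split the k-neighborhood V_k into (V_k,adj, V_k,any).

    placed[i] is the number of VNF instances on N-PoP i (sum of y_{i,m,j});
    a tuple is kept only if at least one member hosts a VNF.
    """
    v_k = [t for t in combinations(sorted(npops), k)
           if any(placed.get(i, 0) > 0 for i in t)]
    v_adj = [t for t in v_k if is_connected(t, edges)]
    v_any = [t for t in v_k if not is_connected(t, edges)]
    return v_adj, v_any


def priority(tuple_nodes, assigned, placed):
    """Eq. (9)-style score exp(sum a^N - sum y), summed over the tuple's N-PoPs."""
    return exp(sum(assigned.get(i, 0) for i in tuple_nodes)
               - sum(placed.get(i, 0) for i in tuple_nodes))


def decompose(tuple_nodes, y_keys):
    """D = {y_{i,m,j} in Y | i in v}, with variables keyed as (i, m, j)."""
    members = set(tuple_nodes)
    return [key for key in y_keys if key[0] in members]


# Toy path topology 1-2-3 (hypothetical data).
edges = [(1, 2), (2, 3)]
placed = {1: 2, 3: 1}      # VNF instances per N-PoP
assigned = {1: 1, 3: 1}    # SFC assignments per N-PoP
v_adj, v_any = build_neighborhood([1, 2, 3], edges, placed, k=2)
# Lower r(v) means more idle VNFs, so those tuples are processed first.
order = sorted(v_adj, key=lambda t: priority(t, assigned, placed))
y_keys = [(1, "fw", 0), (1, "fw", 1), (2, "fw", 0), (3, "ids", 0)]
free_vars = decompose(order[0], y_keys)
```

In this toy case, tuple (1, 3) lands in V_k,any because N-PoPs 1 and 3 are not directly linked. Tuple (1, 2) is ranked first because N-PoP 1 hosts two instances but serves only one assignment. Since exp is monotone, sorting by the exponent alone gives the same order and avoids floating-point overflow on large instances.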

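The control flow of Algorithm 1 can be sketched as follows. After an improvement, k is reset to Ninit; otherwise it grows by Ninc, and a neighborhood is abandoned after NoImprovmax idle rounds. This is a hedged illustration, not the authors' code. The black-box `solve` stands in for the CPLEX run with the variables outside D fixed. Neighborhood filtering, the adjacency split and prioritization are left out to keep the sketch short. The toy problem is our own: it consolidates integer loads onto VNF instances of a hypothetical capacity `CAP`.

```python
import math
import time
from itertools import combinations


def fix_and_optimize(config, nodes, cost, solve, n_init=2, n_inc=1,
                     no_improv_max=15, t_global=10.0):
    """VNS-driven fix-and-optimize loop following the structure of Algorithm 1.

    solve(config, v) returns a candidate in which only the variables of the
    N-PoPs in tuple v may change (all others fixed), or None on failure.
    t_global is a wall-clock budget in seconds.
    """
    deadline = time.monotonic() + t_global
    k = n_init
    while time.monotonic() < deadline and k <= len(nodes):
        improved = False
        no_improv = 0
        for v in combinations(nodes, k):          # one k-neighborhood
            if no_improv > no_improv_max or time.monotonic() >= deadline:
                break                              # give up on this neighborhood
            candidate = solve(config, v)
            if candidate is not None and cost(candidate) < cost(config):
                config, improved = candidate, True
                break                              # restart around the new incumbent
            no_improv += 1
        k = n_init if improved else k + n_inc
    return config


# Toy instance (hypothetical): load served per N-PoP, instances of capacity CAP.
CAP = 4


def cost(loads):
    """Number of VNF instances needed: ceil(load / CAP) on every N-PoP."""
    return sum(math.ceil(load / CAP) for load in loads.values())


def solve(loads, v):
    """Toy sub-solver: move all load of the free N-PoPs in v onto the first one."""
    new = dict(loads)
    total = sum(new[i] for i in v)
    for i in v:
        new[i] = 0
    new[v[0]] = total
    return new


result = fix_and_optimize({1: 1, 2: 1, 3: 1, 4: 1}, [1, 2, 3, 4], cost, solve)
```

Starting from four half-idle instances, the loop merges the loads pairwise in 2-neighborhoods until a single instance on N-PoP 1 serves everything. In this toy the cost is a non-negative integer that strictly decreases on every accepted move, so the loop terminates even without the time limit.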
We also limit each optimization run to a certain amount of time, Tlocal (passed as parameter). This prevents a significant share of the global time limit Tglobal from being spent on a few but extremely complex χ and D instances.

5. Evaluation

In this section we evaluate the effectiveness of our math-heuristic algorithm in generating feasible solutions to the VNFPC problem. We compare its performance with that of polynomial (Lewin-Eytan et al. [11]) and non-polynomial approaches (namely, our previous work [12] and the CPLEX optimization solver). We analyze the placement & chaining solutions achieved, and also discuss them from the perspective of a globally optimal one obtained with CPLEX (whenever it is feasible to compute). To compare with Lewin-Eytan et al. [11], we adjusted our experiment instances to be applied to their algorithm without considering the chaining part (not approached in their research), thus enabling a fair comparison of results. We also establish a lower bound for the VNFPC optimization, computed by relaxing the output variables of the mathematical model and making them linear (instead of discrete). This bound is a solid reference for what the optimal solution to the VNFPC problem looks like in a given large scenario.
We used CPLEX v12.4 for solving the optimization models, and Java for implementing the algorithms. The experiments were run on the Windows Azure Platform, more specifically an Ubuntu 14.04 LTS VM instance featuring an Intel Xeon processor E5 v3 family, with 32 cores, 448 GB of RAM, and 6 TB of SSD storage. Next we describe our experiment setup, followed by the results obtained.

5.1. Setup

The physical network substrate was generated with Brite², following the Barabasi-Albert (BA-2) [1] model. We chose this model for the flexibility it offers in generating physical infrastructures large enough for the purposes of our evaluation. Observe that such infrastructures are in line with scenarios likely to adopt NFV, as network functions shift to cloud nodes highly distributed over the infrastructure [21]. Also, as we mentioned earlier, some surveys have shown that certain very large networks have around 2000 functions deployed [21], and that this number is often comparable to that of routers and switches in place [20]. The Barabasi-Albert model can generate network topologies that are comparable in size and complexity, thus enabling us to reasonably assess performance aspects of our algorithm (and weigh it against related ones).
The number of N-PoPs and physical paths between them varied in each scenario, ranging from 200 to 1000 N-PoPs, and from 793 to 3993 links. The computing power of each N-PoP is normalized and set to 1, and each link has a bandwidth capacity of 10 Gbps. The average delay between any pair of N-PoPs is 90 ms, a value adopted after a study by Choi et al. [3]. Our goal was to assign latency values comparable to those observed in real-world settings, for all pairs of endpoints in the infrastructure. In this context, the values adopted ensure that there exists at least one delay-bounded path between any pair of nodes (i.e., the delay of all links used in a single path should be less than 90 ms).
To fully grasp the efficiency and effectiveness of our approach, we carried out experiments considering a wide variety of scenarios and parameter settings. Due to space constraints, we concentrate on a subset of them. Our workload, for example, comprised from 1 to 80 SFCs. Each SFC has the following topology. A source endpoint is connected by a link (supporting 1 Gbps of traffic flow) to an instance of VNF X. After VNF X, the flow branches, each branch carrying 500 Mbps of output flow to a distinct instance of VNF Y. Finally, each flow links to a distinct sink endpoint. Despite their simplicity, we argue that such service chains are enough to evaluate the scalability of our model and represent the usual service chain use cases (i.e., lines or simple bifurcations). For details regarding resource consumption for different service chains, we kindly refer the reader to our previous work [12]. Instances of VNF X require a normalized computing power capacity of 0.125 for small loads and 1 for large loads, and have a processing delay of 0.6475 s. Instances of VNF Y require, in turn, 0.125 and 0.25 for small and large loads, and impose a processing delay of 7.0771 s. VNF processing delays were determined after a study by Dobrescu et al. [4].

² http://www.cs.bu.edu/brite/

In the sections that follow, we adopted the following parameter setting for our approach: Tglobal = 10,000 s, Tlocal = 200 s, Ninit = 2, Ninc = 1, and NoImprovmax = 15. In Section 5.4 we present an analysis of our approach considering other parameter settings.

5.2. Our heuristic algorithm compared to existing approaches

Fig. 4 provides an overview of the results achieved, focusing on the resource commitment of computed solutions and the time required to generate them. The main takeaway is that our approach generates significantly better solutions (closer to the lower bound) in a reasonable time, compared to existing ones. Observe in Fig. 4(a), for example, that our approach generated a solution at most 2.1 times away from the lower bound, in a more extreme case with 1000 N-PoPs and 180 requested functions. This is a significantly smaller gap than that of the other approaches, even considering their best cases. The measured distance was 4.97 times for Lewin-Eytan et al. [11], in the scenario with 200 N-PoPs and 120 requested functions, whereas for Luizelli et al. [12] it was 6.7 times, considering 200 N-PoPs and 60 functions.
Observe that finding an optimal solution (with CPLEX) is computationally infeasible even for very small instances (200 N-PoPs and fewer than 20 requested functions). Luizelli et al. [12] also fell short in scaling to such small instances, requiring more than 10k s for instances with around 60 requested functions.
An important caveat regarding the time required to obtain a solution is that, for our approach, we measured the time elapsed until it generated the highest-quality solution. Note, however, that our algorithm is allowed to continue running until Tglobal is passed or all neighborhoods are explored.

5.3. Qualitative analysis of generated solutions

Here we look more closely at the quality of generated solutions, analyzing their over-provisioning with regard to computing power and bandwidth. Observe from Fig. 5 that our proposal led to comparatively higher-quality solutions, minimizing the amount of allocated but unused computing power in N-PoPs. By "unused" we mean that resources were allocated to VNFs deployed on N-PoPs, but the capacity of the VNFs is not fully consumed by the assigned SFCs.
The performance of Lewin-Eytan et al. [11] is worse in this aspect mostly because their goal is to minimize not only resource commitment, but also the distance (hops) a flow must traverse to reach a required VNF. In addition, their theoretical model has simplifications that result in solutions which often over-commit resources. Since the authors did not approach VNF chaining in their model, bandwidth overhead could not be measured for their case.
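Theorem 1 rests on a reduction from bin packing: each SFC's VNF demand c_q is an item, and each central N-PoP of capacity B is a bin. The gap figures in Sections 5.2 and 5.4 can therefore be illustrated on that simplified view. The sketch below uses hypothetical function names and made-up demands. It computes the LP-relaxation bound of bin packing (total demand divided by capacity) and compares it with an integral first-fit placement. Because of this division, a relaxed bound need not be an integer, like the 11.75 VNFs mentioned in Section 5.4. The sketch is only an analogue: the paper's lower bound relaxes the full VNFPC model, including chaining and delay constraints, not just this packing core.

```python
import math
from fractions import Fraction


def relaxed_bound(demands, capacity):
    """LP relaxation of bin packing: items may be split, so the bound is sum / capacity."""
    return sum(Fraction(d) for d in demands) / Fraction(capacity)


def first_fit(demands, capacity):
    """Integral first-fit packing; returns the number of bins (N-PoPs) opened."""
    capacity = Fraction(capacity)
    free = []                              # remaining capacity of each open bin
    for d in map(Fraction, demands):
        for idx, room in enumerate(free):
            if d <= room:
                free[idx] = room - d
                break
        else:
            free.append(capacity - d)      # no open bin fits: open a new one
    return len(free)


demands = ["0.5", "0.5", "0.5"]    # hypothetical VNF demands c_q
lb = relaxed_bound(demands, 1)     # 3/2: fractional, as relaxed bounds can be
used = first_fit(demands, 1)       # two bins are needed in practice
gap = used / lb                    # distance factor to the bound
integral_lb = math.ceil(lb)        # rounding up still gives a valid bound
```

Exact `Fraction` arithmetic keeps the bound and the distance factor (here 4/3) free of floating-point noise. The sketch assumes every demand fits in a single bin.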

Fig. 4. Number of deployed VNFs and required time to compute a feasible placement and chaining.

Fig. 5. Analysis of resource commitment (computing power and bandwidth) of each solution generated.
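The over-provisioning measure discussed in Section 5.3 is the capacity allocated to deployed VNFs but not consumed by the SFCs assigned to them. It reduces to a simple aggregation. The sketch below is our own illustration, with hypothetical names and made-up data. It is not the measurement code behind Fig. 5.

```python
from collections import defaultdict


def unused_capacity(instances, assignments, demand):
    """Allocated-but-unused capacity per VNF instance.

    instances:   {instance_id: capacity allocated to that VNF instance}
    assignments: {sfc_id: instance_id serving that SFC}
    demand:      {sfc_id: capacity the SFC consumes on its instance}
    """
    used = defaultdict(float)
    for sfc, inst in assignments.items():
        used[inst] += demand[sfc]
    return {inst: cap - used[inst] for inst, cap in instances.items()}


# Hypothetical example: two instances, both SFCs served by the first one.
instances = {"x1": 1.0, "x2": 1.0}
assignments = {"sfc_a": "x1", "sfc_b": "x1"}
demand = {"sfc_a": 0.125, "sfc_b": 0.125}
idle = unused_capacity(instances, assignments, demand)
total_idle = sum(idle.values())
```

An instance with no SFCs assigned (x2 here) contributes its whole capacity to the total. This is the kind of waste that the second observation of Section 4.5 (many VNFs, few assignments) tries to remove.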

5.4. Sensitivity analysis

We conclude our analysis with a look at experiments varying the input parameter settings, whose results are summarized in Table 2. The cells in gray show how far our approach is from the lower bound (in terms of resources allocated), in a scenario with 600 N-PoPs and 120 functions. We focused on the following values for Tglobal: 600, 1,000, and 1,500 s; for Tlocal: 50, 60, and 100 s; and for Ninit: 2, 4, and 6.

Table 2
Assessment of the quality of generated solutions, and distance to lower bound, under various parameter settings.

                          Tlocal = 50 s      Tlocal = 60 s      Tlocal = 100 s
Tglobal       Ninit       VNFs   factor      VNFs   factor      VNFs   factor
600 s         2           59     5.24        51     4.53        51     4.53
              4           59     5.24        59     5.24        42     3.73
              6           59     5.24        59     5.24        33     2.93
1,000 s       2           32     2.84        16     1.42        15     1.33
              4           44     3.91        12     1.06        12     1.06
              6           40     3.55        18     1.60        12     1.06
1,500 s       2           20     1.77        12     1.06        12     1.06
              4           36     3.20        14     1.24        12     1.06
              6           36     3.20        15     1.33        12     1.06

Observe in Table 2 that our approach was at most 5.24 times away from the lower bound, having allocated 59 VNFs (compared to a lower bound of 11.75 VNFs). More importantly, this distance decreases as we provide more execution time for both the algorithm itself (Tglobal) and each optimization instance (Tlocal). In the best-case scenario, our approach was only 1.06 times away from the lower bound, allocating the smallest number of VNFs (12).
Observe also a trade-off when setting Tlocal. On the one hand, larger values enable solving each sub-problem instance with higher quality, as one may note in Table 2. On the other hand, smaller

values enable avoiding more complex sub-problem instances, for which CPLEX requires far more memory to solve. They also leave the algorithm with more global time available for exploring other promising sub-problem instances.
Finally, in Table 3 we illustrate how our algorithm improves the initial solution obtained with CPLEX when running a feasibility version of the problem (as discussed in Section 4.3) in the worst-case scenario, i.e., when deploying 240 VNFs altogether. One may observe that, on average, our approach is able to compute a solution up to 5.16 times better (regarding deployed VNFs) than the initial solution. In this table, we also include the number of iterations our algorithm runs until finding the best-reported solution. This number decreases as larger instances are handled by the algorithm, since the time needed to compute each sub-problem increases with the instance size. Further, the observed ratio (%) between the best-reported solution and the lower bound is 17.69 in the worst case. Recall that the lower bound for the VNFPC optimization is a solution computed by relaxing the output variables of the mathematical model and making them linear (instead of discrete).

Table 3
Assessment of the performance in the worst-case scenario (i.e., 240 VNFs) for all instance sizes.

Instance size   Initial solution   Best solution   Improv. factor   Iterations   Avg. Tlocal   Ratio (%)
200             155                30              5.16             153          6             15.38
400             121                27              4.48             91           30            0.38
600             109                27              4.03             77           88            0.38
800             102                31              3.29             55           121           19.23
1000            92                 38              2.42             60           130           46.15

6. Final considerations

While Network Function Virtualization (NFV) is increasingly gaining momentum, with promising benefits of flexible service function deployment and reduced operations & management costs, several challenges remain to be properly tackled before it can realize its full potential. One of these challenges, which has a significant impact on the NFV production chain, is effectively and efficiently deploying service functions while ensuring that service level agreements are satisfied and making wise allocation of network resources.
Among the various aspects involved, Virtual Network Function Placement and Chaining (VNFPC) is key to meeting this challenge. VNFPC, however, poses an important trade-off between quality, efficiency, and scalability that previously proposed solutions [11,12,17] have failed to satisfy simultaneously. This is no surprise, though, given that this problem is NP-complete, as we have demonstrated.
In this paper, we combined mathematical programming and meta-heuristic algorithms to propose a novel approach for solving VNFPC. The results achieved evidenced the potential of fix-and-optimize and Variable Neighborhood Search as building blocks for systematically exploring the VNFPC solution space. More importantly, they have shown the significant improvement of our approach over the state of the art across the quality (500% better solutions on average), efficiency (in the order of a couple of hours in the worst case), and scalability (to the order of hundreds) tripod.
Despite the progress achieved, much work remains. One promising direction is extending our approach to deal with constantly evolving network conditions. In such cases, assigned SFCs need to be reorganized (reassigned) in response to fluctuations in traffic demand (for example), so that service level agreements are not violated. We also intend to address metrics (such as delay and throughput) that vary significantly over time, an inherently challenging problem.

References

[1] R. Albert, A.-L. Barabási, Topology of evolving networks: local events and universality, Phys. Rev. Lett. 85 (2000) 5234–5237.
[2] M.F. Bari, S.R. Chowdhury, R. Ahmed, R. Boutaba, On orchestrating virtual network functions, in: Proceedings of the 2015 11th International Conference on Network and Service Management (CNSM), 2015, pp. 50–56.
[3] B.-Y. Choi, S. Moon, Z.-L. Zhang, K. Papagiannaki, C. Diot, Analysis of point-to-point packet delay in an operational network, Comput. Netw. 51 (2007) 3812–3827.
[4] M. Dobrescu, K. Argyraki, S. Ratnasamy, Toward predictable performance in software packet-processing platforms, in: Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI 2012), 2012.
[5] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman & Co., New York, NY, USA, 1979.
[6] A. Gember-Jacobson, R. Viswanathan, C. Prakash, R. Grandl, J. Khalid, S. Das, A. Akella, OpenNF: enabling innovation in network function control, in: Proceedings of the 2014 ACM Conference on Computer Communications (SIGCOMM 2014), ACM, New York, NY, USA, 2014, pp. 163–174.
[7] P. Hansen, N. Mladenović, Variable neighborhood search: principles and applications, Eur. J. Oper. Res. 130 (3) (2001) 449–467.
[8] S. Helber, F. Sahling, A fix-and-optimize approach for the multi-level capacitated lot sizing problem, Int. J. Prod. Econ. 123 (2) (2010) 247–256.
[9] J. Hwang, K.K. Ramakrishnan, T. Wood, NetVM: high performance and flexible networking using virtualization on commodity platforms, in: Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation (NSDI 2014), 2014, pp. 445–458.
[10] T. Kuo, B. Liou, J. Lin, M. Tsai, Deploying chains of virtual network functions: on the relation between link and server usage, in: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM 2016), San Francisco, USA, 2016.
[11] L. Lewin-Eytan, J. Naor, R. Cohen, D. Raz, Near optimal placement of virtual network functions, in: Proceedings of the IEEE International Conference on Computer Communications (INFOCOM 2015), IEEE, New York, NY, USA, 2015, pp. 1346–1354.
[12] M.C. Luizelli, L.R. Bays, L.S. Buriol, M.P. Barcellos, L.P. Gaspary, Piecing together the NFV provisioning puzzle: efficient placement and chaining of virtual network functions, in: Proceedings of the IFIP/IEEE International Symposium on Integrated Network Management (IM 2015), IEEE, New York, NY, USA, 2015, pp. 98–106.
[13] T. Lukovszki, M. Rost, S. Schmid, It's a match!: near-optimal and incremental middlebox deployment, SIGCOMM Comput. Commun. Rev. 46 (1) (2016) 30–36.
[14] T. Lukovszki, S. Schmid, Online Admission Control and Embedding of Service Chains, Springer International Publishing, Cham, pp. 104–118. doi:10.1007/978-3-319-25258-2_8.
[15] V. Maniezzo, T. Stützle, S. Voß, Hybridizing metaheuristics and mathematical programming, Ann. Inf. Syst. 10 (2009).
[16] S. Mehraghdam, M. Keller, H. Karl, Specifying and placing chains of virtual network functions, in: Proceedings of the 2014 IEEE 3rd International Conference on Cloud Networking (CloudNet), 2014, pp. 7–13.
[17] H. Moens, F. De Turck, VNF-P: a model for efficient placement of virtualized network functions, in: Proceedings of the 10th International Conference on Network and Service Management (CNSM 2014), 2014, pp. 418–423.
[18] M. Rost, S. Schmid, Service chain and virtual network embeddings: approximations using randomized rounding, CoRR abs/1604.02180 (2016).
[19] F. Sahling, L. Buschkühl, H. Tempelmeier, S. Helber, Solving a multi-level capacitated lot sizing problem with multi-period setup carry-over via a fix-and-optimize heuristic, Comput. Oper. Res. 36 (9) (2009) 2546–2553.
[20] V. Sekar, S. Ratnasamy, M.K. Reiter, N. Egi, G. Shi, The middlebox manifesto: enabling innovation in middlebox deployment, in: Proceedings of the 10th ACM Workshop on Hot Topics in Networks (HotNets-X), ACM, New York, NY, USA, 2011, pp. 21:1–21:6.
[21] J. Sherry, S. Hasan, C. Scott, A. Krishnamurthy, S. Ratnasamy, V. Sekar, Making middleboxes someone else's problem: network processing as a cloud service, ACM SIGCOMM Comput. Commun. Rev. 42 (4) (2012) 13–24.
[22] M. Yu, Y. Yi, J. Rexford, M. Chiang, Rethinking virtual network embedding: substrate support for path splitting and migration, ACM SIGCOMM Comput. Commun. Rev. 38 (2) (2008) 17–29.
[23] Y. Zhu, M. Ammar, Algorithms for assigning substrate network resources to virtual network components, in: Proceedings of the 25th IEEE International Conference on Computer Communications (INFOCOM 2006), 2006, pp. 1–12.

Marcelo Caggiani Luizelli is a Ph.D. candidate at the Institute of Informatics (INF) of the Federal University of Rio Grande do Sul (UFRGS), Brazil. He
holds an M.Sc. degree in Computer Science from the Federal University of Rio Grande do Sul (2014). Currently, he is a visiting student at the Computer
Science Department of Technion University (Israel) and a visiting research student at Bell Labs (Israel). His research interests are computer networks,
algorithms, and optimization on NFV and SDN.

Weverton Luis da Costa Cordeiro is a Postdoctoral Research Fellow at the Institute of Informatics of the Federal University of Rio Grande do
Sul, Brazil. He holds a Ph.D. degree in Computer Science from the Federal University of Rio Grande do Sul (2014). His research interests include
large scale distributed systems, information technology service management, software defined networks, network security and monitoring, future
internet, and mobile ad hoc networks.

Luciana Salete Buriol received her B.Sc. in Computer Science (1998) from the Federal University of Santa Maria, Brazil. Her M.Sc. (2000) and Ph.D.
(2003) were obtained at the State University of Campinas (UNICAMP), Sao Paulo, Brazil. In 2001 and 2002 she spent 15 months as a visiting scholar
at the Algorithms and Optimization Research Department, AT&T Labs Research, USA. In 2004 and 2005 she was a postdoc at the Computer Science
Department, University of Rome, Italy. Since 2006 she has been an Associate Professor of Computer Science at the Federal University of Rio Grande do Sul
(UFRGS), Porto Alegre, Brazil. Her main research interests are in optimization and algorithms. Buriol is currently a CNPq (Brazilian National Research
Council) Advanced Fellow, was President of ALIO in the period 2012–2014, and Vice-President of IFORS (International Federation of Operational
Research Societies) in the period 2016–2018.

Luciano Paschoal Gaspary holds a Ph.D. in Computer Science (PPGC/UFRGS, 2002) and serves as Associate Professor at the Institute of Informatics,
UFRGS. From 2008 to 2014, he worked as Director of the National Laboratory on Computer Networks (LARC) and, from 2009 to 2013, was Managing
Director of the Brazilian Computer Society (SBC). Prof. Gaspary has been involved in various research areas, mainly Computer Networks, Network
Management and Computer System Security. He is the author of more than 120 full papers published in leading peer-reviewed publications and has a
history of dedication to research activities such as organization of scientific events, participation in the TPC of relevant IEEE, IFIP and ACM conferences
and symposia, and participation as an editorial board member of various journals. He is a member of the SBC, IEEE, and ACM. More information
about Prof. Gaspary can be found at http://www.inf.ufrgs.br/∼paschoal/.

APPENDIX D PAPER PUBLISHED AT IFIP/IEEE IM 2017

• Title: The Actual Cost of Software Switching for NFV Chaining


• Conference: IFIP/IEEE Integrated Network Management Symposium (IM 2017)
• Type: Main track (full-paper)
• Qualis: A2
• Date: May 8-12, 2017
• Held at: Lisbon, Portugal
Abstract. Network Function Virtualization (NFV) is a novel paradigm that enables flexible and scalable implementation of network services on cloud infrastructure. An important enabler for the NFV paradigm is software switching, which should satisfy rigid network requirements such as high throughput and low latency. Despite recent research activities in the field of NFV, not much attention has been given to understanding the costs of software switching in NFV deployments. Existing approaches for traffic steering and orchestration of virtual network functions either neglect the cost of software switching or assume that it can be provided as an input, and therefore real NFV deployments of network services are often suboptimal. In this work, we conduct an extensive and in-depth evaluation that examines the impact of service chaining deployments on Open vSwitch, the de facto standard software switch for cloud environments. We provide insights on network performance metrics such as throughput, CPU utilization and packet processing, while considering different placement strategies of a service chain. We then use these insights to provide an abstract generalized cost function that accurately captures the CPU switching cost of deployed service chains. This cost is an essential building block for any practical optimized placement management and orchestration strategy for NFV service chaining.
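The cost function summarized above is fitted in Section IV-A of the paper as a log-linear regression, $\log(C_X) = \alpha \cdot \log(\varphi|_n) + \beta \cdot \log(\varphi|_p) + \gamma$, with per-scenario coefficients listed in Table I. The following is a minimal sketch, not the authors' code: it uses the $C_{gs}$ row of Table I, and since the logarithm base and the unit of $\varphi|_p$ are not given next to the formula, it treats both as assumptions. It therefore focuses on a quantity that does not depend on them: the ratio between two costs, $(n_2/n_1)^{\alpha}(p_2/p_1)^{\beta}$, in which $\gamma$, the base, and the packet-rate unit all cancel.

```python
import math

# Coefficients (alpha, beta, gamma) of the C_gs sub-function (gather placement,
# 100-Byte packets), copied from Table I.
CGS = {
    "kernel": (0.586, 0.858, -1.789),
    "dpdk": (0.370, 0.467, 1.543),
}


def log_cost(n, p, coeffs, base):
    """Evaluate alpha*log(n) + beta*log(p) + gamma in the given log base.

    Assumption of this sketch: the caller chooses the base and the unit of p
    (packets per second); absolute values only make sense once both are fixed.
    """
    alpha, beta, gamma = coeffs
    return alpha * math.log(n, base) + beta * math.log(p, base) + gamma


def cost_ratio(n1, p1, n2, p2, coeffs):
    """Return C(n2, p2) / C(n1, p1).

    Subtracting the two log-costs removes gamma; exponentiating in the same
    base gives (n2/n1)**alpha * (p2/p1)**beta, whatever the base or unit of p.
    """
    alpha, beta, _gamma = coeffs
    return (n2 / n1) ** alpha * (p2 / p1) ** beta


if __name__ == "__main__":
    for tech, coeffs in CGS.items():
        longer = cost_ratio(5, 1e5, 10, 1e5, coeffs)  # double the chain length
        busier = cost_ratio(5, 1e5, 5, 2e5, coeffs)   # double the packet rate
        print(f"{tech}: 2x chain -> x{longer:.2f}, 2x pps -> x{busier:.2f}")
```

With these coefficients, doubling the chain length multiplies the $C_{gs}$ cost by about 1.50 for kernel OVS and about 1.29 for DPDK-OVS; since all $\alpha$ and $\beta$ values in this row are below 1, the modelled cost grows sublinearly in both chain length and packet rate.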
The Actual Cost of Software Switching for NFV Chaining

Marcelo Caggiani Luizelli∗ (Federal University of Rio Grande do Sul, Brazil; mcluizelli@inf.ufrgs.br)
Danny Raz (Nokia, Bell Labs; danny.raz@nokia-bell-labs.com)
Yaniv Sa’ar (Nokia, Bell Labs; yaniv.saar@nokia-bell-labs.com)
Jose Yallouz∗ (Technion Israel Institute of Technology, Israel; jose@tx.technion.ac.il)

∗ Work done while at Nokia, Bell Labs.

Abstract: Network Function Virtualization (NFV) is a novel paradigm that enables flexible and scalable implementation of network services on cloud infrastructure. An important enabler for the NFV paradigm is software switching, which should satisfy rigid network requirements such as high throughput and low latency. Despite recent research activities in the field of NFV, not much attention has been given to understanding the costs of software switching in NFV deployments. Existing approaches for traffic steering and orchestration of virtual network functions either neglect the cost of software switching or assume that it can be provided as an input, and therefore real NFV deployments of network services are often suboptimal.

In this work, we conduct an extensive and in-depth evaluation that examines the impact of service chaining deployments on Open vSwitch, the de facto standard software switch for cloud environments. We provide insights on network performance metrics such as latency, throughput, CPU utilization and packet processing, while considering different placement strategies of a service chain. We then use these insights to provide an abstract generalized cost function that accurately captures the CPU switching cost of deployed service chains. This cost is an essential building block for any practical optimized placement management and orchestration strategy for NFV service chaining.

I. INTRODUCTION

Traditional telecommunication and service networks heavily rely on proprietary hardware appliances (also called middleboxes) that implement Network Functions (NFs). These appliances support a variety of NFs ranging from security (e.g., firewall, intrusion detection/prevention system) through performance (e.g., caching and proxy [1]) to wireless and voice functions (e.g., vRAN and IMS [2]).

Due to the increasing demand for network services, providers are deploying an increasing number of NFs. This requires more and more dedicated hardware boxes, and thus deploying new NFs is becoming a challenging (and cumbersome) task, which directly leads to high operational costs. The emerging paradigm of Network Function Virtualization (NFV) aims to address these problems by reshaping the architectural design of the network [2]. Essentially, NFV leverages traditional virtualization by replacing NFs with software counterparts, referred to as Virtual Network Functions (VNFs). Virtualization technologies enable the consolidation of VNFs onto standard commodity hardware (e.g., servers, switches, and storage) and support dynamic situations in a flexible and cost-effective way (e.g., on-demand service provisioning).

In current hardware-based networks, operators are required to manually route cables and compose physical chains of middleboxes in order to provide services, a problem known as Service Function Chaining (SFC) [3]. The manual composition of SFCs is error prone and static in nature (i.e., hard to change or physically relocate), and therefore expensive. On the other hand, NFV-based service chaining (forwarding graphs) enables flexible placement and management of services, which allows a much more efficient utilization of resources, since the same computation resources can be consumed by different NFs in a dynamic way. Thus, a key expected success criterion for NFV is efficient resource utilization and reduced cost of operation.

However, placement of service chains in NFV-based infrastructure is not trivial and introduces new challenges that need to be addressed. First, NFV is an inherently distributed network design based on small cloud nodes spread over the network. Second, even within a single (small) data center, network services might face performance penalties and limitations on critical network metrics, such as throughput, latency, and jitter, depending on how network functions are chained and deployed. This is one of the most interesting challenges for network providers in the shift to NFV, namely, identifying good placement strategies that minimize the provisioning and operation cost of deploying a service chain.

OpenStack [4] is the most popular open source cloud orchestration system; its scheduling (placement) component, called nova-scheduler, can be used to place a sequence of VNFs. When doing so, the placement objective can be either load balancing (i.e., distributing VNFs between resources) or energy conserving (i.e., gathering VNFs on a selected resource). Consider the two placement strategies presented in Figure 1. In this example, we are given three identical servers A, B and C, and three identical service chains $\{\varphi_1, \varphi_2, \varphi_3\}$. Each service chain $\varphi_i$ (for $i = 1, 2, 3$) is tagged with a fixed amount of required traffic and is composed of three chained VNFs: $\varphi_i = \langle \varphi_{i1} \rightarrow \varphi_{i2} \rightarrow \varphi_{i3} \rangle$. Figure 1(a) illustrates the first placement strategy (referred to as “gather”), where all VNFs composing a single chain are deployed on the same server. Note that in the gather case most of the traffic steering is done by the server’s internal virtual switching. Figure 1(b) illustrates the second placement strategy (referred to as “distribute”), where each VNF is deployed on a different server, and therefore most of the traffic steering is done between servers by external switches.
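The two strategies of Figure 1 can be made concrete with a small sketch. It is a hypothetical illustration, not part of the paper: the helpers `gather`, `distribute`, and `steering_hops` are names introduced here, VNF $\varphi_{ij}$ is encoded as the zero-based pair `(i, j)`, and the ingress and egress hops to the starting and end points are ignored. It reproduces the layout of Figure 1 and counts, per strategy, how many VNF-to-VNF hops stay inside a server (internal virtual switching) and how many cross servers (external switches).

```python
from collections import Counter

# Hypothetical encoding of Figure 1 (not from the paper).
SERVERS = ["A", "B", "C"]  # three identical servers
NUM_CHAINS = 3             # chains phi_1, phi_2, phi_3
CHAIN_LEN = 3              # each chain is phi_i1 -> phi_i2 -> phi_i3


def gather(num_chains, chain_len, servers):
    """Gather: all VNFs of chain i share one server (chain i -> servers[i])."""
    if num_chains > len(servers):
        raise ValueError("this sketch assumes one server per chain")
    return {(i, j): servers[i] for i in range(num_chains) for j in range(chain_len)}


def distribute(num_chains, chain_len, servers):
    """Distribute: VNF j of every chain goes to servers[j], so no two VNFs of a chain share a server."""
    if chain_len > len(servers):
        raise ValueError("distribute needs at least one server per VNF of a chain")
    return {(i, j): servers[j] for i in range(num_chains) for j in range(chain_len)}


def steering_hops(placement, num_chains, chain_len):
    """Classify each VNF-to-VNF hop as internal (same server) or external."""
    hops = Counter()
    for i in range(num_chains):
        for j in range(chain_len - 1):
            same_server = placement[(i, j)] == placement[(i, j + 1)]
            hops["internal" if same_server else "external"] += 1
    return hops


if __name__ == "__main__":
    for name, place in (("gather", gather), ("distribute", distribute)):
        p = place(NUM_CHAINS, CHAIN_LEN, SERVERS)
        load = Counter(p.values())  # number of VNFs hosted per server
        print(name, dict(load), dict(steering_hops(p, NUM_CHAINS, CHAIN_LEN)))
```

Both placements host three VNFs per server, so the VNFs themselves need the same resources; gather keeps all six inter-VNF hops on the servers' internal virtual switches, while distribute sends all six across servers. This is the difference in switching cost that the evaluation below sets out to quantify.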

Fig. 1: Example of strategies to deploy 3 given service chains. (a) Gather placement strategy: server A hosts $\varphi_{11}, \varphi_{12}, \varphi_{13}$; server B hosts $\varphi_{21}, \varphi_{22}, \varphi_{23}$; server C hosts $\varphi_{31}, \varphi_{32}, \varphi_{33}$. (b) Distribute placement strategy: server A hosts $\varphi_{11}, \varphi_{21}, \varphi_{31}$; server B hosts $\varphi_{12}, \varphi_{22}, \varphi_{32}$; server C hosts $\varphi_{13}, \varphi_{23}, \varphi_{33}$. In both panels, traffic flows from a starting point to an end point.

Fig. 2: End-to-end latency on different deployment strategies. (Plot of latency (ms) versus chaining size, 1 to 30, for Kernel-Gather, Kernel-Distribute, DPDK-Gather, and DPDK-Distribute.)

One can immediately see that these two placement strategies require the same amount of resources for the VNFs to operate; however, they differ in the cost of their software switching (in terms of CPU consumption). Additionally, it is not completely clear how the network would perform under each of the placement strategies with respect to metrics such as throughput and latency. For instance, Figure 2 depicts the end-to-end service latency, given a set of servers that are installed with either kernel-OVS or DPDK-OVS. For each installed environment, we evaluate the two placement strategies discussed above. As expected, latency increases as the chain size increases; note, however, that there is a non-negligible difference between the two placement strategies. When the servers are installed with kernel-OVS (resp. DPDK-OVS), the service latency of the distribute placement is 50% (resp. 100%) higher than the service latency of the gather placement.

Going back to Figure 1, in order to decide which placement strategy is better, we need to identify the specific optimization criteria of interest. In this paper we focus on accurately estimating the virtual switching cost. For NFV-based infrastructure, virtual switching is an essential building block that, on the one hand, enables flexible communication between VNFs but, on the other hand, comes with a cost in resource utilization. Therefore, assessing, understanding, and crafting a monolithic CPU cost function for software switching in an NFV-based environment is an extremely important task and a crucial step in order to: (i) efficiently guide VNF placement in an NFV-based infrastructure; (ii) understand how software switching impacts deployed services, namely, analyze and comprehend how latency- and/or throughput-sensitive services behave; and (iii) whenever possible, reduce the costs to the NFV provider and, ultimately, foster the development of cost-optimized deployment solutions.

In this paper, we develop a model to estimate the cost of software switching for different placement strategies over NFV-based infrastructure. We conduct an extensive and in-depth evaluation, measuring the performance and analyzing the impact of deploying a service chain on Open vSwitch, the de facto standard software switch for cloud environments [5]–[7]. Based on our evaluation, we then craft a generalized abstract cost function that accurately captures the CPU cost of network switching. The main contributions of this paper can be summarized as follows: (i) a comprehensive evaluation of placement strategies of service chains: based on a real NFV-based infrastructure, we examine our placement strategies and assess performance metrics such as throughput, packet processing, and resource consumption¹; and (ii) a model of the cost of network switching: given an arbitrary number of service chains, our model accurately predicts the CPU cost of the network switching they require.

¹ Unlike other performance studies, our goal is not to deliver the best performance but rather to evaluate the performance of a given NF configuration.

The rest of the paper is organized as follows. In Section II we define our model, and in Section III we provide an extensive and in-depth evaluation of software switching performance. In Section IV we analyze the results and build an abstract generalized cost function for our model. In Section V we discuss related work, and finally, in Section VI we conclude our work and provide future directions.

II. MODEL DEFINITION

In this section we define our model to estimate the cost of software switching for different placement strategies.

For a server $S$, we denote by $S^t$ the maximum throughput that server $S$ can support (i.e., the wire limit of the installed NIC). Following the recommended best practice for virtualization-intensive environments [8], NFV servers must be configured with two disjoint sets of CPU cores, one for the hypervisor and the other for the VNFs to operate. We denote by $S^h$ (and $S^v$) the number of CPU cores that are allocated and reserved solely for the hypervisor (and guests, respectively) to operate. The total number of CPU cores installed in server $S$ is denoted by $S^c$. Throughout this paper, unless explicitly stated otherwise, we assume that we are given a set of $k$ servers $\mathcal{S} = \{S_1, S_2, \ldots, S_k\}$.

A service chain is defined to be an ordered set of VNFs $\varphi = \langle \varphi_1 \rightarrow \varphi_2 \rightarrow \ldots \rightarrow \varphi_n \rangle$. We denote by $\varphi|_n$ the length of the service chain, and define $\varphi|_p$ (and $\varphi|_s$) to be the number of packets per second (and average packet size, respectively) that
service chain $\varphi$ is required to process. Note that $\varphi|_p$ and $\varphi|_s$ are properties of service chain $\varphi$; namely, given two service chains $\varphi$ and $\psi$, if $\varphi|_p \neq \psi|_p$ or $\varphi|_s \neq \psi|_s$, then $\varphi \neq \psi$. Throughout the paper, unless explicitly stated otherwise, we assume that we are given a set of $m$ identical service chains $\Phi = \{\varphi_1, \varphi_2, \varphi_3, \ldots, \varphi_m\}$, i.e., $\forall i, j \in \{1, \ldots, m\}: \varphi_i = \varphi_j$.

We define $P: \Phi \rightarrow \mathcal{S}^k$ to be a placement function that, for every service chain $\varphi \in \Phi$, maps every VNF $\varphi_i \in \varphi$ to a server $S_j \in \mathcal{S}$. We identify two particularly interesting placement functions:

(i) $P_g$, which we call gather placement, where for every service chain the placement function deploys all VNFs $\varphi_i \in \varphi$ on the same server, namely:
$$\forall \varphi \in \Phi \;\; \forall \varphi_i, \varphi_j \in \varphi: \; P_g(\varphi_i) = P_g(\varphi_j)$$

(ii) $P_d$, which we call distribute placement, where for every service chain the placement function deploys each VNF $\varphi_i \in \varphi$ on a different server, namely:
$$\forall \varphi \in \Phi \;\; \forall \varphi_i, \varphi_j \in \varphi: \; i \neq j \rightarrow P_d(\varphi_i) \neq P_d(\varphi_j)$$

As explained above, both placement functions under consideration follow OpenStack and the objectives implemented by nova-scheduler, which can be either load balancing, distributing VNFs between resources, or energy conserving, gathering VNFs on a selected resource. Figure 1 illustrates our deployment strategies: Figure 1(a) depicts a deployment of VNFs that follows the gather placement function $P_g$, and Figure 1(b) depicts a deployment of VNFs that follows the distribute placement function $P_d$.

For a given placement function $P$ that deploys all service chains in $\Phi$ on the set of servers $\mathcal{S}$, we define $C: P \rightarrow (\mathbb{R}^+)^k$ to be a cpu-cost function that maps placement $P$ to the required CPU per server. We say that the cpu-cost function is feasible if there are enough resources to implement it. Namely, cpu-cost function $C$ is feasible with respect to a given placement $P$ if, for every server $S_j \in \mathcal{S}$, the function does not exceed the number of CPU cores installed in the server:
$$\forall S_j \in \mathcal{S}: \; C(P)_j \leq S_j^c$$

Note that our main focus is the evaluation of the cost functions with respect to hypervisor performance; therefore, we evaluate deployments of VNFs that are fixed to do the minimum required processing, i.e., forward the traffic from one virtual interface to another.

III. DEPLOYMENT EVALUATION

In this section, we evaluate the performance of software switching for NFV chaining considering several performance metrics such as throughput, CPU utilization, and packet processing. We begin by describing our environment setup, followed by a discussion on performance metrics and possible bottlenecks. We evaluate two types of OVS installations, Linux kernel and DPDK, and compare two types of VNF placements: the gather placement function $P_g$ versus the distribute placement function $P_d$.

A. Setup

Our environment setup consists of two high-end HP ProLiant DL380p Gen8 servers; each server has two Intel Xeon E5-2697v2 processors, and each processor is made of 12 physical cores at 2.7 GHz. One server is our Design Under Test (DUT), and the other is our traffic generator. The servers have two NUMA nodes, each with 192 GBytes of RAM (384 GBytes in total), and an Intel 82599ES 10 Gbit/s network device with two network interfaces (physical ports). We disabled HyperThreading in our servers in order to have more control over core utilization.

In both types of OVS installations (Linux kernel and DPDK), we isolate CPU cores and separate them into two disjoint sets, $S^c = 24 = S^h + S^v$, i.e., CPU cores used for management and CPU cores allocated for the VNFs to operate. This a priori separation between the two disjoint sets of CPU cores plays an important role in the behavior of CPU consumption (and packet processing). In our experiments, the size of the set of cores given to the hypervisor ($S^h$) varies from 2 to 12 physical cores, while the remainder is given to the VNFs (i.e., $S^v$ ranges from 22 to 12). The exact values depend on the type of installation (Linux kernel or DPDK).

The DUT is installed with CentOS version 7.1.1503, Linux kernel version 3.10.0. All guest operating systems are Fedora 22, Linux kernel version 4.0.4, running on top of qemu version 2.5.50. Each VNF is configured with a single virtual CPU pinned to a specific physical core and 4 GBytes of RAM. Network offload optimizations (e.g., TSO, GSO) are disabled on all network interfaces in order to avoid any noise in the analysis and to provide a fair comparison between the two types of OVS installations. We evaluated Open vSwitch version 2.4 in both kernel mode and DPDK. For the latter, we compiled OVS against DPDK version 2.2.0 (without shared memory mechanisms, in compliance with the rigid security requirements imposed by NFV). Packet generation is performed on a dedicated server that is interconnected to the DUT server with one network interface to send data and another network interface to receive. In all experiments, we generate the traffic with Sockperf version 2.7 [9], using 50 TCP flows.

Note that our objective is to provide an analytic model that captures the cost of software switching. To examine and provide a clear understanding of the parameters that impact the cost of software switching, we need to simplify our environment by removing optimizations such as network offloads and by fixing resource allocation. For achieving optimal performance, the reader is referred to [8].

We evaluate the placement functions defined in Section II (i.e., the gather placement function $P_g$ and the distribute placement function $P_d$) on our DUT as follows. We vary the number of simultaneously deployed VNFs from 1 to 30 for each of the placement functions. All VNFs forward their internal traffic between two virtual interfaces (using Linux IP forwarding), which are configured in different sub-domains. Figure 3 illustrates deployments of VNFs on a single server (our DUT). Figure 3(a) depicts the traffic flow of the gather placement function $P_g$. Ingress traffic arrives from the physical NIC at the OVS, which forwards it to the first vNIC of the first VNF. The VNF then returns the traffic to the OVS through its second vNIC, and the OVS forwards the traffic to the following VNF, composing a chain that eventually egresses the traffic through the second physical NIC. On the other hand,
Figure 3(b) depicts the traffic flow of the distribute placement function $P_d$. For this placement, it is the responsibility of the traffic generator to spread the workload between the VNFs. Ingress traffic arrives from the physical NIC at the OVS, which forwards it to one of the VNFs through its first vNIC. The VNF then returns the traffic to the OVS through its second vNIC, and the OVS egresses the traffic through the second physical NIC.

Fig. 3: Given a set of $m$ service chains $\Phi$, illustrating deployment of VNFs on a single server (our DUT). (a) Following gather placement function $P_g$: the VNFs $\varphi_{11}, \varphi_{12}, \varphi_{13}, \ldots, \varphi_{1n}$ of one chain, connected through Open vSwitch on the DUT. (b) Following distribute placement function $P_d$: the VNFs $\varphi_{11}, \varphi_{21}, \varphi_{31}, \ldots, \varphi_{m1}$, one per chain, connected through Open vSwitch on the DUT.

B. Evaluating Packet Intense Traffic

In order to discuss the utilization of resources, we measure and analyze the results of our DUT for several configurations while it receives an intense network workload. To generate an intense network workload, we generate traffic in which each packet is composed of 100 Bytes (to avoid reaching the NIC’s wire limit). We examine the behaviour of throughput, packet processing and CPU consumption for increasing chain size, using both placement functions, i.e., $P_g$ and $P_d$.

The six graphs in Figure 4 (and Figure 5) depict CPU consumption, packet processing, and throughput performance on a DUT that is installed with kernel OVS (and DPDK-OVS, respectively). The three graphs on the top (Figures 4(a), 4(b), and 4(c) for kernel OVS and Figures 5(a), 5(b), and 5(c) for DPDK-OVS) present the results following the gather placement function $P_g$, and the three graphs on the bottom (Figures 4(d), 4(e), and 4(f) for kernel OVS and Figures 5(d), 5(e), and 5(f) for DPDK-OVS) present the results following the distribute placement function $P_d$.

We measure throughput as the total traffic that the traffic generator successfully sends and receives after routing through the DUT. To measure packet processing, we aggregate the number of received packets on all interfaces of the OVS (including TCP acknowledgments). For CPU consumption, we present the results both for the CPU cores allocated to the hypervisor to manage and support the VNFs and for the CPU cores allocated for the VNFs to operate.

1) Packet Processing and Throughput: A key factor in the behavior of our environment is the a priori separation between two disjoint sets of CPU cores ($S^h$ and $S^v$). Thus, in our experiments, we vary the number of CPU cores allocated to the OVS (hypervisor). For ease of presentation, we show only several selected values.

Figure 4(a) depicts the average throughput for the gather placement function $P_g$, and Figure 4(d) for the distribute placement function $P_d$. For the distribute placement function $P_d$, the more VNFs we deploy on the server, the higher the average throughput, while for the gather placement function $P_g$ the average throughput decreases. Figures 4(b) and 4(e) depict packets per second for both placement functions, $P_g$ and $P_d$. The results show that OVS scales seamlessly to support 5-10 VNFs; at that point, however, OVS reaches saturation, and we observe mild degradation as the number of deployed VNFs keeps increasing.

Figures 5(a) and 5(d) (and Figures 5(b) and 5(e)) depict the average throughput (and packets per second, respectively) of our two placement functions, $P_g$ and $P_d$, on a DUT that is installed with DPDK-OVS. The behavior of DPDK-OVS is similar to that of kernel OVS, except that it delivers better network performance.

2) CPU Consumption: Figures 4(c) and 4(f) depict the average CPU consumption of OVS in kernel mode. For both placement functions, the CPU consumption of the VNFs is bounded by the number of cores allocated for management, $S^h$. A tighter bound on the CPU consumed by the VNFs (for networking) is a function of the CPU consumed by the hypervisor to manage the traffic; namely, the total CPU consumed by the VNFs is proportional to the total CPU consumed by the hypervisor in order to steer the traffic.

Comparing CPU utilization between the two placement functions, we observe that both placement strategies behave similarly for short service chains with little traffic. However, for longer service chains with increasing traffic requirements, the behaviour of the two placements differs. Namely, in the gather placement, the CPU consumed by the VNFs is almost identical to the CPU consumed by the hypervisor, whereas in the distribute placement the CPU consumed by the VNFs is 70% to 50% of the CPU consumed by the hypervisor. This observation suggests that, in order to achieve a CPU cost-efficient placement, different traffic might require different placement strategies (as will be shown in the results of our analysis in Section IV).

Figures 5(c) and 5(f) depict the average CPU consumption of DPDK-OVS. For both placement functions, the CPU consumed by the hypervisor is fixed and derived from the DPDK poll-mode driver (with an additional CPU core for managing other non-networking provisioning tasks). Similarly to the behavior observed for OVS in kernel mode, CPU utilization (for the two placement functions) behaves similarly for a small number of VNFs and correlates with the amount of traffic being handled (derived from the throughput analysis).

Fig. 4: Experiment results showing throughput, packet processing and CPU consumption for traffic generated in 100-Byte packets, examined over increasing chain size, on a DUT that is installed with kernel OVS. Panels: (a) BW for placement $P_g$, (b) PPS for placement $P_g$, (c) CPU for placement $P_g$, (d) BW for placement $P_d$, (e) PPS for placement $P_d$, (f) CPU for placement $P_d$. The x-axis is # VNF (1 to 30); the y-axes are Bandwidth (Mbps), # Packets in OVS (pps), and # CPU Utilization. Series: HV = 4, 6, 8 (CPU panels: HV = 4 with VNF = 20, HV = 6 with VNF = 18, HV = 8 with VNF = 16).

Fig. 5: Experiment results showing throughput, packet processing and CPU consumption for traffic generated in 100-Byte packets, examined over increasing chain size, on a DUT that is installed with DPDK-OVS. Panels (a) to (f) and axes as in Fig. 4. Series: HV = 1 + 1, 2 + 1, 4 + 1 (CPU panels: HV = 1 + 1 with VNF = 22, HV = 2 + 1 with VNF = 21, HV = 4 + 1 with VNF = 19).

C. Evaluating Throughput Intense Traffic

We repeat the experiment presented in Subsection III-B in the context of achieving maximum throughput. Note that, as opposed to the previous section, where we wanted to examine the CPU cost of the transferred traffic, here our goal is solely to maximize throughput. In order to discuss maximum throughput, we measure and analyze the results of our DUT for several configurations while it receives packets of the maximum transferable unit size, i.e., we generate traffic where each packet is composed of 1500 Bytes. We examine the behaviour of throughput for increasing chain size, using both placement functions $P_g$ and $P_d$. Since the behaviour of CPU consumption and packet processing is similar to that observed for 100-Byte packets (on a larger scale), we omit their presentation.

Figure 6 (and Figure 7) depicts throughput performance on a DUT that is installed with kernel OVS (DPDK-OVS, respectively). Both Figure 6(b) for kernel OVS and Figure 7(b) for DPDK-OVS show that the average throughput can scale up as long as the server is not over-provisioning resources (when deploying 1-5 VNFs). For kernel OVS, the bottleneck is the packet processing limit, while the NIC’s wire limit is the bottleneck for DPDK-OVS. The same effect can also be seen in Figures 6(a) and 7(a), where again, as long as the server is not over-provisioning resources, the chain of VNFs is able to forward ~0.8-1.5 Gbit/s in the case of kernel OVS and ~6 Gbit/s in the case of DPDK-OVS. Note that DPDK-OVS does not reach its packet processing limit (as was seen in the case of 100-Byte packets in Subsection III-B). Instead, the observed limit is induced by the amount of packet processing that a single CPU core (allocated to the VNF) can perform (~6 Gbit/s). This bottleneck can be mitigated if VNFs are set to have more than a single vCPU, configured to enable Receive Side Scaling (RSS).

So far, we have presented the average throughput of a single server (our DUT). A naive analysis might lead to the wrong conclusion that the distribute placement function $P_d$ outperforms the gather placement function $P_g$. As can be seen in Figures 8(a) and 8(b), this is not the case. Figures 8(a) and 8(b) present the average overall throughput estimated by the model defined in Section II on a set of many servers that are installed with kernel OVS (DPDK-OVS, respectively).

Fig. 6: Throughput for traffic generated in 1500-Byte packets, examined over increasing chain size, on a DUT that is installed with kernel OVS. Panels: (a) BW for placement $P_g$, (b) BW for placement $P_d$. The x-axis is # VNF (1 to 30); the y-axis is Bandwidth (Mbps). Series: HV = 4, 6, 8.

Fig. 7: Throughput for traffic generated in 1500-Byte packets, examined over increasing chain size, on a DUT that is installed with DPDK-OVS. Panels: (a) BW for placement $P_g$, (b) BW for placement $P_d$. The x-axis is # VNF (1 to 30); the y-axis is Bandwidth (Mbps). Series: HV = 1 + 1, 2 + 1, 4 + 1.

Fig. 8: Analysis of multiple servers, showing total throughput for traffic generated in 1500-Byte packets, examined over increasing chain size. Panels: (a) Kernel OVS, series HV = 4 and HV = 8, each with Gather and Distribute; (b) DPDK-OVS, series HV = 1 + 1 and HV = 4 + 1, each with Gather and Distribute. The x-axis is Chaining Size (1 to 30).

IV. MONOLITHIC COST FUNCTION

Section III focuses on exhaustive evaluations, analyzing throughput, OVS packet processing and CPU consumption for different placement functions and software switching technologies (kernel OVS and DPDK-OVS). In the following, we describe how we build the generalized abstract cpu-cost functions, and then present and discuss the results.

A. Building an Abstract Cost Function

Based on the measured results on a single server (presented in Figures 4-5 and 6-7), we are now ready to craft an abstract generalized cost function that accurately captures the CPU cost of network switching.

We iterate the following process for both kernel OVS and DPDK-OVS. For each placement function ($P_g$ or $P_d$) and for each packet size (100 Bytes or 1500 Bytes), we split the construction and build a set of sub-functions that compose the cpu-cost function, namely $C = \{C_{gs}, C_{ds}, C_{gb}, C_{db}\}$, where each is a cpu-cost sub-function for:
• $C_{gs}$: placement $P_g$ and packet size 100 Bytes.
• $C_{ds}$: placement $P_d$ and packet size 100 Bytes.
• $C_{gb}$: placement $P_g$ and packet size 1500 Bytes.
• $C_{db}$: placement $P_d$ and packet size 1500 Bytes.

For each sub-function and for all measured service chain lengths, we sample the throughput (Figures 4(a) - 4(d), 5(a) -
6
5(d), 6(a) - 6(b), and 7(a) - 7(b)) to estimate the amount of However, as the requirement for packet processing increases
packets that a single server can process, and correlate between (1Mpps to 15Mpps), the behaviour turns over and favors
the service chain length and the amount of packets. Next, we the distribute placement cpu-cost function Cdb . Both results
correlate the resulting packet processing, with the measured presented above show that deciding which of the placement
CPU consumption (Figures 4(c) - 4(f), 5(c) - 5(f)). strategy is better, depends on the required demand of packets
Finally, after extracting a 3-dimensional correlation between to process, where the exact dependency varies according to
(i) service chains length; (ii) packet processing; and (iii) CPU the technology used.
consumption, we use the results to extract a set of cpu-cost Next we examine the cpu-cost function by varying the
functions using logarithmic regression, as follows: demand (i.e. required packets to process) of a service chain,
for arbitrarily selected few service chain lengths.
log(CX ) = α · log(ϕ|n ) + β · log(ϕ|p ) + γ
Figures 11 and 12 depicts the CPU cost for both placement
Where X ∈ {gs, ds, bd, db}, namely gather or distribute functions when increasing the required number of packet to
placements for 100Bytes or 1500Bytes size of packets. Table process (1500Bytes per packet) on servers that are installed
I lists per each sub-function the coefficients α and β, and the with kernel OV S (Figure 11) and DPDK-OV S (Figure 12).
constant factor γ. In these graphs we focus on the definition of a feasible cpu-
cost function. Thus, we normalize the cpu-cost values (i.e.,
Kernel OV S DPDK-OV S the value 100% is now the total CPU-core on all servers),
and scale the amount of required packets to process, in order
α β γ α β γ
to focus on the bounds. All values presented in the graphs
Cgs 0.586 0.858 -1.789 0.370 0.467 1.543 are in log scale. The graphs reaffirm the results discussed in
Cds 0.660 0.243 -2.661 0.217 0.091 3.795 Figures 9 and 10, that is, deciding the appropriate placement
strategy depends on the required network traffic demand to
Cgb 0.752 0.979 -3.856 0.478 0.578 0.194 process.
Cdb 1.009 0.268 -7.176 0.157 0.109 4.718 Recall the definition of feasible cpu-cost function from
Section II: a cpu-cost function is feasible if there are enough
resources to implement it. In the result presented in Figures
TABLE I: Coefficients α and β, and the constant factor γ, per
11 and 12 the value 100 indicates that we have reached the
each cpu-cost sub-function.
processing bound and we cannot process more packets. We
can observe that the infeasibility point is reached differently in
B. Insights
each deployment strategy. For instance, the cpu-cost function
The values presented in Table I reflect the real CPU cost of of the distribute placement Cdb reaches its infeasibility point
the various deployments, but they provide very little insight faster than the gather placement Cgb when using OV S in Kernel
regarding our motivation question (see Figure 1). In order to model (Figure 11).
get a real understanding of this cost we provide graphs that
depict the CPU cost for various service chains characterized V. R ELATED W ORK
by the length of the chain ϕ|n , and the amount of packets to
process ϕ|p . Network function placement. A proliferating field of
Figures 9(a), 9(b), and 9(c) depict the CPU cost for both interest in NFV is VNF placement [10]–[12] and chaining
placement functions when increasing the number of VNFs [13]–[17] strategies. The efficient orchestration of VNFs and
(and also the number of service chains) on servers that are its routing (or chaining) play a crucial role in the performance
installed with kernel-OV S, and traffic is received in large of deployed network services. In this regard, [12] focus on
packets (1500Bytes per packet). where to place VNFs and how to assign flows to them. In turn,
In all graphs, the CPU consumption is the total amount of [14], [15] continue and focus on joint optimization of VNF
CPU required on all physical machines to support the service placement and chaining. Specifically, they defined optimized
chain (where the value 100 is a single CPU-core). For service placement functions which take into account how VNFs are
chains with low packet processing requirements (10 Kpps - 50 interconnected. However, all mentioned lines of work have
Kpps), the cpu-cost function of the distribute placement Cdb , neglected the cost of software switching and its limitations.
outperforms its gather placement counterpart Cgb . However, In general, those works have focused on minimizing arbitrary
as the requirement for packet processing increases (100 Kpps cost functions. Therefore, the usage of such models in real
to 1.5 Mpps), the behaviour turns over and favors the gather NFV deployments might either lead to infeasible solutions or
placement cpu-cost function Cgb . suffer form high penalties on the expected performance.
In turn, Figures 10(a), 10(c), and 10(b) depict the CPU Middleboxes and Traffic steering. Both academia and
cost for both placement functions when increasing the number industry show interest in virtualizing network appliances
of VNFs (and also the number of service chains) on servers through a programmable, flexible and scalable architectures
that are installed with DPDK-OV S, and traffic is received in [18]–[21]. Traffic steering is an essential building block for
1500Bytes per packet. In this case the behaviour changes. enabling flexible NFV deployment. SDN is a complementary
For service chains with low packet processing requirements technology that enables the dynamic traffic steering between
(10Kpps - 50Kpps), the cpu-cost function of the gather place- middleboxes and commodity servers [22]–[26]. However, most
ment Cgb , outperform its distribute placement counterpart Cdb . of the recent literature on SDN has analyzed inter-server

traffic, rather than intra-server traffic, which is the focus of this work.

Fig. 9: Cpu-cost over different service chain lengths (ϕ|n), with traffic generated in 1500Bytes packets, for both placement functions on servers installed with kernel OVS. Panels: (a) 10Kpps and 50Kpps; (b) 100Kpps and 200Kpps; (c) 1Mpps and 1.5Mpps. Each panel plots # CPU utilization against # chaining size (1 to 30) for the gather and distribute placements.

Fig. 10: Cpu-cost over different service chain lengths (ϕ|n), with traffic generated in 1500Bytes packets, for both placement functions on servers installed with DPDK-OVS. Panels: (a) 10Kpps and 50Kpps; (b) 1Mpps and 1.5Mpps; (c) 10Mpps and 15Mpps. Axes and series as in Fig. 9.

Fig. 11: Cpu-cost over different packet processing requirements (ϕ|p), 1500Bytes per packet, for both placement functions on servers installed with kernel OVS. The plot shows % of CPU utilization against # packets to be processed (10K to 10M pps, log scale) for the gather and distribute placements with service chains of length 5 and 30.

Fig. 12: Cpu-cost over different packet processing requirements (ϕ|p), 1500Bytes per packet, for both placement functions on servers installed with DPDK-OVS. Axes and series as in Fig. 11.

VI. CONCLUSION AND FUTURE WORK

Software switching is a key component that enables communication between VNFs in any cloud-based infrastructure. The stringent network requirements introduced by network function virtualization (i.e., high throughput and low latency) make it a crucial component of that paradigm.

In this paper, we conduct an extensive and in-depth evaluation, measuring the performance and analyzing the impact of deploying service chains on a real NFV-based infrastructure. We provide insights on how latency, throughput, packet processing, and CPU consumption behave when scaling up service chaining deployments. Furthermore, we develop a generalized cost function that accurately captures the CPU cost of software switching in this setting.

Understanding the costs and the limitations of software switching in NFV environments is a key ingredient in the ability to design efficient solutions for VNF management and orchestration, possibly leading to lower operational costs. Thus, a natural extension of this work is to develop cost-efficient service chain deployment schemes based on the devised CPU-cost function.
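As a starting point for the deployment schemes mentioned above, the cpu-cost functions of Table I can be exercised with a short Python sketch. It is an illustration under stated assumptions, not the authors' tooling. It does four things:

- fits log(C_X) = α · log(ϕ|n) + β · log(ϕ|p) + γ by ordinary least squares on log-transformed samples;
- evaluates a sub-function;
- picks the placement (gather or distribute) with the lower predicted cost;
- applies the feasibility notion of Section II as a simple capacity check.

The names TABLE_I, cpu_cost, fit_log_model, cheaper_placement, and is_feasible are hypothetical. The text states neither the logarithm base nor the units of ϕ|p and C_X. The sketch therefore assumes natural logarithms and unit-free inputs, and values computed from Table I should not be read as reproducing Figures 9 to 12.

```python
import math

# Table I coefficients (alpha, beta, gamma), keyed by switch and sub-function.
# Sub-function keys: g/d = gather/distribute placement,
# s/b = 100-Byte/1500-Byte packets (e.g. "gb" is Cgb).
TABLE_I = {
    "kernel": {"gs": (0.586, 0.858, -1.789), "ds": (0.660, 0.243, -2.661),
               "gb": (0.752, 0.979, -3.856), "db": (1.009, 0.268, -7.176)},
    "dpdk": {"gs": (0.370, 0.467, 1.543), "ds": (0.217, 0.091, 3.795),
             "gb": (0.478, 0.578, 0.194), "db": (0.157, 0.109, 4.718)},
}


def cpu_cost(coeffs, chain_len, pps):
    """Return C from log(C) = alpha*log(n) + beta*log(p) + gamma.

    Natural logarithms are an assumption; the text does not state the base.
    """
    alpha, beta, gamma = coeffs
    return math.exp(alpha * math.log(chain_len) + beta * math.log(pps) + gamma)


def _solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_log_model(samples):
    """Least-squares fit of log(C) against log(n), log(p) and a constant.

    samples: iterable of (chain_len, pps, cpu_cost) triples, all positive.
    Returns the fitted (alpha, beta, gamma).
    """
    rows = [(math.log(n), math.log(p), 1.0, math.log(c)) for n, p, c in samples]
    # Normal equations (X^T X) w = X^T y with w = (alpha, beta, gamma).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    return tuple(_solve(xtx, xty))


def cheaper_placement(switch, size, chain_len, pps):
    """Return "gather" or "distribute", whichever sub-function predicts less CPU.

    switch: "kernel" or "dpdk"; size: "s" (100 Bytes) or "b" (1500 Bytes).
    """
    gather = cpu_cost(TABLE_I[switch]["g" + size], chain_len, pps)
    distribute = cpu_cost(TABLE_I[switch]["d" + size], chain_len, pps)
    return "gather" if gather <= distribute else "distribute"


def is_feasible(cost, capacity):
    """Feasible when the available CPU capacity (same units as cost) covers it."""
    return cost <= capacity


if __name__ == "__main__":
    # Noise-free samples generated from kernel-OVS Cgb; the fit should recover
    # approximately (0.752, 0.979, -3.856).
    coeffs = TABLE_I["kernel"]["gb"]
    synthetic = [(n, p, cpu_cost(coeffs, n, p))
                 for n in (1, 5, 10, 30) for p in (1e4, 1e5, 1e6, 1.5e6)]
    print(fit_log_model(synthetic))
    print(cheaper_placement("dpdk", "b", 30, 1e6))
```

Only γ depends on the logarithm base. Since log_b x = ln x / ln b, a model fitted in base b keeps the same α and β, while its intercept becomes γ · ln b in natural-log form. The gather-versus-distribute comparison therefore depends on the base actually used for Table I. This is why the sketch computes the comparison rather than hard-coding a winner.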

REFERENCES

[1] J. Martins, M. Ahmed, C. Raiciu, V. A. Olteanu, M. Honda, R. Bifulco, and F. Huici, “ClickOS and the art of network function virtualization,” in Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 2014), April 2014, pp. 459–473.
[2] ETSI Industry Specification Group (ISG) Network Functions Virtualisation (NFV), “Network functions virtualisation,” Available: http://www.etsi.org/technologies-clusters/technologies/nfv.
[3] P. Quinn and T. Nadeau, “Problem Statement for Service Function Chaining,” Internet Requests for Comments, RFC Editor, RFC 7498, July 2015. [Online]. Available: http://www.rfc-editor.org/rfc/rfc7498.txt
[4] “OpenStack,” http://www.openstack.org/, accessed: 04-19-2016.
[5] B. Pfaff, J. Pettit, T. Koponen, K. Amidon, M. Casado, and S. Shenker, “Extending networking into the virtualization layer,” in Proceedings of HotNets, Oct. 2009.
[6] B. Pfaff, J. Pettit, T. Koponen, E. Jackson, A. Zhou, J. Rajahalme, J. Gross, A. Wang, J. Stringer, P. Shelar, K. Amidon, and M. Casado, “The design and implementation of Open vSwitch,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15). Oakland, CA: USENIX Association, May 2015, pp. 117–130.
[7] “Open vSwitch,” http://www.openvswitch.org/, accessed: 06-09-2016.
[8] Intel, “Intel® Open Network Platform Release 2.1: Performance test report,” Internet Engineering Task Force, Mar. 2016, Available: https://01.org/packet-processing/intel®-onp.
[9] “Sockperf,” https://github.com/Mellanox/sockperf/, accessed: 04-19-2016.
[10] S. Clayman, E. Maini, A. Galis, A. Manzalini, and N. Mazzocca, “The dynamic placement of virtual network functions,” in 2014 IEEE Network Operations and Management Symposium (NOMS), May 2014, pp. 1–9.
[11] M. Ghaznavi, A. Khan, N. Shahriar, K. Alsubhi, R. Ahmed, and R. Boutaba, “Elastic virtual network function placement,” in Cloud Networking (CloudNet), 2015 IEEE 4th International Conference on, Oct 2015, pp. 255–260.
[12] R. Cohen, L. Lewin-Eytan, J. S. Naor, and D. Raz, “Near optimal placement of virtual network functions,” in 2015 IEEE Conference on Computer Communications (INFOCOM), April 2015, pp. 1346–1354.
[13] S. Mehraghdam, M. Keller, and H. Karl, “Specifying and placing chains of virtual network functions,” in Cloud Networking (CloudNet), 2014 IEEE 3rd International Conference on, Oct 2014, pp. 7–13.
[14] M. C. Luizelli, L. R. Bays, L. S. Buriol, M. P. Barcellos, and L. P. Gaspary, “Piecing together the NFV provisioning puzzle: Efficient placement and chaining of virtual network functions,” in 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), May 2015, pp. 98–106.
[15] M. F. Bari, S. R. Chowdhury, R. Ahmed, and R. Boutaba, “On orchestrating virtual network functions,” in Proceedings of the 2015 11th International Conference on Network and Service Management (CNSM), ser. CNSM ’15. Washington, DC, USA: IEEE Computer Society, 2015, pp. 50–56.
[16] W. Rankothge, J. Ma, F. Le, A. Russo, and J. Lobo, “Towards making network function virtualization a cloud computing service,” in 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), May 2015, pp. 89–97.
[17] M. Bouet, J. Leguay, and V. Conan, “Cost-based placement of vDPI functions in NFV infrastructures,” in Network Softwarization (NetSoft), 2015 1st IEEE Conference on, April 2015, pp. 1–9.
[18] A. Gember, R. Grandl, J. Khalid, and A. Akella, “Design and implementation of a framework for software-defined middlebox networking,” in Proceedings of the ACM SIGCOMM 2013 Conference on SIGCOMM, ser. SIGCOMM ’13. New York, NY, USA: ACM, 2013, pp. 467–468.
[19] J. W. Anderson, R. Braud, R. Kapoor, G. Porter, and A. Vahdat, “xOMB: Extensible open middleboxes with commodity servers,” in Proceedings of the Eighth ACM/IEEE Symposium on Architectures for Networking and Communications Systems, ser. ANCS ’12. New York, NY, USA: ACM, 2012, pp. 49–60.
[20] A. Gember-Jacobson, R. Viswanathan, C. Prakash, R. Grandl, J. Khalid, S. Das, and A. Akella, “OpenNF: Enabling innovation in network function control,” in Proceedings of the 2014 ACM Conference on SIGCOMM, ser. SIGCOMM ’14. New York, NY, USA: ACM, 2014, pp. 163–174.
[21] A. Bremler-Barr, Y. Harchol, and D. Hay, “OpenBox: A software-defined framework for developing, deploying, and managing network functions,” in Proceedings of the ACM SIGCOMM 2016 Conference on SIGCOMM, ser. SIGCOMM ’16. New York, NY, USA: ACM, 2016, pp. 1–15.
[22] Z. A. Qazi, C.-C. Tu, L. Chiang, R. Miao, V. Sekar, and M. Yu, “SIMPLE-fying middlebox policy enforcement using SDN,” SIGCOMM Comput. Commun. Rev., vol. 43, no. 4, pp. 27–38, Aug. 2013.
[23] Y. Zhang, N. Beheshti, L. Beliveau, G. Lefebvre, R. Manghirmalani, R. Mishra, R. Patneyt, M. Shirazipour, R. Subrahmaniam, C. Truchan, and M. Tatipamula, “StEERING: A software-defined networking for inline service chaining,” in 2013 21st IEEE International Conference on Network Protocols (ICNP), Oct 2013, pp. 1–10.
[24] S. K. Fayazbakhsh, L. Chiang, V. Sekar, M. Yu, and J. C. Mogul, “Enforcing network-wide policies in the presence of dynamic middlebox actions using FlowTags,” in 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI 14). Seattle, WA: USENIX Association, Apr. 2014, pp. 543–546.
[25] A. Hari, T. V. Lakshman, and G. Wilfong, “Path switching: Reduced-state flow handling using path information,” in Proceedings of the 11th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’15. New York, NY, USA: ACM, 2015, pp. 1–7.
[26] N. Kang, M. Ghobadi, J. Reumann, A. Shraer, and J. Rexford, “Efficient traffic splitting on commodity switches,” in Proceedings of the 11th International Conference on Emerging Networking Experiments and Technologies, ser. CoNEXT ’15. New York, NY, USA: ACM, 2015, pp. 1–13.