Anda di halaman 1dari 16

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Introduction to Multi-Protocol Label Switching (MPLS)


Multi-Protocol Label Switching (MPLS) is rapidly becoming a key technology for
use in core networks, including converged data and voice networks. MPLS does
not replace IP routing, but works alongside existing and future routing
technologies to provide very high-speed data forwarding between Label-Switched
Routers (LSRs) together with reservation of bandwidth for traffic flows with
differing Quality of Service (QoS) requirements. MPLS enhances the services that
can be provided by IP networks, offering scope for Traffic Engineering, guaranteed
QoS and Virtual Private Networks (VPNs).
MPLS uses a technique known as label switching to forward data through
the network. A small, fixed-format label is inserted in front of each data packet on
entry into the MPLS network. At each hop across the network, the packet is routed
based on the value of the incoming interface and label, and dispatched to an
outwards interface with a new label value. The path that data follows through a
network is defined by the transition in label values, as the label is swapped at
each LSR. Since the mapping between labels is constant at each LSR, the path is
determined by the initial label value. Such a path is called a Label Switched Path
(LSP).
MPLS Header Format

Label Value : 20-bit label value


Exp : experimental use can indicate class of service
S : bottom of stack indicator 1 for the bottom label, 0 otherwise
TTL : Time To Live

Figure 5 : MPLS label format


MPLS Operation
Page
1

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

An MPLS network or internet consists of a set of nodes, called Label Switched


Routers (LSRs), that are capable of switching and routing packets on the basis of a
label which has been appended to each packet. Labels define a flow of packets
between two endpoints or, in the case of multicast, between a source endpoint
and a multicast group of destination endpoints. For each distinct flow, called a
Forwarding Equivalence Class (FEC), a specific path through the network of LSRs
is defined. Thus, MPLS is a connection-oriented technology. Associated with each
FEC is a traffic characterization that defines the QoS requirements for that flow.
The LSRs do not need to examine or process the IP header, but rather simply
forward each packet based on its label value. Therefore, the forwarding process is
simpler than with an IP router.

Figure 6: MPLS Operation


Figure 2, based on one in [4], depicts the operation of MPLS within a domain of
MPLS-enabled routers. The following are key elements of the operation.
Page
2

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

1.

Prior to the routing and delivery of packets in a given FEC, a path


through the network, known as a Label Switched Path (LSP), must be
defined and the QoS parameters along that path must be
established. The QoS parameters determine (1) how many resources
to commit to the path, and (2) what queuing and discarding policy to
establish at each LSR for packets in this FEC. To accomplish these
tasks, two protocols are used to exchange the necessary information
among routers:
a An interior routing protocol, such as OSPF, is used to exchange
. reachability and routing information.
b Labels must be assigned to the packets for a particular FEC.
. Because the use of globally unique labels would impose a
management burden and limit the number of usable labels, labels
have local significance only, as discussed subsequently. A network
operator can specify explicit routes manually and assign the
appropriate label values. Alternatively, a protocol is used to
determine the route and establish label values between adjacent
LSRs. Either of two protocols can be used for this purpose: the
Label Distribution Protocol (LDP) or an enhanced version of RSVP.

2.

A packet enters an MPLS domain through an ingress edge LSR where


it is processed to determine which network-layer services it requires,
defining its QoS. The LSR assigns this packet to a particular FEC, and
therefore a particular LSP, appends the appropriate label to the
packet, and forwards the packet. If no LSP yet exists for this FEC, the
edge LSR must cooperate with the other LSRs in defining a new LSP.

3.

Within the MPLS domain, as each LSR receives a labelled packet, it:
a Removes the incoming label and attaches the appropriate
. outgoing label to the packet.
b Forwards the packet to the next LSR along the LSP.
Page
3

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

.
4.

The egress edge LSR strips the label, reads the IP packet header, and
forwards the packet to its final destination.

Network recovery
If a failure occurs in a network, recovery is achieved by moving traffic from the
failed part of the network to another portion of the network. It is important that
this recovery operation can be performed as fast as possible to prevent too many
packets from getting dropped at the failure point. If this is achieved fast the failure
can be unnoticeable (resilient) or minimal for end-users.
Recovery techniques can be used in both Circuit Switched and Packet
switched networks. When a link or node in a network fails, traffic that was using
the failed component must change the path used to reach the destination. This is
done by the adjacent routers to the failure that updates their forwarding tables to
forward packets on a different path that avoids the failing component. The path
Page
4

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

that the traffic was using before the failure is called the primary path or the
working path and the new path is called the backup path. Often recovery
techniques consist of four steps.
First, the network must be able to detect the failure. Second, nodes that
detect the failure must notify certain nodes in the network of the failure. Which
nodes that are notified of the failure depend on which recovery technique that is
used. Third, a backup path must be computed. Forth, instead of sending traffic on
the primary path a node called Path Switching Node must send traffic on the
backup path instead. This step is called switchover and completes the repair of
the network after a failure.
If we consider an unicast communication. When a link fails on the path
between the sender and the receiver, users experience service disruption until
these four steps are completed. The length of the service disruption is the time
between the last bits was sent before the failure occurred is received, and the
instance when the first bit of data that uses the backup path arrives at the
receiver. The total time of service disruption is:
Service Disruption = Time to detect failure + Time to notify + Time to compute
backup + Time to switchover.

Types of errors in networks


Any of the resources within a network might fail. The traditional error in a network
is a link failure caused by a link getting cut or unplugged by mistake. The
following table shows the distribution of outage events in that study.

Page
5

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Table 1 : Failure types


Planned events can be upgrades or configuration of the router OS. This
includes patches, new software releases, routing/policy changes etc. It can also be
replacement or reconfiguration of hardware. These outages are planned in time so
they do not need to result in any downtime for traffic, as traffic can be moved to
another segment of the network before the outage occurs. The unplanned events
are the failures that do affect network traffic as they are not planned and
therefore have to be reacted upon when they occur.
Software failures in the control plane are failures in any of the routing control
plane protocols like the IP routing protocol. Data plane software failures are
failures in the forwarding software. Hardware failures are categorized as failures
that happen in any control or system component of a router, as router processor,
switch fabric, fans etc, and in failure of a particular line card.

Page
6

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Link failures are failures to links such as link cuts and failure in transmission
equipment.
No device can be more reliable than its power supply, it has therefore been
common to install extra power supplies to eliminate the risk of power outage. And
as can see seen in the table, power outage is only 1% of the observed causes for
outage in a network.
As we can see from this table, link failures and control plane failures are the
most common kind of failures.
Network layer recovery
In packet switched networks like the Internet the recovery mechanisms rely on the
capabilities of the routing protocols. If a failure occurs then each router in the
network has to be informed about the failure and the routing tables in each router
have to be recomputed using a shortest path first algorithm. When routing tables
has been recomputed and converted, all paths that were using a failed link or
node are rerouted through other links.
To detect when a failure has occurred, routing protocols use timers. The
routing protocol on a node periodically sends hello messages to its neighbor
nodes. If a neighbor doesn't receive a preset number of hello messages in a time
interval, then the routing protocol interprets this as a failure on the link to that
node. When a network failure has been detected then all the nodes in the network
has to be notified about the failure. This is done by the node that detected the
failure, which sends a LSA (Link State Advertisement) to all other nodes in the
network. Then each node has to perform a shortest path first calculation with the
failed link pruned from the network. When all nodes have finished the calculation,
new routing tables are updated and traffic can be rerouted in the network.

Page
7

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Routing protocols make the network able to survive one or multiple link or
node failures, but they do not guarantee the recovery time. The recovery time
strongly depends on the dimension of the network and on the routing protocol
used. The IP layer relies on the physical layer that provides transport of IP packets
between two points in the network. There is no standard mechanism for exchange
of information about network status between the IP layer and the lower layers,
therefore routing protocols in IP run transparent as regards of type and structure
of the physical layer.
With the current default settings in routing protocols as OSPF, the network
takes several tens of seconds before recovering from a failure. The main reason
for this is the time it takes before a failure is detected using the hello protocol.
There have been some proposals that the recovery time could be reduced by
reducing the value of the hello message interval. But it has been showed that a
reduction of this interval can cause problems. If the interval is to small then it will
result in an increased chance of network congestion causing loss of several
consecutive hellos, thus a node may think that a link is down that is only
congested. It will then flood the network with LSAs causing all the routers to make
shortest path first calculations. Then it will start to receive hello messages again
and conclude that the link is up, thus flooding the network with new LSAs. This will
not only lead to unnecessary routing changes but also increase the processing
load on the nodes.
MPLS Network Recovery Mechanisms
As in recovery mechanism for other layers, MPLS recovery operates by forwarding
traffic on a new path around the point of failure in the network. Where this path is
placed, when it is calculated and how it is setup depends on the recovery
mechanism used.

Page
8

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

If no recovery mechanism is used in an MPLS domain, recovery is performed


by the default action by the signalling protocol used to setup and maintain a LSP,
this is called best-effort restoration. In the case of RSVP-TE, a failure can be
detected by hello messages or by the RSVP-TE soft state mechanism. As described
earlier the soft state mechanism in RSVP-TE keeps the LSP reserved by
periodically sending PATH and RESV messages. If a failure occurs on the LSP, the
soft state will fail to update the reservation and depending on if it were the PATH
or RESV message that failed, a PathErr or ResvErr message will be sent back
upstream to the ingress LSR of that LSP. If a failure is detected by the hello
message state, then a PathErr will be sent back to the ingress LSR. When the
failure is detected and notified to the ingress LSP it will try to setup the LSP again.
If the LSP was loosely routed, the ingress can try to find a link and node disjoint
path to the egress and try to setup the LSP on the new path instead. Or it can try
to setup the same LSP again and let the node adjacent to the failure compute a
new path from the point of failure to the egress. Once the path is established the
traffic can start to flow on the new LSP. If the LSP was strictly routed the ingress
will periodically try to setup the path again, until it becomes available. Restoration
time of this scheme is typically large and is only acceptable for best effort type of
traffic.
If recovery shall be performed faster other MPLS mechanisms need to be
used that can try to minimize the failure notification time or backup path
computation time.
The path recovery mechanisms in MPLS can be classified according to two criteria:
recovery by rerouting vs. protection switching and local repair vs. global repair.
Recovery Path Placement
There are two main categories of where a recovery path is placed. These are
called global repair and local repair.

Page
9

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Global Repair
The intent of global repair is to protect against any link or node failure on a
path or on a segment of a path, with the obvious exception of the fault
occurring at the ingress or egress LSR of the protected path segment. In
global repair, the POR is usually distant from the failure and needs to be
notified by a FIS. In global repair end-to-end path recovery can be applied,
where the working path is completely link and node disjoint from the
protection path. This has the advantage that all links and nodes on the
working path are protected by a single recovery path. But it has the
disadvantage that a FIS has to be propagated all the way back to the
ingress LSR before recovery can start.

Figure 7 : Global Recovery Path

Local Repair:
The intent of local repair is to protect against a link failure or neighbour
node failure and to minimize the amount of time required for failure
propagation. If a repair can be performed local to the device that detects
the failure, restoration can be achieved faster. In local repair (also known as
local recovery), the node immediately upstream of the failure is the node
that initiates the recovery operation (PSL). Local repair can be setup in two
different ways.
Page
10

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Link Recovery - In link recovery the goal is to protect a link in the


working path from failure. If a failure occurs on the protected link in the
working path then the protection path connects the PSL and PML with a
path disjoint of the working path with regard to the failed link.
Node Recovery - In node recovery the goal is to protect a node in the
working path from failure. In node protection the recovery path is disjoint
from the working path in regard to the protected node and the links from
the protected node, in difference from link protection where only the link
between two adjacent nodes is protected. In node protection there will
be one or more hops between PSL and PML.

Figure 8: Link Recovery

Figure 9 : Node Protection

Page
11

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Protection and Restoration


Centralized model
In this model, an Ingress Node is responsible for resolving the restoration as the
FIS arrives. This method needs an alternate disjoint backup path for each active
path (working path).
Protection is always activated at the Ingress Node, irrespective of where along the
working path a failure occurs. This means that failure information has to be
propagated all the way back to the source node before a protection switch is
activated. If no reverse LSP is created the fault indication can only be activated as
a Path Continuity Test.
This method has the advantage of setting up only one backup path per working
path, and is a centralized protection method, which means only one LSR, has to
be provided with PSL functions. On the other hand this method has an elevated
cost (in terms of time), especially if a Path Continuity Test is used as a fault
indication method. If we want to use an RNT as a fault indication method we have
to provide a new LSP to reverse back the signal to the Ingress Node

Figure 10 : Centralized model

LSP segment restoration (local repair)


With this method restoration starts from the point of the failure. It is a local
method and is transparent to the Ingress Node. The main advantage is that it
Page
12

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

offers lower restoration time than the centralized model. With this method, an
added difficulty arises in that every LSR, where protection is required, has to be
provided with switchover function (PSL). A PML should be provided too. Another
drawback is the maintenance and creation of multiple LSP backups (one per
protected domain). This could report low resource utilization and a high
development complexity. On the other hand, this method offers transparency to
the Ingress Node and faster restoration time than centralized mechanisms. An
intermediate solution could be the establishment of local backup, but only for
protection segments where a high degree of reliability is required, supplying only
protected path segments.

Figure 11 : Local restoration


Reverse backup
The main idea of this method is to reverse traffic at the point of failure of the
protected LSP back to the source switch of the protected path (Ingress Node) via a
Reverse Backup LSP.
As soon as a failure along the protected path is detected, the LSR at the ingress of
the failed link reroutes incoming traffic by redirecting this traffic into the
alternative LSP and traversing the path in the opposite direction to the primary
LSP.
This method is especially good in network scenarios where the traffic streams are
very sensitive to packet losses. Another advantage is that it simplifies fault
indication, since the reverse backup offers, at the same time, a way of
transmitting the FIS to the Ingress Node and to the recovery traffic path. One
Page
13

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

disadvantage could be poor resource utilization. Two backups per protected


domain are needed. Another drawback is the time taken to reverse fault indication
to the Ingress Node, as with the Centralized model.

Figure 12 : Reverse backup utilization

Protection
In network scenarios with a high degree of protection requirements, the possibility
of a multilevel fault management application could improve performance,
compared to the single method application. Nonetheless, complete scenario
construction is highly costly (in terms of time and resources), so intermediate
scenarios could be built instead. For example our protected domain could start
with just a centralized method, and as the protection requirements grows (a node
falls repeatedly), a new local backup could be provided, thus making available a
new protection mechanisms. These two methods can be activated at the same
time. If a fault is located at node 4 or link 3-4, the local method will be applied,
transparent to Ingress Node (due to local notification method).
Page
14

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Another advantage of using multilevel protection domains occurs when in


scenarios with multiple faults. For example, (fig 13-a) if node 4 falls (or LSPs 3-4 or
4-5 faults) and only a centralized backup LSP 1-2-7-5 is used and node 6 or links
1-6, 6-7 fall (during restoration) traffic could be route to 1-2-3-7-5 avoiding links
and node faults. Another example (fig. 13-b) occurs when applying local
restoration and link 3-7 falls. In this case, if another backup mechanism
(centralized model) is applied the faults are avoided.

Figures 13 (a), (b) : Multilevel protection application.


The development of this method could be highly costly (in terms of time and
resources). Complete scenario construction could be complex and could report
low resource utilization. We propose to analyse network survivability requirements
(QoS requirements) and establish different protection levels. Depending on the
protection level for a specific MPLS backbone, the development of a more or less
complex scenario is constructed. LSP Backup creation, bandwidth reservation,
fault indication, method activation, and PML/PSL functions assignation could be
carry out explicitly, via a network administrator, or could be done automatically,
via agent application. These agents could be placed on every Ingress Node,
developing a centralized policy whereby these agents could analyse LSP statistics
and network behaviours, and apply defined protection actions. The specific
development the creation and application of agents are beyond the scope of this
paper, yet certain proposals, such as could be taken into account when
elaborating upon more specific agent development.
Page
15

Network recovery, protection and restoration for Multi-Protocol Label Switching (MPLS)

Figure 14 : Agent application to a dynamic multilevel MPLS protection domain


Comparison between Protection and Restoration
Restoration

Protection

Advantages
Usually can recover from
multiplex element faults
More efficient usage of resource

Advantages
Simple and Quick: especially if it
uses 1+1
Do not require much extra
process time,
except to signal (set up) the
switches along
the pre-determined backup path
(1:1or 1:N)
Disadvantages
Usually can only recover from
single link fault (what if the precomputed path fails?)
Inefficient usage of resource Path
Characteristic: the resource are
reserved before the failure, they
may be not used

Disadvantages
Complex
Slow: require extra process time
to setup path and reserve
resource
Characteristic: the resource are
reserved and used after the failure
Route : can be dynamically
computed
Resource Efficiency : High
Time used : Long
Reliability: can survive under
multiplex faults
Implementation: Complex

Route: predetermined
Resource Efficiency: Low
Time used: Short
Reliability: mainly for single fault
Implementation: Simple
Page
16

Anda mungkin juga menyukai