
BRKSPG-2402

Best Practices to Deploy High-Availability in Service Provider Edge and Aggregation Architectures


© 2013 Cisco and/or its affiliates. All rights reserved.

Cisco Public

Abstract

For Your Reference

As Service Providers deploy value-added triple-play or quadruple-play services to maintain or generate a higher average revenue per user, overall Service Availability becomes increasingly important. In the past, High Availability techniques such as Fast Convergence or MPLS TE FRR focused on raising the availability of the network core. Recently, these techniques are increasingly being deployed in Ethernet Aggregation networks, for example by introducing MPLS TE FRR in the aggregation. Additional high-availability mechanisms are also being developed to make the IP Edge more resilient to failures. Examples of new developments include IP Fast Reroute, BGP Prefix Independent Convergence for both the Core and the Edge, and stateful application inter-chassis redundancy mechanisms that overcome single-system outages.

This Session aims to provide the audience with best current practices to increase service availability by deploying Cisco High-Availability mechanisms in both the Aggregation and the IP Edge. Traditional HA techniques such as NSF/SSO, BFD, Fast Convergence and NSR are reviewed. New technologies such as IP FRR and BGP PIC are discussed in depth. Furthermore, advanced topics such as achieving HA for Layer 4-7 services and stateful inter-chassis redundancy solutions are introduced. The Session also provides best current practices for deploying the tools offered by the Cisco High-Availability toolset, in particular MPLS TE FRR in the aggregation. Finally, possible stateful and stateless clustering approaches are introduced, which SPs may use to increase the availability of their IP Edge architecture.


Glossary

For Your Reference

NHAT: Next Hop Address Tracking
ACL: Access Control List
ACT: Active
APS: Automatic Protection Switching
ARP: Address Resolution Protocol
AS: Autonomous System
ATM: Asynchronous Transfer Mode
BFD: Bidirectional Forwarding Detection
BNG: Broadband Network Gateway
BW: Bandwidth
CC: Continuity Check; also Control Connection
CDR: Call Detail Record
CE: Customer Edge
CF: Checkpoint Facility
CFM: Connectivity Fault Management
CLI: Command Line Interface
CM: Chassis Manager
CP: Control Plane
CPLD: Complex Programmable Logic Device
CSC: Carrier Supporting Carrier (Carrier's Carrier)
DHCP: Dynamic Host Configuration Protocol
DP: Data Plane
DPM: Defects per Million
DSLAM: DSL Access Multiplexer
E2E: End to End
ECMP: Equal Cost Multipath
EEM: Embedded Event Manager
EOAM: Ethernet OAM
EOBC: Ethernet Out-of-Band Channel
ESP: Embedded Services Processor
EVC: Ethernet Virtual Circuit
EVDO: Evolution Data Only
FECP: Forwarding Engine Control Processor
FIB: Forwarding Information Base
FM: Forwarding Manager
FR: Frame Relay
FRR: Fast Reroute
FSOL: First Sign of Life
FWLB: Firewall Load Balancing
GEC: Gigabit EtherChannel
GLBP: Gateway Load Balancing Protocol
GR: Graceful Restart
GRE: Generic Routing Encapsulation
GW: Gateway
HA: High Availability
HSRP: Hot Standby Router Protocol
HW: Hardware
IETF: Internet Engineering Task Force
IF: Interface
IGP: Interior Gateway Protocol
IOCP: Input/Output Control Processor
IOS: Internetwork Operating System
IP: Internet Protocol
IPC: Inter-Process Communication
ISG: Intelligent Services Gateway
iSPF: Incremental Shortest Path First
ISSU: In-Service Software Upgrade
IWF: Interworking Function

Glossary (Cont.)

For Your Reference

L2TP: Layer 2 Tunneling Protocol
LAC: L2TP Access Concentrator
LACP: Link Aggregation Control Protocol
LAN: Local Area Network
LC: Linecard
LDP: Label Distribution Protocol
LFA: Loop-Free Alternate
LI: Lawful Intercept
LMI: Local Management Interface
LNS: L2TP Network Server
LOS: Loss of Signal
LSDB: Link State Database
LSP: Label Switched Path
LTE: Long Term Evolution
MC-LAG: Multi-Chassis Link Aggregation
mcast: Multicast
MD5: Message Digest Algorithm 5
MFIB: Multicast Forwarding Information Base
MLD: Multicast Listener Discovery
MME: Mobility Management Entity
MoFRR: Multicast-only Fast Reroute
MPLS: Multiprotocol Label Switching
MRIB: Multicast Routing Information Base
MSC: Mobile Switching Center
MSPP: Multi-Service Provisioning Platform
MST: Multiple Spanning Tree
MTBF: Mean Time Between Failures
MTSO: Mobile Telephone Switching Office
MTTR: Mean Time To Repair
NAT: Network Address Translation
NIC: Network Interface Card
Nr: Receive Sequence Number
Ns: Send Sequence Number
NSF: Non-Stop Forwarding
NSR: Non-Stop Routing
NVRAM: Non-Volatile Random Access Memory
OAM: Operations, Administration and Maintenance
OCE: Object Chain Element
OIR: Online Insertion and Removal
OS: Operating System
PADR: PPPoE Active Discovery Request
PE: Provider Edge
PIC: Prefix Independent Convergence
PIM: Protocol Independent Multicast
PPP: Point-to-Point Protocol
PS: Power Supply
PSN: Packet Switched Network
PTA: PPP Termination and Aggregation
PVRSTP: Per-VLAN Rapid Spanning Tree Protocol
PW: Pseudowire
QFP: Quantum Flow Processor
RADIUS: Remote Authentication Dial-In User Service
RF: Redundancy Facility
RMA: Return Material Authorization
RNC: Radio Network Controller
RP: Route Processor
RPR: Route Processor Redundancy
RSP: Route Switch Processor
RSVP: Resource Reservation Protocol
SAA: Service Assurance Agent

Glossary (Cont.)

For Your Reference

SBC: Session Border Controller
SBY: Standby
SGW: SAE Gateway
SIP: Session Initiation Protocol
SLA: Service Level Agreement
SLB: Server Load Balancing
SP: Service Provider
SPA: Shared Port Adapter
SPF: Shortest Path First
SRLG: Shared Risk Link Group
SSH: Secure Shell
SSO: Stateful Switchover
STP: Spanning Tree Protocol
SW: Software
T&C: Terms & Conditions
TCAM: Ternary Content Addressable Memory
TE: Traffic Engineering
TR: Traceroute
UC: Unified Communications
uRPF: Unicast Reverse Path Forwarding
VAI: Virtual Access Interface
VC: Virtual Circuit
VCCV: VC Connection Verification
VIP: Virtual IP
VLAN: Virtual LAN
VMAC: Virtual MAC
VPN: Virtual Private Network
VRF: Virtual Routing and Forwarding table
VRRP: Virtual Router Redundancy Protocol
WAN: Wide Area Network

Agenda
- Motivation for High Availability in SP Aggregation Networks
- Network Level High Availability
- System High Availability
- Service High Availability
- Stateful Inter-chassis Redundancy
- Case Studies
- Summary and Conclusions

Motivation for High-Availability in the IP Edge and Aggregation

High Availability and Service Level Agreements

- Many SPs specify their SLAs in their T&Cs
- Important characteristic of both business and residential services
- Historically given for the Core network, but expanding to end-to-end SLAs
- Metrics:
  - Service Availability (averaged over time)
  - Mean Time To Repair (MTTR)
  - Packet Loss / Delay / Jitter
- Examples: AT&T, Sprint, Verizon Business, BT, Level 3

What Is High Availability?

For Your Reference

Availability | DPM | Downtime per Year (24x365)
99.000% | 10,000 | 3 Days 15 Hours 36 Minutes
99.500% | 5,000 | 1 Day 19 Hours 48 Minutes
99.900% | 1,000 | 8 Hours 46 Minutes
99.950% | 500 | 4 Hours 23 Minutes
99.990% | 100 | 53 Minutes
99.999% | 10 | 5 Minutes
99.9999% | 1 | 30 Seconds

Annotations alongside the table: Predictive High Availability, Proactive, Reactive

Two ways to state the availability of a network:
- Percentage method
- DPM method, where DPM = Defects per Million (hours of running time)
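The table converts each availability level into DPM and yearly downtime. A minimal sketch of that arithmetic, assuming a 24x365 year (8,760 hours, no leap years) as the table header states; the helper names are illustrative only. Note that the exact downtime for 99.9999% is about 31.5 seconds, which the table rounds to 30 seconds.

```python
# Minimal sketch: availability percentage -> DPM and downtime per year,
# assuming a 24x365 year as in the table (no leap years).

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def dpm(availability_pct: float) -> float:
    """Defects per million hours of running time."""
    return (1 - availability_pct / 100) * 1_000_000


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected downtime per year, in minutes."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR * 60


def format_downtime(minutes: float) -> str:
    """Render minutes as days/hours/minutes, or seconds below one minute."""
    if minutes < 1:
        return f"{round(minutes * 60)} Seconds"
    total = round(minutes)
    days, rest = divmod(total, 24 * 60)
    hours, mins = divmod(rest, 60)
    parts = []
    if days:
        parts.append(f"{days} Day{'s' if days > 1 else ''}")
    if hours:
        parts.append(f"{hours} Hours")
    if mins:
        parts.append(f"{mins} Minutes")
    return " ".join(parts)


for a in (99.0, 99.5, 99.9, 99.95, 99.99, 99.999, 99.9999):
    print(f"{a}%  DPM={dpm(a):.0f}  {format_downtime(downtime_minutes_per_year(a))}")
```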

Availability Definitions

For Your Reference

Availability is the uptime divided by the total time, i.e. the percentage of time your network is operational:

$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$

- MTBF is Mean Time Between Failures: when does it fail?
- MTTR is Mean Time To Repair: how long does it take to fix?

Calculated vs. Measured Availability

For Your Reference

Calculated availability is based on:
- Network design
- Component MTBF and MTTR
- Different underlying models and simulations
- Cisco uses industry standards to compute hardware MTBF

Basic availability calculation formulae:

$$A_{\text{Series}} = \prod_{k=1}^{N} A_k = A_1 \cdot A_2 \cdots A_N$$

$$A_{\text{Parallel}} = 1 - \prod_{k=1}^{N} (1 - A_k) = 1 - (1 - A_1)(1 - A_2)\cdots(1 - A_N)$$

Measured availability is based on:
- ICMP reachability (E2E, device)
- Cisco Service Assurance Agent (SAA)
- Trouble ticket / outage log analysis
- Observed method: shipping/RMA method
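The series and parallel formulae compose naturally, so a whole system can be evaluated by nesting them. A minimal sketch, assuming independent component failures (the formulae above rely on this too); the chassis and power-supply values in the example are hypothetical.

```python
from math import prod


def availability(mtbf: float, mttr: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def series(avails) -> float:
    """All components must be up: A = A1 * A2 * ... * AN."""
    return prod(avails)


def parallel(avails) -> float:
    """At least one component must be up: A = 1 - (1 - A1)...(1 - AN)."""
    return 1 - prod(1 - a for a in avails)


# Hypothetical example: one chassis in series with two redundant power supplies.
chassis = 0.9999
psu = 0.999
print(series([chassis, parallel([psu, psu])]))
```

Redundancy pays off because parallel unavailabilities multiply: two 0.999 supplies in parallel are unavailable only 10^-6 of the time.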

Reduction of MTTR

For Your Reference

- Stateful inter-chassis redundancy provides additional resilience against:
  - System failures
  - Interface failures
- The issue is not really the MTBF of hardware modules, but rather:
  - Line failures / optical path failures
  - Interface failures
  - Power outages
- The goal of stateful inter-chassis redundancy is subsecond failover with state preservation for applications

Product ID | MTBF (hrs)
ASR1000-RP2 | 380,532
ASR1000-ESP20 | 335,317
ASR1000-SIP10 | 287,549
ASR1006 | 1,986,649
ASR1006-PWR-AC | 570,776
ASR1006-PWR-DC | 357,781
ASR1000-SIP40 | 283,225
ASR1000-ESP40 | 118,790
SPA-8X1GE-V2 | 482,023
SPA-1X10GE-L-V2 | 411,892

Device Availability Calculation

For Your Reference

Example system: Cisco 7206
- Chassis and backplane (BB): CISCO7206-VXR
- CPU: NPE-400
- IOS: 12.0 GD
- Interfaces IF1 and IF2: PA-E3, PA-POS-OC3
- Power supplies (P/S): two PWR-7200-AC in parallel

Component availabilities, A = MTBF / (MTBF + MTTR), using calculated MTBF values from the Cisco database (MTBF and MTTR in hours):

$$A_{IOS} = \frac{30{,}000}{30{,}000 + 0.1} = 0.999997$$

$$A_{IF1} = \frac{1{,}120{,}000}{1{,}120{,}000 + 4} = 0.999996$$

$$A_{IF2} = \frac{600{,}000}{600{,}000 + 4} = 0.999993$$

$$A_{CPU} = \frac{490{,}000}{490{,}000 + 4} = 0.999992$$

$$A_{BB} = \frac{460{,}000}{460{,}000 + 8} = 0.999983$$

$$A_{P/S} = \frac{750{,}000}{750{,}000 + 4} = 0.999995$$

$$A_{System} = A_{IOS} \cdot A_{IF1} \cdot A_{IF2} \cdot A_{CPU} \cdot A_{BB} \cdot \left(1 - (1 - A_{P/S})^2\right) = 0.999961 = 99.9961\%$$
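The Cisco 7206 figure can be reproduced directly from the per-component MTBF and MTTR values. A minimal sketch, assuming MTTR is expressed in hours like MTBF; all components are in series except the two power supplies, which are in parallel.

```python
from math import prod

# (MTBF hours, MTTR hours) per component, taken from the 7206 example.
# Treating MTTR as hours is an assumption.
SERIES_COMPONENTS = {
    "IOS 12.0 GD": (30_000, 0.1),
    "IF1": (1_120_000, 4),
    "IF2": (600_000, 4),
    "CPU NPE-400": (490_000, 4),
    "Chassis/backplane": (460_000, 8),
}
POWER_SUPPLY = (750_000, 4)  # two PWR-7200-AC units in parallel


def availability(mtbf: float, mttr: float) -> float:
    return mtbf / (mtbf + mttr)


series_part = prod(availability(*c) for c in SERIES_COMPONENTS.values())
a_ps = availability(*POWER_SUPPLY)
power = 1 - (1 - a_ps) ** 2          # redundant supplies: both must fail
system = series_part * power
print(f"System availability: {system:.6f} ({system * 100:.4f}%)")
```

The redundant power supplies contribute almost nothing to unavailability; the chassis/backplane (8 h MTTR) dominates.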

Network Availability Calculation

For Your Reference

Topology: path R1-R2 in parallel with path R3-R4. Router availability for R1, R2, R3 and R4: 0.999961 each.

1. Availability of R1 and R2 in series:

$$A_{R1R2} = 0.999961 \cdot 0.999961 \approx 0.999922$$

2. Availability of the parallel network paths (R1-R2 and R3-R4):

$$A_{Network} = 1 - (1 - 0.999922)(1 - 0.999922) \approx 0.999999994$$

3. Network availability = 99.9999%, based only on device availability values.

Not considered:
- Links (WAN, LAN)
- Computer NICs
- Computer OS
- Computer applications
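The network example is two series chains placed in parallel. A minimal sketch of that composition with the router availability from the device calculation; like the slide, it ignores links, NICs, operating systems and applications.

```python
def series(*avails: float) -> float:
    """Product of availabilities: every element on the path must be up."""
    result = 1.0
    for a in avails:
        result *= a
    return result


def parallel(*avails: float) -> float:
    """One minus the product of unavailabilities: one path is enough."""
    unavailable = 1.0
    for a in avails:
        unavailable *= 1 - a
    return 1 - unavailable


R = 0.999961  # device availability of R1..R4

path_r1_r2 = series(R, R)
path_r3_r4 = series(R, R)
network = parallel(path_r1_r2, path_r3_r4)
print(f"path: {path_r1_r2:.6f}, network: {network:.9f}")
```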

Cisco High-Availability Focus

System Level Resilience
- Increase MTBF using resilient HW/SW
- Minimize MTTR for system failures (RP, LCs and SW)
- Mitigate planned outages by providing hitless software upgrades

Network Level Resilience
- In the core and where redundant paths exist, deliver features for fast network convergence, protection and restoration

Embedded Management and Automation
- Embed intelligent event management for proactive maintenance
- Automation and configuration management to reduce human errors

Cisco HA Feature Toolbox: System Level

- RPR, SSO
- NSF, NSR
- Multirouter APS
- Stateful NAT/IPsec/Firewall/SLB failover within a single chassis
- MPLS HA (L3VPN, L2VPN, Inter-AS, CSC, TE, FRR)
- IOS / IOS XR / IOS XE ISSU, dual IOS XE

[Figure: toolbox features placed across the Access Layer, Service Provider Aggregation, Service Provider Edge, Service Provider Core, Internet and Data Center Building Block]

Network-Level Resiliency

- Network design resiliency: dual-homing; APS, GEC, MC-LAG
- Event Dampening
- Fast Convergence: iSPF optimization (OSPF, IS-IS); BGP optimization; fast BGP convergence
- Graceful Restart (MBGP, OSPF, RSVP, LDP)
- ECMP, Anycast, dual RR
- VRRP/HSRP/GLBP/SLB/FWLB
- MPLS High Availability
- LDP Graceful Restart
- MPLS/VPN NSF
- BFD
- MPLS FRR
- Path Protection
- MoFRR
- IP FRR
- Pseudowire Redundancy
- Spanning Tree (MST, PVRSTP, ...)

[Figure: network domains Access, 1st Level Aggregation, 2nd Level Aggregation, IP Edge, SP Core, DC and Internet]

Cost of High Availability

Designing a network for higher Service Availability comes at a cost:
- Redundant network elements
- Redundant links
- Redundant system components (route processors, forwarding processors, power supplies, etc.)
- Operational costs:
  - Lower steady-state utilization levels
  - Increased configuration and management
  - Tighter maintenance windows

[Chart: cost ($) versus availability, from 0% to 100%]

Cost of High-Availability: Example

Large SP network:
- Residential services (3-Play)
- 10M subscribers
- 1.25 Mbps / subscriber

Redundancy Scheme | Total Cost ($M) | Chassis Costs ($M) | Interface Costs (SPA, SFPs) ($M) | Number of Nodes
No redundancy | $1,232 | $529 | $704 | 4658
Access NW uplink redundancy (Agg1, Agg2, Agg3) | $1,250 | $529 | $721 | 4658
AN uplink redundancy | $1,563 | $531 | $1,032 | 4680
Access Network node redundancy (Agg1, Agg2, Agg3) | $2,423 | $1,044 | $1,379 | 9222
Edge link redundancy | $2,425 | $1,044 | $1,381 | 9222
Edge Node redundancy | $2,437 | $1,056 | $1,381 | 9296

- Up to 96% increased CAPEX for full redundancy!
- OPEX increased due to the higher number of network elements

Values for AN, Agg1, Agg2, Agg3 and Edge nodes only (no P routers). Cumulative redundancy schemes, GPL.

Network model (subscribers: residential):

Node | Domain | Locations | System Type
AN | Access | 200,000 | Generic
Agg1 | MPLS Aggregation | 4,000 | ASR 9000
Agg2 | MPLS Aggregation | 500 | ASR 9000
Agg3 | MPLS Aggregation | 74 | ASR 9000
BNG | Edge | 74 | ASR 1013
P | Core | 74 | CRS-3
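Relative cost increases are easier to compare than absolute figures. A minimal sketch that computes each cumulative scheme's increase over the no-redundancy baseline, using the values listed in the cost example table; it also checks that total cost equals chassis plus interface cost (within $1M of rounding).

```python
# (scheme, total $M, chassis $M, interface $M, nodes) from the cost example table
SCHEMES = [
    ("No redundancy", 1232, 529, 704, 4658),
    ("Access NW uplink redundancy (Agg1, Agg2, Agg3)", 1250, 529, 721, 4658),
    ("AN uplink redundancy", 1563, 531, 1032, 4680),
    ("Access Network node redundancy (Agg1, Agg2, Agg3)", 2423, 1044, 1379, 9222),
    ("Edge link redundancy", 2425, 1044, 1381, 9222),
    ("Edge Node redundancy", 2437, 1056, 1381, 9296),
]


def pct_increase(value: float, baseline: float) -> float:
    """Percentage increase of value over baseline."""
    return (value / baseline - 1) * 100


_, base_total, base_chassis, base_iface, _ = SCHEMES[0]
for name, total, chassis, iface, nodes in SCHEMES:
    print(f"{name:<50} total {pct_increase(total, base_total):+6.1f}%  "
          f"chassis {pct_increase(chassis, base_chassis):+6.1f}%  "
          f"interfaces {pct_increase(iface, base_iface):+6.1f}%")
```

With these values, full redundancy raises total cost by about 98% and interface cost by about 96% over the baseline.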

Network High Availability

End-to-End Service HA Solution Set

For Your Reference

0. System-level HA (baseline): RSP failover with 0 packet loss; all L3 protocols are NSF capable; NSR for OSPF, IS-IS and BGP; routing timers and protocol configs are optimized by default
1. Distributed BFD: rapid failure detection; 15 ms timer; high scale
2. BGP PIC (Prefix Independent Convergence): fast core/edge failure convergence
3. TE/FRR: link/node/path protection; sub-50 msec network convergence
4. ECMP: 32-way IGP/LDP, 8-way BGP; dynamic ECMP
5. LACP: Ethernet bundle with full L3 support
6. MPLS IP/TE FRR: IP FRR with OSPF/IS-IS; sub-50 msec network convergence
7. HSRP/VRRP: excellent scale; 100 ms timer; BFD integration

[Figure: numbered HA features placed across the Aggregation and the SP Core]

HA Network Map

Domains: Access (AN) <-> Aggregation (AGG) <-> Edge <-> Core

L4-7 App
- Failure detection: keepalives
- Recovery: stateful app redundancy

L3
- Failure detection: BFD, keepalives, fast hello
- Recovery: MPLS TE FRR, IP event dampening, fast convergence, IP FRR, ECMP, iSPF, BGP PIC Core / Edge, IP / MPLS FRR, LNS load sharing / Anycast / dual RR, NSR / NSF, HSRP / VRRP / GLBP / SLB / FWLB, multicast HA

L2
- Failure detection: keepalives (PPP / FR / ATM / HDLC / GE), EOAM, VCCV, MPLS Ping / TR, loss of signal
- Recovery: GEC / APS / MC-LAG, PW redundancy, bridge domains, path diversity / dual homing, SSO, module redundancy

L0/1
- Failure detection: interrupts
- Recovery: module redundancy


Link Failure Detection Mechanisms

For Your Reference

- Loss of Signal (LOS) is key to link failure detection: O(ms) with interrupt-driven LOS detection on ASR 9000
- Carrier Delay may be used to become resilient to link flaps
- BFD
- Ethernet OAM
- MPLS OAM

OAM mechanisms by media type (CC CP = control-plane continuity check, CC DP = data-plane continuity check):
- Ethernet Last Mile: IEEE 802.3ah
- Ethernet Provider Bridge: IEEE 802.1ag (MAC: Broadcast Domain)
- CC CP for MPLS LDP / MPLS TE / MPLS PW / IPv4: LDP Hello / RSVP Hello / LDP Hello / IGP/BGP Hello
- CC DP for MPLS and IPv4: BFD, Y.1713, Y.1711
- Loopback: LSP Ping (MPLS LDP, MPLS TE), VCCV Ping (MPLS PW), IP Ping (IPv4)
- Traceroute: LSP TR (MPLS), IP TR (IPv4)
- Performance: none listed

BRKSPG-2202

Ethernet OAM Overview

For Your Reference

- E-LMI: provides protocol and mechanisms used for:
  - Notification of EVC addition, deletion or status to the CE
  - Communication of UNI and EVC attributes to the CE
  - CE auto-configuration
  - Notification of remote UNI name and status to the CE
- IEEE 802.3ah OAM:
  - Discovery
  - Link Monitoring
  - Fault Signaling
  - Remote MIB Variable Retrieval
  - Remote Loopback
- IEEE 802.1ag (CFM):
  - Family of protocols that provides capabilities to detect, verify, isolate and report end-to-end Ethernet connectivity faults
  - Protocols (Continuity Check, Loopback and Linktrace) used for Fault Management activities

BRKSPG-2202

Ethernet OAM Overview

[Figure: E-LMI at the UNI, IEEE 802.3ah on individual links, IEEE 802.1ag end to end across the Core]

- Ethernet LMI: automated configuration of the CE based on EVCs and bandwidth profiles
- L2 connectivity management:
  - IEEE 802.3ah: when applicable, physical connectivity management between devices
  - IEEE 802.1ag: Connectivity Fault Management (CFM)
    - Uses domains to contain OAM flows and bound OAM responsibilities
    - Provides per-EVC connectivity management and fault isolation
    - Three types of packets: Continuity Check, L2 Ping, L2 Traceroute

IEEE 802.1ag CFM Concepts

- Nested Maintenance Domains (MDs):
  - Break up the responsibilities for network administration of a given end-to-end service
  - Defined by operational boundaries
  - Nest and touch, but do not intersect
- Maintenance Associations (MAs):
  - Monitor service instances under a given MD
  - Defined by the set of MEPs at the edge of a domain
  - Identified by {MA Name + MD ID}
- Maintenance Association End Points (MEPs):
  - Define the boundaries of an MD
  - Generate, initiate and respond to CFM PDUs
- Continuity Check Messages (CCMs): per-Maintenance Association multicast heartbeat messages
  - Carry the status of the port on which the MEP is configured
  - Unidirectional (no response required)
  - Transmitted by MEPs at a configurable periodic interval
  - Catalogued by MIPs at the same MD level and service; terminated by remote MEPs in the same MA

ITU-T Y.1731 Overview

- OAM functions for fault management:
  - Ethernet Continuity Check (ETH-CC); Y.1731 adds unicast CCM
  - Ethernet Loopback (ETH-LB); Y.1731 adds multicast LBM
  - Ethernet Linktrace (ETH-LT)
  - The three functions above are covered by IEEE 802.1ag
  - Ethernet Remote Defect Indication (ETH-RDI)
  - Ethernet Alarm Indication Signal (ETH-AIS)
  - Ethernet Locked Signal (ETH-LCK)
  - In addition: ETH-TEST, ETH-APS, ETH-MCC, ETH-EXP, ETH-VSP
- OAM functions for performance management:
  - Frame Loss Measurement (ETH-LM)
  - Frame Delay Measurement (ETH-DM)

Virtual Circuit Connection Verification (VCCV) Overview

- VCCV checks connectivity between the egress and ingress PEs
- VCCV allows control packets to be sent in band of pseudowires (PW):
  - Signaling component: communicates VCCV capabilities as part of VC label signaling
  - Switching component: causes the PW payload to be treated as a control packet; the payload is marked as a control packet for switching purposes, and the packet follows the PW data path
- VCCV types:
  - Type 1 (in-band VCCV): signals in-band VCCV [RFC4385] using the PW ID from the PW Control Word
  - Type 2 (out-of-band VCCV): signals out-of-band VCCV by inserting the MPLS router alert label between the tunnel and PW labels
  - Type 3 (TTL expiry): manipulates and signals TTL exhaust (TTL == 1), for multiple switching point PEs
- VCCV capability is negotiated when the AToM tunnel is brought up:
  - Depends on the LDP peer and the VC type
  - Both endpoints must have the same capabilities
- Control packets sent over the AToM tunnels are intercepted by the egress PE

MPLS Pseudowire Status Signaling Procedure

[Figure: AC - PE1 - PE2 - AC; PE1 and PE2 exchange an LDP Notification Message carrying the PW Status TLV with a PW Status Code]

- PEs exchange label mapping messages upon PW configuration.
- The PW Status Signaling method is selected if both peers support it; the simple Label Withdraw method is used if one of the peers doesn't support PW Status Signaling.
- The PW label won't be withdrawn unless the AC is administratively down or the PW configuration is deleted.
- The PW state is set to down if the label mapping is not available.
- The capability is on by default.

Virtual Circuit Connection Verification (VCCV)

For Your Reference

[Figure: emulated service between CE1 and CE2, with native service on each attachment circuit; pseudowires PW1 and PW2 between PE1 and PE2 carried in one PSN tunnel]

- Multiple Packet Switched Network (PSN) tunnel types: MPLS, IPsec, L2TP, GRE
- Motivation: one tunnel can serve many pseudowires. MPLS LSP ping is sufficient to monitor the PSN tunnel (PE-PE connectivity), but not the Virtual Circuits (VCs) inside the tunnel.

BFD Protocol Overview

- Accelerates convergence by running fast keepalives in a consistent, standardized mechanism across routing protocols
- Lightweight hello protocol:
  - Neighbors exchange hello packets at negotiated regular intervals
  - Configurable transmit and receive time intervals
- Unicast packets, even on shared media
- No discovery mechanism: BFD sessions are established by the clients, e.g. OSPF, IS-IS, EIGRP, BGP
- Client hello packets are still transmitted independently

[Figure: EIGRP, IS-IS, BGP and OSPF on each router acting as clients of one BFD session; BFD control packets exchanged between the peers]

BFD Details

For Your Reference

Async Mode
- Session established between two peers
- Timers are negotiated
- Hello packets are similar to IGP control packets
- Does NOT react to failures itself; it notifies its clients
- Async mode (no echo): periodic control packets are sent
  - The neighbor is declared dead if no packet is received for an <interval * multiplier> period

Echo Mode
- Session established using an async control session
- Echo packets are sent at the negotiated rate and used for failure detection
- Control packets are sent at a low rate

Scalability: between 500 and 4,000 sessions, depending on timer settings (50 ms to 1 s)
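The async-mode rule "interval * multiplier" determines how quickly a failure is noticed. A minimal sketch of the detection-time calculation as described in RFC 5880: the agreed interval is the larger of the local required minimum receive interval and the remote desired minimum transmit interval, multiplied by the remote Detect Mult. Function names are illustrative, not a Cisco or library API.

```python
def agreed_interval_ms(local_required_min_rx_ms: float,
                       remote_desired_min_tx_ms: float) -> float:
    """The remote transmits no faster than we are willing to receive."""
    return max(local_required_min_rx_ms, remote_desired_min_tx_ms)


def detection_time_ms(remote_detect_mult: int,
                      local_required_min_rx_ms: float,
                      remote_desired_min_tx_ms: float) -> float:
    """Async-mode detection time: Detect Mult x agreed interval (RFC 5880)."""
    return remote_detect_mult * agreed_interval_ms(local_required_min_rx_ms,
                                                   remote_desired_min_tx_ms)


def neighbor_down(last_rx_ms: float, now_ms: float, detect_ms: float) -> bool:
    """Declare the neighbor dead if no control packet arrived within the detection time."""
    return now_ms - last_rx_ms > detect_ms


for interval in (50, 300, 1000):
    print(f"{interval} ms x 3 -> {detection_time_ms(3, interval, interval)} ms")
```

The scalability trade-off on the slide follows from this: shorter intervals detect failures faster but require more packets per second per session.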

Network Convergence: Why It Takes So Long

1. Detection of link layer failure: ms
2. Report failure to route controller: ms
3. Generate and flood an LSP: 10s of ms
4. Trigger and compute an SPF: 10s of ms
5. Communicate new FIB entries to linecards: 100s of ms (bottleneck)
6. Install new FIB entries into the linecard HW path: 10s of ms

Network Convergence: Why It Takes So Long

The same convergence steps, annotated with where they are optimized:
- Optimize IGP convergence: detection of link layer failure (ms), report failure to route controller (ms), generate and flood an LSP (10s of ms), trigger and compute an SPF (10s of ms)
- Optimize LDP & BGP convergence
- BGP PIC: communicate new FIB entries to linecards (100s of ms) and install new FIB entries into the linecard HW path (10s of ms)

Hierarchical CEF

- Optimizes the data plane for sub-second convergence
- CEF data structure enhancements
- Solves the FIB download convergence bottleneck
- LSP and prefix independent
- Optimizes the FIB

Default IP to MPLS (flat):
- Before the failure: each BGP prefix FIB entry points to its own MPLS Label OCE, which points directly to the Adjacency OCE (interface)
- Repair: every prefix's MPLS Label OCE must be re-pointed to the new Adjacency OCE, so one CEF update message is needed per prefix

PIC Core IP to MPLS (hierarchical):
- Before the failure: the BGP prefix FIB entries and their MPLS Label OCEs share one Load Balance OCE, which points to the Adjacency OCE (interface)
- Repair: only the shared Load Balance OCE is re-pointed to the new Adjacency OCE, so one CEF update message covers multiple prefixes

Hierarchical CEF technologies: MPLS-FRR, IP-FRR, BGP PIC Core, BGP PIC Edge
Non-hierarchical CEF technologies: MPLS Path Protection
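The difference between the flat and hierarchical FIB comes down to how many objects must be rewritten on a failure. A minimal toy model, not Cisco's CEF implementation: the classes `FlatFib`, `HierarchicalFib` and `PathList` are hypothetical, with `PathList` standing in for the shared Load Balance OCE, and the interface names are made up.

```python
from dataclasses import dataclass


@dataclass
class PathList:
    """Shared forwarding object (plays the role of a Load Balance OCE)."""
    interface: str


class FlatFib:
    """Every prefix stores its own copy of the outgoing interface."""

    def __init__(self, prefixes, interface):
        self.entries = {p: interface for p in prefixes}

    def repair(self, failed, backup):
        updates = 0
        for prefix, itf in self.entries.items():
            if itf == failed:
                self.entries[prefix] = backup  # one update per prefix
                updates += 1
        return updates

    def lookup(self, prefix):
        return self.entries[prefix]


class HierarchicalFib:
    """Prefixes point to a shared PathList; a repair touches only that object."""

    def __init__(self, prefixes, interface):
        shared = PathList(interface)
        self.path_lists = [shared]
        self.entries = {p: shared for p in prefixes}

    def repair(self, failed, backup):
        updates = 0
        for path_list in self.path_lists:
            if path_list.interface == failed:
                path_list.interface = backup  # one update; all prefixes follow
                updates += 1
        return updates

    def lookup(self, prefix):
        return self.entries[prefix].interface


prefixes = [f"10.{i // 256}.{i % 256}.0/24" for i in range(1000)]
flat = FlatFib(prefixes, "Gi0/0/0/0")
hier = HierarchicalFib(prefixes, "Gi0/0/0/0")
print("flat updates:", flat.repair("Gi0/0/0/0", "Gi0/0/0/1"))
print("hierarchical updates:", hier.repair("Gi0/0/0/0", "Gi0/0/0/1"))
```

Both tables forward identically after the repair; only the amount of work needed to get there differs, which is what makes the hierarchical scheme prefix independent.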

BRKRST-3363

MPLS FRR 50 ms Convergence

Key Features
- Fast convergence for link and node failures
- Supported across all network topologies
- MPLS-TE traffic management: SRLG, BW reservation, per-tunnel traffic statistics

Caveats
- Requires MPLS and MPLS-TE
- No protection for ingress or egress tunnel failures
- Requires pre-computed backup paths
- Requires (n-1)! tunnels for full protection

Applicability
- Protecting links in the aggregation network

[Figure: IP/MPLS aggregation with a link failure; VC LSPs carried in a tunnel LSP and protected by an FRR LSP]

MPLS-FRR CEF

For Your Reference

MPLS-FRR, IP and MPLS:
- Before the failure: IP & MPLS CEF entries point to Loadbalance OCEs, then to a Midchain OCE and a Label OCE; an FRR OCE references both the primary path and a pre-computed MPLS-FRR backup path (each a Label OCE and an Adjacency OCE on its own interface)
- Repair: the FRR OCE switches to the pre-computed MPLS-FRR backup path with one CEF update message
- Typical FIB programming rate: ~5,000 to 10,000 CEF updates per second

MPLS Path Protection

Key Features
- Optimized for ring topologies
- Utilizes a pre-signaled backup tunnel
- MPLS-TE traffic management: SRLG, BW reservation, per-tunnel traffic statistics

Caveats
- Requires MPLS and MPLS-TE
- No protection for ingress or egress tunnel failures
- Convergence dependent on the number of IGP prefixes and L2VPN LSPs under protection

Applicability
- Protecting ring topologies

[Figure: IP/MPLS aggregation ring with a link failure; VC LSPs carried in a tunnel LSP with a protect tunnel LSP]

MPLS Path Protection CEF

For Your Reference

IP and MPLS:
- Before the failure: IP & MPLS CEF entries point to Loadbalance OCEs, each followed by a Midchain OCE, a Label OCE and an Adjacency OCE (interface); the MPLS-TE path protect tunnel is a separate chain
- Repair: each entry is re-pointed to the path protect tunnel, so one CEF update message is needed per IGP prefix and L2VPN LSP!
- Typical FIB programming rate: ~5,000 to 10,000 CEF updates per second
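The programming rate explains why path protection convergence depends on scale while MPLS-FRR does not. A minimal back-of-the-envelope sketch; the 2,000 IGP prefixes and 8,000 L2VPN LSPs are hypothetical numbers chosen only for illustration.

```python
def reprogram_seconds(cef_updates: int, updates_per_second: float) -> float:
    """Time to push a given number of CEF update messages at a given rate."""
    return cef_updates / updates_per_second


# Hypothetical load under protection: 2,000 IGP prefixes + 8,000 L2VPN LSPs.
path_protection_updates = 2_000 + 8_000  # one update per IGP prefix and L2VPN LSP
frr_updates = 1                          # MPLS-FRR: one update for the FRR OCE

for rate in (5_000, 10_000):  # typical programming rate range from the slide
    print(f"{rate}/s: path protection {reprogram_seconds(path_protection_updates, rate):.1f} s, "
          f"FRR {reprogram_seconds(frr_updates, rate) * 1000:.2f} ms")
```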

BRKRST-3363

IP FRR-LFA: 50 ms Convergence

Key Features
- 50 msec convergence for link and node failures
- Works for MPLS and IP-only environments
- Simple: automatic configuration of loop-free alternate paths via OSPF or IS-IS
- No tunnels

Caveats
- Requires a loop-free path for protection
- No bandwidth reservation
- No support for SRLG
- New feature

Applicability
- Strong solution for deployments with cost-effective bandwidth

[Figure: routers R1 to R5 with a link failure and a loop-free path]

No convergence required on routers R2, R3, R4 and R5 to maintain the green traffic flow!

Tight SLA Protection with IPFRR

For Your Reference

IPFRR Loop Free Alternate (LFA): principle of operation
- If A finds an alternate path to B, then this alternate path is valid for any destination that A normally routes via B

IPFRR LFA properties and benefits
- Automated: no additional setup
- No IETF protocol change: all the needed information is already in the classical LSDB
- Incremental deployment
- No interoperability testing
- <50 msec, prefix-independent
- Applicable to MPLS LDP networks

IPFRR LFA deployment dependency
- IPFRR LFA coverage depends on the network topology
- Two-plane network topologies are the most friendly for IPFRR LFA deployments
- Topology analysis is required to assess IPFRR LFA efficiency

[Figure: two-plane network topology]

LSDB = Link State DataBase; LDP = Label Distribution Protocol
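Whether a neighbor is usable as an LFA can be decided purely from the LSDB, which is why no protocol change is needed. A minimal sketch of the basic loop-free condition from RFC 5286, Dist(N, D) < Dist(N, S) + Dist(S, D), computed with Dijkstra over a small hypothetical topology (nodes S, E, N, M, D are made up): neighbor N qualifies, while stub neighbor M would send traffic back through S.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source over a weighted adjacency dict."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free_alternates(graph, s, d, primary_next_hop):
    """Neighbors N of S with Dist(N, D) < Dist(N, S) + Dist(S, D)."""
    dist_s = dijkstra(graph, s)
    lfas = []
    for n in graph[s]:
        if n == primary_next_hop:
            continue
        dist_n = dijkstra(graph, n)
        if dist_n[d] < dist_n[s] + dist_s[d]:
            lfas.append(n)
    return lfas


# Hypothetical symmetric topology: S reaches D via E (primary, cost 2).
GRAPH = {
    "S": {"E": 1, "N": 2, "M": 1},
    "E": {"S": 1, "D": 1},
    "N": {"S": 2, "D": 1},
    "M": {"S": 1},
    "D": {"E": 1, "N": 1},
}
print(loop_free_alternates(GRAPH, "S", "D", primary_next_hop="E"))
```

This also illustrates why coverage depends on topology: a neighbor like M, whose only way out is through S, can never be a loop-free alternate.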

IP FRR-LFA CEF Enhancement

For Your Reference

IP FRR-LFA, IP to MPLS:
- Before the failure: IP prefix FIB entries point to Load Balance OCEs and MPLS Label OCEs that share one IP-FRR OCE; the IP-FRR OCE references the primary Adjacency OCE (interface) and a pre-computed backup path (Adjacency OCE on another interface)
- Repair: the IP-FRR OCE switches to the pre-computed backup path, so one CEF update message covers multiple prefixes
- Cleanup after repair, assuming no available loop-free path: each IP prefix entry keeps its own Load Balance OCE and MPLS Label OCE, each pointing directly to an Adjacency OCE (interface)
- Typical FIB programming rate: ~5,000 to 10,000 CEF updates per second

Non Stop Forwarding (NSF)

[Figure: CE1 - PE1 - P1, with NSF-aware eBGP between CE1 and PE1 and NSF-aware IGP and LDP between PE1 and P1; traffic is forwarded continuously]

- NSF allows routers to maintain forwarding state when communication between them is lost
- Routing sessions are established with NSF-aware peers
- Upon an HA event, neighboring peers maintain forwarding until the routing sessions are re-established
- A copy of the FIB is maintained on the secondary and used on failure for continuous traffic flow
- Requires neighboring routers to be NSF aware

BRKRST-3363

BGP PIC Edge

- Optimizes BGP convergence for a BGP next-hop change:
  - PE to CE link failures
  - PE node failures
  - CE node failures
- Applicability: PE routers
- Requires bgp advertise-best-external to enable

[Figure: VPN1 site CE1 dual-homed to PE1 and PE2, VPN1 site CE2 behind PE3, with a link failure]

BGP PIC Edge PE-CE Link Protection

PE2

RR1 CE2 CE1

RR2
PE3

Traffic flow due to BGP best path

PE1

BRKSPG-2402

2013 Cisco and/or its affiliates. All rights reserved.

Cisco Public

47

BGP PIC Edge PE-CE Link Protection

PE2

RR1 CE2 RR2 PE3

The BGP pre-calculated Backup Path

CE1

Traffic flow due to BGP best path

PE1

BRKSPG-2402

2013 Cisco and/or its affiliates. All rights reserved.

Cisco Public

48

BGP PIC Edge PE-CE Link Protection

[Figure, step 3: the PE1-CE1 link fails.]

On the PE-CE link failure, PE1 detects that the link is down and its CEF layer switches to the pre-computed backup path. The best BGP path to CE1 is now through PE2.

[Figure, step 4: after the PE-CE link failure, the best BGP path to CE1 is through PE2.]


BGP PIC Edge PE-CE Link Protection


For Your Reference

[Figure: MPLS-VPN with CE1 attached to PE1 (primary) and PE2 (backup); CE2 attached to PE3.]

On the primary PE (PE1):
  router bgp 100
   address-family ipv4 vrf red
    bgp additional-paths install
    bgp advertise-best-external

On the backup PE (PE2):
  router bgp 100
   address-family ipv4 vrf red
    bgp advertise-best-external
    bgp additional-paths install

PE1 and PE2 precompute BGP backup paths using the BGP best-external approach. When the primary PE1-CE1 link fails:
PE1 holds on to its BGP local labels and re-routes CE1's traffic to PE2 using the labels advertised by PE2.
PE1 uses a fixed timer to clean up the stale local labels.

PE3 is expected to converge and start using PE2 as the BGP next hop (and the IGP label for PE2) to send traffic from CE2 to CE1.
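Best-external is what makes PE2's alternative visible in the first place. Assuming PE1 is made primary with a higher LOCAL_PREF (hypothetical values 200 vs. 100), PE2's overall best path is the iBGP path via PE1, which an iBGP speaker does not re-advertise to its iBGP peers; without best-external, the route reflectors never learn the path via PE2. The sketch below reduces best-path selection to LOCAL_PREF only and is illustrative, not the IOS algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int
    external: bool  # learned from an eBGP (CE) neighbor


def best(paths: list[Path]) -> Path | None:
    # Simplified best-path selection: highest LOCAL_PREF wins.
    return max(paths, key=lambda p: p.local_pref, default=None)


def advertise_to_ibgp(paths: list[Path], best_external: bool) -> Path | None:
    """Path this PE advertises to its iBGP peers / route reflectors."""
    overall = best(paths)
    if overall is None:
        return None
    if overall.external:
        return overall
    # Overall best is iBGP-learned, so it is not re-advertised into iBGP.
    if best_external:
        # Advertise the best eBGP-learned path instead.
        return best([p for p in paths if p.external])
    return None
```

For PE2 holding `Path("PE1", 200, False)` and `Path("CE1", 100, True)`, nothing is advertised without best-external, and the CE1 path is advertised with it.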

BGP PIC Edge PE-Node Protection


Relies on BGP Add-path
[Figure, step 1: CE1 attached to PE1 and PE2, CE2 attached to PE3, route reflectors RR1 and RR2; the best BGP path to CE1 is through PE1, and traffic follows it.]


[Figure, step 2: PE1 fails.]

On the PE node failure, the IGP signals that the router is dead; PE3 detects that PE1 is down and its CEF layer switches to the pre-computed backup path.


[Figure, step 3: after the PE1 node failure.]

At the next BGP next-hop scan, the path through PE2 becomes the best path.


BGP PIC Edge PE Node Protection


For Your Reference

[Figure: MPLS-VPN with CE1 attached to PE1 (primary) and PE2 (backup); CE2 attached to the ingress PE3.]

On the primary PE (PE1):
  router bgp 100
   address-family ipv4 vrf red
    bgp additional-paths install
    bgp advertise-best-external

On the backup PE (PE2):
  router bgp 100
   address-family ipv4 vrf red
    bgp advertise-best-external
    bgp additional-paths install

On the ingress PE (PE3):
  router bgp 100
   address-family ipv4 vrf red
    bgp additional-paths install

PE1, PE2 and PE3 precompute BGP backup paths. When node PE1 fails:
An IGP notification on PE3 invalidates the active path, and PE3 switches to the backup path.

PE3 is expected to converge and start using PE2 as the BGP next hop (and the IGP label for PE2) to send traffic from CE2 to CE1.
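On PE3 the repair is prefix-independent because all prefixes learned via PE1 share one BGP path list that also holds the add-path backup via PE2. The sketch below is a hypothetical, much-simplified model of that hierarchy (the class names are illustrative, not Cisco CEF data structures): the IGP notification that PE1 is unreachable invalidates the shared path list once, and every prefix immediately resolves via PE2.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PathList:
    """Shared BGP path list: primary next hop plus pre-computed backup."""
    primary: str
    backup: str | None = None
    primary_valid: bool = True

    def active_next_hop(self) -> str | None:
        return self.primary if self.primary_valid else self.backup


@dataclass
class Fib:
    # Many prefixes reference the *same* PathList object (hierarchical FIB).
    entries: dict[str, PathList] = field(default_factory=dict)
    updates: int = 0  # FIB writes performed during repair

    def igp_next_hop_down(self, next_hop: str) -> None:
        """IGP reports next_hop unreachable: invalidate each shared path list once."""
        touched: set[int] = set()
        for path_list in self.entries.values():
            if path_list.primary == next_hop and id(path_list) not in touched:
                touched.add(id(path_list))
                path_list.primary_valid = False
                self.updates += 1

    def lookup(self, prefix: str) -> str | None:
        return self.entries[prefix].active_next_hop()


if __name__ == "__main__":
    via_pe1 = PathList(primary="PE1", backup="PE2")  # add-path backup via PE2
    fib = Fib({f"10.{i // 256}.{i % 256}.0/24": via_pe1 for i in range(10_000)})
    fib.igp_next_hop_down("PE1")
    print(fib.updates, fib.lookup("10.0.5.0/24"))
```

Running it prints `1 PE2`: 10,000 prefixes are repaired by a single write.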

BGP PIC Edge Notes


Supported for IPv4/v6 and VPNv4/v6
Not supported for L2VPN and mVPN address families

For Your Reference

Failures detected using BFD or IGP


BGP PIC Edge CEF


For Your Reference

PIC Edge, IP to MPLS

[Figure: CEF output chain in three panels. Failure: BGP prefix FIB entries point to a shared Load Balance OCE with MPLS Label OCEs and interface Adjacency OCEs. Repair: the pre-computed backup path is used, with one CEF update message for multiple prefixes. Cleanup after repair: BGP prefix FIB entries with their Load Balance, MPLS Label and interface Adjacency OCEs.]

Typical FIB programming rate: ~5,000 to 10,000 CEF updates per second

BGP PIC Edge vs. BGP PIC Core


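For Your Reference

[Figure: VPN1 site 1 (CE1) attached to PE1 and PE2; VPN1 site 2 (CE2) attached to PE3. Numbered failure cases: (1) a core failure that changes the IGP path to the BGP next hop, (2) a remote PE node failure, (3) a PE-CE link failure.]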

BGP PIC Core: when the IGP path to the BGP next hop changes
1. Examples: PE-P or P-P link failure, P node failure.
Sub-second convergence (prefix-independent) vs. multiple seconds of convergence without it (prefix- and hardware-dependent).
Enabled by default since IOS XE 2.5.0 (cef table output-chain build favor convergence-speed).

BGP PIC Edge: when the BGP next hop changes
2. When the remote PE node fails or is no longer reachable.
3. When the PE-CE link fails.
Immediate to sub-second convergence (prefix-independent) vs. multiple seconds of convergence without it (prefix- and hardware-dependent).

IP/MPLS High-Availability Options: Scorecard


Network infrastructure: all transit links and nodes

IGP Fast Convergence (IGP FC)
Broke the barrier of <200 msec restoration time; covers all faults, including multiple failures
Recovery time: O(x00 ms)

IP/MPLS Fast ReRoute Loop-Free Alternate (IPFRR LFA)
Provides local protection (link, node) with <50 msec recovery; a tool to improve on IGP FC for most topologies (triangle, square, mesh)
Recovery time: <50 ms

MPLS TE Fast ReRoute (TE FRR)
Provides local protection (link, node, path) with <50 msec recovery
Recovery time: <50 ms

Service edge: edge node and access links

BGP Prefix Independent Convergence (BGP PIC)
IP/IPVPN scale-independent recovery in line with IGP FC and FRR; applicable to all BGP-based services (IPv4, IPv6, VPNv4, VPNv6)
Recovery time: O(x00 ms)

For Your Reference

Scorecard criteria: fault coverage, recovery time, operational simplicity

It is feasible to deliver very tight E2E service availability SLAs without increasing operational complexity.

System High Availability

System High-Availability with Hardware Redundancy


Redundant hardware components:
Power supplies, route processors, forwarding processors, switching matrix, SPA interface cards

Interface redundancy is typically achieved using IEEE 802.3ad / LACP or APS.
Hardware redundancy needs to be complemented by software redundancy features.

Cisco platforms supporting hardware redundancy: CRS-3, ASR 9000, ASR 5000, ASR 1000, Cisco 12000, Cisco 7600

ASR 9000 System Architecture


Distributed forwarding plane for performance: up to eight linecards (e.g. 40G line card with MAC, NPU, FIC and CPU), autonomous forwarding

Distributed IOS XR based control plane for scale: dual Route Switch Processors (RSPs) with BITS/DTI timing, and a dual-core CPU on each linecard

Active/active switch fabric for HA: non-blocking, memory-less fabric with service intelligence (high/low priorities, unicast and multicast recognition, VoQs)

Redundant EOBC, fan trays and power supplies


Example: ASR 1000 System Redundancy


[Figure: ASR 1000 system redundancy. Control plane: active and standby Route Processors (RP). Data plane: active and standby Embedded Services Processors (ESP), each with FECP, QFP subsystem, crypto assist and interconnect. SIPs, each with IOCP and SPA aggregation, host the SPAs. All modules connect through a passive midplane. Link types: ESI (Enhanced SerDes), 11.5 Gbps; SPA-SPI, 11.2 Gbps; HyperTransport, 10 Gbps; GE, 1 Gbps; I2C; SPA control; SPA bus.]


ASR 1000 Software Architecture


For Your Reference

RP:
IOS: runs the control plane, generates configurations, populates and maintains the routing tables (RIB, FIB)
Forwarding Manager (RP): provides the abstraction layer between hardware and IOS (manages ESP redundancy); maintains a copy of the FIB and interface list; communicates FIB status to the active and standby ESP (or bulk-downloads state information in case of restart)

ESP:
Forwarding Manager (ESP): communicates with the Forwarding Manager on the RP; provides the interface to the QFP Client/Driver; maintains a copy of the FIBs
QFP Client/Driver: programs the QFP forwarding plane and QFP DRAM; statistics collection and communication to the RP
QFP code: implements the forwarding plane; programs the PPEs with forwarding information

[Figure: software components per module. RP CPU: IOS, Chassis Mgr., Forwarding Mgr., kernel (incl. utilities). ESP FECP: Chassis Mgr., Forwarding Mgr., QFP Client/Driver, kernel; plus the QFP subsystem (QFP code) and crypto assist. SIP IOCP: Chassis Mgr., SPA drivers, kernel; plus SPA aggregation and SPAs.]


ASR 1006 High Availability Infrastructure


HA operates in a similar manner to other protocols on the ASR 1000: a reliable IPC transport is used for synchronization.

[Figure: ASR 1006 HA infrastructure. The active and standby RPs (IOSact, IOSsby) each run the Redundancy Facility (RF), the Checkpointing Facility (CF) and IPC message queues for HA-aware clients (Config, MLD, CEF, Mcast, RIB, MRIB, FIB, MFIB, IDB, driver/media layer, IDB state update messages), alongside non-HA-aware applications. The interconnect is used for IPC and checkpointing. FMRP on each RP communicates with FMESP and the QFP Client on the active and standby ESPs (ESPact, ESPsby), which serve the SPAs.]

Which Events Trigger Failovers?


The following events may trigger failovers on the RP/ESP:
1. Hardware component failures
2. Software component failures
3. Online Insertion and Removal (OIR)
4. CLI-initiated failover (e.g. reload command, force-switchover command)


1. Failover Triggers: Hardware Failures


What hardware failures?
a. CPUs: RP CPU, QFP, FECP, IOCP, interconnect CPU, I2C mux, ESP crypto chip
b. Memory: NVRAM, TCAM, bootflash, RP SDRAM, FECP SDRAM, resource DRAM, packet buffer DRAM, particle length DRAM, IOCP SDRAM
c. Interconnects: ESI links, I2C links, EOBC links, SPA-SPI bus, local RP bus, local FP bus
d. Other: heat sinks

Detected using:
CPLD interrupts / register bits, within O(ms), controlled by CMRP
Watchdog timers: low-level watchdogs running in O(min) that can initiate a reset (e.g. of the RP)
JTAG: the RP can program the CPLD on other modules and test interconnects and other boards (primarily for RMA'd hardware)

Interrupts generated by hardware failures initiate failover events.

Hardware failures are typically fatal, such that modules need to be replaced!

[Figure: RP (CPU), ESP (FECP, QFP subsystem, crypto assist) and SIP (IOCP, SPA aggregation) connected through their interconnects.]

2. Failover Triggers: Software Failures


What software failures?
a. Kernel: Linux on RP / ESP / SIP
b. Middleware: Chassis Manager (CM), Forwarding Manager (FM)
c. IOS
d. SPA drivers

Detected using:
Kernel: the kernel supervises the middleware and SPA driver processes (kmonitor()), so it always knows whether a process is healthy
IPC: between the two IOS instances (and only for IOS)

The kernel will take the module down in a controlled manner, and also set register bits to initiate failover for the ESP or RP.

IOS, CMESP, CMSIP, FMESP, QFP Driver/Client and IMSIP are not restartable!

Note: some other processes are restartable (CMRP, FMRP, SSH, Telnet); in this case the kernel will try to restart the processes.

[Figure: software stacks on the RP (IOS, Chassis Mgr., Forwarding Mgr., kernel), the ESP (FECP with Chassis Mgr., Forwarding Mgr., QFP Client/Driver and kernel; QFP subsystem, crypto assist) and the SIP (IOCP with Chassis Mgr., SPA drivers and kernel; SPA aggregation, SPAs).]
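The restart policy listed above can be summarized as a lookup. This is a hypothetical sketch of the decision only, not the kernel monitor itself: the function name is illustrative, process-to-module placement follows the component names (CM/FM suffixes), and it does not model restart limits or timing. Because SIPs are not redundant, a non-restartable SIP process takes the SIP down without a failover.

```python
RESTARTABLE = {"CMRP", "FMRP", "SSH", "Telnet"}

# Non-restartable processes and the module that hosts them.
NON_RESTARTABLE_MODULE = {
    "IOS": "RP",
    "CMESP": "ESP",
    "FMESP": "ESP",
    "QFP Driver/Client": "ESP",
    "CMSIP": "SIP",
    "IMSIP": "SIP",
}

REDUNDANT_MODULES = {"RP", "ESP"}  # SIPs/SPAs are not redundant


def on_process_failure(process: str) -> str:
    """Kernel reaction to a failed process (illustrative policy only)."""
    if process in RESTARTABLE:
        return "restart process"
    module = NON_RESTARTABLE_MODULE.get(process)
    if module is None:
        raise ValueError(f"process not covered by this sketch: {process}")
    if module in REDUNDANT_MODULES:
        # Controlled shutdown; register bits trigger the RP/ESP failover.
        return f"take {module} down, fail over to standby {module}"
    return f"take {module} down"
```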


RPact Failover Procedure


For Your Reference

[Sequence diagram. Participants: SIPs (CMSIP, kernel); ESP slot 0 (active) and ESP slot 1 (standby), each running FMESP, CMESP and kernel; RP slot 0 (active, fails) and RP slot 1 (standby), each running CMRP, FMRP, kernel and IOS.]

Steps shown:
The RPact failure is detected; RP slot 1 becomes active (ACT) and the failed RP restarts
New RPact information is distributed
ESI links are closed (ESP) and re-established; ESI link status is reported
State information is exchanged; if it is not received in time, a restart message is sent
The H/W component file system is updated

RPact Failover Procedure (cont.)


For Your Reference

[Sequence diagram. Participants: SIPs; ESP slots 0 and 1; RP slot 0 (failed) and RP slot 1 (new active).]

Steps shown:
The new RPact (slot 1) takes over control using checkpointed state and sends forwarding state information
The updated state is checked and the old state discarded; service is recovered
The failed RP (slot 0) restarts: H/W initialization, EOBC initialization, kernel start, IOS start, CM start, FM start
It receives the other RP's information, runs mastership and comes up as standby (SBY); ESI link status is reported
The RPsby is detected and receives forwarding state information

ESPact Failover Procedure


For Your Reference

[Sequence diagram. Participants: SIPs; ESP slot 0 (active, fails); ESP slot 1 (standby); RP slot 0 (active); RP slot 1 (standby).]

Steps shown:
The ESPact failure is detected (interrupt), and ESP slot 1 becomes active (ACT)
State information of the failed ESP is reported (Failed)
The ESI link with the failed ESP is disabled, the state of the ESI link with the new ESPact is changed, and the ESI links with the RPs are reconfigured; ESI link status is reported
State information is resent to the new ESPact
Service is recovered with momentary packet loss


ESPact Failover Procedure (cont.)


For Your Reference

[Sequence diagram. Participants: SIPs; ESP slots 0 and 1; RP slots 0 and 1.]

Steps shown for the failed ESP:
Restart, H/W initialization, EOBC initialization
Wait for the RPact; receive RPact information, detect the RPact and activate the ESI link
Download software packages
Start kernel, start FM, start CM, register with CMRP
Receive other-ESP information (e.g. mastership) and come up as standby (SBY)


IOS High Availability


During the initialization process, IOS is loaded on both the RPact and the RPsby, and the RPsby IOS learns about any events.
IOS uses the Redundancy Facility and the Checkpointing Facility.

For Your Reference

Redundancy Facility (RF): when to synchronize
A process that helps synchronize and coordinate switchovers (e.g. switchover events, switchover control and monitoring)
Clients of the RF maintain the databases that are synchronized using the RF
Examples: reliable internal data transfer service, event notification mechanisms, logging, etc.

Checkpointing Facility (CF): how to synchronize
Defines a set of APIs and a transport for the different SSO-aware features to copy state
Helps synchronize state data in a consistent, repeatable and well-ordered manner
Keeps checkpointing state
Interfaces to the RF
[Figure: active and standby RPs. Each runs HA-aware clients (Config, L2TP, CEF, PPP, driver/media layer, FIB, IDB) over the RF, CF and IPC message queues, alongside non-HA-aware applications; IDB state update messages flow toward the line card. The interconnect is used for IPC and checkpointing.]
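The "how to synchronize" role of the CF (copy each state change to the standby in a consistent, well-ordered way) can be sketched as below. The classes are hypothetical and synchronous for simplicity; the real CF carries checkpoints over IPC between the RPs, with the RF deciding when clients synchronize.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Standby:
    state: dict[str, str] = field(default_factory=dict)
    last_seq: int = 0

    def apply(self, seq: int, key: str, value: str | None) -> None:
        # Checkpoints must be applied in order (well-ordered synchronization).
        if seq != self.last_seq + 1:
            raise RuntimeError(f"checkpoint gap: expected {self.last_seq + 1}, got {seq}")
        if value is None:
            self.state.pop(key, None)  # the state entry was deleted on the active
        else:
            self.state[key] = value
        self.last_seq = seq


@dataclass
class Active:
    peer: Standby
    state: dict[str, str] = field(default_factory=dict)
    seq: int = 0

    def update(self, key: str, value: str | None) -> None:
        # Change local state, then checkpoint the same change to the standby.
        if value is None:
            self.state.pop(key, None)
        else:
            self.state[key] = value
        self.seq += 1
        self.peer.apply(self.seq, key, value)
```

After any sequence of `update()` calls the standby holds the same state as the active, so it can take over on switchover without rebuilding it.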


Cisco Software High-Availability Support


Stateful Switchover (SSO) support for features provides synchronization of dynamic feature state between hardware modules.
Configuration synchronization ensures that the running configuration is synchronized on the route processors.

Dynamic state preservation per feature area: connectivity protocols, routing and IP services, multicast, MPLS protocols, broadband, security, SBC

For Your Reference

ASR 1000:
Connectivity protocols, routing and IP services: FR, PPP, MLPPP, HDLC, 802.1Q, BFD (BGP, IS-IS, OSPF), RP, HSRP, IPv6 NDP, uRPF, SNMP, GLBP, VRRP, NSR (MP-iBGP, eBGP), ISSU, GRE
Multicast: IPv4 multicast (IGMP), IPv6 multicast (MLD, PIM-SSM, MLD access group), MoFRR
MPLS protocols: MPLS L3VPN, MPLS LDP, VRF-aware BFD; roadmap: NSR LDP, T-LDP
Broadband: PPPoE, L2TP (LAC, LNS), DHCPv4/v6, AAA, session state (virtual templates), ISG, ANCP, LI SSO
Security and SBC: stateful inter-chassis redundancy for FW / NAT; SSO

ASR 9000:
Routing and IP services: BFD (OSPF, BGP, IS-IS, static), NSF (IS-IS, OSPF, BGP), NSR (IS-IS, OSPFv2, OSPFv3, BGP)
Multicast: NSF multicast, BFD for PIM, MoFRR
MPLS protocols: NSF (LDP, T-LDP, RSVP-TE), NSR (LDP), BFD for MPLS FRR, VRF-aware BFD
Broadband: PPPoE (including nV)
Security: roadmap
SBC: roadmap


Non Stop Routing (NSR)


[Figure: P1, PE1 and CE1, with IGP and LDP between P1 and PE1 and eBGP between PE1 and CE1; traffic is forwarded continuously.]

Allows routers to maintain both routing state and forwarding state when communication between them is lost.

On a failure, routing sessions are maintained between the processors, so the routing sessions with the peer stay up.
A copy of the FIB is maintained on the secondary processor and used on failure for continuous traffic flow.

Neighboring routers do not need to be NSF-aware or NSF-capable, so NSR can provide high reliability without upgrading the CE.

BGP NSR
Implemented by hardening the code for:
BGP RIB checkpointing
BGP TCP interaction

Only supported for the IPv4 unicast and VPNv4 unicast address families in Cisco IOS

Configuration:
  router bgp <asn>
   address-family ipv4 vrf RED
    neighbor x.x.x.x ha-mode sso

No peer session flaps on VPNv4 CEs when the RP switches over

Route updates during the RP switchover are announced to VPNv4 CE peers

No delays, which prevents data black-holing during the RP switchover, as can happen with graceful-restart peers


OSPFv2 NSR
Provides the ability to perform hitless RP switchovers when OSPF is used as the routing protocol (expect zero traffic loss across such HA events)

To enable OSPFv2 Non-Stop Routing (NSR):
  router ospf <process id> [vrf <vrf name>]
   nsr

Activated on a per-process basis (for both IPv4 and IPv4 VRF, i.e. PE-CE sessions)
Depends on the forwarding plane's ability to retain state across control plane restarts and RP switchovers
Removes the dependency on OSPFv2 protocol extensions (NSF)
Neighboring routers are unaware that a router is NSR-capable
Neighboring routers are unaware that a router has gone through an RP switchover

Provides near-transparent RP switchover capability:
OSPF adjacencies remain up
Minimal state is refreshed by the restarting router after the switchover
Scalable to larger link-state database sizes and numbers of neighbors

Service High Availability

High Availability for Advanced Service Models


Many SP services already go beyond standard L3VPN / L2VPN / transport services, with increasing subscriber management capabilities and L4-L7 services. Examples:
Subscriber management, multicast, Session Border Controller, firewall, IPsec, LI

Some services can be made highly available using intra-chassis redundancy (e.g. IPsec, firewall, NAT, PPPoX, L2TP)
Stateless inter-chassis redundancy is available for BNG
Stateful inter-chassis redundancy is available for NAT, firewall and SBC on the Cisco ASR 1000

L3VPN Key HA Technologies


Physical: circuit diversity
Chassis redundancy: NSF/NSR
Multihoming
Link detection: IP Event Dampening, BFD
Routing protocols and convergence: BGP PIC Core, BGP PIC Edge, IP-FRR, MPLS path protection

For Your Reference

[Figure: end-to-end L3VPN (CPE, access, edge, core, edge, access, CPE) connecting sites A to D: Mobile Business (VRF Blue), Business VRF Red, Business VRF Green and Business VRF Orange.]

L3VPN Key HA Technologies


[Figure: the same end-to-end L3VPN topology, annotated per segment.]

CPE:
BFD for PE-CE link detection
NSF/NSR for chassis HA
PE multihoming: intra-site PE for PE diversity, inter-site for SP facility diversity

Access:
Circuit diversity: physical diversity for multihomed CPE
Physical circuit diversity is not the default; it must be requested from the SP

Edge:
BFD for PE-CPE / PE-P link detection
NSF/NSR for chassis HA
IP Event Dampening for PE-CPE
IP-FRR for PE-P, for cost-effective PE-P bandwidth
BGP PIC Core; BGP PIC Edge for multihomed CPE


L2VPN Pseudowire Redundancy


Active-standby PW access circuit redundancy
L2TPv3 and MPLS support

BRKSPG-2207

Detection mechanisms:
IGP convergence for remote PE failure
LDP signaling for PE-CE failure
LDP timeout for remote PE software failure

[Figure: a CE attached to PE1, with an active PW and a standby PW across the P network toward PE2 and PE3, which serve the remote CE.]

Multicast High-Availability Behavior


Before failure:
Multicast state is synchronized from the RPact to the RPsby: configuration and MLDv1/v2 state information
PIM and MRIB state are NOT synchronized
The MFIB is also synchronized to the ESPact and ESPsby

After failure:
The RPsby (now active) sends PIM hellos to all neighbors
PIM neighbors re-send their PIM state, and the newly active RP rebuilds the PIM state
The IGP reconverges to ensure the uRPF check
MFIB and ESP updates proceed to incorporate the refreshed PIM state
The ESPact continues to forward multicast traffic based on its version of the MFIB, so forwarding of multicast packets is NOT disrupted

[Figure: RPact and RPsby (IOSact, IOSsby) with Config, MLD, CEF, Mcast, MRIB, MFIB, RF, CF and IPC; FMRP passes the MFIB to FMESP and the QFP Client on the ESPact and ESPsby, which serve the SPAs.]

Multicast only Fast Re-Route (MoFRR)


[Figure: an IPTV source reaching a receiver via POP1, POP2 ... POPN; multicast joins are sent on both the primary (normal) path and the backup (alternate) path.]

Data packets are received from both the primary and the secondary paths; the redundant packets are discarded at topology merge points due to RPF checks.

On failure:
The interface on which packets are accepted changes; the backup path interfaces become active.

Configuration and restrictions:
Depends on ECMP for the alternate PIM join and will not work without it
Disabled by default and enabled through a CLI command
Applicable to IPv4 multicast only, not IPv6 multicast
Works only for SM (S,G) and SSM routes
Works where the RPF lookups are done in a single VRF
Extranet routes are not supported
Both the primary and the secondary paths should exist in the same multicast topology
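The receive-side MoFRR behavior described above (both streams arrive, only the RPF interface is accepted, and failover just moves the accepting interface) can be sketched per (S,G) as follows. The class and the interface names are hypothetical, and failure detection is outside the sketch.

```python
from __future__ import annotations


class MofrrState:
    """Per-(S,G) receive state with MoFRR (simplified sketch)."""

    def __init__(self, primary_if: str, backup_if: str) -> None:
        self.accept_if = primary_if  # interface that passes the RPF check
        self.backup_if: str | None = backup_if

    def receive(self, in_if: str) -> bool:
        """Return True if a packet arriving on in_if is accepted and forwarded."""
        # Both joins deliver the stream; the copy on the non-RPF interface is dropped.
        return in_if == self.accept_if

    def primary_failed(self) -> None:
        """Switch acceptance to the backup interface; its stream is already flowing."""
        if self.backup_if is None:
            raise RuntimeError("no backup path joined")
        self.accept_if, self.backup_if = self.backup_if, None
```

No new join has to be signaled at failure time, which is why recovery depends only on how quickly the failure is detected.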


Stateful Application Switchover: PPP


Copies state information for PPP, PPPoE and PPPoEoVLAN sessions
Switchover is transparent to peers: sessions are not torn down and re-established

PPP, PPPoE and PPPoEoVLAN session state includes:
Configuration (through config sync), including QoS configuration and ACLs
Session identifiers
PADR frame (cached)
RADIUS session attributes
Physical interface
VAI identifier
MD5 signature

Statistics are synchronized on the ASR 1000!


Stateful Application Switchover: L2TP


[Figure: RPact and RPsby with the L2TP control connections; the L2TP tunnel runs from the ESP to the LNS.]

The RPact synchronizes state with the RPsby.
State includes configuration, PPP session IDs, L2TP CC sequence numbers, etc.
Sequence numbers (Ns, Nr) for L2TP control connections (CC) are only synchronized once per window of X packets, i.e. once every X L2TP control packets.
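Synchronizing Ns/Nr only once per window of X packets bounds how far the standby's copy can lag. The sketch below assumes (hypothetically) that a checkpoint is taken whenever the count of sent control packets reaches a multiple of X, and it ignores 16-bit sequence-number wraparound. The slide does not say how the newly active RP closes the remaining gap after switchover, so the sketch only quantifies it.

```python
def checkpointed_ns(packets_sent: int, window: int) -> int:
    """Ns value held by the standby if Ns is checkpointed every `window` packets.

    Assumes a checkpoint each time packets_sent reaches a multiple of `window`.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    return (packets_sent // window) * window


def worst_case_gap(window: int) -> int:
    """Largest number of control packets the standby's copy can be behind."""
    return window - 1
```

For X = 10, after 25 control packets the standby holds 20, a gap of 5, within the worst case of 9. A larger X reduces checkpoint traffic at the cost of a larger gap to recover.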

BNG Service Edge High Availability


[Figure: a residential STB behind an Ethernet DSLAM and Ethernet aggregation, served by a BNG cluster. (1) The PADI reaches every BNG; (2) each BNG answers with a PADO, one of them after a configured PADO delay; (3) the client sends its PADR to the BNG whose PADO arrived first; (4) that BNG replies with a PADS.]

PPP Smart Server Selection allows the user to configure a specific PADO delay for a received PADI packet.
It can be configured per bba-group or based on circuit-id/remote-id.

In case of an outage of a BNG in the cluster, the other BNGs stand ready to accept subscriber sessions.
The failure can be detected at both ends of the PPPoE session because of missing keepalives.
Subscriber sessions have to be re-established.

Allows BNG redundancy with predictable behavior.
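The predictability comes from the PPPoE client typically answering the first PADO it receives. Under that assumption, and ignoring network latency differences, the BNG selection can be sketched as below. The BNG names and delay values are hypothetical.

```python
from __future__ import annotations


def select_bng(pado_delay_ms: dict[str, int], alive: set[str]) -> str | None:
    """Return the BNG whose PADO reaches the client first.

    Assumes the client sends its PADR to the first PADO received and that
    network latency is equal for all BNGs. BNGs that are down send no PADO.
    """
    candidates = {bng: delay for bng, delay in pado_delay_ms.items() if bng in alive}
    if not candidates:
        return None
    return min(candidates, key=lambda bng: candidates[bng])
```

With delays of 0 ms for BNG1 and 200 ms for BNG2, new sessions land on BNG1; if BNG1 fails, the sessions re-established by the clients land on BNG2.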



Stateful Inter-Chassis Redundancy

Motivation for Stateful Application Inter-Chassis Redundancy


Current intra-chassis HA typically protects against:
Control plane (RP) failures
Forwarding plane (ESP) failures

Interface failures can be mitigated using link bundling (e.g. GEC)

Any other failures may result in recovery times of O(hours). Inter-chassis redundancy provides additional resilience against:
Interface failures
System failures
Site failures (allowing for geographic redundancy)

[Figure: two chassis, each with RPs, FPs and SIPs.]


Stateful System Redundancy Models


Different deployment models:
1+1: one system is actively processing and passing traffic, the other is in standby mode
1:1: two systems are actively processing and passing traffic, and back each other up
N+1: N systems are actively processing and passing traffic, and share a single standby

For Your Reference

System vs. application:
Is the inter-chassis resilience applicable to ALL of the features / functions configured on the system, or only to a particular application?
System-level: provides resilience for ALL applications and traffic configured on a system
Application-level: provides resilience for a particular application and its traffic

Hot standby vs. cold standby:
Cold standby: FIB / adjacency updates are NOT synchronized between the active and standby systems
Hot standby: forwarding / state information is synchronized between the active and standby systems

Different approaches can also be categorized into:
Control plane active-standby / active-active
Forwarding plane active-standby / active-active

System Level Redundancy


Example: VSS
Failover granularity at the system level
Control plane active-standby: the active RP considers the remote linecards to be under its control
Forwarding plane active-active
No application granularity for failover
Need to ensure all features are SSO-capable

Application Level Redundancy

Example: RG infra
Failover granularity at the application level (NAT, firewall, SBC, etc.)
Control plane active-active: each RP only considers its own linecards, but synchronizes application state
Forwarding plane active-active
E.g. one set of firewall services can be made resilient, and another set left non-resilient

[Figure: failover between two chassis. System level: one RPact and one RPsby spread across the two chassis, with fabric and LCs. Application level: each chassis has its own RPact, ESP, SIPs and firewall (FW) instance.]


Stateful INTRA-Chassis Redundancy Revisited


Building blocks required to achieve stateful intra-chassis redundancy on the ASR 1006 / 1013:
1. Redundant hardware components: RP / ESP / ESI links (SIPs/SPAs are NOT redundant)
2. Forwarding / application state tables
3. Control mechanism to synchronize between the active and standby components: who is active / who is standby? Initiate failover in case of failure
4. State transfer mechanism: copy the forwarding / application state tables to the standby and keep them synchronized; currently provided by the IOS RF/CF infrastructure over IPC
5. Failure detection mechanism: interrupts

[Figure: ASR 1006 with two RPs (RIB, RT, NAT state) linked by IPC, two ESPs and SIPs; active and standby Enhanced SerDes (ESI) links form the internal data plane. Note: EOBC (internal control plane) infrastructure NOT shown.]

nV Edge Overview
Control plane EOBC extension (L1 or L2 connection): one or two 10G/1G links from each RSP, using special external EOBC 1G/10G ports on the RSP
An external EOBC link failure won't cause an RP failover as long as an alternative EOBC link is available

[Figure: chassis 0 with the active RSP and a secondary RSP, chassis 1 with the standby RSP and a secondary RSP; internal EOBC connects the LCs within each chassis; external EOBC links connect the RSPs, and an inter-chassis data link connects LCs.]

Inter-chassis data link (L1 connection): 10G or 100G bundle (up to 32 ports) over regular 10G or 100G data ports

The control plane EOBC extension uses special 1G or 10G EOBC ports on the RSP. The external EOBC can run over a dedicated L1 link or over a port-mode L2 connection.

The data plane extension uses regular LC ports (regular data ports and inter-chassis data plane ports can even be mixed on the same LC). No dedicated fabric chassis is required, which allows flexible co-located or different-location deployments at lower cost.

For redundancy purposes, a minimum of two control plane links and two data plane links is required.

Inter-Chassis Control Plane and Data Plane Packet Format


Inter-chassis control plane link:
Ethernet SNAP with a special ethertype and internal MAC addresses
Works over an L2 circuit as well, assuming it is port-mode (transparently forwards every packet)
An L1 link is recommended, with up to 10 msec latency

Inter-chassis data plane link:
Regular Ethernet frame with an 802.1Q tag (VLAN=1)
In theory it can work over an L2 circuit, but this has never been tested and won't be supported officially


Multi-chassis LAG
mLACP uses ICCP to synchronize LACP configuration and operational state between PoAs, giving the DHD the perception of being connected to a single switch.
All PoAs use the same System MAC Address and System Priority when communicating with the DHD; these are configurable or automatically synchronized via ICCP.

Every PoA in the RG is configured with a unique Node ID (value 0 to 7). Node ID + 8 forms the most significant nibble of the Port Number.
For a given bundle, all links on the same PoA must have the same Port Priority.

[Figure: DHD connected via LACP to PoA1 (Node ID 1, Port # 0x9001, Port Priority 1) and PoA2 (Node ID 2, Port # 0xA001, Port Priority 2); PoA1 and PoA2 run ICCP; both use System MAC aaaa.bbbb.cccc and System Priority 1.]
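The Port Number rule can be checked against the example values: Node ID 1 gives a top nibble of 9 (0x9001) and Node ID 2 gives 0xA (0xA001). In the sketch below, the lower 12 bits are a hypothetical per-PoA port index used only for illustration; the rule itself only fixes the top nibble.

```python
def mlacp_port_number(node_id: int, port_index: int) -> int:
    """LACP Port Number of a member link on a PoA (mLACP sketch).

    The most significant nibble is Node ID + 8; `port_index` is a
    hypothetical per-PoA value that fills the remaining 12 bits.
    """
    if not 0 <= node_id <= 7:
        raise ValueError("Node ID must be 0 to 7")
    if not 0 <= port_index <= 0xFFF:
        raise ValueError("port index must fit in 12 bits")
    return ((node_id + 8) << 12) | port_index


print(hex(mlacp_port_number(1, 1)))  # 0x9001
print(hex(mlacp_port_number(2, 1)))  # 0xa001
```

Because Node IDs are unique within the RG, the top nibble differs per PoA, so member links on different PoAs never present the same Port Number to the DHD.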



Inter-chassis Communication Protocol


ICCP allows two or more devices to form a Redundancy Group (RG)
ICCP provides a control channel for synchronizing state between devices
ICCP uses TCP/IP as the underlying transport
ICCP rides on a targeted LDP session, but MPLS need not be enabled

Various redundancy applications can use ICCP:
mLACP
Pseudowire redundancy

[Figure: an RG with ICCP over a dedicated link, and an RG with ICCP over a shared network.]


Control Plane HA Model


0 Active RSP Standby Secondary RSP DSC Chassis LC LC LC LC LC 1 Active Standby RSP RSP Standby Secondary RSP RSP

For Your Reference

Active control plane Standby control plane

Non DSC Chassis LC LC LC

There is only one active RSP and only one standby RSP at any given time, and they are located in two different chassis.
SSO/NSF/NSR work exactly as they do with two RSPs in the same chassis.

Reliable out-of-band control channel between the two chassis

The IOS-XR control plane can tolerate hundreds of msec of latency*, although latency can increase overall service convergence time.

The virtual chassis stays up as long as at least one chassis and one RSP are alive.
* In practice, a maximum of 10 msec latency between the two chassis is recommended.
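The role constraints above (one active, one standby, always in different chassis) can be illustrated with a toy Python model. It rests on stated assumptions, not on the IOS-XR election algorithm: RSPs are identified by hypothetical (chassis, slot) tuples, the initial active is simply the lowest-numbered RSP, and when the active fails the standby is promoted and a replacement standby is taken from the other chassis if one is alive.

```python
"""Toy model of nV Edge RSP role assignment across two chassis (illustrative only)."""
from typing import List, Optional, Tuple

Rsp = Tuple[int, str]  # (chassis id, hypothetical slot name)


def elect(alive: List[Rsp],
          prev_active: Optional[Rsp] = None,
          prev_standby: Optional[Rsp] = None) -> Tuple[Optional[Rsp], Optional[Rsp]]:
    """Return (active, standby) for the virtual chassis.

    Modelled constraints: one active, at most one standby, and the two
    must sit in different chassis. The virtual chassis stays up as long
    as any RSP is alive.
    """
    if not alive:
        return None, None
    if prev_active in alive:
        active = prev_active            # no change while the active is healthy
    elif prev_standby in alive:
        active = prev_standby           # SSO-style switchover to the standby
    else:
        active = min(alive)             # assumption: lowest (chassis, slot) wins
    # The standby must live in the other chassis.
    candidates = sorted(r for r in alive if r[0] != active[0])
    if prev_standby in candidates:
        standby = prev_standby
    else:
        standby = candidates[0] if candidates else None
    return active, standby


if __name__ == "__main__":
    rsps = [(0, "RSP0"), (0, "RSP1"), (1, "RSP0"), (1, "RSP1")]
    active, standby = elect(rsps)
    print(active, standby)              # (0, 'RSP0') (1, 'RSP0')
    # In this model, the standby takes over and the remaining chassis-0 RSP becomes standby.
    survivors = [r for r in rsps if r != active]
    print(elect(survivors, active, standby))  # ((1, 'RSP0'), (0, 'RSP1'))
```

If an entire chassis is lost, the model keeps an active RSP but leaves the standby slot empty, matching the rule that active and standby sit in different chassis.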

Data Plane Forwarding Model


For Your Reference

[Figure: Chassis 0 (active RSP, secondary RSP) and Chassis 1 (standby RSP, secondary RSP), each with line cards (LC), joined by a simulated switch fabric]


The inter-chassis data links simulate the switch fabric and provide the data connection between the two chassis. They offer features similar to the switch fabric, for example fabric QoS. Packet load balancing over the inter-chassis links works the same way as on a regular link bundle: per flow.
The existing IOS-XR two-stage forwarding model is kept: there is no forwarding architecture change between a single chassis and an nV Edge system.
When ECMP or link-bundle paths span both chassis, the local port is preferred over load balancing packets to the other chassis, to keep inter-chassis link usage as low as possible. This feature (local rack preference) can be turned off by user CLI.
Only a single multicast copy is sent over the inter-chassis link. Multicast replication is done on the egress line cards and fabric of the local chassis.
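Local rack preference combined with per-flow load balancing can be sketched as follows. This is an illustration under assumptions: members are hypothetical (chassis, port) pairs, the flow key is a 5-tuple, and CRC32 stands in for the hardware hash, which is not documented here. The names `Member` and `pick_member` are hypothetical.

```python
"""Sketch of per-flow member selection with local rack preference (illustrative only)."""
import zlib
from typing import List, NamedTuple, Tuple

FlowKey = Tuple[str, str, int, int, int]  # src IP, dst IP, protocol, src port, dst port


class Member(NamedTuple):
    chassis: int   # rack that owns the egress port
    port: str      # hypothetical port label


def flow_hash(flow: FlowKey) -> int:
    # Any deterministic hash keeps all packets of one flow on one member;
    # CRC32 is an arbitrary stand-in for the hardware hash.
    return zlib.crc32("|".join(map(str, flow)).encode())


def pick_member(members: List[Member], flow: FlowKey,
                ingress_chassis: int, local_preference: bool = True) -> Member:
    """Choose the egress member of an ECMP group or bundle for one flow."""
    candidates = members
    if local_preference:
        local = [m for m in members if m.chassis == ingress_chassis]
        if local:                 # fall back to remote members only if no local one exists
            candidates = local
    return candidates[flow_hash(flow) % len(candidates)]
```

With preference enabled, a flow entering chassis 0 never leaves via a chassis-1 member while a chassis-0 member exists, so it does not cross the inter-chassis links; with `local_preference=False`, flows are hashed over all members.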


Data Forwarding

For Your Reference

[Figure: packet walk from an ingress LC in Chassis 0, through the inter-chassis LCs and the inter-chassis link bundle, to an egress LC in Chassis 1]

1. Ingress forwarding lookup: regular L2/L3/multicast lookup on the ingress LC
2. Inter-chassis load balance: load balance across multiple inter-chassis links
3. Inter-chassis encapsulation on the inter-chassis LC
4. Inter-chassis decapsulation on the peer chassis's inter-chassis LC
5. Egress forwarding lookup: regular L2/L3/multicast lookup on the egress LC
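The packet walk in this figure (ingress lookup, inter-chassis load balance, encapsulation, decapsulation, egress lookup) can be traced with a toy Python model. Everything here is hypothetical: the FIB layout, the `InterChassisFrame` header fields and the function names are illustrations, not the real internal encapsulation, and the load-balance key is reduced to the destination address to keep the sketch short.

```python
"""Toy walk of a packet through an nV Edge system (illustrative only)."""
import zlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical FIB shared by both chassis: destination -> (chassis, egress LC, port)
Fib = Dict[str, Tuple[int, int, str]]


@dataclass(frozen=True)
class Packet:
    dst: str
    payload: bytes


@dataclass(frozen=True)
class InterChassisFrame:
    # Hypothetical internal header telling the peer chassis where to deliver.
    dst_chassis: int
    dst_lc: int
    inner: Packet


def egress_lookup(pkt: Packet, fib: Fib, chassis: int) -> str:
    owner, _lc, port = fib[pkt.dst]
    if owner != chassis:
        raise LookupError("packet delivered to the wrong chassis")
    return port


def forward(pkt: Packet, fib: Fib, ingress_chassis: int,
            icl_links: int) -> Tuple[str, Optional[int]]:
    """Return (egress port, inter-chassis link index or None if local)."""
    # 1. Ingress forwarding lookup on the ingress LC.
    dst_chassis, dst_lc, _port = fib[pkt.dst]
    if dst_chassis == ingress_chassis:
        return egress_lookup(pkt, fib, ingress_chassis), None
    # 2. Inter-chassis load balance: pick one bundle member per flow
    #    (keyed on destination only, to keep the sketch short).
    link = zlib.crc32(pkt.dst.encode()) % icl_links
    # 3. Inter-chassis encapsulation on the local inter-chassis LC.
    frame = InterChassisFrame(dst_chassis, dst_lc, pkt)
    # 4. Inter-chassis decapsulation on the peer chassis.
    inner = frame.inner
    # 5. Egress forwarding lookup on the egress LC of the peer chassis.
    return egress_lookup(inner, fib, frame.dst_chassis), link
```

A destination owned by the ingress chassis never touches the inter-chassis bundle; only remote destinations pay for the extra encapsulation and lookup.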

Introduction to RG-Infra
RG Infra is the IOS Redundancy Group Infrastructure, which enables the synchronization of application state data between different physical systems.
It does the job of RF/CF (Redundancy Facility/Checkpointing Facility) between chassis.

The infrastructure provides functions to:
Pair two instances of an RG configured on different chassis for application redundancy purposes
Determine the active/standby state of each RG instance
Exchange application state data (e.g. for NAT/Firewall)
Detect failures in the local system
Initiate and manage failover (based on RG priorities; pre-emption is supported)

Assumptions
Application state has to be supported by RG infra (the ASR 1000 currently supports NAT, Firewall and SBC).

Connectivity redundancy is solved at the architectural level (the redundant ESI links of the intra-chassis redundancy solution need to be externalized).

Redundancy Groups Functions


Registers applications as clients
Registers (sub)interfaces, or {SA/DA} tuples in the case of firewall
Determines whether traffic needs to be processed
For example, for Firewall: if a subset of sessions is associated with an RG in ACTIVE state, the Firewall application performs normal processing for those sessions and actively syncs the session state to another device that has the same RG in STANDBY state. For Firewall sessions associated with an RG in STANDBY state, the session information is synchronized from a device that has the RG in ACTIVE state.

Communicates control information between RGs using a redundancy group protocol:
Advertisement of RGs and RG state
Determination of the peer IP address
Determination of the presence of an active RG

Synchronizes application state data using a transport protocol
Manages failovers


[Figure: two ASR 1000 chassis, each with an RP, an ESP (running the Firewall, FW) and SIPs with SPAs; the RG instances on the two chassis exchange RG control data and RG state data]
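The Firewall behaviour described above (process and sync when the RG is ACTIVE, receive sync when it is STANDBY) can be modelled in a few lines. This is a toy sketch, not the ASR 1000 implementation: sessions are simplified to (source, destination) pairs, sync is a direct method call instead of the RG transport, and `FirewallNode`, `process`, `receive_sync` and `take_over` are hypothetical names.

```python
"""Toy model of RG-based firewall session sync between two chassis (illustrative only)."""
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Session = Tuple[str, str]  # simplified (source address, destination address) tuple


@dataclass
class FirewallNode:
    name: str
    rg_state: Dict[int, str]                                     # RG id -> "ACTIVE" / "STANDBY"
    sessions: Dict[Session, int] = field(default_factory=dict)   # session -> RG id
    peer: Optional["FirewallNode"] = field(default=None, repr=False, compare=False)

    def process(self, session: Session, rg: int) -> bool:
        """Handle a new session; return True if this node processed it."""
        if self.rg_state.get(rg) != "ACTIVE":
            return False                           # not ours to process
        self.sessions[session] = rg
        if self.peer is not None:
            self.peer.receive_sync(session, rg)    # push state to the STANDBY peer
        return True

    def receive_sync(self, session: Session, rg: int) -> None:
        # Only an RG in STANDBY state accepts synchronized session state.
        if self.rg_state.get(rg) == "STANDBY":
            self.sessions[session] = rg

    def take_over(self, rg: int) -> None:
        # Failover: this node's RG instance becomes ACTIVE.
        self.rg_state[rg] = "ACTIVE"
```

Configuring RG1 active on one node and RG2 active on the other gives an active-active split: each node processes its own RGs' sessions and holds a synchronized copy of the peer's, so a takeover finds the session state already in place.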

Redundancy Groups Functions Details


Configuration of stateful system redundancy
Priority (similar to HSRP priority, used for RG state determination)
Preemption
Name

For Your Reference

Peer Management
Maintain information about peers

Fault Handling
Changing RG priorities (may affect RG state)
Fault event dampening
Logging
Integration with Enhanced Object Tracking / BFD

RG State control
States: Init, Active, Standby, Disabled
Communicating state changes to other software entities in the system (e.g. QFP software)

Synchronization management
Synchronization state tracking (the standby has to request bulk updates from the active)
Determines when synchronization is started (e.g. ensures transport is available)

Transport Connectivity
Knows via which interface application state is synchronized; this can differ for application state data and RG control messages
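Since RG priority is described as similar to HSRP priority, an HSRP-style decision with tracking and preemption can be sketched as below. The tie-break on the higher address and the per-object priority decrement are assumptions borrowed from HSRP behaviour, not confirmed RG semantics; dampening and the BFD/EOT integration are out of scope. `RgPeer`, `initial_active` and `reevaluate` are hypothetical names.

```python
"""HSRP-like RG state decision with tracking and preemption (illustrative only)."""
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RgPeer:
    name: str
    priority: int                          # configured RG priority, higher is better
    address: str                           # assumption: higher address breaks ties, as in HSRP
    preempt: bool = False
    track: Dict[str, int] = field(default_factory=dict)  # tracked object -> decrement
    failed: Set[str] = field(default_factory=set)        # tracked objects currently down

    def effective_priority(self) -> int:
        return self.priority - sum(dec for obj, dec in self.track.items() if obj in self.failed)

    def rank(self):
        return (self.effective_priority(), ipaddress.ip_address(self.address))


def initial_active(peers: List[RgPeer]) -> RgPeer:
    # With no current active, the best-ranked peer becomes ACTIVE.
    return max(peers, key=RgPeer.rank)


def reevaluate(active: RgPeer, standby: RgPeer) -> RgPeer:
    """Return the peer that should be ACTIVE after a priority change."""
    # A better-ranked standby only takes over if preemption is enabled on it.
    if standby.rank() > active.rank() and standby.preempt:
        return standby
    return active
```

For example, with hypothetical priorities 150 and 100 and a tracked uplink worth 60 on the first peer, an uplink failure drops its effective priority to 90; the second peer takes over only if it has preemption enabled.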


Intra-Chassis vs. Inter-Chassis Redundancy

For Your Reference

Function / Method | Stateful Intra-chassis | Stateful Inter-chassis
Hardware redundancy | ESP, RP | ESP, RP, Interfaces
Redundant connectivity | Internal ESI links | Redundant links to neighbor nodes
Redundancy control | RF/CF | RG
State synchronization | IPC over EOBC | External GEC
Failure detection mechanism | Interrupts | BFD, Hellos
Failover mechanism | Chassis Manager | RG protocol (HSRP-like)


Possible RG-Infra Redundancy Models

Active-Standby
All application traffic is associated with a SINGLE RG instance
A failure switches all traffic over to the standby chassis
[Figure: RGact on one chassis, RGsby on the other]

Active-Active
Multiple RG instances are configured per system
A subset of traffic is associated with a particular RG instance
A single failure only affects a subset of the overall application traffic
[Figure: RG instances RG1, RG2 and RG3, each active on one chassis and standby on another]

2+1 Active-Standby
2 or more chassis loadshare application traffic, backed up by a single standby system
A subset of traffic is associated with a particular RG instance on different chassis
A single failure only affects a subset of the overall application traffic
[Figure: RG1act and RG2act on two chassis, with RG1sby and RG2sby on a single standby chassis]
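The difference between the three models is how much traffic must fail over when one chassis is lost. The toy calculation below makes that concrete; the chassis names and the equal traffic shares are hypothetical example values, and `failover_share` and `unprotected_rgs` are hypothetical helper names.

```python
"""Share of traffic that must fail over when one chassis fails (illustrative only)."""
from typing import Dict, Tuple

# RG id -> (chassis where ACTIVE, chassis where STANDBY, share of total traffic)
Placement = Dict[str, Tuple[str, str, float]]

# Hypothetical placements for the three models, with equal traffic shares.
ACTIVE_STANDBY: Placement = {"RG": ("A", "B", 1.0)}
ACTIVE_ACTIVE: Placement = {"RG1": ("A", "B", 0.5), "RG2": ("B", "A", 0.5)}
TWO_PLUS_ONE: Placement = {"RG1": ("A", "C", 0.5), "RG2": ("B", "C", 0.5)}


def failover_share(placement: Placement, failed_chassis: str) -> float:
    """Fraction of total traffic whose RG was ACTIVE on the failed chassis."""
    return sum(share for active, _sby, share in placement.values()
               if active == failed_chassis)


def unprotected_rgs(placement: Placement, failed_chassis: str) -> set:
    """RGs that lose one side of their active/standby pair with this chassis."""
    return {rg for rg, (active, standby, _share) in placement.items()
            if failed_chassis in (active, standby)}
```

A chassis failure moves all traffic in the Active-Standby model but only half of it in the two-RG Active-Active and 2+1 examples. In the 2+1 case, losing the shared standby chassis moves no traffic but leaves both RGs without a backup.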

Case Studies

Case Study: Highly Available IP Architecture for Mobile (One-Second Convergence Requirement)

[Figure: mobile transport architecture. Cell sites: CSRs with EvDO/LTE and 1xRTT VRFs and MSPPs, attached via local VLANs or T1s and an EoMPLS backhaul. MTSO: aggregation PEs Agg1 and Agg2 (QFP-based) for service termination, with OSPF/RIP/VRRP. Cores reached over an MPLS VPN with EvDO/LTE and 1xRTT VRFs: LTE core (MME, SGW), CDMA core (MSC, RNC) and the Internet core. Protection mechanisms: transport VLANs, static routes, BGP PIC Edge, BFD. Domains: one L2 domain and two L3 domains. Link types: FE/T1, GE, 10 GE.]

Case Study: Highly Available IP Architecture for Mobile Transport


Static routes establish connectivity between loopback addresses
BFD sessions are established
VRF tables are updated

[Figure: cell-site CSRs and MSPPs connected to aggregation PEs Agg1 and Agg2 (QFP-based), with EvDO/LTE and 1xRTT VRFs across the MPLS VPN toward the LTE core (MME, SGW), the CDMA core (MSC, RNC) and the Internet core]

Static routes for cell-site reachability
BGP PIC Edge for Layer-3 convergence
VRRP for the MTSO

Case Study: Highly Available IP Architecture for Mobile (Steady-State Traffic Flows)

[Figure: cell-site CSRs and MSPPs carrying 1xRTT and EVDO traffic to aggregation PEs Agg1 and Agg2, with EvDO/LTE and 1xRTT VRFs across the MPLS VPN toward the LTE core (MME, SGW), the CDMA core (MSC, RNC) and the Internet core]

Steady state: the CSR distributes flows across both Aggs using ECMP. Traffic may flow across the Agg inter-switch links. Each Agg handles traffic for all services from the cell site.

Case Study: Highly Available IP Architecture for Mobile (Link Failure)

No changes to LAN-side connectivity

[Figure: cell-site CSRs and MSPPs connected to aggregation PEs Agg1 and Agg2, with the link between the MSPP and Agg1 marked as failed]

Steady state: traffic flows are distributed across both Aggs.
Failure: the GE link from the MSPP to Agg1 fails.
Action: the BFD session to Agg1 times out at the CSR, and the Agg1 next hop is removed from the forwarding table. Traffic flows resume across the existing path to Agg2.
Result: traffic flows to Agg1 via Agg2.
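Two pieces of this sequence can be sketched in Python: how long BFD takes to declare the session down, and how removing the failed next hop moves every flow to the surviving Agg. The detection-time formula follows RFC 5880 asynchronous mode; the 50 ms timers and multiplier of 3 used in the example are hypothetical values, not taken from this case study. `EcmpRoute` is a hypothetical name, and CRC32 stands in for the router's flow hash.

```python
"""BFD detection time and BFD-driven ECMP next-hop removal (illustrative only)."""
import zlib
from typing import List


def bfd_detection_time_ms(local_required_min_rx: int, remote_desired_min_tx: int,
                          remote_detect_mult: int) -> int:
    # RFC 5880 asynchronous mode: remote Detect Mult x the agreed interval,
    # i.e. the larger of our Required Min RX and the peer's Desired Min TX.
    return remote_detect_mult * max(local_required_min_rx, remote_desired_min_tx)


class EcmpRoute:
    """A route with equal-cost next hops, each protected by a BFD session."""

    def __init__(self, next_hops: List[str]):
        self.next_hops = list(next_hops)

    def bfd_down(self, next_hop: str) -> None:
        # BFD timeout: remove the next hop from the forwarding entry.
        if next_hop in self.next_hops:
            self.next_hops.remove(next_hop)

    def select(self, flow: str) -> str:
        if not self.next_hops:
            raise LookupError("no usable next hop")
        return self.next_hops[zlib.crc32(flow.encode()) % len(self.next_hops)]


if __name__ == "__main__":
    print(bfd_detection_time_ms(50, 50, 3))  # 150
    route = EcmpRoute(["Agg1", "Agg2"])
    route.bfd_down("Agg1")
    print(route.select("any-flow"))          # Agg2
```

With these example timers, detection takes 150 ms, leaving most of a one-second budget for the forwarding update.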

Case Study: Highly Available IP Architecture for Mobile (Aggregation Switch Failure)

[Figure: cell-site CSRs and MSPPs connected to aggregation PEs Agg1 and Agg2, with EvDO/LTE and 1xRTT VRFs across the MPLS VPN toward the LTE core (MME, SGW), the CDMA core (MSC, RNC) and the Internet core]

Steady state: traffic flows are distributed across both Aggs.
Failure: Agg1 power outage.
Action: BFD and VRRP sessions time out; BGP and OSPF neighbors drop due to BFD; BGP PIC Edge ensures sub-second convergence. Traffic flows resume across the existing path through Agg2.
Result: traffic flows via Agg2 to the end hosts.
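The reason BGP PIC Edge converges independently of the number of prefixes is that prefixes point at a shared path list holding a precomputed backup, so one update repairs all of them. The Python below is a toy illustration of that idea, not the hierarchical FIB used in hardware; the prefixes are generated placeholders, next hops are labelled Agg1 and Agg2 only for readability, and `PathList`, `build_fib` and `primary_failed` are hypothetical names.

```python
"""Prefix-independent convergence via a shared path list (illustrative only)."""
from typing import Dict, Iterable


class PathList:
    """BGP path list shared by every prefix with the same primary/backup pair."""

    def __init__(self, primary: str, backup: str):
        self.primary = primary
        self.backup = backup          # precomputed before any failure
        self.primary_up = True

    def next_hop(self) -> str:
        return self.primary if self.primary_up else self.backup


def build_fib(num_prefixes: int, shared: PathList) -> Dict[str, PathList]:
    # Hypothetical prefixes; all point at the SAME PathList object.
    return {f"10.{i // 256}.{i % 256}.0/24": shared for i in range(num_prefixes)}


def primary_failed(path_lists: Iterable[PathList], failed: str) -> int:
    """Mark a next hop down; return how many objects had to be written."""
    writes = 0
    for pl in path_lists:
        if pl.primary == failed and pl.primary_up:
            pl.primary_up = False
            writes += 1
    return writes
```

With 10,000 prefixes sharing one path list, the failure of Agg1 costs a single write, after which every prefix resolves to Agg2. A per-prefix FIB would instead need 10,000 updates.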

Case Study: Highly Available IP Architecture for Mobile (CSR Failure)

[Figure: cell-site CSRs and MSPPs connected to aggregation PEs Agg1 and Agg2, with one CSR marked as failed]

Steady state: traffic flows are distributed across the CSRs.
Failure: CSR power outage.
Action: BFD sessions time out; BGP neighbors drop due to BFD; mobile handsets resync to a neighboring cell site.
Result: mobile handset voice connectivity is maintained.

Summary

Motivation for High Availability in SP Aggregation Networks
Network Level High Availability
System High Availability
Service High Availability
Stateful Inter-chassis Redundancy
Case Studies
Summary and Conclusions

Key Takeaways
High Availability is increasingly being deployed in Aggregation Networks
Motivated by experiences with MPLS Core Networks

Many high-availability techniques deployed in the core are now applied in MPLS aggregation networks
MPLS TE FRR, BFD, EOAM, Pseudowire Redundancy

Service High Availability requires a comprehensive approach, including the deployment of:
Network-level resiliency
System-level resiliency
L4-7 service resiliency

Stateful inter-chassis redundancy is increasingly being considered to provide geographic redundancy for applications

High Availability comes at a cost (CAPEX & OPEX)!


Call to Action

Visit the Cisco Campus at the World of Solutions to experience Cisco innovations in action
Get hands-on experience by attending one of the Walk-in Labs
Schedule a face-to-face meeting with one of Cisco's engineers at the Meet the Engineer center
Discuss your project's challenges at the Technical Solutions Clinics


