1. In this example, Spanning Tree is not configured. Port b (SW02) is an individual network segment.
Port a is VLAN 0 and Port b is VLAN 1. From the perspective of each BIG-IP, its own network (a for BIG-IP1 and b for BIG-IP2) is available.
From the BIG-IP perspective, only the Active unit replies to Address Resolution Protocol (ARP) requests for Virtual Servers, Source (or Secure) Network Address Translations (SNATs), Network Address Translations (NATs) and Floating Self IPs. BIG-IP1 is the Active Local Traffic Manager (LTM). Assuming a failover from BIG-IP1 to BIG-IP2, BIG-IP2 would send out a Gratuitous ARP to force the switch to begin forwarding incoming traffic on Port b. Failover happens without the switches needing to recalculate STP. It is recommended that MAC (Media Access Control) Masquerade be configured on the BIG-IP to eliminate any potential ARP cache issues. It is also recommended that VLAN Fail-Safe be enabled so LTM can fail over in the event of a switch failure.
The BIG-IP (in this design), with Serial Failover enabled, offers a very fast failover time and, coupled with session mirroring, allows traffic to continue without failing. In some applications and topology configurations, the application may experience a 1-3 second pause before continuing. This is acceptable for most protocols.
Instructor Notes: Remind: Explain:
When BIG-IP1 fails over to BIG-IP2, BIG-IP2 begins forwarding traffic almost instantly through port b on SW02. However, if SW01 goes down for any reason, traffic will not fail over to SW02 unless VLAN Fail-Safe is enabled on LTM. Also note that if any devices on the connected segments do not honor the Gratuitous ARP and update their cache, MAC Masquerade must be configured on all connected VLANs.
Summary
While this is the recommended configuration, an additional consideration could include using a minimum of two ports per switch in an aggregated channel, such as Link Aggregation Control Protocol (LACP), as a way to mitigate individual port or cable failures.
Disadvantage(s)
1) Some Network Administrators prefer meshed environments.
2) MSTP failover is relatively quick if set up correctly, creating design strife between architects.
3) LTM is fully responsible for failover, so configuration specifics and monitoring are critical.
Advantage(s)
1) STP is the most common deployment. It uses LTM failover mechanisms.
2) STP provides the most reliable failover process with predictable traffic flows.
3) STP provides a simplistic troubleshooting environment.
4) LTM will fail over with network or switch failure.
Ask:
Meshed
This example illustrates a Spanning Tree with Cisco Catalyst where SW01 is configured as the root bridge (Fully Meshed).
1. In this example, the root bridge determines that Port b (SW02) is in blocking. This means that Port a is forwarding from the switch perspective, and is talking to both BIG-IP1 and BIG-IP2. From the perspective of each BIG-IP, Port a (BIG-IP1) is available and Port b (BIG-IP2) is not available. Also, from the BIG-IP perspective, the only unit that is passing traffic is the active one, which means production traffic passes through switch Port a.
Note that although no production traffic is passing through the standby unit (BIG-IP2), Bridge Protocol Data Unit (BPDU) traffic will be forwarded through the interfaces as required to maintain proper STP configuration. Turning off STP pass through on BIG-IP1 and BIG-IP2 will cause the switches to change Port b from blocking to forwarding, immediately causing bridge loops and network failure due to broadcast floods. It should also be noted that BPDUs do not pass across more than one 802.1q tagged VLAN, as BPDUs are not 802.1q compliant.
2. Assume a failover from BIG-IP1 to BIG-IP2. When this happens, BIG-IP2 sends out a Gratuitous ARP to force the switch to begin forwarding incoming traffic on Port b instead of Port a. From the switch perspective, we assume that neither of the ports is down and in forwarding mode, so the failover happens without the switches needing to recalculate STP. The BIG-IP with hardware failover offers a very fast failover time and, coupled with session mirroring, allows traffic to continue without failing. In some applications and topology configurations, the application may experience a 1-3 second pause before continuing, which is acceptable for most protocols.
A point to consider: suppose that after failover has occurred and traffic is going through BIG-IP2, BIG-IP1 loses its link on Port a. This will cause the switch to recalculate STP so that Port b is placed in blocking and Port a is placed in forwarding. However, because no production traffic is going through BIG-IP1, this does not interrupt the normal flow of traffic. With STP active on both Cisco switches, and with SW01 configured as the root bridge, the links between SW01 and SW02 become the root path and SW02 Port b changes its status to blocking mode. During a BIG-IP failover, there will be a seamless transition from BIG-IP1 to BIG-IP2, or from Port a to Port b.
Note: Port b on SW02 is in blocking mode, and because Spanning Tree is not triggered, failover happens instantly. If there is a link failure on a and/or b, Spanning Tree will begin to recalculate. STP requires recalculation time, which by default is 30 seconds (15 seconds for listening and 15 seconds more for learning). There may also be some additional seconds of delay due to ARP packets sent by BIG-IP to discover Layer 2 information before it starts forwarding packets. These processes can take 40 or more seconds to complete.
Forty or more seconds of traffic delay will definitely impact application timers used to keep sessions open, therefore requiring clients to restart sessions.
Summary
Under certain conditions, seamless port failover is not available with this topology.
Disadvantage(s)
1) Fully meshed is not a common deployment.
2) Traffic flows are difficult to predict, since the Network Protocol is controlling how traffic passes regardless of the LTM condition.
3) Fully meshed creates complex troubleshooting environments.
4) Adds an extra layer of complexity.
4. Additional fault tolerance can be achieved by trunking two interfaces to create the peering networks, and configuring Link Aggregation Control Protocol (LACP).
Link Aggregation Control Protocol (LACP) Configuration Tips
1. The interfaces that you specify for an assigned trunk must operate at the same media speed and must be set to full-duplex mode, and any interface that you assign to a trunk must be an untagged interface.
2. Active mode (default) will cause the BIG-IP to periodically issue LACP control packets at the interval specified by the configured timeout value (Short = 1 second, and Long = 30 seconds).
a. We generally recommend that you leave the mode set to Active. However, if you set the LACP mode to Passive, do so only on one peer system. If you set both systems to Passive mode, no control packets will be sent.
3. F5 Networks has enhanced the 802.3ad specification for LACP by adding a Link Selection Policy option.
a. When you set the link selection policy to Auto, the system aggregates any links that have the same media properties and are connected to the same peer as the reference link.
b. When you set the link selection policy to Maximum Bandwidth, the BIG-IP system aggregates the subset of member links that provide the maximum amount of bandwidth to the trunk.
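The Active/Passive rule in tip 2 can be illustrated with a small sketch. This is not BIG-IP code; the function name and the mode strings are assumptions made for illustration, and only the rule itself (at least one peer must be Active) and the two timeout values come from the tips above.

```python
# Timeout values quoted in the tips above: Short = 1 second, Long = 30 seconds.
LACP_TIMEOUT_SECONDS = {"short": 1, "long": 30}


def lacp_will_negotiate(local_mode: str, peer_mode: str) -> bool:
    """Return True if at least one side is Active.

    Hypothetical helper: if both peers are Passive, neither side starts
    sending LACP control packets, so the check returns False.
    """
    modes = {local_mode.lower(), peer_mode.lower()}
    if not modes <= {"active", "passive"}:
        raise ValueError("mode must be 'active' or 'passive'")
    return "active" in modes


if __name__ == "__main__":
    for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
        print(pair, "->", lacp_will_negotiate(*pair))
```

A check like this could be run against a planned trunk configuration before both peers are changed to Passive by mistake.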
5. It is generally recommended that you allow the default services on your PeerNet(s).
a. Failover.NetTimeoutSec is the number of seconds the unit will wait, counted from its last received update, for a new update from the Active unit. The default setting is three seconds (use bigpipe db Failover.NetTimeoutSec <#> to alter the value).
Serial cable failover is based on heartbeat detection, where voltage is continuously sent from one BIG-IP Controller to another. If one BIG-IP stops responding, failover to the peer will occur in less than one second, so it provides the fastest recovery in the event of system failure.
Network failover is also based on heartbeat detection, but instead of using the serial cable, heartbeat packets are sent over the internal network on port 1028. If one BIG-IP stops responding, failover to the peer will occur in approximately 5 seconds by default.
If the BIG-IP is configured with both serial cable failover and network-based failover, then the serial cable signal and the network heartbeat must both fail before the standby BIG-IP will become active.
A potential problem with network failover is that network problems may cause both LTM units to enter active mode. To avoid this serious issue, we recommend that if you use network failover, you cross over one interface on each unit to carry only failover communications. The self IPs on this interface will require Custom Port Lockdown.
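The rule that every configured heartbeat path must fail before the standby unit takes over can be sketched as a simple decision function. This is an illustrative model only, not how the failover daemon is implemented; the function and parameter names are assumptions.

```python
def standby_should_become_active(
    serial_configured: bool,
    network_configured: bool,
    serial_heartbeat_ok: bool,
    network_heartbeat_ok: bool,
) -> bool:
    """Model of the rule described above (hypothetical helper).

    The standby unit becomes active only when every configured
    heartbeat path (serial cable and/or network) has failed.
    """
    heartbeats = []
    if serial_configured:
        heartbeats.append(serial_heartbeat_ok)
    if network_configured:
        heartbeats.append(network_heartbeat_ok)
    if not heartbeats:
        raise ValueError("at least one failover method must be configured")
    # Any surviving heartbeat keeps the standby unit in standby.
    return not any(heartbeats)


if __name__ == "__main__":
    # Both methods configured, only the network heartbeat is lost: no failover.
    print(standby_should_become_active(True, True, True, False))
    # Both methods configured, both heartbeats lost: failover.
    print(standby_should_become_active(True, True, False, False))
```

The model also shows why a dedicated crossover link helps: with network failover alone, a single lost heartbeat path is enough to make the standby unit go active.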
Unlike an active/standby configuration, which is designed strictly to ensure no interruption of service in the event that a BIG-IP LTM system becomes unavailable, an active/active configuration has an additional benefit. An active/active configuration allows the two units to simultaneously manage traffic, thereby improving overall performance. However, there are several caveats to running in this redundancy mode. A common active-active configuration is one in which each unit processes connections for different virtual servers. For example, you can configure unit 1 to process traffic for virtual servers A and B, and configure unit 2 to process traffic for virtual servers C and D. If unit 1 becomes unavailable, unit 2 begins processing traffic for all four virtual servers. Here is an active-active configuration, first as it behaves normally, and then after failover has occurred:
The figure above shows an active-active configuration in which units 1 and 2 are both in active states. With this configuration, failover causes the following to occur:
a. Unit 2 (already in an active state) begins processing the connections that would normally be processed by unit 1.
b. Unit 2 continues processing its own connections, in addition to those of unit 1.
When unit 1 becomes available again, you must manually initiate failback (System > High Availability > Failback), which, in this case, means that the currently-active unit drops all connections that it is managing on behalf of its peer, and continues to operate in an active state, processing its own connections. Failback will cause a service impact to the targeted virtual servers, unless connection mirroring is enabled.
Active/Active and Secure Network Address Translation (SNAT)
Each BIG-IP system in an active-active configuration has a unit ID, either 1 or 2. When you define a local traffic management object, such as a virtual server, self IP or a SNAT, you must associate that object with a specific unit of the active-active redundant pair. When failover occurs, these associations of objects to unit IDs allow the surviving unit to process connections correctly for itself and the failed unit. If you do not associate an object with a specific unit ID in an active-active redundant pair, the redundant system uses 1 as the default unit ID. However, you cannot associate a default SNAT with a unit ID, as the default SNAT is not compatible with an active/active system. Instead, you must configure SNAT automap on each individual unit.
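The association of objects to unit IDs can be pictured as a simple mapping. The sketch below is a conceptual model only; the function name and the object names A to D (borrowed from the earlier virtual server example) are illustrative assumptions, not BIG-IP configuration syntax.

```python
from typing import Dict, Optional

DEFAULT_UNIT_ID = 1  # objects without an explicit association default to unit 1


def processing_unit(
    objects: Dict[str, Optional[int]],
    failed_unit: Optional[int] = None,
) -> Dict[str, int]:
    """Return which unit processes each object (hypothetical helper).

    objects maps an object name to its unit ID (1, 2, or None if unset).
    If failed_unit is given, the surviving unit takes over its objects.
    """
    result = {}
    for name, unit_id in objects.items():
        home = DEFAULT_UNIT_ID if unit_id is None else unit_id
        if home not in (1, 2):
            raise ValueError(f"invalid unit ID for {name}: {home}")
        if failed_unit is not None and home == failed_unit:
            home = 2 if failed_unit == 1 else 1  # the surviving peer
        result[name] = home
    return result


if __name__ == "__main__":
    virtual_servers = {"A": 1, "B": 1, "C": 2, "D": 2}
    print(processing_unit(virtual_servers))                 # normal operation
    print(processing_unit(virtual_servers, failed_unit=1))  # unit 2 handles all four
```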
Converting Between Redundancy Modes
One of the major caveats to running in active/active redundancy mode is that it is possible to ramp up traffic to over one hundred percent of the capacity of a single BIG-IP LTM unit. If this is done and a hardware failover occurs, the remaining unit will be over-subscribed and a service impact will follow. Since F5 Networks recommends reinstalling the BIG-IP system software from CD-ROM when reconfiguring redundant pair units, it is very important to consider the redundancy mode in the early phases of your BIG-IP implementation. Due to the large number of settings that must be changed using both the Configuration utility and the command line when you convert a unit from active-standby to active-active, or from active-active to active-standby, reloading and reconfiguring both units is generally an easier and more reliable process than converting from one redundant mode to another.
5. Test the SCCP login by opening an SSH session to the IP address that you configured. Access SCCP using the management port and the same login credentials that you use for the BIG-IP root account.
6. Now you can establish console access from the SCCP to the host subsystem by typing the following command from the SCCP command prompt:
hostconsh
Setting the SCCP IP address remotely
F5 recommends that customers always set up the SCCP's IP address from the serial console. However, in rare cases when serial connectivity to the BIG-IP platform is not possible, you can set up the SCCP IP address through the connectivity to the host subsystem.
1. SSH into the BIG-IP host subsystem as root.
2. Connect through SSH to the SCCP by typing the following command:
ssh sccp
An sccp# prompt will appear, similar to the following:
Last login: Mon Jan 01 01:23:45 2006 from host
Welcome to the F5 Networks SCCP!
sccp#
3. Invoke the network configuration utility by typing the following commands:
cd /etc
netconfig
4. Follow the prompts to configure the SCCP IP connectivity. Remember that all settings below are completely independent from the management port settings on the host subsystem. An example configuration screen is below:
SCCP Linux Management Network Configurator
Use DHCP? n
Host name (optional): sccp1.mycompany.com
IP address (required): 192.168.245.11
Network mask (required): 255.255.255.0
Broadcast IP address (optional): 192.168.245.255
Default gateway IP address (optional): 192.168.245.1
Nameserver IP address (optional): 192.168.2.10
Nameserver IP address (optional):
5. After running the netconfig utility, F5 highly recommends that you reboot the entire platform to verify that the SCCP IP connectivity is properly configured and is available following the power outage to the platform. If it is impossible to reboot the platform, skip this step and follow step 6. To reboot the platform, exit the SCCP SSH session back to the BIG-IP host subsystem and type the following commands:
touch /.sccp_hard_reboot
reboot
6. Follow this step only if you were unable to perform step 5. Run the following command to initialize the network interface on the SCCP:
/etc/rc.network
7. Test the SCCP login by opening an SSH session to the IP address that you configured. Access SCCP using the management port and the same login credentials that you use for the BIG-IP root account.
8. Now you can establish console access from the SCCP to the host subsystem by typing the following command from the SCCP command prompt:
hostconsh
MAC Masquerading
For active/standby systems, you can create a media access control (MAC) masquerade address that can be shared between the BIG-IP units. Doing so can minimize the impact of a BIG-IP LTM failover event by responding with a consistent MAC address from the newly active device. When a BIG-IP becomes active, one of the first things it does is perform an interface reset, dropping carrier for an instant, then bringing the link back up and sending a gratuitous ARP with all the virtual IPs for which it is now active. One potential problem in this instance may be the default switchport configuration of a connected layer 2/3 device. For instance, by default, legacy Cisco 65xx and 55xx Ethernet switchports are configured to perform several possible functions at startup:
a. Spanning-tree holddown
b. Etherchannel auto-negotiation
c. Trunking auto-negotiation
d. Speed/Duplex auto-negotiation
All of the above happens before the switch passes any traffic. If even one of these functions is enabled (they are all on by default), the imposed delay might be enough to drop the gratuitous ARP that announces the new active BIG-IP. In modern versions of CatOS there is an option to turn on spanning tree "portfast" and turn off etherchannel and trunking negotiation, potentially eliminating the delay at port initialization and, thus, the need for MAC masquerading.
When you configure MAC masquerade, you must use a MAC address that is unique to your network. If it is not, a conflict will result. The only way to guarantee a unique MAC address is to register as a vendor with the IEEE; however, you can easily find MAC address ranges that are unused. F5 Networks recommends that you construct a MAC masquerade address using the vendor code 40:00:00. This code does not appear to have been used by a vendor, and F5 Networks is not aware of any cases where use of this vendor code resulted in a conflict. However, several other vendor code options exist. To construct the full MAC masquerade address, use the existing serial number portion of the pre-assigned MAC address in combination with the false vendor code. For example, if the unit's pre-assigned MAC address is 00:01:D7:01:02:03, you would change the vendor code and keep the serial number, producing the following address:
40:01:d7:01:02:03
This method produces a unique MAC address for a MAC masquerade.
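The construction in the worked example above can be sketched in a few lines. The sketch reproduces that example literally, in which only the leading octet of the pre-assigned address changes; the function name and the `leading_octet` parameter are assumptions made for illustration.

```python
def masquerade_mac(preassigned_mac: str, leading_octet: str = "40") -> str:
    """Build a MAC masquerade address from a unit's pre-assigned MAC.

    Hypothetical helper that mirrors the worked example above:
    00:01:D7:01:02:03 -> 40:01:d7:01:02:03 (only the first octet changes).
    """
    octets = preassigned_mac.strip().lower().split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError(f"not a colon-separated MAC address: {preassigned_mac!r}")
    for octet in octets + [leading_octet]:
        int(octet, 16)  # raises ValueError if an octet is not valid hex
    return ":".join([leading_octet.lower()] + octets[1:])


if __name__ == "__main__":
    print(masquerade_mac("00:01:D7:01:02:03"))
```

Whichever address is generated, it should still be checked against the addresses already present on the connected VLANs before it is configured.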
In some networks, enabling MAC masquerading will not be an option. If so, we recommend that you increase the input buffer size or allow ARP updates to occur on the connected devices. The way to change this option varies between network devices, so consult the manufacturer's documentation for your specific device for information about altering this behavior. For instance, with Cisco IOS, you can change the size of the input hold queue by using the hold-queue configuration command on the interface that is attached to the network with the BIG-IP LTM. For example:
hold-queue 400 in
System Fail-Safe
System services have heartbeats. The BIG-IP system continually monitors service heartbeats to determine whether the service is still running. For some services, if the system does not detect a heartbeat, the system takes some action with respect to failover. These services are:
MCPD (messaging and configuration)
TMM (traffic management)
BIGD (health monitors)
SOD (failover)
BCM56XXD (switch hardware driver)
In general, the default behaviors are recommended, since restarting a critical service such as BIGD or BCM56XXD is sufficient and a system failover event is not required.
The Traffic Management Microkernel (TMM) service, however, is the process running on the BIG-IP system that performs most traffic management for the product. As such, the TMM service supports all system and networking components that the BIG-IP system needs in order to process application and administrative traffic. It controls all system interfaces except for the management interface, so if the heartbeat fails, it is appropriate to both fail over and restart the service. When the TMM service is running, make sure that you have defined a default route on the BIG-IP LTM. Defining a default route prevents the high volumes of administrative traffic generated by the BIG-IP system from using the management interface (except for NTP), which is limited to 100 Mbps of throughput.
VLAN Fail-Safe
When you configure VLAN failsafe for a VLAN, the BIG-IP system monitors network traffic on the VLAN. If the BIG-IP system detects a loss of network traffic on the VLAN, the BIG-IP will attempt to generate VLAN failsafe traffic to nodes or the default router accessible through the VLAN in the following manner:
1. After half of the VLAN failsafe timeout value has elapsed, the following actions occur:
a. An ARP request for the IP address of the oldest entry in the BIG-IP ARP table is initiated.
b. An ICMPv6 neighbor discovery probe (only if entries exist in the BIG-IP IPv6 neighbor cache) is initiated.
2. After 3/4 of the VLAN failsafe timeout value expires, the following actions occur:
a. An ARP request for all IP addresses in the BIG-IP ARP table is initiated.
b. An ICMPv6 neighbor discovery probe (only if entries exist in the BIG-IP IPv6 neighbor cache) is initiated.
c. An ICMP echo request to 224.0.0.1 (multicast ping) is initiated.
The failover action is avoided if the BIG-IP system receives a response to the VLAN failsafe traffic it generated. F5 Networks strongly recommends using the default VLAN failsafe timeout. Setting the timeout too low can cause system stability issues such as daemons restarting.
Beginning with version 9.2.5, you can configure the BIG-IP system to reset the timeout counter when it receives any frame on the VLAN. However, be aware that 9.2.5 has a known issue (fixed in 9.3), wherein certain types of outbound monitor traffic, such as traffic from a gateway_icmp monitor, can cause the BIG-IP LTM system to reset the VLAN failsafe timer, preventing failover from occurring. To configure the BIG-IP system to reset the timeout counter when it receives any frame on the VLAN, use the following bigpipe command:
b db Failover.VlanFailsafe.ResetTimerOnAnyFrame=true
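The probe timing described in this section (actions at one half and three quarters of the VLAN failsafe timeout) can be worked through with a short sketch. The function name is hypothetical, the 90-second value in the usage example is only an illustrative input rather than a statement about the product default, and placing the failover action at the full timeout is an assumption drawn from the description above.

```python
from typing import List, Tuple


def vlan_failsafe_schedule(timeout_seconds: float) -> List[Tuple[float, str]]:
    """Return (seconds since traffic was last seen, action) pairs.

    Hypothetical helper modelling the 1/2 and 3/4 probe points
    described in the VLAN Fail-Safe section.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return [
        (timeout_seconds / 2,
         "ARP for oldest ARP-table entry; ICMPv6 neighbor discovery probe"),
        (timeout_seconds * 3 / 4,
         "ARP for all ARP-table entries; ICMPv6 neighbor discovery probe; "
         "ICMP echo to 224.0.0.1"),
        # Assumption: the failover action fires when the full timeout
        # expires without any response to the probes.
        (float(timeout_seconds), "failover action if no response was received"),
    ]


if __name__ == "__main__":
    for when, action in vlan_failsafe_schedule(90):  # 90 is illustrative only
        print(f"t+{when:g}s: {action}")
```

Laying the schedule out this way makes it easier to see why a very low timeout leaves little time for the probes to be answered.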
Gateway Fail-Safe
Gateway failsafe allows you to configure redundancy between a failover pair of BIG-IP systems that point to different gateways. The gateway failsafe option allows each BIG-IP system to monitor the upstream gateway to which it is connected. If the gateway is marked as DOWN, the BIG-IP system can fail over to its partner system to prevent further disruption to traffic. This functionality makes use of gateway pools for monitoring the upstream device. Therefore, the gateway pools must be assigned to their corresponding units (via the Unit ID), since the pool information will be synchronized between the systems.
In most environments, gateway failsafe is not required since the upstream device is normally configured for high availability as well. Because of this, the gateway address is normally an HSRP or VRRP address which will survive the failure of the gateway device. An easy way to determine if this is the case is to look at the MAC address of the gateway: VRRP routers use a common MAC address of the format 00:00:5e:00:01:xx. The last octet is the VRID or the VRRP virtual router or group identifier, which provides for 255 virtual routers in a network. HSRP routers also use a common MAC address on all media which supports HSRP (except token ring). The HSRP format is 00:00:0c:07:ac:xx (the last octet is the HSRP group number).
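The MAC address check described above is easy to automate. The following is a minimal sketch, assuming the gateway's MAC address has already been read from the ARP table; the function name and the return format are illustrative choices.

```python
import re
from typing import Optional, Tuple

# Virtual MAC formats quoted above; the last octet is the group identifier.
_VRRP = re.compile(r"^00:00:5e:00:01:([0-9a-f]{2})$")
_HSRP = re.compile(r"^00:00:0c:07:ac:([0-9a-f]{2})$")


def classify_gateway_mac(mac: str) -> Tuple[str, Optional[int]]:
    """Return (protocol, group id) for a gateway MAC address.

    Hypothetical helper: protocol is "VRRP", "HSRP", or "unknown".
    """
    normalized = mac.strip().lower().replace("-", ":")
    match = _VRRP.match(normalized)
    if match:
        return ("VRRP", int(match.group(1), 16))
    match = _HSRP.match(normalized)
    if match:
        return ("HSRP", int(match.group(1), 16))
    return ("unknown", None)


if __name__ == "__main__":
    for mac in ("00:00:5E:00:01:0A", "00:00:0c:07:ac:01", "00:01:d7:01:02:03"):
        print(mac, "->", classify_gateway_mac(mac))
```

An "unknown" result does not prove the gateway lacks redundancy; it only means the address does not match either of the two formats listed here.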