Key words: STP, RSTP, MSTP, rapid transition, multiple instances, redundancy loop, redundancy
Abstract: This article introduces basic MSTP terms, MSTP algorithm and implementation, MSTP
implementations delivered by Comware, and typical MSTP applications.
Acronyms:
1 Overview
1.1 Background
1.1.1 IEEE 802.1D STP
1.1.2 IEEE 802.1w RSTP
1.2 Benefits of MSTP
5 Summary
6 References
1.1 Background
In a Layer 2 switched network, loops can cause packets to proliferate and cycle
endlessly, which leads to broadcast storms. Broadcast storms can consume all
network bandwidth, making the whole network unavailable.
To address the problem, the Spanning Tree Protocol (STP) was introduced. STP
operates on Layer 2. It eliminates Layer 2 loops in a network by selectively blocking
specific links. STP also enables link redundancy.
Like other protocols, STP has evolved with the development of network technologies.
STP first took form in IEEE 802.1D, from which IEEE 802.1w (RSTP) and IEEE 802.1s
(MSTP) are derived.
The idea of STP is to eliminate loops in a network by pruning the network into a
loop-free tree. To achieve this, the concepts of root bridge, root port, designated
port, and path cost are introduced. They also help achieve link redundancy and path
optimization. The algorithm used by STP to create a tree-shaped topology is called the
spanning tree algorithm.
Bridges implement STP by sending messages between them. These messages carry
information needed for spanning tree calculation and are encapsulated in bridge
protocol data units (BPDUs). STP BPDUs are Layer 2 packets received and
processed by all STP-enabled bridges. They carry the destination MAC address
01-80-C2-00-00-00, a multicast MAC address.
Despite all its benefits, STP has some drawbacks, one of which is slow convergence.
Configuration BPDUs need time to propagate throughout the network. The delay
allowed for this is known as the forward delay. With STP, it defaults to 15 seconds.
During this period, transient loops may exist because some ports that need to be
blocked may still be in the forwarding state. To eliminate transient loops, an
intermediate state, the learning state, is introduced. In this state, a port learns MAC
addresses but does not forward packets. Because each state transition lasts one
forward delay, the mechanism eliminates transient loops that may occur when the
network topology changes. A convergence delay of at least twice the forward delay,
however, is intolerable for real-time services such as voice and video.
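The arithmetic behind this lower bound can be sketched as follows. The sketch assumes the default forward delay of 15 seconds given above and exactly two timed state transitions; the function name is illustrative only.

```python
FORWARD_DELAY_DEFAULT_S = 15  # default forward delay stated in the text


def min_convergence_delay(forward_delay_s: int = FORWARD_DELAY_DEFAULT_S,
                          timed_transitions: int = 2) -> int:
    """Lower bound on the time a port needs before it can forward traffic.

    Each timed state transition lasts one forward delay, so the bound is
    the number of transitions multiplied by the forward delay.
    """
    return timed_transitions * forward_delay_s


print(min_convergence_delay())  # 30 (seconds) with the default forward delay
```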
To overcome the slow convergence of STP, the IEEE released IEEE 802.1w in 2001.
IEEE 802.1w defines RSTP. RSTP makes three improvements to STP that shorten
convergence remarkably, to as little as one second.
RSTP is compatible with STP. You can employ STP and RSTP in the same network.
However, because RSTP results in a single spanning tree (SST) in a network just like
STP, it suffers from the same drawbacks as any SST protocol, as described below.
(1) In a large switched network, adopting a single spanning tree causes a relatively
long convergence time.
(2) With a single spanning tree adopted, all the VLANs in the network share the
same spanning tree. You need to make sure that data communications in each
VLAN can be carried out along the spanning tree.
(3) Blocked links do not forward traffic and therefore, do not participate in load
balancing. This causes inefficient use of bandwidth.
These drawbacks limit the application of SST protocols and result in the emergence
of MSTP, which takes VLANs into account.
MSTP is defined in IEEE 802.1s. MSTP enjoys remarkable advantages over STP
and RSTP, as listed below.
2 MSTP Implementation
2.1 Concepts
Assume that all the switches in Figure 1 are running MSTP. This section explains
some basic concepts in MSTP based on the figure.
[Figure 1: MST regions B0, C0, and D0 interconnected by the CST, with BPDUs
exchanged between regions.
Region B0: VLAN 1 mapped to instance 1; VLAN 2 mapped to instance 2; other VLANs
mapped to CIST.
Region C0: VLAN 1 mapped to instance 1; VLANs 2 and 3 mapped to instance 2; other
VLANs mapped to CIST.
Region D0: VLAN 1 mapped to instance 1, B as regional root bridge; VLAN 2 mapped to
instance 2, C as regional root bridge; other VLANs mapped to CIST.]
An MST region consists of multiple devices in a switched network and the network
segments between them. These devices have the following characteristics:
(3) IST
An internal spanning tree (IST) is a spanning tree that runs in an MST region. ISTs in
all MST regions and the common spanning tree (CST) jointly constitute the common
and internal spanning tree (CIST) of the entire network. An IST is the section of the
CIST in an MST region, as shown in Figure 1.
(4) CST
The CST is a single spanning tree that connects all MST regions in a switched
network. If you regard each MST region as a device, the CST is a spanning tree
calculated by these devices through STP or RSTP. For example, the red lines in
Figure 1 represent the CST.
(5) CIST
Jointly constituted by the ISTs and the CST, the CIST is a single spanning tree that
connects all devices in a switched network. In Figure 1, the ISTs in all MST regions
plus the inter-region CST constitute the CIST of the entire network.
(6) MSTI
MSTP can generate multiple spanning trees in an MST region, each independent of
the others and each corresponding to a specified set of VLANs. Each of these
spanning trees is referred to as a multiple spanning tree instance (MSTI), as shown
in Figure 1.
(7) Boundary port
A boundary port is a port that is located on an MST region boundary and is used to
connect an MST region to another MST region, or to a single spanning tree region
running STP/RSTP.
(8) Bridge ID
A bridge ID consists of the priority of a bridge and the MAC address of the bridge.
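A minimal sketch of how a bridge ID can be modeled and compared. It assumes that the priority is compared before the MAC address and that the smaller value wins, as in the vector comparison rules later in this document. The class name and the sample values are hypothetical.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    """Bridge ID: the bridge priority first, then the bridge MAC address."""
    priority: int  # a smaller value means a higher priority
    mac: bytes     # 6-byte MAC address, used as the tie-breaker

    @classmethod
    def parse(cls, priority: int, mac_text: str) -> "BridgeId":
        # Accept MAC addresses written as 00-0f-e2-00-00-01 or 00:0f:e2:00:00:01.
        mac = bytes.fromhex(mac_text.replace("-", "").replace(":", ""))
        if len(mac) != 6:
            raise ValueError("a MAC address must be 6 bytes long")
        return cls(priority, mac)


# Hypothetical sample values.
a = BridgeId.parse(4096, "00-0f-e2-00-00-02")
b = BridgeId.parse(32768, "00-0f-e2-00-00-01")
c = BridgeId.parse(32768, "00-0f-e2-00-00-03")

# Tuples compare field by field, so min() picks the superior bridge ID.
best = min([a, b, c])
print(best.priority, best.mac.hex("-"))
```

Here bridge a wins on priority alone; between b and c, which share a priority, the smaller MAC address decides.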
External root path cost refers to the cost of the shortest path for a packet to travel to
the common root bridge.
The root bridge of the IST or an MSTI within an MST region is the regional root bridge
of the IST or the MSTI. Based on the topology, different spanning trees in an MST
region may have different regional roots.
Internal root path cost refers to the cost of the shortest path for a packet to travel to
the regional root bridge.
A designated bridge ID consists of the priority of a designated bridge and the MAC
address of the designated bridge.
A designated port ID consists of the priority of a designated port and the port number.
MSTP calculation involves these port roles: root port, designated port, master port,
alternate port, and backup port. A port can play different roles in different MSTIs.
Figure 2 illustrates these concepts.
[Figure 2: Port roles. Bridges B, C, and D with ports 3, 4, 5, and 6; the labels
include a designated port and a backup port.]
• Root port: a port responsible for forwarding data to the root bridge.
• Designated port: a port responsible for forwarding data to the downstream
network segment or device.
• Master port: a port on the shortest path from the current region to the common
root bridge, connecting the MST region to the common root bridge.
• Alternate port: the standby port of a root port or a master port. When the root
port or master port is blocked, the alternate port becomes the new root port or
master port.
• Backup port: the backup port of a designated port. When the designated port is
blocked, the backup port becomes the new designated port and starts forwarding
data without delay. Because two interconnected ports on the same MSTP device
can cause loops, the device blocks one of the two ports. The blocked port is
the backup port.
(16) Port state
In MSTP, a port stays in one of the following three states depending on whether it
learns MAC addresses and forwards user traffic:
• Forwarding: the port learns MAC addresses and forwards user traffic;
• Learning: the port learns MAC addresses but does not forward user traffic;
• Discarding: the port neither learns MAC addresses nor forwards user traffic.
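The three states can be summarized as a small lookup, sketched below. The enum and property names are illustrative and are not taken from any implementation.

```python
from enum import Enum


class PortState(Enum):
    """MSTP port states as (learns MAC addresses, forwards user traffic)."""
    FORWARDING = (True, True)
    LEARNING = (True, False)
    DISCARDING = (False, False)

    @property
    def learns(self) -> bool:
        return self.value[0]

    @property
    def forwards(self) -> bool:
        return self.value[1]


for state in PortState:
    print(f"{state.name:<10} learns={state.learns} forwards={state.forwards}")
```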
In the initial state, each port on each device generates its own configuration BPDUs,
in which the local device is the root bridge: the common root and regional root are
the local bridge ID, the internal and external root path costs are 0, the designated
bridge ID is the local bridge ID, the designated port is the local port, and the
BPDU-receiving port is 0.
Note:
• Port role selection rules involve multiple types of priority vectors. For detailed
information about these priority vectors, refer to Determining the Priority Vectors.
• Once the message priority vector received by a port is superior to the port priority
vector, all types of priority vectors will be recalculated and the role of each port
involved will be recalculated.
The MSTP role of each bridge is calculated based on the information carried in
BPDUs. The most important information carried in BPDUs is the spanning tree priority
vector. The following part introduces how to calculate the CIST priority vectors and
MSTI priority vectors.
The CIST priority vector consists of common root bridge, external root path cost,
regional root, internal root path cost, designated bridge ID, designated port ID, and
the BPDU-receiving port ID.
• In the initial state, the BPDUs that port PB of bridge B sends out carry common
root bridge RB, external root path cost ERCB, regional root RRB, internal root
path cost IRCB, designated bridge ID B, designated port ID PB, and the
BPDU-receiving port ID PB, that is, the value set of {RB, ERCB, RRB, IRCB, B, PB,
PB}.
• The BPDUs that port PB of bridge B receives from port PD of bridge D carry
common root bridge RD, external root path cost ERCD, regional root RRD,
internal root path cost IRCD, designated bridge ID D, designated port ID PD,
and the BPDU-receiving port ID PB, that is, the value set of {RD, ERCD, RRD,
IRCD, D, PD, PB}.
• The BPDUs that port PB of bridge B receives from port PD of bridge D are of
higher priority.
The following part describes how to calculate each priority vector based on the
assumptions above.
Message priority vectors are those carried in MSTP BPDUs. According to the
assumptions, the message priority vector of port PB on bridge B is: {RD : ERCD :
RRD : IRCD : D : PD : PB}. If bridge B and bridge D are in different regions, the
internal root path cost is insignificant to bridge B and will be set to 0.
In the initial state, the port priority vector takes the local port as the root.
The port priority vector is updated as new BPDUs are received on the port. If the port
receives BPDUs superior to the BPDUs that it generates, the port updates its port
priority vector according to the received BPDUs; otherwise, the port priority vector of
the port does not change. As the priority vector of the BPDUs received on port PB is
superior to the port priority vector, the port priority vector is updated into {RD :
ERCD : RRD : IRCD : D : PD : PB}.
Root path priority vector is derived from the port priority vector.
• If the port priority vector received is from another region, then the external root
path cost of the root path priority vector is the sum of the path cost of the port
and the external root path cost of the port priority vector; the regional root of the
root path priority vector is the local regional root; the internal root path cost is 0.
Suppose the path cost of port PB is PCPB. Then, the root path priority vector of
port PB on bridge B is {RD : ERCD + PCPB : B : 0 : D : PD : PB}.
• If the port priority vector received is from the same region, then the internal root
path cost of the root path priority vector is the sum of the path cost of the port
and the internal root path cost of the port priority vector. Thus, the root path
priority vector of port PB on bridge B is {RD : ERCD : RRD : IRCD + PCPB : D :
PD : PB}.
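The two cases above can be sketched as one function. This is a simplified model under stated assumptions: the type and function names are hypothetical, the IDs are kept as the symbolic names used in the text, and the numeric costs (20, 7, and 4) are made-up stand-ins for ERCD, IRCD, and PCPB.

```python
from typing import NamedTuple


class CistVector(NamedTuple):
    """CIST priority vector, in the element order used in the text."""
    common_root: str
    external_cost: int
    regional_root: str
    internal_cost: int
    designated_bridge: str
    designated_port: str
    receiving_port: str


def root_path_vector(port_vector: CistVector, port_path_cost: int,
                     same_region: bool, local_regional_root: str) -> CistVector:
    """Derive the root path priority vector from a port priority vector."""
    if same_region:
        # Same region: only the internal root path cost grows.
        return port_vector._replace(
            internal_cost=port_vector.internal_cost + port_path_cost)
    # Another region: the external cost grows, the regional root becomes the
    # local regional root, and the internal cost is 0.
    return port_vector._replace(
        external_cost=port_vector.external_cost + port_path_cost,
        regional_root=local_regional_root,
        internal_cost=0)


# Hypothetical values: ERCD = 20, IRCD = 7, PCPB = 4.
received = CistVector("RD", 20, "RRD", 7, "D", "PD", "PB")
print(root_path_vector(received, 4, same_region=False, local_regional_root="B"))
print(root_path_vector(received, 4, same_region=True, local_regional_root="B"))
```

With these values, the first call yields {RD : 24 : B : 0 : D : PD : PB} and the second yields {RD : 20 : RRD : 11 : D : PD : PB}, matching the two symbolic results in the text.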
(4) Bridge priority vector
In the bridge priority vector, the common root ID, regional root ID, and designated
bridge ID are all the local bridge ID. The remaining elements (external root path
cost, internal root path cost, designated port ID, and receiving port ID) are all 0.
Thus, the bridge priority vector of bridge B is {B : 0 : B : 0 : B : 0 : 0}.
The root priority vector is the optimal one among bridge priority vector and all the root
path priority vectors with their designated bridge ID not the same as the local bridge
ID. If the local bridge priority vector is the optimal one, the local bridge is the root of
the CIST. Suppose that the bridge priority vector of bridge B is optimal. Then, the root
priority vector of bridge B is {B : 0 : B : 0 : B : 0 : 0}.
By setting the designated bridge ID and designated port ID of the root priority vector
to B and PB, you can obtain the designated priority vector of port PB on bridge B, that
is, {B : 0 : B : 0 : B : PB : 0}.
The way to determine MSTI priority vectors is the same as the way to determine the
CIST priority vectors except that:
• An MSTI priority vector only contains the regional root, internal root path cost,
designated bridge ID, designated port ID, and BPDU-receiving port ID, without
the common root or external root path cost.
• Message priority vectors are processed only if they are from the same region.
This section describes how to calculate the CIST. In the network shown in Figure 3,
assume that the bridge priority of Switch A is higher than that of Switch B; and the
bridge priority of Switch B is higher than that of Switch C. The costs of the links in the
network are 4, 5, and 10, as shown in the figure. Switch A and Switch B are in the
same region; Switch C is in another region.
Table 1 lists the initial message priority vectors of the involved ports.
Note that in the initial state, the port priority vector and the message priority vector of
a port are the same.
In the initial state, all the ports are designated ports. They propagate message priority
vectors of their own, with the root bridge being the local device.
Ports AP1 and AP2 on Switch A receive packets from Switch B and Switch C and
process the priority vectors carried in the packets. As the port priority vectors of AP1
and AP2 are superior to the message priority vectors received, AP1 and AP2 remain
designated ports; Switch A acts as the common root bridge and the root of the region
to which Switch A and Switch B belong. From then on, the ports send messages
periodically, with the local device as the root bridge.
The port priority vector is compared with the message priority vector following these
steps:
• Compare the elements of the port priority vector with the corresponding elements
of the message priority vector in order, from the first element to the last. At the
first element that differs, the smaller value wins out. If each element in the port
priority vector is equal to the corresponding one in the message priority vector, the
two vectors are equal.
• When the message priority vector is superior to the port priority vector or when the
designated bridge MAC address and designated port ID of the message priority
vector are equal to those in the port priority vector, the message priority vector
replaces the port priority vector.
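The comparison and replacement rules in this note can be sketched with ordinary tuples, which Python already compares element by element. This is a simplified model: the function names are hypothetical, bridge IDs are represented by the switch letters (so A is superior to B, as assumed in this section), and the whole designated bridge ID stands in for the designated bridge MAC address.

```python
def is_superior(a: tuple, b: tuple) -> bool:
    """True if vector a is superior to vector b.

    Tuples compare element by element, and the first element that differs
    decides, with the smaller value winning.
    """
    return a < b


def updated_port_vector(port_vec: tuple, msg_vec: tuple) -> tuple:
    """Return the port priority vector after a message vector is received."""
    # Elements 4 and 5 are the designated bridge ID and designated port ID.
    same_sender = msg_vec[4:6] == port_vec[4:6]
    if is_superior(msg_vec, port_vec) or same_sender:
        return msg_vec
    return port_vec


# Port BP2 on Switch B hears from port AP2 on Switch A: the message wins.
bp2 = ("B", 0, "B", 0, "B", "BP2", "BP2")
from_a = ("A", 0, "A", 0, "A", "AP2", "BP2")
print(updated_port_vector(bp2, from_a) == from_a)

# Port AP2 on Switch A hears from port BP2 on Switch B: the port vector stays.
ap2 = ("A", 0, "A", 0, "A", "AP2", "AP2")
from_b = ("B", 0, "B", 0, "B", "BP2", "AP2")
print(updated_port_vector(ap2, from_b) == ap2)
```

The second rule matters when the same designated port later advertises worse information: the stored vector is still replaced, so stale data from that sender does not linger.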
After port BP1 on Switch B receives packets from port CP1 on Switch C, Switch B
compares the received message priority vector with the port priority vector of BP1. As
the port priority vector is superior to the message priority vector, the role of BP1
remains unchanged.
Switch B operates as follows when receiving packets through port BP2 from port AP2
on Switch A:
In the beginning, Switch C receives from port BP1 the message priority vector {B : 0 :
B : 0 : B : BP1 : CP1} on port CP1 and from port AP1 the message priority vector {A :
0 : A : 0 : A : AP1 : CP2} on port CP2. As the two message priority vectors are
superior to the port priority vectors of the receiving ports, the port priority vectors are
updated. As Switch C is not in the region where Switch A and Switch B reside, the
root path priority vector of port CP1 is updated as {B : 5 : C : 0 : B : BP1 : CP1}, and
that of port CP2 is updated as {A : 4 : C : 0 : A : AP1 : CP2}. Accordingly, the root
priority vector is {A : 4 : C : 0 : A : AP1 : CP2}, and the designated priority vectors of
port CP1 and CP2 are {A : 4 : C : 0 : C : CP1 : CP2} and {A : 4 : C : 0 : C : CP2 :
CP2}. As a result, port CP1 is selected as a designated port, and port CP2 as the root
port.
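The calculation on Switch C can be replayed with the vectors given above. The sketch below is illustrative only: the variable names are hypothetical, port IDs in the bridge priority vector are written as the string "0", and the role rule (a port is designated when its designated priority vector is superior to its port priority vector) is an assumption that is not spelled out in this section.

```python
LOCAL = "C"  # local bridge: Switch C

# Bridge priority vector of Switch C.
bridge_vec = (LOCAL, 0, LOCAL, 0, LOCAL, "0", "0")

# Port priority vectors after the first BPDUs are received (from the text).
port_vecs = {
    "CP1": ("B", 0, "B", 0, "B", "BP1", "CP1"),
    "CP2": ("A", 0, "A", 0, "A", "AP1", "CP2"),
}

# Root path priority vectors given in the text (link costs 5 and 4).
root_path_vecs = {
    "CP1": ("B", 5, "C", 0, "B", "BP1", "CP1"),
    "CP2": ("A", 4, "C", 0, "A", "AP1", "CP2"),
}

# Root priority vector: the best of the bridge priority vector and the root
# path priority vectors whose designated bridge is not the local bridge.
candidates = [bridge_vec] + [v for v in root_path_vecs.values() if v[4] != LOCAL]
root_vec = min(candidates)


def designated_vec(port: str) -> tuple:
    """Root priority vector with the designated bridge and port replaced."""
    return root_vec[:4] + (LOCAL, port) + root_vec[6:]


roles = {}
for port, path_vec in root_path_vecs.items():
    if path_vec == root_vec:
        roles[port] = "root"
    elif designated_vec(port) < port_vecs[port]:
        roles[port] = "designated"
    else:
        roles[port] = "alternate"  # simplification: other roles are ignored

print(root_vec)
print(roles)
```

Running the sketch reproduces the result in the text: the root priority vector is {A : 4 : C : 0 : A : AP1 : CP2}, CP2 is the root port, and CP1 is a designated port.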
When port CP1 receives the updated message priority vector {A : 0 : A : 10 : B : BP1 :
CP1}, it replaces the original port priority vector with the message priority vector
because the latter is superior and calculates that the root path priority vector is {A : 5 :
As the role of each device and port has been determined, the whole tree topology is
established. The traffic forwarding path is as shown in Figure 4.
MSTP and RSTP are mutually compatible and thus can recognize each other's
BPDUs. STP, however, is unable to recognize MSTP packets. For hybrid networking
with legacy STP devices and for full interoperability with RSTP-enabled devices,
MSTP supports three work modes: STP-compatible mode, RSTP mode, and MSTP
mode.
• In STP-compatible mode, all ports of the device send out STP BPDUs,
In a switched network, when a port on the device running MSTP (or RSTP) is
connected to a device running STP, this port automatically migrates to the STP-
compatible mode. However, when the device running STP is removed, the port
cannot automatically migrate back to the MSTP (or RSTP) mode and will remain
working in STP-compatible mode. In this case, you can perform an mCheck operation
to force the port to migrate to the MSTP (or RSTP) mode.
You can create multiple instances on devices operating in STP mode and RSTP
mode. In this case, the state of a port in an MSTI is the same as that in the CIST. To
avoid excessive CPU utilization, do not create multiple instances on devices that
operate in STP mode or RSTP mode.
The MSTP implementation in Comware supports the default path cost calculation
methods defined in IEEE 802.1D-1998, IEEE 802.1t, and the Comware-proprietary
standard.
For how the default path cost is calculated in IEEE 802.1D-1998 and IEEE 802.1t,
refer to the two standards. This section describes the Comware-proprietary default
path cost calculation method and the extensions to the IEEE 802.1D-1998 and IEEE
802.1t standards.
In IEEE 802.1D-1998, default path cost calculation for an aggregate link is the same
as that for a single link. It does not take into account the number of links in the
aggregation group.
The Comware-proprietary standard for determining the default path cost is outlined as
follows:
The speed of a link aggregation group is the sum of the rates of the unblocked ports
in the link aggregation group.
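As a small illustration of this rule, the sketch below sums the rates of the unblocked member ports. The function name, the data layout, and the port rates are hypothetical; how the resulting speed maps to a path cost is not covered here.

```python
from typing import Iterable, Tuple


def aggregate_speed_mbps(member_ports: Iterable[Tuple[int, bool]]) -> int:
    """Speed of a link aggregation group.

    Each member port is given as (rate in Mbps, blocked flag). Only the
    unblocked ports contribute to the speed of the group.
    """
    return sum(rate for rate, blocked in member_ports if not blocked)


# Hypothetical group: three 1000 Mbps ports, one of which is blocked.
group = [(1000, False), (1000, False), (1000, True)]
print(aggregate_speed_mbps(group))  # 2000
```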
Note:
• Timeout time = timeout factor × 3 × hello time.
• Normally, we recommend that you set the timeout factor to 5, 6, or 7 in a stable
network.
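The formula can be worked through for the recommended factors. The sketch assumes that the formula reads timeout factor × 3 × hello time and that the hello time is 2 seconds; the hello time value is an assumption not stated in this note, and the function name is illustrative.

```python
ASSUMED_HELLO_TIME_S = 2  # assumption: a hello time of 2 seconds


def timeout_time_s(timeout_factor: int,
                   hello_time_s: int = ASSUMED_HELLO_TIME_S) -> int:
    """Timeout time = timeout factor x 3 x hello time."""
    return timeout_factor * 3 * hello_time_s


for factor in (5, 6, 7):
    print(factor, timeout_time_s(factor))  # 30, 36, and 42 seconds
```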
Normally, the root bridge is determined by STP. You can also specify the root bridge
at the CLI.
Usually, when the root bridge of an instance fails or is shut down, a secondary root
bridge (if configured) becomes the root bridge for the instance. However, if you
In MSTP, you can specify a switch as the root bridge or the secondary root bridge of
a specified spanning tree instance. A switch can play different roles in different
spanning tree instances. For example, it can be the root bridge in a spanning tree
instance while a secondary root bridge in another spanning tree instance. In the same
spanning tree instance, however, a switch cannot be both the root bridge and a
secondary root bridge.
Normally, access ports of devices operating on the access layer are directly
connected to user terminals such as PCs or file servers. These ports are usually
configured as edge ports to implement rapid transition and are not expected to
receive configuration BPDUs. If an edge port receives configuration BPDUs, the
switch reconfigures it as a non-edge port and starts a new spanning tree calculation.
Attackers may exploit this weakness by sending deliberately fabricated BPDUs to
edge ports, causing network topology instability. To prevent this type of attack, you
can use the BPDU guard function.
With this function enabled, a switch shuts down edge ports that receive configuration
BPDUs and then reports these cases to the administrator. These ports can be
restored only by the network administrator. We recommend that you use the function
on switches configured with edge ports.
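The difference that BPDU guard makes can be sketched as a small decision function. This is a simplified model of the behavior described above, not Comware code; the class, function, and return value names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str
    edge: bool
    up: bool = True


def on_config_bpdu(port: Port, bpdu_guard: bool) -> str:
    """Model what happens when a configuration BPDU arrives on a port."""
    if not port.edge:
        return "process"  # ordinary port: normal spanning tree handling
    if bpdu_guard:
        # BPDU guard: shut the edge port down and leave it down until the
        # administrator restores it.
        port.up = False
        return "shutdown-and-report"
    # No BPDU guard: the port loses its edge status and the spanning tree
    # is recalculated, which an attacker can exploit.
    port.edge = False
    return "recalculate"


guarded = Port("GE1/1", edge=True)
print(on_config_bpdu(guarded, bpdu_guard=True), guarded.up)

unguarded = Port("GE1/2", edge=True)
print(on_config_bpdu(unguarded, bpdu_guard=False), unguarded.edge)
```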
The root bridge in a network may receive configuration BPDUs with higher priority
because of configuration errors or network attacks. This can cause a new root bridge
to be elected, causing network topology instability. The topology change may cause
traffic traveling a high-speed link to be switched over to a low-speed link, resulting in
congestion. To avoid the situation, you can use the root guard function.
A switch maintains the state of each port by receiving and processing BPDUs from
the upstream switch. These BPDUs may get lost because of network congestion or
unidirectional link failures. If a switch does not receive BPDUs from the upstream
switch for a certain period, the switch selects a new root port; the original root port
becomes a designated port; and the blocked ports turn to the forwarding state. All
these events can result in loops in the network. The loop guard function suppresses
loops caused by these events. With this function enabled, a root port turns to the
Discarding state when its role changes and remains in the Discarding state without
forwarding packets. Thus, loops are prevented.
In MSTP, this function is only applicable to root ports, alternate ports, and backup
ports.
[Figure: Switch A (ports GE1/1 and GE1/2) connected to Switch B (ports GE2/1 and
GE2/2).]
Both Switch A and Switch B are Comware switches. Suppose Switch A is the root
switch. On Switch B, GigabitEthernet 2/1 is the root port; GigabitEthernet 2/2 is an
When receiving TC-BPDUs (BPDUs used to notify topology changes), the switch
flushes the forwarding address entries. If someone forges TC-BPDUs to attack the
switch, the switch will receive a large number of TC-BPDUs within a short time and
be busy flushing forwarding address entries. This affects network stability.
With the TC-BPDU guard function, you can set the maximum number of immediate
forwarding address entry flushes that the switch can perform within 10 seconds after
receiving the first TC-BPDU. For TC-BPDUs received in excess of the limit, the switch
performs forwarding address entry flush only when the 10-second timer expires. This
prevents frequent deletion of forwarding address entries.
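The limiting behavior can be sketched as a small state machine. This is a simplified model under stated assumptions, not the Comware implementation: the class and method names are hypothetical, time is passed in explicitly as seconds, and all excess TC-BPDUs in one window are assumed to be answered by a single flush when the timer expires.

```python
class TcBpduGuard:
    """Limit immediate forwarding address entry flushes per 10-second window."""

    WINDOW_S = 10.0  # timer length stated in the text

    def __init__(self, max_immediate_flushes: int):
        self.max_immediate = max_immediate_flushes
        self.window_start = None  # arrival time of the first TC-BPDU
        self.immediate = 0        # immediate flushes done in this window
        self.deferred = False     # True if a flush waits for timer expiry
        self.flushes = 0          # total flushes performed

    def expire_timer(self, now: float) -> None:
        """Close the window if the timer has expired; run a deferred flush."""
        if self.window_start is None or now - self.window_start < self.WINDOW_S:
            return
        if self.deferred:
            self.flushes += 1
        self.window_start = None
        self.deferred = False

    def on_tc_bpdu(self, now: float) -> bool:
        """Handle one TC-BPDU; return True if the flush happened immediately."""
        self.expire_timer(now)
        if self.window_start is None:
            self.window_start, self.immediate = now, 0
        if self.immediate < self.max_immediate:
            self.immediate += 1
            self.flushes += 1
            return True
        self.deferred = True
        return False


guard = TcBpduGuard(max_immediate_flushes=2)
print([guard.on_tc_bpdu(t) for t in (0, 1, 2, 3)])  # two immediate, two deferred
guard.expire_timer(10)
print(guard.flushes)  # 3: two immediate flushes plus one at timer expiry
```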
According to IEEE 802.1s, two connected switches can communicate with each other
in an MSTI only when their MST region-related configurations (that is, region name,
revision level, and VLAN-to-MSTI mapping) are the same. With MSTP employed,
interconnected switches determine whether or not they are in the same MST region
by checking the configuration IDs in the BPDUs they exchange. The configuration ID
comprises the region name, revision level, and configuration digest. The
configuration digest is a 16-byte signature obtained from the VLAN-to-MSTI mapping
by using the HMAC-MD5 algorithm.
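A minimal sketch of such a digest, using Python's standard hmac module. The table layout (4096 two-byte entries indexed by VLAN ID, with unmapped VLANs in instance 0) and the key value are assumptions based on how the standard is commonly described; verify both against IEEE 802.1s before relying on them. The function name is hypothetical.

```python
import hashlib
import hmac
import struct

# Assumed signature key (verify against the standard before use).
DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def configuration_digest(vlan_to_msti: dict) -> bytes:
    """Return a 16-byte HMAC-MD5 digest of a VLAN-to-MSTI mapping."""
    # Assumed layout: one 2-byte entry per VLAN ID from 0 to 4095. VLANs
    # that are not listed stay in instance 0, the CIST.
    table = [0] * 4096
    for vlan, msti in vlan_to_msti.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"invalid VLAN ID: {vlan}")
        table[vlan] = msti
    data = struct.pack(">4096H", *table)
    return hmac.new(DIGEST_KEY, data, hashlib.md5).digest()


# Mapping of the kind used in Figure 1: VLAN 1 to instance 1, VLAN 2 to
# instance 2, and all other VLANs to the CIST.
digest_a = configuration_digest({1: 1, 2: 2})
digest_b = configuration_digest({1: 1, 2: 2, 3: 2})
print(len(digest_a), digest_a.hex())
print(digest_a == digest_b)  # False: the mappings differ
```

Because two switches with different mappings produce different digests, comparing digests is enough to tell that the switches do not share a region configuration, without exchanging the whole mapping table.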
For the sake of compatibility, the digest snooping function was developed in
Comware to enable switches that obtain the configuration digest in different ways to
communicate with each other in an MSTI.
Caution:
• The digest snooping function requires that the interconnected switches involved
have the same region configuration. Otherwise, broadcast storms may occur due
to different VLAN-to-instance mappings.
• To enable a Comware switch to communicate with switches of other vendors, you
need to enable the digest snooping function on each port directly connecting the
Comware switch to a switch of another vendor. Note that you cannot enable digest
snooping function on edge ports.
• When the digest snooping function is enabled, do not modify the region
configuration of the Comware switch and the switches of other vendors directly
connected to it. To do that, you need to disable digest snooping first to prevent
possible broadcast storms caused by inconsistent VLAN-to-instance mappings.
• The digest snooping function is not needed in regions that only contain Comware
switches.
[Figure: Digest snooping network. Switch A, Switch B, and Switch C are
interconnected through ports GE1/1 and GE1/2.]
Switch A and Switch B are Comware switches while Switch C is a switch of another
vendor. It adopts a non-standard algorithm to calculate the configuration digest. All
the three switches are MSTP-enabled and have the same region configuration.
To enable both Switch A and Switch B to communicate with Switch C in the region,
you need to enable the digest snooping function on GigabitEthernet 1/1 of Switch A
and GigabitEthernet 1/2 of Switch B. As Switch A and Switch B are Comware
switches, the function is unnecessary on the ports that connect them.
As defined in standard MSTP, a rapid transition occurs on a designated port when the
port receives a packet carrying the agreement flag from the downstream root port. A
root port, however, sends packets carrying the agreement flag only after it receives a
packet carrying the agreement flag from the upstream designated port. This can
cause rapid transition to fail on the designated port of a switch that does not send
packets with the agreement flag (an RSTP-enabled switch, for example) when the
switch is connected to a downstream MSTP-enabled Comware switch. Because the
upstream switch never sends packets with the agreement flag, the downstream root
port never replies with one, and the upstream designated port never receives the
acknowledgement it is waiting for. You can solve this problem by enabling the no
agreement check function on the port that uplinks your Comware switch to an
RSTP-enabled switch or to another vendor's switch that uses a different
implementation.
Switch A is the root switch, with RSTP enabled. GigabitEthernet 2/1 on Switch B is the
root port. To make rapid transition available on GigabitEthernet 1/1 of Switch A, you
need to enable no agreement check on GigabitEthernet 2/1 of Switch B.
This function enables Comware switches to communicate with devices adopting the
standard MSTP protocol packet format. It ensures that spanning trees can be
determined correctly in a network containing both devices adopting the standard
protocol packet format and devices adopting proprietary protocol packet formats.
By default, a Comware device can recognize the format of each received MSTP
packet and sends packets in the format of the received MSTP packet. You can also
specify the packet format to have the device receive/send packets that are only of the
specific format. A Comware device can communicate with other devices in the CIST
even if it operates in RSTP-/STP-compatible mode.
4 Application Scenarios
MSTP enables packets of different VLANs to be transmitted over different spanning
trees, and thus implements per-VLAN load balancing and link redundancy.
As shown in the above figure, Switch A and Switch B operate on the distribution layer.
Switch C and Switch D operate on the access layer.
To achieve proper load balancing, you can perform configurations on the devices to:
After MSTP calculation is completed, packets of different VLANs are forwarded over
different paths as shown in Figure 9. This forwarding topology not only balances
traffic across links but also achieves redundancy to decrease data loss by providing a
backup link for each VLAN.
5 Summary
MSTP overcomes the drawbacks of STP and RSTP. It achieves fast convergence
and enables packets of different VLANs to travel separate paths, allowing for a better
load balancing mechanism for redundant links. MSTP is flexible and suitable for
complex networks. Moreover, it is easy to configure. In the simplest cases, you only
need to enable MSTP for it to operate normally; if needed, you can select a path in a
VLAN for traffic transmission by configuring bridge priority, region settings, and port
path cost.
6 References
IEEE 802.1D, Spanning Tree Protocol
IEEE 802.1w, Rapid Spanning Tree Protocol
IEEE 802.1s, Multiple Spanning Tree Protocol
Copyright 2008 Hangzhou H3C Technologies Co., Ltd. All rights reserved.