
Technical Report

TR-4146: Aggregate Relocate (ARL) Overview and Best Practices for Clustered Data ONTAP Controller Hardware Upgrades

Charlotte Brooks, NetApp
December 2013 | TR-4146

TABLE OF CONTENTS

1 Introduction
  1.1 Scope
2 ARL Foundation for the Innovation of Business Processes
  2.1 Options for Cluster Technology Refresh
  2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh
3 Aggregate Relocate Overview
4 Controller Head Upgrades
  4.1 Double-Hop Upgrades
  4.2 Supported Upgrades
  4.3 Command Line Interface
  4.4 Graphical User Interface
  4.5 Best Practices
  4.6 Considerations for Failure Scenarios
References
Version History
LIST OF TABLES
Table 1) Cost analysis breakdown.
Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades.
Table 3) Supported nondisruptive head upgrades using ARL.
Table 4) Supported platforms with ARL.
Table 5) Command line interface options for ARL.
Table 6) Compatibility of ARL between Data ONTAP releases.
Table 7) Recommended values for cifs-ndo-duration with small block size.

LIST OF FIGURES
Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions.
Figure 2) Double-hop upgrade transition steps.


1 Introduction

Today's business environments require 24/7 data availability. The storage industry delivers the base building block for IT infrastructures, providing data storage for all business objectives. Therefore, constant data availability begins with architecting storage systems that facilitate nondisruptive operations (NDOs). Nondisruptive operations fall into three main categories: hardware resiliency, hardware and software lifecycle operations, and hardware and software maintenance operations. For the purposes of this paper, we focus on aggregate relocate (ARL) as a solution for lifecycle operations, specifically for refreshing a storage controller.

NetApp storage controllers are the physical component of the logical node entity that services the client
and host requests from the upper layers of the IT infrastructure. The continuity of service during the
transition from storage controllers that are approaching an end-of-life state to shipping versions of the
replacement controller has been a cumbersome task regardless of the storage vendor. The aggregate
relocate feature eliminates the need to physically move any data. Aggregate relocate reassigns disks to
the partner node, making the partner node the owner of the aggregates. The relocation of the aggregates
makes the partner node the point of entry for all requests from the client or host applications without client
disruption.

1.1 Scope

The focus of this paper is the aggregate relocate solution for the purpose of upgrading storage controllers. The initial version of this paper does not go into detail regarding other use cases for ARL. In the clustered Data ONTAP 8.2 architecture, ARL is qualified for the purpose of controller hardware upgrades and maintenance operations. The primary points of discussion in this document include:

- Comparison of the ARL and vol move methods for cluster technology refresh
- An overview of the ARL feature and its end-to-end process
- Use of the ARL command
- Continuity of data availability throughout the phases of ARL
- Best practices and considerations when planning to use ARL

2 ARL Foundation for the Innovation of Business Processes


The simplicity and minimal overhead of the aggregate relocate feature allow the lifecycle of a storage
controller to be extended several months longer than a typical controller lifecycle. Aggregate relocate,
being a no data copy feature, completes in a matter of seconds, whereas a data copy solution can take
weeks or months. This reduces the overall time of transition from an old controller to a new controller and
lengthens the production time of each controller. The cost of ownership per controller as a function of time
is therefore reduced. The total cost of ownership is reduced by:

- Yielding a longer production period for the hardware
- Reducing the number of people required to transfer data from the old controller to the new controller
- Reducing the time for data migration (no physical data copy with ARL)

2.1 Options for Cluster Technology Refresh

There are essentially two primary methods for technology refresh in clustered Data ONTAP. The first is
via DataMotion for Volumes (vol move) and involves moving all the data to existing or newly added nodes
in the cluster to evacuate the old nodes, then removing the nodes from the cluster. This process has been
supported in all versions of clustered Data ONTAP. The second method is using ARL and is new in
clustered Data ONTAP 8.2.
The following considerations apply to using the vol move method for tech refresh of the cluster.

- If the node root or data aggregates are using internal drives, the vol move method is required.
- If the customer has purchased new storage shelves in addition to new controllers, then vol move is the best solution. In the ARL method, the data remains on the same shelves.
- If the customer cannot tolerate disabling HA for the duration of ARL, vol move is preferred. When vol move is used, all controllers remain up and serving data throughout, as opposed to the ARL method, in which each controller in the HA pair is shut down for the duration of the controller upgrade.
- The vol move method takes substantially longer than the ARL method, since a full data copy is required. The amount of time may be on the order of days or even weeks, as it is directly dependent on the amount of data to be evacuated. The vol moves can, however, be staggered over time as required. The ARL method, by contrast, is on the order of several hours for each controller pair since no data copy is required, and it must be completed as a single maintenance event.
- If the controllers being refreshed are not on Data ONTAP 8.2, the ARL method cannot be used.
- If new nodes are being added to the cluster, the cluster size and controller mix must remain compliant with the Cluster Platform Mixing rules.

The following considerations apply to using the ARL in-place upgrade for tech refresh of the cluster.

- If neither the node root nor data aggregates are using internal drives, then ARL can be used.
- If the customer is only upgrading the storage controllers, then ARL can be used, since the disk storage remains the same.
- The new controllers must support the existing shelf technology, since the controllers will be attached to the old storage.
- The original nodes must already be running Data ONTAP 8.2 or higher. If the controllers cannot be upgraded to Data ONTAP 8.2, then the vol move method is the only option.
- If the cluster is already at the maximum supported size, or the new controllers are not supported in the same cluster as the old controllers (e.g., 6200 and 2200 nodes cannot exist in the same cluster), then either ARL in-place controller upgrade or vol move to existing nodes in the cluster must be used. If a cluster is already at its maximum size and ARL is not possible for any reason, the summary process to refresh the controllers is:
  1. Evacuate data from the nodes being replaced to other cluster nodes using vol move. If there is insufficient free capacity for the data elsewhere in the cluster, additional storage will need to be added to the existing nodes.
  2. Remove LIFs from the old nodes.
  3. Unjoin the evacuated nodes, thereby reducing the cluster size.
  4. In the new cluster configuration it may now be possible to perform an ARL in-place controller upgrade. If this is not possible, the new controllers can be joined to the cluster as additional nodes.
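The evacuate-and-unjoin steps above can be sketched as illustrative clustershell commands. The vserver, volume, aggregate, LIF, and node names are placeholders; consult the product documentation for the complete procedure.

```
Step 1 - evacuate data volumes from the old node (repeat per volume):
cluster1::> volume move start -vserver vs1 -volume vol1
            -destination-aggregate aggr_remaining1

Step 2 - remove the old node's data LIFs:
cluster1::> network interface delete -vserver vs1 -lif lif_old1

Step 3 - unjoin the evacuated node (advanced privilege):
cluster1::> set -privilege advanced
cluster1::*> cluster unjoin -node old-node1
```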

A final consideration is process complexity. The vol move method is conceptually simpler since it uses only
standard clustered ONTAP commands and does not require advanced administration skills. It is also
available as a WFA workflow for the case where a new HA pair of controllers and storage will be added to
the cluster and the old controllers and storage are evacuated. In comparison, the ARL method requires a
number of different clustered ONTAP commands including commands run in maintenance mode, and is
almost exclusively CLI driven. This may be a consideration in the choice of which method to use.
Nevertheless, it is expected going forward that ARL will become the preferred method for the majority of
technology refresh scenarios.

2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh

In addition to the considerations given in the previous section, the impact of system cost should be taken
into account.
Table 1 and Table 2 show various values associated with a typical controller hardware upgrade. Two
values are given in the tables: values associated with the data copy (vol move) method and values
associated with the ARL method. The values are theoretical values; exact values would depend on the
impact to the business. For example, the amount of data that is being migrated for a physical data
migration solution impacts the duration of the transition time.
The objective of the tables is to demonstrate that aggregate relocate decreases the cost associated with
each controller.
Table 1) Cost analysis breakdown.

| Cost Variable   | Cost Variable Description | Theoretical Value |
|-----------------|---------------------------|-------------------|
| Controller_cost | Controller price paid | $50,000.00 |
| Production_time | Number of months the controller is in production (life span of controller less transition time) | 42 months (48 months − 6 months) for data copy move; 47.75 months (48 months − 0.25 month) for ARL |
| Transition_time | Number of months data is being copied to upgrade the controller | 6 months (data copy move); 1 week (ARL) |
| Transition_cost | Cost per month for additional hardware, personnel to do the data move, overtime hours | $1,000.00 |

Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades.

|                            | General Formula                     | Using Data Copy                      | Using ARL                               |
|----------------------------|-------------------------------------|--------------------------------------|-----------------------------------------|
| Cost of Controller Monthly | (Controller_cost)/(Production_time) | ($50,000.00)/(42 months) = $1,190.48 | ($50,000.00)/(47.75 months) = $1,047.12 |
| Total Transition Costs     | (Transition_time)*(Transition_cost) | (6 months)*($1,000.00) = $6,000.00   | (1/4 month)*($1,000.00) = $250.00       |
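The arithmetic behind Table 2 can be reproduced with a small sketch. The function names are illustrative; the dollar figures and time spans are the report's theoretical values from Table 1.

```python
# Hypothetical cost model mirroring Table 1's variables and Table 2's formulas.

def monthly_controller_cost(controller_cost, production_months):
    """Controller price amortized per month of production time."""
    return controller_cost / production_months

def total_transition_cost(transition_months, cost_per_month):
    """One-time cost of the migration window."""
    return transition_months * cost_per_month

# Data copy (vol move): a 6-month transition out of a 48-month lifespan.
data_copy_monthly = monthly_controller_cost(50_000, 48 - 6)      # ~= $1,190.48
data_copy_transition = total_transition_cost(6, 1_000)           # $6,000.00

# ARL: roughly a 1-week (1/4-month) transition.
arl_monthly = monthly_controller_cost(50_000, 48 - 0.25)         # ~= $1,047.12
arl_transition = total_transition_cost(0.25, 1_000)              # $250.00
```

The shorter ARL transition both lengthens the production period (lowering the amortized monthly cost) and shrinks the one-time transition cost.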

The following figure represents the lifetime of a controller using a physical data copy solution versus an
aggregate relocate solution. The physical data copy solution requires more time to complete and the new
controllers will go into production earlier than if aggregate relocate were being used. Aggregate relocate
allows the new controllers to be purchased and put in place later in the cycle, extending the lifetime of
each controller.
Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions.

3 Aggregate Relocate Overview


Aggregate relocate (ARL) is a nondisruptive process that moves ownership of aggregates between nodes
that share storage (HA pair controllers). This data migration does not require any data to be physically
copied. Rather, aggregate ownership is reassigned with no dependency on the HA interconnect.
ARL is a new feature in clustered Data ONTAP 8.2 and relies on nodes with direct access to the storage
housing the identified aggregates. For the purposes of ARL, the node within an HA pair configuration
taking over ownership of the aggregates is referred to as the destination node, and the node originally
owning the aggregates is referred to as the source node. Both the source and destination nodes must be
running clustered Data ONTAP 8.2 or later to take advantage of this feature. In addition, both the source
and destination nodes need to be:

Fully booted

Joined to the same cluster and within the same HA pair

Directly cable connected to the storage shelves (HA pair configurations)

Running the same release of Data ONTAP 8.2.x.

The nondisruptive nature of ARL is facilitated by the clustered Data ONTAP architecture, which provides
virtualization of the networking interface from the storage resources. This virtualized infrastructure allows
a client or host request to be serviced through any node port within the cluster regardless of where the
storage resource that contains the data actually resides. During ARL, there is no change in the availability
of the aggregate to any incoming host or application requests. During the reassignment of aggregates
from the source node to the destination node, the aggregates are offlined and then returned to the
online state once the storage resource is reassigned to the destination node. This period when the
aggregate is offline and then brought back online constitutes a small window of time in which I/O will be
retried until the aggregate is brought back online. This period of time is similar to the time taken for
storage failover to complete.
Before initiating an aggregate relocate, several conditions must be true for the source and destination nodes as well as for the aggregates identified for the relocation.

- The aggregate(s) have the SFO policy (rather than the CFO policy, which is traditionally assigned to root aggregates and 7-Mode aggregates). Refer to TR-3450 for SFO and CFO policy information.
- The aggregate is in the online state. An aggregate in an offline or degraded state is not eligible for ARL.

General Phases of ARL


The ARL process consists of these phases.

1. Validation Phase: Checks the conditions of the source and destination nodes as well as the aggregates to be relocated.
2. Precommit Phase: The period for execution of any prerequisite processing required before the relocate is executed. This can include preparing the aggregate(s) for relocation, setting flags, and transferring certain noncritical subsystem data. Any processing performed at this stage can be simply reverted or cleaned up.
3. Commit Phase: This is when the actual processing associated with relocating the aggregate to the destination node is done. Once the commit phase is entered, the ARL cannot be aborted. The commit stage is time bound within a period acceptable to the client or host application, meaning that the time between the aggregate going offline on the source node and coming online on the destination node does not exceed 60 seconds.
4. Abort Phase (Optional): An abort is performed only if the validation phase or precommit phase is abandoned because conditional checks are not met. A series of cleanup processes reverts any processing activity that happened during the validation or precommit phases.
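The phase ordering can be summarized with a minimal sketch (not NetApp code): validation and precommit failures trigger the abort cleanup, while commit, once entered, can no longer be aborted.

```python
# Illustrative model of the ARL phase sequence described above.

def relocate(validation_ok, precommit_ok):
    """Return the list of phases executed for one relocation attempt."""
    phases = ["validation"]
    if not validation_ok:
        phases.append("abort")      # cleanup reverts validation-phase work
        return phases
    phases.append("precommit")
    if not precommit_ok:
        phases.append("abort")      # precommit changes are simply reverted
        return phases
    phases.append("commit")         # past this point, no abort is possible
    return phases
```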

4 Controller Head Upgrades


The longevity of a controller is between three and five years. As a controller nears end of life, a process is
initiated to upgrade the hardware. Traditionally, for lack of a better solution, the process requires an
extensive period of time to migrate the data on the controller's storage to other hardware. Aggregate
relocate eliminates the need to perform a physical data migration in order to upgrade the controller
hardware. Instead, the aggregates are simply logically relocated to an alternative controller for the
duration of the upgrade. The data stays intact on its original storage; client and host I/O requests will be
serviced by the alternative controller for the duration of the ARL.

4.1 Double-Hop Upgrades

There are several ways ARL can be used to upgrade storage controllers. NetApp recommends a double-hop upgrade for simplicity as the preferred method for controller hardware upgrades.
A double-hop upgrade uses ARL to relocate aggregates between the heads of the HA pair. The high-level
procedure for the double-hop upgrade is as follows. Note this is a summary of the process; for detailed
execution steps, please refer to the product documentation, available currently at Using ARL to upgrade
controller hardware on a pair of nodes running clustered Data ONTAP 8.2.
1. If the replacement nodes are running a later release of Data ONTAP (for example, 8.2.1),
upgrade the source nodes to that release. A head upgrade using aggregate relocate cannot be
combined with a Data ONTAP version upgrade.
2. Use ARL to migrate aggregates from node A to node B.
3. Migrate data LIFs from node A to node B (or to other nodes within the cluster).
4. Disable SFO.
5. Replace node A with node C (execute all setup, disk reassign, and licensing administration for
node C).
6. Migrate data LIFs from node B to node C.
7. Use ARL to relocate aggregates from node B to node C.
8. Migrate data LIFs from node B to node C (or from other nodes in the cluster)
9. Replace node B with node D (execute all setup, disk reassign of the root aggregate and spare
drives, and licensing administration for node D).
10. Enable SFO.
11. Migrate selected aggregates and LIFs to node D.
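The first hop of this procedure can be sketched as illustrative clustershell commands. The node, aggregate, vserver, LIF, and port names are placeholders; refer to the product documentation for the full, detailed steps.

```
cluster1::> set -privilege advanced

Step 2 - relocate all SFO aggregates from node A to its HA partner:
cluster1::*> storage aggregate relocation start -node nodeA -destination nodeB
             -aggregate-list aggr_a1,aggr_a2 -ndo-controller-upgrade true

Step 3 - migrate each data LIF off node A:
cluster1::*> network interface migrate -vserver vs1 -lif lif1
             -destination-node nodeB -destination-port e0c

Step 4 - disable storage failover before replacing node A:
cluster1::*> storage failover modify -node nodeA -enabled false
```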


Figure 2) Double-hop upgrade transition steps.

Upgrade for Controllers with Internal Disks


Controllers with internal disks contained within the controller chassis require the use of volume move to
physically relocate the data on the internal drives to storage connected to the new heads. ARL is not
supported for aggregates on internal disks. For example, for a FAS2200 storage controller HA pair that
has three volumes on the internal drives, volume moves would be required to move the volumes to disks that
are attached to the new controllers.
To perform a controller upgrade on this cluster, join the new controllers to the cluster, provided the cluster
platform mixing rules are complied with. As per the clustered Data ONTAP Platform Mixing Rules,
clusters containing FAS22x0 controllers are limited to a maximum of four nodes. Therefore, if internal drives
will be used in the FAS22x0s, the cluster should be limited to two nodes so that two new nodes can be
added to perform a technology refresh. The new controllers require additional storage sufficient to contain
the existing data volumes on the FAS22x0 controllers. Move all the data volumes from the internal drives
(as well as any volumes on external shelves) in the FAS22x0 controllers to the new heads using volume
move. Migrate LIFs and adjust failover groups, delete the aggregates, then unjoin the FAS22x0 nodes
from the cluster.


To summarize, for controllers with internal drives and externally attached shelves, volume move
(DataMotion for volumes) should be used for controller hardware upgrades. This process is documented
in Upgrading controller hardware on a pair of nodes by moving volumes. OnCommand Workflow
Automation 2.2 includes a workflow for technology refresh of an HA pair controller and attached storage.

4.2 Supported Upgrades

Aggregate relocate is a feature introduced in clustered Data ONTAP 8.2; therefore, all controllers
participating in the hardware controller upgrade must be running Data ONTAP 8.2 or later, and the
original and replacement controllers must be running the same 8.2.x release. Only platforms that are
supported in Data ONTAP 8.2 (or later) will be qualified for nondisruptive hardware controller upgrades
with ARL. Upgrades from platforms that are not supported on Data ONTAP 8.2 will not support
nondisruptive head upgrades with ARL. ARL is supported on V-series with the same guidelines as FAS
controllers and can be used to upgrade a V-series controller with another V-series, or to a FAS controller.
Note that a V-series controller can be upgraded to a FAS controller providing that only NetApp shelves
are configured on the V-series. In all cases, the upgraded controllers in the HA pair must match exactly
(FAS with FAS, V-series with V-series).
The supported and qualified controller upgrades are listed in the documentation available at Using ARL to
upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2. Only controller upgrades are supported; controller downgrades (for example, from a 62x0 to a 32x0 platform) are not supported.
A dual-controller enclosure refers to a single-chassis enclosure containing two controller heads. A single-controller enclosure refers to a single-chassis enclosure with a single controller head.

Table 3) Supported nondisruptive head upgrades using ARL.

| Source \ Destination                         | Controller Not Supported in Data ONTAP 8.2.x | Single-Controller Enclosure HA | Dual-Controller Enclosure HA |
|----------------------------------------------|----------------------------------------------|--------------------------------|------------------------------|
| Controller Not Supported in Data ONTAP 8.2.x | No                                           | No                             | No                           |
| Single-Controller Enclosure HA               | No                                           | Yes                            | Yes                          |
| Dual-Controller Enclosure HA                 | No                                           | Yes                            | Yes                          |

Table 4) Supported platforms with ARL.

| Platform                                | Supports ARL                                  |
|-----------------------------------------|-----------------------------------------------|
| FAS2020/FAS2040/FAS2050                 | No; not supported in clustered Data ONTAP 8.2 |
| FAS2220/FAS2240                         | Yes*                                          |
| FAS3020/FAS3040/FAS3050/FAS3070         | No; not supported in clustered Data ONTAP 8.2 |
| FAS3140/FAS3160/FAS3170                 | Yes                                           |
| FAS3210/FAS3220/FAS3240/FAS3250/FAS3270 | Yes                                           |
| FAS6030/FAS6070                         | No; not supported in clustered Data ONTAP 8.2 |
| FAS6040/FAS6080                         | Yes                                           |
| FAS6210/FAS6240/FAS6280/FAS6290         | Yes                                           |

*Note: ARL is not supported for internal drives. Volume move needs to be used to physically relocate any
data on internal drives to storage attached to the nodes that will replace the original controllers. ARL
does not relocate the root aggregate, and cannot be used to perform an in-place controller upgrade if the
root aggregate is hosted on internal drives.

Single Node Cluster


A cluster consisting of a single node does not support ARL. ARL is qualified for support on HA pair
configurations for the purpose of an upgrade or maintenance of the controller(s).

4.3 Command Line Interface

ARL is implemented as the storage aggregate relocation CLI command.


Table 5) Command line interface options for ARL.

storage aggregate relocation start

| Command Option               | Description |
|------------------------------|-------------|
| -aggregate-list              | List of aggregates to relocate to the HA partner node. For all aggregates, use the asterisk symbol (*). |
| -node                        | The name of the source node for the aggregates being relocated. |
| -destination                 | The name of the destination node that the aggregates will be relocated to. |
| -override-vetoes             | When set to true, overrides some checks on the source that would prevent the relocation attempt. Refer to the ARL upgrade documentation on the NetApp Support site for conditions that can be vetoed. |
| -relocate-to-higher-version  | Allows aggregates to be relocated to a node that is running a higher version of Data ONTAP. Currently this flag has not been qualified as part of an in-place controller upgrade process; hence the original and replacement nodes must be running the same release of Data ONTAP 8.2.x before starting the process. |
| -override-destination-checks | When set to true, overrides certain checks on the destination. Refer to the ARL upgrade documentation on the NetApp Support site for checks that can be overridden. |
| -ndo-controller-upgrade      | Specifies whether the relocation operation is being done as part of a nondisruptive controller upgrade process. Aggregate relocation will not change the home ownership of the aggregates while relocating as part of a controller upgrade. The default value is false. Requires advanced privilege. |

Temporary Reassign Parameter

When doing a controller upgrade via ARL, the -ndo-controller-upgrade advanced privilege option is required so that ownership of the aggregates is not reassigned to the partner node. The relocation to the partner node is temporary during a controller hardware upgrade, and the aggregates are returned to their home node at the end of the process.
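As an illustrative example (node and aggregate names are placeholders), a controller-upgrade relocation and its status check might look like the following; note the advanced privilege level required by -ndo-controller-upgrade.

```
cluster1::> set -privilege advanced

cluster1::*> storage aggregate relocation start -node nodeA -destination nodeB
             -aggregate-list aggr1,aggr2 -ndo-controller-upgrade true

cluster1::*> storage aggregate relocation show
```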

Relocating to a Higher Version of Data ONTAP


Upgrading a new controller type may introduce a new version of Data ONTAP into the cluster. For
example, suppose a FAS3200 series HA pair running Data ONTAP 8.2 is to be upgraded with controllers
formatted with Data ONTAP 8.2.1, using the aggregate relocate nondisruptive process. It is not supported
to relocate aggregates from a Data ONTAP 8.2 node to a Data ONTAP 8.2.1 node; it is required to
upgrade the existing FAS3200 HA pair to Data ONTAP 8.2.1 before starting the process.
A source node formatted for a later version of Data ONTAP cannot relocate aggregates to a destination
node running an older version of Data ONTAP. Generally, this will not be an issue since ARL is supported
only for the purposes of a controller upgrade.
Table 6 shows the supported combinations of clustered Data ONTAP versions with ARL. In summary, in
all cases, the source and replacement nodes must be running the same major/minor version of Data
ONTAP.
Table 6) Compatibility of ARL between Data ONTAP releases.

| Source \ Destination | Data ONTAP 8.1 | Data ONTAP 8.2 | Data ONTAP 8.2.1 |
|----------------------|----------------|----------------|------------------|
| Data ONTAP 8.1       | N/A            | No             | No               |
| Data ONTAP 8.2       | No             | Yes            | No               |
| Data ONTAP 8.2.1     | No             | No             | Yes              |

Forcing Aggregate Relocate


A set of checks is carried out on both the source and destination nodes participating in the aggregate
relocate. An option is provided to bypass the conditional checks done on the destination node, thereby
forcing the aggregate relocate. Using the override-destination-checks parameter may result in a
prolonged time to complete for the ARL due to the nonoptimized conditions on the destination node. The
prolonged time for the ARL to complete may exceed the allowable window of time for an IO to be
successful and cause a disruption to the client.


Two sets of checks are executed on the aggregates: common checks and aggregate granular checks.
The common checks will have the same result for all aggregates while the aggregate granular checks are
specific to each aggregate. If any of these conditional checks fails, then the ARL may fail completely or
for a specific aggregate.
If a common check fails, the entire ARL job will fail. For example, some of the common checks are:

- Are the source and destination nodes for the ARL job in the same cluster?
- Is the destination node in quorum?
- Are compatible versions of clustered Data ONTAP (8.2 or higher) running on the source and destination nodes?

If the checks done granularly at the aggregate level fail, then relocation of one or more aggregates may
fail. For example, some of the more common aggregate checks are:

- Will the relocation of the aggregate cause any hard limits (such as a FlexVol limit) to be exceeded?
- Does the destination node see all the disks assigned to the aggregate(s) being relocated?

4.4 Graphical User Interface

Aggregate relocate is currently available only from the CLI.

4.5 Best Practices

Initiating Aggregate Relocate in a Degraded Cluster


Do not use ARL in parallel with storage failover or disaster recovery events. During a storage failover or
disaster recovery event, the system is already at an increased risk of failure due to the vulnerability of the
infrastructure in a degraded state. NetApp does not recommend introducing additional processing such
as an ARL job unless completely necessary. In addition, NetApp does not recommend making any
changes to the aggregate or the contained volumes during an aggregate relocate. It is essential to
complete the ARL in a defined period of time to prevent disruption to the client or host application.
Therefore, limiting additional processing on the resources being relocated allows the ARL completion to
be more deterministic.

SMB File Shares and Small Random Workloads


NDO, including ARL, is supported for customers with Hyper-V with Continuously Available shares and
SMB 3.0 in Data ONTAP 8.2, as documented in TR-4100: Nondisruptive Operations for SMB File Shares.
For customer environments with a FAS6200 series controller running a workload with predominantly small IO sizes, it is recommended to adjust the cifs-ndo-duration option to reduce certain long-running processes associated with the transition of aggregates. Specifically, customers running 50% or more of their workload with an IO size of 4KB blocks or smaller will need to adjust the cifs-ndo-duration option as per this table:
Table 7) Recommended values for cifs-ndo-duration with small block size.

| Workload Percentage with 4KB (or smaller) IO Size | Setting for cifs-ndo-duration |
|---------------------------------------------------|-------------------------------|
| 0-50%                                             | default (no change)           |
| 50-75%                                            | medium                        |
| 75-100%                                           | low                           |

For example, in a mixed workload of 5% 16KB IO, 50% 8KB IO, and 45% 4KB IO, no change would need to be made. For a mixed workload of 55% 4KB IO and 45% 8KB IO, the option would be set to medium. The option can be set using the storage failover modify command, using advanced privilege. The CIFS license must be enabled.
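The selection rule from Table 7 can be encoded as a small helper. This is an illustrative function, not an ONTAP API; the boundary at exactly 50% follows the text's "50% or more" wording.

```python
# Illustrative mapping of small-IO workload percentage to the recommended
# cifs-ndo-duration setting, per Table 7.

def cifs_ndo_duration_setting(pct_small_io):
    """pct_small_io: percentage (0-100) of IOs at 4KB or smaller."""
    if pct_small_io < 50:
        return "default"   # no change required
    elif pct_small_io < 75:
        return "medium"
    else:
        return "low"
```

For the worked examples above, 45% small IO returns "default" and 55% returns "medium".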

FlexVol Limits and Concurrent Aggregate Relocates


An administrator can issue multiple aggregate relocate commands in parallel in a cluster. For situations in
which concurrent aggregate relocate jobs are initiated, it is necessary to consider limits on the destination
node. The aggregate relocate validation phase checks to make sure that there is adequate space to
accommodate the increase in the number of FlexVol volumes after the relocation. However, if several
relocates are happening in parallel, the relocation of several aggregates to the same destination is not
accounted for. For example, suppose a node has 450 volumes against a limit of 500, and aggregates A
and B, each containing 35 volumes, are relocated to it in parallel. The FlexVol limit would not be
flagged for either job during the validation phase, but once both aggregates are relocated to the
destination node the limit would be exceeded and volumes would be taken offline. As a best practice,
move aggregates in sequence when limits are at risk of being exceeded.
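Because ARL validation considers each relocation in isolation, the combined total can be checked by the administrator before starting parallel jobs. The following is a minimal sketch of that arithmetic; the function and parameter names are illustrative, and the volume counts are simply the example figures from the text.

```python
def concurrent_relocation_safe(dest_volume_count: int,
                               dest_volume_limit: int,
                               incoming_aggr_volume_counts: list[int]) -> bool:
    """Return True if relocating all of the listed aggregates to the
    same destination node stays within its FlexVol volume limit.

    ARL's own validation checks each relocation in isolation, so the
    combined total must be verified by the administrator.
    """
    total = dest_volume_count + sum(incoming_aggr_volume_counts)
    return total <= dest_volume_limit

# The example from the text: 450 volumes on the destination, a limit
# of 500, and two aggregates of 35 volumes each -> 520 total, unsafe.
print(concurrent_relocation_safe(450, 500, [35, 35]))  # False
print(concurrent_relocation_safe(450, 500, [35]))      # True
```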

Aggregate State
Aggregate relocate will only succeed for aggregates that are in an online state. Prior to initiating a
controller hardware upgrade, verify that all aggregates are online and in a healthy state. Any aggregate in
a degraded or offline state will not be relocated.

Volume Move and Aggregate Relocate


A volume move job can be issued concurrently with an ARL. For example, a customer may issue a
volume move on a set of internal drives while at the same time relocating an external aggregate to the
partner node in preparation for a controller hardware upgrade of a low-end platform. The expected
behavior of a volume move, done in parallel with an ARL, depends on the phase of the volume move.
During the iterative phase, when data is being physically migrated to the destination aggregate, ARL can
proceed as necessary. However, during the cutover phase, only limited processing by other jobs can
occur to allow the volume move to complete in the defined cutover period. NetApp recommends that you
execute ARL jobs separately from a volume move job when all resources are on the same controllers.
For more information on DataMotion software for Volumes, refer to TR-4075 (see details in the
References section of this document).

Maximum Number of Aggregates and Volumes


When relocating aggregates between HA partners, consider the variance in aggregate size when moving
between dissimilar controller types. In general, customers will move to a controller with equal or higher
limits for the aggregate size. The supported controller upgrades are listed in the documentation and these
combinations should take the supported aggregate size and number of volumes into account. As a best
practice, verify the limits on the original and replacement controllers on Hardware Universe as part of the
upgrade planning process to ensure that the replacement controllers have equal or higher limits than the
original controllers.
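That planning check reduces to confirming that every relevant limit on the replacement controller is at least as high as on the original. The sketch below illustrates the comparison; the limit names and values are hypothetical placeholders, not Hardware Universe data.

```python
def replacement_limits_ok(original: dict[str, int],
                          replacement: dict[str, int]) -> bool:
    """Return True if the replacement controller's limits are equal to
    or higher than the original controller's for every tracked limit.
    A limit missing on the replacement is treated as a failure."""
    return all(replacement.get(name, 0) >= value
               for name, value in original.items())

# Illustrative numbers only; consult Hardware Universe for real limits.
original_limits = {"max_aggr_size_tb": 180, "max_flexvols": 500}
replacement_limits = {"max_aggr_size_tb": 400, "max_flexvols": 1000}
print(replacement_limits_ok(original_limits, replacement_limits))  # True
```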


Configuring Timeouts
There is a short period during an aggregate relocation when I/O requests are retried while the
aggregate is brought online on the partner node. How client I/O is retried depends on the retry
method implemented on the client. Configure client or host timeout windows to exceed the length of
this interruption. NetApp recommends that these retry windows be set to 60 seconds at a minimum,
and increased to 120 seconds for protocols that support it.
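As one concrete illustration of sizing a client retry window, an NFSv3 client on Linux can set its timeout explicitly at mount time. The nfs mount option timeo is expressed in tenths of a second, so timeo=600 yields a 60-second window per retry attempt. The server name, export path, and mount point below are placeholders; verify the appropriate values for your environment.

```shell
# timeo is in tenths of a second: 600 -> 60s per retry attempt;
# retrans controls retransmissions before a major timeout is declared.
# Hostname, export, and mount point are placeholders.
mount -t nfs -o vers=3,timeo=600,retrans=2 filer01:/vol/data /mnt/data
```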

Mutually Exclusive Operations


There are several activities that will prevent an ARL from proceeding. If any of the following
activities are in progress when an ARL is initiated, ARL will not start.

- The source node is either executing a takeover of a partner's disks or being taken over.
- The source node is executing a giveback of a partner's disks.
- The source node is in the process of shutting down.
- The source node is out of quorum in the cluster.
- The source node is in the process of reverting the clustered Data ONTAP version.

When any of the above conditions are true, the ARL job must be initiated after these activities have
completed. For the destination node, the following operation is handled differently:

- While a destination node is executing a giveback, an ARL will proceed, but NetApp does not
  recommend doing this as a best practice.

4.6 Considerations for Failure Scenarios

Storage Failover During Aggregate Relocate


Storage failover and ARL have dependencies that impact the overall system if both events are triggered
concurrently. In general, if a failover event occurs while SFO is enabled, failover will occur as
expected. However, if SFO is disabled (as is required during the ARL process for a head upgrade,
described in section 4.1), failover will not occur, and some level of system resiliency and
availability is lost.

Aggregate Relocate Failure of an Aggregate


The ARL process relocates each individual aggregate serially. Use the storage aggregate
relocation show command to display successful aggregate relocations as well as any that incurred
errors. Any aggregate that incurred an error will have an associated cause for the failure. Where
applicable, a message will indicate a course of action to correct the error; for example, if an
aggregate was not relocated due to an aggregate-level check, you may use the override-vetoes option
to bypass this check.


References
The following references were used in this technical report.

TR-3450: High-Availability Controller Configuration Guide and Best Practices
http://media.netapp.com/documents/tr-3450.pdf

TR-4075: DataMotion for Volumes Overview, Best Practices and Optimization
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=72418&contentID=75392

Using ARL to upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=103946&contentID=158025

Version History
Version        Date             Document Version History
Version 1.0    May 2013         Initial release
Version 1.1    June 2013        Minor updates
Version 1.2    December 2013    Minor updates

Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product
and feature versions described in this document are supported for your specific environment. The NetApp
IMT defines the product components and versions that can be used to construct configurations that are
supported by NetApp. Specific results depend on each customer's installation in accordance with published
specifications.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided herein. The
information in this document is distributed AS IS, and the use of this information or the implementation of
any recommendations or techniques herein is a customer's responsibility and depends on the customer's
ability to evaluate and integrate them into the customer's operational environment. This document and
the information contained herein may be used solely in connection with the NetApp products discussed
in this document.

© 2014 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp,
Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, DataMotion, Data ONTAP, and
FlexVol are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-XXX-MMYR
