TABLE OF CONTENTS
1 Introduction........................................................................................................................................... 3
1.1 Scope .............................................................................................................................................. 3
2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh ......................................................... 5
4.2 Supported Upgrades
4.3
4.4
4.5 Best Practices
4.6
References ................................................................................................................................................ 15
Version History ......................................................................................................................................... 15
LIST OF TABLES
Table 1) Cost analysis breakdown.................................................................................................................................. 5
Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades. ............................. 5
Table 3) Supported nondisruptive head upgrades using ARL. ....................................................................................... 9
Table 4) Supported platforms with ARL. ......................................................................................................................... 9
Table 5) Command line interface options for ARL. ....................................................................................................... 10
Table 6) Compatibility of ARL between Data ONTAP releases. ................................................................................... 11
Table 7) Recommended values for cifs-ndo-duration with small block size ................................................................. 12
LIST OF FIGURES
Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions. .............................. 6
Figure 2) Double-hop upgrade transition steps. ............................................................................................................. 8
1 Introduction
Today's business environments require 24/7 data availability. The storage industry delivers the base
building block for IT infrastructures, providing data storage for business applications and objectives.
Constant data availability therefore begins with architecting storage systems that facilitate nondisruptive
operations (NDOs). Nondisruptive operations fall into three main categories: hardware resiliency,
hardware and software lifecycle operations, and hardware and software maintenance operations. For the
purposes of this paper, we focus on aggregate relocate (ARL) as a solution for lifecycle operations,
specifically for refreshing a storage controller.
NetApp storage controllers are the physical component of the logical node entity that services client
and host requests from the upper layers of the IT infrastructure. Maintaining continuity of service while
transitioning from storage controllers that are approaching an end-of-life state to current shipping versions
of the replacement controller has historically been cumbersome, regardless of the storage vendor. The
aggregate relocate feature eliminates the need to physically move any data. Aggregate relocate reassigns
disks to the partner node, making the partner node the owner of the aggregates. Relocating the
aggregates makes the partner node the point of entry for all requests from client or host applications,
without client disruption.
1.1 Scope
The focus of this paper is the aggregate relocate solution for the purpose of upgrading storage
controllers. The initial version of this paper does not go into detail regarding other use cases for ARL. In
the clustered Data ONTAP 8.2 architecture, ARL is qualified for controller hardware upgrades and
maintenance operations. The primary points of discussion in this document include:
- Comparison of the ARL and vol move methods for cluster technology refresh
- An overview of the ARL feature and the end-to-end process of the ARL feature
- Reducing the number of people required to transfer data from the old controller to the new
controller
- Reducing the time required for data migration (no physical data copy with ARL)

2.1

There are essentially two primary methods for technology refresh in clustered Data ONTAP. The first,
DataMotion for Volumes (vol move), involves moving all the data to existing or newly added nodes in the
cluster to evacuate the old nodes, and then removing those nodes from the cluster. This process has
been supported in all versions of clustered Data ONTAP. The second method uses ARL and is new in
clustered Data ONTAP 8.2.
The following considerations apply to using the vol move method for a tech refresh of the cluster.
- If the node root or data aggregates use internal drives, the vol move method is required.
- If the customer has purchased new storage shelves in addition to new controllers, vol move is the
best solution. In the ARL method, the data remains on the same shelves.
- If the customer cannot tolerate disabling HA for the duration of ARL, vol move is preferred. When
vol move is used, all controllers remain up and serving data throughout, whereas in the ARL
method each controller in the HA pair is shut down for the duration of the controller upgrade.
- The vol move method takes substantially longer than the ARL method, because a full data copy is
required. The time may be on the order of days or even weeks, because it depends directly on
the amount of data to be evacuated. The vol moves can, however, be staggered over time as
required. The ARL method, by contrast, takes on the order of several hours for each controller
pair, because no data copy is required, and it must be completed as a single maintenance event.
- If the controllers being refreshed are not on Data ONTAP 8.2, the ARL method cannot be used.
- If new nodes are being added to the cluster, the cluster size and controller mix must remain
compliant with the cluster platform mixing rules.
The following considerations apply to using the ARL in-place upgrade for a tech refresh of the cluster.
- If neither the node root nor the data aggregates use internal drives, ARL can be used.
- If the customer is upgrading only the storage controllers, ARL can be used, because the disk
storage remains the same.
- The new controllers must support the existing shelf technology, because the controllers will be
attached to the old storage.
- The original nodes must already be running Data ONTAP 8.2 or higher. If the controllers cannot
be upgraded to Data ONTAP 8.2, the vol move method is the only option.
- If the cluster is already at the maximum supported size, or the new controllers are not supported
in the same cluster as the old controllers (for example, 6200 and 2200 nodes cannot exist in the
same cluster), then either an ARL in-place controller upgrade or a vol move to existing nodes in
the cluster must be used. If a cluster is already at its maximum size and ARL is not possible for
any reason, the summary process to refresh the controllers is:
1. Evacuate data from the nodes being replaced to other cluster nodes using vol move. If there
is insufficient free capacity for the data elsewhere in the cluster, additional storage will need
to be added to the existing nodes.
2. Remove LIFs from the old nodes.
3. Unjoin the evacuated nodes, thereby reducing the cluster size.
4. In the new cluster configuration it may now be possible to perform an ARL in-place controller
upgrade. If this is not possible, the new controllers can be joined to the cluster as additional
nodes.
A final consideration is process complexity. The vol move method is conceptually simpler because it uses
only standard clustered Data ONTAP commands and does not require advanced administration skills. It is
also available as a WFA workflow for the case in which a new HA pair of controllers and storage is added
to the cluster and the old controllers and storage are evacuated. In comparison, the ARL method requires
a number of different clustered Data ONTAP commands, including commands run in maintenance mode,
and is almost exclusively CLI driven. This may be a consideration in the choice of which method to use.
Nevertheless, ARL is expected to become the preferred method for the majority of technology refresh
scenarios going forward.
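As a rough illustration, the considerations above can be condensed into a simple decision sketch. This is a hypothetical helper, not NetApp logic; the function and parameter names are invented here, and the inputs paraphrase the bullets above:

```python
# Illustrative rule-of-thumb for choosing a tech refresh method,
# paraphrasing the considerations in this section. Not NetApp code.

def refresh_method(root_or_data_on_internal_drives: bool,
                   adding_new_shelves: bool,
                   ha_outage_tolerable: bool,
                   nodes_on_82_or_later: bool,
                   new_controllers_support_shelves: bool) -> str:
    """Return "vol move" or "ARL" for a controller tech refresh."""
    if root_or_data_on_internal_drives:
        return "vol move"   # ARL cannot relocate aggregates on internal drives
    if adding_new_shelves:
        return "vol move"   # ARL keeps data on the existing shelves
    if not nodes_on_82_or_later:
        return "vol move"   # ARL requires clustered Data ONTAP 8.2 or later
    if not new_controllers_support_shelves:
        return "vol move"   # the new heads must attach to the old storage
    if not ha_outage_tolerable:
        return "vol move"   # ARL disables HA for the upgrade duration
    return "ARL"            # hours instead of days/weeks; no data copy

print(refresh_method(False, False, True, True, True))   # -> ARL
```

The helper encodes only the hard constraints; soft factors such as process complexity and staffing still require human judgment.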
2.2 Cost Comparison of Vol Move versus ARL for Tech Refresh
In addition to the considerations given in the previous section, the impact of system cost should be taken
into account.
Table 1 and Table 2 show various values associated with a typical controller hardware upgrade. Two sets
of values are given in the tables: values associated with the data copy (vol move) method and values
associated with the ARL method. The values are theoretical; exact values would depend on the impact to
the business. For example, the amount of data being migrated in a physical data migration solution
affects the duration of the transition time.
The objective of the tables is to demonstrate that aggregate relocate decreases the cost associated with
each controller.
Table 1) Cost analysis breakdown.

Cost Variable    | Data Copy (Vol Move)               | ARL
Controller_cost  | $50,000.00                         | $50,000.00
Production_time  | 42 months (48 months - 6 months)   | 47.75 months (48 months - 1/4 month)
Transition_time  | 6 months                           | 1/4 month
Transition_cost  | $1,000.00 per month                | $1,000.00 per month
Table 2) Cost differential between data copy and ARL solutions for controller hardware upgrades.

                           | General Formula                     | Using Data Copy                        | Using ARL
Monthly Cost of Controller | (Controller_cost)/(Production_time) | ($50,000.00)/(42 months) = $1,190.48   | ($50,000.00)/(47.75 months) = $1,047.12
Total Transition Costs     | (Transition_time)*(Transition_cost) | (6 months)*($1,000.00) = $6,000.00     | (1/4 month)*($1,000.00) = $250.00
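The arithmetic behind Tables 1 and 2 can be reproduced with a short sketch. This is illustrative only; the dollar figures and the 48-month lifetime are the paper's theoretical values, not measured data:

```python
# Reproduces the cost model in Tables 1 and 2: the controller cost is
# spread over its production time (lifetime minus transition time), and
# the transition itself accrues a fixed monthly cost.

CONTROLLER_COST = 50_000.00           # purchase price per controller
LIFETIME_MONTHS = 48                  # nominal hardware lifetime
TRANSITION_COST_PER_MONTH = 1_000.00  # overlap/migration cost per month

def monthly_controller_cost(transition_months: float) -> float:
    """Monthly cost = controller cost / production time."""
    production_months = LIFETIME_MONTHS - transition_months
    return CONTROLLER_COST / production_months

def total_transition_cost(transition_months: float) -> float:
    return transition_months * TRANSITION_COST_PER_MONTH

# Data copy (vol move): 6-month transition; ARL: quarter-month transition.
for label, months in [("vol move", 6.0), ("ARL", 0.25)]:
    print(f"{label}: monthly ${monthly_controller_cost(months):,.2f}, "
          f"transition ${total_transition_cost(months):,.2f}")
```

Running this reproduces the $1,190.48 versus $1,047.12 monthly figures and the $6,000 versus $250 transition totals from Table 2.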
The following figure represents the lifetime of a controller using a physical data copy solution versus an
aggregate relocate solution. The physical data copy solution requires more time to complete, so the new
controllers must be purchased and put into production earlier than if aggregate relocate were used.
Aggregate relocate allows the new controllers to be purchased and put in place later in the cycle,
extending the effective lifetime of each controller.
Figure 1) Controller HW lifecycle chart when using physical data migration versus ARL solutions.
The nondisruptive nature of ARL is facilitated by the clustered Data ONTAP architecture, which provides
virtualization of the networking interface from the storage resources. This virtualized infrastructure allows
a client or host request to be serviced through any node port within the cluster, regardless of where the
storage resource that contains the data actually resides. During ARL, there is no change in the availability
of the aggregate to any incoming host or application requests. During the reassignment of aggregates
from the source node to the destination node, the aggregates are taken offline and then returned to the
online state once the storage resource is reassigned to the destination node. This period when the
aggregate is offline and then brought back online constitutes a small window of time in which I/O is
retried until the aggregate is back online. This period is similar to the time taken for storage failover to
complete.
Before initiating an aggregate relocate, several conditions must be true for the source and destination
nodes as well as for the aggregates identified for relocation.
- The aggregate(s) have the SFO policy (rather than the CFO policy, which is traditionally assigned
to root aggregates and 7-Mode aggregates). Refer to TR-3450 for SFO and CFO policy information.
- The aggregate is in the online state. An aggregate in an offline or degraded state is not eligible
for ARL.
An aggregate relocation proceeds through the following phases:
1. Validation Phase: Checks the conditions of the source and destination nodes as well as the
aggregates to be relocated.
2. Precommit Phase: The period for execution of any prerequisite processing required before the
relocate is executed. This can include preparing the aggregate(s) for relocation, setting flags, and
transferring certain noncritical subsystem data. Any processing performed at this stage can be simply
reverted or cleaned up.
3. Commit Phase: This is when the actual processing associated with relocating the aggregate to the
destination node is done. Once the commit phase is entered, the ARL cannot be aborted. The
commit stage is time bound within a period acceptable to the client or host application, meaning that
the time between the aggregate going offline on the source node and coming online on the
destination node does not exceed 60 seconds.
4. Abort Phase (Optional): An abort is performed only if the validation phase or precommit phase is
abandoned because conditional checks are not met. A series of cleanup processes reverts any
processing that happened during the validation or precommit phases.
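As a conceptual model only (this is not NetApp code, and the class and method names are hypothetical), the phase sequence and its abort rule can be sketched as follows:

```python
# Conceptual model of the ARL phase sequence: abort/cleanup is possible
# during validation and precommit, but once the commit phase begins the
# relocation can no longer be aborted.

PHASES = ["validation", "precommit", "commit"]

class AggregateRelocation:
    def __init__(self):
        self.phase = None        # no phase entered yet
        self.committed = False

    def advance(self, checks_pass: bool = True) -> str:
        """Enter the next phase; a failed check before commit aborts."""
        nxt = PHASES[0] if self.phase is None else PHASES[PHASES.index(self.phase) + 1]
        if not checks_pass:
            return self.abort()
        self.phase = nxt
        if nxt == "commit":
            self.committed = True    # point of no return
        return self.phase

    def abort(self) -> str:
        if self.committed:
            raise RuntimeError("ARL cannot be aborted once the commit phase begins")
        self.phase = None            # cleanup: revert validation/precommit work
        return "aborted"

arl = AggregateRelocation()
print(arl.advance(), arl.advance(), arl.advance())   # validation precommit commit
```

The key invariant matches the text: cleanup is cheap before commit, and impossible after it.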
4.1 Double-Hop Upgrades
There are several ways ARL can be used to upgrade storage controllers. NetApp recommends the
double-hop upgrade, for its simplicity, as the preferred method for controller hardware upgrades.
A double-hop upgrade uses ARL to relocate aggregates between the heads of the HA pair. The high-level
procedure for the double-hop upgrade is as follows. Note that this is a summary of the process; for
detailed execution steps, refer to the product documentation, currently available at Using ARL to upgrade
controller hardware on a pair of nodes running clustered Data ONTAP 8.2.
1. If the replacement nodes are running a later release of Data ONTAP (for example, 8.2.1),
upgrade the source nodes to that release first. A head upgrade using aggregate relocate cannot
be combined with a Data ONTAP version upgrade.
2. Use ARL to relocate aggregates from node A to node B.
3. Migrate data LIFs from node A to node B (or to other nodes within the cluster).
4. Disable SFO.
5. Replace node A with node C (execute all setup, disk reassign, and licensing administration for
node C).
6. Migrate data LIFs from node B to node C.
7. Use ARL to relocate aggregates from node B to node C.
8. Migrate data LIFs to node C from node B (or from other nodes in the cluster).
9. Replace node B with node D (execute all setup, disk reassign of the root aggregate and spare
drives, and licensing administration for node D).
10. Enable SFO.
11. Migrate selected aggregates and LIFs to node D.
To summarize, for controllers with internal drives and externally attached shelves, volume move
(DataMotion for volumes) should be used for controller hardware upgrades. This process is documented
in Upgrading controller hardware on a pair of nodes by moving volumes. OnCommand Workflow
Automation 2.2 includes a workflow for technology refresh of an HA pair controller and attached storage.
4.2 Supported Upgrades
Aggregate relocate was introduced in clustered Data ONTAP 8.2; therefore, all controllers participating in
the hardware controller upgrade must be running Data ONTAP 8.2 or later, and the original and
replacement controllers must be running the same 8.2.x release. Only platforms that are supported in
Data ONTAP 8.2 (or later) are qualified for nondisruptive hardware controller upgrades with ARL;
platforms that are not supported on Data ONTAP 8.2 cannot participate in nondisruptive head upgrades
with ARL. ARL is supported on V-Series with the same guidelines as FAS controllers and can be used to
upgrade a V-Series controller to another V-Series controller or to a FAS controller. Note that a V-Series
controller can be upgraded to a FAS controller provided that only NetApp shelves are configured on the
V-Series system. In all cases, the upgraded controllers in the HA pair must match exactly (FAS with FAS,
V-Series with V-Series).
The supported and qualified controller upgrades are listed in the documentation available at Using ARL to
upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2. Only controller
upgrades are supported; controller downgrades (for example, from a 62x0 to a 32x0 platform) are not.
A dual-controller enclosure refers to a single-chassis enclosure containing two controller heads. A single-controller enclosure refers to a single-chassis enclosure with a single controller head.
Table 3) Supported nondisruptive head upgrades using ARL.

Source \ Destination                          | Controller Not Supported in Data ONTAP 8.2.x | Single-Controller Enclosure HA | Dual-Controller Enclosure HA
Controller Not Supported in Data ONTAP 8.2.x  | No                                           | No                             | No
Single-Controller Enclosure HA                | No                                           | Yes                            | Yes
Dual-Controller Enclosure HA                  | No                                           | Yes                            | Yes
Table 4) Supported platforms with ARL.

Platform                                   | Supports ARL
FAS2020/FAS2040/FAS2050                    | No
FAS2220/FAS2240                            | Yes*
FAS3020/FAS3040/FAS3050/FAS3070            | No
FAS3140/FAS3160/FAS3170                    | Yes
FAS3210/FAS3220/FAS3240/FAS3250/FAS3270   | Yes
FAS6030/FAS6070                            | No
FAS6040/FAS6080                            | Yes
FAS6210/FAS6240/FAS6280/FAS6290            | Yes
*Note: ARL is not supported for internal drives. Volume move must be used to physically relocate any
data on internal drives to storage attached to the nodes that will replace the original controllers. ARL
does not relocate the root aggregate and cannot be used to perform an in-place controller upgrade if the
root aggregate is hosted on internal drives.
4.3

Table 5) Command line interface options for ARL.

Option                        | Description
-node                         | Source node that currently owns the aggregates to be relocated.
-destination                  | Destination node to which the aggregates are relocated.
-aggregate-list               | List of the aggregates to relocate.
-override-vetoes              | Overrides vetoes raised by subsystem checks during the relocation.
-relocate-to-higher-version   | Allows relocation to a node running a higher Data ONTAP version.
-override-destination-checks  | Overrides the conditional checks performed on the destination node.
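To illustrate how these options fit together, the following is a sketch of a cluster-shell session. The node and aggregate names are hypothetical, and the exact syntax and privilege level should be confirmed against the product documentation for your release:

```
cluster1::> set -privilege advanced
cluster1::*> storage aggregate relocation start -node node-A -destination node-B
             -aggregate-list aggr_data1,aggr_data2
cluster1::*> storage aggregate relocation show
```

The show command can be used to monitor the progress of the relocation after it is started.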
Table 6) Compatibility of ARL between Data ONTAP releases (source release versus destination release).
Two sets of checks are executed on the aggregates: common checks and aggregate granular checks.
The common checks have the same result for all aggregates, while the aggregate granular checks are
specific to each aggregate. If any of these conditional checks fails, the ARL may fail completely or for a
specific aggregate.
If a common check fails, the entire ARL job fails. For example, some of the common checks are:
- Are the source and destination nodes for the ARL job in the same cluster?
- Are compatible versions of clustered Data ONTAP (8.2 or higher) running on the source and
destination nodes?
If the checks done granularly at the aggregate level fail, relocation of one or more aggregates may fail.
For example, some of the more common aggregate checks are:
- Will the relocation of the aggregate cause any hard limits (such as a FlexVol limit) to be
exceeded?
- Does the destination node see all the disks assigned to the aggregate(s) being relocated?
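The two-tier check behavior can be sketched conceptually. This is hypothetical Python, not NetApp code; the function name and the aggregate records are invented for illustration:

```python
# Conceptual model of ARL checks: a failed common check fails the whole
# job, while a failed granular check excludes only that aggregate.

def run_arl_checks(common_checks, granular_checks, aggregates):
    """Return the aggregates eligible for relocation.

    common_checks: zero-argument predicates applied once for the job
    granular_checks: one-argument predicates applied per aggregate
    """
    if not all(check() for check in common_checks):
        raise RuntimeError("common check failed: entire ARL job fails")
    # A granular failure excludes only the offending aggregate.
    return [a for a in aggregates
            if all(check(a) for check in granular_checks)]

# Hypothetical aggregate records:
aggrs = [{"name": "aggr1", "state": "online"},
         {"name": "aggr2", "state": "offline"}]
eligible = run_arl_checks(
    common_checks=[lambda: True],                        # e.g. same cluster
    granular_checks=[lambda a: a["state"] == "online"],  # e.g. aggregate online
    aggregates=aggrs)
print([a["name"] for a in eligible])   # only aggr1 is eligible
```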
4.5 Best Practices
Table 7) Recommended values for cifs-ndo-duration with small block size.

Percentage of Small-Block (4KB) I/O | Recommended cifs-ndo-duration Value
0-50%                               | No change required
50-75%                              | medium
75-100%                             | low
For example, with a mixed workload of 5% 16KB I/O, 50% 8KB I/O, and 45% 4KB I/O, no change would
need to be made. For a mixed workload of 55% 4KB I/O and 45% 8KB I/O, the option would be set to
medium. The option can be set using the storage failover modify command at the advanced privilege
level. The CIFS license must be enabled.
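The recommendation in Table 7 amounts to a simple threshold rule, sketched below. This is an illustrative helper only; the function name, the "default" label for the 0-50% case, and the exact boundary handling are assumptions based on the worked example above:

```python
# Illustrative threshold rule for Table 7: pick the cifs-ndo-duration
# setting from the percentage of small-block (4KB) I/O in the workload.
# "default" here stands for "no change required"; boundary handling is
# an assumption, not documented behavior.

def cifs_ndo_duration(small_block_pct: float) -> str:
    """Return the recommended setting for a given 4KB I/O percentage."""
    if small_block_pct <= 50:
        return "default"   # 0-50%: no change needed
    if small_block_pct <= 75:
        return "medium"    # 50-75%
    return "low"           # 75-100%

# The paper's examples: 45% 4KB I/O -> no change; 55% 4KB I/O -> medium.
print(cifs_ndo_duration(45), cifs_ndo_duration(55), cifs_ndo_duration(80))
```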
Aggregate State
Aggregate relocate succeeds only for aggregates that are in an online state. Prior to initiating a
controller hardware upgrade, verify that all aggregates are online and in a healthy state. Any aggregate in
a degraded or offline state will not be relocated.
For more information on DataMotion for Volumes, refer to TR-4075 (see details in the References section
of this document).
Configuring Timeouts
There is a short period during the relocation of an aggregate when I/O requests are retried while the
aggregate is brought online on the partner node. The client I/O retry behavior is based on the retry
method instantiated on the client. Configure client or host timeout windows to exceed the amount of time
the relocation may take. NetApp recommends that these retry windows be set to 60 seconds at a
minimum. NetApp also recommends increasing the retry window to 120 seconds for protocols that
support it.
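A client-side retry window of this kind can be sketched as follows. This is illustrative Python; the function and parameter names are hypothetical and do not correspond to any particular client stack:

```python
# Conceptual client-side retry window: keep retrying I/O while the
# aggregate comes online on the partner node, giving up only after the
# window (at least 60s, preferably 120s) is exhausted.

import time

def retry_io(operation, window_seconds=120, interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Retry `operation` until it succeeds or the window expires."""
    deadline = clock() + window_seconds
    while True:
        try:
            return operation()
        except IOError:
            if clock() >= deadline:
                raise           # window exhausted: surface the failure
            sleep(interval)     # aggregate may still be coming online
```

The `clock` and `sleep` parameters exist so the behavior can be exercised with a simulated clock instead of real wall time.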
ARL cannot be initiated while certain operations are in progress on the source node, for example:
- The source node is executing a takeover of a partner's disks or is being taken over.
- The source node is in the process of reverting the clustered Data ONTAP version.
When any of the above conditions is true, the ARL job must be initiated after those activities have
completed. For the destination node, the following operations are handled differently.
- While a destination node is executing a giveback, an ARL will proceed; however, NetApp does not
recommend this as a best practice.
References
The following references were used in this technical report.
Using ARL to upgrade controller hardware on a pair of nodes running clustered Data ONTAP 8.2
https://fieldportal.netapp.com/Core/DownloadDoc.aspx?documentID=103946&contentID=158025
Version History

Version      | Date           | Comments
Version 1.0  | May 2013       | Initial release
Version 1.1  | June 2013      | Minor updates
Version 1.2  | December 2013  | Minor updates
Refer to the Interoperability Matrix Tool (IMT) on the NetApp Support site to validate that the exact product
and feature versions described in this document are supported for your specific environment. The NetApp
IMT defines the product components and versions that can be used to construct configurations that are
supported by NetApp. Specific results depend on each customer's installation in accordance with published
specifications.
NetApp provides no representations or warranties regarding the accuracy, reliability, or serviceability of any
information or recommendations provided in this publication, or with respect to any results that may be
obtained by the use of the information or observance of any recommendations provided herein. The
information in this document is distributed AS IS, and the use of this information or the implementation of
any recommendations or techniques herein is a customers responsibility and depends on the customers
ability to evaluate and integrate them into the customers operational environment. This document and
the information contained herein may be used solely in connection with the NetApp products discussed
in this document.
2014 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp,
Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, DataMotion, Data ONTAP, and
FlexVol are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. All other brands or
products are trademarks or registered trademarks of their respective holders and should be treated as such. TR-XXX-MMYR