Isilon OneFS Storage Utilization Best Practices Guide
An Isilon Systems Technical Whitepaper
April 2008
ISILON SYSTEMS
Table of Contents
1. Introduction
2. OneFS Storage Utilization
3. OneFS File Layout
4. OneFS Protection Overhead
5. Small Files Best Practices
6. Online Capacity Analyzer
7. Summary
Appendix A: File Size Reporting Tools
Appendix B: Sample Storage Utilization Graphs
1. Introduction
Isilon has become the clear leader in clustered storage by demonstrating superiority in storing and serving data sets that primarily contain large files of unstructured data. While this data is dominant in our core verticals, such as media & entertainment and Web 2.0, Isilon has also been very successful in providing storage for a wide variety of data sets with a mix of small, medium, and large files. In comparison to traditional RAID-based file systems, OneFS can be misconstrued as inefficient at storing data sets with small files. The purpose of this paper is to correct this false impression by introducing key concepts and demonstrating how to properly assess the overhead of the OneFS data protection architecture. By further educating the market on OneFS, we hope to remove any ambiguity associated with storage protection overhead and to examine how the benefits of the unique OneFS architecture extend to a wide variety of data sets. As part of these guidelines, a new online tool is introduced to help storage professionals calculate and analyze the overhead of varying data sets and to eliminate the guesswork of determining where OneFS is applicable.
Considering these limitations, a typical RAID 5 storage system may incur between 40% and 50% overhead, and even more for RAID 6 (double parity protection). Traditional storage architectures consist of three separate, non-integrated software layers: the file system, volume management, and RAID control. In contrast, OneFS integrates all three layers into the OS. This seamless integration eliminates the management pain and scalability limitations mentioned above. By combining these three layers, OneFS provides the following storage benefits:
- Granular, flexible file protection. OneFS directly controls the layout of each file across the cluster, allowing administrators to set protection and performance settings on a file-by-file basis. These settings can be changed at any time, and the protection overhead changes accordingly.
- One single pool of storage and one file system. Administrators do not need to manage complex RAID group settings or many volumes and file systems. When new nodes are added, OneFS automatically rebalances existing data by striping or mirroring it onto the new set of nodes.
- Support for protection levels up to N+4 or 8x. By striping or mirroring data across multiple nodes, OneFS can sustain multiple simultaneous node or disk failures without losing data. Because OneFS stripes user data across different nodes, data protection extends from individual drive failures to complete node failures.
- No hot spare disks required. All disks across the nodes in the cluster are available for rebuilding protection data anywhere in the cluster.
- No snapshot reservation required. Snapshots use the same pool of storage as the live file system.
- Industry-leading scalability. OneFS presents all available storage as one single pool of storage.
- Up to 90% storage utilization. With proper cluster configuration, certain user data sets may require only 10% of physical storage for protection.
Because OneFS protects data at the file level, protection overhead is calculated on a per-file basis, as opposed to the flat overhead charge associated with RAID volumes. Files of different sizes and protection settings incur different protection overhead. Consequently, when certain individual files are examined in isolation, their overhead may seem excessive, leading to an incorrect assessment of overall storage utilization.
Once the write transaction has been committed, a response is returned to the client. Other elements of writing data on OneFS, such as asynchronous writes, caching, journaling, and locking, are outside the scope of this discussion.
Figure: OneFS file layout. Data is evenly distributed across nodes (node A, node B, ... node J) as it is written. The OneFS block size is 8KB, and 16 blocks are written contiguously to each node (blocks 0 to 15 on node A, blocks 16 to 31 on node B, and so on until the next stripe begins), giving a stripe unit size of 8KB x 16 blocks = 128KB.
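The layout in the figure can be illustrated by mapping a logical block number to its stripe unit. The sketch below is a simplified model, not OneFS code: it assumes stripe units are handed to the data slots of each stripe in order, and it ignores parity units and the real node-selection policy. The function name `block_location` and the "data slot" notion are hypothetical.

```python
# Simplified model of the file layout figure: 8KB blocks, 16 blocks per
# stripe unit, stripe units spread across the data nodes of each stripe.
BLOCK_SIZE = 8 * 1024                            # OneFS block size: 8KB
BLOCKS_PER_UNIT = 16                             # blocks written contiguously per node
STRIPE_UNIT_SIZE = BLOCK_SIZE * BLOCKS_PER_UNIT  # 128KB


def block_location(block_index: int, data_nodes: int) -> tuple[int, int, int]:
    """Map a logical block number to (stripe, data_slot, offset_in_unit).

    Assumption: units fill data slots 0..data_nodes-1 in order, then the
    next stripe begins. Parity placement is not modeled.
    """
    unit, offset = divmod(block_index, BLOCKS_PER_UNIT)
    stripe, slot = divmod(unit, data_nodes)
    return stripe, slot, offset


if __name__ == "__main__":
    # Four data slots, e.g. the data part of a 4+1 stripe on a 5-node cluster.
    for b in (0, 15, 16, 31, 64):
        print(b, block_location(b, 4))
```

With four data slots, blocks 0 to 15 share the first stripe unit, block 16 starts the second unit, and block 64 is the first block of the second stripe.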
File protection is defined in a policy using one of two protection types:
- Forward Error Correction (FEC). This protection type is presented as N+M, where N is the maximum number of nodes in a stripe containing user data and M is the number of nodes with parity data. M, which ranges from 1 to 4, represents how many nodes may fail while user data remains fully available. The combination N+M is also known as the stripe width. The stripe width never exceeds the number of nodes in the cluster; otherwise, one or more nodes would hold more than one sequence of the same stripe, and those sequences would not be protected if that node became unavailable. For example, a five-node cluster with an N+1 protection policy writes files in stripes with N=4 (4 data sequences) and M=1 (1 protection sequence). At N+1, N is capped at 9 for a maximum stripe width of 10; at N+4, N is capped at 16 for a maximum stripe width of 20. See Table 1 below for the full range of stripe widths.
- Mirroring. This protection type is presented as Mx, where M is between 2 and 8. With this protection type, OneFS protects files by creating between 2 and 8 copies of the data in a stripe. Each additional mirrored copy adds 100% overhead relative to the user data. All mirrored copies are identical and reside on different nodes. The number of mirrored copies cannot exceed the number of nodes in the cluster.
OneFS stores each file's protection policy in the file's metadata. When new files are created, they inherit the protection setting from their parent directory.
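The stripe-width rules above, combined with the per-level caps and mirroring fallback shown in Table 2, can be sketched as a small function. This is a model inferred from the table, not a documented OneFS algorithm: the rule that FEC is used only when N >= M, and the fallback to M+1 mirrored copies capped at the node count, are assumptions that happen to reproduce every cell of Table 2. All names are hypothetical.

```python
# Widest N (data units per stripe) for each parity level M, as in Table 2.
MAX_DATA_WIDTH = {1: 9, 2: 12, 3: 15, 4: 16}


def actual_layout(nodes: int, m: int) -> tuple:
    """Return ("fec", N, M) or ("mirror", copies) for a +M policy.

    Assumed rule: N = min(nodes - M, cap); use N+M only when N >= M,
    otherwise mirror with M + 1 copies, limited to the number of nodes.
    """
    n = min(nodes - m, MAX_DATA_WIDTH[m])
    if n >= m:
        return ("fec", n, m)
    return ("mirror", min(m + 1, nodes))


def baseline_overhead(layout: tuple) -> float:
    """Protection data relative to user data for a full, block-aligned stripe."""
    if layout[0] == "fec":
        _, n, m = layout
        return m / n
    return layout[1] - 1  # each extra mirrored copy adds 100%


def failures_tolerated(layout: tuple) -> int:
    """Node failures the layout survives: M for FEC, copies - 1 for mirrors."""
    return layout[2] if layout[0] == "fec" else layout[1] - 1


if __name__ == "__main__":
    for nodes, m in [(5, 1), (18, 3), (3, 2), (4, 4)]:
        layout = actual_layout(nodes, m)
        verdict = "meets policy" if failures_tolerated(layout) >= m else "lower"
        print(nodes, f"+{m}", layout, f"{baseline_overhead(layout):.0%}", verdict)
```

Comparing `failures_tolerated` with M separates layouts that preserve the desired protection (a 3x mirror for +2 on 3 nodes) from those that fall short (a 4x mirror for +4 on 4 nodes).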
The BPO in Table 1 is only an optimal baseline to compare against. It matches actual protection overhead only for files that consist of an exact number of 8KB blocks and that are large enough to occupy the maximum stripe width for the given FEC protection level. For example, at +3, to achieve the optimal BPO, the cluster must have at least 18 nodes and the file must be at least 15 x 128KB = 1.875MB in size, consuming 2.25MB of disk storage (20% protection overhead). As in the mirrored protection cases, small files and files that do not hold an exact number of 8KB blocks increase the overall protection overhead. For example, a 10KB file set at +1 on a 5-node cluster actually takes 32KB. While OneFS reports the protection policy as +1, the actual protection overhead is 220%, compared with the 25% baseline protection overhead for this configuration.

Actual Protection vs. Protection Policy
There are also situations where a file cannot be protected according to its protection policy because OneFS cannot use the ideal protection layout. In these cases, OneFS attempts to provide the same level of file protection using the lowest mirroring protection scheme. For example, on a 3-node cluster a file with a +2 policy is protected using a 3x mirror, because two distinct protection sequences in a stripe cannot be generated on a 3-node cluster. The following table maps each policy to the actual protection level. Green indicates that the desired protection policy is met, yellow indicates a changed protection scheme that still preserves the desired protection level, and red indicates lower protection than the desired protection policy.

Cluster size   +1     +2     +3     +4
30             9+1    12+2   15+3   16+4
20             9+1    12+2   15+3   16+4
10             9+1    8+2    7+3    6+4
9              8+1    7+2    6+3    5+4
8              7+1    6+2    5+3    4+4
7              6+1    5+2    4+3    5x
6              5+1    4+2    3+3    5x
5              4+1    3+2    4x     5x
4              3+1    2+2    4x     4x
3              2+1    3x     3x     3x
Table 2: File protection for various cluster configurations and protection policies

Mixed Data Sets
As indicated, smaller files incur additional protection overhead. This extra overhead is noticeable in data sets that contain only very small files, but most data sets include files of a variety of sizes. The total cluster protection overhead depends mainly on the ratio between small and large files. In most cases, the presence of a few large files nullifies the extra protection overhead of many small files. For example, on a 5-node cluster with a +1 protection setting, if there are one hundred 10KB files and one 10MB file is added, the total protection overhead drops dramatically from 150% to 36% (compared with a baseline of 25%).
It is important to understand that it is not the average file size in the cluster that matters for calculating total protection overhead, but the ratio between small and large files. In a data set with many small files and very few large files, the average file size may be very close to the size of the small files, but the total protection overhead may be much closer to that of the large files. Using the average file size therefore produces incorrect storage utilization assessments. In the example above, the average file size is about 111KB. Calculating the cluster protection overhead as if all files were 111KB in size would produce 105% protection overhead, compared with the actual overhead of 36%.
The following graph shows the total data set protection overhead in the rightmost column and the overhead of each file-size class in the two columns to its left. The graph shows that the additional overhead in bytes produced by the 100 small files is significantly smaller than the overhead of the one large file. That is why the total additional overhead in the rightmost column remains relatively small.
Figure: Protection overhead in bytes by file-size class, with the data set total in the rightmost column (y-axis: bytes, 0 to 20,000,000; x-axis: file size).
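The mixed-data-set arithmetic can be reproduced by weighting each file-size class by the bytes it holds, which is why the ratio of small to large files matters more than the average size. The sketch below takes the per-class overhead figures quoted in the example (150% for the 10KB files, the 25% baseline for the 10MB file) as given inputs rather than deriving them from the layout; the function name and tuple format are illustrative assumptions.

```python
KB = 1024
MB = 1024 * KB


def total_overhead(classes):
    """Aggregate protection overhead of a data set.

    classes: iterable of (file_size_bytes, file_count, per_file_overhead),
    where per_file_overhead = extra physical space / logical size.
    """
    logical = sum(size * count for size, count, _ in classes)
    physical = sum(size * count * (1 + ovh) for size, count, ovh in classes)
    return (physical - logical) / logical


# Per-class figures quoted in the text for a 5-node cluster at +1.
small_files = (10 * KB, 100, 1.50)  # one hundred 10KB files at 150%
large_file = (10 * MB, 1, 0.25)     # one 10MB file at the 25% baseline

only_small = total_overhead([small_files])
mixed = total_overhead([small_files, large_file])
average_size_kb = (100 * 10 * KB + 10 * MB) / 101 / KB

if __name__ == "__main__":
    print(f"small files only:   {only_small:.0%}")
    print(f"with one 10MB file: {mixed:.0%}")
    print(f"average file size:  {average_size_kb:.0f}KB")
```

The single 10MB file holds roughly ten times the bytes of all the small files combined, so its 25% overhead dominates the weighted total even though it is one file out of 101.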
- Not all small files are created equal. Group small files into categories such as 1K, 10K, 100K, and 500K, and then match these groups with large files to offset storage overhead. The actual size of large files with ideal storage utilization depends on cluster size and stripe width; it is safe to use files over 2MB to offset small files. The online analyzer tool described below can provide an accurate and granular analysis of storage utilization.
- The Isilon IQ single pool of storage makes it ideal for consolidating storage islands. Storage consolidation has the extra benefit of creating data sets with mixed file sizes, which further reduces total cluster protection overhead.
- Assuming some large files are present, always consider adding nodes to reduce storage overhead. Maximizing the number of nodes in a cluster, per desired protection level, minimizes protection overhead because more nodes participate in each file stripe. This technique directly minimizes the protection overhead of individual large files (2MB and up) and, as described above, leads to significantly lower total cluster protection overhead.
- Use the desired vs. actual protection policy in Table 2 to ensure that the cluster size can support the desired protection policies. Try to avoid policy settings and cluster size combinations that produce suboptimal protection levels, as indicated in the yellow cells, or degraded protection levels, as indicated in the red cells.
- Isilon is a certified data storage partner for VMware ESX and Virtual Infrastructure. When an Isilon cluster is used as an NFS data store, VM virtual disks are stored on the cluster as .vmdk files. Regardless of the size and number of files a VM guest OS manages, the entire data set is stored as a single large file on the Isilon cluster. As a result, space utilization can be dramatically improved by data set virtualization.
- Investigate the usage and access patterns of small files. If small files are rarely accessed after being created or after reaching a certain age, explore ways to archive those files. Isilon partners with various archive vendors, such as Symantec.

An online tool is available for Isilon staff to calculate total data set protection overhead. The next section describes this tool.
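A rough check of the 2MB guideline in the first recommendation: a file reaches the baseline overhead only once it fills at least one full stripe of N x 128KB data units (the +3 example earlier gives 15 x 128KB = 1.875MB). Taking the widest N per protection level from Table 2, the largest such threshold is 16 x 128KB = 2MB. This sketch assumes block-aligned files and ignores partial trailing stripes, which still add some overhead to larger files; the names are hypothetical.

```python
STRIPE_UNIT_KB = 128                          # 16 blocks x 8KB
MAX_DATA_WIDTH = {1: 9, 2: 12, 3: 15, 4: 16}  # widest N per +M level (Table 2)


def full_stripe_kb(n: int) -> int:
    """Smallest file size (KB) that fills every data unit of an N-wide stripe."""
    return n * STRIPE_UNIT_KB


if __name__ == "__main__":
    for m, n in MAX_DATA_WIDTH.items():
        print(f"+{m} ({n}+{m}): {full_stripe_kb(n) / 1024:.3f}MB")
    largest = max(full_stripe_kb(n) for n in MAX_DATA_WIDTH.values())
    print(f"largest threshold: {largest / 1024:.3f}MB")
```

Every protection level's full-stripe threshold is at or below 2MB, which is consistent with treating files over 2MB as safe for offsetting small files.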
Figure 3: Online storage usage analyzer
The Storage Usage Analyzer accepts cluster and file-level settings as user input and generates storage utilization output.
Cluster Settings
The user selects the cluster size and the protection level. The user can change these global settings at any point, and the data below is recalculated.

Data Set Input and Utilization Metrics
The user adds multiple entries of file size and file count combinations representing a data set. For each entry, the calculator generates the following storage utilization metrics:
- Logical Space: the total space representing the file data.
- Physical Space: the actual space used to store the data on disk.
- Ideal Physical Space: the space the cluster would use if there were no small-file overhead.
- Ideal Overhead: the protection overhead in an ideal physical space allocation.
- Additional Overhead: the extra space used because of small-file space allocation.

Data Set Analysis
- Actual vs. Ideal % of Storage Used for Protection: protection overhead divided by the total storage allocated for the data set. Comparing the actual and ideal ratios shows how much storage is needed for protection and the extra cost of protecting smaller files.
- Actual vs. Ideal Parity Tax: parity protection data divided by user data. Comparing the actual and ideal ratios shows the protection overhead per data set and the extra cost of small files.
- Small File Penalty: the additional storage used by small files divided by the ideal physical space. It shows the cost associated with small files compared with the ideal cost.
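The analyzer metrics can be sketched in a few lines once a per-file physical size is available. This is a hypothetical reconstruction, not the analyzer's actual code: it assumes a simplified N+M layout in which each file is rounded up to whole 8KB blocks, grouped into 128KB stripe units, and each stripe carries M parity units as large as its largest data unit; metadata is ignored, and all non-user bytes are counted as protection overhead. Under these assumptions a 10KB file on a 4+1 layout occupies 32KB, matching the example in the protection overhead discussion.

```python
import math

BLOCK = 8 * 1024   # 8KB OneFS block
UNIT_BLOCKS = 16   # 16 blocks = 128KB stripe unit


def physical_bytes(size: int, n: int, m: int) -> int:
    """Estimated on-disk bytes for one file of `size` bytes laid out as N+M.

    Simplified model: whole 8KB blocks, 128KB stripe units, and M parity
    units per stripe, each as large as the stripe's largest data unit.
    """
    blocks = math.ceil(size / BLOCK)
    units = [UNIT_BLOCKS] * (blocks // UNIT_BLOCKS)
    if blocks % UNIT_BLOCKS:
        units.append(blocks % UNIT_BLOCKS)  # partial trailing stripe unit
    parity = sum(m * max(units[i:i + n]) for i in range(0, len(units), n))
    return (blocks + parity) * BLOCK


def analyze(entries, n: int, m: int) -> dict:
    """Analyzer-style metrics for entries of (file_size_bytes, file_count)."""
    logical = sum(size * count for size, count in entries)
    physical = sum(physical_bytes(size, n, m) * count for size, count in entries)
    ideal_physical = logical * (n + m) / n
    ideal_overhead = ideal_physical - logical
    additional = physical - ideal_physical
    return {
        "logical": logical,
        "physical": physical,
        "ideal_physical": ideal_physical,
        "ideal_overhead": ideal_overhead,
        "additional_overhead": additional,
        # protection overhead / total storage allocated
        "pct_protection_actual": (physical - logical) / physical,
        "pct_protection_ideal": ideal_overhead / ideal_physical,
        # protection overhead / user data
        "parity_tax_actual": (physical - logical) / logical,
        "parity_tax_ideal": ideal_overhead / logical,
        "small_file_penalty": additional / ideal_physical,
    }


if __name__ == "__main__":
    # 5-node cluster at +1, i.e. 4+1 stripes: one 10KB file vs. one 10MB file.
    for entries in ([(10 * 1024, 1)], [(10 * 1024 * 1024, 1)]):
        r = analyze(entries, n=4, m=1)
        print(entries, f"parity tax {r['parity_tax_actual']:.0%}",
              f"(ideal {r['parity_tax_ideal']:.0%})",
              f"penalty {r['small_file_penalty']:.0%}")
```

For the 10KB file the model reports a 220% parity tax against a 25% ideal, while the 10MB file fills whole 4+1 stripes and shows no small-file penalty.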
7. Summary
OneFS storage utilization is file based, not volume based. OneFS storage utilization is competitive with other RAID-based storage systems while adding the benefits of unmatched reliability and scalability. This unique aspect of OneFS must be taken into account when assessing the storage utilization of various data sets. Using the online analyzer, the storage utilization of varying data sets and cluster configurations can be assessed.
About Isilon Systems
Isilon Systems is the worldwide leader in clustered storage systems and software for digital content and unstructured data, enabling enterprises to transform data into information - and information into breakthroughs. Isilon's award-winning family of IQ clustered storage systems combines Isilon's OneFS operating system software with the latest advances in industry-standard hardware to deliver modular, pay-as-you-grow, enterprise-class storage systems. Isilon's clustered storage solutions speed access to critical business information while dramatically reducing the cost and complexity of storing it. Information about Isilon can be found at http://www.isilon.com.
Command
On Cluster: du, du -h, du -l, du -lh, df, ls, stat, isi quota
NFS Client: du, df, ls, stat
Windows Client: File Properties, Share Properties
Base
Includes sparse
Includes Protection
no no no no no yes yes no
no no yes yes
yes yes+ no no
1024 1024
yes no
no yes+
Notes:
* All Base-10 numbers on the cluster are expected to change to Base-2 in OneFS 5.0.
+ It is possible to set up a logical quota, with a hard threshold and the container flag, that would generate a value that excludes protection overhead.
Unlike large-file utilization, small-file storage utilization is not affected by cluster size. Below are sample graphs of two different clusters that show the storage utilization of different file sizes. Cluster size affects the maximum storage utilization of large files, while the protection level affects the storage utilization of both small and large files. See the examples below.
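As a rough cross-check of the maximum utilization values quoted for the sample graphs, the ceiling for large, block-aligned files is N/(N+M), the fraction of each stripe that holds user data. The sketch assumes the layouts listed in Table 2 for each cluster (2+1, 5+1, and 4+2) and ignores metadata; small files never reach this ceiling. The function name is illustrative.

```python
from fractions import Fraction


def max_utilization(n: int, m: int) -> Fraction:
    """Upper bound on space utilization for large files laid out as N+M."""
    return Fraction(n, n + m)


# Layouts for the three sample configurations, taken from Table 2.
SAMPLES = {
    "3-node cluster, +1 (2+1)": (2, 1),
    "6-node cluster, +1 (5+1)": (5, 1),
    "6-node cluster, +2 (4+2)": (4, 2),
}

if __name__ == "__main__":
    for label, (n, m) in SAMPLES.items():
        print(f"{label}: {float(max_utilization(n, m)):.1%}")
```

The ceilings come out at about 66.7%, 83.3%, and 66.7%, in line with the 66%, 83%, and 66% maxima quoted in the captions.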
On this 3-node cluster with +1 protection, maximum space utilization of 66% is achieved with 4MB files. A 128KB file has about 50% space utilization, and a 2KB file is at 10% utilization.
Figure: Utilization as a Function of File Size (3-node cluster, +1 protection; y-axis: utilization, 0% to 100%; x-axis: file size, 512B to 2TB).
On this 6-node cluster with +1 protection, maximum space utilization of 83% is achieved at about 16MB per file. A 128KB file has about 50% space utilization, and a 2KB file is at 10% utilization.
Figure: Utilization as a Function of File Size (6-node cluster, +1 protection; y-axis: utilization, 0% to 100%; x-axis: file size, 512B to 1TB).
On this 6-node cluster with +2 protection, maximum space utilization of 66% is achieved at about 2MB per file. A 128KB file has about 50% space utilization, and a 2KB file is at 10% utilization.
Figure: Utilization as a Function of File Size (6-node cluster, +2 protection; y-axis: utilization; x-axis: file size, 512B to 4TB).
Cluster size affects the maximum storage utilization of large files, while the storage overhead for small files is mainly affected by the file protection level.