IP Storage Networking:
IBM NAS and iSCSI Solutions
Application scenarios
Rowell Hernandez
Keith Carmichael
Cher Kion Chai
Geoff Cole
ibm.com/redbooks
International Technical Support Organization
IP Storage Networking:
IBM NAS and iSCSI Solutions
Second Edition
February 2002
SG24-6240-01
Take Note! Before using this information and the product it supports, be sure to read the
general information in “Special notices” on page 285.
This edition applies to the IBM TotalStorage Network Attached Storage 200, 300, and 300G with
microcode Release 2.0, the IBM TotalStorage IPStorage 200i with microcode Release 1.2, the Cisco
SN 5420 Storage Router, and initiator clients running on Red Hat Linux 7.1, Windows 2000, and
Windows NT.
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the
information in any way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 2001, 2002. All rights reserved.
Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to
restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Summary of changes
This section describes the technical changes made in this edition of the book and
in previous editions. This edition may also include minor corrections and editorial
changes that are not identified.
New information
Added information on the IBM TotalStorage Network Attached Storage 200
Added information on the IBM TotalStorage Network Attached Storage 300
Added information on the Cisco SN 5420 Storage Router
Changed information
Removed all references to IBM eServer xSeries 150
Updated to include information on IPStorage 200i new models and microcode
v1.2
Updated to include information on NAS new models and preloaded software
v2.0
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
The team that wrote this redbook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xii
Special notice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
IBM trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Comments welcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
3.2 IBM TotalStorage Network Attached Storage 300 . . . . . . . . . . . . . . . 135
3.2.1 IBM NAS 300 hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
3.2.2 IBM NAS 300 technical specifications. . . . . . . . . . . . . . . . . . . . . . . 140
3.2.3 IBM NAS 300 features and benefits . . . . . . . . . . . . . . . . . . . . . . . . 140
3.2.4 IBM NAS 300 optional features . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.2.5 IBM NAS 300 preloaded software . . . . . . . . . . . . . . . . . . . . . . . . . . 141
3.3 IBM NAS 200 and 300 comparison . . . . . . . . . . . . . . . . . . . . . . . . . . 144
3.4 IBM TotalStorage Network Attached Storage 300G . . . . . . . . . . . . . 145
3.4.1 IBM NAS 300G hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
3.4.2 IBM NAS 300G technical specifications . . . . . . . . . . . . . . . . . . . . . 151
3.4.3 IBM NAS 300G features and benefits . . . . . . . . . . . . . . . . . . . . . . . 152
3.4.4 IBM NAS 300G preloaded software . . . . . . . . . . . . . . . . . . . . . . . . 153
3.4.5 IBM NAS 300G connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
3.5 IBM TotalStorage IP Storage 200i Series . . . . . . . . . . . . . . . . . . . . . 158
3.5.1 IBM TotalStorage IP Storage 200i Configurations . . . . . . . . . . . . . 160
3.5.2 IBM TotalStorage IP Storage 200i Technical Specifications . . . . . . 161
3.5.3 IBM TotalStorage IP Storage 200i Microcode . . . . . . . . . . . . . . . . . 162
3.5.4 IBM TotalStorage IP Storage 200i features and profiles . . . . . . . . . 162
3.5.5 IBM IP Storage high availability and serviceability . . . . . . . . . . . . . 163
3.5.6 IBM IP Storage expandability and growth . . . . . . . . . . . . . . . . . . . . 164
3.5.7 IBM IP Storage 200i 4125-EXP Expansion Unit . . . . . . . . . . . . . . . 164
3.5.8 IBM IP Storage 200i Optional Features . . . . . . . . . . . . . . . . . . . . . 165
3.6 The Cisco SN 5420 Storage Router . . . . . . . . . . . . . . . . . . . . . . . . . 166
3.6.1 Cisco SN 5420 hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3.6.2 Cisco SN 5420 technical specifications . . . . . . . . . . . . . . . . . . . . . 169
3.6.3 Cisco SN5420 clustering and high availability . . . . . . . . . . . . . . . . 170
3.6.4 Cisco SN5420 SCSI Routing Services . . . . . . . . . . . . . . . . . . . . . . 170
3.6.5 Cisco SN5420 features and benefits. . . . . . . . . . . . . . . . . . . . . . . . 171
Chapter 6. Application examples for IBM NAS and iSCSI solutions . . . 221
6.1 NAS Storage consolidation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
6.2 NAS LAN file server consolidation . . . . . . . . . . . . . . . . . . . . . . . . . . 224
6.3 SANergy high speed file sharing . . . . . . . . . . . . . . . . . . . . . . . . . . . 226
6.4 SANergy with Tivoli Storage Manager (TSM) . . . . . . . . . . . . . . . . . . 227
6.4.1 Using TSM with SANergy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
6.4.2 TSM backup/restore using SANergy: Scenario 1 . . . . . . . . . . . . . . 228
6.4.3 TSM backup/restore using SANergy: Scenario 2 . . . . . . . . . . . . . . 228
6.5 NAS Web hosting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
6.6 IP Storage 200i solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
6.6.1 Database solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232
6.6.2 Transaction-oriented applications . . . . . . . . . . . . . . . . . . . . . . . . . . 233
6.7 Positioning storage networking solutions . . . . . . . . . . . . . . . . . . . . . 234
6.8 Typical applications for NAS and for iSCSI? . . . . . . . . . . . . . . . . . . . 235
7.10 The bottom line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
Glossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
We hope you will read this redbook from cover to cover, but in case you are in a
hurry, here is a guide to its organization:
For beginners without any knowledge about storage, we suggest you first read
Chapter 1, “Introduction to storage networking” on page 1. This chapter will guide
you through the different storage technologies, pros and cons, description,
terminologies, and so on—just the basics.
For more details, we suggest that you read Chapter 2, “IP storage networking
technical details” on page 63. This chapter discusses the different protocols
involved in storage networking and tells you what goes on under-the-covers.
In Chapter 3, “IBM NAS and iSCSI storage products” on page 121, we write
about the new IBM NAS and iSCSI products. You will get a comprehensive
overview of the different IBM TotalStorage Network Attached Storage and iSCSI
products.
And finally, what other developments are going on with regard to storage
networking? In Chapter 7, “Other storage networking technologies” on page 237,
we describe some of the key developments which are under way within the
industry, including work which is in progress to develop new industry standards in
important areas.
For those who are primarily interested in iSCSI topics, the following sections
cover various aspects of this new technology and the IBM iSCSI products:
1.9, “A new direction: SCSI over IP networks” on page 48
2.4, “iSCSI basics” on page 79
2.10, “Tracing the I/O path for Internet SCSI (iSCSI)” on page 106
3.5, “IBM TotalStorage IP Storage 200i Series” on page 158
Geoff Cole is a Senior Advisor and Sales Support Manager in the IBM Storage
Networking Solutions Advisory Group. He provides sales support for the IBM
Storage Systems Group in Europe, Middle East, and Africa (EMEA). Geoff is
based in London. He has been with IBM for 30 years, and has 17 years'
experience in IBM’s storage business. He has held a number of sales and
marketing roles in the United Kingdom, the United States, and Germany. Geoff
holds a Master of Arts degree in Politics, Philosophy, and Economics from Oxford
University. He is a regular speaker on storage networking-related topics at IBM
customer groups and external conferences in Europe. Geoff can be reached at
coleg@uk.ibm.com.
Thanks to the following people for their valuable contributions to this project:
International Technical Support Organization
Jon Tate, Emma Jacobs, Yvonne Lyon, Deanna Polm, Will Carney, Alison
Chandler
IBM Raleigh
Jay Knott, Eric Dunlap, Robert Owens, Chuck Collins, David Heath, Thomas
Daniels, Jeff Ottman, Joao Molina, Rebecca Witherspoon, Ken Quarles, Sandra
Kipp, Christopher Snell, Megan Kirkpatrick, Holly Tallon, Garry Rawlins
IBM Rochester
Steve Miedema
IBM Chicago
David Sacks
IBM Austria
Wolfgang Singer
Special notice
This publication is intended to help IBMers, business partners and customers to
understand the different storage networking solutions. The information in this
publication is not intended as the specification of any programming interfaces
that are provided by IBM TotalStorage NAS 200, 300, 300G, IPStorage 200i and
Cisco SN 5420. See the PUBLICATIONS section of the IBM Programming
Announcement for IBM TotalStorage NAS 200, 300, 300G, IPStorage 200i and
Cisco SN 5420 for more information about what publications are considered to
be product documentation.
Comments welcome
Your comments are important to us!
Chapter 1. Introduction to storage networking
Many volumes have already been written describing the explosion in data
storage, and the need for storage networks. We do not intend to repeat much of
what you have probably already read. We think that Information Technology (IT)
professionals who are involved in storage acquisition decisions understand very
well that we have reached a time when traditional approaches to data storage no
longer meet the needs of many applications and users. If you are a storage
veteran you may wish to turn straight to section 1.2, “Growth in networked
storage” on page 4.
If your data is doubling every year, then in ten years it will have grown more than
one thousand fold. We all know that if we do nothing, we will drown in data. It will
become impossible to control, and our business effectiveness will suffer. We
have to become more efficient in the way we store and manage data. IDC
estimates that storage managers must increase efficiency more than 60% per
year.
Throughout the 1990s, more than 70% of all disk storage was directly attached to
an individual server. This was primarily due to the rapid growth in the capacity of
hard disk drive technology in individual PCs, as well as client and server
platforms, rising from tens of megabytes to tens of gigabytes. It is now generally
recognized that connectivity of storage devices must enable substantially higher
scalability, flexibility, availability, and manageability than is possible with directly
attached devices.
Links between NAS and SAN, by means of intelligent NAS appliances, were
announced in early 2001 by IBM. These enable LAN-attached clients to access
and share SAN-attached storage systems. Now a third type of network storage
solution is emerging, known as iSCSI. This utilizes features of both SAN and
NAS using SCSI storage protocols on LAN IP network infrastructures. IBM was
first to market with iSCSI solutions with its TotalStorage IP Storage 200i devices,
announced in February 2001.
Since iSCSI is, in effect, SAN over IP, predictions regarding its growth are
included in the SAN projections. One projection is that iSCSI could represent
some 15% of the SAN market within three years. Although industry analysts
anticipated delivery of such solutions after the beginning of 2002, IBM leadership
in storage networking allowed an earlier introduction.
Since the advent of SAN solutions there has been a tendency to view NAS and
SAN as competing technologies within the market. This is partly due to some
confusion on how to apply each technology. After all, both terms include the
words storage and network. The problem to be solved is how to connect lots of
storage to lots of servers. The best technology to use to resolve the problem is a
network. However, the implementations are very different. NAS exploits the
existing intermediate speed messaging network, whereas the SAN solution uses
a specially designed high-speed networked channel technology.
[Figure: NAS and SAN storage as a percentage of disk storage revenue, projected for 1997 through 2003 (source: Gartner ITxpo, 10/2000)]
We also refer to some recent IBM IP network storage solutions where applicable,
and show what benefits they can provide.
[Figure: A consolidated storage pool shared by servers A, B, and C, with free space available for dynamic allocation]
SCSI is a “block-level” protocol, called block I/O, since SCSI I/O commands
define specific block addresses (sectors) on the surface of a particular disk drive.
So with SCSI protocols (block I/O), the physical disk volumes are visible to the
servers that attach to them. Throughout this book we assume the use of SCSI
protocols when we refer to directly attached storage.
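To make the block-level view concrete, the following sketch (our own illustration, not taken from any product documentation) reads one 512-byte sector at a given block number from a disk device on Linux. The device path and block number are hypothetical; the point is that block I/O addresses numbered sectors on the device, with no notion of a file.

import os

DEVICE = "/dev/sdb"       # hypothetical disk device visible to this server
BLOCK_SIZE = 512          # one sector
BLOCK_NUMBER = 2048       # which sector to read

fd = os.open(DEVICE, os.O_RDONLY)
try:
    os.lseek(fd, BLOCK_NUMBER * BLOCK_SIZE, os.SEEK_SET)   # position at the sector
    data = os.read(fd, BLOCK_SIZE)                          # read one block
finally:
    os.close(fd)

print(len(data), "bytes read from block", BLOCK_NUMBER)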
The distance limitations of parallel SCSI have been addressed with the
development of serial SCSI-3 protocols. These allow SCSI commands to be
issued over different types of loop and network media, including Fibre Channel,
SSA, and more recently IP Networks. Instead of being sent as a group of bits in
parallel, on separate strands of wire within a cable, serial SCSI transports carry
the signal as a stream of bits, one after the other, along a single strand of media.
Fibre Channel
Fibre Channel is an open technical standard for networking. It combines many of
the data characteristics of an I/O bus, with the added benefits of the flexible
connectivity and distance characteristics of a network. Fibre Channel uses
serialized data transmission over either copper (for short distances up to 25
meters) or fiber optic media (for distances up to 10 kilometers). IBM devices only
support the use of fiber optic media.
In the case of I/Os to disks using SCSI protocols, the application may use
generalized file system services. These manage the organization of the data
onto the storage device via the device driver software. In the UNIX world, this
file-level I/O is called cooked I/O. However, many databases and certain
specialized I/O processes generate record-oriented I/O direct to the disk via the
device driver. UNIX fans call this raw I/O.
These blocks are moved on the I/O bus to the disk device, where they are
mapped via a block table to the correct sector on the media (in mainframe
parlance, this is called channel I/O). Block I/O is illustrated in Figure 1-5. For
technical details of how block I/Os are generated, refer to 2.7, “Tracing the I/O
path for local storage” on page 98.
[Figure 1-5: An application server on the IP network issuing block I/O to directly attached storage (DAS) using the SCSI protocol]
To achieve data exchange and sharing across networks, LANs require the use of
appropriate interconnection topologies and protocols. A LAN has a single logical
topology (access scheme), and will usually use a common network operating
system and common connecting cable.
A logical topology is the method used for transporting data around the network. It
is comparable to an access method (Media Access Control (MAC) in the OSI
Data Link layer). The access scheme handles the communication of data
packets, and places them in frames for transmission across the network.
Several different types of network access schemes were developed for LANs in
the 1980s. These include the following:
Fiber Distributed Data Interface (FDDI), a token passing scheme based on
concentric rings of fiber optic cable
Token Ring (developed by IBM)
ARCnet (developed by Datapoint)
Ethernet (originally designed by Xerox Corporation), which uses a
collision-detect access method
Today the predominant logical topology for LANs is Ethernet. IDC estimates that
more than 85% of all installed network connections worldwide are Ethernet,
which is so popular because it offers the best combination of price, simplicity,
scalability, and management ease of use. For this reason, we assume the
Ethernet protocol whenever we refer to LANs in this book.
1.5.1 Ethernet
Ethernet is an open industry standard for local area networks. It includes
definitions of protocols for addressing, formatting, and sequencing of data
transmissions across the network. The term Ethernet also describes the physical
media (cables) used for the network.
Ethernet uses a media access protocol, known as Carrier Sense Multiple Access
with Collision Detection (CSMA/CD). The CSMA/CD protocol moves packets on
the network. In effect, every node monitors the network to see if the network is
already transmitting a packet. A node waits until the network is free before
transmitting its packet. Since the nodes are spread in different locations, it is
possible for more than one node to begin transmitting concurrently. This results
in a collision of the packets on the network. If a collision is detected, all nodes
then go into a wait mode. On a random basis, they attempt to re-transmit the
packets until they are successful.
More nodes tend to mean more data packets transferred, and therefore more
collisions. The more collisions there are, the slower the network runs. This
problem is alleviated by the division of Ethernet LANs into multiple smaller
“subnets” or collision zones, by means of routers. Implementation of switched
networks, which create collision-free environments, has overcome the potential
limitations of the CSMA/CD protocol. CSMA/CD is described in more detail in
2.3.3, “The CSMA/CD protocol” on page 73.
There are several different types of Ethernet networks, based on the physical
cable implementations of the network. There are a number of media segments,
or cable types, defined in the Ethernet standards. Each one exhibits different
speed and distance characteristics. They fall into four main categories: thick
coaxial (thicknet), thin coaxial cable (thinnet), unshielded twisted pair (UTP), and
fiber optic cable. These are described in 2.3.6, “Ethernet media systems” on
page 77, for those readers who want more technical details.
Today, most sites use high quality twisted-pair cable, or fiber optic cables. Short
wave fiber optics can use multi-mode 62.5 micron or 50 micron fiber optic cables,
and single mode 9 micron fiber optic cable is used for long wave lasers. These
cables can all carry either 10 Mbps, 100 Mbps or 1 Gigabit signals, thus allowing
easy infrastructure upgrades as required.
Today, the de facto standard for client/server communications in the LAN, and
across the Internet, is TCP/IP. This is because it is an entirely open protocol, not
tied to any vendor. Millions of clients and servers, using TCP/IP protocols, are
interconnected into IP network infrastructures by way of routers and switches.
For this reason, we assume the TCP/IP protocol whenever we refer to LANs in
this book.
The Internet
Today the Internet is known to all since it is so pervasively used to interconnect
autonomous networks around the world. The Internet has acquired its own
administration body to oversee issues and to carry out ongoing research and
development. This board is called the Internet Activities Board (IAB). It has a
number of subsidiary groups, the best known of which is the Internet Engineering
Task Force (IETF), which deals with tactical implementation and engineering
problems of the Internet. For information on the IAB and IETF, see the following
Web sites:
http://www.iab.org/iab/
http://www.ietf.org
TCP: The protocol which manages the OSI Transport level of exchanges is
Transmission Control Protocol (TCP). (Note: the OSI network layers are
described in 2.1, “Open Systems Interconnection (OSI) model” on page 64). TCP
adds a destination port and other information about the outgoing data, and puts it
into what is known as a TCP segment.
IP: The standard peer-to-peer networking protocol used by Ethernet (and the
Internet) to route message exchanges between network nodes is the Internet
Protocol (IP). As a result, these networks are generically known as IP networks.
IP is operating in the OSI Network layer. It takes the TCP segment and adds
specific network routing information. The resulting packet is known as an IP
datagram. The datagram passes to the network driver software, which adds
further heading information. The datagram is now a packet, or frame, ready for
transmission across the network.
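As a rough sketch of this layering (using deliberately simplified, toy header formats rather than the real TCP, IP, and Ethernet layouts), each layer can be thought of as prepending its own header to whatever it receives from the layer above:

import struct

application_data = b"GET /index.html"

# Transport layer: prepend a toy TCP-like header (source port, destination port)
tcp_segment = struct.pack("!HH", 49152, 80) + application_data

# Network layer: prepend a toy IP-like header (source and destination addresses)
ip_datagram = struct.pack("!4s4s",
                          bytes([192, 168, 1, 10]),
                          bytes([192, 168, 1, 20])) + tcp_segment

# Network driver: add a subnet header and trailer to form the frame
frame = b"\x00" * 14 + ip_datagram + b"\x00" * 4   # placeholder MAC header and trailer

print(len(application_data), len(tcp_segment), len(ip_datagram), len(frame))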
TCP/IP also includes a number of other protocols, which are known as the
TCP/IP Suite or stack. This describes a suite of protocols designed to handle
program to program transactions, electronic mail, security, file transfers, remote
logon facilities, and network discovery mechanisms over local and wide area
networks. We describe the TCP/IP protocol stack, and how it interrelates with IP
networks in 2.2, “TCP/IP technical overview” on page 66.
The manner in which files are stored, accessed, and protected differs among
different types of platforms. Therefore, FTP works with some basic properties
which are common to files on most systems to enable users to manipulate files.
An FTP communication begins when the FTP client establishes a session with
the FTP server. The client can then initiate multiple file transfers to or from the
FTP server. An example of FTP file copying is illustrated in Figure 1-6. At
completion of the process, both systems have a copy of file “x”, and both can
work on it independently.
[Figure 1-6: FTP file copy across an IP network; after the transfer, Computer A and Computer B each hold a copy of file x]
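The same exchange can be driven programmatically. The short sketch below uses the FTP client module in the Python standard library to copy file x from an FTP server; the host name, credentials, and file name are placeholders.

from ftplib import FTP

with FTP("ftp.example.com") as ftp:                  # establish a session with the FTP server
    ftp.login("user", "password")                    # authenticate the client
    with open("x", "wb") as local_copy:              # local copy of file "x"
        ftp.retrbinary("RETR x", local_copy.write)   # transfer the file from the server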
File sharing
Another early requirement was to share files. In other words, rather than ship
files between computers, why not allow multiple clients to access a single copy of
a file which is stored on a central server? Network file protocols and network
operating systems (NOS) were developed in the 1980s to enable users to do
this. These include Network File System (NFS), Common Internet File System
(CIFS), and Novell Netware.
NetWare
NetWare is a popular PC-based specialized network operating system (NOS)
rather than a protocol. Developed by Novell, the NetWare operating system is
optimized as a multi-platform network file server. It supports numerous client
platforms by means of its name space service. In addition to supporting CIFS for
Windows systems, UNIX clients can store data on NetWare servers using NFS,
and Apple Macintosh users can do so via the Apple file protocol.
By making storage systems LAN addressable, the storage is freed from its direct
attachment to a specific server, and any-to-any connectivity is facilitated using
the LAN fabric. In principle, any user running any operating system can access
files on the remote storage device. This is done by means of a common network
access protocol—for example, NFS for UNIX servers and CIFS for Windows
servers. In addition, a task such as backup to tape can be performed across the
LAN using software like Tivoli Storage Manager (TSM), enabling sharing of
expensive hardware resources (for example, automated tape libraries) between
multiple servers.
A storage device cannot just attach to a LAN. It needs intelligence to manage the
transfer and the organization of data on the device. The intelligence is provided
by a dedicated server to which the common storage is attached. It is important to
understand this concept. NAS comprises a server, an operating system, and
storage which is shared across the network by many other servers and clients.
So a NAS is a specialized server or appliance, rather than a network
infrastructure, and shared storage is attached to the NAS server.
The NAS system “exports” its file system to clients, which access the NAS
storage resources over the LAN.
The file server has to manage I/O requests accurately, queuing as necessary,
fulfilling the request, and returning the information to the correct client. The NAS
server handles all aspects of security and lock management. If one user has the
file open for updating, no one else can update the file until it is released. The file
server keeps track of connected clients by means of their network IDs,
addresses, and so on.
A file I/O specifies the file. It also indicates an offset into the file. For instance, the
I/O may specify “Go to byte ‘1000’ in the file (as if the file were a set of
contiguous bytes), and read the next 256 bytes beginning at that position.” Unlike
block I/O, there is no awareness of a disk volume or disk sectors in a file I/O
request. Inside the NAS appliance, the operating system keeps track of where
files are located on disk. It is the NAS OS which issues a block I/O request to the
disks to fulfill the client file I/O read and write requests it receives.
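Expressed as ordinary file I/O, the request described above looks like the following sketch (the file name is a placeholder). The client works only with a byte offset within the file; it is the NAS operating system that later maps this to block I/O against its disks.

with open("shared_file.dat", "rb") as f:
    f.seek(1000)          # go to byte 1000 in the file
    data = f.read(256)    # read the next 256 bytes beginning at that position

print(len(data), "bytes read at offset 1000")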
In summary, network access methods like NFS and CIFS can only handle file I/O
requests to the remote file system. This is located in the operating system of the
NAS device. I/O requests are packaged by the initiator into TCP/IP protocols to
move across the IP network. The remote NAS file system converts the request to
block I/O and reads or writes the data to the NAS disk storage. To return data to
the requesting client application, the NAS appliance software repackages the
data in TCP/IP protocols to move it back across the network. This is illustrated in
Figure 1-7 on page 24.
Because of its channel, or bus-like, qualities, hosts and applications see storage
devices attached to the SAN as if they are locally attached storage. With its
network characteristics, it can support multiple protocols and a broad range of
devices, and it can be managed as a network.
Measured effective data rates of Fibre Channel have been demonstrated in the
range of 60 to 80 MBps over the 1 Gbps implementation. This compares to less
than 30 MBps measured over Gigabit Ethernet. The packet size of Fibre Channel
is 2,112 bytes (rather larger than some other network protocols). For instance, a
standard Ethernet frame is at most 1,518 bytes, although normally IP transfers are
much smaller. But for Fibre Channel a sequence of up to 64 K frames can be
defined, allowing transfers of up to 128 MB without incurring additional overhead
due to processor interrupts. Thus, today Fibre Channel is unsurpassed for
efficiency and high performance in moving large amounts of data.
[Figure: A Storage Area Network provides server-to-server, server-to-storage, and storage-to-storage connections]
It is this high degree of flexibility, availability, and scalability, over long distances,
and the broad acceptance of the Fibre Channel standards by vendors throughout
the IT industry, which make the Fibre Channel architecture attractive as the basis
for new enterprise storage infrastructures.
[Figure: Block I/O over a Fibre Channel (FCP) SAN: the application server, attached to both the IP network and the Fibre Channel network, either makes a file I/O request to the file system in the server, which then initiates block I/O to the SAN-attached disk, or initiates raw block I/O to the disk directly]
Current IBM disk storage systems, including the Enterprise Storage Server
(ESS), Modular Storage Server (MSS), FAStT200, FAStT500, and tape
subsystems like the IBM 3590, 3494, and LTO models, are also FC ready. In
addition, IBM offers a broad range of FC hubs, switches, directors and gateways
to build SANs which scale from small workgroups to enterprise-wide solutions.
Furthermore, IBM Global Services supports SAN implementation with
comprehensive design and consultancy services. It is not our intention in this
book to examine these IBM solutions. There are a number of other IBM
Redbooks which address SAN concepts and solutions in considerable detail; we
recommend the following for more information:
Introduction to Storage Area Networks, SG24-5470
Designing an IBM SAN, SG24-5788
Planning and Implementing an IBM SAN, SG24-6116
Using Tivoli Storage Manager in a SAN environment, SG24-6132
Storage Area Networks: Tape Future in Fabrics, SG24-5474
Storage consolidation in SAN environments, SG24-5987
Implementing Fibre Channel Attachment on the ESS, SG24-6113
SAN Survival Guide, SG24-6143
For details about IBM SAN solutions, visit the IBM storage Web site at:
http://www.storage.ibm.com/ibmsan
IBM has introduced a family of data and SAN resource management tools,
namely the IBM StorWatch family of tools, Tivoli Storage Manager, and Tivoli
Storage Network Manager. In addition, IBM has indicated its strategic direction to
develop storage network virtualization solutions, known as the Storage Tank
project, which will allow enterprise-wide, policy-driven, open systems
management of storage. Refer to 2.13, “Data and network management” on
page 111 for more details on:
Tivoli Storage Manager (TSM)
Tivoli Storage Network Manager (TSNM)
Storage virtualization
Tivoli SANergy File Sharing is unique SAN software that allows sharing of access
to application files and data between a variety of heterogeneous servers and
workstations connected to a SAN. In addition, Tivoli SANergy File Sharing
software uses only industry-standard file systems like NFS and CIFS, enabling
multiple computers simultaneous access to shared files through the SAN (shown
in Figure 1-12 on page 39). This allows users to leverage existing technical
resources instead of learning new tools or migrating data to a new file system
infrastructure. This software allows SAN-connected computers to have the
high-bandwidth disk connection of a SAN while keeping the security, maturity,
and inherent file sharing abilities of a LAN.
[Figure: Heterogeneous hosts sharing an NTFS-formatted disk storage subsystem over the SAN with Tivoli SANergy]
In addition to the SAN, Tivoli SANergy also uses a standard LAN for all the
metadata associated with file transfers. Because Tivoli SANergy is NT File
System (NTFS) based, even if the SAN should fail, access to data via the LAN is
still possible. Since each system has direct access to the Tivoli SAN-based
storage, Tivoli SANergy can eliminate the file server as a single point of failure
for mission-critical enterprise applications. Tivoli SANergy can also easily
manage all data backup traffic over the storage network, while the users enjoy
unimpeded LAN access to the existing file servers.
Data “about” data is referred to as metadata. Examples include file names, file
sizes, and access control lists. The Tivoli SANergy File Sharing architecture lets
metadata transactions take place over conventional LAN networking. The actual
content of files moves on the high-speed direct SAN connection, as illustrated in
Figure 1-13 on page 40.
[Figure 1-13: The SANergy Metadata Controller: metadata moves over the LAN (1), while file content moves over the direct SAN connection (2)]
SANergy works with Ethernet, ATM, or anything else that carries networking
protocols. The network protocol can be CIFS (Windows NT), AppleTalk, NFS
(UNIX), or a combination. Similarly, SANergy supports any
available disk-attached storage fabric. This includes Fibre Channel, SSA, SCSI,
and any other disk-level connection. It is also possible for installations to use one
set of physical wiring to carry both the LAN and storage traffic.
When you use SANergy, one computer in the workgroup is designated as the
Meta Data Controller (MDC) for a particular volume. You can have a single
computer as the MDC for all volumes, or MDC function can be spread around so
that multiple computers each control certain volumes. The other computers are
SANergy clients. They use conventional networking to “mount” that volume, and
SANergy on those clients separates the metadata from the raw data
automatically.
Two different types of configurations are available for the NAS 300G: the
single-node G01 and the dual-node G26. The dual-node Model G26 provides
clustering and failover protection for top performance and availability. The G01
and G26 models are illustrated in Figure 1-14.
The NAS 300G accepts a file I/O request (for example, using the NFS or CIFS
protocols) and translates that to a SCSI block I/O request to access the external
attached disk storage. The 300G interconnections are illustrated in Figure 1-15
on page 44 and Figure 1-16 on page 45.
[Figure: Application servers on the IP network accessing SAN-attached storage through NAS 300G appliances]
The 300G also offers additional advantages of SAN scalability and performance
on the IP network:
Increased choice of disk types: By separating the disk subsystem
selection from the NAS appliance selection, the buyer has greater flexibility to
choose the most cost-effective storage to meet business requirements. The
best of breed storage systems can be selected to attach to the SAN, and the
300G appliance can exploit the benefits of their superior performance,
availability, and advanced functions.
[Figure: The IBM Network Attached Storage family: NAS 200, NAS 300, and NAS 300G, with the 300G attached to external disk systems such as the 7133, FAStT200, FAStT500, or ESS (Shark)]
The question arises whether we can use TCP/IP, the networking technology of
Ethernet LANs and the Internet, for storage. This could enable the possibility of
having a single network for everything, and could include storage, data sharing,
Web access, device management using SNMP, e-mail, voice and video
transmission, and all other uses.
IP SANs could leverage the prevailing technology of the Internet to scale from
the limits of a LAN to wide area networks, thus enabling new classes of storage
applications. SCSI over IP could enable general purpose storage applications to
run over TCP/IP. Moreover, an IP SAN would also automatically benefit from new
networking developments on the Internet, such as Quality of Service (QoS) and
security. It is also widely anticipated that the total cost of ownership of IP SANs
would be lower. This is due to larger volumes of existing IP networks and the
wider skilled manpower base familiar with them.
At the IBM research centers at Almaden and Haifa, efforts are under way to
resolve these issues. The goal is to make the promise of IP SANs a reality.
Efforts are concentrated along two different directions: the primary effort is to
bridge the difference in performance between Fibre Channel and IP SANs. In
parallel, there is an effort to define a standard mapping of SCSI over TCP/IP. The
result is Internet SCSI (iSCSI), sometimes called SCSI over IP.
The iSCSI proposal was made to the Internet Engineering Task Force (IETF)
standards body jointly by IBM and Cisco. Details of some of the objectives and
considerations of the IETF standards proposals for iSCSI are described in 2.4,
“iSCSI basics” on page 79. In February 2001 IBM announced the IBM
TotalStorage IP Storage 200i, which became generally available in June 2001.
This was followed in April by the announcement by Cisco of the Cisco SN 5420
Storage Router, a gateway product linking iSCSI clients and servers to Fibre
Channel SAN-attached storage.
IBM has taken a leadership role in the development and implementation of open
standards for iSCSI. As it is a new technology, you can expect additional
developments as iSCSI matures. Since IBM’s iSCSI announcement in February
2001, a large number of other companies in the storage networking industry
have stated their intentions to participate in iSCSI developments, and to bring
products to market in due course. Things already are moving very rapidly. In July
2001 IBM participated in a SNIA-sponsored iSCSI interoperability demonstration.
The IBM TotalStorage IP Storage 200i is a network appliance that uses the new iSCSI technology. The IP Storage
200i appliance solution includes client initiators. These comprise client software
device drivers for Windows NT, Windows 2000, and Linux clients. These device
drivers coexist with existing SCSI devices without disruption. They initiate the
iSCSI I/O request over the IP network to the target IP Storage 200i. IBM plans to
add additional clients in response to customer feedback and market demands.
IBM is committed to support and deliver open industry standard implementations
of iSCSI as the IP storage standards in the industry are agreed upon.
The IBM IP Storage 200i is a low cost, easy to use, native IP-based storage
appliance. It integrates existing SCSI storage protocols directly with the IP
protocol. This allows the storage and the networking to be merged in a seamless
manner. iSCSI-connected disk volumes are visible to IP network-attached
processors, and as such are directly addressable by database and other
performance oriented applications. The native IP-based 200i allows data to be
stored and accessed wherever the network reaches—LAN, MAN or WAN
distances.
The IBM TotalStorage IP Storage 200i comprises the 4125 Model 110 tower system,
and the 4125 Model 210 rack-mounted system. These are high-performance
storage products that deliver the advantages of pooled storage, which FC SANs
provide. At the same time, they take advantage of the familiar and less complex
IP network fabric.
The IBM TotalStorage IP Storage 200i products are “appliance-like.” All required
microcode comes pre-loaded, minimizing time required to set up, configure, and
make operational the IP Storage 200i. There are only two types of connections to
make: connecting the power cord(s) and attaching the Ethernet connection(s) to
the network.
Microcode for the 200i is Linux-based. Since the microcode is pre-loaded, the
initial installation time (after unpacking, physical location, and external cabling)
should take about 15 minutes. After the first IPL boot, succeeding IPL boots
should take about 5 minutes. The code for an iSCSI initiator should take less
than 5 minutes to install, as it is a seamless device driver addition.
[Figure: Two industry approaches to iSCSI: iSCSI appliances with embedded storage, and iSCSI gateways (IP/FC bridges) that connect clients on the IP network to SAN-attached storage; in both cases the iSCSI client software carries the SCSI protocol (block I/O) over IP]
The benefits of the IBM IP Storage 200i appliance include the following:
Connectivity: iSCSI can be used for DAS or SAN connections.
iSCSI-capable devices could be placed on an existing LAN (shared with other
applications) in a similar way to NAS devices. Also, iSCSI-capable devices
could be attached to a LAN which is dedicated to storage I/O (in other words,
an IP SAN), or even to a LAN connected to only one processor (like a DAS).
These options are shown in Figure 1-21.
Extended distance: IP networks offer the capability to extend easily beyond
the confines of a LAN, to include Metropolitan and Wide Area Networks
(MANs and WANs). This gives greater flexibility, and at far less cost and
complexity, compared to the interconnection of Fibre Channel SANs over
wide areas.
[Figure 1-21: iSCSI attachment options: an iSCSI appliance providing pooled storage on a shared IP LAN, on a dedicated IP SAN, or attached directly to a single client; in each case the client software issues the SCSI protocol over IP]
Media and network attachments: iSCSI and NAS devices both attach to IP
networks. This is attractive compared to Fibre Channel because of the
widespread use of IP networks. IP networks are already in place in most organizations.
With these applications in mind, the IBM TotalStorage IP Storage 200i will be well
suited for departments and workgroups within large enterprises, mid-size
companies, service providers (such as Internet service providers), and
e-business organizations.
However, the good news is that the IBM offerings are truly complementary with
each other. They are designed to work together to deliver the broadest range of
cooperating storage network solutions. In making a decision for one solution
today, you are not ruling out the ability to select and benefit from another network
choice tomorrow.
In reality, most larger organizations are likely, in our view, to implement several of
the network options, in order to provide an optimal balance of performance,
flexibility, and cost for differing application and departmental needs. As we show
in Figure 1-22, all the IBM storage network systems can be interlinked.
[Figure 1-22: All the IBM storage network systems interlinked: clients and IP Storage 200i iSCSI appliances on the IP network, the Cisco iSCSI gateway, NAS 200 and 300 appliances, and NAS 300G gateways connecting IP networks and LANs to FC servers and SAN-attached storage]
In addition, IBM and other major vendors in the industry have invested heavily in
interoperability laboratories. The IBM laboratories in Gaithersburg (Maryland,
USA), Mainz (Germany), and Tokyo (Japan) are actively testing equipment from
IBM and many other vendors, to facilitate the early confirmation of compatibility
between multiple vendors' servers, storage, and network hardware and software
components. Many IBM Business Partners have also created interoperability test
facilities to support their customers.
The SNIA is accepted as the primary organization for the development of SAN
and NAS standards, with over 150 companies and individuals as its members,
including all the major server, storage, and fabric component vendors. The SNIA
is committed to delivering architectures, education, and services that will propel
storage networking solutions into a broader market. IBM is one of the founding
members of SNIA, and has senior representatives participating on the board and
in technical groups. For additional information on the various activities of SNIA,
see its Web site at:
http://www.snia.org
The SNIA mission is to promote the use of storage network systems across the
IT community. The SNIA has become the central point of contact for the industry.
It aims to accelerate the development and evolution of standards, to promote
their acceptance among vendors and IT professionals, and to deliver education
and information. This is achieved by means of SNIA technical work areas and
work groups. A number of work groups have been formed to focus on specific
areas of storage networking, and some of these are described in 7.9.1, “SNIA
work groups” on page 255.
One of the relevant work groups pertaining to topics in this book is the IETF IP Storage
(ips) Work Group, which is addressing the significant interest in using IP-based
networks to transport block I/O storage traffic. The work of this group is outlined
in 7.9.2, “IETF work groups” on page 259.
For more information on the IETF and its work groups, refer to:
http://www.ietf.org
Fibre Channel SAN concepts have been extensively covered in other IBM
Redbooks, so we do not address Fibre Channel here.
Since the focus of this book is on IP networks, it is useful to begin with a brief
description of the standard model for open systems networks. This is known as
the Open Systems Interconnection (OSI) model.
We then discuss in detail the specific products and technologies that make up
open systems networks.
[Figure: The layers of the OSI model: Application, Presentation, Session, Transport, Network, Data Link, and Physical]
The terms protocol and service are often confused. A protocol defines the
exchange that takes place between identical layers of two hosts. For example, in
the TCP/IP stack, the transport layer of one host talks to the transport layer of
another host using the TCP protocol. A service, on the other hand, is the set of
functions that a layer delivers to the layer above it. For example, the TCP layer
provides a reliable byte-stream service to the application layer above it.
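A minimal sketch of that byte-stream service, using the socket interface found in most operating systems (shown here in Python, over an arbitrary loopback connection), is as follows: the application simply writes and reads bytes, and TCP delivers them reliably and in order.

import socket
import threading

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))       # let the operating system pick a free port
server.listen(1)
port = server.getsockname()[1]

def echo_once():
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))    # echo whatever bytes arrive

threading.Thread(target=echo_once, daemon=True).start()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", port))
    client.sendall(b"hello over TCP")    # the application just writes bytes
    print(client.recv(1024))             # and reads them back, in order
server.close()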
[Figure: The TCP/IP stack mapped to the OSI layers: Application (covering the OSI Application, Presentation, and Session layers), Transport (TCP or UDP), Network (Internet Protocol), and the subnet (Data Link and Physical layers). The accompanying encapsulation diagram shows the application datagram wrapped in a TCP header to form a TCP segment, in an IP header to form an IP packet, and finally in a subnet header and trailer to form the subnetwork frame]
The job of the IP layer is to route these packets to the target destination. IP
packets consist of an IP header, together with the higher level TCP protocol and
the application datagram. IP knows nothing about the TCP and datagram
contents. Prior to transmitting data, the network layer might further subdivide it
into smaller packets for ease of transmission. When all the pieces reach the
destination, they are reassembled by the network layer into the original
datagram.
The IP Packet
All IP packets or datagrams consist of a header section and a data section
(payload). The payload may be traditional computer data or, as is common today,
it may be digitized voice or video traffic. Using the postal service analogy again,
the “header” of the IP packet can be compared with the envelope and the
“payload” with the letter inside it. Just as the envelope holds the address and
information necessary to direct the letter to the desired destination, the header
helps in the routing of IP packets.
An IP packet has a maximum size of 65,535 bytes, including the header. Besides
application data, the payload can also carry error and control messages, such as
those of the Internet Control Message Protocol (ICMP).
To illustrate control protocols, suppose that the postal service fails to find the
destination of your letter. It would be necessary to send you a message
indicating that the recipient's address was incorrect. This message would reach
you through the same postal system that tried to deliver your letter. ICMP works
the same way: it packs control and error messages inside IP packets.
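To illustrate the header and payload split, the sketch below builds and then unpacks the fixed 20-byte IPv4 header; the addresses and field values are made up for the example.

import socket
import struct

# version/IHL, type of service, total length, identification, flags/fragment
# offset, time to live, protocol, header checksum, source and destination address
sample_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 40, 1, 0, 64, 6, 0,
    socket.inet_aton("192.168.1.10"),
    socket.inet_aton("192.168.1.20"),
)

fields = struct.unpack("!BBHHHBBH4s4s", sample_header)
version = fields[0] >> 4                 # 4 for IPv4
header_length = (fields[0] & 0x0F) * 4   # 20 bytes when no options are present
protocol = fields[6]                     # 6 = TCP, 17 = UDP, 1 = ICMP
source = socket.inet_ntoa(fields[8])
destination = socket.inet_ntoa(fields[9])
print(version, header_length, protocol, source, destination)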
IP addressing
An IP packet contains a source and a destination address. The source address
designates the originating node's interface to the network, and the destination
address specifies the interface for an intended recipient or multiple recipients (for
broadcasting).
The network part of the address is common for all machines on a local network. It
is similar to a postal code, or zip code, that is used by a post office to route letters
to a general area. The rest of the address on the letter (i.e., the street and house
number) is relevant only within that area. It is used only by the local post office
to deliver the letter to its final destination.
The host part of the IP address performs a similar function. The host part of an IP
address can further be split into a subnetwork address and a host address.
IP network addressing is a large and intricate subject. It is not within the scope of
this book to describe it in any further detail.
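For readers who want to see the split in practice, the Python standard library can separate the network and host parts of an address; the address and prefix length below are examples only.

import ipaddress

interface = ipaddress.ip_interface("192.168.10.37/24")
print(interface.network)            # 192.168.10.0/24: the "postal code" part
print(interface.network.netmask)    # 255.255.255.0
host_part = int(interface.ip) & ~int(interface.network.netmask) & 0xFFFFFFFF
print(host_part)                    # 37: meaningful only within the local network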
The application data has no meaning to the transport layer. On the source node,
the transport layer receives data from the application layer and splits it into
chunks. The chunks are then passed to the network layer. At the destination
node, the transport layer receives these data packets and reassembles them
before passing them to the appropriate process or application. Further details
about how data travels through the protocol stack follow.
The transport layer is the first end-to-end layer of the TCP/IP stack. This
characteristic means that the transport layer of the source host can communicate
directly with its peer on the destination host, without concern about how data is
moved between them. These matters are handled by the network layer. The
layers below the transport layer understand and carry information required for
moving data across links and subnetworks.
Figure 2-4 shows how the client side and the server side TCP/IP stack
implementation adds increasing overhead to the transmission of data through the
network.
[Figure 2-4: Data encapsulation: as data passes down the client-side and server-side TCP/IP stacks, each layer (Application, Transport, and below) adds its own header, increasing the overhead]
Application layer
The application layer is the layer with which end users normally interact. This
layer is responsible for formatting the data so that its peers can understand it.
Whereas the lower three layers are usually implemented as a part of the OS, the
application layer is a user process. Some application-level protocols included
in most TCP/IP implementations are the following:
Telnet for remote login
FTP for file transfer
SMTP for mail transfer
The CSMA/CD protocol moves packets on the network. The term Multiple
Access describes the concept that every node “hears” every message. In effect,
every node “listens” to the network segment to see if the network is transmitting a
frame. A node which wishes to transmit a frame waits until the network is free
before transmitting its data. Carrier Sense refers to this technique.
Since the nodes are spread in different locations, it is possible for more than one
node to begin transmitting concurrently. This results in a collision of the frames
on the network. If a collision is detected, the sending nodes transmit a signal to
prevent other nodes from sending more packets. All nodes then go into a wait
mode. On a random basis they go back to monitoring and transmitting.
We can liken this to a group of people sitting around a dinner table. I may wish to
say something, but someone else is already speaking. Rather than rudely
interrupting the speaker, I will wait politely until the other person has finished
speaking. When there is a pause, then I will say my piece. However, someone
else may also have been waiting to say something. At the pause in the
conversation we may both begin to speak, more or less at the same time. In
Ethernet terminology, a collision has occurred. We will both hear the other
person begin to speak, so we both politely stop, in order to allow the other one to
finish speaking. One of us will sense that it is OK to carry on, and will begin the
conversation again.
Packets which collided are re-sent. Since collisions are normal, and expected,
the only concern is to ensure a degree of fairness in achieving a timely
transmission. This is achieved by a simple random algorithm, which will enable a
node to “win” a collision battle after a number of attempts.
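The following sketch shows one simple form such a random algorithm can take, truncated binary exponential backoff, using the classic 10 Mbps slot time purely for illustration; it is not an exact transcription of the IEEE 802.3 rules.

import random

SLOT_TIME_US = 51.2    # one slot time at 10 Mbps, in microseconds
MAX_EXPONENT = 10      # the random range stops doubling after 10 collisions
MAX_ATTEMPTS = 16      # after 16 collisions the frame is discarded

def backoff_delay(attempt):
    """Return how long to wait (in microseconds) after the given collision."""
    if attempt > MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions: transmission aborted")
    k = min(attempt, MAX_EXPONENT)
    slots = random.randint(0, 2 ** k - 1)   # pick a random number of slot times
    return slots * SLOT_TIME_US

for attempt in range(1, 5):
    print("collision", attempt, ": wait", backoff_delay(attempt), "microseconds")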
When Fast Ethernet, at 100 Mbps, was introduced in the mid 1990s, an
auto-negotiation procedure was also introduced. This dealt with the difference
between the original 10 Mbps CSMA/CD half duplex operation, and the new 100
Mbps full duplex implementation. With half duplex, only one end node on a
copper link (not fiber) may transmit at a time. With full duplex, both end nodes
may transmit concurrently, without generating a collision. With full duplex
operation, many of the CSMA/CD protocol functions become redundant, and the
propensity for frames to collide is almost eliminated.
Segments
As we have seen, early designs communicated with devices attached to a single
cable (segment) shared by all the devices on the network. A single segment is
also known as a collision domain because no two nodes on the segment can
transmit at the same time without causing a collision.
Figure 2-5 Ethernet spanning tree topology with subnet collision domains: segments joined by repeaters (R) and a bridge, each group of segments forming its own collision domain (subnet)
Switched fabric
Today, Ethernet has evolved to switched fabric topologies. It also normally uses
twisted pair wiring or fiber optic cable to connect nodes in a radial pattern. Early
implementations of Ethernet used half-duplex transmission (that is to say, data
transferred in one direction at a time).
[Figure: A switched Ethernet topology: an Ethernet switch interconnecting several Ethernet hubs, with a router providing the connection to other networks]
Once again using our dinner party theme, now we have a series of major
banquets in different rooms, even in different buildings and cities. But each
person wants to be able to talk to anyone else at any of the tables in any of the
locations. The organizers have thoughtfully provided each diner with a
telephone. Each diner can now call any of the other participants directly, have a
person-to-person conversation, and later speak to other people individually,
wherever they are seated. Everyone can speak at the same time, without
interrupting the other diners.
There are a number of other definitions of cabling, with suffixes such as TX, FX,
CX and SX, describing different types of twisted pair cables, or multi-mode and
single-mode fiber optic cable. To keep things simple we have not described
these, but the media variations are summarized in Figure 2-7 on page 79.
Ethernet really took off commercially when it became possible to use UTP cable,
and when the use of hubs greatly simplified the logistics of installing the cabling.
A hub acted as a kind of concentrator for linking many machines to a central
wiring point. Today most sites use high quality twisted-pair cable or fiber optic
cables. These are much easier to install than coaxial cable because of their
flexibility. Short wave fiber optics can use multi-mode 62.5 micron or 50 micron
fiber optic cables; and single mode 9 micron cable is for long wave. These cables
can all carry either 10-Mbps, 100-Mbps or 1 Gigabit signals, thus allowing easy
infrastructure upgrades as required.
[Figure: 100 Mbps Ethernet media systems: 100Base-T4 (voice-grade twisted pair), 100Base-TX (data-grade twisted pair), and 100Base-FX (fiber optic)]
In keeping with similar protocols, the initiator and target divide their
communications into messages. The term iSCSI protocol data unit (iSCSI
PDU) describes these messages.
The iSCSI transfer direction is defined with regard to the initiator. Outbound or
outgoing transfers are transfers from initiator to target, while inbound or incoming
transfers are from target to initiator.
iSCSI operations
iSCSI is a connection-oriented command/response protocol. An iSCSI session
begins with an iSCSI initiator connecting to an iSCSI target (typically, using TCP)
and performing an iSCSI login. This login creates a persistent state between
initiator and target, which may include initiator and target authentication, session
security certificates, and session option parameters.
Once this login has been successfully completed, the iSCSI session continues in
full feature phase. The iSCSI initiator may issue SCSI commands encapsulated
by the iSCSI protocol over its TCP connection, which are executed by the iSCSI
target. The iSCSI target must return a status response for each command over
the same TCP connection, consisting of both the completion status of the actual
SCSI target device and its own iSCSI session status.
Data transferred from the iSCSI initiator to iSCSI target can be either unsolicited
or solicited. Unsolicited data may be sent either as part of an iSCSI command
message, or as separate data messages (up to an agreed-upon limit negotiated
between initiator and target at login). Solicited data is sent only in response to a
target-initiated Ready to Transfer message.
Each iSCSI command, Data, and Ready to Transfer message carries a tag,
which is used to associate a SCSI operation with its associated data transfer
messages.
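To give a feel for what such a message looks like on the wire, the sketch below packs a simplified 48-byte iSCSI command header carrying an initiator task tag and a SCSI READ(10) CDB. The field layout is modeled loosely on the iSCSI specification and is illustrative only; it is not a complete or exact rendition of the draft described in this book.

import struct

OPCODE_SCSI_COMMAND = 0x01

def build_scsi_command_pdu(task_tag, lun, cdb, expected_length, cmd_sn):
    flags = 0x80 | 0x40                       # final PDU, read data expected
    return struct.pack(
        "!BB2xB3s8sIIII16s",
        OPCODE_SCSI_COMMAND,                  # opcode: SCSI command
        flags,
        0,                                    # no additional header segments
        (0).to_bytes(3, "big"),               # no immediate data segment
        struct.pack("!Q", lun << 48),         # logical unit number field
        task_tag,                             # ties later Data and Ready to Transfer messages to this command
        expected_length,                      # expected data transfer length
        cmd_sn,                               # command sequence number (ordering)
        0,                                    # expected status sequence number
        cdb.ljust(16, b"\x00"),               # SCSI CDB, padded to 16 bytes
    )

# A READ(10) of 8 blocks starting at logical block 2048 (illustrative values)
read10 = struct.pack("!BBIBHB", 0x28, 0, 2048, 0, 8, 0)
pdu = build_scsi_command_pdu(task_tag=0x1234, lun=0, cdb=read10,
                             expected_length=8 * 512, cmd_sn=1)
print(len(pdu), "byte header ready to send over the TCP connection")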
The iSCSI target layer must deliver the commands to the SCSI target layer in the
specified order.
iSCSI login
The purpose of the iSCSI login is to enable a TCP connection for iSCSI use,
authenticate the parties, negotiate the session's parameters, open a security
association protocol, and mark the connection as belonging to an iSCSI session.
As part of the login process, the initiator and target may wish to authenticate
each other and set a security association protocol for the session. This can occur
in many different ways.
The initiator must present both its initiator WWUI and the target WWUI to which it
wishes to connect during the login phase.
The login phase is implemented via login and text commands and responses
only. The login command is sent from the initiator to the target in order to start the
login phase. The login response is sent from the target to the initiator to conclude
the login phase. Text messages are used to implement negotiation, establish
security, and set operational parameters. The whole login phase is considered as
a single task and has a single Initiator Task Tag (similar to the linked SCSI
commands).
The login phase starts with a login request via a login command from the initiator
to the target. A target may use the Initiator WWUI as part of its access control
mechanism; therefore, the Initiator WWUI must be sent before the target is
required to disclose its LUs.
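The negotiation carried by those text messages takes the form of key=value pairs. The sketch below shows typical parameters an initiator might offer at login; the key names and iqn-style identifiers are examples drawn from later iSCSI practice and may differ in detail from the draft that was current when this book was written.

login_parameters = {
    "InitiatorName": "iqn.2001-07.com.example:host01",       # the initiator's identifier
    "TargetName": "iqn.2001-07.com.example:storage.disk1",   # the target it wants to reach
    "AuthMethod": "CHAP,None",                                # offered authentication methods
    "HeaderDigest": "None",
    "DataDigest": "None",
}

# Each parameter travels as a null-terminated "key=value" string in the
# data segment of the login and text messages.
data_segment = b"".join(
    (key + "=" + value).encode("ascii") + b"\x00"
    for key, value in login_parameters.items()
)
print(data_segment)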
Security considerations
Historically, native storage systems have not had to consider security because
their environments offered minimal security risks. That is, these environments
consisted of storage devices either directly attached to hosts, or connected via a
subnet distinctly separate from the communications network. The use of storage
protocols, such as SCSI, over IP networks requires that security concerns be
addressed. iSCSI implementations must provide means of protection against
active attacks (posing as another identity, message insertion, deletion, and
modification) and may provide means of protection against passive attacks
(eavesdropping, gaining advantage by analyzing the data sent over the line).
No security: This mode neither authenticates the parties nor encrypts data. It
should be used only in environments where the security risk is minimal and
configuration errors are improbable.
Every compliant iSCSI initiator and target must be able to provide initiator-target
authentication and data integrity and authentication. This quality of protection
may be achieved on every connection through properly configured IPSec
involving only administrative (indirect) interaction with iSCSI implementations.
For full details of the latest iSCSI Internet Draft you may wish to refer to the IETF
Web site at:
http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-05.txt
The device driver controls the operation of the attached storage device, and the
transfer of data to and from the device through the HBA. The device driver
software is part of the system operating system; it is described briefly in “Device
and network drivers” on page 92.
Network connections
Storage networks solve, among other things, the distance limitations of the SCSI
bus. The storage I/O bus is replaced by a cable attachment into the network. The
attachment may utilize devices to facilitate ease of implementation, such as hubs
and switches. The physical topologies of these attachments may vary according
to the network size, costs, and performance requirements.
SAN topologies
The following physical topologies for Fibre Channel SAN are supported:
– Loop: A Fibre Channel loop cable is a shared attachment resource.
Arbitration determines which device can send its transmission. Loops are
typically implemented in a star fashion. A hub provides a simple, low cost,
loop topology within its own hardware. Each loop node is connected via
cable to the hub. The bandwidth of the loop is shared by all attached loop
nodes.
– Switched fabric: Switched fabric topologies use centralized, high speed
switches to deliver multiple, dedicated, concurrent data transmission paths
across the network. There is no arbitration required. The bandwidth of the
network automatically scales as paths are added to the topology.
Intelligence in the fabric components, such as switches, can determine if a
path is broken or busy, and can select the best alternative route through
the network to the target node.
– Point-to-point: A point-to-point connection may be made, depending on
the storage device attached. This provides a connection similar to direct
attachment, although it uses HBAs and Fibre Channel protocols.
LAN topologies
In the case of LAN topologies, Ethernet supports bus-like daisy chain
(segment), spanning tree, and switched fabric topologies. These are
described in 1.5.1, “Ethernet” on page 14. For the sake of brevity we will not
repeat the information here.
Other network technologies, such as ARCNET, 1000BaseT, and ATM, are also used.
Application software
Applications which need access to data generate an I/O. The I/O request may
come from an interactive user-driven application, a batch process, a database
operation, or a system management process. The application has no idea about
the physical structure and organization of the storage device where the data is
located.
File systems
A file system (FS) is the physical structure an operating system uses to store and
organize files on a storage device. At the basic input/output system (BIOS) level, a disk
partition contains sectors, each with a number (0,1,2 and so on). Each partition
could be viewed as one large dataset, but this would result in inefficient use of
disk space and would not meet application requirements effectively. To manage
how data is laid out on the disk, an operating system adds a hierarchical
directory structure. Each directory contains files, or further directories, known as
sub-directories. The directory structure and the methods for organizing disk
partitions are together called a file system.
File systems manage storage space for data created and used by the
applications. The primary purpose of an FS is to improve management of data by
allowing different types of information to be organized and managed separately.
The FS is implemented through a set of operating system commands that allow
creation, management, and deletion of files. A set of subroutines allows lower
level access, such as open, read, write, and close to files in the file system. The
FS defines file attributes (read only, system file, archive, and so on), and
allocates names to files according to a naming convention specific to the file
system. The FS also defines maximum size of a file and manages available free
space to create new files.
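As a minimal example of this subroutine-level access, the following Python fragment exercises the open, write, read, and close operations and queries the file attributes that the file system maintains; the file name and contents are arbitrary.

import os

path = "example.txt"

with open(path, "w") as f:          # the file system allocates space and a directory entry
    f.write("sample data\n")

with open(path, "r") as f:          # read back through the same file-level interface
    contents = f.read()

info = os.stat(path)                # the file system returns the file's attributes
print(path, info.st_size, "bytes, mode", oct(info.st_mode))

os.remove(path)                     # the file system frees the space for new files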
Many different file systems have been developed to operate with different
operating systems. They reflect different OS requirements and performance
assumptions. Some file systems work well on small computers; others are
designed to exploit large, powerful servers. An early PC file system is the File
Allocation Table (FAT) FS used by the MS-DOS operating system. Other file
systems include the High Performance FS (HPFS), initially developed for IBM
OS/2, Windows NT File System (NTFS), Journal File System (JFS) developed
for the IBM AIX OS, and General Parallel File System (GPFS), also developed
by IBM for AIX. There are many others.
A disk drive may have partitions with file systems belonging to several different
operating systems. Generally an operating system will ignore those partitions
whose ID represents an unknown file system.
The file system is usually tightly integrated with the OS. However, in storage
networks it may be separated from the OS and distributed to multiple remote
platforms. This is to allow a remote file system (or part of a file system) to be
accessed as if it were part of a local file system. Later we will see how this
happens with Network File System (NFS) and Common Internet File System
(CIFS).
Database systems
A database can access and store data by making I/O requests via a file system.
Alternatively, it can manage its own block I/O operations by reading and writing
directly to “raw partitions” on the disk device. In this case the database allocates
and manipulates the storage for its own table spaces without requesting services
from the file system. This can result in much faster performance.
The roles of these components are described in more detail in 2.7, “Tracing the
I/O path for local storage” on page 98.
Volume manager
The volume manager may be an integral part of the OS, or it may be a separate
software module, such as Veritas Logical Volume Manager developed for Sun
Solaris OS. The volume manager is concerned with disk device operations,
creating and configuring disk drive partitions into logical drives. The File System
uses these logical views to place the data. For instance, the volume manager
can mirror I/O requests to duplicate partitions, to provide redundancy and
improve performance. In this case, it takes a single I/O request from the file
system and creates two I/O requests for two different disk devices. Also, it can
stripe data across multiple drives to achieve higher performance; and it may
implement RAID algorithms to create fault-tolerant arrays of disk volumes.
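The following sketch illustrates the mirroring and striping behaviors just described, using hypothetical disk objects; it is a conceptual illustration only, not the interface of any real volume manager.

BLOCK_SIZE = 512

class Disk:
    # Stand-in for a physical disk partition (hypothetical, for illustration only).
    def __init__(self, name):
        self.name, self.blocks = name, {}
    def write_block(self, number, data):
        self.blocks[number] = data

def mirrored_write(devices, block_number, data):
    # Mirroring: one file-system write becomes one write per underlying device.
    for dev in devices:
        dev.write_block(block_number, data)

def striped_location(devices, logical_block):
    # Striping: consecutive logical blocks rotate across the devices.
    return devices[logical_block % len(devices)], logical_block // len(devices)

disks = [Disk("disk0"), Disk("disk1")]
mirrored_write(disks, 7, b"\x00" * BLOCK_SIZE)        # the same block lands on both disks
device, physical_block = striped_location(disks, 5)   # logical block 5 -> disk1, block 2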
Device driver
For DAS and SCSI block I/O on SAN and iSCSI networks, the device driver
software (or firmware) receives the I/O request from the volume manager
function. It formats the data and generates the appropriate signal for the targeted
storage device. It is the last software in the server to handle the data before it
leaves the hardware, and the first to handle it when it returns from the storage
device.
Network driver
In the case of network-attached devices, I/O must pass through the network
interface card (NIC) attachment to the network. The NIC contains a network
protocol driver in firmware. This describes the operations exchanged over the
underlying network protocol (such as TCP/IP). There are often several
protocol layers implemented here as a series of “device drivers.”
One of the layers is the file protocol driver software, which varies according to
the operating system environment. For instance, with Windows operating
systems the file protocol is CIFS; with UNIX it is NFS. Or it may be File Transfer
Protocol (FTP). These network file system protocol drivers interface to the
TCP/IP stack. CIFS and NFS are described in 2.6, “Network file system
protocols” on page 93.
In NFS environments, the Network Lock Manager (NLM) provides support for
file locking when it is used.
Key features
The NFS provides the following key features:
Improved interoperability with other system platforms, increasing overall
network utilization and user productivity
Easy access to files for the end-user of the NFS client system
Uses industry standard TCP/IP protocols
With NFS, all file operations are synchronous. This means that the file operation
call returns only when the server has completed all work for the operation. In the
case of a write request, the server will physically write the data to disk. If
necessary, it will update any directory structure before returning a response to
the client. This ensures file integrity.
NFS is a stateless service. That means it is not aware of the activities of its
clients. As a result, a server does not need to maintain any extra information
about any of its clients in order to function correctly. In the case of server failure,
clients only have to retry a request until the server responds, without having to
reiterate a mount operation.
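The retry behavior that statelessness makes possible can be pictured with a small sketch; send_nfs_request below is a hypothetical placeholder for the RPC call to the server.

import time

def retried_call(send_nfs_request, request, retry_interval=5.0):
    # Keep retrying an idempotent NFS request until the server answers. Because
    # the server holds no per-client state, the client does not need to repeat a
    # mount after a server restart; it simply resends the same request.
    while True:
        try:
            return send_nfs_request(request)
        except TimeoutError:
            time.sleep(retry_interval)   # server down or unreachable; try again later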
File locking and access control synchronization services are provided by two
cooperating processes: the Network Lock Manager (NLM) and the Network
Status Monitor (NSM). The NLM and NSM are RPC-based servers, which
normally execute as autonomous daemon servers on NFS client and server
systems. They work together to provide file locking and access control capability
over NFS.
CIFS defines a standard remote file system access protocol for use over the
Internet. This enables groups of users to work together and share documents
across the Internet, or within their corporate intranets. CIFS is an open,
cross-platform technology based on the native file-sharing protocols built into
Microsoft Windows and other popular PC operating systems. It is supported on
dozens of other platforms, including UNIX.
With CIFS, millions of computer users can open and share remote files on the
Internet without having to install new software or change the way they work.
With CIFS, existing applications and applications for the World Wide Web can
easily share data over the Internet or intranet, regardless of computer or
operating system platform. CIFS is an enhanced version of Microsoft's open,
cross-platform Server Message Block (SMB) protocol. This is the native
file-sharing protocol in the Microsoft Windows 95, Windows NT, and OS/2
operating systems. It is the standard way that millions of PC users share files
across corporate intranets. CIFS is also widely available on UNIX, VMS™,
Macintosh, and other platforms.
CIFS technology is open, published, and widely available for all computer users.
Microsoft has submitted the CIFS 1.0 protocol specification to the Internet
Engineering Task Force (IETF) as an Internet-Draft document. Microsoft is also
working with interested parties for CIFS to be published as an Informational RFC.
CIFS (SMB) has been an Open Group (formerly X/Open) standard for PC and
UNIX interoperability since 1992 (X/Open CAE Specification C209).
CIFS is not intended to replace HTTP or other standards for the World Wide
Web. CIFS complements HTTP while providing more sophisticated file sharing
and file transfer than older protocols such as FTP. CIFS is designed to enable all
applications, not just Web browsers, to open and share files securely across the
Internet.
CIFS benefits
Following are some benefits of using CIFS:
Integrity and concurrency - CIFS allows multiple clients to access and
update the same file, while preventing conflicts with sophisticated file-sharing
and locking semantics. These mechanisms also permit aggressive caching,
and read-ahead/write-behind, without loss of integrity.
Fault tolerance - CIFS supports fault tolerance in the face of network and
server failures. CIFS clients can automatically restore connections, and
reopen files, that were open prior to interruption.
Although the File System does not deal directly with the physical device, it does
have a map of where data is located on the disk drives. This map is used to
allocate space for the data, and to convert the file I/O request into storage I/O
protocols. The I/O must go to the device in a format which is understandable to
the device; in other words, in some number of “block-level” operations. The File
System therefore creates metadata (data describing the data) for the I/O, and
adds information to the I/O request that defines the location of the data on
the device.
The File System deals with a logical view of the physical disk drives. It maps data
on to logical devices as evenly as possible in an attempt to deliver consistent
performance. It passes the I/O request via a volume manager function, which
processes the request based on the configuration of the disk subsystem it is
managing. Then the volume manager passes the transformed I/O to the device
driver in the operating system.
The device driver reads or writes the data in blocks. It sizes them to the specific
data structure of the storage media on a physical device, such as a SCSI disk
drive. SCSI commands contain block information mapped to specific sectors on
the surface of the physical disk. This block information is used to read and write
data to and from the block table located on the disk device.
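A much simplified sketch of this file-to-block conversion follows; the 512-byte block size and the assumption that the file occupies contiguous blocks are for illustration only.

BLOCK_SIZE = 512

def file_range_to_blocks(first_block_of_file, offset, length):
    # Map a byte range within a file to the disk blocks that hold it.
    first = first_block_of_file + offset // BLOCK_SIZE
    last = first_block_of_file + (offset + length - 1) // BLOCK_SIZE
    return list(range(first, last + 1))

# A 1500-byte read starting 700 bytes into a file stored from block 2048:
print(file_range_to_blocks(2048, 700, 1500))     # [2049, 2050, 2051, 2052]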
However, database applications are generally not oriented to file structures, but
instead are “record” oriented, using a great deal of indexing to database tables.
Different databases may have very specific I/O requirements, depending on the
applications they support. For instance, a data mining database system may
have very long streaming I/Os, whereas a transaction oriented database is likely
to generate many short bursts of small I/Os.
In this case, the database application provides its own mechanism for creating
an I/O request. It reads and writes blocks of data directly to a raw partition, and
provides its own volume management functions. The database assumes control
over a range of blocks (or sectors) on the disk. This range of blocks is called the
“raw partition.” It then directly manages the system software component of the
I/O process itself. In effect the raw partition takes the role of the File System for
the database I/O operations.
The database provides its own complete method of handling the I/O requests.
This includes maintenance of a tailored table, or index, which knows the location
of records on the disk devices. When it recognizes that an I/O operation is
required it uses this table, and directs the record-level I/O through the raw
partition to the device driver, which reads or writes the data in blocks to the disk.
The database application also handles security locking at the record level, to
prevent multiple users updating the same record concurrently. Some other
applications, especially those which stream large amounts of data to and from
disk, also generate “raw I/O”.
Figure: the database raw partition I/O path within the server — database application, raw partition, volume manager, and device driver, sending data in blocks (block I/O, data, storage location) to sector locations on the disk.
On receipt of the I/O request to a file that is located in the remote NAS appliance,
the following occurs:
The I/O redirector performs what is called a “mapped drive” in the Windows
world, or a “remote mount” in UNIX.
The I/O request is directed away from the local I/O path to an alternative path
over the network, which accesses the remote file server.
Since the client system has no awareness of the device characteristics on which
the data is stored on the remote server, all redirected I/Os must be done at the
file (byte range) level. This is termed a “file I/O.”
The client is attached to the LAN by a Network Interface Card (NIC). Since the
NIC uses a network protocol, such as the TCP/IP stack, the I/O operation must
be transferred using a network protocol. Now one of the network file protocols
(such as NFS or CIFS) comes into play as a kind of network device driver. In
effect, the network file protocol lies on top of the lower level communications
protocol stack, such as TCP/IP. It is the TCP/IP protocol that carries the
redirected I/O through the NIC onto the network. On a LAN the media access
control layer used is typically the Ethernet CSMA/CD protocol. (See 2.3.3, “The
CSMA/CD protocol” on page 73.)
The receiving NAS device must keep track of the initiating client’s details so that
the response can be directed back to the correct network address. The route for
the returning I/O follows more or less the reverse path outlined above.
Figure: the redirected file I/O path — the network file protocol (NFS/CIFS) over the TCP/IP stack carries the request through the network interface card to the NAS appliance, where it passes through the appliance's TCP/IP stack, device driver, and host bus adapter onto the storage I/O bus.
The SANergy client software lies in a protocol layer beside the I/O Redirector. It
is first to see the I/O request from the application. It passes the initial file mount
(file I/O) request to the network, via the I/O Redirector and Network File Protocol
(NFS or CIFS). The I/O passes through the TCP/IP stack for encapsulation, and
out through the Network Interface Card.
Now the client can access the file directly via the SAN. SANergy knows all the
required details of the file on the device, and, in effect, “sees” the device itself.
Since “ownership” of the file has temporarily been ceded by the MDC to the
SANergy client, it can proceed with all further I/Os as block I/Os to the disk.
The client application continues to issue file I/Os, as it appears to be working with
a remote file system. The SANergy client code effectively blocks this view, and
intercepts the I/Os. These are redirected via the client’s own file system and
volume manager to the device driver. I/Os are converted to serial SCSI block
I/Os for transmission through the Fibre Channel SAN to the disk device. This is
illustrated in Figure 2-10.
Figure 2-10 SANergy data flow — (1) the client sends a file I/O request to the SANergy MDC server via the network file protocol (NFS/CIFS), (2) the server returns file access locks and disk metadata, and (3) the SANergy client redirects all further I/O through its file system and volume manager over the SAN as block I/O to disk.
The iSCSI client (the initiator) has a special SCSI mini-port driver layer of
software associated with the SCSI device driver. We call it the iSCSI device
driver layer. This is used to interface to TCP/IP, and to encapsulate the SCSI
commands into the TCP/IP stack. TCP/IP accesses the network device driver
firmware of the Network Interface Card (NIC), and transmits the I/O in SCSI
blocks over the network to the iSCSI storage appliance.
On arrival at the NIC of the target iSCSI appliance, the I/O is passed through the
receiving network device driver to the TCP/IP stack in the target. The iSCSI
device driver layer de-encapsulates the I/O from TCP, and passes it to the SCSI
device driver. From there it is handed on to the storage system bus adapter (the
ServeRAID adapter on the 200i), and then to the device.
The return journey is the reverse of the outbound route. Like the network file I/O,
you can see that today there is a software stack processing overhead associated
with an iSCSI I/O request. This has performance implications, but in general they
are less than for a file I/O. See Figure 2-11 on page 107.
Figure 2-11 iSCSI block I/O path — on the initiator, file or raw partition I/O passes through the volume manager and SCSI device driver, is encapsulated into the TCP/IP stack, and leaves through the network interface card onto the IP network; at the target it arrives through the network interface card and TCP/IP stack and is de-encapsulated back into SCSI block I/O.
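The encapsulation and de-encapsulation steps can be sketched as follows. The header layout here is invented for illustration and is not the real iSCSI PDU format; the point is simply that a SCSI CDB, its LUN, and its task tag are wrapped in a header and handed to the TCP/IP stack, then unwrapped at the other end.

import socket
import struct

def send_scsi_over_tcp(conn, lun, cdb, task_tag):
    # Wrap a SCSI CDB in a small header and hand it to the TCP/IP stack.
    header = struct.pack("!IHB", task_tag, lun, len(cdb))   # tag, LUN, CDB length
    conn.sendall(header + cdb)                              # NIC transmits it to the target

def receive_scsi_over_tcp(conn):
    # De-encapsulate on the target side and recover the SCSI command.
    header = conn.recv(7)                                   # partial reads ignored for brevity
    task_tag, lun, cdb_len = struct.unpack("!IHB", header)
    cdb = conn.recv(cdb_len)
    return task_tag, lun, cdb    # handed on to the SCSI device driver and adapter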
The collection of two or more server engines into a single unified cluster makes it
possible to share a computing load without users or administrators needing to
know that more than one server is involved. For example, if any resource in the
server cluster fails, the cluster as a whole can continue to offer service to users
using a resource on one of the other servers in the cluster, regardless of whether
the failed component is a hardware or software resource.
The design of network storage systems, like IBM’s TotalStorage NAS devices,
offers high availability configurations to meet these demands. Today they use
concepts derived from clustered server implementations, such as Microsoft
Cluster Services.
There are three possible levels of clustering availability, which use the common
industry terminology of shared null, shared nothing, and shared everything.
However, if a component which is a single point of failure (SPoF) fails, then the
node has no fault tolerance. It cannot failover to an associated node to provide
continued access to data. This would apply, for instance, in the case of two
single-node Network Attached Storage 300G Model G01s. If the appliance or its
attached disk fails, the other node has no access to the data. This is illustrated in
the top section of Figure 2-13 on page 110.
Figure 2-13 Clustering availability models — shared null (no failover, no load balancing), shared nothing (failover, no load balancing), and shared everything (failover and load balancing), each showing nodes A and B attached to the LAN/WAN.
Shared Nothing implies that storage and other resources are shared only after a
failure. Otherwise the two nodes do not share resources.
When a node in the cluster needs to access data owned by another cluster
member, it must ask the owner. The owner performs the request and passes the
result back to the requesting node. If a node fails, the data it owns is assigned, in
the case of a two node cluster, to the other node in the cluster (or to another
node, in the case of more than two nodes in the cluster).
Shared Nothing is illustrated in the second layer of Figure 2-13. The Shared
Nothing model makes it easier to manage disk devices and standard
applications.
At the time of writing, no NAS appliances on the market are implemented with a
“shared everything” architecture.
Tivoli NetView is a well-established management tool for networks. IBM has
introduced a family of data and SAN resource management tools, namely Tivoli
Storage Manager and Tivoli Storage Network Manager. These cooperate with
device-specific management tools, such as IBM’s StorWatch family of software.
Tivoli NetView provides the scalability and flexibility to manage large scale
mission-critical network environments.
Tivoli NetView SmartSets allow you to group network resources that should be
managed similarly, and apply policies to these groups. As a result, you can
manage a set of resources as though it were a single device. SmartSets let you
dynamically group resources by type, location, vendor, services offered, or other
common characteristics.
With its highly scalable design, the Tivoli NetView Web console allows you to
observe network activity from anywhere. Using the Web console, you can view
events, node status, and SmartSets, as well as perform network diagnostics.
Today's TCP/IP networks are more complex than ever. Tivoli NetView accurately
manages and represents complex topologies and provides accurate status
information. Additionally, networks often comprise a wide variety of devices such
as hubs, routers, bridges, switches, workstations, PCs, laptops, and printers.
With Tivoli NetView, you can decide which of these devices to manage. You can
then focus on your most important devices, as well as the most important
information about those devices.
With Tivoli NetView you can distribute management functions to remote locations
that cannot support full-scale management. This minimizes administrative
overhead, and eliminates the need for dedicated management systems
throughout the network. Local management is enabled to handle most problems,
while staff members in the network operations center monitor critical systems.
Tivoli Decision Support Network Guides provide insight and the ability to perform
thoughtful data analysis. These guides enable you to proactively manage your
network by presenting trend data and quickly answering questions. The following
are three Tivoli Decision Support Guides for Tivoli NetView:
Network Element Status: Provides a detailed view of the overall health and
behavior of your network's individual elements, such as routers, servers, end
systems, SNMP data, and MIB expressions collected from MIB II agents.
Network Event Analysis: Provides an overall view of network and NetView
event flow and event traffic. It analyzes events over time, distinguishing
device class and event severity.
Network Segment Performance: Provides a view of network segment
behavior primarily determined by using RMON characteristics on the network.
By providing a means to gather key network information and identify and solve
problems, Tivoli NetView allows network administrators to centralize the
management of network hardware devices and servers. Tivoli NetView is a
smarter way to isolate, evaluate, and resolve network issues. It is an ideal
solution for identifying and resolving short- and long-term network problems.
Tivoli NetView is not bundled with any of the NAS products.
TSM supports eight different server platforms: Microsoft Windows NT, AIX, Sun
Solaris, HP-UX, VM, OS/390, OS/2, and OS/400. It also protects more than 35 of
the most popular platforms as clients, including Apple, Digital, HP, IBM,
Microsoft, NCR, SCO, Silicon Graphics, Sun Microsystems, and more. TSM
integrates fully with hundreds of storage devices, as well as LAN, WAN, and
emerging SAN infrastructures. It provides online backups of all major groupware,
ERP applications, and database products. The objective is to keep information
available and accessible to anyone, anywhere.
TSM's progressive backup methodology has earned high marks from users. An
initial full backup is routinely supplemented with incremental backups that require
minimal network bandwidth. An intelligent relational database tracks all backups.
It builds, offline, the complete up-to-date picture. TSM keeps track of where files
are located. Incremental backups are performed in the background, so you can
continue to perform business as usual.
For a mobile workforce, TSM features patented byte- and block-level technology
to help you more effectively manage the rising volume of information stored on
laptop computers. Since TSM typically transmits only changed data, backups
occur in a fraction of the time required to back up entire files.
For Storage Area Networks, TSM provides integrated tools which exploit SAN
functionality, such as LAN-free backup to reduce the traffic on your IP network.
Tape libraries can be dynamically shared between multiple TSM servers. All
backups are managed intelligently, so recovery is a single, fast process. And
TSM can be configured to rebuild revenue-generating applications and customer
touchpoints first.
Tivoli provides tools that enable online backups and restores, and manages
database transaction logs. Support is provided for most of today's popular
systems, including Lotus Notes, Lotus Domino, Informix, SAP R/3, Oracle,
Microsoft SQL Server, and Microsoft Exchange Server.
TSM can be integrated with other Tivoli software such as the Tivoli Enterprise
solution. It delivers a complete view of operations and monitors and manages the
entire business process, including: networks, systems, storage information, and
business applications. The Tivoli Storage Manager client comes pre-installed
as part of the IBM TotalStorage NAS products.
The TSNM Server is supported on Windows 2000 Advanced Server Edition. The
managed host platforms are supported on Windows NT, Windows 2000, IBM
AIX, and Sun Solaris.
TSNM integrates with Tivoli NetView. This allows you to monitor and control your
SAN infrastructure and devices from the same interface you use to manage your
LAN and WAN. These customer networks can now be viewed from a single
console.
Tivoli Storage Network Manager allows you to securely allocate the discovered
storage resources to the appropriate host systems. You can easily assign disk
storage resources or Logical Unit Numbers (LUNs) from the SAN storage
subsystems to any computers connected to the SAN. TSNM effectively allows
multiple computers to share the same SAN resources, and the same storage
subsystems, even though they may be using different file systems. TSNM
ensures that the right host is looking at the right source.
Events and data from the SAN are continuously captured, providing information,
alerts, and notification to administrators for problem resolution. SAN-related
events are forwarded to SNMP (Simple Network Management Protocol)
management tools such as Tivoli Event Console (TEC).
A methodology to translate between the logical view and the physical view is
required in order to implement storage virtualization. The question arises “Where
and how should this be done?”
Storage Tank will ultimately deliver the promise of heterogeneous storage
networking. It will provide a universal storage system capable of sharing data
across any storage hardware, platform, or operating system. Storage Tank is a
software management technology that unleashes the flow of information across
a storage area network, providing universal access to storage devices in a
seamless, transparent, and dynamic manner.
The illustration in Figure 2-14 shows that Storage Tank clients communicate with
Storage Tank servers over an enterprise's existing IP network using the Storage
Tank protocol. It also shows that Storage Tank clients, servers, and storage
devices are all connected to a Storage Area Network (SAN) on a high-speed,
Fibre Channel network.
Figure 2-14 Storage Tank architecture — heterogeneous clients (NT, AIX, Linux, and Solaris workstations or servers), shared storage devices holding active data, backups, and migrated data, device-to-device copy for backup and migration, and metadata servers providing authentication, access control, locking, data placement, and file-level outboard services.
An enterprise can use one Storage Tank server, a cluster of Storage Tank
servers, or multiple clusters of Storage Tank servers. Clustered servers provide
load balancing, fail-over processing, and increased scalability. The servers in
a cluster are interconnected on their own high-speed network or on
the same IP network they use to communicate with Storage Tank clients. The
private server storage that contains the metadata managed by Storage Tank
servers can be attached to a private network connected only to the cluster of
servers, or it can be attached to the Storage Tank SAN.
For more details on storage network virtualization, refer to the IBM Redbook
Storage Networking Virtualization - What’s it all about?, SG24-6211-00.
NAS appliances like the IBM TotalStorage Network Attached Storage 200 and
300 are fully integrated and dedicated storage solutions that can be quickly and
easily attached to an IP network. Their storage will then become immediately and
transparently available as a network file serving resource to all clients. These
specialized appliances are also independent of their client platforms and
operating systems, so that they appear to the client application as just another
server.
Note: As of the time of writing, these are the available products IBM has to
offer. The latest information on IBM Storage Networking products is always
available at this website:
http://www.storage.ibm.com/snetwork/index.html
Two models have been developed for use in a variety of workgroup and
departmental environments. They support file serving requirements across NT
and UNIX clients, e-business, and similar applications. In addition, these devices
support Ethernet LAN environments with large or shared end user workspace
storage, remote running of executables, remote user data access, and personal
data migration.
Both models have been designed for installation in a minimum amount of time,
and feature an easy-to-use Web browser interface that simplifies setup and
ongoing system management. Hot-swappable hard disk drives mean that you do
not have to take the system offline to add or replace drives, and redundant
components add to overall system reliability and uptime.
To help ensure quick and easy installation, both NAS models have tightly
integrated preloaded software suites.
The NAS 200 models scale from 108 GB to over 3.52 TB total storage. Their
rapid, non-disruptive deployment capabilities mean you can easily add storage
on demand. Capitalizing on IBM experience with RAID technology, system
design and firmware, together with the Windows Powered operating system (a
derivative of Windows 2000 Advanced Server software) and multi-file system
support, the NAS 200 delivers high throughput to support rapid data delivery.
Dedicated
As a fully-integrated, optimized storage solution, the NAS 200 allows your
general-purpose servers to focus on other applications. Pre-configured and
tuned for storage-specific tasks, this solution is designed to reduce setup time
and improve performance and reliability.
Open
The open-system design enables easy integration into your existing network
and provides a smooth migration path as your storage needs grow.
Scalable
Scalability allows you to increase storage capacity, performance, or both, as
your needs grow. NAS 200 storage capacities ranging from 108 GB to
440.4 GB (Model 201), and from 218 GB to 3.52 TB (Model 226) are
provided, while NAS 300 can be scaled from 360 GB to 6.61 TB (Model 326).
Flexible
Multiple file protocol support (CIFS, NFS, HTTP, FTP, AppleTalk, and Novell
NetWare) means that clients and servers can easily share information from
different platforms.
Reliable
Hot-swappable disk drives, redundant components, and IBM Systems
Management are designed to keep these systems up and running.
Pre-loaded software
The NAS 200 is preloaded with Windows Powered OS and other software
designed specifically to enable network clients to access large amounts of
data storage on the NAS server using multiple file protocols. Pre-loaded
software is described in 3.1.7, “IBM NAS 200 preloaded software” on
page 129.
Figure: the IBM NAS 200 Model 201 — engine with ServeRAID-4Lx RAID adapter and internal drives.
Figure 3-2 shows a picture of the IBM 5194-201 NAS 201 tower model.
Figure: the IBM NAS 200 Model 226 — engine with ServeRAID-4H adapter, appliance options, and up to three EXP 300 storage expansion units.
Features and benefits:
One 1.133 GHz Pentium III processor (Model 5194-201) — powerful processor for optimal performance
Two 1.133 GHz Pentium III processors (Model 5194-226) — increased processing power for more storage-intensive environments
Note: Use of tower-to-rack conversion kit does not transform a Model 201 into
a Model 226. It is simply a means of converting a Model 201 from tower into a
rack configuration.
Network administrators not currently running DHCP servers will find the
advanced appliance configuration utility particularly useful for automatically
configuring network settings for newly added IBM NAS 200 appliances. Even
administrators with networks using DHCP servers can benefit from the advanced
appliance configuration utility, by permanently assigning IP addresses and host
names automatically and launching Web-based management.
The IBM Storage Unit Models EXU and EXX contain two hot-swappable,
redundant power supply/fan assemblies. Potential failure-causing conditions
are reported to the controller via Predictive Failure Analysis (PFA).
Here are the key features of the IBM 5194 NAS Storage Unit Model EXU and
EXX:
Supports data transfer speeds of up to 160 MB per second
Note: A maximum of three IBM 5194 Storage Unit Model EXU and EXX can
be attached to the IBM NAS 200 Model 226.
The NAS 300 appliance provides an affordable but robust solution for the storage
and file serving needs for a large department or a small enterprise. It provides
the same features and benefits as the IBM NAS 200 series products. In addition,
with its second engine, it provides an increase in reliability and availability
through the use of clustering software built into the appliance.
The NAS 300 also provides scalability, fault tolerance, and performance for
demanding and mission-critical applications. The NAS 300 consists of a dual-engine
chassis with failover features. It has dual Fibre Channel hubs and a Fibre
Channel RAID controller. The 300 is preloaded with a task-optimized Windows
Powered Operating System. With its fault-tolerant, dual engine design, the 300
provides a significant performance boost over the 200 series.
The NAS 300 system will scale easily from 364 GB to 6.55 TB, making future
expansion simple and cost-effective. It comes ready to install, and becomes a
part of a productive environment with minimal time and effort.
The preloaded operating system and application code is tuned for the network
storage server function, and designed to provide 24 X 7 uptime. With multi-level
persistent image capability, file and volume recovery is quickly managed to
ensure highest availability and reliability.
Figure: IBM TotalStorage NAS 300 base configuration — Fibre Channel node with engine 1 and engine 2 (each with Ethernet connections), FC hub 1 and FC hub 2, a Fibre Channel RAID controller, and JBOD storage units.
Figure 3-8 represents the IBM TotalStorage NAS 300 maximum configuration. It shows engine 1 and engine 2 with their Ethernet connections, FC hub 1 and FC hub 2, and four storage units.
Specifications (5195-326):
Number and type of processors (std./max): 1/2 Pentium III 1.133 GHz
L2 cache: 512 KB
Features and benefits:
Redundancy — hot-swap HDDs, hot-spare HDDs, and hot-swap redundant power supplies
Two different types of configurations are available for this product: the
single-node G01 and the dual-node G26. The dual node Model G26 also
provides clustering and failover protection for top performance and availability.
The IBM TotalStorage NAS 300G, 5196 models are specialized NAS appliances
acting as a high-bandwidth conduit. They connect LAN-attached clients and
servers to the SAN through high-speed Fibre Channel paths.
Figure: the NAS 300G as a shared-storage conduit — Ethernet-attached clients using file I/O protocols on one side, and shared SAN storage accessed with block I/O protocols over Fibre Channel on the other.
The main characteristics of the IBM TotalStorage NAS 300G are the following:
Easy to use and install
No keyboard, mouse, or display required to configure and maintain
Supports CIFS, NFS, Novell NetWare, FTP, AppleTalk, and HTTP
Persistent image file server backup, a point-in-time backup accessible by
users without administrator intervention
Web-based GUI administration tools
Windows Terminal Services for remote administration and configuration
Uses external storage
Netfinity Director agent
Tivoli Storage Manager client
SANergy
Figure 3-10 IBM NAS 300G, 5196 G01 single node diagram — Node 1 attached to the LAN via Ethernet and to the SAN through a customer-provided Fibre Channel switch.
Figure 3-11 The IBM TotalStorage NAS 300G G01 single node model
The IBM TotalStorage NAS 300G Dual Node Model is made up of 2 individual
rack-mounted Single Node units and includes the following hardware
components in each unit:
2 x 1.13 GHz Pentium III Processors
512KB L2 Cache
1 GB SDRAM
3 x 36.4 GB Hard Drive
Up to 4 Ethernet Adapters (at most, 2 can be gigabit)
Fibre Channel Adapter (QLogic) in each chassis
Figure 3-13 IBM NAS 300G, 5196 G26 dual node diagram — Node 1 and Node 2 attached to the LAN via Ethernet, interconnected by Ethernet, and attached to the SAN.
Nodes: 1 (Model G01), 2 (Model G26)
Number of processors per engine: dual 1.13 GHz Pentium III (both models)
Clustering/failover: no (G01), yes (G26)
Memory (std./max): 1 GB / 4 GB (both models)
Features and benefits:
Dual node configuration (G26 only) — clustered failover support for increased availability and performance
Provides remote LAN users access to SAN storage — access to pooled storage on the SAN without requiring individual Fibre Channel connections
In addition to the operating system and application code, the code load contains
configuration and administration tools which simplify remote configuration and
administrator tasks. Network management agents are included that provide
options by which the NAS Models G01 and G26 can be managed.
The software listed in Table 3-8 is included in the IBM NAS 300G.
Table 3-8 IBM NAS 300G software
Tivoli SANergy
Tivoli SANergy software is pre-installed and ready to license on the TotalStorage
Network Attached Storage 300G. It can provide all of the benefits of a NAS
device with the higher performance and scalability of a SAN.
Any computer connected to the 300G can increase its bandwidth access to SAN
storage. Bandwidth-hungry computers can now receive data from the 300G at up
to 100 MB per second using SANergy. Tivoli SANergy will dynamically route data
to either the LAN or SAN to provide optimum network utilization and
performance.
The use of SANergy will not only increase disk-to-computer bandwidth for
individual computers; it will also greatly reduce CPU utilization on those
computers while accessing SAN storage. It will also reduce data copy and
transfer times between any computer connected this same way and will greatly
reduce traffic over the LAN. For more information on Tivoli SANergy, contact
Tivoli, or refer to this Web site:
http://www.tivoli.com/sanergy/nas
The following sections show various connectivity configurations using the IBM
TotalStorage NAS 300G.
Figure: IBM NAS 300G connecting IP network clients (client a, client b) and an application server over Ethernet to SAN-attached storage (ESS, MSS, or FAStT200 and 500) through Fibre Channel.
Figure 3-15 IBM NAS 300G with IBM 7133 SSA subsystem
The IBM IP Storage 200i is a low cost, easy to use, native IP-based storage
appliance. The 200i is designed for workgroups, departments, general/medium
businesses, and solution providers that have storage area network requirements
across heterogeneous clients. It integrates existing SCSI storage protocols
directly with the IP protocol. This allows the storage and the networking to be
merged in a seamless manner. iSCSI-connected disk volumes are visible to IP
network attached processors, and as such are directly addressable by database
and other performance-oriented applications. The native IP-based 200i allows
data to be stored and accessed wherever the network reaches, LAN, MAN, or
WAN distances.
IBM TotalStorage IP Storage 200i family consists of the 4125 Model 110 and
4125 Model 210 tower systems, and the 4125 Model EXP rack-mounted system.
All required microcode comes preloaded, minimizing the time required to set up
and configure the IP Storage 200i and make it operational. There are only two types of
connections to make: attaching the power cord(s) and establishing the Ethernet
connection(s) to the network. High speed, 133 MHz SDRAM is optimized for
133 MHz processor-to-memory subsystem performance. IBM IP Storage 4125
Model 110 and IP Storage 4125 Model 210 use the ServerWorks ServerSet III LE
(CNB3.OLE) chipset to maximize throughput from processors to memory, and to
the 64-bit and 32-bit Peripheral Component Interconnect (PCI) buses.
After power on, the initial IP address configuration is a straightforward task which
would be completed by the system administrator. The IBM TotalStorage IP
Storage 200i provides a browser-based interface with which the system
administrator can configure the network easily. RAID provides enhanced disk
performance while minimizing storage failure. Adding disks and administering
operations can occur while the system is online, providing excellent operational
availability.
IBM provides iSCSI initiator drivers for Linux, Windows NT, and Windows 2000.
These drivers are available for download from the following website:
http://www.storage.ibm.com
IBM provides a user ID and password to authorized customers and users. The
download package extracts all files, including a README, which explains how to
build the initiator for particular hardware types and Linux versions. The Windows
NT and 2000 install packages run under Install Shield, which will install drivers
and update the registry. Information provided explains how to configure the IP
address of the iSCSI target. Once installed and configured (assuming the system
administrator assigns access to storage for the initiator machine), the iSCSI
initiator driver will open a connection to the iSCSI target on bootstrap and will
treat the assigned storage just like a locally attached disk. This is an important
concept and has implications which are discussed later in this chapter.
The departmental model, IP Storage 200i, 4125 Model 210, is rack mounted and
consists of the following components:
– Dual 1.13 GHz Pentium III Processors
– 1 GB of ECC 133 MHz System Memory
– 512 KB Level 2 cache per processor
– ServeRAID-4H - high function, four-channel RAID adapter
– 3/109 GB of HDD Storage, expandable up to 6/440 GB internal
The IBM TotalStorage IP Storage 200i 4125 Model EXP is a storage expansion
unit that provides additional storage capability for the rack-based 4125. It
provides up to 1.027 TB storage capacity per unit and up to three expansion
units can be attached to a single 4125 Model 210, providing a maximum of
3.52 TB of storage.
Number of processors (std./max): 1/2 1.13 GHz Pentium III (Model 110), 2/2 1.13 GHz Pentium III (Model 210)
Expansion slots: 5 (both models)
The ability to access storage residing on the IBM TotalStorage IP Storage 200i is
coordinated by Access Control logic in the Web-based User Interface (UI). iSCSI
clients use an assigned client ID and password to access assigned LUNs.
The system disk is partitioned for multiple system images for upgrade and
recovery. The system is booted from the primary partition. If the boot fails, the
system is automatically booted from the Recovery CD-ROM, which invokes
failure recovery procedures. Through the service interface, the user can apply
new system images from a local management station.
Network management is supported via SNMP and standard MIBs. SNMP agents
and subagents support internal functions. A specific iSCSI MIB is not supported
in this initial product release.
The RAID levels supported are RAID 0, 1, 1E, 5, and 5E. Disk partitioning and
management, as well as RAID arrays, are supported. Hot-spare disks can be
defined for automatic failed disk replacement (with the exception of RAID 0).
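The parity idea behind RAID 5 (and 5E) can be shown in a few lines: the parity strip is the XOR of the data strips, so any single lost strip can be rebuilt from the survivors. This is a sketch of the general technique, not of the ServeRAID implementation.

from functools import reduce

def parity(strips):
    # The parity strip is the byte-wise XOR of the data strips.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

def rebuild(surviving_strips, parity_strip):
    # Reconstruct the missing strip from the surviving strips plus the parity.
    return parity(surviving_strips + [parity_strip])

data_strips = [b"AAAA", b"BBBB", b"CCCC"]          # three data strips on three disks
parity_strip = parity(data_strips)                  # parity strip on a fourth disk
assert rebuild([data_strips[0], data_strips[2]], parity_strip) == data_strips[1]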
These servers have the flexibility to handle applications for today and expansion
capacity for future growth.
Note: For more information on the IBM TotalStorage IP Storage 200i refer to
the redbook: Planning and Implementing Solutions using iSCSI, SG24-6291
The Cisco SN 5420 Storage Router provides access to SCSI storage over IP
networks. With the 5420 you can directly access storage anywhere on an IP
network just as easily as you can access storage locally. The SN5420 is shown
in Figure 3-17.
The SN 5420 uses the TCP/IP protocol suite for networking storage supporting
the level of interoperability inherent to IP networks. It leverages existing
management and configuration tools that are already well known and
understood. And it is based on industry standards, which maximizes your
investment by allowing you to leverage existing TCP/IP experience and
equipment.
The Cisco SN 5420 Storage Router is based on both IP and storage area
network (SAN) standards, providing interoperability with existing local area
network (LAN), wide-area network (WAN), optical and Storage Area Network
(SAN) equipment. The Cisco SN 5420 is a high performance router designed to
allow block-level access to storage regardless of your operating system or
location. The SN 5420 accomplishes this by enabling Small Computer Systems
Interface over IP (iSCSI). The SN 5420 connects to both the FC SAN network
and the IP network via Gigabit Ethernet. This allows the Cisco SN 5420 Storage
Router to perform gateway functions between environments and allows IP
routing intelligence to be leveraged with storage networking technologies.
Each server that requires IP access to storage via the Cisco SN 5420 Storage
Router needs to have the Cisco iSCSI driver installed. Cisco and Cisco partners
have developed, or are currently working on, iSCSI drivers that support the
following operating systems:
– Linux
– Sun Solaris
– Windows NT
– Windows 2000 (under development by Cisco)
– AIX (under development by IBM)
– HP UX (under development by HP)
– Netware (under development by Novell)
Using the iSCSI protocol, the iSCSI driver allows a server to transport SCSI
requests and responses over an IP network. From the perspective of a server
operating system, the iSCSI driver appears to be a SCSI or Fibre Channel driver
for a peripheral channel in the server. Figure 3-18 on page 168 shows a sample
storage router network. Servers with iSCSI drivers access the storage routers
through an IP network connected to the Gigabit Ethernet interface of each 5420
storage router. The storage routers access storage devices through a storage
network connected to the Fibre Channel interface of each storage router. For
high availability operation, the storage routers communicate with each other
over two networks: the HA network connected to the HA interface of each storage
router, and the management network connected to the management interface of
each storage router.
Cisco SN 5420 specifications:
AC power: output 70 W, frequency 50 to 60 Hz
Connector: duplex SC; core size (microns), modal bandwidth, maximum length: 62.5, 160, 722 ft. (220 m); 62.5, 200, 902 ft. (275 m); 50.0, 400, 1640 ft. (500 m); 50.0, 500, 1804 ft. (550 m)
Connector: duplex SC; core size (microns), modal bandwidth, maximum length: 62.5, 160, 984 ft. (300 m); 50.0, 400, 1640 ft. (500 m)
Once access is configured in the servers and once the storage mapping is
configured in a storage router, the storage router will forward SCSI requests and
responses between servers and the mapped storage devices.
Interoperability
The Cisco SN 5420 fits seamlessly into existing storage and data networks. The
Cisco SN 5420 uses the well-known TCP/IP protocol suite for network storage,
supporting the level of interoperability inherent to mature IP networking
protocols. The SN 5420 is based on current SAN standards, as well, and is
compatible with existing SAN deployments: point-to-point, switched, or arbitrated
loop.
Manageability
The Cisco SN 5420 Storage Router leverages existing management and
configuration tools that are already well known and understood. The SN 5420
provides full network management support through Simple Network
Management Protocol (SNMP), Web-based GUI, and command line interface
(CLI) access.
Investment protection
Total cost of ownership (TCO) is a growing concern for most system
administrators and management. The Cisco SN 5420 Storage Router helps
reduce the costs by leveraging your existing TCP/IP networking infrastructure
while maintaining your current and near-term investments in storage systems
and Fibre Channel infrastructure. The SN 5420 simplifies the cost of
management, deployment and support issues, given the fact that technical skills
in TCP/IP support are more widely available than SAN experience.
Note: For more information on the Cisco SN 5420, refer to the redbook:
Using iSCSI Solutions’ Planning and Implementation, SG24-6291
For the latest information about Cisco SN 5420, refer to the product page at:
http://www.cisco.com/warp/public/cc/pd/rt/5420/index.shtml
Moreover, these products use a balanced system design so that your system is
running at optimal performance levels for your environment. IBM also introduced
an innovative light-path service panel in conjunction with component-level LEDs
on certain failing components. This makes the identification and replacement of a
failing component extremely easy.
The light-path service panel directs you to the problem area, and the
component-level LEDs tell you which component is the problem. This helps you
minimize downtime and save spare parts for times you might need them.
Figure 4-1 shows a logical representation of the NAS 300 and 300G base drive
configuration.
Figure 4-1 Logical view of NAS 300 and 300G base drive configuration (Array A)
With all these powerful remote management functions, security is essential. The
ASM Processor includes security features such as these:
Password protection
User profiles (up to 12 profiles with the ability to define the level of access
rights)
A time stamp in the event log of last login
Dial-back configuration to protect the server from unauthorized access
In addition, the PCI adapter enables more flexible management through a Web
browser interface. It also allows you to download flash BIOS for the ASM
Processor, as well as for the server, over a LAN, modem or ASM Interconnect.
The adapter also supports the generation and forwarding of unique SNMP traps,
allowing it to be managed by Tivoli NetView or Netfinity Director.
Automated Server Restart and orderly operating system shutdown are supported
by the ASM processor. The ASM processor is hardware and software
independent for all other functions.
The ASM uses a DOS-based configuration utility. This provides additional
configuration functionality for both the ASM Processor and ASM PCI Adapter. In
addition, it also allows you to set up and configure all relevant parameters for the
ASM Processor and ASM PCI Adapter, independent of the operating system and
status of your server. This is done through a bootable DOS diskette.
Figure 4-2 View of interconnected NAS appliances using ASM PCI adapters
Figure 4-3 on page 180 shows the ServeRAID program found in the NAS
products.
If you do not plan to use the IBM Advanced Appliance Configuration Utility, then
you must install and use Windows Terminal Services to configure the appliance
server. Refer to the User’s Reference in the IBM TotalStorage NAS Appliance
product documentation for detailed instructions for installing and using the
configuration programs.
An example of the Terminal Services Client program found in the IBM NAS
products is shown in Figure 4-4.
Once the NAS appliance is detected by the IAACU console, you can use the
IAACU to set up and manage the appliance’s network configuration, including
assigning the IP address, default gateway, network mask, and DNS server to be
used by the appliance. You can also use the Advanced Appliance Configuration
Utility to start Universal Manageability Services on the appliance, enabling you to
perform more advanced systems management tasks.
Networks not currently running DHCP servers will find the IAACU particularly
useful for automatically configuring network settings for newly added appliance
servers. However, networks with DHCP servers will also benefit from using the
IAACU, as it enables the systems administrator to reserve and assign the
appliance IP address in an orderly, automated fashion. Even if the customer
decides to use DHCP and does not choose to reserve an IP address for the
appliance, the IAACU can still be used to discover appliances and to start UM
Services Web-based systems management.
Consider the following information when using the IBM Advanced Appliance
Configuration Utility:
1. The IAACU configures and reports the TCP/IP settings of the first adapter on
each appliance server only. The first adapter is typically the built-in Ethernet adapter.
2. Only one system running the IAACU console is allowed and supported in a
physical subnetwork.
Figure 4-5 shows the IBM Advanced Appliance Configuration Utility Console.
The Advanced Appliance Configuration Utility Console is divided into two panes:
The tree view pane
The information pane
In a Family
If the discovered appliance fits the requirements of a Family, it will automatically
appear as part of a Family. If a discovered appliance fits the requirements of
more than one Family, it is automatically added to the first appropriate Family
that is listed in the tree view, starting from the top of the tree. (For information on
how to move appliances between families, refer to “Using Families and Groups
in the tree view” on page 186.)
The Advanced Appliance Configuration Utility is not the only way to configure
network settings. For example, network settings can be configured using
Terminal Services for Windows, or by attaching a keyboard and mouse to the
appliance and using Windows Control Panel on the server. If the appliance
network settings have been configured by a method other than using the IAACU,
the appliance will be discovered by the Advanced Appliance Configuration Utility
and it will be added to an appropriate Family, if one exists. Appliances that have
been configured using a method other than the IAACU for which no appropriate
family exists will appear in the Orphaned Externally Configured Appliances
group.
If you are not using DHCP, the Advanced Appliance Configuration Utility
automatically assigns one IP address per appliance server, using available
addresses within the range defined in the Family rules. When a Family’s IP
address range has been exhausted, the Advanced Appliance Configuration
Utility automatically searches for other Families that have rules matching the
appliance server being configured. If a matching Family with an available
address is found, the server will automatically be assigned to the Family that has
available IP addresses. This enables you to define multiple Families, each of
which uses its own, non-contiguous IP address range.
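The assignment behavior described above can be sketched as follows; the Family rule function, address ranges, and appliance identifier are hypothetical examples.

import ipaddress

def assign_address(appliance, families, in_use):
    # Return (family name, address) from the first matching Family with a free address.
    for family in families:
        if not family["matches"](appliance):                # the Family's rules
            continue
        first = int(ipaddress.ip_address(family["first"]))
        last = int(ipaddress.ip_address(family["last"]))
        for n in range(first, last + 1):                    # walk this Family's range
            addr = str(ipaddress.ip_address(n))
            if addr not in in_use:
                in_use.add(addr)
                return family["name"], addr
    raise RuntimeError("no matching Family has a free address")

families = [
    {"name": "NAS200", "matches": lambda a: a == "NAS 200",
     "first": "192.168.1.10", "last": "192.168.1.12"},
    {"name": "Overflow", "matches": lambda a: True,
     "first": "192.168.1.20", "last": "192.168.1.29"},
]
used = {"192.168.1.10", "192.168.1.11", "192.168.1.12"}     # first Family is exhausted
print(assign_address("NAS 200", families, used))            # falls through to ('Overflow', '192.168.1.20')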
Orphaned appliances
Any discovered appliance servers that were configured using the Advanced
Appliance Configuration Utility, but that do not meet the rules for any existing
Family, are automatically added to the Orphaned Appliances group.
From here, clicking Administer this server appliance will lead you to the NAS
administration menu, as shown in Figure 4-7.
From this menu, you will be able to perform the following actions:
Network setup: Manage essential network properties
Services: Control essential services
Folders and Shares: Manage local folders, and create or modify file shares
Disks and Volumes: Configure disks, volumes, disk quotas, and persistent
images
Users and Groups: Manage local users and groups
Maintenance: Perform maintenance tasks
Help: View online help
The IBM NAS products use two types of backup implementation: point-in-time
image copies and archival backup.
Point-in-time backup
Point-in-time images provide a near instant virtual copy of an entire storage
volume. These point-in-time copies are referred to as persistent images and are
managed by the Persistent Storage Manager (PSM) software.
These virtual copies are created very quickly and are relatively small in size. As a
result, functions that would otherwise have been too slow, or too costly, are now
made possible. Use of these persistent images may now allow individual users to
restore their own files without any system administrator’s intervention. With the
pre-loaded code, the NAS administrator can set up the Persistent Storage
Manager automatically to schedule an instant virtual copy. This could be done
every night, for example, and users could be given access to their specific virtual
copies. If users accidentally delete or corrupt a file, they can drag-and-drop from
the virtual copy to their storage without any administrator involvement.
Archival backup
Archival backup is used to make full, incremental, or differential backup copies,
which are typically stored to tape. The NAS Persistent Storage Manager can
resolve the well-known “open file” problem of making backup copies in a 24x7
operation.
Read cache
The basic goal of a read cache is to get data into the processor as quickly as
possible. The read cache algorithms attempt to accomplish this goal by having
the most-often-used data kept in fast memory, such as RAM, rather than on disk.
Based on its algorithms, the cache makes a “best guess” of what data will be
needed next. These algorithms generally copy data into the faster read cache by
pre-fetching it from the slower storage, or by keeping a copy of the data from an
earlier write. Read caches are used heavily
because they can provide dramatic performance gains at a modest cost.
Write-back cache
The basic goal of a write-back cache is to “get rid of” data stored in the processor
as quickly as possible. In a write-back cache, the read cache is updated
immediately, but the change to the “real” (not read) cache location might be
slightly delayed as it uses a slower storage technology. During this time period,
the data waits in the write-back cache queue. Performance for the write-back
cache approach is very fast. The write operation completes as soon as the
(faster) read cache is updated, taking it “on faith” that the real location will also be
updated soon.
Write-through cache
In a write-through cache, a write operation is simultaneously updated in the
cache copy and in the “real” location, and a separate write-cache buffer is not
required. This approach is of course the simplest and “safest,” but unfortunately,
it is also slower than a write-back cache. A write-through cache is slower
because the write operation cannot complete until both copies are updated, as
the “real” (not the cache) copy is stored in a much slower technology (slower
RAM, or even a disk). Assuming that there is no battery backup, a write-through
cache approach is “safer” because both copies are always exactly the same,
even if the cache copy gets destroyed (for example, RAM cache during loss of
power), because the real copy (for example, the disk copy) has already been
updated.
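A minimal sketch of the difference between the two write policies follows, using plain Python dictionaries in place of RAM and disk. It only illustrates the ordering of updates; it is not the NAS code.

class Cache:
    # Toy cache in front of a slow backing store (a dict standing in for disk).
    def __init__(self, write_back=False):
        self.fast = {}        # the cache copy (RAM)
        self.slow = {}        # the "real" copy (disk)
        self.dirty = set()    # blocks written to fast but not yet to slow
        self.write_back = write_back

    def write(self, block, data):
        self.fast[block] = data
        if self.write_back:
            self.dirty.add(block)       # completes now; disk is updated later
        else:
            self.slow[block] = data     # write-through: disk updated before completion

    def flush(self):
        # Write-back only: push the delayed updates out to the slow copy.
        for block in self.dirty:
            self.slow[block] = self.fast[block]
        self.dirty.clear()

wb = Cache(write_back=True)
wb.write(0, b"data")
assert 0 not in wb.slow     # lost if power fails here and the cache has no battery
wb.flush()
assert wb.slow[0] == b"data"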
Microprocessor cache
Most current microprocessors, such as the Intel Pentium processors, use
multiple read and write cache mechanisms to improve performance. There are
instruction caches, data caches, and even register caches. Both write-through
and write-back schemes are often used. The most-widely known microprocessor
cache is the level 2 cache (L2). In earlier processors, such as the 486 processor,
the L2 cache was implemented with separate memory chips on the personal
computer motherboard. With today’s technology, this cache is included within the
microprocessor chip, module, or microprocessor carrier card.
A customer can purchase varying amounts of main-memory RAM for the engines
on IBM Network Attached Storage products. Some of this RAM is used for the
Windows Powered OS, but the vast majority is used as a large read cache for
the user data stored on the disk. When a NAS user requests a file, the NAS
engine can often satisfy the request from this read cache rather than reading it
from disk again.
If a NAS user writes data, this update must be made to both the disk copy and
the RAM copy (if any). In the IBM Network Attached Storage products, the main
RAM has error correction code (ECC) technology to protect against loss of data
due to a partial memory failure. However, this memory is not powered by
batteries, so all data in RAM is lost upon power down. Likewise, if there is a
power failure on the box, or should the operating system abend, the main RAM
contents will be lost. Furthermore, if write-back mode was being used when this
problem occurred, the data in the main RAM would never get written back to the
disk. To avoid this potential problem, the Windows Powered OS caches are
configured for write-through mode.
The ServeRAID-4LX adapter used in the IBM NAS 200 Model 201 workgroup
machine has 16 MB of internal RAM memory, most of which is for a disk-read
cache. This adapter does not have a battery-backed write cache. If this RAID
adapter is used in write-back mode, a failure at the wrong moment will result in
permanent lost data, even if the data is written to a redundant RAID configuration
(such as RAID 1 or RAID 5). For some operations this might be acceptable, but
for most cases, it would not. Therefore, this adapter should generally be run in
write-through mode, so that data integrity is not dependent on the cache
contents.
The ServeRAID-4H adapter used in the IBM NAS 200 Model 226 departmental
machine has 128 MB ECC battery-backed cache, 32 MB onboard processor
memory, and 1 MB L2 cache for complex RAID algorithms, which allows this
RAID controller to be safely configured for write-back operations. Should there be
a power failure or an abend in the NAS product before the write-to-disk
completes, the data to be written will still be contained in the battery-backed
RAM. When power is restored and the NAS product is rebooted, the RAID card
will be triggered to flush out all remaining information in the battery-backed RAM.
This data will be written to the disk, and all remaining write operations will be
completed automatically. For the best performance, this card can be safely run in
write-back mode.
In the IBM NAS 300, the RAID subsystem is not contained in the engine
enclosure itself but instead is contained in the first storage unit enclosure. Within
this storage unit enclosure are dual RAID controllers and dual power supplies to
provide a completely redundant solution with no single point of failure. Large
system configurations have a second identical RAID subsystem, which also has
dual RAID controllers. Each of the dual RAID controllers has 128 MB of internal
battery-backed ECC write RAM. The RAID subsystem can be safely
configured for write-back operations. Should power fail (or the NAS product
abend) before the write-to-disk completes, the data to be written is still contained
in the battery-backed RAM. When power is restored and the NAS device is
rebooted, the RAID will realize that there is information still in the battery-backed
RAM that must be written to the disk, and this write operation will complete
automatically. Additionally, as write-back data is stored in both of the dual RAID
controllers, this write-back will occur even if one of these RAID controllers fails.
For the best performance, this adapter can be run safely in write-back mode, as
all writes to the disks will eventually get written to the disk array.
The IBM NAS 300G does not have an integrated RAID controller, but instead
uses a disk subsystem that is SAN-attached. Based on the properties of that
SAN disk subsystem, the SAN administrator may choose to run that RAID
adapter in write-through or write-back mode, after considering performance and
potential data integrity tradeoffs, if any.
                                  Archival backup                      Point-in-time (persistent) image
Number of copies limitation       The number of tape cartridges or     Total disk storage space available
                                  availability of disk capacity to     (250 maximum images per volume)
                                  hold backup images
Used for                          NAS Operating System or NAS user     Mainly for NAS user (client) data
                                  (client) data backup                 backup
Stores volumes as an entity       No, but volumes are simply a         No, but volumes are simply a
                                  collection of files (dependent on    collection of files
                                  additional backup software)
Useful for disaster recovery      Yes (if written to tape)             No, as data is always stored on
where entire disk system is                                            disk within the same NAS
destroyed (for example, fire)                                          appliance. This approach is not
                                                                       useful if disk is destroyed
Useful for recovery where data    Yes, administrator recovery only     Yes, administrator or user
is accidentally erased or                                              recovery
modified
On the IBM NAS products, all of the following terms refer to the same
functionality:
Persistent image
True Image on Columbia Data Products
Point-in-time image
Instant virtual copy
Snapshot on NetApp or StorageTek
Usually, after a backup is made, the users will continue to update those files on
the disk. These backups will “turn stale” with time (that is, they will be outdated
after a while). However, it is very important that the data on the backup stays
exactly as it was when the backup was made.
Unfortunately, making a backup copy while the data is still changing is rather
difficult. Commonly encountered problems include:
While data is changing, multiple sectors are being written to disk
Write-back caches might not have completed writing to disk
An application that is changing two or more files “at the same time” will not
truly update both at the exact same instant
Therefore, for a good backup, these changes must not occur while the backup is
being made, so that all data written is consistent in all changed files.
Historically, this problem has been solved by disabling all users while the backup
occurs. However, this may take several hours. In today’s 24x7 environment,
having such a large backup window is simply not acceptable. In these NAS
systems, this problem is solved by making a very quick “instant virtual copy” of a
volume, a True Image copy.
Figure 5-1, Figure 5-2, and Figure 5-3 show the copy-on-write, normal read, and
reading of data from a persistent image, during the execution of the PSM.
Figure 5-1 Persistent Storage Manager — copy-on-write operation
Figure 5-2 Persistent Storage Manager — normal read (not from a persistent image)
Figure 5-3 Persistent Storage Manager — reading data from persistent image
In these examples, we assume that the disk originally contained only the
following phrase:
“Now is the time for all good men to come to the aid of their country.”
Table 5-2 shows the layout of how the disk would appear immediately after the
True Image copy is made. Note that nothing has really changed (while pointers
and control blocks have changed, for simplicity those details are not shown here).
Table 5-3 shows the layout of the PSM cache after “instant virtual copy” is made.
Notice that it contains empty cells.
Table 5-3 Layout of PSM cache after “instant virtual copy” is made
Table 5-4 shows the layout of how the disk would appear immediately after the
original file was erased. Note that a copy of the original file system (metadata,
and so on) is all that is saved.
Table 5-5 shows the layout of the PSM cache immediately after file is deleted.
Notice that the PSM cache contains a copy of the original file system data.
Table 5-6 shows the layout of how the disk would appear if the word “time” was
changed to “date”. For this example to be truly correct, we would further assume
the application program only wrote back the changed sectors (as explained later,
this is not typical). Table 5-6 illustrates how the sectors might appear.
Table 5-7 shows the layout in which the PSM cache would contain the original
sector contents for the word “time” and the file system’s metadata.
Table 5-8 shows the layout of how the disk would appear if the change requires
more space (for example, changing “men” to “women”). Since more space is
required, the data following the word “women” would also change. The original
contents of all changed sectors would have to be saved in the PSM cache. Note
that this example is not cumulative with examples B or C.
Table 5-9 shows the layout in which the PSM cache would contain all the
changed sectors, starting with the sector containing “men” and including the data
that slid to the right, together with the original file system’s metadata.
Individual sectors on a disk always have some ones and zeros stored in every
byte. Sectors are either “allocated” (in use) or “free space” (not in use or empty,
and the specific data bit pattern is considered as garbage). The disk file system
keeps track of which data is in what sector, and also which sectors are free
space.
Table 5-10 shows the layout of how the disk would appear following a “save”
operation after changing the word “time” to “date.” This assumes no free space
detection and no “update in place.” Note again that this example is not
cumulative with examples A through D.
Table 5-10 Layout of disk after changes without free space detection
After this “save” is complete, the new, saved information is written into free space
sectors #0015-#0028, and the original location sectors then turn into free space,
as indicated by #0001-#0014 in the preceding example.
Since the PSM cache works at the sector level and since this version of PSM
code is unaware of free space, PSM would copy the previous free-space sectors
to its cache as shown in Table 5-11.
Table 5-11 Layout of PSM cache after changes without free space detection
For the NAS code that shipped on 28 April 2001, PSM is enhanced and can
detect free space in the file system. Therefore, if data is written to the disk’s
free-space sectors, those free space sectors will not be copied to the PSM
cache.
Table 5-12 shows the layout of the disk in the event of a “save” operation after
changing the word “time” to “date,” with free space detection but not “update in
place.” Again, this example is not cumulative with previous examples.
Table 5-12 Layout of disk after changes with free space detection
Table 5-13 shows the layout of the PSM cache after saving the “time” to “date”
change. Here, since the PSM cache is aware that the new phrase is being stored
in free space, it does not copy the original free space contents into the cache,
and instead only updates the file system information containing pointers to the
data, and so on.
Table 5-13 Layout of PSM cache after changes with free space detection
Finally, note that in this situation, as the recycle bin is active on the NAS, these
save operations tend to “walk through disk storage” and write in free-space
sectors. Therefore, with free space detection (28 April 2001 code) the recycle bin
should be set to a higher number to minimize cache writes and minimize cache
size. For the 9 March 2001 code, the recycle bin should be set to a low number or
turned off, to minimize cache size.
Eventually, a save operation will need to use sectors that were not free space
when the original persistent image was made. Then the original contents are
copied into the PSM cache.
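The sector-level copy-on-write behavior that Tables 5-2 through 5-13 walk through can be sketched as follows. The class, the byte-per-sector granularity, and the sample phrase are illustrative only; free-space detection is omitted, and this is not the PSM implementation.

class PersistentImage:
    # Sketch of sector-level copy-on-write for one persistent image.
    def __init__(self, disk):
        self.disk = disk      # the live volume, as a list of sector values
        self.cache = {}       # sector number -> original contents at image time

    def write(self, sector, data):
        if sector not in self.cache:          # save the original exactly once
            self.cache[sector] = self.disk[sector]
        self.disk[sector] = data              # then let the live write proceed

    def read_image(self, sector):
        # Changed sectors come from the cache; unchanged ones from the disk.
        return self.cache.get(sector, self.disk[sector])

disk = list(b"Now is the time for all good men ...")
image = PersistentImage(disk)
for offset, ch in zip(range(11, 15), b"date"):      # change "time" to "date"
    image.write(offset, ch)
print(bytes(disk[11:15]))                                  # b'date' -- live volume
print(bytes(image.read_image(s) for s in range(11, 15)))   # b'time' -- persistent image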
Once a persistent image is created, the PSM cache must keep a copy of any and
all changes to the original file. Therefore, the cache for a specific True Image
copy could eventually grow to be as big as the original volume. The maximum
cache storage size is configurable by the administrator. If insufficient storage is
allocated, then not all the changes can be stored. The PSM cache would then be
made invalid as it would have some good and some missing information. For this
reason, if the PSM cache size is exceeded, the cache will be deleted, starting
with the oldest cache first. It is highly recommended that the NAS administrator
configure a warning threshold that will signal if the cache exceeds the warning
level. The administrator should choose the cache size wisely, as changing the
maximum size might require a NAS system to be rebooted.
PSM caches can neither be backed up nor restored from tape. Therefore, the
tape-archive backup program should not be configured to back up the PSM
caches.
The following examples will illustrate how files might appear. The name of the
special PSM folder is administrator-customizable, but in the following example,
the NAS administrator chose the name PSMCOPY.
First, let’s see how the directory looks without any persistent images. Say a user
has a D:\drive located as a “share” on network-attached storage, and that this
drive appears as follows:
D:\
   MY DOCUMENTS folder
      January Sales.doc
      February Sales.doc
      Sales Plan.doc
      Orders.123
   PROGRAM FILES folder
      Lotus Applications folder
         Notes
         123
         Freelance Graphics
   MULTIMEDIA FILES folder
   TEMP folder
   ZZZZ folder
The following example shows how the True Image copies within the PSMCOPY
folder would appear to the user. The PSMCOPY folder has been opened, and
persistent images were created at 10:00 a.m. on Monday, Tuesday, and
Wednesday.
D:\
   MY DOCUMENTS folder
   PROGRAM FILES folder
   MULTIMEDIA FILES folder
   PSMCOPY folder
      Mon_Mar_05_2001_10.00.00 folder
      Tue_Mar_06_2001_10.00.00 folder
         MY DOCUMENTS folder
            January Sales.doc
            February Sales.doc
            Sales Plan.doc
            Orders.123
         PROGRAM FILES folder
            Lotus Applications folder
               Notes
               123
               Freelance Graphics
         MULTIMEDIA FILES folder
         TEMP folder
         ZZZZ folder
      Wed_Mar_07_2001_10.00.00 folder
   TEMP folder
   ZZZZ folder
Stores volumes as an entity   No, but volumes are   Can only back up     Can only back up     No, but volumes are
                              simply a collection   volumes, not files   volumes, not files   simply a collection
                              of files                                                        of files
Space usage                   Changes only          Target volume size   Changes only         Changes only
                                                    = source volume
                                                    size
Note that these NAS products do not support making an archival copy of the
PSM cache itself. Therefore, when using the following recovery approaches, all
PSM True Image copies and PSM caches should be deleted.
Figure 5-4 shows the PSM schedule menu. This menu contains a list of
schedules for the PSM images to be captured.
First, most backup programs allow the administrator to select all files or a specific
subset of the files to be backed up. For these selected files, a full backup,
differential backup, or incremental backup can generally be requested. The
distinctions between the three types of backup are as follows:
When a full backup is taken, all selected files are backed up without any
exception.
When a differential backup is taken, all files changed since the previous full
backup are now backed up. Thus, no matter how many differential backups
are made, only one differential backup plus the original full backup are
needed for any restore operation. However, the administrator should
understand the particular backup software thoroughly because some backup
software will back up changed files—but not new files—during a differential
backup. When restoring from a differential backup, both the full backup and
the latest differential backup must be used.
An incremental backup is similar to a differential backup. When an
incremental backup is taken, all files changed since that previous incremental
backup are now backed up. When restoring from an incremental backup, the
full backup will be needed as well as all of the incremental backups.
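The selection logic behind the three backup types can be sketched as follows. The archive-bit handling mirrors the usual Windows convention (full and incremental backups clear the bit, a differential backup leaves it set); the data structure and function are illustrative only and are not taken from NTBackup or TSM.

def select_files(files, backup_type):
    # files: mapping of path -> {'archive_bit': bool}
    # Returns the paths to copy; clears the archive bit when the convention says to.
    if backup_type == "full":
        selected = list(files)                                          # everything
        clear_bit = True
    elif backup_type == "incremental":
        selected = [p for p, f in files.items() if f["archive_bit"]]    # changed since last full/incremental
        clear_bit = True
    elif backup_type == "differential":
        selected = [p for p, f in files.items() if f["archive_bit"]]    # changed since last full
        clear_bit = False        # bit stays set, so the next differential sees it too
    else:
        raise ValueError(backup_type)

    if clear_bit:
        for p in selected:
            files[p]["archive_bit"] = False
    return selected

files = {
    "Sales Plan.doc": {"archive_bit": True},
    "Orders.123": {"archive_bit": False},
}
print(select_files(files, "differential"))   # ['Sales Plan.doc'], bit left set
print(select_files(files, "incremental"))    # ['Sales Plan.doc'] again, bit now cleared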
The NAS administrator can decide to perform a backup using all of the files from
a specific True Image copy, or only some files from it. However, while the
administrator can take incremental or differential backups of the drive
represented by a virtual image, the administrator cannot back up the PSM
persistent image cache files themselves. Therefore, should you have a situation
where you have to restore user data from tape, the persistent images will be lost.
The following example illustrates how True Image copies can be used in the
backup and restoration process:
On Monday, a True Image copy is taken of drive E:\. A full tape backup of that
image is made. After this backup is completed, the True Image can be kept or
deleted. If the copy is kept, it can be subsequently used to restore Monday’s
files.
However, if the NAS system is later destroyed (for example, by an earthquake)
and the data is restored from that tape, the administrator cannot restore the
specific PSM cache files that otherwise would have been available if the disaster
had never occurred.
To assist the NAS administrator in making backups using either TSM or ISV
software with PSM persistent image technology, IBM has provided the IBMSNAP
utility. Using this utility requires knowledge of Windows batch files and a
command line backup utility. IBMSNAP.EXE is a command line utility that creates
a PSM persistent image virtual drive, launches backup batch files, and then sets
the archive bits accordingly on the drive being backed up. It can be used in
conjunction with any third-party backup utilities as long as the utility supports
command line backups. The IBMSNAP.EXE utility can be found in the
c:\nas\ibm\nasbackup directory of the NAS operating system. See the online
NAS help for further details.
NT backup
The IBM Network Attached Storage products are pre-loaded with Windows
NTBackup and the NAS Backup Assistant. This approach can be used to back
up operating system data or user data, either to disk or tape. The pre-loaded
Persistent Storage Manager function is the recommended method of resolving
the “open file” problem.
There are two ways to back up the files in the NAS appliance when you use the
NT backup method. You can either access it through the NAS administration
console or the Windows Terminal Services. The NAS administration console is
accessed via the Maintenance -> System Backup and Restore -> Backup
option. For this approach you should first create a Persistent Image before the
NT Backup is started. Use this method if you want to back up a selected folder
from one of the persistent images, or the system partition.
The other method is to use the NAS Backup Assistant tool. The NAS Backup
Assistant automatically creates a Persistent Image and starts the NT Backup
program. Use this method to back up the data in a volume or file level basis.
These are the steps to be executed:
1. Use Windows Terminal Services from any NAS client to access the NAS
appliance.
2. Select Start -> IBM NAS Admin.msc -> Backup and Restore.
3. This leads you to the IBM NAS Admin display.
4. Select Backup and Restore -> IBM NAS Backup Assistant from the left
pane.
5. In the right pane, the following options appear:
– Backup Operations: Select drive, schedules, backup types, backup
methods, destination type, file path or tape name.
– Schedule Jobs: List jobs scheduled for backups. You can also delete jobs
that have been scheduled but not yet executed.
– Backup Logs: Shows logs of all backups. You can view or delete logs here.
– Display Logs: Allows you to display the logs.
Note: You must ensure that a check mark appears on the directory or
individual files during the selection process. Otherwise, nothing will be backed
up or restored.
As in the NTBackup method, you will have to ensure that the persistent images
are created before activating this backup function. Automated scheduling to back
up these PSM images can then be configured in the TSM server.
The TSM client uses an option file to store its configuration. Once the setup is
completed, it creates an option file on the IBM NAS appliance in the following
directory and file name: C:\Program Files\Tivoli\TSM\baclient\dsm.opt
NODENAME IBM_NAS_TSM_CLIENT
PASSWORDACCESS GENERATE
DOMAIN "(\\ibm-23ttn07\share_e)"
DOMAIN "(\\ibm-23ttn07\share_g)"
DOMAIN ALL-LOCAL
TCPSERVERADDRESS 192.1.1.5
Figure 5-5 Sample output of TSM client’s dsm.opt file in the NAS Appliance
For the backup to work, the TSM Server must have its client’s nodename
registered in its configuration files. In this case, it will be the NAS Appliance’s
nodename.
To back up the files from the TSM Client, follow these steps:
1. Use Windows Terminal Services from any NAS client to access the NAS
appliance.
To restore, just follow the preceding steps, but select Restore in step 4 instead of
Backup.
Note: You must ensure that a check mark appears on the directory or
individual files during the selection process. Otherwise, nothing will be backed
up or restored.
However, a limited number of add-on applications have been tested with these
NAS products, and customers may add those specific software applications to
the system. Should a customer have problems with non-IBM software that they
have added to this appliance, the customer should contact the vendor directly, as
IBM does not provide on-site or remote telephone support for those non-IBM
products.
IBM will continue to support hardware and software that is shipped with the NAS
appliance. However, in certain circumstances, any non-IBM software may have to
be uninstalled for IBM service to provide problem determination on the IBM
hardware and software.
IBM has tested, and will continue to test, a variety of vendor software products.
Customers can go to the IBM Support Web site at
http://www.ibm.com/storage/nas to see the status and additional details of this
testing.
Note: For more information, read the IBM white paper by Jay Knott entitled
"NAS Cache Systems, Persistent Storage Manager and Backup" available at:
http://www.storage.ibm.com/snetwork/nas/whitepaper_nas_cache_systems.html
We want to emphasize that these are generalized examples, and our objective is
to bring together some of the ways in which you can benefit from IBM’s new
products. Inevitably we will not cover every possible use. Customers are often
very inventive, and think of new things they can do, which further enhance the
portfolio of solutions! However, the examples we include here are typical of the
way we believe users will begin to exploit the functions and capacity offered by
NAS and iSCSI storage.
In this scenario, multiple Windows NT file servers on the LAN are consolidated
onto a NAS 200 (up to 1.74 TB) or a NAS 300 (up to 3.24 TB) with integrated disk.
Needs: to simplify data management of file servers, and to simplify adding
storage to file servers. Benefits: consolidates file-server management, eases
storage management, and simplifies adding additional storage.
Figure 6-1 Implementation of storage consolidation with the NAS 200 and 300
A similar consolidation scenario uses the NAS 300G in front of a Fibre Channel
SAN. Needs: simplify data and storage resource management of file servers, and
simplify adding storage to file servers. Benefits: consolidates file-server
management, eases storage management, and simplifies adding additional storage.
The attractions of the NAS appliances are the ease of management, along with
the availability of advanced functions, such as RAID, instantaneous copy of files
for easier backup processes, using Persistent Storage Manager, and so on.
Figure 6-3 shows how you can still make use of your 7133 with 300G.
Scenario: 7133 database management. In the current environment, pSeries
(RS/6000) servers access 7133 disk through SSA adapters. In the solution, a NAS
300G on the LAN connects through a Fibre Channel SAN and SLIC adapters to the
existing 7133 disk. Needs: to simplify data management of file servers, and to
simplify adding storage to file servers. Benefits: consolidates data management
and helps protect your 7133 investment.
Other benefits also accrue. Storage scalability is enhanced and growth can take
place with minimum disruption. Storage space can be re-allocated as required,
based on changing user needs. Backup processes can be automated using
Persistent Storage Manager (see Chapter 5, “Backup for IBM Network Attached
Storage” on page 191).
File server consolidation with NAS 300G: in the current environment, clients reach
several Windows NT file servers over the LAN; in the proposed solution, a NAS
300G provides file sharing from a Fibre Channel SAN to the same clients. Needs:
simplify management of file servers, and simplify adding storage to file servers.
Benefits: reduces the number of file servers, provides heterogeneous file sharing
on the SAN, simplifies storage management, and simplifies adding additional
storage.
In this configuration, two NAS 300G appliances (one acting as the MDC) attach to
the LAN and to Fibre Channel SANs. Needs: provide end users with access to SAN
storage, provide heterogeneous file sharing, and reduce LAN traffic. Benefits:
provides heterogeneous file sharing, reduces traffic over the LAN, and gives file
access at Fibre Channel speed.
In the following sections, we illustrate two possible configurations.
In the first configuration, the disk volumes (Vol 1 to Vol 4) are backed up directly
over the SAN.
Note that in this case, no data is transferred through the LAN, not even metadata.
That is because there is no backup/restore action on your application server.
In the second configuration, the TSM server attaches to the LAN, and a machine
running the MDC, the TSM client, and the SANergy client attaches to both the LAN
and the SAN, with the disk volumes (Vol 1 to Vol 4) on the SAN.
When the Tivoli Storage Manager client begins to back up the data, it will need to
get the metadata from the MDC machine. For this purpose, the TCP/IP transport
over the LAN will be used. But the raw data still will be transmitted through the
SAN.
become unpredictable, and lead to inconsistent performance and response
times. In the world of e-business this is an unacceptable situation, because Web
users are potentially your customers. Poor service levels will drive them into the
arms of your competitors.
Each server has its own storage, but the data related to Web pages is exactly the
same. It is costly to continue to grow in this manner, multiplying the number of
data copies with the addition of each new Web server. The ideal solution is to
have consolidated storage which all Web servers can access concurrently. One
possible solution is to move to, or increase investment in, a Fibre Channel SAN.
However, the cost of building this new, high-speed storage infrastructure may be
high, especially for low-cost NT servers (which typically are the Web servers).
Also, the time required to implement a SAN solution is long. An alternative that is
much lower in cost, and easier to implement rapidly, is to install a NAS appliance
to handle Web serving.
In Figure 6-9 we show how a NAS 200 or 300 would provide an excellent
Web-serving, consolidated storage solution, at low cost, and with minimum time
to install. New investment in servers is minimized, and Web services can easily
be isolated from other mission-critical applications.
Web hosting: in the current environment, internal users, business clients, and Web
surfers and shoppers all reach the database, transaction, and mission-critical
servers over LANs and WANs, with storage on a Fibre Channel SAN. In the
solution, a NAS 200 is added as a dedicated Web server alongside those servers.
Needs: increase storage due to business growth, provide high-speed Web streaming
to clients, share storage among multiple Web servers, keep costs low, and reduce
CPU load on mission-critical servers. Benefits: a high-performance dedicated Web
server, minimized investment in additional servers, storage pooling,
heterogeneous Web file serving, use of the existing infrastructure, tools, and
processes, and isolation of Web clients.
Internet Data Centers (IDCs) also want plenty of cheap storage to offer their
clients. An IDC offers a physical location for storage, to support anything the
customer wants to put on the box. Corporate IT centers use IDCs, as do SSPs
and Web-hosting ISPs. The benefit of an IDC is that it is located adjacent to a
fiber optic line, so it eliminates the customer's need to run fiber optic cable to
their own premises, saving them thousands of dollars per month. Most IDC customers do
not require high availability.
Integrated NAS solutions are also used frequently for video streaming storage
service on the Web, and as a vehicle for providing a place to do backups for an
office workgroup or department.
Video streaming frequently runs with CIFS protocol, for which NAS 200 and 300
are well suited. Video streaming is also typically not an application where failover
is required. NAS 200 and 300 will also allow several users to view the file
simultaneously, whereas the IP 200i is good for a direct feed to a single client.
6.6.1 Database solutions
The example illustrated in Figure 6-10 shows the use of the IP Storage 200i to
enable a small- to medium-sized data center to exploit their existing IP network to
support a number of database or low volume transaction-oriented applications.
The 200i is an ideal, flexible solution for an organization that needs to keep
implementation simple and low cost, and to avoid the need to develop new skills,
as would be necessary with a Fibre Channel SAN.
In the current environment, high-performance database and transaction servers
attach directly to the data center IP infrastructure. In the solution, an IP Storage
200i provides a block I/O environment with pooled storage over that same IP
infrastructure. Needs: additional storage for database and transaction servers,
limited IT skills, pooled storage for availability, flexibility, and scalability, and a
low-to-moderate transaction volume. Benefits: pooled and centralized storage,
non-disruptive growth, centralized storage management, and use of existing
network and IP skills.
A related scenario adds pooled storage behind the LANs and WANs alongside the
existing NAS. Needs: add database applications to Web serving, share storage
among database servers, and reduce SAN implementation costs. Benefits: pooled
storage for the database applications, complements and coexists with the NAS
solution, uses the existing IP infrastructure, tools, and processes, and isolates
Web clients from the database applications.
6.7 Positioning storage networking solutions
Table 6-1 provides a brief summary of all the Storage Networking solutions we
have described thus far.
Table 6-1 Summary of storage networking solutions
SAN                        NAS                        iSCSI                    SANergy
Better with block I/O      Better with file I/O       Block I/O                File I/O plus block I/O
(database) applications    applications               IP based                 NAS file sharing with
FC storage sharing         IP based                   Storage sharing          SAN performance
                           File sharing
                           Slower database
                           performance than SAN
                           or iSCSI
Today, under normal circumstances, Assembler is no longer used to write
application programs since the cost of writing in Assembler is much higher
than the cost of the “wasted” storage and CPU power incurred with PL/1. This
is because the cost of hardware has fallen substantially over time. A similar
approach can be expected in the storage arena. It is likely that application
developers will leave the lower layer functionality to the operating systems,
especially as new storage technologies emerge.
2. All file I/Os ultimately result in block I/O commands at the lower layers. In other words,
iSCSI devices, like other storage systems which support storage protocols,
also support file I/O applications. In this case, it should be noted that the
“visibility” of the files is lost. The iSCSI device, like DAS and SAN attached
storage, knows nothing about the “files,” but only about “raw I/O” or blocks. It
is for this reason that NAS devices should be considered only for file I/O
applications, whereas iSCSI appliances are well suited to general purpose
storage applications, including file I/O applications.
The preceding chapters in this book covered IBM’s announced solutions using
storage over IP networks. In this chapter, we describe some of the other
technologies which are emerging, or are in the process of being introduced into
the storage market. In general, the developments come from groups of
co-operating companies, and they address varying connectivity and data
transmission issues arising from today’s diverse customer networking
environments. Many of these developments are complementary, and combine to
enhance your choices, and benefit the solutions you plan to implement today.
IBM is an active participant in many of these industry initiatives.
iFCP uses TCP to provide congestion control, error detection, and recovery.
iFCP's primary objective is to allow interconnection and networking of existing
Fibre Channel devices at wire speeds over an IP network. The protocol's method
of frame translation enables the transparent attachment
of Fibre Channel storage devices to an IP-based fabric by means of lightweight
gateways. The protocol achieves this transparency through an address
translation process. This allows normal frame traffic to pass through the gateway
directly, with provisions for intercepting and emulating the fabric services
required by an FCP device.
In its simplest form of iFCP implementation, the Fibre Channel devices are
directly connected to the iFCP fabric through F_PORTs, which are implemented
as part of the edge switch or gateway. At the N_PORT interface on the Fibre
Channel side of the gateway, the network appears as a Fibre Channel fabric.
Here, the gateway presents remote N_PORTs as directly attached devices.
Conversely, on the IP side, the gateway presents each locally connected
N_PORT as a logical iFCP device on the IP network.
For more information on this topic, visit the following Web site:
http://www.ietf.org
FCIP Protocol
The FCIP Protocol consists of the following:
FCIP Device: This term generally refers to any device that encapsulates FC
frames into TCP segments and reassembles TCP segments to regenerate
FC frames. It may be a stand-alone box, or integrated with an FC device such
as an FC backbone switch. It could also be integrated with any TCP/IP device,
such as an IP switch or an IP router. The FCIP device is a transparent
translation point. The IP network is not aware of the FC payload that it is
carrying. Similarly, the FC fabric and FC end nodes are not aware of the
IP-based transport.
Protocol: The FCIP protocol specifies the TCP/IP encapsulation, mapping
and routing of FC frames. It applies these mechanisms to an FC network
utilizing IP for its backbone (or more generally, between any two FC devices).
FCIP Header Format: This header consists of its version number, header
length, frame length, and its reserved bits.
The FCIP device always delivers entire FC frames to the FC ports to which it is
connected. The FC ports must remain unaware of the existence of the IP
network that provides, through the FCIP devices, the connection for these FC
ports. The FCIP device also treats all classes of FC frames the same, that is, as
datagrams.
For more information on this topic, visit the following Web site:
http://www.ietf.org
Scalability needs are addressed in two ways. First, the I/O fabric itself is
designed to scale without encountering the latencies that some shared bus I/O
architectures experience as workload increases. Second, the physical modularity
of InfiniBand Technology will avoid the need for customers to buy excess
capacity up-front in anticipation of future growth. Instead, they will be able to buy
what they need at the outset and “pay as they grow,” to add capacity without
impacting operations or installed systems.
For more information on this topic, visit the following Web site:
http://www.infinibandta.org
VI providers process the posted descriptors asynchronously, and mark them with
a status value when completed. VI consumers will remove these completed
descriptors from the work queues and reuse them for subsequent requests. Each
work queue has an associated doorbell that is used to notify the VI network
adapter whenever a new descriptor has been posted to a work queue. There is
no operating system intervention to operate the doorbell since this is
implemented directly by the adapter.
VI provider
The VI provider is the set of hardware and software components responsible for
initiating a virtual interface. The VI provider consists of a network interface
controller (NIC) and a kernel agent. The VI NIC implements the virtual interfaces
and completion queues and directly performs data transfer functions.
The kernel agent is a privileged part of the operating system. This is usually a
driver supplied by the VI NIC vendor; it provides setup and resource
management functions which are needed to maintain a virtual interface between
VI Consumers and VI NICs. These functions include the creation and destruction
of VIs, VI connection setup and teardown, interrupt management and/or
processing, management of system memory used by the VI NIC, and error
handling. Standard operating system mechanisms, such as system calls, are
used by the VI consumers to access the kernel agent. Kernel agents interact with
VI NICs through standard operating system device management mechanisms.
The operating system makes the system calls to the kernel agent to create a VI
on the local system and connect it to a VI on a remote system. Once a
connection is established, the operating system facility posts the application’s
send and receive requests directly to the local VI.
The operating system communication facility often loads a library that abstracts
the details of the underlying communication provider, in this case the VI and
kernel agent. This component is shown as the VI user agent in Figure 7-1. It is
supplied by the VI hardware vendor, and conforms to an interface defined by the
operating system communication facility.
Completion queues
Completed requests can be notified directly to a completion queue on a per-VI
work queue basis. This association is established when a VI is created. Once a
VI work queue is associated with a completion queue, all completion
synchronization must take place on that completion queue.
As with VI work queues, notification status can be placed into the completion
queue by the VI NIC without an interrupt, and a VI consumer can synchronize on
a completion without a kernel transition.
Figure 7-3 on page 248 shows the VI architecture completion queue model.
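The descriptor and completion-queue interaction can be pictured with a small software analogy. In the real VI architecture this work is done by the NIC without kernel transitions; none of the names below come from an actual VI provider API.

from collections import deque

class WorkQueue:
    # Toy model of a VI work queue with an associated completion queue.
    def __init__(self, completion_queue):
        self.pending = deque()
        self.cq = completion_queue

    def post(self, descriptor):
        descriptor["status"] = None        # the consumer posts the descriptor ...
        self.pending.append(descriptor)    # ... and "rings the doorbell"

    def nic_process_one(self):
        # The VI NIC processes descriptors asynchronously and marks a status value.
        d = self.pending.popleft()
        d["status"] = "complete"
        self.cq.append(d)                  # completion is notified on the completion queue

cq = deque()
send_queue = WorkQueue(cq)
send_queue.post({"op": "send", "buffer": b"hello"})
send_queue.nic_process_one()
done = cq.popleft()                        # the consumer synchronizes on the completion queue
print(done["status"])                      # complete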
For more information on the virtual architecture, visit the following Web site:
http://www.viarch.org
For the local or network file system, data is copied into a buffer cache. It is then
copied into the application’s private buffer. File access over network file systems
incurs additional data copies in the networking stack. Some operating systems
can bypass the buffer cache copy in certain cases, but all reads over a traditional
network file system require at least one data copy.
In this way, data can be transferred to and from a client application’s buffers
without any CPU overhead on the client side. To avoid extra data copies on write
requests, a traditional local or remote file system must lock down the
application’s I/O buffers before each request. A DAFS client allows an application
to register its buffers with the NIC once, which avoids the per-operation
registration overhead currently incurred.
NDMP uses the External Data Representation (XDR) and TCP/IP protocols as
foundations. The key goals of NDMP include interoperability, contemporary
functionality, and extensibility.
NDMP
Network Data Management Protocol. An open protocol for enterprise-wide
network-based backup.
NDMP client
The application that controls the NDMP server.
NDMP server
The virtual state machine on the NDMP host that is controlled using the NDMP
protocol. There is one of these for each connection to the NDMP host. This term
is used independent of implementation.
In the simplest configuration, an NDMP client will back up the data from the
NDMP host to a backup device connected to the NDMP host.
NDMP can be used to back up data to a backup device in a tape library that is
physically attached to the NDMP host. In this configuration, there is a separate
instance of the NDMP server to control the robotics within the tape library. This is
shown in Figure 7-7.
This architecture can also back up a host that supports NDMP, but which does
not have a locally attached backup device. This is achieved by sending the data
through a raw TCP/IP connection to another NDMP host. A logical view of this
configuration is shown in Figure 7-8.
Tape-to-tape copy function could be used to duplicate the backup tape for off-site
storage, while data-to-data copy is used to restore the entire data from one disk
to another disk.
Extending NAS protocols to share data over SANs effectively eliminates the
distinction between NAS and SANs, allowing them to be managed and
administered as one logical network that simply has varying means of physical
connectivity. In both cases, storage is attached to, and heterogeneously shared
via, some kind of network: typically, Ethernet for LAN-attached storage and Fibre
Channel for SAN-attached storage.
White papers from this Work Group will provide customers with an understanding of
discovery within the SAN, how it fits into the overall management scheme of the
SAN, and how SAN storage management and data management software will
use it.
The Backup Work Group maintains a prioritized list of topics and problems that
are viewed as current or important to the community of backup providers, backup
consumers and to SAN/NAS element providers with a stake in backup
technology. The Backup Work Group has an objective of promoting all draft
specifications to standards bodies whenever possible. Currently, this Work
Group is addressing a number of issues, including these Subcommittees:
Snapshot/Checkpoint/Quiesce Subcommittee
Currently a large number of application, database, or supporting software
companies produce or are planning to produce a snapshot capability. A large
number of software companies produce software that must either invoke a
snapshot (such as backup software) or use a snapshot (such as recovery
software). The large increase in connectivity afforded by storage network
technology amplifies the need for uniform interfaces for snapshot, quiesce and
checkpoint. The market values a general solution. Providing a general solution
requires that each snapshot-using software product handle all the different
snapshot types. The Snapshot/Checkpoint Subcommittee is defining a standard
API for creating snapshots and checkpoints. A standard API will reduce
complexity and encourage interoperability.
Details of these work groups can be found at the IETF Web site:
http://www.ietf.org
The Work Group cannot assume that any changes it desires will be made in
these standards, and hence will pursue approaches that do not depend on such
changes unless they are unavoidable. In that case, the Work Group will create a
document to be forwarded to the standards group responsible for the technology,
explaining the issue and requesting the desired changes be considered. The
Work Group will endeavor to ensure high quality communications with these
standards organizations. It will consider whether a layered architecture providing
common transport, security, and/or other functionality for its encapsulations is the
best technical approach.
Use of IP-based transports raises issues that do not occur in the existing
transports for the protocols to be encapsulated. The Work Group will address at
least the following:
Congestion control suitable for shared traffic network environments such as
the Internet.
Security measures, including authentication and privacy, sufficient to defend
against threats up to and including those that can be expected on a public
network.
Naming and discovery mechanisms for the encapsulated protocols on
IP-based networks, including both discovery of resources (for example,
storage) for access by the discovering entity, and discovery for management.
Management, including appropriate MIB definition(s).
The Work Group specifications will provide support for bridges and gateways that
connect to existing implementations of the encapsulated protocols. The Work
Group will preserve the approaches to discovery, multi-pathing, booting, and
similar issues taken by the protocols it encapsulates to the extent feasible.
It may be necessary for traffic utilizing the Work Group's encapsulations to pass
through Network Address Translators (NATs) and/or firewalls in some
circumstances; the Work Group will endeavor to design NAT- and firewall-friendly
protocols that do not dynamically select target ports or require Application Level
Gateways.
The standard Internet checksum is weaker than the checksums used by other
implementations of the protocols to be encapsulated. The Work Group will
consider what levels of data integrity assurance are required and how they
should be achieved.
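For reference, the standard Internet checksum referred to here is the 16-bit one's-complement sum defined in RFC 1071. The short sketch below shows both the calculation and one of its weaknesses: swapping two 16-bit words leaves the checksum unchanged.

def internet_checksum(data: bytes) -> int:
    # 16-bit one's-complement sum over 16-bit words (RFC 1071).
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return ~total & 0xFFFF

# Reordering 16-bit words goes undetected -- one reason this checksum is
# considered weak for storage traffic.
print(hex(internet_checksum(b"\x12\x34\x56\x78")))
print(hex(internet_checksum(b"\x56\x78\x12\x34")))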
The Work Group will produce a framework document that provides an overview
of the environments in which its encapsulated protocols and related protocols are
expected to operate. The Work Group will produce requirements and
specification documents for each protocol encapsulation, and may produce
applicability statements. The requirements and specification documents will
consider both disk and tape devices, taking note of the variation in scale from
single drives to large disk arrays and tape libraries, although the requirements
and specifications need not encompass all such devices.
The wise IT professional will not get too carried away with promises for
tomorrow, because “tomorrow never comes.” It is smart to be aware of possible
advances, but you cannot make a sensible investment decision until the promise
is delivered. After all, many “wonder technologies” never really make it in the
market, and others arrive later than expected. Yet again, something previously
ignored may cause a much greater than anticipated impression in the market,
when its potential is fully understood.
Our recommendation is this: Focus on the solutions we can deliver now, or in the
near future. We hope we have shown you that these solutions offer cost
effectiveness and great flexibility; and IBM is committed to open standards, now
and for the future.
What is RAID
RAID is an architecture designed to improve data availability by using arrays of
disks in conjunction with data striping methodologies. The idea of an array—a
collection of disks the system sees as a single device—has been around for a
long time. In fact, IBM was doing initial development of disk arrays as early as the
1970s. In 1978, IBM was issued the patent for a disk array subsystem. At that
time, however, the cost of technology precluded the use of RAID in products.
The original Berkeley paper emphasized performance and cost. The authors
were trying to improve performance while lowering costs at the same time. In
their efforts to improve reliability, they designed the fault tolerance and logical
data redundancy which was the origin of RAID. The paper defined five RAID
architectures, RAID Levels 0 through 5. Each of these architectures has its own
strengths and weaknesses, and the levels do not necessarily indicate a ranking
of performance, cost, or availability. Other RAID levels and combinations have
been defined in subsequent years.
In the case of a six-drive array, the “logical” disk has six completely independent
head mechanisms for accessing data, so the potential for improved performance
is immediately apparent. In the optimal situation all six heads could be providing
data to the system without the need for the time-consuming head-seeks to
different areas of the disk that would be necessary were a single physical disk
being used. RAID can be implemented using specialized hardware, or in software,
most commonly within the operating system.
RAID-0
RAID-0, sometimes referred to as disk striping, is not really a RAID solution since
there is no redundancy in the array at all. The disk controller merely stripes the
data across the array so that a performance gain is achieved. This is illustrated in
Figure A-1 on page 265.
It is common for a striped disk array to map data in blocks with a stripe size that is
an integer multiple of real drive track capacity. For example, the IBM ServeRAID
controllers allow stripe sizes of 8 KB, 16 KB, 32 KB or 64 KB, selectable during
initialization of the array.
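The block-to-disk mapping of a striped array can be expressed in a few lines; the stripe size and disk count below are example values only.

STRIPE_SIZE_KB = 16          # for example, one of the ServeRAID-selectable sizes
DISKS = 4

def locate(logical_kb):
    # Map a logical offset (in KB) to (disk index, offset on that disk) for RAID-0.
    stripe_number = logical_kb // STRIPE_SIZE_KB
    disk = stripe_number % DISKS
    offset = (stripe_number // DISKS) * STRIPE_SIZE_KB + (logical_kb % STRIPE_SIZE_KB)
    return disk, offset

for kb in (0, 16, 32, 48, 64):
    print(kb, "->", locate(kb))
# 0 -> (0, 0), 16 -> (1, 0), 32 -> (2, 0), 48 -> (3, 0), 64 -> (0, 16)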
In Figure A-1, a logical disk holding blocks 0, 1, 2, 3, ... is mapped onto four
physical disks: blocks 0 to 3 form the first stripe across the four drives, blocks 4
to 7 the second, blocks 8 to 11 the third, and so on.
Certain operating systems, including Windows NT, provide direct support for disk
mirroring. There is a performance overhead, however, as the processor has to
issue duplicate write commands. Hardware solutions where the controller
handles the duplicate writes are preferred.
In the figure, the logical disk blocks 0, 1, 2, ... are each written to two of the
physical disks, so every data stripe exists on two drives.
As you can see, any one disk can be removed from the array without loss of
information, because each data stripe exists on two physical disks. The controller
detects a failed disk and redirects requests for data from the failed drive to the
drive containing the copy of the data. When a drive has failed, the replacement
drive can be rebuilt using the data from the remaining drives in the array.
When a disk fails, only one copy of the data that was on the failed disk is
available to the system. The system has lost its redundancy, and if another disk
fails, data loss is the result. When a failed disk is replaced, the controller rebuilds
the data that was on the failed disk from the remaining drives and writes it to the
new disk, restoring the redundancy.
To avoid having to manually replace a failed disk, the IBM ServeRAID controller
implements hot spare disks that are held idle until a failure occurs, at which point
the controller immediately starts to rebuild the lost data onto the hot spare,
minimizing the time when redundancy is lost. The controller provides data to the
system while the rebuild takes place. When you replace the failed drive, its
replacement becomes the array’s new hot spare.
RAID-3
RAID-3 stripes data sequentially across several disks. The data is written or
retrieved in one parallel movement of all of the access arms. RAID-3 uses a
single dedicated disk to store parity information, as shown in Figure A-3.
Because of the single parallel movement of all access arms, only one I/O can be
active in the array at any one time.
Because data is striped sequentially across the disks, the parallel arm movement
yields excellent transfer rates for large blocks of sequential data, but renders
RAID-3 impractical for transaction processing or other high throughput
applications needing random access to data. When random processing does
take place, the parity disk becomes a bottleneck for write operations.
RAID-3 can withstand a single disk failure without losing data or access to data.
It is well-suited for imaging applications.
In Figure A-3, a disk controller stripes data blocks 0 to 3, 4 to 7, and 8 to 11
across disks 1 through 4, while disk 5 is dedicated to the parity for each stripe.
RAID-5
RAID-5 is one of the most capable and efficient ways of building redundancy into
the disk subsystem. The principles behind RAID-5 are very simple and are
closely related to the parity methods sometimes used for computer memory
subsystems. In memory, the parity bit is formed by evaluating the number of 1
bits in a single byte. For RAID-5, if we take the example of a four-drive array,
three stripes of data are written to three of the drives and the bit-by-bit parity of
the three stripes is written to the fourth drive.
As an example, we can look at the first byte of each stripe and see what this
means for the parity stripe. Let us assume that the first byte of stripes 1, 2, and 3
are the letters A, B, and G respectively. The binary code for these characters is
01000001, 01000010 and 01000111 respectively.
We can now calculate the first byte of the parity block. Using the convention that
an odd number of 1s in the data generates a 1 in the parity, the first parity byte is
01000100 (see Table A-1). This is called Even Parity because there is always an
even number of 1s if we look at the data and the parity together. Odd Parity could
have been chosen; the choice is of no importance as long as it is consistent.
Table A-1 Generation of parity data for RAID 5

Stripe 1 (A)   Stripe 2 (B)   Stripe 3 (G)   Parity
0              0              0              0
1              1              1              1
0              0              0              0
0              0              0              0
0              0              0              0
0              0              1              1
0              1              1              0
1              0              1              0
Calculating the parity for the second byte is performed using the same method,
and so on. In this way, the entire parity stripe for the first three data stripes can be
calculated and stored on the fourth disk.
The presence of parity information allows any disk to fail without loss of data. In
the above example, if drive 2 fails (with B as its first byte) there is enough
information in the parity byte and the data on the remaining drives to reconstruct
the missing data. The controller has to look at the data on the remaining drives
and calculate what drive 2’s data must have been to maintain even parity.
Because of this, a RAID-5 array with a failed drive can continue to provide the
system with all the data from the failed drive.
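The parity arithmetic is simply a bitwise exclusive OR (XOR). The short Python sketch below (illustrative only) reproduces the A, B, G example from Table A-1 and then rebuilds the lost byte of the failed drive from the parity and the surviving data:

# RAID-5 parity sketch using the A, B, G bytes from Table A-1.
a, b, g = 0b01000001, 0b01000010, 0b01000111    # 'A', 'B', 'G'

parity = a ^ b ^ g                              # parity byte written to the fourth drive
print(format(parity, "08b"))                    # 01000100, as calculated in the text

# Simulate losing drive 2 (the 'B' byte) and reconstructing it:
rebuilt = parity ^ a ^ g                        # XOR of parity and the surviving data
assert rebuilt == b
print(chr(rebuilt))                             # 'B'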
Performance will suffer, however, because the controller has to look at the data
from all drives when a request is made to the failed one. A RAID-5 array with a
failed drive is said to be critical, since the loss of another drive will cause lost
data. For this reason, the use of hot spare drives in a RAID-5 array is as
important as in RAID-1.
The simplest implementation would always store the parity on disk 4 (in fact, this
is the case in RAID-4, which is hardly ever implemented for the reason about to
be explained). Disk reads are then serviced in much the same way as a level 0
array with three disks. However, writing to a RAID-5 array would then suffer from
a performance bottleneck. Each write requires that both real data and parity data
are updated. Therefore, the single parity disk would have to be written to every
time any of the other disks were modified. To avoid this, the parity data is also
striped, as shown in Figure A-4 on page 270, spreading the load across the
entire array.
The consequence of having to update the parity information means that for every
stripe written to the virtual disk, the controller has to read the old data from the
stripe being updated and the associated parity stripe. Then the necessary
changes to the parity stripe have to be calculated based on the old and the new
data. All of this complexity is hidden from the processor, but the effect on the
system is that writes are much slower than reads. This can be offset to a great
extent by the use of a cache on the RAID controller. The IBM ServeRAID
controllers have cache as standard, which is used to hold the new data while the
calculations are being performed.
Figure A-4 RAID-5: data and parity striped across all the physical disks in the array
Meanwhile, the processor can continue as though the write has taken place.
Battery backup options for the cache, available for some controllers, mean that
data loss is kept to a minimum even if the controller fails with data still in the
cache.
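The read-modify-write sequence described above can be sketched as follows (a simplification that assumes a small, single-stripe write; the names and values are illustrative only):

# RAID-5 small-write ("read-modify-write") sketch.
# Updating one data stripe also requires updating the parity stripe:
#   new_parity = old_parity XOR old_data XOR new_data
def raid5_small_write(old_data, old_parity, new_data):
    # 1. read the old data stripe       (disk read)
    # 2. read the old parity stripe     (disk read)
    new_parity = old_parity ^ old_data ^ new_data   # 3. recompute parity in controller cache
    # 4. write the new data stripe      (disk write)
    # 5. write the new parity stripe    (disk write)
    return new_data, new_parity

# Using the bytes from the earlier example: overwrite 'B' with 'C'.
data, parity = raid5_small_write(0b01000010, 0b01000100, 0b01000011)
print(format(parity, "08b"))    # 01000101, the parity of 'A', 'C', 'G'

The two extra disk reads and the extra write are the cost that the controller cache is there to hide.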
RAID-5E
RAID-5E (RAID-5 Enhanced) distributes the hot-spare capacity across all the drives in
the array instead of dedicating a separate idle disk. In the event of a physical drive
failing, its status will change to Defunct Disk Drive (DDD) and the ServeRAID controller
will start rearranging the data the disk contained into the spare space on the other
drives in the array, provided there is enough space.
During the migration of data, the logical drive will be in a critical, non-redundant
state. As soon as all the data is rearranged, the logical drive will be marked OKY
(Okay) and have full redundancy again. This is illustrated in Figure A-6 on
page 272.
A second physical disk failure, occurring before the previously failed disk has
been replaced, is illustrated in Figure A-7.
Figure A-7 RAID-5E array: data distributed throughout previous spare space
If such a second failure does occur, normal RAID-5 procedures are used to continue
providing data to the system through the parity calculations described earlier (see
Figure A-8).
Each of these new levels uses a disk drive organization referred to as a spanned array:
data is striped, using RAID-0 techniques, across a number of lower-level arrays rather
than across individual disks. These lower-level arrays are themselves RAID arrays.
In this section we explain the principles behind each of these spanned RAID
levels.
RAID-00
RAID-00 comprises RAID-0 striping across lower level RAID-0 arrays, as shown
in Figure A-9:
This RAID level does not provide any fault tolerance. However, as with a standard
RAID-0 array, you achieve improved performance, and also the opportunity to
group more disks into a single array, providing larger maximum logical disk size.
Figure A-9 Spanned RAID-00 array: RAID-0 striping across lower-level RAID-0 arrays
RAID-10
As we have seen, RAID-1 offers the potential for performance improvement as
well as redundancy. RAID-10 is a variant of RAID-1 that effectively creates a striped
volume across RAID-1 arrays: the disks are first mirrored in pairs, and the mirrored
pairs are then striped together as one volume.
Figure A-10 Spanned RAID-10 array: RAID-0 striping across RAID-1 mirrored pairs
This RAID level provides fault tolerance. Up to one disk of each sub-array may
fail without causing loss of data.
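A minimal mapping sketch follows (purely illustrative; the two-pair, four-disk geometry is an assumption) showing how each logical stripe is striped across the mirrored pairs and therefore exists on two physical disks:

# RAID-10 sketch: RAID-0 striping across two-disk RAID-1 mirrored pairs.
PAIRS = 2                        # each pair is a two-disk mirror (four disks in total)

def raid10_map(stripe):
    pair = stripe % PAIRS                 # RAID-0 striping across the mirrors
    stripe_in_pair = stripe // PAIRS
    disks = (2 * pair, 2 * pair + 1)      # the stripe is written to both members of the pair
    return disks, stripe_in_pair

# Stripe 0 lands on disks 0 and 1, stripe 1 on disks 2 and 3, and so on,
# which is why one disk of each pair can fail without losing data.
for s in range(4):
    print(s, raid10_map(s))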
RAID-1E0
RAID-1E0 comprises RAID-0 striping across lower level RAID-1E arrays, as
shown in Figure A-11.
This RAID level combines the performance characteristics of RAID-1E and RAID-0 in a
single array and provides high availability for your data. Up to one disk in each
sub-array may fail without causing data loss.
Figure A-11 Spanned RAID-1E0 array: RAID-0 striping across lower-level RAID-1E arrays
RAID-50
RAID-50 comprises RAID-0 striping across lower level RAID-5 arrays, as shown
in Figure A-12 on page 277.
Once again, the benefits of RAID-5 are gained, while the spanned RAID-0 allows
you to incorporate many more disks into a single logical drive. Up to one drive in
each sub-array may fail without loss of data.
Figure A-12 Spanned RAID-50 array: RAID-0 striping across lower-level RAID-5 arrays
RAID summary
RAID is an excellent and proven technology for protecting your data against the
possibility of hard disk failure. IBM's ServeRAID range of RAID controllers brings the
benefits of RAID technology to the IBM TotalStorage NAS solutions that hold your
critical business information.
Here is a brief summary of the different RAID levels we covered in this appendix:
RAID level            Description                                 When to use
Level 1 (Mirroring)   Duplicates all data from one drive          Where only two drives are available
                      to a second drive                           and data protection is needed
RAID 0 and RAID 00 would typically be used only when data on the array is not
subject to change and is easily replaced in the case of a failed disk.
Table A-4 illustrates the advantages and disadvantages of each RAID level.

Table A-4 Advantages and disadvantages of RAID levels

RAID level   Implementations   Description                            Advantages                       Disadvantages
1            Various           Mirroring. Each disk in a mirrored     Simplicity, reliability, and     High inherent cost.
                               array holds an identical image of      availability.
                               the data.
4            NetApp            User data is striped across            High performance for reads.      Requires extra cache for
                               multiple disks. Parity check data                                       writes. The single parity
                               is stored on a single disk.                                             disk can be a bottleneck.
5            Various           User data is striped across            High performance for reads.      Still requires caching or
                               multiple disks. Parity check data      A drive can fail in an array     parallel multiprocessors
                               is striped across multiple disks.      without loss of data.            for writes.
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
IBM Redbooks
For information on ordering these publications, see “How to get IBM Redbooks”
on page 283.
Introduction to Storage Area Network, SAN, SG24-5470
Designing an IBM Storage Area Network, SG24-5758
Implementing an Open IBM SAN, SG24-6116
Using Tivoli Storage Manager in a SAN Environment, SG24-6132
IBM Tape Solutions for Storage Area Networks and FICON, SG24-5474
Storage Area Networks: Tape Future in Fabrics, SG24-5474
Storage Consolidation in SAN Environments, SG24-5987
Implementing Fibre Channel Attachment on the ESS, SG24-6113
IBM SAN Survival Guide, SG24-6143
Storage Networking Virtualization: What’s it all about?, SG24-6210
A Practical Guide to Network Storage Manager, SG24-2242
Using iSCSI Solutions’ Planning and Implementation, SG24-6291
Other resources
These publications are also relevant as further information sources:
Building Storage Networks, ISBN 0072130725, Farley, Marc, McGraw-Hill
Professional, 2001
IP Fundamentals: What Everyone Needs to Know About Addressing and
Routing, ISBN 0139754830, Maufer, Thomas, Prentice Hall, 1999
Information in this book was developed in conjunction with use of the equipment
specified, and is limited in application to those specific hardware and software
products and levels.
IBM may have patents or pending patent applications covering subject matter in
this document. The furnishing of this document does not give you any license to
these patents. You can send license inquiries, in writing, to the IBM Director of
Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact IBM Corporation, Dept.
600A, Mail Drop 1329, Somers, NY 10589 USA.
The information contained in this document has not been submitted to any formal
IBM test and is distributed AS IS. The use of this information or the
implementation of any of these techniques is a customer responsibility and
depends on the customer's ability to evaluate and integrate them into the
customer's operational environment. While each item may have been reviewed
by IBM for accuracy in a specific situation, there is no guarantee that the same or
similar results will be obtained elsewhere. Customers attempting to adapt these
techniques to their own environments do so at their own risk.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of these
Web sites.
Java and all Java-based trademarks and logos are trademarks or registered
trademarks of Sun Microsystems, Inc. in the United States and/or other
countries.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of
Microsoft Corporation in the United States and/or other countries.
UNIX is a registered trademark in the United States and other countries licensed
exclusively through The Open Group.
SET, SET Secure Electronic Transaction, and the SET Logo are trademarks
owned by SET Secure Electronic Transaction LLC.
ANSI American National Standards Institute - The primary organization for fostering the development of technology standards in the United States. The ANSI family of Fibre Channel documents provide the standards basis for the Fibre Channel architecture and technology. See FC-PH.

Arbitrated Loop A Fibre Channel interconnection technology that allows up to 126 participating node ports and one participating fabric port to communicate.

Backup (1) A copy of computer data that is used to recreate data that has been lost, mislaid, corrupted, or erased. (2) The act of creating a copy of computer data that can be used to recreate data that has been lost, mislaid, corrupted or erased.

Bandwidth Measure of the information capacity of a transmission channel.

BI Business Intelligence.

BIOS Basic Input/Output System - set of routines stored in read-only memory that enable a computer to start the operating system and to communicate with the various devices in the system, such as disk drives, keyboard, monitor, printer, and communications ports.

Bridge (1) A component used to attach more than one I/O unit to a port. (2) A data communications device that connects two or more networks and forwards packets between them. The bridge may use similar or dissimilar media and signaling systems. It operates at the data link level of the OSI model. Bridges read and filter data packets and frames.

Bridge/Router A device that can provide the functions of a bridge, router or both concurrently. A bridge/router can route one or more protocols, such as TCP/IP, and bridge all other traffic. See also: Bridge, Router.

Cache A small fast memory holding recently accessed data, designed to speed up subsequent access to the same data. Most often applied to processor-memory access but also used for a local copy of data accessible over a network, and so on.

Client A software program used to contact and obtain data from a server software program on another computer, often across a great distance. Each client program is designed to work specifically with one or more kinds of server programs and each server requires a specific kind of client program.

Client/Server The relationship between machines in a communications network. The client is the requesting machine, the server the supplying machine. Also used to describe the information management relationship between software components in a processing system.

Cluster A type of parallel or distributed system that consists of a collection of interconnected whole computers and is used as a single, unified computing resource.

Coaxial Cable A transmission media (cable) used for high speed transmission. It is called coaxial because it includes one physical channel that carries the signal surrounded (after a layer of insulation) by another concentric physical channel, both of which run along the same axis. The inner channel carries the signal, and the outer channel serves as a ground.
FCIP Fibre Channel over Internet Protocol.

FCP Fibre Channel Protocol - the mapping of SCSI-3 operations to Fibre Channel.

HBA Host Bus Adapter.

Heterogeneous Network Often used in the context of distributed systems that may be running different operating systems or network protocols (a heterogeneous network).

iSCSI Internet Small Computer System Interface.

ISDN Integrated Services Digital Network.

JBOD Just a bunch of disks.

LAN Local Area Network - A network covering a relatively small geographic area (usually not larger than a floor or small building). Transmissions within a Local Area Network are mostly digital, carrying data among stations at rates usually above one megabit/s.

Latency A measurement of the time it takes to send a frame between two locations.

LUN Logical Unit Number - A 3-bit identifier used on a SCSI bus to distinguish between up to eight devices (logical units) with the same SCSI ID.

MAN Metropolitan Area Network - A data network intended to serve an area the size of a large city.

MAC Media Access Control - The lower sublayer of the OSI data link layer. The interface between a node's Logical Link Control and the network's physical layer. The MAC differs for various physical media.

NFS Network File System - A distributed file system in UNIX developed by Sun Microsystems which allows a set of computers to cooperatively access each other's files in a transparent manner.

OSI Open Systems Interconnect - A model of network architecture and a suite of protocols (a protocol stack) to implement it, developed by ISO in 1978 as a framework for international standards in heterogeneous computer network architecture.

Packet A short block of data transmitted in a packet switching network.

PFA Predictive Failure Analysis.

POST Power-on self-test.

Protocol A data transmission convention encompassing timing, control, formatting and data representation.

QoS Quality of Service - A set of communications characteristics required by an application. Each QoS defines a specific transmission priority, level of route reliability, and security level.
RAID Redundant Array of Inexpensive or Independent Disks. A method of configuring multiple disk drives in a storage subsystem for high availability and high performance.

Raid-0 Level 0 RAID support - Striping, no redundancy.

Raid-1 Level 1 RAID support - mirroring, complete redundancy.

Raid-5 Level 5 RAID support - Striping with parity.

RDist A utility included in UNIX that is used to maintain identical copies of files over multiple hosts. It preserves the owner, group, mode, and timestamp of files if possible, and can update programs that are executing.

Redirector An operating system driver that sends data to and receives data from a remote device. A network redirector often provides mechanisms to locate, open, read, write, and delete files and submit print jobs.

RFC Request for Comment - One of a series, begun in 1969, of numbered Internet informational documents and standards widely followed by commercial software and freeware in the Internet and UNIX communities. Few RFCs are standards but all Internet standards are recorded in RFCs.

Router (1) A device that can decide which of several paths network traffic will follow based on some optimal metric. Routers forward packets from one network to another based on network-layer information. (2) A dedicated computer hardware and/or software package which manages the connection between two or more networks. See also: Bridge, Bridge/Router.

SAN A Storage Area Network (SAN) is a dedicated, centrally managed, secure information infrastructure, which enables any-to-any interconnection of servers and storage systems.

SCSI Small Computer System Interface - A set of evolving ANSI standard electronic interfaces that allow personal computers to communicate with peripheral hardware such as disk drives, tape drives, CD ROM drives, printers and scanners faster and more flexibly than previous interfaces.

SCSI-3 SCSI-3 consists of a set of primary commands and additional specialized command sets to meet the needs of specific device types. The SCSI-3 command sets are used not only for the SCSI-3 parallel interface, but also for additional parallel and serial protocols, including Fibre Channel, Serial Bus Protocol (used with IEEE 1394 Firewire physical protocol) and the Serial Storage Protocol (SSP).

SCSI-FCP The term used to refer to the ANSI Fibre Channel Protocol for SCSI document (X3.269-199x) that describes the FC-4 protocol mappings and the definition of how the SCSI protocol and command set are transported using a Fibre Channel interface.

SCSI initiator A device that begins a SCSI transaction by issuing a command to another device (the SCSI target), giving it a task to perform. Typically a SCSI host adapter is the initiator, but targets may also become initiators.

Server A computer which is dedicated to one task.

SNIA Storage Networking Industry Association. A non-profit organization comprised of more than 77 companies and individuals in the storage industry.

SNMP Simple Network Management Protocol - The Internet network management protocol which provides a means to monitor and set network configuration and run-time parameters.

SSA Serial Storage Architecture - A high speed serial loop-based interface developed as a high speed point-to-point connection for peripherals, particularly high speed storage arrays, RAID and CD-ROM storage by IBM.
StorWatch Expert These are StorWatch applications that employ a 3-tiered architecture that includes a management interface, a StorWatch manager and agents that run on the storage resource(s) being managed. Expert products employ a StorWatch data base that can be used for saving key management data (e.g. capacity or performance metrics). Expert products use the agents, as well as analysis of storage data saved in the database, to perform higher value functions, including: reporting of capacity, performance, etc. over time (trends), configuration of multiple devices based on policies, monitoring of capacity and performance, automated responses to events or conditions, and storage related data mining.

StorWatch Specialist A StorWatch interface for managing an individual fibre channel device or a limited number of like devices (that can be viewed as a single group). StorWatch specialists typically provide simple, point-in-time management functions such as configuration, reporting on asset and status information, simple device and event monitoring, and perhaps some service utilities.

Striping A method for achieving higher bandwidth using multiple N_Ports in parallel to transmit a single information unit across multiple levels.

Switch A component with multiple entry/exit points (ports) that provides dynamic connection between any two of these points.

Switch Topology An interconnection structure in which any entry point can be dynamically connected to any exit point. In a switch topology, the available bandwidth is scalable.

Tape Backup Making magnetic tape copies of hard disk and optical disc files for disaster recovery.

Tape Pooling A SAN solution in which tape resources are pooled and shared across multiple hosts rather than being dedicated to a specific host.

TCP/IP Transmission Control Protocol/Internet Protocol - a set of communications protocols that support peer-to-peer connectivity functions for both local and wide area networks.

Topology An interconnection scheme that allows multiple Fibre Channel ports to communicate. For example, point-to-point, Arbitrated Loop, and switched fabric are all Fibre Channel topologies.

Trivial File Transfer Protocol (TFTP) A simple file transfer protocol used for downloading boot code to diskless workstations. TFTP is defined in RFC 1350.

Twisted Pair A transmission media (cable) consisting of two insulated copper wires twisted around each other to reduce the induction (thus interference) from one wire to another. The twists, or lays, are varied in length to reduce the potential for signal interference between pairs. Several sets of twisted pair wires may be enclosed in a single cable. This is the most common type of transmission media.

UMS Universal Manageability Services.

UTP Unshielded Twisted Pair.

VI Virtual Interface.

VTS Virtual Tape Server.

WAN Wide area network - A network which encompasses inter-connectivity between devices over a wide geographic area. A wide area network may be privately owned or rented, but the term usually connotes the inclusion of public (shared) networks.

WfM Wired for Management (Intel).

XDR eXternal Data Representation - A standard for machine-independent data structures developed by Sun Microsystems for use in remote procedure call systems. It is defined in RFC 1014.
Index
Symbols
'routing' algorithms 68
'headless' environment 180

Numerics
200i 158

A
access scheme 13
Advanced System Management 150, 174, 175
Advanced System Management PCI Adapter 177
Advanced System Management Processor (ASMP) 131
Alert on LAN 143, 155
Alto Aloha Network 72
American National Standards Institute (ANSI) 30
AntiVirus 220
any-to-any 29
AppleTalk 40, 123, 141, 146
appliance 20, 53
appliance-like 52
appliances 22
Application layer 66, 71
Arbitrated loop 30
Archival backup 192
ARCnet 13
ARP (Address Resolution Protocol) 163
ASM planar processor 177
Asynchronous Transfer Mode (ATM) 14
ATM 67
Automated Server Restart 177

B
Basic Input/Output System (BIOS) 174, 179
battery-backed RAM 196
block I/O 9, 10, 30, 32, 49, 53, 105, 122, 231
block I/O applications 235
blocks 11
bridges 65, 75
Business Intelligence (BI) 3

C
cache 160
Carrier Sense 73
Carrier Sense Multiple Access with Collision Detection (CSMA/CD) 72, 73
channel I/O 11
circuit switched telephone 69
client/server 13
Clustered Failover 140
Clustering 108
coaxial 75
collision domain 74
collisions 76
Common Information Model (CIM) 182
Common Internet File System (CIFS) 19, 95, 256
connection 53
cooked I/O 10
copy-on-write 202
CSMA/CD 14
Customer Relationship Management (CRM) 3

D
DAS 11, 59
Data Link layer 65
Data Management Application (DMA) 251
data sharing 34
database I/O 32
datagram 17, 68
DECNet 16
Desktop Management Interface (DMI) 182
DHCP servers 130
Direct Access File System (DAFS) 238, 248
Direct Attach Storage (DAS) 1, 4
disaster recovery 142
discrete LAN 65
DNS server 182, 186
Domain Naming Service (DNS) 96
drag-and-drop 192

E
e-commerce 2
End-of-File (EOF) 241
Enterprise Resource Planning (ERP) 3
P
packet 17, 69
Peer-to-Peer Remote Copy (PPRC) 35, 226
Peripheral Component Interface (PCI) 86, 241
Peripheral Component Storage 159
persistent images 192
Persistent Storage Manager (PSM) 44, 130, 154, 191, 197, 213
Physical layer 64
physical medium 72
plug-and-play 22
point-in-time 192
   images 137
   persistent images 154
Point-to-point 30
   fabric 150
pooled SAN storage 41
pooled storage 51
Power-on self-test (POST) 131, 134, 176
Predictive Failure Analysis (PFA) 132, 174, 176
Presentation layer 65
primary gateway 186
Processor 160
Processors 161
protocol 6
protocol stack 66

Q
Quality of Service (QoS) 48

R
RAID 7, 123, 160
RAID-3 267
RAMAC Virtual Array (RVA) 118
random access memory (RAM) 193
raw data 40
Raw I/O 10, 100
read/write 41, 210
Redbooks Web site 283
   Contact us xv
Remote connectivity 177
remote copy (rcp) 18
remote file call 42
Remote power cycling 176
Remote Procedure Call (RPC) 94
Remote update 176
requestor 41
return on investment (ROI) 52
routers 65
Routing 69

S
sample connectivity 157
SAN 29
   attached disk 157
   benefits 33
   fabric 6
   over IP 4
SANergy 40, 59
SANergy benefits 41
SANergy Metadata Controller 105, 226
SBA 9
Scalable storage 140
SCSI 8, 80
SCSI bus adapter (SBA) 9
SCSI Select Utility 179
SCSI-3 32
SDRAM 159
segment 15, 72, 74
Serial Storage Architecture 10
Server Message Block (SMB) 96
Server to server 31
Server to storage 31
ServeRAID 160, 162, 163
ServeRAID Manager 130
ServerConfiguration.dat 183
server-less backup 227
ServerWorks ServerSet 159
Session layer 65
Shared Everything 111
Shared Nothing 110
Shared null 109
Shared serial port 176
short wave GBIC 141
Simple Network Management Protocol (SNMP) 182
SmartSets 112
SNIA 61
SNMP 117, 238
SNMP device listener 143
spanning tree 15
specialized server 20
SSA 8, 9, 10, 12
stack 17, 64
storage 4
Storage Area Network (SAN) 1, 3, 29, 59, 119
Back cover

IP Storage Networking: IBM NAS and iSCSI Solutions

All about the latest IBM Storage Network Products
Selection criteria for Storage Networking needs
Application scenarios

IP Storage Networking utilizes existing Ethernet infrastructure as a backbone for connecting storage devices. By using this network, the infrastructure investment may be leveraged to provide an even greater ROI. Where creation of a dedicated storage network is desirable, the use of familiar IP "fabric" means that existing support skills and resources can be leveraged, providing lower cost of ownership. IP Storage Networking devices simplify installation and management by providing a complete suite of pre-loaded software. They are readily capable of filling the need caused by the elimination of general purpose servers with direct attached storage.

This IBM Redbook is intended for IBMers, Business Partners, and customers who are tasked to help choose a storage network. This book will help you understand the different storage networking technologies available in the market. It discusses the circumstances under which you might want to use SAN, NAS, or iSCSI, showing where all of these technologies complement each other.

We introduce the different storage networking technologies, discuss in detail how Network Attached Storage and iSCSI work, and show how they differ from SAN. Various NAS and iSCSI products from IBM are covered, with their management tools, including on-disk data protection and data archiving. We also suggest some sample NAS and iSCSI applications.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.