Anda di halaman 1dari 14

Title:

Is RAID Technology Beneficial in the Present


Day industry

Introduction
Definition of RAID
RAID, or Redundant Array of Inexpensive Disks (or later also referred to as
Redundant Array of Independent Disks) is an acronym first used in a 1988 paper by
Berkeley researchers Patterson, Gibson and Katz. RAID is a technology developed to
improve data protection and performance while storing large amounts of data,
without necessarily requiring improvements in disk drive technology.

History
RAID technology was first defined by a group of computer scientists at
the University of California at Berkeley in 1987. The scientists studied
the possibility of using two or more disks to appear as a single device to
the host system.
Although the array's performance was better than that of large, singledisk storage systems, reliability was unacceptably low. To address this,
the scientists proposed redundant architectures to provide ways of
achieving storage fault tolerance. In addition to defining RAID levels 1
through 5, the scientists also studied data striping -- a non-redundant
array configuration that distributes files across multiple disks in an array.
Often known as RAID 0, this configuration actually provides no data
protection. However, it does offer maximum throughput for some dataintensive applications such as desktop digital video production.

The driving factors behind RAID


A number of factors are responsible for the growing adoption of arrays
for critical network storage.
More and more organizations have created enterprise-wide networks to
improve productivity and streamline information flow. While the
distributed data stored on network servers provides substantial cost
benefits, these savings can be quickly offset if information is frequently
lost or becomes inaccessible. As today's applications create larger files,
network storage needs have increased proportionately. In addition,
accelerating CPU speeds have outstripped data transfer rates to
storage media, creating bottlenecks in today's systems.

RAID storage solutions overcome these challenges by providing a


combination of outstanding data availability, extraordinary and highly
scalable performance, high capacity, and recovery with no loss of data
or interruption of user access.
By integrating multiple drives into a single array -- which is viewed by
the network operating system as a single disk drive -- organizations can
create cost-effective, minicomputersized solutions of up to a terabyte or
more of storage.

RAID Levels
There are several different RAID "levels" or redundancy schemes, each
with inherent cost, performance, and availability (fault-tolerance)
characteristics designed to meet different storage needs. No individual
RAID level is inherently superior to any other. Each of the five array
architectures is well-suited for certain types of applications and
computing environments. For client/server applications, storage
systems based on RAID levels 1, 0/1, and 5 have been the most widely
used. This is because popular NOSs such as Windows NT Server and
NetWare manage data in ways similar to how these RAID architectures
perform.
RAID 0 - RAID 1 - RAID 2 - RAID 3 - RAID 4 - RAID 5 - RAID 01 (0+1)
and RAID 10 (1+0)
RAID 0
Data striping without redundancy (no protection).

Minimum number of drives: 2


Strengths: Highest performance.
Weaknesses: No data protection; One drive fails, all data is lost.

DRIVE 1 DRIVE 2
Data A

Data B

Data C

Data D

Data E

Data F

RAID 1
Disk mirroring.

Minimum number of drives: 2


Strengths: Very high performance; Very high data protection;
Very minimal penalty on write performance.
Weaknesses: High redundancy cost overhead; Because all data
is duplicated, twice the storage capacity is required.
Mirroring

Duplexing

Standard Host
Adapter

Standard Host Standard Host


Adapter 1
Adapter 2

DRIVE 1

DRIVE 2

DRIVE 1

DRIVE 2

Data A

Data A

Data A

Data A

Data B

Data B

Data B

Data B

Data C

Data C

Data C

Data C

Original Data Mirrored Data

Original Data Mirrored Data

RAID 2
No practical use.

Minimum number of drives: Not used in LAN


Strengths: Previously used for RAM error environments
correction (known as Hamming Code ) and in disk drives before
he use of embedded error correction.
Weaknesses: No practical use; Same performance can be
achieved by RAID 3 at lower cost.

RAID 3
Byte-level data striping with dedicated parity drive.

Minimum number of drives: 3


Strengths: Excellent performance for large, sequential data
requests.
Weaknesses: Not well-suited for transaction-oriented network
applications; Single parity drive does not support multiple,
simultaneous read and write requests.

RAID 4
Block-level data striping with dedicated parity drive.

Minimum number of drives: 3 (Not widely used)


Strengths: Data striping supports multiple simultaneous read
requests.
Weaknesses: Write requests suffer from same single parity-drive
bottleneck as RAID 3; RAID 5 offers equal data protection and
better performance at same cost.

RAID 5
Block-level data striping with distributed parity.

Minimum number of drives: 3


Strengths: Best cost/performance for transaction-oriented
networks; Very high performance, very high data protection;
Supports multiple simultaneous reads and writes; Can also be
optimized for large, sequential requests.
Weaknesses: Write performance is slower than RAID 0 or RAID
1.

DRIVE 1 DRIVE 2 DRIVE 3

Parity A Data A

Data A

Data B

Parity B Data B

Data C

Data C

Parity C

RAID 01 (0+1) and RAID 10 (1+0)


Combination of RAID 0 (data striping) and RAID 1 (mirroring). RAID 01
(0+1) is a mirrored configuration of two striped sets (mirror of stripes);
RAID 10 (1+0) is a stripe across a number of mirrored sets(stripe of
mirrors). RAID 10 provides better fault tolerance and rebuild
performance than RAID 01. Both array types provide very good to
excellent overall performance by combining the speed of RAID 0 with
the redundancy of RAID 1 without requiring parity calculations.

Minimum number of drives: 4


Strengths: Highest performance, highest data protection (can
tolerate multiple drive failures).
Weaknesses: High redundancy cost overhead; Because all data
is duplicated, twice the storage capacity is required; Requires
minimum of four drives.
RAID 01 (0+1 mirror of stripes)

DRIVE 1

DRIVE 2

DRIVE 3

DRIVE 4

Data A

Data A

mA

mA

Data B

Data B

mB

mB

Data C

Data C

mC

mC

Original Data Original Data Mirrored Data Mirrored Data


RAID 10 (1+0 stripe of mirrors)
DRIVE 1

DRIVE 2

DRIVE 3

DRIVE 4

Data A

mA

Data B

mB

Data C

mC

Data D

mD

Data E

mE

Data F

mF

Original Data Mirrored Data Original Data Mirrored Data

Types Of RAID
There are three primary array implementations: software-based arrays,
bus-based array adapters/controllers, and subsystem-based external
array controllers. As with the various RAID levels, no one
implementation is clearly better than another -- although software-based
arrays are rapidly losing favor as high-performance, low-cost array
adapters become increasingly available. Each array solution meets
different server and network requirements, depending on the number of
users, applications, and storage requirements.
It is important to note that all RAID code is based on software. The
difference among the solutions is where that software code is executed
-- on the host CPU (software-based arrays) or offloaded to an on-board
processor (bus-based and external array controllers).
Description

Advantages

Software- Primarily used with entry-level servers, Low price


based
software-based arrays rely on a
Only requires a
RAID
standard host adapter and execute all standard controller.
I/O commands and mathematically
intensive RAID algorithms in the host
server CPU. This can slow system
performance by increasing host PCI
bus traffic, CPU utilization, and CPU
interrupts. Some NOSs such as
NetWare and Windows NT include
embedded RAID software. The chief
advantage of this embedded RAID
software has been its lower cost
compared to higher-priced RAID
alternatives. However, this advantage
is disappearing with the advent of
lower-cost, bus-based array adapters.

Hardware- Unlike software-based arrays, busbased


based array adapters/controllers plug
RAID
into a host bus slot [typically a 133
MByte (MB)/sec PCI bus] and offload
some or all of the I/O commands and
RAID operations to one or more
secondary processors. Originally used
only with mid- to high-end servers due
to cost, lower-cost bus-based array
adapters are now available specifically
for entry-level server network
applications.

Data protection
and performance
benefits of RAID
More robust
fault-tolerant
features and
increased
performance
versus softwarebased RAID.

In addition to offering the fault-tolerant


benefits of RAID, bus-based array
adapters/controllers perform
connectivity functions that are similar
to standard host adapters. By residing
directly on a host PCI bus, they
provide the highest performance of all
array types. Bus-based arrays also
deliver more robust fault-tolerant
features than embedded NOS RAID
software.
As newer, high-end technologies such
as Fibre Channel become readily
available, the performance advantage
of bus-based arrays compared to
external array controller solutions may
diminish.
External Intelligent external array controllers
Hardware "bridge" between one or more server
RAID Card I/O interfaces and single- or multipledevice channels. These controllers
feature an on-board microprocessor,
which provides high performance and
handles functions such as executing
RAID software code and supporting
data caching.

OS
independent
Build super
high-capacity
storage systems for
high-end servers.

External array controllers offer


complete operating system
independence, the highest availability,
and the ability to scale storage to
extraordinarily large capacities (up to a
terabyte and beyond). These
controllers are usually installed in
networks of stand alone Intel-based
and UNIX-based servers as well as
clustered server environments.

Server Technology Comparison


UDMA

SCSI

Fibre Channel

Best Suited Low-cost entry


For
level server with
limited
expandability

Low to high-end
server when
scalability is
desired

Server-to-Server
campus networks

Advantages Uses low-cost


ATA drives

Performance:
up to 160 MB/s
Reliability
Connectivity to
the largest variety
of peripherals
Expandability

Performance:
up to 100 MB/s
Dual active
loop data path
capability
Infinitely
scalable

Parity
The concept behind RAID is relatively simple. The fundamental premise
is to be able to recover data on-line in the event of a disk failure by
using a form of redundancy called parity. In its simplest form, parity is an
addition of all the drives used in an array. Recovery from a drive failure

is achieved by reading the remaining good data and checking it against


parity data stored by the array. Parity is used by RAID levels 2, 3, 4, and
5. RAID 1 does not use parity because all data is completely duplicated
(mirrored). RAID 0, used only to increase performance, offers no data
redundancy at all.
A

PARITY

1
1

+
+

2
2

+
+

3
X

+
+

4
4

=
=

10
10

7 + X
-7 +
--------X
MISSING
DATA

= 10
= -7
---------3
RECOVERED
DATA

Fault tolerance
RAID technology does not prevent drive failures. However, RAID does
provide insurance against disk drive failures by enabling real-time data
recovery without data loss.
The fault tolerance of arrays can also be significantly enhanced by
choosing the right storage enclosure. Enclosures that feature
redundant, hot-swappable drives, power supplies, and fans can greatly
increase storage subsystem uptime based on a number of widely
accepted measures:

MTDL:
Mean Time to Data Loss. The average time before the failure of
an array component causes data to be lost or corrupted.

MTDA:
Mean Time between Data Access (or availability). The average
time before non-redundant components fail, causing data
inaccessibility without loss or corruption.

MTTR:
Mean Time To Repair. The average time required to bring an
array storage subsystem back to full fault tolerance.
MTBF:
Mean Time Between Failure. Used to measure computer
component average reliability/life expectancy. MTBF is not as
well-suited for measuring the reliability of array storage systems
as MTDL, MTTR or MTDA (see below) because it does not
account for an array's ability to recover from a drive failure. In
addition, enhanced enclosure environments used with arrays to
increase uptime can further limit the applicability of MTBF ratings
for array solutions.

RAID Levels
As the definition and awareness of the RAID technology has grown, several RAID
configurations for storing data have been devised and standardized upon. These
RAID "levels" are now commonly discussed in the industry. The simplest RAID
configurations either "stripe" data across two drives to increase data transfer speed,
but offer no data protection; or "mirror" redundant data onto a second drive, without
increasing performance. More advanced configurations involve three or more drives,
and simultaneously provide fault tolerance, increased performance, and the ability to
"recreate" information onto a spare drive should a drive failure occur. These more
advanced RAID configurations are preferred in server environments where maximum
data availability and performance is critical.
The applications, advantages, and disadvantages of the different RAID
configurations, or levels, are described below. The numbers assigned to each level of
RAID do not indicate superiority or effectiveness; they are only used to differentiate
between them.
RAID 0 - Disk Striping
With RAID 0, or a configuration known as "data striping", data is written in
sequential sections across more than two drives. RAID 0 is easy to implement, and it
can dramatically improve performance. Several drives can be accessed at once,
minimizing the overall "seek" time of larger files. This configuration has no data
redundancy and therefore no protection against data loss, however, so it should not
be used for business-critical applications.

RAID 1 - Mirroring
Also known as "drive mirroring", RAID 1 simultaneously copies data to a second
drive. The mirroring method offers data protection and good performance in the case
where a mirrored drive fails. RAID 1 is the simplest RAID configuration, requiring
only a minimum of two drives with equal capacity, and also that the drives be added
in pairs. The main disadvantage of RAID 1 is that it uses 100% drive overhead (the
highest of all RAID levels), which can be considered an inefficient use of drive
capacity.

RAID 2- Redundancy Using Hamming Code


A RAID 2 array stripes data to a group of drives using a byte stripe. A hamming code
Error Checking and Correction (ECC) symbol for each data stripe is stored on a
dedicated drive. This code provides detection and correction of data errors, allowing
data to be recovered without completely duplicating the data. Since most drives now

embed ECC information within each sector as standard, however, RAID 2 doesn't
offer any advantages over RAID 3.

RAID 3 - Striping Plus Parity


RAID 3 stripes data across multiple drives, with an additional drive dedicated to
parity for error correction/recovery. This configuration offers very high data transfer
rates and only requires a small percentage of ECC (parity) to data drives. However,
RAID 3 requires a complicated controller design and the configuration may be
difficult to rebuild after a drive failure.

RAID 4 - Independent Striping Plus Parity


RAID 4 is identical to RAID 3 except that large strips are used, so that records can be
read from any individual drive in the array apart from the parity drive, allowing read
operations to be overlapped. Since RAID 4 offers no significant advantages over
RAID 5, the RAID 4 configuration is now rarely implemented.

RAID 5 - Independent Striping Plus Distributed Parity


With RAID 5, each block of data is written on a data drive and parity information is
then striped across all drives. RAID 5 is the most popular of the RAID levels because
it delivers data protection and good performance with a small overhead for parity.
RAID 5 offers the most efficient use of drive capacity of all the redundant RAID
levels. This configuration requires at least three drives of equal size, which can be
added one at a time.

RAID 6 -RAID 5 With Double Parity (or "P+Q Redundancy")

(Not recognized by the RAID Advisory Board (RAB).)


RAID 6 is an extension of RAID 5 that uses a second independent distributed parity
scheme. Data is striped on a block level across a set of drives, and then a second set
of parity is calculated and written across all of the drives. This configuration provides
extremely high fault tolerance and can sustain several simultaneous drive failures,
but it requires an "n+2" number of drives and a very complicated controller design.
RAID 10 - Combination of RAID 1 and RAID 0
(Not recognized by original Berkeley papers or by the RAB.)
RAID 10 combines RAID 0 and RAID 1 by striping data across multiple drives without
parity, and it mirrors the entire array to a second set of drives. This process delivers
fast data access (like RAID 0) and single drive fault tolerance (like RAID 1), but cuts
the usable drive space in half. RAID 10 requires a minimum of four equally sized
drives, is the most expensive RAID solution and offers limited scalability.

RAID 53 - Combination of RAID 0 and RAID 3


(Not recognized by original Berkeley papers or by the RAB.)
The RAID 53 configuration should really be called "RAID 03". This configuration is a
striped array whose segments are essentially RAID 3 arrays. It has the same fault
tolerance and high data transfer rates of RAID 3, with the high I/O rates associated
with RAID 0 (striping), plus some added performance. This configuration is very
expensive and requires at least 5 drives to implement.
RAID Applications, Tradeoffs and Limitations
Typically, RAID is used in systems where data accessibility is critical and fault
tolerance is required, such as in large file servers. However, RAID is now also more
frequently seen used in desktop systems for CAD, multimedia editing and playback
where higher transfer rates are needed.
In general, for a given price point, the performance improvement of a particular type
of RAID array "trades off" with the amount of the redundancy and data security of
the array. Similarly, capacity of the array "trades off" with the price and fault
tolerance. Inexpensive RAID solutions are limited in their ability to protect your data
or improve performance, whereas high-end RAID implementations providing very
high performance and very high data reliability are quite expensive.
Although RAID can greatly improve the reliability and performance of a storage
system, it is dangerous to assume that a RAID system with redundancy provides
absolute data protection. Since there are sources of failure that are still applicable to
RAID systems, such as viruses, environmental disturbances and/or cases where
more than one drive fails at the same time, regular system maintenance and backup
remain critical practices.

Anda mungkin juga menyukai