
Rain Technology

Submitted by:

ASHISH GUPTA
A22205215042
B.Tech (CSE) III Semester

Under the Guidance of


Mr. Deepak Pawar

Amity School of Engineering and Technology

AMITY UNIVERSITY RAJASTHAN

CERTIFICATE

Certified that the Report entitled Rain Technology submitted by Ashish Gupta with
Enrolment No. A22205215042 is his own work and has been carried out under my
supervision. It is recommended that the candidate may now be evaluated for his
work by the University.

Signature:

Designation:

Date:

Acknowledgements

First of all, I would like to sincerely thank my supervisor, Mr. Deepak Pawar, for
his persistent support, guidance, help, and encouragement during the whole process of
my study and my dissertation work. I would also like to thank the students, staff
and faculty of the Department of Computer Science and Engineering, Amity School
of Engineering and Technology for their support.

I would also like to thank my parents for their good wishes during this work.
Finally, thanks to my friends for their support and love all the time.

ASHISH GUPTA

ABSTRACT
The RAIN project is a research collaboration between Caltech and NASA-JPL on
distributed computing and data storage systems for future space-borne missions.
The goal of the project is to identify and develop key building blocks for reliable
distributed systems built with inexpensive off-the-shelf components.
The RAIN platform consists of a heterogeneous cluster of computing and/or
storage nodes connected via multiple interfaces to networks configured in
fault-tolerant topologies. The RAIN software components run in conjunction with
operating system services and standard network protocols. Through
software-implemented fault tolerance, the system tolerates multiple node, link, and
switch failures, with no single point of failure.
The RAIN technology has been transferred to Rainfinity, a start-up company
focusing on creating clustered solutions for improving the performance and
availability of Internet data centers.
The massive jumps in technology led to the expansion of the Internet as the most
widely accepted medium for communication. One of the most prominent problems
with this client-server-based technology, however, is that of maintaining a continuous
connection: even if a single intermediate node breaks down, the entire system
collapses. The solution to this problem can be the use of clustering. Clustering
means linking together two or more systems to manage erratic workloads or to
offer uninterrupted operation in the event that one fails. Clustering technology thus
offers an approach to improving overall reliability and performance. One
implementation of this idea was RAIN (Reliable Array of Independent Nodes),
developed by the California Institute of Technology in collaboration with NASA's
Jet Propulsion Laboratory and the Defense Advanced Research Projects Agency
(DARPA). The technology is implemented in a distributed computing architecture
built with inexpensive off-the-shelf components. The RAIN platform consists of a
heterogeneous cluster of nodes linked via multiple interfaces to networks configured
in fault-tolerant topologies. RAIN technology provides a solution by reducing the
number of nodes in the chain linking the client and server, in addition to making
the remaining nodes more robust and more autonomous.

INTRODUCTION

RAIN technology originated in a research project at the California Institute of
Technology (Caltech), in collaboration with NASA's Jet Propulsion Laboratory and
the Defense Advanced Research Projects Agency (DARPA). The name of the
original research project was RAIN, which stands for Reliable Array of
Independent Nodes. The main purpose of the RAIN project was to identify key
software building blocks for creating reliable distributed applications using
off-the-shelf hardware. The focus of the research was on high-performance,
fault-tolerant and portable clustering technology for space-borne computing. Led by
Caltech professor Shuki Bruck, the RAIN research team in 1998 formed a company
called Rainfinity. Rainfinity, located in Mountain View, Calif., is already shipping
its first commercial software package derived from the RAIN technology, and
company officials plan to release several other Internet-oriented applications. The
RAIN project was started four years earlier at Caltech to create an alternative to the
expensive, special-purpose computer systems used in space missions. The Caltech
researchers wanted to put together a highly reliable and available computer system
by distributing processing across many low-cost commercial hardware and
software components.
To tie these components together, the researchers created RAIN software, which
has three components:
1. A component that stores data across distributed processors and retrieves it even
if some of the processors fail.
2. A communications component that creates a redundant network between
multiple processors and supports a single, uniform way of connecting to any of the
processors.
3. A computing component that automatically recovers and restarts applications if a
processor fails.

Fig 1: RAIN Software Architecture

Myrinet switches provide the high-speed cluster message-passing network for
passing messages between compute nodes and for I/O. The Myrinet switches have
a few counters that can be accessed from an Ethernet connection to the switch.
These counters can be read to monitor the health of the connections, cables,
etc. The following information refers to the 16-port switches, the Clos-64 switches,
and the Myrinet-2000 switches.
ServerNet is a switched fabric communications link primarily used in proprietary
computers made by Tandem Computers, Compaq, and HP. Its features include
good scalability, clean fault containment, error detection and failover.

The ServerNet architecture specification defines a connection between nodes,
either processor nodes or high-performance I/O nodes such as storage devices.
Tandem Computers developed the original ServerNet architecture and protocols for
use in its own proprietary computer systems starting in 1992, and released the first
ServerNet systems in 1995.
Early attempts to license the technology and interface chips to other companies
failed, due in part to a disconnect between the culture of selling complete
hardware/software/middleware computer systems and that needed for selling and
supporting chips and licensing technology.
A follow-on development effort ported the Virtual Interface Architecture to
ServerNet, with PCI interface boards connecting personal computers. InfiniBand
directly inherited many ServerNet features. After 25 years, systems based on the
ServerNet architecture still ship today.

BRIEF HISTORY OF RAIN

1. RAIN technology was developed by the California Institute of Technology, in
collaboration with NASA's Jet Propulsion Laboratory and DARPA.

2. It was started to develop a substitute for the costly, special-purpose computer
systems used in space missions. The prime objective of the Caltech researchers
was to distribute processing across many economical commercial hardware and
software components.

3. The name of the original research project was RAIN, which stands for
Reliable Array of Independent Nodes.

4. The RAIN research team in 1998 formed a company called Rainfinity.
Rainfinity primarily deals with creating clustered solutions for enhancing the
performance and availability of Internet data centers.

ARCHITECTURE
The RAIN technology incorporates a number of unique innovations as its core
modules:

Fig 2: Architecture of RAIN Technology [1]


A. Reliable transport

Reliable transport ensures reliable communication between the nodes in the
cluster. This transport has a built-in acknowledgement scheme that ensures
reliable packet delivery. It transparently uses all available network links to
reach the destination. When it fails to do so, it alerts the upper layer, thereby
functioning as a failure detector. This module is portable to different computer
platforms, operating systems and networking environments.
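
As a rough illustration of this behaviour, the following Python sketch retries an
acknowledged send over every available link and alerts the caller only when all
links have failed. It is a minimal sketch, assuming UDP datagrams, a bare "ACK"
reply and hypothetical class and parameter names; the actual RAIN transport is
more elaborate.

```python
import socket

class ReliableTransport:
    """Try each link with retries; alert the upper layer when all fail."""

    def __init__(self, links, retries=3, timeout=0.5):
        self.links = links        # list of (local_ip, remote_ip) pairs
        self.retries = retries
        self.timeout = timeout

    def send(self, payload: bytes, port: int = 9000) -> bool:
        for local_ip, remote_ip in self.links:       # try every link in turn
            for _ in range(self.retries):
                sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
                sock.settimeout(self.timeout)
                try:
                    sock.bind((local_ip, 0))
                    sock.sendto(payload, (remote_ip, port))
                    ack, _ = sock.recvfrom(16)       # wait for the receiver's ACK
                    if ack == b"ACK":
                        return True                  # delivered reliably
                except OSError:
                    pass                             # timeout or link error: retry
                finally:
                    sock.close()
        # Every link exhausted: act as a failure detector for the upper layer.
        raise ConnectionError("peer unreachable on all links")
```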

B. Consistent global state sharing protocol

This protocol provides consistent group membership, optimized information
distribution and distributed group-decision making for a RAIN cluster. This
module is at the core of a RAIN cluster. It enables efficient group
communication among the computing nodes, and ensures that they operate
together without conflict.
C. Always-On-IP

This module maintains pools of "always-available" virtual IPs. These virtual
IPs are logical addresses that can move from one node to another for load
sharing or fail-over. Usually a pool of virtual IPs is created for each subnet
that the RAIN cluster is connected to. A pool can consist of one or more
virtual IPs. Always-On-IP guarantees that all virtual IP addresses
representing the cluster are available as long as at least one node in the
cluster is operational. In other words, when a physical node fails in the
cluster, its virtual IPs are taken over by another healthy node in the
cluster.
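
The fail-over rule can be pictured with a minimal sketch, assuming a simple
mapping of virtual IPs to owning nodes and a least-loaded placement policy (all
names here are illustrative; a real takeover also involves gratuitous ARP and
OS-level interface reconfiguration):

```python
def reassign_virtual_ips(assignment, healthy_nodes):
    """Move every virtual IP owned by a failed node to the least-loaded
    healthy node. `assignment` maps virtual_ip -> node_id."""
    load = {n: 0 for n in healthy_nodes}
    for vip, node in assignment.items():
        if node in load:
            load[node] += 1
    for vip, node in list(assignment.items()):
        if node not in healthy_nodes:            # owner has failed
            target = min(load, key=load.get)     # pick the lightest node
            assignment[vip] = target
            load[target] += 1
    return assignment

# Example: node "B" fails; its virtual IPs move to the surviving nodes.
vips = {"10.0.0.100": "A", "10.0.0.101": "B", "10.0.0.102": "B"}
print(reassign_virtual_ips(vips, healthy_nodes={"A", "C"}))
```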
D. Local and Global Fault Monitors

Fault monitors track the critical resources within and around the cluster on a
continuous or event-driven basis: network connections, Rainfinity or other
applications residing on the nodes, and remote nodes or applications. They
are an integral part of the RAIN technology, guaranteeing the healthy
operation of the cluster.
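
Stripped to its essentials, such a monitor is a loop that probes each watched
resource and raises an event on every state transition. The sketch below is only
a schematic, with a hypothetical `check` callable standing in for the real probes:

```python
import time

def monitor(resources, check, interval=2.0, on_change=print):
    """Poll each resource and report only UP/DOWN transitions.
    `check(resource)` returns True when the resource is healthy."""
    state = {r: True for r in resources}
    while True:                                  # runs until interrupted
        for r in resources:
            healthy = check(r)
            if healthy != state[r]:              # state transition: notify
                state[r] = healthy
                on_change(f"{r}: {'UP' if healthy else 'DOWN'}")
        time.sleep(interval)
```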
E. Secure and Central Management

This module of the RAIN technology offers a browser-based management GUI
for centralized monitoring and configuration of all nodes in the RAIN
cluster. The central management GUI connects to any node in the cluster to
obtain a single-system view of the entire cluster. It actively monitors the
status, and can send operation and configuration commands to the entire
cluster.

FEATURES OF RAIN

1. Communication

As the network is frequently a single point of failure, RAIN provides fault
tolerance in the network through the following mechanisms (a sketch of the
link monitor follows this list):
i) Bundled interfaces: Nodes are permitted to have multiple interface
cards. This not only adds fault tolerance to the network, but also gives
improved bandwidth.
ii) Link monitoring: To correctly use multiple paths between nodes in the
presence of faults, we have developed a link-state monitoring protocol
that provides a consistent history of the link state at each endpoint.
iii) Fault-tolerant interconnect topologies: Network partitioning is always a
problem when a cluster of computers must act as a whole. We have
designed network topologies that are resistant to partitioning as network
elements fail.
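
A toy version of the link monitor in item ii) can be written as follows; the
heartbeat timing and the record format are assumptions for illustration. Each
endpoint appends a timestamped entry whenever the link changes state, yielding a
replayable history of the link:

```python
import time

class LinkMonitor:
    """Track one link's state from received heartbeats and keep a
    timestamped history of UP/DOWN transitions (sketch only)."""

    def __init__(self, max_silence=1.5):
        self.max_silence = max_silence       # seconds before declaring DOWN
        self.last_heard = None
        self.history = []                    # list of (timestamp, state)

    def heartbeat_received(self):
        if not self.history or self.history[-1][1] == "DOWN":
            self.history.append((time.time(), "UP"))
        self.last_heard = time.time()

    def poll(self):
        """Call periodically; records a DOWN transition after silence."""
        silent = (self.last_heard is None or
                  time.time() - self.last_heard > self.max_silence)
        if silent and self.history and self.history[-1][1] == "UP":
            self.history.append((time.time(), "DOWN"))
```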

The Problem:
We look at the following problem: given n switches of degree ds connected
in a ring, what is the best way to connect n compute nodes of degree dc to
the switches so as to minimize the possibility of partitioning the compute nodes
when switch failures occur? Figure 3 illustrates the problem.

Fig 3: How to connect n compute nodes to a ring of n switches?

A Naïve Approach:

At first glance, Figure 4a may seem to solve our problem. In this
construction we simply connect the compute nodes to the nearest switches in
a regular fashion. If we use this approach, we are relying entirely on the fault
tolerance of the switching network.
A ring is 1-fault-tolerant for connectivity, so we can lose one switch without
upset. A second switch failure, however, can partition the switches and thus the
compute nodes, as in Figure 4b. This prompts the study of whether we can use
the multiple connections of the compute nodes to make them more resistant to
partitioning. In other words, we want a construction where the connectivity of
the compute nodes is maintained even after the switch network has become
partitioned.

Fig 4: (a) The naive approach of attaching each compute node to its nearest
switches. (b) Notice that it is easily partitioned by two switch failures.
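
To make the contrast concrete, here is a small self-contained simulation. The
"diameter" attachment rule (connect compute node i to switch i and to the switch
roughly opposite it on the ring) is one reading of the construction studied in [3];
the vertex naming and the brute-force check are purely illustrative. Compute nodes
whose switches have all failed are treated as lost rather than as a partition:

```python
from itertools import combinations

def components(adj):
    """Connected components of an undirected graph {vertex: set(neighbors)}."""
    seen, comps = set(), []
    for v in adj:
        if v in seen:
            continue
        comp, stack = set(), [v]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def partitioned(n, failed, offset):
    """Ring of switches s0..s(n-1); compute node ci attaches to switches i
    and (i+offset) mod n. True if removing the failed switches splits the
    surviving compute nodes into more than one group."""
    adj = {v: set() for i in range(n) for v in (f"s{i}", f"c{i}")}
    def link(a, b):
        adj[a].add(b); adj[b].add(a)
    for i in range(n):
        link(f"s{i}", f"s{(i + 1) % n}")             # the switch ring
        link(f"c{i}", f"s{i}")                       # nearest switch
        link(f"c{i}", f"s{(i + offset) % n}")        # second attachment
    for s in failed:                                 # delete failed switches
        for nb in adj.pop(s):
            if nb in adj:
                adj[nb].discard(s)
    adj = {v: nbrs for v, nbrs in adj.items() if nbrs}   # drop isolated nodes
    groups = components(adj)
    return sum(any(v[0] == "c" for v in g) for g in groups) > 1

n = 8
for name, offset in (("naive, offset 1   ", 1), ("diameter, offset 4", n // 2)):
    bad = sum(partitioned(n, {f"s{a}", f"s{b}"}, offset)
              for a, b in combinations(range(n), 2))
    print(f"{name}: {bad} of 28 double switch failures partition the nodes")
```

Running this for n = 8 shows that many double switch failures partition the naive
construction, while the diameter-style attachment survives them all, which is
exactly the motivation given above.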

2. Data Storage

Fault tolerance in data storage over multiple disks is achieved through
redundant storage schemes. Novel error-correcting codes have been
developed for this purpose. These are array codes that encode and decode
using simple XOR operations. Traditional RAID codes generally allow
mirroring or parity as options. Array codes exhibit optimality in the storage
requirements as well as in the number of update operations needed.
Although some of the original motivations for these codes come from
traditional RAID systems, these schemes apply equally well to partitioning
data over disks on distinct nodes or even partitioning data over remote
geographic locations.
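
As a hedged illustration of XOR-based redundancy, the sketch below shows plain
single-erasure parity rather than the actual array codes developed in the project:
one extra parity block, stored on its own node, lets any single lost data block be
rebuilt.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"node0...", b"node1...", b"node2..."]   # equal-sized data blocks
parity = xor_blocks(data)                        # stored on a fourth node

# Node 1 is lost: rebuild its block from the survivors plus parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]
```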

Fig 5: RAIN Testbed


3. Group Membership

Tolerating faults in an asynchronous distributed system is a challenging task.
A reliable group membership service ensures that processes in a group
maintain a consistent view of the global membership. In order for a
distributed application to work correctly in the presence of faults, a certain
level of agreement is required; problems of this kind in asynchronous
distributed systems, such as consensus, group membership, commit and
atomic broadcast, have been extensively studied by researchers. In the RAIN
system, the group membership protocol is the critical building block. It is a
difficult task, especially when changes in membership occur, either due to
failures or due to voluntary joins and withdrawals. In fact, under the classical
asynchronous environment, the group membership problem has been proven
impossible to solve in the presence of any failures. The underlying reason for
the impossibility is that, according to the classical definition of an asynchronous
environment, processes in the system share no common clock and there is no
bound on the message delay. Under this definition it is impossible to
implement a reliable fault detector, for no fault detector can distinguish
between a crashed node and a very slow node. Since the establishment of this
theoretical result, researchers have been striving to circumvent this
impossibility. Theorists have modified the specification, while practitioners
have built a number of real systems that achieve a level of reliability in their
particular environments.

Token Mechanism

The nodes in the membership are ordered in a logical ring. A token is a
message that is passed at a regular interval from one node to the next node
in the ring. The reliable packet communication layer is used for the
transmission of the token, and guarantees that the token will eventually
reach the destination. The token carries the authoritative knowledge of the
membership: when a node receives a token, it updates its local membership
information according to the token. The token is also used for failure
detection. There are two variants of the failure detection protocol in this token
mechanism. The aggressive detection protocol achieves fast detection time
but is more prone to incorrect decisions, i.e., it may temporarily exclude a
node that has only suffered a link failure. The conservative detection
protocol excludes a node only when its communication has failed from all
nodes in the connected component; it therefore has slower detection time
than the aggressive protocol.

Aggressive Failure Detection

When the aggressive failure detection protocol is used, after a node fails to
send a token to the next node, the former node immediately decides that the
latter node has failed or disconnected, removes it from the membership
information, and passes the token to the next live node in the ring. This
protocol does not guarantee that all nodes in the connected component are
included in the membership at all times. If a node loses a connection to part
of the system because of a link failure, it could be excluded from the
membership. The excluded node will automatically rejoin the system,
however, via the 911 mechanism, which is described in the next section. For
example, in the situation in figure (b), the link between A and B is broken.
After node A fails to send the token to node B, the aggressive failure
detection protocol excludes node B from the membership. The ring changes
from ABCD to ACD until node B rejoins the membership when the 911
mechanism is activated.

Conservative Failure Detection

In comparison, when the conservative failure detection protocol is used,
partially disconnected nodes will not be excluded. When a node detects that
another node is not responding, the former node does not remove the latter
node from the membership; instead it changes the order of the ring. In
figure (c), after node A fails to send the token to node B, it changes the order
of the ring from ABCD to ACBD. Node A then sends the token to node C,
and node C sends it to node B. In the case when a node is indeed broken, all
the nodes in the connected component fail to send the token to this node.
When a node fails to send a token to another node twice in a row, it removes
that node from the membership.
Uniqueness of Tokens
The token mechanism is the basic component of the membership protocol. It
guarantees that there exists no more than one token in the system at any
time. This single token detects failures, records the membership and
updates all live nodes as it travels around the ring. After a failed node is
determined, all live nodes in the membership are unambiguously informed
within one round of token travel. Group membership consensus is therefore
achieved.
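
The following toy simulation contrasts the two detection policies on a small ring.
The data structures and the `send` callable are our own illustrative choices; the
real protocol piggybacks the membership and sequence number on the token and
runs over the reliable transport described earlier:

```python
def pass_token(ring, token, send, conservative=True):
    """Advance the token one hop. `ring` is the ordered membership and
    `send(a, b)` returns True when a can deliver the token to b."""
    holder = token["holder"]
    nxt = ring[(ring.index(holder) + 1) % len(ring)]
    if send(holder, nxt):
        token["fails"][nxt] = 0
        token["holder"] = nxt
        return
    if not conservative:
        ring.remove(nxt)                     # aggressive: exclude at once
        return
    token["fails"][nxt] = token["fails"].get(nxt, 0) + 1
    if token["fails"][nxt] >= 2:             # unreachable twice in a row
        ring.remove(nxt)                     # conservative: now exclude
    else:                                    # first miss: reorder the ring
        j = ring.index(nxt)
        k = (j + 1) % len(ring)
        ring[j], ring[k] = ring[k], ring[j]  # ABCD -> ACBD

# Example: the A-B link is broken, but node B itself is alive.
ring = ["A", "B", "C", "D"]
token = {"holder": "A", "fails": {}}
send = lambda a, b: (a, b) != ("A", "B")     # only the A->B hop fails
pass_token(ring, token, send)
print(ring)   # ['A', 'C', 'B', 'D']: B is kept, reached later via C
```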
911 Mechanisms
Having described the token mechanism, a few questions remain. What if a
node fails while it processes the token, so that the token is lost? Is it possible
to add a new node to the system? How does the system recover from
transient failures? All of these questions are answered by the 911
mechanism.
Token Regeneration

To deal with the token loss problem, a timeout is set on each node in the
membership. If a node does not receive a token for a certain period of time,
it enters the STARVING mode. The node suspects that the token has been
lost and sends out a 911 message to the next node in the ring. The 911
message is a request for the right to regenerate the token, and it must be
approved by all the live nodes in the membership. It is imperative to allow
one and only one node to regenerate the token when a token regeneration is
needed. To guarantee this mutual exclusivity, we utilize the sequence
number on the token.
Every time a token is passed from one node to another, the sequence
number on it is increased by one. The primary function of the sequence
number is to allow the receiving node to discard out-of-sequence tokens.
The sequence number also plays an important role in the token regeneration
mechanism. Each node makes a local copy of the token every time it
receives it. When a node needs to send a 911 message to request the
regeneration of the token, it attaches to the message the sequence number
from its last local copy of the token. This sequence number is compared
with the sequence numbers on the local copies of the token held by the
other live nodes. The 911 request will be denied by any node which
possesses a more recent copy of the token. In the event that the token is
lost, every live node sends out a 911 request after its STARVING timeout
expires. Only the node with the latest copy of the token will receive the
right to regenerate it.
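
A compact sketch of this arbitration rule (the function and variable names are our
own; in the protocol these values travel as 911 messages around the ring):

```python
def approve_regeneration(request_seq, local_seq):
    """A node approves a 911 request only if the requester's token copy
    is at least as recent as the node's own local copy."""
    return request_seq >= local_seq

# Example: the token (sequence 41) is lost after node C last copied it;
# A and B hold older copies, so only C's request is granted by everyone.
copies = {"A": 39, "B": 40, "C": 41}
for requester, seq in copies.items():
    granted = all(approve_regeneration(seq, other_seq)
                  for node, other_seq in copies.items() if node != requester)
    print(requester, "may regenerate:", granted)
```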
Dynamic Scalability

The 911 message is used not only as a token regeneration request, but also as
a request to join the group. When a new node wishes to participate in the
membership, it sends a 911 message to any node in the cluster. The receiving
node notices that the originating node of this 911 is not a member of the
distributed system and therefore treats it as a join request. The next time
that it receives the token, it adds the new node to the membership and sends
the token to the new node. The new node thus becomes a part of the system.
Link Failures and Transient Failures

The unification of the token regeneration request and the join request
facilitates the treatment of link failures in the aggressive failure detection
protocol. Using the example in figure (b), node B has been removed from
the membership because of the link failure between A and B. Node B does
not receive the token for a while, so it enters the STARVING mode and sends
out a 911 message to node C. Node C notices that node B is not a part of the
membership and therefore treats the 911 as a join request. The ring is
changed back to ABCD and node B rejoins the membership.
Transient failures are treated with the same mechanism. When a transient
failure occurs, a node is removed from the membership. After the node
recovers, it sends out a 911 message. The 911 message is treated as a join
request and the node is added back into the cluster. In the same fashion,
wrong decisions made by a local failure detector can be corrected,
guaranteeing that all non-faulty nodes in the primary connected component
eventually stay in the primary membership. Putting together the token and
911 mechanisms, we have a reliable group membership protocol. Using this
protocol it is easy to build the fault management service. It is also possible
to attach application-dependent synchronization information to the token.
4. Clustering

Clustering means linking together two or more systems to handle variable
workloads or to provide continued operation in the event one fails. Each
computer may be a multiprocessor system itself; clustered computers behave
like a single computer and are used for load balancing, fault tolerance, and
parallel processing. Rainfinity provides clustering solutions that let Internet
applications run on a reliable, scalable cluster of computing nodes so that
they do not become single points of failure.
5. Distributed

A distributed system is one that contains many independent computers
that communicate through a computer network. The computers communicate
with each other in order to achieve a common goal. Software that runs in a
distributed system is called a distributed program. RAIN uses a loosely
coupled architecture, but the distributed protocols interact closely with
existing networking protocols so that a RAIN cluster is able to interact
with its environment. In particular, technology modules were developed to
manage high-volume network-based transactions.
6. Shared-Nothing

Shared-nothing architecture (SNA) is a distributed computing architecture
that consists of multiple nodes such that each node has its own
private memory, disks and input/output devices, independent of any other
node in the network. Each node is self-sufficient and shares nothing across
the network; therefore, there is no contention for shared data or system
resources. In RAIN the most general shared-nothing model is assumed: there
is no shared storage accessible from all computing nodes. The only way for
the computing nodes to share state is to communicate via a network.
7. Fault tolerant

RAIN achieves fault tolerance through software implementation. The system
tolerates multiple node, link, and switch failures, with no single point of
failure. The concept has actually been derived from RAID (redundant
array of independent disks), which is implemented on independent disk
arrays. Disk-use techniques involve multiple disks working cooperatively;
disk striping, for instance, uses a group of disks as one storage unit.
RAID schemes improve performance and improve the reliability of the
storage system by storing redundant data, for example using mirroring or
shadowing (RAID 1) to keep a duplicate of each disk. RAIN, on the other
hand, is a novel, more advanced way of protecting computer storage than
RAID. A RAIN cluster is a proper distributed computing system that is
robust to faults. It handles node, link and application failures, as well as
transient failures, efficiently. When there are failures in the system, a RAIN
cluster gracefully degrades, leaving out the failed node while continuing to
perform its operations.
8. Reliance on software

RAIN depends on software to organize multiple separate computer
servers to provide data reliability. Instead of storing multiple copies of the
same data on physically separate hard disks within one server, data is replicated
across multiple servers. The software organizing the cluster of RAIN servers
knows the location of each copy and thus provides protection in case of
failures by making duplicate copies as and when required.
9. Use of inexpensive nodes

RAIN uses loosely coupled computing clusters built from inexpensive RAIN
nodes, instead of expensive hardware devices. It uses management
software that distributes tasks to various computers and, in the event of a
failure, will retry a task until a node responds. Many loosely coupled
computing projects make use of a RAIN strategy to some degree.
10. Suitability for Internet applications and Network Applications

RAIN technology is very apt for Internet and network applications. During
the RAIN project, key components were built to achieve this suitability for
network and Internet applications. A patent was filed and granted for the
RAIN technology. Rainfinity emerged as a spin-off in 1998, and the company
had exclusive intellectual property rights to the RAIN technology. After the
formation of the company, the RAIN technology was further augmented, and
additional patents have been filed. The architectural objectives for clustering
data network applications differ from those for clustering data storage
applications. Similar goals apply in the telecom environments that provide
the Internet backbone infrastructure, owing to the nature of the applications
and services being clustered.
11. Scalability

The technology has the feature of scalability. Scalability is the ability of a
computer application or product to continue to function well when it is
changed in size or volume in order to meet a user's need. RAIN has the
characteristic that a failed node is replaced by a new node; it focuses on
recovery from unplanned and planned downtime. This new type of
cluster must also be able to maximize I/O performance by load
balancing across the various computing nodes. Moreover, with the help of
RAIN, the connection between a client and a server can be maintained despite
all unplanned and planned downtime.

COMPONENTS OF RAIN
The RAIN technology consists of the following components:
4.1. RAIN nodes: These are the basic elements of RAIN. These hardware
components provide 1 terabyte of disk storage capacity along with standard
Ethernet networking and the CPU processing power to run the RAIN and data
management software. Data is stored and secured reliably among multiple RAIN
nodes.
4.2. IP-based internetworking: The physical interconnections among the RAIN
nodes are established using standard IP-based LANs, metropolitan-area networks
(MANs) and/or WANs. This allows administrators to develop an integrated storage
and protection grid of RAIN nodes across multiple data centers.

Fig 7: Hardware used in RAIN


4.3. RAIN management software: The RAIN management software is a vital
component of the RAIN architecture and performs significant tasks such as
letting RAIN nodes continuously communicate their assets, capacity, performance
and health among themselves, automatically detecting the presence of new RAIN
nodes on the network, and carrying out recovery operations. RAIN software has
three components:
Storage component: The basic function of this component is to store and retrieve
data across distributed processors.
Communication component: The communications component creates a redundant
network between multiple processors and provides a single, uniform way of
connecting to any of them.
Computing component: The computing component automatically recovers and
restarts applications if a processor malfunctions.

TOPOLOGY USING RAIN


RAIN technology helps in building the topology in such a manner that it
minimizes the number of nodes and removes the extra nodes. It provides a
solution by minimizing the total number of nodes in the network between client
and server. As the total number of nodes is at a minimum, the data transmission
time from source node to destination node is reduced. Secondly, the delay factor
is reduced and data can be transmitted within a shorter period of time.
A. Star Topology

In star topology, all the nodes are attached to a central hub or switch. All the
nodes in the network communicate with one another via the central hub, as
shown in Figure 8:

Fig 8: Star Topology

The main problem in star topology is that if the central hub fails, the whole
network goes down and no node can communicate with any other node in the
network.
1) Star Topology Using Rain:

We can place a switch at each node of the network, and each node can be
connected with a few other nodes in the network, as shown in Figure 9, apart
from the central node. If the central node fails, a node can communicate with
the rest of the network by using another available path. For example, if the
central node fails, node-2 can still communicate with node-1 and node-3 over
another path. Suppose, further, that one link of node-2 fails; even then node-2
can communicate with the rest of the network. Node-2 will be disconnected
only if both its outgoing links and the central hub fail.

Fig 9: Star Topology Using Rain


B. Ring Topology

In ring topology, each node is connected to the next node, forming a ring-shaped
network as shown in Figure 10.

Fig 10: Ring Topology

There are two main problems with a ring network:
If one node of the network fails, the whole network fails.
Scalability: if we add more nodes to the network, the token needs more time to
reach the destination node, so the delay increases.
(1) Ring Topology Using Rain:

Using RAIN technology, nodes are attached to other nodes of the network
using the diameter method [3], so that in case of a node or link failure they
can still communicate with one another. Each node is connected to the node
at the longest distance from it, which also helps to reduce the delay in
transferring the token.

Fig 11: Ring Topology Using Rain

As shown in Figure 11, every node is connected to another node that is far
from it, and the network can tolerate up to 2-3 link failures. If any link in the
ring topology fails, nodes can still communicate with one another in the
network via an alternative, duplicate path.
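
A quick connectivity check (pure Python, illustrative; the chord rule "connect
node i to node i + n/2" is one reading of the diameter method in [3]) confirms
that the chord-augmented ring survives double link failures that disconnect a
plain ring:

```python
def connected(n, edges):
    """BFS over nodes 0..n-1; True if every node is reachable from node 0."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w); stack.append(w)
    return len(seen) == n

n = 8
ring = {(i, (i + 1) % n) for i in range(n)}           # plain ring links
chords = {(i, (i + n // 2) % n) for i in range(n)}    # diameter chords

failed = {(0, 1), (3, 4)}                             # two broken ring links
print(connected(n, ring - failed))                    # plain ring: False
print(connected(n, (ring | chords) - failed))         # with chords: True
```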
C. BUS Topology
In bus topology, a backbone cable is used to which all the nodes of the network
are connected. Every node of the network communicates with the others via the
backbone cable.

Fig 12: Bus Topology

The main problem with bus topology is that if the backbone cable fails, the
whole network goes down.
1) Bus Topology Using Rain: Nodes of the bus topology are connected through
the backbone cable as well as through switches, as shown in Figure 13. Each
node in the bus topology can therefore communicate with the rest of the
network by using either a switch or the backbone cable. Nodes are connected
to different switches so that they can reach all the nodes of the network, as in
Figure 13.

Fig 13: Bus Topology Using Rain

If the backbone cable fails, the nodes can still communicate with one another
using a switch and the other nodes of the network. The network goes down
only when the backbone cable as well as both switches fail.

D. Mesh Topology
In mesh topology, every node of the network has a dedicated point-to-point link
to every other device. A fully connected mesh network therefore has n(n-1)/2
physical channels to link n devices; for example, n = 6 devices require
6*5/2 = 15 links. To accommodate that many links, every device on the network
needs n-1 input/output ports. Figure 14 shows a representation of mesh topology.

Fig 14: Mesh Topology

The main problem in mesh topology is that if there are n nodes, then every node
must have n-1 input/output ports, and a large number of cables is required.
(1) Mesh Topology Using RAIN:
The problem in the mesh topology can be solved by the diameter solution [3] of
RAIN. The nodes are kept a distance apart and links are established at minimum
distance. Using the diameter solution we can avoid dedicated links between every
pair of nodes, so there is no requirement of n-1 ports at each node.

ADVANTAGES
1. RAIN technology is the most scalable software cluster technology for the
Internet marketplace today.
2. There is no limit on the size of a RAIN cluster.
3. All nodes are active and can participate in load balancing.
4. This software-only technology is open and highly portable.
5. RAIN technology offers various benefits, as listed below:
Fault tolerance: RAIN achieves fault tolerance through software
implementation. The system tolerates multiple node, link, and switch
failures, with no single point of failure. A RAIN cluster is a true distributed
computing system that is resilient to faults; it works on the principle of
graceful degradation.

Simple to deploy and manage: It is very easy to deploy and administer a
RAIN cluster. RAIN technology deals with the scalability problem on the
layer where it occurs, without the need to create additional layers in
front. The management software allows the user to monitor and
configure the entire cluster by connecting to any one of the nodes.

Open and portable: The technology used is open and highly portable. It is
compatible with a variety of hardware and software environments. Currently
it has been ported to Solaris, NT and Linux.

Support for heterogeneous environments: It supports a heterogeneous
environment as well, where the cluster can consist of nodes running different
operating systems with different configurations.

No distance limitation: There is no distance restriction in RAIN
technology. It allows clusters of geographically distributed nodes, and it can
work with many different Internet applications.

Availability: Another advantage of RAIN is its continuous availability. For
example, Rainwall detects failures in software and hardware components
in real time, shifting traffic from failing gateways to functioning ones
without interrupting existing connections.

Scalability: RAIN technology is scalable. There is no limit on the size of a
RAIN cluster; for example, Rainwall scales to any number of Internet firewall
gateways and allows the addition of new gateways into the cluster without
service interruption.

Load balancing and performance: New nodes can be added to the
cluster on the spot to take part in load sharing, without degrading the
network performance, as in the case of Rainwall. Rainwall keeps track of the
total traffic going into each node. When an imbalance in the network traffic
is sensed, it moves one or more virtual IPs from the more heavily loaded
node to a more lightly loaded node, all without taking down the cluster.

DISADVANTAGES
RAIN technology suffers from some drawbacks, as specified below:
1. As RAIN technology requires the placement of switches within the structure,
it becomes a little expensive.
2. Installation and configuration are time-consuming, and the setup requires
maintenance as well.
3. If a node of the topology fails, it will not disturb the topology
completely, as mentioned above; but if a switch fails, it affects the network
partially, and the switch has to be repaired as early as possible.

APPLICATIONS
We consider several applications implemented on the RAIN platform based on the
communication, fault management and data storage building blocks: a video server
(RAINVideo), a web server (SNOW), and a distributed checkpointing system
(RAINCheck).
High-availability video server:
There has been considerable research in the areas of fault-tolerant Internet and
multimedia servers; an example is the SunSCALR project at Sun Microsystems
[15]. For the RAINVideo application, a collection of videos is written and
encoded to all n nodes in the system with distributed store operations. Each node
then runs a client application that attempts to display a video, as well as a
server application that supplies encoded video data.
High-availability web server:
SNOW stands for Strong Network Of Web servers. It is a concept project that
demonstrates the features of the RAIN system. The main purpose is to
develop a highly available, fault-tolerant distributed web server cluster that
minimizes the risk of downtime for mission-critical Internet and intranet
applications. The SNOW project uses several key building blocks of the RAIN
technology. First, the reliable communication layer is used to handle all of the
messages passed between the servers in the SNOW system. Second, the
token-based fault management module is used to establish the set of servers
participating in the cluster.
Distributed checkpointing mechanism:
A checkpoint and rollback/recovery mechanism has been built on the RAIN
platform based on the distributed store and retrieve operations. The scheme runs
in conjunction with a leader election protocol. This protocol ensures that there is
a unique node designated as leader in every connected set of nodes. As each job
executes, a checkpoint of its state is taken periodically. The state is encoded and
written to all accessible nodes with a distributed store operation. If a node fails
or becomes inaccessible, the leader reassigns the node's job to other nodes.
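
The recovery step can be sketched schematically as follows; every name here is
hypothetical, and the real system stores erasure-coded checkpoint fragments via
the distributed store operation rather than full copies:

```python
def recover(jobs, checkpoints, alive):
    """Leader's view: reassign each job owned by a failed node to a
    surviving node, restarting it from its last stored checkpoint."""
    for job, owner in list(jobs.items()):
        if owner not in alive:                    # owner failed or unreachable
            new_owner = min(alive)                # trivial placement rule
            jobs[job] = new_owner
            state = checkpoints[job]              # distributed retrieve
            print(f"restarting {job} on {new_owner} from seq {state['seq']}")
    return jobs

jobs = {"render": "n1", "index": "n2"}
checkpoints = {"render": {"seq": 17}, "index": {"seq": 9}}
recover(jobs, checkpoints, alive={"n2", "n3"})    # node n1 has failed
```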

CONCLUSION
The goal of the RAIN project has been to address fault management,
communication and storage in a distributed environment. The building blocks that
we consider important are those providing reliable communication, group
membership and reliable storage. Simply put, RAIN allows the grouping of an
unlimited number of nodes, which can then function as one single giant node,
sharing load or taking over if one or more of the nodes ceases to function
correctly.
The future directions of this work are:
the development of APIs for using the various building blocks;
the implementation of a real distributed file system.
RAIN technology has been exceedingly advantageous in facilitating the resolution
of high-availability and load-balancing problems. It is applicable to an extensive
range of networking applications, such as firewalls, web servers, IP telephony
gateways, application routers, etc. The purpose of the RAIN project has been to
pave the way for fault management, communication, and storage in a distributed
environment. It integrates many significant innovations in its core elements, such
as unlimited scalability, built-in reliability and portability. It has been very
useful in the development of a fully functional distributed computing system.

REFERENCES

1) http://searchdatacenter.techtarget.com/definition/RAIN
2) http://www.thefreelibrary.com/Latest+network+management+products-a062893683
3) http://paradise.caltech.edu/papers/etr029.pdf
4) http://www.seminarprojects.com/Thread-rain-random-array-of-independent-nodes
5) http://en.wikipedia.org/wiki/Redundant_Array_of_Inexpensive_Nodes
6) www.paradise.caltech.edu
