Submitted by:
ASHISH GUPTA
A22205215042
B.Tech (CSE) III Semester
CERTIFICATE
Certified that the Report entitled Rain Technology submitted by Ashish Gupta with
Enrolment No. A22205215042 is his own work and has been carried out under my
supervision. It is recommended that the candidate may now be evaluated for his
work by the University.
Signature:
Designation:
Date:
Signature:
Acknowledgements
First of all, I would like to sincerely thank my supervisor, Mr. Deepak Pawar, for
his persistent support, guidance, help, and encouragement during the whole process of
my study and my dissertation work. I would also like to thank the students, staff
and faculty of the Department of Computer Science and Engineering, Amity School
of Engineering and Technology, for their support.
I would also like to thank my parents for their good wishes in completing this work.
Finally, thanks to my friends for their support and love all the time.
ASHISH GUPTA
ABSTRACT
The RAIN project is a research collaboration between Caltech and NASA-JPL on
distributed computing and data storage systems for future space-borne missions.
The goal of the project is to identify and develop key building blocks for reliable
distributed systems built with inexpensive off-the-shelf components.
The RAIN platform consists of a heterogeneous cluster of computing and/or
storage nodes connected via multiple interfaces to networks configured in
fault-tolerant topologies. The RAIN software components run in conjunction with
operating system services and standard network protocols. Through
software-implemented fault tolerance, the system tolerates multiple node, link,
and switch failures, with no single point of failure.
The RAIN technology has been transferred to Rainfinity, a start-up company
focusing on creating clustered solutions for improving the performance and
availability of Internet data centers.
The massive leaps in technology led to the expansion of the Internet as the most
widely accepted medium for communication. But one of the most prominent problems
with this client-server based technology is that of maintaining an uninterrupted
connection. Even if a single intermediate node breaks down, the entire system
collapses. The solution to this problem can be the use of clustering. Clustering
means linking together two or more systems to manage erratic workloads or to offer
uninterrupted operation in the event that one fails. Clustering technology offers
an approach to improve overall reliability and performance. One implementation of
this approach was RAIN (Reliable Array of Independent Nodes), developed by the
California Institute of Technology in collaboration with NASA's Jet Propulsion
Laboratory and the Defense Advanced Research Projects Agency (DARPA). The
technology is implemented in a distributed computing architecture built with
inexpensive off-the-shelf components. The RAIN platform involves a heterogeneous
cluster of nodes linked using many interfaces to networks configured in
fault-tolerant topologies.
RAIN technology was capable of providing the solution by reducing the number of
nodes in the chain linking the client and server, in addition to making the
current nodes more robust and more autonomous.
INTRODUCTION
Myrinet switches provide the high speed cluster message passing network for
passing messages between compute nodes and for I/O. The Myrinet switches have
a few counters that can be accessed from an Ethernet connection to the switch.
These counters can be accessed to monitor the health of the connections, cables,
etc. The following information refers to the 16-port, the clos-64 switches, and the
Myrinet2000 switches.
ServerNet is a switched fabric communications link primarily used in proprietary
computers made by Tandem Computers, Compaq, and HP. Its features include
good scalability, clean fault containment, error detection and failover.
RAIN stands for Reliable Array of Independent Nodes.
In 1998 the RAIN research team formed a company called Rainfinity.
ARCHITECTURE
The RAIN technology incorporates a number of unique innovations as its core
modules:
communication among the computing nodes, and ensures that they operate
together without conflict.
C. Always-On-IP
Fault monitors track the critical resources within and around the cluster, such as
network connections, Rainfinity or other applications residing on the nodes, and
remote nodes or applications, on a continuous or event-driven basis. They
are an integral part of the RAIN technology, guaranteeing the healthy
operation of the cluster.
E. Secure and Central Management
FEATURES OF RAIN
1. Communication
The Problem:
We look at the following problem: given n switches of degree d_s connected
in a ring, what is the best way to connect n compute nodes of degree d_c to
the switches to minimize the possibility of partitioning the compute nodes
when switch failures occur? Figure 3 illustrates the problem.
A Naïve Approach:
Fig. 4: (a) A naive approach, d_c = 0. (b) Notice that it is easily partitioned
with two switch failures.
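The partitioning question above can be explored with a small brute-force check. The Python sketch below is only an illustration under stated assumptions: n switches in a ring, each compute node wired to two switches (this degree is an assumption of the sketch), and the compute nodes counted as partitioned when the ones that still reach a working switch fall into more than one connected group. `naive_attach` wires node i to switches i and i+1; `opposite_attach` is a hypothetical alternative that wires node i to switch i and the switch opposite it on the ring. All function names are made up for this example.

```python
from collections import deque
from itertools import combinations


def naive_attach(n):
    """Naive scheme: compute node i is wired to switches i and i+1 (mod n)."""
    return {i: {i, (i + 1) % n} for i in range(n)}


def opposite_attach(n):
    """Hypothetical alternative: compute node i is wired to switch i and to the
    switch diametrically opposite it on the ring, (i + n // 2) mod n."""
    return {i: {i, (i + n // 2) % n} for i in range(n)}


def partitioned(n, attach, failed):
    """True if the compute nodes that still reach a working switch are split
    into more than one connected group once the switches in `failed` are down."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    # Ring links between neighbouring switches that are both still working.
    for s in range(n):
        t = (s + 1) % n
        if s not in failed and t not in failed and s != t:
            link(("switch", s), ("switch", t))

    # Compute nodes keep only their links to switches that survived.
    survivors = []
    for c, switches in attach.items():
        up = [s for s in switches if s not in failed]
        if up:
            survivors.append(("node", c))
            for s in up:
                link(("node", c), ("switch", s))

    if len(survivors) < 2:
        return False

    # Breadth-first search from one surviving compute node.
    seen = {survivors[0]}
    queue = deque([survivors[0]])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return any(c not in seen for c in survivors)


def first_partitioning_failure(n, attach, k):
    """Return the first set of k failed switches (lexicographic order) that
    partitions the compute nodes, or None if no such set exists."""
    for failed in combinations(range(n), k):
        if partitioned(n, attach, set(failed)):
            return set(failed)
    return None


print(first_partitioning_failure(8, naive_attach(8), 2))     # {0, 2}
print(first_partitioning_failure(8, opposite_attach(8), 2))  # None
```

With switches 0 and 2 down, switch 1 is cut off from the rest of the ring together with the compute nodes attached to it, which is the kind of two-failure partition the caption describes. Under the same assumptions the opposite wiring survives every pair of switch failures, because some compute node always bridges the two surviving arcs of the ring.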
2. Data Storage
Token Mechanism
The nodes in the membership are ordered in a logical ring. A token is a
message that is being passed at a regular interval from one node to next node
in the ring. The reliable packet communication layer is used for the
transmission of the token, and guarantees that the token will eventually
reach the destination. The token carries the authoritative knowledge of the
membership. When a node receives the token, it updates its local membership
information according to the token. The token is also used for failure
detection. There are two variants of the failure detection protocol in this token
mechanism. The aggressive detection protocol achieves a fast detection time
but is more prone to incorrect decisions: it may temporarily exclude a node,
but only in the presence of link failures. The conservative detection
protocol excludes a node only when its communication has failed from all
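The difference between the two detection variants can be illustrated with the link-failure case discussed later in this section, where the link between A and B fails while B itself stays alive. Because the description of the conservative protocol is cut off above, this Python sketch assumes it means that a node is excluded only when no other member can still reach it. Reachability is modelled as a set of ordered pairs, and the function names are hypothetical.

```python
def aggressive_exclude(suspect, detector, reachable):
    """Aggressive variant: exclude `suspect` as soon as the detecting node's
    own link to it has failed."""
    return (detector, suspect) not in reachable


def conservative_exclude(suspect, members, reachable):
    """Conservative variant (assumed reading): exclude `suspect` only when no
    other member can still communicate with it."""
    return all((m, suspect) not in reachable for m in members if m != suspect)


members = ["A", "B", "C", "D"]
# Every ordered pair can communicate except across the failed A-B link.
reachable = {
    (x, y) for x in members for y in members
    if x != y and {x, y} != {"A", "B"}
}

print(aggressive_exclude("B", "A", reachable))        # True: B excluded although alive
print(conservative_exclude("B", members, reachable))  # False: C can still reach B
```

If B had actually crashed, every pair involving B would be missing from `reachable` and both variants would exclude it; they only disagree when the failure is a link rather than the node.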
To deal with the token loss problem, a timeout has been set on each node in
the membership. If a node does not receive a token for a certain period of
time, it enters the STARVING mode. The node suspects that the token has
been lost and sends out a 911 message to the next node in the ring. The 911
message is a request for the right to regenerate the token, and must be granted
by all the live nodes in the membership. It is imperative to allow one and
only one node to regenerate the token when a token regeneration is needed.
To guarantee this mutual exclusivity, we utilize the sequence number on the
token.
Every time a token is being passed from one node to another, the sequence
number on it is increased by one. The primary function of the sequence
number is to allow the receiving node to discard the out of sequence tokens.
The sequence number also plays an important role in the token regeneration
mechanism. Each node makes a local copy of the token every time that the
node receives it. When a node needs to send a 911 message to request the
regeneration of the token, it attaches to this message the sequence number that
is on its last local copy of the token. This sequence number will be compared to
all the sequence numbers on the local copies of the token on the other live
nodes. The 911 request will be denied by any node that possesses a more
recent copy of the token. In the event that the token is lost, every live node
sends out a 911 request after its STARVING timeout expires. Only the node
with the latest copy of the token will receive the right to regenerate the
token.
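The arbitration rule can be sketched in a few lines of Python. This is an illustrative model, not RAIN's code: each node keeps the sequence number of its last local token copy, discards tokens that are not newer, and grants a 911 request unless it holds a more recent copy. The class and function names are hypothetical. Because the sequence number grows by one on every hop, no two nodes hold copies with the same number, so at most one request can be granted by everyone.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_seq: int = -1  # sequence number on this node's last local copy of the token
    membership: list = field(default_factory=list)

    def receive_token(self, seq, membership):
        """Keep a local copy of a newer token; discard out-of-sequence ones."""
        if seq <= self.last_seq:
            return False
        self.last_seq = seq
        self.membership = list(membership)
        return True

    def answer_911(self, requester_seq):
        """Deny the request if this node holds a more recent copy of the token."""
        return requester_seq >= self.last_seq


def regeneration_winners(nodes):
    """Each starving node sends a 911 carrying its last sequence number; a
    request succeeds only if every other live node grants it."""
    return [
        n for n in nodes
        if all(other.answer_911(n.last_seq) for other in nodes if other is not n)
    ]


ring = [Node(name) for name in "ABCD"]
members = [n.name for n in ring]
for seq, node in enumerate(ring[:3]):  # the token reaches A, B and C, then is lost
    node.receive_token(seq, members)

print([n.name for n in regeneration_winners(ring)])  # ['C']
```

Here the token reached A, B and C before being lost, so only C, which holds sequence number 2, is granted the right to regenerate it.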
Dynamic Scalability
The 911 message is not only used as a token regeneration request, but also as
a request to join the group. When a new node wishes to participate in the
membership, it sends a 911 message to any node in the cluster. The receiving
node notices that the originating node of this 911 is not a member of the
distributed system, and therefore treats it as a join request. The next time
that it receives the token, it adds the new node to the membership and sends
the token to the new node. The new node becomes a part of the system.
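A join can be modelled as a 911 from a non-member that the receiving node remembers until it next holds the token. In this Python sketch the new node is inserted directly after the node that received its request, which is one possible reading of "sends the token to the new node"; the function names and the `pending_joins` bookkeeping are hypothetical.

```python
def handle_911(membership, receiver, sender, pending_joins):
    """A 911 from a non-member is queued as a join request at the receiver;
    a 911 from a member is a token regeneration request."""
    if sender not in membership:
        pending_joins.setdefault(receiver, []).append(sender)
        return "join"
    return "regenerate"


def on_token(membership, holder, pending_joins):
    """When `holder` next gets the token, add its pending joiners right after it
    in the ring (an assumed placement) and return the next token recipient."""
    pos = membership.index(holder)
    for offset, new_node in enumerate(pending_joins.pop(holder, []), start=1):
        membership.insert(pos + offset, new_node)
    return membership[(pos + 1) % len(membership)]


membership = ["A", "B", "C"]
pending = {}
print(handle_911(membership, "B", "E", pending))  # join
print(on_token(membership, "B", pending))         # E
print(membership)                                 # ['A', 'B', 'E', 'C']
```

Treating joins and regeneration requests as the same message type keeps the protocol small: the receiver only has to check whether the sender is already a member.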
Link Failures and Transient Failures
The unification of the token regeneration request and the join request
facilitates the treatment of the link failures in the aggressive failure detection
protocol. Using the example in figure (b), node B has been removed from
the membership because of the link failure between A and B. Node B does not
receive the token for a while, so it enters the STARVING mode and sends
out a 911 message to node C. Node C notices that node B is not a part of the
membership and therefore treats the 911 as a join request. The ring is
changed to ABCD and node B joins the membership.
Transient failures are treated with the same mechanism. When a transient
failure occurs, a node is removed from the membership. After the node
recovers, it sends out a 911 message. The 911 message is treated as a join
request and the node is added back into the cluster. In the same fashion,
wrong decisions made by a local failure detector can be corrected,
guaranteeing that all non-faulty nodes in the primary connected component
eventually stay in the primary membership. Putting together the token and
911 mechanisms, we have a reliable group membership protocol. Using this
protocol it is easy to build the fault management service. It is also possible
to attach application-dependent synchronization information to the token.
4. Clustering
failure, will retry the task until a node responds. Many of the loosely
coupled computing projects make use of, to some degree, a RAIN strategy.
10. Suitability for Internet applications and Network Applications
RAIN technology is well suited to Internet and network applications. During
the RAIN project, key components were built to make it suitable for network
and Internet applications. A patent was filed and granted for the RAIN
technology. Rainfinity emerged from the project in 1998, and the company had
exclusive intellectual property rights to the RAIN technology. After the
formation of the company, the RAIN technology has been further augmented, and
additional patents have been filed. The architectural objectives for clustering
data network applications are different from those for clustering data storage
applications. Similar goals apply in the telecom environments that provide the
Internet backbone infrastructure, owing to the nature of the applications and
services being clustered.
11. Scalability
COMPONENTS OF RAIN
The RAIN technology consists of the following components:
4.1. RAIN nodes: These are the basic elements of RAIN. These hardware
components provide 1 terabyte of disk storage capacity, standard Ethernet
networking, and the CPU processing power to run RAIN and data management
software. Data is stored and protected reliably across multiple RAIN nodes.
4.2. IP-based internetworking: The physical interconnections amongst the RAIN
nodes are established using standard IP-based LANs, metropolitan-area networks
(MAN) and/or WANs. This allows administrators to build an integrated storage
and protection grid of RAIN nodes across multiple data centers.
In star topology, all the nodes are attached to a central hub or switch. All the
nodes in the network communicate with one another via the central hub, as shown
in Figure 2.
The main problem in star topology is that if the central hub fails, the whole
network goes down and no node can communicate with another.
1) Star Topology Using RAIN:
A switch can be placed at each node of the network, and each node can be
connected to a few other nodes in addition to the central node, as shown in
Figure 3. If the central node fails, a node can still reach the rest of the
network over another available path. For example, if the central node fails,
node-2 can still communicate with node-1 and node-3 over its other links. Even
if one of node-2's links also fails, node-2 can still communicate with the rest
of the network. Node-2 is disconnected only if both of its outgoing links and
the central hub fail.
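The star-topology argument can be checked with a breadth-first search over whichever links are still working. Figure 3 is not reproduced here, so this Python sketch assumes a hypothetical layout in which hub H is linked to nodes 1, 2 and 3 and the three nodes are also linked to each other in a triangle. The function name `can_reach` is made up for the example.

```python
from collections import deque


def can_reach(links, failed_nodes, failed_links, src, dst):
    """Breadth-first search from src to dst over the links still working."""
    down = {frozenset(link) for link in failed_links}
    adj = {}
    for a, b in links:
        if a in failed_nodes or b in failed_nodes or frozenset((a, b)) in down:
            continue
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            return True
        for w in adj.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


STAR = [("H", 1), ("H", 2), ("H", 3)]
# Hypothetical RAIN-style layout: the star plus direct links between the nodes.
STAR_PLUS = STAR + [(1, 2), (2, 3), (3, 1)]

print(can_reach(STAR, {"H"}, [], 2, 1))                     # False: plain star, hub down
print(can_reach(STAR_PLUS, {"H"}, [], 2, 1))                # True
print(can_reach(STAR_PLUS, {"H"}, [(2, 1)], 2, 1))          # True, via node 3
print(can_reach(STAR_PLUS, {"H"}, [(2, 1), (2, 3)], 2, 1))  # False: node 2 cut off
```

The four calls follow the cases in the text: the plain star is lost with its hub, the extra links survive the hub plus one of node 2's links, and node 2 is isolated only when the hub and both of its other links are gone.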
In ring topology, each node is connected to the next node, forming a ring-like
network, as shown in Figure 4.
Using RAIN technology, nodes are also attached to other nodes of the network
using the diameter method [3], so that nodes can still communicate with one
another in case of a node or link failure. Each node is connected to the node
at the longest distance from it, which helps reduce the delay in transferring
the token.
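One way to see why a link to the farthest node shortens token and message paths is to compare hop diameters. This Python sketch assumes an even number of nodes and models the diameter method simply as one extra link from each node to the node opposite it on the ring; the construction described in [3] may differ, and the function names are hypothetical.

```python
from collections import deque


def ring_links(n):
    """Each node i is linked to its neighbour i+1 (mod n)."""
    return [(i, (i + 1) % n) for i in range(n)]


def opposite_links(n):
    """Assumed diameter links: node i to node i + n//2, its farthest node on the ring."""
    return [(i, i + n // 2) for i in range(n // 2)]


def hop_diameter(n, links):
    """Largest shortest-path hop count between any two nodes (BFS from each node)."""
    adj = {i: set() for i in range(n)}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    worst = 0
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        worst = max(worst, max(dist.values()))
    return worst


print(hop_diameter(8, ring_links(8)))                      # 4
print(hop_diameter(8, ring_links(8) + opposite_links(8)))  # 2
```

For eight nodes the plain ring has a hop diameter of 4, while adding the four opposite links halves it to 2, at the cost of one extra port per node.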
D. Mesh Topology
In mesh topology, every node of the network has a dedicated point-to-point link
to every other device. A fully connected mesh network therefore has n(n-1)/2
physical channels to link n devices. To accommodate that many links, every
device on the network has n-1 input/output ports. Figure 8 shows a
representation of mesh topology.
The main problem in mesh topology is that with n nodes, every node must have
n-1 input/output ports, so a large number of cables is required.
(1) Mesh Topology Using RAIN:
The problem in mesh topology can be solved by the diameter solution [3] of
RAIN. The nodes are kept a distance apart, and links are established at the
minimum distance. Using the diameter solution, we can avoid a dedicated link
between every pair of nodes, so there is no requirement for n-1 ports at each node.
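The saving in ports can be made concrete by counting links and the largest per-node port count. This Python sketch assumes, for illustration only, that the diameter solution amounts to a ring plus one link from each node to its opposite node (n even); it is not necessarily the construction from [3], and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mesh_links(n):
    """Fully connected mesh: one dedicated link per pair of nodes."""
    return list(combinations(range(n), 2))


def ring_plus_opposite(n):
    """Assumed diameter layout (n even): ring links plus one link to the opposite node."""
    ring = [(i, (i + 1) % n) for i in range(n)]
    opposite = [(i, i + n // 2) for i in range(n // 2)]
    return ring + opposite


def link_and_port_count(links):
    """Return (number of links, largest number of ports needed on any node)."""
    ports = Counter()
    for a, b in links:
        ports[a] += 1
        ports[b] += 1
    return len(links), max(ports.values())


print(link_and_port_count(mesh_links(8)))          # (28, 7)
print(link_and_port_count(ring_plus_opposite(8)))  # (12, 3)
```

For n = 8 the full mesh needs n(n-1)/2 = 28 links and 7 ports per node, while the assumed ring-plus-opposite layout needs 12 links and only 3 ports per node, and the port count stays at 3 as n grows.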
ADVANTAGES
1. RAIN technology is the most scalable software cluster technology for the
Internet marketplace today.
2. There is no limit on the size of a RAIN cluster.
3. All nodes are active and can participate in load balancing.
4. This software-only technology is open and highly portable.
5. RAIN technology offers various benefits as listed below:
Fault tolerance: RAIN achieves fault tolerance through its software
implementation. The system tolerates multiple node, link, and switch
failures, with no single point of failure. A RAIN cluster is a true distributed
computing system that is resilient to faults; it works on the principle of
graceful degradation.
Open and portable: The technology used is open and highly portable. It is
compatible with a variety of hardware and software environments. Currently
it has been ported to Solaris, NT and Linux.
Load Balancing and Performance: New nodes can be added to the cluster on the
fly to take part in load sharing, without taking down the cluster or degrading
network performance, as in the case of Rainwall. Rainwall keeps track of the
total traffic going into each node. When it senses an imbalance in the network
traffic, it moves one or more of the virtual IPs on the more heavily loaded node
to the more lightly loaded node.
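Rainwall's rebalancing step can be sketched as moving one virtual IP from the busiest node to the idlest node whenever their traffic differs by more than a threshold. The Python function below is a hypothetical illustration: the threshold, the traffic figures and the rule for choosing which VIP to move are assumptions, not Rainwall's documented behaviour.

```python
def rebalance_once(assignment, traffic, threshold):
    """Move at most one virtual IP from the most- to the least-loaded node.

    assignment: node name -> set of VIPs it currently serves
    traffic:    VIP -> measured traffic rate (units are arbitrary here)
    Returns the VIP that was moved, or None if nothing was moved.
    """
    load = {node: sum(traffic[v] for v in vips) for node, vips in assignment.items()}
    heavy = max(load, key=load.get)
    light = min(load, key=load.get)
    gap = load[heavy] - load[light]
    if gap <= threshold:
        return None
    # Moving a VIP with rate r changes the gap to |gap - 2r|, which only
    # shrinks when 0 < r < gap.
    candidates = [v for v in assignment[heavy] if 0 < traffic[v] < gap]
    if not candidates:
        return None
    vip = min(candidates, key=lambda v: abs(gap - 2 * traffic[v]))
    assignment[heavy].remove(vip)
    assignment[light].add(vip)
    return vip


assignment = {"n1": {"vip1", "vip2", "vip3"}, "n2": {"vip4"}}
traffic = {"vip1": 50, "vip2": 30, "vip3": 20, "vip4": 10}
print(rebalance_once(assignment, traffic, threshold=20))  # vip1
print(rebalance_once(assignment, traffic, threshold=20))  # None
```

The first call moves vip1 because it brings the two loads closest together (100 vs 10 becomes 50 vs 60). The second call does nothing, since the remaining gap of 10 is within the threshold; moving whole VIPs rather than individual connections keeps each client's traffic on a single node.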
Disadvantage
RAIN technology has some drawbacks, as listed below:
1. Because RAIN technology requires switches to be placed within the network
structure, it is somewhat more expensive.
2. Installation and configuration are time consuming, and the system also
requires maintenance.
3. Although the failure of a node does not disrupt the topology completely, as
described above, the failure of a switch affects part of the network, and the
switch has to be repaired as early as possible.
APPLICATIONS
We consider several applications implemented on the RAIN platform based on the
communication, fault management and data storage building blocks: a video server
(RAIN Video), a web server (SNOW), and a distributed checkpointing system
(RAIN Check).
High availability video server:
There has been considerable research in the areas of fault-tolerant Internet and
multimedia servers. Examples are the SunSCALR project at Sun Microsystems
[15]. For the RAIN Video application, a collection of videos is written and
encoded to all n nodes in the system with distributed store operations. After
this, each node runs a client application that attempts to display a video, as
well as a server application that supplies encoded video data.
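The idea of encoding data across all n nodes so that it survives a node failure can be illustrated with the simplest erasure code, a single XOR parity piece. The Python sketch below uses that simpler stand-in for illustration; it tolerates the loss of any one of the n pieces, is not necessarily the code used by RAIN Video, and its function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, n):
    """Split data into n-1 equal pieces (zero-padded) plus one XOR parity piece,
    one piece per node."""
    k = n - 1
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    return pieces + [reduce(xor_bytes, pieces)]


def decode(pieces, length):
    """Rebuild the data when at most one piece (node) is missing (None)."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one missing piece")
    if missing:
        pieces = list(pieces)
        # XOR of all surviving pieces equals the missing one.
        pieces[missing[0]] = reduce(xor_bytes, [p for p in pieces if p is not None])
    return b"".join(pieces[:-1])[:length]


video = b"RAIN Video frame 0042"
shares = encode(video, n=4)  # one share per node
shares[1] = None             # node 1 has failed
print(decode(shares, len(video)) == video)  # True
```

Each node stores only about 1/(n-1) of the data plus its share of parity, rather than a full copy, yet any single node can fail without losing the video.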
High availability web server:
SNOW stands for Strong Network of Web Servers. It is a proof-of-concept
project that demonstrates the features of the RAIN system. The main purpose is
to develop a highly available, fault-tolerant distributed web server cluster that
minimizes the risk of downtime for mission-critical Internet and intranet
applications. The SNOW project uses several key building blocks of the RAIN
technology. First, the reliable communication layer is used to handle all of
the messages that pass between the servers in the SNOW system.
Second, the token-based fault management module is used to establish the set of
servers participating in the cluster.
Distributed check pointing mechanism:
RAIN Check is a checkpoint and rollback/recovery mechanism on the RAIN platform
based on the distributed store and retrieve operations. The scheme runs in
conjunction with a leader election protocol. This protocol ensures that there is
a unique node designated as leader in every connected set of nodes. As each job
executes, a checkpoint of its state is taken periodically. The state is encoded
and written to all accessible nodes with a distributed store operation. If a node
fails or becomes inaccessible, the leader assigns the node's job to other nodes.
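The leader's recovery step can be sketched as follows. The election rule (smallest id in the connected set) and the choice of the least-loaded live node as the new owner are assumptions made for this Python illustration; the text above only says that a unique leader exists and that it reassigns the failed node's job. The function names are hypothetical.

```python
def elect_leader(component):
    """Assumed rule: the live node with the smallest id leads its connected set."""
    return min(component)


def reassign_jobs(jobs, failed, component):
    """Give each job of a failed node to the live node with the fewest jobs
    (ties broken by id); the job restarts from its last stored checkpoint."""
    load = {n: 0 for n in component}
    for owner in jobs.values():
        if owner in load:
            load[owner] += 1
    moved = {}
    for job, owner in sorted(jobs.items()):
        if owner in failed:
            target = min(load, key=lambda n: (load[n], n))
            jobs[job] = target
            load[target] += 1
            moved[job] = target
    return moved


jobs = {"j1": 1, "j2": 2, "j3": 3, "j4": 3}
live = {1, 2, 4}  # node 3 has failed
print(elect_leader(live))              # 1
print(reassign_jobs(jobs, {3}, live))  # {'j3': 4, 'j4': 1}
```

Because every checkpoint was written to all accessible nodes, whichever node receives a job can retrieve its latest state locally or from any surviving peer.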
CONCLUSION
The goal of the RAIN project has been to address fault management,
communication and storage in a distributed environment. Building blocks that we
consider important are those providing reliable communication, group membership
and reliable storage. Simply, RAIN allows for the grouping of an unlimited number
of nodes, which can then function as one single giant node, sharing load or taking
over if one or more of the nodes ceases to function correctly.
The future directions of this work are:
Development of APIs for using the various building blocks.
The implementation of a real distributed file system.
RAIN technology has been highly beneficial in resolving high-availability and
load-balancing problems. It is applicable to an extensive range of networking
applications, such as firewalls, web servers, IP telephony gateways, application
routers, etc. The purpose of the RAIN project has been to pave the way to fault
management, communication, and storage in a distributed environment. It
integrates many significant and distinctive innovations in its core elements,
such as unlimited scalability, built-in reliability, and portability. It is very
useful in the development of a fully functional distributed computing system.
REFERENCES
1) http://searchdatacenter.techtarget.com/definition/RAIN
2) http://www.thefreelibrary.com/Latest+network+management+products-a062893683
3) http://paradise.caltech.edu/papers/etr029.pdf
4) http://www.seminarprojects.com/Thread-rain-random-array-of-independent-nodes
5) http://en.wikipedia.org/wiki/Redundant_Array_of_Inexpensive_Nodes
6) www.paradise.caltech.edu