
Fundamentals of DCS

DISTRIBUTED SYSTEMS

What is a distributed system? A distributed system is a collection of independent computers that appears to its users as a single coherent system.
A weaker definition of distributed system: A distributed system is a collection of independent computers that are used jointly to perform a single task or to provide a single service.

Multiple processors
Computer architectures with multiple interconnected processors are basically of two types: (1) tightly coupled systems and (2) loosely coupled systems.

(a) A tightly coupled multiprocessor system; (b) a loosely coupled multiprocessor system

Multiple processors
1. Tightly coupled systems. In these systems there is a single system-wide primary memory (address space) that is shared by all the processors. If any processor writes, for example, the value 100 to the memory location x, any other processor subsequently reading from location x will get the value 100. Therefore, in these systems, any communication between the processors usually takes place through the shared memory. Such systems are referred to as parallel processing systems.

2. Loosely coupled systems. In these systems the processors do not share memory, and each processor has its own local memory. If a processor writes the value 100 to the memory location x, this write operation changes only the contents of its own local memory and does not affect the memory of any other processor. In these systems, all communication between the processors is done by passing messages across the network. Such systems are referred to as distributed computing systems.

In short, a DCS is basically a collection of processors interconnected by a communication network in which each processor has its own local memory and other peripherals, and the communication between any two processors of the system takes place by message passing over the communication network. For a particular processor, its own resources are local, whereas the other processors and their resources are remote. Together, a processor and its resources are usually referred to as a node or site or machine of the distributed computing system.
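The contrast between the two architectures can be sketched in code. The following is a minimal, illustrative Python simulation (not real multi-machine code): a shared Value stands in for the single system-wide memory of a tightly coupled system, while a Pipe stands in for message passing between the private memories of a loosely coupled system.

from multiprocessing import Process, Value, Pipe

def shared_writer(x):
    x.value = 100                 # tightly coupled: write to the shared location x

def message_sender(conn):
    conn.send(100)                # loosely coupled: copy the value into a message
    conn.close()

if __name__ == "__main__":
    # Tightly coupled style: communication through shared memory.
    x = Value("i", 0)
    p = Process(target=shared_writer, args=(x,))
    p.start()
    p.join()
    print("read from shared location x:", x.value)       # 100

    # Loosely coupled style: communication by message passing over a "network".
    parent, child = Pipe()
    q = Process(target=message_sender, args=(child,))
    q.start()
    print("received over the network:", parent.recv())   # 100
    q.join()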

DCS Models
The commonly used system architecture models are:
Minicomputer model
Workstation model
Workstation-server model
Processor-pool model
Hybrid model

Minicomputer Model
A simple extension of the centralized time-sharing system. A DCS based on this model consists of a few minicomputers (they may be large supercomputers as well) interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it; for this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to the other minicomputers. The network allows a user to access remote resources available on some machine other than the one onto which the user is logged.

Workstation Model
A DCS based on the workstation model consists of several workstations interconnected by a communication network. In an environment such as a company's office or a university department, it has often been observed that at night many workstations are idle, wasting large amounts of CPU time. The idea of the workstation model is to interconnect all these workstations by a high-speed LAN so that idle workstations may be used to process the jobs of users who are logged onto other workstations and do not have sufficient processing power at their own workstations. The system transfers one or more processes from the user's workstation to some other workstation that is currently idle, gets the processes executed there, and finally returns the results of execution to the user's workstation.

Workstation-Server Model
A workstation with its own local disk is usually called a diskful workstation, and a workstation without a local disk is called a diskless workstation. With the availability of high-speed networks, diskless workstations, and with them the workstation-server model, have become popular.


Along with their normal computation activities, user workstations send requests for services provided by special servers (such as a file server or a database server) to a server offering that type of service; the server performs the requested activity and returns the result of the request processing to the user's workstation. Here, the user's processes need not be migrated to the server machines to get the work done by those machines.

Comparison
The workstation-server model has several advantages:
Cheaper: it is cheaper to use a few servers accessed over the network than a large number of diskful workstations, each with its own small, slow disk.
Diskless workstations are preferred from a system maintenance point of view: backup and hardware maintenance are easier to perform with a few large disks (on the servers). Furthermore, installing new releases of software (such as a file server with new functionality) is easier.

Comparison
In the workstation-server model, since the file servers manage all files, users have the flexibility to use any workstation and access files in the same manner irrespective of which workstation they are currently logged onto. The request-response protocol is mainly used to access the services of the server machines; therefore, this model does not need a process-migration facility, which is difficult to implement. A user gets guaranteed response time because workstations are not used for executing remote processes. However, the model does not utilize the processing capability of idle workstations.
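The request-response interaction can be illustrated with a small, self-contained sketch. The code below is only a stand-in for a real file server: a thread serves read requests over a local TCP socket, the port is chosen automatically, and the file name and contents are made up for illustration. The point is that only the request and the result cross the network; no user process is migrated to the server.

import socket
import threading

FILES = {"notes.txt": b"contents of notes.txt"}    # pretend server-side disk

def file_server(sock):
    conn, _ = sock.accept()
    name = conn.recv(1024).decode()                 # request: a file name
    conn.sendall(FILES.get(name, b"not found"))     # response: the file contents
    conn.close()

server_sock = socket.socket()
server_sock.bind(("127.0.0.1", 0))                  # any free local port
server_sock.listen(1)
threading.Thread(target=file_server, args=(server_sock,)).start()

client = socket.create_connection(server_sock.getsockname())
client.sendall(b"notes.txt")                        # request goes to the server...
print(client.recv(1024))                            # ...and only the result comes back
client.close()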

Processor-Pool Model
A user needs a very large amount of computing power occasionally, and only for a short time. Therefore, in the processor-pool model the processors are pooled together to be shared by the users as needed. The pool of processors consists of a large number of microcomputers and minicomputers attached to the network.

Each processor in the pool has its own memory to load and run a system program or an application program of the distributed computing system.

Hybrid Model
To combine the advantages of both the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. It is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. In addition to efficient execution of computation-intensive jobs, the hybrid model gives guaranteed response times to interactive jobs by allowing them to be processed on the local workstations of the users. However, the hybrid model is more expensive to implement than either the workstation-server model or the processor-pool model.

Advantages of Distributed System


Inherently distributed applications
Banking, reservation systems, applications of multinational companies

Low price/high performance


Rapidly increasing processor power at decreasing cost, together with increasing network speed

Improved reliability and availability


Reliability refers to the degree of tolerance against errors and component failures; loss of information can be prevented even in the event of failures by keeping multiple copies of critical information within the system. Availability refers to the fraction of time for which the system is available for use; the failure of one component need not stop the remaining components from doing their job.

Advantages
Modular Expandability
In an open distributed system, additional resources can be added as and when required.

Resource sharing
Facilitates sharing of hardware resources (printers, plotters, storage devices) among multiple computers, as well as sharing of software resources such as libraries and databases.

Information sharing
Information generated by one user can be easily and efficiently shared by users working on other nodes that are geographically far apart. Example: Computer Supported Cooperative Work (CSCW), or groupware.

Advantages
Shorter response times and higher throughput
Due to the multiplicity of processors, distributed systems can have shorter response times and higher throughput. A computation can be partitioned into a number of subcomputations that run on different processors simultaneously.

Better Flexibility
A pool of different types of computers, each suitable for different types of computations.

Distributed Operating Systems


Design issues:
Transparency, reliability, flexibility, performance, scalability, heterogeneity, security, and emulation of existing operating systems

DOS: Design Issues ..


Transparency
Access transparency: hide differences in data representation and in how a resource is accessed; whether a resource is remote or local, it is accessed in the same way.
Location transparency: hide where a resource is located; resources have unique system-wide names (name transparency), and a user can log on to any machine and access a resource by the same name (user mobility).
Migration transparency: hide that a resource may move to another location.
Scaling transparency: allow the system to expand by adding resources without disturbing the activities of the users.
Replication transparency: hide that replicas of files and other resources exist; users see each resource as a single one.
Concurrency transparency: hide that a resource may be shared by several competing users; this requires event ordering, mutual exclusion, no starvation, and no deadlock.
Failure transparency: hide the failure and recovery of a resource.
Performance transparency: hide the dynamically varying load of processors by migrating jobs from heavily loaded to lightly loaded processors.

DOS: Design Issues - Reliability


A fault is a mechanical or algorithmic defect that may generate an error. A system may fail in one of two ways:
Fail-stop: the system stops functioning after changing to a state in which its failure can be detected.
Byzantine: the system continues to function but produces wrong results (e.g., due to undetected software bugs).

Faults should be avoided, tolerated, or detected and recovered from wherever possible.

DOS : Design Issues


Fault Avoidance
Deals with designing the components of the system in such a way that the occurrence of faults is minimized.

Fault Tolerance
The ability of the system to continue functioning in the event of partial failure of the system. This can be achieved by:
Redundancy techniques: replicating critical hardware and software components, with consistency maintained among the replicas.
Distributed control: algorithms and protocols must use distributed rather than centralized control.

DOS : Design Issues


Fault Detection and recovery
Use of hardware and software mechanisms to determine the occurrence of a failure and then to correct the system. Techniques include:
Atomic transactions: either all of the operations are performed successfully or none of them are (a small sketch of this all-or-none behaviour follows this list).
Stateless servers: a crashed server can simply be restarted and the operation repeated, without any details of previous operations.
Acknowledgements and timeout-based retransmission of messages: lost messages are detected on the basis of timeouts and acknowledgements.
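A minimal sketch of the atomic-transaction idea, assuming a simple in-memory key/value store; the class and its methods are illustrative, not a real transaction facility. Updates are applied to a private copy and the store switches to the new state only if every operation succeeds, so a failure leaves the original data untouched.

class AtomicStore:
    def __init__(self):
        self.data = {}

    def apply_transaction(self, updates):
        """Apply all updates or none: work on a copy, commit only on success."""
        snapshot = dict(self.data)          # old state is kept for rollback
        try:
            for key, func in updates:
                snapshot[key] = func(snapshot.get(key, 0))
            self.data = snapshot            # commit: every operation succeeded
            return True
        except Exception:
            return False                    # abort: self.data is unchanged

store = AtomicStore()
ok = store.apply_transaction([("balance", lambda b: b + 100),
                              ("count", lambda c: c + 1)])
print(ok, store.data)    # True {'balance': 100, 'count': 1}

# A failing update leaves the store untouched (none of the operations apply).
bad = store.apply_transaction([("balance", lambda b: b - 50),
                               ("count", lambda c: c / 0)])
print(bad, store.data)   # False {'balance': 100, 'count': 1}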

DOS : Design Issues


Flexibility required because
Ease of modification some parts of design need to be modified or replaced due to bug or new requirements with minimum interruption to the user Ease of enhancement kernel level design
Monolithic kernel includes almost all OS services, shared resources, hence no message passing, no context switching- so faster performance Micro kernel as small as possible, includes minimal facility, hence easy to design, implement and install with a penalty of performance

DOS: Design Issues - Performance


Batch if possible: transfer data across the network in large chunks rather than as individual pages, and piggyback acknowledgements of previous messages on subsequent messages (a small batching sketch follows this list).
Cache whenever possible: caching frequently used data reduces computation time and network bandwidth and avoids contention.
Minimize copying of data: copying of data between the user's stack, the message buffer in the sender's address space, and the kernel's address space should be minimized.
Minimize network traffic: cluster the processes that communicate frequently on a single node rather than moving data across the network.
Take advantage of fine-grained parallelism for multiprocessing: servers can be structured as groups of threads that execute requests concurrently.
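A minimal sketch of the batching guideline, assuming a hypothetical send_over_network callable that transmits one payload; the class name and the batch threshold are illustrative. Small records are accumulated and sent as one large chunk instead of many small messages.

import json

class BatchingSender:
    def __init__(self, send_over_network, max_batch=64):
        self.send = send_over_network   # callable taking one bytes payload
        self.max_batch = max_batch
        self.pending = []

    def submit(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self):
        if self.pending:
            payload = json.dumps(self.pending).encode()  # one large chunk
            self.send(payload)
            self.pending = []

sent = []
sender = BatchingSender(sent.append, max_batch=3)
for i in range(7):
    sender.submit({"seq": i})
sender.flush()              # flush the final partial batch
print(len(sent))            # 3 network transfers instead of 7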

DOS : Design Issues


Scalability: the ability to adapt to increased service load.
Avoid centralized entities: replicate resources and servers.
Avoid centralized algorithms, especially centralized scheduling algorithms.
Perform most operations on client workstations, for example by caching the required data on the clients (a minimal caching sketch follows this list).
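A minimal sketch of client-side caching, assuming a hypothetical fetch_from_server callable standing in for a remote call. Note that this toy cache never invalidates entries; a real system would need invalidation or leases to keep cached data consistent.

class CachingClient:
    def __init__(self, fetch_from_server):
        self.fetch = fetch_from_server
        self.cache = {}

    def read(self, name):
        if name not in self.cache:          # go to the server only on a miss
            self.cache[name] = self.fetch(name)
        return self.cache[name]

calls = []
def fake_server(name):
    calls.append(name)
    return f"contents of {name}"

client = CachingClient(fake_server)
client.read("/etc/motd")
client.read("/etc/motd")
print(len(calls))   # 1: the second read is served from the local cache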

DOS: Issues ..
Heterogeneity: dissimilar components - networks, computer hardware, operating systems, programming languages. Coping with heterogeneity provides portability and interoperability. Disadvantage: extra resources are required, such as mobile code and adaptability software (applets, agents, translators).

DOS: Issues .. Security
Distributed systems allow communication between programs, users, and resources on different computers, so the following must be guaranteed:
Authentication: verification of the identity of communicating users and processes, and protection against disclosure of information to unauthorized persons.
Integrity: protection of information against alteration and corruption.
Authorization: appropriate use of resources by the different users has to be guaranteed.

Message Passing
Interprocess communication (IPC) basically requires information sharing between two or more processes. The two basic methods for information sharing are as follows: original sharing, or the shared-data approach; and copy sharing, or the message-passing approach.

In the shared-data approach, the information to be shared is placed in a common memory area that is accessible to all processes involved in the IPC. In the message-passing approach, the information to be shared is physically copied from the sender process's address space to the address spaces of all the receiver processes; this is done by transmitting the data to be copied in the form of messages (a message is a block of information).

message-passing system
A message-passing system is a subsystem of a distributed operating system that provides a set of message-based IPC protocols while shielding the details of complex network protocols and of multiple heterogeneous platforms from programmers. It enables processes to communicate by exchanging messages and allows programs to be written using simple communication primitives such as send and receive. It also serves as a foundation for building higher-level IPC systems such as RPC and DSM.

Features of a Good Message-Passing System


Simplicity
The system should be simple and easy to use. It must be straightforward to construct new applications and to communicate with both old and new applications and with different modules, without the need to worry about system and network aspects.

Others
Uniform semantics, efficiency, correctness, reliability, security, flexibility, portability

Features
Uniform semantics (ease of use)
In a distributed system, a message-passing system may be used for the following two types of interprocess communication:
local communication, where the communicating processes are on the same node; remote communication, where the communicating processes are on different nodes.

The semantics of remote communication should be as close as possible to those of local communication.

Features
Efficiency (a critical issue)
Efficiency can be gained mainly by reducing the number of message exchanges. Some optimizations adopted for efficiency are:
avoiding the costs of establishing and terminating connections between the same pair of processes for every message exchange;
minimizing the costs of maintaining the connections;
piggybacking the acknowledgement of a previous message on the next message in an interaction that involves several message exchanges between a sender and a receiver.

Features
Correctness (mainly an issue for group communication). Issues related to correctness are as follows:
Atomicity: ensures that every message sent to a group of receivers is delivered either to all of them or to none of them.
Ordered delivery: ensures that messages arrive at all receivers in an order acceptable to the application.
Survivability: guarantees that messages are delivered correctly despite partial failures of processes, machines, or communication links.

Features
Reliability
The system should cope with failures and guarantee the delivery of messages:
handling of lost messages through acknowledgements and retransmissions;
detecting and handling duplicate messages by assigning sequence numbers to messages (a small simulation of these mechanisms follows).
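A minimal simulation of these reliability mechanisms, assuming an unreliable in-process "network" function with made-up names and loss rates; it is not a real transport. The sender retransmits until it sees an acknowledgement carrying the message's sequence number, and the receiver uses the sequence numbers to discard duplicates created when only the acknowledgement was lost.

import random

class Receiver:
    def __init__(self):
        self.delivered = []
        self.seen = set()

    def on_message(self, seq, data):
        if seq not in self.seen:        # duplicate detection via sequence numbers
            self.seen.add(seq)
            self.delivered.append(data)
        return seq                      # the acknowledgement carries the sequence number

def unreliable_send(receiver, seq, data, loss_rate=0.3):
    """Deliver the message unless the 'network' drops it or its acknowledgement."""
    if random.random() < loss_rate:
        return None                     # request message lost
    ack = receiver.on_message(seq, data)
    if random.random() < loss_rate:
        return None                     # acknowledgement lost on the way back
    return ack

def reliable_send(receiver, seq, data, max_retries=10):
    """Retransmit until an acknowledgement for this sequence number arrives."""
    for _ in range(max_retries):
        if unreliable_send(receiver, seq, data) == seq:
            return True                 # acknowledged: stop retransmitting
    return False

r = Receiver()
for seq, data in enumerate(["a", "b", "c"]):
    reliable_send(r, seq, data)
print(r.delivered)                      # ['a', 'b', 'c'] despite losses and duplicates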

Security
Authentication of sender and receiver Encryption of messages before sending

Features
Flexibility
Users should be able to choose and specify the types and levels of reliability and correctness required by their applications, and the system should permit any kind of control flow between the cooperating processes, including synchronous and asynchronous send and receive.

Portability
The message-passing system itself should be portable: it should be possible to construct a new IPC facility on another system by reusing the basic design of the existing one. Applications written on top of the message-passing system should also be portable, which requires a machine-independent external data representation format.

Issues in IPC by Message Passing


A message is a block of information formatted by a sending process in such a manner that it is meaningful to the receiving process. It consists of a fixed-length header and a variable-size collection of typed data objects. The header usually consists of the following elements: address, sequence number, and structural information.

Structural information
This element has two parts. The type part specifies whether the data to be passed on to the receiver is included within the message or the message contains only a pointer to the data, which is stored somewhere outside the contiguous portion of the message. The second part specifies the length of the variable-size message data (a sketch of such a layout follows).
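A minimal sketch of this message layout (fixed-length header followed by variable-size data) using Python's struct module. The field widths and the one-byte type flag are illustrative assumptions, not a standard wire format.

import struct

# header: receiver address, sender address, sequence number,
#         type flag (0 = data included inline, 1 = pointer to out-of-line data),
#         length of the variable-size data part
HEADER = struct.Struct("!IIIBI")

def pack_message(dst, src, seq, data: bytes, inline=True):
    header = HEADER.pack(dst, src, seq, 0 if inline else 1, len(data))
    return header + data

def unpack_message(raw: bytes):
    dst, src, seq, type_flag, length = HEADER.unpack(raw[:HEADER.size])
    data = raw[HEADER.size:HEADER.size + length]
    return {"dst": dst, "src": src, "seq": seq,
            "inline": type_flag == 0, "data": data}

msg = pack_message(dst=2, src=7, seq=1, data=b"hello")
print(unpack_message(msg))
# {'dst': 2, 'src': 7, 'seq': 1, 'inline': True, 'data': b'hello'}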

Synchronization
A central issue in the communication structure is the synchronization imposed on the communicating processes by the communication primitives. The semantics used for synchronization may be broadly classified as blocking and nonblocking. A primitive is said to have nonblocking semantics if its invocation does not block the execution of its invoker (control returns to the invoker almost immediately); otherwise the primitive is said to be of the blocking type.

Blocking send: after execution of the send statement, the sending process is blocked until it receives an acknowledgement from the receiver.
Nonblocking send: after execution of the send statement, the sending process is allowed to proceed with its execution as soon as the message has been copied to a buffer.
Blocking receive: after execution of the receive statement, the receiving process is blocked until it receives a message.
Nonblocking receive: the receiving process proceeds with its execution after execution of the receive statement, which returns control almost immediately, just after telling the kernel where the message buffer is (a short illustration of the two receive semantics follows this list).
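A minimal illustration of blocking versus nonblocking receive semantics, using Python's standard queue module as a stand-in for a message buffer shared between a sender thread and a receiver; this is an in-process sketch, not a real IPC facility.

import queue
import threading
import time

mailbox = queue.Queue()

def sender():
    time.sleep(0.1)                      # simulate some work before sending
    mailbox.put("request processed")     # send returns once the message is buffered

threading.Thread(target=sender).start()

# Nonblocking receive: control returns at once, with or without a message.
try:
    msg = mailbox.get(block=False)
except queue.Empty:
    print("no message yet, receiver keeps executing")

# Blocking receive: the receiver is suspended until a message arrives.
msg = mailbox.get(block=True)
print("received:", msg)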

An important issue with a nonblocking receive primitive is how the receiving process knows that the message has arrived in the message buffer. There are two approaches:
Polling: the receiver uses a test primitive to periodically poll the kernel and check whether the message is already available in the buffer (a tiny polling sketch follows).
Interrupt: when the message has been placed in the buffer and is ready for use by the receiver, a software interrupt is used to notify the receiving process.
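A tiny sketch of the polling approach, again using an in-process queue as a stand-in for the kernel buffer; the empty() check plays the role of the test primitive and the sleep models doing other work between polls.

import queue
import threading
import time

mailbox = queue.Queue()

def late_sender():
    time.sleep(0.2)
    mailbox.put("done")

threading.Thread(target=late_sender).start()

while mailbox.empty():          # test primitive: is the message in the buffer yet?
    time.sleep(0.05)            # do other work (here: just wait) between polls
print("polled message:", mailbox.get())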

A conditional receive primitive also returns control to the invoking process almost immediately, either with a message or with an indicator that no message is available. When both the send and receive primitives of a communication between two processes use blocking semantics, the communication is said to be synchronous; otherwise it is asynchronous. The main drawback of synchronous communication is that it limits concurrency and is susceptible to communication deadlocks.

Synchronous communication, in which both the send and receive primitives have blocking-type semantics, is more reliable and easier to implement.

Buffering
In the standard message-passing model, messages may be copied several times on their way from the sender to the receiver, so buffering has to be managed. The buffering strategies used in interprocess communication are:
Null buffering (no buffering)
Single-message buffer
Unbounded-capacity buffer
Finite-bound (multiple-message) buffer

Null Buffer (No Buffering)


There is no place to temporarily store the message, hence one of the following implementation strategies is used:
The message remains in the sender process's address space and the execution of the send is delayed until the receiver executes the corresponding receive.
The message is simply discarded and a time-out mechanism is used to resend it after a timeout period; the sender may have to try several times before succeeding.

Single-Message Buffer
A buffer with the capacity to store a single message is kept on the receiver's node. This strategy is usually used for synchronous communication, in which an application can have at most one pending message at a time.

Unbounded-Capacity Buffer
In the asynchronous mode of communication, since a sender does not wait for the receiver to be ready, there may be several pending messages that have not yet been accepted by the receiver. Therefore, a message buffer that can store all unreceived messages is needed to support asynchronous communication with the assurance that all messages sent to the receiver will be delivered.

Finite-Bound Buffer
Also known as multiple-message buffering: a message is first copied from the sending process's memory into the receiving process's mailbox and then copied from the mailbox to the receiver's memory when the receiver calls for the message.

A strategy is needed for handling buffer overflow. It can be handled in one of the following two ways (both are shown in the sketch below):
Unsuccessful communication: message transfers simply fail and an error is returned.
Flow-controlled communication: the sender is blocked until the receiver accepts some messages, thus creating space in the buffer for new messages.
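A minimal sketch of a finite-bound mailbox and the two overflow-handling policies, again using Python's queue module as a stand-in for the receiver's mailbox; the buffer size of two and the thread that plays the receiver are purely illustrative.

import queue
import threading
import time

mailbox = queue.Queue(maxsize=2)
mailbox.put("m1")
mailbox.put("m2")                        # the mailbox is now full

# Unsuccessful communication: the transfer fails and an error is returned.
try:
    mailbox.put("m3", block=False)
except queue.Full:
    print("send failed: buffer overflow")

# Flow-controlled communication: the sender blocks until space appears.
def receiver():
    time.sleep(0.1)
    mailbox.get()                        # the receiver accepts a message, freeing a slot

threading.Thread(target=receiver).start()
mailbox.put("m3", block=True)            # blocks until the receiver makes room
print("m3 accepted after flow control")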

Failure Handling
During interprocess communication, partial failures such as a node crash or a communication link failure may lead to the following problems:
Loss of the request message: may happen either due to failure of the communication link between the sender and the receiver, or because the receiver's node is down at the time the request message reaches it.
Loss of the response message: may happen either due to failure of the communication link between the sender and the receiver, or because the sender's node is down at the time the response message reaches it.
Unsuccessful execution of the request: may happen due to the receiver's node crashing while the request is being processed.
