[Figure: layered view, applications on top of a distributed system (middleware), which runs on each node's OS and hardware]
Distribution Transparency
Depending on which computing system you use, you will have to consider the byte order in which multibyte numbers are stored, particularly when you are writing those numbers to a file. The two orders are called "Little Endian" and "Big Endian".
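As a minimal sketch of the byte-order issue, the same 32-bit value can be serialized in both orders with `java.nio.ByteBuffer` (class name `EndianDemo` is illustrative):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    // Serialize a 32-bit integer in the given byte order.
    public static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putInt(value).array();
    }
    // For 0x01020304: big endian yields bytes 01 02 03 04,
    // little endian yields bytes 04 03 02 01.
}
```

A file written on a little-endian machine with raw writes will be misread on a big-endian machine unless one order is fixed by convention, which is why network protocols standardize on big endian ("network byte order").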
Goal of Transparency
Hide all irrelevant system-dependent details from the user and system programmer and create the illusion of a simple and easy to use system
An open distributed system:
- Offers services according to standard rules that describe the syntax and semantics of those services
- Can interact with services from other open systems, irrespective of the underlying environment
Examples:
- In computer networks, standard rules govern the format, contents, and meaning of messages sent and received
- In distributed systems, services are specified through an interface description language (IDL)
Cloud Computing
A cloud is an elastic execution environment of resources providing a metered service at multiple granularities. On-demand resource allocation: add and subtract processors, memory, storage.
Elastic Compute Cloud (EC2):
- Rent computing resources by the hour
- Basic unit of accounting = instance-hour
- Additional costs for bandwidth
Simple Storage Service (S3):
- Persistent storage
- Charged by the GB/month
- Additional costs for bandwidth
You'll be using EC2 for a programming assignment!
Transaction
Sensor Networks
1.4 Internet
[Figure: the Internet, intranets connecting through ISPs to the backbone]
1.4.1 World-Wide-Web
[Figure: URLs such as http://www.uu.se/ — the URL http://www.w3c.org/Protocols/Activity.html names the file Activity.html under the Protocols directory in the file system of www.w3c.org]
Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012
Distributed Systems
Design Issues
Definition:
Distributed operating system: integration of system services presenting a transparent view of a multiple-computer system with distributed resources and control. It consists of concurrent processes accessing distributed, shared, or replicated resources through message passing in a network environment.
Generation: third-generation operating system. Characteristics: global view of the file system, name space, time, security, and computational power. Goal: a single-computer view of a multiple-computer system (transparency).
Efficiency
Consistency
Consistency Problem:
User's perspective:
- Uniformity in using the system
- Predictability of the system's behavior
System's perspective:
- Integrity maintenance
Robustness
Robustness Problems:
Fault tolerance
- What to do when a message is lost?
- Handling of exceptional situations and errors
- Changes in the system topology
- Long message delays
- Inability to locate a server
Implementation Issues
Object models and identification. Distributed coordination. Interprocess communication. Distributed resources. Fault tolerance and security.
Identification / Name
Design Issue Example: Resource identification [2] The resources in a distributed system are spread across different computers and a naming scheme has to be devised so that users can discover and refer to the resources that they need.
An example of such a naming scheme is the URL (Uniform Resource Locator) that is used to identify WWW pages. If a meaningful and universally understood identification scheme is not used then many of these resources will be inaccessible to system users.
Objects:
Processes, files, memory, devices, processors, and networks. Each object is associated with a defined set of access operations and is accessed via an object server. A name maps to the physical or logical address of the service that the servers provide. Multiple server addresses may exist, and a server may move, which would otherwise require the name to be changed.
Object access:
Identification Issue:
Distributed Coordination
[1]
Synchronization Types
Barrier synchronization: processes must reach a common synchronization point before they can continue.
Condition coordination: a process must wait for a condition that will be set asynchronously by other interacting processes, to maintain some ordering of execution.
Mutual exclusion: concurrent processes must have mutual exclusion when accessing a critical shared resource.
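Barrier synchronization, the first type above, can be sketched on a single machine with `java.util.concurrent.CyclicBarrier` (class and variable names are illustrative):

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierDemo {
    // Three workers must all finish phase 1 before any enters phase 2.
    public static int run() throws InterruptedException {
        final AtomicInteger phaseOneDone = new AtomicInteger(0);
        final CyclicBarrier barrier = new CyclicBarrier(3);
        Thread[] workers = new Thread[3];
        for (int i = 0; i < 3; i++) {
            workers[i] = new Thread(() -> {
                phaseOneDone.incrementAndGet();  // phase 1 work
                try {
                    barrier.await();             // wait for all three workers
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
                // phase 2 begins only after every worker reached the barrier
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        return phaseOneDone.get();
    }
}
```

In a distributed system the same pattern needs message passing instead of shared memory, which is what makes the synchronization issues below harder.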
Synchronization Issues:
Typically only partial state information is known about other processes, making synchronization difficult. Information is not current due to message transfer delays.
Deadlocks
Circular waiting, each process waiting for a resource held by another. Deadlock detection and recovery strategies are required.
Interprocess Communication
[1]
Lower level: message passing. Higher level: logical communication that provides transparency.
Remote procedure call (RPC) communication. RPC is built on top of the client/server model: request/reply message passing used as in the programming-language procedure-call concept.
It is susceptible to failures in the system, since it must communicate through several protocol layers.
Distributed Resources
[1]
Resources:
Data (storage) and processing capacity (the sum of all processors).
Transparency of data distribution:
Distributed shared memory. Issue: sharing and replication of data/memory. Applications may be constrained by time; scheduling of processes must then satisfy real-time requirements.
Multiprocessor scheduling. Objective: minimize the completion time of processes. Issue: minimize communication overhead with efficient scheduling.
Load sharing. Objective: maximize the utilization of processors. Issue: process migration strategy and mechanism.
[1]
Failures: faults due to unintentional events. Security violations: faults due to intentional intrusions. Faults transparent to the user:
- System redundancy (an inherent property of distributed systems)
- The system's ability to recover (rolling back failed processes)
Security issues: authentication and authorization. Access control across a network spanning different administrative units with varying security models.
Summary of Issues
Issues [3]:
- Communication, synchronization, distributed algorithms
- Process scheduling, deadlock handling, load balancing
- Resource scheduling, file sharing, concurrency control
- Failure handling, configuration, redundancy
The quality of service offered by a system reflects its performance, availability, and reliability. It is affected by factors such as the allocation of processes to processors in the system, the distribution of resources across the system, the network and system hardware, and the adaptability of the system.
References
[1] Randy Chow & Theodore Johnson, 1997, Distributed Operating Systems & Algorithms, Addison-Wesley, pp. 45-50.
[2] Ian Sommerville, 2000, Software Engineering, 6th edition, Chapter 11.
[3] Pierre Boulet, 2006, Distributed Systems: Fundamental Concepts.
Distributed Systems
Logical Clocks
Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Logical clocks
Assign sequence numbers to messages
Each system maintains its own local clock No total ordering of events
No concept of happened-when
Happened-before
Lamport's happened-before notation:
a → b: event a happened before event b. E.g., a: a message being sent, b: receipt of that message.
Transitive: if a → b and b → c, then a → c.
If a and b occur on different processes that do not exchange messages, then neither a → b nor b → a is true; a and b are concurrent.
[Figure, shown twice: processes P1, P2, P3 with events g, h, i, j, k and per-process sequence numbers 1, 2, 3. With unsynchronized per-process counters the numbering gives a bad ordering, e.g. between e and h and between f and k]
Lamport's algorithm
Each message carries a timestamp of the sender's clock. When a message arrives: if the receiver's clock is less than the message timestamp, set the receiver's clock to (message timestamp + 1). The clock is also incremented between local events.
Lamport's algorithm
Algorithm allows us to maintain time ordering among related events
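The rules above can be sketched as a small class (names are illustrative, not from the slides):

```java
public class LamportClock {
    private int time = 0;

    // Local event: advance the counter.
    public int tick() { return ++time; }

    // Sending is an event; the returned value is the message timestamp.
    public int send() { return ++time; }

    // On receipt, the clock jumps past the message timestamp.
    public int receive(int msgTimestamp) {
        time = Math.max(time, msgTimestamp) + 1;
        return time;
    }

    public int value() { return time; }
}
```

For example, a process at time 1 that receives a message timestamped 6 moves to 7, so the receive event is ordered after the send event.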
Partial ordering
[Figure: processes P1, P2, P3 with events g, h, i, j, k carrying Lamport timestamps such as 1, 2, 6, 7, 27 after applying the algorithm]
Summary
The algorithm needs a monotonically increasing software counter, incremented at least when events that need to be timestamped occur. Each event has a Lamport timestamp attached to it. For any two events where a → b: L(a) < L(b).
[Figure: the same scenario on P1, P2, P3 with Lamport timestamps 1, 6, 7]
a → b, b → c, …: causally related events receive increasing timestamps.
i → c, f → d, d → g, …: Lamport imposes a send-receive relationship on these pairs.
Concurrent events (e.g., a and i) may or may not have the same timestamp.
Compare timestamps:
[Figure: total ordering of events. Each timestamp becomes time.process, e.g. 2.1, 3.1, 4.1, 5.1, 6.1 for events b, d, e, f on P1; g = 1.2, h = 6.2, i = 7.2 on P2; j = 1.3, k = 7.3 on P3. Ties in Lamport time are broken by the process number]
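Lamport timestamps alone give only a partial order; the time.process notation above turns it into a total order by breaking ties with the process id. A minimal sketch (class and field names are illustrative):

```java
import java.util.Comparator;

public class TotalOrder {
    public static class Event {
        public final int time, pid;   // Lamport time and process id
        public Event(int time, int pid) { this.time = time; this.pid = pid; }
    }

    // Compare by Lamport time first; break ties with the process id,
    // so 1.1 < 1.2 < 6.1 < 6.2.
    public static final Comparator<Event> ORDER =
        Comparator.<Event>comparingInt(e -> e.time).thenComparingInt(e -> e.pid);
}
```

Any deterministic tie-break works, as long as every process applies the same one.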
Rules: each process maintains a vector, initially (0,0,0); on each local event it increments its own component, and a received message carries the sender's vector, which is merged component-wise.

[Figure, repeated across several animation frames: vector timestamps for P1, P2, P3, each starting at (0,0,0). Events a, b on P1 get (1,0,0) and (2,0,0); events c, d on P2 get (2,1,0) and (2,2,0) after receiving b's timestamp; events e, f on P3 get (0,0,1) and (2,2,2)]
Causality
Concurrency
The end.
Interleaving model
P1 sends "what is my checking balance?" to P2
P1 sends "what is my savings balance?" to P2
P2 receives "what is my checking balance?" from P1
P1 sets total to 0
P2 receives "what is my savings balance?" from P1
P2 sends "checking balance = 40" to P1
P1 receives "checking balance = 40" from P2
. . .
Logical Clocks
A logical clock C is a map from the set of events E to N (the set of natural numbers) with the following constraint: for any events s and t, if s → t then C(s) < C(t).
Vector Clocks
A map from the set of states to vectors of natural numbers, with the constraint:
s → t iff s.v < t.v
where s.v is the vector assigned to the state s, and u < w means u ≤ w component-wise with u ≠ w.
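The code fragments on the original slide were incomplete; a self-contained sketch of a vector clock, in the style of the DirectClock class shown later (names are illustrative):

```java
public class VectorClock {
    public final int[] v;       // this process's vector
    private final int myId;     // this process's position in the vector

    public VectorClock(int numProc, int id) {
        myId = id;
        v = new int[numProc];   // all components start at 0
    }

    // Local event: increment own component.
    public void tick() { v[myId]++; }

    // Sending is an event; a copy of the vector travels with the message.
    public int[] send() {
        tick();
        return v.clone();
    }

    // On receipt: component-wise maximum, then count the receive event.
    public void receive(int[] msgVector) {
        for (int i = 0; i < v.length; i++)
            v[i] = Math.max(v[i], msgVector[i]);
        tick();
    }
}
```

With three processes, if P0 sends after one event, the message carries (1,0,0) and a fresh P1 that receives it ends at (1,1,0).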
Direct-Dependency Clocks
A weaker version of the vector clock: each process maintains a vector clock locally but sends only its own component of the clock with a message. The directly-precedes relation holds when there is only one message in the happened-before diagram of the computation; direct-dependency clocks capture this directly-precedes relation rather than the full happened-before relation.
public class DirectClock {
    public int[] clock;
    int myId;

    public DirectClock(int numProc, int id) {
        myId = id;
        clock = new int[numProc];   // components start at 0 in Java
        clock[myId] = 1;
    }

    public int getValue(int i) { return clock[i]; }

    public void tick() { clock[myId]++; }

    public void sendAction() {
        // sentValue = clock[myId];
        tick();
    }

    public void receiveAction(int sender, int sentValue) {
        clock[sender] = Math.max(clock[sender], sentValue);
        clock[myId] = Math.max(clock[myId], sentValue) + 1;
    }
}
Matrix Clocks
Lecture 9
SYNCHRONIZATION TIME, EVENT, CLOCKS
SE-9048 Concurrency & Distributed System
PART 4
Vector Clocks
Vector clocks were proposed to overcome a limitation of Lamport's clock: the fact that C(a) < C(b) does not imply a → b.
The property of being able to infer that a occurred before b is called the causality property.
A vector clock for a system of N processes is an array of N integers. Every process Pi stores its own vector clock VCi. Lamport-style time values for events are stored in VCi; VCi(a) is the value assigned to an event a.
Example
Our timestamp is now a vector of numbers, with each element corresponding to a process; each process knows its position in the vector. For example, with the vector corresponding to processes (P0, P1, P2): if process P0 has four events a, b, c, d, they get vector timestamps (1,0,0), (2,0,0), (3,0,0), (4,0,0). If process P1 has four events a, b, c, d, they get vector timestamps (0,1,0), (0,2,0), (0,3,0), (0,4,0).
The entire vector is sent along with a message. When the message is received by a process (an event that will get a timestamp), the receiving process does the following: increment the counter for its own position in the vector, just as it would before timestamping any local event; perform an element-by-element comparison of the received vector with its own timestamp vector; and set each element of its timestamp vector to the higher of the two values (the element-wise maximum).
Whenever there is a new event at Pi, increment VCi[i]. When a process Pi sends a message m to Pj:
- Increment VCi[i]
- Set m's timestamp ts(m) to the vector VCi
When Pj receives m: VCj[k] = max(VCj[k], ts(m)[k]) for all k, then increment VCj[j].
[Figures: P0 advances to VC0 = (2,0,0) and sends m:(2,0,0); P1, at VC1 = (0,0,0), receives it and ends at VC1 = (2,1,0). In a longer run, P1 at VC1 = (0,1,0) receives m:(2,0,0), yielding (2,2,0) and then (2,3,0); it sends m:(2,3,0) to P2, which moves from VC2 = (0,0,0) to VC2 = (2,3,1)]
VECTOR CLOCK
To determine whether two events are concurrent, do an element-by-element comparison of the corresponding timestamps.
- If each element of timestamp V is less than or equal to the corresponding element of timestamp W, then V causally precedes W and the events are not concurrent.
- If each element of V is greater than or equal to the corresponding element of W, then W causally precedes V and the events are not concurrent.
- If, on the other hand, neither condition applies (some elements of V are greater than, and others less than, the corresponding elements of W), then the events are concurrent.
The timestamp of m is less than the timestamp of g because each element of m is less than or equal to the corresponding element of g. That is, 0 ≤ 6, 0 ≤ 1, and 2 ≤ 2.
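The element-by-element test described above can be sketched as a small comparison function (class and method names are illustrative):

```java
public class VectorCompare {
    // Returns -1 if V causally precedes W, 1 if W causally precedes V,
    // and 0 if the vectors are equal or the events are concurrent.
    public static int compare(int[] v, int[] w) {
        boolean vLeq = true, wLeq = true;   // v <= w and w <= v, element-wise
        for (int i = 0; i < v.length; i++) {
            if (v[i] > w[i]) vLeq = false;
            if (w[i] > v[i]) wLeq = false;
        }
        if (vLeq && !wLeq) return -1;   // V < W
        if (wLeq && !vLeq) return 1;    // W < V
        return 0;                       // equal or incomparable (concurrent)
    }
}
```

For the example above, comparing (0,0,2) with (6,1,2) reports that the first causally precedes the second, while (1,0,0) and (0,1,0) are concurrent.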
Logical Clocks are employed when processes have to agree on relative ordering of events, but not necessarily actual time of events
Two types of logical clocks:
- Lamport's logical clocks: support relative ordering of events across different processes by using the happened-before relationship
- Vector clocks
109
Synchronization in distributed systems is often more difficult than synchronization in uniprocessor and multiprocessor systems.
Synchronization in time is achieved by clock synchronization algorithms; synchronization between resources, by mutual exclusion algorithms.
Mutual Exclusion
Mutual exclusion (often abbreviated to mutex) algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections.
Critical Sections
In concurrent programming a critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will only have to wait a fixed time to enter it. Some synchronization mechanism is required at the entry and exit of the critical section to ensure exclusive use, for example a semaphore.
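As a single-machine illustration of the entry/exit mechanism just mentioned, a sketch using `java.util.concurrent.Semaphore` (class and counts are illustrative):

```java
import java.util.concurrent.Semaphore;

public class CriticalSectionDemo {
    private static final Semaphore mutex = new Semaphore(1); // binary semaphore
    private static int shared = 0;                           // shared resource

    public static int run(int threads, int increments) throws InterruptedException {
        shared = 0;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < increments; j++) {
                    mutex.acquireUninterruptibly();  // entry protocol
                    try {
                        shared++;                    // critical section
                    } finally {
                        mutex.release();             // exit protocol
                    }
                }
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        return shared;
    }
}
```

Without the semaphore the interleaved `shared++` operations would lose updates; with it, every increment is serialized. The distributed algorithms below must achieve the same exclusion using only messages.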
Mutual exclusion ensures that concurrent processes make serialized access to shared resources or data. A distributed mutual exclusion algorithm achieves mutual exclusion using only peer communication. The problem can be solved using either a contention-based or a token-based approach.
Centralized algorithms
Distributed Algorithms
Controlled Algorithms (Token based algorithms)
Tree Structure
Voting schemes
Broadcast Structure
Ring Structure
a) Process 1 asks the coordinator for permission to enter a critical region; permission is granted.
b) Process 2 then asks permission to enter the same critical region; the coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to process 2.
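The coordinator's bookkeeping in steps a) to c) can be sketched as a single-JVM simulation (a real deployment exchanges network messages; names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class Coordinator {
    private final Queue<Integer> waiting = new ArrayDeque<>(); // FIFO request queue
    private int holder = -1;                                   // -1: region is free
    public final List<Integer> grants = new ArrayList<>();     // grant order, for inspection

    // Returns true if permission is granted immediately; otherwise the
    // request is queued (the coordinator "does not reply" for now).
    public synchronized boolean request(int pid) {
        if (holder == -1) { holder = pid; grants.add(pid); return true; }
        waiting.add(pid);
        return false;
    }

    // On release, reply to the next waiter in request order.
    public synchronized void release(int pid) {
        if (pid != holder) throw new IllegalStateException("not the holder");
        holder = -1;
        Integer next = waiting.poll();
        if (next != null) { holder = next; grants.add(next); }
    }
}
```

Granting in FIFO order is what makes the scheme fair, as noted under Advantages below.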
Advantages
Fair algorithm, grants in the order of requests The scheme is easy to implement Scheme can be used for general resource allocation
Critical Question: When there is no reply, does this mean that the coordinator is dead or just busy?
Shortcomings
Single point of failure; no fault tolerance.
Confusion between no-reply and permission denied.
Performance bottleneck of the single coordinator in a large system.
Distributed Algorithms
A contention-based approach means that each process freely and equally competes for the right to use shared resource by using a request control criteria. The fairest way is to grant the request to the process which asked first (Timestamp Prioritized scheme) or the process which got the most votes from the other processes (Voting scheme).
Ricart & Agrawala came up with a distributed mutual exclusion algorithm in 1981. It requires the following: a total ordering of all events in the system (e.g., Lamport's algorithm or others), and reliable messages (every message is acknowledged).
When a process receives a request message, it may be in one of three states:
Case 1: The receiver is not in, and does not want to enter, the critical section; send a reply (OK) to the sender.
Case 2: The receiver is in the critical section; do not reply, and add the request to a local queue of requests.
Case 3: The receiver also wants to enter the critical section but has not yet done so; compare the timestamp of the incoming request with that of its own pending request. The lower timestamp wins: send OK if the incoming request is earlier, otherwise queue it.
When the process is done with its critical section, it sends a reply (OK) to everyone on its queue and deletes the processes from the queue
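The receive-side decision can be sketched as a pure function over the three cases, using the usual (timestamp, process id) tie-break (state and method names are illustrative):

```java
public class RicartAgrawala {
    public enum State { RELEASED, WANTED, HELD }
    public enum Action { REPLY_OK, DEFER }

    // Decide how to answer a request carrying (reqTs, reqId), given this
    // process's state and its own pending request (myTs, myId).
    public static Action onRequest(State myState, int myTs, int myId,
                                   int reqTs, int reqId) {
        if (myState == State.RELEASED) return Action.REPLY_OK; // case 1
        if (myState == State.HELD) return Action.DEFER;        // case 2: queue it
        // case 3: both want the critical section; lower (ts, id) wins
        boolean requesterFirst = reqTs < myTs || (reqTs == myTs && reqId < myId);
        return requesterFirst ? Action.REPLY_OK : Action.DEFER;
    }
}
```

Breaking timestamp ties with the process id is what gives the required total ordering of requests, so two contenders can never each defer to the other.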
One problem with this algorithm is that a single point of failure has now been replaced with n points of failure: a poor algorithm has been replaced with one that is essentially n times worse. All is not lost; we can patch this omission by having the receiver always send a reply to a request, either an OK or a NO. When the request or the reply is lost, the sender times out and retries. Still, it is not a great algorithm and involves quite a bit of message traffic, but it demonstrates that a distributed algorithm is at least possible.