
Chapter 3

Processes

Code Migration
Traditionally, communication in distributed systems is concerned with exchanging data between
processes. Code migration in the broadest sense deals with moving programs between machines, with the
intention to have those programs be executed at the target. In some cases, the execution status of a program,
pending signals, and other parts of the environment must be moved as well.

Reasons for Code Migration


1. Performance
Overall system performance can be improved if processes are moved from heavily-loaded to lightly-
loaded machines.
How system performance is improved by code migration
1. Using load distribution algorithms
By monitoring CPU queue length or CPU utilization, load distribution algorithms are used to
make decisions concerning the allocation and redistribution of tasks with respect to a set of
processors.
2. Using qualitative reasoning
Minimizing communication between systems is more important than optimizing computing
capacity. Hence, performance improvement through code migration is often based on qualitative
reasoning instead of mathematical models.
- Migrating parts of the client to the server
Consider, as an example, a client-server system in which the server manages a huge
database. If a client application needs to perform many database operations involving large
quantities of data, it may be better to ship part of the client application to the server and send
only the results across the network. Otherwise, the network may be swamped with the
transfer of data from the server to the client. In this case, code migration is based on the
assumption that it generally makes sense to process data close to where those data reside.
- Migrating parts of the server to the client
For example, in many interactive database applications, clients need to fill in forms that
are subsequently translated into a series of database operations. Processing the form at the
client side, and sending only the completed form to the server, can sometimes avoid the need for a
relatively large number of small messages to cross the network. The result is that the
client perceives better performance, while at the same time the server spends less time on
form processing and communication.
- Exploiting parallelism, but without the usual difficulties related to parallel programming
A typical example is searching for information on the Web. It is relatively simple to
implement a search query in the form of a small mobile program, called a mobile agent, that
moves from site to site. By making several copies of such a program, and sending each off to
different sites, we may be able to achieve a linear speed-up compared to using just a single
program instance.
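
The load-distribution idea from item 1 above can be sketched as follows. This is a minimal illustration rather than any specific published algorithm: the machine names, the queue-length threshold, and the function name pick_migration are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def pick_migration(queue_lengths: Dict[str, int],
                   threshold: int = 2) -> Optional[Tuple[str, str]]:
    """Return (source, target) if moving one process looks worthwhile, else None.

    queue_lengths maps a machine name to its current CPU queue length.
    A move is suggested only when the gap between the most and least
    loaded machines exceeds `threshold` (a hypothetical tuning knob).
    """
    if len(queue_lengths) < 2:
        return None
    source = max(queue_lengths, key=queue_lengths.get)  # heaviest load
    target = min(queue_lengths, key=queue_lengths.get)  # lightest load
    if queue_lengths[source] - queue_lengths[target] > threshold:
        return source, target
    return None


# Hypothetical snapshot of CPU queue lengths on three machines.
loads = {"node-a": 9, "node-b": 1, "node-c": 4}
decision = pick_migration(loads)
```

A real system would also weigh the cost of the migration itself, which this sketch ignores.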

2. Flexibility
The traditional approach to building distributed applications is to partition the application into
different parts, and decide in advance where each part should be executed. However, if code can move
between different machines, it becomes possible to dynamically configure distributed systems.
For example, suppose a client program uses some proprietary APIs for doing some tasks that are
rarely needed, and because of the huge size of the necessary API files, they are kept on a server. If the
client ever needs to use those APIs, then it can first dynamically download the APIs and then use them.
Advantage of this model
Clients need not have all the software preinstalled to do common tasks. Instead, the software can be
moved in as necessary, and likewise, discarded when no longer needed.
Disadvantage of this model
Security - blindly trusting that the downloaded code implements only the advertised APIs while
accessing your unprotected hard disk and does not send the juiciest parts to heaven-knows-who may not
always be such a good idea.

Models for Code Migration


To get a better understanding of the different models for code migration, we use a framework described
in Fuggetta et al. (1998).
In this framework, a process consists of three segments.
1. The code segment is the part that contains the set of instructions that make up the program that is
being executed.
2. The resource segment contains references to external resources needed by the process, such as files,
printers, devices, other processes, and so on.
3. The execution segment is used to store the current execution state of a process, consisting of private
data, the stack, and, of course, the program counter.
Weak Mobility
In this model, it is possible to transfer only the code segment, along with perhaps some initialization data.
Characteristic feature: A transferred program is always started from its initial state.
Example: Java applets – which always start execution from the beginning.
Benefit: Simplicity – weak mobility requires only that the target machine can execute that code, which
essentially boils down to making the code portable.
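
Weak mobility can be illustrated with a small sketch in which only the code segment and some initialization data are shipped. The network is simulated by passing a dictionary, and the names CODE_SEGMENT, send, and receive_and_run are hypothetical. Note that the program always starts from its initial state at the target.

```python
# Weak mobility sketch: only the code segment plus initialization data move.
# No stack or program counter is transferred.

CODE_SEGMENT = '''
def main(init):
    # Execution always begins here, from the initial state.
    return sum(init["numbers"])
'''


def send(code: str, init_data: dict) -> dict:
    """Package the code segment together with its initialization data."""
    return {"code": code, "init": init_data}


def receive_and_run(message: dict):
    """Load the code into a fresh namespace and start it from the beginning."""
    namespace: dict = {}
    # exec() runs arbitrary code; a real target would need to protect itself
    # (sandboxing, verification) before doing this.
    exec(message["code"], namespace)
    return namespace["main"](message["init"])


result = receive_and_run(send(CODE_SEGMENT, {"numbers": [1, 2, 3]}))
```

Because the code arrives as portable source text, the only requirement on the target is that it can execute that code.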
Strong Mobility
In contrast to weak mobility, in systems that support strong mobility the execution segment can be
transferred as well.
Characteristic feature: A running process can be stopped, subsequently moved to another machine, and
then resume execution where it left off.
Example: D’Agents.
Benefit: Much more general than weak mobility.
Drawback: Much harder to implement.
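
The difference from weak mobility can be sketched by also transferring execution state. Real strong mobility captures the stack and program counter transparently; the sketch below only emulates this by keeping the state explicit in an object. The class Counter and its fields are assumptions made for illustration.

```python
import json


class Counter:
    """A task whose execution state is explicit so it can be captured."""

    def __init__(self, limit, next_i=0, total=0):
        self.limit = limit
        self.next_i = next_i  # stands in for the program counter
        self.total = total    # private data of the process

    def run(self, max_steps):
        """Execute at most max_steps loop iterations, then pause."""
        steps = 0
        while self.next_i < self.limit and steps < max_steps:
            self.total += self.next_i
            self.next_i += 1
            steps += 1
        return self.next_i >= self.limit  # True once the task has finished

    def capture(self):
        """Serialize the execution state for transfer."""
        return json.dumps(vars(self))

    @classmethod
    def resume(cls, blob):
        """Rebuild the task from transferred state."""
        return cls(**json.loads(blob))


source = Counter(limit=5)
source.run(max_steps=2)        # iterations i = 0 and 1 run at the source
blob = source.capture()        # the state that would cross the network
target = Counter.resume(blob)  # rebuilt at the target
finished = target.run(max_steps=10)  # continues with i = 2, not from scratch
```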
Sender-Initiated Migration [For both strong and weak mobility]
Migration is initiated at the machine where the code currently resides or is being executed.

Examples:
1. Uploading programs to a compute server.
2. Sending a search program across the Internet to a web database server to perform the queries at that
server.
Receiver-Initiated Migration [For both strong and weak mobility]
The initiative for code migration is taken by the target machine.
Example: Java applets.
Execute Migrated Code at Target Process or in Separate Process [For weak mobility]
In the case of weak mobility, it also makes a difference if the migrated code is executed by the target
process, or whether a separate process is started. For example, Java applets are simply downloaded by a web
browser and are executed in the browser's address space.
Benefit for executing code at target process: There is no need to start a separate process, thereby
avoiding communication at the target machine.
Drawback for executing code at target process: The target process needs to be protected against
malicious or inadvertent code executions.
Migrate or Clone Process [For strong mobility]
Instead of moving a running process, also referred to as process migration, strong mobility can also be
supported by remote cloning. In contrast to process migration, cloning yields an exact copy of the original
process, but now running on a different machine. The cloned process is executed in parallel to the original
process. In UNIX systems, remote cloning takes place by forking off a child process and letting that child
continue on a remote machine.
Benefit of cloning process: The model closely resembles the one that is already used in many
applications. The only difference is that the cloned process is executed on a different machine.
In this sense, migration by cloning is a simple way to improve distribution transparency.

Figure 3.1: Alternatives for code migration.

Migration in Heterogeneous Systems


The Problem
So far, we have tacitly assumed that the migrated code can be easily executed at the target machine. This
assumption is in order when dealing with homogeneous systems. In general, however, distributed systems
are constructed on a heterogeneous collection of platforms, each having its own operating system and
machine architecture. This raises two questions:

- How can we ensure that the migrated code segment can be executed on the target platform?
- How can we ensure that the execution segment can be properly represented on the target platform?
Solution for the Case of Weak Mobility
As there is basically no runtime information that needs to be transferred between machines, it suffices to
compile the source code separately for each potential target platform, generating one code segment per platform.
Solution for the Case of Strong Mobility
A process can have two types of data in its execution segment – some machine-dependent data and some
machine-independent data.
We can easily migrate the machine-independent data. To migrate machine-dependent data, we can have
a runtime system which stores the machine-dependent data in a machine-independent format in the source
system. It can pass the machine-independent data to the target system’s runtime system and the target
runtime system can translate the machine-independent data into the target platform’s machine-dependent
format.
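
As a small illustration of a machine-independent format, the sketch below marshals an integer in network (big-endian) byte order using Python's standard struct module. The helper names to_wire and from_wire are hypothetical, and a real runtime system would have to handle many more data types.

```python
import struct


def to_wire(value: int) -> bytes:
    """Marshal a 32-bit signed integer in big-endian (network) byte order."""
    return struct.pack("!i", value)


def from_wire(data: bytes) -> int:
    """Unmarshal a 32-bit signed integer from network byte order."""
    return struct.unpack("!i", data)[0]


native = struct.pack("=i", 1)  # byte layout depends on the host machine
wire = to_wire(1)              # same bytes on every machine
restored = from_wire(wire)     # translated back into the host's representation
```

The source runtime system calls the equivalent of to_wire, and the target runtime system calls the equivalent of from_wire, so neither side needs to know the other's native representation.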
How the runtime system manages the machine-independent copy of the execution segment:
1. The runtime system maintains its own copy of the program stack, but in a machine-independent way. We
refer to this copy as the migration stack. The migration stack is updated when a subroutine is called, or
when execution returns from a subroutine.
2. When a subroutine is called, the runtime system marshals the data that have been pushed onto the stack
since the last call. These data represent values of local variables, along with parameter values for the
newly called subroutine.
3. The marshaled data are then pushed onto the migration stack, along with an identifier for the called
subroutine. In addition, the address where execution should continue when the caller returns from the
subroutine is pushed in the form of a jump label onto the migration stack as well.

Figure 3.2: The principle of maintaining a migration stack to support migration of an execution segment in a
heterogeneous environment.
How code migration is handled:
1. Code migration can take place only when a next subroutine is called.
2. When a code migration takes place, the runtime system first marshals all global program-specific
data forming part of the execution segment. Machine-specific data are ignored, as is the current
stack.
3. The marshaled data are transferred to the destination, along with the migration stack. In addition, the
destination loads the appropriate code segment containing the binaries fit for its machine architecture
and operating system.
4. The marshaled data belonging to the execution segment are unmarshaled, and a new runtime stack is
constructed by unmarshaling the migration stack.
5. Execution can then be resumed by simply entering the subroutine that was called at the original site.
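
The bookkeeping described above can be sketched as follows. This is an illustrative model only: JSON stands in for the machine-independent format, and the class MigrationStack, its method names, and the subroutine identifiers and jump labels are all assumptions.

```python
import json


class MigrationStack:
    """Machine-independent shadow of the runtime stack (illustrative only)."""

    def __init__(self):
        self.frames = []

    def call(self, subroutine_id, local_values, return_label):
        # Marshal the values pushed since the last call, then record them
        # together with the callee's identifier and the caller's jump label.
        marshaled = json.dumps(local_values, sort_keys=True)
        self.frames.append(
            {"sub": subroutine_id, "data": marshaled, "label": return_label}
        )

    def ret(self):
        # Returning from a subroutine pops its frame.
        return self.frames.pop()

    def marshal(self):
        # The whole migration stack is shipped to the destination.
        return json.dumps(self.frames)

    @staticmethod
    def rebuild_runtime_stack(blob):
        # At the destination: unmarshal every frame into native values.
        return [
            (frame["sub"], json.loads(frame["data"]), frame["label"])
            for frame in json.loads(blob)
        ]


stack = MigrationStack()
stack.call("main", {"n": 3}, return_label="exit")
stack.call("fact", {"n": 3, "acc": 1}, return_label="main:L1")
blob = stack.marshal()  # transferred along with the marshaled global data
runtime_stack = MigrationStack.rebuild_runtime_stack(blob)
```

Execution at the destination would then continue by entering the subroutine in the topmost rebuilt frame.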

Migration and Local Resources
What often makes code migration so difficult is that the resource segment cannot always be simply
transferred along with the other segments without being changed. For example, suppose a process holds a
reference to a specific TCP port through which it was communicating with other (remote) processes. Such a
reference is held in its resource segment. When the process moves to another location, it will have to give up
the port and request a new one at the destination.
Process-to-Resource Bindings
To understand the implications that code migration has on the resource segment, Fuggetta et al. (1998)
distinguish three types of process-to-resource bindings.
1. Binding by Identifier
A process refers to a resource by its identifier. In that case, the process requires precisely the
referenced resource, and nothing else.
Examples:
1. A URL to refer to a specific web site.
2. Local communication endpoints (IP, port etc.).
2. Binding by Value
Only the value of a resource is needed. In that case, the execution of the process would not be
affected if another resource would provide the same value.
Example: Standard libraries for programming languages. Such libraries should always be locally
available, but their exact location in the local file system may differ between sites. Not the specific
files, but their content is important for the proper execution of the process.
3. Binding by Type
A process indicates it needs only a resource of a specific type.
Example: References to local devices, such as monitors, printers and so on.
Resource Types
When migrating code, we often need to change the references to resources, but cannot affect the kind of
process-to-resource binding. Whether, and exactly how, a reference should be changed depends on whether that
resource can be moved along with the code to the target machine. More specifically, we need to consider the
resource-to-machine bindings, and distinguish the following cases:
1. Unattached resources can be easily moved between different machines.
Example: Typically (data) files associated only with the program that is to be migrated.
2. Fastened resources may be copied or moved, but only at relatively high costs.
Example: Local databases and complete web sites.
Although such resources are, in theory, not dependent on their current machine, it is often infeasible
to move them to another environment.
3. Fixed resources are intimately bound to a specific machine or environment and cannot be moved.
Example: Local devices, local communication end points.
Resource Considerations for Code Migration
Combining three types of process-to-resource bindings, and three types of resource-to-machine bindings,
leads to nine combinations that we need to consider when migrating code. These nine combinations are
shown below.

Notes on these combinations, where GR denotes a global reference:
- Binding by identifier: when the resource is shared by other processes, establishing a GR is an
alternative to moving the resource.
- Binding by value: normally, copies of such resources are readily available on the target machine, or
should otherwise be copied before code migration takes place. Establishing a GR is a better alternative
when huge amounts of data are to be copied, e.g. with dictionaries and thesauruses in text processing.
- Binding by type: irrespective of the resource-to-machine binding, the obvious solution is to rebind the
process to a locally available resource of the same type. Only when such a resource is not available will
we need to copy or move the original one to the destination, or establish a global reference.

Examples
Binding          Examples
By identifier    URL, ports
By value         Standard library files
By type          Monitors, printers
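
The nine combinations lend themselves to a simple lookup table. The sketch below is an illustration under stated assumptions: the abbreviations MV (move), CP (copy), and RB (rebind) are introduced here alongside GR, and the ordering of actions in each cell follows the usual textbook presentation of this framework rather than anything stated in these notes. The function name preferred_action is hypothetical.

```python
# Actions: MV = move, CP = copy, RB = rebind to a local resource,
# GR = establish a global reference.
# The entries are an assumption for illustration; the first action in each
# list is treated as the preferred one, the rest as fallbacks.
ACTIONS = {
    ("identifier", "unattached"): ["MV", "GR"],
    ("identifier", "fastened"):   ["GR", "MV"],
    ("identifier", "fixed"):      ["GR"],
    ("value", "unattached"):      ["CP", "MV", "GR"],
    ("value", "fastened"):        ["GR", "CP"],
    ("value", "fixed"):           ["GR"],
    ("type", "unattached"):       ["RB", "MV", "CP"],
    ("type", "fastened"):         ["RB", "GR", "CP"],
    ("type", "fixed"):            ["RB", "GR"],
}


def preferred_action(binding: str, resource: str) -> str:
    """Look up the first-choice action for a binding/resource combination."""
    return ACTIONS[(binding, resource)][0]


# A printer is bound by type and fixed to its machine: rebind locally.
printer_action = preferred_action("type", "fixed")
```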

Chapter 6
Consistency & Replication

Introduction
An important issue in distributed systems is the replication of data. Data are generally replicated to
enhance reliability or improve performance. One of the major problems is keeping replicas consistent.
Informally, this means that when one copy is updated we need to ensure that the other copies are updated as
well; otherwise the replicas will no longer be the same. In this chapter, we take a detailed look at what
consistency of replicated data actually means and the various ways that consistency can be achieved.

Issues in Keeping Replicas Consistent


There are essentially two, more or less independent, issues we need to consider.
1. Managing replicas – where to place replica servers and how content is distributed to these servers.
2. How replicas are kept consistent – how can updates be propagated more or less immediately
between replicas.

Data-Centric Consistency Models


Traditionally, consistency has been discussed in the context of read and write operations on shared data,
available by means of (distributed) shared memory, a (distributed) shared database, or a (distributed) file
system.
In this section, we use the broader term data store. A data store may be physically distributed across
multiple machines. In particular, each process that can access data from the store is assumed to have a local
(or nearby) copy available of the entire store. Write operations are propagated to the other copies.
The Problem
Normally, a process that performs a read operation on a data item expects the operation to return a value
that shows the results of the last write operation on that data item.
In the absence of a global clock, it is difficult to define precisely which write operation is the last one.
Solution
Use a consistency model.
A consistency model is essentially a contract between processes and the data store. It says that if
processes agree to obey certain rules, the store promises to work correctly.
There is a range of consistency models. Each model effectively restricts the values that a read operation
on a data item can return. As is to be expected, the ones with major restrictions are easy to use, whereas
those with minor restrictions are sometimes difficult.
Notations Used in Consistency Models
To study consistency in detail, we will give numerous examples. To make these examples precise, we
need a special notation in which we draw the operations of a process along a time axis:
- The time axis is always drawn horizontally, with time increasing from left to right.
- The symbol Wi(x)a means that a write by process Pi to data item x with the value a has been done.
- The symbol Ri(x)b means that a read by process Pi from data item x returning b has been done.
- Assume that each data item is initially NIL.
- When there is no confusion concerning which process is accessing data, we omit the index from the
symbols W and R.
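
To make the notation concrete, the following sketch parses strings such as W1(x)a into their parts. It is only an aid for experimenting with the examples; the names Op and parse_op are hypothetical, and the pattern assumes data items and values are simple alphanumeric tokens.

```python
import re
from typing import NamedTuple, Optional


class Op(NamedTuple):
    kind: str               # "W" for a write, "R" for a read
    process: Optional[int]  # index i of process Pi, or None when omitted
    item: str               # data item, e.g. "x"
    value: str              # value written or returned, e.g. "a" or "NIL"


_PATTERN = re.compile(r"^([WR])(\d*)\((\w+)\)(\w+)$")


def parse_op(text: str) -> Op:
    """Parse one operation written in the W/R notation."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a valid operation: {text!r}")
    kind, index, item, value = match.groups()
    return Op(kind, int(index) if index else None, item, value)


write = parse_op("W1(x)a")  # write by P1 to x with value a
read = parse_op("R(x)NIL")  # read of x returning NIL, process index omitted
```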

As an example, in the timeline below, P1 does a write to a data item x, modifying its value to a. P2 later
reads x and sees value a.

P1:   W(x)a
P2:            R(x)a

Strict Consistency
Any read on a data item x returns a value corresponding to the result of the most recent write on x.
This definition is natural and obvious, although it implicitly assumes the existence of absolute global
time so that the determination of “most recent” is unambiguous.
Problem with Strict Consistency
It relies on absolute global time. In essence, it is impossible in a distributed system to assign a unique
timestamp to each operation that corresponds to actual global time.
Example Problematic Situation
As an example, in Fig. (a) below, P1 does a write to a data item x, modifying its value to a. Note that, in
principle, this operation W1(x)a is first performed on a copy of the data store that is local to P1, and is then
subsequently propagated to the other local copies. In our example, P2 later reads x (from its local copy of the
store) and sees value a. This behavior is correct for a strictly consistent data store. In contrast, in Fig. (b), P2
does a read after the write (possibly only a nanosecond after it, but still after it), and gets NIL. A subsequent
read returns a. Such behavior is incorrect for a strictly consistent data store.

(a)  P1:   W(x)a
     P2:            R(x)a

(b)  P1:   W(x)a
     P2:            R(x)NIL   R(x)a
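
The definition can be checked mechanically if we pretend that every operation carries an absolute global timestamp. That is exactly the assumption the text says cannot be met in a real distributed system, so the sketch below only illustrates the definition. The event layout and the function name is_strictly_consistent are assumptions.

```python
from typing import Dict, List, Tuple

# Each event is (absolute_time, process, kind, item, value).
Event = Tuple[int, str, str, str, str]


def is_strictly_consistent(events: List[Event]) -> bool:
    """Check that every read returns the value of the most recent write."""
    current: Dict[str, str] = {}  # item -> latest written value
    for _, _, kind, item, value in sorted(events):  # order by global time
        if kind == "W":
            current[item] = value
        elif current.get(item, "NIL") != value:  # items start out as NIL
            return False
    return True


# Case (a): P2 reads a after P1 wrote a.
fig_a = [(1, "P1", "W", "x", "a"), (2, "P2", "R", "x", "a")]

# Case (b): P2 first reads NIL after the write, then reads a.
fig_b = [(1, "P1", "W", "x", "a"), (2, "P2", "R", "x", "NIL"),
         (3, "P2", "R", "x", "a")]

ok_a = is_strictly_consistent(fig_a)
ok_b = is_strictly_consistent(fig_b)
```

With these inputs, case (a) satisfies the definition and case (b) violates it at the read that returns NIL.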
