Anda di halaman 1dari 7

<H1> <h1>1.

1 Heterogeneous Distributed Database Capabilities


Heterogeneous distributed database systems provides various capabilities such as integrating schemas
of different databases, processing distributed query, managing distributed transactions as well as
administrative functions, and dealing with various types of heterogeneity. Let us understand each of
these capabilities:
Integrating Schemas: Allow users to logically view the distributed data at a single place
Processing Distributed Queries: Deal with analysis, optimization, and execution of queries that
refer to the data available in distributed databases.
Managing Distributed Transactions: Facilitate ACID properties of transaction in a distributed
system.
Managing Administrative Functions: Comprise authentication and authorization of accessing
data, defining and enforcing constraints on data, and maintaining data dictionaries.
Dealing with Heterogeneity: Implies managing differences in hardware, operating systems,
communications links, DBMS vendors, and data models.
The above mentioned are the important capabilities of distributed data management. After getting a
brief overview of these capabilities, let us discuss each of them in detail.
Different types of capabilities can be provided by heterogeneous distributed database systems. They include

Formatted: Font: (Default) +Body (Calibri), 11


pt, Not Bold
Formatted: Font: (Default) +Body (Calibri), 11
pt, Not Bold
Formatted: Font: Bold
Formatted: List Paragraph, Bulleted + Level: 1
+ Aligned at: 0.25" + Indent at: 0.5"
Formatted: Font:
Formatted: Font:
Formatted: Font: Bold
Formatted: Font:
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font:

schema integration, distributed query processing, distributed transaction management, administrative


functions, and coping with different types of heterogeneity. Schema integration has to do with the way in
which users can logically view the distributed data. Distributed query management deals
with the analysis, optimization, and execution of queries that reference distributed
data. Distributed transaction management deals with the atomicity, isolation, and
durability of transactions in a distributed system. Administrative functions include such things as
authentication and authorization, defining and enforcing semantic constraints on the data, and
management of data dictionaries and directories.
Heterogeneity can include differences in hardware, operating systems, communications links, database
management system (DBMS) vendors, and/or data models.
These are all important aspects of distributed data management. In considering
them, it is important to recognize there is no ideal set of capabilities for all environments
or applications. A particular capability may be invaluable in certain situations while being totally unsuitable
in others.

1.1.1 Schema Integration <H2>


Prior to the discussion on schema integration, let us get familiarized with the term schema. Schema
defines the structure of data in a database and each local database has a local schema. A user generally
views that portion of the distributed data in the schema which is of his/her interest and meets
requirements. Now, the problem arises to map the user view and the local schema. The two general
approaches to these problems are:
1. All the local schemas can be integrated into a single global schema, which will depict the entire
data of Each local database has a local schema, describing the structure of the data in that
database. Each user has a user view, describing that portion of the distributed data that is of
interest to the user, possibly reorganized to meet the users requirements.the distributed
system. Further, every user can derive their views as per the requirements from the global
schema.
2. Different parts of the various local schemas can be integrated into multiple federated schemas.
These schemas will be according to the user view and the user view can be extracted from these
federated schemas.

Formatted: Font: (Default) +Body (Calibri), 11


pt
Formatted: List Paragraph, Numbered + Level:
1 + Numbering Style: 1, 2, 3, + Start at: 1 +
Alignment: Left + Aligned at: 0.25" + Indent
at: 0.5"
Formatted: Font: (Default) +Body (Calibri), 11
pt
Formatted: Font: (Default) +Body (Calibri), 11
pt

In both the approaches, the


There are two general approaches to the problem of providing mappings between user views and the
local schemata:
(1)
(2)
All of the local schemata may be integrated into a single global schema that represents all data in the
entire distributed system. All user views are derived from this global schema.
Various portions of various local schemata may be integrated into multiple federated schemata. These
federated schemata may correspond closely to user views, or user views may be further derived from
these federated schemata.
local schemas are integrated that forms a conceptual schema hiding the fragments of local schemas.
Moreover, if the local databases are based on various data models and different names are used for a
specific data element, then the first step will be to translate the local schema into an equivalent schema
in some common schema with common names as well as representations. Therefore, problem will be
faced with schema translation.
The simplest approach of schema integration is collaborating local schema or subsets of local schema
into a conceptual schema to be viewed by various distributed users. In the conceptual schema, you can
identify a data item by the name if an item along with the name of the local database.
More capabilities that can be provided are replication, horizontal and vertical fragmentation, and data
mapping. Let us discuss each of them:
Replication: Refers to the approach in which the data items are identified as replicas of each
other.
Horizontal Fragmentation: Refers to the approach in which the data items of various local
databases are considered as a part of the same table or an entity set in the conceptual schema.
Vertical Fragmentation: Refers to the approach in which the data items of various local
databases are considered as a part of the same row or an entity in the conceptual schema. Each
row or entity will have different attributes.
Data Mapping: Refers to the approach in which data types or data values are transformed for
conformity with each other. For example, one local database may be storing the salaries of
employees in Rupees whereas the other database may be storing the salaries in Dollars. To
integrate the schemas of these databases, we need to map Rupees to Dollars.
In either approach one is faced with schema integration-the process of developing a conceptual schema
that encompasses a collection of local schemata.
If the local databases are based on different data models and/or use different names or representations
for particular data elements, the usual first step is for each local schema to be translated into an
equivalent schema in some common data model using common names and representations. Thus, in this
case one is also faced with a schema translation problem.
The simplest form of schema integration is schema union, whereby a conceptual schema seen by
distributed users is the disjoint union of local schemata or perhaps of subsets or views of local schemata.
In the simplest form of this, a data item in the conceptual schema is identified by the name of a local
database concatenated with the name of an item in the local database.
More commonly, some type of aliasing capability is provided. More sophisticated capabilities that can
be provided include the following:
Replication, whereby data items in different local databases may be identified as copies of each other
Horizontal fragmentation, whereby data items in different local databases may be identified as logically
belonging to the same table or entity set in the integrated schema Vertical fragmentation, whereby data
items in different local databases may be identified as logically representing the same row or entity in the
integrated schema but containing different attributes for the row or entity Data mapping, whereby data
types or data values are converted for conformity with each other As an example of data mapping, one
local database might store temperatures as floating point numbers representing degrees Celsius, and

Formatted: Font: (Default) +Body (Calibri), 11


pt

Formatted: Font: Bold


Formatted: List Paragraph, Bulleted + Level: 1
+ Aligned at: 0.25" + Indent at: 0.5"
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font: Bold
Formatted: Font: Bold

another local database might store temperatures as character string representations of degrees
Fahrenheit. To integrate these schemas, one might want to map degrees Fahrenheit to degrees Celsius
and map character string representations of numbers to floating point representations, or vice versa.

1.1.2 Distributed Query Management <H2>


Distributed query management implies combining the data retrieved through a single query from
various local databases. A distributed database system may or may not provide such functionality. Some
distributed systems send queries to individual databases to extract data from various local databases;
instead of sending a single distributed query to a distributed query manager. If distributed query
management is available, it is possible to optimize the heterogeneous distributed queries and handle
their execution. There will be great difference in the communication speed over the network and
workload on the local processors. The parallel execution of queries is dependent on the features of a
system. Moreover, the complexity can increase due to the replicated or fragmented data. An ideal
approach is to move all the data to the query site and then integrate the data to handle the user query.
Distributed query management provides the ability to combine data from different local databases in a
single retrieval operation (as seen by the user). This capability may or may not be provided by a
distributed database system. In some systems, an application needing data from multiple local databases
would explicitly send queries to the individual databases rather than presenting a single distributed query
to a distributed query manager. If distributed query management is provided, there are many possible
approaches to optimizing the heterogeneous distributed queries and managing their execution. To
complicate the optimization problem, there may be great differences in the speeds of the communications
links, the speeds and workloads of the local processors, and the nature of the data operations available at
the local sites. Depending on the system characteristics, it may be desirable to exploit parallelism in the
execution of queries.
Replicated and/or fragmented data also add complexity. One of the simpler approaches to query
management is to move all the desired data to the query site and combine them there. More
sophisticated approaches may consider a wide range of algorithms and take account a wide range
of factors in evaluating them [Chen et al. 19891.

1.1.3 Distributed Transaction Management


Similar to distributed query management, in distributed transaction management the data from the
multiple sites is to be updated or read through a single transaction. Also, the distributed transaction
management preserves the ACID properties of a transaction, which may or may not be provided by a
distributed database system. The possibility of a system is to provide only two aspects:
Concurrency control protocols (Local as well as Distributed)
Commit protocols (Local as well as Distributed)
Local concurrency control protocols ensure that the execution of local transactions, irrespective of their
origination, should meet the standards of the local system. This implies that the execution should be
serializable, that is, the actual execution should be in accordance with the serial execution of
transactions. Apart from local concurrency control protocols, the distributed concurrency protocols are
also designed to ensure that the distributed transactions are globally serializable. It implies the
serializations of local components are in accordance with the global serialization of all the distributed
transactions.
Local commit protocols assure atomicity and durability properties of local transactions. This implies that
either all the tasks related to a local transaction complete or none of them. Similar to local commit
protocol, distributed commit protocols assure global atomicity and durability implying either all the local
subtransactions of a distributed transaction commit or none of them.
You should keep in mind that the distributed protocols have an operational cost along with the
implementation cost because the failure at one site may result into the locked data available at the

Formatted: Font: (Default) +Body (Calibri), 11


pt

Formatted: List Paragraph, Bulleted + Level: 1


+ Aligned at: 0.25" + Indent at: 0.5"

other site. This locked data may not be available for extended periods of time. Therefore, there are
benefits of not using these protocols if they are not required.
In certain situations, due to semantics of database and applications, the distributed concurrency control
and commit protocols may not be required. For example, maintaining the independent databases such
as bibliographic listing or restaurant ratings, in which a user can retrieve data from multiple databases as
a collection. Because the databases are independently updated and the users require partial
information, then there is no need for global serializability or global atomicity.
Another situation can be in which distributed concurrency commit protocols are required instead of
distributed concurrency control. For example, travel reservation system in which seats are reserved for
multiple resources and are managed independently in various databases. In such an example, the order
is not important therefore the distributed concurrency control protocols are not required. However, a
user may require a number of independent reservations into a single transaction; therefore, either all of
the tasks should complete or none of them.
At times, both the distributed concurrency and commit protocols are needed. For example, a banking
system where accounts at different branches are maintained in various databases. In such an example,
to ensure the funds are transferred between two or more accounts accurately, the global atomicity is
required. Also, to ensure the audit programs run correctly, global serializability is required.
Distributed transaction management provides the ability to read and/or update data at multiple sites within
a single transaction, preserving the transaction properties of atomicity, isolation, and durability [Ceri
and Pelagatti 19841. This capability may or may not be provided by a distributed database
system. If it is provided, there are two aspects provided: concurrency control protocols and commit
protocols.
Local concurrency control protocols ensure that the execution of local transactions, whether originating at
the local database or coming from a remote site, meet the isolation standards of the local system. In most
cases this means the execution is serializable; that is, the actual execution is equivalent to the serial
execution of the transactions in some order [Eswaran et al. 19761. Distributed concurrency control
protocols are designed to ensure that distributed transactions are globally serializable; that is, that the
serializations of the local components at all the databases are compatible with some global serialization
of all the distributed transactions [Traiger et al. 19821.
Local commit protocols guarantee atomicity and durability of local transactions; that is, that either all of
the a&ions of a local transaction complete and commit or none of them do. Distributed commit protocols
guarantee global atomicity and durability; that is, that either all of the local subtransactions of a distributed
transaction complete and commit or none of them do.
These distributed protocols carry an operational cost, as well as an implementation
cost, since a failure at one site may leave locked data at other sites unavailable for extended periods of
time. Thus, there can be advantages to not having them if they are not needed. In some situations the
semantics of the databases and applications may make both distributed concurrency
control and distributed commit protocols unnecessary. An example of this is a collection
of independently maintained reference databases (e.g., restaurant ratings or bibliographic listings), with a
capability for users to submit queries that form the union of information from the different databases in
the collection. Since the databases are being independently updated and since users are typically
interested in partial information, even if one of the databases is temporarily unavailable, there is no need
to ensure global serializability or global atomicity. In other situations, distributed commit protocols, but not
distributed concurrency control, may be needed. An example of this is a travel reservation system
in which reservations for different resources are independently managed in different databases. The order
in which reservations for different resources are interleaved is not critical, so distributed concurrency
control is not necessary. A user would, however, often want to package a number of interdependent
reservations into a single transaction and require that either they all commit or none of them do.
In other situations, both distributed concurrency control and commit protocols may be needed. An
example is a banking system in which accounts at different branches are maintained in different
databases. Global atomicity is needed to ensure that funds transfers between branches are handled
correctly. Global serializability is needed to ensure that audit programs run correctly, even when run
concurrently with funds transfer programs.

1.1.4 Administration <H2>


In a database system, various administrative functions should be performed such as user authorization,
enforcing semantic integrity, and maintaining schema information. Lets discuss each of these in detail.
A number of administrative functions, such as authorization of users, definition and enforcement of
semantic integrity constraints on the data, and maintenance of schema information, must be performed
for any database system. In a heterogeneous distributed database system, there are many choices as to
the degree of centralization and transparency of these administrative functions.

1.4.1 Authorization Management <H3>


The user authorization can either be completely decentralized or centralized. In case of decentralized
user authorization, each user is granted permissions by local database administrators and enforces by
the database management systems. However, in case of the centralized user authorization each user
has a system-side permission, which is enforced by the system-wide distributed query manager.
Let us discuss the centralized and decentralized user authorization with the help of examples. With
centralized permissions, a user may be allowed to display the average salaries instead of the individual
salary.
1.4.2 Semantic Integrity Management <H3>
Few database management systems provide functionalities for providing and enforcing semantic data
integrity rules. For the heterogeneous databases, the semantic integrity management can be applied
either at a central level or at a global level through individual local schema or global conceptual schema.
Moreover, the rules or checks can be applied on the database on the basis of the data stored in the
system.
1.4.3 Location Transparency and Schema Maintenance <H3>
Location transparency implies working with the distributed database irrespective of the awareness of its
location or whether or not it is distributed. The location transparency plays a crucial role at two levels:
user and database administrator.
Users may need to mention the data items with the help of machine name or address. Also, the users
may need to mention the logical names for data items. Some naming conventions should be followed
for logical data items to ensure uniqueness.

1.1.5 Types of Heterogeneity <H2>


A distributed database may involve a number of different types of heterogeneity. The heterogeneity is
in terms of computer hardware operating systems, communications links and protocols, database
vendors, and data models. It is possible that one protocol may permit out-of-band interruption whereas
the other protocol may not. Variety of database vendors uses different data dictionary and manipulation
languages within the same data model.
At one extreme, authorization can be completely decentralized, with each user having a user id at each
local database and permissions granted by local database adminstrators and enforced by local database
management systems. At the other extreme, authorization can be completely centralized, with each user
having a system-wide user id and permissions granted on a system-wide basis by a system-wide
database administrator and enforced by a system-wide distributed query manager. An intermediate
possibility is to have systemwide user ids but with permissions granted and enforced at local databases.
One potential advantage of centralized permissions is that a user can be given access to certain
distributed views while being denied direct access to the underlying local data. For example, one might
have a table in one database that tells which employees are assigned to which division of a company and
another table in another database that gives the salary of each employee.

Formatted: Font: (Default) +Headings


(Cambria), 13 pt, Bold, Font color: Accent 1

Formatted: Font: (Default) +Body (Calibri), 11


pt

Formatted: Heading 3, Adjust space between


Latin and Asian text, Adjust space between
Asian text and numbers

Formatted: Font: (Default) +Body (Calibri), 11


pt

Formatted: Font: (Default) +Body (Calibri), 11


pt
Formatted: Font: (Default) +Body (Calibri), 11
pt

One might want certain users to have access to the average salaries in the different divisions of the
company but not to the salaries of individual employees. With centralized permissions, it is straightforward
to define a view containing the average salary information (obtained by taking the join of the two local
tables and applying aggregation and projection operations) give the users permission to access
the view but not permission to access the underlying employee salary table. With only local permissions,
the only choices are to give the user access to the employee salary table or not to. There is no
mechanism for preventing access to the salary table but granting access to a view derived
from a join of the salary table with a table in another database.

1.4.2 Semantic Integrity Management


Some database management systems provide capabilities for specifying and enforcing semantic integrity
rules for the database instead of leaving that function to the applications programmers. These include
such things as rules for allowed data values and existence dependencies. In the context of
heterogeneous distributed databases, this can be done either centrally,through a global conceptual
schema, or locally, through the individual local schemas.
One potential advantage of the centralized approach is that distributed integrity rules,
that is, rules that depend on data stored in different databases, can be enforced.

1.4.3 Location Transparency and Schema Maintenance


Location transparency is the ability to deal with distributed data without having to know where it is or even
that it is distributed.
There are two levels at which location transparency is relevant: the level of the user (programmer or
interactive user) and the level of the database administrator.
At one extreme, users may have to specify data items by machine name or address, DBMS identifier,
database name, and data item name. At the other extreme, users may only have to specify logical names
of data items. For pure logical data item naming to work, there must be some kind of naming
convention that guarantees uniqueness of names for all data items accessible by the user, a troublesome
requirement if there are users who need access to essentially all the data of the enterprise. A
compromise is base name and a data item name, with the system mapping logical database names to
local databases or federated schemata.
This requires only system-wide uniqueness of logical database names and uniqueness of data item
names within each logical database.
There is also a range of choices for how the database administrator(s) provide location
transparency for the programmer or interactive user. At one extreme, the database
administrator(s) may individually maintain multiple data dictionaries at multiple sites describing what data
can be found where. These data dictionaries may all represent a global schema and describe
all the data in the system, or they may each represent a particular federated schema and only describe
data of interest to a particular group of users. At the other extreme, the system itself may make all
decisions about data placement and maintain all directories automatically (although this extreme
would be highly unusual in a heterogeneous system). A compromise approach is to have any changes in
data placement be entered only once and to have the system automatically propagate the information
throughout the system as needed. In a system with multiple federated schemata, the data dictionaries
and directories are typically physically decentralized. In a system with a single global schema, they
are logically centralized but may be physically centralized or distributed. In either case, as indicated
above, varying degrees of centralization and coordination are possible for their maintenance.

1.5 Types of Heterogeneity


Many people in the research community have traditionally regarded a heterogeneous
database system as one involving multiple data models. From a practical standpoint, however, a
distributed database system may involve a number of different types of heterogeneity-computer
hardware, operating systems, communications links and protocols, database management
systems (vendors), and/or data model search of which presents its own problems to
the system developer. Different hardware and operating systems may use different data representations,
for example, different character codes or different representations for floating point numbers. Different
communications mechanisms require gateways, and these can be difficult to implement
because of the different capabilities involved.

Formatted: Normal, Don't adjust space


between Latin and Asian text, Don't adjust
space between Asian text and numbers

Formatted: Normal, Don't adjust space


between Latin and Asian text, Don't adjust
space between Asian text and numbers

For example, one protocol may allow an out-of-band interruption of a session, whereas another may not.
Different DBMS vendors may use different data definition and data manipulation languages,
even though they may be using the same data model. Different data models can present a difficult
schema translation and query translation problem, especially if performance is important.

Anda mungkin juga menyukai