37-44
(2001)
Abstract
There are two key motivations for this work. First, the number of object-oriented database implementations has grown significantly. Second, there is a need for integrated access to information from multiple data sources. The multidatabase system has been proposed as a solution for integrated access to data from multiple distributed, heterogeneous, and autonomous database systems. To present the illusion of a single database to its users, a multidatabase system maintains a single global database schema, which integrates all component database schemas and against which users issue queries and updates. Many approaches to schema integration have been proposed in the literature, but most of them are concerned with relational databases. In this paper, we propose an approach to integrating the schemas of object-oriented databases in a multidatabase system environment. The underlying principle of our approach is to facilitate the automation of the schema integration process.
Key Words: Schema Integration, Object-Oriented Databases, Multidatabase Systems, Heterogeneous Databases, Distributed Databases
1. Introduction
There are two primary motivations for the work in this paper. First, the number of object-oriented database implementations has grown significantly. Second, there is a need for integrated access to information from multiple data sources. The multidatabase system (MDBS) [2,5] has been recognized as the most viable solution to the problem of interoperating distributed, heterogeneous, and autonomous database systems. To present the illusion of a single database to its users, an MDBS maintains a single global database schema, which integrates all component database schemas and against which users issue queries and updates. In this paper, we propose an approach to integrating the schemas of object-oriented databases in an MDBS environment.
The database research community has paid a
Ching-Ming Chao
2. Correspondence Assertions
Correspondence assertions specify semantic
correspondences between schema objects of two
component schemas. Schema objects in object-oriented databases are classes and attributes.
We distinguish three categories of correspondence assertions: between classes, between attributes, and between a set of attributes and a class. In our approach, correspondence assertions between component schemas are specified first. However, some correspondence assertions cannot be specified until the integrated schema is under construction.
2.1 Correspondence between Classes
The correspondence between two classes is based on their semantic domains. The semantic domain of a class is the set of real-world entities it can represent. Such a correspondence is classified as equivalent, related, or homonymous, and a related correspondence is further classified as containment, overlap, disjoint, or component.
• Class-Equivalent: A class C1 is equivalent to another class C2, denoted as Class-Equivalent (C1, C2), if their semantic domains are the same.
• Class-Containment: A class C1 is contained in another class C2, denoted as Class-Containment (C1, C2), if the semantic domain of C1 is a subset of that of C2.
• Class-Overlap: Two classes C1 and C2 overlap, denoted as Class-Overlap (C1, C2), if the intersection of their semantic domains is not empty.
• Class-Disjoint: Two classes C1 and C2 are disjoint, denoted as Class-Disjoint (C1, C2), if the intersection of their semantic domains is empty but both semantic domains are subsets of the semantic domain of a common superclass.
• Class-Component: A class C1 is a component of another class C2, denoted as Class-Component (C1, C2), if entities of C1 are components of entities of C2.
• Class-Homonymous: Two classes C1 and C2 are homonymous, denoted as Class-Homonymous (C1, C2), if they are neither equivalent nor related but have the same class name.
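Because these definitions are stated in terms of set relationships between semantic domains, they can be made executable when domains are modeled as finite sets. The sketch below assumes exactly that (in practice a designer asserts the relationships rather than enumerating entities); the function name `classify_classes` and its parameters are hypothetical. Class-Component is deliberately omitted, since part-of relationships between entities cannot be read off a comparison of domains.

```python
from typing import FrozenSet, Optional, Tuple

# Assumption: a semantic domain is modeled as a finite set of entity identifiers.
Domain = FrozenSet[str]


def classify_classes(name1: str, dom1: Domain, name2: str, dom2: Domain,
                     superclass_dom: Optional[Domain] = None
                     ) -> Optional[Tuple[str, str, str]]:
    """Return the assertion relating two classes as (predicate, arg1, arg2), or None.

    superclass_dom is the semantic domain of a common superclass, if one exists.
    Class-Component is not derived here (it needs part-of information).
    """
    if dom1 == dom2:
        return ("Class-Equivalent", name1, name2)
    if dom1 < dom2:                      # proper subset: C1 contained in C2
        return ("Class-Containment", name1, name2)
    if dom2 < dom1:                      # first argument is the contained class
        return ("Class-Containment", name2, name1)
    if dom1 & dom2:                      # non-empty intersection
        return ("Class-Overlap", name1, name2)
    if (superclass_dom is not None
            and dom1 <= superclass_dom and dom2 <= superclass_dom):
        return ("Class-Disjoint", name1, name2)
    if name1 == name2:                   # unrelated, yet sharing a name
        return ("Class-Homonymous", name1, name2)
    return None
```

For example, `classify_classes("Student", frozenset({"s1"}), "Person", frozenset({"s1", "p1"}))` yields a Class-Containment assertion with Student as its first argument. Equivalence is tested before containment so that the strict-subset checks never misreport equal domains.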
2.2 Correspondence between Attributes
Correspondence assertions between attributes are specified only when their associated classes are equivalent or related. The correspondence between attributes is based on their semantic domains; the semantic domain of an attribute is the set of real-world entities it can represent. Correspondence assertions in this category are further classified into one-to-one and many-to-many correspondences. A one-to-one correspondence relates two attributes from two different schemas and is classified as equivalent, related, or homonymous; a related correspondence is further classified as containment or overlap.
• Attribute-Equivalent: An attribute A1 is equivalent to another attribute A2, denoted as Attribute-Equivalent (A1, A2), if their semantic domains are the same.
• Attribute-Containment: An attribute A1 is contained in another attribute A2, denoted as Attribute-Containment (A1, A2), if the semantic domain of A1 is a subset of that of A2.
• Attribute-Overlap: Two attributes A1 and A2 overlap, denoted as Attribute-Overlap (A1, A2), if the intersection of their semantic domains is not empty.
• Attribute-Homonymous: Two attributes A1 and A2 are homonymous, denoted as Attribute-Homonymous (A1, A2), if they are neither equivalent nor related but have the same attribute name.
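A many-to-many correspondence relates two sets of attributes from two different schemas, since two databases may use different numbers of attributes to represent the same set of real-world entities. For example, one database may represent a person's name with the two attributes first-name and last-name, while another uses the single attribute name.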
• Attribute-Set-Equivalent: A set of attributes AS1 is equivalent to another set of attributes AS2, denoted as Attribute-Set-Equivalent (AS1, AS2), if their semantic domains are the same.
2.3 Correspondence between a Set of Attributes and a Class
The same set of real world entities can be
represented as one or more attributes in one database and as a class in another database.
3. Integration Operators
Constructing an integrated schema is
achieved by successively applying primitive integration operators to component schemas. Note that
component schemas do not change in the process
of constructing the integrated schema. Integration
operators can be classified into restructuring and
integrating operators.
3.1 Restructuring Operators
Restructuring operators are used to rename or
restructure schema objects of component schemas
to resolve conflicts between them.
• Rename. The Rename operator renames a class or an attribute. A class is renamed with Rename (class, new-class), where new-class is the new name for the class. An attribute is renamed with Rename (class, old-attribute, new-attribute), where new-attribute is the new name for the attribute old-attribute in the class.
• Coerce. The Coerce operator changes the domain type of an attribute. Its syntax is Coerce (class, attribute, new-type), where new-type is the new domain type for the attribute in the class.
• Concatenate. The Concatenate operator concatenates several attributes into a single attribute. Its syntax is Concatenate (class, {attribute-list}, new-attribute, new-type), where the attributes in the attribute-list of the class are replaced by a new attribute whose name is new-attribute and whose type is new-type. The domain types of the concatenated attributes and of the resulting attribute must all be character strings.
• Upgrade. The Upgrade operator creates a class from a set of attributes. Its syntax is Upgrade (owner-class, {attribute-list}, new-attribute, new-class), where the attributes in the attribute-list belong to the owner-class. These attributes are replaced with a complex attribute named new-attribute whose domain class is new-class, and a new class named new-class is created that includes the attributes in the attribute-list.
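The four restructuring operators can be sketched over a simple in-memory schema, assuming a hypothetical representation in which each class maps attribute names to domain types (a primitive name such as "string", or another class name for a complex attribute). Each function returns a new schema and leaves its input untouched, reflecting the requirement that component schemas do not change during integration. The function names, the type name "string", and the propagation of a class rename into complex attributes are assumptions of this sketch, not details given in the paper.

```python
import copy
from typing import Dict, List

# Hypothetical schema: class name -> {attribute name -> domain type}.
Schema = Dict[str, Dict[str, str]]


def rename_class(schema: Schema, cls: str, new_cls: str) -> Schema:
    """Rename (class, new-class)."""
    result = copy.deepcopy(schema)
    result[new_cls] = result.pop(cls)
    # Assumption: complex attributes whose domain class was `cls` follow the rename.
    for attrs in result.values():
        for attr, domain in attrs.items():
            if domain == cls:
                attrs[attr] = new_cls
    return result


def rename_attribute(schema: Schema, cls: str, old_attr: str, new_attr: str) -> Schema:
    """Rename (class, old-attribute, new-attribute)."""
    result = copy.deepcopy(schema)
    result[cls][new_attr] = result[cls].pop(old_attr)
    return result


def coerce(schema: Schema, cls: str, attr: str, new_type: str) -> Schema:
    """Coerce (class, attribute, new-type)."""
    result = copy.deepcopy(schema)
    result[cls][attr] = new_type
    return result


def concatenate(schema: Schema, cls: str, attrs: List[str],
                new_attr: str, new_type: str) -> Schema:
    """Concatenate (class, {attribute-list}, new-attribute, new-type)."""
    # All concatenated attributes and the result must be character strings.
    if new_type != "string" or any(schema[cls][a] != "string" for a in attrs):
        raise TypeError("Concatenate requires character-string attributes")
    result = copy.deepcopy(schema)
    for a in attrs:
        del result[cls][a]
    result[cls][new_attr] = new_type
    return result


def upgrade(schema: Schema, owner: str, attrs: List[str],
            new_attr: str, new_cls: str) -> Schema:
    """Upgrade (owner-class, {attribute-list}, new-attribute, new-class)."""
    result = copy.deepcopy(schema)
    moved = {a: result[owner].pop(a) for a in attrs}
    result[owner][new_attr] = new_cls  # complex attribute whose domain class is new_cls
    result[new_cls] = moved            # the new class holds the moved attributes
    return result
```

As a usage example with assumed attribute types, applying `concatenate` to a Person class with `["first-name", "last-name"]` and then `rename_attribute(..., "ss#", "ssn")` produces a Person class with the attributes name and ssn, while the original dictionary still holds first-name, last-name, and ss#. Copying on every call is wasteful for large schemas but keeps each operator a pure function, in line with the algebraic view of the operators.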
4. Integration Rules
According to the specified correspondence assertions, integration rules provide the steps to construct the integrated schema from the component schemas. There are five integration rules. During the construction of the integrated schema, they are applied in the following order: rule 3, rule 1, rule 2, and then rules 4 and 5. Rules 4 and 5 are applied to the classes of an inheritance hierarchy in top-down order. In addition, different applications of the integration rules must not produce the same virtual class more than once in the integrated schema.
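The application order and the no-duplicate constraint can be sketched independently of what each rule does. This sketch assumes a hierarchy given as a map from each class to its direct superclasses, models rules 3, 1, and 2 as single passes over all matching assertions (target "*"), and applies rules 4 and 5 together to each class before moving down the hierarchy; applying rule 4 to the whole hierarchy before rule 5 is an equally plausible reading of the text. The names `rule_schedule` and `VirtualClassRegistry` are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set, Tuple


def rule_schedule(superclasses: Dict[str, Set[str]]) -> List[Tuple[int, str]]:
    """Return (rule number, target) pairs in the order the rules are applied.

    `superclasses` maps each class of an inheritance hierarchy to its direct
    superclasses.
    """
    schedule: List[Tuple[int, str]] = [(3, "*"), (1, "*"), (2, "*")]
    # static_order() emits each class only after all of its superclasses,
    # which is the top-down order required for rules 4 and 5.
    for cls in TopologicalSorter(superclasses).static_order():
        schedule.append((4, cls))
        schedule.append((5, cls))
    return schedule


class VirtualClassRegistry:
    """Guards against producing the same virtual class more than once."""

    def __init__(self) -> None:
        self._produced: Set[str] = set()

    def produce(self, name: str) -> bool:
        """Record a virtual class; return False if an earlier rule already produced it."""
        if name in self._produced:
            return False
        self._produced.add(name)
        return True
```

A topological sort is used because a class may have several superclasses; a plain breadth-first walk from a single root would not guarantee that every superclass is visited first.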
Integration rule 1: This rule is applied when
a correspondence assertion Class-Equivalent (C1,
C2) is specified. Classes C1 and C2 are integrated
into a virtual class.
[Step 1] If C1 and C2 have different names (i.e., they are synonymous), we apply the Rename operator to C1 or C2 so that both have the same name.
[Step 2] For each pair of attributes such that a cor-
5. An Integration Example
We now give an example to illustrate the
process of constructing an integrated schema from
two component schemas. Figure 1 shows the
schemas of two object-oriented databases DB1 and
DB2 that store data of two different universities.
First, as many correspondence assertions as possible are specified between the two component schemas. The equivalent correspondences between classes, together with the correspondences between their attributes, are as follows.
• Class-Equivalent (Person@DB1, People@DB2)
  Attribute-Equivalent (ss#, ssn)
  Attribute-Equivalent (nationality, nationality)
  Attribute-Set-Equivalent ({first-name, last-name}, {name})
• Class-Equivalent (Employee@DB1, Employee@DB2)
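Since the paper states that correspondence assertions take the form of first-order predicates, the assertions listed above can be stored as ground facts and queried mechanically. The sketch below assumes a hypothetical encoding in which every argument is a set of names, so that one-to-one and many-to-many assertions share one representation, and attribute assertions are keyed by the pair of equivalent classes they refine. Only the assertions given in the text are encoded; the attribute assertions for the Employee classes are not reproduced here.

```python
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Optional, Tuple, Union


class Assertion(NamedTuple):
    """A correspondence assertion read as a ground predicate: predicate(left, right)."""
    predicate: str
    left: FrozenSet[str]
    right: FrozenSet[str]


def fact(predicate: str, left: Union[str, Iterable[str]],
         right: Union[str, Iterable[str]]) -> Assertion:
    """Build an assertion; a single name becomes a one-element set."""
    def as_set(x: Union[str, Iterable[str]]) -> FrozenSet[str]:
        return frozenset([x]) if isinstance(x, str) else frozenset(x)
    return Assertion(predicate, as_set(left), as_set(right))


# Class-level assertions from the example.
class_facts: List[Assertion] = [
    fact("Class-Equivalent", "Person@DB1", "People@DB2"),
    fact("Class-Equivalent", "Employee@DB1", "Employee@DB2"),
]

# Attribute-level assertions, keyed by the equivalent class pair they refine.
attribute_facts: Dict[Tuple[str, str], List[Assertion]] = {
    ("Person@DB1", "People@DB2"): [
        fact("Attribute-Equivalent", "ss#", "ssn"),
        fact("Attribute-Equivalent", "nationality", "nationality"),
        fact("Attribute-Set-Equivalent", ["first-name", "last-name"], ["name"]),
    ],
}


def counterpart(attribute: str, facts: List[Assertion]) -> Optional[FrozenSet[str]]:
    """Return the DB2 attributes asserted to correspond to a DB1 attribute, if any."""
    for f in facts:
        if attribute in f.left:
            return f.right
    return None
```

With this encoding, looking up first-name among the Person/People assertions returns the set containing name, which is the information an integration rule needs before deciding to apply an operator such as Concatenate.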
[Figure 1: Component schemas of DB1 (classes Person, Student, Employee, Computer, Association) and DB2 (classes People, Student, Employee, Faculty, Staff, Country, Equipment, Association), showing the inheritance hierarchy and the composition hierarchy.]
[Figure: Integrated schema of DB1 and DB2, with the classes Person, Student, Employee, Faculty, Staff, Country, Equipment, Computer, Association, and Committee.]
6. Conclusion
We have proposed an approach to schema integration between object-oriented databases in an MDBS environment. In our approach, as many correspondence assertions as possible between schema objects of the component schemas are specified first. Integration rules are then applied, according to the specified correspondence assertions, to construct the integrated schema from the component schemas. Three salient features of our approach make the automation of schema integration much easier. First, the correspondence assertions take the form of predicates in first-order logic. Second, the integration rules are triggered by specific correspondence assertions, consist of algorithmic steps, and invoke primitive integration operators. Finally, the primitive integration operators are algebraic operators. In addition, our approach not only keeps the data of the component databases retrievable from the global schema but also yields additional information as a result of schema integration.
References
[1] Abiteboul, S., Cluet, S., Milo, T., Mogilevsky, P., Siméon, J., and Zohar, S., Tools for Data Translation and Integration, IEEE Data Engineering Bulletin, Vol. 22, No. 1, pp. 3-8 (1999).
[2] Bright, M.W., Hurson, A.R., and Pakzad, S.H., A Taxonomy and Current Issues in Multidatabase Systems, IEEE Computer, Vol. 25, No. 3, pp. 50-60 (1992).
[3] Grahne, G. and Mendelzon, A.O., Tableau Techniques for Querying Information Sources through Global Schemas, in Proceedings of the 7th International Conference on Database Theory, pp. 332-347 (1999).
[4] Josifovski, V. and Risch, T., Integrating Heterogeneous Overlapping Databases through Object-Oriented Transformations, in Proceedings of the 25th International Conference on Very Large Data Bases, pp. 435-446 (1999).
[5] Pitoura, E., Bukhres, O., and Elmagarmid, A., Object Orientation in Multidatabase Systems, ACM Computing Surveys, Vol. 27, No. 2, pp. 141-195 (1995).
[6] Schmitt, I. and Türker, C., An Incremental Approach to Schema Integration by Refining Extensional Relationships, in Proceedings of the International Conference on Information and Knowledge Management, pp. 322-330 (1998).
[7] Tomasic, A., Raschid, L., and Valduriez, P., Scaling Access to Heterogeneous Data