Schema Integration Between Object-Oriented Databases: Ching-Ming Chao

Tamkang Journal of Science and Engineering, Vol. 4, No. 1, pp. 37-44 (2001)
Schema Integration between Object-Oriented Databases

Ching-Ming Chao
Department of Computer and Information Science
Soochow University
Taipei 100, Taiwan, R.O.C.
E-mail: chao@cis.scu.edu.tw
Abstract
This work has two key motivations. First, the number of object-oriented database implementations has grown significantly. Second, there is a need for integrated access to information from multiple data sources. The multidatabase system has been proposed as a solution for integrated access to data from multiple distributed, heterogeneous, and autonomous database systems. To present the illusion of a single database to its users, a multidatabase system maintains a single global database schema, which is the integration of all component database schemas and against which its users issue queries and updates. Many approaches to schema integration have been proposed in the literature, most of them concerned with relational databases. In this paper, we propose an approach to the integration of database schemas between object-oriented databases in a multidatabase system environment. The underlying principle of our approach is to facilitate the automation of the schema integration process.
Key Words: Schema Integration, Object-Oriented Databases, Multidatabase Systems, Heterogeneous Databases, Distributed Databases
1. Introduction
This work has two primary motivations. First, the number of object-oriented database implementations has grown significantly. Second, there is a need for integrated access to information from multiple data sources. The multidatabase system (MDBS) [2,5] has been recognized as the most viable solution to the problems of interoperating distributed, heterogeneous, and autonomous database systems. To present the illusion of a single database to its users, an MDBS maintains a single global database schema, which is the integration of all component database schemas and against which its users issue queries and updates. In this paper, we propose an approach to the integration of database schemas between object-oriented databases in an MDBS environment.
The database research community has paid a lot of attention to this topic. Many approaches to schema integration for various database models have been proposed in the literature. Most of the earlier approaches are concerned with the integration of relational databases, whereas our approach concentrates on the integration of object-oriented databases. Some of the more recent related work can be found in [1,3,4,6,7,8]. What distinguishes our approach from the others is its primary concern: facilitating the automation of the schema integration process. The main idea is as follows. We define a set of correspondence assertions that declaratively specify correspondences between schema objects of two component schemas. We also define a set of integration rules that give algorithmic steps for constructing an integrated schema from two component schemas according to the specified correspondence assertions. These integration rules use primitive integration operators to restructure and integrate component schemas.

The remainder of this paper is organized as
follows. Section 2 proposes the correspondence
assertions between two object-oriented databases.
Section 3 briefly describes the integration operators for restructuring and integrating component
schemas. Section 4 presents the integration rules
for constructing the integrated schema. Section 5
gives a simple example to illustrate the schema
integration process. Section 6 concludes this paper.
2. Correspondence Assertions
Correspondence assertions specify semantic
correspondences between schema objects of two
component schemas. Schema objects in object-oriented databases are classes and attributes.
We distinguish three categories of correspondence assertions: between classes, between attributes, and between a set of attributes and a class. In our approach, correspondence assertions between component schemas should be specified first. However, some correspondence assertions cannot be specified until the integrated schema is being constructed.
2.1 Correspondence between Classes
The correspondence between two classes is based on their semantic domains. The semantic domain of a class is the set of real world entities it can represent. Such a correspondence is classified as equivalent, related, or homonymous. The related correspondence is further classified as containment, overlap, disjoint, or component.
• Class-Equivalent: A class C1 is equivalent to another class C2, denoted as Class-Equivalent (C1, C2), if their semantic domains are the same.
• Class-Containment: A class C1 is contained in another class C2, denoted as Class-Containment (C1, C2), if the semantic domain of C1 is a subset of that of C2.
• Class-Overlap: Two classes C1 and C2 overlap, denoted as Class-Overlap (C1, C2), if the set-intersection of their semantic domains is not empty.
• Class-Disjoint: Two classes C1 and C2 are disjoint, denoted as Class-Disjoint (C1, C2), if the set-intersection of their semantic domains is empty but their semantic domains are both subsets of the semantic domain of a common superclass.
• Class-Component: A class C1 is a component of another class C2, denoted as Class-Component (C1, C2), if entities of C1 are components of entities of C2.
• Class-Homonymous: Two classes C1 and C2 are homonymous, denoted as Class-Homonymous (C1, C2), if they are neither equivalent nor related but have the same class name.
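The set-based definitions above can be illustrated with a small sketch. It assumes, purely for illustration, that a semantic domain can be enumerated as a Python set of entity identifiers; in practice the designer states the assertion, since semantic domains are conceptual. The function name `classify_classes`, the `common_superclass` flag, and the sample entity identifiers are hypothetical, and Class-Component is left out because it is not defined by a set relation.

```python
from typing import Optional, Set


def classify_classes(c1: str, dom1: Set[str], c2: str, dom2: Set[str],
                     common_superclass: bool = False) -> Optional[str]:
    """Suggest a class correspondence from two enumerated semantic domains."""
    if dom1 == dom2:
        return f"Class-Equivalent ({c1}, {c2})"
    if dom1 < dom2:                      # proper subset: C1 is contained in C2
        return f"Class-Containment ({c1}, {c2})"
    if dom2 < dom1:                      # containment in the other direction
        return f"Class-Containment ({c2}, {c1})"
    if dom1 & dom2:                      # non-empty intersection
        return f"Class-Overlap ({c1}, {c2})"
    if common_superclass:                # empty intersection, shared superclass
        return f"Class-Disjoint ({c1}, {c2})"
    if c1.split("@")[0] == c2.split("@")[0]:
        return f"Class-Homonymous ({c1}, {c2})"
    return None                          # no correspondence to assert


# Hypothetical entity identifiers for two classes of the later example.
print(classify_classes("Computer@DB1", {"pc-1", "pc-2"},
                       "Equipment@DB2", {"pc-1", "pc-2", "printer-1"}))
```

The order of the tests matters: equivalence and containment are checked before overlap, because two equal or nested domains also have a non-empty intersection.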
2.2 Correspondence between Attributes
Correspondence assertions between attributes are specified only when their associated classes are equivalent or related. The correspondence between attributes is based on their semantic domains. The semantic domain of an attribute is the set of real world entities it can represent. Correspondence assertions in this category are further classified into one-to-one and many-to-many correspondences. A one-to-one correspondence is one between two attributes from two different schemas. Such a correspondence is classified as equivalent, related, or homonymous. The related correspondence is further classified as containment or overlap.
• Attribute-Equivalent: An attribute A1 is equivalent to another attribute A2, denoted as Attribute-Equivalent (A1, A2), if their semantic domains are the same.
• Attribute-Containment: An attribute A1 is contained in another attribute A2, denoted as Attribute-Containment (A1, A2), if the semantic domain of A1 is a subset of the semantic domain of A2.
• Attribute-Overlap: Two attributes A1 and A2 overlap, denoted as Attribute-Overlap (A1, A2), if the set-intersection of their semantic domains is not empty.
• Attribute-Homonymous: Two attributes A1 and A2 are homonymous, denoted as Attribute-Homonymous (A1, A2), if they are neither equivalent nor related but have the same attribute name.
A many-to-many correspondence is one between two sets of attributes from two different schemas. Two databases may use different numbers of attributes to represent the same set of real world entities.
• Attribute-Set-Equivalent: A set of attributes AS1 is equivalent to another set of attributes AS2, denoted as Attribute-Set-Equivalent (AS1, AS2), if their semantic domains are the same.
2.3 Correspondence between a Set of Attributes and a Class
The same set of real world entities can be represented as one or more attributes in one database and as a class in another database.
• Attribute-Set-Class-Equivalent: A set of attributes AS of a class C1 is equivalent to another class C2, denoted as Attribute-Set-Class-Equivalent (C1, AS, C2), if the semantic domain of AS is the same as the semantic domain of C2.
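Because every assertion in this section is a named predicate over classes, attributes, or attribute sets, it can be stored as a small immutable record. The sketch below is one possible encoding and not part of the approach itself; the `Assertion` type and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union

# An argument is a class or attribute name, or a set of attribute names.
Arg = Union[str, FrozenSet[str]]


@dataclass(frozen=True)
class Assertion:
    predicate: str           # e.g. "Class-Equivalent"
    args: Tuple[Arg, ...]    # arguments in the order used in the text

    def __str__(self) -> str:
        def show(a: Arg) -> str:
            if isinstance(a, frozenset):
                return "{" + ", ".join(sorted(a)) + "}"
            return a
        return f"{self.predicate} ({', '.join(show(a) for a in self.args)})"


a = Assertion("Attribute-Set-Class-Equivalent",
              ("Person@DB1", frozenset({"nationality"}), "Country@DB2"))
print(a)  # Attribute-Set-Class-Equivalent (Person@DB1, {nationality}, Country@DB2)
```

Making the record frozen keeps assertions hashable, so a collection of them can be deduplicated or used as dictionary keys when rules are later matched against them.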
3. Integration Operators
Constructing an integrated schema is
achieved by successively applying primitive integration operators to component schemas. Note that
component schemas do not change in the process
of constructing the integrated schema. Integration
operators can be classified into restructuring and
integrating operators.
3.1 Restructuring Operators
Restructuring operators are used to rename or
restructure schema objects of component schemas
to resolve conflicts between them.
• Rename. The Rename operator renames a class or an attribute. It uses the following syntax to rename a class: Rename (class, new-class) where new-class is the new name for the class. It uses the following syntax to rename an attribute: Rename (class, old-attribute, new-attribute) where new-attribute is the new name for the attribute old-attribute in the class.
• Coerce. The Coerce operator changes the domain type of an attribute. It has the following syntax: Coerce (class, attribute, new-type) where new-type is the new domain type for the attribute in the class.
• Concatenate. The Concatenate operator concatenates several attributes into one attribute. It has the following syntax: Concatenate (class, {attribute-list}, new-attribute, new-type) where attributes in the attribute-list of the class are replaced by a new attribute whose name is new-attribute and whose type is new-type. The domain types of the concatenated attributes and of the resulting attribute must all be character strings.
• Upgrade. The Upgrade operator creates a class from a set of attributes. It has the following syntax: Upgrade (owner-class, {attribute-list}, new-attribute, new-class) where attributes in the attribute-list belong to the owner-class. Attributes in the attribute-list are replaced with a complex attribute named new-attribute whose domain class is new-class. A new class named new-class is created that includes the attributes in the attribute-list.
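The restructuring operators can be sketched over a very simple schema model. The model below (a dictionary from class name to a dictionary of attribute name and domain type), the function names, and the domain type name "string" are assumptions for illustration; the attribute types given to Person are likewise invented. Each function returns a modified copy, reflecting the requirement that component schemas themselves do not change. Renaming a class is omitted for brevity.

```python
import copy
from typing import Dict, List

# Hypothetical model: class name -> {attribute name: domain type}.
Schema = Dict[str, Dict[str, str]]


def rename_attribute(schema: Schema, cls: str, old: str, new: str) -> Schema:
    """Rename (class, old-attribute, new-attribute)."""
    out = copy.deepcopy(schema)
    out[cls] = {(new if a == old else a): t for a, t in out[cls].items()}
    return out


def coerce(schema: Schema, cls: str, attr: str, new_type: str) -> Schema:
    """Coerce (class, attribute, new-type)."""
    out = copy.deepcopy(schema)
    out[cls][attr] = new_type
    return out


def concatenate(schema: Schema, cls: str, attrs: List[str],
                new_attr: str, new_type: str = "string") -> Schema:
    """Concatenate (class, {attribute-list}, new-attribute, new-type)."""
    out = copy.deepcopy(schema)
    if new_type != "string" or any(out[cls][a] != "string" for a in attrs):
        raise TypeError("Concatenate requires character-string domain types")
    for a in attrs:
        del out[cls][a]
    out[cls][new_attr] = new_type
    return out


def upgrade(schema: Schema, owner: str, attrs: List[str],
            new_attr: str, new_class: str) -> Schema:
    """Upgrade (owner-class, {attribute-list}, new-attribute, new-class)."""
    out = copy.deepcopy(schema)
    # Move the listed attributes into the new class first, so that
    # new_attr may reuse the name of one of them.
    out[new_class] = {a: out[owner].pop(a) for a in attrs}
    out[owner][new_attr] = new_class
    return out


# Assumed attribute types; only the attribute names come from the example.
db1 = {"Person": {"ss#": "string", "first-name": "string",
                  "last-name": "string", "sex": "string",
                  "nationality": "string"}}

s = upgrade(db1, "Person", ["nationality"], "nationality", "Country")
s = rename_attribute(s, "Person", "ss#", "ssn")
s = concatenate(s, "Person", ["first-name", "last-name"], "name")
print(s["Person"])
print(s["Country"])
```

Working on copies rather than in place is a design choice of this sketch; it makes it easy to check afterwards that `db1` still holds the original component schema.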
3.2 Integrating Operators
Integrating operators are used to construct schema objects of the integrated schema from those of component schemas.
• Create. The Create operator creates a virtual class (a class created in the integrated schema) from a class of some component schema. It has the following syntax: Create (class) where class is a class from some component schema. The name, attributes, and virtual objects of the resulting virtual class are the same as those of the class. In addition, the relationships (i.e., the inheritance and composition hierarchies) of the resulting virtual class remain the same.
• Combine. The Combine operator combines two classes into a virtual class. Only the resulting virtual class will appear in the integrated schema. It has the following syntax: Combine (class1, class2, new-class) where class1 and class2 are combined into the virtual class new-class. The Combine operator is similar to the outer-join operation in relational databases. The attributes of new-class are the set-union of those of class1 and class2. The virtual objects of new-class are the set-union of those of class1 and class2.
• Inherit. The Inherit operator builds an inheritance hierarchy between two classes. It has the following syntax: Inherit (subclass, superclass). Two virtual classes are produced in the integrated schema. One virtual class corresponds to subclass and is denoted as virtual-subclass. The other virtual class corresponds to superclass and is denoted as virtual-superclass. The attributes of virtual-superclass are the same as those of superclass. The attributes of virtual-subclass are the same as those of subclass; in addition, it inherits the attributes of superclass. The virtual objects of virtual-subclass are the same as those of subclass. However, if there are objects in superclass that represent the same real world entities as some objects in subclass, these objects have to be virtually integrated in virtual-subclass. The virtual objects of virtual-superclass are the set-difference between superclass and subclass.
• Generalize. The Generalize operator creates a common superclass of two classes. It has the following syntax: Generalize (class1, class2, superclass) where the virtual class superclass is the common superclass of class1 and class2. The attributes of superclass are the set-intersection of those of class1 and class2. The set of virtual objects of superclass is empty. Two more virtual classes are produced in the integrated schema as subclasses of superclass, whose attributes and virtual objects are the same as those of class1 and class2, respectively.
• Specialize. The Specialize operator creates a common subclass of two classes. It has the following syntax: Specialize (class1, class2, subclass) where the virtual class subclass is the common subclass of class1 and class2. The attributes of subclass are the set-union of those of class1 and class2. The virtual objects of subclass are the set-intersection of those of class1 and class2. Two more virtual classes are produced in the integrated schema as superclasses of subclass. The attributes of these two virtual classes are the same as those of class1 and class2, respectively. The virtual objects of each of these two virtual classes are the set-difference between the virtual objects of class1 or class2, respectively, and the virtual objects of subclass.
• Compose. The Compose operator builds a composition hierarchy between two classes. It has the following syntax: Compose (component, composite, link-attribute). Two virtual classes virtual-component and virtual-composite are produced in the integrated schema such that virtual-component is the domain class of the attribute link-attribute in virtual-composite. The attributes and virtual objects of virtual-component and virtual-composite are the same as those of component and composite, respectively.
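The set-based definitions of Combine, Inherit, Generalize, and Specialize translate almost directly into set operations. The following is a minimal sketch under two assumptions that the text does not make: a class is modeled as a record `VClass` with a set of attribute names and a set of object identifiers, and two objects that represent the same real world entity share the same identifier. The sample object identifiers are invented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class VClass:
    name: str
    attributes: FrozenSet[str]
    objects: FrozenSet[str]   # assumed: identifiers of real world entities


def combine(c1: VClass, c2: VClass, new_name: str) -> VClass:
    # Attributes and virtual objects are both set-unions.
    return VClass(new_name, c1.attributes | c2.attributes,
                  c1.objects | c2.objects)


def inherit(sub: VClass, sup: VClass) -> Tuple[VClass, VClass]:
    # The virtual subclass keeps its objects and inherits the attributes
    # of the superclass; the virtual superclass keeps only the objects
    # that are not already in the subclass.
    vsub = VClass(sub.name, sub.attributes | sup.attributes, sub.objects)
    vsup = VClass(sup.name, sup.attributes, sup.objects - sub.objects)
    return vsub, vsup


def generalize(c1: VClass, c2: VClass, name: str) -> Tuple[VClass, VClass, VClass]:
    # The common superclass has the shared attributes and no objects.
    sup = VClass(name, c1.attributes & c2.attributes, frozenset())
    return sup, c1, c2


def specialize(c1: VClass, c2: VClass, name: str) -> Tuple[VClass, VClass, VClass]:
    # The common subclass holds the shared objects; each superclass
    # keeps what is left of its own objects.
    sub = VClass(name, c1.attributes | c2.attributes, c1.objects & c2.objects)
    sup1 = VClass(c1.name, c1.attributes, c1.objects - sub.objects)
    sup2 = VClass(c2.name, c2.attributes, c2.objects - sub.objects)
    return sub, sup1, sup2


computer = VClass("Computer",
                  frozenset({"serial-no", "price", "cpu-speed", "ram-size"}),
                  frozenset({"pc-1", "pc-2"}))
equipment = VClass("Equipment", frozenset({"serial-no", "price"}),
                   frozenset({"pc-1", "printer-1"}))

vsub, vsup = inherit(computer, equipment)
print(sorted(vsub.objects), sorted(vsup.objects))
```

Create and Compose are not shown because they copy classes unchanged and only record a relationship between them, which this flat model does not represent.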
4. Integration Rules
Integration rules provide the steps to construct the integrated schema from the component schemas according to the specified correspondence assertions. There are five integration rules. During the construction of the integrated schema, they are applied in the following order: rule 3, rule 1, rule 2, and then rules 4 and 5. Rules 4 and 5 are applied to classes in an inheritance hierarchy in top-down order. In addition, the same virtual class cannot be produced more than once in the integrated schema by different applications of integration rules.
Integration rule 1: This rule is applied when a correspondence assertion Class-Equivalent (C1, C2) is specified. Classes C1 and C2 are integrated into a virtual class.
[Step 1] If C1 and C2 have different names (i.e., they are synonymous), we apply the Rename operator to C1 or C2 to give them the same name.
[Step 2] For each pair of attributes such that a correspondence assertion Attribute-Equivalent (A1, A2) is specified, we do the following two substeps.
[Step 2-1] If A1 and A2 have different names, we apply the Rename operator to A1 or A2 to give them the same name.
[Step 2-2] If A1 and A2 have different domain types, we apply the Coerce operator to either A1 or A2 to give them the same type.
[Step 3] For each pair of attributes such that a correspondence assertion Attribute-Containment (A1, A2) is specified, we do the following two substeps.
[Step 3-1] Apply the Rename operator to A1 to change its name to the name of A2.
[Step 3-2] This substep is the same as step 2-2.
[Step 4] For each pair of attributes such that a correspondence assertion Attribute-Overlap (A1, A2) is specified, we do the following two substeps.
[Step 4-1] Apply the Rename operator to both A1 and A2 to give them the same name, one that semantically contains the old names of A1 and A2.
[Step 4-2] This substep is the same as step 2-2.
[Step 5] For each pair of attributes such that a correspondence assertion Attribute-Homonymous (A1, A2) is specified, we apply the Rename operator to A1 or A2 to give them different names.
[Step 6] For each pair of attribute sets such that a correspondence assertion Attribute-Set-Equivalent (AS1, AS2) is specified, we apply the Concatenate operator to both AS1 and AS2 so that the resulting attributes have the same name and type.
[Step 7] Apply the Combine operator to C1 and C2 to produce a virtual class.
Integration rule 2: This rule is applied when two classes C1 and C2 are related.
[Step 1] If C1 and C2 have the same name, apply the Rename operator to C1 or C2 to give them different names.
[Step 2] Apply a process similar to that of steps 2 to 6 in integration rule 1 to resolve conflicts between the attributes of C1 and the attributes of C2.
[Step 3] This step varies for different correspondence assertions between C1 and C2.
[Step 3-1] If a correspondence assertion Class-Containment (C1, C2) is specified, we apply the Inherit operator to C1 and C2 to make C1 a subclass of C2.
[Step 3-2] If a correspondence assertion Class-Overlap (C1, C2) is specified, we apply both the Generalize operator and the Specialize operator to C1 and C2 to create their common superclass and subclass, respectively.
[Step 3-3] If a correspondence assertion Class-Disjoint (C1, C2) is specified, we apply the Generalize operator to C1 and C2 to create their common superclass.
[Step 3-4] If a correspondence assertion Class-Component (C1, C2) is specified, we apply the Compose operator to C1 and C2 to build a composition hierarchy between them.
Integration rule 3: This rule is applied when a correspondence assertion Attribute-Set-Class-Equivalent (AS, C) is specified.
[Step 1] Apply the Upgrade operator to AS to create a new class, say, NC.
[Step 2] Specify the correspondence assertion Class-Equivalent (NC, C) and correspondence assertions between the attributes of NC and the attributes of C.
Integration rule 4: This rule is applied when a correspondence assertion Class-Homonymous (C1, C2) is specified.
[Step 1] Apply the Rename operator to C1 or C2 to give them different names.
[Step 2] Apply the Create operator to both C1 and C2 to produce two virtual classes.
Integration rule 5: This rule is applied for
each class without any correspondence assertion.
We apply the Create operator to it to produce a
virtual class.
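The rule ordering stated at the start of this section (rule 3, rule 1, rule 2, then rules 4 and 5) can be sketched as a small planner that maps each class-level assertion to the rule it triggers and leaves unassigned classes to rule 5. The names `RULE_FOR`, `RULE_ORDER`, and `plan` are hypothetical. The sketch does not model the top-down traversal of inheritance hierarchies for rules 4 and 5, nor the new Class-Equivalent assertions that rule 3 specifies while it runs.

```python
from typing import List, Tuple

# Which rule each class-level assertion triggers, as described in the text.
RULE_FOR = {
    "Attribute-Set-Class-Equivalent": 3,
    "Class-Equivalent": 1,
    "Class-Containment": 2,
    "Class-Overlap": 2,
    "Class-Disjoint": 2,
    "Class-Component": 2,
    "Class-Homonymous": 4,
}
RULE_ORDER = [3, 1, 2, 4]   # rule 5 then handles the remaining classes

Assertion = Tuple[str, tuple]


def plan(assertions: List[Assertion], classes: List[str]) -> List[tuple]:
    """Return (rule number, payload) pairs in application order."""
    steps = []
    for rule in RULE_ORDER:
        for pred, args in assertions:
            if RULE_FOR.get(pred) == rule:
                steps.append((rule, (pred, args)))
    # Class names are the string arguments; attribute sets are tuples here.
    covered = {a for _, args in assertions for a in args if isinstance(a, str)}
    for cls in classes:
        if cls not in covered:
            steps.append((5, cls))
    return steps


assertions = [
    ("Class-Equivalent", ("Person@DB1", "People@DB2")),
    ("Class-Equivalent", ("Employee@DB1", "Employee@DB2")),
    ("Class-Equivalent", ("Student@DB1", "Student@DB2")),
    ("Class-Containment", ("Computer@DB1", "Equipment@DB2")),
    ("Class-Homonymous", ("Association@DB1", "Association@DB2")),
    ("Attribute-Set-Class-Equivalent",
     ("Person@DB1", ("nationality",), "Country@DB2")),
]
classes = ["Person@DB1", "Computer@DB1", "Student@DB1", "Employee@DB1",
           "Association@DB1", "People@DB2", "Country@DB2", "Equipment@DB2",
           "Employee@DB2", "Student@DB2", "Association@DB2",
           "Faculty@DB2", "Staff@DB2"]

for rule, payload in plan(assertions, classes):
    print(rule, payload)
```

Placing rule 3 first matters because the class it creates takes part in a Class-Equivalent assertion that rule 1 must then process.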
5. An Integration Example
We now give an example to illustrate the
process of constructing an integrated schema from
two component schemas. Figure 1 shows the
schemas of two object-oriented databases DB1 and
DB2 that store data of two different universities.
First, as many correspondence assertions as possible are specified between these two component schemas. Equivalent correspondences between classes, together with the correspondences between their attributes, are as follows.
• Class-Equivalent (Person@DB1, People@DB2)
  Attribute-Equivalent (ss#, ssn)
  Attribute-Equivalent (nationality, nationality)
  Attribute-Set-Equivalent ({first-name, last-name}, {name})
• Class-Equivalent (Employee@DB1, Employee@DB2)
  Attribute-Equivalent (salary, salary)
• Class-Equivalent (Student@DB1, Student@DB2)
  Attribute-Equivalent (department, department)
  Attribute-Containment (father, parent)
Related correspondences between classes, together with the correspondences between their attributes, are as follows.
• Class-Containment (Computer@DB1, Equipment@DB2)
  Attribute-Equivalent (serial-no, serial-no)
  Attribute-Equivalent (price, price)
Homonymous correspondences between classes are as follows.
• Class-Homonymous (Association@DB1, Association@DB2)
Equivalent correspondences between sets of attributes and classes are as follows.
• Attribute-Set-Class-Equivalent (Person@DB1, {nationality}, Country@DB2)
Figure 1. Schemas of two component databases. The diagram marks inheritance and composition hierarchies between classes.
(a) An object-oriented schema of DB1. Classes: Person (ss#, first-name, last-name, sex, nationality); Computer (serial-no, price, cpu-speed, ram-size); Student (department, father); Employee (salary); Association (name, purpose). The label participation also appears in this panel.
(b) An object-oriented schema of DB2. Classes: People (ssn, name, age, nationality); Country (name, population, area); Equipment (serial-no, price); Employee (salary); Student (department, parent); Association (name, purpose); Faculty (rank); Staff (specialty). The label participation also appears in this panel.

Then, according to the specified correspondence assertions, integration rules are applied in the following order to construct the integrated schema.
1. Apply integration rule 3 for Attribute-Set-Class-Equivalent (Person@DB1, {nationality}, Country@DB2).
   Upgrade (Person@DB1, [nationality], nationality, Country)
   Specify correspondence assertions Class-Equivalent (Country@DB1, Country@DB2) and Attribute-Equivalent (nationality, name).
2. Apply integration rule 1 for Class-Equivalent (Person@DB1, People@DB2).
   Rename (People@DB2, Person)
   Rename (Person@DB1, ss#, ssn)
   Coerce (Person@DB1, nationality, Country)
   Concatenate (Person@DB1, [first-name, last-name], name, string-type)
   (Here string-type is the domain type of the attribute name in Person@DB2.)
   Combine (Person@DB1, Person@DB2, Person)
3. Apply integration rule 1 for Class-Equivalent (Employee@DB1, Employee@DB2).
   Combine (Employee@DB1, Employee@DB2, Employee)
4. Apply integration rule 1 for Class-Equivalent (Student@DB1, Student@DB2).
   Combine (Student@DB1, Student@DB2, Student)
5. Apply integration rule 1 for Class-Equivalent (Country@DB1, Country@DB2).
   Rename (Country@DB1, nationality, name)
   Combine (Country@DB1, Country@DB2, Country)
6. Apply integration rule 2 for Class-Containment (Computer@DB1, Equipment@DB2).
   Inherit (Computer@DB1, Equipment@DB2)
7. Apply integration rule 4 for Class-Homonymous (Association@DB1, Association@DB2).
   Rename (Association@DB1, Committee)
   Create (Committee@DB1)
   Create (Association@DB2)
8. Apply integration rule 5 for classes Faculty@DB2 and Staff@DB2, which do not have any correspondence assertion.
   Create (Faculty@DB2)
   Create (Staff@DB2)
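As a check on step 2 above, the attribute set of the integrated Person class can be traced with a few lines of code. This is only a sketch: classes are reduced to lists of attribute names, domain types are ignored (so Coerce has no visible effect), and the helper names are hypothetical. The starting attribute lists are those of Person@DB1 and People@DB2 as read from Figure 1.

```python
from typing import List

person_db1 = ["ss#", "first-name", "last-name", "sex", "nationality"]
people_db2 = ["ssn", "name", "age", "nationality"]


def rename(attrs: List[str], old: str, new: str) -> List[str]:
    """Attribute form of Rename, on names only."""
    return [new if a == old else a for a in attrs]


def concatenate(attrs: List[str], parts: List[str], new: str) -> List[str]:
    """Replace the attributes in parts by one new attribute."""
    return [a for a in attrs if a not in parts] + [new]


def combine(attrs1: List[str], attrs2: List[str]) -> List[str]:
    """Attributes of the combined class: the set-union, sorted for display."""
    return sorted(set(attrs1) | set(attrs2))


step = rename(person_db1, "ss#", "ssn")
step = concatenate(step, ["first-name", "last-name"], "name")
person = combine(step, people_db2)
print(person)  # ['age', 'name', 'nationality', 'sex', 'ssn']
```

The result contains ssn, name, sex, age, and nationality, which are the attributes the integrated Person class is expected to carry after the Combine in step 2.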
The resulting integrated schema is shown in Figure 2.

Figure 2. The integrated object-oriented schema. Classes: Person (ssn, name, sex, age, nationality); Country (name, population, area); Equipment (serial-no, price); Computer (cpu-speed, ram-size); Committee (name, purpose); Employee (salary); Student (department, parent); Association (name, purpose); Faculty (rank); Staff (specialty). The label participation appears twice in the diagram.
6. Conclusion
In this paper, we proposed an approach to schema integration between object-oriented databases in an MDBS environment. In our approach, as many correspondence assertions as possible are first specified between schema objects of component schemas. According to the specified correspondence assertions, integration rules are then applied to construct the integrated schema from component schemas. Our approach has three salient features that make the automation of schema integration much easier. First, the correspondence assertions take the form of predicates in first-order logic. Second, the integration rules are triggered by specific correspondence assertions, consist of algorithmic steps, and invoke primitive integration operators. Last, the primitive integration operators are algebraic operators. In addition, our approach not only keeps the data of component databases retrievable from the global schema, but also makes more information available as a result of schema integration.
References
[1] Abiteboul, S., Cluet, S., Milo, T., Mogilevsky, P., Simon, J., and Zohor, S., Tools for Data Translation and Integration, IEEE Data Engineering Bulletin, Vol. 22, No. 1, pp. 3-8 (1999).
[2] Bright, M.W., Hurson, A.R., and Pakzad, S.H., A Taxonomy and Current Issues in Multidatabase Systems, IEEE Computer, Vol. 25, No. 3, pp. 50-60 (1992).
[3] Grahne, G. and Mendelzon, A.O., Tableau Techniques for Querying Information Sources through Global Schemas, in Proceedings of the 7th International Conference on Database Theory, pp. 332-347 (1999).
[4] Josifovski, V. and Risch, T., Integrating Heterogeneous Overlapping Databases through Object-Oriented Transformations, in Proceedings of the 25th International Conference on Very Large Data Bases, pp. 435-446 (1999).
[5] Pitoura, E., Bukhres, O., and Elmagarmid, A., Object Orientation in Multidatabase Systems, ACM Computing Surveys, Vol. 27, No. 2, pp. 141-195 (1995).
[6] Schmitt, I. and Turker, C., An Incremental Approach to Schema Integration by Refining Extensional Relationships, in Proceedings of the International Conference on Information and Knowledge Management, pp. 322-330 (1998).
[7] Tomasic, A., Raschid, L., and Valduriez, P., Scaling Access to Heterogeneous Data Sources with DISCO, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 5 (1998).
[8] Yang, J. and Papazoglou, M.P., A Configurable Approach for Object Sharing among Multidatabase Systems, in Proceedings of the International Conference on Information and Knowledge Management, pp. 129-136 (1995).
Manuscript Received: Feb. 20, 2001 and Accepted: Mar. 20, 2001
