
Tamkang Journal of Science and Engineering, Vol. 4, No. 1, pp. 37-44 (2001)

Schema Integration between Object-Oriented Databases


Ching-Ming Chao
Department of Computer and Information Science
Soochow University
Taipei 100, Taiwan, R.O.C.
E-mail: chao@cis.scu.edu.tw

Abstract
There are two key motivations for this work. First, the number of object-oriented database implementations has grown significantly.
Second, there has been a need for integrated access to information from multiple data sources. The multidatabase system has been
proposed as a solution for integrated access of data from multiple distributed, heterogeneous, and autonomous database systems. To present
a single database illusion to its users, a multidatabase system maintains a single global database schema, which is the integration of all
component database schemas and against which its users will issue
queries and updates. Many approaches to schema integration have
been proposed in the literature. Most of the previous approaches are
concerned with relational databases. In this paper, we propose an approach to the integration of database schemas between object-oriented
databases in a multidatabase system environment. The underlying
principle of our approach is to facilitate the automation of the schema
integration process.
Key Words: Schema Integration, Object-Oriented Databases, Multidatabase Systems, Heterogeneous Databases, Distributed Databases

1. Introduction
Two motivations underlie the work done in this paper. First, the
number of implementations of object-oriented databases has grown significantly. Second, there has
been a need for integrated access to information
from multiple data sources. The multidatabase
system (MDBS) [2,5] has been recognized as the
most viable solution to the problems of interoperating distributed, heterogeneous, and autonomous
database systems. To present a single database illusion to its users, an MDBS maintains a single
global database schema, which is the integration of
all component database schemas and against which
its users will issue queries and updates. In this paper, we propose an approach to the integration of
database schemas between object-oriented databases in an MDBS environment.
The database research community has paid a lot of attention to this topic. Many approaches to schema integration for various database models have been proposed in the literature. Most of the earlier approaches are concerned with the integration of relational databases, whereas our approach concentrates on the integration of object-oriented databases. Some of the more recent related work can be found in [1,3,4,6,7,8]. What distinguishes our approach from the others is its primary concern: facilitating the automation of the schema integration process. The main idea of our approach is as follows. We define a set of correspondence assertions that declaratively specify correspondences between schema objects of two component schemas. We also define a set of integration rules that give algorithmic steps for constructing an integrated schema from two component schemas according to the specified correspondence assertions. These integration rules use primitive integration operators to restructure and integrate component schemas.


The remainder of this paper is organized as
follows. Section 2 proposes the correspondence
assertions between two object-oriented databases.
Section 3 briefly describes the integration operators for restructuring and integrating component
schemas. Section 4 presents the integration rules
for constructing the integrated schema. Section 5
gives a simple example to illustrate the schema
integration process. Section 6 concludes this paper.

2. Correspondence Assertions
Correspondence assertions specify semantic
correspondences between schema objects of two
component schemas. Schema objects in object-oriented databases are classes and attributes.
We classify correspondence assertions into three categories: between classes, between attributes, and between a set of attributes and a class. In our approach, correspondence assertions between component schemas should be specified first. However, some correspondence assertions cannot be specified until the integrated schema is being constructed.
2.1 Correspondence between Classes
The correspondence between two classes is
based on their semantic domains. The semantic
domain of a class is the set of real world entities it
can represent. Such a correspondence is distinguished into equivalent, related, or homonymous.
The related correspondence is further distinguished
into containment, overlap, disjoint, or component.
• Class-Equivalent: A class C1 is equivalent to another class C2, denoted as Class-Equivalent (C1, C2), if their semantic domains are the same.
• Class-Containment: A class C1 is contained in another class C2, denoted as Class-Containment (C1, C2), if the semantic domain of C1 is a subset of that of C2.
• Class-Overlap: Two classes C1 and C2 overlap, denoted as Class-Overlap (C1, C2), if the set-intersection of their semantic domains is not empty.
• Class-Disjoint: Two classes C1 and C2 are disjoint, denoted as Class-Disjoint (C1, C2), if the set-intersection of their semantic domains is empty but their semantic domains are both subsets of the semantic domain of a common superclass.
• Class-Component: A class C1 is a component of another class C2, denoted as Class-Component (C1, C2), if entities of C1 are components of entities of C2.
• Class-Homonymous: Two classes C1 and C2 are homonymous, denoted as Class-Homonymous (C1, C2), if they are neither equivalent nor related but they have the same class name.
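The set-based definitions above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each semantic domain is available as a finite set of entity identifiers; in practice a semantic domain is a conceptual notion that a designer judges, not something computed from stored data. The function name `classify_classes` and its parameters are hypothetical. Class-Component is not covered because it is a part-of relationship that cannot be derived from set operations, and when several definitions hold at once the sketch reports the most specific one.

```python
from typing import Optional, Set, Tuple


def classify_classes(
    name1: str, dom1: Set[str],
    name2: str, dom2: Set[str],
    common_super: Optional[Set[str]] = None,
) -> Optional[Tuple[str, str, str]]:
    """Return (assertion, first_class, second_class), or None if none applies.

    dom1 and dom2 are assumed to be the semantic domains of the two classes,
    modelled here as plain sets of entity identifiers.
    """
    if dom1 == dom2:
        return ("Class-Equivalent", name1, name2)
    if dom1 < dom2:  # proper subset: C1 is contained in C2
        return ("Class-Containment", name1, name2)
    if dom2 < dom1:  # containment in the other direction
        return ("Class-Containment", name2, name1)
    if dom1 & dom2:  # non-empty set-intersection
        return ("Class-Overlap", name1, name2)
    # Empty intersection: disjoint only if a common superclass domain is known.
    if common_super is not None and dom1 <= common_super and dom2 <= common_super:
        return ("Class-Disjoint", name1, name2)
    if name1 == name2:  # neither equivalent nor related, but same class name
        return ("Class-Homonymous", name1, name2)
    return None


# Hypothetical domains: every computer is a piece of equipment.
computers = {"pc-1", "pc-2"}
equipment = {"pc-1", "pc-2", "printer-1"}
result = classify_classes("Computer", computers, "Equipment", equipment)
```

The tuple keeps the argument order of the assertion, so a containment detected in the reverse direction is returned with the class names swapped.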
2.2 Correspondence between Attributes
Correspondence assertions between attributes
are specified only when their associated classes are
equivalent or related. The correspondence between
attributes is based on their semantic domains. The
semantic domain of an attribute is the set of real
world entities it can represent. Correspondence
assertions in this category are further classified
into one-to-one and many-to-many correspondences. A one-to-one correspondence is one between two attributes from two different schemas.
Such a correspondence is distinguished into
equivalent, related, or homonymous. The related
correspondence is further distinguished into containment or overlap.
• Attribute-Equivalent: An attribute A1 is equivalent to another attribute A2, denoted as Attribute-Equivalent (A1, A2), if their semantic domains are the same.
• Attribute-Containment: An attribute A1 is contained in another attribute A2, denoted as Attribute-Containment (A1, A2), if the semantic domain of A1 is a subset of the semantic domain of A2.
• Attribute-Overlap: Two attributes A1 and A2 overlap, denoted as Attribute-Overlap (A1, A2), if the set-intersection of their semantic domains is not empty.
• Attribute-Homonymous: Two attributes A1 and A2 are homonymous, denoted as Attribute-Homonymous (A1, A2), if they are neither equivalent nor related but they have the same attribute name.
A many-to-many correspondence is one between two sets of attributes from two different
schemas. Two databases may use different numbers of attributes to represent the same set of real
world entities.
• Attribute-Set-Equivalent: A set of attributes AS1 is equivalent to another set of attributes AS2, denoted as Attribute-Set-Equivalent (AS1, AS2), if their semantic domains are the same.
2.3 Correspondence between a Set of Attributes and a Class
The same set of real world entities can be
represented as one or more attributes in one database and as a class in another database.

• Attribute-Set-Class-Equivalent: A set of attributes AS of a class C1 is equivalent to another class C2, denoted as Attribute-Set-Class-Equivalent (C1, AS, C2), if the semantic domain of AS is the same as the semantic domain of C2.

3. Integration Operators
Constructing an integrated schema is
achieved by successively applying primitive integration operators to component schemas. Note that
component schemas do not change in the process
of constructing the integrated schema. Integration
operators can be classified into restructuring and
integrating operators.
3.1 Restructuring Operators
Restructuring operators are used to rename or
restructure schema objects of component schemas
to resolve conflicts between them.
• Rename. The Rename operator renames a class or an attribute. It uses the following syntax to rename a class: Rename (class, new-class) where new-class is the new name for the class. It uses the following syntax to rename an attribute: Rename (class, old-attribute, new-attribute) where new-attribute is the new name for the attribute old-attribute in the class.
• Coerce. The Coerce operator changes the domain type of an attribute. It has the following syntax: Coerce (class, attribute, new-type) where new-type is the new domain type for the attribute in the class.
• Concatenate. The Concatenate operator concatenates several attributes into a single attribute. It has the following syntax: Concatenate (class, {attribute-list}, new-attribute, new-type) where attributes in the attribute-list of the class are replaced by a new attribute whose name is new-attribute and whose type is new-type. The domain types of the concatenated attributes and of the resulting attribute must all be character strings.
• Upgrade. The Upgrade operator creates a class from a set of attributes. It has the following syntax: Upgrade (owner-class, {attribute-list}, new-attribute, new-class) where attributes in the attribute-list belong to the owner-class. Attributes in the attribute-list are replaced with a complex attribute named new-attribute whose domain class is new-class. A new class named new-class is created that includes the attributes in the attribute-list.
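As an illustration of the four restructuring operators, the following Python sketch models a schema as a dictionary that maps each class name to a dictionary of attribute names and domain types. This representation, the function names, and the type label "string" are assumptions made for the example and are not part of the paper. Each function returns a modified copy, which mirrors the remark that component schemas themselves do not change.

```python
import copy
from typing import Dict, List

Schema = Dict[str, Dict[str, str]]  # class name -> {attribute name -> domain type}


def rename_class(schema: Schema, cls: str, new_cls: str) -> Schema:
    """Rename (class, new-class). References from other classes are not updated."""
    out = copy.deepcopy(schema)
    out[new_cls] = out.pop(cls)
    return out


def rename_attribute(schema: Schema, cls: str, old: str, new: str) -> Schema:
    """Rename (class, old-attribute, new-attribute)."""
    out = copy.deepcopy(schema)
    out[cls] = {(new if a == old else a): t for a, t in out[cls].items()}
    return out


def coerce(schema: Schema, cls: str, attr: str, new_type: str) -> Schema:
    """Coerce (class, attribute, new-type)."""
    out = copy.deepcopy(schema)
    out[cls][attr] = new_type
    return out


def concatenate(schema: Schema, cls: str, attrs: List[str],
                new_attr: str, new_type: str = "string") -> Schema:
    """Concatenate (class, {attribute-list}, new-attribute, new-type)."""
    out = copy.deepcopy(schema)
    # All attributes involved must be character strings.
    if new_type != "string" or any(out[cls][a] != "string" for a in attrs):
        raise ValueError("Concatenate requires character-string attributes")
    for a in attrs:
        del out[cls][a]
    out[cls][new_attr] = new_type
    return out


def upgrade(schema: Schema, owner: str, attrs: List[str],
            new_attr: str, new_cls: str) -> Schema:
    """Upgrade (owner-class, {attribute-list}, new-attribute, new-class)."""
    out = copy.deepcopy(schema)
    # Move the listed attributes into the newly created class ...
    out[new_cls] = {a: out[owner].pop(a) for a in attrs}
    # ... and replace them with one complex attribute whose domain is that class.
    out[owner][new_attr] = new_cls
    return out


# Hypothetical attribute types for a Person class.
db1 = {"Person": {"ss#": "string", "first-name": "string",
                  "last-name": "string", "nationality": "string"}}
step1 = upgrade(db1, "Person", ["nationality"], "nationality", "Country")
step2 = concatenate(step1, "Person", ["first-name", "last-name"], "name")
```

Returning copies keeps each intermediate schema inspectable, at the cost of extra copying that a real implementation might avoid.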

3.2 Integrating Operators
Integrating operators are used to construct schema objects of the integrated schema from those of component schemas.
• Create. The Create operator creates a virtual class (a class created in the integrated schema) from a class of some component schema. It has the following syntax: Create (class) where class is a class from some component schema. The name, attributes, and virtual objects of the resulting virtual class are the same as those of the class. In addition, the relationships (i.e., the inheritance and composition hierarchies) of the resulting virtual class remain the same.
• Combine. The Combine operator combines two classes into a virtual class. Only the resulting virtual class will appear in the integrated schema. It has the following syntax: Combine (class1, class2, new-class) where class1 and class2 are combined into the virtual class new-class. The Combine operator is similar to the outer-join operation in relational databases. The attributes of new-class are the set-union of those of class1 and class2. The virtual objects of new-class are the set-union of those of class1 and class2.
• Inherit. The Inherit operator builds an inheritance hierarchy between two classes. It has the following syntax: Inherit (subclass, superclass). Two virtual classes are produced in the integrated schema. One virtual class corresponds to subclass and is denoted as virtual-subclass. The other virtual class corresponds to superclass and is denoted as virtual-superclass. The attributes of virtual-superclass are the same as those of superclass. The attributes of virtual-subclass are the same as those of subclass; in addition, it inherits the attributes of superclass. The virtual objects of virtual-subclass are the same as those of subclass. However, if there are objects in superclass that represent the same real world entities as some objects in subclass, these objects have to be virtually integrated in virtual-subclass. The virtual objects of virtual-superclass are the set-difference between the objects of superclass and those of subclass.
• Generalize. The Generalize operator creates a common superclass of two classes. It has the following syntax: Generalize (class1, class2, superclass) where the virtual class superclass is the common superclass of class1 and class2. The attributes of superclass are the set-intersection of those of class1 and class2. The set of virtual objects of superclass is empty. Two more virtual classes are produced in the integrated schema as subclasses of superclass, whose attributes and virtual objects are the same as those of class1 and class2, respectively.
• Specialize. The Specialize operator creates a common subclass of two classes. It has the following syntax: Specialize (class1, class2, subclass) where the virtual class subclass is the common subclass of class1 and class2. The attributes of subclass are the set-union of those of class1 and class2. The virtual objects of subclass are the set-intersection of those of class1 and class2. Two more virtual classes are produced in the integrated schema as superclasses of subclass. The attributes of these two virtual classes are the same as those of class1 and class2, respectively. The virtual objects of each of these two virtual classes are the set-difference between the virtual objects of the corresponding class (class1 or class2) and the virtual objects of subclass.
• Compose. The Compose operator builds a composition hierarchy between two classes. It has the following syntax: Compose (component, composite, link-attribute). Two virtual classes virtual-component and virtual-composite are produced in the integrated schema such that virtual-component is the domain class of the attribute link-attribute in virtual-composite. The attributes and virtual objects of virtual-component and virtual-composite are the same as those of component and composite, respectively.
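The set semantics of Combine, Generalize, and Specialize can be sketched in Python as follows. The sketch assumes that objects representing the same real world entity carry the same identifier in both component databases, so that plain set operations apply; how such identity is established is outside the scope of this illustration. The `VClass` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class VClass:
    """A (virtual) class: a name, a set of attribute names, a set of object ids."""
    name: str
    attrs: FrozenSet[str]
    objects: FrozenSet[int]


def combine(c1: VClass, c2: VClass, new_name: str) -> VClass:
    # Attributes and virtual objects are both set-unions.
    return VClass(new_name, c1.attrs | c2.attrs, c1.objects | c2.objects)


def generalize(c1: VClass, c2: VClass, super_name: str) -> Tuple[VClass, VClass, VClass]:
    # The superclass keeps only the shared attributes and has no objects of its own;
    # the two subclasses keep the attributes and objects of class1 and class2.
    superclass = VClass(super_name, c1.attrs & c2.attrs, frozenset())
    return superclass, c1, c2


def specialize(c1: VClass, c2: VClass, sub_name: str) -> Tuple[VClass, VClass, VClass]:
    # The subclass holds the objects common to both classes and all attributes.
    sub = VClass(sub_name, c1.attrs | c2.attrs, c1.objects & c2.objects)
    # Each superclass keeps its own attributes and only the objects not in the subclass.
    sup1 = VClass(c1.name, c1.attrs, c1.objects - sub.objects)
    sup2 = VClass(c2.name, c2.attrs, c2.objects - sub.objects)
    return sub, sup1, sup2


# Hypothetical overlapping classes: object 2 is both a student and an employee.
student = VClass("Student", frozenset({"ssn", "department"}), frozenset({1, 2}))
employee = VClass("Employee", frozenset({"ssn", "salary"}), frozenset({2, 3}))
merged = combine(student, employee, "Person")
top, _, _ = generalize(student, employee, "Person")
both, only_student, only_employee = specialize(student, employee, "WorkingStudent")
```

Inherit and Compose are left out because they mainly record a relationship between two virtual classes instead of computing new attribute or object sets.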

4. Integration Rules
According to the specified correspondence
assertions, integration rules provide the steps to
construct the integrated schema from the component schemas. There are five integration rules.
During the construction of the integrated schema, these five integration rules are applied in the following order: rule 3, rule 1, rule 2, and then rules 4 and 5. Rule 3 comes first because it produces new Class-Equivalent assertions that rule 1 must then process. Integration rules 4 and 5 are applied to the classes of an inheritance hierarchy in top-down order. In addition, the same virtual class cannot be produced more than once in the integrated schema by different applications of integration rules.
Integration rule 1: This rule is applied when a correspondence assertion Class-Equivalent (C1, C2) is specified. Classes C1 and C2 are integrated into a virtual class.
[Step 1] If C1 and C2 have different names (i.e., they are synonymous), we apply the Rename operator to C1 or C2 to give them the same name.
[Step 2] For each pair of attributes such that a correspondence assertion Attribute-Equivalent (A1, A2) is specified, we do the following two substeps.
[Step 2-1] If A1 and A2 have different names, we apply the Rename operator to A1 or A2 to give them the same name.
[Step 2-2] If A1 and A2 have different domain types, we apply the Coerce operator to either A1 or A2 to give them the same type.
[Step 3] For each pair of attributes such that a correspondence assertion Attribute-Containment (A1, A2) is specified, we do the following two substeps.
[Step 3-1] If A1 and A2 have different names, we apply the Rename operator to A1 to change its name to the name of A2.
[Step 3-2] This substep is the same as step 2-2.
[Step 4] For each pair of attributes such that a correspondence assertion Attribute-Overlap (A1, A2) is specified, we do the following two substeps.
[Step 4-1] If A1 and A2 have different names, we apply the Rename operator to both A1 and A2 to give them the same name, one that semantically contains the old names of A1 and A2.
[Step 4-2] This substep is the same as step 2-2.
[Step 5] For each pair of attributes such that a correspondence assertion Attribute-Homonymous (A1, A2) is specified, we apply the Rename operator to A1 or A2 to give them different names.
[Step 6] For each pair of attribute sets such that a correspondence assertion Attribute-Set-Equivalent (AS1, AS2) is specified, we apply the Concatenate operator to both AS1 and AS2 so that the resulting attributes have the same name and type.
[Step 7] Apply the Combine operator to C1 and C2 to produce a virtual class.
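The steps of integration rule 1 can be sketched as a Python function that turns a Class-Equivalent assertion and its attribute assertions into a list of operator calls. This is a simplified illustration under several assumptions: classes are written as name@database strings, domain types are given as dictionaries, ties are always resolved by changing the first class, and the generated names for homonymous attributes are arbitrary. Attribute-Overlap (step 4) is omitted because it needs a designer-chosen name that semantically covers both old names.

```python
from typing import Dict, List, Sequence, Tuple

Assertion = Tuple[str, object, object]  # (kind, A1 or AS1, A2 or AS2)


def rule1_operators(c1: str, c2: str, assertions: Sequence[Assertion],
                    types1: Dict[str, str], types2: Dict[str, str]) -> List[tuple]:
    """Return the operator calls for Class-Equivalent (c1, c2), in order."""
    name1, db1 = c1.split("@")
    name2, db2 = c2.split("@")
    ops: List[tuple] = []
    # Step 1: give both classes the same name (here: keep the name of C1).
    if name1 != name2:
        ops.append(("Rename", c2, name1))
        c2 = f"{name1}@{db2}"
    for kind, a1, a2 in assertions:
        if kind in ("Attribute-Equivalent", "Attribute-Containment"):
            # Steps 2-1 / 3-1: A1 takes the name of A2.
            if a1 != a2:
                ops.append(("Rename", c1, a1, a2))
            # Steps 2-2 / 3-2: align the domain types (here: coerce A1).
            if types1[a1] != types2[a2]:
                ops.append(("Coerce", c1, a2, types2[a2]))
        elif kind == "Attribute-Homonymous":
            # Step 5: make the names differ (the suffix is an arbitrary choice).
            ops.append(("Rename", c1, a1, f"{a1}-{db1}"))
        elif kind == "Attribute-Set-Equivalent":
            # Step 6: concatenate each side that is not already the target attribute.
            target = a2[0] if len(a2) == 1 else "+".join(a1)
            for cls, attrs in ((c1, a1), (c2, a2)):
                if list(attrs) != [target]:
                    ops.append(("Concatenate", cls, list(attrs), target, "string"))
        else:
            raise ValueError(f"not handled in this sketch: {kind}")
    # Step 7: combine the two classes into one virtual class.
    ops.append(("Combine", c1, c2, name1))
    return ops


# Hypothetical domain types; the assertions follow the Person/People case of Section 5.
ops = rule1_operators(
    "Person@DB1", "People@DB2",
    [("Attribute-Equivalent", "ss#", "ssn"),
     ("Attribute-Equivalent", "nationality", "nationality"),
     ("Attribute-Set-Equivalent", ("first-name", "last-name"), ("name",))],
    types1={"ss#": "string", "nationality": "string"},
    types2={"ssn": "string", "nationality": "Country", "name": "string"},
)
```

Producing a list of operator calls, instead of applying them immediately, keeps the rule declarative and lets the calls be reviewed before any virtual class is built.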
Integration rule 2: This rule is applied when two classes C1 and C2 are related.
[Step 1] If C1 and C2 have the same name, apply the Rename operator to C1 or C2 to give them different names.
[Step 2] Apply a process similar to that of steps 2 to 6 in integration rule 1 to resolve conflicts between the attributes of C1 and the attributes of C2.
[Step 3] This step varies for different correspondence assertions between C1 and C2.
[Step 3-1] If a correspondence assertion Class-Containment (C1, C2) is specified, we apply the Inherit operator to C1 and C2 to make C1 a subclass of C2.
[Step 3-2] If a correspondence assertion Class-Overlap (C1, C2) is specified, we apply both the Generalize operator and the Specialize operator to C1 and C2 to create their common superclass and subclass, respectively.
[Step 3-3] If a correspondence assertion Class-Disjoint (C1, C2) is specified, we apply the Generalize operator to C1 and C2 to create their common superclass.
[Step 3-4] If a correspondence assertion Class-Component (C1, C2) is specified, we apply the Compose operator to C1 and C2 to build a composition hierarchy between them.
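Step 3 of integration rule 2 is essentially a dispatch on the kind of related correspondence. A minimal Python sketch is shown below; the function name and the parameters `super_name`, `sub_name`, and `link_attr` are hypothetical, standing for names that a designer would have to supply.

```python
from typing import List, Optional


def rule2_step3(kind: str, c1: str, c2: str,
                super_name: Optional[str] = None,
                sub_name: Optional[str] = None,
                link_attr: Optional[str] = None) -> List[tuple]:
    """Return the integrating operator calls for a related pair of classes."""
    if kind == "Class-Containment":      # step 3-1: C1 becomes a subclass of C2
        return [("Inherit", c1, c2)]
    if kind == "Class-Overlap":          # step 3-2: common superclass and subclass
        return [("Generalize", c1, c2, super_name),
                ("Specialize", c1, c2, sub_name)]
    if kind == "Class-Disjoint":         # step 3-3: common superclass only
        return [("Generalize", c1, c2, super_name)]
    if kind == "Class-Component":        # step 3-4: composition hierarchy
        return [("Compose", c1, c2, link_attr)]
    raise ValueError(f"not a related correspondence: {kind}")


calls = rule2_step3("Class-Containment", "Computer@DB1", "Equipment@DB2")
```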
Integration rule 3: This rule is applied when a correspondence assertion Attribute-Set-Class-Equivalent (C1, AS, C2) is specified.
[Step 1] Apply the Upgrade operator to the attribute set AS of C1 to create a new class, say NC.
[Step 2] Specify the correspondence assertion Class-Equivalent (NC, C2) and correspondence assertions between the attributes of NC and the attributes of C2.
Integration rule 4: This rule is applied when
a correspondence assertion Class-Homonymous
(C1, C2) is specified.
[Step 1] Apply the Rename operator to C1 or C2 to
make them have different names.
[Step 2] Apply the Create operator to both C1 and
C2 to produce two virtual classes.
Integration rule 5: This rule is applied for
each class without any correspondence assertion.
We apply the Create operator to it to produce a
virtual class.
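The way assertions trigger rules, and the order in which the rules are applied, can be sketched in Python as a small scheduler. The names `RULE_OF`, `ORDER`, and `schedule` are hypothetical. The sketch is deliberately incomplete: it does not add the new Class-Equivalent assertions that rule 3 generates, and it does not enforce the top-down order required for rules 4 and 5 within an inheritance hierarchy.

```python
from typing import List, Sequence, Tuple

# Which integration rule each class-level assertion triggers.
RULE_OF = {
    "Attribute-Set-Class-Equivalent": 3,
    "Class-Equivalent": 1,
    "Class-Containment": 2,
    "Class-Overlap": 2,
    "Class-Disjoint": 2,
    "Class-Component": 2,
    "Class-Homonymous": 4,
}
ORDER = [3, 1, 2, 4, 5]  # order of application stated in the text


def schedule(assertions: Sequence[tuple], all_classes: Sequence[str]) -> List[Tuple[int, tuple]]:
    """Return (rule number, assertion) pairs in the order the rules are applied."""
    # Classes mentioned by some assertion (attribute sets are skipped).
    covered = {x for a in assertions for x in a[1:] if isinstance(x, str)}
    tasks = [(RULE_OF[a[0]], a) for a in assertions]
    # Rule 5: every class without any correspondence assertion.
    tasks += [(5, ("No-Assertion", c)) for c in all_classes if c not in covered]
    # sorted() is stable, so assertions of the same rule keep their given order.
    return sorted(tasks, key=lambda t: ORDER.index(t[0]))


# Assertions of the example in Section 5, given in arbitrary order.
assertions = [
    ("Class-Homonymous", "Association@DB1", "Association@DB2"),
    ("Class-Equivalent", "Person@DB1", "People@DB2"),
    ("Class-Containment", "Computer@DB1", "Equipment@DB2"),
    ("Attribute-Set-Class-Equivalent", "Person@DB1", ("nationality",), "Country@DB2"),
    ("Class-Equivalent", "Employee@DB1", "Employee@DB2"),
    ("Class-Equivalent", "Student@DB1", "Student@DB2"),
]
classes = [
    "Person@DB1", "Computer@DB1", "Student@DB1", "Employee@DB1", "Association@DB1",
    "People@DB2", "Country@DB2", "Equipment@DB2", "Employee@DB2", "Student@DB2",
    "Association@DB2", "Faculty@DB2", "Staff@DB2",
]
plan = schedule(assertions, classes)
```

A fuller implementation would re-run the scheduler after rule 3, since the Class-Equivalent assertion it specifies has to be handled by rule 1 before the remaining rules.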

5. An Integration Example
We now give an example to illustrate the
process of constructing an integrated schema from
two component schemas. Figure 1 shows the
schemas of two object-oriented databases DB1 and
DB2 that store data of two different universities.
First, as many correspondence assertions as possible are specified between these two component schemas. Equivalent correspondences between classes, together with the correspondences between their attributes, are as follows.
• Class-Equivalent (Person@DB1, People@DB2)
  Attribute-Equivalent (ss#, ssn)
  Attribute-Equivalent (nationality, nationality)
  Attribute-Set-Equivalent ({first-name, last-name}, {name})
• Class-Equivalent (Employee@DB1, Employee@DB2)
  Attribute-Equivalent (salary, salary)
• Class-Equivalent (Student@DB1, Student@DB2)
  Attribute-Equivalent (department, department)
  Attribute-Containment (father, parent)
Related correspondences between classes, together with the correspondences between their attributes, are as follows.
• Class-Containment (Computer@DB1, Equipment@DB2)
  Attribute-Equivalent (serial-no, serial-no)
  Attribute-Equivalent (price, price)
Homonymous correspondences between classes are as follows.
• Class-Homonymous (Association@DB1, Association@DB2)
Equivalent correspondences between sets of attributes and classes are as follows.
• Attribute-Set-Class-Equivalent (Person@DB1, {nationality}, Country@DB2)
Then, according to the specified correspondence assertions, integration rules are applied in the following order to construct the integrated schema.
1. Apply integration rule 3 for Attribute-Set-Class-Equivalent (Person@DB1, {nationality}, Country@DB2).
   Upgrade (Person@DB1, {nationality}, nationality, Country)
   Specify correspondence assertions Class-Equivalent (Country@DB1, Country@DB2) and Attribute-Equivalent (nationality, name).
2. Apply integration rule 1 for Class-Equivalent (Person@DB1, People@DB2).
   Rename (People@DB2, Person)
   Rename (Person@DB1, ss#, ssn)
   Coerce (Person@DB1, nationality, Country)
   Concatenate (Person@DB1, {first-name, last-name}, name, string-type), where string-type is the domain type of the attribute name in Person@DB2.
   Combine (Person@DB1, Person@DB2, Person)
3. Apply integration rule 1 for Class-Equivalent (Employee@DB1, Employee@DB2).
   Combine (Employee@DB1, Employee@DB2, Employee)
[Figure 1. Schemas of two component databases. Diagram not reproduced; its legend distinguishes inheritance hierarchy and composition hierarchy links. Class and attribute names recovered from the figure:
(a) An object-oriented schema of DB1: Person (ss#, first-name, last-name, sex, nationality); Computer (serial-no, price, cpu-speed, ram-size); Student (department, father); Employee (salary); Association (name, participation, purpose).
(b) An object-oriented schema of DB2: People (ssn, name, age, nationality); Country (name, population, area); Equipment (serial-no, price); Employee (salary); Student (department, parent); Association (name, purpose, participation); Faculty (rank); Staff (specialty).]

4. Apply integration rule 1 for Class-Equivalent (Student@DB1, Student@DB2).
   Combine (Student@DB1, Student@DB2, Student)
5. Apply integration rule 1 for Class-Equivalent (Country@DB1, Country@DB2).
   Rename (Country@DB1, nationality, name)
   Combine (Country@DB1, Country@DB2, Country)
6. Apply integration rule 2 for Class-Containment (Computer@DB1, Equipment@DB2).
   Inherit (Computer@DB1, Equipment@DB2)
7. Apply integration rule 4 for Class-Homonymous (Association@DB1, Association@DB2).
   Rename (Association@DB1, Committee)
   Create (Committee@DB1)
   Create (Association@DB2)
8. Apply integration rule 5 for classes Faculty@DB2 and Staff@DB2, which do not have any correspondence assertion.
   Create (Faculty@DB2)
   Create (Staff@DB2)

[Figure 2. The integrated object-oriented schema. Diagram not reproduced. Class and attribute names recovered from the figure: Person (ssn, name, sex, age, nationality); Country (name, population, area); Equipment (serial-no, price); Computer (cpu-speed, ram-size); Committee (name, purpose, participation); Employee (salary); Student (department, parent); Association (name, purpose, participation); Faculty (rank); Staff (specialty).]

The resulting integrated schema is shown in Figure 2.

6. Conclusion
We proposed in this paper an approach to schema integration between object-oriented databases in an MDBS environment. In our approach, as many correspondence assertions as possible between schema objects of component schemas are specified first. According to the specified correspondence assertions, integration rules are then applied to construct the integrated schema from the component schemas. Our approach has three salient features that make the automation of schema integration much easier. First, the correspondence assertions are in the form of predicates in first-order logic. Second, the integration rules are triggered by specific correspondence assertions, consist of algorithmic steps, and invoke primitive integration operators. Last, the primitive integration operators are algebraic operators. In addition, our approach not only keeps the data of the component databases retrievable from the global schema, but also makes more information available as a result of schema integration.

References
[1] Abiteboul, S., Cluet, S., Milo, T., Mogilevsky, P., Simon, J., and Zohor, S., Tools for Data Translation and Integration, IEEE Data Engineering Bulletin, Vol. 22, No. 1, pp. 3-8 (1999).
[2] Bright, M.W., Hurson, A.R., and Pakzad, S.H., A Taxonomy and Current Issues in Multidatabase Systems, IEEE Computer, Vol. 25, No. 3, pp. 50-60 (1992).
[3] Grahne, G. and Mendelzon, A.O., Tableau Techniques for Querying Information Sources through Global Schemas, in Proceedings of the 7th International Conference on Database Theory, pp. 332-347 (1999).
[4] Josifovski, V. and Risch, T., Integrating Heterogeneous Overlapping Databases through Object-Oriented Transformations, in Proceedings of the 25th International Conference on Very Large Data Bases, pp. 435-446 (1999).
[5] Pitoura, E., Bukhres, O., and Elmagarmid, A., Object Orientation in Multidatabase Systems, ACM Computing Surveys, Vol. 27, No. 2, pp. 141-195 (1995).
[6] Schmitt, I. and Turker, C., An Incremental Approach to Schema Integration by Refining Extensional Relationships, in Proceedings of the International Conference on Information and Knowledge Management, pp. 322-330 (1998).
[7] Tomasic, A., Raschid, L., and Valduriez, P., Scaling Access to Heterogeneous Data Sources with DISCO, IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 5 (1998).
[8] Yang, J. and Papazoglou, M.P., A Configurable Approach for Object Sharing among Multidatabase Systems, in Proceedings of the International Conference on Information and Knowledge Management, pp. 129-136 (1995).

Manuscript Received: Feb. 20, 2001; Accepted: Mar. 20, 2001
