ELENA MONTIEL-PONSODA
Ontology Engineering Group, Universidad Politécnica de Madrid
Madrid, Spain
lupe@fi.upm.es
emontiel@delicias.dia.fi.upm.es
Ontological Relations
Ontological relations, as they are understood in Ontology Engineering, can be divided into three main groups:
taxonomic, meronymic and non-taxonomic relations. Taxonomic relations are hierarchical or inclusion relations,
i.e., those that allow subordinate concepts to inherit the properties of the superordinate concept they belong to.
They are also known as hyponymy or subclass of relations. Meronymic relations are the ones that hold between
an object and its parts, also known as part-whole relations. The remaining ontological relations (ad-hoc relations of
a specific domain) are considered non-hierarchical relations, for instance, function or cause-result relations.
1 http://www.neon-project.org/web-content/
In this paper, we approach one of the most significant relations in both Terminology and Ontology
Engineering, the subclass of relation. This is a fundamental relation whenever we want to organize the
knowledge of a domain and, in fact, it is the basic relation in the science of classification, i.e., taxonomy. Any
concept related to another by the hyperonymy-hyponymy relation is said to inherit the properties of the
hyperonym and to have additional ones that specify it. This is also the basic idea behind the Aristotelian definition. Let
us take as an example the concepts sensor and thermometer. We can say that a thermometer inherits the
properties of the sensor in that it is a device that measures a physical quantity and converts it to a signal, and
is further specified by the fact that it measures temperature.
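The sensor and thermometer example can be sketched as a small class hierarchy. This is only an illustration of property inheritance and specification; the class names, the attribute measured_quantity and the method describe are hypothetical and are not part of the system described in this paper.

```python
class Sensor:
    """Superclass: a device that measures a physical quantity and converts it to a signal."""

    # Hypothetical attribute naming the quantity being measured.
    measured_quantity = "a physical quantity"

    def describe(self) -> str:
        # Property shared by every subclass of Sensor.
        return f"a device that measures {self.measured_quantity} and converts it to a signal"


class Thermometer(Sensor):
    """Subclass: inherits the Sensor description and specifies the quantity."""

    measured_quantity = "temperature"


print(Thermometer().describe())
```

The subclass adds nothing but the specifying property; everything else is inherited from the superclass, which mirrors the genus-and-differentia structure of the definition.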
Lexico-Syntactic Patterns for the Subclass of relation
In most ontology paradigms, the ontological relation subclass of indicates that a class is a specialization of
another class (Gómez-Pérez et al., 2003: 49). In natural languages, this relation is present in definitions and
realized linguistically, for instance, by means of the combination "is a(n)", as in "A thermometer is a sensor
that (...)". In this sentence, the hyponym or subclass is the concept on the left-hand side of the verb, and the
hypernym or superclass is on the right-hand side. By identifying the most used linguistic constructs that a
language has for representing this kind of relation and including them in the LSP repository, we will allow users to
express the subclass of relations present in their knowledge domain in full natural language (NL), and
automatically model it in an ontology with the system previously outlined in FIGURE 1.
If we have a look at the following sentences in English, we will see that they establish a subclass of relation
among the concepts in the sentence.
1. An orphan drug is a type of drug.
2. Membrane proteins are classified into two major categories: integral proteins and peripheral proteins.
We observed that the lexico-syntactic structures presented above appear recurrently across domains, and we
decided to formalize them as LSPs. In TABLE 1 we include the LSPs representing these structures, and some
additional ones we have identified for the English language. Some of the symbols and abbreviations used in the
formalization of LSPs have been included below for the sake of readability.
TABLE 1. LSP-Subclass of-English
Columns: LSP Identifier | Formalization | Examples
Examples:
Membrane proteins are classified into two major categories, integral proteins and peripheral proteins.
Sensors are divided into two groups: contact and non-contact sensors.
There are two types of narcotic analgesics: the opiates and the opioids.
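As a rough illustration of how such structures could be recognised automatically, the following sketch matches the example sentences above with regular expressions. The three regexes and the function names are assumptions made for this example; they approximate the surface form of the sentences and are not the LSP formalizations of TABLE 1.

```python
import re

# Assumed surface patterns (illustrative only, not the paper's LSP formalizations).
IS_TYPE_OF = re.compile(
    r"^(?:an?\s+|the\s+)?(?P<sub>.+?)\s+is\s+an?\s+(?:type|kind)\s+of\s+(?P<sup>.+?)\.?$",
    re.IGNORECASE,
)
CLASSIFIED_INTO = re.compile(
    r"^(?P<sup>.+?)\s+are\s+(?:classified|divided|split|separated)\s+into\s+"
    r".+?[:,]\s+(?P<subs>.+?)\.?$",
    re.IGNORECASE,
)
TYPES_OF = re.compile(
    r"^there\s+are\s+\w+\s+(?:types|kinds)\s+of\s+(?P<sup>.+?):\s+(?P<subs>.+?)\.?$",
    re.IGNORECASE,
)


def split_enumeration(text):
    """Split 'A, B and C' into items and drop a leading article from each item."""
    items = re.split(r"\s*,\s*(?:and\s+)?|\s+and\s+", text)
    return [
        re.sub(r"^(?:the|an?)\s+", "", item.strip(), flags=re.IGNORECASE)
        for item in items
        if item.strip()
    ]


def extract_subclass_of(sentence):
    """Return the (subclass, superclass) pairs found in one sentence, or []."""
    s = sentence.strip()
    m = IS_TYPE_OF.match(s)
    if m:
        return [(m.group("sub"), m.group("sup"))]
    for pattern in (TYPES_OF, CLASSIFIED_INTO):
        m = pattern.match(s)
        if m:
            return [(sub, m.group("sup")) for sub in split_enumeration(m.group("subs"))]
    return []


print(extract_subclass_of("An orphan drug is a type of drug."))
```

Note that the sketch does not resolve elliptical heads: for "contact and non-contact sensors" it returns "contact" rather than "contact sensors", so a real system would need additional linguistic processing.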
As mentioned, the linguistic structures presented above correspond to the ontological relation subclass
of. However, from an ontological perspective it is advisable to further specify this basic hierarchical relation
by adding knowledge about the disjointness and exhaustiveness of the hyponyms with regard to the hypernym.
Disjointness is generally understood in ontological modelling as the property of two classes that share no
subclasses or individuals. If we analyse sentence 2, "Membrane proteins are classified into two major
categories: integral proteins and peripheral proteins", we should further determine whether integral proteins and
peripheral proteins are two completely different groups that do not share any individuals; in other words, whether
or not a protein belonging to the integral proteins class can also belong to the peripheral proteins class.
Exhaustiveness is the property of a set of subclasses of a superclass that together include all
individuals belonging to that superclass, without exception. Considering sentence 2 again, we
should also specify whether these two types of membrane proteins are the only groups into which
membrane proteins can be divided, or whether there is an additional type of membrane protein.
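The two properties can be stated operationally if each class is viewed as the set of its individuals. The sketch below is a simplified, closed-world illustration under that assumption; the function names and the individual identifiers (p1, p2, ...) are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def pairwise_disjoint(subclasses: Dict[str, Set[str]]) -> bool:
    """True if no two subclasses share an individual."""
    return all(a.isdisjoint(b) for a, b in combinations(subclasses.values(), 2))


def exhaustive(superclass: Set[str], subclasses: Dict[str, Set[str]]) -> bool:
    """True if every individual of the superclass belongs to at least one subclass."""
    covered = set().union(*subclasses.values())
    return superclass <= covered


# Hypothetical individuals, used only to exercise the two checks.
membrane_proteins = {"p1", "p2", "p3"}
subclasses = {
    "integral proteins": {"p1", "p2"},
    "peripheral proteins": {"p3"},
}
print(pairwise_disjoint(subclasses), exhaustive(membrane_proteins, subclasses))
```

In an ontology these properties are declared as axioms rather than computed from listed individuals, which is why they have to be stated explicitly by the modeller.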
The reason for this further specification of the "subclass of" relation has to do with ontology consistency. Since
applications based on ontologies are able to reason with the information the ontologies contain,
it is advisable to enrich the subclass of relations established in the ontology with information about disjointness
and exhaustiveness, so as to support consistency checking and automatic evaluation of the elements contained in the
ontology. In other words, that sort of information helps to check whether the ontology has been correctly
instantiated. However, users who are not experts in ontology engineering are often unaware that, if
they do not explicitly declare this kind of knowledge (see footnote 2), some inconsistencies can result when reasoning with the
ontology. For example, if subclasses are not declared to be disjoint, they may be considered to overlap, i.e., to
share individuals. Both specifications of the subclass of relation may be implicit in the sentence, but they have to
be made explicit in ontologies. In this regard, there have been some initiatives to automatically extract
knowledge about disjointness from the natural language assertions expressed by users modelling ontologies.
Among other features for automatically discovering disjoint subclasses, Völker et al. (2007) have intuitively
identified two linguistic patterns, in line with Hearst's (1992), for the English language:
a) A pattern that contains the conjunction either...or / neither...nor. This conjunction indicates that the
introduced concepts do not allow sharing instances, as they belong to only one of two groups.
b) A pattern based on enumerations. This pattern assumes that whenever users enumerate a set of concepts,
these are pairwise disjoint.
The function words either and neither undoubtedly refer to disjoint classes. It can therefore be
stated that whenever the system comes across that conjunction, it can straightforwardly model the knowledge units as
a subclass of relation and as disjoint classes in the ontology. For this purpose, we have created new patterns that
directly identify both relations. See TABLE 2 below.
TABLE 2. LSP-Subclass of + Disjoint classes-English
Columns: LSP Identifier | Formalization
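A minimal sketch of how an either...or sentence could be turned into both relations at once is given below. The regular expression, the function name, the tuple-based axiom representation and the test sentence are all assumptions made for illustration; they do not reproduce the formalizations of TABLE 2.

```python
import re

# Assumed surface pattern for "X is/are either A or B" (illustrative only).
EITHER_OR = re.compile(
    r"^(?P<sup>.+?)\s+(?:is|are)\s+either\s+(?P<a>.+?)\s+or\s+(?P<b>.+?)\.?$",
    re.IGNORECASE,
)


def model_either_or(sentence):
    """Return subclass-of and disjointness axioms as plain tuples, or []."""
    m = EITHER_OR.match(sentence.strip())
    if m is None:
        return []
    sup, a, b = m.group("sup"), m.group("a"), m.group("b")
    return [
        ("SubClassOf", a, sup),
        ("SubClassOf", b, sup),
        ("DisjointClasses", a, b),
    ]


# Hypothetical input sentence, not taken from the corpus analysis.
print(model_either_or("Sensors are either contact sensors or non-contact sensors."))
```

Because the conjunction itself signals disjointness, no user interaction is needed for this pattern: a single match yields all three axioms.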
According to the enumeration assumption, we could contend that the identified LSPs 3, 4 and 5 for the subclass
of relation (see TABLE 1) could also be regarded as patterns indicating additionally that the subclasses are pairwise
disjoint. In order to validate this statement, we analysed the use of some verbs of classification in the British
National Corpus (BNC; see footnote 3). The verb forms searched in the BNC were: classified into, divided into, split into and
separated into. Although this research is ongoing, at this stage we observed that, out of the sentences
retrieved with the meaning of "subclass of" (see footnote 4; 50 sentences analysed for each verb form), between 80% and 90%
clearly expressed disjoint subclasses according to domain knowledge. This observation was further supported by
the fact that sentences often included a cardinal number as well as adjectives like distinct or separate
accompanying class names, e.g.:
Covalent crystals are classified into two distinct groups
Government institutions are divided into two separate scientific communities
The results of this initial analysis support the assumption that, when users provide an enumeration of subclasses, these are
normally pairwise disjoint. Nevertheless, in order to cope with the remaining 10% to 20% of cases in which
information about disjointness was not so clear, a mechanism for interacting with the user has been planned. This will
make users aware of the relevance of making that sort of information explicit when modelling
ontologies.
Thus, our system formulates some questions in order to find out whether subclasses are disjoint. Taking
sentence 2 as an example, "Membrane proteins are classified into two major categories: integral proteins and peripheral
proteins", the question posed by the system would be:
Can a certain membrane protein belong to the category of integral proteins and peripheral proteins at the
same time?
In this case, the answer should be no, and the system would further model those subclasses as disjoint classes.
If the answer is yes, it would just model them as subclasses of membrane proteins.
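The question-and-answer step just described can be sketched as follows. The function names, the tuple-based axiom representation and the idea of passing the singular form of the superclass as a separate argument are assumptions of this example, not a description of the actual system.

```python
from itertools import combinations


def disjointness_question(member, sub_a, sub_b):
    """Build the question for one pair of subclasses; 'member' is the singular superclass name."""
    return (
        f"Can a certain {member} belong to the category of "
        f"{sub_a} and {sub_b} at the same time?"
    )


def model_subclasses(superclass, subclasses, can_overlap):
    """Always model subclass-of; add pairwise disjointness only if the user answered no."""
    axioms = [("SubClassOf", sub, superclass) for sub in subclasses]
    if not can_overlap:
        axioms += [("DisjointClasses", a, b) for a, b in combinations(subclasses, 2)]
    return axioms


subs = ["integral proteins", "peripheral proteins"]
print(disjointness_question("membrane protein", *subs))
print(model_subclasses("membrane proteins", subs, can_overlap=False))
```

For simplicity, a single answer is applied to every pair of subclasses; with more than two subclasses, one question per pair might be preferable.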
2 Making information about disjointness and exhaustiveness explicit is necessary in ontology paradigms following
Description Logics, because they rely on an open world assumption.
3 http://www.natcorp.ox.ac.uk/
4 Note that some of these verbs (e.g. divide in/into) are ambiguous and could also indicate a part-whole relation. For more
information, see (Montiel-Ponsoda, et al., 2008, and Aguado de Cea et al., 2008).
Regarding exhaustiveness, whenever there is a cardinal number, as in example 2, we can be nearly sure that the
enumeration is complete. However, if this cardinal is not present in the sentence, one option to find out whether
the listed classes are exhaustive is for the user to answer the following question posed by the system:
Are there any other types of membrane proteins?
According to our knowledge of the domain, the answer should be no; only then can we be sure that the
right modelling decision is to further model those classes as exhaustive classes.
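The decision of when to pose the exhaustiveness question can be sketched as below. The small cardinal lexicon, the function names and the extra check that the stated cardinal equals the number of listed subclasses are assumptions added for this example; the second test sentence is hypothetical.

```python
import re

# Assumed, deliberately small lexicon of cardinal number words.
CARDINALS = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def stated_cardinal(sentence):
    """Return the first cardinal found in the sentence (word or digits), or None."""
    for token in re.findall(r"[a-z]+|\d+", sentence.lower()):
        if token in CARDINALS:
            return CARDINALS[token]
        if token.isdigit():
            return int(token)
    return None


def needs_exhaustiveness_question(sentence, subclasses):
    """Ask the user unless a cardinal is present and agrees with the enumeration."""
    n = stated_cardinal(sentence)
    return n is None or n != len(subclasses)


def exhaustiveness_question(superclass):
    return f"Are there any other types of {superclass}?"


sentence = ("Membrane proteins are classified into two major categories: "
            "integral proteins and peripheral proteins.")
subs = ["integral proteins", "peripheral proteins"]
if needs_exhaustiveness_question(sentence, subs):
    print(exhaustiveness_question("membrane proteins"))
```

Since a cardinal only makes completeness nearly certain, a cautious system might still pose the question for confirmation even when the check above returns False.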
Conclusions
In this paper we have tried to show how grammatical collocations can provide interesting hints about the
conceptual relations underlying them. In fact, these combinations of lexical items have been used in fields like
Terminology and Ontology Engineering with the purpose of automatically extracting data to accelerate
terminology tasks and ontology development, respectively. However, with this new approach to what we have
called Lexico-Syntactic Patterns (LSPs), we aim at applying lexical combinations to help users who are not
experts in ontology engineering to develop ontologies by formulating in NL what they want to model. In order to
support the automatic identification of LSPs corresponding to ontological relations, we have
developed a repository of LSPs associated with ontological relations, which makes up the core of the system. In this
paper, we have focused on those LSPs expressing the "subclass of" ontological relation, and have pointed out the
semantic differences between the lexical combinations expressing this relation and the equivalent ontological
relation, since the latter needs to be further specified with information about disjointness and exhaustiveness. As
explained in the paper, some lexical elements may help to directly identify these specifications. Nevertheless,
with the aim of validating these properties of the subclass of relation and making them explicit, user interaction with
the system has been devised. Future work will centre on enriching this initial repository of LSPs, putting
special emphasis on the discovery of disparities between lexical and ontological relations.
Acknowledgements. This research has been supported by the European project NeOn (FP6-027595), and the
National project GeoBuddies (TSI2007-65677C02).
References
Aguado de Cea, G., Gómez-Pérez, A., Montiel-Ponsoda, E., Suárez-Figueroa, M.C. (2008). Natural Language-based Approach for Helping in the Reuse of Ontology Design Patterns. To appear in EKAW 2008, Catania, Italy.
Aguado de Cea, G. (2007). A Multiperspective Approach to Specialized Phraseology: Internet as a Reference
Corpus for Phraseology. In S. Posteguillo, M.J. Esteve and M.L. Gea-Valor (eds.). The Texture of Internet:
Netlinguistics in progress. Newcastle: Cambridge Scholars Publishing.
Cimiano, P. and Wenderoth, J. (2007). Automatic Acquisition of Ranked Qualia Structures from the Web. In
Proc. of the Annual Meeting of the Association for Computational Linguistics, 888--895.
Condamines, A. (2002). Corpus analysis and conceptual relation patterns. Terminology. Vol. 8. (1), 141-162.
Feliu, J. and M.T. Cabré. (2002). Conceptual relations in specialized texts: new typology and an extraction
system proposal. In Proc. of TKE 2002. Nancy, 45-49.
Feliu, J. (2004). Relacions conceptuals i terminologia: anàlisi i proposta de detecció semiautomàtica. PhD Thesis.
Institut Universitari de Lingüística Aplicada.
Gómez-Pérez, A., Fernández-López, M., Corcho, O. (2003). Ontological Engineering. Springer, New York.
Hearst, M. A. (1992). Automatic Acquisition of Hyponyms from Large Text Corpora.
In Proc. of the 14th International Conference on Computational Linguistics, 539-545.
Meyer, I. (2001). Extracting knowledge-rich contexts for terminography. A conceptual and methodological
framework. In C. Bourigault, (ed.), Recent Advances in Computational Terminology, 279-303. Benjamins.
Montiel-Ponsoda, E., Aguado de Cea, G., Gómez-Pérez, A., Suárez-Figueroa, M.C. (2008). Helping Naive Users
to Reuse Ontology Design Patterns. In Proc. of the KRRSW, co-located at the ESWC2008, in Tenerife, Spain.
Snow, R., Jurafsky, D., Ng, A. Y. (2004). Learning syntactic patterns for automatic hypernym discovery. In
Advances in Neural Information Processing Systems 17.
Studer, R., Benjamins, R., Fensel, D. (1998). Knowledge engineering: principles and methods. In Data &
Knowledge Engineering 25 (1-2), 161-198.
Völker, J., Vrandečić, D., Sure, Y., Hotho, A. (2007). Learning Disjointness. In Enrico Franconi, Michael Kifer
and Wolfgang May (eds.), Proc. of the ESWC2007, Springer-Verlag.