
Applied Intelligence 12, 251–268 (2000)

© 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Incremental Iterative Retrieval and Browsing for Efficient Conversational CBR Systems

IGOR JURISICA
University of Toronto, Faculty of Information Studies, 140 St. George St., Toronto, ON M5S 3G6, Canada
jurisica@fis.utoronto.ca

JANICE GLASGOW
Queen’s University, Department of Computing and Information Science, Kingston, ON K7L 3N6, Canada
janice@qucis.queensu.ca

JOHN MYLOPOULOS
University of Toronto, Department of Computer Science, 6 King’s College Rd., Toronto, ON M5S 3H5, Canada
jm@ai.utoronto.ca

Abstract. A case base is a repository of past experiences that can be used for problem solving. Given a new
problem, expressed in the form of a query, the case base is browsed in search of “similar” or “relevant” cases.
Conversational case-based reasoning (CBR) systems generally support user interaction during case retrieval and
adaptation. Here we focus on case retrieval where users initiate problem solving by entering a partial problem
description. During an interactive CBR session, a user may submit additional queries to provide a “focus of
attention”. These queries may be obtained by relaxing or restricting the constraints specified for a prior query. Thus,
case retrieval involves the iterative evaluation of a series of queries against the case base, where each query in the
series is obtained by restricting or relaxing the preceding query.
This paper considers alternative approaches for implementing iterative browsing in conversational CBR systems.
First, we discuss a naive algorithm, which evaluates each query independent of earlier evaluations. Second, we
introduce an incremental algorithm, which reuses the results of past query evaluations to minimize the computation
required for subsequent queries. In particular, the paper proposes an efficient algorithm for case base browsing and
retrieval using database techniques for incremental view maintenance. In addition, the paper evaluates the scalability
of the proposed algorithm using a performance model. The model is created using algorithmic complexity analysis and
experimental evaluation of system performance.

Keywords: knowledge base technology, case-based reasoning, performance evaluation, context-based iterative
browsing and retrieval

1. Introduction

There are many applications where one could effectively obtain information and support collaboration using iterative browsing [1–3]. It is particularly useful for exploratory search in complex domains and for problem-solving tasks that involve considering alternative situations (e.g., examining "what-if" scenarios). Such applications include knowledge mining, on-line analytical processing, web databases [4], conversational case-based reasoning [5], and exploratory retrieval and browsing [6]. Although the proposed approach to incremental retrieval has wider applicability, here we focus on its application to achieve efficient retrieval for case-based reasoning (CBR) systems.

The research presented in this paper has three primary objectives: (i) to define an iterative browsing system for conversational case retrieval; (ii) to adapt an incremental view maintenance algorithm used in database systems for use in CBR for efficient iterative retrieval and browsing; and (iii) to provide a theoretical performance model for the proposed incremental case retrieval system. Results of the performance evaluation suggest that a conversational CBR system with an incremental retrieval method would scale up for large and complex case bases. However, in some well-defined and relatively small domains, traditional CBR algorithms currently provide an adequate solution for case retrieval.

1.1. Case-Based Reasoning

Case-based reasoning (CBR) is a process of remembering and reusing experience in the form of cases stored in a case base.¹ It relies on the concept that similar problems have similar solutions. Informally, a case is a problem-solution-feedback triplet. Given a partially developed case—a new problem expressed in the form of a query—the case base is browsed in search of "similar" or "relevant" cases. These cases are then adapted (either by an expert or by the system) to produce a solution for the input problem. The final step of CBR involves evaluating the solution and producing feedback information.

The decision-making process finds similarities between a problem at hand and cases in the case base, and reuses stored solutions to solve new problems. The more relevant a case is to a problem, the less adaptation is needed and the more precise the solution. Thus, the quality of reasoning depends on the quality and quantity of cases in a case base. With an increased number of unique cases, the problem-solving capabilities of CBR systems improve while their efficiency may decrease.

1.2. Conversational CBR

Conversational CBR systems generally support user interaction during case retrieval and adaptation. In this paper we propose a retrieval algorithm for efficient and effective iterative browsing for CBR systems. During an interactive CBR session, a user may submit several queries to provide a "focus of attention" (note that we assume that the queries are related to a given problem-solving session). These queries may be obtained by modifying a partial description of a case, e.g., relaxing or restricting the constraints specified for a prior query.² Thus, case retrieval involves the iterative evaluation of a series of queries against the case base, where each query in the series is obtained by restricting or relaxing the preceding query.

In order to support iterative browsing, a system must be able to respond to a series of queries, where the system or the user may alter a current query and re-submit it for further evaluation. This process may be repeated until the desired quality and quantity of cases is obtained. The iterative process is aimed at maintaining high recall while improving precision of the case retrieval. It is essential that the process is efficient, so that the overall system is scalable.

To be effective, a conversational CBR system should limit the number of questions asked and control their relevancy [7]. Questions are selected based on the number and quality of retrieved cases. To increase the relevancy of cases, the user may need to specify criteria for similarity assessment. We have previously demonstrated that variable-context similarity-based retrieval can improve the competence of a CBR system [8, 9]. Since the criteria are likely to change during the retrieval, the process of altering them is iterative. Here we hypothesize that incremental retrieval can improve the efficiency of CBR systems.

1.3. Incremental Iterative Browsing

To illustrate our approach to conversational CBR, imagine a partial representation of a medical case base for an in vitro fertilization (IVF) domain [9]. Individual patients are identified by age, IVF cycle, protocol used, and diagnosis of infertility. During case-based reasoning, it may be desired to modify the relevance of individual attributes during similarity assessment. In addition, constraints on attribute values may also be modified. Since these changes affect what cases are retrieved from a case base, we call this context modification.³ Context can be either relaxed or restricted. Intuitively, relaxation results in retrieving more cases and restriction results in retrieving fewer cases. Since relaxation and restriction can be intertwined, context modification is a process analogous to iterative browsing.

As an example, consider the relaxation of the context specified in Fig. 1.⁴ Individual attributes are grouped into three categories of the same priority. During retrieval, cardinality criteria are used to change a requirement that all attributes in a given category must match to a relaxed requirement that only 2 attributes must match. We call this relaxation reduction, since we reduce the number of attributes in a category required to match. We can also relax requirements on attribute values.

Figure 1. Context relaxation by reducing the number of attributes required to match and by enlarging the set of allowable values for an attribute.
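To make the running example concrete, the sketch below encodes a Fig. 1-style context in Python. It is a minimal illustration rather than TA3's representation, and the attribute names and values are assumptions reconstructed from the surrounding text, since the figure itself is not reproduced here.

```python
# Three categories of equal priority; each pairs a cardinality criterion
# with value constraints on its attributes (names below are hypothetical).
context = {
    "category 1": {
        "cardinality": "All",             # its single attribute must match
        "constraints": {"AGE": {28}},     # generalization relaxes this to 24-29
    },
    "category 2": {
        "cardinality": ("Some", 2),       # reduction relaxes this to ("Some", 1)
        "constraints": {"IVF_CYCLE": {1},
                        "PROTOCOL": {"protocol A"},
                        "SPERM_COUNT": {"normal"}},
    },
    "category 3": {
        "cardinality": ("Some", 2),       # reduction relaxes this to ("Some", 1)
        "constraints": {"DIAGNOSIS": {"tubal"},  # generalized to {"tubal", "endo"}
                        "PREVIOUS_PREGNANCY": {"no"}},
    },
}
```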

This paper considers alternative approaches for implementing iterative browsing in conversational CBR systems. First, we discuss a naive algorithm, which evaluates each query independent of earlier evaluations. Thus, after the query is modified, the system searches the case base again. Second, we introduce an incremental algorithm, which reuses the results of past query evaluations to minimize the computation required for subsequent queries. Incremental algorithms have previously been successfully applied to database systems for view maintenance [10–13]. In particular, the paper proposes an efficient algorithm for case base browsing and retrieval using database techniques for incremental view maintenance. Although this may require additional storage, overall performance is usually improved. Retrieval is based on a nearest-neighbor matching algorithm [14], which was modified as follows: (i) attributes are grouped into categories of different importance to help control the matching process and diminish the negative effect of irrelevant attributes on performance; (ii) an explicit context is used during similarity assessment to ensure that only relevant cases are retrieved; and (iii) incremental context transformations are applied during query relaxation to speed up query processing.

1.4. Outline

Next we describe TA3, a case-based decision-support system with incremental retrieval.⁵ Section 3 presents the theory and implementation of context modification. An incremental approach to context relaxation and restriction is described in Section 4. Section 5 introduces a performance model and presents a performance evaluation of TA3. Section 6 discusses related research, our experience, and future directions.

2. System Description

TA3 comprises all modules of a CBR system: case representation, case retrieval, evaluation, adaptation and presentation. In addition, it supports case base organization and knowledge mining. TA3 uses a flexible case representation and an effective, efficient incremental retrieval algorithm. Figure 2 depicts the overall architecture, data and control flow.

Figure 2. TA3 architecture.

A case, C, is a representation of a real-world experience in a particular representation scheme. A case will be represented as a finite set of attribute-value pairs (descriptors):⁶

C = {⟨a₀ · V₀⟩, ⟨a₁ · V₁⟩, . . . , ⟨aₙ · Vₙ⟩}.

Individual attributes and values are grouped into one or more categories, which introduces additional structure to a case representation. Category membership is defined either using a knowledge-discovery algorithm [9, 15] or using domain knowledge (if available). Different constraints may be ascribed to different categories, i.e., individual groups of attributes. This reduces the impact of irrelevant attributes on system competence by selectively using individual categories during matching [16, 17].

Context plays an important role during case retrieval. Our primary goal is to retrieve only useful cases, i.e., cases highly relevant to a problem represented as a partially developed target case (a query). Since the usefulness of individual attributes during matching varies, we propose to incorporate the notion of a context to define what attributes are relevant in a given situation.

2.1. A Context

A context explicitly defines the attributes to be used during similarity assessment and any constraints that may be applicable to the attribute values:

Context = {⟨a₀ · CV₀⟩, . . . , ⟨aₖ · CVₖ⟩},

where aᵢ is an attribute name and constraint CVᵢ specifies the set of "allowable" values for attribute aᵢ. A context specifies relevant attributes and how "close" an attribute value must be in order to satisfy the context.
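As a minimal illustration of these definitions (assuming a flat Python-dictionary encoding; this is not TA3's internal data model), a case and a context might be written as:

```python
# A case: a finite set of attribute-value descriptors <a_i, V_i>.
case = {"AGE": 28, "IVF_CYCLE": 1, "DIAGNOSIS": "tubal"}

# A context: each relevant attribute a_i mapped to its set of allowable
# values CV_i (a set constraint here; an interval or a class of instances
# would serve for interval or domain matching).
context = {
    "AGE": {26, 27, 28, 29, 30},
    "DIAGNOSIS": {"tubal", "endo"},
}
```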

A context is formed using the most relevant attributes for a given target case; similarity is determined as a closeness of values for the prescribed attributes, taking attribute value distribution and the task being solved into account. The attributes and constraints for a context can be specified by a user or derived automatically. In general, the context can be specified using the following scenarios: (i) Task-based retrieval: The user specifies the task to be solved and the system selects an appropriate context. (ii) Query-by-example: The user selects a source case from a case base to be used as a problem example. Case attributes and values are then used as attributes and constraints in a context. (iii) The user has enough domain knowledge⁷ to specify the context directly. (iv) Retrieval-by-reformulation: The user submits an initial unrefined query, reviews the resulting solution, and then iteratively modifies and resubmits the query. This approach is best suited for repository browsing.

If the interpretation of the case in a case base (a source case) satisfies the given constraints for the specified attributes, then it is considered relevant for a given context. A case C satisfies a context Context, denoted sat(C, Context), if and only if for all pairs ⟨aᵢ · CVᵢ⟩ ∈ Context, there exists a pair ⟨aᵢ · Vᵢ⟩ ∈ C such that Vᵢ is in CVᵢ:

sat(C, Context) iff ∀aᵢ: ⟨aᵢ · CVᵢ⟩ ∈ Context → ∃Vᵢ: ⟨aᵢ · Vᵢ⟩ ∈ C ∧ Vᵢ ∈ CVᵢ.

All cases relevant to a given context are considered similar for the context.⁸ The process of retrieving relevant cases can then be described as a constraint satisfaction process [18].

2.2. Cardinality and Value Constraints

Two types of constraints can be imposed on attributes and their values. Cardinality constraints specify the number of attributes required to match for a particular category. This is an extension of an x-of-n matching algorithm [17], since different categories may have different matching requirements, i.e., different x. Individual categories are ordered according to the importance of their attributes. If no priority is assigned to a category, they are accessed sequentially. If the case has only one category, the matching is equivalent to x-of-n matching. However, if more categories are defined, then important attributes may require n-of-n matching, less important attributes x-of-n matching (for x < n), and irrelevant attributes may be eliminated from consideration altogether.

Value constraints specify constraints on attribute values. They include: (i) instance matching—the attribute value in the source case must match the attribute value in the target case; (ii) set matching—the attribute value in the source case must be included in the set of allowed values specified in the context; (iii) interval matching—the attribute value in the source case must be within an interval specified by the context; and (iv) domain matching—the attribute value in the source case must be an instance of the constraint in the context.

Variable-context similarity-based retrieval is monotonic [8]. For the purpose of case retrieval, the similarity relation maps a Context and a CaseBase onto the set of cases Answer in the CaseBase that satisfy the Context. Given a CaseBase and a Context, the retrieval function returns a non-empty set of relevant cases Answer ⊆ CaseBase such that all cases in Answer satisfy the given Context. The retrieval function is complete in the sense that it returns all relevant cases and only relevant cases.
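A sketch of these definitions in the same hypothetical encoding (again illustrative, not TA3's code): the satisfaction test, the complete retrieval function, and the x-of-n cardinality check of Section 2.2.

```python
def satisfies(case: dict, context: dict) -> bool:
    """sat(C, Context): every pair <a_i, CV_i> in the context has a matching
    descriptor <a_i, V_i> in the case with V_i in CV_i."""
    return all(attr in case and case[attr] in allowed
               for attr, allowed in context.items())

def retrieve(case_base: list, context: dict) -> list:
    """Complete retrieval: all, and only, the cases satisfying the context."""
    return [case for case in case_base if satisfies(case, context)]

def category_matches(case: dict, constraints: dict, x: int) -> bool:
    """x-of-n cardinality constraint: at least x of a category's n attribute
    constraints must be satisfied (n-of-n and 1-of-n are special cases)."""
    matched = sum(1 for attr, allowed in constraints.items()
                  if attr in case and case[attr] in allowed)
    return matched >= x
```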

3. Context Relaxation and Restriction

An explicitly defined context controls the closeness of retrieved cases. If too many or too few relevant cases are retrieved using the initial context, then the system automatically transforms the context or the user manually modifies it. The context transformation process controls the quality and quantity of retrieved cases, and thus when transforming the context, the system may return an approximate answer quickly or may spend more resources to calculate a more accurate answer [19]. An approximate answer can be iteratively improved, so that the change between an approximate and an accurate answer is continuous. This is an important feature for bounded-resource computation [20, 21].

We propose two context transformations as a foundation for supporting iterative retrieval and browsing: relaxation—to retrieve more cases, and restriction—to retrieve fewer cases.

3.1. Context Relaxation

A context Context₁ is a relaxation of a context Context₂, denoted Context₁ ≻ Context₂, if and only if the set of attributes for Context₁ is a subset of the set of attributes for Context₂ and, for all attributes in Context₁, the set of constraints in Context₂ is a subset of the constraints in Context₁. As well, contexts Context₁ and Context₂ are not equal.

We implement context relaxation as reduction and generalization.⁹ Reduction removes an attribute-value pair from a context, either permanently or dynamically—given x-of-n matching, the number of attributes required to match is reduced from x to y, where 0 < y < x ≤ n. Generalization relaxes the context by enlarging the set of allowable values for an attribute.

The relaxation technique can advantageously be used for returning answers to a specific query as well as returning related answers [22]. Without automatic query relaxation, users would need to submit alternative queries. The restriction technique works analogously, but is used mainly for controlling the amount of returned information, preventing information overload. Since the search for relaxed or restricted contexts could be infinite, there must be a mechanism for controlling it, either by user intervention (via user preferences) or by other means (e.g., limiting the number of consecutive context modifications).

As an example, consider the relaxation of the context specified in Fig. 1. This context was created from a partially developed problem case. In order to increase versatility, we grouped attributes into three categories. Individual attributes identify age, IVF cycle, protocol used, and diagnosis of infertility. Assume that all categories have the same priority and value constraints, but that they require a different number of attributes to match, i.e., different cardinality criteria are imposed. It is not possible to apply reduction to category 1, since there is only one attribute. Categories 2 and 3 can be relaxed using reduction. Some(2), in the cardinality criteria for category 3, implies that at least two of the attribute-value pairs need to match for the whole category to match. This constraint could be further relaxed to Some(1). Additionally, category 2 could be relaxed from Some(2) to Some(1), as illustrated in Fig. 1.

In the second scenario, value constraints are relaxed. Since all categories required instance matches, context values could be updated by using domain knowledge. Thus, the constraint for the AGE attribute could be changed to 24–29 (as shown in Fig. 1), as an immediate generalization of 28. In addition, a user-guided relaxation may be a better option, since the user might have domain knowledge not represented in the system. Thus, the attribute value could alternatively be relaxed from 28 to, for example, 26–30. The constraint on attribute 1 in category 3 is also relaxed from the constraint tubal to the constraint set {tubal, endo}.

3.2. Context Restriction

A context can be similarly iteratively restricted by making it progressively more specific, i.e., allowing fewer cases to satisfy it. As in the relaxation algorithm, the expansion and specialization algorithms produce a partial order of contexts. In general, only one category is restricted at a time, using categories with the lower priority first.

Context₁ is a restriction of Context₂, denoted Context₁ ≺ Context₂, if and only if Context₂ is a relaxation of Context₁: Context₁ ≺ Context₂ iff Context₂ ≻ Context₁. We implement context restriction as expansion and specialization. Expansion, an inverse operation to reduction, strengthens constraints by enlarging the number of attributes that are required to match. Specialization strengthens constraints by removing values from a constraint set for an attribute. This may lead to a decreased number of cases that satisfy the resulting context (see Fig. 3).

3.3. Naive Iterative Retrieval Algorithm

Conversational case-based reasoning and case base browsing require that the query (and the context) be iteratively changed, by restricting or relaxing it. The process of restricting and relaxing contexts can be repeated and interwoven until the agent is satisfied with the quantity and the relevance of the retrieved cases. It is apparent that after the context is modified by relaxation/restriction, the system must re-evaluate the query, i.e., search the case base for similar cases. A naive approach takes the new query and submits it to the system (see Fig. 4). A more sophisticated approach could take advantage of an already processed query by incrementally modifying its result [23]. Thus, the system will not search the whole case base again, but will only modify the set of retrieved relevant cases accordingly. In the next section, we introduce such an algorithm.
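The four transformations of Sections 3.1 and 3.2 can be phrased as small set operations on a context. The sketch below continues the hypothetical dictionary encoding used in the earlier examples; it is illustrative only, not TA3's implementation (permanent reduction is shown—dynamic reduction would instead lower a category's cardinality x).

```python
def generalize(context: dict, attr: str, new_values: set) -> dict:
    """Relaxation by generalization: enlarge attr's set of allowable values."""
    relaxed = dict(context)
    relaxed[attr] = context[attr] | new_values
    return relaxed

def reduce_context(context: dict, attr: str) -> dict:
    """Relaxation by permanent reduction: drop attr from the context."""
    return {a: v for a, v in context.items() if a != attr}

def specialize(context: dict, attr: str, removed: set) -> dict:
    """Restriction by specialization: remove values from attr's constraint."""
    restricted = dict(context)
    restricted[attr] = context[attr] - removed
    return restricted

def expand(context: dict, attr: str, values: set) -> dict:
    """Restriction by expansion: add one more attribute that must match."""
    restricted = dict(context)
    restricted[attr] = set(values)
    return restricted
```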

Figure 3. Context restriction by enlarging the number of attributes required to match and by reducing the set of allowable values for an attribute.

Figure 4. Naive iterative case retrieval algorithm. Context is initialized with the attributes and constraints from the input case. Special counters
are used to prevent repeating context restrictions and relaxations forever. Context transformations modify attributes of the least important
category first. Only one category is transformed at a time.
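A compact sketch of this naive loop, continuing the earlier Python examples (retrieve comes from the Section 2 sketch; relax and restrict are hypothetical policy callbacks that transform one category of the least important attributes, as the caption describes):

```python
def naive_iterative_retrieval(case_base, context, relax, restrict,
                              lower=1, upper=10, max_steps=10):
    """Fig. 4-style loop: after every context transformation, the modified
    context is re-evaluated against the entire case base."""
    answer = retrieve(case_base, context)          # full scan
    for _ in range(max_steps):                     # bounded counters, as in Fig. 4
        if lower <= len(answer) <= upper:
            break                                  # desired quantity reached
        context = relax(context) if len(answer) < lower else restrict(context)
        answer = retrieve(case_base, context)      # search everything again
    return answer
```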

4. Incremental Context Modifications

The basic idea of incremental query processing is to store query results and reuse them to compute related queries [10]. Incremental algorithms have previously been successfully applied to database systems for view maintenance [10–13]. Incremental view maintenance algorithms avoid a complete query re-computation by changing only relevant parts of the answer or view. Usually, an incremental approach is substantially more efficient than a naive one. The efficiency improvement is higher when several consecutive changes to the query are required (i.e., during iterative browsing), or when a large information base is used. Small updates to the query generally produce only small changes to the query result. Then an incremental approach performs only local changes to the query [13].

In general, an incremental view maintenance algorithm handles deletions, negations, updates, aggregation and recursion. Various approaches have been proposed to tackle these problems. One well-known incremental view maintenance algorithm is a counting algorithm, which supports delete and re-derive operations [24]. It assumes universal materialization of predicates and stores counts of the number of derivations to be associated with each tuple. Another algorithm derives production rules to maintain selected SQL views, namely views without duplicates, aggregation and negation [11].

Here we propose an incremental retrieval algorithm for CBR systems based on a nearest-neighbor matching algorithm [14] with the following modifications:

1. grouping attributes into categories of different importance to gain fine-grain control over the matching process, and diminish the negative effect of irrelevant attributes on competence;
2. using an explicit context during similarity assessment to recall only relevant cases; and
3. accelerating query processing via incremental context transformation during query relaxation.

After the context is modified by relaxation/restriction, the amount of necessary modification to a query in an incremental approach can be, to some degree, controlled by collecting and using extra information produced during query evaluation. Although this may require additional storage, the overall efficiency is usually improved. Thus, during an incremental computation the result of a query is reused to evaluate a subsequent query more efficiently.

If the number of attributes per case is significantly smaller than the total number of cases in the case base, then incremental context modification outperforms computation of the answer from scratch. Our system supports incremental context modification when partial results are kept at the attribute or category level. The first approach requires extra storage space, but is more versatile and is thus suitable when the context is changed frequently and when cases have fewer attributes. The second approach is a compromise between the efficiency gain from an incremental approach and a modest storage requirement. It is useful for less frequent context changes and for case bases containing cases with a large number of attributes. Next, we explain the rationale behind the first approach. (The second approach works analogously.)

Consider context relaxation with the naive and incremental retrieval algorithms (see Figs. 4 and 5). During k iterative context modifications the naive case retrieval algorithm requires k iterations, while the incremental algorithm handles it with only a single iteration. Although the initial evaluation is the same for both algorithms, the incremental approach stores partial matching results on the attribute level in the sets of cases Answerᵢ. If a case is a member of Answerᵢ, it means that it satisfies the constraint on attributeᵢ.

During an iterative retrieval of cases, the system modifies constraints on attributes. A naive retrieval algorithm (see Fig. 4) produces Answer′ by determining which cases in a case base satisfy the constraints on all attributes defined in the context, i.e., for all Answer.aᵢ. The incremental retrieval algorithm reuses Answer.aᵢ to produce the set of relevant cases Answer′ (see Fig. 5). Relaxing attribute aᵢ changes the set of matching cases included in Answer.aᵢ. However, all remaining partial answers remain unchanged. Thus, Answer′ can be constructed by adding to the Answer those cases that satisfy both the initial context and the relaxed constraint on attribute aᵢ. Restricting attribute aᵢ results in creating Answer′ by removing cases that do not satisfy the additional constraints from Answer. Determining satisfiability only requires testing whether a case in the set of retrieved cases must be removed, either because it needs an excluded value to match or because it cannot match an added attribute.

Figure 5. Incremental case retrieval algorithm. Initial Answer is set to ∅. A user specifies LowerLimit and UpperLimit to set the desired number of cases to be retrieved.
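A sketch of this bookkeeping (illustrative Python under the earlier flat-dictionary encoding, not the TA3 code): partial answers are materialized per attribute, the overall answer is their intersection, and a transformation recomputes only the one partial answer it affects.

```python
def evaluate(case_base, context):
    """Initial evaluation: one pass over the case base that also materializes,
    per attribute, the set of case indices satisfying that constraint."""
    partial = {a: {i for i, c in enumerate(case_base)
                   if a in c and c[a] in allowed}
               for a, allowed in context.items()}
    answer = set.intersection(*partial.values()) if partial else set()
    return answer, partial

def relax_attribute(case_base, context, partial, attr, new_values):
    """Incremental generalization of one attribute: only partial[attr] is
    recomputed; the remaining partial answers are reused unchanged."""
    context[attr] = context[attr] | new_values
    partial[attr] = {i for i, c in enumerate(case_base)
                     if attr in c and c[attr] in context[attr]}
    return set.intersection(*partial.values())
```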

The idea of incremental context relaxation and restriction has evolved from the notion of differential queries [25]. First, the parts of the context affected by the transformation are determined. Second, only the affected parts of the context are recomputed. We express this process using context addition and difference.¹⁰ Thus, the incremental context transformation algorithm can be formalized as Context′ = Context + Context⁺ − Context⁻, where Context⁺ and Context⁻ denote all attributes and their associated constraints that need to be added or removed.

Without loss of generality, it is assumed that Context⁺ and Context⁻ are contexts with a single attribute-constraint pair. Because the context transformation process is iterative, more complex constraints can be created through multiple iterations. Next we formalize the context transformations for our incremental retrieval algorithm (see Fig. 5) and show how the final set of retrieved cases (Answer′) can be constructed using partial results.

Reduction involves removing an attribute-value pair from a context. This can be done either permanently—an attribute aₖ is removed from the context: Context′ = Context − aₖ—or dynamically, when m-of-n matching is used. Thus, for the reduced context, the resulting set of cases is generated from the partial results of the sets of cases that satisfy the individual attribute constraints of the original context (Answer.aᵢ), without considering the constraints on the removed attribute:

Answer′ = ∩_{i ≠ k} Answer.aᵢ.

Expansion involves adding an attribute-value pair to a context: Context′ = Context + Context⁺, where Context⁺ = {⟨aᵢ, {Vᵢ}⟩}. Thus, removing the cases that do not satisfy the constrained context from the set of cases generates the set of retrieved cases:

Answer′ = Answer ∩ Answer.aᵢ′, where Cᵢ ∈ Answer.aᵢ′ iff sat(Cᵢ, Context⁺).
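Under the same sketch, these two formulas become one-line set operations—no case in the case base is re-examined for reduction, and only current members of Answer are tested for expansion:

```python
def apply_reduction(partial: dict, attr: str) -> set:
    """Answer' = intersection of all partial answers except the removed
    attribute's (the reduction formula above)."""
    remaining = [s for a, s in partial.items() if a != attr]
    return set.intersection(*remaining) if remaining else set()

def apply_expansion(case_base: list, answer: set, attr: str, values: set) -> set:
    """Answer' = Answer intersected with the cases satisfying the newly
    added constraint (the expansion formula above)."""
    return {i for i in answer
            if attr in case_base[i] and case_base[i][attr] in values}
```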

Generalization involves enlarging the set of allowable values for a given context: Context′ = Context + Context⁺, where Context⁺ = {⟨aᵢ, {Vᵢ}⟩}. Thus, the set of retrieved cases is generated as an intersection of the set of cases that satisfy the original context and the set of cases that satisfy the context change Context⁺:

Answer′ = Answer ∩ Answer.aᵢ′, where Cᵢ ∈ Answer.aᵢ′ iff sat(Cᵢ, Context⁺).

Specialization involves removing values from a constraint set for an attribute defined in a context: Context′ = Context − Context⁻, where Context⁻ = {⟨aᵢ, {Vᵢ}⟩}. Thus, the set of retrieved cases (Answer′) is generated by removing the cases that do not satisfy the restricted context from Answer:

Answer′ = Answer ∩ Answer.aᵢ′, where Cᵢ ∈ Answer.aᵢ′ iff sat(Cᵢ, Context⁻).

5. Performance Evaluation

Various approaches can be used to evaluate system performance. Available methods evaluate either the competence of the system or its scalability. The former measures the capabilities of the system and can be assessed by precision/recall or accuracy/coverage measures. Scalability assesses system dependence on important factors, such as case base size, case and context complexity, and the number of context transformations applied during iterative case retrieval.

We have tested both the competence and scalability of the proposed system [26]: (i) learning control—solving the inverse kinematic task for a three-link spherical, angular robot [27]; (ii) classification into a continuous class—predicting the rise time of a servomechanism in terms of two continuous gain settings and two discrete choices of mechanical linkages [27]; (iii) prediction in medicine—suggesting a cost-effective treatment for in-vitro fertilization patients without compromising the probability of successful pregnancy [9]; (iv) letter recognition [8]; and (v) software reuse and program understanding—similarity-based retrieval of software artifacts [28, 29]. In [30] we show how a generic CBR system prototype, TA3, can be custom-tailored to satisfy specific requirements in individual domains. Errico and Jurisica [31] show how case-based reasoning can be applied to user modeling to support adaptive agent-based systems for electronic commerce. Here we report on the efficiency improvement achieved by using incremental query modification.

Precision and recall are used to evaluate information retrieval systems, while accuracy and coverage evaluate machine-learning algorithms. The former benchmarks are used to assess how good the retrieval is, i.e., how much relevant information is retrieved and how much irrelevant information is included. The latter benchmarks are used to measure the accuracy of the learned rule and the portion of an information base it covers. Given the nature of information retrieval systems, when recall rises, generally precision decreases and vice versa. In contrast, variable-context similarity assessment helps to achieve both high precision and high recall. Two iterative retrieval approaches yield high precision and recall simultaneously:

1. Conservative approach: The objective is to prevent retrieving irrelevant items, even if some relevant items are missed. After the initial set of relevant items is available, we could iteratively consider neighbors and include only relevant ones (so that precision is not decreased). This approach is appropriate if the user is knowledgeable in the problem domain, since the retrieval process must be constrained—to eliminate irrelevant matches—and similarity-based—to retrieve relevant neighbors of initial matches. The conservative approach can be explained as precision-oriented retrieval with iterative query tuning to increase recall, while preserving high precision.

2. Novice approach: The emphasis is on retrieving all relevant items, even at the price of including some irrelevant ones. After the initial pool of items is retrieved, irrelevant items are iteratively removed. This approach is suitable for novices, i.e., users who do not have initial knowledge about the structure of the repository. Thus, they may not specify a query that ensures high precision, but using the result and feedback information they may iteratively prune the set of retrieved items. The novice approach can be explained as recall-oriented retrieval with iterative query tuning to increase precision, while preserving perfect recall.

From the above description it follows that flexible criteria are required to guide and control retrieval. Usually, information retrieval systems are tuned for specific applications, thus supporting either precision- or recall-oriented retrieval. Next, we describe how variable-context similarity assessment can be used to support flexible retrieval while ensuring high precision and high recall.

Variable-context similarity assessment assures that all cases that satisfy the current context are similar with respect to that context. Thus, precision and recall are 100% with respect to the given context. However, we have to take into account that the original query may be relaxed or restricted. Thus, we make a distinction between precision according to the query (precision_q) and precision according to the user (precision_u). Similarly, we distinguish recall with respect to a query (recall_q) from recall according to a user's original request (recall_u).

After making this terminological distinction, we return to the high precision and recall claim. We can prove that precision_q and recall_q are 100%. However, precision_u and recall_u may or may not be equal to 100%, depending on the context.

For scalability evaluation we define a performance model of TA3. The performance model simulates TA3's scalability as a function of case base size, case representation complexity, query complexity, and the context-modification strategy used. As a validation of this model, we compare the experimental efficiency evaluation and scalability evaluation of TA3 on respective case base sizes, case representation complexities, and context complexities.

5.1. The Performance Model of TA3

From the algorithms presented in Figs. 4 and 5 it follows that k consecutive context modifications require k applications of the naive algorithm. Thus, a naive approach requires k × |CaseBase| × |Context| evaluations. In contrast to this, an incremental algorithm is evaluated only once to produce Answer_part, and subsequent context modifications are handled by incremental changes to the result.

Assuming that the initial retrieval result |Answer_part| is substantially smaller than the case base size |CaseBase|, producing Answer′ incrementally is significantly more efficient. For all categories that are not affected by a context modification, no recomputation is necessary. For categories affected by a change, local changes handle the recomputation, as presented earlier. It should be noted that both algorithms could be improved by indexing, which avoids accessing all cases in a case base.

Iterative retrieval is based on the algorithm defined in Fig. 4. The complexity of iterative retrieval depends not only on the number of cases in a case base but also on the user criteria, the number of retrieved cases, and the number of consecutive iterations. A performance model of the individual context transformations is presented below.

Iterative retrieval with standard reduction depends on the case base size, context, and number of iterative context reductions:

|CaseBase| (|Context| + k|Context| − Σ_{i=1}^{k} i),

where |Context| is the context complexity, |CaseBase| is the case base size, and k is the number of iterations.

Iterative retrieval with incremental reduction depends on the case base size, context, set of returned cases, and number of iterative context reductions:

|Context| (|CaseBase| + k|Answer| − |Answer| Σ_{i=1}^{k} i),

where |Context| is the context complexity, |CaseBase| is the case base size, |Answer| is the number of returned cases, and k is the number of iterations.

Iterative retrieval with standard expansion depends on the case base size, context, and number of iterative context expansions:

|CaseBase| (k|Context| + Σ_{i=1}^{k} i),

where |Context| is the context complexity, |CaseBase| is the case base size, and k is the number of iterations.

Iterative retrieval with incremental expansion depends on the case base size, context, set of returned cases, and number of iterative context expansions:

|Context| (|CaseBase| + |Answer| (k + Σ_{i=1}^{k} i)),

where |Context| is the context complexity, |CaseBase| is the case base size, |Answer| is the number of returned cases, and k is the number of iterations.

Iterative retrieval with standard generalization depends on the case base size, context, and number of iterative context generalizations:

|CaseBase| |Context| (k + 1),

where |Context| is the context complexity, |CaseBase| is the case base size, and k is the number of iterations.

Iterative retrieval using incremental generalization depends on the case base size, context, and number of iterative context generalizations:

|CaseBase| (k + |Context|),

where |Context| is the context complexity, |CaseBase| is the case base size, and k is the number of iterations.

Iterative retrieval with standard specialization depends on the case base size, context, and number of iterative context specializations:

|CaseBase| |Context| (k + 1),

where |Context| is the context complexity, |CaseBase| is the case base size, and k is the number of iterations.

Iterative retrieval with incremental specialization depends on the case base size, context, set of returned cases, and number of iterative context specializations:

|Context| (|CaseBase| + k|Answer|),

where |Context| is the context complexity, |CaseBase| is the case base size, |Answer| is the number of returned cases, and k is the number of iterations.

Next we show an efficiency and scalability evaluation using the performance model of TA3. On the basis of experiments from real-world domains, we assume that ten cases are returned on average, and that ten consecutive context transformations are performed. It should be noted that if fewer than ten cases are returned, or more than ten consecutive context transformations are performed, then the incremental approach is even more efficient than the naive algorithm.
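To illustrate, the reduction-cost formulas can be evaluated directly. The sketch below is a back-of-the-envelope comparison; the parameter values (a 10,000-case base with 20-attribute contexts) are assumptions chosen only for illustration, with ten returned cases and ten iterations as stated above.

```python
def standard_reduction_cost(cb: int, ctx: int, k: int) -> int:
    # |CaseBase| * (|Context| + k*|Context| - sum_{i=1..k} i)
    return cb * (ctx + k * ctx - k * (k + 1) // 2)

def incremental_reduction_cost(cb: int, ctx: int, ans: int, k: int) -> int:
    # |Context| * (|CaseBase| + k*|Answer| - |Answer| * sum_{i=1..k} i)
    return ctx * (cb + k * ans - ans * k * (k + 1) // 2)

cb, ctx, ans, k = 10_000, 20, 10, 10
naive = standard_reduction_cost(cb, ctx, k)          # 1,650,000 evaluations
incr = incremental_reduction_cost(cb, ctx, ans, k)   # 191,000 evaluations
print(f"relative saving: {1 - incr / naive:.0%}")    # roughly 88% here
```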

Figure 6. Cost of retrieval for standard and incremental reduction as a function of case base size (|CB|) and size of the context (|Context|).
Ten consecutive relaxations are considered.

Figure 7. Cost of retrieval for standard and incremental expansion as a function of case base size (|CB|) and size of the context (|Context|).
Ten consecutive restrictions are considered.

5.2. Efficiency and Scalability Evaluation of TA3

We have used the complexity of the incremental implementation of context transformations to create a model that fits TA3's performance on real-world domains [8, 9]. We parameterize this model to determine the scalability of TA3 with respect to (1) case base size, (2) query complexity (context), and (3) the number of consecutive iterations for individual context transformations. In the model we assumed that partial evaluations are kept at the attribute level. Thus, more storage space is required, but the system is more efficient when frequent changes to the context are made.

The presented results support the claim that incremental context manipulation is more efficient than the naive approach. The performance improvement is increased when the case base size is substantial, cases have many attributes, and several subsequent context modifications are required. The results presented in Figs. 6–9 show that the incremental approach improves performance by 70% on average. However, for simple retrievals—a small case base and/or no consecutive context modifications—the naive approach is more efficient.

Figure 8. Cost of retrieval for standard and incremental generalization as a function of case base size (|CB|) and size of the context (|Context|).
Ten consecutive relaxations are considered.

6. Conclusions

Research in database management systems has long been producing scalable incremental algorithms for view maintenance. Gupta et al. [13] present an incremental algorithm for recomputing a view in response to changes in the view definition. The authors consider SQL Select-From-Where-GroupBy, Union and Except views, and present local adaptation strategies using the old view materialization. Their methods for adapting the Where part of an SQL view are similar to our context modifications. However, they do not support cardinality relaxation and restriction. Blakeley et al. [25] introduced an incremental view maintenance algorithm that supports only base relation updates. In contrast, TA3 also supports updates to the context (i.e., views).

Figure 9. Cost of retrieval for standard and incremental specialization as a function of case base size (|CB|) and size of the context (|Context|).
Ten consecutive restrictions are considered.

FRANK is a case retrieval system applied in a medical domain for back-injury diagnosis [32]. The user provides a description of a patient's symptoms and selects from a hierarchy of report types. The user's top-level considerations are filtered through the system's processing using a flexible control mechanism. The plan is then selected based on the report type. The task mechanism controls queries to the case base—if a query is not successful, then the system resubmits the query with altered initial values. Thus, it is similar to our notion of iterative query modification.

CBR has previously been successfully applied in various domains [33]. However, in many of these studies, only small case bases were considered, often because the implemented system could not efficiently support retrieval for larger case bases. Various approaches, such as indexing [34–36] and selective forgetting [37, 38], have been proposed to improve the performance of CBR systems without decreasing competence. Parallel architectures have also been suggested as a way of increasing efficiency [39]. Research on conversational CBR systems has focused on reducing the number of questions posed and increasing their relevancy [7], supporting dialogue inferencing [40], case base authoring [41], and maintaining case bases [42, 43]. Respective approaches have been proposed that integrate the CBR paradigm with model-based reasoning [44], machine learning [45], and high-performance query tools [46, 47]. Our approach to performance improvement is based on incremental context modification, an approach which has previously proven successful in database applications.

Results presented in this paper support the claim that incremental context manipulation is generally more efficient than a naive approach. An incremental approach improves retrieval efficiency without reducing system competence. The performance improvement is increased when a large case base is used, when cases have many attributes, or when several subsequent context modifications are required. In other words, this approach is most suitable for iterative browsing in complex domains, and thus conversational case-based reasoning systems may benefit from it. For simple retrievals involving a small case base and/or no consecutive context modifications, the naive approach may be preferable, since time performance is not an issue and no extra storage space is required.

The performance model simulates TA3's scalability as a function of case base size, case representation complexity, query complexity, and the context-modification strategy used. As a validation of this model, we compare the experimental efficiency evaluation and scalability evaluation of TA3 on respective case base sizes, case representation complexities, and context complexities.

The proposed incremental algorithm is general in the sense that it is applicable for retrieval in any automated reasoning or decision-support system that represents knowledge as attribute-value pairs and solves problems by iteratively accessing and using previously derived information.

In addition to the evaluations done earlier, we are currently applying the proposed system to two biomedical applications: reproductive medicine and crystallography experiment design [48–50]. In both application areas the case base is large and complex, and the reasoning is done iteratively. Preliminary results show that incremental conversational CBR works well for symbolic data. However, in both domains we need to extend the representation of domain knowledge to include image representations. At the present time, we use image feature extraction algorithms to transform images into symbolic information, so that the proposed incremental context manipulation works [51]. We are, however, considering modifications of the incremental methods for image processing.

Acknowledgments

The research described in this paper was supported by the Natural Sciences and Engineering Research Council, Communications and Information Technology Ontario, and IBM Canada.

Notes

1. A reader may find detailed descriptions of the CBR process and systems in [52]. More recent research directions are presented in [53], and practically-oriented descriptions of CBR can be found in [54, 55].
2. This modification can be done by the user or the system.
3. Context modifications are similar to knowledge manipulation transmutations [56].
4. More details are provided in Section 3.1.
5. Additional details on the theory, performance evaluation and complexity analysis of TA3 are provided in [26].
6. Note that this is a low-level representation and even complex case representations can be created using this schema.
7. Relevant attributes can be located using a knowledge-discovery algorithm [9].
8. This is similar to the notion of "similarity in the context of a given set of descriptors" [56, 57].
9. Reduction and generalization implement cardinality and value constraints, respectively.
10. We define context addition and difference in terms of standard set-theoretic operations. Individual attributes could be added or removed, or constraints on these attributes could be expanded or restricted.

References

1. P. Constantopoulos and E. Pataki, "A browser for software reuse," in Proc. of CAiSE'92, edited by P. Loucopoulos, Berlin: Springer, 1992, pp. 304–326.
2. T.P. Martin, H.-K. Hung, and C. Walmsley, "Supporting browsing of large knowledge bases," Technical report, Department of Computing and Information Science, Queen's University, Kingston, ONT, 1992.
3. M.B. Twidale, D.M. Nichols, and C.D. Paice, "Browsing is a collaborative process," Information Processing & Management, vol. 33, no. 6, pp. 761–783, 1997.
4. R.J. Miller, O.G. Tsatalos, and J.H. Williams, "DataWeb: Customizable database publishing for the web," IEEE MultiMedia, pp. 14–21, 1997.
5. D.W. Aha and L.A. Breslow, "Refining conversational case reasoning," in Proc. of the 2nd International Conference on Case-Based Reasoning, Providence, RI, 1997, pp. 267–278.
6. K. Hammond, R. Burke, and K. Schmitt, "A case-based approach to knowledge navigation," in Leake, 1996, pp. 125–136.

7. D.W. Aha, L.A. Breslow, and T. Maney, "Supporting conversational case-based reasoning in an integrated reasoning framework," Technical Report AIC-98-006, Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, Washington, DC, 1998.
8. I. Jurisica and J. Glasgow, "Case-based classification using similarity-based retrieval," International Journal of Artificial Intelligence Tools, Special Issue of IEEE ICTAI-96 Best Papers, vol. 6, no. 4, pp. 511–536, 1997.
9. I. Jurisica, J. Mylopoulos, J. Glasgow, H. Shapiro, and R. Casper, "Case-based reasoning in IVF: Prediction and knowledge mining," AI in Medicine, vol. 12, no. 1, pp. 1–24, 1998.
10. L. Bækgaard and L. Mark, "Incremental computation of time-varying query expressions," IEEE Trans. on Knowledge and Data Engineering, vol. 7, no. 4, pp. 583–589, 1995.
11. S. Ceri and J. Widom, "Deriving production rules for incremental view maintenance," in VLDB-91, Barcelona, Spain, 1991, pp. 577–589.
12. T. Griffin and L. Libkin, "Incremental maintenance of views with duplicates," in ACM SIGMOD, San Jose, CA, 1995, pp. 328–339.
13. A. Gupta, I. Mumick, and K. Ross, "Adapting materialized views after redefinitions," in ACM SIGMOD, San Jose, CA, 1995, pp. 211–222.
14. D. Wettschereck and T. Dietterich, "An experimental comparison of the nearest neighbor and nearest hyperrectangle algorithms," Machine Learning, vol. 19, no. 1, pp. 5–27, 1995.
15. J. Frawley and G. Piatetsky-Shapiro, Knowledge Discovery in Databases, AAAI Press, 1991.
16. D. Aha, "Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms," International Journal of Man–Machine Studies, vol. 36, no. 2, pp. 267–287, 1992.
17. J. Ortega, "On the informativeness of the DNA promoter sequences domain theory," Journal of Artificial Intelligence Research, vol. 2, pp. 361–367, Research Note, 1995.
18. P.R. Thagard, K.J. Holyoak, G. Nelson, and D. Gotchfeld, "Analog retrieval by constraint satisfaction," Artificial Intelligence, vol. 46, pp. 259–310, 1990.
19. I. Jurisica, "Supporting flexibility: A case-based reasoning approach," in The AAAI Fall Symposium, Flexible Computation in Intelligent Systems: Results, Issues, and Opportunities, Cambridge, MA, 1996.
20. B. D'Ambrosio, "Process, structure, and modularity in reasoning with uncertainty," in Uncertainty in Artificial Intelligence, edited by R. Shachter, T. Levitt, L. Kanal, and J. Lemmer, vol. 4, pp. 15–25, 1990.
21. E. Horvitz, "Reasoning under varying and uncertain resource constraints," in Proc. of AAAI-88, 1988, pp. 111–116.
22. T. Gaasterland, "Restricting query relaxation through user constraints," in Proc. Int. Conf. on Intelligent and Coop. Inf. Systems, Rotterdam, 1993, pp. 359–366.
23. F. Bancilhon, "Naive evaluation of recursively defined relations," in Knowledge Base Management Systems, edited by M. Brodie and J. Mylopoulos, pp. 165–178, 1986.
24. A. Gupta, I. Mumick, and V. Subrahmanian, "Maintaining views incrementally," in Proc. of the 12th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1993, pp. 157–166.
25. J.A. Blakeley, P.-A. Larson, and F.W. Tompa, "Efficiently updating materialized views," in ACM-SIGMOD, 1986, pp. 61–71.
26. I. Jurisica, "TA3: Theory, implementation, and applications of similarity-based retrieval for case-based reasoning," Ph.D. dissertation, University of Toronto, Department of Computer Science, Toronto, Ontario, 1998.
27. I. Jurisica and J. Glasgow, "A case-based reasoning approach to learning control," in 5th Int. Conf. on Data and Knowledge Systems for Manufacturing and Engineering, DKSME-96, Phoenix, AZ, 1996.
28. I. Jurisica, "Similarity-based retrieval for diverse Bookshelf software repository users," in IBM CASCON Conference, Toronto, Canada, 1997, pp. 224–235.
29. H. Dayani-Fard and I. Jurisica, "Reverse engineering by mining dynamic repositories," in 5th Working Conference on Reverse Engineering (WCRE'98), Honolulu, Hawaii, 1998, pp. 174–182.
30. I. Jurisica and B. Nixon, "Building quality into case-based reasoning systems," in CAiSE*98, 1998, Lecture Notes in Computer Science.
31. B. Errico and I. Jurisica, "Adaptive agent-based systems for the Web: An application to the NECTAR project," in AAAI Spring Symposium on Intelligent Agents in Cyberspace, Stanford, CA: AAAI Press, 1999.
32. E.L. Rissland, J.J. Daniels, Z.B. Rubinstein, and D.B. Skalak, "Case-based diagnostic analysis in a blackboard architecture," in Proc. of AAAI-93, 1993.
33. I.D. Watson, Applying Case-Based Reasoning: Techniques for Enterprise Systems, San Francisco, CA: Morgan Kaufmann Publishers, 1997.
34. R. Barletta and W. Mark, "Explanation-based indexing of cases," in Proc. of AAAI-88, 1988, pp. 541–546.
35. A. Ram, "Indexing, elaboration and refinement: Incremental learning of explanatory cases," Machine Learning, vol. 10, no. 3, pp. 201–248, 1993.
36. C.M. Seifert, "Case-based learning—predictive features in indexing," Machine Learning, vol. 16, nos. 1–2, pp. 37–56, 1994.
37. B. Smyth and M.T. Keane, "Remembering to forget: A competence-preserving case deletion policy for case-based reasoning systems," in Proc. of the 14th IJCAI, Montreal, Quebec, 1995, pp. 377–382.
38. H. Watanabe, K. Okuda, and S. Fujiwara, "A strategy for forgetting cases by restricting memory," IEICE Trans. on Information and Systems, vol. E78D, no. 10, pp. 1324–1326, 1995.
39. E. Sumita, N. Nisiyama, and H. Iida, "The relationship between architectures and example-retrieval times," in Proc. of AAAI, Seattle, 1994, pp. 478–483.
40. H. Shimazu, A. Shibata, and K. Nihei, "Case-based retrieval interface adapted to customer-initiated dialogues in help desk operations," in Proc. of the 12th National Conference on Artificial Intelligence, Seattle, WA, 1994, pp. 553–564.
41. K.M. Gupta, "Case base engineering for large scale industrial applications," in AAAI Spring Symposium Series on Knowledge Management, Stanford, CA, 1997.
42. K. Racine and Q. Yang, "Maintaining unstructured case bases," in Proc. of the 2nd International Conference on Case-Based Reasoning, Providence, RI, 1997, pp. 553–564.
43. B. Smyth, "Case base maintenance," in 11th International Conference on Industrial Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE'98), Benicassim, Castellon, Spain, 1998, pp. 507–516.

44. M.P. Feret and J.I. Glasgow, "Combining case-based and model-based reasoning for the diagnosis of complex devices," Applied Intelligence, vol. 7, no. 1, pp. 57–78, 1997.
45. L.A. Breslow and D.W. Aha, "NaCoDAE: Navy conversational decision aids environment," Technical Report AIC-97-018, Naval Research Laboratory, Navy Center for Applied Research in Artificial Intelligence, Washington, DC, 1997.
46. H. Munoz-Avila, J.A. Hendler, and D.W. Aha, "Conversational case-based planning," Review of Applied Expert Systems, vol. 5, pp. 163–174, 1999.
47. I. Jurisica and J. Glasgow, "An efficient approach to iterative browsing and retrieval for case-based reasoning," in 11th International Conference on Industrial Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE'98), Benicassim, Castellon, Spain, 1998, vol. 2 of Lecture Notes in Computer Science, LNAI 1416, pp. 535–546.
48. J. Luft, M. Bianca, I. Jurisica, P. Rogers, J. Glasgow, S. Fortier, and G. DeTitta, "An opening strategy for macromolecular crystallization: Case-based reasoning and the exploitation of a precipitation reaction outcome database," in Conference of the American Crystallography Association (ACA99), Buffalo, NY, an abstract for an oral presentation, 1999.
49. J. Luft, M. Bianca, L.M. Owczarczak, D.R. Weeks, I. Jurisica, P. Rogers, J. Glasgow, S. Fortier, and G. DeTitta, "The development of high throughput methods for macromolecular microbatch crystallization," in Recent Advances in Macromolecular Crystallization, San Diego, CA, an abstract for an oral presentation, 1999.
50. I. Jurisica, G. DeTitta, J. Luft, J. Glasgow, and S. Fortier, "Knowledge management in scientific domains," in AAAI-99 Workshop on Exploring Synergies of Knowledge Management and Case-Based Reasoning, Orlando, FL, 1999, pp. 30–34.
51. J. Glasgow and I. Jurisica, "Integration of case-based and image-based reasoning," in AAAI'98 Workshop on Case-Based Reasoning, edited by D.W. Aha, Madison, WI, 1998, pp. 67–74.
52. J.L. Kolodner, Case-Based Reasoning, San Mateo, CA: Morgan Kaufmann, 1993.
53. D. Leake (ed.), Case-Based Reasoning: Experiences, Lessons and Future Directions, AAAI Press, 1996.
54. I.D. Watson, Applying Case-Based Reasoning: Techniques for Enterprise Systems, San Francisco, CA: Morgan Kaufmann Publishers, 1997.
55. R. Bergmann, S. Breen, M. Goker, M. Manago, and S. Wess, Developing Industrial Case-Based Reasoning Applications: The INRECA Methodology, Berlin: Springer, 1999.
56. R.S. Michalski, "Inferential theory of learning: Developing foundations for multistrategy learning," in Machine Learning: A Multistrategy Approach, vol. IV, 1994.
57. A.M. Collins and R.S. Michalski, "The logic of plausible reasoning: A core theory," Cognitive Science, vol. 13, pp. 1–49, 1989.
