section, we present some motivational examples.

The word 'yey' in Korean has a number of English expressions such as 'yes', 'no', 'O.K.', 'Hello', 'thanks', and so on (Jae-woong Choe 1996). When the speech act of the utterance 'yey' is response, it must be translated as 'yes' or 'no'. On the other hand, when the speech act of the utterance is accept, it must be translated as 'O.K.'. It is even used as a greeting or opening in Korean. In this case, 'Hello' is an appropriate expression in English.

The verb 'kulehsupnita' in Korean may also be translated differently depending on context. 'Kulehsupnita' is used to accept the previous utterance in Korean, and its translation depends on that previous utterance. The following dialogue examples show such cases.

Dialogue 1
A : Hankwuk hotelipnikka?
    (Is it Hankwuk Hotel?)
B : Yey, kulehsupnita.
    (Yes, it is.)

Dialogue 2
A : Kayksil yeyyak hasyesssupnikka?
    (Did you reserve a room?)
B : Yey, kulehsupnita.
    (Yes, I did.)

To differentiate such cases, a translation system must analyze the context of a dialogue. Since a dialogue has a hierarchical structure rather than a linear structure, the discourse structure of a dialogue must be analyzed to reflect the context in translation. There have been plan-based approaches for analyzing context in dialogues. Since it is very difficult to build complete knowledge, however, it is not easy to find a correct analysis using such knowledge bases. In this paper, we propose a statistical dialogue analysis model based on speech acts for dialogue machine translation. Such a model is weaker than a dialogue analysis model which uses many different sources of knowledge. However, it is more efficient and robust, and easier to scale up. We believe that this kind of minimal approach is more appropriate for a translation system.

3 Statistical Speech Act Processing

We construct a statistical dialogue model based on speech acts as follows. Let D denote a dialogue which consists of a sequence of n utterances, U1, U2, ..., Un, and let Si denote the speech act of Ui. With this notation, P(Ui|U1, U2, ..., Ui-1) means the probability that Ui will be uttered given a sequence of utterances U1, U2, ..., Ui-1. As shown in equation (1), we can approximate P(Ui|U1, U2, ..., Ui-1) by the product of the sentential probability P(Ui|Si) and the contextual probability P(Si|S1, S2, ..., Si-1) (Nagata and Morimoto 1994). In subsequent sections, we describe the details of each probability.

P(Ui|U1, U2, ..., Ui-1) ≈ P(Ui|Si) P(Si|S1, S2, ..., Si-1)    (1)

[1] All notations for Korean follow the Yale Romanization System.

3.1 Sentential Probability

There is a strong relation between the speaker's speech act and the surface utterances expressing that speech act (Allen 1989; Andernach 1996). That is, the speaker utters a sentence which best expresses his/her intention (speech act). This sentence allows the hearer to infer what the speaker's speech act is. However, a sentence can be used for several speech acts depending on the context of the sentence.

The sentential probability P(Ui|Si) represents the relationship between the speech acts and the features of surface sentences. In this paper, we approximate utterances with a syntactic pattern, which consists of selected syntactic features.

We defined a syntactic pattern which consists of a fixed number of syntactic features. Sentence Type, Main-Verb, Aux-Verb, and Clue-Word are selected as the syntactic features since they provide strong cues to infer speech acts. The features of a syntactic pattern with possible entries are shown in figure 1.

Figure 1: A Syntactic Pattern
  Sentence Type : Assert, YN-Quest, WH-Quest, Imperative
  Main-Verb     : PV, PA, FRAG, LEXEME
  Aux-Verb      : Must, Want, Intend, Possible, Serve, Serve_to, May, ...
  Clue-Word     : yey, aniyo, kulemyen, ...

• Sentence Type represents the mood of an utterance. Assert, YN-Quest, WH-Quest, and Imperative are possible sentence types.

• Main-Verb is the type of the main verb in the utterance. PA is used when the main verb represents a state and PV for the verbs of type event or action. Utterances without verbs belong to FRAG (fragment). In the case of performative verbs (e.g. promise, request, etc.), lexical items are used as a Main-Verb because these are closely tied with specific speech acts.

• Aux-Verb represents the modality such as Want, Possible, Must, and so on.

• Clue-Word is the special word used in utterances having particular speech acts, such as Yes, No, O.K., and so on.

Table 1: A part of the syntactic patterns extracted from corpus
  Speech Act    Sentence Type  Main-Verb  Aux-Verb  Clue-Word
  Request-Act   Imperative     PV         Request   None
  Request-Act   YN-Quest       PV         Possible  None
  Request-Act   Assert         PV         Want      None
  Ask-Ref       WH-Quest       PV         None      None
  Ask-Ref       YN-Quest       PJ         None      None
  Ask-Ref       Imperative     malhata    Request   None
  Inform        Assert         PJ         None      None
  Inform        Assert         PV         None      None
  Request-Conf  YN-Quest       PJ         None      None
  Request-Conf  YN-Quest       FRAG       None      None
  Response      Assert         PJ         None      yey
  Suggest       WH-Quest       PV         Serve     None

3.2 Contextual Probability

The contextual probability P(Si|S1, S2, ..., Si-1) is the probability that the speech act Si will be uttered given that the speech acts S1, S2, ..., Si-1 were previously uttered. Since previous speech acts constrain possible speech acts in the next utterance, contextual information plays an important role in determining the speech act of an utterance. For example, if an utterance with the ask-ref speech act is uttered, then the next speech act would be one of response, request-conf, and reject. In this case, response would be the most likely candidate. The following table shows an example of the speech act bigrams.
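The decomposition in equation (1), with the contextual probability approximated by speech-act bigrams as described above, can be sketched as follows. This is a minimal illustration only: the count tables below are hypothetical toy data, not the paper's actual corpus statistics.

```python
# Hypothetical training counts (illustrative only): how often each
# syntactic pattern (Sentence Type, Main-Verb, Aux-Verb, Clue-Word)
# co-occurs with each speech act, and speech-act bigram counts.
pattern_counts = {
    ("YN-Quest", "PJ", "None", "None"): {"ask-ref": 3, "request-conf": 5},
    ("Assert", "PJ", "None", "yey"): {"response": 8},
}
bigram_counts = {"ask-ref": {"response": 6, "request-conf": 2, "reject": 1}}

def sentential_prob(pattern, act):
    """P(Ui|Si), approximated via the syntactic pattern of Ui."""
    counts = pattern_counts.get(pattern, {})
    total = sum(counts.values())
    return counts.get(act, 0) / total if total else 0.0

def contextual_prob(prev_act, act):
    """P(Si|S1,...,Si-1), approximated by a speech-act bigram P(Si|Si-1)."""
    counts = bigram_counts.get(prev_act, {})
    total = sum(counts.values())
    return counts.get(act, 0) / total if total else 0.0

def score(pattern, prev_act, act):
    # Equation (1): P(Ui|U1,...,Ui-1) ~ P(Ui|Si) * P(Si|S1,...,Si-1)
    return sentential_prob(pattern, act) * contextual_prob(prev_act, act)

# After an ask-ref, an utterance like 'Yey, kulehsupnita.'
# (Assert / PJ / clue word 'yey') scores highest as a response.
best = max(["response", "request-conf", "reject"],
           key=lambda a: score(("Assert", "PJ", "None", "yey"), "ask-ref", a))
```

The two factors are estimated independently, which is what makes the model cheap to train and easy to scale with corpus size.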
Dialogue 3
dialogue transition network is ended and the current state is popped from the stack. This process continues until the dialogue is finished.

With the DTN and the stack, the system makes expectations for all possible speech acts of the next utterance. For example, let us consider dialogue 3. Figure 3 shows the transitions with dialogue 3. In utterance 2, according to the RA diagram in figure 2, B may request-confirm or request-information. Since B asks for the type of rooms, a push operation occurs and an RI diagram is initiated. In utterance 3, A does not know the possible room sizes, and hence asks B to provide such information. Therefore, a push operation occurs again and a new RI diagram is initiated. This diagram is continued by response in utterance 4. In utterance 5, this diagram is popped from the stack by the response for the ask-ref in utterance 2.

In this state, some cases can be expected for the next utterance. The first case is to clarify utterance 5. The second case is to return to utterance 1. The last case is to introduce a new subdialogue. Therefore, if we assume that ask-if and request-confirm are possible from the syntactic pattern of the next utterance, then the following table can be expected for the next utterance from the dialogue transition networks.

  Uk               Uj                  Ui
  (0:-:init)       (0:-:init)          (6:B:ask-if)
  (2:B:ask-ref)    (5:A:response)      (6:B:ask-if)
  (2:B:ask-ref)    (5:A:response)      (6:B:request-conf)
  (0:-:init)       (1:A:request-act)   (6:B:ask-if)

Since the DTN has the same expressive power as an ATN (Augmented Transition Network) grammar, we believe that it is not enough to cover the whole phenomenon of dialogues. However, considering the fact that the number of utterances requiring context for translation is relatively small, it is practically acceptable for dialogue machine translation.

5 Experiments and Results

In order to evaluate the proposed model, we used 70 dialogues recorded in real fields such as hotel reservation and airline reservation. These 70 dialogues consist of about 1,700 utterances, 8,319 words in total. Each utterance in the dialogues was annotated with a speech act (SA) and with discourse structure information (DS). DS is an index that represents the hierarchical structure of discourse. Table 2 shows the distribution of speech acts in this dialogue corpus.

Table 2: The distribution of speech acts in corpus
(Speech act types: ask-ref, ask-if, inform, response, request-confirm, request-action, suggest, confirm, accept, reject, correct, promise, expressive, greeting, good-bye; the ratio values are illegible in this copy.)

The following shows a part of an annotated dialogue corpus.

/SP/hotel
/KS/Etten pangul wenhasipnikka?
/ES/What kind of room do you want?
/SA/ask-ref
/DS/[1]

/SP/customer
/KS/Etten pangi isssupnikka?
/ES/What kind of room do you have?
/SA/ask-ref
/DS/[1,1]

We tested two models in order to verify the efficiency of the proposed model. Model-I is the model based on linear recency, where an utterance Ui is always connected to the previous utterance Ui-1. Model-II is the proposed model based on hierarchical recency. Table 3 shows the average accuracy of the two models.

Table 3: Experimental results
            Top 1     Top 2     Top 3     Top 4
  Model I   68.48 %   74.57 %   76.09 %   76.30 %
  Model II  78.59 %   92.82 %   97.88 %   99.06 %

Accuracy figures shown in table 3 are computed by counting the utterances that have a correct speech act and a correct discourse relation. In the closed experiments, Model-I achieved 68.48 % accuracy for the top candidate and 76.30 % for the top four candidates. In contrast, the proposed model, Model-II, achieved 78.59 % accuracy for the top candidate and 99.06 % for the top four candidates. Errors in Model-I occurred because the hierarchical structure of dialogues was not considered. Although the dialogue corpus is relatively small, the experimental results showed that the proposed model is efficient for analyzing dialogues.
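The push and pop operations on subdialogue diagrams described above can be sketched as a stack of active diagrams. This is a simplified illustration: the diagram names RA and RI follow the running example, but the class itself and its state encoding are hypothetical, not the paper's implementation.

```python
# Minimal sketch of stack-based dialogue transition processing:
# opening a subdialogue pushes its diagram onto the stack, and a
# response that closes a diagram pops it, returning control to the
# diagram below.

class DTNStack:
    def __init__(self):
        self.stack = []          # active subdialogue diagrams

    def push(self, diagram, opened_by):
        # e.g. an ask-ref inside an RA diagram opens an RI subdialogue
        self.stack.append((diagram, opened_by))

    def pop(self):
        # a response ends the current diagram; control returns to the
        # diagram below it on the stack
        return self.stack.pop()

dtn = DTNStack()
dtn.push("RA", "1:A:request-act")   # main request-action dialogue
dtn.push("RI", "2:B:ask-ref")       # B asks for the room type
dtn.push("RI", "3:A:ask-ref")       # A asks for the possible sizes
dtn.pop()                            # utt. 4: response closes inner RI
dtn.pop()                            # utt. 5: response closes outer RI
current = dtn.stack[-1][0]           # back in the RA diagram
```

After the two pops, only the RA diagram remains active, which is exactly the state from which clarification, return, or a new subdialogue can be expected.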
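The top-n accuracy figures of the kind reported in table 3 can be computed as below. The ranked candidate lists here are made-up examples, not the actual test data.

```python
def top_n_accuracy(predictions, gold, n):
    """Percentage of utterances whose correct label appears among
    the top n ranked candidates."""
    hits = sum(1 for cands, g in zip(predictions, gold) if g in cands[:n])
    return 100.0 * hits / len(gold)

# Hypothetical ranked speech-act candidates for four utterances.
preds = [["response", "request-conf"],
         ["ask-if", "ask-ref"],
         ["inform", "response"],
         ["request-conf", "response"]]
gold = ["response", "ask-ref", "inform", "response"]

top1 = top_n_accuracy(preds, gold, 1)   # 50.0
top2 = top_n_accuracy(preds, gold, 2)   # 100.0
```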
6 Conclusions

In this paper, we described an efficient dialogue analysis model with statistical speech act processing. We proposed a statistical method to decide the speech act of a sentence and to maintain a discourse structure. This model uses the surface syntactic patterns of the sentence and an N-gram of the speech acts of the sentences which are discourse-structurally recent to the sentence. Our experimental results with trigrams showed that the proposed model achieved 78.59 % accuracy for the top candidate and 99.06 % for the top four candidates although the size of the training corpus is relatively small. This model is weaker than a dialogue analysis model which uses many different sources of knowledge. However, it is more efficient and robust, and easier to scale up. We believe that this kind of statistical approach can be integrated with other approaches for an efficient and robust analysis of dialogues.

References

Hwan Jin Choi, Young Hwan Oh, 1996, "Analysis of Intention in Spoken Dialogue Based on Learning of Intention Dependent Sentence Patterns", Journal of Korea Information Science Society, Vol.23, No.8, pp.862-870, In Korean.

Jae-woong Choe, 1996, "Some Issues in Conversational Analysis: Telephone Conversations for Hotel Reservation," In Proc. of Hangul and Korean Language Information Processing, pp.7-16, In Korean.

James F. Allen, C. Raymond Perrault, 1980, "Analyzing Intention in Utterances", Artificial Intelligence, Vol.15, pp.143-178.

Elizabeth A. Hinkelman, James F. Allen, 1989, "Two Constraints on Speech Act Ambiguity," In Proc. of the 27th Annual Meeting of the ACL, Association for Computational Linguistics, pp.212-219.

Barbara J. Grosz, Candace L. Sidner, 1986, "Attention, Intentions, and the Structure of Discourse", Computational Linguistics, Vol.12, No.3, pp.175-204.

Philip R. Cohen, C. Raymond Perrault, 1979, "Elements of a Plan-Based Theory of Speech Acts", Cognitive Science, Vol.3, pp.177-212.

Diane J. Litman, James F. Allen, 1987, "A Plan Recognition Model for Subdialogues in Conversations", Cognitive Science, Vol.11, pp.163-200.

Hiroaki Kitano, 1994, "Speech-to-Speech Translation: A Massively Parallel Memory-Based Approach", Kluwer Academic Publishers.

Jan Alexandersson, Elisabeth Maier, Norbert Reithinger, 1994, "A Robust and Efficient Three-Layered Dialogue Component for a Speech-to-Speech Translation System", Proc. of the 7th Conference of the European Chapter of the Association for Computational Linguistics, pp.188-193.

Jin Ah Kim, Young Hwan Cho, Jae-won Lee, Gil Chang Kim, 1995, "A Response Generation in Dialogue System based on Dialogue Flow Diagrams," Natural Language Processing Pacific Rim Symposium, pp.634-639.

Jungyun Seo, Jae-won Lee, Jae-Hoon Kim, Jeong-Mi Cho, Chang-Hyun Kim, Gil Chang Kim, 1994, "Dialogue Machine Translation Using a Dialogue Model", Proc. of China-Korea Joint Symposium on Machine Translation, pp.55-63.

Masaaki Nagata, Tsuyoshi Morimoto, 1994, "First Steps towards Statistical Modeling of Dialogue to Predict the Speech Act Type of the Next Utterance", Speech Communication, Vol.15, pp.193-203.

Masako Kume, Gayle K. Sato, Kei Yoshimoto, 1990, "A Descriptive Framework for Translating Speaker's Meaning", Proc. of the 4th Conference of the European Chapter of the Association for Computational Linguistics, pp.264-271.

Marilyn Walker, Steve Whittaker, 1990, "Mixed Initiative in Dialogue: An Investigation into Discourse Segmentation", In Proc. of the 28th Annual Meeting of the ACL, Association for Computational Linguistics, pp.70-78.

Sandra Carberry, 1989, "A Pragmatics-Based Approach to Ellipsis Resolution", Computational Linguistics, Vol.15, No.2, pp.75-96.

Toine Andernach, 1996, "A Machine Learning Approach to the Classification of Dialogue Utterances", Proceedings of NeMLaP-2, Bilkent University, Turkey.

Marilyn A. Walker, 1996, "Limited Attention and Discourse Structure", Computational Linguistics, Vol.22, No.2, pp.255-264.

Woods, W. A., 1970, "Transition Network Grammars for Natural Language Analysis," Commun. of the ACM, Vol.13, pp.591-606.