
A Dialogue Analysis Model with Statistical Speech Act Processing for Dialogue Machine Translation*

Jae-won Lee and Gil Chang Kim
Dept. of Computer Science and CAIR
Korea Advanced Institute of Science and Technology, Taejon, 305-701, Korea
{jwonlee, gckim}@csone.kaist.ac.kr

Jungyun Seo
Dept. of Computer Science
Sogang University, Seoul, 121-742, Korea
seo@nlpeng.sogang.ac.kr

Abstract

In some cases, to translate an utterance in a dialogue properly, the system needs various kinds of information about context. In this paper, we propose a statistical dialogue analysis model based on speech acts for Korean-English dialogue machine translation. The model uses syntactic patterns and N-grams reflecting the hierarchical discourse structures of dialogues. The syntactic pattern includes the syntactic features that are related to the language-dependent expressions of speech acts. The N-gram of speech acts based on hierarchical recency approximates the context. Our experimental results with trigrams showed that the proposed model achieved 78.59 % accuracy for the top candidate and 99.06 % for the top four candidates even though the training corpus is relatively small. The proposed model can be integrated with other approaches for an efficient and robust analysis of dialogues.

1 Introduction

Recently, special attention has been paid to research on dialogue machine translation. Many different aspects of dialogue, however, make it difficult to translate spoken language with conventional machine translation techniques. One of the reasons is that a surface utterance may have several meanings depending on context. That means such an utterance can be translated in many different ways depending on context. Interpreting this kind of utterance often requires the analysis of contexts. Therefore, the discourse structure of a dialogue plays a very important role in translating the utterances in the dialogue. Discourse structures of dialogues are usually represented as hierarchical structures which reflect embedding subdialogues (Grosz and Sidner 1986).

Many researchers have studied how to analyze dialogues. One of the representative approaches is the plan-based method (Litman et al. 1987; Carberry 1989). Considering that our dialogue translation system is to be combined with a speech system to develop an automatic translating telephone, however, the plan-based approach has some limitations. In an automatic translating telephone environment, the system must produce one correct target sentence for each source sentence and must be able to respond in real time. However, plan inference is computationally expensive and is hard to scale up. In order to overcome such limitations, we have focused on defining a minimal approach which uses as small a knowledge base as possible while still handling ambiguous utterances.

This paper presents an efficient discourse analysis model using statistical speech act processing for Korean-English dialogue machine translation. In this model, we suggest a probabilistic model which uses surface syntactic patterns and the N-gram of speech acts reflecting the hierarchical structures of dialogues to decide the speech act of an input sentence and to maintain a discourse structure. The proposed model consists of two steps: (1) identifying the syntactic pattern of an utterance and (2) calculating the plausibility of possible speech acts and discourse relations.

After presenting some motivational examples in section 2, we discuss the statistical speech act processing model to analyze discourse structure in section 3. In section 4, we describe a method to analyze dialogue structure using the proposed statistical speech act processing. We discuss experimental results for the proposed model in section 5. Finally, we draw some conclusions.

*This research is supported in part by the ministry of information and communication of Korea.

2 Motivation

Translation of dialogues often requires the analysis of contexts. That is, a surface utterance may be translated differently depending on context. In this
section, we present some motivational examples.

The word 'yey'^1 in Korean has a number of English expressions such as 'yes', 'no', 'O.K.', 'Hello', 'thanks', and so on (Jae-woong Choe 1996). When the speech act of the utterance 'yey' is response, it must be translated as 'yes' or 'no'. On the other hand, when the speech act of the utterance is accept, it must be translated as 'O.K.'. It is even used as greeting or opening in Korean. In this case, 'Hello' is an appropriate expression in English.

^1 All notations for Korean follow the Yale Romanization System.

The verb 'kulehsupnita' in Korean, also, may be translated differently depending on context. Kulehsupnita is used to accept the previous utterance in Korean. In this case, it must be translated differently depending on context. The following dialogue examples show such cases.

Dialogue 1
A : Hankwuk hotelipnikka?
    (Is it Hankwuk Hotel?)
B : Yey, kulehsupnita.
    (Yes, it is.)

Dialogue 2
A : Kayksil yeyyak hasyesssupnikka?
    (Did you reserve a room?)
B : Yey, kulehsupnita.
    (Yes, I did.)

To differentiate such cases, a translation system must analyze the context of a dialogue. Since a dialogue has a hierarchical structure rather than a linear structure, the discourse structure of a dialogue must be analyzed to reflect the context in translation. There are previous plan-based approaches for analyzing context in dialogues. Since it is very difficult to have complete knowledge, it is not easy to find a correct analysis using such knowledge bases. In this paper, we propose a statistical dialogue analysis model based on speech acts for dialogue machine translation. Such a model is weaker than a dialogue analysis model which uses many different sources of knowledge. However, it is more efficient and robust, and easy to scale up. We believe that this kind of minimal approach is more appropriate for a translation system.

3 Statistical Speech Act Processing

We construct a statistical dialogue model based on speech acts as follows.

Let $D$ denote a dialogue which consists of a sequence of $n$ utterances, $U_1, U_2, \ldots, U_n$, and let $S_i$ denote the speech act of $U_i$. With this notation, $P(U_i \mid U_1, U_2, \ldots, U_{i-1})$ means the probability that $U_i$ will be uttered given a sequence of utterances $U_1, U_2, \ldots, U_{i-1}$. As shown in equation (1), we can approximate $P(U_i \mid U_1, U_2, \ldots, U_{i-1})$ by the product of the sentential probability $P(U_i \mid S_i)$ and the contextual probability $P(S_i \mid S_1, S_2, \ldots, S_{i-1})$ (Nagata and Morimoto 1994). In subsequent sections, we describe the details of each probability.

$$P(U_i \mid U_1, U_2, \ldots, U_{i-1}) \approx P(U_i \mid S_i)\, P(S_i \mid S_1, S_2, \ldots, S_{i-1}) \qquad (1)$$

3.1 Sentential Probability

There is a strong relation between the speaker's speech act and the surface utterances expressing that speech act (Allen 1989; Andernach 1996). That is, the speaker utters the sentence which best expresses his/her intention (speech act). This sentence allows the hearer to infer what the speaker's speech act is. However, a sentence can be used as several speech acts depending on the context of the sentence.

The sentential probability $P(U_i \mid S_i)$ represents the relationship between the speech acts and the features of surface sentences. In this paper, we approximate utterances with a syntactic pattern, which consists of the selected syntactic features.

We defined a syntactic pattern that consists of a fixed number of syntactic features. Sentence Type, Main-Verb, Aux-Verb, and Clue-Word are selected as the syntactic features since they provide strong cues to infer speech acts. The features of a syntactic pattern with possible entries are shown in figure 1.

Sentence Type : Assert, YN-Quest, WH-Quest, Imperative
Main-Verb     : PV, PA, FRAG, LEXEME
Aux-Verb      : Must, Want, Intent, Possible, Serve, Serve_to, May, Intend
Clue-Word     : yey, aniyo, kulemyen, ...

Figure 1: A Syntactic Pattern

• Sentence Type represents the mood of an utterance. Assert, YN-Quest, WH-Quest, and Imperative are the possible sentence types.

• Main-Verb is the type of the main verb in the utterance. PA is used when the main verb represents a state and PV for the verbs of type
event or action. Utterances without verbs belong to FRAG (fragment). In the case of performative verbs (e.g., promise, request, etc.), lexical items are used as the Main-Verb because these are closely tied to specific speech acts.

• Aux-Verb represents the modality such as Want, Possible, Must, and so on.

• Clue-Word is a special word used in utterances having particular speech acts, such as Yes, No, O.K., and so on.

We extracted 167 pairs of speech acts and syntactic patterns from a dialogue corpus automatically using a conventional parser. As the result of applying these syntactic patterns to all utterances in the corpus, we found that the average number of speech act ambiguities per utterance is 3.07. Table 1 gives a part of the syntactic patterns extracted from the corpus.

Table 1: A part of the syntactic patterns extracted from corpus

Speech Act     Sentence Type   Main-Verb   Aux-Verb   Clue-Word
Request-Act    Imperative      PV          Request    None
Request-Act    YN-Quest        PV          Possible   None
Request-Act    Assert          PV          Want       None
Ask-Ref        WH-Quest        PV          None       None
Ask-Ref        YN-Quest        PJ          None       None
Ask-Ref        Imperative      malhata     Request    None
Inform         Assert          PJ          None       None
Inform         Assert          PV          None       None
Request-Conf   YN-Quest        PJ          None       None
Request-Conf   YN-Quest        FRAG        None       None
Response       Assert          PJ          None       yey
Suggest        WH-Quest        PV          Serve      None

Since a syntactic pattern can be matched with several speech acts, we use the sentential probability $P(U_i \mid S_i)$, a probabilistic score calculated from the corpus. Equation (2) represents the approximated sentential probability. $F$ denotes the syntactic pattern and $freq$ denotes the frequency count of its argument.

$$P(U_i \mid S_i) \approx P(F_i \mid S_i) = \frac{freq(F_i, S_i)}{freq(S_i)} \qquad (2)$$

3.2 Contextual Probability

The contextual probability $P(S_i \mid S_1, S_2, \ldots, S_{i-1})$ is the probability that an utterance with speech act $S_i$ is uttered given that utterances with speech acts $S_1, S_2, \ldots, S_{i-1}$ were previously uttered. Since previous speech acts constrain the possible speech acts of the next utterance, contextual information has an important role in determining the speech act of an utterance. For example, if an utterance with the ask-ref speech act is uttered, then the next speech act would be one of response, request-conf, and reject. In this case, response would be the most likely candidate. The following table shows an example of the speech act bigrams.

S_{i-1}    S_i               Ratio
ask-ref    response          58.46
ask-ref    request-confirm   18.46
ask-ref    ask-if             7.69
ask-ref    ask-ref            3.08
ask-ref    suggest            3.08
ask-ref    inform             1.54

This table shows that response is the most likely candidate speech act for the utterance following an utterance with the ask-ref speech act. Also, request-confirm and ask-if are probable candidates.

Since it is impossible to consider all preceding utterances $S_1, S_2, \ldots, S_{i-1}$ as contextual information, we use the n-gram model. However, simply using the n utterances linearly adjacent to an utterance as contextual information has a problem due to subdialogues, which frequently occur in a dialogue. Let's consider an example dialogue.

Dialogue 3

1. A : I would like to reserve a room.      request-act
2. B : What kind of room do you want?       ask-ref
3. A : What kind of room do you have?       ask-ref
4. B : We have single and double rooms.     response
5. A : A single room, please.               response

In dialogue 3, utterances 3-4 are part of an embedded segment. In utterance 3, the speaker asks for the type of rooms without responding to B's question (utterance 2). This subdialogue continues up to utterance 4. As shown in the above example, dialogues cannot be viewed as a linear sequence of utterances. Rather, dialogues have a hierarchical structure. Therefore, if we use the n utterances linearly adjacent to an utterance as a context, we cannot reflect the hierarchical structure of a dialogue in the model.

Therefore, we approximate the context for an utterance as the speech acts of the n utterances which are hierarchically recent to the utterance. An utterance A is hierarchically recent to an utterance B if A is adjacent to B in the tree structure of the discourse (Walker 1996). Equation (3) represents the approximated contextual probability in terms of hierarchical recency in the case of using trigrams. In this equation, $U_i$ is adjacent to $U_j$ and $U_j$ is adjacent to $U_k$ in the discourse structure, where $1 \le j < k \le i-1$.

$$P(S_i \mid S_1, S_2, \ldots, S_{i-1}) \approx P(S_i \mid S_j, S_k) \qquad (3)$$
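
To make hierarchical recency concrete, the following Python sketch collects hierarchical trigrams (S_k, S_j, S_i) from Dialogue 3 and estimates the contextual probability of equation (3) by relative frequency. The parent links, the `init` padding symbol, the function names, and the relative-frequency estimator are illustrative assumptions; the paper does not state how the trigram probability is estimated.

```python
from collections import Counter

# Dialogue 3 as (speech_act, parent) pairs. `parent` is the 1-based index of
# the utterance that this one attaches to in the discourse tree; 0 stands for
# the start of the dialogue. These links are an assumed reading of Dialogue 3.
DIALOGUE_3 = [
    ("request-act", 0),  # 1. A: I would like to reserve a room.
    ("ask-ref", 1),      # 2. B: What kind of room do you want?
    ("ask-ref", 2),      # 3. A: What kind of room do you have? (opens a subdialogue)
    ("response", 3),     # 4. B: We have single and double rooms.
    ("response", 2),     # 5. A: A single room, please. (answers utterance 2)
]


def hierarchical_trigrams(dialogue):
    """Return (S_k, S_j, S_i) triples: U_j is the parent of U_i, U_k the parent of U_j."""
    acts = ["init"] + [act for act, _ in dialogue]
    parents = [0] + [parent for _, parent in dialogue]
    triples = []
    for i in range(1, len(acts)):
        j = parents[i]
        k = parents[j]  # the parent of "init" is "init" itself
        triples.append((acts[k], acts[j], acts[i]))
    return triples


def contextual_probability(triples, s_k, s_j, s_i):
    """Relative-frequency estimate of P(S_i | S_j, S_k); 0.0 for unseen contexts."""
    trigram_counts = Counter(triples)
    context_counts = Counter((k, j) for k, j, _ in triples)
    seen = context_counts[(s_k, s_j)]
    return trigram_counts[(s_k, s_j, s_i)] / seen if seen else 0.0


TRIPLES = hierarchical_trigrams(DIALOGUE_3)
```

Under these assumed links, the context of utterance 5 is (request-act, ask-ref), whereas a linear trigram would use (ask-ref, response), the speech acts of utterances 3 and 4.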

4 Discourse Structure Analysis

Now we can define a discourse structure analysis model with the statistical speech act processing. Formally, choose $S_i$ which maximizes the following probability

$$\max_{S_i} P(F_i \mid S_i)\, P(S_i \mid S_j, S_k) \qquad (4)$$

where $S_i$ is a possible speech act for the utterance $U_i$. $U_j$ and $U_k$ are the utterances such that $U_j$ is hierarchically adjacent to $U_i$, and $U_k$ to $U_j$, where $1 \le j < k \le i-1$.

[Figure 2: A part of the dialogue transition network (diagrams RA, RI, RC, and GI)]
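
As a rough illustration of equation (4), the Python sketch below scores every combination of candidate speech act and candidate context and sorts the results, best first. The function name, the dictionary layout, the example pattern, and all probability values are hypothetical and chosen only for the example; the candidate contexts are assumed to be supplied by the caller.

```python
def rank_candidates(pattern, contexts, sentential, contextual):
    """Score each (speech act, context) pair by P(F_i|S_i) * P(S_i|S_j,S_k), best first."""
    scored = []
    for s_k, s_j in contexts:
        for (f, s_i), p_sentential in sentential.items():
            if f != pattern:
                continue  # this speech act was never observed with the pattern
            p_contextual = contextual.get((s_k, s_j, s_i), 0.0)
            scored.append((p_sentential * p_contextual, s_i, (s_k, s_j)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical probabilities, invented for this example only.
PATTERN = ("YN-Quest", "PV", "None", "None")
SENTENTIAL = {(PATTERN, "ask-if"): 0.5, (PATTERN, "request-conf"): 0.2}
CONTEXTUAL = {
    ("ask-ref", "response", "ask-if"): 0.1,
    ("ask-ref", "response", "request-conf"): 0.3,
    ("init", "request-act", "ask-if"): 0.2,
}
CONTEXTS = [("ask-ref", "response"), ("init", "request-act")]

RANKED = rank_candidates(PATTERN, CONTEXTS, SENTENTIAL, CONTEXTUAL)
BEST_SCORE, BEST_ACT, BEST_CONTEXT = RANKED[0]
```

Returning the full ranked list rather than only the maximum makes it possible to inspect the top few candidates, which is how the accuracy figures in section 5 are reported.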
[Figure 3: The transitions of dialogue 3]

In equation (4), one problem is to search all possible $U_j$ that $U_i$ can be connected to. We use the dialogue transition networks (DTN) and a stack for maintaining the dialogue state efficiently. The dialogue transition networks describe the possible flow of speech acts in dialogues as shown in figure 2 (Seo et al. 1994, Jin Ah Kim et al. 1995). Since DTN is defined using recursive transition networks, it can handle recursively embedded subdialogues. It works just like the RTN parser (Woods 1970). If a subdialogue is initiated, a dialogue transition network is initiated and a current state is pushed on the stack. On the other hand, if a subdialogue is ended, then a
dialogue transition network is ended and a current state is popped from the stack. This process continues until a dialogue is finished.

With DTN and the stack, the system makes expectations for all possible speech acts of the next utterance. For example, let us consider dialogue 3. Figure 3 shows the transitions for dialogue 3. In utterance 2, according to the RA diagram in figure 2, B may request-confirm or request-information. Since B asks for the type of rooms, a push operation occurs and an RI diagram is initiated. In utterance 3, A doesn't know the possible room sizes, hence asks B to provide such information. Therefore, a push operation occurs again and a new RI diagram is initiated. This diagram is continued by response in utterance 4. In utterance 5, this diagram is popped from the stack by response for ask-ref in utterance 2.

In this state, some cases can be expected for the next utterance. The first case is to clarify utterance 5. The second case is to return to utterance 1. The last case is to introduce a new subdialogue. Therefore, if we assume that ask-if and request-confirm are possible from the syntactic pattern of the next utterance, then the following table can be expected for the next utterance from the dialogue transition networks.

U_k             U_j               U_i
(0:-:init)      (0:-:init)        (6:B:ask-if)
(2:B:ask-ref)   (5:A:response)    (6:B:ask-if)
(2:B:ask-ref)   (5:A:response)    (6:B:request-conf)
(0:-:init)      (1:A:request-act) (6:B:ask-if)

Since DTN has the same expressive power as ATN (Augmented Transition Network) grammar, we believe that it is not enough to cover the whole phenomenon of dialogues. However, considering the fact that the utterances requiring context for translation are relatively few, it is practically acceptable for dialogue machine translation.

5 Experiments and Results

In order to test the proposed model, we used 70 dialogues recorded in real fields such as hotel reservation and airline reservation. These 70 dialogues consist of about 1,700 utterances, 8,319 words in total. Each utterance in the dialogues was annotated with speech acts (SA) and with discourse structure information (DS). DS is an index that represents the hierarchical structure of discourse. Table 2 shows the distribution of speech acts in this dialogue corpus. The following shows a part of an annotated dialogue corpus.

/SP/hotel
/KS/Etten pangul wenhasipnikka?
/ES/What kind of room do you want?
/SA/ask-ref
/DS/[1]

/SP/customer
/KS/Etten pangi isssupnikka?
/ES/What kind of room do you have?
/SA/ask-ref
/DS/[1,1]

Table 2: The distribution of speech acts in corpus

Speech Act Type    Ratio    Speech Act Type    Ratio
ask-ref                     ask-if
inform                      response
request-confirm             request-action
suggest                     confirm
accept                      reject
correct                     promise
expressive                  greeting
good-bye                    Total

We test two models in order to verify the efficiency of the proposed model. Model-I is the model based on linear recency, where an utterance $U_i$ is always connected to the previous utterance $U_{i-1}$. Model-II is the model based on hierarchical recency. Table 3 shows the average accuracy of the two models.

Table 3: Experimental results

Top candidates    1          2          3          4
Model I           68.48 %    74.57 %    76.09 %    76.30 %
Model II          78.59 %    92.82 %    97.88 %    99.06 %

Accuracy figures shown in table 3 are computed by counting utterances that have a correct speech act and a correct discourse relation. In the closed experiments, Model-I achieved 68.48 % accuracy for the top candidate and 76.30 % for the top four candidates. In contrast, the proposed model, Model-II, achieved 78.59 % accuracy for the top candidate and 99.06 % for the top four candidates. Errors in Model-I occurred because the hierarchical structure of dialogues was not considered. Although the dialogue corpus is relatively small, the experimental results showed that the proposed model is efficient for analyzing dialogues.
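
The top-n accuracy reported in table 3 can be computed along the following lines. This is a minimal Python sketch: the function name, the representation of an analysis as a (speech act, attachment index) pair, and the three example utterances are assumptions made for illustration and are not taken from the corpus.

```python
def top_n_accuracy(ranked_candidates, gold, n):
    """Fraction of utterances whose gold analysis is among the first n candidates."""
    if not gold or len(ranked_candidates) != len(gold):
        raise ValueError("need one non-empty candidate list per gold analysis")
    hits = sum(
        1 for ranked, answer in zip(ranked_candidates, gold) if answer in ranked[:n]
    )
    return hits / len(gold)


# Hypothetical system output for three utterances. Each analysis pairs a
# speech act with the index of the utterance it attaches to, so a hit
# requires both the speech act and the discourse relation to be correct.
RANKED = [
    [("ask-ref", 1), ("request-conf", 1)],
    [("ask-if", 2), ("ask-ref", 2)],
    [("response", 2), ("inform", 0)],
]
GOLD = [("ask-ref", 1), ("ask-ref", 2), ("inform", 3)]

TOP_1 = top_n_accuracy(RANKED, GOLD, 1)
TOP_2 = top_n_accuracy(RANKED, GOLD, 2)
```

In this toy data the third utterance never counts as a hit, because the speech act inform is proposed with the wrong attachment point.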

6 Conclusions

In this paper, we described an efficient dialogue analysis model with statistical speech act processing. We proposed a statistical method to decide the speech act of a sentence and to maintain a discourse structure. This model uses the surface syntactic patterns of the sentence and the N-gram of speech acts of the sentences which are discourse-structurally recent to the sentence. Our experimental results with trigrams showed that the proposed model achieved 78.59 % accuracy for the top candidate and 99.06 % for the top four candidates although the size of the training corpus is relatively small. This model is weaker than a dialogue analysis model which uses many different sources of knowledge. However, it is more efficient and robust, and easy to scale up. We believe that this kind of statistical approach can be integrated with other approaches for an efficient and robust analysis of dialogues.

References

Hwan Jin Choi, Young Hwan Oh, 1996, "Analysis of Intention in Spoken Dialogue Based on Learning of Intention Dependent Sentence Patterns", Journal of Korea Science Information Society, Vol.23, No.8, pp.862-870, In Korean.

Jae-woong Choe, 1996, "Some Issues in Conversational Analysis : Telephone Conversations for Hotel Reservation", In Proc. of Hangul and Korean Language Information Processing, pp.7-16, In Korean.

James F. Allen, C. Raymond Perrault, 1980, "Analyzing Intention in Utterances", Artificial Intelligence, Vol.15, pp.143-178.

Elizabeth A. Hinkelman, James F. Allen, 1989, "Two Constraints on Speech Act Ambiguity", In Proc. of the 27th Annual Meeting of the ACL, Association of Computational Linguistics, pp.212-219.

Barbara J. Grosz, Candace L. Sidner, 1986, "Attention, Intentions, and the Structure of Discourse", Computational Linguistics, Vol.12, No.3, pp.175-204.

Philip R. Cohen, C. Raymond Perrault, 1979, "Elements of a Plan-Based Theory of Speech Acts", Cognitive Science, Vol.3, pp.177-212.

Diane J. Litman, James F. Allen, 1987, "A Plan Recognition Model for Subdialogues in Conversations", Cognitive Science, Vol.11, pp.163-200.

Hiroaki Kitano, 1994, "Speech-to-Speech Translation : A Massively Parallel Memory-Based Approach", Kluwer Academic Publishers.

Jan Alexandersson, Elisabeth Maier, Norbert Reithinger, 1994, "A Robust and Efficient Three-Layered Dialogue Component for a Speech-to-Speech Translation System", Proc. of the 7th European Association for Computational Linguistics, pp.188-193.

Jin Ah Kim, Young Hwan Cho, Jae-won Lee, Gil Chang Kim, 1995, "A Response Generation in Dialogue System based on Dialogue Flow Diagrams", Natural Language Processing Pacific Rim Symposium, pp.634-639.

Jungyun Seo, Jae-won Lee, Jae-Hoon Kim, Jeong-Mi Cho, Chang-Hyun Kim, and Gil Chang Kim, 1994, "Dialogue Machine Translation Using a Dialogue Model", Proc. of China-Korea Joint Symposium on Machine Translation, pp.55-63.

Masaaki Nagata and Tsuyoshi Morimoto, 1994, "First Steps towards Statistical Modeling of Dialogue to Predict the Speech Act Type of the Next Utterance", Speech Communication, Vol.15, pp.193-203.

Masako Kume, Gayle K. Sato, Kei Yoshimoto, 1990, "A Descriptive Framework for Translating Speaker's Meaning", Proc. of the 4th European Association for Computational Linguistics, pp.264-271.

Marilyn Walker and Steve Whittaker, 1990, "Mixed Initiative in Dialogue : An Investigation into Discourse Segmentation", In Proc. of the 28th Annual Meeting of the ACL, Association of Computational Linguistics, pp.70-78.

Sandra Carberry, 1989, "A Pragmatics-Based Approach to Ellipsis Resolution", Computational Linguistics, Vol.15, No.2, pp.75-96.

Toine Andernach, 1996, "A Machine Learning Approach to the Classification of Dialogue Utterances", Proceedings of NeMLaP-2, Bilkent University, Turkey.

Marilyn A. Walker, 1996, "Limited Attention and Discourse Structure", Computational Linguistics, Vol.22, No.2, pp.255-264.

W. A. Woods, 1970, "Transition Network Grammars for Natural Language Analysis", Commun. of the ACM, Vol.13, pp.591-606.
