Analysis of Spoken Dialog Systems
A Project Report for the degree of
Bachelor of Technology
in
Computer Science
Submitted by
Twinkle, Tanpreet and Tanya
CERTIFICATE
This is to certify that the Project entitled "Analysis of Spoken Dialog Systems", submitted to the Department of Computer Science, IP College for Women, University of Delhi by Twinkle, Tanpreet and Tanya, is a satisfactory and authentic piece of work carried out under my supervision and guidance. This work has been done in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science. The matter embodied in this Project Report is genuine work done by the students and has not been submitted to this university or to any other university or institute for the fulfillment of the requirements of any other course of study.
Date:
ABSTRACT
Human-machine spoken dialog differs from written dialog primarily due to the limitations of current speech recognition systems and the intrinsic structure of spoken language dialog. Speech recognition systems have limitations that may be explained by the non-deterministic character of the recognition process, including difficulties in accounting for short and degraded messages. We focus on the problem of modeling and evaluating spoken language systems in the context of human-machine dialogs. The intrinsic characteristics of spoken dialog include the spontaneity of utterances, which yields a significant amount of redundant information, repetitions, self-corrections, hesitations, contradictions, and even tendencies to interrupt the interlocutor. They also include the non-grammatical structure of human utterances. Finally, they include clarification and/or reformulation sub-dialogs that depend on the limitations of the speech recognizer. This project report describes an ambitious project that embeds human subjects in a spoken dialog system and collects a rich data set including spoken dialog, human behavior and system features.
This project lays out the analysis of three spoken dialog systems and examines how people manage the problems that arise in dialog under such restrictions.
Contents
2 INTRODUCTION 6
2.1 WHY IS THE PARTICULAR TOPIC CHOSEN? . . . . . . . . . . . 7
2.2 OBJECTIVE AND SCOPE . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 METHODOLOGY: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4 WHAT CONTRIBUTION WOULD THE PROJECT MAKE? . . . . 8
3 OVERALL DESCRIPTION 9
3.1 NON-FUNCTIONAL REQUIREMENTS . . . . . . . . . . . . . . . . 9
4 WORK DONE 10
4.1 Spoken Dialogue for Intelligent Tutoring Systems . . . . . . . . . . . 10
4.2 Predictive Performance Modeling . . . . . . . . . . . . . . . . . . . . 11
4.3 Monitoring Student State (motivation) . . . . . . . . . . . . . . . . . 12
4.4 Cobot: A Software Agent . . . . . . . . . . . . . . . . . . . . . . . . 16
4.5 An Intelligent Natural Language Conversational System for Academic Advising (INSTAVIS) . . . . . 19
5 FUTURE WORK 20
6 CONCLUSION 21
7 ACKNOWLEDGMENT 23
Chapter 1
DEFINITIONS, ACRONYMS
AND ABBREVIATIONS
The following is a list of conventions and acronyms used in this document and in the project:
• ASR: Automatic Speech Recognition
• TTS: Text-to-Speech
• ITSPOKE: Intelligent Tutoring SPOKEn dialogue system
Chapter 2
INTRODUCTION
2.1 WHY IS THE PARTICULAR TOPIC CHOSEN?
There is increasing interest in building dialogue systems that can detect and adapt
to user affective states. However, while this line of research is promising, there is still
much work to be done. For example, most research has focused on detecting user
affective states, rather than on developing dialogue strategies that adapt to such
states once detected. In addition, when affect-adaptive dialogue systems have been
developed, most systems detect and adapt to only a single user state, and typically
assume that the same affect-adaptive strategy will be equally effective for all users.
2.3 METHODOLOGY:
The analysis considers the following factors:
• Metaphor
• Language
• Utterance length
• Language models
2.4 WHAT CONTRIBUTION WOULD THE PROJECT MAKE?
The overall objective of the project is to support rapid, cost-effective development
of speech-enabled dialogue systems. Current commercial technology for speech-
enabled interfaces has made rapid progress over the past decade. There are increas-
ing numbers of systems deployed in commercial applications that provide structured
system-initiated interaction. These systems work by controlling the conversation,
requesting that the user provide a specific kind of information at each turn. How-
ever, these systems do not yet have true conversational capability. This project will
help us in:
• Building robust systems that can engage in true mixed initiative interaction
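The contrast drawn above, between structured system-initiated interaction and true mixed initiative, can be sketched as a minimal slot-filling dialog manager. The slot names and prompts below are illustrative assumptions, not part of any system described in this report:

```python
# Minimal sketch of a system-initiative, slot-filling dialog manager:
# the system controls the conversation, requesting one specific piece of
# information from the user at each turn. Slot names and prompts are
# invented for illustration.

PROMPTS = {
    "origin": "Where are you leaving from?",
    "destination": "Where are you going?",
    "date": "What day do you want to travel?",
}

def next_prompt(slots):
    """Return the prompt for the first unfilled slot, or None when done."""
    for name in PROMPTS:
        if slots.get(name) is None:
            return PROMPTS[name]
    return None

def update(slots, slot_name, value):
    """Fill exactly one slot per user turn (system-initiative: the user
    may only answer the question that was just asked)."""
    filled = dict(slots)
    filled[slot_name] = value
    return filled
```

A mixed-initiative system would instead let a single user utterance fill several slots at once, ask its own questions, or change topic, which is precisely the capability current structured systems lack.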
Chapter 3
OVERALL DESCRIPTION
3.1 NON-FUNCTIONAL REQUIREMENTS
• Performance Requirements:
– The system shall accommodate a large number of concurrent users without faults.
– Responses to view information shall take no longer than 5 seconds to appear on the screen.
• Safety Requirements:
– System use shall not cause any harm to human users.
Chapter 4
WORK DONE
4.1 Spoken Dialogue for Intelligent Tutoring Systems
The following excerpt contrasts a student who self-explains with one who does not:
– TUTOR: The right side pumps blood to the lungs, and the left side
pumps blood to the other parts of the body. Could you explain how that
works?
– STUDENT 1 (self-explains): So the septum is a divider so that the blood
doesn’t get mixed up. So the right side is to the lungs, and the left side
is to the body. So the septum is like a wall that divides the heart into
two parts...it kind of like separates it so that the blood doesn’t get mixed
up...
– STUDENT 2 (doesn't self-explain): right side pumps blood to lungs
4.2 Predictive Performance Modeling
Opportunity :
Spoken dialogue system evaluation methodologies can improve our understanding of how dialogue facilitates student learning.
• Train a model via multiple linear regression over dialogue parameters p_i, predicting performance as a weighted sum: System Performance = Σi wi · pi, where the weights wi are learned by the regression.
Challenges :
• System Performance
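The weighted-sum model above can be fit by ordinary least squares. In the sketch below, the dialogue parameters (task success, ASR word error rate, number of turns) and all data values are invented for illustration; they are not from the systems analyzed in this report:

```python
import numpy as np

# Each row holds hypothetical per-dialogue parameters p_i:
# [task success, ASR word error rate, number of turns]
X = np.array([
    [1.0, 0.10, 12.0],
    [0.5, 0.30, 20.0],
    [0.9, 0.15, 15.0],
    [0.2, 0.40, 25.0],
])
# Synthetic observed performance scores for the same dialogues.
y = np.array([0.87, 0.45, 0.795, 0.21])

# Fit the weights w_i so that performance ~= sum_i w_i * p_i.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
predicted = X @ w
```

Once fit, the weights indicate which dialogue parameters contribute most to (or detract most from) predicted performance, which is the point of this style of evaluation.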
4.3 Monitoring Student State (motivation)
The following excerpt shows tutor turns alongside student turns, with the (often erroneous) ASR hypotheses in parentheses:
• Tutor26: Let me ask my question in a more general way. What happens to
the velocity of a body when there is no force acting on it?
• Tutor32: Well... OK, let's consider how forces in the vertical direction affect the motion of a body in a horizontal direction, in a different scenario.
• Student33: oh gravity you already said this (ASR: beats gravity you’re exert
less)
• Tutor34: Fine. Are there any other forces acting on the apple as it falls?
• Student35: no why are you doing this again (ASR: no y and to it yes)
• Tutor36: Correct. The only force acting on the apple as it falls to the earth
is the earth’s gravitational force. In which direction does gravitational force
act?
Challenge 1: What emotions to detect?
Negative: Angry, Annoyed, Frustrated, Bored, Confused, Uncertain, Con-
tempt, Disgusted, Sad
• ITSPOKE: What else do you need to know to find the box's acceleration?
• ITSPOKE: Good job. Say there is only one force acting on the box. How is this force, the box's mass, and its acceleration related?
• Student: velocity [UNCERTAIN]
– ITSPOKE: What is the net force acting on the truck equal to?
– Student: I don't know [Uncertain, Frustrated]
– ITSPOKE: Since net force is defined as the vector sum of all the forces
exerted on an object, let’s first find the forces acting on the truck. Try
to name them.
– Student: The impact force, the normal force, and gravity [Certain, Neu-
tral]
– ITSPOKE: Yep.
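One shallow way to approximate the [UNCERTAIN] annotations above is lexical hedge detection; a deployed detector, like the system described here, would also rely on prosodic and acoustic features. The hedge list below is an illustrative assumption:

```python
import re

# Shallow sketch of lexical uncertainty detection over student turns.
# The hedge phrases are an illustrative assumption; real detectors also
# use prosodic/acoustic evidence from the speech signal.
HEDGES = ("i don't know", "i dont know", "maybe", "i guess",
          "i think", "possibly", "um", "not sure")

def is_uncertain(utterance: str) -> bool:
    """Flag a turn as uncertain if it contains a lexical hedge.
    Word-boundary matching avoids false hits like 'um' in 'number'."""
    text = utterance.lower()
    return any(re.search(r"\b" + re.escape(h) + r"\b", text)
               for h in HEDGES)
```

Detection like this is only the first half of the problem the section raises; the harder half is choosing an adaptive tutoring response once the state is detected.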
Discourse Structure:
Findings:
• Statistically significant dependencies exist between a student's state of certainty and the responses of an expert human tutor.
• Learning opportunities (e.g. uncertain and incorrect student states) have more speech recognition problems.
– However, speech recognition problems are not negatively correlated with learning.
4.4 Cobot: A Software Agent
• CobotDS provides real-time, two-way, natural language communication be-
tween a phone user and the multiple users in the text environment.
• We describe a number of the challenging design issues we faced, and our use of
summarization, social filtering and personalized grammars in tackling them.
We report a number of empirical findings from a small user study.
• Cobot is one of the most popular LambdaMOO residents, and both chats with
human users, and provides them with social statistics summarizing their usage
of verbs and interactions with other users (such as who they interact with, who
are the most popular users, and so on).
• To support conversation, CobotDS passes messages and verbs from the phone user to LambdaMOO users (via automatic speech recognition, or ASR), and from LambdaMOO to the phone user (via text-to-speech, or TTS).
An example exchange between two LambdaMOO users:
• U1 waves to U2.
• U1 comforts U2.
• U1 [to U2]: Remember, the mighty oak was once a nut like you.
• U2 [to U1]: Right, but his personal growth was assured. Thanks anyway,
though.
• U1 feels better now.
Calling Cobot
• Provided a dozen or so friendly LambdaMOO users with access to a toll-free CobotDS number.
• Users call in with their LambdaMOO user name and a numeric password, then enter the main CobotDS command loop.
Personalization of Grammars
• The phone user could change the grammar in use through the grammar command and a short subdialogue.
Basic Phone Commands
• Statements of whereabouts
• Listen command:
– Provides the phone user a richer view and allows passive participation
– The phone user has no scrollback
– The pace of activity can quickly outrun the TTS rate, so activity is filtered, including via social rules
• Summarize command
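The report does not give CobotDS's actual filtering rules, so the following is a hypothetical sketch of the idea behind the listen command's social filtering: prefer events involving the caller's friends, then cap how many events are spoken so the TTS channel can keep pace. Function and field names are invented:

```python
# Hypothetical sketch of CobotDS-style activity filtering (the real
# rules are not given in this report): prioritize events whose actor
# is a friend of the phone user, then truncate so the TTS rate is not
# outrun by the pace of LambdaMOO activity.

def filter_activity(events, friends, max_spoken=3):
    """events: list of (actor, description) tuples, in arrival order.
    friends: set of user names the caller cares about.
    Returns at most max_spoken events, friends' events first."""
    # sorted() is stable, so arrival order is preserved within each group.
    prioritized = sorted(events, key=lambda e: e[0] not in friends)
    return prioritized[:max_spoken]
```

Truncation here is the crudest possible summarization; the summarize command described above would instead compress the dropped events rather than discard them.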
4.5 An Intelligent Natural Language Conversational System for Academic Advising (INSTAVIS)
• Academic advisors assist students in academic, professional, social and per-
sonal matters. Successful advising increases student retention, improves grad-
uation rates and helps students meet educational goals.
• This is an advising system that assists advisors in multiple tasks using natural
language.
• The system is operational for several hundred students from a university de-
partment.
Chapter 5
FUTURE WORK
Our results contribute to the growing body of literature demonstrating the utility of adding fully automated affect adaptation to existing spoken dialogue systems. In future work we will examine performance measures other than learning, and will manually annotate true disengagement and uncertainty in order to group students by amount of disengagement. Our results also contribute to the literature suggesting that gender effects should be considered when designing dialogue systems; however, further research is needed to determine more effective combinations of disengagement and uncertainty adaptations for both males and females, and to investigate whether gender differences are related to other measurable user factors.
As further future work, we will integrate statistical measurements from the log file contents and include indirect evaluations by the constituencies. The evaluation data will offer advisors a documented assessment of the areas of advising that most concern students. Additional work includes adding an expert system for academic enrollment planning, a mechanism to forward selected conversations to advisors, and allowing users to add lexical definitions.
Chapter 6
CONCLUSION
The analysis of the three spoken dialog systems highlights three recurring themes:
• Performance Evaluation
• Affective Reasoning
• Discourse Analysis
Chapter 7
ACKNOWLEDGMENT
Thank you.
(April, 2017)
IP College for Women