Artificial Intelligence
Natural Language Processing
Dr Alexiei Dingli

Aims of NLP?
- Trying to make computers talk
- Give computers the linguistic abilities of humans


1940s - 1950s
- Turing's (1936) model of algorithmic computation
- McCulloch-Pitts neuron (McCulloch and Pitts, 1943): a simplified model of the neuron as a kind of computing element (propositional logic)
- Kleene (1951, 1956): finite automata and regular expressions

- Shannon (1948): probabilistic models of discrete Markov processes applied to automata for language
- Chomsky (1956): finite-state machines as a way to characterize a grammar

1940s - 1950s
Speech and language processing
- Shannon:
  - the metaphor of the noisy channel
  - entropy as a way of measuring the information capacity of a channel
- Foundational research in phonetics
- First machine speech recognizers (early 1950s)
  - 1952, Bell Labs: a statistical system that could recognize any of the 10 digits from a single speaker (Davis et al., 1952)


1940s - 1950s
Machine translation (MT): one of the earliest applications of computers
- Major attempts in the US and USSR: Russian to English and the reverse
- The Georgetown University, Washington system translated sample texts in 1954

The ALPAC report (1964)
- Assessed the research results of groups working on MT
- Concluded:
  - MT is not possible in the near future
  - Funding for MT should cease!
  - Basic research should be supported
  - Word-to-word translation does not work; linguistic knowledge is needed

1950s - 1970s Symbolic paradigm


- Formal language theory and generative syntax
- 1957: Noam Chomsky's Syntactic Structures
  - A formal definition of grammars and languages
  - Provides the basis for automatic syntactic processing of NL expressions
- 1967: Woods' procedural semantics
  - A procedural approach to the meaning of a sentence
  - Provides the basis for automatic semantic processing of NL expressions


1950s - 1970s Symbolic paradigm


Parsing algorithms
- Top-down and bottom-up; dynamic programming
Transformations and Discourse Analysis Project (TDAP)
- Harris, 1962
- Reimplemented as a cascade of finite-state transducers by Joshi and Hopely (1999) and Karttunen (1999)

1950s - 1970s Symbolic paradigm


- AI summer of 1956: John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester
  - Work on reasoning and logic
- Newell and Simon: the Logic Theorist and the General Problem Solver
- Early natural language understanding systems
  - Single domains
  - Combination of pattern matching and keyword search
  - Simple heuristics for reasoning and question answering
- Late 1960s: more formal logical systems



1950s - 1970s Statistical paradigm


- Bayesian methods applied to the problem of optical character recognition
- Bledsoe and Browning (1959): Bayesian text recognition
  - Uses a large dictionary
  - Computes the likelihood of each observed letter sequence given each word in the dictionary, multiplying the likelihoods for each letter
- Mosteller and Wallace (1964): Bayesian methods applied to the problem of authorship attribution on The Federalist Papers

- Testable psychological models of human language processing based on transformational grammar
- Resources
  - First online corpora: the Brown corpus of American English
  - DOC (Dictionary on Computer), an on-line Chinese dialect dictionary

Symbolic vs statistical approaches

Symbolic:
- Based on hand-written rules
- Requires linguistic expertise
- No frequency information
- More brittle and slower than statistical approaches
- Often more precise than statistical approaches
- Error analysis is usually easier than for statistical approaches

Statistical:
- Supervised or unsupervised
- Rules acquired from large corpora
- Not much linguistic expertise required
- Robust and quick
- Requires large (annotated) corpora
- Error analysis is often difficult

1970-1983 Statistical paradigm


- Speech recognition algorithms
- Hidden Markov models (HMMs) and the metaphors of the noisy channel and decoding
- Jelinek, Bahl, Mercer, and colleagues at IBM's Thomas J. Watson Research Center; Baker at Carnegie Mellon University


1970-1983 Logic-based paradigm


- Q-systems and metamorphosis grammars (Colmerauer, 1970, 1975)
- Definite Clause Grammars (Pereira and Warren, 1980)
- Functional Grammar (Kay, 1979)
- Lexical Functional Grammar (LFG) (Bresnan and Kaplan, 1982)

1970-1983 Natural Language Understanding


- SHRDLU: simulated a robot embedded in a world of toy blocks (Winograd, 1972a)
  - Accepts natural-language text commands: "Move the red block on top of the smaller green one"
  - Notable for its complexity and sophistication
- First attempt to build an extensive (for the time) grammar of English (based on Halliday's systemic grammar)

1970-1983 Natural Language Understanding


- Yale School: a series of language understanding programs
  - Conceptual knowledge (scripts, plans, goals, ...)
  - Human memory organization
  - Network-based semantics (Quillian, 1968)


1983-1993
Return of finite-state models
- Finite-state phonology and morphology (Kaplan and Kay, 1981)
- Finite-state models of syntax (Church, 1980)

Return of empiricism
- Probabilistic models throughout speech and language processing
  - IBM Thomas J. Watson Research Center: probabilistic models of speech recognition
- Data-driven approaches spread from speech to part-of-speech tagging, parsing, attachment ambiguities, and semantics
- New focus on model evaluation
- Considerable work on natural language generation



1994-1999
Major changes
- Probabilistic and data-driven models had become quite standard: parsing, part-of-speech tagging, reference resolution, and discourse processing algorithms incorporate probabilities
- Evaluation methodologies borrowed from speech recognition and information retrieval
- Commercial exploitation (speech recognition, spelling and grammar correction)
- Need for language-based information retrieval and information extraction
- Increases in the speed and memory of computers
- Rise of the Web


1994-1999 Resources and corpora


- Disk space becomes cheap
- Machine-readable text becomes common
- US funding emphasises large-scale evaluation on real data
- 1994: The British National Corpus is made available, a balanced corpus of British English
- Mid 1990s: WordNet (Fellbaum & Miller), a computational thesaurus developed by psycholinguists
- The World Wide Web used as a corpus



2000-2008 Empiricist trends 1


- Spoken and written material widely available
  - Linguistic Data Consortium (LDC), ...
  - Annotated collections (standard text sources with various forms of syntactic, semantic, and pragmatic annotations): Penn Treebank (Marcus et al., 1993), PropBank (Palmer et al., 2005), TimeBank (Pustejovsky et al., 2003b), ...
- More complex traditional problems became cast as supervised machine learning: parsing and semantic analysis
- Competitive evaluations
  - Parsing (Dejean and Tjong Kim Sang, 2001)
  - Information extraction (NIST, 2007a; Tjong Kim Sang, 2002; Tjong Kim Sang and De Meulder, 2003)
  - Word sense disambiguation (Palmer et al., 2001; Kilgarriff and Palmer, 2000)
  - Question answering (Voorhees and Tice, 1999)
  - Summarization (Dang, 2006)


2000-2008 Empiricist trends 2


More serious interplay with the statistical machine learning community
- Support vector machines (Boser et al., 1992; Vapnik, 1995)
- Maximum entropy techniques (multinomial logistic regression) (Berger et al., 1996)
- Graphical Bayesian models (Pearl, 1988)

2000-2008 Empiricist trends 2


Largely unsupervised statistical approaches
- Statistical approaches to machine translation (Brown et al., 1990; Och and Ney, 2003)
- Topic modeling (Blei et al., 2003)
- Effective applications could be constructed from systems trained on unannotated data alone
- Use of unsupervised techniques

Elements of a Language
- Phonemes
- Morphemes
- Syntax
- Semantics


From sounds to language


- Linked with language understanding; carried out by the auditory cortex
- The basic sounds of language are Phonemes ("sound")
  - The smallest phonetic unit in a language capable of conveying a distinction in meaning
  - Every language has a discrete set of phonemes describing all its possible sounds
  - E.g. "m" in "man" and "c" in "can" are phonemes
- The basic units of words are Morphemes ("to change form")
  - A meaningful linguistic unit consisting of a root word or a word element that cannot be divided into smaller meaningful parts
  - E.g. "pick" and "s" in the word "picks" are morphemes

NATO Phonetic Alphabet


A - Alpha     B - Bravo     C - Charlie   D - Delta
E - Echo      F - Foxtrot   G - Golf      H - Hotel
I - India     J - Juliet    K - Kilo      L - Lima
M - Mike      N - November  O - Oscar     P - Papa
Q - Quebec    R - Romeo     S - Sierra    T - Tango
U - Uniform   V - Victor    W - Whiskey   X - X-ray
Y - Yankee    Z - Zulu

0 - Zero          1 - Wun (One)    2 - Two    3 - Tree (Three)  4 - Fower (Four)
5 - Fife (Five)   6 - Six          7 - Seven  8 - Ait (Eight)   9 - Niner (Nine)

. - Decimal (point)    . - (Full) stop
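As a quick illustration, a small Python sketch that spells a word using the letters of the alphabet above (the function name is mine, not from the slides; digits and punctuation could be added to the table the same way):

NATO = {
    "A": "Alpha", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliet",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}

def spell_nato(word):
    # Look each letter up in the table, skipping anything not in it
    return " ".join(NATO[c] for c in word.upper() if c in NATO)

print(spell_nato("NLP"))  # November Lima Papa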


Exercise
Word      Morphemes    Phonemes
Bay       Bay (1)      B + ay (2)
Pots      ?            ?
A         ?            ?
Teacher   ?            ?


Exercise
Word      Morphemes        Phonemes
Bay       Bay (1)          B + ay (2)
Pots      Pot + s (2)      P + o + t + s (4)
A         A (1)            A (1)
Teacher   Teach + er (2)   T + ea + ch + e + r (5)


Syntax structure of language


Languages have structure:
- Not all sequences of words over the given alphabet are valid
- When a sequence of words is valid (grammatical), a natural structure can be induced on it


Syntax
Describes the constituent structure of NL expressions:
  (I (am sorry)), Dave, (I ((can't do) that))
- Grammars are used to describe the syntax of a language
- Syntactic analysers and surface realisers assign a syntactic structure to a string / semantic representation on the basis of a grammar

Syntax
It is useful to think of this structure as a tree:
- It represents the syntactic structure of a string according to some formal grammar
- The interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals

Syntax tree example


The tree for "John often gives a book to Mary":

(S (NP John)
   (VP (Adv often)
       (V gives)
       (NP (Det a) (N book))
       (PP (Prep to) (NP Mary))))
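If you want to experiment with such trees, the bracketed form above can be loaded and drawn with NLTK (assuming NLTK is installed; this snippet is my own illustration, not from the lecture):

from nltk import Tree  # pip install nltk

t = Tree.fromstring(
    "(S (NP John)"
    " (VP (Adv often) (V gives)"
    " (NP (Det a) (N book))"
    " (PP (Prep to) (NP Mary))))"
)
t.pretty_print()   # draws the constituency tree as ASCII art
print(t.leaves())  # ['John', 'often', 'gives', 'a', 'book', 'to', 'Mary']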


Methods in syntax
Words → syntactic tree
- Algorithm: parser
- Resources used: lexicon + grammar
  - Symbolic: hand-written grammar and lexicon
  - Statistical: grammar acquired from a treebank
- A parser checks for correct syntax and builds a data structure
- Difficulty: coverage and ambiguity

Treebank: a text corpus in which each sentence has been annotated with syntactic structure. The structure is commonly represented as a tree, hence the name treebank.
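A minimal symbolic parser in this style, using NLTK with a hand-written toy grammar (the grammar is illustrative, not the lecture's):

import nltk  # pip install nltk

# Hand-written toy grammar and lexicon (the symbolic approach above)
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'John' | 'Mary' | Det N
VP -> V NP PP
PP -> P NP
Det -> 'a'
N -> 'book'
V -> 'gives'
P -> 'to'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("John gives a book to Mary".split()):
    tree.pretty_print()  # one tree per parse; an ambiguous string yields several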

Syntax applications
For spell checking:
- *Its a fair exchange → no syntactic tree
- It's a fair exchange → OK, syntactic tree

To construct the meaning of a sentence
To generate a grammatical sentence



Syntax to meaning
John loves Mary → love(j, m)
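A toy sketch of how such a mapping might be computed compositionally (the representation and names are my own simplification, not the lecture's formalism):

# Each word denotes either a constant or a function; meanings combine
# up the syntax tree to yield the logical form love(j,m).
lexicon = {
    "John": "j",
    "Mary": "m",
    "loves": lambda subj, obj: f"love({subj},{obj})",
}

def interpret(subject, verb, obj):
    # S -> NP VP: apply the verb's meaning to the two NP meanings
    return lexicon[verb](lexicon[subject], lexicon[obj])

print(interpret("John", "loves", "Mary"))  # -> love(j,m)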


Semantics
"Where the hell'd you get that idea, HAL?"
"Dave, although you took thorough precautions in the pod against my hearing you, I could see your lips move."


Lexical semantics - Meaning of words

To get:
1. come to have or hold; receive
2. succeed in attaining, achieving, or experiencing; obtain
3. experience, suffer, or be afflicted with
4. move in order to pick up, deal with, or bring
5. bring or come into a specified state or condition
6. catch, apprehend, or thwart
7. come or go eventually or with some difficulty
8. move or come into a specified position or state
...

An idea:
1. a thought or suggestion about a possible course of action
2. a mental impression
3. a belief
4. (the idea) the aim or purpose

The hell:
1. a place regarded in various religions as a spiritual realm of evil and suffering, often depicted as a place of perpetual fire beneath the earth to which the wicked are sent after death
2. a state or place of great suffering
3. a swear word that some people use when they are annoyed or surprised

Lexical semantics

Who is the master?
- Context?
- Semantic relations?


Compositional semantics

Where the hell did you get that idea?
- "the hell": a swear word that some people use when they are annoyed or surprised, or to emphasize something
- "get that idea": have this belief


Semantics issues in NLP


- Definition and representation of meaning
- Meaning construction
- Semantic relations
- Interaction between semantics and syntax


Pragmatics
Knowledge about the kind of actions that speakers intend by their use of sentences:
- REQUEST: HAL, open the pod bay door.
- STATEMENT: HAL, the pod bay door is open.
- INFORMATION QUESTION: HAL, is the pod bay door open?

Speech act analysis (politeness, irony, greeting, apologizing, ...)



Discourse
"Where the hell'd you get that idea, HAL?"
"Dave and Frank were planning to disconnect me."

Much of language interpretation is dependent on the preceding discourse/dialogue.

Linguistic knowledge in NLP - summary

- Phonetics and phonology: knowledge about linguistic sounds
- Morphology: knowledge of the meaningful components of words
- Syntax: knowledge of the structural relationships between words
- Semantics: knowledge of meaning
- Pragmatics: knowledge of the relationship of meaning to the goals and intentions of the speaker
- Discourse: knowledge about linguistic units larger than a single utterance

Ambiguity
I made her duck
- I cooked duck for her.
- I cooked the duck belonging to her.
- I caused her to quickly lower her head or body.


Ambiguity
Sound-to-text issues:
- "Recognise speech" vs "Wreck a nice beach"

Speech act interpretation
- Can you switch on the computer?
- Question or request?


Ambiguity vs paraphrase
- Ambiguity: the same sentence can mean different things
- Paraphrase: there are many ways of saying the same thing
  - Beer, please.
  - Can I have a beer?
  - Give me a beer, please.
  - I would like beer.
  - I'd like a beer, please.


Applications of NLP
- Information Extraction (IE)
- Information Retrieval (IR)
- Question Answering (QA)
- Dialogue Systems


What is Question Answering?


The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents. As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important.


Question Types (1)


Clearly there are many different types of questions:
- When was Mozart born?
  - Requires a single fact as an answer
  - The answer may be found verbatim in text, i.e. "Mozart was born in 1756"
- How did Socrates die?
  - Finding an answer may require reasoning
  - In this example "die" has to be linked with drinking poisoned wine

Question Types (2)


- How do I assemble a bike?
  - The full answer may require fusing information from many different sources
  - The complexity can range from simple lists to script-based answers
- Is the Earth flat?
  - Requires a simple yes/no answer


Evaluating QA Systems
- The biggest independent evaluations of question answering systems have been carried out at TREC (the Text REtrieval Conference)
- Five hundred factoid questions are provided, and the groups taking part have a week in which to process the questions and return one answer per question
- No changes are allowed to your system between the time you receive the questions and the time you submit the answers

A Generic QA Framework
[Diagram: Questions → Search Engine (over the Document Collection) → top n documents → Document Processing → Answers]

- A search engine is used to find the n most relevant documents in the document collection
- These documents are then processed with respect to the question to produce a set of answers, which are passed back to the user
- Most of the differences between question answering systems are centred around the document processing stage; a toy end-to-end version is sketched below
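A runnable toy of this framework; the term-overlap retrieval and capitalised-token "processing" below are crude stand-ins for real components, invented for illustration:

def search_engine(question, docs, top_n=2):
    # IR step: rank documents by word overlap with the question
    q_terms = set(question.lower().split())
    return sorted(docs, key=lambda d: -len(q_terms & set(d.lower().split())))[:top_n]

def process_documents(question, docs):
    # Stand-in for the system-specific answer-extraction stage:
    # collect capitalised tokens as candidate answers
    return [tok for d in docs for tok in d.split() if tok[0].isupper()]

docs = ["Mozart was born in Salzburg", "Beethoven was born in Bonn"]
top = search_engine("where was Mozart born", docs)
print(process_documents("where was Mozart born", top))
# ['Mozart', 'Salzburg', 'Beethoven', 'Bonn'] -- candidates, not yet ranked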


A Simplified Approach
- The answers to the majority of factoid questions are easily recognised named entities, such as countries, cities, dates, people's names, etc.
- The relatively simple techniques of gazetteer lists and named entity recognisers allow us to locate these entities within the relevant documents, the most frequent of which can be returned as the answer
- This leaves just one issue that needs solving: how do we know, for a specific question, what the type of the answer should be?


A Simplified Approach (1)


The simplest way to determine the expected type of an answer is to look at the words which make up the question:
- who suggests a person
- when suggests a date
- where suggests a location


A Simplified Approach (2)


Clearly this division does not account for every question, but it is easy to add more complex rules:
- country suggests a location
- how much suggests an amount of money
- author suggests a person
- birthday suggests a date
- college suggests an organization

These rules can be easily extended as we think of more questions to ask; a minimal version is sketched below.
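A minimal sketch of such rules as code; the rule list mirrors the slides, and everything else is illustrative:

# More specific patterns are listed first so they win over bare "who"/"when".
RULES = [
    ("how much", "MONEY"),
    ("birthday", "DATE"),
    ("author", "PERSON"),
    ("country", "LOCATION"),
    ("college", "ORGANIZATION"),
    ("who", "PERSON"),
    ("when", "DATE"),
    ("where", "LOCATION"),
]

def expected_answer_type(question):
    q = question.lower()
    for keyword, answer_type in RULES:
        if keyword in q:
            return answer_type
    return "UNKNOWN"

print(expected_answer_type("When was Mozart born?"))         # DATE
print(expected_answer_type("Which college did he attend?"))  # ORGANIZATION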



Problems (1)
- The most frequently occurring instance of the right type might not be the correct answer.
  - For example, if you are asking when someone was born, it may be that their death was more notable and hence will appear more often (e.g. John F. Kennedy's assassination).
- There are many questions for which correct answers are not named entities:
  - How did Ayrton Senna die? → in a car crash

Problems (2)
The gazetteer lists and named entity recognisers are unlikely to cover every type of named entity that may be asked about:
- Even those types that are covered may well not be complete
- It is, of course, relatively easy to build new lists, e.g. birthstones

Does a gazetteer of people's names contain all the names?
- Amber
- Precious
- Diamond
- Asia
- Summer
- Holly

Are these people's names?



Dialogue (1)
- A sequence of utterances
- Exchange of information among multiple dialogue participants
- Stays coherent over time
- Driven by a certain goal:
  - finding the most suitable restaurant in a foreign city
  - booking the cheapest flight to a given city
  - controlling the state of the devices in a home
  - or the goal might be the interaction itself (chatting)

Dialogue (2)
- The most natural means of communication for humans; perceived as very expressive, efficient and robust
- However, dialogue is a very complex protocol:
  - participants follow certain conventions or protocols
  - humans usually use their extensive knowledge and reasoning capabilities to understand the conversational partner
  - dialogue utterances are often imperfect: ungrammatical or elliptical

Ellipsis
People often utter partial phrases to avoid repetition:
  A: At what time is Titanic playing?
  B: 8pm
  A: And The 5th Element?

It is necessary to keep track of the conversation to complete such phrases.



Deixis
Some words can only be interpreted in context:
- Previous context (anaphora)
  - The monkey took the banana and ate it.
- Future context (cataphora)
  - Give me that. The book by the lamp.
- Temporal/spatial
  - The man behind me will be dead tomorrow. (Who is the man? When does he die?)

Indirect Meaning
The meaning of a discourse may be far from literal:
  B: I can't reach him.
  A: There is the telephone.
  B: I am not in my office.
  A: Okay.

Undertones and implications are often employed for effect or efficiency.



Turn Taking
People seem to know very well when they can take their turn:
- There is little overlap (5%)
- Gaps are often a few tenths of a second
- Appears fluid, but it is not obvious why

A computational model of overlap does not exist, which causes problems for dialogue systems.


Conversational fillers
Phrases like "a-ha", "yes", "hmm" or "eh" are often uttered to fill the pauses of the conversation, or to indicate attention or reflection. The challenge is to recognize when they should be understood as a request for turn taking and when they should be ignored.

Most common dialogue domains


- Flight and train timetable information and reservation
- Smart homes
- Automated directory enquiries
- Yellow pages enquiries
- Weather information


Components of a Dialogue System


Automatic Speech Recognition


Transforms speech to text. Two basic types:
- Grammar-based ASR
  - The set of accepted phrases is defined by regular/context-free grammars (i.e. a language model in the form of a grammar)
  - Usually speaker independent
- Dictation machine
  - Recognizes any utterance
  - N-gram language model
  - Often speaker dependent
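To make the "N-gram language model" concrete, a tiny bigram sketch (the corpus and the numbers are invented for illustration):

from collections import Counter

corpus = "open the pod bay door please open the door".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1, w2):
    # Maximum-likelihood estimate P(w2 | w1) = count(w1 w2) / count(w1)
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("open", "the"))  # 1.0 in this toy corpus
print(bigram_prob("the", "pod"))   # 0.5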


Natural Language Understanding


Analyzes a textual utterance and returns its formal semantic representation:
- Logical formula
- Named entities
- etc.


Dialogue Manager
- Coordinates the activity of all components
- Maintains a representation of the current state of the dialogue
- Communicates with external applications
- Decides on the next dialogue step


Three types of DM
Finite-state
- dialogue flow determined by a finite-state automaton

Frame-based
- form filling

Plan (task) based
- a dynamic plan is constructed to reach the dialogue goal

In practice, you often find extended versions or combinations of the above approaches!

Finite State Automata
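A minimal finite-state dialogue manager sketch in the spirit of this approach (the states and prompts are invented for illustration, not taken from the slides):

# Each state maps to (system prompt, next state); None ends the dialogue.
STATES = {
    "ask_origin":      ("Where are you travelling from?", "ask_destination"),
    "ask_destination": ("Where are you travelling to?",   "ask_date"),
    "ask_date":        ("On which date?",                 "confirm"),
    "confirm":         ("Shall I book that?",             None),
}

def run_dialogue():
    state, answers = "ask_origin", {}
    while state is not None:
        prompt, next_state = STATES[state]
        answers[state] = input(prompt + " ")  # system initiative: fixed flow
        state = next_state
    return answers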


Frame Based


Plan Based
Takes a problem-solving approach:
- There are goals to be reached
- Plans are made to reach those goals
- The goals and plans of the other participants must be iteratively inferred or predicted

Potential for handling complicated dialogues, but:
- suffers from today's technological limitations
- in more complex cases the planning problem can become computationally intractable

Example: bathroom consultant

Natural Language Generation


Produces a textual utterance (the so-called surface realization) from an internal (formal) representation of the answer. The surface realization can include formatting information:
- Speaking style, pauses
- Background sounds

Text-To-Speech
Transforms the surface realization into an acoustic representation (sound signal).


Typical parameters
Commercial systems:
- small vocabulary (~100 words)
- closed domain
- system initiative

Research systems:
- larger (but still small) vocabulary (~10,000 words)
- closed domain
- (limited) mixed initiative

Different Initiatives
System initiative
- the system always has control; the user only responds to system questions

User initiative
- the user always has control; the system passively answers user questions

Mixed initiative
- control switches between system and user using fixed rules

Variable initiative
- control switches between system and user dynamically, based on participant roles, dialogue history, etc.

Multi Modal Dialogue Systems


Several possible input/output modalities to communicate with dialogue systems:
- speech, text, pointing, graphics, gestures, face configurations, body positions, emotions, etc.

There is no single most convenient modality (different modalities have different advantages):
- entering a day of the week: click on a calendar
- entering a Zip code: use the keyboard
- performing commands: speech
- complex queries: express them as typed natural language

Several modalities are useful:
- when one modality is not applicable, e.g. eyes or hands are busy, or a silent environment
- or when one is difficult to use, e.g. small devices with a limited keyboard and small screen

Case Study
- Eliza
- Comic
- Companions


Eliza
Eliza, the first chatterbot: a computer program that mimics human conversation.
- Joseph Weizenbaum, Massachusetts Institute of Technology
- The user types in natural language; ELIZA then analyzes the user's statement and generates some response, which it types out.
- http://www-ai.ijs.si/eliza/eliza.html

ELIZA Sample conversation


User:  Men are all alike.
ELIZA: IN WHAT WAY?
User:  They're always bugging us about something or other.
ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE?
User:  Well, my boyfriend made me come here.
ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
User:  He says I'm depressed much of the time.
ELIZA: I AM SORRY TO HEAR YOU ARE DEPRESSED
User:  It's true. I am unhappy.
ELIZA: DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
User:  I need some help, that much seems certain.
ELIZA: WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
User:  Perhaps I could learn to get along with my mother.
ELIZA: TELL ME MORE ABOUT YOUR FAMILY


Eliza's strategy Keywords


- The input is read and inspected for the presence of a keyword
- When a keyword is found, the sentence is transformed according to a rule associated with the keyword
- Otherwise a content-free remark or, under certain conditions, an earlier transformation is retrieved
- The text so computed or retrieved is then printed out

ELIZA Example transformations


"I am X" "How long have you been X? I am very unhappy. HOW LONG HAVE YOU BEEN VERY UNHAPPY? "X YOU Y ME" "What makes you think I Y you? It seems that you hate me. WHAT MAKES YOU THINK THAT I HATE YOU?


ELIZA Why does it work?


- ELIZA plays a Rogerian psychotherapist, so it can pretend to know almost nothing of the real world:
  "I went for a long boat ride" - "Tell me about boats"
- It is not a real dialogue (no real communication):
  - ELIZA generates a reaction (often a rather non-committal one) to the user's statement
  - The user cannot ask for information
  - The user cannot issue an order

ELIZA Does it really work?


Eliza: Hello. I am ELIZA. How can I help you?
You:   I went to Paris yesterday.
Eliza: Can you elaborate on that?
You:   I took the train.
Eliza: Can you elaborate on that?
You:   The train was late.
Eliza: Please go on.
You:   All the seats were full.
Eliza: Please go on.
You:   A little boy was having a tantrum.
Eliza: Please go on.

ELIZA Does it really work?


Indexicals (I, you, here, now)
  User:  I am having fun right now.
  Eliza: Does it please you to believe that I am having fun right now.

World knowledge
  User:  I am a human.
  Eliza: How long have you been a human?


When Eliza fails


Yes/no questions
  User:  Are you a good therapist?
  Eliza: Would you prefer it if I were a good therapist?
  User:  I'm wondering about your skills as a therapist.
  Eliza: Does it please you to believe I am wondering about my skills of therapist?

Wh-questions
  User:  Who are you?
  Eliza: Would you prefer it if I were not?
  User:  What do you want me to tell?
  Eliza: Does that question interest you?

Complex assertions
  User:  I don't have a problem. I'm quite happy, I have a girlfriend, I study. Everything is OK!
  Eliza: Why do you not have any problem, I'm quite happy, you have a girlfriend, you study. Everything is OK!

Eliza is not intelligent but...


- Regardless of what people believe or know about the inner workings of computers, they talk about them and interact with them as social entities.
- Weizenbaum (1976) notes that many people continued to believe in ELIZA's abilities even after the program's operation was explained to them.
- People act toward computers as if they were people and expect that computers should be able to understand their needs and be capable of interacting with them naturally.
- Given these predispositions, speech- and language-based systems are not required to be intelligent, but they may provide users with the most natural interface for many applications.

The Comic Avatar


Wizard of Oz


Putting it together


The Companions Architecture


The Companions Robot


The Companions Interface 1


The Companions Interface 2


What is Named Entity Recognition?


Identification of proper names in texts, and their classification into a set of predefined categories of interest:
- Persons
- Organisations (companies, government organisations, committees, etc.)
- Locations (cities, countries, rivers, etc.)
- Date and time expressions
- Various other types as appropriate

Why is NE important?
- NE provides a foundation from which to build more complex IE systems
- Relations between NEs can provide tracking, ontological information and scenario building
- Tracking (co-reference): Dr Head, John, he


Two kinds of approaches


Knowledge engineering
- rule based
- developed by experienced language engineers
- makes use of human intuition
- requires only a small amount of training data
- development can be very time-consuming
- some changes may be hard to accommodate

Learning systems
- use statistics or other machine learning
- developers do not need expertise
- require large amounts of annotated training data
- some changes may require re-annotation of the entire training corpus


Typical NE pipeline
- Pre-processing (tokenisation, sentence splitting, morphological analysis, POS tagging)
- Entity finding (gazetteer lookup, NE grammars)
- Coreference (alias finding, orthographic coreference, etc.)
- Export to database / XML
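A toy sketch of the entity-finding step via gazetteer lookup; the gazetteer contents are invented, and real systems combine lookup with NE grammars:

GAZETTEER = {
    "London": "LOCATION",
    "Malta": "LOCATION",
    "IBM": "ORGANIZATION",
    "John": "PERSON",
}

def find_entities(tokens):
    # Return (token, type) pairs for tokens found in the gazetteer
    return [(tok, GAZETTEER[tok]) for tok in tokens if tok in GAZETTEER]

print(find_entities("John works for IBM in London".split()))
# [('John', 'PERSON'), ('IBM', 'ORGANIZATION'), ('London', 'LOCATION')]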

GATE and ANNIE


- GATE (General Architecture for Text Engineering) is a framework for language processing
- ANNIE (A Nearly New Information Extraction system) is a suite of language processing tools which provides NE recognition
- GATE also includes:
  - plugins for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages, etc.
  - tools for visualising and manipulating ontologies
  - ontology-based information extraction tools
  - evaluation and benchmarking tools

GATE


Information Extraction vs. Retrieval

[Diagram: IR returns whole documents, while IE returns structured facts extracted from them]


A couple of approaches
Active learning to reduce annotation burden
- supervised learning
- Adaptive IE
- the Melita methodology

Automatic annotation of large repositories
- largely unsupervised
- Armadillo


The Seminar Announcements Task


- Created by the Carnegie Mellon School of Computer Science
- The task: retrieve
  - Speaker
  - Location
  - Start time
  - End time
  from seminar announcements received by email



Seminar Announcements Example


Dr. Steals presents in Dean Hall at one am.

becomes

<speaker>Dr. Steals</speaker> presents in <location>Dean Hall</location> at <stime>one am</stime>.

Information Extraction Measures


Precision: how many of the retrieved documents are relevant?

  Precision = (relevant documents retrieved) / (documents retrieved)

Recall: how many of all the relevant documents were retrieved?

  Recall = (relevant documents retrieved) / (all relevant documents)

F-measure: the weighted harmonic mean of precision and recall

  F = 2 * Precision * Recall / (Precision + Recall)


IE Measures Examples
If I ask the librarian to search for books on cars, there are 10 relevant books in the library, and out of the 8 he found only 4 are relevant. What is his precision, recall and F-measure?


IE Measures Answers
If I ask the librarian to search for books on cars, there are 10 relevant books in the library, and out of the 8 he found only 4 are relevant. What is his precision, recall and F-measure?

Precision = 4/8 = 50%
Recall = 4/10 = 40%
F = (2 * 50 * 40) / (50 + 40) = 44.4%
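The same computation as a small helper, a straightforward transcription of the formulas above:

def precision_recall_f(relevant_retrieved, retrieved, relevant):
    p = relevant_retrieved / retrieved
    r = relevant_retrieved / relevant
    f = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return p, r, f

print(precision_recall_f(4, 8, 10))  # (0.5, 0.4, 0.444...)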

Adaptive IE
What is IE?
- Automated ways of extracting structured information from unstructured or partially structured machine-readable files

What is Adaptive IE?
- Performs the tasks of traditional IE
- Exploits the power of machine learning in order to adapt to:
  - complex domains having large amounts of domain-dependent data
  - different sub-language features
  - different text genres
- Considers the usability and accessibility of the system important


What is adaptable?
- New domain information: based upon an ontology which can change
- Different sub-language features: POS, noun chunks, etc.
- Different text genres: free text, structured, semi-structured, etc.
- Different types: text, string, date, name, etc.

Amilcare
- Tool for adaptive IE from Web-related texts
- Specifically designed for document annotation
- Based on the (LP)2 algorithm (Linguistic Patterns by Learning Patterns)
- Covering algorithm based on Lazy NLP
- Trains with a limited amount of examples
- Effective on different text types:
  - free texts
  - semi-structured texts
  - structured texts
- Uses GATE and ANNIE for preprocessing



CMU: detailed results


System   speaker  location  stime  etime  All slots
(LP)2      77.6     75.0     99.0   95.5     86.0
BWI        67.7     76.7     99.6   93.9     83.9
HMM        76.6     78.6     98.5   62.1     82.0
SRV        56.3     72.3     98.5   77.9     77.1
Rapier     53.0     72.7     93.4   96.2     77.3
Whisk      18.3     66.4     92.6   86.0     64.9

1. Best overall accuracy
2. Best result on the speaker field
3. No results below 75%

RULIE: Rule Unification for learning IE


System   speaker  location  stime  etime  All slots
(LP)2      77.6     75.0     99.0   95.5     86.0
RULIE      82.0     80.0     99.0   98.0     89.7


IE by example (1)
"... the seminar at 4 pm will ..."

How can we learn a rule to extract the seminar time?


IE by example (2)


IE by example (3)


Shallow vs Deep Approaches


Shallow approach
- Uses syntax primarily: tokenisation, POS, etc.

Deep approach
- Uses syntactic information
- Uses semantics (named entities, etc.)
- Heuristics (world rules, e.g. a brother is male)
- Additional knowledge

Single vs Multi Slot


Single slot
- Extract one element at a time
  - The seminar is at 4pm.

Multi slot
- Extract several concepts simultaneously
  - Tom is the brother of Mary. → Brother(Tom, Mary)

Top-Down vs Bottom-Up


Top-down
- Starts from a generic rule and specialises it

Bottom-up
- Starts from a specific rule and relaxes it


Top Down


Bottom Up


Overfitting vs Underfitting


Underfitting
- When the learner does not manage to detect the full underlying model
- Produces excessive bias

Overfitting
- When the learner fits the model and the noise

Stages of document processing


- Document selection involves identification and retrieval of potentially relevant documents from a large set (e.g. the web) in order to reduce the search space. Standard or semantically-enhanced IR techniques can be used for this.
- Document pre-processing involves cleaning and preparing the documents, e.g. removal of extraneous information, error correction, spelling normalisation, tokenisation, POS tagging, etc.
- Document processing consists mainly of information extraction.

Metadata extraction
Metadata extraction consists of two types:
- Explicit metadata extraction involves information describing the document, such as that contained in the header information of HTML documents (titles, abstracts, authors, creation date, etc.)
- Implicit metadata extraction involves semantic information deduced from the material itself, i.e. endogenous information such as names of entities and relations contained in the text. This essentially involves information extraction techniques, often with the help of an ontology.

IE for Document Access


- With traditional query engines, getting the facts can be hard and slow:
  - Where has the President visited in the last year?
  - Which places in Europe have had cases of bird flu?
  - Which search terms would you use to get this kind of information?
  - How can you specify that you want someone's home page?
- IE returns information in a structured way
- IR returns documents containing the relevant information somewhere (if you're lucky)

IE as an alternative to IR
- IE returns knowledge at a much deeper level than traditional IR
- Constructing a database through IE and linking it back to the documents can provide a valuable alternative search tool
- Even if the results are not always accurate, they can be valuable if linked back to the original text

Try IE yourself ... (1)


Given a particular text, find all the successions...
- Hint: there are 6, including the one below
- Hint: we do not have complete information

E.g.
<SUCCESSION-1>
  ORGANIZATION: New York Times
  POST: "president"
  WHO_IS_IN: Russell T. Lewis
  WHO_IS_OUT: Lance R. Primis


<DOC> <DOCID> wsj93_050.0203 </DOCID> <DOCNO> 930219-0013. </DOCNO> <HL> Marketing Brief: @ Noted.... </HL> <DD> 02/19/93 </DD> <SO> WALL STREET JOURNAL (J), PAGE B5 </SO> <CO> NYTA </CO> <IN> MEDIA (MED), PUBLISHING (PUB) </IN> <TXT> <p> New York Times Co. named Russell T. Lewis, 45, president and general manager of its flagship New York Times newspaper, responsible for all business-side activities. He was executive vice president and deputy general manager. He succeeds Lance R. Primis, who in September was named president and chief operating officer of the parent. </p> </TXT> </DOC>

Answer (1)
<SUCCESSION-2> ORGANIZATION : "New York Times" POST : "general manager" WHO_IS_IN : "Russell T. Lewis" WHO_IS_OUT : "Lance R. Primis" <SUCCESSION-3> ORGANIZATION : "New York Times" POST : "executive vice president" WHO_IS_IN : WHO_IS_OUT : "Russell T. Lewis"


Answer (2)
<SUCCESSION-4> ORGANIZATION : "New York Times" POST : "deputy general manager" WHO_IS_IN : WHO_IS_OUT : "Russell T. Lewis" <SUCCESSION-5> ORGANIZATION : "New York Times Co." POST : "president" WHO_IS_IN : "Lance R. Primis" WHO_IS_OUT :


Answer (3)
<SUCCESSION-6> ORGANIZATION : "New York Times Co." POST : "chief operating officer" WHO_IS_IN : "Lance R. Primis" WHO_IS_OUT :


Questions?
