
Introduction to Syntax, with Part-of-Speech Tagging

Owen Rambow rambow@cs.columbia.edu September 17 & 19

Admin Stuff (Homework)


- Ex 2.6: every possible time expression, not just those patterns that are listed in the book!
- Email Ani if you can't access links on the homework page

What is Syntax?
- Study of structure of language
- Specifically, goal is to relate surface form (e.g., interface to phonological component) to semantics (e.g., interface to semantic component)
- Morphology, phonology, semantics farmed out (mainly); issue is word order and structure
- Representational device is tree structure

What About Chomsky?

- At birth of formal language theory (comp sci) and formal linguistics
- Major contribution: syntax is cognitive reality
- Humans able to learn languages quickly, but not all languages → universal grammar is biological
- Goal of syntactic study: find universal principles and language-specific parameters
- Specific Chomskyan theories change regularly
- These ideas adopted by almost all contemporary syntactic theories

Types of Linguistic Activity


- Descriptive: provide account of syntax of a language; often good enough for NLP engineering work
- Explanatory: provide principles-and-parameters style account of syntax of (preferably) several languages
- Prescriptive: prescriptive linguistics is an oxymoron

Structure in Strings

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o the boy likes a girl
  o the small girl likes the big girl
  o a very small nice boy sees a very nice boy
- Some bad sentences:
  o *the boy the girl
  o *small boy likes nice girl
- Can we find subsequences of words (constituents) which in some way behave alike?

Structure in Strings: Proposal 1

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o (the) boy (likes a girl)
  o (the small) girl (likes the big girl)
  o (a very small nice) boy (sees a very nice boy)
- Some bad sentences:
  o *(the) boy (the girl)
  o *(small) boy (likes the nice girl)

Structure in Strings: Proposal 2

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o (the boy) likes (a girl)
  o (the small girl) likes (the big girl)
  o (a very small nice boy) sees (a very nice boy)
- Some bad sentences:
  o *(the boy) (the girl)
  o *(small boy) likes (the nice girl)
- This is a better proposal: fewer types of constituents

More Structure in Strings: Proposal 2 (ctd)

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o ((the) boy) likes ((a) girl)
  o ((the) (small) girl) likes ((the) (big) girl)
  o ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl)
- Some bad sentences:
  o *((the) boy) ((the) girl)
  o *((small) boy) likes ((the) (nice) girl)

From Substrings to Trees

- (((the) boy) likes ((a) girl))

[Tree diagram: the same bracketing drawn as an unlabeled tree — the root dominates the constituent "the boy" (with "the" below "boy"), the word "likes", and the constituent "a girl" (with "a" below "girl"); a code sketch of this bracketing follows below]
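Not from the slides: to make the bracketed notation concrete, the bracketing can be read into a nested data structure and manipulated as a tree. A minimal Python sketch (the helper name parse_brackets is hypothetical, for illustration only):

    def parse_brackets(s):
        """Parse a bracketed string into nested lists of tokens."""
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()
        stack = [[]]  # stack of partially built constituents
        for tok in tokens:
            if tok == "(":
                stack.append([])          # open a new constituent
            elif tok == ")":
                done = stack.pop()        # close it and attach to its parent
                stack[-1].append(done)
            else:
                stack[-1].append(tok)     # a word
        return stack[0][0]

    print(parse_brackets("(((the) boy) likes ((a) girl))"))
    # [[['the'], 'boy'], 'likes', [['a'], 'girl']]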

Node Labels?

- ((the) boy) likes ((a) girl)
- Deliberately chose constituents so each one has one non-bracketed word: the head
- Group words by distribution of constituents they head (part-of-speech, POS):
  o Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det)
- Category of constituent: XP, where X is POS
  o NP, S, AdjP, AdvP, DetP

Node Labels

- (((the/Det) boy/N) likes/V ((a/Det) girl/N))

[Tree diagram: S dominates an NP (DetP "the", N "boy"), the V "likes", and an NP (DetP "a", N "girl")]

Word Classes (=POS)

- Heads of constituents fall into distributionally defined classes
- Additional support for this definition of word class comes from morphology

Some Points on POS Tag Sets

- Possible basic set: N, V, Adj, Adv, P, Det, Aux, Comp, Conj
- 2 supertypes: open- and closed-class
  o Open: N, V, Adj, Adv
  o Closed: P, Det, Aux, Comp, Conj
- Many subtypes:
  o eats/V: eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, …
  o Reflect morphological form & syntactic function

More on POS Tag Sets: Problematic Cases

- adjective or participle?
  o a seen event, a rarely seen event, an unseen event, an event rarely seen in Idaho, *a rarely seen in Idaho event
- noun or adjective?
  o a child seat, *a very child seat, *this seat is child
- preposition or particle?
  o he threw out the garbage, he threw the garbage out, he threw the garbage out the door, *he threw the garbage the door out

The Penn TreeBank POS Tag Set

- Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words
- 46 tags
- Some particularities:
  o to/TO not disambiguated
  o Auxiliaries and verbs not distinguished

Part-of-Speech Tagging

- Problem: assign POS tags to words in a sentence
  o fruit flies like a banana
  o fruit/N flies/N like/V a/DET banana/N
  o fruit/N flies/V like/P a/DET banana/N
- 2nd example:
  o the/Det flies/N like/V a/Det banana/N
- Useful for parsing, but also partial parsing/chunking, IR, etc. (an example with an off-the-shelf tagger follows below)
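Not part of the original slides: a quick way to see an off-the-shelf tagger pick one of these analyses, assuming the NLTK library and its default English tagger model are installed (pip install nltk, then nltk.download("averaged_perceptron_tagger"); the exact model name can vary across NLTK versions). The output uses the Penn Treebank tags from the previous slide:

    import nltk

    tokens = "fruit flies like a banana".split()
    print(nltk.pos_tag(tokens))
    # One possible output (depends on the model):
    # [('fruit', 'NN'), ('flies', 'NNS'), ('like', 'IN'),
    #  ('a', 'DT'), ('banana', 'NN')]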

Approaches to POS Tagging

- Hand-written rules
- Statistical approaches
- Machine learning of rules (e.g., decision trees or transformation-based learning)
- Role of corpus:
  o No corpus (hand-written)
  o No machine learning (hand-written)
  o Unsupervised learning from raw data
  o Supervised learning from annotated data

Methodological Points

- When looking at a problem in NLP, need to know how to evaluate
- Possible evaluations:
  o against naturally occurring annotated corpus (POS tagging: 96%)
  o against hand-crafted corpus
  o against human task performance
- Need to know baseline: how well does a simple method do? (POS tagging: 91%; a sketch of such a baseline follows below)
- Need to know topline: given the evaluation, what is the meaningful best result? (POS tagging: 97%)
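A minimal sketch (not from the slides) of the kind of simple baseline meant above: tag each known word with its most frequent tag in the training data, and fall back to a default tag for unknown words. The toy corpus and the default tag N are assumptions for illustration:

    from collections import Counter, defaultdict

    # toy training data: (word, tag) pairs
    tagged_corpus = [("the", "Det"), ("flies", "N"), ("like", "V"),
                     ("a", "Det"), ("banana", "N"), ("flies", "V")]

    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1

    def baseline_tag(word):
        """Most frequent training tag for a known word; guess N otherwise."""
        if word in counts:
            return counts[word].most_common(1)[0][0]
        return "N"

    print([(w, baseline_tag(w)) for w in "the flies like a banana".split()])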

Reminder: Bayes's Law

- 10 students, of which 4 women & 3 smokers (2 women smoke)

[Venn diagram: circles for women and smokers, overlapping in the 2 female smokers]

- Probability that a randomly chosen student (rcs) is a woman: p(w) = 0.4
- Prob that rcs is a smoker: p(s) = 0.3
- Prob that rcs is a female smoker: p(s,w) = 0.2
- Prob that a randomly chosen woman is a smoker: p(s|w) = 0.5
  → p(s,w) = p(w) p(s|w)
- Prob that a randomly chosen smoker is a woman: p(w|s) = 2/3 ≈ 0.67
  → p(s,w) = p(s) p(w|s)

Reminder: Bayes's Law (end)

- p(s,w) = p(s) p(w|s)
- p(s,w) = p(w) p(s|w)
- So:
  o p(s) = p(w) p(s|w) / p(w|s)
  o p(s|w) = p(s) p(w|s) / p(w)
    (p(s) is the prior probability, p(w|s) the likelihood; a quick numeric check follows below)
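As a sanity check (not on the slides), the example's numbers can be verified directly in a few lines of Python:

    # 10 students: 4 women, 3 smokers, 2 women who smoke
    p_w, p_s, p_sw = 4 / 10, 3 / 10, 2 / 10

    p_s_given_w = p_sw / p_w   # 0.5
    p_w_given_s = p_sw / p_s   # 2/3

    # Bayes's law: p(s|w) = p(s) p(w|s) / p(w)
    assert abs(p_s_given_w - p_s * p_w_given_s / p_w) < 1e-12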

Statistical POS Tagging

- Want to choose most likely string of tags (T), given the string of words (W)
  o W = w1, w2, …, wn
  o T = t1, t2, …, tn
- I.e., want argmaxT p(T | W)
- Problem: sparse data

Statistical POS Tagging (ctd)

- p(T|W) = p(T,W) / p(W)
         = p(W|T) p(T) / p(W)
- argmaxT p(T|W)
    = argmaxT p(W|T) p(T) / p(W)
    = argmaxT p(W|T) p(T)   (p(W) is constant for a given sentence)

Statistical POS Tagging (ctd)

- p(T) = p(t1, t2, …, tn)
       = p(tn | t1, …, tn-1) p(t1, …, tn-1)
       = p(tn | t1, …, tn-1) p(tn-1 | t1, …, tn-2) p(t1, …, tn-2)
       = ∏i p(ti | t1, …, ti-1)
       ≈ ∏i p(ti | ti-2, ti-1)   (trigram / n-gram approximation)

Statistical POS Tagging (ctd)

- p(W|T) = p(w1, w2, …, wn | t1, t2, …, tn)
         = ∏i p(wi | w1, …, wi-1, t1, t2, …, tn)
         ≈ ∏i p(wi | ti)

Statistical POS Tagging (ctd)

- argmaxT p(T|W) = argmaxT p(W|T) p(T)
                 ≈ argmaxT ∏i p(wi | ti) p(ti | ti-2, ti-1)
- Relatively easy to get data for parameter estimation (next slide)
- But: need smoothing for unseen words
- Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length; a sketch follows below)
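A minimal sketch of Viterbi decoding, simplified to a bigram tag model for brevity (the slides use trigrams; the same dynamic program works with tag pairs as states). The toy probability tables below are invented and not necessarily normalized, purely for illustration:

    def viterbi(words, tags, init, trans, emit):
        """Most probable tag sequence for words under a bigram HMM."""
        # best[t]: probability of the best tag path ending in tag t
        best = {t: init.get(t, 0) * emit.get((words[0], t), 0) for t in tags}
        backptrs = []
        for w in words[1:]:
            new_best, ptr = {}, {}
            for t in tags:
                p, t_prev = max((best[t0] * trans.get((t0, t), 0), t0)
                                for t0 in tags)
                new_best[t] = p * emit.get((w, t), 0)
                ptr[t] = t_prev
            best = new_best
            backptrs.append(ptr)
        # follow backpointers from the best final tag
        t = max(best, key=best.get)
        path = [t]
        for ptr in reversed(backptrs):
            t = ptr[t]
            path.append(t)
        return path[::-1]

    tags = ["Det", "N", "V"]
    init = {"Det": 1.0}
    trans = {("Det", "N"): 1.0, ("N", "V"): 0.8, ("N", "N"): 0.2,
             ("V", "Det"): 1.0}
    emit = {("the", "Det"): 0.5, ("a", "Det"): 0.5, ("flies", "N"): 0.4,
            ("flies", "V"): 0.5, ("like", "V"): 0.5, ("like", "N"): 0.1,
            ("banana", "N"): 0.4}
    print(viterbi("the flies like a banana".split(), tags, init, trans, emit))
    # ['Det', 'N', 'V', 'Det', 'N']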

Probability Estimation for Trigram POS Tagging

- Maximum-likelihood estimation (a toy computation follows below):
  o p(wi | ti) = c(wi, ti) / c(ti)
  o p(ti | ti-2, ti-1) = c(ti-2, ti-1, ti) / c(ti-2, ti-1)
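A sketch of these estimates in code, over a made-up toy corpus (a real system would count over the Penn Treebank and would still need smoothing, as noted above); the <s> padding tags for sentence-initial trigrams are an assumption:

    from collections import Counter

    sents = [[("the", "Det"), ("flies", "N"), ("like", "V"),
              ("a", "Det"), ("banana", "N")]]

    word_tag, tag_count = Counter(), Counter()
    tri, bi = Counter(), Counter()

    for sent in sents:
        tags = ["<s>", "<s>"] + [t for _, t in sent]
        for w, t in sent:
            word_tag[(w, t)] += 1
            tag_count[t] += 1
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            tri[(t1, t2, t3)] += 1
            bi[(t1, t2)] += 1

    def p_word(w, t):                      # p(wi | ti)
        return word_tag[(w, t)] / tag_count[t]

    def p_tag(t, t1, t2):                  # p(ti | ti-2, ti-1)
        return tri[(t1, t2, t)] / bi[(t1, t2)]

    print(p_word("flies", "N"))            # 0.5
    print(p_tag("N", "<s>", "Det"))        # 1.0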

Statistical POS Tagging

- Method common to many tasks in speech & NLP
- Noisy Channel Model, Hidden Markov Model
