
Introduction to Syntax, with Part-of-Speech Tagging

Owen Rambow rambow@cs.columbia.edu September 17 & 19

Admin Stuff (Homework)


- Ex 2.6: every possible time expression, not just those patterns that are listed in the book!
- Email Ani if you can't access links on the homework page

What is Syntax?
- Study of structure of language
- Specifically, goal is to relate surface form (e.g., interface to phonological component) to semantics (e.g., interface to semantic component)
- Morphology, phonology, semantics farmed out (mainly); issue is word order and structure
- Representational device is tree structure

What About Chomsky?

- At birth of formal language theory (comp sci) and formal linguistics
- Major contribution: syntax is cognitive reality
- Humans able to learn languages quickly, but not all languages → universal grammar is biological
- Goal of syntactic study: find universal principles and language-specific parameters
- Specific Chomskyan theories change regularly
- These ideas adopted by almost all contemporary syntactic theories

Types of Linguistic Activity


- Descriptive: provide account of syntax of a language; often good enough for NLP engineering work
- Explanatory: provide principles-and-parameters style account of syntax of (preferably) several languages
- Prescriptive: prescriptive linguistics is an oxymoron

Structure in Strings

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o the boy likes a girl
  o the small girl likes the big girl
  o a very small nice boy sees a very nice boy
- Some bad sentences:
  o *the boy the girl
  o *small boy likes nice girl
- Can we find subsequences of words (constituents) which in some way behave alike?

Structure in Strings: Proposal 1

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o (the) boy (likes a girl)
  o (the small) girl (likes the big girl)
  o (a very small nice) boy (sees a very nice boy)
- Some bad sentences:
  o *(the) boy (the girl)
  o *(small) boy (likes the nice girl)

Structure in Strings: Proposal 2

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o (the boy) likes (a girl)
  o (the small girl) likes (the big girl)
  o (a very small nice boy) sees (a very nice boy)
- Some bad sentences:
  o *(the boy) (the girl)
  o *(small boy) likes (the nice girl)
- This is a better proposal: fewer types of constituents

More Structure in Strings: Proposal 2 (ctd)

- Some words: the a small nice big very boy girl sees likes
- Some good sentences:
  o ((the) boy) likes ((a) girl)
  o ((the) (small) girl) likes ((the) (big) girl)
  o ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl)
- Some bad sentences:
  o *((the) boy) ((the) girl)
  o *((small) boy) likes ((the) (nice) girl)

From Substrings to Trees

- (((the) boy) likes ((a) girl))

[Tree diagram: the same bracketing drawn as an unlabeled tree — the root dominates the constituent "the boy" (with "the" below "boy"), the word "likes", and the constituent "a girl" (with "a" below "girl"); a code sketch of this bracketing follows below]
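Not from the slides: to make the bracketed notation concrete, the bracketing can be read into a nested data structure and manipulated as a tree. A minimal Python sketch (the helper name parse_brackets is hypothetical, for illustration only):

    def parse_brackets(s):
        """Parse a bracketed string into nested lists of tokens."""
        tokens = s.replace("(", " ( ").replace(")", " ) ").split()
        stack = [[]]  # stack of partially built constituents
        for tok in tokens:
            if tok == "(":
                stack.append([])          # open a new constituent
            elif tok == ")":
                done = stack.pop()        # close it and attach to its parent
                stack[-1].append(done)
            else:
                stack[-1].append(tok)     # a word
        return stack[0][0]

    print(parse_brackets("(((the) boy) likes ((a) girl))"))
    # [[['the'], 'boy'], 'likes', [['a'], 'girl']]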

Node Labels?

- ((the) boy) likes ((a) girl)
- Deliberately chose constituents so each one has one non-bracketed word: the head
- Group words by distribution of constituents they head (part-of-speech, POS):
  o Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det)
- Category of constituent: XP, where X is POS
  o NP, S, AdjP, AdvP, DetP

Node Labels

- (((the/Det) boy/N) likes/V ((a/Det) girl/N))

[Tree diagram: S dominates an NP (DetP "the", N "boy"), the V "likes", and an NP (DetP "a", N "girl")]

Word Classes (=POS)

- Heads of constituents fall into distributionally defined classes
- Additional support for this definition of word class comes from morphology

Some Points on POS Tag Sets

- Possible basic set: N, V, Adj, Adv, P, Det, Aux, Comp, Conj
- 2 supertypes: open- and closed-class
  o Open: N, V, Adj, Adv
  o Closed: P, Det, Aux, Comp, Conj
- Many subtypes:
  o eats/V: eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, …
  o Reflect morphological form & syntactic function

More on POS Tag Sets: Problematic Cases

- adjective or participle?
  o a seen event, a rarely seen event, an unseen event, an event rarely seen in Idaho, *a rarely seen in Idaho event
- noun or adjective?
  o a child seat, *a very child seat, *this seat is child
- preposition or particle?
  o he threw out the garbage, he threw the garbage out, he threw the garbage out the door, *he threw the garbage the door out

The Penn TreeBank POS Tag Set

- Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words
- 46 tags
- Some particularities:
  o to/TO not disambiguated
  o Auxiliaries and verbs not distinguished

Part-of-Speech Tagging

- Problem: assign POS tags to words in a sentence
  o fruit flies like a banana
  o fruit/N flies/N like/V a/DET banana/N
  o fruit/N flies/V like/P a/DET banana/N
- 2nd example:
  o the/Det flies/N like/V a/Det banana/N
- Useful for parsing, but also partial parsing/chunking, IR, etc. (an example with an off-the-shelf tagger follows below)
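Not part of the original slides: a quick way to see an off-the-shelf tagger pick one of these analyses, assuming the NLTK library and its default English tagger model are installed (pip install nltk, then nltk.download("averaged_perceptron_tagger"); the exact model name can vary across NLTK versions). The output uses the Penn Treebank tags from the previous slide:

    import nltk

    tokens = "fruit flies like a banana".split()
    print(nltk.pos_tag(tokens))
    # One possible output (depends on the model):
    # [('fruit', 'NN'), ('flies', 'NNS'), ('like', 'IN'),
    #  ('a', 'DT'), ('banana', 'NN')]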

Approaches to POS Tagging

- Hand-written rules
- Statistical approaches
- Machine learning of rules (e.g., decision trees or transformation-based learning)
- Role of corpus:
  o No corpus (hand-written)
  o No machine learning (hand-written)
  o Unsupervised learning from raw data
  o Supervised learning from annotated data

Methodological Points

- When looking at a problem in NLP, need to know how to evaluate
- Possible evaluations:
  o against naturally occurring annotated corpus (POS tagging: 96%)
  o against hand-crafted corpus
  o against human task performance
- Need to know baseline: how well does a simple method do? (POS tagging: 91%; a sketch of such a baseline follows below)
- Need to know topline: given the evaluation, what is the meaningful best result? (POS tagging: 97%)
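A minimal sketch (not from the slides) of the kind of simple baseline meant above: tag each known word with its most frequent tag in the training data, and fall back to a default tag for unknown words. The toy corpus and the default tag N are assumptions for illustration:

    from collections import Counter, defaultdict

    # toy training data: (word, tag) pairs
    tagged_corpus = [("the", "Det"), ("flies", "N"), ("like", "V"),
                     ("a", "Det"), ("banana", "N"), ("flies", "V")]

    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1

    def baseline_tag(word):
        """Most frequent training tag for a known word; guess N otherwise."""
        if word in counts:
            return counts[word].most_common(1)[0][0]
        return "N"

    print([(w, baseline_tag(w)) for w in "the flies like a banana".split()])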

Reminder: Bayes's Law

- 10 students, of which 4 women & 3 smokers (2 women smoke)

[Venn diagram: circles for women and smokers, overlapping in the 2 female smokers]

- Probability that a randomly chosen student (rcs) is a woman: p(w) = 0.4
- Prob that rcs is a smoker: p(s) = 0.3
- Prob that rcs is a female smoker: p(s,w) = 0.2
- Prob that a randomly chosen woman is a smoker: p(s|w) = 0.5
  → p(s,w) = p(w) p(s|w)
- Prob that a randomly chosen smoker is a woman: p(w|s) = 2/3 ≈ 0.67
  → p(s,w) = p(s) p(w|s)

Reminder: Bayes's Law (end)

- p(s,w) = p(s) p(w|s)
- p(s,w) = p(w) p(s|w)
- So:
  o p(s) = p(w) p(s|w) / p(w|s)
  o p(s|w) = p(s) p(w|s) / p(w)
    (p(s) is the prior probability, p(w|s) the likelihood; a quick numeric check follows below)
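As a sanity check (not on the slides), the example's numbers can be verified directly in a few lines of Python:

    # 10 students: 4 women, 3 smokers, 2 women who smoke
    p_w, p_s, p_sw = 4 / 10, 3 / 10, 2 / 10

    p_s_given_w = p_sw / p_w   # 0.5
    p_w_given_s = p_sw / p_s   # 2/3

    # Bayes's law: p(s|w) = p(s) p(w|s) / p(w)
    assert abs(p_s_given_w - p_s * p_w_given_s / p_w) < 1e-12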

Statistical POS Tagging

- Want to choose most likely string of tags (T), given the string of words (W)
  o W = w1, w2, …, wn
  o T = t1, t2, …, tn
- I.e., want argmaxT p(T | W)
- Problem: sparse data

Statistical POS Tagging (ctd)

- p(T|W) = p(T,W) / p(W)
         = p(W|T) p(T) / p(W)
- argmaxT p(T|W)
    = argmaxT p(W|T) p(T) / p(W)
    = argmaxT p(W|T) p(T)   (p(W) is constant for a given sentence)

Statistical POS Tagging (ctd)

- p(T) = p(t1, t2, …, tn)
       = p(tn | t1, …, tn-1) p(t1, …, tn-1)
       = p(tn | t1, …, tn-1) p(tn-1 | t1, …, tn-2) p(t1, …, tn-2)
       = ∏i p(ti | t1, …, ti-1)
       ≈ ∏i p(ti | ti-2, ti-1)   (trigram / n-gram approximation)

Statistical POS Tagging (ctd)

- p(W|T) = p(w1, w2, …, wn | t1, t2, …, tn)
         = ∏i p(wi | w1, …, wi-1, t1, t2, …, tn)
         ≈ ∏i p(wi | ti)

Statistical POS Tagging (ctd)

- argmaxT p(T|W) = argmaxT p(W|T) p(T)
                 ≈ argmaxT ∏i p(wi | ti) p(ti | ti-2, ti-1)
- Relatively easy to get data for parameter estimation (next slide)
- But: need smoothing for unseen words
- Easy to determine the argmax (Viterbi algorithm, in time linear in sentence length; a sketch follows below)
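A minimal sketch of Viterbi decoding, simplified to a bigram tag model for brevity (the slides use trigrams; the same dynamic program works with tag pairs as states). The toy probability tables below are invented and not necessarily normalized, purely for illustration:

    def viterbi(words, tags, init, trans, emit):
        """Most probable tag sequence for words under a bigram HMM."""
        # best[t]: probability of the best tag path ending in tag t
        best = {t: init.get(t, 0) * emit.get((words[0], t), 0) for t in tags}
        backptrs = []
        for w in words[1:]:
            new_best, ptr = {}, {}
            for t in tags:
                p, t_prev = max((best[t0] * trans.get((t0, t), 0), t0)
                                for t0 in tags)
                new_best[t] = p * emit.get((w, t), 0)
                ptr[t] = t_prev
            best = new_best
            backptrs.append(ptr)
        # follow backpointers from the best final tag
        t = max(best, key=best.get)
        path = [t]
        for ptr in reversed(backptrs):
            t = ptr[t]
            path.append(t)
        return path[::-1]

    tags = ["Det", "N", "V"]
    init = {"Det": 1.0}
    trans = {("Det", "N"): 1.0, ("N", "V"): 0.8, ("N", "N"): 0.2,
             ("V", "Det"): 1.0}
    emit = {("the", "Det"): 0.5, ("a", "Det"): 0.5, ("flies", "N"): 0.4,
            ("flies", "V"): 0.5, ("like", "V"): 0.5, ("like", "N"): 0.1,
            ("banana", "N"): 0.4}
    print(viterbi("the flies like a banana".split(), tags, init, trans, emit))
    # ['Det', 'N', 'V', 'Det', 'N']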

Probability Estimation for Trigram POS Tagging

- Maximum-likelihood estimation (a toy computation follows below):
  o p(wi | ti) = c(wi, ti) / c(ti)
  o p(ti | ti-2, ti-1) = c(ti-2, ti-1, ti) / c(ti-2, ti-1)
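A sketch of these estimates in code, over a made-up toy corpus (a real system would count over the Penn Treebank and would still need smoothing, as noted above); the <s> padding tags for sentence-initial trigrams are an assumption:

    from collections import Counter

    sents = [[("the", "Det"), ("flies", "N"), ("like", "V"),
              ("a", "Det"), ("banana", "N")]]

    word_tag, tag_count = Counter(), Counter()
    tri, bi = Counter(), Counter()

    for sent in sents:
        tags = ["<s>", "<s>"] + [t for _, t in sent]
        for w, t in sent:
            word_tag[(w, t)] += 1
            tag_count[t] += 1
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            tri[(t1, t2, t3)] += 1
            bi[(t1, t2)] += 1

    def p_word(w, t):                      # p(wi | ti)
        return word_tag[(w, t)] / tag_count[t]

    def p_tag(t, t1, t2):                  # p(ti | ti-2, ti-1)
        return tri[(t1, t2, t)] / bi[(t1, t2)]

    print(p_word("flies", "N"))            # 0.5
    print(p_tag("N", "<s>", "Det"))        # 1.0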

Statistical POS Tagging

- Method common to many tasks in speech & NLP
- Noisy Channel Model, Hidden Markov Model
