An Introduction to the Link Grammar Parser

Abbreviated and adapted by C. Lyon from the paper at Link Grammar home page.
Primarily for use as a reference.

1. The logic and notation of link grammars


1.1. The basic idea
1.2. Word rules
1.3. Global rules
1.4. Link grammar in relation to other systems
2. Using the parser
2.1. Multiple linkages
3. General features of the parser
3.1. Connector subscripts
3.2. Macros
3.3. Word files
3.4. Word subscripts
3.5. The cost system
4. Special features of the dictionary
4.1. Capitalization
4.2. Hyphenated expressions
4.3. Number expressions
4.4. Unknown words
4.5. Punctuation
4.6. The wall(s)
4.7. Idioms
5. Coordinating conjunctions
5.1. The handling of conjunctions
5.2. Uses of conjunctions
5.3. Subscripts
5.4. Some problems
6. Post-processing
7. Speed and robustness features
7.1. The null-link system
7.2. The link-length limit
7.3. The post-processing limit

1. The Logic and Notation of Link Grammars


1.1. THE BASIC IDEA.
Think of words as blocks with connectors coming out. For example, for the words a, the, cat,
snake, Mary, ran, and chased, the connectors may look like this:

There are different types of connectors and those connectors may also point to the right or to the left.
Right-pointing connectors are labeled "+", left-pointing connectors are labeled "-". A left-pointing
connector connects with a right-pointing connector of the same type on another word. The two connectors
joined together form a "link". For example, for the sentence the cat chased a snake, the links look like
this:

Words have rules about how their connectors can be connected up, that is, rules about what would
constitute a valid use of that word. A valid sentence is one in which all the words present are used in a way
which is valid according to their rules, and which also satisfies certain global rules.

1.2. WORD RULES. A simple dictionary entry would look like this:
blah: A+;

This means that if the word "blah" is used in a sentence, it must form an "A" link with another word; that
is, there must be another word to the right of it with an "A-" connector. Otherwise the sentence is not valid.
The expression following the colon is the "linking requirement" for the word.
A word may have more than one connector that has to be connected. This would be notated as
blah: A+ & B+;

A word may have a rule that either one of two (or one of several) connectors can be used, but exactly one
must be used. In the dictionary, we notate this as
blah: A+ or B-;

This means that if the word can make either an "A" link to the right, or a "B" link to the left, its use in the
sentence is valid; but it must make one or the other, and it can not make both.
These rules can be combined. For example, consider the following notation:
blah: A+ or (B- & C+);

This means that the word must make either an "A" link to the right, or a "B" link to the left and a "C" link
to the right. No other combination will be valid.
Such expressions can be nested without limit, such as
blah: (A+ or B-) & ((C- & A+ & (D- or E-)) or F+);

Some connectors are optional; this is notated with curly brackets. For example:
blah: A+ & {B+};

This means the word must make an "A" link to the right, and it can make a "B" link to the right but
does not have to. Curly brackets can also be put around complex expressions, like
blah: (A+ or B+) & {C- & (D+ or E-)};

An equivalent way of writing an optional expression like "{X-}" is "(X- or ())". This can be useful,
since it allows a cost to be put on the no-link option (see Section 3.5).
A word can also make an indefinite number of links of the same type to other words. For this, we use
the "multi-connector" symbol "@". For instance, the word below could make any number of F links to
words to the right (but is not required to make any).
blah: (A+ or B+) & {C- & (D+ or E-)} & {@F+};

(If a word has "@A+", with no curly brackets, it is required to make at least one A+ link to the right;
any others are optional.)
The ordering of elements in the "connector expression" is significant: it dictates the relative
closeness of the words being connected to. The further to the left a connector appears in the
expression, the closer the word it connects to must be. For example,
blah: A+ & B+;

This means that "blah" must make an "A" link to the right and a "B" link to the right, and the word it
makes the "A" link with must be closer than the word it makes the "B" link with.
This only pertains, however, to connections in the same direction. For connectors pointing in opposite
directions, the ordering is irrelevant. Therefore
blah: A+ & B-;

means exactly the same thing as


blah: B- & A+;

For that matter,


blah: A- & B+ & C+ & D-;

means exactly the same thing as


blah: B+ & C+ & A- & D-;

For "or" expressions, such as "A+ or B+", the ordering of the elements is irrelevant.
A dictionary entry thus consists of a word, followed by a colon, followed by a connector expression,
followed by a semi-colon. The dictionary consists of a series of such entries. Any number of words can
be put in a list, separated by spaces; they will then all possess the linking requirement that follows:
blah blee blay: A+;

A connector name must consist of one or more capital letters (any number may be used), followed by
"+" or "-".
We should mention one concept here that plays an important role in the internal workings of the parser:
the "disjunct". A disjunct is a set of connector types that constitutes a legal use of a word. The
dictionary expression for any word can be represented as a set of disjuncts. If a word has the following
expression:
blah: {C-} & (A+ or B+);

then it has the following four disjuncts:


C- A+
A+
C- B+
B+

These disjuncts represent all the legal uses of the word "blah". Using C- and A+ is a legal use of the
word; using A+ and B+ is not. Disjuncts play an important role in the internal workings of the parser.
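The expansion of a connector expression into disjuncts can be sketched in Python. This is only an illustration of the idea, not the parser's own implementation: the tuple-based encoding of expressions and the helper names (conn, and_, or_, opt, disjuncts) are assumptions made for this example, and the multi-connector symbol "@" is not handled.

```python
def conn(name):
    return ("conn", name)

def and_(*parts):
    return ("and", parts)

def or_(*parts):
    return ("or", parts)

def opt(part):
    # "{X}" is treated as "(X or ())"
    return ("opt", part)

def disjuncts(expr):
    """Return every legal connector combination as a tuple of connector names."""
    kind = expr[0]
    if kind == "conn":
        return [(expr[1],)]
    if kind == "opt":
        return [()] + disjuncts(expr[1])
    if kind == "or":
        result = []
        for part in expr[1]:
            result.extend(disjuncts(part))
        return result
    if kind == "and":
        # Combine one choice from each part, keeping the written order.
        result = [()]
        for part in expr[1]:
            result = [left + right for left in result for right in disjuncts(part)]
        return result
    raise ValueError(f"unknown expression kind: {kind!r}")

# blah: {C-} & (A+ or B+);
blah = and_(opt(conn("C-")), or_(conn("A+"), conn("B+")))
for d in disjuncts(blah):
    print(" ".join(d))
```

Run on the entry above, the sketch lists the same four combinations as the text: A+ alone, B+ alone, C- with A+, and C- with B+.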
1.3. GLOBAL RULES. As well as these "word rules", which are specified in the dictionary, there are
two other global rules which control how words can be connected.
First of all, links can not cross. For example, the following way of connecting these four words
(connecting "cat" to "dog" and "horse" to "fish") would be illegal. The parser simply will not find such
linkages.
     +-----------+
     |     +-----|----+
     |     |     |    |
    cat  horse  dog  fish

This is the "crossing-links" (or "planarity") rule. Secondly, all the words in a sentence must be
indirectly connected to each other. Therefore the following way of connecting these four words would
be illegal (if it was the entire linkage).
     +-----+     +----+
     |     |     |    |
    cat  horse  dog  fish

This is the "connectivity" rule. A valid sentence is therefore one which can be linked up in a way that
a) all the words are used in a way that satisfies their linking requirements; and b) the crossing-links
and connectivity rules are not violated.
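The two global rules can be checked mechanically once a linkage is written as pairs of word positions. The following Python sketch assumes that encoding (words numbered from 0, each link a pair of positions); the function names are hypothetical, and the real parser never builds crossing linkages in the first place, so this is a check after the fact rather than a picture of its search.

```python
def links_cross(a, b):
    """True if two links, drawn above the sentence, would cross."""
    (i, j), (k, l) = sorted(a), sorted(b)
    return i < k < j < l or k < i < l < j

def is_planar(links):
    return not any(
        links_cross(a, b)
        for n, a in enumerate(links)
        for b in links[n + 1:]
    )

def is_connected(num_words, links):
    """True if every word can be reached from word 0 by following links."""
    if num_words == 0:
        return True
    neighbours = {w: set() for w in range(num_words)}
    for i, j in links:
        neighbours[i].add(j)
        neighbours[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        word = stack.pop()
        for other in neighbours[word] - seen:
            seen.add(other)
            stack.append(other)
    return len(seen) == num_words

def is_valid_structure(num_words, links):
    return is_planar(links) and is_connected(num_words, links)

# cat=0, horse=1, dog=2, fish=3
print(is_planar([(0, 2), (1, 3)]))        # cat-dog and horse-fish cross
print(is_connected(4, [(0, 1), (2, 3)]))  # two separate pairs
print(is_valid_structure(4, [(0, 1), (1, 2), (2, 3)]))
```

The first two calls correspond to the two illegal diagrams above; the third is a simple chain that satisfies both rules. Whether each word's own linking requirement is met is a separate check not shown here.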
1.4. LINK GRAMMAR IN RELATION TO OTHER SYSTEMS. The structure assigned to a
sentence by a link grammar is rather unlike any other grammatical system that we know of (although it
is certainly related to dependency grammar). Rather than thinking in terms of syntactic functions (like
subject or object) or constituents (like "verb phrase"), one must think in terms of relationships between
pairs of words. In the sentence below, for example, there is an "S" ("subject") relation between "dog"
and "has"; a "PP" (past-participle) relationship between "has" and "gone"; and a "D" (determiner)
relation between "the" and "dog". (Ignore the lower-case letters for the moment; they will be explained
below.)
     +-----Ds------+
     |      +---A--+-Ss-+-PP-+
     |      |      |    |    |
    the  black.a dog.n has  gone

Where possible, we try to give link-types names that have mnemonic significance in this way.
It may be seen, however, that parts of speech, syntactic functions, and constituents may be recovered
from a link structure rather easily. For example, whatever word is on the left end of an "S" link is the
subject of a clause (or the head word of the subject phrase); whatever is on the right end is the finite
verb; whatever is on the left end of a D link is a determiner; etc. Moreover, all nouns, verbs, and
adjectives in the dictionary are subscripted (as ".n", ".v", or ".a"--see section 3.4), so in these cases the
syntactic category of the word is made explicit.
With version 4.0, we have incorporated a system for deriving a traditional constituent representation of
a sentence from a linkage.

2. Using the Parser


Go to Link Grammar home page and follow instructions. You will probably start by using the trial
demonstration parser. Otherwise, you can install the full parser as explained.
2.1. MULTIPLE LINKAGES. If at any point the parser finds more than one way of analyzing a string, it
generates both of them, and tries parsing the sentence with both forms of the string. This might happen a)
if there are multiple forms of the word in the dictionary, such as run.v and run.n with different subscripts;
b) if the string is capitalized and occurs at the beginning of the sentence, and both the capitalized and
lower-case forms are listed in the dictionary, such as Bill and bill; c) if there is more than one
UNKNOWN-WORD category. See Section 4 for further details. Multiple linkages are also sometimes
generated when there is a conjunction like 'and'. See section 5.

3. General Features of the Parser


3.1. CONNECTOR SUBSCRIPTS. In general, a connector may only link to another one with the same
name, i.e., the same string of capital letters. However, there is another way of controlling how connectors
may link to each other, using connector subscripts. A subscript is a lower-case letter following a
connector name, like "Ss+". An "Ss+" connector can connect with an unspecified "S-" connector, or an "Ss-"
connector, but not with an "Sp-" connector.
Connector types may have multiple subscript characters, such as "Spa+". An "Spa+" can connect with
an "S-", an "Sp-", or an "Spa-", but not with an "Ss-" or an "Ssa-" or an "Spb-".
An "*" subscript type is a "wildcard" that can connect with anything. Therefore, an "S*+" is exactly the
same as an "S+". An "S*a+" can connect with an "S-", an "Ss-", an "Sp-", or an "Ssa-", but not with an
"Ssb-".
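The matching rule for subscripts can be sketched in Python. The function names below are hypothetical, and the rule encoded is the one implied by the examples above: the capital-letter names must be identical, and subscripts are compared position by position, with "*" or a missing character matching anything.

```python
def split_connector(connector):
    """Split e.g. 'Spa+' into ('S', 'pa', '+')."""
    direction = connector[-1]
    body = connector[:-1]
    n = 0
    while n < len(body) and body[n].isupper():
        n += 1
    return body[:n], body[n:], direction

def connectors_match(plus_connector, minus_connector):
    """True if a right-pointing connector can link to a left-pointing one."""
    name_p, sub_p, dir_p = split_connector(plus_connector)
    name_m, sub_m, dir_m = split_connector(minus_connector)
    if dir_p != "+" or dir_m != "-" or name_p != name_m:
        return False
    # zip() stops at the shorter subscript, so missing positions
    # behave like wildcards.
    return all(a == b or "*" in (a, b) for a, b in zip(sub_p, sub_m))

for minus in ["S-", "Sp-", "Spa-", "Ss-", "Ssa-", "Spb-"]:
    print("Spa+", minus, connectors_match("Spa+", minus))
```

The loop reproduces the "Spa+" cases listed above: the first three connectors match and the last three do not.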
3.2. MACROS. It is possible to define a single symbol as a longer connector expression, and then use
that symbol to refer to the longer expression in the dictionary. To do this, simply choose a name for the
longer expression, and surround it with angle brackets (<>). Then treat it like a word in the dictionary;
list the name, then a colon, then the connector expression that it should stand for. For example, we
define "<noun-main-s>" in the dictionary as follows:
<noun-main-s>: (Ss+ & <CLAUSE>) or SIs- or Js- or Os- or
({[Bsj+]} & Xd- & Xc+ & MX-);

We then use this symbol in many other actual word definitions.
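Macro use amounts to textual substitution, which can be sketched in Python as follows. The function name expand_macros and the two "toy" macros are invented for this illustration; the sketch wraps each expansion in parentheses so that it behaves as a single unit inside the larger expression, which is an assumption about how substitution should preserve grouping rather than a description of the parser's dictionary reader.

```python
import re

MACRO_REF = re.compile(r"<[^<>\s]+>")

def expand_macros(expression, macros, _active=()):
    """Replace every <name> in expression by its (recursively expanded) definition."""
    def substitute(match):
        name = match.group(0)
        if name in _active:
            raise ValueError(f"macro {name} refers to itself")
        if name not in macros:
            raise KeyError(name)
        return "(" + expand_macros(macros[name], macros, _active + (name,)) + ")"
    return MACRO_REF.sub(substitute, expression)

# Hypothetical macros, much simpler than the real dictionary's.
toy_macros = {
    "<toy-noun>": "S+ or O-",
    "<toy-det-noun>": "{D-} & <toy-noun>",
}
print(expand_macros("{A-} & <toy-det-noun>", toy_macros))
# prints: {A-} & ({D-} & (S+ or O-))
```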


We use many of these macros in the dictionary, to reduce redundancy; there are many connector
expressions that are used over and over in longer expressions. Here are a few common ones:
<noun-main-...>: the "main" connectors for nouns, used to link them to the rest of the sentence
(as subject, object, etc).
<noun-sub-...>: the "sub" connectors for nouns, used to link them to modifiers like
prepositional phrases and relative clauses.
<verb-...>: These macros are for verbs; they distinguish different forms of the same verb. That
is, they contain connector types - like S-, PP-, etc. - that distinguish different forms of the same
verb. <verb-s> is for singular verbs, <verb-pp> for past participles, <verb-sp,pp> for forms
which are both simple past and past participle, etc.
<vc-...>: These macros are for verb complements; they stand for different complement
expressions. Some verbs can connect to a direct object, using O+; some can connect to an
infinitive verb, using TO+; and so on.
3.3. WORD-FILES. The most basic way to write the dictionary is to list all the words in a particular
category, followed by a colon, followed by their connector expression. There is another way, however. One
can put all the words in a category in a file, choose a name for the file, and put that filename in the
dictionary in place of the list of words. When listed in the dictionary, the filename must be preceded by a
slash (/).
Here are the word files that are in use at the moment:
words.n.1      singular countable (i.e. not mass) nouns
words.n.2.s    plural nouns ending in "s"
words.n.2.x    plural nouns not ending in "s"
words.n.3      mass nouns
words.n.4      nouns that may be mass or countable
words.n.p      proper names that are also ordinary words when not capitalized
               (see Section 4.1 for explanation)
words.n.t      nouns that can be used as titles, like "president"

(In the following verb files, the final number indicates the verb form. ".1" is for infinitive-plural forms,
".2" is for singular forms, ".3" is for simple-past / past-participle forms, ".4" is for present participles,
".5" is for gerunds. On intransitive verbs, the present participle and gerund expression are combined
into a single dictionary entry.)
words.v.1.(1-4)     intransitive verbs
words.v.1.p         special two-word passives ("lied_to", "paid_for")
words.v.2.(1-5)     optionally transitive verbs
words.v.4.(1-5)     transitive verbs
words.v.5.(1-4)     intransitive verbs that may form two-word verbs with particles
                    like "up" and "out"
words.v.6.(1-5)     optionally transitive verbs that may form two-word verbs
words.v.8.(1-5)     transitive verbs that may form two-word verbs
words.v.10.(1-4)    verbs that may be used in quotation expressions, like "said"
                    ("John is here, he said")

words.adj.1    ordinary adjectives, with no special complements
words.adj.2    ordinary comparative adjectives (e.g. "bigger")
words.adj.3    ordinary superlative adjectives (e.g. "biggest")
words.adv.1    ordinary manner adverbs ("quickly", "angrily")
words.adv.2    ordinary clausal adverbs ("fortunately")
words.adv.3    adverbs like "chemically"
words.y        common year numbers ("1990", etc.)
words.s        US state names and abbreviations

3.4. WORD SUBSCRIPTS. A single word can be given several different dictionary entries. To do this, the
entries must be distinguished by giving the words different subscripts. Words may be followed by a
subscript such as ".n". For example:
run.n: A+ or B+...
run.v: C+ or D+...

If a word is listed more than once with the same subscript, or if it is listed once with a subscript and once
without, the parser will generate a warning message and will ignore one of the entries.
The parser starts at the right end of every string of characters. Any sequence of letters to the right of the
right-most period in the string will be considered the subscript.
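The subscript rule just stated can be sketched in a few lines of Python. The function name is hypothetical; the sketch simply treats any run of letters after the right-most period as the subscript and leaves everything else alone.

```python
def split_word_subscript(token):
    """Return (word, subscript); the subscript is '' when there is none."""
    base, dot, tail = token.rpartition(".")
    if dot and base and tail.isalpha():
        return base, tail
    return token, ""

for token in ["run.v", "run.n", "dog", "3.5", "UNKNOWN-WORD.n"]:
    print(token, split_word_subscript(token))
```

A string such as "3.5" is left whole, because what follows its period is not a sequence of letters.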
In searching for linkages, the parser will consider each entry for the word as a different word, and will
generate all linkages found for all entries. The subscript is shown in the display, thus indicating which
entry the parser chose for a particular linkage.
The main word subscripts we use are ".n" for nouns, ".v" for verbs, and ".a" for adjectives. All nouns,
verbs, and adjectives are subscripted in this way. Certain other subscripts are used only when needed to
distinguish two forms of the same word: ".e" for adverb, ".p" for preposition, ".s" for singular, ".p" for
plural, ".t" for title.
3.5. THE COST SYSTEM. (Ignore this section initially.)
We have a system for assigning a cost to a linkage. This allows the parser to express preferences among
the linkages it finds. The cost system uses square brackets ("[" and "]"). If a connector, or a series of
connectors, is surrounded by square brackets, it is assigned a cost. The amount of cost is equal to the
number of square brackets on each side: [A+] will receive a cost of 1; [[A+]] will receive a cost of 2; etc.
The parser uses this cost as a criterion for deciding which linkage to output first; it outputs them in order of
cost (i.e., lowest cost first).
At the moment, connectors with a cost of 0, 1 or 2 are considered in normal parsing.
Given several linkages of the same cost level, the parser has certain heuristics for choosing the best parse,
i.e., the one to output first. It prefers the linkage in which the total length of the links is lowest; and in
sentences with conjunctions, it prefers a linkage where the lengths of the conjoined word-lists are similar
(see section 5). This information is indicated in the cost vector shown above the linkage:
Unique linkage, cost vector = (UNUSED=1 DIS=0 AND=0 LEN=1)

"DIS" is the connector cost or disjunct cost for the linkage (the "[]" system explained above); "AND" is the
difference in length between and-list elements; and "LEN" is the total length of all links in the sentence
(minus the number of words, since the total link length is never less than the number of words).
"UNUSED" indicates the number of null-links; see section 7.1.
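The bracket-counting rule and the output order can be sketched in Python. This is an illustration under stated assumptions, not the parser's code: the function names and the dictionary layout of a "linkage" are invented here, the "AND" component is left out, and plain total link length stands in for "LEN" (subtracting the number of words would not change the order for a given sentence).

```python
import re

TOKEN = re.compile(r"\[|\]|@?[A-Z]+[a-z*]*[+-]")

def connector_costs(expression):
    """Pair each connector with the number of square brackets enclosing it."""
    costs, depth = [], 0
    for token in TOKEN.findall(expression):
        if token == "[":
            depth += 1
        elif token == "]":
            depth -= 1
        else:
            costs.append((token, depth))
    return costs

def total_link_length(links):
    # links are (left_position, right_position) pairs
    return sum(abs(j - i) for i, j in links)

def output_order(linkages):
    """Lowest connector cost first; ties broken by shorter total link length."""
    return sorted(linkages, key=lambda l: (l["dis"], total_link_length(l["links"])))

print(connector_costs("[A+] & [[B-]] & C+"))
# prints: [('A+', 1), ('B-', 2), ('C+', 0)]

candidates = [
    {"name": "a", "dis": 1, "links": [(0, 1), (1, 2)]},
    {"name": "b", "dis": 0, "links": [(0, 2), (1, 2)]},
    {"name": "c", "dis": 0, "links": [(0, 1), (1, 2)]},
]
print([l["name"] for l in output_order(candidates)])
# prints: ['c', 'b', 'a']
```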

4. Special Features of the Dictionary


4.1. CAPITALIZATION. The parser respects capitalization: that is, the use of upper- and lower-case
letters. If a string is listed in the dictionary beginning with a capital letter, then a word that is inputted will
only match it if it has the same capitalization. (The same with strings with capital letters in the middle,
although this is probably of little use.) However, there are a few special cases here.
There is a general category in the dictionary called "CAPITALIZED_WORDS". This is the default
category for words whose first character is capitalized. Any such word which is inputted which is not
explicitly listed in another category will be assigned to this category. This is of course useful, since most
capitalized words are names which are grammatically all the same.
A special situation occurs with words at the beginning of the sentence. If a sentence-initial word has an
uncapitalized first letter, it is treated in the normal manner. If it is capitalized, the parser will first look to
see if it is listed in the dictionary as either a capitalized word or an uncapitalized word. If not, it will then
assign it to the generic "CAPITALIZED_WORDS" category. (If the word is listed both as a capitalized
word and an uncapitalized one, the parser will try to use it in both ways. Because there are certain words
which are also common names, like "Will" and "Rob", we have created a special category for them, so that
when they are used sentence-initially, they will be recognized as possible names.)
The situation at the beginning of the sentence also applies after a colon. Sometimes, after a colon, the
following word is capitalized as if it was the beginning of a sentence; the parser recognizes this. So, for
example, the following sentence is accepted: "The problem is this: The dog ran."
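The lookup behaviour described in this section can be summarized in a short Python sketch. The function name and the use of a plain set of strings as the "dictionary" are assumptions for illustration; the caller is expected to pass sentence_initial=True both at the start of a sentence and after a colon. Unlisted lower-case words are left to the unknown-word mechanism and simply yield no candidates here.

```python
def candidate_entries(word, dictionary, sentence_initial):
    """Return the dictionary entries the parser would try for this word."""
    if not word[:1].isupper():
        # Uncapitalized words are looked up in the normal manner.
        return [word] if word in dictionary else []
    candidates = [word] if word in dictionary else []
    if sentence_initial:
        lowered = word[0].lower() + word[1:]
        if lowered in dictionary:
            candidates.append(lowered)
    # A capitalized word with no listed form falls into the generic category.
    return candidates or ["CAPITALIZED_WORDS"]

toy_dictionary = {"the", "dog", "bill", "Bill"}
print(candidate_entries("Bill", toy_dictionary, sentence_initial=True))
print(candidate_entries("The", toy_dictionary, sentence_initial=True))
print(candidate_entries("Zork", toy_dictionary, sentence_initial=False))
```

With this toy dictionary, sentence-initial "Bill" is tried both as "Bill" and as "bill", sentence-initial "The" is tried as "the", and "Zork" falls back to CAPITALIZED_WORDS.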
4.2. HYPHENATED EXPRESSIONS. The dictionary also contains a special category called
"HYPHENATED_WORDS". If a string contains a hyphen, and it is not listed in the dictionary, the parser
will assign it to the category "HYPHENATED_WORDS". This is, again, useful, since hyphenated words
are used somewhat "productively", and it would be very difficult to list them all.
4.3. NUMBER EXPRESSIONS. The dictionary contains a category "NUMBERS". Any numerical
expression -- that is, a string consisting entirely of numerical characters -- will be assigned to this category
unless it is explicitly listed elsewhere in the dictionary. (The string may also contain a period, i.e. a
decimal point, or a comma, as in "3,287". It may also contain a colon; thus time expressions like "4:30" are
treated as generic numbers.)
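A test for "numerical expression" in this sense can be sketched in Python. The function names are hypothetical; the sketch accepts any string made only of digits, periods, commas, and colons that contains at least one digit, which is slightly more permissive than a strict number format and is an assumption about the intended rule.

```python
DIGITS = set("0123456789")
SEPARATORS = set(".,:")

def is_number_expression(token):
    return (
        any(ch in DIGITS for ch in token)
        and all(ch in DIGITS or ch in SEPARATORS for ch in token)
    )

def number_category(token, dictionary):
    """Explicit dictionary entries win; otherwise numbers go to NUMBERS."""
    if token in dictionary:
        return token
    return "NUMBERS" if is_number_expression(token) else None

for token in ["3,287", "4:30", "3.14", "4th", ","]:
    print(token, is_number_expression(token))
```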
4.4. UNKNOWN WORDS. The dictionary also permits a feature known as "unknown words". A category
can be defined using the string "UNKNOWN-WORD.x", where x is any subscript. If a word beginning
with a lower-case letter is typed in that is not recognized, it will be assigned to that category. The word is
then displayed with a question-mark in brackets, like "blah" below:
      +-----Wd----+
      |    +---D--+--Ss--+-Pp-+
      |    |      |      |    |
    ///// the blah[?].n is  here

Several different unknown word categories may be generated, labeled with different subscripts: for
example, corresponding to nouns, verbs, adjectives, and adverbs. (These are the four categories we use,
labeled .n, .v, .a, and .e, respectively.) The parser will search for all linkages that can be found using each
entry. If it only finds a linkage for the "noun" category, then the output will show the unknown word
labeled ".n": in effect, the parser is then guessing that the word is a noun.
Version 4.0 of the parser has a new feature for handling unknown words, known as "morpho-guessing".
This is a system for guessing the syntactic category of an unknown word (that is, a word not explicitly
listed in the dictionary) based on its spelling. Words that end in "-s" are assumed to be plural nouns or
singular verbs; these are assigned to a category listed as "S-WORDS" in the dictionary. Similarly, words
ending in "-ed" are assumed past-tense (or passive) verbs; those ending in "-ing", present participles; those
ending in "-ly", adverbs. This greatly improves the ability of the parser to handle sentences containing
multiple unknown words. Words that have been treated in this way are marked with a "[!]".
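Suffix-based guessing of this kind can be sketched in Python. Only the category name "S-WORDS" comes from the text; the other three labels are placeholders formed by analogy for this example, as is the function name. The sketch checks the longer suffixes first and requires at least one character before the suffix.

```python
# Only "S-WORDS" is named in the text; the other labels are hypothetical.
SUFFIX_CATEGORIES = [
    ("ing", "ING-WORDS"),
    ("ed", "ED-WORDS"),
    ("ly", "LY-WORDS"),
    ("s", "S-WORDS"),
]

def guess_category(word):
    """Return a guessed category for an unlisted word, or None if no suffix fits."""
    for suffix, category in SUFFIX_CATEGORIES:
        if word.endswith(suffix) and len(word) > len(suffix):
            return category
    return None

for word in ["blorps", "blorped", "blorping", "blorply", "blorp"]:
    print(word, guess_category(word))
```

A word that matches no suffix, like "blorp", would be left to the ordinary UNKNOWN-WORD categories.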
4.5. PUNCTUATION. The parser is capable of handling a variety of punctuation symbols. There are two
issues to be discussed here. One is the listing of symbols in the dictionary; the other is the way they are
"read" by the parser when they are used in sentences.
Punctuation symbols can be listed in the dictionary just like words, and given ordinary linkage
expressions. The same is true for strings containing multiple punctuation symbols or a mixture of letters
and punctuation. The problem here is that certain punctuation symbols are also used as the "syntax" of the
dictionary: colons, semi-colons, ampersands, etc. Our solution to this is as follows: when listing these
special characters, or a string containing them, one must put them in quotation marks:
";": A+ or B-;
"+": C+ or D-;

(The special characters that must be treated this way are precisely those which are used in the dictionary in
a "syntactic" way: "(", ")", "{", "}", "[", "]", "@", "%", "&", "*", "+", "-", "/", "<", ">".)
When punctuation symbols are used in sentences, they will be used in linkages according to the connector
expressions listed in the dictionary, in the normal way. There is a difference, however. It may be noted that
although many punctuation symbols are similar to words in the ways they are used, they are often not
separated from preceding or following words by spaces. In order for these symbols to be recognized as
separate units, then, they must be "stripped off": that is, a space must be inserted between the symbol and
the adjacent word. Details are in the paper accessed from Link Grammar home page.
One exceptional case is quotation marks. Quotation marks may not be defined in the dictionary; and they
are simply ignored when they are used in sentences. This is sufficient to handle most uses of quotes;
generally, the presence of quotes does not affect the well-formedness of sentences, and it often only
subtly affects meaning. However there are a few constructions, such as the pair of sentences below,
which seem to be only correct when quotes are included.
She said, "John is leaving".
?She said, John is leaving.
We are unable to control such usages at the moment.
4.6. THE WALL(S). It proved to be useful to imagine that there was a dummy word at the beginning of
every sentence. We call this "the wall". The wall has a linking requirement like any other word; it is listed
in the dictionary under "LEFT-WALL". If this entry is included in the dictionary, the wall will be
automatically inserted at the beginning of every sentence. Because of the connectivity rule, it is then
necessary for the wall to be linked to the rest of the sentence in order for the sentence to be valid.
There is also a "right-hand wall", which is similar to the original wall, at the other end of the sentence.
This is only needed for certain punctuation phenomena. In most sentences, we use a special "RW"
connector to simply connect the left hand wall to the right hand one. The right-wall's dictionary entry is
"RIGHT-WALL". (Since the left-wall is much more important than the right-wall, we often refer to the
left-wall simply as "the wall".)
In most sentences, the left-wall connects to the sentence with a "Wd" link, and the right-wall connects to
the left-wall with "RW". When only these connectors on the walls are being used, they are not displayed in
the linkage diagram. When other connectors on the walls are being used, instead or as well, the walls are
shown. (For example, the left-wall is shown in questions and imperatives.) To make it so that the walls are
_always_ shown, type "!walls".
4.7. IDIOMS. A string of words can be defined as a single dictionary entry. To do this, simply join the
words together with underbars:
a_la_mode: A+ or B-;

Most idioms can be interpreted either as a single "idiom" or as a string of words (for example, "in
question"). In this case, the parser will find all linkages with both interpretations.
In reading idiomatic strings from the dictionary, the parser breaks them up into individual words and
assigns them "dummy" link-types which simply link the words of the idiom together in series. These
link-types are assigned four-letter names of the form ID[X][Y], where X and Y are arbitrary letters.
Idioms cannot be given subscripts; if "a_la_mode.a" is included in the dictionary, this will not be accepted.
However, an idiom can be listed in the dictionary more than once, without subscripts.

5. Coordinating Conjunctions
Coordination constructions do not fit naturally into the framework of link grammars. We have devised a
method for automatically transforming the given link grammar into another one that captures the desired
phenomena. See the full introduction at the Link Grammar home page for details, but problems associated
with conjunctions are not yet fully resolved.
Conjunctions are a frequent source of ambiguity. For example, in the sentence "Several big cats and dogs
with sharp teeth chased me", "several" may or may not apply to "dogs" (as a plural noun, "dogs" does not
require a determiner); "big" may or may not apply to dogs; and "with sharp teeth" may or may not apply to
cats. Linkages for all of these possibilities will of course be generated.
A few usages of coordinating conjunctions are handled using ordinary link logic. There is some overlap
between the special handling of conjunctions and the ordinary handling, so that some sentences receive
multiple parses. For example, ordinary clauses conjoined together will receive two parses: "John ran and
Fred walked". See the entries in the Guide-To-Links on "W" and "CC" for discussion of these ordinary
usages of conjunctions.
Another problem concerns the different kinds of conjunctions. Our discussion focuses on the word "and",
although the ideas apply to the use of "or", "but", "either-or", "neither-nor", "both-and", and "not
only-but". Right now, our system does not always distinguish between the various kinds of conjunctions
allowed. However, there appear to be different constraints on different conjunctions. This results in some
false positives:
I saw John and Fred
*I saw John but Fred
The dog or cat ran
*The either dog or cat ran
Thus our system still needs some work in the area of conjunctions.

6. Post-Processing
Besides conjunctions, there are certain phenomena in English which the parser is incapable of dealing with
in its basic form. To solve these problems, we developed a post-processing system, based on a concept we
call "domains". A domain is a subset of the links that make up a sentence. After a linkage has been found,
the post-processing mechanism goes through the linkage and divides the sentence up into domains based
on the kind of links that are present in the sentence. It then further divides the links into "groups": sets of
links which share a particular domain membership. It then applies rules which may declare the linkage
invalid based on the combinations of links present in a given group. See the full paper at the Link
Grammar home page for details.

7. Speed and Robustness Features


The original version of the parser, as described in previous sections, did an exhaustive search for all
correct linkages; if none were found, it produced nothing. However, this meant that the parser was not
"robust": it could not do anything useful with a sentence unless it could parse the entire thing. It was also
quite slow. We have developed several remedies to these problems.
7.1. THE NULL-LINK SYSTEM. An important feature of the parser is the "null-link" system. This
effectively allows robust parsing: that is, it allows the parser to assign some structure to a sentence even
when it cannot fully interpret it. Basically, if the parser cannot parse a sentence normally (that is, if it
cannot find any valid linkages), it tries ignoring one word in the sentence. It finds all the linkages it can,
ignoring just one word (some linkages may ignore one word, some may ignore another). This is "null link
stage 1". Failing that, it then attempts to find linkages ignoring 2 words. This is "null link stage 2". Failing
that, it will continue to increment the number of null links, until it finds some valid linkages; it will then
output all the linkages found at this stage, and stop. There may be some cases where it cannot find a valid
linkage unless it ignores _all_ the words in the sentence; in this case, it simply gets to "null link stage N"
(where N is the number of words in the sentence), and then gives up.
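The staged search just described can be sketched as a simple loop in Python. Here find_linkages is a hypothetical callable standing in for the real parser, and toy_parser is a made-up stand-in used only to exercise the loop; neither is part of the actual program.

```python
def parse_with_null_links(words, find_linkages):
    """Try null-link stage 0, 1, 2, ... and stop at the first stage with results.

    find_linkages(words, null_count) must return the list of linkages
    that ignore exactly null_count words (empty if there are none).
    """
    for null_count in range(len(words)):
        linkages = find_linkages(words, null_count)
        if linkages:
            return null_count, linkages
    # Reaching stage N (every word ignored) means giving up.
    return None, []

def toy_parser(words, null_count):
    # Pretends the sentence only parses once "gosh" and "," are both ignored.
    if null_count == 2:
        return [[w for w in words if w not in ("gosh", ",")]]
    return []

sentence = ["gosh", ",", "this", "sentence", "uses", "null-links"]
stage, linkages = parse_with_null_links(sentence, toy_parser)
print(stage, linkages)
```

Because the loop returns as soon as a stage yields anything, all reported linkages share the same number of null links, as in the description above.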
In the graphic display, "null-linked" words are shown in brackets, with no links attached to them. In the
sentence below, "gosh" and "," are null-linked:
                +--Dsu--+---Ss---+----O---+
                |       |        |        |
    [gosh] [,] this sentence.n uses.v null-links

In null-link parsing, the connectivity requirement is suspended (see Section 1.3). This means that
disconnected "islands" may form. However, each island represents one added null link. That is, if a
sentence can be parsed as three disconnected islands (but with all the words otherwise connected with
regular links), this linkage will be found at null link stage 2.

The null-link system can be turned on or off by typing the command "!null". The default is that null-links
are on. If null-links are turned off, then, when the parser is unable to find a complete linkage for a
sentence, it will say "No complete linkages found", and prompt for the next sentence.
7.2. THE LINK-LENGTH LIMIT. In studying the parser's performance on very long sentences (on
which it was often very slow), we discovered that it was often considering extremely long links even for
link-types which are generally very short. For this reason, we installed a "link-length-limit": links are only
allowed to be a certain length, in terms of the number of words from end to end.
7.3. THE POST-PROCESSING LIMIT. Since post-processing proved to be a major source of the
slowness of the parser, we installed a "post-processing limit". This is simply a limit on the number of
linkages that will be considered by post-processing. If the limit is set at 100 (this is the default), then only
100 linkages will be considered by post-processing, even if many more than that are generated; the others
will just be discarded. This means, of course, that the "best" linkage (by the parser's heuristics, for
example) may be discarded. However, the linkages to be considered by post-processing are selected
randomly from all the generated ones, which means that at least one linkage is likely to be found which is
fairly similar to the correct one.
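The effect of the post-processing limit can be sketched in Python with a uniform random sample. The function name and the optional rng parameter are assumptions for this illustration; how the parser itself draws its random selection is not specified here.

```python
import random

def select_for_post_processing(linkages, limit=100, rng=None):
    """Return at most `limit` linkages, chosen at random when there are more."""
    rng = rng or random
    linkages = list(linkages)
    if len(linkages) <= limit:
        return linkages
    # The linkages not drawn here are simply discarded.
    return rng.sample(linkages, limit)

generated = [f"linkage-{n}" for n in range(250)]
chosen = select_for_post_processing(generated, limit=100, rng=random.Random(0))
print(len(chosen))
```

Passing a seeded random.Random instance, as in the usage line, makes the selection repeatable, which is convenient when comparing runs.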
