
This paper describes a novel framework for creating parsers that process and analyze texts written in a partially structured natural language. An application of such a combined parser in a practical setting is demonstrated, showing that the proposed approach can efficiently construct a parser for analyzing project-specific industrial specification documents.

Parser combinators are operators to combine existing simple parsers and to construct more complicated parsers. Using parser combinators or a combinatorial parser framework, we can quickly and flexibly construct parsers.

In real projects such as software development projects, many text documents are created in project-specific forms. There are therefore many opportunities to use such text-parser construction frameworks to analyze these documents automatically.

We often need to parse only specific parts of an entire text document. However, in conventional combinatorial parser frameworks it is difficult to express partial parsing concisely.

Various kinds of natural language processors or parsers (NLPs) are useful, but conventional frameworks do not handle them flexibly, because ordinary NLPs do not have an explicit acceptance language.

We would like to construct text parsers quickly and flexibly using natural language parsers as sub-parsers. However, there are no frameworks that generate such parsers from a grammar or an expression.

The approach consists of three parts: 1) characterizing each parser as a function that accepts a prefix of the input text and extracts some parts of the accepted text, so that the result expresses the parts the parser extracted from the input text; 2) designing text-parser combinators that make it easier and more flexible to construct parsers, especially combinators for partial parsing and for combining natural language parsers with formal language parsers; and 3) formalizing text-parser expressions as extensions of parsing expression grammars and constructing a system that generates project-specific parsers from the expressions.

Consider parsers pa, pb, pc (accepting only a, b, c, respectively) and the combinators ;, / and *. p1; p2 is a parser that parses the input string with p1 and then parses the remaining string with p2. p1/p2 parses the input string with p1 and, if that parsing fails, parses the input string with p2. p* parses the input string with p as many times as possible. Parsing abbcde with the parser pa; (pb/pc)* accepts abbc and returns the remainder string de together with an abstract syntax tree for the accepted string.
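The three basic combinators can be sketched in Python as follows. This is a minimal illustration, not the system described here: the names `lit`, `seq`, `alt`, and `many` are hypothetical, and a flat list of extracted strings stands in for the abstract syntax tree.

```python
from typing import Callable, List, Optional, Tuple

# A successful parse returns (remainder string, extracted strings).
# Assumption: a flat list of strings stands in for the abstract syntax tree.
Result = Optional[Tuple[str, List[str]]]
Parser = Callable[[str], Result]


def lit(text: str) -> Parser:
    """Parser that accepts exactly the literal `text` (hypothetical helper)."""
    def parse(s: str) -> Result:
        return (s[len(text):], [text]) if s.startswith(text) else None
    return parse


def seq(p1: Parser, p2: Parser) -> Parser:
    """p1; p2 : parse with p1, then parse the remaining string with p2."""
    def parse(s: str) -> Result:
        r1 = p1(s)
        if r1 is None:
            return None
        r2 = p2(r1[0])
        if r2 is None:
            return None
        return r2[0], r1[1] + r2[1]
    return parse


def alt(p1: Parser, p2: Parser) -> Parser:
    """p1/p2 : parse with p1; if that fails, parse the same input with p2."""
    def parse(s: str) -> Result:
        r1 = p1(s)
        return r1 if r1 is not None else p2(s)
    return parse


def many(p: Parser) -> Parser:
    """p* : apply p as many times as possible (never fails)."""
    def parse(s: str) -> Result:
        extracted: List[str] = []
        while True:
            r = p(s)
            if r is None or r[0] == s:  # stop on failure or when no input is consumed
                return s, extracted
            s, extracted = r[0], extracted + r[1]
    return parse


pa, pb, pc = lit("a"), lit("b"), lit("c")
parser = seq(pa, many(alt(pb, pc)))   # pa; (pb/pc)*
print(parser("abbcde"))               # ('de', ['a', 'b', 'b', 'c'])
```

Returning `None` for failure keeps the ordered choice in `alt` simple: the second parser is tried only when the first one fails on the same input.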

We characterize each text parser as a function that receives a text, i.e. a string, extracts a series of strings from the text, and returns both the remainder string and the abstract syntax tree for the parsed string.
The abstract syntax tree is a tree structure with string labels.

We introduce two parser combinators for partial parsing and add them explicitly to our text-parser combinators.

<>p: Matching Parse
@p: Checking Parse
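A possible reading of these two combinators, sketched in Python. The function names `matching` and `checking`, the helper `lit`, and the use of a list of extracted strings in place of an AST are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

# (remainder string, extracted strings); None means the parse failed.
Result = Optional[Tuple[str, List[str]]]
Parser = Callable[[str], Result]


def lit(text: str) -> Parser:
    """Parser that accepts exactly the literal `text` (hypothetical helper)."""
    def parse(s: str) -> Result:
        return (s[len(text):], [text]) if s.startswith(text) else None
    return parse


def matching(p: Parser) -> Parser:
    """<>p : skip ahead to the first position k where p can parse s[k:]."""
    def parse(s: str) -> Result:
        for k in range(len(s) + 1):
            r = p(s[k:])
            if r is not None:
                return r          # the skipped prefix s[:k] is discarded
        return None               # no position works: the parse fails
    return parse


def checking(p: Parser) -> Parser:
    """@p : run p but discard whatever it extracted."""
    def parse(s: str) -> Result:
        r = p(s)
        return None if r is None else (r[0], [])
    return parse


print(matching(lit("ID:"))("header text ID:42"))   # ('42', ['ID:'])
print(checking(lit("ID:"))("ID:42"))               # ('42', [])
```

The linear scan in `matching` retries the operand at every offset, which is simple but can be slow for long inputs and expensive operand parsers.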

Consider an NLP Nisa that receives a sentence and extracts two is-a-relation strings if a relation can be extracted from the given sentence. We can treat Nisa as a function that receives a string and returns an Astb(isa, Asts(is, sentence1), Asts(a, sentence2)). For example, given the input strings If the account number is 0 and foreign-currency trade, the parser could return the ASTs Astb(isa, Asts(is, s1), Asts(a, 0)) and Astb(isa, Asts(is, null), Asts(a, s2)), where s1 = the account number and s2 = foreign-currency trade.
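To make the shape of such a wrapped NLP concrete, here is a small Python sketch. The constructor names `Astb` and `Asts` come from the text, but their fields are assumptions, and `toy_isa` is a hypothetical regex stand-in for a real natural language parser, not the Nisa used in the work.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple, Union


@dataclass(frozen=True)
class Asts:
    """Leaf node: a label and an extracted string (None when nothing was found)."""
    label: str
    text: Optional[str]


@dataclass(frozen=True)
class Astb:
    """Binary node: a label and two child trees."""
    label: str
    left: "Ast"
    right: "Ast"


Ast = Union[Asts, Astb]


def toy_isa(sentence: str) -> Astb:
    """Hypothetical stand-in for Nisa: splits a sentence of the form 'X is Y'."""
    m = re.match(r"\s*(.+?)\s+is\s+(.+?)\s*$", sentence)
    subject, complement = (m.group(1), m.group(2)) if m else (None, None)
    return Astb("isa", Asts("is", subject), Asts("a", complement))


def wrap_nlp(nlp: Callable[[str], Ast]) -> Callable[[str], Tuple[str, Ast]]:
    """Wrap an NLP as a parser that accepts every input and leaves no remainder."""
    def parse(s: str) -> Tuple[str, Ast]:
        return "", nlp(s)
    return parse


nisa = wrap_nlp(toy_isa)
remainder, ast = nisa("the account number is 0")
print(remainder == "", ast)
```

Because the wrapped parser never fails and always consumes the whole input, it only becomes useful when another combinator decides which part of the text it receives.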

We design combinators that use formal language parsers to prepare the input text for NLPs.

p1Up2: This parser scans the input text and locates the first position in the text where parsing by parser p2 succeeds. It then parses the text from the start up to the located position with parser p1, and parses the text after the located position with parser p2.
p1Fp2: This parser first parses the input text with parser p2 and then parses the string accepted by p2 with parser p1.
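The two combinators can be sketched in Python as below. The names `until` and `filtered`, the helpers `lit`, `whole`, and `shout`, and the list-of-strings result are assumptions for illustration; in particular, this sketch does not require p1 to consume the whole prefix, which the text does not specify.

```python
from typing import Callable, List, Optional, Tuple

# (remainder string, extracted strings); None means the parse failed.
Result = Optional[Tuple[str, List[str]]]
Parser = Callable[[str], Result]


def lit(text: str) -> Parser:
    """Parser that accepts exactly the literal `text` (hypothetical helper)."""
    def parse(s: str) -> Result:
        return (s[len(text):], [text]) if s.startswith(text) else None
    return parse


def until(p1: Parser, p2: Parser) -> Parser:
    """p1Up2 : find the first k where p2 parses s[k:], then run p1 on s[:k]."""
    def parse(s: str) -> Result:
        for k in range(len(s) + 1):
            r2 = p2(s[k:])
            if r2 is not None:
                r1 = p1(s[:k])
                if r1 is None:
                    return None
                return r2[0], r1[1] + r2[1]
        return None
    return parse


def filtered(p1: Parser, p2: Parser) -> Parser:
    """p1Fp2 : run p2 first, then run p1 on the string that p2 accepted."""
    def parse(s: str) -> Result:
        r2 = p2(s)
        if r2 is None:
            return None
        rest = r2[0]
        accepted = s[:len(s) - len(rest)]  # assumes rest is a suffix of s
        r1 = p1(accepted)
        if r1 is None:
            return None
        return rest, r1[1]                 # only p1's extraction is kept
    return parse


# Stand-ins for NLPs: they accept any input and leave no remainder.
def whole(s: str) -> Result:
    return "", [s]


def shout(s: str) -> Result:
    return "", [s.upper()]


print(until(whole, lit(";"))("price is 10;rest"))   # ('rest', ['price is 10', ';'])
print(filtered(shout, lit("abc"))("abcdef"))        # ('def', ['ABC'])
```

In both cases the formal language parser p2 delimits a region of the text, and p1 (typically a wrapped NLP) is applied only to that region.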

The text-parser expressions are based on PEGs. In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. Syntactically, PEGs look similar to context-free grammars (CFGs), but they have a different interpretation: the choice operator in a PEG selects the first match, whereas choice in a CFG is ambiguous. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.

1. Rules for [N]: We assume that natural language parsers are modeled (and may be wrapped) as parsers that have no explicit acceptance language but accept all input strings. Therefore, [N] accepts the input string s, extracts [s], and returns both the empty string and an AST.

2. Rules for @p: Combinator @ cancels, or clears away, the extracted strings, so in each rule using this combinator @p extracts no string. The abstract syntax tree created earlier is also cleared. This combinator is useful when we only want to check the description form of the input text without extracting any part of it.

3. Rules for <>p: Combinator <> lets the operand parser search for the first position k of the input string s such that s[k..] can be parsed. If such a position exists, the parser skips the substring s[0..k-1] and parses s[k..]. Otherwise, the parsing fails.

4. Rules for p1Fp2: The first rule for this combinator says that parser p2 first parses the input string s. If that parsing succeeds, parser p1 parses the string accepted by p2, which is expressed by s - s'. The whole parser extracts l and returns the remaining string s' that resulted from parsing with p2. The other two rules cover parsing errors: one for a failure of p2 and the other for a failure of p1.

5. Rules for p1Up2: This combinator seeks the first position k of the input string s, by advancing the parse-starting position of the input string, such that parser p2 can parse s[k..]. If such a position exists, parser p1 tries to parse the substring from position 0 to k of the input s (i.e. s[0..k]); otherwise the parse fails. The three rules for this combinator cover these situations.

The system works in the following steps: 1) The user prepares NLP functions and stores these functions, together with reference names for them, in an NLP-store. 2) The user writes text-parser expressions describing the text parser to be constructed. 3) The system generates the intended text-parser function (or object) from the text-parser expressions and the NLP-store.
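The three steps can be illustrated with a small Python sketch. Everything here is hypothetical: the NLP-store is modeled as a dictionary from reference names to functions, the text-parser expression is given as an already-parsed nested tuple rather than in the system's actual expression syntax, and `build` supports only literals, NLP references, and sequencing.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (remainder string, extracted strings); None means the parse failed.
Result = Optional[Tuple[str, List[str]]]
Parser = Callable[[str], Result]
NlpStore = Dict[str, Callable[[str], str]]


def build(expr: tuple, store: NlpStore) -> Parser:
    """Step 3: generate a parser function from an expression and the NLP-store."""
    kind = expr[0]
    if kind == "lit":                       # ("lit", text)
        text = expr[1]
        return lambda s: (s[len(text):], [text]) if s.startswith(text) else None
    if kind == "nlp":                       # ("nlp", reference_name)
        fn = store[expr[1]]                 # look the NLP up by its reference name
        return lambda s: ("", [fn(s)])      # an NLP accepts all of its input
    if kind == "seq":                       # ("seq", expr1, expr2)
        p1, p2 = build(expr[1], store), build(expr[2], store)

        def parse(s: str) -> Result:
            r1 = p1(s)
            if r1 is None:
                return None
            r2 = p2(r1[0])
            if r2 is None:
                return None
            return r2[0], r1[1] + r2[1]
        return parse
    raise ValueError(f"unknown expression kind: {kind!r}")


# Step 1: the user registers NLP functions under reference names.
store: NlpStore = {"shout": str.upper}      # toy stand-in for a real NLP

# Step 2: the user describes the parser (here as a pre-parsed tuple).
expr = ("seq", ("lit", "NOTE: "), ("nlp", "shout"))

# Step 3: the system generates the parser function.
note_parser = build(expr, store)
print(note_parser("NOTE: check balance"))   # ('', ['NOTE: ', 'CHECK BALANCE'])
```

Keeping the NLP functions in a separate store means the expression refers to them only by name, so an NLP can be replaced without rewriting the expression.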

The text parser saved a great deal of labor and improved the document quality by standardization and formalization. The text parser could be flexibly extended for changes in the description forms and document templates. The text parser was constructed by document engineers themselves using their document knowledge.

We designed a framework for combinatorial text parsers and implemented a text-parser combination system that combines existing natural language parsers with formal language parsers using novel combinators, and then generates text parsers that apply specific natural language processing to specified parts of entire text documents.

Future work includes an investigation of the expressive power of the text-parser expressions. We would also like to extend the combinatorial text parsers with other useful combinators for token-level parsing and partial AST querying during parsing.

Thank You !!!!
