natural language. We demonstrate an application of such a combined parser in a practical setting and show that the proposed approach can efficiently construct a parser for analyzing project-specific industrial specification documents.
Parser combinators are operators that combine existing simple parsers into more complicated parsers. Using parser combinators, or a combinatorial parser framework, we can construct parsers quickly and flexibly.
In real projects such as software development projects, many text documents are written in project-specific forms. There are therefore many opportunities to apply text-parser construction frameworks to analyze such documents automatically.
The work consists of three parts: (1) characterizing each parser as a function that accepts a prefix of the input text, extracts some parts of the accepted text, and returns an abstract syntax tree expressing the parts extracted by the parser; (2) designing text-parser combinators that make it easier and more flexible to construct parsers, especially combinators for partial parsing and for combining natural language parsers with formal language parsers; and (3) formalizing text-parser expressions as extensions of parsing expression grammars and constructing a system that generates project-specific parsers from these expressions.
Consider parsers pa, pb, pc (accepting only a, b, c, respectively) and the combinators ;, / and *. p1; p2 is a parser that parses the input string with p1 and then parses the remaining string with p2; p1/p2 parses the input string with p1 and, if that parse fails, parses the input string with p2; p* parses the input string with p as many times as possible. Parsing abbcde with the parser pa; (pb/pc)* accepts abbc and returns the remainder string de together with an abstract syntax tree for the accepted string.
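The sequence, choice, and repetition combinators described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names lit, seq, alt, and many are hypothetical, and a parser is modeled as a function returning the list of accepted pieces and the remainder string (a flat list stands in for the abstract syntax tree), or None on failure.

```python
from typing import Callable, List, Optional, Tuple

# Assumed model: a parser maps a string to (accepted pieces, remainder),
# or to None when it cannot accept a prefix of the input.
Result = Optional[Tuple[List[str], str]]
Parser = Callable[[str], Result]


def lit(word: str) -> Parser:
    """Parser that accepts exactly `word` at the start of the input."""
    def parse(s: str) -> Result:
        return ([word], s[len(word):]) if s.startswith(word) else None
    return parse


def seq(p1: Parser, p2: Parser) -> Parser:
    """p1; p2: parse with p1, then parse the remainder with p2."""
    def parse(s: str) -> Result:
        r1 = p1(s)
        if r1 is None:
            return None
        r2 = p2(r1[1])
        if r2 is None:
            return None
        return (r1[0] + r2[0], r2[1])
    return parse


def alt(p1: Parser, p2: Parser) -> Parser:
    """p1/p2: parse with p1; only if that fails, parse with p2."""
    def parse(s: str) -> Result:
        r1 = p1(s)
        return r1 if r1 is not None else p2(s)
    return parse


def many(p: Parser) -> Parser:
    """Repetition: apply p as many times as possible (zero times is fine)."""
    def parse(s: str) -> Result:
        pieces: List[str] = []
        rest = s
        while True:
            r = p(rest)
            if r is None or r[1] == rest:  # stop on failure or no progress
                break
            pieces += r[0]
            rest = r[1]
        return (pieces, rest)
    return parse


pa, pb, pc = lit("a"), lit("b"), lit("c")
parser = seq(pa, many(alt(pb, pc)))
accepted, remainder = parser("abbcde")
print("".join(accepted), remainder)  # abbc de
```

The repetition never fails by itself: if p cannot be applied even once, it accepts the empty prefix and returns the input unchanged.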
We characterize each text parser as a function that receives a text, i.e. a string, extracts a series of strings from it, and returns both the remainder string and the abstract syntax tree for the parsed string.
We define an abstract syntax tree as a tree structure with string labels.
We introduce two parser combinators for partial parsing and add them explicitly to our text parser combinators.
Let us consider an NLP Nisa that receives a sentence and extracts two strings forming an is-a relation, if such a relation can be extracted from the given sentence. We can treat Nisa as a function that receives a string and returns an AST of the form Astb(isa, Asts(is, sentence1), Asts(a, sentence2)). For example, given the input string "If the account number is 0 and foreign-currency trade", the parser could return the ASTs Astb(isa, Asts(is, s1), Asts(a, 0)) and Astb(isa, Asts(is, null), Asts(a, s2)), where s1 = "the account number" and s2 = "foreign-currency trade".
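One way to picture these ASTs is as two node types. The sketch below is an assumption, not the paper's definition: it reads Asts as a leaf carrying a label and a string (or no string, for null) and Astb as a branch carrying a label and child nodes. The helper isa is hypothetical and only builds the shape used in the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Asts:
    """Assumed leaf node: a string label and an extracted string (None for null)."""
    label: str
    value: Optional[str]


@dataclass(frozen=True)
class Astb:
    """Assumed branch node: a string label and a tuple of child nodes."""
    label: str
    children: Tuple["Ast", ...]


Ast = Union[Asts, Astb]


def isa(subject: Optional[str], obj: Optional[str]) -> Astb:
    """Hypothetical helper building Astb(isa, Asts(is, subject), Asts(a, obj))."""
    return Astb("isa", (Asts("is", subject), Asts("a", obj)))


s1 = "the account number"
s2 = "foreign-currency trade"

# The two ASTs from the example, written out by hand; a real Nisa would
# derive them from the sentence with natural language processing.
first = isa(s1, "0")
second = isa(None, s2)
print(first)
print(second)
```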
We design combinators that use formal language parsers to prepare the input text for NLPs.
p1Up2: This parser scans the input text and locates the first position at which parsing by p2 succeeds. It then parses the text from the start up to the located position with p1, and parses the text from the located position onward with p2.
p1Fp2: This parser first parses the input text with p2 and then parses the string accepted by p2 with p1.
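The two combinators above can be sketched as follows. The function names until (for U) and within (for F) are hypothetical, and the result format (extracted pieces plus remainder) is an assumption. The sketch also assumes that p1 is not required to consume all of the text handed to it, and that the remainder returned by a parser is always a suffix of its input.

```python
from typing import Callable, List, Optional, Tuple

# Assumed model: a parser maps a string to (extracted pieces, remainder),
# or to None when parsing fails.
Result = Optional[Tuple[List[str], str]]
Parser = Callable[[str], Result]


def lit(word: str) -> Parser:
    """Formal-language parser accepting exactly `word`."""
    def parse(s: str) -> Result:
        return ([word], s[len(word):]) if s.startswith(word) else None
    return parse


def everything(s: str) -> Result:
    """Stand-in for an NLP-style parser: accepts and extracts the whole input."""
    return ([s], "")


def until(p1: Parser, p2: Parser) -> Parser:
    """p1Up2: find the first k where p2 parses s[k:], then run p1 on s[:k]."""
    def parse(s: str) -> Result:
        for k in range(len(s) + 1):
            r2 = p2(s[k:])
            if r2 is None:
                continue  # p2 fails here; advance the start position
            r1 = p1(s[:k])
            if r1 is None:
                return None  # p1 rejects the text before position k
            return (r1[0] + r2[0], r2[1])
        return None  # p2 succeeds nowhere in the text
    return parse


def within(p1: Parser, p2: Parser) -> Parser:
    """p1Fp2: parse with p2, then run p1 on the string that p2 accepted."""
    def parse(s: str) -> Result:
        r2 = p2(s)
        if r2 is None:
            return None
        accepted = s[:len(s) - len(r2[1])]  # the prefix consumed by p2
        r1 = p1(accepted)
        if r1 is None:
            return None
        return (r1[0], r2[1])
    return parse


text = "if x is 0 then reject"
u = until(everything, lit(" then "))
f = within(everything, u)
print(u(text))  # (['if x is 0', ' then '], 'reject')
print(f(text))  # (['if x is 0 then '], 'reject')
```

In this sketch U lets a keyword parser mark where the free text for the NLP-style parser ends, while F hands the whole region recognized by a formal parser to the NLP-style parser as one string.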
The text-parser expressions are based on PEGs. A parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. Syntactically, PEGs look similar to context-free grammars (CFGs), but they have a different interpretation: in a PEG the choice operator selects the first alternative that matches, whereas in a CFG the choice is ambiguous. This is closer to how string recognition tends to be done in practice, e.g. by a recursive descent parser.
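The effect of ordered choice can be seen in a very small sketch. The names lit and choice are hypothetical, and a parser is assumed to return the accepted string and the remainder, or None on failure. Once the first alternative matches, the second is never tried, so the order of alternatives changes the result.

```python
from typing import Callable, Optional, Tuple

# Assumed model: a parser returns (accepted string, remainder) or None.
Parser = Callable[[str], Optional[Tuple[str, str]]]


def lit(word: str) -> Parser:
    """Parser accepting exactly `word` at the start of the input."""
    def parse(s: str) -> Optional[Tuple[str, str]]:
        return (word, s[len(word):]) if s.startswith(word) else None
    return parse


def choice(p1: Parser, p2: Parser) -> Parser:
    """PEG-style ordered choice: p2 is tried only if p1 fails."""
    def parse(s: str) -> Optional[Tuple[str, str]]:
        r1 = p1(s)
        return r1 if r1 is not None else p2(s)
    return parse


short_first = choice(lit("a"), lit("ab"))
long_first = choice(lit("ab"), lit("a"))
print(short_first("ab"))  # ('a', 'b'): the alternative "ab" is never tried
print(long_first("ab"))   # ('ab', '')
```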
1. Rules for [N]: We assume that natural language parsers are modeled (and may be wrapped) as parsers that have no explicit acceptance language but accept every input string. Therefore, [N] accepts the input string s, extracts [s], and returns both the empty string and an AST.
2. Rules for @p: The combinator @ clears the extracted strings, so in each rule using this combinator, @p extracts no string. The abstract syntax tree created earlier is also cleared. This combinator is useful when we only want to check the description form of the input text without extracting any part of it.
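The behavior of [N] and @p described above can be sketched as two wrappers. All names here (nlp, discard, lit, word_count) are hypothetical, the triple (extracted strings, remainder, AST) is an assumed result format, and word_count is only a toy stand-in for a real natural language parser.

```python
from typing import Any, Callable, List, Optional, Tuple

# Assumed model: (extracted strings, remainder, AST), or None on failure.
Result = Optional[Tuple[List[str], str, Any]]
Parser = Callable[[str], Result]


def nlp(fn: Callable[[str], Any]) -> Parser:
    """[N]: wrap an NLP function as a parser that accepts every input string."""
    def parse(s: str) -> Result:
        # The whole input is accepted and extracted; the remainder is empty.
        return ([s], "", fn(s))
    return parse


def discard(p: Parser) -> Parser:
    """@p: run p, keep its remainder, and clear extracted strings and AST."""
    def parse(s: str) -> Result:
        r = p(s)
        if r is None:
            return None
        return ([], r[1], None)
    return parse


def lit(word: str) -> Parser:
    """Formal-language parser accepting exactly `word`."""
    def parse(s: str) -> Result:
        if not s.startswith(word):
            return None
        return ([word], s[len(word):], ("lit", word))
    return parse


def word_count(sentence: str) -> Tuple[str, int]:
    """Toy stand-in for an NLP: returns a tiny 'AST' with the word count."""
    return ("words", len(sentence.split()))


n = nlp(word_count)
check_only = discard(lit("Note: "))
print(n("foreign-currency trade"))  # (['foreign-currency trade'], '', ('words', 2))
print(check_only("Note: check"))    # ([], 'check', None)
```

The second call shows the form-checking use of @: the prefix must be present for the parse to succeed, but nothing from it reaches the output.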
The combination p1Up2 seeks the first position k of the input string s, by advancing the parse-starting position, such that the parser p2 can parse s[k..].
If such a position exists, then the parser p1 tries to parse the substring of s from position 0 to k (i.e. s[0..k]); otherwise the parse fails. The three rules of this combination cover these situations.
The system works in the following steps: 1) The user prepares NLP functions and stores them, together with their reference names, in an NLP-store. 2) The user writes text parser expressions describing the text parser to be constructed. 3) The system generates the intended text parser function (or object) from the text parser expressions and the NLP-store.
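The three steps can be illustrated with a toy generator. This is a sketch under strong assumptions, not the system itself: expressions are encoded as nested Python tuples instead of the concrete text-parser expression syntax, only literal, NLP reference, sequence, and choice are covered, and the store entry "upper" is a hypothetical stand-in for a real NLP function.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumed model: a parser returns (list of ASTs, remainder) or None.
Result = Optional[Tuple[List[Any], str]]
Parser = Callable[[str], Result]
NlpStore = Dict[str, Callable[[str], Any]]


def build(expr: tuple, store: NlpStore) -> Parser:
    """Generate a parser function from a tuple-encoded expression."""
    kind = expr[0]
    if kind == "lit":  # ("lit", word): accept a fixed string, extract nothing
        word = expr[1]
        return lambda s: ([], s[len(word):]) if s.startswith(word) else None
    if kind == "nlp":  # ("nlp", name): look the function up in the NLP-store
        fn = store[expr[1]]
        return lambda s: ([fn(s)], "")
    if kind in ("seq", "alt"):
        p1, p2 = build(expr[1], store), build(expr[2], store)

        def seq(s: str) -> Result:
            r1 = p1(s)
            if r1 is None:
                return None
            r2 = p2(r1[1])
            if r2 is None:
                return None
            return (r1[0] + r2[0], r2[1])

        def alt(s: str) -> Result:
            r1 = p1(s)
            return r1 if r1 is not None else p2(s)

        return seq if kind == "seq" else alt
    raise ValueError(f"unknown expression kind: {kind!r}")


# Step 1: the user registers NLP functions under reference names.
store: NlpStore = {"upper": str.upper}
# Step 2: the user writes an expression (here: a literal followed by an NLP).
expr = ("seq", ("lit", "Rule: "), ("nlp", "upper"))
# Step 3: the system generates the parser function.
parser = build(expr, store)
print(parser("Rule: pay fee"))  # (['PAY FEE'], '')
```

Keeping the NLP functions in a separate store means the expressions refer to them only by name, so an NLP can be replaced without rewriting the expressions that use it.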
The text parser saved a great deal of labor and improved the document quality by standardization and formalization. The text parser could be flexibly extended for changes in the description forms and document templates. The text parser was constructed by document engineers themselves using their document knowledge.
Future work includes an investigation of the expressive power of the text parser expressions. We would also like to extend the combinatorial text parsers with other useful combinators for token-level parsing and for partial AST querying during parsing.