
A Constructive Approach to Parsing with Neural Networks - The Hybrid Connectionist Parsing Method

Christel Kemke
Department of Computer Science 562 Machray Hall University of Manitoba Winnipeg, Manitoba, R3T 2N2, Canada ckemke@cs.umanitoba.ca

Abstract. The concept of Dynamic Neural Networks (DNN) is a new approach within the Neural Network paradigm, which is based on the dynamic construction of Neural Networks during the processing of an input. The DNN methodology has been employed in the Hybrid Connectionist Parsing (HCP) approach, which comprises an incremental, on-line generation of a Neural Network parse tree. The HCP ensures an adequate representation and processing of recursively defined structures, like grammar-based languages. In this paper, we describe the general principles of the HCP method and some of its specific Neural Network features. We outline and discuss the use of the HCP method with respect to parallel processing of ambiguous structures, and robust parsing of extra-grammatical inputs in the context of spoken language parsing.

Introduction

Parsing and Natural Language Processing with Neural Networks faces in general a discrepancy between using fixed-sized, relatively rigid Neural Network architectures, on the one hand, and the unlimited generative capacity of language models described by recursive grammars, on the other hand. A common approach to overcome this discrepancy has been the use of Recurrent Neural Networks (RNN) [1] [2] [3] [4] [5] [6] [7]. RNNs have a certain ability to represent prior inputs or contexts in hidden units of the network, and are thus able, in a limited way, to deal with structures of arbitrary size. We suggest an alternative model within the Neural Network paradigm, which addresses the problem of processing and representing time-dependent input structures of unknown size in a more adequate manner. The underlying concept is the Dynamic Neural Network (DNN), a methodology in which complex Neural Networks are constructed dynamically via the generation and merging of small mini-networks. The Hybrid Connectionist Parsing (HCP) approach is based on this concept of DNNs and involves the incremental, online construction of a Neural Network parse tree by successively merging mini-networks, which correspond to rules of a context-free grammar (CFG), into an increasingly complex (partial) parse tree. The methodology is in these respects related to the Neural Network unification parser developed by Kempen and Vosse [8][9] and to Qing Ma's concept of Elastic Networks, used, for example, for part-of-speech tagging [10]. In this paper, we present the basic concepts of the Hybrid Connectionist Parser and discuss issues of parallel and robust processing using the HCP method, in particular related

to the problems of syntactic ambiguity and binding (parallelism), and the parsing of ungrammatical structures or disfluencies in spoken language (robustness). In the following sections we introduce the basic concept of the standard Hybrid Connectionist Parsing (HCP) approach and discuss its use in relation to two major problem categories in Natural Language Processing: parallel processing for ambiguous structures and binding problems, and robust parsing in the context of spoken language processing.

The Hybrid Connectionist Parsing Approach

The HCP method of parsing with Dynamic Neural Networks was inspired by the early work of Jordan Pollack on word-sense disambiguation using Semantic Networks with Spreading Activation [11][12]. Pollack employed a traditional chart-parser as a front-end to the disambiguation network; the model did not contain a Neural Network based parser. This led to the idea of developing a hybrid connectionist parser by integrating the concept of chart-parsing into a Neural Network parser, in order to allow full recursion and to have the option of including techniques and features from traditional parsing [13][14]. The Hybrid Connectionist Parsing method has in essence been tested in three early prototypical systems, PAPADEUS [15][16], INKAS, and INKOPA [17][18], and was later re-examined more thoroughly and theoretically [19][20]. In the following sections we outline the basic concepts of the HCP and its special Neural Network features, and investigate applications of the HCP with respect to parallel parsing of ambiguities and robust parsing of extra-grammatical structures.

2.1 Mini-Networks - Representing Grammatical Rules in the HCP

A mini-network represents a rule of a CFG in the following form: it is a two-layer network with one root-node and one or more child-nodes. The root-node corresponds to the left-hand side (LHS) of the rule, and the child-nodes correspond to the items on the right-hand side (RHS) (cf. Fig. 1). Each node has an associated symbol marker, which represents the respective grammatical symbol (syntactic category, lexical entry, etc.). Connections in a mini-network are directed from each child-node to the single root- or parent-node. The weight of a connection in such a mini-network is 1/n, where n is the number of child-nodes in the mini-network (= the number of symbols on the RHS of the rule).

Fig. 1. (left) Mini-network for the general grammar rule B → A1 … An, with parent-node B, child-nodes A1, …, An, and connection weight 1/n for each connection from Ai to B, i = 1, …, n. (right) Lexical mini-network created based on the input of a word w with associated word category Cw.

The units are in general simple threshold units with a linear input function and a threshold-dependent output function, with threshold θ = 1.0. The activation value ranges in general between 0 and 1 and represents the state or degree of recognition of the respective syntactic item (cf. Fig. 2).
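The structure and activation behaviour of a mini-network can be sketched as follows (a minimal Python illustration; the class names and data layout are assumptions, not the paper's implementation):

```python
from dataclasses import dataclass
from typing import List

THETA = 1.0  # firing threshold of the simple threshold units

@dataclass
class Node:
    symbol: str              # symbol marker (syntactic category, word, ...)
    activation: float = 0.0  # degree of recognition, in [0, 1]

@dataclass
class MiniNetwork:
    parent: Node             # LHS of the CFG rule
    children: List[Node]     # RHS items, left to right

    def propagate(self) -> None:
        # linear input with equal weights 1/n; threshold-dependent output
        w = 1.0 / len(self.children)
        net = sum(w * c.activation for c in self.children)
        self.parent.activation = 1.0 if net >= THETA else net

# Mini-network for NP -> det noun: recognizing only "the" (det) yields a
# half-activated NP node, i.e. partial recognition of the constituent.
np_net = MiniNetwork(Node("NP"), [Node("det"), Node("noun")])
np_net.children[0].activation = 1.0
np_net.propagate()
assert np_net.parent.activation == 0.5
```

With both child-nodes fully activated, the weighted sum reaches the threshold and the parent-node fires with full activation 1.0.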

2.2 Dynamic Construction of a Neural Network Parse Tree

Processing of an input in the HCP approach comprises the following steps (cf. Fig. 2):

1. the instantiation or generation of a mini-network, triggered by a fully activated node (a lexical input node or the parent-node of an already constructed initial partial NN parse tree);
2. the successive merging of mini-networks and partial NN parse trees, by unifying a fully activated parent-node of one network with a matching (leftmost) open child-node of an earlier generated partial NN parse tree;
3. the transfer of activation in the resulting combined network.

Step 1 exploits parallelism, since all possible derivations/rule applications are generated. Step 2 ensures LR parsing, since a binding occurs only to the leftmost open child-node. This step also ensures the correctness of the method, since a structure has to be fully recognized (fully activated parent-node) in order to be integrated into the initial parse tree. The overall parsing process is initiated by a word input, which causes the generation of a mini-network representing the terminal, lexical rule (cf. Fig. 1, right). Processing in the network proceeds as outlined above until the network reaches a state in which none of the three processes changes the network. Then the next input word is read and processed (cf. Fig. 2).
Fig. 2. (top) Access to a new mini-network, triggered by the input word the, and merging of the two networks by unifying the the-nodes, yielding one network with fully activated nodes due to activation passing from the fully activated the-node. (bottom) Merging of a generated NN parse tree for the man with a new mini-network for S → NP VP, triggered by the fully activated NP-node of the initial parse tree. Activation in the combined parse tree is passed on, leading to a partial activation of the sentence-node S with a(S) = 0.5.
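The incremental cycle of instantiation, merging, and activation transfer can be illustrated with a strongly simplified toy implementation (the Tree class, grammar, and driver loop are assumptions for demonstration, not the paper's actual networks):

```python
# Toy sketch of the three-step HCP cycle for a tiny grammar and lexicon.
GRAMMAR = [("NP", ("det", "noun")), ("S", ("NP", "VP")), ("VP", ("verb",))]
LEXICON = {"the": "det", "man": "noun", "sleeps": "verb"}

class Tree:
    """A (partial) NN parse tree; open child-nodes are bare symbols."""
    def __init__(self, root, children=()):
        self.root, self.children = root, list(children)

    def activation(self):
        if not self.children:                     # lexical node: fully active
            return 1.0
        n = len(self.children)                    # equal weights 1/n per child
        a = sum(c.activation() for c in self.children
                if isinstance(c, Tree)) / n
        return 1.0 if a >= 1.0 - 1e-9 else a

    def merge(self, sub):
        """Bind fully activated `sub` to the leftmost open child (LR order)."""
        for i, c in enumerate(self.children):
            if isinstance(c, Tree):
                if c.merge(sub):
                    return True
                if c.activation() < 1.0:          # still completing this child
                    return False
            else:                                 # leftmost open child-node
                if c == sub.root and sub.activation() >= 1.0:
                    self.children[i] = sub
                    return True
                return False
        return False

def parse(words):
    trees = []                                    # forest of partial parse trees
    for w in words:
        trees.append(Tree(LEXICON[w]))            # lexical mini-network
        changed = True
        while changed:                            # run until the net is stable
            changed = False
            # step 1: a fully activated root instantiates a new mini-network
            for i, t in enumerate(trees):
                if t.activation() >= 1.0:
                    for lhs, rhs in GRAMMAR:
                        if rhs[0] == t.root:
                            net = Tree(lhs, rhs)
                            net.merge(t)
                            trees[i] = net
                            changed = True
                            break
            # step 2: merge a full tree into an earlier partial parse tree
            for j in range(len(trees) - 1, 0, -1):
                if trees[j].activation() >= 1.0:
                    if any(t.merge(trees[j]) for t in trees[:j]):
                        trees.pop(j)
                        changed = True
    return trees

result = parse(["the", "man", "sleeps"])
assert len(result) == 1 and result[0].root == "S"
assert result[0].activation() == 1.0
```

After the man the forest holds a single S-tree with a(S) = 0.5, exactly the state shown in Fig. 2 (bottom); reading sleeps completes the VP and fully activates S.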

The Standard HCP

The HCP approach to parsing is called hybrid since it employs a Neural Network representation of the parse tree and uses special NN-features like weighted connections and activation transfer in the parsing process, and it involves at the same time traditional

concepts of parsing, like recursive grammatical rules with syntactic constituents. Due to the dynamic construction of NN parse trees, the HCP approach overcomes the typical problems of NN parsers and enables full recursion. The standard HCP approach as described above is equivalent to deterministic bottom-up parsing [19].

3.1.1 Competing Derivations


One technique to ensure the correct selection of a derivation for competing right-hand sides (RHS) of rules with the same left-hand side (LHS) is based on the concept of mutual inhibition, similar to winner-take-all networks. In order to integrate this mutual inhibition into the parsing process, one strategy is to introduce hypothesis nodes for the competing derivations, and add small inhibitory connections, which implement an inhibition of invalid hypothesis nodes. This leads to complex mini-networks, which provide combined, compressed representations of the different RHS for the same LHS (see Fig. 3).
Fig. 3. Representation of competing RHSs for the same LHS in a complex mini-network. h1 stands for VP → V, h2 for VP → V NP, h3 for VP → V PP, h4 for VP → V NP PP.

This complex mini-network is generated on the basis of the following rules:

- For each competing grammar rule (with the same LHS), define a hypothesis-node.
- Provide a positive connection from each RHS item to the respective hypothesis-node (i.e. if this RHS item is part of the rule which is represented by that hypothesis-node). Set the connection weight to 1/n, where n is the number of items on this RHS.
- Add a negative connection, e.g. −0.2, from any RHS item to a hypothesis-node if the RHS item is not contained in the RHS of the rule which is represented by this hypothesis-node.

In Fig. 3, the connection weights for the RHS items V and NP to the hypothesis-node h2, which represents the rule VP → V NP, are 0.5 (1/2 for 2 RHS items). Inhibitory connections are introduced, for example, from NP to h1, since h1 represents the rule VP → V, and NP is not part of the RHS of this rule. We can show that this implementation ensures a correct, parallel processing of competing rule applications: A hypothesis-node becomes fully activated if and only if the sum of its inputs is equal to or larger than 1.0. This means that all items on the RHS of the rule represented by this hypothesis-node must be fully activated with 1.0, since only n · 1/n · 1.0 (number of RHS items · connection weight · activation of each RHS item) yields full activation 1.0 of the hypothesis-node. In addition, no item which is not on the RHS of this rule can be activated, since otherwise this hypothesis-node would receive negative input (inhibition −0.2) and thus would not be fully activated. This means that a hypothesis-node becomes fully activated if and only if all its RHS constituents are fully activated, and no other constituent appearing on the RHS of a competing rule is fully activated.
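This activation rule for hypothesis-nodes can be verified with a small sketch of the complex mini-network of Fig. 3 (the function names and data layout are illustrative assumptions):

```python
RULES = {                       # hypothesis node -> RHS of the rule it stands for
    "h1": ("V",),               # VP -> V
    "h2": ("V", "NP"),          # VP -> V NP
    "h3": ("V", "PP"),          # VP -> V PP
    "h4": ("V", "NP", "PP"),    # VP -> V NP PP
}
ITEMS = ("V", "NP", "PP")       # all RHS items occurring in any competing rule
INHIB = -0.2                    # small inhibitory weight, as in the text

def hypothesis_input(h, activation):
    """Net input of hypothesis-node h given the RHS-item activations."""
    rhs = RULES[h]
    w = 1.0 / len(rhs)                                 # excitatory weight 1/n
    net = sum(w * activation.get(x, 0.0) for x in rhs)
    net += sum(INHIB * activation.get(x, 0.0)          # inhibition from items
               for x in ITEMS if x not in rhs)         # outside this rule's RHS
    return net

def fires(h, activation):
    return hypothesis_input(h, activation) >= 1.0 - 1e-9   # float tolerance

# With V and NP fully recognized, exactly the hypothesis for VP -> V NP fires:
a = {"V": 1.0, "NP": 1.0, "PP": 0.0}
assert [h for h in RULES if fires(h, a)] == ["h2"]
```

Here h1 receives 1.0 − 0.2 = 0.8 (inhibited by the active NP), h4 only 2/3, so h2 is the unique winner, as the argument above predicts.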

3.1.2 General Parallelism in the HCP - Multi-dimensional Parsing Nets


A new design of the HCP method employs fully parallel derivations, i.e. the parallel generation of all possible parse trees and bindings, based on multi-dimensional parsing nets. In multi-dimensional parsing nets or parse trees, each competing derivation is developed in a new, separate dimension. Thus, the alternative parsing nets might share structures of an initial parse tree, up to the point where ambiguity leads to different possible bindings and thus alternative, competing parse trees are developed. Multi-dimensional parsing nets are implemented in the HCP by adding a dimensionality indicator to each node or unit in the network. The network starts with dimension 1 for the initial parse tree. The dimension value is carried on in the network through merging and other processes and propagated to other units as long as they are connected to the initial network. If ambiguity occurs, each new alternative possible binding receives a different dimension value.
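A possible realization of these dimension tags can be sketched as follows (the data layout and function are purely illustrative assumptions, not the paper's implementation):

```python
# Sketch: every binding carries a dimension value; an ambiguous binding
# spawns one fresh dimension per alternative parse net.
import itertools

dim_counter = itertools.count(2)        # dimension 1 = the initial parse tree

def expand_bindings(unit_dim, open_slots):
    """Return one (slot, dimension) pair per possible binding of a subtree."""
    if len(open_slots) <= 1:
        # unambiguous: the binding stays in the dimension of its parse net
        return [(slot, unit_dim) for slot in open_slots]
    # ambiguity: each alternative binding opens a new, separate dimension
    return [(slot, next(dim_counter)) for slot in open_slots]

# A PP that could attach either to the VP or to the object NP (classic
# attachment ambiguity) yields two competing parse nets in dimensions 2 and 3:
alts = expand_bindings(1, ["PP-slot in VP", "PP-slot in NP"])
assert [d for _, d in alts] == [2, 3]
```

Units that remain connected to the initial network keep inheriting its dimension value; only the point of ambiguity forks into separate dimensions.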

Fault-tolerance in the HCP - Towards Robust Parsing

A major issue of current investigation is the use of the HCP method for robust parsing, i.e. the parsing of ungrammatical structures [19][20]. Ungrammatical structures occur frequently in spontaneous speech and in spoken language understanding (analysis of transcribed speech). Some modifications of the standard HCP have been suggested for dealing with phenomena typical of spontaneous spoken language, e.g. repetitions, corrections, insertions, reversions of substructures, and omissions. Often, these speech phenomena involve ungrammaticalities, which become obvious during the parsing process and in general lead to a complete parser failure.

4.1 Modifications of the Standard HCP for Robust Parsing

A general formal analysis of distortions of strings, together with an informal analysis of transcribed corpora of spoken dialogues¹, preceded the investigation of the HCP approach with respect to the processing of spontaneous spoken language [19]. This study resulted in the suggestion of several modifications of the standard HCP method, which allow the toleration and repair of spoken language phenomena, like repetitions and corrections of utterances and parts thereof. The HCP method can be modified with respect to the processing of speech-related grammatical abnormalities by changes on the following levels:

- the merging process (sentence ordering, violation of LR parsing);
- the activation transfer function (incomplete structures);
- the setting of connection weights (incomplete structures, cognitive human parsing);
- the overlapping of competing partial parse trees.

4.1.1 Changes to the Merging Process


This involves in particular the possibility of later/earlier merging of partial parse trees: a parse tree which cannot be merged immediately can be kept and inserted, if possible, at an appropriate place later during processing. This modification overcomes strict LR parsing and takes care of variations in the sentence structure.

¹ Transcripts of the SFB 360, Situated Communicating Agents, U. of Bielefeld, Germany

4.1.2 Changes to the Activation Transfer Function


One possibility to allow for incomplete structures is to change the activation transfer function such that activation is passed on even if the sum of inputs is lower than the threshold value. Since this modification would violate the correctness of the parsing method, it requires careful examination and has not yet been investigated further.

4.1.3 Changes to the Weight Setting


An adaptation of the connection weight setting, selecting weights which reflect the relevance of a syntactic structure (child-node of a mini-network) instead of choosing equal weights for all child-nodes of a mini-network, can take care of incomplete structures and inputs. The connection weights of irrelevant or unnecessary items can be set close to 0, and the weights of mandatory or necessary items (child-nodes of a mini-network) have to be set close or equal to the threshold value. Then the recognition of a sub-structure depends on the appearance of the necessary items; further, irrelevant items can be absorbed but are not crucial in the further processing. Example: for the rule NP → det noun, the connection weight in the corresponding mini-network can be set to 0 for the det connection and to 1 for the noun connection. Thus, a determiner (det) can be absorbed by this mini-network but does not have to be present, whereas a noun is mandatory and has to be recognized in order to fully activate the parent-node.
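This relevance-based weight setting can be illustrated with a hypothetical sketch; the weight values follow the NP → det noun example in the text:

```python
# Relevance-based weights instead of equal weights 1/n: the optional det
# gets weight 0, the mandatory noun gets the threshold value 1.0.
THETA = 1.0
WEIGHTS = {"det": 0.0, "noun": 1.0}   # per-child weights for NP -> det noun

def np_activation(activation):
    """Activation of the NP parent-node given child activations."""
    net = sum(WEIGHTS[c] * activation.get(c, 0.0) for c in WEIGHTS)
    return 1.0 if net >= THETA else net

assert np_activation({"noun": 1.0}) == 1.0              # bare noun: NP recognized
assert np_activation({"det": 1.0}) == 0.0               # det alone: not an NP
assert np_activation({"det": 1.0, "noun": 1.0}) == 1.0  # det absorbed
```

The determiner is thus absorbed when present but never required, while a missing noun keeps the NP node below threshold.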

4.2 Overlapping of Partial Parse Trees

Corrections and repetitions in spontaneous speech are characterized in the HCP parsing process by the appearance of a second, similar (partial) parse tree, which competes with a structure in the first, initial parse tree for a binding to the same child-node. Often, one of these parse trees is incomplete, and thus a partial parse tree (PPT). Example: I went to to the shop. This utterance has an incomplete VP, i.e. a not fully activated VP-node, due to an incomplete PP-structure (in the first parse tree). This PP appears again as a complete PP-structure in the correction/repetition part and forms a second partial parse tree. Both structures (the incomplete PP to and the complete PP to the shop) compete for binding to the PP child-node in the VP. One solution for this phenomenon is to ignore and overread the incomplete PP. The more general solution is to overlap and combine both subtrees in order to yield a complete, repaired, grammatically correct sub-structure. The essence of this graph-overlapping method is to compare and combine two (partial) parse trees, which were identified based on the detection of ungrammaticalities in the HCP as described above, and to find a suitable overlap based on matches of symbol markers and structures. This problem can be seen as a graph-matching problem for marked graphs, where the markers associated with nodes represent syntactic categories. The comparison method developed and used so far is constrained to the examination of directed acyclic graphs, i.e. directed trees, which are represented in a linear form. A sample input sentence, with comparison results and resulting output, is shown in Fig. 4.

the screw is over --- is in the block


1. partial parse tree:  S[ NP( DET-the N-screw) VP( V-is PP( P-over NIL))]
2. partial parse tree:  [ VP( V-is PP( P-in DET-the N-block))]

MATCH VP
MATCH V
MATCH is
MATCH PP
MATCH P
NO MATCH  1) over  2) in
NO MATCH  1) NIL   2) DET-the N-block

S[ NP( DET-the N-screw) VP( V-is PP( P-in DET-the N-block))]

the screw is in the block


Fig. 4. Example of overlapping two partial parse trees. Shown are the input sentence, the compared syntactic structures, the comparison results for syntactic categories and words, and the output sentence and structure resulting from overlapping the parse trees.

The method works well for corrections and repetitions on the phrase level or higher, and in some cases for false starts, as a first set of tests has shown. So far, the method is in agreement with the hypothesis that corrections in spoken language often take place on the level of phrases (cf. e.g. Jurafsky and Martin [21], Hindle [22]).
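Under simplifying assumptions, the overlap step illustrated in Fig. 4 can be sketched as a linear walk over two linearized partial parse trees (an illustrative reduction of the marked-graph matching; the names and data layout are hypothetical):

```python
# Walk two linearized partial parse trees in parallel; where the category
# markers match, take the (possibly corrected) material of the later,
# complete tree; keep unmatched material from either tree.
def overlap(first, second):
    """first/second: lists of (category, word) in linear (left-to-right) order."""
    merged, j = [], 0
    for cat, word in first:
        if j < len(second) and second[j][0] == cat:
            merged.append(second[j])      # category MATCH: prefer later material
            j += 1
        else:
            merged.append((cat, word))    # material occurring only in the first tree
    merged.extend(second[j:])             # material occurring only in the second tree
    return merged

# The VP-subtrees from Fig. 4: "is over" (incomplete PP) vs. "is in the block"
ppt1 = [("V", "is"), ("P", "over")]
ppt2 = [("V", "is"), ("P", "in"), ("DET", "the"), ("N", "block")]
assert overlap(ppt1, ppt2) == ppt2        # repaired VP: "is in the block"
```

In this example every category of the incomplete first tree is matched, so the repaired structure coincides with the complete second tree, mirroring the output of Fig. 4.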

Conclusion

In this paper, the method of Hybrid Connectionist Parsing (HCP) was described, based on the concept of Dynamic Neural Networks. As special NN-related features of the HCP, we discussed parsing with limited look-ahead and top-down expectations due to the use of mini-networks, and graded hypothesis values for the partial recognition of higher-level constituents through gradual activations. Two mechanisms for the parallel processing of competing grammatical rules were introduced: complex mini-networks for rules with the same LHS, and the representation of parallel derivations by multi-dimensional parsing nets. Some modifications of the HCP for implementing robust and fault-tolerant parsing were described, focusing in particular on ungrammatical sentences as phenomena of spoken language. In particular, the method of matching and overlapping partial parse trees, treated as a special graph-matching problem, was introduced and its application to a sample test sentence was shown. An application to a significantly larger set of transcribed spoken-language dialogues, as well as an integration of the various techniques for robust parsing, are next steps in the further testing and development of the HCP methodology.

References
1. Elman, J. L. Finding Structure in Time. Cognitive Science, vol. 14, 1990, 179-211.
2. Reilly, R. and N. E. Sharkey (eds.) Connectionist Approaches to Languages. North-Holland, Amsterdam, 1988.
3. Reilly, R. and N. E. Sharkey (eds.) Connectionist Approaches to Natural Language Processing. Lawrence Erlbaum, Hillsdale, 1992.
4. Sharkey, N. Connectionist Natural Language Processing. intellect, Oxford, England, 1992.
5. Wermter, S. and V. Weber. SCREEN: Learning a Flat Syntactic and Semantic Spoken Language Analysis Using Artificial Neural Networks. Journal of Artificial Intelligence Research, vol. 6, 1997, 35-85.
6. Jain, A. N. A Connectionist Architecture for Sequential Symbolic Domains. Technical Report CMU-CS-89-187, School of Computer Science, Carnegie Mellon University, 1989.
7. Jain, A. N. and A. H. Waibel. Incremental Parsing by Modular Recurrent Connectionist Networks. In D. S. Touretzky (ed.): Advances in Neural Information Processing Systems 2, Morgan Kaufmann, San Mateo, CA, 1990.
8. Kempen, G. and T. Vosse. Incremental syntactic tree formation in human sentence processing: A cognitive architecture based on activation decay and simulated annealing. Connection Science, vol. 1, 1989, 273-290.
9. Vosse, T. and G. Kempen. Syntactic structure assembly in human parsing: A computational model based on competitive inhibition and a lexicalist grammar. Cognition, vol. 75, 2000, 105-143.
10. Ma, Qing, Kiyotaka Uchimoto, Masaki Murata and Hitoshi Isahara. Elastic Neural Networks for Part of Speech Tagging. Proc. IJCNN-99, 1999.
11. Pollack, J. B. On Connectionist Models of Natural Language Processing. Technical Report MCCS-87-100, Computing Research Laboratory, New Mexico State University, 1987.
12. Waltz, D. L. and J. B. Pollack. Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation. Cognitive Science, vol. 9, no. 1, 1985, 51-74.
13. Kemke, C. Parsing Neural Networks - Combining Symbolic and Connectionist Approaches. Proc. International Conference on Neural Information Processing ICONIP'94, Seoul, Korea, October 1994. Also TR-94-021, ICSI, Berkeley, CA, 1994.
14. Kemke, C. A Hybrid Approach to Natural Language Parsing. In: von der Malsburg, von Seelen, Vorbrueggen, Sendhoff (eds.): Artificial Neural Networks, 875-880.
15. Schommer, C. PAPADEUS - Ein inkrementeller konnektionistischer Parser mit einer parallelen Disambiguierungskomponente (PAPADEUS - An Incremental Connectionist Parser with a Parallel Disambiguation Component). Master's Thesis, Computer Science Department, University of the Saarland, 1993.
16. Kemke, C. and C. Schommer. PAPADEUS - Parallel Parsing of Ambiguous Sentences. Proc. World Congress on Neural Networks, Portland, Oregon, 1993, vol. 3, 79-82.
17. Kone, H. INKOPA - Ein inkrementeller konnektionistischer Parser fuer natuerliche Sprache (INKOPA - An Incremental Connectionist Parser for Natural Language). Master's Thesis, Computer Science Department, University of the Saarland, 1993.
18. Kemke, C. and H. Kone. INCOPA - An Incremental Connectionist Parser. Proc. World Congress on Neural Networks, Portland, Oregon, 1993, vol. 3, 41-44.
19. Kemke, C. Konnektionistische Modelle in der Sprachverarbeitung (Connectionist Models for Speech and Language Processing). Cuvillier-Verlag, Goettingen, Germany, 2000.
20. Kemke, C. Connectionist Parsing with Dynamic Neural Networks or: Can Neural Networks Make Chomsky Happy? Technical Report, Computing Research Laboratory CRL, New Mexico State University, 2001.
21. Jurafsky, D. and J. H. Martin. Speech and Language Processing. Prentice-Hall, 2000.
22. Hindle, D. Deterministic Parsing of Syntactic Non-Fluencies. Proc. ACL-83, Cambridge, MA, 1983, 123-128.
23. Heeman, P. A., Kyung-ho Loken-Kim and J. F. Allen. Combining the Detection and Correction of Speech Repairs. Proc. ICSLP-96, 1996.
24. Hemphill, C. T., J. J. Godfrey and G. R. Doddington. The ATIS Spoken Language Systems Pilot Corpus. Proc. of the Speech and Natural Language Workshop, Hidden Valley, PA, 1990, 96-101.
25. Bear, J., J. Dowding and E. Shriberg. Integrating Multiple Knowledge Sources for Detection and Correction of Repairs in Human-Computer Dialog. Proc. Annual Meeting of the Association for Computational Linguistics, Delaware, 1992, 56-63.
