
A Yacc Tutorial

1 Introduction

The Unix utility yacc (Yet Another Compiler Compiler) parses a stream of tokens, typically generated by lex, according to a user-specified grammar.

2 Structure of a yacc file

A yacc file looks much like a lex file:

    ...definitions...
    %%
    ...rules...
    %%
    ...code...

The three sections are:

definitions: All code between %{ and %} is copied to the beginning of the resulting C file.

rules: A number of combinations of pattern and action. Unlike in lex, an action is always enclosed in braces.

code: This can be very elaborate, but the main ingredient is the call to yyparse, which in turn calls yylex, the lexical analyser. If the code segment is left out, a default main is used which only calls yyparse.

3 Definitions section

There are three things that can go in the definitions section:

C code: Any code between %{ and %} is copied to the C file. This is typically used for defining file variables, and for prototypes of routines that are defined in the code segment.

definitions: The definitions section of a lex file was concerned with characters; in yacc it is concerned with tokens. These token definitions are written to a .h file when yacc compiles this file.

associativity rules: These handle associativity and priority of operators.

4 Lex Yacc interaction

Conceptually, lex parses a file of characters and outputs a stream of tokens; yacc accepts a stream of tokens and parses it, performing actions as appropriate. In practice, they are more tightly coupled.

If your lex program is supplying a tokenizer, the yacc program will repeatedly call the yylex routine. The lex rules will probably function by calling return every time they have parsed a token. We will now see the way lex returns information in such a way that yacc can use it for parsing.

4.1 The shared header file of return codes

If lex is to return tokens that yacc will process, they have to agree on what tokens there are. This is done as follows.

- The yacc file will have token definitions

      %token NUMBER

  in the definitions section.
- When the yacc file is translated with yacc -d, a header file y.tab.h is created that has definitions like

      #define NUMBER 258

  This file can then be included in both the lex and yacc program.
- The lex file can then call return NUMBER, and the yacc program can match on this token.

The return codes that are defined from %token definitions typically start at around 258, so that single characters can simply be returned as their integer value:

    /* in the lex program */
    [0-9]+   {return NUMBER;}
    [-+*/]   {return *yytext;}

    /* in the yacc program */
    sum : TERMS '+' TERM

See example 7.1 for a worked out code.

4.2 Return values

In the above, very sketchy example, lex only returned the information that there was a number, not the actual number. For this we need a further mechanism. In addition to specifying the return code, the lex parser can return a symbol that is put on top of the stack, so that yacc can access it. This symbol is returned in the variable yylval. By default, this is defined as an int, so the lex program would have

    extern int yylval;
    %%
    [0-9]+   {yylval=atoi(yytext); return NUMBER;}

If more than just integers need to be returned, the specifications in the yacc code become more complicated. Suppose we want to return double values, and integer indices in a table. The following three actions are needed.

1. The possible return values need to be stated:

       %union {int ival; double dval;}

2. These types need to be connected to the possible return tokens:

       %token <ival> INDEX
       %token <dval> NUMBER

3. The types of non-terminals need to be given:

       %type <dval> expr
       %type <dval> mulex
       %type <dval> term

The generated .h file will now have

    #define INDEX 258
    #define NUMBER 259
    typedef union {int ival; double dval;} YYSTYPE;
    extern YYSTYPE yylval;

This is illustrated in example 7.2.

5 Rules section

The rules section contains the grammar of the language you want to parse. This looks like

    name1 : THING something OTHERTHING {action}
          | othersomething THING       {other action}
    name2 : .....

This is the general form of context-free grammars, with a set of actions associated with each matching right-hand side. It is a good convention to keep non-terminals (names that can be expanded further) in lower case and terminals (the symbols that are finally matched) in upper case.

The terminal symbols get matched with return codes from the lex tokenizer. They are typically defines coming from %token definitions in the yacc program or character values; see section 4.1.

A simple example illustrating these ideas can be found in section 7.1.

6 User code section

The minimal main program is

    int main()
    {
      yyparse();
      return 0;
    }

Extensions to more ambitious programs should be self-evident.

In addition to the main program, the code section will usually also contain subroutines, to be used either in the yacc or the lex program. See for instance example 7.3.

7 Examples

7.1 Simple calculator

This calculator evaluates simple arithmetic expressions. The lex program matches numbers and operators and returns them; it ignores white space, returns newlines, and gives an error message on anything else.

    %{
    #include <stdlib.h>
    #include <stdio.h>
    #include "calc1.h"
    void yyerror(char*);
    extern int yylval;
    %}

    %%

    [ \t]+   ;
    [0-9]+   {yylval = atoi(yytext);
              return INTEGER;}
    [-+*/]   {return *yytext;}
    "("      {return *yytext;}
    ")"      {return *yytext;}
    \n       {return *yytext;}
    .        {char msg[25];
              sprintf(msg,"%s <%s>","invalid character",yytext);
              yyerror(msg);}

Accepting the lex output, this yacc program has rules that parse the stream of numbers and operators, and perform the corresponding calculations.

    %{
    #include <stdlib.h>
    #include <stdio.h>
    int yylex(void);
    void yyerror(char *s);   /* declared before the generated parser uses it */
    #include "calc1.h"
    %}

    %token INTEGER

    %%

    program:
            line program
            | line
    line:
            expr '\n' { printf("%d\n",$1); }
            | '\n'
    expr:
            expr '+' mulex { $$ = $1 + $3; }
            | expr '-' mulex { $$ = $1 - $3; }
            | mulex { $$ = $1; }
    mulex:
            mulex '*' term { $$ = $1 * $3; }
            | mulex '/' term { $$ = $1 / $3; }
            | term { $$ = $1; }
    term:
            '(' expr ')' { $$ = $2; }
            | INTEGER { $$ = $1; }

    %%

    void yyerror(char *s)
    {
      fprintf(stderr,"%s\n",s);
      return;
    }

    int main(void)
    {
      /*yydebug=1;*/
      yyparse();
      return 0;
    }

Here we have realized operator precedence by having separate rules for the different priorities. The rule for plus/minus comes first, which means that its terms, the mulex expressions involving multiplication, are evaluated first.

7.2 Calculator with simple variables

In this example the return variables have been declared of type double. Furthermore, there can now be single-character variables that can be assigned and used. There now are two different return tokens: double values and integer variable indices. This necessitates the %union statement, as well as %token statements for the various return tokens and %type statements for the non-terminals.

This is all in the yacc file:

    %{
    #include <stdlib.h>
    #include <stdio.h>
    int yylex(void);
    void yyerror(char *s);
    double var[26];          /* one slot per variable a..z */
    %}

    %union { double dval; int ivar; }
    %token <dval> DOUBLE
    %token <ivar> NAME
    %type <dval> expr
    %type <dval> mulex
    %type <dval> term

    %%

    program:
            line program
            | line
    line:
            expr '\n' { printf("%g\n",$1); }
            | NAME '=' expr '\n' { var[$1] = $3; }
    expr:
            expr '+' mulex { $$ = $1 + $3; }
            | expr '-' mulex { $$ = $1 - $3; }
            | mulex { $$ = $1; }
    mulex:
            mulex '*' term { $$ = $1 * $3; }
            | mulex '/' term { $$ = $1 / $3; }
            | term { $$ = $1; }
    term:
            '(' expr ')' { $$ = $2; }
            | NAME { $$ = var[$1]; }
            | DOUBLE { $$ = $1; }

    %%

    void yyerror(char *s)
    {
      fprintf(stderr,"%s\n",s);
      return;
    }

    int main(void)
    {
      /*yydebug=1;*/
      yyparse();
      return 0;
    }

The lex file is not all that different; note how return values are now assigned to a component of yylval rather than yylval itself.

    %{
    #include <stdlib.h>
    #include <stdio.h>
    #include "calc2.h"
    void yyerror(char*);
    %}

    %%

    [ \t]+   ;
    (([0-9]+(\.[0-9]*)?)|([0-9]*\.[0-9]+)) {
              yylval.dval = atof(yytext);
              return DOUBLE;}
    [-+*/=]  {return *yytext;}
    "("      {return *yytext;}
    ")"      {return *yytext;}
    [a-z]    {yylval.ivar = *yytext - 'a';   /* index 0..25 into var[] */
              return NAME;}
    \n       {return *yytext;}
    .        {char msg[25];
              sprintf(msg,"%s <%s>","invalid character",yytext);
              yyerror(msg);}

7.3 Calculator with dynamic variables

Basically the same as the previous example, but now variables can have regular names, and they are inserted into a names table dynamically. The yacc file defines a routine for getting a variable index:

    %{
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    int yylex(void);
    void yyerror(char *s);
    #define NVARS 100
    char *vars[NVARS];       /* variable names */
    double vals[NVARS];      /* variable values, same index as vars */
    int nvars=0;             /* number of names stored so far */
    %}

    %union { double dval; int ivar; }
    %token <dval> DOUBLE
    %token <ivar> NAME
    %type <dval> expr
    %type <dval> mulex
    %type <dval> term

    %%

    program:
            line program
            | line
    line:
            expr '\n' { printf("%g\n",$1); }
            | NAME '=' expr '\n' { vals[$1] = $3; }
    expr:
            expr '+' mulex { $$ = $1 + $3; }
            | expr '-' mulex { $$ = $1 - $3; }
            | mulex { $$ = $1; }
    mulex:
            mulex '*' term { $$ = $1 * $3; }
            | mulex '/' term { $$ = $1 / $3; }
            | term { $$ = $1; }
    term:
            '(' expr ')' { $$ = $2; }
            | NAME { $$ = vals[$1]; }
            | DOUBLE { $$ = $1; }

    %%

    /* Return the table index of a name, inserting the name if it is new. */
    int varindex(char *var)
    {
      int i;
      for (i=0; i<nvars; i++)
        if (strcmp(var,vars[i])==0) return i;
      vars[nvars] = strdup(var);
      return nvars++;
    }

    void yyerror(char *s)
    {
      fprintf(stderr,"%s\n",s);
      return;
    }

    int main(void)
    {
      /*yydebug=1;*/
      yyparse();
      return 0;
    }

The lex file is largely unchanged, except for the rule that recognises variable names:

    %{
    #include <stdlib.h>
    #include <stdio.h>
    #include "calc3.h"
    void yyerror(char*);
    int varindex(char *var);
    %}

    %%

    [ \t]+   ;
    (([0-9]+(\.[0-9]*)?)|([0-9]*\.[0-9]+)) {
              yylval.dval = atof(yytext);
              return DOUBLE;}
    [-+*/=]  {return *yytext;}
    "("      {return *yytext;}
    ")"      {return *yytext;}
    [a-z][a-z0-9]* {
              yylval.ivar = varindex(yytext);
              return NAME;}
    \n       {return *yytext;}
    .        {char msg[25];
              sprintf(msg,"%s <%s>","invalid character",yytext);
              yyerror(msg);}