
Writing a parser with YACC (Yet Another Compiler Compiler)

Automatically generates a parser for a context-free grammar (LALR parser).
Allows syntax-directed translation by writing grammar productions and semantic actions.
LALR(1) handles more grammars in practice than LL(1); for example, it accepts left-recursive productions.

Works with lex: YACC calls yylex() to get the next token.
YACC and lex must agree on the values for each token.

Like lex, YACC predates C++, so some constructs need a workaround when used with C++ (an example is given later).
YACC file format:

declarations /* specify tokens and non-terminals */
%%
translation rules /* specify grammar here */
%%
supporting C routines

The command yacc yaccfile produces y.tab.c, which contains the routine yyparse().
yyparse() calls yylex() to get tokens.
yyparse() returns 0 if the program is grammatically correct, and non-zero otherwise.
The declarations part specifies tokens, non-terminal symbols, and other C/C++ constructs.

To declare the tokens AAA and BBB:

%token AAA BBB

To assign a token number to a token (needed when using lex), write a nonnegative integer immediately after the first appearance of the token:

%token EOFnumber 0
%token SEMInumber 101

Non-terminals do not need to be declared unless you want to associate them with a type to store attributes (discussed later).
Translation rules specify the grammar productions:

exp : exp PLUSnumber exp
    | exp MINUSnumber exp
    | exp TIMESnumber exp
    | exp DIVIDEnumber exp
    | LPARENnumber exp RPARENnumber
    | ICONSTnumber
    ;

Each alternative can also be written as a separate rule:

exp : exp PLUSnumber exp
    ;
exp : exp MINUSnumber exp
    ;
Yacc environment
Yacc processes a yacc specification file and produces a y.tab.c file.
Yacc produces an integer function yyparse(), which:
calls yylex() to get tokens;
returns non-zero when an error is found;
returns 0 if the program is accepted.
You need to supply the main() and yyerror() functions.
Example:
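#include <stdio.h>

extern int yyline;   /* line counter maintained by the lexer */
int yyparse(void);   /* generated by yacc in y.tab.c */

/* Called by yyparse() when a syntax error is found. */
void yyerror(const char *str)
{
    printf("yyerror: %s at line %d\n", str, yyline);
}

int main(void)
{
    if (!yyparse()) { printf("accept\n"); }   /* 0 means accepted */
    else { printf("reject\n"); }
    return 0;
}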
Hooking yacc and lex together, see example0.y and lexer.l
Matching the tokens
In lex:
#define INTEGERCONST 2
#define PLUSNUM 4
In yacc:
%token INTEGERCONST 2
%token PLUSNUM 4

All tokens used in the yacc grammar must be declared. Some tokens
recognized by lex may not appear in the yacc grammar (see lexer.l).
Non-terminals do not need to be declared.
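
The agreement on token numbers can be illustrated with a small hand-written stand-in for the scanner. This is only a sketch: toy_set_input() and toy_yylex() are hypothetical names, and a real lex-generated yylex() reads from yyin rather than from a string. The point is that the values returned are exactly the numbers on the %token lines, and that 0 marks the end of input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token numbers: must equal the values given on the %token lines. */
#define INTEGERCONST 2
#define PLUSNUM      4

/* Hypothetical input buffer; a lex-generated scanner reads from yyin instead. */
static const char *toy_input = "";

void toy_set_input(const char *text)
{
    assert(text != NULL);
    toy_input = text;
}

/* Hand-written stand-in for yylex(): returns the next token number,
   or 0 at end of input (yacc treats 0 as the end marker). */
int toy_yylex(void)
{
    while (*toy_input == ' ')
        toy_input++;                          /* skip blanks */
    if (*toy_input == '\0')
        return 0;                             /* end of input */
    if (*toy_input == '+') {
        toy_input++;
        return PLUSNUM;
    }
    if (isdigit((unsigned char)*toy_input)) {
        while (isdigit((unsigned char)*toy_input))
            toy_input++;                      /* consume the whole number */
        return INTEGERCONST;
    }
    /* Anything else: pass the character code through and advance. */
    return (unsigned char)*toy_input++;
}
```

With the input "12 + 7", successive calls return INTEGERCONST, PLUSNUM, INTEGERCONST, and then 0. If lex used different numbers from the %token lines, the parser would misread every token.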

lex.yy.c and y.tab.c may be compiled separately, or the yacc file may simply include lex.yy.c, as in example0.y.
Global variables such as yyline, yycolumn, and yylval can
be used in yacc routines.
YACC automatically builds a parser for the grammar (LALR
parser).
Shift/reduce and reduce/reduce conflicts may occur when the grammar is not LALR.
In this case, you need to modify the grammar to make it LALR so that yacc works properly.

YACC tries to resolve conflicts automatically.

Default conflict resolution:
shift/reduce --> shift
reduce/reduce --> reduce by the production that appears first in the grammar
The conflict messages are not very informative, and it is not clear whether the default action is what you wanted.

yacc -v *.y generates a report in the file y.output.
See example1.y
Resolving conflicts
Modify the grammar (see example1.y and example0.y).
Use the precedence and associativity of operators:
use the keywords %left, %right, and %nonassoc in the declarations section.
All tokens on the same line have the same precedence level and associativity.
The lines are listed in order of increasing precedence.

%left PLUSnumber, MINUSnumber
%left TIMESnumber, DIVIDEnumber

See example3.y
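
How these declarations settle a shift/reduce conflict can be sketched as a small decision function. This is an illustration of the documented yacc rule, not code taken from yacc itself; the type and function names are hypothetical. A production takes the precedence of its last token, and level 0 here stands for "no precedence declared".

```c
#include <assert.h>

enum assoc  { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Precedence of one token; level 0 means none was declared. */
struct prec {
    int level;            /* later %left/%right line => higher level */
    enum assoc assoc;
};

/* rule:      precedence of the production that could be reduced
              (the precedence of its last token);
   lookahead: precedence of the next input token. */
enum action resolve_shift_reduce(struct prec rule, struct prec lookahead)
{
    assert(rule.level >= 0 && lookahead.level >= 0);
    if (rule.level == 0 || lookahead.level == 0)
        return ACT_SHIFT;                 /* default: shift, conflict reported */
    if (lookahead.level > rule.level)
        return ACT_SHIFT;                 /* next operator binds tighter */
    if (lookahead.level < rule.level)
        return ACT_REDUCE;                /* stacked operator binds tighter */
    switch (lookahead.assoc) {            /* same level: associativity decides */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;   /* %nonassoc */
    }
}
```

With the two %left lines above, having exp PLUSnumber exp on the stack and TIMESnumber as the next token gives a shift, so multiplication groups first; with PLUSnumber as the next token it gives a reduce, so addition groups to the left.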
Attribute grammar with yacc
Each symbol can be associated with some attributes.
The data structure of the attributes can be specified in the union in the declarations (see example4.y).

%union {
int semantic_value;
}
%token <semantic_value> INTEGERCONST 2
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item

Semantic actions associated with productions can be specified.

The union is used to define yylval; you do not need to declare it again, and you can use yylval.semantic_value directly in the lex code.
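
The role of the union on the lex side can be sketched as follows. The names toy_yystype, toy_yylval, and toy_integer_action() are hypothetical stand-ins for the YYSTYPE and yylval that yacc generates and for the body of a lex action; in a real scanner yytext is a global, not a parameter.

```c
#include <assert.h>
#include <stdlib.h>

#define INTEGERCONST 2     /* same number as on the %token line */

/* Mirrors the %union in the declarations section. */
typedef union {
    int semantic_value;
} toy_yystype;

toy_yystype toy_yylval;    /* stand-in for the yylval defined by yacc */

/* What a lex action for an integer pattern typically does:
   store the attribute in yylval, then return the token number. */
int toy_integer_action(const char *yytext)
{
    assert(yytext != NULL);
    toy_yylval.semantic_value = atoi(yytext);
    return INTEGERCONST;
}
```

When the parser later uses this token in a production, the stored value is what $1, $2, and so on refer to, through the member named in %token <semantic_value>.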
Semantic actions
Semantic actions associated with productions can be specified.

item : LPARENnumber exp RPARENnumber
         {$$ = $2;}
     | ICONSTnumber
         {$$ = $1;}
     ;
$$ is the attribute associated with the left-hand side of the production.
$1 is the attribute associated with the first symbol on the right-hand side, $2 with the second symbol, and so on.
An action can appear anywhere in the production; it is also counted as a symbol.
Semantic actions
Semantic actions can appear anywhere in the production; an action is also counted as a symbol.

item : LPARENnumber {cout << debug;} exp RPARENnumber
         {$$ = $3;}
     | ICONSTnumber
         {$$ = $1;}
     ;
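
Why exp becomes $3 once an action is placed in the middle of the production can be seen by modelling the parser's value stack. The sketch below is a simplified model with hypothetical names (values, push_value, dollar, reduce_item), not the code yacc generates: every symbol on the right-hand side, including the embedded action, occupies one slot.

```c
#include <assert.h>

#define STACK_MAX 32

/* Toy value stack: one int attribute per grammar symbol. */
static int values[STACK_MAX];
static int top = 0;                  /* number of values currently stacked */

void push_value(int v)
{
    assert(top < STACK_MAX);
    values[top++] = v;
}

/* $n for a right-hand side of rhs_len symbols on top of the stack. */
int dollar(int n, int rhs_len)
{
    assert(n >= 1 && n <= rhs_len && rhs_len <= top);
    return values[top - rhs_len + (n - 1)];
}

/* Reduce by: item : LPARENnumber { action } exp RPARENnumber
   The embedded action occupies slot $2, so exp is $3 and the
   right-hand side has length 4. */
int reduce_item(void)
{
    int result = dollar(3, 4);       /* {$$ = $3;} */
    top -= 4;                        /* pop the right-hand side */
    push_value(result);              /* push $$ for the left-hand side */
    return result;
}
```

Pushing four values for "(", the action, an exp with value 7, and ")" and then calling reduce_item() leaves a single value 7 on the stack. Using $2 in this production would read the action's slot instead of the expression.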
Multiple attributes and C/C++ issues
Multiple attributes can be associated with a symbol by declaring a structure in the union. See cal_trans_c.y (in yacc1_cop4020).
Unfortunately, C++ restricts what a union may contain: before C++11, a union member cannot be a structure or class with a non-trivial constructor, destructor, or copy assignment.
A workaround example is given in cal_trans_cpp.y.
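
A structure inside the union, in plain C, might look like the sketch below. The contents of cal_trans_c.y are not shown here, so the struct exp_attr, its fields, and add_action() are hypothetical and serve only to show the pattern of carrying two attributes on one symbol.

```c
#include <assert.h>

/* Hypothetical attribute record: several attributes for one symbol. */
struct exp_attr {
    int value;       /* computed value of the expression */
    int op_count;    /* number of operators in the expression */
};

/* C-style %union body with a struct member. */
typedef union {
    int semantic_value;
    struct exp_attr exp;
} multi_yystype;

/* Body of an action for: exp : exp PLUSnumber exp
   lhs plays the role of $1, rhs of $3, and the result of $$. */
multi_yystype add_action(multi_yystype lhs, multi_yystype rhs)
{
    multi_yystype result;
    assert(lhs.exp.op_count >= 0 && rhs.exp.op_count >= 0);
    result.exp.value = lhs.exp.value + rhs.exp.value;
    result.exp.op_count = lhs.exp.op_count + rhs.exp.op_count + 1;
    return result;
}
```

In the yacc file the non-terminal would be declared with %type <exp> exp, and the action would read $1.value and $3.value directly. This works in C because the struct is plain data.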
