
Syntax

Definition
Language definition
  lexical rules
  syntax rules
EBNF (for defining syntax)
Regular expressions
Syntax and learning programming languages (ex: Scheme function calls)

Definition
Syntax defines the allowable sequences of words in a language
Syntax does affect the "power" of the language
  ex: a language with only pronouns and nouns: "Me Tarzan, you Jane"

Goals for programming languages


readability, writability
  do you want the language to be concise?
verifiability
  side effects make this more difficult
  ex: difference = list[i++] - list[i++]; // bug?

Goals (continued)
ease of translation -- ex: Lisp
  see example in EBNF discussion below
avoid ambiguity
  if ... if ... else ... // match "else" to which "if"?
  syntax must answer this question

Lexical vs. Syntactic rules


Lexical rules determine how the symbols of the language can be combined into words. For example:
  English: words include letters and hyphens (no *, %, $, etc.)
  C++: a variable name starts with a letter or '_'
  C++: <= cannot have a space between < and =
  C++: spaces between statements are ignored
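A minimal sketch of how lexical rules like these can be written down as regular expressions and applied by a tokenizer. The token classes, the `TOKEN_SPEC` table and the function name `tokenize` are assumptions made for illustration, not part of any real C++ compiler. The sketch shows a two-character operator being recognized as one token only when it is written without a space, and whitespace being dropped.

```python
import re

# Hypothetical token classes for a tiny C++-like fragment (illustration only).
TOKEN_SPEC = [
    ("NAME", r"[A-Za-z_][A-Za-z_0-9]*"),  # starts with a letter or '_'
    ("NUMBER", r"\d+"),
    ("OP", r"<=|<|=|;"),                  # "<=" is tried before "<" and "="
    ("SPACE", r"\s+"),                    # whitespace separates tokens, then is dropped
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs; raise ValueError on a character no rule accepts."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

With this sketch, `tokenize("a <= b")` yields a single `<=` operator token, while `tokenize("a < = b")` yields two separate operator tokens, which a later syntax check would have to reject.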

Syntactic rules: how the words can be combined
  ex: in a language without chained assignment, (a = b) is OK, but (a = b = c) is not

Syntax and Cause-and-Effect problems

Grammatical definition of a sentence of English?
Define each grammatical construct by possible sequences of components
  Ex: Possibilities for the subject (noun part) of a sentence?

Defining lexical and syntactic rules


Regular expressions define lexical rules
Extended Backus-Naur Form (EBNF) defines syntactic rules
EBNF is a more powerful notation
  harder to translate

EBNF
notation for defining the grammar of a language
each grammar includes:
  terminals -- symbols/words in the language
  nonterminals -- a unit representing a grammatically correct sequence of terminals
    usually one nonterminal is considered the "start" symbol
  productions -- the valid ways a nonterminal could be replaced by terminals and other nonterminals

EBNF (continued)
ex: English
  sentence -> noun-part verb-part .
  noun-part -> noun | adj noun
-> may be replaced by ::=
sentence would be the start symbol
ε (epsilon) for the empty string
$$ for end of input in a production -- can help identify the start symbol

EBNF (continued)
nonterminals in italics (or surrounded by < >)
| means "or"
* indicates 0 or more of something, + indicates 1 or more
  ex: posInteger -> digit+
      digit -> 0 | 1 | 2 | ... | 9
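The posInteger rule can be checked directly in code. The sketch below is one possible reading, with hypothetical function names: the first function follows the two productions literally, and the second expresses the same rule as a regular expression, where `+` again means "one or more".

```python
import re

DIGITS = "0123456789"  # digit -> 0 | 1 | 2 | ... | 9


def is_pos_integer(text):
    """posInteger -> digit+ : one or more characters, each of them a digit."""
    return len(text) >= 1 and all(ch in DIGITS for ch in text)


# The same rule written as a regular expression.
POS_INTEGER_RE = re.compile(r"[0-9]+")


def is_pos_integer_re(text):
    """Regular-expression version of the same check."""
    return POS_INTEGER_RE.fullmatch(text) is not None
```

Note that the grammar as written also accepts strings such as 0 and 007; ruling those out would take a different production, not a different checker.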

Lisp in EBNF

expression -> atom | list
atom -> number | name | string | operator
list -> (expression*)
close to complete -- compare with C++
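Because the Lisp grammar is so small, a syntax checker for it fits in a few lines, which illustrates the "ease of translation" goal. The following is a minimal recursive-descent sketch with one function per nonterminal. The tokenizing regular expression and all function names are assumptions for illustration; atoms are not classified further, and characters that fit no token rule are silently skipped.

```python
import re

# Assumed lexical rule: a parenthesis, a double-quoted string,
# or a run of other non-space characters.
TOKEN_RE = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse(text):
    """Parse one expression and return it as nested Python lists of atom strings."""
    tokens = TOKEN_RE.findall(text)
    tree, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("extra input after expression")
    return tree


def parse_expression(tokens, pos):
    """expression -> atom | list"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        return parse_list(tokens, pos)
    if tokens[pos] == ")":
        raise SyntaxError("unexpected ')'")
    return tokens[pos], pos + 1  # atom: number, name, string, or operator


def parse_list(tokens, pos):
    """list -> ( expression* )"""
    items = []
    pos += 1  # skip "("
    while pos < len(tokens) and tokens[pos] != ")":
        item, pos = parse_expression(tokens, pos)
        items.append(item)
    if pos >= len(tokens):
        raise SyntaxError("missing ')'")
    return items, pos + 1  # skip ")"
```

For example, `parse("(+ 2 5)")` returns a three-element list, and an input with an unclosed parenthesis raises `SyntaxError`.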

EBNF in EBNF
EBNF-grammar -> production*
production -> nonterminal "->" list
list -> expr | expr "|" list
expr -> term | term expr
term -> item | item "*" | item "+"
item -> nonterminal | terminal | "(" expr ")"
(quoted symbols are literal characters of the notation being defined)

EBNF in EBNF (continued)


nonterminal -> word
terminal -> word | symbol

Derivations and parse trees


Parser generators (ex: yacc/bison) are used to generate code to check syntax
Such tools need a grammar: will look like EBNF
Computer must see if it is possible to find a derivation from the start symbol to the program being tested
  derivation = sequence of steps, each replacing a nonterminal by the right side of a production in the grammar

Examples
372 for posInteger
(+ 2 5) for Lisp grammar
(exit) for Lisp grammar
Figs. 2.1-2.4 in Scott
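The first example can be worked through mechanically. The sketch below, with a hypothetical function name, lists the sentential forms of a leftmost derivation of a digit string from posInteger: one step applies posInteger -> digit+ with as many copies of digit as the string has characters, and each later step replaces the leftmost remaining digit nonterminal by a terminal.

```python
def derive_pos_integer(text):
    """Return the sentential forms of a leftmost derivation of text from posInteger."""
    if not text or not all(ch in "0123456789" for ch in text):
        raise ValueError(f"{text!r} cannot be derived from posInteger")
    form = ["posInteger"]
    steps = [list(form)]
    form = ["digit"] * len(text)  # posInteger -> digit+, choosing len(text) copies
    steps.append(list(form))
    for i, ch in enumerate(text):  # digit -> 0 | 1 | ... | 9, leftmost nonterminal first
        form[i] = ch
        steps.append(list(form))
    return [" ".join(step) for step in steps]
```

Calling `derive_pos_integer("372")` gives five forms, from the start symbol down to the three terminals. A string that contains a non-digit has no derivation, so the function raises `ValueError`.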

Regular vs. BNF grammar


Regular grammar: restricted BNF
  nonterminal -> terminal nonterminal | terminal
can rewrite without recursion
  Ex: expr -> term+ can replace the recursive rule expr -> term | term expr from the EBNF grammar above
  Compare with item -> (expr)
cannot represent rules that require matches
  matching parentheses, if-else
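The limitation on matching can be made concrete. In the sketch below (names are hypothetical), the regular expression accepts balanced parentheses only up to the nesting depth that is written into the pattern, here 2. Checking arbitrary depth needs something a regular grammar does not have, such as the counter used in `balanced`, or equivalently a recursive rule like item -> (expr).

```python
import re

# Balanced parentheses nested at most 2 deep; a deeper limit needs a longer pattern.
DEPTH_2_RE = re.compile(r"(\((\(\))*\))*")


def balanced(text):
    """Check balanced parentheses of any depth using an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no "(" waiting for it
                return False
        else:
            return False  # this sketch accepts parentheses only
    return depth == 0
```

The pattern accepts `(())` but rejects `((()))`, while `balanced` accepts both.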

Regular expressions
regular expression = terminal symbol
if a and b are regular expressions, so are:
  a | b, ab, (a), a*
nothing else

Regular expressions (continued)


describe the following languages:
  (MM)*M // M is a terminal
  (_ | M | m)(_ | M | m | 3)*
equivalent to regular grammar
What are the advantages of this notation?
Usually used for lexical analysis
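One way to explore the two expressions is to try them on sample strings. The sketch below transcribes them into Python's `re` syntax, with the spaces around | removed. The constant names and the helper `matches` are assumptions for illustration. `fullmatch` is used because a string belongs to the language only if the whole string fits the expression.

```python
import re

ODD_MS = re.compile(r"(MM)*M")                # (MM)*M
NAME_LIKE = re.compile(r"(_|M|m)(_|M|m|3)*")  # (_ | M | m)(_ | M | m | 3)*


def matches(pattern, text):
    """True if the whole of text is in the language described by pattern."""
    return pattern.fullmatch(text) is not None
```

Trying strings of one, two and three M's against the first pattern, and strings that start with a digit against the second, suggests what each language is.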

Syntax and learning a programming language


Learning syntax does not necessarily help you understand the language
Look for concepts that are familiar
Ex: Scheme function call
  (quote a) ; quote is a built-in special form, so its argument is not evaluated
            ; semi-colon is like // -- rest of line is comment
  'a ; for convenience, ' is equivalent to quote

More Scheme
(cons (quote a) (quote (b c))) ; cons constructs a list
  1st parameter is 1st element of list
  2nd parameter is rest of list
(cons '(a b) '(c)) ; compare with previous line
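For readers more familiar with another language, the two cons calls can be mimicked in Python. This is only a model under stated assumptions: quoted symbols are represented as strings, quoted lists as Python lists, and `cons` is a hypothetical helper that handles proper lists only (Scheme's cons is more general).

```python
def cons(first, rest):
    """Model of cons for proper lists: a new list with first in front of rest."""
    return [first] + rest


# (cons (quote a) (quote (b c)))
flat = cons("a", ["b", "c"])

# (cons '(a b) '(c)) -- the first argument is itself a list
nested = cons(["a", "b"], ["c"])
```

In the first call the result has three elements. In the second, the list (a b) becomes a single first element, so the result has two elements, one of them a nested list.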

More Scheme (continued)
Some erroneous lines:
  (f) ; Scheme cannot find definition for f
  (cons a (b c)) ; similar
  (cons 'a '(b) ; Scheme waits for you (to do what?)
Where are missing semi-colons detected in C++/Java compilers?

Interpreter modules
Read
  need complete expression
Evaluate
Print
  every expression must have a value (no void)
And repeat
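The read-evaluate-print cycle can be sketched in a few lines. Everything here is an assumption for illustration: `read` decides that an expression is complete by counting parentheses, `evaluate` is a placeholder that understands only integers and (+ int int ...), and the "print" step collects output strings in a list so the sketch is easy to test. It is not how a real Scheme interpreter is organized internally.

```python
def paren_depth(text):
    """Number of '(' still waiting for a matching ')'."""
    return text.count("(") - text.count(")")


def read(lines):
    """Read: keep consuming lines until the expression is complete."""
    text = next(lines)
    while paren_depth(text) > 0:
        text += " " + next(lines)  # incomplete, so wait for more input
    return text


def evaluate(text):
    """Evaluate: placeholder that handles only integers and (+ int int ...)."""
    text = text.strip()
    if text.startswith("("):
        op, *args = text[1:-1].split()
        if op != "+":
            raise ValueError(f"unknown operator {op!r}")
        return sum(int(arg) for arg in args)
    return int(text)


def repl(lines):
    """Read, evaluate, print, and repeat until the input runs out."""
    lines = iter(lines)
    outputs = []
    while True:
        try:
            text = read(lines)
        except StopIteration:
            return outputs
        outputs.append(str(evaluate(text)))  # Print: every expression yields a value
```

Feeding the lines `(+ 2` and `5)` shows the read step waiting: the first line alone is not a complete expression, so nothing is evaluated until the closing parenthesis arrives.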
