Syntax and Semantics

Different Methods to describe syntax and semantics


Syntax
Lexical Structure of Programming Languages
Context-Free Grammars and BNFs
Parse Trees and Abstract Syntax Trees
Ambiguity, Associativity, and Precedence
EBNFs and Syntax Diagrams
Parsing Techniques and Tools
Lexics versus Syntax versus Semantics
Lexical Structure of Programming Languages
Tokens are the words that make up a programming language
Lexical structure - the structure of the words/tokens
Scanning phase - collects sequences of characters from the input program
into tokens
Parsing phase - processes the tokens to determine the program's syntactic structure
Lexical Structure of Programming Languages
Categories of Tokens:
Reserved words or keywords
Literals or constants
Special symbols
Identifiers
Lexical Structure of Programming Languages
The format of a program can affect the way tokens are recognized.
Certain tokens are separated by token delimiters or white space
Indentation can also be used to determine structure
Free-format language - one in which format has no effect on the program
structure
Fixed format - all tokens must occur in pre-specified locations on the page
Lexical Structure of Programming Languages
Tokens in a programming language are often described in English, but they can also be described formally by regular
expressions (descriptions of patterns of characters).
Regular expressions have 3 basic operations:
- concatenation (sequencing the items without an explicit operation)
- repetition (*)
- choice or selection (|)
Parentheses are also often included to allow for the grouping of operations
Square brackets with a hyphen indicate a range of characters
+ indicates one or more repetitions
? indicates an optional item
. indicates any character
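The operations above can be tried out in a small scanner. The following is a minimal sketch using Python's standard re module; the token categories, the keyword list, and the scan function are assumptions chosen for illustration, not part of the original slides.

```python
import re

# Hypothetical token categories for a tiny illustrative language.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|while|else)\b"),   # choice with |
    ("NUMBER",     r"[0-9]+(?:\.[0-9]+)?"),     # range, + and an optional part (?)
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),  # concatenation and repetition (*)
    ("SYMBOL",     r"[-+*/=()<>;]"),            # special symbols
    ("SKIP",       r"\s+"),                     # white space separates tokens
    ("ERROR",      r"."),                       # . matches any other character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Collect the characters of text into (category, lexeme) tokens."""
    tokens = []
    for match in MASTER.finditer(text):
        kind = match.lastgroup
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"unexpected character {match.group()!r}")
        tokens.append((kind, match.group()))
    return tokens

print(scan("while x1 < 42"))
```

The order of TOKEN_SPEC matters in this sketch: keywords are listed before identifiers so that a reserved word is not scanned as an identifier.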
Context-Free Grammars and BNFs
A context-free grammar consists of a series of grammar rules:
Rules consist of a left-hand side that is a single structure name
Then the metasymbol →
Followed by the right-hand side consisting of a sequence of items that can be symbols or other
structure names
Nonterminals - names of structures, broken down into further structures
Terminals - words or token symbols, never broken down
Grammar rules are also called productions, since they produce the strings of the language using derivations
Context-Free Grammars and BNFs
(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets
Context-Free Grammars and BNFs
→ - "consists of" or "is the same as"; the metasymbol which separates the
left-hand side from the right-hand side of a rule
The italics serve to distinguish the names of the structures from the actual
words or tokens that may appear in the language
| - also a metasymbol; means "or"
Other metasymbols: ::= (used in place of →), angle brackets (around structure names), double quotes (around tokens)
Context-Free Grammars and BNFs
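BNF (Backus-Naur Form) notation - the conventional notation for describing
the syntax of programming languages; an ISO standard format for it exists
Start symbol - the structure name (here, sentence) with which every derivation begins
Derivation - repeated replacement of left-hand sides by right-hand sides of the
foregoing rules, until only terminals remain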
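A derivation can be traced mechanically. The sketch below stores the six rules of the sentence grammar in a Python dictionary and searches for a derivation that always replaces the leftmost nonterminal; the names GRAMMAR and leftmost_derivation are assumptions made for this illustration.

```python
# Grammar rules (1)-(6): each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "sentence":    [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article":     [["a"], ["the"]],
    "noun":        [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb":        [["sees"], ["pets"]],
}

def leftmost_derivation(target, start="sentence"):
    """Return the list of sentential forms deriving target, or None if there is none."""
    def derive(form, steps):
        # Locate the leftmost nonterminal in the current sentential form.
        index = next((i for i, symbol in enumerate(form) if symbol in GRAMMAR), None)
        if index is None:
            return steps if form == target else None
        # Prune: terminals already produced must agree with the target, and
        # forms never shrink because no rule has an empty right-hand side.
        if form[:index] != target[:index] or len(form) > len(target):
            return None
        for alternative in GRAMMAR[form[index]]:
            new_form = form[:index] + alternative + form[index + 1:]
            result = derive(new_form, steps + [new_form])
            if result is not None:
                return result
        return None
    return derive([start], [[start]])

for form in leftmost_derivation("the girl sees a dog .".split()):
    print(" ".join(form))
```

Each printed line is one step of the derivation, starting at the start symbol and ending with the string of terminals.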

Parse Trees and [Abstract] Syntax Trees
Parse tree - describes graphically the replacement process in a derivation.
Example strings (parse-tree figures not reproduced): the girl sees a dog, 234, and 3 + 4 * 5

Parse Trees and Abstract Syntax Trees
A parse tree is labelled by nonterminals at interior nodes and terminals at
leaves.
The structure of the parse tree is completely specified by the grammar rules
of the language and a derivation of a particular sequence of terminals
Abstract syntax trees do away with terminals that are redundant once the
structure of the tree is determined.
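The difference between the two kinds of tree can be seen on the string 3 + 4 * 5. This sketch assumes a conventional expression grammar (expr → expr + term | term, term → term * factor | factor, factor → number), which the slides do not spell out, and encodes trees as nested Python tuples; to_ast and leaves are hypothetical helper names.

```python
# Parse tree nodes are (nonterminal, child, child, ...); leaves are token strings.
PARSE_TREE = (
    "expr",
    ("expr", ("term", ("factor", "3"))),
    "+",
    ("term",
        ("term", ("factor", "4")),
        "*",
        ("factor", "5")),
)

def leaves(node):
    """Read the terminals off the leaves of a parse tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in leaves(child)]

def to_ast(node):
    """Collapse a parse tree into an abstract syntax tree (operator, left, right)."""
    if isinstance(node, str):
        return int(node)                # a number token
    children = node[1:]
    if len(children) == 1:              # chain such as expr -> term -> factor
        return to_ast(children[0])
    left, operator, right = children
    return (operator, to_ast(left), to_ast(right))

print(leaves(PARSE_TREE))
print(to_ast(PARSE_TREE))
```

In this encoding the abstract syntax tree keeps only the operators and operands; the chains of single-child nodes carry no extra information once the shape of the tree is known, so they are dropped as well.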
Ambiguity, Associativity, and Precedence
Two different derivations can lead to the same parse tree or syntax tree
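Different derivations can also lead to different parse trees
Ambiguity (one string with two distinct parse trees) presents difficulties since no single
clear structure is expressed; it is identified by restricting attention to special derivations
Leftmost derivation - where the leftmost remaining nonterminal is singled out for
replacement at each step; a grammar is ambiguous if some string has two distinct leftmost derivations
Disambiguating rule - states which parse tree is the intended one, e.g. by operator precedence
Associativity - an operator is either right- or left-associative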
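Ambiguity has practical consequences. The sketch below assumes the ambiguous grammar expr → expr + expr | expr * expr | number (an illustrative choice, not taken from the slides) and enumerates every syntax tree for a token list; all_trees and evaluate are hypothetical names.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def all_trees(tokens):
    """All syntax trees for tokens under expr -> expr + expr | expr * expr | number.

    Assumes a well-formed list that alternates numbers and operators.
    """
    if len(tokens) == 1:
        return [int(tokens[0])]
    trees = []
    for i in range(1, len(tokens), 2):      # operators sit at the odd positions
        for left in all_trees(tokens[:i]):
            for right in all_trees(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def evaluate(tree):
    """Compute the value of a syntax tree."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

for tree in all_trees("3 + 4 * 5".split()):
    print(tree, "=", evaluate(tree))
```

The string 3 + 4 * 5 gets two trees under this grammar, with values 23 and 35. A precedence rule that lets * bind more tightly than + selects the first tree.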
EBNFs and Syntax Diagrams
Extended Backus-Naur Form (EBNF) - special notation which expresses more
clearly the repetitive nature of grammar structures
{ } stand for zero or more repetitions
[ ] indicate optional parts of the structure
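EBNF constructs map directly onto control flow. As a minimal sketch, assume the rule signed-number → [ - ] digit { digit } (a hypothetical rule chosen for illustration); the optional part becomes an if statement and the repetition becomes a while loop.

```python
DIGITS = "0123456789"

def parse_signed_number(text):
    """Recognize  signed-number -> [ '-' ] digit { digit }  and return its value."""
    position = 0
    negative = False
    # [ '-' ] : an optional part becomes an if statement.
    if position < len(text) and text[position] == "-":
        negative = True
        position += 1
    # digit : one digit is required.
    if position >= len(text) or text[position] not in DIGITS:
        raise SyntaxError("digit expected")
    value = int(text[position])
    position += 1
    # { digit } : zero or more repetitions become a while loop.
    while position < len(text) and text[position] in DIGITS:
        value = value * 10 + int(text[position])
        position += 1
    if position != len(text):
        raise SyntaxError(f"unexpected character {text[position]!r}")
    return -value if negative else value

print(parse_signed_number("234"))
print(parse_signed_number("-7"))
```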
EBNFs and Syntax Diagrams
Syntax diagrams - graphical representations that indicate the sequence of terminals and
nonterminals encountered in the right-hand side of the rule
Use circles or ovals for terminals and squares or rectangles for
nonterminals, connecting them with lines and arrows to indicate
appropriate sequencing
Parsing Techniques and Tools
Grammar explicitly describes the strings of tokens that are syntactically
legal in a programming language
Grammar implicitly describes the actions that a parser must take to parse a
string of tokens correctly
Recognizer - simplest form of parser; program that accepts or rejects
strings, based on whether they are legal strings in the language; more general
parsers also build parse trees (or abstract syntax trees)
Parsing Techniques and Tools
Bottom-up parsers - when a match occurs, the right-hand side is replaced
by or reduced to the nonterminal on the left; construct derivations and
parse trees from the leaves to the root; also called shift-reduce parsers
Top-down parsers - nonterminals are expanded to match incoming tokens
and directly construct a derivation
Recursive-descent parsers - operate by turning the nonterminals into a
group of mutually recursive procedures whose actions are based on the
right-hand sides
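A recursive-descent parser can be sketched in a few lines. The grammar below (expr → term { + term }, term → factor { * factor }, factor → ( expr ) | number) is an assumed example, and the class computes the value of the expression instead of building a tree; Parser and parse are hypothetical names. Each nonterminal becomes one method, and the body of each method follows the right-hand side of its rule.

```python
import re

class Parser:
    """Recursive-descent evaluator for the assumed grammar:
         expr   -> term { '+' term }
         term   -> factor { '*' factor }
         factor -> '(' expr ')' | number
    """

    def __init__(self, text):
        # A deliberately simple scanner: characters outside the token set are ignored.
        self.tokens = re.findall(r"[0-9]+|[+*()]", text)
        self.position = 0

    def peek(self):
        """Return the next token without consuming it, or None at end of input."""
        return self.tokens[self.position] if self.position < len(self.tokens) else None

    def match(self, expected):
        """Consume the next token, which must equal expected."""
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, found {self.peek()!r}")
        self.position += 1

    def expr(self):
        value = self.term()
        while self.peek() == "+":       # { '+' term }
            self.match("+")
            value += self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() == "*":       # { '*' factor }
            self.match("*")
            value *= self.factor()
        return value

    def factor(self):
        token = self.peek()
        if token == "(":
            self.match("(")
            value = self.expr()
            self.match(")")
            return value
        if token is not None and token.isdigit():
            self.position += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    """Parse and evaluate a complete expression; trailing tokens are an error."""
    parser = Parser(text)
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

print(parse("3 + 4 * 5"))
print(parse("(3 + 4) * 5"))
```

Precedence is encoded in the layering of the rules: term sits below expr, so * groups more tightly than +, and the loops make both operators left-associative.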
Lexics versus Syntax versus Semantics

Next Topic: Semantics
Axiomatic Semantics
Denotational Semantics
Translation Semantics
Algebraic Semantics
Operational Semantics
