Different Methods to describe syntax and semantics
Syntax

- Lexical Structure of Programming Languages
- Context-Free Grammars and BNFs
- Parse Trees and Abstract Syntax Trees
- Ambiguity, Associativity, and Precedence
- EBNFs and Syntax Diagrams
- Parsing Techniques and Tools
- Lexics versus Syntax versus Semantics

Lexical Structure of Programming Languages

- Tokens are the words that make up a programming language.
- Lexical structure: the structure of the words/tokens.
- Scanning phase: collects sequences of characters from the input program into tokens.
- Parsing phase: processes the tokens to determine the program's syntactic structure.

Categories of tokens:
- Reserved words or keywords
- Literals or constants
- Special symbols
- Identifiers

The format of a program can affect the way tokens are recognized:
- Certain tokens are separated by token delimiters or white space.
- Indentation can also be used.
- Free-format language: one in which format has no effect on the program structure.
- Fixed format: all tokens must occur in pre-specified locations on the page.

Tokens in a programming language are often described in English, but they can also be described formally by regular expressions (descriptions of patterns of characters).

3 basic operations:
- concatenation (sequencing the items without an explicit operation)
- repetition (*)
- choice or selection (|)

Additional notation:
- Parentheses are often included to allow for the grouping of operations.
- Square brackets with a hyphen indicate a range of characters.
- + indicates one or more repetitions.
- ? indicates an optional item.
- . indicates any character.

For example, [0-9]+ describes a sequence of one or more digits.

Context-Free Grammars and BNFs

A context-free grammar consists of a series of grammar rules. Each rule consists of:
- a left-hand side that is a single structure name,
- then the metasymbol →,
- followed by the right-hand side, a sequence of items that can be symbols or other structure names.

Terminology:
- Nonterminals: names of structures, which are broken down into further structures.
- Terminals: words or token symbols, which are never broken down.
- Grammar rules are also called productions: they produce the strings of the language using derivations.

(1) sentence → noun-phrase verb-phrase .
(2) noun-phrase → article noun
(3) article → a | the
(4) noun → girl | dog
(5) verb-phrase → verb noun-phrase
(6) verb → sees | pets

Metasymbols:
- → means "consists of" or "is the same as"; it separates the left-hand side from the right-hand side of a rule.
- | means "or".
- Italics distinguish the names of the structures from the actual words or tokens that may appear in the language.
- Other metasymbols: ::=, angle brackets, double quotes.

BNF notation:
- The conventional notation for describing the syntax of programming languages; there is an ISO standard format for it.
- Start symbol: the nonterminal on the left-hand side of the first rule, where every derivation begins.
- Derivation: starting from the start symbol, repeatedly replace a nonterminal with the right-hand side of one of the foregoing rules.
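The derivation process for rules (1) to (6) can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the dictionary layout, the function name `leftmost_derivation`, and the idea of passing a list of alternative indexes are all assumptions made for the example. At each step it replaces the leftmost remaining nonterminal.

```python
# Rules (1)-(6): each nonterminal maps to its list of alternatives.
# Anything that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "sentence": [["noun-phrase", "verb-phrase", "."]],
    "noun-phrase": [["article", "noun"]],
    "article": [["a"], ["the"]],
    "noun": [["girl"], ["dog"]],
    "verb-phrase": [["verb", "noun-phrase"]],
    "verb": [["sees"], ["pets"]],
}


def leftmost_derivation(choices, start="sentence"):
    """Return every sentential form produced while deriving a string.

    `choices` (a hypothetical parameter) lists which alternative to use
    at each replacement step, in order.
    """
    form = [start]
    steps = [" ".join(form)]
    picks = iter(choices)
    while True:
        # Find the leftmost symbol that is still a nonterminal.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            return steps  # only terminals remain: the derivation is done
        replacement = GRAMMAR[form[index]][next(picks)]
        form[index:index + 1] = replacement
        steps.append(" ".join(form))


steps = leftmost_derivation([0, 0, 1, 0, 0, 0, 0, 0, 1])
print("\n=> ".join(steps))
print(steps[-1])  # the girl sees a dog .
```

Each printed line is one sentential form; the last one contains only terminals and is therefore a string of the language.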
Parse Trees and Abstract Syntax Trees

Parse tree: describes graphically the replacement process in a derivation.

[Figure: parse trees for "the girl sees a dog", "234", and "3 + 4 * 5"]
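As a rough sketch of what such a tree looks like as data, the parse tree for "the girl sees a dog ." can be written by hand as nested tuples. The tuple encoding and the helper names `leaves` and `render` are assumptions chosen for this example, not notation from the slides.

```python
# Interior nodes are tuples: (nonterminal, child, child, ...).
# Leaves are plain strings (terminals).
TREE = (
    "sentence",
    ("noun-phrase", ("article", "the"), ("noun", "girl")),
    (
        "verb-phrase",
        ("verb", "sees"),
        ("noun-phrase", ("article", "a"), ("noun", "dog")),
    ),
    ".",
)


def leaves(node):
    """Collect the terminals at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:
        result.extend(leaves(child))
    return result


def render(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    pad = "  " * depth
    if isinstance(node, str):
        return [pad + node]
    lines = [pad + node[0]]
    for child in node[1:]:
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render(TREE)))
print(" ".join(leaves(TREE)))  # the girl sees a dog .
```

Reading the leaves from left to right recovers the derived string, while the interior nodes record which rule replaced each nonterminal.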
- A parse tree is labelled by nonterminals at interior nodes and by terminals at leaves.
- The structure of a parse tree is completely specified by the grammar rules of the language and a derivation of a particular sequence of terminals.
- Abstract syntax trees do away with terminals that are redundant once the structure of the tree is determined.

Ambiguity, Associativity, and Precedence

- Two different derivations can lead to the same parse tree or syntax tree.
- Different derivations can also lead to different parse trees.
- Ambiguity: a grammar is ambiguous when the same string has two different parse trees. This presents difficulties because the string then has no single clear structure.
- Ambiguity can be characterized with special derivations that correspond to unique parse trees.
- Leftmost derivation: a derivation in which the leftmost remaining nonterminal is singled out for replacement at each step. A string with two different leftmost derivations is ambiguous.
- An ambiguous grammar is repaired by revising it or by stating a disambiguating rule, such as operator precedence.
- Operators can additionally be made right- or left-associative.

EBNFs and Syntax Diagrams

Extended Backus-Naur Form (EBNF): a special notation that expresses more clearly the repetitive nature of grammar structures.
- { } stand for zero or more repetitions.
- [ ] indicate optional parts of the structure.

Syntax diagrams: indicate the sequence of terminals and nonterminals encountered in the right-hand side of a rule.
- Use circles or ovals for terminals and squares or rectangles for nonterminals.
- Connect them with lines and arrows to indicate the appropriate sequencing.

Parsing Techniques and Tools

- A grammar explicitly describes the strings of tokens that are syntactically legal in a programming language.
- A grammar implicitly describes the actions that a parser must take to parse a string of tokens correctly.
- Recognizer: the simplest form of parser; a program that accepts or rejects strings based on whether they are legal strings in the language. More general parsers also build parse trees.
- Bottom-up parsers: when a match with a right-hand side occurs, the right-hand side is replaced by, or reduced to, the nonterminal on the left. They construct derivations and parse trees from the leaves to the root, and are also called shift-reduce parsers.
- Top-down parsers: nonterminals are expanded to match incoming tokens, directly constructing a derivation.
- Recursive-descent parsers: operate by turning the nonterminals into a group of mutually recursive procedures whose actions are based on the right-hand sides of the rules.

Lexics versus Syntax versus Semantics
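The scanning phase and the recursive-descent technique described above can be sketched together for simple arithmetic such as 3 + 4 * 5. The sketch assumes a small EBNF grammar that does not appear in the slides: expr → term { + term }, term → factor { * factor }, factor → ( expr ) | number. The names `tokenize`, `Parser`, and `evaluate` are hypothetical. To stay short, the parser computes a value as it goes instead of building a tree; this is a simplification for the example.

```python
import re

# Lexics: a number is [0-9]+; the special symbols are + * ( ).
# Leading white space before a token is skipped.
TOKEN_RE = re.compile(r"\s*([0-9]+|[+*()])")


def tokenize(text):
    """Scanning phase: turn a character sequence into a list of tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Syntax: one method per nonterminal, mutually recursive."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expr(self):
        # expr -> term { + term }; the loop makes + left-associative
        value = self.term()
        while self.peek() == "+":
            self.pos += 1
            value += self.term()
        return value

    def term(self):
        # term -> factor { * factor }; sits below expr, so * binds tighter
        value = self.factor()
        while self.peek() == "*":
            self.pos += 1
            value *= self.factor()
        return value

    def factor(self):
        # factor -> ( expr ) | number
        token = self.peek()
        if token == "(":
            self.pos += 1
            value = self.expr()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.pos += 1
            return value
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(text):
    """Scan, parse, and compute the value of an arithmetic expression."""
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(tokenize("3 + 4 * 5"))    # ['3', '+', '4', '*', '5']
print(evaluate("3 + 4 * 5"))    # 23
print(evaluate("(3 + 4) * 5"))  # 35
```

The braces in the assumed EBNF rules become `while` loops, and precedence falls out of the rule nesting: `term` is called from `expr`, so multiplication is grouped before addition.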