COMPILER CONSTRUCTION
Shared by: dffhrtcv3
-
Stats
- views:
- 21
- posted:
- 7/29/2012
- language:
- English
- pages:
- 16
Document Sample


COMPILER
CONSTRUCTION
WEEK-2:
LANGUAGE DESCRIPTION-
SYNTACTIC STRUCTURE:
An Overview
• Clear and complete descriptions of a language are needed
by programmers, implementers, and even language
designers.
• The syntax of a language specifies how programs in the
language are built up.
• The semantics of the language specifies what programs
mean.
• For example, dates are built up from digits represented by
D and the symbol / as follows:
• DD/DD/DDDD
• According to this syntax, 01/02/2001 is a date.
• The day this date refers to is not identified by the syntax.
• In the United States, this date refer to January 2, 2001, but
elsewhere 01 is interpreted as the day and 02 as the
month, so the date refers to February 1, 2001.
• The same syntax therefore has different semantics in
different parts of the world.
Expression Notations:
• Expression such as a+b*c have been in use for centuries and were a
starting point for design of programming languages.
• For example, a expression in Fortran can be written as:
• (- b + b2 – 4 * a * c ) / (2 * a)
• (- b + sqrt (b * b – 4.0 * a * c)) / (2.0 * a)
• Programming languages use a mix of infix, prefix, and postfix
notations. (Assignment)
• A binary operator is applied to two operands.
• In infix notation, a binary operator is written between its operands, as
in the expression a+b.
• Other alternative are prefix notation, in which the operator is written
first, as + a b, and postfix notation, in which the operator is written last,
as a b +.
• An expression can be enclosed within parentheses without affecting
its value.
• Expression E has the same value as (E), as a rule.
• Prefix and Postfix notations are sometimes called parenthesis-free
because as we shall see, the operands of each operator can be found
unambiguously, without the need for parentheses.
Prefix Notation:
• An expression in prefix notation is written as follows:
• The prefix notation for a constant or a variable is the
constant or variable itself.
• The application of an operator op to sub-expressions E1
and E2 is written in prefix notation as op E1 E2.
• An advantage of prefix notation is that it is easy to decode
during a left-to-right scan of an expression.
• If a prefix expression begins with operator +, the next
expression after + must be the first operands of + and the
expression after that must be the second operand of +.
• For example, the sum of x and y is written in prefix
notation as + x y.
• The product of + x y and z is written as * + x y z.
• Thus + 20 30 equals to 50 and * + 20 30 60 = * 50 60 =
3000
• Or * 20 + 30 60 = * 20 90 = 1800
Postfix Notation:
• An expression in postfix notation is written as follows:
• The postfix notation for a constant or a variable is the
constant or variable itself.
• The application of an operator op to sub-expressions E1
and E2 is written in postfix notation as E1 E2 op.
• An advantage of postfix notation is that they can be
mechanically evaluated with the help of a stack data
structure.
• For example, the sum of x and y is written in postfix
notation as x y +.
• The product of x y + and z is written as x y + z *.
• Thus 20 30 + equals to 50 and 20 30 + 60 * = 50 60* =
3000
• Or 20 30 60 + * = 20 90 * = 1800
Infix Notation:
• In infix notation, operators appear between their operands;
+ appear between a and b in the sum a + b.
• An advantage of infix notation is that it is familiar and
hence easy to read.
• Infix notation comes with rules for precedence and
associativity.
• How is an expression like a + b * c to be decoded?
• Is it the sum of a and b * c, or is it the product of a + b and
c?
• The operator * usually takes its operands before + does.
• An operator at a higher precedence level takes its
operands before an operator at a lower precedence level.
• BODMAS rules is an example.
Mixfix Notation:
• Operations specified by a combination of symbols do not
fit neatly into the prefix, infix, postfix classification.
• For example the keywords, if, then, and else are used
together in the expression
• if a > b then a else b
• The meaningful components of this expression are the
condition a>b and the expressions a and b.
• If a>b evaluates to true, then the value of the expression is
a, otherwise, it is b.
• When symbols or keywords appear interspersed with the
components of an expression, the operation will be said to
be in mixfix notation.
Abstract Syntax Trees:
• The abstract syntax of a language identifies the meaningful
components of each construct in the language.
• The prefix expression +ab, the infix expression a+b, and the postfix
expression ab+ all have the same meaningful components; the
operator + and the sub-expressions a and b.
• A corresponding tree representation is a better grammar can be
designed if the abstract syntax of a language is known before the
grammar is specified.
+
a b
• An operator and its operands are represented by a node and its
children.
• A tree consists of a node with k 0 trees as its children.
• When k = 0, a tree consists of just a node, with no children.
• A node with no children is called a leaf.
• The root of a tree is a node with no parent; that is, it is not a child of
any node.
Lexical Syntax:
• Keyword like if and symbol like <= are treated as units in a
programming language, just as words are treated as units in English.
• The meaning of the word dote (love / admire) bears no relation to the
meaning of dot, despite the similarity of their written representations.
• The two-characters symbol <= is treated as a unit in Pascal and C.
• It is distinct from the one-character < and =, which have different
meaning of their own.
• For example:
• <> in Pascal and != in C
• mod in Pascal and % in C etc.
• Grammars deal with units called tokens
• The syntax of a programming language is specified in terms of units
called tokens or terminals.
• A lexical syntax for language specifies the correspondence between
the written representation of the language and the tokens or terminals
in a grammar for the language.
• Alphabetic character sequences that are treated as units in a
language are called keywords.
Lexical Syntax:
• Similarly comments between tokens are ignored.
• Informal descriptions usually suffice for white space,
comments and the correspondence between tokens and
their spellings, so lexical syntax will not be formalized.
• Real numbers are a possible exception.
• The most complex rules in a lexical syntax are typically the
ones describing the syntax of real numbers, because parts
of the syntax are optional.
• The following some of the ways of writing the same
number:
• 314.E-2 = 3.14 = 0.314E+1 =
0.313E1
• and leading 0 can sometimes be dropped as:
.314E1
Context-Free Grammars:
• The concrete syntax of a language describes its written
representation, including lexical details such as the
placement of keywords and punctuation marks.
• Context-free grammars, or simply grammars, are a
notation for specifying concrete syntax.
• BNF-form, Backus-Nour Form, is a one way of writing
grammars. (Assignment)
• A grammar for a language imposes a hierarchical
structure, called a parse tree on programs in the language.
• The following is a parse tree for the string 3.14 in a
language of real numbers:
Context-Free Grammars:
real number
Integer part fraction
fraction
digit digit
digit
1
3 4
.
Context-Free Grammars:
• The leaves at the bottom of a parse tree are labeled with terminals or
tokens like 3; tokens represent themselves.
• By contrast, the other nodes of a parse tree are labeled with non-
terminals like real-number and digit; non-terminal represent language
constructs.
• Each node in the parse tree is based on a production, a rule that
defines a non-terminal in terms of a sequence of terminals and non-
terminals.
• The root of the parse tree for 3.14 is based on the following informally
stated production:
• “A real number consists of an integer part, a point, and a fraction part”.
• Together the tokens, the non-terminals, the productions, and a
distinguished non-terminal, called the starting non-terminal, constitute
a grammar for a language.
• The starting non-terminal may represent a portion of a complete
program when fragments of a programming language are studies.
• Both tokens and non-terminals are referred to as grammar symbols, or
simply symbols.
Definition of Context-Free Grammars:
• Given a set of symbols, a starting over the set is a finite sequence of
zero or more symbols from the set.
• The number of symbol in the sequence is said to be the length of the
string.
• The length of the string “teddy” is 5.
• An empty string is a string of length zero.
• A context-free grammar, or simply grammar, has four parts:
– A set of tokens or terminal; these are the atomic symbols in the
language
– A set of non-terminals; these are the variable representing
constructs in the language.
– A set of rules called productions for identifying the components of a
construct. Each production has a non-terminals as its left side, the
symbol =, and a string over the sets of terminals and non-terminals
as its right side
– A non-terminal chosen as the starting non-terminal; it represents the
main construct of the language.
• Unless otherwise stated, the production for the starting non-terminal
appear first.
BNF: Backus-Naur Form:
• The concept of a context-free grammar, consisting of terminals, non-
terminals, productions, and a string non-terminal, is independent of the
notation used to write grammars.
• BNF is one such notation, made popular by its use to organize the
report on the Algol-60 programming language.
Grammars for Expressions:
• A well-designed grammar can make it easy to pick out the meaningful
components of a construct.
• In other words, with a well-designed grammar, parse trees are similar
enough to abstract syntax trees that the grammar can be used to
organize a language description or a program that exploits the syntax.
• An example of a program that exploits syntax is an “expression
evaluator” that analyzes and evaluates expressions.
• After expressions, the remaining syntax is often easy.
Variants of Grammars:
• The other ways of grammars are Extended BNF and Syntax Charts.
• EBNF is an extension of BNF that allows lists and optional elements to
be specified. Lists or sequences of elements appear frequently in the
syntax of programming language. The appeal of EBNF is
convenience, not additional capability, since anything that can be
specified with EBNF can also be specified using BNF.
• Syntax charts are a graphical notation for grammars. They have
visual appeal; again, anything that can be specified using syntax
charts can also be specified using BNF. (Assignment)
Get documents about "