Documents
Resources
Learning Center
Upload
Plans & pricing Sign in
Sign Out

Grammars

VIEWS: 2 PAGES: 38

									Grammars

 CPSC 5135
           Formal Definitions
• A symbol is a character. It represents an
  abstract entity that has no inherent meaning
• Examples:
  a, A, 3, *, - ,=
          Formal Definitions
• An alphabet is a finite set of symbols.
• Examples:
  A = { a, b, c }
  B = { 0, 1 }
          Formal Definitions
• A string (or word) is a finite sequence of
  symbols from a given alphabet.
• Examples:
 S = { 0, 1 } is a alphabet
 0, 1, 11010, 101, 111 are strings from S
  A = { a, b, c ,d } is an alphabet
 bad, cab, dab, d, aaaaa are strings from A
          Formal Definitions
• A language is a set of strings from an
  alphabet.
• The set can be finite or infinite.
• Examples:
  A = { 0, 1}
  L1 = { 00, 01, 10, 11 }
  L2 = { 010, 0110, 01110,011110,
      …}
             Formal Definitions
• A grammar is a quadruple G = (V, Σ, R, S) where
  1) V is a finite set of variables (non-terminals),
  2) Σ is a finite set of terminals, disjoint from V,
  3) R is a finite set of rules. The left side of each rule
  is a string of one or more elements from V U Σ and
  whose right side is a string of 0 or more elements
  from V U Σ
  4) S is an element of V and is called the start symbol
          Formal Definitions
• Example grammar:
• G = (V, Σ, R, S)
      V = { S, A }
      Σ = { a, b }
      R = { S → aA
             A → bA
             A→a }
                  Derivations
  R = S → aA
      A → bA
      A→a
• A derivation is a sequence of replacements ,
  beginning with the start symbol, and replacing a
  substring matching the left side of a rule with the
  string on the right side of a rule
    S → aA
      → abA
      → abbA
      → abba
               Derivations
• What strings can be generated from the
  following grammar?
     S → aBa
     B → aBa
     B→b
           Formal Definitions
• The language generated by a grammar is
  the set of all strings of terminal symbols
  which are derivable from S in 0 or more steps.
• What is the language generated by this
  grammar?
• S→a
  S → aB
  B → aB
  B→a
              Kleene Closure
• Let Σ be a set of strings. Σ* is called the
  Kleene closure of Σ and represents the set of
  all concatenations of 0 or more strings in Σ.
• Examples
  Σ* = { 1 }* = { ø, 1, 11, 111, 1111, …}
  Σ* = { 01 }* = { ø, 01, 0101, 010101, …}
  Σ* = { 0 + 1 }* = set of all possible strings of 0’s
  and 1’s. (+ means union)
           Formal Definitions
• A grammar G = (V,Σ, R, S) is right-linear if all
  rules are of the form:
     A → xB
     A→x
where A, B ε V and x ε Σ*
         Right-linear Grammar
• G = { V, Σ, R, S }
 V = { S, B }
  Σ = { a, b }
  R={        S → aS ,
             S→B,
             B → bB ,
             B→ε    }
What language is generated?
           Formal Definitions
• A grammar G = (V,Σ, R, S) is left-linear if all
  rules are of the form:
     A → Bx
     A→x
where A, B ε V and x ε Σ*
            Formal Definitions
• A regular grammar is one that is either right or left
  linear.
• Let Q be a finite set and let Σ be a finite set of
  symbols. Also let δ be a function from Q x Σ to
  Q, let q0 be a state in Q and let A be a subset of Q.
  We call each element of Q a state, δ the transition
  function, q0 the initial state and A the set of
  accepting states.
  Then a deterministic finite automaton (DFA) is a 5-
  tuple < Q , Σ , q0 , δ , A >

• Every regular grammar is equivalent to a DFA
         Language Definition
• Recognition – a machine is constructed that
  reads a string and pronounces whether the
  string is in the language or not. (Compiler)
• Generation – a device is created to generate
  strings that belong to the language.
  (Grammar)
        Chomsky Hierarchy
• Noam Chomsky (1950’s) described 4 classes
  of grammars
  1) Type 0 – unrestricted grammars
  2) Type 1 – Context sensitive grammars
  3) Type 2 – Context free grammars
  4) Type 3 – Regular grammars
                Grammars
• Context-free and regular grammars have
  application in computing
• Context-free grammar – each rule or
  production has a left side consisting of a
  single non-terminal
     Backus-Naur form (BNF)
• BNF was used to describe programming
  language syntax and is similar to Chomsky’s
  context free grammars
• A meta-language is a language used to
  describe another language
• BNF is a meta-language for computer
  languages
                    BNF
• Consists of nonterminal symbols, terminal
  symbols (lexemes and tokens), and rules or
  productions
• <if-stmt> → if <logical-expr> then <stmt>
• <if-stmt> → if <logical-expr> then <stmt>
                        else <stmt>
• <if-stmt> → if <logical-expr> then <stmt>
           | if <logical-expr> then <stmt>
                        else <stmt>
           A Small Grammar
<program>  begin <stmt_list> end
<stmt_list>  <stmt>
            | <stmt> ; <stmt_list>
<stmt>  <var> = <expression>
<var>  A | B | C
<expression>  <var> + <var>
            | <var> - <var>
            | <var>
             A Derivation
<program>  begin <stmt_list> end
 begin <stmt> end
begin <var> = <expression> end
begin A = <expression> end
begin A = <var> + <var> end
begin A = B + <var> end
begin A = B + C end
                    Terms
• Each of the strings in a derivation is called a
  sentential form.
• If the leftmost non-terminal is always the one
  selected for replacement, the derivation is a
  leftmost derivation.
• Derivations can be leftmost, rightmost, or
  neither
• Derivation order has no effect on the
  language generated by the grammar
  Derivations Yield Parse Trees
<program>  begin <stmt_list> end            <Program>
 begin <stmt> end
begin <var> = <expression> end
                                    begin    <stmt_list>   end
begin A = <expression> end
begin A = <var> + <var> end
begin A = B + <var> end                      <stmt>
begin A = B + C end
                                     <var>   =     <expression>


                                      A                <var> + <var>


                                                       B         C
               Parse Trees
• Parse trees describe the hierarchical structure
  of the sentences of the language they define.
• A grammar that generates a sentence for
  which there are two or more distinct parse
  trees is ambiguous.
     An Ambiguous Grammar
<assign>  <id> = <expr>
<id>  A | B | C
<expr>  <expr> + <expr>
      | <expr> * <expr>
      | ( <expr> )
      | <id>
             Two Parse Trees – Same
                    Sentence
       <assign>                                     <assign>


<id>     =        <expr>                    <id>      =        <expr>


A       <expr>      +       <expr>          A            <expr> *      <expr>


         <id>       <expr> * <expr>                <expr> + <expr> <id>


             B          <id>         <id>           <id>        <id>      A


                        C             A              B           C
                   Derivation 1
<assign>  <id> = <expr>
        A = <expr>
        A = <expr> + <expr>
        A = <id> + <expr>
        A = B + <expr>
        A = B + <expr> * <expr>
        A = B + <id> * <expr>
        A = B + C * <expr>
        A = B + C * <id>
       A=B+C*A
                   Derivation 2
<assign>  <id> = <expr>
        A = <expr>
        A = <expr> * <expr>
        A = <expr> + <expr> * <expr>
        A = <id> + <expr> * <expr>
        A = B + <expr> * <expr>
        A = B + <id> * <expr>
        A = B + C * <expr>
        A = B + C * <id>
       A=B+C*A
                 Ambiguity
• Parse trees are used to determine the
  semantics of a sentence
• Ambiguous grammars lead to semantic
  ambiguity - this is intolerable in a computer
  language
• Often, ambiguity in a grammar can be
  removed
      Unambiguous Grammar
<assign>  <id> = <expr>
<id>  A | B | C
<expr>  <expr> + <term> | <term>
<term>  <term> * <factor> | <factor>
<factor>  ( <expr> ) | <id>

• This grammar makes multiplication take precedence
  over addition
          Associativity of Operators
<assign>  <id> = <expr>                                <assign>
<id>  A | B | C
<expr>  <expr> + <term> |          <id>            =       <expr>
    <term>
<term>  <term> * <factor> |
                                     A             <expr>        +   <term>
    <factor>
<factor>  ( <expr> ) | <id>
                                           <expr> + <term>           <factor>

Addition operators associate from
   left to right                           <term>        <factor>     <id>


                                           <factor>         <id>       A


                                            <id>             C


                                            B
                       BNF
• A BNF rule that has its left hand side
  appearing at the beginning of its right hand
  side is left recursive .
• Left recursion specifies left associativity
• Right recursion is usually used for associating
  exponetiation operators
 <factor>  <exp> ** <factor> | <exp>
 <exp>  ( <expr> ) | <id>
         Ambiguous If Grammar
<stmt>  <if_stmt>
<if_stmt>  if <logic_expr> then <stmt> |
          if <logic_expr> then <stmt> else <stmt>

• Consider the sentential form:
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
 Parse Trees for an If Statement
<if_stmt>                                 <if_stmt>


If <logic_expr> then <stmt> else <stmt>   If <logic_expr> then <stmt>


                      <if_stmt>                              <if_stmt>




            if <logic_expr> then <stmt>   if <logic_expr> then <stmt> else <stmt>
    Unambiguous Grammar for If
           Statements
<stmt>  <matched> | <unmatched>
<matched>  if <logic_expr> then <matched> else <matched>
    | any non-if statement
<unmatched>  if <logic_expr> then <stmt>
        | if <logic_expr> then <matched> else <unmatched>
           Extended BNF (EBNF)
• Optional part denoted by […]
<selection>  if ( <expr> ) <stmt> [ else <stmt> ]

• Braces used to indicate the enclosed part can be repeated
   indefinitely or left out
<ident_list>  <identifier> { , <identifier> }

• Multiple choice options are put in parentheses and separated
  by the or operator |
<for_stmt>  for <var> := <expr> (to | downto) <expr> do <stmt>
 BNF vs EBNF for Expressions
BNF:                          EBNF:
 <expr>  <expr> + <term>      <expr>  <term> { (+ | - ) <term> }
        | <expr> - <term>
        | <term>               <term>  <factor> { ( * | / ) <factor>
 <term>  <term> * <factor>
     | <term> / <factor>
     | <factor>

								
To top