Compiler Construction


        Vana Doufexi
   office #317 @ CS dept

         Administrative info

• class webpage
  – contains:
     • news
     • staff information
     • lecture notes & other handouts
     • homeworks & manuals
     • policies, grades
     • newsgroup portal
     • useful links

         What is a compiler

• A program that reads a program written in
  some language and translates it into a
  program written in some other language
  – Modula-2 to C
  – Java to bytecodes
  – COOL to MIPS code

       Why study compilers?

• Application of a wide range of theoretical techniques
• Good SW engineering experience
• Better understand languages

       Features of compilers

• Correctness
  – preserve the meaning of the code
• Speed of target code
  – vs. speed of compilation?
• Good use of resources (size, power)
• Good error reporting/handling

              Compiler structure

source code  -->  Front End  -->  IR  -->  Back End  -->  target code

  • Use intermediate representation
     – Why?

           Compiler Structure

• Front end
  – Recognize legal/illegal programs
     • report/handle errors
  – Generate IR
  – The process can be automated
• Back end
  – Translate IR into target code
     •   instruction selection
     •   register allocation
     •   instruction scheduling
     •   many of these problems are NP-complete, so use approximations (heuristics)

            Compiler Structure

• Optimization: Middle stage
  – goals
     • improve running time of generated code
     • improve space, power consumption, etc.
  – how?
     • perform a number of transformations on the IR
     • multiple passes

  – important: preserve meaning of code
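
As one illustration of a meaning-preserving transformation, here is a minimal sketch of constant folding. It assumes, purely for the example, that the IR is an expression tree of nested tuples `(op, left, right)` with variable names (strings) and integer constants at the leaves; `fold_constants` and `OPS` are hypothetical names, not part of any particular compiler.

```python
import operator

# Operators this sketch knows how to evaluate at compile time (an assumed subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Replace every subtree whose operands are both integer constants by its value."""
    if not isinstance(node, tuple):          # leaf: variable name or constant
        return node
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)          # computed once, at compile time
    return (op, left, right)                 # anything else is left untouched

print(fold_constants(("+", "a", ("*", 2, 3))))   # ('+', 'a', 6)
```

The folded tree computes the same value as the original for every value of `a`, which is the sense in which the pass preserves meaning. A real optimizer would run several such passes over the IR.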

              The Front End

• Scanning (a.k.a. lexical analysis)
   – recognize "words"
• Parsing (a.k.a. syntax analysis)
   – check syntax
• Semantic analysis
   – examine meaning (e.g. type checking)
• Other issues:
   – symbol table (to keep track of identifiers)
   – error detection/reporting/recovery

                The Scanner

• Its job:
  – given a character stream, recognize words
     • e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
  – collect identifier information
     • e.g. IDENTIFIER corresponds to a lexeme (the actual
       word x) and its type (acquired from the declaration of x)
  – ignore white space and comments
  – report errors
• Good news
  – the process can be automated
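
The scanner's job can be sketched in a few lines of Python. The token names IDENTIFIER, EQUAL and INTEGER come from the slide; the patterns, the `#` comment syntax and the names `TOKEN_SPEC`, `MASTER` and `scan` are assumptions made for this illustration only.

```python
import re

# (token type, pattern) pairs. Earlier entries win when several could match.
TOKEN_SPEC = [
    ("COMMENT",    r"#[^\n]*"),               # assumed comment syntax: '#' to end of line
    ("WHITESPACE", r"\s+"),
    ("INTEGER",    r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),  # starts with a letter
    ("EQUAL",      r"="),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Return a list of (token_type, lexeme) pairs; raise on an illegal character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if match.lastgroup not in ("COMMENT", "WHITESPACE"):   # ignored, per the slide
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("x = 1  # assign"))
# [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```

This sketch keeps only the lexeme with each token. Attaching a type to an identifier would additionally require a symbol table filled in from declarations.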

                  The Parser

• Its job:
  – Check and verify syntax based on specified
    syntax rules
     • e.g. IDENTIFIER LPAREN RPAREN make up an expression
     • Coming soon: how context-free grammars specify syntax
  – Report errors
  – Build IR
     • often a syntax tree
• Good news
  – the process can be automated
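
A hand-written recursive-descent parser gives a feel for what "check syntax, report errors, build a syntax tree" means. The grammar below is a toy assumed for this example, not the course's grammar: `statement -> IDENTIFIER EQUAL expr` and `expr -> INTEGER | IDENTIFIER | IDENTIFIER LPAREN RPAREN`. The function name and the tuple-shaped tree are likewise hypothetical.

```python
def parse_statement(tokens):
    """Parse a list of (token_type, lexeme) pairs into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            found = peek() or "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def parse_expr():
        if peek() == "INTEGER":
            return ("int", int(expect("INTEGER")))
        name = expect("IDENTIFIER")
        if peek() == "LPAREN":               # IDENTIFIER LPAREN RPAREN: a call
            expect("LPAREN")
            expect("RPAREN")
            return ("call", name)
        return ("var", name)

    target = expect("IDENTIFIER")
    expect("EQUAL")
    value = parse_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos][0]} after statement")
    return ("assign", target, value)

tokens = [("IDENTIFIER", "x"), ("EQUAL", "="),
          ("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")")]
print(parse_statement(tokens))   # ('assign', 'x', ('call', 'f'))
```

A parser generator would produce an equivalent recognizer automatically from the grammar rules. The hand-written version is shown only because it makes the steps visible.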

             Semantic analysis

• Its job:
  – Check the meaning of the program
     • e.g. In x=y, is y defined before being used? Are x and
       y declared?
     • e.g. In x=y, are the types of x and y such that you can
       assign one to the other?
  – Meaning may depend on context
  – Report errors
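
The two checks on x=y can be sketched against a symbol table. Everything concrete here is an assumption for illustration: the table is a plain dict from identifier to type, the `ASSIGNABLE` rule set (same type, or int widening to float) is invented, and the sketch checks declarations only, not whether y was given a value before use.

```python
# Hypothetical assignability rules as (target type, source type) pairs.
ASSIGNABLE = {("int", "int"), ("float", "float"), ("float", "int")}

def check_assignment(symbols, target, source):
    """Return the list of semantic errors for the statement `target = source`.

    `symbols` maps each declared identifier to its type.
    """
    errors = []
    for name in (target, source):
        if name not in symbols:
            errors.append(f"'{name}' is not declared")
    if not errors and (symbols[target], symbols[source]) not in ASSIGNABLE:
        errors.append(
            f"cannot assign {symbols[source]} '{source}' to {symbols[target]} '{target}'"
        )
    return errors

symbols = {"x": "int", "y": "float"}
print(check_assignment(symbols, "x", "y"))   # ["cannot assign float 'y' to int 'x'"]
print(check_assignment(symbols, "y", "x"))   # []
print(check_assignment(symbols, "x", "z"))   # ["'z' is not declared"]
```

The same statement x=y is legal or illegal depending on the declarations in scope, which is what "meaning may depend on context" refers to.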


        Intermediate representations

• Graphical
  – e.g. parse tree, DAG
• Linear
  – e.g. three-address code
• Hybrid
  – e.g. linear for blocks of straight-line code, a
    graph to connect blocks
• Low-level or high-level
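
To make the graphical/linear distinction concrete, the sketch below lowers a small expression tree (a graphical IR) to three-address code (a linear IR). The nested-tuple tree shape, the temporary names `t1`, `t2`, ... and the function name are assumptions chosen for the example.

```python
import itertools

def to_three_address(tree):
    """Lower a nested-tuple expression tree to a list of three-address instructions.

    Returns (instructions, name holding the final result).
    """
    code = []
    temps = itertools.count(1)

    def lower(node):
        if not isinstance(node, tuple):      # leaf: a variable name or a constant
            return str(node)
        op, left, right = node
        left_name = lower(left)
        right_name = lower(right)
        temp = f"t{next(temps)}"             # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = lower(tree)
    return code, result

code, result = to_three_address(("+", "a", ("*", "b", 2)))
print("\n".join(code))
# t1 = b * 2
# t2 = a + t1
```

Each instruction has at most one operator and two operands, which is what makes the form convenient for the back end.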

        The scanning process

• Main goal: recognize words
• How? by recognizing patterns
  – e.g. an identifier is a sequence of letters or
    digits that starts with a letter.
• Lexical patterns form a regular language
• Regular languages are described using
  regular expressions (REs)
• Can we create an automatic RE recognizer?
  – Yes! (Hold that thought)

           The scanning process

• Definition: Regular expressions (over alphabet Σ)
   – ε is an RE denoting {ε}
   – If a ∈ Σ, then a is an RE denoting {a}
   – If r and s are REs, then
      •   (r) is an RE denoting L(r)
      •   r|s is an RE denoting L(r) ∪ L(s)
      •   rs is an RE denoting L(r)L(s)
      •   r* is an RE denoting the Kleene closure of L(r)
• Property: regular languages are closed under many operations (e.g. union, concatenation, Kleene closure)
   – This allows us to build complex REs.
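
The inductive definition can be read directly as a recursive function. The sketch below computes the strings of L(r) up to a length bound, since the full language may be infinite. The tuple encoding of REs (`("eps",)`, `("sym", a)`, `("alt", r, s)`, `("cat", r, s)`, `("star", r)`) and the function name `language` are assumptions made for this example.

```python
def language(regex, max_len):
    """Return the set of strings in L(regex) whose length is at most max_len."""
    kind = regex[0]
    if kind == "eps":                        # denotes the set containing only ""
        return {""}
    if kind == "sym":                        # a single symbol a denotes {a}
        return {regex[1]} if max_len >= 1 else set()
    if kind == "alt":                        # r|s denotes the union
        return language(regex[1], max_len) | language(regex[2], max_len)
    if kind == "cat":                        # rs denotes the concatenation
        left = language(regex[1], max_len)
        right = language(regex[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":                       # r* denotes the Kleene closure
        base = language(regex[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:                      # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown RE node {kind!r}")

# Identifiers over the tiny alphabet {a, 1}: a letter, then letters or digits.
letter, digit = ("sym", "a"), ("sym", "1")
identifier = ("cat", letter, ("star", ("alt", letter, digit)))
print(sorted(language(identifier, 2)))       # ['a', 'a1', 'aa']
```

Because each operator maps regular languages to a regular language, larger REs such as `identifier` can be assembled from smaller ones.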

           The scanning process

• Definition: Deterministic Finite Automaton
   – a five-tuple (Σ, S, δ, s0, F) where
      •   Σ is the alphabet
      •   S is the set of states
      •   δ is the transition function (S × Σ → S)
      •   s0 is the starting state
      •   F is the set of final states (F ⊆ S)
• Notation:
   – Use a transition diagram to describe a DFA
• DFAs are equivalent to REs
   – Hey! We just came up with a recognizer!
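
The five-tuple translates almost literally into code. The sketch below builds a DFA for identifiers (a letter followed by letters or digits) and runs it on a word. The state names, the restriction to ASCII letters and digits, and the use of a missing table entry as an implicit error state are assumptions for this example.

```python
import string

def make_identifier_dfa():
    """Return (alphabet, states, delta, start, final) for identifiers."""
    alphabet = set(string.ascii_letters + string.digits)
    states = {"start", "in_id"}
    delta = {}
    for ch in string.ascii_letters:          # the first character must be a letter
        delta[("start", ch)] = "in_id"
    for ch in alphabet:                      # then any letters or digits
        delta[("in_id", ch)] = "in_id"
    return alphabet, states, delta, "start", {"in_id"}

def accepts(dfa, word):
    """Run the DFA on word and report whether it ends in a final state."""
    alphabet, states, delta, state, final = dfa
    for ch in word:
        if (state, ch) not in delta:         # no transition: implicit error state
            return False
        state = delta[(state, ch)]
    return state in final

dfa = make_identifier_dfa()
print(accepts(dfa, "x1"), accepts(dfa, "1x"), accepts(dfa, ""))   # True False False
```

The recognizer does a constant amount of work per input character, one table lookup, which is why DFAs are a good implementation target for scanners.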

       The scanning process
• Goal: automate the process
• Idea:
  – Start with an RE
  – Build a DFA
     • How?
        – We can build a non-deterministic finite automaton
          (Thompson's construction)
        – Convert that to a deterministic one
          (Subset construction)
        – Minimize the DFA
          (Hopcroft's algorithm)
  – Implement it
• Existing scanner generator: flex
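
Of the steps above, the sketch below covers only the subset construction. It does not implement Thompson's construction or Hopcroft's minimization. The input NFA for `a*b` is written by hand in a Thompson-like shape, and the dict encoding, the `EPS` label and all function names are assumptions for the example.

```python
EPS = None   # label used for ε-moves in this sketch

def epsilon_closure(nfa, states):
    """All NFA states reachable from `states` using ε-moves only."""
    stack = list(states)
    closure = set(states)
    while stack:
        state = stack.pop()
        for nxt in nfa.get((state, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return frozenset(closure)

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are sets of NFA states."""
    start_set = epsilon_closure(nfa, {start})
    dfa = {}
    seen = {start_set}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            moved = {t for s in current for t in nfa.get((s, symbol), ())}
            if not moved:
                continue                     # no transition: implicit error state
            target = epsilon_closure(nfa, moved)
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    dfa_finals = {subset for subset in seen if subset & finals}
    return dfa, start_set, dfa_finals

def dfa_accepts(dfa, start, finals, word):
    state = start
    for ch in word:
        state = dfa.get((state, ch))
        if state is None:
            return False
    return state in finals

# Hand-written NFA for a*b: (state, label) -> set of next states.
nfa = {(0, EPS): {1, 3}, (1, "a"): {2}, (2, EPS): {1, 3}, (3, "b"): {4}}
dfa, start, finals = subset_construction(nfa, 0, {4}, "ab")
print(dfa_accepts(dfa, start, finals, "aab"))   # True
print(dfa_accepts(dfa, start, finals, "ba"))    # False
```

A scanner generator such as flex performs this kind of conversion internally, so the user only writes the regular expressions.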
