Compiler Construction


        Vana Doufexi
vdoufexi@cs.northwestern.edu
   office #317 @ CS dept


         Administrative info

• class webpage
  – http://www.cs.northwestern.edu/academics/courses/322
  – contains:
     • news
     • staff information
     • lecture notes & other handouts
     • homeworks & manuals
     • policies, grades
     • newsgroup portal
     • useful links



         What is a compiler

• A program that reads a program written in
  some language and translates it into a
  program written in some other language
  – Modula-2 to C
  – Java to bytecodes
  – COOL to MIPS code




       Why study compilers?

• Application of a wide range of theoretical
  techniques
• Good software engineering experience
• Better understanding of programming languages




       Features of compilers

• Correctness
  – preserve the meaning of the code
• Speed of target code
  – vs. speed of compilation?
• Good use of resources (size, power)
• Good error reporting/handling




              Compiler structure

source code  →  Front End  →  IR  →  Back End  →  target code



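  • Use an intermediate representation (IR)
     – Why? It decouples the front end from the back end, so one
       front end can serve several targets and one back end can
       serve several source languages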




           Compiler Structure

• Front end
  – Recognize legal/illegal programs
     • report/handle errors
  – Generate IR
  – The process can be automated
• Back end
  – Translate IR into target code
     •   instruction selection
     •   register allocation
     •   instruction scheduling
     •   many of these problems are NP-complete -- use approximations
            Compiler Structure

• Optimization: Middle stage
  – goals
     • improve running time of generated code
     • improve space, power consumption, etc.
  – how?
     • perform a number of transformations on the IR
     • multiple passes


  – important: preserve meaning of code
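
One meaning-preserving transformation on the IR is constant folding. The Python sketch below is only an illustration: it assumes the IR is a nested tuple tree of the form (op, left, right) with integer constants and variable names as leaves, and the name `fold_constants` is hypothetical.

```python
import operator

# Operators that are safe to evaluate at compile time in this sketch.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(node):
    """Return an equivalent tree with constant subexpressions evaluated."""
    if not isinstance(node, tuple):
        return node                      # leaf: int constant or variable name
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)  # both operands known: compute now
    return (op, left, right)              # otherwise keep the operation

print(fold_constants(("+", "x", ("*", 2, ("+", 3, 4)))))
# prints ('+', 'x', 14)
```

The subtree 2 * (3 + 4) is replaced by 14, while the addition involving the variable x is left for run time, so the meaning of the code is unchanged.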



              The Front End

• Scanning (a.k.a. lexical analysis)
   – recognize "words"
• Parsing (a.k.a. syntax analysis)
   – check syntax
• Semantic analysis
   – examine meaning (e.g. type checking)
• Other issues:
   – symbol table (to keep track of identifiers)
   – error detection/reporting/recovery

                The Scanner

• Its job:
  – given a character stream, recognize words
    (tokens)
     • e.g. x = 1 becomes IDENTIFIER EQUAL INTEGER
  – collect identifier information
     • e.g. IDENTIFIER corresponds to a lexeme (the actual
       word x) and its type (acquired from the declaration of
       x).
  – ignore white space and comments
  – report errors
• Good news
  – the process can be automated
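
As a minimal sketch of the scanner's job, the Python snippet below turns x = 1 into the token stream from the example above. Only IDENTIFIER, EQUAL, and INTEGER come from the slide; the `tokenize` function, the SKIP and ERROR categories, and the `#` comment syntax are assumptions made for illustration.

```python
import re

# Token patterns, tried in order. SKIP covers white space and '#' comments
# (the comment syntax is an assumption for this sketch).
TOKEN_SPEC = [
    ("INTEGER",    r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("EQUAL",      r"="),
    ("SKIP",       r"[ \t\n]+|#[^\n]*"),
    ("ERROR",      r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token kind, lexeme) pairs for the input text."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, lexeme = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                      # ignore white space and comments
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {lexeme!r} at offset {m.start()}")
        tokens.append((kind, lexeme))
    return tokens

print(tokenize("x = 1  # assign"))
# prints [('IDENTIFIER', 'x'), ('EQUAL', '='), ('INTEGER', '1')]
```

Each token keeps its lexeme, which is the information a later phase would store in the symbol table.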
                  The Parser

• Its job:
  – Check and verify syntax based on specified
    syntax rules
     • e.g. IDENTIFIER LPAREN RPAREN make up an
       EXPRESSION.
     • Coming soon: how context-free grammars specify
       syntax
  – Report errors
  – Build IR
     • often a syntax tree
• Good news
  – the process can be automated
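
To make the parser's job concrete, here is a small hand-written sketch in Python. The two grammar rules in the docstrings are assumptions chosen to include the IDENTIFIER LPAREN RPAREN example; the function names and the tuple shape of the syntax tree are hypothetical as well.

```python
def parse_expression(tokens, pos):
    """expression -> IDENTIFIER LPAREN RPAREN | IDENTIFIER | INTEGER"""
    kinds = [kind for kind, _ in tokens]
    if kinds[pos:pos + 3] == ["IDENTIFIER", "LPAREN", "RPAREN"]:
        return ("call", tokens[pos][1]), pos + 3
    if pos < len(tokens) and kinds[pos] in ("IDENTIFIER", "INTEGER"):
        return (kinds[pos].lower(), tokens[pos][1]), pos + 1
    raise SyntaxError(f"expected an expression at token {pos}")

def parse_assignment(tokens):
    """assignment -> IDENTIFIER EQUAL expression"""
    kinds = [kind for kind, _ in tokens]
    if kinds[:2] != ["IDENTIFIER", "EQUAL"]:
        raise SyntaxError("expected IDENTIFIER EQUAL at the start")
    rhs, pos = parse_expression(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return ("assign", tokens[0][1], rhs)   # a small syntax tree

tokens = [("IDENTIFIER", "x"), ("EQUAL", "="),
          ("IDENTIFIER", "f"), ("LPAREN", "("), ("RPAREN", ")")]
print(parse_assignment(tokens))
# prints ('assign', 'x', ('call', 'f'))
```

The parser checks the token sequence against the rules, reports a syntax error when no rule applies, and builds the tree as it goes.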
             Semantic analysis

• Its job:
  – Check the meaning of the program
     • e.g. In x=y, is y defined before being used? Are x and
       y declared?
     • e.g. In x=y, are the types of x and y such that you can
       assign one to the other?
  – Meaning may depend on context
  – Report errors
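
The two checks on x=y can be sketched with a symbol table that maps declared identifiers to their types. This is an illustration only: the `check_assignment` name, the type names, and the rule that an int may be assigned to a float are all assumptions, not rules of any particular language.

```python
def check_assignment(symbols, target, source):
    """Return a list of error messages for the assignment target = source.

    symbols maps each declared identifier to its type name.
    """
    errors = []
    for name in (target, source):
        if name not in symbols:
            errors.append(f"'{name}' is not declared")
    if errors:
        return errors
    # Assumed rule: identical types are compatible, and int widens to float.
    allowed = (symbols[target] == symbols[source]
               or (symbols[target], symbols[source]) == ("float", "int"))
    if not allowed:
        errors.append(f"cannot assign {symbols[source]} to {symbols[target]}")
    return errors

symbols = {"x": "float", "y": "int", "s": "string"}
print(check_assignment(symbols, "x", "y"))   # prints []
print(check_assignment(symbols, "y", "s"))   # prints ['cannot assign string to int']
print(check_assignment(symbols, "x", "z"))   # prints ["'z' is not declared"]
```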




                       IRs

• Graphical
  – e.g. parse tree, DAG
• Linear
  – e.g. three-address code
• Hybrid
  – e.g. linear for blocks of straight-line code, a
    graph to connect blocks
• Low-level or high-level
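
To show how a graphical IR relates to a linear one, the Python sketch below flattens an expression tree into three-address code. The tuple encoding of the tree, the temporary names t1, t2, ..., and the function name `to_three_address` are assumptions made for this example.

```python
import itertools

def to_three_address(tree):
    """Flatten a nested (op, left, right) tuple tree into three-address code."""
    code = []
    temps = (f"t{i}" for i in itertools.count(1))   # fresh temporary names

    def emit(node):
        if isinstance(node, str):       # a leaf: variable name or constant
            return node
        op, left, right = node
        l, r = emit(left), emit(right)  # operands are computed first
        temp = next(temps)
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = emit(tree)
    return code, result

# Tree for b + c * d
code, result = to_three_address(("+", "b", ("*", "c", "d")))
print(code, result)
# prints ['t1 = c * d', 't2 = b + t1'] t2
```

Each instruction has at most one operator and names its result, which is what makes the linear form convenient for later passes.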


        The scanning process

• Main goal: recognize words
• How? by recognizing patterns
  – e.g. an identifier is a sequence of letters or
    digits that starts with a letter.
• Lexical patterns form a regular language
• Regular languages are described using
  regular expressions (REs)
• Can we create an automatic RE recognizer?
  – Yes! (Hold that thought)

           The scanning process

• Definition: Regular expressions (over alphabet Σ)
   – ε is an RE denoting {ε}
   – If α ∈ Σ, then α is an RE denoting {α}
   – If r and s are REs, then
      •   (r) is an RE denoting L(r)
      •   r|s is an RE denoting L(r) ∪ L(s)
      •   rs is an RE denoting L(r)L(s)
      •   r* is an RE denoting the Kleene closure of L(r)
• Property: regular languages are closed under union, concatenation,
  and Kleene closure
   – This allows us to build complex REs from simpler ones.

           The scanning process

• Definition: Deterministic Finite Automaton
   – a five-tuple (Σ, S, δ, s0, F) where
      •   Σ is the alphabet
      •   S is the set of states
      •   δ is the transition function (S × Σ → S)
      •   s0 is the starting state
      •   F is the set of final states (F ⊆ S)
• Notation:
   – Use a transition diagram to describe a DFA
• DFAs are equivalent to REs: both describe exactly the regular languages
   – Hey! We just came up with a recognizer!
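
As an illustration of a DFA acting as a recognizer, the Python sketch below encodes the identifier pattern (a letter followed by letters or digits) as a transition table and runs it over a word. The state names s0 and s1, the dictionary encoding of the transition function, and the helper names are assumptions for this example; missing table entries stand for an implicit error state.

```python
def make_identifier_dfa():
    """DFA for: a letter followed by any number of letters or digits."""
    letters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    digits = "0123456789"
    delta = {}                          # transition function: (state, char) -> state
    for ch in letters:
        delta[("s0", ch)] = "s1"
        delta[("s1", ch)] = "s1"
    for ch in digits:
        delta[("s1", ch)] = "s1"        # digits are allowed only after the first letter
    return delta, "s0", {"s1"}          # (transition table, start state, final states)

def accepts(dfa, word):
    """Run the DFA over word; accept if it ends in a final state."""
    delta, state, final = dfa
    for ch in word:
        state = delta.get((state, ch))
        if state is None:               # no transition: implicit error state
            return False
    return state in final

dfa = make_identifier_dfa()
print(accepts(dfa, "x1"), accepts(dfa, "1x"), accepts(dfa, ""))
# prints True False False
```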
       The scanning process
• Goal: automate the process
• Idea:
  – Start with an RE
  – Build a DFA
     • How?
        – We can build a non-deterministic finite automaton
          (Thompson's construction)
        – Convert that to a deterministic one
          (Subset construction)
        – Minimize the DFA
          (Hopcroft's algorithm)
  – Implement it
• Existing scanner generator: flex
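
The first two steps of this pipeline can be sketched in Python as follows. This is a simplified illustration, not how flex is implemented: the RE is given as a nested tuple rather than parsed from text, the NFA builder is a Thompson-style construction that uses a few more ε edges than the textbook version, and DFA minimization is omitted. All function names are hypothetical.

```python
from itertools import count

EPS = None  # label used for epsilon transitions

def thompson(regex, edges, ids):
    """Add an NFA fragment for regex to edges; return (start, accept).

    regex is a character, or a tuple ("cat", r, s), ("alt", r, s), ("star", r).
    """
    start, accept = next(ids), next(ids)
    if isinstance(regex, str):
        edges.append((start, regex, accept))
    elif regex[0] == "cat":
        s1, a1 = thompson(regex[1], edges, ids)
        s2, a2 = thompson(regex[2], edges, ids)
        edges += [(start, EPS, s1), (a1, EPS, s2), (a2, EPS, accept)]
    elif regex[0] == "alt":
        s1, a1 = thompson(regex[1], edges, ids)
        s2, a2 = thompson(regex[2], edges, ids)
        edges += [(start, EPS, s1), (start, EPS, s2),
                  (a1, EPS, accept), (a2, EPS, accept)]
    elif regex[0] == "star":
        s1, a1 = thompson(regex[1], edges, ids)
        edges += [(start, EPS, s1), (start, EPS, accept),
                  (a1, EPS, s1), (a1, EPS, accept)]
    else:
        raise ValueError(f"unknown regex node {regex!r}")
    return start, accept

def eps_closure(states, edges):
    """All NFA states reachable from states using only epsilon edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for src, label, dst in edges:
            if src == s and label is EPS and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return frozenset(closure)

def subset_construction(start, accept, edges):
    """Return (delta, dfa_start, finals); each DFA state is a set of NFA states."""
    alphabet = {label for _, label, _ in edges if label is not EPS}
    dfa_start = eps_closure({start}, edges)
    delta, seen, work = {}, {dfa_start}, [dfa_start]
    while work:
        current = work.pop()
        for ch in alphabet:
            moved = {dst for src, label, dst in edges
                     if src in current and label == ch}
            if not moved:
                continue
            target = eps_closure(moved, edges)
            delta[(current, ch)] = target
            if target not in seen:
                seen.add(target)
                work.append(target)
    finals = {s for s in seen if accept in s}
    return delta, dfa_start, finals

def matches(regex, word):
    """Build the DFA for regex and run it over word."""
    edges = []
    start, accept = thompson(regex, edges, count())
    delta, state, finals = subset_construction(start, accept, edges)
    for ch in word:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in finals

# (a|b)*abb
regex = ("cat", ("star", ("alt", "a", "b")), ("cat", "a", ("cat", "b", "b")))
print(matches(regex, "babb"), matches(regex, "ab"))
# prints True False
```

A scanner generator would build the DFA once and emit it as a table or as code, rather than rebuilding it for every word as `matches` does here.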