  The scanning process
• Goal: automate the construction of a scanner
• Idea:
  – Start with a regular expression (RE)
  – Build a DFA
     • How?
        – We can build a non-deterministic finite automaton
          (Thompson's construction)
        – Convert that to a deterministic one
          (Subset construction)
        – Minimize the DFA
          (Hopcroft's algorithm)
  – Implement it
• Existing scanner generator: flex
  The scanning process: step 1
• Let's build a mini-scanner that recognizes
  exactly those strings of a's and b's that end
  in ab
• Step 1: Come up with a Regular Expression


                  (a|b)*ab
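
Before building any automaton, the RE can be sanity-checked with an existing regex engine. This is a minimal sketch using Python's standard re module; the helper name accepts is an assumption made for illustration, not part of the slides.

```python
import re

# The RE from the slide; Python's regex syntax uses the same notation here.
ENDS_IN_AB = re.compile(r"(a|b)*ab")

def accepts(s: str) -> bool:
    """True if the WHOLE string matches the RE (hypothetical helper)."""
    return ENDS_IN_AB.fullmatch(s) is not None

for s in ["ab", "bbab", "abab", "", "a", "ba", "abc"]:
    print(repr(s), accepts(s))
```

The first three strings are accepted; the empty string, "a", "ba", and "abc" are rejected.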




   The scanning process: step 2
• Step 2: Use Thompson's construction to create
  an NFA for that expression
• We want to be able to automate the process
• Thompson's construction gives a systematic
  way to create an NFA from an RE.
• It builds the NFA in a bottom-up manner.
• At any time during construction
  – there is only one final state
  – no transitions leave the final state
  – components are linked together using ε-transitions.
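
The bottom-up construction can be sketched in a few lines of Python. This is one common formulation of Thompson's construction, not necessarily the exact variant drawn on the slides; the names Frag, symbol, concat, union, and star are assumptions made for illustration, and ε is represented by None.

```python
from itertools import count

EPS = None          # label used for ε-transitions in this sketch
_ids = count(1)     # fresh state numbers: 1, 2, 3, ...

class Frag:
    """An NFA fragment with exactly one start and one final state."""
    def __init__(self, start, final, edges):
        self.start, self.final, self.edges = start, final, edges

def symbol(c):
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, c, f)])

def concat(x, y):
    # Link x's final state to y's start state with an ε-transition.
    return Frag(x.start, y.final,
                x.edges + y.edges + [(x.final, EPS, y.start)])

def union(x, y):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (s, EPS, y.start),
             (x.final, EPS, f), (y.final, EPS, f)]
    return Frag(s, f, x.edges + y.edges + extra)

def star(x):
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, x.start), (s, EPS, f),          # enter or skip
             (x.final, EPS, x.start), (x.final, EPS, f)]  # repeat or leave
    return Frag(s, f, x.edges + extra)

# (a|b)*ab, built from the innermost pieces outward
nfa = concat(concat(star(union(symbol("a"), symbol("b"))),
                    symbol("a")),
             symbol("b"))
print(len(nfa.edges), "transitions; final state:", nfa.final)
```

Each helper returns a fragment with a single final state that has no outgoing transitions, so the invariants listed above hold at every step of the construction.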
  The scanning process: step 2
[Figure: Thompson's construction applied bottom-up: NFA fragments for a and b, then for a|b, then for (a|b)*, linked by ε-transitions.]
  The scanning process: step 2
[Figure: the complete NFA for (a|b)*ab, obtained by concatenating the fragments for (a|b)*, a, and b.]
  The scanning process: step 3
• Step 3: Use subset construction to convert
  the NFA to a DFA
• Observation:
  – Two states qi and qk linked by an ε-transition
    in the NFA should belong to the same state
    in the DFA, because the machine goes
    from qi to qk without consuming input.
     • The ε-closure() function takes a state q and returns
       all the states that can be reached from q using
       ε-transitions only (including q itself).
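
A minimal Python sketch of ε-closure follows. The edge table EPS_EDGES is an assumption: it is one set of ε-transitions for the 12-state NFA of (a|b)*ab that is consistent with the state sets computed later in these slides, and the function takes a set of states rather than a single state for convenience.

```python
# Assumed ε-transitions of the NFA for (a|b)*ab (states numbered 1..12).
EPS_EDGES = {
    1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7},
    7: {2, 8}, 8: {9}, 10: {11},
}

def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in EPS_EDGES.get(q, ()):
            if r not in closure:      # visit each state once
                closure.add(r)
                stack.append(r)
    return closure

print(sorted(eps_closure({1})))   # [1, 2, 3, 4, 8, 9]
```

The closure always contains the starting states themselves, since reaching a state with zero ε-transitions counts.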



  The scanning process: step 3
• Observation:
  – If, on some input a, the NFA can go to any one
    of k states, then those k states should be
    represented by a single state in the DFA.
     • The δ() function takes as input a state q and a
       character x and returns all states that we can go to
       from q when reading a single x.
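
As a rough illustration, δ can be lifted from a single state to a set of states, which is the form the subset construction needs. The table SYM_EDGES is an assumption: it lists the four labeled transitions of the 12-state NFA for (a|b)*ab used in these slides.

```python
# Assumed labeled (non-ε) transitions of the NFA for (a|b)*ab.
SYM_EDGES = {
    (3, "a"): {5}, (4, "b"): {6},
    (9, "a"): {10}, (11, "b"): {12},
}

def delta(states, x):
    """States reachable from any state in `states` by reading one x."""
    out = set()
    for q in states:
        out |= SYM_EDGES.get((q, x), set())
    return out

Q0 = {1, 2, 3, 4, 8, 9}
print(sorted(delta(Q0, "a")))   # [5, 10]
print(sorted(delta(Q0, "b")))   # [6]
```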




  The scanning process: step 3
• Step 3: Use subset construction to convert
  the NFA to a DFA
  – The start state Q0 of the DFA is the ε-closure
    of the start state q0 of the NFA
  – Compute ε-closure(δ(Q0, x)) for each valid input
    character x. This will generate new states.
  – Systematically compute ε-closure(δ(Qi, x)) until
    no new states can be created.
  – The final states of the DFA are those that
    contain final states of the NFA.
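
The steps above can be sketched as a worklist algorithm. This is a minimal sketch, assuming the 12-state NFA for (a|b)*ab with the transition tables below (reconstructed to agree with the state sets on the slides); all names are illustrative.

```python
from collections import deque

# Assumed NFA for (a|b)*ab: start state 1, final state 12.
EPS_EDGES = {1: {2, 8}, 2: {3, 4}, 5: {7}, 6: {7},
             7: {2, 8}, 8: {9}, 10: {11}}
SYM_EDGES = {(3, "a"): {5}, (4, "b"): {6},
             (9, "a"): {10}, (11, "b"): {12}}
NFA_START, NFA_FINALS, ALPHABET = 1, {12}, "ab"

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for r in EPS_EDGES.get(stack.pop(), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure

def delta(states, x):
    out = set()
    for q in states:
        out |= SYM_EDGES.get((q, x), set())
    return out

def subset_construction():
    start = frozenset(eps_closure({NFA_START}))
    index = {start: 0}            # set of NFA states -> DFA state number
    trans = {}                    # (DFA state, symbol) -> DFA state
    work = deque([start])
    while work:                   # until no new states can be created
        S = work.popleft()
        for x in ALPHABET:
            T = frozenset(eps_closure(delta(S, x)))
            if not T:
                continue          # no move on x from S
            if T not in index:
                index[T] = len(index)
                work.append(T)
            trans[(index[S], x)] = index[T]
    finals = {i for S, i in index.items() if S & NFA_FINALS}
    return index, trans, finals

index, trans, finals = subset_construction()
print(len(index), "DFA states; final states:", finals)
```

For this NFA the sketch produces four DFA states, of which only state 3 is final.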


  The scanning process: step 3
[Figure: the NFA for (a|b)*ab with its states numbered 1 to 12. Labeled transitions: 3 -a-> 5, 4 -b-> 6, 9 -a-> 10, 11 -b-> 12; the remaining transitions are ε-transitions.]

ε-closure(1) = {1, 2, 3, 4, 8, 9}
  The scanning process: step 3

Q0 = {1,2,3,4,8,9}
ε-closure(δ(Q0, a)) = {5,7,8,9,2,3,4,10,11} = Q1
ε-closure(δ(Q0, b)) = {6,7,8,9,2,3,4} = Q2
ε-closure(δ(Q1, a)) = Q1
ε-closure(δ(Q1, b)) = {6,7,8,9,2,3,4,12} = Q3
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2
ε-closure(δ(Q3, a)) = Q1
ε-closure(δ(Q3, b)) = Q2
  The scanning process: step 3

[Figure: the resulting DFA, with states 0 to 3 standing for Q0 to Q3. State 0 is the start state and state 3 is the only final state.]

  state   on a   on b
    0       1      2
    1       1      3
    2       1      2
    3       1      2
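
Once the DFA is known, implementing it can be as simple as a table lookup per input character. This is a minimal sketch, assuming the four-state DFA obtained above (start state 0, final state 3); the function name run is illustrative.

```python
# Transition table of the DFA for (a|b)*ab: (state, symbol) -> state
DFA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 1, (3, "b"): 3 - 1,   # state 3 goes to state 2 on b
}
START, FINALS = 0, {3}

def run(s):
    """Return True if the DFA accepts the whole string s."""
    state = START
    for ch in s:
        state = DFA.get((state, ch))
        if state is None:         # no transition: invalid character
            return False
    return state in FINALS

print(run("ab"), run("bab"), run("abb"), run("abc"))
```

Each character costs one dictionary lookup, so the running time is linear in the length of the input regardless of how complicated the RE was.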
  The scanning process: step 4
• Step 4: Use Hopcroft's algorithm to
  minimize the DFA
• States Q0 and Q2 behave the same way, so they
  can be merged:
  – Q0 goes to Q1 on a and to Q2 on b
  – Q2 goes to Q1 on a and to Q2 on b
• Note that even though Q3 also behaves the same
  way, it cannot be merged with Q0 or Q2 because
  Q3 is a final state while Q0 and Q2 are not.

[Figure: the DFA before and after minimization. In the minimized DFA, state 0 stands for the merged Q0 and Q2.]

  state   on a   on b
    0       1      0
    1       1      3
    3       1      0
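
The merging argument can be checked mechanically. The sketch below uses plain iterative partition refinement (in the style of Moore's algorithm), which is simpler than Hopcroft's worklist-based algorithm but computes the same partition; it assumes a complete transition table, and the name minimize is illustrative.

```python
def minimize(states, alphabet, trans, finals):
    """Map each DFA state to the number of its equivalence class."""
    # Initial partition: final states versus non-final states.
    block = {q: int(q in finals) for q in states}
    while True:
        # A state's signature: its own block, plus the block reached
        # on each symbol. States with equal signatures stay together.
        sig = {q: (block[q],) + tuple(block[trans[(q, x)]]
                                      for x in alphabet)
               for q in states}
        ids = {}
        new_block = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            return new_block      # no block was split: partition stable
        block = new_block

DFA = {(0, "a"): 1, (0, "b"): 2, (1, "a"): 1, (1, "b"): 3,
       (2, "a"): 1, (2, "b"): 2, (3, "a"): 1, (3, "b"): 2}
classes = minimize([0, 1, 2, 3], "ab", DFA, {3})
print(classes)
```

On the DFA from step 3, states 0 and 2 end up in the same class while states 1 and 3 each get a class of their own, which matches the three-state result.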
                  In practice

• flex is a scanner generator that takes an
  RE specification and follows the described
  process to generate a DFA.
• The user additionally specifies
  – actions to be performed whenever a valid string
    has been recognized
     • e.g. insert identifier in symbol table
  – error messages to be generated when the input
    string is invalid.
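
The idea of pairing each RE with an action can be illustrated in Python. This is only an analogy for the rule/action structure of a scanner specification: it is not flex syntax, and it matches with Python's re module rather than a generated DFA. The token names, the rules, and the symbol table are all assumptions made for illustration.

```python
import re

symbol_table = {}

def on_identifier(text):
    # Action: insert the identifier in the symbol table.
    symbol_table.setdefault(text, len(symbol_table))
    return ("ID", text)

# (pattern, action) pairs; an action returning None discards the match.
RULES = [
    (re.compile(r"[A-Za-z_]\w*"), on_identifier),
    (re.compile(r"\d+"), lambda t: ("NUM", int(t))),
    (re.compile(r"\s+"), lambda t: None),
]

def scan(src):
    pos, tokens = 0, []
    while pos < len(src):
        best = None
        for pattern, action in RULES:
            m = pattern.match(src, pos)
            # Keep the longest match; the earliest rule wins ties.
            if m and m.end() > pos and (best is None
                                        or m.end() > best[0].end()):
                best = (m, action)
        if best is None:
            raise SyntaxError(
                f"invalid character {src[pos]!r} at offset {pos}")
        m, action = best
        token = action(m.group())
        if token is not None:
            tokens.append(token)
        pos = m.end()
    return tokens

print(scan("x1 42 y"))
```

Scanning "x1 42 y" yields an ID token, a NUM token, and another ID token, and leaves x1 and y in the symbol table; input such as "x $" raises SyntaxError at the offending character.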


              In practice

• Errors that are typically detected during
  scanning include
  – Unterminated strings
  – Unterminated comments
  – Invalid characters
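
A hand-written sketch shows how these three kinds of error can be detected while scanning. The lexical conventions here are assumptions chosen for illustration: strings are delimited by double quotes and may not span lines, comments are written /* ... */, and ALLOWED is an arbitrary character set.

```python
import string

# Assumed set of characters that are valid outside strings and comments.
ALLOWED = set(string.ascii_letters + string.digits
              + string.whitespace + "+-*/=();_")

def lexical_errors(src):
    """Return a list of error messages found in src."""
    errors, i, n = [], 0, len(src)
    while i < n:
        ch = src[i]
        if src.startswith("/*", i):
            end = src.find("*/", i + 2)
            if end == -1:
                errors.append(f"unterminated comment at offset {i}")
                break                     # rest of input is comment text
            i = end + 2
        elif ch == '"':
            end = i + 1
            while end < n and src[end] not in '"\n':
                end += 1
            if end == n or src[end] == "\n":
                errors.append(f"unterminated string at offset {i}")
            i = end + 1
        elif ch not in ALLOWED:
            errors.append(f"invalid character {ch!r} at offset {i}")
            i += 1
        else:
            i += 1
    return errors

print(lexical_errors('x = @; y = "abc'))
```

Reporting the offset of each error, rather than stopping at the first one, lets a single scanning pass surface several problems at once.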



