# 02

```
The scanning process
• Goal: automate the process
• Idea:
– Build a DFA
• How?
– We can build a non-deterministic finite automaton
(Thompson's construction)
– Convert that to a deterministic one
(Subset construction)
– Minimize the DFA
(Hopcroft's algorithm)
– Implement it
• Existing scanner generator: flex

The scanning process: step 1
• Let's build a mini-scanner that recognizes
exactly those strings of a's and b's that end
in ab
• Step 1: Come up with a regular expression

(a|b)*ab

The scanning process: step 2
• Step 2: Use Thompson's construction to create
an NFA for that expression
• We want to be able to automate the process
• Thompson's construction gives a systematic
way to create an NFA from a RE.
• It builds the NFA in a bottom-up manner.
• At any time during construction
– there is only one final state
– no transitions leave the final state
– components are linked together using ε-transitions.

The scanning process: step 2
• Step 2: Use Thompson's construction to
create an NFA for that expression

[Figure: Thompson NFA fragments for a, b, a|b, and (a|b)*]

The scanning process: step 2
• Step 2: Use Thompson's construction to
create an NFA for that expression

[Figure: Thompson NFA for (a|b)*ab]

The scanning process: step 3
• Step 3: Use subset construction to convert
the NFA to a DFA
• Observation:
– Two states qi and qk that are linked by an
ε-transition in the NFA should belong to the same
state in the DFA, because the machine goes
from qi to qk without consuming input.
• The ε-closure() function takes a state q and returns
all the states that can be reached from q using
ε-transitions only, including q itself.

The scanning process: step 3
• Step 3: Use subset construction to convert
the NFA to a DFA
• Observation:
– If, on some input a, the NFA can go to any one
of k states, then those k states should be
represented by a single state in the DFA.
• The δ() function takes as input a state q and a
character x and returns all states that we can go to
from q when reading a single x. Applied to a set of
states, it returns the union of the results for each state.

The scanning process: step 3
• Step 3: Use subset construction to convert
the NFA to a DFA
– The start state Q0 of the DFA is the ε-closure
of the start state q0 of the NFA.
– Compute ε-closure(δ(Q0, x)) for each valid input
character x. This may generate new states.
– Systematically compute ε-closure(δ(Qi, x)) until
no new states can be created.
– The final states of the DFA are those that
contain final states of the NFA.

The scanning process: step 3
• Step 3: Use subset construction to convert
the NFA to a DFA

[Figure: Thompson NFA for (a|b)*ab, with states numbered 1 to 12, start state 1, and final state 12]

ε-closure(1) = {1, 2, 3, 4, 8, 9}

The scanning process: step 3

[Figure: the NFA for (a|b)*ab, states 1 to 12]

Q0 = ε-closure(1) = {1,2,3,4,8,9}
ε-closure(δ(Q0, a)) = {5,7,8,9,2,3,4,10,11} = Q1
ε-closure(δ(Q0, b)) = {6,7,8,9,2,3,4} = Q2
ε-closure(δ(Q1, a)) = Q1
ε-closure(δ(Q1, b)) = {6,7,8,9,2,3,4,12} = Q3
ε-closure(δ(Q2, a)) = Q1
ε-closure(δ(Q2, b)) = Q2
ε-closure(δ(Q3, a)) = Q1
ε-closure(δ(Q3, b)) = Q2

The scanning process: step 3

[Figure: the NFA for (a|b)*ab, states 1 to 12]

Resulting DFA (start state 0, final state 3):
state   a   b
0       1   2
1       1   3
2       1   2
3       1   2

The scanning process: step 4
• Step 4: Use Hopcroft's algorithm to
minimize the DFA
δ(Q0, a) = Q1   δ(Q0, b) = Q2
δ(Q2, a) = Q1   δ(Q2, b) = Q2
States Q0 and Q2 behave the same way, so they can
be merged. Note that even though Q3 also behaves
the same way, it cannot be merged with Q0 or Q2
because Q3 is a final state while Q0 and Q2 are not.

Minimized DFA (start state 0, final state 3):
state   a   b
0       1   0
1       1   3
3       1   0

In practice

• flex is a scanner generator that takes a
regular-expression specification and follows the
described process to generate a DFA.
• The specification can also include
– actions to be performed whenever a valid string
has been recognized
• e.g. insert identifier in symbol table
– error messages to be generated when the input
string is invalid.

In practice

• Errors that are typically detected during
scanning include
– Unterminated strings
– Invalid characters

```
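The subset construction from step 3 can be sketched in Python as follows. This is a minimal illustration and not part of the original slides. The NFA transition table is an assumption: it is reconstructed from the state numbers and closures quoted on the slides, following the usual shape of Thompson's construction, because the figure itself did not survive. The names `EPS`, `NFA`, `eps_closure`, `move`, `subset_construction`, and `accepts` are hypothetical.

```python
from collections import deque

EPS = ""  # label used in this sketch for epsilon moves (an assumption)

# NFA for (a|b)*ab with states 1..12, reconstructed so that it reproduces
# the closures listed on the slides; treat it as an assumption.
NFA = {
    (1, EPS): {2, 8},
    (2, EPS): {3, 4},
    (3, "a"): {5},
    (4, "b"): {6},
    (5, EPS): {7},
    (6, EPS): {7},
    (7, EPS): {2, 8},
    (8, EPS): {9},
    (9, "a"): {10},
    (10, EPS): {11},
    (11, "b"): {12},
}
NFA_START, NFA_FINALS = 1, {12}
ALPHABET = ("a", "b")


def eps_closure(states):
    """Return every state reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in NFA.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def move(states, symbol):
    """Union of the NFA moves on `symbol` from each state in `states`."""
    result = set()
    for q in states:
        result |= NFA.get((q, symbol), set())
    return result


def subset_construction():
    """Build the DFA; DFA states are numbered in order of discovery."""
    start = eps_closure({NFA_START})
    index = {start: 0}          # frozenset of NFA states -> DFA state number
    table = {}                  # (DFA state, symbol) -> DFA state
    work = deque([start])
    while work:
        current = work.popleft()
        for symbol in ALPHABET:
            target = eps_closure(move(current, symbol))
            if not target:
                continue        # no move on this symbol; dead state left implicit
            if target not in index:
                index[target] = len(index)
                work.append(target)
            table[(index[current], symbol)] = index[target]
    finals = {i for subset, i in index.items() if subset & NFA_FINALS}
    return index, table, finals


def accepts(table, finals, text):
    """Run the DFA over `text`, starting in state 0."""
    state = 0
    for ch in text:
        if (state, ch) not in table:
            return False
        state = table[(state, ch)]
    return state in finals
```

Calling `subset_construction()` on this table discovers the four subsets Q0 to Q3 in the same order as the slides, and `accepts(table, finals, "abab")` then checks a string against the resulting DFA.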
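The merging argument in step 4 can be illustrated with a simple partition-refinement loop. Note that this sketch uses Moore-style refinement rather than Hopcroft's worklist algorithm named on the slides; it is easier to read and arrives at the same groups of equivalent states, only less efficiently. The DFA table is the one derived in step 3, and the names `DFA`, `STATES`, `FINALS`, and `minimize` are hypothetical.

```python
# DFA from step 3: states 0..3 stand for Q0..Q3, and 3 is the only final state.
DFA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 3,
    (2, "a"): 1, (2, "b"): 2,
    (3, "a"): 1, (3, "b"): 2,
}
STATES = (0, 1, 2, 3)
FINALS = {3}
ALPHABET = ("a", "b")


def minimize(dfa, states, finals, alphabet):
    """Group equivalent states by repeatedly refining a partition.

    Assumes the DFA is total: every (state, symbol) pair has a transition.
    """
    # Initial partition: final states versus non-final states.
    block_of = {q: int(q in finals) for q in states}
    while True:
        # A state's signature is its current block plus the blocks it moves to.
        signature = {
            q: (block_of[q],) + tuple(block_of[dfa[(q, x)]] for x in alphabet)
            for q in states
        }
        # Number the distinct signatures in order of first appearance.
        numbering = {}
        for q in states:
            numbering.setdefault(signature[q], len(numbering))
        refined = {q: numbering[signature[q]] for q in states}
        if len(set(refined.values())) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = refined
    blocks = {}
    for q in states:
        blocks.setdefault(block_of[q], set()).add(q)
    return sorted(blocks.values(), key=min)
```

Running `minimize(DFA, STATES, FINALS, ALPHABET)` groups states 0 and 2 together and keeps 1 and 3 apart, which matches the slide: Q3 is separated from Q0 and Q2 in the very first partition because it is final.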