# regular_expressions by wuzhengqin

VIEWS: 0 PAGES: 4

• pg 1
```									C311 Regular Grammars, Scanners, DFA                              http://www.cs.iusb.edu/~danav/teach/c311/c311_10_reggr.html

Dana Vrajitoru
C311 Programming Languages

Regular Grammars, Scanners, DFA
DFA versus NFA

DFA - deterministic ﬁnite state automaton.
A state transition machine where from each state we can move to one of the others based on the
current terminal symbol.
There at most one outgoing branch for each possible symbol.
NFA - non-deterministic ﬁnite state automaton.
The difference with a DFA is that there can be more than one outgoing branch with the same
symbol.
Regular grammars can be translated as NFAs.

NFA to DFA

To write a scanner more easily, we can transform an NFA into a DFA.
For this we construct a DFA where the states are combinations of states from the NFA that
represent alternatives that can be reached from any given state with the same symbol.
For example in the ﬁrst grammar we would add a state called TV:

Example: NFA (top) and equivalent DFA (bottom)

1 of 4                                                                                                       1/13/12 4:25 PM
C311 Regular Grammars, Scanners, DFA                                http://www.cs.iusb.edu/~danav/teach/c311/c311_10_reggr.html

Regular Expressions

Recognized by regular grammars.
They can be parsed by a linear algorithm that doesn't need to store the global structure of the
parsing tree.
Only one non-terminal symbol is active at any moment.
This type of parsing algorithm is called a scanner.
The ﬁrst compilation step, the lexical analysis, is done by a scanner.

Examples

An identiﬁer in Lisp:
I    =>   letter J
J    =>   letter J
J    =>   digit J
J    =>   _ J
J    =>   - J
J    =>   e

Condense the rules:
J =>{_|-} J
where e is the empty string.

An integer number:
N     =>   UN
N     =>   - UN
N     =>   + UN
UN    =>   digit UN
UN    =>   digit

2 of 4                                                                                                         1/13/12 4:25 PM
C311 Regular Grammars, Scanners, DFA                                  http://www.cs.iusb.edu/~danav/teach/c311/c311_10_reggr.html

Condensing the rules:
N => {e|+|-} UN

Finite States Machines

A string (simple character means one that is not a quote, simple quote, backslash):
S    =>   " T
T    =>   simple-character T
T    =>   \ V
V    =>   {"|'|\|n|t...} T
T    =>   "

Real Number
N => {-|+|e} UN
UN => UI . UI
UI => digit UI
UI => digit

The second rule above is not a regular rule. It needs a parser and not a scanner to be processed.
We can transform it the following way:
UN    =>   digit UN
UN    =>   digit
UN    =>   . UF
UF    =>   digit UF
UF    =>   digit

Scanner / Parser

The lexical analysis of a program is done by a scanner.
A scanner is based on a regular grammar where the end of any chain of rules (a non-terminal
converted to a single terminal) produces a token as a result.
The parsing process calls the scanner during its execution to retrieve every new token, then
processes it before invoking the scanner again.
They are not two separate phases: the scanner is called repeatedly from the parser.

Scanner in Context

A scanner that works in the context of a program in which it recognizes individual units.
It can be written to recognize the termination of a unit by a character that is not part of it.
We can replace:
UN => digit UN
UN => digit

with:
UN => digit UN
UN => e

Table Driven Scanner

3 of 4                                                                                                           1/13/12 4:25 PM
C311 Regular Grammars, Scanners, DFA                            http://www.cs.iusb.edu/~danav/teach/c311/c311_10_reggr.html

If the grammar is equivalent to a DFA, then we can organize the rules in a table:

I   =>   letter J
I   =>   _ J
J   =>   letter J
J   =>   digit J
J   =>   _ J
J   =>   - J
J   =>   e

Table:
letter digit _ -   else
I J       err   J err err
J J       J     J J   end

4 of 4                                                                                                     1/13/12 4:25 PM

```
To top