# Regular expressions


```
Regular Expressions

Regular Expression

• A regular expression (RE) is defined inductively:
    a     an ordinary character from Σ
    ε     the empty string
    R|S = either R or S
    RS  = R followed by S (concatenation)
    R*  = concatenation of R zero or more times
          (R* = ε | R | RR | RRR | ...)

RE Extensions

R?    = ε | R           (zero or one R)
R+    = RR*             (one or more R)
[abc] = a|b|c           (any of the listed characters)
[a-z] = a|b|...|z       (range)
[^ab] = c|d|...         (any character except ‘a’ and ‘b’)

Regular Expression

RE        Strings in L(R)
a         “a”
ab        “ab”
a|b       “a”, “b”
(ab)*     “”, “ab”, “abab”, ...
(a|ε)b    “ab”, “b”

Example: integers

• integer: a non-empty string of digits
• digit   = ‘0’|‘1’|‘2’|‘3’|‘4’|‘5’|‘6’|‘7’|‘8’|‘9’
• integer = digit digit*

Example: identifiers

• identifier: a string of letters or digits starting with a letter
• C identifier: [a-zA-Z_][a-zA-Z0-9_]*
  (the underscore is treated as a letter)

Regular Definitions
• Writing a regular expression for some languages can be difficult,
  because the expression can become quite complex. In those cases,
  we may use regular definitions.
• We can give names to regular expressions and use these names as
  symbols to define other regular expressions.

• A regular definition is a sequence of definitions of the form:
    d1 → r1
    d2 → r2
    ...
    dn → rn
  where each di is a distinct name and each ri is a regular expression
  over the symbols in Σ ∪ {d1, d2, ..., di-1}.

Specification of Patterns for Tokens: Regular Definitions

• Example:
    letter → A | B | … | Z | a | b | … | z
    digit  → 0 | 1 | … | 9
    id     → letter ( letter | digit )*
    digits → digit digit*

Regular Definitions (cont.)
• Ex: Identifiers in Pascal
    letter → A | B | ... | Z | a | b | ... | z
    digit  → 0 | 1 | ... | 9
    id     → letter ( letter | digit )*
– Without regular definitions, the regular expression for identifiers
  becomes much more complex:
    (A|...|Z|a|...|z) ( (A|...|Z|a|...|z) | (0|...|9) )*

• Ex: Unsigned numbers in Pascal
    digit        → 0 | 1 | ... | 9
    digits       → digit+
    opt-fraction → ( . digits )?
    opt-exponent → ( E (+|-)? digits )?
    unsigned-num → digits opt-fraction opt-exponent

Specification of Patterns for Tokens: Notational Shorthand

• The following shorthands are often used:
– +  one or more instances of
– ?  zero or one instance of

    r+    = rr*
    r?    = r | ε
    [a-z] = a | b | c | … | z

• Examples:
    digit → [0-9]
    num   → digit+ (. digit+)? ( E (+|-)? digit+ )?

Definition

• For primitive regular expressions:

    L(Ø) = Ø
    L(ε) = {ε}
    L(a) = {a}

Definition (continued)

• For regular expressions r1 and r2:

    L(r1 + r2) = L(r1) ∪ L(r2)
    L(r1 · r2) = L(r1) L(r2)
    L(r1*)     = (L(r1))*
    L((r1))    = L(r1)

Concatenation of Languages

• If L1 and L2 are languages, we define their concatenation as
    L1L2 = { w | w = xy, x ∈ L1, y ∈ L2 }
• Examples:
– {ab, ba}{cd, dc} = {abcd, abdc, bacd, badc}
– Ø{ab} = Ø

Kleene Closure

• L* = L^0 ∪ L^1 ∪ L^2 ∪ …  (the union of L^i over all i ≥ 0), where L^0 = {ε}
• Examples:
– {ab, ba}* = {ε, ab, ba, abab, abba, baab, baba, …}
– Ø* = {ε}
– {ε}* = {ε}

Example

• Regular expression  r = (0 + 1)* 00 (0 + 1)*

  L(r) = { all strings with at least two consecutive 0s }

Example

• Regular expression  r = (1 + 01)* (0 + ε)

  L(r) = { all strings without two consecutive 0s }

Equivalent Regular Expressions

• Definition: regular expressions r1 and r2 are equivalent if
  L(r1) = L(r2)

Example

• L = { all strings without two consecutive 0s }

    r1 = (1 + 01)* (0 + ε)
    r2 = (1* 011*)* (0 + ε) + 1* (0 + ε)

  L(r1) = L(r2) = L, so r1 and r2 are equivalent regular expressions.

Assignment
• Σ = {0, 1}
• What is the language of
–   0*1*
• What is the regular expression for
–   {w | w has at least one 1}
–   {w | w starts and ends with the same symbol}
–   {w | |w|  5}
–   {w | every 3rd position of w is 1}
–   L+ = L^1 ∪ L^2 ∪ …
–   L? (an optional L)

Regular Expressions and Regular Languages

Theorem

Languages generated by regular expressions  =  Regular languages

Standard Representations of Regular Languages

[Figure: Regular Languages, with their representations as FAs, NFAs, and Regular Expressions]

Elementary Questions about Regular Languages

Membership Question
Question:  Given a regular language L and a string w,
           how can we check if w ∈ L?

Answer:    Take the DFA that accepts L and check if w is accepted.
           If the DFA accepts w, then w ∈ L; if it rejects w, then w ∉ L.

Question:  Given a regular language L, how can we check
           if L is empty (L = Ø)?

Answer:    Take the DFA that accepts L and check if there is any path
           from the initial state to a final state.
           If such a path exists, L ≠ Ø; otherwise L = Ø.

Question:  Given a regular language L, how can we check
           if L is finite?

Answer:    Take the DFA that accepts L and check if there is a walk
           containing a cycle from the initial state to a final state.
           If such a walk exists, L is infinite; otherwise L is finite.

From RE to ε-NFA

• For every regular expression R, we can construct an ε-NFA A
  such that L(A) = L(R).
• Proof by structural induction:
  [Figure: base-case ε-NFAs for Ø, ε, and a]
  [Figure: inductive ε-NFA constructions for R+S, RS, and R*, which join
   the sub-automata for R and S with ε-transitions]

Example: (0+1)*1(0+1)

[Figure: ε-NFA for (0+1)*1(0+1)]

Example: (a+b)*aba

```
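
The example table in the slides (RE versus strings in L(R)) can be checked mechanically. The following is a minimal sketch, not part of the original slides, using Python's standard `re` module. It assumes Python's regex syntax, in which ε has no symbol of its own, so the table's (a|ε)b is written `(a|)b`; `TABLE` and `in_language` are illustrative names. `re.fullmatch` is used because a string belongs to L(R) only if the whole string matches, not just a prefix.

```python
import re

# Example expressions from the table, in Python `re` syntax.
# The empty alternative in "(a|)b" stands for ε in (a|ε)b.
TABLE = {
    "a": ["a"],
    "ab": ["ab"],
    "a|b": ["a", "b"],
    "(ab)*": ["", "ab", "abab"],
    "(a|)b": ["ab", "b"],
}


def in_language(pattern: str, s: str) -> bool:
    """Return True if the entire string s belongs to L(pattern)."""
    # fullmatch rejects strings that only have a matching prefix.
    return re.fullmatch(pattern, s) is not None


if __name__ == "__main__":
    for pattern, strings in TABLE.items():
        for s in strings:
            print(f"{pattern:6} accepts {s!r}: {in_language(pattern, s)}")
    print(in_language("(ab)*", "aba"))  # False: "aba" is not in L((ab)*)
```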
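
The RE Extensions slides define R?, R+, and character classes as shorthand for the core operators. One way to sanity-check such rewrites is to compare a shorthand and its expansion on every string up to a fixed length. The sketch below does this with Python's `re` module; it is a bounded test, not a proof, and it assumes the alphabet {a, b, c}, which is what makes [^ab] equal to c. The names `PAIRS`, `strings_up_to`, and `same_language` are hypothetical.

```python
import itertools
import re

# (shorthand, expansion) pairs from the RE Extensions slides.
# ε is written as an empty alternative, e.g. "(a|)" for a|ε.
PAIRS = [
    ("a?", "(a|)"),
    ("a+", "aa*"),
    ("[abc]", "a|b|c"),
    ("[a-c]", "a|b|c"),
    ("[^ab]", "c"),  # holds only because the test alphabet is {a, b, c}
]


def strings_up_to(alphabet: str, max_len: int):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=n):
            yield "".join(chars)


def same_language(p: str, q: str, alphabet: str = "abc", max_len: int = 4) -> bool:
    """Check that p and q accept exactly the same strings up to max_len."""
    return all(
        (re.fullmatch(p, s) is None) == (re.fullmatch(q, s) is None)
        for s in strings_up_to(alphabet, max_len)
    )


if __name__ == "__main__":
    for short, expansion in PAIRS:
        print(f"{short:6} vs {expansion:6}: {same_language(short, expansion)}")
```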
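
The integer and C identifier patterns from the slides carry over almost verbatim to Python's `re` module, as in this sketch (the function names are illustrative). It spells digits as `[0-9]` rather than `\d`, because in Python `\d` on `str` patterns also matches non-ASCII Unicode digits.

```python
import re

DIGIT = "[0-9]"                          # digit = '0'|'1'|...|'9'
INTEGER = f"{DIGIT}{DIGIT}*"             # integer = digit digit*
C_IDENTIFIER = "[a-zA-Z_][a-zA-Z0-9_]*"  # letter or '_', then letters, digits, '_'


def is_integer(s: str) -> bool:
    return re.fullmatch(INTEGER, s) is not None


def is_c_identifier(s: str) -> bool:
    return re.fullmatch(C_IDENTIFIER, s) is not None


if __name__ == "__main__":
    for s in ["0", "2013", "", "12a"]:
        print(f"integer {s!r}: {is_integer(s)}")
    for s in ["x", "_tmp", "count2", "2count"]:
        print(f"C identifier {s!r}: {is_c_identifier(s)}")
```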
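
Regular definitions behave like named macros: each name may refer only to names defined before it, so the full expression can be produced by plain substitution. The sketch below is an illustration under assumptions, not the slides' own notation: it builds the Pascal identifier and unsigned-number definitions as Python strings with f-string substitution and tests them with `re.fullmatch`. It uses non-capturing groups `(?:...)` for the parentheses, escapes the period (an unescaped `.` in `re` matches any character), and follows the slides in allowing only an uppercase `E` for the exponent.

```python
import re

# Regular definitions, each built only from names defined above it.
digit = "[0-9]"
digits = f"(?:{digit})+"                     # digits -> digit+
opt_fraction = rf"(?:\.{digits})?"           # opt-fraction -> ( . digits )?
opt_exponent = rf"(?:E[+-]?{digits})?"       # opt-exponent -> ( E (+|-)? digits )?
unsigned_num = f"{digits}{opt_fraction}{opt_exponent}"

letter = "[A-Za-z]"
pascal_id = f"{letter}(?:{letter}|{digit})*"  # id -> letter ( letter | digit )*


def matches(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None


if __name__ == "__main__":
    print(unsigned_num)  # the fully expanded regular expression
    for s in ["42", "3.14", "6E10", "2.5E-3", ".5", "1.", "1E"]:
        print(f"{s!r:9} unsigned-num: {matches(unsigned_num, s)}")
```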
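
For finite languages, concatenation can be computed directly from the set definition L1L2 = { xy | x ∈ L1, y ∈ L2 }. L* is usually infinite, so the sketch below (with hypothetical helper names) only enumerates the strings of L* up to a chosen length, adding one more word from L at a time and starting from L^0 = {ε}.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """L1L2 = { xy | x in L1, y in L2 }; empty if either language is empty."""
    return {x + y for x in l1 for y in l2}


def star_up_to(lang: set[str], max_len: int) -> set[str]:
    """The strings of L* whose length is at most max_len."""
    result = {""}      # L^0 = {ε}
    frontier = {""}
    while frontier:
        # Append one more word of L to each newly found string, dropping long ones.
        longer = {w for w in concat(frontier, lang) if len(w) <= max_len}
        frontier = longer - result
        result |= frontier
    return result


if __name__ == "__main__":
    print(sorted(concat({"ab", "ba"}, {"cd", "dc"})))  # ['abcd', 'abdc', 'bacd', 'badc']
    print(concat(set(), {"ab"}))                        # set()
    print(sorted(star_up_to({"ab", "ba"}, 4)))
```

The loop terminates because only finitely many strings have length at most `max_len`, and it also reproduces the slides' edge cases: Ø* and {ε}* both come out as {ε}.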
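
The claims that (0 + 1)*00(0 + 1)* describes the strings with at least two consecutive 0s, and that r1 and r2 both describe the strings without them, can be spot-checked by brute force. This Python sketch translates the expressions into `re` syntax (+ becomes `|`, ε becomes an empty alternative) and compares them against a direct test on every binary string up to length 10. Agreement up to a finite length is evidence, not a proof of equivalence; the names are illustrative.

```python
import itertools
import re

# The slides' expressions in `re` syntax: + becomes |, ε becomes an empty alternative.
R_TWO_ZEROS = "(?:0|1)*00(?:0|1)*"      # (0 + 1)* 00 (0 + 1)*
R1 = "(?:1|01)*(?:0|)"                  # (1 + 01)* (0 + ε)
R2 = "(?:1*011*)*(?:0|)|1*(?:0|)"       # (1*011*)*(0 + ε) + 1*(0 + ε)


def binary_strings(max_len: int):
    """Yield every string over {0, 1} of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in itertools.product("01", repeat=n):
            yield "".join(bits)


def accepts(pattern: str, s: str) -> bool:
    return re.fullmatch(pattern, s) is not None


def agrees_with(pattern: str, predicate, max_len: int = 10) -> bool:
    """True if pattern accepts exactly the strings satisfying predicate, up to max_len."""
    return all(accepts(pattern, s) == predicate(s) for s in binary_strings(max_len))


if __name__ == "__main__":
    print(agrees_with(R_TWO_ZEROS, lambda s: "00" in s))
    print(agrees_with(R1, lambda s: "00" not in s))
    print(agrees_with(R2, lambda s: "00" not in s))
```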
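
The membership question reduces to running the DFA on w. Here is a minimal Python sketch; the `DFA` class is a hypothetical representation that stores the transition function as a dictionary from (state, symbol) to the next state, and it assumes that a missing transition means rejection. The example automaton `NO_00` is one possible DFA for the language without two consecutive 0s.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    start: str
    accepting: set[str]
    delta: dict[tuple[str, str], str]  # (state, symbol) -> next state

    def accepts(self, w: str) -> bool:
        """Run the DFA on w and report whether it stops in an accepting state."""
        state = self.start
        for symbol in w:
            nxt = self.delta.get((state, symbol))
            if nxt is None:  # assumption: a missing transition rejects
                return False
            state = nxt
        return state in self.accepting


# Hypothetical DFA for { w over {0,1} | w has no two consecutive 0s }.
# "A": last symbol was 1 (or w is empty), "B": last symbol was 0, "D": dead state.
NO_00 = DFA(
    start="A",
    accepting={"A", "B"},
    delta={
        ("A", "0"): "B", ("A", "1"): "A",
        ("B", "0"): "D", ("B", "1"): "A",
        ("D", "0"): "D", ("D", "1"): "D",
    },
)

if __name__ == "__main__":
    for w in ["", "0101", "1001"]:
        print(f"{w!r}: {NO_00.accepts(w)}")
```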
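
The emptiness test is a graph search: ignore the transition labels and ask whether any final state is reachable from the initial state. A breadth-first search sketch in Python follows, using the same kind of hypothetical dictionary encoding of the transition function; the two example automata are made up for illustration.

```python
from collections import deque


def is_empty(start: str, accepting: set[str], delta: dict[tuple[str, str], str]) -> bool:
    """Return True if no accepting state is reachable from start, i.e. L = Ø."""
    # Transition labels do not matter here; only which states connect to which.
    successors: dict[str, set[str]] = {}
    for (src, _symbol), dst in delta.items():
        successors.setdefault(src, set()).add(dst)

    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from the initial state
        q = queue.popleft()
        if q in accepting:
            return False  # found a path to a final state
        for r in successors.get(q, ()):
            if r not in seen:
                seen.add(r)
                queue.append(r)
    return True


# Hypothetical DFA for { w | w has at least one 1 }: "p" = no 1 yet, "q" = seen a 1.
AT_LEAST_ONE_1 = {("p", "0"): "p", ("p", "1"): "q", ("q", "0"): "q", ("q", "1"): "q"}

# Hypothetical DFA whose final state "f" cannot be reached from "s".
UNREACHABLE = {("s", "0"): "s", ("s", "1"): "s", ("f", "0"): "f", ("f", "1"): "f"}

if __name__ == "__main__":
    print(is_empty("p", {"q"}, AT_LEAST_ONE_1))  # False
    print(is_empty("s", {"f"}, UNREACHABLE))     # True
```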
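
For the finiteness test, the phrase "a walk from the initial state to a final state" matters: a cycle on a dead state does not make L infinite. One way to implement the check, sketched below in Python with hypothetical names, is to keep only the states that are reachable from the initial state and can also reach a final state, then look for a cycle among those states with a depth-first search. If such a cycle exists, it can be repeated any number of times, giving infinitely many accepted strings.

```python
def is_infinite(start: str, accepting: set[str], delta: dict[tuple[str, str], str]) -> bool:
    """Return True if the DFA accepts infinitely many strings."""
    succ: dict[str, set[str]] = {}
    pred: dict[str, set[str]] = {}
    for (src, _symbol), dst in delta.items():
        succ.setdefault(src, set()).add(dst)
        pred.setdefault(dst, set()).add(src)

    def closure(seeds, edges):
        """All states reachable from seeds by following edges."""
        seen, stack = set(seeds), list(seeds)
        while stack:
            q = stack.pop()
            for r in edges.get(q, ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return seen

    # States that lie on some walk from the initial state to a final state.
    useful = closure([start], succ) & closure(accepting, pred)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    colour = {q: WHITE for q in useful}

    def has_cycle(q):
        colour[q] = GREY
        for r in succ.get(q, ()):
            if r not in useful:
                continue
            if colour[r] == GREY:  # back edge: a cycle through useful states
                return True
            if colour[r] == WHITE and has_cycle(r):
                return True
        colour[q] = BLACK
        return False

    return any(colour[q] == WHITE and has_cycle(q) for q in useful)


# Hypothetical DFA for "no two consecutive 0s" (infinite language).
NO_00 = {
    ("A", "0"): "B", ("A", "1"): "A",
    ("B", "0"): "D", ("B", "1"): "A",
    ("D", "0"): "D", ("D", "1"): "D",
}
# Hypothetical DFA for {ab}; the dead state "d" loops but is not useful.
ONLY_AB = {
    ("0", "a"): "1", ("0", "b"): "d",
    ("1", "a"): "d", ("1", "b"): "2",
    ("2", "a"): "d", ("2", "b"): "d",
    ("d", "a"): "d", ("d", "b"): "d",
}

if __name__ == "__main__":
    print(is_infinite("A", {"A", "B"}, NO_00))  # True
    print(is_infinite("0", {"2"}, ONLY_AB))     # False
```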
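
The structural-induction proof in "From RE to ε-NFA" doubles as an algorithm, commonly called Thompson's construction. Below is a minimal Python sketch under these assumptions: regular expressions are given as already-parsed nested tuples (there is no parser), the helpers `sym`, `alt`, `cat`, `star`, `build`, and `accepts` are hypothetical names, and every fragment gets its own fresh start and final state, which adds some redundant ε-edges compared with the slide drawings but accepts the same language. Acceptance is checked by simulating the ε-NFA with ε-closures rather than converting it to a DFA.

```python
from dataclasses import dataclass, field
from functools import reduce

# Hypothetical tuple encoding: ("empty",) for Ø, ("eps",) for ε, ("sym", c),
# ("alt", r, s) for r+s, ("cat", r, s) for rs, and ("star", r) for r*.
def sym(c):
    return ("sym", c)

def alt(r, s):
    return ("alt", r, s)

def cat(*rs):
    return reduce(lambda a, b: ("cat", a, b), rs)

def star(r):
    return ("star", r)

@dataclass
class NFA:
    edges: list = field(default_factory=list)  # (src, label, dst); label None means ε
    n_states: int = 0

    def new_state(self) -> int:
        self.n_states += 1
        return self.n_states - 1

def build(nfa: NFA, r) -> tuple:
    """Add an ε-NFA fragment for r to nfa and return its (start, final) states."""
    s, f = nfa.new_state(), nfa.new_state()
    kind = r[0]
    if kind == "empty":        # Ø: no path from s to f
        pass
    elif kind == "eps":        # ε: one ε-edge
        nfa.edges.append((s, None, f))
    elif kind == "sym":        # a: one edge labelled a
        nfa.edges.append((s, r[1], f))
    elif kind == "alt":        # R+S: branch into R and S, then rejoin
        for sub in (r[1], r[2]):
            ss, sf = build(nfa, sub)
            nfa.edges += [(s, None, ss), (sf, None, f)]
    elif kind == "cat":        # RS: R's final state leads into S
        rs, rf = build(nfa, r[1])
        ss, sf = build(nfa, r[2])
        nfa.edges += [(s, None, rs), (rf, None, ss), (sf, None, f)]
    elif kind == "star":       # R*: skip R, or run it and loop back
        rs, rf = build(nfa, r[1])
        nfa.edges += [(s, None, rs), (rf, None, rs), (rf, None, f), (s, None, f)]
    else:
        raise ValueError(f"unknown regex node: {r!r}")
    return s, f

def eps_closure(nfa: NFA, states: set) -> set:
    """All states reachable from `states` using only ε-edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa.edges:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(r, w: str) -> bool:
    """Build the ε-NFA for r, then simulate it on w by tracking sets of states."""
    nfa = NFA()
    start, final = build(nfa, r)
    current = eps_closure(nfa, {start})
    for ch in w:
        moved = {dst for src, label, dst in nfa.edges if src in current and label == ch}
        current = eps_closure(nfa, moved)
    return final in current

# The two examples from the slides.
BIT = alt(sym("0"), sym("1"))
EX1 = cat(star(BIT), sym("1"), BIT)                                     # (0+1)*1(0+1)
EX2 = cat(star(alt(sym("a"), sym("b"))), sym("a"), sym("b"), sym("a"))  # (a+b)*aba

if __name__ == "__main__":
    print([w for w in ["10", "0011", "01", "1"] if accepts(EX1, w)])
    print([w for w in ["aba", "bbaba", "abab"] if accepts(EX2, w)])
```

EX1 accepts exactly the strings whose second-to-last symbol is 1, and EX2 accepts the strings ending in aba, which gives a quick way to test the construction by hand.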