# Regular Expressions and Non-regular Languages

```
Regular Expressions and Non-regular Languages
http://cis.k.hosei.ac.jp/~yukita/

Expressions and their values

                        Expression                   Value
Arithmetic expression   (5 + 3) × 4                  32
Regular expression      (0 ∪ 1) ∘ 0*, or (0 ∪ 1)0*   The language consisting of the strings that begin
                                                     with 0 or 1, followed by an arbitrary number of 0s.

Subexpressions 0, 1, 0 ∪ 1, and 0* have values {0}, {1}, {0,1}, and {0}*,
respectively. The value of the whole expression is {0,1} ∘ {0}*.

Definition 1.26

R is a regular expression if
1. R = a for some a ∈ Σ,
2. R = ε,
3. R = ∅,
4. R = (R1 ∪ R2), where R1 and R2 are regular expressions,
5. R = (R1 ∘ R2), where R1 and R2 are regular expressions, or
6. R = (R1*), where R1 is a regular expression.

The values of atomic expressions

Expression   Value   Meaning
a            {a}     The language that consists of only the one string a ∈ Σ*.
ε            {ε}     The language that consists of only the empty string ε ∈ Σ*.
∅            {}      The empty language.

In many real programming languages, we must distinguish the alphabet
symbol a ∈ Σ from the string a ∈ Σ*; the former is written 'a' while the
latter is written "a". We will not follow this convention.

Example 1.27
Let Σ = {0,1} throughout the following examples.
1. 0*10* = {w | w has exactly a single 1}.
2. Σ*1Σ* = {w | w has at least one 1}.
3. Σ*001Σ* = {w | w contains the string 001 as a substring}.
4. (ΣΣ)* = {w | w is a string of even length}.
7. 0Σ*0 ∪ 1Σ*1 ∪ 0 ∪ 1 = {w | w starts and ends with the same symbol}.
8. (0 ∪ ε)1* = 01* ∪ 1*.
10. 1*∅ = ∅.
11. ∅* = {ε}.

Units for the binary operations

R ∪ ∅ = R shows that ∅ is the unit for the union operation.
R ∘ ε = R shows that ε is the unit for the concatenation operation.
We denote by L(R) the language of R.
R ∪ ε may not equal R.
For example, if R = 0, then L(R) = {0} but L(R ∪ ε) = {0, ε}.
Likewise, R ∘ ∅ may not equal R.
For example, if R = 0, then L(R) = {0} but L(R ∘ ∅) = ∅.

Theorem 1.28
• A language is regular if and only if some regular expression describes it.
We break this theorem down into two lemmas.
• Lemma 1.29
  – If a language is described by a regular expression, then it is regular.
• Lemma 1.32
  – If a language is regular, then it is described by a regular expression.

Proof of Lemma 1.29
We shall convert R into an NFA N.
Case 1: R = a.   [Figure: NFA with a single arrow labeled a]
Case 2: R = ε.
Case 3: R = ∅.
Case 4: R = R1 ∪ R2.
Case 5: R = R1 ∘ R2.
Case 6: R = R1*.
Cases 4 to 6 are treated on the next three slides.

Case 4: Let N1, N2, and N correspond to R1, R2, and R, respectively.
[Figure: N consists of N1 and N2 together with a new start state that has
ε-arrows to the start states of N1 and N2.]

Case 5: Let N1, N2, and N correspond to R1, R2, and R, respectively.
[Figure: N consists of N1 followed by N2, with ε-arrows from the accept
states of N1 to the start state of N2.]

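Case 6: Let N1 and N correspond to R1 and R, respectively.
[Figure: N consists of N1 and a new start state that is also an accept state,
with an ε-arrow from the new start state to the start state of N1 and
ε-arrows from the accept states of N1 back to the start state of N1.]
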
Generalized Nondeterministic Finite Automaton
• A GNFA is, roughly, an NFA in which the transition arrows may have
  regular expressions as labels.
• We assume the following standard form for convenience, which can
  always be attained with an easy modification.
  – There is only one accept state, and it is different from the start state.
  – The start state has transition arrows going to every other state
    but no arrows coming in from any other state.
  – The accept state has arrows coming in from every other state
    but no arrows going to any other state.
  – Except for the start and accept states, one arrow goes from every
    state to every other state and also from each state to itself.

Standard Form of GNFA

[Figure: a GNFA in standard form]

Q = {qstart} ∪ Qmid ∪ {qaccept} is a disjoint union, where Qmid is the set
of states other than qstart and qaccept.
For each pair (qi, qj) in
({qstart} × Qmid) ∪ (Qmid × Qmid) ∪ (Qmid × {qaccept}) ∪ ({qstart} × {qaccept})
there is one and only one directed arrow going from qi to qj.

Equivalent GNFA with one fewer state

Before removing qrip:   qi --R4--> qj,   qi --R1--> qrip,   qrip --R2--> qrip,   qrip --R3--> qj
After removing qrip:    qi --(R1)(R2)*(R3) ∪ (R4)--> qj

Definition 1.33

A generalized nondeterministic finite automaton is a 5-tuple
(Q, Σ, δ, qstart, qaccept), where R is the set of regular expressions over Σ and
1. Q is the finite set of states,
2. Σ is the input alphabet,
3. δ : (Q − {qaccept}) × (Q − {qstart}) → R is the transition function,
4. qstart is the start state, and
5. qaccept is the accept state.

Computation with GNFA

A GNFA accepts a string w ∈ Σ* if w = w1 w2 ··· wk, where each wi ∈ Σ*,
and a sequence of states q0, q1, ..., qk exists such that
1. q0 = qstart,
2. qk = qaccept, and
3. for each i, we have wi ∈ L(Ri), where Ri = δ(q_{i−1}, q_i).

Converting GNFA
Convert(G):
1. Let k be the number of states of G.
2. If k = 2, then return the regular expression appearing on the only arrow.
3. If k > 2, we select any state qrip ∈ Q − {qstart, qaccept} and let
   G' be the GNFA (Q', Σ, δ', qstart, qaccept), where Q' = Q − {qrip},
   and for any qi ∈ Q' − {qaccept} and qj ∈ Q' − {qstart} let
       δ'(qi, qj) = (R1)(R2)*(R3) ∪ (R4),
   for R1 = δ(qi, qrip), R2 = δ(qrip, qrip), R3 = δ(qrip, qj), and R4 = δ(qi, qj).
4. Compute Convert(G') and return this value.

Claim 1.34 For any GNFA G, Convert(G) is equivalent to G.
Proof. Basis (k = 2): Obvious, since the only arrow goes from qstart to
qaccept and its label describes exactly the strings that G accepts.
Induction step: Assume that the claim is true for k − 1 states, and let G
have k > 2 states. We first show that G and G' recognize the same language.
Suppose that G accepts an input w. Then, in an accepting branch
of the computation, G enters a sequence of states
qstart, q1, q2, q3, ..., qaccept.
If qrip ∉ {qstart, q1, q2, q3, ..., qaccept}, clearly G' also accepts w.
If qrip ∈ {qstart, q1, q2, q3, ..., qaccept}, removing each run of
consecutive qrip states forms an accepting computation for G'.
The states qi and qj bracketing a run have a new regular expression
on the arrow between them that describes all strings taking qi to qj
via qrip on G. So G' accepts w.

Proof continued
For the other direction, suppose that G' accepts an input w.
As each arrow between any two states qi and qj in G' describes
the collection of strings taking qi to qj in G, either directly
or via qrip, G must also accept w. Thus G and G' are equivalent.
The induction hypothesis states that when the algorithm calls
itself recursively on input G', the result is a regular expression
that is equivalent to G', because G' has k − 1 states. Hence the
regular expression is also equivalent to G, and the algorithm
is proved correct.

Non-regularity

B, C, and D seem to require machines with an infinite number of states
to recognize them.
B = {0^n 1^n | n ≥ 0}
C = {w | w has an equal number of 0s and 1s}
D = {w | w has an equal number of occurrences of 01 and 10 as substrings}
B and C will turn out to be nonregular, while D is regular.
See Problem 1.41.

Theorem 1.37 Pumping Lemma

If A is a regular language, then there is a number p
(the pumping length) where, if s ∈ A with |s| ≥ p,
then s can be divided into three pieces,
s = xyz, satisfying the following conditions:
1. for each i ≥ 0, x y^i z ∈ A,
2. |y| > 0, and
3. |xy| ≤ p.

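Proof of Theorem 1.37
Let M be a DFA that recognizes A, and let s = s1 s2 ··· sn be a string in A
with n ≥ p. The states that M passes through while reading s are

  s = s1  s2  ···  sk  ···  sl  ···  sn
      q1  q2  ···  qk  ···  qk  ···  qa

where qa is an accepting state and the state qk occurs twice. Accordingly,

  s = (s1 s2 ···)(sk ··· sl)(··· sn) = xyz.

By the pigeonhole principle, a state must repeat as soon as the number of
states visited exceeds the number of states of M, so we can take the
pumping length to be the number of states plus one.
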
Example 1.38

Claim: B = {0^n 1^n | n ≥ 0} is not regular.
Proof. Assume to the contrary that B is regular.
Let p be the pumping length. Then s = 0^p 1^p can be decomposed
as s = xyz with |y| ≥ 1, and x y^n z ∈ B for any n ≥ 0.
There can be three cases, y = 0^k, y = 0^k 1^l, and y = 1^l, for some
nonzero k, l. In each case we can easily see that x y^2 z ∉ B (the numbers
of 0s and 1s differ, or a 1 precedes a 0), which contradicts the assumption.

Example 1.39

Claim: C = {w | w has an equal number of 0s and 1s}
is not regular.
Proof. Assume to the contrary that C is regular.
Let p be the pumping length. Then s = 0^p 1^p can be decomposed
as s = xyz with |y| ≥ 1 and |xy| ≤ p, and x y^n z ∈ C for any n ≥ 0.
Then we must have y = 0^k for some nonzero k.
We can easily see that x y^2 z = 0^(p+k) 1^p ∉ C, which contradicts the assumption.

Alternative proof of 1.39

The class of regular languages is closed under the intersection
operation. This is easy to prove if we run two DFAs in parallel and
accept only the strings that are accepted by both of the DFAs.
Now, assume C is regular. Then C ∩ 0*1* = B is also regular.
This contradicts what we proved in Example 1.38.

Example 1.40

Claim: F = {ww | w ∈ {0,1}*} is not regular.
Proof. Assume to the contrary that F is regular.
Let p be the pumping length and let s = 0^p 1 0^p 1 ∈ F. This s can be split
into pieces s = xyz with |y| ≥ 1 and |xy| ≤ p, and x y^n z ∈ F for any n ≥ 0.
Then we must have y = 0^k for some nonzero k.
We can easily see that x y^2 z = 0^(p+k) 1 0^p 1 ∉ F, which contradicts the assumption.

Example 1.41 Unary Language

Claim: D = {1^(n^2) | n ≥ 0} is not regular.
Proof. Assume to the contrary that D is regular.
Let p be the pumping length and let s = 1^(p^2) ∈ D. This s can be split
into pieces s = xyz with |y| ≥ 1 and |xy| ≤ p, and x y^n z ∈ D
for any n ≥ 0.
The length of x y^n z grows linearly with n, in steps of |y|, while the
lengths of the strings in D grow as 0, 1, 4, 9, 16, 25, 36, 49, ...
These two facts are incompatible: the gaps between consecutive perfect
squares grow without bound, so some x y^n z has a length that falls
strictly between two consecutive squares and is therefore not in D.

Example 1.42 Pumping Down

Claim: E = {0^i 1^j | i > j} is not regular.
Proof. Assume to the contrary that E is regular.
Let p be the pumping length and let s = 0^(p+1) 1^p. This s can be split
into s = xyz with |y| ≥ 1 and |xy| ≤ p, and x y^n z ∈ E for any n ≥ 0.
Then we must have y = 0^k for some nonzero k.
We can easily see that x y^0 z = xz = 0^(p+1−k) 1^p ∉ E, which contradicts the assumption.

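Problem 1.41 Differential Encoding
Claim: D = {w | w contains an equal number of occurrences of the
substrings 01 and 10} is regular.

[Figure: a DFA recognizing D. A state named axxb records that the input
read so far starts with a and ends with b.]

State     on 0    on 1
qstart    0xx0    1xx1
0xx0      0xx0    0xx1
0xx1      0xx0    0xx1
1xx1      1xx0    1xx1
1xx0      1xx0    1xx1

The accept states are qstart, 0xx0, and 1xx1: w has equally many 01s and
10s exactly when w is empty or starts and ends with the same symbol.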
```
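The identities listed in Example 1.27 can be spot-checked by brute force. The sketch below is an illustration and not part of the slides. It assumes a translation into Python `re` syntax (∪ becomes `|`, Σ becomes `[01]`, ε becomes an empty alternative) and compares each expression with its set description on every binary string up to a chosen length. The helper names `all_strings`, `language`, and `described` are hypothetical. Items 10 and 11 involve ∅, which has no direct counterpart in `re` syntax, so they are left out.

```python
import re
from itertools import product

SIGMA = "01"          # the alphabet Σ = {0, 1} of Example 1.27
ANY = "[01]"          # assumed `re` spelling of the one-symbol expression Σ

def all_strings(max_len):
    """Yield every string over SIGMA of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(SIGMA, repeat=n):
            yield "".join(chars)

def language(pattern, max_len):
    """Strings of length <= max_len that the pattern matches in full."""
    compiled = re.compile(pattern)
    return {w for w in all_strings(max_len) if compiled.fullmatch(w)}

def described(predicate, max_len):
    """Strings of length <= max_len that satisfy the set-builder condition."""
    return {w for w in all_strings(max_len) if predicate(w)}

MAX_LEN = 8

# item number -> (regular expression, condition on w)
CHECKS = {
    1: ("0*10*", lambda w: w.count("1") == 1),
    2: (f"{ANY}*1{ANY}*", lambda w: "1" in w),
    3: (f"{ANY}*001{ANY}*", lambda w: "001" in w),
    4: (f"(?:{ANY}{ANY})*", lambda w: len(w) % 2 == 0),
    7: (f"0{ANY}*0|1{ANY}*1|0|1", lambda w: w != "" and w[0] == w[-1]),
}

results = {
    item: language(pattern, MAX_LEN) == described(condition, MAX_LEN)
    for item, (pattern, condition) in CHECKS.items()
}

# Item 8 equates two expressions: (0 ∪ ε)1* and 01* ∪ 1*.
item8_holds = language("(?:0|)1*", MAX_LEN) == language("01*|1*", MAX_LEN)

if __name__ == "__main__":
    print(results, item8_holds)
```

A check of this kind only covers strings up to `MAX_LEN`, so it can expose a wrong identity but cannot prove a correct one.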
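The procedure Convert(G) from the slides can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: labels are stored as Python `re` pattern strings, `None` stands for ∅, the empty string stands for ε, and pairs missing from `delta` are read as ∅. The helper `dfa_to_gnfa` and the example DFA (binary strings with an odd number of 1s) are hypothetical additions used only to have something to convert.

```python
import re
from itertools import product

EMPTY = None      # stands for ∅; the empty string "" stands for ε
START, ACCEPT = "qstart", "qaccept"

def union(r, s):
    if r is EMPTY:
        return s                  # ∅ ∪ S = S
    if s is EMPTY:
        return r
    return f"(?:{r}|{s})"

def concat(r, s):
    if r is EMPTY or s is EMPTY:
        return EMPTY              # R ∘ ∅ = ∅
    if r == "":
        return s                  # ε ∘ S = S
    if s == "":
        return r
    return f"(?:{r})(?:{s})"

def star(r):
    if r is EMPTY or r == "":
        return ""                 # ∅* = ε* = ε
    return f"(?:{r})*"

def convert(interior, delta):
    """Convert(G): `interior` lists the states other than START and ACCEPT."""
    if not interior:                          # k = 2: return the only label
        return delta.get((START, ACCEPT))
    rip, rest = interior[0], interior[1:]     # k > 2: choose qrip
    r2 = star(delta.get((rip, rip)))
    new_delta = {}
    for qi in rest + [START]:
        for qj in rest + [ACCEPT]:
            r1 = delta.get((qi, rip))
            r3 = delta.get((rip, qj))
            r4 = delta.get((qi, qj))
            new_delta[(qi, qj)] = union(concat(concat(r1, r2), r3), r4)
    return convert(rest, new_delta)

def dfa_to_gnfa(states, alphabet, trans, start, accepting):
    """Add new start and accept states with ε-arrows around a DFA."""
    delta = {(START, start): ""}
    for q in accepting:
        delta[(q, ACCEPT)] = ""
    for q in states:
        for a in alphabet:
            key = (q, trans[q, a])
            delta[key] = union(delta.get(key), a)
    return delta

def dfa_accepts(trans, start, accepting, w):
    q = start
    for a in w:
        q = trans[q, a]
    return q in accepting

# Hypothetical example DFA: binary strings with an odd number of 1s.
STATES = ["even", "odd"]
TRANS = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd", "0"): "odd", ("odd", "1"): "even"}
regex = convert(STATES, dfa_to_gnfa(STATES, "01", TRANS, "even", {"odd"}))

def agrees(max_len):
    """Compare the resulting expression with the DFA on all short strings."""
    compiled = re.compile(regex)
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if bool(compiled.fullmatch(w)) != dfa_accepts(TRANS, "even", {"odd"}, w):
                return False
    return True
```

The expression produced this way is correct but heavily parenthesized; simplifying it is a separate step that the sketch does not attempt.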
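The case analysis in Examples 1.38 and 1.42 can be replayed mechanically for one concrete value of p. The sketch below is only an illustration: the pumping lemma quantifies over an unknown p, so checking a single value proves nothing by itself. The names `splits` and `refuting_exponents` and the choice `P = 5` are assumptions made for the example. For B every split of s is tried, as in Example 1.38; for E the splits are restricted by |xy| ≤ p, as in Example 1.42.

```python
def in_B(w):
    """Membership in B = {0^n 1^n | n >= 0}."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_E(w):
    """Membership in E = {0^i 1^j | i > j}."""
    zeros = len(w) - len(w.lstrip("0"))   # number of leading 0s
    ones = len(w) - zeros
    return w == "0" * zeros + "1" * ones and zeros > ones

def splits(s, bound):
    """Every (x, y, z) with s = xyz, |y| >= 1 and |xy| <= bound."""
    for i in range(bound):                    # i = |x|
        for j in range(i + 1, bound + 1):     # j = |xy|
            yield s[:i], s[i:j], s[j:]

def refuting_exponents(s, bound, member, max_i=3):
    """Map each split to the smallest i <= max_i with x y^i z outside the
    language, or to None if no such i was found."""
    found = {}
    for x, y, z in splits(s, bound):
        found[(x, y, z)] = next(
            (i for i in range(max_i + 1) if not member(x + y * i + z)), None)
    return found

P = 5                                         # illustrative value only

# Example 1.38: all three shapes of y occur, so every split of s is tried.
s_B = "0" * P + "1" * P
b_witnesses = refuting_exponents(s_B, len(s_B), in_B)

# Example 1.42: with |xy| <= p the piece y consists of 0s only.
s_E = "0" * (P + 1) + "1" * P
e_witnesses = refuting_exponents(s_E, P, in_E)

# When |y| = 1, pumping up stays inside E; only pumping down (i = 0) leaves it.
pumping_up_stays_in_E = all(
    in_E("0" * (P + i) + "1" * P) for i in range(1, 6))
```

Every split receives a refuting exponent, which mirrors the contradiction reached in the two proofs.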
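The alternative proof of 1.39 relies on running two DFAs in parallel. A minimal sketch of that product construction follows. It assumes a DFA is given as a tuple (states, alphabet, delta, start, accepting); the two small machines `EVEN_ZEROS` and `ENDS_WITH_1` are hypothetical examples. The last lines check the identity C ∩ 0*1* = B on short strings only.

```python
import re
from itertools import product

def run(dfa, w):
    """Return True if the DFA accepts w."""
    _, _, delta, start, accepting = dfa
    q = start
    for a in w:
        q = delta[q, a]
    return q in accepting

def intersection_dfa(d1, d2):
    """Product construction: step both DFAs together and accept only
    when both components are in accepting states."""
    states1, alphabet, delta1, start1, acc1 = d1
    states2, _, delta2, start2, acc2 = d2
    states = {(p, q) for p in states1 for q in states2}
    delta = {((p, q), a): (delta1[p, a], delta2[q, a])
             for (p, q) in states for a in alphabet}
    accepting = {(p, q) for p in acc1 for q in acc2}
    return states, alphabet, delta, (start1, start2), accepting

# Two hypothetical example DFAs over {0, 1}.
EVEN_ZEROS = ({"e", "o"}, "01",
              {("e", "0"): "o", ("e", "1"): "e",
               ("o", "0"): "e", ("o", "1"): "o"},
              "e", {"e"})
ENDS_WITH_1 = ({"n", "y"}, "01",
               {("n", "0"): "n", ("n", "1"): "y",
                ("y", "0"): "n", ("y", "1"): "y"},
               "n", {"y"})
BOTH = intersection_dfa(EVEN_ZEROS, ENDS_WITH_1)

def words(max_len):
    """Yield every binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

product_is_correct = all(
    run(BOTH, w) == (w.count("0") % 2 == 0 and w.endswith("1"))
    for w in words(8))

def in_C(w):
    return w.count("0") == w.count("1")

def in_B(w):
    n = len(w) // 2
    return w == "0" * n + "1" * n

# The identity C ∩ 0*1* = B from the proof, checked on short strings only.
identity_holds = all(
    (in_C(w) and re.fullmatch("0*1*", w) is not None) == in_B(w)
    for w in words(10))
```

Note that C itself has no DFA, so the product construction is shown on two regular languages; the identity check uses direct membership tests instead.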
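The claim of Problem 1.41 can be tested by simulating a five-state DFA and comparing it with a direct count of the substrings 01 and 10. The state names below are the labels that survive from the slide's figure (qstart, 0xx0, 0xx1, 1xx1, 1xx0). The transition table and the choice of accept states are a reading of that figure and should be treated as an assumption; the function names are hypothetical.

```python
from itertools import product

# Assumed reading of the figure: a state "axxb" means the input read so
# far starts with a and ends with b.
TRANSITIONS = {
    ("qstart", "0"): "0xx0", ("qstart", "1"): "1xx1",
    ("0xx0", "0"): "0xx0",   ("0xx0", "1"): "0xx1",
    ("0xx1", "0"): "0xx0",   ("0xx1", "1"): "0xx1",
    ("1xx1", "0"): "1xx0",   ("1xx1", "1"): "1xx1",
    ("1xx0", "0"): "1xx0",   ("1xx0", "1"): "1xx1",
}
ACCEPTING = {"qstart", "0xx0", "1xx1"}

def accepts(w):
    """Run the DFA on w and report whether it ends in an accept state."""
    state = "qstart"
    for symbol in w:
        state = TRANSITIONS[state, symbol]
    return state in ACCEPTING

def in_D(w):
    """Direct definition of D. Occurrences of 01 (or of 10) cannot
    overlap each other, so str.count gives the exact number."""
    return w.count("01") == w.count("10")

def mismatches(max_len):
    """Strings of length <= max_len on which the DFA and the definition differ."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if accepts(w) != in_D(w):
                bad.append(w)
    return bad
```

An empty result from `mismatches` for a moderate length supports the reading of the figure; the underlying reason is that each 01 must be followed by a 10 before the next 01 can occur, so the two counts differ by at most one and agree exactly when the first and last symbols agree.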