# 04_RegularExpressions by dffhrtcv3

Regular Expressions

CIS 361
Fundamental Problems

   Need finite descriptions of infinite sets
of strings.
   Discover and specify “regularity”.
   The set of languages over a finite
alphabet is uncountable, while the set
of descriptions is countable
Regular Expressions
   Language L is regular if there exists a
finite acceptor for it
   Any language that is described by a regular
expression can be accepted by some finite
automaton
Regular Expressions
   Regular expressions
   Combination of strings of symbols from some alphabet,
parentheses and operators U, ., *
   U is union (some literature uses +)
   . (or nothing) is concatenation
   * is star closure or Kleene star
   superscripted
   repetition, 0 or more times
   + is positive closure
   superscripted
   repetition, 1 or more times
Specifying Lexical Structure
Using Regular Expressions
   Have some alphabet Σ = set of symbols
   Regular expressions are built from:
   ε - empty string
   Any letter from Σ
   r1r2 – String r1 followed by r2 (concatenation)
   r1 U r2 (r1 + r2) – either regular expression r1 or r2
(union)
   r* - iterated sequence and choice ε | r | r r | …
   Parentheses to indicate grouping/precedence
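
The building blocks above map fairly directly onto Python's `re` module. The following is a minimal sketch, assuming the slide notation translates as noted in the comments; the helper name `in_language` is hypothetical and not part of the original slides.

```python
import re

# Assumed mapping from the slide notation to Python `re` syntax:
#   r1 U r2  ->  r1|r2   (union)
#   r1r2     ->  r1r2    (concatenation)
#   r*       ->  r*      (Kleene star, 0 or more repetitions)
#   r+       ->  r+      (positive closure, 1 or more repetitions)
#   empty string -> the empty pattern ""

def in_language(pattern: str, s: str) -> bool:
    """Return True if the whole string s is in the language of pattern."""
    return re.fullmatch(pattern, s) is not None

print(in_language(r"(a|b)*", ""))    # star admits the empty string
print(in_language(r"(a|b)+", ""))    # plus requires at least one symbol
print(in_language(r"ab|ba", "ba"))   # union of two concatenations
```

`re.fullmatch` is used rather than `re.search` because membership in a language is a statement about the entire string, not about a substring.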
Regular Expressions
   Operations
   Union
   Complement
   Intersection
   Difference
   Concatenation
   Repetition
   Kleene star
   Plus operator
Regular Expressions
   Union
   LM
   The union of two regular expressions Q
and R is Q U R
   In terms of automata A and B, respectively
   create a new initial state q
   connect it to the initial states of A and B by ε
transitions
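
The union construction can be sketched in a few lines. This is an illustration under stated assumptions, not the course's reference code: an automaton is represented as a tuple `(start, favorable, delta)`, `None` stands for an ε label, and the names `eps_closure`, `accepts`, `tag` and `union` are all hypothetical.

```python
EPS = None  # label used for epsilon transitions (an assumption of this sketch)

def eps_closure(delta, states):
    """All states reachable from `states` using only epsilon transitions."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, s):
    start, favorable, delta = nfa
    current = eps_closure(delta, {start})
    for ch in s:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, ch), ()))
        current = eps_closure(delta, moved)
    return bool(current & favorable)

def tag(nfa, label):
    """Rename every state q to (label, q) so two automata cannot collide."""
    start, favorable, delta = nfa
    new_delta = {((label, q), a): {(label, r) for r in rs}
                 for (q, a), rs in delta.items()}
    return (label, start), {(label, q) for q in favorable}, new_delta

def union(a, b):
    sa, fa, da = tag(a, "A")
    sb, fb, db = tag(b, "B")
    q = "q"  # the new initial state
    delta = {**da, **db, (q, EPS): {sa, sb}}
    return q, fa | fb, delta

only_zero = (0, {1}, {(0, "0"): {1}})   # accepts exactly the string "0"
ones = (0, {0}, {(0, "1"): {0}})        # accepts 1*
both = union(only_zero, ones)
print(accepts(both, "0"), accepts(both, "111"), accepts(both, "01"))
```

Tagging the states before merging is a design choice of the sketch: it lets both automata reuse the same state names without interfering with each other.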
Regular Expressions
   Complement
   Σ* - L
   To construct the complement of a regular
expression L, inspect the automaton that
accepts its strings
   convert the automaton for L to a deterministic
automaton
   flip favorable and non-favorable states
   construct a regular expression for strings
accepted by the updated automaton
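
The state-flipping step can be illustrated on a small deterministic automaton. This is a sketch assuming a complete DFA stored as `(states, start, favorable, delta)`; the state names and the helpers `run_dfa` and `complement` are hypothetical.

```python
# A complete DFA for "bit strings with at least one 1".
# delta must define a move for every (state, symbol) pair; flipping the
# favorable states of an incomplete or nondeterministic automaton is unsound.
AT_LEAST_ONE_1 = (
    {"none", "seen"},                      # states
    "none",                                # initial state
    {"seen"},                              # favorable states
    {("none", "0"): "none", ("none", "1"): "seen",
     ("seen", "0"): "seen", ("seen", "1"): "seen"},
)

def run_dfa(dfa, s):
    _, start, favorable, delta = dfa
    state = start
    for ch in s:
        state = delta[(state, ch)]
    return state in favorable

def complement(dfa):
    """Same automaton, with favorable and non-favorable states swapped."""
    states, start, favorable, delta = dfa
    return states, start, states - favorable, delta

NO_1 = complement(AT_LEAST_ONE_1)
print(run_dfa(NO_1, "000"), run_dfa(NO_1, "010"))
```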
Regular Expressions
   Complement of
bit strings with at least one “1”
= bit strings containing no “1”s
= 0*
   Complement of
bit strings with exactly one “1”
= bit strings containing no “1”s
U bit strings with at least two “1”s
= 0* U (0* 1 0* 1 0*)(0 U 1)*
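
The second example can be spot-checked by brute force. The sketch below assumes Python `re` syntax for the two expressions and tests only strings up to a fixed length, so it gives evidence rather than a proof; `bit_strings` and `is_partition` are hypothetical helper names.

```python
import re
from itertools import product

EXACTLY_ONE = re.compile(r"0*10*")
COMPLEMENT = re.compile(r"0*|(0*10*10*)(0|1)*")

def bit_strings(max_len):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

def is_partition(max_len=8):
    """True if every string matches exactly one of the two patterns."""
    return all(
        (EXACTLY_ONE.fullmatch(s) is None) != (COMPLEMENT.fullmatch(s) is None)
        for s in bit_strings(max_len)
    )

print(is_partition())
```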
Regular Expressions
   Intersection
   LM
   Apply DeMorgan’s law
   Union of the complements of L and M
Regular Expressions
   Difference
   L–M
   Can be expressed as the intersection of
languages L and Σ* - M
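
Both identities (intersection through De Morgan's law, difference through intersection with a complement) can be checked on a bounded universe. This sketch assumes Σ = {0, 1} and replaces Σ* with all strings up to length 6, so complements are taken relative to that finite set; the two example languages and the helper names are assumptions.

```python
import re
from itertools import product

def universe(max_len, alphabet="01"):
    """All strings over the alphabet of length 0..max_len."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)}

def language(pattern, strings):
    return {s for s in strings if re.fullmatch(pattern, s)}

SIGMA_STAR = universe(6)                         # finite stand-in for Σ*
L = language(r"1*(01*01*)*", SIGMA_STAR)         # even number of zeros
M = language(r"(0|1)*1(0|1)*", SIGMA_STAR)       # at least one 1

# Intersection: complement of the union of the complements.
intersection = SIGMA_STAR - ((SIGMA_STAR - L) | (SIGMA_STAR - M))
# Difference: intersection of L with the complement of M.
difference = L & (SIGMA_STAR - M)

print(intersection == L & M, difference == L - M)
```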
Regular Expressions
   Concatenation
   Concatenation of strings u and v over alphabet Σ is the string uv
   Languages L1 and L2 concatenated
   L1L2 = {uv | u ∈ L1, v ∈ L2}
   Can be extended to any finite number of
languages
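
For finite languages the definition is a one-line set comprehension. A minimal sketch, with `concat` and `concat_all` as hypothetical names:

```python
from functools import reduce

def concat(l1, l2):
    """L1L2 = {uv | u in L1, v in L2} for finite languages given as sets."""
    return {u + v for u in l1 for v in l2}

def concat_all(*languages):
    """Extend concatenation to any finite number of languages."""
    # {""} is the identity: concatenating with it changes nothing.
    return reduce(concat, languages, {""})

print(sorted(concat({"a", "ab"}, {"b", ""})))
print(sorted(concat_all({"0"}, {"1"}, {"0", "1"})))
```

Note that the result can be smaller than the number of pairs, because different pairs (u, v) may produce the same string uv.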
Regular Expressions
   Concatenation
   LM
   Algorithm connects every favorable state of
L to the initial state of M by an arrow
labeled ε
   Favorable states of L become non-favorable
   Favorable states of M become favorable
states of the new automaton
Regular Expressions
   Kleene star
   L*
   In terms of automaton
   connect every favorable state of L to the initial
state of L by a transition labeled ε
   create a new initial state s, make it favorable
(so that the empty string is accepted) and connect
it to the old initial state by an ε transition
Regular Expressions
   Plus (+)
 L+
 In terms of automaton

   connect every favorable state of L to the
initial state of L by a transition labeled ε
   That’s it. An accepted string now passes through
a favorable state one or more times
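
The ε-transition constructions for concatenation, Kleene star and plus can be sketched together. The representation `(start, favorable, delta)` with `None` as the ε label, and every function name below, are assumptions of this sketch. In `star`, the new initial state is made favorable and the old favorable states stay favorable, which is one common way to obtain exactly L*.

```python
EPS = None  # epsilon label (an assumption of this sketch)

def eps_closure(delta, states):
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(nfa, s):
    start, favorable, delta = nfa
    current = eps_closure(delta, {start})
    for ch in s:
        moved = set()
        for q in current:
            moved |= set(delta.get((q, ch), ()))
        current = eps_closure(delta, moved)
    return bool(current & favorable)

def tag(nfa, label):
    """Rename every state q to (label, q); returns fresh transition sets."""
    start, favorable, delta = nfa
    new_delta = {((label, q), a): {(label, r) for r in rs}
                 for (q, a), rs in delta.items()}
    return (label, start), {(label, q) for q in favorable}, new_delta

def add_eps(delta, src, dst):
    delta.setdefault((src, EPS), set()).add(dst)

def concat(l, m):
    sl, fl, dl = tag(l, "L")
    sm, fm, dm = tag(m, "M")
    delta = {**dl, **dm}
    for f in fl:                 # favorable states of L lead into M
        add_eps(delta, f, sm)
    return sl, fm, delta         # only M's favorable states remain favorable

def plus(l):
    s, f, delta = tag(l, "L")
    for q in f:                  # loop back for one more repetition
        add_eps(delta, q, s)
    return s, f, delta

def star(l):
    s, f, delta = plus(l)
    new_start = "s"              # new initial state, also favorable
    add_eps(delta, new_start, s)
    return new_start, f | {new_start}, delta

ab = (0, {2}, {(0, "a"): {1}, (1, "b"): {2}})   # accepts exactly "ab"
c = (0, {1}, {(0, "c"): {1}})                   # accepts exactly "c"
print(accepts(star(ab), ""), accepts(plus(ab), ""), accepts(concat(ab, c), "abc"))
```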
Naming Languages
 Regular sets can be named using the
derivation in terms of the seed elements
and the closure operations. Regular
expressions formalize this approach.
Regular sets ↔ Regular Expressions
Numbers ↔ Numerals
Semantics ↔ Syntax
Example

   Regular expressions for strings over {a,b}
containing at least one “a”.
 Focus on the one “a”

(a U b)*a(a U b)*
   Focus on the leftmost “a”
b*a(a U b)*
   Focus on the “a”s
b*ab*(ab*)*
   Further optimization
b*(ab*)+
Equivalence of regular expressions

Two regular expressions are equivalent
if they represent the same regular set.
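
Equivalence can be spot-checked by comparing expressions on every string up to some length. The sketch below applies this to the four expressions for strings over {a,b} containing at least one "a", written in Python `re` syntax. A bounded check of this kind can expose a difference but cannot prove equivalence; `strings_up_to` and `agree` are hypothetical names.

```python
import re
from itertools import product

CANDIDATES = [
    r"(a|b)*a(a|b)*",   # focus on one "a"
    r"b*a(a|b)*",       # focus on the leftmost "a"
    r"b*ab*(ab*)*",     # focus on the "a"s
    r"b*(ab*)+",        # further optimization
]

def strings_up_to(max_len, alphabet="ab"):
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def agree(patterns, max_len=8):
    """True if all patterns accept exactly the same strings up to max_len."""
    compiled = [re.compile(p) for p in patterns]
    for s in strings_up_to(max_len):
        verdicts = {c.fullmatch(s) is not None for c in compiled}
        if len(verdicts) != 1:
            return False
    return True

print(agree(CANDIDATES))
```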
Concept of Language Generated by
Regular Expressions
   Set of all strings generated by a regular
expression is the language of the
regular expression
   In general, a language may be
(countably) infinite
   A string in a language is often called a
token
Examples of Languages and
Regular Expressions
    = { 0, 1, . }
 (0 U 1)*.(0 U 1)* - Binary floating point numbers

 (00)* - even-length all-zero strings

 1*(01*01*)* - strings with even number of zeros

   Σ = {A,…,Z, a,…,z, 0,…,9,_ }
 (A U … U z)(A U … U z U 0 U … U 9 U _)* - identifiers

 (1 U … U 9)(0 U … U 9)* - natural numbers (no negatives)

 (0|1|2)* - trinary (base 3) numbers
Finite-State Automata
   Alphabet Σ
   Set of states with initial and accepting states
   Transitions between states, labeled with symbol(s)

[Figure: finite-state automaton for (0 | 1)*.(0|1)*; transition labels 0 and 1]
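
An automaton for (0 | 1)*.(0|1)* can be simulated with a transition table. This is a sketch under assumptions: the two state names are invented for illustration, and a missing table entry is treated as a rejecting dead state.

```python
# "int" = still before the point, "frac" = the point has been read.
DELTA = {
    ("int", "0"): "int", ("int", "1"): "int", ("int", "."): "frac",
    ("frac", "0"): "frac", ("frac", "1"): "frac",
}
START = "int"
ACCEPTING = {"frac"}

def accepts(s):
    state = START
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:        # no transition defined: reject
            return False
    return state in ACCEPTING

print(accepts("10.01"), accepts("1001"), accepts("1.0.1"))
```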
