04_RegularExpressions by dffhrtcv3

VIEWS: 0 PAGES: 21

									Regular Expressions

        CIS 361
    Fundamental Problems

   Need finite descriptions of infinite sets
    of strings.
          Discover and specify “regularity”.
   The set of languages over a finite
    alphabet is uncountable, while the set
    of descriptions is countable
Regular Expressions
   Language L is regular if there exists a
    finite acceptor for it
       Any language that is described by a regular
        expression can be accepted by some finite
        automaton
        Regular Expressions
   Regular expressions
       Combination of strings of symbols from some alphabet,
        parentheses and operators U, ., *
            U is union (some literature uses +)
            . (or nothing) is concatenation
            * is star closure or Kleene star
                  superscripted
                  repetition, 0 or more times
            + is closure
                  superscripted
                  repetition, 1 or more times
Specifying Lexical Structure
Using Regular Expressions
   Have some alphabet  = set of symbols
   Regular expressions are built from:
        - empty string
       Any letter from 
       r1r2 – String r1 followed by r2 (concatenation)
       r1 U r2 (r1 + r2) – either regular expression r1 or r2
        (union)
       r* - iterated sequence and choice  | r | r r | …
       Parentheses to indicate grouping/precedence
Regular Expressions
   Operations
       Union
       Complement
       Intersection
       Difference
       Concatenation
       Repetition
            Kleene star
            Plus operator
Regular Expressions
   Union
       LM
       The union of two regular expressions Q
        and R is Q U R
       In terms of automata A and B, respectively
            create a new initial state q
            connect it to the initial states of A and B by 
             transitions
Regular Expressions
   Complement
       * - L
       To construct the complement of a regular
        expression L, inspect the automaton that
        accepts its strings
            convert the automaton for L to a deterministic
             automaton
            flips favorable and nonfavorable states
            construct a regular expression for strings
             accepted by the updated automaton
     Regular Expressions
   Complement of
       bit strings with at least one “1”
       = bit strings containing no “1”s
       = 0*
   Complement of
       bit strings with exactly one “1”
       = bit strings containing no “1”s
               U bit strings with at least two “1”s
       = 0* U (0* 1 0* 1 0*)(0 U 1)*
Regular Expressions
   Intersection
       LM
       Apply DeMorgan’s law
           Union of the complements of L and M
Regular Expressions
   Difference
       L–M
       Can be expressed as the intersection of
        languages L and * - M
Regular Expressions
   Concatenation
       Strings u and v over alphabet  is string uv
       Languages L1 and L2 concatenated
            L1L2 ={uv|u  L1, v  L2}
       Can be extended to any finite number of
        languages
Regular Expressions
   Concatenation
       LM
       Algorithm connects every favorable state of
        L to the initial state of M by an arrow
        labeled 
            Favorable states of L become non-favorable
       Favorable states of M become favorable
        states of the new automaton
Regular Expressions
   Kleene star
       L*
       In terms of automaton
            connect every favorable state of L to the initial
             state of L by a transition labeled 
            create a new initial state s, make it the only
             favorable state and connect it to the old initial
             state by  transition
Regular Expressions
   Plus (+)
     L+
     In terms of automaton

           connect every favorable state of L to the
            initial state of L by a transition labeled 
                That’s it. This gets one or more times to a
                 favorable state
    Naming Languages
 Regular sets can be named using the
  derivation in terms of the seed elements
  and the closure operations. Regular
  expressions formalize this approach.
Regular sets  Regular Expressions
Numbers  Numerals
Semantics  Syntax
      Example

   Regular expressions for strings over {a,b}
    containing at least one “a”.
        Focus on the one “a”

               (a u b)*a(a u b)*
          Focus on the leftmost “a”
               b*a(a u b)*
          Focus on the “a”s
               b*ab*(ab*)*
          Further optimization
               b*(ab*)+
Equivalence of regular expressions

Two regular expressions are equivalent
if they represent the same regular set.
Concept of Language Generated by
Regular Expressions
   Set of all strings generated by a regular
    expression is the language of the
    regular expression
   In general, a language may be
    (countably) infinite
   A string in a language is often called a
    token
      Examples of Languages and
      Regular Expressions
    = { 0, 1, . }
      (0 U 1)*.(0 U 1)* - Binary floating point numbers

      (00)* - even-length all-zero strings

      1*(01*01*)* - strings with even number of zeros

    = {A,…,Z, a,…,z, 0,…,9,_ }
      (A U … U z)(A U … U z U 0 U … U 9 U _)
                                                  * identifiers

      (1 U … U 9)(0 U … U 9) natural numbers (no negatives)
                                *

      (0|1|2)* - trinary (base 3) numbers
    Finite-State Automata
   Alphabet 
   Set of states with initial and accepting states
   Transitions between states, labeled with symbol(s)

            (0 | 1)*.(0|1)*
     1                                  1


                                        0

    0

								
To top