Tree Automata


First: A reminder on automata on words



          Typing semistructured data
Finite state automata on words

$(\Sigma, Q, q_0, F, \delta)$
• Alphabet: $\Sigma$
• States: $Q$
• Initial state: $q_0 \in Q$
• Accepting states: $F \subseteq Q$
• Transitions: $\delta : \Sigma \times Q \to P(Q)$



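Nondeterministic automaton: Example

$\Sigma = \{a, b\}$
$Q = \{q_0, q_1, q_2, q_3\}$
$F = \{q_2\}$

$\delta(a, q_0) = \{q_0, q_1\}$
$\delta(b, q_1) = \{q_0\}$
$\delta(\varepsilon, q_1) = \{q_2\}$
$\delta(\varepsilon, q_2) = \{q_3\}$
$\delta(\varepsilon, q_3) = \{q_3\}$

[Figure: runs of the automaton on two words. On abaab, the automaton is in $q_0$ after the last b and no run reaches $q_2$: KO (rejected). On aba, a run is in $q_1$ after the last a and moves to $q_2$ by an $\varepsilon$-transition: OK (accepted).]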
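The example can be simulated by tracking the set of states the automaton may be in after each letter. The following is a minimal Python sketch, assuming that the transitions out of q1, q2 and q3 that read no letter are ε-transitions. The names EPS, DELTA, eps_closure and accepts are illustrative and do not come from the slides.

```python
EPS = ""  # stands for the empty word (epsilon); an illustrative convention

# Transition table of the example: (symbol, state) -> set of next states.
DELTA = {
    ("a", "q0"): {"q0", "q1"},
    ("b", "q1"): {"q0"},
    (EPS, "q1"): {"q2"},
    (EPS, "q2"): {"q3"},
    (EPS, "q3"): {"q3"},
}
INITIAL = "q0"
ACCEPTING = {"q2"}


def eps_closure(states):
    """Return all states reachable from `states` using only epsilon moves."""
    closure = set(states)
    frontier = list(states)
    while frontier:
        q = frontier.pop()
        for r in DELTA.get((EPS, q), set()):
            if r not in closure:
                closure.add(r)
                frontier.append(r)
    return closure


def accepts(word):
    """Accept if some run on `word` ends in an accepting state."""
    current = eps_closure({INITIAL})
    for letter in word:
        moved = set()
        for q in current:
            moved |= DELTA.get((letter, q), set())
        current = eps_closure(moved)
    return bool(current & ACCEPTING)


print(accepts("abaab"))  # the KO word of the example
print(accepts("aba"))    # the OK word of the example
```

Keeping a set of current states avoids enumerating runs one by one; it is the same idea that determinization turns into a construction.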
                              Reminder
• Deterministic
   – No $\varepsilon$-transition such as $\delta(\varepsilon, q) = \{q_0\}$
   – No alternative transitions such as $\delta(a, q_0) = \{q_0, q_1\}$
• Determinization
   – It is possible to obtain an equivalent deterministic automaton
   – State of new automaton = set of states of the original one
   – Possible exponential blow-up
• Minimization
• Limitations – cannot recognize
   – Context-free languages that are not regular, e.g. $\{a^n b^n \mid n \in \mathbb{N}\}$

• Essential tool – e.g., lexical analysis
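The determinization step mentioned above is usually done with the subset construction. Here is a hedged Python sketch for automata without ε-transitions; the function names and the small three-state automaton (words over {a, b} ending in ab) are hypothetical examples, not taken from the slides.

```python
from collections import deque


def determinize(alphabet, delta, initial, accepting):
    """Subset construction for an automaton without epsilon transitions.

    `delta` maps (symbol, state) to a set of states. Each state of the
    result is a frozenset of states of the original automaton.
    """
    start = frozenset({initial})
    det_delta = {}
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for symbol in alphabet:
            target = frozenset(
                r for q in subset for r in delta.get((symbol, q), set())
            )
            det_delta[(symbol, subset)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    det_accepting = {s for s in seen if s & set(accepting)}
    return seen, det_delta, start, det_accepting


def run(det_delta, start, det_accepting, word):
    """Run the deterministic automaton on `word`."""
    state = start
    for letter in word:
        state = det_delta[(letter, state)]
    return state in det_accepting


# Hypothetical nondeterministic automaton: words ending in "ab".
NFA_DELTA = {
    ("a", "p0"): {"p0", "p1"},
    ("b", "p0"): {"p0"},
    ("b", "p1"): {"p2"},
}
states, det_delta, start, det_accepting = determinize(
    "ab", NFA_DELTA, "p0", {"p2"}
)
print(len(states))
print(run(det_delta, start, det_accepting, "aab"))
```

Only subsets reachable from the start state are built, so the exponential blow-up shows up only when the automaton actually needs it.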
                    Reminder (2)
•   L(A) = set of words accepted by automaton A
•   Regular languages
•   Can be described by regular expressions, e.g. a(b+c)*d
•   Closed under complement: $\Sigma^* - L(A)$
•   Closed under union and intersection: $L(A) \cup L(B)$, $L(A) \cap L(B)$
    – Product automaton with states (s, s’)
      where s is from A and s’ is from B
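The product construction can be sketched in Python as follows, assuming both automata are deterministic and complete (every state has a transition on every letter). The two small automata (an even number of a's, and words ending in b) and all names are hypothetical examples.

```python
def product_automaton(delta_a, init_a, final_a, delta_b, init_b, final_b, alphabet):
    """Product of two complete deterministic automata.

    States are pairs (s, s2). Taking pairs where both components are
    accepting yields the intersection; taking pairs where at least one
    component is accepting would yield the union instead.
    """
    states_a = {q for (_, q) in delta_a}
    states_b = {q for (_, q) in delta_b}
    delta = {
        (x, (s, s2)): (delta_a[(x, s)], delta_b[(x, s2)])
        for x in alphabet
        for s in states_a
        for s2 in states_b
    }
    final = {(s, s2) for s in final_a for s2 in final_b}
    return delta, (init_a, init_b), final


def accepts(delta, init, final, word):
    """Run a deterministic automaton on `word`."""
    state = init
    for letter in word:
        state = delta[(letter, state)]
    return state in final


# A: even number of a's (states "e" and "o").
DELTA_A = {("a", "e"): "o", ("a", "o"): "e", ("b", "e"): "e", ("b", "o"): "o"}
# B: words ending in b (states "n" and "y").
DELTA_B = {("a", "n"): "n", ("a", "y"): "n", ("b", "n"): "y", ("b", "y"): "y"}

delta, init, final = product_automaton(DELTA_A, "e", {"e"}, DELTA_B, "n", {"y"}, "ab")
print(accepts(delta, init, final, "aab"))
```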
    Automata on words versus trees

[Figure: a word read left to right or right to left (no difference), next to a binary tree with nodes labeled a and b read bottom-up or top-down (differences).]
     Automata

Automata on ranked trees




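           Binary tree automata
• Parallel evaluation
• $(\Sigma, Q, F, \delta)$
• For leaves: $\delta : \Sigma \to P(Q)$
• For other nodes: $\delta : \Sigma \times Q \times Q \to P(Q)$

[Figure: bottom-up run on a binary tree with nodes labeled a and b. States such as q, q’ and q” are assigned to the nodes starting from the leaves, and the root reaches state q2.]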

       Bottom-up tree automata
        $\delta(a, q, q') = \{r, r'\}$
• Bottom-up: if a node labeled a has its children in
  states q, q’ then the node moves
  nondeterministically to state r or r’
• Accepts if the root is in some state in F

• Not deterministic if alternatives or $\varepsilon$-transitions:
       $\delta(a, q, q') = \{r, r'\}$, $\delta(\varepsilon, r) = \{r'\}$
Example: deterministic bottom-up
$\Sigma = \{0, 1, \vee, \wedge\}$
$Q = \{q_0, q_1\}$
$F = \{q_1\}$
$\delta_1(0) = \{q_0\}$
$\delta_1(1) = \{q_1\}$
$\delta_2(\wedge, q_1, q_1) = \{q_1\}$
$\delta_2(\wedge, q_0, q_0) = \delta_2(\wedge, q_1, q_0) = \delta_2(\wedge, q_0, q_1) = \{q_0\}$
$\delta_2(\vee, q_0, q_0) = \{q_0\}$
$\delta_2(\vee, q_1, q_1) = \delta_2(\vee, q_1, q_0) = \delta_2(\vee, q_0, q_1) = \{q_1\}$
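           Boolean circuit evaluation

[Figure: bottom-up run on a Boolean circuit, a binary tree with leaves labeled 0 and 1 and internal nodes labeled $\vee$ and $\wedge$. Each node is annotated with its state, $q_0$ or $q_1$; the root reaches $q_1$: OK (accepted).]

$\delta_1(0) = \{q_0\}$
$\delta_1(1) = \{q_1\}$

$\delta_2(\wedge, q_1, q_1) = \{q_1\}$
$\delta_2(\wedge, q_0, q_0) = \{q_0\}$
$\delta_2(\wedge, q_1, q_0) = \{q_0\}$
$\delta_2(\wedge, q_0, q_1) = \{q_0\}$

$\delta_2(\vee, q_0, q_0) = \{q_0\}$
$\delta_2(\vee, q_1, q_1) = \{q_1\}$
$\delta_2(\vee, q_1, q_0) = \{q_1\}$
$\delta_2(\vee, q_0, q_1) = \{q_1\}$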
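The evaluation of a Boolean circuit by this deterministic bottom-up automaton can be sketched in Python. The tree encoding (a leaf is the string "0" or "1", an internal node is a triple) and the labels "and" and "or", which stand in for the conjunction and disjunction symbols, are assumptions made for the illustration.

```python
# Leaf transitions: label -> state.
LEAF = {"0": "q0", "1": "q1"}

# Transitions for binary nodes: (label, left state, right state) -> state.
NODE = {
    ("and", "q1", "q1"): "q1",
    ("and", "q0", "q0"): "q0",
    ("and", "q1", "q0"): "q0",
    ("and", "q0", "q1"): "q0",
    ("or", "q0", "q0"): "q0",
    ("or", "q1", "q1"): "q1",
    ("or", "q1", "q0"): "q1",
    ("or", "q0", "q1"): "q1",
}
ACCEPTING = {"q1"}


def state(tree):
    """Compute the unique state of the root, children first."""
    if isinstance(tree, str):
        return LEAF[tree]
    label, left, right = tree
    return NODE[(label, state(left), state(right))]


def accepts(tree):
    """Accept when the root ends in an accepting state."""
    return state(tree) in ACCEPTING


circuit = ("or", ("and", "1", "0"), "1")
print(accepts(circuit))
```

Because the automaton is deterministic, each node gets exactly one state, so a single recursive pass is enough.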
Regular tree language = set of trees accepted by a bottom-up tree automaton




        Regular tree languages
Theorem: the following are equivalent
  – L is a regular tree language
  – L is accepted by a nondeterministic bottom-up
    automaton
  – L is accepted by a deterministic bottom-up
    automaton
  – L is accepted by a nondeterministic top-down
    automaton
Deterministic top-down is weaker
       Top-down tree automata
                $\delta(a, q'') = (q, q')$

• Top-down: if a node labeled a is in state q”,
  then its left child moves to state q, right to q’
• Accepts if all leaves are in states in F
• Not deterministic if

               $\delta(a, q'') = \{(q, q'), (r, r')\}$
               Why is deterministic top-down weaker?
• Consider the language
   – L = { <r><a/><b/></r>, <r><b/><a/></r> }
• It can be accepted by a bottom-up TA
   – Exercise: write a BUTA A such that L = L(A)
• Suppose that B is a deterministic top-down TA that
  accepts both trees in L
   – Exercise: Show that B also accepts <r><a/><a/></r>
   – A contradiction, since <r><a/><a/></r> is not in L
Fact: No deterministic top-down tree automaton accepts
  exactly L
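One possible answer to the first exercise, as a hedged Python sketch: a bottom-up automaton that remembers the label of each leaf in its state. The state names qa, qb and ok, the tree encoding (a leaf is a string, an internal node is a triple) and the function names are assumptions made for the illustration.

```python
from itertools import product

# Leaf transitions: label -> set of states.
LEAF = {"a": {"qa"}, "b": {"qb"}}

# Binary node transitions: (label, left state, right state) -> set of states.
NODE = {
    ("r", "qa", "qb"): {"ok"},
    ("r", "qb", "qa"): {"ok"},
}
ACCEPTING = {"ok"}


def states(tree):
    """Set of states that some bottom-up run can assign to the root."""
    if isinstance(tree, str):
        return LEAF.get(tree, set())
    label, left, right = tree
    reached = set()
    for q, q2 in product(states(left), states(right)):
        reached |= NODE.get((label, q, q2), set())
    return reached


def accepts(tree):
    """Accept if the root can be in some accepting state."""
    return bool(states(tree) & ACCEPTING)


print(accepts(("r", "a", "b")), accepts(("r", "b", "a")), accepts(("r", "a", "a")))
```

The bottom-up automaton sees both children before deciding at the root, which is exactly the information a deterministic top-down automaton has to commit to in advance.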
    Ranked tree automata: Properties

•   As for words
•   Determinization
•   Minimization
•   Closed under
    – Complement
    – Intersection
    – Union
                      But…

• XML documents are unranked:
  book (intro,section*,conclusion)
      Automata

Automata on unranked trees




           Unranked tree automata
$\delta_2(\wedge, t) = \{t\}$, $\delta_2(\wedge, t, t) = \{t\}$, $\delta_2(\wedge, t, t, t) = \{t\}$, ...
$\delta_2(\wedge, f) = \{f\}$, $\delta_2(\wedge, t, f) = \{f\}$, $\delta_2(\wedge, f, t) = \{f\}$, ...
$\delta_2(\vee, t) = \{t\}$, $\delta_2(\vee, t, f) = \{t\}$, $\delta_2(\vee, f, t) = \{t\}$, ...
$\delta_2(\vee, f) = \{f\}$, $\delta_2(\vee, f, f) = \{f\}$, $\delta_2(\vee, f, f, f) = \{f\}$, ...


    Issue: represent an infinite set of transitions
    Solution: a regular language
      Unranked tree automata (2)

• Rule: $\delta(a, L(Q)) = \{r_1, \ldots, r_m\}$, where L(Q) is a regular language over Q
• Meaning: if the states of the children of some
  node labeled a form a word in L(Q), this node
  moves to some state in $\{r_1, \ldots, r_m\}$
$\delta_2(\wedge, And_1) = \{t\}$ where $And_1 = t^*$
$\delta_2(\wedge, And_0) = \{f\}$ where $And_0 = (t + f)^* f (t + f)^*$
$\delta_2(\vee, Or_1) = \{t\}$ where $Or_1 = (t + f)^* t (t + f)^*$
$\delta_2(\vee, Or_0) = \{f\}$ where $Or_0 = f^*$
            Building on ranked trees

[Figure: an unranked tree with nodes labeled a and b (left) and its encoding as a ranked (binary) tree (right).]

Ranked tree: FirstChild-NextSibling

F: encoding into a ranked tree
F is a bijection
$F^{-1}$: decoding
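The encoding and its inverse can be sketched in Python. In the binary tree, the left child of a node is its first child in the unranked tree and the right child is its next sibling. The representations used here, (label, [children]) for unranked trees and (label, first_child, next_sibling) with None for a missing node, are assumptions made for the illustration.

```python
def encode_forest(trees):
    """Encode a list of sibling trees; None stands for 'no node'."""
    if not trees:
        return None
    (label, children), siblings = trees[0], trees[1:]
    return (label, encode_forest(children), encode_forest(siblings))


def encode(tree):
    """F: unranked tree -> binary tree (the root has no next sibling)."""
    return encode_forest([tree])


def decode_forest(binary):
    """Decode a binary tree into the list of sibling trees it represents."""
    if binary is None:
        return []
    label, first_child, next_sibling = binary
    return [(label, decode_forest(first_child))] + decode_forest(next_sibling)


def decode(binary):
    """Inverse of F, for binary trees produced by `encode`."""
    (tree,) = decode_forest(binary)
    return tree


unranked = ("a", [("b", []), ("b", [("a", [])]), ("b", [])])
binary = encode(unranked)
print(binary)
print(decode(binary) == unranked)
```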
       Building on bottom-up ranked trees (2)
• For each unranked TA A, there is a ranked TA
  accepting F(L(A))
• For each ranked TA A, there is an unranked TA
  accepting $F^{-1}(L(A))$
• Both are easy to construct

Consequence: Unranked TA are closed under union,
  intersection, complement
Determinization is also possible, but a bit trickier

				