# Tree Automata


```
Tree Automata

First: a reminder on automata on words

Typing semistructured data
Finite state automata on words

(, Q, q0 , F ,  )
Transitions
Alphabet
 :   Q  P(Q)
State
Initial state         Accepting states
q0  Q                    F Q

Nondeterministic automaton: Example
 a, q0   q0 ,q1
  a, b
 b, q1   q0 
Q  q0 , q1 , q2 ,q3 
 , q1   q2 
F  q2 
 , q2   q3 
 , q3   q3 

[Figure: runs on two words. On abaab no run ends in a state of F: rejected (KO). On aba the run q0 -a-> q1 -b-> q0 -a-> q1 -ε-> q2 ends in q2 ∈ F: accepted (OK).]
Reminder
• Deterministic
– No  transition                        , q   q0 
– No alternative transitions such as
 a, q0   q0 ,q1
• Determinization
– It is possible to obtain an equivalent deterministic automaton
– State of new automaton = set of states of the original one
– Possible exponential blow-up
• Minimization
• Limitations – cannot recognize
– Context-free languages such as {aⁿbⁿ | n ∈ ℕ}

• Essential tool – e.g., lexical analysis
Reminder (2)
•   L(A) = set of words accepted by automaton A
•   Regular languages
•   Can be described by regular expressions, e.g. a(b+c)*d
•   Closed under complement: Σ* \ L(A)
•   Closed under union and intersection: L(A) ∪ L(B), L(A) ∩ L(B)
– Product automaton with states (s, s'),
where s is a state of A and s' is a state of B
Automata on words versus trees

Words: read left to right or right to left – no difference
Trees: read bottom-up or top-down – differences
[Figure: a word and a tree over {a, b}, each shown with both reading directions]

Automata on ranked trees

Binary tree automata
• Parallel evaluation

A = (Σ, Q, F, δ)
• For leaves: δ: Σ → P(Q)
• For other nodes: δ: Σ × Q × Q → P(Q)
[Figure: a binary tree over {a, b} annotated bottom-up with states q, q', q", q1, q2]

Bottom-up tree automata
 a, q, q'  r , r '
• Bottom-up: if a node labeled a has its children in
states q, q’ then the node moves
nondeterministically to state r or r’
• Accepts is the root is in some state in F

• Not deterministic if alternatives or -transitions:
 a, q, q'  {r , r '}  , r   r '
Example: deterministic bottom-up
  0,1,,
1 0  q0 
Q  q0 , q1
1 1  q1
F  q1
 2 , q1 , q1   q1
 2 , q0 , q0 ,  2 , q1 , q0 ,  2 , q0 , q1   q0 
 2 , q0 , q0   q0 
 2 , q1 , q1 ,  2 , q1 , q0 ,  2 , q0 , q1   q1
Boolean circuit evaluation
[Figure: a Boolean circuit with leaves 0 and 1 and inner ∧ and ∨ nodes, each node annotated with the state assigned bottom-up by the automaton of the previous example (q0 = false, q1 = true); the root reaches q1 ∈ F, so the circuit is accepted (OK)]
Regular tree language = set of trees
accepted by a bottom-up tree
automaton

Regular tree languages
Theorem: the following are equivalent
– L is a regular tree language
– L is accepted by a nondeterministic bottom-up
automaton
– L is accepted by a deterministic bottom-up
automaton
– L is accepted by a nondeterministic top-down
automaton
Deterministic top-down is weaker
Top-down tree automata
δ(a, q") = {(q, q')}

• Top-down: if a node labeled a is in state q",
then its left child moves to state q and its right child to q'
• Accepts if all leaves are in states in F
• Not deterministic if, e.g.,

δ(a, q") = {(q, q'), (r, r')}
Why is deterministic
top-down weaker?
• Consider the language
– L = { <r><a/><b/></r>, <r><b/><a/></r> }
• It can be accepted by a bottom-up TA
– Exercise: write a BUTA A such that L = L(A)
• Suppose that B is a deterministic top-down TA that
accepts both trees in L
– Exercise: show that B also accepts <r><a/><a/></r>
Fact: no deterministic top-down tree automaton accepts
exactly L
Ranked tree automata: properties

•   Like for words
•   Determinization
•   Minimization
•   Closed under
– Complement
– Intersection
– Union
But…

• XML documents are unranked:
book (intro,section*,conclusion)
Automata on unranked trees

Unranked tree automata
δ2(∧, t) = {t}, δ2(∧, t, t) = {t}, δ2(∧, t, t, t) = {t}, ...
δ2(∧, f) = {f}, δ2(∧, t, f) = {f}, δ2(∧, f, t) = {f}, ...
δ2(∨, t) = {t}, δ2(∨, t, f) = {t}, δ2(∨, f, t) = {t}, ...
δ2(∨, f) = {f}, δ2(∨, f, f) = {f}, δ2(∨, f, f, f) = {f}, ...

Issue: represent an infinite set of transitions
Solution: a regular language
Unranked tree automata (2)

• Rule: δ(a, L(Q)) = {r1, …, rm}
• Meaning: if the states of the children of some
node labeled a form a word in L(Q), this node
moves to some state in {r1, …, rm}
δ2(∧, And1) = {t}  where And1 = t*
δ2(∧, And0) = {f}  where And0 = (t + f)* f (t + f)*
δ2(∨, Or1) = {t}   where Or1 = (t + f)* t (t + f)*
δ2(∨, Or0) = {f}   where Or0 = f*
Building on ranked trees
[Figure: an unranked tree over {a, b} next to its FirstChild-NextSibling encoding as a binary tree]

Ranked tree: FirstChild-NextSibling

F: encoding into a ranked tree
F is a bijection
F⁻¹: decoding
Building on
bottom-up ranked trees (2)
• For each unranked TA A, there is a ranked TA
accepting F(L(A))
• For each ranked TA A, there is an unranked TA
accepting F⁻¹(L(A))
• Both are easy to construct

Consequence: unranked TAs are closed under union,
intersection, complement
Determinization is also possible, but a bit more tricky

```
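A minimal Python sketch of the "Nondeterministic automaton: Example" slide. It assumes that the transitions out of q1, q2 and q3 that read no letter are ε-transitions (the `-` step in the runs). The automaton is simulated by following every run at once: the code keeps the set of reachable states and takes its ε-closure after each letter. The dictionary encoding and the function names are illustrative and do not come from the slides.

```python
# Assumption: transitions that read no letter (from q1, q2, q3) are epsilon moves,
# encoded here with the empty string EPS.
EPS = ""

# delta maps (symbol, state) -> set of successor states
DELTA = {
    ("a", "q0"): {"q0", "q1"},
    ("b", "q1"): {"q0"},
    (EPS, "q1"): {"q2"},
    (EPS, "q2"): {"q3"},
    (EPS, "q3"): {"q3"},
}
START = "q0"
FINAL = {"q2"}


def eps_closure(states):
    """All states reachable from `states` using epsilon transitions only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((EPS, q), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def accepts(word):
    """Follow every run at once by keeping the set of current states."""
    current = eps_closure({START})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= DELTA.get((symbol, q), set())
        current = eps_closure(moved)
    return bool(current & FINAL)


print(accepts("abaab"))  # False (KO)
print(accepts("aba"))    # True (OK)
```

The two calls reproduce the KO and OK outcomes of the runs shown on the slide.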
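The "Determinization" bullet in the first "Reminder" slide refers to the subset construction. In the minimal Python sketch below, each state of the new automaton is a frozen set of states of the original one, and only the subsets reachable from the start are built. It is applied to the automaton from the "Nondeterministic automaton: Example" slide. As before, the transitions that read no letter are assumed to be ε-moves, and every name is hypothetical.

```python
from collections import deque

EPS = ""  # assumption: epsilon is encoded as the empty string


def eps_closure(delta, states):
    """Epsilon-closure of a set of NFA states, returned as a hashable frozenset."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((EPS, q), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def determinize(delta, alphabet, start, final):
    """Subset construction: each DFA state is a set of NFA states.

    Only reachable subsets are built, but there can still be up to 2^|Q|
    of them (the possible exponential blow-up).
    """
    d_start = eps_closure(delta, {start})
    d_delta, seen, queue = {}, {d_start}, deque([d_start])
    while queue:
        s = queue.popleft()
        for x in alphabet:
            moved = set()
            for q in s:
                moved |= delta.get((x, q), set())
            t = eps_closure(delta, moved)
            d_delta[(s, x)] = t
            if t not in seen:
                seen.add(t)
                queue.append(t)
    d_final = {s for s in seen if s & final}
    return d_start, seen, d_delta, d_final


def dfa_accepts(dfa, word):
    """Deterministic run: exactly one successor per (state, letter)."""
    d_start, _, d_delta, d_final = dfa
    state = d_start
    for x in word:
        state = d_delta[(state, x)]
    return state in d_final


NFA_DELTA = {
    ("a", "q0"): {"q0", "q1"},
    ("b", "q1"): {"q0"},
    (EPS, "q1"): {"q2"},
    (EPS, "q2"): {"q3"},
    (EPS, "q3"): {"q3"},
}
dfa = determinize(NFA_DELTA, ("a", "b"), "q0", {"q2"})
print(len(dfa[1]))  # 3 subsets: {q0}, {q0, q1, q2, q3} and the empty set
```

The empty set acts as the sink state that makes the resulting automaton complete.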
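The "Reminder (2)" slide says that closure under union and intersection comes from product automata with states (s, s'). The Python sketch below builds that product for two complete deterministic word automata. It runs both components in lockstep and changes only the accepting condition: "both" for intersection, "either" for union. The two example languages (an even number of a's, and words ending with b) are hypothetical choices made for the illustration.

```python
from itertools import product

# A DFA is a tuple (states, alphabet, delta, start, final)
# with delta[(state, symbol)] -> state (complete and deterministic).


def product_dfa(A, B, mode="intersection"):
    """States are pairs (s, t): s from A, t from B; both move in lockstep."""
    states_a, alphabet, delta_a, start_a, final_a = A
    states_b, _, delta_b, start_b, final_b = B  # assumes the same alphabet
    states = set(product(states_a, states_b))
    delta = {((s, t), x): (delta_a[(s, x)], delta_b[(t, x)])
             for (s, t) in states for x in alphabet}
    if mode == "intersection":
        final = {(s, t) for (s, t) in states if s in final_a and t in final_b}
    else:  # union
        final = {(s, t) for (s, t) in states if s in final_a or t in final_b}
    return states, alphabet, delta, (start_a, start_b), final


def run(dfa, word):
    _, _, delta, state, final = dfa
    for x in word:
        state = delta[(state, x)]
    return state in final


# Hypothetical examples: an even number of a's; words ending with b.
EVEN_A = ({"e", "o"}, ("a", "b"),
          {("e", "a"): "o", ("o", "a"): "e", ("e", "b"): "e", ("o", "b"): "o"},
          "e", {"e"})
ENDS_B = ({"n", "y"}, ("a", "b"),
          {("n", "a"): "n", ("y", "a"): "n", ("n", "b"): "y", ("y", "b"): "y"},
          "n", {"y"})

both = product_dfa(EVEN_A, ENDS_B)
either = product_dfa(EVEN_A, ENDS_B, mode="union")
print(run(both, "aab"), run(both, "aa"), run(either, "aa"))  # True False True
```

Complement, the remaining closure property on that slide, needs no product: for a complete deterministic automaton it is enough to swap accepting and non-accepting states.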
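The slide "Why deterministic top-down is weaker?" rests on a simple observation. A deterministic top-down automaton sends the root's children to one fixed pair of states (q, q') whatever the leaves are labeled, so each leaf position is checked independently. The Python sketch below illustrates this by brute force; it is not a proof. It enumerates a hypothetical family of small automata: 3 states, initial state 0, and leaves accepted according to a set of (label, state) pairs. Every automaton that accepts both trees of L also accepts <r><a/><a/></r>.

```python
from itertools import chain, combinations, product

# Hypothetical search space: deterministic top-down automata with states {0, 1, 2},
# initial state 0. For the depth-one trees r(x, y) only two things matter:
#   root_pair: the unique pair (q, q') that delta(r, 0) sends the two children to;
#   leaf_ok:   the set of (leaf label, state) pairs accepted at a leaf.
STATES = range(3)
LEAF_LABELS = ("a", "b")


def accepts(root_pair, leaf_ok, left, right):
    """Run on <r><left/><right/></r>: both leaves must be accepted in their states."""
    q_left, q_right = root_pair
    return (left, q_left) in leaf_ok and (right, q_right) in leaf_ok


def all_subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))


checked = counterexamples = 0
for root_pair in product(STATES, STATES):
    for leaf_ok in map(set, all_subsets(product(LEAF_LABELS, STATES))):
        # Keep only automata that accept both trees of L ...
        if accepts(root_pair, leaf_ok, "a", "b") and accepts(root_pair, leaf_ok, "b", "a"):
            checked += 1
            # ... and test whether they also accept <r><a/><a/></r>.
            if not accepts(root_pair, leaf_ok, "a", "a"):
                counterexamples += 1

print(checked > 0, counterexamples)  # True 0
```

Acceptance in both trees puts (a, q) and (a, q') among the accepted leaf pairs, which is exactly what <r><a/><a/></r> needs. A bottom-up automaton avoids the problem because it sees both leaf labels before deciding the root's state.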
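On the "Unranked tree automata (2)" slide, each rule δ(a, L(Q)) uses a regular language over child states. The Python sketch below implements that idea with the standard `re` module. Each child state is encoded as one character, and a rule fires when `Pattern.fullmatch` accepts the word formed by the children's states. It assumes And1 = t* and Or0 = f*, and the leaf labels "true"/"false" are hypothetical.

```python
import re

# Hypothetical encoding: each child state is one character ('t' or 'f'), so the
# word formed by the children's states is an ordinary string and each horizontal
# language can be a Python regular expression.
RULES = [
    ("and", re.compile(r"t*"), "t"),             # And1 = t*
    ("and", re.compile(r"[tf]*f[tf]*"), "f"),    # And0 = (t + f)* f (t + f)*
    ("or", re.compile(r"[tf]*t[tf]*"), "t"),     # Or1  = (t + f)* t (t + f)*
    ("or", re.compile(r"f*"), "f"),              # Or0  = f*
]
LEAVES = {"true": "t", "false": "f"}


def state_of(tree):
    """Bottom-up run on an unranked tree given as (label, list_of_children)."""
    label, children = tree
    if label in LEAVES:
        return LEAVES[label]
    word = "".join(state_of(child) for child in children)
    for rule_label, language, target in RULES:
        if rule_label == label and language.fullmatch(word):
            return target
    raise ValueError(f"no rule for {label!r} with children states {word!r}")


TRUE, FALSE = ("true", []), ("false", [])
circuit = ("or", [("and", [TRUE, TRUE, TRUE]), ("and", [TRUE, FALSE]), FALSE])
print(state_of(circuit))  # t
```

For each label the two languages are disjoint and together cover {t, f}*, so exactly one rule applies at every inner node and the automaton is deterministic. One finite rule per language replaces the infinite list of transitions on the previous slide.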
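The "Building on ranked trees" slide encodes unranked trees as binary trees through FirstChild-NextSibling. The Python sketch below implements the encoding F and the decoding F⁻¹ and checks that they round-trip on a small example. The tuple representations and the example tree are hypothetical.

```python
# Hypothetical representations:
#   unranked tree: (label, [child, child, ...])
#   ranked (binary) tree: (label, first_child, next_sibling), with None for "absent"


def encode_forest(siblings):
    """F on a list of siblings: the first sibling becomes the node, its children are
    encoded as the left subtree and the remaining siblings as the right subtree."""
    if not siblings:
        return None
    (label, children), rest = siblings[0], siblings[1:]
    return (label, encode_forest(children), encode_forest(rest))


def encode(tree):
    return encode_forest([tree])


def decode_forest(node):
    """F^-1: follow next-sibling links, decoding each first-child subtree recursively."""
    siblings = []
    while node is not None:
        label, first_child, next_sibling = node
        siblings.append((label, decode_forest(first_child)))
        node = next_sibling
    return siblings


def decode(binary):
    (tree,) = decode_forest(binary)  # the encoded root never has a next sibling
    return tree


tree = ("a", [("b", []), ("b", [("a", []), ("b", [])]), ("a", [])])
print(encode(tree))
print(decode(encode(tree)) == tree)  # True
```

F is a bijection between unranked trees and binary trees whose root has no right child. Because of this, an automaton on one side can be translated into one on the other, as the slide "Building on bottom-up ranked trees (2)" states.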