VLDB 2007
Vienna, Austria

Matching Twigs in Probabilistic XML
Benny Kimelfeld & Yehoshua Sagiv
The Selim and Rachel Benin School of Engineering and Computer Science
 Example: Scanning Aerial Photography
  Find regions that include a factory building and a road
               … with a high probability




VLDB 2007         Matching Twigs in Probabilistic XML
 Analyzing a Region
   What is the probability that this region is an answer
   (i.e., includes a factory building and a road)?
   [Figure: objects detected in the region, with their probabilities: road (90%),
    road (60%), house (50%) / factory bldg. (50%), factory bldg. (40%) / apt. building (50%),
    and factory bldg. & wall (40%) / house & road (30%); the individual matches
    have probabilities 45%, 36%, 36% and 24%]
   The probability of each match can be significantly smaller
   than the probability that there is any match
   But specifying the probability of each match does not answer the question!
 A Database Point of View
   Query (a twig pattern):
     *
       region
         road
         factory
           building
   Querying probabilistic data:
     Each answer has an amount of certainty:
     the probability of being obtained when querying a random database
   Probabilistic data: a prob. process for generating random data
     [Figure: a probabilistic document: aerial-photo / region with a factory
      (two buildings, areas 200m² and 550m²), a compound (a road of width 2m
      and a house) and a road of width 4m]

 What Query Should We Pose?
   A pattern (* over region; region has children road and factory;
   factory has child building):
     • An answer is a match
     • What is the probability of each specific match?
     • What is the probability of each pair of road & factory building?
   A pattern w/ projection (the same pattern, projected on region):
     • An answer is a projection of one or more matches
     • What is the prob. of each answer after the projection?
     • For each region, what is the prob. that it has some pair of
       road & factory building?
   This is what we need!
 Another Example
            Find the following objects in one region:
      A factory building, a road, an antenna, a heliport, a track




 Finding a Partial Match
   Find the following objects in one region:
   A factory building, a road, an antenna, a heliport, a track
   [Figure: the region contains a heliport (80%), a road (90%) and a
    factory bldg. w/ antennas (50%) / apt. building w/ water tanks (30%);
    the resulting partial match has probability 36%]
   No track!
   For many applications, that's good enough …

 What If …
   [Figure: the same region, now also with a track (20%), next to the heliport (80%),
    the road (90%) and the factory bldg. w/ antennas (50%) / apt. building w/ water
    tanks (30%); the complete match has probability 7.2%]
   The probability may be too low to be of any interest!
   Should we just filter out the whole match?
   Does not make sense! What about the previous partial match?
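The percentages in the two figures are consistent with the objects being detected independently of one another. A quick check of the arithmetic, under that independence assumption (ours, read off the figures rather than stated on the slides):

```python
# Probabilities read off the figures; treating them as independent events
# is an assumption made only for this illustration.
heliport = 0.8
road = 0.9
factory_with_antennas = 0.5
track = 0.2

partial = heliport * road * factory_with_antennas   # the track is not required
complete = partial * track                           # the track is required too

print(f"partial match:  {partial:.1%}")    # partial match:  36.0%
print(f"complete match: {complete:.1%}")   # complete match: 7.2%
```

Requiring the one missing object multiplies in its probability, which is why the complete match falls far below the partial one.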

 Finding Maximal Matches
   A pattern: * over region; region has children factory, road, track,
   antenna and heliport; factory has child building
   The goal is to find the maximal partial matches
   that have a sufficient probability
   Probabilistic data:
     [Figure: a probabilistic document: aerial-photo / region with a factory
      (two buildings, areas 200m² and 550m²), a compound (a road of width 2m
      and a house) and a road of width 4m]


 Querying Prob. Data: Earlier Work
     • Projection and incomplete semantics were
       explored for relational models
            – Projection: Very simple queries can be highly
              intractable (data complexity) [Dalvi & Suciu, VLDB 04]
            – Maximally joining relations: Tractable under data
              complexity, generally intractable under query-and-
              data complexity [Kimelfeld & Sagiv, PODS 07]
               • Yet tractable for important classes of schemas

     • None of these paradigms has been studied in the context
       of prob. XML (only complete matches w/o projection)
        But they are more relevant to prob. XML,
        since, as the paper shows, they become tractable
 Content of the Paper
   Query evaluation over probabilistic XML:
   efficient algorithms and complexity analysis
   for various paradigms of querying
     • Evaluating twig queries with projection
     • Evaluating Boolean twig queries
     • Finding maximal matches of twigs
   In the paper, we also have some preliminary results
   on the combination of maximal matches and projection
   In the paper, we explain in detail why our results do not
   follow from previous results on XML/relational models

  Talk Overview

       1. Introduction
       2. Twig Queries over Probabilistic XML
            − XML and Twig Queries
            − Probabilistic XML
            − Querying Probabilistic XML (Complete Semantics)

       3. Query Evaluation (Complete Semantics)
       4. Finding Maximal Matches
       5. Conclusion, Related and Future Work

 (Ordinary) XML Documents
   A rooted tree:
     aerial-photo
       region
         factory
           complex
             park.lot
             heliport
           bldg.
           @area = 2.5km²
         field
           @area = 1.3km²
           heliport
   Each node has a tag, a value or both

 Twig Queries
   A rooted tree with
   • child edges and descendant edges
   • node predicates over the tag and value
   • output nodes (projection), possibly more than one
   [Figure: a twig with root *, and nodes region, heliport, factory, park. lot,
    and @area with the predicate ≥10km²]

 Matches and Answers
   A match of a twig T in a document d is a mapping from the nodes of T to those of d:
   • root(T) → root(d)
   • node predicates are satisfied
   • child edge → edge
   • desc. edge → path
   [Figure: the twig T (*, region, heliport, factory, park. lot, @area ≥10km²) and the
    document d (aerial-photo / region; factory with complex, bldg. and @area = 15km²;
    field with @area = 1.3km² and heliport; complex has park.lot and heliport)]
   An answer is obtained from a match by listing the images of the output nodes
   That is, applying projection to the match
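The definition can be sketched as a naive enumerator of matches followed by a projection. This is a minimal sketch, not the paper's algorithm: `Node`, `Twig`, `matches` and `answers` are hypothetical names, node predicates are reduced to a tag test (with `*` matching any tag), and values are ignored.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical names; a naive sketch, not the paper's implementation.
@dataclass(eq=False)          # nodes are compared (and hashed) by identity
class Node:
    tag: str
    children: list = field(default_factory=list)

@dataclass(eq=False)
class Twig:
    tag: str                  # "*" matches every tag; values are ignored here
    edge: str = "child"       # edge from the parent: "child" or "desc"
    output: bool = False      # output node, i.e., kept by the projection
    children: list = field(default_factory=list)

def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)

def matches(t, n):
    """Yield every match (dict: twig node -> document node) that maps t to n."""
    if t.tag != "*" and t.tag != n.tag:
        return
    candidates = []
    for tc in t.children:
        targets = n.children if tc.edge == "child" else list(descendants(n))
        candidates.append([m for d in targets for m in matches(tc, d)])
    for combo in product(*candidates):   # one sub-match per twig child
        m = {t: n}
        for sub in combo:
            m.update(sub)
        yield m

def output_nodes(t):
    return ([t] if t.output else []) + [o for c in t.children for o in output_nodes(c)]

def answers(t, root):
    """Project every match of t (with root(t) -> root) onto the output nodes."""
    outs = output_nodes(t)
    return {tuple(m[o] for o in outs) for m in matches(t, root)}

region = Node("region", [
    Node("factory", [Node("complex", [Node("park.lot"), Node("heliport")]), Node("bldg.")]),
    Node("field", [Node("heliport")]),
])
doc = Node("aerial-photo", [region])

# Hypothetical twig: * with a descendant region (output) that has a descendant heliport
q = Twig("*", children=[Twig("region", "desc", output=True,
                             children=[Twig("heliport", "desc")])])
print(len(list(matches(q, doc))))   # 2
print(len(answers(q, doc)))         # 1
```

Both matches map region to the same document node and differ only on the heliport, so after projection they yield a single answer: an answer may come from one or more matches.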
 Boolean Queries
   A twig without output nodes is a Boolean twig
   The answer is either true or false
   B(d) = true means that there is a match of B in d
   [Figure: a Boolean twig B (*, region, heliport, factory, park. lot, @area ≥10km²)
    and the document d (aerial-photo / region; factory with complex, bldg. and
    @area = 15km²; field with @area = 1.3km² and heliport)]
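Since a match only has to be a mapping (two twig nodes may share an image), each branch of a Boolean twig can be checked on its own, and B(d) can be decided without listing matches. A minimal sketch under simplifying assumptions: predicates only test the tag, `*` matches any tag, and `Node`, `Twig` and `holds` are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical names; a sketch of deciding B(d), not the paper's implementation.
@dataclass(eq=False)
class Node:                    # ordinary XML node (tag only)
    tag: str
    children: list = field(default_factory=list)

@dataclass(eq=False)
class Twig:                    # Boolean twig node: no output nodes
    tag: str                   # "*" matches every tag
    edge: str = "child"        # edge from the parent: "child" or "desc"
    children: list = field(default_factory=list)

def descendants(n):
    for c in n.children:
        yield c
        yield from descendants(c)

def holds(b, n):
    """True iff some match of twig b maps the root of b to document node n."""
    if b.tag != "*" and b.tag != n.tag:
        return False
    # Branches do not constrain each other: each needs one witness below n.
    return all(
        any(holds(bc, d) for d in (n.children if bc.edge == "child" else descendants(n)))
        for bc in b.children
    )

doc = Node("aerial-photo", [Node("region", [
    Node("factory", [Node("complex", [Node("park.lot"), Node("heliport")]), Node("bldg.")]),
    Node("field", [Node("heliport")]),
])])

# B(d) is evaluated at the root, since root(B) must map to root(d).
b1 = Twig("*", children=[Twig("region", "desc",
          children=[Twig("factory"), Twig("heliport", "desc")])])
b2 = Twig("*", children=[Twig("region", "desc", children=[Twig("heliport")])])
print(holds(b1, doc))   # True
print(holds(b2, doc))   # False: no heliport is a child of region
```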

 Probabilistic XML
   Probabilistic XML document:
     a probabilistic process of generating ordinary XML documents
   Random instance:
     an ordinary XML document d, generated with probability Pr(d)
     [Figure: a random instance d: aerial-photo / region / factory, field]
   $\sum_{d} \Pr(d) = 1$


 Implicit Representations
   In practice, the probability space may be huge
     E.g., uncertainty in many small pieces of data
   It is unrealistic to represent the probabilistic
   document by explicitly specifying the entire space
     [Figure: a few of the many possible instances of the aerial-photo document]
   We usually explore implicit representations,
   such as the following one that we consider:



 A ProTDB Document [Nierman & Jagadish 02]
   A rooted tree with
   • 2 types of nodes: ordinary nodes and distributional nodes
   • 2 types of distributions: independent and mutually exclusive
   [Figure: a ProTDB document rooted at aerial-photo / region; its ordinary nodes
    include neighborhood, factory, house, vehicle, building, size, type, park.lot,
    heliport, s, m, track and private; one edge probability shown is 0.8]
 A ProTDB Document [Nierman & Jagadish 02]
   A probability for each outgoing edge of a distributional node
   [Figure: the same ProTDB document, with probabilities (e.g., 0.8) on the edges
    leaving the distributional nodes]
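With these edge probabilities, the probability that a fixed set of ordinary nodes all appear in the random instance (for example, the image of one specific match) follows from the edges on their root paths: every distributional ancestor must choose the edge on the path, a mutually exclusive node cannot choose two of its children, and the choices of different distributional nodes (and of different children of an independent node) are independent. A minimal sketch; `PNode`, `add` and `pr_all_present` are hypothetical names, and each node stores the probability of the edge from its parent.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names; a sketch, not the paper's implementation.
@dataclass(eq=False)
class PNode:
    tag: str
    kind: str = "ord"                   # "ord", "ind" (independent) or "mux" (mutually exclusive)
    parent: Optional["PNode"] = None
    prob: float = 1.0                   # probability of the edge from a distributional parent

    def add(self, child, prob=1.0):
        """Attach child under self and return it."""
        child.parent, child.prob = self, prob
        return child

def pr_all_present(nodes):
    """Probability that every node in `nodes` appears in a random instance."""
    path_edges = {}                     # child -> parent, over the union of root paths
    for n in nodes:
        while n.parent is not None:
            path_edges[n] = n.parent
            n = n.parent
    required = {}                       # mux node -> the one child it must choose
    pr = 1.0
    for child, parent in path_edges.items():
        if parent.kind == "mux" and required.setdefault(parent, child) is not child:
            return 0.0                  # a mux node chooses at most one child
        if parent.kind != "ord":
            pr *= child.prob            # independent choices multiply
    return pr

r = PNode("r")
ind = r.add(PNode("ind", "ind"))
a, b = ind.add(PNode("a"), 0.5), ind.add(PNode("b"), 0.5)
mux = r.add(PNode("mux", "mux"))
a2, c = mux.add(PNode("a"), 0.25), mux.add(PNode("c"), 0.5)

print(pr_all_present([a, c]))    # 0.25
print(pr_all_present([a2, c]))   # 0.0: two children of the same mux node
```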
 Instance Generation: Step 1
   Traverse the tree top-down
   Distributional nodes choose a set of children:
   • Independent nodes: choose children independently
   • Mutually exclusive nodes: choose at most one child
   Drop unchosen children
   [Figure: the ProTDB document, with the unchosen children being dropped]
 Instance Generation: Step 2
   Drop the distributional nodes
   [Figure: the chosen nodes aerial-photo, region, neighborhood, factory, house,
    vehicle, size, type, heliport, s and track, with the distributional nodes removed]
 Instance Generation: Step 2
   Drop the distributional nodes
   Connect each ordinary node to its closest ordinary ancestor
   [Figure: the chosen ordinary nodes being reconnected]
 The Result: An Ordinary Document
   aerial-photo
     region
       neighborhood
         house
           size
             s
         vehicle
           type
             track
       factory
         heliport
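The two steps can be sketched as a sampler that returns one random instance; the single recursive pass makes the choices top-down (Step 1) and splices out distributional nodes as it returns (Step 2). A minimal sketch with hypothetical names (`PNode`, `Node`, `random_instance`); a child of an ordinary node is stored with probability 1.0, and the root is assumed to be ordinary.

```python
import random
from dataclasses import dataclass, field

# Hypothetical names; a sketch, not the paper's or ProTDB's implementation.
@dataclass
class PNode:
    tag: str
    kind: str = "ord"                               # "ord", "ind" or "mux"
    children: list = field(default_factory=list)    # (probability, PNode) pairs

@dataclass
class Node:                                         # node of the generated ordinary document
    tag: str
    children: list = field(default_factory=list)

def choose(p, rng):
    """Step 1: the children that p keeps."""
    if p.kind == "ord":
        return [c for _, c in p.children]           # ordinary nodes keep every child
    if p.kind == "ind":
        return [c for q, c in p.children if rng.random() < q]   # each child independently
    r, acc = rng.random(), 0.0                      # mux: at most one child
    for q, c in p.children:
        acc += q
        if r < acc:
            return [c]
    return []

def generate(p, rng):
    """Return the ordinary trees that p contributes to its closest ordinary ancestor.
    Step 2: a distributional node is dropped and its kept trees move up."""
    kids = [t for c in choose(p, rng) for t in generate(c, rng)]
    return [Node(p.tag, kids)] if p.kind == "ord" else kids

def random_instance(root, seed=None):
    rng = random.Random(seed)
    [doc] = generate(root, rng)                     # the root is assumed to be ordinary
    return doc

P = PNode("region", children=[
    (1.0, PNode("ind", "ind", [(0.9, PNode("road")), (0.2, PNode("track"))])),
    (1.0, PNode("mux", "mux", [(0.5, PNode("factory")), (0.3, PNode("house"))])),
])
d = random_instance(P, seed=1)
print([c.tag for c in d.children])   # varies with the seed
```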
 Querying Probabilistic XML
   Query: a twig w/ projection (* over region; region has children road
   and factory; factory has child building)
   Users pose an ordinary query
     That is, of the type that is applied to non-probabilistic documents
   … but the document is probabilistic
   [Figure: a probabilistic XML document: aerial-photo / region with a factory
    (two buildings, areas 200m² and 550m²), a compound (a road of width 2m
    and a house) and a road of width 4m]


 The Probability of an Answer
   When querying probabilistic data,
   each answer has a probability (certainty):
     $\Pr(A) = \Pr_P\big(A \text{ is obtained by applying } Q \text{ to a random document of } P\big)$,
     also written $\Pr(A \in Q(P))$
   [Figure: a random document of P: aerial-photo / region / factory, with
    complex (park.lot, heliport), bldg. and area 2.5km²]




 The Prob. of Satisfying a Boolean Query
   When querying probabilistic data,
   each answer has a probability (certainty)
   If B is a Boolean pattern, we are interested in:
     $\Pr_P\big(\text{there is a match of } B \text{ in a random document of } P\big) = \Pr\big(B(P) = \text{true}\big)$
   [Figure: a random document of P: aerial-photo / region / factory, with
    complex (park.lot, heliport), bldg. and area 2.5km²]
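For small documents, both $\sum_{d} \Pr(d) = 1$ and $\Pr(B(P) = \text{true})$ can be checked directly by enumerating every instance together with its probability. A minimal sketch with hypothetical names; the Boolean query is any Python predicate on a generated tree (a stand-in for twig evaluation), and because the number of instances grows exponentially, this only illustrates the definition and is not a practical evaluation method.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical names; brute force over all instances, for illustration only.
@dataclass
class PNode:
    tag: str
    kind: str = "ord"                               # "ord", "ind" or "mux"
    children: list = field(default_factory=list)    # (probability, PNode) pairs

def choices(p):
    """Yield (probability, kept children) for every choice that p can make."""
    kids = [c for _, c in p.children]
    if p.kind == "ord":
        yield 1.0, kids
    elif p.kind == "ind":
        for keep in product([True, False], repeat=len(kids)):
            pr = 1.0
            for k, (q, _) in zip(keep, p.children):
                pr *= q if k else 1 - q
            yield pr, [c for k, c in zip(keep, kids) if k]
    else:                                           # mux: exactly one child, or none
        for q, c in p.children:
            yield q, [c]
        yield 1 - sum(q for q, _ in p.children), []

def worlds(p):
    """Yield (probability, forest) for every outcome below p; a tree is (tag, children)."""
    for pr, kept in choices(p):
        if pr <= 0:
            continue                                # impossible choice
        for combo in product(*[list(worlds(c)) for c in kept]):
            prob, forest = pr, ()
            for q, f in combo:
                prob *= q
                forest += f
            yield prob, (((p.tag, forest),) if p.kind == "ord" else forest)

def pr_true(P, boolean_query):
    """Pr(B(P) = true): total probability of the instances d with B(d) = true."""
    return sum(pr for pr, (d,) in worlds(P) if boolean_query(d))

def contains(tag):
    """A stand-in Boolean query: some node of the instance carries `tag`."""
    def check(tree):
        t, kids = tree
        return t == tag or any(check(k) for k in kids)
    return check

P = PNode("r", children=[
    (1.0, PNode("ind", "ind", [(0.5, PNode("a")), (0.5, PNode("b"))])),
    (1.0, PNode("mux", "mux", [(0.25, PNode("a")), (0.5, PNode("c"))])),
])
print(sum(pr for pr, _ in worlds(P)))   # 1.0
print(pr_true(P, contains("a")))        # 0.625
```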



 Computational Problems
   Non-Boolean Queries:
     Input: A prob. document P, a non-Boolean twig
            query Q, a threshold p≥0
     Goal: Find all answers A, s.t. Pr(A∈Q(P))≥ p


   Boolean Queries:
    Input: A prob. document P, a Boolean twig query B
    Goal: Compute Pr(B(P)=true)


 From Regular to Boolean Queries
            We apply a standard reduction from regular queries
               (that generate mappings) to Boolean ones:

       1. Compute the answers as if the document is
          ordinary (i.e., ignore the distributional nodes)
       2. Compute the probability of each answer

        Step 2 is done by evaluating a Boolean query
               That is, computing the probability of a match
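Step 1 can be sketched as building the ordinary skeleton of the document: keep every ordinary node and attach it to its closest ordinary ancestor, as if every distributional node kept all of its children. Each random instance is obtained from this skeleton by removing subtrees, and a match in an instance is still a match in the skeleton, so the skeleton's answers include every answer that can occur. A minimal sketch with hypothetical names (`PNode`, `Node`, `skeleton`):

```python
from dataclasses import dataclass, field

# Hypothetical names; a sketch of Step 1 only, not the paper's implementation.
@dataclass
class PNode:
    tag: str
    kind: str = "ord"                               # "ord", "ind" or "mux"
    children: list = field(default_factory=list)    # (probability, PNode) pairs

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)

def skeleton(p):
    """Ignore the distributions: return the ordinary trees that p contributes
    to its closest ordinary ancestor, keeping every child."""
    kids = [t for _, c in p.children for t in skeleton(c)]
    return [Node(p.tag, kids)] if p.kind == "ord" else kids

P = PNode("r", children=[
    (1.0, PNode("ind", "ind", [(0.5, PNode("a")), (0.5, PNode("b"))])),
    (1.0, PNode("mux", "mux", [(0.25, PNode("a")), (0.5, PNode("c"))])),
])
[sk] = skeleton(P)
print([c.tag for c in sk.children])   # ['a', 'b', 'a', 'c']
```

The skeleton can hold nodes that never occur together (here, both children of the mutually exclusive node), which is why each candidate answer still needs the probability computation of Step 2.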

        Next, we consider the evaluation of Boolean queries

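 An Example
   [Figure: a probabilistic document P with root r, nodes labeled a, b, c, d and e,
    and edge probabilities 0.5, 0.2, 0.7, 0.5, 0.8, 0.8, 0.6 and 0.4; and a Boolean
    twig Q with nodes labeled *, a, b, c and d]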




 Possible Matches
   • Matches are not disjoint events
   • Matches are not independent events
   [Figure: several possible matches of Q in P, each highlighted on a copy of P]
 Our Approach: Dynamic Programming
   [Figure: a collection of queries (the original Q and variants of it), each labeled
    with a probability (0.0, 0.6, 0.0, 0.4, 0.0, 1.0), next to the document P]
   When visiting a node, evaluate a collection of
   queries (inc. the original one) over its subtree
   Document nodes are traversed bottom-up
 Our Approach: Dynamic Programming
   [Figure: the same collection of queries next to the document P]
   When visiting a node, evaluate a collection of
   queries (inc. the original one) over its subtree
   Document nodes are traversed bottom-up
   Special treatment if the visited node is distributional
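The bottom-up idea, including the special treatment of distributional nodes, can be illustrated on the simplest Boolean question: does some node with a given tag appear? For each node, the sketch computes the probability that no such node appears among what that node contributes; ordinary nodes multiply over their children, independent nodes weigh each child by its edge probability, and mutually exclusive nodes take a weighted sum. A minimal sketch with hypothetical names (`PNode`, `pr_none`, `pr_contains`); it does not attempt the collection of queries per node that general twigs require.

```python
from dataclasses import dataclass, field
from math import prod

# Hypothetical names; a bottom-up sketch for one tag, not the paper's algorithm.
@dataclass
class PNode:
    tag: str
    kind: str = "ord"                               # "ord", "ind" or "mux"
    children: list = field(default_factory=list)    # (probability, PNode) pairs

def pr_none(p, tags):
    """Pr(no node tagged in `tags` is among the ordinary nodes that p contributes),
    given that p is reached. Children are evaluated first (bottom-up)."""
    below = [(q, pr_none(c, tags)) for q, c in p.children]
    if p.kind == "ord":             # ordinary: all children kept, subtrees independent
        return 0.0 if p.tag in tags else prod(r for _, r in below)
    if p.kind == "ind":             # distributional, independent choices
        return prod((1 - q) + q * r for q, r in below)
    # distributional, mutually exclusive: no child, or exactly one child
    return (1 - sum(q for q, _ in below)) + sum(q * r for q, r in below)

def pr_contains(P, tag):
    """Pr(the random instance of P has some node with this tag)."""
    return 1 - pr_none(P, {tag})

P = PNode("r", children=[
    (1.0, PNode("ind", "ind", [(0.5, PNode("a")), (0.5, PNode("b"))])),
    (1.0, PNode("mux", "mux", [(0.25, PNode("a")), (0.5, PNode("c"))])),
])
print(pr_contains(P, "a"))   # 0.625
print(pr_contains(P, "c"))   # 0.5
```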
 Bottom-Up Evaluation
   How can we compute the probability that there is a
   match, based on previous results for the descendants?
   [Figure: the document P and the twig Q]
   Problem: Each specific match can
   involve several different children

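 From Twig to Negated Branches
   Write twigs in an XPath-like notation ("/" for a child edge, "[...]" for a
   branch). The twig T = *[a][*/d][*[b][c]] is the conjunction of its root
   branches T1 = */a, T2 = */*/d and T3 = */*[b][c]:

   $$T \equiv T_1 \wedge T_2 \wedge T_3
     \qquad\Longrightarrow\qquad
     \Pr(T) = 1 - \Pr(\neg T_1 \vee \neg T_2 \vee \neg T_3)$$

   Next: How to compute Pr(¬T1 ∨ ¬T2 ∨ ¬T3)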
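 From a Disjunction to Conjunctions
   For the branches T1 = */a, T2 = */*/d and T3 = */*[b][c], the principle of
   inclusion & exclusion turns the disjunction into a signed sum of conjunctions:

   $$\Pr(\neg T_1 \vee \neg T_2 \vee \neg T_3)
     = \Pr(\neg T_1) + \Pr(\neg T_2) + \Pr(\neg T_3)
     - \Pr(\neg T_1 \wedge \neg T_2) - \Pr(\neg T_2 \wedge \neg T_3) - \cdots$$

   Next: How to compute a conjunction such as Pr(¬T2 ∧ ¬T3)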
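As a quick numeric check of the inclusion-exclusion step, here is a minimal Python sketch. The joint distribution `WORLDS` is a made-up toy (each entry gives the probability of a possible world and which events E_i = ¬T_i hold in it), and `pr_conj` and `pr_disjunction` are hypothetical helper names, not taken from the paper.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Set, Tuple

# Toy joint distribution (made up for illustration): each entry is
# (probability of a possible world, indices i such that E_i = "not T_i" holds).
WORLDS: List[Tuple[float, Set[int]]] = [
    (0.2, {0}),
    (0.3, {1, 2}),
    (0.1, {0, 1, 2}),
    (0.4, set()),
]

def pr_conj(indices: FrozenSet[int]) -> float:
    """Pr(E_i holds for every i in indices), read off the toy distribution."""
    return sum(p for p, events in WORLDS if indices <= events)

def pr_disjunction(n: int, conj: Callable[[FrozenSet[int]], float]) -> float:
    """Pr(E_0 or ... or E_{n-1}) by inclusion-exclusion over nonempty subsets."""
    total = 0.0
    for k in range(1, n + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(range(n), k):
            total += sign * conj(frozenset(subset))
    return total

pr_some_branch_fails = pr_disjunction(3, pr_conj)  # 0.6 (up to rounding)
pr_twig = 1 - pr_some_branch_fails                 # 0.4: only the last world has no failing branch
```

Summing the worlds with at least one failing branch directly gives 0.2 + 0.3 + 0.1 = 0.6, matching the expansion. In this sketch the number of terms is 2^n − 1, i.e., exponential in the number of twig branches.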


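 From a Document to Branches
   [Figure: a document rooted at r, split into its branches]
   A document satisfies a conjunction of negated twig branches, such as
   ¬T2 ∧ ¬T3 (T2 = */*/d, T3 = */*[b][c]), iff each of the document branches
   satisfies the conjunction: every twig branch starts with a child edge, so
   each of its matches lies within a single document branch.

     Good news: Document branches are independent!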
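The independence claim can be checked by brute force on a tiny example. In the Python sketch below the document is hypothetical: r has two ordinary children x1 and x2 (always present), and each leaf under them is present independently with the listed probability. The sketch computes Pr(¬T2 ∧ ¬T3) once over the whole document and once as a product of per-branch probabilities; `branch_ok`, `branch_probability` and `whole_document_probability` are illustrative names.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical document: r has ordinary children x1 and x2; each leaf below
# them is present independently with the given probability.
BRANCHES: Dict[str, List[Tuple[str, float]]] = {
    "x1": [("b", 0.8), ("c", 0.6)],
    "x2": [("c", 0.8), ("d", 0.4)],
}

def branch_ok(labels: Set[str]) -> bool:
    """Does the branch through x satisfy not(T2) and not(T3)?

    T2 = */*/d matches in the branch iff x has a d child;
    T3 = */*[b][c] matches iff x has both a b child and a c child.
    """
    return "d" not in labels and not {"b", "c"} <= labels

def branch_probability(leaves: List[Tuple[str, float]]) -> float:
    """Pr(one branch satisfies the conjunction), enumerating its leaves."""
    total = 0.0
    for present in product([False, True], repeat=len(leaves)):
        weight, labels = 1.0, set()
        for (label, q), here in zip(leaves, present):
            weight *= q if here else 1 - q
            if here:
                labels.add(label)
        if branch_ok(labels):
            total += weight
    return total

def whole_document_probability() -> float:
    """Pr(every branch satisfies the conjunction), enumerating all leaves jointly."""
    leaves = [(x, label, q) for x, ls in BRANCHES.items() for label, q in ls]
    total = 0.0
    for present in product([False, True], repeat=len(leaves)):
        weight = 1.0
        labels: Dict[str, Set[str]] = {x: set() for x in BRANCHES}
        for (x, label, q), here in zip(leaves, present):
            weight *= q if here else 1 - q
            if here:
                labels[x].add(label)
        if all(branch_ok(ls) for ls in labels.values()):
            total += weight
    return total

product_of_branches = 1.0
for leaves in BRANCHES.values():
    product_of_branches *= branch_probability(leaves)
```

Here the branch through x1 contributes 1 − 0.8·0.6 = 0.52, the branch through x2 contributes 1 − 0.4 = 0.6, and the joint enumeration agrees with their product, 0.312.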

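 Using Previous Computations on Children
   Because the document branches B1, B2, B3 of r are independent,

   $$\Pr(\neg T_2 \wedge \neg T_3) = \prod_{i=1}^{3} \Pr_{B_i}(\neg T_2 \wedge \neg T_3)$$

   Cut the roots from both twig and doc. branches: with v_i the child of r
   in B_i, T2' = */d and T3' = *[b][c],

   $$\Pr_{B_i}(\neg T_2 \wedge \neg T_3) = \Pr_{v_i}(\neg T_2' \wedge \neg T_3')$$

   and each right-hand side is a previous result, computed when v_i was visited.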
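The steps so far can be combined into a small bottom-up evaluator. The Python sketch below works in a deliberately simplified model, which is an assumption and not the paper's full model: every document edge carries the probability that the child is present given its parent (roughly an independent distribution over each node's children), there are no mutually-exclusive choices, all twig edges are child edges, and a match need not be injective. The tuple encodings and the names `pr_match`, `pr_none`, `worlds` and `matches` are hypothetical. `pr_match` applies the inclusion-exclusion and per-child product steps, `pr_none` evaluates a conjunction of negated, root-cut branches at a child, and memoization stands in for reusing earlier results on the children. A brute-force enumeration of possible worlds is included as a check.

```python
from functools import lru_cache
from itertools import combinations, product

# Hypothetical encodings for this sketch:
#   document node = (label, ((prob, child), ...)); each child is present
#                   independently with prob, given that its parent is present
#   twig node     = (label, (branch, ...)); "*" matches any label, and every
#                   twig edge is a child edge

def subsets(items):
    for k in range(len(items) + 1):
        yield from combinations(items, k)

def label_ok(twig, node):
    return twig[0] == "*" or twig[0] == node[0]

@lru_cache(maxsize=None)
def pr_match(node, twig):
    """Pr(twig matches with its root mapped to node | node is present)."""
    if not label_ok(twig, node):
        return 0.0
    if not twig[1]:
        return 1.0
    total = 0.0
    # Pr(all branches match) = sum over subsets S of (-1)^|S| * Pr(every branch in S fails).
    for failing in subsets(twig[1]):
        # Children's subtrees are independent: multiply, over the children,
        # Pr(child absent, or no branch of S matches at the child).
        pr_all_fail = 1.0
        for p, child in node[1]:
            pr_all_fail *= (1 - p) + p * pr_none(child, failing)
        total += (-1) ** len(failing) * pr_all_fail
    return total

@lru_cache(maxsize=None)
def pr_none(node, twigs):
    """Pr(none of the twigs matches with its root mapped to node | node is present)."""
    relevant = tuple(t for t in twigs if label_ok(t, node))
    total = 0.0
    # Inclusion-exclusion again: several twigs matching at the same node is
    # one twig whose root carries all of their branches.
    for together in subsets(relevant):
        merged = ("*", tuple(b for t in together for b in t[1]))
        total += (-1) ** len(together) * pr_match(node, merged)
    return total

def worlds(node):
    """Yield (probability, deterministic tree) for every possible world."""
    label, children = node
    options = [[(1 - p, None)] + [(p * q, t) for q, t in worlds(child)]
               for p, child in children]
    for combo in product(*options):
        prob, kids = 1.0, []
        for q, t in combo:
            prob *= q
            if t is not None:
                kids.append(t)
        yield prob, (label, tuple(kids))

def matches(twig, tree):
    """Deterministic root-anchored matching (child edges only)."""
    if not label_ok(twig, tree):
        return False
    return all(any(matches(b, kid) for kid in tree[1]) for b in twig[1])

doc = ("r", (
    (1.0, ("a", ((0.8, ("b", ())), (0.5, ("c", ()))))),
    (0.7, ("e", ((0.6, ("d", ())), (0.4, ("c", ()))))),
    (0.5, ("a", ((0.9, ("d", ())),))),
))
twig = ("*", (("a", ()), ("*", (("d", ()),)), ("*", (("b", ()), ("c", ())))))

exact = pr_match(doc, twig)
brute_force = sum(p for p, world in worlds(doc) if matches(twig, world))
```

For this made-up document both values are 0.2724: the a-branch is certain, some child has a d child with probability 1 − 0.58·0.55 = 0.681, and the first a-child has both b and c with probability 0.4. The subset enumerations make the sketch exponential in the size of the twig.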




 Descendant Edges
     • In the computation we described, we assumed that the
       root has only child edges; it would not work otherwise!
     • What about descendant edges?

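       The corresponding twig branches are replaced (XPath-like notation,
       "//" for a descendant edge): a descendant of the root is either a
       child of the root or a descendant of one of its children, so

       $$\neg(\texttt{*//*[b][c]}) \equiv \neg(\texttt{*/*[b][c]}) \wedge \neg(\texttt{*/*//*[b][c]})$$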
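The rewriting of descendant edges can be sanity-checked on ordinary (deterministic) trees. The Python sketch below uses hypothetical tuple encodings with explicit edge types. `child_edge_alternatives` returns the child-edge branches whose disjunction is equivalent to a given branch, and the loop compares both sides on a few hundred random trees, which is a test rather than a proof.

```python
import random

# Hypothetical encodings: twig node = (label, ((edge, subtwig), ...)) with
# edge "/" (child) or "//" (descendant); tree node = (label, (child, ...)).

def descendants(tree):
    for kid in tree[1]:
        yield kid
        yield from descendants(kid)

def matches(twig, tree):
    """Root-anchored twig matching with child and descendant edges."""
    if twig[0] not in ("*", tree[0]):
        return False
    for edge, sub in twig[1]:
        targets = tree[1] if edge == "/" else descendants(tree)
        if not any(matches(sub, t) for t in targets):
            return False
    return True

def child_edge_alternatives(edge, sub):
    """Child-edge branches whose disjunction is equivalent to the branch (edge, sub).

    not(*//X) is equivalent to not(*/X) and not(*/*//X).
    """
    if edge == "/":
        return [sub]
    return [sub, ("*", (("//", sub),))]

def random_tree(depth, rng):
    label = rng.choice("abcd")
    if depth == 0:
        return (label, ())
    return (label, tuple(random_tree(depth - 1, rng) for _ in range(rng.randint(0, 2))))

X = ("*", (("/", ("b", ())), ("/", ("c", ()))))  # the sub-twig *[b][c]
original = ("*", (("//", X),))                    # *//*[b][c]

rng = random.Random(0)
agree = True
for _ in range(500):
    tree = random_tree(3, rng)
    rewritten = any(matches(("*", (("/", alt),)), tree)
                    for alt in child_edge_alternatives("//", X))
    agree = agree and (matches(original, tree) == rewritten)

# A tree where *[b][c] occurs only at depth two: the plain child-edge branch
# fails, but the second alternative */*//*[b][c] succeeds.
y = ("y", (("b", ()), ("c", ())))
deep_tree = ("r", (("x", (y,)),))
```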
 Missing Details
     • Creating the list of twigs that are evaluated over
       the subtree rooted at each visited node
     • Different evaluation methods, depending on the
       type of the visited node
            – Ordinary node (sketched in the previous slides)
            – Distributional node
               • Independent distribution
               • Mutually-exclusive distribution

     • Dealing with node predicates of the twig

        All the details of the algorithm are in the paper
 Efficiency
        The algorithm computes Pr(B(P)=true) in time

                                 $O(c^{|B|}\cdot|P|)$

            Is there an efficient algorithm under query-and-data
                 complexity (polynomial in the query also)?
            No! Computing Pr(B(P)=true) is #P-complete
                       under query & data complexity!

     Even if:
        • No desc. edges
        • Only independent distributions
        • ...
  Talk Overview

       1. Introduction
       2. Twig Queries over Probabilistic XML
            − XML and Twig Queries
            − Probabilistic XML
            − Querying Probabilistic XML (Complete Semantics)

       3. Query Evaluation (Complete Semantics)
       4. Finding Maximal Matches
       5. Conclusion, Related and Future Work

 Standard Terminology
   T0: a subtree of twig T that includes the root. A match m0 of T0 is a
   partial match of T.
   [Figure: twig T with root * and nodes a, b, c, d, e, f; the subtree T0 is marked]

   m2 subsumes m1 if m2 includes the mappings of m1; that is, m1 = m2 over
   domain(m1).
   [Figure: partial matches m1 and m2 in a probabilistic document rooted at r]
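A partial match m is an answer in the random document when every node in its image appears. Assume, as a simplification that excludes mutually-exclusive choices, that each document node is present independently with some probability given its parent. Then Pr(m) is a product over the union of the root paths of the image nodes. The Python sketch below uses a hypothetical document given as `parent`/`prob` dictionaries and illustrative twig-node names. It also shows how subsumption interacts with probabilities: if m2 subsumes m1, the image of m1 is contained in the image of m2, so Pr(m2) ≤ Pr(m1).

```python
from typing import Dict, Iterable, Optional

# Hypothetical document: parent[v] is v's parent (None for the root), and
# prob[v] is the probability that v is present given that its parent is.
parent: Dict[str, Optional[str]] = {"r": None, "n1": "r", "n2": "r", "n3": "n1", "n4": "n2"}
prob: Dict[str, float] = {"r": 1.0, "n1": 0.5, "n2": 0.7, "n3": 0.8, "n4": 0.6}

def pr_present(image: Iterable[str]) -> float:
    """Pr(every node in image is present) = product over the union of their root paths."""
    needed = set()
    for v in image:
        node: Optional[str] = v
        # Stop early once we reach a node whose root path is already included.
        while node is not None and node not in needed:
            needed.add(node)
            node = parent[node]
    result = 1.0
    for v in needed:
        result *= prob[v]
    return result

def subsumes(m2: Dict[str, str], m1: Dict[str, str]) -> bool:
    """m2 subsumes m1: m1 = m2 over domain(m1)."""
    return all(q in m2 and m2[q] == v for q, v in m1.items())

# Partial matches map twig nodes to document nodes (names are illustrative).
m1 = {"root": "r", "a": "n1"}
m2 = {"root": "r", "a": "n1", "b": "n3", "e": "n2"}

pr_m1 = pr_present(m1.values())  # 0.5
pr_m2 = pr_present(m2.values())  # 0.5 * 0.8 * 0.7 = 0.28
```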
 Maximal Answer: Definition
   m is a maximal answer if:
     Ordinary data:
       ∄ m0 such that m0 ≠ m and m0 subsumes m
     Probabilistic data:
       • Pr(m) ≥ threshold
       • ∀ m0, if m0 ≠ m and m0 subsumes m, then Pr(m0) < threshold
       In other words, m is maximal among the partial answers with a
       sufficient probability.
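The probabilistic definition translates directly into a brute-force filter. The Python sketch below assumes that the relevant partial matches and their probabilities are already listed (the matches and probabilities here are made up). It only restates the definition and is quadratic in the number of candidates. It is not the incremental-polynomial algorithm of the paper.

```python
from typing import Dict, List

Match = Dict[str, str]  # twig node -> document node (illustrative encoding)

def subsumes(m2: Match, m1: Match) -> bool:
    """m2 subsumes m1: m1 = m2 over domain(m1)."""
    return all(q in m2 and m2[q] == v for q, v in m1.items())

def maximal_answers(candidates: List[Match], pr: List[float], threshold: float) -> List[Match]:
    """Keep m if Pr(m) >= threshold and every other candidate subsuming m is below it."""
    result = []
    for m, p in zip(candidates, pr):
        if p < threshold:
            continue
        dominated = any(
            other != m and subsumes(other, m) and p_other >= threshold
            for other, p_other in zip(candidates, pr)
        )
        if not dominated:
            result.append(m)
    return result

# Hypothetical partial matches with made-up probabilities.
m1 = {"root": "r", "a": "n1"}
m2 = {"root": "r", "a": "n1", "b": "n3"}
m3 = {"root": "r", "a": "n1", "b": "n3", "e": "n2"}
candidates, probs = [m1, m2, m3], [0.5, 0.4, 0.28]
```

With threshold 0.3 only m2 is returned: m1 is subsumed by m2, whose probability 0.4 still reaches the threshold, while m3 falls below it. With threshold 0.45 the answer is m1, and with 0.2 it is m3.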
 The Computational Problem



     Input: A probabilistic document P, a twig pattern T,
            a threshold p≥0
     Goal: Find all maximal matches of T in P w.r.t. p




 Complexity of Finding Maximal Matches
     • It is trivial to show that maximal matches can be found
       efficiently under data complexity
     • Unlike the case of complete matches (NP-complete), maximal
       matches can be computed efficiently under query-and-data
       complexity


                    Evaluation Algorithm
        • The algorithm runs in incremental polynomial time
        • All the details are in the paper …



 Paper Summary
   • Query evaluation over probabilistic XML is
     investigated
       – Known data model
       – Twig patterns (node predicates, child & desc. edges)
       – Complete & maximal semantics, projection
   • Evaluation algorithm for Boolean queries
       – Also used for evaluating queries with projection
       – Efficient under data complexity
   • An algorithm for finding the maximal matches
       – Efficient under query-and-data complexity
   • Analysis of the complexity of querying prob. XML

 Complexity Results

   Semantics   Query type       Data complexity   Query & data complexity
   Complete    w/o projection   Poly.             NP-complete
   Complete    Boolean          Poly.             #P-complete
   Complete    w/ projection    Poly.             #P-complete
   Maximal     w/o projection   Poly.             Inc. Poly.
   Maximal     w/ projection    Poly.             Open

 Other Models of Probabilistic XML
   The complexity results in the different prob. XML models are part of our
   ongoing research. Query evaluation (complete semantics w/ projection):
     • Fuzzy trees [Abiteboul & Senellart, 2006]: #P-complete
     • Our model and ProTDB [Nierman and Jagadish, 2002]: tractable
     • Simple prob. trees [Abiteboul & Senellart, 2006]: tractable
     • PXML [Hung, Getoor & Subrahmanian, 2003]: tractable for tree
       documents, #P-hard for DAG documents
 Ongoing and Future Work
        Implementing a system for representing and
        querying probabilistic XML
        Optimization of the proposed algorithms
            – We already obtained significant improvements, both
              experimentally and analytically
        Extending the expressiveness of the model of
        probabilistic XML
            – New types of distributional nodes
            – Ongoing work: A combination of ProTDB [Nierman and
              Jagadish, 2002] and PXML [Hung, Getoor &
              Subrahmanian, 2003]
        Combining incompleteness and projection
              Thank you!

                   Questions?



