Location-Based Services &
Continuous kNN Query Processing

                Tai Do
       Data Systems Group, UCF
               Fall 2005
Outline
   Introduction to Data Management in
    Mobile Computing.
   Discussion on Location-Based Services
    and its enabling technologies.
   In-depth discussion on Continuous kNN
    queries (two recent papers: [MHP05] and
    [XMA05])
    Data Management in Mobile
    Computing
       Our interest: application-driven research that
        involves data management in mobile computing.
       Services/applications that inspire data management
        research:
         Location-based services, Transactional services, Data
          mining applications.
       Research problems to support these novel services
        efficiently:
         Spatiotemporal Query Processing.
         Data dissemination over limited bandwidth channels.
         Data consistency guarantees.
         Advanced interfaces for mobile computers.
Location-Based Services (LBS)
   Location-Based Services
        can be defined as services that integrate a mobile device’s location or position with
         other information so as to provide added value to a user.
   Examples:
        Military and Government industries
        Emergency services (E911 in US and 112 in Europe)
        Commercial Sector: Advanced Traveler Information Systems (DoT), location-aware
         games, Advertising services
   Commercial potential of LBS ([S03]):
        Optimistic prediction: $4B by 2002, $81.9B by 2005 (Europe only)
        Pessimistic prediction: $11M by 2002, $167M by 2005 (USA only).
   Enabling Technologies:
        Mobile Positioning Methods
        Location Update Techniques
        Location-based Query Processing
Mobile Positioning
   GPS: Global Positioning System. Accuracy: on the order of
    3 meters, sometimes worse.
   Cell-ID (Europe): Accuracy: 100 m to 3 km.
    [Figure: Overview of LBS applications and the level of accuracy they require ([SV04])]
Location Update Techniques
   Dead-Reckoning Location Update
    Policies ([GS05])
   Periodic Updates
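To make the difference between the two update policies concrete, here is a minimal Python sketch. It assumes a device that samples its own position as (t, x, y) tuples and a server that extrapolates linearly from the last reported position and velocity; the function names and the threshold are illustrative, not the specific policies catalogued in [GS05].

```python
import math

# A track is a list of (t, x, y) position samples taken on the device.
Sample = tuple[float, float, float]


def periodic_reports(track: list[Sample], period: float) -> list[float]:
    """Periodic policy: report whenever `period` time units have passed since the last report."""
    reports: list[float] = []
    for t, _, _ in track:
        if not reports or t - reports[-1] >= period:
            reports.append(t)
    return reports


def dead_reckoning_reports(track: list[Sample], threshold: float) -> list[float]:
    """Dead-reckoning policy (illustrative): report only when the actual position
    deviates from the server's linear prediction (last reported position +
    velocity * elapsed time) by more than `threshold`."""
    reports: list[float] = []
    ref_t = ref_x = ref_y = 0.0   # last reported time and position
    vx = vy = 0.0                  # last reported velocity
    prev = None                    # previous sample, used to estimate velocity
    for t, x, y in track:
        if prev is None:
            send = True            # the first sample is always reported
        else:
            px = ref_x + vx * (t - ref_t)
            py = ref_y + vy * (t - ref_t)
            send = math.hypot(x - px, y - py) > threshold
        if send:
            if prev is not None and t > prev[0]:
                # Velocity estimated from the two most recent samples.
                vx = (x - prev[1]) / (t - prev[0])
                vy = (y - prev[2]) / (t - prev[0])
            ref_t, ref_x, ref_y = t, x, y
            reports.append(t)
        prev = (t, x, y)
    return reports
```

For an object moving in a straight line at unit speed, sampled at t = 0, 1, ..., 9, a threshold of 0.5 produces reports only at t = 0 and t = 1, while a period of 3 produces reports at t = 0, 3, 6 and 9 regardless of how predictable the motion is.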
Concept of Uncertainty
   Uncertainty is an inherent feature in
    databases storing location information.
   Sources of uncertainty:
       Mobile Positioning Methods
       Location Update Techniques
   Capturing uncertainty in the model and
    query language is an ongoing research topic.
Location-Based Queries
   Two kinds of location-based queries:
       Snapshot queries: “Tell me the 3 nearest cars around
        me now”
       Continuous queries: “Monitor the 3 nearest
        restaurants around me for the next 10 minutes”
   We focus on continuous kNN (CkNN) query
    processing.
       Main-memory solution: Conceptual Partitioning
        Monitoring (CPM) [MHP05]
       Disk-based solution: Shared Execution Algorithm
        (SEA-CNN) [XMA05]
       Common Assumptions
  Parameter             Possible values
  Underlying network    Unconstrained (Euclidean) | Transportation network (shortest path)
  Movement pattern      Unpredictable | Trajectory
  Location update       Query-aware (safe region) | Query-blind (periodic or dead reckoning)
  Mutability            Moving queries over static objects | Static queries over moving
                        objects | Moving queries over moving objects
  Processing type       Distributed | Centralized
  Storage               Disk-resident | Main memory
SEA-CNN: Overview
   Overview:
       Objects are stored on disk; everything else is in memory.
       Centralized processing.
       Supports all kinds of mutability between objects and queries.
       No assumed movement pattern; objects move in open (Euclidean) space.
   Goal:
       Minimize I/O cost and CPU time.
   Two important features:
       Incremental evaluation of queries
       Shared execution
        SEA-CNN: Data Structures
       Query Table (QT): one entry <Qid, Loc, k, AR, SR> per query.
       Answer Grid Table (AG), N x N: each cell holds the list of Qids
        whose SR overlaps the cell.
       Object Table (OT), N x N: each cell holds entries <Ob.ID, ob.Loc>.
       Object Buffer (OB), N x N, and Query Buffer (QB).
         SEA-CNN: Incremental Search
   Key points:
        For each query q, define a search region based on the past answer and on the
         recent movements of q and of the objects.
         Only objects inside the search region are checked against q.
   Let q.ARt0 be the answer radius of q at time t0 (q.AR = distance from
    q to its kth-NN object).
    At time t1, the search radius of query q (q.SRt1) is computed
    as follows:
        Step 1: check whether any object moves into q.ARt0 during [t0, t1]. If yes,
         q.SRt1 = q.ARt0; if no, q.SRt1 = 0.
        Step 2: check whether any object that was inside q.ARt0 moves out of it
         during [t0, t1]. If yes, q.SRt1 equals the distance from q to the furthest
         such object (at its new location).
        Step 3: check whether q moves during [t0, t1]. If yes:
             If q.SRt1 = 0, then q.SRt1 = q.ARt0 + |q.Loct1 - q.Loct0|
             If q.SRt1 != 0, then q.SRt1 = q.SRt1 + |q.Loct1 - q.Loct0|
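Steps 1 to 3 can be written as a small function. This is a sketch under stated assumptions: Euclidean distance, Step 1 read as "some moved object now lies inside q.ARt0", and the Step 2 distance measured to the departed object's location at t1; the function and parameter names are hypothetical.

```python
import math

Point = tuple[float, float]


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def search_radius(q_t0: Point, q_t1: Point, ar_t0: float,
                  answer: set[str],
                  moved: dict[str, tuple[Point, Point]]) -> float:
    """Search radius q.SR at t1, following Steps 1-3 (hypothetical helper).

    answer: ids of q's kNN answer at t0.
    moved:  object id -> (location at t0, location at t1), for objects that moved in [t0, t1].
    """
    sr = 0.0
    # Step 1: some moved object now lies inside the old answer region.
    if any(dist(q_t0, new) <= ar_t0 for _, new in moved.values()):
        sr = ar_t0
    # Step 2: some answer object left the answer region; cover its new location,
    # so that all k previous answers are still inside the search region.
    departed = [dist(q_t0, new) for oid, (_, new) in moved.items()
                if oid in answer and dist(q_t0, new) > ar_t0]
    if departed:
        sr = max(departed)
    # Step 3: the query itself moved; enlarge the radius by its displacement.
    shift = dist(q_t0, q_t1)
    if shift > 0:
        sr = (ar_t0 if sr == 0 else sr) + shift
    return sr
```

Adding the displacement in Step 3 keeps the old circle (centered at q.Loct0) inside the new one (centered at q.Loct1), which is why the radius is enlarged rather than recomputed.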
   SEA-CNN: Incremental Search
   (An Example)
Q1: O5 and Q1 move during [T0, T1]. So Q1.SRT1 = Q1.ART0 + |Q1.LocT1 - Q1.LocT0|.
Q2: O8 moves out of Q2.ART0 during [T0, T1]. So Q2.SRT1 = |Q2.LocT1 - O8.LocT1|.
    SEA-CNN: Shared Execution
   Key points:
      Use shared execution to reduce repeated I/O operations.
      Group similar queries together; evaluating this set of queries then
       reduces to a spatial join between the objects and the queries.
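Shared execution can be sketched as a grid join: each query registers in every cell its search circle overlaps (playing the role of the answer grid), and each cell's objects are then scanned once for all queries registered there. The sketch assumes an n x n grid with cell side `cell`, search radii already computed by the incremental step, and that each circle contains at least k objects; all names are hypothetical, and disk I/O is not modeled.

```python
import math
from collections import defaultdict

Point = tuple[float, float]


def cells_overlapping_circle(center: Point, r: float, cell: float, n: int):
    """Yield grid cells (i, j) whose square intersects the circle (center, r)."""
    qx, qy = center
    i_lo, i_hi = max(0, int((qx - r) // cell)), min(n - 1, int((qx + r) // cell))
    j_lo, j_hi = max(0, int((qy - r) // cell)), min(n - 1, int((qy + r) // cell))
    for i in range(i_lo, i_hi + 1):
        for j in range(j_lo, j_hi + 1):
            # mindist from the circle center to the square of cell (i, j)
            dx = max(i * cell - qx, 0.0, qx - (i + 1) * cell)
            dy = max(j * cell - qy, 0.0, qy - (j + 1) * cell)
            if math.hypot(dx, dy) <= r:
                yield (i, j)


def shared_knn_join(objects: dict[str, Point],
                    queries: dict[str, tuple[Point, int, float]],
                    cell: float, n: int) -> dict[str, list[str]]:
    """queries: query id -> (location, k, search radius SR)."""
    # Answer grid: cell -> ids of queries whose SR overlaps the cell.
    answer_grid: dict[tuple[int, int], list[str]] = defaultdict(list)
    for qid, (loc, _, sr) in queries.items():
        for c in cells_overlapping_circle(loc, sr, cell, n):
            answer_grid[c].append(qid)
    # Object grid: cell -> ids of objects located in the cell.
    object_grid: dict[tuple[int, int], list[str]] = defaultdict(list)
    for oid, (x, y) in objects.items():
        object_grid[(min(int(x // cell), n - 1), min(int(y // cell), n - 1))].append(oid)
    # Join: each cell's objects are read once and matched against every query in that cell.
    candidates: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for c, qids in answer_grid.items():
        for oid in object_grid.get(c, []):
            for qid in qids:
                loc, _, sr = queries[qid]
                d = math.hypot(objects[oid][0] - loc[0], objects[oid][1] - loc[1])
                if d <= sr:
                    candidates[qid].append((d, oid))
    # Each query keeps its k closest candidates.
    return {qid: [oid for _, oid in sorted(candidates[qid])[:k]]
            for qid, (_, k, _) in queries.items()}
```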
SEA-CNN: Algorithm
CPM: Overview
   Overview:
       Objects and queries are stored in memory.
       Centralized processing.
       Supports all kinds of mutability between objects and queries.
       No assumed movement pattern; objects move in open (Euclidean) space.
   Goal:
       Minimize CPU time.
   Important features:
       Conceptual partitioning
       Simulates traditional kNN search (branch-and-bound search
        with best-first traversal: cells are visited in increasing mindist order)
   Roadmap:
       Initial NN computation (conceptual partitioning + branch-and-bound
        search + best-first traversal)
       Handling updates
CPM: Data Structures
    CPM: NN Computation
    (Conceptual Partitioning)

Conceptual Partitioning:
  • What is CP? A partitioning of the grid
  cells into rectangles based on their
  proximity to the query cell. Each
  rectangle has a direction (U, D, L, R)
  and a level.
  • Why CP? It gives a natural processing
  order for the cells and facilitates NN
  search (only a minimal set of cells
  is searched).
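A minimal sketch of one possible partitioning, assuming a pinwheel layout in which level l covers the ring of cells at Chebyshev distance l + 1 from the query cell and each direction takes 2(l + 1) of those cells; the exact split used in [MHP05] may differ, and the helper names are hypothetical. Because each strip spans the query's row or column, its mindist grows by exactly one cell side δ per level, which is what gives the natural processing order.

```python
Cell = tuple[int, int]


def rectangle_cells(direction: str, level: int, qcell: Cell) -> list[Cell]:
    """Cells of rectangle DIR_level around the query cell (pinwheel layout, an assumption)."""
    cx, cy = qcell
    d = level + 1  # Chebyshev distance of this ring from the query cell
    if direction == "U":   # top row, without its right corner
        return [(x, cy + d) for x in range(cx - d, cx + d)]
    if direction == "R":   # right column, without its bottom corner
        return [(cx + d, y) for y in range(cy - d + 1, cy + d + 1)]
    if direction == "D":   # bottom row, without its left corner
        return [(x, cy - d) for x in range(cx - d + 1, cx + d + 1)]
    if direction == "L":   # left column, without its top corner
        return [(cx - d, y) for y in range(cy - d, cy + d)]
    raise ValueError(f"unknown direction {direction!r}")


def rectangle_mindist(direction: str, level: int, q: tuple[float, float], delta: float) -> float:
    """mindist from query point q to DIR_level, for square cells of side delta."""
    qx, qy = q
    cx, cy = int(qx // delta), int(qy // delta)
    d = level + 1
    # Each strip spans the query's column (U, D) or row (L, R), so mindist is a 1-D gap.
    if direction == "U":
        return (cy + d) * delta - qy
    if direction == "D":
        return qy - (cy - d + 1) * delta
    if direction == "R":
        return (cx + d) * delta - qx
    if direction == "L":
        return qx - (cx - d + 1) * delta
    raise ValueError(f"unknown direction {direction!r}")
```

As a consistency check, a hypothetical query at (4.2, 4.9) in cell (4, 4) with unit cells gives level-0 mindists of 0.1 (U), 0.2 (L), 0.8 (R) and 0.9 (D), the same values as the search heap in the CPM example.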
CPM: NN Computation
(Algorithm by Example)
Search heap content (always sorted by mindist;
   entries are <cell or rectangle, mindist>):
  H = {<c4,4, 0>, <U0, 0.1>,
   <L0, 0.2>, <R0, 0.8>,
   <D0, 0.9>}
  Deheap c4,4: do nothing.
  Deheap U0:
        insert the cells of U0
        insert U1
   Continue until <c3,3, 1> is deheaped and
    the 1st candidate p1 is found:
        best_dist = dist(p1, q) =
         1.7
   Continue until c2,4 is deheaped
    and p2 is found:
        best_dist = dist(p2, q) =
         1.3
   Terminate because the
    next entry in the heap has
    mindist >= best_dist.
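The walk-through above can be reproduced with a compact best-first search over cells and rectangles. This sketch assumes the same pinwheel rectangles as one plausible reading of conceptual partitioning (not necessarily the exact layout in [MHP05]), points stored in an in-memory grid with cell side `delta`, and k >= 1; `cpm_knn` and its helpers are hypothetical names. As in the example, it stops as soon as the smallest heap key is at least the current k-th best distance.

```python
import heapq
import math
from collections import defaultdict

Point = tuple[float, float]
Cell = tuple[int, int]


def strip_cells(direction: str, level: int, cx: int, cy: int) -> list[Cell]:
    """Cells of rectangle DIR_level around query cell (cx, cy) (pinwheel layout)."""
    d = level + 1
    return {
        "U": [(x, cy + d) for x in range(cx - d, cx + d)],
        "R": [(cx + d, y) for y in range(cy - d + 1, cy + d + 1)],
        "D": [(x, cy - d) for x in range(cx - d + 1, cx + d + 1)],
        "L": [(cx - d, y) for y in range(cy - d, cy + d)],
    }[direction]


def cell_mindist(c: Cell, q: Point, delta: float) -> float:
    """Minimum distance from q to the square of cell c."""
    i, j = c
    dx = max(i * delta - q[0], 0.0, q[0] - (i + 1) * delta)
    dy = max(j * delta - q[1], 0.0, q[1] - (j + 1) * delta)
    return math.hypot(dx, dy)


def cpm_knn(points: dict[str, Point], q: Point, k: int, delta: float, n: int) -> list[str]:
    grid: dict[Cell, list[str]] = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(int(x // delta), int(y // delta))].append(pid)
    cx, cy = int(q[0] // delta), int(q[1] // delta)

    def rect(direction: str, level: int) -> list[Cell]:
        # Only cells inside the n x n grid are kept.
        return [(i, j) for i, j in strip_cells(direction, level, cx, cy)
                if 0 <= i < n and 0 <= j < n]

    heap: list = []
    seq = 0  # tie-breaker so heap entries never compare their payloads

    def push(key: float, kind: str, item) -> None:
        nonlocal seq
        heapq.heappush(heap, (key, seq, kind, item))
        seq += 1

    push(0.0, "cell", (cx, cy))                    # the query cell itself
    for direction in "ULRD":                        # level-0 rectangles
        cells = rect(direction, 0)
        if cells:
            push(min(cell_mindist(c, q, delta) for c in cells), "rect", (direction, 0))

    best: list[tuple[float, str]] = []  # max-heap of (-distance, id) for the current k best
    while heap:
        key, _, kind, item = heapq.heappop(heap)
        if len(best) == k and key >= -best[0][0]:
            break  # every unvisited cell is at least `key` away: terminate
        if kind == "cell":
            for pid in grid.get(item, []):
                d = math.dist(points[pid], q)
                if len(best) < k:
                    heapq.heappush(best, (-d, pid))
                elif d < -best[0][0]:
                    heapq.heapreplace(best, (-d, pid))
        else:
            direction, level = item
            for c in rect(direction, level):        # en-heap the rectangle's cells
                push(cell_mindist(c, q, delta), "cell", c)
            nxt = rect(direction, level + 1)        # and the next rectangle in that direction
            if nxt:
                push(min(cell_mindist(c, q, delta) for c in nxt), "rect", (direction, level + 1))
    return [pid for _, pid in sorted((-neg, pid) for neg, pid in best)]
```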
CPM: Handling Updates
   Key points:
       Focus on moving objects and static queries;
        moving queries are treated as new queries.
       Re-examine only the queries whose influence
        regions overlap with updated cells.
       Re-compute affected queries incrementally,
        using bookkeeping information to save
        computation time.
          CPM: Handling Updates
          (Algorithm by Example)
   NN Re-computation Algorithm
   Input: grid G, affected query q
   Output: new NN for q
   /* Similar to NN computation; reuses the bookkeeping
      information in visit_list and the search heap */
• p2 moves from c2,4 to c0,6.
• c2,4 has q in its influence list and dist(q, p2') > best_dist = dist(q, p2)
  → mark q as an affected query.
• c0,6 has an empty influence list → ignore.
• Re-compute the NN of q with the NN Re-computation algorithm.
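The update handling in this example can be sketched as follows. It is a simplified illustration, assuming each cell keeps an influence list of query ids and each query keeps its current NN set and best_dist; `classify_update` and `QueryState` are hypothetical names, and the sketch only flags queries rather than performing CPM's incremental re-computation.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, float]
Cell = tuple[int, int]


@dataclass
class QueryState:
    loc: Point
    nn: set[str] = field(default_factory=set)  # current NN set (bookkeeping)
    best_dist: float = math.inf                 # distance to the current k-th NN


def classify_update(oid: str, old_cell: Cell, new_cell: Cell, new_loc: Point,
                    influence: dict[Cell, list[str]],
                    queries: dict[str, QueryState]) -> tuple[set[str], set[str]]:
    """Return (queries needing NN re-computation, queries with a new closer candidate)."""
    recompute: set[str] = set()
    incoming: set[str] = set()
    # Outgoing: the object was an NN of q and moved beyond best_dist.
    for qid in influence.get(old_cell, []):
        q = queries[qid]
        if oid in q.nn and math.dist(q.loc, new_loc) > q.best_dist:
            recompute.add(qid)
    # Incoming: a non-NN object moved inside q's current best_dist.
    for qid in influence.get(new_cell, []):
        q = queries[qid]
        if oid not in q.nn and math.dist(q.loc, new_loc) < q.best_dist:
            incoming.add(qid)
    return recompute, incoming
```

For the case above (p2 moves from c2,4 to c0,6, q is in c2,4's influence list, and c0,6's list is empty), the call returns q in the re-computation set and nothing else.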
SEA-CNN & CPM:
A Comparison
   Common features between the two:
       Performance metrics:
            Use query processing time (or CPU time) at the centralized
             server as the primary metric.
            Ignore communication cost.
       Employ Grid-based Indexing (simple, fast maintenance).
       Keep a search region for each query to handle updates.
   Are the differences significant?
       CPM saves some computations over SEA-CNN (as shown in
        the CPM paper) because CPM uses an optimal search
        algorithm.
       However, is saving in CPU time still very important?
     Summary
   Monitoring queries to support LBS has been an intensive research area
    in the past few years:
        The short-term research trend seems to be proposals of new, more
         advanced query types (our next presentation will discuss Reverse
         NN and Group NN).
        Long-term research could target Moving Objects Databases.
         Recommended: the “Moving Objects Databases” textbook ([GS05]) for
         perspective:
             Location-management perspective vs. spatio-temporal data
              perspective.
   Many LBS-based commercial products: Verilocation, uLocate,
    meetro, EarthComber, CellSpotting.
   Standards and development software: Natural Area Coding
    System, Mobile Location Services Reference Architecture by
    Sun.
   For up-to-date LBS information, see LBSZone.
References
   [B99] D. Barbara. Mobile Computing and Databases: A Survey. IEEE
    Transactions on Knowledge and Data Engineering, 11(1), 108-117, 1999.
   [S03] http://www.wirelessdevnet.com/features/nacjan03/
   [GS05] R. H. Güting, M. Schneider. Moving Objects Databases. Book.
   [SV04] J. Schiller, A. Voisard. Location-Based Services. Book.
   [MHP05] K. Mouratidis, M. Hadjieleftheriou, D. Papadias. Conceptual
    Partitioning: An Efficient Method for Continuous Nearest Neighbor
    Monitoring. SIGMOD 2005.
   [YPK05] X. Yu, K. Pu, N. Koudas. Monitoring K-Nearest Neighbor
    Queries Over Moving Objects. ICDE 2005.
   [XMA05] X. Xiong, M. Mokbel, W. Aref. SEA-CNN: Scalable Processing
    of Continuous K-Nearest Neighbor Queries in Spatio-temporal
    Databases. ICDE 2005.
   [CDT+00] J. Chen, D. J. DeWitt, F. Tian, Y. Wang. NiagaraCQ: A
    Scalable Continuous Query System for Internet Databases. SIGMOD 2000.
   [CF02] S. Chandrasekaran, M. J. Franklin. Streaming Queries over
    Streaming Data. VLDB 2002. (PSoup system)
Note
   Due date of your presentation slides:
    November 14, 2005.
Aggregate NN Queries in Spatial
Databases and Location-based
Services

               Tai Do
      Data Systems Group, UCF
        November 11, 2005
Outline
   Aggregate Nearest Neighbor (ANN)
    queries:
       Introduction to ANN.
       Solutions for Group Nearest Neighbor
        (GNN) Queries, a specific type of ANN.
       Solutions for Continuous Group Nearest
        Neighbor Queries (CGNN).
    Aggregate NN:
    Examples and Applications




   Applications:
       Business decision making (construction of new facilities)
       Military Rescue (earliest pick-up time)
       Severe weather monitoring (most dangerous area)
Aggregate NN: Definition
   What is ANN?
       A generalized form of NN search (multiple query points vs.
        single query point)
   Formally:
       Given P = {p1, …, pN} (set of data points) and Q = {q1, …, qn} (set
        of query points),
       the aggregate distance function is adist(p, Q) = f(|pq1|, …, |pqn|).
       An ANN query returns the data point p with the minimum
        aggregate distance.
        Note: AkNN is analogous (it returns the k >= 1 data points with the
        smallest aggregate distances); we focus only on ANN.
       When f = sum, the ANN query is called a Group Nearest Neighbor (GNN)
        query.
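The definition translates directly into code. The sketch below computes adist for a pluggable aggregate f and answers an ANN query by a linear scan; it is a reference baseline assuming 2-D Euclidean points, not one of the R-tree methods discussed next, and the function names are hypothetical.

```python
import math
from typing import Callable, Iterable

Point = tuple[float, float]


def adist(p: Point, Q: list[Point], f: Callable[[Iterable[float]], float] = sum) -> float:
    """Aggregate distance f(|pq1|, ..., |pqn|); f = sum gives the GNN case."""
    return f(math.dist(p, q) for q in Q)


def ann_bruteforce(P: dict[str, Point], Q: list[Point], f=sum) -> tuple[str, float]:
    """Return the data point with the minimum aggregate distance (linear scan)."""
    best = min(P, key=lambda pid: adist(P[pid], Q, f))
    return best, adist(P[best], Q, f)
```

With Q = [(0, 0), (4, 0)] and P = {a: (2, 1), b: (0, 0)}, f = sum selects b (4.0 vs. about 4.47), while f = max selects a (about 2.24 vs. 4.0), so the choice of f changes the answer.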
Group NN Queries
   Assumptions:
       Query points are in memory.
       Data points are on disk, indexed by an R-tree.
   Goal:
       Minimize the extent and cost of the search (I/O
        and CPU time)
   Roadmap: 3 solutions
       Multiple query method
       Single point method
       Minimum bound method
Multiple Query Method (MQM)
   Apply multiple conventional NN queries, then combine the
    results.
   MQM is a straightforward application of the threshold algorithm
    ([FLN03]):
        Each query point incrementally retrieves its nearest data points (1st NN,
         then 2nd NN, …).
        Compute the aggregate distance of the data point just retrieved.
        Repeat the two steps above until the best data point is guaranteed
         to have been seen.
        Main idea:
             Question: how do we know that the aggregate distance of the best seen
              data point is smaller than the aggregate distance of every unseen data
              point?
             Answer: maintain a threshold T, a lower bound on the aggregate distance
              of any unseen data point. For f = sum, T = t1 + … + tn, where ti is the
              distance from qi to its most recently retrieved NN; stop as soon as
              best_dist <= T.
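A minimal sketch of MQM for f = sum follows. It assumes 2-D points, replaces the incremental R-tree NN search of each query point with a pre-sorted list (so it illustrates the control flow, not the I/O behavior), and uses the hypothetical name `mqm_gnn`.

```python
import math
from typing import Optional

Point = tuple[float, float]


def mqm_gnn(P: dict[str, Point], Q: list[Point]) -> tuple[Optional[str], float]:
    """Group NN (f = sum) via the Multiple Query Method / threshold algorithm."""
    # Stand-in for incremental NN search on an R-tree: each query point's
    # data points, sorted by distance (1st NN, 2nd NN, ...).
    streams = [sorted(P, key=lambda pid, q=q: math.dist(P[pid], q)) for q in Q]
    pos = [0] * len(Q)      # next position in each stream
    t = [0.0] * len(Q)      # t_i: distance from q_i to its last retrieved NN
    best_pid, best_dist = None, math.inf
    i = 0
    while True:
        if pos[i] < len(streams[i]):
            pid = streams[i][pos[i]]
            pos[i] += 1
            t[i] = math.dist(P[pid], Q[i])
            a = sum(math.dist(P[pid], q) for q in Q)   # aggregate distance
            if a < best_dist:
                best_pid, best_dist = pid, a
        # Any unseen point p has |p q_i| >= t_i for every i, so adist(p, Q) >= sum(t).
        if best_dist <= sum(t) or all(k == len(P) for k in pos):
            return best_pid, best_dist
        i = (i + 1) % len(Q)   # round-robin over the query points
```

For Q = [(0, 0), (10, 0)] and points A = (5, 0), B = (0, 1), C = (10, 1), the search stops once T = 5 + 5 = 10 reaches best_dist = 10, returning A.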
      MQM: An Example (1)

   Q = {q1, q2}
   P = {p1,…, p12}
MQM: An Example (2)
   Initial state:
       q1: t1 = 0          q2: t2 = 0
       Seen points (ID | dist(q1) | dist(q2) | Sum/adist): none yet
       T = 0, best_dist = ∞, best_NN = null
     MQM: An Example (3)
Step 1:
    • Find the next (1st) NN of q1: (p10, 2)
    • Update t1 and T: t1 = 2, T = t1 + t2 = 2 + 0 = 2
   MQM: An Example (4)
Step 2:
• If the current aggregate distance < best_dist, update best_dist and best_NN.
• If the current best aggregate distance <= T, stop.
Else go to the next NN of the next query point and repeat Step 1.

   Seen points (ID | dist(q1) | dist(q2) | Sum/adist):
       p10 | 2 | 5 | 7
   adist(p10) = 7 < best_dist = ∞  →  best_dist = 7, best_NN = p10
   best_dist = 7 > T = 2  →  continue
                MQM: An Example (5)
Step 1:
    • Find the next (1st) NN of q2: (p11, 3)
    • Update t2 and T: t2 = 3, T = t1 + t2 = 2 + 3 = 5

   Seen points (ID | dist(q1) | dist(q2) | Sum/adist):
       p10 | 2 | 5 | 7
   best_dist = 7, best_NN = p10
   MQM: An Example (6)
Step 2:
• If the current aggregate distance < best_dist, update best_dist and best_NN.
• If the current best aggregate distance <= T, stop.
Else go to the next NN of the next query point and repeat Step 1.

   Seen points (ID | dist(q1) | dist(q2) | Sum/adist):
       p10 | 2 | 5 | 7
       p11 | 3 | 3 | 6
   adist(p11) = 6 < best_dist = 7  →  best_dist = 6, best_NN = p11
   best_dist = 6 > T = 5  →  continue
                     MQM: An Example (7)
    Step 1:
        • Find the next (2nd) NN of q1: (p11, 3)
        • Update t1 and T: t1 = 3, T = t1 + t2 = 3 + 3 = 6

   Seen points (ID | dist(q1) | dist(q2) | Sum/adist):
       p10 | 2 | 5 | 7
       p11 | 3 | 3 | 6
   best_dist = 6, best_NN = p11
         MQM: An Example (8)
      Step 2:
      • If the current aggregate distance < best_dist, update best_dist and best_NN.
      • If the current best aggregate distance <= T, stop.
      Else go to the next NN of the next query point and repeat Step 1.

   Seen points (ID | dist(q1) | dist(q2) | Sum/adist):
       p10 | 2 | 5 | 7
       p11 | 3 | 3 | 6
   p11 was already seen (adist = 6): no update
   best_dist = 6 <= T = 6  →  STOP
   Result: best_NN = p11, best_dist = 6
Single Point Method (SPM)
   Problem with MQM:
       The same R-tree nodes are accessed multiple times, and the same
        data point (e.g., p11) is retrieved through different queries.
   SPM processes the whole group with a single traversal of the R-tree.
   Strategy:
       Compute the centroid q of Q, which is a point with small
        adist(q, Q)
       The GNN is a point of P “near” q.
   Challenges:
       The computation of q.
       The range around q, in which we should look for points of P,
        before we conclude that no better GNN can be found.
SPM: Illustration
SPM: The Computation of q
    SPM: Finding the range
   To define the range around q, find heuristics
    that can safely prune R-tree nodes.
       Lemma 1:
         By the triangle inequality, for each query point qi: |pqi| + |qiq| >= |pq|.
         Summing up the n inequalities (for f = sum, adist(q, Q) is the sum of the |qiq|):
$\sum_{i=1}^{n} |pq_i| + \sum_{i=1}^{n} |q_i q| \ge n\,|pq| \;\Rightarrow\; \mathrm{adist}(p, Q) \ge n\,|pq| - \mathrm{adist}(q, Q)$   (1)
       Lemma 1 can be used to prune intermediate nodes:
         Node N can be pruned if
           $\mathrm{mindist}(N, q) \ge \frac{1}{n}\left[\mathrm{best\_dist} + \mathrm{adist}(q, Q)\right]$   (2)
    Because: rearranging this pruning rule gives
    $n \cdot \mathrm{mindist}(N, q) - \mathrm{adist}(q, Q) \ge \mathrm{best\_dist}$   (3)
    For any p in node N, dist(p, q) >= mindist(N, q), so
    $n \cdot \mathrm{dist}(p, q) - \mathrm{adist}(q, Q) \ge \mathrm{best\_dist}$   (4)
    Since dist(p, q) = |pq|, combining (4) with (1) gives adist(p, Q) >= best_dist,
      hence node N can be safely pruned.
SPM: Pruning Illustration
   Both N1 and N2 can be pruned (n = 2):
       best_dist = adist(best_NN, Q) = 9
       adist(q, Q) = 3
       (1/n)(best_dist + adist(q, Q)) = ½(9 + 3) = 6
       mindist(N1, q) = 10 >= 6 and mindist(N2, q) = 6 >= 6
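Pruning rule (2) and the single traversal can be sketched as follows, with the numbers of this illustration as a check (n = 2, best_dist = 9, adist(q, Q) = 3 give a bound of 6, so both N1 and N2 are pruned). Assumptions: q is taken as the arithmetic centroid of Q (Lemma 1 holds for any q), and the R-tree is replaced by a flat list of leaf nodes with their MBRs; `spm_gnn` and the helpers are hypothetical names.

```python
import math
from typing import Optional

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist(r: Rect, p: Point) -> float:
    """Minimum distance from point p to rectangle r."""
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def spm_can_prune(mindist_nq: float, best_dist: float, adist_q: float, n: int) -> bool:
    """Rule (2): node N is pruned if mindist(N, q) >= (best_dist + adist(q, Q)) / n."""
    return mindist_nq >= (best_dist + adist_q) / n


def spm_gnn(nodes: list[tuple[Rect, list[Point]]], Q: list[Point]) -> tuple[Optional[Point], float]:
    n = len(Q)
    q = (sum(x for x, _ in Q) / n, sum(y for _, y in Q) / n)  # centroid of Q (assumption)
    adist_q = sum(math.dist(q, qi) for qi in Q)
    best_p: Optional[Point] = None
    best_dist = math.inf
    # Visit nodes in increasing mindist from q (one level stands in for the R-tree).
    for mbr, pts in sorted(nodes, key=lambda node: mindist(node[0], q)):
        if spm_can_prune(mindist(mbr, q), best_dist, adist_q, n):
            break  # later nodes are farther from q and best_dist only shrinks
        for p in pts:
            a = sum(math.dist(p, qi) for qi in Q)
            if a < best_dist:
                best_p, best_dist = p, a
    return best_p, best_dist
```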
Minimum Bound Method
(MBM)
   Like SPM, MBM performs a single query, but
    uses the minimum bounding rectangle M of Q
    (instead of a centroid q) to prune the search
    space.
   Is MBM obviously better than SPM? There is no clear a priori
    reason; this must be evaluated experimentally.
   Strategy:
       Use good heuristics to identify the qualifying
        nodes
          Minimum Bound Method:
          Heuristics
   Heuristic 1: A node N cannot contain qualifying points if
    $\mathrm{mindist}(N, M) \ge \frac{1}{n}\,\mathrm{best\_dist}$,
    because for any data point p in N,
    $\mathrm{adist}(p, Q) \ge n \cdot \mathrm{mindist}(N, M) \ge \mathrm{best\_dist}$.
   Heuristic 1 prunes N1 but not N2.
   Heuristic 2: A node N can be safely pruned if
    $\sum_{q_i \in Q} \mathrm{mindist}(N, q_i) \ge \mathrm{best\_dist}$.
   Heuristic 2 prunes both N1 and N2.
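Both heuristics only need mindist between rectangles and points. A small sketch, assuming axis-aligned MBRs given as (xmin, ymin, xmax, ymax) and hypothetical function names:

```python
import math

Point = tuple[float, float]
Rect = tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mindist_rect_point(r: Rect, p: Point) -> float:
    dx = max(r[0] - p[0], 0.0, p[0] - r[2])
    dy = max(r[1] - p[1], 0.0, p[1] - r[3])
    return math.hypot(dx, dy)


def mindist_rect_rect(a: Rect, b: Rect) -> float:
    dx = max(a[0] - b[2], 0.0, b[0] - a[2])
    dy = max(a[1] - b[3], 0.0, b[1] - a[3])
    return math.hypot(dx, dy)


def mbr(Q: list[Point]) -> Rect:
    """Minimum bounding rectangle M of the query points."""
    xs, ys = zip(*Q)
    return (min(xs), min(ys), max(xs), max(ys))


def heuristic1_prunes(node: Rect, Q: list[Point], best_dist: float) -> bool:
    """Heuristic 1: mindist(N, M) >= best_dist / n."""
    return mindist_rect_rect(node, mbr(Q)) >= best_dist / len(Q)


def heuristic2_prunes(node: Rect, Q: list[Point], best_dist: float) -> bool:
    """Heuristic 2: sum over q_i of mindist(N, q_i) >= best_dist."""
    return sum(mindist_rect_point(node, q) for q in Q) >= best_dist
```

Since every qi lies inside M, mindist(N, qi) >= mindist(N, M), so the sum in Heuristic 2 is at least n * mindist(N, M): any node pruned by Heuristic 1 is also pruned by Heuristic 2, which is consistent with the illustration.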
Performance Study
Continuous Group NN
   Assumptions:
       Both query points and data points are in
        memory.
   Method:
       Use a grid index.
       Utilize conceptual partitioning of the space
        around query Q.
       Apply Minimum Bound Method.
     Continuous GNN:
     Details
   amindist (c, Q) = (qi in Q)
    (mindist(c, qi)).
   amindist(c,Q) is the lower
    bound of mindist(p, Q) for
    any data point p in cell c.
   The GNN computation is
    similar to the NN
    computation presented in
    previous class.
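A simplified sketch of GNN search over a grid using amindist as the lower bound. Instead of conceptual partitioning, it ranks all cells by amindist (acceptable for a small grid, but not what CPM-style processing does); it assumes f = sum and points inside an n x n grid with cell side `delta`, and the names are hypothetical.

```python
import math
from collections import defaultdict

Point = tuple[float, float]
Cell = tuple[int, int]


def cell_mindist(c: Cell, p: Point, delta: float) -> float:
    i, j = c
    dx = max(i * delta - p[0], 0.0, p[0] - (i + 1) * delta)
    dy = max(j * delta - p[1], 0.0, p[1] - (j + 1) * delta)
    return math.hypot(dx, dy)


def amindist(c: Cell, Q: list[Point], delta: float) -> float:
    """Lower bound of adist(p, Q) for every data point p in cell c (f = sum)."""
    return sum(cell_mindist(c, q, delta) for q in Q)


def grid_gnn(points: dict[str, Point], Q: list[Point], delta: float, n: int):
    grid: dict[Cell, list[str]] = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(int(x // delta), int(y // delta))].append(pid)
    # Simplification: rank every cell by amindist instead of using conceptual partitioning.
    cells = sorted((amindist((i, j), Q, delta), (i, j)) for i in range(n) for j in range(n))
    best_pid, best_dist = None, math.inf
    for bound, c in cells:
        if bound >= best_dist:
            break  # no remaining cell can hold a better GNN
        for pid in grid.get(c, []):
            a = sum(math.dist(points[pid], q) for q in Q)
            if a < best_dist:
                best_pid, best_dist = pid, a
    return best_pid, best_dist
```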
Summary
   Threshold Algorithm:
       Simple, useful, and reusable.
   Aggregate Nearest Neighbor Queries in
    Spatial Database:
       Practical applications.
       Good heuristics are important.
   Open question: does optimal ANN search
    remain unsolved?
References
   [GS05] R. H. Güting, M. Schneider. Moving Objects Databases. Book.
   [PTM+05] D. Papadias, Y. Tao, K. Mouratidis, C. K. Hui. Aggregate
    Nearest Neighbor Queries in Spatial Databases. ACM Transactions on
    Database Systems, 30(2), 529-576, June 2005.
   [MHP05] K. Mouratidis, M. Hadjieleftheriou, D. Papadias. Conceptual
    Partitioning: An Efficient Method for Continuous Nearest Neighbor
    Monitoring. SIGMOD 2005.
   [PST+04] D. Papadias, Q. Shen, Y. Tao, K. Mouratidis. Group Nearest
    Neighbor Queries. ICDE 2004.
   [XZ06] T. Xia, D. Zhang. Continuous Reverse Nearest Neighbor
    Monitoring. ICDE 2006.
   [FLN03] R. Fagin, A. Lotem, M. Naor. Optimal Aggregation Algorithms
    for Middleware. Journal of Computer and System Sciences, 66, 614-656, 2003.
   www.cs.fiu.edu/~vagelis/classes/COP6727/slides/fagin.ppt (source of
    the MQM animation).

								