A Library by qingyunliuliu

VIEWS: 7 PAGES: 21

									 DGrid: A Library of Large-Scale
Distributed Spatial Data Structures




                       Pieter Hooimeijer, 2006-05-02
                 Motivation
• DGrid was designed to:
  – Support very large sets of dynamic point data
    (i.e. points that move unpredictably).

  – Offer flexible trade-offs between the cost of
    search operations and the cost of updates.

  – Run on parallel and distributed systems.
                                                    2
          Spatial Data Structures
• Definition—Any data structure that holds:
         - points        - rectangles
         - lines         - polygons
         - curves        - etc.
• They are typically optimized for a particular
  type of search operation.

• We’ll focus on point data.
                                                  3
• Commonly used example: The Quadtree.
  – Works like a binary search tree.
  – Each node in the tree has four children: NE,
    SE, SW, NW.
  – This is called ‘Recursive Decomposition’




                                                   4
• The quadtree implementation in DGrid works
  like this:




   A (1,3)          B (1,2)         C (2,0)
• This is a ‘bottom-up Matrix (MX) Quadtree.’
  (section 3.1.2)                               5
• Let’s do a search on that quadtree:




• We ruled out the ‘entire’ NE quadrant at
  the root level of the tree.
                                             6
• Trade-offs for bottom-up MX Quadtree,
  compared to other tree data structures:

  – The shape of the tree does not depend on the
    insertion order.
  – No need to balance the tree (which would be
    expensive).
  – Insertion and deletion are cheaper for
    clustered data.


                                               7
             C++ Templates
• They look like this:
   template <typename ItemType>
   class vector {
   // ...
   };

• In this case, a separate vector< > class is
  generated for each item type.
   // Strong type-checking using templates:
   MyType * a = someVector.get(5);
   // instead of:
   MyType * a = (MyType)someVector.get(5);
                                                8
• Turns out, C++ templates are a crude
  functional programming language.
• Why ‘crude?’
    {- Templates
// C++ Haskell -}
    fact < int
template 1 = 1 N>
    fact n = n
struct fact { * fact (n - 1)
static const int value = N * fact <N - 1 >:: value;
};

template < >
struct fact <1 > {
static const int value = 1;
};


• This is ‘executed’ by the compiler!                 9
• This is called template metaprogramming.

• It’s used extensively in DGrid, to make it:
   – easier to use;
   – faster;
   – type safe.




                                                10
             Distributed Data
• DGrid uses Message Passing Interface (MPI)
  to run on distributed systems.
  – MPI is a library of basic ‘send’ and ‘receive’
    operations.

  – Each processor gets a unique ID (‘rank’).

  – Use if-statements to run different code on
    different processes.

                                                     11
12
                    DGrid
• DGrid has these data structures:
  – Two types of 2D arrays.
  – A quadtree.
  – A distributed data structure.
  – A location class.

• Allows nesting of these data structures.
                                             13
• Let’s see some examples of nested data
  structures:
  – A 2D array of quadtrees
    (implied: the quadtree contains locations).

  – A quadtree of small 2D arrays.

  – A 2D array of 2D arrays.
     • Called ‘tiling.’
     • A lot like a ‘shallow’ quadtree.
                                                  14
• DGrid uses the Composite Design Pattern:




   location< >          DataStructure




                                         15
• DGrid uses templates instead of a ‘Component’
  interface.
• The result is that the user can do this:
 using namespace dgrid::tags;
 typedef dgrid::dgrid<MyItem,
                      partial_grid_tag<
                       quadtree_tag< > > > bucket;

 bucket a(0, 0, 639, 639, tiles(64, 64) <<
                           tiles(1, 1));



• This is the ‘2D array of quadtrees’
  example.
                                                     16
• Important: The definition of the data structure
  is a type!
• Consequences:
  – Can check parameters at compile time.
    (Must be tiles( ) << tiles ( ) for this example,
    or it won’t compile.)
  – Compiler can optimize extensively (it knows which
    functions are going to call each other).

  – Can’t define a type at runtime, so ‘composition’
    must be known at compile time.

                                                        17
• Data structure operations:
  – insert(x, y, item) – add item at (x, y)
  – delete(x, y, item) – remove item from (x, y)

  – get(x, y, some_list) – get all items at (x, y)
  – get_range(x0, y0, x1, y1) – get all items in the
    range [ (x0, y0) : (x1, y1) ]

• Note: even the location class must support
  these operations.
                                                       18
• In a nested data structure, operations are
  passed on from level to level.

• Because the types are known at compile
  time, these calls can be inlined.
  – Pro: eliminates the overhead of the function
    call.
  – Con: code size increases (function body is
    repeated).

                                                   19
20
              Future Work
• Add more data structures, more search
  operations.

• Separate interface further from
  implementation.

• (Dynamic) Load Balancing.
                                          21

								
To top