

A Scalable Distributed
Information Management System

    P. Yalagandula, M. Dahlin
         SIGCOMM 2004
   Introduction
       Goal : Aggregation
   Innovation
       Flexibility
       Scalability
       Robustness
   Implementation
   Evaluation
   Conclusions
   Why SDIMS ?
   Monitoring, querying, and reacting to
    changes are core components of
    applications such as system management,
    service placement, data sharing and
    caching, etc.
   SDIMS in a networked system would provide a
    distributed operating system backbone and
    facilitate the development and deployment of
    new distributed services.
Introduction (cont.)
   Fundamentals
       Hierarchical aggregation
       A node accesses detailed views of nearby
        information and summary views of global
        information.
       A hierarchical system aggregates information
        through reduction trees.
Introduction (cont.)
   A SDIMS should have four properties.
       Scalability
       Flexibility
       Administrative isolation
       Robustness
   SDIMS should accommodate large
    numbers of nodes.
   SDIMS should allow applications to
    install and monitor large numbers of
    data attributes.
   SDIMS should accommodate a range of
    applications and attributes.
       Read-dominated attributes (rarely change)
            Number of CPUs
       Write-dominated attributes (change often)
            Number of processes
   SDIMS should leave the policy decision
    of tuning replication to applications.
Administrative isolation
   Nodes can be arranged in an
    organizational or administrative hierarchy.
   Domain-based control.
       Monitor
       Query
   SDIMS should adapt to reconfigurations in
    a timely fashion when node failures or
    network partitions occur.
   SDIMS should provide mechanisms so that
    applications can trade off the cost of
    adaptation against the consistency level of
    aggregated results when reconfigurations
    occur.
     Related Work
   Astrolabe
       A single logical aggregation tree that mirrors
        a system administrative hierarchy.
       A general interface for installing new
        aggregation functions.
       An unstructured gossip protocol for
        disseminating information and replicating all
        aggregated attribute values for a sub-tree to
        all nodes in the sub-tree.
Related Work (cont.)
       Any node can answer queries using
        local information.
       Not scalable (replicates all aggregate values).
       Not flexible (one strategy for all attribute types).

   Solution : P2P

   For each level in the
    hierarchy, the agent
    maintains a record with the
    list of child zones (and their
    attributes), and which child
    zone represents its own
    zone (self).

    Gossip protocol
   Periodically, each agent selects some other
    agent at random and exchanges state
    information with it.
   If the two agents are in the same zone, the
    state exchanged relates to MIBs in that zone.
   If the two agents are in different zones, they
    exchange state associated with the MIBs of their
    least common ancestor zone.
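The partner-state rule above can be made concrete with a minimal Python sketch. The zone-path format (e.g. "/univ/dept") and the helper names are invented for illustration; they are not Astrolabe's actual data structures:

```python
# Illustrative sketch of Astrolabe's gossip rule: agents in the same zone
# exchange that zone's MIBs; agents in different zones exchange the MIBs
# of their least common ancestor zone. Zone-path format is an assumption.

def lca_zone(zone_a: str, zone_b: str) -> str:
    """Least common ancestor of two zone paths, e.g. '/univ/dept'."""
    common = []
    for x, y in zip(zone_a.strip("/").split("/"), zone_b.strip("/").split("/")):
        if x != y:
            break
        common.append(x)
    return "/" + "/".join(common)

def zone_to_exchange(zone_a: str, zone_b: str) -> str:
    # Same zone: exchange state relating to that zone's MIBs.
    # Different zones: exchange state of the least common ancestor zone.
    return zone_a if zone_a == zone_b else lca_zone(zone_a, zone_b)
```

For example, agents in "/univ/cs" and "/univ/ee" would exchange state for "/univ".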

Related Work (cont.)
   DHT
       SkipNet, CAN, Pastry, Chord, Tapestry
   How to scalably map different attributes to
    different aggregation trees in a DHT mesh?
    {physical network vs. overlay network}
   How to provide flexibility in the
    aggregation to accommodate different
    application requirements?
    {flexible API for installing and controlling
    aggregation functions}
    Problem ?
   How to adapt a DHT mesh to attain the
    administrative isolation property?
    {virtual organization}
   How to provide robustness without
    unstructured gossip and total replication?
    {caching; pre-computed or on-demand
    re-aggregation}
Aggregation Abstraction
   Each physical node in the system is a
    leaf in the tree.
   An internal non-leaf, which we call
    virtual node, is simulated by one or
    more physical nodes at the leaves of
    the sub-tree for which the virtual node
    is the root.
      Aggregation Abstraction (cont.)
   Each physical node has local data stored as
    a set of (attributeType, attributeName, value) tuples.
   The system associates an aggregation
    function ftype with each attribute type.
        Aggregation Abstraction (cont.)
   Each level-i sub-tree Ti in the system
    has an aggregate value Vi,type,name for each
    (attributeType, attributeName) pair.
   The aggregate value for a level-i sub-tree Ti
    is the aggregation function for the type, ftype,
    computed across the aggregate values of
    each of Ti's k children:
      Vi,type,name = ftype(V1, V2, …, Vk),
      where Vj is the level-(i−1) aggregate value of Ti's jth child.
Aggregation Abstraction (cont.)
   Examples of ftype
       AVG(V1, …, Vn) = (1/n) · Σ(i=1..n) Vi
       SUM(V1, …, Vn) = Σ(i=1..n) Vi
   Aggregation functions satisfy the
    hierarchical computation property:
    applying ftype to partial aggregates of the
    children gives the same result as applying
    it to all the leaf values at once.
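A minimal sketch of bottom-up aggregation over a reduction tree (the tree shape and values are invented for illustration; SDIMS actually aggregates over trees embedded in its DHT):

```python
# Bottom-up aggregation over a reduction tree: a leaf holds a raw local
# value; an internal (virtual) node applies f_type to its children's
# aggregates. Tree shape and values here are made up.

def aggregate(tree, f):
    if not isinstance(tree, list):      # leaf: local attribute value
        return tree
    return f(aggregate(child, f) for child in tree)

# SUM satisfies the hierarchical computation property: summing partial
# sums of subtrees equals summing all the leaf values directly.
tree = [[1, 2], [3, [4, 5]]]
print(aggregate(tree, sum))  # 15
```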
       Aggregation Abstraction (cont.)

[Figure: an aggregation tree — physical nodes are the leaves; internal virtual nodes are simulated by physical nodes]
   Flexibility
   Scalability
   Administrative isolation
   Robustness
   Operation API
       Install
       Update
       Probe
Install Operation
   The Install operation installs an
    aggregation function in the system.
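As a rough illustration only (the real SDIMS Install call takes more parameters, such as up/down propagation levels, domain, and expiry time; the class and parameter names here are guesses):

```python
# Hypothetical sketch of Install: register an aggregation function per
# attribute type, together with a default up/down propagation strategy.

class SdimsNode:
    def __init__(self):
        self.aggr_funcs = {}  # attributeType -> (function, up, down)

    def install(self, attr_type, func, up=0, down=0):
        """Associate aggregation function `func` with `attr_type`."""
        self.aggr_funcs[attr_type] = (func, up, down)

node = SdimsNode()
# Read-dominated attribute: aggressively precompute and propagate.
node.install("numCPUs", sum, up=float("inf"), down=float("inf"))
# Write-dominated attribute: aggregate lazily, on probe.
node.install("numProcesses", sum)
```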
  Probe Operation

    Probe Operation (cont.)
   When node A issues a continuous probe at
    level l for an attribute, then updates for the
    attribute at any node in A’s level-l ancestor’s
    subtree are aggregated up to level l and are
    propagated down along the path from the
    ancestor to A.
Update and Probe Operation
    Update Operation API
   Update-upk-downj :
    propagates an update up to the kth level
    and propagates the aggregate value of a
    node at level l downward for j levels (l ≤ k).
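The l ≤ k constraint can be sketched with a toy model (levels are simplified integers; this is not the SDIMS implementation):

```python
# Toy model of Update-up(k)-down(j): an update from a leaf is re-aggregated
# at each ancestor up to level k; each recomputed level-l aggregate is then
# pushed down j levels. Only ancestors with l <= k recompute.

def recomputed_levels(k: int, root_level: int):
    """Levels whose aggregate value is recomputed for a leaf update."""
    return list(range(1, min(k, root_level) + 1))

def downward_reach(l: int, j: int):
    """Lowest level to which the level-l aggregate is propagated down."""
    return max(l - j, 0)

print(recomputed_levels(k=2, root_level=3))  # [1, 2]
print(downward_reach(l=2, j=1))              # 1
```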
        Operation API
[Figure: Update-upk-downj — an update climbs k levels (here to Level-2) and each recomputed aggregate is propagated down j levels (here to Level-1)]
    Dynamic Adaptation
   A SDIMS implementation can dynamically
    adjust its up/down strategies for an
    attribute based on its measured
    read/write frequency.
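This adaptation idea can be sketched as follows; the strategy names mirror the API above, but the threshold value is invented for illustration:

```python
# Sketch of dynamic adaptation: choose an up/down strategy for an
# attribute from its measured read/write ratio. The threshold of 10 is
# an assumption; SDIMS tunes strategies per attribute at run time.

def choose_strategy(reads: float, writes: float, threshold: float = 10.0):
    if reads > threshold * writes:
        return ("up-all", "down-all")  # read-dominated: precompute everywhere
    if writes > threshold * reads:
        return ("up-0", "down-0")      # write-dominated: aggregate on demand
    return ("up-all", "down-0")        # mixed: aggregate to root, probe down

print(choose_strategy(100, 1))  # ('up-all', 'down-all')
print(choose_strategy(1, 100))  # ('up-0', 'down-0')
```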
   SDIMS defines the aggregation
    abstraction to mesh with its underlying
    scalable DHT system.
   SDIMS refines the basic DHT abstraction
    to form an Autonomous DHT (ADHT) to
    achieve the administrative isolation property.
Mapping to DHT
   Aggregating an attribute along its
    aggregation tree corresponds to using
    DHTtreek for k = hash(attributeType,
    attributeName).
   Different attributes will be aggregated
    along different trees.
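A sketch of the key derivation (SHA-1 via hashlib is an assumption for illustration; SDIMS uses the hash of its underlying DHT):

```python
import hashlib

# Each (attributeType, attributeName) pair hashes to a key k; the DHT's
# routing tree for k (DHTtree_k) becomes that attribute's aggregation
# tree, so load spreads across many trees instead of one.

def attribute_key(attr_type: str, attr_name: str) -> int:
    digest = hashlib.sha1(f"{attr_type}:{attr_name}".encode()).digest()
    return int.from_bytes(digest, "big")

k1 = attribute_key("fileLocation", "foo.txt")
k2 = attribute_key("numProcesses", "")
assert k1 != k2  # different attributes aggregate along different trees
```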
Administrative isolation
   For security
       Updates and probes are not accessible outside the domain.
   For availability
       Queries for values in a domain are not affected by
        failures of nodes in other domains.
   For efficiency
       Domain-scoped queries can be simple and efficient.
Administrative isolation
   Autonomous DHT
       Path Locality: Search paths should always
        be contained in the smallest possible domain.
       Path Convergence: Search paths for a key
        from different nodes in a domain should
        converge at a node in that domain.
            Administrative isolation
[Figure: with a standard DHT (L0: host, L2: univ.), a search path can leave Domain dept. and traverse Domain univ. — the administrative isolation property is violated]
            Administrative isolation
[Figure: the Autonomous DHT (L0: host, L2: dept.) keeps search paths inside Domain dept.]
   ADHT
       Distributed Computing (?)
   Aggregation Management Layer (AML)
       Lazy re-aggregation
       On-demand Re-aggregation
       Replication in Space
Two-layer architecture: ADHT and AML
   The ADHT layer informs the AML layer
    about reconfigurations in the network.
       NewParent
       FailedChild
       NewChild

   Child MIBs containing raw aggregate values
    gathered from children.
   Reduction MIB containing values locally
    aggregated across this raw information.
   Ancestor MIB containing aggregate values
    scattered down from ancestors.
Implementation (cont.)
   Attribute key: used to retrieve data
    and its aggregation function.
       (attributeType, attributeName)
    Implementation (cont.)
   A node acts
       as leaf for all attribute keys
       as a level-1 subtree root for keys whose hash
        matches the node’s ID in b prefix bits.
       as a level-i subtree root for keys whose hash
        matches the node’s ID in the initial i * b bits.
       as the system’s global root for attribute keys
        whose hash matches the node’s ID in more
        prefix bits than any other node.
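The prefix-matching rule above can be sketched as follows (IDs as integers, b bits per routing digit; the parameter values are invented):

```python
# A node acts as a level-i subtree root for a key when the key's hash
# matches the node's ID in the first i*b bits (b = bits per routing digit).

def root_level(node_id: int, key: int, b: int = 4, id_bits: int = 32) -> int:
    """Largest i such that node_id and key share their first i*b bits."""
    i = 0
    while (i + 1) * b <= id_bits:
        shift = id_bits - (i + 1) * b
        if (node_id >> shift) != (key >> shift):
            break
        i += 1
    return i

print(root_level(0x1A000000, 0x1B000000))  # 1: only the first hex digit matches
```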
     Evaluation
[Figure: Up-All, Down-0 — each node updates its own MIB; left: the monitored attribute changes rarely, right: it changes often]
    Evaluation (cont.)
[Figure: messages per operation — the session size is set to 8 (domain size), the branching factor is set to 16]

Evaluation (cont.)

[Figure: average path length to root vs. branch factor (Bf)]
        Evaluation (cont.)
[Figure: results for different branch factors (Bf)]
Evaluation (cont.)
[Figure: results for 40 and 100 nodes]
Evaluation (cont.)

   [Figure: SUM aggregated over 283 nodes, each node contributing a value of 10]
Evaluation (cont.)


                     [Figure: root node killed at 275 s]
     Conclusion
   Scalability with respect to both nodes
    and attributes through a new aggregation
    abstraction that helps leverage DHTs’
    internal trees for aggregation.
   Flexibility through a simple API that lets
    applications control propagation of reads
    and writes.
     Conclusion (cont.)
   Administrative isolation through simple
    augmentations of current DHT algorithms.
   Robustness to node and network
    reconfigurations through lazy reaggregation,
    on-demand reaggregation, and tunable
    spatial replication.
