SRB Matrix

         Data Grid Automation
                  Or
          What is SRB Matrix?


      Arun Jagatheesan et al.,
  San Diego Supercomputer Center
  University of California, San Diego

     VLDB Workshop on Data Management in Grids
       Trondheim, Norway, 2-3 September 2005


SDSC Storage Resource Broker   San Diego Supercomputer Center
                            Talk Outline

• Data Grid Landscape
• Long-run data management processes
   • Data Grid ILM
   • Data Grid Triggers
   • Dataflow Pipelines
• Execution Logic – Data Grid Language
• End-to-End Infrastructure Deployment
   • API
   • User GUI
• Service-oriented *Infrastructure*


          Data Grid Landscape




              The “Grid” Vision




Data Grid Resource Providers




• Grid Resource Providers (GRP) provide content and/or storage
[Figure: two GRPs holding the file /txt3.txt]
Data Grid Administrative Domain
• Administrative domain with one or more GFS Resource Providers
   • Could include their data centers
[Figure: a "Research Lab" domain containing two GRPs that hold /txt3.txt]
      Data Grid Administrative domains




• Storage-R-Us Resource Providers: data + storage (50), holding /…/text1.txt
• Research lab - Taiwan: data + storage (40), holding /…//text2.txt
• University: data + storage (10), holding /txt3.txt
[Figure: each administrative domain contains its own group of GRPs]
        Data Grid (Enterprise Utility)

• Physical resources are managed by autonomous administrative domains of the same enterprise (ABCZ.com)
[Figure: three domains: IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and a 3rd-party data center]
            Data Grid (Enterprise Utility)
• Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department
[Figure: Project 1 and Project 2 layered over IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and the 3rd-party data center]
           Data Grid (Enterprise Utility)

[Figure: Project1 through Project4 layered over IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and the 3rd-party data center]
Long-run Processes in Data Grid



                   • Data Grid ILM
                • Data Grid Triggers
                  • Data Gridflows




                  Data Grid ILM




                 Change is Constant

• Changes in access patterns
  • Based on the number of users accessing the data
  • Domains that want to access the data
• Data Value
  • The value of a data set (or collection?) for a particular domain,
    based on its business model and its users’ access patterns
  • Each domain will assign a different value based on its users
    and its role in the data grid



           “Data Value” based on users
• When more users access a project's data, its data value increases; move that data to a faster type of storage
[Figure: Project1 through Project4 layered over IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and the 3rd-party data center]
       “Data Value” based on domain
• When more users from the same domain access the data, the value of that particular data in that particular domain increases, so replicate the data to resources in that domain (the converse is also true)
[Figure: Project1 through Project4 layered over IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and the 3rd-party data center]
           “Data Value” based on role
• The 3rd party data center has no users who use the data, but is interested in holding a replica of any data (or deleted data) for long-term preservation
[Figure: Project1 through Project4 layered over IT Department US (ABCZ.com US), IT Department Asia (ABCZ.com Asia), and the 3rd-party data center]
                        Data Grid ILM
• ILM = Information Lifecycle Management
• Dynamic re-orientation of data placement and
  data retention policies (rules)
• Based on “business value of data” and storage
  cost
• HSM = Hierarchical Storage Management, based
  on “data freshness”; ILM goes one step further
• Applying this concept to a Data Grid is very tricky,
  as different autonomous domains have different
  business rules
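
The three "Data Value" rules (by users, by domain, by role) can be illustrated with a small Python sketch. Everything in it is hypothetical: `AccessLog`, `plan_placement`, the action names and the threshold values are assumptions made for illustration and are not part of SRB or Matrix. Reading "the converse is also true" as "drop a replica whose domain no longer uses it" is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AccessLog:
    """Access counts for one data set, keyed by the domain of the accessing users."""
    per_domain: Counter = field(default_factory=Counter)

    def record(self, domain: str, users: int = 1) -> None:
        self.per_domain[domain] += users

    def total(self) -> int:
        return sum(self.per_domain.values())


def plan_placement(log, replica_domains, hot_total=100, hot_domain=25,
                   archive_domain="3rd-party"):
    """Return a list of (action, target_domain) pairs for one data set.

    hot_total, hot_domain and archive_domain are illustrative policy knobs;
    in a real grid each autonomous domain would supply its own rules.
    """
    actions = []

    # Value based on users: many users overall -> faster storage.
    if log.total() >= hot_total:
        actions.append(("move_to_faster_storage", None))

    # Value based on domain: heavy use inside a domain -> replicate there.
    for domain, count in sorted(log.per_domain.items()):
        if count >= hot_domain and domain not in replica_domains:
            actions.append(("replicate", domain))

    # Assumed converse: a replica in a domain that barely uses it can go.
    for domain in sorted(replica_domains):
        if domain != archive_domain and log.per_domain[domain] < hot_domain:
            actions.append(("drop_replica", domain))

    # Value based on role: the archive keeps a replica regardless of use.
    if archive_domain not in replica_domains:
        actions.append(("replicate", archive_domain))

    return actions
```

A caller would record accesses as they happen and re-run `plan_placement` periodically, which is what makes this a long-run process rather than a one-off request.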

             Data Grid Triggers




                   Data Grid Triggers

• Similar to triggers in databases
• Based on ECA concepts
  • Event
  • Condition
  • Action
• Example
  • Event = Insert new file in collection (“/ourProject/data”)
  • Condition = (color = “blue” && galaxy = “Andromeda”)
  • Action = Run (selectiveDataReplicator.dgl)
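
The Event-Condition-Action pattern above can be sketched in a few lines of Python. This is a minimal illustration, not Matrix code: the `Event` and `Trigger` classes and the `dispatch` helper are hypothetical, and the action only returns a description string where a real system would execute the DGL document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Event:
    kind: str                  # e.g. "insert"
    collection: str            # collection the event happened in
    path: str                  # path of the affected file
    metadata: Dict[str, str]   # user-defined metadata of the file


@dataclass
class Trigger:
    event_kind: str
    collection: str
    condition: Callable[[Dict[str, str]], bool]
    action: Callable[[Event], str]

    def fire(self, event: Event) -> Optional[str]:
        """Run the action only if both the event and the condition match."""
        if event.kind != self.event_kind or event.collection != self.collection:
            return None
        if not self.condition(event.metadata):
            return None
        return self.action(event)


def dispatch(triggers: List[Trigger], event: Event) -> List[str]:
    """Offer one event to every registered trigger and collect the results."""
    results = []
    for trigger in triggers:
        outcome = trigger.fire(event)
        if outcome is not None:
            results.append(outcome)
    return results


# The example from the slide, expressed with the hypothetical classes above.
replicate_blue_andromeda = Trigger(
    event_kind="insert",
    collection="/ourProject/data",
    condition=lambda md: md.get("color") == "blue" and md.get("galaxy") == "Andromeda",
    action=lambda ev: f"run selectiveDataReplicator.dgl on {ev.path}",
)
```

Keeping the condition and the action as separate callables mirrors the ECA split: the event decides whether a trigger is considered at all, the condition filters on metadata, and only then does the action run.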

                 Data  Discovery
[Figure: stack of Digital entities, Meta-data, Services and State]
• New data updates relationships among data in collections
• Services are invoked to analyze new relationships
• DGMS applications get notified of state updates
                 Data Gridflows




       Gridflow in SCEC
 (data → information pipeline)
Pipeline stages, in the order drawn:
• Metadata derivation
• Ingest data
• Ingest metadata
• Determine analysis pipeline
• Initiate automated analysis
• Organize result data into distributed data grid collections
Notes:
• The pipeline could be triggered by input at the data source or by a data request from a user
• Use the optimal set of resources based on the task, on demand
• All gridflow activities are stored for data flow provenance
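
A staged pipeline that records every activity for provenance can be sketched as follows. This is a toy illustration under stated assumptions: `run_gridflow`, the step functions and the state keys are hypothetical names, and the "analysis" is a plain sum standing in for whatever a real SCEC pipeline would compute.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Tuple[str, Callable[[State], State]]


def run_gridflow(steps: List[Step], state: State) -> Tuple[State, List[Dict[str, Any]]]:
    """Run the steps in order and keep one provenance record per step."""
    provenance = []
    for name, func in steps:
        state = func(dict(state))  # copy so each step sees its own snapshot
        provenance.append({"step": name, "keys": sorted(state)})
    return state, provenance


def ingest_data(s: State) -> State:
    s["data"] = list(s["source"])
    return s


def ingest_metadata(s: State) -> State:
    s["metadata"] = {"count": len(s["data"])}
    return s


def determine_analysis(s: State) -> State:
    # Pick the analysis from the metadata (illustrative rule only).
    s["analysis"] = "sum" if s["metadata"]["count"] > 0 else "noop"
    return s


def run_analysis(s: State) -> State:
    s["result"] = sum(s["data"]) if s["analysis"] == "sum" else None
    return s


def organize_results(s: State) -> State:
    # File the result under a collection path derived from the analysis name.
    s["collection"] = {"/results/" + s["analysis"]: s["result"]}
    return s


PIPELINE: List[Step] = [
    ("ingest data", ingest_data),
    ("ingest metadata", ingest_metadata),
    ("determine analysis pipeline", determine_analysis),
    ("initiate automated analysis", run_analysis),
    ("organize result data", organize_results),
]
```

Calling `run_gridflow(PIPELINE, {"source": [1, 2, 3]})` returns both the final state and the provenance list, so the record of what ran is produced by the same loop that runs it rather than by separate logging code.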
    Data Grid Language (DGL)




                 Data Grid Language

• Requirement
  • Data Grid ILM process
     • The long-run process that has to be run is described in DGL
  • Data Grid Triggers
     • Action part of the ECA (Event-Condition-Action) logic
  • Data Gridflows
     • Step-by-step execution of a long-run process on the Data Grid
• Analogous to SQL in relational databases
  • Long-run process procedures are stored and executed in the Data
    Grid itself
  • Captures the “Infrastructure Execution Logic”

                  DGL Request
[Figure: structure of a DGL request]
• Carries annotations about the Data Grid request
• Can be either a Flow or a Status Query
             DGL Requests (2 types)

• Data Grid Flow
  • An XML structure that describes the execution logic,
    associated procedural rules and DGL variables. Can be a
    synchronous or an asynchronous flow
• Status Query
  • An XML structure used to query the execution status of any
    gridflow or sub-flow at any level of granularity. Status Queries
    can be made for both synchronous and asynchronous
    flows




                               Flow
[Figure: structure of a Flow]
• Scoped variables that can control the flow
• Logic used by the sub-members
• Sub-members that are the real execution statements
Flow Logic (How a flow executes)




…
<userDefinedRule name="beforeEntry">
  <condition>
    <simpleQuery>$numVar == 1</simpleQuery>
  </condition>

  <action name="true">
    <actionString>SET var1 = 1</actionString>
  </action>
  <action name="true">
    <actionString>SET var2 = "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 = 0</actionString>
  </action>
</userDefinedRule>
…
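
One plausible reading of the rule fragment above is: evaluate the condition, then run every action whose name matches the outcome ("true" or "false") in document order. The Python sketch below implements only that reading. It is not the Matrix engine: `apply_rule`, `evaluate_query` and `parse_literal` are hypothetical helpers, and they handle just the `$var == literal` and `SET var = literal` forms that appear in the fragment.

```python
import re
import xml.etree.ElementTree as ET

# The rule from the slide, copied without the surrounding elided context.
RULE_XML = """
<userDefinedRule name="beforeEntry">
  <condition>
    <simpleQuery>$numVar == 1</simpleQuery>
  </condition>
  <action name="true">
    <actionString>SET var1 = 1</actionString>
  </action>
  <action name="true">
    <actionString>SET var2 = "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 = 0</actionString>
  </action>
</userDefinedRule>
"""


def parse_literal(text):
    """Turn a quoted string or an integer literal into a Python value."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    return int(text)


def evaluate_query(query, variables):
    """Evaluate a '$name == literal' query against the current variables."""
    match = re.fullmatch(r"\s*\$(\w+)\s*==\s*(.+?)\s*", query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    name, literal = match.groups()
    return variables.get(name) == parse_literal(literal)


def apply_rule(rule_xml, variables):
    """Return a copy of variables after running the matching actions."""
    rule = ET.fromstring(rule_xml.strip())
    outcome = evaluate_query(rule.findtext("condition/simpleQuery"), variables)
    branch = "true" if outcome else "false"
    result = dict(variables)
    for action in rule.findall("action"):
        if action.get("name") != branch:
            continue  # assumed semantics: only the matching branch runs
        text = action.findtext("actionString")
        match = re.fullmatch(r"\s*SET\s+(\w+)\s*=\s*(.+?)\s*", text)
        if match is None:
            raise ValueError(f"unsupported action: {text!r}")
        result[match.group(1)] = parse_literal(match.group(2))
    return result
```

Under this reading, a flow entered with `numVar` equal to 1 gets both `var1` and `var2` set, while any other value takes the single "false" action.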



                What is SRB Matrix?

• Matrix provides the SRB as a Web Service
  • Web Service based on Data Grid Language
• SOA for Data Grid or Digital Library
  • Service-oriented *infrastructure*
• Asynchronous end-user facing applications
  • Long-run operations presented to users as portlets
• Data Grid Automation and ILM
  • File Triggers on unstructured data
  • Automated movement or management of data

 Matrix Gridflow Server Architecture
[Figure: layered architecture of the Matrix gridflow server, top to bottom]
• JAXM Wrapper, WSDL Description, SOAP Service for Matrix Clients, Event Publish/Subscribe and Notification, JMS Messaging Interface
• Matrix Data Grid Request Processor
• Sangam P2P Gridflow Broker and Protocols
• Transaction Handler, Status Query Handler, Workflow Query Processor
• Flow Handler and Execution Manager, XQuery Processor, ECA Rules Handler, Gridflow Meta data Manager
• Matrix Agent Abstraction: SDSC SRB Agents, other SDSC Data Services, agents for Java, WSDL and other grid executables
• Persistence (Store) Abstraction: JDBC, In Memory Store
                           Conclusion

• Data Grids are evolving
• Data Grid Automation of long-run processes is
  essential
• Need a language for Data Grid Automation
• Data Grid Language is one such effort, as part of the
  SRB Matrix Project
• Open source project for anyone to use (or join)
• talk2matrix@sdsc.edu (or arun@sdsc.edu)



				