PowerPoint - Semantic Grid

Metadata and Information Services for an Earthquake Simulation Grid
Mehmet Aktas, Marlon Pierce, and Geoffrey Fox
Community Grids Lab
Indiana University
       SERVOGrid Background
   Web services based grid for supporting
    earthquake simulation
   Components
    • Databases: faults, GPS, Seismic catalogs
    • Simulation codes: Monte Carlo, FEM, mesh
      generation tools
    • Web services
          Data access, job management (“workflow”), file
           transfer, session management.
    • Portlet-based user interface
   Information services
    Information Service Complaints
   I have never been happy with the various
    information services.
   In my experience, data models have problems
     • Tree model forces arbitrary decisions about container
       organization
     • Overuse of <any> tags
   Poorly maintained information
     • UDDI problem: registries are filled with obsolete
       information.
   Information servers tend to be very centralized.
     • Peer approaches need to be examined
    Semantic Information Services
   After reviewing the Semantic Web specifications,
    I became interested in using them for information
    services.
   Graph models seem to be a more natural way to
    extend interlinked information.
   Using URIs, potentially easier to support
    fragmented data
    • Centralized data services can be used but also P2P
      approaches.
   I am not so interested in artificial intelligence,
    reasoning, etc.
    • Interesting problems, but for someone else.
   We see two different activities
    • Designing an RDFS/OWL ontology to act as SERVOGrid
      information services data models
    • Implementing the information services middleware.
An Ontology Overview
     Sample Simulation Codes
     Disloc: calculates surface stress displacements caused by
    a fault placed in an elastic half-space. Surface data can
    be either on a grid or on defined scattered points. Can
    also create InSAR-style surface displacements.
   Simplex: inverts Disloc to estimate fault parameters from
    observed surface displacements. Surface displacements
    can be either on a grid or at defined points.
   GeoFEST: does a realistic model of stresses created by a
    fault. Uses finite element method, realistic material
    properties.
   AKIRA: Converts a geometry (layers, faults) specification
    into a finite element mesh. Successive calls refine the
    mesh. Needed as a helper application for GeoFEST.
   Virtual California: Based on realistic fault and fault friction
    models, simulates interacting fault systems.
          Visualization Codes
   We associate simulation codes with
    zero or more visualization systems.
    • GMT (General Mapping Tool)
    • IDL
    • RIVA
    • Web Map Service (GIS)
   In practice, we usually refer to
    scripts for specific tasks rather than
    the entire toolkit.
    Sample Compute Resources
     Grids: a Sun Ultra 60 with Disloc,
    Simplex, and VC installed.
     Danube: a dual-processor Linux
    machine with GeoFEST, AKIRA, and GMT
    installed.
     Jabba: an 8-processor SGI machine
    with RIVA installed.
            Data Types and Formats
   This is a mixture of data objects and
    representations. As always, the data itself
    is not represented but information like the
    creator of the data is.
    •   Faults
    •   GPS data
    •   Seismicity
    •   Surface stress data
    •   InSAR data
    •   Surface data representation: grid or point data
Managing Distributed Metadata
   Small problem with the Semantic
    Web/Grid:
    • How do you manage fragments of dynamic
      metadata?
    • (Assume a uniform data model)
   In our case, we need a “medium sized”
    distributed information system
    • Not the entire web, but dynamic enough to
      benefit from distributed information systems.
   We want to strike a balance between
    response time efficiency and reliability.
               Cache Nodes
     Instances of the SERVO ontology are initially
    distributed over several cache nodes.
     No one cache has all the instances.
     Caches are accessed as peer-to-peer nodes.
   [Figure: example triples held by separate cache nodes:
    (grids,hasCode,disloc), (danube,hasCode,geoFEST),
    and (kamet,hasCode,Slider)]
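A minimal sketch of the fragmentation described on this slide, assuming triples are plain (subject, property, object) tuples. The node names `cache1` to `cache3` and the helper `all_triples` are hypothetical; only the three example triples come from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Each (hypothetically named) cache node starts with only a fragment
# of the SERVO ontology instances.
cache_nodes = {
    "cache1": {Triple("grids", "hasCode", "disloc")},
    "cache2": {Triple("danube", "hasCode", "geoFEST")},
    "cache3": {Triple("kamet", "hasCode", "Slider")},
}


def all_triples(nodes):
    """Union of every fragment; no single node holds this whole set."""
    merged = set()
    for fragment in nodes.values():
        merged |= fragment
    return merged
```

The full set of instances only exists as the union over the nodes, which is why a query may need to travel beyond the first cache it reaches.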
       Querying a Proxy Cache
   Clients can connect to any of the cache
    nodes via a Web service connection.
   Queries and responses are just SOAP
    requests.
   If the Proxy cache can’t answer a query, it
    does a P2P search of all neighbors.
   If/when query is answered, the initiating
    proxy cache augments its RDF store with
    the new info from the peer.
    • It can henceforth answer that query without
      searching.
   [Figure: a Client sends the query (?,hasCode,GeoFEST) as a SOAP call to
    Proxy Cache #1, shown with (Grids,hasCode,Disloc) and (Danube,hasCode,GeoFEST);
    a Peer Search links it to Proxy Cache #2, shown with (Danube,hasCode,GeoFEST)]
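The query-then-augment behavior of a proxy cache can be sketched roughly as follows. This is an illustration only: the class `ProxyCache`, its methods, and the `peer_searches` counter are hypothetical names, the SOAP layer is left out, and `None` stands in for the "?" wildcard.

```python
class ProxyCache:
    """Hypothetical proxy cache holding (subject, property, object) tuples."""

    def __init__(self, name, triples=()):
        self.name = name
        self.store = set(triples)
        self.neighbors = []      # peer caches reachable by P2P search
        self.peer_searches = 0   # how often peers had to be asked

    def match(self, pattern):
        # None plays the role of "?" in the query pattern.
        return {t for t in self.store
                if all(want is None or want == got
                       for want, got in zip(pattern, t))}

    def query(self, pattern):
        hits = self.match(pattern)
        if hits:
            return hits                  # answered from the local store
        self.peer_searches += 1
        for peer in self.neighbors:      # P2P search of all neighbors
            hits |= peer.match(pattern)
        self.store |= hits               # augment the local store
        return hits


cache1 = ProxyCache("proxy1", [("Grids", "hasCode", "Disloc")])
cache2 = ProxyCache("proxy2", [("Danube", "hasCode", "GeoFEST")])
cache1.neighbors.append(cache2)

first = cache1.query((None, "hasCode", "GeoFEST"))   # needs a peer search
second = cache1.query((None, "hasCode", "GeoFEST"))  # answered locally
```

After the first call, proxy cache #1 holds the Danube triple itself, so the second call does not trigger another peer search.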
      Notification Updates to Proxy Caches
     Proxy caches acquire larger sets of metadata
    over time in response to client queries.
     The problem now is that caches can become out of
    sync.
    • Disloc may be removed from Grids, so all caches have to
      be notified.
     This is handled through a publish/subscribe system
    based on topics.
     There is one topic for each property.
     Caches subscribe to topics for each property.
     Origin caches are allowed to publish changes.
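A rough sketch of property-based notification, assuming an in-memory broker. The `Broker` and `Cache` classes are hypothetical stand-ins and do not reflect the API of any real messaging system; the example only handles removals, as in the Disloc case above.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker with one topic per property name."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:
            callback(message)


class Cache:
    def __init__(self, broker, triples):
        self.store = set(triples)
        # Subscribe to the topic of every property present in the store.
        for prop in {p for (_, p, _) in self.store}:
            broker.subscribe(prop, self.on_removed)

    def on_removed(self, triple):
        self.store.discard(triple)


broker = Broker()
stale = ("Grids", "hasCode", "Disloc")
cache_a = Cache(broker, [stale, ("Danube", "hasCode", "GeoFEST")])
cache_b = Cache(broker, [stale])

# The origin cache announces that Disloc was removed from Grids.
broker.publish("hasCode", stale)
```

Both caches drop the stale triple while unrelated triples stay in place.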
            More Information
     QuakeSim: http://www-aig.jpl.nasa.gov/public/dus/quakesim/
     Semantic Web Work: http://grids.ucs.indiana.edu/~maktas/servo/
     NASA CT and AIST support the QuakeSim project, and
    NASA Ames supported the Semantic Grid investigations.
 Querying Cache Space
   [Figure: Client, Web Service, Broker Cloud, and Cache Space]
                         The Picture
-   Each peer of the P2P network works as a Proxy Cache. A Proxy Cache
    forms a door between the client and the Cache Space.
-   Clients interact with peers through a Web Service interface.
-   When a client queries a peer where the cache is installed, this peer
    queries its own cache and then forwards the query to the rest of the Cache Space.
-   Forwarding happens simply by publishing the query to the available topics.
    With this method, the query is distributed stepwise to the nodes that are
    semantically connected to the origin Proxy Cache.
-   Each query message carries the unique identifier of the peer that originated
    the query, so that results can be propagated back to it.
-   Each cache repeats the querying and forwarding process until
    there are results.
-   When there are results for the query, they are propagated back as
    an RDF Model.
-   Distributed search stops when there are results satisfying the
    query or when no results are found after a customizable
    threshold on the number of exploration steps.
       What About WS-<any>?
   We are examining the feasibility of using
    RDF and related languages to describe our
    information requirements.
    • Build a testbed infrastructure for decentralized
      metadata management as a proof of concept.
   There are many activities and
    specifications in this general area that we
    do not want to use in the proof-of-concept
    phase.
    • WS-Notification and WSRF obviously.
              Edutella: P2P Network Infrastructure
     Edutella uses the JXTA framework for P2P functionality and provides services
    that complement the JXTA service layer.
     Edutella uses RDF syntax for metadata. Each peer provides a Query
    Service to search its RDF repository.
     There is no stepwise exploration of the peers. A JXTA peer sends a query
    only to its JXTA neighborhood regardless of the link structure of the
    metadata.
     There is no caching of the metadata on the peers to decrease the search
    time. Each JXTA peer performs a query on its own data repository.
 High-level classification of classes in the SERVO Grid Ontology
Servo Grid Resources
Classes                    Description
ServoObjects               describes the code and the data created as a result of an
                           experiment or creativity
ServoDataFormat            describes the types of different data formats used in the
                           SERVO Grid Project
ServoComputePlatform       describes the computing platforms that exist in the SERVO Grid
                           Project
ServoCodeCharacteristics   describes the characteristics of the ServoCode, e.g. model,
                           programming languages used in ServoCode
ServoObjectContainer       describes the various types of containers for code, data
                           and documents, e.g. HDF file
Organization               describes the organizations that are involved in the SERVO
                           Grid Project
Person                     describes the people working in the SERVO Grid Project
Location                   describes the location of the organization involved in the
                           SERVO Grid Project
     Classification of ServoComputePlatform

ServoComputePlatform
Classes                Description
ComputeResources       describes the computers providing the computational
                       environment for the SERVO Grid project
InstalledWebServices   describes the web services providing the message-oriented
                       computing environment in the SERVO Grid environment
Classification of Servo Code Characteristics

Servo Code Characteristics
Classes                      Description
KnownProblemsInCompiling     describes the general problems encountered while compiling
                             the ServoCodes
KnownSuccessfulLibrary       describes the libraries that have been successfully tested and
                             used with ServoCodes
UsedModel                    describes various earth science models used in ServoCodes,
                             e.g. elastic, inverse, viscoelastic, etc.
UsedProgrammingLanguage      describes the programming languages used in implementing
                             ServoCodes
   Properties associated with the ServoCode Class
ServoCode Class
Properties            Range                     Description
isDevelopedWith       UsedProgrammingLanguage   what programming language is
                                                used to develop the code
createsOutputData                               what kind of data a code takes
   \ takesInputData   ServoData                 as input and generates as output
dependsUpon           ServoCode                 a code depends upon another
                                                operation before it can be completed
installedOn           ServoComputePlatform      where a ServoCode is installed
developedBy           Person, Organization      person or organization who
                                                develops the resource
isOwnedBy             Person, Organization      person or organization who
                                                owns the resource
   Properties associated with the ServoData Class

ServoData Class

Properties             Range             Description
hasDataFortmatOf       ServoDataFormat   associates a data format with a
                                         piece of data
isInputDataFor                           what kind of code is using or taking
   \ isOutputDataFor   ServoCode         this data as input/output; inverse
                                         properties of takesInputData and
                                         createsOutputData
   Properties associated with the ServoComputePlatform Class
ServoComputePlatform Class
Properties             Range                  Description
isOwnedby              Person, Organization   describes the person or organization
                                              that owns the resource
hasData \ hasCode      ServoData              describes the ServoData or ServoCode
                          \ ServoCode         accessed through this compute platform
isMaintainedBy         Person                 describes the person that maintains
                                              this compute platform
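As an illustration of how the property tables above could be used, the sketch below records the range of a few properties in a plain dictionary and checks whether a triple's object has an allowed class. The dictionary layout, the `INSTANCE_TYPES` examples, and the `in_range` helper are assumptions for illustration; the actual ontology is defined in RDFS/OWL, not in Python.

```python
# Ranges copied from the property tables; the dict layout is hypothetical.
PROPERTY_RANGES = {
    "isDevelopedWith": {"UsedProgrammingLanguage"},
    "dependsUpon": {"ServoCode"},
    "installedOn": {"ServoComputePlatform"},
    "developedBy": {"Person", "Organization"},
    "hasCode": {"ServoCode"},
    "hasData": {"ServoData"},
    "isMaintainedBy": {"Person"},
}

# Example instance typing, assumed for illustration only.
INSTANCE_TYPES = {
    "Danube": "ServoComputePlatform",
    "GeoFEST": "ServoCode",
}


def in_range(prop, obj, instance_types):
    """True if the class of `obj` is in the declared range of `prop`."""
    return instance_types.get(obj) in PROPERTY_RANGES.get(prop, set())
```

For example, `in_range("hasCode", "GeoFEST", INSTANCE_TYPES)` holds, while using a compute platform as the object of `hasCode` does not.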
       Notification-Based Caching
   We want to strike a balance between centralized
    and decentralized content management.
   Metadata instances may be distributed over
    several hosts.
   We are investigating caching based on breadth-
    first search.
    • Each node stores its own data and all of the immediate
      property values, one node deep.
   This allows each node to maintain a moderate
    amount of information sufficient to satisfy
    immediate (one-hop) RDF queries.
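The one-node-deep caching idea can be sketched as a depth-limited breadth-first search over triples. The function name `neighborhood` and the example triples are illustrative assumptions (for instance, the `dependsUpon` link between GeoFEST and AKIRA is inferred from the code descriptions, not taken from real instance data).

```python
from collections import deque


def neighborhood(triples, start, max_depth=1):
    """Collect triples reachable from `start` within `max_depth` hops (BFS)."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    found, seen, queue = set(), {start}, deque([(start, 0)])
    while queue:
        resource, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for triple in by_subject.get(resource, []):
            found.add(triple)
            nxt = triple[2]
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return found


example = {
    ("Danube", "hasCode", "GeoFEST"),
    ("GeoFEST", "dependsUpon", "AKIRA"),
    ("AKIRA", "installedOn", "Danube"),
}
one_hop = neighborhood(example, "Danube", max_depth=1)
two_hop = neighborhood(example, "Danube", max_depth=2)
```

With `max_depth=1` a node keeps only its own immediate property values, which is enough to answer one-hop queries without contacting peers.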
    Distributed RDF Queries Based on Properties
     Queries are formed as triples to find available
    metadata about a resource.
     Results are metadata (a set of triples) regarding
    the requested resource.
     A query may be issued to any P2P node where
    the cache is installed.
     Starting from the first cache, each cache is queried
    via stepwise exploration.
     • The first cache interacting with the client is the
       Proxy cache.
     • The Proxy cache distributes the client’s query
       with its unique identifier to be able to receive
       the results.
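A small sketch of a triple-pattern query that carries the originating proxy's identifier. The `QueryMessage` type, the `answer` helper, and the identifier `"proxy-1"` are hypothetical, and the `isMaintainedBy` value is made up for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


@dataclass(frozen=True)
class QueryMessage:
    origin_id: str    # identifier of the proxy cache that received the client
    pattern: Pattern  # None marks an unknown ("?") position


def matches(pattern, triple):
    return all(want is None or want == got
               for want, got in zip(pattern, triple))


def answer(store, message):
    """Return (where to send the reply, matching set of triples)."""
    results = {t for t in store if matches(message.pattern, t)}
    return message.origin_id, results


store = {
    ("danube", "hasCode", "geoFEST"),
    ("danube", "isMaintainedBy", "admin1"),  # hypothetical value
    ("grids", "hasCode", "disloc"),
}
msg = QueryMessage("proxy-1", ("danube", None, None))
reply_to, metadata = answer(store, msg)
```

Here the query asks for all metadata about `danube`, and the reply is addressed to the proxy whose identifier travelled with the query.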
      Distributed Query Steps Occur as Follows
     The first cache in the cache space is queried.
     When there are no results, the query is published to
    the available topics with the unique identifier of the
    first P2P node.
     Each cache repeats the querying and forwarding
    process until there are results.
     When there are results satisfying the query, they
    are propagated back as an RDF Model.
     Distributed search stops when there are results
    satisfying the query or when no results are
    found after a customizable threshold on the
    number of exploration steps.
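The steps above can be sketched as a wave-by-wave search with a step threshold. This is a simplification under stated assumptions: forwarding is modelled by a static `links` table rather than by publishing to topics, and the names `distributed_query`, `stores`, `links`, and `max_steps` are hypothetical.

```python
def matches(pattern, triple):
    # None marks an unknown ("?") position in the pattern.
    return all(want is None or want == got
               for want, got in zip(pattern, triple))


def distributed_query(start, pattern, stores, links, max_steps=3):
    """Query `start`, then forward step by step until results or threshold.

    Returns (results, step at which they were found), or (set(), None).
    """
    frontier, visited = [start], {start}
    for step in range(max_steps + 1):
        results = set()
        for node in frontier:
            results |= {t for t in stores[node] if matches(pattern, t)}
        if results:
            return results, step
        next_frontier = []
        for node in frontier:
            for peer in links.get(node, []):
                if peer not in visited:
                    visited.add(peer)
                    next_frontier.append(peer)
        if not next_frontier:
            break  # nobody left to forward the query to
        frontier = next_frontier
    return set(), None


stores = {
    "A": {("grids", "hasCode", "disloc")},
    "B": {("danube", "hasCode", "geoFEST")},
    "C": {("kamet", "hasCode", "Slider")},
}
links = {"A": ["B"], "B": ["C"]}
found = distributed_query("A", (None, "hasCode", "Slider"), stores, links)
gave_up = distributed_query("A", (None, "hasCode", "Slider"), stores, links,
                            max_steps=1)
```

The threshold trades completeness for response time: with `max_steps=1` the search gives up before reaching node C.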
        Initialization of the System with Fragmented RDF
   When a node is bootstrapped, a triple store is
    created out of available RDF Models at that node.
   Predicates of available triples (where the object
    of the triple is a Resource) form topics.
   Topics are created at the broker node
    dynamically by publishing a message to static
    topics such as “createTopic”.
   A Resource metadata provider can be a publisher
    for the topics where the Resource is the Domain
    of that topic.
   A node can be a subscriber to a topic when there
    are Resource Objects (in the cached triple store)
    that are in the Range of that topic.
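A sketch of how topics and publisher/subscriber roles might be derived from a node's triple store at bootstrap. The `Resource` wrapper (used to tell resources from literals), the helper names, and the `hasLabel` property are assumptions for illustration and are not part of the SERVO ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Marks a value as an RDF resource rather than a literal."""
    uri: str


def topics_for(triples):
    """Predicates whose object is a Resource form the topics."""
    return {p for (_, p, o) in triples if isinstance(o, Resource)}


def publisher_topics(triples, owned):
    """Topics a node may publish on: it provides the subject's metadata."""
    return {p for (s, p, o) in triples
            if isinstance(o, Resource) and s in owned}


def subscriber_topics(cached_triples):
    """Topics a node listens on: it caches Resource objects for them."""
    return topics_for(cached_triples)


danube = Resource("danube")
geofest = Resource("geoFEST")
cached = [
    (danube, "hasCode", geofest),
    (danube, "hasLabel", "Danube"),  # literal object, so no topic
]
```

Only `hasCode` becomes a topic here, because the object of the `hasLabel` triple is a literal.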
      Notification-Based Updates
     Static topics, such as createTopic and deleteTopic, are
    used to create dynamic topics.
     A finite number of topics is available
     • The SERVOGrid ontology defines all possible topics
       (predicates) available
     The publisher node of a topic stores the origin
    metadata (set of triples)
     • The metadata provider is responsible for propagating
       updates
     The subscriber node of a topic listens for updates
    to the resources that are in the range of that
    topic.
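One way to picture static versus dynamic topics is the sketch below, in which messages on the static topics `createTopic` and `deleteTopic` manage a finite set of dynamic topics. The `TopicBroker` class is a hypothetical in-memory model and does not represent the interface of NaradaBrokering or any other broker.

```python
class TopicBroker:
    """Hypothetical broker: static topics manage a finite set of dynamic ones."""

    def __init__(self, allowed_topics):
        self.allowed = set(allowed_topics)  # predicates defined by the ontology
        self.dynamic = {}                   # topic name -> subscriber callbacks

    def subscribe(self, topic, callback):
        self.dynamic[topic].append(callback)

    def publish(self, topic, payload):
        if topic == "createTopic":
            if payload not in self.allowed:
                raise ValueError(f"{payload} is not defined by the ontology")
            self.dynamic.setdefault(payload, [])
        elif topic == "deleteTopic":
            self.dynamic.pop(payload, None)
        elif topic in self.dynamic:
            for callback in self.dynamic[topic]:
                callback(payload)
        else:
            raise KeyError(f"unknown topic {topic}")


broker = TopicBroker(allowed_topics={"hasCode", "hasData"})
received = []
broker.publish("createTopic", "hasCode")
broker.subscribe("hasCode", received.append)
broker.publish("hasCode", ("Grids", "hasCode", "Disloc"))
broker.publish("deleteTopic", "hasCode")
```

Restricting `createTopic` to the ontology's predicates keeps the number of topics finite, as the slide requires.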
     Topic-Based Publish/Subscribe Systems
     Publish/subscribe systems are a way to
    distribute messages to many different listeners.
     Publishers and subscribers are associated with topics.
     We use the in-house developed NaradaBrokering system
    • JMS, WS-Notification
    • S. Pallickara
   [Figure: a Publisher and three Subscribers connected through the Broker Cloud]
				