					      Provenance: overview


                          Professor Luc Moreau
                        L.Moreau@ecs.soton.ac.uk
                        University of Southampton
                        www.ecs.soton.ac.uk/~lavm




Architecture Tutorial
  Provenance & PASOA Teams
  •   University of Southampton
        – Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia
          Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen
  •   IBM UK (EU Project Coordinator)
        – John Ibbotson, Neil Hardman, Alexis Biller
  •   University of Wales, Cardiff
        – Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari
  •   Universitat Politècnica de Catalunya (UPC)
        – Steven Willmott, Javier Vazquez
  •   SZTAKI
        – Laszlo Varga, Arpad Andics,
          Tamas Kifor
  •   German Aerospace Center (DLR)
        – Andreas Schreiber, Guy Kloss,
          Frank Danneman




  Contents

  •   Motivation
  •   Provenance Concepts
  •   Provenance Architecture
  •   Standardisation
  •   Conclusions




        Motivation




  Scientific Research




                        Academic Peer Review

  Business Regulations
                        Accounting




                           Audit (Sarbanes-Oxley)


                                                    Banking




                                               Audit (Basel II)

  Health Care Management




                        European
                        Recommendation
                        R(97)5: on the
                        protection of medical
                        data

  e-Science datasets

  • How to undertake peer-reviewing and
    validation of e-Scientific results?




  Compliance with Regulations


     • The “next-compliance”
       problem
          – Can we be certain that
            by ensuring compliance
            with a new regulation, we
            do not break previous
            compliance?




  Current Solutions

         • Proprietary, Monolithic
         • Silos, Closed
         • Do not inter-operate
           with other applications
         • Not adaptable to new
           regulations




  Provenance

  • Oxford English Dictionary:
        – the fact of coming from some particular source or
          quarter; origin, derivation
        – the history or pedigree of a work of art, manuscript,
          rare book, etc.;
        – concretely, a record of the passage
          of an item through its various
          owners.
  • Concept vs representation



  Provenance in Computer
  Systems
  • Our definition of provenance in the context of
    applications for which process matters to end users:

          The provenance of a piece of data is the
          process that led to that piece of data

  • Our aim is to conceive a computer-based representation
    of provenance that allows us to perform useful analysis
    and reasoning to support our use cases




  Our Approach

    • Define core concepts pertaining to
      provenance
    • Specify functionality required to become
      “provenance-aware”
    • Define open data models and protocols
      that allow systems to inter-operate
    • Standardise data models and protocols
    • Provide a reference implementation
    • Provide reasoning capability

  Context (1)


         Aerospace engineering:
         maintain a historical
         record of design
         processes, up to 99 years.



                             Organ transplant management:
                             tracking of previous decisions,
                             crucial to maximise the efficiency
                             in matching and recovery rate of
                             patients

  Context (2)

      Bioinformatics: verification and
      auditing of “experiments” (e.g.
      for drug approval)




                              High Energy Physics:
                              tracking, analysing, verifying
                              data sets in the ATLAS
                              Experiment of the Large
                              Hadron Collider (CERN)

        Provenance Concepts




  Provenance “Lifecycle”
     Core Interfaces to Provenance
     Store

     [Diagram: Application (producing Data Results) -> Record
     Documentation of Execution -> Provenance Store. Other interfaces
     to the store: Administer Store and its contents; Query and Reason
     over Provenance of Data.]

  Nature of Documentation
  • We represent the provenance of some data by
    documenting the process that led to the data:
     – documentation can be complete or partial;
     – it can be accurate or inaccurate;
     – it can present conflicting or consensual views
       of the actors involved;
     – it can provide operational details of execution
       or it can be abstract.




  p-assertion
  • A given element of process documentation
    is referred to as a p-assertion
    – p-assertion: an assertion that is made
      by an actor and pertains to a process.




  Service Oriented
  Architecture
  • A service is broadly defined as a component that takes
    some inputs and produces some outputs.
  • Services are brought together to solve a given problem,
    typically via a workflow definition that specifies their
    composition.
  • Interactions with services take place through messages that
    are constructed according to the service's interface
    specification.
  • The term actor denotes either a client or a service in a
    SOA.
  • A process is defined as the execution of a workflow.




  Process Documentation (1)
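   From these p-assertions, we can derive that M3 was sent by Actor 1
   and received by Actor 2 (and likewise that M4 was sent by Actor 2
   and received by Actor 1)

   [Diagram: Actor 1 receives M1, sends M3 to Actor 2, receives M4 from
   Actor 2, and sends M2.]

   Actor 1 asserts:
     I received M1, M4
     I sent M2, M3
   Actor 2 asserts:
     I received M3
     I sent M4
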


If actors are black boxes, these assertions are not very useful because
we do not know dependencies between messages
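
The derivation described on this slide can be sketched in a few lines of Python. This is only an illustration: the class `InteractionPAssertion`, its field names, and the function `derive_flows` are hypothetical and are not taken from the PASOA or EU Provenance software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionPAssertion:
    """Hypothetical record of one 'I sent X' / 'I received X' statement."""
    asserter: str  # actor making the assertion
    role: str      # "sent" or "received"
    message: str   # message identifier


# The assertions shown on the slide.
ASSERTIONS = [
    InteractionPAssertion("Actor 1", "received", "M1"),
    InteractionPAssertion("Actor 1", "received", "M4"),
    InteractionPAssertion("Actor 1", "sent", "M2"),
    InteractionPAssertion("Actor 1", "sent", "M3"),
    InteractionPAssertion("Actor 2", "received", "M3"),
    InteractionPAssertion("Actor 2", "sent", "M4"),
]


def derive_flows(assertions):
    """Pair the sender's view of each message with the receiver's view.

    Returns {message: (sender, receiver)}; a side that nobody
    documented is left as None.
    """
    flows = {}
    for a in assertions:
        sender, receiver = flows.get(a.message, (None, None))
        if a.role == "sent":
            sender = a.asserter
        else:
            receiver = a.asserter
        flows[a.message] = (sender, receiver)
    return flows


flows = derive_flows(ASSERTIONS)
for message in sorted(flows):
    print(message, flows[message])
```

M3 and M4 come out with both a sender and a receiver, while M1 and M2 have only one documented side. Nothing in this data says which incoming message led to which outgoing one, which is the limitation the slide points out.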

  Process Documentation (2)
              These assertions help identify order of messages,
              but not how data was computed
   [Diagram: Actor 1 receives M1, sends M3 to Actor 2, receives M4 from
   Actor 2, and sends M2.]

   Actor 1 asserts:
     M2 is in reply to M1
     M3 is caused by M1
     M2 is caused by M4
   Actor 2 asserts:
     M4 is in reply to M3




  Process Documentation (3)
          These assertions help identify how data is computed,
          but provide no information about non-functional
          characteristics of the computation
          (time, resources used, etc)

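   [Diagram: Actor 1 receives M1, applies f1 to produce M3 for Actor 2,
   and applies f2 to produce M2; Actor 2 applies f to M3 to produce M4.]

   Actor 1 asserts:
     M3 = f1(M1)
     M2 = f2(M1,M4)
   Actor 2 asserts:
     M4 = f(M3)
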



  Process Documentation (4)

   [Diagram: Actor 1 receives M1, sends M3 to Actor 2, receives M4 from
   Actor 2, and sends M2.]

   Actor 1 asserts:
     I used 386 cluster
     Request sat in queue for 6min
   Actor 2 asserts:
     I used sparc processor
     I used algorithm x version x.y.z




  Types of p-assertions (1)
        – Interaction p-assertion: an assertion of the
          contents of a message by an actor that has
          sent or received that message




                            I received M1, M4
                               I sent M2, M3




  Types of p-assertions (2)
       – Relationship p-assertion: an assertion, made
         by an actor, that describes how the actor
         obtained an output message sent in an
         interaction by applying some function to input
         messages from other interactions (likewise for
         data)



                    M2 is in reply to M1
                    M3 is caused by M1
                    M2 is caused by M4
                    M3 = f1(M1)
                    M2 = f2(M1,M4)




  Types of p-assertions (3)
        – Actor state p-assertion: an assertion made by an
          actor about its internal state in the context of a
          specific interaction




                             I used sparc processor
                             I used algorithm x version x.y.z




  Data flow

  • Interaction p-assertions allow us to specify
    a flow of data between actors
  • Relationship p-assertions allow us to
    characterise the flow of data “inside” an
    actor
  • Overall data flow (internal + external)
    constitutes a directed acyclic graph (DAG),
    which characterises the process that led to a result
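
As a rough illustration of this data-flow graph, the relationship p-assertions from the Process Documentation (3) slide can be loaded into a dependency graph and traversed. The dictionary layout and the helper names `dependency_graph` and `provenance_of` are assumptions made for this sketch. Message identifiers are treated as shared between sender and receiver, so the external flow (M3 and M4 crossing between actors) is implicit. `graphlib` is in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Relationship p-assertions, written as output -> (function, inputs).
DERIVED_FROM = {
    "M3": ("f1", ["M1"]),        # asserted by Actor 1
    "M4": ("f", ["M3"]),         # asserted by Actor 2
    "M2": ("f2", ["M1", "M4"]),  # asserted by Actor 1
}


def dependency_graph(derived_from):
    """Map each message to the set of messages it was computed from."""
    return {out: set(inputs) for out, (_, inputs) in derived_from.items()}


def provenance_of(message, graph):
    """Return every message that contributed, directly or not, to `message`."""
    seen = set()
    stack = [message]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


graph = dependency_graph(DERIVED_FROM)

# static_order() raises graphlib.CycleError if the graph is not a DAG.
order = list(TopologicalSorter(graph).static_order())
print(order)
print(sorted(provenance_of("M2", graph)))
```

The topological order lists each message after the messages it depends on, and `provenance_of("M2", graph)` collects everything upstream of the final result.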



        Provenance Architecture




  Interfaces to Provenance Store

     [Diagram: Application (producing Results) -> Record Documentation
     of Execution -> Provenance Store. Other interfaces to the store:
     Administer Store and its contents; Query and Reason over
     Provenance of Data.]



  P-Assertion schemas




  The p-structure

    • The p-structure is a common logical structure of the
      provenance store shared by all asserting and querying
      actors
    • Hierarchical
    • Indexed by interactions (one interaction = one message
      exchange)
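
The slides do not give the actual p-structure schema, so the following is only a guess at what "hierarchical and indexed by interactions" could look like in memory. The three levels used here (interaction, then sender or receiver view, then kind of p-assertion) and the function names are assumptions for illustration. Attaching the actor state assertion to M3 is likewise an arbitrary choice.

```python
from collections import defaultdict


def new_p_structure():
    """interaction -> view ("sender"/"receiver") -> p-assertion kind -> list."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def record(store, interaction, view, kind, content):
    """File one p-assertion under the interaction it pertains to."""
    store[interaction][view][kind].append(content)


def assertions_for(store, interaction):
    """Flatten everything recorded under one interaction.

    Uses .get() so that looking up an unknown interaction does not
    create an empty entry in the store.
    """
    found = []
    for view, kinds in store.get(interaction, {}).items():
        for kind, contents in kinds.items():
            found.extend((view, kind, c) for c in contents)
    return found


store = new_p_structure()
record(store, "M3", "sender", "interaction", "I sent M3")
record(store, "M3", "sender", "relationship", "M3 = f1(M1)")
record(store, "M3", "receiver", "interaction", "I received M3")
record(store, "M3", "receiver", "actor_state", "I used sparc processor")

for entry in assertions_for(store, "M3"):
    print(entry)
```

Because asserting and querying actors agree on the same keys, either side can find all documentation about a message exchange from the interaction identifier alone.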




  Recording Protocol (Groth04-06)
  • Abstract machines
  • DS Properties
        –   Termination
        –   Liveness
        –   Safety
        –   Statelessness
  • Documentation Properties
        – Immutability
        – Attribution
        – Datatype safety
  • Foundation for adding
    necessary cryptographic
    techniques




  Querying Functionality (Miles06)

     • Process Documentation Query Interface:
       allows for “navigation” of the documentation
       of execution
           – Allows us to view the provenance store (i.e. the
             p-structure) as if containing XML data structures
           – Independent of technology used for running
             application and internal store representation
           – Seamless navigation of application-dependent
             and application-independent process
             documentation




  Querying Functionality (Miles06)

 • Provenance Query Interface: allows us to obtain
   the provenance of some specific data
 • A recognition that there is not “one” provenance
   for a piece of data; there may be several,
   depending on the end user’s interest
 • Hence, provenance is seen as the result of a
   query:
      – Identify a piece of data at a specific execution point
      – Scope of the process of interest:
            • Filter in/out p-assertions according to actors, process, types
              of relationships, etc
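
To make the idea of "provenance as the result of a query" concrete, here is a small sketch that walks backwards from a data item and applies a scope filter. The `Relationship` record and the `provenance_query` function are hypothetical and do not reproduce the actual Provenance Query Interface; the data is the set of relationship assertions from the earlier Process Documentation (2) slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """Hypothetical relationship p-assertion: `effect` <relation> `cause`."""
    asserter: str
    effect: str
    relation: str
    cause: str


RELATIONSHIPS = [
    Relationship("Actor 1", "M2", "in reply to", "M1"),
    Relationship("Actor 1", "M3", "caused by", "M1"),
    Relationship("Actor 1", "M2", "caused by", "M4"),
    Relationship("Actor 2", "M4", "in reply to", "M3"),
]


def provenance_query(start, relationships, in_scope=lambda r: True):
    """Walk backwards from `start`, following only in-scope relationships."""
    result = []
    frontier = [start]
    visited = {start}
    while frontier:
        current = frontier.pop()
        for r in relationships:
            if r.effect == current and in_scope(r):
                result.append(r)
                if r.cause not in visited:
                    visited.add(r.cause)
                    frontier.append(r.cause)
    return result


# Same data item, two scopes, two different provenances.
full = provenance_query("M2", RELATIONSHIPS)
only_actor1 = provenance_query(
    "M2", RELATIONSHIPS, in_scope=lambda r: r.asserter == "Actor 1"
)
print(len(full), len(only_actor1))
```

The unrestricted query reaches all four relationships, while the query scoped to Actor 1's assertions stops at M4 because the only explanation of M4 was asserted by Actor 2.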


        Standardisation




  Standardisation Options
  • APIs: programmatic inter-op
  • Recording and querying interfaces: service inter-op
  • Provenance model: data inter-op




  Purpose of Standardisation

   [Diagram: two Applications, each recording documentation of
   execution into Provenance Stores.]


Allow for multiple applications to document their execution.
Applications may be running in different institutions.

  Purpose of Standardisation

   [Diagram: one Application recording documentation of execution
   into three Provenance Stores.]



       Allow for multiple stores from multiple IT providers

  Purpose of Standardisation


   [Diagram: Query Provenance of Data, issued against two Provenance
   Stores.]


       Allow for multiple stores from multiple IT providers


  Purpose of Standardisation



                                  Convert to standard data
                                  format




   Allow for legacy, monolithic applications to expose their
   contents (according to standard schema)


  Purpose of Standardisation


   [Diagram: Application; Provenance Store]




Allow third parties to host provenance stores that are
trusted by both application owners and auditors

  Compliance Oriented Architectures

  • Separate execution documentation from compliance verification
  • Allows for multiple compliance verifications
  • Allows for validation to take place across multiple
    applications, possibly run by different institutions (in
    particular, allows for outsourcing and subcontracting).
  • Approach is suitable for e-scientific peer-reviewing and
    business compliance verification

  [Diagram: Applications record documentation of execution in a
  Provenance Store; compliance verification queries the store for the
  provenance of data.]
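
One way to picture this separation is a verifier that only reads recorded documentation and never touches the applications themselves. The sketch below is an assumption-laden illustration: the record layout, the two example rules, and the `verify` function are invented for this purpose and do not correspond to any real regulation or to the project's software.

```python
# Recorded documentation, as a verifier might retrieve it from a store.
# The dictionary layout is hypothetical.
DOCUMENTATION = [
    {"actor": "Actor 1", "kind": "interaction", "role": "sent", "message": "M3"},
    {"actor": "Actor 2", "kind": "interaction", "role": "received", "message": "M3"},
    {"actor": "Actor 2", "kind": "interaction", "role": "sent", "message": "M4"},
    {"actor": "Actor 1", "kind": "interaction", "role": "received", "message": "M4"},
    {"actor": "Actor 2", "kind": "relationship", "effect": "M4", "causes": ["M3"]},
]


def sent_messages(docs):
    return {d["message"] for d in docs if d.get("role") == "sent"}


def every_sent_message_was_received(docs):
    """Illustrative rule: list sent messages with no matching receipt."""
    received = {d["message"] for d in docs if d.get("role") == "received"}
    return sorted(sent_messages(docs) - received)


def every_sent_message_has_a_derivation(docs):
    """Illustrative rule: list sent messages nobody explained."""
    explained = {d["effect"] for d in docs if d["kind"] == "relationship"}
    return sorted(sent_messages(docs) - explained)


def verify(docs, rules):
    """Run each rule over the same documentation; return violations per rule."""
    return {rule.__name__: rule(docs) for rule in rules}


report = verify(
    DOCUMENTATION,
    [every_sent_message_was_received, every_sent_message_has_a_derivation],
)
for name, violations in report.items():
    print(name, violations)
```

New rules can be added to the list without re-running or modifying the applications, which is the property the slide attributes to compliance-oriented architectures.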

  Organ Transplant Scenario
   [Diagram: Hospital; Electronic Healthcare Management Service;
   Testing Lab]


  Hospital Actors

   [Diagram: User Interface; Brain Death Manager; Donor Data Collector]




  What’s on the CD
        • Documents relating to both PASOA and EU
          Provenance projects
        • All the talks presented today
        • Handouts
        • Software
             – PReServ (Paul Groth & Simon Miles)
             – The EU Provenance client side library




        Conclusions




     To Sum Up
  [Diagram: Distribution, Finance, Aerospace, Healthcare, Automobile
  and Pharmaceutical applications Record into a Provenance Store,
  which supports Query.]

  • Standardising the documentation of Business Processes
  • Provenance
      – Architecture
      – Methodology
  • Compliance check
  • Rerun/Reproduce
  • Analyse

                                                         Slide from John Ibbotson

  Overview of Today’s Talks

  • Provenance Data Structures
  • Recording and Querying Provenance
        – Break (30 minutes)
  • Distribution and Scalability
  • Security
  • Methodology


        Questions



