					OMG, LSR and standards

  A world of abbreviations and
      Do we need standards?
• There is no need for standards…
• …unless/until we start to integrate
• In other words:
  – I don’t care about electrical plug sockets…
  – …until I travel and wish to integrate my laptop
    into the plug socket in my hotel room
               A historical essay:
              The Machine Screw
• Principle discovered around 400 BC
• Limited use until machine tools made mass production
  possible (18th cent.)
• Every machine shop and foundry made unique sizes and
  thread dimensions
• 1841: Joseph Whitworth presented “The Uniform System
  of Screw-Threads” to Britain’s Institution of Civil Engineers
• 1864: William Sellers proposed “On a Uniform System of
  Screw Threads” to the Franklin Institute, Philadelphia
• Enabled interchangeable parts and tooling for
  mechanization and mass production
• 1945: British and American standards merged
    Why integrate information?
• Provide unified access to data regardless of
• Connect related data (e.g., by target,
• Allows users to focus on questions, not
  access (schemata, query languages, source
• See platform data in context
• Provide all information required for decision
  support (“one-stop shopping”)
     Who defines standards?
• The strongest one!
  – Who is the strongest?
     • Who has the best product, or
     • Who has the best position on the market, or
     • Whom others believe
        – that he will have the best product, or the best position on
          the market, in the future, etc.

• But what if there is no strongest one?
  – Then we need a process for negotiation
 Why are standards so difficult?
• Because people, by nature, do not want them
  – vendors can lock users if they use their proprietary
  – academics can write more papers about their own
• And, when they are forced to have them
  – people want to change their current solutions as little
    as possible
  – standards tend to be more general than a one-vendor
    solution, and generalization can be costly
      Standards alliances & consortia
•   Health Level Seven (HL7), CDISC
•   OMG Life Sciences Research Domain Task Force
•   Interoperable Informatics Infrastructure Consortium (I3C)
•   Open Bioinformatics Foundation:
    Biopython, BioJava, BioCORBA, Bioperl, BioDAS, BioMOBY, BioSOAP,
    BioXML, BioRuby, BioGopher, BioPHP, BioSQL ...
•   MGED
•   Global Grid Forum Life Sciences Grid
•   Open Science Alliance™
•   Gene Ontology (GO) Consortium
•   BioPathways Consortium
•   …
   So, after this introduction…
• If you want/need a standard, and
• If you are not the only, or the strongest
  party, or
• If you believe that even if you are the
  strongest, the openness can help you
• then it is time to choose how to make it
  – and (in my opinion!) OMG is one of the best
     Object Management Group
• Several hundred member companies; world’s largest software
• Founded April 1989
• Several tens of full-time staff; no internal development
• Offices in Australia, Bahrain, Brazil, Germany, India, Italy, Japan,
  U.K., U.S.A.
• Dedicated to creating and popularizing object-oriented specifications
  for distributed application integration based on existing technology
• An open, consensus-based process
• Create a multi-vendor, competitive/co-operative marketplace of tools
  and components that are guaranteed to interoperate

• Best known for UML and CORBA

• Several membership types:
   – who can contribute and how much
   – who can vote
                   OMG Organization
• Board of Directors
  – Business Committee
• Platform Technology Committee
  – Platform Task Forces
  – Special Interest Groups
  – Revision Task Forces
  – Finalization Task Forces
• Architecture Board
  – Subcommittees
     • Object & Reference Model
  – Special Interest Groups
     • Business Rules
     • Java Community Process
     • MDA Users
     • Test & Validation
     • Web Services
• Domain Technology Committee
  – Domain Task Forces
  – Special Interest Groups
  – Revision Task Forces
  – Finalization Task Forces
        OMG Organization: DTC
Task Forces
•   Business Enterprise Integration
•   C4I
•   Finance
•   Geospatial & Imagery
•   Healthcare
•   Life Sciences Research
•   Manufacturing Technology & Industrial Systems
•   Space
•   Telecommunications
•   Transportation
• OMG adopts & publishes Model and
  Interface Specifications
• Specifications chosen from existing
  products in competitive selection process
• Specifications are freely available to both
  members and non-members
• Implementations must be available
  commercially or Open Source from OMG
  Corporate member(s)
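       OMG Technology Adoption
[Figure: adoption timeline. Stages shown: Need, RFI, Issue, Initial Submission, Revised Submission, Adoption, Implementation, Products. Three stages of 4-6 mo each lead up to adoption; implementation is marked 12 mo.]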
           OMG’s Model Driven
            Architecture (MDA)
• Solve the domain problem once: PIM in UML™
    • Formal specification of the structure and function of a system that
      abstracts away technical details
    • UML can be used to specify structures, behaviors, and constraints
      without committing to a specific platform
    • Distills the fundamental structure and meaning

• Add middleware details as needed: PSMs in UML
    • Introduces artifacts specific to a given platform, e.g., CORBA Services
    • By leveraging a PIM it’s easier to produce implementations on different
      platforms while holding the essential structure and meaning of the
      system invariant

• Approach works with SOAP, XML, Java, EJB,
  CORBA, .NET, Web Services, “next best thing”
       MDA: Putting It All Together
[Figure: Java/EJB, XML/SOAP, CORBA and other platform models, connected to one another by bridges]
      How does MDA help with
• Common situation: there are multiple middleware
  solutions for a given problem, e.g., Java and XML
  representations of a bio-sequence
• Interoperability and integration are much easier if
  these solutions have a PIM in common
• MDA provides a clean separation of domain and
  implementation issues
• Develop semantic standards at the PIM level
• Derive “platform-specific” standards from these
  PIM-level standards
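
The idea of one shared model with several platform representations can be sketched in a few lines. This is only an analogy, not MDA tooling: a plain Python dataclass stands in for the PIM, and two hand-written mappings (XML and JSON) stand in for PSMs. The class `BioSequence`, its fields, and all function names are hypothetical and chosen for illustration; JSON replaces the Java representation mentioned above to keep the example in one language.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class BioSequence:
    """Stand-in for a PIM: structure and meaning only, no platform details."""
    identifier: str
    alphabet: str   # e.g. "DNA" or "protein"
    residues: str


def to_xml(seq: BioSequence) -> str:
    """One hypothetical platform-specific mapping: an XML document."""
    root = ET.Element("BioSequence", id=seq.identifier, alphabet=seq.alphabet)
    root.text = seq.residues
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> BioSequence:
    root = ET.fromstring(text)
    return BioSequence(root.get("id"), root.get("alphabet"), root.text or "")


def to_json(seq: BioSequence) -> str:
    """A second hypothetical platform-specific mapping: a JSON object."""
    return json.dumps(asdict(seq), sort_keys=True)


def from_json(text: str) -> BioSequence:
    return BioSequence(**json.loads(text))


def xml_to_json(text: str) -> str:
    """Bridge between the two platforms, routed through the shared model."""
    return to_json(from_xml(text))
```

Because both mappings are derived from the same model, the bridge needs no knowledge of either format beyond its own mapping; adding a third platform means adding one mapping, not a bridge to every existing one.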
    MDA sounds so obvious…
• And it is, indeed, a new name for a known
• But our work is, unfortunately, driven not only by
  ideas but often by the market,
  and the market needs new names
• Having MDA as a standard, individual
  software vendors can produce and “sell”
  MDA-aware components and tools…
• …which brings more interoperability
        LSR: A Brief History
• June 97, Hinxton: Proposal at OiB97
• August 97, Philadelphia: Organizational
• September 97, Dublin: Inaugural DSIG
  meeting, RFI-1 issued
• April 98, Manchester: “Promoted” to DTF
  status, RFP-1 (Biomolecular Sequence
  Analysis) issued
  LSR – Life Sciences Research
• Mission: Adopt model and interface
  specifications to enable interoperable
  software components in the life sciences
  research “vertical domain”
• Scope: Genomics, bioinformatics,
  cheminformatics, genetics, proteomics,
  structural biology, …
• Open participation: mission defined by
  participants! (there is no “they”, just “you”)
   LSR Working Groups (2004)
• Architecture & Roadmap
• Cheminformatics
• Gene Expression
• Liaison
• Macromolecular Structure
• Pathways
• Sequence Analysis
• Single Nucleotide Polymorphisms
• Web Site
       OMG LSR Technology Adoptions
[Figure: timeline of LSR technology adoptions, 1998-2004; a gold border indicates MDA]
• Biomolecular Sequence Analysis
• Genomic Maps
• Bibliographic Query Service
• Macromolecular Structure
• Gene Expression
• Entity ID Service
• LAB
• Chemical Structure
• CSM
• Clinical Trials Lab Data
• Biochemical Pathways
• Life Sciences Identifiers
• Gene Expression Query Service
• Life Sciences Analysis Engine
• Single Nucleotide Polymorphisms
• Compound Collection
     Lessons learned: Participants
• Vendor participation is essential. Standards must be
• User participation is essential. Standards must be user-
• Need passionate advocates who are also practitioners
• Need all significant players and stakeholders participating
• Encourage non-member consortia/individuals to participate
• Quality standards can only be developed on time if they
  are seen to be business critical, not “nice to have”
• Expect and allow for “single-issue” organizations
• “Open” & “Standard” have multiple definitions
        Lessons learned: Process
• Well-defined “due process” is required for openness
• Open process is necessary (but not sufficient) for
  broad acceptance of standards developed
• OMG technology adoption process provides a
  structure with deadlines, prevents endless discussion
• Experience makes it possible to use the process &
  avoid being used by it
• Different standards bodies and developer groups can
  interact productively, if the process allows it
• It is essential to leverage existing (open source)
  efforts, use the OMG process as a unifier
• Recruiting active participants for each technology
  adoption is the key to success
       Lessons learned: Products
• Nail down good use cases from RFIs or other
  sources. Let use cases drive standards
• Keep it simple and make it real
• Begin by focussing on data models, add methods
• Marketing and education are key to ensuring
  standards are actually used
• OMG requirement for implementations is
  important for avoiding “paper only” specifications
     LSR Contact Information
• Web pages
  – OMG:
  – LSR:
  – Next meeting agenda:
• Email addresses
• Next meeting: Orlando 21-22 June
• Gene Expression
• Life Sciences Identifiers
       Gene Expression RFP
• Standardized programmatic interfaces are
  required to support automated data exchange
  and interoperability among gene expression
  data systems
• RFP solicited specifications of interfaces and
  services to support array-based gene
  expression data collection, management,
  retrieval, and analysis
• Intended to form a common basis for building
  advanced gene expression data services
Gene Expression UML Overview
  Gene Expression Submission
• MAGE: Microarray Gene Expression
   – Platform Independent Model is MAGE-OM
   – Platform Specific Model is MAGE-ML
   – follows MGED’s Minimum Information about a Microarray
     Experiment (MIAME)
• Submitters
   – EMBL-EBI (European Bioinformatics Institute), representing
   – Rosetta Inpharmatics
• Document numbers
   – specification - dtc/2002-02-05
   – XMI representation of MAGE-OM - lifesci/2002-01-02
   – XML DTD - lifesci/2002-01-03
• For details see
               History & Status
• I3C started the initiative
   – concentration on early implementation
• IBM (an I3C member) implemented it
   – main use case: PDB
• Sun (another I3C member) implemented
   – based on LDAP
   – does not use the same spec as IBM
• Trying to agree on a specification
   – OMG-LSR initial submissions posted
      • submitters: EBI/IBM (joint submission), I3C
   – A revised submission approved January 2004
           Three basic parts
• LSID Syntax
  – how to name data entities uniquely
• LSID Resolution Service
  – how to get (to) a data entity from its LSID
  – subpart: how to find the LSID Resolution
• LSID Assigning Service
  – how to invent LSIDs for new data entities
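
A minimal sketch of how the assigning and resolution parts fit together, assuming an in-memory store and plain colon-separated identifier strings. The class `ToyLsidRegistry` and its methods are hypothetical; this is not the interface defined in the specification, and using a content hash as the object part is an assumption made here so that an identifier can never point at different bytes.

```python
import hashlib
from typing import Dict, Optional


class ToyLsidRegistry:
    """In-memory stand-in for the assigning and resolution services."""

    def __init__(self, authority: str) -> None:
        self.authority = authority
        self._store: Dict[str, bytes] = {}

    def assign(self, namespace: str, data: bytes) -> str:
        """Invent an identifier for a new data entity and remember its bytes."""
        # Assumption: the object part is derived from a hash of the content.
        object_id = hashlib.sha256(data).hexdigest()[:12]
        lsid = f"{self.authority}:{namespace}:{object_id}"
        self._store[lsid] = data
        return lsid

    def resolve(self, lsid: str) -> Optional[bytes]:
        """Return the bytes named by the identifier, or None if unknown."""
        return self._store.get(lsid)
```

Assigning the same bytes twice yields the same identifier, and resolving an identifier always returns the bytes it was assigned to.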
                         LSID Syntax
• Examples

• Parts:
  – authority:namespace:object[:revision]
• Basic rules:
  – LSIDs must be assigned to at most one resource, and are never
  – There is no requirement that a data entity has only one LSID
  – An LSID usually represents a piece of data, but LSIDs may also
    represent abstract entities or concepts
       • If an LSID represents real data, the LSID Resolution service must
         always resolve to the same set of bytes representing such data
       • If an LSID represents an abstract entity, the LSID Resolution service
         must always resolve to an empty result
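
The parts listed above can be split mechanically. The following is a minimal sketch, not a validating parser: the names `Lsid` and `parse_lsid` are hypothetical, the optional `urn:lsid:` prefix handling is an assumption not taken from these slides, and the identifiers in the usage note are made up.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> Lsid:
    """Split 'authority:namespace:object[:revision]' into its parts."""
    prefix = "urn:lsid:"
    # Assumption: identifiers may carry a URN prefix; strip it if present.
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    parts = text.split(":")
    # Three mandatory parts plus an optional revision, none of them empty.
    if len(parts) not in (3, 4) or not all(parts):
        raise ValueError(f"not a valid LSID: {text!r}")
    return Lsid(*parts)
```

For example, `parse_lsid("example.org:proteins:P1:2")` yields the authority `example.org`, the namespace `proteins`, the object `P1` and the revision `2`; with only three parts the revision is `None`.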
• Too broad… let’s categorize, at least…
  – LSR chairs & Working Group chairs
  – Submitters
  – Evaluators of submissions
  – last but not least: the Open Source community
    who pushes to make things well, effective and
