CAREcenterInformaticsCall ppt Slide Pips

					CARE Center Informatics
Background and Progress Report for
    10/20/06 Conference Call

            Marcia Nizzari
Background Information

  On CARE informatics, GAP
(Genetic Analysis Platform) and
      Informatics Team
     CaRE Center Informatics
• Builds on existing Genetic Analysis Platform
  – Operational for 2+ years
  – Genotyping and Resequencing
  – Code base successfully reused
• CaRE Center enhancements:
  – Data sharing strategy
  – Phenotype/Trait thesaurus, meta thesaurus
  – Customizable analytic pipelines
      GAP by the Numbers…
• 510,000 lines of working source code
• Very large databases; one of the largest
  tables has 640 M rows
• Standard industry metrics (SLOCCount)
  estimate that this code base required
  – $18M to develop (actual: ~ $4.5M)
  – Staff of 40 for 3.5 years (actual: average 12/yr)
• Pretty good deal!
  – Truly a World Class informatics team
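
The SLOCCount estimate above can be roughly reproduced by hand. The sketch below assumes SLOCCount's default settings (Basic COCOMO in "organic" mode, a $56,286 annual salary, and a 2.4 overhead factor); the slides do not state which settings were used, so these constants are assumptions, and `cocomo_estimate` is an illustrative name.

```python
# Basic COCOMO ("organic" mode) with cost parameters assumed to match
# SLOCCount's defaults. None of these constants come from the slides.
EFFORT_FACTOR, EFFORT_EXP = 2.4, 1.05      # person-months = 2.4 * KSLOC ** 1.05
SCHEDULE_FACTOR, SCHEDULE_EXP = 2.5, 0.38  # months = 2.5 * person-months ** 0.38
SALARY_PER_YEAR = 56_286                   # USD per person-year (assumed default)
OVERHEAD = 2.4                             # overhead multiplier (assumed default)


def cocomo_estimate(sloc: int) -> dict:
    """Estimate effort, schedule, staffing, and cost for `sloc` source lines."""
    ksloc = sloc / 1000
    effort_pm = EFFORT_FACTOR * ksloc ** EFFORT_EXP
    schedule_months = SCHEDULE_FACTOR * effort_pm ** SCHEDULE_EXP
    return {
        "person_years": effort_pm / 12,
        "schedule_years": schedule_months / 12,
        "average_staff": effort_pm / schedule_months,
        "cost_usd": effort_pm / 12 * SALARY_PER_YEAR * OVERHEAD,
    }


estimate = cocomo_estimate(510_000)  # 510,000 lines, as quoted on the slide
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Under these assumptions, 510,000 lines should come out near a staff of 40 over about 3.5 years, with a cost in the $18-19M range, which is in line with the figures on the slide.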
         GAP by the #’s (cont’d.)
• User statistics:
   – 222 logins for the system (internal & external)
   – Nina has trained (since 12/05):
        • 47 people individually
        • Held 25 small group sessions
• Jira statistics:
   – 1,722 issues logged since Jira went into production
   – 1,330 resolved
   – 392 open/in progress
   – Daniel Mirel is the champion Jira user
        GAP by the #’s (cont’d)
• Informatics staff statistics:
   – Total software dev experience: ~200 years
   – Degrees held:
      • 1 PhD (neuroscience)
      • 6 master's degrees
         – 4 comp sci, 1 molbio, 1 manufacturing
      • 4 engineering bachelor's degrees
         – 2 EE, 1 ChemEng, 1 SoftEng
      • 3 biochem/molbio/biology bachelor's degrees
      • 4 comp sci bachelor's degrees
      • 1 physics bachelor's degree
           User Workflow in Software
• Inputs:
  – Samples & Clinical Information: Biological Sample Platform
  – Genome Sequence & Genetic Variation: NCBI, dbSNP, Purchased SNPs, DCC (Celera)
• Project Management: plan experiment in Project Management
• Execute Genotyping Experiment: Genotyping Pipeline (ESP)
• Execute Resequencing Experiment: Resequencing Pipeline
• Perform analysis and loop back to next round
  of experiment planning
    CARE Association Study Workflow
• Production: Sample Mgt, Project Mgt, Genotyping
  – Upload Samples, Peds, Individuals, Phenotypes
  – Create Experiments (Samples x Features)
  – Design and Execute Experiments
  – QC/Curate Results
• Data stores: Sample DB, Project DB, LIMS DBs, Data Vault
• Web Services
• Analysis: Gene Pattern + custom analysis tools
  – Data Compile
  – PLINK
  – Association & Statistics Viewers
  – Custom Algorithms, Viewers
            Phenotype Component
            Conceptual Architecture

• Thesauri, Meta Thesaurus for CARE
• Controlled Vocabulary Constraints
• Phenotype Inquiry (Base; either Group or specified)
• Phenotype Capture and Validation
      CARE Progress Report
• PhenoMall functionality
  – Rapid enhancement of capture function
     • Meta data
     • Mapping of all CARE phenotypes looks good
  – Major enhancements for pheno inquiry
  – Informatics goals of pilot
     • Figure out how far up the controlled
       vocab/thesaurus stack we need to go
     • What curation tools are needed?
     • Requirements gathering beyond pilot
• Awaiting the decision on data sharing…
• NIH Application/System Security Plan
  – Two major revisions, July 17th and Oct 16th
  – Security officer at NHLBI is Cindy Walczak
• Once the data sharing model is decided:
  – Research technologies and approaches; make a
    recommendation to the subcommittee
  – Spec/design and review by subcommittee
• Working pilot
  – Need to discuss when to demo – Feb meeting
    in Bethesda??
Security Considerations
    Security Layers - General
• There are at least three levels:
  – MIT firewalls
     • Penetration testing, Tripwire, packet monitoring, etc.
  – Broad
     • New Cisco firewalls
     • Route to host servers
        – Explicit Allows only
     • Wireless access goes out to MIT firewall
     • Open jack goes to Broad firewall
  – CARE Center application itself
• Network diagram: The World -> MIT -> The Broad Institute
  – Internet “Cloud” -> MIT -> Cisco ASA 5540 firewalls (two) -> Core Router -> hosts (Host A, Host B)
  – Radius DB: used for authentication for VPN access
  – Allow Rules: explicit allows
     • http = 80 -> host
     • ssh = 22 -> host
     • https = 443 (SSL)
  – Access Rules for Subnets: explicit allows, e.g., allow host on LIMS to talk to
    host on server; must be in the list to permit access
  – Open jack: unregistered 10.10 domain
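
The "explicit allows only" policy in the diagram above amounts to a default-deny allow list: traffic passes only if a matching entry exists. The following is a minimal sketch of that idea, not the actual Cisco configuration; the `AllowRule` type, the host name `host-a`, and the `is_permitted` helper are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class AllowRule(NamedTuple):
    """One explicit allow entry: a protocol and port permitted to reach a host."""
    protocol: str
    port: int
    host: str


# Hypothetical rule table mirroring the ports named on the slide.
ALLOW_RULES = {
    AllowRule("http", 80, "host-a"),
    AllowRule("ssh", 22, "host-a"),
    AllowRule("https", 443, "host-a"),
}


def is_permitted(protocol: str, port: int, host: str) -> bool:
    """Default deny: a connection passes only if it is in the allow list."""
    return AllowRule(protocol, port, host) in ALLOW_RULES


print(is_permitted("https", 443, "host-a"))  # listed, so permitted
print(is_permitted("telnet", 23, "host-a"))  # not listed, so denied
```

The design point is that nothing needs to be enumerated as forbidden; anything absent from the list is rejected.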

  Security Layers - Application
• Genetic Analysis Platform application
  – Role-based security
  – Passwords that expire
  – Audit trails track user activity
• Detailed information available in NIH
  Application/System Security Plan for
  CARE Center
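
The three application-level controls listed above (role-based security, expiring passwords, audit trails) can be combined in a single authorization check. The sketch below is illustrative only and does not reflect the actual GAP code; the role names, the permission sets, the 90-day password lifetime, and the `authorize` function are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PASSWORD_MAX_AGE = timedelta(days=90)  # assumed lifetime; not stated in the slides

# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS = {
    "lab_technician": {"edit_samples"},
    "scientist": {"run_analysis", "view_results"},
}


@dataclass
class User:
    name: str
    role: str
    password_set_at: datetime


# Audit trail: one (timestamp, user, action, outcome) tuple per attempt.
audit_log: list[tuple[datetime, str, str, str]] = []


def authorize(user: User, action: str, now: datetime) -> bool:
    """Allow an action only if the password is current and the role grants it."""
    expired = now - user.password_set_at > PASSWORD_MAX_AGE
    allowed = not expired and action in ROLE_PERMISSIONS.get(user.role, set())
    audit_log.append((now, user.name, action, "allowed" if allowed else "denied"))
    return allowed


now = datetime(2006, 10, 20)
alice = User("alice", "scientist", password_set_at=datetime(2006, 9, 1))
bob = User("bob", "scientist", password_set_at=datetime(2006, 1, 1))
print(authorize(alice, "run_analysis", now))  # current password, role grants action
print(authorize(bob, "run_analysis", now))    # password older than 90 days
```

Every attempt is logged whether or not it succeeds, so denied requests also leave a trace in the audit trail.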
  Summary: Issues/Questions
• Scope of phenotype-related enhancements
• Group/Project structure for CaRE Center
• CaRE user visibility into Process
• Data release model decision
  – Data Enclave scenarios and security
• User training and documentation
  – Analysis methodology
  – System and security training
      Security for Production & Analysis
• Users in JAAS domain: Cohort Technician, BSP Lab Technician, CaRE Scientist,
  Broad Lab Technician, Coordinator
• Project Management
  – Proj Mgt Security Context (Project)
  – Groups, Projects, Grants, Panels, Feature Sets, Sample Sets
• Biological Samples Platform
  – BSP Security Context (Sample Collection)
• Analysis Pipelines
  – CaRE Analysis Security Context (scope based on rules of Data Enclave;
    could cover multiple Projects)
• LIMS
  – Lab Security Context (X-Project)
• Shareable Objects: Peds, Individuals, Phenotypes, Samples
• Databases: PIPS DB, Feature DB
          How Users Can Help
• Specify! We need things nailed down…
• The classic specification:
   – Genesis 6:14-16 (NKJV): "14 Make yourself an ark
     of gopherwood; make rooms in the ark, and cover it
     inside and outside with pitch. 15 And this is how you
     shall make it: The length of the ark shall be three
     hundred cubits, its width fifty cubits, and its height
     thirty cubits. 16 You shall make a window for the ark,
     and you shall finish it to a cubit from above; and set
     the door of the ark in its side. You shall make it with
     lower, second, and third decks."
• We live in the world of 0’s and 1’s!
 Informatics Development Team
         Jason Carey                 James Nemesh (CH)
         Kristian Cibulskis          Huy Nguyen
         Michael Dinsmore            Howard Rafal
         Tim Fennell                 Greg Rushton
         George Grant                Dennis Ryan
         Bob Handsaker               David Tefft
         Nina Lapchyk                Alex Thomson
         Pei Lin                     Ellen Winchester
                                     Alec Wysoker

Names in bold have significant time allocated to CARE center activity.
