USGS Bioinformatics Activities Jan 2010.ppt - Eionet Projects

					USGS Bioinformatics Activities
Ecoinformatics
January 2010




Gladys Cotter
Mike Frame
    Topics for Discussion

1   USGS Bioinformatics Activities


2   Potential areas of collaboration


3   Questions
                                      Bioinformatics
                USGS NBII – addressing bioinformatics challenges
                  through collaboration, content development,
                technology, and creating long-term infrastructure



• Collecting: tools, protocols, standards
• Linking: cross-referencing, relationship of data
• Storage: DBMS, central & distributed, security, backups, archival, standards
• Organization: structure, governance, standards, policies
• Integration: multi-levels, difficult, mashups, standards
• Analysis: tools, standards, usability, training, non-biased
• Synthesis: fusion, blending, related integration, analysis, models
• Delivery: tools, governance, infrastructure, user analysis
• Applications (tools, protocols, standards) for research, decision making, policies, education, outreach

 Sustainable            Reliable       Outreach       Training
               Biological Spatial Infrastructure
                            NBII




   Over 72,000 records
   Based on FGDC BDP
   Training Program
   QA/QC Program
   Standards Cross-walks
    EML
    Dublin Core
 Establishing Administrative Tools
 Expanding internationally
 Embedding in-line visualization
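
A standards cross-walk of the kind listed on this slide is, at its core, a field-by-field mapping between metadata schemas. The sketch below illustrates the idea for FGDC-style records and Dublin Core. The field pairs are a small illustrative subset chosen for this example, not the NBII cross-walk itself, and `crosswalk_to_dublin_core` and the `taxonkey` field are hypothetical names.

```python
# Illustrative subset of an FGDC/BDP -> Dublin Core field mapping.
# These pairs are assumptions for this sketch, not the NBII cross-walk.
FGDC_TO_DC = {
    "title": "title",
    "origin": "creator",        # FGDC "originator"
    "abstract": "description",
    "pubdate": "date",
    "themekey": "subject",
}


def crosswalk_to_dublin_core(record):
    """Return (dc_record, unmapped_fields) for a flat FGDC-style dict."""
    dc_record = {}
    unmapped = []
    for field, value in record.items():
        target = FGDC_TO_DC.get(field)
        if target is None:
            unmapped.append(field)  # report the gap instead of dropping it silently
        else:
            dc_record[target] = value
    return dc_record, sorted(unmapped)


sample = {
    "title": "Example bird survey",
    "origin": "Example Agency",
    "pubdate": "2009",
    "taxonkey": "Aves",  # hypothetical field with no mapping in this sketch
}
dc, missing = crosswalk_to_dublin_core(sample)
print(dc)
print(missing)
```

Returning the unmapped fields alongside the converted record makes the information lost in a cross-walk visible, which is one input a QA/QC step could use.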
              World Data Center for
              Biodiversity & Ecology
• World Data System
  created through the
  International Council of
  Scientific Unions (ICSU)
  in 1957
• Currently 50 World Data
  Centers (WDC) in place
  internationally
• USGS National Biological
  Information Infrastructure
  (NBII) network designated
  as the WDC for
  Biodiversity & Ecology in
  2002
                             WDC
                        Current Activities
•   Renewable Energy Project Prequalification Demonstration project
     – Goal: support rapid prequalification of sites across the nation that are
       potentially suitable for renewable energy (with an initial focus on federal
       lands).
         • Data sets include, but are not limited to:
         • Land Cover (GAP),
         • Protected areas/Stewardship (GAP),
         • Species Distributions/Habitat Affinities (GAP),
         • Species Occurrences (US-GBIF Mirror Site and NBII),
         • Integrated Taxonomic Information System (ITIS)
         • Topography (USGS),
         • Landforms (USGS/GAM),
         • Soil Moisture (USGS/GAM),
         • Ecosystems (USGS/GAM),
         • Renewable Energy Potential (i.e., wind, solar, geothermal, and
                    biofuels; NREL), and
         • Infrastructure (i.e., power grid, projected smart grid, and roads;
                    NREL and USGS).
•   Protected areas – working with WDPA, USGS GAP
•   Sponsoring WDC for Biodiversity & Human Health
     – South Africa is hosting
     – Providing workshops, training, demonstration projects
     – Evaluating how to leverage ILTER activities
                         Multilingual
                        IABIN Catalog


Ability to search by:
     IABIN TN
     Map interface
     Resource Type
     Language
     Taxonomy
     Multi-lingual thesaurus
Thesaurus web-services
     English
     Spanish
     Portuguese
              NBII Search
             Unique Facets
• Dynamic biological clusters
• Biological images
• Refine results
• Map display
               Additional
             Unique Facets
• Diverse sources: DBMS, websites, federation, documents
• Thesaurus integration
• Publisher refinement
• Weighting of sources
Integrated Taxonomic
  Information System




• Multi-agency partnership
• Primarily North American taxa
• Used Globally
• Web-services released Summer 2009
• Taxonomic Workbench 2010
                NBII Species Mashups
Designed for
 – One-stop-shop for species information in SE
 – Integrate diverse sources
     • Content Type
     • UI Presentation
       USGS Data Integration
3 Major Goals:
  1. Establishing corporate data available via
     ESRI services
  2. Improving access to Modeling data,
     including Water quality, stream, etc.
  3. Providing easy-to-use “data upload”,
     “registry”, and “discovery” tools
        North American EOL

• Multi-agency partnership designed to
  develop a prototype for “species
  information” within the Great Lakes and
  Chesapeake Bay regions
          NSF DataNet Grant
            Background

• NSF solicitation to establish
   – Long-term archives for science data
   – Develop sustainable business model to
     support these activities
   – Involve multi-disciplinary domains
   – Develop various R&D needed to support effort
   – Provide ongoing “operational” support

   Two awards funded:
     DataONE
     The Data Conservancy
               DataONE
           Areas of emphasis
• Data loss: preserving all the work that has been done, by
  preserving at-risk (orphaned) biological, ecological, and environmental
  data from individual scientists
• Data dispersion: finding the needle in the haystack; by
  facilitating discovery and access of data through a single easy-
  to-use portal
• Data deluge: navigating the flood of increasingly heterogeneous
  data; by providing a toolbox that empowers scientists and
  organizations to more easily and effectively manage, analyze,
  and synthesize data
• Data Practice: using the best tools to do the job; by creating an
  informatics-literate workforce through innovative outreach and
  training efforts (e.g., best-practice videos, podcasts, on-line
  certificate programs, downloadable best practice guides and
  exemplars of data management plans)
               DataONE Technology
                    Directions
•       DataONE will enable new science and knowledge
        creation through universal access to data about
        life on earth and the environment that sustains it
        by:
    –     making the scientist an active member of the data
          preservation process,
    –     creating cyberinfrastructure that supports the full
          data life cycle,
    –     promulgating cultural changes that value data
          stewardship and data sharing,
    –     broadly promoting best practices
    –     engaging citizens in science
    –     domain-agnostic Solutions
                 Partnering organizations

•   Libraries & digital libraries
•   Academic institutions
•   Research networks
•   NSF- and government-
    funded synthesis &
    supercomputer
    centers/networks
•   Governmental organizations
•   International organizations
•   Data and metadata archives
•   Professional societies
•   NGOs
•   Commercial sector
          Why is this relevant to
            Ecoinformatics
 Share similar cyberinfrastructure needs
     Architecture
     Portals
     Distributed approaches
     Replication
     Secure, controlled access
     Authentication methods
     Tools deployed, and supported
     Data discovery & interoperability methods
     Standards developed, deployed
 Life Cycle Data Management tools (e.g., Investigator toolkit, CI)
 R&D activities in the areas of CS, IS, SS, GIS, Env., etc.
 Opportunity for broad Governmental & International
Participation (e.g., working groups, tool evaluations)
 Complementary to several of our groups' goals, projects,
activities
 Potential Microsoft-related projects (e.g., MS Excel)
   Potential areas of
     collaboration

• NBII Metadata Expansion
• Incorporation of additional species data
  into NA EOL, NBII Species Mashups, etc
• USGS Data Integration activities
• NSF DataONE Grant
• Potential Microsoft tools
      Questions & Comments




Mike Frame            Gladys Cotter
mike_frame@usgs.gov   Gladys_cotter@usgs.gov
865 576-3605          703 648-4182
Technical Architecture & Discussions




DataONE: Enabling Data-Intensive
Biological and Environmental
Research
Existing biological data
       archives
                 ESA’s
                 Ecological
                 Archive


                 Distributed Active
                 Archive Center

                 National Biological Information
                 Infrastructure

                 Fire Research & Management
                 Exchange System

                 Long Term Ecological
                 Research Network


                 Knowledge Network
                 for Biocomplexity
                  Example data holdings
         Metadata Interoperability Across Data Holdings

(Data Archive column: archive logos, not recovered)

Types of Data Managed                                                                        Metadata Standard(s)
Biodiversity, taxonomic, ecological                                                          BDP, DwC, DC, OGIS
Biogeochemical dynamics, terrestrial ecological Earth observation imagery                    DIF, BDP, ECHO
Ecological, biodiversity, biophysical, social, genomics, and taxonomic                       EML
Avian populations and molecular biology                                                      DwC
Biological and taxonomic                                                                     DC subset
Biophysical, biodiversity, disturbance, and Earth observation imagery                        EML
Biodiversity, biotic structure, function/process, biogeochemical, climate, and hydrologic    EML

BDP=Biological Data Profile        DC=Dublin Core                      DC subset=Dublin Core subset
DIF=Directory Interchange Format   DwC=Darwin Core                     ECHO=EOS ClearingHOuse
EML=Ecological Metadata Language   OGIS=OpenGIS
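
One way to read this table is to ask which holdings already share a metadata standard and which pairs would need a cross-walk before their records could be searched together. The sketch below works from the rows as listed. Because the archive names were logos, rows are identified by position, and the function names are hypothetical.

```python
from collections import defaultdict

# (types of data managed, metadata standards) in the order listed in the table.
HOLDINGS = [
    ("Biodiversity, taxonomic, ecological", {"BDP", "DwC", "DC", "OGIS"}),
    ("Biogeochemical dynamics, terrestrial ecological Earth observation imagery",
     {"DIF", "BDP", "ECHO"}),
    ("Ecological, biodiversity, biophysical, social, genomics, and taxonomic", {"EML"}),
    ("Avian populations and molecular biology", {"DwC"}),
    ("Biological and taxonomic", {"DC subset"}),
    ("Biophysical, biodiversity, disturbance, and Earth observation imagery", {"EML"}),
    ("Biodiversity, biotic structure, function/process, biogeochemical, climate, "
     "and hydrologic", {"EML"}),
]


def rows_by_standard(holdings):
    """Map each metadata standard to the 1-based table rows that use it."""
    index = defaultdict(list)
    for row, (_, standards) in enumerate(holdings, start=1):
        for standard in sorted(standards):
            index[standard].append(row)
    return dict(index)


def needs_crosswalk(holdings, row_a, row_b):
    """True when two rows have no metadata standard in common."""
    return not (holdings[row_a - 1][1] & holdings[row_b - 1][1])


index = rows_by_standard(HOLDINGS)
print(index["EML"])
print(needs_crosswalk(HOLDINGS, 1, 2))
print(needs_crosswalk(HOLDINGS, 3, 4))
```

In this sketch rows 1 and 2 share BDP, while rows 3 and 4 (EML versus DwC) share nothing, which is the situation a cross-walk or a common search layer has to cover.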
                Distributed framework
Flexible, scalable, sustainable network

Coordinating Nodes
• retain complete metadata catalog
• subset of all data
• perform basic indexing
• provide network-wide services
• ensure data availability (preservation)
• provide replication services

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
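
The division of labor between the two node types can be sketched in a few lines: member nodes hold data, while a coordinating node keeps the complete metadata catalog and makes sure each object exists on more than one member. This is a toy in-memory model under stated assumptions, not the DataONE API. All class names, method names, and identifiers are hypothetical.

```python
class MemberNode:
    """Holds data objects for its local community (toy in-memory store)."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # identifier -> data bytes


class CoordinatingNode:
    """Keeps the full metadata catalog and enforces a replica count."""

    def __init__(self, members, copies=2):
        self.members = list(members)
        self.copies = copies
        self.catalog = {}  # identifier -> {"metadata": dict, "holders": [names]}

    def register(self, origin, identifier, data, metadata):
        origin.objects[identifier] = data
        self.catalog[identifier] = {"metadata": metadata, "holders": [origin.name]}
        self.replicate(identifier)

    def replicate(self, identifier):
        entry = self.catalog[identifier]
        # Pick any member that already holds the object as the copy source.
        source = next(m for m in self.members if identifier in m.objects)
        for member in self.members:
            if len(entry["holders"]) >= self.copies:
                break
            if member.name not in entry["holders"]:
                member.objects[identifier] = source.objects[identifier]
                entry["holders"].append(member.name)

    def search(self, term):
        """Basic indexing: case-insensitive match on the metadata title."""
        return sorted(
            identifier
            for identifier, entry in self.catalog.items()
            if term.lower() in entry["metadata"].get("title", "").lower()
        )


nodes = [MemberNode("node-a"), MemberNode("node-b"), MemberNode("node-c")]
cn = CoordinatingNode(nodes, copies=2)
cn.register(nodes[0], "example:1", b"site,temp\n1,12.5\n", {"title": "Stream temperature"})
print(cn.catalog["example:1"]["holders"])
print(cn.search("stream"))
```

Discovery goes through the catalog only, so a search never has to contact every member node, and the replica count is what stands in for "ensure data availability" here.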
                     Supporting the
                      data lifecycle


ORC Node, UCSB Node, UNM Node

The data lifecycle:
1. Deposition/acquisition/ingest
2. Curation and metadata management
3. Protection, including privacy
4. Discovery, access, use, and dissemination
5. Interoperability, standards, and integration
6. Evaluation, analysis, and visualization
          Use Cases, Architecture
                Planning
http://mule1.dataone.org/ArchitectureDocs/index.html
        Changing science culture


1. Education and training
2. Engaging citizens in science
3. Building global communities of practice
                 Education and training

Career Long Learning:
•   best practice guides
•   exemplary data management plans
•   podcasts, web-casts
•   workshops and seminars
•   downloadable curricula

[Figure: sample covers. "Best Practice Guide: Using Metadata for e-research" (5 in a series); "Best Practice Guide: How to Cite Your Data" (6 in a series); "Gold Star Data Management Plan: Here’s How"]
Engaging citizens in science




                     www.CitizenScience.org
               Building global long-lived
               communities of practice:

•   Broad, active community engagement
    –   Involvement of library and science educators engaging
        new generations of students in best practices
    –   Existing outreach and education programs
•   Transparent, participatory governance
•   Adoption/creation of innovative and sustainable business
    and organizational models
[Organization chart]
• NSF; DataNet Partners; External Advisory Committee
• Principal Investigator; Leadership Team
• Executive Director; DataONE Office
• Director, Development & Operations: R&D, Operations; Core CI Team
• Director, Community Engagement & Outreach: R&D, Operations; Education and Outreach Team
• DIUG
• Coordinating Nodes; Member Nodes

Infrastructure and Research Working Groups
• Federated security
• Distributed storage
• Data preservation, metadata, and interoperability
• Scientific workflows
• Data integration and semantics
• Exploration, Visualization, Analysis
• Usability and assessment

Engagement Working Groups
• Sociocultural barriers to data sharing and preservation
• Community engagement and education
• Citizen science and public outreach
• Long-term sustainability and governance
• Exploration, Visualization, Analysis
• Usability and assessment
                 Thanks!
Management / Leadership Team:
Bill Michener – UNM, PI
Suzie Allard – UT
John Cobb – ORNL
Bob Cook – ORNL
Patricia Cruse – CDL
Mike Frame – USGS
Stephanie Hampton – UCSB
Viv Hutchison – USGS
Matt Jones – UCSB
Steve Kelling – Cornell
Kathleen Smith – UNC
Carol Tenopir – UT
Dave Vieglais – KU, DataONE
Bruce Wilson – Joint ORNL – UT

Projects and Funding Sources:
DataONE Partners
Virtual Data Center – InterOP
Kepler-CORE Team
SEEK & KNB Teams

				