Science Gateways as a Conduit for Major Scientific Advances.ppt

      Science Gateways and their tremendous
      potential for science and engineering



                Nancy Wilkins-Diehr
    TeraGrid Area Director for Science Gateways
                  wilkinsn@sdsc.edu
       Thank You for the Invitation to Speak
   To such a distinguished audience in such a beautiful location

•Many similarities
between Banff and
Gateways
 –Both are about
  connections
   •National park created due to
    sea to sea railway connection
 –Trail guides lead the way
   •“Peyto assumes a wild and
    picturesque, though
    somewhat tattered attire”
     –Describes Banff trail guides and
      gateway developers!



                                 Nancy Wilkins-Diehr (wilkinsn@sdsc.edu)
   Phenomenal Impact of the Internet on Worldwide
      Communication and Information Retrieval

       Only 15 years since the release of Mosaic!

•Implications for the conduct of science are still evolving
 – 1980’s, Early gateways, National Center for Biotechnology Information BLAST
   server, search results sent by email, still a working portal today
 – 1989, First ftp archive (archie) created at McGill
 – 1992 Mosaic web browser developed
 – 1995 “International Protein Data Bank Enhanced by Computer Browser”
 – 2004 TeraGrid project director Rick Stevens recognized growth in scientific
   portal development and proposed the Science Gateway Program
•Simultaneous explosion of digital information
 – Analysis needs in a variety of scientific areas
 – Sensors, telescopes, satellites, digital images and video
 – #1 machine on Top500 today is 300x more powerful than all combined entries
   on the first list in 1993

     1998 Workshop Highlights Early Impact of
              Internet on Science
•Shared access to geographically
 dispersed resources
•Assembling the best minds to
 tackle the toughest problems
 regardless of location
•Tackling the same problems
 differently, but also tackling
 different problems
•Not only the scope, but the process of scientific investigation is changed
 – “As the chemical applications and capabilities provided by collaboratories
   become more familiar, researchers will move significantly beyond current
   practice to exciting new paradigms for scientific work”
•Requirements for future success include:
 – Development of interdisciplinary partnerships of chemists and computer scientists
 – Flexible and extensible frameworks for collaboratories
 – Means to deploy, support, and evaluate collaboratories in the field
               Rapid Advances in Web Usability
•First generation
 – Static Web pages
•Second generation
 – Dynamic, database interfaces, cgi
 – Lacked the ease of use of desktop applications
•Third generation
 – True networked and internetworked applications that enable dynamic two-
   way, even multi-way, communication and collaboration on the Web.
 – Remarkable new uses of the Web in the organizational workplace and on the
   Internet




  Source: Screen Porch White Paper, The University of Western Ontario (1996)



                                    What’s Next?
            “Prediction is hard. Especially about the future.” Yogi Berra
•Scientists of tomorrow are familiar with media we don’t even know about
•Not using full power of the internet by any means today
  – Data and knowledge are handled differently
     •Linking publications and data referenced in those publications
     •Annotation, data provenance
     •Inability to create discourse around a piece of data
  – Ability to keep up with knowledge generation
     •16,000 papers a week into PubMed
     •50,000 papers a week in biology
       –Right now have choice between reading abstract or paper, might add 10 minute
        author clip
•How can science motivate in the way YouTube can?
  – Streaming video to view simulations, using visual and sound media
  – iPods everywhere, but not exploited for science
  – Web 2.0
•Science was an earlier adopter of the internet, but has now been overtaken by business
  – Now a big difference between commercial and scientific sites
    • Noticeable efforts to keep users on commercial sites

               Source: 5/14/07 interview with Dr. Philip Bourne, Protein Data Bank

   The Internet as a Resource for News and Information about Science:
                  Summary of Findings at a Glance
The convenience of getting scientific material on the web opens doors to better attitudes
and understanding of science.
November 20, 2006, John B. Horrigan, Associate Director
•40 million Americans rely on the internet as their primary source for news and information about science.
•For home broadband users, the internet and television are equally popular as sources for science news, and
 the internet leads the way for young broadband users.
•The internet is the source to which people would turn first if they need information on a specific scientific
 topic.
•The internet is a research tool for 87% of online users. That translates to 128 million adults.
•Consumers of online science information are fact-checkers of scientific claims. Sometimes they use the
 internet for this, other times they use offline sources.
•Convenience plays a large role in drawing people to the internet for science information.
•Happenstance also plays a role in users’ experience with online science resources. Two-thirds of internet
 users say they have come upon news and information about science when they went online for another
 reason.
•Those who seek out science news or information on the internet are more likely than others to believe
 that scientific pursuits have a positive impact on society.
•Internet users who have sought science information online are more likely to report that they have higher
 levels of understanding of science.
•Between 40% and 50% of internet users say they get information about a specific topic using the internet or
 through email.
•Search engines are far and away the most popular source for beginning science research among users who
 say they would turn first to the internet to get more information about a specific topic.
•Half of all internet users have been to a website which specializes in scientific content.
•Fully 59% of Americans have been to a science museum in the past year.
•Science websites and science museums may serve effectively as portals to one another.
Source: http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf
     NSF (my sponsor) has long recognized the
       importance of science and technology
                   interactions
•Interdisciplinary programs did much to facilitate application-
technology integration and develop standard tools
 – 1997 PACI Program
   •Marriage of technologists and application scientists
     –A few groups served as pathfinders and benefited
     tremendously
     –NPACI neuroscience thrust in 1997 leads to Telescience
     portal and BIRN in 2001
 – Information Technology Research (ITR)
 – NSF Middleware Initiative (NMI)
   •Plug and play tools so more groups can benefit




          NSF Continues Its Leadership Today
                 What Will Lead to Transformative Science?
•“Virtual environments have the
 potential to enhance collaboration,
 education, and experimentation in
 ways that we are just beginning to
 explore.”
•“In every discipline, we need new
 techniques that can help scientists
 and engineers uncover fresh
 knowledge from vast amounts of
 data generated by sensors,
 telescopes, satellites, or even the
 media and the Internet.”
•Gateways are a terrific example of interfaces that can support
 transformative science



  Flagship US$52M CDI Program Launched in
                   2008
•Cyber-enabled Discovery and Innovation (CDI) is
 – “NSF’s bold five-year initiative to create revolutionary science and engineering
  research outcomes made possible by innovations and advances in
  computational thinking.”
 – Program announced October 1
   •Bold multidisciplinary activities that, through computational thinking, promise radical,
    paradigm-changing research findings
   •Far-reaching, high-risk science and engineering research and education agendas that
    capitalize on innovations in, and/or innovative use of, computational thinking
   •Partnerships that involve investigators from academe and industry and may include
    international entities

•Growth to US$250M recommended by 2012
 – Funded across NSF directorates
•Birds-of-a-feather session at SC07 in Reno, NV



        Three Thematic Areas Offer Diversity
•From Data to Knowledge
 – Enhancing human cognition and generating new knowledge from a wealth of
   heterogeneous digital data
 – Data mining, visualization, petascale computational power, etc. to help scientists and
   engineers extract the most important information from the almost infinite amounts of data
   from sensors, telescopes, satellites, the media, the Internet, surveys, etc.
•Understanding Complexity in Natural, Built, and Social Systems
 – Deriving fundamental insights on systems comprising multiple interacting elements
 – Simulate and predict complex stochastic or chaotic systems
 – Explore and model nature’s interactions, connections, complex relations, and
   interdependencies, scaling from sub-particles to galactic, from subcellular to biosphere,
   and from the individual to the societal
•Building Virtual Organizations
 – Facilitate creative, cyber-enabled boundary-crossing collaborations, including those with
   industry and international dimensions
 – Advance the frontiers of science and engineering and broaden participation in science,
   technology, engineering and math fields



                 Exciting Canadian Activities
•September 13, 2007 announcement of $30M CANARIE
program
 – Network-Enabled Platforms (NEP)
   •Collaborative projects that accelerate the development of, and participation in,
    national and international cyberinfrastructure and e-Research platforms. Participants
    in the Program can be from both the public and private sectors.
 – Infrastructure Extension Program (IEP)
   •Extensions to Canada's research and education network that will enhance and
    accelerate research, enable national and international collaboration, improve access to
    knowledge, and contribute to the development of cyberinfrastructure and e-research
    in Canada.




 Science Gateways are a Natural Extension of
           Internet Developments
•3 common types of gateway
 – Web portal with users in front and services in back
 – Client-server model in which application programs running on users' machines
   (e.g. workstations and desktops) access services
 – Bridges across multiple grids, allowing communities to utilize both community
   developed grids and shared grids
•Rapid change will continue, so adaptability is essential;
gateways can provide some nimbleness
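
The first gateway type (users in front, services in back) can be sketched roughly as below. This is a minimal illustration only, not how any particular gateway is built: the class names, the `example-cluster` resource, and the `community_acct` account are all hypothetical, and the back end is a stand-in that merely records what it was asked to do. It assumes the portal submits every job under one shared community account and keeps its own per-user usage tally.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class BackendService:
    """Hypothetical stand-in for a back-end resource; a real gateway would call a grid service."""
    name: str

    def run(self, account: str, executable: str, cpu_hours: float) -> dict:
        # Pretend the job ran and report which account it was charged to.
        return {"resource": self.name, "account": account,
                "executable": executable, "cpu_hours": cpu_hours}


@dataclass
class PortalGateway:
    """Users in front, services in back: portal users never see the back-end account."""
    backend: BackendService
    community_account: str
    usage_by_user: dict = field(default_factory=lambda: defaultdict(float))

    def submit(self, portal_user: str, executable: str, cpu_hours: float) -> dict:
        # Every job runs under the single shared (community) account.
        record = self.backend.run(self.community_account, executable, cpu_hours)
        # The gateway, not the back end, tracks which portal user asked for the work.
        self.usage_by_user[portal_user] += cpu_hours
        record["portal_user"] = portal_user
        return record


gateway = PortalGateway(BackendService("example-cluster"), "community_acct")
gateway.submit("alice", "simulate", 2.5)
gateway.submit("bob", "simulate", 1.0)
gateway.submit("alice", "analyze", 0.5)
print(dict(gateway.usage_by_user))
```

The point of the sketch is the split of responsibilities: the back end sees only one account, while the gateway holds the mapping from individual users to the work done on their behalf.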




     Gateway Idea Resonates with Scientists
•Capabilities provided by the Web are easy to envision
because we use them in everyday life
•Researchers can imagine scientific capabilities provided
through a familiar interface

•Groups resonate with the fact that gateways are designed
by communities and provide interfaces understood by those
communities
 – But also provide access to greater capabilities on the back end without the
   user needing to understand the details of those capabilities
 – Scientists know they can undertake more complex analyses and that’s all they
   want to focus on
•But this seamless access doesn’t come for free. It all hinges
on very capable developers
  Trust and Reliability are Fundamental to Success
•Fundamental in business applications
 – Fundamental for science too
•The public gains confidence in internet sites that provide
accurate information reliably
 – PubMed
 – National Cancer Institute
 – Google
 – PayPal
•For scientists it takes far longer to build this confidence
 – Scientists will not rely on gateway tools to conduct their analysis and store
   their research results unless they have ultimate confidence in the interfaces
   •Proven track record
      –Run by reputable organization
      –Have been in existence “a long time”
      –Provide accurate results
      –Work repeatedly
      –Confidence in PDB developed over 30 years, started with community mandate that
       proteins must be deposited before publications would be accepted
 How can we build interfaces that scientists will trust?
•Expertise
 – Simple web pages are easy to design
 – Complex capabilities, particularly those involving grid access, take
   knowledgeable developers to create a production product
   •LEAD, nanoHUB show what investment can do

•Sustained funding
 – Most science groups have money for research, not portal building or ongoing
   support for portals
•Knowledge transfer
 – Must take advantage of industry advancements
 – Investments must result in building blocks that other applications can use
 – Many gateways have similar issues
   •Data access
   •Analysis capabilities
   •User work environments
   •Workflow capabilities


 Tremendous Opportunities Using the Largest
           Shared Resources -
             Challenges too!
•What’s different when the resource doesn’t belong just to
me?
 – Resource discovery
 – Accounting
 – Security
 – Proposal-based requests for resources (peer-reviewed access)
   •Code scaling and performance numbers
   •Justification of resources
   •Gateway citations

•Tremendous benefits at the high end, but even more work
for the developers
•Potential impact on science is huge
 – Small number of developers can impact thousands of scientists
 – But need a way to train and fund those developers and provide them with
   appropriate tools
              What is the TeraGrid?
A unique combination of fundamental CI components
          Opportunities and Challenges as a Virtual
                        Organization
•Full vision of cyberinfrastructure
 – Data, compute, visualization, workflows
 – But need to do a better job of representing the capabilities to researchers
 – Creating prototypes for others to follow
 – Never underestimate the value in keeping things SIMPLE
•Work with top notch people regardless of location
 – Better for end users
   •Single request process for all types of resources
   •Single place for documentation

•But must work harder
 – To sustain momentum in projects
   •Set a few high-level goals
   •Clear management structure
      –Individual responsibility
      –Project accountability
 – To provide clarity for users

TeraGrid Resources Available for all Domain Scientists
                           At no cost to them!
•Integrated, persistent, pioneering
 resources
•Significantly improve the ability
 and capacity to gain new insights
 into the most challenging research
 questions and societal problems
•Peer-reviewed, proposal-based
 access
  – Targeted support available as
    well
   •Dedicated staff investment to
    really make a difference on
    complex problems
     –Transformational science
   •Must have PI commitment
   •Make lessons learned available
    for all

                                                   TeraGrid Usage
[Figure: monthly compute cycles delivered, in normalized units (NUs, millions; axis 25 to 275), 2004 through 2007, split into Specific Allocations and Roaming Allocations. Annotation: ~50% Annual Growth]
 TeraGrid currently delivers an average of 420,000 cpu-hours per day -> ~21,000 DC every hour
Source: Dave Hart (dhart@sdsc.edu)
TeraGrid User Community
[Figure: TeraGrid user community by quarter ending, with Gateways and Growth Target labeled]
Source: Dave Hart (dhart@sdsc.edu)
Easy TeraGrid Gateway True and False Test
                          Answers Provided
•Any PI can request an allocation and use it to develop a gateway (T)
•Gateway design is community-developed and that is the core
 strength of the program (T)
•TeraGrid staff are alerted to gateway work when a proposal is
 reviewed or when a community account is requested (T)
•Limited TeraGrid support can be provided for targeted assistance to
 integrate an existing gateway with TeraGrid (T)
•TeraGrid selects all gateways (F)
•TeraGrid designs all gateways (F)
•TeraGrid limits the number of gateways (F)
•All gateways need TeraGrid funding to exist (F)
                         TeraGrid RATs
                  (Requirements Analysis Teams)

•Spring, 2005 Science
Gateway Requirements
Analysis Team (RAT)
 –Identification of common needs
   across the gateways
 – Goal is production use of TG
   resources in the gateway as well
   as development of process and
   policy within TG for scalable
   gateway program and services
 – Tremendous sharing of
   experiences amongst talented
   developers




      2006 – Implementing Common Gateway
                  Requirements
•Web Services
  – GT4 deployment, identification of remaining capabilities
  – Information services, WebMDS
•Auditing
  – Need to retrieve job usage info on production resources
  – GRAM audit deployed in test mode in September, inclusion in CTSSv4
•Community Accounts
  – Policy finalized, security approaches being tested by RPs
  – Attribute-based authentication testing
•Allocations
  – Changes in allocation procedures, the mechanisms used to evaluate science
    impact, and models for identity management, authentication and
    authorization that are more tuned to virtual organizations.
•Scheduling
  – Metascheduling RAT
  – On-demand via SPRUCE framework
•Outreach
  – Talks, Schools/workshops (NVO, GISolve), major project demonstrations (LEAD)
  – SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science
    Workflows and On Demand Computing for Geosciences Workshop
•Primer
  – Living document in wiki, provides up-to-date overview and instructions for new
    gateway developers (“how to make your portal a TeraGrid science gateway”)
        Current Activities – Moving Forward!
•Extend development of general gateway services
 – React to and anticipate community needs
   •Streamlined TeraGrid integration means more interest and more science
 – Building Blocks for Science Gateways
   (http://www.cigi.uiuc.edu/doku.php/projects/simplegrid)
•Continue targeted work with selected projects
 – SidGrid, CReSIS
•Stay ahead of technology changes
 – Well, at least not get too far behind…
•Build on burgeoning interest in gateways for education
 – Navajo Technical College
 – TeraGrid EOT supplemental funding




         Planning for the Future of TeraGrid
•Activity led by U Michigan School of Information
 – www.teragridfuture.org
 – Gateway (June) and user (August) workshops held
 – Report due February, 2008
•Recommendations from gateway workshop include:
 – Support interaction and cross-fertilization among Science Gateway
   development communities
   •Sharing code and successful solutions
   •Financial and professional support for developing gateways
 – Develop gateway framework templates built upon toolkits which may already
   exist
 – Training, education, workshops, generalized & standardized basic services,
   documentation
 – End-to-end support for Virtual Organizations
 – Operating more effectively as a community in order to better support the
   education and development needs of gateway developers.

           Selected Gateway Highlights
•nanoHUB
•Linked Environments for Atmospheric Discovery (LEAD)
•GridChem
•Biomedical Informatics Research Network (BIRN)
•Center for Remote Sensing of Ice Sheets (CReSIS)




 Highlights: nanoHUB Explosive User Growth
•In past 12 months
 – 26,000 users
   •50% of usage from U.S.
 – 10 courses viewed by over 6,000 users
 – 165 podcasts downloaded by over 4,000 users
 – 1400 online meetings
•Short clip from Gerhard Klimeck




           Highlights: LEAD Inspires Students
         Advanced capabilities regardless of location
•A student gets excited about what he
 was able to do with LEAD
•“Dr. Sikora: Attached is a display of
 2-m T and wind depicting the WRF's
 interpretation of the coastal front on
 14 February 2007. It's interesting that
 I found an example using IDV that
 parallels our discussion of mesoscale
 boundaries in class. It illustrates very
 nicely the transition to a coastal low
 and the strong baroclinic zone with a
 location very similar to Markowski's
 depiction. I created this image in IDV
 after running a 5-km WRF run
 (initialized with NAM output) via the
 LEAD Portal. This simple 1-level plot
 is just a precursor of the many
 capabilities IDV will eventually offer to
 visualize high-res WRF output. Enjoy!
 Eric” (email, March 2007)
Highlights: GridChem’s Client-Server Approach
    Provides Power and a Rich Feature Set




       Source: Sudhakar Pamidighantam, NCSA
       (National Center for Supercomputing Applications)
      Biomedical Informatics Research Network (BIRN)
BIRN is a National Center for Research Resources (NCRR) initiative
   aimed at creating a testbed to address biomedical researchers




                                    Source: Anthony Kolasny, Johns Hopkins
    Shape Analysis - A Morphometry BIRN Project
[Figure: workflow around central TeraGrid Supercomputing and Storage. 1: Data Donor Sites; 2: De-identification and upload; 3: MGH Segmentation; 4: JHU CIS-KKI Shape Analysis of Segmented Structures; 5: BWH Visualization]
Goal: comparison and quantification of structures’ shape and volumetric
differences across patient populations
Source: Anthony Kolasny, Johns Hopkins
 BIRN uses SSHFS to mount TeraGrid
        filesystems locally



•CIS has 87TB of local storage. /cis/net lists network drives.
•220TB through CIS portal using autofs, samba, smbwebclient.
Source: Anthony Kolasny, Johns Hopkins University
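
As a rough illustration of the SSHFS approach mentioned on this slide, the sketch below builds the command line for mounting a remote directory over SSH. It is not BIRN's actual configuration: the user name, host, remote path, and mountpoint are hypothetical placeholders, and the helper functions are invented for this example. Only the general `sshfs user@host:path mountpoint` form and the Linux `fusermount -u` unmount command are assumed.

```python
import shlex
from typing import List


def sshfs_mount_command(user: str, host: str, remote_path: str, mountpoint: str) -> List[str]:
    """Build the argv for: sshfs user@host:remote_path mountpoint"""
    return ["sshfs", f"{user}@{host}:{remote_path}", mountpoint]


def unmount_command(mountpoint: str) -> List[str]:
    """On Linux, a FUSE mount such as SSHFS is released with fusermount -u."""
    return ["fusermount", "-u", mountpoint]


# Hypothetical values for illustration only; none of these come from the slides.
cmd = sshfs_mount_command("researcher", "login.example.org", "/scratch/project", "/mnt/remote")
print(shlex.join(cmd))
print(shlex.join(unmount_command("/mnt/remote")))
```

The commands are only built and printed here; passing the list to `subprocess.run` on a machine with SSHFS installed would perform the mount, after which the remote directory can be read like a local one.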
   CReSIS (Center for Remote Sensing of Ice
                   Sheets)
•Awarded CI-TEAM
funding to build a Polar
Gateway
 – International Polar Year 2007-2008
 – Led by Geoffrey Fox, IU and Linda
   Hayden, Elizabeth City State

•CReSISGrid
 – Build a TeraGrid Science Gateway
 – Provide broad-based educational and
   training activity in Cyberinfrastructure
   for remote sensing and ice sheet
   dynamics
 – Lessons learned in remote data
   gathering can be applied to fields




            When is a gateway appropriate?
•Researchers using defined sets of tools in different ways
 – Same executables, different input
   •GridChem, CHARMM
 – Creating multi-scale workflows
 – Datasets
•Common data formats
 – National Virtual Observatory
 – Earth System Grid
 – Some groups have invested significant efforts here
   •caBIG, extensive discussions to develop common terminology and formats
   •BIRN, extensive data sharing agreements

•Difficult to access data/advanced workflows
 – Sensor/radar input
   •LEAD, GEON




         Tremendous Potential for Gateways
•In only 15 years, the Web has fundamentally changed
human communication
•Science Gateways can leverage this amazingly powerful tool
to:
 – Transform the way scientists collaborate
 – Streamline conduct of science
 – Influence the public’s perception of science
•Like e-commerce, Science Gateways need to build trust in
the infrastructure, tools, and methods that they use
•Unlike the public or commercial arena, scientists will be
vested in these gateways
 – Science Gateways will need to build trust in the organization behind
   them. Gateways need to have continuity
•High end resources can have a profound impact
•The future is very exciting!
               Enjoy the Summit!




•Thank you for your
attention
•Please contact me for
further information
wilkinsn@sdsc.edu





				