                    Grid-BGC Annual Review – Year 1
    “Implementing an efficient supercomputer-based grid-compute engine for
      end-to-end operation of a high-resolution, high data-volume terrestrial
                              carbon cycle model.”

    Project Team
    Project PI: Peter Thornton (NCAR)
    Co-PI: Henry Tufo (NCAR/CU)
    Staff: Nathan Wilhelmi (NCAR), Craig Hartsough (NCAR),
           Matthew Woitaszek (CU), Jason Cope (CU)
    Collaborators: Don Middleton (NCAR), Luca Cinquini (NCAR)
    Beta testers: Niklaus Zimmermann (WSL, Switzerland);
                  Tom Gower, Douglas Ahl (U. Wisc.)
Year 1 Annual Review: Grid-BGC                                                    Slide 1
                                 Example Results

  [Figure not reproduced: Daymet inputs … Grid-BGC outputs]

                                 Grid-BGC Project Goals
      1. Use emerging Grid-Compute technologies to provide a research-
         quality platform for terrestrial carbon cycle modeling.
      2. Provide a Web Portal user interface to organize the complicated
         workflow and data object dependencies that are typical of very
         large gridded ecosystem model implementations.
      3. Connect Portal-based simulation definition and control with
         automated job execution on remote supercomputer platforms,
         eliminating direct user interaction with the remote computational
         resources.
      4. Provide automated data streaming for very large model input and
         output datasets between the Portal, remote computational
         resources, and a remote mass storage facility.
      5. Provide robust analysis and visualization tools through the Portal.
      6. Demonstrate end-to-end functionality with a research-quality
         application (U.S. 1 km gridded simulations, targeting NACP).
      7. Focus on the needs of real researchers, through multiple
         iterations of platform development and beta-testing.

       Grid-BGC Design…          and Data Flow




          Grid-BGC Project Schedule and Milestones




                 Year 1 Progress Review (by Quarter)

                                    First Quarter
              • Completed hiring
              • Staff training in use of existing interface components
              • Staff training in use of existing core science code
              • Began Daymet website migration
              • Began Daymet data update for 1998-2003 (for research
              application).
              • Completed C++ prototype of user interface
              • Planned system architecture retreat




                 Year 1 Progress Review (by Quarter)
                                 Second Quarter
      • Held system architecture retreat: identified key technology
      components, made initial work assignments.
      • Staff attended Globus World conference.
      • Implemented Java Struts version of Daymet website.
      • Completed Daymet update for 1998-2001.
      • Began testing core science code on CU cluster.
      • Began testing data transfer from NCAR Mass Store to CU cluster.
      • First web-portal User Interface prototype complete, based on C++
      tool.
      • Began developing usage scenarios for UI.
      • Began exploring visualization server tools, including web-enabled GIS
      applications.

                 Year 1 Progress Review (by Quarter)
                                 Third Quarter
      • Completed usage scenarios for UI, data and process flow.
      • Began writing software requirements documentation and defining
      system architecture design, based on project objectives, usage
      scenarios, and project resource considerations.
      • Began migration of core science code I/O to netCDF format, for
      portability.
      • Established code and documentation repository (CVS).
      • Began defining XML interface specifications for message-passing
      between Portal and CU cluster.
      • Established staged development/release schedule to allow frequent
      feedback between developers and beta-testers.
      • Formalized relationships with two primary beta-testing groups.


                 Year 1 Progress Review (by Quarter)
                                 Fourth Quarter (1 of 3)
      • Completed System Requirements Specification, including definition of
      data objects.
      • Completed development of initial System Architecture, describing the
      software and hardware components required to meet the Requirements
      Specification, and how they interact to accomplish desired work-flow.
      • Produced initial breakdown of system development tasks, including
      prioritization and assignments to project staff at NCAR and CU.
      • Continued development of Prototype1 GUI. Implementation complete
      at level of UI elements, implementation underway for connection of UI
      to Portal server-side Database System.
      • Completed design of Job Manager Services on CU cluster. Includes
      the Grid Services that enable communication to the Portal and Mass
      Store, a Reliable Job Execution sub-system, and a job management
      sub-system.

                 Year 1 Progress Review (by Quarter)
                                 Fourth Quarter (2 of 3)
      • Implemented a Reliable Job Execution System (RJES) with multiple
      levels of fault tolerance.
      • Implemented a Job Manager system that interprets user requests,
      submits jobs to the RJES, communicates with Grid Services for data
      staging to/from Mass Store, and maintains persistent state information
      about all simulations. Includes web interface to state information and
      process diagnostics.
      • Implemented first prototype of Grid Services to handle user requests
      to submit, query, and terminate simulations. Integration of DataMover
      (Lawrence Berkeley National Laboratory) is underway.
      • Developed and implemented Grid-BGC request simulator to test Grid
      Services, Job Manager, and RJES. Verification of these components
      was successful.
      • Continued migration of core science I/O handling to netCDF.

                 Year 1 Progress Review (by Quarter)
                                 Fourth Quarter (3 of 3)
      • Updated Daymet U.S. database for 2002-2003. Migrated all Daymet
      daily data files to netCDF.
      • Developed and tested formatting for user-specified surface weather
      observations.
      • Developed and tested list-based implementation of data extraction
      from the Daymet database.




                     Review of Current Project Status
                     Topics detailed in the following slides:
                 • System Requirements and Architecture
                 • Grid Services and Job Management
                 • Core science code
                 • Schedule Status
                 • TRL update
                 • Summary of Accomplishments for Year 1
                 • Expected Progress in Year 2
                 • Project Milestone Schedule
                 • Budget Status
       System Requirements and Architecture (1/14)


                          Requirements Gathering Process
     • High-level usage scenarios were developed.
     • Scenarios were refined into functional requirements.
     • Requirements were distilled into the Software
       Requirements Specification (SRS). Available on project
       website.
     • The SRS is a living document, which will require
       modifications throughout the project.




       System Requirements and Architecture (2/14)


   System Organization
    The application was factored into Objects and Projects as the
    entities that will manage and control the system workflow.




       System Requirements and Architecture (3/14)

                                 System Design Goals
          • The system is being designed for a long lifespan,
            anticipating an extensive user base.
          • Manage changing technologies
                – Globus Toolkit (OGSA -> WSRF)
                – GUI Technologies (JSF / Portlets)
                – Computational Models
          • Manage changing resources
                – Computational Resources
                – Storage Resources




       System Requirements and Architecture (4/14)
                                 System Design Drivers
    • The needs of our research community
    • Project Resources
          – Partitioning the system into components that map to the
            available resources on the project.
          – Partitioning components to facilitate development in
            isolated environments
    • Changing Security Policies
          – Need to be able to react to changing security policies
            and procedures throughout the project.


       System Requirements and Architecture (5/14)




     System Architecture:
      summary schematic




       System Requirements and Architecture (6/14)
 System Architecture: details
 [Schematic not reproduced. Component labels: Web Portal GUI; Visualization
 Tools; Application / Workflow Engine; Job Execution Interface; Globus;
 Application Data; Online Disk Cache; NCAR MSS; Data Transfer (two boxes);
 Grid Service Interface Line; Grid Service; Execution Engine; Models.]
       System Requirements and Architecture (7/14)
            Software Architecture: Application Stack overview

        • User Interface Layer: provides User Interface and External
          Interface Services.
        • Application Logic Layer: contains core application code.
        • Data Management Layer: contains implementation-specific data
          mapping services.
        • Data Storage Layer: physical data storage.



       System Requirements and Architecture (8/14)
                  Software Architecture: User Interface Layer

        • Web Portal organizes work flow and provides
          customized interface to user’s projects and data
          objects.
        • Thin client architecture accommodates distributed
          user base.
        • Implementation based on JSP / Struts.
        • Initial prototype nearing completion:
              – User logins supported, with NCAR authentication.
              – Proof of concept for database connectivity.
        • Next stages:
              – Completion of workflow structure.
              – Submission to beta testers for feedback.
              – Revise to incorporate beta tester feedback.
       System Requirements and Architecture (9/14)
                  Software Architecture: User Interface Layer


  Login: System maintains list of registered users. User login is
  authenticated through NCAR Gatekeeper utilities. Requires each user to
  have an NCAR Gatekeeper account.




      System Requirements and Architecture (10/14)
                  Software Architecture: User Interface Layer

  Workflow: Multiple data objects are associated with each project type.
  User defines new objects and enters data. System templates provide
  default values. Users can designate objects they wish to share with
  other users.


      System Requirements and Architecture (11/14)
                  Software Architecture: User Interface Layer



  Object tracking: User is presented with lists of available data
  objects, including system templates and objects shared by other users.
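
The workflow and object-tracking behaviour described on the two slides above could be sketched as follows. This is only an illustration under stated assumptions: the class `DataObject`, the helpers `new_from_template` and `visible_to`, the `"system"` owner marker, and the field names are all hypothetical and do not come from the Grid-BGC code.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    name: str
    owner: str                      # assumption: "system" marks a system template
    values: dict = field(default_factory=dict)
    shared: bool = False


def new_from_template(template, name, owner, **overrides):
    """Create a user object whose default values come from a template."""
    values = {**template.values, **overrides}
    return DataObject(name=name, owner=owner, values=values)


def visible_to(user, objects):
    """Objects a user may pick from: their own, system templates, and shared ones."""
    return [o for o in objects
            if o.owner in (user, "system") or o.shared]
```

A user would start from a template, override only the values that differ, and then see their own objects alongside templates and anything other users chose to share.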




      System Requirements and Architecture (12/14)

               Software Architecture: Application Logic Layer

    • Domain Model
       – Core Application Logic and Workflow Management.
       – Java Object Model.
    • Job Management Services
       – Contains services to execute the models on remote
         resources.
       – Uses the Globus Toolkit for the grid infrastructure.
       – Key abstraction for managing upcoming changes in the
         Globus Toolkit.
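
One way to read "key abstraction for managing upcoming changes in the Globus Toolkit" is that application code depends only on a small interface, with toolkit-specific code behind it. The sketch below illustrates that idea; `JobBackend`, `InMemoryBackend`, and `JobManagementService` are hypothetical names, and no real Globus calls are made.

```python
from abc import ABC, abstractmethod


class JobBackend(ABC):
    """Hypothetical interface the application layer would code against."""

    @abstractmethod
    def submit(self, simulation_id): ...

    @abstractmethod
    def status(self, simulation_id): ...


class InMemoryBackend(JobBackend):
    """Stand-in backend; a toolkit-specific class would replace it."""

    def __init__(self):
        self._jobs = {}

    def submit(self, simulation_id):
        self._jobs[simulation_id] = "submitted"

    def status(self, simulation_id):
        return self._jobs.get(simulation_id, "unknown")


class JobManagementService:
    """Application-facing service; it never touches toolkit details."""

    def __init__(self, backend):
        self._backend = backend

    def run(self, simulation_id):
        self._backend.submit(simulation_id)
        return self._backend.status(simulation_id)
```

With this shape, a move from one toolkit generation to the next would mean writing a new `JobBackend` subclass rather than changing the callers.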


      System Requirements and Architecture (13/14)

              Software Architecture: Data Management Layer

        • Data Mapping Services
           – RDBMS specific mapping.
           – Abstracts details specific to a particular database
             implementation.
        • File Storage Services
           – Manages the file storage resources for the system.
           – Manages online disk cache and NCAR MSS
             resources.




      System Requirements and Architecture (14/14)

                  Software Architecture: Data Storage Layer

            • Split into two components: relational database
              and file based storage.
            • Design drivers
               – User interface responsiveness.
               – Minimize mass store access costs.
            • Relational database
               – Postgres 7.3
            • File storage
               – Online disk cache
               – NCAR mass storage system
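
The design driver "minimize mass store access costs" suggests serving repeated reads from the online disk cache. The slides do not say how the cache is managed; the sketch below assumes a least-recently-used policy purely for illustration, and `fetch_from_archive` is a hypothetical stand-in for a mass store read.

```python
from collections import OrderedDict


class DiskCache:
    """Least-recently-used cache in front of a slow archive.

    `fetch_from_archive` is a hypothetical callable standing in for a
    mass store read; `capacity` counts files rather than bytes.
    """

    def __init__(self, fetch_from_archive, capacity):
        self._fetch = fetch_from_archive
        self._capacity = capacity
        self._files = OrderedDict()
        self.archive_reads = 0

    def get(self, name):
        if name in self._files:
            self._files.move_to_end(name)      # mark as most recently used
            return self._files[name]
        data = self._fetch(name)               # expensive archive access
        self.archive_reads += 1
        self._files[name] = data
        if len(self._files) > self._capacity:
            self._files.popitem(last=False)    # evict least recently used
        return data
```

The `archive_reads` counter makes the saving visible: repeated requests for a cached file leave it unchanged.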

           Grid Services and Job Management (1/11)

                         GridBGC Execution Engine Goals:

          • Export a Grid service for executing Daymet and
            Grid-BGC simulations
          • Support running multiple simulation engines and
            accepting requests from multiple user interfaces
          • Provide reliable tile-based simulation execution




           Grid Services and Job Management (2/11)
                       GridBGC Compute Service Overview
         Components: Portal Interface, Grid Service, Execution Engine, Executables.

         1. The user interacts with the web portal interface on
            Dataportal. The web portal submits job requests to
            Hemisphere as a Globus Grid client.

         2. A Globus Grid service on Hemisphere listens for
            incoming new jobs and job queries. New jobs are
            checked and recorded in a local SQL database.

         3. The local SQL database maintains information about
            simulation requests, tiles, file transfers, and execution
            attempt history.

         4. A Java-based Execution Engine running on
            Hemisphere periodically checks the SQL database for
            new simulations. Files are retrieved from DataPortal
            and simulations are executed as required. The status
            is recorded in the database.
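
The hand-off in steps 2 to 4 (the service records jobs, the engine polls for them) can be illustrated with a small database-backed queue. This is a sketch, not the project's implementation: the actual engine is Java-based, the single-table schema is an assumption, and an in-memory SQLite database stands in for the local SQL database.

```python
import sqlite3

# Hypothetical schema: the slides only say the database tracks simulation
# requests, tiles, file transfers, and execution attempt history.
SCHEMA = """
CREATE TABLE simulation (
    id     INTEGER PRIMARY KEY,
    owner  TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'new'
);
"""


def open_db():
    """Create an in-memory database standing in for the local SQL database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def submit(db, owner):
    """Grid Service side: record a new job and return its id."""
    cur = db.execute("INSERT INTO simulation (owner) VALUES (?)", (owner,))
    db.commit()
    return cur.lastrowid


def query(db, sim_id):
    """Grid Service side: answer a job query from the database alone."""
    row = db.execute(
        "SELECT status FROM simulation WHERE id = ?", (sim_id,)
    ).fetchone()
    return row[0] if row else None


def poll_once(db, run_simulation):
    """Execution Engine side: one polling pass over newly recorded jobs."""
    new_ids = [r[0] for r in db.execute(
        "SELECT id FROM simulation WHERE status = 'new' ORDER BY id")]
    for sim_id in new_ids:
        ok = run_simulation(sim_id)
        db.execute("UPDATE simulation SET status = ? WHERE id = ?",
                   ("success" if ok else "error", sim_id))
    db.commit()
    return new_ids
```

Because both sides communicate only through the table, the service can keep accepting and answering requests while simulations run.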

           Grid Services and Job Management (3/11)
          Globus Managed Jobs vs. GridBGC Service:
    • Application-based approach: Globus Grid Resource
      Allocation and Management (GRAM) provides a Managed
      Job Execution service
       – GRAM uses User Hosting Environments
       – Allows running executables somewhere “on the Grid”
    • We selected a service-based approach
       – We don’t require completely portable code
       – NCAR MSS is not globally available secondary
         storage; it requires machine-specific DataMover
         configuration
       – Instead, an installable “GridBGC service” is used by
         remote clients
           Grid Services and Job Management (4/11)
              Grid-BGC Service: design approach
    • Provide a Globus-based Grid Service for running
      Daymet and Biome-BGC simulations.
    • Design goals:
          –   Portable installable Grid Service and execution engine
          –   Simple XML client interface for simulation specification
          –   Reliable execution and data transfer
          –   “Submit and forget”
    • We use the following Globus Toolkit components:
          – Grid Security Infrastructure (GSI) for authentication
          – GridFTP for data transfer
          – Web Services (WS) for the Grid Service
    • We separate the Grid Service from job execution: we run
      the simulations separately using a Job Engine
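
As an illustration of the "simple XML client interface" design goal, the sketch below builds and parses a request message. The element and attribute names (`request`, `action`, `simulation`, `tile`) are hypothetical, since the slides do not give the interface specification; the three actions are the submit, query, and terminate requests mentioned in the Fourth Quarter progress review.

```python
import xml.etree.ElementTree as ET


def build_request(action, simulation_id, tiles=()):
    """Serialize a request; element and attribute names are hypothetical."""
    root = ET.Element("request", action=action, simulation=simulation_id)
    for tile in tiles:
        ET.SubElement(root, "tile", id=tile)
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text):
    """Parse a request string back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    action = root.get("action")
    if action not in {"submit", "query", "terminate"}:
        raise ValueError(f"unsupported action: {action!r}")
    return {
        "action": action,
        "simulation": root.get("simulation"),
        "tiles": [t.get("id") for t in root.findall("tile")],
    }
```

Keeping the message this small is what would let a client "submit and forget": everything the service needs travels in one document.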

           Grid Services and Job Management (5/11)
                     GridBGC Service System Architecture:
       [Schematic not reproduced. A GSI Line separates NCAR Dataportal from
       CU Hemisphere. Component labels: Portal Interface; Grid Client;
       Grid Service; GridFTP (two boxes); DataMover Server (two boxes);
       DataMover Client; NCAR MSS and Cache; Hemisphere Cache; Execution
       Engine; Tile Simulation Management; Reliable Job Execution System;
       Executables.]
           Grid Services and Job Management (6/11)

                                 Simulation and Tile execution

            • Grid Service:
              – Listens for user job submissions and queries
              – Interacts with database only

            • Java-based Execution Engine:
               – Potential states:
                  • Waiting and stalling
                  • Data stage-in
                  • Model execution
                  • Data stage-out
                  • Cleanup and finalization
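
The engine states listed above can be read as a simple pipeline. The sketch below assumes the states are visited strictly in the order shown, which the slide does not state explicitly, and uses Python rather than the engine's Java; `Stage`, `ORDER`, `next_stage`, and `run_tile` are illustrative names.

```python
from enum import Enum


class Stage(Enum):
    WAITING = "waiting"
    STAGE_IN = "data stage-in"
    EXECUTE = "model execution"
    STAGE_OUT = "data stage-out"
    CLEANUP = "cleanup and finalization"


# Assumed linear ordering of the stages listed on the slide.
ORDER = [Stage.WAITING, Stage.STAGE_IN, Stage.EXECUTE,
         Stage.STAGE_OUT, Stage.CLEANUP]


def next_stage(stage):
    """Return the stage that follows `stage`, or None after cleanup."""
    i = ORDER.index(stage)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def run_tile(handlers):
    """Walk a tile through every stage, calling a handler for each one.

    `handlers` maps Stage -> zero-argument callable; stages without a
    handler are treated as no-ops. Returns the list of stages visited.
    """
    visited = []
    stage = Stage.WAITING
    while stage is not None:
        handlers.get(stage, lambda: None)()
        visited.append(stage)
        stage = next_stage(stage)
    return visited
```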

           Grid Services and Job Management (7/11)
                                 Potential Job States

      • Most tile jobs finish with Success.
      • Tile jobs may terminate with Errors
         – Unrecoverable problems (e.g. missing files, system
           code errors)
         – The error handler saves return codes and error
           messages for user query.
      • Tile jobs may be Held for manual intervention
         – Anticipated transient conditions
             • DataMover servers down, NCAR MSS down, etc.
             • Disk space issues, scheduled maintenance
         – Tiles held while administrator corrects situation
         – Tiles may resume from Held state
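
The distinction between Errors and Held can be sketched as a classification of failures: anticipated transient conditions hold the tile for later resumption, while anything else is recorded as an error. The names `TransientError`, `Tile`, `attempt`, and `resume` are hypothetical, and the sketch does not reflect the actual RJES code.

```python
class TransientError(Exception):
    """Anticipated, temporary condition (e.g. a server or the MSS is down)."""


class Tile:
    def __init__(self, name):
        self.name = name
        self.state = "pending"   # pending | success | error | held
        self.message = None      # saved error text for later user queries


def attempt(tile, work):
    """Run `work()` for a tile and map the outcome to a job state."""
    try:
        work()
    except TransientError as exc:
        tile.state, tile.message = "held", str(exc)
    except Exception as exc:
        tile.state, tile.message = "error", str(exc)
    else:
        tile.state, tile.message = "success", None
    return tile.state


def resume(tile, work):
    """Retry a held tile after an administrator has fixed the problem."""
    if tile.state != "held":
        raise ValueError(f"tile {tile.name} is not held")
    return attempt(tile, work)
```

Only held tiles can be resumed here; a tile in the error state keeps its saved message so that a user query can report what went wrong.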
           Grid Services and Job Management (8/11)
                       Administrative Support Applications

          • Additional web interface to JobEngine server
          • Grid Service also exports data for inclusion in
            main portal interface




           Grid Services and Job Management (9/11)
                 Grid Service System Testing
    • Automated client simulator
          – Generates random simulation and tile requests
                • Submits job via Globus Web Services
                • Transfers data from DataPortal using DataMover and GridFTP
                • Runs sample application using PBS on Hemisphere
          – Client simulator may be run on DataPortal or Hemisphere
          – Test configuration downloads 2-5 files of 10-20 MB each, requires
            15-30 minutes of walltime, and uploads one 10-20 MB file
    • Testing methodology
          – Run from crontab, submitting about 100 tiles/hour
          – Overall, we have run over 1000 simulated tiles through the system
    • Testing identified timeout problems with the DataMover
      system on Hemisphere
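
A request generator in the spirit of the client simulator might look like the sketch below. The numeric ranges are the ones quoted on this slide; the dictionary layout and function names are hypothetical. The helper `tiles_in_flight` is an addition, not something from the slides: it applies Little's law (average concurrency = arrival rate × time in system) and counts walltime only, ignoring transfer time.

```python
import random


def random_request(rng, request_id):
    """Build one random test request within the ranges quoted on the slide."""
    return {
        "id": request_id,
        "input_files_mb": [rng.randint(10, 20)
                           for _ in range(rng.randint(2, 5))],
        "walltime_min": rng.randint(15, 30),
        "output_file_mb": rng.randint(10, 20),
    }


def tiles_in_flight(tiles_per_hour, walltime_min):
    """Little's law: average concurrency = arrival rate x time in system."""
    return tiles_per_hour * walltime_min / 60.0


rng = random.Random(0)  # fixed seed so that a test run is repeatable
batch = [random_request(rng, i) for i in range(100)]
```

Under these assumptions, about 100 tiles/hour with 15-30 minutes of walltime each would mean roughly 25 to 50 tiles executing at any moment in steady state.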

          Grid Services and Job Management (10/11)
                         Challenges: DataMover Integration

    • DataMover from Lawrence Berkeley National Laboratory
       – Grid transfer engine
       – Integrates with NCAR Mass Storage System
       – Runs local disk resource manager cache

    • Received interim release from LBL
       – Working directly with DataMover developers
       – Developers Alex Sim and Junmin Gu have been
         responsive and helpful.



          Grid Services and Job Management (11/11)
                  Challenges: Grid Security Considerations
     Already solved: Grid Service
          Dataportal -> Hemisphere
          • GSI Authentication: Portal uses service account

     Solution in progress: Data Transfer
          Dataportal -> Hemisphere
          Hemisphere -> Dataportal
          • Hemisphere user must have Dataportal certificate
          • Maybe use service account for Dataportal disk retrieval?

     Policy problem (to be resolved): Data Transfer
          Hemisphere -> MSS
          MSS -> Hemisphere
          • Policy prohibits service account sharing! Current
            design does not give users certificates on Hemisphere
                                 Core science code

                                   Current Status

       • Threaded version of Biome-BGC (using pthreads)
       compiled and tested on CU cluster.
       • Complete Daymet executions now running routinely on
       NCAR supercomputers.
       • netCDF migration underway.
       • Formatting defined for subset of user-specified input
       datasets.
       • Batch Daymet extraction tested for list-based project.



                          Project Schedule Status (1/2)




                          Project Schedule Status (2/2)

    Tasks are mostly on schedule, with the following
    exceptions:
    • Prototype 1 release pushed back to end of September
    (originally scheduled for August). This will not change later
    release schedules.
    • Development of visualization tools put off until second half
    of Year 2, continuing into Year 3 (original proposal had this
    development starting in second half of Year 1).
    • Development of interactions with existing data systems will
    also be put off to the end of Year 2 (original proposal had
    design for this task starting in Year 1).


                                 Project TRL Update
    Initial TRL was 3-4:
    • We had successfully implemented and tested a prototype
    system to solve a real research problem with full-scale data
    sets. (TRL 4)
    • Some of the proposed system components were not
    included in the original prototype (Mass Storage System,
    Grid Services, thin client User Interface). Technical
    feasibility for these components had been established
    independently. (TRL 3)
    Current TRL is now 4 for all components:
    • All system components have now been subjected to
    stand-alone prototype implementation and testing, with a
    focus on integration of technology components.
              Summary of Accomplishments (Year 1)

    • Developed a formal System Architecture defining the interactions
    between software and hardware components, including the
    implementation of a Grid Service Interface connecting the User Interface
    with the Computational Engine.
    • Replaced the original client-side user interface application with a
    thin-client Web Portal User Interface.
    • Introduced a project and data object model to manage the complex
    input and output requirements for the core science codes.
    • Introduced a user authentication system to manage certification across
    distributed secure resources.
    • Replaced the user-mediated interaction with the supercomputer
    resource from the original prototype with an automated job execution
    interface and job management service.
    • Introduced Grid-based automated storage and retrieval mechanisms
    interfacing with the Mass Storage System.

                           Expected Progress in Year 2
    We expect to have the entire system at TRL 5 by the
    end of Year 2:
    • Fully functional scientific capability for coupled Daymet
    and Biome-BGC simulations.
    • Validation carried out within the established System
    Architecture (representative environment).
    • All basic technology elements integrated, with design and
    initial prototyping finished for supporting technology
    elements (e.g. visualization, interaction with existing data
    systems).
    • All prototype implementations will conform to target
    interface (e.g. all I/O in netCDF, all message passing in
    XML, nothing hard-wired in user access to Mass Store).
                                 Milestone Schedule
    1. Completed top-level design, Software Requirements,
       and System Architecture (8/04).
    2. Established beta-tester relationships (8/04).
    3. Complete Prototype 1 development, for testing of
       system at TRL 4 (9/04).
    4. Complete Prototype 2 development, for testing of
       system to establish TRL 5 (3/05).
    5. Complete Pre-Final prototype development, for initial
       testing of system to establish TRL 6 (12/05).
    6. Complete development and testing of Final prototype,
       establishing exit TRL 6 (7/06).
    7. Documentation complete for Final prototype (7/06).
                                 Project Budget Status

  [Budget chart not reproduced. Values shown: $428,945 and $317,955.
  Annotation: Hiring complete.]


				