Data Management Technologies for a Fusion Simulation Project

Data Management on the Fusion Computational Pipeline: "End-to-end solutions"
No singing in this presentation

Presentation to the SciDAC 2005 meeting
Scott A. Klasky
M. Beck (UTK), V. Bhat (PPPL), E. Feibush (PPPL),
B. Ludäscher (UCD), M. Parashar (Rutgers), A. Shoshani (LBL),
D. Silver (Rutgers), M. Vouk (NCSU)

CEMM SciDAC / GPS SciDAC


Outline of Talk
• The Fusion Simulation Project (FSP)
• Computer science enabling technologies
• The scientific investigation process
• Technologies necessary for leadership-class computing, such as the FSP:
   – Adaptive workflow technology
   – Data streaming
   – Collaborative code monitoring
   – Integrated Data Analysis and Visualization Environment (IDAVE)
   – Ubiquitous and transparent data sharing
Fusion Simulation Project (FSP): a 15-year project
• A complete simulation of all interacting phenomena (Dahlburg report).
• Strong need for scientific data management for the FSP!
• It's about the enabling technologies.


[Diagram: applications drive the enabling math/CS technologies; the technologies respond to application needs.]
FSP has computer science/DM requirements
• Coupling multiple codes/data
   – In-core and network-based
• Analysis and visualization
   – Feature extraction, data juxtaposition
• Dynamic monitoring and control
   – Parameter modification, snapshot generation, …
• Data sharing among collaborators
   – Transparent and efficient data access
• These requirements are shared with many other simulations across the DOE community.
                   Six data technologies
        Fundamental to supporting the data management
            requirements for scientific applications

•   From the report of the DOE Office of Science Data-Management Workshops (March-May 2004) (R. Mount):
    –    Workflow, data flow, data transformation
    –    Metadata, data description, logical organization
    –    Efficient access and queries, data integration
    –    Distributed data management, data movement, networks
    –    Storage and caching
    –    Data analysis, visualization, and integrated environments
•   A path-finding FSP should develop and
    demonstrate these components
Overall priorities for each of the six areas of data management

• Each branch (simulation-driven, experiment/observation-driven, information-intensive) of each application science ranked the six areas from 1 (lowest) to 6 (highest).
• Fusion, astrophysics, combustion, and climate simulations have similar needs.
• An end-to-end solution links these areas together for one consistent view of the data.
OFES has a clear need for advanced SDM
• Current OFES data management technologies work well for today's experiments but do not scale to large data.
• The time is ripe for OFES to join collaborative efforts with other DOE data management researchers and design a system that will be scalable to an FSP and beyond.
                         The Scientific Investigation Process.
• A simplified version of the scientific investigation process, with seven stages, is shown below.
• At every stage, data management is essential.
• Idea stage: scientists pose a question about a phenomenon and a hypothesis for its explanation.
• Implementation stage: implement a test bed, possibly revising the hypothesis along the way.
• V&V stage: interpret results via data analysis/visualization tools.
• Pre-production stage: run parameter surveys and sensitivity analyses.
• Production stage: perform large experiments.
• Interpretation stage: interpret the results from the production/pre-production stages.
• Assimilation stage: assimilate the results from the previous steps.
• GOAL of end-to-end solutions: reduce the time from idea to discovery!

Idea → Implementation → Validation and Verification → Pre-Production → Production → Interpretation → Assimilation
Workflows (an edge FSP project, NYU - Chang et al.): monitoring, analysis, storing

[Workflow diagram: Start (L-H) → XGC-ET → Mesh/Interpolation → M3D-L (linear stability) and M3D; a "Stable?"/"B healed?" decision loops results back through Mesh/Interpolation; XGC-ET data (TBs) flows to a distributed store and to Noise Detection ("Need more flights?") and Blob Detection; M3D data (GBs) flows to a distributed store and to Puncture Plots, Island Detection, Feature Detection, and Out-of-core Isosurface; the analysis runs within IDAVE.]
Scientific Workflows, Pre-KEPLER/SPA
• Distributed data & job management
   – Authenticate, access, move, replicate, query … data ("Data Grid")
   – Schedule, launch, monitor jobs ("Compute Grid")
• Data integration
   – Conceptual querying & integration, structure & semantics, e.g. mediation w/ SQL, XQuery + OWL (semantics-enabled mediator)
• Data analysis, mining, knowledge discovery
• Scientific visualization
   – 3-D (volume), 4-D (spatio-temporal), n-D (conceptual views) …

Lack of integration:
• One-of-a-kind custom apps, detached ("island") solutions
• Such workflows are hard to understand, maintain, reproduce
• Little or no workflow design, automation, reuse, documentation
• Need for an integrated scientific workflow environment
          What is a Scientific Workflow (SWF)?

• Model the way scientists work with their data and tools
    – Mentally coordinate data export, import, analysis via software systems
• Scientific workflows emphasize data flow (≠ business workflows)
• Metadata (incl. provenance info, semantic types etc.) is crucial for
  automated data ingestion, data analysis, …
• Goals:
    – SWF automation,
    – SWF & component reuse,
    – SWF design & documentation,
    – making scientists' data analysis and management easier.
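Scientific workflows compose reusable components ("actors" in Kepler's terminology) so that data flows from one stage to the next. A minimal sketch of that dataflow idea, with entirely illustrative actor names (this is not Kepler's actual API):

```python
# Minimal sketch of the dataflow idea behind scientific workflows:
# reusable "actors" (here, plain functions) chained so that each stage's
# output feeds the next stage's input. All names are illustrative.

def run_workflow(stages, data):
    """Push data through a linear chain of actor functions."""
    for stage in stages:
        data = stage(data)
    return data

archive = []

# Example actors for a toy monitor → analyze → store pipeline.
def monitor(samples):
    return [s for s in samples if s is not None]   # drop missing readings

def analyze(samples):
    return sum(samples) / len(samples)             # reduce to a mean

def store(result):
    archive.append(result)                         # record the result
    return result

mean = run_workflow([monitor, analyze, store], [1.0, None, 3.0])
```

The point of the workflow system is that each actor is documented, reusable, and replaceable, rather than buried inside a one-of-a-kind script.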
Interactive and Autonomic Control of Workflows/Simulations

• The scale, complexity, and dynamism of the FSP require simulations to be accessed, monitored, and controlled during execution.
• Development and deployment of applications that can be externally monitored and interactively or autonomically controlled:
   – Enable interactive and autonomic (policies driven) control of
     simulation elements, interactions and workflows.
   – A control network to enable elements to be accessed and
     managed externally.
       • Support runtime monitoring, dynamic data injection and simulation
         workflow control.
       • Support efficient and scalable implementations of monitoring,
         interactive and autonomic control and rule execution.
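Autonomic (policy-driven) control amounts to evaluating monitored metrics against a set of rules and firing the corresponding actions on the running simulation. A hedged sketch, where the metric names, thresholds, and actions are all invented for illustration:

```python
# Sketch of policy-driven (autonomic) control: each rule maps a monitored
# condition to an action on a running simulation element. The metrics,
# thresholds, and action names are illustrative assumptions, not FSP code.

def autonomic_step(metrics, rules):
    """Return the list of actions fired by the current monitored metrics."""
    return [action for condition, action in rules if condition(metrics)]

rules = [
    (lambda m: m["buffer_fill"] > 0.9,     "divert_to_failsafe"),
    (lambda m: m["bandwidth_mbps"] < 50,   "shrink_transfer_block"),
    (lambda m: m["island_width"] > 0.1,    "snapshot_and_notify"),
]

# One monitoring sample: the I/O buffer is nearly full, all else is nominal.
actions = autonomic_step(
    {"buffer_fill": 0.95, "bandwidth_mbps": 80, "island_width": 0.02}, rules
)
```

Interactive control then reduces to injecting new rules (or one-shot actions) through the same control network at runtime.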
PPPL/LN/Rutgers Data Streaming Technology

• Thread + buffer the I/O layer (adaptive threaded buffer management) to overlap communication/computation with I/O.
• Idea: stream as much data over the WAN as possible during the simulation, with less overhead than writing to local disk.
• Data is accessed in an identical fashion for local and remote depots.
   – Local depots: 500 Mb/s, low latency; PPPL depots: 100 Mb/s, high latency.
• Logistical Networking is essential.
• Scheme (simulation cluster, 16 procs/node):
   1) Create groups of 16 processors per TWRITE group, each with 1 I/O processor and 1 failsafe processor.
   2) Simulation processors transfer data to the I/O processor, which enqueues the data in the I/O buffer.
   3) The I/O buffer transfers data to local depots close to the simulation cluster and on to the remote depots at PPPL; metadata and feedback flow back.
   4) If the I/O buffer fills up, simulation processors transfer data to the failsafe processor's buffer.
   5) The failsafe buffer transfers data to local depots on the simulation machine.
Network Adaptability

[Charts: transfer size and transfer rate (blocks, 1 MB/block) over time (0-300 s); left panel: latency-aware adaptation; right panel: network-aware, self-adjusting adaptation.]
High Throughput for "Live" Simulations

• The buffering scheme can keep up with data generation rates of 85 Mb/s from NERSC to PPPL and 99 Mb/s from ORNL to PPPL.
   – NERSC to PPPL: GTC simulation on 512 processors, 97 Mb/s out of 100 Mb/s.
• The data was generated across 32 nodes (SP), 64 processors (SGI).
• ESNET router statistics: peak transfer rates of 99.2 Mb/s out of 100 Mb/s from ORNL to PPPL for 8 hours (5-minute average)!
• The simulation dictates the data generation rate.
   – Example: a 2048³ 3D simulation writing 5 variables every hour ≈ 364 Mb/s.
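As a sanity check of the example rate, assuming single-precision (4-byte) values, since the slide does not state the precision, the arithmetic lands in the same few-hundred-Mb/s range as the quoted figure:

```python
# Back-of-the-envelope check of the 2048^3 example, assuming
# single-precision (4-byte) values; the slide does not state the precision,
# so the exact figure depends on that assumption.
cells = 2048 ** 3            # 3D grid points
variables = 5                # variables written per dump
bytes_per_value = 4          # assumed: single precision
dumps_per_hour = 1

bits_per_dump = cells * variables * bytes_per_value * 8
rate_mbps = bits_per_dump * dumps_per_hour / 3600 / 1e6   # sustained Mb/s
# roughly 382 Mb/s under these assumptions, the same order as the quoted 364 Mb/s
```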
Low Overhead

[Chart: % overhead of the buffering scheme vs. data generation rate (1-15 Mb/s per node), compared with writing 2 MB blocks per timestep to GPFS, 10 MB blocks per timestep to GPFS, and HDF5 + GPFS; the predicted GTC data generation rate in 5 years is marked.]
•   Overhead is defined as the difference between the time taken with the I/O scheme and the time taken with no I/O of the resulting data, over the simulation's lifetime.
•   Data generation:
     – 1.5 Mb/s/node × 64 nodes = 96 Mb/s (now)
     – 8 Mb/s/node × 64 nodes = 512 Mb/s (GTC future)
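The overhead metric and the aggregate-rate arithmetic, written out (note that 8 Mb/s/node across 64 nodes works out to 512 Mb/s; the example run times in the snippet are invented for illustration):

```python
# The slide's overhead metric: compare a run using the I/O scheme against
# the same run doing no I/O of the resulting data.

def overhead_percent(time_with_io, time_without_io):
    """Percent slowdown attributable to the I/O scheme."""
    return 100.0 * (time_with_io - time_without_io) / time_without_io

# Aggregate data-generation rates from the slide.
rate_now_mbps = 1.5 * 64     # 96 Mb/s today
rate_future_mbps = 8 * 64    # 512 Mb/s projected for GTC

# Illustrative example: 102 s with the scheme vs. 100 s without -> 2% overhead.
example_overhead = overhead_percent(102.0, 100.0)
```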
ElVis: Collaborative Code Monitoring
• Part of the Fusion Collaboratory SciDAC.
• Develop a hardened, Java-based collaborative visualization system based on SciVis [Klasky, Ki, Fox].
• Used for monitoring fusion (TRANSP) runs.
• Available both as a web-based applet and as a Java application.
• Used by dozens of fusion scientists.
• Being extended into actors in the Kepler system.
Requirements for Data Analysis and Visualization
• Feature extraction routines
   – Puncture plot classification
   – Feature/blob detection: SDM, Kamath (LLNL); Zweben (PPPL)
• Data juxtaposition requires
   – Normalizing simulation and experimental data into a common space (units, meshes, interpolation)
   – Quantifying the similarity (surface area, volume, rate of change over time, where the features are over time, …)
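The normalization step above, putting two datasets on a common mesh in common units, can be sketched with plain linear interpolation. The grids, profile values, and the keV-to-eV conversion below are illustrative assumptions, not fusion data:

```python
# Sketch of data juxtaposition: interpolate a simulated and an experimental
# profile onto a common radial mesh and convert units so the two can be
# compared point by point. All grids and values are illustrative.

def interp(x, xp, fp):
    """Linear interpolation of samples (xp, fp) at point x (xp ascending)."""
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    for i in range(1, len(xp)):
        if x <= xp[i]:
            t = (x - xp[i - 1]) / (xp[i] - xp[i - 1])
            return fp[i - 1] + t * (fp[i] - fp[i - 1])

# Simulated temperature in keV on a coarse mesh; experiment in eV on a finer one.
sim_r, sim_T_keV = [0.0, 0.5, 1.0], [2.0, 1.0, 0.2]
exp_r, exp_T_eV = [0.0, 0.25, 0.75, 1.0], [2100.0, 1600.0, 700.0, 250.0]

common_r = [0.0, 0.25, 0.5, 0.75, 1.0]
sim_on_common = [1000.0 * interp(r, sim_r, sim_T_keV) for r in common_r]  # keV -> eV
exp_on_common = [interp(r, exp_r, exp_T_eV) for r in common_r]

# One similarity measure: the worst pointwise disagreement on the common mesh.
mismatch = max(abs(s - e) for s, e in zip(sim_on_common, exp_on_common))
```

Once both profiles live on the same mesh in the same units, any of the similarity measures above (areas, volumes, rates of change) reduces to arithmetic over aligned arrays.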
IDAVE: Integrated Data Analysis and Visualization Environment
• Approach:
  – Enhance the existing IDAVEs in the fusion community to support robust and accessible data analysis.
  – Incorporate and tightly integrate visualization into the scientific workflow.
  – Support advanced visualization and data mining capabilities on simulation and experimental data.
  – Support visualization on workstations and display walls.
Ubiquitous and Transparent Data Sharing
• Problem:
  – Simulations and collaborators in any FSP will be distributed across national and international networks.
  – FSP simulations will produce massive amounts of data, permanently stored at national facilities (e.g. HPSS) and temporarily stored on collaborators' disk storage systems.
  – Need to share large volumes of data among collaborators and the wider community.
  – Current fusion solutions are inadequate for FSP data management challenges.
    Ubiquitous and Transparent Data Sharing

• What technology is required
  – Metadata system
     • To map user concepts to datasets and files
     • e.g. find {ITER, shot_1174, Var=P(2D), Time=0-10}
     • e.g. Yields: /iter/shot1174/mhd
  – Logical to physical data (files) mapping
     • e.g. lors://
     • Support for multiple replicas based on access patterns
  – Technology to manage temporary space
     • Lifetime, garbage collection
  – Technology for fast access
     • Parallel streams, large transfer windows, data streaming
  – Robustness
     • If mass store unavailable, replicas can be used
     • Technology to recover from transient failure
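The first two bullets, a metadata system that maps user concepts to datasets and a logical-to-physical mapping with replicas, can be sketched as two lookups. The catalog contents and the `lors://` depot URLs below are invented for illustration (the slide only shows the scheme prefix):

```python
# Sketch of the two mappings on this slide: a metadata query resolving user
# concepts to a logical dataset name, and a logical->physical map listing
# replicas. The catalog entries and depot hostnames are illustrative.

metadata_catalog = {
    ("ITER", "shot_1174", "P(2D)"): "/iter/shot1174/mhd",
}

replica_map = {
    "/iter/shot1174/mhd": [
        "lors://depot1.example.gov/iter/shot1174/mhd",  # assumed depot names
        "lors://depot2.example.gov/iter/shot1174/mhd",
    ],
}

def resolve(device, shot, var):
    """User concepts -> logical name -> list of physical replicas."""
    logical = metadata_catalog[(device, shot, var)]
    return logical, replica_map[logical]

logical, replicas = resolve("ITER", "shot_1174", "P(2D)")
```

Robustness then falls out of the indirection: if the mass store holding one replica is unavailable, the client simply tries the next entry in the replica list.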
   Ubiquitous and Transparent Data Sharing
• Approach
  – Need logistical versions of standard libraries and tools (NetCDF,
    HDF5) for moving and accessing data across the network
  – Speed of transfer and control of placement are vital to
    performance and fault tolerance
  – Data staging, scheduling and tracking based on common SDM
    tools and policies
  – Global namespace and placement policies to enable community
    collaboration around distributed postprocessing, visualization
• Use:
  – Logistical Networking: distributed depot system, maps logical to
    physical, parallel access, file staging
– Storage Resource Management (SRM): disk and tape management systems; manage space, lifetime, garbage collection
  – No dependence on a single system: SRM is a middleware
    standard for multiple storage systems
• The scientific investigation process in the FSP will be limited without the strong data management and visualization approach highlighted in the 2004 DOE Data Management report.
• Many DOE projects would benefit from end-to-end solutions.
• Need to couple DOE/NSF computer science research with hardened solutions for applications.
• 2004 Data Management Workshop: need $32M/year of new funding!