HDF Update

         Mike Folk
      The HDF Group
HDF and HDF-EOS Workshop X
     November 29, 2006

                                      Outline

  • Organizational info
  • HDF Software Update
  • Other Activities of Interest




Nov. 29, 2006   HDF Workshop X, Landover MD     2
Organizational info
                  “The HDF Group” = “THG”

  • Founded Dec. 2006
  • Went solo July 15, 2006
  • Non-profit

         THG mission
To support the vast community of HDF
   users and to ensure the sustainable
 development of HDF technologies and
the ongoing accessibility of HDF-stored
                  data.
                            The HDF Team

          Frank Baker                         John Mainzer
          Christian Chilan                    Matthew Needham
          Peter Cao                           Pedro Nunes
          Vailin Choi                         Tammi O’Neill
          Mike Folk                           Elena Pourmal
          Anne Jennings                       Binh-minh Ribler
          Barbara Jones                       Randy Ribler
          Quincey Koziol                      Rishi Sinha
          James Laird                         Kent Yang
          Raymond Lu

           And all those wonderful folks out there
           who contribute ideas, requests, bug
           reports, code, and support.
HDF Software Update
HDF4 update
                      Platforms to be dropped

  • Operating systems
        •   HPUX 11.00
        •   Cray SV1 and TS IEEE
        •   AIX 5.1 and 5.2
        •   SGI IRIX64-6.5
        •   Linux 2.4
        •   Solaris 2.7, 2.8, 2.9
        •   Windows 2000
        •   Mac OS X 10.3
  • Compilers
        •   GNU C compilers older than 3.4 (Linux)
        •   Intel 8.*
        •   PGI V. 5.*, 6.0

                        Platforms to be added

  • Systems
        •   Mac OS X 10.4 (Intel)
        •   Solaris 2.* on Intel
        •   Cray XT3
        •   Windows 64-bit (?)
        •   Linux 2.6
        •   HPUX 11.23
        •   IBM Power 5
  • Compilers
        •   g95
        •   PGI V. 6.1
        •   Intel 9.*

                                New features

  • Configuration
        • Switched to use F77_FUNC macro for better
          Fortran support (no hard-coded compilers
          anymore!)
        • Support for shared libraries
  • Library
        • No hard-coded limit on number of opened files
        • New APIs to control number of files opened by
          application
        • Fortran support for SZIP compression
                                     Bug fixes

  • Tools
        • Many improvements to the hdp, hrepack,
          hdiff and hdfimport utilities based on users’
          feedback
  • Library
        • Data corruption bug when several SDSs with
          unlimited dimensions are open
        • Better handling of SDSs with duplicated
          names in SDgetdimscale and more
HDF5 update
                              No new releases!

  • Focus on HDF5 release 1.8
  • HDF5-1.8.0 Alpha 5 release is available from:

      hdf.ncsa.uiuc.edu/HDF5/release/alpha/obtain518.html

                       Platforms to be dropped

  • Operating systems
        •   HPUX 11.00
        •   Mac OS X 10.3
        •   AIX 5.1 and 5.2
        •   SGI IRIX64-6.5
        •   Linux 2.4
        •   Solaris 2.8 and 2.9
  • Compilers
        •   GNU C compilers older than 3.4 (Linux)
        •   Intel 8.*
        •   PGI V. 5.*, 6.0
        •   MPICH 1.2.5

       http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html

                       Platforms to be added

  • Systems
        •   Alpha OpenVMS
        •   Mac OS X 10.4 (Intel)
        •   Solaris 2.* on Intel (?)
        •   Cray XT3
        •   Windows 64-bit (32-bit binaries)
        •   Linux 2.6
        •   BG/L
  • Compilers
        •   g95
        •   PGI V. 6.1
        •   Intel 9.*
        •   MPICH 1.2.7
        •   MPICH2

New Features in HDF5 1.8
                HDF5 1.8 – new library features

  • Datatype and dataspace features
        •   Serialized dataspaces and datatypes
        •   Ability to create a datatype from a text description
        •   Integer to float conversions during I/O
        •   Revised exception handling during type
            conversion
        •   Compact storage for N-bit datatypes
        •   Offset+size storage filter, saving space
        •   “Null” dataspace – datasets with no elements
        •   Data transformation filter
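The offset+size and N-bit ideas both come down to storing each value in only as many bits as its range needs. The Python below is a toy sketch of that idea under simple assumptions (a non-empty list of integers sharing one minimum). The function names are hypothetical and this is not HDF5's filter code.

```python
def pack_offset_size(values):
    """Pack integers as (minimum, bits per value, count, packed bits)."""
    lo = min(values)
    # Only offsets from the minimum are stored, so the width depends on the range.
    width = max(1, (max(values) - lo).bit_length())
    packed = 0
    for i, v in enumerate(values):
        packed |= (v - lo) << (i * width)  # append each offset as a width-bit field
    return lo, width, len(values), packed


def unpack_offset_size(lo, width, count, packed):
    """Inverse of pack_offset_size: extract each field and add the minimum back."""
    mask = (1 << width) - 1
    return [lo + ((packed >> (i * width)) & mask) for i in range(count)]


readings = [1000, 1003, 1001, 1007, 1002]
lo, width, count, packed = pack_offset_size(readings)
packed_bytes = (count * width + 7) // 8
print(lo, width, packed_bytes)                                    # 1000 3 2
print(unpack_offset_size(lo, width, count, packed) == readings)   # True
```

Here five values that would take 20 bytes as 32-bit integers fit in 2 bytes of packed offsets, plus the stored minimum and bit width.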

                HDF5 1.8 – new library features

  • Group revisions
        •   Creation order access
        •   Compact groups – small groups take less space
        •   Large group storage improvements
        •   Intermediate group creation
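Creation-order access and intermediate group creation can be pictured with a toy in-memory model. This is a hypothetical Python sketch (the `Group` class and `create_group` function are made up for illustration), not the HDF5 API. In the released 1.8 C library these behaviors are requested through property lists, for example with H5Pset_link_creation_order and H5Pset_create_intermediate_group.

```python
class Group:
    """Toy group: links are indexed by name and also remembered in creation order."""

    def __init__(self):
        self.links = {}   # link name -> child Group
        self.order = []   # link names in the order they were created

    def add(self, name, child):
        self.links[name] = child
        self.order.append(name)

    def names(self, by_creation_order=False):
        # Name order is the traditional view; creation order is the new option.
        return list(self.order) if by_creation_order else sorted(self.links)


def create_group(root, path, create_intermediate=False):
    """Create the group at an absolute path; optionally create missing parents (like mkdir -p)."""
    parts = [p for p in path.split("/") if p]
    node = root
    for i, part in enumerate(parts[:-1]):
        if part not in node.links:
            if not create_intermediate:
                raise KeyError("missing intermediate group /" + "/".join(parts[: i + 1]))
            node.add(part, Group())
        node = node.links[part]
    if parts[-1] in node.links:
        raise KeyError(path + " already exists")
    child = Group()
    node.add(parts[-1], child)
    return child


root = Group()
create_group(root, "/zeta")
create_group(root, "/alpha/b/c", create_intermediate=True)
print(root.names())                        # ['alpha', 'zeta']
print(root.names(by_creation_order=True))  # ['zeta', 'alpha']
```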

                 HDF5 1.8 – new library features

  • Link improvements
        • External links – can refer to objects in another file
        • User-defined links – applications can create their own
          kinds of links
  • Attribute improvements
        • Storage improvements for large numbers of attributes
        • Iterate or look up by creation order
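External and user-defined links both replace "look up a name in this file" with "run some rule to find the target." The toy Python model below illustrates that with a registry of link kinds. Everything in it (the file contents, the "first_found" kind, the function names) is hypothetical. The released 1.8 C library exposes the real mechanisms through calls such as H5Lcreate_external and H5Lregister.

```python
# Hypothetical toy model: each "file" maps absolute object paths to data.
files = {
    "a.h5": {"/data/temp": [20.1, 20.4]},
    "b.h5": {"/obs/temp": [18.7, 19.0]},
}

link_classes = {}  # link class name -> traversal function


def register_link_class(name, traverse):
    """Let an application add its own kind of link."""
    link_classes[name] = traverse


def resolve(current_file, link):
    """Follow a link given as (class name, *arguments) from current_file."""
    kind, *args = link
    return link_classes[kind](current_file, *args)


# Soft link: a path in the same file.
register_link_class("soft", lambda cur, path: files[cur][path])
# External link: a (file name, path) pair pointing into another file.
register_link_class("external", lambda cur, fname, path: files[fname][path])
# A made-up application-defined kind: use the first listed file that has the object.
register_link_class(
    "first_found",
    lambda cur, names, path: next(files[n][path] for n in names if path in files[n]),
)

print(resolve("a.h5", ("external", "b.h5", "/obs/temp")))                 # [18.7, 19.0]
print(resolve("a.h5", ("first_found", ["a.h5", "b.h5"], "/obs/temp")))   # [18.7, 19.0]
```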

                HDF5 1.8 – new library features

  • Support for Unicode UTF-8 character set
  • Shared header info – duplicate header information is
    stored once and shared, which can save space
  • Metadata cache improvements – faster I/O on
    files with many objects
  • Data transformation filter
  • Stackable Virtual File Drivers
  • Better UNIX/Linux portability
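"Stackable" means a driver can sit on top of another and pass each read and write down to it. The toy Python classes below show the layering with a counting layer stacked on an in-memory layer. The class names are hypothetical and this is not the HDF5 virtual file driver interface.

```python
class MemoryDriver:
    """Bottom layer: keeps the 'file' as bytes in memory."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(bytes(end - len(self.buf)))  # zero-fill any gap
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])


class CountingDriver:
    """Stacked layer: forwards each call to the driver below and keeps I/O totals."""

    def __init__(self, lower):
        self.lower = lower
        self.bytes_written = 0
        self.bytes_read = 0

    def write(self, offset, data):
        self.bytes_written += len(data)
        self.lower.write(offset, data)

    def read(self, offset, size):
        data = self.lower.read(offset, size)
        self.bytes_read += len(data)
        return data


drv = CountingDriver(MemoryDriver())
drv.write(8, b"HDF")
print(drv.read(6, 5), drv.bytes_written, drv.bytes_read)  # b'\x00\x00HDF' 3 5
```

Because each layer only sees the read/write interface of the one below, layers can be reordered or combined without changing each other.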

                         HDF5 1.8 – new APIs

  • New extensible error-handling API
  • New APIs to copy objects quickly between files
  • Dimension scale model and API
  • “HDFpacket” – API to read/write packets efficiently

HDF5 1.8 – backward and forward compatibility
                           HDF5 1.8 vs. 1.6.5

  • Differences between 1.8 and 1.6.5
        • Some file format changes
        • Several new routines added
        • Old APIs deprecated, to be removed in a later release
  • Consequences
        • Applications that use features requiring the 1.8 format
          changes will write objects that the 1.6.5 library cannot read
        • To exploit the 1.8 changes, applications need to be rewritten

                          Principle of
                “Maximum file format compatibility”

      Unless instructed otherwise, the HDF5 library will
      write objects using the earliest version of the format
      possible for describing the information.

      This assures forward compatibility with older
      versions whenever possible: objects in new
      files can be read with old libraries if those
      objects are “known” to the old libraries.
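The principle amounts to choosing, per object, the lowest format version that can express it. The toy Python model below uses a hypothetical feature table with made-up version numbers, not the real HDF5 format versions. The `low` argument stands in for "instructed otherwise"; in the released 1.8 C library that instruction is given with H5Pset_libver_bounds.

```python
# Hypothetical table: earliest format version able to describe each feature.
# The numbers are illustrative, not the real HDF5 format version numbers.
MIN_VERSION = {
    "contiguous_dataset": 0,
    "compact_group": 1,
    "creation_order": 1,
    "utf8_names": 1,
}


def choose_format_version(features, low=0):
    """Earliest version that describes every feature, never below the caller's floor 'low'."""
    needed = max((MIN_VERSION[f] for f in features), default=0)
    return max(needed, low)


def readable_by(version_written, newest_version_reader_knows):
    """An older reader can open an object if it knows that object's format version."""
    return version_written <= newest_version_reader_knows


v_plain = choose_format_version({"contiguous_dataset"})
v_new = choose_format_version({"contiguous_dataset", "creation_order"})
print(v_plain, readable_by(v_plain, 0))  # 0 True
print(v_new, readable_by(v_new, 0))      # 1 False
```

An object that uses no new feature stays readable by an old library; only objects that need a new feature pay the compatibility cost.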

Command line tools
                  New features for old tools

  • h5dump
        • Dump data in binary format
  • h5diff
        • Compare dataset regions
  • Parallel h5diff (ph5diff)
        • Compare two files in MPI parallel environment
  • h5repack
        • Efficient data copy using H5Gcopy()
        • Able to handle big datasets

                          New HDF5 Tools

  • h5copy
        • Copies a group, dataset or named datatype from
          one location to another
        • Copies within a file or across files
  • h5check
        • Verifies an HDF5 file against the defined HDF5
          File Format Specification
  • h5stat
        • Reports statistics about a file and objects in a file
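h5check validates a file against the whole specification, and any such checker starts by locating the superblock. The Python sketch below covers only that first step. It relies on two details of the File Format Specification: the superblock begins with the 8-byte signature \211HDF\r\n\032\n, and it is found at byte 0 or at 512, 1024, 2048 and so on, leaving room for a user block. The function name is hypothetical; this is not h5check's code.

```python
import io

# The 8-byte signature that begins an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_superblock(f):
    """Return the byte offset of the HDF5 signature in binary file object f, or None.

    The superblock sits at offset 0, 512, 1024, 2048, ... so that a user block can precede it.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= size:
        f.seek(offset)
        if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


plain = io.BytesIO(HDF5_SIGNATURE + bytes(100))
with_user_block = io.BytesIO(bytes(512) + HDF5_SIGNATURE + bytes(100))
print(find_superblock(plain), find_superblock(with_user_block))  # 0 512
print(find_superblock(io.BytesIO(b"not an HDF5 file")))           # None
```

For a file on disk, pass the object returned by `open(path, "rb")`.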

HDF Java Products
                           HDFView changes

  • Quality improvements for HDF-java package
        • Full documentation of hdf-java object package
        • Test suite for hdf-java object package
  • Support 64-bit Java on Linux and Solaris
  • Many new features, including
        •   Change font size easily
        •   Grab and move image
        •   Create new table (compound dataset) from template
        •   Filter out fill value for image creation
        •   -geometry option for very high resolution displays

                        Future work for Java

  • Update HDF5 JNI APIs for HDF5 1.8 release
  • Release HDFView 2.4 with bug fixes/new
    features with HDF5 1.8 release
  • New GUI features for tables, images
    and animation
  • Writing capability for HDF5-SRB model

Website Development for HDF-EOS Tools & Information Center
                   Website for HDF-EOS Tools

  • THG now manages HDF-EOS web site
        •   Registered domain names: hdfeos.net/.org/.com
        •   Re-implemented major topic areas
        •   Re-designed interface
        •   Registered Google search
  • Will continue maintenance
  • Phase two
        • Host mailing list
        • Support simple forum features

Other Activities of Interest
Performance R&D
                HDF5 - PnetCDF performance comparison

                [Figure: Flash I/O Benchmark (checkpoint files); bandwidth (MB/s, 0 to 2500)
                vs. number of processors (10 to 310) for PnetCDF, HDF5 collective and
                HDF5 independent I/O; processor: Power 5]

         I/O performance of PnetCDF is comparable with
         parallel HDF5 when the libraries are used in similar
         manners.

                                   NetCDF4 - PnetCDF comparison

                [Figure: bandwidth (MB/s, 0 to 160) vs. number of processors (0 to 144)
                for PnetCDF collective and NetCDF4 collective I/O]
        I/O performance of parallel NetCDF4 is comparable
        with PnetCDF, about 15% slower on average when
        writing ROMS history files.
                Collective I/O improvements

  • HDF5 supports collective I/O for non-regular
    selections
  • Collective I/O for chunked storage is not trivial
  • Non-regular selection performance optimizations:
        • Added I/O options to achieve good collective I/O
          performance
        • Added APIs for applications to participate in the
          optimization process
  • See the poster

             DOE Labs

  Sandia National Laboratory          Lawrence Livermore National Laboratory
                         DOE ASC* and Others

  • Support HDF5 on major systems at Sandia &
    Lawrence Livermore National Laboratories
  • R&D efforts underway
        •   File recovery after a crash
        •   Very fast write speed – goal is 300 MB/sec
        •   Read-while-writing capability
        •   Java library and HDFView improvements



* ASC: Advanced Simulation and Computing program
Flight test
                Flight test – collect, then process

                Boeing HDF5 for flight test data

  • Boeing 787 active archive
        • 10 TB per flight-test day
  • Must handle raw, real-time data
        • High speed ingest, by “packet”
        • Post-processing, by “time-history”
  • Boeing high-level APIs
        • HDFpacket – released with HDF5 1.8
        • HDFtime_history – new; an open version is likely
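Packet ingest and time-history post-processing want the same data in two layouts: packets arrive as rows (one time step, many parameters), while analysis reads columns (one parameter, many time steps). This toy Python sketch shows the regrouping. The packet layout and parameter names are hypothetical, and it is not the HDFpacket or HDFtime_history API.

```python
from collections import defaultdict

# Hypothetical packets: (timestamp in seconds, {parameter name: value}).
packets = [
    (0.00, {"altitude": 1200.0, "airspeed": 140.2}),
    (0.01, {"altitude": 1200.4, "airspeed": 140.5, "flap": 5}),
    (0.02, {"altitude": 1200.9, "airspeed": 140.9}),
]


def to_time_history(packets):
    """Regroup packets (one row per arrival) into per-parameter (times, values) series."""
    series = defaultdict(lambda: ([], []))
    for t, fields in packets:
        for name, value in fields.items():
            times, values = series[name]
            times.append(t)
            values.append(value)
    return dict(series)


history = to_time_history(packets)
print(history["altitude"])  # ([0.0, 0.01, 0.02], [1200.0, 1200.4, 1200.9])
print(history["flap"])      # ([0.01], [5])
```

Each parameter keeps its own time axis, so a parameter that appears in only some packets (here "flap") does not need fill values.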

Product data
    • STEP
Bioinformatics
    • Managing genomic data
C# HDF5 API for Agilent
                              Agilent C# project

  • Why?
        • Heavy use of C# at Agilent
        • Compatibility with Matlab
        • Other interest in HDF5 at Agilent
  • What?
        • Prototype API in C# for Windows XP
        • Basic functions to create, open, close, read, write
        • Limited datatypes, no partial I/O
  • When?
        • March 2007

                         HDF5 Software (stack, top to bottom)

  • Tools & Applications
  • Fortran, C++, Java, C#
  • C API
  • HDF I/O Library
  • HDF File

NetCDF 4
                          NetCDF 4 project

  • Enhanced NetCDF-4 Interface to HDF5
        • Combine features of netCDF and HDF5
        • Take advantage of their separate strengths
  • Collaboration between NCSA, THG, Unidata
  • Currently in Alpha Release
  • Waiting for beta release

                        NetCDF-4 Architecture

        [Figure: netCDF-4 architecture. Components: netCDF-3, netCDF-4 and HDF5
        applications; the netCDF-3 interface; the netCDF-4 library; the HDF5 library;
        netCDF files, netCDF-4 HDF5 files and HDF5 files]


            • Supports access to netCDF files and HDF5 files
              created through the netCDF-4 interface
                            Archival formats

  • Proposal to NOAA Scientific Data
    Stewardship program
  • Will investigate use of OAIS “Archive
    Information Package” standard with HDF5
  • PIs: Ruth Duerr (NSIDC) and Kent Yang


       OAIS: Open Archival Information System

Asymmetries between collecting and accessing data
  • Huge streams of data collected …
  • … to be accessed in little bits

                Challenge – efficient remote access

  • How do we efficiently find and access data
    from distributed repositories, when the data
    are big and complex?
  • Storage Resource Broker (SRB)
        • Efficient access to HDF5 objects in repository
  • OPeNDAP
        • Powerful protocol for remote querying and
          subsetting of scientific data
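Reading "little bits" of a huge dataset is what HDF5 hyperslab selections express: per dimension, a start, stride, count and block (the arguments of H5Sselect_hyperslab). The Python sketch below only computes which indices such a selection covers, to make the four parameters concrete. The helper names are hypothetical.

```python
from itertools import product


def hyperslab_indices(start, stride, count, block=1):
    """1-D indices chosen by a hyperslab: 'count' blocks of 'block' elements, 'stride' apart."""
    if count > 1 and block > stride:
        raise ValueError("blocks would overlap")  # rejected in this sketch
    return [start + i * stride + j for i in range(count) for j in range(block)]


def hyperslab_2d(rows, cols):
    """Combine per-dimension selections; each argument is a (start, stride, count, block) tuple."""
    return list(product(hyperslab_indices(*rows), hyperslab_indices(*cols)))


print(hyperslab_indices(2, 5, 3, 2))  # [2, 3, 7, 8, 12, 13]
# Every 100th row and 2 adjacent columns of a large 2-D grid: 3 x 2 = 6 elements.
print(hyperslab_2d((0, 100, 3, 1), (10, 1, 1, 2)))
# [(0, 10), (0, 11), (100, 10), (100, 11), (200, 10), (200, 11)]
```

The selection is described by a handful of numbers however large the dataset is, which is what makes it cheap to send to a remote server.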

                Example – Storage Resource Broker

  • Storage Resource Broker – repository for
    heterogeneous data collections
  • Simplifies storage, query and access to massive
    amounts of scientific data
  • Holds data in HDF5, netCDF and other formats

                       Normal SRB configuration

        [Figure: normal SRB configuration. Components: client, SRB server, MCAT and
        HDF5 storage; an HDF5 file is transferred as the whole file or a sequence of bytes]

                    OPeNDAP-HDF5 project

  • OPeNDAP
        • Powerful protocol for remote querying and
          subsetting of scientific data
        • Replaces direct file access with remote query and
          access
        • Widely used in Earth Sciences

                    OPeNDAP – HDF5 Project

  • A NASA ROSES NRA project
  • Tasks
        •   HDF5-DAP2 server (now a prototype)
        •   HDF5-DAP4 server
        •   DAP4 to HDF5 conversion utility
        •   Investigate integrated DAP-aware HDF5 library
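One job of an HDF5-DAP2 server is translating an OPeNDAP constraint into an HDF5 selection. DAP2 writes array subsets as [index], [start:stop] or [start:stride:stop], with an inclusive stop, while an HDF5 hyperslab wants start, stride and count. The function below is a hypothetical Python sketch of that translation only, not code from the prototype server.

```python
import re


def dap2_to_hyperslab(constraint):
    """Translate DAP2 array subscripts such as '[0:2:10][5]' into per-dimension
    (start, stride, count) triples, the form an HDF5 hyperslab selection takes.

    DAP2 accepts [index], [start:stop] and [start:stride:stop]; stop is inclusive.
    """
    triples = []
    for spec in re.findall(r"\[([^\]]*)\]", constraint):
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:
            start, stride, stop = parts[0], 1, parts[0]
        elif len(parts) == 2:
            start, stride, stop = parts[0], 1, parts[1]
        elif len(parts) == 3:
            start, stride, stop = parts
        else:
            raise ValueError("bad subscript [" + spec + "]")
        if stride < 1 or stop < start:
            raise ValueError("bad subscript [" + spec + "]")
        # Count the indices start, start + stride, ... that do not pass stop.
        triples.append((start, stride, (stop - start) // stride + 1))
    return triples


print(dap2_to_hyperslab("[0:2:10][5]"))  # [(0, 2, 6), (5, 1, 1)]
print(dap2_to_hyperslab("[3:7]"))        # [(3, 1, 5)]
```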

SQL Server and HDF5 with Microsoft
                        SQL Server and HDF5

  • Microsoft “dream environment for scientists”
  • Combine data management and computing
  • SQL Server 2005 solution
        • Combine RDBMS with scientific analysis tools,
          together in one integrated system.
        • HDF5 & other formats manage scientific objects

                                 HDF5 in SQL server
        [Figure: HDF5 in SQL Server. Top: visualization, libraries (MATLAB,…), web
        services (XML, REST, RSS), OLAP and reporting, data mining. Middle: .NET languages
        with Language Integrated Query; HDF5 EDM model and Entity Framework (EDM, eSQL,
        O-R mapping). Inside SQL Server: HDF5 index, HDF5 type, HDF5 TVFs, HDF5 FS blob;
        HDF5 files alongside]

  Thank you all
      and
Thank you NASA!
              Acknowledgement
This report is based upon work supported in part by a
 Cooperative Agreement with NASA under NASA
    NNG05GC60A. Any opinions, findings, and
 conclusions or recommendations expressed in this
    material are those of the author(s) and do not
     necessarily reflect the views of the National
       Aeronautics and Space Administration.
Questions/comments?
                        Information Sources

  • HDF website
        http://hdfgroup.org/
  • HDF5 Information Center
        http://hdfgroup.org/HDF5/
  • HDF Helpdesk
        hdfhelp@hdfgroup.org
  • HDF users mailing list
        hdfnews@ncsa.uiuc.edu
         coming soon: news@hdfgroup.org
