Poster 1, CMDL Annual Meeting 2003


The Radiatively Important Trace Species (RITS) Data Recovery Project [1]

J.D. Nance [3], T.M. Thompson [2], J.H. Butler [2], J.W. Elkins [2]
[1] Funded by NOAA’s Environmental Services Data and Information Management Program (ESDIM)
[2] NOAA Climate Monitoring and Diagnostics Laboratory, 325 Broadway, Boulder, CO 80305
[3] Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO 80309

The RITS program was launched in 1985 to provide ground-based, in situ atmospheric monitoring of
several ozone-depleting and greenhouse gases measured by NOAA/CMDL (Table I). Three-channel gas
chromatographs (shown at left) with electron capture detectors were installed at five sites over a
five-year period (1986-1990). An additional ship-based deployment spanning the tropics and
mid-latitudes of the Pacific Ocean was executed in the winter/spring of 1989. Secondary calibration
standards referenced to primary gravimetric standards were prepared in the laboratory and shipped to
the ground stations for sampling alternately with the outdoor environment. By the end of 1991, the
RITS systems at all sites were injecting samples every 30 minutes, producing a total of up to 4700
chromatograms every week.

Between March of 1999 and August of 2001, the RITS systems were replaced with newer and more capable
CATS (Chromatograph for Atmospheric Trace Species) systems.

Over the 16-year history of the RITS program, numerous modifications to system hardware/software and
sampling conventions have given an evolutionary aspect to the basic structure and storage format of
the RITS database. Early chromatogram analysis and quality control measures were significantly
constrained by limitations in processing power. The computation of atmospheric concentrations from
processed chromatograms has largely been performed in a piecewise fashion on an annual basis.

Since the termination of the RITS program, an enhanced system of quality control methods and
graphical analysis techniques has been implemented for the purpose of re-examining the RITS data in
its entirety. This poster focuses on the effort to assemble all of the RITS data into a standardized
and finalized form for inclusion in NOAA data center archives.

Table: RITS System Channel Summary

RITS Channel  Gas Chromatograph     Carrier Gas  Column Packing Material  Detector          Eluted Compounds
A             Hewlett-Packard 5890  P5           Porasil B                Electron Capture  N2O, CFC-12, CFC-11
B             Hewlett-Packard 5890  N2           OV-101                   Electron Capture  CFC-113, CH3CCl3, CCl4
C             Shimadzu              P5           Porapak Q                Electron Capture  N2O, SF6

[Site photographs: Niwot Ridge, Colorado; Barrow, Alaska; Mauna Loa, Hawaii; Cape Matatula, American Samoa; South Pole, Antarctica; Ocean Cruise]

Summary: Primary Reasons for Data Loss
1. Raw data (i.e. chromatogram) recording errors
     • Timestamp, sample-type/channel identifiers
2. Problems with the original chromatogram analysis
     • Misidentified peaks
     • Excessively/Insufficiently-constrained analysis
     • Limitations of analysis software
3. Problems with the original analyzed peak database
     • Variant structure dependent on details of sampling cycle
     • Several injections grouped under a single timestamp
     • No facility for flagging samples of poor quality
     • Analyses of additional peaks very inconvenient

Data Collection
Transport of chromatograms to Boulder was normally accomplished via floppy disk and US mail or, in
later years, via the internet. In Boulder, the chromatograms were transferred to a total of 48 DC600
tape cartridges (prior to normal quality control measures) and also to hard disk for quality control,
processing and subsequent storage to a total of 17 magneto optical disks. Original storage formats
for the chromatograms include both binary and text file types with byte-order differences among the
binary types. The entire store of RITS raw data consists of ~2.5 million chromatograms from the five
field sites combined.

Chromatogram Examples
[Figure: example chromatograms. Channel A peaks: N2O, CFC-12, CFC-11. Channel B peaks: CFC-11, CFC-113, CH3CCl3, CCl4. Channel C.]

Chromatogram Standardization, Inventory, and Storage Renewal
Chromatograms were converted to a standard format and run through a series of consistency checks
prior to storage renewal on CD-ROM. The format-standardizing program checked for “time folds”
(regions of overlapping data due to system clock changes) and other inconsistencies between the
internal (file header) and external (filename) descriptors. Sample-type labeling errors were detected
by plotting ratios of processed peak measures for nearby environmental and calibration sample
injections. Cross-channel inconsistencies were detected by running the chromatograms through an
inventory program that recorded the station, timestamp, sample-type, and channel of each chromatogram
found within a 30-minute time slot (one injection every 30 minutes being the highest sample injection
rate for the RITS data). Inconsistencies were found in ~1 % of the chromatograms rechecked. These
were corrected and reanalyzed to recover the lost data.
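
The time-fold and inventory checks described above can be sketched roughly as follows. This is an illustration only, not the project's program: the `Chromatogram` record and its field names are hypothetical, and the rule that every occupied 30-minute slot should hold one chromatogram per channel with a single sample type is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

SLOT = timedelta(minutes=30)   # highest RITS injection rate stated in the text
CHANNELS = ("A", "B", "C")     # the three RITS channels


@dataclass(frozen=True)
class Chromatogram:
    # Hypothetical inventory entry; field names are illustrative only.
    station: str
    timestamp: datetime
    sample_type: str
    channel: str


def slot_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 30-minute slot."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // SLOT) * SLOT


def find_time_folds(timestamps):
    """Return indices where recorded time steps backwards (a 'time fold')."""
    return [i for i in range(1, len(timestamps))
            if timestamps[i] < timestamps[i - 1]]


def cross_channel_issues(inventory):
    """Group chromatograms by (station, slot) and report inconsistent slots."""
    slots = defaultdict(list)
    for c in inventory:
        slots[(c.station, slot_start(c.timestamp))].append(c)

    issues = {}
    for key, group in sorted(slots.items()):
        problems = []
        channels = sorted(c.channel for c in group)
        # Assumed rule: exactly one chromatogram per channel in each slot.
        if channels != sorted(CHANNELS):
            problems.append(f"channels present: {channels}")
        # Assumed rule: all channels in a slot share one sample type.
        if len({c.sample_type for c in group}) > 1:
            problems.append("sample types disagree across channels")
        if problems:
            issues[key] = problems
    return issues
```

Slots reported by `cross_channel_issues` would then be candidates for manual inspection and correction before reanalysis.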

[Figure: Chromatogram Reanalysis: Three Examples. Panels: Misidentified Peaks, Missed Peaks, Temporal Instability, each shown as Original Analysis and Reanalysis.]

Data Reduction
Chromatogram analysis was most often performed in Boulder using modified BASIC
language software acquired during the very early stages of the RITS program. The
sole exception was during the years 1988-1993 when, because of logistical
constraints, South Pole chromatograms were analyzed on site. The outputs generated
during analysis (i.e. peak areas and heights) were assembled in record-oriented binary
or text format database files for later retrieval during the computation of atmospheric
concentrations. Each database file was structured in accordance with one of several
multiple-injection sampling cycles. Data records were designed to accommodate a full
cycle of injections to which a single timestamp was assigned. The details of the
sampling cycle and the form of the timestamp both changed over time.
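
To make the record structure concrete, the sketch below models a sampling cycle and recovers per-injection times from a single record timestamp. It is a hedged illustration: the `SamplingCycle` class, the `injection_times` function, and the `anchor_index` parameter (which injection the record timestamp refers to) are hypothetical names, and the sketch assumes evenly spaced injections within a record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MINUTES_PER_DAY = 24 * 60


@dataclass(frozen=True)
class SamplingCycle:
    # Hypothetical description of one multiple-injection sampling cycle.
    samples_per_record: int
    minutes_per_sample: int

    @property
    def minutes_per_record(self) -> int:
        return self.samples_per_record * self.minutes_per_sample

    @property
    def records_per_day(self) -> int:
        if MINUTES_PER_DAY % self.minutes_per_record:
            raise ValueError("cycle does not divide a day evenly")
        return MINUTES_PER_DAY // self.minutes_per_record


def injection_times(record_time: datetime, cycle: SamplingCycle,
                    anchor_index: int = 0):
    """Expand one record timestamp into one time per injection.

    anchor_index says which injection in the cycle the record timestamp
    belongs to (assumed to vary between file types).
    """
    if not 0 <= anchor_index < cycle.samples_per_record:
        raise ValueError("anchor_index outside the sampling cycle")
    step = timedelta(minutes=cycle.minutes_per_sample)
    return [record_time + (i - anchor_index) * step
            for i in range(cycle.samples_per_record)]
```

For example, a cycle of 3 samples per record at 60 minutes per sample gives 8 records per day, and a cycle of 4 samples at 30 minutes gives 12, matching the cycle details listed in the database example on this poster.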
Chromatogram Analysis Issues
Apart from issues involving the non-uniformity of data storage formats and data loss
from chromatogram recording errors, newly developed graphical displays of the
database revealed substantial data loss that occurred during chromatogram analysis due
to the limitations of the analysis software:
     • Misidentified peaks
     • Missed peaks (Excessively-constrained analysis method)
     • Temporal instability of analysis (Insufficiently-constrained analysis method)
Much of this data loss ultimately resulted from the inability of the analysis software to
focus all of its limited resources on one peak at a time. This problem was addressed by
modifying the software to give it this ability and reanalyzing the appropriate chromatograms.
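
One way to picture "one peak at a time" analysis is a search restricted to a retention-time window around the expected peak. The sketch below is an assumption-laden illustration, not the modified RITS software: the function name, the uniform sample spacing `dt`, the window bounds, and the straight-line baseline between the window endpoints are all choices made for the example.

```python
def analyze_peak(signal, dt, window):
    """Locate and integrate the tallest peak inside a retention-time window.

    signal: detector responses at uniform spacing dt (seconds)
    window: (t_start, t_end) in seconds; only this span is searched
    """
    i0 = max(0, int(round(window[0] / dt)))
    i1 = min(len(signal) - 1, int(round(window[1] / dt)))
    if i1 <= i0:
        raise ValueError("window does not cover at least two samples")

    segment = signal[i0:i1 + 1]
    n = len(segment) - 1
    # Assumed baseline: a straight line joining the two window endpoints.
    baseline = [segment[0] + (segment[-1] - segment[0]) * k / n
                for k in range(n + 1)]
    corrected = [s - b for s, b in zip(segment, baseline)]

    apex = max(range(len(corrected)), key=corrected.__getitem__)
    # Trapezoidal integration of the baseline-corrected segment.
    area = dt * sum((corrected[k] + corrected[k + 1]) / 2 for k in range(n))
    return {
        "retention_time": (i0 + apex) * dt,
        "height": corrected[apex],
        "area": area,
    }
```

A window that is too narrow can miss a peak that has drifted, and one that is too wide can pick up a neighbouring peak, which mirrors the excessively and insufficiently constrained cases listed above.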

Database Restructuring
Another type of data loss was discovered to be related to the coarse time-resolution of the original database
files. The grouping of an entire sampling cycle into a single data record with a single timestamp led to
inadvertent and inappropriate timestamp modification and data loss by overwriting after interruptions to the
normal sampling cycle. This problem was addressed by restructuring the database of analyzed peaks to
include timestamps for every sample injection. This was accomplished by initializing the restructured
database with timestamps and sample types from the chromatogram inventory and employing an algorithm
to match the peak analysis outputs stored in the original database with the appropriate inventoried
chromatograms and transfer the data into the new database. Although this form of data loss was relatively
minor, restructuring the database offered several important additional advantages:

     1. The restructured database is compatible with all of the varied types of original database files. Thus, all
        of the data associated with a given analysis peak could be collected into a single file without
        regard to the details of the sampling cycle.
     2. Scanning the new database for overwritten samples (i.e. initialized records for which no
        peak analysis outputs were transferred over from the original database), which typically numbered
        on the order of a thousand per station, revealed tens of thousands more good quality samples that
        had been overlooked during prior analyses. All overwritten and overlooked chromatograms were
        fetched and analyzed to fill in the gaps.
     3. A flag byte was added to each data record to facilitate the flagging of individual injections for
        equipment problems. Because a single calibration sample of poor quality can adversely affect several
        individual computations of a compound’s atmospheric concentration, flagging these samples prior to
        final reduction becomes a powerful way to enhance the overall quality of the final dataset.
     4. Isolating each chromatographic peak in its own file facilitates potential analyses of additional peaks
        (e.g. SF6 in channel C).

[Figure: Chromatography Problems: Flagging Example. Panels: Before, After. All line-connected data points are used to compute atmospheric concentrations. Off-line data points are ignored.]

Database Restructuring Example: Mauna Loa CH3CCl3

The Original Database: Examples From 3 Files

Column groups: Channel A: N2O, CFC-12, CFC-11; Channel B: CFC-11, CFC-113, CH3CCl3, CCl4; Channel C: N2O

Processed data file: "M89A_AREA"
    Record size: 200 bytes
    Records per day: 8
    Samples per record: 3
    Minutes per sample: 60

RECORD: 1     |  CAL1: 18857  CAL2: 62635
M89001.000 C2 |  116709  106833   81992  129625  458268   89422   53084       0
M89001.030 A1 |  118587  108187   87758  120024   32041   48073   45640       0
M89001.060 C1 |  117745  108023  104585  155078   58808   72077   48562       0
RECORD: 2     |  CAL1: 18857  CAL2: 62635
M89001.090 C2 |  117333  107242   83220  130655  454178   89622   52106       0
M89001.120 A1 |  119568  107565   82080  118471   31417   45542   44253       0
M89001.150 C1 |  119525  107279   98577  154252   63241   72239   46884       0

Processed data file: "M89_AREA"
    Details of sampling cycle:
    Record size: 200 bytes
    Records per day: 12
    Samples per record: 2
    Minutes per sample: 60

RECORD: 1     |  CAL1: 62635
M89005.030 A1 |  133639  115526   48404  120229   32122   45119   46083  213023
M89005.060 C1 |  134752  115629   45905  112618  483211   83517   54996  212240
RECORD: 2     |  CAL1: 62635
M89005.090 A1 |  131066  116164   47814  120268   32215   46229   46174  213993
M89005.120 C1 |  130249  114536   43369  112737  457665   81288   57024  214091

Processed data file: "M90_HGHT"
    Record size: 264 bytes
    Records per day: 12
    Samples per record: 4
    Minutes per sample: 30
    Areas and heights kept in separate files.
    Timestamps associated with AIR1 sample.

RECORD: 1009  |  CAL1: 18566  CAL2: 68285
M90200.030 C1 |    6695    6054    2210   10861    6072    2888    3485   13789
M90200.045 A1 |    6716    6092    2255   11331    3167    2779    3493   13912
M90200.060 C2 |    6673    6224    2296   11300    4129    3076    3860   13878
M90200.075 A2 |    6680    6096    2281   11317    3221    2724    3501   13670
RECORD: 1010  |  CAL1: 18566  CAL2: 68285
M90200.090 C1 |    6692    6072    2235   10846    6145    2895    3494   13758
M90200.105 A1 |    6667    6090    2289   11210    3156    2747    3508   13656
M90200.120 C2 |    6678    6212    2244   11287    4110    3089    3878   13882
M90200.135 A2 |    6710    6077    2235   11237    3254    2764    3498   13831

The Restructured Database

MLO/MC areas and heights:
    First year of measurements: 1987
    Last year of measurements: 2000
    Record size: 20 bytes
    Records per day: 48
    Minutes per record: 30

All Mauna Loa CH3CCl3 peak areas and heights are contained in a single file. Every sample injection
is initialized with a timestamp and sample type from the chromatogram inventory. A flag byte is used
to mark individual injections for chromatography problems. These injections can be passed over during
final reduction (i.e. the computation of atmospheric concentrations). One of several possible
computational algorithms can also be set using the flag byte.

RECORD:  35089  M89 001.002  62635  0  000  89422  3368
RECORD:  35091  M89 001.032      1  0  000  48073  2509
RECORD:  35093  M89 001.062  18857  0  000  72077  3670
RECORD:  35095  M89 001.092  62635  0  000  89622  3385
RECORD:  35097  M89 001.122      1  0  000  45542  2384
RECORD:  35099  M89 001.152  18857  0  000  72239  3606
***
RECORD:  35283  M89 005.032      1  0  000  45119  2377
RECORD:  35285  M89 005.062  62635  0  000  83517  3096
RECORD:  35287  M89 005.092      1  0  000  46229  2379
RECORD:  35289  M89 005.122  62635  0  000  81288  3116
***
RECORD:  62163  M90 200.032  18566  0  000  52915  2888
RECORD:  62164  M90 200.047      1  0  000  54394  2779
RECORD:  62165  M90 200.062  68285  0  000  56676  3076
RECORD:  62166  M90 200.077      2  0  000  49393  2724
RECORD:  62167  M90 200.092  18566  0  000  54117  2895
RECORD:  62168  M90 200.107      1  0  000  51115  2747
RECORD:  62169  M90 200.122  68285  0  000  56543  3089
RECORD:  62170  M90 200.137      2  0  000  51095  2764
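
The restructuring steps (initialize from the inventory, transfer the original peak outputs, scan for gaps, flag poor injections) can be sketched as below. This is a simplified illustration under stated assumptions, not the project's algorithm: the `InjectionRecord` class, the flag values, and the function names are hypothetical, and the sketch assumes original and inventoried timestamps have already been normalized to a common key so that matching is a direct lookup.

```python
from dataclasses import dataclass
from typing import Optional

FLAG_OK = 0
FLAG_BAD_CHROMATOGRAPHY = 1  # hypothetical flag value


@dataclass
class InjectionRecord:
    # One record per sample injection; layout is illustrative only.
    timestamp: str
    sample_type: str
    area: Optional[int] = None
    height: Optional[int] = None
    flag: int = FLAG_OK


def initialize(inventory):
    """Create one empty record per inventoried chromatogram."""
    return {ts: InjectionRecord(ts, sample_type) for ts, sample_type in inventory}


def transfer(db, original_values):
    """Copy (area, height) pairs into matching records.

    Returns timestamps in the original data with no inventoried chromatogram.
    """
    unmatched = []
    for ts, (area, height) in original_values.items():
        record = db.get(ts)
        if record is None:
            unmatched.append(ts)
            continue
        record.area, record.height = area, height
    return unmatched


def missing(db):
    """Initialized records that received no peak analysis outputs."""
    return [ts for ts, r in db.items() if r.area is None]


def usable(db):
    """Records eligible for final reduction: filled in and not flagged."""
    return [r for r in db.values() if r.area is not None and r.flag == FLAG_OK]
```

Timestamps returned by `missing` correspond to the overwritten or overlooked injections whose chromatograms would be fetched and reanalyzed, while flagged records are simply passed over by `usable`.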
