Ensuring High Quality Data

• The Importance of Data Validation
• Data Validation Procedures and Tools
• Data Validation Levels
      – Level I: Field and Laboratory Checks
      – Level II: Internal Consistency Checks and Examples
      – Level III/IV: Unusual Value Identification and Examples
• Validation of PM2.5 Mass
• Information to be Provided with PM Sampler Data
• Are Measurements Comparable?
• National Contract Lab Responsibilities
• Data Access
• Sample Size Issues
• References
• Appendix: Criteria Tables for PM2.5 Mass Validation
      – Critical Criteria Table
      – Operational Evaluations Table
      – Systematic Issues
               “The purpose of data validation is to detect and then verify any data
                 values that may not represent actual air quality conditions at the
                               sampling station.” (U.S. EPA, 1984)
October 1999                      PM Data Analysis Workbook: Data Validation                       1
               The Importance of Data Validation

• Data validation is critical because serious errors in data
  analysis and modeling results can be caused by erroneous
  individual data values.
• Data validation consists of procedures developed to identify
  deviations from measurement assumptions and procedures.
• Timely data validation is required to minimize the generation of
  additional data that may be invalid or suspect and to maximize
  the recoverable data.

[Figure: "Do data validation early!" Schematic with the labels data
 collection, data recovery, and effort to recover data along a time
 axis. Main et al., 1998]

               The Importance of Data Validation
•    The quality and applicability of data analysis results are directly dependent upon the inherent quality
     of the data. In other words, data validation is critical because serious errors in data analysis and
     modeling results can be caused by erroneous individual data values. The EPA's PM2.5 speciation
     guidance document provides quality requirements for sampling and analysis. The guidance
     document also discusses data validation including the suggested four-level data validation system.
     It is the monitoring agency’s responsibility to prevent, identify, correct, and define the
     consequences of difficulties that might affect the precision and accuracy, and/or the validity, of
     the measurements.

•    Once the quality assured data are provided to data analysts, additional data validation steps need to
     be taken. Given the newness and complexity of the PM2.5 speciation monitoring and sample
     analysis methods, errors are likely to pass through the system despite rigorous application of
     quality assurance and validation measures by the monitoring agencies. Therefore, data analysts
     should also check the validity of the data before conducting their analyses.

•    Some quality assurance and data validation steps (such as ascertaining that the field or laboratory
     instruments are operating properly) can be performed without a broad understanding of the physical
     and chemical processes of PM, but thorough validation requires some understanding of these processes.
     Key issues to understand include PM physical, chemical, and optical properties; PM formation and
     removal processes; and sampling artifacts, interferences, and limitations. These topics were
     discussed in the introduction and references therein. The analyst should also understand the
     measurement uncertainty and laboratory analysis uncertainty. These uncertainties may differ
     significantly among samplers and analysis methods, which in turn affects the
     interpretation and uses of the data (e.g., in source apportionment).




            Data Validation Procedures and Tools

Data validation tools for PM are in development




                Data Validation Levels
• Level I. Routine checks during the initial data processing
  and generation of data (e.g., check file identification;
  review unusual events, field data sheets, and result reports;
  do instrument performance checks).
• Level II. Internal consistency tests to identify values in the
  data that appear atypical when compared to values of the
  entire data set.
• Level III. Current data comparisons with historical data to
  verify consistency over time.
• Level IV. Parallel consistency tests with data sets from the
  same population (e.g., region, period of time, air mass) to
  identify systematic bias.                            U.S. EPA, 1999a




           Level I: Field and Laboratory Checks

• Verify computer file entries against data sheets.
• Flag samples when significant deviations from
  measurement assumptions have occurred.
• Eliminate values for measurements that are known to be
  invalid because of instrument malfunctions.
• Substitute data from a backup data acquisition system in the
  event of failure of the primary system.
• Adjust measurement values for quantifiable calibration or
  interference bias.

                                                                  Chow et al., 1996




           Level II: Internal Consistency Checks

• Compare collocated samplers (scatter plots, linear regression).
• Check sum of chemical species vs. PM2.5 mass (multielements
     Al to U + sulfate + nitrate + ammonium ions + OC + EC - Sulfur).
• Check physical and chemical consistency (sulfate vs. total sulfur,
     soluble potassium vs. total potassium, soluble chloride vs. chlorine, babs vs.
     elemental carbon).
• Balance cations and anions.
• Balance ammonium.
• Investigate nitrate volatilization and adsorption of gaseous
  organic carbon.
• Prepare material balances and crude mass balances.

                                                                           Chow, 1998


        Level II: Consistency Check Guidelines

 Consistency Check                                                    Expectation
 Difference between PM10 and PM2.5                                    PM2.5 ≤ PM10
 Sum of individual chemical species and PM2.5                         species sum < PM2.5
 Ratio of water-soluble sulfate by IC to total sulfur by XRF          ~3
 Ratio of chloride by IC to chlorine by XRF                           <1
 Ratio of water-soluble potassium by AAS to total potassium by XRF    <1
 babs compared to elemental carbon                                    good correlation
                                                                      Chow, 1998

 IC = ion chromatography
 XRF = energy dispersive X-ray fluorescence
 AAS = atomic absorption spectrophotometry
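
The guideline expectations above could be applied sample by sample. The following is a minimal sketch, not a prescribed procedure: the dictionary keys, the function name, and the tolerance on the sulfate-to-sulfur ratio are all assumptions made for illustration, and measurement uncertainty is ignored.

```python
# Hypothetical tolerance around the expected sulfate/sulfur ratio of ~3.
SULFATE_RATIO_TOLERANCE = 0.5


def check_consistency(sample):
    """Return the guideline checks that a sample fails.

    `sample` maps hypothetical field names to concentrations in ug/m3.
    """
    failures = []
    if sample["pm25"] > sample["pm10"]:
        failures.append("PM2.5 exceeds PM10")
    if sample["species_sum"] >= sample["pm25"]:
        failures.append("species sum is not below PM2.5 mass")
    # Sulfate by IC should be about three times total sulfur by XRF.
    deviation = abs(sample["sulfate_ic"] - 3.0 * sample["sulfur_xrf"])
    if deviation > SULFATE_RATIO_TOLERANCE * sample["sulfur_xrf"]:
        failures.append("sulfate/sulfur ratio is not near 3")
    # The soluble fraction should be smaller than the total (ratio < 1).
    if sample["chloride_ic"] >= sample["chlorine_xrf"]:
        failures.append("chloride/chlorine ratio is not below 1")
    if sample["potassium_aas"] >= sample["potassium_xrf"]:
        failures.append("soluble/total potassium ratio is not below 1")
    return failures
```

A failed check is a prompt to trace the measurement, not an automatic reason to invalidate the sample.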




        Example: Compare Collocated Samplers
• Data from collocated samplers should be compared, both between
  samplers of the same type and between different sampler types.
• During the 1995 Integrated Monitoring Study (IMS95) in California,
  the collocated PM2.5 samplers (same type) at Bakersfield showed
  excellent agreement.
• SSI 1 and TEOM measurements did not correlate very well during the
  winter/fall season. The two samplers showed much better agreement
  during March-September (not shown).

[Figure: Collocated Comparison, SSI 1 and TEOM (Winter/Fall). Scatter
 plots of TEOM (µg/m3) vs. SSI 1 (µg/m3), Oct. - Feb., with 1:1 line
 and linear regression fit (Reg.).]

 Site           Slope    Intercept    r       N
 Bakersfield    0.61     7.4          0.82    14
 Sacramento     0.65     3.0          0.84    58
                                                          Chow, 1998
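
The regression statistics reported for a collocated comparison (slope, intercept, r, N) can be reproduced with an ordinary least-squares fit. The sketch below is an illustration only; the function name is hypothetical, and it assumes paired concentrations in µg/m3 with missing values already removed.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r, n) for paired measurements.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, n
```

For well-matched samplers the slope should be near 1 and the intercept near 0; departures such as those in the winter/fall comparison point to a sampler-dependent bias worth investigating.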
   Example: Check Sum of Chemical Species vs. PM2.5 Mass

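Sum of species
   = Multielements (from Al to U)
   + Ions           (SO4=, NO3–, NH4+)
   + Carbon         (OC, EC)
   – Sulfur
   * Exclude Cl– and K+ to avoid double-counting.

[Figure: scatter plot comparing sum of species with PM2.5 mass, with 1:1
 line and linear regression fit (Reg.). Chow, 1998]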


• Compare the sum of species to the PM2.5 mass measurements.
• The comparison shown here indicates an excellent correlation (r=0.98).
• The sum of species concentrations is lower than the reported mass
  because the sum of species does not include oxygen.
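
As a rough illustration of the sum-of-species definition on this slide, the sketch below adds the elements, the three listed ions, and the carbon fractions, then subtracts elemental sulfur. The function name and dictionary layout are assumptions; concentrations are taken to be in µg/m3.

```python
# Only these ions are counted; Cl- and K+ are excluded to avoid
# double-counting with the elemental measurements.
COUNTED_IONS = ("SO4", "NO3", "NH4")


def sum_of_species(elements, ions, carbon):
    """Sum of species = elements (Al to U) + ions + carbon - sulfur.

    elements: dict of element symbol -> concentration
    ions:     dict of ion name -> concentration (extra ions are ignored)
    carbon:   dict with "OC" and "EC" concentrations
    """
    element_total = sum(elements.values())
    ion_total = sum(ions.get(name, 0.0) for name in COUNTED_IONS)
    carbon_total = carbon["OC"] + carbon["EC"]
    # Sulfur is subtracted because it is already represented by sulfate.
    return element_total + ion_total + carbon_total - elements.get("S", 0.0)
```

The result can then be compared with the measured PM2.5 mass for each sample.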


    Example: Check Chemical and Physical Consistency (1 of 2)
[Figures: Sulfate vs. Total Sulfur, with 3:1 line and linear regression
 fit (Reg.); Soluble Potassium vs. Total Potassium, with 1:1 line and
 linear regression fit (Reg.). Chow, 1998]



• Chemical and physical consistency checks include comparing sulfate
  with total sulfur (sulfate should be about three times the sulfur
  concentrations) and comparing soluble potassium with total potassium.
• In the examples shown, the sulfur data compare well while the
  potassium data comparison shows a considerable amount of scatter.

    Example: Check Chemical and Physical Consistency (2 of 2)

 • Another consistency check that can be performed (if data are
   available) is to compare the elemental carbon concentrations
   with particle absorption (babs) measurements.
 • In the example shown, the two measurements agree well.

 [Figure: babs vs. Elemental Carbon, with linear regression fit (Reg.).
  Chow, 1998]




                  Example: Anion and Cation Balance
• Equations to calculate anion and cation balance (µmoles/m3)

Anion equivalence
µe = (Cl-)/35.453 + (NO3-)/62.005 + (SO4=)/48.03

Cation equivalence
µe = (Na+)/23.0 + (K+)/39.098 + (NH4+)/18.04

Plot cation equivalents vs. anion equivalents.

[Figure: cation equivalents vs. anion equivalents, with linear regression
 fit (Reg.). Chow, 1998]
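
The two equivalence sums can be computed directly from ion concentrations. This is a minimal sketch using the divisors given on the slide; the function and dictionary names are hypothetical, and input concentrations are assumed to be in µg/m3.

```python
# Divisors taken from the slide (equivalent weights of each ion).
ANION_WEIGHTS = {"Cl": 35.453, "NO3": 62.005, "SO4": 48.03}
CATION_WEIGHTS = {"Na": 23.0, "K": 39.098, "NH4": 18.04}


def equivalents(concentrations, weights):
    """Sum concentration / equivalent weight over the listed ions.

    Ions missing from `concentrations` are treated as zero.
    """
    return sum(concentrations.get(ion, 0.0) / weight
               for ion, weight in weights.items())


def ion_balance(concentrations):
    """Return (anion equivalents, cation equivalents) for one sample."""
    return (equivalents(concentrations, ANION_WEIGHTS),
            equivalents(concentrations, CATION_WEIGHTS))
```

Plotting the second value against the first for every sample gives the cation vs. anion comparison; samples far from the bulk of the data deserve a closer look.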


                Example: Ammonium Balance

• Equations to calculate ammonium balance (µg/m3)

Calculated ammonium based on NH4NO3 and NH4HSO4 =
   0.29 (NO3-) + 0.192 (SO4=)

Calculated ammonium based on NH4NO3 and (NH4)2SO4 =
   0.29 (NO3-) + 0.38 (SO4=)

Plot calculated ammonium vs. measured ammonium for both
   forms of sulfate.
                                                                     Chow, 1998
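
A short sketch of the two calculations, using the coefficients exactly as given on the slide. The function name is hypothetical, and nitrate and sulfate are assumed to be in µg/m3.

```python
def calculated_ammonium(nitrate, sulfate):
    """Return calculated ammonium for both assumed forms of sulfate.

    The first value assumes NH4NO3 plus ammonium bisulfate (NH4HSO4);
    the second assumes NH4NO3 plus ammonium sulfate ((NH4)2SO4).
    """
    as_bisulfate = 0.29 * nitrate + 0.192 * sulfate
    as_sulfate = 0.29 * nitrate + 0.38 * sulfate
    return as_bisulfate, as_sulfate
```

Comparing measured ammonium with both values indicates which form of sulfate is more consistent with the data.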

                Example: Nitrate Volatilization Check
• Particularly for the western U.S., the analyst should understand
  the extent of possible nitrate volatilization in the data set.
• This example shows that nitrate volatilization was significant
  during the summer.

[Figure: nitrate volatilization, San Joaquin Valley, CA. Chow, 1998]
              Example: Adsorption of Gaseous OC Check
• Some VOCs evaporate from a filter (negative artifact) during
  sampling while others are adsorbed (positive artifact).
• The top figure shows that the organic carbon (OC) concentrations
  on the backup filters were frequently 50% or more of the front
  filter concentrations. The error bars reflect measurement
  standard deviation.
• The bottom figure shows the ratio of the backup OC to the front
  filter OC as a function of PM2.5 mass. Relatively larger organic
  vapor artifacts at lower PM2.5 concentrations suggest that
  particles provide additional adsorption sites on the front
  filters (Chow et al., 1996).

[Figures: top, backup and front filter OC concentrations with error
 bars; bottom, ratio of backup OC to front filter OC vs. PM2.5 mass.
 Chow, 1998]
                Example: Material Balance
Mass
= Geological ( [ 1.89 × Al ] + [ 2.14 × Si ] + [ 1.4 × Ca ] +
  [ 1.43 × Fe ] )
+ Organic carbon ( 1.4 × OC )
+ Elemental carbon
+ Ammonium nitrate ( 1.29 × NO3– )
+ Ammonium sulfate ( 1.38 × SO4= )
+ Remaining trace elements (excluding Al, Si, Ca, Fe, and S)
+ Unidentified

[Figure: material balance, Denver, CO Core Sites. Chow, 1998]
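
The material balance can be sketched as below, using the multipliers from the slide. Treating the unidentified portion as the measured mass minus the identified components is an assumption of this sketch, as are the function name and dictionary keys; concentrations are assumed to be in µg/m3.

```python
def material_balance(measured_mass, c):
    """Split a measured mass into material balance components.

    `c` holds concentrations keyed by Al, Si, Ca, Fe, OC, EC, NO3, SO4,
    plus "trace" for the remaining trace elements (excluding Al, Si,
    Ca, Fe, and S).
    """
    components = {
        "geological": (1.89 * c["Al"] + 2.14 * c["Si"]
                       + 1.4 * c["Ca"] + 1.43 * c["Fe"]),
        "organic_carbon": 1.4 * c["OC"],
        "elemental_carbon": c["EC"],
        "ammonium_nitrate": 1.29 * c["NO3"],
        "ammonium_sulfate": 1.38 * c["SO4"],
        "trace_elements": c["trace"],
    }
    # Assumption: whatever the components do not explain is "unidentified".
    components["unidentified"] = measured_mass - sum(components.values())
    return components
```

A negative unidentified value means the reconstructed components exceed the measured mass, which is itself a validation flag.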



                     Example: Crude Mass Balance
• Crude mass balances can be constructed to investigate estimated
  source contributions.
• Do the crude estimates make sense spatially and temporally?

Calculated Mass =
      Geological Material ( [ 1.89 × aluminum ] +
                            [ 2.14 × silicon ] +
                            [ 1.4 × calcium ] +
                            [ 1.43 × iron ] )
  + Combustion Byproducts ( babs ÷ 8.6 )
  + Secondary Sulfate ( 3 × total elemental sulfur )

[Figure: Las Vegas, NV. a) 06/05/95 (average PM10 mass = 52.8 ± 19.1 µg/m3).
 Stacked bars (0% to 100%) of Crustal, Combustion, Sulfate, and Others at
 each site, grouped by site type: Industrial, Construction, Commercial,
 Residential, Vacant Land. Chow, 1998]
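
To express a crude mass balance as percentage contributions per site, something like the following could be used. It is a sketch under stated assumptions: the function name and arguments are hypothetical, and the "others" category is taken here to be the measured mass not explained by the three calculated terms.

```python
def crude_mass_balance(measured_mass, al, si, ca, fe, babs, sulfur):
    """Return percent contributions of crustal, combustion, sulfate, others."""
    if measured_mass <= 0:
        raise ValueError("measured_mass must be positive")
    crustal = 1.89 * al + 2.14 * si + 1.4 * ca + 1.43 * fe
    combustion = babs / 8.6      # combustion byproducts from babs
    sulfate = 3.0 * sulfur       # secondary sulfate from total sulfur
    # Assumption: "others" is the remainder of the measured mass.
    others = measured_mass - (crustal + combustion + sulfate)
    parts = {"crustal": crustal, "combustion": combustion,
             "sulfate": sulfate, "others": others}
    return {name: 100.0 * value / measured_mass
            for name, value in parts.items()}
```

Comparing these percentages across site types and sampling days is one way to judge whether the crude estimates make sense spatially and temporally.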
    Level III/IV: Unusual Value Identification

• Extreme values
• Values that normally track the values of other variables in
  a time series
• Values that normally follow a qualitatively predictable
  spatial or temporal pattern



       The first assumption upon finding a measurement that is inconsistent
 with physical expectations is that the unusual value is due to a measurement error.
      If, upon tracing the path of the measurement, nothing unusual is found,
      the value can be assumed to be a valid result of an environmental cause.
                                                                         Chow et al., 1996
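
One possible way to screen a time series for extreme values is a robust score based on the median and the median absolute deviation (MAD). This technique is not taken from the workbook; it is an illustrative choice, and the function name and the 3.5 threshold are assumptions.

```python
import statistics


def flag_extreme_values(values, threshold=3.5):
    """Return indices of values whose robust z-score exceeds `threshold`.

    The score is 0.6745 * (value - median) / MAD. If the MAD is zero
    (more than half the values are identical), nothing is flagged.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - median) / mad) > threshold]
```

Flagged values are candidates for tracing the measurement path as described above; a flag alone does not make a value invalid.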



          Example: Unusual Value Identification

 • Potassium nitrate (KNO3) is a major component of all fireworks.
 • This figure shows all available PM2.5 K+ data from all North
   American sites, averaged to produce a continental average for
   each day during 1988-1997.
 • Fourth of July celebration fireworks are clearly observed in the
   potassium time series.
 • Fireworks displays on local holidays/events could have a similar
   effect on data.

 [Figure: continental average PM2.5 K+ time series. Poirot (1998)]
 Regional averaging and count of sample numbers were conducted in Voyager,
 using variations of the Voyager script on p. 6 of the Voyager Workbook
 Kvoy.wkb. Additional averaging and plotting were conducted in Microsoft Excel.

        Data Validation Continues During Data Analysis
• Two source apportionment models were applied to PM2.5 data
  collected in Vermont, and the results of the models were compared.
• Excellent agreement for the selenium source was observed for part
  of the data, while the rest of the results did not agree well.
• Further investigation showed that the period of good agreement
  coincided with a change in laboratory analysis, with an
  accompanying change in detection limit and measurement
  uncertainty. The two models treat these quantities differently.
                                        Poirot, 1999

               Validation of PM2.5 Mass
• Consistent validation of PM2.5 mass concentrations across
  the U.S. is needed. To aid in this, three tables of criteria
  were developed and are provided in the appendix to this
  section of the workbook.
• Observations that do not meet each and every criterion on
  the Critical Criteria Table should be invalidated unless
  there are compelling reasons and justification not to do so.
• Criteria that are important for maintaining and evaluating
  the quality of the data collection system are included in the
  Operational Evaluations Table. Violation of a criterion
  or a number of criteria may be cause for invalidation.
• Criteria important for the correct interpretation of the data
  but that do not usually impact the validity of a sample or
  group of samples are included on the Systematic Issues
  Table.                                               U.S. EPA, 1999c

Information to be Provided with PM Sampler Data

         These supplemental measurements will be useful to help
                     explain or caveat unusual data
    Measurement                            Variations
    Flow rate                              30-s max interval, average for sample period, CV for
                                           period, 5-min. average out-of-specifications
    Sample volume
    Ambient temperature                    30-s interval; min, max, average for period
    Barometric pressure                    30-s interval; min, max, average for period
    Filter temperature                     30-s interval; 30-s interval differential out-of-spec.; max
                                           differential from ambient, date and time of occurrence
    Date and time
    Sample start and stop time settings
    Sample period start time
    Elapsed sample time                    Actual and out-of-spec.
    1-min. power interruptions             Start time of first 10
    User-entered information               For example, sampler and site identification
                                                                              40 CFR 50 Appendix L, Table L-1

               Are Measurements Comparable?



To be added, a discussion of the following:
• FRM vs. continuous vs. speciation
• IMPROVE vs. Federal PM samplers




       National Contract Lab Responsibilities



To be added, a discussion of the following:
• Levels 0 and 1 validation
• AIRS reporting




                            Data Access (1 of 2)
Official data sources:
   – AIRS Data via public web at http://www.epa.gov/airsdata
   – AIRS Air Quality System (AQS) via registered users
         register with EPA/NCC (703-487-4630)
   – PM2.5 websites via public web
         PM2.5 Data Analysis Workbook at
           http://capita.wustl.edu/databases/userdomain/pmfine/
         EPA PM2.5 Data Analysis clearinghouse at
           http://www.epa.gov/oar/oaqps/pm25/
         Northern Front Range Air Quality Study at
           http://nfraqs.cira.colostate.edu/index2.html
         NEARDAT at http://capita.wustl.edu/NEARDAT




                            Data Access (2 of 2)
Secondary data sources:
   – Meteorological parameters from NWS
         http://www.nws.noaa.gov
   – Meteorological parameters from PAMS/AIRS AQS
         register with EPA/NCC (703-487-4630)
   – Collocated or nearby SO2, nitrogen oxides, CO, VOC
     from AIRS AQS
   – Private meteorological agencies (e.g., forestry service,
     agricultural monitoring, industrial facilities)




                          Sample Size Issues
       How complete must data be to show that an area meets the
                           NAAQS for PM?

Standard        Data completeness to show you meet the standards
Daily PM2.5     Single site: at least 75% of the scheduled sampling days per quarter
Daily PM10      Single site: at least 75% of the scheduled sampling days per quarter
Annual PM2.5    Single site: if each quarter has at least 75% of the scheduled sampling days, the
                annual mean for that year and site is valid
                Community monitoring zone: In each of the three years, at least one site must
                have a valid annual mean. The valid sites may be the same every year, or may
                vary from year to year.
Annual PM10     Single site: at least 75% of the scheduled sampling days per quarter.
                                                                                     U.S. EPA, 1999b



  Sample size requirements for data analyses will vary depending upon the analysis
       type, the analysis goals, the variability in the data, and other factors.
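
The completeness rules in the table above could be checked with a few small helpers. This is a sketch only: the function names and input layouts are assumptions, and the guideline document (U.S. EPA, 1999b) remains the authority on details such as how scheduled days are counted.

```python
def quarter_is_complete(valid_days, scheduled_days, minimum=0.75):
    """True if at least 75% of the scheduled sampling days are valid."""
    return scheduled_days > 0 and valid_days / scheduled_days >= minimum


def annual_mean_is_valid(quarters):
    """True if all four quarters of one site-year are complete.

    `quarters` is a list of four (valid_days, scheduled_days) pairs.
    """
    return len(quarters) == 4 and all(
        quarter_is_complete(valid, scheduled) for valid, scheduled in quarters)


def zone_is_complete(valid_sites_by_year):
    """True if each of three years has at least one site with a valid mean.

    `valid_sites_by_year` holds three collections of site identifiers;
    the valid sites may differ from year to year.
    """
    return len(valid_sites_by_year) == 3 and all(
        len(sites) > 0 for sites in valid_sites_by_year)
```

For example, a site with 12 valid samples out of 16 scheduled in a quarter sits exactly at the 75% threshold and passes.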



                                             References
Ayers G.P., Keywood M.D., Gras J.L. (1999) TEOM vs. manual gravimetric methods for determination of PM2.5 aerosol
     mass concentrations. Atmos. Environ., 33, pp. 3717-3721.
Chow J.C. and J.G. Watson (1998) Guideline on speciated particulate monitoring. Draft report 3 prepared by Desert
     Research Institute for the U.S. EPA Office of Air Quality Planning and Standards. August.
Chow J.C. (1998) Descriptive data analysis methods. Presentation prepared by Desert Research Institute for the U.S. EPA
     in Research Triangle Park, November.
Chow J.C., J.G. Watson, Z. Lu, D.H. Lowenthal, C.A. Frazier, P.A. Solomon, R.H. Thuillier, K. Magliano (1996)
     Descriptive analysis of PM2.5 and PM10 at regionally representative locations during SJVAQS/AUSPEX. Atmos.
     Environ., Vol. 30, No. 12, 2079-2112.
Chow J.C. (1995) Measurement methods to determine compliance with ambient air quality standards for suspended
     particles. J. Air Waste Manage. Assoc., 45, pp. 320-382.
Homolya J.B., Rice J., Scheffe R.D. (1998) PM2.5 speciation - objectives, requirements, and approach. Presentation.
     September.
Main H.H., Chinkin L.R., and Roberts P.T. (1998) PAMS data analysis workshops: illustrating the use of PAMS data to
     support ozone control programs. Web page prepared for the U.S. Environmental Protection Agency, Research
     Triangle Park, NC by Sonoma Technology, Inc., Petaluma, CA, <http://www.epa.gov/oar/oaqps/pams/analysis> STI-
     997280-1824, June.
Poirot R. (1999) personal communication
Poirot R. (1998) Tracers of opportunity: Potassium. Paper available at
     http://capita.wustl.edu/PMFine/Workgroup/SourceAttribution/Reports/In-progress/Potass/ktext.html
U.S. Environmental Protection Agency (1984) Quality assurance handbook for air pollution measurement systems, Volume
     II: Ambient air specific methods (interim edition), EPA/600/R-94/0386, April.
U.S. Environmental Protection Agency (1999a) Particulate matter (PM2.5) speciation guidance document. Available at
     http://www.epa.gov/ttn/amtic/files/ambient/pm25/spec/specpln3.pdf
U.S. Environmental Protection Agency (1999b) Guideline on data handling conventions for the PM NAAQS. EPA-454/R-
     99-008, April.
U.S. Environmental Protection Agency (1999c) PM2.5 mass validation criteria. Available at
     http://www.epa.gov/ttn/amtic/pmqa.html
               Critical Criteria Table
[Table not recovered in this text version; see U.S. EPA, 1999c.]

               Operational Evaluations Table (1 of 2)
[Table not recovered in this text version; see U.S. EPA, 1999c.]

               Operational Evaluations Table (2 of 2)
[Table not recovered in this text version; see U.S. EPA, 1999c.]

               Systematic Issues
[Table not recovered in this text version; see U.S. EPA, 1999c.]

				