									Draft from Saturday, September 20, 2008 at 19:09 a9/p9

Safe Drinking Water and Consumer Right-to-Know:
Some Statistical Gauges

                                  James Hilden-Minton, Ph.D.
                             National Institute of Statistical Sciences
                             Research Triangle Park, North Carolina

                                       September 20, 2008

Central to the Safe Drinking Water Act (SDWA) is the principle of community right-to-know and
citizen involvement. The 1996 amendments expanded provisions for consumer involvement and
notification. Presently, the Environmental Protection Agency (EPA) is proposing a rule to require
most water systems to report water quality information to consumers annually. In this period of
rapidly expanding access to information, it is vital that the public be given appropriate tools to
digest and interpret data. This report supposes that the public has access to drinking water
sampling data and presents four statistical and graphical methods, in order of increasing difficulty, that the
involved citizen may use to interpret such data. These four gauges may be used to address
questions concerning 1) the degree to which water is safe, 2) the certainty or assurance that water is safe
at this moment, 3) the variation of water quality over time, including prediction of future safety, and
4) the extent of unsafe water across a large population, measured by population size.

Safe Drinking Water Act

Community water systems serve about 84 percent of the nation's 102 million households.1 In
1974, Congress enacted the Safe Drinking Water Act charging the Environmental Protection
Agency (EPA) to establish national drinking water standards for public water systems. Congress
further revised and strengthened the Act with amendments in 1986 and 1996. States are given
some flexibility to establish their own more stringent standards and administrative programs.

Presently, there are about 55,000 community drinking water systems nationwide. As SDWA
defines it, a community water system is any “public water system which serves at least 15 service
connections or regularly serves at least 25 year-round residents.”2 They may serve, for example, a
municipality or a mobile home park. Non-community water systems, such as for schools,
restaurants or highway rest stops, serve non-residential customers. Private wells and very small
public systems are not within the purview of SDWA.

Public water systems deliver water to customers. Source water may be obtained from ground
water, surface water, or purchased wholesale. Ground water is from wells, while surface water
may come from lakes, rivers, or reservoirs. Water from separate sources is sometimes mixed or
blended. Before water is distributed, it may be treated at one or more treatment facilities.
         1. Drinking Water: Information on the Quality of Water Found at Community Water Systems and
Private Wells (GAO/RCED-97-123, June 1997) 10.
         2. 40 CFR 141.2

Untreated water is at least pumped into the distribution system. In this case, the pumping station
is considered a treatment facility. Thus, each water system has at least one treatment facility, and
water leaving a treatment facility may or may not be treated water.

Under SDWA, public water systems are required to monitor their water for over 70 contaminants.
These include volatile organic compounds, such as those from solvents; coliform bacteria and viruses, often from
fecal matter; inorganic contaminants such as nitrate; and other organic contaminants, including
pesticides. Lead, copper, and trihalomethanes (by-products of water chlorination) receive special
monitoring as well.

The EPA has established a maximum contaminant level (MCL) for each contaminant.3 The maximum
contaminant level is defined as “the maximum permissible level of a contaminant in water which
is delivered to any user of a public water system.” Thus, MCLs are legal thresholds. Related to
the MCL is the maximum contaminant level goal (MCLG), which is a health-related threshold.
The regulatory language defines this as “the maximum level of a contaminant in drinking water at
which no known or anticipated adverse effect on the health of persons would occur, and which
allows an adequate margin of safety. Maximum contaminant level goals are nonenforceable
health goals.”4 The EPA has proposed simple definitions of MCL and MCLG for
communication with the public. Specifically, the MCLG would be “the level of a contaminant in
drinking water below which there is no known or expected risk to health,” and the MCL would be
“the highest level of a contaminant that is allowed in drinking water.”5 Clearly, the intent is to
distinguish between regulatory issues and health-related issues. Although MCLGs are never
greater than MCLs, any water with all contaminant levels below their MCLs is officially safe, whether or
not the MCLGs are achieved.

Regulations6 lay out how each contaminant is to be sampled, which systems must sample and
how frequently they must sample. Most contaminants are sampled at the point-of-entry, between
treatment facility and distribution system, since this is where they are most likely to be greatest.
In contrast, trihalomethanes, which are by-products of water disinfection, are sampled widely in the
distribution system.7 This is because trihalomethane levels increase with time in the distribution
system. Similarly, lead and copper are sampled at residential taps.8

Most contaminants are sampled quarterly, once per year, or once per three-year period.
Microbiological contaminants are typically sampled more frequently than once per month.
Non-community water systems sample less frequently than community water systems. Systems
meeting certain criteria may be exempted from monitoring for a particular contaminant or be
placed on a reduced schedule. For some contaminants, surface water systems are sampled more
frequently than ground water systems. For example, all public water systems must sample
for nitrate quarterly if distributing surface water or annually if ground water. If, after one year, all
nitrate samples are below one half the maximum contaminant level (10 parts per million), then
the system may be put on an annual schedule.9 The question of sampling frequency is truly
complex. The EPA recommends, “To find out how frequently your drinking water is tested,
contact your water system or state agency in charge of drinking water.”10

         3. 40 CFR Part 141 Subpart G
         4. 40 CFR 141.2
         5. 63 FR 7610, Feb. 13, 1998
         6. 40 CFR Part 141 Subparts C, E, I and M
         7. 40 CFR 141.30(g)
         8. 40 CFR 141.86
         9. 40 CFR 141.23(d)

Expanding the public right-to-know
Written into the National Primary Drinking Water Regulations (40 CFR Part 141)
are provisions for public notification. Essentially, the consumer has the right to be
informed of any violations. First, systems are required to publish any MCL, treatment technique,
or variance and exemption schedule violations in local newspapers within 14 days. Second,
systems in violation must report to their customers by mail or hand delivery within 45 days,
unless they correct the problem within that period. Finally, if an MCL violation poses a serious
short-term risk to health, that system must report within 72 hours of the violation.11 Also,
drinking water systems must provide the most recent notification to new billing units. Although not
required to do so, the EPA publishes all violations on the World Wide Web.12 Thus, at any
moment, an individual may easily query the Safe Drinking Water Information System
(SDWIS) to obtain the latest list of violations for any public water system within the United
States.

There are three types of violations that must be reported. Variance and exemption violations
refer to a failure to monitor or report according to a specific sampling schedule. While such a
violation is not direct evidence that water may be unsafe, it does indicate that the system has had
difficulty complying with regulations. Furthermore, where sampling is delayed or neglected the
consumer has diminished assurance that their water is safe. Simply put, the water provider has
failed to demonstrate to the state or EPA that their water is indeed safe.

Treatment technique violations indicate that a system is failing to treat water or to remedy other
features of water delivery with methods prescribed by the National Primary Drinking Water
Regulations (NPDWR). While this may not mean that the water is unsafe, as with the monitoring
and reporting violations, consumer confidence is not bolstered.

Of immediate significance to the safety of water are MCL violations. For most contaminants, an
MCL violation occurs when the average level over the last twelve months for that contaminant is
above MCL. This may indicate that the contaminant level has been near or above MCL for most
of the preceding year. Thus, water may be unsafe, i.e. above MCL or MCLG, for one or more
contaminants for a year or more before regulatory actions are triggered and the community is
notified under NPDWR. Below, we discuss ways of interpreting sampled water data, should
such data ever become available.13
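
The delay described above can be illustrated with a small calculation. The following is a minimal sketch, not the regulatory compliance procedure: it assumes quarterly samples, uses a trailing four-quarter mean as a simplified stand-in for the twelve-month average, and works on hypothetical readings. The function name and all values are illustrative assumptions.

```python
# Hypothetical quarterly readings in ppb (illustrative values, not real data).
MCL = 1.0
readings = [0.6, 0.7, 1.6, 0.8, 1.5, 1.4]


def trailing_average_flags(levels, mcl, window=4):
    """Return (average, exceeds_mcl) for each full trailing window of samples.

    Simplifying assumption: a violation is flagged when the mean of the
    last `window` samples is above the MCL.
    """
    results = []
    for end in range(window, len(levels) + 1):
        avg = sum(levels[end - window:end]) / window
        results.append((round(avg, 3), avg > mcl))
    return results


for avg, violation in trailing_average_flags(readings, MCL):
    print(f"trailing average = {avg:.3f} ppb, violation flagged: {violation}")
```

In this made-up series the third reading is already above the MCL, yet the first trailing average above the MCL does not occur until two quarters later.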

Consumers not only have the right to know when their drinking water provider is in violation, but
they also have the right to be informed of the possible health risks. Care has gone into the specific
wording of health risk statements. When a system notifies users of violations, mandatory health
effects language for the relevant contaminants must be included along with a plain English (non-technical)
description of the violations. For example, if there is an MCL violation for
tetrachloroethylene, the following paragraph must be included.

         10. EPA, Water on Tap: A Consumer’s Guide to the Nation’s Drinking Water (EPA 815-K-97-002,
July 1997) 10.
         11. 40 CFR 141.32(a)
         13. NPDWR does not require that the results of water sampling be made public, neither does the
Consumer Confidence Proposed Rule of 1998 go so far.

        The United States Environmental Protection Agency (EPA) sets drinking water standards
        and has determined that tetrachloroethylene is a health concern at certain levels of
        exposure. This organic chemical has been a popular solvent, particularly for dry cleaning.
        It generally gets into drinking water by improper waste disposal. This chemical has been
        shown to cause cancer in laboratory animals such as rats and mice when the animals are
        exposed at high levels over their lifetimes. Chemicals that cause cancer in laboratory
        animals also may increase the risk of cancer in humans who are exposed over long
        periods of time. EPA has set the drinking water standard for tetrachloroethylene at 0.005
        part per million (ppm) to reduce the risk of cancer or other adverse health effects which
        have been observed in laboratory animals. Drinking water that meets this standard is
        associated with little to none of this risk and is considered safe with respect to
        tetrachloroethylene. [40 CFR 141.32(e)(48)]

This mandatory language addresses several important points. What is the contaminant? Where
does it come from? How does it get into drinking water? What are the known or anticipated
health effects? How are these health effects known or derived? Are these effects acute or long
term? What standard has EPA set? And what does the EPA consider safe? In all this, there is an
effort to make the language accessible to nearly all adults.

The pending amendment to the National Primary Drinking Water Regulations would expand
community right-to-know. The proposed rule would add a subpart mandating consumer
confidence reports. These reports would exceed the current public notification of violations.
Consumers would receive annual reports on water quality whether or not there were violations.
Consumer confidence reports would “raise consumers’ awareness of where their water comes
from, show them the processes by which safe drinking water is delivered to their homes, [and]
educate them about the importance of prevention measures such as source water protection to a
safe drinking water supply.”14 The EPA anticipates that these reports will initiate a “dialogue”
between the public and drinking water providers.

The proposed consumer confidence reports would contain the following information. Source
water used by the water system will be identified. Terms such as MCL and MCLG will be
defined with jargon-free language. Central to the report is the reporting of contaminant levels. For
some contaminants, the highest test result within the last twelve months will be reported, and for
others, particularly those with compliance criteria based on averages, the average over the last
twelve months will be reported. These maximums and averages will be reported along with the
related MCLs and MCLGs. Violations will be identified, and variances and exemptions will be

Important changes are proposed for required health information. All consumer confidence reports
will display “prominently” a disclaimer concerning individual susceptibility. Specifically, “Some
people may be more vulnerable to contaminants in drinking water than the general population.”15
Immunocompromised individuals are encouraged to “seek advice…from their health care
provider” or call the Safe Drinking Water Hotline.16

Violations will trigger the inclusion of health effects information in consumer confidence reports.
Under this proposal, the mandatory health effects language is somewhat simplified. For example,
the paragraph above on tetrachloroethylene would be replaced by “People who drink water
containing tetrachloroethylene in excess of the MCL over many years could have problems with
their liver, kidney or nervous system, and may have an increased risk of getting cancer.”17 While
this single sentence does not cover all the points of the previous mandatory paragraph, combined
with the rest of the consumer confidence report it covers all the necessary issues. Missing,
perhaps for its technicality, is any reference to how this cancer risk is extrapolated from tests on
laboratory animals.

        14. 63 FR 7606
        15. If the proposed rule of 63 FR 7605 ff., Feb. 13, 1998, is accepted, 40 CFR 141.154(a).
        16. Safe Drinking Water Hotline, 800-426-4791.

The proposed rule to establish consumer confidence reports expands the role of public right-to-know
while being careful to employ principles of effective risk communication. These include the
use of non-technical, jargon-free language and the avoidance of “information overload.”18 This
latter concern is of particular importance to the question of which statistic or statistics on
monitoring levels should be reported. The decision in this proposed rule was to require either the
annual mean or the annual maximum depending on the contaminant. Reporting only one value
per contaminant helps avoid information overload. However, “as far as accuracy is concerned, the
Agency [EPA] is aware that choosing one number to put into the report which gives a true
representation of the water that customers may have consumed during the year will sometimes be
difficult [as t]he quality of water is subject to spatial and temporal variability.”19 The difficulty is
that one number is never sufficient to describe both the typical level and the degree of variability.
To do that, one needs at least two statistics. Furthermore, to address variation over time and
location, not even two statistics are sufficient.
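
To make the point about single-number summaries concrete, here is a minimal sketch using two hypothetical sets of readings. The values, the function name `summarize`, and the choice of the population standard deviation as the measure of spread are all assumptions made for illustration.

```python
from statistics import mean, pstdev

# Two hypothetical sets of four readings (ppb); values are illustrative only.
steady = [0.9, 1.0, 1.1, 1.0]
variable = [0.2, 1.8, 0.4, 1.6]


def summarize(levels):
    """Return the mean, maximum, and population standard deviation, rounded."""
    return {
        "mean": round(mean(levels), 2),
        "max": max(levels),
        "spread": round(pstdev(levels), 2),
    }


print("steady:  ", summarize(steady))
print("variable:", summarize(variable))
```

Both series have the same mean, so the mean alone cannot tell them apart, while the maximum alone says nothing about the typical level. Reporting a typical level together with a measure of spread separates the two cases.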

While avoiding information overload is a sensible constraint on reporting requirements, it does not mean that
more information should be withheld from those who ask for it. Individuals may ask state drinking
water authorities or their water providers for more information. Specifically, one may ask for the
records of water sample test results. Neither state agencies nor public water systems are required
to provide this detailed information, but may grant it under some circumstances. We obtained
data from a particular state agency for research purposes. We use it here to demonstrate the sort
of variability inherent in the monitoring program under SDWA and to suggest ways to understand
such information, should one obtain access to it.

Four additional questions
Regulations, as regulations, must define what is and is not legally acceptable. MCLs are the
mandated thresholds. For most contaminants under the NPDWR, a twelve-month average above
MCL is not legally acceptable. Penalties, correction, and public notification may be demanded of
drinking water systems in violation. Violation is a more or less black-and-white construct, though
determining whether a violation has occurred may involve some ambiguity. The NPDWR
recognizes that the enforceable threshold, the MCL, may not be adequate as a health-based
threshold. The MCL Goal (MCLG) serves this function. Unfortunately, the MCLG may be so low
that enforcement would not be practical. Different thresholds may be needed to answer different
questions.

Our purpose here is to develop alternative thresholds, or gauges, to help answer additional
questions which the public may ask, especially if the public becomes informed of the drinking
water sampling records. We will address four questions.

        17. If the proposed rule of 63 FR 7605 ff., Feb. 13, 1998, is accepted, 40 CFR Part 141, Subpart O,
Appendix B (66).
        18. 63 FR 7611, Feb. 13, 1998.
        19. 63 FR 7611.

1.   To what degree is our drinking water safe?
2.   How confident can we be that our drinking water is safe?
3.   If our drinking water is now unsafe, when did it become unsafe?
4.   How many people in this state are receiving unsafe drinking water?

Surely, the public may have additional questions, and many of these are addressed in “Water on
Tap: A Consumer’s Guide to the Nation’s Drinking Water”20 and other EPA documents.

Question 1 moves us beyond the black-and-white formulations of drinking water regulation.
Obviously, there is a difference between water at 95 percent of MCL and at 65 percent; however,
both are below MCL and are treated the same under present regulations. Our first set of gauges on
comparing measurements with MCL will help us examine measurements at any level and help us
appreciate the imprecision of the measurement process.

Appreciation of measurement imprecision is essential to the second question. Greater precision in
the measurement process leads us to greater confidence in determining whether drinking water is truly
below a threshold of safety. Thus, measurement error is related to our confidence that drinking
water is safe, especially on those days when the water is sampled. However, suppose three
months pass between two sampling dates. How certain can we be that our water remained safe
throughout that interval of time? To address that question, we must consider how variable
contaminant levels are over time. We will present some gauges for measuring process variation.
These gauges, to be sure, are only descriptive statistics. They cannot tell us why contaminant
levels vary; they only quantify how much they vary. Similarly, gauges for measurement error
cannot explain why the measurement process is imprecise.

The regulatory definition of an MCL violation usually is that the average of measurements over
the past twelve months is over the MCL. Accordingly, a community may receive unsafe water for
a year or more before a violation is discovered and reported. This approach gives the benefit of
the doubt to the drinking water system, a sensible posture for regulatory agencies that may be
challenged in court. Nonetheless, people who actually drink the water may prefer to err on the
side of safety. They want assurance at each moment that their water is safe. Question 3 concerns
when water becomes unsafe and when it becomes safe again. Individuals with increased
sensitivities to various contaminants may find this of vital interest. Most of the gauges we present
will include a graphical representation. One may form a chronology from these charts. While the
vicissitudes of water quality are made apparent, such a chart cannot tell us why the water became
unsafe or what happened to restore good quality. Also, the sampling record—no matter how
construed—cannot tell us when drinking water will become safe sometime in the future. These
questions are better asked of the drinking water provider.

The fourth question, concerning the size of an affected population, challenges us to assess
drinking water safety at each treatment facility, or system, and to aggregate population by the
outcome of each assessment. The EPA reports that about 7 percent of public water systems have
received one or more MCL violations.21 Many states draft reports on drinking water quality and
may include statistics about how many public water systems were in violation and how many
consumers are served by these facilities. One must remember that the definition of an MCL
violation gives benefit of the doubt to water systems. Thus, there may be other facilities with
actual contaminant levels above MCL, but which have not triggered a violation. It is truly a
daunting task to account for such possibly unsafe systems. One needs access to data for all
facilities and considerable computing resources. Nevertheless, such an analysis is feasible and
borrows from many gauges discussed in this paper. We will report some results in the section on
estimating the size of the affected population.

         20. EPA, op. cit., pp. 9-11.
         21. EPA, op. cit., p. 4.

Statistical gauges
At the heart of data analysis is making comparisons, be they visual or numerical. The eyes are
well adapted to make a multitude of comparisons with each glance and even do so automatically.
Graphical methods for data analysis facilitate the apprehension of patterns, trends and variability,
but often the eyes need a little guidance. We call such guidance a gauge. In contrast to the visual,
statistical methods for interval estimation, testing and modeling serve to formalize or quantify
comparisons. These methods also serve as gauges. A statistical gauge, then, is either a graphical
or numerical technique to measure, or gauge, how substantial a comparison may be.

We will present a range of gauges from simple to exceedingly complex. The objective is to
provide tools that help answer the first three questions and, if the user has the computational
resources, to answer the fourth question concerning the size of the population not receiving, or not assured
of, safe water.

We will demonstrate the use of these gauges on a portion of the data that the state provided to us.
This state has identified tetrachloroethylene as one of the three most serious contaminants. Within
this state, one community water system had multiple readings above MCL for tetrachloroethylene
(1 part per billion). Moreover, most of the thirty-one treatment facilities within this system had at
least one exceedance. For most of this demonstration, we will consider data from three of the 31
facilities. These are not the three worst, but they were chosen for the sake of illustration.
Fortunately, very few systems have any measurements above MCL for tetrachloroethylene. Thus,
for our purpose here, the typical case is both safe and uninteresting.

Data for these three facilities are given in Table 1. Essentially, this table provides the days when
water samples were drawn and the level of tetrachloroethylene measured. The other columns will
be explained in the next section. For now, observe which measured levels are above 1 ppb, the
MCL, and notice how many samples are actually taken over the three-year period from 1993
through 1995.
Table 1: Tetrachloroethylene (ppb) sampled at three facilities, with measurement bounds.
        Facility   Date       Sign   Level   Lower   Upper
        A          3/28/94    <      0.50    0.00    0.83
                   9/12/94           1.40    1.00    2.33
                   12/19/94          0.80    0.57    1.33
                   3/21/95           0.85    0.61    1.42
                   5/16/95           0.81    0.58    1.35
                   6/1/95            0.92    0.66    1.53
                   7/11/95           1.22    0.87    2.03
                   8/22/95           1.35    0.96    2.25
                   9/6/95            1.80    1.29    3.00
                   10/3/95           1.36    0.97    2.27
                   10/16/95          1.14    0.81    1.90
                   10/31/95          0.64    0.46    1.07
        B          9/27/93           0.80    0.57    1.33
                   4/4/94            0.70    0.50    1.17
                   12/27/94          1.70    1.21    2.83
                   1/30/95    <      0.50    0.00    0.83
                   5/16/95           1.14    0.81    1.90
                   8/22/95           0.49    0.35    0.82
                   10/3/95           1.08    0.77    1.80
        C          4/4/94     <      0.50    0.00    0.83
                   8/22/95           0.95    0.68    1.58
                   11/1/95           2.17    1.55    3.62
                   11/15/95          1.66    1.19    2.77

Comparing with MCL
We present our first gauge to account for uncertainty in the individual measurements. In
particular, we will want to compare measurements of contaminant levels to their respective
MCLs. But what if a measurement is close? For example, the 6/1/95 sample at Facility A is 0.92
ppb, and the MCL is 1.00 ppb. Is that measurement close enough that the true level is plausibly
above 1 ppb? To what degree do we believe the water was safe, below MCL, on June 1?

Two sources of variation concern us. One, real contaminant levels vary from day to day, year to
year. Two, measurement procedures, sampling the water and testing in the laboratory, entail some
imprecision or variability. This is similar to the experience of many of us who try to lose weight.
We step on the bathroom scale one morning, and it reads, say, 200 lbs. A week later we weigh
ourselves again. Despite our dieting and exercising, the scale now reads 203 lbs. Since this
apparent increase distresses us, we step off and back on again. We now get 204 lbs. We try again:
203 lbs. It is highly unlikely that our body weight has fluctuated by a pound within the minute
we took to weigh ourselves three times. This sort of variation is measurement error, the
imprecision of our bathroom scale. The three-pound variation from the beginning of one week to
the next, however, is mostly real change in body weight. Not even a clever account of bathroom
scale measurement error can explain away the two to four pounds we have gained.22

In this section, we will focus on measurement error, and change in real contaminant level will be
central to the following two sections. It is necessary to account for measurement error, which has
nothing to do with quality of water, before we can address questions about real levels of
contamination. Consider the plot in Figure 1. Here we see the measurement levels displayed by
the date on which the sample was drawn. The small black squares represent the measured level.
Vertical lines pass through these squares. The range covered by a bar represents the range of
possible contaminant levels consistent with mandated limits to laboratory measurement error.
Thus, the vertical bars express uncertainty about the real levels of contamination. As a whole, this
graphic shows two features: how contaminant levels have changed over time for Facility A and
how much variation may be due to laboratory testing.

          22. One exception is possible where the scale is out of adjustment, reading about 3 lbs. when nothing
is placed on it. This sort of error is called bias.

[Figure 1: Facility A with Error Bars. Vertical axis: Tetrachloroethylene (ppb); horizontal axis: Sample Date, Jan-93 to Jan-96. Plot annotations: MCL Line, Well Above MCL, Below Detection.]

Measurement error has two sources of variation. The first has to do with how the water sample
was physically drawn, including location, type of tap and any contamination of the sample in
handling and storage. The NPDW Regulations establish standards for these aspects of sampling,
but there is no guidance as to how much variation these steps contribute. The second concerns all
variation that occurs within the laboratory while measuring contaminant levels. Here the
Regulations do specify some bounds.

Water systems must send their water samples to certified laboratories for analysis. These
laboratories are certified by either the EPA or a state agency. Certification means that the
laboratory has demonstrated the level of accuracy required by NPDWR. The laboratory analyzes
Performance Evaluation samples, which include known levels of various contaminants. For
tetrachloroethylene and many other contaminants, the laboratory must obtain measurements
within 40 percent of the true levels of the substance, and this accuracy must be obtained in at
least 80 percent of the samples.23 For instance, if ten samples contained 1 ppb tetrachloroethylene,
then at least eight of the ten measurements must lie between 0.6 and 1.4 ppb. Other contaminants
may have required precision levels other than 40 percent.
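
The accuracy criterion just described can be sketched in a few lines. This is an illustrative check only, not a certification procedure: the function name `meets_accuracy_criterion` and the ten laboratory results are hypothetical, and the 40 percent tolerance and 80 percent pass rate are taken from the text.

```python
def meets_accuracy_criterion(measured, true_level, tolerance=0.40,
                             required_fraction=0.80):
    """Check whether enough measurements fall within the tolerance band.

    A measurement counts as acceptable when it lies within
    `tolerance` (as a fraction) of the known true level.
    """
    lower = (1 - tolerance) * true_level
    upper = (1 + tolerance) * true_level
    within = sum(1 for x in measured if lower <= x <= upper)
    return within / len(measured) >= required_fraction


# Hypothetical results for ten samples that each contain 1 ppb.
results = [0.95, 1.10, 0.62, 1.38, 1.45, 0.88, 1.02, 0.55, 1.20, 0.97]
print(meets_accuracy_criterion(results, true_level=1.0))
```

Here eight of the ten made-up results lie between 0.6 and 1.4 ppb, so the criterion is just met. One more result outside the band would cause it to fail.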

Another requirement for certification is that the laboratory achieve a minimum detection limit
(MDL) below a standard level. For tetrachloroethylene, that required MDL is 0.5 ppb or less.
MDL is the lowest value that a laboratory can measure for a substance. Table 1 has a column
with the heading “Sign.” Where a “<” occurs, the true level is most likely less than the value to the
right. For instance, the level of tetrachloroethylene at Facility A on 3/28/94 was most likely less
than 0.5 ppb. The other facilities also had measurements below MDL.
         23. 40 CFR 141.24(f)(17)(i)

Now we shall use this information about laboratory certification to construct a gauge for
laboratory measurement error. Qualitatively, we will bear in mind that variation due to the
physical handling of the sample before it arrives at the laboratory is not incorporated within
laboratory measurement error. Another caveat is that certified laboratories maintain standards that
are sufficient to meet certification. These internal standards may actually lead to greater precision
than is required to meet regulations. Thus, we shall assume that most measurements, 80 percent
or more, of tetrachloroethylene are within 40 percent of the actual level.

[Figure 2: Facility B with Error Bars. Vertical axis: Tetrachloroethylene (ppb); horizontal axis: Sample Date, Jan-93 to Jan-96. Plot annotations: MCL Line, Wide Variation, Well Above.]

The first consequence of this assumed precision is that a measurement above 140 percent of
MCL is strong evidence that the actual level is above MCL. Conversely, a measurement below 60
percent of MCL is strong evidence that the actual level is below MCL. In the middle is an interval
of indifference, where one is uncertain about whether the actual level is above or below MCL.
For example, Facility A has two measured levels well above MCL (above 1.4 ppb), one well
below MCL (below 0.6 ppb), and the remaining nine are near MCL (between 0.6 and 1.4 ppb).
Figures 1 and 2 identify two measurements that are well above MCL.
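
The three-way classification above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and counting values that fall exactly on the 60 or 140 percent boundary as "well below" or "well above" is an assumption.

```python
import math

def classify_against_mcl(measured, mcl=1.0, k=0.40):
    """Label a measured level relative to the MCL.

    k is the assumed relative precision (0.40 for tetrachloroethylene).
    Values exactly on a boundary are treated as "well above" or
    "well below" (an assumption made for this sketch).
    """
    ratio = measured / mcl
    upper, lower = 1 + k, 1 - k
    # math.isclose guards the boundary cases against floating-point round-off.
    if ratio > upper or math.isclose(ratio, upper):
        return "well above"
    if ratio < lower or math.isclose(ratio, lower):
        return "well below"
    return "near"

for level in (0.45, 0.80, 1.40, 1.80):
    print(level, classify_against_mcl(level))
```

With an MCL of 1 ppb, a reading of 0.80 ppb falls in the interval of indifference, while 1.80 ppb is well above.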

We may go further in characterizing the measurement uncertainty of individual observations.
Inasmuch as we can distinguish those well above and well below MCL from the rest, we can do the
same for other values besides MCL. Take the level L. If the actual value of a sample is L, then
most measurements will be between 60 percent and 140 percent of L. That is, a measurement X
and actual level L are likely to have the relationship $0.60L \le X \le 1.40L$. We may invert this
relationship to have that $X/1.40 \le L \le X/0.60$ will hold most frequently. Next, we will show why this
measurement interval, from X/1.4 to X/0.6, is useful.


The second observation from Facility A is 1.40 ppb. The measurement interval for this value is
from 1.00 (= 1.40/1.4) to 2.33 (= 1.40/0.6). Recall that this is well above MCL of 1 ppb, but just
barely. So 1.40 ppb is well above its lower limit, 1.00 ppb, and well below its upper limit, 2.33
ppb. Furthermore, this is the narrowest interval for which the measured level is well above its lower
limit and well below its upper limit. Most often, the actual level will be a level within the
measurement interval. Occasionally, a measurement will be so badly in error that the actual level
is outside of the measurement interval, but this should happen less than one time out of five.

[Figure: Facility C with Error Bars. Vertical axis: Tetrachloroethylene (ppb); horizontal axis: Sample Date, Jan-93 to Jan-96. The MCL line is marked; annotations read "Sparse Sampling" and "Below Detection Limit".]

Figure 3

We have included in Table 1 the lower and upper limits for each measurement. This is perhaps
best appreciated graphically. Each of the plots in Figures 1, 2 and 3 contains measurement error
bars. The bottom of the bar is at the lower limit of the measurement interval, and the top of the
bar is at the upper limit. The small square in the middle is the measured level.

Some lower limits are set to zero because those observations are below MDL. Any measurement
that is below MDL does not rule out extremely small values as possible true levels. Two
measurements below MDL have been noted in Figures 1 and 3. There is one more in Figure 2.
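
As a minimal sketch of how the limits in Table 1 could be computed, the helper below returns the measurement interval from X/(1 + k) to X/(1 - k) and sets the lower limit to zero for readings flagged as below MDL. The function names are hypothetical, and applying the usual upper limit to a below-MDL reading is an assumption.

```python
def measurement_interval(measured, k=0.40, below_mdl=False):
    """Return (lower, upper) limits for the actual level behind a reading.

    k is the relative precision. For a reading reported as below the
    minimum detection limit, the lower limit is set to zero.
    """
    lower = 0.0 if below_mdl else measured / (1 + k)
    upper = measured / (1 - k)  # assumption: same upper limit rule below MDL
    return lower, upper

def crosses(interval, level):
    """True when a reference level (such as the MCL) lies inside the interval."""
    lower, upper = interval
    return lower < level < upper

for reading, flagged in ((0.50, True), (1.40, False), (0.80, False)):
    print(reading, measurement_interval(reading, below_mdl=flagged))
```

An error bar that crosses the MCL line corresponds to `crosses(...)` returning true for that reading.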

The plots in Figures 1, 2 and 3 also include a horizontal line at the level of MCL. This reference
line is there to help the eye distinguish points above MCL from points below. An added
interpretation is that error bars not crossing the MCL line correspond to measurements well above
or well below. Conversely, all measurements with intervals crossing the MCL line have some
uncertainty about whether the actual levels are above or below MCL.


We view the MCL as a threshold between safe and unsafe levels of a contaminant. These plots
take on relevance with this perspective. We see that each of the three facilities has moments
when the water is unsafe. This is certain when a measurement is well above MCL, but perhaps
this is true at other times as well. Facility A may have released unsafe water for much of 1995.
Facility B in Figure 2 shows great variability, with ups and downs that seem to be wider than
measurement error alone can account for. Are we really assured that the water was safe again in
late January 1995? Facility C has only four measurements. With such sparse sampling (Figure 3),
are we confident that the water was safe in 1994-95? This first gauge leads us to ask deeper
questions.

Comparing two measurements
In the previous section, we discussed how to compare measurements with MCL with due
consideration to error in measurement. Now we shall discuss how to compare two measurements
and to account for uncertainty due to measurement error.

Returning to the weight loss analogy, the scale readings went from 200 lbs. at the beginning of
the week to 203 lbs. at the end. Have we really gained three pounds? Suppose our weight
throughout the week was actually 202 lbs., no change. Then, the first reading would have been
two pounds below, and the second one pound above. Maybe our bathroom scale is so imprecise
that one- even two-pound errors will occur. However, if two-pound errors hardly ever occur, then
we have most likely gained one or more, even five, pounds. Naturally, if we weigh ourselves on a
more precise scale, such as the ones in medical offices, we would have greater confidence that a
three-pound difference in readings means that we have gained precisely three pounds.
Furthermore, it is the real gain or loss of weight that interests us, not the fluctuations of an
imprecise bathroom scale.

In the case of drinking water contamination, we do not have the option of using more precise
laboratory methods. Instead, we must use the data at hand and do our best to account for
measurement uncertainty. For example, the first three measurements at Facility A are < 0.50, 1.40
and 0.80 ppb. The first is well below MCL and the second well above MCL. Note as well that
their error bars in Figure 1 do not overlap vertically. No likely value could be the common actual
level for both measurements. We are, therefore, confident that the actual level of contamination
has increased from the time of the first measurement to the second. Are we as confident that the
water improved—contaminant levels decreased—from the second measurement to the third? This
time the error bars overlap. It at least seems plausible that the two measurements are from a
common level. How shall we gauge this?

Consider two measurements. Say one is at X and the other Y. Even with measurement error,
sometimes the actual level for X is clearly greater than that of Y, or the level for Y is clearly
greater than the other. However, in the intermediate case, we simply cannot be sure that one
actual level is greater than the other. Let’s consider the most indeterminate case—the actual
levels are the same. This is as if the two measurements were of same-day samples.

Say L is the actual level for both X and Y. The variation24 in X and Y are both proportional to L,
just as their deviation from L is 40 percent of L, or 0.40L. The difference X - Y has somewhat
more variation than the two have individually, and theory implies that the variation in the
difference is proportional to $\sqrt{2}\,(0.40L)$, or 1.414 times 0.40L. Thus, most frequently

$$-\sqrt{2}\,(0.40L) \le X - Y \le \sqrt{2}\,(0.40L)$$

holds when both X and Y measure the level L. To make this interval more useable, we replace L
with the average of X and Y, (X + Y)/2. The resulting interval may serve as the region of
indifference where one is uncertain if one actual level is greater than the other. Outside of this
interval, one may say with confidence whether the actual levels are increasing or decreasing.

         Here variation means specifically the standard deviation.

We may summarize this gauge as follows. We denote X as the measurement of actual level L, and
Y the measurement of G. For generality we replace 40 percent precision with k precision.

1. If $X - Y > \sqrt{2}\,k\,(X + Y)/2$ then conclude L is clearly greater than G.
2. If $X - Y < -\sqrt{2}\,k\,(X + Y)/2$ then conclude L is clearly less than G.
3. Otherwise, conclude that L is not clearly greater or less than G.

Following our example, the difference of the third and second measurements at Facility A is 0.60
(= 1.40 - 0.80), and the mean is 1.10 (= (1.40 + 0.80)/2). We compare the difference to 0.62 (=
1.414 × 0.40 × 1.10). Since 0.60 is neither greater than 0.62 nor less than -0.62, we conclude that
the actual level at the time of the second sample is not clearly greater than or less than the level at
the time of the third sample. We are saying that the apparent improvement might only be
measurement error. We cannot be certain that a real change has taken place.
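
The three-step gauge can be written as a short Python function. This is a sketch rather than a definitive implementation; the function name and the wording of the returned labels are hypothetical.

```python
import math

def compare_levels(x, y, k=0.40):
    """Compare two measurements x and y with relative precision k.

    The cut-off is sqrt(2) * k times the average of the two readings.
    """
    cutoff = math.sqrt(2) * k * (x + y) / 2
    diff = x - y
    if diff > cutoff:
        return "clearly greater"
    if diff < -cutoff:
        return "clearly less"
    return "not clearly different"

# Second versus third measurement at Facility A (1.40 and 0.80 ppb).
print(compare_levels(1.40, 0.80))
```

For 1.40 and 0.80 ppb the difference of 0.60 falls just inside the cut-off of about 0.62, so the function reports no clear difference.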

This conclusion is probably not satisfactory to many of us. While the difference, 0.60, is not
exactly greater than the gauge cut-off value, 0.62, it is close. Thus, we would prefer some
statement of degree to show how close. We may gauge this difference with an interval rather than
a test. This interval would be

$$(X - Y) \pm \sqrt{2}\,k\,\frac{X + Y}{2}$$

where k = 0.40, the relative precision. Accordingly, the difference between levels at the second
and third sample is 0.60 ± 0.62 ppb, or -0.02 to 1.22 ppb. Although this interval includes 0 as a
possibility, the difference in actual levels is likely to be much higher.
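
The interval form of the gauge is equally short in Python. The sketch below assumes the same cut-off as the test, centered on the observed difference; the function name is hypothetical.

```python
import math

def difference_interval(x, y, k=0.40):
    """Return (lower, upper) limits for the difference in actual levels."""
    half_width = math.sqrt(2) * k * (x + y) / 2
    diff = x - y
    return diff - half_width, diff + half_width

lower, upper = difference_interval(1.40, 0.80)
print(f"{lower:.2f} to {upper:.2f} ppb")
```

An interval that excludes zero corresponds to a clear increase or decrease under the test; one that barely includes zero, as here, shows how close the call is.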

When making comparisons between pairs of measurements, the error bars in Figures 1, 2 and 3
are of limited value. These bars are intended to facilitate comparisons with MCL and other fixed
levels, but the bars will overlap too much when comparing two measurements. Consider
comparing the sixth measurement at Facility A, 6/1/95, with the ninth, 9/6/95. The measurements,
taken three months apart, are 0.92 and 1.80 ppb. In Figure 1, there is a visible overlap of error
bars, yet there is a clear increase in levels. Numerically, the difference, 0.88 ppb, is greater than
the cut-off value of 0.77 (= 1.414 × 0.40 × (0.92 + 1.80)/2) ppb, but we can also show these
comparisons graphically.

With a slight modification, we can use error bars to facilitate pairwise comparisons. Essentially,
we will shorten the bars so that whenever two bars fail to overlap we would conclude that one
level is clearly greater than the level of the comparison. John Tukey (107) proposed this as a
graphical approach to multiple comparisons. While he dubbed these bars notches, I prefer to call
them pair bars. The original error bars had the interval from $X/(1 + k)$ to $X/(1 - k)$, and the pair bars
will have the interval from $X/(1 + k/\sqrt{2})$ to $X/(1 - k/\sqrt{2})$. We show pair bars for Facilities A and B in
Figures 4 and 5.
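
A small Python sketch shows how pair bars might be computed and checked for overlap. It assumes the pair-bar limits are X/(1 + k/√2) and X/(1 - k/√2); the function names are hypothetical.

```python
import math

def pair_bar(measured, k=0.40):
    """Return (lower, upper) limits of the shortened bar for pairwise comparison."""
    shrink = k / math.sqrt(2)
    return measured / (1 + shrink), measured / (1 - shrink)

def bars_overlap(a, b):
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Sixth and ninth measurements at Facility A (0.92 and 1.80 ppb).
print(bars_overlap(pair_bar(0.92), pair_bar(1.80)))
# Second and third measurements at Facility A (1.40 and 0.80 ppb).
print(bars_overlap(pair_bar(1.40), pair_bar(0.80)))
```

For these two pairs the overlap check agrees with the numerical cut-off test: the bars for 0.92 and 1.80 ppb do not overlap, while those for 1.40 and 0.80 ppb do.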

[Figure: Facility A with Pair Bars. Vertical axis: Tetrachloroethylene (ppb); horizontal axis: Jun-93 to Dec-95. Annotations read "No Overlap, Clear Increase", "Some Overlap" and "At Same Level".]

Figure 4

The first two pair bars in Figure 4 clearly do not overlap. This illustrates that there was an
increase in actual levels between the two observations, that this increase is not merely an artifact
of measurement error. Conversely, the second and third pair bars do overlap, indicating that the
apparent decrease lacks strong evidence for an actual decrease.

It is also useful to identify periods where there is little or no change. As highlighted in Figure 4,
the measurements from the first half of 1995 are all at the same level. Pair bars do not accurately
indicate when more than two observations are truly at the same level. For example, are
observations 7 through 11 consistent with one actual level? This is analogous to how error bars
are not adequate for pairwise comparisons. We will, in the next section, develop a gauge for
identifying when two or more observations may all have the same level.


[Figure: Facility B with Pair Bars. Vertical axis: Tetrachloroethylene (ppb); horizontal axis: Jun-93 to Dec-95. Annotation reads "Up-Down Pattern: More Variation than Measurement Error Can Explain".]

Figure 5

The variation in Facility B is considerable (Fig. 5). After the first pair of measurements, every pair
of successive measurements is clearly distinct. This up-down pattern cannot be explained by
laboratory measurement error alone. Perhaps error introduced in handling the samples is causing the
variation. One might even be suspicious that cheating produced the pristine fourth and sixth
samples. We cannot say from this sort of data. The immediate interpretation is that the actual
level of contaminant at this facility is fluctuating, though we may not know why. We address this
kind of fluctuation, called process variation, in the next two sections.

Using simple means
Often we wish to summarize a group of measurements. The average, or mean, is a common
statistic used to describe the typical level. Another motivation for combining measurements is to
reduce the uncertainty due to measurement error. In this section, we will discuss the use of means,
comparison of means, and measurement error. We will also discuss how to tell if there is
significant variation in contaminant level.

In Figure 4, we identified four measurements which appear to be at the same level. These levels
are 0.80, 0.85, 0.81 and 0.92 ppb, and we label these values $X_1, X_2, \ldots, X_n$ with n equal to four
in this case. The mean is 0.845 (= ¼(0.80 + 0.85 + 0.81 + 0.92)) ppb. We may also express the
mean more generally as the sum of all values divided by the number of values, or

$$\bar{X} = \frac{1}{n}\,(X_1 + X_2 + \cdots + X_n)$$


If all these measurements, $X_1, X_2, \ldots, X_n$, are of water with the same contaminant level L, then
at least 80 percent of the time $\bar{X}$ will lie between $(1 - k/\sqrt{n})L$ and $(1 + k/\sqrt{n})L$. Consequently, we
obtain for L the measurement interval from $\bar{X}/(1 + k/\sqrt{n})$ to $\bar{X}/(1 - k/\sqrt{n})$.

Where our numerical example has $\bar{X} = 0.845$, the lower limit is 0.70 ppb. The upper limit, 1.05
ppb, is just a little above the 1 ppb MCL for tetrachloroethylene. As individual measurements, the
four observations that went into this mean had upper limits above 1.33 ppb and lower limits
below 0.66 ppb (Table 1). Thus, we see that the interval for the mean is tighter than the individual
measurement intervals. More importantly, we have greater confidence that the actual level of
tetrachloroethylene is below MCL at Facility A for the first half of 1995.
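
The interval for a mean can be sketched in Python as follows, assuming the limits are the mean divided by 1 + k/√n and 1 - k/√n. The function name is hypothetical.

```python
import math

def mean_interval(values, k=0.40):
    """Return (mean, lower, upper) for measurements assumed to share one level."""
    n = len(values)
    mean = sum(values) / n
    shrink = k / math.sqrt(n)  # precision tightens with the square root of n
    return mean, mean / (1 + shrink), mean / (1 - shrink)

mean, lower, upper = mean_interval([0.80, 0.85, 0.81, 0.92])
print(f"mean {mean:.3f}, interval {lower:.3f} to {upper:.3f} ppb")
```

With four measurements the effective precision is 0.40/2 = 0.20, which is why the interval is noticeably tighter than the interval for any single reading.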

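[Figure: Period Means. Vertical axis: Tetrachloroethylene (ppb); panels for Facility A, Facility B and Facility C; legend: upper and lower error-bar limits (U Error, L Error) and upper and lower pair-bar limits (U Pair, L Pair).]

Figure 6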

Since means summarize a group of measurements and reduce the effect of measurement error, we
may also use them to clarify trends. In Figure 6, we present means for various periods. We
grouped the twelve measurements at Facility A into four six-month periods, and we used annual
periods for fewer measurements at Facilities B and C. These period means are plotted with both
error bars and pair bars. It is easy to tell them apart since the error bars will always be longer, and
here, we have added dashes at the ends of the pair bars. Pair bars have the lower limit of
$\bar{X}/(1 + k/\sqrt{2n})$ and upper limit $\bar{X}/(1 - k/\sqrt{2n})$. This is a slight modification of the error bar limits.


It is clear from Figure 6 that Facilities A and C have become more contaminated with
tetrachloroethylene from 1994 to 1995. Moreover, the final period means for Facilities A and C
are well above MCL. Pair bars indicate that the contaminant level at Facility B has clearly fallen
from 1994 to 1995. (Error bars alone would not have made this clear.)

Figure 6 appears to tell a rather tidy story. Do we believe that Facility B has improved all that
much in the last year? Perhaps this story is not complete. The chart in Figure 5 tells a different
story. The measurements alternate above and below MCL. If samples 4 and 6 had not been taken,
we would have the impression that an MCL violation is imminent. Conversely, if samples 5 and
7 had not been taken, we would think that there was nothing to worry about. Process variability,
rather than measurement error only, reduces our confidence that water at Facility B is truly safe.
The chart in Figure 6 does not tell the full story; it is based only on measurement error.

We need, therefore, some measure of process variation, or at least total (measurement and
process) variation. The sample variance is a measure of total variation. Its formula is

$$S^2 = \frac{1}{n-1}\left[(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2\right]$$

The four 1995 measurements at Facility B are 0.50, 1.14, 0.49, and 1.08 ppb, and the mean is
0.8025. This leads to

$$S^2 = \tfrac{1}{3}\left[(-0.3025)^2 + (0.3375)^2 + (-0.3125)^2 + (0.2775)^2\right] = 0.1267$$

We shall compare this measure of total variance to another which measures only measurement
error. We propose the measurement error variance given by

$$S_E^2 = \frac{c^2}{1 + c^2} \cdot \frac{1}{n}\left(X_1^2 + X_2^2 + \cdots + X_n^2\right)$$

Here c is a constant related to k. We suggest c should be about three quarters of k, or 0.30. For our
numerical example we obtain

$$S_E^2 = \frac{0.09}{1.09} \cdot \frac{1}{4}\left(0.50^2 + 1.14^2 + 0.49^2 + 1.08^2\right) = 0.0610$$

This is only about half of the total variance, 0.061 to 0.127, indicating that measurement error can only
account for about half of total variation.
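
Both variances can be checked with a few lines of Python. The sketch assumes the total variance is the usual sample variance and that the measurement error variance is c²/(1 + c²) times the mean of the squared measurements, a reading of the formula that reproduces the quoted values of 0.1267 and 0.0610. The function names are hypothetical.

```python
def total_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def measurement_error_variance(values, c=0.30):
    """Variance attributable to measurement error alone (assumed form)."""
    n = len(values)
    mean_square = sum(x * x for x in values) / n
    return c ** 2 / (1 + c ** 2) * mean_square

facility_b_1995 = [0.50, 1.14, 0.49, 1.08]
print(round(total_variance(facility_b_1995), 4))
print(round(measurement_error_variance(facility_b_1995), 4))
```

The ratio of the two results is a little above 2, which is the sense in which measurement error accounts for only about half of the total variation at Facility B.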

[Figure: Variance in 1995 Measurements. Chart of the values below.]

              Fac A      Fac B      Fac C
Total         0.1278     0.1267     0.3754
Meas. Only    0.1132     0.0610     0.2304

Figure 7


The chart in Figure 7 shows comparisons between total variance, $S^2$, and measurement error
variance, $S_E^2$, for 1995 measurements at the three facilities. In each of these cases, the estimate of
total variance is greater than the estimate of measurement error variance. This need not always
be the case, but where the ratio of variances is greater than 1 we have some evidence that there is
process variability in addition to measurement error.

To make this more formal, the ratio of variances is

$$R = S^2 / S_E^2,$$

and $(n - 1)R$ may be compared to a chi-square distribution with $n - 1$ degrees of freedom to test
significance. The ratios for the three facilities in 1995 are 1.129, 2.077 and 1.629 with sample
size, n, 9, 4 and 3, respectively. Thus, a test at the 20 percent significance level25 would compare
9.03 (= (9 - 1) × 1.129) with 11.03, the 80th percentile of the chi-squared distribution with 8 degrees
of freedom, to find that there is not significant process variation at Facility A for year 1995. That
is, we may think of Facility A having essentially the same actual contaminant level throughout
1995. However, when we compare 6.23 and 3.26 to 4.64 and 3.22 for Facilities B and C,
respectively, we find that process variation is significant, that measurement error alone cannot
account for the total variability. This should help to explain the annotation in Figure 5.
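
The test can be reproduced with a short Python sketch. To keep it free of external libraries, the 80th-percentile critical values (11.03, 4.64 and 3.22) are taken from the text rather than computed; the function name and the layout of the data are hypothetical.

```python
def process_variation_test(total_var, meas_var, n, critical_value):
    """Return (ratio, statistic, significant) for the variance-ratio gauge."""
    ratio = total_var / meas_var
    statistic = (n - 1) * ratio
    return ratio, statistic, statistic > critical_value

# (total variance, measurement-only variance, n, chi-square 80th percentile)
facilities_1995 = {
    "A": (0.1278, 0.1132, 9, 11.03),
    "B": (0.1267, 0.0610, 4, 4.64),
    "C": (0.3754, 0.2304, 3, 3.22),
}

for name, args in facilities_1995.items():
    ratio, statistic, significant = process_variation_test(*args)
    print(name, round(ratio, 3), round(statistic, 2), significant)
```

Only Facility A falls below its critical value; Facilities B and C exceed theirs, which matches the conclusion that process variation is significant there.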

This way of measuring process variation does not take into consideration the temporal ordering of
measurement. Obviously, if there is variation in the process, this variation occurs over time. The
extraordinarily perceptive reader may have noticed that, for Facility A in Figure 6, we noted a
significant increase from the first half of 1995 to the second half. This is in contradiction with the
conclusion in the previous paragraph where we did not find strong evidence of process variation
in 1995. It is inevitable that gauges will sometimes disagree with each other, but here we must
keep in mind that this last test does not measure process variation as a function of time. For this
reason, it will be less sensitive to temporal process variation than perhaps the pairwise
comparison of means. We present a more suitable measure of process variation in the next section.

Measuring process variation

A line chart, such as the one in Figure 8, displays nicely what we mean by process variation. This
variation is how abruptly the process level changes over time, how wide and volatile the up- and
downward swings are. For example, volatility is an important descriptor of financial markets. The
Dow Jones Industrial Average and S&P 500 fluctuate considerably even within the same day.
Furthermore, NASDAQ and ValueLine track newer and less capitalized stocks. These tend to be
more volatile when compared to the Dow or S&P.

Returning to the weight watching analogy, we often call a person who goes on and off fad diets a
yo-yo dieter. We all gain and lose some weight, but the gains and losses of a yo-yo dieter may be
more frequent and extreme. When process variation is significant, we often feel a loss of control.

          The 80 percent confidence level, or 20 percent significance level, was chosen here to make it
compatible with the 80 percent rule applied to the certification of laboratories. We have chosen to make all
tests and intervals have the same significance.


We are likewise uncertain about facilities with widely varying contaminant levels—even if their
averages are below MCL.

[Figure: Process Variation. Line chart of measurements for Fac A, Fac B and Fac C; horizontal axis: Jul-93 to Jan-96.]

Figure 8

The variation that is emphasized in Figure 8 is mostly process variation plus some measurement
error. We may quantify process variance as the difference between total variance as above and
variance due to measurement error. However, simple subtraction of $S^2$ and $S_E^2$ will sometimes
lead to a negative estimate of process variance, a quantity which is surely not negative.
Furthermore, $S_E^2$ depends on c, a value for which there is considerable uncertainty.

Another approach is to consider measurement error as negligible. Thus, we think of $S^2$ as only
measuring process variation. If this is not the case, if there is nonnegligible measurement error,
then this approach will only overestimate the amount of process variation. That is, we may
accidentally think that there is more process variation than there really is. This is at least

As was previously mentioned, $S^2$ does not take into consideration the timing of measurements. For
this reason it is not yet a measure of process variation. Process variation is a ratio of variation
per unit of time. In the weight watching example we may be interested in measuring the month to
month variation in weight or perhaps variation from year to year.

In what follows we will assume for simplicity that process variation accumulates additively. That
is, yearly variance is twelve times the monthly variance, and similarly four times the quarterly
variance. While this may not be accurate in the extremes (century variance may not be one
hundred times the annual variance), it is at least descriptive and can be computed by hand.

The adaptation that we require of $S^2$ is that we divide by the cumulative time lapsed so as to
obtain a ratio of variance to unit of time. An alternative formula for $S^2$ is

$$S^2 = \frac{1}{n(n-1)} \sum_{i<j} (X_i - X_j)^2.$$

While this is not a handy formula for computation, it does illustrate an important point. The total
variance is a sum of squared differences between all pairs of observations. Now the amount of
time which lapsed between the ith and jth observations is $|t_i - t_j|$. We would like to accumulate
all of this lapsed time into

$$T = \frac{2}{n(n-1)} \sum_{i<j} |t_i - t_j|.$$

This has a simpler form for computation,

$$T = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} i\,(n - i)\,(t_{i+1} - t_i),$$

where the ith measurement was taken at time $t_i$, and these times are in chronological order.

Finally, an estimate of process variance is simply the ratio, $V^2 = S^2/T$. We illustrate these
computations for Facility C. Samples were taken 505, 71, and 14 days apart, or in years, 1.383,
0.194, and 0.038 apart. Thus,

$$T = \frac{1(3)(1.383) + 2(2)(0.194) + 3(1)(0.038)}{3(2)} = 0.840$$

for Facility C, and Facilities A and B have 0.328 and 0.613 for T accumulated years. Since
Facility C has total variance 0.549, the estimate of process variance is 0.654 variance per year.
For Facilities A and B we also have total variances 0.141 and 0.184 for process variances of
0.430 and 0.301.

Process variance provides a way to compare the three facilities. Clearly, Facility C is the most
variable of the three whether one compares by $V^2$ or by $S^2$. However, on the basis of $V^2$,
Facility A is seen to be more variable per year than Facility B, while the reverse is seen in $S^2$.
The difference is that B is observed over a much greater accumulation of time.
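
The Facility C computation can be retraced in Python. The sketch assumes that T is the average time lapsed over all pairs of observations, a normalization chosen because it reproduces the quoted T of 0.840 years; the function names are hypothetical.

```python
def accumulated_time(times):
    """Average lapsed time over all pairs of sampling times (assumed form of T)."""
    n = len(times)
    total = sum(abs(times[j] - times[i])
                for i in range(n) for j in range(i + 1, n))
    return 2 * total / (n * (n - 1))

def accumulated_time_from_gaps(gaps):
    """Same quantity from the gaps between successive samples.

    The i-th gap is spanned by i * (n - i) pairs of observations.
    """
    n = len(gaps) + 1
    total = sum(i * (n - i) * gap for i, gap in enumerate(gaps, start=1))
    return 2 * total / (n * (n - 1))

def process_variance(total_var, t):
    """Variance per unit of time, V^2 = S^2 / T."""
    return total_var / t

gaps_c = [1.383, 0.194, 0.038]           # years between Facility C samples
times_c = [0.0, 1.383, 1.577, 1.615]     # cumulative sampling times in years
t_c = accumulated_time_from_gaps(gaps_c)
print(round(t_c, 3), round(accumulated_time(times_c), 3))
print(round(process_variance(0.549, t_c), 3))
```

The pairwise and gap-based forms give the same T, and dividing the total variance of 0.549 by it gives about 0.654 variance per year.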

Estimating the size of the affected population
In another study, the author estimated the population at risk for exposure to high levels of
tetrachloroethylene. It is straightforward to identify the facilities that are in violation. In fact, such
information is available on the Web from the EPA’s Safe Drinking Water Information System
(SDWIS). However, this approach would likely underestimate the extent of risk in the population.
The regulatory definitions tend to put the burden of proof upon the state, effectively giving water
systems the benefit of the doubt. In probabilistic terms, the state acts when there is high
probability that a facility is above MCL.

From the public’s viewpoint, there is a desire to be assured that drinking water is safe. Here the
burden of proof may shift to showing that a facility has a low probability of exceeding MCL.
Thus, one approach is to estimate the size of the population that is not assured safe drinking
water. The scientific challenge is to estimate the probability that a facility is above MCL. If that
probability is greater than, say, 5%, then we say that the people served by that facility are not
assured. The accounting is complete when we add up the number of persons served by unassured
facilities. In the study area, we found that at least 10% of the entire population was not assured, at
the 5% assurance level, of water safe for tetrachloroethylene.


Figure 9

Figure 9 shows the estimate of the unassured population over the three-year study period. The
unassured population is represented by the top curve in the graph. Only for a brief time in winter
1995 did the unassured population drop to less than 10% of the entire population.

The unassured population size probably overestimates the number of people actually receiving
water above MCL for tetrachloroethylene. Here the overestimation is due to the low threshold
between assured and unassured. An unbiased approach would compute the expected number of
persons receiving water above MCL. Once we have determined probabilities of exceedance for
each facility, the expected unprotected population is the sum of the population at each facility
multiplied by the exceedance probability.
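
As a rough sketch of this calculation, the Python below sums population times exceedance probability over a set of facilities. The records and the function name are hypothetical and the probabilities are invented for illustration; the study's own probabilities came from a fitted model that is not reproduced here.

```python
# Hypothetical facility records: (persons served, estimated probability of exceeding MCL).
# These numbers are invented for illustration; they are not from the study.
facilities = [
    (12000, 0.01),
    (3500, 0.08),
    (800, 0.40),
    (25000, 0.03),
    (1700, 0.06),
]


def expected_unprotected(records):
    """Expected number of persons receiving water above MCL:
    the sum over facilities of persons served times exceedance probability."""
    return sum(persons * prob for persons, prob in records)


total_population = sum(persons for persons, _ in facilities)
expected = expected_unprotected(facilities)
expected_share = expected / total_population

print(f"Expected unprotected population: {expected:.0f} of {total_population} "
      f"({expected_share:.1%})")
```

Unlike the unassured count, every facility contributes here in proportion to its probability, so no single threshold drives the result.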

In the study area, we found this expected unprotected population to be between 3 and 4% of the
whole population. Figure 9 traces the size of the expected unprotected population over the three-year period.
There are also 95% confidence bands around the expected value. The upper band indicates that
for the first two years more than 5% of the population may have been receiving water above
MCL for tetrachloroethylene.

The results of this section depend heavily on the assumed model used for determining exceedance
probabilities. Another source of uncertainty is due to the frequency of monitoring. Had the state
required sampling more frequently than SDWA requires, there would be less uncertainty about
which facilities are in excess of MCL. This would effectively change the exceedance
probabilities. Since most exceedance probabilities are less than 0.5, most would shrink to smaller
values. Consequently, under more frequent sampling both the expected and unassured
populations would likely have smaller estimates.
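
The effect of sampling frequency can be illustrated with a deliberately simple stand-in model. The sketch below is not the model used in the study: it assumes a normal approximation for the facility's mean concentration, it holds the sample mean and standard deviation fixed as the number of samples grows, and the limit, mean, and standard deviation are hypothetical values in arbitrary units.

```python
import math


def exceedance_probability(sample_mean, sample_sd, n_samples, limit):
    """Approximate probability that the true mean exceeds the limit,
    using a normal approximation (an assumed model, for illustration only)."""
    standard_error = sample_sd / math.sqrt(n_samples)
    z = (limit - sample_mean) / standard_error
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical values in arbitrary concentration units.
limit = 5.0
sample_mean = 4.0   # below the limit, so the probability starts under 0.5
sample_sd = 2.0

# More samples per period shrink the standard error, and with it the probability.
probabilities = {
    n: exceedance_probability(sample_mean, sample_sd, n, limit)
    for n in (4, 12, 52)
}

for n, prob in probabilities.items():
    print(f"{n:>2} samples: exceedance probability = {prob:.4f}")
```

With a sample mean above the limit the same formula moves the probability toward 1 instead, which is why the argument in the text applies only to facilities whose exceedance probability is already below 0.5.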


Citizenship in the age of information
The US EPA has pressed the public right-to-know as an essential tool of regulatory effort.
Coupled with advances in electronic media, the public now has unprecedented access to
administrative information. Access is the beginning. Tools for analysis and for extracting useful
information out of such large databases are also necessary. Our aim has been to stretch the
boundaries of what types of analysis the public may find both usable and useful. Some analyses
will remain in the domain of experts, but not all. How the citizenry, both lay and professional,
engage the right-to-know will make all the difference.

Works Cited
EPA. Water on Tap: A Consumer’s Guide to the Nation’s Drinking Water. United States
        Environmental Protection Agency, 815-K-97-002, July 1997.
Hilden-Minton, James. Case study in drinking water safety. Report, National Institute of
        Statistical Sciences, March 1998.
Tukey, John W. “The Philosophy of Multiple Comparisons.” Statistical Science 6 (1991): 100-

