Integration
					Chapter 7


This chapter is to be used as an alternative to chapter 5 in Conrad Plaut’s course notes. It
achieves more or less the same goal, but via a different route, and cross references between this
chapter and chapter 5, except where explicitly stated, will be close to impossible until this chapter is
complete. We will obtain the Lebesgue integral by means of an extension process from more
elementary notions of an integral, among which is the Riemann integral.

7.1     Vector spaces

Definition 7.1.1 A real vector space is a triple (V, +, ·) consisting of a set V , a binary operation
+ on V (i.e., a function + : V × V → V ), and a multiplication of elements of V with real numbers,
i.e., · : R × V → V , satisfying the following rules (with the dot often omitted):
      ∀u, v ∈ V :          u+v =v+u
      ∀u, v, w ∈ V :       (u + v) + w = u + (v + w)
      ∃0 ∈ V ∀v ∈ V :      0+v =v
      ∀v ∈ V ∃w ∈ V :      v+w =0
      ∀λ ∈ R ∀u, v ∈ V : λ(u + v) = λu + λv
      ∀λ, µ ∈ R ∀v ∈ V : (λ + µ)v = λv + µv
      ∀λ, µ ∈ R ∀v ∈ V : (λµ)v = λ(µv)
      ∀v ∈ V :           1v = v
In the context of vector spaces, the elements of V are called vectors, and the numbers (in R) are
called scalars. It is easy to see that the additive inverse w to any given v ∈ V , whose existence is
asserted in the 4th axiom, is unique, and we denote it as −v.

Note that in these axioms, + denotes two operations; addition in R and addition in V ; likewise there
are two multiplications denoted by juxtaposition or dot; multiplication in R and multiplication of
real numbers and elements of V . The context clarifies which operation is meant. While in a number
of examples of vector spaces, there is also a multiplication among vectors, such an operation is not
part of the definition of a vector space.

Definition 7.1.2 A complex vector space is a triple (V, +, ·) consisting of a set V , a binary operation
+ on V (i.e., a function + : V × V → V ), and a multiplication of elements of V with complex
numbers, i.e., · : C × V → V , satisfying the same rules as given in the previous definition, except
that R is to be replaced with C throughout.

   In like manner, we can define vector spaces over any field F; but only real or complex vector spaces
   are used in this course.

   Example 7.1.3 $\mathbb{R}^n$ is a real vector space for each n. $\mathbb{C}^n$ is a complex vector space; or it can be
   viewed also as a real vector space by restricting the scalars to be real numbers.

   Example 7.1.4 F(A → R), the space of functions from any set A into R, is a real vector space
   with the usual definitions (f + g)(a) := f(a) + g(a) and (λf)(a) := λf(a). The special case
   A = {1, 2, . . . , n} retrieves $\mathbb{R}^n$, if we view an n-tuple $(x_1, \dots, x_n)$ as a list of values (x(1), . . . , x(n))
   of a function x : {1, . . . , n} → R. Similarly, we can allow functions with complex values and
   view F(A → C) as a complex, or as a real vector space.
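   To make the pointwise definitions of Example 7.1.4 concrete, here is a minimal sketch in Python. The helper names (add, scale, from_tuple, to_tuple) are invented for this illustration and are not part of the notes; functions A → R are modelled as ordinary Python callables.

```python
from typing import Callable, Hashable

Func = Callable[[Hashable], float]

def add(f: Func, g: Func) -> Func:
    """Pointwise sum: (f + g)(a) := f(a) + g(a)."""
    return lambda a: f(a) + g(a)

def scale(lam: float, f: Func) -> Func:
    """Pointwise scalar multiple: (lam f)(a) := lam * f(a)."""
    return lambda a: lam * f(a)

def from_tuple(x):
    """View an n-tuple (x_1, ..., x_n) as a function on A = {1, ..., n}."""
    return lambda i: x[i - 1]

def to_tuple(f: Func, n: int):
    """Read the values (f(1), ..., f(n)) back off as an n-tuple."""
    return tuple(f(i) for i in range(1, n + 1))

# Hypothetical sample vectors in R^3, viewed as functions on {1, 2, 3}.
x = from_tuple((1.0, 2.0, 3.0))
y = from_tuple((10.0, 20.0, 30.0))
combo = add(scale(2.0, x), y)   # the vector 2x + y, computed pointwise
```

   Reading combo back with to_tuple(combo, 3) gives the usual componentwise result in $\mathbb{R}^3$, which is the sense in which the special case A = {1, . . . , n} retrieves $\mathbb{R}^n$.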

   Example 7.1.5 Nonempty subsets of F(A → R) that satisfy the (algebraic) closure property, namely that
   with any two elements f, g, their sum f + g, the negative −f , and all scalar multiples λf are
   also contained in the subset, are vector spaces in their own right (called subspaces of F(A → R)).
   In particular $C^0(X \to \mathbb{R})$ is a real vector space for any metric space X.

⇒ At this place, merge section 5.9 ‘Banach Spaces’

   7.2      The Riemann Integral

⇒ Copy this verbatim from Sec. 5.1 until Prop 5.1.2; and Ex 5.1.4
   (Note that Cor 5.1.3 in the notes, as it is written, is wrong; and the ‘proof’ reveals what is actually
   The definition of the Riemann integral generalizes readily to a multi-variable setting. We can integrate
   over boxes $[a_1, b_1] \times \cdots \times [a_n, b_n]$, and a partition of a box is the cartesian product of partitions of
   intervals. So if $a_i = x_0^{(i)} < x_1^{(i)} < \dots < x_{k_i}^{(i)} = b_i$ are partitions of $[a_i, b_i]$ into $k_i$ subintervals, then
   the box $[a_1, b_1] \times \cdots \times [a_n, b_n]$ is partitioned into $k_1 \cdots k_n$ subboxes $[x_{s_1}^{(1)}, x_{s_1+1}^{(1)}] \times \cdots \times [x_{s_n}^{(n)}, x_{s_n+1}^{(n)}]$,
   which we will re-index and call $B_j$ for $j \in \{1, 2, \dots, k_1 \cdots k_n\}$.

   [Figure: on the left, a partition of a 2-dimensional box; on the right, a subdivision that is not a partition.]

   The volume of a box $B = [a_1, b_1] \times \cdots \times [a_n, b_n]$ is defined to be $(b_1 - a_1) \cdots (b_n - a_n)$ and will be
   denoted as µ(B).
   The Riemann sums $S(P, \{c_j\})$ are defined as $\sum_j f(c_j)\,\mu(B_j)$, where j is an index counting all boxes and $c_j \in B_j$.
   The above theorems and their proofs carry over naturally.
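   The multi-variable Riemann sum can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names riemann_sum and uniform are invented here, the tags $c_j$ are chosen as midpoints of the subboxes (any point of $B_j$ would be admissible), and the integrand f(x, y) = xy on [0, 1] × [0, 2] is a made-up test case.

```python
from itertools import product

def riemann_sum(f, partitions):
    """Riemann sum of f over a box, tagging each subbox at its midpoint.

    partitions[i] is the increasing list x_0 < x_1 < ... < x_k of
    partition points of the i-th interval [a_i, b_i].
    """
    # Subintervals per coordinate: [(x_0, x_1), (x_1, x_2), ...]
    pieces = [list(zip(p[:-1], p[1:])) for p in partitions]
    total = 0.0
    for box in product(*pieces):          # one subbox B_j per combination
        volume = 1.0
        for lo, hi in box:
            volume *= hi - lo             # mu(B_j) = product of side lengths
        c = tuple((lo + hi) / 2 for lo, hi in box)   # tag c_j in B_j
        total += f(*c) * volume
    return total

def uniform(a, b, k):
    """Partition of [a, b] into k subintervals of equal length."""
    return [a + (b - a) * i / k for i in range(k + 1)]

# f(x, y) = x * y on [0, 1] x [0, 2]; the exact integral is 1.
approx = riemann_sum(lambda x, y: x * y, [uniform(0, 1, 50), uniform(0, 2, 40)])
```

   The cartesian product of the one-dimensional partitions is exactly what itertools.product enumerates, so the loop runs over all $k_1 \cdots k_n$ subboxes, here 50 · 40 of them.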

   7.3      Overview over the Lebesgue Integral

   In this section, we deviate from the theorem and proof style and give a heuristic overview with the
   basic ideas, examples and results given without proof. This section serves as a motivation, and to
   give the big picture. In later sections we will resume the rigorous exposition, filling in the details
and proving all the core results needed to obtain the full-strength Lebesgue integral. (Properties
of some further examples in this overview section may remain unproved inasmuch as only the
motivation, but not the rigorous development, depends on them. Specifically, Example 7.3.6 below
is of this kind; it has been patched together from results that can be found proved in Zygmund’s
book ‘Trigonometric Series’.) So from a strictly logical point of view, the present section can be
omitted altogether; however this section is written in the belief that such an omission would be utterly
inadvisable from a didactic point of view.

Deficiency of the Riemann Integral

In analysis, various versions of a completeness notion are crucial properties: For instance for real
numbers, we have the property that if $(x_n)$ is a Cauchy sequence, then $\lim x_n$ exists. Without
this property, even very simple calculus theorems like the intermediate value theorem would fail.
(Remember for instance that in Q, the continuous function $x \mapsto x^2 - 2$ does not take on the
intermediate value 0.)
Advanced subjects like differential equations (especially partial differential equations) benefit a lot
from carrying over calculus ideas from $\mathbb{R}^n$ to vector spaces of functions. We have seen examples:
The space of continuous functions (on a compact set) can be equipped with the supremum norm
(which sponsors the notion of uniform convergence), and we get a complete metric space; actually a
Banach space (this is a vector space in which a metric is defined by a norm, and which is a complete
metric space with this metric). In ODEs, the existence and uniqueness theorem for the initial value
problem (with Lipschitz right hand side) is proved by simple calculus like methods (Banach’s fixed
point theorem) in this vector space. In PDEs, similar ideas carry over, even though the Banach
fixed point theorem is a less universal tool there than it is in ODEs.
In a number of applications however, for instance Fourier series and calculus of variations (and PDEs
in the wake of calculus of variations methods), norms defined in terms of integrals come very naturally
and are strongly suggested by the very nature of the problem under consideration. Specifically,
the norm $\|f\|_2 := \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}$ of a function f on [a, b] is analogous to the euclidean norm
$\|f\|_2 := \left(\sum_i |f_i|^2\right)^{1/2}$ of a vector f in $\mathbb{R}^n$. Likewise the norm $\|f\|_1 := \int_a^b |f(x)|\,dx$ of a function f
on [a, b] is analogous to the taxi norm $\|f\|_1 := \sum_i |f_i|$ of a vector f in $\mathbb{R}^n$. But in contrast to $\mathbb{R}^n$,
where we have seen that these norms (or the metrics obtained from these norms) are topologically
equivalent, the corresponding norms for functions are not topologically equivalent, and in particular
they provide inequivalent notions of completeness. The space $C^0[a, b]$, which is complete with
respect to the max distance, is not complete with respect to either the norm $\|\cdot\|_2$ or the norm $\|\cdot\|_1$.
Even if we allowed all Riemann integrable functions, rather than only the continuous functions, we
would still not have a complete metric space with respect to any of the integral norms of interest.
This deficit prevents us from pursuing the analogy with finite dimensional calculus in sufficient
depth to prove useful results.
The Lebesgue integral is a generalization of the Riemann integral. Every Riemann integrable func-
tion is Lebesgue integrable, and both integral notions coincide for Riemann-integrable functions.
However, there are functions that are not Riemann integrable, but are Lebesgue integrable. The
Lebesgue integral will mend the deficits of the Riemann integral.
Clarification: When I say ‘Riemann integrable’, I am referring to the proper Riemann integral only.
Improper Riemann integrals like e.g. $\int_0^1 x^{-1/2}\,dx := \lim_{a \to 0+} \int_a^1 x^{-1/2}\,dx$ are not Riemann integrals,
but arise from them via a secondary limiting process. Keep this in mind when interpreting the
statement that every Riemann integrable function is also Lebesgue integrable. A similar statement
would not apply for improper Riemann integrals: The improper Riemann integral $\int_0^{\infty} \frac{\sin x}{x}\,dx$ is
often cited as an example that is not a Lebesgue integral. Well, it is not a Riemann integral either!
And $\lim_{N \to \infty} \int_0^N \frac{\sin x}{x}\,dx$ is perfectly good as a limit of Lebesgue integrals, too.
With Lebesgue’s notion of integral, certain functions that are too ‘wild’ to be Riemann integrable
will be integrable under Lebesgue’s definition. But let me stress that the main benefit of the
Lebesgue integral is not that we want to integrate ‘wild’ functions; rather we want to be able
to integrate wild functions so that their existence cannot screw up a clean theory. Even in situations
where all functions involved will (eventually) turn out to be continuous and hence Riemann integrable,
we may need the Lebesgue integral to arrive at that conclusion in the first place. It is analogous
to the following situation from elementary calculus: Even though a statement like $\sum_{n=1}^{\infty} 2^{-n} = 1$
can be made sense of and proved rigorously within Q alone, without reference to irrational numbers,
any reasonably general theorem that asserts the convergence of the series beforehand cannot
be proved within Q, as such a theorem should also cover cases where the sum turns out to be
irrational.
(Note on notation: We will study definite, not indefinite, integrals. In the absence of an explicit
domain of integration, the domain of integration should be understood from the context; usually
R or Rn .)
Let us study a few examples of functions that ‘ought to be’ integrable but are not, in the sense of
Riemann. Some claims will only be justified in a heuristic sense, because they serve for motivation
purposes only. A rigorous justification that the examples possess the claimed properties could be
supplied a-posteriori, once the Lebesgue integral is constructed.

Example 7.3.1 (One has to quote this, b/c everybody else does.) For any set M , its characteristic
function χM is defined by χM (x) = 1 if x ∈ M and χM (x) = 0 otherwise. The function χM for
M := Q ∩ [0, 1] is not Riemann integrable.

This example will seem less convincing once the road to the Lebesgue integral is traveled, because
we will say that this χM is ‘almost’ the constant 0, except that it differs from the constant 0 only
on the set M , which is so small that it is considered as immaterial by the integral (because it is
countable). So we will end up identifying χM with the zero function (which was Riemann integrable
all along). Much ado about nothing? We’ll see.
Let me clarify this notion of sets that are ‘too small to be seen by the integral’:

Definition 7.3.2 A set A ⊂ R is called a Lebesgue null set, if for every ε > 0, there exists a
sequence of intervals $I_j := \,]a_j, b_j[$ whose total length $\ell = \sum_j (b_j - a_j)$ satisfies ℓ < ε, and which
covers A, i.e., $A \subset \bigcup_j I_j$. We often simply say: null set, instead of Lebesgue null set. A similar
definition applies for subsets of $\mathbb{R}^n$, where boxes with total volume less than ε play the role of the intervals.
‘Box’ refers to a cartesian product of intervals of course.

Definition 7.3.3 We will use the wording ‘almost everywhere’ (ae) to mean ‘everywhere except
on a null set’.

Obviously every countable set $\{x_j \mid j \in \mathbb{N}\}$ is a null set, because the intervals $I_j := \,]x_j - \varepsilon/2^{j+2},\ x_j + \varepsilon/2^{j+2}[$
satisfy the requirements of the definition. Similarly, unions of countably many null sets are
null sets again. Null sets deserve to be considered as having the length (in R) or volume (in $\mathbb{R}^3$)
zero. Since the volume of nice sets A can be defined as $\int \chi_A$, the characteristic function of a null
set ought to be integrable, with integral 0. From this point of view, Example 7.3.1 is a good example,
not a contrived one: the integrability of $\chi_M$ says that M has a meaningful ‘total length’ (namely
0 in the case of M = [0, 1] ∩ Q).
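
The covering argument for countable sets can be checked on a finite initial segment with exact arithmetic. The following Python sketch is illustrative only: the enumeration of the rationals in [0, 1] by increasing denominator, the choice ε = 1/100, the cut-off at 200 points, and the convention that the index j starts at 1 are all assumptions made for this example.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """First `count` rationals in [0, 1], enumerated by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
                if len(out) == count:
                    break
        q += 1
    return out

def cover(points, eps):
    """Open intervals ]x_j - eps/2^(j+2), x_j + eps/2^(j+2)[ for j = 1, 2, ..."""
    return [(x - eps / 2 ** (j + 2), x + eps / 2 ** (j + 2))
            for j, x in enumerate(points, start=1)]

eps = Fraction(1, 100)
points = rationals_in_unit_interval(200)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
```

With j starting at 1, the j-th interval has length $\varepsilon/2^{j+1}$, so the total length stays below ε/2 no matter how many points of the enumeration are covered.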

Exercise 7.1 Based on the definition, prove that a countable union of null sets is a null set.

For the Lebesgue integral to be constructed, the following property will hold true: If $\int f$ is defined
and g = f a.e. (i.e., $\{x \mid g(x) \ne f(x)\}$ is a null set), then $\int g$ is also defined, and $\int g = \int f$. In
colloquial language, the integral is insensitive to changes of the integrand on a null set. A variant
of this property also holds for the Riemann integral. However in this case, we need to assume that
$\int g$ be defined, but we cannot conclude it a priori, as shown by Example 7.3.1.
Note: Just for the record, and without proof: A bounded function f : [a, b] → R is Riemann
integrable if and only if the set of points where it fails to be continuous is a null set.
Obviously χM in Example 7.3.1 is discontinuous on all of [0, 1], and you may believe that this set
is not a null set (even though this statement requires proof of course).

Example 7.3.4 The usual Cantor set C, which arises from [0, 1] by successively removing the
‘middle third’ of each remaining interval, is a Lebesgue null set. Its characteristic function is
Riemann integrable (with integral 0) because of the criterion just mentioned: $\chi_C$ is discontinuous
exactly on C. However, a slight modification of the construction gives a ‘fat Cantor set’ $C_*$ with
measure (total length) $\tfrac12$. Its characteristic function is not Riemann integrable any more. This fat
Cantor set can be constructed as follows: $C_0 := [0, 1]$, $C_1 := [0, \tfrac13] \cup [\tfrac23, 1]$. Given $C_n$, which is the
union of $2^n$ closed intervals, obtain $C_{n+1}$ by removing from the middle of each of these $2^n$ intervals
an open interval of length $2^{-n} 3^{-n-1}$. So, for instance $C_2 = [0, \tfrac{5}{36}] \cup [\tfrac{7}{36}, \tfrac13] \cup [\tfrac23, \tfrac{29}{36}] \cup [\tfrac{31}{36}, 1]$. Let
$C_* := \bigcap_n C_n$. Given that a total length of $\sum_{n=0}^{\infty} 2^n \times 2^{-n} 3^{-n-1} = \tfrac12$ has been removed, $C_*$ ‘ought
to have’ the remaining length $\tfrac12$ (and the notion of Lebesgue measure will confirm this). The points
of discontinuity of the function $\chi_{C_*}$ are exactly those in the set $C_*$. Since $C_*$ is not a null set, $\chi_{C_*}$
is not Riemann integrable. It is certainly annoying that the very same type of construction and the
very same type of heuristic reasoning concerning C and $C_*$ and their respective measure (length)
is borne out rigorously by the Riemann integral in one case, but not in the other.
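
The first stages of the fat Cantor set are easy to compute exactly. The sketch below uses Python's Fraction type; the function names (next_stage, stage, total_length) are invented for this illustration. It follows the rule stated in the example: from each of the $2^n$ intervals of $C_n$, remove a centered open interval of length $2^{-n} 3^{-n-1}$.

```python
from fractions import Fraction

def next_stage(intervals, n):
    """From C_n (a list of 2^n closed intervals) build C_(n+1)."""
    gap = Fraction(1, 2 ** n * 3 ** (n + 1))   # length 2^-n * 3^-(n+1)
    out = []
    for a, b in intervals:
        mid = (a + b) / 2
        out.append((a, mid - gap / 2))          # left remaining piece
        out.append((mid + gap / 2, b))          # right remaining piece
    return out

def stage(n):
    """Return C_n as a sorted list of closed intervals with exact endpoints."""
    intervals = [(Fraction(0), Fraction(1))]    # C_0 = [0, 1]
    for k in range(n):
        intervals = next_stage(intervals, k)
    return intervals

def total_length(intervals):
    """Sum of the lengths of the given disjoint intervals."""
    return sum(b - a for a, b in intervals)

C2 = stage(2)
lengths = [total_length(stage(n)) for n in range(6)]
```

The total length of $C_n$ works out to $\tfrac12 + \tfrac{1}{2 \cdot 3^n}$, which decreases to $\tfrac12$; this is the heuristic ‘remaining length’ of $C_*$. Of course, the computation says nothing yet about the measure of the limiting set itself.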

Exercise 7.2 Prove that for the Cantor set C, the characteristic function χC : R → R is discon-
tinuous on C and continuous on the complement of C.

Example 7.3.5 The function f defined by $f(x) := |x|^{-1/2} e^{-|x|}$ is not Riemann integrable, because
it is not bounded (and also because the Riemann integral allows only bounded domains of integration).
This deficit has been mended with the notion of the improper Riemann integral. With this
patch by means of a secondary limit, we can find $\int_{-\infty}^{\infty} f(x - a)\,dx = 2\Gamma(\tfrac12) = 2\sqrt{\pi}$ for every a. (The
precise value is not of essence here.)
It may now seem natural to wish that the function g given by $g(x) := \sum_{n=1}^{\infty} \frac{1}{2^n} f(x - a_n)$ be integrable
for any sequence $a_n$ for which this infinite sum produces a well-defined g: after all, infinite sums
are bread and butter of calculus. The value of $\int_{-\infty}^{\infty} g(x)\,dx$ ought to be $\sum_{n=1}^{\infty} \frac{1}{2^n}\, 2\sqrt{\pi} = 2\sqrt{\pi}$.
But if the sequence $(a_n)$ is dense in R, the function g is unbounded on every small interval, and
the ‘improper integral’ patch does not help here. You might wonder whether in such a case g itself
is actually still well defined, and this is not so easy to answer; it would however eventually turn
out that g does have a finite value for all x except on a certain null set. So, whereas I am not in
a position to immediately refute the legitimate concern on whether g is well-defined, it turns out
eventually that this concern can be allayed. (You might want to modify the example a bit and take
for $a_n$ the sequence $-1, 0, 1, -\tfrac12, \tfrac12, -\tfrac34, -\tfrac14, \tfrac14, \tfrac34, \dots$, which may give you a chance to prove the claim
that g is well defined outside a null set in a pedestrian way. I haven’t tried to do it, admittedly.)

Example 7.3.6 The function $g(x) := \sum_{n=1}^{\infty} \frac{1}{n} \cos(2^n x)$ seems natural to be considered in the context
of Fourier series. It has the following properties (which can be proved after the Lebesgue integral
is established, and which are not at all obvious): The series converges almost everywhere (i.e., for
all x outside a certain null set). This implies that g is a meaningful function. However the convergence
is almost nowhere absolute (i.e., the set of x for which the series is absolutely convergent
forms a null set). This indicates that it may be rather difficult to prove anything useful about g.
Moreover, the function g is unbounded on every small interval, and therefore nowhere continuous.
This means that it is way out of reach for the Riemann integral. Since g is unbounded, $g^2$ is even
worse unbounded. Nevertheless, we would like to argue that $\int_0^{2\pi} g(x)^2\,dx = \pi \sum_{n=1}^{\infty} \left(\tfrac1n\right)^2 = \pi^3/6$
(using a famous result of Euler’s, namely that $\sum_{n=1}^{\infty} n^{-2} = \pi^2/6$); and this will be true, in the sense of
the Lebesgue integral.

On a formal level, the calculation substantiating this hope is quite easy:

$$\int_0^{2\pi} g(x)^2\,dx = \int_0^{2\pi} \Big(\sum_n \frac1n \cos 2^n x\Big)\Big(\sum_m \frac1m \cos 2^m x\Big)\,dx = \sum_n \sum_m \frac{1}{mn} \int_0^{2\pi} \cos 2^n x \,\cos 2^m x\,dx = \sum_n \frac{1}{n^2}\,\pi$$

because the integral under the double sum vanishes when m ≠ n and is π when m = n. But
this formal calculation is nearly impossible to justify within the realm of classical calculus, because
for almost all x, the series we are multiplying are only conditionally convergent, whereas
absolute convergence would be needed for at least one factor to justify multiplying out the series.
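
For finite partial sums, where nothing needs to be justified, the orthogonality relation and the resulting formula can be checked numerically. The Python sketch below is an illustration under stated assumptions: the helper names are invented here, the cut-off N = 5 is arbitrary, and the equispaced rectangle rule is used because it integrates trigonometric polynomials of low degree over a full period exactly up to rounding. It says nothing about the infinite series, which is the whole difficulty.

```python
import math

def integrate_periodic(h, samples=4096):
    """Equispaced rectangle rule on [0, 2*pi]; exact (up to rounding) for
    trigonometric polynomials of degree below `samples`."""
    step = 2 * math.pi / samples
    return step * sum(h(k * step) for k in range(samples))

def inner(n, m):
    """Integral over [0, 2*pi] of cos(2^n x) * cos(2^m x)."""
    return integrate_periodic(lambda x: math.cos(2 ** n * x) * math.cos(2 ** m * x))

def partial_sum(N):
    """f_N(x) = sum_{n=1}^N (1/n) cos(2^n x)."""
    return lambda x: sum(math.cos(2 ** n * x) / n for n in range(1, N + 1))

N = 5
f = partial_sum(N)
norm_squared = integrate_periodic(lambda x: f(x) ** 2)
expected = math.pi * sum(1 / n ** 2 for n in range(1, N + 1))
```

Here inner(n, m) comes out as π for m = n and as 0 for m ≠ n (within rounding), and norm_squared agrees with $\pi \sum_{n=1}^{N} n^{-2}$.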

The Completion Process

Let’s start with either the set (vector space) $C^0[a, b]$ of continuous functions on an interval [a, b],
or the vector space of Riemann integrable functions R[a, b]. Actually, you could also use the vector
space S[a, b], which by definition consists of uniform limits of step functions; and step functions are
(finite) linear combinations of characteristic functions of intervals. (The inclusions $C^0 \subset S \subset R$
can be seen easily, and they are strict.)
We define $\|f\|_{L^1} := \int_a^b |f(x)|\,dx$. Previously we had called this norm $\|\cdot\|_1$. It is easy to see that
$\|\cdot\|_{L^1}$ is a norm on $C^0[a, b]$. However it is only a seminorm on R[a, b] (or S[a, b]). By definition,
a seminorm satisfies all properties of a norm with the exception of “$\|f\| = 0 \implies f = 0$”. For
instance, $\chi_{\{x_0\}}$, the characteristic function of a single point, satisfies $\|\chi_{\{x_0\}}\|_{L^1} = 0$.
$C^0$ with $\|\cdot\|_{L^1}$ is not a Banach space. For instance the sequence $f_n$ in $C^0[-1, 1]$, defined by
$f_n(x) := nx/\sqrt{1 + (nx)^2}$, is a Cauchy sequence but it has no limit in $C^0$. The function f(x) :=
sign x, which ‘ought to be’ the limit of $f_n$, is not in $C^0$.
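
A numerical experiment makes the contrast between the integral norm and the max distance visible for this sequence. The sketch below is only an illustration and no substitute for a proof: the helper names and the grid of 20000 cells are choices made for this example, and the midpoint rule gives approximations, not exact values.

```python
import math

def f(n, x):
    """f_n(x) = n x / sqrt(1 + (n x)^2), a continuous function on [-1, 1]."""
    return n * x / math.sqrt(1 + (n * x) ** 2)

def sign(x):
    """sign x, with sign 0 = 0."""
    return (x > 0) - (x < 0)

def l1_distance(g, h, a=-1.0, b=1.0, cells=20000):
    """Midpoint-rule approximation of the integral of |g - h| over [a, b]."""
    step = (b - a) / cells
    return step * sum(abs(g(a + (k + 0.5) * step) - h(a + (k + 0.5) * step))
                      for k in range(cells))

def sup_distance(g, h, a=-1.0, b=1.0, cells=20000):
    """Largest sampled value of |g - h|; a lower bound for the sup distance."""
    step = (b - a) / cells
    return max(abs(g(a + (k + 0.5) * step) - h(a + (k + 0.5) * step))
               for k in range(cells))

def closed_form(n):
    """Exact L1 distance of f_n and sign on [-1, 1], by elementary calculus."""
    return 2 * (1 + 1 / n - math.sqrt(1 + n * n) / n)

l1 = {n: l1_distance(lambda x: f(n, x), sign) for n in (1, 10, 100)}
sup100 = sup_distance(lambda x: f(100, x), sign)
```

The $L^1$ distances shrink roughly like 2/n, whereas the sampled sup distance stays close to 1. So $f_n$ approaches sign x in the integral norm but not uniformly.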

Exercise 7.3 Prove these statements; namely for $f_n(x) := nx/\sqrt{1 + (nx)^2}$, f(x) := sign x and
$g \in C^0[-1, 1]$:
(a) Show that $f_n \to f$ uniformly on any compact interval that does not contain 0.
(b) Show that if $f_n \to f$ uniformly on [a, b], then $\|f_n - f\|_{L^1[a,b]} := \int_a^b |f_n(x) - f(x)|\,dx \to 0$. [Note: A
similar statement assuming pointwise instead of uniform convergence is false, so don’t attempt to
prove that.]
(c) Show that if $\int_{-1}^{1} |f_n(x) - g(x)|\,dx \to 0$, then $\int_a^b |f_n(x) - g(x)|\,dx \to 0$ for [a, b] ⊂ [−1, 1].
(d) Show by epsilontics and decomposition of [−1, 1] into pieces [−1, −a] ∪ [−a, a] ∪ [a, 1] that $f_n$ is
a Cauchy sequence with respect to $\|\cdot\|_{L^1}$.
(e) Put the pieces together and show that $f_n$ does not converge to any continuous function g with
respect to the $\|\cdot\|_{L^1}$ norm. Be sure to argue carefully in distinguishing uniform, pointwise and
$\|\cdot\|_{L^1}$ convergence and employing valid implications among these notions.

Similarly (but far more difficult to prove), the finite sums $f_N$ defined by $f_N(x) := \sum_{n=1}^{N} \frac{1}{n} \cos(2^n x)$
form a Cauchy sequence in $C^0$ that has no limit.

The very cheap method of completion:

There is a ‘soft functional analysis’ way to achieve completeness: From any normed vector space X,
we define a vector space CX whose elements are Cauchy sequences from X. Addition of Cauchy
sequences is defined term by term: The sum of the Cauchy sequence $(f_n)_{n=1}^{\infty}$ and the Cauchy sequence
$(g_n)_{n=1}^{\infty}$ is the Cauchy sequence $(f_n + g_n)_{n=1}^{\infty}$. (It is easy to see that this is indeed a Cauchy
sequence.) Similarly, a multiplication of Cauchy sequences with real numbers is defined. On the
vector space CX, an equivalence relation is defined as follows: The Cauchy sequence $(f_n)$ is said
to be equivalent to the Cauchy sequence $(g_n)$ iff $\|f_n - g_n\| \to 0$. Consider the set of equivalence
classes of Cauchy sequences and call it $\hat{X}$. It can be shown that $\hat{X}$ is a vector space, that a norm
can be defined on $\hat{X}$, and that X embeds into $\hat{X}$, in that f ∈ X can be retrieved as the equivalence
class of the constant sequence (f, f, f, . . .) in $\hat{X}$. Finally one can show that $\hat{X}$ is a Banach space.
It is called the completion of X.
It turns out that, in this sense, the completion of $C^0[a, b]$ with respect to the norm $\|\cdot\|_{L^1}$ gives
$L^1[a, b]$, the ‘vector space of Lebesgue integrable functions on [a, b]’. This fact seems convenient for
the purpose of illustration. It tells us: What we are doing is indeed analogous to the completion
that leads us from Q to R in calculus. However, for purposes of analysis, this general-abstract
approach is as useful as a junk car that breaks down in the first curve. The principal difficulty
is that there is no way to identify the abstract equivalence classes of Cauchy sequences with
actual functions [a, b] → R.
As a matter of fact, strictly speaking, the elements of $L^1[a, b]$ are not functions, but equivalence
classes of functions, with respect to the equivalence relation $=_{ae}$. As mentioned above, we say
$f =_{ae} g$ if the set $\{x \mid f(x) \ne g(x)\}$ is a Lebesgue null set. (It’s easy to show that this defines
an equivalence relation.) Without this identification, we would import the same problem into $L^1$
that we already observed in S and R: $\|\cdot\|_{L^1}$ would be a seminorm, not a norm. Nevertheless, by
negligent use of language, we sometimes call the elements of $L^1$ ‘functions’ rather than ‘functions
modulo equality a.e.’.
But even the identification of abstract elements of $\hat{X}$ with such equivalence classes of functions
with respect to the relation $=_{ae}$ cannot be made merely within the abstract construction of $\hat{X}$. To
prove that the abstract $\hat{X}$ indeed consists of functions modulo $=_{ae}$, one has to get one’s hands dirty
and do hard analysis instead of soft functional analysis. And then, after the fact, one can see that
the abstract $\hat{X}$ is $L^1$, if one starts with X either $C^0$, or R, or S.
The completion method just outlined is cheap and virtually useless: Cheap in that it only uses
general abstract principles about metric spaces, dodging any detail work concerning the specific
example at hand. Useless for the very same reason: it gives no insight into the specific metric
space(s) at hand. It does show that the analogies with R about completeness that were made above
are indeed valid and not merely vague metaphors; more precisely, it would show this, once we have
done the detail work and then proved that the outcome of the detailed completion construction
falls into this general framework.

The Daniell Approach

This approach of constructing the Lebesgue integral has the advantage that it does not a-priori
require any measure theory. Any notions relating to ‘measure’ (the abstract generalization of the
naive volume) will come out of the integral once it is constructed.
One begins with an ‘elementary integral’, which could be the Riemann integral on R[a, b], or its
restriction to S[a, b] or $C^0[a, b]$. (Starting with an elementary integral that is even a bit less powerful
than the Riemann integral makes life a bit easier in the elementary courses without later exacting
a price when constructing the Lebesgue integral from it.) We call these functions elementary-integrable.
We use E for either $C^0$ or R or S. We can also let E stand for compactly supported¹
$C^0$ functions on R, or $\mathbb{R}^n$. In either case (and in a whole lot of other cases not in focus here), the
following extension procedure applies:
One extends the integral to other functions by a limiting process that builds on pointwise monotonic
convergence and seems well motivated in view of the above examples of functions that ‘ought to
be’ integrable.
For the extension procedure, the following properties of the elementary integral are crucial: It is
monotonic ($f \le g \implies \int f \le \int g$), and its domain is a vector space of real valued functions with
the extra property that for f and g integrable, their max{f, g} and min{f, g} are also integrable.
Moreover, one needs the property that for monotonic (increasing or decreasing) sequences $f_n$ of
integrable functions whose limit f is again integrable, it holds that $\int f_n \to \int f$ (continuity of the
elementary integral with respect to monotonic convergence). The Riemann integral, restricted to
$C_{\mathrm{cpt}}$, satisfies these hypotheses.
In a first step we define the integral for ‘lower functions’ and for ‘upper functions’: A function is
called a lower function (∈ L) if it is the (pointwise) limit of an increasing sequence of elementary-integrable
functions. A function is called an upper function (∈ U) if it is the (pointwise) limit
of a decreasing sequence of elementary-integrable functions. (Values +∞ are allowed for f ∈ L,
and values −∞ are allowed for f ∈ U.) As an aside, let it be noted that in the case $E = C^0$,
the set L consists exactly of the lower semicontinuous functions, i.e., functions f that satisfy
$\liminf_{x_n \to x} f(x_n) \ge f(x)$, and the set U consists of the upper semicontinuous functions, i.e., functions
f that satisfy $\limsup_{x_n \to x} f(x_n) \le f(x)$.

Exercise 7.4 Prove: A function $f : \mathbb{R}^n \to \mathbb{R}$ is lower semicontinuous if there exists a sequence
$(f_k)$ of continuous functions $f_k : \mathbb{R}^n \to \mathbb{R}$ such that $f_k \nearrow f$. Hint: Make sure not to overlook the
fact that δ may depend on k. Show first $f_k(x_0) \le \liminf_{x \to x_0} f(x) + \varepsilon$ for every k and every ε.

Note: The converse is also true, but we have no need to prove this at the moment.
[I have coined the words lower function and upper function ad hoc and am not aware whether there
is a ‘canonical name’ for them. Neither Royden’s Real Analysis nor Floret’s (German language)
book on measure and integration theory, which I have used as references, coins a term at all.]
If $E \ni f_n \nearrow f \in L$, we define $\int f := \lim \int f_n$ (and the limit exists by monotonicity; it may be +∞).
A similar definition is made for f ∈ U.
Of course one needs to show that these new integrals on L and U coincide with each other and/or
with the elementary integral for those functions where several are defined, and that the usual
properties (like linearity, monotonicity) are still valid. Some work is to be done here, but nothing
out of the ordinary.

¹ A function $f : \mathbb{R}^n \to \mathbb{R}$ is called compactly supported if the closure of the set $\{x \mid f(x) \ne 0\}$ is compact; or
equivalently by Heine-Borel, if there is a bounded set S such that f vanishes on the complement of S.

In a second step, one defines an upper and lower integral for an arbitrary function (reminiscent of
the upper and lower Riemann sums): For a function f define the upper integral

$$\overline{\int} f := \inf\left\{ \int g \;\middle|\; g \ge f;\ g \in L \right\}$$

and the lower integral

$$\underline{\int} f := \sup\left\{ \int g \;\middle|\; g \le f;\ g \in U \right\}$$

Note that we put ‘upper functions’ (= functions constructed by approximation from above) below
f , and ‘lower functions’ (= functions constructed by approximation from below) above f . The
other way round, namely approximating f from below by means of functions that themselves were
approximated from below would be a bit redundant and far less powerful. For instance, we would
get $\chi_{C_*} \in U$ all right, but the supremum of those functions in L that are ≤ $\chi_{C_*}$ would be the zero
function. This would, according to the following definition, disqualify $\chi_{C_*}$ from being integrable:
A function is called integrable if its upper and lower integral coincide and are finite. If the upper
and lower integrals coincide, but have the common value +∞ or −∞, we still permit ourselves
to write $\int f = \infty$ (or $\int f = -\infty$) in these cases respectively. Whether we start with R, S, or $C^0$,
the resulting integral turns out to be the same, and this is the Lebesgue integral.

Exercise 7.5 Show: If f is lsc and f ≤ χC∗ , then f ≤ 0. (χC∗ refers to the characteristic function
of a fat Cantor set, cf. Example 7.3.4)

Again some work needs to be done to establish that this notion of the integral is not in conflict
with preceding notions and that the integral satisfies the fundamental properties. On the set of
integrable functions, we have the seminorm $\|f\|_{L^1} := \int |f(x)|\,dx$. A function f satisfies $\|f\|_{L^1} = 0$
if and only if f = 0 almost everywhere (a.e.). By identifying functions that are equal a.e., $\|\cdot\|_{L^1}$
becomes a norm on the equivalence classes. This set of equivalence classes of integrable functions
modulo the relation ‘equal a.e.’ is called $L^1$. It turns out that $L^1$ with this norm is a Banach space,
and that $C^0$ is dense in $L^1$, i.e., for every function (i.e., equivalence class of functions modulo
equality a.e.) f in $L^1$, there exists a sequence of continuous functions $f_n$ such that $\|f_n - f\|_{L^1} \to 0$.
Note that the definition of $L^1$ has inherently used one property of the Lebesgue integral that you
may not have anticipated: If f is integrable, then |f| is also integrable. This, in particular, implies
that sin x/x on R is not Lebesgue integrable. There is no notion of conditional convergence in the
theory of the Lebesgue integral; it must be absolute convergence or nothing. This feature can be
perceived as a necessary consequence of the fact that in the construction of the Lebesgue integral,
the domain of the functions does not need to have a notion of order: it works equally for R and
$\mathbb{R}^n$. The area under the graph of a function f : R → R is not constructed by successively adding
up ‘thin vertical slices from left to right’ as is done with Riemann’s integral, but rather in terms of
horizontal stripes ‘bottom up’ or ‘top down’. Absolute convergence is the notion that is robust to
changes in the order of summation, whereas conditional convergence may well produce expressions
∞ − ∞ when the order of terms is changed.
We’ll list the fundamental properties of the Lebesgue integral in the next section. These properties
will in particular imply a ‘saturation’ feature, namely: if we start over with the extension procedure,
using the Lebesgue integral as an elementary integral, we do not gain further integrable functions.

Note: The extension process is rather general: We might define a function f : N → R (i.e.,
a sequence) as ‘elementarily integrable’ iff fn = 0 for all but finitely many n. In this case, the
elementary integral would be the (finite) sum Σ fn . The extension process would yield as integral
notion the absolute convergence of the series Σ fn . This stresses an analogy between integrals and
infinite series, and it allows the strong exchange-of-limits theorems of Lebesgue’s theory of
integration to be carried over to absolutely convergent series.

Key properties of the Lebesgue integral

Preface: Occasionally, the hypothesis ‘f measurable’ is going to show up. This notion will be
defined below. For the moment, simply be aware that in all practical applications this hypothesis
will be satisfied almost trivially. The only way to ever encounter a non-measurable function is to
actively look for one. I prefer to collect all key properties together here rather than postponing
those that involve measurability.
Theorems are formulated for integrals over all of Rn . This is no loss of generality, since ∫_M f =
∫_Rn (f χM ). They will be re-quoted with a reference number, and proved, in due time.
Theorem: The integral is linear; also, if f and g are integrable, then so are max{f, g} and
min{f, g}. If f is integrable and k ≥ 0 is a constant, then min{f, k} and max{f, −k} are also
integrable. If f and χA are integrable, then f χA is integrable.
Theorem: (monotone convergence) Assume fn ∈ L1 is a non-decreasing sequence and let
f := lim fn . Then ∫ fn → ∫ f ; in particular, f ∈ L1 if lim ∫ fn < ∞. (An obvious analog holds for
non-increasing sequences.)
Theorem: (Fatou’s lemma) Let fn ∈ L1 be non-negative and assume fn → f a.e. Assume
lim inf ∫ fn < ∞. Then f ∈ L1 and ∫ f ≤ lim inf ∫ fn .
There are three key types of examples to show that equality needn’t hold in Fatou’s lemma:
(a) spreading: With (e.g.) g(x) := 1/(1 + x²), we let fn (x) = (1/n) g(x/n) = n/(n² + x²). Then ∫ fn = π
for all n, but fn → 0 pointwise.
(b) concentration: fn (x) := n g(nx) = n/(1 + n²x²). Again ∫ fn = π independent of n, but fn → 0
a.e. (everywhere except at x = 0). [If you want convergence everywhere, you can modify this example,
say take g(x) := |x|/(1 + x⁴).]
(c) running off to infinity: fn (x) := g(x − n)
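The three examples can also be checked numerically. The following is only a sketch: the helper names (spreading, concentration, running_off, integral_over_R) and the quadrature scheme (midpoint rule after the substitution x = tan t) are our own choices for illustration, not part of the theory. For each family, the integral stays at π while the value at a fixed point, here x = 1, tends to 0.

```python
import math

def g(x):
    """Profile g(x) = 1/(1 + x^2); its integral over R equals pi."""
    return 1.0 / (1.0 + x * x)

def spreading(n, x):
    # (a): f_n(x) = (1/n) g(x/n) = n / (n^2 + x^2)
    return g(x / n) / n

def concentration(n, x):
    # (b): f_n(x) = n g(n x) = n / (1 + n^2 x^2)
    return n * g(n * x)

def running_off(n, x):
    # (c): f_n(x) = g(x - n)
    return g(x - n)

def integral_over_R(f, steps=20000):
    """Approximate the integral of f over R: substitute x = tan(t),
    dx = (1 + x^2) dt, and apply the midpoint rule on (-pi/2, pi/2)."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        x = math.tan(-math.pi / 2 + (k + 0.5) * h)
        total += f(x) * (1.0 + x * x)
    return total * h

for family in (spreading, concentration, running_off):
    for n in (1, 5, 20):
        area = integral_over_R(lambda x: family(n, x))
        # The area stays near pi; the value at x = 1 shrinks as n grows.
        print(family.__name__, n, round(area, 6), family(n, 1.0))
```

In each case lim inf ∫ fn = π, whereas the integral of the limit function is 0, so the inequality in Fatou’s lemma is strict.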

Exercise 7.6 For each of these three examples, sketch graphs of f1 , f2 and f4 in a common coor-
dinate system (one figure with three graphs for each example).

Theorem: (dominated convergence) Suppose fn ∈ L1 and fn → f a.e. Assume that there
exists g ∈ L1 such that |fn | ≤ g for all n. Then ∫ fn → ∫ f . In particular f ∈ L1 .
This is almost the ‘one size fits all’ theorem for all types of exchange of limits. E.g., if we apply it to
difference quotients, it gives us a criterion for differentiating under the integral sign.
Theorem: If f ∈ L1 , then |f | ∈ L1 . The converse holds under the extra hypothesis that f is
measurable. — Moreover, if f is measurable and |f | ≤ g with g ∈ L1 , then f ∈ L1 .
We had noted the first part already above. The issue with the measurability hypothesis in the converse
is simple: There exist bounded sets M that are geometrically so wild that χM is not integrable, even
in the Lebesgue sense. Then obviously 2χM − 1 isn’t integrable either, but its absolute value is the
constant 1, which is integrable (over bounded sets).

Theorem: (Fubini) Let f ∈ L1 (Rn+m ). Then g(y) := ∫_Rn f (x, y) dx exists and is finite for
a.e. y ∈ Rm (i.e., the function f (·, y) ∈ L1 (Rn ) for a.e. y), and g ∈ L1 (Rm ), and ∫ f (x, y) d(x, y) =
∫_Rm g(y) dy.
Theorem: (Tonelli) – partial converse of Fubini Let f : Rn+m → R be measurable and
assume the iterated integral ∫_Rm ∫_Rn |f (x, y)| dx dy exists and is finite. Then f ∈ L1 (Rn+m ), and
the conclusions of Fubini’s theorem hold.
These two are for practical calculation of multi-variable integrals in the way usually done for Riemann
integrals, but under weaker hypotheses. Tonelli needs the hypothesis about |f | rather than f in the
repeated integral, because otherwise the pre-specified order of integration could camouflage ‘∞−∞’ type
of problems: E.g., take g(x) := x/(1 + x²)². Clearly g ∈ L1 (R), and ∫ g = 0. But for f (x, y) := g(x),
clearly f ∉ L1 (R2 ). Trying to integrate f over y first gives +∞ for all positive x and −∞ for all
negative x. But now that we are forced to assume the iterated integral for |f | rather than f , we also
need the measurability hypothesis, for the same reason as in the previous theorem on |f |.
Note: Since the Riemann integral is the restriction of the Lebesgue integral to more ‘benign’
functions, the convergence theorems, in particular the dominated convergence theorem, have
corollaries for the Riemann integral. The distinction is that for the Riemann integral, we would need
to assume the Riemann integrability of the limit function, whereas its Lebesgue integrability is a
conclusion. But if in a given situation we already know enough about the limit function to check
that it is Riemann integrable, then most likely we don’t need the dominated convergence theorem
at all in that situation. An analogous situation is the following: If we already know that Σ_{n=0}^∞ 2^−n
is a rational number, then it is probably because we know that it is 2. But in this case, we do not
need any abstract theorem to assure us of the convergence of this series. In contrast, if we study
Σ_{n=0}^∞ 1/n!, it is useful to know about convergence of this series before knowing its value, because
we then define the number e as the value of this series.


Measure is an abstract notion that reduces to the notion of length/area/volume for nice sets for
which these latter notions are naively defined. We intend to define the measure of a set as the
integral of its characteristic function. There is however a subtlety to be considered before we can
capture the definition precisely. All these subtleties are produced as consequences of the axiom
of choice from abstract set theory. The axiom of choice says: Given a set A whose elements are
non-empty sets A, we may define a set R consisting of exactly one element taken from each set
A ∈ A.
This principle allows us to construct counterintuitive geometric examples:
The Banach Tarski Paradox (BTP): Let B be the closed unit ball in R3 . It is possible to
write B as a disjoint union of finitely many sets Bi and to have congruence mappings φi : R3 → R3
(i.e., isometries) in such a way that the union ∪ φi (Bi ) is the union of two closed unit balls!
This means that any generalization of the naive volume notion cannot be defined on arbitrary sets,
but must exclude certain ‘very wild’ sets, for which a volume (measure) would not be defined.
There needs to be a notion of a ‘measurable set’ M . And (some of) the pieces Bi in the BTP
should be ‘not measurable’.
Basically the BTP is surprisingly elementary to prove, given its radically counterintuitive nature.
It is not so different from the fact that, for instance, Z is the disjoint union of two sets 2Z =
{. . . , −4, −2, 0, 2, 4, . . .} and 2Z + 1 = {. . . , −3, −1, 1, 3, 5, . . .}; but both of these sets and their
union Z are all of the same ‘size’ in that each set is in 1-1 correspondence with the other. What is
needed as an extra ingredient to get the BTP is a study of the group of isometries of R3 to do the
decomposition in such a way that the pieces are geometrically congruent (i.e., isometric).
It has been shown that the existence of (Lebesgue-)non-measurable sets in Rn needs the axiom
of choice (relative to the other axioms of set theory); this implies in particular that by constructive
means (i.e., without using the axiom of choice), we will never encounter a non-measurable set (nor
a non-measurable function).
How could we define measurability of a set? The condition ‘∫ χM exists and is finite’ is too strict,
because it would disqualify M = R3 , which deserves to be called measurable, albeit with infinite
measure. The condition ‘∫ χM exists’ (finite or infinite) is too lax, because it would let a wild
bounded set (like the Bi from BTP) hide in the shadow of a bona-fide measurable set of infinite
measure: Say Bi is a subset of the left half space, and let H be the right half space. Then
∫ χ_{Bi ∪ H} = ∞, but we do not want to call Bi ∪ H measurable.
Measure theory takes an axiomatic approach to the notion of measurable, and then constructs the
Lebesgue measure as an example of this abstract notion. One popular approach to the Lebesgue
integral builds on first constructing the Lebesgue measure and then defining the Lebesgue integral
in terms of the measure. The Daniell approach comes from the opposite direction: first the integral,
then the measure. The subtlety about measurable sets therefore makes it necessary to define a notion of
‘measurable function’; then we can call a set M measurable, if its characteristic function χM is
measurable; and it will turn out that under this hypothesis ∫ χM is defined (finite or +∞), and we
call this value the measure of M .
Definition: (a) A function f is measurable, if it is the limit (a.e.) of a sequence of integrable
functions. (b) Equivalently, f is measurable, if for every pair of integrable functions g and h
satisfying g < h, the ‘cut off function’ max{g, min{f, h}} is integrable.
The implication (a) =⇒ (b) follows from the dominated convergence theorem. The implication
(b) =⇒ (a) follows by choosing h = −g = n χ_{Bn(0)} .
Definition: A set M is measurable if and only if its characteristic function χM is measurable.
The measure of a measurable set is the integral of its characteristic function (finite or infinite).
Sets of measure zero are exactly the null sets introduced earlier.
(For those who know the measure theoretic approach: our notion of measurable is the one called
‘Lebesgue measurable’, not the one called ‘Borel measurable’ there. I will not try to outline the
equivalence of the Daniell approach with the measure theoretic one. However, I list the following
equivalent notions of Lebesgue-measurability for reference.)
Theorem: A function f : Rn → R is measurable if and only if it is the limit (a.e.) of a sequence of
continuous functions. — A function f : Rn → R is measurable if and only if the sets {x | f (x) > α}
are measurable for every α ∈ R. — Complements, differences, countable unions, and countable
intersections of measurable sets are measurable. Open sets are measurable; so are closed sets.
Theorem: Continuous and integrable functions are (trivially) measurable. If g : Rn → R is
continuous and fi (i = 1, . . . , n) are measurable, then the function x → g(f1 (x), . . . , fn (x)) is also
measurable. In particular, sums, products, quotients with non-zero denominator, of measurable
functions are measurable. Limits (a.e.) of sequences of measurable functions are measurable.
Note: If f is measurable and g continuous, f ◦ g need not be measurable. However, if f is
measurable and g is C 1 , then f ◦ g is measurable. A counterexample in the ‘g continuous’ case can
be constructed via the ‘devil’s staircase function’ discussed below.

Popular counterexamples

The Banach Tarski Paradox we mentioned already.
A non-measurable set: But here is an even simpler (if less spectacular) example of a non-
measurable set: On R, call two numbers equivalent r1 ∼ r2 if and only if their difference is
rational. Select one representative from each equivalence class. Call the set of representatives S.
Let S0 := {x − ⌊x⌋ | x ∈ S}. This serves to choose the representatives in [0, 1]. For
each q ∈ Q, let Sq := {x + q | x ∈ S0 }. All the Sq are disjoint, since we have only chosen one
representative from each equivalence class. If S0 had positive measure m, then Sq would have
the same measure m for each q, and the countable union T := ∪_{q∈Q; |q|≤1} Sq would have infinite
measure. But T ⊂ [−1, 2]. If S0 had measure 0, then the countable union T ′ := ∪_{q∈Q} Sq would
also have measure 0. But T ′ = R. So S0 cannot be measurable.
The devil’s staircase:     This is a continuous (and monotone) function that has a classical
derivative almost everywhere (namely on the complement of the Cantor set C), but that is not
constant. Namely, define ψ inductively on the intervals that are the complements of the Cantor set:
ψ(x) = 1/2   for   x ∈ ]1/3, 2/3[,
ψ(x) = 1/4   for   x ∈ ]1/9, 2/9[,
ψ(x) = 3/4   for   x ∈ ]7/9, 8/9[,
ψ(x) = 1/8   for   x ∈ ]1/27, 2/27[,
ψ(x) = 3/8   for   x ∈ ]7/27, 8/27[,
ψ(x) = 5/8   for   x ∈ ]19/27, 20/27[,
ψ(x) = 7/8   for   x ∈ ]25/27, 26/27[ etc.
The function can be extended continuously onto all of [0, 1], and its derivative is defined
and 0 on the complement of C. (It is +∞ on the null set C.) But ψ is not constant.
Sometimes the function φ(x) := (1/2)(x + ψ(x)) is useful, because it is a homeomorphism, i.e.,
continuous with a continuous inverse.
For instance, for the Cantor set C, we have φ(C), a compact set with measure 1/2, and φ([0, 1] \ C)
an open set with measure 1/2. It is possible to have a non-measurable subset M of φ(C). But
N := φ^−1 (M ) ⊂ C is a null set. This means that χN is measurable, but χN ◦ φ^−1 = χM is not
measurable, even though φ^−1 is continuous.
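To experiment with ψ and φ, one can evaluate the devil’s staircase from the ternary digits of x. The following is a minimal sketch under our own conventions: the names psi and phi, the depth cut-off, and the digit-by-digit algorithm are illustrative choices, and floating point arithmetic only approximates the true ternary expansion.

```python
def psi(x, depth=60):
    """Devil's staircase on [0, 1], computed from the ternary digits of x.

    Digits 0/2 are turned into binary digits 0/1; the first digit 1 means
    that x lies in a removed middle third, where psi is constant.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = min(int(x), 2)  # next ternary digit (guard against rounding up to 3)
        x -= digit
        if digit == 1:
            return value + scale
        if digit == 2:
            value += scale
        scale /= 2.0
    return value

def phi(x):
    """phi(x) = (x + psi(x)) / 2, strictly increasing from 0 to 1."""
    return 0.5 * (x + psi(x))

# Plateau values from the table above, and the jump from psi(0) to psi(1).
print(psi(0.5), psi(0.15), psi(0.8), psi(1.0) - psi(0.0))
```

On each removed interval ψ is constant, so φ has slope 1/2 there, which is consistent with φ([0, 1] \ C) having measure 1/2.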
A study of properties of derivatives in connection with Lebesgue-integrable functions is better left
to a more advanced course in real analysis. You might be enticed to say, concerning the devil’s
staircase function ψ: Since ψ ′ is defined and 0 almost everywhere (and undefined only on a null
set, namely C), we can integrate ψ ′ in the sense of the Lebesgue integral. But if we do this, we
observe that
                                                  ∫_0^1 ψ ′ (x) dx = 0 ≠ 1 = ψ(1) − ψ(0)
So we seem to forfeit the fundamental theorem of calculus. If this were indeed the case, the whole
theory would be dead on arrival. The issue with this is that ‘classical derivative almost
everywhere’ is not a useful notion of derivative; to regain the fundamental theorem of calculus
(FTC) in such a general setting requires a more sophisticated generalization of the notion of
derivative. Doing so would make the FTC the cornerstone of the definition of derivative, and
integration would need to be studied prior to this, as we are indeed doing here.
So if you find it paradoxical that we deal with integration first, be alerted that it is not paradoxical
at all. However, we will not pursue the depth indicated here when dealing with derivatives later,
but rather study them in a classical abstract multi-variable setting.

7.4     Vector Lattices and Elementary Integrals

The starting point of our approach to integration is a vector space of real valued functions that is
closed under the operations max and min. On such a space we define an integral whose primary
properties are linearity, monotonicity, and a benign behavior with respect to monotone convergence.

Definition 7.4.1 A vector lattice is a real vector space V consisting of functions X → R (hence a
subspace of F(X → R)) satisfying the extra property that f, g ∈ V implies max{f, g} ∈ V .

Corollary 7.4.2 In a vector lattice V , if f, g ∈ V then also min{f, g} ∈ V , |f | ∈ V , f+ :=
max{f, 0} ∈ V , f− := max{−f, 0} ∈ V .

The proof is an easy exercise.
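The identities behind this corollary can be tried out directly. In the sketch below, functions are modelled as Python callables and every operation is built only from max, negation and the zero function, mirroring the vector lattice axioms; the helper names are hypothetical and chosen for this illustration only.

```python
def pointwise_max(f, g):
    """The lattice operation max{f, g}, taken pointwise."""
    return lambda x: max(f(x), g(x))

def neg(f):
    """Scalar multiplication by -1, available in any vector space."""
    return lambda x: -f(x)

def pointwise_min(f, g):
    # min{f, g} = -max{-f, -g}
    return neg(pointwise_max(neg(f), neg(g)))

def positive_part(f):
    # f_+ = max{f, 0}
    return pointwise_max(f, lambda x: 0.0)

def negative_part(f):
    # f_- = max{-f, 0}
    return pointwise_max(neg(f), lambda x: 0.0)

def absolute_value(f):
    # |f| = max{f, -f}
    return pointwise_max(f, neg(f))

f = lambda x: x
g = lambda x: 1.0 - x
for x in (-2.0, -0.5, 0.0, 0.25, 3.0):
    print(x, pointwise_min(f, g)(x), absolute_value(f)(x),
          positive_part(f)(x) - negative_part(f)(x))
```

The last column reproduces f itself, reflecting f = f+ − f−.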

Definition 7.4.3 On a vector lattice V , an elementary integral is a function I : V → R satisfying
the following properties, for all f, g ∈ V and all λ ∈ R:
    I(λf ) = λI(f )
    I(f + g) = I(f ) + I(g)
    f ≤ g ⇒ I(f ) ≤ I(g)    monotonicity
    fn ց 0 ⇒ I(fn ) → 0     continuity with respect to monotone convergence

Example 7.4.4 In Rn , call a function f a step function, iff there exist partitions
a_0^(i) < a_1^(i) < . . . < a_{k_i}^(i) for each i ∈ {1, 2, . . . , n} such that for each open box
]a_{j_1}^(1) , a_{j_1 +1}^(1) [ × . . . × ]a_{j_n}^(n) , a_{j_n +1}^(n) [,
the restriction of f to this box is a constant, and f vanishes outside the closure of the union of
these boxes, and f is bounded. (This latter condition is only a technical restriction and concerns
the values on the boundaries of the open boxes.)
Step functions make up a vector lattice S. An elementary integral on S is given by the Riemann
integral, or equivalently and in more elementary terms: I(f ) = Σ_i µ(Bi )fi , where fi is the value of
f on Bi , and µ(Bi ) is the elementary geometric volume of the box Bi , namely the product of the
side lengths. All properties are easily seen and an exercise for the reader, with the exception of the
continuity property for the elementary integral, which will be proved as Lemma 7.4.12.
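As a small illustration of the formula I(f ) = Σ µ(Bi ) fi , the sketch below represents a step function by a list of (box, value) pairs, a box being a list of intervals (lo, hi), one per coordinate. This data layout and the function names are assumptions made for the example only.

```python
from math import prod

def box_volume(box):
    """Elementary volume of a box: the product of its side lengths."""
    return prod(hi - lo for lo, hi in box)

def step_integral(pieces):
    """Elementary integral: sum of volume times value over all boxes."""
    return sum(box_volume(box) * value for box, value in pieces)

# A step function on R^2: value 2 on ]0,1[ x ]0,2[ and value -1 on ]1,3[ x ]0,2[.
example = [
    ([(0.0, 1.0), (0.0, 2.0)], 2.0),
    ([(1.0, 3.0), (0.0, 2.0)], -1.0),
]
print(step_integral(example))  # 2*2 + 4*(-1) = 0.0
```

The values on the box boundaries do not enter, in line with the remark that they are only a technical matter.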

Example 7.4.5 In Rn , consider all continuous functions with real values. They form a vector
lattice C 0 (Rn ). Select one point x∗ ∈ Rn arbitrarily, and consider the function δx∗ (f ) := f (x∗ ).
Then δx∗ is an elementary integral.

Definition 7.4.6 Given a metric space X, the support of a function f : X → R or f : X → C is
defined to be the closure of the set {x ∈ X | f (x) ≠ 0}.

Example 7.4.7 In Rn , consider all continuous functions that vanish outside some bounded set,
i.e., all continuous functions with compact support. This is a vector lattice, called Ccpt(Rn ). As
an elementary integral, we can take the Riemann integral of a function f ∈ Ccpt over all of Rn , or
equivalently, over a big cube containing the support of the function. All properties are easily seen,
except for the continuity, which follows from the Theorem of Dini 7.4.10 proved below.

In this example, where the elementary integral is a Riemann integral that (unlike Example 7.4.4)
cannot trivially be written as a finite sum, we will assume the linearity and monotonicity of the
Riemann integral from elementary calculus. However, it is not necessary, strictly speaking, to have
a clean construction of Riemann’s integral prior to a construction of Lebesgue’s integral. Rather,
we will obtain Lebesgue’s integral based on Example 7.4.4. It will become clear a posteriori that
starting the construction with Ccpt functions rather than step functions leads to the very same integral.

Example 7.4.8 Choose a (weakly) increasing continuous function g : R → R and take as a vector
lattice the family of step functions on R. For f = fi on ]ai , bi [, define Ig (f ) := Σ_i fi (g(bi ) − g(ai )).
This type of integral is called a Stieltjes integral; it weights the value of a function with a local rate
of increase of a given function g. Again, Ig is an elementary integral.
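A short sketch of the Stieltjes sum for step functions on R follows. The representation of f as a list of triples (ai , bi , fi ) and the name stieltjes_integral are our own illustrative choices. With g(x) = x one recovers the ordinary elementary integral.

```python
def stieltjes_integral(pieces, g):
    """I_g(f) = sum of f_i * (g(b_i) - g(a_i)) over the intervals ]a_i, b_i[."""
    return sum(f_i * (g(b_i) - g(a_i)) for a_i, b_i, f_i in pieces)

# f = 3 on ]0,1[ and f = -1 on ]1,2[
pieces = [(0.0, 1.0, 3.0), (1.0, 2.0, -1.0)]

print(stieltjes_integral(pieces, lambda x: x))       # 3*1 + (-1)*1 = 2.0
print(stieltjes_integral(pieces, lambda x: x ** 3))  # 3*1 + (-1)*7 = -4.0
```

The second weight g(x) = x³ increases faster on ]1, 2[ than on ]0, 1[, so the negative value of f there counts more heavily.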

Example 7.4.9 The set of those functions f : N → R for which f (n) ≠ 0 for only finitely many
n (in other words, compactly supported sequences) forms a vector lattice Seq. On this lattice,
I(f ) := Σ_n f (n) is an elementary integral. (The sum by which it is defined is a finite sum!)
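The lattice Seq is easy to model on a computer. In the sketch below, a finitely supported sequence is stored as a dictionary from indices to nonzero values; this encoding and the names seq_max and seq_integral are assumptions for illustration only.

```python
def seq_max(f, g):
    """Pointwise max of two finitely supported sequences (missing entries are 0)."""
    keys = set(f) | set(g)
    return {n: max(f.get(n, 0.0), g.get(n, 0.0)) for n in keys}

def seq_integral(f):
    """Elementary integral on Seq: the finite sum of all entries."""
    return sum(f.values())

f = {1: 2.0, 3: -1.0}
g = {1: 0.5, 2: 4.0}
h = seq_max(f, g)                 # entries: h(1) = 2, h(2) = 4, h(3) = 0
print(seq_integral(f), seq_integral(g), seq_integral(h))
```

Since f ≤ h and g ≤ h pointwise, monotonicity of the integral requires I(f ) ≤ I(h) and I(g) ≤ I(h), which the printed values confirm.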

Theorem 7.4.10 (Dini’s Theorem) Suppose K is compact and fi : K → R are continuous func-
tions. Suppose the sequence (fi ) converges monotonically to a continuous function f . Then fi → f
uniformly.

Proof: Without loss of generality, assume fi ց f (for increasing convergence, take −fi and
−f instead). Without loss of generality, assume f = 0, else consider fi − f instead of fi . The
assumption is fi (x) ց 0 for all x ∈ K, which in particular includes fi (x) ≥ 0 for all i, x. Let ε > 0.
For each x, there is some n(x) such that 0 ≤ fi (x) < ε/2 for i ≥ n(x). By continuity of fn(x) , there
exists an open ball B(x, δ(x)) such that fn(x) (y) < ε for y ∈ B(x, δ(x)). By monotonicity, this implies
fi (y) < ε for all y ∈ B(x, δ(x)) and all i ≥ n(x). Trivially, the B(x, δ(x)) for x ∈ K form an open
cover of K. By compactness, we have a finite subcover, i.e., finitely many xj such that the union
of the B(xj , δ(xj )) contains K. Let n be the maximum of n(xj ). Then for all i ≥ n and all y ∈ K,
it holds 0 ≤ fi (y) < ε. This shows that fi → f uniformly.

Corollary 7.4.11 If fi ց 0 in Ccpt(Rn ), then I(fi ) → 0, where I denotes the Riemann integral.

Proof: There is a cube [−N, N ]^n containing the support of f1 , and therefore the support of all
fi . By Dini’s theorem, fi → 0 uniformly on this cube. Hence, given ε > 0, for i large,
0 ≤ fi (x) < ε/(2N )^n , and therefore 0 ≤ I(fi ) ≤ ε.

Exercise 7.7 Show by counterexamples that none of the hypotheses of Dini’s theorem can be omit-
ted: Dini’s theorem may fail if K is not compact, or if f is not continuous, or if the convergence
is pointwise but not monotonic, with all the other hypotheses unchanged in each case.

The continuity property for the elementary integral on step functions is proved along the same lines
as Dini’s theorem, except that some technicalities due to discontinuities need to be taken care of:

Lemma 7.4.12 Let (fi ) be a sequence of step functions (as defined in Example 7.4.4), such that
fi ց 0. Then I(fi ) → 0.

Proof: We use the max distance on Rn , so that the balls are boxes. f1 is supported in some box
C := [a(1) , b(1) ] × · · · × [a(n) , b(n) ] whose volume is V = (b(1) − a(1) ) · · · (b(n) − a(n) ).
Let M be a bound for f1 (and hence for all fn ). All fi are supported in this same closed box C. Let
ε > 0. For each fi , C is partitioned into boxes Cj (j = 1, . . . , mi ) on whose interior fi is constant.
It is possible to cover the union D of the boundaries ∂C_j^(i) of the boxes C_j^(i) with a union of open
boxes Dk of total volume ε/(2M 2^i ).
For every x that is not on the boundary of any box Cj , we conclude from fi (x) ց 0 that there
exists some n(x) such that 0 ≤ fi (x) ≤ ε/(2V ) if i ≥ n(x). The same holds for all y in some ball
B(x, δ(x)), because fn(x) is constant on such a ball. (The fi for i > n(x) may not be constant on
that same ball any more, but the estimate prevails because fi (y) ≤ fn(x) (y).)
The boxes Dk and the balls B(x, δ(x)) together make an open cover of C. There is a finite subcover
consisting of some boxes Dj (which we have taken from among the Dk and renumbered), and some
balls B(xj , δ(xj )) =: Bj , which are also boxes. The maximum of all the n(xj ) will be called N .
Now for i ≥ N and all x ∈ Bj , we have 0 ≤ fi (x) ≤ ε/(2V ). The total volume of the Bj is
trivially ≤ V . For all x ∈ Dj , we have trivially 0 ≤ fi (x) ≤ M . The total volume of the Dj is
≤ Σ_i ε/(2M 2^i ) = ε/(2M ).
The Bj and Dj may overlap, but by refining the partition and renumbering the boxes, we can
arrange for them to be mutually disjoint, and the estimates about their total volumes remain true.
Therefore I(fi ) ≤ (ε/(2V )) V + (ε/(2M )) M = ε.

7.5      First Extension Step by Monotonic Convergence

Definition 7.5.1 Given a vector lattice V of functions X → R, define as ↑ V the set of all functions
f : X → R ∪ {∞} for which there exists an increasing sequence (fi ) ⊂ V such that fi ր f . Likewise
define as ↓ V the set of all functions f : X → R ∪ {−∞} for which there exists a decreasing sequence
(fi ) ⊂ V such that fi ց f .

Trivially, V is contained in ↑ V and in ↓ V . Note that ↑ V and ↓ V may not be vector lattices any
more, because f ∈ ↑ V does not imply −f ∈ ↑ V . (However f ∈ ↑ V ⇐⇒ −f ∈ ↓ V ). We aim to
extend the elementary integral from V to ↑ V and ↓ V . To this end, we will have to show that if
fi ր f and gi ր g, then lim I(fi ) = lim I(gi ). Then we can define I(f ) := lim I(fi ), and moreover,
using the constant sequence (f ), the new definition of I(f ) for f ∈ ↑ V coincides with the old one
in case f ∈ V .
We prove the

Lemma 7.5.2 Suppose fi ր f and let g ∈ V satisfy g ≤ f . Then I(g) ≤ lim I(fi ).

Proof: Clearly, by monotonicity lim I(fi ) exists in R ∪ {∞}, and we may assume the limit
is finite, because otherwise the claim is trivial. Consider gi := (g − fi )+ = max{g − fi , 0}.
So (gi ) is decreasing and the pointwise limit of gi is max{g − f, 0} = 0. Therefore we have

I(g) − I(fi ) = I(g − fi ) ≤ I(gi ) → 0. Hence I(g) − lim I(fi ) ≤ 0.

Corollary 7.5.3 (1) If fi ր f and gi ր g and g ≤ f , then lim I(gi ) ≤ lim I(fi ).
(2) If fi ր f and gi ր f where fi , gi ∈ V , then lim I(fi ) = lim I(gi ).

Proof: Use Lemma 7.5.2 with gj for g, then take the limit as j → ∞ to obtain (1). The converse
inequality needed for (2) follows by symmetry.

Definition 7.5.4 For f ∈ ↑ V , choose (fi ) such that fi ր f and define ↑ I(f ) := lim I(fi ). By the
preceding corollary, the definition is independent of the chosen sequence (fi ).
Similarly, for f ∈ ↓ V , choose (fi ) such that fi ց f and define ↓ I(f ) := lim I(fi ).

Clearly f ∈ ↑ V iff −f ∈ ↓ V , and ↑ I(f ) = −↓ I(−f ). Moreover ↑ I(f ) = I(f ) = ↓ I(f ), if f ∈ V .
If f ∈ ↑ V ∩ ↓ V , we also have ↑ I(f ) = ↓ I(f ). For if fi ց f and gi ր f , then fi − gi ց 0 and
hence I(fi ) − I(gi ) = I(fi − gi ) ց 0. Since lim(I(fi ) − I(gi )) = 0, lim I(fi ) ∈ R ∪ {−∞} and
lim I(gi ) ∈ R ∪ {∞}, we conclude lim I(fi ) = lim I(gi ) ∈ R.
For this reason, we now may drop the superscripts and consider I as defined on ↓ V ∪ ↑ V .
Our extended integral retains most, but not all properties from Def 7.4.3 of the elementary integral:

Proposition 7.5.5 If f, g ∈ ↑ V and λ ≥ 0, the following properties hold:
 I(λf ) = λI(f ) subject to the ad-hoc stipulation 0 · ∞ := 0
 I(f + g) = I(f ) + I(g)
 f ≤ g ⇒ I(f ) ≤ I(g)
 fn ր f ⇒ I(fn ) → I(f ) in particular f ∈ ↑ V
Analogous statements hold in ↓ V .

In reading this proposition, note that f may have ∞ among its values, and that the limited
arithmetic on the extended real line E did not include a definition for 0 · ∞ (and for good reasons).
The ad-hoc stipulation in the first part makes sure that λf is defined for λ = 0 even if ∞ is among
the values of f . In measure and integration theory (and only there), this ad-hoc stipulation is
reasonable, and is frequently encountered. We will always quote this ad-hoc stipulation where it is
used, but will not make it part of the limited arithmetic of E.
Proof of Prop. 7.5.5: The (modified) linearity properties follow because fi ր f and gi ր g
imply fi + gi ր f + g and λfi ր λf . The monotonicity property is just Cor. 7.5.3.
The limit property is proved as follows: For fn ∈ ↑ V , we can find a sequence (fnj )_{j=1}^∞ in V such
that fnj ր fn as j → ∞. Now if fn ր f , we can also find a sequence hk in V (rather than ↑ V ) such that
hk ր f ; namely we choose: hk := max{fij | i, j ≤ k}. Clearly, (hk ) is an increasing sequence. Since
fij ≤ fi ≤ f , we have hk ≤ f . We still have to show hk (x) → f (x) for each x. Given x, and given
ε > 0, we find fi such that fi (x) > f (x) − ε/2 because fi ր f . Likewise, for this fi we find fij such
that fij (x) > fi (x) − ε/2. Now letting k := max{i, j}, we have hk (x) ≥ fij (x) > fi (x) − ε/2 > f (x) − ε.
Now let lim I(fn ) =: I∗ , and we assume this quantity to be finite for the moment. Given ε > 0,
we find n such that I(fn ) > I∗ − ε/2 and then we find j such that I(fnj ) > I(fn ) − ε/2. With
k := max{n, j}, we then have I(hk ) ≥ I(fnj ) > I∗ − ε. So I(f ) = lim I(hk ) ≥ I∗ , and the converse
inequality is obvious from fn ≤ f , hence I(fn ) ≤ I(f ). An analogous argument can be made for
I∗ = ∞ with I∗ − ε/2 being replaced by a large number N and I∗ − ε by N − 1.

Note that in ↑ V , we can not conclude from fn ց 0 that I(fn ) → 0, simply because I(f ) may
be ∞, and then lim I((1/n) f ) = ∞ even though (1/n) f ց 0. We will later prove this convergence result
under the extra hypothesis that I(f ) is finite.

Let us now look at ↑ V and ↓ V in the case of the various Examples of elementary integrals studied
in the previous section. The easiest case is

Proposition 7.5.6 For the vector lattice Seq defined in Example 7.4.9, ↑ Seq consists of those
sequences f : N → R ∪ {+∞} for which fn < 0 occurs for at most finitely many n. A similar
statement applies for ↓ Seq.

The proof is an easy exercise.

Proposition 7.5.7 For the vector lattice S from Example 7.4.4, Ccpt(Rn ) ⊂ ↑ S ∩ ↓ S.

Proof: First let f ∈ Ccpt (Rn ) and suppose the support of f is contained in [−a, a]^n . Since f
is uniformly continuous, for every ε > 0 there exists a k such that on any cube of side length a/k, f
oscillates by at most ε, i.e., |f (x1 ) − f (x2 )| < ε for any x1 , x2 in that cube. So,
with ε = 1/i, we find a step function fi (constant on the boxes created by an equidistant partition
of [−a, a]^n and vanishing outside [−a, a]^n ), such that sup |f − fi | ≤ 1/i. These fi converge uniformly
to f . We take gi (x) := fi (x) − 1/i for x ∈ [−a, a]^n and 0 otherwise. Then gi → f uniformly, and moreover
gi ≤ f . Now let hi := max{g1 , . . . , gi }. The sequence (hi ) is increasing, and still converges to f
uniformly (and hence pointwise). So f ∈ ↑ S. The proof that f ∈ ↓ S is similar.
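The approximation from below can be watched in a one-dimensional toy case. The sketch below is not the construction of the proof (which uses fi − 1/i and then hi = max{g1 , . . . , gi }); as a simpler variant we take, on nested dyadic partitions of [−1, 1], the step function equal to the infimum of f on each cell. Nestedness makes these step functions increasing automatically. The tent function and all names are illustrative assumptions.

```python
def tent(x):
    """A continuous function with compact support [-1, 1]; its integral is 1."""
    return max(0.0, 1.0 - abs(x))

def lower_step_integral(k):
    """Elementary integral of the step function that equals inf(tent) on each
    of the 2**k equal cells of [-1, 1] (and 0 outside)."""
    cells = 2 ** k
    width = 2.0 / cells
    total = 0.0
    for j in range(cells):
        left = -1.0 + j * width
        right = left + width
        # For k >= 1 no cell straddles 0, so tent is monotone on each cell
        # and its infimum is attained at one of the two endpoints.
        total += width * min(tent(left), tent(right))
    return total

values = [lower_step_integral(k) for k in range(1, 11)]
print(values)  # increasing towards 1
```

The integrals increase to 1, the Riemann integral of the tent function, as the definition of the extended integral on ↑ S demands.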

We will not give a precise characterization of ↑ S and ↓ S, but will improve on Proposition 7.5.7 after
studying the next example.
For the next proposition, we remind the reader of the definition of lower semicontinuity, and
the properties proved about it in Sec. 3.4a. Moreover, we prove the following simple result, a
generalization of Thm 3.5.8, which is also of independent interest in the study of infinite dimensional
minimization problems (Calculus of Variations):

Lemma 7.5.8 A lower semicontinuous function on a compact set takes on its minimum.

Proof: Let f be lsc on a compact set K, and let m := inf K f ∈ R ∪ {−∞}. Take a sequence
(xk ) such that f (xk ) → m. By descending to a subsequence, we may assume that xk →: x∗ . Since
f (x∗ ) ≤ lim inf f (xk ) = m by lower semicontinuity, but also f (x∗ ) ≥ m by m being the infimum,
we conclude that m is a minimum, and equals f (x∗ ).

Proposition 7.5.9 For the vector lattice Ccpt, the functions in ↑ Ccpt are precisely those lower
semicontinuous functions f : Rn → R ∪ {∞} that are ≥ 0 outside some compact set. Those in ↓ Ccpt
are precisely those upper semicontinuous functions f : Rn → R ∪ {−∞} that are ≤ 0 outside some
compact set.

Proof: We prove the first part, the second being analogous. We have seen in Exercise 7.4 that all
functions f ∈ ↑ Ccpt are lower semicontinuous. Moreover, if fi ր f and the fi have compact support,
then f ≥ 0 outside the support of f1 . Now let’s look at the converse claim and assume f is lower
semicontinuous and f ≥ 0 outside the compact set K. It is no loss of generality to assume that K
is the closed ball C(0, R) for some large R (if not, replace K with such a ball containing K).
Let m0 := inf f . As f takes on a (finite) minimum on K by the previous lemma, and f ≥ 0 outside
K, we know that m0 is finite, not −∞. Let m := min{m0 , 0} ≤ inf f . We will construct a countable
family of Ccpt functions, the supremum of which is f , and this countable family we can actually
convert into an increasing sequence that converges to f .
To start this program, we note that for each x ∈ Rn and each t < f(x), there exists some r > 0
such that f > t on all of B(x, r). We may choose r rational, and we will only apply this observation
to x ∈ Qn and t ∈ Q. We define the countably infinite set G := {(x, t, r) ∈ Qn × Q × Q | r >
0, f > t on B(x, r)}. For the ith element (x, t, r) of G (in some enumeration N → G), we define
the continuous function gi by

\[
g_i(y) := \begin{cases}
t & \text{if } y \in B(x, \tfrac{r}{2}), \\
m & \text{if } y \notin B(x, r), \\
t + \bigl(\tfrac{2}{r}\,|y - x| - 1\bigr)(m - t) & \text{if } y \in B(x, r) \setminus B(x, \tfrac{r}{2}).
\end{cases}
\]

By construction, gi ≤ f and gi is continuous (but does not have compact support yet, unless
m0 ≥ 0). We introduce g(x) := sup{gi (x) | i ∈ N}, observe trivially that g ≤ f , and now we will
show that for each ε > 0, g ≥ f − ε, hence altogether g = f .
So let ε > 0 and let x ∈ Rn. First find a rational number t ∈ ]f(x) − ε, f(x)[. Since f is lsc and
f(x) > t, there is a δ > 0 such that f > t on the entire ball B(x, δ). Choose y ∈ Qn ∩ B(x, δ/4),
and choose a rational r in ]δ/2, 3δ/4[. This guarantees that B(y, r) ⊂ B(x, δ) (hence (y, t, r) ∈ G),
and x ∈ B(y, r/2), hence gi(x) = t > f(x) − ε for the index i of (y, t, r), and therefore g(x) > f(x) − ε.
Now if we let hk(x) = max{gi(x) | i = 1, . . . , k}, we have an increasing sequence hk ↗ g = f
with hk continuous. If m0 = inf f ≥ 0, we have m = 0, and our gi and hk already have compact
support. If however m0 = m < 0, the construction needs a minor modification: We introduce the
extra function g0 by g0(x) = m < 0 for x ∈ K = C(0, R), and g0(x) = 0 for x ∉ B(0, 2R), and
g0(x) = m(1 − (|x| − R)/R) otherwise. Now g0 ∈ C^0_cpt and g0 ≤ f. Then we take hk := max{gi |
i = 0, 1, . . . , k}, and now all the hk are supported in C(0, 2R).

(This proof is a variant of the one found in Jost’s Postmodern Analysis, Thm. 12.10)
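
The building blocks gi of this proof are easy to experiment with. The following is a minimal sketch in dimension 1, not part of the proof; the function name `bump` and the sample parameters are hypothetical and chosen only for illustration. It evaluates one gi on its plateau, outside its ball, and on the two edges of the annulus where the three cases of the definition must match up.

```python
def bump(y, x, t, r, m):
    """One building block g_i from the proof, in dimension 1.

    Equals t on B(x, r/2), equals m outside B(x, r), and interpolates
    linearly in |y - x| on the annulus in between.
    """
    d = abs(y - x)
    if d < r / 2:
        return t
    if d >= r:
        return m
    return t + (2 * d / r - 1) * (m - t)


# Hypothetical parameters, chosen only for illustration.
x, t, r, m = 0.0, 3.0, 1.0, -2.0

plateau = bump(0.2, x, t, r, m)      # inside B(x, r/2): value t
far = bump(5.0, x, t, r, m)          # outside B(x, r): value m
inner_edge = bump(0.5, x, t, r, m)   # |y - x| = r/2: interpolation starts at t
outer_edge = bump(1.0, x, t, r, m)   # |y - x| = r: interpolation ends at m
midway = bump(0.75, x, t, r, m)      # halfway across the annulus
```

The values at the two edges agree with the neighbouring cases, which is the continuity of gi claimed in the proof.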
Now that we have a precise understanding of ↑ C^0_cpt and ↓ C^0_cpt, we can return to improve upon
Proposition 7.5.7:

Proposition 7.5.10 ↑ S ⊃ ↑ C^0_cpt and ↓ S ⊃ ↓ C^0_cpt.

Proof: We prove the first part. Let f ∈ ↑ C^0_cpt. Then there exists a sequence (fk) in C^0_cpt such that
fk ↗ f. By Proposition 7.5.7, we have, for each fk, a sequence (fkj)_{j=1}^∞ in S such that fkj ↗ fk as j → ∞.
Now let hk := max{fij | 1 ≤ i, j ≤ k} and argue exactly as in the proof of Prop. 7.5.5.

Remark 7.5.11 The proof idea of this proposition contains a general insight, namely, that repeat-
ing the first extension step will not produce anything new: ↑↑ V = ↑ V (in slight abuse of notation
since ↑ V is not actually a vector space any more).

Remark 7.5.12 Prop. 7.5.10 is very instructive, and is given for didactical, rather than
mathematical reasons. At first sight, one might think that starting the extension process with the extremely
simple elementary integral on step functions might limit the scope of the integral resulting from the
extension process, and that starting with a more sophisticated elementary integral (like the Rie-
mann integral) might be needed to get the most powerful integral notion in the end. Prop. 7.5.10 is
a partial refutation of this idea: At least if you start with the restriction of the Riemann integral to
continuous functions, you will not have more functions after the first extension step than you get
by starting with step functions.
Deciding the same question for V consisting of all Riemann integrable functions is not so obvious, in
particular without a precise characterization of the class of Riemann integrable functions; however
we will see that after the full two-step extension process is completed, starting with either S or
Ccpt , we do encompass all Riemann integrable functions, and that iteration of the full extension
procedure does not yield further integrable functions. We will later return to this issue and see that
any of these choices S, Ccpt, or all Riemann integrable functions, produce the same integral after
both extension steps are completed.

At this point we have already defined the measure of every open set, and of every compact set
in Rn , i.e., the integral of the characteristic function of each such set:

Lemma 7.5.13 If U ⊂ Rn is open, then χU ∈ ↑ C^0_cpt. If C ⊂ Rn is compact, then χC ∈ ↓ C^0_cpt.

Proof: Remember the dist function in a metric space from Hwk. 3.4J1: Given a set A and a point
x, we defined dist(x, A) := inf{d(x, y) | y ∈ A}, and we showed | dist(x, A) − dist(y, A)| ≤ d(x, y),
so dist(·, A) is a continuous function.
Given any closed set C, we can define fk(x) := (1 − k dist(x, C))+, which is defined to be max{0, 1 −
k dist(x, C)}. The fk are continuous, and if C is also bounded, they have bounded support (hence
compact support, because the support is by definition a closed set). Clearly (fk) is a decreasing
sequence. If x ∈ C, then dist(x, C) = 0 and hence fk(x) = 1 for all k. If x ∉ C, then dist(x, C) > 0
(by Exercise 3.4J4), and therefore fk(x) = 0 for all sufficiently large k. This proves that fk ↘ χC,
hence χC ∈ ↓ C^0_cpt.
Now consider the case of an open set U, and first assume also that U is bounded. Then we can
define fk(x) := min{1, k dist(x, U^c)}. Then by a similar reasoning, fk ↗ χU, and the fk are continuous
with compact support. If U fails to be bounded, the fk fail to have compact support and
the construction needs to be modified a bit. Consider the function ik(x) := (1 − (|x| − k)+)+; i.e.,
ik(x) = 1 if |x| ≤ k; and ik(x) = 0 if |x| ≥ k + 1; and ik(x) ∈ [0, 1] otherwise. ik is a ‘truncation
of the constant 1’. We can now choose fk(x) := min{ik(x), k dist(x, U^c)} to finish the proof.
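
As an illustration of the first construction, here is a small sketch for the compact set C = [0, 1] in dimension 1. The helper names `dist_to_interval` and `f` are hypothetical, and the sample points are arbitrary. It shows fk staying at 1 on C and dropping to 0 at a point outside C once k is large enough.

```python
def dist_to_interval(x, a=0.0, b=1.0):
    """Distance from x to the closed interval C = [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def f(k, x):
    """f_k(x) = (1 - k * dist(x, C))^+ for C = [0, 1]."""
    return max(0.0, 1.0 - k * dist_to_interval(x))


# A point of C: every f_k equals 1 there.
inside = [f(k, 0.5) for k in (1, 10, 100)]

# A point at distance 1/4 from C: the values decrease and reach 0 for k >= 4.
outside = [f(k, 1.25) for k in (1, 2, 4, 8)]
```

The list `outside` is non-increasing in k, matching the claim that (fk) is a decreasing sequence with pointwise limit χC.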

Exercise 7.8 With V = C^0_cpt, decide which of the following functions are in ↑ V and which in ↓ V,
or both or neither: χ_Z, χ_[0,1[, χ_[0,1], χ_]0,1[.

Exercise 7.9 With V = S according to Example 7.4.4, decide whether the function f : R → R
given by f(x) = x^2 is in ↑ S, in ↓ S, both or neither. Give a proof for ‘not in this set’ and an explicit
monotonic sequence of step functions for ‘in this set’.
Consider g : R → R given by g(x) = x/(1 + x^4). Explain why it is neither in ↑ S nor in ↓ S. Give
at least two ways of writing g as g = g1 + g2 with g1 ∈ ↑ S and g2 ∈ ↓ S.
(In this problem, the diversity bigshots will return a vicious frown to everybody who frowns at
piecewise defined functions.)

Exercise 7.10 We could extend the definition of I from ↑ V ∪ ↓ V to ↑ V + ↓ V , i.e., when g can
be written as g = g1 + g2 with g1 ∈ ↑ V and g2 ∈ ↓ V and I(gi ) finite, then I(g) := I(g1 ) + I(g2 ).
Prove that this is well-defined, i.e., independent of the choice of decomposition of g. The reason
why we forego this opportunity is that the extra functions thus gathered as integrable will anyways
be gathered in the next step.

Exercise 7.11 Let I denote the Riemann integral on Ccpt and assume knowledge of the Riemann
integral from elementary calculus. Prove for a bounded open interval J that I(χJ ) = µ(J), the
length of J. Prove a similar result for the multi-variable case with an open box B ⊂ Rn . It is
convenient to work with products of single-variable functions in this case, and you may use repeated
integrals to evaluate multi-variable Riemann integrals as done in elementary MV-calculus.

Remember: The logical outline by which we present the entire theory is to construct an integral
by extending the integral of step functions; and we will later recognize the Riemann-integral as
a special case, based only on the elementary definition of the Riemann integral, but not on any
elaborate theory for the Riemann integral. This is the rigorous part. At the same time, we study
an alternative approach based on Ccpt , which would be rigorous now if we had proved beforehand
the properties of the Riemann integral you know from elementary calculus. This part serves a
didactical purpose, and the rigor of the entire exposition does not depend on it.

Exercise 7.12 Given a metric on Rn providing the ‘usual’ topology (i.e., the metric is equivalent
to the euclidean metric), and an open set U ⊂ Rn . Consider the countable family B := {B(x, δ) |
x ∈ Qn , δ ∈ Q+ }. Show that U is the union of all those B ∈ B that are contained in U . Use this
for a direct proof that χU ∈ ↑ S. — Give a second proof that χU ∈ ↑ S as a consequence of numbered
results in this chapter.

7.6     Upper and Lower Integral; Integrable Functions

Definition 7.6.1 Given a vector lattice V of functions X → R, and an arbitrary function f : X →
R ∪ {±∞}, we define the upper integral I^*(f) := inf{I(g) | g ≥ f and g ∈ ↑ V}. Likewise, define
the lower integral I_*(f) := sup{I(h) | h ≤ f and h ∈ ↓ V}. (It is understood here that inf ∅ := +∞
and sup ∅ := −∞.) Call f integrable iff I_*(f) = I^*(f) ∈ R. If f is integrable, we define I(f) to
be the common value of I_*(f) = I^*(f). (Due to part (2) of Lemma 7.6.2 below, this leads to no
conflict with the prior definition of I on ↑ V ∪ ↓ V.) The set of all integrable functions X → R is
called L^1(X), or, if the integral needs to be denoted, L^1(X, I). We call I on L^1(X) the Daniell
extension of the elementary integral I on V.

It will turn out that the simple 2-step extension accomplished by this definition
gives us an integral with powerful completeness and convergence properties. In the case where the
elementary integral with which we started is the one from either Example 7.4.4 or 7.4.7, this integral
is equivalent to the integral notion originally defined by Lebesgue, and we will call it the Lebesgue
integral for this reason. The approach presented here is however not the one due to Lebesgue, but
rather was invented by P.J. Daniell (‘A General Form of Integral’; Annals of Mathematics 2nd Ser,
Vol 19; June 1918; pp 279–294).

Lemma 7.6.2 Let E denote the extended real line R ∪ {−∞, ∞}.
(1) For all functions f : X → E, it holds I_*(f) ≤ I^*(f).
(2) If f ∈ ↑ V ∪ ↓ V, then I_*(f) = I^*(f) = I(f) ∈ E.
(3) Let f ∈ L^1(X) and λ ∈ R; let h(x) := λf(x), subject to the ad-hoc stipulation that 0 · ∞ := 0.
Then h ∈ L^1(X) and I(h) = λI(f).
(4) Let f1, f2 ∈ L^1(X) and let f(x) := f1(x) + f2(x) wherever defined, and f(x) arbitrary where
f1(x) + f2(x) is not defined. Then f ∈ L^1(X) and I(f) = I(f1) + I(f2).
(5) If f, g ∈ L^1(X) and f ≤ g, then I(f) ≤ I(g).
(6) If f ≤ h ≤ g and f, g ∈ L^1(X) with I(f) = I(g), then h ∈ L^1(X) and I(h) = I(f).
(7) If f1, f2 ∈ L^1(X), then so are max{f1, f2}, min{f1, f2}, (f1)+, (f1)− and |f1|.

The issue with the clumsy wording in (4) is that ∞ − ∞ is not defined. We choose to allow
functions with values +∞ and −∞, because otherwise the monotonic convergence issues would
become clumsy: perfectly benign functions like f(x) = |x|^(−1/2) are naturally approximated by an
increasing sequence of continuous functions, if we let f(0) = ∞; namely we can let fn := min{f, n}.
But this creates the problem that now L^1(X) is, pedantically speaking, not even a vector space,
because the sum of functions may not be defined. However, for integrable functions f, g, any
undefined values of f + g may be supplied arbitrarily without affecting the integral. This indicates
that for practical purposes, L1 (X) is ‘as good as’ a vector space, actually a vector lattice. We will
soon prove that the continuity property for elementary integrals is also satisfied for the extended
integral I. So, apart from the technicality created by occasionally undefined values, we have
salvaged all properties of the elementary integral through the extension process.
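
The truncations fn = min{f, n} of f(x) = |x|^(−1/2) can be explored numerically. The sketch below is an illustration only: it restricts attention to the interval [−1, 1] (an assumption made here for the example), uses a plain midpoint rule, and compares with the value 4 − 2/n obtained by integrating by hand. The function names are hypothetical.

```python
def truncated(x, n):
    """f_n(x) = min{|x|^(-1/2), n}, with f_n(0) = n because f(0) = +infinity."""
    if x == 0:
        return float(n)
    return min(abs(x) ** -0.5, float(n))


def integral_over_minus_one_one(n, cells=100_000):
    """Midpoint-rule approximation of the integral of f_n over [-1, 1]."""
    h = 1.0 / cells
    half = sum(truncated((j + 0.5) * h, n) for j in range(cells)) * h
    return 2.0 * half  # f_n is an even function


levels = (1, 2, 4)
approx = {n: integral_over_minus_one_one(n) for n in levels}

# Hand computation: 2 * (n * n**-2 + (2 - 2/n)) = 4 - 2/n.
exact = {n: 4.0 - 2.0 / n for n in levels}
```

The integrals increase with n and stay below 4, which is the behaviour one expects from an increasing sequence approximating an integrable function with an infinite value at one point.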
Proof of Lemma 7.6.2: For (1) we choose h, g such that ↓ V ∋ h ≤ f ≤ g ∈ ↑ V . The claim is
trivial if no such h or g exist. We have to show that ↓ V ∋ h ≤ g ∈ ↑ V implies I(h) ≤ I(g). The
claim then follows by taking the sup and inf respectively. But since ↑ V ∋ g − h ≥ 0, this follows
from Prop. 7.5.5.
For (2), assume f ∈ ↑ V (the other case is analogous). Then clearly I^*(f) ≤ I(f), since g := f is
allowable in inf I(g). On the other hand, we have V ∋ fn ↗ f, so we can choose the fn for h and
conclude I_*(f) ≥ I(fn) → I(f). With part (1), we get I(f) ≤ I_*(f) ≤ I^*(f) ≤ I(f) and hence the
claim.
For λ ≥ 0, claim (3) follows immediately from Prop. 7.5.5. For λ = −1, it follows because ↑ V = −↓ V
and I(f ) = −I(−f ) as upper integrals for f correspond to lower integrals for −f and vice versa.
The general case is a combination of these two.
Now concerning (4), note that among the g ∈ ↑ V that satisfy g ≥ f are the g := g1 + g2 with
↑ V ∋ gi ≥ fi. In particular, this observation also applies for those x0 for which f1(x0) + f2(x0)
is not defined, because in this case one of the fi(x0) (say f1(x0)) must be ∞. Then g1(x0) must
be ∞ as well; but g2 cannot be −∞ since g2 ∈ ↑ V. So g(x0) = ∞ ≥ f(x0) regardless how we
define f(x0). Then I^*(f) ≤ I(g1 + g2) = I(g1) + I(g2). By taking the infima over g1 and g2, we
conclude I^*(f) ≤ I^*(f1) + I^*(f2). Similarly, I_*(f) ≥ I_*(f1) + I_*(f2). Note that these inequalities
have been proved for all f, fi : X → E subject to the condition f = f1 + f2 where the sum is defined
(hence Cor. 7.6.3 below).
But now, if fi are integrable, we can continue:

\[ I^*(f) \le I^*(f_1) + I^*(f_2) = I_*(f_1) + I_*(f_2) \le I_*(f) \le I^*(f) \]

and therefore f is integrable and equality holds everywhere.
Moreover, if f ≤ g, then I^*(f) ≤ I^*(g) because the lhs is an infimum over a larger set of competitors.
Similarly I_*(f) ≤ I_*(g) because the lhs is the sup over a smaller set of competitors. Property
(5) follows trivially. Also under the hypotheses for (6), we have I^*(f) ≤ I^*(h) ≤ I^*(g) and
I_*(f) ≤ I_*(h) ≤ I_*(g), and the integrability of f, g implies the claim.

It suffices to prove (7) for max{f1, f2}. By carrying the max property of V through the approximating
sequences, it is clear that max{g1, g2} ∈ ↑ V if g1, g2 ∈ ↑ V, and also for min; the same result
applies for h1, h2 ∈ ↓ V. Now assume f1, f2 ∈ L^1(X) and let ε > 0. Then there are gi ∈ ↑ V (and
gi ≥ fi) such that I(gi) ≤ I(fi) + ε/4 < ∞ (for i = 1, 2). Likewise, there are hi ∈ ↓ V (with hi ≤ fi)
such that I(hi) ≥ I(fi) − ε/4 > −∞.
As g := max{g1, g2} ≥ max{f1, f2} =: f, and g ∈ ↑ V, we have I^*(f) ≤ I^*(g) = I(g) ∈ R ∪ {∞}.
(So far, an infinite value is not ruled out; but it will be ruled out later.) And similarly, I_*(f) ≥
I_*(h) = I(h) for h := max{h1, h2}. Now observe that

                  0 ≤ g − h = max{g1 , g2 } − max{h1 , h2 } ≤ (g1 − h1 ) + (g2 − h2 ) .

This estimate can be checked for each x case by case depending on the relative sizes of the
gi (x), hi (x). We conclude

\[ I^*(f) - I_*(f) \le I(g) - I(h) = I(g - h) \le I(g_1 - h_1) + I(g_2 - h_2) \le \tfrac{1}{2}\varepsilon + \tfrac{1}{2}\varepsilon = \varepsilon . \]

This inequality in particular rules out infinite values for I(g) and I(h), and therefore for I^*(f) and
I_*(f). But since the argument could be made for every ε > 0, we conclude that I^*(f) = I_*(f),
hence f is integrable.
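
The pointwise estimate 0 ≤ max{g1, g2} − max{h1, h2} ≤ (g1 − h1) + (g2 − h2), valid whenever hi ≤ gi, is the only non-formal ingredient of this proof. A case-by-case check is the actual argument; the following sketch merely tests the estimate on random numbers as a sanity check. The sampling ranges and the name `gap` are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # fixed seed so that the experiment is reproducible


def gap(g1, g2, h1, h2):
    """Right-hand side minus middle term of the estimate; should be >= 0."""
    return (g1 - h1) + (g2 - h2) - (max(g1, g2) - max(h1, h2))


worst_gap = float("inf")
lower_bound_always_holds = True
for _ in range(10_000):
    h1 = random.uniform(-5.0, 5.0)
    h2 = random.uniform(-5.0, 5.0)
    g1 = h1 + random.uniform(0.0, 3.0)  # enforces h1 <= g1
    g2 = h2 + random.uniform(0.0, 3.0)  # enforces h2 <= g2
    if max(g1, g2) - max(h1, h2) < 0:
        lower_bound_always_holds = False
    worst_gap = min(worst_gap, gap(g1, g2, h1, h2))
```

No sample violates either inequality, in line with the case distinction suggested in the proof.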

We stress here that I(f) is defined as ↑ I(f) or ↓ I(f), as the case may be, for all f ∈ ↑ V ∪ ↓ V according
to Def. 7.5.4 and the paragraph following it. For all of these functions, we have I^*(f) = I_*(f) = I(f)
by 7.6.2(2); however, among the functions in ↑ V ∪ ↓ V, only those for which I(f) is finite are called
integrable according to Def. 7.6.1. There can also be functions not in ↑ V ∪ ↓ V for which I^*(f)
and I_*(f) coincide; but only if this common value is finite (and the functions are thus integrable)
do we use the notation I(f) for them.
Let us harvest two easy corollaries from the proofs of (4) and (7) respectively, for easy reference:

Corollary 7.6.3 For all f, g : X → E, we have I^*(f + g) ≤ I^*(f) + I^*(g), and I_*(f + g) ≥
I_*(f) + I_*(g). (Here, arbitrary values may be prescribed for points x where the limited arithmetic
of E does not define f(x) + g(x).)

Corollary 7.6.4 f is integrable if and only if for every ε > 0, there exist functions h ∈ ↓ V and
g ∈ ↑ V such that h ≤ f ≤ g and I(g) − I(h) < ε.

Of this latter corollary, one direction (which one?) is harvested from the proof of Lemma 7.6.2; the
converse is an easy exercise.

Exercise 7.13 For integrable f , prove: |I(f )| ≤ I(|f |)

In Lemma 7.6.2, we had to deal with obnoxious exceptions caused by limitations of the arithmetic
of the extended real line E. Let us address this issue with a clean language.

Definition 7.6.5 Given f : X → E, we denote by Ef the set {x ∈ X | f(x) ∉ R}, i.e., the
set where f has infinite values. A set E ⊂ X is called an I-null set, if there exists a function
f ∈ L^1(X, I) such that E = Ef. A property is said to hold I-almost everywhere (I-a.e.), iff it holds
for all points with the exception of those in an I-null set.

With this definition, we have the following lemma, of which the first part is an easy variant of
Lemma 7.6.2(4):

Lemma 7.6.6 If g ∈ L^1(X, I) and h : X → E satisfies h(x) = g(x) for all x ∉ E, where E is an
I-null set, then h ∈ L^1(X) and I(g) = I(h).
A set E is an I-null set if and only if χE ∈ L^1(X, I) and I(χE) = 0.

For the first part, suppose E = Ef, an I-null set, where f ∈ L^1(X, I). The idea is to consider
(f + g) − f, and use Lemma 7.6.2(4).
Define p(x) := f(x) + g(x) wherever this makes sense; this includes all x ∉ Ef, and possibly some
x ∈ Ef. For the other x, define p(x) := f(x). Then p ∈ L^1(X, I) and I(p) = I(f) + I(g). Now
define q(x) := p(x) − f(x), wherever this makes sense (which includes all x ∉ Ef). Otherwise define
q(x) := h(x). Then q ∈ L^1(X, I) and I(q) = I(p) + I(−f) = I(p) − I(f) = I(g). We claim q = h
(thus proving part 1):
Indeed, if x ∉ Ef (i.e., f(x) finite), then p(x) is defined by f(x) + g(x), and q(x) is defined by
q(x) = p(x) − f(x) = g(x) = h(x). If x ∈ Ef, then p(x) = f(x), even if f(x) + g(x) should be
defined. In that case, p(x) − f(x) is not defined, hence again q(x) = h(x).
If E is an I-null set, then χE = 0 except on the I-null set E, hence χE is integrable and I(χE) = 0.
Conversely, assume χE is integrable and I(χE) = 0. We want to show that the function f defined
by f(x) = ∞ for x ∈ E and f(x) = 0 for x ∉ E is integrable. Let ε > 0. Since I(χE) = 0,
there exist gn ∈ ↑ V with gn ≥ χE and I(gn) < ε/2^n. We therefore have an increasing sequence
sn := g1 + . . . + gn in ↑ V, with I(sn) < Σ_{j=1}^n ε/2^j < ε. Let g := lim sn ∈ ↑ V by
Prop. 7.5.5, with I(g) ≤ ε from the same proposition, so g ∈ L^1(X, I). Clearly g(x) = ∞ for x ∈ E,
and therefore g ≥ f. This implies that I^*(f) ≤ I^*(g) = I(g) ≤ ε. Since this is true for every ε, we
have I^*(f) ≤ 0, and trivially I_*(f) ≥ 0. So f is integrable.

The following is immediate from this lemma and Lemma 7.6.2(6):

Corollary 7.6.7 Any subset of an I-null set is an I-null set, too.
In the case that I is the Riemann integral on S or Ccpt (Examples 7.4.4, 7.4.7), Lebesgue null sets
from Def. 7.3.2 are indeed I-null sets. The converse is also true, but is less easy to show, so we’ll
postpone this until after a more thorough discussion of the Lebesgue measure (Cor. 7.8.10).

Proposition 7.6.8 If N ⊂ Rn is a Lebesgue null set in the sense of Def. 7.3.2, then it is an
I-null set in the sense of Def. 7.6.5 with I denoting the extension of the elementary integral on step
functions.

Proof: Clearly, if U is a countable union of boxes Bj, then we can let Un := ∪_{j=1}^n Bj and χU ∈ ↑ S
with S ∋ χUn ↗ χU. As I(χBj) = µ(Bj) and I(χUn) ≤ Σ_{j=1}^n µ(Bj), we have I(χU) ≤ Σ_{j=1}^∞ µ(Bj). So
if N is a Lebesgue null set, it can be covered by U, a countable union of boxes Bj with total volume
Σ_j µ(Bj) ≤ ε. Therefore 0 ≤ I_*(χN) ≤ I^*(χN) ≤ I(χU) ≤ ε. Since this is true for every ε > 0, we conclude
I(χN) = 0.
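
The covering argument can be made concrete for a finite piece of a countable set. The sketch below is an illustration under the assumption that Def. 7.3.2 asks for covers by countably many boxes of total volume at most ε. It enumerates some rationals in [0, 1], covers the jth one by an interval of length ε/2^j, and adds up the lengths exactly. The function name and the choice ε = 1/100 are hypothetical.

```python
from fractions import Fraction


def rationals_in_unit_interval(max_den):
    """Distinct rationals p/q in [0, 1] with denominator q <= max_den."""
    seen, out = set(), []
    for q in range(1, max_den + 1):
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                out.append(x)
    return out


eps = Fraction(1, 100)
points = rationals_in_unit_interval(12)

# The j-th point (j = 1, 2, ...) gets an open interval of length eps / 2**j.
cover = [
    (x - eps / 2 ** (j + 1), x + eps / 2 ** (j + 1))
    for j, x in enumerate(points, start=1)
]
total_length = sum(b - a for a, b in cover)
```

However many points are enumerated, the total length is a partial sum of the geometric series Σ ε/2^j and therefore stays below ε.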

The same proposition applies when I is the extension of the Riemann integral on continuous
functions, with essentially the same proof, due to Exercise 7.11.
Note that in most of our examples, the constant function 1 is not integrable (due to the ‘infinite
volume’ of Rn we have I_*(1) = I^*(1) = ∞). This prevents us from directly applying Lemma 7.6.2(7)
to min{f, 1}. However, such a truncation is often desirable, and for this purpose we introduce the
following definitions and a generalization of Lemma 7.6.2(7).

Definition 7.6.9 A vector lattice V is said to satisfy Stone’s axiom, if for every f ∈ V , the
function min{f, 1} is in V as well.

It is easy to see that then min{f, N } and max{f, −N } are in V for every positive constant N .

Exercise 7.14 Show: If V satisfies Stone’s axiom, and I is an elementary integral on V, then its
Daniell extension satisfies the property f ∈ L^1(X, I) =⇒ min{f, N} ∈ L^1(X, I) for N > 0. Hint:
All proof steps are easy; the main issue is to go back through the construction, mention and modify
every ingredient that enters into Lemma 7.6.2(7), such as to see that all proof steps do indeed carry
over.

Clearly, our main examples S, C^0_cpt, C^0 and Seq satisfy Stone’s axiom.

Exercise 7.15 Let V = Seq with the elementary integral I from Example 7.4.9. Show that L1 (N, I)
consists precisely of the absolutely convergent series. Which sets are I-null sets?

Exercise 7.16 Let V = C 0 (Rn ) with the elementary integral δx∗ from Example 7.4.5. Show that
L1 (Rn , δx∗ ) consists of all those functions f : Rn → E for which f (x∗ ) is finite, and that the δx∗ -null
sets are exactly those subsets of Rn that do not contain x∗ .

[Figure: Venn diagram inside F(X → E), showing V, ↑ V, ↓ V and L^1(X, I) (the functions with I finite), the part of ↑ V where I = +∞, the part of ↓ V where I = −∞, and an outer region labeled ‘Out here I not def’d even if I^* = I_*’.]

This Venn diagram displays the general picture of the sets involved in the Daniell extension process.
Note that the sets V, ↑ V and ↓ V are independent of the choice of an elementary integral I, except
that V is the original domain of I, before the extension.
In the cases V = C^0, V = C^0_cpt, and V = Seq, it turns out that the gray set (↑ V ∩ ↓ V) \ V is empty.
However, in the case V = S, this gray set contains all of C^0_cpt except for the zero function, which is
already in V.

We can now get the following picture that combines the cases V = S and V = C^0_cpt each of which
can be used to construct the Lebesgue integral:

[Figure: Venn diagram inside F(Rn → E), showing C^0_cpt, ↑ C^0_cpt, ↓ C^0_cpt, S, ↑ S and L^1.]

As we will see in the next chapter, we get one and the same L^1 (Lebesgue-integrable functions;
dotted line) from the Daniell process on C^0_cpt (solid line) or on S (fine dashed line). Note that the
starting points, S and C^0_cpt, are ‘almost disjoint’: only the zero function is in the intersection.
After the first extension step (by monotonic convergence), the step function approach is more
powerful than the one via C^0_cpt functions. However, after the second step, the same set of Lebesgue
integrable functions is captured from either starting point.

7.7     The Lebesgue Integral vs the Riemann Integral; Convergence

First let us note that we have captured the Riemann integral by the extension process from S.

Theorem 7.7.1 Let I denote the elementary integral on S according to Example 7.4.4. Assume
f is Riemann integrable on a box B ⊂ Rn (and f = 0 outside B). Then f ∈ L^1(Rn, I), and I(f)
is the Riemann integral ∫_B f(x) d^n x.

Proof: The Riemann-integrability of f means: There exists a number J (namely J = ∫_B f(x) d^n x)
that makes the following claim true: for every ε > 0, there exists δ > 0, such that for every tagged
partition (P, {cj}) with meshsize σ(P) < δ, it holds

\[ J - \varepsilon < \sum_j f(c_j)\,\mu(B_j) < J + \varepsilon . \]

By taking the sup or inf as cj ∈ Bj, we conclude (with the same quantifiers)

\[ J - \varepsilon \le \sum_j \Bigl(\inf_{B_j} f\Bigr)\,\mu(B_j) \le \sum_j \Bigl(\sup_{B_j} f\Bigr)\,\mu(B_j) \le J + \varepsilon . \]


In particular, the sup_{Bj} f and inf_{Bj} f involved are finite. (This amounts to the proof that every
Riemann integrable function is bounded.) For each partition P, the function gP that is constant
sup_{Bj} f on each open box Bj (and whatever on the boundaries of the boxes) is a step function
in S, and therefore trivially in ↑ S. Likewise, we can define hP ∈ S ⊂ ↓ S by taking inf_{Bj} f on
each box Bj. By taking the sup over all h ∈ ↓ S (which includes step functions hP obtained from
partitions with meshsize < δ), we therefore find I_*(f) ≥ J − ε. Likewise we find I^*(f) ≤ J + ε.
Since this is true for all ε > 0, we obtain J ≤ I_*(f) ≤ I^*(f) ≤ J.
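
The squeeze between the step functions hP and gP can be observed in the simplest example. The sketch below takes f(x) = x^2 on [0, 1] with uniform partitions (both choices are assumptions made for this illustration). Since f is increasing there, its inf and sup on each cell sit at the endpoints, so the integrals of hP and gP can be computed exactly with rational arithmetic.

```python
from fractions import Fraction


def step_bounds(cells):
    """Integrals of the lower and upper step functions for f(x) = x^2 on [0, 1].

    The partition is uniform with the given number of cells. Because f is
    increasing on [0, 1], inf f on [a, b] is f(a) and sup f is f(b).
    """
    h = Fraction(1, cells)
    lower = sum((j * h) ** 2 * h for j in range(cells))
    upper = sum(((j + 1) * h) ** 2 * h for j in range(cells))
    return lower, upper


bounds = {cells: step_bounds(cells) for cells in (10, 100, 1000)}
riemann_value = Fraction(1, 3)  # the Riemann integral of x^2 over [0, 1]
```

For every partition the pair (lower, upper) brackets 1/3, and the width of the bracket equals the mesh size, so lower and upper integral are forced to coincide with the Riemann integral.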

Thanks to this theorem, we now write I(f) as ∫_{Rn} f(x) d^n x, when the Daniell extension of the
integral on S is meant. We refer to this integral as the Lebesgue integral of f, and we have proved
that it is an extension of the Riemann integral. For any set A ⊂ Rn, we understand ∫_A f(x) d^n x to
mean ∫_{Rn} f(x)χA(x) d^n x, if this integral makes sense.
As a key step towards the convergence theorems for the Lebesgue integral (or, more generally,
any Daniell extension of an elementary integral), we study the behavior of the integral under
monotonic limits. Note that the limit of an increasing sequence fn is the same as the infinite series
f1 + Σ_{n=1}^∞ (fn+1 − fn), a series with nonnegative terms. This latter point of view is now a bit more
convenient than the former. We can still maintain the generality of the Daniell extension of any
elementary integral.

Theorem 7.7.2 Let fn : X → [0, ∞]. Then I^*(Σ_{n=1}^∞ fn) ≤ Σ_{n=1}^∞ I^*(fn) and I_*(Σ_{n=1}^∞ fn) ≥ Σ_{n=1}^∞ I_*(fn).

Proof: We first take the case of I^*. If any I^*(fn) is infinite, the inequality is trivial. So
assume all I^*(fn) are finite, and select gn ∈ ↑ V such that gn ≥ fn and I(gn) < I^*(fn) + ε/2^n, where
ε > 0 is arbitrary. Using Prop. 7.5.5, we have ↑ V ∋ Σ_{n=1}^∞ gn ≥ Σ_{n=1}^∞ fn and therefore

\[ I^*\Bigl(\sum_{n=1}^\infty f_n\Bigr) \le I\Bigl(\sum_{n=1}^\infty g_n\Bigr) = \sum_{n=1}^\infty I(g_n) \le \sum_{n=1}^\infty I^*(f_n) + \varepsilon . \]

The claim follows as ε → 0.

For I_*, we combine the monotonicity of I_* with the desired inequality for finitely many terms
(Cor. 7.6.3) to get

\[ I_*\Bigl(\sum_{n=1}^\infty f_n\Bigr) \ge I_*\Bigl(\sum_{n=1}^N f_n\Bigr) \ge \sum_{n=1}^N I_*(f_n) . \]

Now let N → ∞ on the right hand side.

The following first convergence theorem is an immediate consequence of this result:

Theorem 7.7.3 (Thm. of Beppo Levi) Let (fn) be a sequence of nonnegative, integrable functions
and f := Σ_{n=1}^∞ fn. If Σ_{n=1}^∞ I(fn) < ∞, then f is integrable and I(f) = Σ_{n=1}^∞ I(fn).
If (gn) is an increasing (resp. decreasing) sequence of integrable functions, with gn ↗ g (resp.
gn ↘ g), and lim I(gn) < ∞ (resp. lim I(gn) > −∞), then g is integrable and I(g) = lim I(gn).

Proof: The first statement follows immediately from the preceding result by

\[ \sum_{n=1}^\infty I(f_n) = \sum_{n=1}^\infty I_*(f_n) \le I_*\Bigl(\sum_{n=1}^\infty f_n\Bigr) \le I^*\Bigl(\sum_{n=1}^\infty f_n\Bigr) \le \sum_{n=1}^\infty I^*(f_n) = \sum_{n=1}^\infty I(f_n) . \]

The second statement, for increasing, follows by applying the first result to fn := gn+1 − gn. (We
may let fn(x) = ∞ where the arithmetic on E fails us.) For decreasing, use the negative of these
functions.
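
A series where both sides of Beppo Levi's theorem can be computed by hand is Σ fn with fn(x) = x^(n−1)(1 − x) on [0, 1] and fn = 0 elsewhere. This example and the function names below are chosen for this illustration and do not come from the text. The integrals are I(fn) = 1/n − 1/(n+1), and the partial sums of the series telescope to 1 − x^N on [0, 1].

```python
from fractions import Fraction


def integral_f(n):
    """Integral over [0, 1] of f_n(x) = x^(n-1) * (1 - x), computed by hand."""
    return Fraction(1, n) - Fraction(1, n + 1)


def partial_sum_at(x, terms):
    """f_1(x) + ... + f_terms(x) for a point x in [0, 1]."""
    return sum(x ** (n - 1) * (1 - x) for n in range(1, terms + 1))


# Partial sums of the integrals: 1 - 1/(N+1), increasing to 1.
partial_integrals = [
    sum(integral_f(n) for n in range(1, N + 1)) for N in (1, 10, 100)
]

# Pointwise partial sum at x = 1/2: equals 1 - (1/2)**20.
pointwise = partial_sum_at(Fraction(1, 2), 20)
```

The sum of the integrals is 1, which agrees with the integral of the limit function: the series converges to 1 on [0, 1[ and to 0 at the single point x = 1.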

An immediate consequence is

Corollary 7.7.4 The union of countably many I-null sets is an I-null set.

Proof: Let An be a countable family of I-null sets, and take fn to be their characteristic
functions. Let f be the characteristic function of ∪_n An. Then 0 ≤ f ≤ Σ_n fn, and
I(Σ_n fn) = Σ_n I(fn) = Σ_n 0 = 0 by Thm. 7.7.3. Now I(f) = 0 follows from Lemma 7.6.2(6).

We have seen examples where the integral does not commute with pointwise limits (Exercise 7.6).
However, we do have a one-sided inequality for integrals of nonnegative functions, and it is some-
times useful:

Lemma 7.7.5 (Fatou’s Lemma) Let fn be nonnegative integrable functions with sup I(fn ) < ∞.
Then f := lim inf n→∞ fn (defined pointwise) is integrable, and I(f ) ≤ lim inf n→∞ I(fn ).

Proof: For fixed n, the sequence (g_{n,m} := min{fn, fn+1, . . . , fn+m})_m is a decreasing sequence of
nonnegative integrable functions, and therefore, from Thm. 7.7.3, its limit gn = inf{fk | k ≥ n} is
integrable, and I(gn) = lim_{m→∞} I(g_{n,m}). In particular I(gn) ≤ I(fk) for every k ≥ n. So we can
take the inf over k and get a bound I(gn) ≤ inf_{k≥n} I(fk) ≤ sup I(fk) < ∞. Now we can let n → ∞,
and (gn) is an increasing sequence: gn ↗ lim inf fk = f. Again by Thm. 7.7.3, f is integrable and
I(f) = lim I(gn) ≤ lim_{n→∞} inf_{k≥n} I(fk) = lim inf I(fk).
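
The inequality in Fatou's lemma can be strict. A standard example, sketched below with hypothetical function names, is fn = n·χ_]0,1/n[ on R: each fn is a step function with I(fn) = 1, while fn(x) = 0 for every fixed x as soon as n ≥ 1/x, so that lim inf fn = 0 everywhere.

```python
from fractions import Fraction


def f(n, x):
    """f_n = n times the characteristic function of the open interval ]0, 1/n[."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Integral of the step function f_n: height n times length 1/n."""
    return n * Fraction(1, n)


x = Fraction(1, 7)
tail_values = [f(n, x) for n in range(7, 50)]        # values of f_n(x) for n >= 1/x
integrals = [integral_f(n) for n in range(1, 50)]    # I(f_n) for n = 1, ..., 49
```

Here I(lim inf fn) = 0 while lim inf I(fn) = 1, so equality cannot be expected in general.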

Exercise 7.17 Show that the hypothesis sup I(fn ) < ∞ can be replaced by the weaker hypothesis
lim inf I(fn ) < ∞.

We are now in a position to prove Lebesgue’s dominated convergence theorem, which is almost a
universal tool for exchanging limits with integration.

Theorem 7.7.6 (Dominated Convergence) Suppose fn are integrable and fn → f I-almost everywhere
(i.e., there exists an I-null set N such that fn(x) → f(x) for all x ∉ N). Suppose further
that there is an integrable function g such that |fn| ≤ g for all n I-almost everywhere. Then f is
integrable, and I(f) = lim I(fn).

Proof: By changing fn and f to values 0 on the null set N , we change the a.e.-convergence to
pointwise (everywhere) convergence without changing the integrability hypothesis for the fn or the
integrability claim for f . Moreover, by changing the value of g to +∞ on the set ∪Nn , where Nn
is the null set on which the condition |fn (x)| ≤ g(x) fails, we are not changing the integrability of
g, because the countable union of the null sets Nn is still a null set (by Cor. 7.7.4). Therefore it
suffices to prove the theorem with ‘a.e.’ replaced by ‘everywhere’.
Now, since g − fn ≥ 0, we can use Fatou’s lemma to get that g − f (and hence f ) is integrable,
and I(g − f ) ≤ lim inf I(g − fn ) = I(g) − lim sup I(fn ). Likewise, using Fatou’s lemma on g + fn ,
we get I(g + f ) ≤ I(g) + lim inf I(fn ). Putting the two together and cancelling I(g), we conclude
I(f ) ≤ lim inf I(fn ) ≤ lim sup I(fn ) ≤ I(f ). Hence lim I(fn ) = I(f ).
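
As a numerical illustration of the theorem, consider the sequence fn (x) = nx/(1 + n²x²) on [0, 1]. This example is my own choice, not taken from the notes. It satisfies 0 ≤ fn ≤ 1/2, so the constant 1/2 serves as the dominating function g, and fn → 0 pointwise. The integrals are ln(1 + n²)/(2n), which indeed tend to 0. The sketch below compares a midpoint Riemann sum (an approximation, with an arbitrarily chosen number of cells) with this closed form.

```python
import math

def f(n, x):
    """Illustrative sequence (an assumption): f_n(x) = n x / (1 + n^2 x^2)."""
    return n * x / (1.0 + (n * x) ** 2)

def midpoint_integral(n, cells=20000):
    """Midpoint Riemann sum of f_n over [0, 1]; only an approximation of the integral."""
    h = 1.0 / cells
    return h * sum(f(n, (i + 0.5) * h) for i in range(cells))

def closed_form(n):
    """Exact value of the integral of f_n over [0, 1], namely ln(1 + n^2) / (2 n)."""
    return math.log(1.0 + n * n) / (2.0 * n)

numeric = {n: midpoint_integral(n) for n in (1, 10, 100)}
exact = {n: closed_form(n) for n in (1, 10, 100, 10**6)}
```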

Before continuing with the theory of L1 (X, I) in general, let us make good on the promise that the
two approaches to the Lebesgue integral are equivalent:

Theorem 7.7.7 Let I_S be the elementary integral on step functions according to Example 7.4.4,
and let I_C be the Riemann integral on C^0_cpt functions according to Example 7.4.7 (equivalently,
the restriction of I_S to C^0_cpt). Then L1(Rn, I_S) = L1(Rn, I_C), and I_S(f) = I_C(f) for all integrable
functions f.

Proof: By Prop. 7.5.10, we have ↑C^0_cpt ⊂ ↑S. Moreover I_C(f) = I_S(f) for all f ∈ C^0_cpt, because
the Riemann integral on C^0_cpt is a special case of the Lebesgue integral by Thm. 7.7.1. By going to
increasing limits, we have I_C(f) = I_S(f) for all f ∈ ↑C^0_cpt as well. With analogous proof, the same
holds for all f ∈ ↓C^0_cpt. Now suppose f is I_C-integrable. Then, given ε > 0, there exist g ∈ ↑C^0_cpt
and h ∈ ↓C^0_cpt such that h ≤ f ≤ g and I_C(g) − I_C(h) < ε. The same g, h are in ↑S and ↓S and
provide for I_S(g) − I_S(h) < ε, hence f is I_S-integrable by Cor. 7.6.4.
The converse inclusion needs a bit more work, because ↑S is a strict superset of ↑C^0_cpt; set inclusion
arguments alone cannot rule out that I_S is more powerful than I_C. So assume f ∈ L1(Rn, I_S) and
let ε > 0. There exists g ∈ ↑S such that g ≥ f and I_S(g) < I_S(f) + ε/2. We want to modify g to
make it lower semicontinuous. There exist step functions g_k such that g_k ↗ g. Each g_k is constant on the
‘steps’ (open boxes) B_{k,j}. We change the values on the boundaries of the steps: Let g̃_k(x) := g_k(x)
for all x ∈ B_{k,j}, and define g̃_k(x) for x ∈ ∪_j ∂B_{k,j} to be the smallest among those values g_k(B_{k,j})
for which x is in the closure of B_{k,j}. Then I_S(g̃_k) = I_S(g_k), but g̃_k is lower semicontinuous. Let
the limit of the sequence (g̃_k) be g̃.
The sequence (g̃_k) is still increasing. Indeed, for x ∉ ∪_j ∂B_{k,j} ∪ ∪_j ∂B_{k+1,j}, we have g̃_{k+1}(x) =
g_{k+1}(x) ≥ g_k(x) = g̃_k(x). For the other x’s we choose a sequence x_r → x with x_r such that
lim g̃_{k+1}(x_r) = g̃_{k+1}(x); this can be achieved by choosing the x_r from the appropriate among the
boxes B_{k+1,j} (and not on any of the boundaries ∂B_{k,j} either). Then g̃_{k+1}(x_r) ≥ g̃_k(x_r) implies
g̃_{k+1}(x) = lim g̃_{k+1}(x_r) ≥ lim inf g̃_k(x_r) ≥ g̃_k(x).
In contrast to g ≥ f, we may have violated the inequality g̃ ≥ f on an I_S-null set. We can
cover ∪_j ∂B_{k,j} with finitely many open boxes D_{k,r}, the sum of whose volumes is < ε/(2^(k+1) M_k),
where M_k := sup g_k − inf g_k. The χ_{D_{k,r}} are lower semicontinuous, because the D_{k,r} are open. So
we let ĝ_k := g̃_k + Σ_{l≤k} Σ_r M_l χ_{D_{l,r}}. Now (ĝ_k) is an increasing sequence of lsc functions, whose
limit ĝ is therefore also lsc. (The proof of Exercise 7.4 carries over to the present situation.) And
I_S(ĝ) ≤ I_S(g̃) + Σ_k ε/2^(k+1) = I_S(g) + ε/2 < I_S(f) + ε. Since ĝ_k ≥ g_k, we have ĝ ≥ g ≥ f.
Since ĝ ∈ ↑S, it is ≥ 0 outside some compact set, and since ĝ is also lower semicontinuous, it is
in ↑C^0_cpt by Prop. 7.5.9. But this means (I_C)^∗(f) ≤ I_S(f) + ε. A similar construction can be made
with the lower integral to obtain (I_C)_∗(f) ≥ I_S(f) − ε. Together, they prove the I_C-integrability of
f and the equality of the integrals.

We infer, from the properties of integrability, that L1 (X, I) satisfies all properties of a vector
lattice, except for the trouble with not-always defined vector space operations due to possible
infinite values of functions in L1 (X, I). Restricting I to finite-valued integrable functions, we have
again an elementary integral and may wonder if repetition of the Daniell extension process produces
further integrable functions. The answer is negative:

Theorem 7.7.8 Let I be the Daniell extension of an elementary integral on V . Let J be the
restriction of I to the vector lattice W consisting of the finite valued functions among L1 (X, I).
Then L1 (X, J) = L1 (X, I), and J(f ) = I(f ).

Proof: The fact that W is a vector lattice and J an elementary integral follows from Lemma 7.6.2
(parts 3,4,5,7) and the Beppo Levi theorem 7.7.3. The inclusion L1 (X, I) ⊂ L1 (X, J) is immediate
from the inclusion V ⊂ W , with details as in the first part of the proof of Thm. 7.7.7.
So let us now assume f is J-integrable. Then, given ε > 0, there exist functions g ∈ ↑W and h ∈ ↓W
such that h ≤ f ≤ g and J(g) < J(f) + ε/2 and J(h) > J(f) − ε/2. For such functions, the Beppo Levi
theorem guarantees g, h ∈ L1(X, I), and J(g) = I(g), J(h) = I(h). So we have found I-integrable
functions h, g such that h ≤ f ≤ g and I(g) − ε/2 < J(f) ≤ I(g) and I(h) ≤ J(f) < I(h) + ε/2. By
the monotonicity of the upper and lower integrals, we infer

     J(f) − ε/2 ≤ I(h) = I_∗(h) ≤ I_∗(f) ≤ I^∗(f) ≤ I^∗(g) = I(g) ≤ J(f) + ε/2.

The claim follows.

The following lemma will help in proving that the Daniell extension process makes good on the
goal of metric completion of V = Ccpt with respect to the norm ‖f‖ := ∫ |f (x)| dx.

Lemma 7.7.9 f ∈ L1 (X, I) iff ∀ε > 0 ∃f0 ∈ V : I ∗ (|f − f0 |) < ε.

Proof: Assume f ∈ L1(X, I) and let ε > 0. Then there exists a g ∈ ↑V such that g ≥ f and I(g) < I(f) + ε/2.
There also exists g0 ∈ V such that g0 ≤ g and I(g0) > I(g) − ε/2. Then I(|f − g0|) ≤ I(|f − g|) +
I(|g − g0|) < ε/2 + ε/2 = ε. We can choose f0 := g0.
Conversely, assume that for every ε > 0 there is some f0 ∈ V such that I^∗(|f − f0|) < ε. So there
exists a g ∈ ↑V such that |f − f0| ≤ g (i.e., f0 − g ≤ f ≤ f0 + g) and I(g) < ε. But then

     I(f0) − I(g) ≤ I_∗(f) ≤ I^∗(f) ≤ I(f0) + I(g)

and therefore I^∗(f) − I_∗(f) ≤ 2I(g) < 2ε. This implies the integrability of f.

Definition 7.7.10 A real-valued function ‖·‖ on a vector space is called a seminorm, iff it satisfies
the following axioms:

 ‖f + g‖ ≤ ‖f‖ + ‖g‖ for all f, g,
 ‖λf‖ = |λ| ‖f‖ for all λ ∈ R and all f,
 ‖f‖ ≥ 0 for all f.

So what is missing from a norm is the property ‖f‖ = 0 =⇒ f = 0. It is immediate that an
elementary integral I on a vector lattice V defines a seminorm there by ‖f‖ := I(|f|). The same
applies for the Daniell extension of I, restricted to the finite valued functions in L1 (X, I) such as to
have a vector space as a domain. For the Riemann integral on V = Ccpt , ‖f‖ := I(|f|) is actually a
norm. However, this is no longer true for the extension, because, e.g., ‖χN‖ = 0 in case N ≠ ∅ is a
null set, but χN ≠ 0. And also for other examples of elementary integrals (e.g., δx∗ on C^0), I(|f|)
need not be a norm. The way to fix this deficit is to consider equivalence classes of functions.

Lemma 7.7.11 The relation f ∼ g : ⇐⇒ f = g a.e. is an equivalence relation.

The proof is trivial; we can now define

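Definition 7.7.12 L1 (X, I) := ℒ1 (X, I)/ ∼, i.e., the set of equivalence classes of I-integrable
functions modulo equality I-a.e. Here ℒ1 (X, I) stands for the set of I-integrable functions itself
(the space considered so far), whereas L1 (X, I) denotes the quotient from now on.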

Now we have the crucial

Theorem 7.7.13 L1 (X, I), together with ‖·‖L1 defined by ‖f‖L1 := I(|f|), is a Banach space.

Proof: Despite the slight abuse of notation, remember that the elements of L1 (X, I) are not
functions f , but equivalence classes [f ] of functions. As a consequence of Lemma 7.6.6, each equiv-
alence class [f ] contains functions that have finite values everywhere. The vector space operations
are defined in terms of such representatives: [f ]+[g] := [f +g], λ[f ] := [λf ], and this is well-defined,
i.e., the definition does not depend on the choice of representatives: If f1 ∼ f2 and g1 ∼ g2 then
f1 + g1 ∼ f2 + g2 . Likewise the norm is well-defined by choosing a representative. The seminorm
properties of ‖f‖ translate into seminorm properties for ‖[f]‖. Now if ‖[f]‖ = 0, this means ‖f‖ = 0
for a representative, i.e., I(|f|) = 0, and then (why actually?) f ∼ 0, i.e., [f] = [0]. So ‖·‖ is a
norm on the quotient space L1 (X, I).

Exercise 7.18 Write out the proof details of these statements.

Proof continues: The key issue is to prove completeness. Suppose ([fn ]) is a Cauchy sequence in
L1 (X, I). We will construct a limit [f ] with respect to the L1 norm by constructing a representative
f as a pointwise (a.e.) limit of a subsequence of (f_n). Indeed, we will define n_j inductively such
that ‖f_n − f_m‖ < 2^−j provided n, m ≥ n_j and (for j > 1) n_j > n_{j−1}. Then g_N := Σ_{j=1}^N |f_{n_{j+1}} − f_{n_j}|
defines an increasing sequence (g_N) of integrable functions with I(g_N) bounded by Σ_j 2^−j = 1. By
Beppo Levi, g := lim g_N exists pointwise, is integrable and finite almost everywhere. But then, for
each x for which g(x) is finite, the sequence f_{n_k}(x) = f_{n_1}(x) + Σ_{j=2}^k (f_{n_j}(x) − f_{n_{j−1}}(x)) converges,
because it converges absolutely. In other words, (f_{n_k})_k converges a.e. to a function f. But this
sequence is also majorized by the integrable function |f_{n_1}| + g, and therefore f is integrable and
I(f) = lim I(f_{n_k}). More even, I(|f − f_{n_k}|) ≤ Σ_{j=k}^∞ I(|f_{n_{j+1}} − f_{n_j}|) ≤ Σ_{j=k}^∞ 2^−j = 2^−(k−1). So [f_{n_k}] → [f]
in the ‖·‖L1 sense. We know from Lemma 3.11.3 that a Cauchy sequence that has a convergent
subsequence is convergent itself.

Let us extract from this proof the following

Corollary 7.7.14 Every L1 -convergent sequence has an a.e. convergent subsequence.

It bears repeating that when we talk about an L1 -convergent sequence, then each member of
this sequence is an equivalence class of functions. We get a sequence of functions by choosing a
representative from each equivalence class. It makes sense, for such a sequence, to ask whether it
(or a subsequence thereof) converges pointwise, or at least almost everywhere. The answer to the
question ‘almost everywhere convergent?’ is independent of the choice of representatives, because
each different choice affects only the values on a null set, and the union of countably many null
sets (one for each sequence index n) is still a null set. These issues are often conveniently glossed
over, and the theory is set up in such a way that usually hypotheses and conclusions are only
intended to hold almost everywhere. So the issue of choosing representatives of each equivalence
class is a non-issue in most practical applications. However, bear the distinction in mind to avoid
confusion in the very rare circumstances where it is crucial. If you were to ask: “Does every L1 -
convergent sequence have a pointwise convergent subsequence?”, this would, strictly speaking, be
a meaningless question; the answer would depend on a choice of representatives; a rewording of
the question that makes it meaningful would read: “Can every L1 -convergent sequence be given
a choice of representatives such that the sequence of representatives has a pointwise convergent
subsequence?” The answer to this question would be ‘yes’, and the reason is simply Cor. 7.7.14.
The exceptional set allowed there can be removed by choosing representatives that are (e.g.) 0 on
the exceptional set.
It is noteworthy that an L1 convergent sequence need not be a.e.-convergent itself:

Example 7.7.15 Given [a, b[ ⊂ R with b − a < 1, define the ‘wrap-around characteristic function’
χ̃_[a,b[ on [0, 1] as follows: If n ≤ a < b ≤ n + 1 for some integer n, then χ̃_[a,b[ := χ_[a−n,b−n[. If
n ≤ a < n + 1 < b for some n ∈ N, then χ̃_[a,b[ := χ_[a−n,1] + χ_[0,b−n−1[. Let a_n := Σ_{j=1}^n 1/j and define
f_n := χ̃_[a_n,a_{n+1}[. Then f_n converges nowhere in [0, 1], but converges to 0 in the L1 norm.
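
The behaviour in this example can be observed with exact rational arithmetic. The following is a minimal sketch (the function names, the horizon N = 300 and the sample point x = 3/10 are my own choices): the L1 norm of the n-th function is the interval length 1/(n + 1), while at the fixed point x the values keep returning to 1, because the harmonic series diverges and the intervals sweep over [0, 1] again and again.

```python
from fractions import Fraction
from math import floor

def wrap_indicator(a, b, x):
    """Wrap-around characteristic function of [a, b[ at x in [0, 1]; requires b - a < 1."""
    n = floor(a)
    if b <= n + 1:
        # no wrap-around: indicator of [a - n, b - n[
        return 1 if a - n <= x < b - n else 0
    # wrap-around: indicator of [a - n, 1] plus indicator of [0, b - n - 1[
    return (1 if a - n <= x <= 1 else 0) + (1 if 0 <= x < b - n - 1 else 0)

N = 300                                   # arbitrary finite horizon
a = [Fraction(0)]
for j in range(1, N + 2):
    a.append(a[-1] + Fraction(1, j))      # a[n] = 1 + 1/2 + ... + 1/n

def f(n, x):
    return wrap_indicator(a[n], a[n + 1], x)

def l1_norm(n):
    """Integral of f_n over [0, 1], i.e. the total length a[n+1] - a[n]."""
    return a[n + 1] - a[n]

x = Fraction(3, 10)
values = [f(n, x) for n in range(1, N + 1)]
hit_indices = [n for n, v in zip(range(1, N + 1), values) if v == 1]
```

The list hit_indices collects those n ≤ N with fn (x) = 1; it keeps growing (slowly) as N is increased, whereas l1_norm(n) tends to 0.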

While the issue of choosing representatives is irrelevant in most examples, let me give you one
simple example where it needs to be addressed: In a practical context, we may obtain a ‘function’
f out of an argument using the vector space L1 ; i.e., we may say something in the style “Since
this is a Cauchy sequence in L1 , it has a limit; let’s call this limit f .” Now as we pretend this
limit to be a function, we may want to ask whether this function is continuous. But in reality, we
have constructed an equivalence class [f ], and our question of continuity really asks whether this
equivalence class [f ] has a continuous representative. Specifically for the Lebesgue integral I, if
two continuous functions are not the same, then they differ on an open interval (which is not an
I-null set), so they cannot be equivalent. This means in the case of the Lebesgue integral, every
equivalence class has at most one continuous representative. If it has one indeed, we would choose
this as the preferred representative and call this [f ] ∈ L1 continuous, because it has a (unique)
continuous representative. So g represents a continuous equivalence class [g], if there is a continuous
function f such that g = f almost everywhere. This is not the same as saying g is continuous
almost everywhere! χQ represents a continuous element of L1 , namely [χQ ] = [0], but χQ is nowhere
continuous.
It gives a ‘good feeling’ to have a way of choosing a representative f of an equivalence class [f ] ∈ L1
in a natural way (even though in most cases the need doesn’t arise). For this ease of mind, I quote
a theorem (whose proof is beyond the scope of this class) that helps out in this respect:

Theorem 7.7.16 (Lebesgue’s differentiation theorem) If f is Lebesgue integrable, then the limit of
ball averages

     g(x) := lim_{r→0} ( ∫_{B(x,r)} f(y) d^n y ) / ( ∫_{B(x,r)} 1 d^n y )

exists almost everywhere, and g = f a.e. The balls are understood with respect to the euclidean
metric here.
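
To see what the theorem says in the simplest case, take n = 1 and f = χ[0,1] (an example of my own choosing, not from the notes). The ball averages can then be computed exactly as the length of an overlap of intervals divided by 2r. The sketch below evaluates them for shrinking radii. At interior and exterior points they reproduce f (x), whereas at the two jump points they are 1/2 for all small r, so g differs from f there, but only on a null set.

```python
from fractions import Fraction

def f(x):
    """Illustrative integrable function (an assumption): the indicator of [0, 1]."""
    return 1 if 0 <= x <= 1 else 0

def ball_average(x, r):
    """Average of the indicator of [0, 1] over the interval ]x - r, x + r[."""
    overlap = max(Fraction(0), min(x + r, Fraction(1)) - max(x - r, Fraction(0)))
    return overlap / (2 * r)

radii = [Fraction(1, 2**k) for k in range(1, 12)]
points = (Fraction(-1, 4), Fraction(0), Fraction(1, 3), Fraction(1))
averages = {x: [ball_average(x, r) for r in radii] for x in points}
```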

This theorem gives a well-defined and natural way of assigning a value f (x) to an [f ] ∈ L1 at least
in almost every point x, and it selects in a natural and unambiguous way the precise null set on
which we may refrain from assigning a value f (x).
In the sequel, we will take the liberty (as is common usage) of writing elements of L1 like functions,
when this causes no problems; but we will revert to the equivalence class notation [f ] whenever
needed for mathematical or didactical reasons.
Finally, it is instructive to observe a similarity between the completion of (Q, deucl ) and the completion
of (Ccpt (K), ‖·‖L1 ). In both cases we started out with the order structure. For getting
R from Q we introduced the supremum axiom, rather than going through an actual construction;
but we could have gone a constructive route instead. We chose the axiomatic approach in this
course, because the constructive approach would have been more ‘pre-analysis’ foundational. It
turned out subsequently that this completion process based on order achieved metric completeness.
The Daniell extension process relied on the order structure as well; this time we carried out a
construction in minute detail, because it is of germane analysis interest. The resulting space L1
again turns out to be metrically complete.

7.8     Measurable Functions and Sets

General Properties that apply to all vector lattices and integrals:
We continue to study an integral I that is the Daniell extension of an elementary integral on a
vector lattice V ⊂ F(X → R).
Informally speaking, there are two ways in which a function can fail to be integrable. One is that
a reasonable candidate for the integral just fails to be finite, another is that the function is so
‘wild’ that a reasonable candidate for its integral cannot be found. A function is measurable if the
second alternative does not happen. It is difficult to come up with examples of functions that are
not measurable. Measurability is a notion that is not intuitive to define in the first place, but once
defined, it is a hypothesis that is easy to check and easy to fulfill.
Here are the definitions:

Definition 7.8.1 Given functions h, f, g : X → E, where we assume h ≤ g, define med(h, f, g) :=
max{h, min{f, g}}.

Intuitively, the graph of med(h, f, g) is obtained by truncating the graph of f from above with the
graph of g and from below with the graph of h.
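
The truncation picture can be sketched in a few lines of Python. The cubic sample function and the constant cut-off levels ±1/4 below are hypothetical choices for illustration only; med itself is exactly the expression from Definition 7.8.1, applied pointwise.

```python
def med(h, f, g):
    """med(h, f, g) = max{h, min{f, g}} for numbers with h <= g."""
    return max(h, min(f, g))

def cubic(x):
    """Hypothetical sample function f (not from the notes)."""
    return x**3 - x

def upper(x):
    """Upper cut-off g, here a constant."""
    return 0.25

def lower(x):
    """Lower cut-off h, here a constant."""
    return -0.25

grid = [i / 50 - 2 for i in range(201)]      # sample points in [-2, 2]
truncated = [med(lower(x), cubic(x), upper(x)) for x in grid]
```

On the grid, the truncated values agree with the sample function wherever it lies between the two cut-offs, and they equal the nearer cut-off elsewhere.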

Exercise 7.19 Assume h ≤ g. Show: max{h, min{f, g}} = min{max{h, f }, g}. What is med(h, f, g)
in case f > g, and what in case f < h?

Definition 7.8.2 f : X → E is I-measurable, iff med(−g, f, g) is I-integrable for every non-negative
I-integrable g. A set A ⊂ X is I-measurable if its characteristic function χA is I-measurable.

We have an immediate corollary:

Corollary 7.8.3 If f is integrable, then f is measurable. If f is measurable and |f | ≤ g for an
integrable function g, then f is integrable; specifically, if |f | is integrable and f is measurable, then
f is integrable.
If f is measurable, then med(h, f, g) is integrable for every pair of integrable functions (h, g) with
h ≤ g.

Exercise 7.20 Prove this corollary.

Definition 7.8.4 Let A ⊂ X. We define µ∗ (A) := I ∗ (χA ) and call it the outer measure of A. If
χA is integrable, we denote I(χA ) =: µ(A) and call it the measure of A. If χA is measurable, but
not integrable, we let µ(A) := +∞ and still call it the measure of A. If I needs to be specified, an
index I can be attached to µ.

In view of Lemma 7.6.6, the I-null sets are exactly the sets of measure 0.
We get a list of easy properties for measurable functions and sets:

Proposition 7.8.5 (1) If f is measurable and g = f a.e., then g is measurable.
(2) If f1 and f2 are measurable, then f1 + f2 , provided it is defined a.e., is measurable. Constant
multiples of measurable functions, if defined a.e., are measurable.
(3) Absolute values and minima and maxima of measurable functions are measurable.
(4) If (fn ) is a sequence of measurable functions that converges a.e. to f , then f is measurable.

Proof: The first statement follows from Lemma 7.6.6. Now assume f1 and f2 are measurable and
consider med(−g, f1 + f2 , g) for g integrable. In view of the first part, it is no loss of generality to
assume that f1 + f2 is defined everywhere. Since fi,n := med(−ng, fi , ng) are integrable, so is f1,n +
f2,n , and therefore med(−g, f1,n + f2,n , g) is also integrable. Now f1,n (x) + f2,n (x) → f1 (x) + f2 (x)
as n → ∞, provided g(x) > 0. Therefore med(−g, f1,n + f2,n , g) → med(−g, f1 + f2 , g) pointwise
(in points where g(x) = 0 this claim is trivial). By majorized convergence, med(−g, f1 + f2 , g) is
integrable, hence f1 + f2 is measurable.
Now assume f measurable and λ ∈ R. Assume λ ≠ 0, as the other case is trivial. Then
med(−g, λf, g) = λ med(−g/|λ|, f, g/|λ|), and the measurability of λf follows. If f is measurable,
then |f | is measurable because med(−g, |f |, g) = | med(−g, f, g)|, and the rhs is integrable,
because it is the absolute value of an integrable function; see Lemma 7.6.2(7). Now if f1 , f2 are
measurable and at least one of them is finite valued a.e., then we can argue that max{f1 , f2 } =
(f1 + f2 )/2 + |f1 − f2 |/2 is also measurable; and similarly for min{f1 , f2 }. The result also holds
without the assumption of finite-valuedness (and thus with f1 ± f2 maybe not defined a.e.): namely
we can argue that med(−g, max{f1 , f2 }, g) = max{med(−g, f1 , g), med(−g, f2 , g)}, as can be eas-
ily seen by a pointwise case distinction. Then refer to Lemma 7.6.2(7). Majorized convergence
proves that pointwise limits of measurable functions are measurable; namely, if fn → f a.e., then
med(−g, fn , g) → med(−g, f, g) a.e., and majorized by |g|, which was assumed integrable.

Similarly, we have

Proposition 7.8.6 Countable unions and countable intersections of measurable sets are measur-
able. If Stone’s axiom applies, complements of measurable sets are measurable.

Proof: The characteristic function of ∪_{i=1}^∞ Ai is lim_{n→∞} max{χA1 , . . . , χAn }. A similar formula
with min instead of max applies for intersections. By the preceding proposition, these functions
are measurable, if the χAi are. Stone’s axiom guarantees that min{g, 1} is integrable if g is. So the
constant 1 is measurable. Then 1 − χA is measurable if χA is.
Let us study an easy example, before tackling specifically the Lebesgue measure:

Exercise 7.21 In the case of V = Seq with I as in Example 7.4.9, show that all sequences N → E
are measurable.

Exercise 7.22 (1) If A1 ⊂ A2 ⊂ . . . is an increasing sequence of measurable sets, then µ(∪i Ai ) =
lim µ(Ai ).
(2) If (Ai ) is a sequence of measurable sets (not necessarily increasing), then µ(∪i Ai ) ≤ Σi µ(Ai ).
If the Ai are pairwise disjoint, then equality holds.
(3) If A1 ⊃ A2 ⊃ . . . is a decreasing sequence of measurable sets, with µ(A1 ) < ∞, then µ(∩i Ai ) =
lim µ(Ai ). Give a counterexample that the conclusion need not hold if the hypothesis µ(A1 ) < ∞ is
dropped.

Specifically, the last part of Exercise 7.22 puts our heuristic idea about the measure of a fat Cantor
set (Example 7.3.4) on a rigorous footing.

Remark for cross reference: (may be skipped)
Let me comment briefly on the measure theoretic approach to Lebesgue integration (i.e., the main
alternative to the approach we are taking here): In measure theory, a family S of sets is called
a σ-algebra if, with any countable collection of sets Ai , it also contains their union (Ai ∈ S =⇒
∪_{i=1}^∞ Ai ∈ S), and if with any set A it contains its complement (A ∈ S =⇒ A^c ∈ S). A measure is
axiomatically defined as a function µ : S → R ∪ {+∞} satisfying property (2) of exercise 7.22, and
µ(∅) = 0. What is rather unmotivated in this approach is: why σ-algebras? The analyst’s answer
to this question is: a powerful theory can only be expected from a notion that allows passing to
limits of sequences; in particular, countable unions of ‘good’ (measurable) sets should be ‘good’
(measurable), too. The difficult part in this approach is to actually construct the Lebesgue measure.
Some labor needs to be invested here, equivalent to a corresponding portion of our extension
construction. With the measure constructed first, one can then define step functions whose ‘steps’
are measurable sets with finite measure, rather than just boxes, and get a quicker definition of
the Lebesgue integral by approximating functions with such step functions with measurable steps.
There are a few variants of this construction on the market, differing in details. Royden’s book is
one that pursues this approach.
Chapter 5 of the notes gives a downsized version of this construction, bypassing some technicalities
and restricting discussion to ‘Borel sets’, which form a subset of the Lebesgue measurable sets.
This downsized version is good enough for many practical purposes, but the downsizing does not
serve a useful didactical purpose when the Daniell extension approach is taken.

Properties specific to the Lebesgue integral:
Not surprisingly, the measure µI obtained from Def. 7.8.4 in case I denotes the Lebesgue integral
is called the Lebesgue measure.
Note that in our general framework, we had a vector lattice V ⊂ F(X → R), and no metric
or topology was assumed on X. Any properties about measurable functions or sets that involve

topological notions (like open or compact sets, continuous functions), will therefore need more
specific hypotheses. A crucial hypothesis that connects topology and measurability is that all
open sets should be measurable. In the case of the Lebesgue measure, this hypothesis is verified.
Rather than pursuing further generalities, we restrict our discussion now exclusively to the Lebesgue
measure. We will now write ∫ instead of I to stress that we no longer have the vast generality,
but refer to the Lebesgue integral; however, we will keep the simplicity of writing ∫ f instead of
∫_{Rn} f (x) d^n x.

Theorem 7.8.7 (1) A function f : Rn → E is Lebesgue measurable, iff for every compact cube C
and every k > 0 the function med(−kχC , f, kχC ) is integrable.
(2) A function f : Rn → E is Lebesgue measurable, iff for every k, the set {x | f (x) > k} is
Lebesgue measurable. The same equivalence holds with f (x) ≥ k, or f (x) < k, or f (x) ≤ k.
(3) Continuous functions are Lebesgue measurable.
(4) If g : Rk → R is continuous and fi : Rn → R are measurable, then the composite function
g ◦ (f1 , . . . , fk ) is Lebesgue measurable.

Proof: We write ‘measurable’ for Lebesgue measurable.
The ‘only if’ part of (1) is a trivial consequence of Def. 7.8.2, since kχC is integrable. Conversely,
f = lim_{k→∞} med(−kχ_{[−k,k]^n}, f, kχ_{[−k,k]^n}), and limits of integrable functions are measurable by
Prop. 7.8.5(4).
Concerning (2), if f is measurable, then min{n(f − k)+ , 1} is also measurable for each constant
k. As n → ∞, this function converges pointwise to χ{f >k} . So this characteristic function is
measurable, and therefore so is the set {x | f (x) > k}. The set {x | f (x) ≥ k} is the intersection of
the sets {x | f (x) > k − 1/n} and is therefore also measurable. Similar arguments can be made for
the other cases.
Conversely, assume that {x | f (x) ≥ k} is measurable for every k. Consider
f_n := Σ_{j=1}^∞ ((j−1)/n) (χ_{f ≥ (j−1)/n} − χ_{f ≥ j/n}).
Each term in this sum is a nonnegative measurable function, so the sum f_n is measurable,
too. Actually, for j ≥ 1, f_n(x) = (j − 1)/n exactly if f (x) ∈ [(j−1)/n, j/n[. If f (x) < 0, then
fn (x) = 0. So, as n → ∞, fn → f+ pointwise. Therefore, f+ is measurable. The same argument,
applied to f + s for any constant s, proves that (f + s)+ , and hence (f + s)+ − s, is measurable.
As s → ∞, we retrieve f as measurable. Similar arguments can be made for sets defined by the
other inequalities.
For (3), note that open sets are measurable, by Exercise 7.12 (or by Lemma 7.5.13 in connection
with Thm. 7.7.7). For continuous functions f , the sets {f > k} are open. These two facts, together
with part (2), guarantee the measurability of f .
For (4), assume g is continuous and the fi are measurable. We want to show, for every N ∈ R,
that the set S := {x | g(f1 (x), . . . , fk (x)) > N } is measurable. By continuity of g, the set
U := g^−1(]N, ∞[) is open, and by Exercise 7.12, U is the countable union of open boxes Bj .
So we can write S = ∪_{j=1}^∞ {x | (f1 (x), . . . , fk (x)) ∈ Bj }. The set {x | fi (x) ∈ ]a, b[} is measurable
by part (2), and as the intersection of k such sets, {x | (f1 (x), . . . , fk (x)) ∈ Bj } is also measurable.
So S is the countable union of measurable sets, hence measurable as well.
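
The staircase functions fn used in the proof of part (2) can be evaluated pointwise. The sketch below is a hedged illustration: it only looks at the value v = f (x) at a single point, and it cuts the infinite sum off after j_max terms (a parameter I introduce, which must satisfy j_max/n > v for the value to be exact). It confirms that fn (x) approximates f+ (x) from below with an error smaller than 1/n.

```python
from fractions import Fraction

def indicator_geq(value, level):
    """Characteristic function of the set {f >= level}, evaluated where f equals `value`."""
    return 1 if value >= level else 0

def staircase(value, n, j_max):
    """f_n at a point where f takes `value`; the series is cut off after j_max terms."""
    return sum(
        Fraction(j - 1, n)
        * (indicator_geq(value, Fraction(j - 1, n)) - indicator_geq(value, Fraction(j, n)))
        for j in range(1, j_max + 1)
    )

sample_values = [Fraction(22, 7), Fraction(1, 3), Fraction(0), Fraction(-5, 2)]
table = {(v, n): staircase(v, n, 4 * n) for v in sample_values for n in (1, 10, 100)}
```

For the sample values above (all smaller than 4) the cut-off 4n is large enough, so the table contains the exact values of fn.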

Remark 7.8.8 It is not true that compositions f ◦g with f Lebesgue measurable and g continuous
must be Lebesgue measurable. A counterexample can be found in Gelbaum-Olmsted, Ch. 8, Ex. 16.
The idea is to use the function x + ψ(x) (with ψ the devil’s staircase function) as a homeomorphism
that transforms a non-measurable set into the subset of a null set. So it is crucial for Thm. 7.8.7(4)
that the continuous function is the outer, not the inner function of the composition. On the other
hand, the theorem we have proved implies in particular that the product of measurable functions is
measurable.

Exercise 7.23 Show: If f, g are measurable and g doesn’t vanish, then f /g is measurable. (You
may need to open up and slightly modify the proof of a useful lemma here. No need to repeat
everything, just indicate the changes.)

Exercise 7.24 Work out the details of the construction of a non-Lebesgue measurable set given on
page 13, substantiating the heuristic assumptions used there with actual results now proved. Include
a sentence of explanation why the Lebesgue measure is ‘translation invariant’ and what this means,

Exercise 7.25 Show: If f is Lebesgue integrable and A is Lebesgue measurable, then f χA is
Lebesgue integrable. (This means: we may integrate integrable functions over measurable sets,
writing ∫_A f (x) d^n x := ∫ f (x)χA (x) d^n x.)

Theorem 7.8.9 The Lebesgue measure is outer regular, i.e., given any set A, it holds µ∗ (A) =
inf{µ(U ) | U ⊃ A , U open}

Proof: The claim is trivial if µ∗ (A) = ∞; so assume µ∗ (A) = I^∗(χA ) =: m < ∞. Given ε > 0, we
want to find U ⊃ A open such that ∫ χU < (∫^∗ χA ) + ε. It is possible to find a function f ∈ ↑S (or,
in view of Thm. 7.7.7, a function f ∈ ↑C^0_cpt) such that f ≥ χA and ∫ f < µ∗ (A) + ε/2. Choose such
an f ∈ ↑C^0_cpt. So f is lower semicontinuous, and the set U = {x | f (x) > 1 − δ} is open for any
δ > 0. Moreover, this set contains A, and µ(U ) ≤ (∫ f )/(1 − δ), because f ≥ (1 − δ)χU . So we have
µ(U ) < (µ∗ (A) + ε/2)/(1 − δ), and this is smaller than µ∗ (A) + ε if δ is chosen sufficiently small.
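
How small is ‘sufficiently small’? Writing m = µ∗ (A), the inequality (m + ε/2)/(1 − δ) < m + ε is equivalent to δ(m + ε) < ε/2, i.e., to δ < ε/(2(m + ε)). The following sketch checks this arithmetic with exact fractions; the function names and the choice of half the admissible bound are mine, not part of the proof.

```python
from fractions import Fraction

def admissible_delta(m, eps):
    """Return a delta in ]0, 1[ with (m + eps/2) / (1 - delta) < m + eps.

    The inequality is equivalent to delta < eps / (2 * (m + eps));
    half of that bound is returned to stay strictly inside.
    """
    return eps / (4 * (m + eps))

def bound_after_enlarging(m, eps, delta):
    """The upper bound (m + eps/2) / (1 - delta) for the measure of U from the proof."""
    return (m + eps / 2) / (1 - delta)

cases = [
    (Fraction(0), Fraction(1, 10)),
    (Fraction(3), Fraction(1, 100)),
    (Fraction(50), Fraction(2)),
]
checks = [bound_after_enlarging(m, eps, admissible_delta(m, eps)) < m + eps for m, eps in cases]
```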

We can now prove the following theorem, of which the converse had already been proved in
Prop. 7.6.8.

Theorem 7.8.10 If N is an I-null set in Rn (as defined in Def. 7.6.5) for the Lebesgue integral,
then N is a Lebesgue null set in the sense of Def. 7.3.2.

Proof: By outer regularity, we can find, for any given ε > 0, an open set U ⊃ N such that
µ(U ) < ε/5^n, where n is the dimension of Rn. U is the union of the open max-balls (which are
in particular boxes) contained in it. Let ℬ be the family of these balls. The issue is now that
these balls overlap, so we cannot pass from µ(∪_{B∈ℬ} B) to Σ_{B∈ℬ} µ(B). By a method called Vitali
covering argument, we first thin the family ℬ out to get a maximal subfamily 𝒟 of disjoint balls,
and then enlarge the balls to make sure that the enlarged balls cover U again:
Let R := sup{r(B) | B ∈ ℬ}, where r(B) is the radius of B. Now R is finite, and actually
(2R)^n ≤ µ(U ), since each ball must have measure no larger than the measure of U . Now let
ℬ_j := {B ∈ ℬ | R/2^j < r(B) ≤ R/2^(j−1)} and let 𝒟_1 be a maximal subfamily of ℬ_1 consisting of
mutually disjoint balls: i.e., choose B_1 ∈ ℬ_1 arbitrarily; if there exists a B ∈ ℬ_1 that is disjoint
to B_1, choose such a ball and call it B_2. Continue inductively, choosing B_j disjoint to ∪_{i=1}^{j−1} B_i.
The algorithm terminates after finitely many steps, when no ball in ℬ_1 can be chosen any more.
(Specifically (#𝒟_1) R^n ≤ µ(∪_{B∈ℬ_1} B) ≤ µ(U ).) With 𝒟_1, . . . , 𝒟_k constructed (and consisting of
pairwise disjoint balls B_1, . . . , B_{j_k}), choose now further balls B_{j_k+1}, . . . , B_{j_{k+1}} from ℬ_{k+1} (if any)
that are disjoint to all previously chosen balls B_i. These will make up the family 𝒟_{k+1} ⊂ ℬ_{k+1}.
We thus obtain 𝒟 = 𝒟_1 ∪ 𝒟_2 ∪ . . ., a family of countably (finite or infinite) many pairwise disjoint
balls B_i. This family is maximal in the sense that no further ball B ∈ ℬ can be added that would
be disjoint to all B_i ∈ 𝒟. Now denote by 5B_i the ball that has the same center as B_i, but 5 times
the radius of B_i. We claim that {5B_i | B_i ∈ 𝒟} covers each B ∈ ℬ = ∪_k ℬ_k and hence U . So let
B ∈ ℬ_k. B intersects some B_j ∈ 𝒟_1 ∪ . . . ∪ 𝒟_k, because otherwise it would have been added to 𝒟_k
when this family was constructed. Since the radius r(B) ≤ R/2^(k−1) and r(B_j) > R/2^k, we have B ⊂ 5B_j.
Now Σ_j µ(5B_j) = 5^n Σ_j µ(B_j) = 5^n µ(∪_j B_j) ≤ 5^n µ(U ) < ε, and the family of boxes 5B_j covers U
and hence N . So N is a Lebesgue null set in the sense of Def. 7.3.2.
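
The selection step of the Vitali covering argument is easy to try out on a finite family of balls. The sketch below is an illustration under simplifying assumptions: the family is finite and randomly generated (the proof deals with an infinite family), the balls are max-norm balls in the plane with integer data, and the balls are visited in order of decreasing radius. This order runs through the classes with radius in ]R/2^j , R/2^(j−1)] one after the other, so it is one admissible way of carrying out the selection described in the proof.

```python
import random

def disjoint(b1, b2):
    """Open max-norm balls (center, radius) are disjoint iff the centers are far enough apart."""
    (c1, r1), (c2, r2) = b1, b2
    return max(abs(u - v) for u, v in zip(c1, c2)) >= r1 + r2

def contained_in_enlarged(ball, chosen, factor=5):
    """True if `ball` lies inside the ball with the center of `chosen` and factor times its radius."""
    (c, r), (c0, r0) = ball, chosen
    return max(abs(u - v) for u, v in zip(c, c0)) + r <= factor * r0

def vitali_select(balls):
    """Greedy selection of pairwise disjoint balls, largest radii first."""
    selected = []
    for ball in sorted(balls, key=lambda b: b[1], reverse=True):
        if all(disjoint(ball, other) for other in selected):
            selected.append(ball)
    return selected

random.seed(0)                               # arbitrary seed, only for reproducibility
balls = [((random.randint(0, 200), random.randint(0, 200)), random.randint(1, 20))
         for _ in range(300)]
selected = vitali_select(balls)
enlarged_volume = sum((2 * 5 * r) ** 2 for _, r in selected)
selected_volume = sum((2 * r) ** 2 for _, r in selected)
```

Every ball of the family is then contained in the 5-fold enlargement of some selected ball, and the total volume of the enlarged balls is 5² times the volume of the selected (disjoint) ones, in line with the estimate at the end of the proof.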

We have thus recaptured the geometrically intuitive description of Lebesgue null sets given in the
introduction in terms of the Daniell formalism, which defines null sets (and measure in general) in
terms of integrals.

Remark 7.8.11 There is a similar result stating that $\int_* \chi_A$ is the supremum of the measures of
compact sets contained in $A$, with an analogous proof.

Finally, there is one key feature of the Lebesgue measure (and a very expected one from an intuitive
point of view) that, as an artefact of our approach through the vector lattice S, has been rendered
rather non-obvious; namely the rotation invariance of Lebesgue measure:

Theorem 7.8.12 Let $T$ be an $n \times n$ matrix and $b \in \mathbb R^n$. Consider the affine mapping $\varphi : \mathbb R^n \to
\mathbb R^n$, $x \mapsto Tx + b$. If $A$ is a Lebesgue measurable set, then $\varphi(A)$ is also Lebesgue measurable,
and $\mu(\varphi(A)) = |\det T|\, \mu(A)$. Specifically, if $T$ is a rotation matrix, $\mu(\varphi(A)) = \mu(A)$.
If $f \in L^1(\mathbb R^n)$ and $T$ is invertible, then $\int_{\mathbb R^n} f(Tx + b)\, d^n x = |\det T|^{-1} \int_{\mathbb R^n} f(y)\, d^n y$.
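
For polygons in the plane, the scaling law $\mu(\varphi(A)) = |\det T|\,\mu(A)$ can be checked directly. The following is a minimal sketch, assuming $n = 2$ and using the shoelace formula for the area of a polygon; the helper names and the sample matrices are illustrative choices of ours.

```python
def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1]
                for i in range(n))
    return abs(twice) / 2


def affine_image(T, b, vertices):
    """Apply x -> T x + b to each vertex; T is a 2x2 nested list."""
    return [(T[0][0] * x + T[0][1] * y + b[0],
             T[1][0] * x + T[1][1] * y + b[1]) for x, y in vertices]


def det2(T):
    """Determinant of a 2x2 matrix."""
    return T[0][0] * T[1][1] - T[0][1] * T[1][0]


box = [(0, 0), (3, 0), (3, 2), (0, 2)]          # a box of area 6
shift = (7, -1)
examples = {
    "scaling": [[4, 0], [0, 1]],                # determinant 4
    "swap": [[0, 1], [1, 0]],                   # determinant -1
    "shear": [[1, 5], [0, 1]],                  # determinant 1
    "singular": [[1, 2], [2, 4]],               # determinant 0
}
areas = {name: shoelace_area(affine_image(T, shift, box))
         for name, T in examples.items()}
```

The first three matrices are instances of the three kinds of elementary matrices; the singular matrix collapses the box onto a line segment, so its image has area 0.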

Proof Sketch: By writing φ = τ ◦ ψ, where ψ : x → T x and τ : x → x + b, it suffices to prove
the theorem separately for translations τ and for mappings ψ that have b = 0. The translation
invariance of the measure is obvious for boxes $B_j$: $\mu(\tau(B_j)) = \mu(B_j)$, and therefore it holds for all
step functions $f$ that $\int f \circ \tau^{-1} = \int f$. (The inverse arises because $\chi_{\varphi(A)} = \chi_A \circ \varphi^{-1}$ whenever $\varphi$
is invertible; specifically this applies to φ = τ .) Having proved the equality for functions in S, it
follows for functions in ↑ S and ↓ S by monotone limits, and then for all integrable functions by upper
and lower approximation. This result for all integrable functions now includes in particular the
integrable functions $\chi_A$. If $\chi_A$ is measurable but with $\mu(A) = \infty$, then $\chi_{\tau(A)}$ is also measurable,
using any of the characterisations of measurability we had; and $\mu(\tau(A))$ cannot be finite, because
otherwise $\mu(\tau^{-1}(\tau(A))) = \mu(A)$ would have to be finite as well, due to the result for integrable
functions.
Now we have to prove a similar result for ψ : x → T x, and we assume first that T is invertible.
It is shown in linear algebra that such a T is a product of elementary matrices Ei , and since
det(E1 · · · Ek ) = (det E1 ) · · · (det Ek ), it suffices to prove the result for elementary matrices. El-
ementary matrices come in three kinds: one kind multiplies the j-th coordinate by a number a,
and it has determinant a; so µ(ψ(B)) = | det T | µ(B) holds for boxes B and elementary matrices
T of this kind. Another kind swaps two coordinates, and it has determinant −1, so the formula
µ(ψ(B)) = | det T |µ(B) holds again. The third kind is a shear mapping, and it has determinant 1.
For such a mapping ψ there exists (without loss of generality) a set S and a translation τ for which
$\chi_{\psi(B)} - \chi_B = \chi_S - \chi_{\tau(S)}$. Rather than working out the algebra, we give a picture that conveys the
idea, and focus on the analytical part:

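[Figure: a box $B$, its image $\psi(B)$ under the shear $\psi : x \mapsto Tx$ with $T = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}$, and the sets $S$ and $\tau(S)$. Without loss of generality assume $ah < l$; else consider the difference of two boxes, or take products of several $T$ with smaller $a$.]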

The measurability of $\chi_{\psi(B)}$ is warranted because $\psi(B)$ is open. As $\psi(B)$ is contained in a compact
box, the integrability of $\chi_{\psi(B)}$ follows. The integrability of $\chi_S - \chi_{\tau(S)}$ follows from the integrability
of $\chi_B$ and $\chi_{\psi(B)}$, and $\chi_S = (\chi_S - \chi_{\tau(S)})^+$ is now also integrable. But since $\int \chi_S = \int \chi_{\tau(S)}$ by translation invariance, we
conclude $\int \chi_B = \int \chi_{\psi(B)}$. Having thus shown $\int f = \int f \circ \psi^{-1}$ for characteristic functions of boxes,
it follows again for all functions in S, then for all functions in ↑ S or ↓ S, then for all integrable
functions, and this includes the characteristic functions of measurable sets with finite measure, and
then indirectly for measurable sets with infinite measure as well.
If T is not invertible, the claim amounts to showing that µ(φ(A)) = 0. We argue that φ(A) is a
subset of a strict subspace of Rn , which may be chosen to have codimension 1, and that such a
subspace is the image ψ(H) of a coordinate hyperplane H in Rn , under an invertible linear mapping
ψ. It therefore has measure 0 by the second part of the proof.

7.9     Fubini and Tonelli

The theorems of Fubini and Tonelli essentially explain the connection of multi-variable integrals
with iterated integrals. These theorems are not restricted to the Lebesgue integral, but can be
proved similarly for Daniell extensions of other elementary integrals; however, in order to avoid
defining the product construction generally, we write these theorems out only for the Lebesgue
integral.
Be aware that the construction of the Lebesgue measure and integral on $\mathbb R^n$ is of course dependent
on $n$: For instance, $\mathbb R$ is not a null set in $\mathbb R$, but $\mathbb R \times \{a\}$ is a null set in $\mathbb R^2$. We will denote the
Lebesgue measure in $\mathbb R^n$ as $\mu_n$, and the Lebesgue integral over $\mathbb R^n$ will be written as $\int_{\mathbb R^n} f(x)\, d^n x$.
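We also write $\mathcal S$, the vector lattice of step functions, as $\mathcal S_n$ or $\mathcal S_m$ to clarify on which domain these
step functions are defined.
We identify $\mathbb R^n \times \mathbb R^m$ with $\mathbb R^{n+m}$; and for a two-variable function $f : (x, y) \mapsto f(x, y)$, $\mathbb R^n \times \mathbb R^m \to E$,
we have single variable functions $f(x, \cdot) : y \mapsto f(x, y)$, $\mathbb R^m \to \mathbb R$ and $f(\cdot, y) : x \mapsto f(x, y)$, $\mathbb R^n \to \mathbb R$.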
First we state the theorem of Fubini.

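Theorem 7.9.1 Let $f \in L^1(\mathbb R^n \times \mathbb R^m)$. Then for ($\mu_n$-)almost every $x \in \mathbb R^n$, the function $f(x, \cdot)$ is
in $L^1(\mathbb R^m)$; and the integral $F(x) := \int_{\mathbb R^m} f(x, y)\, d^m y$ defines a function $F \in L^1(\mathbb R^n)$. Moreover,

\[ \int_{\mathbb R^{n+m}} f(x, y)\, d^{n+m}(x, y) = \int_{\mathbb R^n} F(x)\, d^n x . \]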

The analogous result holds with the order of integration reversed.

Proof: We prove this theorem first for characteristic functions of boxes, then for step functions,
and from there for functions in ↑ S and ↓ S, and finally for L1 functions.
An open box $B$ in $\mathbb R^n \times \mathbb R^m$ is of the form $B = B_1 \times B_2$ with $B_1$, $B_2$ open boxes in $\mathbb R^n$ and
$\mathbb R^m$ respectively. Then $f(x, y) := \chi_B(x, y) = \chi_{B_1}(x)\chi_{B_2}(y)$ and $F(x) = \mu_m(B_2)\chi_{B_1}(x)$. Since
$\mu_{n+m}(B) = \mu_n(B_1)\mu_m(B_2)$, we obtain the theorem for this function $f$ immediately. In preparation
for the case where $f$ is a step function (with arbitrary values on the boundaries of the steps),
let us now assume that $f(x, y) = \chi_B(x, y)$ only for $(x, y) \notin \partial B$ (a $\mu_{n+m}$-null set). Note that
$\partial B = ((\partial B_1) \times \bar B_2) \cup (\bar B_1 \times (\partial B_2))$. We still have $f(x, \cdot)$ integrable for every $x \notin \partial B_1$ (a $\mu_n$-null
set), specifically $f(x, \cdot) \equiv 0$ if $x \in (\bar B_1)^c$, and $f(x, \cdot) = \chi_{B_2}$ almost everywhere for $x \in B_1$, with
the exceptional null set being $\partial B_2$, a $\mu_m$-null set. With this modification, Fubini follows for such
functions $f$.
Step functions are finite linear combinations of functions that agree with the characteristic function
of a box except on the boundary of this box. If Fubini holds for certain functions fi , then it
immediately follows for finite linear combinations of the fi , by linearity of the integrals. So we have
established Fubini for functions in Sn+m .
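
For step functions both sides of Fubini's formula are finite sums, so the statement can be verified exactly. The following is a minimal sketch, assuming $n = m = 1$ and a step function stored as a list of triples (c, (a1, b1), (a2, b2)), meaning $c\,\chi_{]a_1,b_1[ \times ]a_2,b_2[}$; this representation and the function names are our own choices.

```python
def double_integral(step):
    """Integral over R^2 of sum_i c_i * chi_{I_i x J_i}."""
    return sum(c * (b1 - a1) * (b2 - a2) for c, (a1, b1), (a2, b2) in step)


def inner_integral(step, x):
    """F(x) = integral of f(x, .) over R, for x not an endpoint of any I_i."""
    return sum(c * (b2 - a2) for c, (a1, b1), (a2, b2) in step if a1 < x < b1)


def iterated_integral(step):
    """Integrate the step function F exactly, one constancy interval at a time."""
    cuts = sorted({e for _, (a1, b1), _ in step for e in (a1, b1)})
    return sum(inner_integral(step, (lo + hi) / 2) * (hi - lo)
               for lo, hi in zip(cuts, cuts[1:]))


step = [(2, (0, 3), (0, 1)), (-1, (1, 4), (0, 2)), (5, (2, 3), (1, 4))]
both_sides = (double_integral(step), iterated_integral(step))
```

The finitely many endpoints where inner_integral is not evaluated form a null set, which mirrors the role of the boundaries in the proof.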
Now assume $f \in {\uparrow}\mathcal S_{n+m}$, with $\mathcal S_{n+m} \ni f_j \nearrow f$. Fubini has already been proved for the $f_j$. We also
assume that $f$ has finite integral, because we have to prove the theorem for integrable functions only.
Then $f_j(x, \cdot) \in L^1(\mathbb R^m)$ for all $x$ outside a $\mu_n$-null set $N_j$. With $F_j(x) := \int f_j(x, y)\, d^m y$ defined
at least for $x \notin N = \bigcup_j N_j$ (still a null set), we have $F_j \in L^1(\mathbb R^n)$ and $\int f_j(x, y)\, d^{n+m}(x, y) =
\int F_j(x)\, d^n x$.

So $(F_j)$ is an increasing sequence of integrable functions on $\mathbb R^n \setminus N$; let’s call its limit $F$. By Beppo
Levi, $F$ is still integrable and $\int F(x)\, d^n x = \lim \int F_j(x)\, d^n x$, provided this limit on the right hand
side is finite. But $\lim \int F_j(x)\, d^n x$ is finite indeed, because it equals $\lim \int f_j(x, y)\, d^{n+m}(x, y) \le
\int f(x, y)\, d^{n+m}(x, y) < \infty$. Being integrable, $F$ is finite almost everywhere; the exceptional null set
may be larger than $N$, but this is no problem. By going to the limit $j \to \infty$ in $\int f_j(x, y)\, d^{n+m}(x, y) =
\int F_j(x)\, d^n x$, we conclude $\int f(x, y)\, d^{n+m}(x, y) = \int F(x)\, d^n x$. We had defined $F$ as the monotonic
limit of $F_j$; but we still have to prove that $F(x) = \int f(x, y)\, d^m y$ for almost every $x$. But since
$f_j(x, \cdot) \nearrow f(x, \cdot)$, we have $\int f(x, y)\, d^m y = \lim \int f_j(x, y)\, d^m y \in \mathbb R \cup \{\infty\}$ by the definition of the
integral on ${\uparrow}\mathcal S_m$. But the rhs is $\lim F_j(x) = F(x)$. This finishes the proof of Fubini for $f \in {\uparrow}\mathcal S \cap L^1$;
the proof for $f \in {\downarrow}\mathcal S \cap L^1$ is analogous.
Finally assume $f \in L^1$ and let ${\downarrow}\mathcal S \ni h_j \le f \le g_j \in {\uparrow}\mathcal S$ with $\int (g_j - h_j)(x, y)\, d^{n+m}(x, y) < 1/j$. We
can assume that $(h_j)$ is increasing and $(g_j)$ is decreasing, otherwise replace $g_j$ with $\min\{g_1, \dots, g_j\}$
and $h_j$ with $\max\{h_1, \dots, h_j\}$. With a null set $N \subset \mathbb R^n$, we can now let $G_j(x) := \int g_j(x, y)\, d^m y$
and $H_j(x) := \int h_j(x, y)\, d^m y$ for $x \notin N$, and have $\int g_j(x, y)\, d^{n+m}(x, y) = \int G_j(x)\, d^n x$ and similarly
for $h_j$, $H_j$. So for the non-negative functions $J_j := G_j - H_j$, we know that the sequence $(J_j)$ is
decreasing (since $g_j - h_j$ is decreasing and the integral preserves inequalities). So let $J$ denote its limit, $J_j \searrow J$. The
function $J$ is integrable and $\int J(x)\, d^n x = \lim \int J_j(x)\, d^n x = 0$. We claim that this implies $J = 0$ a.e.
Indeed, consider the set $A_k := \{x \in \mathbb R^n \mid J(x) > 1/k\}$. Then $\mu(A_k) = \int \chi_{A_k} \le \int kJ = k \int J = 0$.
So $A_k$ is a null set, and therefore so is $\bigcup_k A_k = \{x \mid J(x) \ne 0\}$.
With $J = 0$ a.e., we know that $G_j$ and $H_j$ have a common limit, which we call $F$: namely
$G_j \searrow F$ and $H_j \nearrow F$ a.e. But this means the increasing sequence $\int h_j(x, y)\, d^m y$ has a finite
limit $F(x)$ for a.e. $x$, and so $\lim_{j\to\infty} h_j(x, \cdot) =: f_*(x, \cdot)$ is an integrable function (on $\mathbb R^m$) for
a.e. $x$. Likewise $\lim_{j\to\infty} g_j(x, \cdot) =: f^*(x, \cdot)$ is integrable over $y$ for a.e. $x$. And since $\int (f^*(x, y) -
f_*(x, y))\, d^m y = F(x) - F(x) = 0$ for a.e. $x$, we have, for each $x$ outside a null set, that the single
variable function $f^*(x, \cdot) - f_*(x, \cdot) = 0$ a.e. (with the $\mu_m$-null set of exceptional $y$’s depending on $x$).
So this common function $f_* = f^*$ must be $f$ (since $f^* \ge f \ge f_*$). Now the theorem follows from
$\int g_j(x, y)\, d^{n+m}(x, y) = \int G_j(x)\, d^n x$ by going to the limit. The lhs converges to $\int f(x, y)\, d^{n+m}(x, y)$
by construction, the rhs converges to $\int F(x)\, d^n x$ with $F(x) = \lim G_j(x) = \lim \int g_j(x, y)\, d^m y =
\int (\lim g_j(x, y))\, d^m y = \int f(x, y)\, d^m y$ for a.e. $x$.
The proof with the order of integration reversed is analogous.

Tonelli’s theorem is a partial converse of Fubini, namely:

Theorem 7.9.2 Let $f : \mathbb R^n \times \mathbb R^m \to E$ be measurable. Further assume that the iterated integral
$\int \bigl( \int |f(x, y)|\, d^m y \bigr)\, d^n x$ exists (meaning $|f(x, \cdot)|$ is integrable for a.e. $x$ and $F(x) := \int |f(x, y)|\, d^m y$ is
integrable over $x$). Then $f \in L^1(\mathbb R^n \times \mathbb R^m)$ and Fubini applies.

Proof: Since $f$ is measurable, the truncated functions $f_N := \mathrm{med}(-b_N, f, b_N)$ with $b_N :=
N \chi_{[-N,N]^{n+m}}$ are integrable, and so are the $|f_N|$. Then, by Fubini, we have

\[ \int |f_N(x, y)|\, d^{m+n}(x, y) = \int \hat F_N(x)\, d^n x \quad\text{with}\quad \hat F_N(x) = \int |f_N(x, y)|\, d^m y . \]

We have $|f_N| \nearrow |f|$. To show $|f| \in L^1(\mathbb R^{n+m})$ by Beppo Levi, we need to have a finite upper bound
for $\int |f_N(x, y)|\, d^{n+m}(x, y) = \int \hat F_N(x)\, d^n x$. Indeed, we conclude from $|f_N| \le |f|$ that $\hat F_N(x) =
\int |f_N(x, y)|\, d^m y \le \int |f(x, y)|\, d^m y = F(x)$ for a.e. $x$, and this $F$ was assumed to be integrable. So
$\int \hat F_N(x)\, d^n x \le \int F(x)\, d^n x < \infty$.
So we conclude that $|f|$ is integrable, and hence $f$ is integrable, too, using 7.8.3.
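
The truncation $f_N = \mathrm{med}(-b_N, f, b_N)$ used in this proof is easy to experiment with. The following is a minimal sketch, assuming $n = m = 1$ and the sample function $f(x, y) = xy$; the names med and truncate as Python functions, and the sample function, are our own choices.

```python
def med(a, b, c):
    """Median (middle value) of three numbers."""
    return sorted((a, b, c))[1]


def truncate(f, N):
    """Return f_N = med(-b_N, f, b_N) with b_N = N on [-N, N]^2 and 0 elsewhere."""
    def f_N(x, y):
        b = N if max(abs(x), abs(y)) <= N else 0
        return med(-b, f(x, y), b)
    return f_N


def sample(x, y):
    """A measurable function that is not integrable over R^2."""
    return x * y


# |f_N| at the point (3, -2), for N = 1, ..., 8.
values = [abs(truncate(sample, N)(3, -2)) for N in range(1, 9)]
```

At a fixed point the values $|f_N|$ increase with $N$ and become stationary at $|f|$, which is the pointwise monotone convergence $|f_N| \nearrow |f|$ that Beppo Levi requires.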

Remark 7.9.3 The iterated integrability of $f$, rather than $|f|$, does not ensure the multi-variable
integrability of $f$. For instance let $f(x, y) := \chi_{[0,1]}(y - x) - \chi_{[-1,0[}(y - x)$. Then $F(x) :=
\int f(x, y)\, dy = 0$ for every $x$, and obviously $\int F(x)\, dx = 0$. However, $f \notin L^1(\mathbb R^2)$, because otherwise
$|f|$, given by $|f(x, y)| = \chi_{[-1,1]}(y - x)$, would have to be integrable as well. But the set
$\{(x, y) \mid -1 \le y - x \le 1\}$ has infinite measure in $\mathbb R^2$, e.g., because it contains infinitely many
pairwise disjoint unit squares $]n - \tfrac12, n + \tfrac12[ \,\times\, ]n - \tfrac12, n + \tfrac12[$ for $n \in \mathbb Z$.
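
The two halves of this remark can be illustrated numerically. The following is a rough sketch using midpoint sums (an approximation of ours, not part of the argument): the inner integral vanishes, while the integral of $|f|$ over the square $[-L, L]^2$ grows with $L$. The exact area of the strip inside that square is $(2L)^2 - (2L - 1)^2 = 4L - 1$ for $L \ge 1/2$.

```python
def f(x, y):
    """f(x, y) = chi_[0,1](y - x) - chi_[-1,0[(y - x)."""
    d = y - x
    if 0 <= d <= 1:
        return 1
    if -1 <= d < 0:
        return -1
    return 0


def inner_integral(x, steps=1000):
    """Midpoint rule for the integral of f(x, .) over [x - 2, x + 2]."""
    h = 4 / steps
    return sum(f(x, x - 2 + (k + 0.5) * h) for k in range(steps)) * h


def abs_integral_on_square(L, steps=200):
    """Midpoint rule for the integral of |f| over the square [-L, L]^2."""
    h = 2 * L / steps
    pts = [-L + (k + 0.5) * h for k in range(steps)]
    return sum(abs(f(x, y)) for x in pts for y in pts) * h * h
```

Since the grid is coarse, abs_integral_on_square only approximates $4L - 1$; the point is that the values are unbounded as $L$ grows, whereas inner_integral stays at 0.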

7.10      Lp spaces, and the Prominent Integral Inequalities

This section compares neatly with Sec 5.10 in the blue notes.
The following generalization of L1 is often used:

Definition 7.10.1 Given a Daniell extended integral $I$ on $X$, and $p \in [1, \infty[$, we denote by $\mathcal L^p(X)$,
or $\mathcal L^p(X, I)$, the set of those measurable functions $f : X \to E$ for which $\|f\|_{L^p} := I(|f|^p)^{1/p} < \infty$
(in other words, for which $|f|^p$ is integrable).

It turns out that the finite valued functions in $\mathcal L^p$ form a vector space and $\|\cdot\|_{L^p}$ is a semi-norm.
Accepting these results for the moment (they will be proved soon), we can define

Definition 7.10.2 Under the hypotheses of Def. 7.10.1, $L^p(X)$ consists of equivalence classes of
functions $f \in \mathcal L^p(X)$ modulo equivalence a.e.

We then get the

Theorem 7.10.3 $L^p(X)$ with the norm $\|\cdot\|_{L^p}$ is a Banach space.

We will mainly be interested in the case of the Lebesgue integral on Rn , but the case of the
summation ‘integral’ $I_\Sigma$ on Seq (Example 7.4.9) is also relevant. In this case, $\mathcal L^p$ and $L^p$ need not
be distinguished because there are no null sets other than $\emptyset$, and $L^p(\mathbb N, I_\Sigma)$ is usually called $\ell^p$.
Let us now fill in the details:
To see that the finite valued $\mathcal L^p$ functions form a vector space, we need to show that sums and
constant multiples of such functions are again in $\mathcal L^p$. For constant multiples, this is trivial; for the
sum, we estimate pointwise

\[ |f + g|^p \le (|f| + |g|)^p \le [2 \max\{|f|, |g|\}]^p = 2^p \max\{|f|^p, |g|^p\} \le 2^p (|f|^p + |g|^p) . \]

So $|f + g|^p$ is measurable and has an integrable majorant, if $f, g \in \mathcal L^p$.
The proof that equality a.e. is an equivalence relation on $\mathcal L^p$, that addition of equivalence classes can
be defined in terms of addition of representatives ($[f] + [g] := [f + g]$ is well-defined), and similarly for
multiplication with numbers, is exactly as in the case $p = 1$ (see Ex. 7.18). All seminorm properties
except for the triangle inequality are obvious. The triangle inequality has been proved for $p = 1$
already, so we only need to deal with the case $p > 1$. And we’ll have to establish completeness
of $L^p$, which is similar to the completeness of $L^1$ proved in 7.7.13. The triangle inequality for $\|\cdot\|_{L^p}$
is also called Minkowski’s inequality; it is a consequence of another important inequality, called
Hölder’s inequality, and this inequality in turn follows from the following simple inequality for real
numbers:

Lemma 7.10.4 (Young’s Inequality) Let $a, b \ge 0$ and $1 < p, q < \infty$ such that $\frac1p + \frac1q = 1$. Then
$ab \le \frac{a^p}{p} + \frac{b^q}{q}$.

We give several proofs for this inequality, in order to reinforce the message that skill with inequalities
is a core skill in analysis. Numbers $p, q > 1$ satisfying $\frac1p + \frac1q = 1$ are sometimes called each other’s
Hölder conjugates.
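First Proof of Lemma 7.10.4: We may assume $a, b > 0$, and we first prove the result for $p > 1$
rational. So let $p = r/s$ with $r, s \in \mathbb N$ and $r > s$. Then $q = \frac{r}{r-s}$. By the inequality of the arithmetic
and geometric mean (Thm 4.2.x1), we get

\[ \Bigl( \underbrace{a^{r/s} \cdots a^{r/s}}_{s \text{ many}} \cdot \underbrace{b^{r/(r-s)} \cdots b^{r/(r-s)}}_{r-s \text{ many}} \Bigr)^{1/r} \le \frac1r \Bigl( \underbrace{a^{r/s} + \cdots + a^{r/s}}_{s \text{ many}} + \underbrace{b^{r/(r-s)} + \cdots + b^{r/(r-s)}}_{r-s \text{ many}} \Bigr) \]

and the claim is immediate, since the left hand side equals $ab$ and the right hand side equals $\frac{s}{r} a^p + \frac{r-s}{r} b^q = a^p/p + b^q/q$. Since the right hand side $a^p/p + b^q/q$ depends continuously on $p$ (with
$q = \frac{p}{p-1}$) and $\mathbb Q$ is dense in $]1, \infty[$, we can extend the claim from rational to real $p$ by letting
$\mathbb Q \ni p_n \to p$ and moving the limit into the inequality for $p_n$.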

Second Proof of Lemma 7.10.4, using elementary calculus: Again assuming $a, b > 0$
and fixing $ab = K$, we want to show that $K \le a^p/p + (K/a)^q/q =: f(a)$ with $q = \frac{p}{p-1}$, for all $a > 0$.
With $p$ fixed, the right hand side is a differentiable function of $a$, and it goes to $\infty$ as either $a \to 0+$
or $a \to \infty$. So it must have a global minimum on $]0, \infty[$ by an easy corollary of Thm. 3.5.8, and at
this minimum, $f'(a)$ must vanish. But $f'(a) = a^{p-1} - K^q a^{-q-1}$, hence the minimum can only be
at $a = a_* := K^{q/(p+q)}$. Hence $f(a) \ge f(a_*) = \frac1p K^{pq/(p+q)} + \frac1q K^{q - q^2/(p+q)} = (\frac1p + \frac1q) K^{pq/(p+q)} = K$.
(Note that $\frac1p + \frac1q = 1$ implies $p + q = pq$.)
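
Young's inequality and the minimizer found in the second proof can be checked numerically. The following is a small sketch in floating point arithmetic; the function names young_gap and minimizer are ours.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b for the Hölder conjugate q = p/(p-1)."""
    q = p / (p - 1)
    return a**p / p + b**q / q - a * b


def minimizer(K, p):
    """The critical point a_* = K^(q/(p+q)) from the second proof."""
    q = p / (p - 1)
    return K ** (q / (p + q))


# The gap is non-negative, and it vanishes when b = a^(p-1), i.e. a^p = b^q.
gaps = [young_gap(a, b, p)
        for p in (1.5, 2, 3, 7) for a in (0.1, 1, 2.5) for b in (0.2, 1, 4)]
equality_case = young_gap(2.0, 2.0 ** (3 - 1), 3)
```

With $K = ab$ fixed, evaluating young_gap at $a = a_*$ and $b = K/a_*$ also gives 0 up to rounding, in line with $f(a_*) = K$.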

A third proof, as a consequence of a more general Young inequality, can be found as Lemma
5.10.2, Cor. 5.10.3, where however you may want to substitute analogous references in Ch. 7 for
the given references in Ch. 5. Namely,

Lemma 7.10.5 Let $f : [0, \infty[ \to [0, \infty[$ be continuous and one-to-one onto, with $f(0) = 0$. Then
for $a, b > 0$, we have

\[ ab \le \int_0^a f(x)\, dx + \int_0^b f^{-1}(y)\, dy . \]

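Proof: Let
\[ A = \{(x, y) \in \mathbb R^2 \mid 0 \le x \le a ,\ 0 \le y \le f(x)\} , \]
\[ B = \{(x, y) \in \mathbb R^2 \mid 0 \le y \le b ,\ 0 \le x < f^{-1}(y)\} . \]
[Figure: the sets $A$ and $B$ in the $(x, y)$-plane.]
Then
\[ \mu_2(A) = \int \chi_A(x, y)\, d^2(x, y) = \int \Bigl( \int \chi_A(x, y)\, dy \Bigr) dx = \int_0^a f(x)\, dx \]
using Tonelli’s and Fubini’s theorem. Similarly,
\[ \mu_2(B) = \int \chi_B(x, y)\, d^2(x, y) = \int \Bigl( \int \chi_B(x, y)\, dx \Bigr) dy = \int_0^b f^{-1}(y)\, dy . \]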

A and B are disjoint, so µ2 (A ∪ B) = µ2 (A) + µ2 (B). On the other hand, [0, a] × [0, b] ⊂ A ∪ B,
hence ab = µ2 ([0, a] × [0, b]) ≤ µ2 (A) + µ2 (B). This proves the claimed inequality.

Now Lemma 7.10.4 arises as a special case of Lemma 7.10.5 for $f(x) = x^{p-1}$ and hence $f^{-1}(y) =
y^{q-1}$, using the Fundamental Lemma of Calculus to evaluate the integral.
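
Lemma 7.10.5 applies to many functions besides powers. As an illustration under our own choice of example, take $f(x) = e^x - 1$, which is continuous, one-to-one onto $[0, \infty[$, and has $f^{-1}(y) = \log(1 + y)$. Both integrals have closed forms, $\int_0^a f = e^a - 1 - a$ and $\int_0^b f^{-1} = (1 + b)\log(1 + b) - b$, so the inequality can be checked directly:

```python
import math


def young_integral_bound(a, b):
    """Right hand side of Lemma 7.10.5 for f(x) = exp(x) - 1."""
    int_f = math.exp(a) - 1 - a                   # integral of f over [0, a]
    int_f_inv = (1 + b) * math.log(1 + b) - b     # integral of log(1 + y) over [0, b]
    return int_f + int_f_inv


# Slack of the inequality ab <= bound on a small grid of sample points.
slack = [young_integral_bound(a, b) - a * b
         for a in (0.1, 1, 2) for b in (0.1, 1, 5)]
```

Equality holds exactly when $b = f(a)$, which corresponds to the rectangle $[0, a] \times [0, b]$ being filled by $A \cup B$ without excess.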

We can now prove the Hölder inequality:

Theorem 7.10.6 (Hölder Inequality) Suppose $p, q > 1$ and $\frac1p + \frac1q = 1$. Suppose $f \in L^p(\mathbb R^n)$ and
$g \in L^q(\mathbb R^n)$. Then $fg \in L^1(\mathbb R^n)$ and $\|fg\|_{L^1} \le \|f\|_{L^p} \|g\|_{L^q}$.

Proof: If either $\|f\|_{L^p}$ or $\|g\|_{L^q}$ vanishes, the inequality is trivial, because $fg = 0$ a.e. So we now
assume these norms are positive. For simplicity and generality, we use the $I$ notation again. From
the specifics of the Lebesgue integral, we use that $fg$ is measurable if $f$ and $g$ are, see 7.8.7(4). (But
this is trivially also true for sequences $f, g$ according to Exercise 7.21.) Since $|fg| \le |f|^p/p + |g|^q/q$
pointwise by Young’s inequality, we conclude that $fg$ is integrable. We write $fg$ as $\alpha f \cdot \alpha^{-1} g$ with a
number $\alpha > 0$ yet to be chosen, and obtain from integrating Young’s inequality that

\[ I(|fg|) \le \frac{\alpha^p}{p} I(|f|^p) + \frac{\alpha^{-q}}{q} I(|g|^q) = \frac{\alpha^p}{p} \|f\|_{L^p}^p + \frac{\alpha^{-q}}{q} \|g\|_{L^q}^q . \]

The best result is obtained by single-variable minimization of the right hand side with respect to
$\alpha$, i.e., by choosing $\alpha^{p+q} = \|g\|_{L^q}^q / \|f\|_{L^p}^p$. Plugging this in yields the Hölder inequality.
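
The role of the parameter $\alpha$ can be seen in the simplest setting. The following is a minimal sketch for finite sequences, i.e. with the summation integral in place of $I$; the function names and the sample vectors are our own.

```python
def p_norm(v, p):
    """(sum_j |v_j|^p)^(1/p) for a finite sequence v."""
    return sum(abs(x) ** p for x in v) ** (1 / p)


def holder_sides(f, g, p):
    """Return (sum |f_j g_j|, ||f||_p * ||g||_q) with q = p/(p-1)."""
    q = p / (p - 1)
    return sum(abs(x * y) for x, y in zip(f, g)), p_norm(f, p) * p_norm(g, q)


def young_bound(f, g, p, alpha):
    """alpha^p/p * ||f||_p^p + alpha^(-q)/q * ||g||_q^q."""
    q = p / (p - 1)
    return alpha**p / p * p_norm(f, p)**p + alpha**(-q) / q * p_norm(g, q)**q


def optimal_alpha(f, g, p):
    """The choice alpha^(p+q) = ||g||_q^q / ||f||_p^p from the proof."""
    q = p / (p - 1)
    return (p_norm(g, q)**q / p_norm(f, p)**p) ** (1 / (p + q))


f = [1.0, -2.0, 0.5, 3.0]
g = [0.3, 1.5, -2.0, 0.1]
lhs, rhs = holder_sides(f, g, 3)
best = young_bound(f, g, 3, optimal_alpha(f, g, 3))
```

Up to rounding, best coincides with rhs, while any other $\alpha$ (for instance $\alpha = 1$) gives a bound that is at least as large.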

Remark 7.10.7 The special case $p = q = 2$ of Hölder’s inequality is called the Cauchy-Schwarz
inequality.

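Finally, we prove the Minkowski inequality $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$:

\[ \|f + g\|_{L^p}^p = I(|f + g|^p) \le I((|f| + |g|)\, |f + g|^{p-1}) \le I(|f|\, |f + g|^{p-1}) + I(|g|\, |f + g|^{p-1}) . \]
Using Hölder’s inequality with $p$ and $q = \frac{p}{p-1}$, and noting that $\| |f + g|^{p-1} \|_{L^q} = \|f + g\|_{L^p}^{p-1}$ because $(p-1)q = p$, we continue

\[ \|f + g\|_{L^p}^p \le \|f\|_{L^p} \|f + g\|_{L^p}^{p-1} + \|g\|_{L^p} \|f + g\|_{L^p}^{p-1} \]

and hence the Minkowski inequality by cancellation.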

Remark 7.10.8 Written out in integrals, Minkowski’s inequality reads
\[ \Bigl( \int |f + g|^p \Bigr)^{1/p} \le \Bigl( \int |f|^p \Bigr)^{1/p} + \Bigl( \int |g|^p \Bigr)^{1/p} . \]

For sequences, it reads
\[ \Bigl( \sum_n |f_n + g_n|^p \Bigr)^{1/p} \le \Bigl( \sum_n |f_n|^p \Bigr)^{1/p} + \Bigl( \sum_n |g_n|^p \Bigr)^{1/p} . \]

Restricting this to finite sequences, we obtain the $p$-norm for vectors in $\mathbb R^n$, namely $\|f\|_p =
(\sum_{j=1}^n |f_j|^p)^{1/p}$, which generalizes the taxi norm ($p = 1$) and the euclidean norm ($p = 2$) and
contains the max-norm as a limiting case $p \to \infty$.
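
The $p$-norm on $\mathbb R^n$ is simple to compute, which makes the statements of this remark easy to try out. The following is a minimal sketch; the function names and the sample vectors are ours.

```python
def p_norm(v, p):
    """(sum_j |v_j|^p)^(1/p) for a vector v in R^n and p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1 / p)


def minkowski_holds(u, v, p, tol=1e-12):
    """Check ||u + v||_p <= ||u||_p + ||v||_p up to a rounding tolerance."""
    total = [x + y for x, y in zip(u, v)]
    return p_norm(total, p) <= p_norm(u, p) + p_norm(v, p) + tol


u = [3.0, -4.0, 1.0]
v = [-1.0, 2.0, 2.0]
taxi = p_norm(u, 1)          # |3| + |-4| + |1|
euclid = p_norm([3, 4], 2)   # the 3-4-5 right triangle
large_p = p_norm(u, 200)     # close to max_j |u_j| = 4
```

For large $p$ the largest component dominates the sum, which gives a numerical hint (not a proof) of the limiting behaviour as $p \to \infty$.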

Exercise 7.26 For $f = (f_1, \dots, f_n) \in \mathbb R^n$, show that $\lim_{p\to\infty} \|f\|_p = \max_j |f_j|$.

Finally, we want to see that Lp is complete.

Theorem 7.10.9 The normed vector space $L^p(X)$ is a Banach space.

Proof: This proof is a slight modification of the proof of Thm. 7.7.13. The well-definedness of
the vector space operations on the equivalence classes carries over directly.
Suppose $([f_n])$ is a Cauchy sequence in $L^p(X)$. We define $n_j$ inductively such that $\|f_n - f_m\|_{L^p} < 2^{-j}$
provided $n, m \ge n_j$ and (for $j > 1$) $n_j > n_{j-1}$. Then $g_N := \sum_{j=1}^N |f_{n_{j+1}} - f_{n_j}|$ defines an increasing
sequence $(g_N)$ of measurable functions, converging pointwise to a function $g$ (which, as of yet,
might be infinity in many points). But $\|g_N\|_{L^p}$ is bounded by $\sum_j 2^{-j} = 1$, by Minkowski’s inequality. So the $g_N^p$ are integrable,
with integral $\le 1^p = 1$, and so their monotonic limit $g^p$ is also integrable by Beppo Levi. As
an integrable function, $g^p$ is finite a.e., and so is $g$. Then, for each $x$ for which $g(x)$ is finite, the
sequence $f_{n_k}(x) = f_{n_1}(x) + \sum_{j=2}^k (f_{n_j}(x) - f_{n_{j-1}}(x))$ converges, because it converges absolutely. In
other words, $(f_{n_k})_k$ converges a.e. to a function $f$ that is measurable as an a.e. limit of measurable
functions.
Now $\int |f_{n_j} - f_{n_i}|^p$ is bounded as $n_i \to \infty$, indeed it is $\le (2^{-j})^p$ for $i > j$. Keeping $n_j$ fixed and
letting $n_i \to \infty$, Fatou’s lemma implies that $\int |f_{n_j} - f|^p \le \liminf_{n_i \to \infty} \int |f_{n_j} - f_{n_i}|^p \le 2^{-jp}$. So
$f \in \mathcal L^p$, and $[f_{n_j}] \to [f]$ in $L^p$ norm as $n_j \to \infty$. We know from Lemma 3.11.3 that a Cauchy
sequence that has a convergent subsequence is convergent itself.