Chapter 7 Integration

This chapter is to be used as an alternative to Chapter 5 in Conrad Plaut's course notes. It achieves more or less the same goal, but via a different route, and cross references between this chapter and Chapter 5, except where explicitly stated, will be close to impossible until this chapter is complete. We will obtain the Lebesgue integral by means of an extension process from more elementary notions of an integral, among which is the Riemann integral.

7.1 Vector spaces

Definition 7.1.1 A real vector space is a triple $(V, +, \cdot)$ consisting of a set $V$, a binary operation $+$ on $V$ (i.e., a function $+ : V \times V \to V$), and a multiplication of elements of $V$ with real numbers, i.e., $\cdot : \mathbb{R} \times V \to V$, satisfying the following rules (with the dot often omitted):

$\forall u, v \in V: \quad u + v = v + u$
$\forall u, v, w \in V: \quad (u + v) + w = u + (v + w)$
$\exists 0 \in V \; \forall v \in V: \quad 0 + v = v$
$\forall v \in V \; \exists w \in V: \quad v + w = 0$
$\forall \lambda \in \mathbb{R} \; \forall u, v \in V: \quad \lambda(u + v) = \lambda u + \lambda v$
$\forall \lambda, \mu \in \mathbb{R} \; \forall v \in V: \quad (\lambda + \mu)v = \lambda v + \mu v$
$\forall \lambda, \mu \in \mathbb{R} \; \forall v \in V: \quad (\lambda\mu)v = \lambda(\mu v)$
$\forall v \in V: \quad 1v = v$

In the context of vector spaces, the elements of $V$ are called vectors, and the numbers (in $\mathbb{R}$) are called scalars. It is easy to see that the additive inverse $w$ of any given $v \in V$, whose existence is asserted in the fourth axiom, is unique, and we denote it by $-v$.

Note that in these axioms, $+$ denotes two operations: addition in $\mathbb{R}$ and addition in $V$. Likewise there are two multiplications denoted by juxtaposition or dot: multiplication in $\mathbb{R}$, and multiplication of real numbers with elements of $V$. The context clarifies which operation is meant. While a number of examples of vector spaces also carry a multiplication among vectors, such an operation is not part of the definition of a vector space.
Definition 7.1.2 A complex vector space is a triple $(V, +, \cdot)$ consisting of a set $V$, a binary operation $+$ on $V$ (i.e., a function $+ : V \times V \to V$), and a multiplication of elements of $V$ with complex numbers, i.e., $\cdot : \mathbb{C} \times V \to V$, satisfying the same rules as given in the previous definition, except that $\mathbb{R}$ is to be replaced with $\mathbb{C}$ throughout.

In like manner, we can define vector spaces over any field $F$; but only real or complex vector spaces are used in this course.

Example 7.1.3 $\mathbb{R}^n$ is a real vector space for each $n$. $\mathbb{C}^n$ is a complex vector space; it can also be viewed as a real vector space by restricting the scalars to be real numbers.

Example 7.1.4 $F(A \to \mathbb{R})$, the space of functions from any set $A$ into $\mathbb{R}$, is a real vector space with the usual definitions $(f + g)(a) := f(a) + g(a)$ and $(\lambda f)(a) := \lambda f(a)$. The special case $A = \{1, 2, \dots, n\}$ retrieves $\mathbb{R}^n$, if we view an $n$-tuple $(x_1, \dots, x_n)$ as a list of values $(x(1), \dots, x(n))$ of a function $x : \{1, \dots, n\} \to \mathbb{R}$. Similarly, we can allow functions with complex values and view $F(A \to \mathbb{C})$ as a complex, or as a real, vector space.

Example 7.1.5 Subsets of $F(A \to \mathbb{R})$ that satisfy the (algebraic) closure property, namely that with any two elements $f, g$, their sum $f + g$, the negative $-f$, and all scalar multiples $\lambda f$ are also contained in the subset, are vector spaces in their own right (called subspaces of $F(A \to \mathbb{R})$). In particular, $C^0(X \to \mathbb{R})$ is a real vector space for any metric space $X$.

⇒ At this place, merge Section 5.9 ‘Banach Spaces’

7.2 The Riemann Integral

⇒ Copy this verbatim from Sec. 5.1 until Prop 5.1.2; and Ex 5.1.4 (Note that Cor 5.1.3 in the notes, as it is written, is wrong; and the ‘proof’ reveals what is actually intended.)

The definition of the Riemann integral generalizes readily to a multi-variable setting. We can integrate over boxes $[a_1, b_1] \times \cdots \times [a_n, b_n]$, and a partition of a box is the cartesian product of partitions of intervals. So if $a_i = x^{(i)}_0 < x^{(i)}_1 < \dots < x^{(i)}_{k_i} = b_i$ are partitions of $[a_i, b_i]$ into $k_i$ subintervals, then the box $[a_1, b_1] \times \cdots \times [a_n, b_n]$ is partitioned into $k_1 \cdots k_n$ subboxes $[x^{(1)}_{s_1}, x^{(1)}_{s_1+1}] \times \cdots \times [x^{(n)}_{s_n}, x^{(n)}_{s_n+1}]$, which we will re-index and call $B_j$ for $j \in \{1, 2, \dots, k_1 \cdots k_n\}$.

[Figure: an example of a partition of a 2-dimensional box, next to a subdivision that is not a partition.]

The volume of a box $B = [a_1, b_1] \times \cdots \times [a_n, b_n]$ is defined to be $(b_1 - a_1) \cdots (b_n - a_n)$ and will be denoted by $\mu(B)$. The Riemann sums $S(P, \{c_j\})$ are defined as $\sum_j f(c_j)\,\mu(B_j)$, where $j$ is an index counting all boxes. The above theorems and their proofs carry over naturally.

7.3 Overview over the Lebesgue Integral

In this section, we deviate from the theorem and proof style and give a heuristic overview with the basic ideas, examples and results given without proof. This section serves as a motivation, and to give the big picture. In later sections we will resume the rigorous exposition, filling in the details and proving all the core results needed to obtain the full strength Lebesgue integral. (Properties of some further examples in this overview section may remain unproved inasmuch as only the motivation, but not the rigorous development, depends on them. Specifically, Example 7.3.6 below is of this kind; it has been patched together from results that can be found proved in Zygmund's book ‘Trigonometric Series’.) So from a strictly logical point of view, the present section can be omitted altogether; however, this section is written in the belief that such an omission would be utterly inadvisable from a didactic point of view.

Deficiency of the Riemann Integral

In analysis, various versions of a completeness notion are crucial properties. For instance, for real numbers we have the property that if $(x_n)$ is a Cauchy sequence, then $\lim x_n$ exists. Short of this property, even very simple calculus theorems like the intermediate value theorem would fail.
(Remember, for instance, that in $\mathbb{Q}$ the continuous function $x \mapsto x^2 - 2$ does not take on the intermediate value 0.)

Advanced subjects like differential equations (especially partial differential equations) benefit a lot from carrying over calculus ideas from $\mathbb{R}^n$ to vector spaces of functions. We have seen examples: The space of continuous functions (on a compact set) can be equipped with the supremum norm (which gives rise to the notion of uniform convergence), and we get a complete metric space; actually a Banach space (this is a vector space in which a metric is defined by a norm, and which is a complete metric space with this metric). In ODEs, the existence and uniqueness theorem for the initial value problem (with Lipschitz right hand side) is proved by simple calculus-like methods (Banach's fixed point theorem) in this vector space. In PDEs, similar ideas carry over, even though the Banach fixed point theorem is a less universal tool there than it is in ODEs.

In a number of applications however, for instance Fourier series and calculus of variations (and PDEs, in the wake of calculus of variations methods), norms defined in terms of integrals arise very naturally and are strongly suggested by the very nature of the problem under consideration. Specifically, the norm $\|f\|_2 := \left(\int_a^b |f(x)|^2\,dx\right)^{1/2}$ of a function $f$ on $[a, b]$ is analogous to the euclidean norm $\|f\|_2 := \left(\sum_i |f_i|^2\right)^{1/2}$ of a vector $f$ in $\mathbb{R}^n$. Likewise the norm $\|f\|_1 := \int_a^b |f(x)|\,dx$ of a function $f$ on $[a, b]$ is analogous to the taxi norm $\|f\|_1 := \sum_i |f_i|$ of a vector $f$ in $\mathbb{R}^n$. But in contrast to $\mathbb{R}^n$, where we have seen that these norms (or the metrics obtained from these norms) are topologically equivalent, the corresponding norms for functions are not topologically equivalent, and in particular they provide inequivalent notions of completeness. The space $C^0[a, b]$, which is complete with respect to the max distance, is not complete with respect to either the norm $\|\cdot\|_2$ or the norm $\|\cdot\|_1$.
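The inequivalence of the supremum norm and the integral norms can be made tangible numerically. The following is a minimal sketch, not part of the notes: the test family $f_n(x) = x^n$ on $[0, 1]$ and all function names are choices made here for illustration, and the integrals are approximated by a plain midpoint rule. The supremum norm of $f_n$ stays at 1, while $\|f_n\|_1 = 1/(n+1)$ and $\|f_n\|_2 = 1/\sqrt{2n+1}$ tend to 0, so no constant can bound the supremum norm by either integral norm.

```python
import math

def midpoint_integral(func, a, b, steps=20000):
    """Approximate the integral of func over [a, b] by the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (k + 0.5) * h) for k in range(steps))

def sup_norm(func, a, b, samples=2001):
    """Approximate the supremum norm by sampling (endpoints included)."""
    return max(abs(func(a + (b - a) * k / (samples - 1))) for k in range(samples))

def norm_1(func, a, b):
    """Integral norm ||f||_1 on [a, b]."""
    return midpoint_integral(lambda x: abs(func(x)), a, b)

def norm_2(func, a, b):
    """Integral norm ||f||_2 on [a, b]."""
    return math.sqrt(midpoint_integral(lambda x: abs(func(x)) ** 2, a, b))

def power(n):
    """Hypothetical test family f_n(x) = x**n (an illustration, not from the notes)."""
    return lambda x: x ** n

if __name__ == "__main__":
    for n in (1, 10, 100):
        f = power(n)
        print(n, sup_norm(f, 0.0, 1.0), norm_1(f, 0.0, 1.0), norm_2(f, 0.0, 1.0))
```

In $\mathbb{R}^n$ such a family cannot exist, since there all norms are bounded by constant multiples of each other.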
Even if we allowed all Riemann integrable functions, rather than only the continuous functions, we would still not have a complete metric space with respect to any of the integral norms of interest. This deficit prevents us from pursuing the analogy with finite dimensional calculus in sufficient depth to prove useful results.

The Lebesgue integral is a generalization of the Riemann integral. Every Riemann integrable function is Lebesgue integrable, and both integral notions coincide for Riemann integrable functions. However, there are functions that are not Riemann integrable, but are Lebesgue integrable. The Lebesgue integral will mend the deficits of the Riemann integral.

Clarification: When I say ‘Riemann integrable’, I am referring to the proper Riemann integral only. Improper Riemann integrals like e.g. $\int_0^1 x^{-1/2}\,dx := \lim_{a \to 0+} \int_a^1 x^{-1/2}\,dx$ are not Riemann integrals, but arise from them via a secondary limiting process. Keep this in mind when interpreting the statement that every Riemann integrable function is also Lebesgue integrable. A similar statement would not apply to improper Riemann integrals: The improper Riemann integral $\int_0^\infty \frac{\sin x}{x}\,dx$ is often cited as an example that is not a Lebesgue integral. Well, it is not a Riemann integral either! And $\lim_{N \to \infty} \int_0^N \frac{\sin x}{x}\,dx$ is perfectly good as a limit of Lebesgue integrals, too.

With Lebesgue's notion of integral, certain functions that are too ‘wild’ to be Riemann integrable will be integrable under Lebesgue's definition. But let me stress that the main benefit of the Lebesgue integral is not that we want to integrate ‘wild’ functions for their own sake; rather, we want to be able to integrate wild functions so that their existence cannot screw up a clean theory. Even in situations where all functions involved will (eventually) turn out to be continuous and hence Riemann integrable, we may need the Lebesgue integral to arrive at that conclusion in the first place.
It is analogous to the following situation from elementary calculus: Even though a statement like $\sum_{n=1}^\infty 2^{-n} = 1$ can be made sense of and proved rigorously within $\mathbb{Q}$ alone, without reference to irrational numbers, any reasonably general theorem that asserts the convergence of the series beforehand cannot be proved within $\mathbb{Q}$, as such a theorem should also cover cases where the sum turns out to be irrational.

(Note on notation: We will study definite, not indefinite, integrals. In the absence of an explicit domain of integration, the domain of integration should be understood from the context; usually $\mathbb{R}$ or $\mathbb{R}^n$.)

Let us study a few examples of functions that ‘ought to be’ integrable but are not, in the sense of Riemann. Some claims will only be justified in a heuristic sense, because they serve for motivation purposes only. A rigorous justification that the examples possess the claimed properties could be supplied a posteriori, once the Lebesgue integral is constructed.

Example 7.3.1 (One has to quote this, because everybody else does.) For any set $M$, its characteristic function $\chi_M$ is defined by $\chi_M(x) = 1$ if $x \in M$ and $\chi_M(x) = 0$ otherwise. The function $\chi_M$ for $M := \mathbb{Q} \cap [0, 1]$ is not Riemann integrable.

This example will seem less convincing once the road to the Lebesgue integral is traveled, because we will say that this $\chi_M$ is ‘almost’ the constant 0: it differs from the constant 0 only on the set $M$, which is so small that it is considered immaterial by the integral (because it is countable). So we will end up identifying $\chi_M$ with the zero function (which was Riemann integrable all along). Much ado about nothing? We'll see.

Let me clarify this notion of sets that are ‘too small to be seen by the integral’:

Definition 7.3.2 A set $A \subset \mathbb{R}$ is called a Lebesgue null set, if for every $\varepsilon > 0$ there exists a sequence of intervals $I_j := \,]a_j, b_j[$ whose total length $\ell = \sum_j (b_j - a_j)$ satisfies $\ell < \varepsilon$, and which covers $A$, i.e., $A \subset \bigcup_j I_j$.
We often simply say ‘null set’ instead of ‘Lebesgue null set’. A similar definition applies for subsets of $\mathbb{R}^n$, where boxes with total volume less than $\varepsilon$ play the role of the intervals. ‘Box’ refers to a cartesian product of intervals, of course.

Definition 7.3.3 We will use the wording ‘almost everywhere’ (a.e.) to mean ‘everywhere except on a null set’.

Obviously every countable set $\{x_j \mid j \in \mathbb{N}\}$ is a null set, because the intervals $I_j := \,]x_j - \varepsilon/2^{j+2},\, x_j + \varepsilon/2^{j+2}[$ satisfy the requirements of the definition. Similarly, unions of countably many null sets are null sets again. Null sets deserve to be considered as having length (in $\mathbb{R}$) or volume (in $\mathbb{R}^3$) zero. Since the volume of nice sets $A$ can be defined as $\int \chi_A$, the characteristic function of a null set ought to be integrable, with integral 0. From this point of view, Example 7.3.1 is a good example, not a contrived one: the integrability of $\chi_M$ says that $M$ has a meaningful ‘total length’ (namely 0 in the case of $M = [0, 1] \cap \mathbb{Q}$).

Exercise 7.1 Based on the definition, prove that a countable union of null sets is a null set.

For the Lebesgue integral to be constructed, the following property will hold true: If $\int f$ is defined and $g = f$ a.e. (i.e., $\{x \mid g(x) \neq f(x)\}$ is a null set), then $\int g$ is also defined, and $\int g = \int f$. In colloquial language, the integral is insensitive to changes of the integrand on a null set. A variant of this property also holds for the Riemann integral. However, in this case we need to assume that $\int g$ is defined; we cannot conclude it a priori, as shown by Example 7.3.1.

Note: Just for the record, and without proof: A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if and only if the set of points where it fails to be continuous is a null set. Obviously $\chi_M$ in Example 7.3.1 is discontinuous on all of $[0, 1]$, and you may believe that this set is not a null set (even though this statement requires proof, of course).
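The covering argument for countable sets can be checked in exact rational arithmetic. The sketch below is an illustration only: the enumeration of $\mathbb{Q} \cap [0, 1]$ by increasing denominator and the function names are assumptions made here, and the index $j$ is assumed to start at 1. Under that assumption the first $N$ intervals have total length $\frac{\varepsilon}{2}(1 - 2^{-N})$, which stays below $\varepsilon$ no matter how many points are covered.

```python
from fractions import Fraction

def rationals_in_unit_interval(count):
    """Return the first `count` rationals of [0, 1], ordered by denominator.

    This particular enumeration is a hypothetical choice for illustration.
    """
    found, seen, q = [], set(), 1
    while len(found) < count:
        for p in range(q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                found.append(x)
                if len(found) == count:
                    break
        q += 1
    return found

def cover(points, eps):
    """Open intervals ]x_j - eps/2^(j+2), x_j + eps/2^(j+2)[ for j = 1, 2, ..."""
    return [(x - eps / 2 ** (j + 2), x + eps / 2 ** (j + 2))
            for j, x in enumerate(points, start=1)]

def total_length(intervals):
    """Sum of the lengths b - a of the given intervals (a, b)."""
    return sum(b - a for a, b in intervals)

if __name__ == "__main__":
    eps = Fraction(1, 100)
    pts = rationals_in_unit_interval(50)
    print(float(total_length(cover(pts, eps))), "<", float(eps))
```

The total length depends only on $\varepsilon$ and the number of points, not on where the points lie, which is why the same argument works for every countable set.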
Example 7.3.4 The usual Cantor set $C$, which arises from $[0, 1]$ by successively removing the ‘middle third’ of each remaining interval, is a Lebesgue null set. Its characteristic function is Riemann integrable (with integral 0) because of the criterion just mentioned: $\chi_C$ is discontinuous exactly on $C$. However, a slight modification of the construction gives a ‘fat Cantor set’ $C_*$ with measure (total length) $\frac12$. Its characteristic function is not Riemann integrable any more.

This fat Cantor set can be constructed as follows: $C_0 := [0, 1]$, $C_1 := [0, \frac13] \cup [\frac23, 1]$. Given $C_n$, which is the union of $2^n$ closed intervals, obtain $C_{n+1}$ by removing from the middle of each of these $2^n$ intervals an open interval of length $2^{-n} 3^{-n-1}$. So, for instance, $C_2 = [0, \frac{5}{36}] \cup [\frac{7}{36}, \frac13] \cup [\frac23, \frac{29}{36}] \cup [\frac{31}{36}, 1]$. Let $C_* := \bigcap_n C_n$. Given that a total length of $\sum_{n=0}^\infty 2^n \times 2^{-n} 3^{-n-1} = \frac12$ has been removed, $C_*$ ‘ought to have’ the remaining length $\frac12$ (and the notion of Lebesgue measure will confirm this). The points of discontinuity of the function $\chi_{C_*}$ are exactly those in the set $C_*$. Since $C_*$ is not a null set, $\chi_{C_*}$ is not Riemann integrable.

It is certainly annoying that the very same type of construction and the very same type of heuristic reasoning concerning $C$ and $C_*$ and their respective measure (length) is borne out rigorously by the Riemann integral in one case, but not in the other.

Exercise 7.2 Prove that for the Cantor set $C$, the characteristic function $\chi_C : \mathbb{R} \to \mathbb{R}$ is discontinuous on $C$ and continuous on the complement of $C$.

Example 7.3.5 The function $f$ defined by $f(x) := |x|^{-1/2} e^{-|x|}$ is not Riemann integrable, because it is not bounded (and also because the Riemann integral allows only bounded domains of integration). This deficit has been mended with the notion of the improper Riemann integral. With this patch by means of a secondary limit, we can find $\int_{-\infty}^{\infty} f(x - a)\,dx = 2\Gamma(\frac12) = 2\sqrt{\pi}$ for every $a$. (The precise value is not of the essence here.)
It may now seem natural to wish that the function $g$ given by $g(x) := \sum_{n=1}^\infty \frac{1}{2^n} f(x - a_n)$ be integrable for any sequence $(a_n)$ for which this infinite sum produces a well-defined $g$: after all, infinite sums are the bread and butter of calculus. The value of $\int_{-\infty}^{\infty} g(x)\,dx$ ought to be $\sum_{n=1}^\infty \frac{1}{2^n}\, 2\sqrt{\pi} = 2\sqrt{\pi}$. But if the sequence $(a_n)$ is dense in $\mathbb{R}$, the function $g$ is unbounded on every interval, however small, and the ‘improper integral’ patch does not help here. You might wonder whether in such a case $g$ itself is actually still well defined, and this is not so easy to answer; it would however eventually turn out that $g$ does have a finite value for all $x$ except on a certain null set. So, whereas I am not in a position to immediately refute the legitimate concern about whether $g$ is well defined, it turns out eventually that this concern can be allayed. (You might want to modify the example a bit and take for $a_n$ the sequence $-1, 0, 1, -\frac12, \frac12, -\frac34, -\frac14, \frac14, \frac34, \dots$, which may give you a chance to prove the claim that $g$ is well defined outside a null set in a pedestrian way. I haven't tried to do it, admittedly.)

Example 7.3.6 The function $g(x) := \sum_{n=1}^\infty \frac1n \cos(2^n x)$ seems natural to consider in the context of Fourier series. It has the following properties (which can be proved after the Lebesgue integral is established, and which are not at all obvious): The series converges almost everywhere (i.e., for all $x$ outside a certain null set). This implies that $g$ is a meaningful function. However, the convergence is almost nowhere absolute (i.e., the set of $x$ for which the series is absolutely convergent forms a null set). This indicates that it may be rather difficult to prove anything useful about $g$. Moreover, the function $g$ is unbounded on every interval, however small, and therefore nowhere continuous. This means that it is way out of reach for the Riemann integral. Since $g$ is unbounded, $g^2$ is even more badly unbounded.
Nevertheless, we would like to argue that $\int_0^{2\pi} g(x)^2\,dx = \pi \sum_{n=1}^\infty \left(\frac1n\right)^2 = \pi^3/6$ (using a famous result of Euler's, namely that $\sum_n n^{-2} = \pi^2/6$); and this will be true, in the sense of the Lebesgue integral. On a formal level, the calculation substantiating this hope is quite easy:

$$\int_0^{2\pi} g(x)^2\,dx = \int_0^{2\pi} \sum_n \frac1n \cos 2^n x \, \sum_m \frac1m \cos 2^m x \,dx = \sum_n \sum_m \frac{1}{mn} \int_0^{2\pi} \cos 2^n x \, \cos 2^m x \,dx = \pi \sum_n \frac{1}{n^2}$$

because the integral under the double sum vanishes when $m \neq n$ and is $\pi$ when $m = n$. But this formal calculation is nearly impossible to justify within the realm of classical calculus, because for almost all $x$, the series we are multiplying are only conditionally convergent, whereas absolute convergence would be needed for at least one factor to justify multiplying out the series distributively.

The Completion Process

Let's start with either the set (vector space) $C^0[a, b]$ of continuous functions on an interval $[a, b]$, or the vector space $R[a, b]$ of Riemann integrable functions. Actually, you could also use the vector space $S[a, b]$, which by definition consists of uniform limits of step functions; and step functions are (finite) linear combinations of characteristic functions of intervals. (The inclusions $C^0 \subset S \subset R$ can be seen easily, and they are strict.)

We define $\|f\|_{L^1} := \int_a^b |f(x)|\,dx$. Previously we had called this norm $\|\cdot\|_1$. It is easy to see that $\|\cdot\|_{L^1}$ is a norm on $C^0[a, b]$. However, it is only a seminorm on $R[a, b]$ (or $S[a, b]$). By definition, a seminorm satisfies all properties of a norm with the exception of “$\|f\| = 0 \implies f = 0$”. For instance, $\chi_{\{x_0\}}$, the characteristic function of a single point, satisfies $\|\chi_{\{x_0\}}\|_{L^1} = 0$.

$C^0$ with $\|\cdot\|_{L^1}$ is not a Banach space. For instance, the sequence $(f_n)$ in $C^0[-1, 1]$ defined by $f_n(x) := nx/\sqrt{1 + (nx)^2}$ is a Cauchy sequence, but it has no limit in $C^0$. The function $f(x) := \operatorname{sign} x$, which ‘ought to be’ the limit of $f_n$, is not in $C^0$.
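The behaviour of this sequence can be explored numerically before attempting a proof. The sketch below rests on assumptions made here: a midpoint rule stands in for the integral, the helper names are hypothetical, and the closed form $\|f_n - \operatorname{sign}\|_{L^1[-1,1]} = 2\left(1 + \frac1n - \sqrt{1 + \frac{1}{n^2}}\right)$ comes from the antiderivative $\sqrt{1 + (nx)^2}/n$ of $f_n$. Since this distance tends to 0, the triangle inequality gives $\|f_n - f_m\|_{L^1} \to 0$, in line with the Cauchy property claimed above.

```python
import math

def f(n, x):
    """f_n(x) = n x / sqrt(1 + (n x)^2), a continuous approximation of sign x."""
    return n * x / math.sqrt(1.0 + (n * x) ** 2)

def sign(x):
    """sign x, with sign 0 = 0."""
    return (x > 0) - (x < 0)

def l1_distance(u, v, a=-1.0, b=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of |u - v| over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += abs(u(x) - v(x))
    return h * total

def exact_distance_to_sign(n):
    """Closed form 2 (1 + 1/n - sqrt(1 + 1/n^2)) of ||f_n - sign|| in L^1[-1, 1]."""
    return 2.0 * (1.0 + 1.0 / n - math.sqrt(1.0 + 1.0 / n ** 2))

if __name__ == "__main__":
    for n in (1, 10, 100):
        numeric = l1_distance(lambda x: f(n, x), sign)
        print(n, numeric, exact_distance_to_sign(n))
```

The numbers only illustrate the claim; they do not replace the argument asked for in the exercise that follows.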
Exercise 7.3 Prove these statements; namely for $f_n(x) := nx/\sqrt{1 + (nx)^2}$, $f(x) := \operatorname{sign} x$ and $g \in C^0[-1, 1]$:
(a) Show that $f_n \to f$ uniformly on any compact interval that does not contain 0.
(b) Show that if $f_n \to f$ uniformly on $[a, b]$, then $\|f_n - f\|_{L^1[a,b]} := \int_a^b |f_n(x) - f(x)|\,dx \to 0$. [Note: A similar statement assuming pointwise instead of uniform convergence is false, so don't attempt to prove that.]
(c) Show that if $\int_{-1}^{1} |f_n(x) - g(x)|\,dx \to 0$, then $\int_a^b |f_n(x) - g(x)|\,dx \to 0$ for $[a, b] \subset [-1, 1]$.
(d) Show by epsilontics and decomposition of $[-1, 1]$ into pieces $[-1, -a] \cup [-a, a] \cup [a, 1]$ that $(f_n)$ is a Cauchy sequence with respect to $\|\cdot\|_{L^1}$.
(e) Put the pieces together and show that $f_n$ does not converge to any continuous function $g$ with respect to the $\|\cdot\|_{L^1}$ norm. Be sure to argue carefully in distinguishing uniform, pointwise and $\|\cdot\|_{L^1}$ convergence and employing valid implications among these notions.

Similarly (but far more difficult to prove), the finite sums $f_N$ defined by $f_N(x) := \sum_{n=1}^N \frac1n \cos(2^n x)$ form a Cauchy sequence in $C^0$ (with respect to the integral norm) that has no limit there.

The very cheap method of completion: There is a ‘soft functional analysis’ way to achieve completeness. From any normed vector space $X$, we define a vector space $CX$ whose elements are Cauchy sequences from $X$. Addition of Cauchy sequences is defined term by term: The sum of the Cauchy sequence $(f_n)_{n=1}^\infty$ and the Cauchy sequence $(g_n)_{n=1}^\infty$ is the Cauchy sequence $(f_n + g_n)_{n=1}^\infty$. (It is easy to see that this is indeed a Cauchy sequence.) Similarly, a multiplication of Cauchy sequences with real numbers is defined. On the vector space $CX$, an equivalence relation is defined as follows: The Cauchy sequence $(f_n)$ is said to be equivalent to the Cauchy sequence $(g_n)$ iff $\|f_n - g_n\| \to 0$. Consider the set of equivalence classes of Cauchy sequences and call it $\hat{X}$.
It can be shown that $\hat{X}$ is a vector space, that a norm can be defined on $\hat{X}$, and that $X$ embeds into $\hat{X}$, in that $f \in X$ can be retrieved as the equivalence class of the constant sequence $(f, f, f, \dots)$ in $\hat{X}$. Finally one can show that $\hat{X}$ is a Banach space. It is called the completion of $X$.

It turns out that, in this sense, the completion of $C^0[a, b]$ with respect to the norm $\|\cdot\|_{L^1}$ gives $L^1[a, b]$, the ‘vector space of Lebesgue integrable functions on $[a, b]$’. This fact seems convenient for the purpose of illustration. It tells us: What we are doing is indeed analogous to the completion that leads us from $\mathbb{Q}$ to $\mathbb{R}$ in calculus. However, for purposes of analysis, this general-abstract approach is as useful as a junk car that breaks down in the first curve. The principal difficulty is that there is no way to identify the abstract equivalence classes of Cauchy sequences with actual functions $[a, b] \to \mathbb{R}$.

As a matter of fact, strictly speaking, the elements of $L^1[a, b]$ are not functions, but equivalence classes of functions with respect to the equivalence relation $=_{ae}$. As mentioned above, we say $f =_{ae} g$ if the set $\{x \mid f(x) \neq g(x)\}$ is a Lebesgue null set. (It's easy to show that this defines an equivalence relation.) Without this identification, we would import the same problem into $L^1$ that we already observed in $S$ and $R$: $\|\cdot\|$ would be a seminorm, not a norm. Nevertheless, by negligent use of language, we sometimes call the elements of $L^1$ ‘functions’ rather than ‘functions modulo equality a.e.’.

But even the identification of abstract elements of $\hat{X}$ with such equivalence classes of functions with respect to the relation $=_{ae}$ cannot be made merely within the abstract construction of $\hat{X}$. To prove that the abstract $\hat{X}$ indeed consists of functions modulo $=_{ae}$, one has to get one's hands dirty and do hard analysis instead of soft functional analysis. And then, after the fact, one can see that the abstract $\hat{X}$ is $L^1$, if one starts with $X$ being either $C^0$, or $R$, or $S$.
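The analogy with the passage from $\mathbb{Q}$ to $\mathbb{R}$ can be made concrete in exact rational arithmetic. The sketch below is an illustration chosen here, not taken from the notes: two different Cauchy sequences of rationals (Newton iterates and bisection midpoints, both hypothetical choices) approach $\sqrt2$. Their termwise difference tends to 0, so they are equivalent in the sense of the construction above and represent the same element of the completion, although no term of either sequence, and no rational number at all, squares to 2.

```python
from fractions import Fraction

def newton_sqrt2(terms):
    """Rational Newton iterates x -> (x + 2/x) / 2, starting from 2."""
    x, out = Fraction(2), []
    for _ in range(terms):
        out.append(x)
        x = (x + 2 / x) / 2
    return out

def bisection_sqrt2(terms):
    """Rational midpoints of nested intervals [lo, hi] with lo^2 < 2 < hi^2."""
    lo, hi, out = Fraction(1), Fraction(2), []
    for _ in range(terms):
        mid = (lo + hi) / 2
        out.append(mid)
        if mid * mid < 2:
            lo = mid
        else:
            hi = mid
    return out

if __name__ == "__main__":
    for x, y in zip(newton_sqrt2(8), bisection_sqrt2(8)):
        # The termwise differences shrink, although neither sequence has a rational limit.
        print(float(x), float(y), float(abs(x - y)))
```

As in the text, the abstract equivalence class by itself says little about the limit; identifying it with a concrete object takes additional work.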
The completion method just outlined is cheap and virtually useless: Cheap in that it only uses general abstract principles about metric spaces, dodging any detail work concerning the specific example at hand. Useless for the very same reason: it gives no insight into the specific metric space(s) at hand. It does show that the analogies with $\mathbb{R}$ about completeness that were made above are indeed valid and not merely vague metaphors; more precisely, it would show this, once we have done the detail work and then proved that the outcome of the detailed completion construction falls into this general framework.

The Daniell Approach

This approach to constructing the Lebesgue integral has the advantage that it does not a priori require any measure theory. Any notions relating to ‘measure’ (the abstract generalization of the naive volume) will come out of the integral once it is constructed.

One begins with an ‘elementary integral’, which could be the Riemann integral on $R[a, b]$, or its restriction to $S[a, b]$ or $C^0[a, b]$. (Starting with an elementary integral that is even a bit less powerful than the Riemann integral makes life a bit easier in the elementary courses without later exacting a price when constructing the Lebesgue integral from it.) We call these functions elementary-integrable. We use $E$ for either $C^0$ or $R$ or $S$. We can also let $E$ stand for the compactly supported (see footnote 1) $C^0$ functions on $\mathbb{R}$, or $\mathbb{R}^n$. In either case (and in a whole lot of other cases not in focus here), the following extension procedure applies: One extends the integral to other functions by a limiting process that builds on pointwise monotonic convergence and seems well motivated in view of the above examples of functions that ‘ought to be’ integrable.
For the extension procedure, the following properties of the elementary integral are crucial: It is monotonic ($f \le g \implies \int f \le \int g$), and its domain is a vector space of real valued functions with the extra property that for $f$ and $g$ integrable, $\max\{f, g\}$ and $\min\{f, g\}$ are also integrable. Moreover, one needs the property that for monotonic (increasing or decreasing) sequences $(f_n)$ of integrable functions whose limit $f$ is again integrable, $\int f_n \to \int f$ holds (continuity of the elementary integral with respect to monotonic convergence). The Riemann integral, restricted to $C^0_{cpt}$, satisfies these hypotheses.

In a first step we define the integral for ‘lower functions’ and for ‘upper functions’: A function is called a lower function ($\in L$) if it is the (pointwise) limit of an increasing sequence of elementary-integrable functions. A function is called an upper function ($\in U$) if it is the (pointwise) limit of a decreasing sequence of elementary-integrable functions. (Values $+\infty$ are allowed for $f \in L$, and values $-\infty$ are allowed for $f \in U$.) As an aside, let it be noted that in the case $E = C^0$, the set $L$ consists exactly of the lower semicontinuous functions, i.e., functions $f$ that satisfy $\liminf_{x_n \to x} f(x_n) \ge f(x)$, and the set $U$ consists of the upper semicontinuous functions, i.e., functions $f$ that satisfy $\limsup_{x_n \to x} f(x_n) \le f(x)$.

Exercise 7.4 Prove: A function $f : \mathbb{R}^n \to \mathbb{R}$ is lower semicontinuous if there exists a sequence $(f_k)$ of continuous functions $f_k : \mathbb{R}^n \to \mathbb{R}$ such that $f_k \nearrow f$. Hint: Make sure not to overlook the fact that $\delta$ may depend on $k$. Show first $f_k(x_0) \le \liminf_{x \to x_0} f(x) + \varepsilon$ for every $k$ and every $\varepsilon$. Note: The converse is also true, but we have no need to prove this at the moment.

[I have coined the words lower function and upper function ad hoc and am not aware whether there is a ‘canonical name’ for them. Neither Royden's Real Analysis nor Floret's (German language) book on measure and integration theory, which I have used as references, coins a term at all.]

If $E \ni f_n \nearrow f \in L$, we define $\int f := \lim \int f_n$ (and the limit exists by monotonicity; it may be $+\infty$). A similar definition is made for $f \in U$.

Of course one needs to show that these new integrals on $L$ and $U$ coincide with each other and/or with the elementary integral for those functions where several are defined, and that the usual properties (like linearity, monotonicity) are still valid. Some work is to be done here, but nothing out of the ordinary.

Footnote 1: A function $f : \mathbb{R}^n \to \mathbb{R}$ is called compactly supported if the closure of the set $\{x \mid f(x) \neq 0\}$ is compact; or equivalently, by Heine-Borel, if there is a bounded set $S$ such that $f$ vanishes on the complement of $S$.

In a second step, one defines an upper and a lower integral for an arbitrary function (reminiscent of the upper and lower Riemann sums): For a function $f$ define the upper integral

$$\overline{\int} f := \inf \left\{ \int g \;\middle|\; g \ge f;\ g \in L \right\}$$

and the lower integral

$$\underline{\int} f := \sup \left\{ \int g \;\middle|\; g \le f;\ g \in U \right\}$$

Note that we put ‘upper functions’ (= functions constructed by approximation from above) below $f$, and ‘lower functions’ (= functions constructed by approximation from below) above $f$. The other way round, namely approximating $f$ from below by means of functions that themselves were approximated from below, would be a bit redundant and far less powerful. For instance, we would get $\chi_{C_*} \in U$ all right, but the supremum of those functions in $L$ that are $\le \chi_{C_*}$ would be the zero function. This would, according to the following definition, disqualify $\chi_{C_*}$ from being integrable:

A function is called integrable if its upper and lower integral coincide and are finite. If the upper and lower integrals coincide, but have the common value $+\infty$ or $-\infty$, we still permit ourselves to write $\int f = \infty$ (or $\int f = -\infty$) in these cases respectively.
Whether we start with $R$, $S$, or $C^0$, the resulting integral turns out to be the same, and this is the Lebesgue integral.

Exercise 7.5 Show: If $f$ is lsc and $f \le \chi_{C_*}$, then $f \le 0$. ($\chi_{C_*}$ refers to the characteristic function of a fat Cantor set, cf. Example 7.3.4.)

Again some work needs to be done to establish that this notion of the integral is not in conflict with preceding notions and that the integral satisfies the fundamental properties.

On the set of integrable functions, we have the seminorm $\|f\|_{L^1} := \int |f(x)|\,dx$. A function $f$ satisfies $\|f\|_{L^1} = 0$ if and only if $f = 0$ almost everywhere (a.e.). By identifying functions that are equal a.e., $\|\cdot\|_{L^1}$ becomes a norm on the equivalence classes. This set of equivalence classes of integrable functions modulo the relation ‘equal a.e.’ is called $L^1$. It turns out that $L^1$ with this norm is a Banach space, and that $C^0$ is dense in $L^1$, i.e., for every function (i.e., equivalence class of functions modulo equality a.e.) $f \in L^1$, there exists a sequence of continuous functions $f_n$ such that $\|f_n - f\|_{L^1} \to 0$.

Note that the definition of $L^1$ has inherently used one property of the Lebesgue integral that you may not have anticipated: If $f$ is integrable, then $|f|$ is also integrable. This, in particular, implies that $\sin x / x$ on $\mathbb{R}$ is not Lebesgue integrable. There is no notion of conditional convergence in the theory of the Lebesgue integral; it must be absolute convergence or nothing. This feature can be perceived as a necessary consequence of the fact that in the construction of the Lebesgue integral, the domain of the functions does not need to have a notion of order: it works equally for $\mathbb{R}$ and $\mathbb{R}^n$. The area under the graph of a function $f : \mathbb{R} \to \mathbb{R}$ is not constructed by successively adding up ‘thin vertical slices from left to right’ as is done with Riemann's integral, but rather in terms of horizontal stripes ‘bottom up’ or ‘top down’.
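The gap between conditional and absolute convergence for $\sin x / x$ can be observed numerically. The following is a rough sketch under assumptions made here (a midpoint rule with a fixed resolution per unit length, and a hypothetical function name): the signed integral over $[0, N]$ settles near $\pi/2$, whereas the integral of $|\sin x|/x$ over $[0, K\pi]$ is at least $\frac{2}{\pi} \sum_{k=1}^{K} \frac1k$ and therefore grows without bound, roughly like a logarithm.

```python
import math

def sinc_integrals(upper, steps_per_unit=200):
    """Midpoint-rule values of the integrals of sin(x)/x and |sin(x)|/x over [0, upper]."""
    steps = max(1, int(upper * steps_per_unit))
    h = upper / steps
    signed = 0.0
    absolute = 0.0
    for k in range(steps):
        x = (k + 0.5) * h          # midpoints avoid the removable singularity at 0
        value = math.sin(x) / x
        signed += value
        absolute += abs(value)
    return signed * h, absolute * h

if __name__ == "__main__":
    for periods in (20, 200):
        signed, absolute = sinc_integrals(periods * math.pi)
        # The first column stabilizes near pi/2, the second keeps growing.
        print(periods, signed, absolute)
```

The unbounded growth of the second column is the numerical shadow of the statement that $|\sin x / x|$ is not integrable, and hence $\sin x / x$ is not Lebesgue integrable either.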
Absolute convergence is the notion that is robust to changes in the order of summation, whereas conditional convergence may well produce expressions $\infty - \infty$ when the order of terms is changed.

We'll list the fundamental properties of the Lebesgue integral in the next section. These properties will in particular imply a ‘saturation’ feature, namely: if we start over with the extension procedure, using the Lebesgue integral as an elementary integral, we do not gain further integrable functions.

Note: The extension process is rather general: We might define a function $f : \mathbb{N} \to \mathbb{R}$ (i.e., a sequence) as ‘elementarily integrable’ iff $f_n = 0$ for all but finitely many $n$. In this case, the elementary integral would be the (finite) sum $\sum f_n$. The extension process would yield as integral notion the absolute convergence of the series $\sum f_n$. This stresses an analogy between integrals and infinite series, and it allows the strong exchange-of-limits theorems of Lebesgue's theory of integration to be carried over to absolutely convergent series.

Key properties of the Lebesgue integral

Preface: Occasionally, the hypothesis ‘$f$ measurable’ is going to show up. This notion will be defined below. For the moment, simply be aware that in all practical applications this hypothesis will be satisfied almost trivially. The only way to ever encounter a non-measurable function is to actively look for one. I prefer to collect all key properties together here rather than postponing those that involve measurability. Theorems are formulated for integrals over all of $\mathbb{R}^n$. This is no loss of generality, since $\int_M f = \int_{\mathbb{R}^n} (f \chi_M)$. They will be re-quoted with a reference number, and proved, in due time.

Theorem: The integral is linear; also, if $f$ and $g$ are integrable, then so are $\max\{f, g\}$ and $\min\{f, g\}$. If $f$ is integrable and $k \ge 0$ is a constant, then $\min\{f, k\}$ and $\max\{f, -k\}$ are also integrable. If $f$ and $\chi_A$ are integrable, then $f \chi_A$ is integrable.
Theorem: (monotone convergence) Assume $f_n \in L^1$ is a non-decreasing sequence and let $f := \lim f_n$. Then $\int f_n \to \int f$; in particular, $f \in L^1$ if $\lim \int f_n < \infty$. (An obvious analog holds for non-increasing sequences.)

Theorem: (Fatou’s lemma) Let $f_n \in L^1$ be non-negative and assume $f_n \to f$ a.e. Assume $\liminf \int f_n < \infty$. Then $f \in L^1$ and $\int f \le \liminf \int f_n$.

There are three key types of examples to show that equality needn’t hold in Fatou’s lemma:
(a) spreading: With (e.g.) $g(x) := 1/(1 + x^2)$, we let $f_n(x) = \frac{1}{n}\, g(x/n) = n/(n^2 + x^2)$. Then $\int f_n = \pi$ for all $n$, but $f_n \to 0$ pointwise.
(b) concentration: $f_n(x) := n\, g(nx) = n/(1 + n^2 x^2)$. Again $\int f_n = \pi$ independent of $n$, but $f_n \to 0$ a.e. (everywhere except at $x = 0$). [If you want convergence everywhere, you can modify this example, say take $g(x) := |x|/(1 + x^4)$.]
(c) running off to infinity: $f_n(x) := g(x - n)$.

Exercise 7.6 For each of these three examples, sketch graphs of $f_1$, $f_2$ and $f_4$ in a common coordinate system (one figure with three graphs for each example).

Theorem: (dominated convergence) Suppose $f_n \in L^1$ and $f_n \to f$ a.e. Assume that there exists $g \in L^1$ such that $|f_n| \le g$ for all $n$. Then $\int f_n \to \int f$. In particular $f \in L^1$.

This is almost the ‘one size fits all’ theorem for all types of exchange of limits. E.g., if we apply it to difference quotients, it gives us a criterion for differentiating under the integral sign.

Theorem: If $f \in L^1$, then $|f| \in L^1$. The converse holds under the extra hypothesis that $f$ is measurable. Moreover, if $f$ is measurable and $|f| \le g$ with $g \in L^1$, then $f \in L^1$.

We had noted the first part already above. The issue with the measurability hypothesis in the converse is simple: There exist bounded sets $M$ that are geometrically so wild that $\chi_M$ is not integrable, even in the Lebesgue sense. Then obviously $2\chi_M - 1$ isn’t integrable either, but its absolute value is the constant 1, which is integrable (over bounded sets).

Theorem: (Fubini) Let $f \in L^1(\mathbb{R}^{n+m})$. Then $g(y) := \int_{\mathbb{R}^n} f(x, y)\,dx$ exists and is finite for a.e.
$y \in \mathbb{R}^m$ (i.e., the function $f(\cdot, y) \in L^1(\mathbb{R}^n)$ for a.e. $y$), and $g \in L^1(\mathbb{R}^m)$, and $\int f(x, y)\,d(x, y) = \int_{\mathbb{R}^m} g(y)\,dy$.

Theorem: (Tonelli; partial converse of Fubini) Let $f : \mathbb{R}^{n+m} \to \mathbb{R}$ be measurable and assume the iterated integral $\int_{\mathbb{R}^m} \int_{\mathbb{R}^n} |f(x, y)|\,dx\,dy$ exists and is finite. Then $f \in L^1(\mathbb{R}^{n+m})$, and the conclusions of Fubini’s theorem hold.

These two are for practical calculation of multi-variable integrals in the way usually done for Riemann integrals, but under weaker hypotheses. Tonelli needs the hypothesis about $|f|$ rather than $f$ in the repeated integral, because otherwise the pre-specified order of integration could camouflage ‘$\infty - \infty$’ type of problems: E.g., take $g(x) := x/(1 + x^2)^2$. Clearly $g \in L^1(\mathbb{R})$, and $\int g = 0$. But for $f(x, y) := g(x)$, clearly $f \notin L^1(\mathbb{R}^2)$. Trying to integrate $f$ over $y$ first gives $+\infty$ for all positive $x$ and $-\infty$ for all negative $x$. But now that we are forced to assume the iterated integral for $|f|$ rather than $f$, we also need the measurability hypothesis, for the same reason as in the previous theorem on $|f|$.

Note: Since the Riemann integral is the restriction of the Lebesgue integral to more ‘benign’ functions, the convergence theorems, in particular the dominated convergence theorem, have corollaries for the Riemann integral. The distinction is that for the Riemann integral, we would need to assume the Riemann integrability of the limit function, whereas its Lebesgue integrability is a conclusion. But if in a given situation we already know enough about the limit function to check that it is Riemann integrable, then most likely we don’t need the dominated convergence theorem at all in that situation. An analogous situation is the following: If we already know that $\sum_{n=0}^{\infty} 2^{-n}$ is a rational number, then it is probably because we know that it is 2. But in this case, we do not need any abstract theorem to assure us of the convergence of this series.
In contrast, if we study $\sum_{n=0}^{\infty} 1/n!$, it is useful to know about convergence of this series before knowing its value, because we then define the number $e$ as the value of this series.

Measure

Measure is an abstract notion that reduces to the notion of length/area/volume for nice sets for which these latter notions are naively defined. We intend to define the measure of a set as the integral of its characteristic function. There is however a subtlety to be considered before we can capture the definition precisely. All these subtleties are produced as consequences of the axiom of choice from abstract set theory. The axiom of choice says: Given a set $\mathcal{A}$ whose elements are non-empty sets $A$, we may define a set $R$ consisting of exactly one element taken from each set $A \in \mathcal{A}$. This principle allows us to construct counterintuitive geometric examples:

The Banach Tarski Paradox (BTP): Let $B$ be the closed unit ball in $\mathbb{R}^3$. It is possible to write $B$ as a disjoint union of finitely many sets $B_i$ and to have congruence mappings $\varphi_i : \mathbb{R}^3 \to \mathbb{R}^3$ (i.e., isometries) in such a way that the union $\bigcup_i \varphi_i(B_i)$ is the union of two closed unit balls!

This means that any generalization of the naive volume notion cannot be defined on arbitrary sets, but must exclude certain ‘very wild’ sets, for which a volume (measure) would not be defined. There needs to be a notion of a ‘measurable set’ $M$. And (some of) the pieces $B_i$ in the BTP should be ‘not measurable’.

Basically the BTP is surprisingly elementary to prove, given its radically counterintuitive nature. It is not so different from the fact that, for instance, $\mathbb{Z}$ is the disjoint union of two sets $2\mathbb{Z} = \{\ldots, -4, -2, 0, 2, 4, \ldots\}$ and $2\mathbb{Z} + 1 = \{\ldots, -3, -1, 1, 3, 5, \ldots\}$; but both of these sets and their union $\mathbb{Z}$ are all of the same ‘size’ in that each set is in 1-1 correspondence with the other.
What is needed as an extra ingredient to get the BTP is a study of the group of isometries of $\mathbb{R}^3$ to do the decomposition in such a way that the pieces are geometrically congruent (i.e., isometric).

It has been shown that the existence of (Lebesgue-)non-measurable sets in $\mathbb{R}^n$ needs the axiom of choice (relative to the other axioms of set theory); this implies in particular that by constructive means (i.e., without using the axiom of choice), we will never encounter a non-measurable set (nor a non-measurable function).

How could we define measurability of a set? The condition ‘$\int \chi_M$ exists and is finite’ is too strict, because it would disqualify $M = \mathbb{R}^3$, which deserves to be called measurable, albeit with infinite measure. The condition ‘$\int \chi_M$ exists’ (finite or infinite) is too lax, because it would let a wild bounded set (like the $B_i$ from BTP) hide in the shadow of a bona-fide measurable set of infinite measure: Say $B_i$ is a subset of the left half space, and let $H$ be the right half space. Then $\int \chi_{B_i \cup H} = \infty$, but we do not want to call $B_i \cup H$ measurable.

Measure theory takes an axiomatic approach to the notion of measurable, and then constructs the Lebesgue measure as an example of this abstract notion. One popular approach to the Lebesgue integral builds on first constructing the Lebesgue measure and then defining the Lebesgue integral in terms of the measure. The Daniell approach comes from the opposite direction: first the integral, then the measure. The subtlety about measurable sets therefore makes it necessary to define a notion of ‘measurable function’; then we can call a set $M$ measurable if its characteristic function $\chi_M$ is measurable; and it will turn out that under this hypothesis $\int \chi_M$ is defined (finite or $+\infty$), and we call this value the measure of $M$.

Definition: (a) A function $f$ is measurable, if it is the limit (a.e.) of a sequence of integrable functions.
(b) Equivalently, $f$ is measurable, if for every pair of integrable functions $g$ and $h$ satisfying $g < h$, the ‘cut off function’ $\max\{g, \min\{f, h\}\}$ is integrable.

The implication (a) $\Rightarrow$ (b) follows from the dominated convergence theorem. The implication (b) $\Rightarrow$ (a) follows by choosing $g = -h = -n\chi_{B_n(0)}$, i.e., $h = n\chi_{B_n(0)}$.

Definition: A set $M$ is measurable if and only if its characteristic function $\chi_M$ is measurable. The measure of a measurable set is the integral of its characteristic function (finite or infinite).

Sets of measure zero are exactly the null sets introduced earlier. (For those who know the measure theoretic approach: our notion of measurable is the one called ‘Lebesgue measurable’, not the one called ‘Borel measurable’ there. I will not try to outline the equivalence of the Daniell approach with the measure theoretic one. However, I list the following equivalent notions of Lebesgue-measurability for reference.)

Theorem: A function $f : \mathbb{R}^n \to \mathbb{R}$ is measurable if and only if it is the limit (a.e.) of a sequence of continuous functions. A function $f : \mathbb{R}^n \to \mathbb{R}$ is measurable if and only if the sets $\{x \mid f(x) > \alpha\}$ are measurable for every $\alpha \in \mathbb{R}$. Complements, differences, countable unions, and countable intersections of measurable sets are measurable. Open sets are measurable; so are closed sets.

Theorem: Continuous and integrable functions are (trivially) measurable. If $g : \mathbb{R}^n \to \mathbb{R}$ is continuous and $f_i$ ($i = 1, \ldots, n$) are measurable, then the function $x \mapsto g(f_1(x), \ldots, f_n(x))$ is also measurable. In particular, sums, products, quotients with non-zero denominator, of measurable functions are measurable. Limits (a.e.) of sequences of measurable functions are measurable.

Note: If $f$ is measurable and $g$ continuous, $f \circ g$ need not be measurable. However, if $f$ is measurable and $g$ is $C^1$, then $f \circ g$ is measurable. A counterexample in the ‘$g$ continuous’ case can be constructed via the ‘devil’s staircase function’ discussed below.
Popular counterexamples

The Banach Tarski Paradox we mentioned already.

A non-measurable set: But here is an even simpler (if less spectacular) example of a non-measurable set: On $\mathbb{R}$, call two numbers equivalent, $r_1 \sim r_2$, if and only if their difference is rational. Select one representative from each equivalence class. Call the set of representatives $S$. Let $S_0 := \{x - \lfloor x \rfloor \mid x \in S\}$. This serves the purpose to choose representatives in $[0, 1]$. For each $q \in \mathbb{Q}$, let $S_q := \{x + q \mid x \in S_0\}$. All the $S_q$ are disjoint, since we have only chosen one representative from each equivalence class. If $S_0$ had positive measure $m$, then $S_q$ would have the same measure $m$ for each $q$, and the countable union $T := \bigcup_{q \in \mathbb{Q};\, |q| \le 1} S_q$ would have infinite measure. But $T \subset [-1, 2]$. If $S_0$ had measure 0, then the countable union $T' := \bigcup_{q \in \mathbb{Q}} S_q$ would also have measure 0. But $T' = \mathbb{R}$. So $S_0$ cannot be measurable.

The devil’s staircase: This is a continuous (and monotone) function that has a classical derivative almost everywhere (namely on the complement of the Cantor set $C$), but that is not constant. Namely, define $\psi$ inductively on the intervals that are the complements of the Cantor set:

$\psi(x) = \frac12$ for $x \in\, ]\frac13, \frac23[$,
$\psi(x) = \frac14$ for $x \in\, ]\frac19, \frac29[$, $\quad \psi(x) = \frac34$ for $x \in\, ]\frac79, \frac89[$,
$\psi(x) = \frac18$ for $x \in\, ]\frac1{27}, \frac2{27}[$, $\quad \psi(x) = \frac38$ for $x \in\, ]\frac7{27}, \frac8{27}[$,
$\psi(x) = \frac58$ for $x \in\, ]\frac{19}{27}, \frac{20}{27}[$, $\quad \psi(x) = \frac78$ for $x \in\, ]\frac{25}{27}, \frac{26}{27}[$, etc.

The function can be extended continuously onto all of $[0, 1]$, and its derivative is defined and 0 on the complement of $C$. (It is $+\infty$ on the null set $C$.) But $\psi$ is not constant.

Sometimes the function $\varphi(x) := \frac12 (x + \psi(x))$ is useful, because it is a homeomorphism, i.e., continuous with a continuous inverse. For instance, for the Cantor set $C$, we have $\varphi(C)$, a compact set with measure $\frac12$, and $\varphi([0, 1] \setminus C)$, an open set with measure $\frac12$. It is possible to have a non-measurable subset $M$ of $\varphi(C)$. But $N := \varphi^{-1}(M) \subset C$ is a null set.
This means that $\chi_N$ is measurable, but $\chi_N \circ \varphi^{-1} = \chi_M$ is not measurable, even though $\varphi^{-1}$ is continuous.

A study of properties of derivatives in connection with Lebesgue-integrable functions is better left to a more advanced course in real analysis. You might be enticed to say, concerning the devil’s staircase function $\psi$: Since $\psi'$ is defined and 0 almost everywhere (and undefined only on a null set, namely $C$), we can integrate $\psi'$ in the sense of the Lebesgue integral. But if we do this, we observe that

$\int_0^1 \psi'(x)\,dx = 0 \ne 1 = \psi(1) - \psi(0)$.

So we seem to forfeit the fundamental theorem of calculus. If this were indeed the case, the whole theory would be dead on arrival. The issue with this is that the notion of derivative ‘classical derivative almost everywhere’ is not a useful notion; to regain the fundamental theorem of calculus (FTC) in such a general setting requires a more sophisticated generalization of the notion of derivative. Doing so would make the FTC the cornerstone of the definition of derivative, and integration would need to be studied prior to this, as we are indeed doing here. So if you find it paradoxical that we deal with integration first, be alerted that it is not paradoxical at all. However, we will not pursue the depth indicated here when dealing with derivatives later, but rather study them in a classical abstract multi-variable setting.

7.4 Vector Lattices and Elementary Integrals

The starting point of our approach to integration is a vector space of real valued functions that is closed under the operations max and min. On such a space we define an integral whose primary properties are linearity, monotonicity, and a benign behavior with respect to monotone convergence.

Definition 7.4.1 A vector lattice is a real vector space $V$ consisting of functions $X \to \mathbb{R}$ (hence a subspace of $F(X \to \mathbb{R})$) satisfying the extra property that $f, g \in V$ implies $\max\{f, g\} \in V$.
Corollary 7.4.2 In a vector lattice $V$, if $f, g \in V$ then also $\min\{f, g\} \in V$, $|f| \in V$, $f_+ := \max\{f, 0\} \in V$, $f_- := \max\{-f, 0\} \in V$.

The proof is an easy exercise.

Definition 7.4.3 On a vector lattice $V$, an elementary integral is a function $I : V \to \mathbb{R}$ satisfying the following properties, for all $f, g \in V$ and all $\lambda \in \mathbb{R}$:
$I(\lambda f) = \lambda I(f)$ and $I(f + g) = I(f) + I(g)$ (linearity)
$f \le g \Rightarrow I(f) \le I(g)$ (monotonicity)
$f_n \searrow 0 \Rightarrow I(f_n) \to 0$ (continuity with respect to monotone convergence)

Example 7.4.4 In $\mathbb{R}^n$, call a function $f$ a step function, iff there exist partitions $a_0^{(i)} < a_1^{(i)} < \ldots < a_{k_i}^{(i)}$ for each $i \in \{1, 2, \ldots, n\}$ such that for each open box $]a_{j_1}^{(1)}, a_{j_1+1}^{(1)}[\, \times \ldots \times\, ]a_{j_n}^{(n)}, a_{j_n+1}^{(n)}[$, the restriction of $f$ to this box is a constant, and $f$ vanishes outside the closure of the union of these boxes, and $f$ is bounded. (This latter condition is only a technical restriction and concerns the values on the boundaries of the open boxes.) Step functions make up a vector lattice $S$. An elementary integral on $S$ is given by the Riemann integral, or equivalently and in more elementary terms: $I(f) = \sum_i \mu(B_i) f_i$, where $f_i$ is the value of $f$ on $B_i$, and $\mu(B_i)$ is the elementary geometric volume of the box $B_i$, namely the product of the side lengths. All properties are easily seen and an exercise for the reader, with the exception of the continuity property for the elementary integral, which will be proved as Lemma 7.4.12.

Example 7.4.5 In $\mathbb{R}^n$, consider all continuous functions with real values. They form a vector lattice $C^0(\mathbb{R}^n)$. Select one point $x_* \in \mathbb{R}^n$ arbitrarily, and consider the function $\delta_{x_*}(f) := f(x_*)$. Then $\delta_{x_*}$ is an elementary integral.

Definition 7.4.6 Given a metric space $X$, the support of a function $f : X \to \mathbb{R}$ or $f : X \to \mathbb{C}$ is defined to be the closure of the set $\{x \in X \mid f(x) \ne 0\}$.

Example 7.4.7 In $\mathbb{R}^n$, consider all continuous functions that vanish outside some bounded set, i.e., all continuous functions with compact support.
This is a vector lattice, called $C^0_{\rm cpt}(\mathbb{R}^n)$. As an elementary integral, we can take the Riemann integral of a function $f \in C^0_{\rm cpt}$ over all of $\mathbb{R}^n$, or equivalently, over a big cube containing the support of the function. All properties are easily seen, except for the continuity, which follows from the Theorem of Dini 7.4.10 proved below.

In this example, where the elementary integral is a Riemann integral that (unlike Example 7.4.4) cannot trivially be written as a finite sum, we will assume the linearity and monotonicity of the Riemann integral from elementary calculus. However, it is not necessary, strictly speaking, to have a clean construction of Riemann’s integral prior to a construction of Lebesgue’s integral. Rather, we will obtain Lebesgue’s integral based on Example 7.4.4. It will become clear a posteriori that starting the construction with $C^0_{\rm cpt}$ functions rather than step functions leads to the very same integral.

Example 7.4.8 Choose a (weakly) increasing continuous function $g : \mathbb{R} \to \mathbb{R}$ and take as a vector lattice the family of step functions on $\mathbb{R}$. For $f = f_i$ on $]a_i, b_i[$, define $I_g(f) := \sum_i f_i\, (g(b_i) - g(a_i))$. This type of integral is called a Stieltjes integral; it weights the value of a function with a local rate of increase of a given function $g$. Again, $I_g$ is an elementary integral.

Example 7.4.9 The set of those functions $f : \mathbb{N} \to \mathbb{R}$ for which $f(n) \ne 0$ for only finitely many $n$ (in other words, compactly supported sequences) forms a vector lattice Seq. On this lattice, $I(f) := \sum_n f(n)$ is an elementary integral. (The sum by which it is defined is a finite sum!)

Theorem 7.4.10 (Dini’s Theorem) Suppose $K$ is compact and $f_i : K \to \mathbb{R}$ are continuous functions. Suppose the sequence $(f_i)$ converges monotonically to a continuous function $f$. Then $f_i \to f$ uniformly.

Proof: Without loss of generality, assume $f_i \searrow f$ (for increasing convergence, take $-f_i$ and $-f$ instead). Without loss of generality, assume $f = 0$, else consider $f_i - f$ instead of $f_i$.
The assumption is $f_i(x) \searrow 0$ for all $x \in K$, which in particular includes $f_i(x) \ge 0$ for all $i, x$. Let $\varepsilon > 0$. For each $x$, there is some $n(x)$ such that $0 \le f_i(x) < \varepsilon/2$ for $i \ge n(x)$. By continuity of $f_{n(x)}$, there exists an open ball $B(x, \delta(x))$ such that $f_{n(x)}(y) < \varepsilon$ for $y \in B(x, \delta(x))$. By monotonicity, this implies $f_i(y) < \varepsilon$ for all $y \in B(x, \delta(x))$ and all $i \ge n(x)$. Trivially, the $B(x, \delta(x))$ for $x \in K$ form an open cover of $K$. By compactness, we have a finite subcover, i.e., finitely many $x_j$ such that the union of the $B(x_j, \delta(x_j))$ contains $K$. Let $n$ be the maximum of the $n(x_j)$. Then for all $i \ge n$ and all $y \in K$, we have $0 \le f_i(y) < \varepsilon$. This shows that $f_i \to f$ uniformly.

Corollary 7.4.11 If $f_i \searrow 0$ in $C^0_{\rm cpt}(\mathbb{R}^n)$, then $I(f_i) \to 0$, where $I$ denotes the Riemann integral.

Proof: There is a cube $[-N, N]^n$ containing the support of $f_1$, and therefore the support of all $f_i$. By Dini’s theorem on this cube, for $i$ large, $0 \le f_i(x) < \varepsilon/(2N)^n$, and therefore $0 \le I(f_i) \le \varepsilon$.

Exercise 7.7 Show by counterexamples that none of the hypotheses of Dini’s theorem can be omitted: Dini’s theorem may fail if $K$ is not compact, or if $f$ is not continuous, or if the convergence is pointwise but not monotonic, with all the other hypotheses unchanged in each case.

The continuity property for the elementary integral on step functions is proved along the same lines as Dini’s theorem, except that some technicalities due to discontinuities need to be taken care of:

Lemma 7.4.12 Let $(f_i)$ be a sequence of step functions (as defined in Example 7.4.4), such that $f_i \searrow 0$. Then $I(f_i) \to 0$.

Proof: We use the max distance on $\mathbb{R}^n$, so that the balls are boxes. $f_1$ is supported in some box $C := [a^{(1)}, b^{(1)}] \times \cdots \times [a^{(n)}, b^{(n)}]$ whose volume is $V = (b^{(1)} - a^{(1)}) \cdots (b^{(n)} - a^{(n)})$. Let $M$ be a bound for $f_1$ (and hence for all $f_i$). All $f_i$ are supported in this same closed box $C$. Let $\varepsilon > 0$. For each $f_i$, $C$ is partitioned into boxes $C_j^{(i)}$ ($j = 1, \ldots, m_i$) on whose interior $f_i$ is constant.
It is possible to cover the union $D$ of the boundaries $\partial C_j^{(i)}$ of the boxes $C_j^{(i)}$ with a union of open boxes $D_k^{(i)}$ of total volume $\varepsilon/(2M 2^i)$.

For every $x$ that is not on the boundary of any box $C_j^{(i)}$, we conclude from $f_i(x) \searrow 0$ that there exists some $n(x)$ such that $0 \le f_i(x) \le \varepsilon/(2V)$ if $i \ge n(x)$. The same holds for all $y$ in some ball $B(x, \delta(x))$, because $f_{n(x)}$ is constant on such a ball. (The $f_i$ for $i > n(x)$ may not be constant on that same ball any more, but the estimate prevails because $f_i(y) \le f_{n(x)}(y)$.)

The boxes $D_k^{(i)}$ and the balls $B(x, \delta(x))$ together make an open cover of $C$. There is a finite subcover consisting of some boxes $D_j$ (which we have taken from among the $D_k^{(i)}$ and renumbered), and some balls $B(x_j, \delta(x_j)) =: B_j$, which are also boxes. The maximum of all the $n(x_j)$ will be called $N$.

Now for $i \ge N$ and all $x \in B_j$, we have $0 \le f_i(x) \le \varepsilon/(2V)$. The total volume of the $B_j$ is trivially $\le V$. For all $x \in D_j$, we have trivially $0 \le f_i(x) \le M$. The total volume of the $D_j$ is $\le \sum_i \varepsilon/(2M 2^i) = \varepsilon/(2M)$. The $B_j$ and $D_j$ may overlap, but by refining the partition and renumbering the boxes, we can arrange for them to be mutually disjoint, and the estimates about their total volumes remain true. Therefore $I(f_i) \le \frac{\varepsilon}{2V} V + \frac{\varepsilon}{2M} M = \varepsilon$.

7.5 First Extension Step by Monotonic Convergence

Definition 7.5.1 Given a vector lattice $V$ of functions $X \to \mathbb{R}$, define as ${\uparrow}V$ the set of all functions $f : X \to \mathbb{R} \cup \{\infty\}$ for which there exists an increasing sequence $(f_i) \subset V$ such that $f_i \nearrow f$. Likewise define as ${\downarrow}V$ the set of all functions $f : X \to \mathbb{R} \cup \{-\infty\}$ for which there exists a decreasing sequence $(f_i) \subset V$ such that $f_i \searrow f$.

Trivially, $V$ is contained in ${\uparrow}V$ and in ${\downarrow}V$. Note that ${\uparrow}V$ and ${\downarrow}V$ may not be vector lattices any more, because $f \in {\uparrow}V$ does not imply $-f \in {\uparrow}V$. (However $f \in {\uparrow}V \iff -f \in {\downarrow}V$.)

We aim to extend the elementary integral from $V$ to ${\uparrow}V$ and ${\downarrow}V$. To this end, we will have to show that if $f_i \nearrow f$ and $g_i \nearrow f$, then $\lim I(f_i) = \lim I(g_i)$.
Then we can define $I(f) := \lim I(f_i)$, and moreover, using the constant sequence $(f)$, the new definition of $I(f)$ for $f \in {\uparrow}V$ coincides with the old one in case $f \in V$. We prove the following lemma.

Lemma 7.5.2 Suppose $f_i \nearrow f$ with $f_i \in V$, and let $g \in V$ satisfy $g \le f$. Then $I(g) \le \lim I(f_i)$.

Proof: Clearly, by monotonicity $\lim I(f_i)$ exists in $\mathbb{R} \cup \{\infty\}$, and we may assume the limit is finite, because otherwise the claim is trivial. Consider $g_i := (g - f_i)_+ = \max\{g - f_i, 0\}$. So $(g_i)$ is decreasing and the pointwise limit of $g_i$ is $\max\{g - f, 0\} = 0$. Therefore we have $I(g) - I(f_i) = I(g - f_i) \le I(g_i) \to 0$. Hence $I(g) - \lim I(f_i) \le 0$.

Corollary 7.5.3 (1) If $f_i \nearrow f$ and $g_i \nearrow g$ and $g \le f$, then $\lim I(g_i) \le \lim I(f_i)$. (2) If $f_i \nearrow f$ and $g_i \nearrow f$ where $f_i, g_i \in V$, then $\lim I(f_i) = \lim I(g_i)$.

Proof: Use Lemma 7.5.2 with $g_j$ for $g$, then take the limit as $j \to \infty$ to obtain (1). The converse inequality needed for (2) follows by symmetry.

Definition 7.5.4 For $f \in {\uparrow}V$, choose $(f_i)$ such that $f_i \nearrow f$ and define ${\uparrow}I(f) := \lim I(f_i)$. By the preceding corollary, the definition is independent of the chosen sequence $(f_i)$. Similarly, for $f \in {\downarrow}V$, choose $(f_i)$ such that $f_i \searrow f$ and define ${\downarrow}I(f) := \lim I(f_i)$.

Clearly $f \in {\uparrow}V$ iff $-f \in {\downarrow}V$, and ${\uparrow}I(f) = -{\downarrow}I(-f)$. Moreover ${\uparrow}I(f) = I(f) = {\downarrow}I(f)$, if $f \in V$. If $f \in {\uparrow}V \cap {\downarrow}V$, we also have ${\uparrow}I(f) = {\downarrow}I(f)$. For if $f_i \searrow f$ and $g_i \nearrow f$, then $f_i - g_i \searrow 0$ and hence $I(f_i) - I(g_i) = I(f_i - g_i) \searrow 0$. Since $\lim (I(f_i) - I(g_i)) = 0$, $\lim I(f_i) \in \mathbb{R} \cup \{-\infty\}$ and $\lim I(g_i) \in \mathbb{R} \cup \{\infty\}$, we conclude $\lim I(f_i) = \lim I(g_i) \in \mathbb{R}$. For this reason, we now may drop the arrows on $I$ and consider $I$ as defined on ${\downarrow}V \cup {\uparrow}V$.

Our extended integral retains most, but not all properties from Def 7.4.3 of the elementary integral:

Proposition 7.5.5 If $f, g \in {\uparrow}V$ and $\lambda \ge 0$, the following properties hold:
$I(\lambda f) = \lambda I(f)$, subject to the ad-hoc stipulation $0 \cdot \infty := 0$
$I(f + g) = I(f) + I(g)$
$f \le g \Rightarrow I(f) \le I(g)$
$f_n \nearrow f \Rightarrow I(f_n) \to I(f)$; in particular $f \in {\uparrow}V$
Analogous statements hold in ${\downarrow}V$.
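Definition 7.5.4 and Corollary 7.5.3 can be tried out on the lattice Seq of Example 7.4.9, where the elementary integral is a finite sum. The sketch below is only an illustration under stated assumptions: finitely supported sequences are stored as Python dictionaries, the target is $f(n) = 2^{-n}$ for $n \ge 1$, and the two approximating sequences `truncation` and `damped_truncation` are hypothetical choices made for this example. Both increase pointwise to $f$, and their elementary integrals approach the same limit, namely 1.

```python
def elementary_integral(seq):
    """Finite sum of a finitely supported sequence, stored as {index: value}."""
    return sum(seq.values())


def truncation(i):
    """f_i: equals 2**-n for 1 <= n <= i and 0 elsewhere."""
    return {n: 2.0 ** -n for n in range(1, i + 1)}


def damped_truncation(i):
    """g_i: equals (1 - 1/i) * 2**-n for 1 <= n <= 2i and 0 elsewhere."""
    return {n: (1.0 - 1.0 / i) * 2.0 ** -n for n in range(1, 2 * i + 1)}


if __name__ == "__main__":
    for i in (1, 5, 20, 50):
        print(i,
              elementary_integral(truncation(i)),
              elementary_integral(damped_truncation(i)))
```

The two columns approach 1 at different speeds, which is the content of Corollary 7.5.3 (2): the limit does not depend on the approximating sequence.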
In reading this proposition, note that $f$ may have $\infty$ among its values, and that the limited arithmetic on the extended real line $E$ did not include a definition for $0 \cdot \infty$ (and for good reasons). The ad-hoc stipulation in the first part makes sure that $\lambda f$ is defined for $\lambda = 0$ even if $\infty$ is among the values of $f$. In measure and integration theory (and only there), this ad-hoc stipulation is reasonable, and is frequently encountered. We will always quote this ad-hoc stipulation where it is used, but will not make it part of the limited arithmetic of $E$.

Proof of Prop. 7.5.5: The (modified) linearity properties follow because $f_i \nearrow f$ and $g_i \nearrow g$ imply $f_i + g_i \nearrow f + g$ and $\lambda f_i \nearrow \lambda f$. The monotonicity property is just Cor. 7.5.3.

The limit property is proved as follows: For $f_n \in {\uparrow}V$, we can find a sequence $(f_{nj})_{j=1}^{\infty}$ in $V$ such that $f_{nj} \nearrow f_n$. Now if $f_n \nearrow f$, we can also find a sequence $h_k$ in $V$ (rather than ${\uparrow}V$) such that $h_k \nearrow f$; namely we choose: $h_k := \max\{f_{ij} \mid i, j \le k\}$. Clearly, $(h_k)$ is an increasing sequence. Since $f_{ij} \le f_i \le f$, we have $h_k \le f$. We still have to show $h_k(x) \to f(x)$ for each $x$. Given $x$, and given $\varepsilon > 0$, we find $f_i$ such that $f_i(x) > f(x) - \frac12 \varepsilon$ because $f_i \nearrow f$. Likewise, for this $f_i$ we find $f_{ij}$ such that $f_{ij}(x) > f_i(x) - \frac12 \varepsilon$. Now letting $k := \max\{i, j\}$, we have $h_k(x) \ge f_{ij}(x) > f_i(x) - \frac12 \varepsilon > f(x) - \varepsilon$.

Now let $\lim I(f_n) =: I_*$, and we assume this quantity to be finite for the moment. Given $\varepsilon > 0$, we find $n$ such that $I(f_n) > I_* - \frac12 \varepsilon$ and then we find $j$ such that $I(f_{nj}) > I(f_n) - \frac12 \varepsilon$. With $k := \max\{n, j\}$, we then have $I(h_k) \ge I(f_{nj}) > I_* - \varepsilon$. So $I(f) = \lim I(h_k) \ge I_*$, and the converse inequality is obvious from $f_n \le f$, hence $I(f_n) \le I(f)$. An analogous argument can be made for $I_* = \infty$ with $I_* - \frac12 \varepsilon$ being replaced by a large number $N$ and $I_* - \varepsilon$ by $N - 1$.

Note that in ${\uparrow}V$, we cannot conclude from $f_n \searrow 0$ that $I(f_n) \to 0$, simply because $I(f)$ may be $\infty$, and then $\lim I(\frac1n f) = \infty$ even though $\frac1n f \searrow 0$.
We will later prove this convergence result under the extra hypothesis that $I(f)$ is finite.

Let us now look at ${\uparrow}V$ and ${\downarrow}V$ in the case of the various examples of elementary integrals studied in the previous section. The easiest case is

Proposition 7.5.6 For the vector lattice Seq defined in Example 7.4.9, ${\uparrow}$Seq consists of those sequences $f : \mathbb{N} \to \mathbb{R} \cup \{+\infty\}$ for which $f_n < 0$ occurs for at most finitely many $n$. A similar statement applies for ${\downarrow}$Seq.

The proof is an easy exercise.

Proposition 7.5.7 For the vector lattice $S$ from Example 7.4.4, $C^0_{\rm cpt}(\mathbb{R}^n) \subset {\uparrow}S \cap {\downarrow}S$.

Proof: First let $f \in C^0_{\rm cpt}(\mathbb{R}^n)$ and suppose the support of $f$ is contained in $[-a, a]^n$. Since $f$ is uniformly continuous, for every $\varepsilon > 0$ there exists a $k$ such that on any cube of length $a/k$, $f$ will oscillate by at most $\varepsilon$ in that cube, i.e., $|f(x_1) - f(x_2)| < \varepsilon$ for any $x_1, x_2$ in that cube. So, with $\varepsilon = 1/i$, we find a step function $f_i$ (constant on the boxes created by an equidistant partition of $[-a, a]$ and vanishing outside $[-a, a]^n$), such that $\sup |f - f_i| \le \frac1i$. These $f_i$ converge uniformly to $f$. We take $g_i(x) := f_i(x) - \frac1i$ for $x \in [-a, a]^n$ and 0 otherwise. Then $g_i \to f$ uniformly, and moreover $g_i \le f$. Now let $h_i := \max\{g_1, \ldots, g_i\}$. The sequence $(h_i)$ is increasing, and still converges to $f$ uniformly (and hence pointwise). So $f \in {\uparrow}S$. The proof that $f \in {\downarrow}S$ is similar.

We will not give a precise characterization of ${\uparrow}S$ and ${\downarrow}S$, but will improve on Proposition 7.5.7 after studying the next example. For the next proposition, we remind the reader of the definition of lower semicontinuity, and the properties proved about it in Sec. 3.4a. Moreover, we prove the following simple result, a generalization of Thm 3.5.8, which is also of independent interest in the study of infinite dimensional minimization problems (Calculus of Variations):

Lemma 7.5.8 A lower semicontinuous function on a compact set takes on its minimum.

Proof: Let $f$ be lsc on a compact set $K$, and let $m := \inf_K f \in \mathbb{R} \cup \{-\infty\}$.
Take a sequence $(x_k)$ in $K$ such that $f(x_k) \to m$. By descending to a subsequence, we may assume that $(x_k)$ converges, and we call the limit $x_*$. Since $f(x_*) \le \liminf f(x_k) = m$ by lower semicontinuity, but also $f(x_*) \ge m$ by $m$ being the infimum, we conclude that $m$ is a minimum, and equals $f(x_*)$.

Proposition 7.5.9 For the vector lattice $C^0_{\rm cpt}$, the functions in ${\uparrow}C^0_{\rm cpt}$ are precisely those lower semicontinuous functions $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ that are $\ge 0$ outside some compact set. Those in ${\downarrow}C^0_{\rm cpt}$ are precisely those upper semicontinuous functions $f : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$ that are $\le 0$ outside some compact set.

Proof: We prove the first part, the second being analogous. We have seen in Exercise 7.4 that all functions $f \in {\uparrow}C^0_{\rm cpt}$ are lower semicontinuous. Moreover, if $f_i \nearrow f$ and the $f_i$ have compact support, then $f \ge 0$ outside the support of $f_1$.

Now let’s look at the converse claim and assume $f$ is lower semicontinuous and $f \ge 0$ outside the compact set $K$. It is no loss of generality to assume that $K$ is the closed ball $C(0, R)$ for some large $R$ (if not, replace $K$ with such a ball containing $K$). Let $m_0 := \inf f$. As $f$ takes on a (finite) minimum on $K$ by the previous lemma, and $f \ge 0$ outside $K$, we know that $m_0$ is finite, not $-\infty$. Let $m := \min\{m_0, 0\} \le \inf f$. We will construct a countable family of $C^0_{\rm cpt}$ functions, the supremum of which is $f$, and this countable family we can actually convert into an increasing sequence that converges to $f$.

To start this program, we note that for each $x \in \mathbb{R}^n$ and each $t < f(x)$, there exists some $r > 0$ such that $f > t$ on all of $B(x, r)$. We may choose $r$ rational, and we will apply this observation only for $x \in \mathbb{Q}^n$ and $t \in \mathbb{Q}$. We define the countably infinite set $G := \{(x, t, r) \in \mathbb{Q}^n \times \mathbb{Q} \times \mathbb{Q} \mid r > 0,\ f > t \text{ on } B(x, r)\}$.
For the $i$th element $(x, t, r)$ of $G$ (in some enumeration $\mathbb{N} \to G$), we define the continuous function $g_i$ by

$g_i(y) := t$ if $y \in B(x, \frac r2)$,
$g_i(y) := m$ if $y \notin B(x, r)$,
$g_i(y) := t + (\frac2r |y - x| - 1)(m - t)$ if $y \in B(x, r) \setminus B(x, \frac r2)$.

[Figure: graph of $g_i$, equal to $t$ near $x$ and dropping linearly to $m$ at distance $r$ from $x$.]

By construction, $g_i \le f$ and $g_i$ is continuous (but does not have compact support yet, unless $m_0 \ge 0$). We introduce $g(x) := \sup\{g_i(x) \mid i \in \mathbb{N}\}$, observe trivially that $g \le f$, and now we will show that for each $\varepsilon > 0$, $g \ge f - \varepsilon$, hence altogether $g = f$.

So let $\varepsilon > 0$ and let $x \in \mathbb{R}^n$. First find a rational number $t \in\, ]f(x) - \varepsilon, f(x)[$. Since $f$ is lsc and $f(x) > t$, there is a $\delta > 0$ such that $f > t$ on the entire ball $B(x, \delta)$. Choose $y \in \mathbb{Q}^n \cap B(x, \frac14 \delta)$, and choose a rational $r$ in $]\frac12 \delta, \frac34 \delta[$. This guarantees that $B(y, r) \subset B(x, \delta)$ (hence $(y, t, r) \in G$), and $x \in B(y, \frac12 r)$, hence $g_i(x) = t > f(x) - \varepsilon$ for the index $i$ belonging to $(y, t, r)$.

Now if we let $h_k(x) = \max\{g_i(x) \mid i = 1, \ldots, k\}$, we have an increasing sequence $h_k \nearrow g = f$ with $h_k$ continuous. If $m_0 = \inf f \ge 0$, we have $m = 0$, and our $g_i$ and $h_k$ already have compact support. If however $m_0 = m < 0$, the construction needs a minor modification: We introduce the extra function $g_0$ by $g_0(x) = m < 0$ for $x \in K = B(0, R)$, and $g_0(x) = 0$ for $x \notin B(0, 2R)$, and $g_0(x) = m(1 - (|x| - R)/R)$ otherwise. Now $g_0 \in C^0_{\rm cpt}$ and $g_0 \le f$. Then we take $h_k := \max\{g_i \mid i = 0, 1, \ldots, k\}$, and now all the $h_k$ are supported in $C(0, 2R)$. (This proof is a variant of the one found in Jost’s Postmodern Analysis, Thm. 12.10.)

Now that we have a precise understanding of ${\uparrow}C^0_{\rm cpt}$ and ${\downarrow}C^0_{\rm cpt}$, we can return to improve upon Proposition 7.5.7:

Proposition 7.5.10 ${\uparrow}S \supset {\uparrow}C^0_{\rm cpt}$ and ${\downarrow}S \supset {\downarrow}C^0_{\rm cpt}$.

Proof: We prove the first part. Let $f \in {\uparrow}C^0_{\rm cpt}$. Then there exists a sequence $(f_k)$ in $C^0_{\rm cpt}$ such that $f_k \nearrow f$. By Proposition 7.5.7, we have, for each $f_k$, a sequence $(f_{kj})_{j=1}^{\infty}$ in $S$ such that $f_{kj} \nearrow f_k$. Now let $h_k := \max\{f_{ij} \mid 1 \le i, j \le k\}$ and argue exactly as in the proof of Prop. 7.5.5.
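The diagonal construction $h_k := \max\{f_{ij} \mid 1 \le i, j \le k\}$, used here and in the proof of Prop. 7.5.5, can be watched at work pointwise. The following sketch is a toy illustration only: the functions are hypothetical choices ($f(x) = x^2$, cut off at height $i$, then rounded down to multiples of $2^{-j}$) and are not claimed to lie in $S$ or $C^0_{\rm cpt}$. What matters is the doubly indexed monotone structure $f_{ij} \nearrow f_i$ (as $j \to \infty$) and $f_i \nearrow f$.

```python
import math


def f(x):
    """Target function of this toy example."""
    return x * x


def f_i(i, x):
    """Outer approximation: f cut off at height i, increasing in i."""
    return min(f(x), float(i))


def f_ij(i, j, x):
    """Inner approximation of f_i: rounded down to a multiple of 2**-j, increasing in j."""
    return math.floor(f_i(i, x) * 2 ** j) / 2 ** j


def h(k, x):
    """Diagonal sequence h_k = max{f_ij : 1 <= i, j <= k}."""
    return max(f_ij(i, j, x) for i in range(1, k + 1) for j in range(1, k + 1))


if __name__ == "__main__":
    for x in (0.3, 1.7, 2.5):
        print(x, f(x), [h(k, x) for k in (1, 2, 4, 8, 12)])
```

Each printed list is non-decreasing, stays below $f(x)$, and approaches $f(x)$, as the proof predicts for a single sequence $h_k$ built from the double family.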
Remark 7.5.11 The proof idea of this proposition contains a general insight, namely, that repeating the first extension step will not produce anything new: ${\uparrow}{\uparrow}V = {\uparrow}V$ (in slight abuse of notation since ${\uparrow}V$ is not actually a vector space any more).

Remark 7.5.12 Prop. 7.5.10 is very instructive, and is given for didactical, rather than mathematical reasons. At first sight, one might think that starting the extension process with the extremely simple elementary integral on step functions might limit the scope of the integral resulting from the extension process, and that starting with a more sophisticated elementary integral (like the Riemann integral) might be needed to get the most powerful integral notion in the end. Prop. 7.5.10 is a partial refutation of this idea: At least if you start with the restriction of the Riemann integral to continuous functions, you will not have more functions after the first extension step than you get by starting with step functions.

Deciding the same question for $V$ consisting of all Riemann integrable functions is not so obvious, in particular without a precise characterization of the class of Riemann integrable functions; however we will see that after the full two-step extension process is completed, starting with either $S$ or $C^0_{\rm cpt}$, we do encompass all Riemann integrable functions, and that iteration of the full extension procedure does not yield further integrable functions. We will later return to this issue and see that any of these choices, $S$, $C^0_{\rm cpt}$, or all Riemann integrable functions, produce the same integral after both extension steps are completed.

At this point we have already defined the measure of every open set, and of every compact set in $\mathbb{R}^n$, i.e., the integral of the characteristic function of each such set:

Lemma 7.5.13 If $U \subset \mathbb{R}^n$ is open, then $\chi_U \in {\uparrow}C^0_{\rm cpt}$. If $C \subset \mathbb{R}^n$ is compact, then $\chi_C \in {\downarrow}C^0_{\rm cpt}$.

Proof: Remember the dist function in a metric space from Hwk.
3.4J1: Given a set $A$ and a point $x$, we defined $\mathrm{dist}(x, A) := \inf\{d(x, y) \mid y \in A\}$, and we showed $|\mathrm{dist}(x, A) - \mathrm{dist}(y, A)| \le d(x, y)$, so $\mathrm{dist}(\cdot, A)$ is a continuous function.

Given any closed set $C$, we can define $f_k(x) := (1 - k\,\mathrm{dist}(x, C))_+$, which is defined to be $\max\{0, 1 - k\,\mathrm{dist}(x, C)\}$. The $f_k$ are continuous, and if $C$ is also bounded, they have bounded support (hence compact support, because the support is by definition a closed set). Clearly $(f_k)$ is a decreasing sequence. If $x \in C$, then $\mathrm{dist}(x, C) = 0$ and hence $f_k(x) = 1$ for all $k$. If $x \notin C$, then $\mathrm{dist}(x, C) > 0$ (by Exercise 3.4J4), and therefore $f_k(x) = 0$ for all sufficiently large $k$. This proves that $\chi_C \in \downarrow C^0_{\mathrm{cpt}}$.

Now consider the case of an open set $U$, and first assume also that $U$ is bounded. Then we can define $f_k(x) := \min\{1, k\,\mathrm{dist}(x, U^c)\}$. Then by a similar reasoning, $f_k \nearrow \chi_U$, and the $f_k$ are continuous with compact support. If $U$ fails to be bounded, the $f_k$ fail to have compact support and the construction needs to be modified a bit. Consider the function $i_k(x) := (1 - (|x| - k)_+)_+$; i.e., $i_k(x) = 1$ if $|x| \le k$; and $i_k(x) := 0$ if $|x| \ge k + 1$; and $i_k(x) \in [0, 1]$ otherwise. $i_k$ is a 'truncation of the constant 1'. We can now choose $f_k(x) := \min\{i_k(x), k\,\mathrm{dist}(x, U^c)\}$ to finish the proof.

Exercise 7.8 With $V = C^0_{\mathrm{cpt}}$, decide which of the following functions are in $\uparrow V$ and which in $\downarrow V$, or both or neither: $\chi_{\mathbb{Z}}$, $\chi_{[0,1[}$, $\chi_{[0,1]}$, $\chi_{]0,1[}$.

Exercise 7.9 With $V = S$ according to Example 7.4.4, decide whether the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$ is in $\uparrow S$, in $\downarrow S$, both or neither. Give a proof for 'not in this set' and an explicit monotonic sequence of step functions for 'in this set'. Consider $g : \mathbb{R} \to \mathbb{R}$ given by $g(x) = x/(1 + x^4)$. Explain why it is neither in $\uparrow S$ nor in $\downarrow S$. Give at least two ways of writing $g$ as $g = g_1 + g_2$ with $g_1 \in \uparrow S$ and $g_2 \in \downarrow S$. (In this problem, the diversity bigshots will return a vicious frown to everybody who frowns at piecewise defined functions.)
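The decreasing sequence $f_k(x) = (1 - k\,\mathrm{dist}(x, C))_+$ from the proof of Lemma 7.5.13 can be watched numerically. The following is a minimal sketch, assuming the compact set $C = [0, 1] \subset \mathbb{R}$; the helper names `dist_to_interval` and `f_k` and the sample points are illustrative choices, not part of the notes.

```python
def dist_to_interval(x, a=0.0, b=1.0):
    """Distance from the point x to the closed interval C = [a, b]."""
    if x < a:
        return a - x
    if x > b:
        return x - b
    return 0.0


def f_k(x, k, a=0.0, b=1.0):
    """f_k(x) = (1 - k*dist(x, C))_+ = max(0, 1 - k*dist(x, C))."""
    return max(0.0, 1.0 - k * dist_to_interval(x, a, b))


# Inside C the value stays 1; outside C it drops to 0 once k >= 1/dist(x, C).
samples = [-0.5, -0.01, 0.0, 0.5, 1.0, 1.01, 2.0]
for k in (1, 10, 100, 1000):
    print(k, [round(f_k(x, k), 3) for x in samples])
```

Each printed row is pointwise no larger than the previous one, and for large $k$ the rows agree with $\chi_C$ at the sample points, which is the pointwise statement $f_k \searrow \chi_C$.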
Exercise 7.10 We could extend the definition of $I$ from $\uparrow V \cup \downarrow V$ to $\uparrow V + \downarrow V$, i.e., when $g$ can be written as $g = g_1 + g_2$ with $g_1 \in \uparrow V$ and $g_2 \in \downarrow V$ and $I(g_i)$ finite, then $I(g) := I(g_1) + I(g_2)$. Prove that this is well-defined, i.e., independent of the choice of decomposition of $g$. The reason why we forego this opportunity is that the extra functions thus gathered as integrable will be gathered in the next step anyway.

Exercise 7.11 Let $I$ denote the Riemann integral on $C^0_{\mathrm{cpt}}$ and assume knowledge of the Riemann integral from elementary calculus. Prove for a bounded open interval $J$ that $I(\chi_J) = \mu(J)$, the length of $J$. Prove a similar result for the multi-variable case with an open box $B \subset \mathbb{R}^n$. It is convenient to work with products of single-variable functions in this case, and you may use repeated integrals to evaluate multi-variable Riemann integrals as done in elementary MV-calculus.

Remember: The logical outline by which we present the entire theory is to construct an integral by extending the integral of step functions; and we will later recognize the Riemann integral as a special case, based only on the elementary definition of the Riemann integral, but not on any elaborate theory for the Riemann integral. This is the rigorous part. At the same time, we study an alternative approach based on $C^0_{\mathrm{cpt}}$, which would be rigorous now if we had proved beforehand the properties of the Riemann integral you know from elementary calculus. This part serves a didactical purpose, and the rigor of the entire exposition does not depend on it.

Exercise 7.12 Given a metric on $\mathbb{R}^n$ providing the 'usual' topology (i.e., the metric is equivalent to the euclidean metric), and an open set $U \subset \mathbb{R}^n$. Consider the countable family $\mathcal{B} := \{B(x, \delta) \mid x \in \mathbb{Q}^n, \delta \in \mathbb{Q}_+\}$. Show that $U$ is the union of all those $B \in \mathcal{B}$ that are contained in $U$. Use this for a direct proof that $\chi_U \in \uparrow S$. Then give a second proof that $\chi_U \in \uparrow S$ as a consequence of numbered results in this chapter.
7.6 Upper and Lower Integral; Integrable Functions

Definition 7.6.1 Given a vector lattice $V$ of functions $X \to \mathbb{R}$, and an arbitrary function $f : X \to \mathbb{R} \cup \{\pm\infty\}$, we define the upper integral
$$I^*(f) := \inf\{I(g) \mid g \ge f \text{ and } g \in \uparrow V\}.$$
Likewise, define the lower integral
$$I_*(f) := \sup\{I(h) \mid h \le f \text{ and } h \in \downarrow V\}.$$
(It is understood here that $\inf \emptyset := +\infty$ and $\sup \emptyset := -\infty$.) Call $f$ integrable, iff $I_*(f) = I^*(f) \in \mathbb{R}$. If $f$ is integrable, we define $I(f)$ to be the common value of $I_*(f) = I^*(f)$. (Due to part (2) of Lemma 7.6.2 below, this leads to no conflict with the prior definition of $I$ on $\uparrow V \cup \downarrow V$.) The set of all integrable functions $X \to \mathbb{R}$ is called $\mathcal{L}^1(X)$, or, if the integral needs to be denoted, $\mathcal{L}^1(X, I)$. We call $I$ on $\mathcal{L}^1(X)$ the Daniell extension of the elementary integral $I$ on $V$.

It will turn out that this simple 2-step extension that has been accomplished by this definition gives us an integral with powerful completeness and convergence properties. In the case where the elementary integral with which we started is the one from either Example 7.4.4 or 7.4.7, this integral is equivalent to the integral notion originally defined by Lebesgue, and we will call it the Lebesgue integral for this reason. The approach presented here is however not the one due to Lebesgue, but rather was invented by P.J. Daniell ('A General Form of Integral'; Annals of Mathematics 2nd Ser, Vol 19; June 1918; pp 279–294).

Lemma 7.6.2 Let $\mathbb{E}$ denote the extended real line $\mathbb{R} \cup \{-\infty, \infty\}$.
(1) For all functions $f : X \to \mathbb{E}$, it holds $I_*(f) \le I^*(f)$.
(2) If $f \in \uparrow V \cup \downarrow V$, then $I_*(f) = I^*(f) = I(f) \in \mathbb{E}$.
(3) Let $f \in \mathcal{L}^1(X)$ and $\lambda \in \mathbb{R}$; let $h(x) := \lambda f(x)$, subject to the ad-hoc stipulation that $0 \cdot \infty := 0$. Then $h \in \mathcal{L}^1(X)$ and $I(h) = \lambda I(f)$.
(4) Let $f_1, f_2 \in \mathcal{L}^1(X)$ and let $f(x) := f_1(x) + f_2(x)$ wherever defined, and $f(x)$ arbitrary where $f_1(x) + f_2(x)$ is not defined. Then $f \in \mathcal{L}^1(X)$ and $I(f) = I(f_1) + I(f_2)$.
(5) If $f, g \in \mathcal{L}^1(X)$ and $f \le g$, then $I(f) \le I(g)$.
(6) If $f \le h \le g$ and $f, g \in \mathcal{L}^1(X)$ with $I(f) = I(g)$; then $h \in \mathcal{L}^1(X)$ and $I(h) = I(f)$.
(7) If $f_1, f_2 \in \mathcal{L}^1(X)$, then so are $\max\{f_1, f_2\}$, $\min\{f_1, f_2\}$, $(f_1)_+$, $(f_1)_-$ and $|f_1|$.

The issue with the clumsy wording in (4) is that $\infty - \infty$ is not defined. We choose to allow functions with values $+\infty$ and $-\infty$, because otherwise the monotonic convergence issues would become clumsy: perfectly benign functions like $f(x) = |x|^{-1/2}$ are naturally approximated by an increasing sequence of continuous functions, if we let $f(0) = \infty$; namely we can let $f_n := \min\{f, n\}$. But this creates the problem that now $\mathcal{L}^1(X)$ is, pedantically speaking, not even a vector space, because the sum of functions may not be defined. However, for integrable functions $f, g$, any undefined values of $f + g$ may be supplied arbitrarily without affecting the integral. This indicates that for practical purposes, $\mathcal{L}^1(X)$ is 'as good as' a vector space, actually a vector lattice.

We will soon prove that the continuity property for elementary integrals is also satisfied for the extended integral $I$. So, apart from the technicality created by occasionally undefined values, we have salvaged all properties of the elementary integral through the extension process.

Proof of Lemma 7.6.2: For (1) we choose $h, g$ such that $\downarrow V \ni h \le f \le g \in \uparrow V$. The claim is trivial if no such $h$ or $g$ exist. We have to show that $\downarrow V \ni h \le g \in \uparrow V$ implies $I(h) \le I(g)$. The claim then follows by taking the sup and inf respectively. But since $\uparrow V \ni g - h \ge 0$, this follows from Prop. 7.5.5.

For (2), assume $f \in \uparrow V$ (the other case is analogous). Then clearly $I^*(f) \le I(f)$, since $g := f$ is allowable in $\inf I(g)$. On the other hand, we have $V \ni f_n \nearrow f$, so we can choose the $f_n$ for $h$ and conclude $I_*(f) \ge I(f_n) \to I(f)$. With part (1), we get $I(f) \le I_*(f) \le I^*(f) \le I(f)$ and hence the claim.

For $\lambda \ge 0$, claim (3) follows immediately from Prop. 7.5.5.
For $\lambda = -1$, it follows because $\uparrow V = -\downarrow V$ and $I(f) = -I(-f)$ as upper integrals for $f$ correspond to lower integrals for $-f$ and vice versa. The general case is a combination of these two.

Now concerning (4), note that among the $g \in \uparrow V$ that satisfy $g \ge f$ are the $g := g_1 + g_2$ with $\uparrow V \ni g_i \ge f_i$. In particular, this observation also applies for those $x_0$ for which $f_1(x_0) + f_2(x_0)$ is not defined, because in this case one of the $f_i(x_0)$ (say $f_1(x_0)$) must be $\infty$. Then $g_1(x_0)$ must be $\infty$ as well; but $g_2$ cannot be $-\infty$ since $g_2 \in \uparrow V$. So $g(x_0) = \infty \ge f(x_0)$ regardless of how we define $f(x_0)$. Then $I^*(f) \le I(g_1 + g_2) = I(g_1) + I(g_2)$. By taking the infima over $g_1$ and $g_2$, we conclude $I^*(f) \le I^*(f_1) + I^*(f_2)$. Similarly, $I_*(f) \ge I_*(f_1) + I_*(f_2)$. Note that these inequalities have been proved for all $f, f_i : X \to \mathbb{E}$ subject to the condition $f = f_1 + f_2$ where the sum is defined (hence Cor. 7.6.3 below). But now, if the $f_i$ are integrable, we can continue:
$$I^*(f) \le I^*(f_1) + I^*(f_2) = I_*(f_1) + I_*(f_2) \le I_*(f) \le I^*(f)$$
and therefore $f$ is integrable and equality holds everywhere.

Moreover, if $f \le g$, then $I^*(f) \le I^*(g)$ because the lhs is an infimum over a larger set of competitors. Similarly $I_*(f) \le I_*(g)$ because the lhs is the sup over a smaller set of competitors. Property (5) follows trivially. Also under the hypotheses for (6), we have $I^*(f) \le I^*(h) \le I^*(g)$ and $I_*(f) \le I_*(h) \le I_*(g)$, and the integrability of $f, g$ implies the claim.

It suffices to prove (7) for $\max\{f_1, f_2\}$. By carrying the max property of $V$ through the approximating sequences, it is clear that $\max\{g_1, g_2\} \in \uparrow V$ if $g_1, g_2 \in \uparrow V$, and also for min; the same result applies for $h_1, h_2 \in \downarrow V$. Now assume $f_1, f_2 \in \mathcal{L}^1(X)$ and let $\varepsilon > 0$. Then there are $g_i \in \uparrow V$ (and $g_i \ge f_i$) such that $I(g_i) \le I(f_i) + \tfrac14\varepsilon < \infty$ (for $i = 1, 2$). Likewise, there are $h_i \in \downarrow V$ (with $h_i \le f_i$) such that $I(h_i) \ge I(f_i) - \tfrac14\varepsilon > -\infty$.
As $g := \max\{g_1, g_2\} \ge \max\{f_1, f_2\} =: f$, and $g \in \uparrow V$, we have $I^*(f) \le I^*(g) = I(g) \in \mathbb{R} \cup \{\infty\}$. (So far, an infinite value is not ruled out; but it will be ruled out later.) And similarly, $I_*(f) \ge I_*(h) = I(h)$ for $h := \max\{h_1, h_2\}$. Now observe that
$$0 \le g - h = \max\{g_1, g_2\} - \max\{h_1, h_2\} \le (g_1 - h_1) + (g_2 - h_2).$$
This estimate can be checked for each $x$ case by case depending on the relative sizes of the $g_i(x)$, $h_i(x)$. We conclude
$$I^*(f) - I_*(f) \le I(g) - I(h) = I(g - h) \le I(g_1 - h_1) + I(g_2 - h_2) \le \tfrac12\varepsilon + \tfrac12\varepsilon = \varepsilon.$$
This inequality in particular rules out infinite values for $I(g)$ and $I(h)$, and therefore for $I^*(f)$ and $I_*(f)$. But since the argument could be made for every $\varepsilon > 0$, we conclude that $I^*(f) = I_*(f)$, hence $f$ is integrable.

We stress here that $I(f)$ is defined as $\uparrow I(f)$ or $\downarrow I(f)$, as the case may be, for all $f \in \uparrow V \cup \downarrow V$ according to Def. 7.5.4 and the paragraph following it. For all of these functions, we have $I^*(f) = I_*(f) = I(f)$ by 7.6.2(2); however, among the functions in $\uparrow V \cup \downarrow V$, only those for which $I(f)$ is finite are called integrable according to Def. 7.6.1. There can also be functions not in $\uparrow V \cup \downarrow V$ for which $I^*(f)$ and $I_*(f)$ coincide; but only if this common value is finite (and the functions are thus integrable) do we use the notation $I(f)$ for them.

Let us harvest two easy corollaries from the proofs of (4) and (7) respectively, for easy reference:

Corollary 7.6.3 For all $f, g : X \to \mathbb{E}$, we have $I^*(f + g) \le I^*(f) + I^*(g)$, and $I_*(f + g) \ge I_*(f) + I_*(g)$. (Here, arbitrary values may be prescribed for points $x$ where the limited arithmetic of $\mathbb{E}$ does not define $f(x) + g(x)$.)

Corollary 7.6.4 $f$ is integrable if and only if for every $\varepsilon > 0$, there exist functions $h \in \downarrow V$ and $g \in \uparrow V$ such that $h \le f \le g$ and $I(g) - I(h) < \varepsilon$.

Of this latter corollary, one direction (which one?) is harvested from the proof of Lemma 7.6.2; the converse is an easy exercise.
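The criterion of Corollary 7.6.4 can be illustrated in the simplest setting $V = S$. The sketch below is an illustration under assumptions that are not taken from the notes: it uses $f(x) = x^2$ on $[0, 1]$ (and $f = 0$ elsewhere), a uniform partition into $n$ intervals, and the hypothetical helper name `darboux_sums`. It computes the integrals of a step function $h \le f$ and a step function $g \ge f$ in exact rational arithmetic.

```python
from fractions import Fraction


def darboux_sums(n):
    """Integrals of the lower and upper step functions for f(x) = x^2 on [0, 1].

    The partition has n intervals of equal length 1/n. Since f is increasing
    on [0, 1], its inf on [j/n, (j+1)/n] is (j/n)^2 and its sup is ((j+1)/n)^2.
    """
    width = Fraction(1, n)
    lower = sum(Fraction(j, n) ** 2 * width for j in range(n))
    upper = sum(Fraction(j + 1, n) ** 2 * width for j in range(n))
    return lower, upper


for n in (1, 10, 100):
    lower, upper = darboux_sums(n)
    print(n, float(lower), float(upper), upper - lower)
```

For this particular $f$ the gap $I(g) - I(h)$ equals $1/n$, so any $\varepsilon > 0$ is reached by taking $n > 1/\varepsilon$, and the common limit $1/3$ is squeezed between the two sums.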
Exercise 7.13 For integrable $f$, prove: $|I(f)| \le I(|f|)$.

In Lemma 7.6.2, we had to deal with obnoxious exceptions caused by limitations of the arithmetic of the extended real line $\mathbb{E}$. Let us address this issue with a clean language.

Definition 7.6.5 Given $f : X \to \mathbb{E}$, we denote by $E_f$ the set $\{x \in X \mid f(x) \notin \mathbb{R}\}$, i.e., the set where $f$ has infinite values. A set $E \subset X$ is called an $I$-null set, if there exists a function $f \in \mathcal{L}^1(X, I)$ such that $E = E_f$. A property is said to hold $I$-almost everywhere ($I$-a.e.), iff it holds for all points with the exception of those in an $I$-null set.

With this definition, we have the following lemma, of which the first part is an easy variant of Lemma 7.6.2(4):

Lemma 7.6.6 If $g \in \mathcal{L}^1(X, I)$ and $h : X \to \mathbb{E}$ satisfies $h(x) = g(x)$ for all $x \notin E$, where $E$ is an $I$-null set, then $h \in \mathcal{L}^1(X)$ and $I(g) = I(h)$. A set $E$ is an $I$-null set if and only if $\chi_E \in \mathcal{L}^1(X, I)$ and $I(\chi_E) = 0$.

Proof: For the first part, suppose $E = E_f$, an $I$-null set, where $f \in \mathcal{L}^1(X, I)$. The idea is to consider $(f + g) - f$, and use Lemma 7.6.2(4). Define $p(x) := f(x) + g(x)$ wherever this makes sense; this includes all $x \notin E_f$, and possibly some $x \in E_f$. For the other $x$, define $p(x) := f(x)$. Then $p \in \mathcal{L}^1(X, I)$ and $I(p) = I(f) + I(g)$. Now define $q(x) := p(x) - f(x)$, wherever this makes sense (which includes all $x \notin E_f$). Otherwise define $q(x) := h(x)$. Then $q \in \mathcal{L}^1(X, I)$ and $I(q) = I(p) + I(-f) = I(p) - I(f) = I(g)$. We claim $q = h$ (thus proving part 1): Indeed, if $x \notin E_f$ (i.e., $f(x)$ finite), then $p(x)$ is defined by $f(x) + g(x)$, and $q(x)$ is defined by $q(x) = p(x) - f(x) = g(x) = h(x)$. If $x \in E_f$, then $p(x) = f(x)$, even if $f(x) + g(x)$ should be defined. In that case, $p(x) - f(x)$ is not defined, hence again $q(x) = h(x)$.

If $E$ is an $I$-null set, then $\chi_E = 0$ except on the $I$-null set $E$, hence $\chi_E$ is integrable and $I(\chi_E) = 0$. Conversely, assume $\chi_E$ is integrable and $I(\chi_E) = 0$. We want to show that the function $f$ defined by $f(x) = \infty$ for $x \in E$ and $f(x) = 0$ for $x \notin E$ is integrable.
Let $\varepsilon > 0$. Since $I(\chi_E) = 0$, there exists $g_n \in \uparrow V$ with $g_n \ge \chi_E$ and $I(g_n) < \varepsilon/2^n$. We therefore have an increasing sequence $s_n := g_1 + \ldots + g_n$ in $\uparrow V$, with $I(s_n) < \sum_{j=1}^{n} \varepsilon/2^j < \varepsilon$. Let $g := \lim s_n$; then $g \in \uparrow V$ by Prop. 7.5.5, and $I(g) \le \varepsilon$ from the same proposition, so in particular $g \in \mathcal{L}^1(X, I)$. Clearly $g(x) = \infty$ for $x \in E$, and therefore $g \ge f$. This implies that $I^*(f) \le I^*(g) = I(g) \le \varepsilon$. Since this is true for every $\varepsilon$, we have $I^*(f) \le 0$, and trivially $I_*(f) \ge 0$. So $f$ is integrable.

The following is immediate from this lemma and Lemma 7.6.2(6):

Corollary 7.6.7 Any subset of an $I$-null set is an $I$-null set, too.

In the case that $I$ is the Riemann integral on $S$ or $C^0_{\mathrm{cpt}}$ (Examples 7.4.4, 7.4.7), Lebesgue null sets from Def. 7.3.2 are indeed $I$-null sets. The converse is also true, but is less easy to show, so we'll postpone this until after a more thorough discussion of the Lebesgue measure (Cor. 7.8.10).

Proposition 7.6.8 If $N \subset \mathbb{R}^n$ is a Lebesgue null set in the sense of Def. 7.3.2, then it is an $I$-null set in the sense of Def. 7.6.5 with $I$ denoting the extension of the elementary integral on step functions.

Proof: Clearly, if $U$ is a countable union of boxes $B_j$, then we can let $U_n := \bigcup_{j=1}^{n} B_j$ and $\chi_U \in \uparrow S$ with $S \ni \chi_{U_n} \nearrow \chi_U$. As $I(\chi_{B_j}) = \mu(B_j)$ and $I(\chi_{U_n}) \le \sum_{j=1}^{n} \mu(B_j)$, we have $I(\chi_U) \le \sum_{j=1}^{\infty} \mu(B_j)$. So if $N$ is a Lebesgue null set, it can be covered by $U$, a countable union of boxes $B_j$ with total volume $\sum_j \mu(B_j) \le \varepsilon$. Therefore $I^*(\chi_N) \le I(\chi_U) \le \varepsilon$. Since this is true for every $\varepsilon > 0$, and trivially $I_*(\chi_N) \ge 0$, we conclude $I(\chi_N) = 0$.

The same proposition applies when $I$ is the extension of the Riemann integral on continuous functions, with essentially the same proof, due to Exercise 7.11.

Note that in most of our examples, the constant function 1 is not integrable (due to the 'infinite volume' of $\mathbb{R}^n$ we have $I_*(1) = I^*(1) = \infty$). This prevents us from directly applying Lemma 7.6.2(7) to $\min\{f, 1\}$.
However, such a truncation is often desirable, and for this purpose we introduce the following definitions and a generalization of Lemma 7.6.2(7).

Definition 7.6.9 A vector lattice $V$ is said to satisfy Stone's axiom, if for every $f \in V$, the function $\min\{f, 1\}$ is in $V$ as well. It is easy to see that then $\min\{f, N\}$ and $\max\{f, -N\}$ are in $V$ for every positive constant $N$.

Exercise 7.14 Show: If $V$ satisfies Stone's axiom, and $I$ is an elementary integral on $V$, then its Daniell extension satisfies the property $f \in \mathcal{L}^1(X, I) \implies \min\{f, N\} \in \mathcal{L}^1(X, I)$ for $N > 0$. Hint: All proof steps are easy; the main issue is to go back through the construction, mention and modify every ingredient that enters into Lemma 7.6.2(7), so as to see that all proof steps indeed carry over.

Clearly, our main examples $S$, $C^0_{\mathrm{cpt}}$, $C^0$ and $\mathrm{Seq}$ satisfy Stone's axiom.

Exercise 7.15 Let $V = \mathrm{Seq}$ with the elementary integral $I$ from Example 7.4.9. Show that $\mathcal{L}^1(\mathbb{N}, I)$ consists precisely of the absolutely convergent series. Which sets are $I$-null sets?

Exercise 7.16 Let $V = C^0(\mathbb{R}^n)$ with the elementary integral $\delta_{x_*}$ from Example 7.4.5. Show that $\mathcal{L}^1(\mathbb{R}^n, \delta_{x_*})$ consists of all those functions $f : \mathbb{R}^n \to \mathbb{E}$ for which $f(x_*)$ is finite, and that the $\delta_{x_*}$-null sets are exactly those subsets of $\mathbb{R}^n$ that do not contain $x_*$.

[Figure: Venn diagram inside $F(X \to \mathbb{E})$ showing $V$, $\uparrow V$, $\downarrow V$ and the integrable functions $\mathcal{L}^1(X, I)$ ($I$ finite); the parts of $\uparrow V$ and $\downarrow V$ outside $\mathcal{L}^1$ are labeled $I = +\infty$ and $I = -\infty$; the outer region is labeled 'Out here $I$ not def'd even if $I^* = I_*$'.]

This Venn diagram displays the general picture of the sets involved in the Daniell extension process. Note that the sets $V$, $\uparrow V$ and $\downarrow V$ are independent of the choice of an elementary integral $I$, except that $V$ is the original domain of $I$, before the extension.

In the cases $V = C^0$, $V = C^0_{\mathrm{cpt}}$, and $V = \mathrm{Seq}$, it turns out that the gray set $(\uparrow V \cap \downarrow V) \setminus V$ is empty. However, in the case $V = S$, this gray set contains all of $C^0_{\mathrm{cpt}}$, except for the zero function, which is already in $V$.
We can now get the following picture that combines the cases $V = S$ and $V = C^0_{\mathrm{cpt}}$, each of which can be used to construct the Lebesgue integral:

[Figure: Venn diagram inside $F(\mathbb{R}^n \to \mathbb{E})$ showing $S$, $\uparrow S$, $\downarrow S$ (fine dashed line), $C^0_{\mathrm{cpt}}$, $\uparrow C^0_{\mathrm{cpt}}$, $\downarrow C^0_{\mathrm{cpt}}$ (solid line), and $\mathcal{L}^1$ (dotted line).]

As we will see in the next chapter, we get one and the same $\mathcal{L}^1$ (Lebesgue-integrable functions; dotted line) from the Daniell process on $C^0_{\mathrm{cpt}}$ (solid line) or on $S$ (fine dashed line). Note that the starting points, $S$ and $C^0_{\mathrm{cpt}}$, are 'almost disjoint': only the zero function is in the intersection. After the first extension step (by monotonic convergence), the step function approach is more powerful than the one via $C^0_{\mathrm{cpt}}$ functions. However, after the second step, the same set of Lebesgue integrable functions is captured from either starting point.

7.7 The Lebesgue Integral vs the Riemann Integral; Convergence Theorems

First let us note that we have captured the Riemann integral by the extension process from $S$.

Theorem 7.7.1 Let $I$ denote the elementary integral on $S$ according to Example 7.4.4. Assume $f$ is Riemann integrable on a box $B \subset \mathbb{R}^n$ (and $f = 0$ outside $B$). Then $f \in \mathcal{L}^1(\mathbb{R}^n, I)$, and $I(f)$ is the Riemann integral $\int_B f(x)\, d^n x$.

Proof: The Riemann-integrability of $f$ means: There exists a number $J$ (namely $J = \int_B f(x)\, d^n x$) that makes the following claim true: for every $\varepsilon > 0$, there exists $\delta > 0$, such that for every tagged partition $(\mathcal{P}, \{c_j\})$ with meshsize $\sigma(\mathcal{P}) < \delta$, it holds
$$J - \varepsilon < \sum_j f(c_j)\, \mu(B_j) < J + \varepsilon.$$
By taking the sup or inf as $c_j \in B_j$, we conclude (with the same quantifiers)
$$J - \varepsilon \le \sum_j \Big(\inf_{B_j} f\Big) \mu(B_j) \le \sum_j \Big(\sup_{B_j} f\Big) \mu(B_j) \le J + \varepsilon.$$
In particular, the $\sup_{B_j} f$ and $\inf_{B_j} f$ involved are finite. (This amounts to the proof that every Riemann integrable function is bounded.) For each partition $\mathcal{P}$, the function $g_{\mathcal{P}}$ that is constant $\sup_{B_j} f$ on each open box $B_j$ (and whatever on the boundaries of the boxes) is a step function in $S$, and therefore trivially in $\uparrow S$. Likewise, we can define $h_{\mathcal{P}} \in S \subset \downarrow S$ by taking $\inf_{B_j} f$ on each box $B_j$.
By taking the sup over all $h \in \downarrow S$ (which includes step functions $h_{\mathcal{P}}$ obtained from partitions with meshsize $< \delta$), we therefore find $I_*(f) \ge J - \varepsilon$. Likewise we find $I^*(f) \le J + \varepsilon$. Since this is true for all $\varepsilon > 0$, we obtain $J \le I_*(f) \le I^*(f) \le J$.

Thanks to this theorem, we now write $I(f)$ as $\int_{\mathbb{R}^n} f(x)\, d^n x$, when the Daniell extension of the integral on $S$ is meant. We refer to this integral as the Lebesgue integral of $f$, and we have proved that it is an extension of the Riemann integral. For any set $A \subset \mathbb{R}^n$, we understand $\int_A f(x)\, d^n x$ to mean $\int_{\mathbb{R}^n} f(x) \chi_A(x)\, d^n x$, if this integral makes sense.

As a key step towards the convergence theorems for the Lebesgue integral (or, more generally, any Daniell extension of an elementary integral), we study the behavior of the integral under monotonic limits. Note that the limit of an increasing sequence $f_n$ is the same as the infinite series $f_1 + \sum_{n=1}^{\infty} (f_{n+1} - f_n)$, a series with nonnegative terms. This latter point of view is now a bit more convenient than the former. We can still maintain the generality of the Daniell extension of any elementary integral.

Theorem 7.7.2 Let $f_n : X \to [0, \infty]$. Then $I^*\big(\sum_{n=1}^{\infty} f_n\big) \le \sum_{n=1}^{\infty} I^*(f_n)$ and $I_*\big(\sum_{n=1}^{\infty} f_n\big) \ge \sum_{n=1}^{\infty} I_*(f_n)$.

Proof: We first take the case of the $I^*$. If any $I^*(f_n)$ is infinite, the inequality is trivial. So assume all $I^*(f_n)$ are finite, and select $g_n \in \uparrow V$ such that $g_n \ge f_n$ and $I(g_n) < I^*(f_n) + \varepsilon/2^n$, where $\varepsilon > 0$ is arbitrary. Using Prop. 7.5.5, we have $\uparrow V \ni \sum_{n=1}^{\infty} g_n \ge \sum_{n=1}^{\infty} f_n$ and therefore
$$I^*\Big(\sum_{n=1}^{\infty} f_n\Big) \le I\Big(\sum_{n=1}^{\infty} g_n\Big) = \sum_{n=1}^{\infty} I(g_n) \le \sum_{n=1}^{\infty} I^*(f_n) + \varepsilon.$$
The claim follows as $\varepsilon \to 0$.

For $I_*$, we combine the monotonicity of $I_*$ with the desired inequality for finitely many terms (Cor. 7.6.3) to get
$$I_*\Big(\sum_{n=1}^{\infty} f_n\Big) \ge I_*\Big(\sum_{n=1}^{N} f_n\Big) \ge \sum_{n=1}^{N} I_*(f_n).$$
Now let $N \to \infty$ on the right hand side.

The following first convergence theorem is an immediate consequence of this result:

Theorem 7.7.3 (Thm. of Beppo Levi) Let $(f_n)$ be a sequence of nonnegative, integrable functions and $f := \sum_{n=1}^{\infty} f_n$.
If $\sum_{n=1}^{\infty} I(f_n) < \infty$, then $f$ is integrable and $I(f) = \sum_{n=1}^{\infty} I(f_n)$. If $(g_n)$ is an increasing (resp. decreasing) sequence of integrable functions, with $g_n \nearrow g$ (resp. $g_n \searrow g$), and $\lim I(g_n) < \infty$ (resp. $\lim I(g_n) > -\infty$), then $g$ is integrable and $I(g) = \lim I(g_n)$.

Proof: The first statement follows immediately from the preceding result by
$$\sum_{n=1}^{\infty} I(f_n) = \sum_{n=1}^{\infty} I_*(f_n) \le I_*\Big(\sum_{n=1}^{\infty} f_n\Big) \le I^*\Big(\sum_{n=1}^{\infty} f_n\Big) \le \sum_{n=1}^{\infty} I^*(f_n) = \sum_{n=1}^{\infty} I(f_n).$$
The second statement, for increasing, follows by applying the first result to $f_n := g_{n+1} - g_n$. (We may let $f_n(x) = \infty$ where the arithmetic on $\mathbb{E}$ fails us.) For decreasing, use the negative of these functions.

An immediate consequence is

Corollary 7.7.4 The union of countably many $I$-null sets is an $I$-null set.

Proof: Let $A_n$ be a countable family of $I$-null sets, and take $f_n$ to be their characteristic functions. Let $f$ be the characteristic function of $\bigcup A_n$. Then $0 \le f \le \sum f_n$, and $I(\sum f_n) = \sum I(f_n) = \sum 0 = 0$ by Thm. 7.7.3. Now $I(f) = 0$ follows from Lemma 7.6.2(6).

We have seen examples where the integral does not commute with pointwise limits (Exercise 7.6). However, we do have a one-sided inequality for integrals of nonnegative functions, and it is sometimes useful:

Lemma 7.7.5 (Fatou's Lemma) Let $f_n$ be nonnegative integrable functions with $\sup I(f_n) < \infty$. Then $f := \liminf_{n \to \infty} f_n$ (defined pointwise) is integrable, and $I(f) \le \liminf_{n \to \infty} I(f_n)$.

Proof: For fixed $n$, the sequence $(g_{nm} := \min\{f_n, f_{n+1}, \ldots, f_{n+m}\})_m$ is a decreasing sequence of nonnegative integrable functions, and therefore, from Thm. 7.7.3, its limit $g_n = \inf\{f_k \mid k \ge n\}$ is integrable, and $I(g_n) = \lim_{m \to \infty} I(g_{nm})$. In particular $I(g_n) \le I(f_k)$ for every $k \ge n$. So we can take the inf over $k$ and get a bound $I(g_n) \le \inf_{k \ge n} I(f_k) \le \sup I(f_k) < \infty$. Now we can let $n \to \infty$, and $(g_n)$ is an increasing sequence: $g_n \nearrow \liminf f_k = f$. Again by Thm. 7.7.3, $f$ is integrable and $I(f) = \lim I(g_n) \le \lim_{n \to \infty} \inf_{k \ge n} I(f_k) = \liminf I(f_k)$.
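The inequality in Fatou's Lemma can be strict. A standard example, chosen here for illustration and not taken from the notes, is the sequence of step functions $f_n = n \chi_{]0, 1/n[}$ on $\mathbb{R}$: each has integral 1, yet $f_n(x) \to 0$ for every fixed $x$, so $I(\liminf f_n) = 0 < 1 = \liminf I(f_n)$. The sketch below checks both facts in exact arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def f(n, x):
    """f_n(x) = n on the open interval ]0, 1/n[, and 0 elsewhere."""
    return n if 0 < x < Fraction(1, n) else 0


def integral_f(n):
    """Integral of the step function f_n: height n times interval length 1/n."""
    return n * Fraction(1, n)


x = Fraction(1, 7)
values = [f(n, x) for n in range(1, 21)]
print(values)                                 # nonzero only while 1/n > x
print([integral_f(n) for n in (1, 10, 100)])  # the same value for every n
```

The same sequence shows why the dominated convergence theorem needs an integrable majorant: no integrable $g$ satisfies $f_n \le g$ for all $n$ here.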
Exercise 7.17 Show that the hypothesis $\sup I(f_n) < \infty$ can be replaced by the weaker hypothesis $\liminf I(f_n) < \infty$.

We are now in a position to prove Lebesgue's dominated convergence theorem, which is almost a universal tool for exchanging limits with integration.

Theorem 7.7.6 (Dominated Convergence) Suppose $f_n$ are integrable and $f_n \to f$ $I$-almost everywhere (i.e., there exists an $I$-null set $N$ such that $f_n(x) \to f(x)$ for all $x \notin N$). Suppose further that there is an integrable function $g$ such that $|f_n| \le g$ for all $n$ $I$-almost everywhere. Then $f$ is integrable, and $I(f) = \lim I(f_n)$.

Proof: By changing $f_n$ and $f$ to values 0 on the null set $N$, we change the a.e.-convergence to pointwise (everywhere) convergence without changing the integrability hypothesis for the $f_n$ or the integrability claim for $f$. Moreover, by changing the value of $g$ to $+\infty$ on the set $\bigcup N_n$, where $N_n$ is the null set on which the condition $|f_n(x)| \le g(x)$ fails, we are not changing the integrability of $g$, because the countable union of the null sets $N_n$ is still a null set (by Cor. 7.7.4). Therefore it suffices to prove the theorem with 'a.e.' replaced by 'everywhere'.

Now, since $g - f_n \ge 0$, we can use Fatou's lemma to get that $g - f$ (and hence $f$) is integrable, and $I(g - f) \le \liminf I(g - f_n) = I(g) - \limsup I(f_n)$. Likewise, using Fatou's lemma on $g + f_n$, we get $I(g + f) \le I(g) + \liminf I(f_n)$. Putting the two together and cancelling $I(g)$, we conclude $I(f) \le \liminf I(f_n) \le \limsup I(f_n) \le I(f)$. Hence $\lim I(f_n) = I(f)$.

Before continuing with the theory of $\mathcal{L}^1(X, I)$ in general, let us make good on the promise that the two approaches to the Lebesgue integral are equivalent:

Theorem 7.7.7 Let $I_S$ be the elementary integral on step functions according to Example 7.4.4, and let $I_C$ be the Riemann integral on $C^0_{\mathrm{cpt}}$ functions according to Example 7.4.7 (equivalently, the restriction of $I_S$ to $C^0_{\mathrm{cpt}}$). Then $\mathcal{L}^1(\mathbb{R}^n, I_S) = \mathcal{L}^1(\mathbb{R}^n, I_C)$, and $I_S(f) = I_C(f)$ for all integrable functions $f$.

Proof: By Prop. 7.5.10, we have $\uparrow C^0_{\mathrm{cpt}} \subset \uparrow S$. Moreover $I_C(f) = I_S(f)$ for all $f \in C^0_{\mathrm{cpt}}$, because the Riemann integral on $C^0_{\mathrm{cpt}}$ is a special case of the Lebesgue integral by Thm. 7.7.1. By going to increasing limits, we have $I_C(f) = I_S(f)$ for all $f \in \uparrow C^0_{\mathrm{cpt}}$ as well. With analogous proof, the same holds for all $f \in \downarrow C^0_{\mathrm{cpt}}$. Now suppose $f$ is $I_C$-integrable. Then, given $\varepsilon > 0$, there exist $g \in \uparrow C^0_{\mathrm{cpt}}$ and $h \in \downarrow C^0_{\mathrm{cpt}}$ such that $h \le f \le g$ and $I_C(g) - I_C(h) < \varepsilon$. The same $g, h$ are in $\uparrow S$ and $\downarrow S$ respectively, and provide for $I_S(g) - I_S(h) < \varepsilon$, hence $f$ is $I_S$-integrable by Cor. 7.6.4.

The converse inclusion needs a bit more work, because $\uparrow S$ is a strict superset of $\uparrow C^0_{\mathrm{cpt}}$; set inclusion arguments alone cannot rule out that $I_S$ is more powerful than $I_C$. So assume $f \in \mathcal{L}^1(\mathbb{R}^n, I_S)$ and let $\varepsilon > 0$. There exists $g \in \uparrow S$ such that $g \ge f$ and $I_S(g) < I_S(f) + \tfrac{\varepsilon}{2}$. We want to modify $g$ to make it lower semicontinuous. There exist step functions $g_k$ such that $g_k \nearrow g$. $g_k$ is constant on the 'steps' (open boxes) $B_{k,j}$. We change the values on the boundaries of the steps: Let $\tilde g_k(x) = g_k(x)$ for all $x \in B_{k,j}$, and define $\tilde g_k(x)$ for $x \in \bigcup_j (\partial B_{k,j})$ to be the smallest among those values $g_k(B_{k,j})$ for which $x$ is in the closure of $B_{k,j}$. Then $I_S(\tilde g_k) = I_S(g_k)$, but $\tilde g_k$ is lower semicontinuous. Let the limit of the sequence $(\tilde g_k)$ be $\tilde g$.

The sequence $(\tilde g_k)$ is still increasing. Indeed, for $x \notin \bigcup_j (\partial B_{k,j}) \cup \bigcup_j (\partial B_{k+1,j})$, we have $\tilde g_{k+1}(x) = g_{k+1}(x) \ge g_k(x) = \tilde g_k(x)$. For the other $x$'s we choose a sequence $x_r \to x$ with $x_r$ such that $\lim g_{k+1}(x_r) = \tilde g_{k+1}(x)$; this can be achieved by choosing the $x_r$ from the appropriate among the boxes $B_{k+1,j}$ (and not on any of the boundaries $\partial B_{k,j}$ either). Then $\tilde g_{k+1}(x_r) \ge \tilde g_k(x_r)$ implies $\tilde g_{k+1}(x) = \lim \tilde g_{k+1}(x_r) \ge \liminf \tilde g_k(x_r) \ge \tilde g_k(x)$.
In contrast to $g \ge f$, we may have violated the inequality $\tilde g \ge f$ on an $I_S$ null set. We can cover $\bigcup_j (\partial B_{k,j})$ with finitely many open boxes $D_{k,r}$, the sum of whose volumes is $< \varepsilon/(2^{k+1} M_k)$, where $M_k := \sup g_k - \inf g_k$. The $\chi_{D_{k,r}}$ are lower semi-continuous, because the $D_{k,r}$ are open. So we let $\hat g_k := \tilde g_k + \sum_{l \le k} \sum_r M_l \chi_{D_{l,r}}$. Now $(\hat g_k)$ is an increasing sequence of lsc functions, whose limit $\hat g$ is therefore also lsc. (The proof of Exercise 7.4 carries over to the present situation.) And $I_S(\hat g) \le I_S(\tilde g) + \sum_k \varepsilon/2^{k+1} = I_S(g) + \tfrac{\varepsilon}{2} < I_S(f) + \varepsilon$. Since $\hat g_k \ge g_k$, we have $\hat g \ge g \ge f$.

Since $\hat g \in \uparrow S$, it is $\ge 0$ outside some compact set, and since $\hat g$ is also lower semicontinuous, it is in $\uparrow C^0_{\mathrm{cpt}}$ by Prop. 7.5.9. But this means $I_C^*(f) \le I_S(f) + \varepsilon$. A similar construction can be made with the lower integral to obtain $I_{C*}(f) \ge I_S(f) - \varepsilon$. Together, they prove the $I_C$ integrability of $f$ and the equality of the integrals.

We infer, from the properties of integrability, that $\mathcal{L}^1(X, I)$ satisfies all properties of a vector lattice, except for the trouble with not-always defined vector space operations due to possible infinite values of functions in $\mathcal{L}^1(X, I)$. Restricting $I$ to finite-valued integrable functions, we have again an elementary integral and may wonder if repetition of the Daniell extension process produces further integrable functions. The answer is negative:

Theorem 7.7.8 Let $I$ be the Daniell extension of an elementary integral on $V$. Let $J$ be the restriction of $I$ to the vector lattice $W$ consisting of the finite valued functions among $\mathcal{L}^1(X, I)$. Then $\mathcal{L}^1(X, J) = \mathcal{L}^1(X, I)$, and $J(f) = I(f)$.

Proof: The fact that $W$ is a vector lattice and $J$ an elementary integral follows from Lemma 7.6.2 (parts 3, 4, 5, 7) and the Beppo Levi theorem 7.7.3. The inclusion $\mathcal{L}^1(X, I) \subset \mathcal{L}^1(X, J)$ is immediate from the inclusion $V \subset W$, with details as in the first part of the proof of Thm. 7.7.7. So let us now assume $f$ is $J$-integrable.
Then, given $\varepsilon > 0$, there exist functions $g \in \uparrow W$ and $h \in \downarrow W$ such that $h \le f \le g$ and $J(g) < J(f) + \tfrac12\varepsilon$ and $J(h) > J(f) - \tfrac12\varepsilon$. For such functions, the Beppo Levi theorem guarantees $g, h \in \mathcal{L}^1(X, I)$, and $J(g) = I(g)$, $J(h) = I(h)$. So we have found $I$-integrable functions $h, g$ such that $h \le f \le g$ and $I(g) - \tfrac12\varepsilon < J(f) \le I(g)$ and $I(h) \le J(f) < I(h) + \tfrac12\varepsilon$. By the monotonicity of the upper and lower integrals, we infer
$$J(f) - \tfrac12\varepsilon \le I(h) = I_*(h) \le I_*(f) \le I^*(f) \le I^*(g) = I(g) \le J(f) + \tfrac12\varepsilon.$$
The claim follows.

The following lemma will help in proving that the Daniell extension process makes good on the goal of metric completion of $V = C^0_{\mathrm{cpt}}$ with respect to the norm $\|f\| := \int |f(x)|\, dx$.

Lemma 7.7.9 $f \in \mathcal{L}^1(X, I)$ iff $\forall \varepsilon > 0 \; \exists f_0 \in V : I^*(|f - f_0|) < \varepsilon$.

Proof: Assume $f \in \mathcal{L}^1(X, I)$. Then there exists a $g \in \uparrow V$ such that $g \ge f$ and $I(g) < I(f) + \tfrac12\varepsilon$. There also exists $g_0 \in V$ such that $g_0 \le g$ and $I(g_0) > I(g) - \tfrac12\varepsilon$. Then $I(|f - g_0|) \le I(|f - g|) + I(|g - g_0|) < \tfrac12\varepsilon + \tfrac12\varepsilon = \varepsilon$. We can choose $f_0 := g_0$.

Conversely, assume that for every $\varepsilon > 0$ there is some $f_0 \in V$ such that $I^*(|f - f_0|) < \varepsilon$. So there exists a $g \in \uparrow V$ such that $|f - f_0| \le g$ (i.e., $f_0 - g \le f \le f_0 + g$) and $I(g) < \varepsilon$. But then $I(f_0) - I(g) \le I_*(f) \le I^*(f) \le I(f_0) + I(g)$ and therefore $I^*(f) - I_*(f) \le 2I(g) < 2\varepsilon$. This implies the integrability of $f$.

Definition 7.7.10 A real-valued function $\|\cdot\|$ on a vector space is called a seminorm, iff it satisfies the following axioms:
$\|f + g\| \le \|f\| + \|g\|$ for all $f, g$,
$\|\lambda f\| = |\lambda| \, \|f\|$ for all $\lambda \in \mathbb{R}$ and all $f$,
$\|f\| \ge 0$ for all $f$.

So what is missing from a norm is the property $\|f\| = 0 \implies f = 0$. It is immediate that an elementary integral $I$ on a vector lattice $V$ defines a seminorm there by $\|f\| := I(|f|)$. The same applies for the Daniell extension of $I$, restricted to the finite valued functions in $\mathcal{L}^1(X, I)$ so as to have a vector space as a domain. For the Riemann integral on $V = C^0_{\mathrm{cpt}}$, $\|f\| := I(|f|)$ is actually a norm.
However, this is no longer true for the extension, because, e.g., $\|\chi_N\| = 0$ in case $N \ne \emptyset$ is a null set, but $\chi_N \ne 0$. And also for other examples of elementary integrals (e.g., $\delta_{x_*}$ on $C^0$), $I(|f|)$ need not be a norm. The way to fix this deficit is to consider equivalence classes of functions.

Lemma 7.7.11 The relation $f \sim g :\iff f = g$ a.e. is an equivalence relation.

The proof is trivial; we can now define

Definition 7.7.12 $L^1(X, I) := \mathcal{L}^1(X, I)/\sim$, i.e., the set of equivalence classes of functions modulo equality $I$-a.e.

Now we have the crucial

Theorem 7.7.13 $L^1(X, I)$, together with $\|\cdot\|_{L^1}$ defined by $\|f\|_{L^1} := I(|f|)$, is a Banach space.

Proof: Despite the slight abuse of notation, remember that the elements of $L^1(X, I)$ are not functions $f$, but equivalence classes $[f]$ of functions. As a consequence of Lemma 7.6.6, each equivalence class $[f]$ contains functions that have finite values everywhere. The vector space operations are defined in terms of such representatives: $[f] + [g] := [f + g]$, $\lambda[f] := [\lambda f]$, and this is well-defined, i.e., the definition does not depend on the choice of representatives: If $f_1 \sim f_2$ and $g_1 \sim g_2$ then $f_1 + g_1 \sim f_2 + g_2$. Likewise the norm is well-defined by choosing a representative. The seminorm properties of $\|f\|$ translate into seminorm properties for $\|[f]\|$. Now if $\|[f]\| = 0$, this means $\|f\| = 0$ for a representative, i.e., $I(|f|) = 0$, and then (why actually?) $f \sim 0$, i.e., $[f] = [0]$. So $\|\cdot\|$ is a norm on the quotient space $L^1(X, I)$.

Exercise 7.18 Write out the proof details of these statements.

Proof continues: The key issue is to prove completeness. Suppose $([f_n])$ is a Cauchy sequence in $L^1(X, I)$. We will construct a limit $[f]$ with respect to the $L^1$ norm by constructing a representative $f$ as a pointwise (a.e.) limit of a subsequence of $(f_n)$. Indeed, we will define $n_j$ inductively such that $\|f_n - f_m\| < 2^{-j}$ provided $n, m \ge n_j$ and (for $j > 1$) $n_j > n_{j-1}$.
Then g_N := Σ_{j=1}^{N} |f_{n_{j+1}} − f_{n_j}| defines an increasing sequence (g_N) of integrable functions with I(g_N) bounded by Σ_{j=1}^{∞} 2^{−j} = 1. By Beppo Levi, g := lim g_N exists pointwise, is integrable and finite almost everywhere. But then, for each x for which g(x) is finite, the sequence f_{n_k}(x) = f_{n_1}(x) + Σ_{j=2}^{k} (f_{n_j}(x) − f_{n_{j−1}}(x)) converges, because it converges absolutely. In other words, (f_{n_k})_k converges a.e. to a function f. But this sequence is also majorized by the integrable function |f_{n_1}| + g, and therefore f is integrable and I(f) = lim I(f_{n_k}). Even more, I(|f − f_{n_k}|) ≤ Σ_{j=k}^{∞} I(|f_{n_{j+1}} − f_{n_j}|) ≤ Σ_{j=k}^{∞} 2^{−j} = 2^{−(k−1)}. So [f_{n_k}] → [f] in the ‖·‖_{L¹} sense. We know from Lemma 3.11.3 that a Cauchy sequence that has a convergent subsequence is convergent itself.

Let us extract from this proof the following

Corollary 7.7.14 Every L¹-convergent sequence has an a.e. convergent subsequence.

It bears repeating that when we talk about an L¹-convergent sequence, then each member of this sequence is an equivalence class of functions. We get a sequence of functions by choosing a representative from each equivalence class. It makes sense, for such a sequence, to ask whether it (or a subsequence thereof) converges pointwise, or at least almost everywhere. The answer to the question ‘almost everywhere convergent?’ is independent of the choice of representatives, because each different choice affects only the values on a null set, and the union of countably many null sets (one for each sequence index n) is still a null set. These issues are often conveniently glossed over, and the theory is set up in such a way that usually hypotheses and conclusions are only intended to hold almost everywhere. So the issue of choosing representatives of each equivalence class is a non-issue in most practical applications. However, bear the distinction in mind to avoid confusion in the very rare circumstances where it is crucial.
If you were to ask: “Does every L¹-convergent sequence have a pointwise convergent subsequence?”, this would, strictly speaking, be a meaningless question; the answer would depend on a choice of representatives; a rewording of the question that makes it meaningful would read: “Can every L¹-convergent sequence be given a choice of representatives such that the sequence of representatives has a pointwise convergent subsequence?” The answer to this question would be ‘yes’, and the reason is simply Cor. 7.7.14. The exceptional set allowed there can be removed by choosing representatives that are (e.g.) 0 on the exceptional set.

It is noteworthy that an L¹-convergent sequence need not be a.e.-convergent itself:

Example 7.7.15 Given [a, b[ ⊂ R with b − a < 1, define the ‘wrap-around characteristic function’ χ̃_{[a,b[} on [0, 1] as follows: If n ≤ a < b ≤ n + 1 for some integer n, then χ̃_{[a,b[} := χ_{[a−n,b−n[}. If n ≤ a < n + 1 < b for some n ∈ N, then χ̃_{[a,b[} := χ_{[a−n,1]} + χ_{[0,b−n−1[}. Let a_n := Σ_{j=1}^{n} 1/j and define f_n := χ̃_{[a_n,a_{n+1}[}. Then f_n converges nowhere in [0, 1], but converges to 0 in the L¹ norm.

While the issue of choosing representatives is irrelevant in most examples, let me give you one simple example where it needs to be addressed: In a practical context, we may obtain a ‘function’ f out of an argument using the vector space L¹; i.e., we may say something in the style “Since this is a Cauchy sequence in L¹, it has a limit; let’s call this limit f.” Now as we pretend this limit to be a function, we may want to ask whether this function is continuous. But in reality, we have constructed an equivalence class [f], and our question of continuity really asks whether this equivalence class [f] has a continuous representative. Specifically for the Lebesgue integral I, if two continuous functions are not the same, then they differ on an open interval (which is not an I-null set), so they cannot be equivalent.
This means in the case of the Lebesgue integral, every equivalence class has at most one continuous representative. If it has one indeed, we would choose this as the preferred representative and call this [f] ∈ L¹ continuous, because it has a (unique) continuous representative. So g represents a continuous equivalence class [g], if there is a continuous function f such that g = f almost everywhere. This is not the same as saying g is continuous almost everywhere! χ_Q represents a continuous element of L¹, namely [χ_Q] = [0], but χ_Q is nowhere continuous!

It gives a ‘good feeling’ to have a way of choosing a representative f of an equivalence class [f] ∈ L¹ in a natural way (even though in most cases the need doesn’t arise). For this ease of mind, I quote a theorem (whose proof is beyond the scope of this class) that helps out in this respect:

Theorem 7.7.16 (Lebesgue’s differentiation theorem) If f is Lebesgue integrable, then the limit of ball averages

g(x) := lim_{r→0} ( ∫_{B(x,r)} f(y) d^n y ) / ( ∫_{B(x,r)} 1 d^n y )

exists almost everywhere, and g = f a.e. The balls are understood with respect to the euclidean metric here.

This theorem gives a well-defined and natural way of assigning a value f(x) to an [f] ∈ L¹ at least in almost every point x, and it selects in a natural and unambiguous way the precise null set on which we may refrain from assigning a value f(x).

In the sequel, we will take the liberty (as is common usage) of writing elements of L¹ like functions, when this causes no problems; but we will revert to the equivalence class notation [f] whenever needed for mathematical or didactical reasons.

Finally, it is instructive to observe a similarity between the completion of (Q, d_eucl) and the completion of (C^0_cpt(K), ‖·‖_{L¹}). In both cases we started out with the order structure. For getting R from Q we introduced the supremum axiom, rather than going through an actual construction; but we could have gone a constructive route instead.
We chose the axiomatic approach in this course, because the constructive approach would have been more ‘pre-analysis’ foundational. It turned out subsequently that this completion process based on order achieved metric completeness. The Daniell extension process relied on the order structure as well; this time we carried out a construction in minute detail, because it is of genuine interest to analysis. The resulting space L¹ again turns out to be metrically complete.

7.8 Measurable Functions and Sets

General properties that apply to all vector lattices and integrals:

We continue to study an integral I that is the Daniell extension of an elementary integral on a vector lattice V ⊂ F(X → R). Informally speaking, there are two ways in which a function can fail to be integrable. One is that a reasonable candidate for the integral just fails to be finite, another is that the function is so ‘wild’ that a reasonable candidate for its integral cannot be found. A function is measurable if the second alternative does not happen. It is difficult to come up with examples of functions that are not measurable. Measurability is a notion that is not intuitive to define in the first place, but once defined, it is a hypothesis that is easy to check and easy to fulfill. Here are the definitions:

Definition 7.8.1 Given functions h, f, g : X → E, where we assume h ≤ g, define med(h, f, g) := max{h, min{f, g}}.

Intuitively, the graph of med(h, f, g) is obtained by truncating the graph of f from above with the graph of g and from below with the graph of h.

Exercise 7.19 Assume h ≤ g. Show: max{h, min{f, g}} = min{max{h, f}, g}. What is med(h, f, g) in case f > g, what in case f < h?

Definition 7.8.2 f : X → E is I-measurable, iff med(−g, f, g) is I-integrable for every non-negative I-integrable g. A set A ⊂ X is I-measurable if its characteristic function χ_A is I-measurable.

We have an immediate corollary:

Corollary 7.8.3 If f is integrable, then f is measurable.
If f is measurable and |f| ≤ g for an integrable function g, then f is integrable; specifically, if |f| is integrable and f is measurable, then f is integrable. If f is measurable, then med(h, f, g) is integrable for every pair of integrable functions (h, g) with h ≤ g.

Exercise 7.20 Prove this corollary.

Definition 7.8.4 Let A ⊂ X. We define µ*(A) := I*(χ_A) and call it the outer measure of A. If χ_A is integrable, we denote I(χ_A) =: µ(A) and call it the measure of A. If χ_A is measurable, but not integrable, we let µ(A) := +∞ and still call it the measure of A. If I needs to be specified, an index I can be attached to µ.

In view of Lemma 7.6.6, the I-null sets are exactly the sets of measure 0.

We get a list of easy properties for measurable functions and sets:

Proposition 7.8.5
(1) If f is measurable and g = f a.e., then g is measurable.
(2) If f1 and f2 are measurable, then f1 + f2, provided it is defined a.e., is measurable. Constant multiples of measurable functions, if defined a.e., are measurable.
(3) Absolute values and minima and maxima of measurable functions are measurable.
(4) If (fn) is a sequence of measurable functions that converges a.e. to f, then f is measurable.

Proof: The first statement follows from Lemma 7.6.6. Now assume f1 and f2 are measurable and consider med(−g, f1 + f2, g) for g integrable. In view of the first part, it is no loss of generality to assume that f1 + f2 is defined everywhere. Since f_{i,n} := med(−ng, f_i, ng) are integrable, so is f_{1,n} + f_{2,n}, and therefore med(−g, f_{1,n} + f_{2,n}, g) is also integrable. Now f_{1,n}(x) + f_{2,n}(x) → f1(x) + f2(x) as n → ∞, provided g(x) > 0. Therefore med(−g, f_{1,n} + f_{2,n}, g) → med(−g, f1 + f2, g) pointwise (in points where g(x) = 0 this claim is trivial). By majorized convergence, med(−g, f1 + f2, g) is integrable, hence f1 + f2 is measurable. Now assume f measurable and λ ∈ R. Assume λ ≠ 0 as the other case is trivial.
Then med(−g, λf, g) = λ med(−g/|λ|, f, g/|λ|), and the measurability of λf follows. If f is measurable, then |f| is measurable because med(−g, |f|, g) = |med(−g, f, g)|, and the rhs is integrable, because it is the absolute value of an integrable function; see Lemma 7.6.2(7). Now if f1, f2 are measurable and at least one of them is finite valued a.e., then we can argue that max{f1, f2} = ½(f1 + f2) + ½|f1 − f2| is also measurable; and similarly for min{f1, f2}. The result also holds without the assumption of finite-valuedness (and thus with f1 ± f2 maybe not defined a.e.): namely we can argue that med(−g, max{f1, f2}, g) = max{med(−g, f1, g), med(−g, f2, g)}, as can be easily seen by a pointwise case distinction. Then refer to Lemma 7.6.2(7). Majorized convergence proves that a.e. limits of measurable functions are measurable; namely, if fn → f a.e., then med(−g, fn, g) → med(−g, f, g) a.e., and this sequence is majorized by g, which was assumed integrable.

Similarly, we have

Proposition 7.8.6 Countable unions and countable intersections of measurable sets are measurable. If Stone’s axiom applies, complements of measurable sets are measurable.

Proof: The characteristic function of ⋃_{i=1}^{∞} A_i is lim_{n→∞} max{χ_{A_1}, . . . , χ_{A_n}}. A similar formula with min instead of max applies for intersections. By the preceding proposition, these functions are measurable, if the χ_{A_i} are. Stone’s axiom guarantees that min{g, 1} is integrable if g is. So the constant 1 is measurable. Then 1 − χ_A is measurable if χ_A is.

Let us study an easy example, before tackling specifically the Lebesgue measure:

Exercise 7.21 In the case of V = Seq with I as in Example 7.4.9, show that all sequences N → E are measurable.

Exercise 7.22 (1) If A1 ⊂ A2 ⊂ . . . is an increasing sequence of measurable sets, then µ(⋃ A_i) = lim µ(A_i). (2) If (A_i) is a sequence of measurable sets (not necessarily increasing), then µ(⋃ A_i) ≤ Σ µ(A_i). If the A_i are pairwise disjoint, then equality holds.
(3) If A1 ⊃ A2 ⊃ . . . is a decreasing sequence of measurable sets, with µ(A1) < ∞, then µ(⋂ A_i) = lim µ(A_i). Give a counterexample that the conclusion need not hold if the hypothesis µ(A1) < ∞ is dropped.

Specifically, the last part of Exercise 7.22 puts our heuristic idea about the measure of a fat Cantor set (Example 7.3.4) on a rigorous footing.

Remark for cross reference: (may be skipped) Let me comment briefly on the measure theoretic approach to Lebesgue integration (i.e., the main alternative to the approach we are taking here): In measure theory, a family S of sets is called a σ-algebra if, with any countable collection of sets A_i, it also contains their union (A_i ∈ S ⟹ ⋃_{i=1}^{∞} A_i ∈ S), and if with any set A it contains its complement (A ∈ S ⟹ A^c ∈ S). A measure is axiomatically defined as a function µ : S → R ∪ {+∞} satisfying property (2) of Exercise 7.22, and µ(∅) = 0. What is rather unmotivated in this approach is: why σ-algebras? The analyst’s answer to this question is: a powerful theory can only be expected from a notion that allows passing to limits of sequences; in particular, countable unions of ‘good’ (measurable) sets should be ‘good’ (measurable), too. The difficult part in this approach is to actually construct the Lebesgue measure. Some labor needs to be invested here, equivalent to a corresponding portion of our extension construction. With the measure constructed first, one can then define step functions whose ‘steps’ are measurable sets with finite measure, rather than just boxes, and get a quicker definition of the Lebesgue integral by approximating functions with such step functions with measurable steps. There are a few variants of this construction on the market, differing in details. Royden’s book is one that pursues this approach. Chapter 5 of the notes gives a downsized version of this construction, bypassing some technicalities and restricting discussion to ‘Borel sets’, which form a subset of the Lebesgue measurable sets.
This downsized version is good enough for many practical purposes, but the downsizing does not serve a useful didactical purpose when the Daniell extension approach is taken.

Properties specific to the Lebesgue integral:

Not surprisingly, the measure µ_I obtained from Def. 7.8.4 in case I denotes the Lebesgue integral is called the Lebesgue measure. Note that in our general framework, we had a vector lattice V ⊂ F(X → R), and no metric or topology was assumed on X. Any properties about measurable functions or sets that involve topological notions (like open or compact sets, continuous functions), will therefore need more specific hypotheses. A crucial hypothesis that connects topology and measurability is that all open sets should be measurable. In the case of the Lebesgue measure, this hypothesis is verified. Rather than pursuing further generalities, we restrict our discussion now exclusively to the Lebesgue measure. We will now write ∫ instead of I to stress that we no longer have the vast generality, but refer to the Lebesgue integral; however, we will keep the simplicity of writing ∫f instead of ∫_{R^n} f(x) d^n x.

Theorem 7.8.7
(1) A function f : R^n → E is Lebesgue measurable, iff for every compact cube C and every k > 0 the function med(−kχ_C, f, kχ_C) is integrable.
(2) A function f : R^n → E is Lebesgue measurable, iff for every k, the set {x | f(x) > k} is Lebesgue measurable. The same equivalence holds with f(x) ≥ k, or f(x) < k, or f(x) ≤ k.
(3) Continuous functions are Lebesgue measurable.
(4) If g : R^k → R is continuous and f_i : R^n → R are measurable, then the composite function g ∘ (f1, . . . , fk) is Lebesgue measurable.

Proof: We write ‘measurable’ for Lebesgue measurable. The ‘only if’ part of (1) is a trivial consequence of Def. 7.8.2, since kχ_C is a non-negative integrable function. Conversely, f = lim_{k→∞} med(−kχ_{[−k,k]^n}, f, kχ_{[−k,k]^n}), and limits of integrable functions are measurable by Prop. 7.8.5(4).
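The pointwise limit used in the converse direction of (1) can be checked by cases. The following is only a sketch in notation introduced for this purpose: f_k abbreviates the k-th truncation, and ‖x‖_∞ is the max-norm of x (both are assumptions of this note, chosen so that x ∈ [−k, k]^n iff ‖x‖_∞ ≤ k).

```latex
% Sketch: pointwise convergence of the truncations in Thm 7.8.7(1).
% f_k and \|x\|_\infty are notation introduced for this note only.
\[
  f_k := \operatorname{med}\bigl(-k\chi_{[-k,k]^n},\ f,\ k\chi_{[-k,k]^n}\bigr).
\]
Fix $x \in \mathbb{R}^n$ and let $k \ge \|x\|_\infty$, so that $\chi_{[-k,k]^n}(x) = 1$. Then
\[
  f_k(x) = \max\{-k,\ \min\{f(x),\, k\}\} =
  \begin{cases}
    f(x) & \text{if } |f(x)| \le k, \\
    k    & \text{if } f(x) > k, \\
    -k   & \text{if } f(x) < -k.
  \end{cases}
\]
% Finite values are reached after finitely many steps;
% infinite values are approached monotonically.
If $f(x)$ is finite, then $f_k(x) = f(x)$ for all $k \ge \max\{\|x\|_\infty, |f(x)|\}$.
If $f(x) = \pm\infty$, then $f_k(x) = \pm k \to \pm\infty = f(x)$.
In either case $f_k(x) \to f(x)$ as $k \to \infty$.
```

Each f_k is integrable by hypothesis (take C = [−k, k]^n), hence measurable by Cor. 7.8.3, which is what Prop. 7.8.5(4) needs.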
Concerning (2), if f is measurable, then min{n(f − k)_+, 1} is also measurable for each constant k. As n → ∞, this function converges pointwise to χ_{f>k}. So this characteristic function is measurable, and therefore so is the set {x | f(x) > k}. The set {x | f(x) ≥ k} is the intersection of the sets {x | f(x) > k − 1/n} and is therefore also measurable. Similar arguments can be made for the other cases. Conversely, assume that {x | f(x) ≥ k} is measurable for every k. Consider

f_n := Σ_{j=1}^{∞} ((j − 1)/n) (χ_{f ≥ (j−1)/n} − χ_{f ≥ j/n}).

Each term in this sum is a nonnegative measurable function, so the sum f_n is measurable, too. Actually, for j ≥ 1, f_n(x) = (j − 1)/n exactly if f(x) ∈ [(j − 1)/n, j/n[. If f(x) < 0, then f_n(x) = 0. So, as n → ∞, f_n → f_+ pointwise. Therefore, f_+ is measurable. The same argument, applied to f + s for any constant s, proves that (f + s)_+, and hence (f + s)_+ − s, is measurable. As s → ∞, we retrieve f as measurable. Similar arguments can be made for sets defined by the other inequalities.

For (3), note that open sets are measurable, by Exercise 7.12 (or by Lemma 7.5.13 in connection with Thm. 7.7.7). For continuous functions f, the sets {f > k} are open. These two facts, together with part (2), guarantee the measurability of f.

For (4), assume g is continuous and the f_i are measurable. We want to show, for every N ∈ R, that the set S := {x | g(f1(x), . . . , fk(x)) > N} is measurable. By continuity of g, the set U := g^{−1}(]N, ∞[) is open, and by Exercise 7.12, U is the countable union of open boxes B_j. So we can write S = ⋃_{j=1}^{∞} {x | (f1(x), . . . , fk(x)) ∈ B_j}. The set {x | f_i(x) ∈ ]a, b[} is measurable by part (2), and as the intersection of k such sets, {x | (f1(x), . . . , fk(x)) ∈ B_j} is also measurable. So S is the countable union of measurable sets, hence measurable as well.

Remark 7.8.8 It is not true that compositions f ∘ g with f Lebesgue measurable and g continuous must be Lebesgue measurable.
A counterexample can be found in Gelbaum-Olmsted, Ch. 8, Ex. 16. The idea is to use the function x + ψ(x) (with ψ the devil’s staircase function) as a homeomorphism that transforms a non-measurable set into the subset of a null set. So it is crucial for Thm. 7.8.7(4) that the continuous function is the outer, not the inner function of the composition. On the other hand, the theorem we have proved implies in particular that the product of measurable functions is measurable.

Exercise 7.23 Show: If f, g are measurable and g doesn’t vanish, then f/g is measurable. (You may need to open up and slightly modify the proof of a useful lemma here. No need to repeat everything, just indicate the changes.)

Exercise 7.24 Work out the details of the construction of a non-Lebesgue measurable set given on page 13, substantiating the heuristic assumptions used there with actual results now proved. Include a sentence of explanation why the Lebesgue measure is ‘translation invariant’ and what this means, precisely.

Exercise 7.25 Show: If f is Lebesgue integrable and A is Lebesgue measurable, then fχ_A is Lebesgue integrable. (This means: we may integrate integrable functions over measurable sets, writing ∫_A f(x) d^n x := ∫ f(x)χ_A(x) d^n x.)

Theorem 7.8.9 The Lebesgue measure is outer regular, i.e., given any set A, it holds

µ*(A) = inf{µ(U) | U ⊃ A, U open}.

Proof: The claim is trivial if µ*(A) = ∞; so assume µ*(A) = I*(χ_A) =: m < ∞. Given ε > 0, we want to find U ⊃ A open such that ∫χ_U < (∫* χ_A) + ε. It is possible to find a function f ∈ ↑S (or, in view of Thm 7.7.7, a function f ∈ ↑C^0_cpt) such that f ≥ χ_A and ∫f < µ*(A) + ε/2. Choose such an f ∈ ↑C^0_cpt. So f is lower semicontinuous, and the set U = {x | f(x) > 1 − δ} is open for any δ > 0. Moreover, this set contains A, and µ(U) ≤ (∫f)/(1 − δ), because f ≥ (1 − δ)χ_U. So we have µ(U) < (µ*(A) + ε/2)/(1 − δ), and this is smaller than µ*(A) + ε if δ is chosen sufficiently small.
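For readers who want the phrase ‘δ chosen sufficiently small’ at the end of the proof of Thm. 7.8.9 made explicit, here is one admissible choice, written as a short worked computation. It uses only m = µ*(A) < ∞ and ε > 0 from the proof; the specific value of δ is our suggestion, and any smaller positive value works as well.

```latex
% Worked step for Thm 7.8.9: an explicit delta. Assumes 0 < delta < 1.
\[
  \frac{m + \tfrac12\varepsilon}{1-\delta} \le m + \varepsilon
  \iff m + \tfrac12\varepsilon \le (m+\varepsilon) - \delta\,(m+\varepsilon)
  \iff \delta \le \frac{\varepsilon}{2\,(m+\varepsilon)} .
\]
% The bound on the right is at most 1/2, so it is compatible with delta < 1.
Hence $\delta := \dfrac{\varepsilon}{2(m+\varepsilon)} \in \left]0, \tfrac12\right]$ is admissible, and then
\[
  \mu(U) \;<\; \frac{m + \tfrac12\varepsilon}{1-\delta} \;=\; m + \varepsilon
  \;=\; \mu^*(A) + \varepsilon .
\]
```

Since U ⊃ A always gives µ(U) ≥ µ*(A), this pins µ(U) between µ*(A) and µ*(A) + ε, which is the infimum statement.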
We can now prove the following theorem, of which the converse had already been proved in Prop. 7.6.8.

Theorem 7.8.10 If N is an I-null set in R^n (as defined in Def. 7.6.5) for the Lebesgue integral, then N is a Lebesgue null set in the sense of Def. 7.3.2.

Proof: By outer regularity, we can find, for any given ε > 0, an open set U ⊃ N such that µ(U) < ε/5^n, where n is the dimension (of R^n). U is the union of the open max-balls (which are in particular boxes) contained in it. Let ℬ be the family of these balls. The issue is now that these balls overlap, so we cannot pass from µ(⋃_{B∈ℬ} B) to Σ_{B∈ℬ} µ(B). By a method called a Vitali covering argument, we first thin the family ℬ out to get a maximal subfamily 𝒟 of disjoint balls, and then enlarge the balls to make sure that the enlarged balls cover U again:

Let R := sup{r(B) | B ∈ ℬ}, where r(B) is the radius of B. Now R is finite, and actually (2R)^n ≤ µ(U) since each ball must have measure no larger than the measure of U. Now let ℬ_j := {B ∈ ℬ | R/2^j < r(B) ≤ R/2^{j−1}} and let 𝒟_1 be a maximal subfamily of ℬ_1 consisting of mutually disjoint balls: i.e., choose B_1 ∈ ℬ_1 arbitrarily; if there exists a B ∈ ℬ_1 that is disjoint from B_1, choose such a ball and call it B_2. Continue inductively, choosing B_j disjoint from ⋃_{i=1}^{j−1} B_i. The algorithm terminates after finitely many steps, when no ball in ℬ_1 can be chosen any more. (Specifically (#𝒟_1) R^n ≤ µ(⋃_{B∈ℬ_1} B) ≤ µ(U).) With 𝒟_1, . . . , 𝒟_k constructed (and consisting of pairwise disjoint balls B_1, . . . , B_{j_k}), choose now further balls B_{j_k+1}, . . . , B_{j_{k+1}} from ℬ_{k+1} (if any) that are disjoint from all previously chosen balls B_i. These will make up the family 𝒟_{k+1} ⊂ ℬ_{k+1}. We thus obtain 𝒟 = 𝒟_1 ∪ 𝒟_2 ∪ . . ., a family of countably (finitely or infinitely) many pairwise disjoint balls B_i. This family is maximal in the sense that no further ball B ∈ ℬ can be added that would be disjoint from all B_i ∈ 𝒟. Now denote by 5B_i the ball that has the same center as B_i, but 5 times the radius of B_i.
We claim that {5B_i | B_i ∈ 𝒟} covers each B ∈ ℬ = ⋃_k ℬ_k and hence U. So let B ∈ ℬ_k. B intersects some B_j ∈ 𝒟_1 ∪ · · · ∪ 𝒟_k, because otherwise it would have been added to 𝒟_k when this family was constructed. Since the radius r(B) ≤ R/2^{k−1} and r(B_j) > R/2^k, we have r(B) < 2r(B_j); every point of B therefore lies within max-distance r(B_j) + 2r(B) < 5r(B_j) of the center of B_j, i.e., B ⊂ 5B_j. Now Σ_j µ(5B_j) = 5^n Σ_j µ(B_j) = 5^n µ(⋃ B_j) ≤ 5^n µ(U) < ε, and the family of boxes 5B_j covers U and hence N. So N is a Lebesgue null set in the sense of Def. 7.3.2.

We have thus recaptured the geometrically intuitive description of Lebesgue null sets given in the introduction in terms of the Daniell formalism, which defines null sets (and measure in general) in terms of integrals.

Remark 7.8.11 There is a similar result stating that ∫_* χ_A is the supremum of the measures of compact sets contained in A; with analogous proof.

Finally, there is one key feature of the Lebesgue measure (and a very expected one from an intuitive point of view) that, as an artefact of our approach through the vector lattice S, has been rendered rather non-obvious; namely the rotation invariance of Lebesgue measure:

Theorem 7.8.12 Let T be an n × n matrix and b ∈ R^n. Consider the affine mapping φ : R^n → R^n, x ↦ Tx + b. Then if A is a Lebesgue measurable set, then φ(A) is also Lebesgue measurable, and µ(φ(A)) = |det T| µ(A). Specifically, if T is a rotation matrix, µ(φ(A)) = µ(A). If f ∈ L¹(R^n), and T is invertible, then ∫_{R^n} f(Tx + b) d^n x = |det T|^{−1} ∫_{R^n} f(y) d^n y.

Proof Sketch: By writing φ = τ ∘ ψ, where ψ : x ↦ Tx and τ : x ↦ x + b, it suffices to prove the theorem separately for translations τ and for mappings ψ that have b = 0.

The translation invariance of the measure is obvious for boxes B_j: µ(τ(B_j)) = µ(B_j), and therefore it holds for all step functions f that ∫f ∘ τ^{−1} = ∫f. (The inverse arises because χ_{φ(A)} = χ_A ∘ φ^{−1}, whenever φ is invertible; specifically this applies to φ = τ.)
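The parenthetical identity can be verified pointwise, and it also shows that the two formulas in Thm. 7.8.12 say the same thing for sets of finite measure. The lines below are a consistency check written out for orientation, not a step of the proof; they assume φ invertible and µ(A) < ∞.

```latex
% Consistency check for Thm 7.8.12 (phi invertible, mu(A) finite).
For every $y \in \mathbb{R}^n$:
\[
  y \in \varphi(A) \iff \varphi^{-1}(y) \in A
  \quad\Longrightarrow\quad
  \chi_{\varphi(A)}(y) = \chi_A\bigl(\varphi^{-1}(y)\bigr),
  \qquad\text{i.e. } \chi_{\varphi(A)} = \chi_A \circ \varphi^{-1}.
\]
% Insert f = chi_{phi(A)} into the integral formula of the theorem.
Put $f := \chi_{\varphi(A)}$. Then $f(Tx+b) = \chi_{\varphi(A)}(\varphi(x)) = \chi_A(x)$, so
\[
  \mu(A) = \int_{\mathbb{R}^n} f(Tx+b)\, d^n x
         = |\det T|^{-1} \int_{\mathbb{R}^n} f(y)\, d^n y
         = |\det T|^{-1}\, \mu(\varphi(A)),
\]
which is $\mu(\varphi(A)) = |\det T|\,\mu(A)$.
For a translation ($T$ the identity matrix) this reads $\mu(\tau(A)) = \mu(A)$.
```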
Having proved the equality for functions in S, it follows for functions in ↑S and ↓S by monotone limits, and then for all integrable functions by upper and lower approximation. This result for all integrable functions now includes in particular the integrable functions χ_A. If χ_A is measurable but with µ(A) = ∞, then χ_{τ(A)} is also measurable, using any of the characterisations of measurability we had; and µ(τ(A)) cannot be finite, because otherwise µ(τ^{−1}(τ(A))) = µ(A) would have to be finite as well, due to the result for integrable functions.

Now we have to prove a similar result for ψ : x ↦ Tx, and we assume first that T is invertible. It is shown in linear algebra that such a T is a product of elementary matrices E_i, and since det(E_1 · · · E_k) = (det E_1) · · · (det E_k), it suffices to prove the result for elementary matrices. Elementary matrices come in three kinds: one kind multiplies the j-th coordinate by a number a, and it has determinant a; so µ(ψ(B)) = |det T| µ(B) holds for boxes B and elementary matrices T of this kind. Another kind swaps two coordinates, and it has determinant −1, so the formula µ(ψ(B)) = |det T| µ(B) holds again. The third kind is a shear mapping, and it has determinant 1. For such a mapping ψ there exists (without loss of generality) a set S and a translation τ for which χ_{ψ(B)} − χ_B = χ_S − χ_{τ(S)}. Rather than working out the algebra, we give a picture that conveys the idea, and focus on the analytical part:

[Figure: a box B of length l and height h, its image ψ(B) under the shear ψ : x ↦ Tx with T = (1 a; 0 1), which shifts the top edge by ah, and the sets S and τ(S). Without loss of generality assume ah < l; else consider the difference of two boxes, or take products of several T with smaller a.]

The measurability of χ_{ψ(B)} is warranted because ψ(B) is open. As ψ(B) is contained in a compact box, the integrability of χ_{ψ(B)} follows. The integrability of χ_S − χ_{τ(S)} follows from the integrability of χ_B and χ_{ψ(B)}, and χ_S = (χ_S − χ_{τ(S)})_+ is now also integrable. But since ∫χ_S = ∫χ_{τ(S)}, we conclude ∫χ_B = ∫χ_{ψ(B)}.
Having thus shown ∫f = ∫f ∘ φ^{−1} for characteristic functions of boxes, it follows again for all functions in S, then for all functions in ↑S or ↓S, then for all integrable functions, and this includes the characteristic functions of measurable sets with finite measure, and then indirectly for measurable sets with infinite measure as well.

If T is not invertible, the claim amounts to showing that µ(φ(A)) = 0. We argue that φ(A) is a subset of a proper subspace of R^n, which may be chosen to have codimension 1, and that such a subspace is the image ψ(H) of a coordinate hyperplane H in R^n, under an invertible linear mapping ψ. It therefore has measure 0 by the second part of the proof.

7.9 Fubini and Tonelli

The theorems of Fubini and Tonelli essentially explain the connection of multi-variable integrals with iterated integrals. These theorems are not restricted to the Lebesgue integral, but can be proved similarly for Daniell extensions of other elementary integrals; however, in order to avoid defining the product construction generally, we write these theorems out only for the Lebesgue integral.

Be aware that the construction of the Lebesgue measure and integral on R^n is of course dependent on n: For instance R is not a null set in R, but R × {a} is a null set in R^2. We will denote the Lebesgue measure in R^n as µ_n, and the Lebesgue integral over R^n will be written as ∫_{R^n} f(x) d^n x. We also write S, the vector lattice of step functions, as S_n or S_m to clarify on which domain these step functions are defined.

We identify R^n × R^m with R^{n+m}; and for a two-variable function f : (x, y) ↦ f(x, y), R^n × R^m → E, we have single variable functions f(x, ·) : y ↦ f(x, y), R^m → R and f(·, y) : x ↦ f(x, y), R^n → R.

First we state the theorem of Fubini.

Theorem 7.9.1 Let f ∈ L¹(R^n × R^m). Then for (µ_n-)almost every x ∈ R^n, the function f(x, ·) is in L¹(R^m); and the integral F(x) := ∫_{R^m} f(x, y) d^m y defines a function F ∈ L¹(R^n).
It then holds

∫_{R^{n+m}} f(x, y) d^{n+m}(x, y) = ∫_{R^n} F(x) d^n x.

The analogous result holds with the order of integration reversed.

Proof: We prove this theorem first for characteristic functions of boxes, then for step functions, and from there for functions in ↑S and ↓S, and finally for L¹ functions.

An open box B in R^n × R^m is of the form B = B_1 × B_2 with B_1, B_2 open boxes in R^n and R^m respectively. Then f(x, y) := χ_B(x, y) = χ_{B_1}(x)χ_{B_2}(y) and F(x) = µ(B_2)χ_{B_1}(x). Since µ_{n+m}(B) = µ_n(B_1)µ_m(B_2), we obtain the theorem for this function f immediately. In preparation for the case where f is a step function (with arbitrary values on the boundaries of the steps), let us now assume that f(x, y) = χ_B(x, y) only for (x, y) ∉ ∂B (a µ_{n+m}-null set). Note that ∂B = ((∂B_1) × B̄_2) ∪ (B̄_1 × (∂B_2)), where the bar denotes the closure. We still have f(x, ·) integrable for every x ∉ ∂B_1 (a µ_n-null set), specifically f(x, ·) ≡ 0 if x ∈ (B̄_1)^c, and f(x, ·) = χ_{B_2} almost everywhere for x ∈ B_1, with the exceptional null set being ∂B_2, a µ_m-null set. With this modification, Fubini follows for such functions f.

Step functions are finite linear combinations of functions that agree with the characteristic function of a box except on the boundary of this box. If Fubini holds for certain functions f_i, then it immediately follows for finite linear combinations of the f_i, by linearity of the integrals. So we have established Fubini for functions in S_{n+m}.

Now assume f ∈ ↑S_{n+m}, with S_{n+m} ∋ f_j ↗ f. Fubini has already been proved for the f_j. We also assume that f has finite integral, because we have to prove the theorem for integrable functions only. Then f_j(x, ·) ∈ L¹(R^m) for all x outside a µ_n-null set N_j. With F_j(x) := ∫f_j(x, y) d^m y defined at least for x ∉ N = ⋃ N_j (still a null set), we have F_j ∈ L¹(R^n) and ∫f_j(x, y) d^{n+m}(x, y) = ∫F_j(x) d^n x. So (F_j) is an increasing sequence of integrable functions on R^n \ N; let’s call its limit F.
By Beppo Levi, F is still integrable and ∫F(x) d^n x = lim ∫F_j(x) d^n x, provided this limit on the right hand side is finite. But lim ∫F_j(x) d^n x is finite indeed, because it equals lim ∫f_j(x, y) d^{n+m}(x, y) ≤ ∫f(x, y) d^{n+m}(x, y) < ∞. Being integrable, F is finite almost everywhere; the exceptional null set may be larger than N, but this is no problem. By going to the limit j → ∞ in ∫f_j(x, y) d^{n+m}(x, y) = ∫F_j(x) d^n x, we conclude ∫f(x, y) d^{n+m}(x, y) = ∫F(x) d^n x.

We had defined F as the monotonic limit of F_j; but we still have to prove that F(x) = ∫f(x, y) d^m y for almost every x. But since f_j(x, ·) ↗ f(x, ·), we have ∫f(x, y) d^m y = lim ∫f_j(x, y) d^m y ∈ R ∪ {∞} by the definition of the integral on ↑S_m. But the rhs is lim F_j(x) = F(x). This finishes the proof of Fubini for f ∈ ↑S ∩ L¹; the proof for f ∈ ↓S ∩ L¹ is analogous.

Finally assume f ∈ L¹ and let ↓S ∋ h_j ≤ f ≤ g_j ∈ ↑S with ∫(g_j − h_j)(x, y) d^{n+m}(x, y) < 1/j. We can assume that (h_j) is increasing and (g_j) is decreasing, otherwise replace g_j with min{g_1, . . . , g_j} and h_j with max{h_1, . . . , h_j}. With a null set N ⊂ R^n, we can now let G_j(x) := ∫g_j(x, y) d^m y and H_j(x) := ∫h_j(x, y) d^m y for x ∉ N, and have ∫g_j(x, y) d^{n+m}(x, y) = ∫G_j(x) d^n x and similarly for h_j, H_j. So for the non-negative functions J_j := G_j − H_j, we know that the sequence (J_j) is decreasing (since g_j − h_j is decreasing and the integral preserves inequalities). Call its limit J, i.e., J_j ↘ J. The function J is integrable and ∫J(x) d^n x = lim ∫J_j(x) d^n x = 0. We claim that this implies J = 0 a.e. Indeed, consider the set A_k := {x ∈ R^n | J(x) > 1/k}. Then µ(A_k) = ∫χ_{A_k} ≤ ∫kJ = k∫J = 0. So A_k is a null set, and therefore so is ⋃_k A_k = {x | J(x) ≠ 0}.

With J = 0 a.e., we know that G_j and H_j have a common limit, which we call F: namely G_j ↘ F and H_j ↗ F a.e. But this means the increasing sequence ∫h_j(x, y) d^m y has a finite limit F(x) for a.e. x, and so lim_{j→∞} h_j(x, ·) =: f_*(x, ·) is an integrable function (on R^m) for a.e. x.
Likewise lim_{j→∞} g_j(x, ·) =: f*(x, ·) is integrable over y for a.e. x. And since ∫(f*(x, y) − f_*(x, y)) d^m y = F(x) − F(x) = 0 for a.e. x, we have, for each x outside a null set, that the single variable function f*(x, ·) − f_*(x, ·) = 0 a.e. (with the µ_m-null set of exceptional y’s depending on x). So this common function f_* = f* must be f (since f* ≥ f ≥ f_*). Now the theorem follows from ∫g_j(x, y) d^{n+m}(x, y) = ∫G_j(x) d^n x by going to the limit. The lhs converges to ∫f(x, y) d^{n+m}(x, y) by construction, the rhs converges to ∫F(x) d^n x with F(x) = lim G_j(x) = lim ∫g_j(x, y) d^m y = ∫(lim g_j(x, y)) d^m y = ∫f(x, y) d^m y for a.e. x.

The proof with the order of integration reversed is analogous.

Tonelli’s theorem is a partial converse of Fubini, namely:

Theorem 7.9.2 Let f : R^n × R^m → E be measurable. Further assume that the iterated integral ∫(∫|f(x, y)| d^m y) d^n x exists (meaning |f(x, ·)| is integrable for a.e. x and F̂(x) := ∫|f(x, y)| d^m y is integrable over x). Then f ∈ ℒ¹(R^n × R^m) and Fubini applies.

Proof: Since f is measurable, the truncated functions f_N := med(−b_N, f, b_N) with b_N := Nχ_{[−N,N]^{n+m}} are integrable, and so are the |f_N|. Then, by Fubini, we have

∫|f_N(x, y)| d^{m+n}(x, y) = ∫F̂_N(x) d^n x with F̂_N(x) = ∫|f_N(x, y)| d^m y.

We have |f_N| ↗ |f|. To show |f| ∈ L¹(R^{n+m}) by Beppo Levi, we need to have a finite upper bound for ∫|f_N(x, y)| d^{n+m}(x, y) = ∫F̂_N(x) d^n x. Indeed, we conclude from |f_N| ≤ |f| that F̂_N(x) = ∫|f_N(x, y)| d^m y ≤ ∫|f(x, y)| d^m y = F̂(x) for a.e. x, and this F̂ was assumed to be integrable. So ∫F̂_N(x) d^n x ≤ ∫F̂(x) d^n x < ∞. So we conclude that |f| is integrable, and hence f is integrable, too, using 7.8.3.

Remark 7.9.3 The iterated integrability of f, rather than |f|, does not ensure the multi-variable integrability of f. For instance let f(x, y) := χ_{[0,1]}(y − x) − χ_{[−1,0[}(y − x). Then F(x) := ∫f(x, y) dy = 0 for every x, and obviously ∫F(x) dx = 0.
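The inner integral in Remark 7.9.3 can be written out explicitly. The computation below is a sketch for orientation; µ_1 denotes one-dimensional Lebesgue measure as in this section, and the last line records what happens to the iterated integral of |f|, which is the quantity Tonelli’s theorem asks about.

```latex
% Worked computation for Remark 7.9.3 (x fixed, integration over y).
\[
  \chi_{[0,1]}(y-x) = \chi_{[x,\,x+1]}(y), \qquad
  \chi_{[-1,0[}(y-x) = \chi_{[x-1,\,x[}(y),
\]
\[
  F(x) = \int f(x,y)\,dy
       = \mu_1\bigl([x,\,x+1]\bigr) - \mu_1\bigl([x-1,\,x[\bigr)
       = 1 - 1 = 0 .
\]
% The two intervals [0,1] and [-1,0[ are disjoint with union [-1,1].
\[
  |f(x,y)| = \chi_{[0,1]}(y-x) + \chi_{[-1,0[}(y-x) = \chi_{[-1,1]}(y-x),
  \qquad
  \int |f(x,y)|\,dy = 2 \quad\text{for every } x,
\]
so $x \mapsto \int |f(x,y)|\,dy$ is the constant $2$, which is not integrable over $\mathbb{R}$.
```

So the hypothesis of Theorem 7.9.2 fails for this f, and the remark does not contradict Tonelli.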
However, $f \notin L^1(\mathbb{R}^2)$, because otherwise $|f|$, given by $|f(x,y)| = \chi_{[-1,1]}(y - x)$, would have to be integrable as well. But the set $\{(x,y) \mid -1 \le y - x \le 1\}$ has infinite measure in $\mathbb{R}^2$, e.g., because it contains infinitely many pairwise disjoint unit squares $]n - \tfrac12, n + \tfrac12[ \,\times\, ]n - \tfrac12, n + \tfrac12[$ for $n \in \mathbb{Z}$.

7.10 $L^p$ spaces, and the Prominent Integral Inequalities

This section compares neatly with Sec 5.10 in the blue notes.

The following generalization of $L^1$ is often used:

Definition 7.10.1 Given a Daniell extended integral $I$ on $X$, and $p \in [1, \infty[$, we denote by $\mathcal{L}^p(X)$, or $\mathcal{L}^p(X, I)$, the set of those measurable functions $f : X \to E$ for which $\|f\|_{L^p} := I(|f|^p)^{1/p} < \infty$ (in other words, for which $|f|^p$ is integrable).

It turns out that the finite-valued functions in $\mathcal{L}^p$ form a vector space and $\|\cdot\|_{L^p}$ is a semi-norm. Accepting these results for the moment (they will be proved soon), we can define

Definition 7.10.2 Under the hypotheses of Def. 7.10.1, $L^p(X)$ consists of equivalence classes of functions $f \in \mathcal{L}^p(X)$ modulo equivalence a.e.

We then get the

Theorem 7.10.3 $L^p(X)$ with the norm $\|\cdot\|_{L^p}$ is a Banach space.

We will mainly be interested in the case of the Lebesgue integral on $\mathbb{R}^n$, but the case of the summation 'integral' $I_\Sigma$ on Seq (Example 7.4.9) is also relevant. In this case, $\mathcal{L}^p$ and $L^p$ need not be distinguished because there are no null sets other than $\emptyset$, and $L^p(\mathbb{N}, I_\Sigma)$ is usually called $\ell^p$.

Let us now fill in the details: To see that the finite-valued $\mathcal{L}^p$ functions form a vector space, we need to show that sums and constant multiples of such functions are again in $\mathcal{L}^p$. For constant multiples, this is trivial; for the sum, we estimate pointwise
\[ |f + g|^p \le (|f| + |g|)^p \le [2 \max\{|f|, |g|\}]^p = 2^p \max\{|f|^p, |g|^p\} \le 2^p (|f|^p + |g|^p). \]
So $|f + g|^p$ is measurable and has an integrable majorant, if $f, g \in \mathcal{L}^p$. The proof that equality a.e.
is an equivalence relation on $\mathcal{L}^p$, that addition of equivalence classes can be defined in terms of addition of representatives ($[f] + [g] := [f + g]$ is well-defined), and similarly for multiplication with numbers, is exactly as in the case $p = 1$ (see Ex. 7.18). All seminorm properties except for the triangle inequality are obvious. The triangle inequality has been proved for $p = 1$ already, so we only need to deal with the case $p > 1$. And we'll have to establish completeness of $L^p$, which is similar to the completeness of $L^1$ proved in 7.7.13.

The triangle inequality for $\|\cdot\|_{L^p}$ is also called Minkowski's inequality. It is a consequence of another important inequality, called Hölder's inequality, and this inequality in turn follows from the following simple inequality for real numbers:

Lemma 7.10.4 (Young's Inequality) Let $a, b \ge 0$ and $1 < p, q < \infty$ such that $\frac1p + \frac1q = 1$. Then
\[ ab \le \frac{a^p}{p} + \frac{b^q}{q}. \]

We give several proofs for this inequality, in order to reinforce the message that skill with inequalities is a core skill in analysis. Numbers $p, q > 1$ satisfying $\frac1p + \frac1q = 1$ are sometimes called each other's Hölder conjugates.

First Proof of Lemma 7.10.4: We may assume $a, b > 0$, and we first prove the result for $p > 1$ rational. So let $p = r/s$ with $r, s \in \mathbb{N}$ and $r > s$. Then $q = \frac{r}{r-s}$. By the inequality of the arithmetic and geometric mean (Thm 4.2.x1), we get
\[ \Bigl( \underbrace{a^{r/s} \cdots a^{r/s}}_{s \text{ many}} \cdot \underbrace{b^{r/(r-s)} \cdots b^{r/(r-s)}}_{r-s \text{ many}} \Bigr)^{1/r} \le \frac1r \Bigl( \underbrace{a^{r/s} + \dots + a^{r/s}}_{s \text{ many}} + \underbrace{b^{r/(r-s)} + \dots + b^{r/(r-s)}}_{r-s \text{ many}} \Bigr) \]
and the claim is immediate: the left-hand side equals $ab$, and the right-hand side equals $\frac{s}{r} a^p + \frac{r-s}{r} b^q = \frac{a^p}{p} + \frac{b^q}{q}$. Since the right-hand side $a^p/p + b^q/q$ depends continuously on $p$ (with $q = \frac{p}{p-1}$) and $\mathbb{Q}$ is dense in $]1, \infty[$, we can extend the claim from rational to real $p$ by letting $\mathbb{Q} \ni p_n \to p$ and moving the limit into the inequality for $p_n$.

Second Proof of Lemma 7.10.4, using elementary calculus: Again assuming $a, b > 0$ and fixing $ab = K$, we want to show that $K \le a^p/p + (K/a)^q/q =: f(a)$ with $q = \frac{p}{p-1}$, for all $a > 0$.
With $p$ fixed, the right-hand side is a differentiable function of $a$, and it goes to $\infty$ as either $a \to 0+$ or $a \to \infty$. So it must have a global minimum on $]0, \infty[$ by an easy corollary of Thm. 3.5.8, and at this minimum, $f'(a)$ must vanish. But $f'(a) = a^{p-1} - K^q a^{-q-1}$, hence the minimum can only be at $a = a_* := K^{q/(p+q)}$. Hence
\[ f(a) \ge f(a_*) = \frac1p K^{pq/(p+q)} + \frac1q K^{q - q^2/(p+q)} = \Bigl( \frac1p + \frac1q \Bigr) K^{pq/(p+q)} = K. \]
(Note that $\frac1p + \frac1q = 1$ implies $p + q = pq$.)

A third proof, as a consequence of a more general Young inequality, can be found as Lemma 5.10.2, Cor. 5.10.3, where however you may want to substitute analogous references in Ch. 7 for the given references in Ch. 5. Namely,

Lemma 7.10.5 Let $f : [0, \infty[ \to [0, \infty[$ be continuous and one-to-one onto, with $f(0) = 0$. Then for $a, b > 0$, it holds
\[ ab \le \int_0^a f(x)\,dx + \int_0^b f^{-1}(y)\,dy. \]

Proof: Let
\[ A = \{(x,y) \in \mathbb{R}^2 \mid 0 \le x \le a,\ 0 \le y \le f(x)\}, \qquad B = \{(x,y) \in \mathbb{R}^2 \mid 0 \le y \le b,\ 0 \le x < f^{-1}(y)\}. \]
[Figure: the regions $A$ and $B$ in the $(x,y)$-plane, with $a$ marked on the $x$-axis and $b$ on the $y$-axis.]

Then
\[ \mu_2(A) = \int \chi_A(x,y)\,d^2(x,y) = \int \!\! \int \chi_A(x,y)\,dy\,dx = \int_0^a f(x)\,dx \]
using Tonelli's and Fubini's theorem. Similarly,
\[ \mu_2(B) = \int \chi_B(x,y)\,d^2(x,y) = \int \!\! \int \chi_B(x,y)\,dx\,dy = \int_0^b f^{-1}(y)\,dy. \]
$A$ and $B$ are disjoint, so $\mu_2(A \cup B) = \mu_2(A) + \mu_2(B)$. On the other hand, $[0,a] \times [0,b] \subset A \cup B$, hence $ab = \mu_2([0,a] \times [0,b]) \le \mu_2(A) + \mu_2(B)$. This proves the claimed inequality.

Now Lemma 7.10.4 arises as a special case of Lemma 7.10.5 for $f(x) = x^{p-1}$ and hence $f^{-1}(y) = y^{q-1}$, using the Fundamental Lemma of Calculus to evaluate the integral.

We can now prove the Hölder inequality:

Theorem 7.10.6 (Hölder Inequality) Suppose $p, q > 1$ and $\frac1p + \frac1q = 1$. Suppose $f \in L^p(\mathbb{R}^n)$ and $g \in L^q(\mathbb{R}^n)$. Then $fg \in L^1(\mathbb{R}^n)$ and $\|fg\|_{L^1} \le \|f\|_{L^p} \|g\|_{L^q}$.

Proof: If either $\|f\|_{L^p}$ or $\|g\|_{L^q}$ vanishes, the inequality is trivial, because $fg = 0$ a.e. So we now assume these norms are positive. For simplicity and generality, we use the $I$ notation again. From the specifics of the Lebesgue integral, we use that $fg$ is measurable if $f$ and $g$ are, see 7.8.7(4).
(But this is trivially also true for sequences $f, g$ according to Exercise 7.21.) Since $|fg| \le |f|^p/p + |g|^q/q$ pointwise by Young's inequality, we conclude that $fg$ is integrable. We write $fg$ as $\alpha f \cdot \frac{1}{\alpha} g$ with a number $\alpha > 0$ yet to be chosen, and obtain from integrating Young's inequality that
\[ I(|fg|) \le \frac{\alpha^p}{p} I(|f|^p) + \frac{\alpha^{-q}}{q} I(|g|^q) = \frac{\alpha^p}{p} \|f\|_{L^p}^p + \frac{\alpha^{-q}}{q} \|g\|_{L^q}^q. \]
The best result is obtained by single-variable minimization of the right-hand side with respect to $\alpha$, i.e., by choosing $\alpha^{p+q} = \|g\|_{L^q}^q / \|f\|_{L^p}^p$. Plugging this in yields the Hölder inequality.

Remark 7.10.7 The special case $p = q = 2$ of Hölder's inequality is called the Cauchy-Schwarz inequality.

Finally, we prove the Minkowski inequality $\|f + g\|_{L^p} \le \|f\|_{L^p} + \|g\|_{L^p}$:
\[ \|f + g\|_{L^p}^p = I(|f + g|^p) \le I\bigl((|f| + |g|)\,|f + g|^{p-1}\bigr) \le I\bigl(|f|\,|f + g|^{p-1}\bigr) + I\bigl(|g|\,|f + g|^{p-1}\bigr). \]
Using Hölder's inequality with $p$ and $q = \frac{p}{p-1}$, and noting that $\bigl\| |f + g|^{p-1} \bigr\|_{L^q} = I(|f + g|^p)^{(p-1)/p} = \|f + g\|_{L^p}^{p-1}$, we continue
\[ \|f + g\|_{L^p}^p \le \|f\|_{L^p} \|f + g\|_{L^p}^{p-1} + \|g\|_{L^p} \|f + g\|_{L^p}^{p-1} \]
and hence obtain the Minkowski inequality by cancellation (if $\|f + g\|_{L^p} = 0$, there is nothing to prove).

Remark 7.10.8 Written out in integrals, Minkowski's inequality reads
\[ \Bigl( \int |f + g|^p \Bigr)^{1/p} \le \Bigl( \int |f|^p \Bigr)^{1/p} + \Bigl( \int |g|^p \Bigr)^{1/p}. \]
For sequences, it reads
\[ \Bigl( \sum_n |f_n + g_n|^p \Bigr)^{1/p} \le \Bigl( \sum_n |f_n|^p \Bigr)^{1/p} + \Bigl( \sum_n |g_n|^p \Bigr)^{1/p}. \]
Restricting this to finite sequences, we obtain the $p$-norm for vectors in $\mathbb{R}^n$, namely $\|f\|_p = \bigl( \sum_{j=1}^n |f_j|^p \bigr)^{1/p}$, which generalizes the taxi norm ($p = 1$) and the Euclidean norm ($p = 2$) and contains the max-norm as a limiting case $p \to \infty$.

Exercise 7.26 For $f = (f_1, \dots, f_n) \in \mathbb{R}^n$, show that $\lim_{p\to\infty} \|f\|_p = \max_j |f_j|$.

Finally, we want to see that $L^p$ is complete.

Theorem 7.10.9 The normed vector space $L^p(X)$ is a Banach space.

Proof: This proof is a slight modification of the proof of Thm. 7.7.13. The well-definedness of the vector space operations on the equivalence classes carries over directly. Suppose $([f_n])$ is a Cauchy sequence in $L^p(X)$. We define $n_j$ inductively such that $\|f_n - f_m\|_{L^p} < 2^{-j}$ provided $n, m \ge n_j$ and (for $j > 1$) $n_j > n_{j-1}$.
Then $g_N := \sum_{j=1}^N |f_{n_{j+1}} - f_{n_j}|$ defines an increasing sequence $(g_N)$ of measurable functions, converging pointwise to a function $g$ (which, as of yet, might be infinity in many points). But $\|g_N\|_{L^p}$ is bounded by $\sum_{j=1}^{\infty} 2^{-j} = 1$, by Minkowski's inequality. So the $g_N^p$ are integrable, with integral $\le 1^p = 1$, and so their monotonic limit $g^p$ is also integrable by Beppo Levi. As an integrable function, $g^p$ is finite a.e., and so is $g$. Then, for each $x$ for which $g(x)$ is finite, the sequence $f_{n_k}(x) = f_{n_1}(x) + \sum_{j=2}^k (f_{n_j}(x) - f_{n_{j-1}}(x))$ converges, because it converges absolutely. In other words, $(f_{n_k})_k$ converges a.e. to a function $f$ that is measurable as an a.e. limit of measurable functions.

Now $\int |f_{n_j} - f_{n_i}|^p$ is bounded as $n_i \to \infty$; indeed it is $\le (2^{-j})^p$ for $i > j$. Keeping $n_j$ fixed and letting $n_i \to \infty$, Fatou's lemma implies that
\[ \int |f_{n_j} - f|^p \le \liminf_{n_i \to \infty} \int |f_{n_j} - f_{n_i}|^p \le 2^{-jp}. \]
So $f \in \mathcal{L}^p$, and $[f_{n_j}] \to [f]$ in $L^p$ norm as $n_j \to \infty$. We know from Lemma 3.11.3 that a Cauchy sequence that has a convergent subsequence is convergent itself.
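As a numerical sanity check of the inequalities of Section 7.10 in the finite-sequence case, the following Python sketch evaluates the "gaps" in Young's, Hölder's and Minkowski's inequalities on random data, and illustrates the limit in Exercise 7.26. All function names (`conjugate`, `p_norm`, `young_gap`, `holder_gap`, `minkowski_gap`, `smallest_gaps`) are hypothetical helpers introduced here for illustration; they are not part of the notes, and a numerical check of this kind is of course no substitute for the proofs.

```python
import random


def conjugate(p):
    """Return the Hölder conjugate q of p > 1, i.e. 1/p + 1/q = 1."""
    return p / (p - 1.0)


def p_norm(v, p):
    """p-norm (sum_j |v_j|^p)^(1/p) of a finite sequence v, for p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


def young_gap(a, b, p):
    """a^p/p + b^q/q - a*b for a, b >= 0; Young's inequality says this is >= 0."""
    q = conjugate(p)
    return a ** p / p + b ** q / q - a * b


def holder_gap(f, g, p):
    """||f||_p * ||g||_q - sum_j |f_j g_j|; Hölder's inequality says this is >= 0."""
    q = conjugate(p)
    return p_norm(f, p) * p_norm(g, q) - sum(abs(x * y) for x, y in zip(f, g))


def minkowski_gap(f, g, p):
    """||f||_p + ||g||_p - ||f + g||_p; Minkowski's inequality says this is >= 0."""
    total = [x + y for x, y in zip(f, g)]
    return p_norm(f, p) + p_norm(g, p) - p_norm(total, p)


def smallest_gaps(trials=1000, dim=5, seed=0):
    """Smallest observed gap for each inequality over random test data."""
    rng = random.Random(seed)
    worst = {"young": float("inf"), "holder": float("inf"), "minkowski": float("inf")}
    for _ in range(trials):
        p = rng.uniform(1.1, 6.0)
        f = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        g = [rng.uniform(-3.0, 3.0) for _ in range(dim)]
        worst["young"] = min(worst["young"], young_gap(abs(f[0]), abs(g[0]), p))
        worst["holder"] = min(worst["holder"], holder_gap(f, g, p))
        worst["minkowski"] = min(worst["minkowski"], minkowski_gap(f, g, p))
    return worst


if __name__ == "__main__":
    # All three smallest gaps should be non-negative up to rounding error.
    print(smallest_gaps())
    # Exercise 7.26: the p-norm approaches the max-norm as p grows.
    v = [1.0, -3.0, 2.0]
    for p in (1, 2, 10, 100):
        print(p, p_norm(v, p))
```

Equality in Young's inequality occurs when $a^p = b^q$ (for instance $a = 2$, $b = 4$, $p = 3$, $q = 3/2$), which is why the gaps are compared against a small negative tolerance rather than strictly against zero when working in floating point.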
