# Introduction to Probability Theory and Stochastic Processes for Finance

Lecture Notes

Fabio Trojani

Department of Economics, University of St. Gallen, Switzerland

Correspondence address: Fabio Trojani, Swiss Institute of Banking and Finance, University of St. Gallen, Rosenbergstr. 52, CH-9000 St. Gallen, e-mail: Fabio.Trojani@unisg.ch.
Contents

1 Introduction to Probability Theory
   1.1 The Binomial Model
      1.1.1 The Risky Asset
      1.1.2 The Riskless Asset
      1.1.3 A Basic No Arbitrage Condition
      1.1.4 Some Basic Remarks
      1.1.5 Pricing Derivatives: a first Example
   1.2 Finite Probability Spaces
      1.2.1 Measurable Spaces
      1.2.2 Probability measures
      1.2.3 Random Variables
      1.2.4 Expected Value of Random Variables Defined on Finite Measurable Spaces
      1.2.5 Examples of Probability Spaces and Random Variables with Finite Sample Space
   1.3 General Probability Spaces
      1.3.1 Some First Examples of Probability Spaces with non finite Sample Spaces
      1.3.2 Continuity Properties of Probability Measures
      1.3.3 Random Variables
      1.3.4 Expected Value and Lebesgue Integral
      1.3.5 Some Further Examples of Probability Spaces with uncountable Sample Spaces
   1.4 Stochastic Independence

2 Conditional Expectations and Martingales
   2.1 The Binomial Model Once More
   2.2 Sub Sigma Algebras and (Partial) Information
   2.3 Conditional Expectations
      2.3.1 Motivation
      2.3.2 Definition and Properties
   2.4 Martingale Processes

3 Pricing Principles in the Absence of Arbitrage
   3.1 Stock Prices, Risk Neutral Probability Measures and Martingales
   3.2 Self Financing Strategies, Risk Neutral Probability Measures and Martingales
   3.3 Existence of Risk Neutral Probability Measures and Derivatives Pricing
   3.4 Uniqueness of Risk Neutral Probability Measures and Derivatives Hedging
   3.5 Existence of Risk Neutral Probability Measures and Absence of Arbitrage

4 Introduction to Stochastic Processes
   4.1 Basic Definitions
   4.2 Discrete Time Brownian Motion
   4.3 Girsanov Theorem: Application to a Semicontinuous Pricing Model
      4.3.1 A Semicontinuous Pricing Model
      4.3.2 Risk Neutral Valuation in the Semicontinuous Model
      4.3.3 A Discrete Time Formulation of Girsanov Theorem
      4.3.4 A Discrete Time Derivation of the Black and Scholes Formula
   4.4 Continuous Time Brownian Motion

5 Introduction to Stochastic Calculus
   5.1 Starting Point, Motivation
   5.2 The Stochastic Integral
      5.2.1 Some Basic Preliminaries
      5.2.2 Simple Integrands
      5.2.3 Square Integrable Integrands
      5.2.4 Properties of Stochastic Integrals
   5.3 Itô's Lemma
      5.3.1 Starting Point, Motivation and Some First Examples
      5.3.2 A Simplified Derivation of Itô's Formula
   5.4 An Application of Stochastic Calculus: the Black-Scholes Model
      5.4.1 The Black-Scholes Market
      5.4.2 Self Financing Portfolios and Hedging in the Black-Scholes Model
      5.4.3 Probabilistic Interpretation of Black-Scholes Prices: Girsanov Theorem once more
1       Introduction to Probability Theory

1.1      The Binomial Model

We start with the binomial model to introduce some basic ideas of probability theory related to the pricing of contingent claims, basically for the following reasons:

• It is a simple setting in which the arbitrage concept and its relation to risk neutral pricing can be explained.

• It is a model used in practice, where binomial trees are calibrated to real data, for instance to price American derivatives.

• It is a simple setting in which to introduce the concepts of conditional expectations and martingales, which are at the heart of the theory of derivatives pricing.

1.1.1     The Risky Asset

St is the price of a risky stock at time t ∈ I, where we start for simplicity with a discrete time index I = {0, 1, 2}. The dynamics of St is defined by

   St = u St−1 with probability p,
   St = d St−1 with probability 1 − p,

where p ∈ (0, 1). We impose for brevity the further condition

   u = 1/d > 1 ,

giving a recombining tree.

1.1.2     The Riskless Asset

Bt is the price at time t of a riskless money account. r > 0 is the riskless interest rate on the money account, implying

   Bt = (1 + r) Bt−1

for any t = 1, 2. For simplicity we impose the normalization B0 = 1.

1.1.3   A Basic No Arbitrage Condition

A necessary condition for the absence of arbitrage opportunities in our model is

   d < 1 + r < u .                                    (1)

Example 1 In the sequel we will often use a numerical example with parameters S0 = 4, u = 1/d = 2, r = 0.25.

1.1.4   Some Basic Remarks

Notice that to any trajectory TT, TH, HT, HH in the tree we can associate the corresponding values of S1 and S2. Thus, from the perspective of time 0, both S1 and S2 are random entities whose value depends on which event/trajectory will be realized in the model. To fully describe the random behaviour of S1 and S2 we can make use of the space Ω = {TT, TH, HT, HH} of all random sequences that can be realized on the tree. Basically, Ω contains all the information about the single outcomes that can be realized in our model.

Definition 2 (i) The set Ω of all possible outcomes in a random experiment is called the sample space. (ii) Each single element ω ∈ Ω is called an outcome of the random experiment.

Example 3 In the above two period model we had Ω = {T T, T H, HT, HH} and ω = T T or

ω = T H or ω = HT or ω = HH.

Exercise 4 Give the sample space and all single outcomes in a binomial tree with three periods.

1.1.5     Pricing Derivatives: a first Example

Definition 5 A European call option with strike price K and maturity T ∈ I is the right to buy at time T the underlying stock for the price K. We denote by ct the price of the European call option at time t.

From the definition we immediately have for the pay-off at maturity of the call option:

   cT = ST − K  if ST > K ,
   cT = 0       if ST ≤ K ,

or, more compactly:

   cT = (ST − K)^+ ,

where (x)^+ := max (x, 0) is the positive part of x.

Remark 6 Notice that cT depends on ω ∈ Ω only through ST (ω). The goal in any pricing model is to determine the time 0 price (as for instance the price c0) of a derivative pay-off falling at a later time T, say (as for instance the pay-off cT = (ST − K)^+).

Assumption 7 To illustrate the main ideas we start with T = 1.

Definition 8 A (perfect) hedging portfolio for cT with value V0 at time 0 is a position in ∆0 stocks and V0 − ∆0 S0 money accounts (recall the normalization B0 = 1), such that

   c1 (H) = ∆0 S1 (H) + (V0 − ∆0 S0) (1 + r)
                                                    (2)
   c1 (T) = ∆0 S1 (T) + (V0 − ∆0 S0) (1 + r)

Remark 9 A (perfect) hedging portfolio replicates exactly the future pay-oﬀ of the derivative to

be hedged. Therefore, it is a vehicle to fully eliminate the risk intrinsic in the randomness of the

future value of a derivative.

Proposition 10 (i) For T = 1, the quantity ∆0 is given by

   ∆0 = (c1 (H) − c1 (T)) / (S1 (H) − S1 (T)) .                           (3)

∆0 is called the "delta" of the hedging portfolio. (ii) The risk neutral valuation formula follows:

   c0 = V0 = [p̃ c1 (H) + (1 − p̃) c1 (T)] / (1 + r) ,

where

   p̃ = (1 + r − d) / (u − d)

is the risk neutral probability of an upward move (to be distinguished from the binomial probability p).

Proof. (i) Compute the difference between the first and the second equation in (2) and solve for ∆0. (ii) Insert ∆0 given by (3) in one of the two equations in (2) and solve for V0. Absence of arbitrage then implies V0 = c0.

Remark 11 (i) The price V0 = c0 does not depend on the binomial probability p. (ii) Under the given conditions (cf. (1)) the risk adjusted probability p̃ = (1 + r − d) / (u − d) satisfies p̃ ∈ (0, 1). Therefore the identity

   c0 = [p̃ c1 (H) + (1 − p̃) c1 (T)] / (1 + r)

says that the price c0 is a discounted expectation of the call's future random pay-offs, computed using the risk adjusted probabilities p̃ and 1 − p̃. More compactly, we could thus write

   c0 = Ẽ (c1) / (1 + r) ,

where Ẽ denotes the expectation under p̃, 1 − p̃. This is a so called risk adjusted (or risk neutral) valuation formula.

Exercise 12 (i) For the case T = 1 and for the model parameters in Example 1 compute the numerical value of c0. (ii) For the case T = 2 compute recursively the hedging portfolio of the derivative, starting from ∆1 (H), ∆1 (T), V1 (H), V1 (T), and finishing with ∆0 and V0.
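The one-period computation of Exercise 12 (i) can be sketched in a few lines of code. The sketch below uses the parameters of Example 1 together with a strike K = 5; the strike is a hypothetical choice, since the exercise does not fix one. The hedge ratio and the risk neutral probability follow Proposition 10.

```python
# One-period binomial pricing (Proposition 10), with the parameters of
# Example 1. The strike K = 5 is an assumed value, not from the text.
S0, u, r = 4.0, 2.0, 0.25
d = 1.0 / u
K = 5.0  # hypothetical strike

S1_H, S1_T = u * S0, d * S0                       # stock in the up/down state
c1_H = max(S1_H - K, 0.0)                         # call pay-off, up state
c1_T = max(S1_T - K, 0.0)                         # call pay-off, down state

delta0 = (c1_H - c1_T) / (S1_H - S1_T)            # hedge ratio, equation (3)
p_rn = (1 + r - d) / (u - d)                      # risk neutral probability
c0 = (p_rn * c1_H + (1 - p_rn) * c1_T) / (1 + r)  # discounted expectation

print(delta0, p_rn, c0)  # 0.5 0.5 1.2
```

With these numbers the hedging portfolio holds half a share, and the call is worth 1.2 at time 0.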

1.2     Finite Probability Spaces

In the sequel we let Ω ≠ ∅ be a given sample space.

1.2.1   Measurable Spaces

Let F be the family of all subsets of Ω; F is an example of a so called sigma algebra, a concept that we define in the sequel.

Definition 13 (i) A sigma algebra G ⊂ F is a family of subsets of Ω such that:

1. ∅ ∈ G

2. If A ∈ G then it follows Ac ∈ G

3. If (Ai)i∈N ⊂ G is a countable sequence in G, then it follows ∪i∈N Ai ∈ G

(ii) The couple (Ω, G) is called a measurable space.

Example 14 (i) F is a sigma algebra, the finest one on Ω. Indeed, ∅ ∈ F. Moreover, for any set A ∈ F the complement Ac is a subset of Ω, i.e. is in F. The same holds for any (not only for a countable) union of sets in F. (ii) The subfamily G := {∅, Ω} is the coarsest sigma algebra on Ω. (iii) In the setting of the binomial model of Example 1, it is easy to verify (please do it!) that the subfamily

   G := {∅, Ω, {HT, HH}, {TT, TH}}

is a sigma algebra, the sigma algebra generated by the first period price movements in the model.

Remark 15 We make use of sigma algebras to model diﬀerent information sets at the disposal

of the investor in doing her portfolio choices. For instance, in the setting of the binomial model

of Example 1, the information available at time 0 (before observing prices) can be modelled by the

trivial information set

G0 := {∅, Ω}    .

That is, at time 0 investors only know that the possible realized outcome ω has to be an element

of the sample space Ω. At time 1 investors can observe S1 . Thus, depending on the value of S1

they will know at time 1 that either

ω ∈ {HT, HH}       (if and only if S1 (ω) = S0 u)   ,

or

ω ∈ {T T, T H}   (if and only if S1 (ω) = S0 d)    .

Thus at time 1 investors do not have full information about ω, since they still do not know the

direction of the price movement in period 2. However, they can determine to which specific event of their information set ω belongs. The larger (smaller) the information set, the more precise (the rougher) the information on the realized outcome ω. For instance, while at time 0 investors only know that

the outcome will be an element of the sample space, at time 1 they know that the outcome implies

either an upward or a downward price movement in the ﬁrst period. Based on these considerations

a natural sigma algebra G1 to model investors price information at time 1 is

G1 := {∅, Ω, {HT, HH} , {T T, T H}}         ,

(verify that G1 is indeed a sigma algebra). Similarly, by observing only the price S2 investors will

know at time 2 that either

ω = HH    (if and only if S2 (ω) = S0 u2 )       ,

or

ω = TT   (if and only if S2 (ω) = S0 d2 )        ,

or

ω ∈ {T H, HT }     (if and only if S2 (ω) = S0 du)         .

On the other hand, by observing the prices S1 and S2 investors will know at time 2

ω = HH    (if and only if S2 (ω) = S0 u2 )       ,

or

ω = TT   (if and only if S2 (ω) = S0 d2 )        ,

or

ω = TH      (if and only if S1 (ω) = S0 d and S2 (ω) = S0 du )       ,

or

ω = HT      (if and only if S1 (ω) = S0 u and S2 (ω) = S0 du )       .

Based on these considerations a natural sigma algebra G2 to model investors' price information up to time 2 is the smallest one containing the system of subsets of Ω given by

   E2 := {∅, Ω, {HT}, {HH}, {TT}, {TH}} .

We denote this sigma algebra by G2 = σ (E2). Finally, the sigma algebra representing the information obtained by observing only the price S2 is

   G3 = {∅, Ω, {HH}, {TH, HT, TT}, {TT}, {TH, HT, HH}, {TH, HT}, {TT, HH}} .

Notice that while the relation G0 ⊂ G1 ⊂ G2 implies an information set growing over time, we do

not have G1 ⊂ G3 (why?). Therefore, the sequence of sigma algebras G0 , G1 , G3 is not consistent

with the idea of an investor’s information set growing over time.
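The exercise embedded in Remark 15 ("verify that G1 is indeed a sigma algebra") can also be checked by brute force on the finite sample space; the helper below is an illustrative sketch, since for a finite family it suffices to test closure under complements and pairwise unions.

```python
# Brute-force check that G1 from Remark 15 is a sigma algebra on the
# two-period sample space. For a finite family, closure under
# complements and pairwise unions implies closure under countable unions.
Omega = frozenset({"TT", "TH", "HT", "HH"})
G1 = {frozenset(), Omega, frozenset({"HT", "HH"}), frozenset({"TT", "TH"})}

def is_sigma_algebra(family, omega):
    if frozenset() not in family:
        return False
    for A in family:
        if omega - A not in family:      # closure under complement
            return False
        for B in family:
            if A | B not in family:      # closure under union
                return False
    return True

print(is_sigma_algebra(G1, Omega))  # True
```

The same helper immediately shows that, for example, {∅, {HT}} fails to be a sigma algebra.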

Exercise 16 (Borel sigma algebra on R) Let Ω := R and denote by T the set of all open intervals

in R

T = {(a, b)   |     a ≤ b, a, b ∈ R}      .

1. Show with a simple counterexample that T is not a sigma algebra on R.

2. We know that there does exist a sigma algebra over R containing T (which one?). Thus, there also exists a "minimal sigma algebra" containing T, the so-called Borel sigma algebra over R (denoted by B (R)), which has to be of the form

   B (R) = ∩ { G : G is a σ-algebra over R with T ⊂ G } .

To show that B (R) is indeed a sigma algebra over R it is thus sufficient to show that intersections of sigma algebras are sigma algebras. Do this, by verifying the corresponding definition.

3. Show, using simple set operations, that the events (−∞, a), (a, ∞), [a, b], (a, b], {a}, where

a ≤ b, are elements of B (R).

4. Show that any countable subset {ai}i∈N of R is an element of B (R).

As mentioned, a natural way to model a growing amount of information over time is through increasing sequences of sigma algebras. This is the content of the next definition.

Deﬁnition 17 Let (Ω, G) be a measurable space. A sequence (Gi )i=0,1,...,n of sigma algebras over

Ω such that

G0 ⊂ G1 ⊂ ... ⊂ Gn ⊂ G     ,

is called a ﬁltration.

Example 18 In Remark 15 the sequence (Gi )i=0,1,2 is a ﬁltration, while the sequence (Gi )i=0,1,3

is not.

1.2.2     Probability measures

For the whole section let (Ω, G) be a measurable space.

Deﬁnition 19 We say that an event A ∈ G is realized in a random experiment with sample space

Ω if ω ∈ A.

Example 20 In the two period binomial model we have

{T H, T T } = {The stock price drops in the ﬁrst period}   .

Thus, if at time 1 we observe T, {TH, TT} is realized. On the other hand, if we observe H, then {TH, TT} is not realized (i.e. Ac = {HT, HH} is realized).

The next step is to assign in a consistent way probabilities to events that can be realized in a

random experiment.

Deﬁnition 21 (i) A probability measure on (Ω, G) is a function P : G → [0, 1] such that:

1. P (Ω) = 1

2. For any disjoint sequence (Ai)i∈N ⊂ G, i.e. such that Ai ∩ Aj = ∅ for i ≠ j, it follows

   P (∪i∈N Ai) = Σi∈N P (Ai) .

This property is called sigma additivity.

(ii) We call the triplet (Ω, G, P) a probability space.

Example 22 In the two period binomial model we set Ω = {TT, TH, HT, HH}, G = F, and define probabilities with the binomial rule

   P (HH) = p² ,  P (TT) = (1 − p)² ,  P (TH) = P (HT) = p (1 − p) .

The sigma additivity then implies, for instance,

   P ({HT, HH}) = P (HH) + P (HT) = p² + p (1 − p) .

More generally, we have, in this finite sample space setting:

   P (A) = Σω∈A P (ω) .
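The rule P(A) = Σω∈A P(ω) of Example 22 is easy to sketch in code; the value p = 0.6 below is illustrative only, since the text leaves p ∈ (0, 1) generic.

```python
# Binomial probabilities on the two-period tree (Example 22); the
# probability of an event is the sum of its outcome probabilities.
p = 0.6  # illustrative value of the binomial parameter
prob = {"HH": p**2, "HT": p*(1-p), "TH": p*(1-p), "TT": (1-p)**2}

def P(A):
    """Probability of an event A, given as a set of outcomes."""
    return sum(prob[w] for w in A)

print(P({"HT", "HH"}))  # p^2 + p(1-p) = p
```

As a sanity check, P(Ω) = 1 and P({HT, HH}) = p, the probability of an upward first move.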

Proposition 23 Let (Ω, G, P ) be a probability space. We have:

1. P (A\B) = P (A) − P (A ∩ B)

2. P (A ∪ B) = P (A) + P (B) − P (A ∩ B)

3. P (Ac ) = 1 − P (A)

4. If A ⊂ B then P (A) ≤ P (B)

Proof. 1. A\B = A ∩ Bc and A = (A ∩ B) ∪ (A ∩ Bc). By sigma additivity it follows:

   P (A) = P (A ∩ B) + P (A ∩ Bc) = P (A ∩ B) + P (A\B) .

2. A ∪ B = (A\B) ∪ B. Therefore, using 1. and by sigma additivity:

   P (A ∪ B) = P (A\B) + P (B) = P (A) + P (B) − P (A ∩ B) .

3. This is a particular case of 1. with A = Ω and B = A. 4. By 1. we have, under the given assumption:

   P (B) = P (B ∩ A) + P (B\A) = P (A) + P (B\A) ≥ P (A) .

Remark 24 In Deﬁnition 21, the condition 1. for a probability measure implies the condition,

1’. P (∅) = 0.

In fact, a function µ : G → [0, ∞] satisfying conditions 1'. and 2. in Definition 21 is called a measure on the measurable space (Ω, G). Notice that in this case we can have µ (Ω) = ∞.

Exercise 25 The Lebesgue measure on the measurable space (R, B (R)) (denoted by µ0 ) is a mea-

sure µ0 : B (R) → [0, ∞] such that

µ0 ((a, b)) = b − a

for any open interval (a, b), a ≤ b. It can be shown that Lebesgue measure exists and is unique

(we will not prove this, we will just assume it in the sequel). Show the following properties of

Lebesgue measure, using the general deﬁnition of a measure.

1. µ0 (∅) = 0, µ0 (R+ ) = ∞

2. µ0 ({a}) = 0 for any a ∈ R

3. For any countable subset {ai}i∈N of R one has µ0 ({ai}i∈N) = 0.

1.2.3      Random Variables

For the whole section let (Ω, G) be a measurable space such that the cardinality of Ω is ﬁnite

(|Ω| < ∞). We will extend the concept of a random variable to non ﬁnite sample spaces in a later

section.

Definition 26 Let X : Ω → R be a function from Ω to the real line. (i) The sigma algebra

   σ (X) := { X−1 (B) : B is a subset of R } ,

where X−1 (B) is a short notation for the preimage {ω : X (ω) ∈ B} of B under X, is called the sigma algebra generated by X. (ii) X is called a random variable on (Ω, G) if it is measurable with respect to G, that is if

   σ (X) ⊂ G .

Remark 27 (i) It is useful to know some properties of preimages. We have for any subset B of R, and for any (not necessarily countable) family (Bα)α∈A of subsets of R:

   X−1 (Bc) = (X−1 (B))c
   X−1 (∪α∈A Bα) = ∪α∈A X−1 (Bα)
   X−1 (∩α∈A Bα) = ∩α∈A X−1 (Bα)

(ii) σ (X) is a sigma algebra. Indeed, ∅ = X−1 (∅) ∈ σ (X). Moreover, if A = X−1 (B) for some subset B of R, then

   Ac = (X−1 (B))c = X−1 (Bc) ∈ σ (X) ,

because Bc is a subset of R. Similarly, given a sequence (Ai)i∈N such that Ai = X−1 (Bi) for a sequence of subsets (Bi)i∈N of R we have:

   ∪i∈N Ai = ∪i∈N X−1 (Bi) = X−1 (∪i∈N Bi) ∈ σ (X) ,

because ∪i∈N Bi is a subset of R. (iii) σ (X) represents the (partial) information set that is available about an outcome ω ∈ Ω by observing the values of X.

Example 28 In the two period binomial model S0 , S1 and S2 are all (trivially) measurable with

respect to the ﬁnest sigma algebra F over Ω. However, since S0 is constant we have

σ (S0 ) = {∅, Ω} = G0        ,

and S0 is G0 measurable. Further,

σ (S1 ) = {∅, Ω, {HT, HH} , {T T, T H}} = G1     ,

and S1 is G1 but not G0 measurable. Finally,

   σ (S2) = {∅, Ω, {HH}, {TH, HT, TT}, {TT}, {TH, HT, HH}, {TH, HT}, {TT, HH}} = G3 .

Therefore, S2 is G3 but not G1 measurable. On the other hand, S1 is G1 but not G3 measurable (why?).
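For a finite sample space, σ(X) from Definition 26 can be computed exhaustively: it consists of the preimages X−1(B) over all subsets B of the range of X. A small brute-force sketch, recovering σ(S1) = G1 from Example 28:

```python
from itertools import combinations

# sigma(X) on a finite sample space: the preimages of all subsets of
# the range of X (Definition 26).
def sigma(X, Omega):
    values = sorted(set(X(w) for w in Omega))
    events = set()
    for r in range(len(values) + 1):
        for B in combinations(values, r):
            events.add(frozenset(w for w in Omega if X(w) in B))
    return events

Omega = {"TT", "TH", "HT", "HH"}
S1 = lambda w: 8 if w[0] == "H" else 2  # first-period price in Example 1
G = sigma(S1, Omega)
print(len(G))  # 4: the empty set, Omega, and the two first-period events
```

The four events returned are exactly ∅, Ω, {HT, HH} and {TT, TH}, i.e. the sigma algebra G1.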

1.2.4     Expected Value of Random Variables Defined on Finite Measurable Spaces

For the whole section let (Ω, G, P ) be a probability space such that the cardinality of Ω is ﬁnite

(|Ω| < ∞). We will extend the concept of expected value of a random variable to the non ﬁnite

sample space setting in a later section. Further, let X : (Ω, G) → R be a random variable.

Definition 29 (i) The expected value E (X) of a random variable X defined on a finite sample space is given by

   E (X) := Σω∈Ω X (ω) P (ω) .

(ii) The variance Var (X) of X is given by

   Var (X) := E ((X − E (X))²) = E (X²) − (E (X))² .

Example 30 In the two period binomial model of Example 1 we have:

   S2 (HH) = 16 ;  P (HH) = p²
   S2 (HT) = S2 (TH) = 4 ;  P (TH) = P (HT) = p (1 − p)
   S2 (TT) = 1 ;  P (TT) = (1 − p)²

Therefore,

   E (S2) = 16 · p² + 4 · 2 · p (1 − p) + 1 · (1 − p)² .
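The expectation in Example 30 can be evaluated directly from Definition 29; the value p = 1/2 below is an illustrative choice, not fixed by the text.

```python
# Expected value of S2 in Example 30 via Definition 29:
# E(X) = sum over omega of X(omega) P(omega); p = 0.5 is illustrative.
p = 0.5
prob = {"HH": p**2, "HT": p*(1-p), "TH": p*(1-p), "TT": (1-p)**2}
S2 = {"HH": 16, "HT": 4, "TH": 4, "TT": 1}

E_S2 = sum(S2[w] * prob[w] for w in prob)
print(E_S2)  # 6.25
```

For p = 1/2 each outcome has probability 1/4, so E(S2) = (16 + 4 + 4 + 1)/4 = 6.25.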

1.2.5   Examples of Probability Spaces and Random Variables with Finite Sample

Space

Example 31 The Bernoulli distribution with parameter p is a probability measure P on the mea-

surable space (Ω, G) given by Ω := {0, 1}, G := F, such that:

P (1) = p ∈ (0, 1)        .

Example 32 The Binomial distribution with parameters n and p is a probability measure P on a measurable space (Ω, G) given below. The sample space is given by

   Ω := {n-dimensional sequences with components 0 or 1} .

For instance, a possible element of Ω is

   ω = 0010100...1111  (n components) .

Further, we set G := F. Finally, P is given by

   P (ω) = p^(# of 1 in ω) (1 − p)^(# of 0 in ω) .

For instance, using the properties of a probability measure we have:

   P (at least a 1 over the n components) = 1 − P (no 1 over the n components) = 1 − (1 − p)^n ,

and so forth.

Example 33 A discrete uniform distribution modelling the toss of a fair die is obtained by setting Ω := {1, 2, 3, 4, 5, 6}, G := F, and

   P (ω) = 1/6 ,  ω ∈ Ω .

For instance, using the properties of a probability measure we then have:

   P (obtaining an even number) = P (2) + P (4) + P (6) = 1/2 ,

and so forth.

Example 34 A discrete uniform distribution modelling the toss of two independent fair dice is obtained by setting

   Ω := {11, 12, 13, 14, 15, 16, 21, 22, ..., 66} ,

G := F, and

   P (ω) = 1/36 ,  ω ∈ Ω .

For instance, using the properties of a probability measure we then have:

   P (the sum of the two numbers is larger than 10) = P (66) + P (56) + P (65) = 1/12 ,

and so forth. Let X : Ω → {2, 3, 4, ..., 12} be the function giving the sum of the numbers on the two dice. We have:

   σ (X) = {∅, Ω, {11}, {12, 21}, {13, 31, 22}, ...} ⊂ F ,

where {11} = X−1 (2), {12, 21} = X−1 (3), {13, 31, 22} = X−1 (4), and so on; that is, X is a random variable on (Ω, F).
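The computations in Example 34 can be reproduced by enumerating all 36 outcomes; using exact fractions keeps the probabilities in the form 1/36, 1/12, and so on.

```python
from fractions import Fraction

# The uniform space of two fair dice (Example 34) and events defined
# through conditions on the outcomes.
Omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    """Probability of {omega : event(omega)} under the uniform law."""
    return sum(Fraction(1, 36) for w in Omega if event(w))

print(P(lambda w: w[0] + w[1] > 10))  # 1/12: outcomes 56, 65, 66
```

The same helper gives P(Ω) = 1 and reproduces any of the event probabilities in the example.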

1.3    General Probability Spaces

Deﬁnition 21 of a probability space does not require the assumption |Ω| < ∞.

1.3.1     Some First Examples of Probability Spaces with non finite Sample Spaces

A ﬁrst simple example of a probability space deﬁned on a non ﬁnite sample space is the following.

Example 35 Let Ω = R, G = B (R) and define

   P (A) = µ0 (A ∩ [0, 1]) .

P is a probability measure, the uniform distribution on the interval [0, 1]. Indeed, we have:

1. P (Ω) = µ0 (Ω ∩ [0, 1]) = µ0 ([0, 1]) = 1.

2. For any disjoint sequence (Ai)i∈N ⊂ B (R) it follows

   P (∪i∈N Ai) = µ0 ((∪i∈N Ai) ∩ [0, 1]) = µ0 (∪i∈N (Ai ∩ [0, 1])) = Σi∈N µ0 (Ai ∩ [0, 1]) = Σi∈N P (Ai) .

More generally, setting

   P (A) = µ0 (A ∩ [a, b]) / µ0 ([a, b]) ,

defines a uniform distribution on the interval [a, b].

Example 36 Let Ω := N and G := F. Thus in this case Ω is an infinite, countable, sample space. We define for any ω ∈ Ω

   P (ω) := (λ^ω / ω!) e^(−λ) ,  λ > 0 .

Setting for A ∈ F

   P (A) := Σω∈A P (ω) ,

one obtains the Poisson distribution on (N, F) with parameter λ. P is a probability measure on (Ω, F). Indeed, we have

   P (Ω) = Σω∈Ω P (ω) = Σ_{k=0}^∞ (λ^k / k!) e^(−λ) = 1 ,

and, for any disjoint sequence (Ai)i∈N ⊂ F,

   P (∪i∈N Ai) = Σ_{ω ∈ ∪i∈N Ai} P (ω) = Σi∈N Σω∈Ai P (ω) = Σi∈N P (Ai) .
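The normalization P(Ω) = 1 in Example 36 can be checked numerically by truncating the exponential series; λ = 3 below is an illustrative choice.

```python
import math

# Poisson weights P(w) = (lambda^w / w!) e^{-lambda} on N (Example 36);
# the truncated series over w = 0, ..., 99 already sums to 1 up to a
# negligible tail.
lam = 3.0

def P(w):
    return lam**w / math.factorial(w) * math.exp(-lam)

total = sum(P(w) for w in range(100))
print(total)  # close to 1.0
```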

The last example of a probability space with non finite sample space that we present is the one underlying a Binomial experiment where n → ∞.

Example 37 Let Ω := {T, H}^∞ be the space of infinite sequences with components T or H. Thus any outcome ω ∈ Ω is of the form

   ω = (ωi)i∈N ,  ωi ∈ {T, H} .

This is an infinite, uncountable, sample space. Therefore, some caution is needed in constructing a suitable sigma algebra on Ω, on which we can then extend the binomial distribution in a consistent way. We define

Gn := {The sigma algebra generated by the ﬁrst n tosses}                          ,

for any n ∈ N. For instance, we obtain for G1 :

G1 = {∅, Ω, {ω ∈ Ω : ω1 = T } , {ω ∈ Ω : ω1 = H}}                      ,

and so on for n > 1. We know that there is a sigma algebra F over Ω such that Gn ⊂ F for

all n ∈ N. However, this sigma algebra is too large to assign binomial probabilities on it in a

consistent way. Therefore, we work in the sequel with the smallest sigma algebra containing all

Gn's. We define

   G := ∩ { H : H is a sigma algebra over Ω with ∪n∈N Gn ⊂ H } ,

the sigma algebra generated by ∪n∈N Gn. Notice that G contains events that can be quite rich and that do not belong to any Gn, n ∈ N. An example of such an event is

   A := {H on every toss} = {ω ∈ Ω : ωi = H for all i ∈ N} = ∩n∈N {ω ∈ Ω : ωi = H for i ≤ n} ∈ G ,

where each set

   {ω ∈ Ω : ωi = H for i ≤ n} = {H on the first n tosses}

belongs to Gn.

We now define a probability measure P on G whose restriction to any Gn is a binomial distribution with parameters n and p. Precisely, define for any A ∈ Gn and some given n ∈ N

   P (A) = p^(# of H in the first n tosses) (1 − p)^(# of T in the first n tosses) .

For instance, for the event

{H on the ﬁrst 2 tosses} = {ω ∈ Ω : ωi = H for i ≤ 2}               ,

we obtain

P (H on the ﬁrst 2 tosses) = p2            ,

and so forth. Using the properties of a probability measure we can then uniquely extend P to all

of G. For instance, we have

P (H on all tosses) ≤ P (H on the first n tosses) = p^n ,

for all n ∈ N. Therefore, for p ∈ (0, 1) it follows

P (H on all tosses) = 0            .

1.3.2     Continuity Properties of Probability Measures

Two further continuity properties of a probability measure, in addition to the properties in Proposition 23, are useful when working with countable set operations over monotone sequences of events. They are given below.

Proposition 38 Let (An)n∈N ⊂ G be a countable sequence of events. It then follows:

1. If A1 ⊂ A2 ⊂ ..., then:

   P (An) ↑ P (∪n∈N An)  as n → ∞

(continuity from below).

2. If A1 ⊃ A2 ⊃ ..., then:

   P (An) ↓ P (∩n∈N An)  as n → ∞

(continuity from above).

Proof. 1. Let A := ∪n∈N An. We have

   A = ∪n∈N (An\An−1) ,

where A0 := ∅. Thus, under the given assumption the event A is written as a countable, disjoint, union of subsets of G. It then follows, using the properties of a probability measure,

   P (A) = Σn∈N P (An\An−1) = Σn∈N (P (An) − P (An−1)) = lim_{n→∞} (P (An) − P (A0)) = lim_{n→∞} P (An) .

2. We have

   P (An) ↓ P (∩n∈N An)  ⇔  P (An^c) ↑ P ((∩n∈N An)^c) = P (∪n∈N An^c) ,

by de Morgan's law. The proof now follows from 1.
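Continuity from below can be illustrated with the uniform distribution of Example 35: for the increasing events An = [0, 1 − 1/n] the probabilities 1 − 1/n increase to P([0, 1)) = 1. A minimal numerical sketch:

```python
# Continuity from below (Proposition 38) for the uniform law on [0, 1]:
# P([a, b]) is the length of [a, b] intersected with [0, 1].
def P_uniform(a, b):
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

# A_n = [0, 1 - 1/n] is an increasing sequence of events.
probs = [P_uniform(0.0, 1.0 - 1.0 / n) for n in range(1, 6)]
print(probs)  # increasing: 0, 1/2, 2/3, 3/4, 4/5, approaching 1
```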

1.3.3     Random Variables

For the whole section let (Ω, G, P ) be a probability space and (R, B (R)) be a Borel measurable

space over R.

When working with uncountable sample spaces, the measurability requirement behind Definition 26 of a random variable for finite sample spaces has to be modified. Basically, we are going to require measurability only for the preimages of Borel subsets of R, rather than for the preimages of arbitrary subsets of R. This is a necessary step in order to be able to assign probabilities consistently to Borel events determined by the images of some random variable on (Ω, G, P).

Definition 39 Let X : Ω → R be a real valued function. (i) The sigma algebra

   σ (X) := { X−1 (B) : B ∈ B (R) } ,

is the sigma algebra generated by X. (ii) X is a random variable on (Ω, G) if

   σ (X) ⊂ G .

Example 40 For a set A ⊂ Ω let a function 1A : Ω → {0, 1} be defined by

   1A (ω) = 1 if ω ∈ A, and 1A (ω) = 0 otherwise.

1A is called the indicator function of the set A. We have (please verify)

   σ (1A) = {∅, Ω, A, Ac} .

Hence, 1A is a random variable over (Ω, G) if and only if A ∈ G.

The measurability property in Definition 39 allows us to assign in a natural way probabilities also to Borel events that are induced by images of random variables, as is illustrated in the next example.

Example 41 Let X be a random variable on a probability space (Ω, G, P). For any event B ∈ B(R) we define

L_X(B) := P(X^{-1}(B)) .                                               (4)

L_X is a probability measure on B(R), the probability distribution of X (or the probability induced by X on B(R)). Remark that (4) is well defined precisely because of the measurability of the random variable X. Showing that L_X is indeed a probability measure is very simple. In fact, we have:

L_X(R) = P(X^{-1}(R)) = P(X ∈ R) = P(Ω) = 1 .

Moreover, for any sequence (B_i)_{i∈N} of disjoint events we obtain:

L_X( ∪_{i∈N} B_i ) = P( X^{-1}( ∪_{i∈N} B_i ) ) = P( ∪_{i∈N} X^{-1}(B_i) ) = Σ_{i=1}^{∞} P(X^{-1}(B_i)) = Σ_{i=1}^{∞} L_X(B_i) ,

using in the third equality the fact that (B_i)_{i∈N} (and thus also (X^{-1}(B_i))_{i∈N}) is a sequence of disjoint events.
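On a finite sample space the induced measure L_X of Example 41 is a finite sum, so its defining properties can be checked exactly; the space and X below are illustrative assumptions (X counts heads in two coin tosses).

```python
from fractions import Fraction

# illustrative probability space and random variable
P = {"HH": Fraction(1, 4), "HT": Fraction(1, 4),
     "TH": Fraction(1, 4), "TT": Fraction(1, 4)}
X = {"HH": 2, "HT": 1, "TH": 1, "TT": 0}   # number of heads

def L_X(B):
    """L_X(B) = P({omega : X(omega) in B})."""
    return sum(p for omega, p in P.items() if X[omega] in B)

print(L_X({0, 1, 2}))       # total mass 1
print(L_X({1}))             # P(X = 1) = 1/2
print(L_X({0}) + L_X({2}))  # additivity over disjoint events
```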

Checking the measurability of a candidate random variable directly from the definition can be a hard and lengthy task, since we would have to check the preimages of every Borel subset of R. Fortunately, the next result offers a much simpler criterion, by which measurability is easy to verify in many applications.

Proposition 42 For a function X : Ω → R let

E := { X^{-1}((−∞, t)) : t ∈ R } = { {X < t} : t ∈ R } ,

be the set of preimages of open intervals of the form (−∞, t) under X. Then it follows:

E ⊂ G  ⇔  σ(X) ⊂ G .

Proof. Define:

H := { B ∈ B(R) : X^{-1}(B) ∈ G } ⊂ B(R) .

It is sufficient to show that under the given conditions B(R) ⊂ H, i.e. B(R) = H. We start by showing that H is a sigma algebra. We have first

X^{-1}(∅) = ∅ ∈ G ,

hence ∅ ∈ H. Second, for a set B ∈ H it follows

X^{-1}(B^c) = ( X^{-1}(B) )^c ∈ G ,

since X^{-1}(B) ∈ G. Finally, for a sequence (B_n)_{n∈N} ⊂ H we have

X^{-1}( ∪_{n∈N} B_n ) = ∪_{n∈N} X^{-1}(B_n) ∈ G ,

since each X^{-1}(B_n) ∈ G, showing that H is a sigma algebra as claimed. Since B(R) is by definition the smallest sigma algebra containing all open intervals on the real line, it is sufficient to show that under the given conditions H contains all open intervals on the real line. To this end, recall that all sets of the form (−∞, t) are by assumption elements of H. For a general open interval (a, b), a ≤ b, it then follows:

X^{-1}((a, b)) = X^{-1}( (−∞, b) ∩ ∪_{n∈N} (−∞, a + 1/n)^c )
             = X^{-1}((−∞, b)) ∩ ∪_{n∈N} ( X^{-1}((−∞, a + 1/n)) )^c ∈ G ,

since X^{-1}((−∞, b)) ∈ G and each X^{-1}((−∞, a + 1/n)) ∈ G. This concludes the proof of the proposition.

Example 43 Let (X_n)_{n∈N} be an arbitrary sequence of random variables on (Ω, G). It then follows:

1. aX_1 + bX_2 is a random variable for any a, b ∈ R

2. sup_{n∈N} X_n and inf_{n∈N} X_n are random variables

3. lim sup X_n := lim_{n→∞} sup_{k≥n} X_k and lim inf X_n := lim_{n→∞} inf_{k≥n} X_k are random variables.

Proof. We apply Proposition 42 several times. 1. For a, b ≠ 0 we have

{aX_1 + bX_2 < t} = ∪_{r∈Q} ( {aX_1 < r} ∩ {bX_2 < t − r} ) = ∪_{r∈Q} ( {X_1 < r/a} ∩ {X_2 < (t − r)/b} ) ∈ G ,

since the events {X_1 < r/a} and {X_2 < (t − r)/b} are in G. For statement 2. we obtain:

{ sup_{n∈N} X_n < t } = ∩_{n∈N} {X_n < t} ∈ G ,   { inf_{n∈N} X_n < t } = ∪_{n∈N} {X_n < t} ∈ G .

3. For any n ∈ N it follows that Y_n := sup_{k≥n} X_k and Z_n := inf_{k≥n} X_k are random variables, by 2. Moreover, the sequences (Y_n)_{n∈N} and (Z_n)_{n∈N} are monotonically decreasing and increasing, respectively. Therefore:

{ lim sup X_n < t } = { lim_{n→∞} Y_n < t } = ∪_{n∈N} {Y_n < t} ∈ G ,

{ lim inf X_n < t } = { lim_{n→∞} Z_n < t } = ∩_{n∈N} {Z_n < t} ∈ G .

This concludes the proof.

1.3.4   Expected Value and Lebesgue Integral

For the whole section let (Ω, G, P ) be a probability space and (R, B (R)) be the Borel measurable

space over R.

The expected value of a general random variable is deﬁned as its Lebesgue integral with

respect to some probability measure P on (Ω, G). More generally, Lebesgue integrals of measurable

functions can be deﬁned with respect to some measure (as for instance Lebesgue measure µ0 )

deﬁned on a corresponding measurable space (as for instance the measurable space (R, B (R))).

The construction of the Lebesgue integral for a general random variable X starts by defining the value of the Lebesgue integral for linear combinations of indicator functions, then extends the integral to functions that are pointwise monotone limits of sequences of simple functions, and finally defines the integral for the more general case of an integrable random variable (see the precise definition below).

Definition 44 (i) A random variable X is simple if

X = Σ_{i=1}^{n} c_i 1_{A_i} ,

where n ∈ N, c_1, .., c_n ∈ R, and A_1, .., A_n ∈ G are mutually disjoint events. The (vector) space of simple random variables on (Ω, G) is denoted by S(G). The expected value E(X) of a simple function X is defined by

E(X) := ∫_Ω X dP := Σ_{i=1}^{n} c_i P(A_i) .

(ii) Let X ≥ 0 be a non negative random variable. The expected value E(X) of X is defined by

E(X) := ∫_Ω X dP := sup { ∫_Ω Y dP : Y ≤ X and Y ∈ S(G) } .

(iii) A random variable X is integrable if

E(X^+) < ∞ ,   E(X^−) < ∞ ,

where X^+ := max(X, 0) and X^− := max(−X, 0) are the positive and negative part of X, respectively. We denote the (vector) space of integrable random variables by L^1(P). For any X ∈ L^1(P) the expected value E(X) of X is defined by

E(X) = E(X^+) − E(X^−) .

(iv) Finally, for a random variable X ∈ L^1(P) and a set A ∈ G we define

∫_A X dP := ∫_Ω 1_A X dP .

Remark 45 (i) The key point in the definition of E(X) is (ii). In fact, (ii) is a quite reasonable definition because for any random variable X ≥ 0 there always exists a sequence (X_n)_{n∈N} of simple random variables converging monotonically pointwise to X from below. Such a sequence is obtained for instance by setting for any ω ∈ Ω

X_n(ω) = Σ_{k=1}^{n2^n} ((k − 1)/2^n) 1_{{(k−1)/2^n < X ≤ k/2^n}}(ω) + n 1_{{X>n}}(ω) .

Moreover, it can be shown that the limit of the sequence of integrals E(X_n) does not depend on the choice of the specific approximating sequence. Therefore, (ii) in Definition 44 could also be equivalently written as

E(X) := lim_{n→∞} E(X_n) := lim_{n→∞} ∫_Ω X_n dP ,

for a given approximating sequence (X_n)_{n∈N}. (ii) As mentioned, expected values are by definition just integrals of measurable functions with respect to some probability measure. In fact, the definition of the Lebesgue integral of a measurable function with respect to some measure µ, say, follows exactly the same steps as above, simply by replacing everywhere the probability measure P with the measure µ in (i), (ii), (iii) and (iv).
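The staircase approximation of Remark 45 can be made concrete numerically. In the sketch below the random variable X(ω) = ω² on Ω = [0, 1] with the uniform probability is an illustrative assumption; E(X_n) then increases to E(X) = 1/3 from below.

```python
import math

def E_Xn(n):
    """E(X_n) for X(w) = w^2 under the uniform law on [0, 1].

    Here P((k-1)/2^n < X <= k/2^n) = sqrt(k/2^n) - sqrt((k-1)/2^n)
    as long as k/2^n <= 1, and since X <= 1 the tail term n*1_{X>n}
    contributes nothing.
    """
    total = 0.0
    for k in range(1, n * 2**n + 1):
        lo, hi = (k - 1) / 2**n, k / 2**n
        prob = math.sqrt(min(hi, 1.0)) - math.sqrt(min(lo, 1.0))
        total += lo * prob
    return total

for n in (1, 4, 8, 12):
    print(n, E_Xn(n))   # values increase toward E(X) = 1/3
```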

Let us discuss some ﬁrst (very) simple examples of expected values computed using the above

deﬁnitions.

Example 46 Let Ω := R, G := B(R) and set for any A ∈ G

P(A) = µ_0(A ∩ [0, 1]) .

The expected value of X := 1_Q is

E(X) = 1 · µ_0(Q ∩ [0, 1]) = 0 ,

because Q is a countable set. Notice that this function is not Riemann integrable in the usual sense. The expected value of

Y(ω) := ∞ if ω = 0, and Y(ω) := 0 otherwise,

can be computed as the limit of the expected values in an approximating sequence (X_n)_{n∈N} of simple functions given by

X_n(ω) := n if ω = 0, and X_n(ω) := 0 otherwise.

Hence:

E(Y) = lim_{n→∞} E(X_n) = lim_{n→∞} n · µ_0({0} ∩ [0, 1]) = lim_{n→∞} n · µ_0({0}) = 0 .

Notice that also Y is not Riemann integrable in the usual sense.

The basic properties of the above integral deﬁnition are collected in the next proposition.

Proposition 47 Let X, Y ∈ L^1(P) and a, b ∈ R; it then follows:

1. E(aX + bY) = aE(X) + bE(Y)

2. If X ≤ Y pointwise, then

E(X) ≤ E(Y)

3. For two sets A, B ∈ G such that A ∩ B = ∅ it follows

∫_{A∪B} X dP = ∫_Ω 1_{A∪B} X dP = ∫_Ω (1_A + 1_B) X dP = ∫_A X dP + ∫_B X dP ,

using 1. in the last equality.

Proof. 1. For brevity we show this property only for indicator functions X = 1_A, Y = 1_B, where A, B ∈ G are disjoint events. We have, by (i) of Definition 44,

E(aX + bY) = E(a1_A + b1_B) = aP(A) + bP(B) = aE(1_A) + bE(1_B) = aE(X) + bE(Y) .

2. If Y − X ≥ 0, then there exists a sequence of simple approximating functions X_n = Σ_{i=1}^{k_n} c_{in} 1_{A_{in}} ≥ 0 converging monotonically to Y − X. This implies:

E(Y) − E(X) = E(Y − X) = lim_{n→∞} E(X_n) = lim_{n→∞} Σ_{i=1}^{k_n} c_{in} P(A_{in}) ≥ 0 ,

using 1. in the first equality and the fact that c_{1n}, .., c_{k_n n} ≥ 0 for any n ∈ N.

1.3.5   Some Further Examples of Probability Spaces with uncountable Sample Spaces

For the whole section let (Ω, G, P ) be a probability space and (R, B (R)) be the Borel measurable

space over R.

Using Lebesgue integrals we are also able to construct probability measures by integrating a

suitable (density) function over events A ∈ G. A well-known example in this respect arises by

integrating the density function of a standard normal distribution.

Example 48 (Ω, G) := (R, B(R)); φ : R → R_+ is defined by

φ(x) = (1/√(2π)) exp(−x²/2) ,   x ∈ R .

φ is the density function of a standard normally distributed random variable and is such that

∫_R φ(x) dµ_0(x) = ∫_{−∞}^{∞} φ(x) dx = 1 ,

i.e. φ ∈ L^1(µ_0). A standard normal probability distribution P on (R, B(R)) is obtained by setting for any A ∈ G:

P(A) := ∫_A φ(x) dµ_0(x) .

It is straightforward to verify, using the basic properties of Lebesgue integrals together with some monotone convergence property, that P is indeed a probability measure.
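The normalization ∫ φ dµ_0 = 1 and the measure P(A) = ∫_A φ dµ_0 of Example 48 can be checked numerically; the sketch below uses a simple midpoint rule as an illustrative numerical stand-in for the Lebesgue integral.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(phi, -10.0, 10.0)   # mass beyond +-10 is negligible
p_A = integrate(phi, -1.96, 1.96)     # P(A) for A = [-1.96, 1.96]
print(total)  # close to 1
print(p_A)    # close to 0.95
```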

More generally, densities can be also deﬁned on abstract probability spaces, as is demonstrated

in the next ﬁnal example.

Example 49 Let X ≥ 0 be a random variable on (Ω, G) such that X ∈ L^1(P), and define

Q(A) := E(1_A X)/E(X) = E( 1_A X/E(X) ) .

It is easy to verify, using the basic properties of Lebesgue integrals together with some monotone convergence property, that Q is a further probability measure on (Ω, G). Moreover, the absolute continuity property

P(A) = 0 ⇒ Q(A) = 0 ,

follows from the definition. If, moreover,

P(A) = 0 ⇐⇒ Q(A) = 0 ,

the probabilities Q and P are called equivalent. This property holds when X > 0. The random variable Z := X/E(X) is called the Radon-Nikodym derivative of Q with respect to P, denoted by dQ/dP. By construction dQ/dP is a density function on (Ω, G) because dQ/dP ≥ 0 and

E(dQ/dP) = E( X/E(X) ) = 1 .

1.4    Stochastic Independence

For the whole section let (Ω, G, P ) be a probability space.

Deﬁnition 50 Two events A, B ∈ G are (stochastically) independent if

P (A ∩ B) = P (A) P (B)         .                              (5)

We use the notation A ⊥ B to denote two independent events.

Remark 51 Condition (5) states that two events are independent if and only if their conditional and unconditional probabilities are the same, i.e.:

P(A|B) := P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A) ,

using A ⊥ B in the second equality (provided of course P(B) > 0). This property is symmetric in A, B.

Example 52 Stochastic independence is a feature determined by the structure of the underlying probability P. As an illustration of this fact consider again the two period binomial model of Example 1. We have there:

P({HH, HT}) P({HT, TH}) = (p² + p(1 − p)) (2p(1 − p)) = 2p²(1 − p) ,                    (6)

and

P({HH, HT} ∩ {HT, TH}) = P(HT) = p(1 − p) .                                             (7)

Therefore, (6) and (7) are equal if and only if p = 1/2, that is, the only binomial probability under which the above events are independent is the one implied by p = 1/2.
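The computation in Example 52 can be verified directly:

```python
def lhs(p):
    """P({HH, HT}) * P({HT, TH}) in the two-period binomial model."""
    return (p**2 + p * (1 - p)) * (2 * p * (1 - p))

def rhs(p):
    """P({HH, HT} n {HT, TH}) = P(HT)."""
    return p * (1 - p)

print(abs(lhs(0.5) - rhs(0.5)) < 1e-12)  # True: independent at p = 1/2
print(abs(lhs(0.3) - rhs(0.3)) < 1e-12)  # False: dependent otherwise
```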

The concept of stochastic independence between events can be naturally extended to stochastic

independence between information sets, i.e. sigma algebras.

Deﬁnition 53 Two sigma algebras G1 , G2 ⊂ G are stochastically independent if for all A ∈ G1

and B ∈ G2 one has A ⊥ B. We use the notation G1 ⊥ G2 to denote independent sigma algebras.

Example 54 In the two period binomial model of Example 1 we deﬁne the two following sigma

algebras:

G1 := {∅, Ω, {HT, HH} , {T T, T H}}    ,

the sigma algebra generated by the ﬁrst price increment, and

G2 := {∅, Ω, {HH, T H} , {T T, HT }}   ,

the sigma algebra generated by the second price movement. We then have, for any p ∈ [0, 1]:

G_1 ⊥ G_2 .

For instance, for the sets {HT, HH} and {HH, TH} one obtains

P({HT, HH}) P({HH, TH}) = (p² + p(1 − p)) (p² + (1 − p)p) = p² ,

and

P({HT, HH} ∩ {HH, TH}) = P(HH) = p² .

These features derive directly from the way probabilities are assigned by a binomial distribution, where

P(ω) = p^{# of H in ω} (1 − p)^{# of T in ω} .

Finally, we can also deﬁne independence between random variables as independence of the

information sets they generate.

Deﬁnition 55 Two random variables X,Y on (Ω, G, P ) are independent if

σ (X) ⊥ σ (Y )         .

We use the notation X ⊥ Y to denote independence between random variables.

Example 56 We already discussed that the two sigma algebras G1 , G2 of Example 54 are inde-

pendent in the binomial model. Notice that we have (please verify!)

G1 = {∅, Ω, {HT, HH} , {T T, T H}} = σ (S1 /S0 )             ,

and

G2 := {∅, Ω, {HH, T H} , {T T, HT }} = σ (S2 /S1 )           .

Therefore, the stock price returns S1 /S0 and S2 /S1 in a binomial model are stochastically inde-

pendent.

Example 57 Let A, B ∈ G be two independent events and let the functions 1_A, 1_B : Ω → {0, 1},

1_A(ω) = 1 if ω ∈ A and 0 otherwise,   1_B(ω) = 1 if ω ∈ B and 0 otherwise,

be the indicator functions of the sets A and B, respectively. We then have (please verify):

σ(1_A) = {∅, Ω, A, A^c} ,   σ(1_B) = {∅, Ω, B, B^c} .

Therefore, 1_A ⊥ 1_B if and only if A ⊥ B (please verify).

Some properties related to independence are important. The ﬁrst one says that independence

is maintained under (measurable) transformations.

Proposition 58 Let X,Y be independent random variables on (Ω, G, P ) and h, g : R → R be two

(measurable) functions. It then follows:

h (X) ⊥ g (Y )        .

Proof. We give a graphical proof of this statement, which makes use of the fact that the sigma algebra generated by a composite mapping is contained in the sigma algebra generated by the first function in the composition:

σ(h(X)) ⊂ σ(X) ⊥ σ(Y) ⊃ σ(g(Y)) ,

where the independence in the middle holds by assumption. Hence any events in σ(h(X)) and σ(g(Y)) also belong to σ(X) and σ(Y), respectively, and are therefore independent.

The second important property of stochastic independence is related to the expectation of a

product of random variables.

Proposition 59 Let X,Y be independent random variables on (Ω, G, P ). It then follows

E (XY ) = E (X) E (Y )             .

Proof. For the sake of brevity we give the proof for the simplest case where X = 1_A, Y = 1_B, for events A, B ∈ G such that A ⊥ B. As usual, the extension of this result to more general settings requires considering linear combinations of indicator functions, i.e. simple functions, and pointwise limits of simple functions. For the given simplified setting we have:

E(XY) = E(1_A 1_B) = E(1_{A∩B}) = 1 · P(A ∩ B) + 0 · P((A ∩ B)^c)
      = P(A ∩ B) = P(A) P(B) = E(1_A) E(1_B) = E(X) E(Y) ,

using A ⊥ B in the fifth equality. This concludes the proof.
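Proposition 59 can also be checked exactly on a finite product space, where independence holds by construction of the product measure; the marginal laws below are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# illustrative marginal laws of X and Y (value -> probability)
law_X = {0: Fraction(1, 3), 3: Fraction(2, 3)}
law_Y = {1: Fraction(1, 2), 5: Fraction(1, 2)}

# the product measure makes the two coordinate maps independent
E_XY = sum(px * py * x * y for (x, px), (y, py)
           in product(law_X.items(), law_Y.items()))
E_X = sum(px * x for x, px in law_X.items())
E_Y = sum(py * y for y, py in law_Y.items())

print(E_XY, E_X * E_Y)  # E(XY) = E(X) E(Y)
```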

2     Conditional Expectations and Martingales

For the whole section let (Ω, G, P ) be a probability space.

2.1    The Binomial Model Once More

For later reference, we summarize the structure of a general n−period binomial model, since it

will be used to illustrate some of the concepts introduced below.

• I := {0, 1, 2, .., n} is a discrete time index representing the available transaction dates in the model

• The sample space is given by Ω := {sequences of n coordinates H or T}, with single outcomes ω of the form

ω = (T T T H..H T)  (n coordinates),

for instance.

• G := F, the sigma algebra of all subsets of Ω

• Dynamics of the stock price and money account:

S_t = uS_{t−1} with probability p, and S_t = dS_{t−1} with probability 1 − p ;   B_t = (1 + r)B_{t−1} ,

for given B_0 = 1, S_0, and where

u = 1/d ,   u > 1 + r > d .

The sequence (St )t=0,..,n is a sequence of random variables deﬁned on a single probability space

(Ω, G, P ). This is an example of a so called stochastic process on (Ω, G, P ). Associated with

stochastic processes are ﬂows of information sets (i.e. sigma algebras) generated by the process

history up to a given time. For instance, for any t ∈ I we can deﬁne

G_t := σ(σ(S_0), σ(S_1), .., σ(S_t)) := σ( ∪_{k=0}^{t} σ(S_k) ) ,

the smallest sigma algebra containing all sigma algebras generated by S0 , S1 ,...,St . Gt represents

the information about a single outcome ω ∈ Ω which can be obtained exclusively by observing the

price process up to time t. Clearly,

Gt ⊂ Gs ⇐⇒ t ≤ s .

Therefore, the sequence (Gt )t=0,..,n constitutes a ﬁltration, the ﬁltration generated by the process

(St )t=0,..,n .
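The model just summarized is easy to materialize computationally. The sketch below (the parameters n and p are illustrative assumptions) builds the sample space Ω of H/T sequences, the binomial probability, and the atoms of G_t, i.e. the groups of outcomes that are indistinguishable after observing the first t price movements:

```python
from itertools import product

n, p = 3, 0.5  # illustrative model parameters

# sample space: all sequences of n coordinates H or T
Omega = ["".join(w) for w in product("HT", repeat=n)]

def P(omega, p=p):
    """Binomial probability: p^{# of H} (1 - p)^{# of T}."""
    h = omega.count("H")
    return p**h * (1 - p)**(n - h)

def atoms(t):
    """Atoms of G_t: outcomes grouped by their first t coordinates."""
    groups = {}
    for omega in Omega:
        groups.setdefault(omega[:t], []).append(omega)
    return list(groups.values())

print(len(Omega), sum(P(w) for w in Omega))  # 8 outcomes, total mass 1.0
print(atoms(1))  # two atoms: paths starting with H, paths starting with T
```

Observing more coordinates refines the partition, which is the computational face of G_t ⊂ G_s for t ≤ s.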

2.2      Sub Sigma Algebras and (Partial) Information

We model partial information about single outcomes ω ∈ Ω or about single events A ∈ G using

sub sigma algebras of G.

Example 60 Let X be a random variable on (Ω, G). Then σ (X) is (by deﬁnition) a sub sigma

algebra of G. σ (X) represents the partial information about an outcome ω ∈ Ω which can be obtained by observing X (ω). For instance, set n = 3 in the above binomial model and consider

the outcome ω = (T T T ). By observing S1 , i.e. using σ (S1 ) as the available information set we

can only conclude

ω ∈ {T T T, T HH, T HT, T T H}           (⇔ S1 (ω) = S0 d)       .

However, when observing all price movements from t = 0 to t = 3 we can make use of the sigma

algebra

G_3 := σ( ∪_{t=0}^{3} σ(S_t) ) ,

to fully identify ω ∈ Ω. Both σ(S_1) and G_3 are sub sigma algebras of G, which however represent different pieces of information about ω ∈ Ω.

Based on the above simple considerations we can now formally deﬁne what it means for an

event to be ”realized”.

Definition 61 (i) An event A ∈ G is realized by means of a sub sigma algebra G′ ⊂ G if A ∈ G′. (ii) Let G_t be a sigma algebra generated by some price process1 up to time t. We say that A is realized by means of the price information up to time t if A ∈ G_t .

Remark 62 By definition, realization of an event A ∈ G by means of G′ is precisely measurability of that event with respect to the sub sigma algebra G′. Precisely, given an event A ∈ G we can determine it uniquely using G′, i.e. we can say that A has been realized, if and only if A ∈ G′. For instance, in the above 3−period binomial model we can consider the event

A = {T T T} .

Clearly, A ∉ σ(S_1), since using σ(S_1) we do not know the value of the second and the third coin tosses. Therefore, A is not realized by means of σ(S_1), i.e. it is not realized by means of the price information up to time 1. However,

A ∈ G_3 := σ( ∪_{t=0}^{3} σ(S_t) ) ,

i.e. A is realized by means of the whole price information available up to time 3.

Example 63 The event {The ﬁrst two price returns are both positive} is realized by means of the

price information up to time 2, while the event {The total number of positive price returns is 2}

is not.
1   See for instance the above examples.

2.3     Conditional Expectations

For the whole section let X be a random variable on (Ω, G).

2.3.1   Motivation

Given an event A = X^{-1}(a) ∈ G, for some a ∈ R, we are always able to identify for any ω ∈ A the corresponding value X(ω) of the random variable X using the information set G. Indeed, since σ(X) ⊂ G, we then have by definition

ω ∈ X^{-1}(a) ∈ G ,   i.e. X(ω) = a ,

for all ω ∈ A. However, using a coarser information set G′ ⊂ σ(X) it may happen that we are not able to fully determine the value X(ω) that the random variable X associates to a given single outcome ω ∈ A. Specifically, it may happen that based on the information available in G′ we can only state, for some non singleton set B ∈ B(R),

ω ∈ X^{-1}(B) ,   i.e. X(ω) ∈ B .                                               (8)

In this case, the information set G′ is not sufficiently fine to fully determine the precise value of X(ω) associated with a specific ω ∈ A. Thus, the goal in such a situation is to define a suitable candidate prediction E( X| G′ )(ω) for the unknown value X(ω) based on the information G′. We will call E( X| G′ ) the conditional expectation of X conditionally on G′. Notice that a first necessary requirement on E( X| G′ ) is that it can be fully determined using the information G′, that is, it has to be G′−measurable. Further, a natural idea is to compute the prediction E( X| G′ ) as an unbiased forecast, such that the integrals of E( X| G′ ) and X agree on all sets A ∈ G′:

∫_A E( X| G′ ) dP = ∫_A X dP

(see below the precise definition).

2.3.2     Deﬁnition and Properties

Definition 64 Let G′ ⊂ G be a sub sigma algebra. The conditional expectation E( X| G′ ) of X conditioned on the sigma algebra G′ is a random variable satisfying:

1. E( X| G′ ) is G′−measurable

2. For any A ∈ G′:

∫_A E( X| G′ ) dP = ∫_A X dP ,

(partial averaging property).

In the sequel, we write for any further random variable Y on (Ω, G):

E( X| Y ) := E( X| σ(Y) ) .

Remark 65 (i) E( X| G′ ) exists, provided X ∈ L^1(P); this is a consequence of the so called Radon-Nikodym Theorem. (ii) The random variable E( X| G′ ) is unique, up to events of zero probability. Precisely, if Y and Z are two candidate G′−measurable random variables satisfying 2. of the above definition, then:

P(Y = Z) = 1 .

Example 66 (i) If G′ = {∅, Ω} then E( X| G′ ) = E(X) 1_Ω, that is, conditional expectations conditioned on trivial information sets are unconditional expectations. Indeed, E(X) 1_Ω is G′−measurable and

∫_Ω E(X) 1_Ω dP = E(X) P(Ω) = E(X) = ∫_Ω X dP .

(ii) If X is G′−measurable then E( X| G′ ) = X, that is, if the conditioning information set is sufficiently fine to determine X completely then the conditional expectation of a random variable is the random variable itself. Indeed, in this case we trivially have:

∫_A E( X| G′ ) dP = ∫_A X dP ,

for any set A ∈ G′.

Proposition 67 Let G′ ⊂ G be a sub sigma algebra and X, Y ∈ L^1(P). It then follows:

1. E(E( X| G′ )) = E(X) (Law of Iterated Expectations).

2. For any a, b ∈ R:

E( aX + bY | G′ ) = aE( X| G′ ) + bE( Y | G′ ) ,

(Linearity).

3. If X ≥ 0 then E( X| G′ ) ≥ 0 with probability 1 (Monotonicity).

4. For any sub sigma algebra H ⊂ G′:

E( E( X| G′ )| H) = E( X| H) ,

(Tower Property).

5. If σ(X) ⊥ G′ then

E( X| G′ ) = E(X) 1_Ω ,

(Independence).

6. If V is a G′−measurable random variable such that V X ∈ L^1(P) then

E( V X| G′ ) = V E( X| G′ ) .

Proof. 1. Set A = Ω ∈ G′; by definition it then follows

E(X) = ∫_Ω X dP = ∫_Ω E( X| G′ ) dP = E(E( X| G′ )) .

2. By construction aE( X| G′ ) + bE( Y | G′ ) is G′−measurable. Moreover, for any A ∈ G′:

∫_A (aE( X| G′ ) + bE( Y | G′ )) dP = a ∫_A E( X| G′ ) dP + b ∫_A E( Y | G′ ) dP
                                    = a ∫_A X dP + b ∫_A Y dP
                                    = ∫_A (aX + bY) dP ,

using in the first and the third equality the linearity of Lebesgue integrals and in the second equality the definition of conditional expectations.

3. Let

A := {E( X| G′ ) < 0} ∈ G′ .

Then,

∫_A E( X| G′ ) dP = ∫_A X dP ≥ 0 ,

since X ≥ 0 and by the monotonicity of Lebesgue integrals. Further, the monotonicity of Lebesgue integrals also implies

∫_A E( X| G′ ) dP ≤ 0 ,

since 1_A E( X| G′ ) ≤ 0. Therefore,

∫_A E( X| G′ ) dP = 0 ,

implying P(A) = 0.

4. E( X| H) is by definition H−measurable. Further, for any A ∈ H:

∫_A E( X| H) dP = ∫_A X dP = ∫_A E( X| G′ ) dP =: ∫_A Y dP ,

since A ∈ G′ because H ⊂ G′. By definition, this implies that E( X| H) is the conditional expectation of the random variable Y := E( X| G′ ) conditioned on the sigma algebra H.

5. E(X) 1_Ω is trivially G′−measurable. We show the statement for the case X = 1_B, where B ∈ G. The extension to the general case follows by standard arguments. We have for any A ∈ G′:

∫_A E(X) 1_Ω dP = E(X) P(A) = E(1_B) P(A) = P(B) P(A)
                = P(A ∩ B) = E(1_A 1_B) = ∫_A X dP ,

using in the fourth equality the independence assumption, in the fifth the properties of indicator functions and in the sixth the definition of X.

6. V E( X| G′ ) is G′−measurable. Again, we show the statement for the simpler case V = 1_B, where B ∈ G′. We have for any A ∈ G′:

∫_A V E( X| G′ ) dP = ∫_A 1_B E( X| G′ ) dP = ∫_{A∩B} E( X| G′ ) dP
                    = ∫_{A∩B} X dP = ∫_A 1_B X dP = ∫_A V X dP ,

using in the third equality the definition of conditional expectations, and otherwise the properties of indicator functions.

Example 68 In the n−period Binomial model we have

E(S_1 |σ(S_1)) = S_1 ,

by the σ(S_1)−measurability of S_1. S_2 is not σ(S_1)−measurable. However, we know that

σ(S_2/S_1) ⊥ σ(S_1) .

Therefore,

E(S_2 |σ(S_1)) = E( (S_2/S_1) S_1 | σ(S_1) ) = S_1 E( S_2/S_1 | σ(S_1) ) = S_1 E(S_2/S_1) = S_1 (pu + (1 − p)d) .

More generally, we have

σ(S_t/S_{t−1}) ⊥ G_{t−1} ,

where

G_{t−1} := σ( ∪_{k=0}^{t−1} σ(S_k) ) ,

t = 1, . . . , n. Therefore, by the same arguments:

E(S_t |G_{t−1}) = S_{t−1} (pu + (1 − p)d) .

Finally, the tower property gives after some iterations:

E(S_{t+k} |G_{t−1}) = E( E(S_{t+k} |G_{t+k−1}) |G_{t−1} )
                    = E( S_{t+k−1} (pu + (1 − p)d) |G_{t−1} )
                    = (pu + (1 − p)d) E(S_{t+k−1} |G_{t−1})
                    = ...
                    = (pu + (1 − p)d)^k E(S_t |G_{t−1})
                    = (pu + (1 − p)d)^{k+1} S_{t−1} .

2.4    Martingale Processes

We now introduce a class of stochastic processes that are particularly important in ﬁnance: the

class of martingale processes. Indeed, it will turn out in a later chapter that the price processes of

many ﬁnancial instruments are martingale processes after a suitable change of probability. In this

section we give the necessary deﬁnitions and present some ﬁrst examples of martingale processes.

Deﬁnition 69 (i) Let G := (Gt )t=0,..,n be a ﬁltration over (Ω, G, P ). The quadruplet (Ω, G, G,P ) is

called a ﬁltered probability space. (ii) A stochastic process X := (Xt )t=0,..,n on a ﬁltered probability

space (Ω, G, G,P ) is adapted (is G−adapted) if for any t = 0, .., n the random variable Xt is

Gt −measurable. (iii) A G−adapted process is a martingale if for any t = 0, .., n − 1 one has

Xt = E ( Xt+1 | Gt )    ,                                    (9)

(martingale condition). The process is a submartingale (a supermartingale) if in (9) the ”≤” sign

(the ”≥” sign) holds.

Remark 70 Notice that in Definition 69 both the filtration G and the relevant probability P are crucial in determining the validity of the martingale condition (9) for an adapted process. Indeed, different probabilities and filtrations can imply (9) to be satisfied or not. For instance, in the n−period binomial model we obtained, using the filtration generated by the stock price process,

E(S_t |G_{t−1}) = S_{t−1} (pu + (1 − p)d) .

Therefore, the only binomial probability measure under which the stock price process is a martingale is the one satisfying

pu + (1 − p)d = 1 ,   i.e.  p = (1 − d)/(u − d) .                                      (10)

The binomial probabilities such that p > (1 − d)/(u − d) (p < (1 − d)/(u − d)) imply a stock price process that is a submartingale (a supermartingale).

Being a martingale is a quite strong condition on a stochastic process, which strongly relates

future process coordinates with current ones. This is made more explicit below.

Proposition 71 Let (Xt )t=0,..,n be a martingale on the ﬁltered probability space (Ω, G, G,P ).

1. It then follows for any t, s ∈ {0, 1, . . . , n} such that s ≥ t:

Xt = E ( Xs | Gt )    .

2. If (Yt )t=0,..,n is a further martingale on the ﬁltered probability space (Ω, G, G,P ) and such

that Yn = Xn then Yt = Xt almost surely for all t ∈ {0, 1, . . . , n}.

Proof. 1. The tower property combined with the martingale property implies

X_t = E( X_{t+1} | G_t ) = E( E( X_{t+2} | G_{t+1} )| G_t ) = E( X_{t+2} | G_t ) = ... = E( X_{t+k} | G_t ) ,

with k = s − t.

2. From 1. we have

Xt = E ( Xn | Gt ) = E ( Yn | Gt ) = Yt   .

This concludes the proof.

Example 72 AR(1) process: Let (ε_t)_{t=1,...,n} be an identically distributed, zero mean, adapted process on a filtered probability space (Ω, G, G, P) such that for any t the random variable ε_t is independent from the process history up to time t − 1, i.e.:

σ(ε_t) ⊥ σ( ∪_{i=1}^{t−1} σ(ε_i) ) ,   t = 1, . . . , n .                              (11)

An Autoregressive Process of Order 1 (AR(1)) is defined by

X_t = 0 for t = 0, and X_t = ρX_{t−1} + ε_t for t > 0,

where ρ ∈ R. It is easily seen that (X_t)_{t=0,..,n} is G−adapted. Furthermore, for any t = 1, . . . , n,

E( X_t | G_{t−1} ) = E( ρX_{t−1} + ε_t | G_{t−1} ) = ρE( X_{t−1} | G_{t−1} ) + E( ε_t | G_{t−1} ) = ρX_{t−1} + E(ε_t) = ρX_{t−1} ,

using in the second equality the linearity of conditional expectations, in the third the G_{t−1}−measurability of X_{t−1} and the independence assumption (11), and in the fourth the zero mean property of ε_t (E(ε_t) = 0). Therefore, an AR(1) process is a martingale if and only if ρ = 1. The process resulting for ρ = 1 is called a ”Random Walk” process.
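A short simulation sketch of Example 72 (the Gaussian noise and all parameters are illustrative assumptions): for ρ = 1 the process is a random walk, so its increments X_t − X_{t−1} = ε_t average out to roughly zero, consistent with the martingale condition E(X_t | G_{t−1}) = X_{t−1}.

```python
import random

random.seed(0)

def simulate_ar1(rho, n=10_000):
    """Simulate an AR(1) path and return its one-step increments.

    Each increment equals (rho - 1) * X_{t-1} + eps_t, so for rho = 1
    it reduces to the innovation eps_t itself.
    """
    x, increments = 0.0, []
    for _ in range(n):
        eps = random.gauss(0.0, 1.0)
        x_new = rho * x + eps
        increments.append(x_new - x)
        x = x_new
    return increments

rw = simulate_ar1(1.0)  # random walk case
mean_inc = sum(rw) / len(rw)
print(mean_inc)  # close to 0
```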

Example 73 MA(1) process: Let (ε_t)_{t=0,..,n} be the same process as in Example 72. A Moving Average Process of Order 1 (MA(1)) is defined by

X_t = 0 for t = 0,   X_t = ε_1 for t = 1,   X_t = ε_t + ρε_{t−1} for t > 1,

where ρ ∈ R. It is easily seen that (X_t)_{t=0,..,n} is G−adapted. Furthermore, for any t = 2, .., n we have, similarly to above,

E( X_t | G_{t−1} ) = E( ε_t + ρε_{t−1} | G_{t−1} ) = ρE( ε_{t−1} | G_{t−1} ) + E( ε_t | G_{t−1} ) = ρε_{t−1} + E(ε_t) = ρε_{t−1} .

Therefore,

X_{t−1} = E( X_t | G_{t−1} )   ⇐⇒   ε_{t−1} + ρε_{t−2} = ρε_{t−1} ,

implying that in order to satisfy the martingale condition one must have for all t = 2, .., n

ε_{t−1} = 0 if ρ = 0 ,   ε_{t−1} = (ρ/(ρ − 1)) ε_{t−2} if ρ ≠ 0 .

However, this is in evident contradiction with the independence assumption on the process (ε_t)_{t=0,..,n}. We thus conclude that MA(1) processes can never be martingales.

3     Pricing Principles in the Absence of Arbitrage

This section considers the pricing problem of a general European derivative in the context of an

n−period Binomial pricing model. The model structure is:

• I := {0, 1, 2, .., n} is a discrete time index representing the available transaction dates in the model

• The sample space is given by Ω := {sequences of n coordinates H or T}, with single outcomes ω of the form

ω = (T T T H..H T)  (n coordinates),

for instance.

• G := F, the sigma algebra of all subsets of Ω

• Dynamics of the stock price and money account:

S_t = uS_{t−1} with probability p, and S_t = dS_{t−1} with probability 1 − p ;   B_t = (1 + r)B_{t−1} ,

for given B_0 = 1, S_0, and where

u = 1/d ,   u > 1 + r > d .                                                            (12)

• (Gt )t=0,...,n is the ﬁltration generated by the stock price process (St )t=0,...,n

44
• A binomial probability P on (Ω, F) is obtained by deﬁning for some p ∈ (0, 1),

# of T in ω
P (ω) := p# of H in ω (1 − p)                  .

• A binomial risk adjusted probability measure P on (Ω, F) is obtained by deﬁning

# of T in ω
P (ω) := p# of H in ω (1 − p)                  ,

where
1+r−d
p=         ,                                          (13)
u−d

is under condition (12) a risk adjusted probability in the one period binomial model.

The characterizing property of a risk adjusted probability measure is to make discounted stock

prices under such measure martingales. Therefore, this measure is also called a risk adjusted (or

risk neutral) martingale measure. Existence of a risk adjusted martingale measure is equivalent to

the absence of arbitrage opportunities. We now show formally all these properties in the setting

of a binomial pricing model.
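Under condition (12), formula (13) indeed defines a probability in (0, 1) that equates the one-period expected gross stock return with the riskless gross return 1 + r. A minimal check (parameter values are illustrative, not from the notes):

```python
def risk_neutral_prob(u, d, r):
    """Risk adjusted probability (13) in the one-period binomial model."""
    assert u > 1 + r > d, "no-arbitrage condition (12) violated"
    return (1 + r - d) / (u - d)

u, r = 1.2, 0.05
d = 1 / u                         # the model imposes u = 1/d
p = risk_neutral_prob(u, d, r)
# Under the risk adjusted probability, the one-period expected gross
# stock return equals the riskless gross return 1 + r.
assert abs(p * u + (1 - p) * d - (1 + r)) < 1e-12
assert 0 < p < 1
print(p)
```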

3.1    Stock Prices, Risk Neutral Probability Measures and Martingales

Discounted stock prices are defined next for completeness.

Definition 74 The stochastic process (S_t / B_t)_{t=0,..,n} is called the discounted stock price process.

The terminology of Definition 74 is obvious, since one has for any t ∈ {0, .., n} (recall the normalization B_0 = 1):

S_t / B_t = S_t / (1 + r)^t .

The next proposition shows that under the risk adjusted measure P̃ discounted stock prices are martingales.

Proposition 75 The discounted stock price process

( S_t / B_t , G_t )_{t=0,...,n} ,

is a martingale under the probability P̃.

Proof. Since B_t is deterministic, S_t / B_t is G_t-measurable if and only if S_t is G_t-measurable. Therefore (S_t / B_t , G_t)_{t=0,...,n} is an adapted process. Moreover,

Ẽ ( S_{t+1} / B_{t+1} | G_t ) = Ẽ ( (S_t / B_{t+1}) · (S_{t+1} / S_t) | G_t ) = (S_t / B_t) (1 / (1 + r)) Ẽ ( S_{t+1} / S_t | G_t ) ,

using the G_t-measurability of S_t / B_{t+1}. Therefore, (S_t / B_t , G_t)_{t=0,...,n} is a martingale under P̃ if and only if

Ẽ ( S_{t+1} / S_t | G_t ) = 1 + r .

Indeed, we have

Ẽ ( S_{t+1} / S_t | G_t ) = Ẽ ( S_{t+1} / S_t ) = p̃ u + (1 − p̃) d = 1 + r ,

using in the first equality the independence of the binomial increments and in the second equality definition (13). This concludes the proof.

3.2    Self Financing Strategies, Risk Neutral Probability Measures and Martingales

The martingale property for discounted stock prices under P̃ is valid more generally for dynamic portfolios that are self-financed, i.e. portfolios that are rebalanced at any time using only past capital gains. This fact is very important for the pricing of a derivative, because hedging portfolios are a particular case of a self-financing portfolio.

Definition 76 (i) An adapted process (∆_t , G_t)_{t=0,...,n} with value process (X_t)_{t=0,...,n} defines a portfolio process if ∆_t is the number of stocks at time t in the portfolio and (X_t − ∆_t S_t) / B_t is the number of units of the money account. (ii) A portfolio process (∆_t , G_t)_{t=0,...,n} is self-financed if

X_{t+1} = ∆_t S_{t+1} + (X_t − ∆_t S_t) (1 + r) .

(iii) A self-financed portfolio is an arbitrage opportunity if X_0 = 0 and

X_n ≥ 0 ,   P (X_n > 0) > 0 .

Let us remark a few important aspects of the above definition of a self-financed portfolio.

Remark 77 (i) The adaptedness condition on a portfolio process ensures that at any time t the number of stocks in the portfolio is determined using only information available at that time. (ii) The self-financing condition says that the portfolio value X_{t+1} at any time t + 1 must be obtained as the sum of the stock and money account positions at time t evaluated at t + 1 prices:

X_{t+1} = ∆_t · S_{t+1} + (X_t − ∆_t S_t) (1 + r) = ∆_t · S_{t+1} + ( (X_t − ∆_t S_t) / B_t ) · B_{t+1} ,

where ∆_t is the number of stocks at time t, S_{t+1} the stock price at time t + 1, (X_t − ∆_t S_t) / B_t the number of bonds at time t and B_{t+1} the bond price at time t + 1. (iii) The self-financing condition already implies that the value process (X_t , G_t)_{t=0,...,n} of a self-financed portfolio is adapted. Indeed, for any t = 0, .., n − 1,

X_{t+1} = ∆_t S_{t+1} + (X_t − ∆_t S_t) (1 + r) ,

i.e. X_{t+1} is a linear combination of random variables that are G_{t+1}-measurable, and is therefore G_{t+1}-measurable. (iv) An arbitrage portfolio is simply a self-financed strategy of zero initial cost and with non negative and non zero final value.

The martingale property under P̃ of discounted value processes of self-financed portfolios is proved next.

Proposition 78 The discounted portfolio value process

( X_t / B_t , G_t )_{t=0,...,n} ,

of any self-financed portfolio process (∆_t , G_t)_{t=0,...,n} is a martingale under the probability P̃.

Proof. We already showed that (X_t , G_t)_{t=0,...,n} is an adapted process. Therefore, it remains to show that the martingale condition for the discounted value process (X_t / B_t , G_t)_{t=0,...,n} is satisfied. We have:

Ẽ ( X_{t+1} / B_{t+1} | G_t ) = Ẽ ( ( ∆_t (S_{t+1} − S_t (1 + r)) + X_t (1 + r) ) / B_{t+1} | G_t )
                              = (∆_t / B_{t+1}) Ẽ ( S_{t+1} − S_t (1 + r) | G_t ) + Ẽ ( X_t / B_t | G_t )
                              = (∆_t / B_{t+1}) Ẽ ( S_{t+1} − S_t (1 + r) | G_t ) + X_t / B_t ,

using in the first equality the self-financing definition, in the second the G_t-measurability of ∆_t / B_{t+1} and the linearity of conditional expectations, and in the third the G_t-measurability of X_t / B_t. Thus, (X_t / B_t , G_t)_{t=0,...,n} is a martingale if and only if

Ẽ ( S_{t+1} − S_t (1 + r) | G_t ) = 0 ,

i.e. if and only if

Ẽ ( S_{t+1} / S_t | G_t ) = 1 + r .

This is precisely what we have shown in the proof of Proposition 75. Therefore, the proof is completed.
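Proposition 78 can be illustrated numerically: simulating binomial stock paths under the risk adjusted probability and rolling a self-financed strategy forward, the Monte Carlo mean of the discounted terminal value stays at X_0. The sketch below is illustrative (names and parameter values are assumptions); the constant-delta rule stands in for an arbitrary adapted strategy:

```python
import random

def simulate_discounted_value(n, u, r, s0, x0, delta_rule, n_paths, seed=0):
    """Monte Carlo mean of the discounted terminal value X_n/B_n of a
    self-financed portfolio, simulating stock paths under the risk
    adjusted probability of the binomial model."""
    d = 1 / u
    p = (1 + r - d) / (u - d)            # risk neutral probability (13)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s, x = s0, x0
        for _ in range(n):
            delta = delta_rule(s)        # adapted: uses only current information
            s_next = s * (u if rng.random() < p else d)
            x = delta * s_next + (x - delta * s) * (1 + r)   # self-financing
            s = s_next
        total += x / (1 + r) ** n        # discount by B_n
    return total / n_paths

mean = simulate_discounted_value(5, 1.2, 0.02, 100.0, 10.0,
                                 delta_rule=lambda s: 0.3, n_paths=200_000)
print(mean)   # martingale property: close to X_0 = 10
```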

3.3    Existence of Risk Neutral Probability Measures and Derivatives Pricing

Self-financed portfolios are precisely the type of dynamic portfolios that can be used to hedge derivatives. Indeed, the self-financing condition implies that if we are able to fully replicate a contingent claim by means of a self-financed portfolio then we are also able to fully eliminate the risk deriving from the random pay-off of the contingent claim.

Definition 79 (i) A European derivative V_T with maturity T ∈ I is a G_T-measurable random variable. (ii) A European derivative V_T is hedgeable if there exists a self-financed portfolio process (∆_t , G_t)_{t=0,...,n} with value process (X_t , G_t)_{t=0,...,n} such that

X_T = V_T .

Notice that if a European contingent claim V_T is hedgeable, then absence of arbitrage opportunities immediately implies that its price is the value of the corresponding hedging portfolio. We state this important fact in the next Proposition under point (i) for completeness.

Proposition 80 (i) If a European contingent claim V_T is hedgeable, then in the absence of arbitrage opportunities we have for any t ∈ I:

V_t = X_t .

(ii) The risk neutral valuation formula is obtained:

V_t = (B_t / B_T) Ẽ ( V_T | G_t ) = (1 + r)^{−(T−t)} Ẽ ( V_T | G_t ) .

Proof. (i) Without loss of generality assume that V_0 > X_0, in order to imply a contradiction with the no arbitrage assumption. Then, a portfolio long one unit of the hedging portfolio and short one unit of the derivative at time 0 costs X_0 − V_0 < 0. Holding the short derivative position until maturity and rebalancing the long position in the hedging portfolio according to its self-financing dynamics yields a pay-off X_T − V_T = 0 at maturity T. Investing in the money account the amount V_0 − X_0 yields a final pay-off (V_0 − X_0) (1 + r)^T > 0 at maturity, i.e. an arbitrage opportunity. Therefore, one must have V_0 = X_0. (ii) We have, by (i) and the martingale property of (X_t / B_t , G_t)_{t=0,...,n} under P̃ (see also Proposition 78):

V_t / B_t = X_t / B_t = Ẽ ( X_T / B_T | G_t ) = Ẽ ( V_T / B_T | G_t ) ,

where the last equality arises because (∆_t , G_t)_{t=0,...,n} is a hedging portfolio for V_T. This concludes the proof.
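For a hedgeable claim, the risk neutral valuation formula of Proposition 80 (ii) gives the time-0 price as the discounted expectation of V_T under the binomial risk adjusted weights. A minimal sketch (parameter values are illustrative, not from the notes); the sanity checks price the stock itself back at S_0 and a sure pay-off of 1 at (1 + r)^{−T}:

```python
from math import comb

def binomial_price(payoff, s0, u, r, T):
    """Time-0 price of a European derivative V_T = payoff(S_T) by the risk
    neutral valuation formula V_0 = (1+r)^{-T} E(V_T) under the binomial
    risk adjusted probability."""
    d = 1 / u
    p = (1 + r - d) / (u - d)          # risk neutral probability (13)
    expectation = sum(comb(T, k) * p**k * (1 - p)**(T - k)
                      * payoff(s0 * u**k * d**(T - k))
                      for k in range(T + 1))
    return expectation / (1 + r)**T

# Sanity checks: the stock itself is priced at S_0, a riskless unit
# pay-off at (1+r)^{-T}.
assert abs(binomial_price(lambda s: s, 100.0, 1.2, 0.05, 10) - 100.0) < 1e-6
assert abs(binomial_price(lambda s: 1.0, 100.0, 1.2, 0.05, 10) - 1.05**-10) < 1e-9
call = binomial_price(lambda s: max(s - 100.0, 0.0), 100.0, 1.2, 0.05, 10)
print(call)
```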

The implication of the results in this section is that any hedgeable derivative has a price given by a risk neutral valuation formula. But when is a derivative hedgeable? This is discussed in the next section.

3.4    Uniqueness of Risk Neutral Probability Measures and Derivatives Hedging

Can a simple European derivative always be hedged? The answer depends on the pricing model

used. For instance, in the standard binomial model this is the case. On the other hand, in the

discrete time/continuous state space model in the next chapter this is not the case.

Basically, the answer depends on the relation between the number of basic instruments available

to construct a hedging portfolio and the number of independent risk factors in the model. Roughly

speaking, if the number of available instruments is suﬃciently large then every contingent claim is

perfectly hedgeable and thus obtains a unique price. Models that satisfy this property are called

complete.

Deﬁnition 81 In the absence of arbitrage opportunities a pricing model is complete if for any

European derivative VT there exists a hedging portfolio strategy (∆t , Gt )t=0,...,n for VT with value

process (Xt , Gt )t=0,...,n and such that

XT = VT    .

As mentioned, the standard binomial model is complete. This statement is made precise in

the next result.

Theorem 82 The binomial model is complete. Precisely, for any European derivative V_T there exists a hedging portfolio strategy (∆_t , G_t)_{t=0,..,T−1} for V_T with value process (X_t , G_t)_{t=0,..,T}. For any t = 0, .., T the value X_t at time t is given by

X_t = B_t Ẽ ( V_T / B_T | G_t )

and for any t = 0, .., T − 1 the stock position ∆_t is given by

∆_t = ( V_{t+1} (ω_1, .., ω_t, H) − V_{t+1} (ω_1, .., ω_t, T) ) / ( S_{t+1} (ω_1, .., ω_t, H) − S_{t+1} (ω_1, .., ω_t, T) ) .        (14)

Proof. Define a self-financed portfolio process with stock position at time t given by ∆_t in (14) and with value process dynamics given recursively by

X_{t+1} = ∆_t S_{t+1} + (X_t − ∆_t S_t) (1 + r) ,

where

X_0 = Ẽ ( V_T / B_T | G_0 ) .

We show that for any t = 0, .., T one has

X_t = V_t := B_t Ẽ ( V_T / B_T | G_t ) .

This statement is correct by construction for t = 0 (recall B_0 = 1). Thus, assume it is correct for some t < T, that is

X_t = V_t = B_t Ẽ ( V_T / B_T | G_t ) .

We show by induction that then it is correct also for t + 1, i.e. that

X_{t+1} (ω_1, .., ω_t, H) = V_{t+1} (ω_1, .., ω_t, H) ,
X_{t+1} (ω_1, .., ω_t, T) = V_{t+1} (ω_1, .., ω_t, T) .

For brevity, we show the first of these two equalities. The second follows in a similar way. The self-financing condition gives:

X_{t+1} (H) = ( (V_{t+1} (H) − V_{t+1} (T)) / (S_{t+1} (H) − S_{t+1} (T)) ) (S_{t+1} (H) − S_t (1 + r)) + V_t (1 + r) ,

using the definition (14) of ∆_t. Moreover, we know that (V_t / B_t , G_t)_{t=0,...,n} is a martingale under P̃, implying

V_t (1 + r) = V_t (B_{t+1} / B_t) = Ẽ ( V_{t+1} | G_t ) = p̃ V_{t+1} (H) + (1 − p̃) V_{t+1} (T) .

We thus obtain

X_{t+1} (H) = ( (V_{t+1} (H) − V_{t+1} (T)) / (S_{t+1} (H) − S_{t+1} (T)) ) (S_{t+1} (H) − S_t (1 + r)) + Ẽ ( V_{t+1} | G_t )
            = ( (V_{t+1} (H) − V_{t+1} (T)) / ((u − d) S_t) ) (u − (1 + r)) S_t + Ẽ ( V_{t+1} | G_t )
            = ( (u − (1 + r)) / (u − d) ) (V_{t+1} (H) − V_{t+1} (T)) + ( p̃ V_{t+1} (H) + (1 − p̃) V_{t+1} (T) )
            = (1 − p̃) (V_{t+1} (H) − V_{t+1} (T)) + ( p̃ V_{t+1} (H) + (1 − p̃) V_{t+1} (T) )
            = V_{t+1} (H) ,

using in the last equality the definition

p̃ = (1 + r − d) / (u − d) .

Finally, for t = T we obtain

X_T = V_T := B_T Ẽ ( V_T / B_T | G_T ) = V_T ,

almost surely, i.e. (∆_t , G_t)_{t=0,..,T−1} is a hedging portfolio, as claimed. This concludes the proof.
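The backward induction in the proof of Theorem 82 translates directly into an algorithm: compute V_t on the binomial tree by discounted one-step expectations under the risk adjusted probability, then roll the delta hedge (14) forward along any coin-flip path. A minimal sketch (function and variable names are illustrative, not from the notes):

```python
def hedge_check(s0, u, r, T, payoff, path):
    """Backward induction for V_t = B_t E(V_T/B_T | G_t) in the binomial model,
    then forward simulation of the self-financed delta hedge (14) along a
    given coin-flip path; returns (X_T, V_T), which Theorem 82 says coincide."""
    d = 1 / u
    p = (1 + r - d) / (u - d)                  # risk neutral probability (13)
    # V[t][k]: derivative value at time t after k heads.
    V = [None] * T + [[payoff(s0 * u**k * d**(T - k)) for k in range(T + 1)]]
    for t in range(T - 1, -1, -1):
        V[t] = [(p * V[t + 1][k + 1] + (1 - p) * V[t + 1][k]) / (1 + r)
                for k in range(t + 1)]
    x, s, k = V[0][0], s0, 0                   # start the hedge with X_0 = V_0
    for t, flip in enumerate(path):
        delta = (V[t + 1][k + 1] - V[t + 1][k]) / (s * u - s * d)   # formula (14)
        s_next = s * (u if flip == 'H' else d)
        x = delta * s_next + (x - delta * s) * (1 + r)              # self-financing
        s = s_next
        k += flip == 'H'
    return x, payoff(s)

xT, vT = hedge_check(100.0, 1.2, 0.03, 6, lambda s: max(s - 100.0, 0.0),
                     ('H', 'T', 'T', 'H', 'H', 'T'))
print(xT, vT)   # replication: X_T equals V_T up to floating point error
```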

3.5    Existence of Risk Neutral Probability Measures and Absence of Arbitrage

4     Introduction to Stochastic Processes

For the whole section let (Ω, G, P ) be a probability space.

4.1    Basic Deﬁnitions

A stochastic process is a mathematical model to describe the realizations of a random experiment

at some diﬀerent dates deﬁned on a time index set I, as for instance I = {0, 1, 2, .., n}, I = N,

I = [0, ∞).

Deﬁnition 83 Let I be a time index set. (i) A family X := (Xt )t∈I of random variables on

(Ω, G, P ) is called a stochastic process. (ii) If I is countable the process is called a discrete-time

stochastic process. If it is uncountable the process is called a continuous-time stochastic process.

(iii) For any ω ∈ Ω the real valued function t −→ Xt (ω) is called a trajectory of the process.

Remark 84 (i) The important thing to note in the above deﬁnition is that all random variables

in the family X are deﬁned on a single probability space (Ω, G, P ). In fact, when constructing

a stochastic process satisfying a set of a priori desirable properties one will have to construct

a family X of random variables deﬁned on a single probability space (Ω, G, P ). This puts quite

strong restrictions on the way stochastic processes can be obtained. (ii) We can also think of

a stochastic process X as a (measurable) function

X :Ω×I →R         ;    (ω, t) → Xt (ω)   ,

i.e. as a random variable deﬁned on a measurable space with sample space Ω × I.

The concepts of a filtration, of an adapted process and of a martingale extend in a natural way to continuous time stochastic processes.

Definition 85 (i) A family G := (G_t)_{t∈I} of sub sigma algebras of G is a filtration if for any t, s ∈ I with t ≤ s

G_t ⊆ G_s .

(ii) A stochastic process X := (Xt )t∈I is G−adapted if for any t ∈ I the random variable Xt is

Gt −measurable. (iii) An adapted process (Xt , Gt )t∈I is a martingale if for any t, s ∈ I such that

s > t the martingale condition

Xt = E ( Xs | Gt ) ,

is satisﬁed.

4.2      Discrete Time Brownian Motion

A ﬁrst example of a discrete time process with continuous state space is a random walk process

with normally distributed innovations. This is the discrete time analogue of the continuous time

Brownian motion process.

Example 86 (Discrete Time Brownian Motion). Let Y := (Y_t)_{t=0,..,n} be a sequence of iid N (0, 1) random variables on (Ω, G, P ) (this already defines a discrete time stochastic process). We define Z_0 = 0 and

Z_t = \sum_{i=1}^{t} Y_i .

(Z_t)_{t=0,..,n} is a random walk with normally distributed innovations and is the discrete time analogue of the (continuous time) Brownian motion process. We immediately have the following properties of discrete time Brownian motion:

Z_t ∼ N (0, t) ,                    (15)

i.e. Z_t is normally distributed with mean zero and a variance increasing proportionally with time, and for s > t

Z_s − Z_t ⊥ σ (Z_k ; k ≤ t) = σ (Y_k ; k ≤ t) ,                    (16)

i.e. increments of Brownian motions are independent of the past and current history of the process. Finally, we also have for any two time points s > t:

Cov (Z_t , Z_s ) = E (Z_t Z_s ) − E (Z_t ) E (Z_s )
                 = E ( (Z_s − Z_t + Z_t ) Z_t )
                 = E ( (Z_s − Z_t ) Z_t ) + E (Z_t^2)
                 = E (Z_s − Z_t ) E (Z_t ) + Var (Z_t )
                 = t ,

using in the second equality the zero mean property of Brownian motion, in the fourth the independence of its increments, and in the last equality again the zero mean property together with Var (Z_t) = t. The same result arises, with the roles of t and s exchanged, for the case s < t. Therefore:

Cov (Z_t , Z_s ) = min (t, s) .

Further, define

G_t := σ (Y_1 , .., Y_t ) ,

the sigma algebra generated by the process Y up to time t. Notice that we have:

Y_1 = Z_1 , Y_2 = Z_2 − Z_1 , Y_3 = Z_3 − Z_2 , ..., Y_n = Z_n − Z_{n−1} .

Therefore, the sigma algebras generated by Y and by (Z_t)_{t=0,..,n} up to a given time are the same. This implies that Z_t is G_t-measurable and that (Z_t)_{t=0,..,n} is (G_t)_{t=0,..,n} adapted. Further, for any time indices s > t we have:

E ( Z_s | G_t ) = E ( Z_s − Z_t + Z_t | G_t ) = E ( Z_s − Z_t | G_t ) + Z_t = Z_t ,

i.e. discrete time Brownian motion is a martingale process.
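The moment properties (15)-(16) and the covariance formula can be checked by simulation. A small sketch (names, seeds and parameter values are illustrative, not from the notes):

```python
import random

def simulate_Z(n, rng):
    """One path (Z_0, .., Z_n) of discrete time Brownian motion Z_t = Y_1 + .. + Y_t."""
    z, path = 0.0, [0.0]
    for _ in range(n):
        z += rng.gauss(0.0, 1.0)
        path.append(z)
    return path

def var_and_cov(t, s, n_paths, seed=1):
    """Monte Carlo estimates of Var(Z_t) and Cov(Z_t, Z_s)."""
    rng = random.Random(seed)
    zt, zs = [], []
    for _ in range(n_paths):
        p = simulate_Z(s, rng)
        zt.append(p[t])
        zs.append(p[s])
    m = lambda v: sum(v) / len(v)
    var_t = m([a * a for a in zt]) - m(zt) ** 2
    cov_ts = m([a * b for a, b in zip(zt, zs)]) - m(zt) * m(zs)
    return var_t, cov_ts

v, c = var_and_cov(3, 7, 100_000)
print(round(v, 2), round(c, 2))   # theory: Var(Z_3) = 3 and Cov(Z_3, Z_7) = min(3, 7) = 3
```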

Some further examples of a martingale process are obtained by looking at some simple func-

tionals of (discrete time) Brownian motion. The ﬁrst one arises simply by recentering squared

(discrete time) Brownian motion by its variance.

Example 87 Let (Z_t , G_t)_{t=0,..,n} be a discrete time Brownian motion. For the adapted process (X_t , G_t)_{t=0,..,n} := (Z_t^2 , G_t)_{t=0,..,n} it follows for s > t:

E ( X_s | G_t ) = E ( Z_s^2 | G_t )
                = E ( (Z_s − Z_t + Z_t)^2 | G_t )
                = E ( (Z_s − Z_t)^2 | G_t ) + E ( Z_t^2 | G_t ) + E ( 2 (Z_s − Z_t) Z_t | G_t )
                = E ( (Z_s − Z_t)^2 ) + Z_t^2 + 2 Z_t E ( Z_s − Z_t | G_t )
                = (s − t) + X_t + 2 Z_t E (Z_s − Z_t)
                = (s − t) + X_t ≥ X_t ,

using in the third equality the linearity of conditional expectations, in the fourth the independence of Brownian increments and the G_t-measurability of Z_t, and in the fifth the zero mean property of Brownian motion. This implies that (X_t , G_t)_{t=0,..,n} is a submartingale. However, by similar arguments as those listed above we see that the process (Z_t^2 − t , G_t)_{t=0,..,n} is a martingale.

The last example of a Brownian functional that gives a martingale process is exponential (discrete time) Brownian motion.

Example 88 (Exponential Brownian Motion) Let (Z_t , G_t)_{t=0,..,n} be a discrete time Brownian motion. For the adapted process (X_t , G_t)_{t=0,..,n} defined by

X_t = exp ( σ Z_t − σ^2 t / 2 )

it follows for s > t:

E ( X_s | G_t ) = E ( exp ( σ Z_s − σ^2 s / 2 ) | G_t )
                = E ( exp ( σ (Z_s − Z_t) − σ^2 (s − t) / 2 ) exp ( σ Z_t − σ^2 t / 2 ) | G_t )
                = exp ( σ Z_t − σ^2 t / 2 ) E ( exp ( σ (Z_s − Z_t) − σ^2 (s − t) / 2 ) | G_t )
                = X_t exp ( − σ^2 (s − t) / 2 ) E ( exp ( σ (Z_s − Z_t) ) ) ,                    (17)

using in the last equality the independence of Brownian increments. Now, since Z_s − Z_t ∼ N (0, s − t), the expression E ( exp ( σ (Z_s − Z_t) ) ) is the moment generating function of a N (0, s − t) distributed random variable, evaluated at the point σ. Thus,

E ( exp ( σ (Z_s − Z_t) ) ) = M_{Z_s − Z_t} (σ) = exp ( σ^2 (s − t) / 2 ) .

With this result, we obtain in (17) the martingale property for the process (X_t , G_t)_{t=0,..,n}.
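The moment generating function step can be verified by simulation: for X_t = exp(σZ_t − σ²t/2) the mean E(X_t) stays at X_0 = 1 for every t. A rough Monte Carlo sketch (names, seeds and parameters are illustrative, not from the notes):

```python
import random, math

def mean_exp_mart(sigma, t, n_paths, seed=2):
    """Monte Carlo estimate of E(X_t) for X_t = exp(sigma*Z_t - sigma^2*t/2);
    the martingale property (with X_0 = 1) implies the answer is 1 for every t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        z = sum(rng.gauss(0.0, 1.0) for _ in range(t))   # Z_t ~ N(0, t)
        total += math.exp(sigma * z - sigma**2 * t / 2)
    return total / n_paths

for t in (1, 4, 9):
    print(t, round(mean_exp_mart(0.3, t, 200_000), 3))   # theory: 1 for every t
```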

4.3      Girsanov Theorem: Application to a Semicontinuous Pricing Model

This section considers the pricing problem of a general European derivative in the context of an

n−period semicontinuous pricing model.

4.3.1     A Semicontinuous Pricing Model

The model structure is:

• I := {0, 1, 2, .., n} is a discrete time index representing the available transaction dates in the

model

• The sample space is given by Ω := Rn with single outcomes ω of the form

ω = (ω1 , ω2 , .., ωn )   ,

where ωi ∈ R, i = 1, . . . , n.

• G := B (Rn ) the Borel sigma algebra on Rn

• Dynamics of the stock price and money account:

S_t = S_{t-1} exp ( σ Y_t − σ^2 / 2 ) exp (µ) ,                    (18)
B_t = exp (r) B_{t-1} ,

for some µ, r, σ > 0, for given B_0 = 1, S_0 and where (Y_t)_{t=1,...,n} is an iid N (0, 1) sequence of random variables on (R^n , B (R^n)).

• (Gt )t=0,...,n is the ﬁltration generated by (Yt )t=1,...,n , which coincides with the ﬁltration

generated by the stock price process (St )t=0,...,n .

• A probability P on (Ω, G) such that (Yt )t=1,...,n is an iid N (0, 1) sequence.

Some simple properties of the above asset price dynamics can be immediately deduced from the above definitions. Firstly, the above money account dynamics gives:

B_t = exp (r t) ,

implying a continuous interest rate compounding. Secondly, for the risky asset price dynamics we get:

E ( S_s / S_t | G_t ) = E ( exp ( σ \sum_{i=t+1}^{s} Y_i − σ^2 (s − t) / 2 ) exp ( µ (s − t) ) | G_t )
                      = E ( exp ( σ (Z_s − Z_t) − σ^2 (s − t) / 2 ) | G_t ) exp ( µ (s − t) )
                      = exp ( µ (s − t) ) ,

because exponential Brownian motion is a martingale process. Therefore, exp ( µ (s − t) ) is the expected gross return on the stock over s − t periods, or alternatively

µ = ( log E ( S_s | G_t ) − log S_t ) / (s − t) ,

is the continuous expected rate of return on the stock. Similarly, for the variance of logarithmic stock returns one gets

Var ( log (S_s / S_t) | G_t ) = Var ( σ (Z_s − Z_t) − σ^2 (s − t) / 2 | G_t ) = σ^2 Var (Z_s − Z_t) = σ^2 (s − t) ,

i.e.

σ^2 = Var ( log (S_s / S_t) | G_t ) / (s − t) ,

is the continuous variance rate on the stock.
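The two moment identities above lend themselves to a quick simulation check: in model (18) the gross return S_s/S_t over k = s − t periods has mean exp(µk), and the log return has variance σ²k. An illustrative sketch (names, seeds and parameter values are assumptions):

```python
import random, math

def simulate_gross_returns(mu, sigma, k, n_paths, seed=3):
    """Monte Carlo draws of S_s/S_t over k = s - t periods in the
    semicontinuous model (18): S_t = S_{t-1} exp(sigma*Y_t - sigma^2/2) exp(mu)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_paths):
        log_ret = sum(sigma * rng.gauss(0.0, 1.0) - sigma**2 / 2 + mu
                      for _ in range(k))
        out.append(math.exp(log_ret))
    return out

mu, sigma, k = 0.05, 0.2, 4
rets = simulate_gross_returns(mu, sigma, k, 200_000)
mean = sum(rets) / len(rets)
logs = [math.log(x) for x in rets]
m = sum(logs) / len(logs)
var_log = sum((x - m) ** 2 for x in logs) / len(logs)
print(round(mean, 3), round(math.exp(mu * k), 3))   # expected gross return exp(mu*k)
print(round(var_log, 3), sigma**2 * k)              # log-return variance sigma^2*k
```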

4.3.2     Risk Neutral Valuation in the Semicontinuous Model

In the semicontinuous model it is not possible to hedge perfectly any European derivative using a suitable self-financed hedge portfolio, i.e. this model is not complete. Therefore, the derivation/computation of a suitable risk neutral probability measure for pricing derivatives must follow other arguments than those adopted in the binomial model setting. Fortunately, a powerful theorem from the theory of stochastic processes, namely Girsanov's Theorem, can assist us in constructing such probability measures by using pure probabilistic arguments. We first give for completeness some basic definitions which are the counterpart of Definition 76 for the semicontinuous model setting.

Definition 89 (i) An adapted process (∆_t , G_t)_{t=0,...,n} with value process (X_t)_{t=0,...,n} defines a portfolio process if ∆_t is the number of stocks at time t in the portfolio and (X_t − ∆_t S_t) / B_t is the number of units of the money account. (ii) A portfolio process (∆_t , G_t)_{t=0,...,n} is self-financed if

X_{t+1} = ∆_t S_{t+1} + (X_t − ∆_t S_t) exp (r) .

(iii) A self-financed portfolio is an arbitrage opportunity if X_0 = 0 and

X_n ≥ 0 ,   P (X_n > 0) > 0 .

(iv) A probability P̃ on the measurable space (Ω, G) which is equivalent to P and such that the discounted price process (S_t / B_t , G_t)_{t=0,...,n} is a martingale under P̃, is called a risk adjusted (risk neutral) probability measure.

Remark 90 Notice that self-financed portfolios and arbitrage strategies in the semicontinuous model are defined exactly as in the earlier binomial setting, with the only difference that now interest rates are continuously compounded. By contrast, a risk neutral probability measure P̃ in the semicontinuous model is explicitly required to be equivalent to the physical probability P. This ensures that the null sets of these two probabilities coincide. This property was by construction satisfied in the binomial model, where no null probability events (apart from the trivial empty set) could arise.

To highlight the properties that a risk neutral measure in the semicontinuous model should have we consider again the discounted price process (S_t / B_t , G_t)_{t=0,...,n}, which is given by

S_{t+1} / B_{t+1} = S_t exp ( σ Y_{t+1} − σ^2 / 2 + µ ) / ( B_t exp (r) ) = (S_t / B_t) exp ( σ Y_{t+1} − σ^2 / 2 + µ − r ) .

Therefore,

Ẽ ( S_{t+1} / B_{t+1} | G_t ) = (S_t / B_t) Ẽ ( exp ( σ Y_{t+1} − σ^2 / 2 + µ − r ) | G_t )
                              = (S_t / B_t) Ẽ ( exp ( σ ( Y_{t+1} + (µ − r) / σ ) ) | G_t ) exp ( − σ^2 / 2 )
                              = (S_t / B_t) Ẽ ( exp ( σ (Y_{t+1} + θ) ) | G_t ) exp ( − σ^2 / 2 ) ,                    (19)

where

θ = (µ − r) / σ ,

is the so called market price of risk. Recall that if under P̃ the random variable Ỹ_{t+1} := Y_{t+1} + θ is both standard normally distributed and independent of G_t then:

Ẽ ( exp ( σ (Y_{t+1} + θ) ) | G_t ) = Ẽ ( exp ( σ Ỹ_{t+1} ) ) = exp ( σ^2 / 2 ) ,

and the martingale property follows for time t. Thus, if under a probability P̃ the process (Ỹ_t)_{t=1,...,n} = (Y_t + θ)_{t=1,...,n} is an iid N (0, 1) random sequence, then under P̃ the discounted stock price process is a martingale and P̃ is a risk neutral measure. This is equivalent to stating that under P̃ the process (Z̃_t)_{t=0,..,n} given by

Z̃_t = Z_t + θ t = \sum_{i=1}^{t} (Y_i + θ) = \sum_{i=1}^{t} Ỹ_i ,

is a discrete time Brownian motion. Notice that under the physical probability P the process (Z̃_t)_{t=0,..,n} is not a Brownian motion but a so called Brownian motion with drift. Therefore, the probability P̃ "reconverts" a process which is a Brownian motion with drift under P into a standard Brownian motion.

How can we construct such a probability measure P̃? The answer is provided by Girsanov's Theorem.

4.3.3     A Discrete Time Formulation of Girsanov Theorem

A discrete time formulation of the famous Girsanov Theorem is proved in the sequel.

Theorem 91 Let (Ω, G, P ) be a probability space such that the process (Y_t)_{t=1,...,n} is an iid N (0, 1) random sequence (or equivalently the process (Z_t)_{t=0,...,n} is a discrete time Brownian motion). Define a further measure P̃ on (Ω, G) by

P̃ (A) = \int_A (dP̃ / dP) dP ,   A ∈ G ,                    (20)

where

dP̃ / dP = exp ( − θ \sum_{t=1}^{n} Y_t − θ^2 n / 2 ) = exp ( − θ Z_n − θ^2 n / 2 ) .                    (21)

It then follows:

1. P̃ is a probability measure equivalent to P ,

2. The process (Ỹ_t)_{t=1,...,n} = (Y_t + θ)_{t=1,...,n} is an iid N (0, 1) random sequence under P̃ ,

3. The process (Z̃_t)_{t=0,..,n} = (Z_t + θ t)_{t=0,..,n} is a discrete time Brownian motion under P̃ .

Proof. 1. By the properties of Lebesgue integrals P̃ is a measure on (Ω, G). Moreover, we have

\int_Ω (dP̃ / dP) dP = E ( dP̃ / dP ) = E ( exp ( − θ Z_n − θ^2 n / 2 ) ) = E ( exp ( − θ Z_n ) ) exp ( − θ^2 n / 2 ) .

Recall that Z_n ∼ N (0, n) under P. Therefore,

E ( exp ( − θ Z_n ) ) = exp ( θ^2 n / 2 ) ,

by the properties of moment generating functions of normally distributed random variables, implying \int_Ω (dP̃ / dP) dP = 1. Thus, dP̃ / dP is a strictly positive proper density and P̃ is a probability measure equivalent to P. 2. To show that under P̃ the random sequence (Ỹ_t)_{t=1,...,n} := (Y_t + θ)_{t=1,...,n} is iid N (0, 1), let us denote by L_{Y_1,..,Y_n} and L_{Ỹ_1,..,Ỹ_n} (L̃_{Y_1,..,Y_n} and L̃_{Ỹ_1,..,Ỹ_n}) the distributions induced by Y_1, .., Y_n and Ỹ_1, .., Ỹ_n, respectively, on (R^n , B (R^n)) under P (under P̃), that is

L_{Y_1,..,Y_n} (B) = P ((Y_1, .., Y_n) ∈ B) ,   L_{Ỹ_1,..,Ỹ_n} (B) = P ((Ỹ_1, .., Ỹ_n) ∈ B) ,
L̃_{Y_1,..,Y_n} (B) = P̃ ((Y_1, .., Y_n) ∈ B) ,   L̃_{Ỹ_1,..,Ỹ_n} (B) = P̃ ((Ỹ_1, .., Ỹ_n) ∈ B) ,

for any B ∈ B (R^n). We have, for any B ∈ B (R^n):

L̃_{Ỹ_1,..,Ỹ_n} (B) = P̃ ((Ỹ_1, .., Ỹ_n) ∈ B) = P̃ ((Y_1, .., Y_n) ∈ B − θ) = L̃_{Y_1,..,Y_n} (B − θ) ,

where B − θ := {(x_1 − θ, .., x_n − θ) : x ∈ B}, because for any i = 1, . . . , n, it follows Y_i = Ỹ_i − θ. Further, recall that the joint distribution of (Y_t)_{t=1,...,n} under P is iid N (0, 1), i.e.

L_{Y_1,..,Y_n} (B) = \int_B (2π)^{−n/2} exp ( − (1/2) \sum_{t=1}^{n} y_t^2 ) dy_1 · · · dy_n .

Therefore, we obtain

L̃_{Y_1,..,Y_n} (B − θ) = \int_{B−θ} exp ( − θ \sum_{t=1}^{n} y_t − θ^2 n / 2 ) dL_{Y_1,..,Y_n} (y_1, .., y_n)
                        = \int_{B−θ} exp ( − θ \sum_{t=1}^{n} y_t − θ^2 n / 2 ) (2π)^{−n/2} exp ( − (1/2) \sum_{t=1}^{n} y_t^2 ) dy_1 · · · dy_n
                        = \int_B exp ( − θ \sum_{t=1}^{n} (y_t − θ) − θ^2 n / 2 ) (2π)^{−n/2} exp ( − (1/2) \sum_{t=1}^{n} (y_t − θ)^2 ) dy_1 · · · dy_n
                        = \int_B (2π)^{−n/2} exp ( − (1/2) \sum_{t=1}^{n} y_t^2 ) dy_1 · · · dy_n ,

using in the third equality the change of variables y_t → y_t − θ. Since the function

f (y) = (2π)^{−n/2} exp ( − (1/2) \sum_{t=1}^{n} y_t^2 ) ,

is the joint density function of an iid sequence of N (0, 1) random variables, we have shown that L̃_{Ỹ_1,..,Ỹ_n} is such a normal distribution on (R^n , B (R^n)), as claimed. Statement 3. follows directly from statement 2.
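The change of measure in Theorem 91 can be illustrated by importance sampling: drawing the Y_i under P and weighting each path by the density (21), the reweighted sample behaves as if Y_i + θ were standard normal. A rough Monte Carlo sketch (names, seeds and parameter values are illustrative, not from the notes):

```python
import random, math

def girsanov_check(theta, n, n_paths, seed=4):
    """Monte Carlo check of the discrete Girsanov theorem: sampling Y_i iid
    N(0,1) under P and weighting by exp(-theta*Z_n - theta^2*n/2), i.e. the
    density (21), the shifted variable Y_1 + theta behaves as N(0,1)."""
    rng = random.Random(seed)
    w_sum = wy_sum = wy2_sum = 0.0
    for _ in range(n_paths):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z_n = sum(y)
        w = math.exp(-theta * z_n - theta**2 * n / 2)   # density (21)
        y1_shift = y[0] + theta
        w_sum += w
        wy_sum += w * y1_shift
        wy2_sum += w * y1_shift**2
    # total mass, weighted first and second moments of Y_1 + theta
    return w_sum / n_paths, wy_sum / w_sum, wy2_sum / w_sum

total_mass, mean_shift, second_shift = girsanov_check(0.5, 5, 400_000)
print(round(total_mass, 3))                     # theory: total mass 1
print(round(mean_shift, 3), round(second_shift, 3))   # theory: mean 0, variance 1
```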

Using Girsanov's theorem, we are now able to give a risk neutral probability measure for the above semicontinuous pricing model. We summarize this finding in the next corollary.

Corollary 92 In the semicontinuous pricing model with stock price dynamics (18) a risk adjusted martingale measure P̃ on (Ω, G) is obtained by setting for any A ∈ G,

P̃ (A) := \int_A exp ( − θ Z_n − θ^2 n / 2 ) dP ,

where

θ = (µ − r) / σ ,

is the market price of risk in the model.

Inspired by the previous results in the Binomial model, we are now tempted to price European derivatives also in the semicontinuous model by means of a risk neutral valuation formula under P̃. Notice that, since the model is incomplete, there could exist more than one risk neutral probability measure for this setting, in addition to the just identified probability measure P̃. This causes the problem of finding adequate criteria for selecting one of these probabilities to price contingent claims in incomplete markets. Nevertheless, using P̃ we can still compute, at least formally, the corresponding risk neutral pricing formula as a specific expectation. Moreover, we can define the time-t price of a European derivative V_T in the semicontinuous model as

V_t := (B_t / B_T) Ẽ ( V_T | G_t ) = exp ( − r (T − t) ) Ẽ ( V_T | G_t ) .

In the case of a call pay-off V_T = (S_T − K)^+ this yields the famous Black-Scholes pricing formula using pure probabilistic arguments. In order to motivate this pricing formula completely, we will have to work out, in a later section, a model where trading can evolve in continuous time, the Black and Scholes model. In this setting we will also be able to construct hedging strategies for any European derivative in the model and to show that the above pricing approach is the only one consistent with the absence of arbitrage opportunities in the Black and Scholes model. To this end we will have to introduce some continuous time stochastic processes more explicitly and to develop a stochastic integral calculus, Itô's calculus, where integrals are defined with respect to increments of a continuous time Brownian motion (see below).

Before doing that, we conclude this section by computing the Black-Scholes formula in the semicontinuous model by means of a risk neutral valuation formula under P̃.

63
4.3.4     A Discrete Time Derivation of Black and Scholes Formula

The famous Black and Scholes pricing formula for a European call option arises in our semicontinuous setting as the discounted risk neutral expectation of the call pay-off at maturity. This is shown in the next result.

Proposition 93 (Black and Scholes Call Price Formula) The time 0 discounted risk neutral expectation of the call pay-off $(S_T-K)^+$ of a call option with maturity T and strike price K is given by:

$$ \frac{B_0}{B_T}\,E\left[(S_T-K)^+\right] = S_0\,N(d_1) - K\exp(-rT)\,N(d_2), $$

where

$$ d_1 = \frac{\log\frac{S_0}{K}+\left(r+\frac{\sigma^2}{2}\right)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}. $$

Proof. Writing

$$ (S_T-K)^+ = (S_T-K)\,1_{\{S_T>K\}} = S_T\,1_{\{S_T>K\}} - K\,1_{\{S_T>K\}}, $$

we have

$$ E\left[(S_T-K)^+\right] = E\left[(S_T-K)\,1_{\{S_T>K\}}\right] = E\left[S_T\,1_{\{S_T>K\}}\right] - K\,E\left[1_{\{S_T>K\}}\right]. \qquad (22) $$

Moreover,

$$ E\left[1_{\{S_T>K\}}\right] = P\left(S_T>K\right). $$

To compute this probability notice that we have:

$$ \{S_T>K\} = \left\{\log\frac{S_T}{S_0} > \log\frac{K}{S_0}\right\}. $$

Moreover, using the explicit form for the stock price dynamics in the semicontinuous model,

$$ \log\frac{S_T}{S_0} = \sigma Z_T - \left(\frac{\sigma^2}{2}-\mu\right)T = \sigma\left(Z_T+\theta T\right) - \left(\frac{\sigma^2}{2}-r\right)T, $$

where θ = (μ − r)/σ, it follows

$$
\begin{aligned}
\{S_T>K\} &= \left\{\sigma\left(Z_T+\theta T\right) > \log\frac{K}{S_0}+\left(\frac{\sigma^2}{2}-r\right)T\right\} \\
&= \left\{\frac{Z_T+\theta T}{\sqrt{T}} > \frac{\log\frac{K}{S_0}+\left(\frac{\sigma^2}{2}-r\right)T}{\sigma\sqrt{T}}\right\} \\
&= \left\{-\frac{Z_T+\theta T}{\sqrt{T}} < \frac{\log\frac{S_0}{K}-\left(\frac{\sigma^2}{2}-r\right)T}{\sigma\sqrt{T}}\right\} \\
&= \left\{-\frac{Z_T+\theta T}{\sqrt{T}} < d_1-\sigma\sqrt{T}\right\} \\
&= \left\{-\frac{Z_T+\theta T}{\sqrt{T}} < d_2\right\}.
\end{aligned}
$$

Now, notice that by Girsanov's theorem $Z_T+\theta T$ is a standard Brownian motion under the probability measure P, so that

$$ -\frac{Z_T+\theta T}{\sqrt{T}} \sim N(0,1). $$

Therefore,

$$ P\left(S_T>K\right) = P\left(-\frac{Z_T+\theta T}{\sqrt{T}} < d_2\right) = N(d_2). \qquad (23) $$

We now compute the first term in the difference on the RHS of (22), discounted by $B_T = \exp(rT)$. We have:

$$
\begin{aligned}
E\left(\frac{S_T}{B_T}\,1_{\{S_T>K\}}\right) &= \int_{\{S_T>K\}}\frac{S_T}{B_T}\,dP \\
&= S_0\int_{\{S_T>K\}}\frac{S_T}{S_0 B_T}\,dP \\
&= S_0\int_{\{S_T>K\}}\exp\left(\sigma\left(Z_T+\theta T\right)-\frac{\sigma^2 T}{2}\right)dP.
\end{aligned}
$$

Now, recall that (see above)

$$ \{S_T>K\} = \left\{Z_T+\theta T > -d_2\sqrt{T}\right\}, $$

and that under P we have

$$ Z_T+\theta T \sim N(0,T). $$

This gives

$$
\begin{aligned}
E\left(\frac{S_T}{B_T}\,1_{\{S_T>K\}}\right) &= S_0\int_{\{S_T>K\}}\exp\left(\sigma\left(Z_T+\theta T\right)-\frac{\sigma^2 T}{2}\right)dP \\
&= S_0\int_{-d_2\sqrt{T}}^{\infty}\exp\left(\sigma z-\frac{\sigma^2 T}{2}\right)\frac{1}{\sqrt{2\pi T}}\exp\left(-\frac{1}{2}\left(\frac{z}{\sqrt{T}}\right)^2\right)dz \\
&= S_0\int_{-d_2\sqrt{T}}^{\infty}\frac{1}{\sqrt{2\pi T}}\exp\left(-\frac{1}{2}\left(\frac{z-\sigma T}{\sqrt{T}}\right)^2\right)dz. \qquad (24)
\end{aligned}
$$

Finally, notice that

$$ \int_{-d_2\sqrt{T}}^{\infty}\frac{1}{\sqrt{2\pi T}}\exp\left(-\frac{1}{2}\left(\frac{z-\sigma T}{\sqrt{T}}\right)^2\right)dz $$

is the probability of the interval $\left(-d_2\sqrt{T},\infty\right)$ under a $N(\sigma T, T)$ distribution, which is the same as the probability of the interval $\left(-d_2-\sigma\sqrt{T},\infty\right)$ under a $N(0,1)$ distribution. By symmetry, this is also equal to the probability of the interval $\left(-\infty, d_2+\sigma\sqrt{T}\right)$ under a $N(0,1)$ distribution, i.e.:

$$ S_0\int_{-d_2\sqrt{T}}^{\infty}\frac{1}{\sqrt{2\pi T}}\exp\left(-\frac{1}{2}\left(\frac{z-\sigma T}{\sqrt{T}}\right)^2\right)dz = S_0\,N\left(d_2+\sigma\sqrt{T}\right) = S_0\,N(d_1). \qquad (25) $$

Putting terms together we finally obtain:

$$ \frac{B_0}{B_T}\,E\left[(S_T-K)^+\right] = E\left(\frac{S_T}{B_T}\,1_{\{S_T>K\}}\right) - \frac{K}{B_T}\,E\left(1_{\{S_T>K\}}\right) = S_0\,N(d_1) - K\exp(-rT)\,N(d_2), $$

from (23) and (25).
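The closed-form price above can be sanity-checked numerically. Below is a minimal Python sketch (the function names `norm_cdf` and `bs_call` are ours; only the standard library is used), evaluating the formula for one common parameter set:

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal CDF N(x), expressed via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    """Time-0 Black-Scholes call price S0*N(d1) - K*exp(-rT)*N(d2)."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

price = bs_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(price)  # roughly 10.45
```

The price always dominates the discounted intrinsic lower bound $S_0 - K e^{-rT}$, which gives a quick consistency check on any implementation.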

4.4     Continuous Time Brownian Motion

The starting point to develop a continuous time model for the stock price is the Brownian motion

process.

Definition 94 A continuous time adapted process $(Z_t,\mathcal{G}_t)_{t\ge0}$ on a probability space $(\Omega,\mathcal{G},P)$ is a (standard) Brownian motion if

1. $Z_0 = 0$.

2. For any $s>t$ it follows

$$ Z_s-Z_t \sim N(0,s-t), \qquad Z_s-Z_t \perp \mathcal{G}_t. $$

3. For any $\omega\in\Omega$ the mapping $t\longmapsto Z_t(\omega)$ is continuous.

We shall speak sometimes of a Brownian motion $(Z_t,\mathcal{G}_t)_{0\le t\le T}$ on $[0,T]$ for some $T>0$; the meaning of this is apparent.

Remark 95 (i) The filtration $(\mathcal{G}_t)_{t\ge0}$ is part of the definition. A natural choice of a filtration is the one generated by the process coordinates, defined by setting

$$ \mathcal{G}_t = \mathcal{G}_t^Z := \sigma\left(\sigma(Z_u);\ 0\le u\le t\right). $$

In some cases³, it is important to work with a larger filtration than the one generated by the process. In the sequel we will assume $\mathcal{G}_t$ to be at least augmented, i.e., for any $t\ge0$:

$$ \mathcal{G}_t = \sigma\left(\mathcal{G}_t^Z\cup\mathcal{N}\right), \qquad (26) $$

where $\mathcal{N}$ is the family of all subsets of $\mathcal{G}$ having probability 0. Notice that $\mathcal{G}_t^Z$ does not contain $\mathcal{N}$, so that $(\mathcal{G}_t^Z)_{t\ge0}$ is not augmented. The augmented filtration implied by (26) is sometimes called the natural filtration of a Brownian motion process. (ii) The fact that a probability space $(\Omega,\mathcal{G},P)$ and an adapted stochastic process $(Z_t,\mathcal{G}_t)_{t\ge0}$ with the Brownian motion properties can indeed be constructed is a fundamental result in probability theory. (iii) It can be shown that Brownian motion is, up to drift and scaling, the only process with continuous paths and with independent stationary increments, i.e. satisfying

$$ Z_s-Z_t \perp \mathcal{G}_t \quad\text{and}\quad \mathcal{L}_{Z_s-Z_t} = \mathcal{L}_{Z_{s-t}-Z_0}, $$

for any $s\ge t$. (iv) It can also be shown that Brownian motion is the only martingale with continuous paths, starting at 0, such that for any $0\le t\le s$:

$$ E\left(\left.\left(Z_s-Z_t\right)^2\right|\mathcal{G}_t\right) = s-t. $$

³ As for instance when constructing solutions to some particular stochastic differential equations.

The finite dimensional distributions of a Brownian motion process are easily obtained from the definition.

Proposition 96 For any finite index set $0\le t_1<t_2<\dots<t_n$ the finite dimensional distribution of the random vector $(Z_{t_1},Z_{t_2},\dots,Z_{t_n})$ is Gaussian,

$$ \mathcal{L}_{Z_{t_1},Z_{t_2},\dots,Z_{t_n}} = N(0,\Sigma), $$

where

$$ \Sigma = \begin{pmatrix} t_1 & t_1 & \cdots & t_1 \\ t_1 & t_2 & \cdots & t_2 \\ \vdots & \vdots & \ddots & \vdots \\ t_1 & t_2 & \cdots & t_n \end{pmatrix}, $$

i.e. $\Sigma_{ij} = t_i\wedge t_j$.

Proof. We have

$$ \begin{pmatrix} Z_{t_1} \\ Z_{t_2} \\ \vdots \\ Z_{t_n} \end{pmatrix} = \underbrace{\begin{pmatrix} 1 & 0 & \cdots & 0 \\ 1 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}}_{\Lambda} \begin{pmatrix} Z_{t_1}-Z_0 \\ Z_{t_2}-Z_{t_1} \\ \vdots \\ Z_{t_n}-Z_{t_{n-1}} \end{pmatrix}, $$

i.e. any finite vector of coordinates of a Brownian motion can be written as a linear function of a vector of Gaussian Brownian increments, and is thus also Gaussian with expectation 0 and covariance matrix

$$ \Sigma = \Lambda D\Lambda', $$

where

$$ D = \begin{pmatrix} t_1 & 0 & 0 & \cdots & 0 \\ 0 & t_2-t_1 & 0 & \cdots & 0 \\ 0 & 0 & t_3-t_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & t_n-t_{n-1} \end{pmatrix}. $$

Explicit computation of Σ concludes the proof.
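The identity $\Sigma=\Lambda D\Lambda'$ and the resulting covariance $\Sigma_{ij}=t_i\wedge t_j$ can be checked mechanically on a concrete grid. A small pure-Python sketch (the grid values are arbitrary illustrative choices):

```python
def brownian_cov(times):
    """Build Sigma = Lambda * D * Lambda' for a grid 0 < t1 < ... < tn."""
    n = len(times)
    # Lambda: lower-triangular matrix of ones (cumulative-sum operator)
    L = [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]
    # D: diagonal matrix of increment variances t_i - t_{i-1}
    incr = [times[0]] + [times[i] - times[i - 1] for i in range(1, n)]
    D = [[incr[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Sigma = (L D) L'
    LD = [[sum(L[i][k] * D[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return [[sum(LD[i][k] * L[j][k] for k in range(n)) for j in range(n)] for i in range(n)]

times = [0.5, 1.2, 2.0, 3.7]
S = brownian_cov(times)
# every entry should equal min(t_i, t_j)
assert all(abs(S[i][j] - min(times[i], times[j])) < 1e-12
           for i in range(4) for j in range(4))
```

The assertion passing for any increasing grid is exactly the "explicit computation of Σ" invoked at the end of the proof.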

In some cases, transformations of a Brownian motion give again a Brownian motion process. Here are some well-known examples.

Example 97 Let $(Z_t,\mathcal{G}_t)_{t\ge0}$ be a standard Brownian motion on a probability space $(\Omega,\mathcal{G},P)$. The following transformations give again a Brownian motion:

1. Symmetry: $(-Z_t,\mathcal{G}_t)_{t\ge0}$.

2. Scaling: $\left(c^{-\frac{1}{2}}Z_{ct},\mathcal{G}_{ct}\right)_{t\ge0}$, for any $c>0$.

3. Time reversal: For given $T>0$, the process $\left(V_t,\mathcal{G}_t^V\right)_{0\le t\le T}$ defined by

$$ V_t = Z_T-Z_{T-t}, \qquad \mathcal{G}_t^V := \sigma\left(V_u;\ 0\le u\le t\right). $$

For instance, to show 3. notice first that for any $\omega\in\Omega$ the map $t\longmapsto V_t(\omega) = Z_T(\omega)-Z_{T-t}(\omega)$ is continuous, because it consists of sums and compositions of continuous functions of t. Further, by definition $V_0 = Z_T-Z_{T-0} = 0$, and any finite dimensional distribution of $(V_t)_{0\le t\le T}$ is Gaussian, since coordinates of $(V_t)_{0\le t\le T}$ arise as simple linear transformations of coordinates of $(Z_t)_{0\le t\le T}$. Finally, for any $t\le T$

$$ E(V_t) = E\left(Z_T-Z_{T-t}\right) = 0, $$

and for any $0\le u\le s\le T$

$$
\begin{aligned}
\mathrm{Cov}\left(V_s,V_u\right) &= \mathrm{Cov}\left(Z_T-Z_{T-s},\,Z_T-Z_{T-u}\right) \\
&= T + (T-s)\wedge(T-u) - T\wedge(T-u) - (T-s)\wedge T \\
&= T + (T-s) - (T-u) - (T-s) = u.
\end{aligned}
$$

In particular, then

$$ \mathrm{Cov}\left(V_s-V_u,V_u\right) = 0, $$

for any $0\le u\le s\le T$. Since any finite dimensional distribution of $(V_t)_{0\le t\le T}$ is Gaussian this implies $V_s-V_t \perp \sigma\left(V_u;\ 0\le u\le t\right) = \mathcal{G}_t^V$ for $s\ge t$. Thus, $\left(V_t,\mathcal{G}_t^V\right)_{0\le t\le T}$ satisfies the definition of a Brownian motion process.

As in the discrete time setting, some simple examples of continuous martingales are obtained by considering some specific functionals of Brownian motion. For completeness, we give in the next result the two examples that are the most relevant to our exposition.

Example 98 Let $(Z_t,\mathcal{G}_t)_{t\ge0}$ be a standard Brownian motion on a probability space $(\Omega,\mathcal{G},P)$. Then the processes

1. $\left(Z_t^2-t,\ \mathcal{G}_t\right)_{t\ge0}$,

2. $\left(\exp\left(\sigma Z_t-\frac{\sigma^2 t}{2}\right),\ \mathcal{G}_t\right)_{t\ge0}$

are both martingales.

Proof. It is obvious that both processes are $(\mathcal{G}_t)_{t\ge0}$-adapted, since they are both simple measurable functions of Brownian motion. Moreover, the proof of the martingale property is obtained readily with the same arguments as for the proofs of the discrete time cases in Examples 87 and 88.
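In particular, the martingale property of the second process, together with $Z_0=0$, implies $E\left[\exp\left(\sigma Z_t-\sigma^2 t/2\right)\right]=1$ for every $t$. This identity is easy to probe by simulation; a seeded Monte Carlo sketch (the sample size and tolerance are our own choices, not from the text):

```python
import random
from math import exp, sqrt

random.seed(0)
sigma, t, n_paths = 0.5, 2.0, 200_000
total = 0.0
for _ in range(n_paths):
    z = random.gauss(0.0, sqrt(t))                  # Z_t ~ N(0, t)
    total += exp(sigma * z - 0.5 * sigma**2 * t)    # exponential martingale at time t
mean = total / n_paths
print(mean)  # close to 1
```

The sample mean fluctuates around 1 with a standard error of a few thousandths at this sample size.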

5     Introduction to Stochastic Calculus

For the whole chapter let (Zt , Gt )t≥0 be a Brownian motion on a probability space (Ω, G, P ).

5.1    Starting Point, Motivation

To motivate the introduction of a stochastic integral consider again the discrete time dynamics for the discounted value process $(X_t/B_t)_{t=0,\dots,n}$ of a self-financed portfolio $(\Delta_t)_{t=0,\dots,n-1}$:

$$
\begin{aligned}
\frac{X_{t+1}}{B_{t+1}} &= \Delta_t\,\frac{S_{t+1}}{B_{t+1}} + \frac{\left(X_t-\Delta_t S_t\right)}{B_{t+1}}\cdot\frac{B_{t+1}}{B_t} \\
&= \Delta_t\left(\frac{S_{t+1}}{B_{t+1}}-\frac{S_t}{B_t}\right)+\frac{X_t}{B_t} \\
&= \Delta_t\left(\frac{S_{t+1}}{B_{t+1}}-\frac{S_t}{B_t}\right)+\Delta_{t-1}\left(\frac{S_t}{B_t}-\frac{S_{t-1}}{B_{t-1}}\right)+\frac{X_{t-1}}{B_{t-1}} \\
&= \dots = \sum_{i=0}^{t}\Delta_i\left(\frac{S_{i+1}}{B_{i+1}}-\frac{S_i}{B_i}\right)+\frac{X_0}{B_0}.
\end{aligned}
$$

Recall that under the risk neutral measure P the discounted price process $(S_t/B_t)_{t=0,\dots,n}$ is a martingale. Thus, we have represented the time variation of the discounted values of a self-financed portfolio as a sum of portfolio exposures $(\Delta_t)_{t=0,\dots,n-1}$ weighted by the increments of a martingale process under P:

$$ \underbrace{\frac{X_t}{B_t}-\frac{X_0}{B_0}}_{\substack{\text{Change in discounted}\\ \text{portfolio value}}} = \sum_{i=0}^{t-1}\underbrace{\Delta_i}_{\substack{\text{Portfolio}\\ \text{exposure}}}\underbrace{\left(\frac{S_{i+1}}{B_{i+1}}-\frac{S_i}{B_i}\right)}_{\substack{\text{Martingale}\\ \text{increment}}}. \qquad (27) $$

Expression (27) is an example of a so called martingale transform. Martingale transforms are the discrete analogues of stochastic integrals, in which the process $(\Delta_t)_{t=0,\dots,n-1}$ is used as the integrand and the process $(S_t/B_t)_{t=0,\dots,n}$ is used as the integrator. Informally, we could thus introduce the suggestive notation

$$ \frac{X_t}{B_t}-\frac{X_0}{B_0} = \int_0^t\Delta\,d\!\left(\frac{S}{B}\right), $$

to denote such martingale transforms. In a later section this will denote a stochastic integral of an adapted process Δ with respect to the martingale process S/B over the time interval [0, t].

Definition 99 Let $M=(M_t,\mathcal{G}_t)_{t=0,\dots,n}$ be a martingale and $H=(H_t,\mathcal{G}_t)_{t=0,\dots,n-1}$ an adapted process on a probability space $(\Omega,\mathcal{G},P)$. The process $X=H\bullet M$ defined by

$$ X_t = \sum_{i=0}^{t-1}H_i\left(M_{i+1}-M_i\right), $$

for $t>0$ and by $X_0=0$ is called the martingale transform of M by H.
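Definition 99 translates directly into code; here is a minimal sketch (the function name and the toy data are ours):

```python
def martingale_transform(H, M):
    """X_t = sum_{i < t} H_i * (M_{i+1} - M_i), with X_0 = 0.
    H has length n, M has length n + 1; returns the path (X_0, ..., X_n)."""
    X = [0.0]
    for i, h in enumerate(H):
        X.append(X[-1] + h * (M[i + 1] - M[i]))
    return X

# toy example: H = (1, 2), M = (0, 1, 3)
print(martingale_transform([1.0, 2.0], [0.0, 1.0, 3.0]))  # [0.0, 1.0, 5.0]
```

Here $X_1 = 1\cdot(1-0) = 1$ and $X_2 = 1 + 2\cdot(3-1) = 5$, matching the definition term by term.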

Example 100 (i) Equation (27) defines $(X_t/B_t)_{t=0,\dots,n}$ as the martingale transform of the P-martingale $(S_t/B_t,\mathcal{G}_t)_{t=0,\dots,n}$ by $(\Delta_t,\mathcal{G}_t)_{t=0,\dots,n-1}$. Therefore, after an appropriate change of probability measure the discounted value processes of self financed portfolios are martingale transforms. (ii) Let $(Z_t,\mathcal{G}_t)_{t\ge0}$ be a Brownian motion on a probability space $(\Omega,\mathcal{G},P)$ and $H=(H_t,\mathcal{G}_t)_{t=0,\dots,n-1}$ be an adapted process. Then the process $(X_t)_{t=0,\dots,n}$ defined by $X_0=0$ and

$$ X_t = \sum_{i=0}^{t-1}H_i\left(Z_{i+1}-Z_i\right), $$

is the martingale transform of the Brownian motion process $(Z_t,\mathcal{G}_t)_{t\ge0}$ by H.

Modulo some integrability conditions (which are for example satisfied if H is a bounded process), martingale transforms are martingales. Indeed, for any $s>t$:

$$
\begin{aligned}
E\left(\left.\left(H\bullet M\right)_s\right|\mathcal{G}_t\right) &= \sum_{i=0}^{t-1}H_i\left(M_{i+1}-M_i\right) + E\left(\left.\sum_{i=t}^{s-1}H_i\left(M_{i+1}-M_i\right)\right|\mathcal{G}_t\right) \\
&= \left(H\bullet M\right)_t + \sum_{i=t}^{s-1}E\Big(H_i\underbrace{E\left(\left.M_{i+1}-M_i\right|\mathcal{G}_i\right)}_{=0}\Big|\mathcal{G}_t\Big) = \left(H\bullet M\right)_t,
\end{aligned}
$$

by the tower property. In fact, we will see in a later section that under some integrability conditions on the integrand H, stochastic integrals are martingale processes also in the more general continuous time setting.

Finally, notice that

$$ E\left[\left(H\bullet M\right)_t^2\right] = E\left[\sum_{i=0}^{t-1}H_i\left(M_{i+1}-M_i\right)\sum_{j=0}^{t-1}H_j\left(M_{j+1}-M_j\right)\right] = \sum_{i=0}^{t-1}\sum_{j=0}^{t-1}E\left[H_iH_j\left(M_{i+1}-M_i\right)\left(M_{j+1}-M_j\right)\right]. $$

Now, for any $i<j$ one has

$$ E\left[H_iH_j\left(M_{i+1}-M_i\right)\left(M_{j+1}-M_j\right)\right] = E\Big[H_i\left(M_{i+1}-M_i\right)H_j\underbrace{E\left(\left.M_{j+1}-M_j\right|\mathcal{G}_j\right)}_{=0}\Big] = 0, $$

again by the tower property, and similarly for $j<i$. Therefore, we get

$$ E\left[\left(H\bullet M\right)_t^2\right] = E\left[\sum_{i=0}^{t-1}H_i^2\left(M_{i+1}-M_i\right)^2\right], \qquad (28) $$

i.e. a discrete time version of the so called Itô isometry (see below). Specifically, for the case where M is a Brownian motion process, equation (28) has the simpler form

$$ E\left[\left(H\bullet Z\right)_t^2\right] = E\left[\sum_{i=0}^{t-1}H_i^2\,E\left(\left.\left(Z_{i+1}-Z_i\right)^2\right|\mathcal{G}_i\right)\right] = E\left[\sum_{i=0}^{t-1}H_i^2\,(i+1-i)\right] = E\left[\int_0^t H_s^2\,ds\right], $$

writing for any $\omega\in\Omega$ the expression $\sum_{i=0}^{t-1}H_i^2(\omega)$ as a standard Lebesgue integral $\int_0^t H_s^2(\omega)\,ds$.
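The discrete isometry (28) can be checked by simulation in the Brownian case. With the adapted, bounded integrand $H_i = 1_{\{Z_i>0\}}$ both sides equal $\sum_{i=1}^{t-1}P(Z_i>0) = (t-1)/2$; a seeded sketch (our choices of horizon, sample size and tolerances):

```python
import random

random.seed(1)
t, n_paths = 4, 100_000
lhs = rhs = 0.0
for _ in range(n_paths):
    Z = [0.0]
    for _ in range(t):
        Z.append(Z[-1] + random.gauss(0.0, 1.0))    # unit-time Brownian increments
    H = [1.0 if z > 0 else 0.0 for z in Z[:t]]      # adapted, bounded integrand
    transform = sum(H[i] * (Z[i + 1] - Z[i]) for i in range(t))
    lhs += transform ** 2                           # (H . Z)_t ^ 2
    rhs += sum(h ** 2 for h in H)                   # sum of H_i^2 (increment variance is 1)
lhs /= n_paths
rhs /= n_paths
print(lhs, rhs)  # both estimates are close to (t - 1) / 2 = 1.5
```

Crucially, H must be adapted: each $H_i$ looks only at $Z_i$, never at the increment $Z_{i+1}-Z_i$ it multiplies; using anticipating integrands would break the cross-term cancellation used above.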

There are several important theoretical and applied settings that ask for an extension of the martingale transform (i.e. of the discrete time stochastic integral) concept to a more general class of integrands defined on a continuous index set and with possibly non piecewise constant paths. From a more financially oriented perspective, extending the class of integrands suitable for stochastic integration with respect to a martingale will allow us to enlarge the set of self-financed portfolios that can act as a hedging portfolio. Eventually, this will allow us to perfectly replicate some contingent claims which in the semicontinuous model setting could not be perfectly hedged.

5.2    The Stochastic Integral

The definition and the construction of the stochastic integral for continuous time integrands with respect to a Brownian motion process is carried out in this section. Since with probability one the trajectories of a Brownian motion process are of unbounded variation, this construction cannot proceed simply by integrating pathwise along the Brownian trajectories via a standard Lebesgue-Stieltjes integral.

5.2.1      Some Basic Preliminaries

The basic idea in constructing the stochastic integral of an adapted integrand $H := (H_t,\mathcal{G}_t)_{0\le t\le T}$ is to interpret such processes as random variables $H : [0,T]\times\Omega\to\mathbb{R}$ on the product measurable space

$$ \left([0,T]\times\Omega,\ \mathcal{B}([0,T])\otimes\mathcal{G}\right). $$

This space becomes a measure space $\left([0,T]\times\Omega,\ \mathcal{B}([0,T])\otimes\mathcal{G},\ \mu^T\right)$ when equipped with the product measure $\mu^T : \mathcal{B}([0,T])\otimes\mathcal{G}\to[0,T]$ defined by

$$ \mu^T(A) = E\left(\int_0^T 1_A(t,\omega)\,dt\right) = \int_\Omega\left(\int_0^T 1_A(t,\omega)\,dt\right)dP(\omega), $$

for any $A\in\mathcal{B}([0,T])\otimes\mathcal{G}$. In particular, for any adapted process $H := (H_t,\mathcal{G}_t)_{0\le t\le T}$ one can define the $L^2$-norm of H as

$$ \|H\|_{2,T} := \left(\int_{[0,T]\times\Omega}H^2\,d\mu^T\right)^{\frac{1}{2}} = \left(E\left(\int_0^T H^2\,dt\right)\right)^{\frac{1}{2}}, $$

provided of course $\|H\|_{2,T}<\infty$. We call the space of measurable processes $(H_t,\mathcal{G}_t)_{0\le t\le T}$ such that $\|H\|_{2,T}<\infty$ the space of square integrable random variables on $\left([0,T]\times\Omega,\mathcal{B}([0,T])\otimes\mathcal{G},\mu^T\right)$ and denote it by $\mathcal{H}_T$. Precisely:

$$ \mathcal{H}_T := \left\{\mathcal{B}([0,T])\otimes\mathcal{G}\text{-measurable processes } H \text{ such that } E\left(\int_0^T H^2\,dt\right)<\infty\right\}. \qquad (29) $$

This space equipped with the norm $\|\cdot\|_{2,T}$ is a normed vector space.

Example 101 (i) The Brownian motion process $Z := (Z_t,\mathcal{G}_t)_{0\le t\le T}$ is an element of $\mathcal{H}_T$. Indeed,

$$ E\left(\int_0^T Z_t^2\,dt\right) = \int_0^T E\left(Z_t^2\right)dt = \int_0^T t\,dt = \frac{T^2}{2}<\infty. $$

(ii) The squared Brownian motion process $Z^2 := \left(Z_t^2,\mathcal{G}_t\right)_{0\le t\le T}$ is also an element of $\mathcal{H}_T$. Indeed,

$$ E\left(\int_0^T\left(Z_t^2\right)^2 dt\right) = \int_0^T E\left(Z_t^4\right)dt = \int_0^T 3\,E\left(Z_t^2\right)^2 dt = 3\int_0^T t^2\,dt = T^3<\infty, $$

using the fact that for a normally $N(0,\sigma^2)$-distributed random variable X one has $E(X^4) = 3\sigma^4$. (iii) In fact, by similar arguments as the ones above any power $Z^k$, $k\in\mathbb{N}$, of a Brownian motion can be shown to be an element of $\mathcal{H}_T$.
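Membership in $\mathcal{H}_T$ can also be probed by simulation: for Z on $[0,1]$, Example 101(i) gives $E\int_0^1 Z_t^2\,dt = T^2/2 = 0.5$. A seeded Monte Carlo sketch (grid size, sample size and tolerance are our own choices):

```python
import random

random.seed(2)
m, n_paths = 200, 20_000          # time steps on [0, 1] and number of simulated paths
dt = 1.0 / m
acc = 0.0
for _ in range(n_paths):
    z, integral = 0.0, 0.0
    for _ in range(m):
        z_new = z + random.gauss(0.0, dt ** 0.5)
        integral += 0.5 * (z * z + z_new * z_new) * dt   # trapezoidal rule for int Z_t^2 dt
        z = z_new
    acc += integral
mean_norm_sq = acc / n_paths
print(mean_norm_sq)  # close to T^2 / 2 = 0.5
```

With the trapezoidal rule the estimator of $E\int_0^1 Z_t^2\,dt$ is unbiased here, since $E\left[\tfrac12(Z_{t_i}^2+Z_{t_{i+1}}^2)\right]\Delta t = \tfrac12(t_i+t_{i+1})\Delta t$ sums exactly to $\int_0^1 t\,dt$.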

For simplicity, we fix in the sequel $T>0$ and use the notation $\mathcal{H} := \mathcal{H}_T$. Convergence in the space $\mathcal{H}$ means convergence in the $\|\cdot\|_{2,T}$-norm.

Definition 102 (i) A sequence $(H^n)_{n\in\mathbb{N}}\subset\mathcal{H}$ is said to converge to some process $H\in\mathcal{H}$ if and only if $\|H^n-H\|_{2,T}\to0$, as $n\to\infty$, i.e. if and only if

$$ \left(E\left(\int_0^T\left(H^n-H\right)^2 dt\right)\right)^{\frac{1}{2}}\underset{n\to\infty}{\longrightarrow}0, $$

or, equivalently, if and only if

$$ E\left(\int_0^T\left(H^n-H\right)^2 dt\right)\underset{n\to\infty}{\longrightarrow}0. $$

(ii) In that case we call H the limit⁴ of the sequence $(H^n)_{n\in\mathbb{N}}$ and denote it by $H=\lim_{n\to\infty}H^n$.

As usual, as for instance when defining standard Lebesgue integrals, one starts from integrands that are piecewise constant with respect to some underlying variable, i.e. simple integrands, and then extends the integral definition to more general integrands by means of a suitable limit argument.

5.2.2      Simple Integrands

Simple integrands in the space $\mathcal{H}$ and stochastic integrals for simple processes are defined as follows.

Definition 103 (i) An adapted process $H=(H_t,\mathcal{G}_t)_{t\ge0}$ is simple if there exists a partition $0=t_0<t_1<\dots<t_n=T$ of $[0,T]$ such that

$$ H_t(\omega) = \xi_0(\omega)\,1_{\{0\}}(t)+\sum_{i=0}^{n-1}\xi_i(\omega)\,1_{(t_i,t_{i+1}]}(t), \qquad \text{for all } (t,\omega)\in[0,T]\times\Omega, $$

and for some $\mathcal{G}_{t_i}$-measurable, bounded, random variables $\xi_i(\omega)$, $i=0,\dots,n$. The vector space of simple processes is denoted by $\mathcal{S}$. (ii) For a simple process $H=(H_t,\mathcal{G}_t)_{0\le t\le T}\in\mathcal{S}$, the stochastic integral of H with respect to the Brownian motion $Z := (Z_t,\mathcal{G}_t)_{t\ge0}$ is the martingale transform of Z by H. That is, for any $t\in(t_k,t_{k+1}]$ and $k=0,\dots,n-1$, we define:

$$ \int_0^t H_s\,dZ_s := \sum_{i=0}^{k-1}\xi_i\left(Z_{t_{i+1}}-Z_{t_i}\right)+\xi_k\left(Z_t-Z_{t_k}\right). $$

Sometimes we will write for brevity $\int_0^t H\,dZ := \int_0^t H_s\,dZ_s$.

⁴ Notice that if $H=\lim_{n\to\infty}H^n$, then H can be modified on a set of $\mu^T$-measure 0 without affecting the value of $\|\cdot\|_{2,T}$. Therefore, every limit in $\mathcal{H}$ is in fact a class of processes that can differ on a set of $\mu^T$-measure 0.
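Definition 103(ii) is pure bookkeeping on a path: accumulate $\xi_i$ times the increment of the path over each partition interval, capping the last interval at t. A sketch with made-up data (the function name is ours; the "path" used below is a deterministic test function, not a Brownian path, chosen only to make the bookkeeping checkable by hand):

```python
def simple_integral(xi, grid, path_at, t):
    """int_0^t H dZ for the simple process H_s = sum_i xi_i 1_{(t_i, t_{i+1}]}(s),
    where grid = [t_0, ..., t_n] and path_at(s) evaluates one path at time s."""
    total = 0.0
    for i in range(len(grid) - 1):
        lo, hi = grid[i], grid[i + 1]
        if t <= lo:
            break                                    # intervals beyond t contribute nothing
        total += xi[i] * (path_at(min(hi, t)) - path_at(lo))
    return total

path_at = lambda s: s * s                            # deterministic toy path
grid = [0.0, 1.0, 2.0, 3.0]
xi = [1.0, 2.0, 3.0]
# t = 2.5 lies in (t_2, t_3]: 1*(1-0) + 2*(4-1) + 3*(6.25-4) = 13.75
print(simple_integral(xi, grid, path_at, 2.5))  # 13.75
```

Feeding the same routine a simulated Brownian path evaluated at the partition points gives exactly the martingale transform of the stopped process discussed in the following remark.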

Remark 104 (i) Notice that by construction $\mathcal{S}\subset\mathcal{H}$ since

$$ E\left(\int_0^T H^2\,dt\right) = E\left(\int_0^T\Big(\xi_0^2\,1_{\{0\}}(s)+\sum_{i=0}^{n-1}\xi_i^2\,1_{(t_i,t_{i+1}]}(s)\Big)ds\right) = E\left(\sum_{i=0}^{n-1}\xi_i^2\left(t_{i+1}-t_i\right)\right) = \sum_{i=0}^{n-1}\left(t_{i+1}-t_i\right)E\left(\xi_i^2\right)<\infty, \qquad (30) $$

by the boundedness of $\xi_0,\dots,\xi_n$. (ii) For any given $t\ge0$ one can define the Brownian motion process stopped at time t by

$$ Z^t := \left(Z_{s\wedge t}\right)_{s\ge0}. $$

With this definition we have for any $t\in(t_k,t_{k+1}]$:

$$ Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t} = \begin{cases} Z_{t_{i+1}}-Z_{t_i} & i<k \\ Z_t-Z_{t_k} & i=k \\ Z_t-Z_t = 0 & i>k \end{cases}, $$

implying

$$ \int_0^t H_s\,dZ_s = \sum_{i=0}^{k-1}\xi_i\left(Z_{t_{i+1}}-Z_{t_i}\right)+\xi_k\left(Z_t-Z_{t_k}\right) = \sum_{i=0}^{n-1}\xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right). $$

Moreover, we have

$$ E\left(\left.Z_{t_{i+1}}^t\right|\mathcal{G}_{t_i}\right) = E\left(\left.Z_{t_{i+1}\wedge t}\right|\mathcal{G}_{t_i}\right) = \begin{cases} Z_{t_i} = Z_{t_i\wedge t} = Z_{t_i}^t & i\le k \\ Z_t = Z_{t_i\wedge t} = Z_{t_i}^t & i>k \end{cases}, $$

that is, $\left(Z_s^t,\mathcal{G}_s\right)_{s=0,t_1,t_2,\dots,t_n}$ is a discrete time martingale and $\int_0^t H_s\,dZ_s$ is the martingale transform of $Z^t$ by H. Finally, since for any $\omega\in\Omega$ and $t_i\in\{0,t_1,t_2,\dots,t_n\}$ the mapping $t\longmapsto Z_{t_i}^t(\omega)$ is a continuous one, it follows that the mapping

$$ t\longmapsto\int_0^t H_s\,dZ_s = \sum_{i=0}^{n-1}\xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right), $$

is, as a linear combination of continuous functions, also continuous. Therefore, the stochastic integral process $\left(\int_0^t H\,dZ\right)_{t\ge0}$ is a continuous time process with continuous trajectories. Notice that the integrand H is a process with possibly discontinuous trajectories. Therefore, the stochastic integral is a "regularizing" operator that maps possibly discontinuous processes into processes with continuous trajectories.

It is not surprising that the stochastic integral process $\left(\int_0^t H\,dZ\right)_{t\ge0}$ is a martingale process, since it can be written as a martingale transform. Moreover, second moments of stochastic integrals can often be easily computed by means of the so called Itô isometry. We summarize these properties in the next result for the case of a simple integrand $H\in\mathcal{S}$. In the next section, such properties will hold also for stochastic integrals of more general integrands $H\in\mathcal{H}$.

Proposition 105 Let $H, H'\in\mathcal{S}$. Then it follows:

1. The stochastic integral is a linear operator, that is:

$$ \int_0^t\left(\alpha H+\beta H'\right)dZ = \alpha\int_0^t H\,dZ+\beta\int_0^t H'\,dZ, $$

for any $\alpha,\beta\in\mathbb{R}$.

2. $\left(\int_0^t H\,dZ,\ \mathcal{G}_t\right)_{t\ge0}$ is a martingale with continuous trajectories.

3. The Itô isometry holds:

$$ E\left[\left(\int_0^t H\,dZ\right)^2\right] = E\left(\int_0^t H^2\,ds\right). $$

Proof. 1. This is immediate from the definition of the stochastic integral $\int_0^t H\,dZ$ as a (stochastic) linear combination of Brownian increments. To prove 2. it remains to show that $\left(\int_0^t H\,dZ,\ \mathcal{G}_t\right)_{t\ge0}$ is a martingale (continuity was already established in Remark 104). Thus, let $t\le s$. We then have

$$ E\left(\left.\int_0^s H\,dZ\right|\mathcal{G}_t\right) = \sum_{i=0}^{n-1}E\left(\left.\xi_i\left(Z_{t_{i+1}\wedge s}-Z_{t_i\wedge s}\right)\right|\mathcal{G}_t\right), $$

and

$$ E\left(\left.\xi_i\left(Z_{t_{i+1}\wedge s}-Z_{t_i\wedge s}\right)\right|\mathcal{G}_t\right) = \begin{cases} \xi_i\,E\left(\left.Z_{t_{i+1}\wedge s}-Z_{t_i\wedge s}\right|\mathcal{G}_t\right) = \xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right) & t\ge t_i \\[4pt] E\left(\left.\xi_i\,E\left(\left.Z_{t_{i+1}\wedge s}-Z_{t_i\wedge s}\right|\mathcal{G}_{t_i}\right)\right|\mathcal{G}_t\right) = 0 = \xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right) & t<t_i \end{cases}, $$

because for $t\le t_i$ it follows $Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}=0$. Therefore,

$$ E\left(\left.\int_0^s H\,dZ\right|\mathcal{G}_t\right) = \sum_{i=0}^{n-1}E\left(\left.\xi_i\left(Z_{t_{i+1}\wedge s}-Z_{t_i\wedge s}\right)\right|\mathcal{G}_t\right) = \sum_{i=0}^{n-1}\xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right) = \int_0^t H\,dZ. $$

To prove 3. note first that we have

$$ E\left[\left(\int_0^t H\,dZ\right)^2\right] = E\left[\left(\sum_{i=0}^{n-1}\xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)\right)\left(\sum_{j=0}^{n-1}\xi_j\left(Z_{t_{j+1}\wedge t}-Z_{t_j\wedge t}\right)\right)\right] = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1}E\left[\xi_i\xi_j\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)\left(Z_{t_{j+1}\wedge t}-Z_{t_j\wedge t}\right)\right]. $$

Further, if $i<j$ (i.e. if $t_{i+1}\le t_j$) it follows

$$ E\left[\xi_i\xi_j\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)\left(Z_{t_{j+1}\wedge t}-Z_{t_j\wedge t}\right)\right] = E\left[\xi_i\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)\xi_j\,E\left(\left.Z_{t_{j+1}\wedge t}-Z_{t_j\wedge t}\right|\mathcal{G}_{t_j}\right)\right] = 0, $$

by the independence of Brownian increments. A similar argument applies for the case $j<i$. Moreover, for $i=j$ we get

$$ E\left[\xi_i^2\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)^2\right] = E\left[\xi_i^2\,E\left(\left.\left(Z_{t_{i+1}\wedge t}-Z_{t_i\wedge t}\right)^2\right|\mathcal{G}_{t_i}\right)\right] = \begin{cases} 0 = E\left[\xi_i^2\left(t_{i+1}\wedge t-t_i\wedge t\right)\right] & t\le t_i \\[4pt] E\left[\xi_i^2\left(t_{i+1}\wedge t-t_i\wedge t\right)\right] & t>t_i \end{cases}. $$

Consequently,

$$ E\left[\left(\int_0^t H\,dZ\right)^2\right] = \sum_{i=0}^{n-1}E\left[\xi_i^2\left(t_{i+1}\wedge t-t_i\wedge t\right)\right] = E\left[\sum_{i=0}^{n-1}\xi_i^2\left(t_{i+1}\wedge t-t_i\wedge t\right)\right] = E\left(\int_0^t H^2\,ds\right), $$

as claimed. This concludes the proof.

Basically, one can think about the problem of defining a stochastic integral for more general integrands than simple processes as the problem of extending smoothly the integral definition from processes $H\in\mathcal{S}$ to a larger space of adapted integrands. Smoothness of the extension procedure is desirable in order to maintain the integral properties in Proposition 105 - which are valid for integrands $H\in\mathcal{S}$ - also for the resulting stochastic integral of a more general integrand.

It turns out that the adequate space onto which the stochastic integral can be smoothly extended is the space $\mathcal{H}$ defined in (29). This fact relies on an approximation result which states that any $H\in\mathcal{H}$ can be approximated by a sequence of simple processes $(H^n)_{n\in\mathbb{N}}\subset\mathcal{S}$ converging to H, i.e. approximating H in the norm defined on $\mathcal{H}$.

For completeness, we state without proof this crucial approximation finding precisely in the next proposition.

Proposition 106 For any process $H\in\mathcal{H}$ there exists a sequence $(H^n)_{n\in\mathbb{N}}\subset\mathcal{S}$ such that

$$ E\left(\int_0^T\left(H-H^n\right)^2 ds\right)\underset{n\to\infty}{\longrightarrow}0. $$

Example 107 We illustrate the above approximation result for the case where H = Z. We know

(see Example 101) that the Brownian motion process Z := (Zt )0≤t≤T is an element of H. Z can

be approximated by means of a sequence (H n )n∈N ⊂ H given by5

2n −1
n
Ht (ω) :=                  ZiT /2n (ω) 1(iT /2n ,(i+1)T /2n ] (t)    .
i=0

5 Notice, that for any ﬁxed ω ∈ Ω this is the same type of approximation procedure we would use to deﬁne a

standard Lebesgue integral of the function t −→ Zt (ω) (see again Remark 45).

Indeed, we have
\[
\begin{aligned}
E\left[\int_0^T (Z - H^n)^2\,dt\right] &= \int_0^T E\left[(Z_t - H^n_t)^2\right]dt \\
&= \int_0^T E\left[\Big(Z_t - \sum_{i=0}^{2^n-1} Z_{iT/2^n}\,1_{(iT/2^n,(i+1)T/2^n]}(t)\Big)^2\right]dt \\
&= \int_0^T E\left[\Big(\sum_{i=0}^{2^n-1}\big(Z_t - Z_{iT/2^n}\big)\,1_{(iT/2^n,(i+1)T/2^n]}(t)\Big)^2\right]dt \\
&= \int_0^T \sum_{i=0}^{2^n-1} E\left[\big(Z_t - Z_{iT/2^n}\big)^2\right]1_{(iT/2^n,(i+1)T/2^n]}(t)\,dt \\
&= \sum_{i=0}^{2^n-1}\int_0^T \big(t - iT/2^n\big)\,1_{(iT/2^n,(i+1)T/2^n]}(t)\,dt \\
&= \sum_{i=0}^{2^n-1}\frac{(T/2^n)^2}{2} = \frac{T^2}{2^{n+1}} \xrightarrow[n\to\infty]{} 0 .
\end{aligned}
\]

Therefore, $H^n \xrightarrow[n\to\infty]{} Z$ in the space $H$. Remark that, strictly speaking, $H^n \notin S$ because $Z_t$ is unbounded for any $t$. However, any $H^n$ can be approximated by a sequence $(K^{nk})_{k\in\mathbb{N}} \subset S$ defined by
\[
\begin{aligned}
K^{nk}_t(\omega) &= \sum_{i=0}^{2^n-1} Z_{iT/2^n}(\omega)\,1_{(iT/2^n,(i+1)T/2^n]}(t)\,1_{\{|Z_{iT/2^n}|\le k\}}(\omega) \\
&\quad + \sum_{i=0}^{2^n-1} k\,1_{(iT/2^n,(i+1)T/2^n]}(t)\,1_{\{|Z_{iT/2^n}|>k\}}(\omega) .
\end{aligned}
\]

Indeed,
\[
\begin{aligned}
E\left[\int_0^T \big(H^n_t - K^{nk}_t\big)^2\,dt\right] &= \int_0^T E\left[\big(H^n_t - K^{nk}_t\big)^2\right]dt \\
&= \int_0^T E\left[\Big(\sum_{i=0}^{2^n-1}\big(Z_{iT/2^n} - k\big)\,1_{\{|Z_{iT/2^n}|>k\}}\,1_{(iT/2^n,(i+1)T/2^n]}(t)\Big)^2\right]dt \\
&= \int_0^T \sum_{i=0}^{2^n-1} E\left[\big(Z_{iT/2^n} - k\big)^2\,1_{\{|Z_{iT/2^n}|>k\}}\right]1_{(iT/2^n,(i+1)T/2^n]}(t)\,dt .
\end{aligned}
\]
This last integral is finite and non-negative, since it is the Lebesgue integral of a simple (deterministic) function of $t$:
\[
t \longmapsto \sum_{i=0}^{2^n-1} E\left[\big(Z_{iT/2^n} - k\big)^2\,1_{\{|Z_{iT/2^n}|>k\}}\right]1_{(iT/2^n,(i+1)T/2^n]}(t) . \qquad (31)
\]

Moreover, by the properties of normal distributions one has, for any $i = 0,..,2^n-1$ and any $k > k'$, the strict inequality
\[
E\left[\big(Z_{iT/2^n} - k\big)^2\,1_{\{|Z_{iT/2^n}|>k\}}\right] < E\left[\big(Z_{iT/2^n} - k'\big)^2\,1_{\{|Z_{iT/2^n}|>k'\}}\right] ,
\]
implying that, as a function of $k$, the corresponding Lebesgue integrals in (31) build a monotonically strictly decreasing non-negative sequence. Therefore, this sequence of integrals can only converge to the limit 0, i.e.:
\[
E\left[\int_0^T \big(H^n_t - K^{nk}_t\big)^2\,dt\right] \underset{k\to\infty}{\downarrow}\ 0 .
\]

Summarizing, $Z$ can be approximated by the sequence $(H^n)_{n\in\mathbb{N}} \subset H$ and any $H^n$ can be approximated by a sequence $(K^{nk})_{k\in\mathbb{N}} \subset S$. In turn, $Z$ can be approximated by a sequence $(W^n)_{n\in\mathbb{N}} \subset S$, where $W^n := K^{nn}$.
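The convergence rate $E\big[\int_0^T (Z - H^n)^2\,dt\big] = T^2/2^{n+1}$ derived above can be checked numerically. The sketch below is our own illustration (the function name `approx_error` is ours): since $E\big[(Z_t - Z_{iT/2^n})^2\big] = t - iT/2^n$ on each dyadic interval, the expectation reduces to a deterministic Riemann sum and no simulation is needed.

```python
import numpy as np

T = 2.0

def approx_error(n, steps=2**20):
    """Evaluate E ∫_0^T (Z_t - H^n_t)^2 dt as a deterministic Riemann sum.

    On the dyadic interval (iT/2^n, (i+1)T/2^n] one has
    E[(Z_t - Z_{iT/2^n})^2] = t - iT/2^n (independent Brownian increments)."""
    t = (np.arange(steps) + 0.5) * T / steps    # midpoints of a fine grid
    left = np.floor(t * 2**n / T) * T / 2**n    # left dyadic endpoint iT/2^n
    return np.sum(t - left) * T / steps

for n in range(1, 6):
    print(n, approx_error(n), T**2 / 2**(n + 1))   # the two columns agree
```

The error halves with each refinement of the dyadic grid, exactly as the closed form predicts.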

5.2.3    Squared Integrable Integrands

We can now deﬁne the stochastic integral of any process H ∈ H by using the approximation result

in Proposition 106.

We notice first that for any sequence $(H^n)_{n\in\mathbb{N}} \subset S$ converging to a process $H \in H$, i.e. such that
\[
E\left[\int_0^T (H^n_s - H_s)^2\,ds\right] \xrightarrow[n\to\infty]{} 0 ,
\]
it follows, for any $0 \le t \le T$:
\[
\begin{aligned}
E\left[\Big(\int_0^t H^n_s\,dZ - \int_0^t H^m_s\,dZ\Big)^2\right] &= E\left[\Big(\int_0^t (H^n_s - H^m_s)\,dZ\Big)^2\right] \\
&= E\left[\int_0^t (H^n_s - H^m_s)^2\,ds\right] \\
&\le 2\,E\left[\int_0^T (H^n_s - H_s)^2\,ds\right] + 2\,E\left[\int_0^T (H^m_s - H_s)^2\,ds\right] \xrightarrow[n,m\to\infty]{} 0 \qquad (32)
\end{aligned}
\]
using in the first equality the linearity of stochastic integrals for simple processes, in the second the Itô isometry applied to the simple process $H^n_s - H^m_s$, and in the last inequality the bound $(a+b)^2 \le 2\,(a^2 + b^2)$, where $a, b \in \mathbb{R}$.
Therefore, for any $0 \le t \le T$ the sequence $\big(\int_0^t H^n_s\,dZ\big)_{n\in\mathbb{N}}$ is a Cauchy sequence of random variables with mean 0 and variance $E\big[\int_0^t (H^n_s)^2\,ds\big] < \infty$, i.e. a Cauchy sequence in the space of squared integrable random variables on $(\Omega, G, P)$. It is well known that in this space any Cauchy sequence converges to a well-defined element of the space. Therefore, it is a natural idea to define the limit of the sequence $\big(\int_0^t H^n_s\,dZ\big)_{n\in\mathbb{N}}$ as the stochastic integral $\int_0^t H\,dZ$ of $H$ with respect to $Z$. We state this precisely in the next definition.

Definition 108 Let $H \in H$ be a squared integrable process and let $(H^n)_{n\in\mathbb{N}} \subset S$ be any sequence of simple processes approximating $H$, i.e. such that:
\[
E\left[\int_0^T (H - H^n)^2\,ds\right] \xrightarrow[n\to\infty]{} 0 .
\]
For any $0 \le t \le T$ the stochastic integral of $H$ with respect to the Brownian motion $Z := (Z_t, G_t)_{0\le t\le T}$ is defined by
\[
\int_0^t H_s\,dZ_s := \lim_{n\to\infty}\int_0^t H^n_s\,dZ_s .
\]

Example 109 We compute the stochastic integral as the corresponding limit in an example where explicit computations are possible. Recall from Example 107 that the sequence $(H^n)_{n\in\mathbb{N}} \subset H$ defined by
\[
H^n_t(\omega) := \sum_{i=0}^{2^n-1} Z_{iT/2^n}(\omega)\,1_{(iT/2^n,(i+1)T/2^n]}(t) ,
\]
converges in the space $H$ to the Brownian motion process $Z$. We have:
\[
\begin{aligned}
\int_0^T H^n_s\,dZ_s &= \sum_{i=0}^{2^n-1} Z_{iT/2^n}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big) \\
&= \frac12\big(Z_T^2 - Z_0^2\big) - \frac12\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 .
\end{aligned}
\]

Moreover, we have
\[
E\left[\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2\right] = \sum_{i=0}^{2^n-1} E\left[\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2\right] = \sum_{i=0}^{2^n-1}\frac{T}{2^n} = T ,
\]

and
\[
\begin{aligned}
E\left[\Big(\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - T\Big)^2\right]
&= E\left[\Big(\sum_{i=0}^{2^n-1}\Big(\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - \frac{T}{2^n}\Big)\Big)^2\right] \\
&= \sum_{i=0}^{2^n-1} E\left[\Big(\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - \frac{T}{2^n}\Big)^2\right] ,
\end{aligned}
\]
using in the second equality the independence of Brownian increments, which implies
\[
E\left[\Big(\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - \frac{T}{2^n}\Big)\Big(\big(Z_{(j+1)T/2^n} - Z_{jT/2^n}\big)^2 - \frac{T}{2^n}\Big)\right] = 0 ,
\]
for $i \ne j$. Further we obtain
\[
\begin{aligned}
E\left[\Big(\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - \frac{T}{2^n}\Big)^2\right]
&= E\left[\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^4\right] + \Big(\frac{T}{2^n}\Big)^2 - 2\cdot\frac{T}{2^n}\,E\left[\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2\right] \\
&= 3\Big(\frac{T}{2^n}\Big)^2 + \Big(\frac{T}{2^n}\Big)^2 - 2\Big(\frac{T}{2^n}\Big)^2 = 2\Big(\frac{T}{2^n}\Big)^2 .
\end{aligned}
\]

Therefore,
\[
E\left[\Big(\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 - T\Big)^2\right] = \frac{T^2}{2^{n-1}} \xrightarrow[n\to\infty]{} 0 ,
\]
i.e.
\[
\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2 \xrightarrow[n\to\infty]{} T ,
\]
in the space of squared integrable random variables on $(\Omega, G, P)$. Summarizing, this gives
\[
\begin{aligned}
\int_0^T H_s\,dZ_s &= \lim_{n\to\infty}\int_0^T H^n_s\,dZ_s \\
&= \lim_{n\to\infty}\left(\frac12 Z_T^2 - \frac12\sum_{i=0}^{2^n-1}\big(Z_{(i+1)T/2^n} - Z_{iT/2^n}\big)^2\right) \\
&= \frac12 Z_T^2 - \frac12 T .
\end{aligned}
\]
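The limit $\int_0^T Z\,dZ = \frac12 Z_T^2 - \frac12 T$ obtained in Example 109 can be illustrated by simulation. The following sketch is our own (grid and path counts are illustrative choices): it compares the discretized left-point integral sums with the closed form, path by path, and checks that the sample mean of the integral is near zero, as a martingale started at 0 requires.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 2048, 2000
dt = T / n_steps

# Brownian increments and paths (Z_0 = 0)
dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
Z = np.cumsum(dZ, axis=1)
Z_left = np.hstack([np.zeros((n_paths, 1)), Z[:, :-1]])   # Z at left endpoints

# Discretized stochastic integral: sum_i Z_{t_i} (Z_{t_{i+1}} - Z_{t_i})
integral = np.sum(Z_left * dZ, axis=1)
closed_form = 0.5 * Z[:, -1]**2 - 0.5 * T

print(np.max(np.abs(integral - closed_form)))  # small on every path
print(integral.mean())                         # martingale: sample mean near 0
```

The residual pathwise discrepancy is exactly $\frac12\big|\sum_i(\Delta Z_i)^2 - T\big|$, i.e. the fluctuation of the discrete quadratic variation around $T$.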

5.2.4    Properties of Stochastic Integrals

It is immediate from the definition of the stochastic integral of a process $H \in H$ as the limit of a sequence of stochastic integrals of processes $H^n \in S$ that linearity is preserved in the limit, i.e.
\[
\int_0^t (\alpha H_s + \beta H'_s)\,dZ_s = \alpha\int_0^t H_s\,dZ_s + \beta\int_0^t H'_s\,dZ_s ,
\]
for any $H, H' \in H$ and $\alpha, \beta \in \mathbb{R}$.

Furthermore, the key property of the space $H$ from the perspective of stochastic integration is that convergence of a sequence $(H^n)_{n\in\mathbb{N}} \subset H$ to some limit $H \in H$ defines the corresponding stochastic integral as the limit of the sequence $\big(\int H^n\,dZ\big)_{n\in\mathbb{N}}$ in the space of squared integrable random variables on $(\Omega, G, P)$ (see again equation (32) and the following discussion). In fact, this implies convergence of the first two moments and the conditional expectations of the sequence $\big(\int H^n\,dZ\big)_{n\in\mathbb{N}}$ to the first two moments and the conditional expectations of the limit
\[
\int H\,dZ = \lim_{n\to\infty}\int H^n\,dZ .
\]
For instance, this gives
\[
E\left[\int_0^t H_u\,dZ_u\,\Big|\,G_s\right] = \lim_{n\to\infty} E\left[\int_0^t H^n_u\,dZ_u\,\Big|\,G_s\right] = \lim_{n\to\infty}\int_0^s H^n_u\,dZ_u = \int_0^s H_u\,dZ_u ,
\]

where $t \ge s$, i.e. the martingale property. Therefore, the martingale property and the Itô isometry of stochastic integrals, which have been shown to hold for integrals of simple processes, are maintained for stochastic integrals of integrands $H \in H$.

Finally, it can also be shown, at the cost of some more technical details, that convergence in the space of squared integrable random variables, together with the martingale property and the continuity of stochastic integrals of simple processes, implies the continuity of the stochastic integral of a process $H \in H$. For completeness, we summarize the above discussion in the next proposition.

Proposition 110 Let $H, H' \in H$. It then follows:

1. The stochastic integral is a linear operator, that is:
\[
\int_0^t (\alpha H + \beta H')\,dZ = \alpha\int_0^t H\,dZ + \beta\int_0^t H'\,dZ ,
\]
for any $\alpha, \beta \in \mathbb{R}$.

2. $\big(\int_0^t H\,dZ,\ G_t\big)_{0\le t\le T}$ is a martingale with continuous trajectories.

3. The Itô isometry holds:
\[
E\left[\Big(\int_0^t H\,dZ\Big)^2\right] = E\left[\int_0^t H^2\,ds\right] .
\]
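Property 3 can be illustrated by Monte Carlo for the integrand $H = Z$. The following is our own numerical sketch (parameter values are illustrative): both sides of the isometry are estimated from the same simulated paths and compared with the closed-form value $E\big[\int_0^t Z_s^2\,ds\big] = \int_0^t s\,ds = t^2/2$, which follows from $E[Z_s^2] = s$.

```python
import numpy as np

rng = np.random.default_rng(1)
t, n_steps, n_paths = 1.0, 512, 10000
dt = t / n_steps

dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
Z = np.cumsum(dZ, axis=1)
Z_left = np.hstack([np.zeros((n_paths, 1)), Z[:, :-1]])   # left-point values

lhs = np.mean(np.sum(Z_left * dZ, axis=1) ** 2)   # E[(∫ Z dZ)^2]
rhs = np.mean(np.sum(Z_left**2, axis=1) * dt)     # E[∫ Z^2 ds]
print(lhs, rhs, t**2 / 2)                         # all three are close
```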

5.3    Itô's Lemma

This section introduces a differential calculus for differentiable functions of stochastic integrals. This calculus is called Itô's calculus and its primary tool is Itô's formula, a version of the fundamental theorem of calculus for stochastic differentials.

5.3.1     Starting Point, Motivation and Some First Examples

To introduce the basic ideas behind stochastic differentials we start with a simple illustrative example. Consider the squared Brownian motion process $Z^2 := \big(Z_t^2, G_t\big)_{t\ge 0}$. The goal is to express $Z_t^2$ by means of an integral form of the type
\[
Z_t^2 - Z_0^2 = \int_0^t K_s\,ds + \int_0^t H_s\,dZ_s ,
\]
for some suitable adapted integrands $(K_t, G_t)_{t\ge 0}$ and $(H_t, G_t)_{t\ge 0}$. Such a representation would motivate the suggestive stochastic differential notation
\[
d\big(Z_t^2\big) = K_t\,dt + H_t\,dZ_t .
\]

Under the naive assumption that for any $\omega \in \Omega$ the Brownian path $t \longmapsto Z_t(\omega)$ is a differentiable function (we know it is not!) one would be tempted to apply the standard fundamental theorem of calculus to write
\[
Z_t^2(\omega) - Z_0^2(\omega) = \int_0^t \frac{d}{ds}Z_s^2(\omega)\,ds = 2\int_0^t Z_s(\omega)\cdot\underbrace{\frac{d}{ds}Z_s(\omega)\,ds}_{=\,dZ_s(\omega)\ \text{under the differentiability assumption}} = 2\int_0^t Z_s(\omega)\cdot dZ_s(\omega) . \qquad (33)
\]

Unfortunately, this approach cannot work, as we know. In fact, we developed a particular stochastic integral construction precisely to cope with the fact that Brownian trajectories are of unbounded variation and thus not differentiable.

Indeed, the naive approach (33) leads immediately to an internal inconsistency, which can be highlighted as follows. First, notice that the stochastic integral process $\big(\int_0^t Z\,dZ\big)_{0\le t\le T}$ on the RHS of (33) is a martingale, since $(Z_t, G_t)_{0\le t\le T} \in H$ (cf. again Example 101). At the same time, the process $\big(Z_t^2, G_t\big)_{0\le t\le T}$ is a submartingale, since $\big(Z_t^2 - t, G_t\big)_{0\le t\le T}$ is a martingale (cf. again Example 98):
\[
E\left[Z_s^2 - s\,\big|\,G_t\right] = Z_t^2 - t \iff E\left[Z_s^2\,\big|\,G_t\right] = Z_t^2 + (s - t) > Z_t^2 , \qquad (34)
\]

for any $s > t$. Thus, (34) shows that a fundamental theorem of calculus for stochastic integrals has to be of a different form than the standard one. In particular, it appears that the standard theorem neglects a deterministic term in $Z_t^2$, namely the expected value $t = E\big[Z_t^2\big]$ of $Z_t^2$ on the LHS of (33). Therefore, one could be tempted to write
\[
Z_t^2 - t = 2\int_0^t Z\,dZ ,
\]
i.e.
\[
Z_t^2 - Z_0^2 = 2\int_0^t Z\,dZ + t , \qquad (35)
\]

in order to avoid, at least superficially, the inconsistency behind (33). It turns out that this is the correct guess. Indeed, the structure behind (35) can be highlighted as a special case of Itô's formula by setting $g(Z_t) = Z_t^2$, to get
\[
g(Z_t) - g(Z_0) = \int_0^t g'(Z_s)\,dZ_s + \frac12\int_0^t g''(Z_s)\,ds . \qquad (36)
\]
Notice that in (36) $Z_t^2$ has been decomposed as the sum of a stochastic integral, giving the martingale component of $Z_t^2$, and a standard pathwise Lebesgue integral, describing the deterministic trend of $Z_t^2$. Expression (36) gives the mathematical foundation for the stochastic differential form
\[
dg(Z_t) = g'(Z_t)\,dZ_t + \frac12\,g''(Z_t)\,dt ,
\]
of Itô's formula.

An immediate extension of Itô's formula (36) arises when $g$ is a function of both $t$ and $Z_t$. In this case, the rule is to apply standard differentials to deterministic variables and stochastic differentials, by means of Itô's formula (36), to stochastic variables, that is:
\[
g(t, Z_t) - g(0, Z_0) = \int_0^t \frac{\partial g}{\partial Z}(s, Z_s)\,dZ_s + \int_0^t\left(\frac{\partial g}{\partial t}(s, Z_s) + \frac12\,\frac{\partial^2 g}{\partial Z^2}(s, Z_s)\right)ds , \qquad (37)
\]
or, in differential form,
\[
dg(t, Z_t) = \frac{\partial g}{\partial Z}(t, Z_t)\,dZ_t + \left(\frac{\partial g}{\partial t}(t, Z_t) + \frac12\,\frac{\partial^2 g}{\partial Z^2}(t, Z_t)\right)dt .
\]

Example 111 (Geometric Brownian Motion) A geometric Brownian motion process $(S_t, G_t)_{t\ge 0}$ is defined by
\[
S_t = S_0\exp\left(\sigma Z_t + \Big(\mu - \frac{\sigma^2}{2}\Big)t\right) =: g(t, Z_t) , \qquad (38)
\]
for given $S_0$, $\sigma > 0$ and $\mu \in \mathbb{R}$. It then follows
\[
\begin{aligned}
\frac{\partial g}{\partial t}(t, Z_t) &= S_0\exp\left(\sigma Z_t + \Big(\mu - \frac{\sigma^2}{2}\Big)t\right)\Big(\mu - \frac{\sigma^2}{2}\Big) = S_t\Big(\mu - \frac{\sigma^2}{2}\Big) , \\
\frac{\partial g}{\partial Z}(t, Z_t) &= S_0\exp\left(\sigma Z_t + \Big(\mu - \frac{\sigma^2}{2}\Big)t\right)\sigma = S_t\,\sigma , \\
\frac{\partial^2 g}{\partial Z^2}(t, Z_t) &= S_t\,\sigma^2 .
\end{aligned}
\]
Itô's formula (37) then gives
\[
\begin{aligned}
S_t - S_0 &= g(t, Z_t) - g(0, Z_0) \\
&= \int_0^t \frac{\partial g}{\partial Z}(s, Z_s)\,dZ_s + \int_0^t\left(\frac{\partial g}{\partial t}(s, Z_s) + \frac12\,\frac{\partial^2 g}{\partial Z^2}(s, Z_s)\right)ds \\
&= \int_0^t S_s\,\sigma\,dZ_s + \int_0^t\left(S_s\Big(\mu - \frac{\sigma^2}{2}\Big) + \frac12\,S_s\,\sigma^2\right)ds \\
&= \int_0^t S_s\,\mu\,ds + \int_0^t S_s\,\sigma\,dZ_s ,
\end{aligned}
\]
or, in differential form,
\[
dS_t = \mu S_t\,dt + \sigma S_t\,dZ_t .
\]

Remark 112 Example 111 provides a ﬁrst example of a solution to a stochastic diﬀerential equa-

tion. Indeed, if we consider the problem of determining an adapted process (St , Gt )t≥0 that solves

the stochastic diﬀerential equation

dSt = µSt dt + σSt dZt                          ,

for a given S0 , Example 111 already gives the answer: (St , Gt )t≥0 is the Geometric Brownian

motion process deﬁned in (38).
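The stochastic differential equation in Remark 112 can also be solved numerically with the Euler-Maruyama scheme. The following is our own numerical sketch (parameter values are illustrative): driving the scheme and the exact solution (38) with the same simulated Brownian increments exhibits the pathwise agreement, and the sample mean of the terminal values matches $E[S_T] = S_0 e^{\mu T}$.

```python
import numpy as np

rng = np.random.default_rng(2)
S0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0
n_steps, n_paths = 2000, 5000
dt = T / n_steps

dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)

# Euler-Maruyama scheme for dS = mu*S*dt + sigma*S*dZ
S = np.full(n_paths, S0)
for k in range(n_steps):
    S = S + mu * S * dt + sigma * S * dZ[:, k]

# Exact solution (38) driven by the same Brownian paths
S_exact = S0 * np.exp(sigma * dZ.sum(axis=1) + (mu - sigma**2 / 2) * T)

print(np.mean(np.abs(S - S_exact)))    # small pathwise discrepancy
print(S.mean(), S0 * np.exp(mu * T))   # sample mean vs E[S_T] = S0*exp(mu*T)
```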

5.3.2     A Simplified Derivation of Itô's Formula

We illustrate some of the main ideas behind the proof of Itô's formula (37) by providing a proof for the simpler case where
\[
g(t, Z) = \frac12\,(t + Z)^2 . \qquad (39)
\]
Thus, we are going to show that
\[
g(t, Z_t) = \int_0^t \frac{\partial g}{\partial Z}(s, Z_s)\,dZ_s + \int_0^t\left(\frac{\partial g}{\partial t}(s, Z_s) + \frac12\,\frac{\partial^2 g}{\partial Z^2}(s, Z_s)\right)ds ,
\]
i.e., for our specific case (apply Itô's formula (37) to (39)),
\[
\frac12\,(t + Z_t)^2 = \int_0^t\Big(\frac12 + s + Z_s\Big)ds + \int_0^t (s + Z_s)\,dZ_s . \qquad (40)
\]

Proof. Let us first fix a partition $\{t_0,..,t_{2^n}\}$ of the interval $[0, t]$, given by
\[
t_i = \frac{i\,t}{2^n} , \quad i = 0,..,2^n .
\]
We then have
\[
\frac12\,(t + Z_t)^2 = \frac12\sum_{i=0}^{2^n-1}\left[\big(t_{i+1} + Z_{t_{i+1}}\big)^2 - \big(t_i + Z_{t_i}\big)^2\right] . \qquad (41)
\]
We can now apply an exact second order Taylor approximation to any term in the sum (41):
\[
\begin{aligned}
\frac12\left[\big(t_{i+1} + Z_{t_{i+1}}\big)^2 - \big(t_i + Z_{t_i}\big)^2\right] &= g\big(t_{i+1}, Z_{t_{i+1}}\big) - g\big(t_i, Z_{t_i}\big) \\
&= \frac{\partial g(t_i, Z_{t_i})}{\partial t}\,(t_{i+1} - t_i) + \frac{\partial g(t_i, Z_{t_i})}{\partial Z}\,\big(Z_{t_{i+1}} - Z_{t_i}\big) \\
&\quad + \frac12\,\frac{\partial^2 g(t_i, Z_{t_i})}{\partial t^2}\,(t_{i+1} - t_i)^2 + \frac12\,\frac{\partial^2 g(t_i, Z_{t_i})}{\partial Z^2}\,\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 \\
&\quad + \frac{\partial^2 g(t_i, Z_{t_i})}{\partial t\,\partial Z}\,(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) .
\end{aligned}
\]
Explicit computations then give
\[
\frac{\partial g(t_i, Z_{t_i})}{\partial t} = t_i + Z_{t_i} = \frac{\partial g(t_i, Z_{t_i})}{\partial Z} ,
\]
and
\[
\frac{\partial^2 g(t_i, Z_{t_i})}{\partial t\,\partial Z} = \frac{\partial^2 g(t_i, Z_{t_i})}{\partial Z\,\partial t} = \frac{\partial^2 g(t_i, Z_{t_i})}{\partial t^2} = \frac{\partial^2 g(t_i, Z_{t_i})}{\partial Z^2} = 1 .
\]
Therefore,
\[
\begin{aligned}
\frac{\big(t_{i+1} + Z_{t_{i+1}}\big)^2}{2} - \frac{\big(t_i + Z_{t_i}\big)^2}{2} &= (t_i + Z_{t_i})\big(t_{i+1} - t_i + Z_{t_{i+1}} - Z_{t_i}\big) \\
&\quad + \frac12\,(t_{i+1} - t_i)^2 + \frac12\,\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 \\
&\quad + \frac12\cdot 2\,(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) ,
\end{aligned}
\]
and
\[
\begin{aligned}
\frac12\,(t + Z_t)^2 &= \sum_{i=0}^{2^n-1}(t_i + Z_{t_i})\big(t_{i+1} - t_i + Z_{t_{i+1}} - Z_{t_i}\big) \\
&\quad + \frac12\sum_{i=0}^{2^n-1}\left[(t_{i+1} - t_i)^2 + \big(Z_{t_{i+1}} - Z_{t_i}\big)^2 + 2\,(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big)\right] . \qquad (42)
\end{aligned}
\]

We now compute the limit as $n \to \infty$ of each term in this expression. We first have
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}(t_i + Z_{t_i})(t_{i+1} - t_i) = \int_0^t (s + Z_s)\,ds ,
\]
a pathwise Lebesgue integral. Further,
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}(t_i + Z_{t_i})\big(Z_{t_{i+1}} - Z_{t_i}\big) = \int_0^t (s + Z_s)\,dZ_s ,
\]
a stochastic integral (see again Example 109). Moreover,
\[
\lim_{n\to\infty}\frac12\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 = \frac t2 = \int_0^t \frac12\,ds , \qquad (43)
\]
again from the computations in Example 109.

In order to prove (40), we thus have to show that all remaining terms in (42) converge to zero. Indeed, we have:
\[
\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)^2 = \sum_{i=0}^{2^n-1}\Big(\frac{t}{2^n}\Big)^2 = \frac{t^2}{2^n} \xrightarrow[n\to\infty]{} 0 .
\]
Moreover,
\[
\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) = \frac{t}{2^n}\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big) = \frac{t}{2^n}\,(Z_t - Z_0) .
\]
Hence, for any $\omega \in \Omega$:
\[
\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big)(\omega) \xrightarrow[n\to\infty]{} 0 .
\]
This concludes the proof.
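Identity (40) can be checked numerically along simulated Brownian paths, replacing the two integrals with exactly the left-point sums used in the proof. This is our own sketch (grid and path counts are illustrative); the residual error is driven by the fluctuation of the discrete quadratic variation around $t$ and shrinks as the partition is refined.

```python
import numpy as np

rng = np.random.default_rng(3)
t, n_steps, n_paths = 1.0, 2**14, 200
dt = t / n_steps

dZ = rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt)
Z = np.cumsum(dZ, axis=1)
Z_left = np.hstack([np.zeros((n_paths, 1)), Z[:, :-1]])   # Z_{t_i}
s_left = np.arange(n_steps) * dt                          # t_i

# LHS and discretized RHS of (40)
lhs = 0.5 * (t + Z[:, -1]) ** 2
rhs = np.sum((0.5 + s_left + Z_left) * dt, axis=1) \
    + np.sum((s_left + Z_left) * dZ, axis=1)
print(np.max(np.abs(lhs - rhs)))   # small for every simulated path
```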

Remark 113 The above proof has been considerably simplified by the fact that the function
\[
g(t, Z) = \frac12\,(t + Z)^2
\]
can be locally exactly approximated by a sequence of second order Taylor expansions. In the more general case one will have to work with exact Taylor approximations of the form
\[
\begin{aligned}
g\big(t_{i+1}, Z_{t_{i+1}}\big) - g\big(t_i, Z_{t_i}\big) &= \frac{\partial g(t_i, Z_{t_i})}{\partial t}\,(t_{i+1} - t_i) + \frac{\partial g(t_i, Z_{t_i})}{\partial Z}\,\big(Z_{t_{i+1}} - Z_{t_i}\big) \\
&\quad + \frac12\,\frac{\partial^2 g(t_i^*, Z_i^*)}{\partial t^2}\,(t_{i+1} - t_i)^2 + \frac12\,\frac{\partial^2 g(t_i^*, Z_i^*)}{\partial Z^2}\,\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 \\
&\quad + \frac{\partial^2 g(t_i^*, Z_i^*)}{\partial t\,\partial Z}\,(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) ,
\end{aligned}
\]
for some $t_i^*, Z_i^*$ such that $t_i^* \in [t_i, t_{i+1}]$ and $Z_i^* \in \big[Z_{t_i}, Z_{t_{i+1}}\big]$, and show that the residual approximation error goes to 0 as $n \to \infty$.

The above simplified proof of Itô's formula shows why the non-standard second derivative term
\[
\frac12\int_0^t \frac{\partial^2 g(s, Z_s)}{\partial Z^2}\,ds
\]
appears. In fact, we have shown for our specific example that
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}\frac{\partial^2 g(t_i, Z_{t_i})}{\partial Z^2}\,\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 = \lim_{n\to\infty}\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 = \int_0^t ds = \int_0^t \frac{\partial^2 g(s, Z_s)}{\partial Z^2}\,ds .
\]
Notice that
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 = t , \qquad (44)
\]
is the quadratic variation of the Brownian motion process on the interval $[0, t]$, which is of order $t$ and thus not⁶ zero. Therefore, the further term in Itô's formula derives precisely from the non-zero quadratic variation of Brownian motion.
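The non-zero limit (44) is easy to see numerically: along a single simulated Brownian path, the quadratic variation stabilizes near $t$, while the first variation $\sum_i |Z_{t_{i+1}} - Z_{t_i}|$ grows without bound as the grid is refined (which is the unbounded-variation property invoked above). This is our own sketch; the helper name `variations` is ours.

```python
import numpy as np

def variations(n_steps, t=1.0, seed=4):
    """Quadratic and first variation of a simulated Brownian path on [0, t]."""
    rng = np.random.default_rng(seed)
    dZ = rng.standard_normal(n_steps) * np.sqrt(t / n_steps)
    return np.sum(dZ**2), np.sum(np.abs(dZ))

for n in (6, 10, 14):
    qv, fv = variations(2**n)
    print(2**n, qv, fv)   # qv stabilizes near t = 1, fv keeps growing
```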

By contrast, we have shown that
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}\frac{\partial^2 g(t_i, Z_{t_i})}{\partial t^2}\,(t_{i+1} - t_i)^2 = \lim_{n\to\infty}\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)^2 = 0 ,
\]
i.e. that the second order derivative terms arising from the dependence on the deterministic argument $t$ have zero quadratic variation, and thus do not contribute to Itô's formula.

Similarly, for the mixed second order derivative terms we have shown
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}\frac{\partial^2 g(t_i, Z_{t_i})}{\partial t\,\partial Z}\,(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) = \lim_{n\to\infty}\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big) = 0 ,
\]

⁶ In particular, this implies that Brownian motion has non-differentiable trajectories, because otherwise one would get
\[
\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 = \sum_{i=0}^{2^n-1}\Big(\frac{d}{dt}Z_{t_i^*}\,(t_{i+1} - t_i)\Big)^2 = \frac{t}{2^n}\sum_{i=0}^{2^n-1}\Big(\frac{d}{dt}Z_{t_i^*}\Big)^2\,(t_{i+1} - t_i) ,
\]
where $t_i^* \in [t_i, t_{i+1}]$, and, in the limit,
\[
\sum_{i=0}^{2^n-1}\big(Z_{t_{i+1}} - Z_{t_i}\big)^2 \xrightarrow[n\to\infty]{} \lim_{n\to\infty}\frac{t}{2^n}\cdot\int_0^t\Big(\frac{d}{dt}Z_s\Big)^2\,ds = 0 ,
\]
i.e. a contradiction with (43).

i.e. they also do not contribute to Itô's formula. In that case, the contribution is zero because the quadratic cross variation
\[
\lim_{n\to\infty}\sum_{i=0}^{2^n-1}(t_{i+1} - t_i)\big(Z_{t_{i+1}} - Z_{t_i}\big)
\]
between $t$ and $Z_t$ is zero.

Based on these considerations, a simple mechanical rule can be motivated to compute Itô differentials. It consists in first computing a second order Taylor "differential" and then defining second order differentials in the single variables according to the simple "multiplication rule":
\[
\begin{array}{c|cc}
\cdot & dt & dZ_t \\ \hline
dt & 0 & 0 \\
dZ_t & 0 & dt
\end{array}
\]
In a mechanical way, this gives Itô's formula as
\[
\begin{aligned}
dg(t, Z_t) &= \frac{\partial g(t, Z_t)}{\partial t}\,dt + \frac{\partial g(t, Z_t)}{\partial Z}\,dZ_t + \frac12\,\frac{\partial^2 g(t, Z_t)}{\partial t^2}\,(dt)^2 + \frac12\,\frac{\partial^2 g(t, Z_t)}{\partial Z^2}\,(dZ_t)^2 + \frac{\partial^2 g(t, Z_t)}{\partial t\,\partial Z}\,dt\,dZ_t \\
&= \left(\frac{\partial g(t, Z_t)}{\partial t} + \frac12\,\frac{\partial^2 g(t, Z_t)}{\partial Z^2}\right)dt + \frac{\partial g(t, Z_t)}{\partial Z}\,dZ_t ,
\end{aligned}
\]
by using the multiplication rules $(dt)^2 = dt\,dZ_t = 0$ and $(dZ_t)^2 = dt$.
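The mechanical rule can also be verified symbolically. The sketch below is our own (using sympy): it computes the $dt$- and $dZ_t$-coefficients of Itô's formula for the function $g$ of Example 111 and recovers $dS_t = \mu S_t\,dt + \sigma S_t\,dZ_t$.

```python
import sympy as sp

t, Z, S0, mu, sigma = sp.symbols('t Z S0 mu sigma', positive=True)

# g(t, Z) from (38): Geometric Brownian motion as a function of (t, Z_t)
g = S0 * sp.exp(sigma * Z + (mu - sigma**2 / 2) * t)

# Mechanical rule: dg = (g_t + g_ZZ / 2) dt + g_Z dZ_t
drift = sp.diff(g, t) + sp.Rational(1, 2) * sp.diff(g, Z, 2)
diffusion = sp.diff(g, Z)

print(sp.simplify(drift / g))       # mu: the dt-coefficient is mu * S_t
print(sp.simplify(diffusion / g))   # sigma: the dZ-coefficient is sigma * S_t
```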

Example 114 Consider a process $X := (X_t, G_t)_{0\le t\le T}$ satisfying the stochastic differential
\[
dX_t = K_t\,dt + H_t\,dZ_t ,
\]
for given $X_0$ and for some adapted processes $K := (K_t, G_t)_{0\le t\le T}$ and $H := (H_t, G_t)_{0\le t\le T}$ such that $H \in H$ and $K \in \mathcal{K}$, where
\[
\mathcal{K} := \left\{(G_t)_{0\le t\le T}\text{-adapted processes } (K_t)_{0\le t\le T} \,\Big|\, \int_0^T |K_t|\,dt < \infty \quad P\text{-a.s.}\right\} .
\]
$X$ is called an Itô process. By applying the above multiplication rules it then follows, for any function $f$ of class $C^2$:
\[
\begin{aligned}
df(X_t) &= f'(X_t)\,dX_t + \frac12\,f''(X_t)\,(dX_t)^2 \\
&= f'(X_t)\,(K_t\,dt + H_t\,dZ_t) + \frac12\,f''(X_t)\left(K_t^2\,(dt)^2 + H_t^2\,(dZ_t)^2 + 2\,K_t H_t\,dt\,dZ_t\right) \\
&= f'(X_t)\,(K_t\,dt + H_t\,dZ_t) + \frac12\,f''(X_t)\,H_t^2\,dt .
\end{aligned}
\]

5.4    An Application of Stochastic Calculus: the Black-Scholes Model

By means of Itô's calculus we are now endowed with the analytical tool that permits us to extend the set of self-financing strategies in the Black and Scholes model in a way that will make any European contingent claim in the model perfectly hedgeable.

5.4.1     The Black-Scholes Market

The model structure is:

• $I := [0, T]$ is a continuous time index representing the available transaction dates in the model.

• The sample space is given by $\Omega := \mathbb{R}^{[0,T]}$, with single outcomes $\omega$ of the form
\[
\omega = (\omega_t)_{t\in[0,T]} ,
\]
where $\omega_t \in \mathbb{R}$, $t \in [0, T]$.

• A Brownian motion process $Z := (Z_t, G_t)_{t\in[0,T]}$ on $(\Omega, G, P)$, where $(G_t)_{t\in[0,T]}$ is the natural filtration associated to $Z$.

• Dynamics of the stock price and money account:
\[
\begin{aligned}
S_t &= S_0\exp\left(\sigma Z_t + \Big(\mu - \frac{\sigma^2}{2}\Big)t\right) , \qquad (45) \\
B_t &= B_0\exp(rt) ,
\end{aligned}
\]
for some $\mu, r, \sigma > 0$ and given $B_0 = 1$, $S_0$. In differential form this gives
\[
\begin{aligned}
dS_t &= \mu S_t\,dt + \sigma S_t\,dZ_t , \\
dB_t &= rB_t\,dt .
\end{aligned}
\]

5.4.2     Self-Financing Portfolios and Hedging in the Black-Scholes Model

Definition 115 A self-financing strategy in the Black and Scholes model is an adapted process $\Delta := (\Delta_t, G_t)_{t\in[0,T]} \in H$ with value process $X := (X_t, G_t)_{t\in[0,T]}$ such that
\[
dX_t = \Delta_t\,dS_t + \frac{X_t - \Delta_t S_t}{B_t}\,dB_t = \Delta_t\,dS_t + (X_t - \Delta_t S_t)\,r\,dt ,
\]
for given $X_0$. We will implicitly require in the sequel the integrability condition $\Delta\cdot S \in H$.

Remark 116 The continuous time definition of a self-financing strategy is the direct extension of the one introduced for the discrete time setting. In particular, one has, using the risky asset dynamics:
\[
dX_t = \Delta_t\,(dS_t - S_t\,r\,dt) + X_t\,r\,dt = \left[rX_t + \Delta_t\,(\mu - r)\,S_t\right]dt + \Delta_t\,\sigma S_t\,dZ_t . \qquad (46)
\]

Our goal is to hedge European derivatives defined by some $G_T$-measurable pay-off given by
\[
v(T, S_T) = g(S_T) ,
\]
for a given continuous function $g$. We denote by $v(t, S_t)$ the price of the derivative at time $t \in [0, T]$ and assume that $v$ is of class $C^{1,2}$ (in order to apply Itô's Lemma). By Itô's Lemma the dynamics of $v_t := v(t, S_t)$ are
\[
\begin{aligned}
dv_t &= \partial_t v_t\,dt + \frac12\,\partial_{SS}v_t\cdot(dS_t)^2 + \partial_S v_t\cdot dS_t \qquad (47) \\
&= \left(\partial_t v_t + \frac{\sigma^2 S_t^2}{2}\,\partial_{SS}v_t + \mu S_t\,\partial_S v_t\right)dt + \sigma S_t\,\partial_S v_t\,dZ_t .
\end{aligned}
\]
In order for a self-financing portfolio $\Delta$ with value process $X$ to be a perfect hedge for $(v_t)_{t\in[0,T]}$ the following hedging condition has to be satisfied:
\[
X_t = v_t , \quad t \in [0, T] . \qquad (48)
\]

This imposes a strong restriction on the joint dynamics (46), (47), which have to coincide, implying:

∆t σSt   =       ∂S vσSt       ,
2
σ 2 St 2
rXt + ∆t (µ − r) St    =       ∂t v + µSt ∂S v +                ∂ v   .
2 SS

Therefore we get, from the ﬁrst of these two equations,

∆t = ∂ S v t      ,

i.e. the delta of the portfolio. Inserting the delta in the second equation, together with the perfect hedging condition (48), finally gives the partial differential equation (PDE)

  \partial_t v + r S \, \partial_S v + \frac{\sigma^2 S^2}{2} \partial_{SS}^2 v = r v,    (49)

for the function v(t, S), subject to the boundary condition

  v(T, S) = g(S),   S > 0.

This is the Black-Scholes partial differential equation for the price of a European derivative. Solving this equation for the case

  g(S) = (S - K)^+

gives the Black-Scholes call pricing formula v_{BS}(t, S) in Proposition 93. Computing ∂_S v_{BS} based on the formula in Proposition 93 gives the delta of the call as

  \Delta_t = \partial_S v_{BS}(t, S_t) = N(d_{1t}),

where

  d_{1t} = \left. \frac{\log(S/K) + \left( r + \frac{\sigma^2}{2} \right)(T - t)}{\sigma \sqrt{T - t}} \right|_{S = S_t}.

We remark that since |N(d_{1t})| ≤ 1, one has (N(d_{1t}) S_t, G_t)_{t∈[0,T]} ∈ H, as initially assumed.
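These claims can be checked numerically. The sketch below uses the standard Black-Scholes call expression (per the text this is the formula of Proposition 93; the explicit form written here is supplied by me) and verifies by finite differences that it satisfies the PDE (49) and that its delta equals N(d_{1t}). Parameter values are illustrative.

```python
import math

def norm_cdf(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, K, r, sigma, T):
    """Black-Scholes call price v(t, S), solving PDE (49) with g(S) = (S - K)^+."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d2)

t, S, K, r, sigma, T = 0.0, 100.0, 95.0, 0.03, 0.2, 1.0
h = 1e-3

# Delta by central finite difference should equal N(d1).
delta_fd = (bs_call(t, S + h, K, r, sigma, T)
            - bs_call(t, S - h, K, r, sigma, T)) / (2 * h)
d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * (T - t)) / (sigma * math.sqrt(T - t))

# PDE residual d_t v + r S d_S v + (sigma^2 S^2 / 2) d_SS v - r v should vanish.
v = bs_call(t, S, K, r, sigma, T)
v_t = (bs_call(t + h, S, K, r, sigma, T)
       - bs_call(t - h, S, K, r, sigma, T)) / (2 * h)
v_ss = (bs_call(t, S + h, K, r, sigma, T) - 2 * v
        + bs_call(t, S - h, K, r, sigma, T)) / h**2
residual = v_t + r * S * delta_fd + 0.5 * sigma**2 * S**2 * v_ss - r * v

print(delta_fd, norm_cdf(d1), residual)
```

The finite-difference residual is zero only up to discretization and rounding error, so the check asserts smallness rather than exact equality.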

5.4.3   Probabilistic Interpretation of Black-Scholes Prices: Girsanov Theorem once more

It is important to remark that the fundamental PDE (49) does not depend on the expected return parameter µ.7 This gives us the possibility to provide a probabilistic interpretation of the Black-Scholes formula, which can be written as a discounted conditional expectation under a risk neutral martingale measure in the model. In other words, this allows us to give a probabilistic interpretation of pricing functions that are solutions of specific PDEs.

To highlight this point, rewrite the stock price dynamics as

  dS_t = \mu S_t \, dt + \sigma S_t \, dZ_t = r S_t \, dt + \sigma S_t (dZ_t + \theta \, dt) = r S_t \, dt + \sigma S_t \, d\tilde{Z}_t,

where \tilde{Z}_t := Z_t + \theta t and \theta = (\mu - r)/\sigma is the market price of risk. Notice that if we could find an equivalent probability measure \tilde{P} such that the process (\tilde{Z}_t, G_t)_{t∈[0,T]} is a Brownian motion under \tilde{P}, then we would have that under \tilde{P} the stock price process is a geometric Brownian motion with dynamics

  dS_t = r S_t \, dt + \sigma S_t \, d\tilde{Z}_t.    (50)

7 Therefore, the pricing of a derivative does not depend on the market expectations on the risky asset returns.

By replicating all arguments of the above section using the \tilde{P}-dynamics (50) with drift r, we would then obtain precisely the same PDE (49) for the price function v(t, S) of a European derivative. Therefore, changing the measure in this way does not alter the functional form of the pricing formula v(t, S).

As usual, the desired change of probability measure is provided by a version of Girsanov's Theorem. We give a version of this theorem for the present setting, which is an immediate consequence of the proofs developed in the semicontinuous model setting.

Corollary 117 In the Black-Scholes model with stock price dynamics (45), a risk neutral martingale measure \tilde{P} on (Ω, G_T) is obtained by setting, for any A ∈ G_T,

  \tilde{P}(A) := \int_A \exp\left( -\theta Z_T - \frac{\theta^2 T}{2} \right) dP,

where

  \theta = \frac{\mu - r}{\sigma}

is the market price of risk in the model.

Proof. The proof follows the same arguments as those for the version of Girsanov Theorem

provided in the semicontinuous setting.
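The density in Corollary 117 lends itself to a Monte Carlo sanity check (parameter values below are made up for illustration): it should average to one under P, and reweighting by it should give the stock its risk-neutral mean \tilde{E}[S_T] = S_0 e^{rT}.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, r, sigma, T, S0 = 0.10, 0.03, 0.25, 1.0, 100.0
theta = (mu - r) / sigma                    # market price of risk

Z_T = rng.normal(0.0, np.sqrt(T), size=1_000_000)    # Z_T under P
density = np.exp(-theta * Z_T - 0.5 * theta**2 * T)  # dP~/dP on G_T
S_T = S0 * np.exp(sigma * Z_T + (mu - 0.5 * sigma**2) * T)

print(density.mean())                      # should be ~ 1: P~ is a probability
print((density * S_T).mean(), S0 * np.exp(r * T))    # E~[S_T] = S0 e^{rT}
```

The second line of output is the martingale property of the discounted stock price under \tilde{P}, obtained here without ever simulating under \tilde{P} itself.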

The key feature of risk neutral probabilities is that discounted prices of self-ﬁnanced portfolios

(and thus also discounted prices of hedge portfolios) are martingales. This allows us to write

today’s price function of a derivative as the discounted risk neutral expectation of its terminal pay

oﬀ.

Specifically, let X_t be the time-t value of a self-financing portfolio in the Black-Scholes model and define the discounted portfolio value \tilde{X}_t := X_t / B_t = X_t \exp(-rt). By Itô's Lemma we then have under \tilde{P} (cf. also (46)):

  d\tilde{X}_t = X_t \frac{d}{dt}\left( \exp(-rt) \right) dt + \frac{1}{B_t} \, dX_t + \frac{1}{B_t} \cdot 0
              = -\frac{X_t}{B_t} r \, dt + \frac{1}{B_t} \left( r X_t \, dt + \Delta_t \sigma S_t \, d\tilde{Z}_t \right)
              = \frac{\Delta_t S_t}{B_t} \sigma \, d\tilde{Z}_t,

that is, (\tilde{X}_t, G_t)_{t∈[0,T]} is a martingale under \tilde{P}, provided that ∆ · S ∈ H. For the hedge portfolio

  \Delta_t = \partial_S v(t, S_t),

this gives

  v(0, S_0) = \frac{v(0, S_0)}{B_0} = \frac{X_0}{B_0} = \tilde{E}\left( \left. \frac{X_T}{B_T} \right| G_0 \right) = \tilde{E}\left( \left. \frac{v(T, S_T)}{B_T} \right| G_0 \right) = \frac{1}{B_T} \tilde{E}\left( \left. g(S_T) \right| G_0 \right),    (51)

which writes v(0, S_0) as a discounted expectation of the terminal pay-off g(S_T), conditional on the initial condition S = S_0.
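Formula (51) suggests a direct Monte Carlo valuation: simulate S_T under the risk-neutral dynamics (50), average the discounted pay-off, and compare with the closed-form price. A sketch for the call case (the parameter values are my own illustrative choices):

```python
import math
import numpy as np

def mc_call_price(S0, K, r, sigma, T, n_paths=1_000_000, seed=2):
    """Monte Carlo estimate of v(0, S0) = e^{-rT} E~[(S_T - K)^+] via (50)."""
    rng = np.random.default_rng(seed)
    Z = rng.normal(0.0, math.sqrt(T), size=n_paths)      # Z~_T under P~
    S_T = S0 * np.exp(sigma * Z + (r - 0.5 * sigma**2) * T)
    return math.exp(-r * T) * np.maximum(S_T - K, 0.0).mean()

def bs_call_price(S0, K, r, sigma, T):
    """Closed-form Black-Scholes benchmark (standard expression)."""
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

mc = mc_call_price(100.0, 95.0, 0.03, 0.2, 1.0)
bs = bs_call_price(100.0, 95.0, 0.03, 0.2, 1.0)
print(mc, bs)
```

Note that the simulation uses drift r, not µ: by (51) the market expectation of the stock return plays no role in the price.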
For the case g(S) = (S - K)^+ we computed this expectation in the semicontinuous model, providing the Black-Scholes call price formula. Moreover, this formula is at the same time the solution of the PDE (49), to which we already gave the probabilistic interpretation (51). Finally, to compute the hedging strategy in the Black-Scholes model we just have to compute the derivative

  \left. \frac{\partial v(t, S)}{\partial S} \right|_{S = S_t} = \frac{B_t}{B_T} \cdot \frac{\partial \tilde{E}\left( \left. g(S_T) \right| S_t = S \right)}{\partial S}.

In the call option case, this gives after some algebra

  \left. \frac{\partial v(t, S)}{\partial S} \right|_{S = S_t} = N(d_{1t}),

using the explicit expression for

  \frac{1}{B_T} \tilde{E}\left( \left. (S_T - K)^+ \right| G_0 \right)

obtained in Proposition 93.
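To close the circle, the hedging result can be illustrated by running the self-financing strategy ∆_t = N(d_{1t}) of Definition 115 in discrete time along a simulated path under P: with frequent rebalancing, terminal wealth approximately replicates the call pay-off (S_T − K)^+ even though the path is simulated with drift µ ≠ r. A sketch, with illustrative parameters:

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(t, S, K, r, sigma, T):
    """Standard Black-Scholes call price, used for the initial wealth X_0 = v(0, S_0)."""
    tau = T - t
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))

def delta_hedge(S0=100.0, K=100.0, mu=0.08, r=0.03, sigma=0.2, T=1.0,
                n_steps=10_000, seed=3):
    """Run the self-financing strategy Delta_t = N(d1t) along one path under P."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = bs_call(0.0, S0, K, r, sigma, T)   # start from the Black-Scholes price
    S = S0
    for i in range(n_steps):
        tau = T - i * dt
        d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
        delta = norm_cdf(d1)
        # exact GBM step under P (drift mu, not r: hedging works path by path)
        S_new = S * math.exp(sigma * rng.normal(0.0, math.sqrt(dt))
                             + (mu - 0.5 * sigma**2) * dt)
        # self-financing update: dX = Delta dS + (X - Delta S) r dt
        X += delta * (S_new - S) + (X - delta * S) * r * dt
        S = S_new
    return X, max(S - K, 0.0)

X_T, payoff = delta_hedge()
print(X_T, payoff)
```

The residual gap between X_T and the pay-off is the discretization error of rebalancing at finitely many dates; it shrinks as n_steps grows, consistent with the perfect-hedge condition (48) holding in continuous time.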

```
views: 59 · posted: 5/1/2010 · language: English · pages: 98