# Source Coding and Quantization VI: Fundamentals of Rate Distortion Theory

T. Linder, Queen's University, Fall 2009

Source Coding and Quantization VI: Fundamentals of Rate Distortion Theory, 1 / 40

## The rate distortion problem

**Source:** $X_1, X_2, \ldots, X_n, \ldots$ are independent and identically distributed (i.i.d.) random variables from the source alphabet $\mathcal{X}$.

**Reproduction:** the sequence $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n, \ldots$ from the reproduction alphabet $\hat{\mathcal{X}}$.

**Distortion:** the per symbol distortion between $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$ and $\hat{x}^n = (\hat{x}_1, \ldots, \hat{x}_n) \in \hat{\mathcal{X}}^n$ is

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i)$$

where $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ is called the distortion measure.

**Question:** What is the minimum number of bits needed to represent the reproduction $\hat{X}^n$ to guarantee that the average distortion between $X^n$ and $\hat{X}^n$ does not exceed a given level $D$?

## Lossy source codes

**Definition.** A (lossy) source code of rate $R$ and blocklength $n$ consists of an encoder

$$f_n : \mathcal{X}^n \to \{1, 2, \ldots, 2^{nR}\}$$

and a decoder

$$g_n : \{1, 2, \ldots, 2^{nR}\} \to \hat{\mathcal{X}}^n$$

so that $f_n(X^n) \in \{1, 2, \ldots, 2^{nR}\}$ and the reproduction is $\hat{X}^n = g_n(f_n(X^n))$:

$$X^n \;\to\; \text{encoder } f_n \;\to\; \text{decoder } g_n \;\to\; \hat{X}^n$$

The (expected) distortion of the code $(f_n, g_n)$ is

$$D = E\, d(X^n, \hat{X}^n) = E\, d\big(X^n, g_n(f_n(X^n))\big)$$

The rate of $(f_n, g_n)$ is $R$ bits per source symbol.

**Remarks:**

- For simplicity, we only consider finite source and reproduction alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$ in stating the main results. Thus $X_i$ and $\hat{X}_i$ are discrete random variables.
- Just as in channel coding, we use the simplification that $2^{nR}$ means $\lceil 2^{nR} \rceil$.
- Since $f_n(X^n)$ can take $2^{nR}$ different values, we need binary words of length $\log 2^{nR} \approx nR$ to represent it exactly (either for transmission or storage). (Fixed-rate binary lossless coding.) Thus $R$ is the number of bits per source symbol needed to represent $f_n(X^n)$.
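The bookkeeping in the last remark is easy to check directly. The sketch below (illustrative, not part of the lecture) computes the exact number of bits needed to index $\lceil 2^{nR} \rceil$ codewords:

```python
import math

def bits_needed(n, R):
    # number of bits to index ceil(2^(n*R)) codewords exactly
    M = math.ceil(2 ** (n * R))
    return max(1, math.ceil(math.log2(M)))

# For n = 100 and R = 0.5 bit/symbol: 2^50 codewords -> 50 bits total,
# i.e. 0.5 bit per source symbol, matching the rate R.
print(bits_needed(100, 0.5))
```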
**Remarks (cont'd):**

- The distortion $D = E\, d\big(X^n, g_n(f_n(X^n))\big)$ measures the fidelity between $X^n$ and its reproduction $\hat{X}^n$. Note: the larger $D$, the less the fidelity.
- The distortion can be explicitly expressed as

  $$D = E\, d\big(X^n, g_n(f_n(X^n))\big) = \sum_{x^n \in \mathcal{X}^n} p(x^n)\, d\big(x^n, g_n(f_n(x^n))\big)$$

  where

  $$p(x^n) = P(X^n = x^n) = \prod_{i=1}^{n} p(x_i)$$

  and $p(x)$, $x \in \mathcal{X}$, is the pmf of the memoryless source ($X_i \sim p(x)$).
- The ultimate goal of lossy source coding is to minimize $R$ for a given $D$, or to minimize $D$ for a given $R$.

**Assumption on the distortion measure:** for any $x \in \mathcal{X}$ there exists $\hat{x} \in \hat{\mathcal{X}}$ such that $d(x, \hat{x}) = 0$.

**Examples:**

- **Hamming distortion:** assume $\mathcal{X} = \hat{\mathcal{X}}$ and define

  $$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{if } x \neq \hat{x} \end{cases}$$

  Note: $E\, d(X, \hat{X}) = P(X \neq \hat{X})$, the probability of error.
- **Squared error distortion:** let $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{R}$ and define

  $$d(x, \hat{x}) = (x - \hat{x})^2$$

  Note: the source and reproduction alphabets are not finite in this case. All the results we cover can be generalized from finite-alphabet sources to more general source alphabets.
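As a quick illustration (not from the lecture), the per-symbol distortion $d(x^n, \hat{x}^n)$ under both example measures can be computed directly:

```python
def hamming(x, xhat):
    # 0 if the symbols agree, 1 otherwise
    return 0.0 if x == xhat else 1.0

def squared_error(x, xhat):
    return (x - xhat) ** 2

def per_symbol_distortion(xs, xhats, d):
    # d(x^n, xhat^n) = (1/n) * sum_i d(x_i, xhat_i)
    assert len(xs) == len(xhats)
    n = len(xs)
    return sum(d(x, xh) for x, xh in zip(xs, xhats)) / n

# Hamming: two mismatches out of four symbols -> 0.5
print(per_symbol_distortion([0, 1, 1, 0], [0, 0, 1, 1], hamming))
```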

## Rate distortion function

Let $X$ be a generic $\mathcal{X}$-valued random variable with the common distribution of the $X_i$. Let $p(x) = p_X(x)$ be the pmf of $X$. We identify the source with $X$.

**Definition.** The rate distortion function of the source $X$ with respect to $d$ is defined for any $D \geq 0$ by

$$R(D) = \min_{p(\hat{x}|x) : E d(X, \hat{X}) \leq D} I(X; \hat{X})$$

Note: in the definition the mutual information $I(X; \hat{X})$ is minimized over all conditional distributions $p(\hat{x}|x) = p_{\hat{X}|X}(\hat{x}|x)$ such that the resulting joint distribution $p(x, \hat{x}) = p(x)\, p(\hat{x}|x)$ for $(X, \hat{X})$ satisfies $E d(X, \hat{X}) \leq D$.

**Remark:** $R(D)$ is the solution of the following constrained optimization problem: minimize

$$\underbrace{\sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p(\hat{x}|x) \log \frac{p(\hat{x}|x)}{\sum_{x'} p(x')\, p(\hat{x}|x')}}_{I(X; \hat{X})}$$

over all conditional distributions $p(\hat{x}|x)$ such that

$$\underbrace{\sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p(\hat{x}|x)\, d(x, \hat{x})}_{E d(X, \hat{X})} \leq D$$

This problem can be solved using numerical methods, and in some special cases, analytically.
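The standard numerical method for this optimization is the Blahut–Arimoto algorithm, which the lecture does not spell out; the sketch below is an illustrative implementation under my own choices (the parameter `beta` is the Lagrange multiplier trading rate against distortion, and the iteration count is arbitrary). It computes one point $(D, R)$ on the rate distortion curve of a discrete memoryless source:

```python
import numpy as np

def blahut_arimoto(p, d, beta, iters=200):
    """One point on the R(D) curve of a discrete memoryless source.

    p    : source pmf, shape (|X|,)
    d    : distortion matrix d[x, xhat], shape (|X|, |Xhat|)
    beta : Lagrange multiplier (larger beta -> smaller distortion)
    Returns (D, R) with R in bits.
    """
    m = d.shape[1]
    q = np.full(m, 1.0 / m)                 # output pmf q(xhat), uniform start
    for _ in range(iters):
        # p(xhat|x) proportional to q(xhat) * exp(-beta * d(x, xhat))
        w = q[None, :] * np.exp(-beta * d)
        cond = w / w.sum(axis=1, keepdims=True)
        q = p @ cond                        # q(xhat) = sum_x p(x) p(xhat|x)
    joint = p[:, None] * cond
    D = float((joint * d).sum())            # E d(X, Xhat)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, cond / q[None, :], 1.0)
    R = float((joint * np.log2(ratio)).sum())  # I(X; Xhat) in bits
    return D, R

# Symmetric binary source under Hamming distortion: for this case the
# result should match the closed form R = 1 - Hb(D) (Theorem 5 below).
p = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
D, R = blahut_arimoto(p, d, beta=2.0)
```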
## Rate distortion theorem(s)

**Theorem 1 (Converse to the rate distortion theorem).** For any $n \geq 1$ and code $(f_n, g_n)$, if

$$E\, d\big(X^n, g_n(f_n(X^n))\big) \leq D$$

then the rate $R$ of $(f_n, g_n)$ satisfies

$$R \geq R(D)$$

Note: the theorem states that $R(D)$ is an ultimate lower bound on the rate of any system that compresses the source with distortion $\leq D$.

To prove the converse theorem, we need two lemmas.

**Lemma 2.** The rate distortion function $R(D)$ is a nonincreasing convex function of $D$.

**Proof:** Recall that

$$R(D) = \min_{p(\hat{x}|x) : E d(X, \hat{X}) \leq D} I(X; \hat{X})$$

If $D_1 < D_2$, then the set of conditional distributions over which the minimum of $I(X; \hat{X})$ is taken is larger in the definition of $R(D_2)$ than in the definition of $R(D_1)$. Thus

$$R(D_1) \geq R(D_2)$$

so $R(D)$ is nonincreasing.

**Proof (cont'd):** Let $D_1, D_2 \geq 0$ and let $p_1(\hat{x}|x)$ and $p_2(\hat{x}|x)$ be the conditional distributions achieving $R(D_1)$ and $R(D_2)$, respectively:

$$I_{p_i}(X; \hat{X}) = R(D_i), \qquad E_{p_i} d(X, \hat{X}) \leq D_i, \qquad i = 1, 2 \qquad (*)$$

For $0 \leq \lambda \leq 1$ define the conditional pmf

$$p_\lambda(\hat{x}|x) = \lambda p_1(\hat{x}|x) + (1 - \lambda) p_2(\hat{x}|x)$$

Then

$$\begin{aligned}
E_{p_\lambda} d(X, \hat{X}) &= \sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p_\lambda(\hat{x}|x)\, d(x, \hat{x}) \\
&= \lambda \sum_{x, \hat{x}} p(x)\, p_1(\hat{x}|x)\, d(x, \hat{x}) + (1 - \lambda) \sum_{x, \hat{x}} p(x)\, p_2(\hat{x}|x)\, d(x, \hat{x}) \\
&= \lambda E_{p_1} d(X, \hat{X}) + (1 - \lambda) E_{p_2} d(X, \hat{X}) \\
&\leq \lambda D_1 + (1 - \lambda) D_2 \qquad \text{(from $(*)$)}
\end{aligned}$$

Let $D_\lambda = \lambda D_1 + (1 - \lambda) D_2$. Thus we showed that

$$E_{p_\lambda} d(X, \hat{X}) \leq D_\lambda \qquad (**)$$

Recall that the mutual information $I(X; \hat{X})$ is a convex function of the conditional distribution $p(\hat{x}|x)$. Hence

$$\begin{aligned}
R(D_\lambda) &\leq I_{p_\lambda}(X; \hat{X}) \qquad \text{(from $(**)$ and the definition of $R(D)$)} \\
&\leq \lambda I_{p_1}(X; \hat{X}) + (1 - \lambda) I_{p_2}(X; \hat{X}) \qquad \text{(convexity of $I_p(X; \hat{X})$)} \\
&= \lambda R(D_1) + (1 - \lambda) R(D_2) \qquad \text{(from $(*)$)}
\end{aligned}$$

Hence $R(D)$ is convex.
**Lemma 3.** Let $X_1, X_2, \ldots, X_n$ be discrete independent random variables. Then for any discrete random variables $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n$,

$$I(X^n; \hat{X}^n) \geq \sum_{i=1}^{n} I(X_i; \hat{X}_i)$$

**Proof:**

$$\begin{aligned}
I(X^n; \hat{X}^n) &= H(X^n) - H(X^n | \hat{X}^n) \\
&= \sum_{i=1}^{n} H(X_i) - H(X^n | \hat{X}^n) \qquad \text{(by independence)} \\
&= \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | \hat{X}^n, X^{i-1}) \qquad \text{(by the chain rule)} \\
&\geq \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | \hat{X}_i) \qquad \text{(conditioning reduces entropy)} \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_i)
\end{aligned}$$

**Proof of converse to the rate distortion theorem:** Assume $(f_n, g_n)$ is a code with $f_n : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$ and

$$E\, d(X^n, \hat{X}^n) = \frac{1}{n} \sum_{i=1}^{n} E\, d(X_i, \hat{X}_i) \leq D$$

where $\hat{X}^n = g_n(f_n(X^n))$. Then

$$\begin{aligned}
I(X^n; \hat{X}^n) &= H(\hat{X}^n) - H(\hat{X}^n | X^n) \\
&\leq H(\hat{X}^n) \\
&\leq H(f_n(X^n))
\end{aligned}$$

where the last inequality holds since $\hat{X}^n$ is a function of $f_n(X^n)$. Since

$$H(f_n(X^n)) \leq \log 2^{nR} = nR$$

we obtain

$$nR \geq I(X^n; \hat{X}^n)$$

Thus

$$\begin{aligned}
nR &\geq I(X^n; \hat{X}^n) \\
&\geq \sum_{i=1}^{n} I(X_i; \hat{X}_i) \qquad \text{(from Lemma 3)} \\
&\geq \sum_{i=1}^{n} R\big(E\, d(X_i, \hat{X}_i)\big) \qquad \text{(from the definition of $R(D)$)} \\
&= n \cdot \frac{1}{n} \sum_{i=1}^{n} R\big(E\, d(X_i, \hat{X}_i)\big) \\
&\geq n R\left(\frac{1}{n} \sum_{i=1}^{n} E\, d(X_i, \hat{X}_i)\right) \qquad \text{(from Jensen's inequality and Lemma 2)} \\
&\geq n R(D) \qquad \left(\text{since } \tfrac{1}{n} \textstyle\sum_{i=1}^{n} E\, d(X_i, \hat{X}_i) \leq D \text{ by assumption}\right)
\end{aligned}$$

We conclude that $R \geq R(D)$.

**Theorem 4 (Achievability of the rate distortion function).** For any $D \geq 0$ and $\delta > 0$, if $n$ is large enough, then there exists a code $(f_n, g_n)$ with distortion

$$E\, d\big(X^n, g_n(f_n(X^n))\big) < D + \delta$$

and rate $R$ such that

$$R < R(D) + \delta$$

**Proof:** Based on random code selection. It is rather long and we omit it.

Note: the converse and direct theorems together imply that

- $R(D)$ is the ultimate lower bound on the rate of any code compressing the source with distortion $\leq D$;
- this lower bound can be approached arbitrarily closely by coding blocks of asymptotically large length.
## Interpretation of converse and direct rate distortion theorems

If $f_n : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$, define

$$r(f_n, g_n) = R$$

and let $d(f_n, g_n)$ denote the distortion of $(f_n, g_n)$:

$$d(f_n, g_n) = E\, d\big(X^n, g_n(f_n(X^n))\big)$$

The minimum rate of any code $(f_n, g_n)$ operating with distortion $\leq D$ is

$$R_n(D) = \min_{(f_n, g_n) : d(f_n, g_n) \leq D} r(f_n, g_n)$$

Then Theorems 1 and 4 together imply that $R_n(D) \geq R(D)$ and

$$\lim_{n \to \infty} R_n(D) = R(D)$$

## Calculation of rate distortion functions

Closed-form solutions only exist in special cases.

### Binary sources

First we consider the binary case $\mathcal{X} = \hat{\mathcal{X}} = \{0, 1\}$ and the Hamming distortion. Recall that

$$H_b(q) = -q \log q - (1 - q) \log(1 - q)$$

denotes the binary entropy of $q \in [0, 1]$.

**Theorem 5.** For a binary source with $P(X = 1) = p$ and the Hamming distortion,

$$R(D) = \begin{cases} H_b(p) - H_b(D), & 0 \leq D \leq \min\{p, 1 - p\} \\ 0, & D \geq \min\{p, 1 - p\} \end{cases}$$
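Theorem 5 is easy to evaluate numerically. A small sketch (illustrative; entropies and rates in bits):

```python
import math

def Hb(q):
    # binary entropy in bits; Hb(0) = Hb(1) = 0 by convention
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def binary_rd(p, D):
    # R(D) of a Bernoulli(p) source under Hamming distortion (Theorem 5)
    if D >= min(p, 1 - p):
        return 0.0
    return Hb(p) - Hb(D)

# A fair coin (p = 0.5) at zero distortion needs Hb(0.5) = 1 bit/symbol:
print(binary_rd(0.5, 0.0))   # -> 1.0
```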

**Proof of Theorem 5:** By symmetry ($H_b(p) = H_b(1 - p)$) we can assume $p \leq 1/2$ (so that $\min\{p, 1 - p\} = p$).

Let $D \geq p$. Define $p(\hat{x}|x)$ by

$$p(0|0) = p(0|1) = 1$$

The resulting $\hat{X}$ is such that $P(\hat{X} = 0) = 1$, and so

$$E\, d(X, \hat{X}) = P(X \neq \hat{X}) = P(X = 1) = p \leq D$$

so $p(\hat{x}|x)$ satisfies the distortion constraint. Since $\hat{X}$ is constant,

$$I(X; \hat{X}) = 0$$

proving that $R(D) = 0$ if $D \geq p$.

Now let $0 \leq D < p$ and let $\oplus$ denote mod 2 addition. Assume $p(\hat{x}|x)$ satisfies the distortion constraint

$$E\, d(X, \hat{X}) = P(X \neq \hat{X}) \leq D \qquad (*)$$

Then

$$\begin{aligned}
I(X; \hat{X}) &= H(X) - H(X | \hat{X}) \\
&= H_b(p) - H(X \oplus \hat{X} | \hat{X}) \\
&\geq H_b(p) - H(X \oplus \hat{X}) \qquad \text{(since conditioning reduces entropy)} \\
&\geq H_b(p) - H_b(D)
\end{aligned}$$

(The second equality holds since, given $\hat{X}$, knowing $X$ is equivalent to knowing $X \oplus \hat{X}$.) The second inequality holds since $X \oplus \hat{X}$ is a binary random variable such that $X \oplus \hat{X} = 1$ iff $X \neq \hat{X}$, so $(*)$ implies

$$H(X \oplus \hat{X}) = H_b\big(P(X \neq \hat{X})\big) \leq H_b(D)$$

since $D < 1/2$ and $H_b(D)$ is increasing in $D \in [0, 1/2]$.
**Proof (cont'd):** Thus we proved that

$$P(X \neq \hat{X}) \leq D \quad \text{implies} \quad I(X; \hat{X}) \geq H_b(p) - H_b(D)$$

This gives

$$R(D) \geq H_b(p) - H_b(D)$$

for $0 \leq D < p \leq 1/2$.

The proof is finished by exhibiting a joint distribution for $X$ and $\hat{X}$ such that

$$P(X = 1) = p, \qquad P(X \neq \hat{X}) = D, \qquad I(X; \hat{X}) = H_b(p) - H_b(D)$$

The joint distribution is obtained via a binary symmetric channel (BSC) with crossover probability $D$ whose input is $\hat{X}$ and output is $X$, with input distribution

$$P(\hat{X} = 0) = \frac{1 - p - D}{1 - 2D}, \qquad P(\hat{X} = 1) = \frac{p - D}{1 - 2D}$$

(Check that this input indeed gives $P(X = 1) = p$.) Clearly,

$$E\, d(X, \hat{X}) = P(X \neq \hat{X}) = D$$

and

$$I(X; \hat{X}) = H(X) - H(X | \hat{X}) = H_b(p) - H_b(D)$$

This proves that

$$R(D) = H_b(p) - H_b(D)$$

if $0 \leq D < p \leq 1/2$.

## Special case: asymptotically vanishing distortion (D = 0)

Let $D = 0$. Then $R(0) = H_b(p) - H_b(0) = H_b(p)$.

In this case, the direct part of the rate distortion theorem states that there exist codes $(f_n, g_n)$ with rate $R_n \geq H_b(p)$ such that

$$\lim_{n \to \infty} R_n = H_b(p)$$

and

$$\lim_{n \to \infty} E\, d(X^n, \hat{X}^n) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} P(X_i \neq \hat{X}_i) = 0$$

**Remarks:**

- The codes $(f_n, g_n)$ have fixed length. If variable-length codes are allowed, then $\hat{X}^n = X^n$ can be achieved.
- The existence of the codes $(f_n, g_n)$ also follows from the "almost lossless" fixed-rate source coding theorem, since

  $$E\, d(X^n, \hat{X}^n) \leq P(X^n \neq \hat{X}^n)$$

## Continuous source alphabet: Gaussian source

The rate distortion theorem can be generalized to continuous alphabets $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{R}$. Consider the squared error distortion $d(x, \hat{x}) = (x - \hat{x})^2$. Assume $X$ has probability density function (pdf) $f(x)$. Then

$$R(D) = \inf \underbrace{\int\!\!\int f(\hat{x}|x)\, f(x) \log \frac{f(\hat{x}|x)}{\int f(\hat{x}|x')\, f(x')\, dx'}\, dx\, d\hat{x}}_{I(X; \hat{X})}$$

where the infimum is taken over all conditional densities $f(\hat{x}|x)$ such that

$$\underbrace{\int\!\!\int (x - \hat{x})^2 f(\hat{x}|x)\, f(x)\, dx\, d\hat{x}}_{E d(X, \hat{X})} \leq D$$

More compactly,

$$R(D) = \inf_{f(\hat{x}|x) : E[(X - \hat{X})^2] \leq D} I(X; \hat{X})$$
**Theorem 6.** If $X \sim N(0, \sigma^2)$ and the distortion measure is the squared error distortion, then

$$R(D) = \begin{cases} \frac{1}{2} \log \frac{\sigma^2}{D}, & 0 < D \leq \sigma^2 \\ 0, & D > \sigma^2 \end{cases}$$

**Remark:** The inverse of $R(D)$, denoted by $D(R)$, is called the distortion rate function. It represents the lowest distortion that can be achieved with codes of rate $\leq R$. For the Gaussian case, $D(R)$ is given by

$$D(R) = \sigma^2 2^{-2R}$$

**Proof of Theorem 6:** First let $D \geq \sigma^2$. Then we can define $\hat{X} \equiv 0$. We have

$$E\, d(X, \hat{X}) = E[(X - 0)^2] = \operatorname{Var}(X) = \sigma^2 \leq D$$

and

$$I(X; \hat{X}) = H(\hat{X}) - H(\hat{X}|X) = 0$$

Thus $R(D) = 0$ if $D \geq \sigma^2$.

Next we recall properties of the differential entropy

$$h(X) = -\int f(x) \log f(x)\, dx$$

and conditional differential entropy

$$h(X|\hat{X}) = -\int\!\!\int f(x, \hat{x}) \log f(x|\hat{x})\, dx\, d\hat{x}$$
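The Gaussian formulas are simple enough to sketch directly (illustrative; logs in base 2, so rates are in bits per symbol):

```python
import math

def gaussian_rd(sigma2, D):
    # R(D) for X ~ N(0, sigma2) under squared error distortion (Theorem 6)
    if D >= sigma2:
        return 0.0
    return 0.5 * math.log2(sigma2 / D)

def gaussian_dr(sigma2, R):
    # distortion rate function D(R) = sigma^2 * 2^(-2R)
    return sigma2 * 2 ** (-2 * R)

# Each extra bit of rate reduces the distortion by a factor of 4:
print(gaussian_dr(1.0, 1.0), gaussian_dr(1.0, 2.0))   # -> 0.25 0.0625
```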

(1) If $X$ and $\hat{X}$ are jointly continuous, then

$$I(X; \hat{X}) = h(X) - h(X|\hat{X})$$

(2) Conditioning reduces differential entropy:

$$h(X|\hat{X}) \leq h(X)$$

where equality holds iff $X$ and $\hat{X}$ are independent.

(3) For $X \sim N(0, \sigma^2)$,

$$h(X) = \frac{1}{2} \log(2\pi e \sigma^2)$$

(4) Gaussian random variables maximize differential entropy for a given second moment: if $E(Z^2) \leq D$, then

$$h(Z) \leq \frac{1}{2} \log(2\pi e D)$$

where equality holds iff $Z \sim N(0, D)$.

**Proof (cont'd):** Assume $f(\hat{x}|x)$ is such that $E[(X - \hat{X})^2] \leq D$. Then

$$\begin{aligned}
I(X; \hat{X}) &= h(X) - h(X|\hat{X}) \\
&= \frac{1}{2} \log(2\pi e \sigma^2) - h(X - \hat{X}|\hat{X}) \qquad \text{(since $X \sim N(0, \sigma^2)$)} \\
&\geq \frac{1}{2} \log(2\pi e \sigma^2) - h(X - \hat{X}) \qquad \text{(from (2))} \\
&\geq \frac{1}{2} \log(2\pi e \sigma^2) - \frac{1}{2} \log(2\pi e D) \qquad \text{(from (4))} \\
&= \frac{1}{2} \log \frac{\sigma^2}{D}
\end{aligned}$$

We conclude that

$$R(D) \geq \frac{1}{2} \log \frac{\sigma^2}{D}$$
**Proof (cont'd):** If $0 \leq D < \sigma^2$, the lower bound can be achieved using the following joint distribution for $X$ and $\hat{X}$:

$$X = \hat{X} + Z, \qquad \hat{X} \sim N(0, \sigma^2 - D), \qquad Z \sim N(0, D)$$

where $Z$ is independent of $\hat{X}$.

Then $X \sim N(0, \sigma^2)$, and we obtain a "test channel" with input $\hat{X} \sim N(0, \sigma^2 - D)$, independent additive Gaussian noise $Z \sim N(0, D)$, and output $X \sim N(0, \sigma^2)$.

It is easy to check that

$$E[(X - \hat{X})^2] = E[Z^2] = D, \qquad I(X; \hat{X}) = h(X) - h(Z) = \frac{1}{2} \log \frac{\sigma^2}{D}$$

The lower bound and the test channel together imply that for $0 < D < \sigma^2$,

$$R(D) = \frac{1}{2} \log \frac{\sigma^2}{D}$$

Since $R(D) = 0$ if $D \geq \sigma^2$, we conclude that

$$R(D) = \begin{cases} \frac{1}{2} \log \frac{\sigma^2}{D}, & 0 < D \leq \sigma^2 \\ 0, & D > \sigma^2 \end{cases}$$
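A quick Monte Carlo sanity check of the test channel (illustrative; the sample size, seed, and tolerances are arbitrary choices of mine):

```python
import math
import random

def simulate_test_channel(sigma2, D, n=200_000, seed=0):
    # X = Xhat + Z with Xhat ~ N(0, sigma2 - D) and Z ~ N(0, D) independent
    rng = random.Random(seed)
    mse = 0.0
    var_x = 0.0
    for _ in range(n):
        xhat = rng.gauss(0.0, math.sqrt(sigma2 - D))
        z = rng.gauss(0.0, math.sqrt(D))
        x = xhat + z
        mse += z * z          # (X - Xhat)^2 = Z^2
        var_x += x * x
    # empirical E[(X - Xhat)^2] and Var(X); should be close to (D, sigma2)
    return mse / n, var_x / n

mse, var_x = simulate_test_channel(sigma2=1.0, D=0.25)
```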

For non-Gaussian sources $R(D)$ is not known in closed form. If $X \sim f(x)$ has finite differential entropy $h(X)$, a lower bound can be obtained.

**Theorem 7 (Shannon lower bound).** Let $X$ have pdf $f(x)$ and finite differential entropy $h(X)$. Then its rate distortion function for the squared error distortion is lower bounded as

$$R(D) \geq h(X) - \frac{1}{2} \log(2\pi e D)$$

**Remark:** The Shannon lower bound can easily be expressed in terms of the distortion rate function:

$$D(R) \geq \frac{1}{2\pi e}\, 2^{-2(R - h(X))}$$

**Proof of Theorem 7:** We can use the same steps as in the calculation of $R(D)$ for Gaussian sources. Assume $f(\hat{x}|x)$ is such that $E[(X - \hat{X})^2] \leq D$. Then

$$\begin{aligned}
I(X; \hat{X}) &= h(X) - h(X|\hat{X}) \\
&= h(X) - h(X - \hat{X}|\hat{X}) \\
&\geq h(X) - h(X - \hat{X}) \\
&\geq h(X) - \frac{1}{2} \log(2\pi e D)
\end{aligned}$$

From the definition

$$R(D) = \inf_{f(\hat{x}|x) : E[(X - \hat{X})^2] \leq D} I(X; \hat{X})$$

we conclude that

$$R(D) \geq h(X) - \frac{1}{2} \log(2\pi e D)$$
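As an illustration (my own example, not from the lecture), the Shannon lower bound can be evaluated for a uniform source on $[0, 1]$, whose differential entropy is $h(X) = \log 1 = 0$ bits. Since the bound can go negative for large $D$, the sketch clips it at the trivial bound $R(D) \geq 0$:

```python
import math

def shannon_lower_bound(h_bits, D):
    """Shannon lower bound R(D) >= h(X) - (1/2) log2(2*pi*e*D), in bits.

    h_bits: differential entropy of the source in bits.
    Clipped at 0, since R(D) >= 0 always holds.
    """
    return max(0.0, h_bits - 0.5 * math.log2(2 * math.pi * math.e * D))

# Uniform source on [0, 1]: h(X) = 0 bits.
for D in (1e-4, 1e-3, 1e-2):
    print(D, shannon_lower_bound(0.0, D))
```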
Remarks:

The Shannon lower bound

    R(D) \ge h(X) - \frac{1}{2}\log(2\pi e D)

is easy to calculate since it only depends on D and h(X). It will be
very useful in comparing the performance of practical codes to the
theoretical limit.

The Shannon lower bound can be derived for more general distortion
measures. For example, if $d(x,\hat{x})$ is a difference distortion
measure of the form

    d(x,\hat{x}) = \rho(x - \hat{x})

then (if $\rho$ is sufficiently well behaved)

    R(D) \ge h(X) - \max_{Y\,:\,E\rho(Y)\le D} h(Y)

Connections with vector quantization

Rate distortion theory characterizes the ultimate performance limit
for lossy compression with block codes (vector quantizers) as
$n \to \infty$. We have investigated the performance of optimal vector
quantizers of a fixed dimension n.

Assume $\{X_i\} = X_1, X_2, \ldots$ is a sequence of i.i.d. random
variables with a pdf such that $E(X_i^2) < \infty$. Let X be a generic
r.v. having the same pdf as the $X_i$'s.
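The difference-distortion form of the Shannon lower bound can also be evaluated in closed form in some cases. A sketch (not from the slides) for absolute error, $\rho(t) = |t|$: under $E|Y| \le D$ entropy is maximized by a Laplacian with mean absolute deviation D, a classical maximum-entropy fact, and for a Laplacian source the resulting bound is known to be tight. The scale value `lam` is illustrative.

```python
import math

# Absolute error distortion: rho(t) = |t|.  The maximizer of h(Y)
# under E|Y| <= D is Laplacian with scale D, with h(Y) = log2(2*e*D).
# Hence  R(D) >= h(X) - log2(2*e*D).
# For a Laplacian source with scale lam (so E|X| = lam and
# h(X) = log2(2*e*lam)) the bound is tight: R(D) = log2(lam/D).
lam = 2.0   # Laplacian scale (illustrative value)
D = 0.5     # target mean absolute distortion, D <= lam

h_X = math.log2(2 * math.e * lam)
R_slb = h_X - math.log2(2 * math.e * D)

print(R_slb)   # log2(lam/D) = log2(4) = 2.0 bits
```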

Introduce the optimal fixed- and variable-rate n-dimensional VQ
performance in coding the n-block $X^n = (X_1, \ldots, X_n)$.

Optimal fixed-rate VQ performance:

    D_{n,F}(R) = \inf_{Q\,:\,r_F(Q)\le R} \frac{1}{n} D(Q)

Optimal variable-rate VQ performance:

    D_{n,V}(R) = \inf_{Q\,:\,r_V(Q)\le R} \frac{1}{n} D(Q)

Note that

    D_{n,V}(R) \le D_{n,F}(R)

Source coding theorem

The converse rate distortion theorem can be restated as:

Theorem 8 (Converse to the rate distortion theorem)
For all $n \ge 1$,

    D_{n,V}(R) \ge D(R)

Remarks:

Since $D_{n,F}(R) \ge D_{n,V}(R)$, the theorem implies that
$D_{n,F}(R) \ge D(R)$.

The theorem says that D(R) is an ultimate lower bound on the
distortion of any block code operating at (fixed or variable) rate R.
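The converse can be illustrated numerically for n = 1. A minimal sketch (not from the slides), assuming a unit-variance Gaussian source at R = 2 bits: a fixed-rate scalar quantizer with $2^R = 4$ levels is trained with the Lloyd algorithm, and its mean squared error must stay above $D(R) = 2^{-2R} = 0.0625$.

```python
import random

# Monte Carlo illustration of D_{1,F}(R) >= D(R) for a unit-variance
# Gaussian source at R = 2 bits (4 quantizer levels).
random.seed(0)
R = 2
samples = [random.gauss(0.0, 1.0) for _ in range(50_000)]

levels = [-1.5, -0.5, 0.5, 1.5]          # initial 4-level codebook
for _ in range(25):                      # Lloyd iterations
    cells = [[] for _ in levels]
    for x in samples:                    # nearest-neighbor partition
        i = min(range(len(levels)), key=lambda j: (x - levels[j]) ** 2)
        cells[i].append(x)
    # centroid step; keep the old level if a cell happens to be empty
    levels = [sum(c) / len(c) if c else l for c, l in zip(cells, levels)]

mse = sum(min((x - l) ** 2 for l in levels) for x in samples) / len(samples)
D_R = 2.0 ** (-2 * R)
print(mse, D_R)   # mse stays well above D(R) = 0.0625
```

The gap between the trained quantizer's MSE (roughly 0.12, the scalar Lloyd-Max level) and D(R) is exactly the dimension-1 penalty that Theorem 9 below says vanishes as n grows.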
Theorem 9 (Direct part of the source coding theorem)
For any $R \ge 0$,

    \lim_{n\to\infty} D_{n,F}(R) = D(R)

Remarks:

Since $D(R) \le D_{n,V}(R) \le D_{n,F}(R)$, the theorem implies that
$\lim_{n\to\infty} D_{n,V}(R) = D(R)$.

The theorem says that for any $R \ge 0$ there exist fixed-rate (or
variable-rate) vector quantizers which operate at rate R and have
distortion arbitrarily close to D(R) if the quantizer dimension n is
large enough.

Unfortunately, D(R) is known explicitly only for a few special
distributions. We proved that if X is Gaussian with variance
$\sigma_X^2$, then

    D(R) = \sigma_X^2\, 2^{-2R}

In the general case, only bounds are known. We proved the Shannon
lower bound: if the differential entropy h(X) is finite, then for all
$R \ge 0$,

    D(R) \ge D_{SLB}(R) = \frac{1}{2\pi e}\, 2^{-2(R - h(X))}
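Both expressions are one-liners to evaluate. A sketch (rate value illustrative): the closed-form Gaussian D(R) for $\sigma_X^2 = 1$, and $D_{SLB}(R)$ for a source where only the bound is available, e.g. uniform on $[0,1]$, for which $h(X) = 0$.

```python
import math

R = 3.0   # rate in bits (illustrative value)

# Gaussian source, sigma^2 = 1: exact distortion rate function.
D_gauss = 2.0 ** (-2 * R)

# Uniform source on [0, 1]: h(X) = log2(1) = 0, so only the
# Shannon lower bound  D_SLB(R) = (1/(2*pi*e)) * 2^(-2(R - h(X)))
# is available in this simple form.
D_slb_uniform = (1 / (2 * math.pi * math.e)) * 2.0 ** (-2 * R)

print(D_gauss, D_slb_uniform)
```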

Compare D(R) with variable-rate lattice VQ performance at high rates.

Let $Q_\Lambda$ be the lattice VQ with the optimal cell:

    G_n = \min_{\Lambda \subset \mathbb{R}^n} G(R_0)

Then

    \frac{1}{n} D(Q_\Lambda) \approx G_n\, 2^{-\frac{2}{n}(H(Q_\Lambda) - h(X^n))}

We have $h(X^n) = h(X_1) + \cdots + h(X_n) = n\,h(X)$ since the $X_i$
are i.i.d. Thus, with $R = \frac{1}{n} H(Q_\Lambda)$,

    \frac{1}{n} D(Q_\Lambda) \approx G_n\, 2^{-2(R - h(X))}

Compare with the ultimate limit D(R):

    1 \le \frac{\frac{1}{n} D(Q_\Lambda)}{D(R)}
      \le \frac{\frac{1}{n} D(Q_\Lambda)}{D_{SLB}(R)}
      \approx \frac{G_n\, 2^{-2(R - h(X))}}{\frac{1}{2\pi e}\, 2^{-2(R - h(X))}}
      = G_n\, 2\pi e

For n = 1 ($Q_\Lambda$ is a uniform quantizer with entropy coding,
$G_1 = 1/12$) the loss is

    10 \log_{10}(G_1\, 2\pi e) = 10 \log_{10}\frac{2\pi e}{12} \approx 1.53 \text{ dB}

In terms of rate, this corresponds to a 0.255 bit rate loss.

For $n \to \infty$:

    \lim_{n\to\infty} \frac{\frac{1}{n} D(Q_\Lambda)}{D(R)} = \lim_{n\to\infty} G_n\, 2\pi e = 1

For large n, variable-rate lattice quantizers can perform
arbitrarily close to the rate-distortion limit (at high rates).
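The n = 1 loss figures follow directly from $G_1 = 1/12$; the arithmetic can be checked in two lines:

```python
import math

# High-rate loss of entropy-coded uniform quantization (n = 1,
# G_1 = 1/12) relative to the Shannon lower bound, in dB and in bits.
G1 = 1.0 / 12.0
loss_db = 10 * math.log10(G1 * 2 * math.pi * math.e)    # ~1.53 dB
loss_bits = 0.5 * math.log2(G1 * 2 * math.pi * math.e)  # ~0.255 bits

print(loss_db, loss_bits)
```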