Source Coding and Quantization
VI: Fundamentals of Rate Distortion Theory

T. Linder
Queen's University
Fall 2009


The rate distortion problem

Source: $X_1, X_2, \ldots, X_n, \ldots$ are independent and identically distributed (i.i.d.) random variables from the source alphabet $\mathcal{X}$.

Reproduction: the sequence $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n, \ldots$ from the reproduction alphabet $\hat{\mathcal{X}}$.

Distortion: the per-symbol distortion between $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$ and $\hat{x}^n = (\hat{x}_1, \ldots, \hat{x}_n) \in \hat{\mathcal{X}}^n$ is
$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i)$$
where $d : \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ is called the distortion measure.

Question: What is the minimum number of bits needed to represent the reproduction $\hat{X}^n$ so as to guarantee that the average distortion between $X^n$ and $\hat{X}^n$ does not exceed a given level $D$?
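As a small illustration of the per-symbol distortion formula, here is a minimal Python sketch; the sequences and the Hamming measure used below are illustrative choices, not taken from the slides:

```python
def avg_distortion(xs, xhats, d):
    """Per-symbol distortion d(x^n, xhat^n) = (1/n) * sum of d(x_i, xhat_i)."""
    assert len(xs) == len(xhats) and len(xs) > 0
    return sum(d(x, xh) for x, xh in zip(xs, xhats)) / len(xs)

# Hamming distortion: 0 if the symbols agree, 1 otherwise.
hamming = lambda x, xh: 0 if x == xh else 1

# 1 mismatch among 4 symbols -> average distortion 1/4
print(avg_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming))  # 0.25
```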




X^n ──→ encoder f_n ──→ f_n(X^n) ∈ {1, 2, ..., 2^{nR}} ──→ decoder g_n ──→ X̂^n

Definition. A (lossy) source code of rate $R$ and blocklength $n$ consists of an encoder
$$f_n : \mathcal{X}^n \to \{1, 2, \ldots, 2^{nR}\}$$
and a decoder
$$g_n : \{1, 2, \ldots, 2^{nR}\} \to \hat{\mathcal{X}}^n$$
The (expected) distortion of the code $(f_n, g_n)$ is
$$D = E\, d(X^n, \hat{X}^n) = E\, d\bigl(X^n, g_n(f_n(X^n))\bigr)$$
The rate of $(f_n, g_n)$ is $R$ bits per source symbol.

Remarks:

For simplicity, we only consider finite source and reproduction alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$ in stating the main results. Thus $X_i$ and $\hat{X}_i$ are discrete random variables.

Just as in channel coding, we use the simplification that $2^{nR}$ means $\lceil 2^{nR} \rceil$.

Since $f_n(X^n)$ can take $2^{nR}$ different values, we need binary words of length
$$\lceil \log 2^{nR} \rceil \approx nR$$
to represent it exactly (either for transmission or storage). (Fixed-rate binary lossless coding.)

Thus $R$ = number of bits per source symbol needed to represent $f_n(X^n)$.
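A toy instance of this definition can be written out directly. The sketch below builds a blocklength-2, rate-1/2 code for a Bernoulli(1/2) source under Hamming distortion (the encoder/decoder rules and the source are illustrative choices) and computes its expected distortion by enumeration:

```python
from itertools import product

# Toy lossy code: blocklength n = 2, rate R = 1/2, so the index set
# has 2^{nR} = 2 elements. All concrete choices here are illustrative.
n, p = 2, 0.5  # p = P(X_i = 1) for the Bernoulli source

def f2(xn):
    """Encoder f_2: map a source pair to an index in {0, 1}."""
    return 1 if sum(xn) >= 1 else 0

def g2(idx):
    """Decoder g_2: map an index to a reproduction pair."""
    return (0, 0) if idx == 0 else (1, 1)

def d(xn, xhn):
    """Per-symbol Hamming distortion between two length-n tuples."""
    return sum(a != b for a, b in zip(xn, xhn)) / n

# Expected distortion D = sum over x^n of p(x^n) d(x^n, g(f(x^n)))
D = sum((p ** sum(xn)) * ((1 - p) ** (n - sum(xn))) * d(xn, g2(f2(xn)))
        for xn in product((0, 1), repeat=n))
print(D)  # 0.25
```

Only the mixed pairs (0,1) and (1,0) are distorted, each contributing one symbol error, which gives D = 0.25 at half a bit per symbol.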
Remarks cont'd:

The distortion $D = E\, d\bigl(X^n, g_n(f_n(X^n))\bigr)$ measures the fidelity between $X^n$ and its reproduction $\hat{X}^n$. Note: the larger $D$, the less the fidelity.

The distortion can be explicitly expressed as
$$D = E\, d\bigl(X^n, g_n(f_n(X^n))\bigr) = \sum_{x^n \in \mathcal{X}^n} p(x^n)\, d\bigl(x^n, g_n(f_n(x^n))\bigr)$$
where
$$p(x^n) = P(X^n = x^n) = \prod_{i=1}^{n} p(x_i)$$
and $p(x)$, $x \in \mathcal{X}$, is the pmf of the memoryless source ($X_i \sim p(x)$).

The ultimate goal of lossy source coding is to minimize $R$ for a given $D$, or to minimize $D$ for a given $R$.

Assumption on the distortion measure: for any $x \in \mathcal{X}$ there exists $\hat{x} \in \hat{\mathcal{X}}$ such that
$$d(x, \hat{x}) = 0$$

Examples:

Hamming distortion: Assume $\mathcal{X} = \hat{\mathcal{X}}$ and define
$$d(x, \hat{x}) = \begin{cases} 0 & \text{if } x = \hat{x} \\ 1 & \text{if } x \neq \hat{x}. \end{cases}$$
Note: $E\, d(X, \hat{X}) = P(X \neq \hat{X})$, the probability of error.

Squared error distortion: Let $\mathcal{X} = \hat{\mathcal{X}} = \mathbb{R}$ and define
$$d(x, \hat{x}) = (x - \hat{x})^2$$
Note: The source and reconstruction alphabets are not finite in this case. All the results we cover can be generalized from finite-alphabet sources to more general source alphabets.
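The Hamming identity $E\, d(X, \hat{X}) = P(X \neq \hat{X})$ is easy to check numerically; a sketch with an arbitrary illustrative joint pmf:

```python
# Check that under Hamming distortion, Ed(X, Xhat) = P(X != Xhat).
# The joint pmf p(x, xhat) below is an arbitrary illustrative choice.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

Ed = sum(pr * (1 if x != xh else 0) for (x, xh), pr in joint.items())
P_err = sum(pr for (x, xh), pr in joint.items() if x != xh)
print(abs(Ed - P_err) < 1e-12)  # True: the two quantities coincide
```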




Rate distortion function

Let $X$ be a generic $\mathcal{X}$-valued random variable with the common distribution of the $X_i$. Let $p(x) = p_X(x)$ be the pmf of $X$. We identify the source with $X$.

Definition. The rate distortion function of the source $X$ with respect to $d$ is defined for any $D \ge 0$ by
$$R(D) = \min_{p(\hat{x}|x) \,:\, E\, d(X, \hat{X}) \le D} I(X; \hat{X})$$

Note: In the definition the mutual information $I(X; \hat{X})$ is minimized over all conditional distributions $p(\hat{x}|x) = p_{\hat{X}|X}(\hat{x}|x)$ such that the resulting joint distribution $p(x, \hat{x}) = p(x)\, p(\hat{x}|x)$ for $(X, \hat{X})$ satisfies $E\, d(X, \hat{X}) \le D$.

Remark: $R(D)$ is the solution of the following constrained optimization problem:
$$\text{minimize} \quad \underbrace{\sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p(\hat{x}|x) \log \frac{p(\hat{x}|x)}{\sum_{x'} p(x')\, p(\hat{x}|x')}}_{I(X; \hat{X})}$$
over all conditional distributions $p(\hat{x}|x)$ such that
$$\underbrace{\sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p(\hat{x}|x)\, d(x, \hat{x})}_{E\, d(X, \hat{X})} \le D$$
This problem can be solved using numerical methods, and in some special cases, analytically.
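For a binary source, the feasible set is low-dimensional and the optimization can be approximated by brute force. The sketch below is illustrative numerics, not a method from the slides: it grid-searches test channels $p(\hat{x}|x)$ for a Bernoulli(1/2) source under Hamming distortion at $D = 0.1$ (source, distortion level, and grid resolution are all arbitrary choices).

```python
import math

p = 0.5   # source pmf: P(X = 1) = p
D = 0.1   # target distortion level

def mutual_info(a, b):
    """I(X; Xhat) for the test channel p(xhat=1|x=0) = a, p(xhat=0|x=1) = b."""
    px = [1 - p, p]
    chan = [[1 - a, a], [b, 1 - b]]   # chan[x][xhat]
    pxh = [sum(px[x] * chan[x][xh] for x in range(2)) for xh in range(2)]
    I = 0.0
    for x in range(2):
        for xh in range(2):
            pj = px[x] * chan[x][xh]
            if pj > 0 and pxh[xh] > 0:
                I += pj * math.log2(pj / (px[x] * pxh[xh]))
    return I

def exp_dist(a, b):
    """Ed(X, Xhat) under Hamming distortion for the same test channel."""
    return (1 - p) * a + p * b

N = 200  # grid resolution
best = min(mutual_info(i / N, j / N)
           for i in range(N + 1) for j in range(N + 1)
           if exp_dist(i / N, j / N) <= D)

# For a Bernoulli(1/2) source with Hamming distortion the classical closed
# form is R(D) = 1 - H_b(D), so best should be close to 1 - H_b(0.1) = 0.531.
print(round(best, 4))
```

The grid contains the optimal symmetric channel $a = b = D$, so the search recovers the closed-form value up to floating-point precision.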
Rate distortion theorem(s)

Theorem 1 (Converse to the rate distortion theorem)
For any $n \ge 1$ and code $(f_n, g_n)$, if
$$E\, d\bigl(X^n, g_n(f_n(X^n))\bigr) \le D$$
then the rate $R$ of $(f_n, g_n)$ satisfies
$$R \ge R(D)$$

Note: The theorem states that $R(D)$ is an ultimate lower bound on the rate of any system that compresses the source with distortion $\le D$.

To prove the converse theorem, we need two lemmas:

Lemma 2
The rate distortion function $R(D)$ is a nonincreasing convex function of $D$.

Proof: Recall that
$$R(D) = \min_{p(\hat{x}|x) \,:\, E\, d(X, \hat{X}) \le D} I(X; \hat{X})$$
If $D_1 < D_2$, then the set of conditional distributions over which the minimum of $I(X; \hat{X})$ is taken is larger in the definition of $R(D_2)$ than in the definition of $R(D_1)$. Thus
$$R(D_1) \ge R(D_2)$$
so $R(D)$ is nonincreasing.




Proof cont'd: Let $D_1, D_2 \ge 0$ and let $p_1(\hat{x}|x)$ and $p_2(\hat{x}|x)$ be the conditional distributions achieving $R(D_1)$ and $R(D_2)$, respectively:
$$I_{p_i}(X; \hat{X}) = R(D_i), \qquad E_{p_i}\, d(X, \hat{X}) \le D_i, \quad i = 1, 2 \qquad (*)$$
For $0 \le \lambda \le 1$ define the conditional pmf
$$p_\lambda(\hat{x}|x) = \lambda p_1(\hat{x}|x) + (1 - \lambda) p_2(\hat{x}|x)$$
Then
$$\begin{aligned}
E_{p_\lambda}\, d(X, \hat{X}) &= \sum_{x \in \mathcal{X}} \sum_{\hat{x} \in \hat{\mathcal{X}}} p(x)\, p_\lambda(\hat{x}|x)\, d(x, \hat{x}) \\
&= \lambda \sum_{x, \hat{x}} p(x)\, p_1(\hat{x}|x)\, d(x, \hat{x}) + (1 - \lambda) \sum_{x, \hat{x}} p(x)\, p_2(\hat{x}|x)\, d(x, \hat{x}) \\
&= \lambda E_{p_1}\, d(X, \hat{X}) + (1 - \lambda) E_{p_2}\, d(X, \hat{X}) \\
&\le \lambda D_1 + (1 - \lambda) D_2 \qquad \text{(from (*))}
\end{aligned}$$

Proof cont'd: Let $D_\lambda = \lambda D_1 + (1 - \lambda) D_2$. Thus we showed that
$$E_{p_\lambda}\, d(X, \hat{X}) \le D_\lambda \qquad (**)$$
Recall that the mutual information $I(X; \hat{X})$ is a convex function of the conditional distribution $p(\hat{x}|x)$. Then
$$\begin{aligned}
R(D_\lambda) &\le I_{p_\lambda}(X; \hat{X}) && \text{(from (**) and the definition of } R(D)) \\
&\le \lambda I_{p_1}(X; \hat{X}) + (1 - \lambda) I_{p_2}(X; \hat{X}) && \text{(convexity of } I_p(X; \hat{X}) \text{ in } p) \\
&= \lambda R(D_1) + (1 - \lambda) R(D_2) && \text{(from (*))}
\end{aligned}$$
Hence $R(D)$ is convex.
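The convexity of $I(X; \hat{X})$ in the test channel, which the proof relies on, can be spot-checked numerically. A sketch (the source pmf, the two channels, and $\lambda$ are arbitrary illustrative choices):

```python
import math

# Spot-check: I(X; Xhat) is convex in the conditional distribution p(xhat|x).
px = [0.3, 0.7]  # source pmf, arbitrary choice

def I(chan):
    """Mutual information for a 2x2 test channel chan[x][xhat]."""
    pxh = [sum(px[x] * chan[x][xh] for x in range(2)) for xh in range(2)]
    s = 0.0
    for x in range(2):
        for xh in range(2):
            pj = px[x] * chan[x][xh]
            if pj > 0:
                s += pj * math.log2(pj / (px[x] * pxh[xh]))
    return s

p1 = [[0.9, 0.1], [0.2, 0.8]]
p2 = [[0.6, 0.4], [0.5, 0.5]]
lam = 0.3
p_lam = [[lam * p1[x][xh] + (1 - lam) * p2[x][xh] for xh in range(2)]
         for x in range(2)]

lhs = I(p_lam)                       # I at the mixed channel
rhs = lam * I(p1) + (1 - lam) * I(p2)  # mixture of the I values
print(lhs <= rhs + 1e-12)  # True, as convexity requires
```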
Lemma 3
Let $X_1, X_2, \ldots, X_n$ be discrete independent random variables. Then for any discrete random variables $\hat{X}_1, \hat{X}_2, \ldots, \hat{X}_n$,
$$I(X^n; \hat{X}^n) \ge \sum_{i=1}^{n} I(X_i; \hat{X}_i)$$

Proof:
$$\begin{aligned}
I(X^n; \hat{X}^n) &= H(X^n) - H(X^n | \hat{X}^n) \\
&= \sum_{i=1}^{n} H(X_i) - H(X^n | \hat{X}^n) && \text{(by independence)} \\
&= \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | \hat{X}^n, X^{i-1}) && \text{(by the chain rule)} \\
&\ge \sum_{i=1}^{n} H(X_i) - \sum_{i=1}^{n} H(X_i | \hat{X}_i) && \text{(conditioning reduces entropy)} \\
&= \sum_{i=1}^{n} I(X_i; \hat{X}_i)
\end{aligned}$$

Proof of converse to the rate distortion theorem: Assume $(f_n, g_n)$ is a code with $f_n : \mathcal{X}^n \to \{1, \ldots, 2^{nR}\}$ and
$$E\, d(X^n, \hat{X}^n) = \frac{1}{n} \sum_{i=1}^{n} E\, d(X_i, \hat{X}_i) \le D$$
where $\hat{X}^n = g_n(f_n(X^n))$. Then
$$I(X^n; \hat{X}^n) = H(\hat{X}^n) - H(\hat{X}^n | X^n) \le H(\hat{X}^n) \le H(f_n(X^n))$$
Since
$$H(f_n(X^n)) \le \log 2^{nR} = nR$$
we obtain
$$nR \ge I(X^n; \hat{X}^n)$$
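Lemma 3 can be verified by direct enumeration for a small case. The sketch below takes $n = 2$ i.i.d. fair bits, passes them through an illustrative toy code $\hat{X}^2 = g(f(X^2))$, and compares $I(X^2; \hat{X}^2)$ against $\sum_i I(X_i; \hat{X}_i)$:

```python
import math
from itertools import product

# Numerical check of Lemma 3 for n = 2. The code (f, g) is an arbitrary
# illustrative choice; X1, X2 are i.i.d. Bernoulli(1/2).
p = 0.5
f = lambda xn: 1 if sum(xn) >= 1 else 0
g = lambda idx: (0, 0) if idx == 0 else (1, 1)

def mi(joint):
    """Mutual information from a joint pmf dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), pr in joint.items():
        pa[a] = pa.get(a, 0) + pr
        pb[b] = pb.get(b, 0) + pr
    return sum(pr * math.log2(pr / (pa[a] * pb[b]))
               for (a, b), pr in joint.items() if pr > 0)

joint_n = {}         # joint pmf of (X^2, Xhat^2)
joint_1 = [{}, {}]   # joint pmfs of (X_i, Xhat_i), i = 1, 2
for xn in product((0, 1), repeat=2):
    pr = (p ** sum(xn)) * ((1 - p) ** (2 - sum(xn)))
    xhn = g(f(xn))
    joint_n[(xn, xhn)] = joint_n.get((xn, xhn), 0) + pr
    for i in range(2):
        key = (xn[i], xhn[i])
        joint_1[i][key] = joint_1[i].get(key, 0) + pr

lhs = mi(joint_n)                   # I(X^n; Xhat^n)
rhs = sum(mi(j) for j in joint_1)   # sum_i I(X_i; Xhat_i)
print(lhs >= rhs - 1e-12)  # True, as Lemma 3 guarantees
```

Here $\hat{X}^2$ is a deterministic function of $X^2$, so the left side equals $H(\hat{X}^2) = H_b(1/4) \approx 0.811$, strictly larger than the single-letter sum.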




Thus

    nR ≥ I(X^n; X̂^n)
       ≥ Σ_{i=1}^n I(Xi; X̂i)                             (from Lemma 3)
       ≥ Σ_{i=1}^n R(Ed(Xi, X̂i))                         (from the definition of R(D))
       = n · (1/n) Σ_{i=1}^n R(Ed(Xi, X̂i))
       ≥ n R( (1/n) Σ_{i=1}^n Ed(Xi, X̂i) )               (from Jensen's inequality and Lemma 2)
       ≥ n R(D)     (since (1/n) Σ_{i=1}^n Ed(Xi, X̂i) ≤ D by assumption)

We conclude that R ≥ R(D).

Theorem 4 (Achievability of the rate distortion function)
For any D ≥ 0 and δ > 0, if n is large enough, then there exists a code (fn, gn) with distortion

    Ed(X^n, gn(fn(X^n))) < D + δ

and rate R such that

    R < R(D) + δ

Proof: Based on random code selection. It is rather long and we omit it.

Note: The converse and direct theorems together imply that

    R(D) is the ultimate lower bound on the rate of any code compressing the source with distortion ≤ D;
    this lower bound can be approached arbitrarily closely by coding blocks of asymptotically large length.
                                                                                                                                                  blocks of asymptotically large length.
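The Jensen step in the converse chain above uses that R(D) is convex and nonincreasing. A minimal numerical sanity check (a sketch, not part of the proof; it uses the binary R(D) of Theorem 5 with rates in bits, and the distortion values are illustrative):

```python
import math

def Hb(q):
    """Binary entropy of q, in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def R_binary(D, p=0.5):
    """Binary-source rate distortion function (Theorem 5), for p <= 1/2."""
    return Hb(p) - Hb(D) if D < min(p, 1 - p) else 0.0

# Jensen direction: the average of R over per-letter distortions dominates
# R evaluated at the average distortion, because R is convex.
dists = [0.05, 0.10, 0.20, 0.40]
avg_of_R = sum(R_binary(d) for d in dists) / len(dists)
R_of_avg = R_binary(sum(dists) / len(dists))
assert avg_of_R >= R_of_avg
# R is nonincreasing, so a smaller distortion target can only raise the rate.
assert R_binary(0.1) >= R_binary(0.2)
```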
Interpretation of converse and direct rate distortion theorems

If fn : X^n → {1, . . . , 2^{nR}}, define

    r(fn, gn) = R

and let d(fn, gn) denote the distortion of (fn, gn):

    d(fn, gn) = Ed(X^n, gn(fn(X^n)))

The minimum rate of any code (fn, gn) operating with distortion ≤ D is

    Rn(D) = min_{(fn, gn) : d(fn, gn) ≤ D} r(fn, gn)

Then Theorems 1 and 4 together imply that Rn(D) ≥ R(D) and

    lim_{n→∞} Rn(D) = R(D)

Calculation of rate distortion functions

Closed-form solutions only exist in special cases.

Binary sources

First we consider the binary case X = X̂ = {0, 1} and the Hamming distortion. Recall that

    Hb(q) = −q log q − (1 − q) log(1 − q)

denotes the binary entropy of q ∈ [0, 1].

Theorem 5
For a binary source with P(X = 1) = p and the Hamming distortion,

    R(D) = { Hb(p) − Hb(D),   0 ≤ D ≤ min{p, 1 − p}
           { 0,               D ≥ min{p, 1 − p}



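Theorem 5 is easy to evaluate numerically. The sketch below (log base 2, so rates are in bits; p = 0.3 is an illustrative value) checks the endpoint behavior: R(0) = Hb(p), and the curve reaches 0 continuously at D = min{p, 1 − p}:

```python
import math

def Hb(q):
    """Binary entropy of q, in bits."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def R_binary(D, p):
    """R(D) of a Bernoulli(p) source under Hamming distortion (Theorem 5)."""
    return Hb(p) - Hb(D) if D < min(p, 1 - p) else 0.0

p = 0.3
# D = 0 allows no errors: the lossless rate Hb(p) bits/symbol is needed.
assert abs(R_binary(0.0, p) - Hb(p)) < 1e-12
# The curve meets zero continuously at D = min{p, 1-p} = p,
# since Hb(p) - Hb(p) = 0 there.
assert R_binary(p, p) == 0.0
# Strictly between the endpoints the rate is strictly between 0 and Hb(p).
assert 0.0 < R_binary(0.1, p) < Hb(p)
```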




Proof of Theorem 5: By symmetry (Hb(p) = Hb(1 − p)) we can assume p ≤ 1/2 (so that min{p, 1 − p} = p).

Let D ≥ p. Define p(x̂|x) by

    p(0|0) = p(0|1) = 1

The resulting X̂ is such that P(X̂ = 0) = 1, and so

    Ed(X, X̂) = P(X ≠ X̂) = P(X ≠ 0) = p ≤ D

so p(x̂|x) satisfies the distortion constraint.

Since X̂ is constant,

    I(X; X̂) = 0

proving that R(D) = 0 if D ≥ p.

Proof cont'd: Let 0 ≤ D < p and let ⊕ denote mod 2 addition. Assume p(x̂|x) satisfies the distortion constraint

    Ed(X, X̂) = P(X ≠ X̂) ≤ D        (∗)

Then

    I(X; X̂) = H(X) − H(X|X̂)
             = Hb(p) − H(X ⊕ X̂ | X̂)
             ≥ Hb(p) − H(X ⊕ X̂)        (since conditioning reduces entropy)
             ≥ Hb(p) − Hb(D)

The second inequality holds since X ⊕ X̂ is a binary random variable such that X ⊕ X̂ = 1 iff X ≠ X̂, so (∗) implies

    H(X ⊕ X̂) = Hb(P(X ≠ X̂)) ≤ Hb(D)

since D < 1/2 and Hb(D) is increasing in D ∈ [0, 1/2].
Proof cont'd: Thus we proved that

    P(X ≠ X̂) ≤ D   implies   I(X; X̂) ≥ Hb(p) − Hb(D)

This gives

    R(D) ≥ Hb(p) − Hb(D)

for 0 ≤ D < p ≤ 1/2.

The proof is finished by exhibiting a joint distribution for X and X̂ such that

    P(X = 1) = p
    P(X ≠ X̂) = D
    I(X; X̂) = Hb(p) − Hb(D)

The joint distribution is obtained via a binary symmetric channel (BSC) whose input is X̂ and output is X.

(Figure: BSC with crossover probability D; input X̂ with P(X̂ = 0) = (1 − p − D)/(1 − 2D) and P(X̂ = 1) = (p − D)/(1 − 2D); output X with P(X = 0) = 1 − p and P(X = 1) = p.)

(Check that this input distribution for X̂ indeed gives P(X = 1) = p.)

Clearly,

    Ed(X, X̂) = P(X ≠ X̂) = D

and

    I(X; X̂) = H(X) − H(X|X̂) = Hb(p) − Hb(D)

This proves that

    R(D) = Hb(p) − Hb(D)

if 0 ≤ D < p ≤ 1/2.
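The two identities behind the BSC construction can be verified mechanically. The snippet below is a minimal sketch (p and D are illustrative values): the stated input distribution for X̂, passed through a BSC with crossover probability D, reproduces P(X = 1) = p, and the error probability P(X ≠ X̂) equals D:

```python
p, D = 0.3, 0.1                    # illustrative values, 0 <= D < p <= 1/2

q1 = (p - D) / (1 - 2 * D)         # P(X̂ = 1), as in the figure
q0 = 1 - q1                        # P(X̂ = 0) = (1 - p - D)/(1 - 2D)

# BSC output marginal: X agrees with X̂ w.p. 1 - D, flips w.p. D.
p_out1 = q1 * (1 - D) + q0 * D
assert abs(p_out1 - p) < 1e-12     # construction reproduces P(X = 1) = p

# The error event {X != X̂} is exactly a crossover, so its probability is D.
p_err = q1 * D + q0 * D
assert abs(p_err - D) < 1e-12
```

Algebraically, q1·(1 − D) + q0·D = D + q1·(1 − 2D) = D + (p − D) = p, which is what the script confirms.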




Special case: asymptotically vanishing distortion (D = 0)

Let D = 0. Then R(0) = Hb(p) − Hb(0) = Hb(p).

In this case, the direct part of the rate distortion theorem states that there exist codes (fn, gn) with rate Rn ≥ Hb(p) such that

    lim_{n→∞} Rn = Hb(p)

and

    lim_{n→∞} Ed(X^n, X̂^n) = lim_{n→∞} (1/n) Σ_{i=1}^n P(Xi ≠ X̂i) = 0

Remarks:

    The codes (fn, gn) have fixed length. If variable-length codes are allowed, then X̂^n = X^n can be achieved.
    The existence of the codes (fn, gn) also follows from the "almost lossless" fixed-rate source coding theorem since Ed(X^n, X̂^n) ≤ P(X^n ≠ X̂^n).

Continuous source alphabet: Gaussian source

The rate distortion theorem can be generalized to continuous alphabets X = X̂ = R. Consider the squared error distortion d(x, x̂) = (x − x̂)².

Assume X has probability density function (pdf) f(x). Then

    R(D) = inf ∫∫ f(x̂|x) f(x) log [ f(x̂|x) / ∫ f(x̂|x′) f(x′) dx′ ] dx dx̂

(the double integral equals I(X; X̂)), where the infimum is taken over all conditional densities f(x̂|x) such that

    ∫∫ (x − x̂)² f(x̂|x) f(x) dx dx̂ ≤ D

(the left-hand side is Ed(X, X̂)).

More compactly,

    R(D) = inf_{f(x̂|x) : E[(X − X̂)²] ≤ D} I(X; X̂)
Theorem 6
If X ∼ N(0, σ²) and the distortion measure is the squared error distortion, then

    R(D) = { (1/2) log(σ²/D),   0 < D ≤ σ²
           { 0,                 D > σ²

Remark: The inverse of R(D), denoted by D(R), is called the distortion rate function. It represents the lowest distortion that can be achieved with codes of rate ≤ R.

For the Gaussian case D(R) is given by

    D(R) = σ² 2^{−2R}

Proof of Theorem 6:
First let D ≥ σ². Then we can define X̂ ≡ 0. We have

    Ed(X, X̂) = E[(X − 0)²] = Var(X) = σ² ≤ D

and

    I(X; X̂) = H(X̂) − H(X̂|X) = 0

Thus R(D) = 0 if D ≥ σ².

Next we recall properties of the differential entropy

    h(X) = −∫ f(x) log f(x) dx

and conditional differential entropy

    h(X|X̂) = −∫∫ f(x, x̂) log f(x|x̂) dx dx̂



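Theorem 6 and the remark can be checked numerically. A minimal sketch (rates in bits, so log means log base 2 here; σ² = 4 is an illustrative value) verifying that R(D) and D(R) are inverses, and that each extra bit of rate divides the distortion by 4 (about 6 dB):

```python
import math

def R_gauss(D, var):
    """Gaussian rate distortion function (Theorem 6), in bits."""
    return 0.5 * math.log2(var / D) if 0 < D <= var else 0.0

def D_gauss(R, var):
    """Distortion rate function D(R) = var * 2**(-2R)."""
    return var * 2.0 ** (-2.0 * R)

var = 4.0
# R(D) and D(R) invert each other on 0 < D <= var.
for D in (0.5, 1.0, 2.0, 4.0):
    assert abs(D_gauss(R_gauss(D, var), var) - D) < 1e-9
# Each extra bit of rate divides the distortion by 4 (about 6 dB).
assert abs(D_gauss(2.0, var) / D_gauss(3.0, var) - 4.0) < 1e-9
```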




(1) If X and X̂ are jointly continuous, then

        I(X; X̂) = h(X) − h(X|X̂)

(2) Conditioning reduces differential entropy:

        h(X|X̂) ≤ h(X)

    where equality holds iff X and X̂ are independent.

(3) For X ∼ N(0, σ²),

        h(X) = (1/2) log(2πeσ²)

(4) Gaussian random variables maximize differential entropy for a given second moment: If E(Z²) ≤ D, then

        h(Z) ≤ (1/2) log(2πeD)

    where equality holds iff Z ∼ N(0, D).

Proof cont'd: Assume f(x̂|x) is such that E[(X − X̂)²] ≤ D.

Then

    I(X; X̂) = h(X) − h(X|X̂)
             = (1/2) log(2πeσ²) − h(X − X̂ | X̂)      (since X ∼ N(0, σ²))
             ≥ (1/2) log(2πeσ²) − h(X − X̂)          (from (2))
             ≥ (1/2) log(2πeσ²) − (1/2) log(2πeD)   (from (4))
             = (1/2) log(σ²/D)

We conclude that

    R(D) ≥ (1/2) log(σ²/D)
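Property (4) can be spot-checked against a non-Gaussian density with the same second moment. A minimal sketch in nats (natural log, matching the form of the differential entropy formulas above); the uniform density is an illustrative choice:

```python
import math

D = 1.0                                             # second-moment budget

# N(0, D) attains the maximum differential entropy subject to E[Z^2] <= D.
h_gauss = 0.5 * math.log(2 * math.pi * math.e * D)  # property (3), in nats

# Uniform on [-a, a] with the same second moment: E[Z^2] = a^2/3 = D.
a = math.sqrt(3 * D)
h_unif = math.log(2 * a)                            # h = log(interval length)

# 0.5*log(12 D) < 0.5*log(2*pi*e*D) since 12 < 2*pi*e (about 17.08).
assert h_unif < h_gauss
```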
Proof cont'd: If 0 ≤ D < σ², the lower bound can be achieved using the following joint distribution for X and X̂:

    X = X̂ + Z,    X̂ ∼ N(0, σ² − D),    Z ∼ N(0, D)

where Z is independent of X̂.

Then X ∼ N(0, σ²) and we obtain the following "test channel" with independent Gaussian noise:

(Figure: additive-noise test channel; input X̂ ∼ N(0, σ² − D) plus independent noise Z ∼ N(0, D) gives output X ∼ N(0, σ²).)

It is easy to check that

    E[(X − X̂)²] = E[Z²] = D,    I(X; X̂) = h(X) − h(Z) = (1/2) log(σ²/D)

Proof cont'd: The lower bound and the test channel together imply that for 0 < D < σ²,

    R(D) = (1/2) log(σ²/D)

Since R(D) = 0 if D ≥ σ², we conclude that

    R(D) = { (1/2) log(σ²/D),   0 < D ≤ σ²
           { 0,                 D ≥ σ²

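The test channel is also easy to simulate. A minimal Monte Carlo sketch (σ² = 4, D = 1, and the sample size are illustrative) confirming that X = X̂ + Z has variance σ² and that the reproduction error E[(X − X̂)²] equals D:

```python
import math
import random

random.seed(0)
var, D, n = 4.0, 1.0, 200_000           # illustrative parameters

sd_xhat = math.sqrt(var - D)            # X̂ ~ N(0, σ² − D)
sd_z = math.sqrt(D)                     # Z ~ N(0, D), independent of X̂

sum_sq_err = 0.0                        # accumulates (X − X̂)²
sum_x2 = 0.0                            # accumulates X²
for _ in range(n):
    xhat = random.gauss(0.0, sd_xhat)
    z = random.gauss(0.0, sd_z)
    x = xhat + z                        # X = X̂ + Z ~ N(0, σ²)
    sum_sq_err += (x - xhat) ** 2
    sum_x2 += x * x

assert abs(sum_sq_err / n - D) < 0.05   # E[(X − X̂)²] = E[Z²] = D
assert abs(sum_x2 / n - var) < 0.1      # Var(X) = σ²
```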




For non-Gaussian sources R(D) is not known in closed form.

If X ∼ f(x) has finite differential entropy h(X), a lower bound can be
obtained.

Theorem 7 (Shannon lower bound)
Let X have pdf f(x) and finite differential entropy h(X). Then its
rate distortion function for the squared error distortion is lower
bounded as
                    R(D) ≥ h(X) − (1/2) log(2πeD)

Remark: The Shannon lower bound can easily be expressed in terms of
the distortion rate function:

                    D(R) ≥ (1/(2πe)) 2^(−2(R−h(X)))

Proof of Theorem 7: We can use the same steps as in the calculation of
R(D) for Gaussian sources.

Assume f(x̂|x) is such that E[(X − X̂)²] ≤ D.

Then
        I(X; X̂) = h(X) − h(X|X̂)
                 = h(X) − h(X − X̂|X̂)
                 ≥ h(X) − h(X − X̂)
                 ≥ h(X) − (1/2) log(2πeD)

(the first inequality holds since conditioning reduces entropy, the
second since the Gaussian maximizes differential entropy under a
second moment constraint).

From the definition

        R(D) =         inf          I(X; X̂)
               f(x̂|x): E[(X−X̂)²]≤D

we conclude that
                    R(D) ≥ h(X) − (1/2) log(2πeD)
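Both inequalities in the proof hold with equality for a Gaussian source, so the Shannon lower bound is tight in that case. A short numerical check (a sketch of ours, with entropies in bits):

```python
import math

sigma2, D = 4.0, 0.5
# h(X) for X ~ N(0, sigma2), in bits
h = 0.5 * math.log2(2 * math.pi * math.e * sigma2)
# Shannon lower bound: h(X) - (1/2) log2(2*pi*e*D)
slb = h - 0.5 * math.log2(2 * math.pi * math.e * D)
# Exact Gaussian rate distortion function: (1/2) log2(sigma2 / D)
rd = 0.5 * math.log2(sigma2 / D)
print(slb, rd)  # the two agree: 1.5 bits
```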
Remarks:

    The Shannon lower bound

                    R(D) ≥ h(X) − (1/2) log(2πeD)

    is easy to calculate since it only depends on D and h(X). It will be
    very useful in comparing the performance of practical codes to the
    theoretical limit.

    The Shannon lower bound can be derived for more general distortion
    measures. For example, if d(x, x̂) is a difference distortion measure
    of the form
                    d(x, x̂) = ρ(x − x̂)

    then (if ρ is sufficiently well behaved)

                    R(D) ≥ h(X) −     max      h(Y)
                                  Y: Eρ(Y)≤D

Connections with vector quantization

    Rate distortion theory characterizes the ultimate performance limit
    for lossy compression with block codes (vector quantizers) as
    n → ∞.

    We have investigated the performance of optimal vector quantizers
    of a fixed dimension n.

Assume {Xi} = X1, X2, . . . is a sequence of i.i.d. random variables with
a pdf such that E(Xi²) < ∞. Let X be a generic r.v. having the same
pdf as the Xi's.
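Returning to the general Shannon lower bound: for squared error, ρ(y) = y², the maximizer of h(Y) under E[Y²] ≤ D is the N(0, D) density, which recovers Theorem 7. A quick illustrative check (our own sketch) that the Gaussian indeed has the largest differential entropy among common zero-mean laws of equal second moment:

```python
import math

D = 0.5  # common second moment E[Y^2]
# Differential entropies in bits of zero-mean laws with variance D:
h_gauss   = 0.5 * math.log2(2 * math.pi * math.e * D)
h_laplace = math.log2(2 * math.e * math.sqrt(D / 2))  # Laplace(b), 2b^2 = D
h_uniform = math.log2(math.sqrt(12 * D))              # Uniform(-w/2, w/2), w^2/12 = D
print(h_gauss, h_laplace, h_uniform)  # Gaussian is the largest
```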







Introduce the optimal fixed and variable-rate n-dimensional VQ
performance in coding the n-block Xⁿ = (X1, . . . , Xn).

Optimal fixed-rate VQ performance:

        Dn,F(R) =     inf       (1/n) D(Q)
                  Q: rF(Q)≤R

Optimal variable-rate VQ performance:

        Dn,V(R) =     inf       (1/n) D(Q)
                  Q: rV(Q)≤R

Note that
        Dn,V(R) ≤ Dn,F(R)

Source coding theorem

The converse rate distortion theorem can be restated as:

Theorem 8 (Converse to the rate distortion theorem)
For all n ≥ 1,
        Dn,V(R) ≥ D(R)

Remarks:

    The theorem implies that Dn,F(R) ≥ D(R).

    The theorem says that D(R) is an ultimate lower bound on the
    distortion of any block code operating at (fixed or variable) rate R.
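The converse can be illustrated concretely for n = 1: the optimal fixed-rate 1-bit quantizer for a unit-variance Gaussian places its two points at ±√(2/π) and achieves distortion 1 − 2/π ≈ 0.363, which indeed exceeds D(1) = 1/4. A small check of ours:

```python
import math

sigma2, R = 1.0, 1.0
# Optimal 2-level (1-bit) quantizer for N(0,1): levels at +/- sqrt(2/pi),
# giving MSE = sigma2 - (E|X|)^2 = 1 - 2/pi
D1_F = sigma2 * (1 - 2 / math.pi)
# Rate distortion limit for the Gaussian source: D(R) = sigma2 * 2^(-2R)
D_R = sigma2 * 2 ** (-2 * R)
print(D1_F > D_R)  # → True: the n = 1 quantizer cannot beat D(R)
```

The gap closes only as the block length n grows, which is exactly what the direct part of the source coding theorem (next slide) asserts.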




Theorem 9 (Direct part of the source coding theorem)
For any R ≥ 0,
        lim  Dn,F(R) = D(R)
        n→∞

Remarks:

    The theorem implies that  lim  Dn,V(R) = D(R).
                              n→∞

    The theorem says that for any R ≥ 0 there exist fixed-rate (or
    variable-rate) vector quantizers which operate at rate R and have
    distortion arbitrarily close to D(R) if the quantizer dimension n is
    large enough.

Unfortunately, D(R) is known explicitly for some special
distributions only.

We proved that if X is Gaussian with variance σX², then

        D(R) = σX² 2^(−2R)

In the general case, only bounds are known. We proved the Shannon
lower bound: If the differential entropy h(X) is finite, then for all
R ≥ 0,
        D(R) ≥ DSLB(R) = (1/(2πe)) 2^(−2(R−h(X)))
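As an illustration of our own (not from the slides): for a unit-variance Laplacian source, h(X) = log2(√2 e) bits, so DSLB(R) sits strictly below the Gaussian D(R) at every rate, since the Gaussian maximizes h(X) at fixed variance. In fact the ratio is exactly e/π ≈ 0.865:

```python
import math

R = 2.0
# Unit-variance sources:
h_lap = math.log2(math.sqrt(2) * math.e)  # Laplace(b) with 2b^2 = 1, h = log2(2eb)
# Exact Gaussian distortion rate function, sigma2 = 1:
D_gauss = 2 ** (-2 * R)
# Shannon lower bound for the Laplacian source:
D_slb_lap = (1 / (2 * math.pi * math.e)) * 2 ** (-2 * (R - h_lap))
print(D_gauss, D_slb_lap)  # the Laplacian bound is the smaller of the two
```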








Compare D(R) with variable-rate lattice VQ performance at high rates.

Let QΛ be the lattice VQ with the optimal cell:

        Gn = min  G(R₀Λ)
             Λ⊂Rⁿ

Then
        (1/n) D(QΛ) ≈ Gn 2^(−(2/n)(H(QΛ)−h(Xⁿ)))

We have h(Xⁿ) = h(X1) + · · · + h(Xn) = nh(X) since the Xi are i.i.d.

Thus, with R = (1/n) H(QΛ),

        (1/n) D(QΛ) ≈ Gn 2^(−2(R−h(X)))

Compare with the ultimate limit D(R):

              D(QΛ)    D(QΛ)      Gn 2^(−2(R−h(X)))
        1  ≤  ————  ≤  ———————  ≈ ————————————————————  =  Gn 2πe
              D(R)     DSLB(R)    (1/(2πe)) 2^(−2(R−h(X)))

For n = 1 (QΛ is a uniform quantizer with entropy coding) the loss is

        10 log10(G1 · 2πe) = 10 log10(2πe/12) = 1.53 dB

    In terms of rate, this corresponds to a 0.255 bit rate loss.

For n → ∞:
        lim  D(QΛ)/D(R) = lim  Gn · 2πe = 1
        n→∞               n→∞

    For large n, variable-rate lattice quantizers can perform
    arbitrarily close to the rate-distortion limit (at high rates).
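The n = 1 figures quoted above can be reproduced directly; G1 = 1/12 is the normalized second moment of a scalar uniform cell (the interval):

```python
import math

G1 = 1.0 / 12.0  # normalized second moment of the 1-D (interval) cell
# Distortion penalty of entropy-coded uniform quantization over D(R):
loss_db = 10 * math.log10(G1 * 2 * math.pi * math.e)
# Equivalent rate penalty: (1/2) log2 of the same distortion ratio
rate_gap = 0.5 * math.log2(G1 * 2 * math.pi * math.e)
print(round(loss_db, 2), round(rate_gap, 3))  # → 1.53 0.255
```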

