EE665000 Video Processing
Chapter 8
Still Image Compression
8.1 Basics of Image
Compression
Purposes:
1. To remove image redundancy
2. To increase storage and/or transmission efficiency
Why?
- Still pictures, e.g., the ISO JPEG standard test pictures:
  720 (pels) × 576 (lines) × 1.5 bytes = 622,080 bytes
  = 4.977 Mbits
  = 77.8 s @ 64 kbits/s
8.1 Basics of Image
Compression (cont.)
How?
1. Use characteristics of images(statistical):
(a) General ---using statistical model from Shannon’s
rate-distortion theory
(b) Particular ---using nonstationary properties of images
(wavelets, fractals, …)
2. Use characteristics of human perception (psychological):
(a) Color representation
(b) Weber-Fechner law
(c) Spatial/temporal masking
8.1.1 Elements of an Image
Compression System
A source encoder consists of the following blocks:
Input image -> Transformer (T) -> Quantizer (Q) -> Entropy coder (C) -> Binary bit stream
Fig. 8.1: Block diagram of an image compression system
8.1.2 Information Theory
- Entropy: the average uncertainty (randomness) of a stationary, ergodic signal source X, i.e., the number of bits needed to resolve its uncertainty.
Discrete-Amplitude Memoryless Source (DMS) X:
$x(n) \in A_x = \{a_1, a_2, \ldots, a_k\}$, which is a finite-alphabet set containing k symbols.
8.1.2 Information Theory (cont.)
Assume that $x(n)$ is an independent, identically distributed (i.i.d.) process; that is,
$P_{n_1, n_2, \ldots, n_M}(x(n_1), x(n_2), \ldots, x(n_M)) = \prod_{i=1}^{M} p(x(n_i))$
The entropy is
$H(X) = E[-\log_2 P(X)] = -\sum_{i=1}^{k} p_i \log_2 p_i$ bits/letter
where $p_i = P(X = a_i)$
8.1.2 Information Theory (cont.)
- Sources with Memory
Discrete-Amplitude Source with Memory X:
$x(n) \in A_x = \{a_1, a_2, \ldots, a_k\}$
Assume that $x(n)$ is stationary in the strict sense, i.e., for any shift T,
$P_{n_1, n_2, \ldots, n_M}(x(n_1), x(n_2), \ldots, x(n_M)) = P_{n_1+T, n_2+T, \ldots, n_M+T}(x(n_1+T), \ldots, x(n_M+T))$
8.1.2 Information Theory (cont.)
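The Nth-order entropy (N-tuple, N-block) is given by
$H_N(X) = \frac{1}{N} E[-\log_2 P(\mathbf{X})] = -\frac{1}{N} \sum_{\text{all } \mathbf{x}} P(\mathbf{x}) \log_2 P(\mathbf{x})$ bits/letter
where $\mathbf{x} = (x(n), x(n+1), \ldots, x(n+N-1))$ and $\mathbf{X} = (X(n), X(n+1), \ldots, X(n+N-1))$.
The lower bound of source coding is
$H_\infty(X) = \lim_{N \to \infty} H_N(X)$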
8.1.2 Information Theory (cont.)
- Variable-Length Source Coding Theorem
Let X be a discrete-amplitude, stationary, and ergodic source and $H_N(X)$ be its Nth-order entropy; then there exists a binary prefix code with an average bit rate $\bar{b}$ satisfying
$H_N(X) \le \bar{b} < H_N(X) + \frac{1}{N}$
where the prefix condition means that no codeword is a prefix (initial part) of another codeword.
8.1.2 Information Theory (cont.)
Example: source X
X: DMS (i.i.d.)
$x(n) \in A_x = \{a = \text{sunny},\ b = \text{rainy}\}$
$P(a) = p_1 = 0.8$, $P(b) = p_2 = 0.2$
Entropy:
$H(X) = -p_1 \log_2 p_1 - p_2 \log_2 p_2 = 0.72$ bits/letter
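The entropy of this two-symbol source can be checked numerically. The following is a minimal sketch; the function name `entropy_bits` is chosen here for illustration and is not part of the course material.

```python
import math

def entropy_bits(probs):
    """Entropy in bits/letter of a discrete memoryless source."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Sunny/rainy source from the example: p1 = 0.8, p2 = 0.2
h = entropy_bits([0.8, 0.2])
print(f"H(X) = {h:.2f} bits/letter")
```

The same function applies to any finite alphabet, so it can be reused for the four-symbol Huffman example later in the chapter.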
8.1.2 Information Theory (cont.)
- How many bits are needed to remove the uncertainty?
  Single-letter block (N = 1): 1 bit/letter

  Source word   Codeword (bits)   Probability
  a             1                 0.8
  b             0                 0.2

  $\bar{b} = 1 \times 0.8 + 1 \times 0.2 = 1$ bit/letter
8.1.2 Information Theory (cont.)
- How many bits are needed to remove the uncertainty?
  Two-letter block (N = 2): 0.78 bits/letter

  Source word   Codeword (bits)   Probability
  aa            1                 0.64
  ab            01                0.16
  ba            001               0.16
  bb            000               0.04

  $\bar{b} = (1 \times 0.64 + 2 \times 0.16 + 3 \times 0.16 + 3 \times 0.04)/2 = 1.56/2 = 0.78$ bits/letter
8.2 Entropy Coding
How to construct a real code that achieves the theoretical limits?
(a) Huffman coding (1952)
(b) Arithmetic coding (1976)
(c) Ziv-Lempel coding (1977)
8.2.1 Huffman coding
- Variable-Length Coding (VLC) with the following characteristics:
  - Lossless entropy coding for digital signals
  - Fewer bits for highly probable events
  - Prefix codes
8.2.1 Huffman coding (cont.)
- Procedure
Two stages (given the probability distribution of X):
Stage 1: Construct a binary tree from the events.
         Select the two least probable events a and b, and replace them by a single node whose probability is the sum of the probabilities of a and b. Repeat until a single node (the root) remains.
Stage 2: Assign codes sequentially from the root.
8.2.1 Huffman coding (cont.)
Example: Let the alphabet $A_x$ consist of four symbols as shown in the following table.
Symbol   Probability
a1       0.50
a2       0.25
a3       0.125
a4       0.125
The entropy of the source is
$H = -0.5 \log_2 0.5 - 0.25 \log_2 0.25 - 0.125 \log_2 0.125 - 0.125 \log_2 0.125 = 1.75$ bits/letter
8.2.1 Huffman coding (cont.)
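The tree diagram for Huffman coding is:
[Figure: Huffman tree. a3 (p = 0.125, branch 0) and a4 (p = 0.125, branch 1) merge into a node with p = 0.25; that node (branch 1) and a2 (p = 0.25, branch 0) merge into a node with p = 0.5; that node (branch 1) and a1 (p = 0.5, branch 0) merge at the root.]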
8.2.1 Huffman coding (cont.)
Which yields the Huffman code
Symbol Huffman code Probability
a1 0 0.50
a2 10 0.25
a3 110 0.125
a4 111 0.125
The average bit rate is
$\bar{b} = 1 \times 0.5 + 2 \times 0.25 + 3 \times 0.125 + 3 \times 0.125 = 1.75$ bits/letter $= H$
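The two-stage procedure can be sketched in a few lines. This is an illustrative implementation, not the course's reference code: the function name `huffman_code` and the use of a heap with an insertion counter as tie-breaker are choices made here. Different tie-breaking may yield different (but equally long) codewords.

```python
import heapq
import itertools

def huffman_code(probs):
    """Return {symbol: codeword} for a dict {symbol: probability}."""
    counter = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(counter), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)  # least probable node
        p1, _, code1 = heapq.heappop(heap)  # second least probable node
        # Merge the two nodes; prepend one more bit to every codeword below.
        merged = {s: "0" + c for s, c in code0.items()}
        merged.update({s: "1" + c for s, c in code1.items()})
        heapq.heappush(heap, (p0 + p1, next(counter), merged))
    return heap[0][2]

probs = {"a1": 0.5, "a2": 0.25, "a3": 0.125, "a4": 0.125}
code = huffman_code(probs)
avg_bits = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg_bits)
```

For the four-symbol example the codeword lengths are 1, 2, 3, 3, so the average bit rate equals the entropy of 1.75 bits/letter.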
8.2.1 Huffman coding (cont.)
- Performance
Implementation:
Step 1: Estimate the probability distribution from samples.
Step 2: Design Huffman codes using the probability distribution obtained in Step 1.
8.2.1 Huffman coding (cont.)
Advantage:
- Approaches H(X) (with or without memory) as the block size N → ∞
- Relatively simple procedure, easy to follow.
8.2.1 Huffman coding (cont.)
Disadvantage:
- A large N or preprocessing is needed for sources with memory.
- Hard to adjust codes in real time.
8.2.1 Huffman coding (cont.)
Variations:
Modified Huffman code:
Codewords longer than L become fixed-length
Adaptive Huffman codes.
8.2.3 Arithmetic Coding
Variable-length to Variable-length
Lossless entropy coding for digital signals
One source symbol may produce several bits; several source
symbols (letters) may produce a single bit.
Source model (Probability distribution) can be derived in real
time.
Similar to Huffman prefix codes in special cases.
8.2.3 Arithmetic Coding (cont.)
Principle:
A message (source string) is represented by an interval of real numbers between 0 and 1. More frequent messages have larger intervals, so fewer bits are needed to specify those intervals.
8.2.3 Arithmetic Coding (cont.)
Example:
A_x:
Source    Probability             Cumulative probability
symbol    (decimal)   (binary)    (decimal)   (binary)
a         0.500       0.100       0.000       0.000
b         0.250       0.010       0.500       0.100
c         0.125       0.001       0.750       0.110
d         0.125       0.001       0.875       0.111
8.2.3 Arithmetic Coding (cont.)
The length of an interval is proportional to its probability.

decimal:  0.000      0.500      0.750      0.875      1.0
binary:   0.000      0.100      0.110      0.111      1.0
          [    a     [    b     [    c     [    d     )

Any point in the interval [0.0, 0.5) represents “a”; say, 0.25 (binary: 0.01) or 0.0 (binary: 0.00).
Any point in the interval [0.75, 0.875) represents “c”; say, 0.8125 (binary: 0.1101) or 0.75 (binary: 0.110).
8.2.3 Arithmetic Coding (cont.)
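Transmitting 3 letters (all values in binary):
1st letter "a": [0.0, 1.0) is split at 0.1, 0.11, and 0.111 into a, b, c, d; "a" selects [0.0, 0.1).
2nd letter "a": [0.0, 0.1) is split at 0.01, 0.011, and 0.0111; "a" selects [0.0, 0.01).
3rd letter "b": [0.0, 0.01) is split at 0.001, 0.0011, and 0.00111; "b" selects [0.001, 0.0011).
- Any point in the interval [0.001, 0.0011) identifies “aab”; say, 0.00101 or 0.0010.
- Need a model (probability distribution).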
8.2.3 Arithmetic Coding (cont.)
Procedure: Recursive computation of the key values of an interval:
  C (code point): the leftmost point of the interval
  A (interval width)
On receiving a symbol $a_i$:
  New C = Current C + (Current A × $P_i$)
  New A = Current A × $p_i$
where $P_i$ = cumulative probability of $a_i$ and $p_i$ = probability of $a_i$.
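The recursion on C and A can be traced with exact rational arithmetic. The sketch below assumes the four-symbol model of the earlier example; the names `MODEL` and `encode_interval` are illustrative, and a practical coder would use scaled integer arithmetic and emit bits incrementally instead.

```python
from fractions import Fraction

# symbol -> (cumulative probability P_i, probability p_i), from the example table
MODEL = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(1, 4)),
    "c": (Fraction(3, 4), Fraction(1, 8)),
    "d": (Fraction(7, 8), Fraction(1, 8)),
}

def encode_interval(message, model=MODEL):
    """Return (C, A): the left end and width of the interval for `message`."""
    c, a = Fraction(0), Fraction(1)      # Step 0: C = 0, A = 1
    for sym in message:
        cum, p = model[sym]
        c = c + a * cum                  # New C = Current C + Current A * P_i
        a = a * p                        # New A = Current A * p_i
    return c, a

c, a = encode_interval("aab")
print(c, c + a)  # interval [1/8, 3/16), i.e. [0.001, 0.0011) in binary
```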
8.2.3 Arithmetic Coding (cont.)
【Encoder】
Step 0: Initial C = 0; Initial A = 1
Step 1: Receive a source symbol (If no more symbols,
it’s EOF) Compute New C and New A
Step 2: If EOF, {send the code string that identifies this
current interval; stop}
Else {Send the code string that has been
uniquely determined so far. Goto step 1}
8.2.3 Arithmetic Coding (cont.)
【Decoder】
Step 0: Initial C = 0; Initial A = 1
Step 1: Examine the code string received so far, and search
for the interval in which it lies.
Step 2: If a symbol can be decided, decode it.
Else goto step 1
Step 3: If {this symbol is EOF, STOP}
Else {Adjust C and A; goto step 2}
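The decoder loop can be sketched under two simplifying assumptions that are not part of the procedure above: the whole code value is available at once as an exact fraction, and the number of symbols is known in place of an EOF symbol. The names `MODEL` and `decode` are illustrative.

```python
from fractions import Fraction

# symbol -> (cumulative probability P_i, probability p_i), from the example table
MODEL = {
    "a": (Fraction(0), Fraction(1, 2)),
    "b": (Fraction(1, 2), Fraction(1, 4)),
    "c": (Fraction(3, 4), Fraction(1, 8)),
    "d": (Fraction(7, 8), Fraction(1, 8)),
}

def decode(value, n_symbols, model=MODEL):
    """Decode `n_symbols` letters from a code point `value` in [0, 1)."""
    c, a = Fraction(0), Fraction(1)              # Step 0: C = 0, A = 1
    out = []
    for _ in range(n_symbols):
        for sym, (cum, p) in model.items():
            low = c + a * cum                    # left end of this symbol's sub-interval
            if low <= value < low + a * p:       # search for the interval containing value
                out.append(sym)
                c, a = low, a * p                # adjust C and A
                break
    return "".join(out)

# 0.00101 in binary is 5/32; it lies inside [0.001, 0.0011)
print(decode(Fraction(5, 32), 3))
```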
8.2.3 Arithmetic Coding (cont.)
More details
(I.H. Witten et al, “Arithmetic Coding for Data
Compression”, COMM ACM, pp.520-540, June 1987)
Integer arithmetic scale intervals up
Bits to follow (undecided symbol…)
Updating model
8.2.3 Arithmetic Coding (cont.)
Performance
Advantages:
(1) Approaches $H_\infty(X)$ as the allowable delay and data precision go to infinity
(2) Adapts to the local statistics
(3) Inter-letter correlation can be reduced by using conditional probabilities (a model with context)
(4) Simple procedures without multiplication and division have been developed (IBM Q-coder, AT&T Minimax-coder)
Disadvantages:
Sensitive to channel errors.
8.3 Lossless Compression
Methods
(1) Lossless prediction coding
(2) Run-length coding of bit planes.
8.3.1 Lossless Predictive
Coding
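[Figure: (a) Encoder: the previous sample is subtracted from the current sample to give the residual. (b) Decoder: the residual is added to the previous sample to give the reconstructed sample.]
Figure: Block diagram of (a) an encoder and (b) a decoder using a simple predictor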
8.3.1 Lossless Predictive
Coding (cont.)
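Example: integer prediction
With the causal neighbors of the current pixel x arranged as
  b c d
  a x
the prediction is
$\hat{x} = \mathrm{int}[(a + b)/2]$
or
$\hat{x} = \mathrm{int}[(a + b + c + d)/4]$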
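A minimal sketch of lossless predictive coding with the first integer predictor, int[(a + b)/2], follows. It assumes that neighbors outside the image are taken as 0, which the slides do not specify; the function names are illustrative. Because the decoder forms the same prediction from already reconstructed pixels, the image is recovered exactly.

```python
def predict(img, i, j):
    """Integer prediction from a (left) and b (upper-left); 0 outside the image."""
    a = img[i][j - 1] if j > 0 else 0
    b = img[i - 1][j - 1] if i > 0 and j > 0 else 0
    return (a + b) // 2

def encode(img):
    """Residual image: each pixel minus its integer prediction."""
    return [[img[i][j] - predict(img, i, j) for j in range(len(img[0]))]
            for i in range(len(img))]

def decode(res):
    """Rebuild the image in raster order, using reconstructed neighbors."""
    h, w = len(res), len(res[0])
    rec = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            rec[i][j] = res[i][j] + predict(rec, i, j)
    return rec

image = [[100, 102, 104], [101, 103, 106], [99, 102, 105]]
residual = encode(image)
print(residual)
print(decode(residual) == image)
```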
8.3.1 Lossless Predictive
Coding (cont.)
Figure: Histograms (relative frequency) of (a) the original image intensity, from 0 to 255, and (b) the integer prediction error
8.3.2 Run-Length Coding
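Source model: first-order Markov sequence; the probability distribution of the current state depends only on the previous state:
$P(X(n) \mid X(n-1), X(n-2), \ldots) = P(X(n) \mid X(n-1))$
Procedure:
Run = k: (k-1) non-transitions followed by a transition.
  BBB W BB WWW ...
   3  1  2  ...
[Figure: two-state Markov chain with states W and B, transition probabilities $t_W$ and $t_B$ between them, and self-transition probabilities $1 - t_W$ and $1 - t_B$.]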
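The run extraction can be sketched with the standard library. The helper names `run_lengths` and `expand` are illustrative; the finite string below is the visible part of the example, so its last run is counted as 3 here.

```python
from itertools import groupby

def run_lengths(seq):
    """Split a sequence into (symbol, run length) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(seq)]

def expand(runs):
    """Inverse operation: rebuild the sequence from its runs."""
    return "".join(sym * n for sym, n in runs)

runs = run_lengths("BBBWBBWWW")
print(runs)
print(expand(runs))
```

For a two-level (black/white) source the symbols alternate, so only the first symbol and the list of run lengths need to be transmitted.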
8.3.2 Run-Length Coding (cont.)
- Remarks:
  - All runs are independent (if runs are allowed to be arbitrarily long)
  - Entropy of runs = entropy of the original source
  - Modified run-length codes: runs are restricted to at most a limit L
8.3.2.1 Run-Length Coding of
Bit-Plane
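[Figure: An 8-bit gray level, e.g. 130 = 10000010 in binary, contributes one bit to each of the eight bit planes, from the most significant bit to the least significant bit.]
Figure: Bit-plane decomposition of an 8-bit image
Gray code
1-D RLC
2-D RLC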
8.4 Rate-Distortion Theory
$x(n)$ -> Lossy coding -> $y(n)$, with $y(n) = \hat{x}(n)$
Distortion measure: $d(x(n), y(n)) \ge 0$
8.4 Rate-Distortion Theory (cont.)
Random variables X and Y are related by the mutual information
$I(X;Y) = \sum_{x} \sum_{y} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)} = H(X) - H(X \mid Y)$
where $H(X)$ represents the uncertainty about X before knowing Y, $H(X \mid Y)$ represents the uncertainty about X after knowing Y, and $I(X;Y)$ represents the average mutual information, i.e., the information provided about X by Y.
8.4.1 Rate-Distortion Function
For a Discrete-Amplitude Memoryless Source (DMS) X:
$x \in A_x = \{a_i\}$, which forms the source alphabet
$y \in A_y = \{b_j\}$, which forms the destination alphabet
with $P(X)$ given, and
single-letter distortion measure: $d(a_i, b_j)$
Hence, the average distortion is $E[d(x,y)] = \sum_{x} \sum_{y} P(x,y)\, d(x,y)$
8.4.1 Rate-Distortion Function
(cont.)
- Rate-Distortion Function $R(D)$ (or Distortion-Rate Function $D(R)$)
$R(D) = \min_{P(X,Y)} \{\, I(X;Y) : E[d(x,y)] \le D \,\}$
- The average number of bits needed to represent a source symbol, if an average distortion D is allowed.
8.4.2 Source Coding Theorem
- A code or codebook B of size M and block length N is a set of reproducing vectors (code words)
$B = \{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M\}$, where each code word has N components,
$\mathbf{y}_i = (y_i(1), y_i(2), \ldots, y_i(N))$
- Coding Rule: A mapping between all the N-tuple source words $\mathbf{x}$ and B. Each source word $\mathbf{x}$ is mapped to the codeword $\mathbf{y} \in B$ that minimizes $d(\mathbf{x}, \mathbf{y})$; that is,
$d(\mathbf{x} \mid B) = \min_{\mathbf{y} \in B} d(\mathbf{x}, \mathbf{y})$
8.4.2 Source Coding Theorem
(cont.)
Average distortion of code B:
$E[d(\mathbf{x} \mid B)] = \frac{1}{N} \sum_{\text{all } \mathbf{x}} P(\mathbf{x})\, d(\mathbf{x} \mid B)$
- Source Coding Theorem:
For a DMS X with alphabet $A_x$, probability $p(x)$, and a single-letter distortion measure $d(\cdot,\cdot)$, and for a given average distortion D, there exists a sufficiently large block length N and a code B of size M and block length N such that
Rate $R = \frac{\log_2 M}{N} < R(D) + \epsilon$
8.4.2 Source Coding Theorem
(cont.)
In other words, there exists a mapping from the source symbols to codewords such that, for a given distortion D, $R(D)$ bits/symbol are sufficient to enable source reconstruction with an average distortion that is arbitrarily close to D. The function $R(D)$ is called the rate-distortion function. Note that $R(0) = H(X)$. The actual rate R should obey
$R \ge R(D)$ for the fidelity level D.
8.4.2 Source Coding Theorem
(cont.)
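[Figure: Rate-distortion function R(D). Vertical axis: rate R; horizontal axis: distortion D. The curve starts at R = H(X) for D = 0 and reaches zero at D = D_max.]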
8.5 Scalar Quantization
Block size = 1
-- Approximate a continuous-amplitude source with a finite number of levels.
A scalar quantizer Q is a function that is defined in terms of a finite set of decision levels $d_i$ and reconstruction levels $r_i$, given by
$Q(s) = r_i$, if $s \in (d_{i-1}, d_i]$, $i = 1, \ldots, L$
where L is the number of output states.
8.5.1 Lloyd-Max Quantizer
(Nonuniform Quantizer)
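To minimize
$E[(s - r_i)^2] = \sum_{i=1}^{L} \int_{d_{i-1}}^{d_i} (s - r_i)^2\, p(s)\, ds$
w.r.t. $r_i$ and $d_i$, it can be shown that the necessary conditions are given by
$r_i = \frac{\int_{d_{i-1}}^{d_i} s\, p(s)\, ds}{\int_{d_{i-1}}^{d_i} p(s)\, ds}, \quad 1 \le i \le L$
$d_i = \frac{r_i + r_{i+1}}{2}, \quad 1 \le i \le L-1$
with $d_0 = -\infty$, $d_L = \infty$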
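The two necessary conditions suggest an alternating iteration: fix the reconstruction levels and set each decision level to the midpoint, then fix the decision levels and set each reconstruction level to the centroid of its cell. The sketch below is an assumption-laden illustration, not the course's algorithm: it approximates the integrals by sums on a truncated grid, uses a unit-variance Laplacian density, and all names and default parameters are chosen here.

```python
import math

def laplacian_pdf(s):
    """Unit-variance Laplacian density."""
    return math.exp(-math.sqrt(2.0) * abs(s)) / math.sqrt(2.0)

def lloyd_max(pdf, levels, lo=-10.0, hi=10.0, step=0.002, iters=100):
    """Return (decision levels d_1..d_{L-1}, reconstruction levels r_1..r_L)."""
    n = int(round((hi - lo) / step))
    grid = [lo + (k + 0.5) * step for k in range(n)]      # cell midpoints
    weights = [pdf(s) for s in grid]
    # Initial guess: reconstruction levels spread evenly over [-2, 2].
    r = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]
    for _ in range(iters):
        d = [(r[i] + r[i + 1]) / 2 for i in range(levels - 1)]   # midpoint condition
        num, den = [0.0] * levels, [0.0] * levels
        i = 0
        for s, w in zip(grid, weights):
            while i < levels - 1 and s > d[i]:
                i += 1                                     # move to the cell containing s
            num[i] += s * w
            den[i] += w
        # Centroid condition (keep the old level if a cell is empty).
        r = [num[i] / den[i] if den[i] > 0 else r[i] for i in range(levels)]
    d = [(r[i] + r[i + 1]) / 2 for i in range(levels - 1)]
    return d, r

d2, r2 = lloyd_max(laplacian_pdf, 2)
print([round(x, 3) for x in r2])
```

For L = 2 the positive reconstruction level converges to about 0.707, the conditional mean of the unit-variance Laplacian on the positive half-line.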
8.5.1 Lloyd-Max Quantizer
(Nonuniform Quantizer) (cont.)
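Example 1: Gaussian DMS $X \sim N(0, \sigma_X^2)$ with squared-error distortion
$R(D) = \begin{cases} \frac{1}{2} \log_2 \frac{\sigma_X^2}{D}, & 0 \le D \le \sigma_X^2 \\ 0, & D > \sigma_X^2 \end{cases}$
$D(R) = 2^{-2R} \sigma_X^2$
Uniform scalar quantizer at high rates:
$R^*(D) = \frac{1}{2} \log_2 \frac{\epsilon^2 \sigma_X^2}{D} = R(D) + 0.255$
$D^*(R) = \epsilon^2\, 2^{-2R} \sigma_X^2$
where $\epsilon^2 = \pi e / 6$, with $e \approx 2.71$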
8.5.1 Lloyd-Max Quantizer
(Nonuniform Quantizer) (cont.)
Example 2: Lloyd-Max quantization of a Laplacian-distributed signal with unit variance.
Table: The decision and reconstruction levels for Lloyd-Max quantizers.

Levels        2                 4                 8
  i      d_i     r_i       d_i     r_i       d_i     r_i
  1      ∞       0.707     1.102   0.395     0.504   0.222
  2                        ∞       1.810     1.181   0.785
  3                                          2.285   1.576
  4                                          ∞       2.994
  Δ      1.141             1.087             0.731
8.5.1 Lloyd-Max Quantizer
(Nonuniform Quantizer) (cont.)
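Here $\Delta$ defines the uniform quantizers. In the case of the nonuniform quantizer with L = 4 (indexed symmetrically about zero on the left, and as $d_0, \ldots, d_4$ and $r_1, \ldots, r_4$ on the right):
$d_{-2} = d_0 = -\infty$, $d_{-1} = d_1 = -1.102$, $d_0 = d_2 = 0$, $d_1 = d_3 = 1.102$, $d_2 = d_4 = \infty$
$r_{-2} = r_1 = -1.810$, $r_{-1} = r_2 = -0.395$, $r_1 = r_3 = 0.395$, $r_2 = r_4 = 1.810$
Example 3: Quantizer noise
For a memoryless Gaussian signal s with zero mean and variance $\sigma^2$, we express the mean square quantization noise as
$D = E[(s - \hat{s})^2]$,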
8.5.1 Lloyd-Max Quantizer
(Nonuniform Quantizer) (cont.)
then the signal-to-noise ratio in dB is given by
$\mathrm{SNR} = 10 \log_{10} \frac{\sigma^2}{D}$
It can be seen that SNR = 40 dB implies $\sigma^2 / D = 10{,}000$. Substituting this result into the rate-distortion function for a memoryless Gaussian source, given by
$R(D) = \frac{1}{2} \log_2 \frac{\sigma^2}{D}$
we have $R(D) \approx 7$ bits/sample. Likewise, we can show that quantization with 8 bits/sample yields approximately 48 dB SNR.
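The two numbers quoted in Example 3 follow directly from the Gaussian rate-distortion function. A small sketch, with function names chosen here for illustration:

```python
import math

def rate_for_snr_db(snr_db):
    """Bits/sample from R(D) = 0.5 * log2(sigma^2 / D) at a given SNR in dB."""
    ratio = 10 ** (snr_db / 10)          # sigma^2 / D
    return 0.5 * math.log2(ratio)

def snr_db_for_rate(bits):
    """SNR in dB from D(R) = 2^(-2R) * sigma^2."""
    return 10 * math.log10(2 ** (2 * bits))

print(rate_for_snr_db(40))   # about 6.64, i.e. roughly 7 bits/sample
print(snr_db_for_rate(8))    # about 48.2 dB
```

Each additional bit per sample raises the SNR by about 6.02 dB, which is why 8 bits give roughly 48 dB.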
8.6 Differential Pulse Code
Modulation (DPCM)
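[Figure: (a) Encoder: the prediction $\tilde{s}$ is subtracted from the input $s$ to form the prediction error $e$, which is quantized (Q) to $\hat{e}$ and entropy coded into the binary stream; the predictor P operates on the reconstruction $\hat{s}$. (b) Decoder: the entropy decoder recovers $\hat{e}$ from the binary stream, and the predictor P output $\tilde{s}$ is added to it to give the decoded image $\hat{s}$.]
with $\hat{s} = \tilde{s} + \hat{e}$ for reconstruction
Block diagram of a DPCM (a) encoder and (b) decoder
Prediction: $\tilde{s} = \mathrm{Pred}(\hat{s})$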
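The DPCM loop can be sketched for a 1-D signal. This is a simplified illustration under assumptions not stated in the slides: the predictor is the previous reconstructed sample, the quantizer is uniform with step size `step`, the initial prediction is 0, and the entropy coder is omitted. The key point is that the encoder predicts from its own reconstruction, so encoder and decoder stay in step and the error does not accumulate.

```python
def quantize(e, step):
    """Uniform quantizer: nearest multiple of `step`."""
    return step * round(e / step)

def dpcm_encode(samples, step):
    """Return the quantized prediction errors."""
    prev = 0.0                       # reconstruction of the previous sample
    out = []
    for s in samples:
        e = s - prev                 # prediction error (prediction = prev)
        eq = quantize(e, step)
        out.append(eq)
        prev = prev + eq             # same reconstruction the decoder will form
    return out

def dpcm_decode(residuals):
    """Rebuild the signal: reconstruction = prediction + quantized error."""
    prev = 0.0
    rec = []
    for eq in residuals:
        prev = prev + eq
        rec.append(prev)
    return rec

signal = [100, 104, 109, 120, 118, 90]
decoded = dpcm_decode(dpcm_encode(signal, step=4))
print(decoded)
```

With this structure the reconstruction error of every sample is bounded by half the quantizer step.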
8.6.1 Optimal Prediction
For the 1-D case,
$\tilde{s}(n) = \sum_k h(k)\, \hat{s}(n-k)$,
which corresponds to $\sum_k a(k)\, s(n-k)$ in the source model.
8.6.1.1 Image Modeling
Modeling the source image by a stationary random field, a linear minimum mean square error (LMMSE) predictor, in the form
$\tilde{s}(n_1, n_2) = c_1 s(n_1-1, n_2-1) + c_2 s(n_1-1, n_2) + c_3 s(n_1-1, n_2+1) + c_4 s(n_1, n_2-1)$
can be designed to minimize the mean square prediction error
$E\{[s - \tilde{s}]^T [s - \tilde{s}]\}$
8.6.1.1 Image Modeling (cont.)
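The optimal coefficient vector c is given by
$\mathbf{c} = \Phi^{-1} \phi$
where
$\Phi = \begin{bmatrix} R_s(0,0) & R_s(0,1) & R_s(0,2) & R_s(1,0) \\ R_s(0,-1) & R_s(0,0) & R_s(0,1) & R_s(1,-1) \\ R_s(0,-2) & R_s(0,-1) & R_s(0,0) & R_s(1,-2) \\ R_s(-1,0) & R_s(-1,1) & R_s(-1,2) & R_s(0,0) \end{bmatrix}$
and
$\phi = [R_s(1,1)\ \ R_s(1,0)\ \ R_s(1,-1)\ \ R_s(0,1)]^T$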
8.6.1.1 Image Modeling (cont.)
Although this coefficient vector is optimal in the LMMSE sense, it is not necessarily optimal in the sense of minimizing the entropy of the prediction error. Furthermore, images rarely obey the stationarity assumption. As a result, most DPCM schemes employ a fixed predictor.
8.6.1.1 Image Modeling (cont.)
The analysis and design of the optimum predictor is difficult,
because the quantizer is inside the feedback loop.
A heuristic idea is adding a quantization noise rejection filter
before the predictor.
To avoid channel error propagation, leaky predictor may be
useful.
Variation of DPCM: Adaptive Prediction and Quantization.
8.6.2 Adaptive Quantization
Adjusting the decision and reconstruction levels according to the local statistics of the prediction error.
[Figure: A block is applied to four quantizers Q1, Q2, Q3, Q4 with scale factors 0.5, 1.0, 1.75, and 2.5 to produce the coded block.]
8.7 Delta Modulation
Prediction: $\tilde{s}_n = \hat{s}_{n-1}$
Fig. The quantizer for delta modulation, mapping the prediction error $e$ to $\hat{e}$
8.7 Delta Modulation (cont.)
Fig. Illustration of granular noise and slope overload
8.8 Transform Coding
Motivation
Rate-Distortion Theory: Insert distortion in the frequency domain following the rate-distortion theory formula.
Decorrelation: Transform coefficients are (almost) independent.
Energy Concentration: Transform coefficients are ordered according to the importance of their information content.
8.8.1 Linear Transforms
Discrete-Space Linear Orthogonal Transforms
Separable Transform:
$y(k,l) = \sum_m \sum_n t_1(m,k)\, x(m,n)\, t_2(n,l)$
Choices of $t_1, t_2$ (fixed basis functions):
DFT (Discrete Fourier Transform)
DCT (Discrete Cosine Transform)
DST (Discrete Sine Transform)
WHT (Walsh-Hadamard Transform)
…
8.8.1 Linear Transforms (cont.)
Non-separable Transform:
KLT (Karhunen-Loeve Transform)
- Basis functions $b_k$ are derived from the autocorrelation matrix R of the source signals by $R\, b_k = \lambda_k b_k$
8.8.1 Linear Transforms (cont.)
KL Transform
The KLT coefficients are uncorrelated. (If x is Gaussian, the KLT coefficients are independent.)
The KLT offers the best energy compaction.
If x is 1st-order Markov with correlation coefficient $\rho$:
DCT is the KLT of such a source with $\rho \to 1$
DST is the KLT of such a source with $\rho \to 0$
Performance on typical still images:
DCT ≈ KLT
8.8.2 Optimum Transform Coder
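For a stationary random vector source x with covariance matrix R:
$x(n)$ -> Linear transform A -> $y(k)$ -> Scalar quantizer Q -> $\hat{y}(k)$ -> Linear transform B -> $\hat{x}(n)$
Goal: Minimize the mean square coding error
$E[(\mathbf{x} - \hat{\mathbf{x}})^T (\mathbf{x} - \hat{\mathbf{x}})]$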
8.8.2 Optimum Transform Coder
(cont.)
Question:
What are the optimum A, B, and Q?
Answer:
A is the KLT of x
Q is the optimum (entropy-constrained) quantizer for each $y(k)$
$B = A^{-1}$
8.8.3 Bit Allocation
How many bits should be assigned to each transform coefficient?
Total rate:
$R = \frac{1}{N} \sum_{k=1}^{N} n_k$
where N is the block size, and $n_k$ is the number of bits assigned to the kth coefficient, $y(k)$
8.8.3 Bit Allocation (cont.)
Distortion: MSE in the transform domain
$D = \frac{1}{N} \sum_{k=1}^{N} E[(y(k) - \hat{y}(k))^2]$
Recall that the rate-distortion function of the optimal scalar quantizer is
$D_k = \epsilon_k^2\, \sigma_k^2\, 2^{-2 n_k}$
8.8.3 Bit Allocation (cont.)
where $\sigma_k^2$ is the variance of the coefficient $y(k)$, and $\epsilon_k^2$ is a constant depending on the source probability distribution (= 2.71 for Gaussian distribution).
Hence,
$D = \frac{1}{N} \sum_{k=1}^{N} \epsilon_k^2\, \sigma_k^2\, 2^{-2 n_k}$
8.8.3 Bit Allocation (cont.)
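For a given total rate R, assuming all $\epsilon_k^2$'s have the same value (= $\epsilon^2$), the results are:
$n_k = \begin{cases} \frac{1}{2} \log_2 \frac{\epsilon^2 \sigma_k^2}{\theta}, & \epsilon^2 \sigma_k^2 > \theta \\ 0, & \epsilon^2 \sigma_k^2 \le \theta \end{cases}$
$D = \frac{1}{N} \sum_{k=1}^{N} \min(\theta,\ \epsilon^2 \sigma_k^2)$
$R = \frac{1}{2N} \sum_{k:\ \epsilon^2 \sigma_k^2 > \theta} \log_2 \frac{\epsilon^2 \sigma_k^2}{\theta}$
If R is given, $\theta$ is obtained by solving the last equation.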
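Solving the rate equation for $\theta$ has no closed form in general, but the rate is monotonically decreasing in $\theta$, so a bisection search suffices. The sketch below assumes the allocation rule $n_k = \max(0, \frac{1}{2}\log_2(\epsilon^2\sigma_k^2/\theta))$; the function name, the default `eps2 = 1.0`, and the example variances are illustrative choices, and the resulting $n_k$ are real-valued (a practical coder would round them).

```python
import math

def allocate_bits(variances, total_rate, eps2=1.0, iters=200):
    """Return (theta, [n_k]) such that the average of n_k equals total_rate."""
    n = len(variances)
    scaled = [eps2 * v for v in variances]

    def bits(theta):
        # Coefficients with eps2 * variance <= theta get zero bits.
        return [0.5 * math.log2(v / theta) if v > theta else 0.0 for v in scaled]

    lo, hi = 1e-12 * max(scaled), max(scaled)
    for _ in range(iters):
        theta = math.sqrt(lo * hi)          # bisection on a logarithmic scale
        if sum(bits(theta)) / n > total_rate:
            lo = theta                      # rate too high: raise theta
        else:
            hi = theta                      # rate too low or equal: lower theta
    theta = math.sqrt(lo * hi)
    return theta, bits(theta)

theta, n_k = allocate_bits([16.0, 4.0, 1.0, 0.25], total_rate=1.0)
print(theta, n_k)
```

In this example the smallest-variance coefficient falls below $\theta$ and receives no bits, while the remaining three share the 4 bits of the block.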
8.8.3 Bit Allocation (cont.)
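Except for a constant $\epsilon^2$ (due to the scalar quantizer), the above results are identical to the rate-distortion function of the stationary Gaussian source.
That is, transform coefficients whose variance $\sigma_k^2$ is less than $\theta$ are not transmitted.
[Figure: $\sigma_k^2$ versus $k$ (from 0 to N) compared with the threshold $\theta$; the regions along the $k$ axis are labeled A and B.]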
8.8.4 Practical Transform Coding
[Encoder]
Image block -> T -> Coefficient selection & quantizer -> Entropy encoder -> Codes
[Decoder]
Codes -> Entropy decoder -> T^-1 -> Reproduced image block
8.8.4 Practical Transform Coding
(cont.)
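Block Size: 8×8
Transform: DCT (type-2, 2-D)
$S(k_1, k_2) = \frac{4}{N^2} C(k_1) C(k_2) \sum_{n_1=0}^{N-1} \sum_{n_2=0}^{N-1} s(n_1, n_2) \cos\frac{(2n_1+1) k_1 \pi}{2N} \cos\frac{(2n_2+1) k_2 \pi}{2N}$
where $k_1, k_2, n_1, n_2 = 0, 1, \ldots, N-1$, and
$C(k) = \begin{cases} \frac{1}{\sqrt{2}}, & k = 0 \\ 1, & \text{otherwise} \end{cases}$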
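A direct evaluation of the 2-D DCT formula is sketched below, using the $4/N^2$ normalization and $C(0) = 1/\sqrt{2}$ as reconstructed from the slide. The function name `dct2` is illustrative, and the quadruple loop is meant for clarity only; practical codecs use fast separable algorithms.

```python
import math

def dct2(block):
    """2-D type-2 DCT of an N x N block, evaluated directly from the definition."""
    n = len(block)

    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * n for _ in range(n)]
    for k1 in range(n):
        for k2 in range(n):
            acc = 0.0
            for n1 in range(n):
                for n2 in range(n):
                    acc += (block[n1][n2]
                            * math.cos((2 * n1 + 1) * k1 * math.pi / (2 * n))
                            * math.cos((2 * n2 + 1) * k2 * math.pi / (2 * n)))
            out[k1][k2] = 4 / (n * n) * c(k1) * c(k2) * acc
    return out

flat = [[10] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))
```

With this normalization a constant block of value 10 has a DC coefficient of 20 and all AC coefficients equal to zero (up to rounding), which illustrates the energy compaction for smooth image regions.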
8.8.4 Practical Transform Coding
(cont.)
Threshold Coding:
$\hat{S}(k_1, k_2) = \mathrm{NINT}\left[\frac{S(k_1, k_2)}{T(k_1, k_2)}\right]$
where NINT denotes rounding to the nearest integer and $T(k_1, k_2)$ is
16  11  10  16  24  40  51  61
12  12  14  19  26  58  60  55
14  13  16  24  40  57  69  56
14  17  22  29  51  87  80  62
18  22  37  56  68 109 103  77
24  35  55  64  81 104 113  92
49  64  78  87 103 121 120 101
72  92  95  98 112 100 103  99
8.8.4 Practical Transform Coding
(cont.)
Zig-zag Scanning:
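The scan order can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below assumes the JPEG-style convention (start at the DC coefficient, move right first); the function name `zigzag_order` is illustrative.

```python
def zigzag_order(n=8):
    """Return the list of (row, col) positions in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col on each anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]      # listed from top-right to bottom-left
        if s % 2 == 0:
            diag.reverse()                     # even diagonals run bottom-left to top-right
        order.extend(diag)
    return order

order = zigzag_order(8)
print(order[:10])
```

Scanning in this order places the low-frequency coefficients first, so the many zero-valued high-frequency coefficients after quantization end up in long runs at the end of the sequence.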
8.8.4 Practical Transform Coding
(cont.)
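Entropy Coding: Huffman or arithmetic coding of the DC and AC coefficients
Encoder: Source block -> DCT -> Divide by QM -> DC coefficient -> DPCM encoder -> Entropy coder -> Bits
                                             -> AC coefficients -> Entropy coder -> Bits
Decoder: Bits -> Entropy decoder -> DC coefficient -> DPCM decoder -> Multiply by QM -> IDCT -> Reconstructed block
                                 -> AC coefficients -> Multiply by QM -> IDCT -> Reconstructed block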
8.8.5 Performance
For typical CCIR 601 pictures:
Excellent 2 bits/pel
Good 0.8 bits/pel
Blocking artifacts on reconstructed pictures at very low bit
rates (< 0.5 bits/pel)
Close to the best known algorithm around 0.75 to 2.0 bits/pel
Complexity is acceptable.