IMAGE COMPRESSION
A picture is worth a thousand
words.
Images occupy a large amount of space. A k-
bit image which has M x N pixels requires M
x N x k bits of storage space.
How to reduce this?
Any picture contains information. This can be
compressed by removing the redundancy.
Compression in daily life : Abbreviation
BITS, APOGEE, CGPA, asap, BTW,sms
Sky, audi, GUSS
So in general, compression is achieved by encoding the
information contained in the source symbols into code
words.
For storage purposes, we compress the data; at a later stage,
it can be decompressed.
Sometimes a given codeword may not be uniquely
decodable!
Eg: WWF : World Wildlife Fund
World Wrestling Federation
The aim of coding is to reduce the amount of
redundancy in the message (image).
Many of the compression techniques can be applied
to all kinds of information (files which are not
images).
Data is the means by which we convey information.
Various amounts of data may be used to convey the
same information. Hence compression is possible.
Compression ratio = n1 / n2
where n1 and n2 are the numbers of information-carrying
units in two data sets that represent the same information
(n1 before compression, n2 after compression).
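The storage formula and the compression ratio can be tried out numerically. The following is a minimal sketch in Python; the function names and the example figures (a 512 x 512, 8-bit image assumed to compress to 65,536 bytes) are illustrative assumptions, not values from the lecture.

```python
def image_bits(rows, cols, bits_per_pixel):
    """Storage in bits for an uncompressed M x N image with k bits per pixel."""
    return rows * cols * bits_per_pixel


def compression_ratio(n1, n2):
    """Ratio n1 / n2 of information-carrying units before and after compression."""
    if n2 <= 0:
        raise ValueError("n2 must be positive")
    return n1 / n2


if __name__ == "__main__":
    # Hypothetical example: 512 x 512 pixels, 8 bits per pixel,
    # assumed to compress to 65,536 bytes.
    n1 = image_bits(512, 512, 8)
    n2 = 65_536 * 8
    print(n1, n2, compression_ratio(n1, n2))  # 2097152 524288 4.0
```

A ratio of 4 means the compressed data set uses one quarter of the original number of bits.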
Redundancy can be classified into:
•Coding redundancy
•Interpixel redundancy
•Psycho-visual redundancy
An example of redundancy in the English language:
Queen, quick, quite etc. Q is always followed by u , hence
we can compress the message by not transmitting the u.
Then why do we have redundancy in the first
place?
Error correction, detection, insurance against
failures
Redundancies in daily life:
•Redundancy in Computer Networks: NEURON
•Repetition in lecture: Students who doze off for a
minute can “get back” into the lecture
•Redundancy in movies: You can miss half the
movie and still understand the story!
Coding :
Aim: To convert a message, i.e., a sequence of
symbols (from the source alphabet), into a binary (or, in
general, q-ary) message consisting of code
words. Here the code alphabet is {0, 1}, and
a string made up of 0s and 1s is called a code
word.
A code is a rule which assigns to each
source symbol a code word.
A 00001
B 00010
C 00011
D 00100
E 00101
Uniquely decodable code (UDC):
A code is uniquely decodable if every sequence of code
words can be decoded in only one way.
Example of a code which is not a UDC:
Source symbol Code word
A 00
B 10
C 101
D 110
E 1001
Try decoding 10110
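One way to see the ambiguity is to list every possible split of the received bits into code words. The sketch below is a simple recursive search written for illustration; the function name `all_decodings` is an assumption, and the table is the one given above.

```python
def all_decodings(bits, code):
    """Return every way of splitting `bits` into code words of `code`."""
    if not bits:
        return [[]]  # exactly one way to decode the empty string
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            for rest in all_decodings(bits[len(word):], code):
                results.append([symbol] + rest)
    return results


CODE = {"A": "00", "B": "10", "C": "101", "D": "110", "E": "1001"}

if __name__ == "__main__":
    for parse in all_decodings("10110", CODE):
        print("".join(parse))  # prints BD, then CB
```

The message 10110 can be read as B D (10 | 110) or as C B (101 | 10), so the code is not uniquely decodable.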
Block codes and variable length codes:
In variable length coding, we represent frequently
occurring source symbols by words of shorter length and
source symbols which occur rarely by words of greater
length.
Compression is achieved if the average word length is
reduced.
$$l_{avg} = \sum_{k=0}^{L-1} l(r_k)\, p(r_k)$$
Here l(r_k) is the length of the code word which represents
the gray level r_k, and p(r_k) is the probability of occurrence
of this gray level.
Now, wait a minute!
Decoding a block code is easy, because I know beforehand
that a certain number of symbols (0s and 1s) correspond to
one source symbol (i.e., gray level).
But for a variable length code how do we do the decoding?
Ans: We could use an end-of-word character, but this
would spoil the compression ratio. Instead, we try to decode each
code word as soon as we get it (instantaneously).
If instantaneous decoding is possible, then such a code is
called an IDC (instantaneously decodable code).
Eg:
Suppose that we are to find a binary coding for the
alphabet {0,1,2,3}, and we observe that 0 appears
more often than any other symbol. Then we assign
the shortest code word to 0 and progressively longer
code words to the other symbols, which occur less
frequently.
Source alphabet Code word
0 0
1 01
2 011
3 111
But, this code is not instantaneously decodable.
If we receive the message
011111111……1
We will not know whether the first symbol is 0,1 or
2 until the message stops!
So for a code to be instantaneous, it should satisfy
the No-prefix condition.
No code word is a prefix of another code word.
An instantaneous code for the same source alphabet is
given below:
Source alphabet Code word
0 0
1 10
2 110
3 111
In order that this code can be decoded, we send this
table to the recipient, along with the actual message
(image)
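The no-prefix condition is easy to check mechanically. Here is a minimal Python sketch applied to the two codes above; the function name `is_prefix_free` is an illustrative choice.

```python
def is_prefix_free(words):
    """True if no code word is a prefix of (or equal to) a different code word."""
    words = list(words)
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            if i != j and v.startswith(u):
                return False
    return True


# The two codes for the alphabet {0, 1, 2, 3} given above.
NOT_INSTANTANEOUS = {"0": "0", "1": "01", "2": "011", "3": "111"}
INSTANTANEOUS = {"0": "0", "1": "10", "2": "110", "3": "111"}

if __name__ == "__main__":
    print(is_prefix_free(NOT_INSTANTANEOUS.values()))  # False: 0 is a prefix of 01
    print(is_prefix_free(INSTANTANEOUS.values()))      # True
```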
For a given code to be instantaneous, it should satisfy
the no-prefix condition. Let the lengths of the code
words (using binary) be d1, d2, d3, …, dn. Let the
source alphabet be {a1, a2, a3, …, an}.
We assume that
$$d_1 \le d_2 \le d_3 \le \dots \le d_n$$
Let us say, we have chosen a binary word K(a1) of
length d1. We want to choose a binary word of length
d2, but we want to avoid those words which have
K(a1) as a prefix. The number of such words is:
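$$2^{d_2 - d_1}$$
Explanation: a word of length d2 which has K(a1) as a prefix is
fixed in its first d1 bits; the remaining d2 − d1 bits can be chosen freely.
[Figure: the binary word 1 0 1 0 0 0 0 1 1 0 of length d2, with its first d1 bits marked]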
For the code to be constructed:
$$2^{d_2} \ge 2^{d_2 - d_1} + 1$$
Number of words of length d2 ≥ Number of words
ruled out + 1
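Similarly, we have to choose a word of length d3 after
ruling out the words which have K(a1) or K(a2) as a prefix.
Therefore:
$$2^{d_3} \ge 2^{d_3 - d_1} + 2^{d_3 - d_2} + 1$$
Or:
$$1 \ge 2^{-d_1} + 2^{-d_2} + 2^{-d_3}$$
Continuing in the same way for all n code words:
$$1 \ge 2^{-d_1} + 2^{-d_2} + 2^{-d_3} + \dots + 2^{-d_n}$$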
Theorem:
Given a source alphabet of n symbols and a code
alphabet of k symbols, an instantaneous code
with given lengths d1, d2, d3, …, dn of code words
exists whenever the following Kraft's inequality holds:
$$1 \ge k^{-d_1} + k^{-d_2} + k^{-d_3} + \dots + k^{-d_n}$$
Note:
If a given code satisfies Kraft's inequality,
it does not follow that the given code is
instantaneous. The conclusion is only that there exists
at least one instantaneous code with the same
code word lengths. The ultimate test for an
instantaneous code is the no-prefix condition.
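Kraft's inequality depends only on the code word lengths, which makes it simple to evaluate. The sketch below uses exact fractions to avoid rounding; the function names are illustrative assumptions.

```python
from fractions import Fraction


def kraft_sum(lengths, k=2):
    """Exact value of the sum of k**(-d) over the code word lengths d."""
    return sum((Fraction(1, k ** d) for d in lengths), Fraction(0))


def satisfies_kraft(lengths, k=2):
    """True if an instantaneous k-ary code with these lengths exists."""
    return kraft_sum(lengths, k) <= 1


if __name__ == "__main__":
    # Lengths 1, 2, 3, 3 are shared by both codes {0, 01, 011, 111}
    # and {0, 10, 110, 111} from the earlier examples.
    print(kraft_sum([1, 2, 3, 3]))      # 1
    print(satisfies_kraft([1, 1, 2]))   # False: 1/2 + 1/2 + 1/4 > 1
```

Both earlier example codes have lengths 1, 2, 3, 3 and therefore both satisfy the inequality, yet only the second one is instantaneous. This is exactly the point of the note above.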
Decoding of instantaneous codes:
•Compare the first bit received with the code words in the
look-up table. If there is a match, then the corresponding
source symbol can be obtained from the look-up table.
•If there is no match, then concatenate this bit with the
next bit received and then again search for a match with
the code words in the look-up table.
•This process of concatenating the last received bit with all
the preceding bits is continued till a match is found.
•Once a match is found, the temp variable where the
concatenated bits were stored is cleared and the next bit
which is received is stored there and the entire process is
repeated.
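The decoding steps above can be sketched in a few lines of Python. This is an illustration under the assumption that the look-up table is a dictionary from code word to source symbol; the names `decode_instantaneous` and `TABLE` are hypothetical.

```python
def decode_instantaneous(bits, table):
    """Decode `bits` using a prefix-free code given as {code word: symbol}."""
    decoded = []
    temp = ""                  # bits received since the last match
    for bit in bits:
        temp += bit            # concatenate the newly received bit
        if temp in table:      # match found in the look-up table
            decoded.append(table[temp])
            temp = ""          # clear temp and start the next code word
    if temp:
        raise ValueError("message ends in the middle of a code word: " + temp)
    return decoded


# Look-up table for the instantaneous code of the alphabet {0, 1, 2, 3}.
TABLE = {"0": "0", "10": "1", "110": "2", "111": "3"}

if __name__ == "__main__":
    print(decode_instantaneous("0101100111", TABLE))  # ['0', '1', '2', '0', '3']
```

Because the code is prefix-free, the first match found is always the correct one, so no look-ahead is needed.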
Huffman code:
We will now describe a systematic procedure for
arriving at an efficient instantaneous coding scheme called
the Huffman code.
Information Source: An information source is a source
alphabet together with a probability distribution; i.e., a set
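{a1, …, an} together with numbers P(a1), …, P(an)
satisfying
$$\sum_{i=1}^{n} P(a_i) = 1 \quad \text{and} \quad 0 \le P(a_i) \le 1$$
Let the symbols of the source S be ordered by probability:
P(a1) ≥ P(a2) ≥ … ≥ P(an)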
The reduced source S* has symbols a1,a2,a3…..an-2 and a
new symbol an-1,n and its probabilities are P(a1),……
..P(an-2) and P(an-1,n) = P(an-1) + P(an).
If we cannot find a Huffman code for S*, we continue in
the same way, finding a reduction of S* till we end up with
two source symbols.
Note: Before reducing S*, we must move the new symbol
an-1,n to its proper position in order to maintain the ordering by probability.
a    P(a)   1st reduction   2nd reduction   3rd reduction   4th reduction
A    0.4    0.4             0.4             0.4             0.6 (code 0)
E    0.2    0.2             0.2             0.4             0.4 (code 1)
B    0.1    0.2             0.2             0.2
C    0.1    0.1             0.2
D    0.1    0.1
F    0.1
We now proceed backwards by “splitting” each code word
w assigned to a sum of two probabilities to two words w0
and w1.
(each entry: probability, code word)
P(a*)        1st splitting   2nd splitting   3rd splitting   4th splitting
0.6  0       0.4  1          0.4  00         0.4  00         0.4  00
0.4  1       0.4  00         0.2  01         0.2  10         0.2  11
             0.2  01         0.2  10         0.2  11         0.1  010
                             0.2  11         0.1  010        0.1  011
                                             0.1  011        0.1  100
                                                             0.1  101
lavg = 2 x 0.4 + 2 x 0.2 + 3 x 0.4 = 0.8 + 0.4 + 1.2 = 2.4 bits/symbol
With the usual fixed-length binary code, 6 symbols need 3 bits each, so lavg = 3.
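The reduction and splitting procedure can also be carried out with a priority queue. The sketch below is one common way to build a Huffman code in Python using `heapq`; it is not the lecture's own implementation, and the function names are assumptions. Ties between equal probabilities may be broken differently than in the tables above, so individual code words can differ, but the average length of any Huffman code for this source is the same.

```python
import heapq
from itertools import count


def huffman_code(probs):
    """Build a binary Huffman code {symbol: code word} from {symbol: probability}."""
    if len(probs) == 1:
        return {next(iter(probs)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # the two least probable entries
        p1, _, group1 = heapq.heappop(heap)
        # Prepend one bit to every code word in each merged group.
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def average_length(probs, code):
    """Average code word length: sum of p(s) * len(code word of s)."""
    return sum(probs[s] * len(code[s]) for s in probs)


PROBS = {"A": 0.4, "E": 0.2, "B": 0.1, "C": 0.1, "D": 0.1, "F": 0.1}

if __name__ == "__main__":
    code = huffman_code(PROBS)
    print(code)
    print(round(average_length(PROBS, code), 4))  # 2.4
```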
Implementation of Huffman Encoding:
•We have to keep track of the probabilities which
have been combined. This will be used when we
“split” the code-words when proceeding in the
reverse direction
•Instead of storing the codewords in binary format
or as strings and then concatenating, store them in
decimal format till the end. So concatenating a
binary number with a '0' is equivalent to adding a
'2' to the corresponding decimal number and
concatenating a '1' is like adding '1'.
function [m,k]= huff(m,k)
global palph1 fl code;
symp=palph1(m,k)+palph1(m-1,k);
tempk=1;
for i=1:m-1,
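if symp …
In each term of the entropy $H(S) = \sum_i p_i \log_2 (1/p_i)$, we have
p_i ≥ 0 and log2(1/p_i) ≥ 0, because 1/p_i ≥ 1.
Since each term is the product of these two quantities, H(S)
is always equal to or greater than 0.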
The maximum value of H(S) occurs for the case
when all symbols are equally probable (uniform
histogram), i.e., p_i = 1/n:
$$H(S) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
$$H(S) = -\sum_{i=1}^{n} \frac{1}{n} \log_2 \frac{1}{n} = \log_2 n$$
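The entropy formula is straightforward to evaluate. A minimal Python sketch (the function name `entropy` is an illustrative choice; terms with zero probability are skipped, following the usual convention that they contribute nothing):

```python
import math


def entropy(probs):
    """First-order entropy H(S) = -sum(p * log2(p)) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    print(entropy([1 / 8] * 8))   # 3.0, which equals log2(8)
    print(entropy([0.9, 0.1]))    # below log2(2) = 1
```

For a uniform distribution over n symbols the result equals log2 n; any non-uniform distribution over the same n symbols gives a smaller value.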
In order to prove that this is the maximum value, we use
the following fact: the curve for ln x lies on or below the
line y = x − 1.
Thus ln x ≤ x − 1, with equality only for x = 1.
[Figure: plot of y = x − 1 and y = ln x for x from 0 to 4; the two curves touch at x = 1]
We compute the difference between the entropy of any
source S and the value log2n
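So,
$$H(S) - \log_2 n = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} - \sum_{i=1}^{n} p_i \log_2 n$$
$$= \frac{1}{\ln 2} \sum_{i=1}^{n} p_i \left( \ln \frac{1}{p_i} - \ln n \right)$$
$$= \frac{1}{\ln 2} \sum_{i=1}^{n} p_i \ln \frac{1}{p_i n}$$
$$\le \frac{1}{\ln 2} \sum_{i=1}^{n} p_i \left( \frac{1}{p_i n} - 1 \right) = \frac{1}{\ln 2} \left( \sum_{i=1}^{n} \frac{1}{n} - \sum_{i=1}^{n} p_i \right) = 0$$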
Therefore H(S) ≤ log2 n, with equality only when all symbols are equally probable.
Entropy and Average Length:
Theorem: Every binary instantaneous code of a
source S has an average length L larger than or equal to
the entropy of S (in bits), i.e.,
$$L \ge H(S)$$
Proof:
We denote the length of the ith code word as di
$$L = \sum_{i=1}^{n} p_i d_i = \sum_{i=1}^{n} p_i \log_2 2^{d_i}$$
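In order to prove the theorem,
we compute the difference H(S) – L and show that it is
less than or equal to 0.
$$H(S) - L = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} - \sum_{i=1}^{n} p_i \log_2 2^{d_i} = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i 2^{d_i}}$$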
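We now use the inequality proved earlier, ln x ≤ x − 1:
$$H(S) - L = \frac{1}{\ln 2} \sum_{i=1}^{n} p_i \ln \frac{1}{p_i 2^{d_i}} \le \frac{1}{\ln 2} \sum_{i=1}^{n} p_i \left( \frac{1}{p_i 2^{d_i}} - 1 \right) = \frac{1}{\ln 2} \left( \sum_{i=1}^{n} \frac{1}{2^{d_i}} - \sum_{i=1}^{n} p_i \right)$$
$$= \frac{1}{\ln 2} \left( \sum_{i=1}^{n} \frac{1}{2^{d_i}} - 1 \right)$$
Using Kraft's inequality, we conclude that
H(S) – L ≤ 0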
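The theorem can be checked numerically on the six-symbol source used in the Huffman example. The sketch below assumes the probabilities 0.4, 0.2, 0.1, 0.1, 0.1, 0.1 and the code word lengths 2, 2, 3, 3, 3, 3 obtained there; the function names are illustrative.

```python
import math


def entropy(probs):
    """First-order entropy H(S) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_length(probs, lengths):
    """Average code word length L = sum of p_i * d_i."""
    return sum(p * d for p, d in zip(probs, lengths))


PROBS = [0.4, 0.2, 0.1, 0.1, 0.1, 0.1]   # source from the Huffman example
LENGTHS = [2, 2, 3, 3, 3, 3]             # code word lengths found there

if __name__ == "__main__":
    h = entropy(PROBS)
    avg = average_length(PROBS, LENGTHS)
    print(round(h, 4), round(avg, 4), h <= avg)  # 2.3219 2.4 True
```

The Huffman code comes within about 0.08 bits/symbol of the entropy bound for this source.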
The definition of entropy H(S) which we have
used till now is not the best way to estimate
randomness in an image. Hence it is called the
first-order estimate of the entropy of the
source.
Why?
Look at the following two images.
Both the images have the same first-order entropy,
but they do not contain the same amount of
randomness (or information). The second image
has a lot of interpixel redundancy.
Hence we define the second and higher order
estimates of the entropy. These are obtained by
looking at extensions of the source.
eg: Suppose we want to send a binary message in
which 0s occur 90 % of the time. We can
compress the message by dividing the message
into blocks (of two symbols each). Thus the possible
source symbols are 00, 01, 10 and 11.
We assume the source to have zero memory.
Then
Symbol 00 01 10 11
Probability 0.81 0.09 0.09 0.01
We call this the kth extension of the source, denoted by S^k (here k = 2).
Theorem: For each information source S (which has zero
memory),
H(S^k) = k H(S).
If the source has memory, then
H(S^k) < k H(S)
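Proof (for k = 2):
$$H(S^2) = -\sum_{i=1}^{n} \sum_{j=1}^{n} p_i p_j \log_2 (p_i p_j)$$
$$= -\sum_{i=1}^{n} \sum_{j=1}^{n} p_i p_j (\log_2 p_i + \log_2 p_j)$$
$$= -\sum_{i=1}^{n} p_i \log_2 p_i \sum_{j=1}^{n} p_j - \sum_{i=1}^{n} p_i \sum_{j=1}^{n} p_j \log_2 p_j$$
$$= H(S) + \sum_{i=1}^{n} p_i H(S) = 2 H(S)$$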
Let us now see whether we can compress more
by coding the extensions of a source rather than
the original source.
The Huffman code for this new information source is:
Symbol 00 01 10 11
Code 0 10 110 111
Lavg = 0.81 + 2 x 0.09 + 3 x 0.09 + 3 x 0.01 = 1.29 bits per pair of symbols
Therefore 1.29 / 2 = 0.645 bits/symbol are needed, against 1
bit/symbol.
H(S^2) = 0.9380 bits, so
the minimum possible average code word length if we
code 2 symbols at a time is 0.9380 / 2 = 0.469 bits/symbol.
We can compress more if we code 3 symbols at a time and
so on.
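The figures above can be reproduced, and the effect of longer blocks explored, with a short script. The sketch below assumes a zero-memory binary source with P(0) = 0.9 and builds a Huffman code for its kth extension; all function names are illustrative, and the Huffman routine is a standard heap-based construction rather than the lecture's implementation.

```python
import heapq
import math
from itertools import count, product


def extension(probs, k):
    """Block probabilities of the k-th extension of a zero-memory source."""
    return {"".join(block): math.prod(probs[s] for s in block)
            for block in product(probs, repeat=k)}


def entropy(probs):
    """Entropy in bits of a {symbol: probability} distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def huffman_lengths(probs):
    """Code word length of each symbol in a binary Huffman code."""
    tie = count()  # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [s]) for s, p in probs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p0, _, g0 = heapq.heappop(heap)
        p1, _, g1 = heapq.heappop(heap)
        for s in g0 + g1:
            lengths[s] += 1  # each merge adds one bit to the merged symbols
        heapq.heappush(heap, (p0 + p1, next(tie), g0 + g1))
    return lengths


def bits_per_source_symbol(probs, k):
    """Average Huffman code length of the k-th extension, per original symbol."""
    ext = extension(probs, k)
    lengths = huffman_lengths(ext)
    return sum(ext[s] * lengths[s] for s in ext) / k


SOURCE = {"0": 0.9, "1": 0.1}

if __name__ == "__main__":
    print(round(entropy(extension(SOURCE, 2)), 4))  # 0.938
    for k in (1, 2, 3):
        print(k, round(bits_per_source_symbol(SOURCE, k), 4))
```

For k = 1 and k = 2 this gives 1 and 0.645 bits/symbol, matching the values above; k = 3 gives a smaller value still, which remains above the entropy of about 0.469 bits/symbol.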
Example where coding of extensions of a source is useful:
In the English language, not all letters have the
same probability.
Which one is the most probable?
Ans: e
We can arrange the characters in decreasing order of
probability and achieve compression by an instantaneous
code.
However, if we try to code combinations of characters, the
entropy per symbol can be lessened even further.
Eg: We know that some combinations of letters occur more
frequently than others: it, in, ing as opposed to sz, pq etc.