# Slide 1 - VTU eLearning Center -


## REVISION OF INFORMATION THEORY
The word "Information" in Information Theory means the commodity produced by the source for transfer to the user.
The nature of this commodity or "intelligence" can be as varied as:
•   Electricity (voltage, current or power)
•   Coded words (Telegraphy)
•   Spoken words (Telephony)
•   Pictures (Facsimile Telegraphy, Television)
•   Music / Speech (Wireless or Radio)
•   Art
A communication system must, in general, consist of three essential components: a source, a channel and a user (the accompanying figure is not reproduced here).

When the communiqué is tangible or readily measurable, the problems encountered are not difficult to solve.
But when the communiqué is "intelligence" or "information", general familiarity with the problem cannot be assumed.
Q 1. Discuss the reasons for using an entropy-type formula for measuring information.
OR
Discuss the reasons for using a logarithmic formula for measuring information.

Entropy-type formula: Σ p log (1/p)

First, explain why the log (1/p) formula is used.

First step: establish the dependence of information on probabilities.

Second step: derive the formula for average information, or entropy.
Since the messages encountered in a communication system are statistically defined, and their most significant feature is "uncertainty" or "unpredictability", our first task is to find a measure for the information.
The source, for example, may transmit at random any one of a set of pre-specified messages; no specific knowledge of which message will be transmitted is available, but the probabilities of transmitting each message (or something to that effect) are known to us.
Evidently, if the behavior of the model were predictable, that is, if there were no uncertainty or ambiguity about the message transmitted over a communication system, there would be no real reason for the system (why, at all, should we scratch our brains?).
In order to understand the dependence of information on probabilities, and thereby arrive at a suitable measure, we shall consider the following situation.
Suppose you are planning a trip to New Delhi in the winter season.
Although you may be watching the daily weather report on TV, you may wish to know, in advance, the weather conditions that may prevail during the course of your trip, so you ring up the weather bureau. Suppose you receive the following forecast:
(i) The Sun will rise ................ (p = 1)
(ii) It will rain .................... (p = 1/3)
(iii) There will be a Tornado ........
Here, (i) contains no information, as we are reasonably sure in advance that (whether it is sunny, cloudy, rainy or foggy) the sun will rise in the east and set in the west; listening to this part of the forecast is a waste.
But the forecast of rain in (ii) does provide you information, because rain is not an everyday encounter (and it is indeed useful in arranging additional amenities like sweaters, jerkins, canvas shoes etc.). Tornadoes, in contrast, are relatively rare in India, and rarer still in New Delhi.
Tornadoes are, loosely speaking, vacuum vessels with diameters greater than 50 m and heights greater than 50 m, flying over 100 m above the earth and moving with a velocity of over 100 km/sec (a vague picture only, to help the student understand).
So anything that comes underneath will be sucked up and thrown out from a height of over 150 m.
If you happen to be under the tornado, what would be your fate!
Tornadoes are very common in places where volcanoes erupt (Japan, Philippines and Hawaii).
Since there are no volcanoes in India, a tornado is a very rare event indeed (particularly in New Delhi).
Thus the statement in (iii) will give you much more information, and may even lead to the cancellation of your trip.

From the above discussions, it must be clear now that the
“amount of information” would become more as the probability of
occurrence of the event becomes smaller.

Measure of information is an indication of the “freedom of
choice” exercised by the source in selecting a message.

If the source can freely choose from many different messages, the
user is highly uncertain as to which message will be selected for
transmission.
From the previous discussions we have, for the "amount of information" I(.) associated with an event:
(1) I(A) = Φ{p(A)}, where Φ{.} is to be determined.
(2) I(A) = Φ{p(A)} ≥ 0 for 0 ≤ p(A) ≤ 1.
(3) When we are sure about an event, no information is conveyed, i.e. I(A) = Φ{p(A)} = 0 when p(A) = 1.
(4) If p(A) < p(B), then I(A) > I(B), i.e. Φ{p(A)} > Φ{p(B)}.
Many functions satisfying the above relationships exist (in fact there are infinitely many solutions).
The final and deciding factor comes from the consideration of transmitting 'independent' messages.
Thus when message A is delivered, the user receives I(A) units of information.
If a second message B, which is independent of A, is also delivered simultaneously, the total information the user receives is [I(A) + I(B)] units.
Since P(C) = P(A·B) = P(A)·P(B) (the condition for independence of two events), it follows that:
(5) Φ{P(A·B)} = Φ{P(A)·P(B)} = Φ{P(A)} + Φ{P(B)}

There is one and only one function Φ{.} that satisfies all the five conditions listed above, viz. the logarithmic function.
Thus we arrive at the function Φ{.} = log_b {.}, where b is the logarithmic base.
Accordingly we arrive at the following definition:
Definition: Suppose A is an event with probability of occurrence p(A). If we are told that the event has occurred, then we say we have received

I(A) = log_b [1 / p(A)] units of information.
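As a quick numerical check, this definition can be coded directly (a minimal Python sketch; the function name `self_information` is ours, not from the slides):

```python
import math

def self_information(p, base=2):
    """I = log_b(1/p): the information conveyed by an event of probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must lie in (0, 1]")
    return math.log(1 / p, base)

print(self_information(1.0))    # 0.0  (a sure event conveys nothing)
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits (rarer events convey more)
```

Note how the five conditions hold: the value is non-negative, zero for a sure event, larger for less probable events, and additive over independent events because log turns products into sums.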
UNITS OF INFORMATION:

The flexibility available in selecting the unit of information lies only in the logarithmic base "b". In other words, the selection of "b" amounts to the choice of the "unit of information".

Thus the amount of information conveyed by one of two equally probable messages is:
I = log_b (1/p) = log_b (1/(1/2)) = log_b 2 = 1 unit, for b = 2,
as the logarithm of any number to its own base is always unity.
Hence the unit of information associated with the binary choice b = 2 is called 1 binary unit, or 1 bit.
To distinguish between the binary unit and
the binary digits (0, 1), we use the notation
“binits” to represent binary digits and
“bits” to represent binary units.

This notation will be followed only in Information Theory.

Similarly, if you are making a trinary choice (selecting 1 out of 3 equally probable messages), the logarithmic base will be 3 and the unit of information is called the "Triple".

For a quaternary choice, b = 4 and the unit is the "Quadruple".
For a decimal choice (selecting 1 out of 10 equally probable messages), b = 10 and the unit is the "Decit", also known as the "Hartley" after the stalwart who used this definition in his research work (R. V. L. Hartley, 1928).
Although we cannot give any such choice-based explanation for it, you are not barred or prevented from using natural logarithms. In that case I = ln (1/p), measured in Nepers or Nats.
Observe that the unit of information, unless otherwise specified, will always be "bits", with logarithmic base 2. This convention shall be carefully followed throughout.
Let S = {s1, s2, s3, …, sq} represent a discrete memoryless source with probabilities of occurrence P = {p1, p2, p3, …, pq}, where Σ pk = 1, k = 1, 2, …, q.
Suppose we consider a long sequence of n symbols produced by such a zero-memory source.
For example, the sequence {10110101010010111101010010} may be produced by a binary source S = {0, 1}.
This sequence has 12 zeros and 14 ones.
In this way, the sequence of n symbols produced by our source may be thought of as containing n1 symbols of type s1, n2 symbols of type s2, … and nq symbols of type sq.
The amount of information associated with each symbol of the source is I(sk) = log (1/pk), k = 1, 2, …, q.
The total amount of information conveyed by the sequence is therefore
IT = Σ(k=1 to q) I(sk)·nk bits.
Following high-school arithmetic, the average amount of information conveyed by each symbol will be
H = (1/n) Σ(k=1 to q) I(sk)·nk bits/symbol.
Identifying pk = (nk/n), the relative frequency of the symbol sk in the sequence, as its probability of selection (assuming n to be very large), and substituting for I(sk), we have

H(S) = Σ(k=1 to q) pk log (1/pk) bits/symbol.
This then gives the "average information" conveyed by the source.
The relationship expressed in this equation is exactly similar in form to the entropy formula used in thermodynamics or statistical mechanics. Hence, H(S) is usually referred to as the "entropy of the source".
Basically average amount of information and
entropy are not different from each other but we
are using the symbol H because of the similarity
of the formula to that of thermodynamic
entropy.
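The entropy formula above is easy to verify numerically (a short illustrative Python sketch, not part of the original slides):

```python
import math

def entropy(probs, base=2):
    """H(S) = sum of p_k * log_b(1/p_k) over the source symbols."""
    assert abs(sum(probs) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(p * math.log(1 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit/symbol (fair binary source)
print(entropy([0.9, 0.1]))   # ~0.469 bits/symbol (a biased source conveys less)
print(entropy([0.25] * 4))   # 2.0 bits/symbol (quaternary, equiprobable)
```

As the biased case shows, H(S) is maximized when all symbols are equally probable and shrinks as the source becomes more predictable.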
RELATIONSHIP BETWEEN VARIOUS UNITS OF INFORMATION:

Suppose log_b x = y; then x = b^y.
Similarly, if log_c x = z, then x = c^z.
Hence it follows that b^y = c^z.
Taking logarithms on both sides to the base a, we get log_a b^y = log_a c^z, or y log_a b = z log_a c, i.e.
log_b x · log_a b = log_c x · log_a c.
Let c = a; then log_b x · log_a b = log_a x (because log_a a = 1).

The above equation can be rearranged, after replacing x by 1/p, as
log_a (1/p) = (log_a b) · log_b (1/p).
This equation can now be interpreted as:
Information in a-ary units = (log_a b) × Information in b-ary units.
This relation is similar to:
Length in meters = (1/100) × Length in centimeters, or 1 meter = 100 cm.
Comparing, we have:
1 a-ary unit of information = 1/(log_a b) b-ary units of information.
a = 10, b = 2 → 1 Hartley = 1/(log10 2) = 3.3219 bits ≈ 3.322 bits
a = 3, b = 2 → 1 Triple = 1/(log3 2) = 1.58496 bits ≈ 1.585 bits
a = 4, b = 2 → 1 Quadruple = 1/(log4 2) = 2 bits
a = e, b = 2 → 1 Nat = 1/(loge 2) = 1.442695 bits ≈ 1.443 bits
To calculate logarithms to the base 2, we make use of the equation
log_b x · log_a b = log_a x.
With x = a, we find log_b a · log_a b = 1.
This means that log_b a and log_a b are reciprocals of each other. In other words:
log_b a = 1/(log_a b) and log_a b = 1/(log_b a).
Again using this reciprocal property, we write
log_a x = (log_b x) / (log_b a).
Using this, you can use your calculator as indicated below:
log_a x = (log10 x) / (log10 a) = (ln x) / (ln a)
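The change-of-base rule and the unit conversions above can be checked on a computer (illustrative Python; `log_base` is our own helper name):

```python
import math

def log_base(x, a):
    """log_a(x) via the change-of-base rule: log_a x = ln x / ln a."""
    return math.log(x) / math.log(a)

# 1 a-ary unit of information = 1 / log_a(2) bits:
print(1 / log_base(2, 10))      # ~3.3219 bits per Hartley
print(1 / log_base(2, 3))       # ~1.585 bits per "Triple"
print(1 / log_base(2, math.e))  # ~1.4427 bits per Nat
```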
1. A card is selected at random from a deck. You are told that it is from the red suit. How much information have you received? How much additional information is needed to completely specify the card?
Solution: There are 26 red cards out of 52, and accordingly p = 1/2.
I = log (1/(1/2)) = log 2 = 1 bit.
If the card is to be completely specified (say, the Ace of Hearts), then p = 1/52 and
I′ = log (1/(1/52)) = log 52 = log 13 + log 4 = (log 13 + 2) bits.
The additional information needed to completely specify the card is therefore
ΔI = I′ − I = (log 13 + 1) ≈ 4.70 bits.
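Both answers can be reproduced numerically (an illustrative Python check):

```python
import math

log2 = lambda x: math.log(x, 2)

I_red  = log2(1 / (26 / 52))   # told "red": p = 26/52 = 1/2
I_card = log2(1 / (1 / 52))    # told the exact card: p = 1/52
print(I_red)            # 1.0 bit
print(I_card - I_red)   # ~4.70 bits more needed to pin down the card
```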
2. Suppose a radio announcer has a vocabulary of 10,000 words and that she makes an announcement of 1,000 words, selecting these words from her vocabulary in a completely random fashion. What is the amount of information conveyed to a listener?
Solution: The announcement A is made up of 1,000 words, i.e. A = {w1, w2, w3, …, w1000}.
Therefore P(A) = P{w1, w2, w3, …, w1000}.
Further, as each word is selected at random, all the words are equally probable and
p(wi) = 1/10,000 = 10^−4, i = 1, 2, 3, …, 1000.
Therefore
P(A) = 10^−4 · 10^−4 ··· 10^−4 = 10^−4000,
and the amount of information conveyed by the announcement is
I1 = log 10^4000 = 4000 log 10 = 13,287.71 bits.
3. A single television picture may be thought of as an array of black, white and grey dots with roughly 500 rows and 600 columns. Suppose that each of these dots may take on any 1 of 10 distinguishable levels. What is the amount of information provided by one picture? Is the old adage "one picture is worth 1000 words" an exaggeration or an understatement of the fact?
Solution: There are a total of 500 × 600 = 300,000 dots or picture elements (pixels).
Since each pixel can take on 10 distinguishable levels of darkness, the total number of pictures possible is
10 × 10 × ··· × 10 = 10^300,000.
If each of these pictures is equally likely (i.e. random selection), the amount of information provided by one such picture is
I2 = log 10^300,000 = 300,000 log 10 = 996,578.43 bits.

Taking the ratio of the information so found, i.e. I2 : I1, we find that
I2/I1 = (300,000 log 10) / (4000 log 10) = 75, or I2 = 75 I1.

A real black-and-white TV picture will, in general, have at least 256 distinguishable darkness levels, whereas in this problem you are given only 10 levels.
Even in this worst case, observe that the information conveyed by one picture is 75 times that conveyed by an announcement of 1,000 words from an ideal person having a 10,000-word vocabulary.
This implies that the old adage "one picture is worth 1000 words" is not an exaggeration; if anything, it understates the real fact.
4. In a certain community, 25% of the girls are blondes and 75% of all blondes have blue eyes. Also, 50% of all girls have blue eyes. If you are told that a girl has blue eyes, how much information have you received? If you are further told that she is also a blonde, how much additional information do you receive?
Solution: Let us identify "blonde" by A and "blue eyes" by B. The given data then translate as:
P(A) = 0.25 = 1/4; P(B) = 0.5 = 1/2; P(B|A) = 0.75 = 3/4.
From this we have P(A·B) = P(A)·P(B|A) = (1/4)(3/4) = 3/16.
P(A|B) = P(A·B)/P(B) = (3/16)/(1/2) = 3/8
I(B) = log 2 = 1 bit; I(A|B) = log (8/3) = 3 − 1.585 = 1.415 bits
The additional information received on learning she is a blonde is I(A|B) = 1.415 bits.
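The arithmetic can be confirmed with a short script (illustrative Python; the variable names are ours):

```python
import math

p_A, p_B, p_B_given_A = 0.25, 0.5, 0.75   # blonde, blue eyes, blue|blonde
p_AB = p_A * p_B_given_A                  # joint probability: 3/16
p_A_given_B = p_AB / p_B                  # Bayes' rule: 3/8

I_B = math.log(1 / p_B, 2)                # info from "blue eyes"
I_A_given_B = math.log(1 / p_A_given_B, 2)
print(I_B)           # 1.0 bit
print(I_A_given_B)   # ~1.415 bits on then learning "blonde"
```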
5. A table has 3 identical drawers. In one drawer there are two gold coins. In the second there are two silver coins. The third drawer has one gold coin and one silver coin. A drawer is pulled at random and one coin is taken out of the drawer at random. If you are told this is a gold coin, how much information have you received? Further, if you are told that the other coin in the drawer is also gold, how much additional information do you receive?
Solution: P(G) = (1/3)(1) + (1/3)(0) + (1/3)(1/2) = 1/2, so
I(G) = log 2 = 1 bit.
Given a gold coin, the probability that it came from the two-gold drawer D1 is
P(D1|G) = (1/3)(1) / (1/2) = 2/3, so
I(D1|G) = log (3/2) = 0.585 bits.
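Since the conditional probability here is the classic "three boxes" result, a Monte Carlo simulation makes a useful sanity check (illustrative Python; the simulation setup is our own):

```python
import random

random.seed(0)
trials, gold_first, both_gold = 200_000, 0, 0
drawers = [("G", "G"), ("S", "S"), ("G", "S")]
for _ in range(trials):
    drawer = list(random.choice(drawers))   # pick a drawer at random
    random.shuffle(drawer)                  # pick a coin at random from it
    if drawer[0] == "G":
        gold_first += 1
        if drawer[1] == "G":
            both_gold += 1
print(gold_first / trials)     # ~0.5  -> I(G) = log2(2) = 1 bit
print(both_gold / gold_first)  # ~2/3  -> I(D1|G) = log2(3/2) ~ 0.585 bits
```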
6. The probability that a student passes a certain examination is 0.8, given that he has studied. The probability that he passes the examination without studying is 0.3. Assume that the probability of the student studying for the exam is 0.6 (a lazy student indeed). If you are told that the student has passed the examination, what is the amount of information you have received? What is the amount of additional information received if you are then told that he had studied?
Solution:
A = student studies for the exam: P(A) = 0.6; Ā = he does not study: P(Ā) = 0.4.
B = he passes the exam: P(B|A) = 0.8, P(B|Ā) = 0.3.
P(B) = 0.8 × 0.6 + 0.3 × 0.4 = 0.48 + 0.12 = 0.6, so I(B) = log (1/0.6) = 0.737 bits.
P(A|B) = (0.8 × 0.6)/0.6 = 0.8, so I(A|B) = log (1/0.8) = 0.322 bits.
7. Show that the entropy of the following probability distribution is {2 − (1/2)^(n−2)}:

x:        x1    x2    x3   …   xi      …   x(n−1)       xn
P(X=x):   1/2   1/4   1/8  …   1/2^i   …   1/2^(n−1)    1/2^(n−1)

You are given pi = 1/2^i for i = 1, 2, …, n−1, and pn = 1/2^(n−1).
You can verify that Σ(i=1 to n) pi = 1.
n 1                                                   n 1
n
1                    1                                 i       n1
H   pi .log 1            2 i .log 2    i
       n 1
.log 2   n 1
    2 i  2 n 1            ..... log2 2  1
as
pi                               2
i 1                   i 1                                                   i 1
1 2    3   4             n 1 n 1                                                     n1          2( n  1 )

   2  3  4  ......... n1   n1                                     and                     
2 2   2   2              2  2                                                          2 n 1          2n
1     1    2    3    4              n 2 n1 n1
 H   2  3  4  5  .....  n 1  n   n
2    2    2    2    2               2       2  2
1    1 1 1 1 1                       1 n1 (n1)
 H  H    2  3  4  5  .........  n1  n   n
2    2 2 2 2 2                     2      2    2
1    1 1      1    1    1          1 

 H    2  3  4  5  ........ n 1 
2    2 2     2    2    2         2     
    1 1     1    1    1           1 

  1   2  3  4  5  ........ n 1   1
    2 2    2    2    2          2    
1  ( 1 / 2 )n                                           n 1    1  an
                                                      
 1 U sin g ( 1  a  a  a  ......... a
2   3
)
1(1 / 2 )                                                       1a
 1  ( 1 / 2 )n 1
1
Thus H  2 
2 n 2
8. Calculate the entropy rate of a conventional telegraph source with the dash twice as long as the dot and half as probable. The dot lasts for 0.2 ms and the same interval exists for the pause between symbols.
From the given data we have p_dot + p_dash = 1 and p_dash = (1/2) p_dot
→ p_dot = 2/3 and p_dash = 1/3

τ_dash = 2 τ_dot; τ_dot = τ_pause = 0.2 ms, so τ_dash = 0.4 ms
→ τ_avg = (2/3)(0.2) + (1/3)(0.4) + 0.2 = (1.4/3) ms
→ r_s = 1/τ_avg = (3000/1.4) symbols/sec
H = (2/3) log (3/2) + (1/3) log 3 ≈ 0.92 bits/symbol
R = r_s · H = (3000/1.4) × 0.92 ≈ 1971.43 bits/sec
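The same computation in code (illustrative Python; exact values rather than the rounded H = 0.92 are used, so R comes out near 1968 bits/sec):

```python
import math

p_dot, p_dash = 2 / 3, 1 / 3
t_dot, t_dash, t_pause = 0.2e-3, 0.4e-3, 0.2e-3   # seconds

H = p_dot * math.log(1 / p_dot, 2) + p_dash * math.log(1 / p_dash, 2)
t_avg = p_dot * t_dot + p_dash * t_dash + t_pause  # a pause follows every symbol
r_s = 1 / t_avg                                    # symbols per second

print(H)        # ~0.918 bits/symbol
print(r_s * H)  # ~1968 bits/sec
```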
9. A certain data source has eight symbols that
are produced in blocks of three at a rate of
1000 blocks/sec. The first symbol in each block
is always the same (presumably for synchronization). The remaining
two are filled by any of the eight symbols with equal probability.
What is the entropy rate?
Since the free symbols are all equally probable, H = H_max = log 8 = 3 bits/sym.
Since symbols are produced in blocks of three, the net entropy per block is
H_T = H1 + H2 + H3.
H1 = 0, because the first symbol is always the same (a sure event), used for synchronization.
H2 = H3 = 3 bits/sym, as the second and third positions may be filled with any of the eight symbols with equal probability.
Hence H_T = 0 + 3 + 3 = 6 bits/block.
Accordingly, R = r_s · H_T = 1000 × 6 = 6000 bits/sec.
10. In facsimile transmission of pictures, there are about 2.25 million picture elements per frame. For good reproduction, 12 brightness levels are necessary. Assume that all these levels are equally probable to occur. Find the rate of information transmission if one picture is to be transmitted every three minutes.
Number of pictures possible = 12^(2.25 × 10^6)
H = log 12^(2.25 × 10^6) = 2.25 × 10^6 log 12 = 8.06625 × 10^6 bits/picture

r_s = 1/(3 × 60) = 1/180 pictures per sec

R = r_s · H = 44,812.5 bits/sec
11. In a certain 625-line television system the picture is scanned 25 times per second and the aspect ratio is 4/3. If at any point in the picture the eye can perceive eight gradations of light intensity, determine the rate of transmission of information of the system, assuming that the horizontal and vertical resolutions are equal and that the whole of the line scan is used for the picture waveform.
Layman's knowledge: there are 625 horizontal lines, and the aspect ratio gives the number of picture elements per line as 625 × (4/3).
No. of pixels = 625 × 625 × (4/3) = (625)^2 (4/3).
H = log 8^((625)^2 (4/3)) = (4/3)(625)^2 log 8 = 4 × (625)^2 bits/picture
R = 25 H = 39.0625 Mbps
12. Consider a second-order Markov source with binary alphabet S = {0, 1}. The states are identified as A = {00}, B = {01}, C = {10} and D = {11}.
The transition probabilities are:
P(0|00) = 0.6, P(1|00) = 0.4, P(1|11) = 0.4, P(0|11) = 0.6,
P(0|01) = 0.2, P(1|01) = 0.8, P(0|10) = 0.2 and P(1|10) = 0.8.
Compute (i) the stationary distribution and (ii) the entropy of the source.
P(A)=0.6P(A)+0.2P(C);
P(B)=0.4P(A)+0.8P(C);
P(C)=0.2P(B)+0.6P(D)
P(D)=0.8P(B)+0.4P(D)
With P(A) + P(B) + P(C) + P(D) = 1 we get
P(A) = 0.5 P(C); P(B) = P(C); P(D) = (4/3) P(C)
(0.5 + 1 + 1 + 4/3) P(C) = 1, or P(C) = 6/23
P(A) = 3/23; P(B) = 6/23; P(C) = 6/23 and P(D) = 8/23.
These are the required stationary probabilities.

H(A)= 0.6log(1/0.6)+0.4log(1/0.4)=0.971 bits/sym
H(B)=0.722 bits/sym
H(C)=H(B) and H(D) = H(A)

H(S) = P(A)H(A) + P(B)H(B) + P(C)H(C) + P(D)H(D) ≈ 0.841 bits/sym
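The stationary distribution and source entropy can be cross-checked by iterating the chain to its fixed point (illustrative Python; the transition table simply encodes the probabilities given above):

```python
import math

# States A=00, B=01, C=10, D=11; inner dicts give next-state probabilities.
T = {"A": {"A": 0.6, "B": 0.4},
     "B": {"C": 0.2, "D": 0.8},
     "C": {"A": 0.2, "B": 0.8},
     "D": {"C": 0.6, "D": 0.4}}

pi = {s: 0.25 for s in T}            # start uniform, iterate to the fixed point
for _ in range(1000):
    new = {s: 0.0 for s in T}
    for s, row in T.items():
        for t, p in row.items():
            new[t] += pi[s] * p
    pi = new
print({s: round(pi[s], 4) for s in "ABCD"})  # {A: 0.1304, B: 0.2609, C: 0.2609, D: 0.3478}

h = lambda row: sum(p * math.log(1 / p, 2) for p in row.values())
H = sum(pi[s] * h(T[s]) for s in T)
print(round(H, 3))                           # 0.841 bits/symbol
```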
13. Choose the generator matrix for a (5, 2) linear block code with the objective of maximizing d_min. Construct the standard array for the same.
First choose the H matrix so that no column is zero and no two columns are equal (which guarantees d_min ≥ 3). One such choice is:

H^T = [ 1 1 1 ]
      [ 1 1 0 ]
      [ 1 0 0 ]
      [ 0 1 0 ]
      [ 0 0 1 ]

which gives the systematic pair

G = [ 1 0 1 1 1 ]        H = [ 1 1 1 0 0 ]
    [ 0 1 1 1 0 ]            [ 1 1 0 1 0 ]
                             [ 1 0 0 0 1 ]

The codewords are 00000, 01110, 10111 and 11001, so d_min = 3.
Syndrome      Coset
0 0 0         00 000   01 110   10 111   11 001
0 0 1         00 001   01 111   10 110   11 000
0 1 0         00 010   01 100   10 101   11 011
1 0 0         00 100   01 010   10 011   11 101
1 1 0         01 000   00 110   11 111   10 001
1 1 1         10 000   11 110   00 111   01 001
0 1 1         10 100   11 010   00 011   01 101
1 0 1         10 010   11 100   00 101   01 011
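The codewords, minimum distance and syndromes can be checked mechanically (illustrative Python over GF(2); matrices as in the construction above):

```python
from itertools import product

G = [[1, 0, 1, 1, 1],
     [0, 1, 1, 1, 0]]
H = [[1, 1, 1, 0, 0],
     [1, 1, 0, 1, 0],
     [1, 0, 0, 0, 1]]

def encode(m):
    """Codeword for the 2-bit message m, computed as m·G over GF(2)."""
    return [sum(m[i] * G[i][j] for i in range(2)) % 2 for j in range(5)]

def syndrome(r):
    """Syndrome H·r^T of a received 5-bit word r."""
    return tuple(sum(H[i][j] * r[j] for j in range(5)) % 2 for i in range(3))

codewords = [encode(list(m)) for m in product([0, 1], repeat=2)]
print(codewords)                 # [[0,0,0,0,0], [0,1,1,1,0], [1,0,1,1,1], [1,1,0,0,1]]
print(min(sum(c) for c in codewords if any(c)))  # 3 (d_min of a linear code)
print(syndrome([0, 1, 1, 0, 0])) # (0, 1, 0): matches the coset leader 00010
```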
14. Determine which, if any, of the following polynomials can generate a cyclic code with code word length n ≤ 7. Find the (n, k) values of any such codes that can be generated.
(a) 1 + X^3 + X^4        (d) 1 + X + X^2 + X^4
(b) 1 + X^2 + X^4        (e) 1 + X^3 + X^5
(c) 1 + X + X^3 + X^4
A generator polynomial of degree (n − k) must divide X^n + 1. Here the degree is 4 for (a) through (d) and 5 for (e), so with n_max = 7 the candidate lengths are n = 5, 6 and 7. Factoring over GF(2):
(1 + X^7) = (1 + X + X^2 + X^4)(1 + X + X^3) = (1 + X^2 + X^3 + X^4)(1 + X^2 + X^3)
(1 + X^6) = (1 + X)(1 + X)(1 + X^2 + X^4) = (1 + X + X^3 + X^4)(1 + X + X^2)
(1 + X^5) = (1 + X)(1 + X + X^2 + X^3 + X^4)
Comparing factors: (b) and (c) divide 1 + X^6 and generate (6, 2) codes, and (d) divides 1 + X^7 and generates a (7, 3) code. Polynomials (a) and (e) divide none of 1 + X^5, 1 + X^6 or 1 + X^7, and hence cannot generate a cyclic code with n ≤ 7.
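Divisibility of X^n + 1 by each candidate polynomial can be tested by GF(2) long division (illustrative Python; coefficient lists are ordered from the constant term up):

```python
def gf2_divides(g, n):
    """True if polynomial g (coefficients, low degree first) divides X^n + 1 over GF(2)."""
    r = [1] + [0] * (n - 1) + [1]            # remainder starts as X^n + 1
    deg_g = len(g) - 1
    while len(r) - 1 >= deg_g and any(r):
        shift = len(r) - 1 - deg_g           # align g with the leading term of r
        for i, c in enumerate(g):
            r[shift + i] ^= c                # subtract g * X^shift (XOR in GF(2))
        while r and r[-1] == 0:
            r.pop()                          # drop cancelled leading terms
    return not any(r)

polys = {"a": [1, 0, 0, 1, 1], "b": [1, 0, 1, 0, 1], "c": [1, 1, 0, 1, 1],
         "d": [1, 1, 1, 0, 1], "e": [1, 0, 0, 1, 0, 1]}
for name, g in polys.items():
    ns = [n for n in range(len(g), 8) if gf2_divides(g, n)]
    print(name, ns)   # a [], b [6], c [6], d [7], e []
```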
