# Coding


## Source Coding

• Change of alphabet
  – multi-level to binary
• Data compression
• Security
  – encryption
• Error detection and correction
## Hong Kong Garden Chinese Restaurant

    13  Peking Duck with Pancakes
    14  Sweet and Sour Pork with Fried Rice
    15  Chicken Chow Mein with Noodles

• Spelling out "Sweet and Sour Pork with Fried Rice":
  – 33 alphabetic symbols at 5 bits each = 165 binary symbols
  – an 8-bit binary code for the menu number would allow 256 choices
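The arithmetic above is easy to check; a minimal sketch in Python (the 27-symbol alphabet, 26 letters plus a space, is an assumption):

```python
import math

# 26 letters plus a space: 27 symbols (an assumed alphabet)
alphabet_size = 27
bits_per_symbol = math.ceil(math.log2(alphabet_size))
print(bits_per_symbol)                     # 5

message_symbols = 33                       # length of the dish name, per the slide
print(bits_per_symbol * message_symbols)   # 165 binary symbols

# Sending the menu number instead: one 8-bit codeword covers 2**8 dishes
print(2 ** 8)                              # 256 choices
```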
## An alphabet of 4 symbols

Example sequences: AAAABBCD, BAACDAAB

| Symbol | Pi  | I = log2(1/Pi), bits |
|--------|-----|----------------------|
| A      | 1/2 | 1                    |
| B      | 1/4 | 2                    |
| C      | 1/8 | 3                    |
| D      | 1/8 | 3                    |

• Note that ΣPi = 1
• Entropy:

    H = Σ_{i=1..n} Pi log2(1/Pi)
      = (1/2)log2 2 + (1/4)log2 4 + (1/8)log2 8 + (1/8)log2 8
      = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3)
      = 14/8 = 1.75 bits/symbol

• HMAX = 2 bits/symbol (equiprobable symbols)
• Information rate: R = H × symbol rate, bits/sec
  – If the source transmits symbols at 100 baud, what is the data rate?
    • 175 bits/sec
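The entropy figures above can be verified directly; a minimal sketch in Python:

```python
import math

# Symbol probabilities for the 4-symbol alphabet
p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
assert abs(sum(p.values()) - 1) < 1e-12     # check that the Pi sum to 1

# H = sum of Pi * log2(1/Pi)
H = sum(pi * math.log2(1 / pi) for pi in p.values())
print(H)              # 1.75 bits/symbol

H_max = math.log2(len(p))
print(H_max)          # 2.0 bits/symbol for equiprobable symbols

print(H * 100)        # 175.0 bits/sec at 100 symbols/sec (100 baud)
```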
## Straight Binary Coding

| Symbol | Pi  | Code 1 |
|--------|-----|--------|
| A      | 1/2 | 00     |
| B      | 1/4 | 01     |
| C      | 1/8 | 10     |
| D      | 1/8 | 11     |

Example bit stream:

    B  A  C  D
    01 00 10 11

• If the source transmits symbols at 100 baud, what is the transmission rate?
  – 200 bits/sec
  – binary digits, not information bits
## Straight Binary Coding

• Transmitted data: 2 bits/symbol
• Information content: 1.75 bits/symbol
• Coding efficiency:

    Efficiency = (Entropy of source) / (Average codeword length)
               = 1.75 / 2 = 87.5%

• If the source transmits symbols at 100 baud, what is the transmission rate?
  – 200 bits/sec (binary digits, not information bits)
  – Equivalently: (information rate) / (transmission rate) = 175/200 = 87.5%
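The efficiency calculation above can be reproduced in a few lines; a minimal sketch:

```python
import math

p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
length = {s: 2 for s in p}          # straight binary: 2 bits per symbol

H = sum(pi * math.log2(1 / pi) for pi in p.values())   # 1.75 bits/symbol
L = sum(p[s] * length[s] for s in p)                   # 2 bits/symbol
efficiency = H / L
print(efficiency)   # 0.875, i.e. 87.5%

# At 100 baud this is the same ratio: information rate 175 bits/sec
# over transmission rate 200 bits/sec.
```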
## How can the code efficiency be increased?

• Use shorter codewords for the more common symbols
• Efficient codes are
  – compact
  – use unequal-length codewords
• e.g. Morse code
| Letter | Probability | Morse Code |
|--------|-------------|------------|
| E      | 0.131       | .          |
| T      | 0.105       | -          |
| A      | 0.082       | .-         |
| O      | 0.080       | ---        |
| N      | 0.071       | -.         |
| R      | 0.068       | .-.        |
| I      | 0.063       | ..         |
| S      | 0.061       | ...        |
| H      | 0.053       | ....       |
| D      | 0.038       | -..        |
| L      | 0.034       | .-..       |
| F      | 0.029       | ..-.       |
| C      | 0.028       | -.-.       |
| M      | 0.025       | --         |
| U      | 0.025       | ..-        |
| G      | 0.020       | --.        |
| P      | 0.020       | .--.       |
| Y      | 0.020       | -.--       |
| W      | 0.015       | .--        |
| B      | 0.014       | -...       |
| V      | 0.009       | ...-       |
| K      | 0.004       | -.-        |
| X      | 0.002       | -..-       |
| J      | 0.001       | .---       |
| Q      | 0.001       | --.-       |
| Z      | 0.001       | --..       |
## Coding example

| Symbol i       | Pi  | Code 1 | Code 2 | Code 3 | Code 4 |
|----------------|-----|--------|--------|--------|--------|
| A              | 1/2 | 00     | 0      | 0      | 0      |
| B              | 1/4 | 01     | 1      | 10     | 10     |
| C              | 1/8 | 10     | 10     | 110    | 110    |
| D              | 1/8 | 11     | 11     | 1110   | 111    |
| Av. length     |     | 2      | 1.25   | 1.875  | 1.75   |
| Efficiency (%) |     | 87.5   | 140    | 93     | 100    |

• For example, coding 'BACD':
  – Code 1: 01001011
  – Code 2: 101011
  – Code 3: 1001101110
  – Code 4: 100110111
• Code 2 appears to be more than 100% efficient
  – What's the catch?
  – Try it: exchange 8 arbitrary symbol codes with a colleague using A, B, C, D
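The catch can be shown by brute force: count how many symbol sequences encode to the same bit string under each code (`parses` is an illustrative helper, not from the slides):

```python
def parses(bits, code):
    """Count the distinct symbol sequences whose encoding equals `bits`
    (brute-force recursive decode)."""
    if not bits:
        return 1
    return sum(parses(bits[len(word):], code)
               for word in code.values() if bits.startswith(word))

code2 = {"A": "0", "B": "1", "C": "10", "D": "11"}
code4 = {"A": "0", "B": "10", "C": "110", "D": "111"}

print(parses("101011", code2))     # 8 parses (BACD, CCD, BABABB, ...)
print(parses("100110111", code4))  # 1 parse: Code 4 is uniquely decodable
```

Code 2's bit stream for 'BACD' decodes eight different ways, which is why its apparent 140% efficiency is meaningless.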
## Calculation of the average length

(Symbol probabilities and Codes 1–4 as in the table above.)

• For a sequence with the stated probabilities, e.g. AAAABBCD:
  – Code 1: 16 bits
  – Code 2: 10 bits
  – Code 3: 15 bits
  – Code 4: 14 bits
• That is:

    Average length = Σi Pi li

    Average length (Code 4) = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75
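The formula Σ Pi li can be applied to all four codes at once; a minimal sketch:

```python
p = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
codes = {
    "Code 1": {"A": "00", "B": "01", "C": "10",  "D": "11"},
    "Code 2": {"A": "0",  "B": "1",  "C": "10",  "D": "11"},
    "Code 3": {"A": "0",  "B": "10", "C": "110", "D": "1110"},
    "Code 4": {"A": "0",  "B": "10", "C": "110", "D": "111"},
}
# Average length = sum over symbols of Pi * li
avg = {name: sum(p[s] * len(c[s]) for s in p) for name, c in codes.items()}
print(avg)   # Code 1: 2.0, Code 2: 1.25, Code 3: 1.875, Code 4: 1.75
```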
(Symbol probabilities and Codes 1–4 as in the table above.)

• Good codes should be
  – compact
    • average length less than that of any other instantaneous code for the same source and alphabet
  – comma-free
  – uniquely decipherable
    • free from prefixes: to be instantaneous, no codeword can be a prefix of any other
• Which code is compact?
  – Code 4
• Uniquely decipherable?
  – All except Code 2
• Code 4:
  – compact, comma-free, instantaneous, 100% efficient
  – this is a 'tree code'
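The instantaneous (prefix) condition is mechanical to check; a minimal sketch (`is_prefix_free` is my helper name):

```python
def is_prefix_free(code):
    """Instantaneous (prefix) condition: no codeword is a prefix of another."""
    words = list(code.values())
    return not any(a != b and b.startswith(a) for a in words for b in words)

codes = {
    "Code 1": {"A": "00", "B": "01", "C": "10",  "D": "11"},
    "Code 2": {"A": "0",  "B": "1",  "C": "10",  "D": "11"},
    "Code 3": {"A": "0",  "B": "10", "C": "110", "D": "1110"},
    "Code 4": {"A": "0",  "B": "10", "C": "110", "D": "111"},
}
for name, code in codes.items():
    print(name, is_prefix_free(code))
# Only Code 2 fails: '1' is a prefix of '10' and '11', so a decoder
# cannot tell the codewords apart as the bits arrive.
```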
## Code Tree

(Figure: binary code tree, with the highest-probability symbol nearest the root and the lowest-probability symbols deepest in the tree.)
## Straight Binary Coding

(Code 1 as above: A → 00, B → 01, C → 10, D → 11.)

• H = 1.75 bits/symbol
• Frequency of occurrence:

    A  A  A  A  B  B  C  D
    00 00 00 00 01 01 10 11

• P0 = 11/16; P1 = 5/16
• Entropy per bit, and hence per symbol:

    H = 2 × [ (11/16)log2(16/11) + (5/16)log2(16/5) ]
      = 2 × (0.371 + 0.524) = 1.79 bits/symbol

• Coding has increased the information in the signal!!!?
• Not possible, so what is happening?
  – Inspection of the frequency of occurrence shows
    • P0 = 11/16, but P(0|0) = 4/6
    • P1 = 5/16, but P(1|1) = 1/2
  – That is, we must use conditional probabilities:

    Hc = Σi Σj Pi P(j|i) log2(1/P(j|i))
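The apparent paradox can be reproduced numerically; a minimal sketch:

```python
import math

# Bit stream for AAAABBCD under Code 1 (00 00 00 00 01 01 10 11)
bits = "0000000001011011"
p0 = bits.count("0") / len(bits)    # 11/16
p1 = bits.count("1") / len(bits)    # 5/16

# First-order (memoryless) entropy per bit, times 2 bits per symbol
H_bit = p0 * math.log2(1 / p0) + p1 * math.log2(1 / p1)
print(2 * H_bit)    # ~1.79 bits/symbol, apparently more than H = 1.75

# The excess is illusory: successive bits are not independent
# (P(0|0) differs from P0), so the memoryless formula overstates
# the information; conditional probabilities must be used instead.
```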
## Huffman codes

• Example data:
  – ABHGCCGGBGFGDEFGGEGFGGHFGFFDDFFGGHHHEEFH

1. Arrange the symbols in descending order of probability
2. Allocate '0' and '1' to the least and second-least probable symbols
3. Combine the last two symbols into a group and add their probabilities
4. If necessary, re-order the table and repeat steps (1) to (3) until all the symbols are included
5. Read off the code (right to left) as the string of allocated bits

Symbol counts (out of 40): G-13, F-9, H-6, E-4, D-3, C-2, B-2, A-1. The successive merges are:

    A(1)+B(2) → A&B(3)
    A&B(3)+C(2) → A&B&C(5)
    D(3)+E(4) → D&E(7)
    A&B&C(5)+H(6) → A&B&C&H(11)
    D&E(7)+F(9) → D&E&F(16)
    A&B&C&H(11)+G(13) → A&B&C&H&G(24)
    A&B&C&H&G(24)+D&E&F(16) → 40

Resulting codewords:

| Symbol | Pi    | Code  |
|--------|-------|-------|
| A      | 1/40  | 10010 |
| B      | 2/40  | 10011 |
| C      | 2/40  | 1000  |
| D      | 3/40  | 000   |
| E      | 4/40  | 001   |
| F      | 9/40  | 01    |
| G      | 13/40 | 11    |
| H      | 6/40  | 101   |
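The construction above can be sketched with Python's `heapq`. Tie-breaks may merge different groups than the slide does, and so produce different codewords, but every Huffman code for a given source has the same average length (`huffman_lengths` is my helper name):

```python
import heapq

def huffman_lengths(probs):
    """Huffman's algorithm: repeatedly merge the two least probable
    groups; every symbol in a merged group gains one code bit."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, i2, g2 = heapq.heappop(heap)
        for s in g1 + g2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, g1 + g2))
    return lengths

probs = {"A": 1/40, "B": 2/40, "C": 2/40, "D": 3/40,
         "E": 4/40, "F": 9/40, "G": 13/40, "H": 6/40}
lengths = huffman_lengths(probs)
L = sum(probs[s] * lengths[s] for s in probs)
print(L)   # ~2.65 bits/symbol (106/40), as on the slide
```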
## Fano codes

• Example data:
  – ABHGCCGGBGFGDEFGGEGFGGHFGFFDDFFGGHHHEEFH

1. Divide the symbol table into two groups having, as nearly as possible, equal probability
2. Allocate '0' and '1' to each group
3. Subdivide each group into two equal-probability halves
4. Repeat (2) and (3) until no more division is possible
5. Read off the code as the string of allocated bits

Resulting codewords:

| Symbol | Pi    | Code |
|--------|-------|------|
| A      | 1/40  | 011  |
| B      | 2/40  | 1111 |
| C      | 2/40  | 1101 |
| D      | 3/40  | 1110 |
| E      | 4/40  | 1100 |
| F      | 9/40  | 10   |
| G      | 13/40 | 00   |
| H      | 6/40  | 010  |
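The splitting procedure can be sketched recursively. The ordering and tie-breaking below are my own assumptions, so the codewords differ from the slide's table (the slide reorders symbols to get an exact 20/20 first split); the resulting code is still instantaneous:

```python
def fano(symbols):
    """Recursive Shannon-Fano split of (symbol, weight) pairs into
    two groups of as-nearly-equal total weight as possible."""
    if len(symbols) <= 1:
        return {symbols[0][0]: ""} if symbols else {}
    total = sum(w for _, w in symbols)
    best, split = None, 1
    for k in range(1, len(symbols)):
        left = sum(w for _, w in symbols[:k])
        diff = abs(2 * left - total)      # imbalance at this split point
        if best is None or diff < best:
            best, split = diff, k
    code = {}
    for bit, half in (("0", symbols[:split]), ("1", symbols[split:])):
        for s, suffix in fano(half).items():
            code[s] = bit + suffix
    return code

weights = [("G", 13), ("F", 9), ("H", 6), ("E", 4),
           ("D", 3), ("C", 2), ("B", 2), ("A", 1)]  # descending order
code = fano(weights)
L = sum(w * len(code[s]) for s, w in weights) / 40
print(code)
print(L)   # average length in bits/symbol
```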
## Huffman and Fano coding

| Symbol     | p     | Huffman       | Fano           | ASCII |
|------------|-------|---------------|----------------|-------|
| A          | 1/40  | 10010         | 011            | 000   |
| B          | 2/40  | 10011         | 1111           | 001   |
| C          | 2/40  | 1000          | 1101           | 010   |
| D          | 3/40  | 000           | 1110           | 011   |
| E          | 4/40  | 001           | 1100           | 100   |
| F          | 9/40  | 01            | 10             | 101   |
| G          | 13/40 | 11            | 00             | 110   |
| H          | 6/40  | 101           | 010            | 111   |
| Av. length |       | 106/40 = 2.65 | 109/40 = 2.725 | 3     |

• Codes obtained: are they compact? Instantaneous?
• Average information per symbol:

    H = Σ_{i=1..N} pi log2(1/pi)
      = (1/40)log2 40 + (2/40)log2(40/2) + (2/40)log2(40/2) + (3/40)log2(40/3)
        + (4/40)log2(40/4) + (9/40)log2(40/9) + (13/40)log2(40/13) + (6/40)log2(40/6)
      = 2.599 bits/symbol

• Efficiencies (information rate / transmission rate):
  – ASCII: 2.599/3 = 86.6%
  – Huffman: 2.599/2.65 = 98.1%
  – Fano: 2.599/2.725 = 95.4%
• Redundancies (1 − efficiency):
  – ASCII: 100 − 86.6 = 13.4%
  – Huffman: 100 − 98.1 = 1.9%
  – Fano: 100 − 95.4 = 4.6%
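The entropy and efficiency figures above (using the fixed 3-bit code the slide labels ASCII) can be checked as follows; a minimal sketch:

```python
import math

# Symbol counts in the 40-symbol example data
counts = {"A": 1, "B": 2, "C": 2, "D": 3, "E": 4, "F": 9, "G": 13, "H": 6}
total = sum(counts.values())   # 40

H = sum((n / total) * math.log2(total / n) for n in counts.values())
print(round(H, 3))             # 2.599 bits/symbol

for name, L in [("ASCII", 3), ("Huffman", 106 / 40), ("Fano", 109 / 40)]:
    print(name, round(100 * H / L, 1), "% efficient")
```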
| Letter | Probability | Morse Code | ASCII    | Huffman    |
|--------|-------------|------------|----------|------------|
| E      | 0.131       | .          | 100 0101 | 101        |
| T      | 0.105       | -          | 101 0100 | 0010       |
| A      | 0.082       | .-         | 100 0001 | 01100      |
| O      | 0.080       | ---        | 100 1111 | 0110       |
| N      | 0.071       | -.         | 100 1110 | 1001       |
| R      | 0.068       | .-.        | 101 0010 | 1101       |
| I      | 0.063       | ..         | 100 1001 | 1000       |
| S      | 0.061       | ...        | 101 0011 | 1100       |
| H      | 0.053       | ....       | 100 1000 | 1110       |
| D      | 0.038       | -..        | 100 0100 | 01011      |
| L      | 0.034       | .-..       | 100 1100 | 01010      |
| F      | 0.029       | ..-.       | 100 0110 | 001100     |
| C      | 0.028       | -.-.       | 100 0011 | 11111      |
| M      | 0.025       | --         | 100 1101 | 001101     |
| U      | 0.025       | ..-        | 101 0101 | 11110      |
| G      | 0.020       | --.        | 100 0111 | 011101     |
| P      | 0.020       | .--.       | 101 0000 | 011110     |
| Y      | 0.020       | -.--       | 101 1001 | 001111     |
| W      | 0.015       | .--        | 101 0111 | 001110     |
| B      | 0.014       | -...       | 100 0010 | 0111111    |
| V      | 0.009       | ...-       | 101 0110 | 0111000    |
| K      | 0.004       | -.-        | 100 1011 | 01110010   |
| X      | 0.002       | -..-       | 101 1000 | 0111001100 |
| J      | 0.001       | .---       | 100 1010 | 0111001110 |
| Q      | 0.001       | --.-       | 101 0001 | 0111001101 |
| Z      | 0.001       | --..       | 101 1010 | 0111001111 |
## n'th extension coding

• If the symbol probabilities are powers of 2 (1/2, 1/4, ...)
  – Fano and Huffman coding yield a 100% efficient code
• Otherwise, code groups of n successive symbols
  – as n → ∞, efficiency → 100%
• Shannon's Source Coding Theorem:
  – If the source symbols are coded in groups of n, then the average length per symbol tends to the source entropy as n tends to infinity. That is:

    lim_{n→∞} Ln / n = H

  – where Ln is the average length of the codewords for the n-symbol groups.
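The convergence can be demonstrated on a small example. The skewed binary source below is my own illustration (not from the slides); Huffman-coding its n-th extensions drives the bits-per-source-symbol down toward the entropy:

```python
import heapq
from itertools import product
from math import log2, prod

def huffman_avg_length(probs):
    """Expected Huffman codeword length for a probability dict."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    depth = dict.fromkeys(probs, 0)
    while len(heap) > 1:
        p1, _, g1 = heapq.heappop(heap)
        p2, i2, g2 = heapq.heappop(heap)
        for s in g1 + g2:           # symbols in a merge gain one bit
            depth[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, g1 + g2))
    return sum(probs[s] * depth[s] for s in probs)

# A skewed binary source whose probabilities are not powers of two
p = {"0": 0.9, "1": 0.1}
H = sum(q * log2(1 / q) for q in p.values())   # ~0.469 bits/symbol

rates = []
for n in (1, 2, 3, 4):
    # n-th extension: one super-symbol per block of n source symbols
    ext = {"".join(t): prod(p[c] for c in t) for t in product(p, repeat=n)}
    rates.append(huffman_avg_length(ext) / n)
print(rates)   # bits per source symbol: falls from 1.0 toward H
```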
• Source coding can give data compression
• Data compression codes are
  – either lossless / reversible / noiseless
    • Huffman and Fano
    • e.g. WinZip
  – or lossy / irreversible / noisy
    • JPEG
    • MPEG
