CODING THEORY

A PRESENTATION ON CODING THEORY
Department of Computer Science
College of Natural Sciences (COLNAS)
University of Agriculture, Abeokuta (UNAAB)

INFORMATION AND COMMUNICATION THEORY
BY: OBI KENNETH ABANG, COURTESY OF 2010 SETS

TABLE OF CONTENTS

1.0 Coding theory
1.1 Data compression (or source coding)
1.1.2 Principle of source coding
1.2 Error correction (or channel coding)
2.0 Arithmetic coding
2.1 Arithmetic coding and elementary number theory
2.1.1 Example
2.2 Theoretical limit of compressed message
2.3 Using probabilities instead of frequencies
2.4 Implementation details for the probability concept
2.4.1 Defining a model
2.4.2 A simplified example
2.4.3 Encoding and decoding
2.5 Precision and renormalization
2.6 Connections between arithmetic coding and Huffman coding
2.7 US patents on arithmetic coding
2.8 Benchmarks and other technical characteristics
3.0 Huffman coding
3.1 History
3.2 Problem definition
3.2.1 Basic technique
3.2.2 Example
3.3 Main properties
3.4 Variations
3.4.1 n-ary Huffman coding
3.4.2 Adaptive Huffman coding
3.4.3 Huffman template algorithm
3.4.4 Length-limited Huffman coding
3.4.5 Huffman coding with unequal letter costs
3.4.6 The canonical Huffman code
3.4.7 Model reconstruction
3.5 Applications
References

1.0 Coding theory

Coding theory is studied by various scientific disciplines, such as information theory, electrical engineering, mathematics, and computer science, for the purpose of designing efficient and reliable data transmission methods. This typically involves the removal of redundancy and the correction (or detection) of errors in the transmitted data. It also includes the study of the properties of codes and their fitness for a specific application. Thus, there are essentially two aspects to coding theory:

1. Data compression (or source coding)
2. Error correction (or channel coding)

1.1 Data compression (or source coding)

This aspect deals with the properties of codes and with their fitness for a specific application. Source encoding attempts to compress the data from a source in order to transmit it more efficiently. The practice is found every day on the Internet, where the common "zip" data compression is used to reduce the network load and make files smaller. The second aspect, channel encoding, adds extra data bits to make the transmission of data more robust to disturbances present on the transmission channel. The ordinary user may not be aware of the many applications that use channel coding. A typical music CD uses the Reed-Solomon code to correct for scratches and dust; in this application the transmission channel is the CD itself. Cell phones also use coding techniques to correct for the fading and noise of high-frequency radio transmission. Data modems, telephone transmissions, and NASA all employ channel coding techniques to get the bits through, for example turbo codes and LDPC codes.

Different coding methods include:

1. Arithmetic coding
2. Huffman coding
3. Range coding
4. Cyclic codes
5. Hamming codes

1.1.2 Principle of source coding

The entropy of a source is the measure of its information. Basically, source codes try to reduce the redundancy present in the source and represent the source with fewer bits that carry more information. Data compression which explicitly tries to minimize the average length of messages according to a particular assumed probability model is called entropy encoding. The various techniques used by source coding schemes try to approach the entropy limit of the source: C(x) ≥ H(x), where H(x) is the entropy of the source (its bit rate) and C(x) is the bit rate after compression. In particular, no source coding scheme can do better than the entropy of the source.
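To make the entropy limit concrete, here is a minimal Python sketch (an illustration added to this text, not part of the original presentation) that computes H(x) from a table of symbol frequencies. The frequencies used are those of the message DABDDB analyzed in section 2.1:

    import math

    def entropy_bits(frequencies):
        # Shannon entropy H(x), in bits per symbol, of a frequency table.
        total = sum(frequencies.values())
        return -sum((f / total) * math.log2(f / total)
                    for f in frequencies.values() if f > 0)

    h = entropy_bits({'A': 1, 'B': 2, 'D': 3})
    print(h)       # about 1.459 bits per symbol
    print(6 * h)   # about 8.75 bits for the whole 6-symbol message

No lossless code can average fewer bits per symbol than this H(x), which is why the 8.75-bit figure reappears as the theoretical limit in the arithmetic coding example of section 2.1.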
1.2 Error correction (or channel coding)

The aim of channel coding theory is to find codes which transmit quickly, contain many valid code words, and can correct or at least detect many errors. While these goals are not mutually exclusive, performance among them is a trade-off, so different codes are optimal for different applications. The needed properties of a code depend mainly on the probability of errors happening during transmission. In a typical CD, the impairment is mainly dust or scratches, so codes are used in an interleaved manner and the data is spread out over the disk.

Although not a very good code, a simple repeat code can serve as an understandable example. Suppose we take a block of data bits (representing sound) and send it three times. At the receiver we examine the three repetitions bit by bit and take a majority vote. The twist is that we don't merely send the bits in order: we interleave them. The block of data bits is first divided into 4 smaller blocks; then we cycle through the sub-blocks, sending one bit from the first, then one from the second, and so on. This is done three times to spread the data out over the surface of the disk. In the context of the simple repeat code this may not appear effective; however, there are more powerful codes known which are very effective at correcting the "burst" error of a scratch or a dust spot when this interleaving technique is used. A sketch of the repeat-and-interleave idea follows this section.

Other codes are more appropriate for different applications. Deep-space communications are limited by the thermal noise of the receiver, which is of a continuous rather than a bursty nature. Likewise, narrowband modems are limited by the noise present in the telephone network, which is also better modeled as a continuous disturbance. Cell phones are subject to rapid fading: the high frequencies used can cause rapid fading of the signal even if the receiver is moved a few inches. Again, there is a class of channel codes designed to combat fading. Similar trade-offs appear throughout the coding techniques discussed below.
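Here is a small Python sketch of the repeat-and-interleave scheme just described (an illustration added to this text; the function names are invented). It sends each 4-way-interleaved block three times and recovers the data by majority vote, even after a burst of two adjacent channel errors:

    def repeat3_interleaved(bits, blocks=4):
        # Divide into sub-blocks, send one bit from each in turn (interleave),
        # then transmit the whole interleaved block three times.
        size = len(bits) // blocks
        subs = [bits[i * size:(i + 1) * size] for i in range(blocks)]
        stream = [sub[j] for j in range(size) for sub in subs]
        return stream * 3

    def majority_decode(received, blocks=4):
        third = len(received) // 3
        copies = [received[i * third:(i + 1) * third] for i in range(3)]
        voted = [1 if sum(c[j] for c in copies) >= 2 else 0 for j in range(third)]
        # De-interleave: transmitted position j*blocks + i came from sub-block i.
        subs = [voted[i::blocks] for i in range(blocks)]
        return [b for sub in subs for b in sub]

    data = [1, 0, 1, 1, 0, 0, 1, 0]
    tx = repeat3_interleaved(data)
    tx[0] ^= 1; tx[1] ^= 1               # a burst error corrupts one copy
    assert majority_decode(tx) == data   # the vote still recovers the data

Because the burst falls on adjacent transmitted bits, it damages bits from different sub-blocks in only one of the three copies, so the bit-by-bit vote still wins.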
2.0 Arithmetic coding

Arithmetic coding is a form of variable-length entropy encoding (entropy encoding being the process of representing information in the most compact form) used in lossless data compression. When we consider all the different entropy-coding methods and their possible applications, arithmetic coding stands out in terms of elegance, effectiveness, and versatility, since it is able to work efficiently in the largest number of circumstances and for the widest range of purposes.

Features

Among its most desirable features are the following:

1. When applied to independent and identically distributed (i.i.d.) sources, the compression of each symbol is provably optimal.
2. It is effective in a wide range of situations and compression ratios. The same arithmetic coding implementation can effectively code all the diverse data created by different processes, such as modeling parameters, transform coefficients, and signaling.
3. It simplifies automatic modeling of complex sources, yielding near-optimal or significantly improved compression for sources that are not i.i.d.

Advantages

1. Its main process is arithmetic, which is supported with ever-increasing efficiency by all general-purpose and digital signal processors (CPUs, DSPs).
2. It is suited for use as a "compression black box" by those who are not coding experts or do not want to implement the coding algorithm themselves.

Disadvantages

Even with all these advantages, arithmetic coding is not as popular and well understood as other methods. Certain practical problems held back its adoption:

1. The complexity of arithmetic operations was excessive for coding applications.
2. Patents covered the most efficient implementations; royalties and the fear of patent infringement discouraged the use of arithmetic coding in commercial products.
3. Efficient implementations were difficult to understand.

Solutions to the disadvantages

1. First, the relative efficiency of computer arithmetic improved dramatically, and new techniques avoid the most expensive operations.
2. Second, some of the patents have expired (e.g., [11, 16]) or became obsolete.
3. Finally, we no longer need to worry so much about the complexity-reduction details that obscure the inherent simplicity of the method. Current computational resources allow us to implement simple, efficient, and royalty-free arithmetic coding.

When a string is converted to arithmetic encoding, frequently used characters are stored with fewer bits and not-so-frequently occurring characters are stored with more bits, resulting in fewer bits used in total. Arithmetic coding differs from other forms of entropy encoding, such as Huffman coding, in that rather than separating the input into component symbols and replacing each with a code, arithmetic coding encodes the entire message into a single number, a fraction n where 0.0 ≤ n < 1.0.

2.1 Arithmetic coding and elementary number theory

Arithmetic coding may be examined from the perspective of number theory: it can be interpreted as a generalized change of radix. The best way to introduce the concept is to consider an elementary example. We may look at any sequence of symbols, such as DABDDB, as a number in a certain base, presuming that the involved symbols form an ordered set and each symbol in the ordered set denotes a sequential integer: A = 0, B = 1, C = 2, D = 3, and so on. A table of frequencies and cumulative frequencies for this message looks like the following:

Symbol   Frequency of occurrence   Cumulative frequency
A        1                         1
B        2                         3
D        3                         6

Cumulative frequency: the total of a frequency and all frequencies below it in a frequency distribution; it is the "running total" of frequencies. So it is 1 for A, 3 for B, and 6 for D.

In a positional numeral system the radix, or base, is numerically equal to the number of different symbols used to express the number. For example, in the decimal system the number of symbols is 10, namely 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. The radix is used to express any finite integer as a presumed multiplier in polynomial form.
For example, the number 457 is actually 4*10^2 + 5*10^1 + 7*10^0, where base 10 is presumed but not shown explicitly. When speaking about a different radix we would normally introduce a different set of symbols, but for convenience we can simply use a subset of the familiar decimal digits, such as 0, 1, 2, 3, 4, and 5 for radix 6. If we choose radix 6, equal to the length of the message, and convert the expression DABDDB into a decimal number, we first map the letters to the digits 301331 and then obtain

6^5*3 + 6^4*0 + 6^3*1 + 6^2*3 + 6^1*3 + 6^0*1 = 23671.

The result 23671 has a length of 15 bits, which does not bring the message close to the theoretical limit computed via entropy, which is near 9 bits:

[-3/6 * log2(3/6) - 2/6 * log2(2/6) - 1/6 * log2(1/6)] * 6 = 8.75 bits.

To bring the encoding in line with information theory we need to slightly generalize the classic formula for changing the radix: we compute LOW and HIGH limits and choose a convenient number between them. For the computation of the LOW limit we multiply each term in the above expression by the product of the frequencies of all previously occurring symbols, turning it into the following expression:

LOW = 6^5*3 + 3*[6^4*0 + 1*[6^3*1 + 2*[6^2*3 + 3*[6^1*3 + 3*[6^0*1]]]]] = 25002.

The HIGH limit is the LOW limit plus the product of all frequencies:

HIGH = LOW + 3*1*2*3*3*2 = 25002 + 108 = 25110.

Now we can choose any number from the semi-closed interval [LOW, HIGH) to represent the message, and it is convenient to take the number with the longest possible trail of zeros, 25100. In the case of a long message this trail of zeros will be much longer and can be either dropped or represented as an exponent. The number 251, obtained after truncating the zeros, has a length of 8 bits, which is even less than the theoretical limit.

To represent the computation of the LOW limit in a simple, easy-to-remember format, we can use a table in which each row contains the factors of one term in the formula above. The part that distinguishes this computation from the classical change of base is clearly visible: it is column "Part 2", containing the products of the frequencies of all previously occurring symbols.

Symbol   Part 1    Part 2        Total
D        6^5 * 3                 23328
A        6^4 * 0   3             0
B        6^3 * 1   3*1           648
D        6^2 * 3   3*1*2         648
D        6^1 * 3   3*1*2*3       324
B        6^0 * 1   3*1*2*3*3     54
                                 LOW = 25002

To complete the topic we show how to convert the number 25100 back to the original message. The reverse process can also be shown in a table. It has two logical steps: identification of the symbol, and subtraction of the corresponding term from the result.

Remainder   Identification      Identified symbol   Corrected remainder
25100       25100 / 6^5 = 3     D                   (25100 - 6^5*3) / 3 = 590
590         590 / 6^4 = 0       A                   (590 - 6^4*0) / 1 = 590
590         590 / 6^3 = 2       B                   (590 - 6^3*1) / 2 = 187
187         187 / 6^2 = 5       D                   (187 - 6^2*3) / 3 = 26
26          26 / 6^1 = 4        D                   (26 - 6^1*3) / 3 = 2
2           2 / 6^0 = 2         B

In identification we divide the remainder by the corresponding power of 6, discarding the fractional part of the division. The result is then matched against the cumulative intervals and the appropriate symbol is selected from a lookup table. Once the symbol is identified, the remainder is corrected. The process continues for the known length of the message, or while the remaining result is positive. As we can see, the only difference compared to the classical formula is that the identified symbol is not a sequential integer but the integer associated with an interval: A is always 0, B is either 1 or 2, and D is any of 3, 4, and 5. This is in exact accordance with the intervals determined by the frequencies. When all intervals are equal to 1 we have the special case of the classic base change, and the computational part is the same.
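The two tables translate directly into code. Here is a minimal Python sketch (added for illustration; the helper names are invented) that computes the LOW/HIGH limits and reverses them:

    def limits(message, freq):
        # LOW: a classic change of radix, with each term additionally
        # multiplied by the frequencies of all previous symbols.
        cum, running = {}, 0
        for s in sorted(freq):
            cum[s] = running          # cumulative frequency below s
            running += freq[s]
        n = len(message)
        low, prod = 0, 1
        for i, s in enumerate(message):
            low += n ** (n - 1 - i) * cum[s] * prod
            prod *= freq[s]
        return low, low + prod        # the interval [LOW, HIGH)

    def restore(value, freq, n):
        cum, running = {}, 0
        for s in sorted(freq):
            cum[s] = running
            running += freq[s]
        out = []
        for i in range(n):
            power = n ** (n - 1 - i)
            digit = value // power                       # identification
            s = max(s for s in sorted(freq) if cum[s] <= digit)
            out.append(s)
            value = (value - power * cum[s]) // freq[s]  # correction
        return ''.join(out)

    freq = {'A': 1, 'B': 2, 'D': 3}
    print(limits('DABDDB', freq))    # (25002, 25110)
    print(restore(25100, freq, 6))   # DABDDB

Note the max(...) line: it picks the symbol whose interval [cum, cum + freq) contains the digit, which works here because the alphabetical order of A, B, D matches the order of their intervals.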
The generic formula for the computation of the LOW limit for a message of n symbols may be expressed as

LOW = Σ_{i=1..n} [ n^(n-i) * C_i * Π_{k=1..i-1} f_k ]

and the HIGH limit is computed as

HIGH = LOW + Π_{k=1..n} f_k,

where C_i is the cumulative frequency below the i-th symbol, the f_k are the frequencies of occurrence, and the indexes denote the positions of the symbols in the message. In case all frequencies f_k are equal to 1, the formulas turn into the special case used for expressing a number in a different base.

2.1.1 Example

Suppose we have a message that only contains the characters A, B, and C, with the following frequencies, expressed as fractions:

A: 0.5   B: 0.2   C: 0.3

To show how arithmetic compression works, we first set up a table listing the characters with their probabilities, along with the cumulative sum of those probabilities. The cumulative sums define "intervals", ranging from the bottom value up to, but not including, the top value. The order in which characters are listed in the table is not important, except that both the coder and the decoder have to know what the order is.

letter  probability  interval
________________________________
C:      0.3          0.0 : 0.3
B:      0.2          0.3 : 0.5
A:      0.5          0.5 : 1.0
________________________________

Now each character can be coded by the shortest binary fraction whose value falls in the character's probability interval:

letter  probability  interval     binary fraction
_______________________________________________________
C:      0.3          0.0 : 0.3    0
B:      0.2          0.3 : 0.5    0.011 = 3/8 = 0.375
A:      0.5          0.5 : 1.0    0.1   = 1/2 = 0.5
_______________________________________________________

This shows how single characters can be assigned minimum-length binary codes. However, arithmetic coding doesn't stop there and simply translate the individual characters of a message into these binary codes. It takes a subtler approach, assigning binary fractions to complete messages.

To start, let's consider sending messages consisting of all possible permutations of two of these three characters. We determine the probability of each two-character string by multiplying the probabilities of the two characters, and then set up a series of intervals using those probabilities:

string  probability  interval      binary fraction
_____________________________________________________________
CC:     0.09         0.00 : 0.09   0.0001 = 1/16  = 0.0625
CB:     0.06         0.09 : 0.15   0.001  = 1/8   = 0.125
CA:     0.15         0.15 : 0.30   0.01   = 1/4   = 0.25
BC:     0.06         0.30 : 0.36   0.0101 = 5/16  = 0.3125
BB:     0.04         0.36 : 0.40   0.011  = 3/8   = 0.375
BA:     0.10         0.40 : 0.50   0.0111 = 7/16  = 0.4375
AC:     0.15         0.50 : 0.65   0.1    = 1/2   = 0.5
AB:     0.10         0.65 : 0.75   0.1011 = 11/16 = 0.6875
AA:     0.25         0.75 : 1.00   0.11   = 3/4   = 0.75
_____________________________________________________________

The higher the probability of the string, the shorter, in general, the binary fraction needed to represent it.
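The "shortest binary fraction inside an interval" rule used in these tables can be implemented greedily: set the most significant bits one at a time, choosing 1 whenever doing so stays below the top of the interval, until the value reaches the bottom. A minimal Python sketch (an added illustration, not from the original text):

    def shortest_binary_fraction(low, high):
        # Shortest bit string whose binary-fraction value lies in [low, high).
        bits, value, step = '', 0.0, 0.5
        while value < low:
            if value + step < high:
                value += step
                bits += '1'
            else:
                bits += '0'
            step /= 2
        return bits or '0', value

    print(shortest_binary_fraction(0.3, 0.5))       # ('011', 0.375): B
    print(shortest_binary_fraction(0.195, 0.225))   # ('00111', 0.21875): CAB

(When the interval starts at 0, as for the letter C, the loop body never runs and the fraction is simply 0.)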
Let's build a similar table for three characters now:

string  probability  interval        binary fraction
______________________________________________________________________
CCC     0.027        0.000 : 0.027   0.000001  = 1/64   = 0.015625
CCB     0.018        0.027 : 0.045   0.00001   = 1/32   = 0.03125
CCA     0.045        0.045 : 0.090   0.0001    = 1/16   = 0.0625
CBC     0.018        0.090 : 0.108   0.00011   = 3/32   = 0.09375
CBB     0.012        0.108 : 0.120   0.000111  = 7/64   = 0.109375
CBA     0.03         0.120 : 0.150   0.001     = 1/8    = 0.125
CAC     0.045        0.150 : 0.195   0.0011    = 3/16   = 0.1875
CAB     0.03         0.195 : 0.225   0.00111   = 7/32   = 0.21875
CAA     0.075        0.225 : 0.300   0.01      = 1/4    = 0.25
BCC     0.018        0.300 : 0.318   0.0101    = 5/16   = 0.3125
BCB     0.012        0.318 : 0.330   0.010101  = 21/64  = 0.328125
BCA     0.03         0.330 : 0.360   0.01011   = 11/32  = 0.34375
BBC     0.012        0.360 : 0.372   0.0101111 = 47/128 = 0.3671875
BBB     0.008        0.372 : 0.380   0.011     = 3/8    = 0.375
BBA     0.02         0.380 : 0.400   0.011001  = 25/64  = 0.390625
BAC     0.03         0.400 : 0.430   0.01101   = 13/32  = 0.40625
BAB     0.02         0.430 : 0.450   0.0111    = 7/16   = 0.4375
BAA     0.05         0.450 : 0.500   0.01111   = 15/32  = 0.46875
ACC     0.045        0.500 : 0.545   0.1       = 1/2    = 0.5
ACB     0.03         0.545 : 0.575   0.1001    = 9/16   = 0.5625
ACA     0.075        0.575 : 0.650   0.101     = 5/8    = 0.625
ABC     0.03         0.650 : 0.680   0.10101   = 21/32  = 0.65625
ABB     0.02         0.680 : 0.700   0.1011    = 11/16  = 0.6875
ABA     0.05         0.700 : 0.750   0.10111   = 23/32  = 0.71875
AAC     0.075        0.750 : 0.825   0.11      = 3/4    = 0.75
AAB     0.05         0.825 : 0.875   0.11011   = 27/32  = 0.84375
AAA     0.125        0.875 : 1.000   0.111     = 7/8    = 0.875
______________________________________________________________________

Obviously this same procedure can be followed for more characters, resulting in a longer binary fractional value. What arithmetic coding does is find the probability value of an entire message and arrange it as part of a numerical order that allows its unique identification.

Let's stop here and send one of the binary strings defined in the table above to a decoder. We'll arbitrarily select the binary string with the decimal value 0.21875. This value was obtained using the single-character probability values and intervals defined earlier:

letter  probability  interval
________________________________
C:      0.3          0.0 : 0.3
B:      0.2          0.3 : 0.5
A:      0.5          0.5 : 1.0
________________________________

The value 0.21875 clearly falls into the interval for "C", so "C" must be the first character. We can then "zoom in" on the characters that follow the "C" by subtracting the bottom value of the interval for "C", which happens to be 0, and dividing the result by the width of the probability interval for "C", which is 0.3:

(0.21875 - 0) / 0.3 = 0.72917

This is a simple shift-and-scale operation. The result falls into the probability interval for "A", so the second character must be "A". We then zoom in on the next character by the same approach, subtracting the bottom value of the interval for "A", which is 0.5, and dividing the result by the width of the probability interval for "A", which is also 0.5:

(0.72917 - 0.5) / 0.5 = 0.4583

This clearly falls into the probability interval for "B", and so the string has been correctly decompressed to "CAB", which is the correct answer. Unfortunately, this leaves behind a remainder that can be decoded into an indefinitely long string of bogus characters. This is an artifact of using decimal floating-point math to perform the calculations in this example. In practice, arithmetic coding is based on binary fixed-point math, which avoids this problem.
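The zoom-in decoding just demonstrated is only a few lines of code. A minimal Python sketch (an added illustration; it assumes the message length is known, per the note that follows):

    intervals = {'C': (0.0, 0.3), 'B': (0.3, 0.5), 'A': (0.5, 1.0)}

    def decode(value, length):
        out = ''
        for _ in range(length):
            for symbol, (low, high) in intervals.items():
                if low <= value < high:
                    out += symbol
                    value = (value - low) / (high - low)  # shift and scale
                    break
        return out

    print(decode(0.21875, 3))   # CAB

Stopping after exactly `length` symbols is what prevents the leftover remainder from being decoded into bogus characters.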
One other problem is the fact that the binary fraction output by the arithmetic coder is of indefinite length, and the decoder has no idea where the string ends if it isn't told. In practice, a length header can be sent to indicate how long the fraction is, or an end-of-transmission symbol of some sort can be used to tell the decoder where the fraction ends.

2.2 Theoretical limit of compressed message

It is easy to show that the computed LOW limit never exceeds n^n, independently of the order of symbols, where n is the size of the message. That means the binary length of the LOW limit can be estimated as log2(n^n) = n*log2(n). After the computation of the HIGH limit and the reduction of the message by selecting a number from the interval [LOW, HIGH) with the longest trail of zeros, we can presume that this length can be reduced by log2(f_1 * f_2 * ... * f_n) bits. Since each frequency occurs in this product exactly as many times as its own value, we can use the size of the alphabet A for the computation of the product:

Π_{k=1..n} f_k = Π_{i=1..A} f_i^(f_i)

We can see this clearly in the above example: the product of all frequencies in the message is 3*1*2*3*3*2, where 3 occurs exactly 3 times, 2 occurs exactly 2 times, and so on. Applying log2, the estimated number of bits in the compressed message is

n*log2(n) - Σ_{i=1..A} f_i*log2(f_i).

Using the numbers from the above example we see an exact match with the Shannon entropy limit calculated before:

6*log2(6) - 3*log2(3) - 2*log2(2) - 1*log2(1) = 8.75.

This is not a coincidence. The formulas

n*log2(n) - Σ_{i=1..A} f_i*log2(f_i)   and   -n * Σ_{i=1..A} p_i*log2(p_i)

can be algebraically converted into each other. The latter is the entropy E multiplied by the length of the message; it uses the probabilities p_i of the occurrences of the involved symbols. The entropy E was introduced by Claude Shannon in his fundamental work "A Mathematical Theory of Communication" as a statistical characteristic used in estimating the quantity of information.

Another fundamental property that should be mentioned is the relationship between entropy and the number of possible permutations. It was already shown that compression is achieved as a result of the uneven distribution of symbols: it depends on the product of all frequencies in a message and not on the order of symbols. If we consider the distribution as a fixed parameter, we can simply enumerate all possible messages and pass the index of the message instead of the message itself. The maximum possible index equals the number of permutations of the message, and the bit length of this index is estimated by the formula

log2( n! / (f_1! * f_2! * ... * f_A!) ).

So the two expressions

n*log2(n) - Σ_{i=1..A} f_i*log2(f_i)   and   log2( n! / (f_1! * ... * f_A!) )

both estimate the number of bits in a compressed message. These two estimates may differ noticeably for short messages, but from about 1000 symbols onward there is only a fraction of a percent of difference. That means that the entropy is close to the bit length of the number of possible permutations divided by the size of the message. The reason these formulas give close results is Stirling's approximation for the logarithm of a factorial,

log2(n!) ≈ n*log2(n) - n*log2(e),

whose two sides approach each other as n increases. Taking all these relationships into consideration, we may define arithmetic coding as converting a message into a whole number whose bit length is close to the bit length of the number of possible permutations for the given statistical distribution of symbols, and which does not depend on the particular order of the symbols.
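Both estimates are easy to compare numerically. A small Python sketch (an added illustration) computes the entropy-based bit count and the exact permutation-index bit count for a frequency distribution:

    import math

    def entropy_bits_total(freqs):
        n = sum(freqs)
        return n * math.log2(n) - sum(f * math.log2(f) for f in freqs if f > 0)

    def permutation_bits(freqs):
        n = sum(freqs)
        perms = math.factorial(n)
        for f in freqs:
            perms //= math.factorial(f)
        return math.log2(perms)

    print(entropy_bits_total([3, 2, 1]))        # 8.75 (the DABDDB message)
    print(permutation_bits([3, 2, 1]))          # log2(60) ~ 5.91: a big gap
    print(entropy_bits_total([500, 300, 200]))  # ~1485.5 bits
    print(permutation_bits([500, 300, 200]))    # ~1475.4: under 1% apart

The 6-symbol message shows the noticeable short-message gap; at 1000 symbols the two estimates already agree to within about one percent, as the Stirling approximation predicts.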
2.3 Using probabilities instead of frequencies

Traditionally, arithmetic coding has been explained in terms of probabilities rather than frequencies. A very detailed explanation can be found in "Introduction to Arithmetic Coding", which lists many other sources in its bibliography. From a theoretical point of view, using probabilities or frequencies makes no difference, because each can be converted into the other by multiplying both sides of the equations by a constant. However, some interesting properties were overlooked by researchers because of the exclusive use of probabilities: the fact that arithmetic coding is only a generalized form of changing the base, and the relationship between entropy and the number of permutations.

We can show the traditional approach here by simply dividing the expression for the LOW limit by the constant n^n, or, for our particular case, by the constant 6^6. The left-hand side then contains the cumulative probabilities and probabilities, and the right-hand side is a fraction. The table illustrating the computation of the LOW limit changes accordingly:

Symbol   Part 1   Part 2                        Total
D        3/6                                    3/6
A        0/6      3/6                           0/6^2
B        1/6      3/6 * 1/6                     3/6^3
D        3/6      3/6 * 1/6 * 2/6               18/6^4
D        3/6      3/6 * 1/6 * 2/6 * 3/6         54/6^5
B        1/6      3/6 * 1/6 * 2/6 * 3/6 * 3/6   54/6^6
                                                LOW = 0.5358796296...

In the same way, the HIGH limit is computed as LOW plus the product of all probabilities, and the reduction of the message is achieved by selecting the shortest fraction within the interval [LOW, HIGH). It can be proven that the result always belongs to the interval [0, 1). Each step of the encoding adds a smaller and smaller number, which contributes to the growth of the fraction and requires the special numerical treatment known as renormalization.

When the concept was derived and explained by mathematicians and passed to programmers for implementation, the programmers stayed bound to the probability concept and implemented it literally, which caused tremendous inconvenience in programming and delayed the delivery of reliable implementations by years or, possibly, decades. The domination of the probability concept can be seen in every patent on arithmetic coding filed before the year 2000. The explanation of arithmetic coding as mapping a message onto the [0, 1) interval was included in patent claims, at the risk of the patents being easily circumvented by programs that neither compute probabilities nor deal with renormalization.

The frequency concept avoids many of these computational challenges. In the same way as probabilities, frequencies are slightly adjusted to convenient numbers: they are scaled so that the base becomes a power of 2, turning one multiplication into a binary shift. The long products of frequencies are computed as a mantissa and an exponent, where the mantissa is maintained throughout the compression and the exponent results in an additional binary shift.
The final computational part is the addition of numbers that overlap each other, managing the propagation of carries:

    47568690
      34598908
          996245
    and so on.

The frequency concept is thus reduced to computing integers shifted relative to each other and adding them in a stair-looking structure. This also neatly explains the fractional bit lengths in an optimal encoding mentioned by Shannon, who stated the possibility of encoding a particular symbol in -log2(p) bits even though that number is fractional: the fractional bit length is achieved by a variable shift computed at every step. While implementation details vary from one arithmetic coder to another, they all have one thing in common: the limits LOW and HIGH are computed on every step. The frequency-based approach does not need the computation of the HIGH limit at all; it is not part of the numerical implementation, and the computational burden is roughly halved.

2.4 Implementation details for the probability concept

2.4.1 Defining a model

Arithmetic coders produce near-optimal output for a given set of symbols and probabilities (the optimal value is -log2(P) bits for each symbol of probability P; see the source coding theorem). Compression algorithms that use arithmetic coding start by determining a model of the data: basically a prediction of what patterns will be found in the symbols of the message. The more accurate this prediction is, the closer to optimal the output will be.

Example: a simple, static model for describing the output of a particular monitoring instrument over time might be:

60% chance of symbol NEUTRAL
20% chance of symbol POSITIVE
10% chance of symbol NEGATIVE
10% chance of symbol END-OF-DATA

(The presence of this last symbol means that the stream will be "internally terminated", as is fairly common in data compression; when this symbol appears in the data stream, the decoder knows that the entire stream has been decoded.)

Models can also handle alphabets other than the simple four-symbol set chosen for this example. More sophisticated models are possible as well: higher-order modeling changes its estimate of the current probability of a symbol based on the symbols that precede it (the context), so that in a model for English text, for example, the percentage chance of "u" would be much higher when it follows a "Q" or a "q". Models can even be adaptive, continuously changing their prediction of the data based on what the stream actually contains. The decoder must have the same model as the encoder.

2.4.2 A simplified example

As an example of how a sequence of symbols is encoded, consider a sequence taken from a set of three symbols, A, B, and C, each equally likely to occur. Simple block encoding would use 2 bits per symbol, which is wasteful: one of the four bit variations is never used. A more efficient solution is to represent the sequence as a rational number between 0 and 1 in base 3, where each digit represents a symbol. For example, the sequence "ABBCAB" becomes 0.011201 in base 3. The next step is to encode this ternary number using a fixed-point binary number of sufficient precision to recover it, such as 0.001011001 in binary; this is only 9 bits, 25% smaller than the naive block encoding. This is feasible for long sequences because there are efficient, in-place algorithms for converting the base of arbitrarily precise numbers. Finally, knowing the original string had length 6, one can simply convert back to base 3, round to 6 digits, and recover the string.
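A minimal Python sketch of this six-symbol example (added as an illustration), using exact integer fractions so no precision is lost:

    def to_base3_fraction(message, alphabet='ABC'):
        # Read the symbols as base-3 digits of a fraction in [0, 1).
        num, den = 0, 1
        for ch in message:
            num = num * 3 + alphabet.index(ch)
            den *= 3
        return num, den               # exact value num/den

    def binary_digits(num, den, bits):
        out = []
        for _ in range(bits):         # long division, one bit at a time
            num *= 2
            out.append(num // den)
            num %= den
        return ''.join(map(str, out))

    num, den = to_base3_fraction('ABBCAB')   # 127/729 = 0.011201 in base 3
    print(binary_digits(num, den, 9))        # 001011001 -> 0.001011001 binary

For this string, nine bits land within half a ternary step (1/1458) of the true value, so rounding back to six base-3 digits recovers "ABBCAB"; in general a bit or two more may be needed.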
2.4.3 Encoding and decoding

In general, each step of the encoding process, except for the very last, is the same; the encoder has basically just three pieces of data to consider:

The next symbol that needs to be encoded
The current interval (at the very start of the encoding process the interval is set to [0, 1), but it will change)
The probabilities the model assigns to each of the various symbols that are possible at this stage (as mentioned earlier, higher-order or adaptive models mean that these probabilities are not necessarily the same in each step)

The encoder divides the current interval into sub-intervals, each representing a fraction of the current interval proportional to the probability of that symbol in the current context. Whichever interval corresponds to the actual symbol that is next to be encoded becomes the interval used in the next step. For the four-symbol model above:

the interval for NEUTRAL would be [0, 0.6)
the interval for POSITIVE would be [0.6, 0.8)
the interval for NEGATIVE would be [0.8, 0.9)
the interval for END-OF-DATA would be [0.9, 1)

When all symbols have been encoded, the resulting interval unambiguously identifies the sequence of symbols that produced it. Anyone who has the same final interval and the same model can reconstruct the symbol sequence that must have entered the encoder to result in that final interval. It is not necessary to transmit the final interval, however; it is only necessary to transmit one fraction that lies within it. In particular, it is only necessary to transmit enough digits (in whatever base) of the fraction so that all fractions beginning with those digits fall into the final interval.
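In Python, this interval narrowing looks like the following sketch (an added illustration using the four-symbol model above; the fixed table stands in for whatever the model supplies at each step):

    model = [('NEUTRAL', 0.6), ('POSITIVE', 0.2), ('NEGATIVE', 0.1), ('END', 0.1)]

    def encode(symbols):
        low, high = 0.0, 1.0
        for target in symbols:
            width = high - low
            cum = 0.0
            for symbol, p in model:
                if symbol == target:
                    # Keep the sub-interval proportional to p at offset cum.
                    low, high = low + cum * width, low + (cum + p) * width
                    break
                cum += p
        return low, high    # transmit any fraction inside [low, high)

    print(encode(['POSITIVE', 'NEUTRAL', 'END']))   # approximately (0.708, 0.72)

Each pass shrinks [low, high) by the probability of the symbol just encoded, so the final width equals the product of all the symbol probabilities.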
2.5 Precision and renormalization

The above explanations of arithmetic coding contain some simplification. In particular, they are written as if the encoder first calculated the fractions representing the endpoints of the interval in full, using infinite precision, and only converted the fraction to its final form at the end of encoding. Rather than try to simulate infinite precision, most arithmetic coders instead operate at a fixed limit of precision which they know the decoder will be able to match, and round the calculated fractions to their nearest equivalents at that precision. An example shows how this would work if the model called for the interval [0, 1) to be divided into thirds, approximated with 8-bit precision. Note that since the precision is now known, so are the binary ranges we will be able to use.

Symbol  Probability  Interval at 8-bit      Interval at 8-bit           Range in binary
        (fraction)   precision (fractions)  precision (binary)
A       1/3          [0, 85/256)            [0.00000000, 0.01010101)    00000000 - 01010100
B       1/3          [85/256, 171/256)      [0.01010101, 0.10101011)    01010101 - 10101010
C       1/3          [171/256, 1)           [0.10101011, 1.00000000)    10101011 - 11111111

A process called renormalization keeps the finite precision from becoming a limit on the total number of symbols that can be encoded. Whenever the range is reduced to the point where all values in the range share certain beginning digits, those digits are sent to the output. However many digits of precision the computer can handle, it is now handling fewer than that, so the existing digits are shifted left, and at the right, new digits are added to expand the range as widely as possible. Note that this occurs in two of the three cases from our previous example:

Symbol  Probability  Range                 Digits sent to output  Range after renormalization
A       1/3          00000000 - 01010100   0                      00000000 - 10101001
B       1/3          01010101 - 10101010   none                   01010101 - 10101010
C       1/3          10101011 - 11111111   1                      01010110 - 11111111

2.6 Connections between arithmetic coding and Huffman coding

There is great similarity between arithmetic coding and Huffman coding; in fact, it has been shown that Huffman coding is just a specialized case of arithmetic coding. But because arithmetic coding translates the entire message into one number represented in base b, rather than translating each symbol of the message into a series of digits in base b, it will sometimes approach optimal entropy encoding much more closely than Huffman can.

In fact, a Huffman code corresponds closely to an arithmetic code where each of the frequencies is rounded to a nearby power of 1/2; for this reason Huffman deals relatively poorly with distributions where symbols have frequencies far from a power of 1/2, such as 0.75 or 0.375. This includes most distributions where there is either a small number of symbols (such as just the bits 0 and 1) or where one or two symbols dominate the rest.

For an alphabet {a, b, c} with equal probabilities of 1/3, Huffman coding may produce the following code:

a → 0: 50%
b → 10: 25%
c → 11: 25%

This code has an expected length of (1 + 2 + 2)/3 ≈ 1.667 bits per symbol, an inefficiency of 5 percent compared to log2(3) ≈ 1.585 bits per symbol for arithmetic coding.

For an alphabet {0, 1} with probabilities 0.625 and 0.375, Huffman encoding treats the two symbols as though they had probability 0.5 each, assigning 1 bit to each value, which achieves no compression over naive block encoding. Arithmetic coding approaches the optimal compression ratio of

-0.625*log2(0.625) - 0.375*log2(0.375) ≈ 0.954 bits per symbol.

When the symbol 0 has a high probability of 0.95, the difference is much greater:

-0.95*log2(0.95) - 0.05*log2(0.05) ≈ 0.286 bits per symbol.

One simple way to address this weakness is to concatenate symbols to form a new alphabet in which each symbol represents a sequence of symbols in the original alphabet. In the above example, grouping sequences of three symbols before encoding would produce new "super-symbols" with the following frequencies:

000: 85.7%
001, 010, 100: 4.5% each
011, 101, 110: 0.24% each
111: 0.0125%

With this grouping, Huffman coding averages 1.3 bits for every three symbols, or 0.433 bits per symbol, compared with one bit per symbol in the original encoding.
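The per-symbol figures above come straight from the binary entropy function, as in this small Python sketch (an added illustration):

    import math

    def binary_entropy(p):
        # Optimal bits per symbol for a binary source with P(0) = p.
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    print(binary_entropy(0.625))   # ~0.954 bits/symbol (Huffman: 1 bit)
    print(binary_entropy(0.95))    # ~0.286 bits/symbol (Huffman: still 1 bit)

Arithmetic coding can approach these values directly; Huffman coding can only approach them by blocking symbols together, as in the three-symbol grouping just described.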
2.7 US patents on arithmetic coding

A variety of specific techniques for arithmetic coding are covered by US patents. Some of these patents may be essential for implementing the algorithms for arithmetic coding that are specified in some formal international standards. When this is the case, such patents are generally available for licensing under so-called "reasonable and non-discriminatory" (RAND) licensing terms (at least as a matter of standards-committee policy). In some well-known instances (including some involving IBM patents) such licenses are available for free; in other instances, licensing fees are required. The availability of licenses under RAND terms does not necessarily satisfy everyone who might want to use the technology, as what may seem "reasonable" fees to a company preparing a proprietary software product may seem much less reasonable for a free software or open source project.

At least one significant compression program, bzip2, deliberately discontinued the use of arithmetic coding in favor of Huffman coding because of the patent situation. Also, encoders and decoders of the JPEG file format, which has options for both Huffman encoding and arithmetic coding, typically support only the Huffman encoding option, due to patent concerns; the result is that nearly all JPEGs in use today use Huffman encoding.[2]

Some US patents relating to arithmetic coding are listed below. (This list is not exhaustive; see [3] for a list of more patents.)

U.S. Patent 4,122,440 (IBM): Filed 4 March 1977, granted 24 October 1978 (now expired)
U.S. Patent 4,286,256 (IBM): Granted 25 August 1981 (now expired)
U.S. Patent 4,467,317 (IBM): Granted 21 August 1984 (now expired)
U.S. Patent 4,652,856 (IBM): Granted 4 February 1986 (now expired)
U.S. Patent 4,891,643 (IBM): Filed 15 September 1986, granted 2 January 1990 (now expired)
U.S. Patent 4,905,297 (IBM): Filed 18 November 1988, granted 27 February 1990 (now expired)
U.S. Patent 4,933,883 (IBM): Filed 3 May 1988, granted 12 June 1990
U.S. Patent 4,935,882 (IBM): Filed 20 July 1988, granted 19 June 1990
U.S. Patent 4,989,000: Filed 19 June 1989, granted 29 January 1991
U.S. Patent 5,099,440 (IBM): Filed 5 January 1990, granted 24 March 1992
U.S. Patent 5,272,478 (Ricoh): Filed 17 August 1992, granted 21 December 1993

The Dirac codec uses arithmetic coding and is not patent pending.[4] Patents on arithmetic coding may exist in other jurisdictions; see software patents for a discussion of the patentability of software around the world.

2.8 Benchmarks and other technical characteristics

Every programmatic implementation of arithmetic encoding has a different compression ratio and performance. While compression ratios vary only a little (usually by under 1%), code execution time can vary by a factor of 10. Choosing the right encoder from a list of publicly available encoders is not a simple task, because performance and compression ratio depend also on the type of data, particularly on the size of the alphabet (the number of different symbols). One particular encoder may perform better for small alphabets while another performs better for large alphabets. Most encoders have limitations on the size of the alphabet, and many of them are designed for a binary alphabet only (zero and one).

3.0 Huffman coding

In computer science and information theory, Huffman coding is an entropy encoding algorithm used for lossless data compression. The term refers to the use of a variable-length code table for encoding a source symbol (such as a character in a file), where the variable-length code table has been derived in a particular way, based on the estimated probability of occurrence for each possible value of the source symbol.
It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding uses a specific method for choosing the representation of each symbol, resulting in a prefix code (sometimes called a "prefix-free code"; that is, the bit string representing some particular symbol is never a prefix of the bit string representing any other symbol) that expresses the most common characters using shorter strings of bits than are used for less common source symbols. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique strings of bits will produce a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to do this in linear time if the input probabilities (also known as weights) are sorted. For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code", even when such a code was not produced by Huffman's algorithm.

Although Huffman's original algorithm is optimal for symbol-by-symbol coding (i.e., a stream of unrelated symbols) with a known input probability distribution, it is not optimal when the symbol-by-symbol restriction is dropped, or when the probability mass functions are unknown, not identically distributed, or not independent (e.g., "cat" is more common than "cta"). Other methods such as arithmetic coding and LZW coding often have better compression capability: both of these methods can combine an arbitrary number of symbols for more efficient coding and generally adapt to the actual input statistics, which is useful when input probabilities are not precisely known or vary significantly within the stream. However, the limitations of Huffman coding should not be overstated: it can be used adaptively, accommodating unknown, changing, or context-dependent probabilities. In the case of known independent and identically distributed random variables, combining symbols together reduces inefficiency in a way that approaches optimality as the number of symbols combined increases.

3.1 History

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the problem of finding the most efficient binary code. Huffman, unable to prove any codes were the most efficient, was about to give up and start studying for the final when he hit upon the idea of using a frequency-sorted binary tree, and quickly proved this method the most efficient. In doing so, the student outdid his professor, who had worked with information theory inventor Claude Shannon to develop a similar code. Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down.
3.2 Problem definition

Informal description

Given: a set of symbols and their weights (usually proportional to probabilities).
Find: a prefix-free binary code (a set of codewords) with minimum expected codeword length (equivalently, a tree with minimum weighted path length from the root).

Formalized description

Input: an alphabet A = (a_1, a_2, ..., a_n), which is the symbol alphabet of size n, and a set W = (w_1, w_2, ..., w_n) of the (positive) symbol weights (usually proportional to probabilities), i.e. w_i = weight(a_i), 1 ≤ i ≤ n.
Output: a code C(A, W) = (c_1, c_2, ..., c_n), which is the set of (binary) codewords, where c_i is the codeword for a_i, 1 ≤ i ≤ n.
Goal: let L(C) = Σ_{i=1..n} w_i * length(c_i) be the weighted path length of code C. The condition is L(C) ≤ L(T) for any code T(A, W).

Samples

Symbols                             a      b      c      d      e      sum
Input (A, W):  Weights (w_i)        0.10   0.15   0.30   0.16   0.29   = 1
Output C:      Codewords (c_i)      000    001    10     01     11
               Codeword length      3      3      2      2      2
               Weighted path len.   0.30   0.45   0.60   0.32   0.58   L(C) = 2.25
Optimality:    Probability budget   1/8    1/8    1/4    1/4    1/4    = 1.00
               Info content (bits)  3.32   2.74   1.74   2.64   1.79
               Entropy (bits)       0.332  0.411  0.521  0.423  0.518  H(A) = 2.205

(The probability budget of a symbol is 2^(-length(c_i)).)

For any code that is biunique, meaning that the code is uniquely decodable, the sum of the probability budgets across all symbols is always less than or equal to one. In this example the sum is strictly equal to one; as a result, the code is termed a complete code. If this is not the case, one can always derive an equivalent code by adding extra symbols (with associated null probabilities) to make the code complete while keeping it biunique.

As defined by Shannon (1948), the information content h (in bits) of each symbol a_i with non-null probability is

h(a_i) = log2(1 / w_i) = -log2(w_i).

The entropy H (in bits) is the weighted sum, across all symbols a_i with non-zero probability w_i, of the information content of each symbol:

H(A) = Σ_{w_i > 0} w_i * h(a_i) = -Σ_{w_i > 0} w_i * log2(w_i).

(Note: a symbol with zero probability has zero contribution to the entropy. When w = 0, w*log2(w) is an indeterminate form, but by L'Hôpital's rule its limit as w tends to 0 from above is 0. For simplicity, symbols with zero probability are left out of the formula above.)

As a consequence of Shannon's source coding theorem, the entropy is a measure of the smallest codeword length that is theoretically possible for the given alphabet with associated weights. In this example, the weighted average codeword length is 2.25 bits per symbol, only slightly larger than the calculated entropy of 2.205 bits per symbol. So not only is this code optimal in the sense that no other feasible code performs better, it is very close to the theoretical limit established by Shannon. Note that, in general, a Huffman code need not be unique, but it is always one of the codes minimizing L(C).

3.2.1 Basic technique

A source generates 4 different symbols {a1, a2, a3, a4} with probabilities {0.4, 0.35, 0.2, 0.05}. A binary tree is generated from left to right by taking the two least probable symbols and putting them together to form an equivalent symbol whose probability equals the sum of the two. The process is repeated until there is just one symbol. The tree can then be read backwards, from right to left, assigning different bits to different branches. The final Huffman code is:

Symbol  Code
a1      0
a2      10
a3      110
a4      111

The standard way to represent a signal made of 4 symbols is to use 2 bits per symbol, but the entropy of the source is approximately 1.74 bits/symbol. If this Huffman code is used to represent the signal, the average length is lowered to 1.85 bits/symbol; it is still some distance from the theoretical limit because the probabilities of the symbols differ from negative powers of two.
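Huffman's procedure is short enough to sketch in a few lines of Python (an added illustration using the standard heap-based construction; the bit assignments may differ from the table above, but the codeword lengths are the same):

    import heapq
    from itertools import count

    def huffman_code(weights):
        tie = count()                       # tie-breaker for equal weights
        heap = [(w, next(tie), sym) for sym, w in weights.items()]
        heapq.heapify(heap)
        while len(heap) > 1:                # merge the two least probable
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
        code = {}
        def walk(node, prefix):
            if isinstance(node, tuple):     # internal node: two branches
                walk(node[0], prefix + '0')
                walk(node[1], prefix + '1')
            else:
                code[node] = prefix or '0'
        walk(heap[0][2], '')
        return code

    print(huffman_code({'a1': 0.4, 'a2': 0.35, 'a3': 0.2, 'a4': 0.05}))
    # codeword lengths 1, 2, 3, 3 -> average 0.4*1 + 0.35*2 + 0.25*3 = 1.85 bits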
3.2.2 Example

To see how Huffman coding works, assume that a text file is to be compressed, and that the characters in the file have the following frequencies:

A: 29   B: 64   C: 32   D: 12   E: 9   F: 66   G: 23

In practice, we need the frequencies for all the characters used in the text, including all letters, digits, and punctuation, but to keep the example simple we'll just stick to the characters A to G. The first step in building a Huffman code is to order the characters from highest to lowest frequency of occurrence:

F: 66   B: 64   C: 32   A: 29   G: 23   D: 12   E: 9

First, the two least frequent characters are selected, logically grouped together, and their frequencies added. In this example, D and E have a combined frequency of 21:

F: 66   B: 64   C: 32   A: 29   G: 23   [D E]: 21

This begins the construction of a "binary tree" structure. We now again select the two elements with the lowest frequencies, regarding the D-E combination as a single element. In this case, the two elements selected are G and the D-E combination; their new combined frequency is 44:

F: 66   B: 64   C: 32   A: 29   [G [D E]]: 44

We continue in the same way, selecting the two elements with the lowest frequency, grouping them together, and adding their frequencies, until we run out of elements. In the third iteration, the lowest frequencies are C and A:

F: 66   B: 64   [C A]: 61   [G [D E]]: 44

The next iterations combine 61 and 44 into 105, then F and B into 130, and finally 130 and 105 into a single root with frequency 235:

[[F B]: 130   [[C A] [G [D E]]]: 105]: 235

The result is known as a "Huffman tree". To obtain the Huffman code itself, each branch of the tree is labeled with a 1 or a 0. It doesn't matter how the 1s and 0s are assigned, though a consistent scheme obviously is easier to deal with. Tracing down the tree from the root gives the Huffman codes, with the shortest codes assigned to the characters with the greatest frequency:

F: 00
B: 01
C: 100
A: 101
G: 110
D: 1110
E: 1111
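Using this code table, compression and decompression are a few lines each. The Python sketch below (an added illustration) shows the prefix property in action: the decoder emits a character as soon as its buffer matches a codeword, and no false match is possible because no codeword is a prefix of another:

    code = {'F': '00', 'B': '01', 'C': '100', 'A': '101',
            'G': '110', 'D': '1110', 'E': '1111'}

    def encode(text):
        return ''.join(code[ch] for ch in text)

    def decode(bits):
        inverse = {v: k for k, v in code.items()}
        out, buf = [], ''
        for bit in bits:
            buf += bit
            if buf in inverse:          # a complete codeword: emit it
                out.append(inverse[buf])
                buf = ''
        return ''.join(out)

    bits = encode('CABBAGE')
    print(bits)           # 10010101011011101111
    print(decode(bits))   # CABBAGE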
3.3 Main properties

A Huffman coder goes through the source text file, converts each character into its appropriate binary Huffman code, and dumps the resulting bits to the output file. The Huffman codes won't get mixed up in decoding. The best way to see that this is so is to envision the decoder cycling through the tree structure, guided by the encoded bits it reads, moving from top to bottom and then back to the top. As long as the bits constitute legitimate Huffman codes and no bit gets scrambled or lost, the decoder will never get lost either.

Huffman coding is optimal when the probability of each input symbol is a negative power of two. Prefix codes tend to have a slight inefficiency on small alphabets, where probabilities often fall between these optimal points. "Blocking", or expanding the alphabet size by coalescing multiple symbols into "words" of fixed or variable length before Huffman coding, usually helps, especially when adjacent symbols are correlated (as in the case of natural language text). The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. These situations often respond well to a form of blocking called run-length encoding; for the simple case of Bernoulli processes, Golomb coding is a provably optimal run-length code.

Arithmetic coding produces slight gains over Huffman coding, but in practice these gains have seldom been large enough to offset arithmetic coding's higher computational complexity and patent royalties. (As of July 2006, IBM owned patents on many methods of arithmetic coding in the US; see section 2.7 above.)

3.4 Variations

Many variations of Huffman coding exist, some of which use a Huffman-like algorithm, and others of which find optimal prefix codes (while, for example, putting different restrictions on the output). Note that, in the latter case, the method need not be Huffman-like and, indeed, need not even run in polynomial time. An exhaustive list of papers on Huffman coding and its variations is given by "Code and Parse Trees for Lossless Source Encoding" [1].

3.4.1 n-ary Huffman coding

The n-ary Huffman algorithm uses the alphabet {0, 1, ..., n-1} to encode messages and build an n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm applies as for binary (n = 2) codes, except that the n least probable symbols are taken together instead of just the 2 least probable. Note that for n greater than 2, not all sets of source words can properly form an n-ary tree for Huffman coding; in these cases, additional zero-probability placeholders must be added. This is because the tree must form an n-to-1 contractor (for binary coding this is a 2-to-1 contractor, and any sized set can form such a contractor). If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree.

3.4.2 Adaptive Huffman coding

A variation called adaptive Huffman coding calculates the probabilities dynamically, based on recent actual frequencies in the source string. This is somewhat related to the LZ family of algorithms.

3.4.3 Huffman template algorithm

Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, pairs of weights, non-numerical weights) and one of many combining methods (not just addition).
Such algorithms can solve other minimization problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design [2].

3.4.4 Length-limited Huffman coding

This is a variant where the goal is still to achieve a minimum weighted path length, but with the additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linearithmic time, unlike the presorted and unsorted conventional Huffman problems, respectively.

3.4.5 Huffman coding with unequal letter costs

In the standard Huffman coding problem, it is assumed that each symbol in the set that the code words are constructed from has an equal cost to transmit: a code word whose length is N digits will always have a cost of N, no matter how many of those digits are 0s and how many are 1s. Under this assumption, minimizing the total cost of the message and minimizing the total number of digits are the same thing.

Huffman coding with unequal letter costs is the generalization in which this assumption no longer holds: the letters of the encoding alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An example is the encoding alphabet of Morse code, where a dash takes longer to send than a dot, and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the weighted average codeword length, but it is no longer sufficient just to minimize the number of symbols used by the message. No algorithm is known to solve this problem in the same manner or with the same efficiency as conventional Huffman coding.

3.4.6 The canonical Huffman code

If weights corresponding to the alphabetically ordered inputs are in numerical order, the Huffman code has the same lengths as the optimal alphabetic code, which can be found by calculating these lengths, rendering Hu-Tucker coding unnecessary. The code resulting from numerically (re-)ordered input is sometimes called the canonical Huffman code and is often the code used in practice, due to its ease of encoding and decoding. The technique for finding this code is sometimes called Huffman-Shannon-Fano coding, since it is optimal like Huffman coding, but alphabetic in weight probability, like Shannon-Fano coding. The Huffman-Shannon-Fano code corresponding to the example is {000, 001, 01, 10, 11}, which, having the same codeword lengths as the original solution, is also optimal.
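The Huffman-Shannon-Fano assignment can be sketched directly: visit the symbols in alphabetical order and read each codeword off a running cumulative binary fraction. A Python illustration added here, assuming the code lengths have already been computed and exactly fill the probability budget:

    def hsf_code(lengths):
        max_len = max(lengths.values())
        v, codes = 0, {}               # v counts in units of 2^(-max_len)
        for sym in sorted(lengths):    # alphabetical symbol order
            bits = lengths[sym]
            codes[sym] = format(v >> (max_len - bits), '0{}b'.format(bits))
            v += 1 << (max_len - bits)
        return codes

    print(hsf_code({'a': 3, 'b': 3, 'c': 2, 'd': 2, 'e': 2}))
    # {'a': '000', 'b': '001', 'c': '01', 'd': '10', 'e': '11'}

This reproduces the {000, 001, 01, 10, 11} code quoted above for the a..e example of section 3.2.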
3.4.7 Model reconstruction

Decompression generally requires the transmission of information needed to reconstruct the compression model (methods such as adaptive Huffman coding do not, although they typically produce less-than-optimal code lengths). Originally, symbol frequencies were passed along to the decompressor, but this method is very inefficient, as it can produce an unacceptable level of overhead. The most common technique uses canonical Huffman encoding, which requires only B*n bits of information (where B is the number of bits per symbol and n is the size of the symbol alphabet). Other methods, such as "direct transmission" of the Huffman tree, produce a variable-length encoding of the model which can, in some cases, reduce the overhead to just a few bytes. Because the compressed data can include unused "trailing bits", the decompressor must be able to determine when to stop producing output. This can be accomplished either by transmitting the length of the decompressed data along with the compression model, or by defining a special code symbol to signify the end of input (the latter method can adversely affect code-length optimality, however).

3.5 Applications

Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an arithmetic code for a binary input than for a non-binary input. Also, although arithmetic coding offers better compression performance than Huffman coding, Huffman coding is still in wide use because of its simplicity, high speed, and lack of encumbrance by patents. Huffman coding today is often used as a "back end" to some other compression method: DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization followed by Huffman coding.

References

Background story: "Profile: David A. Huffman". Scientific American, September 1991, pp. 54-58.

Huffman, D.A. (September 1952). "A Method for the Construction of Minimum-Redundancy Codes". Proceedings of the I.R.E., pp. 1098-1102.

MacKay, David J.C. (September 2003). "Chapter 6: Stream Codes". Information Theory, Inference, and Learning Algorithms. Cambridge University Press. ISBN 0-521-64298-1. http://www.inference.phy.cam.ac.uk/mackay/itila/book.html. Retrieved 2007-12-30.

Rissanen, Jorma (May 1976). "Generalized Kraft Inequality and Arithmetic Coding". IBM Journal of Research and Development 20 (3): 198-203. http://domino.watson.ibm.com/tchjr/journalindex.nsf/4ac37cf0bdc4dd6a85256547004d47e1/53fec2e5af172a3185256bfa0067f7a0?OpenDocument. Retrieved 2007-09-21.

Rissanen, J.J.; Langdon, G.G., Jr. (March 1979). "Arithmetic coding". IBM Journal of Research and Development 23 (2): 149-162. http://researchweb.watson.ibm.com/journal/rd/232/ibmrd2302G.pdf. Retrieved 2007-09-22.

Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001). Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill. ISBN 0-262-03293-7. Section 16.3, pp. 385-392.

Witten, Ian H.; Neal, Radford M.; Cleary, John G. (June 1987). "Arithmetic Coding for Data Compression". Communications of the ACM 30 (6): 520-540. doi:10.1145/214762.214771. http://www.stanford.edu/class/ee398a/handouts/papers/WittenACM87ArithmCoding.pdf. Retrieved 2007-09-21.
