Benchmark for Modern Wavelet Based Still Image Compression

Yarish Brijmohan and S.H. Mneney, Member SAIEE
Abstract – This paper reviews the development of modern wavelet based image compression techniques. The Embedded Zerotree Wavelet (EZW) encoder, which set the benchmark for modern wavelet encoders, is discussed. The effectiveness of the zerotree method in identifying insignificant coefficients is due to the properties of the wavelet transform. The Set Partitioning in Hierarchical Trees (SPIHT) algorithm, which is used in the JPEG 2000 compression standard, is outlined. The performance of these algorithms is then compared to other modern wavelet based image compression schemes. The results show that the SPIHT encoder combines simplicity with efficiency.

I. INTRODUCTION

Over the last 20 years, wavelets have found numerous applications in signal processing theory and practice. One of the most important applications is in image and multi-resolution analysis. Multi-resolution theory is concerned with the representation and analysis of images at more than one resolution. The ability of wavelet theory to analyse both anomalies and trends of signals on an equal statistical footing makes it well suited to image coding and compression. Image compression and coding form a vital part of multimedia transmission and reception, since more information can be transmitted using the same bandwidth. The main aim of image compression is to store an image in a more condensed form, that is, to use fewer bits to represent the image. There are two types of image compression: lossless and lossy. In lossless compression the original image is recovered exactly after decompression. However, it is difficult to obtain error-free compression above a 2:1 ratio. Hence, to obtain much higher compression ratios, the compression must be lossy, in that there is a small error between the decompressed and original image.

II. WAVELET TRANSFORM

A. Why use the wavelet transform

1) Trends and Anomalies – Images consist mainly of areas of spatial trends (high statistical spatial correlation) and few anomalies (edges and object boundaries). Therefore in compressing images, the majority of the bits must be allocated to the trends, rather than to the anomalies. Trends are localised in the frequency domain, while they extend over a large number of lags in the time domain, whereas anomalies tend to be wideband in the frequency domain and localised in the time domain. The advantage of wavelet theory is that both anomalies and trends can be analysed on an equal statistical footing. The wavelet transform allocates some coefficients to long data lags, corresponding to a narrowband low frequency range, and some coefficients to short data lags, corresponding to a wideband high frequency range.

2) Statistical independence – Transform coefficients are shown to be almost statistically independent, therefore samples do not depend on each other. Hence, when the inverse transform is calculated, small inaccuracies in individual transform coefficients can be tolerated.

3) Insignificant coefficients – The wavelet transform produces a number of coefficients that have magnitudes close to zero. This property is exploited in a number of compression algorithms. If the locations of these coefficients are implicitly known, then these coefficients need not be coded.

B. The Discrete Wavelet Transform

Wavelet analysis is similar to subband analysis; therefore subband techniques will be used to compute the wavelet transform. For a full introduction to wavelet analysis, the reader is referred to [1]. The most commonly used technique to compute the one-dimensional discrete wavelet transform (DWT) is shown in Fig. 1. This is Mallat's herringbone algorithm (filter bank method), where the signal to be transformed is f(n) and J is the highest resolution scale. The signal is passed through the highpass and lowpass filters, hψ(-n) and hϕ(-n) respectively, and then down-sampled by two, to yield a one level transformation. The one level transformation gives rise to a subband split. Here, Wψ(J-1,n) corresponds to the detail transform coefficients and Wϕ(J-1,n) to the approximation coefficients at the next highest scale. To achieve multilevel decomposition, the approximation coefficients are filtered again in the same way.

The two dimensional (2-D) DWT of a function f(x,y) can be computed by applying the one dimensional DWT on the rows of f(x,y), followed by the one dimensional DWT on the resulting columns. On application of the 2-D transform, the image is divided into four subbands as shown in Fig. 2(a). These four subbands arise from the application of the vertical and horizontal filters. The subbands are:
• LL1 – Approximation coefficients at level 1
• LH1 – Horizontal detail coefficients at level 1
• HL1 – Vertical detail coefficients at level 1
• HH1 – Diagonal detail coefficients at level 1

To obtain the next level of the wavelet transform, the 2-D DWT is applied to the approximation coefficients, i.e. the coefficients in subband LLx. This process continues until a final scale is reached. Fig. 2(b) shows a 2 level wavelet decomposition of an image.

Fig. 1. Two level Discrete Wavelet Transform of a one dimensional function.

Fig. 2. (a) One Level Wavelet Decomposition of an Image. The size of each subband is a quarter of the original image. (b) Two Level Wavelet Decomposition. The size of each new subband is a sixteenth of the original image.

Fig. 3. (a) A simple transform encoder – compression of an image. (b) A transform decoder – decompression of an image.
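The row-then-column procedure of Section II-B can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses the two-tap Haar filter pair rather than the longer filters used later in the comparison, it assumes an image with even dimensions, and the function names (`haar_1d`, `haar_2d`) and the assignment of the labels LH and HL to the two mixed subbands are choices made here, not taken from the paper.

```python
from math import sqrt

def haar_1d(signal):
    """One-level 1-D Haar DWT: lowpass/highpass filtering plus down-sampling by two."""
    s = 1 / sqrt(2)
    approx = [(signal[k] + signal[k + 1]) * s for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) * s for k in range(0, len(signal), 2)]
    return approx, detail

def transform_columns(block):
    """Apply haar_1d to every column of a block; return (low, high) blocks."""
    low_cols, high_cols = [], []
    for col in zip(*block):
        a, d = haar_1d(list(col))
        low_cols.append(a)
        high_cols.append(d)
    # Transpose back so that the results are stored row by row again.
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]

def haar_2d(image):
    """One-level 2-D DWT: 1-D transform on the rows, then on the resulting columns."""
    low_rows, high_rows = [], []
    for row in image:
        a, d = haar_1d(row)
        low_rows.append(a)
        high_rows.append(d)
    ll, lh = transform_columns(low_rows)    # lowpass rows  -> LL and LH (label is an assumption)
    hl, hh = transform_columns(high_rows)   # highpass rows -> HL and HH
    return ll, lh, hl, hh

if __name__ == "__main__":
    flat = [[1.0] * 4 for _ in range(4)]    # a pure "trend" image with no edges
    ll, lh, hl, hh = haar_2d(flat)
    print(ll, lh, hl, hh)                   # all detail subbands are zero
```

Applying `haar_2d` again to the returned `ll` block gives the second decomposition level. A constant image places all of its energy in LL and leaves every detail coefficient at zero, which is the behaviour the zerotree methods in Section IV rely on.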

III. TRANSFORM CODING

Transform based compression schemes first involve the transformation of spatial information to another domain. The main aim of this first step is to compact the spatially distributed energy into fewer data samples, such that no information is lost, and to remove the redundancy in the transformed image. The transformation must have an inverse transformation, so that the image can be reconstructed in the spatial domain. Normally speaking, the transform and the inverse transform are lossless. The steps of the wavelet based transform encoder and decoder are shown in Fig. 3.

All loss of information occurs at the quantization stage. Quantizing refers to a reduction of the precision of the values of the transform. For compression to be achieved, transform values must be expressed with fewer bits for each value. It is at this point where compression algorithms take different approaches. The final stage of the coding process is the entropy coding. Entropy coding minimizes the redundancy in the bit stream and is fully invertible at the decoding stage, so it is lossless.

IV. WAVELET BASED COMPRESSION ALGORITHMS

A good quantization technique is as important to compression as a good transform. Several lossy encoding algorithms appear in the literature, with the first effective one, the EZW algorithm, being proposed by J.M. Shapiro [2].
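To make the statement that all loss occurs at the quantization stage concrete, the following minimal sketch applies a plain uniform scalar quantizer to a handful of transform values. This is an assumption made for illustration only: the encoders discussed below use successive-approximation (threshold-halving) quantization instead, and the names `quantize`, `dequantize` and `step` are hypothetical.

```python
def quantize(coeffs, step):
    """Map each transform value to an integer index; this is the lossy step."""
    return [round(c / step) for c in coeffs]

def dequantize(indices, step):
    """Rebuild approximate transform values from the integer indices."""
    return [q * step for q in indices]

if __name__ == "__main__":
    coeffs = [12.3, -0.4, 0.2, 7.9, -5.1]
    step = 4
    indices = quantize(coeffs, step)
    restored = dequantize(indices, step)
    print(indices)    # small integers need fewer bits than the original values
    print(restored)   # each value is within step/2 of the original
```

A larger step gives smaller indices (fewer bits) and a larger reconstruction error, which is the rate-distortion trade-off that every quantizer in Section IV has to manage.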

A. Embedded Zerotree Wavelet (EZW) Algorithm

In 1993, J.M. Shapiro introduced the EZW algorithm [2]. EZW coding exploited the multi-resolution nature of the wavelet decomposition to give a completely new way of doing image coding. This algorithm was the first of the wavelet image encoders to efficiently capture the low frequency information as well as the localised high frequency information, by the method of zerotree quantization. The EZW approach set a new standard in modern wavelet image coding. At that time, this algorithm outperformed other existing coding standards at low bit rates, thereby improving image compression.

Embedded coding is a process of encoding the transform magnitudes such that it allows for progressive transmission of the compressed image. Using an embedded code, an encoder can terminate the coding at any point, thereby allowing a target rate to be met exactly. The longer the coding continues, the more precision is added. The EZW algorithm has the property that the bits in the bit stream are generated in order of importance, yielding a fully embedded code.

The EZW coder is based on two observations of the wavelet transform:
• In general, natural images have a lowpass spectrum. Hence the maximum and average absolute wavelet coefficients get smaller as one moves from the lower subbands (highest scale) to the higher subbands. This shows that progressive encoding is a very natural choice.
• Large wavelet coefficients are more important than small coefficients, since they contain more information. Hence these coefficients are coded first.

Zerotrees allow for concise encoding of the positions of significant values by creating a highly compressed description of the location of insignificant values. A zerotree is a quad-tree of which all nodes are equal to or smaller than the root, and of which the root is smaller than the threshold against which the wavelet coefficients are currently being measured.
The tree is coded with a single symbol (‘ZTR’) and reconstructed by the decoder as a quad-tree filled with zeroes. A coefficient at location [i,j] in a lower frequency subband can be thought of as having four children (or descendants) in the next higher subband at locations [2i,2j], [2i+1,2j], [2i,2j+1] and [2i+1,2j+1]. Each of these children will in turn have four children in the next higher subband. A quad-tree is then a tree of locations in a wavelet transform, with a root [i,j], its children and each of their children, and so on. These descendants of the root reach all the way back to the first level of the wavelet transform. This is illustrated in Fig. 4.

EZW works efficiently because of the zerotree hypothesis, which states that if a wavelet coefficient c at a coarse scale is insignificant with respect to a given threshold T, i.e. |c| < T, then all wavelet coefficients of the same orientation at finer scales are also likely to be insignificant with respect to T. Also, by the third property of the wavelet transform (Section II-A), the transformation of natural images produces many zerotrees, especially at higher thresholds.

Fig. 4. Parent-Child Dependencies of Quadtrees. Two quadtrees are shown where the root of one is at level 3 and the other at level 2.

Fig. 5. Raster Scan Order

The EZW encoder can be broken up into five steps, which are detailed below.

1) Initialisation: Choose the initial threshold T = T0 such that all transform values satisfy |W(x,y)| < 2T0 and at least one transform value satisfies |W(x,y)| ≥ T0. Normally, $n_0 = \lfloor \log_2(\max |W(x,y)|) \rfloor$ and $T_0 = 2^{n_0}$, where W(x,y) are the transform coefficients and max indicates the maximum absolute coefficient value in the transform. A “header” is output, which consists of the transform size as well as a representation of the initial threshold, n0.

2) Dominant Pass: Each coefficient is compared to the threshold T in a specific order, to test its significance. The most commonly used scanning order is the raster scan, which is shown in Fig. 5. A coefficient is significant if |W(x,y)| ≥ T. If the coefficient is significant, it is encoded as ‘POS’ (positive) or ‘NEG’ (negative) depending on its sign. This coefficient need not be coded again at lower threshold levels; therefore its value in the transform will be set to zero. If the coefficient itself is insignificant but one of its descendants is significant, it is encoded as ‘IZ’ (isolated zero). If the zerotree hypothesis holds for the coefficient, i.e. the coefficient and all of its descendants are insignificant, it is encoded as ‘ZTR’ (zerotree root). Its descendants need not be encoded, as they will be reconstructed as zero at this threshold level. At the end of the dominant pass, all the coefficients whose absolute value is at least the current threshold are extracted and appended, without their sign, to the subordinate list, and their positions in the image are filled with zeroes.

3) Subordinate Pass: This is the refinement pass (it refines the significant coefficient values), in which the next most significant bit of every coefficient in the subordinate list is output. Each value in the subordinate list is compared to the current threshold T. If the value is larger than T, a ‘1’ is output and the current threshold is subtracted from the coefficient value in the subordinate list. If the value is smaller than the threshold, a ‘0’ is output. On completion of the subordinate pass, the subordinate list should be sorted from highest to lowest value. This is done so that the larger coefficients, which carry the most information, are at the front and are thus coded first. The entropy encoder, which follows this quantization stage, also becomes more efficient, since this pass will generate a group of ‘1’ symbols followed by a group of ‘0’ symbols, as opposed to generating these symbols in random order.

4) Decrease the Threshold: T = T/2

5) Repeat: If the threshold T is greater than the minimum threshold, repeat Steps 2 to 4. The minimum threshold value controls the encoding performance and also the bit rate. If a minimum value of “0” is specified, it gives rise to lossless reconstruction of the image.

The decoding algorithm is similar to the encoder. The size of the reconstructed transform as well as the initial threshold can be acquired from the “header”. The reconstructed transform initially contains all zero values. The decoder also consists of dominant and subordinate passes. It is important to use the same scanning order that was used by the encoder.

At low bit rates, the EZW algorithm performed very well in terms of the Peak Signal to Noise Ratio (PSNR) compared with the encoders existing at that time, because it preserves all significant coefficients at each scale. However, its drawback is that it is computationally expensive. Also, encoding of sub-images is not possible, since the entire image must be transformed before encoding begins. To reduce the computation time, Shapiro introduced a more efficient method of identifying zerotrees during the encoding stage [13]. This new algorithm utilises lookup tables instead of the original recursive search algorithm.

B. Set Partitioning in Hierarchical Trees

A number of wavelet coding methods, using the fundamental ideas of the EZW coding scheme, have been proposed. One of the most popular is the SPIHT algorithm introduced by Said and Pearlman [3]. SPIHT was able to achieve higher performance than the EZW algorithm without having to use an entropy encoder. Hence the complexity reduction was significant. This algorithm also formed the basis for future wavelet based coders. This section is therefore dedicated to detailing the important properties of this algorithm.
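Since SPIHT reuses the quadtree and threshold ideas of EZW, it may help to see those ideas in code before continuing. The sketch below covers only the initial threshold of Step 1, the parent-child relation [2i,2j], [2i+1,2j], [2i,2j+1], [2i+1,2j+1] and the choice of dominant-pass symbol for one coefficient. It is not a complete encoder and makes several assumptions: the transform is a square array with at least one non-zero value, the coarsest approximation coefficient at [0,0] is treated as having no children (its special linkage is left out), and all function names are hypothetical.

```python
from math import floor, log2

def initial_threshold(coeffs):
    """T0 = 2**n0 with n0 = floor(log2(max |W|)), as in Step 1."""
    peak = max(abs(c) for row in coeffs for c in row)
    return 2 ** floor(log2(peak))

def children(i, j, size):
    """Four children of detail coefficient [i, j] in the next finer subband."""
    if (i, j) == (0, 0) or 2 * i >= size or 2 * j >= size:
        return []   # assumption: LL root not handled; finest-level nodes are leaves
    return [(2 * i, 2 * j), (2 * i + 1, 2 * j), (2 * i, 2 * j + 1), (2 * i + 1, 2 * j + 1)]

def descendants(i, j, size):
    """All children, grandchildren, ... of [i, j] down to the first level."""
    found, stack = [], children(i, j, size)
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(children(node[0], node[1], size))
    return found

def dominant_symbol(coeffs, i, j, threshold):
    """Symbol emitted for [i, j] in the dominant pass: POS, NEG, IZ or ZTR."""
    value = coeffs[i][j]
    if abs(value) >= threshold:
        return "POS" if value > 0 else "NEG"
    size = len(coeffs)
    if any(abs(coeffs[r][c]) >= threshold for r, c in descendants(i, j, size)):
        return "IZ"    # insignificant itself, but a descendant is significant
    return "ZTR"       # the coefficient and all of its descendants are insignificant

if __name__ == "__main__":
    w = [[57, -37, 9, -4],
         [-29, 30, 3, 14],
         [5, 2, -1, 0],
         [-3, 40, 0, 2]]     # made-up two-level transform values
    t0 = initial_threshold(w)
    for pos in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(pos, dominant_symbol(w, pos[0], pos[1], t0))
```

The example values are invented so that every symbol type appears once the whole array is scanned. A full encoder would also skip the descendants of each zerotree root during the scan and then continue with the subordinate pass.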
The term hierarchical trees refers to the quadtrees, while set partitioning refers to the way these quadtrees partition the wavelet transform values at a given threshold. The SPIHT algorithm partitions the quadtrees in a manner that tends to keep insignificant coefficients together in larger subsets. The partitioning decisions are binary decisions that are transmitted to the decoder. SPIHT is a fully embedded wavelet coding algorithm that progressively refines the most significant coefficients. The ordering data is not explicitly transmitted, since the decoder can recover it from the partitioning decisions.

The following sets of coordinates are used in the SPIHT algorithm:
• H: set of coordinates of all spatial orientation tree roots (nodes in the highest pyramid level, the lowest resolution)
• O(i,j): set of coordinates of all offspring of node (i,j)
• D(i,j): set of coordinates of all descendants of node (i,j)
• L(i,j): set of coordinates of all the descendants of the coefficient at location (i,j) except for its immediate offspring, i.e. L(i,j) = D(i,j) − O(i,j)

Three lists are then used to keep track of the order in which elements are tested for significance. These lists are:
• LSP – The list of significant pixels, which contains the coordinates of coefficients that are found to be significant.
• LIP – The list of insignificant pixels, which contains the coordinates of coefficients that are insignificant.
• LIS – The list of insignificant sets, which contains the coordinates of the roots of sets of type D or L.

A set or pixel is significant if the maximum absolute coefficient value in that set is greater than or equal to the current threshold, i.e.

$S_n(\tau) = \begin{cases} 1, & \max_{(i,j) \in \tau} |c_{i,j}| \geq 2^n \\ 0, & \text{otherwise} \end{cases}$

where τ indicates a set of coordinates and the current threshold is $2^n$. The following steps implement the SPIHT encoder:

1) Initialisation: Choose the initial threshold T (same as for the EZW). Set LIP = H, set the LSP to the empty set, and add the roots in H that have descendants to the LIS as sets of type D. Output the “header”.

2) Sorting Pass:
i) Examination of the LIP:
• If the coefficient at the coordinate is significant, output a ‘1’, followed by a bit representing the sign of the coefficient (assume ‘1’ for positive, ‘0’ for negative). Move that coefficient to the LSP.
• If the coefficient at that coordinate is not significant, output a ‘0’ and keep it in the LIP.
ii) Examination of the LIS:
• If the set at coordinate (i,j) is not significant, output a ‘0’.
• If the set at coordinate (i,j) is significant, output a ‘1’. What happens next depends on whether the set is of type D or L.
• If it is of type D, check each offspring of the coefficient at that coordinate. If the offspring coefficient is significant, output a ‘1’, followed by a bit representing the sign of the coefficient (‘1’ for positive, ‘0’ for negative), and move that coefficient to the LSP. If the offspring coefficient is not significant, output a ‘0’ and add its coordinate to the LIP. Thereafter, if L(i,j) is not empty, move (i,j) to the end of the LIS as the root of a set of type L; otherwise remove (i,j) from the LIS.
• If it is of type L, add each offspring of (i,j) to the end of the LIS as the root of a set of type D, and remove (i,j) from the LIS. Note that these new entries in the LIS will be examined during this pass.

3) Refinement Pass: For each coefficient in the LSP, except those added during the current sorting pass, output the nth most significant bit of that coefficient.

4) Decrease the Threshold: T = T/2, or equivalently n = n − 1.

5) Repeat: If the threshold T is greater than the minimum threshold, repeat Steps 2 to 4.

V. COMPARISON OF MODERN WAVELET ENCODERS

For compression of images to occur, each pixel in the image must be represented with fewer bits than it already has. The bit rate defines how many bits are used to represent each pixel (bits per pixel), i.e.

$\text{bit rate } (R) = \frac{\text{number of bits output}}{\text{number of pixels}}$ bpp.

Hence the compression ratio is given by:

$\text{compression ratio} = \frac{\text{bit rate of original image}}{\text{bit rate of compressed image}}$.

Hence to obtain high compression ratios, the bit rate of the encoded image must be small. Two commonly used measures for quantifying the error between images are the mean square error (MSE) and the peak signal to noise ratio (PSNR). The MSE between two images A and B is defined as:

$\text{MSE} = \frac{1}{N} \sum_{i,j} [A(i,j) - B(i,j)]^2$

where N is the total number of pixels in each image, and the sum is over all pixels in the image. The PSNR between two images is given as:

$\text{PSNR} = 10 \log_{10} \left( \frac{\left( 2^{(\text{bit rate of original image})} - 1 \right)^2}{\text{MSE}} \right)$ dB.

Generally, if the PSNR is 40 dB or larger, the two images are virtually indistinguishable by average human observers.

The image comparisons were performed using two 512x512 greyscale images, namely Lena and Barbara [5]. The comparison charts are shown in Fig. 6 and 7 for different low output bit rates. All wavelet decompositions were executed using a 5 level 10/8 filter or a 5 level 9/7 filter. The different algorithms used are as follows:
• EZW – Embedded Zerotree Wavelet [2]
• SPIHT – Set Partitioning in Hierarchical Trees [3]
• SFQ – Space-Frequency Quantization [8]
• SR – Stack Run [9]
• PACC – Partition Aggregation, Conditioned Coding [10]
• Context Based [11]
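The measures defined in Section V (bit rate, compression ratio, MSE and PSNR) translate directly into code. The following is a minimal sketch, assuming images stored as equally sized lists of rows and an original bit rate of 8 bpp for greyscale data; the function and parameter names are hypothetical.

```python
from math import log10

def bit_rate(bits_output, num_pixels):
    """Bits per pixel (bpp) of an encoded image."""
    return bits_output / num_pixels

def compression_ratio(original_bpp, compressed_bpp):
    """Ratio of the original bit rate to the compressed bit rate."""
    return original_bpp / compressed_bpp

def mse(a, b):
    """Mean square error between two images of identical size."""
    n = len(a) * len(a[0])
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n

def psnr(a, b, original_bpp=8):
    """Peak signal to noise ratio in dB; infinite when the images are identical."""
    error = mse(a, b)
    if error == 0:
        return float("inf")
    peak = 2 ** original_bpp - 1          # 255 for 8-bit greyscale
    return 10 * log10(peak ** 2 / error)

if __name__ == "__main__":
    original = [[52, 55], [61, 59]]       # made-up pixel values
    decoded = [[54, 55], [60, 59]]
    print(mse(original, decoded))
    print(psnr(original, decoded))
    # A 512x512 image coded with 65536 bits has a bit rate of 0.25 bpp,
    # i.e. a compression ratio of 32 relative to 8 bpp.
    print(compression_ratio(8, bit_rate(65536, 512 * 512)))
```

The PSNR here follows the formula in the text, with the peak value set by the bit depth of the original image, so results for images of different bit depths are not directly comparable.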



All encoders follow the same pattern, in which the PSNR reduces as the compression increases. Therefore at some point the clarity of the image will dictate the maximum compression ratio. The results show that, even with less extensive processing, the SPIHT method outperforms the EZW algorithm. The other modern encoders have results comparable to the SPIHT algorithm, with a slight increase in performance. However, their computational expense is greater. Thus with its outstanding performance, simplicity and efficiency, the SPIHT algorithm formed the basis for the JPEG 2000 standard.

Fig. 6. Comparison Chart of Different Wavelet Encoders on the 512x512 greyscale Lena Image. The output bit-rate of the encoders was specified to be 0.25, 0.5 and 1 bpp respectively. (Chart not reproduced; horizontal axis: bit rate in bpp.)

Fig. 7. Comparison of Modern Wavelet Encoders on the 512x512 greyscale Barbara Image. The output bit-rate of the encoders was specified to be 0.25 and 0.5 bpp respectively. (Chart not reproduced; horizontal axis: bit rate in bpp.)

VI. CONCLUSION

The SPIHT algorithm now sets the benchmark for all recent wavelet based coding schemes due to its simplicity and good performance. It is the simplicity factor which allows the SPIHT algorithm to be implemented in real time on a DSP with a small adjustment to the algorithm. The reader is advised to refer to [12] for this implementation.

VII. ACKNOWLEDGMENTS

The authors would like to thank Armscor for their continuous support of this project.

REFERENCES

[1] R.C. Gonzalez and R.E. Woods, “Digital Image Processing”, 2nd Edition, Prentice Hall Inc., New Jersey, 2002.
[2] J.M. Shapiro, “Embedded image coding using zerotrees of wavelet coefficients,” IEEE Trans Signal Processing, vol. 41, pp. 3445-3462, Dec. 1993.
[3] A. Said and W.A. Pearlman, “A new fast and efficient image codec based on Set Partitioning in Hierarchical Trees,” IEEE Trans Signal Processing, Vol.6, June 1996.
[4] A. Skodras, C. Christopoulous, and T. Ebrahimi, “The JPEG2000 still image compression standard,” IEEE Signal Processing Mag., vol.18, pp. 36-58, Sept. 2001.
[5] Image Communications Lab, “Wavelet Image Coding: PSNR Results”. UCLA School of Engineering and Applied Sciences. Available /~ipl/psnr_results.html.
[6] B.E. Usevitch, “A Tutorial on modern lossy wavelet image compression,” IEEE Trans Signal Processing, vol. 18, pp. 22-35, Sept. 2001.
[7] S.M. LoPresto, K. Ramchandran, and M.T. Orchard, “Image coding based on mixture modeling of wavelet coefficients and a fast estimation-quantization framework,” IEEE Data Compression Conference '97 Proceedings, pp. 221-230, March 1997.
[8] Z. Xiong, K. Ramchandran, and M.T. Orchard, “Space-frequency quantization for wavelet image coding,” to appear in IEEE Trans. Image Processing, 1997.
[9] M.J. Tsai, J. Villasenor, and F. Chen, “Stack-run image coding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 6, pp. 519-521, October 1996.
[10] D. Marpe and H.L. Cycon, “Efficient Pre-Coding Techniques for Wavelet-Based Image Compression,” submitted to PCS, Berlin, 1997.
[11] C. Chrysafis and A. Ortega, “Efficient context-based entropy coding for lossy wavelet image compression,” DCC, Data Compression Conference, Snowbird, UT, March 25 - 27, 1997.
[12] Y. Sun, H. Zhang and G. Hu, “Real-Time Implementation of a New Low-Memory SPIHT Image Coding Algorithm Using DSP Chip”, IEEE Trans. on Image Processing, Vol. 11, No.9, September 2002.
[13] J.M. Shapiro, “A Fast Technique for Identifying Zerotrees in the EZW Algorithm”, Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 1455-1458, May 1996.
[14] E.K.R. Rao and P.C. Yip, “The Transform and Data Compression Handbook”, Boca Raton, CRC Press LLC, 2001.
[15] L.R. Iyer, “Image Compression Using Balanced Multiwavelets”, Virginia Polytechnic Institute and State University, June 2001.
Yarish Brijmohan received the B.Sc. degree in Electronic Engineering from the University of Natal, Durban, South Africa in 2002. He is currently pursuing an M.Sc. degree in Electronic Engineering at the School of Electrical, Electronic and Computer Engineering at the University of Natal. His research interests include image and video coding using wavelet transforms and fractals.

Stanley Mneney obtained his B.Sc. (Hons) Eng. degree from the University of Science and Technology, Kumasi, Ghana in 1976. He was awarded the Charles Deakens prize upon graduation. In 1979 he completed his M.ASc. at the University of Toronto in Canada. In a Nuffic project funded by the Netherlands government, he embarked on a sandwich Ph.D programme between the Eindhoven University of Technology and the University of Dar es Salaam, the latter awarding the degree in 1988. In 1983 he was awarded the Pierro Fanti international prize by INTELSAT and TELESPAZIO. He has taught at the Universities of Dar es Salaam, Nairobi, Durban Westville and now the University of Natal, where he holds the position of Associate Professor. His research interests include theory and performance of telecommunication systems, low cost rural telecommunications services and networks, digital signal processing applications, EMC, and RF design.

