(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8 No. 8, 2010
http://sites.google.com/site/ijcsis/ ISSN 1947-5500

Retrieval of Bitmap Compression History

Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Kahlifa
Faculty of Computer and Information Sciences
Ain Shams University
Cairo, Egypt
{s.hamdy, hmessiry, mroushdy, esskhalifa}@cis.asu.edu.eg

Abstract—The histogram of Discrete Cosine Transform (DCT) coefficients contains information on the compression parameters of JPEGs and of previously JPEG-compressed bitmaps. In this paper we extend the work in [1] to identify previously compressed bitmaps and to estimate, from the peaks of the histogram of DCT coefficients, the quantization table that was used for compression. This helps establish a bitmap's compression history, which is particularly useful in applications such as image authentication, JPEG artifact removal, and JPEG recompression with less distortion. Furthermore, the estimated table is used to calculate distortion measures that classify the bitmap as genuine or forged. The method shows a good average estimation accuracy of around 92.88%, compared with MLE and autocorrelation methods. In addition, because bitmaps do not experience data loss, detecting inconsistencies becomes easier. Detection resulted in average false negative rates of 3.81% and 2.26% for the two distortion measures, respectively.

Keywords: Digital image forensics; forgery detection; compression history; Quantization tables.

I. INTRODUCTION

Although JPEG is the most widely used image format, images are sometimes saved in an uncompressed raster form (bmp, tiff), and in most situations no knowledge of previous processing is available. Some applications are required to receive images as bitmaps with instructions for rendering at a particular size and without further information. The image may have been processed, perhaps compressed, and may therefore contain severe compression artifacts. Hence, it is useful to determine the bitmap's history: whether the image has ever been compressed using the JPEG standard and, if so, which quantization tables were used. Most artifact removal algorithms [2-9] require knowledge of the quantization table to estimate the amount of distortion caused by quantization and to avoid over-blurring. In other applications, knowing the quantization table can help avoid further distortion when recompressing the image. Some methods try to identify bitmap compression history using Maximum Likelihood Estimation (MLE) [10-11], by modeling the distribution of quantized DCT coefficients, as in the use of Benford's law [12], or by modeling acquisition devices [13].

Furthermore, due to the nature of digital media and advanced digital image processing techniques, digital images may be altered and redistributed very easily, forming a rising threat in the public domain. Hence, ensuring that media content is credible and has not been altered is becoming an important issue in governmental security and commercial applications. As a result, research is being conducted to develop authentication methods and tamper detection techniques. JPEG compression usually introduces blocking artifacts, so one of the standard passive approaches is to use inconsistencies in these blocking fingerprints as a reliable indicator of possible tampering [14]. These inconsistencies can also be used to determine which method of forgery was used.

In this paper we are interested in the authenticity of the image. We extend the work in [1] to bitmaps and use the proposed method to identify previously compressed bitmaps and to estimate the quantization table that was used. The estimated table is then used to determine whether the image was forged by calculating distortion measures. In Section II we study the histogram of DCT AC coefficients of bitmaps and show how it differs for previously JPEG-compressed bitmaps. We then validate that, without modeling rounding errors or calculating prior probabilities, the quantization steps of previously compressed bitmaps can still be determined directly from the peaks of the approximated histograms of DCT coefficients. Results are discussed in Section III, and Section IV concludes the paper.

II. HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS

We studied in [1] the histogram of quantized DCT coefficients and showed how it can be used to estimate quantization steps. Here, we study uncompressed images and validate that the approximated histogram of DCT coefficients can be used to determine compression history. A bitmap image involves no data loss, so all the information required to build an informative histogram is expected to be present in the coefficient histograms.

The first step is to decide whether the test image was previously compressed, because if the image is an original uncompressed one, there is no compression data to extract. When the image is found to have a compression history, the next step is to estimate that history. For a grayscale image, compression history mainly means its quantization table, which is the focus of this paper. For a color image, this extends to estimating color-plane compression parameters, including subsampling and the associated interpolation.

Fig. 1. Histograms of X*(3,3): (a) Lena image; (b) uncompressed; (c) JPEG compressed, Q(3,3) = 6; (d) previously compressed bmp.

Fig. 2. (a) |X*(3,3)|, where Hmax occurs at Q(3,3) = 6; (b) |X*(3,4)|, where Hmax occurs at Q(3,4) = 10; (c) |X*(5,4)|, where Hmax occurs at Q(5,4) = 22; (d) |X*(7,5)|, where Hmax occurs at Q(7,5) = 41.

Fig. 1(b) shows the approximated histogram H* of the DCT coefficient at position (3,3) of the luminance channel of an uncompressed Lena image, and Fig. 1(c) shows the histogram of the same image after JPEG compression with quality factor 80. The latter clearly contains periodic patterns that are not present in the uncompressed version. It has been observed that such a coefficient is very likely to have been quantized with a step equal to this period [15]. Now, if that JPEG was stored in uncompressed bitmap form, we expect the DCT coefficients to behave the same way, because nothing is lost during this format change. This is evident in Fig. 1(d), which shows a histogram identical to the one in Fig. 1(c). Hence, similar to the argument in [1], if we closely observe the histogram H*(i,j) outside the main lobe, we notice that the maximum peak occurs at a value equal to the quantization step used to quantize Xq(i,j). This observation applies to most low-frequency AC coefficients. Fig. 2(a) and (b) show |H|, the absolute histograms of DCT coefficients for the Lena image of Fig. 1(a) at frequencies (3,3) and (3,4), respectively. For high frequencies, the maximum occurred at a value matching Q(i,j) when |X*(i,j)| > B (Fig. 2(c) and (d)), where B is given by

$$\left|X^*(i,j) - X_q(i,j)\right| \le B(i,j) = \Gamma \sum_{u,v} 0.5\, c(u)\, c(v) \left|\cos\frac{(2u+1)i\pi}{16}\,\cos\frac{(2v+1)j\pi}{16}\right| \tag{1}$$

where Xq(i,j) is the quantized coefficient, X*(i,j) is the approximated quantized coefficient, Γ is the round-off error, and

$$c(\xi) = \begin{cases} 1/\sqrt{2}, & \xi = 0 \\ 1, & \text{otherwise} \end{cases}$$

See [1, 11].

Sometimes there is not enough information to determine Q(i,j) for high frequencies (i,j). This happens when the histogram outside the main lobe decays rapidly to zero, showing no periodic structure, which reflects small or zero coefficient values. In such cases, it can be useful to estimate as many of the low frequencies as possible and then search lookup tables for a matching standard table.

Estimating the quantization table of a bitmap can help determine part of its compression history. If all (or most) of the low-frequency steps are estimated to be ones, we can conclude that the image did not go through previous compression. High frequencies may bias this decision because they contribute very little and do not provide a good estimate. Moreover, this method also works well for uncompressed or losslessly compressed tiff images. Fig. 3(d) shows the Q table of a tiff image taken from UCID [16], 96.7% of which was correctly estimated using the above method. The X's mark the "undetermined" coefficients.

Now, to verify the authenticity of the image, we use the same distortion measures used in [1]. The average distortion measure is calculated as a function of the remainders of the DCT coefficients with respect to the original Q matrix:

$$B_1 = \sum_{i=1}^{8}\sum_{j=1}^{8} \operatorname{mod}\big(D(i,j),\, Q(i,j)\big) \tag{2}$$

where D(i,j) and Q(i,j) are the DCT coefficient and the corresponding quantization table entry at position (i,j), respectively. An image block with a large average distortion value is very different from what it should be and is likely to belong to a forged image. Averaged over the entire image, this measure can be used to decide on the authenticity of the image.

In addition, the JPEG 8×8 "blocking effect" is still present to some extent in the uncompressed version, and hence the blocking artifact measure, BAM [14], can be used to estimate the distortion of the image. It is computed from the Q table as:

$$B_2(n) = \sum_{i=1}^{8}\sum_{j=1}^{8} \left| D(i,j) - Q(i,j)\, \operatorname{round}\!\left(\frac{D(i,j)}{Q(i,j)}\right) \right| \tag{3}$$

where B_2(n) is the estimated blocking artifact for the nth block.
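The maximum-peak rule described in this section can be sketched in Python as follows. This is a minimal, unoptimized illustration under stated assumptions, not the authors' implementation: it uses the JPEG-style 8×8 DCT (level shift by 128, zero-based frequency indices), and instead of the bound B of Eq. (1) it assumes the main lobe ends at the first histogram bin where the counts stop decreasing. All function names are hypothetical. A returned step of 1 corresponds to the "no previous compression" case, and None plays the role of an undetermined ("X") entry.

```python
import math
from collections import Counter

N = 8  # JPEG block size

def _c(k):
    # Normalization factor: 1/sqrt(2) for k = 0, 1 otherwise.
    return 1 / math.sqrt(2) if k == 0 else 1.0

# _COS[k][x] = cos((2x + 1) k pi / 16), the 8-point DCT basis.
_COS = [[math.cos((2 * x + 1) * k * math.pi / 16) for x in range(N)] for k in range(N)]

def block_dct(block):
    """JPEG-style 2-D DCT of one 8x8 block of pixel values (rows, then columns)."""
    shifted = [[p - 128 for p in row] for row in block]
    # Row pass: tmp[x][v] = sum_y f(x, y) cos((2y + 1) v pi / 16)
    tmp = [[sum(shifted[x][y] * _COS[v][y] for y in range(N)) for v in range(N)]
           for x in range(N)]
    # Column pass plus the 1/4 c(u) c(v) normalization.
    return [[0.25 * _c(u) * _c(v) * sum(_COS[u][x] * tmp[x][v] for x in range(N))
             for v in range(N)] for u in range(N)]

def collect_coefficients(image):
    """Map each zero-based frequency (i, j) to X*(i, j) from every full 8x8 block."""
    coeffs = {(i, j): [] for i in range(N) for j in range(N)}
    for r in range(0, len(image) - N + 1, N):
        for col in range(0, len(image[0]) - N + 1, N):
            d = block_dct([row[col:col + N] for row in image[r:r + N]])
            for i in range(N):
                for j in range(N):
                    coeffs[(i, j)].append(d[i][j])
    return coeffs

def estimate_step(values):
    """Location of the highest |X*| histogram peak outside the main lobe,
    or None when the histogram shows no secondary peak."""
    if not values:
        return None
    hist = Counter(int(round(abs(v))) for v in values)
    counts = [hist.get(k, 0) for k in range(max(hist) + 1)]
    # Assumption: the main lobe ends where the counts stop decreasing.
    end = 0
    while end + 1 < len(counts) and counts[end + 1] < counts[end]:
        end += 1
    outside = range(end + 1, len(counts))
    if not outside:
        return None  # histogram decays to zero: undetermined ("X")
    # max() keeps the first maximum, so Q wins ties against 2Q, 3Q, ...
    return max(outside, key=lambda k: counts[k])

def estimate_table(image):
    """Estimate all 64 steps of a grayscale bitmap given as a list of pixel rows."""
    coeffs = collect_coefficients(image)
    return [[estimate_step(coeffs[(i, j)]) for j in range(N)] for i in range(N)]
```

The pure-Python loops keep the arithmetic visible; for 640×480 images a vectorized DCT would be much faster. A table whose low-frequency entries are mostly 1 points to a bitmap that was never JPEG compressed, in line with the decision rule given above.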
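Eqs. (2) and (3) translate almost directly into code. The sketch below assumes D is the 8×8 DCT block of the bitmap, computes mod() with a result in [0, Q) (as MATLAB's mod does for positive Q), and rounds halves away from zero; skipping undetermined (None) table entries and averaging the per-block values over the image are our assumptions. The function names are hypothetical.

```python
import math

def round_half_away(x):
    # Round to the nearest integer, rounding halves away from zero.
    return math.copysign(math.floor(abs(x) + 0.5), x)

def block_measures(D, Q):
    """Return (B1, B2) for one 8x8 block, following Eq. (2) and Eq. (3)."""
    b1 = b2 = 0.0
    for i in range(8):
        for j in range(8):
            q = Q[i][j]
            if q is None:  # undetermined ("X") entry: skipped (assumption)
                continue
            d = D[i][j]
            b1 += d % q  # for positive q, Python's % returns a value in [0, q)
            b2 += abs(d - q * round_half_away(d / q))
    return b1, b2

def image_measures(dct_blocks, Q):
    """Average B1 and B2 over all DCT blocks of an image."""
    pairs = [block_measures(D, Q) for D in dct_blocks]
    n = len(pairs)
    return sum(p[0] for p in pairs) / n, sum(p[1] for p in pairs) / n
```

For Q(i,j) = 6, a coefficient of -13 contributes 5 to B1 but only 1 to B2, since B2 measures the distance to the nearest multiple of Q while mod() measures the distance to the multiple just below.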
Fig. 3. Estimating Q table for original and previously compressed tif image: (a) test image; (b) estimated Q for the uncompressed version (most low frequencies are ones); (c) estimated Q for the previously compressed version with QF = 80; (d) difference between (c) and the original table for QF = 80.

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Estimation Accuracy

Our test set consisted of 550 images collected from different sources (more than five camera models), in addition to some from the public-domain Uncompressed Color Image Database (UCID), which provides a benchmark for image processing analysis [16]. Each of these images was compressed with different quality factors (60, 70, 80, and 90). Each compressed version was then decompressed and resaved as a bitmap, yielding 550×4 = 2,200 untouched images. For each quality factor group, an image's histogram of DCT coefficients at a given frequency was generated and used to determine the corresponding quantization step at that frequency according to Section II. This was repeated for all 64 histograms of DCT coefficients. The resulting quantization table was compared to the known table for that quality factor, and the percentage of correctly estimated coefficients was recorded. The estimated table was also used in equations (2) and (3) to determine the image's average distortion and blocking artifact measures, respectively. These values were recorded and later used to set a threshold for distinguishing forgeries from untouched images.

Table I shows the accuracy of estimating all 64 entries using the proposed method for each quality factor, averaged over the whole set. It exhibits behavior similar to that of JPEG images: as the quality factor increases, estimation accuracy increases steadily, with an expected drop for quality factors higher than 90, as the periodic structure becomes less prominent and the bumps are no longer separated enough. Overall, the estimation accuracy is higher than that for JPEG images [1]. We expected this, because lossy compression tends to reduce the data available for making a good estimate. The average estimation time for all 64 entries of 640×480 images was 52.7 seconds across the different QFs.

TABLE I.
PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS FOR SEVERAL QFS

QF         60       70       80       90
BMP        82.07%   84.80%   87.44%   89.44%
JPEG [1]   72.03%   76.99%   82.36%   88.26%

Estimating Q using MLE methods [10-11] is based on searching all possible Q(i,j) for each DCT coefficient over the whole image, which can be computationally exhaustive for large files. Another method [12] proposed a logarithmic law, arguing that the distribution of the first digit of DCT coefficients follows a generalized Benford's law. The method re-compresses the test image with several quality factors and fits the distribution of DCT coefficients of each version to the proposed law. The QF of the version with the smallest fitting error is chosen, and its corresponding Q table is the desired one. Of course, these methods can only estimate standard compression tables. Although this approach may be accurate, it is time consuming. In addition, it fails when the recompression quantization step is an integer multiple of the original compression step size. Another method [17] calculates the autocorrelation function of the histogram of DCT coefficients. The displacement corresponding to the peak closest to the peak at zero is taken as Q(i,j), provided that this peak is higher than the mean value of the autocorrelation function. The method ultimately uses a hybrid approach: the low-frequency coefficients are determined directly from the autocorrelation function, while the higher-frequency ones are estimated by matching the estimated part to standard JPEG tables scaled by a factor s, which is determined from the known coefficients.

Table II shows the estimation accuracy and Table III the estimation time of the methods mentioned above, compared with ours. Note that accuracy was calculated for directly estimating only the first nine AC coefficients, without matching, because these methods fail to estimate high-frequency coefficients, most of which are quantized to zero. On the other hand, the listed time covers estimating the nine coefficients and then retrieving the whole matching table from JPEG standard lookup tables. The maximum peak method is faster than statistical modeling and nearly as fast as the autocorrelation method, while its average accuracy is far higher. MLE is reliable, with 83% accuracy, but takes more than double the time. The Benford's law based method has an accuracy of 75% but is the slowest, because recompressing the image and calculating distributions for each compressed version can become time consuming for larger images. Images used in the experiments were of size 640×480.

TABLE II.
ESTIMATION ACCURACY (%) FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

Method     QF=50   QF=60   QF=70   QF=80   QF=90   QF=100   Avg. Acc.
MLE        75.31   83.10   90.31   96.34   93.83   59.5     83.06
Benford    99.08   87.59   80.82   93.81   59.47   31.53    75.38
Auto.      48.94   50.37   63.71   81.43   65.37   57.50    61.22
Max.Peak   97.93   97.07   99.01   97.67   89.57   76.04    92.88

TABLE III.
ESTIMATION TIME IN SECONDS FOR THE FIRST 3×3 AC COEFFICIENTS FOR SEVERAL QFS

Method     QF=50   QF=60   QF=70   QF=80   QF=90   QF=100
MLE        38.73   37.33   37.44   37.36   37.32   34.14
Benford    59.95   58.67   58.70   58.72   58.38   80.04
Auto.      9.23    11.11   11.10   11.12   11.24   8.96
Max.Peak   11.27   11.29   11.30   11.30   11.30   11.56

B. Forgery Detection

From the untouched previously compressed bitmap image set, we selected 500 images for each quality factor, each of which was subjected to four common forgeries: cropping, rotation, compositing, and brightness changes. Cropping forgeries were made by deleting some columns and rows from the original image. For rotation forgeries, an image was rotated by 270°. Copy-paste forgeries were made by randomly copying a block of pixels from an arbitrary image and placing it in the original image. Random values were added to every pixel of the image to simulate a brightness change. The resulting fake images were then stored in uncompressed form, for a total of (500×4) × 4 = 8,000 images. Next, the quantization table of each of these images was estimated as above and used to calculate the image's average distortion measure (2) and blocking artifact measure (3).

Fig. 4. Distortion measures for untouched and tampered images: (a) average distortion measure; (b) blocking artifact measure.

Fig. 4(a) and (b) show the values of the average distortion measure and the blocking artifact measure, respectively. The scattered dots represent 500 untouched images (averaged over all quality factors for each image), while the cross marks represent 500 images from the forged dataset. As the figure shows, values from forged images tend to cluster higher than those from untampered images. We tested the distortion measure for untouched images against several threshold values and calculated the corresponding false positive rate, FPR (the proportion of untouched images declared tampered). An ideal threshold would give zero false positives. However, we also had to take into account the false negatives (tampered images declared untampered) that may occur when testing for forgeries. Hence, we require a threshold that keeps both the FPR and the FNR low. For the average distortion measure, we selected a threshold that gave an FPR of 10.8% and as low an FNR as possible across the different types of forgery. The horizontal line marks this threshold, τ = 50. Similarly, we selected the BAM threshold to be τ = 40, with a corresponding FPR of 5.6%.

Table IV shows the false negative rate (FNR) for the different forgeries at different quality factors for bitmaps and JPEGs. As expected, as QF increases, a better estimate of the quantization matrix of the original untampered image is obtained, and as a result the error percentage decreases. Note that the rates for bitmaps are lower than those for JPEG files. Note also that cropping can be detected when the cropping process breaks the natural JPEG grid, that is, when the removed rows or columns do not fall in line with the 8×8 blocking. Similarly, when the pasted part fails to fit perfectly into the original JPEG-compressed image, the distortion metric exceeds the detection threshold, and a possible composite is declared. Fig. 5 shows examples of composites; the resulting distortion measures for each composite are shown in the left panel. Dark parts denote low distortion, whereas brighter parts indicate high distortion values. Note that the highest values correspond to the alien part and hence mark the forged area.

TABLE IV.
FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS

Distortion Measure   Format   Original   Cropping   Rotation   Compositing   Brightness
Average              JPEG     12.6%      9.2%       7.55%      8.6%          6.45%
Average              BMP      10.8%      3.9%       4.45%      2.0%          4.9%
BAM                  JPEG     6.8%       3.3%       5.95%      3.15%         5.0%
BAM                  BMP      5.6%       1.05%      3.05%      1.25%         3.7%

IV. CONCLUSIONS

The method discussed in this paper uses the approximated histogram of DCT coefficients of bitmaps to extract the image's compression history, namely its quantization table. The extracted table is also used to expose image forgeries. When tested on a large set of images from different sources, the method achieved high estimation accuracy compared to other statistical approaches. Moreover, its estimation times were faster than those of statistical methods while maintaining very good accuracy for lower frequencies. Experimental results also showed that performance on bitmaps surpasses that on JPEGs, owing to the lossy nature of JPEG compression; on the other hand, processing a bitmap takes more time.

REFERENCES

[1] Hamdy S., El-Messiry H., Roushdy M. I., Kahlifa M. E., "Forgery detection in JPEG compressed images," JAR-Unpublished, 2010.
[2] Rosenholtz R., Zakhor A., "Iterative procedures for reduction of blocking effects in transform image coding," IEEE Trans. Circuits Syst. Video Technol., vol. 2, pp. 91–94, Mar. 1992.
[3] Fan Z., Eschbach R., "JPEG decompression with reduced artifacts," Proc. IS&T/SPIE Symp. Electronic Imaging: Image and Video Compression, San Jose, CA, Feb. 1994.
[4] Fan Z., Li F., "Reducing artifacts in JPEG decompression by segmentation and smoothing," Proc. IEEE Int. Conf. Image Processing, vol. II, pp. 17–20, 1996.
[5] Tan K. T., Ghanbari M., "Blockiness detection for MPEG-2-coded video," IEEE Signal Process. Lett., vol. 7, pp. 213–215, Aug. 2000.
[6] Minami S., Zakhor A., "An optimization approach for removing blocking effects in transform coding," IEEE Trans. Circuits Syst. Video Technol., vol. 5, pp. 74–82, Apr. 1995.
[7] Yang Y., Galatsanos N. P., Katsaggelos A. K., "Regularized reconstruction to reduce blocking artifacts of block discrete cosine transform compressed images," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 421–432, Dec. 1993.
[8] Luo J., Chen C. W., Parker K. J., Huang T. S., "Artifact reduction in low bit rate DCT-based image compression," IEEE Trans. Image Process., vol. 5, pp. 1363–1368, 1996.
[9] Chou J., Crouse M., Ramchandran K., "A simple algorithm for removing blocking artifacts in block-transform coded images," IEEE Signal Process. Lett., vol. 5, pp. 33–35, 1998.
[10] Fan Z., de Queiroz R. L., "Maximum likelihood estimation of JPEG quantization table in the identification of bitmap compression history," Proc. Int. Conf. Image Process. '00, vol. 1, pp. 948–951, Sept. 10–13, 2000.
[11] Fan Z., de Queiroz R. L., "Identification of bitmap compression history: JPEG detection and quantizer estimation," IEEE Trans. Image Process., vol. 12, no. 2, pp. 230–235, Feb. 2003.
[12] Fu D., Shi Y. Q., Su W., "A generalized Benford's law for JPEG coefficients and its applications in image forensics," Proc. SPIE Secur., Steganography, and Watermarking of Multimed. Contents IX, vol. 6505, pp. 1L1–1L11, 2007.
[13] Swaminathan A., Wu M., Ray Liu K. J., "Digital image forensics via intrinsic fingerprints," IEEE Trans. Inf. Forensics Secur., vol. 3, no. 1, pp. 101–117, Mar. 2008.
[14] Ye S., Sun Q., Chang E.-C., "Detecting digital image forgeries by measuring inconsistencies in blocking artifacts," Proc. IEEE Int. Conf. Multimed. and Expo, pp. 12–15, July 2007.
[15] Fridrich J., Goljan M., Du R., "Steganalysis based on JPEG compatibility," SPIE Multimedia Systems and Applications, vol. 4518, pp. 275–280, Denver, CO, Aug. 2001.
[16] Schaefer G., Stich M., "UCID – An Uncompressed Color Image Database," Technical Report, School of Computing and Mathematics, Nottingham Trent University, U.K., 2003.
[17] Petkov A., Cottier S., "Image quality estimation for JPEG-compressed images without the original image," EE398 Projects – Image and Video Compression, Stanford University, Mar. 2008.

Fig. 5. Distortion measures for some composite bitmap images: (a) three composite bitmap images; (b) distortion measure for the three images in (a). The left panel represents the average distortion measure, while the right panel represents the blocking artifact measure.
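To score an estimate as in Section III.A, the estimated table is compared entry by entry with the known table for the quality factor. The sketch below assumes the test images were compressed with the standard luminance table from Annex K of the JPEG specification, scaled by the IJG libjpeg quality formula; the paper does not state which encoder was used. Counting undetermined (None) entries as errors is also our assumption, and the function names are hypothetical.

```python
# Luminance quantization table from Annex K of the JPEG standard (ITU-T T.81).
BASE_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def ijg_table(qf):
    """Scale the base table to quality factor qf (1..100) as libjpeg does."""
    qf = min(100, max(1, qf))
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    # Integer rounding, then clamp each entry to the baseline range [1, 255].
    return [[min(255, max(1, (b * scale + 50) // 100)) for b in row] for row in BASE_LUMA]

def percent_correct(estimated, known):
    """Percentage of the 64 entries estimated exactly; None counts as wrong."""
    hits = sum(1 for i in range(8) for j in range(8) if estimated[i][j] == known[i][j])
    return 100.0 * hits / 64
```

With this scaling, QF = 50 reproduces the base table, QF = 100 gives a table of all ones, and QF = 80 gives the first row 6, 4, 4, 6, 10, 16, 20, 24.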
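The threshold choice in Section III.B trades the false positive rate against the false negative rate. The sketch below computes both rates for a candidate threshold τ, declaring an image tampered when its distortion measure exceeds τ (forged images cluster higher in Fig. 4), and then picks the candidate with the lowest FNR among those whose FPR stays under a chosen cap. The paper selected its thresholds (τ = 50 and τ = 40) by inspection; this cap-based rule and the function names are illustrative assumptions.

```python
def error_rates(untouched, forged, tau):
    """Return (FPR, FNR) when scores above tau are declared tampered."""
    fpr = sum(s > tau for s in untouched) / len(untouched)
    fnr = sum(s <= tau for s in forged) / len(forged)
    return fpr, fnr

def pick_threshold(untouched, forged, candidates, max_fpr):
    """Among candidates with FPR <= max_fpr, return the one with the lowest FNR
    (ties broken by lower FPR); None if no candidate meets the cap."""
    best = None
    for tau in candidates:
        fpr, fnr = error_rates(untouched, forged, tau)
        if fpr <= max_fpr and (best is None or (fnr, fpr) < best[0]):
            best = ((fnr, fpr), tau)
    return None if best is None else best[1]
```

For instance, with untouched scores [10, 20, 30, 60] and forged scores [40, 70, 80, 90], τ = 50 gives FPR = FNR = 0.25; tightening the FPR cap pushes the chosen τ upward at the cost of a higher FNR.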