Retrieval of Bitmap Compression History

					                                                             (IJCSIS) International Journal of Computer Science and Information Security,
                                                                                                                       Vol. 8 No. 8, 2010

               Retrieval of Bitmap Compression History

                           Salma Hamdy, Haytham El-Messiry, Mohamed Roushdy, Essam Khalifa
                                          Faculty of Computer and Information Sciences
                                                      Ain Shams University
                                                          Cairo, Egypt
                                    {s.hamdy, hmessiry, mroushdy, esskhalifa}

Abstract: The histogram of Discrete Cosine Transform coefficients contains information on the
compression parameters for JPEGs and previously JPEG compressed bitmaps. In this paper we extend
the work in [1] to identify previously compressed bitmaps and estimate the quantization table that
was used for compression, from the peaks of the histogram of DCT coefficients. This can help in
establishing bitmap compression history, which is particularly useful in applications like image
authentication, JPEG artifact removal, and JPEG recompression with less distortion. Furthermore,
the estimated table is used to calculate distortion measures that classify the bitmap as genuine
or forged. The method shows an average estimation accuracy of 92.88%, higher than that of MLE and
autocorrelation methods. In addition, because bitmaps do not experience data loss, detecting
inconsistencies becomes easier. Detection performance resulted in average false negative rates of
3.81% and 2.26% for the two distortion measures, respectively.

Keywords: Digital image forensics; forgery detection; compression history; Quantization tables.

                      I.     INTRODUCTION

    Although JPEG is the most widely used image format, images are sometimes saved in an
uncompressed raster form (bmp, tiff), and in most situations no knowledge of previous processing
is available. Some applications are required to receive images as bitmaps with instructions for
rendering at a particular size and without further information. The image may have been processed
and perhaps compressed, and may contain severe compression artifacts. Hence, it is useful to
determine the bitmap history: whether the image has ever been compressed using the JPEG standard,
and which quantization tables were used. Most artifact removal algorithms [2-9] require knowledge
of the quantization table to estimate the amount of distortion caused by quantization and to avoid
over-blurring. In other applications, knowing the quantization table can help in avoiding further
distortion when recompressing the image. Some methods try to identify bitmap compression history
using Maximum Likelihood Estimation (MLE) [10-11], by modeling the distribution of quantized DCT
coefficients, for example with Benford's law [12], or by modeling acquisition devices [13].
    Furthermore, due to the nature of digital media and advanced digital image processing
techniques, digital images may be altered and redistributed very easily, forming a rising threat
in the public domain. Hence, ensuring that media content is credible and has not been altered is
becoming an important issue in governmental security and commercial applications. As a result,
research is being conducted on developing authentication methods and tamper detection techniques.
JPEG compression usually introduces blocking artifacts, and hence one of the standard passive
approaches is to use inconsistencies in these blocking fingerprints as a reliable indicator of
possible tampering [14]. These can also be used to determine what method of forgery was used.
    In this paper we are interested in the authenticity of the image. We extend the work in [1] to
bitmaps and use the proposed method for identifying previously compressed bitmaps and estimating
the quantization table that was used. The estimated table is then used to determine whether the
image was forged by calculating distortion measures.
    In section 2 we study the histogram of DCT AC coefficients of bitmaps and show how it differs
for previously JPEG compressed bitmaps. We then validate that, without modeling rounding errors or
calculating prior probabilities, the quantization steps of previously compressed bitmaps can still
be determined directly from the peaks of the approximated histograms of DCT coefficients. Results
are discussed in section 3. Section 4 concludes the paper.

                 II.   HISTOGRAM OF DCT COEFFICIENTS IN BITMAPS

    In [1] we studied the histogram of quantized DCT coefficients and showed how it can be used to
estimate quantization steps. Here, we study uncompressed images and validate that the approximated
histogram of DCT coefficients can be used to determine compression history. Storing an image as a
bitmap involves no data loss, and hence all that is required to build an informative histogram is
expected to be present in the coefficient histograms.
    The first step is to decide whether the test image was previously compressed, because if the
image is an original uncompressed one there is no compression data to extract. When the image is
found to have a compression history, the next step is to estimate that history. For a grayscale
image, compression history mainly means its quantization table, which is the focus of this paper.
For a color image, this extends to estimating the color plane compression parameters, which
include subsampling and the associated interpolation.

                                                                                                     ISSN 1947-5500

    Fig. 1. Histograms of X*(3,3). (a) Lena image. (b) Uncompressed. (c) JPEG compressed,
    Q(3,3) = 6. (d) Previously compressed bmp.

    Fig. 2. (a) |X*(3,3)| where Hmax occurs at Q(3,3) = 6. (b) |X*(3,4)| where Hmax occurs at
    Q(3,4) = 10. (c) |X*(5,4)| where Hmax occurs at Q(5,4) = 22. (d) |X*(7,5)| where Hmax occurs
    at Q(7,5) = 41.

    Fig. 1(b) and (c) show the approximated histogram H* of the DCT coefficient at position (3,3)
of the luminance channel of an uncompressed Lena image, and the histogram of the same image after
being JPEG compressed with quality factor 80. It is clear that the latter contains periodic
patterns that are not present in the uncompressed version. It was observed that the coefficient is
very likely to have been quantized with a step equal to this period [15]. Now if that JPEG is
stored in an uncompressed bitmap form, we expect the DCT coefficients to have the same behavior
because nothing is lost during this format change. This is evident in Fig. 1(d), which shows a
histogram identical to the one in Fig. 1(c). Hence, similar to the argument in [1], if we closely
observe the histogram H*(i,j) outside the main lobe, we notice that the maximum peak occurs at a
value equal to the quantization step used to quantize Xq(i,j). This observation applies to most
low frequency AC coefficients. Fig. 2(a) and (b) show |H|, the absolute histograms of DCT
coefficients for Lena of Fig. 1(a) at frequencies (3,3) and (3,4), respectively. As for high
frequencies, the maximum occurred at a value matching Q(i,j) when |X*(i,j)| > B (Fig. 2(c) and
(d)), where B is as follows:

\[
|\Gamma| = \left| X^{*}(i,j) - X_q(i,j) \right| \le B(i,j), \qquad
B(i,j) = \sum_{u,v} 0.5\, c(u)\, c(v)
         \left| \cos\frac{(2u+1)\, i\pi}{16} \cos\frac{(2v+1)\, j\pi}{16} \right|
\tag{1}
\]

where Xq(i,j) is the quantized coefficient, X*(i,j) is the approximated quantized coefficient,
Γ is the round off error, and

\[
c(\xi) = \begin{cases} 1/\sqrt{2} & \text{for } \xi = 0 \\ 1 & \text{otherwise} \end{cases}
\]

See [1, 11].
    Sometimes we do not have enough information to determine Q(i,j) for high frequencies (i,j).
This happens when the histogram outside the main lobe decays rapidly to zero, showing no periodic
structure. This reflects the small or zero value of the coefficient. In such cases, it can be
useful to estimate as many of the low frequencies as possible and then search through lookup
tables for a matching standard table.
    Estimating the quantization table of a bitmap can help determine part of its compression
history. If all (or most) of the low frequency steps are estimated to be ones, we can conclude
that the image did not go through previous compression. High frequencies may bias this decision
because they contribute very little and do not provide a good estimate. Moreover, this method also
works well for uncompressed or losslessly compressed tiff images. Fig. 3(c) and (d) show the Q
table of a tiff image taken from UCID [16], estimated 96.7% correctly using the above method. The
X's mark the "undetermined" coefficients.
    To verify the authenticity of the image, we use the same distortion measures we used in [1].
The average distortion measure is calculated as a function of the remainders of the DCT
coefficients with respect to the original Q matrix:

\[
B_1 = \sum_{i=1}^{8} \sum_{j=1}^{8} \operatorname{mod}\big( D(i,j),\, Q(i,j) \big)
\tag{2}
\]

where D(i,j) and Q(i,j) are the DCT coefficient and the corresponding quantization table entry at
position (i,j), respectively. An image block having a large average distortion value is very
different from what it should be and is likely to belong to a forged image. Averaged over the
entire image, this measure can be used for making a decision about the authenticity of the image.
    In addition, the JPEG 8×8 "blocking effect" is still present in the uncompressed version, and
hence the blocking artifact measure, BAM [14], can be used to give an estimate of the distortion
of the image. It is computed from the Q table as:

\[
B_2(n) = \sum_{i=1}^{8} \sum_{j=1}^{8}
         \left| D(i,j) - Q(i,j)\, \operatorname{round}\!\left( \frac{D(i,j)}{Q(i,j)} \right) \right|
\tag{3}
\]

where B2(n) is the estimated blocking artifact for the nth block.
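The two distortion measures in equations (2) and (3) can be sketched for a single 8×8 block as
below. This is a minimal illustration under stated assumptions, not the authors' implementation:
the function names are hypothetical, the remainder in (2) is taken on coefficient magnitudes
(the text does not say how negative coefficients are handled), and every entry of Q is assumed to
be a positive, determined step.

```python
import numpy as np

def average_distortion(D, Q):
    """Eq. (2): sum of remainders of the DCT coefficients with respect to Q.

    Assumption: remainders are computed on |D|, so the sign of a coefficient
    does not change its contribution.
    """
    D = np.asarray(D, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return float(np.sum(np.mod(np.abs(D), Q)))

def blocking_artifact(D, Q):
    """Eq. (3): distance of each coefficient from its nearest multiple of Q."""
    D = np.asarray(D, dtype=float)
    Q = np.asarray(Q, dtype=float)
    return float(np.sum(np.abs(D - Q * np.round(D / Q))))
```

A block whose coefficients are exact multiples of Q gives zero for both measures; coefficients
that drift away from those multiples, as happens after tampering, raise both values.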


    (a) Test image

    (b) Estimated Q for uncompressed version (most low frequencies are ones).

         5    4    3    2    1    1    1    1
         4    1    1    1    1   10   10   10
         1    1    1    1    1   10   10   10
         1    1    1    1    1   10   10   10
         1    1    1    1   14   12   12   12
         1    1    1    1   12   13   11   11
         1    1    1    1   13   11   12   11
         1    1    1    1   13   12   12   12

    (c) Estimated Q for previously compressed version with QF = 80.

         3    4    4    6   10   16   20   24
         5    5    6    8   10   23   24   22
         6    5    6   10   16   23   28   22
         6    7    9   12   20   35   32   25
         7    9   15   22   27   44   41   31
        10   14   22   26   32   42   45   37
        20   26   31   35   41   48   47    X
        29   37   38   39   45   40    X    X

    (d) Difference between (c) and original table for QF = 80.

         3    0    0    0    0    0    0
         0    0    0    0    0    0    0
         0    0    0    0    0    0    0
         0    0    0    0    0    0    0
         0    0    0    0    0    0    0
         0    0    0    0    0    0    0
         0    0    0    0    0    1    X
         0    0    0    0    0    X    X

    Fig. 3. Estimating Q table for original and previously compressed tif image.
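The peak-picking estimate of Section II, whose output Fig. 3 illustrates, can be sketched as
follows. This is a simplified illustration rather than the authors' code: all function names and
the parameters `max_step` and `low` are hypothetical, only bin 0 is treated as the main lobe, the
bound B(i,j) of equation (1) is not applied to high frequencies, and "most low frequency steps are
ones" is read as more than half of a small low-frequency corner.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    k = np.arange(n)[:, None]   # frequency index
    x = np.arange(n)[None, :]   # sample index
    C = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def block_dct(image):
    """8x8 block DCT of a grayscale image; returns shape (num_blocks, 8, 8)."""
    img = np.asarray(image, dtype=float) - 128.0        # JPEG-style level shift
    h, w = (img.shape[0] // 8) * 8, (img.shape[1] // 8) * 8
    blocks = (img[:h, :w].reshape(h // 8, 8, w // 8, 8)
              .swapaxes(1, 2).reshape(-1, 8, 8))
    C = dct_matrix(8)
    return C @ blocks @ C.T

def estimate_step(coeffs, max_step=255):
    """Step estimate: position of the highest histogram peak outside bin 0.

    Returns None when nothing falls outside bin 0 (an "undetermined" entry).
    """
    mags = np.rint(np.abs(np.ravel(coeffs))).astype(int)
    hist = np.bincount(mags, minlength=max_step + 1)[: max_step + 1]
    if hist[1:].sum() == 0:
        return None
    return int(np.argmax(hist[1:]) + 1)

def estimate_table(image):
    """8x8 table of estimated steps; 0 marks an undetermined entry."""
    X = block_dct(image)
    Q = np.zeros((8, 8), dtype=int)
    for i in range(8):
        for j in range(8):
            step = estimate_step(X[:, i, j])
            Q[i, j] = 0 if step is None else step
    return Q

def looks_uncompressed(Q, low=3):
    """True if more than half of the low x low corner of Q equals one."""
    corner = np.asarray(Q)[:low, :low]
    return bool(np.mean(corner == 1) > 0.5)
```

With these assumptions, the low-frequency corner of the table in Fig. 3(b) is classified as
uncompressed and that of Fig. 3(c) as previously compressed. A fuller version would widen the
main lobe for high frequencies using the bound in equation (1).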

       III.    EXPERIMENTAL RESULTS AND DISCUSSION

A. Estimation Accuracy
    Our testing image set consisted of 550 images collected from different sources (more than five
camera models), in addition to some from the public domain Uncompressed Color Image Database
(UCID), which provides a benchmark for image processing analysis [16]. Each of these images was
compressed with different quality factors, [60, 70, 80, and 90]. Each compressed image was then
decompressed and resaved as a bitmap. This yielded 550×4 = 2,200 untouched images. For each
quality factor group, an image's histogram of DCT coefficients at one certain frequency was
generated and used to determine the corresponding quantization step at that frequency according to
section 2. This was repeated for all 64 histograms of DCT coefficients. The resulting quantization
table was compared to the quality factor's known table and the percentage of correctly estimated
coefficients was recorded. Also, the estimated table was used in equations (2) and (3) to
determine the image's average distortion and blocking artifact measures, respectively. These
values were recorded and used later to set a threshold value for distinguishing forgeries from
untouched images.
    Table 1 shows the accuracy of estimating all 64 entries using the proposed method for each
quality factor, averaged over the whole set. It exhibits a behavior similar to that of JPEG
images: as the quality factor increases, estimation accuracy increases steadily, with an expected
drop for quality factors higher than 90 as the periodic structure becomes less prominent and the
bumps are no longer separate enough. Overall, we can see that the estimation accuracy is higher
than that of JPEG images [1]. This is expected, because lossy compression reduces the data
available for making a good estimate. Average estimation time for all 64 entries of images of size
640×480 for different QFs was 52.7 seconds.

   TABLE I.   PERCENTAGE OF CORRECTLY ESTIMATED COEFFICIENTS FOR SEVERAL QFS

   QF             60         70         80         90
   BMP          82.07%     84.80%     87.44%     89.44%
   JPEG [1]     72.03%     76.99%     82.36%     88.26%

    Estimating Q using MLE methods [10-11] is based on searching over all possible Q(i,j) for each
DCT coefficient over the whole image, which can be computationally expensive for large files.
Another method [12] proposed a logarithmic law and argued that the distribution of the first digit
of DCT coefficients follows this generalized Benford's law. The method is based on recompressing
the test image with several quality factors and fitting the distribution of DCT coefficients of
each version to the proposed law. The QF of the version having the least fitting artifact is
chosen and its corresponding Q table is the desired one. Of course the above methods can only
estimate standard compression tables. Although the Benford-based method may be accurate, it is
time consuming, and it fails when the recompression quantization step is an integer multiple of
the original compression step size. Another method [17] calculates the autocorrelation function of
the histogram of DCT coefficients. The displacement corresponding to the peak closest to the peak
at zero is the value of Q(i,j), given that the peak is higher than the mean value of the
autocorrelation function. The method eventually uses a hybrid approach: the low frequency
coefficients are determined directly from the autocorrelation function, while the higher frequency
ones are estimated by matching the estimated part to standard JPEG tables scaled by a factor s,
which is determined from the known coefficients.
    Table 2 shows the estimation accuracy, and Table 3 the estimation time, for the methods
mentioned above against ours. Note that accuracy was calculated for directly estimating only the
first nine AC coefficients without matching. This is because the methods fail to estimate high
frequency coefficients, as most of them are quantized to zero. On the other hand, the listed time
is for estimating the nine coefficients and then retrieving the whole matching table from JPEG
standard lookup tables. The maximum peak method is faster than statistical modeling and nearly as
fast as the autocorrelation method. However, the average accuracy of our method is far higher. MLE
is reliable with 83% accuracy but takes more than double the time. The Benford's law based method
has an accuracy of 75% but is the worst in time, because recompressing the image and calculating
distributions for each compressed version may become time consuming for larger images. Images used
in the experiments were of size 640×480.

   TABLE II.   ESTIMATION ACCURACY (%) OF THE FIRST NINE AC COEFFICIENTS FOR SEVERAL QFS

   Method \ QF      50       60       70       80       90      100     Avg. Acc.
   MLE            75.31    83.10    90.31    96.34    93.83    59.5      83.06
   Benford        99.08    87.59    80.82    93.81    59.47    31.53     75.38
   Auto.          48.94    50.37    63.71    81.43    65.37    57.50     61.22
   Max. Peak      97.93    97.07    99.01    97.67    89.57    76.04     92.88

   TABLE III.   ESTIMATION TIME FOR SEVERAL QFS

   Method \ QF      50       60       70       80       90      100
   MLE            38.73    37.33    37.44    37.36    37.32    34.14
   Benford        59.95    58.67    58.70    58.72    58.38    80.04
   Auto.           9.23    11.11    11.10    11.12    11.24     8.96
   Max. Peak      11.27    11.29    11.30    11.30    11.30    11.56

B. Forgery Detection
    From the untouched previously compressed bitmap image set, we selected 500 images for each
quality factor, each of which was subjected to four common forgeries: cropping, rotation,
composition, and brightness changes. Cropping forgeries were done by deleting some columns and
rows from the original image. An image was rotated by 270° for rotation forgeries. Copy-paste
forgeries were done by randomly copying a block of pixels from an arbitrary image and placing it
in the original image. Random values were added to every pixel of the image to simulate brightness
change. The resulting fake images were then stored in their uncompressed form for a total of
(500×4) × 4 = 8,000 images. Next, the quantization table for each of these images was estimated as
above and used to calculate the image's average distortion measure (2) and blocking artifact
measure (3).
    Fig. 4(a) and (b) show values of the average distortion measure and blocking artifact measure,
respectively. The scattered dots represent 500 untouched images (averaged over all quality factors
for each image) while the cross marks represent 500 images from the forged dataset. As the figure
shows, values from forged images tend to cluster higher than those from untampered images.

    Fig. 4. Distortion measures for untouched and tampered images. (a) Average distortion measure.
    (b) Blocking artifact measure.

    We tested the distortion measure for untouched images against several threshold values and
calculated the corresponding false positive rate, FPR (the number of untouched images declared as
tampered). An ideal case would be a threshold giving zero false positives. However, we had to take
into account the false negatives (the number of tampered images declared as untampered) that may
occur when testing for forgeries. Hence, we require a threshold value that keeps both the FPR and
the FNR low. For the average distortion measure, we selected a value that gave an FPR of 10.8% and
as low an FNR as possible for the different types of forgeries. The horizontal line marks this
threshold, τ = 50. Similarly, we selected the BAM's threshold to be τ = 40, with a corresponding
FPR of 5.6%. Table 4 shows the false negative rate (FNR) for the different forgeries at different
quality factors for bitmaps and JPEGs. As expected, as QF increases, a better estimate of the
quantization matrix of the original untampered image is obtained, and as a result the error
percentage decreases. Notice how the values are lower than those for JPEG files. Notice also that
detection of cropping is possible when the cropping process breaks the natural JPEG grid, that is,
when the removed rows or columns do not fall in line with the 8×8 blocking. Similarly, when the
pasted part fails to fit perfectly into the original JPEG compressed image, the distortion metric
exceeds the detection threshold, and a possible composite is declared. Fig. 5 shows examples of
composites. The resulting distortion measures for each composite are shown in the left panel. The
dark parts denote low distortion whereas brighter parts indicate high distortion values. Notice
that the highest values correspond to the alien part and hence mark the forged area.

   TABLE IV.   FORGERY DETECTION ERROR RATES FOR BITMAPS AND JPEGS

   Distortion Measure     Original    Cropping    Rotation    Compositing    Brightness
   Average      JPEG       12.6%       9.2%        7.55%        8.6%           6.45%
                BMP        10.8%       3.9%        4.45%        2.0%           4.9%
   BAM          JPEG        6.8%       3.3%        5.95%        3.15%          5.0%
                BMP         5.6%       1.05%       3.05%        1.25%          3.7%
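The threshold selection described in this section can be sketched as a sweep over candidate
values. The snippet below is only an illustration under stated assumptions: the function names
are hypothetical, an image is declared tampered when its measure exceeds the threshold, and the
candidate minimizing FPR + FNR is picked as one simple way of "keeping both low" (the paper does
not state its exact selection rule).

```python
import numpy as np

def error_rates(untouched, forged, threshold):
    """Return (FPR, FNR) for a given threshold on a distortion measure.

    FPR: fraction of untouched images declared tampered (value > threshold).
    FNR: fraction of forged images declared untampered (value <= threshold).
    """
    untouched = np.asarray(untouched, dtype=float)
    forged = np.asarray(forged, dtype=float)
    fpr = float(np.mean(untouched > threshold))
    fnr = float(np.mean(forged <= threshold))
    return fpr, fnr

def pick_threshold(untouched, forged, candidates):
    """Assumed rule: the candidate with the smallest FPR + FNR."""
    return min(candidates, key=lambda t: sum(error_rates(untouched, forged, t)))
```

For example, `error_rates(values_untouched, values_forged, 50)` would return the two rates for the
average distortion threshold τ = 50, given arrays of per-image measure values.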

                         IV.     CONCLUSIONS

    The method discussed in this paper uses the approximated histogram of DCT
coefficients of bitmaps to extract the image’s compression history, namely its
quantization table. The extracted table is also used to expose image forgeries.
The method showed high estimation accuracy, compared to other statistical
approaches, when tested on a large set of images from different sources.
Moreover, estimation proved to be faster than statistical methods while
maintaining very good accuracy for lower frequencies. Experimental results also
showed that performance for bitmaps surpasses that of JPEGs, because of the
lossy nature of the latter; on the other hand, a bitmap takes more time to
process.

                             REFERENCES
 [1] Hamdy S., El-Messiry H., Roushdy M. I., Kahlifa M. E., “Forgery
     detection in JPEG compressed images,” JAR-Unpublished, 2010.
 [2] Rosenholtz R., Zakhor A., “Iterative procedures for reduction of
     blocking effects in transform image coding,” IEEE Trans. Circuits Syst.
     Video Technol., vol. 2, pp. 91–94, Mar. 1992.
 [3] Fan Z., Eschbach R., “JPEG decompression with reduced artifacts,”
     Proc. IS&T/SPIE Symp. Electronic Imaging: Image and Video
     Compression, San Jose, CA, Feb. 1994.
 [4] Fan Z., Li F., “Reducing artifacts in JPEG decompression by
     segmentation and smoothing,” Proc. IEEE Int. Conf. Image Processing,
     vol. II, 1996, pp. 17–20.
 [5] Tan K. T., Ghanbari M., “Blockiness detection for MPEG-2-coded
     video,” IEEE Signal Process. Lett., vol. 7, pp. 213–215, Aug. 2000.
 [6] Minami S., Zakhor A., “An optimization approach for removing
     blocking effects in transform coding,” IEEE Trans. Circuits Syst. Video
     Technol., vol. 5, pp. 74–82, Apr. 1995.
 [7] Yang Y., Galatsanos N. P., Katsaggelos A. K., “Regularized
     reconstruction to reduce blocking artifacts of block discrete cosine
     transform compressed images,” IEEE Trans. Circuits Syst. Video
     Technol., vol. 3, pp. 421–432, Dec. 1993.
 [8] Luo J., Chen C. W., Parker K. J., Huang T. S., “Artifact reduction in low
     bit rate DCT-based image compression,” IEEE Trans. Image Process., vol.
     5, pp. 1363–1368, 1996.
 [9] Chou J., Crouse M., Ramchandran K., “A simple algorithm for removing
     blocking artifacts in block-transform coded images,” IEEE Signal
     Process. Lett., vol. 5, pp. 33–35, 1998.
[10] Fan Z., de Queiroz R. L., “Maximum likelihood estimation of JPEG
     quantization table in the identification of bitmap compression history,”
     in Proc. Int. Conf. Image Process. ’00, 10-13 Sept. 2000, 1: 948–951.
[11] Fan Z., de Queiroz R. L., “Identification of bitmap compression history:
     JPEG detection and quantizer estimation,” in IEEE Trans. Image
     Process., 12(2): 230–235, February 2003.
[12] Fu D., Shi Y. Q., Su W., “A generalized Benford’s law for JPEG
     coefficients and its applications in image forensics,” in Proc. SPIE
     Secur., Steganography, and Watermarking of Multimed. Contents IX,
     vol. 6505, pp. 1L1-1L11, 2007.
[13] Swaminathan A., Wu M., Ray Liu K. J., “Digital image forensics via
     intrinsic fingerprints,” IEEE Trans. Inf. Forensics Secur., 3(1): 101-117,
     March 2008.
[14] Ye S., Sun Q., Chang E.-C., “Detecting digital image forgeries by
     measuring inconsistencies in blocking artifacts,” in Proc. IEEE Int.
     Conf. Multimed. and Expo., July, 2007, pp. 12-15.
[15] Fridrich J., Goljan M., Du R., “Steganalysis based on JPEG
     compatibility,” SPIE Multimedia Systems and Applications, vol. 4518,
     Denver, CO, pp. 275-280, Aug. 2001.
[16] Schaefer G., Stich M., “UCID – An Uncompressed Color Image
     Database,” School of Computing and Mathematics, Technical Report,
     Nottingham Trent University, U.K., 2003.
[17] Petkov A., Cottier S., “Image quality estimation for JPEG-compressed
     images without the original image,” EE398 Projects - Image and Video
     Compression, Stanford University, March 2008.


                                                            (a) Three composite bitmap images.

                                                     (b) Distortion measure for the three images in (a).
Fig. 5. Distortion measures for some composite bitmap images. The left panel represents the average distortion measure while the right panel represents the
blocking artifact measure.
