# Default

```
RECOVERY OF DAMAGED COMPRESSED FILES
Bassam Lababidi
PROBLEM
ZIP FILES

- Each file entry is introduced by a local header with information about the file (such as compression method, file size, and file name), followed by the possibly compressed, possibly encrypted file data.
PROBLEM DOMAIN

- Compressed files that have a damaged
- No tools are currently available for such a case
- Current tools can decompress only blocks with a correct header, even if some blocks are damaged.
OUR PURPOSE

- A forensic investigator aims to recover the original data, or at least part of it.
- Some part of the file might constitute "Digital Evidence".
- Better insight into the combined field of compression and digital forensics
INTRODUCTION TO COMPRESSION
GLOSSARY

- Raw compressed data
- Compressed data
- Decompressor
- Huffman Code
- Huffman Tree
- LZ77
LZ77

- Also known as sliding window compression
- Based on Lempel and Ziv's 1977 paper
- Replaces portions of the data with references to matching data that has already passed through
LZ77

- Looks for the length and distance of the longest match found in the buffer
- Outputs a length-distance pair and the next literal
HUFFMAN'S CODE

- Expresses the most common source symbols using shorter strings of bits than are used for less common ones
DEFLATE

- Uses both LZ77 and Huffman's Code
- Compression is achieved in two steps:
  - Matching duplicate strings and replacing them with pointers (LZ77)
  - Replacing symbols with new, weighted symbols based on frequency of use (Huffman's Code)
DEFLATE

Stream of Data -> LZ77 -> (Distance, Length, Next Literal) -> Huffman Coding -> Deflated File
WHAT CAN CURRENT TOOLS DO?

- Look for the local header signature 0x504B0304.
- If it is found, jump to the start of the raw data.
- In this case, F, G, and H are decompressed (if independent).

Block type bits in each DEFLATE block header:

- 00: a stored/raw/literal section follows, between 0 and 65535 bytes in length
- 01: a static Huffman compressed block, using a predefined Huffman table
- 10: a compressed block with a dynamic Huffman encoding scheme

- In mode 2 (static Huffman, block type 01), the Huffman table is fixed by the DEFLATE compression algorithm itself.
- In mode 3 (dynamic Huffman, block type 10), the Huffman table is stored within the compressed file.
- Our approach works at the bit level and considers mode 2.
SOLUTION

1.   Start from a damaged input file.
2.   Remove one bit from the front of the data.
3.   Apply the decompression algorithm.
4.   If decompression succeeds, save the resulting data block as a file (chunk).
5.   If decompression fails, go to step 2.
POSSIBLE PROBLEMS

- Blocks can be independent of or dependent on one another.
- Inter-block dependency prevents the recovery of a single raw compressed data block.
- Low compression ratio -> higher independence
RESULT
A TOOL

- Decompression Helper
- decompressed[i] denotes the decompressed data block that starts from the i-th bit.
CONCLUSION

- The algorithm works for mode 2.
- The algorithm works best for independent blocks.
- Files in mode 3 might also be decompressible if some context is known.
REFERENCES

- Bora Park and Antonio Savoldi, 2008, Recovery of Damaged Compressed Files for Digital Forensic Purposes, International Conference on Multimedia and Ubiquitous Engineering
- Wikipedia

```
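
The SOLUTION steps above (drop one bit, attempt decompression, keep whatever inflates) can be sketched in Python with the standard `zlib` module. This is only a minimal illustration, not the tool described in the slides. The function names (`drop_leading_bits`, `block_type_at`, `try_inflate`, `recover`), the `max_offset` limit, and the choice to return a dictionary instead of writing chunk files are all assumptions made for the example. It also assumes that the damaged input is a raw DEFLATE stream with bits packed least-significant-bit first, and that "mode 2" on the slides corresponds to block type 01.

```python
import zlib

STATIC_HUFFMAN = 0b01  # BTYPE 01; assumed to be what the slides call "mode 2"


def drop_leading_bits(data: bytes, n: int) -> bytes:
    """Remove the first n bits of data, counting bits LSB-first as DEFLATE does."""
    total_bits = len(data) * 8
    if n >= total_bits:
        return b""
    # In a little-endian integer, stream bit k is integer bit k.
    value = int.from_bytes(data, "little") >> n
    return value.to_bytes((total_bits - n + 7) // 8, "little")


def block_type_at(data: bytes, offset: int) -> int:
    """Read the 2-bit BTYPE field of a block header starting at bit `offset`."""
    start = offset // 8
    window = int.from_bytes(data[start:start + 2], "little") >> (offset % 8)
    return (window >> 1) & 0b11  # bit 0 is BFINAL, bits 1-2 are BTYPE


def try_inflate(data: bytes):
    """Return the inflated bytes, or None if zlib rejects the stream."""
    try:
        out = zlib.decompressobj(-15).decompress(data)  # -15 selects raw DEFLATE
    except zlib.error:
        return None
    return out or None


def recover(damaged: bytes, max_offset: int = 64, block_types=(0b00, 0b01, 0b10)):
    """Map bit offset i to decompressed[i] for every offset that inflates cleanly."""
    decompressed = {}
    for i in range(min(max_offset, len(damaged) * 8)):
        if block_type_at(damaged, i) not in block_types:
            continue  # e.g. BTYPE 11 is reserved, so no block can start here
        chunk = try_inflate(drop_leading_bits(damaged, i))
        if chunk is not None:
            decompressed[i] = chunk
    return decompressed


if __name__ == "__main__":
    original = b"digital evidence " * 20
    comp = zlib.compressobj(wbits=-15)
    raw = comp.compress(original) + comp.flush()
    # Simulate damage: 11 junk bits sit in front of the real stream.
    junk_bits = 11
    value = (int.from_bytes(raw, "little") << junk_bits) | 0b10110011101
    damaged = value.to_bytes((len(raw) * 8 + junk_bits + 7) // 8, "little")
    found = recover(damaged)
    print(sorted(found), found.get(junk_bits) == original)
```

Passing `block_types=(STATIC_HUFFMAN,)` limits the search to offsets whose header announces a static Huffman block. A wrong offset can occasionally inflate without error, so each recovered chunk still needs to be inspected before it is treated as evidence.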