Hashing Concepts by cuiliqing


									Hashing Concepts

        CSC 485/585
   Define Hashing and Hash Values.
   Explain the common uses of Hashes within the field of
    Computer Forensics.
       Data Authentication
       Data Reduction
       File Identification
   Explain the limitations of Hashes.
What is a Hash Function?
   A hash function is any well-defined procedure or mathematical
    function which converts a large, possibly variable-sized amount
    of data into a small datum. The values returned by a hash
    function are called hash values, hash codes, hash sums, or simply hashes.
Cryptographic Hash Functions
   A cryptographic hash function is a deterministic procedure that takes
    an arbitrary block of data and returns a fixed-size bit string, the
    (cryptographic) hash value, such that an accidental or intentional
    change to the data will change the hash value. The data to be encoded is
    often called the "message", and the hash value is sometimes called the
    message digest or simply digest.
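The definition above can be illustrated with a minimal sketch. It assumes Python's standard hashlib module, which the slides do not mention; the helper name md5_hex and the sample inputs are made up for illustration. Inputs of very different sizes give digests of the same fixed size, and hashing the same input twice gives the same value.

```python
import hashlib


def md5_hex(message: bytes) -> str:
    """Return the MD5 digest of message as uppercase hexadecimal text."""
    return hashlib.md5(message).hexdigest().upper()


# Inputs of very different sizes (illustrative values, not from the slides).
samples = [b"", b"a", b"x" * 1_000_000]

for data in samples:
    digest = md5_hex(data)
    # 128 bits = 16 bytes = 32 hexadecimal digits, whatever the input size.
    print(len(data), "bytes ->", len(digest), "hex digits:", digest)

# Deterministic: the same input always yields the same digest.
print(md5_hex(b"a") == md5_hex(b"a"))
```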

   The ideal cryptographic hash function has three main properties:
     it is infeasible to find a message that has a given hash,
     it is infeasible to modify a message without changing its hash,
     it is infeasible to find two different messages with the same hash.

   MD5 and SHA-1 are the most commonly used cryptographic hash
    functions (a.k.a. algorithms) in the field of Computer Forensics.
   MD5 (Message-Digest algorithm 5) is a widely used cryptographic hash function with a 128-bit hash value.
   A 128-bit MD5 hash (also termed a message digest) is 16 bytes long and is typically written as a sequence of
    32 hexadecimal digits. The following demonstrates a 40-byte ASCII input and the corresponding MD5 hash:
   MD5 of “This is an example of an MD5 Hash Value.” = 3413EE4F01F2A0AA17664088E79CF5C2

   Even a small change in the message will result in a completely different hash. For example, changing the
    period at the end of the sentence to an exclamation mark:
   MD5 of “This is an example of an MD5 Hash Value!” = B872D23A7D14B6EE3B390A58C17F21A8
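The comparison above can be reproduced with a short sketch, again assuming Python's hashlib; the variable and function names are illustrative. Whether the printed values match the slide depends on hashing exactly the same bytes (plain ASCII, no surrounding quotes, no trailing newline), so no expected output is shown here.

```python
import hashlib

original = "This is an example of an MD5 Hash Value."
modified = "This is an example of an MD5 Hash Value!"


def md5_hex(text: str) -> str:
    # Hash functions operate on bytes, so encode the text as ASCII first.
    return hashlib.md5(text.encode("ascii")).hexdigest().upper()


h1 = md5_hex(original)
h2 = md5_hex(modified)
print("MD5 of original =", h1)
print("MD5 of modified =", h2)

# Count how many of the 32 hexadecimal digits differ, position by position.
differing = sum(a != b for a, b in zip(h1, h2))
print(f"{differing} of {len(h1)} hex digits differ")
```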
   SHA stands for Secure Hash Algorithm.
   SHA-1 produces a 160-bit (20-byte) digest from a message, typically written as a sequence
    of 40 hexadecimal digits. The following is an example of SHA-1 digests:
   Just like MD5, even a small change in a message will result in a completely different
    hash. For example:
   SHA1 of “This is a test.” =
   SHA1 of “This is a pest.” =
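The two SHA-1 digests can be computed with a sketch like the following, assuming Python's hashlib and plain ASCII input with no surrounding quotes. It also counts how many of the 160 output bits differ, which is one way to see that a one-letter change affects the whole digest.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the SHA-1 digest of ASCII text as uppercase hexadecimal."""
    return hashlib.sha1(text.encode("ascii")).hexdigest().upper()


a = sha1_hex("This is a test.")
b = sha1_hex("This is a pest.")
print("SHA1 of 'This is a test.' =", a)
print("SHA1 of 'This is a pest.' =", b)

# XOR the two digests as integers and count the 1 bits: each 1 marks an
# output bit that differs between the two hashes.
bits_changed = bin(int(a, 16) ^ int(b, 16)).count("1")
print(f"{bits_changed} of 160 bits differ")
```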
   The MD5 hash algorithm - the chance of 2 arbitrary files having the
    same MD5 hash value is 1 in 2 to the 128th power =
    1 in 3.4028236692093846346337460743177e+38 or
    1 in 340 billion billion billion billion.
   The SHA-1 hash algorithm - the chance of 2 arbitrary files having the
    same SHA-1 hash value is 1 in 2 to the 160th power =
    1 in 1.4615016373309029182036848327163e+48 or
    1 in....a REALLY big number!
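The figures above can be checked with a few lines of arithmetic. This is a sketch in Python, whose integers have arbitrary precision. It only counts the possible hash values; it says nothing about deliberately constructed collisions.

```python
# Number of distinct values a 128-bit and a 160-bit hash can take.
md5_space = 2 ** 128
sha1_space = 2 ** 160

print(f"2^128 = {md5_space:.4e}")
print(f"2^160 = {sha1_space:.4e}")

# "Billion billion billion billion" is (10^9)^4 = 10^36.
print(md5_space // 10 ** 36, "billion billion billion billion")
```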
What do CF Examiners use Hashes for?
   Data Authentication
       To prove two things are the same
   Data Reduction
       To exclude many “known” files from the hundreds of thousands of
        files you have to look at.
   File Identification
       To find a needle in a haystack.
Data Authentication
   One of the most important issues a computer forensic
    examiner faces is being able to “authenticate” the
    digital evidence.
   This is done via Chain of Custody, Documentation, and Hash Values.
   Using MD5 or SHA-1 hashing tools, an examiner should be
    able to verify that data has not changed. A hash of the
    acquired data must be identical to a hash of the original data.
Data Authentication
   Calculating a “hash value” for any block of data (e.g. a file, an entire disk, a
    partition, etc.) can be accomplished as a stand-alone task or simultaneously
    with the acquisition process (by most tools).
   Calculating the “hash value” of an entire disk is done by reading all data on
    the disk, running it through the desired algorithm, and generating a hash of
    all data read. The examiner then typically documents the resulting hash value.
   The resulting “hash value” is a hash of the data READ from the disk, not
    necessarily a hash of the data WRITTEN to your target disk during the
    acquisition process.
   Input/Output errors and bad sector errors encountered during the
    acquisition process will affect the resulting hash value.
   An examiner should run a verification process after acquisition to ensure
    that the original hash value calculated while reading the original data
    matches the hash value of the data written out to your target disk.
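The read-everything-then-verify procedure described above can be sketched as follows. This is an illustration only, not how any particular forensic tool works: the function names hash_stream and verify_acquisition are hypothetical, and two small temporary files stand in for the source disk and its image. The data is read in chunks so that a large disk or image never has to fit in memory.

```python
import hashlib
import os
import tempfile


def hash_stream(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash everything readable at path, one chunk at a time."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as source:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:  # end of data
                break
            hasher.update(chunk)
    return hasher.hexdigest().upper()


def verify_acquisition(source_path, image_path, algorithm="md5"):
    """Compare the hash of the data read with the hash of the data written."""
    source_hash = hash_stream(source_path, algorithm)
    image_hash = hash_stream(image_path, algorithm)
    return source_hash == image_hash, source_hash, image_hash


# Small temporary files stand in for a source disk and its forensic image
# (hypothetical data, not a real acquisition).
with tempfile.TemporaryDirectory() as folder:
    source_path = os.path.join(folder, "source.dd")
    image_path = os.path.join(folder, "image.dd")
    payload = os.urandom(3 * 1024 * 1024 + 17)  # not a multiple of chunk_size
    for path in (source_path, image_path):
        with open(path, "wb") as handle:
            handle.write(payload)
    matched, source_hash, image_hash = verify_acquisition(source_path, image_path)
    print("verified" if matched else "MISMATCH", source_hash, image_hash)
```

On a real case the source would be the original device, read through a write blocker, and the examiner would record both hash values in the case documentation.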
FTK Imager
(Hashes calculated without acquiring drive)
WinHex Specialist
(Hash calculated without acquiring drive)
Linux – md5sum & sha1sum
(using Helix3-2009R1)
FTK Imager
(Hashes calculated (and verified) as part of acquisition process)
Data Authentication
   Considerations:
       Drives will start to fail as they get older, resulting in “bad
        sectors”. Bad sectors = inability to obtain matching hash
        values when comparing a hash of the original disk to the hash
        of a forensic image of the data read from the disk.
       The longer a disk spends spinning, the greater the chance of disk
        failure(s). To calculate a hash value of a drive, you must read all
        data on the disk. To acquire a forensic image, you must read all
        data on the disk.
       If your imaging tool does not simultaneously capture a hash
        value as part of the data acquisition process, consider whether
        the risk of doubling the disk's running time to obtain a
        pre-acquisition hash value is appropriate given that your primary
        objective is to obtain the data.
Data Authentication
   In the previous slides, we looked at hashing an entire drive. Using hashes, an
    examiner can also verify that a specific file or any block of data has not changed.
   Hash individual file(s) with FTK Imager, WinHex, md5summer, and many other
    hashing tools.
Data Authentication
   Note that although these graphic files look identical, a single modified byte will result in hash
    values that do not match.
Data Authentication
   When hashing individual files:
       Changing filename or extension does NOT change hash value.
       Changing Modified, Accessed, Created dates does NOT change hash value.
       Changing file system attributes (read-only, hidden, system, etc.) does NOT
        change hash value.
       Changing ANYTHING within the file contents DOES change the hash value of
        the file.
           For files like MS Word documents that contain “Metadata”, changes within the Metadata DO change
            the contents of the file and therefore change the hash value of the file.
           For example, if you opened an MS Word document, made no changes to the contents of the file and
            just re-saved the file, MS Word would update the dates saved within the Metadata, so the actual
            raw content of the overall Word document would change and therefore generate a different hash
            value.
       Cropping a graphic, changing the resolution, saving as another graphic format
        (BMP to JPEG), or any other change that may not necessarily change the visual
        depiction of the picture, WILL change the raw contents of the file and therefore
        will change the hash value of the file.
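The rules above can be demonstrated with a small experiment. It is a sketch using Python's standard library; the file names and contents are made up. It hashes a temporary file, renames it, changes its timestamps, and finally changes one byte of content. The creation date is left alone because Python has no portable way to set it.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def md5_of_file(path: Path) -> str:
    """Hash only the file's content; the name and dates are not part of it."""
    return hashlib.md5(path.read_bytes()).hexdigest().upper()


with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "report.txt"
    original.write_bytes(b"Quarterly figures\n")
    baseline = md5_of_file(original)

    # Change the filename and extension: the content is untouched.
    renamed = Path(folder) / "holiday.jpg"
    original.rename(renamed)
    after_rename = md5_of_file(renamed)

    # Set the accessed and modified times to an arbitrary earlier moment.
    os.utime(renamed, (1_000_000_000, 1_000_000_000))
    after_touch = md5_of_file(renamed)

    # Change a single byte of the content.
    renamed.write_bytes(b"Quarterly figureZ\n")
    after_edit = md5_of_file(renamed)

print("rename keeps hash:     ", after_rename == baseline)
print("timestamps keep hash:  ", after_touch == baseline)
print("content edit keeps it: ", after_edit == baseline)
```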
Data Authentication
   NOTE:
       Although we just told you that changing a filename or other “non-content” of a
        file does not change the hash value of the file….
       Such a “non-content” change DOES make a change to the FAT directory entry,
        MFT entry, or other file system component that holds the filename, MAC dates,
        attributes, etc. and therefore DOES change the data on the file system that holds
        the file in question.
       Therefore a change of a filename, MAC date, file attribute, etc. DOES NOT
        change the hash value of the file, but it DOES change the hash value of the disk
        on which the file is stored.
Data Reduction
   As the storage capacity of disks grows, so does the number of files a
    computer forensic examiner must examine.
   A typical hard drive containing a Windows installation, software
    applications, user files, temporary Internet files, music downloads,
    etc. will contain well over a hundred thousand files.
   Large databases containing hash values of “known” files can be used
    by a forensic examiner to reduce the number of files he or she must
    examine.
   Files that are known to be part of the operating system and/or
    installed software applications are likely not going to contain
    evidence.
   By excluding all known operating system files and files from known
    software applications, an examiner is left with only user created files
    to review for potential evidence.
Data Reduction
   Using forensic software tools, an examiner calculates the hash value
    of all files on a disk.
   Then the examiner uses the software tool to compare the
    calculated hash values against all of the hash values within a known
    hash database to identify any matching hash values.
   The examiner can then exclude from view any files with hash values
    matching those in the database.
   The examiner can also exclude from view the extra copies of any files
    that are duplicates of each other according to their hash values,
    further reducing the number of files in view.
   This process, called “Data Reduction”, can save the examiner from
    analyzing many thousands of unnecessary files.
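The data reduction steps above can be sketched in a few lines. This is a simplified illustration, not the behaviour of any particular forensic tool: the function name reduce_by_hash is hypothetical, and a dictionary of file names and contents stands in for the files on a disk and for the known hash database.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest().upper()


def reduce_by_hash(files: dict[str, bytes], known_hashes: set[str]) -> list[str]:
    """Return the names of files that are neither known nor duplicates."""
    seen: set[str] = set()
    remaining: list[str] = []
    for name, content in files.items():
        digest = md5_hex(content)
        if digest in known_hashes or digest in seen:
            continue  # a known file, or a duplicate of one already kept
        seen.add(digest)
        remaining.append(name)
    return remaining


# Hypothetical stand-ins for files on a disk (name -> raw content).
files = {
    "C:/Windows/notepad.exe": b"operating system binary",
    "C:/Users/a/letter.doc": b"user letter",
    "C:/Users/a/copy of letter.doc": b"user letter",
    "C:/Users/a/photo.jpg": b"user photo",
}
# Hypothetical "known file" database containing one operating system file.
known_hashes = {md5_hex(b"operating system binary")}

to_review = reduce_by_hash(files, known_hashes)
print(to_review)
```

Here the operating system file is dropped because its hash is in the known set, and the second copy of the letter is dropped as a duplicate, leaving two files to review.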
Data Reduction
   Hash Databases:
       National Software Reference Library (NSRL) –
        Reference Data Sets (RDS) - NIST
       HashKeeper (LE, Military and Government only) - NDIC
       Known File Filter (KFF) – AccessData, Inc.
       Self-generated or shared databases
Forensic Toolkit (FTK) - KFF
File Identification
   Quickly identifying a specific “notable” file or files amongst the
    hundreds of thousands of files on a disk can also be accomplished by
    use of hash databases….finding the needle in the haystack!
   Instead of using a database of known “ignorable” files such as OS
    files, databases containing hash values of known “notable” files can
    be utilized.
   Examples of common “notable” files are:
       Child Pornography and other contraband images
       Hacker Tools
       Viruses, Trojans and other Malware
   The examiner can search by hash value and flag any files with hash
    values matching those in the “notable” database.
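Searching for notable files is the same lookup in the other direction: matches are flagged rather than hidden. The sketch below assumes SHA-1 and a hypothetical function flag_notable; the file names and contents are made up, and a dictionary again stands in for the files on a disk.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest().upper()


def flag_notable(files: dict[str, bytes], notable_hashes: set[str]) -> list[str]:
    """Return the names of files whose hash appears in the notable set."""
    # Hash databases may store hex digits in either case, so normalize first.
    normalized = {value.upper() for value in notable_hashes}
    return [name for name, content in files.items()
            if sha1_hex(content) in normalized]


# Hypothetical stand-ins for files on a disk (name -> raw content).
files = {
    "tool.exe": b"password cracker",
    "readme.txt": b"password cracker",    # same content under another name
    "tool_v2.exe": b"password cracker!",  # one extra byte of content
    "notes.txt": b"shopping list",
}
# Hypothetical "notable" database holding one lowercase hash value.
notable_hashes = {sha1_hex(b"password cracker").lower()}

flagged = flag_notable(files, notable_hashes)
print(flagged)
```

The renamed copy is still flagged because only the content is hashed, while the copy with one extra byte is missed because only exact matches are found.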
Limitations of Hashes
   A mismatched hash value only tells you that something changed, not
    what changed!
   When using MD5, SHA-1 or other standard cryptographic
    hashes to identify known files, only EXACT matches will result
    in success.
         When files are slightly modified, standard hashing will not identify
          similar files.
         “Fuzzy Hashing”, as implemented in the tool ssdeep, uses context triggered
          piecewise hashing (CTPH) to identify files that have similar pieces but
          may not be entirely identical.
   Hash “collisions” (two different inputs with the same hash value) have
    been discovered for MD5 and SHA-1, and some argue that stronger (more
    collision-resistant) hash algorithms should be used in computer forensics.
           Questions ???

…as usual, use the discussion board!
