Common Methods for Data Integrity and Authentication
Hash Functions (Message Digests)
A hash function accepts a variable-size data input and generates a fixed-size representation
of the input data. This fixed-size representation is referred to as the hash, or digest, of the input data.
Hash functions are sometimes referred to as one-way functions because inverting them, that is,
recovering the input from its digest, is computationally infeasible.
Some people refer to a message digest (hash) as a digital fingerprint of the input data. It is
important to note that hash functions are deterministic and involve no randomness: given the
same input data twice, a hash function generates the same message digest value both times.
At the same time, changing even a single bit of the input data results in a very different hash value.
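The two properties just described, determinism and sensitivity to single-bit changes, can be observed with Python's standard hashlib module. The sketch below is a minimal illustration; the message text and the helper names sha1_hex and bit_difference are hypothetical choices made for this example.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

message = b"Transfer 100 dollars to account 42"
# Flip the lowest bit of the first byte: "T" (0x54) becomes "U" (0x55).
tampered = bytes([message[0] ^ 0x01]) + message[1:]

# Determinism: hashing the same input twice yields the same digest.
print(sha1_hex(message) == sha1_hex(message))  # True

d1 = hashlib.sha1(message).digest()
d2 = hashlib.sha1(tampered).digest()
print(d1.hex())
print(d2.hex())
# A one-bit input change typically flips roughly half of the 160 digest bits.
print("differing bits:", bit_difference(d1, d2), "of", len(d1) * 8)
```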
Hash functions are designed to produce digests that are small enough to be managed easily,
yet large enough that they are not susceptible to attacks. MD5 (RFC 1321) and SHA-1
(FIPS PUB 180-1) are the two most common hashing algorithms in use today. MD5 produces a
128-bit (16-byte) message digest, and SHA-1 produces a 160-bit (20-byte) digest.
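A quick way to confirm these digest sizes, and to see that they do not depend on the size of the input, is to hash an empty input and a one-megabyte input with each algorithm. This is a small sketch using hashlib; the helper name digest_bits is hypothetical.

```python
import hashlib

def digest_bits(algorithm: str, data: bytes) -> int:
    """Length in bits of the digest that `algorithm` produces for `data`."""
    return len(hashlib.new(algorithm, data).digest()) * 8

short_input = b""
long_input = b"x" * 1_000_000  # one megabyte of input

for algorithm in ("md5", "sha1"):
    # The digest length depends only on the algorithm, not on the input size.
    print(algorithm, digest_bits(algorithm, short_input), digest_bits(algorithm, long_input))
# prints:
# md5 128 128
# sha1 160 160
```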
Hash functions do not provide confidentiality, and they do not involve the use of a
secret key to generate the digest. They are well suited for authentication and for
ensuring data integrity. UNIX systems have used hash functions for authentication for
years: instead of storing the user's password in the clear, the system stores the hash or a
hashed derivative of the password. Every time a user logs in, the offered password is
hashed and then compared with the stored value. The advantage of this approach is that a
stolen password database does not directly reveal the passwords, because the hashes
cannot be inverted. (An attacker can still hash candidate guesses and compare them with the
stored values, so weak passwords remain vulnerable to dictionary attacks.)
Because this type of static authentication can be subject to replay attacks, certain network
protocols, such as CHAP, use a challenge-response method for authentication instead. In this
case, the server sends a random chunk of data (the challenge) to the client. The client
concatenates its password and the random data, hashes them together, and sends the result
to the server. Note that the server must know the shared secret (the password in the clear)
to derive the same hash and compare it with the client's response.
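Both mechanisms in the paragraph above can be sketched with Python's standard library. Two assumptions are worth stating loudly. First, for stored passwords the sketch swaps in salted PBKDF2 (hashlib.pbkdf2_hmac), a deliberately slow construction, instead of the classic UNIX crypt scheme. Second, the challenge-response part follows the simplified description in the text (hash of password concatenated with challenge); CHAP as specified in RFC 1994 actually hashes an identifier byte, the secret, and the challenge with MD5. All function names here are hypothetical.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative PBKDF2 work factor (an assumption)

# --- Stored-password check (salted PBKDF2 swapped in for illustration) ---

def make_password_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived hash); only this pair is stored, never the password."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, derived

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-hash the offered password with the stored salt and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

# --- Simplified challenge-response (not the exact CHAP wire format) ---

def server_challenge() -> bytes:
    """Server side: a fresh random challenge for every login attempt."""
    return secrets.token_bytes(16)

def client_response(password: str, challenge: bytes) -> bytes:
    """Client side: hash the password concatenated with the challenge."""
    return hashlib.md5(password.encode() + challenge).digest()

def server_verify(known_password: str, challenge: bytes, response: bytes) -> bool:
    """Server side: needs the password itself to recompute the expected response."""
    expected = hashlib.md5(known_password.encode() + challenge).digest()
    return hmac.compare_digest(expected, response)

salt, record = make_password_record("correct horse")
print(check_password("correct horse", salt, record))  # True
print(check_password("wrong guess", salt, record))    # False

challenge = server_challenge()
response = client_response("s3cret", challenge)
print(server_verify("s3cret", challenge, response))  # True
# Replaying a captured response against a new challenge fails
# (with overwhelming probability, since challenges are random).
print(server_verify("s3cret", server_challenge(), response))
```

Because a new challenge is issued for every attempt, an eavesdropper who records one response cannot reuse it later, which is exactly the replay weakness the challenge-response method addresses.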
The most important feature of a hash function is collision resistance. Because hashing
algorithms map an arbitrarily large domain to a smaller, fixed-size string of bits, the mapping
is necessarily many-to-one, so some pairs of distinct messages must hash to the same value.
The biggest requirement of a good hashing algorithm is that it should be infeasible to find two
messages that hash to exactly the same value. MD4 is a hashing algorithm developed by
Ronald Rivest in 1990. Because of weaknesses in the internal design of MD4, collision attacks
against it were developed rapidly; some of these attacks can find MD4 collisions in under a
minute on an ordinary PC. As a result, MD4 should be considered broken and should not be
used for hashing purposes. Rivest developed the MD5 hashing algorithm in 1991. MD5 was
based on MD4 but was designed as a more conservative, strengthened version to address the
collision-resistance weaknesses of MD4. (MD5 itself was later shown to be vulnerable to
practical collision attacks, first published in 2004, so it should no longer be relied on where
collision resistance matters.)
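The many-to-one argument can be made concrete by shrinking the digest. The sketch below truncates SHA-1 to a deliberately tiny 24 bits (this toy digest and the function names tiny_hash and find_collision are hypothetical, not a real algorithm) and hashes distinct inputs until two of them share a digest. By the pigeonhole principle a collision is guaranteed within 2^24 + 1 inputs; in practice one appears after only a few thousand, which is why real digests must be long.

```python
import hashlib

def tiny_hash(data: bytes, bits: int = 24) -> int:
    """Toy digest: the first `bits` bits of SHA-1. Far too short to be secure."""
    full = int.from_bytes(hashlib.sha1(data).digest(), "big")
    return full >> (160 - bits)

def find_collision(bits: int = 24) -> tuple[bytes, bytes, int]:
    """Hash distinct inputs until two of them share the same tiny_hash value."""
    seen: dict[int, bytes] = {}  # truncated digest -> first input that produced it
    # Pigeonhole principle: 2**bits + 1 distinct inputs must contain a collision.
    for counter in range(2 ** bits + 1):
        msg = f"message-{counter}".encode()
        h = tiny_hash(msg, bits)
        if h in seen:
            return seen[h], msg, counter + 1
        seen[h] = msg
    raise RuntimeError("unreachable: a collision is guaranteed by the pigeonhole principle")

first, second, tries = find_collision()
print(first, second, "share a 24-bit digest after", tries, "inputs")
```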
As a rule of thumb, it is also better to consider the SHA-1 algorithm more secure than MD5,
because SHA-1 generates a larger message digest. With recent advances in hardware, some
groups do not consider a 128-bit message digest large enough to thwart brute-force attacks.
Because of the birthday paradox, a brute-force collision search on an n-bit hash requires about
2^(n/2) operations: 2^64 for a 128-bit hash such as MD5, and 2^80 for the 160-bit SHA-1 hash.
(SHA-1 has since been weakened as well; a practical SHA-1 collision was publicly demonstrated
in 2017.)
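The 2^(n/2) figure follows from the standard birthday-bound approximation. Treating each digest as a uniformly random n-bit value, the probability that k digests are all distinct is approximated below; setting it to one half (and using k(k-1) ≈ k^2 for large k) gives the expected work of a brute-force collision search.

```latex
\Pr[\text{all } k \text{ digests distinct}]
  = \prod_{i=1}^{k-1}\left(1 - \frac{i}{2^{n}}\right)
  \approx \exp\!\left(-\frac{k(k-1)}{2^{n+1}}\right)

\exp\!\left(-\frac{k^{2}}{2^{n+1}}\right) = \frac{1}{2}
  \;\Longrightarrow\;
  k \approx \sqrt{2\ln 2}\cdot 2^{n/2} \approx 1.18 \cdot 2^{n/2}

n = 128:\; k \approx 2^{64}, \qquad n = 160:\; k \approx 2^{80}
```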
RIPEMD-160 is a 160-bit cryptographic hash function based on the design of MD4. It was
designed by Hans Dobbertin, Antoon Bosselaers, and Bart Preneel, and it takes into account
the knowledge gained from the analysis of MD4 and MD5. RIPEMD-160 was designed
primarily for 32-bit processors.
For efficient implementation of hashing algorithms on 64-bit systems, Ross Anderson
and Eli Biham designed the Tiger hashing algorithm. The MD family of hashing algorithms
uses 32-bit rotations and additions, so a 64-bit register handles only 32-bit values at a time.
The SHA-1 algorithm is likewise optimized for 32-bit systems and gains little from a 64-bit
processor. Tiger is particularly efficient on RISC processors because its internal table lookup
operations can be performed in parallel; in contrast, the MD family tends to suffer from
pipeline stalls on RISC processors. Tiger is usually used with one of three output lengths:
Tiger/192 (192-bit hash), Tiger/160 (160-bit hash), and Tiger/128 (128-bit hash); the shorter
variants are truncations of the 192-bit result.
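The 32-bit word operations that the MD family relies on can be sketched as follows. Python integers are unbounded, so every result is masked to 32 bits; this mirrors what a 64-bit processor must also do when it runs an algorithm defined on 32-bit words, leaving the upper half of each register unused. The helper names add32 and rotl32 are hypothetical; the semantics (addition modulo 2^32 and left rotation of a 32-bit word) are those used by MD4, MD5, and SHA-1.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits of a result

def add32(*words: int) -> int:
    """Addition modulo 2**32, as used by MD4, MD5, and SHA-1."""
    return sum(words) & MASK32

def rotl32(x: int, s: int) -> int:
    """Rotate a 32-bit word left by s bits (0 <= s < 32)."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32

print(hex(add32(0xFFFFFFFF, 1)))   # 0x0 (the carry out of bit 31 is discarded)
print(hex(rotl32(0x80000001, 1)))  # 0x3 (the top bit wraps around to bit 0)
print(hex(rotl32(0x12345678, 8)))  # 0x34567812
```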
Hash functions can also be viewed as compression functions, which are able to represent
large chunks of input data as small, manageable chunks. This is particularly useful when a
large message has to be digitally signed (encrypted with the private key). Creating a digital
signature with a private key is a slow, CPU-intensive process, so in real-life digital signatures
the private-key operation is performed on the hash of the input message rather than on the
message itself. The hash is also always of a fixed size (for example, an MD5 hash is always
128 bits long, and a SHA-1 hash is always 160 bits long), and it is easier to quantify the cost of
signature creation (private-key) operations on fixed-size inputs than on inputs of variable size.
Because the signature is computed over the hash rather than over the message, anyone who
could find a second message with the same hash could reuse the signature for it; there is
therefore a very strong requirement for choosing a collision-resistant message digest
algorithm to generate the hash.
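The hash-then-sign pattern can be sketched with textbook RSA. This is a toy under several loud assumptions: the key is built from two publicly known Mersenne primes (so anyone can factor it), no padding scheme such as PKCS#1 v1.5 or PSS is applied as real systems require, and SHA-256 is swapped in for MD5 or SHA-1. What it does show is the point made above: however large the message, the private-key operation is applied only to its fixed-size digest. The names sign, verify, and digest_int are hypothetical.

```python
import hashlib

# Toy RSA key built from two publicly known Mersenne primes.
# INSECURE: anyone can factor N. For illustration only.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                          # modulus of about 1128 bits
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (requires Python 3.8+)

def digest_int(message: bytes) -> int:
    """Compress a message of any length to a 256-bit integer (always < N)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    """Private-key operation applied to the fixed-size digest, not the message."""
    return pow(digest_int(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Public-key operation recovers the digest; compare it with a fresh hash."""
    return pow(signature, E, N) == digest_int(message)

big_message = b"A" * 10_000_000  # 10 MB of data, yet only 256 bits get signed
sig = sign(big_message)
print(verify(big_message, sig))         # True
print(verify(big_message + b"!", sig))  # False: the digest no longer matches
```

The cost of sign is dominated by one modular exponentiation on a 256-bit value, regardless of message length; only the hashing step grows with the input.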