## Computer Security :: Lessons :: Cryptographic Hash Functions

### Hash Functions

A hash function takes a variable-length block of data as input and outputs a fixed-size value called a hash. A change to any bits in the input should result in a change to the hash output. A cryptographic hash function is an algorithm where it is computationally infeasible to find a data object that maps to a pre-specified hash results or two data objects that map to the same has result. The diagram shows that the input is padded to a fixed-length number of bits using the length of the original message in bits as part of the padding value. This is a security measure to increase the difficulty for an attacker to produce an alternative message with the same hash value.

For a hash value h = H(x), x is referred to as the preimage of h. Because H is a many-to-one mapping, for any given hash value h, there will usually be multiple preimages. A collision occurs if we have x ≠ y but H(x) = H(y). Collisions are not desirable for data integrity. The following are the requirement for a secure cryptographic hash function:

- Variable Input Size - H can be applied to a block of data of any size
- Fixed Output Size - H produces a fixed-length hash value.
- Efficiency - H(x) is relatively easy to computer for any given x, especially compared to encryption algorithms.
- Preimage Resistant - For any given hash value h, it is computationally infeasible to find y such that H(y) = h.
- Second Preimage Resistant - For any given block x, it is computationally infeasible to find y ≠ x with H(y) = H(x).
- Collision Resistant - It is computationally infeasible to find any pair (x, y) such that H(x) = H(y).
- Pseudorandomness - Output of H meets standard tests for pseudorandomness.

### Brute Force Attacks

The difficulty of a brute force attack on a cryptographic hash function depends on the bit length of the resulting hash. The longer the bit length, the more secure the hash function. For a preimage or second preimage attack, an attacker must try, on average, 2^{m-1} values of y for a function H(y) with a hash bit-size of m.

An attack on collision resistance is considerably easier than preimage or second preimage attacks. This is because of a statistical principle called the birthday paradox where choosing random variables from a uniform distribution then the probability that a repeated element is encountered exceeds 0.5 after √N choices have been made. Therefore, for an m-bit hash value an attacker could be expected to find a collision after √2^{m} or 2^{m/2} attempts.