## Computer Security :: Lessons :: SHA

### Cryptanalysis

Cryptanalysis is only feasible if it takes less effort than a brute-force attack. An ideal hash algorithm, therefore, would be immune to cryptanalysis if it is harder to cryptanalyze the function than it is to brute-force it. The structure of a typical hash function is shown in the diagram. The hash function takes an input message and partitions it into L fixed-size blocks of b bits each. The final block is padded to b bits, if necessary. The final block also contains the total length of the input to the hash function. This is what makes cryptanalysis difficult. The attacker must find two messages of equal length that hash to the same value, or two messages of differing values that, together with their length values, hash to the same value.

The hash algorithm repeatedly uses a compression function f that takes two inputs and produces an n-bit output. The two inputs to the compression function are the n-bit output from the previous step called the chaining variable and a b-bit block.

### Secure Hash Algorithm

The Secure Hash Algorithm, or SHA, is based on the hash function MD4. The original version of SHA was SHA-0, which was found to be vulnerable so an updated version was released called SHA-1. SHA-1 is on its way to being vulnerable to collision attacks and many web browsers that currently rely on SHA-1 are hurrying their transition to SHA-2, which consists of three versions of SHA with different hash lengths known as SHA-224, SHA-256, SHA-384, and SHA-512. Below is a table comparing different versions of SHA.

SHA-1 | SHA-224 | SHA-256 | SHA-384 | SHA-512 | |
---|---|---|---|---|---|

Hash Size | 160 | 224 | 256 | 384 | 512 |

Message Size | <2^{64} |
<2^{64} |
<2^{64} |
<2^{128} |
<2^{128} |

Block Size | 512 | 512 | 512 | 1024 | 1024 |

Number of Steps | 80 | 64 | 64 | 80 | 80 |

The following logic applies to the SHA-512 algorithm, but can be easily applied to any of the other SHA algorithms. The input message must be less than 2^{128} bits and produces a 512-bit hash as an output. The input is processed in 1024-bit blocks as can be seen in the figure to the right. The algorithm follows these steps:

- Append Padding Bits: The message is always padded, even if the message is already the desired length. The number of padding bits, therefore, could be between 1 and 1024. The padding consists of a single 1 bit followed by the necessary number of 0 bits.
- Append Length: A block of 128 bits is appended to the end of the message. This block is an unsigned 128-bit integer and contains the length of the original message before padding.
- Initialize Hash Buffer: A 512-bit buffer is used to hold the intermediate and final results of the hash function. The buffer can be represented as eight 64-bit registers, which are initialized to the first 64 bits of the square roots of the first eight prime numbers.
- Process Message Blocks: This is the core of the SHA-512 algorithm that consists of 80 rounds. Each round takes the 512-bit buffer value, abcdefgh, and updates the buffer contents. Each round uses a 64-bit value, W
_{t}, derived from the current block being processed, M_{i}. Each round also makes sure of an additive constant, K_{t}. These constants were obtained by taking the first 64 bits of the square roots of the first 80 prime numbers. The output of the final round is added to the input from the first round to produce the output for this block. - Output: After all of the blocks have been processed, the output from the final stage is the 512-bit hash.

For the gritty details on how SHA-2 works, check out this video by Professor Christof Paar of Ruhr University. You can also check out this video that explains the accepted proposal for SHA-3 called the Keccak function. This function works differently than previous iterations of SHA so it may be interesting to investigate.