In cryptography, cryptographic hash functions can be divided into two main categories. In the first category are those functions whose designs are based on mathematical problems, and whose security thus follows from rigorous mathematical proofs, complexity theory and formal reduction. These functions are called Provably Secure Cryptographic Hash Functions. To construct these is very difficult, and few examples have been introduced. Their practical use is limited.

In the second category are functions which are not based on mathematical problems, but on an ad-hoc constructions, in which the bits of the message are mixed to produce the hash. These are then believed to be hard to break, but no formal proof is given. Almost all hash functions in widespread use reside in this category. Some of these functions are already broken, and are no longer in use. See Hash function security summary.

Types of security of hash functions

Generally, the basic security of cryptographic hash functions can be seen from different angles: pre-image resistance, second pre-image resistance, collision resistance, and pseudo-randomness.

The meaning of "hard"

The basic question is the meaning of "hard". There are two approaches to answer this question. First is the intuitive/practical approach: "hard means that it is almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important."

The second approach is theoretical and is based on the computational complexity theory. If problem A is hard, there exists a formal security reduction from a problem which is widely considered unsolvable in polynomial time, such as integer factorization problem or discrete logarithm problem.

However, non-existence of a polynomial time algorithm does not automatically ensure that the system is secure. The difficulty of a problem also depends on its size. For example, RSA public key cryptography relies on the difficulty of integer factorization. However, it is considered secure only with keys that are at least 2048 bits large.

Password case

Also, if the set of inputs to the hash is relatively small or is ordered by likelihood in some way, then a brute force search may be practical, regardless of theoretical security. Likelihood of recovering the preimage depends on the input set size and the speed or cost of computing the hash function. A common example is the use of hashes to store password validation data. Rather than store the plaintext of user passwords, an access control system typically stores a hash of the password. When a person requests access, the password they submit is hashed and compared with the stored value. If the stored validation data is stolen, the thief will only have the hash values, not the passwords. However most users choose passwords in predictable ways and often passwords are short enough so that all possible combinations can be tested if fast hashes are used.[1] Special hashes called key derivation functions have been created to slow searches. See Password cracking.

Cryptographic hash functions

Main article: Cryptographic hash function

Most hash functions are built on an ad hoc basis, where the bits of the message are nicely mixed to produce the hash. Various bitwise operations (e.g. rotations), modular additions and compression functions are used in iterative mode to ensure high complexity and pseudo-randomness of the output. In this way, the security is very hard to prove and the proof is usually not done. Only a few years ago, one of the most popular hash functions, SHA-1, was shown to be less secure than its length suggested: collisions could be found in only 251[2] tests, rather than the brute-force number of 280.

In other words, most of the hash functions in use nowadays are not provably collision-resistant. These hashes are not based on purely mathematical functions. This approach results generally in more effective hashing functions, but with the risk that a weakness of such a function will be eventually used to find collisions. The famous case is MD5.

Meaning of "hard to find collision" in these cases means "almost certainly beyond the reach of any adversary who must be prevented from breaking the system for as long as the security of the system is deemed important." The meaning of the term is therefore somewhat dependent on the application, since the effort that a malicious agent may put into the task is usually proportional to his expected gain.

Provably secure hash functions

In this approach, we base the security of hash function on some hard mathematical problem and we prove that finding collisions of the hash functions is as hard as breaking the underlying problem. This gives a somewhat stronger notion of security than just relying on complex mixing of bits as in the classical approach.

A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.

It means that if finding collisions would be feasible in polynomial time by algorithm A, we could find and use polynomial time algorithm R (reduction algorithm) that would use algorithm A to solve problem P, which is widely supposed to be unsolvable in polynomial time. That is a contradiction. This means that finding collisions cannot be easier than solving P.

However, this only indicates that finding collisions is difficult in 'some' cases, as not all instances of a computationally hard problem are typically hard. Indeed, very large instances of NP-hard problems are routinely solved, while only the hardest are practically impossible to solve.

Hash functions with the proof of their security are based on mathematical functions.

Hard problems

Examples of problems, that are assumed to be not solvable in polynomial time

Downsides of provable approach

SWIFFT is an example of a hash function that circumvents these security problems. It can be shown that for any algorithm that can break SWIFFT with probability P within an estimated time T, we can find an algorithm that solves the worst-case scenario of a certain difficult mathematical problem within time T' depending on T and P.[citation needed]

Example of (impractical) Provably Secure Hash Function

Hash(m) = xm mod n where n is hard to factor composite number, and x is some prespecified base value. A collision xm1 congruent to xm2 reveals a multiple m1 - m2 of the order of x. Such information can be used to factor n in polynomial time assuming certain properties of x.

But the algorithm is quite inefficient because it requires on average 1.5 multiplications modulo n per message-bit.

More practical provably secure hash functions

References

  1. ^ Goodin, Dan (2012-12-10). "25-GPU cluster cracks every standard Windows password in <6 hours". Ars Technica. Retrieved 2020-11-23.
  2. ^ http://eprint.iacr.org/2008/469.pdf [bare URL PDF]
  3. ^ Petit, C.; Quisquater, J.-J.; Tillich, J.-P., "Hard and easy Components of Collision Search in the Zémor-Tillich hash function:new Attacks and Reduced Variants with Equivalent Security" (PDF), ((citation)): Missing or empty |title= (help)