
In cryptography, a collision attack on a cryptographic hash tries to find two inputs producing the same hash value, i.e. a hash collision. This is in contrast to a preimage attack where a specific target hash value is specified.

There are roughly two types of collision attacks:

Classical collision attack: Find two different messages m₁ and m₂ such that hash(m₁) = hash(m₂).

More generally:

Chosen-prefix collision attack: Given two different prefixes p₁ and p₂, find two suffixes s₁ and s₂ such that hash(p₁ ∥ s₁) = hash(p₂ ∥ s₂), where ∥ denotes the concatenation operation.

Classical collision attack

Much like symmetric-key ciphers are vulnerable to brute-force attacks, every cryptographic hash function is inherently vulnerable to collisions using a birthday attack. Because of the birthday problem, these attacks are much faster than brute force would be: a hash of n bits can be broken in roughly 2^(n/2) time steps (evaluations of the hash function).

Mathematically stated, a collision attack finds two different messages m₁ and m₂ such that hash(m₁) = hash(m₂). In a classical collision attack, the attacker has no control over the content of either message; both are chosen arbitrarily by the algorithm.
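The birthday bound can be observed directly on a deliberately weakened hash. The sketch below assumes a toy n-bit hash made by keeping only the top n bits of SHA-256; the names `truncated_hash` and `birthday_collision` are illustrative, not from any library. For n = 32, a collision typically appears after a number of evaluations on the order of 2^16, not 2^32.

```python
import hashlib
import itertools

def truncated_hash(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash: the top n_bits of SHA-256(data)."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)

def birthday_collision(n_bits: int):
    """Hash distinct messages until two share an n-bit hash value.

    Returns (m1, m2, evaluations). By the birthday bound, the number of
    evaluations needed is on the order of 2**(n_bits / 2), not 2**n_bits.
    """
    seen = {}
    for evaluations in itertools.count(1):
        msg = evaluations.to_bytes(8, "big")  # distinct counter-based messages
        h = truncated_hash(msg, n_bits)
        if h in seen:
            return seen[h], msg, evaluations
        seen[h] = msg

m1, m2, evals = birthday_collision(32)
print(m1.hex(), m2.hex(), evals)  # evals is on the order of 2**16, far below 2**32
```

Doubling n_bits squares the expected work, which is why the birthday attack, rather than exhaustive search, sets the baseline for judging collision resistance.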

More efficient attacks are possible by applying cryptanalysis to specific hash functions. When a collision attack is discovered that is faster than a birthday attack, the hash function is often denounced as "broken". The NIST hash function competition was largely prompted by published collision attacks against two very commonly used hash functions, MD5^[1] and SHA-1. The collision attacks against MD5 have improved so much that, as of 2007, finding a collision takes just a few seconds on a regular computer.^[2] Hash collisions created this way are usually of constant length and largely unstructured, so they cannot directly be applied to attack widespread document formats or protocols.

However, workarounds are possible by abusing dynamic constructs present in many formats. In this way, two documents can be created that are as similar as possible yet have the same hash value. One document would be shown to an authority to be signed, and the signature could then be copied to the other file. Such a malicious document would contain two different messages in the same file, but conditionally display one or the other through subtle changes to the file:

Some document formats, such as PostScript, as well as macros in Microsoft Word, have conditional constructs (if-then-else) that allow testing whether a location in the file has one value or another in order to control what is displayed.^[3]^[4]
TIFF files can contain cropped images, with a different part of an image being displayed without affecting the hash value.^[4]
PDF files are vulnerable to collision attacks that use color values (for example, the text of one message is displayed in white so that it blends into the background, while the text of the other message is displayed in a dark color); the colors can then be altered to change the signed document's content.^[4]
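These tricks rely on a property of Merkle–Damgård hashing: once two inputs reach the same internal chaining value, appending any identical suffix keeps their hashes equal. The sketch below illustrates this under loud assumptions: a toy Merkle–Damgård hash with 8-byte blocks and a 24-bit chaining value built from truncated SHA-256 (so a birthday search finishes instantly), and a hypothetical `TOYDOC` format whose viewer compares a "switch" block against a reference copy stored later in the file. It stands in for the if-then-else constructs listed above and is not how PostScript, TIFF or PDF attacks are actually encoded.

```python
import hashlib

BLOCK = 8        # toy block size in bytes (assumption)
STATE_BYTES = 3  # toy 24-bit chaining value, small enough to brute-force
IV = bytes(STATE_BYTES)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with 0x80 / zero / 8-byte-length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def state_after(data: bytes) -> bytes:
    """Chaining value after absorbing block-aligned data (no padding)."""
    assert len(data) % BLOCK == 0
    state = IV
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def find_block_collision(state: bytes):
    """Birthday search for two distinct blocks that collide from `state`."""
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        i += 1

HEADER = b"%TOYDOC\n"  # hypothetical 8-byte header, exactly one block
b1, b2 = find_block_collision(state_after(HEADER))
# The shared tail embeds a copy of b1 so the viewer can compare against it.
TAIL = b"IF=" + b1 + b"\nA: Pay $10\nB: Pay $9999\n"
doc_a = HEADER + b1 + TAIL  # the version shown to the signer
doc_b = HEADER + b2 + TAIL  # the version the attacker later presents

def render(doc: bytes) -> str:
    """Hypothetical viewer: show message A if the switch block equals the
    reference copy in the tail, otherwise show message B."""
    switch = doc[len(HEADER):len(HEADER) + BLOCK]
    ref_start = len(HEADER) + BLOCK + len(b"IF=")
    reference = doc[ref_start:ref_start + BLOCK]
    messages = doc[ref_start + BLOCK + 1:].decode().splitlines()
    return messages[0] if switch == reference else messages[1]

print(toy_md_hash(doc_a) == toy_md_hash(doc_b))  # True
print(render(doc_a), "|", render(doc_b))         # A: Pay $10 | B: Pay $9999
```

The two files differ only in the 8-byte switch block; everything after it, including the length padding, is identical, so the collision in the chaining value carries through to the final hash.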

Chosen-prefix collision attack

An extension of the collision attack is the chosen-prefix collision attack, which is specific to Merkle–Damgård hash functions. In this case, the attacker can choose two arbitrarily different documents, and then append different calculated values that result in the whole documents having an equal hash value. This attack is normally harder (a hash of n bits can be broken in 2^((n/2)+1) time steps), but is much more powerful than a classical collision attack.

Mathematically stated, given two different prefixes p₁, p₂, the attack finds two suffixes s₁ and s₂ such that hash(p₁ ∥ s₁) = hash(p₂ ∥ s₂) (where ∥ is the concatenation operation).
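One generic way to satisfy this definition is a birthday search across two lists: hash many candidate suffix blocks from the chaining value reached after p₁ and many from the one reached after p₂, until some output appears in both lists. That takes about two lists of 2^(n/2) evaluations each, in line with the cost quoted above. The sketch assumes a toy Merkle–Damgård hash (8-byte blocks, 24-bit chaining value from truncated SHA-256) and equal-length, block-aligned prefixes so that the final length padding matches; the function name `chosen_prefix_collision` is illustrative. The practical attacks on MD5 and SHA-1 rely on dedicated cryptanalysis, which this sketch does not attempt.

```python
import hashlib

BLOCK = 8        # toy block size in bytes (assumption)
STATE_BYTES = 3  # toy 24-bit chaining value (assumption)
IV = bytes(STATE_BYTES)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with 0x80 / zero / 8-byte-length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def state_after(data: bytes) -> bytes:
    """Chaining value after absorbing block-aligned data (no padding)."""
    assert len(data) % BLOCK == 0
    state = IV
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def chosen_prefix_collision(p1: bytes, p2: bytes):
    """Return suffix blocks (s1, s2) with toy_md_hash(p1 + s1) == toy_md_hash(p2 + s2)."""
    assert len(p1) == len(p2) and len(p1) % BLOCK == 0
    c1, c2 = state_after(p1), state_after(p2)
    from1, from2 = {}, {}  # output -> block, one table per prefix
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        out1 = compress(c1, block)
        if out1 in from2:          # a p1-side block meets an earlier p2-side block
            return block, from2[out1]
        from1[out1] = block
        out2 = compress(c2, block)
        if out2 in from1:          # a p2-side block meets a p1-side block
            return from1[out2], block
        from2[out2] = block
        i += 1

p1 = b"Pay Bob $10.00 \n"  # two different 16-byte prefixes chosen by the attacker
p2 = b"Pay Eve $99999 \n"
s1, s2 = chosen_prefix_collision(p1, p2)
print(toy_md_hash(p1 + s1) == toy_md_hash(p2 + s2))  # True
```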

More efficient attacks are also possible by applying cryptanalysis to specific hash functions. In 2007, a chosen-prefix collision attack was found against MD5, requiring roughly 2⁵⁰ evaluations of the MD5 function. The paper also demonstrates two X.509 certificates for different domain names with colliding hash values. This means that a certificate authority could be asked to sign a certificate for one domain, and then that certificate (especially its signature) could be used to create a new rogue certificate to impersonate another domain.^[5]

A real-world collision attack was published in December 2008, when a group of security researchers published a forged X.509 signing certificate that could be used to impersonate a certificate authority, taking advantage of a chosen-prefix collision attack against the MD5 hash function. This meant that an attacker could impersonate any SSL-secured website as a man-in-the-middle, thereby subverting the certificate validation built into every web browser to protect electronic commerce. The rogue certificate might not be revocable by real authorities, and could also have an arbitrary forged expiry time. Even though MD5 was known to be very weak in 2004,^[1] certificate authorities were still willing to sign MD5-verified certificates in December 2008,^[6] and at least one Microsoft code-signing certificate was still using MD5 in May 2012.

The Flame malware successfully used a new variation of a chosen-prefix collision attack to spoof code signing of its components by a Microsoft root certificate that still used the compromised MD5 algorithm.^[7]^[8]

In 2019, researchers found a chosen-prefix collision attack against SHA-1 with a computational complexity between 2^66.9 and 2^69.4 and a cost of less than 100,000 US dollars.^[9]^[10] In 2020, researchers reduced the complexity of a chosen-prefix collision attack against SHA-1 to 2^63.4.^[11]
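To put the 2020 figure in perspective, it can be compared with the generic chosen-prefix cost of 2^((n/2)+1) time steps for SHA-1's 160-bit output. This is a rough comparison that ignores constant factors, memory and the differing cost of each step:

```latex
\frac{2^{(160/2)+1}}{2^{63.4}} = 2^{81 - 63.4} = 2^{17.6} \approx 2.0 \times 10^{5}
```

In other words, under these simplifications the dedicated attack is on the order of a hundred thousand times cheaper than the generic one.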

Attack scenarios

Many applications of cryptographic hash functions do not rely on collision resistance, so collision attacks do not affect their security. For example, HMACs are not vulnerable.^[12] For a collision attack to be useful, the attacker must control the input to the hash function.

Digital signatures

Because digital signature algorithms cannot sign a large amount of data efficiently, most implementations use a hash function to reduce ("compress") the amount of data that needs to be signed down to a constant size. Digital signature schemes often become vulnerable to hash collisions as soon as the underlying hash function is practically broken; techniques like randomized (salted) hashing will buy extra time by requiring the harder preimage attack.^[13]
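The idea behind randomized hashing can be illustrated with a toy hash. In this sketch (all assumptions): the hash is a toy Merkle–Damgård construction with 8-byte blocks and a 24-bit chaining value, and "randomization" simply means the signer prepends a fresh 8-byte salt before hashing; the name `randomized_hash` is hypothetical and standardized randomized-hashing schemes differ in detail. A collision precomputed for the unsalted hash stops working, because the salt changes the chaining value from which the colliding blocks are processed.

```python
import hashlib
import os

BLOCK = 8        # toy block size in bytes (assumption)
STATE_BYTES = 3  # toy 24-bit chaining value (assumption)
IV = bytes(STATE_BYTES)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with 0x80 / zero / 8-byte-length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision(state: bytes):
    """Birthday search for two distinct blocks that collide from `state`."""
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        i += 1

def randomized_hash(salt: bytes, msg: bytes) -> bytes:
    """Hypothetical randomized hashing: hash salt || msg."""
    return toy_md_hash(salt + msg)

# Mallory precomputes a collision for the unsalted hash, starting at the IV.
b1, b2 = find_block_collision(IV)
tail = b"shared contract text\n"
doc_a, doc_b = b1 + tail, b2 + tail
print(toy_md_hash(doc_a) == toy_md_hash(doc_b))  # True

# The signer picks the salt only after receiving the document.
salt = os.urandom(BLOCK)
print(randomized_hash(salt, doc_a) == randomized_hash(salt, doc_b))
# False, except with probability around 2**-24 for this toy hash
```

To succeed against the salted scheme, Mallory would instead need a second input matching the salted hash of document A, which is the harder preimage-style problem the paragraph refers to.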

The usual attack scenario goes like this:

Mallory creates two different documents A and B that have an identical hash value, i.e., a collision. Mallory seeks to deceive Bob into accepting document B, ostensibly from Alice.
Mallory sends document A to Alice, who agrees to what the document says, signs its hash, and sends the signature to Mallory.
Mallory attaches the signature from document A to document B.
Mallory then sends the signature and document B to Bob, claiming that Alice signed B. Because the digital signature matches document B's hash, Bob's software is unable to detect the substitution.^[citation needed]
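The scenario above can be replayed end to end with toy components. Every primitive here is an assumption chosen for illustration: a toy Merkle–Damgård hash with a 24-bit chaining value, and textbook RSA (no padding, insecure) applied to the digest, with a key built from the Mersenne primes 2^31 - 1 and 2^61 - 1. Because Alice's signature covers only the hash, the signature she produces for document A also verifies for document B.

```python
import hashlib

BLOCK = 8        # toy block size in bytes (assumption)
STATE_BYTES = 3  # toy 24-bit chaining value (assumption)
IV = bytes(STATE_BYTES)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]

def toy_md_hash(msg: bytes) -> bytes:
    """Merkle-Damgard iteration with 0x80 / zero / 8-byte-length padding."""
    padded = msg + b"\x80"
    padded += b"\x00" * ((-len(padded) - 8) % BLOCK)
    padded += len(msg).to_bytes(8, "big")
    state = IV
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

def find_block_collision(state: bytes):
    """Birthday search for two distinct blocks that collide from `state`."""
    seen = {}
    i = 0
    while True:
        block = i.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block
        seen[out] = block
        i += 1

# Alice's toy textbook-RSA key pair (illustration only, not secure).
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def sign(msg: bytes) -> int:
    """Hash-then-sign: only the digest of msg is signed."""
    return pow(int.from_bytes(toy_md_hash(msg), "big"), D, N)

def verify(msg: bytes, sig: int) -> bool:
    return pow(sig, E, N) == int.from_bytes(toy_md_hash(msg), "big")

# Step 1: Mallory builds two different documents with the same hash.
b1, b2 = find_block_collision(IV)
tail = b" -- contract text shared by both versions\n"
doc_a, doc_b = b1 + tail, b2 + tail
# Step 2: Alice signs document A.
sig = sign(doc_a)
# Steps 3-4: the same signature is presented with document B and verifies.
print(verify(doc_b, sig))  # True
```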

In 2008, researchers used a chosen-prefix collision attack against MD5 using this scenario, to produce a rogue certificate authority certificate. They created two versions of a TLS public key certificate, one of which appeared legitimate and was submitted for signing by the RapidSSL certificate authority. The second version, which had the same MD5 hash, contained flags which signal web browsers to accept it as a legitimate authority for issuing arbitrary other certificates.^[14]

Hash flooding

Hash flooding (also known as HashDoS^[15]) is a denial of service attack that uses hash collisions to exploit the worst-case (linear probe) runtime of hash table lookups.^[16] It was originally described in 2003. To execute such an attack, the attacker sends the server multiple pieces of data that hash to the same value and then tries to get the server to perform slow lookups. As the main focus of hash functions used in hash tables was speed instead of security, most major programming languages were affected,^[17] with new vulnerabilities of this class still showing up a decade after the original presentation.^[16]
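The mechanics can be sketched with a polynomial string hash that uses the same recurrence as Java's `String.hashCode` (h = 31·h + c, modulo 2^32). Because "Aa" and "BB" hash to the same value and have the same length, every string built by concatenating k of them collides as well, giving 2^k colliding keys for free. `ChainedTable` is a hypothetical minimal hash table using separate chaining (rather than the linear probing mentioned above) that counts key comparisons; inserting n colliding keys costs n(n-1)/2 comparisons instead of roughly n.

```python
from itertools import product

def weak_string_hash(s: str) -> int:
    """Unkeyed polynomial hash h = 31*h + ord(c) mod 2**32
    (the recurrence used by Java's String.hashCode, taken as unsigned)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

class ChainedTable:
    """Minimal separate-chaining hash table that counts key comparisons."""

    def __init__(self, n_buckets: int = 1024):
        self.buckets = [[] for _ in range(n_buckets)]
        self.comparisons = 0

    def insert(self, key: str) -> None:
        bucket = self.buckets[weak_string_hash(key) % len(self.buckets)]
        for existing in bucket:  # linear scan of the chain for a duplicate
            self.comparisons += 1
            if existing == key:
                return
        bucket.append(key)

# "Aa" and "BB" collide and have equal length, so all 2**k concatenations collide.
k = 10
attack_keys = ["".join(parts) for parts in product(["Aa", "BB"], repeat=k)]
table = ChainedTable()
for key in attack_keys:
    table.insert(key)
print(table.comparisons)  # 523776, i.e. 1024 * 1023 // 2
```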

To prevent hash flooding without making the hash function overly complex, newer keyed hash functions have been introduced, with the security objective that collisions are hard to find as long as the key is unknown. They may be slower than previous hashes, but are still much easier to compute than cryptographic hashes. As of 2021, Jean-Philippe Aumasson and Daniel J. Bernstein's SipHash (2012) is the most widely used hash function in this class.^[18] (Non-keyed "simple" hashes remain safe to use as long as the application's hash table cannot be controlled from the outside.)
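As a sketch of how such a keyed hash is used, the following is a from-scratch, readability-first Python implementation of SipHash-2-4 (two compression rounds per message word, four finalization rounds), with a self-check against the test vector published with SipHash. The 16-byte key is generated per process with `os.urandom` and is assumed to stay secret; with it, the "Aa"/"BB" strings that all collide under an unkeyed polynomial hash spread across the buckets. Production runtimes use optimized native implementations, and some choose different round counts.

```python
import os
from itertools import product

MASK = (1 << 64) - 1

def _rotl(x: int, b: int) -> int:
    """Rotate a 64-bit word left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK

def siphash24(key: bytes, msg: bytes) -> int:
    """SipHash-2-4 of msg under a 16-byte key, returned as a 64-bit integer."""
    assert len(key) == 16
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    v = [k0 ^ 0x736F6D6570736575, k1 ^ 0x646F72616E646F6D,
         k0 ^ 0x6C7967656E657261, k1 ^ 0x7465646279746573]

    def sipround() -> None:
        v[0] = (v[0] + v[1]) & MASK; v[1] = _rotl(v[1], 13); v[1] ^= v[0]; v[0] = _rotl(v[0], 32)
        v[2] = (v[2] + v[3]) & MASK; v[3] = _rotl(v[3], 16); v[3] ^= v[2]
        v[0] = (v[0] + v[3]) & MASK; v[3] = _rotl(v[3], 21); v[3] ^= v[0]
        v[2] = (v[2] + v[1]) & MASK; v[1] = _rotl(v[1], 17); v[1] ^= v[2]; v[2] = _rotl(v[2], 32)

    def absorb(m: int) -> None:
        v[3] ^= m
        sipround()  # 2 compression rounds per 64-bit word
        sipround()
        v[0] ^= m

    end = len(msg) - len(msg) % 8
    for i in range(0, end, 8):
        absorb(int.from_bytes(msg[i:i + 8], "little"))
    # Final word: leftover bytes, with the message length mod 256 in the top byte.
    absorb(((len(msg) & 0xFF) << 56) | int.from_bytes(msg[end:], "little"))
    v[2] ^= 0xFF
    for _ in range(4):  # 4 finalization rounds
        sipround()
    return v[0] ^ v[1] ^ v[2] ^ v[3]

# Self-check: key 00..0f and message 00..0e from the SipHash test vector.
assert siphash24(bytes(range(16)), bytes(range(15))) == 0xA129CA6149BE45E5

secret = os.urandom(16)  # per-process key, assumed unknown to the attacker
words = ["".join(p) for p in product(["Aa", "BB"], repeat=10)]
load = {}
for word in words:
    bucket = siphash24(secret, word.encode()) % 1024
    load[bucket] = load.get(bucket, 0) + 1
print(max(load.values()))  # a small number, instead of all 1024 in one bucket
```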

It is possible to perform an analogous attack to fill up Bloom filters using a (partial) preimage attack.^[19]
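A minimal sketch of such a filling ("pollution") attack, under assumed parameters: a 1024-bit Bloom filter whose three index functions are derived from unkeyed SHA-256 with a one-byte domain separator. The attacker searches for inputs whose three positions are distinct and still unset, which is a partial preimage search on the index functions, so every insertion sets exactly three new bits. The names `indices` and `pollute` are illustrative.

```python
import hashlib
from itertools import count

M = 1024  # filter size in bits (assumption)
K = 3     # number of index functions (assumption)

def indices(item: bytes) -> list:
    """K bit positions from unkeyed SHA-256, one domain-separation byte per index."""
    return [int.from_bytes(hashlib.sha256(bytes([j]) + item).digest()[:4], "big") % M
            for j in range(K)]

def pollute(bits: list, target_fill: float) -> list:
    """Insert only items whose K positions are distinct and unset, until the
    fraction of set bits reaches target_fill. Returns the chosen items."""
    chosen = []
    set_count = sum(bits)
    for n in count():
        if set_count >= target_fill * M:
            return chosen
        item = n.to_bytes(8, "big")
        pos = indices(item)
        # Partial preimage condition: every position must be new.
        if len(set(pos)) == K and not any(bits[p] for p in pos):
            for p in pos:
                bits[p] = True
            set_count += K
            chosen.append(item)

bits = [False] * M
evil_items = pollute(bits, target_fill=0.75)
print(len(evil_items), sum(bits))  # 256 768
```

With these parameters the attacker reaches 75% fill with exactly 256 chosen items. For comparison, random items fill the filter to about 1 - e^(-3n/1024) after n insertions, so roughly 470 of them would be needed for the same fill.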

References
