#### Challenge 55:  MD4 Collisions

[Back to Index](CryptoPalsWalkthroughs_Cobb.ipynb)

In [1]:
from Crypto.Random import random
from Crypto.Cipher import AES
from Crypto.Cipher import Blowfish
import math
import cryptopals as cp
import pdb

<div class="alert alert-block alert-info">   

MD4 is a `128-bit` cryptographic hash function, meaning it should take a work factor of roughly `2^64` to find collisions.

It turns out we can do much better.

The paper "Cryptanalysis of the Hash Functions MD4 and RIPEMD" by Wang et al. details a cryptanalytic attack that lets us find collisions in `2^8` MD4 computations or fewer.

Given a message block `M`, Wang outlines a strategy for finding a sister message block `M'`, differing only in a few bits, that will collide with it. Just so long as a short set of conditions holds true for `M`.

What sort of conditions? Simple bitwise equalities within the intermediate hash function state, e.g. `a[1][6] = b[0][6]`. This should be read as: "the sixth bit (zero-indexed) of `a[1]` (i.e. the first update to `a`) should equal the sixth bit of `b[0]` (i.e. the initial value of `b`)".

It turns out that a lot of these conditions are trivial to enforce. To see why, take a look at the first (of three) rounds in the MD4 compression function. In this round, we iterate over each word in the message block sequentially and mix it into the state. So we can make sure all our first-round conditions hold by doing this:

```
# calculate the new value for a[1] in the normal fashion
a[1] = (a[0] + f(b[0], c[0], d[0]) + m[0]).lrot(3)

# correct the erroneous bit
a[1] ^= ((a[1][6] ^ b[0][6]) << 6)

# use algebra to correct the first message block
m[0] = a[1].rrot(3) - a[0] - f(b[0], c[0], d[0])
```
    
Simply ensuring all the first round conditions puts us well within the range to generate collisions, but we can do better by correcting some additional conditions in the second round. This is a bit trickier, as we need to take care not to stomp on any of the first-round conditions.

Once you've adequately massaged `M`, you can simply generate `M'` by flipping a few bits and test for a collision. A collision is not guaranteed as we didn't ensure every condition. But hopefully we got enough that we can find a suitable `(M, M')` pair without too much effort.

Implement Wang's attack.

</div>

---
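The condition notation in the challenge text maps directly onto shifts and masks. As a minimal sketch (the helper name `bit` and the sample value of `a1` are my own, chosen only for illustration), a condition such as `a[1][6] = b[0][6]` could be checked like this:

```python
def bit(word, i):
    """Return bit i of a 32-bit word, zero-indexed from the least significant bit."""
    return (word >> i) & 1

# Hypothetical value for a[1]; b0 is MD4's initial value of b.
a1 = 0x00000040   # only bit 6 is set
b0 = 0xefcdab89   # low byte 0x89 = 0b10001001, so bit 6 is clear

# The condition a[1][6] = b[0][6] fails for these two words.
condition_holds = bit(a1, 6) == bit(b0, 6)
```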

To get a better handle on what the attack manipulates, here is MD4 implemented in Python, following [RFC 1320](https://tools.ietf.org/html/rfc1320) by Rivest.

In [2]:
def MD4(data):
    
    if isinstance(data, str):
        data = data.encode()
    
    # Step 1:  Append padding bits.  Single 1-bit + 0-bits so that
    #          length of message is congruent to 448 mod 512.
    #          I'll assume we're always passed a string or bytes.
    
    bit_length = len(data)*8
    
    # append 1-bit = 0x80 
    data += b'\x80' # Hex 0x80 = 0b10000000
    
    data_len = len(data) % 64
    padding_len = (56 - data_len) % 64
    
    data += b'\x00'*padding_len
    
    # Step 2:  Append length.  64-bit little-endian count of message bits,
    #          measured before the padding was added.
    
    data += bit_length.to_bytes(8, 'little', signed=False)
        
    # Step 3:  Initialize MD Buffer.  
    
    A, B, C, D = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
    
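    # Additive constants for rounds 2 and 3.  RFC 1320 describes them as
    # representing the square roots of 2 and 3.
    MGK_1 = 0x5a827999
    MGK_2 = 0x6ed9eba1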
    
    # Step 4:  Process Message in blocks of 16 32-bit words (512 bits ea)
    
    # Define Auxiliary Functions:
    
    def lrot_32(n, d):
        # Circular left rotate of a 32-bit word.  Python's << is a plain
        # shift on unbounded ints, so mask the result back to 32 bits.
        return ((n << d) | (n >> (32 - d))) & 0xFFFFFFFF
    
    def F(X, Y, Z):         
        return (X & Y) | (~X & Z)
    
    def G(X, Y, Z):
        return (X & Y) | (X & Z) | (Y & Z)
    
    def H(X, Y, Z):
        return (X ^ Y ^ Z)
    
    def phi(j, a, b, c, d, m_k, s):                
        if j == 0:            
            x = lrot_32(((a + F(b, c, d) + m_k) % 2**32), s)        
        elif j ==  1:            
            x = lrot_32(((a + G(b, c, d) + m_k + MGK_1) % 2**32), s)            
        elif j == 2:            
            x = lrot_32(((a + H(b, c, d) + m_k + MGK_2) % 2**32), s)            
        else:
            raise ValueError('Invalid j value to phi()')
        return x

    # Convert the data into 32-bit words.
    
    M = []
    N = (len(data) // 64)
    for ii in range(0, len(data), 4):
        word = int.from_bytes(data[ii:ii+4], byteorder='little', signed=False)
        M.append(word)
    
    # Run the compression algorithm.  Loop for each block of 512 bits until
    # full message is consumed.
    
    for kk in range(N):
        
        X = M[16*kk:16*(kk+1)]
        AA, BB, CC, DD = A, B, C, D
        
        # Round 1
        for ii in [0, 4, 8, 12]:
            A = phi(0, A, B, C, D, X[ii], 3) 
            D = phi(0, D, A, B, C, X[ii+1], 7) 
            C = phi(0, C, D, A, B, X[ii+2], 11)  
            B = phi(0, B, C, D, A, X[ii+3], 19)

        # Round 2
        for ii in [0, 1, 2, 3]:
            A = phi(1, A, B, C, D, X[ii], 3) 
            D = phi(1, D, A, B, C, X[ii+4], 5) 
            C = phi(1, C, D, A, B, X[ii+8], 9)  
            B = phi(1, B, C, D, A, X[ii+12], 13)
        
        # Round 3
        for ii in [0, 2, 1, 3]:
            A = phi(2, A, B, C, D, X[ii], 3) 
            D = phi(2, D, A, B, C, X[(ii+8) % 16], 9) 
            C = phi(2, C, D, A, B, X[(ii+4) % 16], 11)  
            B = phi(2, B, C, D, A, X[(ii+12) % 16], 15)
        
        # Add this block's result to the running state, modulo 2**32.
        A = (A + AA) % 2**32
        B = (B + BB) % 2**32
        C = (C + CC) % 2**32
        D = (D + DD) % 2**32
    
    digest = (A % 2**32).to_bytes(4, 'little') + \
             (B % 2**32).to_bytes(4, 'little') + \
             (C % 2**32).to_bytes(4, 'little') + \
             (D % 2**32).to_bytes(4, 'little')
   
    return(digest)
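The padding in Steps 1 and 2 above is where off-by-one mistakes tend to hide, so it can help to look at it in isolation. The sketch below repeats the same arithmetic in a standalone helper (`md4_pad` is a name I made up for illustration) and prints the padded length at the block boundary:

```python
def md4_pad(data: bytes) -> bytes:
    """Pad data as in Steps 1 and 2 of RFC 1320 (illustrative helper)."""
    bit_length = len(data) * 8
    data += b'\x80'                             # a single 1-bit, then seven 0-bits
    data += b'\x00' * ((56 - len(data)) % 64)   # zero-fill up to 56 mod 64 bytes
    # 64-bit little-endian length of the original message, in bits
    data += (bit_length % 2**64).to_bytes(8, 'little')
    return data

for n in (0, 55, 56, 64):
    print(n, len(md4_pad(b'A' * n)))
```

A 55-byte message still fits in one 64-byte block, while a 56-byte message spills into a second block because the `0x80` byte leaves no room for the 8-byte length field.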

In [4]:
# Check against the MD4 test suite from RFC 1320:

assert(MD4('').hex() == '31d6cfe0d16ae931b73c59d7e0c089c0')
assert(MD4('a').hex() == 'bde52cb31de33e46245e05fbdbd6fb24')
assert(MD4('abc').hex() == 'a448017aaf21d8525fc10ae87aa6729d')
assert(MD4('message digest').hex() == 'd9130a8164549fe818874806e1c7014b')
assert(MD4('abcdefghijklmnopqrstuvwxyz').hex() == 'd79e1c308aa5bbcdeea8ed63df412da9')
assert(MD4('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789').hex() == '043f8582f241db351ce627e153e7f0e4')
assert(MD4('12345678901234567890123456789012345678901234567890123456789012345678901234567890').hex() == \
      'e33b4ddc9c38f2199c3e7b164fcc0536')
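With a working MD4, the first-round correction from the challenge text can be tried on its own. The following is a minimal, standalone sketch for the single condition `a[1][6] = b[0][6]`. The helper names (`lrot`, `rrot`, `F`, `fix_a1`) are my own, and a full attack would have to repeat this for every first-round condition in Wang's table, which is not reproduced here.

```python
import secrets

MASK = 0xFFFFFFFF

def lrot(x, s):
    """Rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def rrot(x, s):
    """Rotate a 32-bit word right by s bits."""
    return ((x >> s) | (x << (32 - s))) & MASK

def F(x, y, z):
    """MD4 round-1 function, masked to 32 bits."""
    return ((x & y) | (~x & z)) & MASK

def fix_a1(m0, a0, b0, c0, d0):
    """Adjust m0 so that bit 6 of a[1] equals bit 6 of b[0]."""
    # Calculate a[1] in the normal fashion.
    a1 = lrot((a0 + F(b0, c0, d0) + m0) & MASK, 3)
    # Flip bit 6 of a[1] only if it disagrees with bit 6 of b[0].
    a1 ^= (((a1 >> 6) ^ (b0 >> 6)) & 1) << 6
    # Solve the round equation for the message word that produces this a[1].
    m0 = (rrot(a1, 3) - a0 - F(b0, c0, d0)) & MASK
    return m0, a1

# MD4's initial state and a random first message word.
a0, b0, c0, d0 = 0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476
m0_fixed, a1 = fix_a1(secrets.randbits(32), a0, b0, c0, d0)

# Recompute a[1] from the corrected word to confirm the condition holds.
a1_check = lrot((a0 + F(b0, c0, d0) + m0_fixed) & MASK, 3)
assert a1_check == a1
assert ((a1_check >> 6) & 1) == ((b0 >> 6) & 1)
```

Because the round-1 step is invertible in the message word, changing `a[1]` and solving for `m[0]` never disturbs anything computed earlier, which is why the first-round conditions are cheap to enforce.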
