#### Challenge 54: Kelsey and Kohno's Nostradamus Attack

[Back to Index](CryptoPalsWalkthroughs_Cobb.ipynb)

In [1]:
import cryptopals as cp
import math

<div class="alert alert-block alert-info">   

Hash functions are sometimes used as proof of a secret prediction.

For example, suppose you wanted to predict the score of every Major League Baseball game in a season. (`2,430` in all.) You might be concerned that publishing your predictions would affect the outcomes.

So instead you write down all the scores, hash the document, and publish the hash. Once the season is over, you publish the document. Everyone can then hash the document to verify your soothsaying prowess.

But what if you can't accurately predict the scores of `2.4k` baseball games? Have no fear - forging a prediction under this scheme reduces to another second preimage attack.

We could apply the long message attack from the previous problem, but it would look pretty shady. Would you trust someone whose predicted message turned out to be `2^50` bytes long?
    
It turns out we can run a successful attack with a much shorter suffix. Check the method:

</div>    

<div class="alert alert-block alert-info">     
    
1. Generate a large number of initial hash states. Say, `2^k`.

</div>

In [20]:
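# Generate 2**k unique initial states. They need not be random: here they are simply
# the integers 0 .. 2**k - 1, each encoded as a block-sized little-endian byte string.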

M_dummy = "This is some arbitrary message that is long enough to hold my full prediction.  It is going to be pretty long.  I could keep typing things.  But we just want to illustrate the process."
M_len = len(M_dummy)

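b = 16                  # digest size in bits (also used as the block size here)
block_size = b // 8     # the same size in bytes
# Choose k so that 2**k is at least the number of blocks in the placeholder message.
k = math.ceil(math.log2(M_len / block_size))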
N_initial_states = 2**k

starting_state_range = range(N_initial_states)
starting_state_data = [x.to_bytes(block_size, 'little') for x in starting_state_range]

<div class="alert alert-block alert-info"> 
    
2. Pair them up and generate single-block collisions. Now you have `2^k` hash states that collide into `2^(k-1)` states.
    
</div>

---

What we're supposed to do here is pretty ambiguous. It wasn't clear to me on a first pass what it means to "pair them up and generate single-block collisions" without more information, so let's go to the source.

> Original Paper: [Herding Hash Functions and the Nostradamus Attack](https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=150629) by Kelsey and Kohno

From the paper, it looks like we literally just take the states two at a time and, for each pair, search for two message blocks (one per state) that hash to the same next state. That shared state becomes a node in the next level of the tree. Let's try that.

In [21]:
def find_pairwise_MD_collision(state_a, state_b, block_size):
    """Find single blocks m_a, m_b such that MD(m_a, state_a) == MD(m_b, state_b).

    Returns (m_a_bytes, m_b_bytes, shared_digest).
    """
    max_msg = 2**(block_size*8)

    # Exhaustive search: try every block from state_a against every block from state_b.
    for m_a in range(max_msg):

        m_a_bytes = m_a.to_bytes(block_size, 'little')
        digest_a = cp.MD(m_a_bytes, state_a, block_size)

        for m_b in range(max_msg):

            m_b_bytes = m_b.to_bytes(block_size, 'little')
            digest_b = cp.MD(m_b_bytes, state_b, block_size)

            if digest_a == digest_b:
                return (m_a_bytes, m_b_bytes, digest_a)

    raise Exception('No valid collision found')
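
The exhaustive search above re-hashes every block from `state_b` for each candidate block from `state_a`. A minimal sketch of an alternative tabulates the `state_b` digests once and then looks them up. It is self-contained so it can run without the `cryptopals` helper module: `toy_compress` (truncated SHA-256) and `find_pair_collision` are hypothetical stand-ins, not the notebook's actual `cp.MD`.

```python
import hashlib

BLOCK_SIZE = 2  # bytes; a 16-bit toy hash, as in this walkthrough


def toy_compress(state: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one compression step: truncated SHA-256."""
    return hashlib.sha256(state + block).digest()[:BLOCK_SIZE]


def find_pair_collision(state_a: bytes, state_b: bytes):
    """Return (block_a, block_b, digest) where both states reach the same digest."""
    n_blocks = 2 ** (8 * BLOCK_SIZE)

    # Tabulate every digest reachable from state_b, keeping the first block seen.
    seen = {}
    for m in range(n_blocks):
        block = m.to_bytes(BLOCK_SIZE, 'little')
        seen.setdefault(toy_compress(state_b, block), block)

    # Scan blocks from state_a until one lands on a tabulated digest.
    for m in range(n_blocks):
        block = m.to_bytes(BLOCK_SIZE, 'little')
        digest = toy_compress(state_a, block)
        if digest in seen:
            return block, seen[digest], digest

    raise ValueError('no collision found')


block_a, block_b, digest = find_pair_collision(b'\x00\x00', b'\x01\x00')
```

The trade-off is memory for time: the table holds up to `2**16` entries here, which is negligible for a toy hash but is exactly what becomes infeasible at real digest sizes.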

In [22]:
# Construct the diamond structure as a list of dictionaries -- where the keys for each state 
# tell us the next message block that's needed to navigate to the root.
#
# Call it a "herd map".
# 
# Structure for each round is {current_digest: [next_block, next_digest]}
# 
# this makes it easy to traverse from a leaf node to the root.

current_state_list = starting_state_data
herd_map = []
 
for k_i in range(k):
    
    print(f"Processing k_i={k_i}")
    round_map = {}    
    next_state_list = []
    
    for ii in range(0, len(current_state_list), 2):
    
        [x, y, h] = find_pairwise_MD_collision(current_state_list[ii], current_state_list[ii+1], block_size)
        
        round_map[current_state_list[ii]] = [x, h]
        round_map[current_state_list[ii+1]] = [y, h]

        next_state_list.append(h)
        
    current_state_list = next_state_list
    herd_map.append(round_map)
    
final_state = current_state_list[0]  # the single root state of the diamond

Processing k_i=0
Processing k_i=1
Processing k_i=2
Processing k_i=3
Processing k_i=4
Processing k_i=5
Processing k_i=6


<div class="alert alert-block alert-info"> 

3. Repeat the process. Pair up the `2^(k-1)` states and generate collisions. Now you have `2^(k-2)` states.
4. Keep doing this until you have one state. This is your prediction.
5. Well, sort of. You need to commit to some length to encode in the padding. Make sure it's long enough to accommodate your actual message, this suffix, and a little bit of glue to join them up. Hash this padding block using the state from step 4 - THIS is your prediction.

</div>

<div class="alert alert-block alert-info"> 

What did you just build? It's basically a funnel mapping many initial states into a common final state. What's critical is we now have a big field of `2^k` states we can try to collide into, but the actual suffix will only be `k+1` blocks long.

The rest is trivial:

1. Wait for the end of the baseball season. (This may take some time.)
2. Write down the game results. Or, you know, anything else. I'm not too particular.
3. Generate enough glue blocks to get your message length right. The last block should collide into one of the leaves in your tree.
4. Follow the path from the leaf all the way up to the root node and build your suffix using the message blocks along the way.

The difficulty here will be around `2^(b-k)`. By increasing or decreasing `k` in the tree generation phase, you can tune the difficulty of this step. It probably makes sense to do more work up-front, since people will be waiting on you to supply your message once the event passes. Happy prognosticating!

</div>
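
To put a number on the `2^(b-k)` estimate for the parameters used in this notebook (`b = 16`, `k = 7`), here is a quick back-of-the-envelope calculation. It assumes the hash behaves like a uniform random function, which is an idealization rather than a property of any particular implementation.

```python
b = 16   # digest size in bits, as in this notebook
k = 7    # diamond depth, so there are 2**k leaves

p_hit = 2**k / 2**b            # chance that one random glue block lands on a leaf
expected_trials = 1 / p_hit    # mean of a geometric distribution, i.e. 2**(b - k)

# Chance that none of the 2**b single-block glue candidates hits a leaf,
# treating each candidate's digest as independent and uniform.
p_no_bridge = (1 - p_hit) ** (2**b)

print(expected_trials)        # 512.0
print(p_no_bridge < 1e-50)    # True
```

So a single glue block is almost always enough at this size, and each extra level in the diamond halves the expected number of glue candidates to try.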

In [23]:
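# Commit to a total length up front and encode it in the final (padding) block.
suffix = M_len.to_bytes(block_size, 'little')
# This is the hash we publish as our "prediction".
M_dummy_digest = cp.MD(suffix, final_state, block_size)
# Room left for the real message once k herd blocks, the padding block, and one
# glue (bridge) block are reserved.
M_dummy_len = M_len - (k*block_size + len(suffix)) - block_size

# Now let's construct my "prediction" after the event actually occurred, and
# long after I delivered the proof that I predicted it in the form of a hash

M_post_event = b"I told you I would be right."*8
# Zero-pad up to the reserved length. Caveat: this message is 224 bytes but
# M_dummy_len is only 166, so the multiplier is negative and nothing is added.
# The final message is then longer than the length encoded in `suffix`; the
# hashes still match, but a faithful forgery needs a message that fits.
M_post_event += b'\x00'*(M_dummy_len - len(M_post_event))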

initial_state = b'\x00'*block_size
last_state = cp.MD(M_post_event, initial_state, block_size)

max_msg = 2**(block_size*8)
leaf_states = set(starting_state_data)

# Search for a glue block that takes us from the message's final state to any leaf.
for bridge_val in range(max_msg):

    bridge_bytes = bridge_val.to_bytes(block_size, 'little')
    bridge_hash = cp.MD(bridge_bytes, last_state, block_size)

    if bridge_hash in leaf_states:
        break
else:
    raise Exception('No glue block reaches a leaf of the diamond')

post_event_prediction = M_post_event + bridge_bytes
# After the glue block we sit on a leaf; walk the herd map from there to the root.
current_state = cp.MD(post_event_prediction, initial_state, block_size)

for k_i in range(k):
    
    next_block = herd_map[k_i][current_state][0]
    current_state = herd_map[k_i][current_state][1]
    post_event_prediction += next_block

post_event_prediction += suffix
M_final_digest = cp.MD(post_event_prediction, initial_state, block_size)

assert (M_final_digest == M_dummy_digest)

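print('Boom!\n')
# Note: the hash printed under message 1 is the committed hash (diamond root plus
# padding block). M_dummy itself was never hashed; it only fixed the committed length.
print('The hashes of these two messages are equal:\n')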
print(f'1: {M_dummy}\n')
print(f'Hash: {M_dummy_digest.hex()}\n\n')
print(f'2: {post_event_prediction}\n')
print(f'Hash: {M_final_digest.hex()}')

Boom!

The hashes of these two messages are equal:

1: This is some arbitrary message that is long enough to hold my full prediction.  It is going to be pretty long.  I could keep typing things.  But we just want to illustrate the process.

Hash: ffc6


2: b'I told you I would be right.I told you I would be right.I told you I would be right.I told you I would be right.I told you I would be right.I told you I would be right.I told you I would be right.I told you I would be right.k\x03!\x1e\x00\x00\x05\x00\x00\x00\x00\x00\x02?\x00\x00\xb8\x00'

Hash: ffc6
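
One detail worth checking in the run above: the committed length is 184 bytes (the `\xb8\x00` padding block), but the forged message is 224 bytes of text plus 18 bytes of suffix. The sketch below shows one way to keep the length bookkeeping honest. The helper names `message_budget` and `pad_to_budget` are hypothetical, and the sketch follows this notebook's convention of counting the padding block inside the committed length.

```python
def message_budget(total_len: int, k: int, block_size: int) -> int:
    """Bytes available for the real message.

    The committed length must also cover one glue block, k herd blocks,
    and the padding block itself (this notebook's convention).
    """
    budget = total_len - block_size * (k + 2)
    if budget < 0:
        raise ValueError('committed length is too short to hold the suffix')
    return budget


def pad_to_budget(message: bytes, budget: int) -> bytes:
    """Zero-pad message to exactly budget bytes, refusing messages that overflow."""
    if len(message) > budget:
        raise ValueError(f'message is {len(message)} bytes but only {budget} fit')
    return message + b'\x00' * (budget - len(message))


budget = message_budget(total_len=184, k=7, block_size=2)
print(budget)        # 166

short = pad_to_budget(b"I told you I would be right." * 5, budget)
print(len(short))    # 166
```

With eight repetitions instead of five, `pad_to_budget` raises instead of silently skipping the padding, which is the failure mode to avoid when the hash enforces its length field.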


[Back to Index](CryptoPalsWalkthroughs_Cobb.ipynb)