Attacking aHash or why it's good enough for a hashmap
Let's say we want to generate a collision
If we have a single item say a u64 or a u32 or a < 16 byte string we're imminently in trouble. aHash works by taking:
aes( aes( aes(block1) ^ block2) ^ block3 ) ^ block4 etc and then finally doing
aes(aes(encodedData)) to get the final hash. So if we're limited to one block, then the process amounts to
aes(aes(someValue ^ data)) which is fully reversible, so there is no opportunity to induce a full collision.
What's more two rounds of AES will fully defuse the input, leaving the output totally unpredictable, ruling out even a partial collision.
Using more than one block
Let's say we have two inputs that are two blocks each. Can we get them to produce the same hash code? Well we can't for the same reason as above if we only edit the final block. So we have to edit both blocks in such a way that the changes in the second happen to cancel out the difference resulting from the first block. It will have to cancel out ever bit of difference, though otherwise the final two rounds will defuse any remaining difference over the entire output.
Is this possible? Sortof. AES after a single round does not fully defuse the output. Instead for any bit in the input a whole 32 bit word of the output will be changed, but not the other 92 bits.
So it is possible to simply brute force it and generate 2^32 inputs for the second block. Then we could send 2^32 + 1 input values that are 32 bytes each or 128GB and guarantee that one of those values would have the same hash code as the first one. Unfortunately, sending 128GB over the network is a lot more expensive than the 1 extra CPU instruction that can be induced.
Can this be improved upon?
Some. It's possible to send all 2^32 inputs that could influence a given word in the first block and all possible 2^32 values for the second block. Meaning that it would be possible to construct a 256GB message that could be sent over the wire that would induce 2^32 pairs of collisions if they all went into the same map.
But pairs of collisions are uninteresting. To produce a DOS attack we need to create hundreds or thousands of values that all will hash to the same key. That way the hashmap lookup becomes O(N) not O(2).
Is generating this kind of attack possible?
Almost certainly not. In order to make a large number of collisions we would need to be able to guess with high probability which of the 2^32 values will cancel the change in the input from the previous block. If that could be done it would be possible to produce 'chains' where the one pair or the other of colliding blocks are used over and over. Then it would be possible to produce 2^(L/2) collisions of length L blocks.
Previous hashes such as MurmurHash2 and FxHash are vulnerable to this sort of scheme, because one bit in the input (the high order bit) only affected itself. So to produce a collision you only had to work with that one bit. With 32 bits its much harder.
Implications if a attack did exist
If it were possible to predict which bits need to be modified in the second block to induce a collision, this would mean that it would be possible for an attacker that knows at least 32 bits of a plain text, to predict the corresponding 32 bits of the cipher text following an AES round without knowing the key. To even do this probabilisticly would be a HUGE problem for AES.
Limitations on severity of an attack.
Supposing someone came up with a better than random guessing version of this attack. How many values could they create that collided with the same key?
Suppose for example that instead of 1/2^32 probability of canceling out the change from the previous block someone found a way to reduce this to 1/4. This would probably be a big deal in terms of attacking AES and it would certainly make the news. But what would it mean for aHash?
This means that they could produce 2 32byte inputs that have a 1/4 change of both colliding, or 4 64byte inputs that have a 1/16 chance of all colliding, etc. (Assuming the 1/4th probability is correlated, which is generous)
Even with such an attack to produce 1024 collisions (the point where it just barely would start to impact latency) the odds of the attack working are less than 1 in a million, and it would require transmitting 16MB of data. (The network bandwidth of which is a much bigger attack than the resulting minor CPU usage that would occur assuming it worked)
So even with a very strong attack on AES, it would lead to no practical DOS attacks on aHash based HashMaps.