Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miserable instances for integral types #71

Closed
hrdinka opened this issue Jul 13, 2013 · 2 comments
Closed

Miserable instances for integral types #71

hrdinka opened this issue Jul 13, 2013 · 2 comments

Comments

@hrdinka
Copy link

hrdinka commented Jul 13, 2013

As mentioned in Issue #30 Integral types do now hash to themselves. This leads to several problems.

Generally this behaviour would be okay if the combining function would take care of this. In the current implementation it is using a simple xor which has the following property:

xor 0 a = a
xor a 0 = a

Which makes the following hashes to be the same:
hash [] == hash [0,0,0,0] == hash [1,0,0] == hash [0,1,0] == hash [1]

As mentioned in the old issue, languages like Python also hash Ints to themselves. But they got the chaining right. In Python:
hash ((0)) = 0
hash ((0,0)) = 3713080549408328131

Further the usage of xor loads to miserable avalanche behavior for chains of Ints, For example:

hash [1,2] == hash 3
hash [1,2,3] == hash 0

This leads to a ton of hash collisions and all kind of funky errors.

This property naturally expands to generic instances which will affect a wide range of applications. For Example
data Salary = Salary { month :: Int, salary :: Int }
hash (Salary 1 2) == hash (Salary 0 3)
hash (Salary 0 1) == hash (Salard 1 0)

@bos
Copy link
Collaborator

bos commented Jul 24, 2013

The combining function was good when we had a good hash function for integers. Unfortunately, the hash function got reverted back to the identity function, but the combining function remained the same as it had been :-(

@tibbe
Copy link
Collaborator

tibbe commented Sep 13, 2013

The list instance is fixed by 4064b9f. The definition folds hashWithSalt with an initial non-zero seed so the numbers of elements in the list matters:

ghci> hash ([] :: [Int])
-2578643520546668380
ghci> hash ([0] :: [Int])
839657738087498284
ghci> hash ([0, 0] :: [Int])
-7234408896634655932

We don't really care that tuples of different size (e.g. (0, 0) and (0, 0, 0)) hash to the same value, as the only way you end up using both instances simultaneously is if you use an existential wrapper type, in which case you're already in deep waters.

Aside: 0 is only a left identity of the current combine implementation (which uses multiplication and xor), not a right identity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants