Hashing for lists-of-lists and other things is semi-broken. #74

neongreen opened this Issue Nov 10, 2013 · 9 comments


None yet

5 participants


Currently structures containing lists are treated as if they were all concatenated: hash ["abc"] == hash ["a","bc"] == hash ("ab", "c") == hash "abc". That is, hashable completely ignores the structure. Is it deliberate?

I think that in some cituations it might be a serious flaw — for instance, storing all partitions of a list (they would all have one hash).

tibbe commented Nov 11, 2013

Good question. I would have to ponder this for a while. Do you know what other platform do e.g. Java or STL?

gseitz commented Nov 11, 2013

Java basically does this in java.util.AbstractList (with an additional null check per element of course):

foldl' (\acc x -> 31*acc + hash x)  1 xs

I admit that it’s the first time ever I code in Java, but for what it’s worth this piece of code

List a = Arrays.asList(1,2,3);
List b = Arrays.asList(Arrays.asList(1,2),Arrays.asList(3));
List c = Arrays.asList(Arrays.asList(1),Arrays.asList(2,3));
List d = Arrays.asList(Arrays.asList(),Arrays.asList(1,2,3));
List e = Arrays.asList(Arrays.asList(1),Arrays.asList(),Arrays.asList(2,3));

produces [30817, 31809, 2979, 31809, 61600]. That is, hash [[1,2],[3]] == hash [[],[1,2,3]] /= hash [[1],[2,3]].

tibbe commented Nov 11, 2013

Here's the guiding principled I've been using when thinking about collisions: we try to avoid collisions between two values of the same types (e.g. ["abc"] and ["a","bc"]). We don't not try to avoid collisions between two values of different types (e.g. ("ab", "c") and "abc"). The latter is not very useful for a library designed to support data types which store a value of a single type (e.g. HashMap Text (String, String)). It's also terribly difficult to avoid in practice.

We should see if we can do something better for lists and tuples though, perhaps based on what Java is doing.


I only included the tuple to further prove my point that structure of whatever kind is ignored; I’m not proposing here to address this problem too.

Now that I’ve said that... Avoiding collisions between two values of different types might be as easy as using hash of TypeRep as salt (if it introduces overhead, you could always make it a separate function).

tibbe commented Nov 11, 2013

@ArtyomKazak as you said, it slows things down. hashable is not really meant for general purpose hashing, just to support hashing based data types.


If anyone's interested I've released a rewrite of hashable called hashabler which concerns itself with this issue.

@phadej phadej referenced this issue Sep 17, 2015

List with length #103

phadej commented Jan 16, 2016

With on my machine:

 Data.Hashable > (hash ["abc"], hash ["a","bc"], hash ("ab", "c"), hash "abc")

This can be closed.

tibbe commented Jan 16, 2016

Thanks for verifying!

@tibbe tibbe closed this Jan 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment