Currently structures containing lists are treated as if they were all concatenated: hash ["abc"] == hash ["a","bc"] == hash ("ab", "c") == hash "abc". That is, hashable completely ignores the structure. Is it deliberate?
hash ("ab", "c")
I think that in some cituations it might be a serious flaw — for instance, storing all partitions of a list (they would all have one hash).
Good question. I would have to ponder this for a while. Do you know what other platform do e.g. Java or STL?
Java basically does this in java.util.AbstractList (with an additional null check per element of course):
foldl' (\acc x -> 31*acc + hash x) 1 xs
I admit that it’s the first time ever I code in Java, but for what it’s worth this piece of code
List a = Arrays.asList(1,2,3);
List b = Arrays.asList(Arrays.asList(1,2),Arrays.asList(3));
List c = Arrays.asList(Arrays.asList(1),Arrays.asList(2,3));
List d = Arrays.asList(Arrays.asList(),Arrays.asList(1,2,3));
List e = Arrays.asList(Arrays.asList(1),Arrays.asList(),Arrays.asList(2,3));
produces [30817, 31809, 2979, 31809, 61600]. That is, hash [[1,2],] == hash [,[1,2,3]] /= hash [,[2,3]].
[30817, 31809, 2979, 31809, 61600]
hash [[1,2],] == hash [,[1,2,3]] /= hash [,[2,3]]
Here's the guiding principled I've been using when thinking about collisions: we try to avoid collisions between two values of the same types (e.g. ["abc"] and ["a","bc"]). We don't not try to avoid collisions between two values of different types (e.g. ("ab", "c") and "abc"). The latter is not very useful for a library designed to support data types which store a value of a single type (e.g. HashMap Text (String, String)). It's also terribly difficult to avoid in practice.
HashMap Text (String, String)
We should see if we can do something better for lists and tuples though, perhaps based on what Java is doing.
I only included the tuple to further prove my point that structure of whatever kind is ignored; I’m not proposing here to address this problem too.
Now that I’ve said that... Avoiding collisions between two values of different types might be as easy as using hash of TypeRep as salt (if it introduces overhead, you could always make it a separate function).
@ArtyomKazak as you said, it slows things down. hashable is not really meant for general purpose hashing, just to support hashing based data types.
If anyone's interested I've released a rewrite of hashable called hashabler which concerns itself with this issue.