Hashing for lists-of-lists and other things is semi-broken. #74

Open
neongreen opened this Issue · 6 comments

@neongreen

Currently structures containing lists are treated as if they were all concatenated: hash ["abc"] == hash ["a","bc"] == hash ("ab", "c") == hash "abc". That is, hashable completely ignores the structure. Is it deliberate?

I think that in some situations it might be a serious flaw: for instance, when storing all partitions of a list (they would all have the same hash).
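To make the collision concrete, here is a minimal sketch of a hash that only threads a salt through the elements of a container. The class `ToyHash`, its method names, and the constants are hypothetical and chosen for illustration; this is not hashable's actual code, only an assumption about the general shape that produces the behaviour described above.

```haskell
import Data.Bits (xor)
import Data.Char (ord)
import Data.List (foldl')

-- Hypothetical toy class in the salt-threading style; NOT hashable's real code.
class ToyHash a where
  toyHashWithSalt :: Int -> a -> Int

-- Mix one character into the running salt (constants are arbitrary).
instance ToyHash Char where
  toyHashWithSalt salt c = (salt * 16777619) `xor` ord c

-- The list instance only passes the salt from element to element,
-- so list boundaries leave no trace in the result.
instance ToyHash a => ToyHash [a] where
  toyHashWithSalt = foldl' toyHashWithSalt

-- Same idea for pairs: hash the first component, then the second.
instance (ToyHash a, ToyHash b) => ToyHash (a, b) where
  toyHashWithSalt salt (x, y) = salt `toyHashWithSalt` x `toyHashWithSalt` y

toyHash :: ToyHash a => a -> Int
toyHash = toyHashWithSalt 2166136261

main :: IO ()
main = do
  print (toyHash ["abc"] == toyHash ["a", "bc"])  -- True
  print (toyHash ("ab", "c") == toyHash "abc")    -- True
  print (toyHash ["abc"] == toyHash "abc")        -- True
```

In this sketch every value is reduced to the same chain of character updates, which is why all partitions of a list end up with one hash.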

@tibbe
Owner

Good question. I would have to ponder this for a while. Do you know what other platforms do, e.g. Java or the STL?

@gseitz

Java basically does this in java.util.AbstractList (with an additional null check per element of course):

foldl' (\acc x -> 31*acc + hash x)  1 xs

@neongreen

I admit this is the first time I’ve ever written Java, but for what it’s worth, this piece of code

List a = Arrays.asList(1,2,3);
List b = Arrays.asList(Arrays.asList(1,2),Arrays.asList(3));
List c = Arrays.asList(Arrays.asList(1),Arrays.asList(2,3));
List d = Arrays.asList(Arrays.asList(),Arrays.asList(1,2,3));
List e = Arrays.asList(Arrays.asList(1),Arrays.asList(),Arrays.asList(2,3));
System.out.print(Arrays.asList(a.hashCode(),b.hashCode(),c.hashCode(),d.hashCode(),e.hashCode()));

produces [30817, 31809, 2979, 31809, 61600]. That is, hash [[1,2],[3]] == hash [[],[1,2,3]] /= hash [[1],[2,3]].
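The same numbers can be reproduced in Haskell with the fold quoted earlier in the thread. This is a sketch under assumptions: the class `JHash` is a hypothetical stand-in for Java's `hashCode()`, and it uses Haskell's `Int` rather than Java's 32-bit `int`, which makes no difference for values this small.

```haskell
import Data.List (foldl')

-- Hypothetical class standing in for Java's hashCode(); not part of hashable.
class JHash a where
  jhash :: a -> Int

-- Java's Integer.hashCode() is the integer value itself.
instance JHash Int where
  jhash = id

-- The fold from the thread: start at 1, multiply by 31, add the element hash.
instance JHash a => JHash [a] where
  jhash = foldl' (\acc x -> 31 * acc + jhash x) 1

main :: IO ()
main = do
  let a = [1, 2, 3]            :: [Int]
      b = [[1, 2], [3]]        :: [[Int]]
      c = [[1], [2, 3]]        :: [[Int]]
      d = [[], [1, 2, 3]]      :: [[Int]]
      e = [[1], [], [2, 3]]    :: [[Int]]
  print [jhash a, jhash b, jhash c, jhash d, jhash e]
  -- [30817,31809,2979,31809,61600]
```

The collision between `b` and `d` follows from the arithmetic: for a two-element list the fold gives 961 + 31 * (hash of first) + (hash of second). For `b` that is 961 + 31 * 994 + 34, and for `d` it is 961 + 31 * 1 + 30817; both inner sums equal 30848 because the hash of `[1,2,3]` is 31 * 994 + 3, the hash of `[3]` is 31 + 3, and the hash of the empty list is 1.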

@tibbe
Owner

Here's the guiding principle I've been using when thinking about collisions: we try to avoid collisions between two values of the same type (e.g. ["abc"] and ["a","bc"]). We don't try to avoid collisions between two values of different types (e.g. ("ab", "c") and "abc"). The latter is not very useful for a library designed to support data types which store values of a single type (e.g. HashMap Text (String, String)). It's also terribly difficult to avoid in practice.

We should see if we can do something better for lists and tuples though, perhaps based on what Java is doing.

@neongreen

I only included the tuple to further prove my point that structure of whatever kind is ignored; I’m not proposing here to address this problem too.

Now that I’ve said that... Avoiding collisions between two values of different types might be as easy as using the hash of the TypeRep as the salt (if it introduces overhead, you could always make it a separate function).
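A minimal sketch of that idea, assuming toy structure-blind hashers: `hashString`, `hashPair`, and `hashWithTypeSalt` are hypothetical names introduced here and are not part of hashable. Only `Typeable` and `typeOf` come from base.

```haskell
import Data.Char (ord)
import Data.List (foldl')
import Data.Typeable (Typeable, typeOf)

-- Toy structure-blind hashers that thread a salt; not hashable's real code.
hashString :: Int -> String -> Int
hashString = foldl' (\acc c -> 31 * acc + ord c)

hashPair :: Int -> (String, String) -> Int
hashPair salt (x, y) = hashString (hashString salt x) y

-- Hypothetical wrapper: replace the fixed salt with one derived from the type.
-- Rendering the TypeRep with show on every call is extra work per hash.
hashWithTypeSalt :: Typeable a => (Int -> a -> Int) -> a -> Int
hashWithTypeSalt hasher x = hasher typeSalt x
  where
    typeSalt = hashString 17 (show (typeOf x))

main :: IO ()
main = do
  -- With a shared fixed salt, the pair and the string collide.
  print (hashPair 17 ("ab", "c") == hashString 17 "abc")  -- True
  -- With type-derived salts the two hashes start from different states.
  print (hashWithTypeSalt hashPair ("ab", "c") == hashWithTypeSalt hashString "abc")
```

The second comparison should come out False unless the two type salts themselves happen to collide. Keeping the wrapper as a separate function, as suggested, means callers who do not need cross-type separation do not pay for the `typeOf` lookup.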

@tibbe
Owner

@ArtyomKazak as you said, it slows things down. hashable is not really meant for general-purpose hashing, just to support hashing-based data types.
