New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing for lists-of-lists and other things is semi-broken. #74

Closed
neongreen opened this Issue Nov 10, 2013 · 9 comments

Comments

Projects
None yet
5 participants
@neongreen

neongreen commented Nov 10, 2013

Currently structures containing lists are treated as if they were all concatenated: hash ["abc"] == hash ["a","bc"] == hash ("ab", "c") == hash "abc". That is, hashable completely ignores the structure. Is it deliberate?

I think that in some cituations it might be a serious flaw — for instance, storing all partitions of a list (they would all have one hash).

@tibbe

This comment has been minimized.

Show comment
Hide comment
@tibbe

tibbe Nov 11, 2013

Owner

Good question. I would have to ponder this for a while. Do you know what other platform do e.g. Java or STL?

Owner

tibbe commented Nov 11, 2013

Good question. I would have to ponder this for a while. Do you know what other platform do e.g. Java or STL?

@gseitz

This comment has been minimized.

Show comment
Hide comment
@gseitz

gseitz Nov 11, 2013

Java basically does this in java.util.AbstractList (with an additional null check per element of course):

foldl' (\acc x -> 31*acc + hash x)  1 xs

gseitz commented Nov 11, 2013

Java basically does this in java.util.AbstractList (with an additional null check per element of course):

foldl' (\acc x -> 31*acc + hash x)  1 xs
@neongreen

This comment has been minimized.

Show comment
Hide comment
@neongreen

neongreen Nov 11, 2013

I admit that it’s the first time ever I code in Java, but for what it’s worth this piece of code

List a = Arrays.asList(1,2,3);
List b = Arrays.asList(Arrays.asList(1,2),Arrays.asList(3));
List c = Arrays.asList(Arrays.asList(1),Arrays.asList(2,3));
List d = Arrays.asList(Arrays.asList(),Arrays.asList(1,2,3));
List e = Arrays.asList(Arrays.asList(1),Arrays.asList(),Arrays.asList(2,3));
System.out.print(Arrays.asList(a.hashCode(),b.hashCode(),c.hashCode(),d.hashCode(),e.hashCode()));

produces [30817, 31809, 2979, 31809, 61600]. That is, hash [[1,2],[3]] == hash [[],[1,2,3]] /= hash [[1],[2,3]].

neongreen commented Nov 11, 2013

I admit that it’s the first time ever I code in Java, but for what it’s worth this piece of code

List a = Arrays.asList(1,2,3);
List b = Arrays.asList(Arrays.asList(1,2),Arrays.asList(3));
List c = Arrays.asList(Arrays.asList(1),Arrays.asList(2,3));
List d = Arrays.asList(Arrays.asList(),Arrays.asList(1,2,3));
List e = Arrays.asList(Arrays.asList(1),Arrays.asList(),Arrays.asList(2,3));
System.out.print(Arrays.asList(a.hashCode(),b.hashCode(),c.hashCode(),d.hashCode(),e.hashCode()));

produces [30817, 31809, 2979, 31809, 61600]. That is, hash [[1,2],[3]] == hash [[],[1,2,3]] /= hash [[1],[2,3]].

@tibbe

This comment has been minimized.

Show comment
Hide comment
@tibbe

tibbe Nov 11, 2013

Owner

Here's the guiding principled I've been using when thinking about collisions: we try to avoid collisions between two values of the same types (e.g. ["abc"] and ["a","bc"]). We don't not try to avoid collisions between two values of different types (e.g. ("ab", "c") and "abc"). The latter is not very useful for a library designed to support data types which store a value of a single type (e.g. HashMap Text (String, String)). It's also terribly difficult to avoid in practice.

We should see if we can do something better for lists and tuples though, perhaps based on what Java is doing.

Owner

tibbe commented Nov 11, 2013

Here's the guiding principled I've been using when thinking about collisions: we try to avoid collisions between two values of the same types (e.g. ["abc"] and ["a","bc"]). We don't not try to avoid collisions between two values of different types (e.g. ("ab", "c") and "abc"). The latter is not very useful for a library designed to support data types which store a value of a single type (e.g. HashMap Text (String, String)). It's also terribly difficult to avoid in practice.

We should see if we can do something better for lists and tuples though, perhaps based on what Java is doing.

@neongreen

This comment has been minimized.

Show comment
Hide comment
@neongreen

neongreen Nov 11, 2013

I only included the tuple to further prove my point that structure of whatever kind is ignored; I’m not proposing here to address this problem too.

Now that I’ve said that... Avoiding collisions between two values of different types might be as easy as using hash of TypeRep as salt (if it introduces overhead, you could always make it a separate function).

neongreen commented Nov 11, 2013

I only included the tuple to further prove my point that structure of whatever kind is ignored; I’m not proposing here to address this problem too.

Now that I’ve said that... Avoiding collisions between two values of different types might be as easy as using hash of TypeRep as salt (if it introduces overhead, you could always make it a separate function).

@tibbe

This comment has been minimized.

Show comment
Hide comment
@tibbe

tibbe Nov 11, 2013

Owner

@ArtyomKazak as you said, it slows things down. hashable is not really meant for general purpose hashing, just to support hashing based data types.

Owner

tibbe commented Nov 11, 2013

@ArtyomKazak as you said, it slows things down. hashable is not really meant for general purpose hashing, just to support hashing based data types.

@jberryman

This comment has been minimized.

Show comment
Hide comment
@jberryman

jberryman Aug 17, 2015

If anyone's interested I've released a rewrite of hashable called hashabler which concerns itself with this issue.

jberryman commented Aug 17, 2015

If anyone's interested I've released a rewrite of hashable called hashabler which concerns itself with this issue.

@phadej phadej referenced this issue Sep 17, 2015

Merged

List with length #103

@phadej

This comment has been minimized.

Show comment
Hide comment
@phadej

phadej Jan 16, 2016

Contributor

With 1.2.4.0 on my machine:

 Data.Hashable > (hash ["abc"], hash ["a","bc"], hash ("ab", "c"), hash "abc")
(770288776435754308,-3254706873997758045,779577897349174717,-2835585523591699257)

This can be closed.

Contributor

phadej commented Jan 16, 2016

With 1.2.4.0 on my machine:

 Data.Hashable > (hash ["abc"], hash ["a","bc"], hash ("ab", "c"), hash "abc")
(770288776435754308,-3254706873997758045,779577897349174717,-2835585523591699257)

This can be closed.

@tibbe

This comment has been minimized.

Show comment
Hide comment
@tibbe

tibbe Jan 16, 2016

Owner

Thanks for verifying!

Owner

tibbe commented Jan 16, 2016

Thanks for verifying!

@tibbe tibbe closed this Jan 16, 2016

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment