Odds of Hash Collision for Custom Type Keyword ID's are quite High #126
Comments
Hi Alexander, thanks for pinging! This is an intentional tradeoff since:
Is your concern theoretical, or have you actually run into a problem on your end?
Would be happy to look at a PR if you have something in mind 👍
I'm not sure I follow? Collisions will already print a warning message, including the specific id. Thanks!
You are right, as I already said, for most use cases it is ok.
Yes, I am exploring the use of some 50 to 100 records instead of plain maps in my application, and I plan to serialize them with nippy.
Something like this:
I would include both the old and the new id.
Docstring will be updated in next release, thanks Alexander!
You use 16-bit hashes for custom type keyword IDs. The equation to calculate the probability of a hash collision is:
or simpler, but only valid for small probabilities:
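For reference, a sketch of the textbook birthday-problem expressions, under the assumption that $n$ keyword IDs are hashed uniformly and independently into $N = 2^{16} = 65536$ values. The symbols $n$, $N$ and $p$ are introduced here for illustration only:

```latex
% Exact probability of at least one collision among n hashes in a space of size N
p(n) = 1 - \prod_{k=0}^{n-1} \left(1 - \frac{k}{N}\right)

% Approximation, valid only while p(n) is small
p(n) \approx 1 - e^{-n(n-1)/(2N)} \approx \frac{n^2}{2N}
```

With $N = 65536$ the approximation gives $128^2 / 131072 = 0.125$ for 128 types and $12^2 / 131072 \approx 0.0011$ for 12 types.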
So assume one would like to use at least 128 custom types, as would be possible with numeric IDs: the odds of a hash collision would then be roughly 1 in 10 (about 12%). In other words, in about one case in ten where I use roughly 128 random keywords to identify my custom types, I will get a hash collision.
For a small number of custom types, say 12, the odds are about 1 in 1000. So I assume the choice of a 16-bit hash is ok for most use cases. But if one plans to use a large number of custom types, collisions will be a problem.
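The figures above can be checked numerically. The following is a minimal Python sketch, assuming the keyword hashes behave like uniformly random 16-bit values; the function names are hypothetical and not part of nippy:

```python
def collision_probability(n: int, bits: int = 16) -> float:
    """Exact probability that at least two of n uniform hashes collide."""
    space = 2 ** bits
    if n > space:
        return 1.0  # pigeonhole: more keys than hash values
    p_unique = 1.0
    for k in range(n):
        # The (k+1)-th hash must avoid the k values already taken.
        p_unique *= (space - k) / space
    return 1.0 - p_unique


def collision_probability_approx(n: int, bits: int = 16) -> float:
    """Small-probability approximation n^2 / (2N)."""
    return n * n / (2 * 2 ** bits)


if __name__ == "__main__":
    for n in (12, 50, 100, 128):
        exact = collision_probability(n)
        approx = collision_probability_approx(n)
        print(f"{n:4d} types: exact {exact:.4f}, approx {approx:.4f}")
```

The approximation overshoots slightly as the probability grows, which is why it should only be trusted for small numbers of types.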
I would suggest adding a hint to the documentation that the hashing scheme is only safe for a small number of custom types. Furthermore, I would include the colliding keywords explicitly in the warning inside `extend-thaw`.
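To illustrate the kind of warning I have in mind, here is a rough Python sketch. It is not nippy's implementation: the `KeywordIdRegistry` class, its methods, and the use of CRC32 truncated to 16 bits are all assumptions made for the example.

```python
import warnings
import zlib


class KeywordIdRegistry:
    """Hypothetical registry mapping keyword names to truncated hash IDs."""

    def __init__(self, bits: int = 16) -> None:
        self._mask = (1 << bits) - 1
        self._by_id: dict[int, str] = {}

    def hash_id(self, keyword: str) -> int:
        # Assumption: CRC32 truncated to `bits` bits stands in for the real hash.
        return zlib.crc32(keyword.encode("utf-8")) & self._mask

    def register(self, keyword: str) -> int:
        type_id = self.hash_id(keyword)
        existing = self._by_id.get(type_id)
        if existing is not None and existing != keyword:
            # Name both keywords, not just the numeric id, so the user
            # can see at once which two types clash.
            warnings.warn(
                f"hash collision on id {type_id}: "
                f"{keyword!r} collides with already registered {existing!r}"
            )
        else:
            self._by_id[type_id] = keyword
        return type_id
```

Registering the same keyword twice stays silent; only two different keywords that land on the same id trigger the warning, and the first registration is kept.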