|
| 1 | +# Hashing |
| 2 | + |
| 3 | +## Hash Method Generation |
| 4 | + |
| 5 | +:::{warning} |
| 6 | +The overarching theme is to never set the `@attrs.define(unsafe_hash=X)` parameter yourself. |
| 7 | +Leave it at `None`, which means that *attrs* will do the right thing for you, depending on the other parameters: |
| 8 | + |
| 9 | +- If you want to make objects hashable by value: use `@define(frozen=True)`. |
| 10 | +- If you want hashing and equality by object identity: use `@define(eq=False)`. |
| 11 | + |
| 12 | +Setting `unsafe_hash` yourself can have unexpected consequences, so we recommend tinkering with it only if you know exactly what you're doing. |
| 13 | +::: |
| 14 | + |
| 15 | +Under certain circumstances, it's necessary for objects to be *hashable*. |
| 16 | +For example, you need this if you want to put them into a {class}`set` or use them as keys in a {class}`dict`. |
| 17 | + |
| 18 | +The *hash* of an object is an integer that represents the contents of an object. |
| 19 | +It can be obtained by calling `hash` on an object and is implemented by writing a `__hash__` method for your class. |
| 20 | + |
| 21 | +*attrs* will happily write a `__hash__` method for you [^fn1]; however, it will *not* do so by default. |
| 22 | +That's because, according to the [definition](https://docs.python.org/3/glossary.html#term-hashable) from the official Python docs, the returned hash has to fulfill certain constraints: |
| 23 | + |
| 24 | +[^fn1]: The hash is computed by hashing a tuple that consists of a unique id for the class plus all attribute values. |
| 25 | + |
| 26 | +1. Two objects that are equal **must** have the same hash. |
| 27 | + This means that if `x == y`, it *must* follow that `hash(x) == hash(y)`. |
| 28 | + |
| 29 | + By default, instances of Python classes are compared *and* hashed by their `id`. |
| 30 | + That means that every instance of a class has a different hash, no matter what attributes it carries. |
| 31 | + |
| 32 | + It follows that the moment you (or *attrs*) change the way equality is handled by implementing an `__eq__` that is based on attribute values, this constraint is broken. |
| 33 | + For that reason, Python 3 will make a class that has customized equality unhashable. |
| 34 | + Python 2, on the other hand, will happily let you shoot your foot off. |
| 35 | + Unfortunately, *attrs* still mimics the behavior of (otherwise unsupported) Python 2 for backward-compatibility reasons if you set `unsafe_hash=False`. |
| 36 | + |
| 37 | + The *correct way* to achieve hashing by id is to set `@define(eq=False)`. |
| 38 | + Setting `@define(unsafe_hash=False)` (which implies `eq=True`) is almost certainly a *bug*. |
| 39 | + |
| 40 | + :::{warning} |
| 41 | + Be careful when subclassing! |
| 42 | + Setting `eq=False` on a class whose base class has a non-default `__hash__` method will *not* make *attrs* remove that `__hash__` for you. |
| 43 | + |
| 44 | + It is part of *attrs*'s philosophy to only *add* to classes so you have the freedom to customize your classes as you wish. |
| 45 | + So if you want to *get rid* of methods, you'll have to do it by hand. |
| 46 | + |
| 47 | + The easiest way to reset `__hash__` on a class is adding `__hash__ = object.__hash__` in the class body. |
| 48 | + ::: |
| 49 | + |
| 50 | +2. If two objects are not equal, their hash **should** be different. |
| 51 | + |
| 52 | + While this isn't a requirement from the standpoint of correctness, sets and dicts become less efficient if there are a lot of identical hashes. |
| 53 | + The worst case is when all objects have the same hash, which degrades every lookup to a linear scan and effectively turns a set into a list. |
| 54 | + |
| 55 | +3. The hash of an object **must not** change. |
| 56 | + |
| 57 | + If you create a class with `@define(frozen=True)`, this is fulfilled by definition; therefore, *attrs* will write a `__hash__` method for you automatically. |
| 58 | + You can also force it to write one with `unsafe_hash=True`, but then it's *your* responsibility to make sure that the object is not mutated. |
| 59 | + |
| 60 | + This point is the reason why mutable structures like lists, dictionaries, or sets aren't hashable while immutable ones like tuples or `frozenset`s are: |
| 61 | + points 1 and 2 require that the hash changes with the contents, but point 3 forbids it. |
| 62 | + |
| 63 | +For a more thorough explanation of this topic, please refer to this blog post: [*Python Hashes and Equality*](https://hynek.me/articles/hashes-and-equality/). |
| 64 | + |
| 65 | + |
| 66 | +## Hashing and Mutability |
| 67 | + |
| 68 | +Changing any field involved in hash code computation after the first call to `__hash__` (typically this would be after its insertion into a hash-based collection) can result in silent bugs. |
| 69 | +Therefore, it is strongly recommended that hashable classes be `frozen`. |
| 70 | +Beware, however, that this is not a complete guarantee of safety: |
| 71 | +if a field points to an object and that object is mutated, the hash code may change, but `frozen` will not protect you. |
| 72 | + |
| 73 | + |
| 74 | +## Hash Code Caching |
| 75 | + |
| 76 | +Some objects have hash codes which are expensive to compute. |
| 77 | +If such objects are to be stored in hash-based collections, it can be useful to compute the hash codes only once and then store the result on the object to make future hash code requests fast. |
| 78 | +To enable caching of hash codes, pass `@define(cache_hash=True)`. |
| 79 | +This may only be done if *attrs* is already generating a hash function for the object. |