Skip to content

hash(::Int, ::UInt) and hash_integer(::Int, ::UInt) no longer coincide #58386

@lgoettgens

Description

@lgoettgens
Contributor

In Nemo.jl, we introduce a type ZZRingElem to represent a mathematical exact integer of arbitrary size, which is backed by the C-library FLINT. Since we sometimes use these as indices of dictionaries etc., we want everything that fits into an Int64 to hash like that Int64.
This is achieved by querying FLINT if the value consists of just a single limb, and if yes, passing that single limb (as a Int64) together with the h::UInt to Base.hash_integer. This worked fine until #57509 got merged as since then, our nightly CI is splattered with KeyErrors (cf. oscar-system/Oscar.jl#4882).

While digging into this, I noticed that the 2-arg hash and Base.hash_integer used to coincide when the first argument is a Int64, but since #57509 they no longer do.
This can be tested by e.g. hash(17, UInt(0)) == Base.hash_integer(17, UInt(0)).

From the code and previous discussions on github, I wasn't able to conclude if they should coincide (aka #57509 broke something) or if our use of Base.hash_integer assumes too much that just happened to hold by accident.

cc @benlorenz @fingolfin @thofma
ping @adienes @oscardssmith @JeffBezanson

Activity

adienes

adienes commented on May 12, 2025

@adienes
Member

I wasn't able to conclude if [Andy broke something] or if our use of Base.hash_integer assumes too much

perhaps a little of both. I am not seeing that it is, pedantically speaking, a documented requirement that hash_integer coincides with hash. but this is probably just due to the lack of documentation rather than an intentional freedom for them to diverge.

in this case, I think changing

function hash_integer(n::Integer, h::UInt)
    h ⊻= hash_uint((n % UInt) ⊻ h)

that first line to h = hash((n % UInt), h) will recover the coincidence. I can submit a pr shortly

FYI (maybe better as a separate issue) if you're interested, there may be potential to use the new hash_bytes to hash really huge numbers for a big performance win, if that's ever a bottleneck for you. I once worked on a project where each number was Gb large where such an improvement might have been relevant.

lgoettgens

lgoettgens commented on May 12, 2025

@lgoettgens
ContributorAuthor

in this case, I think changing

function hash_integer(n::Integer, h::UInt)
    h ⊻= hash_uint((n % UInt) ⊻ h)

that first line to h = hash((n % UInt), h) will recover the coincidence.

seems to work locally. I also have a local fix that adapts the hash_integer special case for BigInt accordingly. I'll open a PR with both changes in a few

lgoettgens

lgoettgens commented on May 22, 2025

@lgoettgens
ContributorAuthor

Triage decided that this is not an issue as hash_integer is internal. See #58387 (comment) for the triage comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    hashingregressionRegression in behavior compared to a previous version

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @lgoettgens@adienes

      Issue actions

        `hash(::Int, ::UInt)` and `hash_integer(::Int, ::UInt)` no longer coincide · Issue #58386 · JuliaLang/julia