Model File Lookup by SHA256 Hash #3069

Open
@cgf120

Problem Statement
Currently, there is no way to locate a model file on Hugging Face from its SHA256 hash alone. When managing many models locally, this makes it hard to tell whether a specific file has already been downloaded under a different filename.
Use Case
As a user who frequently works with multiple AI models:

I often have models stored locally with varying filenames
When I need to download a model, I want to verify if I already have it locally (even under a different name)
With only the SHA256 hash of a model file, I currently cannot easily find its corresponding Hugging Face URL
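The local half of this workflow can already be done with the standard library. The following is a minimal sketch, not part of `huggingface_hub`: the helper names `sha256_of_file` and `index_by_sha256` are made up for illustration. It hashes every file under a directory and groups paths by digest, so identical files stored under different names end up in the same bucket.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so multi-gigabyte model files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_sha256(root: Path) -> dict[str, list[Path]]:
    """Map each SHA256 digest to every regular file under `root` with that content."""
    index: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index[sha256_of_file(path)].append(path)
    return dict(index)
```

With such an index, `index.get(target_hash, [])` answers "do I already have this file locally?" regardless of filename. What is still missing is the reverse direction: going from the hash to a Hugging Face URL.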

Example
If I have a file with SHA256 hash d99e39955c9d3d0350d8fb7c75e40c64a2b2eaeb003883d7c941fd2e8747b28c, I should be able to query the API and discover it corresponds to https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2/blob/main/parakeet-tdt-0.6b-v2.nemo.
Proposed Solution
Add a new method to the HfApi class that allows lookup by SHA256, such as:
def get_file_by_sha256(sha256_hash: str) -> List[FileInfo]:
    """
    Returns information about files matching the provided SHA256 hash.

    Args:
        sha256_hash: The SHA256 hash to look up

    Returns:
        List of FileInfo objects containing repo_id, path, and other metadata
    """
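Until such a method exists, a partial workaround is possible when the candidate repo is already known. The sketch below makes several assumptions: `FileInfo` is a hypothetical dataclass, not an existing `huggingface_hub` type, and `match_entries` and `find_in_repo` are illustrative names. It relies on `HfApi.list_repo_tree`, whose file entries carry an `lfs` field with a `sha256` value for LFS-tracked files; as far as I know, files not stored via LFS expose no SHA256 in that listing, so they would be missed.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical return type; not an existing huggingface_hub class."""
    repo_id: str
    path: str
    sha256: str
    size: int


def _field(obj: Any, name: str) -> Any:
    """Read `name` from a mapping or from an attribute-style object."""
    return obj[name] if isinstance(obj, dict) else getattr(obj, name)


def match_entries(repo_id: str, entries: Iterable[Any], sha256_hash: str) -> List[FileInfo]:
    """Return the entries whose LFS SHA256 equals `sha256_hash` (case-insensitive)."""
    wanted = sha256_hash.strip().lower()
    matches = []
    for entry in entries:
        lfs = getattr(entry, "lfs", None)  # folders and non-LFS files carry no LFS info
        if lfs is None:
            continue
        if _field(lfs, "sha256").lower() == wanted:
            matches.append(FileInfo(repo_id, entry.path, wanted, _field(lfs, "size")))
    return matches


def find_in_repo(repo_id: str, sha256_hash: str) -> List[FileInfo]:
    """Scan one known repo; needs network access and the huggingface_hub package."""
    from huggingface_hub import HfApi  # lazy import keeps match_entries dependency-free

    entries = HfApi().list_repo_tree(repo_id, recursive=True)
    return match_entries(repo_id, entries, sha256_hash)
```

This only narrows the problem: it costs one listing request per repo and cannot discover which repo holds the file, which is exactly the gap a platform-wide lookup would close.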

Benefits
This feature would:

Save storage space by avoiding duplicate downloads
Improve model management workflows
Enhance model traceability across the platform
Support verification of model file integrity
Streamline sharing by allowing reference via hash rather than full paths

Implementation Notes
This could be implemented by:

Creating a SHA256-to-file mapping in the backend
Enhancing the existing search functionality to handle SHA256 queries
Adding the corresponding API endpoints
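To make the first point concrete, here is a toy in-memory version of a SHA256-to-file mapping. It is only an illustration of the data structure, not a claim about how the Hub backend works; `Sha256Index` and `normalize_sha256` are hypothetical names. One hash maps to a list of locations because the same file can appear in several repos.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Tuple

_SHA256_RE = re.compile(r"[0-9a-f]{64}")


def normalize_sha256(value: str) -> str:
    """Lowercase and validate a hex SHA256 digest, raising ValueError if malformed."""
    digest = value.strip().lower()
    if not _SHA256_RE.fullmatch(digest):
        raise ValueError(f"not a SHA256 hex digest: {value!r}")
    return digest


class Sha256Index:
    """In-memory stand-in for a backend SHA256-to-file mapping."""

    def __init__(self) -> None:
        self._files: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, sha256_hash: str, repo_id: str, path: str) -> None:
        """Record that `path` in `repo_id` has the given content hash."""
        location = (repo_id, path)
        bucket = self._files[normalize_sha256(sha256_hash)]
        if location not in bucket:  # skip duplicates so re-indexing is idempotent
            bucket.append(location)

    def lookup(self, sha256_hash: str) -> List[Tuple[str, str]]:
        """Return every (repo_id, path) known for the hash, or an empty list."""
        return list(self._files.get(normalize_sha256(sha256_hash), []))
```

Validating the digest up front lets an endpoint reject malformed input before touching storage, and normalizing to lowercase avoids misses caused by case differences between tools.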

I'm happy to provide additional information or discuss this feature request further if needed.
Environment

HF Hub API version: [version you're using]
Python version: [your Python version]
