Add a non-duplicating way of cloning models/datasets via LFS

With the vanilla instructions on https://huggingface.co/deepseek-ai/DeepSeek-R1?clone=true,

![Image](https://github.com/user-attachments/assets/a7734258-8b39-4847-8b8c-6b88815d5e4a)

The downloaded 600 GB ends up stored twice on disk: once in the working tree and once in `.git/lfs/objects`. Yes, this is required for a well-formed repo, but we can't contribute to the DeepSeek repo anyway, so a slightly broken repo is acceptable if it avoids duplicating hundreds of gigabytes on disk.

I discussed it with LFS folks here: https://github.com/git-lfs/git-lfs/discussions/6029

For now, here is my workaround example using hardlinks:

```bash
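# Clone without downloading LFS content (files are left as small pointer files).
GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/deepseek-ai/DeepSeek-R1
cd DeepSeek-R1
# Download the LFS objects into .git/lfs/objects only.
git lfs fetch
# Replace each pointer file with a hardlink to its LFS object.
git lfs ls-files -l | while read -r SHA DASH FILEPATH; do rm "$FILEPATH" && ln ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done

# Alternative: move the objects out of the LFS store instead of hardlinking.
# git lfs ls-files -l | while read -r SHA DASH FILEPATH; do mv ".git/lfs/objects/${SHA:0:2}/${SHA:2:2}/$SHA" "$FILEPATH"; done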
```
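To check that the hardlinking actually removed the duplication, something like the following could be used. It is only a sketch: the helper names `lfs_object_path`, `link_lfs_file`, and `is_deduplicated` are made up for illustration, and the `.git/lfs/objects/<aa>/<bb>/<sha>` layout is assumed from the workaround above. Unlike the one-liner, `link_lfs_file` refuses to delete the pointer file when the object is missing.

```bash
# Map an LFS object ID (SHA-256) to its path in the local LFS store.
# The two-level fan-out layout is assumed from the workaround above.
lfs_object_path() {
  sha_prefix1="$(printf '%s' "$1" | cut -c1-2)"
  sha_prefix2="$(printf '%s' "$1" | cut -c3-4)"
  printf '.git/lfs/objects/%s/%s/%s\n' "$sha_prefix1" "$sha_prefix2" "$1"
}

# Replace a working-tree file with a hardlink to its LFS object.
# Leaves the file untouched and returns non-zero if the object is missing.
link_lfs_file() {
  obj="$(lfs_object_path "$1")"
  [ -f "$obj" ] || { echo "missing LFS object for $2" >&2; return 1; }
  rm -f -- "$2" && ln -- "$obj" "$2"
}

# Succeeds if the working-tree file and the LFS object are the same inode,
# i.e. the content is stored only once on disk.
is_deduplicated() {
  [ "$(lfs_object_path "$1")" -ef "$2" ]
}
```

From the repo root, `git lfs ls-files -l | while read -r SHA DASH FILEPATH; do is_deduplicated "$SHA" "$FILEPATH" || echo "still duplicated: $FILEPATH"; done` would then list any files that are not hardlinked.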

Given that HF is driving LFS usage for these giant models and datasets (and as this currently leads to duplication), maybe you could consider advertising this workaround in the `?clone=true` popup?

Thanks!

---

The LFS folks don't much like the idea of opt-in hardlink usage (currently they only try reflinks, which are not supported by the majority of Linux filesystems), but maybe if HF asks, they would accept a PR adding an opt-in switch to `git-lfs pull` or `git-lfs dedup` for using hardlinks, and perhaps for marking these files read-only...
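Until something like that exists, the read-only part can be done by hand after the hardlink workaround. A minimal sketch, where `mark_readonly` is a hypothetical helper name and the paths are assumed to come from `git lfs ls-files -n` (names only). Since a hardlink shares its inode with the LFS object, dropping write permission on the working-tree file also applies to the object in `.git/lfs/objects`:

```bash
# Read file paths from stdin (one per line) and drop write permission on each,
# so an accidental in-place edit cannot silently corrupt the shared LFS object.
mark_readonly() {
  while IFS= read -r path; do
    chmod a-w -- "$path"
  done
}
```

Usage from the repo root would be `git lfs ls-files -n | mark_readonly`.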

