Xet upload workflow #2887
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…hub into xet-upload-workflow
…hub into xet-upload-workflow
There are a few missing docstrings, but otherwise this looks great!
…hub into xet-upload-workflow
…hub into xet-upload-workflow
I think if xet is enabled in the repo and |
Ah yes, I forgot: the workaround in this case would be to uninstall hf_xet if they really are not authorized to use it. But that shouldn't happen unless they are doing something unexpected, so it's fine to raise the unauthorized error :)
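For readers following the thread, here is a minimal sketch of the behavior being discussed: the xet path is taken only when the repo has xet enabled and `hf_xet` is installed, so uninstalling `hf_xet` falls back to the regular upload. The helper names `is_hf_xet_installed` and `choose_upload_backend`, and the `"xet"` / `"regular"` labels, are hypothetical and not taken from the PR; only `importlib.util.find_spec` is a real standard-library call.

```python
import importlib.util


def is_hf_xet_installed() -> bool:
    """Return True if the optional `hf_xet` package can be found on this system."""
    return importlib.util.find_spec("hf_xet") is not None


def choose_upload_backend(repo_xet_enabled: bool, hf_xet_installed: bool) -> str:
    """Hypothetical helper: decide which upload path a file would take.

    Assumption (not confirmed by the PR): without hf_xet installed, the
    client silently uses the regular upload path even on a xet-enabled repo.
    """
    if repo_xet_enabled and hf_xet_installed:
        return "xet"
    return "regular"


if __name__ == "__main__":
    backend = choose_upload_backend(
        repo_xet_enabled=True,
        hf_xet_installed=is_hf_xet_installed(),
    )
    print(f"upload backend: {backend}")
```

Under this sketch, an unauthorized user on a xet-enabled repo with `hf_xet` installed would still reach the xet path and get the unauthorized error, which matches the outcome described in the comment above.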
…ace_hub into xet-upload-workflow
🔥
Co-authored-by: Lucain <lucain@huggingface.co>
…ace_hub into xet-upload-workflow
Let's wait for a final review from @bpronan and/or @rajatarya and then I think we should be good to merge! 🙂
This is 💯 correct. The user would only see this case (xetEnabled on repo + Uninstalling |
…hub into xet-upload-workflow
I think we can merge this one to the |
* add hf_xet as an optional dependency
* update installed packages at runtime
* split xet testing in CI
* fix workflow
* fix windows
* Xet download workflow (#2875)
* first draft
* remove comment
* hf_xet instead of xet
* update docstring
* fix
* update docstring
* simplify typing
* quality
* add logging
* fix tests
* add unit tests for xet utilities
* first draft of download testing
* more tests
* address some comments
* fix tests
* check if hf_xet is available or not
* remove unnecessary dest dir creation
* keep comment
Co-authored-by: Lucain <lucain@huggingface.co>
* post-review improvements
* Update tests/test_xet_download.py
---------
Co-authored-by: Lucain <lucain@huggingface.co>
* Add ability to enable/disable xet storage on a repo (#2893)
* add ability to enable/disable xet storage
* add test
* better way to check if all settings are none
* don't strip authorization header with downloading with xet
* update comment
* Xet upload workflow (#2887)
* add upload workflow
* fixes and tests
* use helper for prgress bar
* use tmp repo in tests
* some fixes
* update tests
* mock HF_XET_CACHE
* fix tests
* fix utils tests
* debug CI
* fix
* check if xet is enabled
* debug CI
* debug CI again
* revert
* debugging
* don't rerun xet tests
* revert
* remove pytest timeout
* don't run tests in parallel
* add comment
* revert and rename variable
* don't skip tests
* remove warning
* fix tests
* Apply suggestions from code review
* fixes
* fix syntax error with python 3.8
* catch Invalid credentials
* fix
* record Space API VCR test
* use raise instead of raise e
Co-authored-by: Lucain <lucain@huggingface.co>
* disable xet storage for the other tests
* reverting
* isolate xet tests for windows
* fix windows
* install hf_xet for xet testing
---------
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Lucain Pouget <lucainp@gmail.com>
* Xet Docs for huggingface_hub (#2899)
* Xet docs
* PR feedback, added waitlist links
* Added HF_XET_CACHE env variable docs
* PR feedback
* Doc feedback
* Added two lines about flow of upload/download
* Updating links to Hub doc location
* Reformat headings, less levels in TOC
---------
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
Co-authored-by: Célina <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
* Adding Token Refresh Xet Test (#2932)
Directly calling hfxet.download_files() with token_refresher callback to ensure that hfxet calls the token refresher as expected.
---------
Co-authored-by: Celina Hanouti <hanouticelina@gmail.com>
* Using a two stage download path for xet files. (#2920)
* Adding request header on resolve endpoint indicating that we can receive xet info.
* Adding test to ensure that the header is always sent on metdata request
* Using a two stage download path for xet files.
* Using the GET call's JSON
* Using xet_backed for the whether the file is a xet file or not to disambiguate from whether xet is enabled
* Adding and fixing tests
* Testing fix WIP
* Rewriting xet download to use the refresh route to resolve the xetmetadata
* Parameter type check
* Docs
* Removing extraneous constant
* Fixing file_download tests
* Readding the refresh route into the file metadata
* Refactoring the XetMetadata object into two objects to reflect the Hub changes.
* Fixing broken tests
* Code cleanup from self review
* Fixing types
* Quality & Lint
* Handling when hub returns the entire refresh route in its headers.
* Update tests/test_xet_utils.py
* Fixing merge conflicts in the new tests
* Extracting the refresh route from the link header (#2953)
* Getting the refresh route from the links header
* refactor xet_file_data func signature & tests
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Rajat Arya <rajat@huggingface.co>
* Update src/huggingface_hub/constants.py
Co-authored-by: Célina <hanouticelina@gmail.com>
---------
Co-authored-by: Celina Hanouti <hanouticelina@gmail.com>
Co-authored-by: Rajat Arya <rajatarya@users.noreply.github.com>
Co-authored-by: Julien Chaumond <julien@huggingface.co>
Co-authored-by: Pierric Cistac <Pierrci@users.noreply.github.com>
Co-authored-by: Brian Ronan <brian.ronan@huggingface.co>
Co-authored-by: Rajat Arya <rajat@huggingface.co>
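The last commits in the message above mention getting the refresh route from the Link header. As a rough illustration only, an HTTP `Link` header value can be split into a `rel` to URL mapping with the standard library, as sketched here. The function name `parse_link_header`, the `rel` name `xet-refresh`, and the `example.invalid` URLs are all invented for this sketch; the actual header name and `rel` value used by the Hub are not stated in this thread.

```python
import re
from typing import Dict

# Matches one `<url>; rel="name"` entry of a Link header.
# Simplification: `rel` must be the first parameter after the URL.
_LINK_ENTRY = re.compile(r'<(?P<url>[^>]+)>\s*;\s*rel="?(?P<rel>[^",;]+)"?')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel name found in a Link header value to its target URL."""
    return {m.group("rel"): m.group("url") for m in _LINK_ENTRY.finditer(value)}


if __name__ == "__main__":
    # Made-up header value; the rel name is an assumption, not the Hub's.
    header = (
        '<https://example.invalid/api/refresh>; rel="xet-refresh", '
        '<https://example.invalid/page2>; rel="next"'
    )
    links = parse_link_header(header)
    print(links.get("xet-refresh"))
```

A caller would look up the expected `rel` in the returned dict and treat a missing key as "no refresh route advertised".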
Partially resolves #2713
This PR contains only the Xet upload workflow implemented in xetpoc_huggingface_hub (internal).
The main branch for xet storage integration is xet-integration.
Main changes:
Note: since it's a common part for download and upload, this has been pushed directly into the xet-integration branch.
To try it from this branch: