Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: fsdp sharded state dict wont work for save_only_model knob #36627

Merged
merged 4 commits into from
Mar 13, 2025

Conversation

kmehant
Copy link
Contributor

@kmehant kmehant commented Mar 10, 2025

What does this PR do?

Fixes #36626

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Sorry, something went wrong.

Copy link

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@github-actions github-actions bot marked this pull request as draft March 10, 2025 06:57
@kmehant kmehant marked this pull request as ready for review March 10, 2025 07:01
@kmehant kmehant force-pushed the bug-fsdp-save branch 3 times, most recently from 3d7e6e4 to 3a16920 Compare March 10, 2025 10:12
@Rocketknight1 Rocketknight1 requested review from muellerzr and SunMarc and removed request for Rocketknight1 and ArthurZucker March 10, 2025 17:52
@kmehant
Copy link
Contributor Author

kmehant commented Mar 11, 2025

@SunMarc @muellerzr requesting review on this nit fix. Thanks

Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, left a comment

Comment on lines 813 to 818
if (
hasattr(self.accelerator.state, "fsdp_plugin")
and "SHARDED_STATE_DICT" in str(self.accelerator.state.fsdp_plugin.state_dict_type)
and self.args.save_only_model is True
):
raise ValueError("save_only_model option is not compatible with FSDP state dict type 'SHARDED_STATE_DICT'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's put that in create_accelerator_and_postprocess function ! There are already some checks that exist there for save_only_model arg

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated thank you.

@kmehant kmehant force-pushed the bug-fsdp-save branch 5 times, most recently from fe111ad to 4a2e33b Compare March 12, 2025 13:46
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Copy link
Member

@SunMarc SunMarc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SG !

SunMarc added 2 commits March 13, 2025 11:47

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
@SunMarc SunMarc merged commit 09a309d into huggingface:main Mar 13, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

save_only_model with FSDP throws FileNotFoundError error
3 participants