-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable/disable compile for quants methods #36519
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the |
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
Thanks @SunMarc! This was on my mind as I ran into this last week :) |
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for fixing it 💛
Added two nits that I believe will help with long-term maintenance 🤗
src/transformers/generation/utils.py
Outdated
generation_config.disable_compile = not self.hf_quantizer.is_compileable | ||
else: | ||
generation_config.disable_compile = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
generation_config
here doesn't persist across calls, and, in general, I'm moving away from mutating generation_config
so we can better distinguish user flags from our internal handling (especially in the decoding methods, like _sample
).
Can we instead set is_compileable
accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done !
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Let me know if this is better @gante ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thank you for iterating 💪
What does this PR do ?
This PR disables compile if the quantization method do not support it. List to be found here. Need to add follow up PRs to allow compile or not for each method + add tests. Only torchao and bnb have the tests for now.
Fixes #36485 (comment)
cc @gante let me know what you think about that. I wanted to still have an easy way to not
disable_compile
if we wanted.