Fix regression compressed-tensors #36921
Conversation
Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button.

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cc @rahul-tuli
```diff
@@ -154,7 +152,8 @@ def is_quantization_compressed(self):
         return (
             self.quantization_config.quantization_config is not None
-            and self.quantization_config.quantization_config.quantization_status == QuantizationStatus.COMPRESSED
+            and self.quantization_config.quantization_config.quantization_status
+            in [QuantizationStatus.COMPRESSED, QuantizationStatus.FROZEN]
         )
```
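For context, here is roughly what the full property reads as with the proposed change applied (a sketch reconstructed from the diff; the decorator and the import path for `QuantizationStatus` are assumptions based on the compressed-tensors library):

```python
# Sketch of the property after this PR's proposed change, reconstructed from
# the diff above. The import path is an assumption.
from compressed_tensors.quantization import QuantizationStatus

@property
def is_quantization_compressed(self):
    # Treat the checkpoint as compressed when a quantization config is present
    # and its status is COMPRESSED, or FROZEN (to accommodate older configs
    # that report "frozen" while the weights are still stored compressed).
    return (
        self.quantization_config.quantization_config is not None
        and self.quantization_config.quantization_config.quantization_status
        in [QuantizationStatus.COMPRESSED, QuantizationStatus.FROZEN]
    )
```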
This is not always true. The model listed is an old model that is failing because the config itself is incorrect; the config should be updated, or `run_compressed` should be passed in as `False` (see the sketch below). The FROZEN state is reserved for models that have already been decompressed.
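For reference, passing `run_compressed=False` at load time might look like the following (a minimal sketch; the model ID is a placeholder, and `CompressedTensorsConfig` is assumed importable from `transformers` as in recent releases):

```python
# Minimal sketch: override run_compressed at load time rather than editing the
# checkpoint's config.json. The model ID below is a placeholder.
from transformers import AutoModelForCausalLM, CompressedTensorsConfig

model = AutoModelForCausalLM.from_pretrained(
    "org/old-compressed-tensors-model",  # hypothetical checkpoint with a stale config
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)
```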
Sounds good then, this is what I thought. Can you update the config for those models (the most used ones should be enough!)?
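The config update being discussed amounts to flipping the stored status on the checkpoint. A hypothetical sketch of that edit on a checkpoint's `config.json` (file name and exact key layout are assumptions):

```python
# Hypothetical sketch of the config fix: change quantization_status from
# "frozen" to "compressed" in the checkpoint's config.json, since the weights
# are still stored in compressed form.
import json

with open("config.json") as f:
    cfg = json.load(f)

cfg["quantization_config"]["quantization_status"] = "compressed"  # was "frozen"

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```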
LMK when this is done and I will close this PR!
Hi @SunMarc, I have updated our most downloaded models; you should be good to go!
Thanks!
What does this PR do?
This PR fixes compressed-tensors model loading, which was broken for some models by a previous PR. Impacted models are those whose config carries the attribute `"quantization_status": "frozen"` (e.g., this is the case for the FP8 model), because we defaulted `run_compressed` to `True` in the config. Thus, loading triggers an error immediately.

Fixes #36915
To reproduce:
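A minimal sketch along these lines reproduces it (the exact model ID from the linked issue is an assumption; any affected FP8 checkpoint should behave the same):

```python
# Sketch of the failing path: loading a compressed-tensors checkpoint whose
# config reports "quantization_status": "frozen". Before this fix, loading
# errored immediately because run_compressed defaulted to True.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "neuralmagic/Meta-Llama-3-8B-Instruct-FP8",  # assumed affected FP8 checkpoint
)
```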
Also, maybe we shouldn't run the model with `run_compressed` in the case where the model is frozen; but for the FP8 model, the weights are actually stored in FP8 format.
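One way to sanity-check that claim is to inspect the stored weight dtypes directly (a hedged sketch; the layer path assumes a Llama-style architecture, and `torch.float8_e4m3fn` is an assumed fp8 variant):

```python
# Hypothetical sanity check: inspect a loaded checkpoint's weight dtype to see
# whether the weights really are in fp8.
import torch

w = model.model.layers[0].self_attn.q_proj.weight
print(w.dtype)  # expected: torch.float8_e4m3fn for an fp8 compressed checkpoint
```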