
Fix regression compressed-tensors #36921

Closed · wants to merge 7 commits

Conversation

@SunMarc (Member) commented Mar 24, 2025

What does this PR do?

This PR fixes compressed-tensors model loading, which was broken for some models by this PR. The impacted models are those whose config has the attribute "quantization_status": "frozen" (e.g. this is the case for the FP8 model), because we defaulted run_compressed to True in the config, which now triggers an error immediately.

Fixes #36915
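For context, a minimal sketch of the kind of check that trips here (not the actual transformers source; the names and structure are assumptions based on the error message and the diff discussed below):

from enum import Enum

class QuantizationStatus(str, Enum):
    COMPRESSED = "compressed"
    FROZEN = "frozen"

def validate_run_compressed(run_compressed: bool, status: QuantizationStatus) -> None:
    # Before this PR, only COMPRESSED counted as "quantization compressed", so a
    # checkpoint whose config says "quantization_status": "frozen" raised as soon
    # as run_compressed defaulted to True.
    is_quantization_compressed = status == QuantizationStatus.COMPRESSED
    if run_compressed and not is_quantization_compressed:
        raise ValueError("`run_compressed` is only supported for quantized_compressed models")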

To reproduce:

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neuralmagic/Meta-Llama-3.1-8B-FP8")
model = AutoModelForCausalLM.from_pretrained("neuralmagic/Meta-Llama-3.1-8B-FP8")
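# Before this fix, the model loading call above failed with:
# ValueError: `run_compressed` is only supported for quantized_compressed models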

Also, maybe we shouldn't run the model with run_compressed when the model is frozen; however, for the FP8 model, the weights are actually stored in FP8 format.

@github-actions github-actions bot marked this pull request as draft March 24, 2025 10:47

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. When it is ready for review, please click the Ready for review button (at the bottom of the PR page).

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc (Member Author) commented Mar 24, 2025

cc @rahul-tuli

@@ -154,7 +152,8 @@ def is_quantization_compressed(self):
         return (
             self.quantization_config.quantization_config is not None
-            and self.quantization_config.quantization_config.quantization_status == QuantizationStatus.COMPRESSED
+            and self.quantization_config.quantization_config.quantization_status
+            in [QuantizationStatus.COMPRESSED, QuantizationStatus.FROZEN]
@dsikka (Contributor) commented Mar 25, 2025


This is not always True.

The model listed is an old model that is failing because the config itself is incorrect; either the config should be updated or run_compressed should be passed in as False.

The FROZEN state is reserved for models that have already been decompressed.
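As a concrete illustration of the second option, a hedged sketch of passing run_compressed as False at load time, assuming CompressedTensorsConfig is importable from transformers and that its run_compressed loading attribute is honored by from_pretrained (as in recent versions):

from transformers import AutoModelForCausalLM, CompressedTensorsConfig

# Work around the outdated "frozen" status in the checkpoint config by explicitly
# disabling run_compressed when loading the quantized model.
model = AutoModelForCausalLM.from_pretrained(
    "neuralmagic/Meta-Llama-3.1-8B-FP8",
    quantization_config=CompressedTensorsConfig(run_compressed=False),
)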

@SunMarc (Member Author) replied:

Sounds good then, this is what I thought. Can you update the config for those models (the most used ones should be enough)?

@SunMarc (Member Author) replied:

LMK when this is done and I will close this PR!

@dsikka (Contributor) replied:

Hi @SunMarc, I have updated our most downloaded models; you should be good to go!

@SunMarc (Member Author) replied:

Thanks!

@SunMarc SunMarc closed this Mar 26, 2025
Successfully merging this pull request may close these issues.

ValueError: run_compressed is only supported for quantized_compressed models
5 participants