Cast inputs_embeds to model dtype if necessary #30635
Conversation
Code Review
This pull request addresses a RuntimeError caused by a dtype mismatch for inputs_embeds during profile_run. The fix involves casting inputs_embeds to the correct model dtype within the _dummy_run method. The change is correct and effectively resolves the reported issue. However, I've noted that a similar dtype mismatch could potentially occur during regular inference, as the execute_model path lacks a similar safeguard. I've left a comment suggesting a more comprehensive fix in a follow-up to ensure robustness across all execution paths. Overall, this is a good fix for the immediate problem.
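The guard described in the review can be sketched as follows. This is a hypothetical standalone version for illustration; in the PR the cast happens inside vLLM's `GPUModelRunner._dummy_run`, and the helper name here is invented:

```python
import torch


def cast_embeds_if_needed(inputs_embeds: torch.Tensor,
                          model_dtype: torch.dtype) -> torch.Tensor:
    """Cast inputs_embeds to the model dtype only when they diverge."""
    if inputs_embeds.dtype != model_dtype:
        # .to() only allocates a new tensor on an actual dtype change;
        # otherwise it returns the input unchanged.
        inputs_embeds = inputs_embeds.to(model_dtype)
    return inputs_embeds


x = torch.zeros(4, 8, dtype=torch.float32)
y = cast_embeds_if_needed(x, torch.float16)  # cast happens here
```

Because `Tensor.to` is a no-op when the dtype already matches, the explicit `if` mainly documents intent; the same safeguard would need to be mirrored in the `execute_model` path to cover regular inference.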
DarkLight1337 left a comment
Hmm, how can this happen? The buffer should be in the correct dtype according to this code:
self.inputs_embeds = self._make_buffer(
    self.max_num_tokens, self.inputs_embeds_size, dtype=self.dtype, numpy=False
)
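The snippet above allocates the buffer with `self.dtype`, so a mismatch implies the two diverged after allocation. A minimal, hypothetical reproduction of that failure mode — the class and attribute names here are illustrative, not vLLM's actual code path:

```python
import torch


class Runner:
    """Illustrative stand-in for a model runner; not vLLM's class."""

    def __init__(self, dtype: torch.dtype):
        self.dtype = dtype
        # The buffer's dtype is frozen at allocation time.
        self.inputs_embeds = torch.zeros(8, 16, dtype=self.dtype)


r = Runner(torch.float16)
# If self.dtype is later rebound (e.g. by a config update), the
# already-allocated buffer silently keeps its original dtype.
r.dtype = torch.float32
```

After the rebinding, `r.dtype` and `r.inputs_embeds.dtype` disagree, which is exactly the divergence a runtime guard would catch.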
https://gist.github.com/wasertech/fd579f5b09b2e9e0206dc5dac092791b#file-vllm-err-dtype-txt These are the full logs with eager mode enforced on Turing. No matter which dtype I pass via parameter flags, I still get this error since v0.10.x (the first release compatible with this particular model, Apertus).
Hi @wasertech, the pre-commit checks have failed. Please run:
uv pip install pre-commit
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch.
This change consolidates the fix for handling runtime dtype mismatches for inputs_embeds, specifically observed in Docker environments with Apertus models. It includes a guard clause to cast inputs_embeds to self.dtype if they diverge, ensuring robustness against such edge cases.
Signed-off-by: wasertech <danny@waser.tech>
After deep investigation, here is the most technically plausible explanation for the dtype mismatch. The mechanism of divergence:
- Initialization: the GPUModelRunner is initialized.
- Configuration update / divergence: sometime later (likely during …)
- The mismatch: …
- The crash: …
- The fix: the check catches this exact divergence (…). Crucially, …
Thanks Gemini for sniffing around and trying to explain this madness.
Also, which model triggered this problem? Perhaps the problem is in how the dummy batch is created as well. |
This one:
DarkLight1337 left a comment
Can you check if this problem happens to any other model?
Also, it would be nice to try to print out
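The suggestion above (presumably to print the relevant dtypes before the crash) could look roughly like this hypothetical debug helper — the function name and format are invented for illustration:

```python
import torch


def describe_dtypes(inputs_embeds: torch.Tensor,
                    model_dtype: torch.dtype) -> str:
    # Format both dtypes so a divergence is visible in the serve
    # logs before the forward pass crashes.
    return (f"inputs_embeds dtype={inputs_embeds.dtype}, "
            f"model dtype={model_dtype}")


msg = describe_dtypes(torch.zeros(2, 4), torch.float16)
print(msg)
```

Logging this at the top of the dummy-run and execute-model paths would show exactly where the two dtypes stop agreeing.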
I can serve my own Mistral fine-tuned export in AWQ fine.
I agree, let me add those and produce a nice log.
Just tested my changes locally. Not only does my fix not solve the issue, but the dtype is already aligned when entering the method. I understand your reaction now, haha. I'll close this PR since it doesn't address the root of the issue (or even solve it). It's likely in transformers? I will try to see where exactly.
Purpose
Ensure inputs_embeds are cast to the correct dtype. This should help with #29349 and with:
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half

Test Plan
WIP

Test Result

Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.
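For reference, the RuntimeError quoted above is easy to reproduce outside vLLM by feeding a float32 tensor into a reduced-precision linear layer. This sketch uses bfloat16 rather than the traceback's Half, since CPU support for half-precision matmul varies across torch versions; the mechanism is the same:

```python
import torch

# bfloat16 stands in for the Half case from the traceback.
layer = torch.nn.Linear(8, 8).to(torch.bfloat16)
x = torch.zeros(2, 8, dtype=torch.float32)

err = None
try:
    layer(x)  # mismatched input dtype -> RuntimeError from the matmul
except RuntimeError as e:
    err = e

# Casting the input to the layer's dtype (what the PR does for
# inputs_embeds) lets the forward pass succeed.
out = layer(x.to(torch.bfloat16))
```

The `try` block raises because the underlying addmm sees mat1 and mat2 in different dtypes; the cast on the last line is the same remedy the PR applies to inputs_embeds.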