[Model][Qwen3VL] Cache positional embedding indices #28475
Conversation
```python
    def device(self) -> torch.device:
        return self.patch_embed.proj.weight.device

    @staticmethod
```
Using a static method to make this work nicely with lru_cache
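A minimal sketch of why the static method matters here: `functools.lru_cache` keys the cache on the function's arguments, so decorating a plain instance method would include `self` in the key (pinning instances in memory and missing the cache across instances). A static method takes only hashable arguments, so identical grid sizes share one cache entry. The class and method names below are illustrative, not the actual vLLM code.

```python
from functools import lru_cache


class VisionEncoder:
    # Hypothetical sketch: the cache key is (height, width) only, so any
    # instance asking for the same grid size reuses the cached result.
    @staticmethod
    @lru_cache(maxsize=128)
    def rot_pos_ids(height: int, width: int) -> tuple:
        # Toy stand-in for the real index computation: (row, col) per patch.
        return tuple((h, w) for h in range(height) for w in range(width))


enc = VisionEncoder()
ids = enc.rot_pos_ids(2, 3)
# lru_cache returns the identical cached object on a repeat call.
assert ids is VisionEncoder.rot_pos_ids(2, 3)
```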
Code Review
This pull request introduces a performance optimization for the Qwen3VL model by caching the computation of positional embedding indices. The rot_pos_emb method, which previously ran on the CPU for every forward pass, is refactored to use a new cached static method rot_pos_ids. This new method leverages functools.lru_cache to store results and avoid redundant computations for the same image dimensions. Additionally, the computation logic has been migrated from PyTorch to NumPy, which, as the author notes, provides a slight performance improvement even for uncached calls. The changes are well-implemented, correct, and effectively address the performance bottleneck. The code is clearer and more efficient. I have no further comments.
ywang96
left a comment
LGTM! Left a very small nit
Please fix the merge conflicts
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Force-pushed from 9431e54 to 1b083ab
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: George D. Torres <gdavtor@gmail.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>
Purpose
rot_pos_emb currently runs on the CPU at the beginning of the model forward pass, which means no GPU computation happens in parallel with it. This PR caches the index computation to help with this. I also moved the index computation to NumPy, which gave a small ~25-30% improvement to uncached computations as well.
Test Plan
Test Result
Before:
After: