zhuohan123
Member

Fix an error in #1731.

Collaborator

@WoosukKwon WoosukKwon left a comment

Oh my bad. Thanks for the fix!

@zhuohan123 zhuohan123 merged commit 7d761fe into main Nov 21, 2023
@zhuohan123 zhuohan123 deleted the fix_parallel_scaled_activation_weight_loading branch November 28, 2023 00:05
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
jinyouzhi pushed a commit to jinyouzhi/vllm that referenced this pull request Sep 12, 2025
…project#1737)

## Essential Elements of an Effective PR Description Checklist
- [x] The purpose of the PR, such as "Fix some issue (link existing
issues this PR will resolve)".
- [ ] The test plan, such as providing test command.
- [ ] The test results, such as pasting the results comparison before
and after, or e2e results


## Purpose

Found that the current m_rope check always calls the transformer API, which
leads to a deeper Python stack and longer CPU time.

Before:
thinker_uses_mrope is called 33 times, so model_is_mrope takes
0.097 ms.
<img width="1682" height="573" alt="image"
src="https://github.com/user-attachments/assets/f5de5586-8aa9-4028-b1ba-05b85dc6eaa1"
/>


With this PR:
The repeated thinker_uses_mrope calls are removed (it is now called only once,
to set a local property), so model_is_mrope takes only 0.006 ms.
<img width="1685" height="548" alt="image"
src="https://github.com/user-attachments/assets/1b311199-e1a6-4dc5-b663-e2592fe18a57"
/>
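
For illustration, the pattern described above (run the expensive check once and store the result as a local property) could look roughly like the sketch below. This is not the code from this PR: `RunnerSketch`, `expensive_mrope_check`, and the call counter are hypothetical names invented for the example, and the stand-in check simply returns `True`.

```python
# Hypothetical stand-in for the expensive m_rope lookup; it only counts calls.
call_count = {"n": 0}


def expensive_mrope_check() -> bool:
    """Pretend to walk a deep call stack to decide whether m_rope is used."""
    call_count["n"] += 1
    return True


class RunnerSketch:
    """Hypothetical runner that caches the check result at construction time."""

    def __init__(self) -> None:
        # The expensive check runs exactly once, here.
        self.uses_mrope = expensive_mrope_check()

    def model_is_mrope(self) -> bool:
        # Hot path: a plain attribute read instead of a repeated lookup.
        return self.uses_mrope


runner = RunnerSketch()
# Simulate the 33 hot-path queries mentioned in the measurement above.
results = [runner.model_is_mrope() for _ in range(33)]
```

The trade-off of caching at construction time is that the stored value does not change if the underlying configuration changes afterwards; the sketch assumes the configuration is fixed once the runner is built.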



## Test Plan

## Test Result

<!--- pyml disable-next-line no-emphasis-as-heading -->

Signed-off-by: Chendi.Xue <chendi.xue@intel.com>