
Conversation

@wbn03 (Contributor) commented Dec 9, 2023

Replace the head_mapping param with num_kv_heads in the attention kernel.
Based on this issue:
https://github.com/vllm-project/vllm/issues/1928
This avoids loading head_mapping from global memory.
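
For context, here is a minimal sketch of the idea, assuming the usual grouped-query/multi-query layout in which consecutive query heads share one KV head. The names below are illustrative only and not the actual vLLM kernel code:

```cpp
// Sketch: instead of reading a per-head lookup table from global memory,
//   const int kv_head_idx = head_mapping[head_idx];   // global-memory load
// the KV head can be derived arithmetically from num_kv_heads alone,
// because the query-to-KV-head mapping is regular.
__device__ int kv_head_for(int head_idx, int num_heads, int num_kv_heads) {
  const int num_queries_per_kv = num_heads / num_kv_heads;
  return head_idx / num_queries_per_kv;  // pure integer arithmetic, no load
}
```

Passing num_kv_heads as a plain kernel argument means one less pointer to dereference per thread block, at the cost of an integer division that the compiler can often reduce when the group size is known.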

@Yard1 (Collaborator) left a comment

LGTM. cc @WoosukKwon

@WoosukKwon (Collaborator) commented Dec 10, 2023

@Yard1 It seems this exactly overlaps with #1994. I think we should add @zhaoyang-star as a co-author of this PR; let me do that after running this PR.

@WoosukKwon WoosukKwon linked an issue Dec 10, 2023 that may be closed by this pull request
@WoosukKwon WoosukKwon self-requested a review December 10, 2023 03:25
@Yard1 (Collaborator) commented Dec 10, 2023

@WoosukKwon Agreed, thanks!

@WoosukKwon WoosukKwon merged commit dacaf5a into vllm-project:main Dec 10, 2023
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
…llm-project#1997)

Co-authored-by: wangguoya <wangguoya@baidu.com>
Co-authored-by: Yang Zhao <zhaoyangstar@foxmail.com>
Successfully merging this pull request may close these issues.

Why we need head_mapping as param pass to paged_attention kernel?