@MatthewBonanni

Add FP8 support by patching in this unmerged PR from upstream: #54

@MatthewBonanni MatthewBonanni changed the title Feature/fp8 from pr Add FP8 support Aug 6, 2025
@MatthewBonanni MatthewBonanni force-pushed the feature/fp8_from_pr branch 2 times, most recently from 015fdaf to 193d01f Compare August 8, 2025 14:34
@MatthewBonanni MatthewBonanni marked this pull request as ready for review August 8, 2025 15:51
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
@MicroZHY

Hi @MatthewBonanni,
Thank you for porting the FP8 feature over!
Could you please summarize the main incremental changes between your branch #4 and the original upstream PR deepseek-ai#54?
In other words, what did you add, drop, or modify on top of the upstream patch so that it fits the vLLM fork?
Really appreciate your time!
Best regards

@MatthewBonanni
Author

MatthewBonanni commented Aug 14, 2025

@MicroZHY Glad to contribute! The code from the upstream PR is largely unchanged. I copied the relevant kernel into a new kernels_fp8 folder and updated mha_fwd_kvcache_mla to route to the appropriate kernel based on the dtype. The FP8 kernel takes a Flash_fwd_mla_params_fp8 object, which derives from Flash_fwd_mla_params and adds only the descales and h_h_k_ratio, a parameter the main kernel no longer uses.
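
For readers following along, the routing described above can be pictured roughly as follows. This is only a minimal sketch, not the code in the PR: the struct names Flash_fwd_mla_params and Flash_fwd_mla_params_fp8 and the field h_h_k_ratio come from the comment, while the DType enum, the descale field names, the fields inside the base struct, and the helper functions (run_dense_kernel, run_fp8_kernel, dispatch_by_dtype) are hypothetical stand-ins chosen for illustration.

```cpp
#include <string>

// Hypothetical stand-in for the real base parameter struct; the actual
// struct in the repository has different (and many more) fields.
struct Flash_fwd_mla_params {
    int b = 0;              // illustrative field: batch size
    int h = 0;              // illustrative field: number of heads
    void* q_ptr = nullptr;  // illustrative field: query pointer
};

// FP8 variant: inherits every base field and adds only the FP8 extras.
struct Flash_fwd_mla_params_fp8 : Flash_fwd_mla_params {
    int h_h_k_ratio = 1;             // field named in the comment above
    float* descale_q_ptr = nullptr;  // hypothetical name for a descale
    float* descale_k_ptr = nullptr;  // hypothetical name for a descale
};

// Hypothetical dtype tag; the real code inspects the tensor dtype instead.
enum class DType { BF16, FP16, FP8 };

// Placeholders for the two kernel launch paths. They return the folder
// name each path would live in so the routing is observable.
std::string run_dense_kernel(const Flash_fwd_mla_params&) { return "kernels"; }
std::string run_fp8_kernel(const Flash_fwd_mla_params_fp8&) { return "kernels_fp8"; }

// Route to the FP8 kernel for FP8 inputs, otherwise to the main kernel.
std::string dispatch_by_dtype(DType dtype,
                              const Flash_fwd_mla_params& base,
                              int h_h_k_ratio,
                              float* descale_q,
                              float* descale_k) {
    if (dtype == DType::FP8) {
        Flash_fwd_mla_params_fp8 params;
        // Copy the shared fields into the base subobject, then fill in
        // the FP8-only additions.
        static_cast<Flash_fwd_mla_params&>(params) = base;
        params.h_h_k_ratio = h_h_k_ratio;
        params.descale_q_ptr = descale_q;
        params.descale_k_ptr = descale_k;
        return run_fp8_kernel(params);
    }
    return run_dense_kernel(base);
}
```

Deriving the FP8 struct from the base one keeps the main kernel's parameter layout untouched, which fits the description that the upstream code is largely unchanged.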

Collaborator

@LucasWilkinson LucasWilkinson left a comment

LGTM! thanks

@LucasWilkinson LucasWilkinson merged commit a757314 into vllm-project:main Aug 18, 2025
1 check passed
@MatthewBonanni MatthewBonanni mentioned this pull request Aug 18, 2025
@MatthewBonanni MatthewBonanni deleted the feature/fp8_from_pr branch September 24, 2025 17:20

3 participants