[Performance] Move apply_w8a8_block_fp8_linear to an op class #24666
Merged: vllm-bot merged 50 commits into vllm-project:main from neuralmagic:move-apply_w8a8_block_fp8_linear-to-class on Sep 23, 2025
Commits
c9ca102 Move apply_w8a8_block_fp8_linear to an op class (ElizaWszola)
eef4349 Remove TODO, bring back old one (ElizaWszola)
dd53183 CUDA graphs fix (ElizaWszola)
bb24881 Clean up (ElizaWszola)
1ba47cd Create linear op objects conditionally, move some arch checks to bloc… (ElizaWszola)
02793b9 format (ElizaWszola)
b72c9f2 clean up repetitive code (ElizaWszola)
d51f35c More aggressive dispatch of blockscale ops (ElizaWszola)
a6ae689 fix (ElizaWszola)
3238ff6 Deep_gemm fix (ElizaWszola)
f9c79aa Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
23341c2 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
9b09b60 Post-merge fixes, better dispatch (ElizaWszola)
e6b0028 small fixes (ElizaWszola)
9b5c552 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
ef6f1e2 Fix cutlass compilation issue on Hopper (ElizaWszola)
77335de Cleanup bad transpose (ElizaWszola)
5eaf155 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
e036dac Wrap w8a8_block_fp8_matmul (ElizaWszola)
233e874 Rename padded_cutlass to padded_cutlass_scaled_mm, add todo (ElizaWszola)
1edfedc Cleanup dispatch_w8a8_blockscale_func (ElizaWszola)
35a0236 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
0ac3a1e Deep gemm warmup fix (ElizaWszola)
9a48100 Fix deep gemm support function (ElizaWszola)
b6a8fb8 Feedback (ElizaWszola)
e89ecd8 Pre-commit fixes (ElizaWszola)
00cb05c Pre-commit fixes 2 (ElizaWszola)
66c89e6 Feedback (ElizaWszola)
d9b4121 fix type issue (ElizaWszola)
1bc81a1 Add use_ue8m0 support to _quantize_group_native (ElizaWszola)
ec73268 Fix padding compilation issue (ElizaWszola)
d19bf4b Feedback (ElizaWszola)
1f895e9 Update vllm/model_executor/layers/quantization/utils/fp8_utils.py (ElizaWszola)
be3ac58 Link bad group shape issue (ElizaWszola)
3772f2f format (ElizaWszola)
8b6cbe4 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
2a87a3b fix quant config condition (ElizaWszola)
012eaff Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (mgoin)
e7f6ec9 fix quant issue (TODO test) (ProExpertProg)
10829d3 fix custom op test (ProExpertProg)
15cf30e Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
ebdcb10 CUDA condition for compressed tensors and H100 (ElizaWszola)
2e3d206 Fix quantfp8 test (ElizaWszola)
bd32cb9 Test scales_col vs. scales_native (ElizaWszola)
efa4446 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ElizaWszola)
1f00804 Add compressed tensors model test (ElizaWszola)
e895df6 Extra asserts, don't use enabled() (ElizaWszola)
9806cf8 CUDA path for quant (ProExpertProg)
2ae1ef9 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ProExpertProg)
00bd638 Merge branch 'main' into move-apply_w8a8_block_fp8_linear-to-class (ProExpertProg)