[Kernels][Bugfix] Use torch op for all kernels in FusedMoE forward. Add additional testing for cudagraphs. #19717
Conversation
After updating the unit tests with mark_dynamic, I'm running into …
Are you referring to #19631 or something else? Yes, the way to avoid specialization is to hide some things in a custom op.
Yeah, #19631.
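For context, here is a minimal sketch of the mark_dynamic pattern under discussion; the function and shapes are hypothetical, not code from this PR:

```python
import torch

def scale_by_tokens(x: torch.Tensor) -> torch.Tensor:
    # Using x.size(0) keeps the dynamic token dimension symbolic.
    return x * x.size(0)

x = torch.randn(16, 32)
# Mark dim 0 as dynamic so torch.compile does not specialize on 16.
torch._dynamo.mark_dynamic(x, 0)
compiled = torch.compile(scale_by_tokens, fullgraph=True)
out = compiled(x)
```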
To document somewhere in the PR description why we changed from .shape to .size(), pasting from a DM between me and @bnellnm:
i was messing with torch compile and wanted to make sure any dynamic dims got preserved properly. .size() preserves SymInts where .shape doesn't. i figured it was safer to just switch all of them
safer+more consistent
I don't think there should be a difference between these two. If there is, then it's a bug in PyTorch -- both of them should preserve SymInts. cc @bobrenjc93
Maybe this has changed in newer versions of pytorch, but iirc at one point using …
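A quick way to probe this on a given PyTorch build (a sketch I put together, not part of the PR):

```python
import torch

def via_size(x: torch.Tensor) -> torch.Tensor:
    # The form the PR standardizes on.
    return x.new_zeros(x.size(0))

def via_shape(x: torch.Tensor) -> torch.Tensor:
    # Reportedly specialized the dimension on some older PyTorch versions.
    return x.new_zeros(x.shape[0])

# Run with TORCH_LOGS="recompiles" to see whether either form triggers a
# recompile when the batch size changes; on recent PyTorch neither should.
for fn in (via_size, via_shape):
    compiled = torch.compile(fn, dynamic=True)
    compiled(torch.randn(8, 4))
    compiled(torch.randn(16, 4))
```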
@@ -29,7 +29,10 @@
    (224, 1024, 1536),
    (224, 3072, 1024),
    (224, 3072, 1536),
    (1024 * 128, 1024, 1024),
    (32768, 1024, 1024),
    # These sizes trigger wrong answers.
@bnellnm could you share some e2e numerical testing results here? Does this mean the current approach still has correctness problems?
This is an issue with the cutlass kernels themselves (separate from chunking). I'm not sure if the tolerances need to be changed or if there's a real problem with cutlass. @ElizaWszola can probably provide some more insight here.
This PR broke the following two tests (they are not run by default during CI):
I have confirmed that both tests pass using the commit before this PR was merged. The failure seems to come specifically from always using the torch op. If I take …
The solution for failures like this is to make sure a forward context is set. I'll either update the test or make the MoE layer tolerate a missing context.
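For illustration, a sketch of what providing that context in a standalone test could look like, assuming vLLM's set_forward_context context manager and a default-constructible VllmConfig (both are my assumptions about the surrounding tree, not code from this PR):

```python
import torch
from vllm.config import VllmConfig
from vllm.forward_context import set_forward_context

# Hypothetical test body: run the layer inside a forward context so that
# code paths calling get_forward_context() do not hit a missing context.
vllm_config = VllmConfig()
x = torch.randn(32, 1024)
with set_forward_context(None, vllm_config):
    out = moe_layer(x)  # moe_layer stands in for the layer under test
```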
In order to avoid any issues with activation chunking and torch.compile, I've moved the remainder of the kernel calls in the FusedMoE layer into the moe_forward custom op. I've also added some additional testing for cudagraphs at the pytest level for triton, deep gemm, and cutlass. Some sizes are disabled for cutlass since they result in wrong answer errors.
This should help address #19631
I'm going to include these changes in #19636.
cc @tlrmchlsmth , @varun-sundar-rabindranath , @ElizaWszola