[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2 #28938
Conversation
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Code Review
This pull request aims to fix build issues on DGX Spark by ensuring that SM100-specific CUTLASS MoE kernels are only built for SM100 architectures. The changes correctly remove SM120 architectures from some of the build configurations in CMakeLists.txt. While the changes are correct, the fix appears to be incomplete. I've identified other sections in CMakeLists.txt for SM100 kernels that still incorrectly include SM120 architectures. I've left a specific comment pointing to these locations. Applying the fix consistently across the file will prevent future build problems. Overall, this is a good step towards improving build correctness.
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
wrmedford
left a comment
Looks good to me, and preserves functionality on sm110a across its rename.
mgoin
left a comment
Thank you!
I see we released a new wheel with this fix for DGX Spark. Should we expect the aarch64 wheel to be compatible with DGX Spark and to run accelerated workloads soon?
Yes, it is compatible.
I happen to have a DGX Spark that I use daily for vLLM dev work anyway, so I tried the v0.11.2 release on there. The install works fine and pulls in CUDA 13 libs as expected. But when trying a simple test to serve openai/gpt-oss-20b (something I regularly do on vLLM builds from source), I hit an error. Do I need to do something differently with my install command to get the released wheel working?
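For reference, a sketch of the steps being described here, assuming a fresh environment (the exact commands and error output did not survive in this thread):

```bash
# Sketch (assumed steps): install the released wheel, then try serving.
uv venv && source .venv/bin/activate
uv pip install vllm==0.11.2
vllm serve openai/gpt-oss-20b
```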
@bbrowning I think this gets you past that: `uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130`. Then you end up with `libtorch_cuda.so` missing...
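A quick way to confirm whether the resulting torch install actually sees the GPU (standard PyTorch API, nothing vLLM-specific):

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```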
@ericcurtin That's one of the steps I take when building from source, along with several others. The second of those commands, which installs vLLM, overwrites the torch, torchvision, and torchaudio I just installed above it. I'm sure I can get the release installing from source with these kinds of steps. But since there was some indication the released wheel may just work on DGX Spark, I was trying to do that.
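One possible way to avoid the overwrite, sketched under the assumption that uv's `--torch-backend` selector accepts a cu130 value (worth verifying against `uv pip install --help`):

```bash
# Sketch: resolve torch packages against the cu130 backend while
# installing vLLM, instead of letting the default index replace them.
uv pip install vllm==0.11.2 --torch-backend=cu130
```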
I was able to install release v0.11.2 via these commands on a DGX Spark. That compiled from source without issue. So, while we don't have any wheel releases that work yet for DGX Spark, release v0.11.2 does install on the system without extra hacks.
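Not the exact commands used above (those were lost from the thread), but a typical from-source install of a tagged release looks roughly like this:

```bash
# Sketch: build vLLM v0.11.2 from source on a DGX Spark.
git clone https://github.com/vllm-project/vllm.git
cd vllm
git checkout v0.11.2
uv pip install -e .   # compiles the CUDA kernels locally; can take a while
```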
Does |
Just try it; right now the best backend for Spark is FlashInfer.
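If you want to force that backend explicitly, vLLM supports an attention-backend override via an environment variable; a sketch, reusing the model from above:

```bash
# Force the FlashInfer attention backend when serving (sketch).
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve openai/gpt-oss-20b
```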
Yes, it starts up without issue and I was able to send a simple chat completion request to it just to see some kind of generation working. |
Most of the time I have been installing without these flags: `--no-binary --torch-backend=auto`. I wonder if that is the difference... Iterations are slow at my place; I mean, one iteration takes an hour (bad bandwidth, thanks for answering).
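For concreteness, an install line combining those flags might look like this (a sketch; the assumption that `--no-binary` takes the package name in uv's pip interface is worth verifying):

```bash
# Sketch: force a local build of vllm itself and let uv pick the torch backend.
uv pip install vllm==0.11.2 --no-binary vllm --torch-backend=auto
```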
I'd appreciate it if someone put together a simple container recipe, with commands that actually work. I feel like I've tried it 100 times, with new errors each time, and failed :'(
@ericcurtin Oh, I'm doing this directly on my Spark and not inside a container. A container will need additional steps, but let me see what I can figure out.
@ericcurtin I was able to build and run a functioning v0.11.2 container directly on my DGX Spark with the Dockerfile at https://gist.github.com/bbrowning/e2efe77b617b741a23ed31333a7ecba9 - it just takes the first bits of the official vLLM container and installs from releases instead of source, along with removing as many of the unnecessary bits as I could find to simplify things for just this one use case. Make sure to pass
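A sketch of using that gist, assuming the Dockerfile is saved locally and standard Docker GPU flags apply (not the exact invocation used):

```bash
# Sketch: build and run the gist's container on the Spark itself.
docker build -t vllm-spark .
docker run --gpus all -p 8000:8000 vllm-spark \
  vllm serve openai/gpt-oss-20b   # or rely on the image's entrypoint
```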
Fix DGX Spark vLLM build issue
Purpose
Continuation of PR #26844; no answer from its owner.
cc @mgoin