
Conversation

@johnnynunez (Contributor) commented Nov 18, 2025

Fixes the DGX Spark vLLM build issue.

Purpose

Continues #26844, which got no response from its owner.

cc @mgoin

@johnnynunez changed the title from "Guard SM100 CUTLASS MoE macro to SM100 builds v2" to "[NVIDIA] Guard SM100 CUTLASS MoE macro to SM100 builds v2" Nov 18, 2025
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to fix build issues on DGX Spark by ensuring that SM100-specific CUTLASS MoE kernels are only built for SM100 architectures. The changes correctly remove SM120 architectures from some of the build configurations in CMakeLists.txt. While the changes are correct, the fix appears to be incomplete. I've identified other sections in CMakeLists.txt for SM100 kernels that still incorrectly include SM120 architectures. I've left a specific comment pointing to these locations. Applying the fix consistently across the file will prevent future build problems. Overall, this is a good step towards improving build correctness.
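
As a side note, a quick way to check which SM architecture a given device reports (and therefore whether the SM100-only kernels apply to it) is to query its compute capability; this is just a diagnostic sketch, not part of the change:

# Compute capability as reported by the driver
nvidia-smi --query-gpu=compute_cap --format=csv,noheader

# The same information from PyTorch, as a (major, minor) tuple for device 0
python -c "import torch; print(torch.cuda.get_device_capability())"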

johnnynunez and others added 4 commits November 18, 2025 12:54
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
@mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) label Nov 18, 2025
@wrmedford (Contributor) left a comment

Looks good to me, and preserves functionality on sm110a across its rename.

@mgoin (Member) left a comment

Thank you!

github-project-automation bot moved this to In review in NVIDIA Nov 19, 2025
@vllm-bot merged commit 49ef847 into vllm-project:main Nov 19, 2025
87 of 89 checks passed
github-project-automation bot moved this from In review to Done in NVIDIA Nov 19, 2025
khluu pushed a commit that referenced this pull request Nov 19, 2025
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
(cherry picked from commit 49ef847)
Victor49152 pushed a commit to Victor49152/vllm that referenced this pull request Nov 20, 2025
…ct#28938)

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
@ericcurtin (Contributor)

I see we released a new wheel with this fix for DGX Spark. Should we expect the aarch64 wheel to be compatible with DGX Spark and run accelerated workloads soon?

https://github.com/vllm-project/vllm/releases/tag/v0.11.2

@ericcurtin (Contributor)

Coffee time:

https://buymeacoffee.com/johnnycano

bhagyashrigai pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Nov 20, 2025
…ct#28938)

Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: Bhagyashri <Bhagyashri.Gaikwad2@ibm.com>
@johnnynunez (Contributor, Author)

> I see we released a new wheel with this fix for DGX Spark. Should we expect the aarch64 wheel to be compatible with DGX Spark and run accelerated workloads soon?
>
> https://github.com/vllm-project/vllm/releases/tag/v0.11.2

Yes, it is compatible.

@bbrowning (Contributor)

I happen to have a DGX Spark that I use daily for vLLM dev work anyway, so I tried the v0.11.2 release on it:

mkdir -p ~/tmp/vllm-v0.11.2
cd ~/tmp/vllm-v0.11.2
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install "vllm==v0.11.2" --torch-backend=auto

That all works fine, and pulls in CUDA 13 libs as expected:

...
 + nvidia-cublas==13.0.0.19                                                                                                                                                                                                                                                                     
 + nvidia-cuda-cupti==13.0.48
 + nvidia-cuda-nvrtc==13.0.48
 + nvidia-cuda-runtime==13.0.48
 + nvidia-cudnn-cu13==9.13.0.50
 + nvidia-cudnn-frontend==1.16.0
 + nvidia-cufft==12.0.0.15
 + nvidia-cufile==1.15.0.42
 + nvidia-curand==10.4.0.35
 + nvidia-cusolver==12.0.3.29
 + nvidia-cusparse==12.6.2.49
 + nvidia-cusparselt-cu13==0.8.0
 + nvidia-cutlass-dsl==4.2.1
 + nvidia-ml-py==13.580.82
 + nvidia-nccl-cu13==2.27.7
 + nvidia-nvjitlink==13.0.39
 + nvidia-nvshmem-cu13==3.3.24
 + nvidia-nvtx==13.0.39
 ...

But, when trying a simple test to serve openai/gpt-oss-20b (something I regularly do on vLLM builds from source), I get:

Traceback (most recent call last):
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/bin/vllm", line 4, in <module>
    from vllm.entrypoints.cli.main import main
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/__init__.py", line 3, in <module>
    from vllm.entrypoints.cli.benchmark.latency import BenchmarkLatencySubcommand
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/entrypoints/cli/benchmark/latency.py", line 5, in <module>
    from vllm.benchmarks.latency import add_cli_args, main                                                                                      
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/benchmarks/latency.py", line 17, in <module>              
    from vllm.engine.arg_utils import EngineArgs          
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 35, in <module>  
    from vllm.attention.backends.registry import AttentionBackendEnum
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/attention/__init__.py", line 4, in <module>
    from vllm.attention.backends.abstract import (                   
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/attention/backends/abstract.py", line 9, in <module>
    from vllm.model_executor.layers.linear import ColumnParallelLinear
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/model_executor/__init__.py", line 4, in <module>    
    from vllm.model_executor.parameter import BasevLLMParameter, PackedvLLMParameter
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/model_executor/parameter.py", line 11, in <module>
    from vllm.distributed import (                                                                                                              
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/distributed/__init__.py", line 4, in <module>     
    from .communication_op import *
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/distributed/communication_op.py", line 9, in <module>
    from .parallel_state import get_tp_group
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 250, in <module>
    direct_register_custom_op(              
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/utils/torch_utils.py", line 640, in direct_register_custom_op
    from vllm.platforms import current_platform
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/platforms/__init__.py", line 257, in __getattr__             
    _current_platform = resolve_obj_by_qualname(platform_cls_qualname)() 
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                          
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/utils/import_utils.py", line 89, in resolve_obj_by_qualname
    module = importlib.import_module(module_name)                     
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                               
  File "/usr/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                 
  File "/home/bbrowning/tmp/vllm-v0.11.2/.venv/lib/python3.12/site-packages/vllm/platforms/cuda.py", line 16, in <module>
    import vllm._C  # noqa                                     
    ^^^^^^^^^^^^^^                                                                                                                              
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
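
The error suggests the prebuilt wheel links against the CUDA 12 runtime (libcudart.so.12), while this environment only has the CUDA 13 libraries listed above. A quick way to confirm that (the exact extension filename may differ between wheels):

# CUDA toolkit version the installed torch wheel was built against
python -c "import torch; print(torch.version.cuda)"

# Shared-library dependencies of the vLLM native extension
ldd .venv/lib/python3.12/site-packages/vllm/_C*.so | grep -i cuda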

Do I need to do something differently with my install command to get the released wheel working?

@ericcurtin (Contributor) commented Nov 20, 2025

@bbrowning I think this gets you past that:

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130

Then you end up with:

libtorch_cuda.so

missing...

@bbrowning (Contributor)

@ericcurtin That's one of the steps I take when building from source, along with several others including python use_existing_torch.py so that installing vLLM from source does not overwrite my torch. But, trying to consume the release wheels, the following still results in the same error:

uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install "vllm==v0.11.2" --torch-backend=auto

That second command to install vLLM overwrites the torch, torchvision, and torchaudio I just installed above it.

I'm sure I can get the release installing from source with these kinds of steps. But since there was some indication that the released wheel might just work on DGX Spark, I was trying to do that.

@bbrowning (Contributor)

I was able to install release v0.11.2 via these commands on a DGX Spark:

mkdir -p ~/tmp/vllm-v0.11.2
cd ~/tmp/vllm-v0.11.2
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install "vllm==v0.11.2" --no-binary vllm --torch-backend=auto

That compiled from source without issue. So, while we don't have any wheel releases that work yet for DGX Spark, release v0.11.2 does install on the system without extra hacks.
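
As a quick sanity check after the install (just a suggestion):

# Should import cleanly and report 0.11.2 plus a CUDA 13.x toolkit
python -c "import vllm, torch; print(vllm.__version__, torch.version.cuda)"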

@ericcurtin (Contributor)

Does vllm serve "HuggingFaceTB/SmolLM-135M-Instruct" work with this installation technique?

@johnnynunez (Contributor, Author)

> Does vllm serve "HuggingFaceTB/SmolLM-135M-Instruct" work with this installation technique?

Just try it; right now the best backend for Spark is FlashInfer.
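
For example, assuming the usual VLLM_ATTENTION_BACKEND environment variable and any small model as a placeholder:

# Force the FlashInfer attention backend instead of the auto-selected one
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve HuggingFaceTB/SmolLM-135M-Instruct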

@bbrowning (Contributor)

> Does vllm serve "HuggingFaceTB/SmolLM-135M-Instruct" work with this installation technique?

Yes, it starts up without issue and I was able to send a simple chat completion request to it just to see some kind of generation working.

@ericcurtin (Contributor)

> Does vllm serve "HuggingFaceTB/SmolLM-135M-Instruct" work with this installation technique?
>
> Yes, it starts up without issue and I was able to send a simple chat completion request to it just to see some kind of generation working.

Most of the time, I have been installing without these flags:

--no-binary --torch-backend=auto

I wonder if that is the difference...

Iterations are slow at my house (an iteration takes about an hour with my bad bandwidth); thanks for answering.

@ericcurtin (Contributor)

I'd appreciate it if someone put together a simple:

FROM ubuntu:24.04

RUN
RUN

with commands like:

mkdir -p ~/tmp/vllm-v0.11.2
cd ~/tmp/vllm-v0.11.2
uv venv --python 3.12 --seed
source .venv/bin/activate
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu130
uv pip install "vllm==v0.11.2" --no-binary vllm --torch-backend=auto

that works. I feel like I've tried this 100 times, hitting new errors each time, and failed :'(

@bbrowning (Contributor)

@ericcurtin Oh, I'm doing this directly on my Spark and not inside a container. A container will need additional steps, but let me see what I can figure out.

@bbrowning (Contributor)

@ericcurtin I was able to build and run a functioning v0.11.2 container directly on my DGX Spark with the Dockerfile at https://gist.github.com/bbrowning/e2efe77b617b741a23ed31333a7ecba9 - it takes the first bits of the official vLLM container and installs from the release instead of from source, removing as many of the unnecessary bits as I could find to simplify things for this one use case.

Make sure to pass --gpus=all when running the built container. The entrypoint is set to vllm serve just like the official container, so pass whatever args you need after that.
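
For example (the image tag is a placeholder, and vLLM listens on port 8000 by default):

docker build -t vllm-spark:v0.11.2 .
docker run --gpus=all -p 8000:8000 vllm-spark:v0.11.2 HuggingFaceTB/SmolLM-135M-Instruct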
