Conversation

SageMoore
Contributor

@SageMoore SageMoore commented Sep 24, 2025

Purpose

Currently, vLLM outputs the following warning when capturing DBO cudagraphs:

/home/sage/git/nm-vllm/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:1166: UserWarning: Attempting to run cuBLAS, but there was no current CUDA context! Attempting to set the primary context... (Triggered internally at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:179.)
(EngineCore_DP0 pid=3571065)   return torch._C._cuda_getCurrentBlasHandle()
(EngineCore_DP1 pid=3571066) /home/sage/git/nm-vllm/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:1166: UserWarning: Attempting to run cuBLAS, but there was no current CUDA context! Attempting to set the primary context... (Triggered internally at /pytorch/aten/src/ATen/cuda/CublasHandlePool.cpp:179.)
(EngineCore_DP1 pid=3571066)   return torch._C._cuda_getCurrentBlasHandle()

This warning is completely benign, so we should suppress it.

Test Plan

Spun up a vLLM server with DBO enabled and confirmed that the message no longer appears.

Test Result

VLLM_ALL2ALL_BACKEND=deepep_low_latency vllm serve --model="deepseek-ai/DeepSeek-V2-Lite" --data-parallel-size 2 --enable-expert-parallel --gpu-memory-utilization 0.75 --enable-dbo

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.3567|±  |0.0277|
|     |       |strict-match    |     5|exact_match|↑  |0.3533|±  |0.0276|

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a benign cuBLAS warning that occurs during CUDA graph capture with DBO. The root cause is correctly identified as a missing CUDA context in the worker threads. The fix involves storing the device in the UBatchWrapper and explicitly setting it at the beginning of the _capture_ubatch_thread using torch.cuda.set_device(). This ensures a CUDA context is established before any cuBLAS operations are attempted, effectively resolving the warning. The change is clean, well-targeted, and correctly implemented. I have no further suggestions for improvement.
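For context, CUDA contexts are established per thread, so a freshly spawned capture thread has no current context until `torch.cuda.set_device()` (or another context-creating call) runs inside it. The snippet below is a minimal, torch-free sketch of the pattern the review describes: the names `UBatchWrapper` and `_capture_ubatch_thread` come from the review above, while `set_current_device`/`get_current_device` are hypothetical stand-ins that simulate per-thread CUDA context state (this is an illustration of the threading pattern, not the actual vLLM code):

```python
import threading

# Stand-in for per-thread CUDA context state: each thread starts with no
# "current device", mirroring how a new OS thread has no CUDA context.
_tls = threading.local()

def set_current_device(device: int) -> None:
    """Simulates torch.cuda.set_device(): binds a device in this thread."""
    _tls.device = device

def get_current_device():
    """Returns this thread's current device, or None if no context exists."""
    return getattr(_tls, "device", None)

class UBatchWrapper:
    """Sketch of the fix: store the device so worker threads can bind to it."""

    def __init__(self, device: int):
        self.device = device  # stored at construction, per the review
        self.seen_device_in_thread = None

    def _capture_ubatch_thread(self) -> None:
        # The fix: establish the device context *first*, before any
        # cuBLAS-backed work (which would otherwise warn) runs here.
        set_current_device(self.device)
        # ... graph capture / matmul work would happen here ...
        self.seen_device_in_thread = get_current_device()

    def capture(self) -> None:
        t = threading.Thread(target=self._capture_ubatch_thread)
        t.start()
        t.join()

wrapper = UBatchWrapper(device=0)
wrapper.capture()
print(wrapper.seen_device_in_thread)  # the worker thread saw device 0 bound
```

Without the explicit `set_current_device` call at the top of the thread body, `get_current_device()` would return `None` inside the worker, which is exactly the "no current CUDA context" condition that triggers the cuBLAS warning.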

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 24, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) September 24, 2025 16:53
@tlrmchlsmth tlrmchlsmth merged commit f84a472 into vllm-project:main Sep 24, 2025
55 checks passed
yewentao256 pushed a commit that referenced this pull request Oct 3, 2025
…5596)

Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Labels
ready ONLY add when PR is ready to merge/full CI is needed v1
2 participants