[Bug]: Get NCCL_ERROR_SYSTEM_ERROR with latest Docker vLLM image (v0.9.1) #19613

@TheRValiquette

Your current environment

I have been getting the NCCL_ERROR_SYSTEM_ERROR error since moving to the latest vLLM Docker image (v0.9.1).

I am running distributed vLLM on a 4-node Ray cluster where each node has 8 GPUs (NVIDIA RTX 2000) and 8 NIC ports (CX-5). vLLM does start correctly at launch time.

The crash happens when I start the HF Inference Benchmark. See details in the Describe the bug section below.

Note that my test works fine with vLLM v0.8.5.post1 (upgraded to Ray 2.4.6).

I am using the DeepSeek-R1-Distill-Llama-8B model, but the issue also happens with other models, such as Llama-4-Maverick-17B-128E-Instruct-FP8.
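
For reference, the crash is triggered as soon as the first chat-completion request from the benchmark arrives. A minimal sketch of an equivalent request (same ${HEAD_NODE_IP}, ${VLLM_PORT}, and ${MODEL_NAME} placeholders as in the launcher command below; the prompt is the one visible in the log):

curl "http://${HEAD_NODE_IP}:${VLLM_PORT}/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"${MODEL_NAME}\",
         \"messages\": [{\"role\": \"user\", \"content\": \"Java add to the arraylist of a class type\"}],
         \"max_tokens\": 128}"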

Current component versions

root@echo:/vllm-workspace# pip show vllm 
Name: vllm
Version: 0.9.1

root@echo:/vllm-workspace# pip show ray
Name: ray
Version: 2.46.0

NV_LIBNCCL_PACKAGE=libnccl2=2.25.1-1+cuda12.8
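
To double-check which NCCL build is actually loaded at runtime (as opposed to the libnccl2 package reported above), something like the following can be run inside the container on each node; this is a sketch, not output I captured for this report:

# Print the PyTorch version and the NCCL version PyTorch links against
python3 -c "import torch; print(torch.__version__, torch.cuda.nccl.version())"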

🐛 Describe the bug

As noted above, I am running distributed vLLM on a 4-node Ray cluster where each node has 8 GPUs (NVIDIA RTX 2000) and 8 NIC ports (CX-5). vLLM starts correctly at launch time; the crash happens when I start the HF Inference Benchmark.

Here is my vLLM launcher command:

VLLM_HOST_IP=${HEAD_NODE_IP} python3 -m vllm.entrypoints.openai.api_server \
    --model ${MODEL_NAME} \
    --host 0.0.0.0 \
    --port ${VLLM_PORT} \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 4 \
    --max-model-len 8192 \
    --max-num-batched-tokens 8192 \
    --max-num-seqs 256 \
    --gpu-memory-utilization 0.95 \
    --disable-custom-all-reduce \
    --distributed-executor-backend ray
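
The final error message suggests running with NCCL_DEBUG=INFO; the log below already contains NCCL INFO lines, but narrowing the output with NCCL_DEBUG_SUBSYS may make the failing init easier to spot. A sketch (exported on every node before launching, assuming the variables propagate to the Ray worker processes):

# Verbose NCCL logging, narrowed to communicator init and network setup
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET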

Here is the log output:

INFO:     Application startup complete.
INFO 06-13 07:35:57 [chat_utils.py:420] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
INFO 06-13 07:35:57 [logger.py:43] Received request chatcmpl-1d0fbaa8547d44d9b43101e0ffb998a0: prompt: '<|begin▁of▁sentence|><|User|>Java add to the arraylist of a class type<|Assistant|><think>\n', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.0, top_p=1.0, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=706, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: None, prompt_embeds shape: None, lora_request: None, prompt_adapter_request: None.
INFO:     127.0.0.1:41822 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 06-13 07:35:57 [async_llm.py:271] Added request chatcmpl-1d0fbaa8547d44d9b43101e0ffb998a0.
INFO 06-13 07:35:57 [ray_distributed_executor.py:562] VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE = auto
INFO 06-13 07:35:57 [ray_distributed_executor.py:564] VLLM_USE_RAY_COMPILED_DAG_OVERLAP_COMM = False
INFO 06-13 07:35:57 [ray_distributed_executor.py:579] RAY_CGRAPH_get_timeout is set to 100
2025-06-13 07:35:58,526	INFO torch_tensor_nccl_channel.py:772 -- Creating NCCL group 3200c040-d2be-4c99-842b-730529383025 on actors: [Actor(RayWorkerWrapper, bebeb5f99868994a4998a8fd01000000), Actor(RayWorkerWrapper, 93e17680bba685c71a729e0001000000), Actor(RayWorkerWrapper, e547e19cf6c6f549f6ea466401000000), Actor(RayWorkerWrapper, 204ff50d2953c8c03cc7425401000000), Actor(RayWorkerWrapper, e8b9319c7f06769ee3cdc13001000000), Actor(RayWorkerWrapper, b918f94811a9ec8999019d5101000000), Actor(RayWorkerWrapper, 30f3b37c2a57b59fabc7370e01000000), Actor(RayWorkerWrapper, 745668516386b99e44c927af01000000), Actor(RayWorkerWrapper, 9b859d42ef61170ebf74641401000000), Actor(RayWorkerWrapper, e2de0ed35e014120c90b19c201000000), Actor(RayWorkerWrapper, 248270ebe94b42510319457001000000), Actor(RayWorkerWrapper, b2896eab02ace9e27fcdaf7d01000000), Actor(RayWorkerWrapper, f383873f4a00245053e6156301000000), Actor(RayWorkerWrapper, fb380e494adb6bfb9cdc889e01000000), Actor(RayWorkerWrapper, 675bd5f78f61325782c9aaab01000000), Actor(RayWorkerWrapper, 509a8b78e385c7847cc7598701000000), Actor(RayWorkerWrapper, fe35d99a04ae968d109b957a01000000), Actor(RayWorkerWrapper, 8c477867837cbe6993f38d4101000000), Actor(RayWorkerWrapper, ed936fd7dd6af7d365da038901000000), Actor(RayWorkerWrapper, c350dabff57c320662c2672f01000000), Actor(RayWorkerWrapper, 925b397e1d19eef5b1ee573101000000), Actor(RayWorkerWrapper, e677615e7158b3c1cc40e09a01000000), Actor(RayWorkerWrapper, 0c2c5710f757a68c93b3fc1701000000), Actor(RayWorkerWrapper, 63fa9453dd73a3ad81b1e72601000000), Actor(RayWorkerWrapper, 0f4ec1abad6a7ffd98c45c9701000000), Actor(RayWorkerWrapper, 012b1a3eff432fea33a1876901000000), Actor(RayWorkerWrapper, 1a03881567a0986c6e09ff7d01000000), Actor(RayWorkerWrapper, 1c5cb5e3433fbb6aeda5d96601000000), Actor(RayWorkerWrapper, 3dfea73f20cc2d995555c63801000000), Actor(RayWorkerWrapper, f2c1744dcc05e9d924e7611601000000), Actor(RayWorkerWrapper, 85bc08f778b9e6cf3537e26101000000), Actor(RayWorkerWrapper, ccdc287311ec09ae70c5c7a701000000)]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Init timings - ncclCommInitRankConfig: rank 1 nranks 8 total 0.09 (kernels 0.00, alloc 0.00, bootstrap 0.00, allgathers 0.00, topo 0.03, graphs 0.00, connections 0.05, rest 0.00)
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 
(RayWorkerWrapper pid=933) INFO 06-13 07:35:31 [gpu_model_runner.py:2048] Graph capturing finished in 28 secs, took 0.25 GiB [repeated 31x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO P2P Chunksize set to 131072 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3522 [1] NCCL INFO [Proxy Service] Device 1 CPU core 40 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3531 [1] NCCL INFO [Proxy Service UDS] Device 1 CPU core 34 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3538 [1] NCCL INFO [Proxy Progress] Device 1 CPU core 11 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO Channel 00/0 : 1[1] -> 3[1] [receive] via NET/IB/9/GDRDMA [repeated 6x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO NCCL_NET_GDR_READ set by environment to 1.
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO Channel 00/0 : 3[1] -> 1[1] [send] via NET/IB/9/GDRDMA [repeated 6x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:1664 [1] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Connected all rings, use ring PXN 0 GDR 1 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Connected all trees [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO Connected binomial trees
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512 [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO 2 coll channels, 2 collnet channels, 0 nvls channels, 2 p2p channels, 2 p2p channels per peer [repeated 2x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO ncclCommInitRank comm 0x26bafab0 rank 3 nranks 4 cudaDev 1 nvmlDev 1 busId 6000 commId 0x27da8d0f99f5ca84 - Init COMPLETE
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO Init timings - ncclCommInitRank: rank 3 nranks 4 total 0.15 (kernels 0.00, alloc 0.00, bootstrap 0.00, allgathers 0.00, topo 0.01, graphs 0.00, connections 0.13, rest 0.00)
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:291 [1] NCCL INFO Comm config Blocking set to 1
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Using network IB
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO DMA-BUF is available on GPU device 1
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO ncclCommInitRankConfig comm 0x450b0f20 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 6000 commId 0x637996529902fd29 - Init START
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Bootstrap timings total 0.000820 (create 0.000045, send 0.000188, recv 0.000145, ring 0.000244, delay 0.000000)
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Setting affinity for GPU 1 to ffff,0000ffff
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO NVLS multicast support is not available on dev 1
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO comm 0x450b0f20 rank 1 nRanks 8 nNodes 1 localRanks 8 localRank 1 MNNVL 0
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO Channel 01/0 : 1[1] -> 0[0] via P2P/IPC [repeated 4x across cluster]
(RayWorkerWrapper pid=291, ip=10.0.0.168) kilo:291:3517 [1] NCCL INFO ncclCommInitRankConfig comm 0x450b0f20 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 6000 commId 0x637996529902fd29 - Init COMPLETE
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2] NCCL INFO NCCL_NET_GDR_READ set by environment to 1.
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:1674 [2] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2] NCCL INFO Connected binomial trees
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2] NCCL INFO ncclCommInitRank comm 0x16266fa0 rank 3 nranks 4 cudaDev 2 nvmlDev 2 busId 9000 commId 0xd68ee9af498f33ac - Init COMPLETE
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2] NCCL INFO Init timings - ncclCommInitRank: rank 3 nranks 4 total 0.16 (kernels 0.00, alloc 0.00, bootstrap 0.00, allgathers 0.00, topo 0.01, graphs 0.00, connections 0.14, rest 0.00)
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2] NCCL INFO Comm config Blocking set to 1
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO Using network IB
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO DMA-BUF is available on GPU device 2
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO ncclCommInitRankConfig comm 0x34766920 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 9000 commId 0x637996529902fd29 - Init START
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO Bootstrap timings total 0.003778 (create 0.000045, send 0.000155, recv 0.000721, ring 0.000297, delay 0.000000)
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO Setting affinity for GPU 2 to ffff,0000ffff
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO NVLS multicast support is not available on dev 2
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO comm 0x34766920 rank 2 nRanks 8 nNodes 1 localRanks 8 localRank 2 MNNVL 0
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:3516 [2] NCCL INFO ncclCommInitRankConfig comm 0x34766920 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 9000 commId 0x637996529902fd29 - Init COMPLETE
(RayWorkerWrapper pid=292, ip=10.0.0.168) kilo:292:292 [2
(RayWorkerWrapper pid=292, ip=10.0.0.145) november:292:292 [2] NCCL INFO ncclCommInitRank comm 0x72a0e9d0 rank 0 nranks 32 cudaDev 2 nvmlDev 2 busId 9000 commId 0x49a0ca1d8831e923 - Init START
(RayWorkerWrapper pid=292, ip=10.0.0.145) november:292:292 [2] NCCL INFO Channel 00/02 :  0 23 11 25  2  3 16 18  1 30  5 17  7 20 12 31  4 15 26 29 14 22 21 28  6 27  8 13  9 24 10 19
(RayWorkerWrapper pid=292, ip=10.0.0.145) november:292:292 [2] NCCL INFO Channel 01/02 :  0 23 11 25  2  3 16 18  1 30  5 17  7 20 12 31  4 15 26 29 14 22 21 28  6 27  8 13  9 24 10 19
(RayWorkerWrapper pid=292, ip=10.0.0.145) november:292:292 [2] NCCL INFO Check P2P Type intraNodeP2pSupport 0 directMode 0
(RayWorkerWrapper pid=292, ip=10.0.0.145) november:292
ERROR 06-13 07:35:59 [core.py:517] EngineCore encountered a fatal error.
ERROR 06-13 07:35:59 [core.py:517] Traceback (most recent call last):
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 508, in run_engine_core
ERROR 06-13 07:35:59 [core.py:517]     engine_core.run_busy_loop()
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 535, in run_busy_loop
ERROR 06-13 07:35:59 [core.py:517]     self._process_engine_step()
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 560, in _process_engine_step
ERROR 06-13 07:35:59 [core.py:517]     outputs, model_executed = self.step_fn()
ERROR 06-13 07:35:59 [core.py:517]                               ^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 263, in step_with_batch_queue
ERROR 06-13 07:35:59 [core.py:517]     future = self.model_executor.execute_model(scheduler_output)
ERROR 06-13 07:35:59 [core.py:517]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/ray_distributed_executor.py", line 52, in execute_model
ERROR 06-13 07:35:59 [core.py:517]     self.forward_dag = self._compiled_ray_dag(enable_asyncio=False)
ERROR 06-13 07:35:59 [core.py:517]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 630, in _compiled_ray_dag
ERROR 06-13 07:35:59 [core.py:517]     return forward_dag.experimental_compile(
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/dag/dag_node.py", line 340, in experimental_compile
ERROR 06-13 07:35:59 [core.py:517]     return build_compiled_dag_from_ray_dag(
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 3315, in build_compiled_dag_from_ray_dag
ERROR 06-13 07:35:59 [core.py:517]     compiled_dag._get_or_compile()
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 1564, in _get_or_compile
ERROR 06-13 07:35:59 [core.py:517]     self._preprocess()
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 1299, in _preprocess
ERROR 06-13 07:35:59 [core.py:517]     self._init_communicators()
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/dag/compiled_dag_node.py", line 1357, in _init_communicators
ERROR 06-13 07:35:59 [core.py:517]     p2p_communicator_id = _init_communicator(
ERROR 06-13 07:35:59 [core.py:517]                           ^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/torch_tensor_nccl_channel.py", line 790, in _init_communicator
ERROR 06-13 07:35:59 [core.py:517]     ray.get(init_tasks, timeout=30)
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
ERROR 06-13 07:35:59 [core.py:517]     return fn(*args, **kwargs)
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
ERROR 06-13 07:35:59 [core.py:517]     return func(*args, **kwargs)
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2822, in get
ERROR 06-13 07:35:59 [core.py:517]     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
ERROR 06-13 07:35:59 [core.py:517]                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 930, in get_objects
ERROR 06-13 07:35:59 [core.py:517]     raise value.as_instanceof_cause()
ERROR 06-13 07:35:59 [core.py:517] ray.exceptions.RayTaskError(NcclError): ray::RayWorkerWrapper.__ray_call__() (pid=296, ip=10.0.0.168, actor_id=e8b9319c7f06769ee3cdc13001000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7a977d3b0bf0>)
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/actor.py", line 1739, in __ray_call__
ERROR 06-13 07:35:59 [core.py:517]     return fn(self, *args, **kwargs)
ERROR 06-13 07:35:59 [core.py:517]            ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/torch_tensor_nccl_channel.py", line 658, in _do_init_communicator
ERROR 06-13 07:35:59 [core.py:517]     ctx.communicators[group_id] = _NcclGroup(
ERROR 06-13 07:35:59 [core.py:517]                                   ^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "/usr/local/lib/python3.12/dist-packages/ray/experimental/channel/nccl_group.py", line 91, in __init__
ERROR 06-13 07:35:59 [core.py:517]     self._comm = self.nccl_util.NcclCommunicator(world_size, comm_id, rank)
ERROR 06-13 07:35:59 [core.py:517]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-13 07:35:59 [core.py:517]   File "cupy_backends/cuda/libs/nccl.pyx", line 283, in cupy_backends.cuda.libs.nccl.NcclCommunicator.__init__
ERROR 06-13 07:35:59 [core.py:517]   File "cupy_backends/cuda/libs/nccl.pyx", line 129, in cupy_backends.cuda.libs.nccl.check_status
ERROR 06-13 07:35:59 [core.py:517] cupy_backends.cuda.libs.nccl.NcclError: NCCL_ERROR_SYSTEM_ERROR: unhandled system error (run with NCCL_DEBUG=INFO for details)
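
The traceback shows the failure inside Ray's compiled-graph NCCL channel setup (ray.experimental.channel.torch_tensor_nccl_channel), while the 32-rank cross-node communicator over IB/GDRDMA is being created. A rough inter-node IB sanity check between two of the hosts that appear in the log could look like the following; this assumes the perftest tools are installed and that mlx5_0 is one of the CX-5 devices (both are assumptions for illustration):

# On november (10.0.0.145): start a bandwidth-test server on one CX-5 port
ib_write_bw -d mlx5_0

# On kilo (10.0.0.168): connect to it
ib_write_bw -d mlx5_0 10.0.0.145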

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
