
[Bug]: AttributeError: 'MultiprocExecutor' object has no attribute 'workers' when VLLM_USE_V1=1 on rocm platform serve deepseek-r1 671B #17533

Closed

Description

@GuoxiangZu

Your current environment

The output of `python collect_env.py`:
INFO 05-01 12:11:03 [__init__.py:239] Automatically detected platform rocm.
Collecting environment information...
PyTorch version: 2.7.0a0+git295f2ed
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.4.43482-0f2d60242

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: 19.0.0git (https://github.com/RadeonOpenCompute/llvm-project roc-6.4.0 25133 c7fe45cf4b819c5991fe208aaa96edf142730f1d)
CMake version: version 3.31.6
Libc version: glibc-2.35

Python version: 3.12.10 (main, Apr  9 2025, 08:55:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.15.0-25-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Instinct MI3**X (gfx942:sramecc+:xnack-)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.4.43482
MIOpen runtime version: 3.4.0
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Address sizes:                   52 bits physical, 57 bits virtual
Byte Order:                      Little Endian
CPU(s):                          192
On-line CPU(s) list:             0-191
Vendor ID:                       GenuineIntel
Model name:                      Intel(R) Xeon(R) Platinum 8468V
CPU family:                      6
Model:                           143
Thread(s) per core:              2
Core(s) per socket:              48
Socket(s):                       2
Stepping:                        8
Frequency boost:                 enabled
CPU max MHz:                     2401.0000
CPU min MHz:                     800.0000
BogoMIPS:                        4800.00
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk pconfig arch_lbr avx512_fp16 flush_l1d arch_capabilities
Virtualization:                  VT-x
L1d cache:                       4.5 MiB (96 instances)
L1i cache:                       3 MiB (96 instances)
L2 cache:                        192 MiB (96 instances)
L3 cache:                        195 MiB (2 instances)
NUMA node(s):                    2
NUMA node0 CPU(s):               0-47,96-143
NUMA node1 CPU(s):               48-95,144-191
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.7.0a0+git295f2ed
[pip3] torchvision==0.21.0+7af6987
[pip3] transformers==4.51.3
[pip3] triton==3.3.0+git981e987e
[conda] Could not collect
ROCM Version: 6.4.43482-0f2d60242
Neuron SDK Version: N/A
vLLM Version: 0.8.5.dev5+g41b85b6ed (git sha: 41b85b6ed)
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            15           15           15           15           15           15           15
GPU1   15           0            15           15           15           15           15           15
GPU2   15           15           0            15           15           15           15           15
GPU3   15           15           15           0            15           15           15           15
GPU4   15           15           15           15           0            15           15           15
GPU5   15           15           15           15           15           0            15           15
GPU6   15           15           15           15           15           15           0            15
GPU7   15           15           15           15           15           15           15           0

================================= Hops between two GPUs ==================================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            1            1            1            1            1            1            1
GPU1   1            0            1            1            1            1            1            1
GPU2   1            1            0            1            1            1            1            1
GPU3   1            1            1            0            1            1            1            1
GPU4   1            1            1            1            0            1            1            1
GPU5   1            1            1            1            1            0            1            1
GPU6   1            1            1            1            1            1            0            1
GPU7   1            1            1            1            1            1            1            0

=============================== Link Type between two GPUs ===============================
       GPU0         GPU1         GPU2         GPU3         GPU4         GPU5         GPU6         GPU7
GPU0   0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU1   XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI         XGMI
GPU2   XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI         XGMI
GPU3   XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI         XGMI
GPU4   XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI         XGMI
GPU5   XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI         XGMI
GPU6   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0            XGMI
GPU7   XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         XGMI         0

======================================= Numa Nodes =======================================
GPU[0]          : (Topology) Numa Node: 0
GPU[0]          : (Topology) Numa Affinity: 0
GPU[1]          : (Topology) Numa Node: 0
GPU[1]          : (Topology) Numa Affinity: 0
GPU[2]          : (Topology) Numa Node: 0
GPU[2]          : (Topology) Numa Affinity: 0
GPU[3]          : (Topology) Numa Node: 0
GPU[3]          : (Topology) Numa Affinity: 0
GPU[4]          : (Topology) Numa Node: 1
GPU[4]          : (Topology) Numa Affinity: 1
GPU[5]          : (Topology) Numa Node: 1
GPU[5]          : (Topology) Numa Affinity: 1
GPU[6]          : (Topology) Numa Node: 1
GPU[6]          : (Topology) Numa Affinity: 1
GPU[7]          : (Topology) Numa Node: 1
GPU[7]          : (Topology) Numa Affinity: 1
================================== End of ROCm SMI Log ===================================

PYTORCH_ROCM_ARCH=gfx90a;gfx942;gfx1100;gfx1101;gfx1200;gfx1201
LD_LIBRARY_PATH=/opt/rocm/lib:/usr/local/lib:
NCCL_CUMEM_ENABLE=0
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
CUDA_MODULE_LOADING=LAZY

🐛 Describe the bug

With vLLM 0.8.5 (the same issue occurs with 0.8.3; I have not tried 0.8.4, but it most likely behaves the same), on a ROCm platform (AMD MI3** GPU) with the V1 engine enabled, `vllm serve` on DeepSeek-R1 671B fails to start: each worker dies with `ERROR 05-01 12:18:50 [multiproc_executor.py:435] AttributeError: 'GPUModelRunner' object has no attribute 'runner'`.

```bash
export VLLM_CONFIGURE_LOGGING=1
export VLLM_USE_V1=1

vllm serve /model/deepseek-r1 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```
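
Until the root cause is fixed, one plausible workaround (untested on this exact setup; the startup log below flags VLLM_USE_V1=1 on ROCm as experimental) is to fall back to the V0 engine, which does not go through the failing `GPUModelRunner` path:

```bash
# Possible workaround (unverified): disable the experimental V1 engine
# so vLLM falls back to the V0 engine code path on ROCm.
export VLLM_CONFIGURE_LOGGING=1
export VLLM_USE_V1=0

vllm serve /model/deepseek-r1 \
    --enable-reasoning \
    --reasoning-parser deepseek_r1 \
    --tensor-parallel-size 8 \
    --trust-remote-code
```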

Below is the full error log:

INFO 05-01 12:17:52 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:03 [api_server.py:1043] vLLM API server version 0.8.5.dev5+g41b85b6ed
INFO 05-01 12:18:03 [api_server.py:1044] args: Namespace(subparser='serve', model_tag='/model/deepseek-r1', config='', host=None, port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/model/deepseek-r1', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config={}, use_tqdm_on_load=True, config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', max_model_len=None, guided_decoding_backend='auto', reasoning_parser='deepseek_r1', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=8, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, gpu_memory_utilization=0.9, swap_space=4, kv_cache_dtype='auto', num_gpu_blocks_override=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', cpu_offload_gb=0, calculate_kv_scales=False, disable_sliding_window=False, use_v2_block_manager=True, seed=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config={}, limit_mm_per_prompt={}, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=None, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=None, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', speculative_config=None, ignore_patterns=[], served_model_name=None, qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, max_num_batched_tokens=None, max_num_seqs=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, num_lookahead_slots=0, scheduler_delay_factor=0.0, preemption_mode=None, num_scheduler_steps=1, multi_step_stream_outputs=True, scheduling_policy='fcfs', enable_chunked_prefill=None, disable_chunked_mm_input=False, scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, additional_config=None, enable_reasoning=True, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function 
ServeSubcommand.cmd at 0x7f1d1112d940>)
INFO 05-01 12:18:03 [config.py:209] Replacing legacy 'type' key with 'rope_type'
INFO 05-01 12:18:17 [config.py:717] This model supports multiple tasks: {'reward', 'score', 'embed', 'classify', 'generate'}. Defaulting to 'generate'.
WARNING 05-01 12:18:18 [arg_utils.py:1676] Detected VLLM_USE_V1=1 with rocm. Usage should be considered experimental. Please report any issues on Github.
INFO 05-01 12:18:18 [config.py:1739] Defaulting to use mp for distributed inference
INFO 05-01 12:18:18 [config.py:1983] Chunked prefill is enabled with max_num_batched_tokens=8192.
WARNING 05-01 12:18:18 [fp8.py:63] Detected fp8 checkpoint. Please note that the format is experimental and subject to change.
INFO 05-01 12:18:21 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:30 [core.py:57] Initializing a V1 LLM engine (v0.8.5.dev5+g41b85b6ed) with config: model='/model/deepseek-r1', speculative_config=None, tokenizer='/model/deepseek-r1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=163840, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=fp8, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend='deepseek_r1'), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/model/deepseek-r1, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["+rms_norm","+silu_and_mul"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512}
WARNING 05-01 12:18:30 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 05-01 12:18:30 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7], buffer_handle=(8, 10485760, 10, 'psm_980beeb4'), local_subscribe_addr='ipc:///tmp/0fe9dd22-b3ce-41d7-8f90-f691bc92576d', remote_subscribe_addr=None, remote_addr_ipv6=False)
INFO 05-01 12:18:33 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
INFO 05-01 12:18:34 [__init__.py:239] Automatically detected platform rocm.
WARNING 05-01 12:18:43 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f8f4830dd90>
(VllmWorker rank=1 pid=2813) INFO 05-01 12:18:43 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_b28735dc'), local_subscribe_addr='ipc:///tmp/89290b02-accb-4cc2-847f-d10e02300cd5', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:43 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7ef88c0b5d90>
(VllmWorker rank=4 pid=2816) INFO 05-01 12:18:43 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_f75a830b'), local_subscribe_addr='ipc:///tmp/66c23883-fa78-4d9a-b2cf-d4b5218315c6', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f2c9b66ef00>
(VllmWorker rank=6 pid=2818) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_c7ac618b'), local_subscribe_addr='ipc:///tmp/7b4a7298-bc8f-4e56-8aa9-680722991a71', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7f4c9c901eb0>
(VllmWorker rank=2 pid=2814) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_f7eeeb6f'), local_subscribe_addr='ipc:///tmp/393ff790-a6fc-43b9-988b-cf01e69b8194', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fbc87ed0350>
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_bd5b14d8'), local_subscribe_addr='ipc:///tmp/4247edba-4983-42df-ae5b-06727334066d', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fc5796d0a70>
(VllmWorker rank=5 pid=2817) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_3064b7e9'), local_subscribe_addr='ipc:///tmp/189de3f1-36d2-4fbd-84fe-3859e1bf7ab0', remote_subscribe_addr=None, remote_addr_ipv6=False)
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7ff7526acb00>
WARNING 05-01 12:18:44 [utils.py:2686] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0x7fb3b0dfb590>
(VllmWorker rank=7 pid=2819) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_ff30da6f'), local_subscribe_addr='ipc:///tmp/6bfaa5f3-f373-4ca5-a770-91c83d691147', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=3 pid=2815) INFO 05-01 12:18:44 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_6e130156'), local_subscribe_addr='ipc:///tmp/9747cb0c-b19e-443a-b765-10b22aba7ea9', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=5 pid=2817) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=3 pid=2815) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=5 pid=2817) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=3 pid=2815) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=6 pid=2818) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=6 pid=2818) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=2 pid=2814) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=2 pid=2814) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=7 pid=2819) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=7 pid=2819) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=1 pid=2813) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=4 pid=2816) INFO 05-01 12:18:45 [utils.py:1197] Found nccl from library librccl.so.1
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=1 pid=2813) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=4 pid=2816) INFO 05-01 12:18:45 [pynccl.py:69] vLLM is using nccl==2.22.3
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:49 [shm_broadcast.py:266] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_6b8de5a9'), local_subscribe_addr='ipc:///tmp/1542f54a-b90f-433a-bfc8-a140ecc410e2', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorker rank=6 pid=2818) INFO 05-01 12:18:49 [parallel_state.py:946] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6
(VllmWorker rank=4 pid=2816) INFO 05-01 12:18:49 [parallel_state.py:946] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4
(VllmWorker rank=3 pid=2815) INFO 05-01 12:18:49 [parallel_state.py:946] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3
(VllmWorker rank=5 pid=2817) INFO 05-01 12:18:49 [parallel_state.py:946] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5
(VllmWorker rank=1 pid=2813) INFO 05-01 12:18:49 [parallel_state.py:946] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:49 [parallel_state.py:946] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorker rank=2 pid=2814) INFO 05-01 12:18:49 [parallel_state.py:946] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2
(VllmWorker rank=7 pid=2819) INFO 05-01 12:18:49 [parallel_state.py:946] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7
(VllmWorker rank=2 pid=2814) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
(VllmWorker rank=3 pid=2815) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
(VllmWorker rank=6 pid=2818) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
(VllmWorker rank=4 pid=2816) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
(VllmWorker rank=1 pid=2813) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435] WorkerProc failed to start.
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435] Traceback (most recent call last):
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 409, in worker_main
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 305, in __init__
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     self.worker.init_device()
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 604, in init_device
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     self.worker.init_device()  # type: ignore
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 141, in init_device
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     self.model_runner: GPUModelRunner = GPUModelRunner(
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]                                         ^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 141, in __init__
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     self.attn_metadata_builder = self.attn_backend.get_builder_cls()(
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]   File "/usr/local/lib/python3.12/dist-packages/vllm/attention/backends/mla/common.py", line 746, in __init__
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]     self.runner = input_builder.runner
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435]                   ^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=2814) ERROR 05-01 12:18:50 [multiproc_executor.py:435] AttributeError: 'GPUModelRunner' object has no attribute 'runner'
(VllmWorker rank=0 pid=2812) INFO 05-01 12:18:50 [rocm.py:155] Using Triton MLA backend.
[rank0]:[W501 12:18:50.060757128 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
ERROR 05-01 12:18:54 [core.py:395] EngineCore failed to start.
ERROR 05-01 12:18:54 [core.py:395] Traceback (most recent call last):
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 386, in run_engine_core
ERROR 05-01 12:18:54 [core.py:395]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 05-01 12:18:54 [core.py:395]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 328, in __init__
ERROR 05-01 12:18:54 [core.py:395]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 63, in __init__
ERROR 05-01 12:18:54 [core.py:395]     self.model_executor = executor_class(vllm_config)
ERROR 05-01 12:18:54 [core.py:395]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 05-01 12:18:54 [core.py:395]     self._init_executor()
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 91, in _init_executor
ERROR 05-01 12:18:54 [core.py:395]     self.workers = WorkerProc.wait_for_ready(unready_workers)
ERROR 05-01 12:18:54 [core.py:395]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-01 12:18:54 [core.py:395]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 370, in wait_for_ready
ERROR 05-01 12:18:54 [core.py:395]     raise e from None
ERROR 05-01 12:18:54 [core.py:395] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 399, in run_engine_core
    raise e
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 386, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 328, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 63, in __init__
    self.model_executor = executor_class(vllm_config)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 52, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 91, in _init_executor
    self.workers = WorkerProc.wait_for_ready(unready_workers)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 370, in wait_for_ready
    raise e from None
Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Traceback (most recent call last):
  File "/usr/lib/python3.12/weakref.py", line 666, in _exitfunc
    f()
  File "/usr/lib/python3.12/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 228, in shutdown
    for w in self.workers:
             ^^^^^^^^^^^^
AttributeError: 'MultiprocExecutor' object has no attribute 'workers'
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 53, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1078, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 143, in from_vllm_config
    return cls(
           ^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 102, in __init__
    self.engine_core = core_client_class(
                       ^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 621, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 396, in __init__
    self._wait_for_engine_startup()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 422, in _wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above.
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
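
Some notes for triage. The `AttributeError: 'MultiprocExecutor' object has no attribute 'workers'` in the title is only a secondary symptom: `self.workers` is assigned by `WorkerProc.wait_for_ready(...)` in `_init_executor`, so when worker startup raises, the exit-time shutdown hook runs against an executor that never got a `workers` attribute. The real failure is the first traceback: the V1 `GPUModelRunner` apparently passes itself to the MLA metadata builder in `vllm/attention/backends/mla/common.py`, whose `__init__` assumes the V0 interface and reads `input_builder.runner`. A minimal, self-contained sketch of that mismatch (class and attribute names are simplified stand-ins for illustration, not the exact vLLM source):

```python
# Simplified sketch of the interface mismatch behind the first traceback.
# Names are illustrative stand-ins, not the real vLLM classes.

class V0InputBuilder:
    """V0-style input builder: carries a reference back to its runner."""
    def __init__(self, runner):
        self.runner = runner

class V1GPUModelRunner:
    """V1-style runner: has no `.runner` attribute of its own."""
    pass

class MLACommonMetadataBuilder:
    def __init__(self, input_builder):
        # Equivalent of common.py line 746: assumes the V0 interface.
        self.runner = input_builder.runner  # AttributeError for a V1 runner

# Works with the V0 plumbing:
MLACommonMetadataBuilder(V0InputBuilder(runner=object()))

# Fails the way the log shows when the V1 runner is passed directly:
try:
    MLACommonMetadataBuilder(V1GPUModelRunner())
except AttributeError as e:
    print(e)  # 'V1GPUModelRunner' object has no attribute 'runner'
```

If that reading is right, the fix presumably belongs in vLLM itself (a V1-aware MLA metadata builder, or guarding the Triton MLA backend selection under V1 on ROCm) rather than anything environment-specific.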


Labels: bug (Something isn't working), rocm (Related to AMD ROCm)

Status: Done