[Bug]: Docker 多机多卡部署--模型启动报错 #13145

Closed as not planned
@Tian14267

Description

Your current environment

The output of `python collect_env.py`
Node-1:  4 * 4090
Node-2:  4 * 4090

🐛 Describe the bug

I deployed vLLM with Docker for multi-node, multi-GPU inference, and hit an error when starting the DeepSeek-R1-Distill-Llama-70B model.
Docker image: vllm/vllm-openai:v0.7.2
Error output:

(RayWorkerWrapper pid=319, ip=10.68.27.14) WARNING 02-12 01:21:27 custom_all_reduce.py:136] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly. [repeated 6x across cluster]
Loading safetensors checkpoint shards:   0% Completed | 0/16 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:   6% Completed | 1/16 [00:00<00:07,  1.88it/s]
Loading safetensors checkpoint shards:  19% Completed | 3/16 [00:04<00:20,  1.61s/it]
Loading safetensors checkpoint shards:  25% Completed | 4/16 [00:10<00:36,  3.00s/it]
Loading safetensors checkpoint shards:  31% Completed | 5/16 [00:15<00:41,  3.76s/it]
Loading safetensors checkpoint shards:  38% Completed | 6/16 [00:15<00:26,  2.68s/it]
Loading safetensors checkpoint shards:  44% Completed | 7/16 [00:20<00:29,  3.27s/it]
Loading safetensors checkpoint shards:  50% Completed | 8/16 [00:20<00:18,  2.31s/it]
Loading safetensors checkpoint shards:  56% Completed | 9/16 [00:20<00:11,  1.64s/it]
Loading safetensors checkpoint shards:  62% Completed | 10/16 [00:25<00:14,  2.47s/it]
Loading safetensors checkpoint shards:  69% Completed | 11/16 [00:25<00:08,  1.76s/it]
Loading safetensors checkpoint shards:  75% Completed | 12/16 [00:27<00:07,  1.91s/it]
Loading safetensors checkpoint shards:  81% Completed | 13/16 [00:27<00:04,  1.38s/it]
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Error executing method 'load_model'. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Traceback (most recent call last):
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     return run_method(target, method, args, kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     return func(*args, **kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]            ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     self.model_runner.load_model()
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     self.model = get_model(vllm_config=self.vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     return loader.load_model(vllm_config=vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574]     raise ValueError(
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ValueError: Following weights were not initialized from checkpoint: {'model.layers.46.input_layernorm.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight'}
(RayWorkerWrapper pid=828) INFO 02-12 01:21:08 model_runner.py:1110] Starting to load model /root/deepseek-ai/DeepSeek-R1-Distill-Llama-70B... [repeated 6x across cluster]
Loading safetensors checkpoint shards:  88% Completed | 14/16 [00:29<00:03,  1.60s/it]
Loading safetensors checkpoint shards:  94% Completed | 15/16 [00:30<00:01,  1.40s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00,  1.04s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00,  1.93s/it]

INFO 02-12 01:21:39 model_runner.py:1115] Loading model weights took 16.4603 GB
[rank0]: Traceback (most recent call last):
[rank0]:   File "/usr/local/bin/vllm", line 8, in <module>
[rank0]:     sys.exit(main())
[rank0]:              ^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 204, in main
[rank0]:     args.dispatch_function(args)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 44, in serve
[rank0]:     uvloop.run(run_server(args))
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[rank0]:     return __asyncio.run(
[rank0]:            ^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[rank0]:     return runner.run(main)
[rank0]:            ^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[rank0]:     return self._loop.run_until_complete(task)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[rank0]:     return await main
[rank0]:            ^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
[rank0]:     async with build_async_engine_client(args) as engine_client:
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]:     return await anext(self.gen)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
[rank0]:     async with build_async_engine_client_from_engine_args(
[rank0]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]:     return await anext(self.gen)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
[rank0]:     engine_client = AsyncLLMEngine.from_engine_args(
[rank0]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 644, in from_engine_args
[rank0]:     engine = cls(
[rank0]:              ^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 594, in __init__
[rank0]:     self.engine = self._engine_class(*args, **kwargs)
[rank0]:                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
[rank0]:     self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]:                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
[rank0]:     self._init_workers_ray(placement_group)
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 356, in _init_workers_ray
[rank0]:     self._run_workers("load_model",
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 481, in _run_workers
[rank0]:     ray_worker_outputs = ray.get(ray_worker_outputs)
[rank0]:                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2772, in get
[rank0]:     values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
[rank0]:                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 919, in get_objects
[rank0]:     raise value.as_instanceof_cause()
[rank0]: ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.execute_method() (pid=317, ip=10.68.27.14, actor_id=b5ada3c37ac4d423338ba5bd01000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f5aa3145bb0>)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 575, in execute_method
[rank0]:     raise e
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
[rank0]:     return run_method(target, method, args, kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
[rank0]:     self.model = get_model(vllm_config=self.vllm_config)
[rank0]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
[rank0]:     return loader.load_model(vllm_config=vllm_config)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
[rank0]:     raise ValueError(
[rank0]: ValueError: Following weights were not initialized from checkpoint: {'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.46.input_layernorm.weight'}

With the same Docker image, DeepSeek-R1-Distill-Qwen-32B starts up and serves normally.
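For context, the deployment follows the usual Ray-based multi-node pattern for this image. The sketch below is a reconstruction, not the exact commands from this report: the head-node IP, mount paths, and the `--disable-custom-all-reduce` flag (suggested by the `custom_all_reduce.py` warning in the log above) are illustrative assumptions.

```shell
# Node-1 (head): start a Ray head inside the container.
# The image's default entrypoint is the API server, so override it.
docker run -d --gpus all --network host --ipc=host \
  -v /root/deepseek-ai:/root/deepseek-ai \
  --entrypoint /bin/bash vllm/vllm-openai:v0.7.2 \
  -c "ray start --head --port=6379 --block"

# Node-2 (worker): join the Ray cluster (<HEAD_NODE_IP> is a placeholder).
docker run -d --gpus all --network host --ipc=host \
  -v /root/deepseek-ai:/root/deepseek-ai \
  --entrypoint /bin/bash vllm/vllm-openai:v0.7.2 \
  -c "ray start --address=<HEAD_NODE_IP>:6379 --block"

# In the head-node container: serve the model across all 8 GPUs.
vllm serve /root/deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --tensor-parallel-size 8 \
  --disable-custom-all-reduce
```

With `--tensor-parallel-size 8` across 2 nodes of 4 GPUs each, vLLM dispatches `load_model` to the Ray workers on both nodes, which is the call that fails in the traceback above.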

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom-right corner of the documentation page, which can answer many frequently asked questions.

Metadata

Labels: bug (Something isn't working), stale (Over 90 days of inactivity)