Closed as not planned
Description
Your current environment
The output of `python collect_env.py`
Node-1: 4 * 4090
Node-2: 4 * 4090
🐛 Describe the bug
I deployed vLLM with Docker for multi-node, multi-GPU inference, and hit an error while starting the DeepSeek-R1-Distill-Llama-70B model.
docker: vllm/vllm-openai:v0.7.2
Error output:
(RayWorkerWrapper pid=319, ip=10.68.27.14) WARNING 02-12 01:21:27 custom_all_reduce.py:136] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly. [repeated 6x across cluster]
Loading safetensors checkpoint shards: 0% Completed | 0/16 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/16 [00:00<00:07, 1.88it/s]
Loading safetensors checkpoint shards: 19% Completed | 3/16 [00:04<00:20, 1.61s/it]
Loading safetensors checkpoint shards: 25% Completed | 4/16 [00:10<00:36, 3.00s/it]
Loading safetensors checkpoint shards: 31% Completed | 5/16 [00:15<00:41, 3.76s/it]
Loading safetensors checkpoint shards: 38% Completed | 6/16 [00:15<00:26, 2.68s/it]
Loading safetensors checkpoint shards: 44% Completed | 7/16 [00:20<00:29, 3.27s/it]
Loading safetensors checkpoint shards: 50% Completed | 8/16 [00:20<00:18, 2.31s/it]
Loading safetensors checkpoint shards: 56% Completed | 9/16 [00:20<00:11, 1.64s/it]
Loading safetensors checkpoint shards: 62% Completed | 10/16 [00:25<00:14, 2.47s/it]
Loading safetensors checkpoint shards: 69% Completed | 11/16 [00:25<00:08, 1.76s/it]
Loading safetensors checkpoint shards: 75% Completed | 12/16 [00:27<00:07, 1.91s/it]
Loading safetensors checkpoint shards: 81% Completed | 13/16 [00:27<00:04, 1.38s/it]
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Error executing method 'load_model'. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Traceback (most recent call last):
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return run_method(target, method, args, kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return func(*args, **kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] self.model_runner.load_model()
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] self.model = get_model(vllm_config=self.vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return loader.load_model(vllm_config=vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] raise ValueError(
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ValueError: Following weights were not initialized from checkpoint: {'model.layers.46.input_layernorm.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight'}
(RayWorkerWrapper pid=828) INFO 02-12 01:21:08 model_runner.py:1110] Starting to load model /root/deepseek-ai/DeepSeek-R1-Distill-Llama-70B... [repeated 6x across cluster]
Loading safetensors checkpoint shards: 88% Completed | 14/16 [00:29<00:03, 1.60s/it]
Loading safetensors checkpoint shards: 94% Completed | 15/16 [00:30<00:01, 1.40s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00, 1.04s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00, 1.93s/it]
INFO 02-12 01:21:39 model_runner.py:1115] Loading model weights took 16.4603 GB
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/bin/vllm", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: ^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 204, in main
[rank0]: args.dispatch_function(args)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 44, in serve
[rank0]: uvloop.run(run_server(args))
[rank0]: File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[rank0]: return __asyncio.run(
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[rank0]: return runner.run(main)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[rank0]: return self._loop.run_until_complete(task)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[rank0]: File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[rank0]: return await main
[rank0]: ^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
[rank0]: async with build_async_engine_client(args) as engine_client:
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
[rank0]: async with build_async_engine_client_from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
[rank0]: engine_client = AsyncLLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 644, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 594, in __init__
[rank0]: self.engine = self._engine_class(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
[rank0]: self._init_executor()
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
[rank0]: self._init_workers_ray(placement_group)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 356, in _init_workers_ray
[rank0]: self._run_workers("load_model",
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 481, in _run_workers
[rank0]: ray_worker_outputs = ray.get(ray_worker_outputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2772, in get
[rank0]: values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 919, in get_objects
[rank0]: raise value.as_instanceof_cause()
[rank0]: ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.execute_method() (pid=317, ip=10.68.27.14, actor_id=b5ada3c37ac4d423338ba5bd01000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f5aa3145bb0>)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 575, in execute_method
[rank0]: raise e
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
[rank0]: return run_method(target, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
[rank0]: self.model = get_model(vllm_config=self.vllm_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
[rank0]: return loader.load_model(vllm_config=vllm_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
[rank0]: raise ValueError(
[rank0]: ValueError: Following weights were not initialized from checkpoint: {'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.46.input_layernorm.weight'}
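The `ValueError` says every weight from layers 45–49 went unmatched during loading, which usually points at an incomplete or corrupted checkpoint on disk (e.g. a truncated shard after download) rather than a vLLM code path. As a hedged sketch (this helper is not part of vLLM; the fused-name mapping is inferred from the error message itself, where `qkv_proj` and `gate_up_proj` are vLLM's fused parameter names for the separate Hugging Face `q_proj`/`k_proj`/`v_proj` and `gate_proj`/`up_proj` tensors), you can expand the reported names back to checkpoint names and look them up in the model's `model.safetensors.index.json` `weight_map`:

```python
# Sketch: check whether the weights vLLM reported as uninitialized are
# actually listed in the checkpoint's safetensors index. Fused-name
# mapping is an assumption based on vLLM's Llama naming.
FUSED = {
    "qkv_proj": ("q_proj", "k_proj", "v_proj"),
    "gate_up_proj": ("gate_proj", "up_proj"),
}

def expand_name(name):
    """Yield the Hugging Face checkpoint name(s) behind one vLLM name."""
    for fused, parts in FUSED.items():
        if fused in name:
            for part in parts:
                yield name.replace(fused, part)
            return
    yield name  # non-fused names map one-to-one

def find_uncovered(weight_map, missing_names):
    """Return checkpoint-level names that no shard in the index provides.

    weight_map: the "weight_map" dict from model.safetensors.index.json
    missing_names: the set printed in vLLM's ValueError
    """
    return sorted(
        hf_name
        for name in missing_names
        for hf_name in expand_name(name)
        if hf_name not in weight_map
    )
```

If this returns an empty list, the index claims the weights exist and the problem is more likely a damaged shard file; if it returns names, the checkpoint directory itself is incomplete and re-downloading those shards should fix the startup.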
With the same Docker image, DeepSeek-R1-Distill-Qwen-32B starts and serves normally.
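Since the 32B model works on the same setup, the 70B checkpoint files on the failing node are the prime suspect. A safetensors file begins with an 8-byte little-endian header length followed by a JSON header whose `data_offsets` entries say how many bytes of tensor data must follow, so a truncated shard can be detected without loading it. A minimal sketch (standard library only; not a vLLM utility):

```python
import json
import struct
from pathlib import Path

def shard_is_complete(path):
    """Check one .safetensors shard for truncation.

    The file layout is: 8-byte little-endian header length, then a JSON
    header where each tensor entry carries "data_offsets": [begin, end]
    relative to the data section. If the file on disk is shorter than
    8 + header length + max end offset, the download was cut short.
    """
    p = Path(path)
    with p.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    data_bytes = max(
        (meta["data_offsets"][1]
         for name, meta in header.items()
         if name != "__metadata__"),  # skip the optional metadata entry
        default=0,
    )
    return p.stat().st_size >= 8 + header_len + data_bytes
```

Running this over all 16 shards on each node (they must be byte-identical across nodes) would quickly confirm or rule out a partial download of the shards holding layers 45–49.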