Closed as not planned. Labels: bug (something isn't working), stale (over 90 days of inactivity).
Description
Your current environment
The output of `python collect_env.py`
Node-1: 4 * 4090
Node-2: 4 * 4090
🐛 Describe the bug
I deployed vLLM with Docker for multi-node, multi-GPU inference, and hit an error when launching the DeepSeek-R1-Distill-Llama-70B model.
docker: vllm/vllm-openai:v0.7.2
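The exact launch commands were not included in the report. For context, a typical two-node Ray + vLLM launch for this setup (2 nodes × 4 GPUs) might look like the sketch below; the head-node IP, port, and flags are illustrative placeholders, not taken from the report. The custom-all-reduce warning in the log can also be silenced with `--disable-custom-all-reduce`, as the message suggests.

```shell
# On the head node (node-1), inside the vllm/vllm-openai:v0.7.2 container:
ray start --head --port=6379

# On the worker node (node-2), join the Ray cluster (HEAD_IP is a placeholder):
ray start --address=HEAD_IP:6379

# Then, on the head node, serve the 70B model across all 8 GPUs
# (tensor parallel size 8 = 2 nodes x 4 GPUs; model path matches the log):
vllm serve /root/deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
    --tensor-parallel-size 8 \
    --disable-custom-all-reduce
```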
Error output:
(RayWorkerWrapper pid=319, ip=10.68.27.14) WARNING 02-12 01:21:27 custom_all_reduce.py:136] Custom allreduce is disabled because it's not supported on more than two PCIe-only GPUs. To silence this warning, specify disable_custom_all_reduce=True explicitly. [repeated 6x across cluster]
Loading safetensors checkpoint shards: 0% Completed | 0/16 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/16 [00:00<00:07, 1.88it/s]
Loading safetensors checkpoint shards: 19% Completed | 3/16 [00:04<00:20, 1.61s/it]
Loading safetensors checkpoint shards: 25% Completed | 4/16 [00:10<00:36, 3.00s/it]
Loading safetensors checkpoint shards: 31% Completed | 5/16 [00:15<00:41, 3.76s/it]
Loading safetensors checkpoint shards: 38% Completed | 6/16 [00:15<00:26, 2.68s/it]
Loading safetensors checkpoint shards: 44% Completed | 7/16 [00:20<00:29, 3.27s/it]
Loading safetensors checkpoint shards: 50% Completed | 8/16 [00:20<00:18, 2.31s/it]
Loading safetensors checkpoint shards: 56% Completed | 9/16 [00:20<00:11, 1.64s/it]
Loading safetensors checkpoint shards: 62% Completed | 10/16 [00:25<00:14, 2.47s/it]
Loading safetensors checkpoint shards: 69% Completed | 11/16 [00:25<00:08, 1.76s/it]
Loading safetensors checkpoint shards: 75% Completed | 12/16 [00:27<00:07, 1.91s/it]
Loading safetensors checkpoint shards: 81% Completed | 13/16 [00:27<00:04, 1.38s/it]
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Error executing method 'load_model'. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] Traceback (most recent call last):
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return run_method(target, method, args, kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return func(*args, **kwargs)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] self.model_runner.load_model()
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] self.model = get_model(vllm_config=self.vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] return loader.load_model(vllm_config=vllm_config)
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] raise ValueError(
(RayWorkerWrapper pid=318, ip=10.68.27.14) ERROR 02-12 01:21:59 worker_base.py:574] ValueError: Following weights were not initialized from checkpoint: {'model.layers.46.input_layernorm.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight'}
(RayWorkerWrapper pid=828) INFO 02-12 01:21:08 model_runner.py:1110] Starting to load model /root/deepseek-ai/DeepSeek-R1-Distill-Llama-70B... [repeated 6x across cluster]
Loading safetensors checkpoint shards: 88% Completed | 14/16 [00:29<00:03, 1.60s/it]
Loading safetensors checkpoint shards: 94% Completed | 15/16 [00:30<00:01, 1.40s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00, 1.04s/it]
Loading safetensors checkpoint shards: 100% Completed | 16/16 [00:30<00:00, 1.93s/it]
INFO 02-12 01:21:39 model_runner.py:1115] Loading model weights took 16.4603 GB
[rank0]: Traceback (most recent call last):
[rank0]: File "/usr/local/bin/vllm", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: ^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 204, in main
[rank0]: args.dispatch_function(args)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/scripts.py", line 44, in serve
[rank0]: uvloop.run(run_server(args))
[rank0]: File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
[rank0]: return __asyncio.run(
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
[rank0]: return runner.run(main)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
[rank0]: return self._loop.run_until_complete(task)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[rank0]: File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
[rank0]: return await main
[rank0]: ^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
[rank0]: async with build_async_engine_client(args) as engine_client:
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
[rank0]: async with build_async_engine_client_from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
[rank0]: engine_client = AsyncLLMEngine.from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 644, in from_engine_args
[rank0]: engine = cls(
[rank0]: ^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 594, in __init__
[rank0]: self.engine = self._engine_class(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 273, in __init__
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 262, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 51, in __init__
[rank0]: self._init_executor()
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 90, in _init_executor
[rank0]: self._init_workers_ray(placement_group)
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 356, in _init_workers_ray
[rank0]: self._run_workers("load_model",
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/executor/ray_distributed_executor.py", line 481, in _run_workers
[rank0]: ray_worker_outputs = ray.get(ray_worker_outputs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
[rank0]: return fn(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 2772, in get
[rank0]: values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/ray/_private/worker.py", line 919, in get_objects
[rank0]: raise value.as_instanceof_cause()
[rank0]: ray.exceptions.RayTaskError(ValueError): ray::RayWorkerWrapper.execute_method() (pid=317, ip=10.68.27.14, actor_id=b5ada3c37ac4d423338ba5bd01000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f5aa3145bb0>)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 575, in execute_method
[rank0]: raise e
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker_base.py", line 566, in execute_method
[rank0]: return run_method(target, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/utils.py", line 2220, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]: self.model_runner.load_model()
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1112, in load_model
[rank0]: self.model = get_model(vllm_config=self.vllm_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
[rank0]: return loader.load_model(vllm_config=vllm_config)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/loader.py", line 393, in load_model
[rank0]: raise ValueError(
[rank0]: ValueError: Following weights were not initialized from checkpoint: {'model.layers.48.post_attention_layernorm.weight', 'model.layers.49.self_attn.o_proj.weight', 'model.layers.47.mlp.gate_up_proj.weight', 'model.layers.45.self_attn.o_proj.weight', 'model.layers.46.post_attention_layernorm.weight', 'model.layers.46.self_attn.qkv_proj.weight', 'model.layers.47.self_attn.o_proj.weight', 'model.layers.47.self_attn.qkv_proj.weight', 'model.layers.48.input_layernorm.weight', 'model.layers.48.self_attn.qkv_proj.weight', 'model.layers.46.self_attn.o_proj.weight', 'model.layers.49.input_layernorm.weight', 'model.layers.47.post_attention_layernorm.weight', 'model.layers.49.post_attention_layernorm.weight', 'model.layers.46.mlp.gate_up_proj.weight', 'model.layers.49.mlp.down_proj.weight', 'model.layers.45.post_attention_layernorm.weight', 'model.layers.48.self_attn.o_proj.weight', 'model.layers.47.mlp.down_proj.weight', 'model.layers.49.self_attn.qkv_proj.weight', 'model.layers.45.mlp.gate_up_proj.weight', 'model.layers.48.mlp.down_proj.weight', 'model.layers.47.input_layernorm.weight', 'model.layers.48.mlp.gate_up_proj.weight', 'model.layers.45.input_layernorm.weight', 'model.layers.49.mlp.gate_up_proj.weight', 'model.layers.45.mlp.down_proj.weight', 'model.layers.46.mlp.down_proj.weight', 'model.layers.46.input_layernorm.weight'}
With the same Docker image, DeepSeek-R1-Distill-Qwen-32B starts and serves normally.