Closed
Labels
bug (Something isn't working), fp8 (For any issue / PR related to FP8 support), vllm (Using vLLM)
Description
⚙️ Your current environment
FP8_BLOCK fails to load; the full error log is below.
As the log shows, there is a mismatch between vLLM 0.11.1 and W8A16-FP8_BLOCK quants. I was under the impression that the PR for FP8_BLOCK and SM 12.0 had already been merged into mainline?
### Environment Information ###
Operating System: `Linux-6.8.0-85-generic-x86_64-with-glibc2.39`
Python Version: `3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]`
llm-compressor Version: `None`
compressed-tensors Version: `0.11.0`
transformers Version: `4.57.0`
torch Version: `2.8.0+cu129`
CUDA Devices: `['NVIDIA RTX PRO 6000 Blackwell Workstation Edition', 'NVIDIA RTX PRO 6000 Blackwell Workstation Edition']`
AMD Devices: `None`
🐛 Describe the bug
Here's the relevant section:
AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype'
... in compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py
... -> maybe_post_process_fp8_weight_block(...) -> fp8_utils.py
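The failure mode in this trace can be reproduced in isolation. The sketch below is plain Python, not vLLM code: `FakeLinear`, `create_weights_with_dtype`, `create_weights_without_dtype`, `post_process` and `post_process_defensive` are hypothetical stand-ins. It assumes that `orig_dtype` is normally attached to the layer while weights are created and that one code path skips that step; the trace itself only shows that the attribute is missing when `maybe_post_process_fp8_weight_block` reads it.

```python
class FakeLinear:
    """Hypothetical stand-in for a layer such as QKVParallelLinear."""

    def __init__(self):
        self.weight = [0.0]  # placeholder for the real weight tensor


def create_weights_with_dtype(layer, params_dtype):
    # Assumed behaviour: this code path records the original dtype on the layer.
    layer.orig_dtype = params_dtype


def create_weights_without_dtype(layer, params_dtype):
    # Assumed behaviour: this code path never sets the attribute.
    pass


def post_process(layer):
    # Mirrors the failing access `layer.orig_dtype, layer.weight` from the trace.
    return layer.orig_dtype, layer.weight


def post_process_defensive(layer, default_dtype="bfloat16"):
    # Illustrative guard only; the actual fix in vLLM may look different.
    dtype = getattr(layer, "orig_dtype", default_dtype)
    return dtype, layer.weight


good, bad = FakeLinear(), FakeLinear()
create_weights_with_dtype(good, "bfloat16")
create_weights_without_dtype(bad, "bfloat16")

print(post_process(good))
try:
    post_process(bad)
except AttributeError as exc:
    print("AttributeError:", exc)
print(post_process_defensive(bad))
```

The guard in `post_process_defensive` only shows where a fallback could sit. Whether a default dtype is acceptable there, or whether the attribute should instead be set during weight creation, is a question for the maintainers.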
Full error log:
Loading safetensors checkpoint shards: 0% Completed | 0/26 [00:00<?, ?it/s]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 4% Completed | 1/26 [00:01<00:30, 1.22s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 8% Completed | 2/26 [00:02<00:29, 1.24s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 12% Completed | 3/26 [00:03<00:28, 1.22s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 15% Completed | 4/26 [00:04<00:26, 1.20s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 19% Completed | 5/26 [00:06<00:25, 1.23s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 23% Completed | 6/26 [00:07<00:25, 1.29s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 27% Completed | 7/26 [00:08<00:24, 1.31s/it]
(APIServer pid=71476) DEBUG 10-13 14:12:27 [v1/engine/utils.py:776] Waiting for 1 local, 0 remote core engine proc(s) to start.
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 31% Completed | 8/26 [00:10<00:23, 1.28s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 35% Completed | 9/26 [00:11<00:21, 1.27s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 38% Completed | 10/26 [00:12<00:20, 1.29s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 42% Completed | 11/26 [00:13<00:19, 1.29s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 46% Completed | 12/26 [00:15<00:17, 1.26s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 50% Completed | 13/26 [00:16<00:16, 1.25s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 54% Completed | 14/26 [00:17<00:15, 1.30s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 58% Completed | 15/26 [00:19<00:14, 1.32s/it]
(APIServer pid=71476) DEBUG 10-13 14:12:37 [v1/engine/utils.py:776] Waiting for 1 local, 0 remote core engine proc(s) to start.
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 62% Completed | 16/26 [00:20<00:12, 1.27s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 65% Completed | 17/26 [00:21<00:11, 1.24s/it]
(Worker_TP1 pid=71620) DEBUG 10-13 14:12:39 [model_executor/models/utils.py:186] Loaded weight lm_head.weight with shape torch.Size([16384, 12288])
(Worker_TP0 pid=71619) DEBUG 10-13 14:12:39 [model_executor/models/utils.py:186] Loaded weight lm_head.weight with shape torch.Size([16384, 12288])
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 69% Completed | 18/26 [00:22<00:08, 1.04s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 73% Completed | 19/26 [00:23<00:07, 1.09s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 77% Completed | 20/26 [00:24<00:06, 1.15s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 81% Completed | 21/26 [00:25<00:05, 1.17s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 85% Completed | 22/26 [00:27<00:04, 1.18s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 88% Completed | 23/26 [00:28<00:03, 1.24s/it]
(APIServer pid=71476) DEBUG 10-13 14:12:47 [v1/engine/utils.py:776] Waiting for 1 local, 0 remote core engine proc(s) to start.
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 92% Completed | 24/26 [00:29<00:02, 1.30s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 96% Completed | 25/26 [00:31<00:01, 1.26s/it]
(Worker_TP1 pid=71620) INFO 10-13 14:12:49 [model_executor/model_loader/default_loader.py:267] Loading weights took 32.12 seconds
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 100% Completed | 26/26 [00:32<00:00, 1.24s/it]
(Worker_TP0 pid=71619)
Loading safetensors checkpoint shards: 100% Completed | 26/26 [00:32<00:00, 1.24s/it]
(Worker_TP0 pid=71619)
(Worker_TP0 pid=71619) INFO 10-13 14:12:50 [model_executor/model_loader/default_loader.py:267] Loading weights took 32.33 seconds
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] WorkerProc failed to start.
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] Traceback (most recent call last):
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 571, in worker_main
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 437, in __init__
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] self.worker.load_model()
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] self.model = model_loader.load_model(
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/base_loader.py", line 51, in load_model
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] process_weights_after_loading(model, model_config, target_device)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/model_executor/model_loader/utils.py", line 112, in process_weights_after_loading
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] quant_method.process_weights_after_loading(module)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 718, in process_weights_after_loading
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] layer.scheme.process_weights_after_loading(layer)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py", line 136, in process_weights_after_loading
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] maybe_post_process_fp8_weight_block(
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/model_executor/layers/quantization/utils/fp8_utils.py", line 915, in maybe_post_process_fp8_weight_block
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] layer.orig_dtype, layer.weight)
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] ^^^^^^^^^^^^^^^^
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1962, in __getattr__
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] raise AttributeError(
(Worker_TP1 pid=71620) ERROR 10-13 14:12:50 [v1/executor/multiproc_executor.py:597] AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype'
(Worker_TP1 pid=71620) INFO 10-13 14:12:50 [v1/executor/multiproc_executor.py:558] Parent process exited, terminating worker
(Worker_TP0 pid=71619) INFO 10-13 14:12:50 [v1/executor/multiproc_executor.py:558] Parent process exited, terminating worker
[rank0]:[W1013 14:12:51.310831696 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] self._init_executor()
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] raise e from None
(EngineCore_DP0 pid=71549) ERROR 10-13 14:12:52 [v1/engine/core.py:708] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=71549) Process EngineCore_DP0:
(EngineCore_DP0 pid=71549) Traceback (most recent call last):
(EngineCore_DP0 pid=71549) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=71549) self.run()
(EngineCore_DP0 pid=71549) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=71549) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=71549) raise e
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=71549) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=71549) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=71549) super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=71549) self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=71549) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=71549) self._init_executor()
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 106, in _init_executor
(EngineCore_DP0 pid=71549) self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=71549) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=71549) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 509, in wait_for_ready
(EngineCore_DP0 pid=71549) raise e from None
(EngineCore_DP0 pid=71549) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=71476) Traceback (most recent call last):
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/bin/vllm", line 8, in <module>
(APIServer pid=71476) sys.exit(main())
(APIServer pid=71476) ^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=71476) args.dispatch_function(args)
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=71476) uvloop.run(run_server(args))
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=71476) return __asyncio.run(
(APIServer pid=71476) ^^^^^^^^^^^^^^
(APIServer pid=71476) File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
(APIServer pid=71476) return runner.run(main)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=71476) return self._loop.run_until_complete(task)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=71476) return await main
(APIServer pid=71476) ^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=71476) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=71476) async with build_async_engine_client(
(APIServer pid=71476) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=71476) return await anext(self.gen)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=71476) async with build_async_engine_client_from_engine_args(
(APIServer pid=71476) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=71476) return await anext(self.gen)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=71476) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=71476) return fn(*args, **kwargs)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=71476) return cls(
(APIServer pid=71476) ^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=71476) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=71476) return AsyncMPClient(*client_args)
(APIServer pid=71476) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=71476) super().__init__(
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=71476) with launch_core_engines(vllm_config, executor_class,
(APIServer pid=71476) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=71476) next(self.gen)
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=71476) wait_for_engine_startup(
(APIServer pid=71476) File "/home/phaedawg/vllm/venv/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=71476) raise RuntimeError("Engine core initialization failed. "
(APIServer pid=71476) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d
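The later tracebacks only say "See root cause above", so it can help to pull the first worker exception out of a saved log automatically. The helper below is a small sketch for that, not part of vLLM: `strip_ansi` and `first_worker_exception` are hypothetical names, and the regular expression assumes the `(Worker_TPn pid=...)` prefix format seen in this log and raw ANSI colour codes (ESC bytes) in the captured output.

```python
import re

# SGR colour sequences such as ESC[1m, ESC[36m, ESC[0m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")
# A worker-prefixed line that ends in "<SomeError>: <message>".
ROOT_RE = re.compile(
    r"\((Worker_\w+) pid=(\d+)\).*?\b(\w+(?:Error|Exception)): (.*)"
)


def strip_ansi(line):
    """Remove terminal colour codes from one log line."""
    return ANSI_RE.sub("", line)


def first_worker_exception(lines):
    """Return (worker, pid, exception type, message) for the first match, else None."""
    for raw in lines:
        match = ROOT_RE.search(strip_ansi(raw))
        if match:
            worker, pid, exc_type, message = match.groups()
            return worker, int(pid), exc_type, message.strip()
    return None


sample = [
    "\x1b[1m\x1b[36m(Worker_TP1 pid=71620)\x1b[0m ERROR 10-13 14:12:50 "
    "[v1/executor/multiproc_executor.py:597] WorkerProc failed to start.",
    "\x1b[1m\x1b[36m(Worker_TP1 pid=71620)\x1b[0m ERROR 10-13 14:12:50 "
    "[v1/executor/multiproc_executor.py:597] raise AttributeError(",
    "\x1b[1m\x1b[36m(Worker_TP1 pid=71620)\x1b[0m ERROR 10-13 14:12:50 "
    "[v1/executor/multiproc_executor.py:597] AttributeError: "
    "'QKVParallelLinear' object has no attribute 'orig_dtype'",
]
print(first_worker_exception(sample))
```

The match is case-sensitive on purpose, so the `ERROR` log level is not mistaken for an exception name.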
🛠️ Steps to reproduce
Install vLLM from mainline.
Load a W8A16-FP8_BLOCK quant of a dense model.
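Before reproducing, it may be worth confirming that the checkpoint really declares block-wise FP8 weights. The sketch below inspects the `quantization_config` section of a checkpoint's `config.json`. The field names (`config_groups`, `weights`, `num_bits`, `type`, `strategy`, `block_structure`) follow the compressed-tensors config layout as I understand it and should be checked against your own checkpoint. `block_fp8_groups` and `example_config` are hypothetical, and the values in `example_config` are illustrative rather than taken from the model in this report.

```python
def block_fp8_groups(config):
    """Return names of config groups whose weights look like FP8 block quantization."""
    quant = config.get("quantization_config", {})
    hits = []
    for name, group in quant.get("config_groups", {}).items():
        weights = group.get("weights") or {}
        if (
            weights.get("num_bits") == 8
            and weights.get("type") == "float"
            and weights.get("strategy") == "block"
        ):
            hits.append(name)
    return hits


# Illustrative config fragment (assumed layout, not copied from a real checkpoint).
example_config = {
    "quantization_config": {
        "quant_method": "compressed-tensors",
        "config_groups": {
            "group_0": {
                "targets": ["Linear"],
                "weights": {
                    "num_bits": 8,
                    "type": "float",
                    "strategy": "block",
                    "block_structure": [128, 128],
                },
                "input_activations": None,
            }
        },
    }
}

print(block_fp8_groups(example_config))
```

For a real checkpoint, load the dictionary with `json.load` from the model directory's `config.json` and pass it to the same function.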