[Bug]: size must contain 'shortest_edge' and 'longest_edge' keys. #427

Open

sandatiger opened this issue Apr 1, 2025 · 5 comments

Labels: Bug (Something isn't working), UI TARS

Comments

@sandatiger

Version

v0.6.6

Model

UI-TARS-2B-SFT

Deployment Method

Local

Issue Description

Startup command: sudo python3 -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model /usr/local/UI-TARS-2B-DPO/ --trust-remote-code

An error is thrown after startup.
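
For context, the traceback below bottoms out in AutoProcessor.from_pretrained, so the failure can likely be reproduced without vLLM. A minimal sketch, assuming the same local model directory as in the startup command:

from transformers import AutoProcessor

# Loading the processor directly should raise the same ValueError about
# 'shortest_edge' and 'longest_edge' when the saved preprocessor config and
# the installed transformers version disagree about the expected size keys.
processor = AutoProcessor.from_pretrained(
    "/usr/local/UI-TARS-2B-DPO/",
    trust_remote_code=True,
)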

Error Logs

ERROR 04-01 20:31:07 engine.py:366] size must contain 'shortest_edge' and 'longest_edge' keys.
ERROR 04-01 20:31:07 engine.py:366] Traceback (most recent call last):
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 04-01 20:31:07 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 04-01 20:31:07 engine.py:366] return cls(ipc_path=ipc_path,
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 04-01 20:31:07 engine.py:366] self.engine = LLMEngine(*args, **kwargs)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 276, in __init__
ERROR 04-01 20:31:07 engine.py:366] self._initialize_kv_caches()
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
ERROR 04-01 20:31:07 engine.py:366] self.model_executor.determine_num_available_blocks())
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 68, in determine_num_available_blocks
ERROR 04-01 20:31:07 engine.py:366] return self.driver_worker.determine_num_available_blocks()
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-01 20:31:07 engine.py:366] return func(*args, **kwargs)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 202, in determine_num_available_blocks
ERROR 04-01 20:31:07 engine.py:366] self.model_runner.profile_run()
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-01 20:31:07 engine.py:366] return func(*args, **kwargs)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1270, in profile_run
ERROR 04-01 20:31:07 engine.py:366] max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 251, in get_max_multimodal_tokens
ERROR 04-01 20:31:07 engine.py:366] return sum(self.get_max_tokens_by_modality(model_config).values())
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 238, in get_max_tokens_by_modality
ERROR 04-01 20:31:07 engine.py:366] self.get_max_tokens_per_item_by_modality(model_config).items()
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 215, in get_max_tokens_per_item_by_modality
ERROR 04-01 20:31:07 engine.py:366] return {
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 216, in
ERROR 04-01 20:31:07 engine.py:366] key: plugin.get_max_multimodal_tokens(model_config)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/base.py", line 243, in get_max_multimodal_tokens
ERROR 04-01 20:31:07 engine.py:366] max_mm_tokens = max_mm_tokens(InputContext(model_config),
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 700, in get_max_qwen2_vl_mm_tokens
ERROR 04-01 20:31:07 engine.py:366] hf_processor = ctx.get_hf_processor(Qwen2VLProcessor)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/inputs/registry.py", line 102, in get_hf_processor
ERROR 04-01 20:31:07 engine.py:366] hf_processor = cached_get_processor(
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/processor.py", line 36, in get_processor
ERROR 04-01 20:31:07 engine.py:366] raise e
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/processor.py", line 18, in get_processor
ERROR 04-01 20:31:07 engine.py:366] processor = AutoProcessor.from_pretrained(
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 345, in from_pretrained
ERROR 04-01 20:31:07 engine.py:366] return processor_class.from_pretrained(
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 1070, in from_pretrained
ERROR 04-01 20:31:07 engine.py:366] args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 1134, in _get_arguments_from_pretrained
ERROR 04-01 20:31:07 engine.py:366] args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/image_processing_auto.py", line 557, in from_pretrained
ERROR 04-01 20:31:07 engine.py:366] return image_processor_class.from_dict(config_dict, **kwargs)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_base.py", line 423, in from_dict
ERROR 04-01 20:31:07 engine.py:366] image_processor = cls(**image_processor_dict)
ERROR 04-01 20:31:07 engine.py:366] File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 144, in __init__
ERROR 04-01 20:31:07 engine.py:366] raise ValueError("size must contain 'shortest_edge' and 'longest_edge' keys.")
ERROR 04-01 20:31:07 engine.py:366] ValueError: size must contain 'shortest_edge' and 'longest_edge' keys.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
return cls(ipc_path=ipc_path,
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in init
self.engine = LLMEngine(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 276, in init
self._initialize_kv_caches()
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/llm_engine.py", line 416, in _initialize_kv_caches
self.model_executor.determine_num_available_blocks())
File "/usr/local/lib/python3.10/dist-packages/vllm/executor/gpu_executor.py", line 68, in determine_num_available_blocks
return self.driver_worker.determine_num_available_blocks()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 202, in determine_num_available_blocks
self.model_runner.profile_run()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 1270, in profile_run
max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 251, in get_max_multimodal_tokens
return sum(self.get_max_tokens_by_modality(model_config).values())
File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 238, in get_max_tokens_by_modality
self.get_max_tokens_per_item_by_modality(model_config).items()
File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 215, in get_max_tokens_per_item_by_modality
return {
File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/registry.py", line 216, in
key: plugin.get_max_multimodal_tokens(model_config)
File "/usr/local/lib/python3.10/dist-packages/vllm/multimodal/base.py", line 243, in get_max_multimodal_tokens
max_mm_tokens = max_mm_tokens(InputContext(model_config),
File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 700, in get_max_qwen2_vl_mm_tokens
hf_processor = ctx.get_hf_processor(Qwen2VLProcessor)
File "/usr/local/lib/python3.10/dist-packages/vllm/inputs/registry.py", line 102, in get_hf_processor
hf_processor = cached_get_processor(
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/processor.py", line 36, in get_processor
raise e
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/processor.py", line 18, in get_processor
processor = AutoProcessor.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/processing_auto.py", line 345, in from_pretrained
return processor_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 1070, in from_pretrained
args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/processing_utils.py", line 1134, in _get_arguments_from_pretrained
args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/image_processing_auto.py", line 557, in from_pretrained
return image_processor_class.from_dict(config_dict, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/image_processing_base.py", line 423, in from_dict
image_processor = cls(**image_processor_dict)
File "/usr/local/lib/python3.10/dist-packages/transformers/models/qwen2_vl/image_processing_qwen2_vl.py", line 144, in init
raise ValueError("size must contain 'shortest_edge' and 'longest_edge' keys.")
ValueError: size must contain 'shortest_edge' and 'longest_edge' keys.
[rank0]:[W401 20:31:08.978610575 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-3' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-4' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-5' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-6' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-7' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-8' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-9' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-10' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Task exception was never retrieved
future: <Task finished name='Task-11' coro=<MQLLMEngineClient.run_output_handler_loop() done, defined at /usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py:178> exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
File "/usr/local/lib/python3.10/dist-packages/zmq/_future.py", line 372, in poll
raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 774, in
uvloop.run(run_server(args))
File "/usr/local/lib/python3.10/dist-packages/uvloop/init.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.10/dist-packages/uvloop/init.py", line 61, in wrapper
return await main
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 740, in run_server
async with build_async_engine_client(args) as engine_client:
File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 118, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/lib/python3.10/contextlib.py", line 199, in aenter
return await anext(self.gen)
File "/usr/local/lib/python3.10/dist-packages/vllm/entrypoints/openai/api_server.py", line 223, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

sandatiger added the Bug (Something isn't working) and UI TARS labels on Apr 1, 2025
sandatiger changed the title from "[Bug]: runtime_error" to "[Bug]: size must contain 'shortest_edge' and 'longest_edge' keys." on Apr 1, 2025
@ysjyx7

ysjyx7 commented Apr 3, 2025

You can try adding shortest_edge and longest_edge entries under the size node in the model's "preprocessor_config.json", based on vllm-project/vllm#15614.
I don't know which values should be used; I used 100 and 500 and it works.
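
For illustration, a minimal patch script along those lines; this is a sketch built on assumptions, not a verified fix: the path comes from the startup command above, and 100/500 are simply the values mentioned in this comment (if the file already defines min_pixels and max_pixels, mirroring those values may be a better choice).

import json
from pathlib import Path

# Path taken from the startup command above; adjust for your deployment.
config_path = Path("/usr/local/UI-TARS-2B-DPO/preprocessor_config.json")
config = json.loads(config_path.read_text())

# Add the keys the Qwen2-VL image processor asks for, keeping anything already set.
size = config.get("size", {})
size.setdefault("shortest_edge", 100)  # value from this comment, not a verified default
size.setdefault("longest_edge", 500)   # value from this comment, not a verified default
config["size"] = size

config_path.write_text(json.dumps(config, indent=2))
print("size is now:", config["size"])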

@lishufood

Transformers v4.50.3 was released on March 29, 2025, and I've hit this issue too.
I suspect a version compatibility problem, as several similar cases have already been reported.

Since I've run into other version-related issues with Transformers before, I've pinned this specific version:
pip install transformers==4.45.2

I can't say for certain this will resolve your case, since my device is currently throwing CUDA memory errors. You might want to try rolling back to an earlier version and see if that helps.
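
If you try a rollback, it can help to confirm which versions the serving environment actually picks up before relaunching. A trivial check, assuming vllm and transformers are importable in the same environment:

import transformers
import vllm

# The failure looks version-dependent, so record both versions alongside any results.
print("transformers:", transformers.__version__)
print("vllm:", vllm.__version__)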

@sandatiger
Author

sandatiger commented Apr 5, 2025 via email

@Zerohertz

@lishufood

I followed your suggestion, but I'm encountering the following error:

      containers:
        - args:
            - -c
            - |
              pip install transformers==4.45.2 && \
              vllm serve bytedance-research/UI-TARS-72B-DPO --trust-remote-code --host=0.0.0.0 --gpu-memory-utilization 0.95 --tensor-parallel-size=2
vLLM logs
Loading safetensors checkpoint shards:  81% Completed | 25/31 [00:20<00:04,  1.24it/s]
Loading safetensors checkpoint shards:  84% Completed | 26/31 [00:20<00:03,  1.25it/s]
Loading safetensors checkpoint shards:  87% Completed | 27/31 [00:21<00:03,  1.24it/s]
Loading safetensors checkpoint shards:  90% Completed | 28/31 [00:22<00:02,  1.29it/s]
Loading safetensors checkpoint shards:  94% Completed | 29/31 [00:23<00:01,  1.29it/s]
Loading safetensors checkpoint shards:  97% Completed | 30/31 [00:23<00:00,  1.29it/s]
Loading safetensors checkpoint shards: 100% Completed | 31/31 [00:24<00:00,  1.58it/s]
Loading safetensors checkpoint shards: 100% Completed | 31/31 [00:24<00:00,  1.28it/s]
(VllmWorker rank=0 pid=228) 
(VllmWorker rank=0 pid=228) INFO 04-08 20:11:47 [loader.py:447] Loading weights took 24.30 seconds
(VllmWorker rank=0 pid=228) INFO 04-08 20:11:47 [gpu_model_runner.py:1273] Model loading took 68.4777 GiB and 25.840630 seconds
(VllmWorker rank=1 pid=245) INFO 04-08 20:11:49 [loader.py:447] Loading weights took 26.04 seconds
(VllmWorker rank=1 pid=245) INFO 04-08 20:11:50 [gpu_model_runner.py:1273] Model loading took 68.4777 GiB and 28.213698 seconds
(VllmWorker rank=1 pid=245) INFO 04-08 20:11:50 [gpu_model_runner.py:1542] Encoder cache will be initialized with a budget of 20808 tokens, and profiled with 1 video items of the maximum feature size.
(VllmWorker rank=0 pid=228) INFO 04-08 20:11:50 [gpu_model_runner.py:1542] Encoder cache will be initialized with a budget of 20808 tokens, and profiled with 1 video items of the maximum feature size.
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] WorkerProc hit an exception: %s
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] Traceback (most recent call last):
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     output = func(*args, **kwargs)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     return func(*args, **kwargs)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     self.model_runner.profile_run()
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1548, in profile_run
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 151, in _get_dummy_mm_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     processor_inputs = factory.get_dummy_processor_inputs(
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 987, in get_dummy_processor_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     image_token: str = hf_processor.image_token
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] Traceback (most recent call last):
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     output = func(*args, **kwargs)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     return func(*args, **kwargs)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     self.model_runner.profile_run()
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1548, in profile_run
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 151, in _get_dummy_mm_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     processor_inputs = factory.get_dummy_processor_inputs(
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 987, in get_dummy_processor_inputs
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     image_token: str = hf_processor.image_token
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
(VllmWorker rank=1 pid=245) ERROR 04-08 20:11:52 [multiproc_executor.py:383] 
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] WorkerProc hit an exception: %s
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] Traceback (most recent call last):
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     self.model_runner.profile_run()
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1548, in profile_run
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 151, in _get_dummy_mm_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     processor_inputs = factory.get_dummy_processor_inputs(
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 987, in get_dummy_processor_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     image_token: str = hf_processor.image_token
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] Traceback (most recent call last):
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     output = func(*args, **kwargs)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]              ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     return func(*args, **kwargs)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     self.model_runner.profile_run()
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1548, in profile_run
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 151, in _get_dummy_mm_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     processor_inputs = factory.get_dummy_processor_inputs(
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 987, in get_dummy_processor_inputs
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]     image_token: str = hf_processor.image_token
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383]                        ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
(VllmWorker rank=0 pid=228) ERROR 04-08 20:11:52 [multiproc_executor.py:383] 
ERROR 04-08 20:11:52 [core.py:390] EngineCore hit an exception: Traceback (most recent call last):
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 378, in run_engine_core
ERROR 04-08 20:11:52 [core.py:390]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 04-08 20:11:52 [core.py:390]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 319, in __init__
ERROR 04-08 20:11:52 [core.py:390]     super().__init__(vllm_config, executor_class, log_stats)
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 71, in __init__
ERROR 04-08 20:11:52 [core.py:390]     self._initialize_kv_caches(vllm_config)
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 132, in _initialize_kv_caches
ERROR 04-08 20:11:52 [core.py:390]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 04-08 20:11:52 [core.py:390]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 66, in determine_available_memory
ERROR 04-08 20:11:52 [core.py:390]     output = self.collective_rpc("determine_available_memory")
ERROR 04-08 20:11:52 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 134, in collective_rpc
ERROR 04-08 20:11:52 [core.py:390]     raise e
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 123, in collective_rpc
ERROR 04-08 20:11:52 [core.py:390]     raise result
ERROR 04-08 20:11:52 [core.py:390] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
ERROR 04-08 20:11:52 [core.py:390] Traceback (most recent call last):
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 376, in worker_busy_loop
ERROR 04-08 20:11:52 [core.py:390]     output = func(*args, **kwargs)
ERROR 04-08 20:11:52 [core.py:390]              ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 04-08 20:11:52 [core.py:390]     return func(*args, **kwargs)
ERROR 04-08 20:11:52 [core.py:390]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
ERROR 04-08 20:11:52 [core.py:390]     self.model_runner.profile_run()
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1548, in profile_run
ERROR 04-08 20:11:52 [core.py:390]     dummy_mm_kwargs = self.mm_registry.get_decoder_dummy_data(
ERROR 04-08 20:11:52 [core.py:390]                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 470, in get_decoder_dummy_data
ERROR 04-08 20:11:52 [core.py:390]     dummy_data = profiler.get_decoder_dummy_data(seq_len, mm_counts)
ERROR 04-08 20:11:52 [core.py:390]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 224, in get_decoder_dummy_data
ERROR 04-08 20:11:52 [core.py:390]     ) = self.get_and_validate_mm_inputs(seq_len, mm_counts)
ERROR 04-08 20:11:52 [core.py:390]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 179, in get_and_validate_mm_inputs
ERROR 04-08 20:11:52 [core.py:390]     mm_inputs = self._get_dummy_mm_inputs(seq_len, mm_counts)
ERROR 04-08 20:11:52 [core.py:390]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/profiling.py", line 151, in _get_dummy_mm_inputs
ERROR 04-08 20:11:52 [core.py:390]     processor_inputs = factory.get_dummy_processor_inputs(
ERROR 04-08 20:11:52 [core.py:390]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/qwen2_vl.py", line 987, in get_dummy_processor_inputs
ERROR 04-08 20:11:52 [core.py:390]     image_token: str = hf_processor.image_token
ERROR 04-08 20:11:52 [core.py:390]                        ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-08 20:11:52 [core.py:390] AttributeError: 'Qwen2VLProcessor' object has no attribute 'image_token'
ERROR 04-08 20:11:52 [core.py:390] 
ERROR 04-08 20:11:52 [core.py:390] 
CRITICAL 04-08 20:11:52 [core_client.py:361] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 51, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 27, in cmd
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 136, in from_vllm_config
    return cls(
           ^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 102, in __init__
    self.engine_core = EngineCoreClient.make_client(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 69, in make_client
    return AsyncMPClient(vllm_config, executor_class, log_stats)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 570, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 401, in __init__
    engine.proc_handle.wait_for_startup()
  File "/usr/local/lib/python3.12/dist-packages/vllm/v1/utils.py", line 127, in wait_for_startup
    if self.reader.recv()["status"] != "READY":
       ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/connection.py", line 430, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.12/multiprocessing/connection.py", line 399, in _recv
    raise EOFError
EOFError

@Zerohertz

Zerohertz commented Apr 8, 2025

I found that using the following setup resolved the issue:

          image: vllm/vllm-openai:v0.7.3

or

        - args:
            - -c
            - |
              pip install transformers==4.49.0 && \
              vllm serve bytedance-research/UI-TARS-72B-DPO --trust-remote-code --host=0.0.0.0 --gpu-memory-utilization 0.95 --tensor-parallel-size=4 --limit-mm-per-prompt image=5
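
As a quick sanity check after switching the image or pinning transformers (a sketch, not something verified in this thread), the processor should now load and expose the expected size keys:

from transformers import AutoProcessor

# With a compatible transformers version this should load without the
# "size must contain 'shortest_edge' and 'longest_edge' keys." error.
processor = AutoProcessor.from_pretrained(
    "bytedance-research/UI-TARS-72B-DPO", trust_remote_code=True
)
print(type(processor).__name__)        # expected: Qwen2VLProcessor
print(processor.image_processor.size)  # inspect the size dict actually in use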
