vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. #3295

Closed
trajkovn opened this issue Mar 9, 2024 · 0 comments
Labels: duplicate (This issue or pull request already exists)

trajkovn commented Mar 9, 2024

I got the following error when running a long prompt/output on a fine-tuned Mistral model that otherwise works great.

params:

```json
{
  "max_tokens": 9000,
  "temperature": 0.0,
  "n": 1,
  "best_of": 5,
  "use_beam_search": true
}
```
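
For reference, a request like the following exercises those parameters against vLLM's OpenAI-compatible completions endpoint. This is a minimal sketch: the endpoint URL, model name, and prompt are placeholders (the exact serving setup wasn't shared); only the sampling parameters come from the report.

```python
import requests

# Minimal repro sketch. The URL, model name, and prompt are assumptions;
# only the sampling parameters below come from the report above.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "my-finetuned-mistral",  # placeholder model name
        "prompt": "<long prompt that triggers the error>",
        "max_tokens": 9000,
        "temperature": 0.0,
        "n": 1,
        "best_of": 5,
        "use_beam_search": True,
    },
)
print(response.json())
```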

```
INFO 03-09 07:34:14 metrics.py:213] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 37.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 15.9%, CPU KV cache usage: 0.0%
INFO 03-09 07:34:17 async_llm_engine.py:133] Aborted request cmpl-00d201404782417f91da55952303060e-0.
Exception in callback functools.partial(<function _raise_exception_on_finish at 0x7f9bdd35c160>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f9bd324d660>)
handle: <Handle functools.partial(<function _raise_exception_on_finish at 0x7f9bdd35c160>, request_tracker=<vllm.engine.async_llm_engine.RequestTracker object at 0x7f9bd324d660>)>
Traceback (most recent call last):
  File "/workspace/vllm/engine/async_llm_engine.py", line 29, in _raise_exception_on_finish
    task.result()
  File "/workspace/vllm/engine/async_llm_engine.py", line 414, in run_engine_loop
    has_requests_in_progress = await self.engine_step()
  File "/workspace/vllm/engine/async_llm_engine.py", line 393, in engine_step
    request_outputs = await self.engine.step_async()
  File "/workspace/vllm/engine/async_llm_engine.py", line 203, in step_async
    return self._process_model_outputs(output, scheduler_outputs)
  File "/workspace/vllm/engine/llm_engine.py", line 756, in _process_model_outputs
    self._process_sequence_group_outputs(seq_group, outputs)
  File "/workspace/vllm/engine/llm_engine.py", line 608, in _process_sequence_group_outputs
    self.scheduler.free_seq(parent)
  File "/workspace/vllm/core/scheduler.py", line 399, in free_seq
    self.block_manager.free(seq)
  File "/workspace/vllm/core/block_manager.py", line 314, in free
    self._free_block_table(block_table)
  File "/workspace/vllm/core/block_manager.py", line 305, in _free_block_table
    self.gpu_allocator.free(block)
  File "/workspace/vllm/core/block_manager.py", line 45, in free
    raise ValueError(f"Double free! {block} is already freed.")
ValueError: Double free! PhysicalTokenBlock(device=Device.GPU, block_number=1875, ref_count=0) is already freed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/workspace/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
    raise exc
  File "/workspace/vllm/engine/async_llm_engine.py", line 33, in _raise_exception_on_finish
    raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
```
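
For context on the root cause: the ValueError above comes from a reference-count guard in the block allocator, which fires when a physical KV-cache block is freed a second time. A minimal sketch of that guard pattern (an illustration, not vLLM's actual block_manager.py):

```python
# Illustrative sketch of a ref-counted free with a double-free guard;
# not vLLM's actual implementation.
class PhysicalTokenBlock:
    def __init__(self, block_number: int) -> None:
        self.block_number = block_number
        self.ref_count = 1  # set on allocation

class Allocator:
    def free(self, block: PhysicalTokenBlock) -> None:
        if block.ref_count == 0:
            # A second free of the same block raises the error seen above.
            raise ValueError(f"Double free! {block} is already freed.")
        block.ref_count -= 1
```

The traceback suggests the same sequence's blocks are freed twice while processing beam-search outputs (free_seq() on the parent sequence), so the second free() finds ref_count already at 0.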

hmellor added the duplicate label on Mar 9, 2024
hmellor closed this as not planned (won't fix, can't repro, duplicate, stale) on Mar 9, 2024