-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #1556 double free #3347
Fixes #1556 double free #3347
Conversation
If you run cc @WoosukKwon as you responded to the original issue. |
@hmellor, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
blocks_to_free = (block_table[-self.block_sliding_window:] | ||
if self.block_sliding_window is not None else | ||
block_table) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment for this line?
@@ -312,7 +312,7 @@ def fork(self, parent_seq: Sequence, child_seq: Sequence) -> None: | |||
# Thus, it is always safe from OOM. | |||
src_block_table = self.block_tables[parent_seq.seq_id] | |||
self.block_tables[child_seq.seq_id] = src_block_table.copy() | |||
for block in src_block_table: | |||
for block in set(src_block_table): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add comment why we need to deduplicate?
@@ -274,3 +274,48 @@ def test_reset(): | |||
# Resetting block manager frees all allocated blocks. | |||
block_manager.reset() | |||
assert block_manager.get_num_free_gpu_blocks() == original_blocks | |||
|
|||
|
|||
def test_sliding_window_multi_seq(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a small docstring explaining what runtime property this test verifies and the basic methodology?
@cadedaniel here you go! Let me know if there's anything else missing. |
Just waiting for CI to finish 👍 |
This PR fixes issue #1556.
Conditions for the bug to appear:
The double free error occurs when there are multiple sequences sharing blocks and the context window exceeds the model's sliding-window.
The test function in this PR will fail on the
block_manager.py
in main with the "double free" error mentioned in #1556.What is the underlying cause of the bug?
The reason this happens is that in
append_slot
, when a new block needs to be allocated because an existing block cannot be reused (because it is referenced by other sequences), its ref count is decremented for the current sequence: https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager.py#L307This block is thus considered not to be referenced anymore by this sequence.
Later, when
_free_block_table
is called for the block table of this sequence, vllm tries to free this block once again: https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager.py#L396, leading to the error.What is the fix?
There are two small parts:
ref_count
of shared blocks was being wrongly incremented onfork
because all blocks in the block table were being incremented. This is wrong because if the context exceeds the sliding window, blocks in the block tables will be reused. The fix makes sure to only increment theref_count
by 1, by wrapping the table inset()
._free_block_table
should not free already freed blocks. To this end, the method was changed to just free the lastsliding_window
blocks (if defined).