Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #1556 double free #3347

Merged
merged 3 commits into from
Mar 13, 2024
Merged

Fixes #1556 double free #3347

merged 3 commits into from
Mar 13, 2024

Conversation

br3no
Copy link
Contributor

@br3no br3no commented Mar 12, 2024

This PR fixes issue #1556.

Conditions for the bug to appear:

The double free error occurs when there are multiple sequences sharing blocks and the context window exceeds the model's sliding-window.

The test function in this PR will fail on the block_manager.py in main with the "double free" error mentioned in #1556.

What is the underlying cause of the bug?

The reason this happens is that in append_slot, when a new block needs to be allocated because an existing block cannot be reused (because it is referenced by other sequences), its ref count is decremented for the current sequence: https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager.py#L307
This block is thus considered not to be referenced anymore by this sequence.

Later, when _free_block_table is called for the block table of this sequence, vllm tries to free this block once again: https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager.py#L396, leading to the error.

What is the fix?

There are two small parts:

  1. the ref_count of shared blocks was being wrongly incremented on fork because all blocks in the block table were being incremented. This is wrong because if the context exceeds the sliding window, blocks in the block tables will be reused. The fix makes sure to only increment the ref_count by 1, by wrapping the table in set().
  2. the _free_block_table should not free already freed blocks. To this end, the method was changed to just free the last sliding_window blocks (if defined).

@hmellor
Copy link
Collaborator

hmellor commented Mar 12, 2024

If you run ./format.sh then this should be ready for review.

cc @WoosukKwon as you responded to the original issue.

@br3no
Copy link
Contributor Author

br3no commented Mar 12, 2024

@hmellor, done.

Copy link
Collaborator

@cadedaniel cadedaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Comment on lines +396 to +398
blocks_to_free = (block_table[-self.block_sliding_window:]
if self.block_sliding_window is not None else
block_table)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a comment for this line?

@@ -312,7 +312,7 @@ def fork(self, parent_seq: Sequence, child_seq: Sequence) -> None:
# Thus, it is always safe from OOM.
src_block_table = self.block_tables[parent_seq.seq_id]
self.block_tables[child_seq.seq_id] = src_block_table.copy()
for block in src_block_table:
for block in set(src_block_table):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add comment why we need to deduplicate?

@@ -274,3 +274,48 @@ def test_reset():
# Resetting block manager frees all allocated blocks.
block_manager.reset()
assert block_manager.get_num_free_gpu_blocks() == original_blocks


def test_sliding_window_multi_seq():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a small docstring explaining what runtime property this test verifies and the basic methodology?

@br3no
Copy link
Contributor Author

br3no commented Mar 12, 2024

@cadedaniel here you go! Let me know if there's anything else missing.

@cadedaniel
Copy link
Collaborator

Just waiting for CI to finish 👍

@Yard1 Yard1 enabled auto-merge (squash) March 12, 2024 22:09
@Yard1 Yard1 merged commit 49a3c86 into vllm-project:main Mar 13, 2024
24 checks passed
@br3no br3no deleted the double-free-bug branch March 13, 2024 09:45
starmpcc pushed a commit to starmpcc/vllm that referenced this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants