
Possible fix for conflict between Automated Prefix Caching (#2762) and multi-LoRA support (#1804) #3263

Conversation

@jacobthebanana (Contributor) commented Mar 7, 2024

Ensures the LoRA ID is a part of the hash used for prefix blocks.
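
For context, a minimal sketch of the idea (a hypothetical standalone function, not the actual vLLM code): the prefix-block hash must mix the LoRA integer ID in with the token IDs, so that two requests sharing a prompt prefix but using different adapters do not map to the same cache block.

```python
from typing import Sequence

def hash_of_block(token_ids: Sequence[int], num_hashed_tokens: int,
                  lora_int_id: int = 0) -> int:
    """Hash a prefix block on its token IDs *and* the LoRA adapter ID.

    Without lora_int_id in the hashed tuple, two sequences that share a
    prefix but are served by different adapters would collide on the same
    prefix-cache block, returning KV entries computed under the wrong
    adapter.
    """
    return hash((tuple(token_ids[:num_hashed_tokens]), lora_int_id))
```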

@jacobthebanana (Contributor, Author) commented:

Example unit test output using the revised test case, without the fix applied (see commit 3441735):

  • test_auto_prefix_caching passes when all requests specify the same single LoRA adapter, or when no adapter is requested.
  • test_auto_prefix_caching fails when subsequent requests specify different adapters (or when one request has no adapter and another has a LoRA adapter enabled); see the output below and the sketch after it.
$ git reset --hard 3441735
HEAD is now at 3441735 Added test case of lora block_hash conflict.
$ pytest tests/test_cache_block_hashing.py
============================================================= test session starts ==============================================================
platform linux -- Python 3.10.12, pytest-8.0.2, pluggy-1.4.0
plugins: forked-1.6.0, anyio-4.3.0, rerunfailures-13.0, asyncio-0.23.5
asyncio: mode=strict
collected 5 items                                                                                                                              

tests/test_cache_block_hashing.py ..FFF                                                                                                  [100%]

=================================================================== FAILURES ===================================================================
_________________________________ test_auto_prefix_caching[concurrent_lora_int_ids2-256-16-facebook/opt-125m] __________________________________

model = 'facebook/opt-125m', block_size = 16, max_num_seqs = 256, concurrent_lora_int_ids = [None, 1]

...

        for hash0, hash1 in zip(flatten_2d(hashes[0]), flatten_2d(hashes[1])):
>           assert (hash0 != hash1)
E           assert 6230683134333785342 != 6230683134333785342

tests/test_cache_block_hashing.py:84: AssertionError
_________________________________ test_auto_prefix_caching[concurrent_lora_int_ids3-256-16-facebook/opt-125m] __________________________________

model = 'facebook/opt-125m', block_size = 16, max_num_seqs = 256, concurrent_lora_int_ids = [None, 1, 2]
...

tests/test_cache_block_hashing.py:84: AssertionError
_________________________________ test_auto_prefix_caching[concurrent_lora_int_ids4-256-16-facebook/opt-125m] __________________________________

model = 'facebook/opt-125m', block_size = 16, max_num_seqs = 256, concurrent_lora_int_ids = [1, 2]
...

tests/test_cache_block_hashing.py:84: AssertionError
=========================================================== short test summary info ============================================================
FAILED tests/test_cache_block_hashing.py::test_auto_prefix_caching[concurrent_lora_int_ids2-256-16-facebook/opt-125m] - assert 6230683134333785342 != 6230683134333785342
FAILED tests/test_cache_block_hashing.py::test_auto_prefix_caching[concurrent_lora_int_ids3-256-16-facebook/opt-125m] - assert 6230683134333785342 != 6230683134333785342
FAILED tests/test_cache_block_hashing.py::test_auto_prefix_caching[concurrent_lora_int_ids4-256-16-facebook/opt-125m] - assert 6230683134333785342 != 6230683134333785342
==================================================== 3 failed, 2 passed, 1 warning in 1.47s ====================================================
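
To make the failing assertion concrete, here is a hedged, self-contained sketch of the check the revised test performs, reusing the hypothetical hash_of_block above (the real test in tests/test_cache_block_hashing.py builds hashes through the engine's sequence objects, so details differ):

```python
def flatten_2d(nested):
    # Flatten a list of per-sequence hash lists into one flat list.
    return [item for row in nested for item in row]

def test_hashes_differ_across_adapters():
    token_ids = list(range(64))   # a 64-token prompt shared by both requests
    block_size = 16
    hashes = []
    for lora_int_id in (0, 1):    # 0 stands in for "no adapter requested"
        num_blocks = len(token_ids) // block_size
        per_seq = [[hash_of_block(token_ids, (i + 1) * block_size, lora_int_id)
                    for i in range(num_blocks)]]
        hashes.append(per_seq)
    # Without lora_int_id in the hash, every pair below is equal and the
    # assert fails, exactly as in the pytest output above.
    for hash0, hash1 in zip(flatten_2d(hashes[0]), flatten_2d(hashes[1])):
        assert hash0 != hash1
```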

@jacobthebanana jacobthebanana marked this pull request as ready for review March 7, 2024 22:02
@jacobthebanana (Contributor, Author) commented:

This PR closes #3264

@Yard1 (Collaborator) left a comment:

Thanks, that's exactly how it should be implemented!

@Yard1 Yard1 enabled auto-merge (squash) March 7, 2024 22:06
@Yard1 Yard1 merged commit 8cbba46 into vllm-project:main Mar 7, 2024
23 checks passed
AdrianAbeyta pushed a commit to AdrianAbeyta/vllm that referenced this pull request Mar 8, 2024
dtransposed pushed a commit to afeldman-nm/vllm that referenced this pull request Mar 26, 2024