[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" #3837

cadedaniel · 2024-04-03T23:46:26Z

This PR implements the BaseWorker interface described in #3809. It lets speculative decoding treat all workers the same way, so future work can enable speculative decoding for the CPU, Neuron, and other vLLM backends.

I will write up docs on how to do this after spec decode is merged. The TL;DR is that a backend needs to implement the rejection sampler (currently implemented only in PyTorch) and add plumbing between the proposal method and the verification model that works for that hardware backend (currently only top-1 fixed speculation is implemented, for PyTorch).
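The rejection sampler mentioned above is the standard speculative-sampling acceptance step: a draft token x is accepted with probability min(1, p(x)/q(x)), where q is the proposer's distribution and p is the target model's; on the first rejection, a replacement token is drawn from the normalized residual max(0, p - q) and the rest of the draft is discarded. The sketch below is a backend-neutral, pure-Python illustration of that rule, not vLLM's PyTorch implementation; the function name `rejection_sample` and the list-of-probabilities inputs are assumptions made for the example.

```python
import random
from typing import List, Sequence


def rejection_sample(
    draft_tokens: Sequence[int],
    draft_probs: Sequence[Sequence[float]],   # q: one proposer distribution per draft position
    target_probs: Sequence[Sequence[float]],  # p: one target distribution per draft position, plus one bonus
    rng: random.Random,
) -> List[int]:
    """Hypothetical helper: return the accepted draft prefix plus one extra token."""
    assert len(draft_probs) == len(draft_tokens)
    assert len(target_probs) == len(draft_tokens) + 1
    accepted: List[int] = []
    for tok, q, p in zip(draft_tokens, draft_probs, target_probs):
        # Accept the draft token with probability min(1, p(x) / q(x)).
        if q[tok] > 0 and rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            continue
        # Rejected: resample from the normalized residual max(0, p - q), then stop.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        weights = residual if sum(residual) > 0 else list(p)
        accepted.append(rng.choices(range(len(p)), weights=weights)[0])
        return accepted
    # Every draft token was accepted: sample a bonus token from the final target distribution.
    bonus = target_probs[-1]
    accepted.append(rng.choices(range(len(bonus)), weights=bonus)[0])
    return accepted
```

When the proposer and target agree (p == q), every draft token is accepted and one bonus token is appended; a draft token the target assigns zero probability is always rejected and replaced.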

Notes

This PR moves some logic from each executor back into its respective worker, for example the check that the cache size is valid. This lets speculative decoding benefit from these checks, since the speculative worker composes workers (not executors). Notably, the checks differ between CPU and GPU workers.
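The composition described above can be sketched as follows. This is a simplified, hypothetical model rather than vLLM's actual classes: the method names `determine_num_available_blocks` and `initialize_cache` are chosen to mirror a worker-level cache interface, but their signatures here are assumptions, and `raise_if_cache_size_invalid`, `ToyGpuWorker`, `SpecDecodeWorkerSketch`, and the "take the smaller budget" policy are invented for illustration. The point is that because each worker validates its own cache size, a speculative worker that wraps two workers gets those checks without any executor involvement.

```python
from abc import ABC, abstractmethod
from typing import Tuple


class WorkerBase(ABC):
    """Hypothetical, simplified worker interface shared by every backend."""

    @abstractmethod
    def determine_num_available_blocks(self) -> Tuple[int, int]:
        """Return (num_device_blocks, num_swap_blocks) this worker can allocate."""

    @abstractmethod
    def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -> None:
        """Validate the requested sizes, then allocate the KV cache."""


def raise_if_cache_size_invalid(num_blocks: int, block_size: int, max_model_len: int) -> None:
    # Hypothetical helper: the check lives with the worker, not the executor.
    if num_blocks <= 0:
        raise ValueError("No available memory for the cache blocks.")
    max_seq_len = block_size * num_blocks
    if max_model_len > max_seq_len:
        raise ValueError(
            f"max_model_len ({max_model_len}) exceeds the KV cache capacity ({max_seq_len} tokens)."
        )


class ToyGpuWorker(WorkerBase):
    def __init__(self, free_blocks: int, block_size: int = 16, max_model_len: int = 2048):
        self.free_blocks = free_blocks
        self.block_size = block_size
        self.max_model_len = max_model_len
        self.num_gpu_blocks = 0

    def determine_num_available_blocks(self) -> Tuple[int, int]:
        return self.free_blocks, 0

    def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -> None:
        raise_if_cache_size_invalid(num_gpu_blocks, self.block_size, self.max_model_len)
        self.num_gpu_blocks = num_gpu_blocks


class SpecDecodeWorkerSketch(WorkerBase):
    """Composes two workers (not executors), so each worker's own checks still run."""

    def __init__(self, proposer: WorkerBase, scorer: WorkerBase):
        self.proposer = proposer
        self.scorer = scorer

    def determine_num_available_blocks(self) -> Tuple[int, int]:
        # Illustrative policy only: both models must fit, so take the smaller budget.
        p_dev, p_swap = self.proposer.determine_num_available_blocks()
        s_dev, s_swap = self.scorer.determine_num_available_blocks()
        return min(p_dev, s_dev), min(p_swap, s_swap)

    def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -> None:
        # Delegation means the backend-specific validation runs inside each worker.
        self.scorer.initialize_cache(num_gpu_blocks, num_cpu_blocks)
        self.proposer.initialize_cache(num_gpu_blocks, num_cpu_blocks)
```

For example, `spec = SpecDecodeWorkerSketch(ToyGpuWorker(200), ToyGpuWorker(150))` followed by `spec.initialize_cache(*spec.determine_num_available_blocks())` allocates 150 blocks in both workers. Because the speculative worker only calls `WorkerBase` methods, swapping `ToyGpuWorker` for a CPU or Neuron worker would not change it.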

Closes #3809

LiuXiaoxuanPKU

LGTM! Just some minor questions.

LiuXiaoxuanPKU · 2024-04-09T03:57:24Z

vllm/worker/cpu_worker.py

+        # Note: To reuse the cache management procedure,
+        # use cpu cache as 'gpu cache'.
+        num_cpu_blocks = num_gpu_blocks
+        del num_gpu_blocks


Confused about lines 209-212: why del here? Shouldn't this just be self.cache_config.num_gpu_blocks = num_gpu_blocks?

cadedaniel · 2024-04-09

I'm trying to make the code readable given the awkwardness of num_gpu_blocks actually being num_cpu_blocks, and num_cpu_blocks being ignored/always zero. But yeah, the del is a bit extreme for this.

I refactored out the checks; the code is more readable now.

LiuXiaoxuanPKU · 2024-04-09T04:01:14Z

vllm/worker/cpu_worker.py

+        self.cache_config.num_gpu_blocks = num_cpu_blocks
+        self.cache_config.num_cpu_blocks = 0
+
+        if num_cpu_blocks <= 0:


Nit: move the check before we read num_cpu_blocks
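Applying the nit, a CPU worker's cache initialization might look like the sketch below: rename once instead of using del, validate first, and only then record the CPU block count in the num_gpu_blocks slot that the shared cache-management code reads. `CacheConfigSketch`, `CpuWorkerSketch`, and `_validate_num_cpu_blocks` are hypothetical stand-ins; only the field names `num_gpu_blocks` and `num_cpu_blocks` come from the diff above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for the cache config touched in the diff."""
    block_size: int = 16
    num_gpu_blocks: Optional[int] = None
    num_cpu_blocks: Optional[int] = None


class CpuWorkerSketch:
    def __init__(self, cache_config: CacheConfigSketch, max_model_len: int):
        self.cache_config = cache_config
        self.max_model_len = max_model_len

    def initialize_cache(self, num_gpu_blocks: int, num_cpu_blocks: int) -> None:
        # On CPU the "device" cache lives in host memory, so the usable block count
        # arrives as num_gpu_blocks; num_cpu_blocks (swap) is ignored for this backend.
        num_cpu_cache_blocks = num_gpu_blocks
        # Validate before writing any state, per the review nit.
        self._validate_num_cpu_blocks(num_cpu_cache_blocks)
        # Reuse the GPU-cache slot so the shared cache-management code works unchanged.
        self.cache_config.num_gpu_blocks = num_cpu_cache_blocks
        self.cache_config.num_cpu_blocks = 0

    def _validate_num_cpu_blocks(self, num_blocks: int) -> None:
        if num_blocks <= 0:
            raise ValueError("No available memory for the cache blocks.")
        capacity = self.cache_config.block_size * num_blocks
        if self.max_model_len > capacity:
            raise ValueError(
                f"max_model_len ({self.max_model_len}) is larger than the "
                f"CPU KV cache can hold ({capacity} tokens)."
            )
```

Because the check runs first, a rejected size leaves the config untouched instead of half-updated.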


cadedaniel added 30 commits April 3, 2024 14:17

252a0c7 wip
dd629d4 Merge remote-tracking branch 'upstream/main' into executor_base
a34800f wip
09f30bd wip
8b5bb8b clean
6fd424f wip
2a347bb wip
658ff9b wip
acee7be wip
85760d6 wip
408b29d wip
9d8fd69 Merge remote-tracking branch 'upstream/main' into executor_base
3149a03 wip
0c32e0a wip
f64d5b1 wip
7207f0c wip
0c4df0b wip
2e355e7 wip
edb7f62 wip
48bb3e9 wip
7b39044 fix test
9e5f2fb fix test
1a3e26e fix test
cd2015c fix test
d926034 fix
607f7e2 fix
e127bb7 fix
deaa8b0 fix
7817d61 clean
99823a3 clean

cadedaniel added 10 commits April 4, 2024 19:39

3bb9e6f rename
edad09c wip
f93c845 wip
d2d2218 wip
2f960e7 lint
68552e1 wip
42983ba import order
2d5dbb8 fix
ae2f7e6 docstrings
c89bb75 Merge branch 'main' into executor_base

cadedaniel changed the title Apr 5, 2024, removing the [Draft] prefix.

cadedaniel marked this pull request as ready for review April 5, 2024 05:59

cadedaniel mentioned this pull request Apr 5, 2024 in [RFC] Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding #3809 (Closed)

LiuXiaoxuanPKU self-assigned this Apr 5, 2024

2b0d787 Merge remote-tracking branch 'upstream/main' into executor_base

cadedaniel mentioned this pull request Apr 8, 2024 in [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine #3894 (Merged)

LiuXiaoxuanPKU reviewed Apr 9, 2024


ca516aa pr feedback

cadedaniel enabled auto-merge (squash) April 9, 2024 04:49

LiuXiaoxuanPKU approved these changes Apr 9, 2024


cadedaniel merged commit e7c7067 into vllm-project:main Apr 9, 2024
35 checks passed

cadedaniel mentioned this pull request Apr 9, 2024 in [Misc] [CI]: Flaky test failure in test_chatglm3_lora #3947 (Closed)

SageMoore pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 11, 2024: 698bbe7 [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (vllm-project#3837)

andy-neuma pushed a commit to neuralmagic/nm-vllm that referenced this pull request Apr 12, 2024: 7e06ab2 [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (vllm-project#3837)

This was referenced Apr 12, 2024:
[Bug]: Incorrect typing for Python 3.8 #4035 (Closed)
[Bugfix] More type hint fixes for py 3.8 #4039 (Merged)

bigPYJ1151 mentioned this pull request Apr 15, 2024 in [Misc][Minor] Fix CPU block num log in CPUExecutor. #4088 (Merged)

z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024: d351238 [Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (vllm-project#3837)

dtrifiro mentioned this pull request May 15, 2024 in bump ubi base image tag opendatahub-io/vllm#24 (Merged)
