
[Speculative decoding] Adding configuration object for speculative decoding #3706

Merged — 8 commits merged into vllm-project:main on Apr 3, 2024

Conversation

cadedaniel (Collaborator) commented Mar 29, 2024

This PR is a subset of PR 6/9 ("Integrate speculative decoding with LLMEngine") in the speculative decoding open-sourcing plan. It introduces a SpeculativeConfig and plumbs it through to the executors. The new flags are as follows:

parser.add_argument(
    '--speculative-model',
    type=str,
    default=None,
    help='The name of the draft model to be used in speculative decoding.')

parser.add_argument(
    '--num-speculative-tokens',
    type=int,
    default=None,
    help='The number of speculative tokens to sample from '
         'the draft model in speculative decoding.')

In the future we can extend these flags to support non-draft-model speculative decoding.
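To illustrate how these two flags might feed a config object, here is a minimal, hypothetical sketch (the names `SpeculativeConfig`, `draft_model`, and `maybe_create_spec_config` are illustrative, not the PR's actual definitions; the actual class lives in vllm/config.py):

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    """Hypothetical container mirroring the two new CLI flags."""
    draft_model: str
    num_speculative_tokens: int


def maybe_create_spec_config(args: argparse.Namespace) -> Optional[SpeculativeConfig]:
    # Both flags default to None, so speculative decoding is only
    # enabled when a draft model is actually supplied.
    if args.speculative_model is None:
        return None
    return SpeculativeConfig(
        draft_model=args.speculative_model,
        num_speculative_tokens=args.num_speculative_tokens,
    )


parser = argparse.ArgumentParser()
parser.add_argument('--speculative-model', type=str, default=None)
parser.add_argument('--num-speculative-tokens', type=int, default=None)

args = parser.parse_args(
    ['--speculative-model', 'facebook/opt-125m',
     '--num-speculative-tokens', '5'])
config = maybe_create_spec_config(args)
```

With no flags passed, `maybe_create_spec_config` returns `None` and the engine behaves as before; this keeps the feature strictly opt-in.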

Testing

  • We assert that the GPUExecutor raises an AssertionError when speculative decoding is enabled. This verifies that the config is plumbed through correctly.

Misc.

  • This wraps the various engine config classes in an EngineConfig. This removes the need to do parallel_config = engine_configs[2] and device_config = engine_configs[4].
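The motivation for the wrapper can be shown with a simplified sketch (the stand-in config classes and their fields here are illustrative, not vLLM's actual definitions):

```python
from dataclasses import dataclass


# Stand-in config classes; the real ones live in vllm/config.py
# and carry many more fields.
@dataclass
class ParallelConfig:
    tensor_parallel_size: int = 1


@dataclass
class DeviceConfig:
    device: str = "cuda"


@dataclass
class EngineConfig:
    # Named fields replace fragile positional access like
    # engine_configs[2] / engine_configs[4], which silently breaks
    # whenever a new config is added to the tuple.
    parallel_config: ParallelConfig
    device_config: DeviceConfig


cfg = EngineConfig(ParallelConfig(), DeviceConfig())
# cfg.device_config.device instead of engine_configs[4].device
```

Accessing configs by attribute name also lets type checkers catch a wrong lookup, which positional tuple indexing cannot.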

@cadedaniel cadedaniel changed the title [WIP] [Speculative decoding] Adding configuration object for speculative decoding [Speculative decoding] Adding configuration object for speculative decoding Mar 29, 2024
@LiuXiaoxuanPKU LiuXiaoxuanPKU self-assigned this Mar 29, 2024
@cadedaniel cadedaniel marked this pull request as ready for review April 1, 2024 23:48
@cadedaniel (Collaborator, Author) commented:

ready for review @LiuXiaoxuanPKU

@LiuXiaoxuanPKU (Collaborator) left a comment:

LGTM!

Review comment on vllm/config.py (outdated, resolved)
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
@cadedaniel cadedaniel enabled auto-merge (squash) April 2, 2024 23:07
@cadedaniel cadedaniel merged commit 5757d90 into vllm-project:main Apr 3, 2024
35 checks passed
@cadedaniel cadedaniel deleted the spec-decode-llm-engine branch April 3, 2024 17:37
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024