
[Speculative decoding] Adding configuration object for speculative decoding #3706

Merged — 8 commits merged into vllm-project:main on Apr 3, 2024

Conversation

cadedaniel (Collaborator) commented Mar 29, 2024

This PR is a subset of PR 6/9 ("Integrate speculative decoding with LLMEngine") in the speculative decoding open-sourcing plan. It introduces a SpeculativeConfig and plumbs it through to the executors. The new flags are as follows:

parser.add_argument(
    '--speculative-model',
    type=str,
    default=None,
    help='The name of the draft model to be used in speculative decoding.')

parser.add_argument(
    '--num-speculative-tokens',
    type=int,
    default=None,
    help='The number of speculative tokens to sample from '
         'the draft model in speculative decoding.')

In the future we can extend these flags to support non-draft-model speculative decoding.
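To illustrate how these two flags might feed a config object, here is a minimal, hypothetical sketch (the names `SpeculativeConfig`, `draft_model`, and `maybe_create_spec_config` are illustrative, not the PR's actual definitions; the actual class lives in vllm/config.py):

```python
import argparse
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeculativeConfig:
    """Hypothetical container mirroring the two new CLI flags."""
    draft_model: str
    num_speculative_tokens: int


def maybe_create_spec_config(args: argparse.Namespace) -> Optional[SpeculativeConfig]:
    # Both flags default to None, so speculative decoding is only
    # enabled when a draft model is actually supplied.
    if args.speculative_model is None:
        return None
    return SpeculativeConfig(
        draft_model=args.speculative_model,
        num_speculative_tokens=args.num_speculative_tokens,
    )


parser = argparse.ArgumentParser()
parser.add_argument('--speculative-model', type=str, default=None)
parser.add_argument('--num-speculative-tokens', type=int, default=None)

args = parser.parse_args(
    ['--speculative-model', 'facebook/opt-125m',
     '--num-speculative-tokens', '5'])
config = maybe_create_spec_config(args)
```

With no flags passed, `maybe_create_spec_config` returns `None` and the engine behaves as before; this keeps the feature strictly opt-in.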

Testing

  • We assert that the GPUExecutor raises an AssertionError when speculative decoding is enabled. This verifies that the config is plumbed through correctly.

Misc.

  • This wraps the various engine config classes in an EngineConfig. This removes the need to do parallel_config = engine_configs[2] and device_config = engine_configs[4].
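The motivation for the wrapper can be shown with a simplified sketch (the stand-in config classes and their fields here are illustrative, not vLLM's actual definitions):

```python
from dataclasses import dataclass


# Stand-in config classes; the real ones live in vllm/config.py
# and carry many more fields.
@dataclass
class ParallelConfig:
    tensor_parallel_size: int = 1


@dataclass
class DeviceConfig:
    device: str = "cuda"


@dataclass
class EngineConfig:
    # Named fields replace fragile positional access like
    # engine_configs[2] / engine_configs[4], which silently breaks
    # whenever a new config is added to the tuple.
    parallel_config: ParallelConfig
    device_config: DeviceConfig


cfg = EngineConfig(ParallelConfig(), DeviceConfig())
# cfg.device_config.device instead of engine_configs[4].device
```

Accessing configs by attribute name also lets type checkers catch a wrong lookup, which positional tuple indexing cannot.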

@cadedaniel cadedaniel changed the title [WIP] [Speculative decoding] Adding configuration object for speculative decoding [Speculative decoding] Adding configuration object for speculative decoding Mar 29, 2024
@LiuXiaoxuanPKU LiuXiaoxuanPKU self-assigned this Mar 29, 2024
@cadedaniel cadedaniel marked this pull request as ready for review April 1, 2024 23:48
@cadedaniel (Collaborator, Author) commented:

ready for review @LiuXiaoxuanPKU

@LiuXiaoxuanPKU (Collaborator) left a comment:

LGTM!

Review comment on vllm/config.py (outdated, resolved)
Co-authored-by: Lily Liu <lilyliupku@gmail.com>
@cadedaniel cadedaniel enabled auto-merge (squash) April 2, 2024 23:07
@cadedaniel cadedaniel merged commit 5757d90 into vllm-project:main Apr 3, 2024
35 checks passed
@cadedaniel cadedaniel deleted the spec-decode-llm-engine branch April 3, 2024 17:37
z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request Apr 22, 2024