[Tracking Issue][Performance]: Speculative decoding performance/QoL improvements #28947

@xinli-sw

This issue tracks work items that improve the usability and performance of speculative decoding in vLLM. Please feel free to review, suggest changes, and collaborate if you are interested!

Point of Contact: @benchislett, drafted by @benchislett

Asynchronous Scheduling Support

See the main Async Scheduling tracking issue for background: #27679

New Drafting Styles

Performance Improvements

Sampling Improvements

CC List

@pavanimajety @vadiklyutiy @hjjq @nvpohanh
