[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec logprobs #4672

cadedaniel · 2024-05-08T07:01:33Z

There's a bug introduced in my spec logprobs PR that only shows up for ngram. We missed it because the ngram tests weren't actually running until recently (#4551).

The bug is that the full output logprob tensor, of shape (batch_size, num_speculation, vocab_size) in float32, is returned once per step instead of being sliced into per-step (batch_size, 1, vocab_size) views. The spec decode framework then does a torch.stack to make the logprobs of all steps contiguous, so memory allocation grows quadratically in num_steps. At a high enough batch size (e.g. 64) this can cause OOMs.

This is fixed by appropriately slicing the output logprobs.
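To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch in plain Python. It is not the actual patch: the function name, the example sizes, and the assumption that num_speculation and num_steps refer to the same quantity are all illustrative.

```python
FLOAT32_BYTES = 4  # size of one float32 element


def stacked_logprob_bytes(batch_size, num_steps, vocab_size, sliced):
    """Estimate the bytes held by the stacked logprobs after all steps.

    Hypothetical helper for illustration only.
    sliced=False models the bug: each step contributes the full
    (batch_size, num_steps, vocab_size) tensor.
    sliced=True models the fix: each step contributes only its own
    (batch_size, 1, vocab_size) slice.
    """
    per_step_width = 1 if sliced else num_steps
    per_step_elems = batch_size * per_step_width * vocab_size
    return num_steps * per_step_elems * FLOAT32_BYTES


# Assumed example sizes; only the batch size of 64 comes from the description.
batch_size, num_steps, vocab_size = 64, 5, 32_000

buggy = stacked_logprob_bytes(batch_size, num_steps, vocab_size, sliced=False)
fixed = stacked_logprob_bytes(batch_size, num_steps, vocab_size, sliced=True)

print(f"buggy: {buggy / 1e6:.1f} MB, fixed: {fixed / 1e6:.1f} MB")
print(f"overallocation factor: {buggy // fixed}")  # equals num_steps
```

Under these assumptions the stacked result is num_steps times larger than it needs to be. In PyTorch terms, a per-step view would look something like `logprobs[:, i:i+1, :]`, though the description does not say whether the patch slices in exactly this way.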

comaniac

Good catch!

LiuXiaoxuanPKU

Thanks for the fix!

fix overallocation

fc8e385

comaniac approved these changes May 8, 2024

comaniac mentioned this pull request May 8, 2024

[Bug fix][Core] fixup ngram not setup correctly #4551

Merged

cadedaniel changed the title ~~[WIP] Fix overallocation in ngram + spec logprobs~~ [Speculative decoding] [Bugfix] Fix overallocation in ngram + spec logprobs May 8, 2024

cadedaniel marked this pull request as ready for review May 8, 2024 21:00

simon-mo approved these changes May 8, 2024

simon-mo enabled auto-merge (squash) May 8, 2024 21:01

LiuXiaoxuanPKU approved these changes May 8, 2024

simon-mo merged commit 8b9241b into vllm-project:main May 8, 2024
54 of 55 checks passed

z103cb pushed a commit to z103cb/opendatahub_vllm that referenced this pull request May 9, 2024

[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec lo…

3f6229f

…gprobs (vllm-project#4672)

robertgshaw2-neuralmagic pushed a commit to neuralmagic/nm-vllm that referenced this pull request May 19, 2024

[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec lo…

edd9e90

…gprobs (vllm-project#4672)

dtrifiro pushed a commit to dtrifiro/vllm that referenced this pull request May 21, 2024

[Speculative decoding] [Bugfix] Fix overallocation in ngram + spec lo…

683a105

…gprobs (vllm-project#4672)
