Enable Eagle3 speculative decoding for GPT-OSS model #25246
Conversation
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Code Review
This pull request enables Eagle3 speculative decoding for the GPT-OSS model. The changes are generally well-implemented, including adding the model to the supported list, implementing the `SupportsEagle3` interface, and generalizing the embedding-layer sharing logic. However, I've identified a critical issue in `GptOssModel.forward` where a `TypeError` can occur due to improper handling of a `None` residual on the first layer of a pipeline stage. I have provided a code suggestion to fix this bug.
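The failure mode flagged by the review can be sketched in a few lines. This is a hypothetical simplification, not vLLM's actual `GptOssModel.forward`; the point is that on the first layer of a pipeline stage `residual` starts as `None` and must be initialized rather than added:

```python
def forward_layers(hidden_states, layers):
    """Illustrative decoder loop; `layers` are plain callables here."""
    residual = None  # first layer of a pipeline stage has no residual yet
    for layer in layers:
        if residual is None:
            # Initialize instead of computing `hidden_states + None`,
            # which would raise a TypeError.
            residual = hidden_states
        else:
            hidden_states = hidden_states + residual
            residual = hidden_states
        hidden_states = layer(hidden_states)
    return hidden_states, residual

out, res = forward_layers(1.0, [lambda x: x * 2, lambda x: x * 2])
```

Without the `None` branch, the second statement in the loop body would fail on the very first layer.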
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Looks good to me!
LGTM, but let's wait for someone more familiar with Eagle perhaps.
Would be great to have smoke tests for model init, at least with layer-capped versions.
```diff
 f"{self.disable_by_batch_size=}")

-eagle3_target_supported = ["llama", "qwen"]
+eagle3_target_supported = ["llama", "qwen", "gpt_oss"]
```
Unrelated to this PR, but I wonder why we have to list models here instead of relying on `SupportsEagle3` dispatching.
To be honest I'm not sure; I just followed llama.py / qwen.py to see how Eagle is enabled there.
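The interface-based alternative floated above could look roughly like this. `SupportsEagle3` is a real vLLM interface, but the class names and the check below are an illustrative sketch, not the project's actual dispatch code:

```python
class SupportsEagle3:
    """Marker interface: the model can act as an Eagle3 target."""

class GptOssForCausalLM(SupportsEagle3):
    """Hypothetical stand-in for the real model class."""

class SomeOtherModel:
    """A model that does not implement the interface."""

def supports_eagle3(model) -> bool:
    # Dispatch on the interface instead of a hard-coded list of
    # architecture names like ["llama", "qwen", "gpt_oss"].
    return isinstance(model, SupportsEagle3)
```

With this approach, adding a new target model would only require implementing the interface, with no central list to update.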
```python
logger.info(
    "Assuming the EAGLE head shares the same vocab embedding"
    " with the target model.")
del self.model.model.embed_tokens
```
should be picked up by gc
I took this from llama.py and qwen.py. Should I remove it here or leave it for consistency?
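The pattern under discussion, as inherited from llama.py / qwen.py, is roughly the following sketch (names simplified and hypothetical; the idea is that the draft head drops its own vocab embedding and reuses the target model's):

```python
class Stub:
    """Stand-in for an nn.Embedding; only object identity matters here."""

class DraftHead:
    def __init__(self):
        self.embed_tokens = Stub()  # draft's own vocab embedding

def share_embeddings(draft, target_embed):
    # Drop the draft's embedding and point it at the target's, so both
    # models read token embeddings from the same underlying weights.
    del draft.embed_tokens
    draft.embed_tokens = target_embed

target_embed = Stub()
draft = DraftHead()
share_embeddings(draft, target_embed)
```

The explicit `del` frees the draft's copy immediately rather than waiting for it to become unreachable, which matters mostly when the embedding is large.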
Noting #23596, which implements this similarly but includes Blackwell support via FlashInfer. It should be fine to merge this one first, but we should make sure they're consistent, since we will need Blackwell support as well.
I think for models with alternative attention like gpt-oss, you need to find the correct attention builders for the draft model (the draft model uses full attention), as in https://github.com/vllm-project/vllm/pull/23596/files#diff-a4809a837fbf535a8f0999b11087a53ec1c53948b50c0a1fe64396bc86de9461R883-R906. Without this, it will use sliding-window attention, which will cause accuracy issues (the target model's KV cache will be overwritten by the draft model).
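The concern can be made concrete with a small sketch (all names here are hypothetical): gpt-oss interleaves sliding-window and full-attention layers, while an Eagle3 draft head uses full attention everywhere, so the draft must not inherit the target's per-layer attention builders:

```python
# Hypothetical per-layer attention types for a gpt-oss-like target model.
target_layer_types = ["sliding_window", "full", "sliding_window", "full"]

def draft_attn_builders(num_draft_layers: int):
    # The draft model is full attention throughout; reusing the target's
    # sliding-window builders would place draft KV entries into cache
    # slots laid out for the target, corrupting the target's KV cache.
    return ["full"] * num_draft_layers

builders = draft_attn_builders(2)
```

The linked diff in #23596 shows one way to wire the correct builders in for real.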
The goal of this one was to first add support for the simplest Llama-like speculator from Eagle3, and then we can build on top of it for more complex architectures. This unblocks some preliminary experimentation with speculative decoding.
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com>
This PR adds support for EAGLE3 speculative decoding for the GPT-OSS model. The changes were tested with a locally trained speculator model, and reasonable acceptance rates were observed.