feat: add train_with_decode support for speculative decoding during training (#28)
Merged
torchspec-bot merged 1 commit into main on Mar 4, 2026
Conversation
Force-pushed: b9f8d3d to 60fa773
Force-pushed: 46b9971 to b3f976c
Force-pushed: b3f976c to 52cd929
…raining

Co-authored-by: Bobbie Bie <96061080+BobbyIsHandsome@users.noreply.github.com>
Co-authored-by: Junxiong Wang <chuangzhetianxia@gmail.com>
Co-authored-by: Shirley Wu <shirley@research-dev-b200-04.cloud.together.ai>
Co-authored-by: Yubo Wang <yubowang2019@gmail.com>
yubofredwang approved these changes on Mar 4, 2026
cicirori added a commit that referenced this pull request on Mar 4, 2026:

…raining (#28)

Co-authored-by: BobbyIsHandsome <96061080+BobbyIsHandsome@users.noreply.github.com>
Co-authored-by: Junxiong Wang <16102460+jxiw@users.noreply.github.com>
Co-authored-by: xwuShirley <37637998+xwuShirley@users.noreply.github.com>
Co-authored-by: Yubo Wang <10526540+yubofredwang@users.noreply.github.com>
zhubohao911 pushed a commit to zhubohao911/TorchSpec that referenced this pull request on Mar 22, 2026:

- Results: Compute sub-breakdown, 200-step stability, optimization tests
- Issues: torchspec-project#27 torch.compile recompilation, torchspec-project#28 GPU Direct RDMA, torchspec-project#29 Mooncake bypass
- Pending work: Updated completed items, active training tasks
- Best config: no_sync + bf16 reduce → 2.7 step/s (+8%), ~3.9hr training

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
zhubohao911 pushed a commit to zhubohao911/TorchSpec that referenced this pull request on Mar 23, 2026:

…raining (torchspec-project#28)

Co-authored-by: BobbyIsHandsome <96061080+BobbyIsHandsome@users.noreply.github.com>
Co-authored-by: Junxiong Wang <16102460+jxiw@users.noreply.github.com>
Co-authored-by: xwuShirley <37637998+xwuShirley@users.noreply.github.com>
Co-authored-by: Yubo Wang <10526540+yubofredwang@users.noreply.github.com>
zhubohao911 pushed a commit to zhubohao911/TorchSpec that referenced this pull request on Mar 23, 2026:

- Results: Compute sub-breakdown, 200-step stability, optimization tests
- Issues: torchspec-project#27 torch.compile recompilation, torchspec-project#28 GPU Direct RDMA, torchspec-project#29 Mooncake bypass
- Pending work: Updated completed items, active training tasks
- Best config: no_sync + bf16 reduce → 2.7 step/s (+8%), ~3.9hr training
Summary
Adds a `DecodeConfig` dataclass, an `SglDecodeEngineMixin` for decode-mode generation and draft weight sync, an sglang decode patch, configs, and example scripts.

Changes
- `torchspec/config/train_config.py`: Add `train_with_decode` flag and `DecodeConfig` dataclass, update `dynamic_loss_mask` conditional
- `torchspec/inference/engine/sgl_engine_decode.py`: New mixin with `generate_with_decode()` and `update_weights_from_disk()`
- `torchspec/inference/engine/sgl_engine.py`: Inherit `SglDecodeEngineMixin`, add decode engine kwargs
- `torchspec/controller/inference_manager.py`: Conditional dispatch (`generate` vs `generate_with_decode`)
- `torchspec/controller/loop.py`: `_maybe_sync_draft_weights()` for periodic draft model updates
- `torchspec/train_entry.py`: `_maybe_create_scratch_draft()`, startup validation for decode mode
- `torchspec/data/dataset.py`: Pass `train_with_decode` as `add_generation_prompt` through the data pipeline
- `patches/sglang/v0.5.8.post1/sglang_decode.patch`: Full sglang patch for decode-mode support
- `tools/apply_sglang_patch.sh`: Add `--decode` flag to apply the decode patch
- `tools/convert_to_hf.py`: Add `train_with_decode=False` parameter
- `configs/train_with_decode/`: Qwen3-8B and Kimi-K2.5-NVFP4 decode-mode configs
- `examples/train-with-decode/`: Launch scripts for decode-mode training

Test plan
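To illustrate the config change, here is a minimal sketch of what a `train_with_decode` flag plus nested `DecodeConfig` dataclass could look like. The field names (`max_new_tokens`, `temperature`, `draft_sync_interval`) are assumptions for illustration, not the actual TorchSpec definitions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecodeConfig:
    """Hypothetical settings used when responses come from the decode
    engine instead of pre-recorded dataset completions."""
    max_new_tokens: int = 512
    temperature: float = 0.8
    draft_sync_interval: int = 100  # sync draft weights every N steps

@dataclass
class TrainConfig:
    train_with_decode: bool = False
    decode: Optional[DecodeConfig] = None

    def __post_init__(self):
        # Decode settings are only meaningful in decode mode; fill in
        # defaults so downstream code can rely on cfg.decode existing.
        if self.train_with_decode and self.decode is None:
            self.decode = DecodeConfig()

cfg = TrainConfig(train_with_decode=True)
```

A nested dataclass keeps decode-only knobs out of the top-level config when the feature is off, which matches the PR's pattern of gating everything behind one flag.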
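The control flow in `inference_manager.py` (conditional dispatch) and `loop.py` (periodic draft weight sync) can be sketched as below. The classes and function here are stand-ins, not the actual TorchSpec implementation:

```python
class FakeEngine:
    """Stand-in for the sglang engine; real methods per the PR are
    generate() and generate_with_decode()."""
    def generate(self, prompts):
        return [p + " <completion>" for p in prompts]

    def generate_with_decode(self, prompts):
        # Decode mode: run full autoregressive decoding for each prompt.
        return [p + " <decoded>" for p in prompts]

class InferenceManager:
    def __init__(self, engine, train_with_decode):
        self.engine = engine
        self.train_with_decode = train_with_decode

    def rollout(self, prompts):
        # Conditional dispatch: take the decode path only when enabled.
        if self.train_with_decode:
            return self.engine.generate_with_decode(prompts)
        return self.engine.generate(prompts)

def maybe_sync_draft_weights(step, interval, sync_fn):
    """Push updated draft weights to the decode engine every
    `interval` steps; returns True when a sync happened."""
    if interval > 0 and step % interval == 0:
        sync_fn()
        return True
    return False
```

Periodic (rather than per-step) weight sync amortizes the cost of shipping draft weights to the inference engine, at the price of the decode engine running a slightly stale draft between syncs.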
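The `dataset.py` change forwards `train_with_decode` as `add_generation_prompt`: when the response will be produced by the decode engine, the rendered chat prompt must end with an open assistant turn so generation starts fresh, whereas offline training keeps the recorded response. A toy renderer with a made-up template (the token markers are illustrative, not a real chat template) shows the effect:

```python
def render_chat(messages, add_generation_prompt):
    """Toy chat-template renderer; the <|...|> markers are invented
    for illustration only."""
    parts = [f"<|{m['role']}|>{m['content']}<|end|>" for m in messages]
    if add_generation_prompt:
        # Open an empty assistant turn so the engine decodes a response.
        parts.append("<|assistant|>")
    return "".join(parts)

train_with_decode = True
prompt = render_chat([{"role": "user", "content": "hi"}],
                     add_generation_prompt=train_with_decode)
```

This mirrors the `add_generation_prompt` argument of Hugging Face's `tokenizer.apply_chat_template`, which is presumably what the data pipeline calls under the hood.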
Acknowledgements
Co-authored-by: BobbyIsHandsome 96061080+BobbyIsHandsome@users.noreply.github.com
Co-authored-by: Junxiong Wang 16102460+jxiw@users.noreply.github.com
Co-authored-by: xwuShirley 37637998+xwuShirley@users.noreply.github.com
Co-authored-by: Yubo Wang 10526540+yubofredwang@users.noreply.github.com
Generated with Claude Code