
[Feature] add Minimax support and Kimi parser updates #45

Merged

yubofredwang merged 5 commits into main from ywang/support-minimax-training on Mar 19, 2026

Conversation

@yubofredwang
Collaborator

Enable Minimax chat formatting, parsing, draft-model config, and parser-focused tests so M2.5 data and training flows work end to end. Include the related Kimi parser/template coverage updates and align the Kimi Eagle3 draft config with the intended KV-head setting, while keeping checkpoint export dtype control plus runtime env passthrough for FP8-compatible serving.


Add dataset.shuffle_dataset (default True) so users can disable automatic
dataset shuffling when the training data is intentionally ordered (e.g.
curriculum learning, staged difficulty). The flag threads through both the
offline preprocessing path and the online training controller's epoch reload.
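The toggle described in the commit message above could look like the following minimal sketch. The helper name `prepare_dataset` and its signature are hypothetical, not the repository's actual API; the point is only that `shuffle_dataset=False` preserves the incoming (e.g. curriculum) order, while the default applies a seeded, reproducible shuffle.

```python
import random

def prepare_dataset(samples, shuffle_dataset=True, seed=0):
    """Hypothetical sketch of the dataset.shuffle_dataset flag.

    When shuffle_dataset is False, the intentional ordering of the input
    is preserved; otherwise a deterministic seeded shuffle is applied.
    """
    out = list(samples)
    if not shuffle_dataset:
        return out  # keep curriculum / staged-difficulty ordering
    random.Random(seed).shuffle(out)
    return out

ordered = prepare_dataset(range(5), shuffle_dataset=False)   # [0, 1, 2, 3, 4]
shuffled = prepare_dataset(range(5), seed=42)                # same items, seeded order
```

Because the shuffle is seeded, two epochs prepared with the same seed see the same order, which keeps runs reproducible even with shuffling enabled.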
@yubofredwang marked this pull request as ready for review March 19, 2026 17:48
Copilot AI review requested due to automatic review settings March 19, 2026 17:48
Test-only concern: the HF model paths were only used by
test_loss_mask_cross_validation to load tokenizers for validation.
Keep ChatTemplate focused on chat format metadata.
Contributor

Copilot AI left a comment


Pull request overview

Adds end-to-end support for MiniMax-M2.5 chat formatting/parsing alongside Kimi-K2.5 parser/template updates, enabling Minimax/Kimi data flows to work through preprocessing, training, and conversion/serving utilities.

Changes:

  • Introduce minimax-m2 chat template + MiniMaxParser, with comprehensive unit tests and multimodal/tool-call handling.
  • Extend Kimi-K2.5 formatting to support expand_media_tokens=False passthrough behavior and add focused tests.
  • Add dataset shuffle control (shuffle_dataset) and add --dtype output casting support to the HF conversion tool (plus env passthrough for SGLang VLM cache sizing).
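The env passthrough mentioned in the last bullet (forwarding SGLANG_VLM_CACHE_SIZE_MB to Ray actors, per the file summary below) can be sketched as follows. The allowlist constant and `collect_runtime_env_vars` helper are assumed names for illustration, not the repo's actual identifiers; the idea is to gather allowlisted variables from the driver environment so they can be handed to Ray's `runtime_env={"env_vars": ...}`.

```python
import os

# Hypothetical allowlist of variables forwarded from the driver to Ray actors.
PASSTHROUGH_VARS = ("SGLANG_VLM_CACHE_SIZE_MB",)

def collect_runtime_env_vars(environ=os.environ):
    """Return only the allowlisted variables that are actually set,
    in the dict shape Ray expects for runtime_env["env_vars"]."""
    return {k: environ[k] for k in PASSTHROUGH_VARS if k in environ}

env_vars = collect_runtime_env_vars(
    {"SGLANG_VLM_CACHE_SIZE_MB": "4096", "PATH": "/usr/bin"}
)
# env_vars == {"SGLANG_VLM_CACHE_SIZE_MB": "4096"}
```

An allowlist (rather than forwarding the whole environment) keeps actor environments predictable and avoids leaking unrelated driver-side settings into serving processes.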

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| torchspec/utils/env.py | Forward SGLANG_VLM_CACHE_SIZE_MB to Ray actors for serving/runtime configuration. |
| torchspec/data/template.py | Add reference_model metadata to templates and register the new minimax-m2 template. |
| torchspec/data/preprocessing.py | Make dataset shuffling conditional on shuffle_seed being set. |
| torchspec/data/parse.py | Add MiniMaxParser; update thinking detection; add media-token passthrough option to Kimi parser. |
| torchspec/controller/training_controller.py | Add optional deterministic shuffling toggle via shuffle_dataset and rename dataset prep helper. |
| torchspec/config/train_config.py | Add shuffle_dataset to DatasetConfig so it can be configured via YAML. |
| tools/convert_to_hf.py | Add --dtype to control output weight dtype during HF conversion. |
| tests/test_minimax_parser.py | New unit tests covering MiniMax formatting/parsing, tools, thinking, multimodal, truncation, passthrough. |
| tests/test_kimi_k25_parser.py | Add tests for expand_media_tokens=False; remove real-tokenizer integration tests. |
| configs/draft_models/minimax_m25_eagle3.json | Add draft-model config for MiniMax M2.5 Eagle3. |
| configs/draft_models/kimi_k25_eagle3.json | Update Kimi K2.5 Eagle3 KV-head setting. |



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1c63185234


json.loads on string arguments crashed on malformed or plain-text
payloads, aborting formatting for the whole job. Catch JSONDecodeError
and preserve the raw string as a fallback.
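The fallback described above can be sketched as a small helper. The function name `parse_tool_arguments` is hypothetical; the behavior matches the fix: valid JSON is decoded, while malformed or plain-text payloads are returned unchanged instead of raising and aborting the whole formatting job.

```python
import json

def parse_tool_arguments(raw):
    """Hypothetical sketch of the JSONDecodeError fallback.

    Tool-call arguments may arrive as a dict, a JSON string, or plain
    text; only well-formed JSON strings are decoded."""
    if isinstance(raw, dict):
        return raw  # already structured, nothing to parse
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw  # preserve the raw payload rather than crash the job

parse_tool_arguments('{"city": "Paris"}')  # {'city': 'Paris'}
parse_tool_arguments('not json at all')    # 'not json at all'
```

Catching `TypeError` as well covers non-string, non-dict payloads (e.g. `None`), which `json.loads` also rejects.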

The prune-vocab path wrote raw_config from disk without updating torch_dtype,
causing exported weights and config metadata to diverge when --dtype was specified.
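The fix described above amounts to syncing the exported config with the requested cast. This is a minimal, hypothetical sketch (the helper name and dict-based config are assumptions, not the tool's actual code): when `--dtype` is given, `torch_dtype` in the written config must be overwritten rather than copied verbatim from the on-disk source config.

```python
def sync_config_dtype(raw_config, dtype):
    """Hypothetical sketch: align exported config metadata with --dtype.

    raw_config: config dict loaded from the source checkpoint on disk.
    dtype: target dtype string from --dtype, or None to keep the original.
    """
    config = dict(raw_config)  # avoid mutating the loaded config
    if dtype is not None:
        config["torch_dtype"] = dtype  # metadata now matches cast weights
    return config

cfg = sync_config_dtype({"torch_dtype": "float32"}, "bfloat16")
# cfg["torch_dtype"] == "bfloat16"
```

Without this step, weights cast to bfloat16 would ship alongside a config still claiming float32, and downstream loaders that trust `torch_dtype` would diverge from the actual tensors.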
Copilot AI review requested due to automatic review settings March 19, 2026 17:57
@yubofredwang review requested due to automatic review settings March 19, 2026 17:58
@yubofredwang merged commit 2c25d5d into main Mar 19, 2026
3 checks passed
@yubofredwang deleted the ywang/support-minimax-training branch March 19, 2026 17:59
zhubohao911 pushed a commit to zhubohao911/TorchSpec that referenced this pull request Mar 23, 2026