Releases: modelscope/ms-swift
Patch release v3.10.3
Full Changelog: v3.10.2...v3.10.3
Patch release v3.10.2
Full Changelog: v3.10.1...v3.10.2
Patch release v3.10.1
Full Changelog: v3.10.0...v3.10.1
v3.10.0
中文版
新特性
- Megatron-SWIFT
a. Mcore-Bridge发布。支持直接加载和存储 safetensors 格式的模型权重;支持LoRA增量权重双向转换;支持多机转换。文档参考:https://swift.readthedocs.io/zh-cn/latest/Megatron-SWIFT/Mcore-Bridge.html 。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
b. megatron-core 版本升级至0.14.0。
c. 多模态模型训练新增vit_lr和aligner_lr参数支持。
d. 新增存储优化参数:async_save, save_retain_interval等。
e. 支持batched mrope,加速Qwen3-VL、Qwen2.5-VL等模型的训练速度。
- RL
a. GRPO LoRA 训练权重同步速度优化,具体参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/GetStarted/GRPO.html#id3
b. GRPO 训练显存优化以降低峰值显存占用。
c. RLVR 新算法支持:RLOO,文档参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/RLOO.html 。REINFORCE++ Baseline,文档参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html
d. GKD 支持使用 vLLM 加速策略模型rollout,并新增参数teacher_deepspeed额外控制教师模型分片策略。文档参考:https://swift.readthedocs.io/zh-cn/latest/Instruction/GKD.html
e. GSPO 支持使用liger_kernel减少显存使用。
- 训练
a. PT/SFT/采样/数据蒸馏中支持了RAY,具体参考文档:https://swift.readthedocs.io/zh-cn/latest/Instruction/Ray.html
b. Qwen3-VL、Qwen3-Omni支持混合模态数据训练;Qwen3-VL支持ulysses序列并行。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
c. 支持 yaml 方式配置训练参数,脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/yaml
d. 新增 FSDP2 训练启动案例,脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
e. 新增自定义多模态模型注册最佳实践:https://swift.readthedocs.io/zh-cn/latest/BestPractices/MLLM-Registration.html
f. embedding 训练中的 InfoNCE 损失与 Qwen3-Embedding 论文描述对齐。具体参考文档:https://swift.readthedocs.io/zh-cn/latest/BestPractices/Embedding.html
g. 新增多标签分类训练案例,脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
h. agent_template 支持 seed-oss。感谢@hpsun1109的贡献。
- 全链路
a. swift export 支持 GPTQ-v2 量化,脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh 。感谢@zzc0430的贡献。
b. swift deploy vllm 推理后端支持 DP 部署,使用 --vllm_data_parallel_size 参数。感谢@YushunXiang 的贡献。
c. swift deploy 新增 health/ping endpoints。
d. vLLM 部署新增参数vllm_mm_processor_cache_gb/vllm_engine_kwargs。
新模型
- 纯文本模型:
a. Qwen/Qwen3Guard-Gen-0.6B系列
b. MiniMax/MiniMax-M2
- 多模态模型:
a. Qwen/Qwen3-VL-2B-Instruct系列
b. deepseek-ai/DeepSeek-OCR,训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
c. PaddlePaddle/PaddleOCR-VL
d. ZhipuAI/Glyph
e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking系列
f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct系列
English Version
New Features
- Megatron-SWIFT
a. Mcore-Bridge Release. Supports direct loading and saving of model weights in safetensors format; supports bidirectional conversion of LoRA incremental weights; supports multi-node conversion. Documentation: https://swift.readthedocs.io/en/latest/Megatron-SWIFT/Mcore-Bridge.html. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/mcore_bridge
b. Upgraded megatron-core version to 0.14.0.
c. Added vit_lr and aligner_lr parameter support for multimodal model training.
d. Added storage optimization parameters: async_save, save_retain_interval, etc.
e. Support for batched mrope to accelerate training of Qwen3-VL, Qwen2.5-VL, and other models.
- RL
a. Faster weight synchronization for GRPO LoRA training. Details: https://swift.readthedocs.io/en/latest/Instruction/GRPO/GetStarted/GRPO.html#memory-optimization-solutions-in-colocate-mode
b. GRPO training memory optimization to reduce peak memory consumption.
c. New RLVR algorithm support: RLOO, documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/RLOO.html. REINFORCE++ Baseline, documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/REINFORCEPP.html
d. GKD supports using vLLM to accelerate policy model rollout, with new parameter teacher_deepspeed for additional control of teacher model sharding strategy. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GKD.html
e. GSPO supports using liger_kernel to reduce memory usage.
- Training
a. RAY support added for PT/SFT/Sampling/Data Distillation, documentation: https://swift.readthedocs.io/en/latest/Instruction/Ray.html
b. Qwen3-VL and Qwen3-Omni support mixed modality data training; Qwen3-VL supports Ulysses sequence parallelism. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
c. Support for YAML-based training parameter configuration, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
d. Added FSDP2 training launch example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/multi-gpu/fsdp2_lora
e. Added best practice for custom multimodal model registration: https://swift.readthedocs.io/en/latest/BestPractices/MLLM-Registration.html
f. InfoNCE loss in embedding training aligned with Qwen3-Embedding paper description. Documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html
g. Added multi-label classification training example, scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/seq_cls/multi_label
h. agent_template supports seed-oss. Thanks to @hpsun1109 for the contribution.
- Full Pipeline
a. swift export supports GPTQ-v2 quantization, scripts: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize/gptq_v2.sh. Thanks to @zzc0430 for the contribution.
b. swift deploy vLLM inference backend supports DP deployment, using the --vllm_data_parallel_size parameter. Thanks to @YushunXiang for the contribution.
c. swift deploy added health/ping endpoints.
d. vLLM deployment added parameters vllm_mm_processor_cache_gb/vllm_engine_kwargs.
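For readers unfamiliar with the RLOO item listed under RL, the following is a minimal sketch of the leave-one-out baseline as it is usually described, not ms-swift's implementation. The function name rloo_advantages is hypothetical, and the rewards are assumed to belong to completions sampled for the same prompt.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for one prompt's group of sampled completions.

    Hypothetical helper for illustration only; this is the generic RLOO
    estimator, not code taken from ms-swift.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]
```

Because each sample's baseline excludes its own reward, the advantages of a group always sum to zero. See the RLOO documentation linked above for how the algorithm is actually configured in training.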
New Models
- Text-only models:
a. Qwen/Qwen3Guard-Gen-0.6B series
b. MiniMax/MiniMax-M2
- Multimodal models:
a. Qwen/Qwen3-VL-2B-Instruct series
b. deepseek-ai/DeepSeek-OCR, training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/deepseek_ocr
c. PaddlePaddle/PaddleOCR-VL
d. ZhipuAI/Glyph
e. PaddlePaddle/ERNIE-4.5-VL-28B-A3B-Thinking series
f. lmms-lab/LLaVA-OneVision-1.5-4B-Instruct series
What's Changed
- [bugfix] fix image_list qwen2.5/3-omni by @Jintao-Huang in #6122
- [model] Support Qwen3-VL dense by @Jintao-Huang in #6120
- feat: support gptq_v2 quantization method by @zzc0430 in #6102
- [bugfix] fix gptq_v2 by @Jintao-Huang in #6126
- [bugfix] patch timeout & fix print_rich_table by @Jintao-Huang in #6137
- Add the support for vLLM data parallel configuration in SwiftDeploy by @YushunXiang in #6114
- [docs] update vllm deploy DP docs by @Jintao-Huang in #6139
- [model] Support Qwen/Qwen3-VL-4B-Instruct series by @Jintao-Huang in #6143
- Update loss_scale method call to pass through inputs.extra_kwargs by @CJack812 in #6160
- [bugfix] fix qwen3_vl videos by @Jintao-Huang in #6162
- Fix bug of sp/cp by @tastelikefeet in #6163
- [deploy] update vllm_enable_prefix_caching by @Jintao-Huang in #6165
- [bugfix] qwen3-vl support mixed data by @Jintao-Huang in #6161
- [template] add_retry by @Jintao-Huang in #6138
- [bugfix] Fix multimodal lazy_tokenize false by @Jintao-Huang in #6172
- [template] update qwen3_vl grounding dataset format by @Jintao-Huang in #6178
- [docs] update docs by @Jintao-Huang in #6180
- [bugfix] add tools fileds in inputs2reqeusts by @hjh0119 in #6054
- [grpo] Optimize vLLM weight synchronization & update buitin accuracy reward by @hjh0119 in #5773
- [model] support Qwen/Qwen3Guard-Gen-0.6B series by @Jintao-Huang in #6189
- [template] Support qwen3 omni mixed data by @Jintao-Huang in #6196
- [docs] update qwen3_vl best practice by @Jintao-Huang in #6206
- [vllm] support vllm_mm_processor_cache_gb by @hjh0119 in #6210
- [megatron] fix qwen3_vl new_special_tokens by @Jintao-Huang in #6213
- [megatron] add mcore save_args by @Jintao-Huang in #6216
- [bugfix] fix dtype warning by @Jintao-Huang in #6219
- [bugfix] fix infer pt dp by @Jintao-Huang in #6222
- support training for multimodal reranker by @0russwest0 in #6192
- [bugfix] fix reward_trainer logger by @Jintao-Huang in #6240
- [model] Support deepseek-ocr by @Jintao-Huang in #6238
- [docs] update deepseek_ocr docs by @Jintao-Huang in #6242
- [bugfix] fi...
Patch release v3.9.3
Full Changelog: v3.9.2...v3.9.3
Patch release v3.9.2
Full Changelog: v3.9.1...v3.9.2
Patch release v3.9.1
Full Changelog: v3.9.0...v3.9.1
v3.9.0
中文版
新特性
- Megatron-SWIFT
a. 支持更多模型架构:Qwen3-VL, Qwen3-Omni, Qwen3-Next, Kimi-VL, InternVL3.5-HF等。完整的模型支持情况,参考支持的模型文档:https://swift.readthedocs.io/en/latest/Instruction/Supported-models-and-datasets.html
b. 支持KTO训练,包括全参数/LoRA/MoE/多模态/Packing等训练技术等支持。感谢招商银行技术团队@kevssim 的贡献。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/kto
c. 支持RM训练,包括全参数/LoRA/MoE/多模态/Packing等训练技术等支持。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/rm
d. 支持序列分类模型架构,包括三种任务:regression、single_label_classification、multi_label_classification。训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/megatron/seq_cls
e. 支持VPP并行技术,减少PP并行的计算空泡,提高GPU利用率,但会略微提高通信量。支持异构PP并行 pipeline_model_parallel_layout,自定义流水线并行(PP/VPP)布局。
f. DPO等RLHF技术中的ref_model不初始化 main_grad 降低显存占用。
- 训练
a. 序列并行优化,ulysses 和 ring-attention 支持混合使用,实现更长的序列处理能力。支持纯文本和多模态模型的SFT/DPO/GRPO训练。训练脚本参考:https://github.com/modelscope/ms-swift/blob/main/examples/train/sequence_parallel/sequence_parallel.sh
b. 纯文本及多模态模型Embedding/Reranker/序列分类任务训练支持使用 padding_free 以节约显存资源并加速训练。
c. Embedding和Reranker训练数据集格式重构,具体参考文档:https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html, https://swift.readthedocs.io/en/latest/BestPractices/Reranker.html
d. Agent template支持更多模型:deepseek_v3_1, qwen3_coder。(感谢@gakkiri ,@ray075hl 的贡献)
e. load_from_cache_file 默认值从True改成False,避免因缓存原因导致的未知问题。
- RLHF
a. GRPO支持CHORD算法,在GRPO训练中混合SFT训练,参考文档:https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CHORD.html
b. KTO支持padding free和packing以节约显存资源并加速训练。
c. GRPO训练 padding_free重构,更好支持多模态模型。
d. GRPO vLLM 支持 PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" 环境变量,减小显存碎片。
- 推理
a. 支持Reranker任务的推理/部署 (pt/vllm),以及序列分类任务的推理部署(pt/vllm)。脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/deploy/reranker, https://github.com/modelscope/ms-swift/tree/main/examples/deploy/seq_cls
新模型
- 纯文本模型
a. Qwen/Qwen3-Next-80B-A3B-Instruct系列,训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_next
b. ZhipuAI/GLM-4.6
c. inclusionAI/Ling-mini-2.0; inclusionAI/Ring-mini-2.0系列
d. iic/Tongyi-DeepResearch-30B-A3B
e. ByteDance-Seed/Seed-OSS-36B-Instruct系列(感谢@hpsun1109 的贡献)
f. deepseek-ai/DeepSeek-V3.1-Terminus
g. PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking
h. google/embeddinggemma-300m(embedding模型)
- 多模态模型
a. Qwen/Qwen3-VL-30B-A3B-Instruct系列,训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
b. Qwen/Qwen3-Omni-30B-A3B-Instruct系列,训练脚本参考:https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_omni
c. Kwai-Keye/Keye-VL-1_5-8B(感谢@hellopahe 的贡献)
d. OpenGVLab/InternVL3_5-1B-HF系列
e. BytedanceDouyinContent/SAIL-VL2-2B系列
f. stepfun-ai/Step-Audio-2-mini(感谢@CJack812 的贡献)
English Version
New Features
- Megatron-SWIFT
a. More model architecture support: Qwen3-VL, Qwen3-Omni, Qwen3-Next, Kimi-VL, InternVL3.5-HF, etc. For a complete list of supported models, please refer to the Supported Models documentation: https://swift.readthedocs.io/en/latest/Instruction/Supported-models-and-datasets.html
b. KTO training support, including full-parameter, LoRA, MoE, multimodal, and Packing training techniques. Special thanks to @kevssim from China Merchants Bank’s technical team for their contribution. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/kto
c. Reward Model training support, including full-parameter, LoRA, MoE, multimodal, and Packing training techniques. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/rlhf/rm
d. Sequence classification model architecture support, covering three task types: regression, single_label_classification, and multi_label_classification. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/seq_cls
e. Support for VPP (Virtual Pipeline Parallelism): reduces pipeline bubbles in PP (Pipeline Parallelism), improving GPU utilization at the cost of slightly increased communication overhead. Supports heterogeneous PP via pipeline_model_parallel_layout for custom PP/VPP pipeline layouts.
f. In RLHF techniques such as DPO, the ref_model no longer initializes main_grad, reducing GPU memory consumption.
- Training
a. Sequence parallelism optimization: Ulysses and Ring Attention can now be used together, enabling processing of even longer sequences. Supports SFT/DPO/GRPO training for both text-only and multimodal models. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/sequence_parallel/sequence_parallel.sh
b. Padding-free training is now supported for embedding, reranker, and sequence classification tasks on both text-only and multimodal models, saving GPU memory and accelerating training.
c. Restructured dataset formats for embedding and reranker training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/BestPractices/Embedding.html, https://swift.readthedocs.io/en/latest/BestPractices/Reranker.html
d. Agent templates support more models: deepseek_v3_1, qwen3_coder. (Thanks to contributions from @gakkiri and @ray075hl)
e. Default value of load_from_cache_file changed from True to False to avoid unexpected issues caused by caching.
- RLHF
a. GRPO now supports the CHORD algorithm, enabling mixed SFT training during GRPO. Documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO/AdvancedResearch/CHORD.html
b. KTO supports padding-free and packing, reducing memory usage and accelerating training.
c. Padding-free implementation in GRPO has been refactored for better multimodal model support.
d. GRPO with vLLM now supports the environment variable PYTORCH_CUDA_ALLOC_CONF="expandable_segments:True" to reduce GPU memory fragmentation.
- Inference
a. Inference and deployment support for Reranker tasks (PyTorch/vLLM) and sequence classification tasks (PyTorch/vLLM). Example scripts: https://github.com/modelscope/ms-swift/tree/main/examples/deploy/reranker, https://github.com/modelscope/ms-swift/tree/main/examples/deploy/seq_cls
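The PYTORCH_CUDA_ALLOC_CONF setting mentioned under RLHF is a PyTorch allocator option given as comma-separated key:value pairs. As a rough sketch only, the helper below adds expandable_segments:True while keeping any other allocator options already present. The function name enable_expandable_segments is hypothetical and is not part of ms-swift or PyTorch.

```python
import os
from typing import MutableMapping


def enable_expandable_segments(env: MutableMapping[str, str] = os.environ) -> str:
    """Add expandable_segments:True to PYTORCH_CUDA_ALLOC_CONF in `env`.

    Hypothetical helper for illustration; it only edits the mapping it is given.
    """
    key = "PYTORCH_CUDA_ALLOC_CONF"
    # Existing options are comma-separated key:value pairs.
    options = [opt for opt in env.get(key, "").split(",") if opt]
    # Drop any earlier expandable_segments entry, keep everything else.
    options = [opt for opt in options if not opt.startswith("expandable_segments:")]
    options.append("expandable_segments:True")
    env[key] = ",".join(options)
    return env[key]
```

The variable generally has to be in place before the training process starts using CUDA, so exporting it in the launch shell is usually the simplest route. Calling a helper like this one would only make sense at the very top of a launcher script.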
New Models
- Text-only Models
a. Qwen/Qwen3-Next-80B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_next
b. ZhipuAI/GLM-4.6
c. inclusionAI/Ling-mini-2.0; inclusionAI/Ring-mini-2.0 series
d. iic/Tongyi-DeepResearch-30B-A3B
e. ByteDance-Seed/Seed-OSS-36B-Instruct series (Thanks to @hpsun1109 for the contribution)
f. deepseek-ai/DeepSeek-V3.1-Terminus
g. PaddlePaddle/ERNIE-4.5-21B-A3B-Thinking
h. google/embeddinggemma-300m (embedding model)
- Multimodal Models
a. Qwen/Qwen3-VL-30B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_vl
b. Qwen/Qwen3-Omni-30B-A3B-Instruct series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_omni
c. Kwai-Keye/Keye-VL-1_5-8B (Thanks to @hellopahe for the contribution)
d. OpenGVLab/InternVL3_5-1B-HF series
e. BytedanceDouyinContent/SAIL-VL2-2B series
f. stepfun-ai/Step-Audio-2-mini (Thanks to @CJack812 for the contribution)
What's Changed
- Merge ulysses and ring-attention by @tastelikefeet in #5522
- [bugfix] fix text_position_ids by @Jintao-Huang in #5692
- [grpo] support CHORD algorithm by @hjh0119 in #5680
- [doc] update chord doc by @hjh0119 in #5701
- [bugfix]: use GCD to robustly configure sp and rp dimensions for any world_size by @0russwest0 in #5698
- [megatron] Fix SP & LoRA by @Jintao-Huang in #5704
- [megatron] Support ovis2.5 by @Jintao-Huang in #5719
- [template] update get_env_args & load_from_cache_file by @Jintao-Huang in #5730
- [bugfix] fix qwen3 swift pt by @Jintao-Huang in #5741
- fix sp grpo by @tastelikefeet in #5744
- Fix multiple input issue and more_params for web-ui by @slin000111 in #5739
- [bugfix] set default padding side to left for generative reranker by @0russwest0 in #5751
- [bugfix] correct multi-GPU reranker evaluation metric calculation by @0russwest0 in #5755
- wrap base_model into get_llm_model by @tastelikefeet in #5749
- [bugfix] fix forward_context by @Jintao-Huang in #5757
- [bugfix] update use_barrier -> True by @Jintao-Huang in #5763
- support Seed-OSS-36B-Instruct by @hpsun1109 in #5761
- [bugfix] fix megatron model_type by @Jintao-Huang in #5767
- Refactor grpo padding free by @tastelikefeet in #5769
- Update seed.py by @hpsun1109 in #5725
- [model] Support qwen3_next (transformers) by @Jintao-Huang in #5782
- [megatron] fix text_position_ids by @Jintao-Huang in #5783
- [model] su...
Patch release v3.8.3
Full Changelog: v3.8.2...v3.8.3
Patch release v3.8.2
Full Changelog: v3.8.1...v3.8.2