Releases: modelscope/ms-swift
Patch release v3.5.1
Full Changelog: v3.5.0...v3.5.1
v3.5.0
New Features
- GRPO:
a. Code refactored; the serving mode is specified via the `vllm_mode` parameter. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#arguments-and-execution-script:~:text=vllm_mode%20server%20parameter,in%20colocate%20mode.
b. GRPO long-text optimization with Ulysses sequence parallelism, significantly reducing GPU memory usage during long-text training. Training script: https://github.com/modelscope/ms-swift/blob/main/examples/train/long_text/sequence_parallel_grpo.sh
c. Added the `sync_ref_model` parameter to synchronize reference model weights during training.
d. Supports Liger kernel loss via the `use_liger_kernel` parameter, reducing GPU memory consumption.
e. External mode supports `move_model_batches` to lower peak GPU memory during ZeRO-3 weight synchronization.
f. Integrated INTELLECT-2's two-sided clipping algorithm via the `delta` parameter.
g. Supports reward functions returning None, applicable for multi-task training. For details, refer to the documentation: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#multi-task-training
h. Internal mode supports `vllm_server_base_url` for passing an external vLLM server URL.
i. Plugin extension: Added QwenLong-L1 reward model plugin.
j. Added `steps_per_generation` and `generation_batch_size` parameters for customizing the sampling batch size.
k. Web-UI supports GRPO training.
l. The following parameters will be removed in v3.6: `tensor_parallel_size`, `vllm_device`, `vllm_max_num_seqs`, `num_infer_workers`.
- Training:
a. CPT/SFT/DPO/GRPO support padding-free training. By flattening batch data to avoid padding, GPU memory usage is reduced and training speed is improved. Script: https://github.com/modelscope/ms-swift/tree/main/examples/train/padding_free
b. Multimodal training enhancements: supports separate learning rates for the ViT and Aligner modules via the `vit_lr` and `aligner_lr` parameters. Added `vit_gradient_checkpointing` to independently control gradient checkpointing for the ViT module. Benchmark: https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/vit_gradient_checkpointing.sh
c. CPT/SFT support channel loss to calculate loss separately for datasets from different channels. Thanks to the technical team at China Merchants Bank for this contribution.
d. CPT/SFT/DPO support the `use_logits_to_keep` parameter to reduce GPU memory usage and accelerate training.
e. Qwen2.5-VL/Omni support video training by passing image directories.
- Inference & Deployment:
a. Optimized `swift infer` batching with the new `write_batch_size` parameter, which controls how often batched inference results are written to `result_path`.
b. The vLLM inference engine now defaults to the V1 engine and supports combining Tensor Parallelism (TP) with Data Parallelism (DP). Script: https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/dp_tp.sh
- Megatron-SWIFT:
a. Non-streaming datasets automatically compute `train_iters` from `max_epochs`.
b. Added `extra_megatron_kwargs` to pass Megatron parameters not yet exposed in ms-swift.
New Models
- Qwen/Qwen3-Embedding-0.6B series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_emb.sh
- deepseek-ai/DeepSeek-R1-0528-Qwen3-8B series. Best practices: https://mp.weixin.qq.com/s/-hhfGiiGTqXUybwPH525gw
- iic/QwenLong-L1-32B
- XiaomiMiMo/MiMo-7B-RL-0530 & XiaomiMiMo/MiMo-VL-7B-SFT series
- OpenBMB/MiniCPM4-0.5B series
What's Changed
- [grpo] code refactor by @hjh0119 in #4097
- support yarn by @tastelikefeet in #4197
- fix ppo init model by @hjh0119 in #4199
- fix ppo reward model by @hjh0119 in #4200
- [doc] remove vllm version warning in grpo by @hjh0119 in #4204
- [grpo] fix colocate + tp by @hjh0119 in #4209
- Refactor packing by @Jintao-Huang in #4207
- [grpo] set system in inputs by @hjh0119 in #4214
- fix mm packing by @Jintao-Huang in #4217
- fix packing multi_node by @Jintao-Huang in #4222
- fix get reward model by @hjh0119 in #4225
- fix val_dataset_shuffle by @Jintao-Huang in #4226
- fix task type judgement in rlhf by @hjh0119 in #4228
- fix eval extral args by @Yunnglin in #4227
- fix loss_scale by @Jintao-Huang in #4229
- update docs by @Jintao-Huang in #4235
- [rlhf] prepare_model for ref_model & reduce peak memory in dpo by @hjh0119 in #4232
- fix qwen2_5_vl VIDEO_TOTAL_PIXELS by @Jintao-Huang in #4236
- Support super long length sft by @tastelikefeet in #4237
- compat transformers 4.52 by @Jintao-Huang in #4238
- update liger_kernel docs by @Jintao-Huang in #4241
- [grpo] support synchronizing ref model by @hjh0119 in #4242
- optimize packing io by @Jintao-Huang in #4244
- fix register_post_encode_hook by @Jintao-Huang in #4247
- compat megatron-core 0.11 by @Jintao-Huang in #4250
- fix qwen2_5_omni by @Jintao-Huang in #4253
- fix readme by @Jintao-Huang in #4256
- [grpo] set v1 engine as default in external rollout by @hjh0119 in #4258
- fix ddp_timeout by @Jintao-Huang in #4259
- Add tqdm by @Jintao-Huang in #4260
- Fix is_master by @Jintao-Huang in #4262
- fix ppo zero3 by @Jintao-Huang in #4263
- test link valid by @Jintao-Huang in #4265
- update docs & fix quant by @Jintao-Huang in #4268
- [grpo] fix external mode&multi turn by @hjh0119 in #4255
- fix ulysses eval by @tastelikefeet in #4271
- support IndexedDataset shard by @Jintao-Huang in #4269
- Support vit_lr aligner_lr by @Jintao-Huang in #4273
- support padding_free CPT/SFT by @Jintao-Huang in #4274
- [grpo] fix num of reward_model > 1 by @hjh0119 in #4287
- fix n > 1 with vLLM V1 Engine by @hjh0119 in #4295
- update load_args by @Jintao-Huang in #4296
- update swift image by @Jintao-Huang in #4309
- Fix ulysses pending by @tastelikefeet in https://github...
v3.4.1.post1
Full Changelog: v3.4.1...v3.4.1.post1
v3.4.1
New Features
- Sequence Parallelism: Supports the use of Ulysses sequence parallelism during PT/SFT/DPO stages. Compatible with training techniques such as DeepSpeed, packing, flash_attn, and streaming. Refer to the training script here.
- GRPO: Supports custom reward model logic. Includes a built-in example of a generative reward model. Refer to the training script here.
- Megatron-SWIFT: Updated megatron-core to version 0.12.0. Added the max_epochs parameter to stop training and save weights when the epoch reaches max_epochs. Added the wandb parameter to log training metrics.
- Best Practices: Added best practices for quickly training vision-language models from scratch. Refer to the guide here.
- External Contributions: Supports GRPO using judge0 to execute generated code. Allows freezing/activating parameters via regular expressions. Supports specifying initialization strategies for uninitialized parameters in the initial model. Thanks to the technical team at China Merchants Bank for these contributions.
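The regex-based freeze/activate feature above reduces to matching parameter names. A framework-agnostic sketch of the idea, where the helper name and the parameter names are purely illustrative and not ms-swift's API:

```python
import re

def trainable_mask(param_names, freeze_regex=None, activate_regex=None):
    """Decide which named parameters remain trainable.

    Names matching `freeze_regex` are frozen first; `activate_regex`
    then re-enables any matching names (e.g. an aligner nested inside
    an otherwise frozen vision tower).
    """
    mask = {}
    for name in param_names:
        trainable = True
        if freeze_regex and re.search(freeze_regex, name):
            trainable = False
        if activate_regex and re.search(activate_regex, name):
            trainable = True
        mask[name] = trainable
    return mask

# Hypothetical parameter names, loosely modeled on a VLM layout.
names = [
    "model.visual.blocks.0.attn.qkv.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.visual.merger.mlp.0.weight",
]
mask = trainable_mask(names, freeze_regex=r"^model\.visual\.",
                      activate_regex=r"merger")
```

In a real trainer the resulting mask would drive `requires_grad` on the corresponding tensors.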
New Models
- XiaomiMiMo/MiMo-7B-RL Series
- deepseek-ai/DeepSeek-Prover-V2-7B Series
- OpenGVLab/InternVL3-1B-Pretrained Series
What's Changed
- Fix grpo eval when gas > 1 by @hjh0119 in #4057
- support qwen3-moe awq by @Jintao-Huang in #4059
- Support empty think loss scale by @Jintao-Huang in #4065
- fix packing eval streaming by @Jintao-Huang in #4066
- support MiMo-7B by @Jintao-Huang in #4067
- fix padding_side left by @Jintao-Huang in #4069
- feat: add run name support by @firefighter-eric in #4072
- feat: support megatron wandb by @firefighter-eric in #4074
- update docs by @Jintao-Huang in #4078
- Support ulysses for llm/mllm,dpo/sft by @tastelikefeet in #4085
- fix enable_cache by @Jintao-Huang in #4091
- Update liger code by @tastelikefeet in #4095
- support max_epochs by @Jintao-Huang in #4102
- [megatron] Update long text shell by @Jintao-Huang in #4106
- fix requirements by @Jintao-Huang in #4108
- fix enable_cache by @Jintao-Huang in #4109
- fix packing by @Jintao-Huang in #4113
- Fix ulysses eval by @tastelikefeet in #4114
- fix omni aligner by @Jintao-Huang in #4117
- fix sequence_parallel by @Jintao-Huang in #4122
- update qwen3 more models by @Jintao-Huang in #4123
- [grpo] fix labels pop and peftmodel parameter check by @hjh0119 in #4136
- [megatron] support max_epochs by @Jintao-Huang in #4125
- grpo code reward by judge0 by @kevssim in #4140
- Feature freezing/activating parameters via regex by @lincq2000 in #4143
- Support init parameters by @lincq2000 in #4141
- fix ulysses dpo by @tastelikefeet in #4149
- Fix bugs by @Jintao-Huang in #4150
- fix init parameters by @lincq2000 in #4148
- Add sp script by @tastelikefeet in #4154
- Add more evaluation args by @Yunnglin in #4155
- update readme by @Jintao-Huang in #4157
- Support ulysses streaming by @tastelikefeet in #4160
- [megatron]Support packing & CP by @Jintao-Huang in #4163
- support internvl3 pretrain instruct by @Jintao-Huang in #4164
- [grpo] support gen rm by @hjh0119 in #4151
- [grpo] fix multi modal doc by @hjh0119 in #4124
- fix _tp_plan by @Jintao-Huang in #4167
- [doc] VL model training best practice by @hjh0119 in #4168
- fix val_dataset streaming packing by @Jintao-Huang in #4172
- fix kto by @tastelikefeet in #4180
- fix max_length by @Jintao-Huang in #4178
- fix loss_scale by @Jintao-Huang in #4183
- support deepseek_prover_v2 by @Jintao-Huang in #4184
- update docs by @Jintao-Huang in #4189
New Contributors
- @firefighter-eric made their first contribution in #4072
- @kevssim made their first contribution in #4140
- @lincq2000 made their first contribution in #4143
Full Changelog: v3.4.0...v3.4.1
v3.4.0
New Features
- Support for Megatron training (CPT/SFT) of Qwen3/Qwen2-MoE/Qwen3-MoE, with training speeds nearly 10 times faster on MoE models compared to the Transformers implementation. For best practices on Qwen3-MoE training, refer to: #4030
New Models
- Qwen/Qwen3-32B, Qwen/Qwen3-30B-A3B series
- Qwen/Qwen2.5-Omni-3B
What's Changed
- 🐛 fix: fix reward model train seq_cls by @gaohongkui in #3921
- Support vllm quantization by @tastelikefeet in #4003
- [megatron] Support Qwen3 by @Jintao-Huang in #3995
- Fix merge sentence transformers by @tastelikefeet in #4011
- Fix gte training and compatible with ds3 by @tastelikefeet in #4022
- fix truncation_strategy by @Jintao-Huang in #4025
- [Megatron] support MoE (Qwen2-Moe & Qwen3-MoE) by @Jintao-Huang in #4012
- Support Qwen3 series by @Jintao-Huang in #4029
- fix bugs by @Jintao-Huang in #4031
- fix grpo resume_from_checkpoint by @Jintao-Huang in #4035
- support qwen3_self_cognition by @Jintao-Huang in #4039
- Update readme & fix generate by @Jintao-Huang in #4041
- update wechat by @tastelikefeet in #4047
- support Qwen2.5-Omni-3B by @Jintao-Huang in #4052
- updates GRPOTrainer compatible with trl 0.17 by @hjh0119 in #3969
- fix rollout by @hjh0119 in #4055
New Contributors
- @gaohongkui made their first contribution in #3921
Full Changelog: v3.3.1...v3.4.0
v3.3.1
New Features
- The Agent training and deployment module introduces agent templates, including 10+ types such as hermes, glm4_0414, and llama4, allowing an agent dataset to be trained interchangeably across different models. For documentation, refer to here.
- GRPO training now supports calling an external vLLM server, allowing for more flexible allocation of GPU memory during training and deployment. For the training script, refer to here.
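Agent templates such as hermes wrap assistant tool calls in `<tool_call>` tags. A simplified rendering sketch following the common hermes convention; the exact template text ms-swift emits may differ:

```python
import json

def render_hermes_tool_call(name, arguments):
    """Render an assistant tool call in hermes style: a JSON object
    with `name` and `arguments`, wrapped in <tool_call> tags."""
    payload = json.dumps({"name": name, "arguments": arguments})
    return f"<tool_call>\n{payload}\n</tool_call>"

# Demo: what a single tool-call turn would look like.
out = render_hermes_tool_call("get_weather", {"city": "Beijing"})
```

An agent template's job during training is exactly this kind of mapping: the same dataset row (tool name plus arguments) is rendered into whichever textual convention the target model was trained on.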
New Models
- OpenGVLab/InternVL3-1B series
- moonshotai/Kimi-VL-A3B-Instruct series
- ZhipuAI/GLM-4-9B-0414, ZhipuAI/GLM-Z1-9B-0414 series
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.getitem by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
- support val_dataset_shuffle by @Jintao-Huang in #3860
- Update swift docker by @Jintao-Huang in #3866
- fix citest & minimax link by @Jintao-Huang in #3868
- fix grpo save checkpoint by @hjh0119 in #3865
- support glm4-z1 by @hjh0119 in #3862
- add paper link by @tastelikefeet in #3886
- refactor mm target_regex (compat peft/vllm) by @Jintao-Huang in #3879
- Support kimi-vl by @Jintao-Huang in #3884
- Fix glm4 z1 by @Jintao-Huang in #3889
- fix bugs by @Jintao-Huang in #3893
- fix typealias to be compatible with Python 3.9 by @hjh0119 in #3895
- Fix ui by @tastelikefeet in #3903
- Fix fp16 bf16 by @Jintao-Huang in #3909
- add rm center_rewards_coefficient argument by @hjh0119 in #3917
- revert swift_from_pretrained by @Jintao-Huang in #3914
- fix grpo doc by @hjh0119 in #3920
- update qwen2_5_omni by @Jintao-Huang in #3908
- Support qwen3 by @Jintao-Huang in #3945
- Decouple vLLM engine and GRPOTrainer. by @hjh0119 in #3911
- Refactor Agent Template by @Jintao-Huang in #3918
- update docs by @Jintao-Huang in #3961
- fix bugs by @Jintao-Huang in #3962
- Support hermes loss_scale by @Jintao-Huang in #3963
- fix parse tools by @Jintao-Huang in #3975
- Update unsloth compatibility by @tastelikefeet in #3970
- Fix qwen2.5-omni use_audio_in_video by @Jintao-Huang in #3987
- Fix web-ui by @tastelikefeet in #3997
- fix get_toolcall & fix ci by @Jintao-Huang in #3999
- fix bugs by @Jintao-Huang in #4001
- fix seq_cls by @Jintao-Huang in #4002
Full Changelog: v3.3.0...v3.3.1
v3.3.0.post1
What's Changed
- Fix sampling and rft by @tastelikefeet in #3847
- Fix incorrect retry count check in LazyLLMDataset.getitem by @IamLihua in #3845
- support internvl3 by @hjh0119 in #3842
- fix grpo filter overlong by @hjh0119 in #3844
- dapo-bug by @Evilxya in #3846
- support agent packing by @Jintao-Huang in #3853
- Fix internvl2.5/3 deepspeed packing by @Jintao-Huang in #3855
- fix multimodal target_modules by @Jintao-Huang in #3856
- Fix multimodal target modules by @Jintao-Huang in #3858
- Update FAQ by @slin000111 in #3841
- fix grpo completion length equal zero by @hjh0119 in #3857
Full Changelog: v3.3.0...v3.3.0.post1
v3.3.0
New Features
- Supports the DAPO algorithm; training documentation can be found here: https://swift.readthedocs.io/en/latest/Instruction/GRPO.html#dapo
- Supports sequence packing for multimodal models, including qwen2-vl, qwen2.5-vl, qwen2.5-omni, and the internvl2.5 series, with a 100% increase in training speed. Training scripts can be found here: https://github.com/modelscope/ms-swift/tree/main/examples/train/packing
- Added SWIFT and Megatron-SWIFT Docker images, see details here: https://swift.readthedocs.io/en/latest/GetStarted/SWIFT-installation.html#mirror
- Enhanced quantization capabilities for multimodal/Omni/MoE models, shell scripts can be found here: https://github.com/modelscope/ms-swift/blob/main/examples/export/quantize
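Sequence packing, as enabled above for multimodal models, concatenates several short samples into one sequence near `max_length`, cutting the padding waste that drives the quoted speedup. A toy first-fit packer illustrating the idea, not ms-swift's packing code:

```python
def pack_greedy(sample_lengths, max_length):
    """Greedily pack sample lengths into bins of capacity max_length.

    Each bin becomes one training sequence; fewer bins means fewer
    padding tokens and higher throughput. Samples longer than
    max_length are assumed truncated upstream.
    """
    bins = []  # each entry: [used_length, [sample indices]]
    for idx, length in enumerate(sample_lengths):
        for b in bins:
            if b[0] + length <= max_length:
                b[1].append(idx)
                b[0] += length
                break
        else:
            bins.append([length, [idx]])
    return [indices for _, indices in bins]
```

With `pack_greedy([3, 4, 2, 5], 7)`, samples 0 and 1 share one sequence and samples 2 and 3 share another, so two sequences carry what padding would have spread over four.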
New Models
- Qwen/Qwen2.5-Omni-7B
- LLM-Research/Llama-4-Scout-17B-16E-Instruct series
- cognitivecomputations/DeepSeek-V3-0324-AWQ
What's Changed
- fix shell by @Jintao-Huang in #3675
- support Qwen/Qwen2.5-Omni-7B (sft/dpo/grpo) by @Jintao-Huang in #3613
- fix grpo rank by @hjh0119 in #3687
- Grpo vl72b script by @hjh0119 in #3692
- fix import error by @Jintao-Huang in #3700
- [megatron] fix val_dataset streaming by @Jintao-Huang in #3699
- fix grpo qwen2_5_omni by @Jintao-Huang in #3701
- fix grpo vl by @Jintao-Huang in #3704
- update warning_once by @Jintao-Huang in #3706
- fix grpo template copy by @Jintao-Huang in #3708
- fix adalora by @tastelikefeet in #3714
- fix qwen2_5-omni by @Jintao-Huang in #3716
- Fix grpo dora by @hjh0119 in #3709
- support qwen2_5_vl packing by @Jintao-Huang in #3694
- fix qwen2_5 omni by @Jintao-Huang in #3734
- fix grpo train dataloader by @Jintao-Huang in #3736
- support internvl2.5 packing by @Jintao-Huang in #3735
- Support qwen2 5-vl awq quant & update shell by @Jintao-Huang in #3743
- support moe quant by @Jintao-Huang in #3772
- update liger kernel by @Jintao-Huang in #3775
- support llama4 by @Jintao-Huang in #3777
- support DAPO by @hjh0119 in #3725
- [Gemma] Fixing the ndarray cast warning by @Reichenbachian in #3791
- add swift docker by @Jintao-Huang in #3796
- support streaming shuffle by @Jintao-Huang in #3782
- grpo lmdeploy warn by @hjh0119 in #3800
- fix math accuracy by @hjh0119 in #3795
- fix grounding dataset concat by @Jintao-Huang in #3802
- fix omni max_model_len by @Jintao-Huang in #3803
- fix get_config_attrs by @Jintao-Huang in #3807
- Fix grpo ovis2 by @Jintao-Huang in #3808
- more grpo log by @hjh0119 in #3801
- fix reward_template by @Jintao-Huang in #3813
- [GRPO] fix template copy (async generate) by @Jintao-Huang in #3814
- update docs by @Jintao-Huang in #3815
- optimize zero3 rlhf code by @Jintao-Huang in #3816
- fix grpo zero3 inflight params by @hjh0119 in #3818
- fix grpo log_completions by @Jintao-Huang in #3819
- vLLM 0.8.3 support for GRPO colocate mode by @hjh0119 in #3820
- fix web-ui by @Jintao-Huang in #3822
- fix telechat by @hjh0119 in #3825
- fix omni zero3 by @Jintao-Huang in #3826
- feat: grpo async generate thread-safe queue production by @hjh0119 in #3821
- fix grpo async generate by @hjh0119 in #3829
- update docs grpo vllm by @Jintao-Huang in #3831
- support omni vllm by @Jintao-Huang in #3832
- remove sequence_parallel_size by @Jintao-Huang in #3835
- update grpo type annotations by @hjh0119 in #3834
- fix grpo multi turn tp by @hjh0119 in #3837
- [docs] fix seq_parallel by @Jintao-Huang in #3838
New Contributors
- @Reichenbachian made their first contribution in #3791
Full Changelog: v3.2.2...v3.3.0
v3.2.2
New Features
- Release of Megatron-SWIFT: Megatron-SWIFT has been released, supporting various parallel technologies such as TP (Tensor Parallelism), PP (Pipeline Parallelism), SP (Sequence Parallelism), and CP (Context Parallelism) for pre-training and fine-tuning over 100 models, including the Qwen series, Llama series, and Deepseek-R1 distillation series. It also supports streaming datasets and sequence packing, enabling the handling of ultra-large datasets while improving training efficiency. For more details, refer to the Megatron-SWIFT Training Documentation.
- Support for Multi-turn GRPO Training: Supports multi-turn GRPO training to adapt to scenarios such as multi-turn agent tool calls in Deep Search. Example code can be found here.
- Supports mini-batch training to reduce GPU memory consumption during training. Refer to the GRPO Training Documentation.
- Embedding Training for Multimodal Models: Supports embedding training for multimodal models such as iic/gme-Qwen2-VL-2B-Instruct. For more information, refer to the Embedding Model Training Documentation.
- Multi-label Classification and Regression Tasks for Large Models and Multimodal Large Models: Supports end-to-end training and deployment for multi-label classification and regression tasks for large models and multimodal large models. Example scripts can be found here.
- Model Evaluation with EvalScope During Training: Supports model evaluation using EvalScope during training to monitor training performance in real time. Example scripts can be found in the Evaluation Documentation.
- Custom External Plugin for LoRA + ViT Training: Provides an external plugin to support LoRA training for LLMs (Large Language Models) while performing full-parameter training for ViTs (Vision Transformers) with different learning rates. This avoids precision errors caused by merging LoRA into the ViT portion. Example code can be found here.
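For the multi-label classification support above, the classification head emits one logit per label and each label is thresholded independently through a sigmoid, so a sample may activate any subset of labels. A generic sketch of that decoding step, not ms-swift internals:

```python
import math

def multilabel_predict(logits, threshold=0.5):
    """Independent sigmoid per label; unlike softmax classification,
    labels do not compete, so zero, one, or many may fire."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [int(p >= threshold) for p in probs]
```

Training pairs this head with a per-label binary cross-entropy loss rather than the single cross-entropy used for mutually exclusive classes.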
New Models
- iic/gme-Qwen2-VL-2B-Instruct series
- Qwen/Qwen2.5-VL-32B-Instruct
- LLM-Research/gemma-3-4b-it series
- deepseek-ai/DeepSeek-V3-0324
- mistralai/Mistral-Small-3.1-24B-Instruct-2503 series
What's Changed
- update code doc by @hjh0119 in #3498
- fix readme by @Jintao-Huang in #3499
- feat: swanlab config add ms-swift by @Zeyi-Lin in #3500
- Support GME models by @tastelikefeet in #3513
- fix docs by @tastelikefeet in #3514
- Fix docs links by @tastelikefeet in #3516
- fix vllm memory leak by @hjh0119 in #3515
- [Docs] Easy `.[all]` install from git by @xihuai18 in #3518
- Fix bugs by @tastelikefeet in #3520
- support megatron by @Jintao-Huang in #2885
- fix megatron by @Jintao-Huang in #3527
- support gemma3 by @hjh0119 in #3492
- fix megatron pipeline parallel by @Jintao-Huang in #3529
- fix megatron tie_weight by @Jintao-Huang in #3530
- support megatron llama by @Jintao-Huang in #3532
- Support megatron llama3.1 3.2 by @Jintao-Huang in #3537
- Update LlavaHfTemplate to adapt to the changed image-token handling logic for LLaVA and LLaVA-Next models in transformers > 4.47 by @zsxm1998 in #3521
- refactor llava-hf by @Jintao-Huang in #3538
- fix docs by @Jintao-Huang in #3539
- refactor get_megatron_model_meta by @Jintao-Huang in #3542
- Gather infonce loss and support hard negative samples by @tastelikefeet in #3548
- fix docs by @tastelikefeet in #3553
- fix unsloth by @tastelikefeet in #3554
- fix grpo mllm split modules by @hjh0119 in #3552
- grpo embedding layer lora by @hjh0119 in #3531
- update arguments by @Jintao-Huang in #3556
- update doc by @hjh0119 in #3557
- Support all models' embedding and mask fake negative by @tastelikefeet in #3563
- skip grpo first wake up by @hjh0119 in #3562
- move grpovllmengine import by @hjh0119 in #3568
- fix bugs & support dataset_name by @Jintao-Huang in #3565
- fix wrap by @tastelikefeet in #3572
- Feature: add train-eval loop by @Yunnglin in #3569
- compat vllm>=0.8 by @Jintao-Huang in #3574
- [grpo] Fix Incorrect Placement of Data in eval_queue During async_generate by @hjh0119 in #3573
- Fix lmdeploy 0.7.3 by @tastelikefeet in #3584
- support vit full llm lora by @Jintao-Huang in #3575
- support Mistral3.1-2503 by @hjh0119 in #3588
- Support megatron packing by @Jintao-Huang in #3595
- [megatron] support streaming by @Jintao-Huang in #3609
- fix rft by @lxline in #3602
- [template] refactor replace media tokens by @Jintao-Huang in #3614
- fix top_logprobs by @Jintao-Huang in #3616
- Fix bugs by @Jintao-Huang in #3619
- Support multi turn grpo by @tastelikefeet in #3615
- fix grpo npu context by @hjh0119 in #3597
- support regression multi-label by @Jintao-Huang in #3621
- Support peft 0.15 by @tastelikefeet in #3623
- update grpo warning by @hjh0119 in #3598
- fix grpo rm zero3 by @hjh0119 in #3626
- GRPO mini batch by @hjh0119 in #3205
- fix grpo warning with pt backend by @hjh0119 in #3629
- compat transformers 4.50 by @Jintao-Huang in #3625
- support train_sampler_random by @Jintao-Huang in #3631
- fix grpo multi turn by @tastelikefeet in #3632
- update docs by @Jintao-Huang in #3633
- Support deepseek v3 0324 by @Jintao-Huang in #3637
- fix grpo cosine reward by @hjh0119 in #3638
- fix grpo lora split module by @hjh0119 in #3635
- fix reward model by @Jintao-Huang in #3641
- support qwen2_5_vl_32b by @Jintao-Huang in #3642
- fix grpo warning by @hjh0119 in #3630
- grpo reset prefix cache by @...
v3.2.1
New Features
- GRPO supports the tensor parallel mode of vLLM. Examples can be found here.
- In co-locate mode, GRPO supports offloading both the optimizer and the model, as well as loading weights in batches and merging LoRA, saving GPU memory and enabling a 72B model to be trained on four A100 GPUs. Examples can be found here.
- GRPO supports code ORM. Best practices can be found here.
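A code ORM (outcome reward model) like the one added above typically scores a completion by executing it against test cases. A deliberately simplified sketch with a hypothetical helper name; a real reward function must run untrusted model output in a sandboxed subprocess with a timeout, never via bare `exec`:

```python
def code_outcome_reward(code, test_cases):
    """Return 1.0 if the generated code passes all test cases, else 0.0.

    WARNING: exec() on untrusted model output is unsafe; this is only
    an illustration of the outcome-reward signal itself.
    """
    namespace = {}
    try:
        exec(code, namespace)  # define the candidate function(s)
        for call, expected in test_cases:
            if eval(call, namespace) != expected:
                return 0.0
    except Exception:
        return 0.0  # syntax/runtime errors score zero
    return 1.0
```

The binary outcome (pass all tests or fail) is what makes this an ORM rather than a process reward: only the final result of the generated program is judged.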
New Models
- Qwen/QwQ-32B series
- inclusionAI/Ling-lite series
What's Changed
- Support vllm LLMEngine by @Jintao-Huang in #3370
- update publish workflows by @Jintao-Huang in #3374
- support ling by @Jintao-Huang in #3379
- Support mp mode and hybrid mode of GRPO by @tastelikefeet in #3381
- fix name by @tastelikefeet in #3382
- fix web-ui infer by @Jintao-Huang in #3384
- fix bugs by @tastelikefeet in #3385
- fix bugs by @Jintao-Huang in #3386
- support Qwen/QwQ-32B by @Jintao-Huang in #3388
- support qwq-awq by @Jintao-Huang in #3391
- support lmdeploy qwen2_5_vl by @Jintao-Huang in #3394
- update infer_save by @Jintao-Huang in #3400
- update requirements by @Jintao-Huang in #3403
- fix ollama export by @Jintao-Huang in #3406
- Fix grpo engine by @tastelikefeet in #3412
- fix infer_stream by @Jintao-Huang in #3413
- FIx some comments, add dlc script by @tastelikefeet in #3419
- add comments and docs by @tastelikefeet in #3424
- fix issue 1663 by @Jintao-Huang in #3417
- Support GRPO model and optimizer offload, and split loading model by @tastelikefeet in #3427
- update wechat by @tastelikefeet in #3430
- Fix vllm random by @tastelikefeet in #3437
- fix seed by @Jintao-Huang in #3438
- fix_base_deploy by @Jintao-Huang in #3442
- fix GRPO device mismatch by @hjh0119 in #3440
- compat vllm==0.5.1 by @Jintao-Huang in #3444
- fix grpo multimodal doc by @mi804 in #3449
- support grpo code orm by @hjh0119 in #3431
- fix GRPO seed by @Jintao-Huang in #3458
- fix grpo multi nodes by @hjh0119 in #3462
- Fix tensor parallel hang by @tastelikefeet in #3464
- fix grpo trainer zero3 always gather parameters by @tcye in #3467
- fix grpo temperature inconsistency by @hjh0119 in #3468
- fix grad_norm nan by @Jintao-Huang in #3465
- fix grad_norm by @Jintao-Huang in #3469
- update minimax by @Jintao-Huang in #3471
- Support 72b script with 4 gpus by @tastelikefeet in #3472
- refactor packing by @Jintao-Huang in #3457
- Fix some docs by @tastelikefeet in #3475
- fix grpo ddp hang by @hjh0119 in #3476
- fix moe quant by @Jintao-Huang in #3478
- Delete duplicate parameters in train_72b_4gpu.sh by @Marquis03 in #3479
- fix image by @tastelikefeet in #3480
- fix infer gptq internvl2 by @Jintao-Huang in #3481
- Resume sample by @BC-A in #3460
- fix qwen2_vl flash_attn deepspeed by @Jintao-Huang in #3484
- Fix seed of tp=1 by @tastelikefeet in #3486
- fix use_cache by @Jintao-Huang in #3487
- Fix qwen2 5 vl grounding by @Jintao-Huang in #3491
- fix ovis2 device_map by @Jintao-Huang in #3496
- fix template.decode by @Jintao-Huang in #3497
New Contributors
- @tcye made their first contribution in #3467
- @Marquis03 made their first contribution in #3479
- @BC-A made their first contribution in #3460
Full Changelog: v3.2.0...v3.2.1