v0.1.0

@wtomin released this 03 Jul 07:08
e1ae8f8

Highlights

This release includes 150 merged PRs from 14 contributors.

VeRL-Omni v0.1.0 is the first tagged final release since multimodal generative RL training moved into the dedicated VeRL-Omni repository, and it follows the v0.1.0rc1 pre-release. It establishes a runnable stack for diffusion, unified multimodal, and omni-modality recipes, and it strengthens the rollout engine, trainer backends, reward workflows, hardware support, and validation docs.

Architecture

Rollout Engine

  • Upgraded the rollout stack to vLLM-Omni v0.22.0 and aligned the release around the companion vLLM, vLLM-Omni, and verl dependency versions. (#166, #167)
  • Improved rollout execution with step-wise batching co-execution, experimental rollout correction, faster LoRA weight updates, and configurable diffusion rollout attention backend selection. These changes reduce friction from trajectory generation through actor-to-rollout synchronization and help catch attention-backend mismatches before large runs. (#81, #93, #156, #200)
  • Hardened autoregressive rollout paths used by Qwen3-Omni by guarding rollout attention backend handling and refreshing processor exports / worker processor loading after model patches. (#211, #224)

Training

  • Moved the GPU actor path toward FA3 attention while preserving native and SDPA fallbacks when FA3 dependencies are unavailable. (#141, #165)
  • Expanded trainer backend options with FSDP sequence parallelism and optional VeOmni actor/reference engines, and corrected the FSDP2 launch scripts so that recipe scripts actually run FSDP2 training. (#59, #104, #216)
  • Added trainer-side observability and reproducibility tools, including diffusion MFU metrics and deterministic per-step / per-rollout seeding. (#60, #128, #136)
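As an illustration of what deterministic per-step / per-rollout seeding can look like, the sketch below derives a stable seed from a base seed, a training step, and a rollout index. It is only a minimal example of the general idea; the function name `derive_seed` and the hashing scheme are assumptions made for this note and are not taken from the VeRL-Omni code.

```python
import hashlib


def derive_seed(base_seed: int, step: int, rollout_index: int) -> int:
    """Return a reproducible 32-bit seed for one (step, rollout) pair.

    Hypothetical helper: hashing the triple keeps seeds stable across
    runs and processes, and makes nearby steps produce unrelated seeds.
    """
    key = f"{base_seed}:{step}:{rollout_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 4 bytes so the value fits common 32-bit seed APIs.
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    # The same inputs always give the same seed; different rollouts differ.
    print(derive_seed(42, step=10, rollout_index=0))
    print(derive_seed(42, step=10, rollout_index=1))
```

A seed derived this way could be passed to whatever random generator the rollout or trainer uses, so that rerunning a given step reproduces the same samples.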

Reward

  • Added multi-reward weighted aggregation so a run can combine multiple scoring signals through configurable reward functions and managers. Follow-up fixes support dynamic loading and relative imports in the multi-reward manager, which makes custom scorer packaging easier for recipe authors. (#109, #228)
  • Added external HTTP scorer support and async reward documentation, making it easier to serve expensive reward models separately and overlap reward scoring with rollout. (#116, #155)
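For readers new to multi-reward setups, weighted aggregation can be pictured as a weighted sum over several scoring functions. The sketch below is a generic illustration under that assumption; `aggregate_rewards`, the scorer signature, and the example scorers are hypothetical and do not reflect the actual VeRL-Omni reward manager interface or configuration keys.

```python
from typing import Callable, Dict, Tuple

# Hypothetical scorer signature: (prompt, sample) -> scalar score.
Scorer = Callable[[str, str], float]


def aggregate_rewards(
    prompt: str,
    sample: str,
    scorers: Dict[str, Tuple[Scorer, float]],
) -> float:
    """Combine several scoring signals into one reward via a weighted sum."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    return sum(weight * fn(prompt, sample) for fn, weight in scorers.values())


if __name__ == "__main__":
    # Toy scorers standing in for real reward models.
    scorers = {
        "text_match": (lambda p, s: 1.0 if p in s else 0.0, 0.25),
        "constant_quality": (lambda p, s: 0.5, 0.5),
    }
    print(aggregate_rewards("cat", "a cat on a mat", scorers))
```

This version does not normalize the weights; whether to normalize, clip, or standardize each signal before combining is a design choice that depends on the recipe.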

Model & Algorithm Support

  • Added the first Qwen-Image RL recipe set, covering FlowGRPO, Flow-DPPO, MixGRPO, GRPO-Guard, DiffusionNFT, and online DPO. These recipes make Qwen-Image the primary text-to-image model covered in this release. (#48, #58, #106, #126, #139, #164, #168, #202)
  • Expanded diffusion generator coverage with SD3.5 offline DPO / FlowGRPO and Wan2.2 DanceGRPO text-to-video training. The quickstart and example docs were refreshed around these diffusion recipes. (#95, #98, #127, #142, #178, #204)
  • Added BAGEL FlowGRPO support for unified understanding-and-generation training, including vLLM-Omni v0.22 alignment and OCR / PickScore reward variants. (#132, #137, #180, #209, #212)
  • Added Qwen3-Omni Thinker GSPO + LoRA with vLLM-Omni async autoregressive rollout, then extended it with Transformers 5.x LoRA FSDP validation, smoke-test wiring, and processor-loading fixes. (#113, #208, #224)

Hardware

  • Added Ascend NPU support for Qwen-Image FlowGRPO, including NPU-oriented launch scripts and quickstart guidance for Atlas 800T A2-style setups. (#68, #85, #181, #202)
  • Extended NPU recipe coverage to Qwen-Image DiffusionNFT, Qwen-Image online DPO, SD3.5 DPO, BAGEL FlowGRPO with OCR reward, and Qwen3-Omni Thinker GSPO. (#127, #164, #174, #180, #189)
  • Documented NPU-specific runtime requirements and attention backend choices in the relevant installation and recipe guides. (#68, #85, #181, #189)

Documentation / Tooling

  • Simplified environment setup with project extras for GPU, vLLM-Omni rollout, training, OCR reward, and development workflows. When upgrading, refresh dependencies using the current install guide so the vLLM-Omni v0.22 stack and optional extras are installed together. (#167)
  • Added CUDA Docker setup for users who prefer containerized environments, plus multi-node and larger-card-count examples for scaling Qwen-Image FlowGRPO beyond a single node. (#177, #194, #195)
  • Refreshed user-facing docs for installation, quickstart, supported models, model examples, algorithm-specific recipes, HTTP scorer services, async reward, and diffusion MFU metrics. (#128, #155, #167, #204, #213, #229)
  • Expanded validation coverage with GPU smoke tests and e2e scripts for core diffusion and reward paths, including FlowGRPO, DPO, DiffusionNFT, Qwen3-Omni GSPO, vLLM reward coverage, patched processor config, and multi-reward manager imports. (#45, #80, #127, #150, #208, #228, #230)
  • Some examples include compatibility guidance around the vLLM-Omni v0.22 / Transformers 5.x stack, especially for Qwen3-Omni GSPO. Check the relevant example README before launching large runs. (#166, #208)

Breaking Changes

  • Adapted rollout integrations to the upstream verl LLMServerClient refactor. Custom rollout server/client code should migrate to the current rollout client and configuration paths before upgrading. (#52)

What's Changed

  • [doc] chore: Change quick start docs to SD3.5 by @knlnguyen1802 in #204
  • [recipe, diffusion] chore: update Qwen-Image NPU example scripts by @Sky-Trigger in #202
  • [trainer] feat: bagel flow-grpo training vllm-omni 0.22 by @zhtmike in #209
  • [doc] Update WeChat Group QR Code by @wtomin in #210
  • [rollout, model] fix: guard rollout_attn_backend for AR rollouts and refresh hf_processor re-export by @qinganrice in #211
  • [trainer] feat: bagel flow-grpo training with pickscore reward by @zhtmike in #212
  • [recipe] fix: fix all fsdp2 scripts to enable true FSDP2 training by @zhtmike in #216
  • [model] feat: support bagel npu training with ocr reward by @ZihaoW123 in #180
  • [doc] chore: Add models docs and refactor models example by @knlnguyen1802 in #213
  • [omni, rollout] feat: add Ascend NPU support for Qwen3-Omni Thinker GSPO training by @panshaowu in #189
  • [cfg, tests] chore: upgrade verl pin by @SamitHuang in #215
  • [ci] fix: Fix ci errors by @SamitHuang in #220
  • [model, rollout, cfg, tests, doc] feat: Qwen3-Omni Thinker LoRA FSDP on transformers 5.x by @qinganrice in #208
  • [omni,model] fix: load Qwen3OmniMoeProcessor in AgentLoopWorker via stale hf_processor refresh by @wtomin in #224
  • [reward, tests] fix: support dynamic loading and relative imports in multi-reward manager by @SamitHuang in #228
  • [doc] chore: Update readme by @SamitHuang in #229
  • [ci] fix: update config vision_start_token_id for patched hf_processor by @wtomin in #230

Full Changelog: v0.1.0rc1...v0.1.0