v0.2.0

@YujiaBao released this 26 Mar 21:48 · 134 commits to main since this release · 776e4b1

229 commits since v0.1.0 — a major release spanning new model support, a weights lifecycle, RL robustness, and infrastructure maturation.

Highlights

Renderer overhaul

  • Tool calling support across Qwen3, DeepSeek V3, Kimi K2 with new ToolSpec API
  • Structured content model — ThinkingPart in content list replaces Message.thinking (breaking)
  • Field renames: prefix/content/suffix → header/output/stop_overlap (breaking)
  • Per-model module architecture; Renderer changed from Protocol to ABC
  • New models: Nemotron-3, Qwen3.5, Kimi K2.5 (text + vision)
  • Custom renderer/tokenizer registration
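
As a rough illustration of the structured content change, the sketch below shows how a message whose reasoning used to sit in a separate thinking field might be rewritten as a content list. Only the name ThinkingPart comes from these notes. The class definitions, TextPart, the dict layout, and migrate_message are hypothetical stand-ins, not the library's actual API, so check the CHANGELOG for the real types before migrating.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins; the real classes in the library may differ.
@dataclass
class ThinkingPart:
    thinking: str


@dataclass
class TextPart:
    text: str


ContentPart = Union[ThinkingPart, TextPart]


def migrate_message(old: dict) -> dict:
    """Turn an old-style message (separate 'thinking' key, string content)
    into a message whose 'content' is an ordered list of parts."""
    parts: list[ContentPart] = []
    thinking = old.get("thinking")
    if thinking:
        # Reasoning goes first so it precedes the visible answer.
        parts.append(ThinkingPart(thinking=thinking))
    content = old.get("content", "")
    if isinstance(content, str):
        if content:
            parts.append(TextPart(text=content))
    else:
        # Content is already a list of parts; keep it as-is.
        parts.extend(content)
    return {"role": old["role"], "content": parts}
```

In this sketch the migrated message no longer carries a top-level thinking key, which mirrors the breaking change described above.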

Weights lifecycle (new)

  • New tinker_cookbook/weights/ subpackage — download, merge, publish
  • Shard-by-shard merging for memory-efficient LoRA→base-model merge
  • FP8 quantized export for MoE models
  • PEFT-format adapter building for vLLM/HF serving
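
To make the shard-by-shard idea concrete, here is a minimal sketch of a LoRA merge that processes one shard at a time, so only a single shard of base weights needs to be in memory. It uses the standard LoRA update W' = W + scale · B·A on plain Python lists. The function names, the shard layout (a dict of name to matrix), and the adapter layout are assumptions for illustration, and nothing here is taken from the tinker_cookbook/weights/ implementation.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, adequate for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_shard(
    shard: Dict[str, Matrix],
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    scale: float,
) -> Dict[str, Matrix]:
    """Apply W' = W + scale * (B @ A) to every adapted weight in one shard."""
    merged: Dict[str, Matrix] = {}
    for name, weight in shard.items():
        if name in adapters:
            lora_a, lora_b = adapters[name]  # A: (r, in), B: (out, r)
            delta = matmul(lora_b, lora_a)
            weight = [
                [w + scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(weight, delta)
            ]
        merged[name] = weight  # weights without an adapter pass through unchanged
    return merged


def merge_shards(
    shards: Iterable[Dict[str, Matrix]],
    adapters: Dict[str, Tuple[Matrix, Matrix]],
    scale: float,
) -> Iterator[Dict[str, Matrix]]:
    """Yield merged shards lazily, one at a time."""
    for shard in shards:
        yield merge_shard(shard, adapters, scale)
```

Because merge_shards is a generator, a caller can write each merged shard to disk before the next one is loaded, which is where the memory saving would come from in a real implementation.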

RL improvements

  • Rollout error resilience — failures no longer crash the run
  • Context limit handling in multi-turn environments
  • Pluggable rollout executor for distributed rollouts
  • ActionExtra for Env.step extensibility; EnvGroupBuilder.cleanup()
  • Async training hang fix on data exhaustion
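
One common way to get this kind of rollout error resilience is to collect failures instead of letting the first exception propagate. The sketch below does that with asyncio.gather(return_exceptions=True). It is an illustration of the general pattern only: run_rollouts and the two demo coroutines are hypothetical names, and the library's own rollout executor may work differently.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_rollouts(
    factories: List[Callable[[], Awaitable[float]]],
) -> Tuple[List[float], List[BaseException]]:
    """Run all rollouts concurrently and split results from failures."""
    results = await asyncio.gather(
        *(make() for make in factories),
        return_exceptions=True,  # exceptions come back as values, not raises
    )
    succeeded = [r for r in results if not isinstance(r, BaseException)]
    failed = [r for r in results if isinstance(r, BaseException)]
    return succeeded, failed


# Demo coroutines standing in for real environment rollouts.
async def good_rollout() -> float:
    return 1.0


async def bad_rollout() -> float:
    raise RuntimeError("sandbox died")
```

A training loop built this way can log the failed list and continue with the rollouts that succeeded, rather than aborting the whole run.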

Supervised learning

  • SFT hyperparameter sweep with published results for 3 models
  • max_steps parameter, streaming dataset batch skip fix

Infrastructure & packaging

  • hatch-vcs versioning + nightly builds
  • Slimmed core dependencies (recipe extras separated)
  • Centralized exception hierarchy with picklability
  • Deprecation framework for API evolution
  • PEP 561 py.typed marker; public API surface cleanup
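
Picklability matters for exceptions because Python's default exception pickling re-creates the object by calling the class with self.args, which breaks when __init__ takes different arguments than the ones passed to super().__init__. The sketch below shows one way to handle that with __reduce__. The class names CookbookError and RolloutError and their fields are hypothetical and are not the library's actual hierarchy.

```python
import pickle


class CookbookError(Exception):
    """Hypothetical root of a centralized exception hierarchy."""


class RolloutError(CookbookError):
    """Hypothetical error carrying structured fields alongside its message."""

    def __init__(self, env_id: str, step: int):
        super().__init__(f"rollout failed in env {env_id!r} at step {step}")
        self.env_id = env_id
        self.step = step

    def __reduce__(self):
        # Default pickling would call RolloutError(*self.args), passing only
        # the formatted message, which does not match this __init__ signature.
        # Returning the original constructor arguments keeps round-trips safe.
        return (type(self), (self.env_id, self.step))
```

With this in place, an error raised in a worker process can be sent back to the parent and still be caught by `except CookbookError`.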

New recipes

  • Harbor RL (sandboxed terminal-bench), ifBench RLVR, tool-use agents library
  • Multi-turn on-policy distillation, vision input, rubric-based eval

Environments & sandboxes

  • Modal sandbox backend (warm pool, rate limiting, async)
  • Configurable KL penalty reference model
  • Pickle support for Renderer/Env (distributed execution)

Eval & logging

  • Inspect AI improvements, renderer metadata persistence
  • Logtree JSON + rollout summary JSONL exports
  • Unified training telemetry with Wandb Gantt charts
  • Per-iteration output subdirectories

Testing & CI

  • Downstream API compatibility tests, weights e2e suite
  • pytest markers, pyright CI, daily recipe smoke tests

See the full CHANGELOG.md for details.