v0.2.0

YujiaBao released this 26 Mar 21:48

776e4b1

229 commits since v0.1.0. This is a major release spanning new model support, a weights lifecycle, RL robustness, and infrastructure maturation.

Highlights

Renderer overhaul

Tool calling support across Qwen3, DeepSeek V3, Kimi K2 with new ToolSpec API
Structured content model — ThinkingPart in content list replaces Message.thinking (breaking)
Field renames: prefix/content/suffix → header/output/stop_overlap (breaking)
Per-model module architecture; Renderer changed from Protocol to ABC
New models: Nemotron-3, Qwen3.5, Kimi K2.5 (text + vision)
Custom renderer/tokenizer registration

Weights lifecycle (new)

New tinker_cookbook/weights/ subpackage — download, merge, publish
Shard-by-shard merging for memory-efficient LoRA→base-model merge
FP8 quantized export for MoE models
PEFT-format adapter building for vLLM/HF serving
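
As a rough illustration of the shard-by-shard idea, the sketch below folds LoRA deltas into base weights one shard at a time, so only a single shard needs to be resident in memory. It is not the tinker_cookbook/weights/ implementation: the function names, the shard layout (a dict of tensor name to matrix), and the use of plain Python lists in place of real tensors are all assumptions made to keep the example dependency-free. The merge rule is the standard LoRA update W' = W + (alpha / rank) * B @ A.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product, standing in for a tensor library."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_shard(
    shard: Dict[str, Matrix],
    lora: Dict[str, Tuple[Matrix, Matrix]],
    scale: float,
) -> Dict[str, Matrix]:
    """Return a copy of one shard with LoRA deltas added to matching weights.

    `lora` maps a weight name to (A, B), with A of shape (rank, in_features)
    and B of shape (out_features, rank). Both names are hypothetical.
    """
    merged: Dict[str, Matrix] = {}
    for name, weight in shard.items():
        if name not in lora:
            merged[name] = weight  # no adapter for this tensor: pass through
            continue
        a, b = lora[name]
        delta = matmul(b, a)  # (out_features, in_features)
        merged[name] = [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)
        ]
    return merged


def merge_shard_by_shard(
    shards: Iterable[Dict[str, Matrix]],
    lora: Dict[str, Tuple[Matrix, Matrix]],
    alpha: float,
    rank: int,
) -> Iterator[Dict[str, Matrix]]:
    """Lazily yield merged shards so only one shard is held at a time."""
    scale = alpha / rank
    for shard in shards:
        yield merge_shard(shard, lora, scale)
```

In a real pipeline, `shards` would typically be a generator that loads each checkpoint file from disk and the caller would write each merged shard out before requesting the next one.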

RL improvements

Rollout error resilience — failures no longer crash the run
Context limit handling in multi-turn environments
Pluggable rollout executor for distributed rollouts
ActionExtra for Env.step extensibility; EnvGroupBuilder.cleanup()
Async training hang fix on data exhaustion
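
To show what "failures no longer crash the run" can look like in practice, here is a minimal sketch of one common pattern: gather rollouts concurrently, collect exceptions instead of propagating them, and continue with the rollouts that succeeded. This is a generic asyncio example, not the cookbook's code; `run_rollouts` and the demo coroutines are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple


async def run_rollouts(
    rollout_fns: List[Callable[[], Awaitable[float]]],
) -> Tuple[List[float], List[BaseException]]:
    """Run all rollouts concurrently and separate successes from failures."""
    # return_exceptions=True makes gather return raised exceptions as values
    # instead of re-raising the first one and abandoning the batch.
    results = await asyncio.gather(
        *(fn() for fn in rollout_fns), return_exceptions=True
    )
    ok = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return ok, errors


async def good_rollout() -> float:
    """Hypothetical rollout that returns a reward."""
    await asyncio.sleep(0)
    return 1.0


async def bad_rollout() -> float:
    """Hypothetical rollout that fails, e.g. a sandbox timeout."""
    raise RuntimeError("sandbox timed out")


if __name__ == "__main__":
    rewards, failures = asyncio.run(
        run_rollouts([good_rollout, bad_rollout, good_rollout])
    )
    print(f"{len(rewards)} rollouts succeeded, {len(failures)} failed")
```

A training loop built this way would usually log the failures and decide whether enough rollouts survived to form a batch, rather than silently discarding them.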

Supervised learning

SFT hyperparameter sweep with published results for 3 models
max_steps parameter, streaming dataset batch skip fix

Infrastructure & packaging

hatch-vcs versioning + nightly builds
Slimmed core dependencies (recipe extras separated)
Centralized exception hierarchy with picklability
Deprecation framework for API evolution
PEP 561 py.typed marker; public API surface cleanup
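
Picklability matters for exceptions that cross process boundaries (for example, from a distributed rollout worker back to the trainer). A Python exception whose `__init__` takes extra required arguments fails to unpickle unless those arguments end up in `self.args`. The sketch below shows one way to keep a small hierarchy picklable; the class names are hypothetical and are not taken from the cookbook's actual exception module.

```python
import pickle


class CookbookError(Exception):
    """Hypothetical root of a centralized exception hierarchy."""


class RolloutError(CookbookError):
    """Hypothetical error carrying structured context about a failed rollout."""

    def __init__(self, message: str, env_id: int):
        # Pass every constructor argument to super().__init__ so that they are
        # stored in self.args; pickle rebuilds the exception as cls(*self.args).
        super().__init__(message, env_id)
        self.message = message
        self.env_id = env_id

    def __str__(self) -> str:
        return f"{self.message} (env {self.env_id})"


def roundtrip(err: Exception) -> Exception:
    """Serialize and deserialize an exception, as a worker boundary would."""
    return pickle.loads(pickle.dumps(err))
```

Had `RolloutError.__init__` called `super().__init__(message)` only, unpickling would attempt `RolloutError("...")` with one argument and raise a `TypeError`.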

New recipes

Harbor RL (sandboxed terminal-bench), ifBench RLVR, tool-use agents library
Multi-turn on-policy distillation, vision input, rubric-based eval

Environments & sandboxes

Modal sandbox backend (warm pool, rate limiting, async)
Configurable KL penalty reference model
Pickle support for Renderer/Env (distributed execution)

Eval & logging

Inspect AI improvements, renderer metadata persistence
Logtree JSON + rollout summary JSONL exports
Unified training telemetry with Wandb Gantt charts
Per-iteration output subdirectories

Testing & CI

Downstream API compatibility tests, weights e2e suite
pytest markers, pyright CI, daily recipe smoke tests

See the full CHANGELOG.md for details.