v0.7 - context caching + tool-choice

@vivianjeet vivianjeet released this 07 Jun 18:36
· 24 commits to main since this release
b47880d

  • Model-tier router (UnifiedLLMClient): every node selects its model by
    tier through the shared retry/rotation spine; the fail-closed
    QuotaExhaustedError behavior is preserved.
  • Context caching on two axes: per-PR diff cache reused across the Flash
    nodes (~74% input-cost cut, verified live), cross-PR prefix cache on the
    security node; both respect the 2048-token floor.
  • Tool-choice benchmark across the four Gemini function-calling modes;
    Instructor retained for forced structured extraction.
  • Central src/config.py for all tunables (values flow one way, from config
    into state).
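
To make the tier-routing and fail-closed behavior above concrete, here is a minimal sketch, not the project's actual UnifiedLLMClient. It assumes a router that maps a tier name to a model, rotates through API keys when one runs out of quota, and raises QuotaExhaustedError once every key is exhausted instead of silently degrading. The names TierRouter, KeyQuotaError, the `send` callable, and the model and key strings are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class KeyQuotaError(Exception):
    """Hypothetical signal that a single API key has run out of quota."""


class QuotaExhaustedError(RuntimeError):
    """Raised when no key has quota left, so the caller fails closed."""


# Hypothetical transport signature: (model, api_key, prompt) -> response text.
SendFn = Callable[[str, str, str], str]


@dataclass
class TierRouter:
    """Illustrative stand-in for a tier-based client; not the real API."""

    tiers: Dict[str, str]   # tier name -> model name (assumed shape)
    api_keys: List[str]     # keys tried in order (assumed rotation policy)

    def call(self, tier: str, prompt: str, send: SendFn) -> str:
        # An unknown tier raises KeyError rather than picking a default model.
        model = self.tiers[tier]
        for key in self.api_keys:
            try:
                return send(model, key, prompt)
            except KeyQuotaError:
                # This key is out of quota: rotate to the next one.
                continue
        # Every key failed: fail closed instead of returning a partial result.
        raise QuotaExhaustedError(f"all keys exhausted for tier {tier!r}")
```

A node would then ask for a tier rather than a model name, for example `router.call("flash", prompt, send)`, so that changing which model backs a tier only touches the routing table.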

Tests: 210 passed, 2 deselected.