v0.7 - context caching + tool-choice
- Model-tier router (UnifiedLLMClient): every node selects by tier on the
  shared retry/rotation spine; fail-closed QuotaExhaustedError preserved.
- Context caching on two axes: per-PR diff cache reused across the Flash
  nodes (~74% input-cost cut, verified live), cross-PR prefix cache on the
  security node; both respect the 2048-token floor.
- Tool-choice benchmark across the four Gemini function-calling modes;
  Instructor retained for forced structured extraction.
- Central src/config.py for all tunables (one-way config -> state).
Tests: 210 passed, 2 deselected.
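
Illustrative sketch of two behaviours named above: fail-closed key rotation and the cache-floor check. This is not the code in this repo. QuotaExhaustedError and the 2048-token floor come from the notes; QuotaError, call_with_rotation, should_cache, and the inclusive reading of the floor are assumptions made for the example.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

MIN_CACHE_TOKENS = 2048  # floor named in the notes; inclusive comparison is an assumption


class QuotaError(RuntimeError):
    """Hypothetical per-key signal that one API key has run out of quota."""


class QuotaExhaustedError(RuntimeError):
    """Raised once every key has been tried and none has quota left."""


def should_cache(token_count: int, floor: int = MIN_CACHE_TOKENS) -> bool:
    # Only create a cache entry when the content meets the floor;
    # smaller payloads are sent uncached.
    return token_count >= floor


def call_with_rotation(keys: Sequence[str], call: Callable[[str], T]) -> T:
    # Try each key in order. A quota failure rotates to the next key.
    for key in keys:
        try:
            return call(key)
        except QuotaError:
            continue
    # Fail closed: raise instead of returning a default or partial result.
    raise QuotaExhaustedError(f"all {len(keys)} keys exhausted")
```

Raising when the last key fails, rather than returning an empty result, is what keeps the path fail-closed: a caller cannot mistake quota exhaustion for a clean review.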