Releases: vivianjeet/langgraph-pr-audit-agent
v1.0 - multi-agent PR audit on a Gemini model-tier router
A LangGraph PR-audit agent: plan-execute with reflexion, four-type memory
with a rule lifecycle, an MCP compliance server, and a single model-tier
router over Gemini with context caching, extended-thinking routing and
fail-closed retries. Every audit reports its cost and serving tier; the
CI gate blocks a merge on an unaddressed critical finding until a human
approves.
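The gate behaviour described above can be sketched roughly as follows. The `Finding` fields and the `ci_exit_code` name are hypothetical, chosen for illustration only; the repo's real types may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are assumptions."""
    severity: str           # e.g. "critical", "high", "low"
    addressed: bool = False


def ci_exit_code(findings: list[Finding], human_approved: bool) -> int:
    """Return 1 (block the merge) while a critical finding is unaddressed
    and no human has approved; return 0 otherwise."""
    blocking = any(f.severity == "critical" and not f.addressed for f in findings)
    return 1 if blocking and not human_approved else 0
```

A CI step would pass the returned value to `sys.exit`, so a non-zero code fails the job.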
v0.9 - extended-thinking router and full router consolidation
Adds a Gemini extended-thinking path for the hardest regulated reviews (gated by a
deterministic complexity heuristic) and routes every node's model call through one
router, so each gets consistent tier selection, fail-closed handling and a complete
cost trace. Includes a fix so per-call output ceilings apply.
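As an illustration of what a deterministic complexity gate might look like: the inputs, weights and threshold below are assumptions, not the heuristic the release actually ships.

```python
def needs_extended_thinking(
    changed_lines: int,
    files_touched: int,
    regulated: bool,
    threshold: int = 10,  # illustrative cut-off, not taken from the repo
) -> bool:
    """Deterministic gate: identical inputs always give the same routing decision.

    Only regulated reviews are eligible; among those, a simple integer score
    decides whether the extended-thinking path is used.
    """
    if not regulated:
        return False
    score = changed_lines // 100 + files_touched  # hypothetical weighting
    return score >= threshold
```

Because the gate involves no model call or randomness, the same PR routes the same way on every run, which keeps the cost trace reproducible.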
v0.8 - Langfuse cost tracking, tool-choice benchmark, docs alignment
Rollup over v0.7: Langfuse cost tracking at the router callback (per-tier spend,
fallback events, scores on trace, report CLIs); a fail-closed tier fix (security
stays on Pro on a cache miss); a tool-choice benchmark that captures thinking
tokens, with the docs corrected to match; a consolidated command reference; four
newly documented scripts; and removal of batch-mode framing from the prefix-cache docs.
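A per-tier spend rollup of the kind a report CLI could print might look like this. It is a stdlib-only sketch; the record keys (`tier`, `cost_usd`, `fallback`) are hypothetical and it does not use the Langfuse API.

```python
from collections import defaultdict


def summarize_spend(calls: list[dict]) -> tuple[dict[str, float], int]:
    """Aggregate cost per tier and count fallback events.

    Each call record is assumed to look like
    {"tier": "pro", "cost_usd": 0.5, "fallback": False}.
    """
    totals: dict[str, float] = defaultdict(float)
    fallbacks = 0
    for call in calls:
        totals[call["tier"]] += call["cost_usd"]
        if call.get("fallback", False):
            fallbacks += 1
    return dict(totals), fallbacks
```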
v0.7 - context caching + tool-choice
- Model-tier router (UnifiedLLMClient): every node selects by tier on the
  shared retry/rotation spine; fail-closed QuotaExhaustedError preserved.
- Context caching on two axes: per-PR diff cache reused across the Flash
  nodes (~74% input-cost cut, verified live), cross-PR prefix cache on the
  security node; both respect the 2048-token floor.
- Tool-choice benchmark across the four Gemini function-calling modes;
  Instructor retained for forced structured extraction.
- Central src/config.py for all tunables (one-way config -> state).
Tests: 210 passed, 2 deselected.
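A minimal sketch of how a per-PR diff cache could be reused across nodes while respecting the token floor. `DiffCacheRegistry` and the `create` callback are hypothetical stand-ins; the real cache handle would come from the Gemini API, which is not called here. Treating the floor as inclusive is also an assumption.

```python
import hashlib
from typing import Callable, Optional

CACHE_MIN_TOKENS = 2048  # the floor named in the notes above


class DiffCacheRegistry:
    """Maps a diff's content hash to one cache handle shared by all callers."""

    def __init__(self) -> None:
        self._handles: dict[str, str] = {}

    def get_or_create(
        self, diff: str, token_count: int, create: Callable[[str], str]
    ) -> Optional[str]:
        # Below the floor, skip caching and let the caller send the diff inline.
        if token_count < CACHE_MIN_TOKENS:
            return None
        key = hashlib.sha256(diff.encode("utf-8")).hexdigest()
        if key not in self._handles:
            # `create` is invoked once per distinct diff; later nodes reuse the handle.
            self._handles[key] = create(diff)
        return self._handles[key]
```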
v0.6 - model-tier router, central config, verdict-driven human review
Adds a model-tier router (UnifiedLLMClient) on top of the Gemini retry/
rotation spine, with tiered fallback and per-call cost accounting, plus a
single src/config.py for every tunable value. Human-review verdicts now drive
the report status, rule learning and the CI exit code through one shared path.
Scoring is multiplicative and citation verification is whitespace-safe.
Built on the v0.2-v0.5 line.
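Tiered fallback with fail-closed behaviour can be sketched as below. `QuotaExhaustedError` here is a local stand-in for the error named in these notes, and `call_with_fallback` is a hypothetical helper, not the UnifiedLLMClient API.

```python
from typing import Callable


class QuotaExhaustedError(RuntimeError):
    """Local stand-in for the router's quota error."""


def call_with_fallback(
    tiers: list[str], call: Callable[[str], str]
) -> tuple[str, str]:
    """Try each tier in order and return (serving_tier, response).

    Fails closed: if every tier is exhausted the error propagates, so the
    caller never receives an empty result that could be read as "clean".
    """
    for tier in tiers:
        try:
            return tier, call(tier)
        except QuotaExhaustedError:
            continue  # this tier is out of quota, try the next one
    raise QuotaExhaustedError(f"all tiers exhausted: {tiers}")
```

Returning the serving tier alongside the response is what lets each call be attributed to a tier in the cost trace.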
v0.5 - grounded compliance citations
Each compliance claim now carries a verbatim regulatory span that is verified
(quote -> substring-verify -> drop hallucinated), on the Gemini spine. Also
rolls in the MCP week: agent-as-client/server over stdio, multi-framework rule
packs, a raw-stdio test client, and the compliance audit trail.
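The quote -> substring-verify -> drop pipeline can be illustrated with a short sketch. The claim shape (a dict with a `quote` key) and the function names are assumptions; whitespace is collapsed before matching so a quote that wraps across lines still verifies.

```python
import re


def _normalize(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def verify_citations(claims: list[dict], source_text: str) -> list[dict]:
    """Keep only claims whose quote appears verbatim in the source text.

    Claims with an empty quote, or a quote not found in the source, are
    treated as hallucinated and dropped.
    """
    haystack = _normalize(source_text)
    kept = []
    for claim in claims:
        quote = _normalize(claim.get("quote", ""))
        if quote and quote in haystack:
            kept.append(claim)
    return kept
```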
v0.4 - Compliance grounding over MCP
This release makes the agent ground its findings in real regulations, and
ships the memory, context, and reliability work that backs a production audit.
Highlights
Compliance grounding over MCP
The agent now speaks the Model Context Protocol on both ends. A new compliance
stage triages whether a diff is regulated and pulls the matching regulatory
passages, so a security finding cites the exact clause it breaks - for example,
a SQL-injection diff is grounded in the RBI Cyber Security Framework and
OWASP A03.
- MCP client - the compliance node calls search_compliance_docs over
  stdio (retrieve -> compliance -> plan).
- MCP server - compliance-rag (FastMCP) exposes the same retrieval as a
  reusable tool any MCP client can call, from Claude Desktop to a raw-SDK
  client, with no glue.
- Pluggable framework packs - RBI, HIPAA, PCI-DSS, OWASP, and GDPR ship by
  default. Adding a framework is dropping a packs/*.yaml file and re-running
  the seeder; no code change.
- Fails soft - a missing server or an unregulated diff yields empty context
  and a visible trace line, never a crash and never a silent "clean".
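The fails-soft contract can be sketched without any MCP dependency. Here `lookup` is a hypothetical stand-in for the MCP tool call, and the trace strings are illustrative rather than the agent's actual log lines.

```python
from typing import Callable


def fetch_compliance_context(
    lookup: Callable[[str], list[dict]], diff: str, regulated: bool
) -> tuple[list[dict], str]:
    """Return (context, trace_line) and never raise on a missing server.

    An unregulated diff short-circuits with no lookup; an unreachable server
    yields empty context plus a trace line that says so explicitly.
    """
    if not regulated:
        return [], "compliance: diff not regulated, lookup skipped"
    try:
        return lookup(diff), "compliance: lookup ok"
    except (ConnectionError, FileNotFoundError, TimeoutError) as exc:
        return [], f"compliance: server unavailable ({type(exc).__name__}), no context"
```

The trace line is what separates "nothing applied" from "could not check", so an empty context is never mistaken for a clean result.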
Memory, context, and orchestration
- Four-type agent memory (semantic, episodic, procedural, in-context) with a
  typed procedural-rule lifecycle and a governance CLI.
- Priority-ordered context budgeting and an in-graph history-compression node
  for long sessions.
- The three audits run concurrently, with thread-safe API-key rotation under
  the fan-out.
- Pluggable checkpointer: in-memory by default, opt-in durable SQLite.
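Priority-ordered context budgeting can be sketched as a greedy fill. The tuple layout and the convention that a lower number means higher priority are assumptions made for this example.

```python
def budget_context(items: list[tuple[int, str, int]], max_tokens: int) -> list[str]:
    """Greedily keep context items in priority order until the budget is spent.

    Each item is (priority, text, token_count); a lower priority number is
    more important. An item that does not fit is skipped, and smaller
    lower-priority items may still be admitted after it.
    """
    kept: list[str] = []
    used = 0
    for _priority, text, tokens in sorted(items, key=lambda item: item[0]):
        if used + tokens <= max_tokens:
            kept.append(text)
            used += tokens
    return kept
```

For example, with a 100-token budget and items `(1, "diff", 60)`, `(2, "history", 50)`, `(3, "rules", 30)`, the diff is kept, the history is skipped because it would overflow, and the rules still fit.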
Reliability
- A depleted-credits 429 is classified as terminal billing rather than a
  transient rate limit, so it rotates keys instead of burning retries.
- Corpus seeding is batched to stay under the embedding per-minute quota.
- MCP tool output is normalized back to structured records on the client.
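The 429 classification can be illustrated as follows. The marker substrings are hypothetical; real provider error messages vary and the repo's actual matching logic is not shown in these notes.

```python
# Hypothetical substrings that would indicate a billing problem rather than
# a transient rate limit; adjust to the provider's real error text.
BILLING_MARKERS = ("credit", "billing", "prepay")


def classify_429(message: str) -> str:
    """Return "billing" for a depleted-credits 429, else "rate_limit"."""
    lowered = message.lower()
    if any(marker in lowered for marker in BILLING_MARKERS):
        return "billing"
    return "rate_limit"


def next_action(kind: str) -> str:
    """A billing error is terminal for that key, so rotate; otherwise back off."""
    return "rotate_key" if kind == "billing" else "retry_with_backoff"
```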
Verification
- 163 tests passing (pytest -m "not integration").
- Live end-to-end: a regulated diff produces cross-framework citations; an
  unregulated diff short-circuits with no lookup.
Full changelog: v0.3...v0.4
v0.3 - concurrent audits, rule governance, and design-rationale docs
- Concurrent audit fan-out (async nodes, overlapping Gemini calls) with thread-safe key rotation in the retry layer.
- Procedural-rule governance: full lifecycle (seeded / learned_pending / learned_approved / rejected / retired) plus an offline review CLI with a near-duplicate hint and per-rule PR-verdict provenance.
- Latency benchmark harness and a design-rationale README section.
- CI hardening: hermetic test job, async-correct gate, dependency and PR-approval-lookup fixes.
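The rule lifecycle and the near-duplicate hint can be sketched together. The state names come from the list above; the allowed transitions, the 0.9 threshold and the use of `difflib` are assumptions rather than the repo's actual design.

```python
from difflib import SequenceMatcher

# Assumed transition table over the five lifecycle states named above.
ALLOWED: dict[str, set[str]] = {
    "seeded": {"retired"},
    "learned_pending": {"learned_approved", "rejected"},
    "learned_approved": {"retired"},
    "rejected": set(),
    "retired": set(),
}


def transition(state: str, new_state: str) -> str:
    """Return new_state if the move is allowed, else raise ValueError."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state


def near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hint that two rule texts are nearly the same (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
```

A review CLI could call `near_duplicate` against existing approved rules before offering a pending rule for approval.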