Releases: vivianjeet/langgraph-pr-audit-agent

v1.0 - multi-agent PR audit on a Gemini model-tier router

08 Jun 11:49
a5e3bb6

A LangGraph PR-audit agent: plan-execute with reflexion, four-type memory
with a rule lifecycle, an MCP compliance server, and a single model-tier
router over Gemini with context caching, extended-thinking routing and
fail-closed retries. Every audit reports its cost and serving tier; the
CI gate blocks a merge on an unaddressed critical finding until a human
approves.

v0.9 - extended-thinking router and full router consolidation

08 Jun 09:29
9126387

Adds a Gemini extended-thinking path for the hardest regulated reviews (gated by a
deterministic complexity heuristic) and routes every node's model call through one
router, so each gets consistent tier selection, fail-closed handling and a complete
cost trace. Includes a fix so per-call output ceilings apply.

v0.8 - Langfuse cost tracking, tool-choice benchmark, docs alignment

08 Jun 09:27
798cdcc

Rollup over v0.7: Langfuse cost tracking at the router callback (per-tier spend,
fallback events, scores on trace, report CLIs), a fail-closed tier fix (security
stays on Pro on cache miss), a tool-choice benchmark that captures thinking tokens
with the docs corrected to match, a consolidated command reference, four newly
documented scripts and removal of batch-mode framing from the prefix-cache docs.

v0.7 - context caching + tool-choice

07 Jun 18:36
b47880d

  • Model-tier router (UnifiedLLMClient): every node selects by tier on the
    shared retry/rotation spine; fail-closed QuotaExhaustedError preserved.
  • Context caching on two axes: per-PR diff cache reused across the Flash
    nodes (~74% input-cost cut, verified live), cross-PR prefix cache on the
    security node; both respect the 2048-token floor.
  • Tool-choice benchmark across the four Gemini function-calling modes;
    Instructor retained for forced structured extraction.
  • Central src/config.py for all tunables (one-way config -> state).
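
A minimal sketch of what fail-closed tier selection could look like. Only the names UnifiedLLMClient and QuotaExhaustedError come from these notes; the `ALLOWED_TIERS` table, the `summary` node and the `route` helper are hypothetical, and the exception class below is a local stand-in rather than the project's own.

```python
from typing import Callable


class QuotaExhaustedError(RuntimeError):
    """Local stand-in for the project's error of the same name."""


# Hypothetical per-node tier lists, tried in order. A node pinned to a
# single tier (here: security) can never be silently downgraded.
ALLOWED_TIERS = {
    "security": ["pro"],
    "summary": ["flash", "pro"],
}


def route(node: str, call: Callable[[str], str]) -> tuple[str, str]:
    """Try each permitted tier in order and return (tier, result).

    Fail-closed: if every permitted tier is exhausted, raise instead of
    returning an empty or degraded answer.
    """
    last_error = None
    for tier in ALLOWED_TIERS[node]:
        try:
            return tier, call(tier)
        except QuotaExhaustedError as exc:
            last_error = exc
    raise QuotaExhaustedError(f"all tiers exhausted for node {node!r}") from last_error
```

The point of raising at the end is that a caller cannot mistake "no model answered" for "no findings".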

Tests: 210 passed, 2 deselected.

v0.6 - model-tier router, central config, verdict-driven human review

07 Jun 11:55
fbc55d7

Adds a model-tier router (UnifiedLLMClient) on top of the Gemini
retry/rotation spine, with tiered fallback and per-call cost accounting, plus
a single src/config.py for every tunable value. Human-review verdicts now
drive the report status, rule learning and the CI exit code through one shared
path. Scoring is now multiplicative, and citation verification is
whitespace-safe.

Built on the v0.2-v0.5 line.

v0.5 - grounded compliance citations

06 Jun 17:05
7bab53f

Each compliance claim now carries a verified verbatim regulatory span (quote ->
substring-verify -> drop hallucinated), on the Gemini spine. Also rolls in the
MCP week: agent-as-client/server over stdio, multi-framework rule packs, a
raw-stdio test client, and the compliance audit trail.
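
The quote -> substring-verify -> drop pipeline can be sketched roughly as below. This is an illustration under stated assumptions, not the project's code: the `Claim` type and `verify_claims` helper are hypothetical, and collapsing whitespace before matching is an assumption about how a whitespace-tolerant check might work.

```python
import re
from dataclasses import dataclass


@dataclass
class Claim:
    text: str   # the compliance claim made by the model
    quote: str  # the regulatory span the model says it is quoting


def _normalize(s: str) -> str:
    """Collapse every run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", s).strip()


def verify_claims(claims: list[Claim], passages: list[str]) -> list[Claim]:
    """Keep only claims whose quote is a substring of a retrieved passage.

    A claim with an empty quote, or a quote found in no passage, is
    treated as hallucinated and dropped.
    """
    haystacks = [_normalize(p) for p in passages]
    kept = []
    for claim in claims:
        needle = _normalize(claim.quote)
        if needle and any(needle in h for h in haystacks):
            kept.append(claim)
    return kept


passages = ["Banks shall encrypt   customer data\nat rest and in transit."]
claims = [
    Claim("Plaintext storage of PII", "encrypt customer data at rest"),
    Claim("Invented requirement", "must rotate keys weekly"),
]
kept = verify_claims(claims, passages)  # only the first claim survives
```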

v0.4 - Compliance grounding over MCP

06 Jun 12:47
f776c69

This release makes the agent ground its findings in real regulations, and
ships the memory, context, and reliability work that backs a production audit.

Highlights

Compliance grounding over MCP

The agent now speaks the Model Context Protocol on both ends. A new compliance
stage triages whether a diff is regulated and pulls the matching regulatory
passages, so a security finding cites the exact clause it breaks - for example,
a SQL-injection diff is grounded in the RBI Cyber Security Framework and
OWASP A03.

  • MCP client - the compliance node calls search_compliance_docs over
    stdio (retrieve -> compliance -> plan).
  • MCP server - compliance-rag (FastMCP) exposes the same retrieval as a
    reusable tool any MCP client can call, from Claude Desktop to a raw-SDK
    client, with no glue.
  • Pluggable framework packs - RBI, HIPAA, PCI-DSS, OWASP, and GDPR ship by
    default. Adding a framework means dropping in a packs/*.yaml file and
    re-running the seeder; no code change is needed.
  • Fails soft - a missing server or an unregulated diff yields empty context
    and a visible trace line, never a crash and never a silent "clean".
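
The fail-soft behaviour above might look something like the following sketch. The `ComplianceContext` type and `gather_compliance` helper are hypothetical, and `search` stands in for whatever callable wraps the search_compliance_docs tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class ComplianceContext:
    passages: list[str] = field(default_factory=list)
    trace: list[str] = field(default_factory=list)


def gather_compliance(
    diff: str,
    is_regulated: Callable[[str], bool],
    search: Callable[[str], Iterable[str]],
) -> ComplianceContext:
    """Return regulatory passages for a diff without ever raising.

    Every path appends a trace line, so an empty result is always
    explained rather than looking like a clean audit.
    """
    ctx = ComplianceContext()
    if not is_regulated(diff):
        ctx.trace.append("compliance: diff not regulated, lookup skipped")
        return ctx
    try:
        ctx.passages = list(search(diff))
    except Exception as exc:  # e.g. the MCP server is not running
        ctx.trace.append(f"compliance: lookup unavailable ({exc})")
        return ctx
    ctx.trace.append(f"compliance: {len(ctx.passages)} passage(s) retrieved")
    return ctx
```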

Memory, context, and orchestration

  • Four-type agent memory (semantic, episodic, procedural, in-context) with a
    typed procedural-rule lifecycle and a governance CLI.
  • Priority-ordered context budgeting and an in-graph history-compression node
    for long sessions.
  • The three audits run concurrently, with thread-safe API-key rotation under
    the fan-out.
  • Pluggable checkpointer: in-memory by default, opt-in durable SQLite.
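
As an illustration of priority-ordered context budgeting, here is a small greedy sketch. The function name, the tuple layout and the whitespace-based token count are all assumptions; a real implementation would presumably use the model's tokenizer.

```python
from typing import Callable


def budget_context(
    items: list[tuple[int, str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> tuple[list[str], int]:
    """Pick context items in priority order until the budget is spent.

    items holds (priority, name, text); a lower priority number means
    more important. An item that does not fit is skipped, and smaller
    lower-priority items may still be admitted after it.
    """
    chosen, used = [], 0
    for _priority, name, text in sorted(items, key=lambda it: it[0]):
        cost = count_tokens(text)
        if used + cost <= max_tokens:
            chosen.append(name)
            used += cost
    return chosen, used


items = [
    (2, "history", "a b c d e"),
    (0, "diff", "x y z"),
    (1, "rules", "p q"),
]
chosen, used = budget_context(items, max_tokens=5)
```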

Reliability

  • A depleted-credits 429 is classified as a terminal billing error rather
    than a transient rate limit, so the client rotates keys instead of
    burning retries.
  • Corpus seeding is batched to stay under the embedding per-minute quota.
  • MCP tool output is normalized back to structured records on the client.
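
The 429 classification could be sketched as below. The marker substrings, the `Failure` enum and the `next_action` helper are hypothetical; the actual provider error wording and the project's retry layer may differ.

```python
from enum import Enum


class Failure(Enum):
    TRANSIENT = "transient"          # rate limit: back off, retry same key
    TERMINAL_BILLING = "billing"     # key is out of credits: rotate
    OTHER = "other"                  # not a 429 at all


# Hypothetical substrings that mark a 429 as a billing problem.
BILLING_MARKERS = ("credits", "billing")


def classify(status: int, message: str) -> Failure:
    if status != 429:
        return Failure.OTHER
    lowered = message.lower()
    if any(marker in lowered for marker in BILLING_MARKERS):
        return Failure.TERMINAL_BILLING
    return Failure.TRANSIENT


def next_action(status: int, message: str, key_index: int, n_keys: int) -> tuple[str, int]:
    """Return (action, key_index_to_use) for a failed call."""
    kind = classify(status, message)
    if kind is Failure.TERMINAL_BILLING:
        if key_index + 1 >= n_keys:
            return ("fail", key_index)      # no keys left to rotate to
        return ("rotate", key_index + 1)    # skip retries on a dead key
    if kind is Failure.TRANSIENT:
        return ("retry", key_index)
    return ("raise", key_index)
```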

Verification

  • 163 tests passing (pytest -m "not integration").
  • Live end-to-end: a regulated diff produces cross-framework citations; an
    unregulated diff short-circuits with no lookup.

Full changelog: v0.3...v0.4

v0.3 - concurrent audits, rule governance, and design-rationale docs

03 Jun 20:54
6f1ebe0

  • Concurrent audit fan-out (async nodes, overlapping Gemini calls) with thread-safe key rotation in the retry layer.
  • Procedural-rule governance: full lifecycle (seeded / learned_pending / learned_approved / rejected / retired) plus an offline review
    CLI with a near-duplicate hint and per-rule PR-verdict provenance.
  • Latency benchmark harness and a design-rationale README section.
  • CI hardening: hermetic test job, async-correct gate, dependency and PR-approval-lookup fixes.
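
The rule lifecycle above lends itself to a small state machine. The five state names come from these notes; the allowed transitions and the notion of which states are "active" are assumptions made for illustration.

```python
from enum import Enum


class RuleState(str, Enum):
    SEEDED = "seeded"
    LEARNED_PENDING = "learned_pending"
    LEARNED_APPROVED = "learned_approved"
    REJECTED = "rejected"
    RETIRED = "retired"


# Assumed transition table: pending rules are reviewed, live rules can be
# retired, and rejected or retired rules are terminal.
ALLOWED = {
    RuleState.SEEDED: {RuleState.RETIRED},
    RuleState.LEARNED_PENDING: {RuleState.LEARNED_APPROVED, RuleState.REJECTED},
    RuleState.LEARNED_APPROVED: {RuleState.RETIRED},
    RuleState.REJECTED: set(),
    RuleState.RETIRED: set(),
}


def transition(current: RuleState, target: RuleState) -> RuleState:
    """Return the new state, or raise if the move is not permitted."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_active(state: RuleState) -> bool:
    """Assumed: only seeded and approved rules influence an audit."""
    return state in {RuleState.SEEDED, RuleState.LEARNED_APPROVED}
```

Keeping the table as data means a review CLI can list the legal next states for a rule without duplicating the logic.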