llmgate

Pre-deploy LLM regression testing for CI pipelines. Trace your LLM calls, then run llmgate diff <baseline> <current> to fail your PR if output quality dropped. No server, no account, SQLite only.

import llmgate

@llmgate.trace
def answer(question: str) -> str:
    return my_llm_call(question)

That's it. Every call is logged locally. When you change your prompt or swap models, run:

llmgate diff main feature-branch

If outputs degraded, the command exits 1 and your PR fails. No server, no account, no config.

Install

pip install llmgate

Usage

1. Trace your LLM calls

import llmgate
import os

os.environ["LLMGATE_RUN_ID"] = "v1.0"  # set per run, or use git SHA in CI

@llmgate.trace
def my_pipeline(query: str) -> str:
    context = retrieve(query)
    return llm.complete(f"{context}\n\n{query}")
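To make "every call is logged locally" concrete, here is a minimal sketch of how a tracing decorator of this kind could record calls in SQLite. It is not llmgate's actual implementation: the make_tracer helper, the calls table, its columns, and the "default" fallback run ID are all assumptions made for illustration. Only the LLMGATE_RUN_ID variable comes from the usage above.

```python
import functools
import os
import sqlite3
import time


def make_tracer(db_path: str = ":memory:"):
    """Return (decorator, connection) for a hypothetical call tracer."""
    conn = sqlite3.connect(db_path)
    # Hypothetical schema: one row per traced call.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls ("
        "run_id TEXT, func TEXT, input TEXT, output TEXT, latency_ms REAL)"
    )

    def trace(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000
            # Read the run ID at call time so it can differ between runs.
            # The "default" fallback is an assumption of this sketch.
            run_id = os.environ.get("LLMGATE_RUN_ID", "default")
            conn.execute(
                "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
                (run_id, func.__name__, repr((args, kwargs)), str(result), latency_ms),
            )
            conn.commit()
            return result

        return wrapper

    return trace, conn


trace, conn = make_tracer()
os.environ["LLMGATE_RUN_ID"] = "v1.0"


@trace
def answer(question: str) -> str:
    return f"echo: {question}"  # stand-in for a real LLM call


answer("What is the capital of France?")
rows = conn.execute("SELECT run_id, func, output FROM calls").fetchall()
```

Grouping rows by run_id is what would let a later step compare one run against another.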

2. Assert output quality

output = my_pipeline("What is the capital of France?")
baseline = "The capital of France is Paris."  # illustrative reference output from a known-good run

llmgate.assert_contains(output, "Paris")
llmgate.assert_output(output, lambda s: len(s) < 500, "response too long")
llmgate.assert_similarity(output, baseline, threshold=0.85)

3. Diff runs in CI

# See all recorded runs
llmgate runs

# Compare two runs — exits 1 if regressions found
llmgate diff v1.0 v1.1

# Inspect a specific run
llmgate show abc123

4. GitHub Actions

- name: Run LLM eval suite
  env:
    LLMGATE_RUN_ID: ${{ github.sha }}
  run: python examples/eval_suite.py

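# github.base_ref is the PR's target branch name (e.g. main), so a run recorded
# under that ID must already be present in .llmgate.db to serve as the baseline.
- name: Check for regressions
  run: llmgate diff ${{ github.base_ref }} ${{ github.sha }}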

How it works

All traces are stored in .llmgate.db (SQLite; commit it or cache it as a CI artifact).
@llmgate.trace works with any function that returns a string or an OpenAI/Anthropic response object.
llmgate diff computes token-level similarity between baseline and current outputs.
Nothing leaves your machine unless you choose to push the .db file.
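As a rough illustration of what a token-level similarity comparison can look like, the sketch below scores two outputs with Python's difflib over whitespace-separated tokens and flags pairs that fall below a threshold. The metric llmgate actually uses is not specified here, so difflib, whitespace tokenization, the function names, and pairing outputs by input string are all assumptions.

```python
from difflib import SequenceMatcher


def token_similarity(baseline: str, current: str) -> float:
    """Ratio in [0, 1] over whitespace-separated tokens (1.0 = identical)."""
    a, b = baseline.split(), current.split()
    if not a and not b:
        return 1.0  # two empty outputs are treated as identical
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def find_regressions(baseline_run: dict, current_run: dict, threshold: float = 0.8):
    """Pair outputs by input; return (input, score) for scores below threshold."""
    regressions = []
    for key, base_out in baseline_run.items():
        if key not in current_run:
            continue  # this sketch ignores inputs missing from the current run
        score = token_similarity(base_out, current_run[key])
        if score < threshold:
            regressions.append((key, score))
    return regressions


baseline_run = {"capital?": "The capital of France is Paris."}
current_run = {"capital?": "The capital of France is Lyon."}

# Five of six tokens match on each side: 2 * 5 / 12, roughly 0.83.
score = token_similarity(baseline_run["capital?"], current_run["capital?"])
flagged_default = find_regressions(baseline_run, current_run)        # threshold 0.8
flagged_strict = find_regressions(baseline_run, current_run, 0.85)   # stricter
```

Note that a single wrong token in a short answer still scores about 0.83 here, which passes a 0.8 threshold. A similarity score alone cannot tell a harmless rewording from a factual error, which is why pairing it with content assertions such as assert_contains is useful.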

CLI reference

llmgate runs                          # list all runs with stats
llmgate show <run-id>                 # inspect calls in a run
llmgate diff <baseline> <current>     # compare runs, exit 1 on regression
  --threshold FLOAT                   # similarity threshold (default: 0.8)
  --no-fail                           # report only, don't exit 1
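The interaction between --threshold and --no-fail can be sketched as a small function that turns per-call similarity scores into a process exit code. This is an assumed reading of the flags above, not llmgate's source: the exit_code name, the strict less-than comparison, and the message format are hypothetical.

```python
def exit_code(scores, threshold: float = 0.8, no_fail: bool = False) -> int:
    """Return 1 if any score is below threshold and failing is enabled, else 0."""
    regressed = [s for s in scores if s < threshold]
    for s in regressed:
        # Regressions are always reported, even when no_fail suppresses the failure.
        print(f"regression: similarity {s:.2f} < {threshold:.2f}")
    return 1 if regressed and not no_fail else 0


scores = [0.95, 0.72]  # made-up similarity scores for two traced calls

fails = exit_code(scores)                     # one score is below the 0.8 default
report_only = exit_code(scores, no_fail=True)
lenient = exit_code(scores, threshold=0.7)
```

A CLI entry point would pass the returned value to sys.exit, which is what makes a CI step fail on a non-zero code.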
