zer0contextlost/llmgate

llmgate

Pre-deploy LLM regression testing for CI pipelines. Trace your LLM calls, then run llmgate diff <baseline> <current> to fail the PR if output quality dropped. No server, no account, SQLite only.

import llmgate

@llmgate.trace
def answer(question: str) -> str:
    return my_llm_call(question)

That's it. Every call is logged locally. When you change your prompt or swap models, run:

llmgate diff main feature-branch

If outputs degraded, the command exits 1 and your PR fails. No server, no account, no config.


Install

pip install llmgate

Usage

1. Trace your LLM calls

import llmgate
import os

os.environ["LLMGATE_RUN_ID"] = "v1.0"  # set per run, or use git SHA in CI

@llmgate.trace
def my_pipeline(query: str) -> str:
    context = retrieve(query)
    return llm.complete(f"{context}\n\n{query}")
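
For readers curious what a tracing decorator of this kind does, here is a minimal sketch. It is not llmgate's implementation: the `traced` factory, the `calls` table, and its columns are hypothetical names chosen for illustration. The only detail taken from this README is that the run ID comes from the `LLMGATE_RUN_ID` environment variable and that calls are logged to a local SQLite file.

```python
import functools
import json
import os
import sqlite3
import tempfile
import time


def traced(db_path):
    """Return a decorator that logs each call of the wrapped function to SQLite.

    Hypothetical helper for illustration; llmgate's real schema may differ.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            output = func(*args, **kwargs)
            latency_ms = (time.perf_counter() - start) * 1000.0
            # Read the run ID at call time so each CI run can set its own.
            run_id = os.environ.get("LLMGATE_RUN_ID", "default")
            conn = sqlite3.connect(db_path)
            try:
                with conn:  # commits on success, rolls back on error
                    conn.execute(
                        "CREATE TABLE IF NOT EXISTS calls ("
                        "run_id TEXT, function TEXT, inputs TEXT, "
                        "output TEXT, latency_ms REAL)"
                    )
                    conn.execute(
                        "INSERT INTO calls VALUES (?, ?, ?, ?, ?)",
                        (
                            run_id,
                            func.__name__,
                            json.dumps({"args": args, "kwargs": kwargs}, default=str),
                            str(output),
                            latency_ms,
                        ),
                    )
            finally:
                conn.close()
            return output

        return wrapper

    return decorator


# Demo with a stand-in for the LLM call, writing to a throwaway database.
demo_db = os.path.join(tempfile.mkdtemp(), "traces.db")
os.environ["LLMGATE_RUN_ID"] = "v1.0"


@traced(demo_db)
def answer(question: str) -> str:
    return f"echo: {question}"


answer("What is the capital of France?")
```

Reading the environment variable inside the wrapper, rather than at import time, means the same process can record several runs by changing `LLMGATE_RUN_ID` between calls.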

2. Assert output quality

output = my_pipeline("What is the capital of France?")

llmgate.assert_contains(output, "Paris")
llmgate.assert_output(output, lambda s: len(s) < 500, "response too long")
baseline = "The capital of France is Paris."  # reference output, e.g. from a previous run
llmgate.assert_similarity(output, baseline, threshold=0.85)

3. Diff runs in CI

# See all recorded runs
llmgate runs

# Compare two runs — exits 1 if regressions found
llmgate diff v1.0 v1.1

# Inspect a specific run
llmgate show abc123

4. GitHub Actions

- name: Run LLM eval suite
  env:
    LLMGATE_RUN_ID: ${{ github.sha }}
  run: python examples/eval_suite.py

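- name: Check for regressions
  # Runs are recorded under commit SHAs above, so reference the baseline by SHA too
  # (github.base_ref is a branch name and is only set on pull_request events).
  run: llmgate diff ${{ github.event.pull_request.base.sha }} ${{ github.sha }}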

How it works

  • All traces are stored in .llmgate.db, a SQLite file that you can commit or cache as a CI artifact
  • @llmgate.trace works with any function that returns a string or an OpenAI/Anthropic response object
  • llmgate diff computes token-level similarity between baseline and current outputs
  • Nothing leaves your machine unless you choose to push the .db file
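
The README does not say which token-level metric llmgate diff uses. As a rough illustration only, the sketch below scores two outputs with Python's standard `difflib.SequenceMatcher` over whitespace-separated tokens. The function name `token_similarity`, the lowercasing, and the whitespace tokenization are all assumptions, not llmgate's algorithm.

```python
from difflib import SequenceMatcher


def token_similarity(baseline: str, current: str) -> float:
    """Similarity in [0, 1] over lowercased, whitespace-separated tokens.

    Stand-in metric for illustration; not necessarily what llmgate computes.
    """
    a = baseline.lower().split()
    b = current.lower().split()
    if not a and not b:
        return 1.0  # two empty outputs are treated as identical
    # autojunk=False disables the popularity heuristic so long outputs
    # with repeated tokens are compared exactly.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


print(token_similarity("a b c d", "a b c e"))  # 3 of 4 tokens match -> 0.75
```

The ratio is twice the number of matched tokens divided by the total token count of both strings, so reordered or substituted tokens lower the score even when the two outputs have the same length.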

CLI reference

llmgate runs                          # list all runs with stats
llmgate show <run-id>                 # inspect calls in a run
llmgate diff <baseline> <current>     # compare runs, exit 1 on regression
  --threshold FLOAT                   # similarity threshold (default: 0.8)
  --no-fail                           # report only, don't exit 1
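
To make the interaction between --threshold and --no-fail concrete, here is a small sketch of the gating decision. It assumes a call counts as a regression when its similarity score falls below the threshold, which the README implies but does not state. The `gate` function and the score dictionary are hypothetical and not part of llmgate.

```python
def gate(scores, threshold=0.8, no_fail=False):
    """Return (exit_code, regressions) for per-call similarity scores.

    `scores` maps a call name to its baseline-vs-current similarity.
    Assumption: a score strictly below `threshold` is a regression.
    """
    regressions = sorted(name for name, score in scores.items() if score < threshold)
    # With no_fail set, regressions are still reported but the exit code stays 0.
    exit_code = 1 if regressions and not no_fail else 0
    return exit_code, regressions


print(gate({"answer": 0.95, "summarize": 0.42}))                # (1, ['summarize'])
print(gate({"answer": 0.95, "summarize": 0.42}, no_fail=True))  # (0, ['summarize'])
```

A CI step would pass the exit code to `sys.exit`, so a non-zero value fails the job while report-only mode lets it pass.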
