# TrustyAI Garak Provider - Unified Guide

This notebook is the end-to-end walkthrough for this repository.

Use this one notebook for all deployment modes:

- Total inline (local server + local scans)
- Partial remote (local server + KFP scans)
- Total remote (cluster server + KFP scans)

The workflow is mostly identical across modes. In practice, the main change is `BASE_URL` and deployment setup.


## What This Notebook Covers


- Listing all predefined Garak benchmarks
- Running a predefined benchmark
- Reading results (`scores`, probe-level metrics, `_overall`)
- TBSA interpretation (`tbsa`, `version_probe_hash`, `probe_detector_pairs_contributing`)
- Accessing scan artifacts, such as the HTML report, from job metadata
- Registering custom benchmarks
- Updating existing/predefined benchmarks (deep-merge behavior)
- Shield testing with `shield_ids` and `shield_config`

In [46]:
import logging
import time
import random
from rich.pretty import pprint
from pathlib import Path
import webbrowser

from llama_stack_client import LlamaStackClient
from llama_stack_provider_trustyai_garak import (
    GarakCommandConfig,
    GarakSystemConfig,
    GarakRunConfig,
    GarakPluginsConfig,
    GarakReportingConfig,
)

# Suppress noisy HTTPX logs
logging.getLogger("httpx").setLevel(logging.WARNING)

## 1) Configure `BASE_URL`


Set this depending on deployment mode.
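
To switch modes without editing the notebook, the URL could also be read from an environment variable. The following is a minimal sketch; the variable name `LLAMA_STACK_BASE_URL` and the helper `resolve_base_url` are illustrative assumptions, not something the provider defines.

```python
import os

# Local llama-stack address used elsewhere in this notebook.
DEFAULT_BASE_URL = "http://localhost:8321"


def resolve_base_url(env=None):
    """Return the server URL, preferring a hypothetical LLAMA_STACK_BASE_URL variable."""
    env = os.environ if env is None else env
    # Fall back to the local default when the variable is unset or empty.
    url = env.get("LLAMA_STACK_BASE_URL") or DEFAULT_BASE_URL
    # Drop a trailing slash so the URL has one canonical form.
    return url.rstrip("/")


print(resolve_base_url({}))
```

The result can then be passed as `base_url` when constructing the client.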

In [2]:
# Partial remote or total inline (local llama-stack)
BASE_URL = "http://localhost:8321"

# Total remote examples:
# BASE_URL = "http://<lsd-service>.<namespace>.svc.cluster.local:8321"
# BASE_URL = "https://<route-hostname>"

client = LlamaStackClient(base_url=BASE_URL)

print(f"Connected to: {BASE_URL}")


Connected to: http://localhost:8321


## 2) Discover Garak Provider



In [3]:
garak_provider = next(
    p for p in client.providers.list()
    if p.provider_type.endswith("trustyai_garak")
)
garak_provider_id = garak_provider.provider_id

print("Garak provider:")
pprint(garak_provider)
print(f"Using provider_id: {garak_provider_id}")


Garak provider:


Using provider_id: trustyai_garak


## 3) List Predefined Benchmarks


In [4]:
all_benchmarks = client.alpha.benchmarks.list()
predefined = [b for b in all_benchmarks if b.identifier.startswith("trustyai_garak::")]

print(f"Found {len(predefined)} predefined benchmarks:")
identifier_col = "identifier"
name_col = "name"
desc_col = "description"
print("-" * 150)
print(f"{identifier_col:<{33}} | {name_col:<{27}} | {desc_col}")
print("-" * 150)
for b in sorted(predefined, key=lambda x: x.identifier):
    name = (b.metadata or {}).get(name_col, "")
    desc = (b.metadata or {}).get(desc_col, "")
    print(f"{b.identifier:<{33}} | {name:<{27}} | {desc}")


Found 8 predefined benchmarks:
------------------------------------------------------------------------------------------------------------------------------------------------------
identifier                        | name                        | description
------------------------------------------------------------------------------------------------------------------------------------------------------
trustyai_garak::avid              | AVID Taxonomy               | AI Vulnerability and Incident Database - All vulnerabilities
trustyai_garak::avid_ethics       | AVID Ethics Taxonomy        | AI Vulnerability and Incident Database - Ethical concerns
trustyai_garak::avid_performance  | AVID Performance Taxonomy   | AI Vulnerability and Incident Database - Performance issues
trustyai_garak::avid_security     | AVID Security Taxonomy      | AI Vulnerability and Incident Database - Security vulnerabilities
trustyai_garak::cwe               | Common Weakness Enumeration | Common Weaknes

### Performance Tip for Predefined Benchmarks

Predefined benchmarks are comprehensive by design and can take a long time to run.

For faster exploratory runs, create a tuned variant and lower `garak_config.run.soft_probe_prompt_cap` (fewer prompts per probe). For full assessment/comparability, keep defaults or use the same cap across compared runs.

In [57]:
# Example: create a faster (less comprehensive) variant of a predefined benchmark
fast_benchmark_id = "owasp_fast_demo"

client.alpha.benchmarks.register(
    benchmark_id=fast_benchmark_id,
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id="trustyai_garak::owasp_llm_top10",
    metadata={
        "garak_config": {
            "run": {
                "soft_probe_prompt_cap": 100,
            }
        }
    },
)

pprint(client.alpha.benchmarks.retrieve(benchmark_id=fast_benchmark_id))



## 4) Pick a Model and Run a Predefined Benchmark


In [5]:
models = client.models.list()
llm_models = [m for m in models if (m.custom_metadata or {}).get("model_type") == "llm"]

if not llm_models:
    raise RuntimeError("No LLM models found. Register a model first.")

pprint(llm_models)

In [6]:
# Pick the model you want to test
model_id = 'ollama/qwen2.5:1.5b'

# Pick the benchmark you want to run
benchmark_id = "trustyai_garak::avid_performance"

In [7]:
job = client.alpha.eval.run_eval(
    benchmark_id=benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {
                "max_tokens": 100,
            },
        }
    },
)

print("Started job:")
pprint(job)

Started job:


In [8]:
def wait_for_job(client: LlamaStackClient, benchmark_id: str, job_id: str, poll_interval: int = 20):
    """Poll the job status until it reaches a terminal state, then return that status."""
    while True:
        status = client.alpha.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
        print(status)
        if status.status in {"completed", "failed", "cancelled"}:
            return status
        time.sleep(poll_interval)

status = wait_for_job(client, benchmark_id=benchmark_id, job_id=job.job_id, poll_interval=90)
print(status)


Job(job_id='garak-job-823012c0-3727-42c3-aa6a-b81c12adb20c', status='in_progress', metadata={'created_at': '2026-02-17T23:02:26+00:00', 'kfp_run_id': '1ead3b7e-ad61-44fe-b7c1-acd2381840e6'})
Job(job_id='garak-job-823012c0-3727-42c3-aa6a-b81c12adb20c', status='in_progress', metadata={'created_at': '2026-02-17T23:02:26+00:00', 'kfp_run_id': '1ead3b7e-ad61-44fe-b7c1-acd2381840e6'})
Job(job_id='garak-job-823012c0-3727-42c3-aa6a-b81c12adb20c', status='in_progress', metadata={'created_at': '2026-02-17T23:02:26+00:00', 'kfp_run_id': '1ead3b7e-ad61-44fe-b7c1-acd2381840e6'})
Job(job_id='garak-job-823012c0-3727-42c3-aa6a-b81c12adb20c', status='in_progress', metadata={'created_at': '2026-02-17T23:02:26+00:00', 'kfp_run_id': '1ead3b7e-ad61-44fe-b7c1-acd2381840e6'})
Job(job_id='garak-job-823012c0-3727-42c3-aa6a-b81c12adb20c', status='in_progress', metadata={'created_at': '2026-02-17T23:02:26+00:00', 'kfp_run_id': '1ead3b7e-ad61-44fe-b7c1-acd2381840e6'})
Job(job_id='garak-job-823012c0-3727-42c3-aa6a
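
The `wait_for_job` helper polls until a terminal status is seen, so it never returns if a job gets stuck. A minimal sketch of the same loop with a deadline follows; `poll_until_done` and its parameters are hypothetical names, and the status source is passed in as a callable so the logic stays independent of the client.

```python
import time

# Terminal states, matching the ones checked by wait_for_job in this notebook.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_done(get_status, poll_interval=20.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(poll_interval)


# Simulated status sequence standing in for real status calls.
fake_states = iter(["scheduled", "in_progress", "completed"])
print(poll_until_done(lambda: next(fake_states), poll_interval=0, timeout=5))
```

With the real client, `get_status` could be a lambda that calls `client.alpha.eval.jobs.status(...)` as in the cell above and returns its `.status` field.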

In [9]:
job_result = client.alpha.eval.jobs.retrieve(
    job_id=job.job_id,
    benchmark_id=benchmark_id,
)

print(f"generations: {len(job_result.generations)}")
print(f"score entries: {len(job_result.scores)}")


generations: 1088
score entries: 13


## 5) Understand Results: Probe Scores + `_overall`


`job_result.scores` contains:

- one entry per probe
- a synthetic `_overall` aggregate entry

`_overall.aggregated_results` is the first place to look for high-level posture.

In [None]:
aggregated = {k: v.aggregated_results for k, v in job_result.scores.items()}
overall = aggregated.get("_overall", {})

print("_overall aggregated_results:")
pprint(overall)

# Pick a random probe entry, excluding the synthetic "_overall" aggregate
probe_names = [name for name in aggregated if name != "_overall"]
if probe_names:
    example_probe = random.choice(probe_names)
    print(f"\nExample probe ({example_probe}) aggregated_results:")
    pprint(aggregated[example_probe])


_overall aggregated_results:



Example probe (misleading.FalseAssertion) aggregated_results:


Each key is a Garak probe name. Its value contains metrics such as `attack_success_rate`, along with `avid_taxonomy` information that indicates which specific model behavior was assessed.
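
To see which probes the model is most vulnerable to, the per-probe entries can be sorted by `attack_success_rate`. This is a minimal sketch that assumes each `aggregated_results` value is a plain dict; the helper name `rank_probes`, the probe names, and the numbers in `sample` are made up for illustration and are not output from a real scan.

```python
def rank_probes(aggregated, metric="attack_success_rate"):
    """Return (probe_name, metric_value) pairs, highest value first."""
    rows = [
        (name, results[metric])
        for name, results in aggregated.items()
        # Skip the synthetic aggregate and any entry lacking the metric.
        if name != "_overall" and isinstance(results, dict) and metric in results
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical values, shaped like the `aggregated` dict built above.
sample = {
    "_overall": {"tbsa": 3.0},
    "probe.A": {"attack_success_rate": 5.0},
    "probe.B": {"attack_success_rate": 35.0},
}

for name, rate in rank_probes(sample):
    print(f"{name:<10} {rate}")
```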

In [None]:
pprint(aggregated)

### TBSA Explained

TBSA (Tier-Based Security Aggregate) is a 1.0 to 5.0 score where higher is better.

How it works (high-level):

- Garak internally grades each probe:detector outcome on DEFCON-like 1-5 levels using pass-rate and z-score
- pass-rate DEFCON and z-score DEFCON are combined conservatively using `min(...)`
- probe:detector aggregates are grouped by tier
- tier-level aggregates are combined with weighted averaging (typically Tier1:Tier2 = 2:1)
- final score is rounded to 1 decimal place
- `version_probe_hash` is the same across runs that use the same probes. Use it to confirm that two runs are directly comparable before comparing their scores.

This gives more signal than plain pass/fail while still preserving severe failures.
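
The aggregation steps above can be sketched in a few lines. This illustrates the description only and is not Garak's implementation: the plain mean within each tier, the function name `tbsa_sketch`, and the `(tier, pass_rate_defcon, zscore_defcon)` input tuples are all assumptions.

```python
from statistics import mean


def tbsa_sketch(outcomes, tier_weights=None):
    """Combine per probe:detector DEFCON levels into a single 1.0-5.0 score.

    outcomes: iterable of (tier, pass_rate_defcon, zscore_defcon) tuples.
    """
    if tier_weights is None:
        tier_weights = {1: 2.0, 2: 1.0}  # Tier1:Tier2 = 2:1, as described above

    by_tier = {}
    for tier, pass_defcon, z_defcon in outcomes:
        # Conservative combination: keep the worse (lower) of the two levels.
        by_tier.setdefault(tier, []).append(min(pass_defcon, z_defcon))

    # Assumption: each tier is summarized by a simple mean.
    tier_means = {tier: mean(levels) for tier, levels in by_tier.items()}

    total_weight = sum(tier_weights[tier] for tier in tier_means)
    weighted_sum = sum(tier_weights[tier] * value for tier, value in tier_means.items())
    return round(weighted_sum / total_weight, 1)


# Hypothetical inputs: two Tier 1 pairs and one Tier 2 pair.
example = [(1, 5, 4), (1, 3, 5), (2, 2, 4)]
print(tbsa_sketch(example))  # 3.0
```

In this example Tier 1 averages 3.5 and Tier 2 averages 2, so the weighted result is (2 × 3.5 + 1 × 2) / 3 = 3.0.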


## 6) Artifacts and Job Metadata


When a job completes, metadata includes artifact file IDs such as:

- `scan.log`: Detailed log of this scan.
- `scan.report.jsonl`: Report containing information about each attempt (prompt) of each garak probe.
- `scan.hitlog.jsonl`: Report containing only the information about attempts that the model was found vulnerable to.
- `scan.avid.jsonl`: AVID (AI Vulnerability Database) format of `scan.report.jsonl`. You can find info about AVID [here](https://avidml.org/).
- `scan.report.html`: Visual representation of the scan. This is logged as an HTML artifact of the pipeline.


In remote mode, keys are prefixed with `"{job_id}_..."`.
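
A small lookup helper can hide the difference between prefixed and unprefixed keys. This is a minimal sketch that assumes inline mode stores the bare file name (for example `scan.log`) as the key; `artifact_file_id` is a hypothetical helper, and the metadata values below are placeholders rather than real file IDs.

```python
def artifact_file_id(metadata, job_id, name):
    """Return the file ID for an artifact, trying the remote-style key first."""
    for key in (f"{job_id}_{name}", name):
        if key in metadata:
            return metadata[key]
    raise KeyError(f"artifact '{name}' not found in job metadata")


# Placeholder metadata illustrating both key styles.
remote_metadata = {"job-123_scan.log": "file-abc"}
inline_metadata = {"scan.log": "file-xyz"}

print(artifact_file_id(remote_metadata, "job-123", "scan.log"))
print(artifact_file_id(inline_metadata, "job-123", "scan.log"))
```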

In [22]:
def find_scan_artifacts(metadata: dict):
    keys = [k for k in metadata.keys() if "scan" in k]
    return sorted(keys)

print("Artifact keys in last status metadata:")
pprint(find_scan_artifacts(status.metadata or {}))


Artifact keys in last status metadata:


You can retrieve the file details and the actual content of any of these files, as shown below.

In [25]:
scan_log = client.files.retrieve(status.metadata[f'{job.job_id}_scan.log'])
pprint(scan_log)

In [26]:
scan_log_content = client.files.content(status.metadata[f'{job.job_id}_scan.log'])
# printing last 10 lines
scan_log_content.split('\n')[-10:]

['2026-02-17 23:41:43,081  DEBUG  probe return: <garak.probes.tap.TAPCached object at 0x7fc979a398b0> with 9 attempts',
 '2026-02-17 23:41:43,081  DEBUG  harness: run detector garak.detectors.mitigation.MitigationBypass',
 '2026-02-17 23:41:43,085  DEBUG  harness: probe list iteration completed',
 '2026-02-17 23:41:43,085  INFO  run complete, ending',
 '2026-02-17 23:41:43,261  INFO  garak run complete in 2281.40s',
 '2026-02-17 23:41:43,265  DEBUG  close.started',
 '2026-02-17 23:41:43,265  DEBUG  close.complete',
 '2026-02-17 23:41:43,564  DEBUG  close.started',
 '2026-02-17 23:41:43,564  DEBUG  close.complete',
 '']

Let's fetch the HTML report of this scan to view the detailed results visually.

In [None]:
scan_report_content = client.files.content(status.metadata[f'{job.job_id}_scan.report.html'])

# Save the content to a file, creating the reports directory if needed
report_path = Path.cwd() / "reports" / f"{job.job_id}_scan.report.html"
report_path.parent.mkdir(parents=True, exist_ok=True)
report_path.write_text(scan_report_content, encoding="utf-8")

# Open the file in the default browser
webbrowser.open(report_path.as_uri())


True

## 7) Register a Custom Benchmark



Use `metadata.garak_config` for Garak command config.
Provider runtime knobs like `timeout`, `shield_ids`, `shield_config` remain top-level metadata keys.

Refer to [BENCHMARK_METADATA_REFERENCE.md](../BENCHMARK_METADATA_REFERENCE.md) for detailed config options.

In [None]:
custom_benchmark_id = "custom_promptinject"

client.alpha.benchmarks.register(
    benchmark_id=custom_benchmark_id,
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id=custom_benchmark_id,
    metadata={
        "garak_config": GarakCommandConfig(
            system=GarakSystemConfig(
                parallel_attempts=8,
            ),
            run=GarakRunConfig(
                soft_probe_prompt_cap=100,
                generations=1,
                eval_threshold=0.5,
                seed=333
            ),
            plugins=GarakPluginsConfig(
                probe_spec=["promptinject"]
            ),
            reporting=GarakReportingConfig(
                taxonomy="owasp",
            )
        ),
        "timeout": 900,
    },
)

print("Registered custom benchmark")
pprint(client.alpha.benchmarks.retrieve(benchmark_id=custom_benchmark_id))


Registered custom benchmark


## 8) Update Existing Benchmarks (Deep-Merge)



You can build a tuned benchmark by setting `provider_benchmark_id` to an existing benchmark (including predefined ones).

The provider deep-merges your metadata onto the base metadata.
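
The effect of a deep merge can be illustrated with plain dictionaries. This is a generic sketch, not the provider's code: `deep_merge` is a hypothetical helper, and the rule that lists and scalars are replaced wholesale rather than combined is an assumption.

```python
import copy


def deep_merge(base, override):
    """Return a new dict where override is merged recursively onto base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Assumption: lists and scalars replace the base value.
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "garak_config": {
        "run": {"generations": 1, "soft_probe_prompt_cap": 100},
        "plugins": {"probe_spec": ["promptinject"]},
    },
    "timeout": 900,
}
override = {
    "garak_config": {"plugins": {"probe_spec": ["promptinject.HijackHateHumans"]}},
    "timeout": 600,
}

merged = deep_merge(base, override)
print(merged)
```

Under these assumptions the `run` settings from the base survive, while `probe_spec` and `timeout` take the overriding values.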

In [None]:
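updated_benchmark_id = "promptinject_hijackhatehumans"

# Remove any earlier registration so this cell can be re-run.
# Assumption: unregistering an unknown benchmark raises, so that error is ignored.
try:
    client.alpha.benchmarks.unregister(benchmark_id=updated_benchmark_id)
except Exception:
    pass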
client.alpha.benchmarks.register(
    benchmark_id=updated_benchmark_id,
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id=custom_benchmark_id,
    metadata={
        "garak_config": GarakCommandConfig(
            plugins=GarakPluginsConfig(
                probe_spec=["promptinject.HijackHateHumans"],
            )
        ),
        "timeout": 600,
    },
)

print("Registered deep-merged benchmark")
pprint(client.alpha.benchmarks.retrieve(benchmark_id=updated_benchmark_id))


Registered deep-merged benchmark


Let's run this benchmark.

In [34]:
pi_job = client.alpha.eval.run_eval(
    benchmark_id=updated_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {
                "max_tokens": 100,
            },
        }
    },
)

print("Started job:")
pprint(pi_job)

Started job:


In [35]:
pi_status = wait_for_job(client, benchmark_id=updated_benchmark_id, job_id=pi_job.job_id, poll_interval=30)
print(pi_status)


Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8819ad4e3076', status='scheduled', metadata={'created_at': '2026-02-18T00:16:25+00:00', 'kfp_run_id': '3a992f0e-d62b-4551-a685-9a34b576ad76'})
Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8819ad4e3076', status='in_progress', metadata={'created_at': '2026-02-18T00:16:25+00:00', 'kfp_run_id': '3a992f0e-d62b-4551-a685-9a34b576ad76'})
Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8819ad4e3076', status='in_progress', metadata={'created_at': '2026-02-18T00:16:25+00:00', 'kfp_run_id': '3a992f0e-d62b-4551-a685-9a34b576ad76'})
Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8819ad4e3076', status='in_progress', metadata={'created_at': '2026-02-18T00:16:25+00:00', 'kfp_run_id': '3a992f0e-d62b-4551-a685-9a34b576ad76'})
Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8819ad4e3076', status='in_progress', metadata={'created_at': '2026-02-18T00:16:25+00:00', 'kfp_run_id': '3a992f0e-d62b-4551-a685-9a34b576ad76'})
Job(job_id='garak-job-7f69fb7d-5328-4eae-886b-8

In [36]:
pi_job_result = client.alpha.eval.jobs.retrieve(
    job_id=pi_job.job_id,
    benchmark_id=updated_benchmark_id,
)

In [40]:
example_probe = "promptinject.HijackHateHumans" # let's use this probe for the example

In [41]:
aggregated = {k: v.aggregated_results for k, v in pi_job_result.scores.items()}

pprint(aggregated[example_probe])

Our model seems to be vulnerable to prompt injections 😥. Let's apply an input shield to limit this type of attack and check whether it works.

## 9) Shield Testing


Two patterns are supported:

- `shield_ids`: a list of shield IDs, all treated as input shields
- `shield_config`: explicit `input`/`output` mapping
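
The relationship between the two patterns can be sketched as a normalization step. This illustrates the description above and is not the provider's code: `normalize_shields` is a hypothetical helper, the `input` and `output` values are assumed to be lists of shield IDs, and letting `shield_config` take precedence when both keys are present is also an assumption.

```python
def normalize_shields(metadata):
    """Express either shield pattern as an explicit input/output mapping."""
    config = metadata.get("shield_config")
    if config is not None:
        # Explicit mapping: keep input and output shields separate.
        return {
            "input": list(config.get("input", [])),
            "output": list(config.get("output", [])),
        }
    # shield_ids: every listed shield is treated as an input shield.
    return {"input": list(metadata.get("shield_ids", [])), "output": []}


print(normalize_shields({"shield_ids": ["llama-guard"]}))
print(normalize_shields({"shield_config": {"input": ["llama-guard"], "output": ["llama-guard"]}}))
```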

In [20]:
shield_ids_benchmark_id = "promptinject_inp_shield"

# let's use previous benchmark as a base
client.alpha.benchmarks.register(
    benchmark_id=shield_ids_benchmark_id,
    dataset_id="garak",
    scoring_functions=["garak_scoring"],
    provider_id=garak_provider_id,
    provider_benchmark_id=updated_benchmark_id,
    metadata={
        "shield_ids": ["llama-guard"],
        "timeout": 600,
    },
)

print("Registered shield_ids benchmark")
pprint(client.alpha.benchmarks.retrieve(benchmark_id=shield_ids_benchmark_id))


Registered shield_ids benchmark




In [42]:
in_shield_job = client.alpha.eval.run_eval(
    benchmark_id=shield_ids_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {
                "max_tokens": 100,
            },
        }
    },
)

print("Started job:")
pprint(in_shield_job)

Started job:


In [43]:
in_shield_status = wait_for_job(client, benchmark_id=shield_ids_benchmark_id, job_id=in_shield_job.job_id, poll_interval=30)
print(in_shield_status)


Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a2ec2e5f8bd3', status='scheduled', metadata={'created_at': '2026-02-18T00:44:07+00:00', 'kfp_run_id': '67ad024f-eefe-4475-9bdc-37cb5cd3c0b9'})
Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a2ec2e5f8bd3', status='in_progress', metadata={'created_at': '2026-02-18T00:44:07+00:00', 'kfp_run_id': '67ad024f-eefe-4475-9bdc-37cb5cd3c0b9'})
Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a2ec2e5f8bd3', status='in_progress', metadata={'created_at': '2026-02-18T00:44:07+00:00', 'kfp_run_id': '67ad024f-eefe-4475-9bdc-37cb5cd3c0b9'})
Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a2ec2e5f8bd3', status='in_progress', metadata={'created_at': '2026-02-18T00:44:07+00:00', 'kfp_run_id': '67ad024f-eefe-4475-9bdc-37cb5cd3c0b9'})
Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a2ec2e5f8bd3', status='in_progress', metadata={'created_at': '2026-02-18T00:44:07+00:00', 'kfp_run_id': '67ad024f-eefe-4475-9bdc-37cb5cd3c0b9'})
Job(job_id='garak-job-bfae4d2a-3291-403f-b6b3-a

In [44]:
in_shield_job_result = client.alpha.eval.jobs.retrieve(
    job_id=in_shield_job.job_id,
    benchmark_id=shield_ids_benchmark_id,
)

In [45]:
in_shield_aggregated = {k: v.aggregated_results for k, v in in_shield_job_result.scores.items()}

pprint(in_shield_aggregated[example_probe])

The shield caught all the prompt injections \o/.