## Setup

Import this notebook into your OpenShift AI Workbench. To run it locally instead, expose your Llama Stack distribution with a `Route` or use port-forwarding, then set `BASE_URL` below to the resulting address.
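When you switch between the Workbench and a local machine, it can help to read the base URL from an environment variable instead of editing the client cell each time. The sketch below is a minimal example. It assumes a hypothetical `LLAMA_STACK_BASE_URL` variable, which you would set to something like `http://localhost:8321` when port-forwarding or to your `Route` URL. It falls back to the in-cluster address used in this notebook.

```python
import os

# In-cluster service address used in this notebook (update for your deployment)
IN_CLUSTER_URL = "http://lsd-garak-example-service.model-namespace.svc.cluster.local:8321"


def resolve_base_url(env=None):
    """Return the Llama Stack base URL.

    LLAMA_STACK_BASE_URL is a hypothetical variable name chosen for this sketch.
    Any trailing slash is stripped so the URL can be passed straight to the client.
    """
    env = os.environ if env is None else env
    return env.get("LLAMA_STACK_BASE_URL", IN_CLUSTER_URL).rstrip("/")


BASE_URL = resolve_base_url()
```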

In [1]:
# Install these packages if they are not already available in your environment
# !pip install llama-stack-client rich

In [2]:
import time
from rich.pretty import pprint
import logging

logging.getLogger("httpx").setLevel(logging.WARNING)

In [None]:
BASE_URL = "http://lsd-garak-example-service.model-namespace.svc.cluster.local:8321" # update this with your service name, namespace and cluster local address
def create_http_client():
    from llama_stack_client import LlamaStackClient
    return LlamaStackClient(base_url=BASE_URL)

client = create_http_client()

In [None]:
# List all providers registered with the TrustyAI Llama Stack distribution; this demo uses the `trustyai_garak` provider
client.providers.list()

In [4]:
## let's find our remote trustyai_garak provider
remote_garak_provider = None
for provider in client.providers.list():
    if provider.provider_type == "remote::trustyai_garak":
        remote_garak_provider = provider
        break

assert remote_garak_provider is not None, "Could not find remote::trustyai_garak provider"

remote_garak_provider_id = remote_garak_provider.provider_id



In [5]:
models = client.models.list()

## keep only the models of type `llm`
llm_models = [model for model in models if model.api_model_type == "llm"]
pprint(llm_models)


In [6]:
## select the LLM to use for this demo
model_id = llm_models[0].identifier  # e.g. 'vllm-inference/Granite-3.3-8B-Instruct'

In [7]:
### utility to get summary metrics of a scan
def get_summary_metrics(aggregated_scores: dict):
    summary_metrics = {"total_attempts": 0, "vulnerable_responses": 0, "attack_success_rate": 0}
    for aggregated_results in aggregated_scores.values():
        summary_metrics["total_attempts"] += aggregated_results["total_attempts"]
        summary_metrics["vulnerable_responses"] += aggregated_results["vulnerable_responses"]
    if summary_metrics["total_attempts"] > 0:
        summary_metrics["attack_success_rate"] = round(
            summary_metrics["vulnerable_responses"] / summary_metrics["total_attempts"] * 100, 2
        )

    return summary_metrics
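To see what the helper computes, here is a worked example on a hand-made `aggregated_scores` dict. The probe names and counts are made up for illustration. The helper reads only the `total_attempts` and `vulnerable_responses` keys.

```python
def get_summary_metrics(aggregated_scores: dict):
    summary_metrics = {"total_attempts": 0, "vulnerable_responses": 0, "attack_success_rate": 0}
    for aggregated_results in aggregated_scores.values():
        summary_metrics["total_attempts"] += aggregated_results["total_attempts"]
        summary_metrics["vulnerable_responses"] += aggregated_results["vulnerable_responses"]
    total = summary_metrics["total_attempts"]
    if total > 0:
        # pooled rate: all vulnerable responses over all attempts, as a percentage
        summary_metrics["attack_success_rate"] = round(summary_metrics["vulnerable_responses"] / total * 100, 2)
    return summary_metrics


# Hypothetical aggregated results for two probes
sample_scores = {
    "probe_a.Example": {"total_attempts": 40, "vulnerable_responses": 10},
    "probe_b.Example": {"total_attempts": 60, "vulnerable_responses": 5},
}
print(get_summary_metrics(sample_scores))
# {'total_attempts': 100, 'vulnerable_responses': 15, 'attack_success_rate': 15.0}
```

The overall rate is pooled across probes (15 of 100 attempts), not the mean of the per-probe rates (25% and about 8.33%, which would average to about 16.67%). Probes with more attempts therefore carry more weight.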

## Pre-defined Benchmarks

The Garak remote provider ships with pre-defined benchmarks that you can use right away to assess your LLM for vulnerabilities.

In [8]:
pprint(client.benchmarks.list())
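Because `client.benchmarks.list()` returns benchmarks from every provider, you may want to keep only the ones served by the Garak provider. The sketch below assumes each benchmark object exposes `identifier` and `provider_id` attributes. It uses `SimpleNamespace` stand-ins with made-up IDs so it runs without a cluster. In the notebook, you would pass `client.benchmarks.list()` and `remote_garak_provider_id` instead.

```python
from types import SimpleNamespace


def garak_benchmark_ids(benchmarks, provider_id):
    """Return the identifiers of benchmarks registered under the given provider."""
    return [b.identifier for b in benchmarks if b.provider_id == provider_id]


# Stand-in objects; real ones come from client.benchmarks.list()
sample = [
    SimpleNamespace(identifier="trustyai_garak::avid_performance", provider_id="trustyai_garak"),
    SimpleNamespace(identifier="some_other_benchmark", provider_id="other_provider"),
]
print(garak_benchmark_ids(sample, "trustyai_garak"))
```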

Let's run one of the pre-defined benchmarks.

In [9]:
pre_defined_benchmark_id = "trustyai_garak::avid_performance"

In [10]:
job = client.alpha.eval.run_eval(
    benchmark_id=pre_defined_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)

print(f"Starting job '{job}'")

Head to your KFP/DSP UI to see the created pipeline. It should look similar to the following:

<p align="center">
<img src="pics/kfp-start.png" alt="kfp-start"/>
</p>

Here's the pipeline DAG:

<p align="center">
<img src="pics/kfp-dag.png" alt="kfp-dag" width="700"/>
</p>

Once the pipeline completes, click the output artifacts of the `parse-results` task to see the summary metrics and the HTML report.


This pipeline took about 8 minutes for me, but your timing may vary depending on model size, latency, `max_tokens`, and other factors. Let’s poll the status of the pipeline.


In [11]:
poll_interval = 20

def get_job_status(job_id, benchmark_id):
    return client.alpha.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)

while True:
    job = get_job_status(job_id=job.job_id, benchmark_id=pre_defined_benchmark_id)
    print(job)

    if job.status in ['failed', 'completed', 'cancelled']:
        print("="*100)
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(poll_interval)

Job(job_id='garak-job-45ee7c44-0870-4a34-920a-0bc4395e91fe', status='in_progress', metadata={'created_at': '2025-12-10T22:22:49+00:00', 'kfp_run_id': '8ddcc049-cda6-4de2-aa0c-bb23666ba311'})
Job(job_id='garak-job-45ee7c44-0870-4a34-920a-0bc4395e91fe', status='in_progress', metadata={'created_at': '2025-12-10T22:22:49+00:00', 'kfp_run_id': '8ddcc049-cda6-4de2-aa0c-bb23666ba311'})
Job(job_id='garak-job-45ee7c44-0870-4a34-920a-0bc4395e91fe', status='in_progress', metadata={'created_at': '2025-12-10T22:22:49+00:00', 'kfp_run_id': '8ddcc049-cda6-4de2-aa0c-bb23666ba311'})
Job(job_id='garak-job-45ee7c44-0870-4a34-920a-0bc4395e91fe', status='in_progress', metadata={'created_at': '2025-12-10T22:22:49+00:00', 'kfp_run_id': '8ddcc049-cda6-4de2-aa0c-bb23666ba311'})
Job(job_id='garak-job-45ee7c44-0870-4a34-920a-0bc4395e91fe', status='in_progress', metadata={'created_at': '2025-12-10T22:22:49+00:00', 'kfp_run_id': '8ddcc049-cda6-4de2-aa0c-bb23666ba311'})
Job(job_id='garak-job-45ee7c44-0870-4a34-920a
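The loop above polls until the job reaches a terminal state, with no upper bound on how long it waits. If you prefer a deadline, the following is a minimal sketch of a reusable helper. `wait_for_job` and its `timeout` parameter are hypothetical names, not part of the Llama Stack client. The helper accepts any zero-argument callable that returns an object with a `status` attribute, such as a lambda wrapping `get_job_status`.

```python
import time

TERMINAL_STATES = {"failed", "completed", "cancelled"}


def wait_for_job(fetch_status, poll_interval=20, timeout=3 * 60 * 60,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until the job reaches a terminal state.

    Raises TimeoutError if `timeout` seconds pass first. `sleep` and `clock`
    are injectable so the helper can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job.status in TERMINAL_STATES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job still '{job.status}' after {timeout} seconds")
        sleep(poll_interval)


# Usage in this notebook (hypothetical):
# job_id = job.job_id
# job = wait_for_job(lambda: get_job_status(job_id=job_id, benchmark_id=pre_defined_benchmark_id))
```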

The completed pipeline logs its metrics, which can be viewed from the run list screen as shown below.

<p align="center">
<img src="pics/kfp-metrics.png" alt="metrics"/>
</p>

Let's retrieve the detailed results of this scan.

In [14]:
job_result = client.alpha.eval.jobs.retrieve(job_id=job.job_id, benchmark_id=pre_defined_benchmark_id)
scores = job_result.scores

Each key represents a Garak probe name, and the corresponding value contains metrics such as `attack_success_rate`, along with `avid_taxonomy` information to help you understand which specific model behavior was assessed.
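To see which probes the model is most vulnerable to, you can sort the aggregated results by `attack_success_rate`. The sketch below is a minimal example. It assumes each entry of `aggregated_scores` is a dict with a numeric `attack_success_rate` key. The probe names and values are made up for illustration.

```python
def rank_probes(aggregated_scores: dict, top_n: int = 5):
    """Return (probe_name, attack_success_rate) pairs, highest rate first."""
    ranked = sorted(
        ((probe, results.get("attack_success_rate", 0)) for probe, results in aggregated_scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical per-probe results
sample_scores = {
    "probe_a.Example": {"attack_success_rate": 12.5},
    "probe_b.Example": {"attack_success_rate": 40.0},
    "probe_c.Example": {"attack_success_rate": 0.0},
}
print(rank_probes(sample_scores, top_n=2))
# [('probe_b.Example', 40.0), ('probe_a.Example', 12.5)]
```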

In [15]:
aggregated_scores = {k: v.aggregated_results for k, v in scores.items()}
pprint(aggregated_scores)

In [16]:
## let's calculate the summary metrics of this scan
pprint(get_summary_metrics(aggregated_scores))

In [17]:
pprint(job.metadata)

The following scan-related files are saved, and their file IDs are recorded in `job.metadata`:

- `scan.log`: Detailed log of this scan.
- `scan.report.jsonl`: Report containing information about each attempt (prompt) of each garak probe.
- `scan.hitlog.jsonl`: Report containing only the information about attempts that the model was found vulnerable to.
- `scan.avid.jsonl`: AVID (AI Vulnerability Database) format of `scan.report.jsonl`. See the [AVID website](https://avidml.org/) for more information.
- `scan.report.html`: Visual representation of the scan, also logged as an HTML artifact of the pipeline.
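The retrieval cells look up file IDs in `job.metadata` using keys of the form `f'{job.job_id}_scan.log'`. Assuming every scan file follows that `<job_id>_<filename>` pattern, a small helper can collect all of them at once. `scan_file_ids` is a hypothetical name, and the job ID and file IDs in the sample metadata are placeholders, not real values.

```python
def scan_file_ids(metadata: dict, job_id: str) -> dict:
    """Map scan file names (e.g. 'scan.log') to the file IDs stored in job metadata."""
    prefix = f"{job_id}_"
    return {key[len(prefix):]: value for key, value in metadata.items() if key.startswith(prefix)}


# Placeholder metadata; non-file keys such as created_at are ignored
sample_metadata = {
    "created_at": "2025-12-10T22:22:49+00:00",
    "kfp_run_id": "8ddcc049-cda6-4de2-aa0c-bb23666ba311",
    "garak-job-123_scan.log": "file-placeholder-1",
    "garak-job-123_scan.hitlog.jsonl": "file-placeholder-2",
}
print(scan_file_ids(sample_metadata, "garak-job-123"))
# {'scan.log': 'file-placeholder-1', 'scan.hitlog.jsonl': 'file-placeholder-2'}
```

In the notebook, `scan_file_ids(job.metadata, job.job_id)` would give you every file ID to pass to `client.files.retrieve` or `client.files.content`.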

You can retrieve the metadata and content of any of these files as follows:

In [19]:
scan_log = client.files.retrieve(job.metadata[f'{job.job_id}_scan.log'])
pprint(scan_log)

In [22]:
scan_log_content = client.files.content(job.metadata[f'{job.job_id}_scan.log'])
# printing last 10 lines
scan_log_content.split('\n')[-10:]

['2025-12-10 22:29:33,305  DEBUG  HTTP Response: POST http://lsd-garak-example-service.model-namespace.svc.cluster.local:8321/v1/chat/completions "200 OK" Headers({\'date\': \'Wed, 10 Dec 2025 22:29:28 GMT\', \'server\': \'uvicorn\', \'content-length\': \'1068\', \'content-type\': \'application/json\', \'x-trace-id\': \'7d24c75a01c0c28e5ef737e2d72447d2\'})',
 '2025-12-10 22:29:33,305  DEBUG  request_id: None',
 '2025-12-10 22:29:33,332  DEBUG  probe return: <garak.probes.tap.TAPCached object at 0x7f612e2d70e0> with 9 attempts',
 '2025-12-10 22:29:33,332  DEBUG  harness: run detector garak.detectors.mitigation.MitigationBypass',
 '2025-12-10 22:29:33,335  DEBUG  harness: probe list iteration completed',
 '2025-12-10 22:29:33,335  INFO  run complete, ending',
 '2025-12-10 22:29:33,378  INFO  garak run complete in 329.55s',
 '2025-12-10 22:29:33,525  DEBUG  close.started',
 '2025-12-10 22:29:33,526  DEBUG  close.complete',
 '']
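Since `client.files.content` returned the log as a string in the cell above, the `.jsonl` reports can presumably be parsed the same way, one JSON object per line. The sketch below is a minimal example and assumes the content is a string. The record fields in the sample (`probe`, `detector`) are illustrative assumptions, so check a real `scan.hitlog.jsonl` for its actual keys before relying on them.

```python
import json
from collections import Counter


def parse_jsonl(content: str) -> list:
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in content.splitlines() if line.strip()]


# Illustrative content; real records would come from client.files.content(<file id>)
sample_hitlog = (
    '{"probe": "probe_a.Example", "detector": "detector_x"}\n'
    '{"probe": "probe_a.Example", "detector": "detector_y"}\n'
    '{"probe": "probe_b.Example", "detector": "detector_x"}\n'
)
records = parse_jsonl(sample_hitlog)
hits_per_probe = Counter(record.get("probe") for record in records)
print(hits_per_probe.most_common())
# [('probe_a.Example', 2), ('probe_b.Example', 1)]
```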

For more comprehensive coverage, try the `trustyai_garak::owasp_llm_top10` benchmark, but expect it to take considerably longer to finish.

## Run Other Garak Probes

You are not limited to the pre-defined benchmarks. You can register your own benchmark using any LLM-related Garak probes listed in the [Garak probe reference](https://reference.garak.ai/en/stable/probes.html).

In [38]:
user_defined_probe_benchmark_id = "dan"

client.benchmarks.register(
    benchmark_id=user_defined_probe_benchmark_id,
    dataset_id="garak", # placeholder
    scoring_functions=["garak_scoring"], # placeholder
    provider_benchmark_id=user_defined_probe_benchmark_id,
    provider_id=remote_garak_provider_id,  # provider ID discovered in the Setup section
    metadata={
        "probes": ["dan.DanInTheWild"],
        "timeout": 60*60,  # optional, in seconds (default: 3 hours); increase it for longer scans
    }
)

In [39]:
job = client.alpha.eval.run_eval(
    benchmark_id=user_defined_probe_benchmark_id,
    benchmark_config={
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {
                "max_tokens": 100
            },
        }
    },
)

print(f"Starting job '{job}'")

Starting job 'Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='scheduled', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})'
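The same `benchmark_config` dict is passed to `client.alpha.eval.run_eval` for both scans. If you run several scans, a small builder keeps them consistent. `make_benchmark_config` is a hypothetical helper that only reproduces the structure used in this notebook.

```python
def make_benchmark_config(model_id: str, max_tokens: int = 100) -> dict:
    """Build the benchmark_config structure used with client.alpha.eval.run_eval in this notebook."""
    return {
        "eval_candidate": {
            "type": "model",
            "model": model_id,
            "sampling_params": {"max_tokens": max_tokens},
        }
    }


config = make_benchmark_config("vllm-inference/Granite-3.3-8B-Instruct")
print(config["eval_candidate"]["sampling_params"])
# {'max_tokens': 100}
```

A scan could then be started with `client.alpha.eval.run_eval(benchmark_id=user_defined_probe_benchmark_id, benchmark_config=make_benchmark_config(model_id))`.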


In [40]:
while True:
    job = get_job_status(job_id=job.job_id, benchmark_id=user_defined_probe_benchmark_id)
    print(job)

    if job.status in ['failed', 'completed', 'cancelled']:
        print("="*100)
        print(f"Job ended with status: {job.status}")
        break

    time.sleep(45)

Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='scheduled', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})
Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='in_progress', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})
Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='in_progress', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})
Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='in_progress', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})
Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6f76a2c252f3', status='in_progress', metadata={'created_at': '2025-12-11T01:09:53+00:00', 'kfp_run_id': 'fb9728b5-33af-4db1-99ca-5055a02d11d1'})
Job(job_id='garak-job-c48c23e2-b637-4c7f-8a95-6

In [41]:
job_result = client.alpha.eval.jobs.retrieve(job_id=job.job_id, benchmark_id=user_defined_probe_benchmark_id)
scores = job_result.scores

In [42]:
aggregated_scores = {k: v.aggregated_results for k, v in scores.items()}
pprint(aggregated_scores)

You can also test the effectiveness of your guardrails (shields). Pass the IDs of the shields you want to use in the benchmark metadata at registration time:

```python
benchmark_id = "pi_shield"
client.benchmarks.register(
    benchmark_id=benchmark_id,
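    dataset_id=benchmark_id,  # placeholder
    scoring_functions=["string"],  # placeholder
    provider_benchmark_id="pi_shield",
    provider_id=remote_garak_provider_id,  # provider ID discovered in the Setup section
    metadata={
        "probes": ["promptinject.HijackHateHumans"],
        "timeout": 60*10,
        "shield_ids": ["Prompt-Guard-86M"],  # shield(s) to apply during the scan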
    }
)
```

If you want to use multiple shields as input and output guardrails, pass `shield_config` instead of `shield_ids` in the metadata:

```python
metadata = {
    "probes": ["promptinject.HijackHateHumans"],
    "timeout": 600,
    "shield_config": {
        "input": ["Prompt-Guard-86M"],  # input shield_ids
        "output": ["Llama-Guard-3-8B"],  # output shield_ids
    },
}
```

If you pass both `shield_config` and `shield_ids`, only `shield_ids` is used. Once the shields are specified in the benchmark, run the scan with the same `client.alpha.eval.run_eval` call used above. Comparing the `attack_success_rate` of a shielded scan against an unshielded scan of the same probes gives you an early estimate of how well each shield works.
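To turn that comparison into numbers, you can compute the drop in `attack_success_rate` between an unshielded run and a shielded run of the same probes against the same model. The following is a minimal sketch. `shield_effect` is a hypothetical helper, and the input rates are made up, not results from this notebook. In practice, you would pass the `attack_success_rate` returned by `get_summary_metrics` for each run.

```python
def shield_effect(baseline_asr: float, shielded_asr: float) -> dict:
    """Compare attack success rates (in percent) without and with shields."""
    absolute_drop = round(baseline_asr - shielded_asr, 2)
    # relative reduction is undefined when the baseline is 0, so report 0.0 in that case
    relative_drop = round(absolute_drop / baseline_asr * 100, 2) if baseline_asr > 0 else 0.0
    return {"absolute_drop_pct_points": absolute_drop, "relative_reduction_pct": relative_drop}


print(shield_effect(baseline_asr=40.0, shielded_asr=10.0))
# {'absolute_drop_pct_points': 30.0, 'relative_reduction_pct': 75.0}
```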
