# Running Evaluations

This notebook shows how to run the evaluation framework against the agent API server.

## Setup

Start the evaluation target, which is the agent API server:
```
./opschat.sh run
```
Find the FastAPI host information in the output:
```
FastAPI: http://10.128.135.97:<PORT-NUMBER> or http://localhost:<PORT-NUMBER>
```

Copy the API host information to your `.env` file:
```
API_BASE=http://localhost:<PORT-NUMBER>
```
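
Before running an evaluation, it can help to confirm that `API_BASE` was copied correctly and that the `<PORT-NUMBER>` placeholder was replaced. The following is a minimal sketch using only the standard library. It assumes the `.env` file uses plain `KEY=VALUE` lines; the helper names `read_env_value` and `is_valid_api_base` are hypothetical and not part of the evaluation framework, which may load the file differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def read_env_value(env_path, key):
    """Return the value of `key` from a simple KEY=VALUE file, or None if absent."""
    for raw_line in Path(env_path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None


def is_valid_api_base(value):
    """Check that the value looks like an http(s) URL with a numeric port."""
    if not value:
        return False
    try:
        parsed = urlparse(value)
        port = parsed.port  # raises ValueError when the port is not numeric
    except ValueError:
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.hostname) and port is not None
```

Calling `is_valid_api_base(read_env_value(".env", "API_BASE"))` returns `False` if the key is missing or the placeholder is still in place.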

## Test Configuration

Test settings are stored in YAML files in the `/eval-config` directory.
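
To see which configurations are available, the directory can be listed from Python. This is a small sketch rather than part of the framework: `list_eval_configs` is a hypothetical helper, and the path passed to it is an assumption that depends on where the notebook is run from.

```python
from pathlib import Path


def list_eval_configs(config_dir):
    """Return the sorted names of YAML files directly inside config_dir."""
    directory = Path(config_dir)
    if not directory.is_dir():
        # A missing directory yields an empty list instead of an exception.
        return []
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in {".yaml", ".yml"}
    )
```

For example, `list_eval_configs("../eval-config")` would list the configuration names that can be passed to the evaluation script, assuming the directory sits one level above the notebook.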

## Executing Evaluations

### Manual script execution

The script takes the name of the configuration file to load as an argument and looks for it in the `/eval-config` directory.


```
!python3 ../evals/evaluate_agent.py golden-questions.yaml
```

### Python code execution in notebook

This is the recommended method for developers to run the process.

#### Setup - run once

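```
import json
import os
import sys

import nest_asyncio

# Allow nested event loops so async code can run inside the notebook's own loop.
nest_asyncio.apply()

# Add the parent directory to sys.path so the `evals` package can be imported.
lib_path = os.path.abspath(os.path.join(os.path.curdir, ".."))
if lib_path not in sys.path:
    sys.path.insert(0, lib_path)

from evals.evaluate_agent import run_llm_test
```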

#### Running an eval configuration

```
result = run_llm_test("golden-questions.yaml")
print(">>> DONE:", json.dumps(result, indent=2))
```

## Evaluation Output

The process writes the LLM responses and the corresponding metric scores to disk as JSON files in the following directories:
- `/eval-data/questions`
- `/eval-data/responses`
- `/eval-data/scores`
- `/eval-data/summary`
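
The output files can be loaded back into the notebook for inspection. The sketch below assumes each directory contains files ending in `.json`; the file names and the structure of their contents are not documented here, so `load_json_outputs` (a hypothetical helper) simply returns whatever each file holds.

```python
import json
from pathlib import Path


def load_json_outputs(output_dir):
    """Map each *.json file name in output_dir to its parsed content."""
    results = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results
```

For example, `load_json_outputs("../eval-data/scores")` would return a dictionary keyed by file name, assuming the data directory sits one level above the notebook.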