# Adaptive Evaluations with Scorebook - Evaluating an OpenAI GPT Model

This quick-start guide showcases an adaptive evaluation of OpenAI's GPT-4o Mini model.

If you have not done so already, we recommend reading our [getting started quick-start guide]() first for a more detailed overview of adaptive testing and of setting up Trismik credentials.

## Prerequisites

- **Trismik API key**: Generate a Trismik API key from the [Trismik dashboard's settings page](https://app.trismik.com/settings).
- **OpenAI API key**: Generate an OpenAI API key from [OpenAI's API Platform](https://openai.com/api/).

## Set Up Trismik and OpenAI Credentials

In [None]:
# Set your credentials here
TRISMIK_API_KEY = "your-trismik-api-key-here"
OPENAI_API_KEY = "your-openai-api-key-here"
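
Hardcoding keys in a notebook makes them easy to leak when the file is shared. As an alternative, the sketch below reads both keys from environment variables and falls back to the placeholders used above. The helper `read_key` and the variable name `TRISMIK_API_KEY` are assumptions made for illustration, not names defined by Scorebook.

```python
import os


def read_key(env_var: str, placeholder: str) -> str:
    """Return the value of env_var if it is set, otherwise the placeholder."""
    return os.environ.get(env_var, placeholder)


# Environment variable names are assumed; adjust them to match your setup.
TRISMIK_API_KEY = read_key("TRISMIK_API_KEY", "your-trismik-api-key-here")
OPENAI_API_KEY = read_key("OPENAI_API_KEY", "your-openai-api-key-here")
```

With this variant, export the two variables in your shell before starting the notebook, and the rest of the guide runs unchanged.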

## Log In with a Trismik API Key and Create a Project

In [None]:
from scorebook import create_project, login

# Login to Trismik
login(TRISMIK_API_KEY)
print("✓ Logged in to Trismik")

# Create a project
project = create_project(
    name="Adaptive Evaluation Demo - GPT-4o Mini",
    description="A project created as part of Trismik's quick-start guides.",
)

print("✓ Project created")
print(f"Project ID: {project.id}")

## Define an Inference Function

In [None]:
from openai import OpenAI
from typing import Any, List
import string

client = OpenAI(api_key=OPENAI_API_KEY)

def openai_inference(inputs: List[Any], **hyperparameters: Any) -> List[Any]:
    """Send each multiple-choice item to OpenAI's API and return one answer string per item."""

    outputs = []
    for input_item in inputs:

        # Format prompt
        choices = input_item.get("options", [])
        prompt = (
            str(input_item.get("question", ""))
            + "\nOptions:\n"
            + "\n".join(
                f"{letter}: {choice['text'] if isinstance(choice, dict) else choice}"
                for letter, choice in zip(string.ascii_uppercase, choices)
            )
        )

        # Build messages for OpenAI API
        messages = [
            {
                "role": "system",
                "content": hyperparameters["system_message"]
            },
            {"role": "user", "content": prompt},
        ]

        # Call OpenAI API and extract output from the response
        try:
            response = client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                # Defaults to 0.7; can be overridden via the hyperparameters
                temperature=hyperparameters.get("temperature", 0.7),
            )
            # message.content may be None, so fall back to an empty string
            output = (response.choices[0].message.content or "").strip()

        except Exception as e:
            output = f"Error: {str(e)}"

        outputs.append(output)

    return outputs
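
To see what the model actually receives, the prompt-formatting step can be pulled out and run on its own. The sketch below mirrors the formatting logic in `openai_inference`. The helper name `format_prompt` and the sample item are made up for illustration, and the item only assumes the `question` and `options` keys that the function above reads.

```python
import string
from typing import Any, Dict


def format_prompt(input_item: Dict[str, Any]) -> str:
    """Build the multiple-choice prompt the same way openai_inference does."""
    choices = input_item.get("options", [])
    option_lines = "\n".join(
        # Options may be plain strings or dicts with a "text" field
        f"{letter}: {choice['text'] if isinstance(choice, dict) else choice}"
        for letter, choice in zip(string.ascii_uppercase, choices)
    )
    return str(input_item.get("question", "")) + "\nOptions:\n" + option_lines


# Hypothetical item, not taken from the dataset
sample_item = {
    "question": "Which is heavier?",
    "options": ["A feather", {"text": "A brick"}],
}
print(format_prompt(sample_item))
```

Each option is labeled with a consecutive uppercase letter, which is why the system message can ask the model to reply with a single letter.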

## Run an Adaptive Evaluation

If your inference function is asynchronous, use Scorebook's `evaluate_async` function instead. `evaluate` and `evaluate_async` behave identically, except that `evaluate` is synchronous and `evaluate_async` is awaitable.
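
As a rough sketch of what an asynchronous inference function could look like, the example below keeps the same `inputs` and `**hyperparameters` signature as the synchronous version and runs the per-item calls concurrently. `fake_completion` is a hypothetical stand-in for a real awaitable model call, and the assumption that `evaluate_async` accepts a coroutine function of this shape should be checked against the Scorebook documentation.

```python
import asyncio
from typing import Any, List


async def fake_completion(prompt: str, system_message: str) -> str:
    """Hypothetical stand-in for an awaitable model call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return "A"


async def async_inference(inputs: List[Any], **hyperparameters: Any) -> List[Any]:
    """Answer all items concurrently and return outputs in input order."""
    system_message = hyperparameters.get("system_message", "")
    tasks = [
        fake_completion(str(item.get("question", "")), system_message)
        for item in inputs
    ]
    # asyncio.gather returns results in the same order as the tasks
    return list(await asyncio.gather(*tasks))
```

In a real notebook, the stand-in would be replaced by an awaitable client call, and the function would be passed to `evaluate_async` in place of `openai_inference`.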

In [None]:
from scorebook import evaluate

# Run adaptive evaluation
results = evaluate(
    inference=openai_inference,
    datasets="trismik/figQA:adaptive",
    hyperparameters={"system_message": "Answer the question with only the letter of the correct option. No additional text or context."},
    split="validation",
    experiment_id="Adaptive-Common-Sense-QA-Notebook",
    project_id=project.id,
)

# Print the adaptive evaluation results
print("✓ Adaptive evaluation complete!")
print("Results:", results[0]["score"])

## Next Steps