# OpenAI OSS Models as Judge with TruLens

Evaluation is central to TruLens: it is how you assess the quality of AI apps and, increasingly, AI agents.

Developers who build agents and want to assess their quality face a set of competing requirements for the judge model:

1. We need powerful, reliable LLMs to assess performance on increasingly complex tasks.
2. The models used for evaluation should not be cost-prohibitive, since evaluation consumes a large number of tokens.
3. In many situations, the models need to run on local hardware rather than behind an API.

To meet these requirements, we often have to choose between large, proprietary models and smaller, open ones.

OpenAI's release of the GPT-OSS models (20B and 120B) is an important advance in addressing these competing requirements: the models offer highly performant reasoning at a competitive price and can run on local hardware.

TruLens offers day-0 support for these models, so you can evaluate your AI agents with powerful open models such as the GPT-OSS series.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truera/trulens/blob/main/examples/expositional/models/local_and_OSS_models/openai-gpt-oss.ipynb)

## Consider a challenging groundedness evaluation

In [None]:
source_text = """
Clinical decision support (CDS) software that provides recommendations based on AI algorithms may be 
considered a medical device if it is intended to inform clinical management. 
However, for such software to be exempt from regulation, it must allow healthcare professionals to 
independently review the basis of its recommendations. The FDA does not endorse any software that acts 
as a substitute for clinical judgment or is used as the sole basis for treatment decisions.
"""

In [None]:
claim_hallucination = "The FDA’s 2023 guidance explicitly states that AI-generated diagnoses may be used as a sole basis for treatment decisions in clinical settings."

claim_grounded = "According to the FDA, clinical decision support software must enable healthcare professionals to independently review how recommendations are made, in order to be exempt from regulation."

## Evaluate using the TruLens _LiteLLM_ provider & _Ollama_

To get started, first [download Ollama](https://ollama.com/download).

Then, run `ollama run gpt-oss`.

Once the model is pulled, you can use it in TruLens!
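Before wiring the model into TruLens, it can help to confirm that the local Ollama server is running and that the model has been pulled. The following is a minimal sketch, not part of TruLens: the helper names `fetch_local_models` and `model_is_available` are hypothetical, and it assumes Ollama is listening on its default address `http://localhost:11434` and that its `/api/tags` endpoint returns a JSON object with a `models` list.

```python
import json
import urllib.request


def model_is_available(tags_payload: dict, model_name: str) -> bool:
    """Return True if a local model matches `model_name`, with or without a tag.

    `tags_payload` is assumed to have the shape {"models": [{"name": ...}, ...]}.
    """
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    return any(n == model_name or n.startswith(model_name + ":") for n in names)


def fetch_local_models(api_base: str = "http://localhost:11434") -> dict:
    """Query Ollama's /api/tags endpoint, which lists locally pulled models."""
    with urllib.request.urlopen(f"{api_base}/api/tags", timeout=5) as response:
        return json.load(response)
```

Calling `model_is_available(fetch_local_models(), "gpt-oss")` should return `True` once the pull has finished. If the server is not running, `fetch_local_models` raises an `OSError` subclass, which is a useful signal that Ollama needs to be started first.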

In [None]:
from trulens.providers.litellm import LiteLLM

ollama_provider = LiteLLM(
    model_engine="ollama/gpt-oss", api_base="http://localhost:11434"
)

In [None]:
ollama_provider.groundedness_measure_with_cot_reasons(source_text, claim_hallucination)

In [None]:
ollama_provider.groundedness_measure_with_cot_reasons(source_text, claim_grounded)
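To compare the two results side by side, the output of each call can be reduced to a short verdict. The sketch below assumes that `groundedness_measure_with_cot_reasons` returns a `(score, reasons)` tuple with a score between 0.0 (not grounded) and 1.0 (grounded). The helper `summarize_groundedness` and the 0.5 threshold are illustrative choices, not part of TruLens.

```python
from typing import Any, Dict, Tuple


def summarize_groundedness(
    result: Tuple[float, Dict[str, Any]], threshold: float = 0.5
) -> str:
    """Turn a (score, reasons) tuple into a one-line verdict.

    The threshold is an arbitrary cutoff chosen for illustration.
    """
    score, _reasons = result  # reasons hold the judge's chain-of-thought explanation
    verdict = "grounded" if score >= threshold else "not grounded"
    return f"{verdict} (score={score:.2f})"
```

For example, `summarize_groundedness(ollama_provider.groundedness_measure_with_cot_reasons(source_text, claim_hallucination))` would be expected to report "not grounded" if the judge correctly flags the fabricated claim, while the same call with `claim_grounded` should report "grounded".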