<a href="https://colab.research.google.com/github/uptrain-ai/uptrain/blob/main/examples/checks/custom/custom_evals.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align="center">
  <a href="https://uptrain.ai">
    <img width="300" src="https://user-images.githubusercontent.com/108270398/214240695-4f958b76-c993-4ddd-8de6-8668f4d0da84.png" alt="uptrain">
  </a>
</h1>

<h1 style="text-align: center;">Building Custom Evaluations using UpTrain</h1>

UpTrain offers a multitude of [pre-built evaluations](https://docs.uptrain.ai/predefined-evaluations/overview) that use custom prompt templates to evaluate your model's performance. To write your own prompt templates for evaluations, see the [Custom Prompt Evals Tutorial](https://github.com/uptrain-ai/uptrain/blob/main/examples/checks/custom/custom_prompt_evals.ipynb). All of these evaluations make LLM calls, which is not always necessary: some evaluations can be done with simple Python code. This tutorial shows how to create such custom evaluations in Python.

 
If you face any difficulties, need help using UpTrain, or want to brainstorm custom evaluations for your use case, [speak to the maintainers of UpTrain here](https://calendly.com/uptrain-sourabh/30min).
 

## Step 1: Install UpTrain by running 'pip install uptrain'

In [1]:
#!pip install uptrain

## Step 2: Let's define the custom evaluation

Let's say you want to check how diverse your model's vocabulary is. One simple measure is the ratio of unique words to total words in each response. We will define an operator that takes the output of the model and returns this ratio, rounded to two decimal places. A score of 1.0 means no word is repeated.
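Before wiring anything into UpTrain, it can help to see the metric on its own. The following is a minimal sketch in plain Python; `unique_word_ratio` is a hypothetical helper name used only for illustration, and the empty-text fallback of 0.0 is an assumption, not something the tutorial specifies.

```python
def unique_word_ratio(text: str) -> float:
    """Ratio of distinct whitespace-separated tokens to total tokens."""
    words = text.split()
    if not words:
        # Assumption: score empty text as 0.0 rather than dividing by zero.
        return 0.0
    return round(len(set(words)) / len(words), 2)


# 5 distinct tokens out of 6 ("the" appears twice)
print(unique_word_ratio("the cat sat on the mat"))  # 0.83
```

Because the text is split on whitespace only, tokens are case-sensitive and keep their punctuation, so "The" and "the", or "membrane," and "membrane", count as different words.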

In [15]:
from uptrain.framework.base import Settings
from uptrain.operators.base import TransformOp, register_custom_op, TYPE_TABLE_OUTPUT
import polars as pl


# Note: Make sure that the score column names start with "score_" in order for them to show up in the dashboard
@register_custom_op
class DiverseVocabularyEval(TransformOp):
    col_in_text: str = "response"
    col_out_score: str = "score_diverse_vocabulary"

    def setup(self, settings: Settings):
        return self

    def run(self, data: pl.DataFrame) -> TYPE_TABLE_OUTPUT:
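        # Score each response as unique words / total words (0.0 for empty text)
        scores = data.get_column(self.col_in_text).map_elements(
            lambda s: round(len(set(s.split())) / len(s.split()), 2) if s.split() else 0.0,
            return_dtype=pl.Float64,
        )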
        return {"output": data.with_columns([scores.alias(self.col_out_score)])}
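The note in the cell above asks that score column names start with `score_`. As a rough illustration of why a prefix convention is convenient, score fields can be picked out of a result row by name alone. This is only a sketch and not UpTrain's dashboard code; `score_fields` is a hypothetical helper, and the sample row is a shortened version of the output shown later in this tutorial.

```python
def score_fields(row: dict) -> dict:
    """Keep only the entries whose key follows the score_ naming convention."""
    return {key: value for key, value in row.items() if key.startswith("score_")}


# Shortened result row; values taken from the tutorial's own output.
row = {
    "question": "What are the primary components of a cell?",
    "score_diverse_vocabulary": 0.84,
    "word_count": 25,
}

print(score_fields(row))  # {'score_diverse_vocabulary': 0.84}
```

Here `word_count` is left out because its name does not carry the prefix, which is why a custom operator's output column should be named accordingly.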

In [16]:
from uptrain import EvalLLM
from uptrain.operators.language.text import WordCount
import os

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")

data = [
    {
        "question": "What are the primary components of a cell?",
        "response": "A cell comprises a cell membrane, cytoplasm, and nucleus. The cell membrane regulates substance passage, the cytoplasm contains organelles, and the nucleus houses genetic material."
    },
    {
        "question": "How does photosynthesis work?",
        "response": "Photosynthesis converts light energy into chemical energy in plants, algae, and some bacteria. Chlorophyll absorbs sunlight, synthesizing glucose from carbon dioxide and water, with oxygen released as a byproduct."
    },
    {
        "question": "What are the key features of the Python programming language?",
        "response": "Python is a high-level, interpreted language known for readability. It supports object-oriented, imperative, and functional programming with a large standard library, dynamic typing, and automatic memory management."
    }
]

eval_llm = EvalLLM(openai_api_key=OPENAI_API_KEY)

results = eval_llm.evaluate(
    data=data,
    checks=[
        DiverseVocabularyEval(col_in_text="response"),
        WordCount(col_in_text="response"),  # You can also use the built-in operators offered by UpTrain
    ],
)

2024-03-20 16:52:40.204 | DEBUG    | uptrain.framework.base:run:217 - Executing node: operator_0 for operator DAG: dummy


In [17]:
results

[{'question': 'What are the primary components of a cell?',
  'response': 'A cell comprises a cell membrane, cytoplasm, and nucleus. The cell membrane regulates substance passage, the cytoplasm contains organelles, and the nucleus houses genetic material.',
  'score_diverse_vocabulary': 0.84,
  'word_count': 25},
 {'question': 'How does photosynthesis work?',
  'response': 'Photosynthesis converts light energy into chemical energy in plants, algae, and some bacteria. Chlorophyll absorbs sunlight, synthesizing glucose from carbon dioxide and water, with oxygen released as a byproduct.',
  'score_diverse_vocabulary': 0.93,
  'word_count': 29},
 {'question': 'What are the key features of the Python programming language?',
  'response': 'Python is a high-level, interpreted language known for readability. It supports object-oriented, imperative, and functional programming with a large standard library, dynamic typing, and automatic memory management.',
  'score_diverse_vocabulary': 0.93,
  