# 📊 Client-Side Custom Metrics with TruLens

This notebook demonstrates how to create and use client-side custom metrics with TruLens. Client-side custom metrics allow you to define your own evaluation functions that run locally on the client instead of on the server (Snowflake).

## Key Features

- **Custom Metric Decorator**: Use `@custom_metric` to convert any function into a metric
- **EvaluationConfig**: Explicit configuration that names a metric, sets where it is computed, and holds its selectors
- **Flexible Selectors**: Map each metric parameter to an attribute (an argument or the return value) of a specific instrumented function
- **Client-Side Computation**: Metrics are computed locally and results uploaded as OTEL spans
- **Seamless Integration**: Works with existing TruLens apps and runs

## Prerequisites

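- OTEL tracing enabled (`TRULENS_OTEL_TRACING=1`, set before TruLens is imported)
- TruLens feedback package and the TruLens Snowflake connector installed
- A Snowflake account, with connection settings in `SNOWFLAKE_*` environment variables (for example, in a `.env` file)
- Access to a TruLens app with instrumented methods (this notebook defines a mock one)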


In [None]:
# Setup and imports
import os

# Enable OTEL tracing. This MUST be set before importing TruLens.
os.environ["TRULENS_OTEL_TRACING"] = "1"

from dotenv import load_dotenv
import pandas as pd
from trulens.apps.app import TruApp
from trulens.core.feedback.custom_metric import EvaluationConfig
from trulens.core.feedback.custom_metric import custom_metric
from trulens.core.feedback.selector import Selector
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes

# Load environment variables (e.g. Snowflake credentials) from a .env file.
load_dotenv()


# Define a mock RAG app.


class TestApp:
    @instrument(
        span_type=SpanAttributes.SpanType.RECORD_ROOT,
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        },
    )
    def query(self, query: str) -> str:
        retrieved_contexts = self.get_contexts(query)
        return self.generation(query, retrieved_contexts)

    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    def get_contexts(self, query: str) -> list[str]:
        return ["context 1", "context 2", "context 3", "context 4"]

    @instrument(
        span_type=SpanAttributes.SpanType.GENERATION,
    )
    def generation(self, query: str, contexts: list[str]) -> str:
        if len(contexts) == 0:
            return "Sorry, I couldn't find an answer to your question."
        return "Answer to your question."
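The `attributes` argument of `@instrument` maps each span attribute to either a parameter of the decorated method (`"query"`) or its return value (`"return"`). To make that mapping concrete, here is a plain-Python sketch using a hypothetical `record_span` decorator that appends spans to an in-memory list. It only illustrates the idea and is not how TruLens implements `@instrument`; the span layout is an assumption. The recorded function name uses the same `Class.method` form that the selectors in Step 2 use (e.g. `"TestApp.query"`).

```python
import functools
import inspect
from typing import Any, Callable

# Hypothetical in-memory span store; a real tracer exports spans via OTEL instead.
SPANS: list[dict[str, Any]] = []


def record_span(span_type: str, attributes: dict[str, str]) -> Callable:
    """Hypothetical stand-in for an instrumentation decorator.

    `attributes` maps a span attribute name to either a parameter name of the
    wrapped function or the special value "return".
    """

    def decorator(fn: Callable) -> Callable:
        signature = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Bind the call's arguments to parameter names (including defaults).
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            result = fn(*args, **kwargs)
            span_attributes = {}
            for attr_name, source in attributes.items():
                if source == "return":
                    span_attributes[attr_name] = result
                else:
                    span_attributes[attr_name] = bound.arguments[source]
            SPANS.append({
                "span_type": span_type,
                "function_name": fn.__qualname__,  # e.g. "MiniApp.get_contexts"
                "attributes": span_attributes,
            })
            return result

        return wrapper

    return decorator


class MiniApp:
    @record_span("retrieval", {"query_text": "query", "retrieved_contexts": "return"})
    def get_contexts(self, query: str) -> list[str]:
        return ["context 1", "context 2"]


MiniApp().get_contexts("Where is Starbucks headquartered?")
print(SPANS[0])
```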

## Step 1: Define Custom Metrics

Let's create two custom metrics with the `@custom_metric` decorator. They are deliberately simple placeholders: `custom_accuracy` scores whether the query is longer than 50 characters, and `custom_random_metric` returns a random score.
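Conceptually, a decorator like `@custom_metric` keeps the function's scoring logic intact and records metadata such as `metric_type` and `higher_is_better` alongside it. The sketch below uses a hypothetical `metric` decorator that simply sets attributes on the function; it is an illustration only, and the real `custom_metric` may wrap the function differently.

```python
from typing import Callable


def metric(metric_type: str, higher_is_better: bool = True) -> Callable:
    """Hypothetical decorator that tags a plain function as a metric."""

    def decorator(fn: Callable) -> Callable:
        # The function stays directly callable; only metadata is attached.
        fn.metric_type = metric_type
        fn.higher_is_better = higher_is_better
        return fn

    return decorator


@metric(metric_type="query_length", higher_is_better=True)
def query_length_metric(query: str) -> float:
    return 1.0 if len(query) > 50 else 0.0


print(query_length_metric.metric_type, query_length_metric("short question?"))
```

Because the decorated object is still an ordinary function, it can be unit-tested directly before it is wired to any spans.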


In [None]:
# Define custom metrics using the decorator
import random
from typing import Tuple


@custom_metric(metric_type="custom_accuracy", higher_is_better=True)
def custom_accuracy(query: str) -> float:
    """
    A placeholder "accuracy" metric: returns 1.0 if the query is longer than
    50 characters, else 0.0. Real metrics would use more sophisticated logic.
    """
    if len(query) > 50:
        return 1.0

    return 0.0


# metric_type matches the one used by eval_config_2 below.
@custom_metric(metric_type="custom_random_metric", higher_is_better=True)
def custom_random_metric(some_str: str) -> Tuple[float, str]:
    """
    A placeholder metric that returns a random score in [0, 1) together
    with the input string.
    """
    score = random.random()
    return score, some_str

## Step 2: Create EvaluationConfig Objects

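Now let's create explicit evaluation configurations. Each `EvaluationConfig` names a metric, marks it for client-side computation, and maps every metric parameter to a `Selector`. A selector identifies spans by `function_name` and picks one attribute from them: an argument name such as `"query"`, or `"return"` for the return value. This is where the **span-to-argument mapping** happens.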
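As a rough mental model, resolving selectors for one record means: for each metric parameter, find the span produced by the selector's function and read the selected attribute. The plain-Python sketch below shows that lookup with a hypothetical `SimpleSelector` dataclass and an assumed span layout (attributes keyed by argument name or `"return"`); TruLens stores OTEL span attributes differently, so treat this as an illustration rather than its actual resolution logic.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class SimpleSelector:
    """Hypothetical stand-in for a selector: which function, which attribute."""

    function_name: str
    function_attribute: str


def resolve_arguments(
    selectors: dict[str, SimpleSelector], spans: list[dict[str, Any]]
) -> dict[str, Any]:
    """Build metric keyword arguments from the spans of one record."""
    kwargs = {}
    for param_name, selector in selectors.items():
        for span in spans:
            if span["function_name"] == selector.function_name:
                kwargs[param_name] = span["attributes"][selector.function_attribute]
                break
        else:
            raise LookupError(
                f"no span from {selector.function_name} for '{param_name}'"
            )
    return kwargs


# Spans for a single record, shaped like the mock RAG app's calls (assumed layout).
record_spans = [
    {"function_name": "TestApp.query",
     "attributes": {"query": "Where is Starbucks headquartered?",
                    "return": "Answer to your question."}},
    {"function_name": "TestApp.generation",
     "attributes": {"query": "Where is Starbucks headquartered?",
                    "return": "Answer to your question."}},
]

selectors = {"query": SimpleSelector("TestApp.query", "query")}
print(resolve_arguments(selectors, record_spans))
```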


In [None]:
# Method 1: Create EvaluationConfig using the fluent interface
eval_config_1 = EvaluationConfig(
    name="evaluation_config_1",
    metric_type="custom_accuracy",
    computation_type="client",
    description="Evaluates some custom accuracy",
).add_selector(
    "query",  # Parameter name in the metric function
    Selector(
        function_attribute="query",  # Extract from 'query' parameter
        function_name="TestApp.query",  # From this specific function
    ),
)

# Method 2: Create EvaluationConfig from a dictionary
eval_config_dict = {
    "name": "evaluation_config_2",
    "metric_type": "custom_random_metric",
    "computation_type": "client",
    "description": "Evaluate with a random number generating metric",
    "selectors": {
        "some_str": Selector(
            function_attribute="return", function_name="TestApp.generation"
        )
    },
}

eval_config_2 = EvaluationConfig.from_dict(eval_config_dict)

print("EvaluationConfig objects created:")
print(f"1. {eval_config_1}")
print(f"2. {eval_config_2}")

# Show the span-to-argument mapping details
print("\n=== Span-to-Argument Mapping ===")
print("eval_config_1 mapping:")
for param_name, selector in eval_config_1.selectors.items():
    print(
        f"  Parameter '{param_name}' ← {selector.function_name}.{selector.function_attribute}"
    )

print("\neval_config_2 mapping:")
for param_name, selector in eval_config_2.selectors.items():
    print(
        f"  Parameter '{param_name}' ← {selector.function_name}.{selector.function_attribute}"
    )
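Each key in `selectors` is meant to be a parameter name of the metric function (`query` for `custom_accuracy`, `some_str` for `custom_random_metric`), so a typo there would leave a parameter unfilled. A quick sanity check can compare the two with `inspect.signature`. The `check_selector_names` helper and `sample_metric` function below are hypothetical, not part of TruLens.

```python
import inspect
from typing import Callable, Iterable


def check_selector_names(fn: Callable, selector_names: Iterable[str]) -> dict[str, set[str]]:
    """Hypothetical helper: compare selector keys with a function's parameters."""
    params = set(inspect.signature(fn).parameters)
    names = set(selector_names)
    return {
        "missing": params - names,  # parameters no selector will fill
        "unknown": names - params,  # selector keys the function does not accept
    }


def sample_metric(query: str) -> float:  # same signature as custom_accuracy
    return 1.0 if len(query) > 50 else 0.0


print(check_selector_names(sample_metric, ["query"]))
print(check_selector_names(sample_metric, ["qeury"]))
```

With the objects above, the inputs would be a metric's underlying function and `eval_config_1.selectors.keys()`; whether `inspect.signature` sees through the `@custom_metric` wrapper depends on how that decorator is implemented.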

In [None]:
# Create a Snowpark session.
import os

from snowflake.snowpark import Session
from trulens.connectors.snowflake import SnowflakeConnector

snowflake_connection_parameters = {
    "account": os.environ["SNOWFLAKE_ACCOUNT"],
    "user": os.environ["SNOWFLAKE_USER"],
    "password": os.environ["SNOWFLAKE_USER_PASSWORD"],
    "database": os.environ["SNOWFLAKE_DATABASE"],
    "schema": os.environ["SNOWFLAKE_SCHEMA"],
    "role": os.environ["SNOWFLAKE_ROLE"],
    "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
}
snowpark_session = Session.builder.configs(
    snowflake_connection_parameters
).create()

# A separate TruSession is not required when a SnowflakeConnector is provided.
sf_connector = SnowflakeConnector(snowpark_session=snowpark_session)
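Indexing `os.environ[...]` in the connection cell raises a `KeyError` on the first missing variable only. A small pre-check, sketched here as a hypothetical `missing_env_vars` helper, reports every missing or empty Snowflake variable at once so they can be fixed before `Session.builder` is called.

```python
import os
from typing import Iterable, Mapping, Optional

# Environment variables read by the connection cell above.
REQUIRED_SNOWFLAKE_VARS = (
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_USER_PASSWORD",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_WAREHOUSE",
)


def missing_env_vars(
    names: Iterable[str] = REQUIRED_SNOWFLAKE_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Hypothetical helper: return every required variable that is unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
print(missing_env_vars(environ={"SNOWFLAKE_ACCOUNT": "my_account"}))
```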

In [None]:
# Create a TruLens-instrumented app from the custom app.
import uuid

# Generate a unique app name for this notebook session.
APP_NAME = f"custom metrics client-side flow {uuid.uuid4()}"
APP_VERSION = "V1"

test_app = TestApp()
tru_app = TruApp(
    test_app, app_name=APP_NAME, app_version=APP_VERSION, connector=sf_connector
)

In [None]:
test_data_entries = [
    {
        "query": "What wave of coffee culture is Starbucks seen to represent in the United States?"
    },
    {"query": "What is the largest city in New Zealand?"},
    {
        "query": "Where is the main campus of the University of Washington located?"
    },
    {"query": "What is the capital city of New Zealand?"},
    {
        "query": "What is the largest institution of higher education in Washington state?"
    },
    {
        "query": "What wave of coffee culture is Starbucks seen to represent in New Zealand?"
    },
    {"query": "What year was Washington State University founded?"},
    {
        "query": "Which university has a strong focus on veterinary medicine and agriculture?"
    },
    {"query": "Which landmark in Seattle was built for the 1962 World’s Fair?"},
    {"query": "How many campuses does the University of Washington have?"},
    {"query": "Where is Starbucks headquartered?"},
]


user_input_data_df = pd.DataFrame(test_data_entries)
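Because `custom_accuracy` is deterministic, you can preview the scores it should assign to some of these queries by applying the same `len(query) > 50` rule locally. The `length_rule` function below is a plain-Python copy of that rule for illustration; it does not call TruLens, and it assumes (as `eval_config_1` specifies) that the metric receives the raw input query.

```python
def length_rule(query: str, threshold: int = 50) -> float:
    """Same rule as custom_accuracy: 1.0 only if the query is longer than the threshold."""
    return 1.0 if len(query) > threshold else 0.0


sample_queries = [
    "What wave of coffee culture is Starbucks seen to represent in the United States?",
    "What is the largest city in New Zealand?",
    "What year was Washington State University founded?",
    "How many campuses does the University of Washington have?",
    "Where is Starbucks headquartered?",
]

for q in sample_queries:
    print(f"{len(q):>3}  {length_rule(q)}  {q}")
```

The Washington State University question is exactly 50 characters long, so the strict `>` comparison scores it 0.0.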

In [None]:
# Register each metric together with its EvaluationConfig (recommended approach)
print("\n=== Registering Metrics with EvaluationConfig ===")

tru_app.add_metric_with_evaluation_config(
    metric=custom_accuracy, evaluation_config=eval_config_1
)
print(f"✅ Registered: {eval_config_1.name}")

tru_app.add_metric_with_evaluation_config(
    metric=custom_random_metric, evaluation_config=eval_config_2
)
print(f"✅ Registered: {eval_config_2.name}")

print("\nMetrics registered using the EvaluationConfig approach!")

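## Step 3: Run the App and Compute Metrics

Create (or retrieve) a run over the test queries, start it to invoke the app on each query, and then compute the registered client-side metrics. At computation time, each selector defined in Step 2 pulls its attribute from the matching span, and that value is passed to the metric function as the corresponding argument.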
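Conceptually, computing a client-side metric for a run means: group the run's spans by record, use the selectors to pull each metric argument from that record's spans, call the metric locally, and collect one result per record for upload. The sketch below shows that loop in plain Python with a hypothetical `compute_client_side` function and an assumed span layout (`record_id`, `function_name`, `attributes`). The actual span format, error handling, and upload step in TruLens are not shown and will differ; in particular, this sketch simply skips records that lack a required span.

```python
from collections import defaultdict
from typing import Any, Callable


def compute_client_side(
    metric_name: str,
    metric_fn: Callable[..., float],
    selectors: dict[str, tuple[str, str]],
    spans: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Hypothetical sketch: evaluate one metric per record from that record's spans.

    `selectors` maps a metric parameter to (function_name, function_attribute).
    """
    spans_by_record: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for span in spans:
        spans_by_record[span["record_id"]].append(span)

    results = []
    for record_id, record_spans in spans_by_record.items():
        kwargs = {}
        for param, (function_name, attribute) in selectors.items():
            match = next(
                (s for s in record_spans if s["function_name"] == function_name), None
            )
            if match is not None:
                kwargs[param] = match["attributes"][attribute]
        if len(kwargs) < len(selectors):
            continue  # skip records that lack a required span
        results.append(
            {"record_id": record_id, "metric": metric_name, "score": metric_fn(**kwargs)}
        )
    return results


spans = [
    {"record_id": "r1", "function_name": "TestApp.query",
     "attributes": {"query": "Where is Starbucks headquartered?"}},
    {"record_id": "r2", "function_name": "TestApp.query",
     "attributes": {"query": "How many campuses does the University of Washington have?"}},
    {"record_id": "r3", "function_name": "TestApp.generation",
     "attributes": {"return": "Answer to your question."}},
]

rows = compute_client_side(
    "custom_accuracy",
    lambda query: 1.0 if len(query) > 50 else 0.0,
    {"query": ("TestApp.query", "query")},
    spans,
)
print(rows)
```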


In [None]:
import uuid

from trulens.core.run import Run
from trulens.core.run import RunConfig

run_name = f"test_run_{uuid.uuid4()}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="dummy_test_rag_set",
    source_type="DATAFRAME",
    dataset_spec={"RECORD_ROOT.INPUT": "query"},
)  # type: ignore

run: Run = tru_app.add_run(run_config=run_config)

In [None]:
# To reuse an existing run instead, retrieve it by name:
# run = tru_app.get_run(run_name)

In [None]:
# Invoke the app on the input DataFrame; dataset_spec maps the "query" column to RECORD_ROOT.INPUT.
run.start(input_df=user_input_data_df)

In [None]:
# Check the run's status before computing metrics.
run.get_status()

In [None]:
# Compute the registered client-side metrics for this run.
run.compute_metrics([
    "custom_accuracy",
    "custom_random_metric",
])