# Evaluate gen AI apps with Snowflake Cortex AI and TruLens
This notebook demonstrates how AI Observability in Snowflake Cortex AI helps you quantitatively measure the performance of a RAG application using different LLMs, providing insight into application behavior and helping you select the best model for your use case.

### Required Packages
* trulens-core (1.4.5 or above)
* trulens-connectors-snowflake (1.4.5 or above)
* trulens-providers-cortex (1.4.5 or above)
* snowflake-core (1.0.5 or above)


## Session Information
Sets the connection details for the Snowflake account and creates a Snowpark session. This session is used to ingest application traces and trigger metric computation jobs.

In [None]:
import os

os.environ["SNOWFLAKE_ACCOUNT"] = "..."
os.environ["SNOWFLAKE_USER"] = "..."
os.environ["SNOWFLAKE_USER_PASSWORD"] = "..."
os.environ["SNOWFLAKE_DATABASE"] = "..."
os.environ["SNOWFLAKE_SCHEMA"] = "..."
os.environ["SNOWFLAKE_WAREHOUSE"] = "..."
os.environ["SNOWFLAKE_ROLE"] = "..."

In [None]:
from snowflake.snowpark import Session
from trulens.connectors.snowflake import SnowflakeConnector

snowflake_connection_parameters = {
    "account": os.environ["SNOWFLAKE_ACCOUNT"],
    "user": os.environ["SNOWFLAKE_USER"],
    "password": os.environ["SNOWFLAKE_USER_PASSWORD"],
    "database": os.environ["SNOWFLAKE_DATABASE"],
    "schema": os.environ["SNOWFLAKE_SCHEMA"],
    "role": os.environ["SNOWFLAKE_ROLE"],
    "warehouse": os.environ["SNOWFLAKE_WAREHOUSE"],
}
snowpark_session = Session.builder.configs(
    snowflake_connection_parameters
).create()

# A TruSession is not required when a SnowflakeConnector is provided
sf_connector = SnowflakeConnector(snowpark_session=snowpark_session)
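If one of the environment variables above is still set to the `...` placeholder, the error only surfaces when the session is created. As a minimal sketch, the parameter dictionary can be built and validated up front; `build_connection_parameters` is a hypothetical helper, not part of Snowpark or TruLens.

```python
import os
from typing import Mapping

# Maps Snowpark connection keys to the environment variables set above.
ENV_KEYS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_USER_PASSWORD",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
    "role": "SNOWFLAKE_ROLE",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
}

PLACEHOLDER = "..."


def build_connection_parameters(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect connection parameters, rejecting unset or placeholder values."""
    missing = [
        var
        for var in ENV_KEYS.values()
        if not env.get(var) or env.get(var) == PLACEHOLDER
    ]
    if missing:
        raise ValueError(f"Set these environment variables first: {', '.join(missing)}")
    return {key: env[var] for key, var in ENV_KEYS.items()}
```

The returned dictionary can be passed to `Session.builder.configs(...)` in place of the literal `snowflake_connection_parameters`.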

## Virtual Run - New Feature!
With the new virtual run feature, you can ingest existing data (queries, outputs, and retrieved contexts that were already produced) directly into the Event Table, without writing an app whose methods re-fetch that data. Only a minimal placeholder app is registered, which avoids the awkward pattern of fake app methods.

The examples assume a source table with the following schema, where `CONTEXTS` stores the retrieved contexts as a comma-separated string:
```sql
create table YOUR_TABLE_NAME (
    query_string VARCHAR,
    output_string VARCHAR, 
    contexts VARCHAR
);
```

### Two approaches available:

**1. New Virtual Run Approach (Recommended):**
- Only a minimal placeholder app is needed; its method is never called
- Directly ingest existing data using `run.start(virtual=True)`
- Much cleaner and more intuitive

**2. Legacy Approach (shown below for comparison):**
- Requires creating a TestApp with dummy methods
- More verbose and awkward

## Option 1: Virtual Run (Recommended)

This approach uses a minimal placeholder app class instead of the complex TestApp with actual data fetching logic. The key benefits:

- **Minimal boilerplate**: Just a simple placeholder class with one method
- **No data fetching logic**: The method is never actually called
- **Same TruApp flow**: Uses the familiar `tru_app.add_run()` pattern
- **Virtual execution**: `run.start(virtual=True)` creates spans from existing data
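Conceptually, the difference between a regular run and a virtual run is where each record's output comes from. The toy below illustrates that idea only; it is not TruLens's implementation, and `Record` and `build_records` are hypothetical names. A regular run calls the app on each input, while a virtual run copies the output already stored in the row, so the placeholder method never executes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for the data captured in one trace."""
    input: str
    output: str


def build_records(
    rows: List[Dict[str, str]],
    app_fn: Optional[Callable[[str], str]] = None,
    virtual: bool = False,
) -> List[Record]:
    records = []
    for row in rows:
        query = row["QUERY_STRING"]
        if virtual:
            # Virtual: reuse the stored output; the app is never invoked.
            output = row["OUTPUT_STRING"]
        else:
            # Regular: invoke the app to produce the output.
            if app_fn is None:
                raise ValueError("A regular run needs an app function to call")
            output = app_fn(query)
        records.append(Record(input=query, output=output))
    return records
```

In this toy, passing `virtual=True` never touches `app_fn`, which mirrors why `VirtualApp.virtual_query` can return a dummy value.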


In [None]:
# Virtual Run approach - much cleaner!
import uuid

from trulens.apps.app import TruApp
from trulens.core.run import RunConfig

APP_NAME = "RAG evaluation run on existing data"
APP_VERSION = "V1"


# Create a minimal placeholder app for virtual runs
class VirtualApp:
    """Minimal placeholder app for virtual runs - no real methods needed"""

    def virtual_query(self, query: str) -> str:
        """Placeholder method - not actually called in virtual runs"""
        return "virtual_result"


# Create TruApp with placeholder - preserves existing add_run() flow
virtual_app = VirtualApp()
tru_app = TruApp(
    virtual_app,
    app_name=APP_NAME,
    app_version=APP_VERSION,
    connector=sf_connector,
    # main_method=virtual_app.virtual_query,  # Specify the main method explicitly
)

# Create run config with dataset specification
run_name = f"virtual_run_{uuid.uuid4()}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="YOUR_TABLE_NAME",  # Your Snowflake table name
    source_type="TABLE",
    dataset_spec={
        "record_root.input": "QUERY_STRING",  # Maps to input field
        "record_root.output": "OUTPUT_STRING",  # Maps to output field
        "retrieved_contexts": "CONTEXTS",  # Maps to contexts field (optional)
        # Add other fields as needed
    },
)

# Use the existing add_run() flow
virtual_run = tru_app.add_run(run_config=run_config)

print(f"Created virtual run: {run_name}")
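The `dataset_spec` dictionary maps span-attribute keys (left) to column names in the source table (right). The sketch below is a simplified, hypothetical illustration of applying such a mapping to one fetched row; `apply_dataset_spec` is not a TruLens function, and the real ingestion logic may differ. It also shows why column names must match exactly: Snowflake stores unquoted identifiers in upper case, so a lookup for `query_string` would not find `QUERY_STRING`.

```python
from typing import Dict


def apply_dataset_spec(row: Dict[str, str], spec: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical illustration: look up each mapped column in a fetched row."""
    missing = [column for column in spec.values() if column not in row]
    if missing:
        # Column lookups are exact, so "query_string" would not match "QUERY_STRING".
        raise KeyError(f"Columns not found in row: {missing}")
    return {field: row[column] for field, column in spec.items()}


spec = {
    "record_root.input": "QUERY_STRING",
    "record_root.output": "OUTPUT_STRING",
    "retrieved_contexts": "CONTEXTS",
}
row = {
    "QUERY_STRING": "What is a virtual run?",
    "OUTPUT_STRING": "A run built from existing data.",
    "CONTEXTS": "doc A, doc B",
}
mapped = apply_dataset_spec(row, spec)
print(mapped["record_root.input"])  # What is a virtual run?
```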

In [None]:
# Start the virtual run - this will create OTEL spans from existing data
# The virtual=True flag tells the run to create spans from existing data
# instead of actually invoking the VirtualApp.virtual_query method
virtual_run.start(virtual=True)

print("Virtual run completed! Data has been ingested into Event Table.")

In [None]:
# Check virtual run status
import time

while virtual_run.get_status() == "INVOCATION_IN_PROGRESS":
    print("Waiting for ingestion to complete...")
    time.sleep(2)

print(f"Virtual run status: {virtual_run.get_status()}")
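The loop above polls with no upper bound. A slightly more defensive pattern, sketched here as a hypothetical helper `wait_while_status`, adds a timeout and returns the final status. It takes the status getter as a callable, so it could wrap `virtual_run.get_status` or `run.get_status`; it only assumes that `get_status()` returns a string such as `"INVOCATION_IN_PROGRESS"`.

```python
import time
from typing import Callable, Iterable


def wait_while_status(
    get_status: Callable[[], str],
    pending: Iterable[str] = ("INVOCATION_IN_PROGRESS",),
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() until it leaves the pending states; raise TimeoutError after timeout_s."""
    pending = set(pending)
    deadline = clock() + timeout_s
    status = get_status()
    while status in pending:
        if clock() >= deadline:
            raise TimeoutError(f"Still {status} after {timeout_s} seconds")
        sleep(interval_s)
        status = get_status()
    return status
```

For example, `status = wait_while_status(virtual_run.get_status)` blocks for at most ten minutes by default. The `sleep` and `clock` parameters exist only so the helper can be tested without real waiting.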

In [None]:
# Compute metrics for the virtual run
virtual_run.compute_metrics([
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

print("Metrics computation started for virtual run!")
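These three metrics form the RAG triad, and each compares a different pair of fields: answer relevance compares the query with the final answer, context relevance compares the query with the retrieved contexts, and groundedness checks the answer against the retrieved contexts. Because two of the three depend on retrieved contexts, omitting the optional `retrieved_contexts` mapping from `dataset_spec` would likely leave them without input. The hypothetical pre-flight check below (not a TruLens API; the field names are illustrative) makes that dependency explicit before calling `compute_metrics`.

```python
from typing import Dict, List, Set

# Illustrative field names; these are not TruLens attribute identifiers.
TRIAD_INPUTS = {
    "answer_relevance": {"input", "output"},
    "context_relevance": {"input", "retrieved_contexts"},
    "groundedness": {"retrieved_contexts", "output"},
}


def unsupported_metrics(metrics: List[str], available: Set[str]) -> Dict[str, Set[str]]:
    """Return each requested metric whose required fields are not all available, with the missing fields."""
    problems = {}
    for metric in metrics:
        missing = TRIAD_INPUTS[metric] - available
        if missing:
            problems[metric] = missing
    return problems


requested = ["answer_relevance", "context_relevance", "groundedness"]
print(unsupported_metrics(requested, {"input", "output"}))
```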

## Option 2: Legacy Approach (for comparison)

This shows the older way of evaluating existing data, which required a TestApp whose methods fetch rows from the table at run time. Compare its complexity with the placeholder VirtualApp above:

**Key differences:**
- **More complex**: TestApp has actual data fetching logic in each method
- **Awkward pattern**: Methods fetch data that already exists in the table  
- **More code**: Requires implementing retrieval, generation, and query methods
- **Still works**: This approach is still supported for backward compatibility


In [None]:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes


class TestApp:
    def __init__(self, snowflake_table_name: str):
        self.snowflake_table_name = snowflake_table_name

    @instrument(
        span_type=SpanAttributes.SpanType.RECORD_ROOT,
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        },
    )
    def query(self, query: str) -> str:
        retrieved_contexts = self.get_contexts(query)
        return self.generate_answer(query, retrieved_contexts)

    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    def get_contexts(self, query: str) -> list[str]:
        # Look up the retrieved contexts for this query. The CONTEXTS column is a
        # comma-separated string, which is parsed into a list of strings below.
        # A bind parameter (?) keeps queries containing quotes from breaking the SQL.
        query_result = snowpark_session.sql(
            f"SELECT CONTEXTS FROM {self.snowflake_table_name} WHERE QUERY_STRING = ?",
            params=[query],
        ).collect()

        if not query_result:
            return []

        # Get contexts string from first row
        contexts_str = query_result[0]["CONTEXTS"]

        # Parse comma-separated string into list
        if contexts_str:
            contexts = [context.strip() for context in contexts_str.split(",")]
            return contexts

        return []

    @instrument(
        span_type=SpanAttributes.SpanType.GENERATION,
    )
    def generate_answer(self, query: str, contexts: list[str]) -> str:
        # Query the Snowflake table for the stored output string of this query
        if len(contexts) == 0:
            return "Sorry, I couldn't find an answer to your question."
        query_result = snowpark_session.sql(
            f"SELECT OUTPUT_STRING FROM {self.snowflake_table_name} WHERE QUERY_STRING = ?",
            params=[query],
        ).collect()
        answer = query_result[0]["OUTPUT_STRING"] if query_result else None
        if answer:
            return answer
        else:
            return "Did not find an answer."
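`get_contexts` splits the `CONTEXTS` column on commas, so any retrieved passage that itself contains a comma is broken into fragments. If you control the table, one alternative (an assumption, not something the schema above requires) is to store contexts as a JSON array string. The hypothetical `parse_contexts` helper below accepts that format, falls back to comma splitting for existing rows, and drops empty fragments.

```python
import json
from typing import List, Optional


def parse_contexts(contexts_str: Optional[str]) -> List[str]:
    """Parse a CONTEXTS value stored either as a JSON array or as comma-separated text."""
    if not contexts_str:
        return []
    text = contexts_str.strip()
    if text.startswith("["):
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, list):
            # JSON keeps commas inside each passage intact.
            return [str(item).strip() for item in parsed if str(item).strip()]
    # Fallback: the comma-separated format used by get_contexts.
    return [part.strip() for part in text.split(",") if part.strip()]
```

Inside `TestApp.get_contexts`, `return parse_contexts(query_result[0]["CONTEXTS"])` could replace the manual split.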

## App Registration
Registers the TestApp instance in Snowflake, creating an EXTERNAL AGENT object to represent the application in the Snowflake account and registering this instance as a version of it.

In [None]:
# Create TruLens instrumented app from custom app.

import uuid

from trulens.apps.app import TruApp

APP_NAME = "RAG evaluation run on existing data"
APP_VERSION = "V1"

test_app = TestApp(snowflake_table_name="YOUR_TABLE_NAME")

tru_app = TruApp(
    test_app,
    app_name=APP_NAME,
    app_version=APP_VERSION,
    connector=sf_connector,
    main_method=test_app.query,
)

## Add runs to agent

In [None]:
from trulens.core.run import Run
from trulens.core.run import RunConfig

run_name = f"test_virtual_run_{uuid.uuid4()}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="YOUR_TABLE_NAME",  # Same table that TestApp queries
    source_type="TABLE",
    dataset_spec={
        "RECORD_ROOT.INPUT": "QUERY_STRING",  # column name "QUERY_STRING" is case sensitive
    },
)

run: Run = tru_app.add_run(run_config=run_config)

In [None]:
run.start()

## Run Status Check
Polls the run's status while it is "INVOCATION_IN_PROGRESS".

Note: Metric computation cannot be started while the invocation is still in progress. Once the run's status changes to "INVOCATION_COMPLETED", metric computation can be triggered.

In [None]:
import time

while run.get_status() == "INVOCATION_IN_PROGRESS":
    time.sleep(1)

## Compute Metrics

Computes the RAG triad metrics (answer relevance, context relevance, and groundedness) for the run to measure the quality of the RAG application's responses.

In [None]:
run.compute_metrics([
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

Check the run's status to track metric computation.

In [None]:
run.get_status()

## Evaluation Results

To view evaluation results:
* Log in to [Snowsight](https://app.snowflake.com/).
* Navigate to **AI & ML** -> **Evaluations** from the left navigation menu.
* Select “RAG evaluation run on existing data” to view the runs, see detailed traces and compare runs.