# Evaluate gen AI apps with Snowflake Cortex AI and TruLens
This notebook demonstrates how AI Observability in Snowflake Cortex AI helps quantitatively measure the performance of a RAG application using different LLMs, providing insights into application behavior and helping the user select the best model for their use case.

### Required Packages
* trulens-core (1.4.5 or above)
* trulens-connectors-snowflake (1.4.5 or above)
* trulens-providers-cortex (1.4.5 or above)
* snowflake.core (1.0.5 or above)


## Session Information
Fetches the current session information and the connection details for the Snowflake account. These connection details will be used to ingest application traces and trigger metric computation jobs.

In [None]:
from snowflake.snowpark.context import get_active_session

snowpark_session = get_active_session()

## Environment Variables

Sets the environment variables to use OpenTelemetry for generated traces. This step is mandatory to trace and evaluate the application.

In [None]:
import os

os.environ["TRULENS_OTEL_TRACING"] = "1"

In [None]:
from trulens.connectors.snowflake import SnowflakeConnector

sf_connector = SnowflakeConnector(snowpark_session=snowpark_session)

## RAG Application
Defines the RAG application with retrieval and generation steps. Here, instead of invoking the application with LLM generation, we query existing data in a Snowflake table, ingest it into the event table, and run evaluation metrics on that existing data.

The example schema used is shown below:
```
create table YOUR_TABLE_NAME (
    query_string VARCHAR,
    output_string VARCHAR, 
    contexts VARCHAR
    );
```
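
As a hedged illustration of how one row fits this schema, the sketch below builds a row in plain Python and joins the retrieved contexts into a single comma-separated string, which is the format the app in this section expects when it splits the `contexts` column. The helper name `make_row` and the sample values are hypothetical and not part of the original notebook; rejecting contexts that contain a comma is an assumption that follows from the split-on-comma parsing.

```python
def make_row(query: str, output: str, contexts: list[str]) -> dict[str, str]:
    """Build one row for the example schema (hypothetical helper)."""
    for context in contexts:
        # A comma inside a context would be mistaken for a separator later.
        if "," in context:
            raise ValueError(f"context contains a comma: {context!r}")
    return {
        "QUERY_STRING": query,
        "OUTPUT_STRING": output,
        "CONTEXTS": ", ".join(contexts),
    }


# Made-up sample data for illustration only.
row = make_row(
    "What is Snowflake?",
    "Snowflake is a cloud data platform.",
    ["Snowflake is a cloud data platform", "It separates storage and compute"],
)
```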

In [None]:
from trulens.core.otel.instrument import instrument
from trulens.otel.semconv.trace import SpanAttributes


class TestApp:
    def __init__(self, snowflake_table_name: str):
        self.snowflake_table_name = snowflake_table_name

    @instrument(
        span_type=SpanAttributes.SpanType.RECORD_ROOT,
        attributes={
            SpanAttributes.RECORD_ROOT.INPUT: "query",
            SpanAttributes.RECORD_ROOT.OUTPUT: "return",
        },
    )
    def query(self, query: str) -> str:
        retrieved_contexts = self.get_contexts(query)
        return self.generate_answer(query, retrieved_contexts)

    @instrument(
        span_type=SpanAttributes.SpanType.RETRIEVAL,
        attributes={
            SpanAttributes.RETRIEVAL.QUERY_TEXT: "query",
            SpanAttributes.RETRIEVAL.RETRIEVED_CONTEXTS: "return",
        },
    )
    def get_contexts(self, query: str) -> list[str]:
        # Look up the stored contexts for this query. The CONTEXTS column holds
        # a comma-separated string, which is parsed into a list of strings below.
        # The query text is passed as a bind parameter so that quotes in it
        # cannot break the SQL statement.
        query_result = snowpark_session.sql(
            f"SELECT CONTEXTS FROM {self.snowflake_table_name} WHERE QUERY_STRING = ?",
            params=[query],
        ).collect()

        if not query_result:
            return []

        # Get contexts string from first row
        contexts_str = query_result[0]["CONTEXTS"]

        # Parse comma-separated string into list
        if contexts_str:
            contexts = [context.strip() for context in contexts_str.split(",")]
            return contexts

        return []

    @instrument(
        span_type=SpanAttributes.SpanType.GENERATION,
    )
    def generate_answer(self, query: str, contexts: list[str]) -> str:
        # Return the stored output string for the given query.
        if len(contexts) == 0:
            return "Sorry, I couldn't find an answer to your question."
        # Bind the query text as a parameter so that quotes in it cannot break
        # the SQL statement.
        query_result = snowpark_session.sql(
            f"SELECT OUTPUT_STRING FROM {self.snowflake_table_name} WHERE QUERY_STRING = ?",
            params=[query],
        ).collect()
        answer = query_result[0]["OUTPUT_STRING"] if query_result else None
        if answer:
            return answer
        else:
            return "Did not find an answer."
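
The lookup-and-replay logic of `TestApp` can be tried without a Snowflake session. The following is a minimal sketch, not the notebook's implementation: a plain dictionary named `ROWS` (hypothetical, with made-up sample data) stands in for the table, the function names mirror the methods above, and the TruLens instrumentation is omitted.

```python
# Hypothetical in-memory stand-in for the Snowflake table, keyed by query string.
ROWS = {
    "What is Snowflake?": {
        "OUTPUT_STRING": "Snowflake is a cloud data platform.",
        "CONTEXTS": "Snowflake is a cloud data platform, It separates storage and compute",
    },
}


def get_contexts(query: str) -> list[str]:
    """Return the stored contexts for a query, split on commas."""
    row = ROWS.get(query)
    if not row or not row["CONTEXTS"]:
        return []
    return [context.strip() for context in row["CONTEXTS"].split(",")]


def generate_answer(query: str, contexts: list[str]) -> str:
    """Return the stored answer, or a fallback message if nothing is stored."""
    if not contexts:
        return "Sorry, I couldn't find an answer to your question."
    answer = ROWS.get(query, {}).get("OUTPUT_STRING")
    return answer or "Did not find an answer."


def query_app(query: str) -> str:
    """Replay the retrieval and generation steps for one query."""
    return generate_answer(query, get_contexts(query))
```

Splitting on commas also splits any context that itself contains a comma, so this storage format only suits contexts without commas.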

## App Registration
Registers the app in Snowflake. This creates an EXTERNAL AGENT object that represents the app in the Snowflake account and registers the app under the given app version.

In [None]:
# Create TruLens instrumented app from custom app.

import uuid

from trulens.apps.app import TruApp

APP_NAME = "RAG evaluation run on existing data"
APP_VERSION = "V1"

test_app = TestApp(snowflake_table_name="YOUR_TABLE_NAME")

tru_app = TruApp(
    test_app,
    app_name=APP_NAME,
    app_version=APP_VERSION,
    connector=sf_connector,
    main_method=test_app.query,
)

## Add runs to agent

In [None]:
from trulens.core.run import Run
from trulens.core.run import RunConfig

run_name = f"test_virtual_run_{uuid.uuid4()}"

run_config = RunConfig(
    run_name=run_name,
    dataset_name="VIRTUAL_RUN_TEST",
    source_type="TABLE",
    dataset_spec={
        "RECORD_ROOT.INPUT": "QUERY_STRING",  # column name "QUERY_STRING" is case sensitive
    },
)

run: Run = tru_app.add_run(run_config=run_config)

In [None]:
run.start()

## Run Status Check
Polls the run status until it is no longer "INVOCATION_IN_PROGRESS".

Note: Metric computation cannot be started while the invocation is still in progress. Once the run's status changes to "INVOCATION_COMPLETED", metric computation can be triggered.
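
A bounded version of this wait can be sketched as follows. The helper `wait_for_invocation` and its parameters are hypothetical and not part of TruLens; it only assumes a zero-argument callable that returns the current status string, so `run.get_status` could be passed in. It polls until the status leaves "INVOCATION_IN_PROGRESS" or a timeout expires, rather than looping indefinitely.

```python
import time
from typing import Callable


def wait_for_invocation(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 1.0,
) -> str:
    """Poll get_status until the invocation is no longer in progress."""
    deadline = time.monotonic() + timeout_s
    status = get_status()
    while status == "INVOCATION_IN_PROGRESS":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still in progress after {timeout_s} seconds")
        time.sleep(poll_interval_s)
        status = get_status()
    # Return whatever status ended the wait so the caller can inspect it.
    return status
```

For example, `wait_for_invocation(run.get_status)` would return the first status other than "INVOCATION_IN_PROGRESS", which the caller can compare against "INVOCATION_COMPLETED" before triggering metric computation.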

In [None]:
import time

while run.get_status() == "INVOCATION_IN_PROGRESS":
    time.sleep(1)

## Compute Metrics

Computes the RAG triad metrics (answer relevance, context relevance, and groundedness) for the run to measure the quality of responses in the RAG application.

In [None]:
run.compute_metrics([
    "answer_relevance",
    "context_relevance",
    "groundedness",
])

In [None]:
run.get_status()

## Evaluation Results

To view evaluation results:
* Log in to [Snowsight](https://app.snowflake.com/).
* Navigate to **AI & ML** -> **Evaluations** from the left navigation menu.
* Select “RAG evaluation run on existing data” to view the runs, see detailed traces and compare runs.