# Snowflake Observability Quickstart

In this quickstart, we'll show how to build a RAG application with the full Snowflake stack, including Cortex LLM Functions, Cortex Search, and TruLens observability.

In addition, we'll show how to run TruLens feedback functions with Cortex as the backend, and how to log TruLens traces and evaluation metrics to a Snowflake table.

## Setup

First, we'll install the packages we need:

In [None]:
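# pip install snowflake-snowpark-python
# pip install notebook
# pip install snowflake-ml-python
# pip install trulens-eval
# pip install snowflake-sqlalchemy
# Also imported later in this notebook:
# pip install python-dotenv nest_asyncio
# pip install llama-index llama-index-readers-github llama-index-embeddings-huggingface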

Then we can load our credentials and set up our Snowflake connection:

In [None]:
import os

from dotenv import load_dotenv
from snowflake.snowpark.session import Session

# Load credentials from a local .env file into the environment
load_dotenv()

connection_details = {
    'account':  os.environ["SNOWFLAKE_ACCOUNT"],
    'user': os.environ["SNOWFLAKE_USERNAME"],
    'password': os.environ["SNOWFLAKE_PASSWORD"],
    'role': os.environ["SNOWFLAKE_ROLE"],
    'database': os.environ["SNOWFLAKE_DATABASE"],
    'schema': os.environ["SNOWFLAKE_SCHEMA"],
    'warehouse': os.environ["SNOWFLAKE_WAREHOUSE"]
}

session = Session.builder.configs(connection_details).create()
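
Because `os.environ[...]` raises a `KeyError` on the first missing variable, it can help to check all of them up front. The following is a minimal sketch, assuming the same variable names as the connection settings above; `missing_snowflake_vars` is a hypothetical helper, not part of Snowpark.

```python
import os

# Environment variables read by the connection settings above.
REQUIRED_VARS = [
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_USERNAME",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ROLE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
    "SNOWFLAKE_WAREHOUSE",
]


def missing_snowflake_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_snowflake_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Running this before building the session reports every missing setting at once instead of failing on the first one.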

## Using Cortex Complete

With the session set, we have what we need to call a Snowflake Cortex LLM:

In [None]:
from snowflake.cortex import Complete

text = """
    The Snowflake company was co-founded by Thierry Cruanes, Marcin Zukowski,
    and Benoit Dageville in 2012 and is headquartered in Bozeman, Montana.
"""

print(Complete("llama2-70b-chat", "how do snowflakes get their unique patterns?"))

## Cortex Search

Next, we'll turn to the retrieval component of our RAG and set up Cortex Search.

This requires three steps:

1. Read and preprocess unstructured documents.
2. Embed the cleaned documents with Arctic Embed.
3. Call the Cortex Search service.

### Read and preprocess unstructured documents

For this example, we want to load Cortex Search with documentation from GitHub about a popular open-source library, Streamlit. To do so, we'll use a GitHub data loader available from LlamaHub.

Here we'll also expend some effort to clean up the text so we can get better search results.

In [None]:
import nest_asyncio
nest_asyncio.apply()
from llama_index.readers.github import GithubRepositoryReader, GithubClient

github_token = os.environ["GITHUB_TOKEN"]
github_client = GithubClient(github_token=github_token, verbose=False)

reader = GithubRepositoryReader(
    github_client=github_client,
    owner="streamlit",
    repo="docs",
    use_parser=False,
    verbose=True,
    filter_directories=(
        ["content"],
        GithubRepositoryReader.FilterType.INCLUDE,
    ),
    filter_file_extensions=(
        [".md"],
        GithubRepositoryReader.FilterType.INCLUDE,
    )
)

documents = reader.load_data(branch="main")

import re

def clean_up_text(content: str) -> str:
    """
    Remove unwanted characters and patterns in text input.

    :param content: Text input.
    
    :return: Cleaned version of original text input.
    """

    # Fix hyphenated words broken by newline
    content = re.sub(r'(\w+)-\n(\w+)', r'\1\2', content)

    unwanted_patterns = ['---\nvisible: false','---', '#','slug:']
    for pattern in unwanted_patterns:
        content = re.sub(pattern, "", content)

    # Remove all slugs starting with a \ and stopping at the first space
    content = re.sub(r'\\slug: [^\s]*', '', content)

    # normalize whitespace
    content = re.sub(r'\s+', ' ', content)
    return content

cleaned_documents = []

for d in documents:
    cleaned_text = clean_up_text(d.text)
    d.text = cleaned_text
    cleaned_documents.append(d)
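
The pattern list above removes front-matter markers one token at a time. An alternative, shown here only as a sketch, is to drop a leading front-matter block in one pass before the other substitutions. It assumes each Markdown file starts with a block delimited by `---` lines, and it discards everything inside that block (including any title), so it only fits if that metadata is not wanted in the search index. `strip_front_matter` is a hypothetical helper.

```python
import re

# Matches a front-matter block at the very start of the text:
# an opening '---' line, any content (lazily), and a closing '---' line.
FRONT_MATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def strip_front_matter(text: str) -> str:
    """Remove a leading '---' delimited block, if present."""
    return FRONT_MATTER.sub("", text, count=1)


sample = "---\ntitle: Install\nslug: /get-started\n---\n# Install Streamlit\nRun pip."
print(strip_front_matter(sample))
```

Text without a leading block passes through unchanged, so the helper can be applied to every document.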

### Embed the preprocessed documents

We'll use Snowflake's Arctic Embed model, available from Hugging Face, to embed the documents. We'll also use LlamaIndex's `SemanticSplitterNodeParser` to split them into chunks.

In [None]:

from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core.node_parser import SemanticSplitterNodeParser

embed_model = HuggingFaceEmbedding("Snowflake/snowflake-arctic-embed-m")

splitter = SemanticSplitterNodeParser(
    buffer_size=1, breakpoint_percentile_threshold=85, embed_model=embed_model
)

With the embed model and splitter in place, we can run them in an ingestion pipeline:

In [None]:
from llama_index.core.ingestion import IngestionPipeline

cortex_search_pipeline = IngestionPipeline(
    transformations=[
        splitter,
    ],
)

results = cortex_search_pipeline.run(show_progress=True, documents=cleaned_documents)

import numpy as np

# Rough proxy: 512 tokens is approximately 385 English words
oversized = np.mean([len(curr.text.split()) > 385 for curr in results])
print(f"Approximate proportion of chunks larger than 512 tokens: {oversized}")
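
If that proportion is high, one option is to split oversized chunks further before loading them. The following is a minimal sketch, assuming the same rough budget of 385 words per chunk; `split_by_words` is a hypothetical helper that splits on whitespace, not on real token boundaries, so it ignores the semantic breakpoints found by the splitter.

```python
def split_by_words(text: str, max_words: int = 385) -> list:
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[start:start + max_words])
        for start in range(0, len(words), max_words)
    ]


chunk = "word " * 800  # stand-in for an oversized chunk
pieces = split_by_words(chunk)
print([len(p.split()) for p in pieces])
```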

### Load data to Cortex Search

Now that we've processed our documents into chunks, we're ready to load them into Snowflake for Cortex Search.

Here we can use the same connection details as we set up for Cortex Complete.

In [None]:
import os
import snowflake.connector
from tqdm.auto import tqdm

conn = snowflake.connector.connect(
    user=connection_details["user"],
    password=connection_details["password"],
    account=connection_details["account"],
    warehouse=connection_details["warehouse"],
    database=connection_details["database"],
    schema=connection_details["schema"]
)

cursor = conn.cursor()
cursor.execute("CREATE OR REPLACE TABLE streamlit_docs(doc_text VARCHAR)")
for curr in tqdm(results):
    # Bind the chunk text as a query parameter rather than formatting it into the SQL
    cursor.execute("INSERT INTO streamlit_docs VALUES (%s)", (curr.text,))
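
The retriever in the next section expects a Cortex Search service to already exist over this table, a step the notebook does not show. Below is a hedged sketch of the SQL, based on Snowflake's `CREATE CORTEX SEARCH SERVICE` statement. The service name `streamlit_docs_search`, the warehouse placeholder `MY_WAREHOUSE`, and the `TARGET_LAG` value are assumptions, and `build_search_service_sql` is a hypothetical helper; check the statement against the current Snowflake documentation before running it.

```python
def build_search_service_sql(
    service_name: str,
    warehouse: str,
    table: str = "streamlit_docs",
    column: str = "doc_text",
    target_lag: str = "1 hour",
) -> str:
    """Build a CREATE CORTEX SEARCH SERVICE statement for one text column.

    Identifiers are interpolated as-is, so pass only trusted values.
    """
    return (
        f"CREATE OR REPLACE CORTEX SEARCH SERVICE {service_name}\n"
        f"  ON {column}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        f"  AS (SELECT {column} FROM {table})"
    )


sql = build_search_service_sql("streamlit_docs_search", "MY_WAREHOUSE")
print(sql)
# To run it with the connection opened above:
# conn.cursor().execute(sql)
```

The service name chosen here is the value the retriever would read from `SNOWFLAKE_CORTEX_SEARCH_SERVICE`.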

### Call the Cortex Search Service

Here we'll create a `CortexSearchRetriever` class that connects to our Cortex Search service and exposes a `retrieve` method for querying it.

In [None]:
import os
from snowflake.core import Root

class CortexSearchRetriever:

    def __init__(self, session = session, limit_to_retrieve: int = 4):
        self.session = session
        self._limit_to_retrieve = limit_to_retrieve
    
    def retrieve(self, query: str):
        root = Root(self.session)
        cortex_search_service = root.databases[
                os.environ["SNOWFLAKE_DATABASE"]].schemas[
                    os.environ["SNOWFLAKE_SCHEMA"]].cortex_search_services[
                        os.environ["SNOWFLAKE_CORTEX_SEARCH_SERVICE"]]
        resp = cortex_search_service.search(
                query=query,
                columns=["doc_text"],
                limit=self._limit_to_retrieve,
            )
        # Return an empty list (not None) when the service finds nothing
        if resp.results:
            return [curr["doc_text"] for curr in resp.results]
        return []

In [None]:
retriever = CortexSearchRetriever(session=session, limit_to_retrieve=4)

retrieved_context = retriever.retrieve(query="How do I launch a streamlit app?")

len(retrieved_context)

## Create a RAG with built-in observability

Now that we've set up the components we need from Snowflake Cortex, we can build our RAG.

We'll do this by creating a custom Python class with each of the methods we need. We'll also add TruLens instrumentation to our app with the `@instrument` decorator.

First, however, we need to set the database connection where we'll log the traces and evaluation results from our application. This gives us a stored record that we can use to understand the app's performance. The connection is set when initializing `Tru`.

In [None]:
from trulens_eval import Tru


db_url = "snowflake://{user}:{password}@{account}/{dbname}/{schema}?warehouse={warehouse}&role={role}".format(
    user=os.environ["SNOWFLAKE_USERNAME"],
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    dbname=os.environ["SNOWFLAKE_DATABASE"],
    schema=os.environ["SNOWFLAKE_SCHEMA"],
    warehouse=os.environ["SNOWFLAKE_WAREHOUSE"],
    role=os.environ["SNOWFLAKE_ROLE"],
)

tru = Tru(database_url=db_url)
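
One caveat with building the URL by string formatting: a password containing characters such as `@` or `/` can break URL parsing. The following is a minimal sketch that percent-encodes the credentials with the standard library first; `build_snowflake_url` is a hypothetical helper and the values shown are placeholders.

```python
from urllib.parse import quote_plus


def build_snowflake_url(user, password, account, dbname, schema, warehouse, role):
    """Build a snowflake:// URL with percent-encoded user and password."""
    return (
        f"snowflake://{quote_plus(user)}:{quote_plus(password)}@{account}"
        f"/{dbname}/{schema}?warehouse={warehouse}&role={role}"
    )


example_url = build_snowflake_url(
    user="alice", password="p@ss/word", account="my_account",
    dbname="MY_DB", schema="MY_SCHEMA", warehouse="MY_WH", role="MY_ROLE",
)
print(example_url)
```

The resulting string can be passed to `Tru(database_url=...)` in place of the formatted URL above.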

In [None]:
session.close()

In [None]:
new_session = Session.builder.configs(connection_details).create()

Now we can construct the RAG.

In [None]:
from trulens_eval.tru_custom_app import instrument

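class RAG_from_scratch:

    def __init__(self):
        self.retriever = CortexSearchRetriever(session=new_session, limit_to_retrieve=4)

    # Named retrieve_context so the feedback selectors
    # (Select.RecordCalls.retrieve_context) can find this call.
    @instrument
    def retrieve_context(self, query: str) -> list:
        """
        Retrieve relevant text from vector store.
        """
        # Return the retrieved chunks so they can be passed to the LLM
        return self.retriever.retrieve(query)

    @instrument
    def generate_completion(self, query: str, context_str: list) -> str:
        """
        Generate answer from context.
        """
        # Include the retrieved context in the prompt so the answer can draw on it
        prompt = (
            "Answer the question using only the context provided.\n"
            f"Context: {context_str}\n"
            f"Question: {query}\n"
            "Answer:"
        )
        return Complete("mistral-large", prompt)

    @instrument
    def query(self, query: str) -> str:
        context_str = self.retrieve_context(query)
        return self.generate_completion(query, context_str)

rag = RAG_from_scratch()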

In [None]:
rag.query("how do I launch a streamlit app?")

In [None]:
from trulens_eval.feedback.provider.cortex import Cortex
from trulens_eval.feedback import Feedback
from trulens_eval import Select
import numpy as np

provider = Cortex("mistral-large")

f_groundedness = Feedback(
    provider.groundedness_measure_with_cot_reasons, name="Groundedness"
).on(Select.RecordCalls.retrieve_context.rets[:].collect()).on_output()

f_context_relevance = Feedback(
    provider.context_relevance,
    name="Context Relevance"
).on_input().on(
    Select.RecordCalls.retrieve_context.rets[:]
).aggregate(np.mean)

f_answer_relevance = Feedback(
    provider.relevance,
    name="Answer Relevance"
).on_input().on_output().aggregate(np.mean)

feedbacks = [
    f_context_relevance,
    f_answer_relevance,
    f_groundedness,
]
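
To make the `.aggregate(np.mean)` step concrete: context relevance is evaluated once per retrieved chunk, and the per-chunk scores are then averaged into one value for the record. The sketch below imitates that flow with a toy word-overlap scorer. `toy_relevance` is purely illustrative and is not how the Cortex provider scores relevance; `statistics.mean` stands in for `np.mean` to keep the example free of dependencies.

```python
from statistics import mean


def toy_relevance(question: str, chunk: str) -> float:
    """Fraction of question words that also appear in the chunk (0.0 to 1.0)."""
    q_words = set(question.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / len(q_words) if q_words else 0.0


question = "launch streamlit app"
chunks = [
    "run streamlit run app.py to launch your app",
    "snowflakes form unique patterns",
]

# One score per retrieved chunk, then a single aggregated score,
# analogous to selecting rets[:] and then calling .aggregate(np.mean).
scores = [toy_relevance(question, c) for c in chunks]
print(scores, mean(scores))
```

Groundedness, by contrast, uses `.collect()` so that all retrieved chunks are passed to the feedback function together as one list.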