# Observability

In [None]:
import importlib.util  # `import importlib` alone does not guarantee that `importlib.util` is loaded

if not importlib.util.find_spec("class_utils"):
    !pip install -qqq git+https://github.com/xtreamsrl/genai-for-engineers-class

## What is Observability?
Application observability is the capability to understand the internal state of a system based on the external outputs it produces, like logs, metrics, and traces. It enables deep insights into application behavior, performance, and health, helping to answer not just *what* is happening but *why* it is happening. 

Application observability has been a hot topic for years, and a mature ecosystem of tools exists to support "normal" applications. For instance, the OpenTelemetry standard and its tooling are supported by the major cloud vendors and by most modern web frameworks.

#### The Three Pillars of Observability

1. **Logs**: Detailed, timestamped records of discrete events. Used for tracking the flow of execution and debugging specific issues.
2. **Metrics**: Quantitative data points, such as CPU usage, memory consumption, request rates, and error rates. Used for monitoring performance and system health.
3. **Traces**: Records of the path a request takes through a distributed system. Used for identifying performance bottlenecks and understanding system dependencies.
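
To make the three pillars concrete, here is a minimal sketch that emits all three signals using only the Python standard library. The names `metrics`, `spans`, `span`, and `handle_request` are hypothetical and exist only for illustration; a real application would normally rely on a library such as OpenTelemetry rather than hand-rolled structures like these.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("pillars_demo")

metrics = Counter()  # metrics: named counters (hypothetical in-memory store)
spans = []           # traces: finished spans, grouped by trace_id


@contextmanager
def span(name, trace_id):
    """Record how long a named step took within a given trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append(
            {"trace_id": trace_id, "name": name, "duration_s": time.perf_counter() - start}
        )


def handle_request(payload):
    trace_id = uuid.uuid4().hex          # one ID ties together all spans of a request
    metrics["requests_total"] += 1       # metric: how many requests we served
    with span("handle_request", trace_id):
        logger.info("request received trace_id=%s", trace_id)  # log: a discrete event
        with span("process", trace_id):
            result = payload.upper()
    return result


handle_request("hello")
```

After the call, `metrics` holds the request count, `spans` holds one entry per timed step (the inner `process` span finishes first), and the log line carries the same `trace_id`, which is what lets the three signals be correlated.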

#### Benefits of Observability

- **Rapid Troubleshooting**: Quickly identify and resolve issues by understanding their root cause.
- **Performance Optimization**: Detect and address performance bottlenecks.
- **Proactive Monitoring**: Anticipate and mitigate issues before they impact users.
- **Improved Development Practices**: Gain immediate feedback on code changes.
- **Enhanced User Experience**: Ensure high availability and reliability.

#### Implementing Observability

1. **Instrumentation**: Use libraries and tools to collect logs, metrics, and traces (e.g., OpenTelemetry).
2. **Centralized Logging and Monitoring**: Aggregate data using platforms like ELK Stack, Prometheus, Grafana, and Jaeger.
3. **Contextual Data**: Include metadata (e.g., request IDs, user IDs) for better correlation and analysis.
4. **Alerting and Visualization**: Set up alerts and dashboards for real-time monitoring and visualization.
5. **Continuous Improvement**: Regularly review observability practices and refine instrumentation and monitoring strategies.
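
As an illustration of the "contextual data" step, the sketch below attaches a request ID to every log line using `contextvars` and a `logging.Filter` from the standard library. The logger name, the `serve` function, and the `req-42` identifier are assumptions made up for this example.

```python
import contextvars
import io
import logging

# Holds the ID of the request currently being processed ("-" when there is none)
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


stream = io.StringIO()  # in-memory sink so the output is easy to inspect
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s request_id=%(request_id)s %(message)s")
)
handler.addFilter(RequestIdFilter())

ctx_logger = logging.getLogger("contextual_demo")
ctx_logger.setLevel(logging.INFO)
ctx_logger.addHandler(handler)
ctx_logger.propagate = False  # keep the demo output out of the root logger


def serve(request_id, query):
    """Hypothetical request handler: every log line inside carries the request ID."""
    token = request_id_var.set(request_id)
    try:
        ctx_logger.info("retrieving documents for %r", query)
        ctx_logger.info("calling the LLM")
    finally:
        request_id_var.reset(token)  # restore the previous value


serve("req-42", "atomic bomb")
print(stream.getvalue())
```

Because the ID lives in a context variable, the functions that log do not need to receive it as an argument, and filtering the aggregated logs by `request_id` reconstructs the story of a single request.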

# Observability for the GenAI Era
Observability is tricky for GenAI applications due to their inherent complexity and dynamic nature. These applications rely on intricate machine learning models that are often opaque ("black boxes"), making it challenging to interpret their internal states and decision-making processes. The high dimensionality of input data and the stochastic nature of model training and inference further complicate the task of monitoring and debugging. Additionally, GenAI systems typically involve a multitude of interdependent components, such as data pipelines, model serving infrastructure, and distributed computing resources, which generate vast amounts of heterogeneous data. This makes it difficult to correlate logs, metrics, and traces across different layers of the stack, hindering effective root cause analysis and performance optimization.

Fortunately, over the last two years, new tools have emerged that make GenAI applications easier to observe.

# Setup: packages and environment variables

In [None]:
import os
from pprint import pprint

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

from class_utils.data import get_movie_dataset_as_documents
from class_utils.haystack_pipelines import (
    build_indexing_pipline,
    build_prompt_building_pipeline,
    build_openai_rag_pipeline,
)
from langfuse import Langfuse
from langfuse.openai import OpenAI

# Replace each `...` placeholder below with your own credentials before running
os.environ["OPENAI_API_KEY"] = ...
os.environ["TOKENIZERS_PARALLELISM"] = "true"
os.environ["LANGFUSE_SECRET_KEY"] = ...
os.environ["LANGFUSE_PUBLIC_KEY"] = ...
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

Let's run our usual indexing pipeline without any modification. We're not interested in observing it.

In [None]:
documents = get_movie_dataset_as_documents(100)
document_store = QdrantDocumentStore(":memory:", embedding_dim=384)
indexing_pipeline = build_indexing_pipline(document_store)
indexing_pipeline.run({"doc_embedder": {"documents": documents}})

# Prompt Registry

Prompts are a fundamental part of the codebase, and they should be versioned like the rest of it.

However, there are scenarios where you may want to perform A/B testing on prompts over the air or update the prompt without deploying the entire application. An example of this approach can be seen in the Touring app: [Touring](https://touringapp.eu/).

A remote prompt registry offers a great solution for these needs.

A prompt registry functions similarly to a model registry, like the [MLFlow Registry](https://mlflow.org/docs/latest/model-registry.html), but it is specifically designed for prompts. In a prompt registry, you can store and compare different versions of the same prompt, revert to previous versions, and update the deployed prompt without altering the application’s code.

For both the prompt registry and LLM observability, we will use [Langfuse](https://langfuse.com/). However, numerous similar tools are emerging. Notably, the [openllmetry](https://github.com/traceloop/openllmetry) project, which builds on OpenTelemetry, is gaining popularity in this space.

By using a prompt registry, you can efficiently manage, compare, and deploy prompts, ensuring a streamlined and flexible approach to prompt versioning and observability.
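
To illustrate the idea, here is a minimal in-memory sketch of a prompt registry with append-only versions and movable labels. It is not how Langfuse is implemented: the `PromptRegistry` class, its methods, and the template texts are hypothetical and only meant to show the mechanics of versioning, promotion, and rollback.

```python
class PromptRegistry:
    """Hypothetical in-memory registry: versions are append-only, labels point to versions."""

    def __init__(self):
        self._versions = {}  # prompt name -> list of templates (index 0 is version 1)
        self._labels = {}    # (prompt name, label) -> version number

    def push(self, name, template, labels=()):
        """Store a new version and optionally attach labels to it."""
        versions = self._versions.setdefault(name, [])
        versions.append(template)
        version = len(versions)  # version numbers start at 1
        for label in labels:
            self._labels[(name, label)] = version
        return version

    def promote(self, name, version, label):
        """Point a label at an existing version (also used to roll back)."""
        if not 1 <= version <= len(self._versions.get(name, [])):
            raise KeyError(f"{name} has no version {version}")
        self._labels[(name, label)] = version

    def get(self, name, label="production"):
        """Return the template the label currently points to."""
        version = self._labels[(name, label)]
        return self._versions[name][version - 1]


registry = PromptRegistry()
v1 = registry.push(
    "movie-buddy-rag", "Answer using this context: {context}", labels=["production"]
)
v2 = registry.push("movie-buddy-rag", "Be concise. Context: {context}")

print(registry.get("movie-buddy-rag"))  # still serves version 1
registry.promote("movie-buddy-rag", v2, "production")
print(registry.get("movie-buddy-rag"))  # now serves version 2
```

The application only ever asks for a name and a label, so moving the `production` label changes the deployed prompt without touching application code, and rolling back is just another call to `promote`.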

In [None]:
langfuse = Langfuse()
template = langfuse.get_prompt("movie-buddy-rag", label="production").prompt
template

Great, we've sourced our prompt from the prompt registry. Now let's run the pipeline until the full prompt is created.

In [None]:
prompt_building_pipe = build_prompt_building_pipeline(
    QdrantEmbeddingRetriever(document_store), template
)

query = "What film talks about the atomic bomb?"
prompt_builder_output = prompt_building_pipe.run(
    {"embedder": {"text": query}, "prompt_builder": {"question": query}}
)
prompt = prompt_builder_output.get("prompt_builder").get("prompt")
prompt

Then we'll use the `openai` client to send the prompt to the LLM.
Haystack has a native integration with Langfuse, but it is a bit buggy at the moment.

Note how we import the `openai` client ...

In [None]:
openai_client = OpenAI()

chat_completion = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": prompt,
        }
    ],
    model="gpt-3.5-turbo",
)

chat_completion.choices[0].message.content

After running this notebook, check out your account on Langfuse and analyze the traces.
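
To get a feel for what an instrumented client records, here is a conceptual sketch of a tracing wrapper. It is not Langfuse's actual implementation: `traced`, `recorded_calls`, and `fake_llm` are hypothetical names, and the "backend" is just a Python list. The idea is simply that each call's input, output, latency, and any error are captured around the real function.

```python
import functools
import time

recorded_calls = []  # hypothetical stand-in for a tracing backend


def traced(name):
    """Decorator that records input, output, latency, and errors of each call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {"name": name, "input": {"args": args, "kwargs": kwargs}}
            start = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call is recorded
                record["latency_s"] = time.perf_counter() - start
                recorded_calls.append(record)

        return wrapper

    return decorator


@traced("fake-llm")
def fake_llm(prompt):
    # Hypothetical stand-in for a real model call
    return f"echo: {prompt}"


fake_llm("What film talks about the atomic bomb?")
print(recorded_calls[-1]["name"], recorded_calls[-1]["output"])
```

A real tracing client typically adds more context to each record, such as model name and token usage, and ships the records to a server instead of keeping them in memory.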

# Observability with Haystack
Although some data is not logged correctly, let's try instrumenting the full pipeline with Langfuse.
Make sure to enable tracing with the proper environment variable. See the documentation: https://haystack.deepset.ai/integrations/langfuse

In [None]:
from haystack_integrations.components.connectors.langfuse import LangfuseConnector

os.environ["HAYSTACK_CONTENT_TRACING_ENABLED"] = "true"

rag_pipe = build_openai_rag_pipeline(QdrantEmbeddingRetriever(document_store), template)
rag_pipe.add_component("tracer", LangfuseConnector("GenAI Class Tracing Test"))
response = rag_pipe.run(
    {"embedder": {"text": query}, "prompt_builder": {"question": query}}
)
pprint(response)