# Traceability for Agentic AI 
With Weights & Biases (W&B) Weave

## Content

* [Overview](#Overview)
* [Software Components](#Software-components)
* [Key Functionality](#Key-functionality)
* [How it Works](#How-it-works)
* [Key Components](#Key-Components)
* [Prerequisites](#Prerequisites)
* [Deployment - hands on starts here](#Deployment)
* [Getting API Keys](#Getting-API-keys)
* [Docker Compose Check](#Docker-compose-check)
* [Clone the Repository & Set Up Environment](#Clone-the-Repository-&-Set-Up-Environment)
* [Build the Docker Containers](#Build-the-Docker-containers)
* [Ingest Data](#Ingest-Data)
* [Exposing the Interface for Testing](#Exposing-the-Interface-for-Testing)
* [Navigate to W&B to see your traces](#Navigate-to-W&B-to-see-your-traces)
* [Next Steps](#Next-Steps)


## Overview

This blueprint showcases an AI virtual assistant built with NVIDIA NIM microservices (https://build.nvidia.com/nim) and developed with W&B Weave (https://wandb.ai/site/weave).

This blueprint is a reference solution for a text-based virtual assistant. Companies are eager to enhance their customer service operations by integrating knowledge bases into AI assistants. Traditional approaches often fall short in delivering a combination of context-aware, secure, and real-time responses to complex customer queries. This results in longer resolution times, limited customer satisfaction, and potential data exposure risks. A centralized knowledge base that integrates seamlessly with internal applications and call center tools is vital to improving customer experience while ensuring data governance.

The AI virtual assistant for customer service NVIDIA AI Blueprint, powered by NVIDIA NeMo Retriever™ and NVIDIA NIM™ microservices, along with retrieval-augmented generation (RAG), offers a streamlined solution for enhancing customer support. It implements context-aware, multi-turn conversations that feature general and personalized Q&A responses based on structured and unstructured data, such as order history and product details.

This notebook introduces the key components of the blueprint and walks you through its architecture and deployment step by step. Note that this walkthrough is specific to the Docker Compose deployment. If you visit the [code repository](https://github.com/wandb/ai-virtual-assistant), you will find additional information and instructions for other forms of deployment (e.g. Helm chart deployment).

## Software Components

- NVIDIA NIM microservices
    - Response Generation (Inference)
        - NIM of meta/llama-3.1-70b-instruct
        - NIM of nvidia/nv-embedqa-e5-v5
        - NIM of nvidia/rerank-qa-mistral-4b
    - Synthetic Data Generation for reference
        - NIM of Nemotron4-340B
- Orchestrator Agent - LangGraph based
- Text Retrievers - LangChain
- Structured Data (CSV) Ingestion - Postgres Database
- Unstructured Data (PDF) Ingestion - Milvus Database (Vector GPU-optimized)
- W&B Weave - Observability & Iteration Platform

Docker Compose scripts are provided which spin up the microservices on a single node. When ready for a larger-scale deployment, you can use the included Helm charts to spin up the necessary microservices. You will use sample Jupyter notebooks with the JupyterLab service to interact with the code directly.

The blueprint contains sample use-case data covering a retail product catalog and customer data with purchase history. Developers can build upon this blueprint by customizing the RAG application to their specific use case. A sample customer service agent user interface and an API-based analytics server for conversation summaries and sentiment are also included.

## Key Functionality

- Personalized Responses: Handles structured and unstructured customer queries (e.g., order details, spending history).
- Multi-Turn Dialogue: Offers context-aware, seamless interactions across multiple questions.
- Custom Conversation Style: Adapts text responses to reflect corporate branding and tone.
- Sentiment Analysis: Analyzes real-time customer interactions to gauge sentiment and adjust responses.
- Multi-Session Support: Allows for multiple user sessions with conversation history and summaries.
- Data Privacy: Integrates with on-premises or cloud-hosted knowledge bases to protect sensitive data.

By integrating NVIDIA NIM and RAG, the system empowers developers to build customer support solutions that can provide faster and more accurate support while maintaining data privacy.

## How it works

This blueprint uses a combination of retrieval-augmented generation and large language models to deliver an intelligent, context-aware virtual assistant for customer service. It connects to both structured data (like customer profiles and order histories) and unstructured data (like product manuals, FAQs) so that it can find and present relevant information in real time.

The process works as follows:

- User Query: The customer asks a question in natural language.
- Data Retrieval: The system retrieves relevant data—such as support documents or order details—by embedding and searching through internal databases, product manuals, and FAQs.
- Contextual Reasoning: A large language model uses these retrieved details to generate a helpful, coherent, and contextually appropriate response.
- Additional Capabilities: Tools like sentiment analysis gauge the user’s satisfaction and conversation summaries help supervisors quickly review interactions.
- Continuous Improvement: Feedback from interactions is fed back into the system, refining the model’s accuracy and efficiency over time.

The end result is a virtual assistant that can understand complex questions, find the right information, and provide personalized, human-like responses.
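
The retrieve-then-generate steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the blueprint's implementation: keyword overlap stands in for the embedding and reranking NIMs, no LLM is called, and `DOCS`, `tokenize`, `retrieve`, and `build_prompt` are hypothetical names.

```python
# Toy sketch of the retrieve-then-generate flow (not the blueprint's code).
# Keyword overlap stands in for embedding search; all names are hypothetical.

DOCS = [
    "To reset the router, hold the power button for ten seconds.",
    "Orders can be returned within 30 days of delivery.",
    "The warranty covers manufacturing defects for one year.",
]


def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def retrieve(query, docs, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_words & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, context):
    """Assemble the prompt text that would be sent to the LLM."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


query = "Can orders be returned after delivery?"
context = retrieve(query, DOCS)
prompt = build_prompt(query, context)
print(prompt)
```

In the real pipeline, the retrieval step would query the vector and SQL stores and the prompt would go to the LLM NIM; the shape of the flow is what this sketch is meant to show.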

### Key Components

The detailed architecture consists of the following components:

**Sample Data** The blueprint comes with synthetic sample data representing a typical customer service function, including customer profiles, order histories (structured data), and technical product manuals (unstructured data). A notebook is provided to guide users on how to ingest both structured and unstructured data efficiently.

- Structured Data: Includes customer profiles and order history
- Unstructured Data: Ingests product manuals, product catalogs, and FAQs

**AI Agent** This reference solution implements three sub-agents using the open-source LangGraph framework. These sub-agents address common customer service tasks for the included sample dataset. They rely on the Llama 3.1 70B model and NVIDIA NIM microservices for generating responses, converting natural language into SQL queries, and assessing the sentiment of the conversation.

**Structured Data Retriever** Works in tandem with a Postgres database and Vanna.AI to fetch relevant data based on user queries.
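
As a rough illustration of the structured retrieval step, the sketch below runs a query against an in-memory SQLite database. It rests on several assumptions: `sqlite3` stands in for Postgres, the SQL string is hard-coded where the blueprint would generate it from the user's question, and the `orders` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the Postgres database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "router", 120.0), (1, "cable", 15.5), (2, "laptop", 999.0)],
)

# In the blueprint, a query like this would be produced from natural language,
# e.g. "How much has customer 1 spent?". Here it is written by hand.
generated_sql = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"
total_spent = conn.execute(generated_sql, (1,)).fetchone()[0]
conn.close()

print(total_spent)
```

The query result is what would be handed back to the agent as context for the final answer.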

**Unstructured Data Retriever** Processes unstructured data (e.g., PDFs, FAQs) by chunking it, creating embeddings using the NeMo Retriever embedding NIM, and storing it in Milvus for fast retrieval.
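
The chunking step can be sketched as follows. The source does not state which splitter or parameters the blueprint uses, so the character-based windows, the `chunk_size` and `overlap` defaults, and the function name `chunk_text` are all assumptions; embedding and storage in Milvus are omitted.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap means text cut at a chunk boundary still appears whole in the neighboring chunk, which tends to help retrieval at the cost of some duplicated storage.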

**Analytics and Admin Operations** To support operational requirements, the blueprint includes reference code for managing key administrative tasks:

- Storing conversation histories
- Generating conversation summaries
- Conducting sentiment analysis on customer interactions

These features ensure that customer service teams can efficiently monitor and evaluate interactions for quality and performance.

**Data Flywheel** The blueprint includes a robust set of APIs, some of which are explicitly designed for feedback collection (identified by 'feedback' in their URLs). These APIs support the process of gathering data for continuous model improvement, forming a feedback loop or 'data flywheel.' While this feedback loop enables refinement of the model's performance over time to improve accuracy and cost-effectiveness, the APIs themselves do not perform the model fine-tuning.

**Summary** In summary, this NVIDIA AI Blueprint offers a comprehensive solution for building intelligent, generative AI-powered virtual assistants for customer service, leveraging structured and unstructured data to deliver personalized and efficient support. It includes all necessary tools and guidance to deploy, monitor, and continually improve the solution in real-world environments.

![Blueprint Diagram](https://github.com/wandb/ai-virtual-assistant/raw/main/docs/imgs/weights-biases-architecture-diagram.png)

## Prerequisites

### Docker compose

#### System requirements

Ubuntu 20.04 or 22.04 based machine, with sudo privileges

Install software requirements:
- Install Docker Engine and Docker Compose. Refer to the instructions for Ubuntu.
- Ensure the Docker Compose plugin version is 2.29.1 or higher.
    - Run `docker compose version` to confirm.
    - Refer to Install the Compose plugin in the Docker documentation for more information.
- To configure Docker for GPU-accelerated containers, install the NVIDIA Container Toolkit.
- Install git.

By default, the provided configurations use GPU-optimized databases such as Milvus.


# Deployment

## Getting API Keys - Very Important

To run the pipeline, you need API keys from NVIDIA and W&B. You will use them in a later step to set up the environment.

- Required API Keys: These keys are required by the pipeline to execute LLM queries.

- NVIDIA API Catalog
  1. Navigate to **[NVIDIA API Catalog](https://build.nvidia.com/explore/discover)**.
  2. Select any model, such as llama-3.3-70b-instruct.
  3. On the right panel above the sample code snippet, click on "Get API Key". This will prompt you to log in if you have not already.
     
- W&B API Key
  1. Navigate to **[W&B](https://app.wandb.ai/login?signup=true)** and create an account / log in.
  2. Once you have created an account, navigate to **[the API Key page](https://wandb.ai/authorize)**.
  3. Use this key to set `WANDB_API_KEY`.

NOTE: The NVIDIA API key starts with nvapi- and ends with a 32-character string. You can also generate an NVIDIA API key from the user settings page in NGC (https://ngc.nvidia.com/).


Export API Keys

In [None]:
import os
from getpass import getpass

# Prompt without echoing the keys into the notebook output
NVIDIA_API_KEY = getpass("Please enter your NVIDIA API key (nvapi-): ")
WANDB_API_KEY = getpass("Please enter your W&B API key: ")

# The same NVIDIA key is reused for NGC registry access
NGC_API_KEY = NVIDIA_API_KEY

os.environ["NVIDIA_API_KEY"] = NVIDIA_API_KEY
os.environ["NGC_CLI_API_KEY"] = NGC_API_KEY
os.environ["NGC_API_KEY"] = NGC_API_KEY
os.environ["WANDB_API_KEY"] = WANDB_API_KEY
# Default project; change it to store traces in another project
os.environ["WANDB_PROJECT"] = "nv-ai-virtual-assistant"
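
Before moving on, it can help to confirm that none of the keys was left blank. The following is a minimal sketch; the helper name `missing_vars` is hypothetical, and the variable list simply mirrors the names exported in the cell above.

```python
import os

# Names exported in the previous cell
REQUIRED_VARS = ("NVIDIA_API_KEY", "NGC_API_KEY", "NGC_CLI_API_KEY", "WANDB_API_KEY")


def missing_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


print(missing_vars())
```

An empty list means all four variables are set; any names printed should be re-entered before launching the containers.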

## Docker Compose check
Ensure the Docker Compose plugin version is 2.29.1 or higher.

In [None]:
# Confirm the installed Docker Compose plugin version
!docker compose version
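
The version check can also be done programmatically. The sketch below assumes the command prints a line containing a dotted version such as `v2.29.1`; the helper names `parse_version` and `compose_ok` are hypothetical. Comparing integer tuples avoids the trap of string comparison, where `2.3.3` would sort after `2.29.1`.

```python
import re

MIN_COMPOSE = (2, 29, 1)  # minimum version stated in this notebook


def parse_version(output):
    """Extract (major, minor, patch) from version text, or None if absent."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    return tuple(int(part) for part in match.groups()) if match else None


def compose_ok(output, minimum=MIN_COMPOSE):
    """Return True when the parsed version is at least the minimum."""
    version = parse_version(output)
    return version is not None and version >= minimum


print(compose_ok("Docker Compose version v2.29.1"))
```

To check the live installation, the text passed to `compose_ok` could come from `subprocess.run(["docker", "compose", "version"], capture_output=True, text=True).stdout`.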

## Clone the Repository & Set Up Environment

In [None]:
#  Clone the Repository
!git clone https://github.com/wandb/ai-virtual-assistant

The code snippet below ensures that the notebook is operating within the "ai-virtual-assistant" directory. If it is not, it changes to that directory.

In [None]:
import os

# Change into the cloned repository unless we are already inside it
if os.path.basename(os.getcwd()) != "ai-virtual-assistant":
    os.chdir("ai-virtual-assistant")

os.getcwd()

Log in to the NGC container registry (nvcr.io) with your NGC API key.

In [None]:
!docker login nvcr.io -u '$oauthtoken' -p $NGC_API_KEY

## Build the Docker containers

Launch the containers with the following command:

In [None]:
%%bash
docker compose -f deploy/compose/docker-compose.yaml up -d

In [None]:
%%bash
## Ensure the containers are spun up and look healthy
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
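
To spot problem containers without reading the table by eye, the status text can be parsed. This sketch assumes tab-separated output from `docker ps -a --format "{{.Names}}\t{{.Status}}"` (no `table` prefix, so there is no header row). The helper name `not_ready_containers` and the container names in the sample are hypothetical.

```python
def not_ready_containers(ps_output):
    """Return (name, status) pairs for containers that are not up and ready."""
    problems = []
    for line in ps_output.strip().splitlines():
        name, _, status = line.partition("\t")
        running = status.startswith("Up")
        # Docker appends "(unhealthy)" or "(health: starting)" when a
        # health check is defined and has not passed
        flagged = "unhealthy" in status or "health: starting" in status
        if not running or flagged:
            problems.append((name, status))
    return problems


sample = (
    "milvus\tUp 3 minutes (healthy)\n"
    "agent\tUp 10 seconds (health: starting)\n"
    "api\tExited (1) 2 minutes ago\n"
    "postgres\tUp 3 minutes"
)
print(not_ready_containers(sample))
```

Containers still in `health: starting` usually just need more time; an `Exited` status is the one worth investigating with `docker logs`.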

## Download data

Run this script to download the manuals listed in the specified text file into the `data/manuals_pdf` folder.

In [None]:
%%bash
# Ingest data - download data
./data/download.sh ./data/list_manuals.txt
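
After the script finishes, a quick check that PDFs actually landed in the target folder can save a failed ingestion run later. This is a minimal sketch; the helper name `list_pdfs` is hypothetical, and the default path is the `data/manuals_pdf` folder named above, relative to the repository root.

```python
from pathlib import Path


def list_pdfs(folder="data/manuals_pdf"):
    """Return sorted names of PDF files found directly inside folder."""
    path = Path(folder)
    if not path.is_dir():
        return []
    return sorted(p.name for p in path.iterdir() if p.suffix.lower() == ".pdf")


pdfs = list_pdfs()
print(f"Found {len(pdfs)} PDF file(s)")
```

If the count is zero, re-run the download cell and check its output for network errors before continuing.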

## Ingest data

Open the Jupyter notebook `./ai-virtual-assistant/notebooks/ingest_data.ipynb` and run through the cells (Shift + Enter) to ingest the structured and unstructured data.

**NOTE:** The first cell in `ingest_data.ipynb` requires you to enter the IP address of the host machine. If the machine is spun up with a default container on Brev, this should be the default Docker IP: 172.17.0.1

## Exposing the Interface for Testing

The blueprint comes equipped with a basic UI for testing the deployment. This interface is served on port 3001. To expose the port and try out the interaction, follow the steps below.

First, navigate back to the created Launchable instance page and click on the Access menu.


![Access Menu](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-cli-install.png)


Scroll down until you find the "Using Tunnels" section and click on the Share a Service button.


![Using Tunnels](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-tunnels.png)


Enter the port 3001, as that is where the UI service endpoint is. Confirm with Done. Then click on Edit Access and make the port public:


![Share Access](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/brev-share-access.png)


After this, clicking on the link should open the UI in your browser. You are free to interact with the assistant and ask it about the data that was ingested.


![AI Virtual Assistant Interface](https://github.com/NVIDIA-AI-Blueprints/ai-virtual-assistant/raw/main/docs/imgs/ai-virtual-assistant-interface.png)

## Navigate to W&B to see your traces

![Traces](https://github.com/wandb/ai-virtual-assistant/raw/main/docs/imgs/wandb-traces.png)

While interacting with the blueprint, you will be able to see the traces associated with the responses generated by the assistant in W&B [Weave](https://wandb.ai/). By default, these get stored under your default team and project `nv-ai-virtual-assistant`. If you'd like to modify this, you can change the environment variable `WANDB_PROJECT` with the syntax `{team}/{project}` such as `my-team/nv-ai-assistant`. 
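
The project naming convention can be sketched as below. How the blueprint itself reads `WANDB_PROJECT` is not shown here, so `resolve_project`, `split_project`, and `init_tracing` are hypothetical helpers. The only Weave call used is `weave.init`, which takes the project string; it needs the `weave` package and a valid `WANDB_API_KEY`, so it is imported lazily and not called in this sketch.

```python
import os

DEFAULT_PROJECT = "nv-ai-virtual-assistant"  # default named in this notebook


def resolve_project(env=None):
    """Return the Weave project string, falling back to the default."""
    env = os.environ if env is None else env
    return env.get("WANDB_PROJECT", "").strip() or DEFAULT_PROJECT


def split_project(project):
    """Split '{team}/{project}' into (team, project); team is None if omitted."""
    team, sep, name = project.partition("/")
    return (team, name) if sep else (None, project)


def init_tracing(env=None):
    """Start Weave tracing for the resolved project (requires `pip install weave`)."""
    import weave  # imported lazily so the helpers above work without it

    return weave.init(resolve_project(env))


print(split_project(resolve_project({"WANDB_PROJECT": "my-team/nv-ai-assistant"})))
```

Once `weave.init` has been called, functions decorated with `@weave.op()` are logged as traces under that project.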

To start evaluating and iterating on your application, follow W&B's [quickstart](https://weave-docs.wandb.ai/tutorial-eval) on evaluations, using the virtual assistant's traces.

![Weave Evals Diagram](https://github.com/wandb/ai-virtual-assistant/raw/main/docs/imgs/weave-evals.png)

You can further iterate on your application via W&B Weave Playgrounds, allowing you to replay and tune your backing LLM to answer the way your users would prefer.

![Weave Playground](https://github.com/wandb/ai-virtual-assistant/raw/main/docs/imgs/weave-playground.png)


## Next Steps

To run this project in your own environment, visit our [GitHub repository README](https://github.com/wandb/ai-virtual-assistant) for:
   - Detailed setup instructions
   - Configuration options
   - Advanced customization
   - Troubleshooting guides

The README contains all the necessary information to get the Traceability for Agentic AI with Weights & Biases (W&B) Weave running in your environment, including:
- System requirements
- Environment setup
- API key configuration
- Deployment options
- Troubleshooting 