# HPE AI Essentials Knowledge Base Lab

## 1. Introduction to Knowledge Base

**Knowledge Base** is the integrated framework in HPE AI Essentials Software for building, deploying, and managing secure Retrieval-Augmented Generation (RAG) solutions backed by GPU-optimized large language models (LLMs).

A RAG solution retrieves relevant information from your data sources and passes it to a generative model, which uses it to produce accurate text responses to user queries.

Knowledge Base integrates **retrieval**, **embeddings**, and **generation** to reduce development time for generative AI applications:

- **Retrieval**: Finding and accessing relevant information from the documents stored in the vector database.
- **Embeddings**: Numerically representing text data, capturing the meaning and context of words and sentences in a machine-understandable format.
- **Generation**: Creating new text based on input data and learned patterns.
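To make the three processes concrete, here is a toy sketch. It is not how Knowledge Base works internally: `embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `retrieve` ranks a small in-memory list by cosine similarity instead of querying a vector database. Only the idea carries over: texts become vectors, and retrieval returns the texts whose vectors lie closest to the query's.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse vector of word counts (a stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Retrieval: return the k documents whose vectors are closest to the query's."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "GPU nodes host the embedding model",
    "Weaviate stores document vectors",
    "Access tokens authenticate client applications",
]
print(retrieve("where are vectors stored", docs))
```

Generation would then hand the retrieved text and the query to an LLM. A real embedding model would also relate words such as "stored" and "stores", which this word-count toy treats as unrelated.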

By combining these three processes, Knowledge Base lets LLMs ground their responses in your own data, drawn from internal sources in S3, HPE Ezmeral Data Fabric, and HPE GreenLake for File Storage.
The result is tailored, context-aware responses for your specific use cases.

### Features and Functionality
Knowledge Base simplifies creating and managing RAG solutions through an interface where you can:
- **Securely connect enterprise data sources** to RAG solutions so that responses are accurate and context-aware, reducing the risk of inaccuracies or hallucinations.
- **Manage the RAG workflow**, including data ingestion, retrieval, and automation.
- **Utilize advanced options** such as custom prompts and endpoint APIs for enhanced control and flexibility.
- **Generate access tokens** for secure user and client application access to RAG coordinator endpoints.
- **Experiment** with the built-in playground and session context management features.

### Knowledge Base Architecture

The following diagram shows the key components and workflow in HPE AI Essentials Software Knowledge Base:

![Knowledge Base Architecture](docs/Knowledge-base-architecture.png)

The following list describes the Knowledge Base components included in HPE AI Essentials Software:

1. **Private Data**: Users can connect private data sources or upload local files. The system processes these files into structured chunks for efficient indexing and retrieval.

2. **Embedding**: HPE AI Essentials Software integrates a GPU-optimized embedding model (e.g., NVIDIA Retrieval QA E5 Embedding v5) that converts text into high-dimensional numerical vectors (embeddings) that capture its meaning. Comparing these vectors is what makes semantic search possible.

3. **Vector DB**: The embedded chunks are stored as vectors in a Weaviate vector database. Each Knowledge Base corresponds to one Collection in the database.

4. **Input**: A user’s query (question, statement, or prompt).

5. **Search and Retrieval**: When a user submits a query, the query is embedded and compared against the vectors in the Weaviate database to fetch the chunks most relevant to it.

6. **Contextual Data Prompt**: The retrieved context is combined with the user's query to create a detailed, context-aware prompt for the LLM.

7. **LLM**: NVIDIA AI Enterprise provides prebuilt containers for large language models (LLMs) used to generate responses. These are GPU-optimized and scalable.

8. **Output**: The final response generated by the LLM after processing the contextual data prompt.
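The numbered steps above can be traced end to end in a small, self-contained sketch. Everything in it is a hypothetical stand-in: `chunk` uses an arbitrary 8-word window with a 2-word overlap, the "collection" is an in-memory list rather than a Weaviate Collection, the "embedding" is a word-count vector rather than the NVIDIA model, and step 7 (the LLM call) is left out. The point is the data flow: chunks go in, the top-k chunks come out, and a prompt is built from them.

```python
from collections import Counter

def chunk(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Step 1: split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be at least 0 and smaller than size")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

def build_collection(docs: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Steps 2-3: 'embed' each chunk as toy word counts and keep it in an in-memory collection."""
    return [(doc_id, c, Counter(c.lower().split()))
            for doc_id, text in docs.items()
            for c in chunk(text)]

def search(collection, query: str, k: int = 2) -> list[tuple[str, str]]:
    """Steps 4-5: score each stored chunk by words shared with the query; keep the top k."""
    q = Counter(query.lower().split())
    scored = [(sum((q & vec).values()), doc_id, c) for doc_id, c, vec in collection]
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep insertion order
    return [(doc_id, c) for score, doc_id, c in scored[:k] if score > 0]

def build_prompt(query: str, hits: list[tuple[str, str]]) -> str:
    """Step 6: combine the retrieved chunks and the query into one prompt for the LLM."""
    context = "\n".join(f"[{doc_id}] {c}" for doc_id, c in hits)
    return f"Context:\n{context}\n\nQuery:\n{query}"

docs = {
    "kb-guide": "Each Knowledge Base corresponds to one Collection in the Weaviate vector database",
    "tokens": "Generate an access token before calling the RAG coordinator endpoint",
}
collection = build_collection(docs)
hits = search(collection, "which database stores the collection")
print(build_prompt("which database stores the collection", hits))
```

The overlap keeps a sentence that straddles a window boundary visible in both neighboring chunks, at the cost of storing a few words twice.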

### Accessing and Managing
You can manage Knowledge Base configurations, statuses, and API tokens through the HPE AI Essentials Software UI. This lab focuses on interacting with Knowledge Base through its API.

In [None]:
import requests
import urllib3

# Lab environment only: the requests in this notebook use verify=False, so
# silence the InsecureRequestWarning that urllib3 would otherwise emit on every call.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
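Suppressing `InsecureRequestWarning` keeps the lab output readable, but it also hides the fact that TLS certificates are not being checked. Outside the lab, one option is a `requests.Session` that verifies against your cluster's CA certificate and retries transient gateway errors. A minimal sketch, assuming you can export that CA certificate to a PEM file; `make_session`, the retry count, and the status codes are illustrative choices, not Knowledge Base requirements.

```python
from typing import Optional

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(ca_bundle: Optional[str] = None, retries: int = 3) -> requests.Session:
    """Return a Session that verifies TLS and retries transient gateway errors (sketch)."""
    session = requests.Session()
    if ca_bundle:
        # Verify the server certificate against this CA bundle instead of the default store.
        session.verify = ca_bundle
    retry = Retry(
        total=retries,
        backoff_factor=1,                     # exponential back-off between attempts
        status_forcelist=(502, 503, 504),     # retry only gateway/availability errors
        allowed_methods=frozenset({"POST"}),  # POST is not retried by default
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

You would then call `session.post(url, headers=headers, json=payload, timeout=120)` instead of `requests.post(..., verify=False)`. Retrying a POST is reasonable only if a chat-completion request does not change server state, which is assumed here; remove `allowed_methods` if that does not hold for your deployment.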

## 2. Configuration

In this section, we configure the connection to the RAG endpoint. 

**NOTE**: Update `RAG_ENDPOINT`, `AUTH_TOKEN`, and `APP_NAME` with the values from your lab environment or Knowledge Base settings.

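- **RAG_ENDPOINT**: The URL of the RAG Coordinator.
- **RAG_API_PATH**: The chat-completions path appended to `RAG_ENDPOINT` to form the request URL.
- **AUTH_TOKEN**: Your API token, sent as a Bearer token to authenticate with the service.
- **APP_NAME**: The Metadata name (application name) of your Knowledge Base, sent in the `X-SA-NAME` header.
- **RAG_MODEL**: The model name sent in the request payload.
- **ENABLE_CITATIONS**: Set to `"true"` to include source citations in the response. It is a string because it is sent as an HTTP header.
- **SYSTEM_CONTEXT**: The prompt template that guides the LLM's behavior and marks where the retrieved context, chat history, and query are injected.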
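Nothing in the lab checks these values until the first request fails, so it can help to validate them up front. A sketch with hypothetical helper names (`find_unset_settings`, `check_settings`); it only recognizes the `YOUR_..._HERE` placeholders used in this notebook and the two string values the citation header expects.

```python
def find_unset_settings(settings: dict[str, str]) -> list[str]:
    """Return the names of settings that are blank or still hold a YOUR_..._HERE placeholder."""
    return [
        name
        for name, value in settings.items()
        if not value.strip() or (value.startswith("YOUR_") and value.endswith("_HERE"))
    ]

def check_settings(settings: dict[str, str], enable_citations: str) -> None:
    """Raise ValueError if a setting is unset or the citation flag is not "true"/"false"."""
    unset = find_unset_settings(settings)
    if unset:
        raise ValueError(f"Update these settings before running the lab: {', '.join(unset)}")
    if enable_citations.lower() not in ("true", "false"):
        raise ValueError('ENABLE_CITATIONS must be the string "true" or "false"')

# Example with placeholder values (not real credentials)
example = {
    "RAG_ENDPOINT": "https://kb.example.com",
    "AUTH_TOKEN": "YOUR_AUTH_TOKEN_HERE",
    "APP_NAME": "",
}
print(find_unset_settings(example))
```

After the configuration cell, a call such as `check_settings({"RAG_ENDPOINT": RAG_ENDPOINT, "AUTH_TOKEN": AUTH_TOKEN, "APP_NAME": APP_NAME}, ENABLE_CITATIONS)` would stop early with a readable message.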

### How to Get Your App Name, Endpoint, and API Token

To run this lab, you need the app name, the RAG endpoint, and an API token for your Knowledge Base. Follow these steps:

**1. Access Gen AI Studio**
Open the Generative AI Studio from your dashboard.
![Access Gen AI](docs/accessing_Gen%20AI.png)

**2. Navigate to Knowledge Base**
Click **Knowledge Base** in the side menu.
![Knowledge Base option](docs/kb-option.png)

**3. Select a knowledge base**
Locate and select the knowledge base you need (e.g., `llama3-8b 1`).
![Select Knowledge Base](docs/select-kb.png)

**4. Copy the Metadata name and RAG endpoint**
On this screen, copy the value shown under Metadata name and paste it into the `APP_NAME` variable in your code.
Next, copy the RAG Coordinator Endpoint URL and paste it into the `RAG_ENDPOINT` variable, replacing the placeholder value.

![Copy Endpoint](docs/metadata.png)

**5. Generate Token**
Click the **"Generate Token"** button.
![Generate Button](docs/Generate_token.png)

**6. Copy the API Key**
A token will be generated. Copy this key immediately and paste it into the `AUTH_TOKEN` variable below.
![Copy Key](docs/copy_key.png)
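Pasting the token into the notebook works for the lab, but it leaves the secret in the saved `.ipynb` file. A sketch of one alternative: read it from an environment variable and fall back to a hidden prompt. The variable name `KB_AUTH_TOKEN` and the helper `load_secret` are hypothetical.

```python
import os
from getpass import getpass

def load_secret(env_var: str, prompt: str) -> str:
    """Return the value of env_var, or ask for it without echoing if it is unset or empty."""
    value = os.environ.get(env_var, "").strip()
    return value if value else getpass(prompt)
```

`AUTH_TOKEN = load_secret("KB_AUTH_TOKEN", "Paste your Knowledge Base API token: ")` would then replace the hard-coded assignment in the configuration cell.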


In [None]:
# Endpoint Configuration
RAG_ENDPOINT = "YOUR_RAG_ENDPOINT_HERE"
RAG_API_PATH = "/v1/chat/completions"

# User Credentials
AUTH_TOKEN = "YOUR_AUTH_TOKEN_HERE"
APP_NAME = "YOUR_APP_NAME_HERE"
RAG_MODEL = "meta/llama3-8b-instruct"

# Toggle Citations
ENABLE_CITATIONS = "false" # Set to "true" to enable citations

# Output Language
output_language = "English"

# System Context for RAG
# This template defines how the LLM should behave and where the retrieved context,
# chat history, and query are injected. It is an f-string so that output_language is
# filled in here; the doubled braces leave {context}, {chat_history}, and {query} as
# literal placeholders, presumably filled in by the RAG Coordinator.
SYSTEM_CONTEXT = f"""- You are an AI assistant who analyzes the input query and provides responses strictly based on the retrieved context and previous chat conversations.
- If no relevant context or prior chat conversation is available, do not answer; instead state: "I don't have enough information to answer the question."
- You must not rely on general knowledge or any data from your training. Use only the specific details from the given context and prior chat conversations.
- If no relevant context is retrieved, do not attempt to generate a response.
- Your answers must remain strictly within the boundaries of the available context and previous chat conversations.
- If the user sends a greeting or a polite remark, reply briefly in a friendly manner.
- Do not include greetings in responses to non-greeting queries.
- Do not make assumptions or hallucinate when responding.
- Do not include the phrase 'Based on the provided context and previous chat conversation' in the responses.
- Respond in {output_language}.

Context:
{{context}}

Chat_Conversations:
{{chat_history}}

Query:
{{query}}"""
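The template is sent with `{context}`, `{chat_history}`, and `{query}` unfilled, which suggests the RAG Coordinator substitutes them on the server; how it formats the chat history is not documented here. The sketch below only previews that substitution locally with `str.format`, using a hypothetical `role: text` line per turn, and uses `string.Formatter` to check that a template still contains all three placeholders.

```python
from string import Formatter

REQUIRED_FIELDS = {"context", "chat_history", "query"}

def template_fields(template: str) -> set[str]:
    """Return the names of the {placeholders} in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}

def render_prompt(template: str, context: str, chat_history: list[tuple[str, str]], query: str) -> str:
    """Fill the placeholders locally to preview what the LLM might receive."""
    missing = REQUIRED_FIELDS - template_fields(template)
    if missing:
        raise ValueError(f"Template is missing placeholders: {sorted(missing)}")
    history = "\n".join(f"{role}: {text}" for role, text in chat_history)  # hypothetical format
    return template.format(context=context, chat_history=history, query=query)

template = "Context:\n{context}\n\nChat_Conversations:\n{chat_history}\n\nQuery:\n{query}"
print(render_prompt(
    template,
    context="Each Knowledge Base corresponds to one Collection in the database.",
    chat_history=[("user", "What vector database is used?"), ("assistant", "Weaviate.")],
    query="How many Collections does one Knowledge Base use?",
))
```

`template_fields(SYSTEM_CONTEXT)` is a quick way to confirm that an edited template still has all three placeholders. Note that `str.format` treats any other `{` or `}` as a placeholder, so literal braces in a template would need doubling.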

## 3. Query Function

The helper function `query_rag`:
1. Constructs the HTTP headers with the bearer token (`AUTH_TOKEN`), the application name (`X-SA-NAME`), and the citation flag (`X-ENABLE-CITATIONS`).
2. Builds the JSON payload with the `system` message (`SYSTEM_CONTEXT`) and the `user` query.
3. Sends the POST request to the RAG Coordinator.
4. Parses the JSON response, prints any citations if they are enabled, and returns the answer text.
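Steps 1 and 2 can be checked without touching the network by building the request and inspecting it. The sketch below mirrors the headers and payload used in `query_rag`; `build_request` is a hypothetical helper, and the endpoint, token, and app name in the example call are placeholders.

```python
import json

def build_request(endpoint: str, api_path: str, token: str, app_name: str, model: str,
                  system_prompt: str, query: str, citations: bool = False):
    """Assemble the URL, headers, and JSON payload for a Knowledge Base query without sending it."""
    url = endpoint.rstrip("/") + api_path  # avoid a double slash before the API path
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "X-SA-NAME": app_name,
        "X-ENABLE-CITATIONS": "true" if citations else "false",  # header values are strings
    }
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        "max_tokens": 2048,
        "temperature": 0,
        "frequency_penalty": 0,
        "presence_penalty": 1,
        "top_p": 0.5,
    }
    return url, headers, payload

url, headers, payload = build_request(
    "https://kb.example.com/", "/v1/chat/completions", "dummy-token", "my-kb",
    "meta/llama3-8b-instruct", "Answer only from context.", "What is Weaviate?", citations=True,
)
print(url)
print(json.dumps(payload, indent=2))
```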

In [None]:
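def query_rag(text):
    """Send `text` to the Knowledge Base RAG Coordinator and return the generated answer."""
    # Drop any trailing slash on the endpoint so the URL never contains "//v1/..."
    url = f"{RAG_ENDPOINT.rstrip('/')}{RAG_API_PATH}"

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {AUTH_TOKEN}",  # API token from the Knowledge Base UI
        "X-SA-NAME": APP_NAME,                    # Knowledge Base Metadata name
        "X-ENABLE-CITATIONS": ENABLE_CITATIONS,   # "true" or "false" (header values are strings)
    }

    # Chat-completion payload: the system prompt template followed by the user's query
    payload = {
        "model": RAG_MODEL,
        "messages": [
            {"role": "system", "content": SYSTEM_CONTEXT},
            {"role": "user", "content": text},
        ],
        "max_tokens": 2048,
        "temperature": 0,
        "frequency_penalty": 0,
        "presence_penalty": 1,
        "top_p": 0.5,
    }

    print(f"Querying Knowledge Base... (Citations: {ENABLE_CITATIONS})")

    try:
        # verify=False skips TLS certificate checks: acceptable only in the lab environment.
        # The timeout (an arbitrary 120 s) keeps the cell from hanging if the endpoint never answers.
        response = requests.post(url, headers=headers, json=payload, verify=False, timeout=120)
        response.raise_for_status()
        result = response.json()
    except (requests.exceptions.RequestException, ValueError) as e:
        # Connection errors, timeouts, HTTP error statuses (e.g. 401 for an invalid token)
        # raised by raise_for_status(), and response bodies that are not valid JSON
        print(f"RAG Error: {e}")
        return "I'm sorry, I couldn't get a response from the knowledge base."

    if not isinstance(result, dict):
        return f"Unexpected response format: {result}"

    # Print citations when they were requested and the response includes them
    citations = (result.get("metadata") or {}).get("citations") or []
    if ENABLE_CITATIONS.lower() == "true" and citations:
        print("\n--- Citations ---")
        for citation in citations:
            doc_name = citation.get("doc_id", "Unknown Document")
            doc_url = citation.get("doc_url", "No URL provided")
            print(f"Document: {doc_name} ({doc_url})")
            for idx, chunk in enumerate(citation.get("chunks", []), start=1):
                snippet = chunk.get("snippet", "").strip()
                score = chunk.get("score", "N/A")
                ellipsis = "..." if len(snippet) > 150 else ""
                print(f"  - Chunk {idx} (Score: {score}): {snippet[:150]}{ellipsis}")
        print("-----------------\n")

    # Return the first chat-completion choice, or the raw result if the format is unexpected
    choices = result.get("choices") or []
    if choices:
        return choices[0]["message"]["content"]
    return f"Unexpected response format: {result}"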
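`query_rag` prints citations as it goes. If you would rather work with them in code, for example to list the best-matching documents first, a pure helper can pull them out of the parsed response. This is a sketch: `summarize_response` is hypothetical, it reads only the fields that `query_rag` reads, the sample response is made up for illustration, and it assumes a higher `score` means a more relevant chunk, which the lab does not state.

```python
from typing import Optional

def summarize_response(result: dict) -> tuple[str, list[tuple[str, Optional[float]]]]:
    """Return the answer text and, per cited document, its best numeric chunk score."""
    choices = result.get("choices") or []
    answer = choices[0]["message"]["content"] if choices else ""
    sources = []
    for citation in (result.get("metadata") or {}).get("citations", []):
        scores = [c["score"] for c in citation.get("chunks", [])
                  if isinstance(c.get("score"), (int, float))]
        sources.append((citation.get("doc_id", "Unknown Document"), max(scores, default=None)))
    # Highest-scoring documents first; documents without a numeric score go last
    sources.sort(key=lambda s: s[1] if s[1] is not None else float("-inf"), reverse=True)
    return answer, sources

# Illustrative response shaped like the fields query_rag reads; the values are made up.
sample = {
    "choices": [{"message": {"content": "Each Knowledge Base maps to one Weaviate Collection."}}],
    "metadata": {"citations": [
        {"doc_id": "setup-notes.txt", "chunks": [{"snippet": "...", "score": 0.41}]},
        {"doc_id": "architecture.pdf", "chunks": [{"snippet": "...", "score": 0.82},
                                                  {"snippet": "...", "score": 0.61}]},
    ]},
}
answer, sources = summarize_response(sample)
print(answer)
for doc_id, best in sources:
    print(f"{doc_id}: best chunk score {best}")
```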

## 4. Interactive Lab

Now, test the Knowledge Base by entering queries below.
Make sure `RAG_ENDPOINT`, `AUTH_TOKEN`, and `APP_NAME` are set to valid values before running this cell.
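The interactive cell reads queries with `input()`. When the notebook runs unattended, a fixed list of questions is easier to work with. A sketch with a hypothetical `run_batch` helper that applies the same exit and blank-line rules; it takes the query function as a parameter, so you can pass `query_rag` or, as in the demo, a stand-in that does not need the network.

```python
from typing import Callable, Iterable

EXIT_COMMANDS = {"exit", "quit", "q"}

def run_batch(queries: Iterable[str], ask: Callable[[str], str]) -> list[tuple[str, str]]:
    """Ask each query in order; skip blank entries and stop at the first exit command."""
    results = []
    for query in queries:
        query = query.strip()
        if not query:
            continue
        if query.lower() in EXIT_COMMANDS:
            break
        results.append((query, ask(query)))
    return results

# Demo with a stand-in for query_rag so the sketch runs offline
def echo(query: str) -> str:
    return f"(stub answer to: {query})"

for question, answer in run_batch(["What is Knowledge Base?", "  ", "quit", "never asked"], echo):
    print(f"Q: {question}\nA: {answer}\n")
```

`run_batch(["What is Knowledge Base?", "Which vector database does it use?"], query_rag)` would send both questions to the live endpoint.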

In [None]:
print("--- HPE AI Essentials Knowledge Base Lab ---")
print("Type 'exit', 'quit', or 'q' to stop the session.\n")

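while True:
    # Strip whitespace first so inputs like " exit " are still recognized
    user_input = input("Enter your query: ").strip()

    # Leave the loop on any exit command (case-insensitive)
    if user_input.lower() in ("exit", "quit", "q"):
        print("Exiting session.")
        break

    # Ignore empty input
    if not user_input:
        continue

    # Send the query to the Knowledge Base and print the answer
    answer = query_rag(user_input)

    print(f"\nAI Response:\n{answer}\n")
    print("-" * 50 + "\n")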