# **6.0** ‎ Querying the LLM with RAG

### Purpose of This Notebook

This notebook brings together all the previous components to form a complete **Retrieval-Augmented Generation (RAG) chatbot pipeline**. \
It demonstrates how to take a user query, retrieve relevant information from a vector database, optionally rerank the results, and generate a final grounded response using an LLM.

This is the final stage of the RAG pipeline, where the municipal chatbot becomes fully functional and ready for deployment. \
It can now generate **context-aware**, **domain-specific** responses by combining LLM generation with trusted municipal documents.

---

### What This Notebook Covers

- Accepting a user query as input
- Retrieving relevant document chunks from the indexed vector database (Chroma)
- Applying **Flashrank reranking** to improve relevance of results
- Injecting the top-ranked context into a prompt template
- Querying a **local or API-based LLM** (e.g., via Ollama or OpenAI)
- Generating a final, human-readable response grounded in retrieved context
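
The steps above can be sketched as one function with pluggable stages. This is a minimal illustration rather than the notebook's implementation: `answer_query`, `overlap_rerank`, and the stub retriever and generator are hypothetical names, the sample records are made up, and the word-overlap scorer merely stands in for a real reranker such as Flashrank.

```python
import re
from typing import Callable, List


def overlap_rerank(query: str, docs: List[str]) -> List[str]:
    """Toy reranker (assumption): order docs by how many query words they share."""
    query_words = set(re.findall(r"\w+", query.lower()))

    def score(doc: str) -> int:
        return len(query_words & set(re.findall(r"\w+", doc.lower())))

    # sorted() is stable, so tied documents keep their retrieval order
    return sorted(docs, key=score, reverse=True)


def answer_query(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
    rerank: Callable[[str, List[str]], List[str]] = overlap_rerank,
    top_n: int = 2,
) -> str:
    """Retrieve -> rerank -> build prompt -> generate."""
    candidates = retrieve(query)
    context = "\n\n---\n\n".join(rerank(query, candidates)[:top_n])
    prompt = f"Context:\n{context}\n\nUser question: {query}\nAnswer:"
    return generate(prompt)


# Stub stages so the sketch runs without a vector store or an LLM
def fake_retrieve(query: str) -> List[str]:
    return [
        "Pothole reported along Lorong 1.",
        "Rodent sighting at a food stall in Toa Payoh.",
        "Streetlight fault near Block 12.",
    ]


def echo_generate(prompt: str) -> str:
    return prompt  # a real LLM call would go here


result = answer_query("rodent issues in Toa Payoh", fake_retrieve, echo_generate, top_n=1)
print(result)
```

Passing the stages in as callables keeps the sketch independent of any particular vector store, reranker, or model, so each can be swapped for the real component once it is set up.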

---

### Why This Matters

LLMs used on their own often hallucinate, especially in niche domains like municipal services. \
By grounding the LLM's response in documents retrieved from a trusted knowledge base, RAG improves both **accuracy** and **trustworthiness**. \
It helps the chatbot answer queries with **real information** drawn from the knowledge base rather than guesses.

---

### Assumptions & Dependencies

This notebook assumes you have already:
- Indexed your documents in a vector store (Chroma)
- Set up an embedding model and reranker (e.g., SentenceTransformer + Flashrank)
- Integrated and tested your LLM (via LangChain, Ollama, or OpenAI)

The output of this notebook is a working pipeline that can be wrapped into an API, chatbot interface, or web app.


### **6.0.1** ‎ ‎ Load ChromaDB Collection and Embedding Model

In [2]:
!pip install --upgrade --quiet chromadb sentence_transformers



# **6.1** ‎ Setting up the Pipeline

### **6.1.1** ‎ ‎ Preparing the Voice Module

In [None]:
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from config import MODEL_PATHS
import librosa
import torch

def transcribe(audio_path):
    processor = WhisperProcessor.from_pretrained("openai/whisper-tiny", task="transcribe")
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

    model.generation_config.forced_decoder_ids = None

    # Use GPU if available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    # Load and preprocess audio
    audio_array, sampling_rate = librosa.load(audio_path, sr=16000)  # Whisper expects 16kHz

    # Prepare input features
    inputs = processor(audio_array, sampling_rate=16000, return_tensors="pt")
    input_features = inputs.input_features.to(device)

    # Generate prediction (use the tensor that was moved to the model's device)
    predicted_ids = model.generate(input_features)

    # Decode transcription
    transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
    
    return transcription


In [1]:
import chromadb
from sentence_transformers import SentenceTransformer

# Load from persisted ChromaDB directory
chroma_client = chromadb.PersistentClient(path="../vector_stores/chroma_store_textonly")
collection = chroma_client.get_collection(name="municipal_issues")

# Load the embedding model (same one used during ingestion)
embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

In [10]:
import re
import textwrap

import chromadb
import numpy as np
from langchain.prompts import PromptTemplate
from langchain_ollama.llms import OllamaLLM
from sentence_transformers import SentenceTransformer

**Load ChromaDB Collection and Embedding Model**

**Define Search Function**

In [None]:
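def search_similar_issues(query, k=5, metadata_filter=None):
    """Return the top-k documents and their metadata for a query, optionally filtered by metadata."""
    # Embed the query with the same model used during ingestion
    embedding = embedding_model.encode(query).tolist()

    query_kwargs = {
        "query_embeddings": [embedding],
        "n_results": k,
    }
    # Only pass a `where` clause when a filter is supplied
    if metadata_filter:
        query_kwargs["where"] = metadata_filter

    results = collection.query(**query_kwargs)

    # Chroma returns one result list per query embedding; exactly one was sent
    documents = results["documents"][0]
    metadatas = results["metadatas"][0]
    return documents, metadatas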
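
Chroma's `where` clause accepts a single condition directly but expects two or more conditions to be wrapped in `$and`. As a convenience, a small helper could build the filter from keyword arguments. `build_where` is a hypothetical name, not part of Chroma or of this notebook, and the sketch only covers equality checks.

```python
from typing import Any, Dict, Optional


def build_where(**conditions: Any) -> Optional[Dict[str, Any]]:
    """Build an equality-only metadata filter for `collection.query(where=...)`."""
    clauses = [{field: {"$eq": value}} for field, value in conditions.items()]
    if not clauses:
        return None          # no filter: search the whole collection
    if len(clauses) == 1:
        return clauses[0]    # a lone condition is passed without `$and`
    return {"$and": clauses}


metadata_filter = build_where(
    agency="National Environment Agency (NEA)",
    severity="High",
)
print(metadata_filter)
```

The result can be passed straight to `search_similar_issues(query, metadata_filter=metadata_filter)`; a `None` result leaves the search unfiltered.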


**Define a Context-Aware Prompt Template**

The example prompt below is tailored to the municipal service context: it sets the assistant's role and instructs the model to answer from the retrieved data.

In [6]:
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template=textwrap.dedent("""\
        You are a helpful municipal assistant in Singapore. Use the data below to answer the user's question accurately and concisely.

        Context:
        {context}

        User question: {question}
        Answer:""")
)
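
To see what the model actually receives, it can help to render the template by hand. The sketch below uses plain `str.format` as a stand-in for LangChain's `PromptTemplate`, so it runs without LangChain installed; the two context chunks and the question are made-up placeholders, not real records.

```python
import textwrap

# Same wording as the notebook's template, as a plain format string
template = textwrap.dedent("""\
    You are a helpful municipal assistant in Singapore. Use the data below to answer the user's question accurately and concisely.

    Context:
    {context}

    User question: {question}
    Answer:""")

# Placeholder chunks (illustrative only), joined with a visible separator
docs = ["Example chunk A", "Example chunk B"]
rag_context = "\n\n---\n\n".join(docs)

rendered = template.format(context=rag_context, question="Example question?")
print(rendered)
```

Ending the template on `Answer:` nudges the model to begin its reply immediately after the prompt instead of restating the question.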

**Query the System**

In [12]:
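def split_thought_and_answer(llm_output: str):
    """Separate a reasoning model's <think>...</think> block from its final answer."""
    match = re.search(r"<think>(.*?)</think>(.*)", llm_output, re.DOTALL)
    if match:
        thought = match.group(1).strip()
        answer = match.group(2).strip()
        return thought, answer
    # No reasoning block: treat the whole output as the answer
    return None, llm_output.strip()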
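
A quick check of the helper on both kinds of output may be useful: one with a `<think>` block, as reasoning models such as DeepSeek-R1 tend to emit, and one without. The sample strings are invented for illustration, and the function is repeated here only so the snippet runs on its own.

```python
import re


def split_thought_and_answer(llm_output: str):
    match = re.search(r"<think>(.*?)</think>(.*)", llm_output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return None, llm_output.strip()


with_reasoning = "<think>\nCheck each record.\n</think>\n\nYes, one case was found."
plain = "  No matching records.  "

print(split_thought_and_answer(with_reasoning))
# ('Check each record.', 'Yes, one case was found.')
print(split_thought_and_answer(plain))
# (None, 'No matching records.')
```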

In [13]:
llm = OllamaLLM(model="deepseek-r1:7b")
chain = prompt_template | llm

# Define user query
user_query = "Are there any recent high severity rodent issues in Toa Payoh?"

# Set filter (if needed)
metadata_filter = {
    "$and": [
        {"agency": {"$eq": "National Environment Agency (NEA)"}},
        {"severity": {"$eq": "High"}}
    ]
}

# Get top matches from Chroma
docs, metas = search_similar_issues(user_query, k=5, metadata_filter=metadata_filter)

# Format retrieved context
rag_context = "\n\n---\n\n".join(docs)

# Generate final response
response = chain.invoke({"context": rag_context, "question": user_query})
thought, answer = split_thought_and_answer(response)
print("LLM + RAG Response:")
print("Final Answer:", answer)
print("Reasoning (optional):", thought)

LLM + RAG Response:
Final Answer: Yes, there was a high-severity rodent issue reported at an establishment in Toa Payoh Central on April 13, 2025. The incident falls under the category of Pests > Rodents in Food Establishment and was resolved by the National Environment Agency (NEA).
Reasoning (optional): Okay, so I need to figure out if there have been any recent high-severity rodent issues in Toa Payoh based on the data provided. Let me go through each entry step by step.

First, I see that the user has shared several incidents reported between April 12 and April 15, all under different categories but all related to Toa Payoh Central area with coordinates (1.3321, 103.8478). 

Looking at each incident:

1. The first entry from April 15 mentions rodent issues in the context of "Foreign gun every field treatment." I'm not entirely sure what that means, but it might refer to something like rat traps or efforts to control rats.

2. There's another incident on April 13 related to high sev