# Lesson 3: Retrieving and Prompt Building in RAG Systems

Welcome to the third lesson in **Scaling Up RAG with Vector Databases**! Well done, you're halfway through this course. In the previous lesson, you learned how to split (or chunk) your text data and store those chunks in a vector database collection. Now you'll retrieve the most relevant chunks for a given query and build an LLM prompt from them, so that answers are grounded in the retrieved context.

---

## Retrieving the Most Relevant Chunks

Before your LLM can generate a coherent, context-rich answer, you need to fetch the right information. Your vector database (Chroma, in this lesson) embeds the query and ranks the stored document chunks by how close their embeddings are to it.

```python
def retrieve_top_chunks(query, collection, top_k=2):
    """
    Retrieves the top_k chunks relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text,
    'doc_id', and 'distance'.
    """
    # Search for top_k results matching the user's query
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )
    
    retrieved_chunks = []
    
    # Safeguard in case no results are found
    if not results['documents'] or not results['documents'][0]:
        return retrieved_chunks

    # Gather each retrieved chunk, along with its distance score
    for i in range(len(results['documents'][0])):
        retrieved_chunks.append({
            "chunk": results['documents'][0][i],
            "doc_id": results['ids'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved_chunks
```

### How It Works

1. **Function Definition**

   * `query`: the user’s question or search term
   * `collection`: the Chroma collection object containing embedded documents
   * `top_k`: the number of most relevant chunks to retrieve (default = 2)

2. **Vector Search**

   * `collection.query()` performs a vector-based similarity search.
   * `query_texts=[query]` passes the user’s query as a list (Chroma API expects a list).
   * `n_results=top_k` specifies how many matching chunks to return.

3. **Results Structure**
   The query returns a dictionary with keys:

   * **`documents`**: the actual text chunks
   * **`ids`**: document identifiers
   * **`distances`**: distance scores (lower = better match)

   Each key maps to a nested list with one inner list per query: `[[item1, item2, ...]]`. Because we pass a single query, we always read index `0`.

4. **Processing Results**
   For each hit, we build a dict with:

   * `"chunk"`: text content
   * `"doc_id"`: its identifier
   * `"distance"`: distance score (lower = better match)

   These are appended to `retrieved_chunks`, which is then returned.
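
To make the nested-list shape concrete, the sketch below uses a hand-written dictionary that mimics what `collection.query()` returns for a single query. The values are invented for illustration, and `flatten_results` is a hypothetical helper name, not part of Chroma.

```python
# Hand-written stand-in for a Chroma query result (all values are made up).
sample_results = {
    "ids": [["chunk_1_0", "chunk_2_0"]],
    "documents": [["First chunk text.", "Second chunk text."]],
    "distances": [[0.42, 0.77]],
}


def flatten_results(results):
    """Turn the nested lists for the first (only) query into a list of dicts."""
    documents = results.get("documents") or [[]]
    ids = results.get("ids") or [[]]
    distances = results.get("distances") or [[]]
    # Index 0 selects the results that belong to the first query text.
    return [
        {"chunk": doc, "doc_id": doc_id, "distance": dist}
        for doc, doc_id, dist in zip(documents[0], ids[0], distances[0])
    ]


chunks = flatten_results(sample_results)
for c in chunks:
    print(c["doc_id"], c["distance"], c["chunk"])
```

Using `zip` keeps the three parallel lists aligned without manual indexing; the behaviour is otherwise the same as the loop in `retrieve_top_chunks`.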

---

## Building a Prompt for the LLM

Once you have your relevant chunks, construct a prompt that tells the LLM to answer only from those contexts. This helps reduce hallucinations and keeps answers tied to the source text, although it does not guarantee accuracy on its own.

```python
def build_prompt(query, retrieved_chunks):
    """
    Constructs a prompt by restating the 'query' and adding the retrieved chunks
    as inline context for the LLM.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt
```

**Why this matters:**

* **Controlled Context**: Instructs the LLM to use only the provided text.
* **Flexibility**: You can tweak the format (bullet points, instructions, etc.) to guide tone and depth.
* **Clarity**: Restating the question up front reminds the model of the exact task.
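
To see exactly what the LLM receives, here is a small usage sketch that calls `build_prompt` on two invented chunks shaped like the output of `retrieve_top_chunks`. The chunk texts and the question are made up for illustration.

```python
def build_prompt(query, retrieved_chunks):
    """Same prompt layout as in the lesson."""
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


# Invented chunks, shaped like the output of retrieve_top_chunks().
sample_chunks = [
    {"chunk": "Solar panels are getting cheaper.", "doc_id": "chunk_1_0", "distance": 0.31},
    {"chunk": "Wind turbines are getting larger.", "doc_id": "chunk_2_0", "distance": 0.45},
]

prompt = build_prompt("What is changing in renewable energy?", sample_chunks)
print(prompt)
```

With an empty `retrieved_chunks` list the prompt contains no context lines at all, so it is worth checking for that case before calling the LLM.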

---

## Integrating the Corpus and Creating the Collection

To see this in action, first load your corpus data and create a collection in your vector database.

```python
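import json

from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions

# Load corpus data from JSON file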
with open('data/corpus.json', 'r') as f:
    corpus_data = json.load(f)

model_name = 'sentence-transformers/all-MiniLM-L6-v2'
embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
client = Client(Settings())
collection = client.get_or_create_collection(
    "rag_collection",
    embedding_function=embed_func
)

# Batch add documents from the corpus data
documents = [doc['content'] for doc in corpus_data]
ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
collection.add(documents=documents, ids=ids)
```

**Key Details:**

* **Embedding Function**: Uses `SentenceTransformerEmbeddingFunction`. Replaceable with any model you prefer.
* **Collection**: `get_or_create_collection` retrieves an existing one or creates a new collection automatically.
* **Bulk Ingestion**: Passing all documents and IDs in a single `add()` call avoids making one call per document.
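
For a larger corpus, a single very large `add()` call can become slow or memory-hungry, so one option is to split the input into fixed-size batches. The sketch below is a minimal illustration under that assumption: `iter_batches`, `add_in_batches`, and the batch size of 100 are hypothetical choices, and `FakeCollection` is a stand-in so the example runs without `chromadb` installed.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of 'items' with at most 'batch_size' elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def add_in_batches(collection, documents, ids, batch_size=100):
    """Call collection.add() once per batch; return the number of add() calls."""
    calls = 0
    for doc_batch, id_batch in zip(iter_batches(documents, batch_size),
                                   iter_batches(ids, batch_size)):
        collection.add(documents=doc_batch, ids=id_batch)
        calls += 1
    return calls


class FakeCollection:
    """Tiny stand-in for a Chroma collection (illustration only)."""

    def __init__(self):
        self.stored = {}

    def add(self, documents, ids):
        self.stored.update(zip(ids, documents))


fake = FakeCollection()
docs = [f"document {i}" for i in range(5)]
doc_ids = [f"chunk_{i}_0" for i in range(5)]
n_calls = add_in_batches(fake, docs, doc_ids, batch_size=2)
print(n_calls, len(fake.stored))
```

With a real collection you would pass the `collection` object from the lesson in place of `fake`.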

---

## Querying the Database and Generating Answers

With your collection ready, you can retrieve chunks and feed them into your LLM:

```python
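# get_llm_response is the course-provided helper (the exercises import it from scripts.llm)
from scripts.llm import get_llm_response

query = "What are some recent technological breakthroughs?"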
retrieved_docs = retrieve_top_chunks(query, collection, top_k=5)
final_prompt = build_prompt(query, retrieved_docs)
answer = get_llm_response(final_prompt)

print("Prompt:\n")
print(final_prompt)
print("\nLLM Answer:", answer)
```

**Step by Step:**

1. **Formulate the Query**: Define the user’s question string.
2. **Retrieve Chunks**: Call `retrieve_top_chunks()` to get the top 5 semantically similar chunks.
3. **Construct the Prompt**: Use `build_prompt()` to assemble the question + contexts.
4. **Get LLM Response**: Pass the prompt into your LLM via `get_llm_response()`.

Printing both the prompt and the answer helps you debug and refine retrieval or prompt formatting.
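
If you want to exercise the retrieval and prompt steps without calling a real model, a placeholder can stand in for the LLM. The sketch below is hypothetical: the stub version of `get_llm_response`, the `answer_query` helper, and `fake_retrieve` are illustrative names and are not the course's implementation.

```python
def get_llm_response(prompt):
    """Offline stub: report how many context lines the prompt carries."""
    context_lines = [line for line in prompt.splitlines() if line.startswith("- ")]
    return f"[stub] received {len(context_lines)} context chunk(s)"


def build_prompt(query, retrieved_chunks):
    """Same prompt layout as in the lesson."""
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


def fake_retrieve(query):
    """Stand-in for retrieve_top_chunks(); returns invented chunks."""
    return [
        {"chunk": "Chunk A.", "doc_id": "chunk_1_0", "distance": 0.2},
        {"chunk": "Chunk B.", "doc_id": "chunk_2_0", "distance": 0.3},
    ]


def answer_query(query, retrieve, build, respond):
    """Run retrieve -> build prompt -> respond and return (prompt, answer)."""
    chunks = retrieve(query)
    prompt = build(query, chunks)
    return prompt, respond(prompt)


prompt, answer = answer_query("Example question?", fake_retrieve, build_prompt, get_llm_response)
print(prompt)
print(answer)
```

To plug in the real retrieval step, you could pass something like `lambda q: retrieve_top_chunks(q, collection, top_k=5)` as the `retrieve` argument.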

---

## Examining the Output

Here’s an example of the system’s final output:

```
Prompt:

Question: What are some recent technological breakthroughs?
Answer using only the following context:
- The Industrial Revolution brought significant technological and social changes. It reshaped economies and altered the fabric of society. Scholars examine its impact on labor, innovation, and modern industrial practices.
- Breakthroughs in renewable energy technologies are reducing global dependence on fossil fuels. Solar and wind systems are becoming more efficient and affordable. These innovations are crucial to combating climate change and ensuring a sustainable future.
- The digital revolution is transforming how we approach health and wellness. Technological innovations, from fitness trackers to health apps, are empowering individuals to manage their well-being. This integration of technology and lifestyle is reshaping daily habits for a healthier future.
- Advances in medical technology are revolutionizing patient care through new diagnostic and treatment methods. Breakthroughs in imaging and robotics are enhancing the precision of medical procedures. Healthcare professionals are optimistic about the potential for improved outcomes.
- Scientists are developing renewable materials that could replace traditional plastics. Innovations in biopolymers are leading to sustainable manufacturing practices. These breakthroughs promise to reduce environmental waste and support a circular economy.
Answer:

LLM Answer: Recent technological breakthroughs include advancements in renewable energy technologies, which are making solar and wind systems more efficient and affordable, thereby reducing global dependence on fossil fuels and aiding in the fight against climate change. Additionally, the digital revolution is enhancing health and wellness through innovations like fitness trackers and health apps, empowering individuals to better manage their well-being. In the medical field, new diagnostic and treatment methods, along with improvements in imaging and robotics, are revolutionizing patient care and enhancing the precision of medical procedures. Furthermore, scientists are developing renewable materials, such as biopolymers, to replace traditional plastics, promoting sustainable manufacturing practices and supporting a circular economy.
```

In this snippet:

* The prompt instructs the LLM to use only the listed contexts.
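* The final answer draws on the relevant chunks (renewable energy, health and medical technology, and sustainable materials) and leaves out the Industrial Revolution chunk, which was retrieved but does not describe a recent breakthrough. This shows how the instruction keeps the LLM's answer tied to the supplied context.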

---

## Conclusion and Next Steps

In this lesson, you learned how to:

1. **Retrieve** the most relevant text chunks via semantic similarity.
2. **Construct** a focused, context-driven prompt to keep the LLM on track.

These steps are central to building a robust Retrieval-Augmented Generation pipeline. By feeding your LLM with precise context, you improve accuracy and trustworthiness.

**Next**, you'll get hands-on with exercises to test different queries, tweak prompt formats, and observe how the LLM's responses change. Keep experimenting!



## Enhance Your Retrieval Function

In the previous lesson, you explored the fundamentals of retrieving relevant chunks from a vector database. Now, it's time to apply that knowledge! Your goal is to enhance the `retrieve_top_chunks` function to accept a user query string and test it against a ChromaDB collection. This exercise will help you verify that the function accurately returns the top matching chunks based on semantic similarity.

To get started, follow these steps:

1. Implement the `retrieve_top_chunks` function by using `collection.query()` to search for documents matching the query. Ensure the query returns the top k results.
2. Process the results and append each chunk's information to `retrieved_chunks`, including chunk text, `doc_id`, and distance.
3. Define a query string to test the retrieval function.
4. Print the retrieved results, including chunk text, `doc_id`, and distance, to ensure the function's precision.

For instance, if you query "technological breakthroughs", the function should return the most relevant chunks from your collection, each accompanied by its `doc_id` and distance score. Let's get to work!

```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions


def retrieve_top_chunks(query, collection, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text, 'doc_id', and 'distance'.
    """
    # TODO: Use collection.query() to search for documents matching the query
    # The query should return top_k results

    # TODO: Add safeguard for empty results

    # TODO: Process the results and append each chunk's information to retrieved_chunks
    # Each chunk should have: chunk text, doc_id, and distance score
    pass


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining multiple retrieved chunks into a
    single context block, ensuring the model can handle longer or more detailed answers.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load a small set of documents from corpus.json
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Set up the embedding function and create/get a ChromaDB collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection("rag_collection", embedding_function=embed_func)

    # Add documents from corpus_data to the collection
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    collection.add(documents=documents, ids=ids)

    # TODO: Define a query string to test the retrieval function
    user_query = ""
    
    # Retrieve top matches
    retrieved_chunks = retrieve_top_chunks(user_query, collection, top_k=3)

    # Print the retrieved chunks to verify the function's accuracy
    for rc in retrieved_chunks:
        print("Chunk:", rc["chunk"])
        print("Doc ID:", rc["doc_id"])
        print("Distance:", rc["distance"])
        print("-" * 40)


```


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions


def retrieve_top_chunks(query, collection, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text, 'doc_id', and 'distance'.
    """
    # Perform vector similarity search
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )

    retrieved_chunks = []

    # Check if any documents were returned
    docs = results.get('documents', [])
    if not docs or not docs[0]:
        return retrieved_chunks

    # Iterate through top results and collect metadata
    for i, chunk_text in enumerate(docs[0]):
        retrieved_chunks.append({
            "chunk": chunk_text,
            "doc_id": results['ids'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining multiple retrieved chunks into a
    single context block, ensuring the model can handle longer or more detailed answers.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load a small set of documents from corpus.json
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Setup embedding function and ChromaDB collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection(
        "rag_collection",
        embedding_function=embed_func
    )

    # Add documents from corpus_data to the collection
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    collection.add(documents=documents, ids=ids)

    # Define a query string to test the retrieval function
    user_query = "technological breakthroughs"

    # Retrieve top matches
    retrieved_chunks = retrieve_top_chunks(user_query, collection, top_k=3)

    # Print the retrieved chunks to verify the function's accuracy
    for rc in retrieved_chunks:
        print("Chunk:", rc["chunk"])
        print("Doc ID:", rc["doc_id"])
        print("Distance:", rc["distance"])
        print("-" * 40)

    # Optionally, build and display the LLM prompt
    prompt = build_prompt(user_query, retrieved_chunks)
    print("\nConstructed Prompt:\n")
    print(prompt)


```


I have completed the `retrieve_top_chunks` function as instructed:

* It uses `collection.query()` to search for and return the top k results.
* It includes a safeguard for when no results are found.
* It builds the `retrieved_chunks` list with `chunk`, `doc_id`, and `distance`.
* It sets `user_query` to `"technological breakthroughs"` for testing.
* It prints the results and builds the LLM prompt as validation.

Try running the `enhance_retrieval.py` script to verify your retrieval output!


## Crafting Context-Rich LLM Prompts

Well done on mastering the art of retrieving relevant chunks from a vector database in the previous exercise! Now, let's elevate your skills by integrating the build_prompt function into your existing code. This will enable you to craft a single, context-rich prompt that guides the LLM to generate precise answers.

Here's your mission:

1. Retrieve the top chunks for a given query and pass these chunks into the `build_prompt` function.
2. Use the generated prompt to obtain an LLM response, employing a placeholder function or script for this purpose.
3. Print both the final prompt and the LLM's response to verify that the LLM is effectively using the retrieved chunks to formulate its answer.

For instance, if your query revolves around "technological breakthroughs," the LLM should provide an answer that reflects the context of the retrieved chunks. This exercise will ensure that your LLM is not just generating answers but doing so with context and accuracy. Embrace the challenge and discover how well your LLM can perform with the right guidance!


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response


def retrieve_top_chunks(query, collection, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text, 'doc_id', and 'distance'.
    """
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )
    retrieved_chunks = []
    
    # Safeguard if no documents are returned
    if not results['documents'][0]:
        return retrieved_chunks

    for i in range(len(results['documents'][0])):
        retrieved_chunks.append({
            "chunk": results['documents'][0][i],
            "doc_id": results['ids'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining multiple retrieved chunks into a
    single context block, ensuring the model can handle longer or more detailed answers.
    """
    # TODO: Implement the build_prompt function that creates a prompt string
    # combining the query and retrieved chunks
    

if __name__ == "__main__":
    # Load corpus data from JSON file
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Set up the embedding model and initialize a ChromaDB collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection("rag_collection", embedding_function=embed_func)

    # Add documents from corpus_data into the vector database
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    collection.add(documents=documents, ids=ids)

    # Define a sample query
    query = "What are some recent technological breakthroughs?"

    # TODO: Retrieve chunks, build the prompt, and get the LLM response
    
    # TODO: Print the final prompt and LLM answer


```


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response

def retrieve_top_chunks(query, collection, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection'.
    Returns a list of retrieved chunks, each containing 'chunk' text, 'doc_id', and 'distance'.
    """
    results = collection.query(
        query_texts=[query],
        n_results=top_k
    )
    retrieved_chunks = []
    # Safeguard if no documents are returned
    if not results.get('documents') or not results['documents'][0]:
        return retrieved_chunks
    for i, chunk_text in enumerate(results['documents'][0]):
        retrieved_chunks.append({
            "chunk": chunk_text,
            "doc_id": results['ids'][0][i],
            "distance": results['distances'][0][i]
        })
    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining the query and retrieved chunks
    into a single context block for the LLM.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load corpus data from JSON file
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Set up the embedding model and initialize a ChromaDB collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection(
        "rag_collection",
        embedding_function=embed_func
    )

    # Batch add documents from corpus_data into the vector database
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    collection.add(documents=documents, ids=ids)

    # Define a sample query
    query = "What are some recent technological breakthroughs?"

    # Retrieve top chunks
    retrieved_chunks = retrieve_top_chunks(query, collection, top_k=3)

    # Build the context-rich prompt
    prompt = build_prompt(query, retrieved_chunks)

    # Get an answer from the LLM
    answer = get_llm_response(prompt)

    # Print the final prompt and LLM answer
    print("--- Constructed Prompt ---")
    print(prompt)
    print("--- LLM Response ---")
    print(answer)


```



I have added the `build_prompt` function and integrated it with `retrieve_top_chunks` and `get_llm_response`. The `craft_context_prompt.py` script now:

1. Loads the corpus data and sets up the ChromaDB collection.
2. Retrieves the top 3 relevant chunks for the query.
3. Builds the complete contextual prompt.
4. Calls the LLM through `get_llm_response` and prints the prompt and the answer.

Try running this file to see how accurately the LLM uses the retrieved context!


## Metadata-Based Retrieval Enhancement

In the previous exercise, you mastered retrieving relevant chunks from a vector database and using them to build context-rich prompts. Now, let's enhance your skills by incorporating metadata-based filtering into your retrieval process.

Your task is to enhance the `retrieve_top_chunks` function to include this filtering capability. This involves allowing users to specify a category, such as "finance" or "AI", and ensuring that only chunks matching this category are returned. Here's your roadmap:

1. Modify the `retrieve_top_chunks` function to create a `where` dictionary that filters by category if one is provided.
2. Use the `collection.query()` method with the `where` parameter to perform the query with metadata filtering.
3. Process the query results and append each chunk's information to the `retrieved_chunks` list.
4. Define a query and category to test the retrieval function.
5. Implement logic to handle cases where no chunks match the specified category by printing a user-friendly message.
6. Use the `build_prompt` function with these filtered chunks to create a single, context-rich prompt.

For example, if you query for "technological breakthroughs" in the "AI" category, the function should return the most relevant AI-related chunks. If no chunks match, inform the user with a clear message. This exercise will refine your retrieval skills and ensure your LLM responses are not only accurate but also contextually relevant. Embrace the challenge and see how effectively you can guide the LLM with precise context!


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response


def retrieve_top_chunks(query, collection, category=None, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection',
    optionally filtered by category. Returns a list of retrieved chunks, each
    containing 'chunk' text, 'doc_id', and 'distance'.
    """
    # TODO: Create a where dictionary to filter by category if one is provided

    # TODO: Perform the query with metadata filtering using collection.query()
    # Include the where parameter in the query

    retrieved_chunks = []

    # Safeguard against empty results
    if not results['documents'] or not results['documents'][0]:
        return retrieved_chunks

    # TODO: Process query results and append each chunk's information to retrieved_chunks

    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining multiple retrieved chunks into a
    single context block, ensuring the model can provide context-based answers.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load corpus data from JSON file
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Prepare documents, ids, and metadatas
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    metadatas = [{"category": doc["category"]} for doc in corpus_data]

    # Create or retrieve the vector database collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection(
        "rag_collection",
        embedding_function=embed_func
    )

    # Add documents with metadata to the collection
    collection.add(documents=documents, ids=ids, metadatas=metadatas)

    # TODO: Define a query and category to test the retrieval function
    user_query = ""
    user_category = ""

    # TODO: Retrieve chunks matching the query and category
    retrieved = []

    # TODO: Implement logic to handle empty results, build prompt, and get LLM response
    # Print appropriate messages or the final prompt and answer


```

```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response


def retrieve_top_chunks(query, collection, category=None, top_k=3):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection',
    optionally filtered by category. Returns a list of retrieved chunks, each
    containing 'chunk' text, 'doc_id', and 'distance'.
    """
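    # Build metadata filter; use None (not an empty dict) when no category is given,
    # since an empty filter may be rejected by some Chroma versions
    where = {"category": category} if category else None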

    # Perform query with optional metadata filtering
    results = collection.query(
        query_texts=[query],
        n_results=top_k,
        where=where
    )

    retrieved_chunks = []

    # Safeguard against empty results
    docs = results.get("documents", [])
    if not docs or not docs[0]:
        return retrieved_chunks

    # Process and collect results
    for i, chunk_text in enumerate(docs[0]):
        retrieved_chunks.append({
            "chunk": chunk_text,
            "doc_id": results["ids"][0][i],
            "distance": results["distances"][0][i]
        })
    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs an LLM prompt by combining multiple retrieved chunks into a
    single context block, ensuring the model can provide context-based answers.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load corpus data from JSON file
    with open('data/corpus.json', 'r') as f:
        corpus_data = json.load(f)

    # Prepare docs, ids, and metadatas
    documents = [doc['content'] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    metadatas = [{"category": doc.get("category", "general")} for doc in corpus_data]

    # Initialize ChromaDB collection
    model_name = 'sentence-transformers/all-MiniLM-L6-v2'
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection(
        "rag_collection",
        embedding_function=embed_func
    )

    # Add documents with metadata
    collection.add(
        documents=documents,
        ids=ids,
        metadatas=metadatas
    )

    # Define query and category to test
    user_query = "What are some recent technological breakthroughs?"
    user_category = "AI"

    # Retrieve using metadata filter
    retrieved = retrieve_top_chunks(user_query, collection, category=user_category, top_k=3)

    if not retrieved:
        print(f"No chunks found matching category '{user_category}' for query '{user_query}'.")
    else:
        # Build context-rich prompt and get LLM response
        prompt = build_prompt(user_query, retrieved)
        answer = get_llm_response(prompt)

        # Display results
        print("--- Retrieved Chunks ---")
        for rc in retrieved:
            print(f"Doc ID: {rc['doc_id']}, Distance: {rc['distance']}")
            print(f"Chunk: {rc['chunk']}\n")
        print("--- Constructed Prompt ---")
        print(prompt)
        print("--- LLM Response ---")
        print(answer)


```


I have added category-based filtering to `retrieve_top_chunks` using the `where` parameter. The `metadata_retrieval.py` script now:

1. Builds the metadata filter (`where`) when a `category` is provided.
2. Calls `collection.query()` with `where`, `query_texts`, and `n_results`.
3. Handles the no-results case by returning an empty list.
4. Collects the relevant results along with `chunk`, `doc_id`, and `distance`.
5. In the main section, tests the function with a query and a category ("AI"), printing a message if nothing is found, or displaying the chunks, prompt, and LLM answer on success.

Run it to verify the category filter and the accuracy of the context!
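
One detail worth isolating is how the filter is built when no category is given. The sketch below assembles the keyword arguments for `collection.query()` and leaves `where` out entirely in that case, on the assumption that omitting it is the safest way to say "no filter". `build_where` and `query_kwargs` are hypothetical helper names, not part of Chroma.

```python
def build_where(category=None):
    """Return a Chroma-style metadata filter, or None when no filter is wanted."""
    if not category:
        return None
    return {"category": category}


def query_kwargs(query, top_k=3, category=None):
    """Assemble keyword arguments for collection.query(), omitting 'where' if unused."""
    kwargs = {"query_texts": [query], "n_results": top_k}
    where = build_where(category)
    if where is not None:
        kwargs["where"] = where
    return kwargs


print(query_kwargs("technological breakthroughs", category="AI"))
print(query_kwargs("technological breakthroughs"))
```

The resulting dictionary can be unpacked into the call as `collection.query(**query_kwargs(...))`.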


## Refine Retrieval with Distance Threshold

Well done on mastering metadata-based filtering in your previous exercise! Now, let's elevate your retrieval process by incorporating a distance threshold. This enhancement will ensure that only the most semantically relevant chunks are considered, refining your LLM responses for greater precision and contextual accuracy.

To achieve this, follow these steps:

1. Modify the `retrieve_top_chunks` function to include a distance threshold. Ensure that only chunks with a similarity distance below a specified cutoff are included. You should choose an appropriate distance threshold yourself!
2. Seamlessly integrate this distance-based filtering with the existing metadata filtering. Both category constraints and distance constraints should be applied simultaneously.
3. Handle scenarios where no chunks meet the criteria gracefully. Instead of generating a prompt, provide a friendly message indicating that no relevant chunks were found.

This exercise will deepen your understanding of retrieval logic and reinforce your skills in context construction. Dive in and see how effectively you can guide the LLM with precise and relevant context!
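
The core of the distance check can be tried in isolation before wiring it into the query. The sketch below filters a list of already-retrieved chunks; `filter_by_distance` is a hypothetical helper, the chunks and distances are invented, and the cutoff of 1.0 is only an example, since a sensible value depends on the embedding model and on the distance metric your collection uses.

```python
def filter_by_distance(chunks, distance_threshold):
    """Keep only chunks whose distance is strictly below the threshold."""
    return [c for c in chunks if c["distance"] < distance_threshold]


# Invented candidates, shaped like the output of retrieve_top_chunks().
candidates = [
    {"chunk": "Closely related text.", "doc_id": "chunk_1_0", "distance": 0.35},
    {"chunk": "Somewhat related text.", "doc_id": "chunk_2_0", "distance": 0.92},
    {"chunk": "Unrelated text.", "doc_id": "chunk_3_0", "distance": 1.40},
]

kept = filter_by_distance(candidates, distance_threshold=1.0)
if kept:
    for c in kept:
        print(c["doc_id"], c["distance"])
else:
    print("No relevant chunks found for this query.")
```

Printing the distances for a few representative queries is a practical way to pick a threshold for your own data.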


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response


def retrieve_top_chunks(query, collection, category=None, top_k=3, distance_threshold=1.0):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection',
    optionally filtered by category, and only includes those whose distance is
    below the specified distance_threshold. Returns a list of retrieved chunks,
    each containing 'chunk', 'doc_id', and 'distance'.
    """
    where = {"category": category} if category is not None else None

    results = collection.query(
        query_texts=[query],
        where=where,
        n_results=top_k
    )


    retrieved_chunks = []
    if not results["documents"] or not results["documents"][0]:
        return retrieved_chunks

    # TODO: Process the results and append chunks that meet the distance threshold
    # For each chunk in results, check if its distance is below distance_threshold
    # If it qualifies, add it to retrieved_chunks with chunk text, doc_id, and distance

    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs a prompt by combining the query and retrieved chunks into a
    context block, guiding the LLM to provide a context-based answer.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load corpus data from JSON file
    with open("data/corpus.json", "r") as f:
        corpus_data = json.load(f)

    # Prepare documents, ids, and metadatas
    documents = [doc["content"] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    metadatas = [{"category": doc.get("category", "")} for doc in corpus_data]

    # Create or retrieve the vector database collection
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection("rag_collection", embedding_function=embed_func)

    # Add documents with metadata to the collection
    collection.add(documents=documents, ids=ids, metadatas=metadatas)

    # Define query parameters (query string, category, and distance threshold)
    user_query = "What are the latest AI breakthroughs?"  # Example query
    user_category = "Technology"
    threshold = 1.0

    # Retrieve and filter chunks
    filtered_chunks = retrieve_top_chunks(
        query=user_query,
        collection=collection,
        category=user_category,
        top_k=5,
        distance_threshold=threshold
    )

    # TODO: Handle the filtered chunks:
    # - If no chunks found, print a user-friendly message
    # - Otherwise, build the prompt and get LLM response


```


```python
import json
from chromadb import Client
from chromadb.config import Settings
from chromadb.utils import embedding_functions
from scripts.llm import get_llm_response


def retrieve_top_chunks(query, collection, category=None, top_k=3, distance_threshold=0.8):
    """
    Retrieves the top_k chunks most relevant to the given query from 'collection',
    optionally filtered by category, and only includes those whose distance is
    at or below the specified distance_threshold. Returns a list of retrieved
    chunks, each containing 'chunk', 'doc_id', and 'distance'.
    """
    # Build metadata filter
    where = {"category": category} if category else None

    # Perform query with optional metadata filtering
    results = collection.query(
        query_texts=[query],
        where=where,
        n_results=top_k
    )

    retrieved_chunks = []
    docs = results.get("documents", [])
    ids = results.get("ids", [])
    distances = results.get("distances", [])

    # Safeguard against empty results
    if not docs or not docs[0]:
        return retrieved_chunks

    # Filter by distance threshold
    for idx, chunk_text in enumerate(docs[0]):
        dist = distances[0][idx]
        if dist <= distance_threshold:
            retrieved_chunks.append({
                "chunk": chunk_text,
                "doc_id": ids[0][idx],
                "distance": dist
            })
    return retrieved_chunks


def build_prompt(query, retrieved_chunks):
    """
    Constructs a prompt by combining the query and retrieved chunks into a
    context block, guiding the LLM to provide a context-based answer.
    """
    prompt = f"Question: {query}\nAnswer using only the following context:\n"
    for rc in retrieved_chunks:
        prompt += f"- {rc['chunk']}\n"
    prompt += "Answer:"
    return prompt


if __name__ == "__main__":
    # Load corpus data from JSON file
    with open("data/corpus.json", "r") as f:
        corpus_data = json.load(f)

    # Prepare documents, ids, and metadatas
    documents = [doc["content"] for doc in corpus_data]
    ids = [f"chunk_{doc['id']}_0" for doc in corpus_data]
    metadatas = [{"category": doc.get("category", "general")} for doc in corpus_data]

    # Initialize ChromaDB collection
    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    embed_func = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=model_name)
    client = Client(Settings())
    collection = client.get_or_create_collection(
        "rag_collection",
        embedding_function=embed_func
    )

    # Add documents with metadata
    collection.add(
        documents=documents,
        ids=ids,
        metadatas=metadatas
    )

    # Define query parameters
    user_query = "What are the latest AI breakthroughs?"
    user_category = "Technology"
    threshold = 0.75

    # Retrieve and filter chunks
    filtered_chunks = retrieve_top_chunks(
        query=user_query,
        collection=collection,
        category=user_category,
        top_k=5,
        distance_threshold=threshold
    )

    # Handle no results case
    if not filtered_chunks:
        print(f"No relevant chunks found for query '{user_query}' "
              f"in category '{user_category}' with threshold <= {threshold}.")
    else:
        # Build the context-rich prompt
        prompt = build_prompt(user_query, filtered_chunks)
        # Get answer from the LLM
        answer = get_llm_response(prompt)

        # Display filtered chunks
        print("--- Filtered Chunks ---")
        for rc in filtered_chunks:
            print(f"Doc ID: {rc['doc_id']} | Distance: {rc['distance']}")
            print(f"Chunk: {rc['chunk']}\n")

        # Display prompt and answer
        print("--- Constructed Prompt ---")
        print(prompt)
        print("--- LLM Response ---")
        print(answer)

```

I've added a distance filter (`distance_threshold`) to `retrieve_top_chunks`, ensuring that only chunks whose distance is at or below the chosen threshold are returned. The code also applies the category filter at the same time and handles the no-results case by printing a user-friendly message. Try running `refine_retrieval_threshold.py` to see how your retrieval improves!
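The exercise leaves the choice of threshold open. One way to approach it, sketched here under assumptions, is to count how many retrieved chunks survive at several candidate cutoffs before committing to one. The helper name `count_within_thresholds` and the sample distances below are hypothetical; the input is assumed to have the nested-list shape that `collection.query()` returns for a single query, and real distance values depend on the embedding model and the collection's distance metric.

```python
def count_within_thresholds(results, candidates):
    """
    For each candidate cutoff, count how many retrieved chunks have a
    distance at or below it. 'results' is assumed to follow the nested-list
    shape of a single-query collection.query() result.
    """
    # Fall back to one empty result list if 'distances' is missing or empty
    distance_lists = results.get("distances") or [[]]
    distances = distance_lists[0]
    return {
        cutoff: sum(1 for d in distances if d <= cutoff)
        for cutoff in candidates
    }


# Hypothetical values for illustration only, not real query output
sample_results = {
    "ids": [["chunk_1_0", "chunk_2_0", "chunk_3_0", "chunk_4_0"]],
    "documents": [["doc A", "doc B", "doc C", "doc D"]],
    "distances": [[0.42, 0.71, 0.98, 1.35]],
}

counts = count_within_thresholds(sample_results, [0.5, 0.75, 1.0, 1.5])
for cutoff, kept in counts.items():
    print(f"threshold <= {cutoff}: {kept} chunk(s) kept")
```

If a cutoff keeps every chunk, it is probably not filtering anything useful; if it keeps none for queries you know the corpus can answer, it is likely too strict. Running this against a handful of representative queries gives a rough sense of where relevant and irrelevant chunks separate.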
