# Lesson 3: Handling Overlaps and Summarization in RAG Pipelines


Welcome to our third lesson of this course about improving Retrieval-Augmented Generation (RAG) pipelines! In our previous sessions, we explored constrained generation to reduce hallucinations and iterative retrieval to refine how we search for relevant context. Now, we will focus on managing multiple, potentially repetitive chunks of text by detecting overlaps and summarizing them. This ensures that your final answer is both concise and comprehensive. Let's jump in!

## Why Summarize And Check Overlaps

Sometimes your system will retrieve numerous chunks that carry the same core insight, especially when your corpus has repeated sections. Directly showing all of that content might confuse the end user and clutter the final answer.

By integrating overlap detection and summarization, you can:

- **Reduce Redundancy**: Merge repetitive chunks so readers don’t have to sift through duplicated text.  
- **Enhance Readability**: Provide a cleaner, streamlined overview rather than repeating the same facts.  
- **Improve LLM Performance**: Concentrate the LLM’s attention on crucial details, helping it generate more accurate output.  

This strategy elevates your RAG pipeline: first, detect if multiple chunks are too similar; then decide whether to compile them into a single summary or simply present them as-is.

## Overlap Detection In Action

To illustrate how you might detect repeated content, here’s a simple function that checks lexical (word-level) overlap among chunks. In a more robust system, you would rely on embeddings-based similarity, but this example captures the core concept:

```python
def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    text_sets = [set(c["text"].split()) for c in chunks]
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            overlap = len(text_sets[i].intersection(text_sets[j])) / max(len(text_sets[i]), 1)
            if overlap > similarity_threshold:
                return True
    return False
```

**What’s happening here?**

1. We set a `similarity_threshold` to decide when two chunks have an especially large overlap in vocabulary.
2. If that threshold is exceeded, the function returns `True`, signaling significant redundancy.
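
One property worth noticing is that the ratio divides by the size of the first chunk's word set only, so the result depends on which chunk comes first. The sketch below illustrates this with two hypothetical helpers, `overlap_ratio` and `symmetric_overlap_ratio` (neither is part of the lesson code), and two made-up sentences:

```python
def overlap_ratio(words_a, words_b):
    """Share of words_a's vocabulary that also appears in words_b (order matters)."""
    return len(words_a & words_b) / max(len(words_a), 1)


def symmetric_overlap_ratio(words_a, words_b):
    """Same intersection, divided by the larger set, so order does not matter."""
    return len(words_a & words_b) / max(len(words_a), len(words_b), 1)


# Illustrative sentences only; the first is fully contained in the second.
short = set("refunds are processed weekly".split())
long = set("refunds are processed weekly by the finance team every Friday".split())

forward = overlap_ratio(short, long)              # 4 shared / 4 words  = 1.0
backward = overlap_ratio(long, short)             # 4 shared / 10 words = 0.4
symmetric = symmetric_overlap_ratio(short, long)  # 4 shared / 10 words = 0.4
print(forward, backward, symmetric)
```

Because the lesson's loop always compares chunk `i` against a later chunk `j`, a short chunk that precedes a longer one containing it is flagged, while the same pair in the opposite order is not. Dividing by the larger set removes that dependence on order, at the cost of flagging fewer pairs.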

While this placeholder approach is simplistic, it’s enough for demonstration. Embeddings-based techniques are more advanced, capturing semantic overlap rather than just word overlap.
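
As a rough sketch of what the embeddings-based variant could look like, the code below assumes you already have one embedding vector per chunk (how you obtain them depends on your embedding model and is not shown). The function names and the `0.9` threshold are illustrative assumptions, not part of the lesson code:

```python
import math
from itertools import combinations


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def are_embeddings_overlapping(embeddings, similarity_threshold=0.9):
    """Return True if any pair of vectors is more similar than the threshold."""
    return any(
        cosine_similarity(a, b) > similarity_threshold
        for a, b in combinations(embeddings, 2)
    )


# Toy 3-dimensional vectors standing in for real embeddings (illustrative only).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],  # nearly parallel to the first vector
    [0.0, 0.0, 1.0],  # orthogonal to the first vector
]
has_overlap = are_embeddings_overlapping(toy_embeddings)
print(has_overlap)
```

Unlike the word-set ratio, cosine similarity is symmetric, so the order of the chunks does not affect the result. A suitable threshold depends on the embedding model and would need tuning on your own data.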

## Summarizing Long Text Chunks

When you detect overlapping chunks—or simply have many chunks—it often makes sense to condense them into a single summary. Doing so keeps the final context more focused:

```python
def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks were retrieved, return a notice instead of calling the LLM.
    - If the summary is too short or signals "Summary not possible", fall back
      to the combined original text.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )
    summary = get_llm_response(prompt).strip()

    if len(summary) < 20 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary
```

**How it works:**

1. Combine chunks into a single string.
2. Form a prompt explicitly asking the LLM for a brief but thorough summary.
3. If the LLM produces something unusually short or “not possible,” return the original text, ensuring nothing is lost.
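
The fallback check in step 3 is an exact, case-sensitive substring match, so a reply such as `summary not possible.` would slip past it. One possible way to tighten this is sketched below; `is_usable_summary` is a hypothetical helper, and the case-insensitive match is an extension of the lesson's check rather than part of it:

```python
def is_usable_summary(summary, min_length=20, sentinel="summary not possible"):
    """
    Decide whether an LLM summary is worth keeping.

    min_length mirrors the 20-character cutoff used in summarize_chunks;
    the sentinel comparison ignores case (an assumption added here).
    """
    cleaned = summary.strip()
    if len(cleaned) < min_length:
        return False
    return sentinel not in cleaned.lower()


# Made-up LLM replies for illustration.
examples = [
    "This document outlines the key policies on leave and expenses.",
    "summary not possible.",  # 21 characters, lower-case sentinel
    "OK",                     # far too short
]
results = [is_usable_summary(text) for text in examples]
print(results)
```

The second example is 21 characters long and starts with a lower-case "s", so the original condition would accept it as a valid summary; the normalized check rejects it.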

## Generating The Final Answer

After deciding whether to use a direct set of chunks or a merged summary, you need to craft the actual response for the user’s query:

```python
def final_generation(query, context):
    """
    Provide the final answer using either the summarized or plain context.
    If no context is available, fallback is triggered.
    """
    if not context.strip():
        return "I'm sorry, but I couldn't find any relevant information."

    prompt = (
        f"Question: {query}\n"
        f"Context:\n{context}\n"
        "Answer:"
    )
    return get_llm_response(prompt)
```

**Key points:**

* If no context is available, we immediately let the user know.
* When context is present, we embed both the user query and the retrieved text into a prompt, so the LLM can produce a final, context-aware answer.
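
To see both branches without calling a real model, the sketch below swaps in a stand-in `get_llm_response` that records the prompt and returns a canned reply. The stand-in, the `captured_prompts` list, and the sample query and context are all assumptions made for illustration:

```python
captured_prompts = []


def get_llm_response(prompt):
    """Stand-in for the course's LLM helper: records the prompt, returns a canned reply."""
    captured_prompts.append(prompt)
    return "stubbed answer"


def final_generation(query, context):
    """Same logic as the lesson's function, repeated so the sketch is self-contained."""
    if not context.strip():
        return "I'm sorry, but I couldn't find any relevant information."
    prompt = f"Question: {query}\nContext:\n{context}\nAnswer:"
    return get_llm_response(prompt)


# Whitespace-only context: the LLM is never called.
empty_result = final_generation("What is our leave policy?", "   ")

# Non-empty context: the prompt is built and passed to the (stubbed) LLM.
answer = final_generation("What is our leave policy?", "- Staff receive 25 days of leave.")

print(empty_result)
print(captured_prompts[0])
```

Stubbing the LLM call this way makes it easy to check the exact prompt text in a unit test before spending tokens on a real request.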

## Putting It All Together

Below is an example flow that ties these functions together, from retrieving chunks to deciding whether a summary is needed and then generating the final answer. Comments mark each stage:

```python
# Load or generate chunks from the corpus
chunked_docs = load_and_chunk_corpus("my_corpus_file.json", chunk_size=40)
collection   = build_chroma_collection(chunked_docs, "my_collection_id")

user_query       = "Provide an overview of our internal policies."
retrieval_results = collection.query(query_texts=[user_query], n_results=5)

# If no chunks are retrieved, provide a fallback answer
if not retrieval_results['documents'][0]:
    print("No chunks were retrieved for the query.")
    final_answer = "No relevant information found."
else:
    # Collect the retrieved text chunks
    retrieved_chunks = []
    for doc_text in retrieval_results['documents'][0]:
        retrieved_chunks.append({"text": doc_text})

    # Decide whether to summarize based on chunk count or overlap
    if len(retrieved_chunks) > 3 or are_chunks_overlapping(retrieved_chunks):
        context = summarize_chunks(retrieved_chunks)
    else:
        # If no major overlap, just list chunks plainly
        context = "\n".join(f"- {c['text']}" for c in retrieved_chunks)

    # Generate the final answer using either the combined summary or raw chunks
    final_answer = final_generation(user_query, context)

print(f"Final answer:\n{final_answer}")
```

**Step-by-step overview:**

1. **Load & Build:** Load the corpus into `chunked_docs` and build a vector-based collection.
2. **Query the Collection:** Fetch the top five relevant documents for a given user query.
3. **Overlap Logic:** If these chunks are numerous (more than three) or appear heavily duplicated, consolidate them into a summary. Otherwise, present them as a list.
4. **Final Generation:** Create a user-facing answer by combining the query with our selected context (summarized or raw).
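
The decision in step 3 can be isolated into a small function, which makes it testable without a vector store or an LLM. This is a minimal sketch: `choose_context` and the stand-in callables are hypothetical names, and the `max_raw_chunks=3` default simply mirrors the `> 3` check in the flow above:

```python
def choose_context(chunks, overlap_check, summarizer, max_raw_chunks=3):
    """
    Return a summary when there are many or overlapping chunks,
    otherwise a plain bulleted list of the chunk texts.
    """
    if len(chunks) > max_raw_chunks or overlap_check(chunks):
        return summarizer(chunks)
    return "\n".join(f"- {c['text']}" for c in chunks)


# Stand-ins so the sketch runs on its own.
def never_overlapping(chunks):
    return False


def always_overlapping(chunks):
    return True


def fake_summarizer(chunks):
    return f"summary of {len(chunks)} chunks"


two_chunks = [{"text": "Policy A"}, {"text": "Policy B"}]
five_chunks = [{"text": f"Policy {n}"} for n in range(5)]

raw_context = choose_context(two_chunks, never_overlapping, fake_summarizer)
summary_by_overlap = choose_context(two_chunks, always_overlapping, fake_summarizer)
summary_by_count = choose_context(five_chunks, never_overlapping, fake_summarizer)
print(raw_context)
```

Note that with `n_results=5` and a cutoff of three, a full result set always goes to the summarizer, so the overlap check only affects the outcome when three or fewer chunks come back.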

## Conclusion And Next Steps

You’ve now learned how to detect overlapping chunks in retrieved text and generate a summarized version where it makes sense. This intermediate step can significantly improve readability and relevance for your end users, especially when working with large and repetitive corpora.

Keep experimenting, and have fun optimizing your RAG system!



## Detecting Overlapping Text Chunks

Congratulations on reaching this point in the lesson! You've already made great strides in understanding the fundamentals of RAG pipelines. Now, it's time to apply what you've learned by tackling the `are_chunks_overlapping` function. This exercise will help you enhance your skills in detecting when chunks of text are too similar.

In this exercise, your objective is to complete the missing parts of the `are_chunks_overlapping` function to ensure it accurately identifies overlapping content. Here's how you can achieve this:

1. **Split the text:** For each chunk, break the text into individual words.
2. **Create sets:** Convert these words into sets to facilitate overlap comparison.
3. **Calculate overlap:** Determine the overlap ratio between pairs of chunks by finding the intersection of their word sets and dividing by the size of the larger set, as the hint in the starter code suggests. (The lesson's version divides by the size of the first set instead; the two agree whenever both sets are the same size.)
4. **Compare with threshold:** Check if this overlap ratio exceeds the `similarity_threshold`. If it does, the function should return `True`, indicating significant redundancy.

Imagine you have two chunks: "The quick brown fox" and "The quick fox jumps." Your goal is to compute the overlap ratio, which in this case would be 0.75 (3 of the 4 words in each chunk are shared). If this ratio is greater than the threshold, the function should flag these chunks as overlapping.
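
The arithmetic for this pair can be checked directly. The snippet below is a small worked example using the same whitespace split as the lesson code; the variable names are illustrative:

```python
first = set("The quick brown fox".split())
second = set("The quick fox jumps.".split())  # "jumps." keeps its trailing period

shared = first & second
ratio = len(shared) / max(len(first), len(second), 1)

print(sorted(shared))  # ['The', 'fox', 'quick']
print(ratio)           # 0.75
print(ratio > 0.8)     # False
```

With the default `similarity_threshold` of 0.8, a ratio of 0.75 is not high enough, so this pair would not be flagged unless you lower the threshold.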

By completing this exercise, you'll deepen your understanding of overlap detection and make your RAG pipeline smarter and more efficient. Dive in and see how you can improve the function!


```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    # TODO: Create a list of sets where each set contains the words from a chunk
    # Hint: Use list comprehension with split() to get words and convert to set
    text_sets = ___

    # Compare each pair of chunks
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            # TODO: Calculate the overlap ratio between chunks i and j
            # Hint: Use set intersection and divide by the larger set's size
            overlap = ___

            if overlap > similarity_threshold:
                return True
    return False

def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks or user decides not to summarize, we skip.
    - If the summary is too short or drops essential info, we can fallback or retry.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )

    summary = get_llm_response(prompt).strip()

    if len(summary) < 20 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary

def final_generation(query, context):
    """
    Provide the final answer using either the summarized or plain context.
    If no context is available, fallback is triggered.
    """
    if not context.strip():
        return "I'm sorry, but I couldn't find any relevant information."

    prompt = (
        f"Question: {query}\n"
        f"Context:\n{context}\n"
        "Answer:"
    )
    return get_llm_response(prompt)

if __name__ == "__main__":
    chunked_docs = load_and_chunk_corpus("data/corpus.json", chunk_size=40)
    collection = build_chroma_collection(chunked_docs, "summary_demo_collection")

    user_query = "Provide an overview of our internal policies."
    retrieval_results = collection.query(query_texts=[user_query], n_results=5)

    if not retrieval_results['documents'][0]:
        print("No chunks were retrieved for the query.")
        final_answer = "No relevant information found."
    else:
        retrieved_chunks = []
        for doc_text in retrieval_results['documents'][0]:
            retrieved_chunks.append({"text": doc_text})

        # Decide whether to summarize based on the number or overlap of chunks
        if len(retrieved_chunks) > 3 or are_chunks_overlapping(retrieved_chunks):
            context = summarize_chunks(retrieved_chunks)
        else:
            context = "\n".join(f"- {c['text']}" for c in retrieved_chunks)

        final_answer = final_generation(user_query, context)

    print(f"Final answer:\n{final_answer}")


```

**Filling in the function code**

The two blanks in `are_chunks_overlapping` can be completed as follows:

* For `text_sets`, use a list comprehension: `text_sets = [set(chunk["text"].split()) for chunk in chunks]`. It splits each chunk's `"text"` into words and turns them into a set of unique words.
* For `overlap`, compute `overlap = len(text_sets[i].intersection(text_sets[j])) / max(len(text_sets[i]), len(text_sets[j]))`, dividing by the larger set's size to get the overlap ratio.

**Clarifying the overlap calculation**

The lesson code and the starter-code hint use two different denominators: the size of the first set versus the size of the larger set. The lesson code divides by `max(len(text_sets[i]), 1)`, where the `1` only guards against division by zero. The hint asks for the larger set's size, which is the more conservative choice: with the default threshold, a pair qualifies only when the shared words cover more than 80% of the larger chunk's vocabulary. The solution below follows the hint and divides by the larger set's size.




Here’s the completed `are_chunks_overlapping` with the two TODOs filled in:

```python
def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    # Create a list of sets where each set contains the words from a chunk
    text_sets = [set(c["text"].split()) for c in chunks]

    # Compare each pair of chunks
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            # Calculate the overlap ratio between chunks i and j
            overlap = len(text_sets[i].intersection(text_sets[j])) / max(
                len(text_sets[i]), len(text_sets[j]), 1
            )

            if overlap > similarity_threshold:
                return True

    return False
```

**What we did:**

1. **Built `text_sets`** via list-comprehension: one `set` of words per chunk.
2. **Computed `overlap`** as

   ```python
   overlap = (
       len(text_sets[i] & text_sets[j])
       / max(len(text_sets[i]), len(text_sets[j]), 1)
   )
   ```

   so we’re dividing by the larger word‐set size (with a `1` guard to avoid zero‐division).
3. **Return `True`** as soon as any pair exceeds the `similarity_threshold`.


## Fix the Summarization Logic

Congratulations on mastering overlap detection in the previous exercise! Now, let's shift our focus to enhancing the summarization process within your RAG pipeline. Your goal is to fix a logical bug in the `summarize_chunks` function, which currently defaults to returning the original text, even when a valid summary is generated.

To achieve this, follow these steps:

1. **Inspect the logic:** Examine the conditions that trigger the fallback to the original text.
2. **Correct the checks:** Ensure the function accurately evaluates whether the summary is too short or contains "Summary not possible."
3. **Test the function:** Verify that it returns a concise summary when appropriate and only falls back when necessary.

For instance, if the LLM generates a summary like "This document outlines the key policies," the function should return this summary instead of the original text. However, if the summary is "Summary not possible" or is unusually brief, the function should revert to the original content.

Dive in and make the `summarize_chunks` function smarter and more reliable!

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    text_sets = [set(c["text"].split()) for c in chunks]
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            overlap = len(text_sets[i].intersection(text_sets[j])) / max(len(text_sets[i]), 1)
            if overlap > similarity_threshold:
                return True
    return False

def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks or user decides not to summarize, we skip.
    - If the summary is too short or drops essential info, we can fallback or retry.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )

    summary = get_llm_response(prompt).strip()

    if len(summary) > 0 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary

def final_generation(query, context):
    """
    Provide the final answer using either the summarized or plain context.
    If no context is available, fallback is triggered.
    """
    if not context.strip():
        return "I'm sorry, but I couldn't find any relevant information."

    prompt = (
        f"Question: {query}\n"
        f"Context:\n{context}\n"
        "Answer:"
    )
    return get_llm_response(prompt)

if __name__ == "__main__":
    chunked_docs = load_and_chunk_corpus("data/corpus.json", chunk_size=40)
    collection = build_chroma_collection(chunked_docs, "summary_demo_collection")

    user_query = "Provide an overview of our internal policies."
    retrieval_results = collection.query(query_texts=[user_query], n_results=5)

    if not retrieval_results['documents'][0]:
        print("No chunks were retrieved for the query.")
        final_answer = "No relevant information found."
    else:
        retrieved_chunks = []
        for doc_text in retrieval_results['documents'][0]:
            retrieved_chunks.append({"text": doc_text})

        # Decide whether to summarize based on the number or overlap of chunks
        if len(retrieved_chunks) > 3 or are_chunks_overlapping(retrieved_chunks):
            context = summarize_chunks(retrieved_chunks)
        else:
            context = "\n".join(f"- {c['text']}" for c in retrieved_chunks)

        final_answer = final_generation(user_query, context)

    print(f"Final answer:\n{final_answer}")


```



Here’s the corrected version of **only** the `summarize_chunks` function—fixing the bug so that it falls back **only** when the summary is too short or explicitly “not possible,” and otherwise returns the LLM’s summary:

```python
def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks, return a notice.
    - If the summary is too short or signals "Summary not possible", fall back.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )

    summary = get_llm_response(prompt).strip()

    # Fixed: fall back only when the summary is too short or explicitly not possible
    if len(summary) < 20 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary
```

**What changed?**

* The `if` now checks for `len(summary) < 20` (too short) **or** the phrase `"Summary not possible"`.
* We removed the erroneous `len(summary) > 0` which was always true for any non-empty summary.

With this in place, any valid summary of at least 20 characters will be returned, and only genuinely unusable outputs will trigger the full-text fallback.
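
To make the difference concrete, the sketch below evaluates the original and the corrected condition side by side on a few sample summaries. The two helper names are hypothetical; the conditions themselves are copied from the buggy and fixed versions of `summarize_chunks`:

```python
def buggy_should_fall_back(summary):
    """Original condition: true for every non-empty summary."""
    return len(summary) > 0 or "Summary not possible" in summary


def fixed_should_fall_back(summary):
    """Corrected condition: true only for short or refused summaries."""
    return len(summary) < 20 or "Summary not possible" in summary


candidates = [
    "This document outlines the key policies.",  # valid summary
    "Summary not possible",                      # explicit refusal
    "Too short",                                 # under 20 characters
    "",                                          # empty reply
]

# Each pair is (buggy result, fixed result).
comparison = [(buggy_should_fall_back(s), fixed_should_fall_back(s)) for s in candidates]
for text, pair in zip(candidates, comparison):
    print(repr(text), pair)
```

The first row shows the bug: a perfectly good summary triggers the fallback. The last row shows a second, quieter problem with the original condition: an empty reply does not trigger the fallback, so the function would have returned an empty string as the "summary".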


## Crafting the Final Answer

Congratulations on reaching the final exercise of this unit! You've done an excellent job so far, especially with the previous exercise where you integrated a summarize-or-not flag into the pipeline. Now, it's time to bring everything together and implement the `final_generation` function from scratch.

This function is essential for crafting the final answer in your RAG pipeline by combining the user's query with either a summarized or raw chunk context. Here's what you need to focus on:

**Create the function:** Develop a function named `final_generation` that accepts two parameters: `query` and `context`.

**Handle empty context:** If the context is empty or contains only whitespace, ensure the function returns a graceful fallback message like "I'm sorry, but I couldn't find any relevant information."

**Formulate the prompt:** When context is available, construct a prompt that includes both the user's query and the context. Format the prompt as follows:

```text
Question: {query}
Context:
{context}
Answer:
```

**Generate the response:** Utilize the `get_llm_response` function to derive the final answer from the LLM based on the constructed prompt.

For instance, if the user's query is "What are our internal policies?" and the context is a detailed summary of those policies, your function should generate a coherent and context-aware answer. This exercise will test your ability to integrate different components of the RAG pipeline and ensure they work seamlessly together. Dive in and showcase your skills!

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    text_sets = [set(c["text"].split()) for c in chunks]
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            overlap = len(text_sets[i].intersection(text_sets[j])) / max(len(text_sets[i]), 1)
            if overlap > similarity_threshold:
                return True
    return False

def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks or user decides not to summarize, we skip.
    - If the summary is too short or drops essential info, we can fallback or retry.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )

    summary = get_llm_response(prompt).strip()

    if len(summary) < 20 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary

# TODO: Implement the final_generation function that generates answers based on context
# Hint: Check if context is empty, then create a prompt with query and context

if __name__ == "__main__":
    chunked_docs = load_and_chunk_corpus("data/corpus.json", chunk_size=40)
    collection = build_chroma_collection(chunked_docs, "summary_demo_collection")

    user_query = "Provide an overview of our internal policies."
    retrieval_results = collection.query(query_texts=[user_query], n_results=5)

    if not retrieval_results['documents'][0]:
        print("No chunks were retrieved for the query.")
        final_answer = "No relevant information found."
    else:
        retrieved_chunks = []
        for doc_text in retrieval_results['documents'][0]:
            retrieved_chunks.append({"text": doc_text})

        # Decide whether to summarize based on the number or overlap of chunks
        if len(retrieved_chunks) > 3 or are_chunks_overlapping(retrieved_chunks):
            context = summarize_chunks(retrieved_chunks)
        else:
            context = "\n".join(f"- {c['text']}" for c in retrieved_chunks)

        # TODO: Replace None with a call to your final_generation function
        final_answer = None

    print(f"Final answer:\n{final_answer}")


```



```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def are_chunks_overlapping(chunks, similarity_threshold=0.8):
    """
    Basic check for overlapping or highly similar chunk texts.
    In a production system, you'd compute embeddings for each chunk
    and measure pairwise similarity. Here, we simply check if chunks
    have large lexical overlap (placeholder approach).
    """
    if len(chunks) < 2:
        return False

    text_sets = [set(c["text"].split()) for c in chunks]
    for i in range(len(text_sets) - 1):
        for j in range(i + 1, len(text_sets)):
            overlap = len(text_sets[i].intersection(text_sets[j])) / max(len(text_sets[i]), 1)
            if overlap > similarity_threshold:
                return True
    return False

def summarize_chunks(chunks):
    """
    Combine multiple chunks into a single summary with an LLM.
    - If no chunks, return a notice.
    - If the summary is too short or signals "Summary not possible", fall back.
    """
    if not chunks:
        return "No relevant chunks were retrieved."

    combined_text = "\n".join(c["text"] for c in chunks)
    prompt = (
        "You are an expert summarizer. Please generate a concise summary of the following text.\n"
        "Do not omit critical details that might answer the user's query.\n"
        "If you cannot produce a meaningful summary, just say 'Summary not possible'.\n\n"
        f"Text:\n{combined_text}\n\nSummary:"
    )

    summary = get_llm_response(prompt).strip()

    if len(summary) < 20 or "Summary not possible" in summary:
        print("Summary was too short or not possible. Providing full chunks instead.")
        return combined_text

    return summary

def final_generation(query, context):
    """
    Provide the final answer using either the summarized or raw context.
    If context is empty or only whitespace, return a fallback message.
    """
    if not context or not context.strip():
        return "I'm sorry, but I couldn't find any relevant information."

    prompt = (
        f"Question: {query}\n"
        f"Context:\n{context}\n"
        "Answer:"
    )
    return get_llm_response(prompt)

if __name__ == "__main__":
    chunked_docs = load_and_chunk_corpus("data/corpus.json", chunk_size=40)
    collection = build_chroma_collection(chunked_docs, "summary_demo_collection")

    user_query = "Provide an overview of our internal policies."
    retrieval_results = collection.query(query_texts=[user_query], n_results=5)

    if not retrieval_results['documents'][0]:
        print("No chunks were retrieved for the query.")
        final_answer = "No relevant information found."
    else:
        retrieved_chunks = []
        for doc_text in retrieval_results['documents'][0]:
            retrieved_chunks.append({"text": doc_text})

        # Decide whether to summarize based on the number or overlap of chunks
        if len(retrieved_chunks) > 3 or are_chunks_overlapping(retrieved_chunks):
            context = summarize_chunks(retrieved_chunks)
        else:
            context = "\n".join(f"- {c['text']}" for c in retrieved_chunks)

        # Generate the final, context-aware answer
        final_answer = final_generation(user_query, context)

    print(f"Final answer:\n{final_answer}")
```
