# Lesson 2: RAG in Action: A Simple Workflow


Welcome to our second lesson in the **“Introduction to RAG”** course! In the previous lesson, you learned how RAG evolved from traditional Information Retrieval (IR). Today, we'll connect those ideas to concrete code and illustrate a simple end-to-end RAG workflow. By the end of this lesson, you'll see how **indexing**, **retrieval**, **prompt augmentation**, and **final text generation** fit together to produce a targeted answer.

In this lesson, we work through a scenario involving **“Project Chimera.”** Think of it as an internal company project. Here it is just an example to show how a naive, context-free system can invent inaccurate details, whereas a RAG-based system can give more reliable answers by drawing on an authoritative knowledge base. We deliberately use extremely simplified methods (like simple keyword matching) to illustrate each part of a RAG pipeline. Later, we will explore more realistic and robust approaches, such as embeddings and vector databases, for each component.

---

## The RAG Workflow: Four Key Steps

1. **Indexing:** Documents are structured in a way that makes them easy to search.  
2. **Retrieval:** The most relevant piece of text is fetched based on a user query.  
3. **Prompt (Query) Augmentation:** The retrieved text is combined with the user’s question to form a context-rich prompt.  
4. **Generation:** A language model processes the prompt and produces a final answer anchored to the provided text.

This process grounds answers in your data, reducing the risk of fabricated or off-topic responses.

---

## 1. Indexing: Organizing Documents

We start by defining our knowledge base. Below, **Project Chimera** serves as the example domain:

```python
KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "Project Chimera is a research initiative focused on developing "
            "novel bio-integrated interfaces. It aims to merge biological "
            "systems with advanced computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Neural Interface",
        "content": (
            "The core component of Project Chimera is a neural interface "
            "that allows for bidirectional communication between the brain "
            "and external devices. This interface uses biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "Potential applications of Project Chimera include advanced "
            "prosthetics, treatment of neurological disorders, and enhanced "
            "human-computer interaction. Ethical considerations are paramount."
        )
    }
}
```

- We define a Python dictionary named `KNOWLEDGE_BASE` that contains multiple documents.  
- Each entry has an ID (e.g., `"doc1"`) and both a `title` and `content` field.  
- **Project Chimera** information is now the authoritative data source for the RAG system.

> *Note:* This is a very simplified approach for educational purposes; in production, your knowledge base would likely be a vector database or other scalable store. We’ll cover that in Course 3!
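
To make "easy to search" a little more concrete, here is a minimal sketch of an inverted index, a common structure that maps each word to the IDs of the documents containing it. It is not part of the lesson's code: the `build_inverted_index` helper and the shortened two-document `docs` dictionary are assumptions made purely for illustration.

```python
from collections import defaultdict

# Shortened stand-in for KNOWLEDGE_BASE (illustrative only)
docs = {
    "doc1": {"title": "Project Chimera Overview",
             "content": "Project Chimera is a research initiative."},
    "doc2": {"title": "Chimera's Neural Interface",
             "content": "The core component is a neural interface."},
}


def build_inverted_index(documents):
    """Map each lowercase word to the set of document IDs whose content contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in doc["content"].lower().split():
            # Drop surrounding punctuation so "initiative." is stored as "initiative"
            index[word.strip(".,?!")].add(doc_id)
    return index


index = build_inverted_index(docs)
print(sorted(index["is"]))      # IDs of documents containing "is"
print(sorted(index["neural"]))  # IDs of documents containing "neural"
```

With a structure like this, a lookup touches only the documents that contain a query word instead of scanning every document. The lesson keeps the plain dictionary because it is easier to follow.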

---

## 2. Retrieval: Locating Relevant Information

Next, we create a function to return the best document from our knowledge base, based on simple keyword overlap:

```python
def rag_retrieval(query, documents):
    query_words = set(query.lower().split())
    best_doc_id = None
    best_overlap = 0
    
    for doc_id, doc in documents.items():
        # Compare the query words with the document's content words
        doc_words = set(doc["content"].lower().split())
        overlap = len(query_words.intersection(doc_words))
        
        if overlap > best_overlap:
            best_overlap = overlap
            best_doc_id = doc_id
    
    # Return the best document, or None if nothing matched
    return documents.get(best_doc_id)
```

- The query is split into lowercase words and stored in a set.  
- Each document’s text is similarly tokenized.  
- The function picks the document with the greatest word overlap.  
- If no match is found, it returns `None`.
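
Two behaviors of this function are worth seeing in action: `str.split()` keeps punctuation attached to words, and ties go to whichever document is encountered first. The sketch below repeats `rag_retrieval` so it runs on its own; the tiny `docs` dictionary is a made-up example, not the lesson's knowledge base.

```python
def rag_retrieval(query, documents):
    query_words = set(query.lower().split())
    best_doc_id = None
    best_overlap = 0
    for doc_id, doc in documents.items():
        doc_words = set(doc["content"].lower().split())
        overlap = len(query_words.intersection(doc_words))
        if overlap > best_overlap:
            best_overlap = overlap
            best_doc_id = doc_id
    return documents.get(best_doc_id)


# Hypothetical mini knowledge base for illustration
docs = {
    "a": {"title": "A", "content": "chimera neural interface"},
    "b": {"title": "B", "content": "chimera prosthetics research"},
}

# "chimera?" (with the question mark) is not equal to "chimera", so nothing overlaps
no_match = rag_retrieval("chimera?", docs)

# Both documents share exactly one word with the query; the first one wins the tie
tie = rag_retrieval("chimera", docs)

# A query with no shared words returns None
unrelated = rag_retrieval("weather forecast", docs)

print(no_match, tie["title"], unrelated)
```

The tie-breaking comes from the strict `>` comparison: a later document must beat the current best, not merely equal it.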

---

## 3. Query Augmentation: Creating Context-Rich Prompts

Once we retrieve the relevant document, we augment the user’s original question with that document’s content. Giving the model this context makes it less likely to hallucinate details:

```python
def rag_generation(query, document):
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return prompt
```

- If a document was found, we include its title and content as a “snippet” in the prompt.  
- This version simply returns the combined query+snippet prompt; the version in the next section passes it to the language model.  
- If no document matches, we still form a direct prompt.

**Example prompt** for the query `"What is the main goal of Project Chimera?"`:

> ```
> Using the following information: 'Chimera's Neural Interface: The core component of Project Chimera is a neural interface that allows for bidirectional communication between the brain and external devices. This interface uses biocompatible nanomaterials.', answer: What is the main goal of Project Chimera?
> ```
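
The retrieved document here is doc2 rather than the overview in doc1, which follows directly from how the overlap is counted. The sketch below recomputes the shared words per document for this query. The `overlap_scores` helper is a name introduced for illustration and is not part of the lesson's code; titles are omitted because the scorer only reads `content`.

```python
KNOWLEDGE_BASE = {
    "doc1": {"content": (
        "Project Chimera is a research initiative focused on developing "
        "novel bio-integrated interfaces. It aims to merge biological "
        "systems with advanced computing technologies."
    )},
    "doc2": {"content": (
        "The core component of Project Chimera is a neural interface "
        "that allows for bidirectional communication between the brain "
        "and external devices. This interface uses biocompatible "
        "nanomaterials."
    )},
    "doc3": {"content": (
        "Potential applications of Project Chimera include advanced "
        "prosthetics, treatment of neurological disorders, and enhanced "
        "human-computer interaction. Ethical considerations are paramount."
    )},
}


def overlap_scores(query, documents):
    """Return the words shared between the query and each document's content."""
    query_words = set(query.lower().split())
    return {
        doc_id: query_words & set(doc["content"].lower().split())
        for doc_id, doc in documents.items()
    }


shared = overlap_scores("What is the main goal of Project Chimera?", KNOWLEDGE_BASE)
for doc_id, words in shared.items():
    print(doc_id, len(words), sorted(words))
```

doc2 wins on common words such as "the" and "of", while "main" and "goal" match nothing, and "chimera?" misses "chimera" because of the attached question mark. This is exactly the kind of weakness that motivates better tokenization and, later in the course, embeddings.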

---

## 4. Generation: Producing Tailored Answers

Finally, let’s compare a naive approach (which might invent details) with our RAG approach (which leverages the knowledge base):

```python
def get_llm_response(prompt):
    """
    This function interfaces with a language model to generate a response
    based on the provided prompt.
    """
    pass  # Replace with actual LLM API call

def naive_generation(query):
    # Ignores the knowledge base entirely
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)

def rag_generation(query, document):
    # Augments the prompt via the knowledge base
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)

# Demonstration
query = "What is the main goal of Project Chimera?"

naive_answer = naive_generation(query)
print("Naive approach:", naive_answer)

doc = rag_retrieval(query, KNOWLEDGE_BASE)
rag_answer = rag_generation(query, doc)
print("RAG approach:", rag_answer)
```

- **`naive_generation`** may produce a plausible-sounding but incorrect answer.  
- **`rag_generation`** uses the retrieved snippet to ground its response, reducing hallucinations.
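
Because `get_llm_response` above is only a placeholder, the demonstration prints `None` until a real model call is wired in. For a dry run without any API, one option is a stand-in that echoes the prompt it receives. The `fake_llm_response` function and the `llm` parameter below are hypothetical additions that exist only to make the difference between the two prompts visible.

```python
def fake_llm_response(prompt):
    """Hypothetical stand-in for an LLM call: echoes the prompt it was given."""
    return f"[model would see {len(prompt)} characters] {prompt}"


def naive_generation(query, llm=fake_llm_response):
    # No context: the model has only the question to work with
    return llm(f"Answer directly the following query: {query}")


def rag_generation(query, document, llm=fake_llm_response):
    # Context from the knowledge base is placed in front of the question
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return llm(prompt)


query = "What is the main goal of Project Chimera?"
doc = {
    "title": "Project Chimera Overview",
    "content": "Project Chimera is a research initiative focused on developing "
               "novel bio-integrated interfaces.",
}

naive_out = naive_generation(query)
rag_out = rag_generation(query, doc)
print(naive_out)
print(rag_out)
```

Only the RAG prompt carries the knowledge-base text, so only that call gives a real model something factual to anchor its answer to.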

---

## Conclusion and Next Steps

You’ve now implemented:

1. A simple knowledge-base indexing scheme.  
2. Basic retrieval to find the most relevant document.  
3. Prompt augmentation to combine user queries with reference data.  
4. Generation that relies on actual context, lowering the chance of hallucinations.

Next up, you’ll get hands-on practice with these steps in coding exercises. As you progress, you’ll explore how RAG can be extended with **embeddings**, **vector databases**, and more robust retrieval techniques for complex tasks and real-world domains. Let’s continue unlocking the full power of RAG—onward!  


## Implementing Simple Document Retrieval

Hello! You've been doing well so far. In this exercise, you'll practice implementing a simple retrieval function to find the most relevant document from a knowledge base using keyword overlap.

Your task is to complete the `rag_retrieval` function. Here's what you need to do:

1. Split the query into lowercase words and store them in a set.
2. For each document, split its content into lowercase words and store them in a set.
3. Calculate the overlap between the query words and the document words.
4. Return the document with the highest overlap.

Keep up the good work, and let's see how you can apply what you've learned!


```python
from scripts.llm import get_llm_response

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "Project Chimera is a research initiative focused on developing "
            "novel bio-integrated interfaces. It aims to merge biological "
            "systems with advanced computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Neural Interface",
        "content": (
            "The core component of Project Chimera is a neural interface "
            "that allows for bidirectional communication between the brain "
            "and external devices. This interface uses biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "Potential applications of Project Chimera include advanced "
            "prosthetics, treatment of neurological disorders, and enhanced "
            "human-computer interaction. Ethical considerations are paramount."
        )
    }
}


def naive_generation(query):
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)


def rag_retrieval(query, documents):
    # TODO: Split the query into lowercase words and store them in a set
    # TODO: For each document, split its content into lowercase words and store them in a set
    # TODO: Calculate the overlap between the query words and document words
    # TODO: Return the document with the highest overlap
    pass


def rag_generation(query, document):
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)


if __name__ == "__main__":
    query = "What is the main goal of Project Chimera?"
    print("Naive approach:", naive_generation(query))
    retrieved_doc = rag_retrieval(query, KNOWLEDGE_BASE)
    print("RAG approach:", rag_generation(query, retrieved_doc))


```

Here’s the updated `rag_retrieval` function with the TODOs filled in. It splits the query and each document’s content into lowercase word sets, computes the overlap, and returns the document with the highest overlap (or `None` if there’s no match):

```python
def rag_retrieval(query, documents):
    # Split the query into lowercase words
    query_words = set(query.lower().split())
    
    best_doc_id = None
    best_overlap = 0
    
    for doc_id, doc in documents.items():
        # Split the document content into lowercase words
        doc_words = set(doc["content"].lower().split())
        # Compute the number of shared words
        overlap = len(query_words.intersection(doc_words))
        
        if overlap > best_overlap:
            best_overlap = overlap
            best_doc_id = doc_id
    
    # Return the best-matching document (or None)
    return documents.get(best_doc_id)
```

If you plug this back into your script, the RAG approach should now correctly pick the document with the most keyword overlap.

## Fix the Retrieval Bug

Welcome back! You've been doing well so far. In this exercise, you'll work on fixing a bug in the `rag_retrieval` function. The current implementation has a critical flaw: it only looks at the document's content but completely ignores the `title` field, which often contains important context.

Your goal is to modify the retrieval logic to consider both the title and content when calculating relevance, giving appropriate weight to matches found in either field.

Dive in and see how you can improve the code!

```python
from scripts.llm import get_llm_response

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "research initiative focused on developing novel bio-integrated "
            "interfaces, aiming at merging biological systems with advanced "
            "computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Core Component",
        "content": (
            "a neural interface that allows for bidirectional communication "
            "between the brain and external devices, using biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "advanced prosthetics, treatment of neurological disorders, "
            "enhanced human-computer interaction."
        )
    }
}


def naive_generation(query):
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)


def rag_retrieval(query, documents):
    query_words = set(query.lower().split())
    best_doc_id = None
    best_score = 0
    
    for doc_id, doc in documents.items():
        # TODO: Fix the bug by considering both title and content
        # Currently only looking at content, ignoring potentially relevant titles
        doc_words = set(doc["content"].lower().split())
        score = len(query_words.intersection(doc_words))
        
        if score > best_score:
            best_score = score
            best_doc_id = doc_id
            
    return documents.get(best_doc_id)


def rag_generation(query, document):
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)


if __name__ == "__main__":
    query = "What are the applications of Project Chimera?"
    print("Naive approach:", naive_generation(query))
    retrieved_doc = rag_retrieval(query, KNOWLEDGE_BASE)
    print("RAG approach:", rag_generation(query, retrieved_doc))


```

Here’s one way to incorporate both title and content into your relevance score, with a heavier weight on title matches (by default, each title match counts twice as much as a content match) and a simple regex-based tokenizer to strip punctuation:

```python
import re
from scripts.llm import get_llm_response

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "research initiative focused on developing novel bio-integrated "
            "interfaces, aiming at merging biological systems with advanced "
            "computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Core Component",
        "content": (
            "a neural interface that allows for bidirectional communication "
            "between the brain and external devices, using biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "advanced prosthetics, treatment of neurological disorders, "
            "enhanced human-computer interaction."
        )
    }
}


def naive_generation(query):
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)


def tokenize(text):
    """
    Simple word tokenizer: lowercase + extract alphanumeric chunks.
    """
    return set(re.findall(r"\w+", text.lower()))


def rag_retrieval(query, documents, title_weight=2):
    query_words = tokenize(query)

    best_doc_id = None
    best_score = 0

    for doc_id, doc in documents.items():
        title_words   = tokenize(doc["title"])
        content_words = tokenize(doc["content"])

        title_matches   = len(query_words & title_words)
        content_matches = len(query_words & content_words)

        # give title matches more impact
        score = title_weight * title_matches + content_matches

        if score > best_score:
            best_score = score
            best_doc_id = doc_id

    return documents.get(best_doc_id)


def rag_generation(query, document):
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)


if __name__ == "__main__":
    query = "What are the applications of Project Chimera?"
    print("Naive approach:", naive_generation(query))

    retrieved_doc = rag_retrieval(query, KNOWLEDGE_BASE)
    print("RAG approach:", rag_generation(query, retrieved_doc))
```

### What changed?

1. **Tokenization**  
   We added a `tokenize()` helper that lowercases and strips punctuation, so `"Chimera?"` and `"Chimera"` are treated the same. As a side effect, `\w+` also splits hyphenated and possessive words: `"bio-integrated"` becomes `bio` and `integrated`, and `"Chimera's"` becomes `chimera` and `s`.  
2. **Title vs. content**  
   We now separately count how many query words appear in the title (`title_matches`) and in the content (`content_matches`).  
3. **Weighted scoring**  
   By default we set `title_weight=2`, so each title match contributes twice as much as a content match. You can tweak `title_weight` to whatever balance you like.  

With this in place, a query like “What are the applications of Project Chimera?” will correctly pick up on the title “Applications of Chimera” rather than just matching on the body text.
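
To see why the weighted scorer selects doc3, it helps to print each document's score. The sketch below applies the same formula to the exercise's knowledge base. The `score_documents` helper is a name introduced here for illustration, not part of the exercise code.

```python
import re

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "research initiative focused on developing novel bio-integrated "
            "interfaces, aiming at merging biological systems with advanced "
            "computing technologies."
        ),
    },
    "doc2": {
        "title": "Chimera's Core Component",
        "content": (
            "a neural interface that allows for bidirectional communication "
            "between the brain and external devices, using biocompatible "
            "nanomaterials."
        ),
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "advanced prosthetics, treatment of neurological disorders, "
            "enhanced human-computer interaction."
        ),
    },
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def score_documents(query, documents, title_weight=2):
    """Return {doc_id: weighted score} using title_weight * title + content matches."""
    query_words = tokenize(query)
    scores = {}
    for doc_id, doc in documents.items():
        title_matches = len(query_words & tokenize(doc["title"]))
        content_matches = len(query_words & tokenize(doc["content"]))
        scores[doc_id] = title_weight * title_matches + content_matches
    return scores


scores = score_documents("What are the applications of Project Chimera?", KNOWLEDGE_BASE)
print(scores)
```

With the original content-only matching and `str.split()`, doc2 and doc3 each share a single word with the query ("the" and "of" respectively), and doc2 is returned because it comes first. The title weighting is what moves doc3 clearly ahead.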

## Fix the RAG Generation Bug

Welcome back! You've been doing well so far. In this exercise, you'll focus on fixing a bug in the `rag_generation` function. This function is supposed to create a context-rich prompt using the retrieved document, but there's a small issue that needs your attention.

Remember, every bug you fix is a step toward mastering the RAG workflow. Keep up the good work!

```python
from scripts.llm import get_llm_response

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "Project Chimera is a research initiative focused on developing "
            "novel bio-integrated interfaces. It aims to merge biological "
            "systems with advanced computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Neural Interface",
        "content": (
            "The core component of Project Chimera is a neural interface "
            "that allows for bidirectional communication between the brain "
            "and external devices. This interface uses biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "Potential applications of Project Chimera include advanced "
            "prosthetics, treatment of neurological disorders, and enhanced "
            "human-computer interaction. Ethical considerations are paramount."
        )
    }
}


def naive_generation(query):
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)


def rag_retrieval(query, documents):
    query_words = set(query.lower().split())
    best_doc_id = None
    best_overlap = 0
    for doc_id, doc in documents.items():
        doc_words = set(doc["content"].lower().split())
        overlap = len(query_words.intersection(doc_words))
        if overlap > best_overlap:
            best_overlap = overlap
            best_doc_id = doc_id
    return documents.get(best_doc_id)


def rag_generation(query, document):
    if document:
        prompt = f"Using the retrieved information, answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)


if __name__ == "__main__":
    query = "What is the main goal of Project Chimera?"
    print("Naive approach:", naive_generation(query))
    retrieved_doc = rag_retrieval(query, KNOWLEDGE_BASE)
    print("RAG approach:", rag_generation(query, retrieved_doc))


```

Here’s a fixed version of **rag_generation** that actually injects the retrieved document’s title and content into the LLM prompt:

```python
from scripts.llm import get_llm_response

def rag_generation(query, document):
    if document:
        # build a context snippet from title + content
        snippet = f"{document['title']}: {document['content']}"
        prompt = (
            f"Using the following information:\n\n"
            f"\"{snippet}\"\n\n"
            f"Answer the question: {query}"
        )
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)
```

### What changed?

1. **Include title + content**  
   We compose a `snippet` from both `document['title']` and `document['content']`, so the LLM sees the full context.  
2. **Clear, contextual prompt**  
   We wrap the snippet in quotes and label it as “the following information,” then append the user’s question.  
3. **Fallback unchanged**  
   If no document was retrieved, we still fall back to a direct answer prompt.

With this in place, your RAG-based call will actually condition its answer on the most relevant doc, rather than ignoring it entirely.
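
A quick way to catch this kind of bug is to inspect the prompt rather than the model's answer. The sketch below swaps in a hypothetical `recording_llm` that stores whatever prompt it receives, then checks that the document's content actually appears in it. Passing the model call in as an `llm` argument is an assumption made for testability; the exercise code imports `get_llm_response` directly.

```python
captured_prompts = []


def recording_llm(prompt):
    """Hypothetical stand-in for get_llm_response: records the prompt, returns a dummy answer."""
    captured_prompts.append(prompt)
    return "dummy answer"


def rag_generation(query, document, llm=recording_llm):
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = (
            "Using the following information:\n\n"
            f"\"{snippet}\"\n\n"
            f"Answer the question: {query}"
        )
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return llm(prompt)


doc = {
    "title": "Chimera's Neural Interface",
    "content": "This interface uses biocompatible nanomaterials.",
}
rag_generation("What materials does the interface use?", doc)

# The buggy version would fail this check because its prompt never mentions the document
assert doc["content"] in captured_prompts[-1]
print(captured_prompts[-1])
```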

## RAG: Retrieving Multiple Relevant Documents

Nice work making it this far! Now, let's enhance our RAG implementation.

The current `rag_retrieval` function only fetches one document. Your task:

- Modify `rag_retrieval` to return all documents that have a word overlap with the query.
- Update `rag_generation` to handle a list of documents.

This will provide more context to the language model, potentially leading to more accurate and complete answers. Let's see how much better we can make our RAG system!


```python
from scripts.llm import get_llm_response

KNOWLEDGE_BASE = {
    "doc1": {
        "title": "Project Chimera Overview",
        "content": (
            "Project Chimera is a research initiative focused on developing "
            "novel bio-integrated interfaces. It aims to merge biological "
            "systems with advanced computing technologies."
        )
    },
    "doc2": {
        "title": "Chimera's Neural Interface",
        "content": (
            "The core component of Project Chimera is a neural interface "
            "that allows for bidirectional communication between the brain "
            "and external devices. This interface uses biocompatible "
            "nanomaterials."
        )
    },
    "doc3": {
        "title": "Applications of Chimera",
        "content": (
            "Potential applications of Project Chimera include advanced "
            "prosthetics, treatment of neurological disorders, and enhanced "
            "human-computer interaction. Ethical considerations are paramount."
        )
    }
}

def naive_generation(query):
    prompt = f"Answer directly the following query: {query}"
    return get_llm_response(prompt)


def rag_retrieval(query, documents):
    query_words = set(query.lower().split())
    best_doc_id = None
    best_overlap = 0
    for doc_id, doc in documents.items():
        doc_words = set(doc["content"].lower().split())
        overlap = len(query_words.intersection(doc_words))
        if overlap > best_overlap:
            best_overlap = overlap
            best_doc_id = doc_id
    # TODO: Modify this function to return ALL relevant documents
    # and not just the one with the highest overlap.
    return documents.get(best_doc_id)


def rag_generation(query, document):
    # TODO: Modify rag_generation to handle a list of documents.
    if document:
        snippet = f"{document['title']}: {document['content']}"
        prompt = f"Using the following information: '{snippet}', answer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return get_llm_response(prompt)


if __name__ == "__main__":
    query = "What are the applications of Project Chimera?"
    print("Naive approach:", naive_generation(query))
    retrieved_doc = rag_retrieval(query, KNOWLEDGE_BASE)
    print("RAG approach:", rag_generation(query, retrieved_doc))


```
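
One possible way to approach this exercise is sketched below. It is not an official solution: returning a list sorted by overlap and joining the snippets as a bulleted block are design choices assumed here. The `llm` parameter with an echoing default, the shortened `docs` dictionary, and its unrelated `doc4` entry are also assumptions so the sketch runs without `scripts.llm`.

```python
def rag_retrieval(query, documents):
    """Return every document sharing at least one word with the query, best match first."""
    query_words = set(query.lower().split())
    scored = []
    for doc in documents.values():
        doc_words = set(doc["content"].lower().split())
        overlap = len(query_words & doc_words)
        if overlap > 0:
            scored.append((overlap, doc))
    # The sort is stable, so documents with equal overlap keep their original order
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def rag_generation(query, retrieved_docs, llm=lambda prompt: prompt):
    """Build one prompt from all retrieved documents (llm defaults to an echo for the demo)."""
    if retrieved_docs:
        snippets = "\n".join(f"- {d['title']}: {d['content']}" for d in retrieved_docs)
        prompt = f"Using the following information:\n{snippets}\nAnswer: {query}"
    else:
        prompt = f"No relevant information found. Answer directly: {query}"
    return llm(prompt)


# Shortened, partly hypothetical stand-in for KNOWLEDGE_BASE
docs = {
    "doc1": {"title": "Project Chimera Overview",
             "content": "Project Chimera is a research initiative."},
    "doc3": {"title": "Applications of Chimera",
             "content": "Potential applications of Project Chimera include advanced prosthetics."},
    "doc4": {"title": "Unrelated", "content": "Quarterly budget summary."},
}

query = "What are the applications of Project Chimera?"
retrieved = rag_retrieval(query, docs)
print([d["title"] for d in retrieved])
print(rag_generation(query, retrieved))
```

An empty list is falsy in Python, so the existing `if retrieved_docs:` style check still routes a no-match query to the fallback prompt.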


## Building a Simple RAG Pipeline