# Lesson 1: Constrained Generation in Retrieval-Augmented Systems


Welcome to the first lesson of the **“Beyond Basic RAG: Improving our Pipeline”** course, part of the **“Foundations of RAG Systems”** path! In previous courses, you delved into the basics of Retrieval-Augmented Generation (RAG), exploring text representation with a focus on embeddings and vector databases. In this course, we’ll embark on an exciting journey to enhance our RAG systems with advanced techniques.

Our focus in this initial lesson is **constrained generation**, a powerful method to ensure that language model responses remain anchored in the retrieved context—avoiding speculation or unrelated content. Get ready to elevate your RAG skills and build more reliable systems!

---

## Theoretical Foundations of Constrained Generation

When employing large language models (LLMs) in real-world applications, **accuracy** and **fidelity** to a trusted dataset are paramount. Even advanced LLMs can produce incorrect or fabricated information—often termed “hallucinations.” This is where constrained generation becomes indispensable. In essence, it is a form of advanced prompt engineering: we carefully craft instructions so the LLM:

* **Uses only the data you supply** (the “retrieved context”).
* **Provides disclaimers or refusal messages** when context is insufficient.
* **Optionally cites** which part of the content it used.

By shaping the prompt and enforcing rule-based fallback mechanisms, we instruct the LLM to remain grounded in the retrieved context. The result is a system less prone to made-up facts and more consistent with the original knowledge source.

---

## Why Constrained Generation Is Important

LLM hallucinations can be misleading. Imagine an application confidently presenting policies or regulations **not** present in your knowledge base—this can create confusion or compliance issues. With constrained generation:

* The model remains **grounded** in the retrieved context only.
* Uncertain or unavailable information triggers a fallback message like **“No sufficient data.”**
* You can require the model to **cite lines** to verify the answer’s source, building user trust.
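
Citing lines only works if the model can point at something specific. A minimal sketch, assuming the retrieved chunks arrive as a list of strings: number each chunk before it goes into the prompt, so that a reply such as `Cited lines: [2]` can be traced back. The helper name `label_context_lines` and the sample sentences are hypothetical and not part of the course code.

```python
def label_context_lines(chunks):
    """Prefix each retrieved chunk with a bracketed index such as [1], [2], ..."""
    return "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )


# Made-up policy sentences, used purely for illustration.
chunks = [
    "Employees must badge in at the front desk.",
    "Remote work requires manager approval.",
]
labeled = label_context_lines(chunks)
print(labeled)
```

The labeled string could then be passed wherever the lesson uses `retrieved_context`.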

---

## Defining the Constrained Generation Function

We’ll start by defining a function that enforces these constraints:

```python
def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'

    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - Optionally log each stage for debugging or performance analysis.
    """
    # Provide a safe fallback if no context is retrieved
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Choose a prompt template based on strategy
    if strategy == "base":
        # Base approach: instruct to use the context and not rely on external info
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        # Strict approach: explicitly disallow info beyond the provided context
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        # Citation approach: require references to lines used
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    else:
        # Fail fast on an unrecognized strategy; otherwise 'prompt' would be undefined below
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # …
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        return answer_part.strip(), used_context_part.strip()
    else:
        return response.strip(), "No explicit lines cited."
```

**How it works**:

1. **Fallback**: If no context was retrieved, the function immediately returns an apology.
2. **Strategies**:

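   * **Base**: Answer concisely from the supplied context. This template only asks the model to use the context; it does not explicitly forbid outside information, which is what the strict variant adds.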
   * **Strict**: Only use context; reply “No sufficient data” if absent.
   * **Citation**: Answer from context and cite lines; or state “Not available in the retrieved texts.”
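
One edge case in the parsing step is worth noting: `response.split("Cited lines:")` yields three or more segments if the model repeats the marker, so the `len(segments) == 2` check fails and the citations are reported as missing. A hedged alternative, not the course's implementation, is `str.partition`, which splits on the first occurrence only. The helper name `split_answer_and_citations` is an assumption for illustration.

```python
def split_answer_and_citations(response, marker="Cited lines:"):
    """Split an LLM reply into (answer, citations) at the first marker occurrence."""
    answer, found, cited = response.partition(marker)
    if found:
        return answer.strip(), cited.strip()
    # No marker: treat the whole reply as the answer, mirroring the lesson's fallback.
    return response.strip(), "No explicit lines cited."


# Hand-written sample replies, not real model output.
print(split_answer_and_citations("Badges are required.\nCited lines: - line 1"))
print(split_answer_and_citations("No sufficient data"))
```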

---

## Demonstration of Retrieval and Constrained Generation

```python
# 1. Load and chunk a corpus
chunked_docs = load_and_chunk_corpus("data/corpus.json")

# 2. Build a collection in a vector database
collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

# 3. Run a sample query
query = "Highlight the main policies that apply to employees."
retrieval_results = collection.query(query_texts=[query], n_results=2)

# 4. Construct the retrieved context from top matches
if not retrieval_results['documents'][0]:
    retrieved_context = ""
else:
    retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])

# 5. Execute constrained generation function
strategy = "strict"
answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy)

print("Answer:", answer)
print("Cited Context:", used_context)
```

---

## Practical Example: A Policy FAQ Bot

Consider an HR FAQ bot with access to internal policy documents. When employees ask about vacation rules:

1. The bot **retrieves** relevant sections from the knowledge base.
2. It calls `generate_with_constraints(..., strategy="strict")`.
3. If the policy is documented, it returns an accurate answer; otherwise, “No sufficient data.”
4. For transparency, use `strategy="cite"` to include specific policy line references.

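This workflow helps your FAQ bot avoid hallucinations and stay grounded in official documents. Prompt constraints reduce the risk but do not guarantee compliance, so it is still worth checking the outputs.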
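
The four steps can be sketched as a small function that takes the retriever and the generator as arguments, which makes the flow testable without a vector store or a live LLM. This is a sketch under stated assumptions: `answer_faq`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stand-ins only imitate the return shape of `generate_with_constraints`.

```python
def answer_faq(query, retrieve, generate, strategy="strict"):
    """Retrieve policy chunks, join them as bullet lines, and delegate to the generator."""
    chunks = retrieve(query)
    retrieved_context = "\n".join("- " + chunk for chunk in chunks)
    return generate(query, retrieved_context, strategy=strategy)


# Hypothetical stand-ins so the sketch runs without a vector store or an LLM.
def fake_retrieve(query):
    if "vacation" in query.lower():
        return ["Vacation requests need two weeks' notice."]
    return []


def fake_generate(query, retrieved_context, strategy="strict"):
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")
    return (f"[{strategy}] answer grounded in: {retrieved_context}", "No explicit lines cited.")


print(answer_faq("How do I request vacation?", fake_retrieve, fake_generate))
print(answer_faq("What is the dress code?", fake_retrieve, fake_generate))
```

In a real pipeline, `retrieve` would wrap the collection query and `generate` would be `generate_with_constraints` itself.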

---

## Conclusion and Next Steps

Constrained generation is essential for keeping RAG systems tightly bound to authentic sources. By tailoring prompt instructions and incorporating fallback logic, you reduce the risk of misinformation and make it more likely that answers stay grounded in retrieved documents.

**Next Steps**:

* Experiment with different prompt styles and strategies to fine-tune strictness or citation detail.
* Evaluate system behavior by omitting key context and observing fallback responses.
* Integrate these strategies into broader real-world scenarios and measure accuracy under varied user requests.
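
When evaluating fallback behavior, it helps to detect fallback replies programmatically. A minimal sketch, assuming the three fallback messages used in this lesson and simple case-insensitive substring matching; `is_fallback` and `fallback_rate` are hypothetical helper names, and the sample answers are hand-written rather than real model output.

```python
# Fallback messages taken from the lesson's prompts and empty-context branch.
FALLBACK_PHRASES = (
    "no sufficient data",
    "not available in the retrieved texts",
    "couldn't find any relevant information",
)


def is_fallback(answer):
    """Return True if the answer contains one of the known fallback messages."""
    normalized = answer.strip().lower()
    return any(phrase in normalized for phrase in FALLBACK_PHRASES)


def fallback_rate(answers):
    """Fraction of answers that are fallbacks; 0.0 for an empty list."""
    if not answers:
        return 0.0
    return sum(is_fallback(a) for a in answers) / len(answers)


sample_answers = [
    "No sufficient data.",
    "Employees must badge in at the front desk.",
    "Not available in the retrieved texts.",
]
print(fallback_rate(sample_answers))
```

Substring matching is deliberately loose here; a model that paraphrases the fallback message would not be detected.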


## Constrained Prompt Completion Challenge

Welcome to your first hands-on exercise in enhancing Retrieval-Augmented Generation (RAG) systems! In this activity, you'll practice constrained generation by completing a crucial part of the `generate_with_constraints` function. Your mission is to fill in the missing lines for the base prompt strategy, which instructs the language model to answer from the retrieved context.

Here's what you need to do:

* Focus on the `if strategy == "base":` section of the function.
* Complete the prompt so that it instructs the model to use the provided context for answering the query.

By completing this exercise, you'll reinforce your understanding of how to keep language models grounded in the retrieved context, a key skill in building reliable RAG systems. Enjoy the process of crafting those prompts!


```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'

    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info, 
         and respond with 'No sufficient data' if the context is insufficient.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - Optionally log each stage for debugging or performance analysis.
    """
    # Provide a safe fallback if no context is retrieved
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Choose a prompt template based on strategy
    if strategy == "base":
        # TODO: Complete the base prompt template to instruct the model to use the provided context
        prompt = (
            "________________________________________\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        # Strict approach: explicitly disallow info beyond the provided context
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        # Citation approach: require references to lines used
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )

    # Print the prompt for debugging or inspection
    print(f"Prompt: \n {prompt}\n")

    # Make call to the LLM
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        return answer_part.strip(), used_context_part.strip()
    else:
        # If the LLM didn't provide citations, treat the entire response as the answer
        return response.strip(), "No explicit lines cited."


if __name__ == "__main__":
    # Example usage demonstrating retrieval followed by constrained generation

    # 1. Load and chunk a corpus
    chunked_docs = load_and_chunk_corpus("data/corpus.json")

    # 2. Build a collection in a vector database
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # 3. Run a sample query
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    # 4. Construct the retrieved context from top matches
    if not retrieval_results['documents'][0]:
        retrieved_context = ""
    else:
        retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])

    # 5. Execute constrained generation function for demonstration
    for strategy in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy)
        print(f"Strategy: {strategy}")
        print(f"Constrained generation answer:\n{answer}")
        print(f"Context or lines used: {used_context}\n")


```

One possible completion of the base prompt is shown below. It goes slightly beyond the blank in the exercise by also adding an explicit "No sufficient data" fallback, matching the docstring:

```python
def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'

    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info,
         and respond with 'No sufficient data' if the context is insufficient.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - Optionally log each stage for debugging or performance analysis.
    """
    # Provide a safe fallback if no context is retrieved
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Choose a prompt template based on strategy
    if strategy == "base":
        # Base approach: use only the provided context, no outside knowledge, fallback if missing
        prompt = (
            "You are an AI assistant. Use **only** the context provided below to answer the question.\n"
            "Do not draw on any outside information or make unsupported assumptions.\n"
            "If the answer is not contained within the context, respond with: 'No sufficient data'.\n\n"
            f"Context:\n{retrieved_context}\n\n"
            f"Question: {query}\n"
            "Answer:"
        )
    elif strategy == "strict":
        # Strict approach: explicitly disallow info beyond the provided context
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        # Citation approach: require references to lines used
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )

    # Print the prompt for debugging or inspection
    print(f"Prompt: \n{prompt}\n")

    # Make call to the LLM
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        return answer_part.strip(), used_context_part.strip()
    else:
        # If the LLM didn't provide citations, treat the entire response as the answer
        return response.strip(), "No explicit lines cited."

```

## Mastering the Strict Prompt Strategy

Well done on completing the base prompt strategy in the previous exercise! Now, let's build on that by exploring the strict prompt strategy. Your goal is to complete the missing lines in the `generate_with_constraints` function for the strict strategy. This strategy instructs the language model to answer strictly from the retrieved context or to give a "No sufficient data" disclaimer when the context is insufficient.

Focus on the following aspects:

* In the `elif strategy == "strict":` section, fill in the blanks to craft a prompt that:
   * Instructs the model to use only the provided context.
   * Clearly states that if the answer isn't found in the context, the model should respond with "No sufficient data."
   * Explicitly forbids the use of any external information.

By completing this exercise, you'll enhance your skills in crafting precise prompts that keep language models grounded in the retrieved context, a crucial aspect of building reliable RAG systems. Enjoy the challenge and happy coding!

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'
  
    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - Optionally log each stage for debugging or performance analysis.
    """
    # Check if we have no retrieved context
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Multiple template examples
    if strategy == "base":
        # Base approach
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        # TODO: Complete the strict prompt template to:
        # 1. Instruct the model to use ONLY the provided context
        # 2. Specify that 'No sufficient data' should be returned when answer isn't in context
        # 3. Explicitly forbid using external information
        prompt = (
            "________________________________________\n"
            "________________________________________\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        # Citation approach
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
  
    print(f"Prompt: \n {prompt}\n")

    # Make call to LLM
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        return answer_part.strip(), used_context_part.strip()
    else:
        # If the LLM didn't adhere fully, we store the entire response as the answer
        return response.strip(), "No explicit lines cited."
      
if __name__ == "__main__":
    # Example demonstration 
    chunked_docs = load_and_chunk_corpus("data/corpus.json")
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # Example query that might yield relevant or no results
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    if not retrieval_results['documents'][0]:
        # Fallback: no relevant chunk
        retrieved_context = ""
    else:
        # Build retrieved context from top matches
        retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])
  
    for strategy in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy)
        print(f"Constrained generation answer:\n{answer}")
        print(f"Context or lines used:\n{used_context}\n")
```

To complete the strict prompt template in the `generate_with_constraints` function, you need to ensure that the prompt clearly instructs the language model to rely solely on the provided context, specify that it should return "No sufficient data" if the answer isn't found within the context, and explicitly forbid the use of any external information. Here's how you can craft the prompt for the strict strategy:

```python
elif strategy == "strict":
    # Strict approach
    prompt = (
        "Using only the provided context below, answer the question. If the answer is not found in the context, "
        "respond with 'No sufficient data'. Do not use any external information.\n\n"
        f"Context:\n{retrieved_context}\n"
        f"Question: '{query}'\n"
        "Answer:"
    )
```

This prompt template:
- **Directs the model** to use only the provided context for generating the answer.
- **Sets a clear instruction** for the model to respond with "No sufficient data" if the answer cannot be derived from the provided context.
- **Prohibits the use of external information**, ensuring that the model's response is strictly based on the context given.

This approach is crucial for applications where accuracy and source reliability are paramount, and it helps in maintaining the integrity of the answers provided by the language model.

## Citation Prompt Strategy Implementation

In the previous exercise, you worked through the strict prompt strategy. Now it's time to explore the citation prompt strategy, where you'll complete the missing lines in the `generate_with_constraints` function. This approach emphasizes transparency by asking the language model to cite the lines it used from the retrieved context.

Here's your mission:

* Focus on the `elif strategy == "cite":` section of the function.
* Fill in the blanks to craft a prompt that:
   * Instructs the model to answer strictly from the provided context.
   * Requires the model to list the lines it used as evidence with "Cited lines:".
   * Tells the model to respond with "Not available in the retrieved texts." if the context does not contain the information.

By completing this exercise, you'll refine your ability to create prompts that not only keep language models grounded in the retrieved context but also provide traceability of the information source. This is a crucial skill for building reliable and transparent RAG systems. Dive in and enjoy the coding challenge!


```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'
    
    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - Optionally log each stage for debugging or performance analysis.
    """
    # If there is no retrieved context, return fallback
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Choose a prompt template based on the strategy
    if strategy == "base":
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        # TODO: Complete the citation prompt template to:
        # 1. Request answers strictly from the provided context
        # 2. Require listing of used lines as evidence with 'Cited lines:'
        # 3. Specify the fallback response when information is not found
        prompt = (
            "________________________________________\n"
            "________________________________________\n\n"
            f"Provided context (______):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )

    # Print prompt for debugging
    print(f"Prompt: \n{prompt}\n")

    # Send prompt to the LLM
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        return answer_part.strip(), used_context_part.strip()
    else:
        return response.strip(), "No explicit lines cited."

if __name__ == "__main__":
    # Demonstration of retrieval plus constrained generation

    # 1. Load and chunk corpus
    chunked_docs = load_and_chunk_corpus("data/corpus.json")

    # 2. Create a collection in our vector database
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # 3. Sample query
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    # 4. Construct retrieved context from top matches
    if not retrieval_results['documents'][0]:
        retrieved_context = ""
    else:
        retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])

    # 5. Generate answers using different strategies
    for strategy in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy)
        print(f"--- Strategy: {strategy} ---")
        print("Answer:")
        print(answer)
        print("Context or lines used:")
        print(used_context)
        print()


```

To complete the citation prompt strategy in the `generate_with_constraints` function, you need to ensure that the prompt instructs the language model to answer strictly from the provided context and to cite the specific lines used as evidence. Additionally, the prompt should specify a fallback response if the information is not found in the context. Here's how you can craft the prompt for the citation strategy:

```python
elif strategy == "cite":
    # Citation approach
    prompt = (
        "Answer the question using only the provided context. If the context does not contain the information, "
        "respond with 'Not available in the retrieved texts.' Cite the specific lines used as evidence with 'Cited lines:'.\n\n"
        f"Provided context:\n{retrieved_context}\n"
        f"Question: '{query}'\n"
        "Answer:"
    )
```

This prompt template:
- **Directs the model** to use only the provided context for generating the answer.
- **Requires the model to cite** the specific lines from the context that support the answer, enhancing transparency and traceability.
- **Specifies a clear fallback response** ("Not available in the retrieved texts") if the answer cannot be derived from the provided context.

This approach is particularly useful in scenarios where it's crucial to trace the source of the information provided by the model, ensuring accountability and reliability in the responses.
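
A model can also cite text that is not actually in the context, so it can be worth checking citations after the fact. A minimal sketch, assuming the model echoes the cited lines verbatim (one per line, optionally with a leading bullet); `unverified_citations` is a hypothetical helper, and the sample strings are made up.

```python
def unverified_citations(cited_block, retrieved_context):
    """Return cited lines whose text does not appear verbatim in the context."""
    missing = []
    for line in cited_block.splitlines():
        # Drop leading bullet markers and surrounding whitespace before comparing.
        text = line.strip().lstrip("-* ").strip()
        if text and text not in retrieved_context:
            missing.append(text)
    return missing


context = "- Employees accrue vacation monthly.\n- Remote work requires approval."
cited = "- Employees accrue vacation monthly.\n- Overtime is paid double."
print(unverified_citations(cited, context))
```

Verbatim matching is strict by design: if the model cites by index or paraphrases, a different check (for example, matching bracketed line numbers) would be needed.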

## Reimplement Constrained Generation Function

Congratulations on reaching the final challenge of this unit! You've done an excellent job so far, including in the previous exercise where you completed the citation prompt. Now it's time to bring everything together and reimplement the entire `generate_with_constraints` function from scratch. This will deepen your understanding of constrained generation in Retrieval-Augmented Generation (RAG) systems.

Your objective is to keep the language model anchored to the retrieved context so that its answers are accurate and context-based. Here's what you need to focus on:

* Handle situations where no context is available by returning a neutral fallback message.
* Construct prompts based on the `strategy` parameter, implementing logic for three strategies:
   * **Base Approach**: Provide the context and instruct the model not to use external information.
   * **Strict Approach**: Instruct the model to use only the provided context and to respond with "No sufficient data" if the answer isn't found.
   * **Citation Approach**: Require the model to cite the lines used as evidence, responding with "Not available in the retrieved texts" if the necessary information is missing.
* After querying the language model, parse the response to separate the main answer from any cited lines.

By completing this exercise, you'll enhance your ability to create reliable and trustworthy RAG systems. Dive in, and let your expertise shine!

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'

    This function reimplements the entire constrained generation logic from scratch.
    It supports three strategies:
      1) Base approach: Provide the context, instruct the model not to use external info.
      2) Strict approach: Provide the context, strictly disallow info not found in the context,
         and say "No sufficient data" if the answer isn't found.
      3) Citation approach: Provide context and ask the model to cite lines (if used).
         If the needed info isn't present, respond with "Not available in the retrieved texts."

    Fallback behavior:
      - If no context is provided, return an apology message and "No context used."
      - After receiving the LLM response, split on 'Cited lines:' if present.
    """
    # TODO: Implement the fallback for empty context
    # Hint: Check if retrieved_context is empty and return appropriate message

    # TODO: Implement prompt construction based on strategy
    # Hint: Use if/elif to handle different strategies (base, strict, cite)
    # Each strategy should have its own prompt template

    # TODO: Get response from language model using the constructed prompt
    # Hint: Use get_llm_response() function

    # TODO: Parse the response to extract answer and cited lines (if present)
    # Hint: Look for "Cited lines:" marker in the response

    # TODO: Return the parsed response
    # Hint: Return format should be (answer, context_used)
    pass


if __name__ == "__main__":
    # Demonstration of retrieval and constrained generation
    chunked_docs = load_and_chunk_corpus("data/corpus.json")
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # Example user query
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    # Build or clear the context
    if not retrieval_results['documents'][0]:
        retrieved_context = ""
    else:
        retrieved_context = "\n".join(["- " + text for text in retrieval_results['documents'][0]])

    # Test each strategy
    for chosen_strategy in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(
            query,
            retrieved_context,
            strategy=chosen_strategy
        )
        print(f"Strategy: {chosen_strategy}")
        print("Constrained generation answer:")
        print(answer)
        print("Context or lines used:")
        print(used_context)
        print()


```

A complete `generate_with_constraints` needs to handle the empty-context fallback, build a prompt for each strategy, and parse the model's response. Here is one implementation that covers all three requirements:

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Enforce model reliance on 'retrieved_context' when answering 'query.'
    """
    # Handle the case where no context is available
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Construct the prompt based on the specified strategy
    if strategy == "base":
        prompt = (
            "Using the following context, answer the question without using external information.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        prompt = (
            "Using only the provided context below, answer the question. If the answer is not found in the context, "
            "respond with 'No sufficient data'. Do not use any external information.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        prompt = (
            "Answer the question using only the provided context. Cite the specific lines used as evidence with 'Cited lines:'. "
            "If the context does not contain the information, respond with 'Not available in the retrieved texts.'\n\n"
            f"Provided context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
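        )
    else:
        # Fail loudly on an unrecognized strategy instead of hitting an undefined `prompt`
        raise ValueError(f"Unknown strategy: {strategy!r}")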

    # Get response from the language model
    response = get_llm_response(prompt)

    # Parse the response to extract the answer and any cited lines.
    # maxsplit=1 keeps the unpacking safe if the marker appears more than once.
    if 'Cited lines:' in response:
        answer_part, cited_lines_part = response.split('Cited lines:', 1)
        return answer_part.strip(), cited_lines_part.strip()
    else:
        return response.strip(), "No explicit lines cited."

# Example usage
if __name__ == "__main__":
    # Load and chunk corpus
    chunked_docs = load_and_chunk_corpus("data/corpus.json")
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # Example user query
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    # Build or clear the context
    retrieved_context = ""
    if retrieval_results['documents'][0]:
        retrieved_context = "\n".join(["- " + text for text in retrieval_results['documents'][0]])

    # Test each strategy
    for chosen_strategy in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(
            query,
            retrieved_context,
            strategy=chosen_strategy
        )
        print(f"Strategy: {chosen_strategy}")
        print("Constrained generation answer:")
        print(answer)
        print("Context or lines used:")
        print(used_context)
        print()

```

This implementation:
- Handles cases where no context is available by returning a neutral fallback message.
- Constructs different prompts based on the strategy parameter (`base`, `strict`, `cite`), each tailored to enforce specific constraints on the model's response.
- Parses the model's response to separate the main answer from any cited lines, ensuring clarity and traceability of the information source.

The function can now be dropped into a Retrieval-Augmented Generation pipeline. Keep in mind that prompt instructions reduce, but do not eliminate, the chance that the model strays from the retrieved context.
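
The response-parsing step can also be isolated into a small helper that is easy to test without calling a model. The sketch below is one possible version, assuming the model emits the literal marker `Cited lines:` as the prompts request. The name `split_answer_and_citations` is hypothetical and not part of the course code.

```python
def split_answer_and_citations(response, marker="Cited lines:"):
    """Split an LLM response into (answer, cited_lines).

    Hypothetical helper for illustration; not part of the lesson's codebase.
    """
    # str.partition splits on the first occurrence only, so it always
    # returns exactly three parts, however often the marker appears.
    answer, found, cited = response.partition(marker)
    if not found:
        # Marker absent: treat the whole response as the answer.
        return response.strip(), "No explicit lines cited."
    return answer.strip(), cited.strip()


if __name__ == "__main__":
    sample = "Employees accrue leave monthly.\nCited lines:\n- Leave accrues monthly."
    print(split_answer_and_citations(sample))
    print(split_answer_and_citations("No sufficient data"))
```

Using `partition` rather than an unbounded `split` means a response that repeats the marker still yields a two-part result: everything after the first marker is kept as the cited text.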

## Smart Context Management

Welcome back! You've been doing an excellent job mastering the fundamentals of Retrieval-Augmented Generation (RAG). Previously, you focused on ensuring that language models adhere to the retrieved context and provide clear, context-based answers. Now, let's dive into a real-world challenge: managing context length effectively.

In this exercise, your goal is to enhance the `generate_with_constraints` function to handle situations where the retrieved context exceeds the language model's token limit (for example, 4,096 tokens). This is crucial for maintaining the quality and reliability of the responses.

Here's what you need to focus on:

* **Check the length** of `retrieved_context`. If it exceeds the token limit, you'll need to take action.
* **Truncate the context smartly:** preserve whole sentences while trimming the context to fit within the token limit.
* **Append a warning:** if truncation occurs, add "[Context truncated]" to the answer to inform users that the context was shortened.

By implementing these changes, you'll ensure that your RAG system remains efficient and effective, even when the retrieved context is longer than the model can accept.

```python
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'
    
    Now includes context-length validation and smart truncation if the context 
    exceeds a rough limit of 4096 tokens (approx. word-based).

    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - If 'retrieved_context' is too long, it is truncated while preserving whole sentences,
        and "[Context truncated]" is appended to the answer to warn the user.
    """
    # Provide a safe fallback if no context is retrieved
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # TODO: Define the maximum token limit for the context
    MAX_TOKENS = None

    # TODO: Implement context length validation and smart truncation
    # Check if context exceeds token limit and truncate while preserving whole sentences
    # Hint: Split into words first to check length, then into sentences for truncation
    words = retrieved_context.split()
    truncated = False

    # Build the prompt based on strategy
    if strategy == "base":
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )

    print(f"Prompt: \n {prompt}\n")

    # Query the language model
    response = get_llm_response(prompt)

    # Attempt to parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        final_answer = answer_part.strip()
        cited_part = used_context_part.strip()
    else:
        final_answer = response.strip()
        cited_part = "No explicit lines cited."

    # TODO: Add truncation warning to the answer if context was truncated

    return final_answer, cited_part


if __name__ == "__main__":
    # Demonstration of retrieval and constrained generation
    chunked_docs = load_and_chunk_corpus("data/corpus.json")
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    # Example query that might yield relevant or no results
    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    if not retrieval_results['documents'][0]:
        retrieved_context = ""
    else:
        retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])

    for strategy_option in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy_option)
        print(f"Strategy: {strategy_option}")
        print(f"Constrained generation answer:\n{answer}")
        print(f"Context or lines used:\n{used_context}\n")


```

```python
import re
from data import load_and_chunk_corpus
from vector_db import build_chroma_collection
from scripts.llm import get_llm_response

def generate_with_constraints(query, retrieved_context, strategy="base"):
    """
    Thoroughly enforce model reliance on 'retrieved_context' when answering 'query.'

    Now includes context-length validation and smart truncation if the context 
    exceeds a rough limit of 4096 tokens (approx. word-based).

    The 'strategy' parameter allows for different prompt template variations:
      1) Base approach: Provide context, instruct LLM not to use outside info.
      2) Strict approach: Provide context with explicit disclaimers if the answer is not found.
      3) Citation approach: Provide context, then request the LLM to cite the relevant lines.

    Robust fallback:
      - If 'retrieved_context' is empty, respond with an apology or neutral statement.
      - If 'retrieved_context' is too long, it is truncated while preserving whole sentences,
        and "[Context truncated]" is appended to the answer to warn the user.
    """
    # Safe fallback if no context
    if not retrieved_context.strip():
        return ("I'm sorry, but I couldn't find any relevant information.", "No context used.")

    # Define the maximum token limit (approximate word-based)
    MAX_TOKENS = 4096
    words = retrieved_context.split()
    truncated = False

    # Smart truncation: keep whole sentences until the word budget is used up
    if len(words) > MAX_TOKENS:
        sentences = re.split(r'(?<=[.!?])\s+', retrieved_context)
        selected_sentences = []
        word_count = 0
        for sent in sentences:
            sent_words = sent.split()
            if word_count + len(sent_words) <= MAX_TOKENS:
                selected_sentences.append(sent)
                word_count += len(sent_words)
            else:
                break
        if selected_sentences:
            retrieved_context = ' '.join(selected_sentences)
        else:
            # First sentence alone exceeds the limit: fall back to a hard word cut
            retrieved_context = ' '.join(words[:MAX_TOKENS])
        truncated = True

    # Build the prompt based on strategy
    if strategy == "base":
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "strict":
        prompt = (
            "You must ONLY use the context provided below. If you cannot find the answer in the context, say: 'No sufficient data'.\n"
            "Do not provide any information not found in the context.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    elif strategy == "cite":
        prompt = (
            "Answer strictly from the provided context, and list the lines you used as evidence with 'Cited lines:'.\n"
            "If the context does not contain the information, respond with: 'Not available in the retrieved texts.'\n\n"
            f"Provided context (label lines as needed):\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )
    else:
        # Fallback to base if unknown strategy
        prompt = (
            "Use the following context to answer the question in a concise manner.\n\n"
            f"Context:\n{retrieved_context}\n"
            f"Question: '{query}'\n"
            "Answer:"
        )

    print(f"Prompt: \n {prompt}\n")

    # Query the language model
    response = get_llm_response(prompt)

    # Parse out 'Cited lines:' if present
    segments = response.split("Cited lines:")
    if len(segments) == 2:
        answer_part, used_context_part = segments
        final_answer = answer_part.strip()
        cited_part = used_context_part.strip()
    else:
        final_answer = response.strip()
        cited_part = "No explicit lines cited."

    # Append truncation warning if needed
    if truncated:
        final_answer += " [Context truncated]"

    return final_answer, cited_part


if __name__ == "__main__":
    # Demonstration of retrieval and constrained generation
    chunked_docs = load_and_chunk_corpus("data/corpus.json")
    collection = build_chroma_collection(chunked_docs, collection_name="corpus_collection")

    query = "Highlight the main policies that apply to employees."
    retrieval_results = collection.query(query_texts=[query], n_results=2)

    if not retrieval_results['documents'][0]:
        retrieved_context = ""
    else:
        retrieved_context = "\n".join(["- " + doc_text for doc_text in retrieval_results['documents'][0]])

    for strategy_option in ("base", "strict", "cite"):
        answer, used_context = generate_with_constraints(query, retrieved_context, strategy=strategy_option)
        print(f"Strategy: {strategy_option}")
        print(f"Constrained generation answer:\n{answer}")
        print(f"Context or lines used:\n{used_context}\n")


```

This solution truncates the context at sentence boundaries when it exceeds the approximate 4,096-token limit (measured here in whitespace-separated words) and appends a “[Context truncated]” warning to the final answer.
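
The truncation logic can be exercised on its own, without a vector store or a model call. The following is a minimal sketch under the same simplifying assumption as the solution, namely that whitespace-separated words stand in for tokens. The helper name `truncate_to_sentences` and its return shape are hypothetical, chosen only for illustration.

```python
import re


def truncate_to_sentences(text, max_words):
    """Return (possibly shortened text, truncated flag).

    Hypothetical helper; whitespace-separated words approximate tokens.
    """
    if len(text.split()) <= max_words:
        return text, False

    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    kept, count = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if count + n > max_words:
            break
        kept.append(sentence)
        count += n

    if not kept:
        # The first sentence alone is over budget: fall back to a hard
        # word cut so the context is never emptied entirely.
        return " ".join(text.split()[:max_words]), True
    return " ".join(kept), True


if __name__ == "__main__":
    demo = "One two three. Four five six. Seven eight nine."
    print(truncate_to_sentences(demo, 6))
```

A word count is only a rough proxy: real tokenizers usually produce more tokens than words, so a production system would likely count tokens with the tokenizer that matches its model and leave headroom for the instructions and the question.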