<a href="https://colab.research.google.com/github/Nebius-Academy/LLM-Engineering-Essentials/blob/main/topic3/3.1_the_concept_of_rag_solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# LLMOps Essentials 3.1. The concept of RAG

# Practice solutions

## Task 1: Retrieval as a Tool

In the example above, we built a bot that uses retrieval at every step, but that approach isn't always appropriate. In this task, you'll turn the bot into an agent that calls retrieval only when the LLM deems it necessary.

Compared with the previous RAG chatbot, this version will be more flexible, avoiding awkward responses to messages like “Hi there!” that don't benefit from retrieval at all.

You're free to design your own architecture, of course, but we suggest combining ideas from both `ChatBotWithRAG` and `NPCTraderAgent` in the agent notebook. If you like, you can set up the decision to call retrieval via a classifier LLM call — similar to how intent classification was used for trade in `NPCTraderAgent`. But for now, we recommend simply using native LLM tool calling.
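For comparison, the classifier-based alternative could look roughly like the sketch below. It is not part of the solution that follows: the function name `needs_retrieval`, the prompt wording, and the YES/NO protocol are all assumptions made for illustration. It only relies on the `chat.completions.create` call of an OpenAI-compatible client.

```python
from typing import Any

# Hypothetical prompt; the wording is an assumption and would need tuning.
CLASSIFIER_PROMPT = (
    "Decide whether answering the user's message requires looking up "
    "external information. Reply with exactly one word: YES or NO."
)

def needs_retrieval(client: Any, model: str, user_message: str) -> bool:
    """Ask the LLM for a YES/NO decision on whether to call retrieval."""
    completion = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CLASSIFIER_PROMPT},
            {"role": "user", "content": user_message},
        ],
        temperature=0.0,  # keep the decision as stable as possible
        max_tokens=5,     # a single word is expected
    )
    answer = (completion.choices[0].message.content or "").strip().upper()
    # Anything that does not start with YES is treated as "no retrieval".
    return answer.startswith("YES")
```

A bot built this way would call `needs_retrieval(...)` first and run the search only when it returns `True`, at the cost of an extra LLM call per message.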

**Solution**

In [None]:
!pip install -q openai tavily-python

In [None]:
import os

with open("tavily_api_key", "r") as file:
    tavily_api_key = file.read().strip()

os.environ["TAVILY_API_KEY"] = tavily_api_key

with open("nebius_api_key", "r") as file:
    nebius_api_key = file.read().strip()

os.environ["NEBIUS_API_KEY"] = nebius_api_key

We reuse the design pattern from the NPC Trader Agent in Topic 2, but this time the agent has only one tool: web search.

In [None]:
from collections import defaultdict, deque
from openai import OpenAI
from typing import Dict, Any, List, Optional, Callable
import json
import traceback

class RAGAgent:
    def __init__(self, client: OpenAI, model: str, search_client,
                 history_size: int = 10,
                 get_system_message: Optional[Callable[[], Optional[Dict[str, str]]]] = None,
                 search_depth: str = "advanced",
                 max_search_results: int = 5
                 ):
        """Initialize the chat agent with RAG tool.

        Args:
            client: OpenAI client instance
            model: The model to use (e.g., "gpt-4o-mini")
            search_client: Search client instance (for example, Tavily)
            history_size: Number of messages to keep in history per user
            get_system_message: Function to retrieve the system message
            search_depth: Depth of web search ('basic' or 'advanced')
            max_search_results: Maximum number of search results to retrieve
        """
        self.client = client
        self.model = model
        self.search_client = search_client
        self.history_size = history_size

        # If no system message function is provided, use the default one
        self.get_system_message = get_system_message if get_system_message else self._default_system_message

        self.search_depth = search_depth
        self.max_search_results = max_search_results

        # Initialize chat history storage
        self.chat_histories = defaultdict(lambda: deque(maxlen=history_size))

        # Define the tools available to the model
        self.tools = [
            {
                "type": "function",
                "function": {
                    "name": "retrieve_information",
                    "description": "Search the web for information relevant to the user's query when you need additional context to provide a complete and accurate answer.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "The search query to use for retrieving information. This should be a refined version of the user's question optimized for web search."
                            }
                        },
                        "required": ["query"]
                    }
                }
            }
        ]

        # Map available tool functions
        self.available_tools = {
            "retrieve_information": self.retrieve_information
        }

    def _default_system_message(self) -> Dict[str, str]:
        """Default system message if none is provided."""
        return {
            "role": "system",
            "content": """You are a helpful assistant."""
        }

    def retrieve_information(self, query: str) -> Dict[str, Any]:
        """
        Perform a web search using the search client and format the results.

        Args:
            query: The search query

        Returns:
            Dictionary containing formatted search results and metadata
        """
        try:
            search_results = self.search_client.search(
                query=query,
                search_depth=self.search_depth,
                max_results=self.max_search_results
            )

            formatted_results = []
            for result in search_results.get('results', []):
                content = result.get('content', '').strip()
                url = result.get('url', '')
                if content:
                    formatted_results.append(f"Content: {content}\nSource: {url}\n")

            # Join all results with proper formatting
            context = "\n".join(formatted_results)

            return {
                "context": context,
                "query": query,
                "num_results": len(formatted_results),
                "success": True,
                "message": f"Retrieved {len(formatted_results)} results for query: '{query}'"
            }

        except Exception as e:
            print(f"Error in retrieve_information: {str(e)}")
            return {
                "context": "",
                "query": query,
                "num_results": 0,
                "success": False,
                "message": f"Failed to retrieve information: {str(e)}"
            }

    def process_tool_calls(self, tool_calls, user_id: str, debug: bool = False) -> List[Dict[str, Any]]:
        """Process tool calls from the LLM response."""
        tool_responses = []

        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_id = tool_call.id

            try:
                function_args = json.loads(tool_call.function.arguments)
            except Exception as e:
                print(f"Error parsing arguments: {e}")
                function_args = {}

            if debug:
                print(f"#Processing tool call:\n {function_name}, args: {function_args}\n")

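            if function_name not in self.available_tools:
                print(f"Unknown function: {function_name}")
                # Still answer the call: OpenAI-compatible APIs generally expect
                # a tool message for every tool_call_id in the assistant message
                tool_responses.append({
                    "tool_call_id": function_id,
                    "role": "tool",
                    "name": function_name,
                    "content": json.dumps({"error": f"Unknown function: {function_name}"})
                })
                continue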

            # Get the function to call
            tool_function = self.available_tools[function_name]

            try:
                # Execute the function
                result = tool_function(**function_args)

                # Format the result specifically for retrieval
                if function_name == "retrieve_information":
                    # Wrap the retrieved text in <context> tags so the model can tell it apart
                    content = json.dumps({
                        "result": {
                            "message": result["message"],
                            "formatted_context": f"<context>\n{result['context']}\n</context>"
                        }
                    })
                else:
                    # Generic handling for any future tools
                    content = json.dumps(result)

                if debug:
                    print(f"#Tool result:\n{content[:200]}...\n")

            except Exception as e:
                print(f"Error executing {function_name}: {e}")
                print(traceback.format_exc())
                content = json.dumps({"error": str(e)})

            # Create the tool response
            tool_responses.append({
                "tool_call_id": function_id,
                "role": "tool",
                "name": function_name,
                "content": content
            })

        return tool_responses

    def chat(self, user_message: str, user_id: str, debug: bool = False) -> str:
        """Process a user message and return the agent's response.

        Args:
            user_message: The message from the user
            user_id: Unique identifier for the user
            debug: Whether to print debug information

        Returns:
            str: The agent's response
        """
        # Add new user message to history
        user_message_dict = {
            "role": "user",
            "content": user_message
        }
        self.chat_histories[user_id].append(user_message_dict)

        # Construct messages for the LLM
        messages = []
        system_message = self.get_system_message()
        if system_message:
            messages.append(system_message)

        # Add conversation history
        history = list(self.chat_histories[user_id])
        if history:
            messages.extend(history)

        try:
            # First API call that might use tools
            completion = self.client.chat.completions.create(
                model=self.model,
                messages=messages,
                temperature=0.7,
                tools=self.tools,
                tool_choice="auto"
            )

            if debug:
                print(f"#First completion response:\n{completion}\n")

            # Get the assistant's response
            assistant_message = completion.choices[0].message
            response_content = assistant_message.content or ""

            # Check for tool calls
            tool_calls = getattr(assistant_message, 'tool_calls', None)

            # If there are tool calls, process them
            if tool_calls:
                if debug:
                    print(f"#Tool calls detected: {len(tool_calls)}\n")

                # Append the assistant's tool-call message to this request's messages
                # (it is not stored in the per-user chat history)
                messages.append({
                    "role": "assistant",
                    "content": response_content,
                    "tool_calls": [
                        {
                            "id": tc.id,
                            "type": "function",
                            "function": {
                                "name": tc.function.name,
                                "arguments": tc.function.arguments
                            }
                        } for tc in tool_calls
                    ]
                })

                # Process tool calls and get responses
                tool_responses = self.process_tool_calls(tool_calls, user_id, debug=debug)

                # Add tool responses to messages
                for tool_response in tool_responses:
                    messages.append(tool_response)

                # Make a second call to get the final response with retrieved information
                second_completion = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    temperature=0.7
                )

                # Use the final response that includes tool results
                response_content = second_completion.choices[0].message.content or ""

                if debug:
                    print(f"#Final response after tool calls:\n{response_content[:100]}...\n")

            # Store the final response in history
            self.chat_histories[user_id].append({
                "role": "assistant",
                "content": response_content
            })

            return response_content

        except Exception as e:
            error_msg = f"Error in chat: {str(e)}"
            print(error_msg)
            print(traceback.format_exc())
            return error_msg

    def get_chat_history(self, user_id: str) -> list:
        """Retrieve the chat history for a specific user.

        Args:
            user_id: Unique identifier for the user

        Returns:
            list: List of message dictionaries
        """
        return list(self.chat_histories[user_id])


In [None]:
from openai import OpenAI
from tavily import TavilyClient
import os

nebius_client = OpenAI(
    base_url="https://api.studio.nebius.ai/v1/",
    api_key=os.environ.get("NEBIUS_API_KEY"),
)

tavily_client = TavilyClient(api_key=os.environ.get("TAVILY_API_KEY"))

model = "meta-llama/Meta-Llama-3.1-8B-Instruct"

rag_agent = RAGAgent(client=nebius_client, model=model, search_client=tavily_client)

Let's try asking the agent different questions to check how much it relies on retrieval and in which situations it will answer on its own.
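To measure how often the agent actually searches, one option is to wrap the search client in a thin counter. The sketch below is an assumption rather than part of the original solution: `CountingSearchClient` is a hypothetical helper that works with any object exposing a `search(query=..., **kwargs)` method, which is how `RAGAgent.retrieve_information` calls its client.

```python
from typing import Any, Dict, List

class CountingSearchClient:
    """Hypothetical wrapper that records every query sent to the search client."""

    def __init__(self, inner: Any):
        self.inner = inner            # the real client, e.g. a TavilyClient
        self.queries: List[str] = []  # queries in the order they were issued

    def search(self, query: str, **kwargs: Any) -> Dict[str, Any]:
        # Record the query, then delegate to the wrapped client unchanged.
        self.queries.append(query)
        return self.inner.search(query=query, **kwargs)

# Usage (assumes the objects defined in the cells above):
# counting_client = CountingSearchClient(tavily_client)
# rag_agent = RAGAgent(client=nebius_client, model=model, search_client=counting_client)
# ... after a few rag_agent.chat(...) calls:
# print(len(counting_client.queries), counting_client.queries)
```

Comparing the number of recorded queries with the number of user messages gives a rough picture of how much the agent leans on retrieval.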

In [None]:
# Generate a user ID
import uuid
user_id = str(uuid.uuid4())
print(f"Demo User ID: {user_id}")

Demo User ID: 6a05b5e5-b884-4f7e-aeee-3bf1e84ca9d4


In [None]:
result = rag_agent.chat(user_message="Hi there!", user_id=user_id, debug=True)
print(f"=====\n{result}")

#First completion response:
ChatCompletion(id='chatcmpl-2f16f46afad347a3aad864f27703ff91', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="It's nice to meet you. Is there something I can help you with or would you like to chat?", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1745174899, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=23, prompt_tokens=227, total_tokens=250, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

=====
It's nice to meet you. Is there something I can help you with or would you like to chat?


In [None]:
result = rag_agent.chat(user_message="Tell me about the goddess Casandalee from the Pathfinder universe",
                        user_id=user_id, debug=True)
print(f"=====\n{result}")

#First completion response:
ChatCompletion(id='chatcmpl-70834d41cfeb4dc4843cf98e77b45090', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-089a6b0dd07044aca9628948d4d3b9c2', function=Function(arguments='{"query": "Casandalee Pathfinder goddess"}', name='retrieve_information'), type='function')], reasoning_content=None), stop_reason=128008)], created=1745174900, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=28, prompt_tokens=272, total_tokens=300, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

#Tool calls detected: 1

#Processing tool call:
 retrieve_information, args: {'query': 'Casandalee Pathfinder goddess'}

#Tool result:
{"result": {"

In [None]:
result = rag_agent.chat(user_message="Who is Yann LeCun?",
                        user_id=user_id, debug=True)
print(f"=====\n{result}")

#First completion response:
ChatCompletion(id='chatcmpl-6b19a5068b724b4f9e8a9ace0e5f2478', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-350041af29424ec193973a709ec262d3', function=Function(arguments='{"query": "Yann LeCun biography"}', name='retrieve_information'), type='function')], reasoning_content=None), stop_reason=128008)], created=1745174907, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=22, prompt_tokens=584, total_tokens=606, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

#Tool calls detected: 1

#Processing tool call:
 retrieve_information, args: {'query': 'Yann LeCun biography'}

#Tool result:
{"result": {"message": "Retriev

In [None]:
result = rag_agent.chat(user_message="Nice weather, isn't it?",
                        user_id=user_id, debug=True)
print(f"=====\n{result}")

#First completion response:
ChatCompletion(id='chatcmpl-a1ef5413a8d740438209292e937eba67', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="I'm just a language model, I don't have the ability to experience the weather or have personal opinions. However, I can provide you with information about the weather if you're interested! What would you like to know?", refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1745174942, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=46, prompt_tokens=915, total_tokens=961, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

=====
I'm just a language model, I don't have the ability to experience the weather or have personal opinions. However, I can provide you with i

In [None]:
result = rag_agent.chat(user_message="How much is 12 + 13?",
                        user_id=user_id, debug=True)
print(f"=====\n{result}")

#First completion response:
ChatCompletion(id='chatcmpl-cdbe529c597d4c5cab13e23339cee410', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content=None, refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='chatcmpl-tool-3cae93d224a04ccca1e3d17542e10198', function=Function(arguments='{"query": "12 + 13"}', name='retrieve_information'), type='function')], reasoning_content=None), stop_reason=128008)], created=1745175001, model='meta-llama/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=20, prompt_tokens=971, total_tokens=991, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

#Tool calls detected: 1

#Processing tool call:
 retrieve_information, args: {'query': '12 + 13'}

#Tool result:
{"result": {"message": "Retrieved 5 results for query: '1

Our agent probably over-relies on retrieval. It searched the web for "Who is Yann LeCun?", even though I'm pretty sure Llama 3.1 knows who he is, and it even issued a search for "How much is 12 + 13?". This could likely be improved with more careful prompting. On the other hand, small talk such as "Hi there!" and "Nice weather, isn't it?" is answered without web search.
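As one possible starting point for such prompting, a custom system message can be passed through the `get_system_message` parameter of `RAGAgent`. The function name and the prompt wording below are assumptions for illustration. Whether this actually reduces unnecessary searches with an 8B model has to be checked empirically.

```python
from typing import Dict

def retrieval_aware_system_message() -> Dict[str, str]:
    """Hypothetical system message that discourages unnecessary tool calls."""
    return {
        "role": "system",
        "content": (
            "You are a helpful assistant. "
            "Answer greetings, small talk, simple arithmetic, and questions about "
            "well-known people or facts directly from your own knowledge. "
            "Call the retrieve_information tool only when the question concerns "
            "recent events, niche topics, or details you are not confident about."
        ),
    }

# Usage (assumes the objects defined in the cells above):
# rag_agent = RAGAgent(client=nebius_client, model=model, search_client=tavily_client,
#                      get_system_message=retrieval_aware_system_message)
```

Re-running the same test questions with this agent shows whether the tool is still called for "Who is Yann LeCun?" and "How much is 12 + 13?".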