# General

In the previous notebook, we created a document base and tested it with a set of questions. In this notebook, we will explore two strategies to deploy a RAG system. Specifically, we will evaluate a workflow-based approach and an agentic setup.

These correspond to steps 4 and 5 described in the previous notebook.

For this tutorial, **we add two additional collections to our vectorized data. One collection contains Linux information, and the other contains information about Slurm. This slightly complicates the retrieval step**: what happens when we receive a query? We could launch a semantic search over all our data and keep the top three relevant documents from all the collections, or we could:

1. Create a workflow that routes a question through a specific retriever;
2. Create an agent that does this automatically.
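
Before looking at the two routing strategies, here is a minimal sketch of the baseline mentioned above: search every collection and keep the three best hits overall. The scores and texts are made up for illustration, and `global_top_k` is a hypothetical helper, not part of the notebook. It also assumes the scores are comparable across collections, which is only reasonable when every collection uses the same embedder and distance metric.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical (score, text) hits per collection; a higher score means more relevant.
hits_by_collection: Dict[str, List[Tuple[float, str]]] = {
    "HPC": [(0.82, "Leonardo booster partition"), (0.40, "Cineca accounts")],
    "Slurm": [(0.77, "sbatch directives"), (0.65, "squeue usage")],
    "Linux": [(0.30, "chmod basics")],
}

def global_top_k(hits: Dict[str, List[Tuple[float, str]]], k: int = 3) -> List[Tuple[float, str, str]]:
    """Pool the hits of every collection and keep the k best-scoring ones."""
    pooled = [(score, name, text) for name, results in hits.items() for score, text in results]
    return heapq.nlargest(k, pooled, key=lambda item: item[0])

top_hits = global_top_k(hits_by_collection, k=3)
for score, name, text in top_hits:
    print(f"{score:.2f} [{name}] {text}")
```

The drawback of this baseline is that every query pays the cost of searching every collection, which is what the two routing strategies try to avoid.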


# Imports

In [1]:
import base64
from IPython.display import Image, display

import os
import re
import ast
from datetime import datetime
from dotenv import load_dotenv
from typing import List, Optional
from enum import Enum
from pydantic import BaseModel
import random

from langchain_openai import ChatOpenAI
from langchain_chroma.vectorstores import Chroma 
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers import ContextualCompressionRetriever
from langchain_core.documents import Document
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

# Env config

In [2]:
t0 = datetime.now()
date = re.sub(r"[ :-]", "_", str(t0)[:19])
print(f"Last execution {t0}")

load_dotenv()

# Models
EMBEDDER = "BAAI/bge-m3"
RERANKER = "BAAI/bge-reranker-v2-m3"
LLM = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"

# Endpoints
VLLM_OPENAI_ENDPOINT = os.environ["VLLM_OPENAI_ENDPOINT"]
VLLM_KEY = os.environ["VLLM_KEY"]

# Paths
PROMPT_PATH = "../data/prompts"
CHROMA_PATH = "../data/output/chunking/chroma"

llm = ChatOpenAI(base_url=VLLM_OPENAI_ENDPOINT, api_key=VLLM_KEY, model=LLM, temperature=0, max_completion_tokens=3000)

# Use the third GPU (index 2) for this notebook. GPUs 0 and 1 are used to load the llm
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

Last execution 2025-07-28 16:51:48.413683


In [3]:
# Util function to plot mermaid graphs
def mm(graph):
    graphbytes = graph.encode("utf8")
    base64_bytes = base64.urlsafe_b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))

# The collections

In [4]:
# This is basically the class we implemented in the previous notebook, except for the fact that we removed the llm generation call
class SemanticRetriever():
    def __init__(self, top_k:int, collection_name:str, chroma_path:str, embedder_name:str, reranker_name:str):
        self.reranker_name = reranker_name
        lc_embedder = HuggingFaceEmbeddings(model_name = embedder_name)
        vector_store = Chroma(collection_name = collection_name, embedding_function = lc_embedder, persist_directory = chroma_path)
        # Fetch 20 candidates when a reranker will trim them to top_k, otherwise fetch top_k directly
        self.retriever = vector_store.as_retriever(search_kwargs = {"k": 20 if self.reranker_name else top_k})
        if self.reranker_name:
            reranker = HuggingFaceCrossEncoder(model_name=reranker_name)
            compressor = CrossEncoderReranker(model=reranker, top_n=top_k)
            
            self.compression_retriever = ContextualCompressionRetriever(base_compressor=compressor, base_retriever=self.retriever)

    def retrieve(self, query:str):
        if self.reranker_name:
            retrieved_chunks:List[Document] = self.compression_retriever.invoke(query)
        else:
            retrieved_chunks:List[Document] = self.retriever.invoke(query)

        resources = "[RETRIEVED_RESOURCES]:\n\n" 
        for chunk in retrieved_chunks:
            page_content = chunk.page_content
            doc_name = chunk.metadata["doc_name"]
            resources += "[DOCUMENT_TITLE]: " + doc_name + "\n[DOCUMENT_CONTENT]: " + page_content + "\n\n"
        return resources

In [5]:
class SlurmDocumentationRetriever():
    def __init__(self):
        self.slurm_docs = [Document(page_content="SLURM Overview: SLURM (Simple Linux Utility for Resource Management) is an open-source workload manager used for scheduling and managing jobs on HPC clusters, handling job queues, and allocating resources on compute nodes."),
                           Document(page_content="SLURM Architecture: Understand how SLURM components interact, including the controller daemon (slurmctld), compute node daemons (slurmd), and optional database daemon (slurmdbd) for accounting."),
                           Document(page_content="Comparison with Other Workload Managers: Compare SLURM with PBS, LSF, and other schedulers to understand differences in syntax, features, and usage within HPC environments."),
                           Document(page_content="SLURM Key Components: Learn about slurmctld for managing jobs, slurmd for executing jobs on compute nodes, slurmdbd for accounting, and configuration files like slurm.conf, cgroup.conf, and gres.conf for resource management."),
                           Document(page_content="Basic SLURM User Commands: Commands like sbatch (submit jobs), squeue (check job queue), scontrol show job (inspect job details), scancel (cancel jobs), and srun or salloc (interactive job execution)."),
                           Document(page_content="SLURM Job Scripts: Structure and syntax for writing SLURM batch scripts, including SBATCH directives for setting job names, outputs, errors, wall time, partitions, node and CPU requests, memory, and GPU allocations."),
                           Document(page_content="Resource Management: Learn about partitions and priorities in SLURM, scheduling policies like backfill and fair-share, node features, GPU scheduling, and requesting memory and CPU resources accurately."),
                           Document(page_content="Monitoring and Debugging in SLURM: Tools like sinfo (cluster state), squeue and scontrol (job monitoring), sacct and sreport (accounting), seff (job efficiency check), and methods for debugging stuck or failed jobs."),
                           Document(page_content="Advanced SLURM Usage: Using job arrays for batch submission, managing job dependencies, checkpointing, requeueing failed jobs, heterogeneous job scheduling, and requesting advanced resources like licenses or specialized hardware."),
                           Document(page_content="SLURM Accounting and Reporting: Setting up slurmdbd, configuring accounting_storage, managing users and accounts with sacctmgr, using QoS for prioritization, and generating usage and efficiency reports using sreport."),
                           Document(page_content="SLURM Configuration and Administration: Writing and maintaining slurm.conf, configuring partitions and node states (drain, down, idle), managing fair-share policies, and using cgroup integration for resource control."),
                           Document(page_content="Integrations and Extensions: Integration of SLURM with MPI for parallel workloads, using Singularity or Apptainer containers with SLURM jobs, profiling jobs with sstat and sacct, and integrating with monitoring tools like Prometheus and Grafana."),
                           Document(page_content="SLURM REST API: Learn about slurmrestd for exposing SLURM capabilities over HTTP/REST for programmatic management and monitoring of SLURM clusters."),
                           Document(page_content="Security and Policies in SLURM: Managing user authentication, permission controls, node isolation, job sandboxing, enforcing resource limits, and setting job timeout policies within a SLURM-managed HPC cluster."),
                           Document(page_content="Best Practices for SLURM: Writing efficient and clear job scripts, effectively using job arrays for scalable workloads, managing GPU and node usage for efficiency, profiling and optimizing SLURM jobs, and systematic debugging of issues.")
                          ]
        
    def retrieve(self, query:str) -> str:
        retrieved_chunks:List[Document] = [random.choice(self.slurm_docs)]
        
        resources = "[RETRIEVED_RESOURCES]:\n\n" 
        for chunk in retrieved_chunks:
            page_content = chunk.page_content
            resources += "[DOCUMENT_CONTENT]: " + page_content + "\n\n"
            
        return resources

In [6]:
class LinuxDocumentationRetriever():
    def __init__(self):
        self.linux_doc = [Document(page_content="Linux Overview: Linux is an open-source, Unix-like operating system kernel used for managing hardware resources and providing essential services to run applications on servers, desktops, and embedded systems."),
                          Document(page_content="Linux Distributions: Learn about different Linux distributions (distros) such as Ubuntu, CentOS, Debian, Fedora, and Arch, which package the Linux kernel with user-space utilities and package managers for different use cases."),
                          Document(page_content="Linux Filesystem Hierarchy: Understand the Linux filesystem structure, including root (/), /home, /etc, /var, /usr, /tmp, and the purpose of each directory in system organization."),
                          Document(page_content="File and Directory Management: Basic file operations using commands like ls, cp, mv, rm, mkdir, rmdir, and advanced file management using find, locate, and file permissions (chmod, chown, chgrp)."),
                          Document(page_content="User and Group Management: Managing users and groups with commands like useradd, userdel, passwd, groupadd, and understanding /etc/passwd, /etc/shadow, and /etc/group files."),
                          Document(page_content="Process Management: Understanding Linux processes, using ps, top, htop, kill, pkill, nice, renice, and background/foreground job control with &, fg, bg, and jobs."),
                          Document(page_content="Package Management: Installing, updating, and removing packages using apt (Debian/Ubuntu), yum/dnf (RHEL/CentOS/Fedora), and pacman (Arch), and building packages from source."),
                          Document(page_content="System Services and Daemons: Managing system services using systemd (systemctl), service, and chkconfig, and understanding how daemons run in the background to provide essential system services."),
                          Document(page_content="Networking Basics: Configuring network interfaces using ip, ifconfig, and nmcli, checking connectivity with ping, traceroute, and managing network services like SSH and firewalls."),
                          Document(page_content="Disk and Filesystem Management: Managing disks and partitions using fdisk, lsblk, blkid, formatting with mkfs, mounting filesystems with mount/umount, and checking disk usage with df and du."),
                          Document(page_content="System Monitoring and Logging: Using monitoring tools like top, htop, iostat, vmstat, free, and inspecting logs in /var/log using tail, less, journalctl, and log rotation."),
                          Document(page_content="Shell Scripting: Writing basic and advanced shell scripts using bash, using variables, loops, conditionals, and creating reusable scripts for automation of system tasks."),
                          Document(page_content="Security and Permissions: Managing file permissions, using sudo for privilege escalation, configuring SSH security, using fail2ban, and keeping the system updated for security patches."),
                          Document(page_content="System Backup and Recovery: Using tar, rsync, and cron for backups, creating disk images, and recovery strategies to restore system states in case of failures."),
                          Document(page_content="Advanced Linux Topics: Kernel modules management, performance tuning, using system calls, cgroups for resource management, and using containers (Docker, Podman) on Linux systems."),
                          Document(page_content="Linux Best Practices: Keeping systems updated, securing SSH access, using least privilege, monitoring system resources regularly, automating tasks with cron, and maintaining clean logs and disk usage.")
                         ]
        
    def retrieve(self, query:str) -> str:
        retrieved_chunks:List[Document] = [random.choice(self.linux_doc)]

        resources = "[RETRIEVED_RESOURCES]:\n\n" 
        for chunk in retrieved_chunks:
            page_content = chunk.page_content
            resources += "[DOCUMENT_CONTENT]: " + page_content + "\n\n"
            
        return resources

In [7]:
hpc_retriever = SemanticRetriever(top_k = 3, collection_name = "hpc_contextualized_wiki", chroma_path = CHROMA_PATH, embedder_name = EMBEDDER, reranker_name = None)
slurm_retriever = SlurmDocumentationRetriever()
linux_retriever = LinuxDocumentationRetriever()



# Workflow

We can ask the LLM which collection is the most relevant for a given question, and then run the search only on that collection.

![workflow](imgs/router_rag.drawio.png)  

We need to **constrain the model’s answers to a specified set of categories** (e.g., HPC, Slurm, etc.). We can reduce unwanted outputs from the model (e.g., prepending "Absolutely, here is your category") when returning answers by using guided decoding.

Unlike standard decoding methods such as greedy search, sampling, or beam search, guided decoding incorporates additional rules during generation to influence the model’s output. During token generation, the model’s output logits (the scores before applying softmax) are influenced by **masking invalid options. Valid options are specified using Pydantic models**, JSON schemas, or similar tools.

![guided](imgs/method1.png)  
Image courtesy of: https://lmsys.org/blog/2024-02-05-compressed-fsm/.
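
To make the masking idea concrete, here is a toy sketch for a single decoding step. The logit values are invented, and each category is treated as a single token, which is a simplification: real guided decoding backends apply the mask token by token, usually by following a grammar or a finite-state machine. `constrained_argmax` is a hypothetical helper written for this example.

```python
import math
from typing import Dict, Set

def constrained_argmax(logits: Dict[str, float], allowed: Set[str]) -> str:
    """Pick the highest-scoring token among the allowed ones."""
    # Invalid tokens get a score of -inf, so they can never be selected
    masked = {token: (score if token in allowed else -math.inf) for token, score in logits.items()}
    return max(masked, key=masked.get)

# Made-up scores for the first generated token
logits = {"Absolutely": 3.1, "HPC": 2.4, "Slurm": 1.9, "Linux": 0.7, "None": 0.2}
allowed = {"Linux", "Slurm", "HPC", "None"}

unconstrained = max(logits, key=logits.get)
constrained = constrained_argmax(logits, allowed)
print(unconstrained, "->", constrained)
```

Without the mask the model would start with "Absolutely"; with the mask the best valid category, "HPC", is chosen.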

In [8]:
class Collection(str, Enum):
    LINUX = "Linux"
    SLURM = "Slurm"
    HPC = "HPC"
    NA = "None"

class QueryCategory(BaseModel):
    category:Collection

def documentation_router(query:str, llm):
    few_shot_prompt = f"""Given the following query, answer with a category chosen from the following:
- Linux: Use this category if the question is about Linux.
- Slurm: Use this category if the question is a general question about Slurm.
- HPC: Use this category if the question is about Cineca, Leonardo, or is a specific Slurm question about how Slurm is configured on Leonardo.
- None: Use this category if the question is generic and does not fall into any of the previous categories.

Here are some examples:
[QUERY]: How many GPUs does Leonardo have?
[ANSWER]: HPC
[QUERY]: Have you ever read Lord of the Rings?
[ANSWER]: None
[QUERY]: What is Linus Torvalds' favourite distro?
[ANSWER]: Linux
[QUERY]: When was Slurm invented?
[ANSWER]: Slurm

[QUERY]: {query}
[ANSWER]:
    """
    answer = llm.with_structured_output(QueryCategory).invoke([("system", "Answer according to the user's instructions."),
                           ("user", few_shot_prompt)])

    # Route the query (the function argument, not a global variable) to the selected retriever
    if answer.category is Collection.LINUX:
        return linux_retriever.retrieve(query)
    elif answer.category is Collection.HPC:
        return hpc_retriever.retrieve(query)
    elif answer.category is Collection.SLURM:
        return slurm_retriever.retrieve(query)
    else:
        return "[RETRIEVED_RESOURCES]: No resources were retrieved for this question."

In [9]:
query_set = ["Tell me something about Linux",
             "Tell me something about Slurm",
             "What's the weather like in Italy?",
             "What are the names of the QOS queues available on the Leonardo supercomputer BOOSTER partition?",
             "What is the scheduler used on leonardo? slurm or pbs?"]

for question in query_set:
    documents = documentation_router(question, llm)
    # Build the final prompt: the question followed by the retrieved resources
    prompt = f"[QUESTION]:\n\n{question}\n\n{documents}"
    print(prompt)
    answer = llm.stream(prompt)
    print("[ANSWER]: ")
    for token in answer:
        print(token.content, end = "")
    print("\n=======================================================")
    

[QUESTION]:

Tell me something about Linux

[RETRIEVED_RESOURCES]:

[DOCUMENT_CONTENT]: Disk and Filesystem Management: Managing disks and partitions using fdisk, lsblk, blkid, formatting with mkfs, mounting filesystems with mount/umount, and checking disk usage with df and du.


[ANSWER]: 
Linux is a widely-used open-source operating system known for its stability, security, and flexibility. It is based on the Unix operating system and is often used in servers, desktops, and embedded systems. Here are some key points about Linux, particularly focusing on disk and filesystem management:

### Disk and Filesystem Management

1. **Managing Disks and Partitions:**
   - **fdisk:** A command-line utility used to create, delete, and manipulate disk partitions. It provides a text-based interface for managing disk partitions.
   - **lsblk:** Lists information about all available or the specified block devices. It provides a tree-like structure of the block devices.
   - **blkid:** Displays or m

Pros:
- Except for the LLM categorization step, you are in control of the workflow.

Cons:
- If you add more data sources, you need to update both your code and the few-shot prompt;
- If you code a chain of if/else clauses (as in this case), you can miss cases where it is useful to search multiple document bases for the same question. However, you can solve this by asking the model to return a list of categories and then launching the semantic search step on each category.
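
The multi-category idea in the last point can be sketched as follows. This is only an illustration under stated assumptions: `Source`, `stub_retrievers`, and `retrieve_from_many` are hypothetical names, and the retrievers are stand-in lambdas rather than the notebook's retriever objects. In the real workflow the list of categories would come from the structured output of the model, for example from a schema with a field of type `List[Collection]`.

```python
from enum import Enum
from typing import Callable, Dict, List

class Source(str, Enum):
    LINUX = "Linux"
    SLURM = "Slurm"
    HPC = "HPC"

# Stand-in retrievers; in the notebook these would be the retrieve methods
stub_retrievers: Dict[Source, Callable[[str], str]] = {
    Source.LINUX: lambda query: f"[LINUX] results for: {query}",
    Source.SLURM: lambda query: f"[SLURM] results for: {query}",
    Source.HPC: lambda query: f"[HPC] results for: {query}",
}

def retrieve_from_many(categories: List[Source], query: str) -> str:
    """Query every selected collection once and concatenate the results."""
    unique_categories = list(dict.fromkeys(categories))  # drop duplicates, keep order
    if not unique_categories:
        return "[RETRIEVED_RESOURCES]: No resources were retrieved for this question."
    return "\n\n".join(stub_retrievers[category](query) for category in unique_categories)

resources = retrieve_from_many(
    [Source.HPC, Source.SLURM, Source.HPC],
    "What is the scheduler used on Leonardo?",
)
print(resources)
```

A dictionary lookup replaces the if/else chain, so adding a data source means adding one entry rather than one more branch.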


# Agents and agentic RAG

In an agentic RAG setup, we leave to the model the burden of deciding where to search. In other words, the model "has agency".
Let's build an agent from scratch, and then use a framework to do the same thing.

Agents use the **ReAct pattern**, which is based on a **Thought-Action-Observation** cycle.

![react_flow](imgs/react.png)
Image courtesy of: https://www.ibm.com/think/topics/react-agent.

- Thought: the LLM checks the conversation and decides what the next step should be;
- Action: a tool is called or a snippet of code is written;
  - An external parser executes the snippet of code or the tool the LLM has called;
- Observation: the result of the function call is added to the message history and given back to the model.

These steps are specified in the system prompt and are executed in a loop until the model decides that the task has been accomplished.

In [10]:
def apply_chat_template(system_prompt:str, user_messages:list[str], system_messages:list[str]) -> list[dict]:
    """Util function to create a chat history with the correct format."""
    messages = [{"role" : "system", "content" : system_prompt}]
    for i in range(max(len(user_messages), len(system_messages))):
        try:
            messages.append({"role" : "user", "content" : user_messages[i]})
        except IndexError:
            pass
        try:
            messages.append({"role" : "assistant", "content" : system_messages[i]})
        except IndexError:
            pass
    return messages

In [11]:
def search_hpc_wiki(query:str) -> str:
    """Retrieves relevant information from Cineca's HPC internal wiki in response to a natural language query, returning the results as a plain text string.
    
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return hpc_retriever.retrieve(query)

def search_linux_wiki(query:str) -> str:
    """Returns general trivia from the internal Linux documentation wiki in response to a natural language query, returning the results as a plain text string.
 
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return linux_retriever.retrieve(query)
    
def search_slurm_wiki(query:str) -> str:
    """Retrieves general trivia from the internal Slurm documentation wiki in response to a natural language query, returning the results as a plain text string.
 
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return slurm_retriever.retrieve(query)

In [12]:
print("[NAME]: " + search_hpc_wiki.__name__)
print("[DESCRIPTION]: " + search_hpc_wiki.__doc__)

[NAME]: search_hpc_wiki
[DESCRIPTION]: Retrieves relevant information from Cineca's HPC internal wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    


In [13]:
functions = [search_hpc_wiki, search_linux_wiki, search_slurm_wiki]

In [14]:
functions_to_str = "\n".join([f"- {item.__name__}: {item.__doc__}" for item in functions])
print(functions_to_str)

- search_hpc_wiki: Retrieves relevant information from Cineca's HPC internal wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    
- search_linux_wiki: Returns general trivia from the internal Linux documentation wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    
- search_slurm_wiki: Retrieves general trivia from the internal Slurm documentation wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    


In [15]:
FUNCTION_CALLING_PROMPT = (
    "You are a helpful assistant trained to answer user questions. "
    "You will receive a question from the user. You must answer using your knowledge or "
    "tools. Here is a list of the available tools.\n\n[TOOLS]:\n"
    f"{functions_to_str}\n"
    "When you call a tool you need to use the following format: "
    "TOOL_CALL: {'tool_name': <name_of_the_tool>, 'parameters':['<parameter_value>']}\n"
    "Here is an example: TOOL_CALL: {'tool_name': 'count_occurrences', 'parameters': ['s', 'strawberry']}\n"
    "Use only the tools specified in the TOOLS section. Do not invent tools.\n"
    "The tool you choose will be executed by an external Python interpreter and "
    "the result will be given to you in the following form: TOOL_RESULT: <value>.\n"
    "You can use the tool result to give a final answer. "
    "When you know the final answer, always start your answer with FINAL_ANSWER:\n"
    "Do not invent answers. "
)

print(FUNCTION_CALLING_PROMPT)

You are a helpful assistant trained to answer user questions. You will receive a question from the user. You must answer using your knowledge or tools. Here is a list of the available tools.

[TOOLS]:
- search_hpc_wiki: Retrieves relevant information from Cineca's HPC internal wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    
- search_linux_wiki: Returns general trivia from the internal Linux documentation wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    
- search_slurm_wiki: Retrieves general trivia from the internal Slurm documentation wiki in response to a natural language query, returning the results as a plain text string.

    Arguments:


In [16]:
query = apply_chat_template(FUNCTION_CALLING_PROMPT, ["What GPUs are used on Leonardo?"], []) 
completion = llm.invoke(query)
print(completion.content)

TOOL_CALL: {'tool_name': 'search_hpc_wiki', 'parameters': ['What GPUs are used on Leonardo?']}


In [17]:
def model_ans_parser(model_answ:list[str], user_queries:list[str]):
    """
    Calls a tool if the last model answer was a tool call and appends the result to the user queries.
    """
    TOOL_CALL_PLACEHOLDER = "TOOL_CALL: "
    # Only the functions listed in `functions` can be called
    available_tools = {function.__name__: function for function in functions}
    # This is a function call, we need to parse model output
    if TOOL_CALL_PLACEHOLDER in model_answ[-1]:
        # Keep only the text that follows the placeholder
        answer = model_answ[-1].split(TOOL_CALL_PLACEHOLDER, 1)[1].strip()
        try:
            # Cast the string to a real dict
            answer = ast.literal_eval(answer)
            # Look up the function by name instead of calling eval on model output
            function = available_tools[answer["tool_name"]]
            params = answer["parameters"]
            # Call the tool
            tool_result = function(*params)
        except (KeyError, TypeError, ValueError, SyntaxError) as error:
            # Report the problem to the model so that it can retry
            tool_result = f"Error while parsing or executing the tool call: {error!r}"
        user_queries.append(f"TOOL_RESULT: {tool_result}")
    # In both cases we return the (possibly updated) lists
    return model_answ, user_queries

In [18]:
print(model_ans_parser([completion.content], ["Tell me something about Linux"]))



In [19]:
def check_final_answer(model_answ:list[str]):
    FINAL_ANSW_PLACEHOLDER = "FINAL_ANSWER:"
    if len(model_answ):
        # Check the last answer
        if FINAL_ANSW_PLACEHOLDER in model_answ[-1]:
            # Last answer had the final answ placeholder
            return True
        else:
            # Last answer was not the final answer
            return False
    else:
        # First interaction, model answ is empty
        return False

Now we just need a while loop so that the model keeps calling functions until it has all the resources it needs to produce its final answer. Remember that the final answer differs from all other messages because it contains the "FINAL_ANSWER" tag.

In [20]:
user_queries = ["What GPUs are used on Leonardo?"]
model_answers = []

while not check_final_answer(model_answers):
    # Build the chat history
    query = apply_chat_template(FUNCTION_CALLING_PROMPT, user_queries, model_answers) 
    # Send messages to the model
    completion = llm.invoke(query)
    # Extract model answer
    answer = completion.content
    # Append the answer to the list of answers
    model_answers.append(answer)
    # Parse the answer
    model_answers, user_queries = model_ans_parser(model_answers, user_queries)

print(model_answers[-1])

FINAL_ANSWER: The GPUs used on Leonardo are NVIDIA Ampere A100-64 accelerators.


Let's inspect what happened.

In [21]:
print(apply_chat_template(FUNCTION_CALLING_PROMPT, user_queries, model_answers) )



# Considerations

The loop we scripted is instructive, but it is not production ready (e.g., we should have used guided decoding for the function calls, error handling is minimal, etc.). In real industrial scenarios you probably want to use something more "battle tested".
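
As an illustration of what is missing, here is a minimal sketch of the same loop with a step limit and error handling around the tool call. It is not the notebook's implementation: `fake_search`, `TOOLS`, `run_tool_call`, and `run_agent` are hypothetical names, and the LLM is replaced by a scripted stand-in so that the example runs without a model endpoint.

```python
import ast
from typing import Callable, Dict, List

def fake_search(query: str) -> str:
    """Hypothetical tool standing in for a real retriever."""
    return f"stub result for '{query}'"

TOOLS: Dict[str, Callable[..., str]] = {"fake_search": fake_search}

def run_tool_call(raw_call: str) -> str:
    """Parse and execute a tool call, returning an error message instead of raising."""
    try:
        call = ast.literal_eval(raw_call)
        return str(TOOLS[call["tool_name"]](*call["parameters"]))
    except (KeyError, TypeError, ValueError, SyntaxError) as error:
        return f"ERROR: invalid tool call ({error!r})"

def run_agent(model: Callable[[List[str]], str], question: str, max_steps: int = 5) -> str:
    """Run the Thought-Action-Observation loop for at most max_steps model calls."""
    history = [question]
    for _ in range(max_steps):
        answer = model(history)
        history.append(answer)
        if answer.startswith("FINAL_ANSWER:"):
            return answer
        if answer.startswith("TOOL_CALL: "):
            # The observation (or the error) is fed back to the model
            history.append("TOOL_RESULT: " + run_tool_call(answer[len("TOOL_CALL: "):]))
    return "FINAL_ANSWER: step limit reached without a final answer."

# Scripted stand-in for the LLM: an unknown tool, a valid call, then the answer
scripted = iter([
    "TOOL_CALL: {'tool_name': 'unknown_tool', 'parameters': ['x']}",
    "TOOL_CALL: {'tool_name': 'fake_search', 'parameters': ['Leonardo GPUs']}",
    "FINAL_ANSWER: done",
])
result = run_agent(lambda history: next(scripted), "What GPUs are used on Leonardo?")
print(result)
```

The step limit guarantees termination even when the model never emits the final-answer tag, and a failed tool call becomes an observation the model can react to instead of an exception that stops the loop.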

# Langchain

Here we use the LangGraph framework. Each document base becomes a tool of our agent. The function `create_react_agent` builds a LangGraph graph that encodes the loop we scripted before.

In [22]:
@tool
def search_hpc_wiki(query:str) -> str:
    """Retrieves relevant information from Cineca's HPC internal wiki in response to a natural language query, returning the results as a plain text string.
    
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return hpc_retriever.retrieve(query)

@tool
def search_linux_wiki(query:str) -> str:
    """Returns general trivia from the internal Linux documentation wiki in response to a natural language query, returning the results as a plain text string.
 
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return linux_retriever.retrieve(query)
    
@tool
def search_slurm_wiki(query:str) -> str:
    """Retrieves general trivia from the internal Slurm documentation wiki in response to a natural language query, returning the results as a plain text string.
 
    Arguments:
        - query: a string with the user query.

    Returns:
        str: a string with the retrieved context.
    """
    return slurm_retriever.retrieve(query)

langchain_tools = [search_hpc_wiki, search_linux_wiki, search_slurm_wiki]
agent = create_react_agent(llm, langchain_tools)

In [23]:
mm(agent.get_graph().draw_mermaid())

In [24]:
for query in query_set:

    response = agent.invoke({"messages": [query]})

    for message in response["messages"]:
        message.pretty_print()
    print("================================================================================")


Tell me something about Linux
Tool Calls:
  search_linux_wiki (QaPWBAFPA)
 Call ID: QaPWBAFPA
  Args:
    query: Linux
Name: search_linux_wiki

[RETRIEVED_RESOURCES]:

[DOCUMENT_CONTENT]: Linux Distributions: Learn about different Linux distributions (distros) such as Ubuntu, CentOS, Debian, Fedora, and Arch, which package the Linux kernel with user-space utilities and package managers for different use cases.



Linux is an open-source operating system kernel that was first released on September 17, 1991, by Linus Torvalds. It has since grown to become one of the most widely used and influential operating systems in the world. Linux is known for its stability, security, and flexibility, making it a popular choice for servers, desktops, and embedded systems.

There are many different Linux distributions, often referred to as "distros," each tailored to specific use cases and user preferences. Some of the most popular Linux distributions include:

1. **Ubuntu**: Known for its user-frie

# Next steps

In this tutorial, we have seen how a RAG system can be implemented using semantic search, and how multiple document bases can be integrated into both a workflow-based and an agentic setup.

Although this tutorial was very introductory, I hope it helps you on your GenAI journey. If you want to take your knowledge to the next level, here are some additional topics you might want to explore:

- Hypothetical document embeddings and hypothetical questions (a powerful technique to improve retrieval results);
- Memory management in agents;