# ReAct Agents from Scratch

In this chapter, we will build a ReAct agent *from scratch*. That is, no frameworks or unnecessary abstractions, so that we can truly understand how ReAct agents work.

We will be modifying ReAct a little to take advantage of the progress that has been made in LLMs since the paper was originally released. We make two modifications:

1. We use chat models. When ReAct was released, LLMs were not fine-tuned specifically for chat and were instead prompted to generate chat-like dialogues. Most **S**tate **o**f **t**he **A**rt (SotA) LLMs nowadays are built specifically for chat, so the input to them must be modified to be chat-model friendly.

2. We will use JSON mode to force structured output from our LLMs. The original ReAct method simply instructed the LLM to output everything in a particular format. That works but occasionally breaks. By forcing JSON output we reduce the likelihood of poorly structured output *and* make it easier for our downstream code to parse and use the output from our LLM. To accommodate this we modify the instructions to ask for `thought` and `action` steps in a JSON format.
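
To make the target format concrete, here is a minimal sketch of how downstream code might parse and check one such step. The helper name `parse_step` and the exact checks are illustrative assumptions rather than part of the original ReAct method; the key names (`thought`, `action`, `tool`, `args`) follow the prompt used in this chapter.

```python
import json


def parse_step(raw: str) -> dict:
    """Parse one LLM step and check it has the expected shape (hypothetical helper)."""
    step = json.loads(raw)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(step, dict):
        raise ValueError("step must be a JSON object")
    if not isinstance(step.get("thought"), str):
        raise ValueError("step needs a string `thought`")
    action = step.get("action")
    if not isinstance(action, dict):
        raise ValueError("step needs an `action` object")
    if not isinstance(action.get("tool"), str):
        raise ValueError("`action` needs a string `tool`")
    if not isinstance(action.get("args"), dict):
        raise ValueError("`action` needs an `args` object")
    return step


# a made-up step in the format described above
example = '{"thought": "I should search", "action": {"tool": "search", "args": {"query": "Tokyo weather"}}}'
step = parse_step(example)
print(step["action"]["tool"], step["action"]["args"])
```

JSON mode helps ensure the output is valid JSON, but it does not by itself guarantee these particular keys are present, which is why a shape check like this can still be useful.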

In [1]:
system_prompt = """
You are a helpful assistant. Given a user query you must provide a `thought` and
`action` step that take one step towards solving the user's query. Both the
`thought` and `action` steps will be contained in JSON output.

The `thought` is the first key mapping to your reasoning on how to solve the
user's query.

The `action` step is the second key mapping that describes how you wish to use
the chosen tool to solve the user's query. It contains a `tool` key mapping to
the name of the tool to use and a `args` key containing a JSON object of
arguments to pass to the tool.

Here is an example:

user: What is the weather in Tokyo?
assistant: {
  "thought": "I need to find out the current temperature in Tokyo",
  "action": {"tool": "search", "args": {"query": "current temperature in Tokyo"}}
}

If you have performed any previous thought and action steps, you will find them
below under the "Previous Steps" section. Alongside these you will find an
`observation` key containing the output of those previous actions.
"""

We haven't defined any tools or agent logic yet, but let's see what type of output
our LLM produces if we prompt it with this system prompt.

Make sure you have Ollama running and Llama 3.2 downloaded by executing this in your terminal:

```
ollama pull llama3.2:3b-instruct-fp16
```

If you need guidance on setting up Ollama, please refer to our [guidelines](https://github.com/aurelio-labs/agents-course?tab=readme-ov-file#ollama).

In [2]:
import ollama

res = ollama.chat(
    model="llama3.2:3b-instruct-fp16",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is the date today?"},
    ],
    format="json",
    options={"temperature": 0.0}
)

print(res["message"]["content"])

{
  "thought": "I need to determine the current date",
  "action": {
    "tool": "date",
    "args": {}
  }
}


Great, we're outputting the correct format. *However*, we don't have a `date` tool. In fact, we don't have *any* tools! Let's define some.

First, we'll define a search tool. This tool will allow our agent to search the web for information. To implement it we will use the Tavily API. It comes with a number of free requests, but we do need to [sign up for the API](https://app.tavily.com/home) and get an API key to use it.

In [3]:
import requests

TAVILY_API_KEY = "tvly-..."  # put your API key here!

tavily_url = "https://api.tavily.com"

res = requests.post(
    f"{tavily_url}/search",
    json={
        "api_key": TAVILY_API_KEY,
        "query": "What is the weather in Tokyo?"
    },
)

res.json()

{'query': 'What is the weather in Tokyo?',
 'follow_up_questions': None,
 'answer': None,
 'images': [],
 'results': [{'title': 'Weather in Tokyo',
   'url': 'https://www.weatherapi.com/',
   'content': "{'location': {'name': 'Tokyo', 'region': 'Tokyo', 'country': 'Japan', 'lat': 35.6895, 'lon': 139.6917, 'tz_id': 'Asia/Tokyo', 'localtime_epoch': 1730676560, 'localtime': '2024-11-04 08:29'}, 'current': {'last_updated_epoch': 1730675700, 'last_updated': '2024-11-04 08:15', 'temp_c': 17.9, 'temp_f': 64.3, 'is_day': 1, 'condition': {'text': 'Partly Cloudy', 'icon': '//cdn.weatherapi.com/weather/64x64/day/116.png', 'code': 1003}, 'wind_mph': 5.6, 'wind_kph': 9.0, 'wind_degree': 181, 'wind_dir': 'S', 'pressure_mb': 1023.0, 'pressure_in': 30.22, 'precip_mm': 0.0, 'precip_in': 0.0, 'humidity': 61, 'cloud': 60, 'feelslike_c': 17.9, 'feelslike_f': 64.3, 'windchill_c': 17.9, 'windchill_f': 64.3, 'heatindex_c': 17.9, 'heatindex_f': 64.3, 'dewpoint_c': 10.5, 'dewpoint_f': 50.8, 'vis_km': 10.0, 'vi

As you can see, the search response doesn't give us much useful information beyond some URLs that *do contain* the information we need. Fortunately, we can extract this information via Tavily's `/extract` endpoint.

In [4]:
urls = [x["url"] for x in res.json()["results"]]

res = requests.post(
    f"{tavily_url}/extract",
    json={
        "api_key": TAVILY_API_KEY,
        "urls": urls
    },
)

In [5]:
res.json()

{'results': [{'url': 'https://www.qweather.com/en/weather/tokyo-65E77.html',
   'raw_content': 'Japan\xa0\xa0\xa02024-11-04\xa0\xa0\xa0Monday\xa0\xa0\xa035.68N, 139.81E\nTokyo\nTokyo\n2024-11-04 07:52\n12°\nSunny\n\n                    3KM/H\n                \nNNE\n82%\nHumidity\xa0\nModerate\nUV\xa0\n12°\nFeels Like\n18km\nVisibility\n0.0mm\nPrecipitation\n1023hPa\nPressure\n24H Forecast\nTemperature\nForecast\nToday\n4 Nov\nTue\n5 Nov\nWed\n6 Nov\nThu\n7 Nov\nFri\n8 Nov\nSat\n9 Nov\nSun\n10 Nov\nSun\nNow\n08:29\nNow\n08:29\nAltitude\n24°\nHeading\n133°SE\nMoon\nNow\n08:29\nNow\n08:29\nAltitude\n-1°\nHeading\n122°ESE\nTemperature\nSatellite imagery\nWeather Rank\nAza Chatan\nGinowan\nGushikawa\nKatsuren Haebaru\nItoman\nOkinawa-shi\nTomigusuku-shi\nIoto\nIkenosawa\nNago-shi\nTeine-ku\nOtaru-shi\nYoichi-cho\nAogashima\nMombetsu-shi\nKita-ku\nHigashi-ku\nNishi-ku\nMinami-ku\nMinami5-Jonishi\nNearby\nUrayasu-shi\nShinagawa-ku\nIchikawa-shi\nMatsudo\nYashio\nNerima\nSoka-shi\nFunabashi\nT

We can see that most requests didn't work, but that's okay. We requested multiple URLs and fortunately received one result that looks perfect. We can extract that information like so:

In [6]:
print(res.json()["results"][0]["raw_content"])

Japan   2024-11-04   Monday   35.68N, 139.81E
Tokyo
Tokyo
2024-11-04 07:52
12°
Sunny

                    3KM/H
                
NNE
82%
Humidity 
Moderate
UV 
12°
Feels Like
18km
Visibility
0.0mm
Precipitation
1023hPa
Pressure
24H Forecast
Temperature
Forecast
Today
4 Nov
Tue
5 Nov
Wed
6 Nov
Thu
7 Nov
Fri
8 Nov
Sat
9 Nov
Sun
10 Nov
Sun
Now
08:29
Now
08:29
Altitude
24°
Heading
133°SE
Moon
Now
08:29
Now
08:29
Altitude
-1°
Heading
122°ESE
Temperature
Satellite imagery
Weather Rank
Aza Chatan
Ginowan
Gushikawa
Katsuren Haebaru
Itoman
Okinawa-shi
Tomigusuku-shi
Ioto
Ikenosawa
Nago-shi
Teine-ku
Otaru-shi
Yoichi-cho
Aogashima
Mombetsu-shi
Kita-ku
Higashi-ku
Nishi-ku
Minami-ku
Minami5-Jonishi
Nearby
Urayasu-shi
Shinagawa-ku
Ichikawa-shi
Matsudo
Yashio
Nerima
Soka-shi
Funabashi
Toda-shi
Sato
QWeather APP
Visualize Your Weather
Weather API/SDK
Need weather data service?
NEED WEATHER DATA ?
Get APP
Forecast
Air Quality
Severe Weather
Satellite+Radar
Traffic Weather
Visualization
Weather Data
Wea

Okay, so this is how we use the Tavily API to search the web. Now let's implement this logic within a function that we can then use as a tool (i.e. an action) for our ReAct agent.

In [7]:
def search(query: str):
    """Use this tool to search the web for information."""
    # first we need to search the web for the query
    res = requests.post(
        f"{tavily_url}/search",
        json={
            "api_key": TAVILY_API_KEY,
            "query": query
        },
    )
    # now get all the URLs from the search results
    urls = [x["url"] for x in res.json()["results"]]
    # now extract the information from the URLs
    res = requests.post(
        f"{tavily_url}/extract",
        json={
            "api_key": TAVILY_API_KEY,
            "urls": urls
        },
    )
    # we return just the top result as otherwise we overload our LLM
    return res.json()["results"][0]["raw_content"]
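
Note that `search` indexes `["results"][0]` directly, so it would raise an `IndexError` if no URL could be extracted. As a rough sketch of a more defensive final step, the hypothetical helper below (`first_raw_content` is not part of the Tavily API) picks the first non-empty `raw_content` and falls back to a short message. It relies only on the `results` and `raw_content` keys visible in the output above.

```python
def first_raw_content(extract_payload: dict) -> str:
    """Return the first non-empty `raw_content` from an /extract-style payload.

    Hypothetical helper: it only relies on the `results` and `raw_content`
    keys seen in the response shown earlier.
    """
    for result in extract_payload.get("results", []):
        content = (result.get("raw_content") or "").strip()
        if content:
            return content
    # nothing usable came back, so give the LLM something it can reason about
    return "No content could be extracted for this query."


# stand-in payloads shaped like the response shown earlier (not real API output)
empty = first_raw_content({"results": []})
found = first_raw_content(
    {"results": [{"url": "https://example.com", "raw_content": " Sunny, 12° "}]}
)
print(empty)
print(found)
```

Inside `search`, the last line could then become `return first_raw_content(res.json())`.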

Let's test our function:

In [8]:
print(search(query="What is the weather in Tokyo?"))

Yahoo Weather
My Locations
Around the World
Tokyo
Japan
Mostly Sunny
Forecast
5 PM
6 PM
7 PM
8 PM
9 PM
10 PM
11 PM
12 AM
1 AM
2 AM
3 AM
4 AM
5 AM
6 AM
7 AM
8 AM
9 AM
10 AM
11 AM
12 PM
1 PM
2 PM
3 PM
4 PM
Clear with a high of 42 °F (5.6 °C) and a 49% chance of precipitation. Winds NW at 24 mph (38.6 kph).
Night - Clear with a 28% chance of precipitation. Winds variable at 7 to 25 mph (11.3 to 40.2 kph). The overnight low will be 34 °F (1.1 °C).
Sunny today with a high of 54 °F (12.2 °C) and a low of 32 °F (0 °C).
Mostly cloudy today with a high of 56 °F (13.3 °C) and a low of 40 °F (4.4 °C).
Rain today with a high of 52 °F (11.1 °C) and a low of 40 °F (4.4 °C). There is a 66% chance of precipitation.
Rain today with a high of 47 °F (8.3 °C) and a low of 41 °F (5 °C). There is a 77% chance of precipitation.
Showers today with a high of 48 °F (8.9 °C) and a low of 42 °F (5.6 °C). There is a 75% chance of precipitation.
Mostly cloudy today with a high of 54 °F (12.2 °C) and a low of 39 °F 

Okay, that is our `search` tool. We will also define a tool that will be triggered when our LLM would like to provide its final `answer` to the user.

In [9]:
def answer(answer: str):
    """Use this tool to provide your final answer to the user."""
    return answer

Now we generate an additional section for our `system_prompt` that explains which tools are available to the LLM.

In [10]:
import inspect

# we get the various parameters/description from each tool function
tools = [search, answer]
tool_descriptions = [
    {
        "name": tool.__name__,
        "description": str(inspect.getdoc(tool)),
        "args": {
            k: str(v).split(": ")[1] for k, v in inspect.signature(tool).parameters.items()
        }
    }
    for tool in tools
]
tool_descriptions

[{'name': 'search',
  'description': 'Use this tool to search the web for information.',
  'args': {'query': 'str'}},
 {'name': 'answer',
  'description': 'Use this tool to provide your final answer to the user.',
  'args': {'answer': 'str'}}]
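
The `str(v).split(": ")[1]` trick works for our two tools because every parameter is annotated and has no default value. For a parameter without a type hint it would raise an `IndexError`, and for one with a default the default would leak into the type string. A possible alternative, sketched here under the assumption that we only want the type's name, reads the annotation from `inspect.Parameter` directly. The names `describe_tool` and `lookup` are made up for this illustration.

```python
import inspect
from typing import Callable


def describe_tool(tool: Callable) -> dict:
    """Build a tool description from annotations (hypothetical alternative)."""
    args = {}
    for name, param in inspect.signature(tool).parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            args[name] = "any"  # no type hint was given
        else:
            # classes such as `str` have a __name__; fall back to str() otherwise
            args[name] = getattr(annotation, "__name__", str(annotation))
    return {
        "name": tool.__name__,
        "description": inspect.getdoc(tool) or "",
        "args": args,
    }


def lookup(term: str, limit=5):
    """Toy tool used only for this illustration."""
    return term[:limit]


description = describe_tool(lookup)
print(description)
```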

Now let's parse these into text instructions that can be added to our `system_prompt`.

In [11]:
tool_instructions = (
    "You have access to the following tools ONLY, no other tools exist:\n\n"
    + "\n".join([str(x) for x in tool_descriptions])
)
print(tool_instructions)

You have access to the following tools ONLY, no other tools exist:

{'name': 'search', 'description': 'Use this tool to search the web for information.', 'args': {'query': 'str'}}
{'name': 'answer', 'description': 'Use this tool to provide your final answer to the user.', 'args': {'answer': 'str'}}


Now let's try calling our LLM again with these additional instructions.

In [12]:
res = ollama.chat(
    model="llama3.2:3b-instruct-fp16",
    messages=[
        {"role": "system", "content": f"{system_prompt}\n\n{tool_instructions}"},
        {"role": "user", "content": "What is Ollama in the context of AI?"},
    ],
    format="json",
    options={"temperature": 0.0}
)

step = res["message"]["content"]
print(step)

{
  "thought": "I need to find out what Ollama refers to in the context of AI",
  "action": {
    "tool": "search",
    "args": {"query": "Ollama AI"}
  }
}


Perfect! Our LLM has correctly generated the query we need. We can now parse this and pass it into the `search` tool as specified by our LLM.

In [13]:
import json

tool_choice = json.loads(res["message"]["content"])["action"]["tool"]
args = json.loads(res["message"]["content"])["action"]["args"]

# we use a dictionary to map the tool name to the tool function
tool_selector = {x.__name__: x for x in tools}

# now we select the tool and call it with the arguments
observation = tool_selector[tool_choice](**args)
print(observation)

Run Language Models Locally with Ollama: A Comprehensive Guide


Follow

Follow


Run Language Models Locally with Ollama: A Comprehensive Guide

Spheron Network
·Nov 3, 2024·7 min read
Table of contents

Integration with LangChain
Building a Simple Chatbot
Using AnythingLLM with Ollama
Best tools available, but unheard
1. Haystack by Deepset
2. LlamaIndex (formerly GPT Index)
3. Chroma
4. Hugging Face Transformers
5. Pinecone
6. OpenAI API
7. Rasa
8. Cohere
9. Vercel AI SDK


Conclusion

Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine. With Ollama, you can easily download, install, and interact with LLMs without the usual complexities.
To get started, you can download Ollama from here. Once installed, open a terminal and type:
ollama run phi3
OR
ollama pull phi3
ollama run phi3
This will download the required layers of the model "phi3". After the model is loaded, Ollama enters a REPL (Read-Eval-

Nice! Within the ReAct framework we would then pass this information back to our LLM via a new `observation` variable. Let's try.

In [18]:
iteration = json.loads(step)
# we limit the observation to 3000 characters to avoid overwhelming the LLM
iteration["observation"] = observation[:3000] + "..."
iteration_str = json.dumps(iteration, indent=2)
print(iteration_str)

{
  "thought": "I need to find out what Ollama refers to in the context of AI",
  "action": {
    "tool": "search",
    "args": {
      "query": "Ollama AI"
    }
  },
  "observation": "Run Language Models Locally with Ollama: A Comprehensive Guide\n\n\nFollow\n\nFollow\n\n\nRun Language Models Locally with Ollama: A Comprehensive Guide\n\nSpheron Network\n\u00b7Nov 3, 2024\u00b77 min read\nTable of contents\n\nIntegration with LangChain\nBuilding a Simple Chatbot\nUsing AnythingLLM with Ollama\nBest tools available, but unheard\n1. Haystack by Deepset\n2. LlamaIndex (formerly GPT Index)\n3. Chroma\n4. Hugging Face Transformers\n5. Pinecone\n6. OpenAI API\n7. Rasa\n8. Cohere\n9. Vercel AI SDK\n\n\nConclusion\n\nOllama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine. With Ollama, you can easily download, install, and interact with LLMs without the usual complexities.\nTo get started, you can download Oll
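
The cell above appends `"..."` whether or not anything was cut. A small helper, sketched below with the hypothetical name `truncate`, would only mark a truncation when one actually happened. The 3000-character default simply mirrors the limit used above.

```python
def truncate(text: str, max_chars: int = 3000) -> str:
    """Cut `text` to `max_chars` characters, marking the cut only if one happened."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "..."


short = truncate("short observation")
long = truncate("x" * 5000)
print(short)
print(len(long))
```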

Now we feed this back into our chat.

In [19]:
res = ollama.chat(
    model="llama3.2:3b-instruct-fp16",
    messages=[
        {
            "role": "system",
            "content": f"{system_prompt}\n\n{tool_instructions}"
        },
        {"role": "user", "content": "What is Ollama in the context of AI?"},
        {"role": "assistant", "content": f"Step 1:\n{iteration_str}\n\nWhat do I do next to answer the user's question..."},
    ],
    format="json",
    options={"temperature": 0.0}
)

step2 = res["message"]["content"]
print(step2)

{ "thought": "I need to provide an answer based on what I found", "action": {"tool": "answer", "args": {"answer": "Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine."}} }


There we go! Our final answer from the LLM is:

In [20]:
step2_json = json.loads(step2)
print(step2_json["action"]["args"]["answer"])

Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine.


Perfect, now let's take everything we've done so far and use it to construct our ReAct agent.

In [22]:
from typing import Callable


class ReActAgent:
    def __init__(self, tools: list[Callable]):
        self.messages = []
        self.tools = {x.__name__: x for x in tools}
        tool_instructions = self._format_tool_instructions(tools=tools)
        self.system_prompt = system_prompt
        # add system prompt and tool instructions to our messages
        self.messages.append({
            "role": "system",
            "content": f"{self.system_prompt}\n\n{tool_instructions}"
        })

    def _format_tool_instructions(self, tools: list[Callable]):
        # get the various parameters/description from each tool function
        tool_descriptions = [
            {
                "name": tool.__name__,
                "description": str(inspect.getdoc(tool)),
                "args": {
                    k: str(v).split(": ")[1] for k, v in inspect.signature(tool).parameters.items()
                }
            } for tool in tools
        ]
        # parse these into text instructions that can be added to our system prompt
        tool_instructions = (
            "You have access to the following tools ONLY, no other tools exist:\n\n"
            + "\n".join([str(x) for x in tool_descriptions])
        )
        return tool_instructions

    def __call__(self, prompt: str, max_steps: int = 3):
        self.messages.append({"role": "user", "content": prompt})
        step_count = 1
        steps = []
        while step_count < max_steps:
            # get the next step
            step_dict = self._call_llm(
                messages=self.messages+self._format_scratchpad(steps)
            )
            # get the tool choice and arguments
            tool_choice = step_dict["action"]["tool"]
            args = step_dict["action"]["args"]
            self._print_react(step_count=step_count, step_dict=step_dict)
            if tool_choice == "answer":
                # we've reached the final step
                self.messages.append({"role": "assistant", "content": json.dumps(step_dict)})
                return step_dict["action"]["args"]["answer"]
            else:
                # otherwise we call the chosen tool
                observation = self.tools[tool_choice](**args).strip()
                print(f"Observation {step_count}: {observation[:200]}... ({len(observation)} chars)")
                # if the observation is very long we truncate it
                if len(observation) > 3000:
                    observation = observation[:3000] + "..."
            # add the step to our scratchpad
            steps.append({
                "thought": step_dict["thought"],
                "action": step_dict["action"],
                "observation": observation
            })
            step_count += 1
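        # if we get here we've hit the max steps so we force the answer tool
        # to do this we modify the system prompt to only show the answer tool
        print(f"Exceeded max_steps={max_steps}, forcing early answer.")
        # copy the list and *replace* the system message rather than mutating it,
        # otherwise self.messages would permanently lose the other tools
        messages = self.messages.copy()
        tool_instructions = self._format_tool_instructions(tools=[self.tools["answer"]])
        messages[0] = {
            "role": "system",
            "content": f"{self.system_prompt}\n\n{tool_instructions}"
        }
        # now we call the LLM with the modified system prompt, keeping the
        # scratchpad so the forced answer can use the observations gathered so far
        step_dict = self._call_llm(messages=messages + self._format_scratchpad(steps))
        return step_dict["action"]["args"]["answer"]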

    def _call_llm(self, messages: list[dict]) -> dict:
        res = ollama.chat(
            model="llama3.2:3b-instruct-fp16",
            messages=messages,
            format="json",
            options={"temperature": 0.0},
        )
        step_dict = json.loads(res["message"]["content"])
        return step_dict
        
    def _format_scratchpad(self, steps: list[dict]) -> list[dict]:
        if not steps:
            # no steps so we just return an empty list
            return []
        steps_str = ""
        for i, step in enumerate(steps):
            steps_str += f"Step {i+1}:\n{json.dumps(step, indent=2)}\n\n"
        steps_str += "What do I do next to answer the user's question..."
        return [{"role": "assistant", "content": steps_str}]

    def _print_react(self, step_count: int, step_dict: dict) -> None:
        """Prints the Reasoning (thought) and Action step"""
        react = "\n".join([
            f"Thought {step_count}: {step_dict['thought']}",
            f"Action {step_count}: {step_dict['action']}",
        ])
        print(react)

# initialize our agent with the search and answer tools
agent = ReActAgent(tools=[search, answer])

In [23]:
agent("What is Ollama in the context of AI?")

Thought 1: I need to find out what Ollama refers to in the context of AI
Action 1: {'tool': 'search', 'args': {'query': 'Ollama AI'}}
Observation 1: Run Language Models Locally with Ollama: A Comprehensive Guide


Follow

Follow


Run Language Models Locally with Ollama: A Comprehensive Guide

Spheron Network
·Nov 3, 2024·7 min read
Table of conte... (10390 chars)
Thought 2: I need to provide an answer based on what I found
Action 2: {'tool': 'answer', 'args': {'answer': 'Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine.'}}


'Ollama is an open-source platform that simplifies the process of setting up and running large language models (LLMs) on your local machine.'

To begin a new conversation, we must reinitialize our agent:

In [25]:
agent = ReActAgent(tools=[search, answer])

agent("What is Chain-of-Thought prompting in AI?")

Thought 1: Chain-of-thought prompting refers to a technique used in AI where the model generates a series of intermediate steps or thoughts that lead to the final answer. This approach allows the model to provide more transparency and explainability into its decision-making process.
Action 1: {'tool': 'search', 'args': {'query': 'Chain-of-thought prompting in AI'}}
Observation 1: What Is Chain-of-Thought Prompting and How Can You Use It?
Maxwell Timothy
10 min read
Chain-of-Thought Prompting.
What is it? How does it matter in AI, especially within the expanding field of prompt... (12562 chars)
Thought 2: I need to provide a clear explanation of Chain-of-Thought prompting and its benefits
Action 2: {'tool': 'answer', 'args': {'answer': 'Chain-of-thought prompting is a technique used in AI where the model generates a series of intermediate steps or thoughts that lead to the final answer, allowing for more transparency and explainability into its decision-making process.'}}


'Chain-of-thought prompting is a technique used in AI where the model generates a series of intermediate steps or thoughts that lead to the final answer, allowing for more transparency and explainability into its decision-making process.'

We can try another query, this time `"What are AI agents?"`:

In [27]:
agent = ReActAgent(tools=[search, answer])

agent("What are AI agents?")

Thought 1: I need to define what an AI agent is
Action 1: {'tool': 'search', 'args': {'query': 'definition of AI agent'}}
Observation 1: What Are AI Agents?
AI agents are poised to revolutionize the way we live and work, automating tasks normally completed by humans.
AI agents are artificial intelligence systems that can perform a wide... (12564 chars)
Thought 2: I need to summarize what an AI agent is
Action 2: {'tool': 'answer', 'args': {'answer': 'An AI agent is a type of artificial intelligence system that can perform complex tasks independently, without the need for fixed rules or constant human intervention.'}}


'An AI agent is a type of artificial intelligence system that can perform complex tasks independently, without the need for fixed rules or constant human intervention.'

We can also ask follow-up questions:

In [28]:
agent("and what is a popular type?")

Thought 1: I need to identify a well-known type of AI agent
Action 1: {'tool': 'search', 'args': {'query': 'popular types of AI agents'}}
Observation 1: Sign up
Sign in
Sign up
Sign in
Guide of AI Agent Types with examples
Thomas Latterner
Follow
--
Listen
Share
From a home alarm, to a fleet of robots in a warehouse, to your smartphone’s assistant, AI... (6343 chars)
Thought 2: I need to provide more information about a popular type of AI agent
Action 2: {'tool': 'answer', 'args': {'answer': 'A popular type of AI agent is a Learning Agent, which enhances its performance over time through experience and learning from data.'}}


'A popular type of AI agent is a Learning Agent, which enhances its performance over time through experience and learning from data.'

Although our query doesn't mention AI agents (we only ask `"and what is a popular type?"`), the LLM understands that this is a follow-up to our previous message and reformulates the query to include that context, rewriting our vague question as the more explicit `"popular types of AI agents"`.
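
This works because `ReActAgent` keeps each user prompt and each final answer in `self.messages`, so the next call sees the earlier turn. The sketch below rebuilds that history by hand to show roughly what the LLM receives on the follow-up. The message contents are abbreviated stand-ins for illustration, not the exact strings from the run above.

```python
import json

# stand-in history after the first turn (contents abbreviated for illustration)
messages = [
    {"role": "system", "content": "<system prompt + tool instructions>"},
    {"role": "user", "content": "What are AI agents?"},
    {
        "role": "assistant",
        "content": json.dumps({
            "thought": "I need to summarize what an AI agent is",
            "action": {"tool": "answer", "args": {"answer": "An AI agent is ..."}},
        }),
    },
]

# the follow-up is appended to the same list, so the LLM sees the earlier turn
messages.append({"role": "user", "content": "and what is a popular type?"})

for message in messages:
    print(f"{message['role']:>9}: {message['content'][:60]}")
```

Note that only the final `answer` step is stored between turns. The intermediate thoughts and observations live in the per-call scratchpad and are discarded once the call returns.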

_It's worth noting that the agent types mentioned in the retrieved article originate from agents in the context of **R**einforcement **L**earning (RL). So terminology such as **Learning Agents** is not typical when discussing LLM-based agents \[[source](https://www.javatpoint.com/types-of-ai-agents)\]._

Let's try one more:

In [34]:
agent = ReActAgent(tools=[search, answer])

agent("Give me a deep dive on RAG")

Thought 1: RAG stands for Relevance, Accuracy, and Generalizability. It is a framework used in natural language processing (NLP) to evaluate the quality of text-based models.
Action 1: {'tool': 'answer', 'args': {'answer': 'RAG is a framework used in NLP to evaluate the quality of text-based models.'}}


'RAG is a framework used in NLP to evaluate the quality of text-based models.'

Here we can see that the agent hallucinated. It began by *incorrectly* stating that RAG stands for **R**elevance, **A**ccuracy, and **G**eneralizability. Because of the LLM's overconfidence in this initial answer it does not defer to the `search` tool, where it would likely find that RAG more commonly means **R**etrieval **A**ugmented **G**eneration.

There are many ways we can make our agents more resilient to producing bad outputs. First and foremost is to choose a more capable LLM. In our example we are using a 3B parameter LLM which is (in the world of LLMs) a *tiny* model. Tiny models can run on smaller hardware and are faster, but are less capable and prone to hallucination, not following instructions, or losing track of the original objective over multiple reasoning steps.

There are always other options too. We could try to improve our prompting, add more deterministic checks for erroneous outputs and run retries, or try to improve the quality of the data being fed into our LLM via our tools. All of these options are open to us while developing with LLMs, and we'll discuss these and others in future chapters.
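
As one rough illustration of a deterministic check with retries, the sketch below validates each step and asks again if the model names an unknown tool (like the `date` tool we saw earlier) or returns malformed output. The function `call_with_retries` and the stubbed LLM are assumptions made for this example; they are not part of the agent defined above.

```python
import json
from typing import Callable


def call_with_retries(
    call_llm: Callable[[], str],
    allowed_tools: set[str],
    max_retries: int = 2,
) -> dict:
    """Call `call_llm` until it returns a well-formed step naming a known tool."""
    last_error = None
    for _ in range(max_retries + 1):
        raw = call_llm()
        try:
            step = json.loads(raw)
            tool = step["action"]["tool"]
            args = step["action"]["args"]
            if tool not in allowed_tools:
                raise ValueError(f"unknown tool: {tool!r}")
            if not isinstance(args, dict):
                raise ValueError("`args` must be a JSON object")
            return step
        except (json.JSONDecodeError, KeyError, TypeError, ValueError) as err:
            last_error = err  # remember why this attempt was rejected
    raise RuntimeError(f"no valid step after {max_retries + 1} attempts: {last_error}")


# stub LLM for illustration: the first reply names a tool that does not exist
replies = iter([
    '{"thought": "t", "action": {"tool": "date", "args": {}}}',
    '{"thought": "t", "action": {"tool": "search", "args": {"query": "RAG"}}}',
])
step = call_with_retries(lambda: next(replies), allowed_tools={"search", "answer"})
print(step["action"])
```

With `temperature` set to `0.0`, repeating an identical request would likely produce the same output, so in practice a retry would probably need to change something, for example by appending the validation error to the messages as feedback.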

---