
# Agents the Hard Way - Complete Solution

In this exercise, we will build a simple AI agent from scratch. 

## Set up the environment

Like most Python code, we start by importing the necessary modules.
- The `openai` module is maintained by OpenAI and is used for talking to models running remotely.
- The `caching` module helps us cache LLM responses and conserve tokens.

The caching module is custom code that makes use of the `cachier` Python library. This notebook does not import it: the cell below defines `call_llm_cached` inline as a simple wrapper without caching.

In [34]:
import json
import os
from openai import OpenAI

def call_llm_cached(model_client, model_name, message_history, tool_list):
    """Simple wrapper for OpenAI API calls without caching."""
    kwargs = {
        "model": model_name,
        "messages": message_history,
    }
    
    if tool_list:
        kwargs["tools"] = tool_list
    
    response = model_client.chat.completions.create(**kwargs)
    message = response.choices[0].message
    
    result = {"role": "assistant", "content": message.content}
    
    if hasattr(message, 'tool_calls') and message.tool_calls:
        result["tool_calls"] = []
        for tool_call in message.tool_calls:
            result["tool_calls"].append({
                "id": tool_call.id,
                "function": {
                    "name": tool_call.function.name,
                    "arguments": tool_call.function.arguments
                },
                "type": "function"
            })
        result["content"] = None
    
    return result

## Load the configuration

Now we'll load our configuration as constants:
- `API_KEY` holds our credentials (replace the placeholder in the cell below with your real key)
- `MODEL_URL` points to the server hosting your model
- `MODEL_NAME` is the model we are going to use

In [35]:
API_KEY = " "  # Replace with your real key within ""
MODEL_URL = "https://integrate.api.nvidia.com/v1"
MODEL_NAME = "meta/llama-3.3-70b-instruct"
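
If you would rather not paste the key into the notebook, one option is to read it from an environment variable. The sketch below assumes the key has been exported under the name `NVIDIA_API_KEY`; both that name and the `load_api_key` helper are illustrative assumptions, not something the workshop defines.

```python
import os

def load_api_key(env_var: str = "NVIDIA_API_KEY") -> str:
    """Return the API key stored in `env_var` (hypothetical name), or raise if missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With the variable set, `API_KEY = load_api_key()` would replace the hardcoded placeholder.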

## Part 1 - The Model
The first of four critical parts for an agent is the AI model.

We will talk to the AI models on build.nvidia.com using the OpenAI API.
This API is the *language* that most model providers use.
This means we can use the `OpenAI` class to connect to most model providers. Neat!

Using the `MODEL_URL` and `API_KEY` defined above, create a new model client named `client`.

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
client = OpenAI(base_url=MODEL_URL, api_key=API_KEY)
```
</details>

In [36]:
# TODO

## Why these values?

This `MODEL_URL` points to NVIDIA's hosted Model API Catalog.

Because we are starting with NVIDIA's hosted service, we have a lot of models to choose from. Picking where to start can be difficult.

- Start with a newer open source model from a team you recognize.
- Start with a moderately sized model (~70b parameters). You'll work on optimizing to a smaller model later.
- If you need features like function calling, make sure the model supports it! (More on that later.)

## Part 2 - Tools

Every agent has access to some tools.
Tools are how the model is able to interact with the world.

Tools are simply code that is executed at the LLM's request.
So let's write our first tool!

Create a function, called `add`, that adds two integers, called `a` and `b`.

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
def add(a, b):
    return a + b
```
</details>

In [37]:
# TODO: Write your add function here!

## Describe the tools

Before moving on, we need to create a description of the tools that the model can understand.
Think of this as the LLM's menu of possible helpers.

In [38]:
tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer", "description": "First integer"},
                    "b": {"type": "integer", "description": "Second integer"},
                },
                "required": ["a", "b"],
            },
        },
    }
]
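
The description and the function are maintained separately, so they can drift apart. As a rough sanity check, the sketch below compares the parameter names in a tool description against the function's real signature. The helper `schema_matches_function` is a made-up name for illustration, and it only compares names, not types.

```python
import inspect

def add(a, b):
    return a + b

tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Add two integers.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "integer", "description": "First integer"},
                    "b": {"type": "integer", "description": "Second integer"},
                },
                "required": ["a", "b"],
            },
        },
    }
]

def schema_matches_function(tool_spec, func):
    """Hypothetical helper: True when the schema's parameter names match func's."""
    params = tool_spec["function"]["parameters"]
    declared = set(params["properties"])
    actual = set(inspect.signature(func).parameters)
    # Every required name must also be declared under "properties".
    required_ok = set(params.get("required", [])) <= declared
    return declared == actual and required_ok

print(schema_matches_function(tools[0], add))  # True
```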

## Part 3 - Memory

The topic of memory is complex, and we will only scratch the surface.
There are two types of memory, short term and long term.
For now, let's focus on short term memory.

Short term memory spans a single conversation, from its first message to its last.
Put simply, short term memory is a log of the conversation.

For this, we will use a humble list.
Every item in our list will be a message in the conversation, stored as a dictionary.
The messages can come from the user, the assistant, or from tools.

Create an initial list called `memory`.
Initialize it with this message from the user: "What is 3 plus 12?"

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
memory = [
  {"role": "user", "content": "What is 3 plus 12?"}
]
```
</details>

In [39]:
# TODO

## Run the agent

We now have three of the four pieces required of an agent. The last missing piece is the routing. For the time being, we will forgo this piece and manually route this message.

The agent starts by giving the memory and tool list to the model. The model will reply by either requesting a tool or answering the question.

Based on the model's response, we will decide how to proceed.

Call the model using the `call_llm_cached` function. The function takes four arguments:

- `model_client`: the OpenAI client
- `model_name`: the name of the model to use (check the constants from before)
- `message_history`: your short term memory
- `tool_list`: the menu of tools the model can access

Use this function to call the LLM. Save the result by appending it to the end of `memory`.

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
llm_response = call_llm_cached(client, MODEL_NAME, memory, tools)
memory.append(llm_response)

print(llm_response) 
```
</details>

In [40]:
#TODO

{'role': 'assistant', 'content': None, 'tool_calls': [{'id': 'chatcmpl-tool-f485374cbf09466a9ddef47ede3cabb0', 'function': {'name': 'add', 'arguments': '{"a": 3, "b": 12}'}, 'type': 'function'}]}


## Part 4 - Routing

Looks like the model chose to request a tool instead of answering. We can tell because `content` is `None` and a function is listed under `tool_calls`.
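
That check can be written as a one-line test on the message dictionary. This is a minimal sketch: `wants_tool` is a hypothetical helper name, and the `call-1` id is a made-up stand-in for the id the server would generate.

```python
def wants_tool(message: dict) -> bool:
    """True when the assistant message requests at least one tool call."""
    return bool(message.get("tool_calls"))

# Two example messages in the same shape that call_llm_cached returns.
tool_request = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call-1",  # made-up id for illustration
            "type": "function",
            "function": {"name": "add", "arguments": '{"a": 3, "b": 12}'},
        }
    ],
}
final_answer = {"role": "assistant", "content": "The answer is 15."}

print(wants_tool(tool_request))  # True
print(wants_tool(final_answer))  # False
```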

> **Tools vs Function Calling?**  
> These terms are often used interchangeably. Technically, Function Calling is a feature of a model. This feature allows the model to request that the developer run the Tools.  
> But most of the time, these terms are simply referring to an agent's ability to run functions.

We can see that the model has requested that we run the `add` function with the arguments 3 and 12.

Write some code to extract the requested function's name, arguments, and id from `memory[-1]`.  
Store those values in variables called `tool_name`, `tool_args`, and `tool_id` respectively.

💡 **TIP:** The value of `arguments` is a string. Use the `json` library to read it.

In [43]:
#TODO

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
tool_call = memory[-1]["tool_calls"][0]
tool_name = tool_call["function"]["name"]
tool_args = json.loads(tool_call["function"]["arguments"])
tool_id = tool_call["id"]
```
</details>
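
Because `arguments` is text generated by the model, it is not guaranteed to be valid JSON. A slightly more defensive variant of the parsing step might look like the sketch below; `parse_tool_args` is a hypothetical helper, not part of the workshop code.

```python
import json

def parse_tool_args(raw_arguments: str) -> dict:
    """Decode the model's argument string, raising ValueError if it is unusable."""
    try:
        parsed = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Tool arguments are not valid JSON: {raw_arguments!r}") from exc
    if not isinstance(parsed, dict):
        # Keyword arguments need a JSON object, not a list or a bare value.
        raise ValueError("Tool arguments must decode to a JSON object.")
    return parsed

print(parse_tool_args('{"a": 3, "b": 12}'))  # {'a': 3, 'b': 12}
```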

If you haven't noticed by now, even though the feature is called tool calling...  
The model doesn't actually call the tool!

So let's write the code to run the tools as requested.

Check if the tool name is equal to `add`. If it is, then run the add function with the requested arguments.

Save the output from the tool call to a variable called `tool_out`.

In [44]:
#TODO

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
if tool_name == "add":
    tool_out = add(**tool_args)
```
</details>
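
An `if` statement works for one tool, but it grows by one branch for every tool you add. One common alternative is a dictionary that maps tool names to functions. The sketch below assumes that design; `TOOL_REGISTRY` and `run_tool` are illustrative names, and returning an error string for unknown tools is just one possible choice.

```python
def add(a, b):
    return a + b

# Map each tool name the model may request to the function that implements it.
TOOL_REGISTRY = {"add": add}

def run_tool(tool_name, tool_args):
    """Look up the requested tool and run it with the model's arguments."""
    if tool_name not in TOOL_REGISTRY:
        # Returning text (rather than raising) lets the model see what went wrong.
        return f"Error: unknown tool '{tool_name}'"
    return TOOL_REGISTRY[tool_name](**tool_args)

print(run_tool("add", {"a": 3, "b": 12}))       # 15
print(run_tool("multiply", {"a": 3, "b": 12}))  # Error: unknown tool 'multiply'
```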

We just got `tool_out` back from the tool. Now we can update the memory with the tool output.

Next time we call the model, it will see the prompt from the user, its own request for a tool call, and the result of that tool call.

This is another one where the syntax is tricky and picky. This is the standard message format for a tool call result (as defined by OpenAI):

```json
[
  {"role": "user", "content": "What is 3 plus 12?"},
  {"role": "assistant", "tool_calls": ... },
  {"role": "tool", "tool_call_id": "...", "name": "add", "content": "15"}
]
```



In [45]:
#TODO

<details>
<summary>💡 NEED SOME HELP?</summary>

```python
tool_result = {
    "role": "tool",
    "tool_call_id": tool_id,
    "name": tool_name,
    "content": str(tool_out)
}
memory.append(tool_result)
```
</details>

Now, we call the model again and save the response in memory.

💡 **HINT:** It is the exact same code as last time.

In [48]:
llm_response = call_llm_cached(client, MODEL_NAME, memory, tools)
memory.append(llm_response)
print("Here is what the model had to say:\n")
print(llm_response)

Here is what the model had to say:

{'role': 'assistant', 'content': 'The answer is 15.'}


## Complete Conversation History

Let's see the full memory to understand how the agent processed the conversation.

In [47]:
import pprint
pprint.pprint(memory)

[{'content': 'What is 3 plus 12?', 'role': 'user'},
 {'content': None,
  'role': 'assistant',
  'tool_calls': [{'function': {'arguments': '{"a": 3, "b": 12}', 'name': 'add'},
                  'id': 'chatcmpl-tool-f485374cbf09466a9ddef47ede3cabb0',
                  'type': 'function'}]},
 {'content': '15',
  'name': 'add',
  'role': 'tool',
  'tool_call_id': 'chatcmpl-tool-f485374cbf09466a9ddef47ede3cabb0'},
 {'content': 'The answer is 15.', 'role': 'assistant'}]
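
The manual steps above (call the model, run the requested tool, append the result, call the model again) can be folded into a loop. The following is only a sketch under stated assumptions: `run_agent` and `fake_model` are hypothetical names, and `fake_model` is a scripted stand-in that returns messages in the same shape as `call_llm_cached`, so the example runs without a network connection.

```python
import json

def add(a, b):
    return a + b

def run_agent(call_model, memory, registry, max_turns=5):
    """Call the model until it answers without requesting a tool."""
    for _ in range(max_turns):
        reply = call_model(memory)
        memory.append(reply)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            return reply["content"]  # final answer
        for call in tool_calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            output = registry[name](**args)
            memory.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "name": name,
                "content": str(output),
            })
    raise RuntimeError("Agent did not finish within max_turns.")

def fake_model(memory):
    """Scripted stand-in for the LLM: request `add` once, then answer."""
    if memory[-1]["role"] == "tool":
        return {"role": "assistant", "content": f"The answer is {memory[-1]['content']}."}
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call-1",  # made-up id for illustration
            "type": "function",
            "function": {"name": "add", "arguments": '{"a": 3, "b": 12}'},
        }],
    }

memory = [{"role": "user", "content": "What is 3 plus 12?"}]
print(run_agent(fake_model, memory, {"add": add}))  # The answer is 15.
```

To try it against the real model, you could pass `lambda m: call_llm_cached(client, MODEL_NAME, m, tools)` in place of `fake_model`.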


# You've Reached The End!

Congrats on making your very first agent. But you might be wondering, now what? Keep going through this workshop to find out.
