In [1]:
import dotenv
dotenv.load_dotenv()

True

# Agents: key concepts


### Agent vs Chain

The main difference between Agents and Chains is the ability to decide what to do next. Recall the RAG notebook: for each user input we went through the same predefined list of steps.

An Agent receives a task as input and makes a plan for how to solve it.
Depending on the implementation, it can either come up with the whole sequence of steps at the beginning or decide on the next steps on the fly.
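
The contrast can be sketched without any LLM at all. In the snippet below, `retrieve`, `summarize` and `choose_next_step` are hypothetical stand-ins (a real agent would ask the model to pick the next step); the only point is that the chain hard-codes the order of steps, while the agent decides it at run time.

```python
from typing import Callable, Dict, List


# Hypothetical steps standing in for real retrieval / LLM calls.
def retrieve(text: str) -> str:
    return text + " | retrieved context"


def summarize(text: str) -> str:
    return text + " | summary"


def run_chain(user_input: str) -> str:
    """A chain: the same fixed sequence of steps for every input."""
    return summarize(retrieve(user_input))


def choose_next_step(state: str, done: List[str]) -> str:
    """Stand-in for the model's decision; a real agent would query an LLM here."""
    if "?" in state and "retrieve" not in done:
        return "retrieve"
    if "summarize" not in done:
        return "summarize"
    return "finish"


def run_agent(user_input: str, max_steps: int = 5) -> List[str]:
    """An agent: the next step is chosen on the fly, one step at a time."""
    steps: Dict[str, Callable[[str], str]] = {
        "retrieve": retrieve,
        "summarize": summarize,
    }
    state = user_input
    done: List[str] = []
    for _ in range(max_steps):
        step = choose_next_step(state, done)
        if step == "finish":
            break
        state = steps[step](state)
        done.append(step)
    return done


print(run_agent("What is an agent?"))  # steps taken for a question
print(run_agent("Hello"))              # steps taken for a plain statement
```

The two calls take different paths through the same set of steps, which is exactly what a chain cannot do.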

### Tools

To enable the Agent to solve the given task, we can provide it with a list of available tools and allow it to decide which one to use.

Example tools include:
- web search <- maybe we need to fetch some data to answer the question
- custom Python function <- run some predefined code
- calculator <- LLMs are not reliable at more complex calculations
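
One way to keep such a list of tools is a plain dictionary that maps tool names to Python callables. This is only a sketch under our own assumptions: `TOOLS`, `register`, `multiply`, `word_count` and `describe_tools` are hypothetical names, not part of any library.

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> Python callable.
TOOLS: Dict[str, Callable[..., object]] = {}


def register(func: Callable[..., object]) -> Callable[..., object]:
    """Decorator that adds a function to the registry under its own name."""
    TOOLS[func.__name__] = func
    return func


@register
def multiply(x: float, y: float) -> float:
    """Multiply two numbers."""
    return x * y


@register
def word_count(text: str) -> int:
    """Count whitespace-separated words in a text."""
    return len(text.split())


def describe_tools() -> str:
    """Build a plain-text listing that could be shown to the model."""
    return "\n".join(f"{name}: {func.__doc__}" for name, func in TOOLS.items())


print(describe_tools())
```

The docstrings double as tool descriptions, so the agent has something to base its choice on.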


### Agent taking next steps

Remember that we are working with Language Models, which can only predict the probability of the next token.

To automatically detect whether the Agent wants to use a tool or take some other next step, we need to structure its output so that we can parse it.

To start with, we will use a mechanism built into `OpenAI` models called function calling. It lets us define a set of functions, and the model then responds with either a regular text completion or a function call.

Since `OpenAI` is a closed API, we do not know exactly how it works, but a good assumption is that the base model is further fine-tuned on a dataset with function calls,
where the output is probably JSON so that it is easy to parse.
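
To make the parsing idea concrete independently of any API, here is a minimal sketch. It assumes a made-up convention in which the model answers either with free text or with a JSON object of the form `{"tool": ..., "arguments": {...}}`; this format and the helper `parse_model_output` are our own illustration, not what `OpenAI` uses internally.

```python
import json
from typing import Optional, Tuple


def parse_model_output(text: str) -> Tuple[str, Optional[dict]]:
    """Return ("tool", payload) for a JSON tool request, else ("text", None).

    The {"tool": ..., "arguments": {...}} format is a hypothetical convention.
    """
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON at all: treat it as an ordinary text answer.
        return "text", None
    is_tool_request = (
        isinstance(payload, dict)
        and "tool" in payload
        and isinstance(payload.get("arguments"), dict)
    )
    if is_tool_request:
        return "tool", payload
    return "text", None


print(parse_model_output('{"tool": "calculator", "arguments": {"x": 1, "y": 1}}'))
print(parse_model_output("The answer is 2."))
```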

In [2]:
import openai
import json

def add_two_number(x, y):
    """Function that adds two numbers"""
    return x + y

# available parameter types are the same as in standard JSON Schema
# https://json-schema.org/understanding-json-schema/reference/type
functions = [
        {
            "name": "add_two_number",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {
                        "type": "number",
                        "description": "first number",
                    },
                    "y": {
                        "type": "number",
                        "description": "second number",
                    },
                },
                "required": ["x", "y"],
            },
        }
    ]


completion = openai.ChatCompletion.create(
    model='gpt-3.5-turbo-0613',
    messages=[{'role': 'user', 'content': 'What is 1 + 1?'}],
    functions=functions,
    function_call="auto",  # auto is default
    # function_call={'name': 'add_two_number'}, # this will force the model to use the function
)

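When the model decides on its own to call a function, `finish_reason` is set to `function_call` instead of `stop`. (When a function is forced via `function_call={'name': ...}`, `finish_reason` stays `stop`, as the forced example later in this notebook shows.)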

In [3]:
resp = completion['choices'][0] # type: ignore
print(resp) 

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "add_two_number",
      "arguments": "{\n  \"x\": 1,\n  \"y\": 1\n}"
    }
  },
  "finish_reason": "function_call"
}


In [4]:
message = resp['message']  # type: ignore
finish_reason = resp['finish_reason']
f_name = message['function_call']['name']
# `arguments` arrives as a JSON string, so it has to be decoded first
f_args = json.loads(message['function_call']['arguments'])

print(f"finish_reason: {finish_reason}, f_name: {f_name}, f_args: {f_args}")

# look up the function by name and call it with the decoded arguments
f = globals()[f_name]
f(**f_args)

finish_reason: function_call, f_name: add_two_number, f_args: {'x': 1, 'y': 1}


2
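
Looking the function up in `globals()` is fine for a demo, but it lets the model name any global object. A more defensive variant, sketched below under our own assumptions, uses an explicit allow-list; `AVAILABLE_FUNCTIONS` and `execute_function_call` are hypothetical names, and the `message` dictionary is hand-written to mirror the response shape printed above, so no API call is made.

```python
import json


def add_two_number(x, y):
    """Function that adds two numbers"""
    return x + y


# Explicit allow-list: only these functions may be called by the model.
AVAILABLE_FUNCTIONS = {"add_two_number": add_two_number}


def execute_function_call(message: dict):
    """Run the function requested in an assistant message, if there is one."""
    call = message.get("function_call")
    if call is None:
        return None  # plain text answer, nothing to execute
    name = call["name"]
    if name not in AVAILABLE_FUNCTIONS:
        raise ValueError(f"Model requested an unknown function: {name}")
    arguments = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return AVAILABLE_FUNCTIONS[name](**arguments)


# Hand-written message with the same shape as the API response shown above.
message = {
    "role": "assistant",
    "content": None,
    "function_call": {"name": "add_two_number", "arguments": '{"x": 1, "y": 1}'},
}
print(execute_function_call(message))
```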

To avoid writing the function schema in JSON by hand, we can use a utility function from LangChain to generate it for us. The resulting schema differs slightly from the one we wrote.

In [7]:
from langchain.tools import format_tool_to_openai_function
from langchain.agents import tool

@tool
def add_two_number(x: float, y: float):
    """Function that adds two numbers"""
    return x + y

tools = format_tool_to_openai_function(add_two_number) # type: ignore
tools

{'name': 'add_two_number',
 'description': 'add_two_number(x: float, y: float) - Function that adds two numbers',
 'parameters': {'title': 'add_two_numberSchemaSchema',
  'type': 'object',
  'properties': {'x': {'title': 'X', 'type': 'number'},
   'y': {'title': 'Y', 'type': 'number'}},
  'required': ['x', 'y']}}
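
To get a feeling for how such a schema can be derived from a function signature, here is a rough sketch using only the standard library. It is not LangChain's implementation: `function_to_schema`, `JSON_TYPES` and `subtract_two_numbers` are hypothetical names, and only a handful of simple types are handled.

```python
import inspect
from typing import get_type_hints

# Minimal mapping from Python annotations to JSON Schema type names.
JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def function_to_schema(func) -> dict:
    """Build a function-calling style schema from type hints and the docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string" in this sketch.
        properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def subtract_two_numbers(x: float, y: float = 0.0):
    """Function that subtracts two numbers"""
    return x - y


print(function_to_schema(subtract_two_numbers))
```

Parameters with a default value are left out of `required`, which is one reasonable convention among several.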

In [12]:
completion = openai.ChatCompletion.create(
    model='gpt-3.5-turbo-0613',
    messages=[{'role': 'user', 'content': 'What is 1 + 1?'}],
    functions=[tools],
    function_call="auto",
)

resp = completion['choices'][0] # type: ignore
print(resp)

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "add_two_number",
      "arguments": "{\n  \"x\": 1,\n  \"y\": 1\n}"
    }
  },
  "finish_reason": "function_call"
}


### How does it work?

According to the [OpenAI docs](https://platform.openai.com/docs/guides/gpt/function-calling):

---

The latest models (`gpt-3.5-turbo-0613` and `gpt-4-0613`) have been fine-tuned to both detect when a function should to be called (depending on the input) and to respond with JSON that adheres to the function signature.

note: the model may hallucinate parameters

---

The most important takeaway is that the model does not actually execute the function; it only generates the function name and arguments in JSON format. It is up to us to parse the output and execute the function.

Also, providing the model with a list of functions does not switch it into some magical mode. It is still a language model that generates text token by token,
so it can still hallucinate and generate nonsense.

Let's see what happens when we force it to produce a function call but provide a nonsensical input:

In [13]:
completion = openai.ChatCompletion.create(
    model='gpt-3.5-turbo-0613',
    messages=[{'role': 'user', 'content': 'How to make one cake and then another?'}],
    functions=functions,
    function_call={'name': 'add_two_number'},
)

In [14]:
resp = completion['choices'][0] # type: ignore
print(resp) 

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": null,
    "function_call": {
      "name": "add_two_number",
      "arguments": "{\n  \"x\": 1,\n  \"y\": 1\n}"
    }
  },
  "finish_reason": "stop"
}


In [11]:
# but at least the JSON is valid :)
json.loads(resp['message']['function_call']['arguments'])

{'x': 1, 'y': 1}
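
Valid JSON is not the same as valid arguments, so it can be worth checking the decoded arguments against the schema before calling anything. The sketch below is a deliberately small, hand-rolled check (the helper `validate_arguments` is hypothetical and covers only `required` keys, unknown keys and the `number` type); a full JSON Schema validator would do much more. Note that it cannot catch a call that is well-formed but meaningless, such as the cake example above.

```python
import json

# Same "parameters" schema as the one defined for add_two_number earlier.
PARAMETERS = {
    "type": "object",
    "properties": {
        "x": {"type": "number", "description": "first number"},
        "y": {"type": "number", "description": "second number"},
    },
    "required": ["x", "y"],
}


def validate_arguments(raw_arguments: str, parameters: dict) -> dict:
    """Decode the JSON string and apply a few basic schema checks."""
    arguments = json.loads(raw_arguments)
    if not isinstance(arguments, dict):
        raise ValueError("Arguments must be a JSON object")
    properties = parameters.get("properties", {})
    missing = [k for k in parameters.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    unknown = [k for k in arguments if k not in properties]
    if unknown:
        raise ValueError(f"Unexpected (possibly hallucinated) arguments: {unknown}")
    for name, value in arguments.items():
        if properties[name].get("type") == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"Argument {name!r} must be a number")
    return arguments


print(validate_arguments('{"x": 1, "y": 1}', PARAMETERS))
```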

### Next steps

Now we know how to get structured action output from the model. What's missing is planning and execution.
More complex questions may require multiple actions, and their outputs may be used by the model to decide on the next steps.

Even for simple questions, we need to parse the output and execute the function call.
It would also be good to get a final answer as a full sentence, not just the function's return value.
We will do that in the next notebook.

`OpenAI` function calling makes it easier to build agents because the API takes care of formatting the model prompt with the available functions and gives us a nice JSON output.
If we were to use an open-source model, we would need to implement this functionality on our own.

Also note that open-source models are usually trained on plain text from the internet, and only some of them are fine-tuned on e.g. code or other structured data. If you plan to use an open-source model, one fine-tuned on code is probably the best choice for getting structured output.

In the next notebook you will see how to automate the process of parsing the output and executing the function calls.

For now, let's just think about what the Agent's plan could look like.

```
user: "What is 1 + 1?"

agent: (thinks) "I will use calculator to solve this task"

agent: (uses tool) call add_two_number with arguments 1 and 1

agent: (acts) provide the call result as an answer
```
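
The plan above can be traced in code with a scripted stand-in for the model, so that nothing is sent over the network. Everything here is a sketch under stated assumptions: `fake_model` and `run_plan` are hypothetical, and the result is handed back as a message with role `function`, which is the shape we assume the chat API used in this notebook expects.

```python
import json


def add_two_number(x, y):
    """Function that adds two numbers"""
    return x + y


FUNCTIONS = {"add_two_number": add_two_number}


def fake_model(messages):
    """Scripted stand-in for the chat API; no network call is made."""
    last = messages[-1]
    if last["role"] == "function":
        # The tool result is available, so produce the final sentence.
        return {"role": "assistant", "content": f"The answer is {last['content']}."}
    # Otherwise ask for the calculator tool.
    return {
        "role": "assistant",
        "content": None,
        "function_call": {
            "name": "add_two_number",
            "arguments": json.dumps({"x": 1, "y": 1}),
        },
    }


def run_plan(question, model=fake_model, max_turns=5):
    """Loop: ask the model, execute any requested function, feed the result back."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = model(messages)
        messages.append(reply)
        call = reply.get("function_call")
        if call is None:
            return reply["content"], messages  # final answer reached
        result = FUNCTIONS[call["name"]](**json.loads(call["arguments"]))
        messages.append(
            {"role": "function", "name": call["name"], "content": json.dumps(result)}
        )
    raise RuntimeError("Agent did not finish within max_turns")


answer, history = run_plan("What is 1 + 1?")
print(answer)
print([m["role"] for m in history])
```

Swapping `fake_model` for a real API call would leave the loop itself unchanged, which is the part the next notebook automates.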