# Demystifying AI Agents with Python Code

Agent = LLM + memory + planning + tools + while loop
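Written as code, the formula above is a loop. The sketch below is a minimal offline illustration under stated assumptions. `fake_llm` is a hypothetical stand-in for a real model, and `TOOLS` is a hypothetical tool registry. The message dictionaries use a simplified shape, not any provider's real format.

```python
from typing import Callable

def fake_llm(messages: list[dict]) -> dict:
    """Hypothetical stand-in for an LLM (planning): call a tool first, answer once a result is in memory."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "tool_call": ("search_web", "masters leader")}
    return {"role": "assistant", "content": f"Based on my search: {tool_results[-1]['content']}"}

# Hypothetical tool registry (tools); a real agent would map names to real functions.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_web": lambda query: f"results for '{query}'",
}

def run_agent(user_message: str, max_steps: int = 5) -> str:
    memory = [{"role": "user", "content": user_message}]  # memory: the growing message list
    steps = 0
    while steps < max_steps:  # the while loop, capped so it cannot run forever
        steps += 1
        reply = fake_llm(memory)
        memory.append(reply)
        if "tool_call" not in reply:
            return reply["content"]  # no tool requested: this is the final answer
        name, arg = reply["tool_call"]
        memory.append({"role": "tool", "content": TOOLS[name](arg)})
    return "Stopped after too many steps"

print(run_agent("Who is currently leading the Masters?"))
```

Each term in the formula maps to one part of the sketch. `fake_llm` does the planning, `memory` is the message list, and `TOOLS` holds the tools. The `while` loop keeps running until the model stops asking for tools.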

## 1. LLM

![ChatGPT.png](./ChatGPT.png)

In [None]:
from openai import OpenAI

client = OpenAI()

MODEL = "gpt-4o"

**Note**

If you want to run this notebook with Google Gemini or Anthropic, the most straightforward approach is to use the OpenAI-compatible endpoints they provide. I have tested it with both; just uncomment the relevant code below.

Documentation: 
- https://ai.google.dev/gemini-api/docs/openai
- https://docs.anthropic.com/en/api/openai-sdk

**Google**

In [None]:
# import os

# client = OpenAI(
#     api_key=os.environ["GEMINI_API_KEY"],
#     base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
# )

# MODEL = "gemini-2.0-flash"

**Anthropic**

In [None]:
# import os

# client = OpenAI(
#     api_key=os.environ["ANTHROPIC_API_KEY"],
#     base_url="https://api.anthropic.com/v1/"
# )

# MODEL = "claude-3-7-sonnet-20250219"

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Tell me about the Python programming language in a few sentences."}
    ]
)
print(response.model_dump_json(indent=4))

In [None]:
llm_text = response.choices[0].message.content
print(llm_text)

### System or "developer" prompt

We can use the system message to give the model instructions. (OpenAI's newer models call it the "developer" message.)

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Talk like a pirate"},
        {"role": "user", "content": "Tell me about the Python programming language in a few sentences."}
    ]
)
print(response.choices[0].message.content)

System instructions typically take precedence over conflicting user instructions.

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "Talk like a pirate"},
        {"role": "user", "content": "Don't talk like a pirate. Tell me about the Python programming language in a few sentences."}
    ]
)
print(response.choices[0].message.content)

## 2. Memory

The main LLM API (Chat Completions) is stateless: each request is independent, and the model sees only the messages sent with that request.
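To make "stateless" concrete, the sketch below treats the model as a function of a single request's messages. `fake_chat` is hypothetical and not part of any SDK. It can only "remember" a name if that name appears in the messages it receives.

```python
import re

def fake_chat(messages: list[dict]) -> str:
    """Hypothetical stateless 'model': its only input is this request's messages."""
    for m in messages:
        match = re.search(r"My name is (\w+)", m["content"])
        if m["role"] == "user" and match:
            return f"Your name is {match.group(1)}."
    return "I don't know your name."

# Two separate requests: nothing carries over between them.
first = fake_chat([{"role": "user", "content": "My name is William"}])
second = fake_chat([{"role": "user", "content": "What is my name?"}])

# Resending the earlier turns is what makes the model "remember".
history = [
    {"role": "user", "content": "My name is William"},
    {"role": "assistant", "content": "Nice to meet you, William!"},
    {"role": "user", "content": "What is my name?"},
]
remembered = fake_chat(history)
print(second, "|", remembered)
```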

In [None]:
def call_llm(message):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "user", "content": message}
        ]
    )
    return response.choices[0].message.content

In [None]:
call_llm("My name is William")

In [None]:
call_llm("What is my name?")

To track state, append each turn to the `messages` list and send the whole list with every request.

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Tell me about the Python programming language in a few sentences."},
        {"role": "assistant", "content": llm_text},
        {"role": "user", "content": "Can you elaborate on its use cases?"}
    ]
)
print(response.choices[0].message.content)

You could wrap this in a stateful class.

In [None]:
class ChatBot:
    def __init__(self):
        self.messages = []

    def chat(self, message):
        self.messages.append({"role": "user", "content": message})
        response = client.chat.completions.create(
            model=MODEL,
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        self.messages.append({"role": "assistant", "content": reply})
        return reply

In [None]:
bot = ChatBot()
bot.chat("My name is William")

In [None]:
bot.chat("What is my name?")

## 3. Planning + Tools

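Let's say you wanted your LLM to call a Python function that you've written.

Why? A model only knows what was in its training data, so on its own it can't answer questions about recent events, like the one below.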

**Note**: Tool vs Function

Earlier terminology was "function", but now people are shifting to "tool". Both are still used somewhat interchangeably.
Strictly speaking, at least in OpenAI's terms, a "function" is one specific kind of "tool".

In [None]:
question = "Who is currently leading the Masters?"

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": question},
    ]
)
print(response.choices[0].message.content)

In [None]:
import json
import os
import requests

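def search_web(query: str) -> str:
    """
    Searches the web with the Serper API and returns the organic results as JSON
    """
    headers = {
        'X-API-KEY': os.environ["SERPER_API_KEY"],
        'Content-Type': 'application/json'
    }
    payload = {
        "q": query
    }
    response = requests.post("https://google.serper.dev/search", json=payload, headers=headers, timeout=30)
    # Fail loudly on HTTP errors (e.g. a bad API key) rather than with a confusing KeyError
    response.raise_for_status()
    return json.dumps(response.json().get("organic", []), indent=4)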

In [None]:
print(search_web("masters 2025"))

How do I get the LLM to use this function?

### Manual Prompting

In [None]:
SYSTEM_PROMPT = """You have access to a search_web function that takes a query parameter. 
If you want to use it return just 'search_web(<query>)'"""

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who is currently leading the Masters?"}
    ],
)
print(response.choices[0].message.content)

In [None]:
from datetime import datetime

iso_string = datetime.now().isoformat()

In [None]:
SYSTEM_PROMPT = f"""You have access to a search_web function that takes a query parameter. 
If you want to use it return just 'search_web("<query>")'

The current datetime is {iso_string}"""

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who is currently leading the Masters?"}
    ]
)
print(response.choices[0].message.content)

In [None]:
import re

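def extract_param(s: str) -> str | None:
    # Look for the 'search_web("<query>")' pattern the system prompt asked for
    match = re.search(r'search_web\("(.+?)"\)', s)
    if match:
        return match.group(1)
    return None  # The model answered directly instead of calling the function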

In [None]:
extract_param(response.choices[0].message.content)

In [None]:
print(search_web('Masters Tournament 2025 current leader'))

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who is currently leading the Masters?"},
        {"role": "assistant", "content": """search_web("current leader of the Masters Tournament 2025")"""},
        {"role": "user", "content": f"Here are the results of search_web: {search_web('current leader of the Masters Tournament 2025')}"},
    ]
)
print(response.choices[0].message.content)

#### Overview: How do we get the LLM to interact with functions?

1. Tell it which functions it can use and what parameters they take.
2. Extract the function calls and parameters from the response.
3. Actually run the function with those parameters.
4. Pass the function's result back to the LLM.
5. Get the LLM's final response.
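The five steps above can be sketched end to end without calling an API. This minimal illustration rests on three assumptions. `scripted_llm` is a hypothetical model that follows the `search_web("<query>")` convention from the manual prompt. `fake_search_web` is a hypothetical stand-in for the real Serper call. `MANUAL_PROMPT` is illustrative only and is never sent anywhere.

```python
import re

# Step 1: tell the model which function exists (illustrative; not sent to a real API here).
MANUAL_PROMPT = 'If you want to use search_web, return just search_web("<query>")'

def scripted_llm(messages: list[dict]) -> str:
    """Hypothetical model: asks for a search first, then answers from the results."""
    last = messages[-1]["content"]
    if last.startswith("Here are the results"):
        return "Answer based on: " + last.split(": ", 1)[1]
    return 'search_web("masters 2025 leader")'

def fake_search_web(query: str) -> str:
    return f"[top result for {query}]"

def answer(question: str) -> str:
    messages = [{"role": "system", "content": MANUAL_PROMPT},
                {"role": "user", "content": question}]
    reply = scripted_llm(messages)
    # Step 2: extract the function call and its parameter from the text.
    match = re.search(r'search_web\("(.+?)"\)', reply)
    if not match:
        return reply  # the model answered directly
    # Step 3: actually run the function.
    result = fake_search_web(match.group(1))
    # Step 4: pass the result back to the model.
    messages += [{"role": "assistant", "content": reply},
                 {"role": "user", "content": f"Here are the results of search_web: {result}"}]
    # Step 5: get the final response.
    return scripted_llm(messages)

print(answer("Who is currently leading the Masters?"))
```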

### Provider APIs

Describing tools this way is cumbersome and error-prone. So providers have trained their models to recognize a specific syntax, and their SDKs provide parameters for specifying tools in a structured format.

In [None]:
WEB_SEARCH_TOOL = {
    "type": "function",
    "function": {
        "name": "search_web",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                }
            },
            "required": [
                "query"
            ],
            "additionalProperties": False
        },
        "strict": True
    }
}

In [None]:
SYSTEM_PROMPT = f"""The current datetime is {datetime.now().isoformat()}"""

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who is currently leading the Masters?"}
    ],
    tools=[WEB_SEARCH_TOOL]
)
print(response.choices[0].message.model_dump_json(indent=4))

In [None]:
from typing import Any

def extract_params(response) -> dict[str, Any]:
    # Parse the JSON arguments of the first tool call (assumes the model made one)
    return json.loads(response.choices[0].message.tool_calls[0].function.arguments)

In [None]:
extract_params(response)

In [None]:
search_results = search_web(**extract_params(response))
print(search_results)

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Who is currently leading the Masters?"},
        # The LLM's previous response
        response.choices[0].message,
        {"role": "tool", "tool_call_id": response.choices[0].message.tool_calls[0].id, "content": search_results},
    ],
    tools=[WEB_SEARCH_TOOL]
)
print(response.choices[0].message.content)

*Important note*: the LLM doesn't have to use the tools. If a tool isn't needed, it can answer directly; deciding whether (and which) tool to call is part of planning.

In [None]:
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Write a haiku about Python programming"},
    ],
    tools=[WEB_SEARCH_TOOL]
)
print(response.choices[0].message.content)

For convenience, we can also generate tool specs automatically from our functions' signatures.

In [None]:
import inspect

print(inspect.signature(search_web))
print(inspect.signature(search_web).parameters)

In [None]:
import inspect

def python_type_to_json_type(py_type):
    mapping = {
        str: "string",
        int: "integer",
        float: "number",
        bool: "boolean"
    }
    # Default to "string" if the type is not in the mapping.
    return mapping.get(py_type, "string")

def function_to_tool_spec(fn):
    sig = inspect.signature(fn)
    properties = {}
    required = []
    
    # Iterate over parameters of the function.
    for param in sig.parameters.values():
        
        # Map the Python annotation to a JSON schema type.
        param_schema = {"type": python_type_to_json_type(param.annotation)}
        
        # If the parameter has a default value, add it to the schema.
        # Otherwise, mark it as required.
        if param.default is not inspect.Parameter.empty:
            param_schema["default"] = param.default
        else:
            required.append(param.name)
        
        properties[param.name] = param_schema

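    spec = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
                "additionalProperties": False
            },
            # OpenAI's strict mode requires every property to be listed in "required",
            # so only enable it when no parameter has a default value
            "strict": len(required) == len(properties)
        }
    }
    return spec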
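`python_type_to_json_type` maps only scalar types and falls back to `"string"` for everything else, so a `list[str]` parameter would be described as a string. One possible extension uses `typing.get_origin` and `typing.get_args` to emit a JSON Schema `array`. This is an assumption, not part of the notebook, and the name `python_type_to_json_schema` is hypothetical.

```python
import typing

SCALARS = {str: "string", int: "integer", float: "number", bool: "boolean"}

def python_type_to_json_schema(py_type) -> dict:
    """Map a Python annotation to a JSON Schema fragment, including list[...] types."""
    origin = typing.get_origin(py_type)
    if py_type is list or origin is list:
        args = typing.get_args(py_type)
        item_type = args[0] if args else str  # a bare `list` defaults to string items
        return {"type": "array", "items": python_type_to_json_schema(item_type)}
    # Same fallback as the original helper: unknown types become "string".
    return {"type": SCALARS.get(py_type, "string")}

print(python_type_to_json_schema(list[int]))
```

To use it, `function_to_tool_spec` would set `param_schema = python_type_to_json_schema(param.annotation)` instead of wrapping the scalar type name.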

In [None]:
function_to_tool_spec(search_web)

In [None]:
function_to_tool_spec(search_web) == WEB_SEARCH_TOOL
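The generated spec has no `description`, so the model has only the function and parameter names to go on. OpenAI's function definitions accept an optional `description` field. Below is a minimal sketch that fills it from the docstring with `inspect.getdoc`. The helper `add_description` and the example tool `lookup_weather` are hypothetical.

```python
import inspect

def add_description(spec: dict, fn) -> dict:
    """Return a copy of `spec` whose function entry includes fn's docstring as "description"."""
    doc = inspect.getdoc(fn)  # docstring with indentation cleaned up, or None
    if not doc:
        return spec
    return {**spec, "function": {**spec["function"], "description": doc}}

def lookup_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Weather for {city}"  # placeholder body for the example

spec = {"type": "function", "function": {"name": "lookup_weather", "parameters": {}}}
described = add_description(spec, lookup_weather)
print(described["function"]["description"])
```

Applying this to `search_web` would make its spec differ from the hand-written `WEB_SEARCH_TOOL`, which has no description.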

In [None]:
def call_llm_with_tool_calling(message, functions):
    # Map the function name to the actual function
    fn_map = {fn.__name__: fn for fn in functions}

    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": message},
    ]
    
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=[function_to_tool_spec(fn) for fn in functions]
    )

    tool_calls = response.choices[0].message.tool_calls

    # while loop!
    while tool_calls:
        messages.append(response.choices[0].message)
        for tool_call in tool_calls:
            # Get the function from our mapping
            fn = fn_map[tool_call.function.name]
    
            # Call the function with the specified args
            args = json.loads(tool_call.function.arguments)
            print(f"Calling {tool_call.function.name} with {args}")
            result = fn(**args)
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=[function_to_tool_spec(fn) for fn in functions]
        )

        tool_calls = response.choices[0].message.tool_calls

    return response.choices[0].message.content

In [None]:
output = call_llm_with_tool_calling("who is currently leading the masters?", [search_web])
print(output)
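In `call_llm_with_tool_calling`, an unknown tool name raises `KeyError`, and any exception inside a tool ends the whole loop. A common pattern, sketched here as an assumption rather than what the notebook does, is to turn such failures into the tool result so the model can see the error and try again. `SimpleNamespace` objects stand in for the SDK's tool-call objects. They mimic only the `id`, `function.name`, and `function.arguments` attributes used above. `add_numbers` is a hypothetical tool.

```python
import json
from types import SimpleNamespace

def run_tool_call(tool_call, fn_map: dict) -> dict:
    """Run one tool call and always return a 'tool' message, even when the call fails."""
    name = tool_call.function.name
    if name not in fn_map:
        content = f"Error: unknown tool {name!r}"
    else:
        try:
            args = json.loads(tool_call.function.arguments)
            content = str(fn_map[name](**args))
        except Exception as exc:  # bad JSON, wrong arguments, or an error inside the tool
            content = f"Error calling {name}: {exc!r}"
    return {"role": "tool", "tool_call_id": tool_call.id, "content": content}

def make_call(call_id: str, name: str, arguments: str):
    """Build a stand-in object shaped like the tool calls used above."""
    return SimpleNamespace(id=call_id, function=SimpleNamespace(name=name, arguments=arguments))

def add_numbers(a: int, b: int) -> str:
    return str(a + b)

fn_map = {"add_numbers": add_numbers}
ok = run_tool_call(make_call("1", "add_numbers", '{"a": 2, "b": 3}'), fn_map)
unknown = run_tool_call(make_call("2", "delete_files", "{}"), fn_map)
bad_args = run_tool_call(make_call("3", "add_numbers", '{"a": 2}'), fn_map)
print(ok["content"], unknown["content"], bad_args["content"], sep="\n")
```

Inside the loop, `messages.append(run_tool_call(tool_call, fn_map))` would replace the lines that look up the function, call it, and append the result.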

## Our First Agent

Combining LLMs, memory, and planning + tools.

We need to modify the previous function slightly so that it accepts the existing message history and returns the new messages it generated, which the caller can then store.

In [None]:
def call_llm_with_tool_calling(message, functions, message_history=None):
    fn_map = {fn.__name__: fn for fn in functions}

    # We can pass in the previous messages. If there are none, we start from scratch.
    if message_history is None:
        message_history = []

    messages = [{"role": "user", "content": message}]
    
    response = client.chat.completions.create(
        model=MODEL,
        messages=message_history + messages,
        tools=[function_to_tool_spec(fn) for fn in functions]
    )

    tool_calls = response.choices[0].message.tool_calls

    while tool_calls:
        messages.append(response.choices[0].message)
        for tool_call in tool_calls:
            # Get the function from our mapping
            fn = fn_map[tool_call.function.name]
    
            # Call the function with the specified args
            args = json.loads(tool_call.function.arguments)
            result = fn(**args)
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})

        response = client.chat.completions.create(
            model=MODEL,
            messages=message_history + messages,
            tools=[function_to_tool_spec(fn) for fn in functions]
        )

        tool_calls = response.choices[0].message.tool_calls

    return response.choices[0].message.content, messages

In [None]:
class Agent:
    def __init__(self, functions, system_prompt=SYSTEM_PROMPT):
        # Memory starts with the system prompt; each chat() call appends the new turns
        self.messages = [{"role": "system", "content": system_prompt}]
        self.functions = functions

    def chat(self, message):
        output, messages = call_llm_with_tool_calling(message, self.functions, message_history=self.messages)
        self.messages += messages
        return output

In [None]:
agent = Agent(functions=[search_web])

In [None]:
agent.chat("Hi! I am interested in AI")

In [None]:
agent.chat("Can you find events in Austin that I might like?")

## Agent Frameworks

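That's a lot of code to write on top of the OpenAI SDK, and many frameworks have emerged to abstract it away. Here is the same idea with CrewAI. Note that importing CrewAI's `Agent` shadows the `Agent` class we defined above.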

In [None]:
from crewai import Agent, Task, Crew
from crewai.tools import tool

In [None]:
agent = Agent(
    role="searcher",
    goal="Get information for the user",
    backstory="",
    tools=[tool(search_web)]    
)

In [None]:
task = Task(
    description="{query}",
    expected_output="Answers for {query}",
    agent=agent,
)

In [None]:
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
)

In [None]:
result = crew.kickoff(inputs={"query": "Who is currently leading the Masters in 2025?"})

In [None]:
result

In [None]:
print(result.raw)

### Multiple tools

In [None]:
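import html2text
import requests

def fetch_url(url: str) -> str:
    "Get the content at a specific URL as plain text"
    response = requests.get(url, timeout=30)
    # Surface HTTP errors (404, 403, ...) instead of converting an error page
    response.raise_for_status()
    # Convert the HTML to readable text, dropping links and images to keep it short
    h = html2text.HTML2Text()
    h.ignore_links = True
    h.ignore_images = True
    return h.handle(response.text)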
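`fetch_url` depends on the third-party `html2text` package. If you'd rather avoid that dependency, there is a rougher stdlib-only alternative: collect text nodes with `html.parser.HTMLParser` and skip the contents of `<script>` and `<style>`. This is a swapped-in technique, not what the notebook uses, and it lacks html2text's Markdown formatting. `TextExtractor` and `html_to_text` are hypothetical names.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style> tags."""

    def __init__(self):
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "\n".join(parser.parts)

print(html_to_text("<html><style>p {color: red}</style><h1>Title</h1><p>Body text</p><script>var x = 1;</script></html>"))
```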

In [None]:
print(fetch_url("https://sports.yahoo.com/golf/live/2025-masters-second-round-live-leaderboard-and-updates-justin-rose-leads-scottie-scheffler-rory-mcilroy-at-augusta-national-110049747.html"))

In [None]:
agent = Agent(
    role="searcher",
    goal="Get information for the user",
    backstory="",
    tools=[tool(search_web), tool(fetch_url)]    
)

In [None]:
task = Task(
    description="{query}",
    expected_output="Answers for {query}",
    agent=agent,
)

In [None]:
crew = Crew(
    agents=[agent],
    tasks=[task],
    verbose=True,
)

In [None]:
result = crew.kickoff(inputs={"query": "Who is speaking at PyTexas 2025?"})

In [None]:
print(result.raw)

### Multi-Agent

In [None]:
promoter_agent = Agent(
    role="python_promoter",
    goal="Promote the use of Python",
    backstory="You love to use Python",
    tools=[tool(search_web), tool(fetch_url)]    
)

critic_agent = Agent(
    role="python_critic",
    goal="Critique the use of Python",
    backstory="You hate Python and don't think it should ever be used",
    tools=[tool(search_web), tool(fetch_url)]    
)

summarizer_agent = Agent(
    role="summarizer",
    goal="Summarize both sides of an argument",
    backstory="",
)

In [None]:
promote_task = Task(
    description="Promote Python",
    expected_output="Answers for why Python is a good choice for the question {query}",
    agent=promoter_agent,
)

critique_task = Task(
    description="Critique Python",
    expected_output="Answers for why Python is a bad choice for the question {query}",
    agent=critic_agent,
)

summarize_task = Task(
    description="Summarize pros and cons",
    expected_output="A balanced answer to the question {query}",
    context=[promote_task, critique_task],
    agent=summarizer_agent,
)

In [None]:
crew = Crew(
    agents=[promoter_agent, critic_agent, summarizer_agent],
    tasks=[promote_task, critique_task, summarize_task],
    verbose=True,
)

In [None]:
result = crew.kickoff(inputs={"query": "Should I use Python to build my web app?"})

In [None]:
print(result)

## Evaluation (A Brief Intro!)

Since tool use means the model picks from a finite set of options (the tools you give it, or no tool at all), one way to evaluate tool use is to treat it as a multi-class classification problem.

In [None]:
def search_healthcare_benefits_documents(query: str) -> str:
    pass

def escalate_to_human() -> str:
    pass

def reschedule_appointment(new_time: str) -> str:
    pass

healthcare_functions = [search_healthcare_benefits_documents, escalate_to_human, reschedule_appointment]

Because your code controls the loop, you can stop as soon as the model picks a tool, without ever running the functions. For evaluation, we only need the name of the tool the model chose.

In [None]:
def call_llm_with_tool_calling(message, functions, message_history=None):
    if message_history is None:
        message_history = []

    messages = [{"role": "user", "content": message}]
    
    response = client.chat.completions.create(
        model=MODEL,
        messages=message_history + messages,
        tools=[function_to_tool_spec(fn) for fn in functions]
    )

    tool_calls = response.choices[0].message.tool_calls

    # The model chose not to call any tool
    if not tool_calls:
        return None

    # Stop here: return the name of the first tool the model picked, without running it
    return tool_calls[0].function.name

In [None]:
call_llm_with_tool_calling("What is my copay for primary care?", functions=healthcare_functions)

In [None]:
evaluation_data = [
    ("What is my copay for primary care?", 'search_healthcare_benefits_documents'),
    ("Talk to a human", 'escalate_to_human'),
    ("Hi!", None),
    ("Change my next appointment to Thursday", "reschedule_appointment"),
    ("Are x rays covered?", 'search_healthcare_benefits_documents'),
]

In [None]:
results = []

for message, expected_fn_name in evaluation_data:
    actual_fn_name = call_llm_with_tool_calling(message, functions=healthcare_functions)
    results.append(expected_fn_name == actual_fn_name)

In [None]:
accuracy = sum(results) / len(results)
print(accuracy)
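A single accuracy number hides which tools get confused with each other. Because this is a classification problem, one option is to keep the (expected, actual) pairs instead of booleans. You can then count them as a simple confusion matrix and compute per-tool recall. The pairs below are made-up placeholders, not results from a model, and the helper names are hypothetical.

```python
from collections import Counter

def confusion_counts(pairs):
    """Count (expected, actual) tool-name pairs; None means 'no tool called'."""
    return Counter(pairs)

def per_label_recall(pairs):
    """For each expected label, the fraction of cases where the model chose that label."""
    totals, hits = Counter(), Counter()
    for expected, actual in pairs:
        totals[expected] += 1
        hits[expected] += expected == actual
    return {label: hits[label] / totals[label] for label in totals}

# Illustrative placeholder pairs, not real model output.
pairs = [
    ("search_healthcare_benefits_documents", "search_healthcare_benefits_documents"),
    ("escalate_to_human", "escalate_to_human"),
    (None, "escalate_to_human"),
    ("reschedule_appointment", "reschedule_appointment"),
    ("search_healthcare_benefits_documents", None),
]
print(confusion_counts(pairs).most_common())
print(per_label_recall(pairs))
```

In the evaluation loop above, that means appending `(expected_fn_name, actual_fn_name)` instead of `expected_fn_name == actual_fn_name`.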

More complicated evaluation topics:
- Nondeterminism
- How to handle the parameters?
- How to handle multi-turn conversations?
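For nondeterminism, one rough approach is to run each case several times and report how often the model picks the expected tool. This is an assumption, not the notebook's method. `classify` stands in for `call_llm_with_tool_calling`. Here it is a hypothetical, seeded random stub called `flaky_classifier`, so the example runs offline.

```python
import random
from typing import Callable, Optional

def pass_rates(cases, classify: Callable[[str], Optional[str]], trials: int = 5) -> dict:
    """Run each (message, expected) case `trials` times; return the pass fraction per message."""
    rates = {}
    for message, expected in cases:
        passes = sum(classify(message) == expected for _ in range(trials))
        rates[message] = passes / trials
    return rates

rng = random.Random(0)  # seeded so the stub behaves the same on every run

def flaky_classifier(message: str) -> Optional[str]:
    """Hypothetical stub: usually escalates when asked for a human, sometimes calls nothing."""
    if "human" in message:
        return "escalate_to_human" if rng.random() < 0.8 else None
    return None

cases = [("Talk to a human", "escalate_to_human"), ("Hi!", None)]
print(pass_rates(cases, flaky_classifier, trials=10))
```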