# Unit 1 Efficient Multi-Query Agent Flow With Tool Caching

# Advanced MCP Server and Agent Integration in Python: Lesson 1 - Tool Caching

Welcome to the first lesson of this course on **advanced MCP server and agent integration in Python**. In previous courses, you learned how to build an MCP server and connect it to an agent, giving your agent the ability to use external tools. Now, we will take your skills further by focusing on how to make your agent more efficient and responsive, especially when handling a sequence of user queries. In this lesson, you will learn how to use tool caching to reduce latency and improve performance when running an agent across multiple queries. These techniques are essential for building agents that feel fast and natural in real-world applications.

-----

## Recap Of MCP Tools And Agent Integration

Before we dive into advanced topics, let's quickly review how **MCP servers and agents** work together. An MCP server provides a set of tools — these are actions or functions the agent can use to help answer user queries. In earlier lessons, you learned how to launch an MCP server and connect it to an agent using the OpenAI Agents SDK. The agent gathers tools from the MCP server (or servers) and can use them alongside any built-in tools you define. This integration allows your agent to perform a wide range of tasks, from fetching data to managing files, depending on the tools available.

-----

## The Importance And Benefits Of Tool Caching

Every time an agent starts a new session or receives a query, it needs to know what tools are available. By default, the agent asks the MCP server for the list of tools each time. If the MCP server is running locally, this might be fast, but if it's remote or the tool list is large, this can slow things down. **Caching the tool list** means the agent remembers the tools after the first request, so it doesn't have to ask the server again for every query. This reduces latency, speeds up responses, and saves resources.

To see the impact of not using tool caching, here's an example of what you might see in the logs of an MCP server running with Server-Sent Events (SSE) when an agent processes multiple queries without caching enabled. Notice how the server receives a `ListToolsRequest` for each new query, even though the list of tools hasn't changed:

```
INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:03] INFO     Processing request of type ListToolsRequest

INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:07] INFO     Processing request of type CallToolRequest

INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:19] INFO     Processing request of type ListToolsRequest

INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:22] INFO     Processing request of type CallToolRequest

INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:28] INFO     Processing request of type ListToolsRequest

INFO:     127.0.0.1:37682 - "POST /messages/?session_id=... HTTP/1.1" 202 Accepted
[05/05/25 15:30:32] INFO     Processing request of type CallToolRequest
```

In this log, every query triggers its own `ListToolsRequest` before the `CallToolRequest`, even though the tool list never changes between queries. Each redundant fetch adds a round trip of latency and extra load on the server, which adds up quickly in multi-query conversations. Enabling tool caching removes these repeated requests.
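If you want to quantify this rather than eyeball it, a small script can tally the request types in a captured log. The sketch below assumes the log lines follow the `Processing request of type ...` format shown above; the helper name `count_request_types` and the sample text are illustrative only.

```python
import re
from collections import Counter

# Matches the request type at the end of lines such as
# "[05/05/25 15:30:03] INFO     Processing request of type ListToolsRequest"
REQUEST_TYPE = re.compile(r"Processing request of type (\w+)")


def count_request_types(log_text: str) -> Counter:
    """Count how often each MCP request type appears in the server log."""
    return Counter(REQUEST_TYPE.findall(log_text))


# A shortened copy of the log excerpt above (two queries, no caching).
sample_log = """\
[05/05/25 15:30:03] INFO     Processing request of type ListToolsRequest
[05/05/25 15:30:07] INFO     Processing request of type CallToolRequest
[05/05/25 15:30:19] INFO     Processing request of type ListToolsRequest
[05/05/25 15:30:22] INFO     Processing request of type CallToolRequest
"""

counts = count_request_types(sample_log)
print(counts["ListToolsRequest"], counts["CallToolRequest"])  # 2 2
```

Without caching, the `ListToolsRequest` count should roughly track the number of queries; with caching enabled you would expect it to stay at one for the whole conversation.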

-----

## When Not To Cache

Tool caching is most effective when your tool list is stable and does not change often. However, if your tool list is dynamic and changes frequently—such as when tools are added, removed, or updated at runtime—caching can cause the agent to miss new or updated tools. In these cases, the agent may continue to use an outdated tool list, leading to incorrect or incomplete responses.

If you expect frequent changes to your tool list, either disable caching or make sure you invalidate the cache manually whenever the tool set changes. This ensures your agent always has access to the latest tools and can respond accurately to user queries.

-----

## Implementing Tool Caching With The OpenAI Agents SDK

Let's look at how to enable tool caching in your agent setup. In the **OpenAI Agents SDK**, you can set the `cache_tools_list` parameter to `True` when you create your MCP server connection. This tells the agent to fetch the tool list once and reuse it for future queries. Here's how it looks in practice:

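```python
from agents.mcp import MCPServerStdio

# Connect to the MCP server via stdio (server_params is defined elsewhere)
async with MCPServerStdio(
    params=server_params,
    cache_tools_list=True  # Cache the tools list to avoid re-fetching it
) as mcp_server:
    ...  # Create and run your agent with mcp_servers=[mcp_server]
```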

When you use this `mcp_server` in your agent, the tool list is fetched from the server only once, on the first run, and the cached list is reused for later queries.
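To make the effect concrete, here is a toy model of the behavior, not the SDK's actual implementation. `ToyToolServer`, its `fetch_count` attribute, and the tool names are hypothetical stand-ins; only the `cache_tools_list` flag and the `invalidate_tools_cache()` method name mirror the SDK.

```python
import asyncio


class ToyToolServer:
    """Hypothetical stand-in for an MCP server connection (not the SDK class)."""

    def __init__(self, cache_tools_list: bool = False):
        self.cache_tools_list = cache_tools_list
        self.fetch_count = 0        # how many times we "hit the server"
        self._cached_tools = None

    async def _fetch_tools_from_server(self) -> list[str]:
        # In a real setup this would be a round trip to the MCP server.
        self.fetch_count += 1
        return ["add_item", "list_items", "mark_purchased"]  # made-up names

    async def list_tools(self) -> list[str]:
        # Serve from the cache only when caching is on and the cache is filled.
        if self.cache_tools_list and self._cached_tools is not None:
            return self._cached_tools
        tools = await self._fetch_tools_from_server()
        self._cached_tools = tools
        return tools

    def invalidate_tools_cache(self) -> None:
        # Drop the cached list so the next list_tools() call fetches again.
        self._cached_tools = None


async def fetches_for(queries: int, cache: bool) -> int:
    """Simulate one tool lookup per query and report the server fetches."""
    server = ToyToolServer(cache_tools_list=cache)
    for _ in range(queries):
        await server.list_tools()
    return server.fetch_count


print(asyncio.run(fetches_for(3, cache=False)))  # 3
print(asyncio.run(fetches_for(3, cache=True)))   # 1
```

Three queries cost three fetches without caching and a single fetch with it, which matches the pattern you would look for in the server logs.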

### Clearing the Tools List Cache

Tool caching works best when your tool list is stable, but there are situations where you need to update the cache to reflect changes. Only the metadata about the tools (such as names, input/output schema, and descriptions) is cached—not the actual tool logic or state. The logic itself always runs live on the server. If you update a tool's implementation on the server but keep its metadata the same, you do not need to invalidate the cache. However, if you add, remove, or modify tools (for example, by changing their schema or description), you should clear the cache so the agent can fetch the latest tool list.
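One way to decide whether a server-side change calls for invalidation is to compare a fingerprint of the tool metadata before and after the change. This is a minimal sketch under assumptions: tools are represented as plain dictionaries, the `inputSchema` and `outputSchema` keys follow MCP's tool definition naming, and `tools_fingerprint` and `implementation_version` are hypothetical names used only for illustration.

```python
import hashlib
import json


def tools_fingerprint(tools: list[dict]) -> str:
    """Hash only the metadata the agent sees; ignore everything else."""
    metadata = sorted(
        (
            {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "inputSchema": tool.get("inputSchema", {}),
                "outputSchema": tool.get("outputSchema"),
            }
            for tool in tools
        ),
        key=lambda entry: entry["name"],
    )
    payload = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


before = [
    {"name": "add_item", "description": "Add an item", "inputSchema": {"type": "object"}}
]
# Logic changed on the server, metadata did not (hypothetical extra key).
same_metadata = [dict(before[0], implementation_version=2)]
# Description changed, so the agent's cached view is now stale.
new_description = [dict(before[0], description="Add an item to the list")]

print(tools_fingerprint(before) == tools_fingerprint(same_metadata))    # True
print(tools_fingerprint(before) == tools_fingerprint(new_description))  # False
```

If the fingerprint is unchanged, the cached list is still accurate; if it differs, clearing the cache is the safe choice.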

You can manually clear the cached tool list by calling:

```python
# Manually clear the cached tool list if tools have changed
mcp_server.invalidate_tools_cache()
```

A common use case for this is when tools are dynamically registered based on user roles or runtime data. For example, if administrative tools are only available when a user logs in as an admin, you should call `invalidate_tools_cache()` immediately after a login event or role change. This ensures the agent fetches the correct tool list for the new user context and always has access to the appropriate tools.
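The role-change scenario could be wired up roughly as follows. This sketch uses a stub in place of a real server connection so it runs on its own; `StubServer`, `Session`, and `set_role` are hypothetical names, and only `invalidate_tools_cache()` corresponds to the method described above.

```python
class StubServer:
    """Hypothetical stand-in exposing only the method this sketch needs."""

    def __init__(self):
        self.invalidations = 0

    def invalidate_tools_cache(self) -> None:
        # A real connection would drop its cached tool list here.
        self.invalidations += 1


class Session:
    """Hypothetical per-user session that tracks the current role."""

    def __init__(self, mcp_server, role: str = "guest"):
        self.mcp_server = mcp_server
        self.role = role

    def set_role(self, new_role: str) -> None:
        if new_role == self.role:
            return  # Role unchanged: the cached tool list is still valid.
        self.role = new_role
        # The visible tool set may differ for the new role, so force a re-fetch.
        self.mcp_server.invalidate_tools_cache()


server = StubServer()
session = Session(server)
session.set_role("admin")   # guest -> admin: invalidates
session.set_role("admin")   # no change: cache kept
session.set_role("guest")   # admin -> guest: invalidates
print(server.invalidations)  # 2
```

Invalidating only on an actual role change keeps the benefit of caching for the common case where the user context stays the same between queries.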

If your tool list changes frequently, consider when and how to invalidate the cache so your agent always operates with the most up-to-date information.

-----

## Running An Agent Across Multiple Queries

A key part of building a helpful agent is maintaining the conversation context across multiple user queries. This means the agent remembers what was said before and can respond in a way that makes sense for the ongoing conversation. In the example below, you will see how to run an agent through a series of queries, updating the conversation history each time.

Here is a complete example that brings together everything we've discussed:

```python
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerSse

async def main():
    server_params = {"url": "http://localhost:3000/sse"}
    
    queries = [
        "Show me the whole list",
        "Add 2 bananas",
        "Mark milk as purchased"
    ]

    async with MCPServerSse(
        params=server_params,
        cache_tools_list=True  # Enable tool list caching
    ) as mcp_server:
        
        agent = Agent(
            name="Shopping Assistant",
            instructions="You are an assistant that uses shopping list tools to help manage a shopping list.",
            mcp_servers=[mcp_server],
            model="gpt-4.1"
        )
        
        conversation_history = []
        for query in queries:
            print(f"Query: {query}")
            conversation_history.append({"role": "user", "content": query})
            
            result = await Runner.run(
                starting_agent=agent,
                input=conversation_history
            )
            
            print(f"Assistant: {result.final_output}\n")
            conversation_history = result.to_input_list()

if __name__ == "__main__":
    asyncio.run(main())
```

In this code, the agent is connected to an MCP server with tool caching enabled. Because the tool list is cached, the agent only fetches the list of available tools from the server once, instead of making a separate request for each query. This significantly reduces latency and speeds up the agent's responses across the entire sequence of user queries. Tool caching is especially beneficial in multi-query scenarios like this, where the agent can reuse the same tool list for each turn in the conversation, resulting in a smoother and more responsive user experience.
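To check the latency difference on your own setup, you could time each query with and without caching. The helper below is a generic sketch: `timed` and `fake_query` are hypothetical names, and `fake_query` merely simulates a slow call so the example runs without a server. In the loop above you would wrap the `Runner.run(...)` call instead.

```python
import asyncio
import time


async def timed(awaitable):
    """Await something and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = await awaitable
    return result, time.perf_counter() - start


async def fake_query(delay: float) -> str:
    # Stand-in for an agent run; sleeps instead of calling a model or server.
    await asyncio.sleep(delay)
    return "done"


async def demo():
    return await timed(fake_query(0.05))


result, elapsed = asyncio.run(demo())
print(f"{result} in {elapsed:.3f}s")
```

Keep in mind that model response time usually dominates each turn, so the saving from caching is most visible when the MCP server is remote or the tool list is large.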

-----

## Summary, Best Practices, And Next Steps

In this lesson, you learned how to make your agent more efficient by **caching the list of tools** from the MCP server. Caching reduces latency and improves performance, especially when the tool list is stable. As a best practice, enable tool caching when your tool list does not change often, and remember to invalidate the cache if you update the tools.

You are now ready to practice these concepts in hands-on exercises. Keep up the great work: these skills will help you build faster and smarter agents!

## Watching Tool Requests in Action

In this first task, you’ll see for yourself how an agent interacts with an MCP server when tool caching is not enabled. This will help you understand the performance impact of repeatedly fetching the tool list for every query.

The terminal tab is already open and running an MCP server. You do not need to change any code. Simply click the Run button to execute the agent code, and watch the terminal tab to observe how the agent interacts with the MCP server as it processes each query.

Pay close attention to how often the server receives a `ListToolsRequest`. This hands-on step will help you see the impact of not using tool caching before you move on to making your agent more efficient.

```python
import asyncio
from agents import Agent, Runner
from agents.mcp import MCPServerSse


async def main():
    # Configure the MCP server parameters
    server_params = {"url": "http://localhost:3000/sse"}
    
    # List of queries to run
    queries = [
        "Show me the whole list",
        "Add 2 bananas",
        "Mark milk as purchased",
        "Show me only unpurchased items",
        "Remove bread from the list",
        "Show me the whole list"
    ]
    
    # Connect to the MCP server
    async with MCPServerSse(params=server_params) as mcp_server:
        
        # Create an agent with conversation-aware instructions.
        agent = Agent(
            name="Shopping Assistant",
            instructions=(
                "You are an assistant that uses shopping list tools to help manage a shopping list."
            ),
            mcp_servers=[mcp_server],
            model="gpt-4.1"
        )
        
        # Initialize conversation history as a list of message dictionaries.
        conversation_history = []

        # Iterate over the queries
        for query in queries:
            # Print the query
            print(f"Query: {query}")

            # Append the query to the conversation history
            conversation_history.append({"role": "user", "content": query})
            
            # Run the agent with the conversation history
            result = await Runner.run(
                starting_agent=agent,
                input=conversation_history
            )
            
            # Print the agent's response
            print(f"Assistant: {result.final_output}\n")
            
            # Update the conversation history with the agent's output
            conversation_history = result.to_input_list()

if __name__ == "__main__":
    asyncio.run(main())

```
