This repository demonstrates LLM Agents using tools from Model Context Protocol (MCP) servers with several frameworks:
- Google Agent Development Kit (ADK)
- LangGraph Agents
- OpenAI Agents
- Pydantic-AI Agents
- Agent with a single MCP server - Learning examples and basic patterns
- Agent with multiple MCP servers - Advanced usage with comprehensive evaluation suite
- Evaluation Dashboard: Interactive Streamlit UI for model comparison
- Multi-Model Benchmarking: Parallel/sequential evaluation across multiple LLMs
- Rich Metrics: Usage analysis, cost comparison, and performance leaderboards
The repo also includes Python MCP Servers:
- example_server.py - Based on the MCP Python SDK Quickstart, modified to include a datetime tool and run as a server invoked by Agents
- mermaid_validator.py - Mermaid diagram validation server using mermaid-cli
Tracing is done through Pydantic Logfire.
cp .env.example .env
- Add GEMINI_API_KEY and/or OPENAI_API_KEY
  - Individual scripts can be adjusted to use models from any provider supported by the specific framework
  - By default only basic_mcp_use/oai-agent_mcp.py requires OPENAI_API_KEY
  - All other scripts require GEMINI_API_KEY (Free tier key can be created at https://aistudio.google.com/apikey)
- [Optional] Add LOGFIRE_TOKEN to visualise evaluations in the Logfire web UI
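The default key requirements can be checked before launching a script. The following is a minimal sketch, not part of the repository: `required_key` and `missing_keys` are hypothetical helpers, and the mapping assumes the defaults described here (only `oai-agent_mcp.py` needs `OPENAI_API_KEY`, everything else needs `GEMINI_API_KEY`).

```python
import os
from pathlib import Path


def required_key(script_path: str) -> str:
    """Hypothetical helper: name the API key a script needs by default."""
    # Assumption: only the OpenAI Agents example defaults to an OpenAI model.
    if Path(script_path).name == "oai-agent_mcp.py":
        return "OPENAI_API_KEY"
    return "GEMINI_API_KEY"


def missing_keys(script_path: str, env=None) -> list[str]:
    """Return the required keys that are absent or empty in `env`."""
    env = os.environ if env is None else env
    key = required_key(script_path)
    return [] if env.get(key) else [key]
```

Calling `missing_keys(...)` with no `env` argument inspects the current process environment, so it reflects whatever the `.env` loading step has already exported.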
Run an Agent framework script e.g.:
- uv run agents_mcp_usage/basic_mcp/basic_mcp_use/pydantic_mcp.py
  - Requires GEMINI_API_KEY by default
- uv run agents_mcp_usage/basic_mcp/basic_mcp_use/oai-agent_mcp.py
  - Requires OPENAI_API_KEY by default
- Launch the ADK web UI for visual interaction with the agents:
  make adk_basic_ui
Check console, Logfire, or the ADK web UI for output
This project aims to teach:
- How to use MCP with multiple LLM Agent frameworks
- How to see traces of LLM Agents with Logfire
- How to evaluate LLMs with PydanticAI evals
- agents_mcp_usage/basic_mcp/ - Single MCP server integration examples
  - basic_mcp_use/ - Contains basic examples of single MCP usage:
    - adk_mcp.py - Example of using MCP with Google's Agent Development Kit (ADK 1.3.0)
    - langgraph_mcp.py - Example of using MCP with LangGraph
    - oai-agent_mcp.py - Example of using MCP with OpenAI Agents
    - pydantic_mcp.py - Example of using MCP with Pydantic-AI
- agents_mcp_usage/multi_mcp/ - Advanced multi-MCP server integration examples
  - multi_mcp_use/ - Contains examples of using multiple MCP servers simultaneously:
    - pydantic_mcp.py - Example of using multiple MCP servers with Pydantic-AI Agent
  - eval_multi_mcp/ - Contains evaluation examples for multi-MCP usage:
    - evals_pydantic_mcp.py - Example of evaluating the use of multiple MCP servers with Pydantic-AI
- Demo Python MCP Servers
  - mcp_servers/example_server.py - Simple MCP server that runs locally, implemented in Python
  - mcp_servers/mermaid_validator.py - Mermaid diagram validation MCP server, implemented in Python
The basic_mcp directory demonstrates how to integrate a single MCP server with different agent frameworks. Each example follows a similar pattern:
- Environment Setup: Loading environment variables and configuring logging
- Server Connection: Establishing a connection to the local MCP server
- Agent Configuration: Setting up an agent with the appropriate model
- Execution: Running the agent with a query and handling the response
The MCP server in these examples provides:
- An addition tool (add(a, b))
- A time tool (get_current_time())
- A dynamic greeting resource (greeting://{name})
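The three capabilities listed above can be pictured as ordinary functions. The sketch below is an illustration rather than the contents of `example_server.py`: the time format, the greeting text, and the `read_resource` dispatcher are assumptions. In a server built with the MCP Python SDK, functions like these would be registered with decorators such as `@mcp.tool()` and `@mcp.resource("greeting://{name}")`; registration is left out so the sketch runs without the SDK installed.

```python
from datetime import datetime


def add(a: int, b: int) -> int:
    """Addition tool: return the sum of two integers."""
    return a + b


def get_current_time() -> str:
    """Time tool: return the current local time (format is an assumption)."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def get_greeting(name: str) -> str:
    """Greeting resource: the message behind greeting://{name} (text assumed)."""
    return f"Hello, {name}!"


def read_resource(uri: str) -> str:
    """Hypothetical dispatcher that resolves a greeting:// URI to its content."""
    scheme, _, name = uri.partition("://")
    if scheme != "greeting" or not name:
        raise ValueError(f"unsupported resource URI: {uri}")
    return get_greeting(name)


print(add(2, 3))                            # prints 5
print(read_resource("greeting://Ada"))      # prints Hello, Ada!
```

The distinction worth noticing is that tools take arguments chosen by the model, while the resource is addressed by a URI whose `{name}` segment becomes the function parameter.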
graph LR
User((User)) --> |"Run script<br>(e.g., pydantic_mcp.py)"| Agent
subgraph "Agent Frameworks"
Agent[Agent]
ADK["Google ADK<br>(adk_mcp.py)"]
LG["LangGraph<br>(langgraph_mcp.py)"]
OAI["OpenAI Agents<br>(oai-agent_mcp.py)"]
PYD["Pydantic-AI<br>(pydantic_mcp.py)"]
Agent --> ADK
Agent --> LG
Agent --> OAI
Agent --> PYD
end
subgraph "Python MCP Server"
MCP["Model Context Protocol Server<br>(mcp_servers/example_server.py)"]
Tools["Tools<br>- add(a, b)<br>- get_current_time()"]
Resources["Resources<br>- greeting://{name}"]
MCP --- Tools
MCP --- Resources
end
subgraph "LLM Providers"
OAI_LLM["OpenAI Models"]
GEM["Google Gemini Models"]
OTHER["Other LLM Providers..."]
end
Logfire[("Logfire<br>Tracing")]
ADK --> MCP
LG --> MCP
OAI --> MCP
PYD --> MCP
MCP --> OAI_LLM
MCP --> GEM
MCP --> OTHER
ADK --> Logfire
LG --> Logfire
OAI --> Logfire
PYD --> Logfire
LLM_Response[("Response")] --> User
OAI_LLM --> LLM_Response
GEM --> LLM_Response
OTHER --> LLM_Response
# Google ADK example
uv run agents_mcp_usage/basic_mcp/basic_mcp_use/adk_mcp.py
# LangGraph example
uv run agents_mcp_usage/basic_mcp/basic_mcp_use/langgraph_mcp.py
# OpenAI Agents example
uv run agents_mcp_usage/basic_mcp/basic_mcp_use/oai-agent_mcp.py
# Pydantic-AI example
uv run agents_mcp_usage/basic_mcp/basic_mcp_use/pydantic_mcp.py
# Launch ADK web UI for visual interaction
make adk_basic_ui
More details on basic MCP implementation can be found in the basic_mcp README.
The multi_mcp directory demonstrates advanced techniques for connecting to and coordinating between multiple specialised MCP servers simultaneously. This approach offers several advantages:
- Domain Separation: Each MCP server can focus on a specific domain or set of capabilities
- Modularity: Add, remove, or update capabilities without disrupting the entire system
- Scalability: Distribute load across multiple servers for better performance
- Specialisation: Optimise each MCP server for its specific use case
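The modularity and domain-separation points can be illustrated with a toy registry that merges tools from several servers into one namespace. This is a sketch under stated assumptions, not how Pydantic-AI manages MCP servers internally: `ToolRegistry`, its methods, and the `validate_diagram` tool name are all hypothetical.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Hypothetical agent-side view over tools exposed by several servers."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., object]] = {}
        self._owner: Dict[str, str] = {}

    def attach(self, server: str, tools: Dict[str, Callable[..., object]]) -> None:
        """Add one server's tools, refusing silent name collisions."""
        for name, fn in tools.items():
            if name in self._tools:
                raise ValueError(
                    f"tool {name!r} already provided by {self._owner[name]!r}"
                )
            self._tools[name] = fn
            self._owner[name] = server

    def detach(self, server: str) -> None:
        """Remove every tool owned by `server`; other servers are untouched."""
        for name in [n for n, s in self._owner.items() if s == server]:
            del self._tools[name]
            del self._owner[name]

    def call(self, name: str, **kwargs: object) -> object:
        """Invoke a tool by name, wherever it is hosted."""
        return self._tools[name](**kwargs)

    def names(self) -> List[str]:
        return sorted(self._tools)


registry = ToolRegistry()
registry.attach("example_server", {"add": lambda a, b: a + b})
# Stand-in validator; the real server delegates to mermaid-cli.
registry.attach(
    "mermaid_validator",
    {"validate_diagram": lambda text: text.lstrip().startswith("graph")},
)
print(registry.call("add", a=2, b=3))  # prints 5
```

Detaching `mermaid_validator` leaves `add` available, which is the sense in which capabilities can be removed without disrupting the rest of the system.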
graph LR
User((User)) --> |"Run script<br>(e.g., pydantic_mcp.py)"| Agent
subgraph "Agent Framework"
Agent["Pydantic-AI Agent<br>(pydantic_mcp.py)"]
end
subgraph "MCP Servers"
PythonMCP["Python MCP Server<br>(mcp_servers/example_server.py)"]
MermaidMCP["Python Mermaid MCP Server<br>(mcp_servers/mermaid_validator.py)"]
Tools["Tools<br>- add(a, b)<br>- get_current_time()"]
Resources["Resources<br>- greeting://{name}"]
MermaidValidator["Mermaid Diagram<br>Validation Tools"]
PythonMCP --- Tools
PythonMCP --- Resources
MermaidMCP --- MermaidValidator
end
subgraph "LLM Providers"
LLMs["PydanticAI LLM call"]
end
Logfire[("Logfire<br>Tracing")]
Agent --> PythonMCP
Agent --> MermaidMCP
PythonMCP --> LLMs
MermaidMCP --> LLMs
Agent --> Logfire
LLM_Response[("Response")] --> User
LLMs --> LLM_Response
# Run the Pydantic-AI multi-MCP example
uv run agents_mcp_usage/multi_mcp/multi_mcp_use/pydantic_mcp.py
# Run the multi-MCP evaluation
uv run agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py
# Run multi-model benchmarking
uv run agents_mcp_usage/multi_mcp/eval_multi_mcp/run_multi_evals.py --models "gemini-2.5-pro-preview-06-05,gemini-2.0-flash" --runs 5 --parallel
# Launch the evaluation dashboard
uv run streamlit run agents_mcp_usage/multi_mcp/eval_multi_mcp/merbench_ui.py
More details on multi-MCP implementation can be found in the multi_mcp README.
This repository includes a comprehensive evaluation system for benchmarking LLM agent performance across multiple frameworks and models. The evaluation suite tests agents on mermaid diagram correction tasks using multiple MCP servers, providing rich metrics and analysis capabilities.
- Multi-Level Difficulty: Easy, medium, and hard test cases for comprehensive assessment
- Multi-Model Benchmarking: Parallel or sequential evaluation across multiple LLM models
- Interactive Dashboard: Streamlit-based UI for visualising results, cost analysis, and model comparison
- Rich Metrics Collection: Token usage, cost analysis, success rates, and failure categorisation
- Robust Error Handling: Comprehensive retry logic and detailed failure analysis
- Export Capabilities: CSV results for downstream analysis and reporting
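The difference between parallel and sequential benchmarking can be sketched with `asyncio`. This is an illustration of the idea only: `evaluate_model` is a stand-in that does no real work, and `run_benchmark` is a hypothetical name, not the interface of `run_multi_evals.py`.

```python
import asyncio


async def evaluate_model(model: str, runs: int) -> dict:
    """Stand-in for one model's evaluation; a real run would call the agent."""
    await asyncio.sleep(0)  # yield control, as an awaited API call would
    return {"model": model, "runs": runs}


async def run_benchmark(models: list[str], runs: int, parallel: bool) -> list[dict]:
    """Evaluate every model, either concurrently or one after another."""
    if parallel:
        # All evaluations are in flight at once; gather preserves input order.
        return list(
            await asyncio.gather(*(evaluate_model(m, runs) for m in models))
        )
    # Sequential mode: one model finishes before the next starts.
    return [await evaluate_model(m, runs) for m in models]


results = asyncio.run(
    run_benchmark(["gemini-2.0-flash", "gemini-2.5-flash"], runs=5, parallel=True)
)
```

Parallel mode shortens wall-clock time when evaluations are dominated by waiting on API responses, while sequential mode is gentler on provider rate limits.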
The included Streamlit dashboard (merbench_ui.py) provides:
- Model Leaderboards: Performance rankings by accuracy, cost efficiency, and speed
- Cost Analysis: Detailed cost breakdowns and cost-per-success metrics
- Failure Analysis: Categorised failure reasons with debugging insights
- Performance Trends: Visualisation of model behaviour across difficulty levels
- Resource Usage: Token consumption and API call patterns
- Comparative Analysis: Side-by-side model performance comparison
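Leaderboard figures such as success rate and cost per success reduce to simple aggregation over per-run records. The sketch below assumes a hypothetical `RunResult` record and defines cost per success as total spend divided by the number of successful runs; the dashboard's actual schema and definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical per-run record; field names are assumptions."""
    model: str
    success: bool
    cost_usd: float


def leaderboard(results: list[RunResult]) -> list[dict]:
    """Aggregate per-model success rate and cost per successful run."""
    stats: dict[str, dict] = {}
    for r in results:
        s = stats.setdefault(r.model, {"runs": 0, "successes": 0, "cost": 0.0})
        s["runs"] += 1
        s["successes"] += int(r.success)
        s["cost"] += r.cost_usd
    rows = []
    for model, s in stats.items():
        rows.append({
            "model": model,
            "success_rate": s["successes"] / s["runs"],
            # Total spend (failed runs included) per success; None if no run succeeded.
            "cost_per_success": (
                s["cost"] / s["successes"] if s["successes"] else None
            ),
        })
    # Rank by success rate, highest first.
    return sorted(rows, key=lambda row: row["success_rate"], reverse=True)
```

Charging failed runs to the successes is a deliberate choice here: a cheap model that fails often can end up costlier per usable result than a pricier, more reliable one.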
# Single model evaluation
uv run agents_mcp_usage/multi_mcp/eval_multi_mcp/evals_pydantic_mcp.py
# Multi-model parallel benchmarking
uv run agents_mcp_usage/multi_mcp/eval_multi_mcp/run_multi_evals.py \
--models "gemini-2.5-pro-preview-06-05,gemini-2.0-flash,gemini-2.5-flash" \
--runs 5 \
--parallel \
--output-dir ./results
# Launch interactive dashboard
uv run streamlit run agents_mcp_usage/multi_mcp/eval_multi_mcp/merbench_ui.py
The evaluation system enables robust, repeatable benchmarking across LLM models and agent frameworks, supporting both research and production model selection decisions.
The Model Context Protocol allows applications to provide context for LLMs in a standardised way, separating the concerns of providing context from the actual LLM interaction.
Learn more: https://modelcontextprotocol.io/introduction
By defining clear specifications for components like resources (data exposure), prompts (reusable templates), tools (actions), and sampling (completions), MCP simplifies the development process and fosters consistency.
A key advantage is flexibility: MCP lets developers switch between LLM providers without completely overhauling their tool and data integrations. It provides a structured approach, potentially reducing the complexity often associated with custom tool implementations for different models. While frameworks like Google Agent Development Kit, LangGraph, and OpenAI Agents, or libraries like PydanticAI, facilitate agent building, MCP focuses specifically on standardising the interface between the agent's reasoning (the LLM) and its capabilities (tools and data), aiming to create a more interoperable ecosystem.
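Part of what makes this interface provider-independent is that MCP messages are JSON-RPC 2.0, so a tool invocation has the same shape whichever model decided to make it. The following minimal sketch builds a `tools/call` request for the `add` tool; the helper name `tool_call_request` is hypothetical, and in practice a client SDK constructs and sends this message for you.

```python
import json


def tool_call_request(request_id: int, name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request as an MCP client would send it."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


message = tool_call_request(1, "add", {"a": 2, "b": 3})
print(message)
```

Nothing in the request names an LLM provider, which is why swapping the model behind an agent leaves the server side untouched.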
- Clone this repository
- Install required packages:
  make install
  To use the ADK web UI, run:
  make adk_basic_ui
- Set up your environment variables in a .env file:
  LOGFIRE_TOKEN=your_logfire_token
  GEMINI_API_KEY=your_gemini_api_key
  OPENAI_API_KEY=your_openai_api_key
- Run any of the sample scripts as shown in the examples above
Logfire is an observability platform from the team behind Pydantic that makes monitoring AI applications straightforward. Features include:
- Simple yet powerful dashboard
- Python-centric insights, including rich display of Python objects
- SQL-based querying of your application data
- OpenTelemetry support for leveraging existing tooling
- Pydantic integration for analytics on validations
Logfire gives you visibility into how your code is running, which is especially valuable for LLM applications where understanding model behaviour is critical.