I don't think this is specific to MCP tools. No tool failure should crash the agent. @Jacksunwei, could you please take a look at the robustness of agent workflows?
-
Right now, this is expected behavior. We considered catching all errors and passing them to the LLM so that it can retry. However, that approach may unexpectedly leak sensitive data to the LLM. So our current recommendation is that the tool author handle the error and return a sanitized, meaningful error object, so that the LLM can retry without sensitive data being exposed to it.
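A minimal sketch of that recommendation, written in plain Python rather than against any ADK-specific API. The tool name `read_file_tool`, the keys of the returned dict, and the choice of which exceptions are safe to describe to the model are all assumptions made for illustration.

```python
import logging

logger = logging.getLogger(__name__)


def read_file_tool(path: str) -> dict:
    """Hypothetical tool: returns file contents or a sanitized error dict."""
    try:
        with open(path, encoding="utf-8") as f:
            return {"status": "success", "content": f.read()}
    except FileNotFoundError:
        # Safe to tell the model: it can retry with a different path.
        return {
            "status": "error",
            "error_message": "File not found. Check the path and try again.",
        }
    except Exception:
        # Log full details locally; send only a generic message to the model
        # so stack traces, tokens, or internal paths are not leaked.
        logger.exception("read_file_tool failed")
        return {
            "status": "error",
            "error_message": "The tool failed unexpectedly.",
        }
```

The point of the split is that only errors the author has judged safe get a descriptive message; everything else is logged on the server side and reported to the model generically.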
In the meantime, I'm also interested in what people think about this topic. I'm converting this to a discussion so more people can comment.
-
In general, which option do you think is best and would like to adopt?
1. The tool author handles all errors within the tool and provides a meaningful error dict so the LLM can proceed.
Pros
Cons
2. ADK handles only one basic error, a missing parameter, as it does today. All other errors defer to option 1.
Pros
Cons
3.
Pros
Cons
Feel free to share your opinions on the 3 options, or to suggest other ideas.
-
Regarding tools: if a tool determines that it's safe not to crash, it can handle the error gracefully by returning a JSON object or an error message with an appropriate indicator. However, in my case, I'm testing the agent flow using smaller models, and they often attempt to call incorrect tools, which ends up "crashing" the flow.
-
I tried @Danau5tin's fix, but unfortunately it didn’t work for me. The |
-
Maintainer's comment: we'd like to gather opinions on this topic from the community.
Check out #795 (comment) for the poll and cast your vote.
Original content
MCP Tool Failure Crashes Entire ADK Multi-Agent Workflow
When an MCP tool fails during execution (not connection), it propagates as an unhandled exception that crashes the entire ADK agent workflow, stopping all subsequent agents in a SequentialAgent pipeline.
Environment
ADK Version: Latest (using google.adk.agents, google.adk.tools.mcp_tool)
Python Version: 3.12
MCP Library Version: Latest compatible with current ADK implementation
Operating System: Linux 5.15
Problem Description
While ADK provides good error handling for MCP server connection failures, runtime MCP tool failures (like "Resource not found") propagate as unhandled McpError exceptions that crash the entire multi-agent workflow.
Expected Behavior
Individual MCP tool failures should not crash the entire agent workflow
Agents should be able to handle tool failures gracefully and continue execution
Sequential agents should continue to subsequent agents even if one tool fails
The framework should provide built-in resilience mechanisms for MCP tool failures
Actual Behavior
Single MCP tool failure crashes the entire SequentialAgent workflow
No opportunity for graceful degradation or alternative approaches
Complete loss of partial results from successful agents
Workflow stops executing without running subsequent agents
Steps to Reproduce
Create a multi-agent workflow using SequentialAgent
Include an MCP tool that may fail (e.g., GitHub file access with invalid path)
Configure the agent to use the MCP tool
Run the workflow with inputs that will cause the MCP tool to fail
Minimal Reproducible Example
python
import asyncio
from google.adk.agents import SequentialAgent, LlmAgent
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset
from google.adk.tools.mcp_tool.mcp_toolset import StdioServerParameters

async def create_failing_workflow():
    # Setup GitHub MCP tools
    git_tools, git_exit_stack = await MCPToolset.from_server(
        connection_params=StdioServerParameters(
            command='npx',
            args=["-y", "@modelcontextprotocol/server-github"],
            env={"GITHUB_PERSONAL_ACCESS_TOKEN": "your_token"}
        )
    )

# Run with input that causes MCP tool to fail
# Result: Entire workflow crashes, other_agent1 and other_agent2 never execute
Error Log
mcp.shared.exceptions.McpError: Not Found: Resource not found: Not Found
  File "/home/nmr/.venv/lib/python3.12/site-packages/google/adk/tools/mcp_tool/mcp_tool.py", line 126, in run_async
    raise e
  File "/home/nmr/.venv/lib/python3.12/site-packages/google/adk/tools/mcp_tool/mcp_tool.py", line 122, in run_async
    response = await self.mcp_session.call_tool(self.name, arguments=args)
  File "/home/nmr/.venv/lib/python3.12/site-packages/mcp/client/session.py", line 265, in call_tool
    return await self.send_request(
  File "/home/nmr/.venv/lib/python3.12/site-packages/mcp/shared/session.py", line 273, in send_request
    raise McpError(response_or_error.error)
Current Workarounds
Agent Instruction Level: Explicitly instruct agents to handle tool failures
Wrapper Functions: Create wrapper tools with try-catch logic
Alternative Agent Patterns: Use custom agents instead of SequentialAgent
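The wrapper-function workaround can be sketched as a decorator around any async tool callable. This is an illustration only: `safe_tool` and `get_file_contents` are hypothetical names, the stand-in tool simply raises instead of calling a real MCP server, and the shape of the returned dict is an assumption.

```python
import asyncio
import functools
import logging

logger = logging.getLogger(__name__)


def safe_tool(func):
    """Wrap an async tool so exceptions become an error dict instead of propagating."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        try:
            return await func(*args, **kwargs)
        except Exception as exc:
            # asyncio.CancelledError is a BaseException, so cancellation
            # still propagates and is not swallowed here.
            logger.exception("Tool %s failed", func.__name__)
            return {
                "status": "error",
                "tool_name": func.__name__,
                "error_type": type(exc).__name__,
            }

    return wrapper


@safe_tool
async def get_file_contents(path: str) -> dict:
    """Hypothetical stand-in for a call to an external MCP server."""
    raise RuntimeError("Not Found: Resource not found")


if __name__ == "__main__":
    print(asyncio.run(get_file_contents("missing/file.txt")))
```

The wrapper reports only the exception type, not its message, which keeps the raw server response out of the model's context while still letting the agent continue.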
Suggested Solutions
1. Add built-in error handling in MCPTool.run_async():
python
from mcp.shared.exceptions import McpError

async def run_async(self, args, tool_context):
    try:
        response = await self.mcp_session.call_tool(self.name, arguments=args)
        return response
    except McpError as e:
        # Convert to tool result with error information
        return {
            "error": True,
            "error_type": "mcp_tool_failure",
            "error_message": str(e),
            "tool_name": self.name,
            "suggestions": ["Try alternative tools", "Check connectivity"]
        }
2. SequentialAgent Resilience
Modify SequentialAgent to continue execution despite sub-agent failures:
python
# Add option for fault-tolerant execution
workflow = SequentialAgent(
    name="FaultTolerantWorkflow",
    sub_agents=[agent1, agent2, agent3],
    continue_on_failure=True,  # New parameter
    collect_partial_results=True  # New parameter
)
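The two parameters above are proposals, not existing ADK options. The semantics they are meant to convey can be sketched in plain asyncio without any ADK classes. Everything named below (`run_sequence`, `StepResult`, the step functions) is hypothetical and only illustrates the continue-and-collect behavior.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class StepResult:
    name: str
    output: Any = None
    error: Optional[str] = None


async def run_sequence(
    steps: list[tuple[str, Callable[[], Awaitable[Any]]]],
    continue_on_failure: bool = True,
) -> list[StepResult]:
    """Run steps in order, recording failures instead of aborting the run."""
    results: list[StepResult] = []
    for name, step in steps:
        try:
            results.append(StepResult(name, output=await step()))
        except Exception as exc:
            results.append(StepResult(name, error=type(exc).__name__))
            if not continue_on_failure:
                break  # Results collected so far are still returned.
    return results


async def fetch_ok() -> str:
    return "ok"


async def fetch_broken() -> str:
    raise RuntimeError("Resource not found")


if __name__ == "__main__":
    steps = [("agent1", fetch_ok), ("agent2", fetch_broken), ("agent3", fetch_ok)]
    for result in asyncio.run(run_sequence(steps)):
        print(result)
```

Even with `continue_on_failure=False`, the caller gets the results gathered before the failure, which addresses the loss of partial results described under Actual Behavior.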
3. Circuit Breaker Pattern
Implement circuit breaker functionality for MCP tools to prevent cascading failures.
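As a rough sketch of what such a circuit breaker could look like, assuming a simple consecutive-failure threshold and a fixed cool-down: `CircuitBreaker`, `CircuitOpenError`, and `flaky_tool` are hypothetical names, none of this is part of ADK or the MCP library, and the class is not safe for concurrent callers as written.

```python
import asyncio
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # Cool-down in seconds.
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed.

    async def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call skipped")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0  # Any success closes the circuit fully.
        return result


async def flaky_tool():
    """Hypothetical stand-in for a failing external tool call."""
    raise ConnectionError("server unavailable")


async def demo():
    breaker = CircuitBreaker(max_failures=2)
    for attempt in range(3):
        try:
            await breaker.call(flaky_tool)
        except CircuitOpenError:
            print(attempt, "skipped: circuit open")
        except ConnectionError:
            print(attempt, "tool failed")


if __name__ == "__main__":
    asyncio.run(demo())
```

Once the circuit is open, calls fail fast without reaching the external server, so a persistently broken tool stops slowing down or destabilizing the rest of the workflow.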
Impact
Severity: High - Crashes entire workflows
Frequency: Common when using external MCP servers
Workaround Complexity: Medium - Requires manual error handling
Additional Context
This issue significantly impacts the reliability of production ADK systems using MCP tools. The current behavior makes it difficult to build robust multi-agent systems that can gracefully handle partial failures.
Feature Request
Consider adding:
Built-in error handling options for MCP tools
Fault-tolerant execution modes for multi-agent workflows
Circuit breaker patterns for external tool integrations
Better error propagation and handling documentation
Labels: bug, enhancement, mcp-tools, multi-agent, error-handling