Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems (🔥🔥🔥 ICML 2025 Spotlight - Top 2.6%)


🚀 This work has been covered by three well-known media outlets.

Important

If you find this project helpful, please consider giving us a ⭐️! It motivates us to keep improving.

🧐 Overview


This repository provides the implementation of the ICML 2025 spotlight paper "Which Agent Causes Task Failures and When?", which introduces the task of automated failure attribution in LLM-based multi-agent systems. Given a failed task, the goal of failure attribution is to automatically identify the agent and the step responsible for the failure.

Automated failure attribution offers several key advantages:

  • Reduces manual debugging effort: Automates the labor-intensive process of inspecting failure logs and tracing errors.
  • Accelerates system development: Speeds up the iteration cycle by quickly identifying faulty agents and critical mistakes.
  • Enables intermediate feedback for agent self-improvement: Pinpointing decisive errors provides actionable signals for agentic systems' self-correction or can serve as rewards in reinforcement learning.

🔧 Who&When: the first benchmark for automated failure attribution in multi-agent systems (MAS).

  • 184 annotated failure tasks collected from algorithm-generated and hand-crafted agentic systems.
  • Fine-grained annotations for each failure, including:
    • The failure-responsible agent (who failed),
    • The decisive error step (when the critical error occurred),
    • A natural language explanation of the failure.

The dataset covers a wide range of realistic multi-agent scenarios based on queries from GAIA and AssistantBench. It serves as a foundational resource for developing and evaluating methods that automatically pinpoint the causes of failures in complex agentic systems. The failure logs were annotated following a detailed annotation guide; more details can be found in the paper.

Important

Check out the dataset on Hugging Face 🤗.
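
For reference, an individual annotated failure log can be inspected directly from the dataset directories. The snippet below is only a minimal sketch, not the repository's loading code: the file name and the annotation field names (mistake_agent, mistake_step, mistake_reason) are assumptions for illustration and should be checked against the dataset card.

import json
from pathlib import Path

# Minimal sketch for inspecting one annotated failure log.
# The file name and field names below are illustrative assumptions;
# check the Hugging Face dataset card for the exact schema.
log_path = Path("../Who&When/Algorithm-Generated") / "1.json"  # assumed file name
with open(log_path) as f:
    record = json.load(f)

print(record.get("mistake_agent"))   # assumed field: failure-responsible agent (who)
print(record.get("mistake_step"))    # assumed field: decisive error step (when)
print(record.get("mistake_reason"))  # assumed field: natural-language explanation (why)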

💡 Evaluations

Requirements

To install requirements:

pip install -r requirements.txt

Inference

Before running inference, make sure the automated failure attribution method is specified via the --method argument described below.

  • Models: the following models are supported (model name : command-line argument):

    • GPT-4o : --model gpt-4o
    • GPT-4 : --model gpt4
    • GPT-4o-mini : --model gpt4o-mini
    • Llama-3.1-8B-Instruct : --model llama-8b
    • Llama-3.1-70B-Instruct : --model llama-70b
    • Qwen2.5-7B-Instruct : --model qwen-7b
    • Qwen2.5-72B-Instruct : --model qwen-72b

Run

python inference.py --method #METHOD --model #MODEL --is_handcrafted #DATA --directory_path #PATH

where:

  • --method specifies the failure attribution method (a simplified sketch of the three strategies follows the example below):

    • all_at_once : All-at-Once judging
    • step_by_step : Step-by-Step judging
    • binary_search : Binary Search judging
  • --is_handcrafted specifies the dataset type:

    • True : Use hand-crafted agentic systems
    • False : Use algorithm-generated agentic systems
  • --directory_path specifies the path to the dataset:

    • ../Who&When/Hand-Crafted : Path to hand-crafted systems
    • ../Who&When/Algorithm-Generated : Path to algorithm-generated systems

Example:

python inference.py --method step_by_step --model gpt-4o --is_handcrafted False --directory_path ../Who\&When/Algorithm-Generated
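
For intuition, the following is a simplified, self-contained sketch of how the three judging strategies could query an LLM judge over a failure log. It is not the repository's implementation: the ask_llm callable, the prompt wording, and the answer parsing are placeholders for illustration.

# Simplified illustration of the three judging strategies.
# ask_llm is a placeholder for a chat-completion call; prompts and parsing are illustrative only.

def all_at_once(ask_llm, log):
    # Show the full failure log once and ask directly for the responsible agent and step.
    return ask_llm(
        "Here is the complete log of a failed multi-agent task:\n"
        f"{log}\n"
        "Which agent made the decisive error, and at which step?"
    )

def step_by_step(ask_llm, steps):
    # Reveal the log one step at a time and stop at the first step judged decisive.
    context = ""
    for i, step in enumerate(steps):
        context += f"\nStep {i}: {step}"
        verdict = ask_llm(
            f"Conversation so far:{context}\n"
            f"Is step {i} the decisive error causing the task to fail? Answer yes or no."
        )
        if verdict.strip().lower().startswith("yes"):
            return i
    return len(steps) - 1  # fall back to the last step if no step is flagged

def binary_search(ask_llm, steps):
    # Repeatedly ask which half of the remaining range contains the decisive error,
    # halving the range until a single step remains.
    lo, hi = 0, len(steps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        answer = ask_llm(
            f"Steps {lo}-{hi} of a failed multi-agent log:\n"
            + "\n".join(f"Step {j}: {s}" for j, s in enumerate(steps[lo:hi + 1], start=lo))
            + f"\nIs the decisive error in steps {lo}-{mid} (answer 'first') "
            + f"or in steps {mid + 1}-{hi} (answer 'second')?"
        )
        if answer.strip().lower().startswith("first"):
            hi = mid
        else:
            lo = mid + 1
    return lo

In this sketch, all-at-once uses a single call over the entire log, step-by-step uses up to one call per step (with shorter context per call), and binary search uses roughly log2(n) calls per failure log.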

Evaluation

After inference, you can evaluate the results. By default, they are stored in the outputs directory.

Example:

python evaluate.py --data_path ../Who\&When/Algorithm-Generated --eval_file  outputs/step_by_step_gpt-4o_alg_generated.txt
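
Evaluation compares the predicted responsible agent and decisive step against the Who&When annotations. The snippet below is a minimal sketch of how agent-level and step-level accuracy could be computed; the dictionary-based prediction format and field names are assumptions for illustration, not the schema written by inference.py or expected by evaluate.py.

# Minimal sketch of agent-level and step-level accuracy.
# The "agent"/"step" keys are an illustrative format, not the repository's actual schema.
def attribution_accuracy(predictions, ground_truths):
    agent_hits = step_hits = 0
    for pred, gt in zip(predictions, ground_truths):
        agent_hits += pred["agent"] == gt["agent"]  # who: correct agent identified
        step_hits += pred["step"] == gt["step"]     # when: correct decisive step identified
    n = len(ground_truths)
    return {"agent_accuracy": agent_hits / n, "step_accuracy": step_hits / n}

# Toy usage with hypothetical agent names:
preds = [{"agent": "WebSurfer", "step": 3}, {"agent": "Orchestrator", "step": 7}]
golds = [{"agent": "WebSurfer", "step": 4}, {"agent": "Orchestrator", "step": 7}]
print(attribution_accuracy(preds, golds))  # {'agent_accuracy': 1.0, 'step_accuracy': 0.5}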

🧪 Experimental Results

(Result figures: main experiment, ablation study, and model analysis.)

More results can be found in the paper.

📖 Reference

Important

If you find it useful, please consider citing our work:

@article{zhang2025agent,
  title={Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems},
  author={Zhang, Shaokun and Yin, Ming and Zhang, Jieyu and Liu, Jiale and Han, Zhiguang and Zhang, Jingyang and Li, Beibin and Wang, Chi and Wang, Huazheng and Chen, Yiran and others},
  journal={arXiv preprint arXiv:2505.00212},
  year={2025}
}