New AI Debugging Method Identifies Which Agent Caused Task Failures and When


Automated Failure Attribution Tool Promises Faster Debugging of Multi-Agent Systems

Researchers have unveiled a groundbreaking method to automatically pinpoint which agent in a large language model (LLM) multi-agent system caused a task failure—and exactly when the failure occurred. The work, which includes the first benchmark dataset named Who&When, was accepted as a Spotlight presentation at ICML 2025.


'Developers have been stuck manually sifting through thousands of interaction logs to find the root cause of failures,' said Shaokun Zhang, co-first author from Penn State University. 'Our automated attribution approach turns that needle-in-a-haystack search into a structured, scalable process.'

The research is a collaboration between Penn State, Duke University, Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University. The code and dataset are fully open-source on GitHub and Hugging Face.

The Reliability Crisis in Multi-Agent Systems

LLM-powered multi-agent systems tackle complex tasks by having multiple AI agents collaborate—each handling subtasks and communicating results. But errors are frequent: a single agent's mistake, a misunderstanding, or a broken information chain can derail the entire mission.

Current debugging relies on manual log archaeology, where developers comb through extensive logs to trace the failure. That process is time-consuming and demands deep expertise in the system's design.

Ming Yin, co-first author from Duke University, explained: 'Without automated attribution, teams spend days or weeks trying to fix bugs, slowing down iteration. This problem will only grow as multi-agent systems become more complex.'

Introducing Who&When: The First Benchmark for Failure Attribution

The team constructed the Who&When benchmark dataset to evaluate automated attribution methods. It includes annotated failure cases from multiple multi-agent systems, providing ground-truth labels for which agent failed and at which step.
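An annotated failure case in such a benchmark can be pictured as a structured record pairing an interaction log with ground-truth labels. A minimal sketch, with illustrative field names rather than the dataset's actual schema:

```python
from dataclasses import dataclass


@dataclass
class FailureCase:
    """One annotated failure from a multi-agent run (illustrative schema)."""
    task: str            # the task the system attempted
    steps: list          # ordered (agent_name, message) pairs from the log
    mistake_agent: str   # ground truth: which agent caused the failure
    mistake_step: int    # ground truth: at which step it occurred (0-indexed)


# A toy example of what one annotated record might look like:
case = FailureCase(
    task="Find the 2023 revenue figure in the annual report",
    steps=[
        ("Orchestrator", "Assign retrieval to WebSurfer"),
        ("WebSurfer", "Opened the 2022 report instead of 2023"),
        ("Assistant", "Reported 2022 revenue as if it were 2023"),
    ],
    mistake_agent="WebSurfer",
    mistake_step=1,
)
print(case.mistake_agent, case.mistake_step)  # -> WebSurfer 1
```

An attribution method is then scored by whether it recovers both labels from the log alone.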

They then developed several attribution methods that analyze interaction logs, agent outputs, and system states. Early results show that these methods significantly outperform manual debugging in speed and accuracy.
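One plausible way to operationalize such attribution is a step-by-step scan: a judge (an LLM call in practice) reviews the log one step at a time and flags the first step that derails the task. A minimal sketch with a stand-in judge; the actual methods and prompts are in the paper's open-source repository:

```python
def attribute_failure(steps, judge):
    """Return (agent, step_index) for the first step the judge flags as erroneous.

    steps: ordered (agent_name, message) pairs from the interaction log.
    judge: callable(context, step) -> bool; in practice an LLM prompted
           with the task, the context so far, and the candidate step.
    """
    context = []
    for i, (agent, message) in enumerate(steps):
        if judge(context, (agent, message)):
            return agent, i          # responsible agent and failure step
        context.append((agent, message))
    return None, -1                  # no decisive error found


# Stand-in judge for illustration: flags any message marked as an error.
toy_judge = lambda ctx, step: "[ERROR]" in step[1]

log = [
    ("Planner", "Split the task into retrieval and summarization"),
    ("Retriever", "[ERROR] Fetched the wrong document"),
    ("Summarizer", "Summarized the wrong document"),
]
print(attribute_failure(log, toy_judge))  # -> ('Retriever', 1)
```

Note that the downstream Summarizer step is also wrong, but the scan correctly blames the Retriever, where the error originated.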

'Our best method can identify the responsible agent and the failure moment with high precision in minutes, not hours,' said Zhang.

Background: Why Debugging Multi-Agent Systems Is So Hard

LLM multi-agent systems have shown immense potential in domains like software development, customer service, and scientific research. However, their autonomous nature and long reasoning chains make failures particularly insidious.

Unlike traditional software where errors are often localized, multi-agent failures can cascade: a hallucination from one agent leads to incorrect decisions downstream, masking the original source. Existing debugging tools lack the ability to automatically attribute such failures.

This research fills that gap by formalizing 'Automated Failure Attribution' as a distinct problem—the first systematic attempt to tackle it in the literature.

What This Means for AI Development

This breakthrough could accelerate the development of reliable multi-agent systems. By automating failure attribution, developers can iterate faster, reduce debugging costs, and build trust in these systems.

The open-source release allows the broader AI community to build on these methods and contribute to a shared benchmark. The researchers hope it will spur further innovation in self-diagnosing AI systems.

'Ultimately, we want multi-agent systems that can self-diagnose and recover from failures,' added Yin. 'This is a foundational step toward that vision.'

The full paper and dataset are available now. For details, visit the paper on arXiv.

