Pinpointing the Culprit: A Guide to Automated Failure Attribution in LLM Multi-Agent Systems


Overview

Multi-agent systems powered by large language models (LLMs) are increasingly used to tackle complex tasks through collaborative workflows. Yet, when a multi-agent system fails—and it often does—developers face the daunting challenge of identifying which agent caused the failure and at what point in the process. Manual inspection of lengthy interaction logs is akin to searching for a needle in a haystack, time-consuming and error-prone. To address this, researchers from Penn State University and Duke University, in collaboration with Google DeepMind, the University of Washington, Meta, Nanyang Technological University, and Oregon State University, have formally introduced the problem of Automated Failure Attribution. They have created the first benchmark dataset, named Who&When, and developed several automated attribution methods. Their work, accepted as a Spotlight presentation at ICML 2025, aims to enhance the reliability and debuggability of LLM multi-agent systems. This tutorial will guide you through the core concepts, prerequisites, and practical steps to understand and apply automated failure attribution, with code examples and best practices.

Source: syncedreview.com

Prerequisites

Before diving into automated failure attribution, make sure you are comfortable with the following:

- Python 3.8+ and virtual environments
- The basics of LLMs and how multi-agent systems built on them coordinate (planning, tool use, message passing)
- Reading structured interaction logs (JSON)
- Standard classification metrics (accuracy, precision, recall, F1)

Step-by-Step Instructions

This section walks you through the process of performing automated failure attribution using the Who&When dataset and the methods described in the paper. The steps are organized under relevant subsections.

1. Setting Up the Environment

Clone the official repository from GitHub:

git clone https://github.com/mingyin1/Agents_Failure_Attribution.git
cd Agents_Failure_Attribution

Create a Python virtual environment (Python 3.8+ recommended) and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Download the Who&When dataset and place it in the data/ directory, following the layout described in the repository's README.

2. Understanding the Dataset Structure

The Who&When dataset contains JSON files representing multi-agent task episodes. Each episode includes the participating agents, the ordered interaction steps, and ground-truth annotations: a failure label, the responsible agent, and the failure step.

Example snippet:

{
  "episode_id": "ep_001",
  "agents": ["Agent_1", "Agent_2", "Agent_3"],
  "steps": [
    {"step": 0, "agent": "Agent_1", "action": "propose_plan", "content": "..."},
    ...
  ],
  "failure_label": 1,
  "responsible_agent": "Agent_2",
  "failure_step": 4
}
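A minimal loader for episodes in this format can be sketched as follows. The field names follow the snippet above; the assumption that each episode lives in its own `.json` file is illustrative, not taken from the repository.

```python
import json
from pathlib import Path

def load_episodes(data_dir):
    """Load every episode JSON file from a directory into a list of dicts."""
    episodes = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with open(path, "r", encoding="utf-8") as f:
            episodes.append(json.load(f))
    return episodes

def failed_episodes(episodes):
    """Keep only episodes annotated as failures (failure_label == 1)."""
    return [ep for ep in episodes if ep.get("failure_label") == 1]
```

Filtering to failed episodes up front is convenient because attribution is only defined when a failure actually occurred.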

3. Preprocessing the Data

Before feeding data into attribution models, you need to convert raw logs into a structured format. The repository includes a preprocessing script. Run:

python preprocess.py --data_path data/raw --output_path data/processed

This script extracts relevant features (e.g., agent utterances, step indices) and splits data into training/validation/test sets. It also generates embeddings using a pre-trained LLM if required by the attribution method.
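The exact features the script extracts are repository-specific, but the splitting step can be sketched as below. The 80/10/10 ratio and the function name are assumptions; the one detail that matters is splitting by whole episode, never by individual step.

```python
import random

def split_episodes(episodes, seed=42, train_frac=0.8, val_frac=0.1):
    """Shuffle and split by episode (never by step) so that no fragment of
    one conversation appears in more than one split."""
    rng = random.Random(seed)
    shuffled = episodes[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```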

4. Implementing a Baseline Attribution Method

The paper proposes several baseline approaches. Here we demonstrate a simple pattern-based method: Last Agent to Speak. The assumption is that the agent who sent the last message before failure is likely responsible.

def last_agent_to_speak(episode):
    """Heuristic baseline: blame the agent who sent the final message.

    Uses only the interaction log itself; the ground-truth labels
    ('responsible_agent', 'failure_step') are reserved for evaluation.
    """
    if episode['failure_label'] != 1 or not episode['steps']:
        return None, None  # nothing to attribute
    last_step = episode['steps'][-1]
    return last_step['agent'], last_step['step']

Evaluate this method on the test set using agent-level (who) and step-level (when) accuracy.
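Evaluation reduces to exact-match accuracy on both labels. A minimal sketch, using the field names from the dataset snippet above (the function name is an assumption):

```python
def attribution_accuracy(episodes, predict_fn):
    """Compute agent-level ('who') and step-level ('when') accuracy of a
    predictor over the failed episodes in a collection."""
    who_hits = when_hits = total = 0
    for ep in episodes:
        if ep.get("failure_label") != 1:
            continue
        agent, step = predict_fn(ep)
        total += 1
        who_hits += (agent == ep["responsible_agent"])
        when_hits += (step == ep["failure_step"])
    if total == 0:
        return 0.0, 0.0
    return who_hits / total, when_hits / total
```

Any predictor with the same `(episode) -> (agent, step)` signature, including the baseline above, can be scored this way.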

5. Advanced Attribution with LLM-based Classifiers

The paper’s main contributions are LLM-based methods that reason over the entire episode in context. One approach fine-tunes a pre-trained language model (e.g., RoBERTa or T5) to predict both the responsible agent and the failure step simultaneously. The training code is provided in train.py.

To fine-tune a model:

python train.py --model_name roberta-base --num_epochs 10 --batch_size 16 --learning_rate 2e-5

This script:

  1. Loads episodes from the processed dataset.
  2. Tokenizes the concatenated step descriptions (with agent markers).
  3. Trains a multi-task classification model outputting two heads: agent ID and step index.
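Step 2 above, concatenating step descriptions with agent markers, can be sketched as follows. The exact marker format is an assumption, not the repository's scheme; what matters is that each utterance is tagged with its speaker so the model can attribute errors.

```python
def episode_to_text(episode, max_steps=None):
    """Serialize an episode into one string, prefixing each utterance with
    its agent's name and step index."""
    steps = episode["steps"]
    if max_steps is not None:
        # Truncation risks dropping the failure step; see Common Mistakes.
        steps = steps[:max_steps]
    parts = []
    for s in steps:
        parts.append(f"[{s['agent']}] step {s['step']} ({s['action']}): {s['content']}")
    return "\n".join(parts)
```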

After training, evaluate:

python evaluate.py --checkpoint checkpoints/best_model.ckpt --test_path data/processed/test.json

The output will report metrics like accuracy, precision, recall, and F1 for both attribution tasks.

6. Interpreting Attribution Results

Once you have predictions, you can analyze failure patterns. The paper also introduces a visualization tool in the repository. Run:

python visualize.py --results results.csv

This generates a heatmap showing which agents are most frequently implicated at which steps. Use this to identify systemic issues, such as a particular agent consistently failing during information handoffs.
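Under the hood, the heatmap is just an agent-by-step count matrix over the predictions. A plotting-free sketch of that aggregation (the column names in results.csv are assumptions):

```python
import csv
from collections import Counter

def implication_counts(results_path):
    """Count how often each (agent, step) pair is predicted as the failure
    point -- the data behind the heatmap."""
    counts = Counter()
    with open(results_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[(row["predicted_agent"], int(row["predicted_step"]))] += 1
    return counts
```

Sorting `counts.most_common()` surfaces the most frequently implicated agent/step pairs even without a plot.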

Common Mistakes

Here are pitfalls to avoid when performing automated failure attribution:

- Leaking labels: using the ground-truth failure_step or responsible_agent anywhere in a prediction pipeline inflates results. Keep labels strictly for evaluation.
- Splitting by step instead of by episode, which lets fragments of one conversation appear in both training and test sets.
- Blaming the symptom, not the cause: the last agent to act before a visible failure is often reacting to an earlier mistake by another agent.
- Reporting only "who" accuracy: a method can name the right agent while missing the failure step, so evaluate both tasks.
- Truncating long episodes carelessly: if the failure step falls outside the model's context window, no classifier can recover it.

Summary

Automated failure attribution in LLM multi-agent systems is a critical step toward building reliable and debuggable collaborative AI. This tutorial introduced the concept, prerequisites, and a hands-on guide using the Who&When benchmark created by researchers from Penn State and Duke. By preprocessing interaction logs, implementing baseline and LLM-based attribution methods, and avoiding common pitfalls, you can systematically identify which agent caused a failure and at which step. The open-source code and dataset empower you to apply these techniques to your own multi-agent systems, accelerating debugging and optimization. As multi-agent systems become more prevalent, automated attribution will be an essential tool in every developer’s toolkit.
