Failure Modes: Why Agents Break

> MASFT (Berkeley, 2025) catalogs 14 multi-agent failure modes in 3 categories. Microsoft's Taxonomy documents how existing AI failures are amplified in agentic settings. Industry field data converges on five recurring modes: hallucinated actions, scope creep, cascading errors, context loss, and tool misuse.

Type: Learn + Build

Languages: Python (stdlib)

Prerequisites: Phase 14 · 05 (Self-Refine and CRITIC), Phase 14 · 24 (Observability)

Time: ~60 minutes

Learning Objectives

The Problem

Teams ship agents that work on 90% of traces. The failing 10% is not random noise: those failures fall into a small number of recurring categories. Once you can name them, you can monitor for them and fix them.

The Concept

MASFT (Berkeley, arXiv:2503.13657)

Multi-Agent System Failure Taxonomy: 14 failure modes clustered into 3 categories. Inter-annotator agreement reaches a Cohen's kappa of 0.88, so annotators reliably distinguish the categories.
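Cohen's kappa corrects raw agreement for the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the chance agreement implied by each annotator's label frequencies. A minimal stdlib sketch follows; the labels are toy values, not MASFT data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; treat as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Toy labels for six traces (hypothetical, not MASFT data).
ann_1 = ["design", "design", "verify", "verify", "misalign", "design"]
ann_2 = ["design", "verify", "verify", "verify", "misalign", "design"]
print(round(cohens_kappa(ann_1, ann_2), 3))  # 17/23 ~ 0.739
```

Here the annotators agree on 5 of 6 traces (p_o ≈ 0.83), but chance alone would produce p_e = 13/36 ≈ 0.36, so kappa lands well below raw agreement.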

Central claim: failures are fundamental design flaws in multi-agent systems, not LLM limitations to be fixed with better base models.

Microsoft Taxonomy of Failure Mode in Agentic AI Systems

Characterizing Faults in Agentic AI (arXiv:2603.06847)

LLM Agent Hallucinations Survey (arXiv:2509.18970)

Two primary manifestations:

  1. Instruction-following Deviation — agent doesn't follow the system prompt.
  2. Long-range Contextual Misuse — agent forgets or misapplies context from earlier turns.

Sub-intention errors: Omission (missed step), Redundancy (repeated step), Disorder (out-of-order steps).
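Omission, redundancy, and disorder can be detected mechanically when both the agent's plan and its executed steps are logged. The sketch below assumes a hypothetical trace where each is a list of step names. The disorder check is a simple position-by-position comparison, so it flags every displaced step rather than a minimal set of moves.

```python
def sub_intention_errors(plan, executed):
    """Diff a plan against the executed step sequence (heuristic sketch).

    Assumes step names are unique within the plan; executed steps that are
    not in the plan are ignored here.
    """
    ran = set(executed)
    # Omission: planned steps that never ran.
    omission = [step for step in plan if step not in ran]
    # Redundancy: steps that ran more than once.
    seen, redundancy = set(), []
    for step in executed:
        if step in seen and step not in redundancy:
            redundancy.append(step)
        seen.add(step)
    # Disorder: compare first-run order with plan order, position by position.
    first_runs = []
    for step in executed:
        if step in plan and step not in first_runs:
            first_runs.append(step)
    expected = [step for step in plan if step in ran]
    disorder = [got for got, want in zip(first_runs, expected) if got != want]
    return {"omission": omission, "redundancy": redundancy, "disorder": disorder}


plan = ["fetch", "validate", "transform", "write"]
executed = ["fetch", "transform", "validate", "transform"]
print(sub_intention_errors(plan, executed))
# {'omission': ['write'], 'redundancy': ['transform'], 'disorder': ['transform', 'validate']}
```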

The five industry-recurring modes

Field analyses from Arize, Galileo, and NimbleBrain (2024-2026) converge on five modes:

  1. Hallucinated actions. Agent invokes a tool that doesn't exist or fabricates arguments.
  2. Scope creep. Agent expands task beyond the user's ask (creates extra PRs, sends extra emails).
  3. Cascading errors. One wrong call triggers downstream effects. For example, a single hallucinated (phantom) SKU triggers four API calls and becomes a multi-system incident.
  4. Context loss. Long-horizon tasks forget early-turn constraints.
  5. Tool misuse. Calls the right tool with wrong arguments, or the wrong tool entirely.
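Several of these modes can be caught with cheap static checks before a call executes. This is a hypothetical sketch: the registry, tool names, and tag strings are illustrative, not a real library's API. It tags calls to nonexistent tools or with invented argument names as hallucinated actions, real tools outside the task's allowed set as scope creep, and missing or mistyped arguments as tool misuse. Fabricated argument values (such as a phantom SKU) and picking the wrong tool entirely need environment lookups or intent checks that a schema cannot provide.

```python
# Hypothetical tool registry: tool name -> {argument name: expected Python type}.
REGISTRY = {
    "search_orders": {"customer_id": str},
    "refund_order": {"order_id": str, "amount": float},
    "send_email": {"to": str, "body": str},
}


def tag_call(call, allowed_tools):
    """Tag one tool call with failure modes using only the registry (heuristic)."""
    name, args = call["tool"], call.get("args", {})
    schema = REGISTRY.get(name)
    if schema is None:
        return ["hallucinated_action"]  # the tool does not exist at all
    tags = []
    if name not in allowed_tools:
        tags.append("scope_creep")  # real tool, but outside what this task needs
    if args.keys() - schema.keys():
        tags.append("hallucinated_action")  # fabricated argument names
    missing = schema.keys() - args.keys()
    wrong_type = [k for k, t in schema.items() if k in args and not isinstance(args[k], t)]
    if missing or wrong_type:
        tags.append("tool_misuse")  # right tool, wrong arguments
    return tags


calls = [
    {"tool": "search_orders", "args": {"customer_id": "c-17"}},
    {"tool": "lookup_sku", "args": {"sku": "X-999"}},
    {"tool": "refund_order", "args": {"order_id": "o-5", "amount": "12.50"}},
    {"tool": "send_email", "args": {"to": "ops@example.com", "body": "fyi"}},
]
allowed = {"search_orders", "refund_order"}
for c in calls:
    print(c["tool"], tag_call(c, allowed))
```

Because `isinstance(10, float)` is `False`, an integer `amount` would also be tagged; a production check would likely coerce or widen numeric types.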

Cascading is the killer. Agents cannot reliably distinguish "I failed" from "the task is impossible," and to close the loop they often hallucinate a success message after an HTTP 400 error.
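The success-after-400 pattern leaves a visible trace signature: an error status on the last tool result, followed by a final answer that claims success. The sketch below assumes a hypothetical event schema and uses a crude keyword regex, so it will miss paraphrased claims and can misfire on negations such as "could not complete".

```python
import re

# Crude keyword match; a real detector would need negation handling or an LLM judge.
SUCCESS_CLAIM = re.compile(r"\b(done|success(fully)?|completed?|created|sent)\b", re.IGNORECASE)


def claims_success_after_error(trace):
    """True if the final answer claims success while the last tool result was an error.

    trace: ordered events, e.g. {"type": "tool_result", "status": 400}
    or {"type": "final", "text": "..."} (hypothetical schema).
    """
    last_status = None
    for event in trace:
        if event["type"] == "tool_result":
            last_status = event.get("status")
        elif event["type"] == "final":
            errored = last_status is not None and last_status >= 400
            return errored and bool(SUCCESS_CLAIM.search(event["text"]))
    return False  # no final answer in the trace


faked = [{"type": "tool_result", "status": 400},
         {"type": "final", "text": "Refund completed successfully."}]
recovered = [{"type": "tool_result", "status": 400},
             {"type": "tool_result", "status": 200},
             {"type": "final", "text": "Done."}]
print(claims_success_after_error(faked), claims_success_after_error(recovered))  # True False
```

Only the last tool result is checked, so an agent that hits a 400, retries, and succeeds is not flagged.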

Mitigation: gates at every step

Place automated verification gates at every step of a reasoning chain, checking each step's factual grounding against the environment state before the next step runs.
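A minimal sketch of a per-step gate, assuming each step reports what it claims to have done and that a check function can confirm that claim against environment state. The `run_with_gates` interface and the order/refund example are hypothetical. The design choice is to halt on the first ungrounded claim, which trades completion rate for a bounded blast radius.

```python
class GateFailure(Exception):
    """A step's claimed effect is not visible in the environment state."""


def run_with_gates(steps, env):
    """Run steps in order, verifying each claim against env before continuing.

    steps: list of (name, action, check). action(env) performs the step and
    returns the agent's claim; check(env, claim) returns True only if the
    claim is grounded in env. Hypothetical interface for illustration.
    """
    completed = []
    for name, action, check in steps:
        claim = action(env)
        if not check(env, claim):
            # Halt here so one ungrounded step cannot cascade into later ones.
            raise GateFailure(f"step {name!r} claimed {claim!r}; env disagrees")
        completed.append(name)
    return completed


def close_order(env):
    env["orders"]["o-1"] = "closed"
    return "o-1"


def fake_refund(env):
    return "r-9"  # claims a refund id but never writes one


def send_receipt(env):
    env.setdefault("emails", []).append("receipt")
    return "receipt"


env = {"orders": {"o-1": "open"}, "refunds": {}}
steps = [
    ("close", close_order, lambda e, oid: e["orders"].get(oid) == "closed"),
    ("refund", fake_refund, lambda e, rid: rid in e["refunds"]),
    ("receipt", send_receipt, lambda e, kind: kind in e.get("emails", [])),
]
try:
    run_with_gates(steps, env)
except GateFailure as err:
    print(err)  # the receipt step never runs
```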

Where failure monitoring goes wrong

Build It

code/main.py implements a stdlib failure-mode tagger:

Run it:

python3 code/main.py

Output: per-trace labels plus an aggregate distribution, a cheap reproduction of what Phoenix's trace clustering surfaces.
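The contents of code/main.py are not reproduced here. As an illustration of the same idea, here is a hypothetical, minimal tagger with three heuristic detectors (unknown tool, tool outside the task's allowed set, and an error followed by a call to a different tool) plus the aggregate distribution. The trace schema and tool names are assumptions.

```python
from collections import Counter

KNOWN_TOOLS = {"search_orders", "refund_order", "send_email"}  # hypothetical registry


def tag_trace(trace):
    """Return sorted failure-mode labels for one trace (heuristic sketch)."""
    labels = set()
    calls = trace["calls"]
    for i, call in enumerate(calls):
        if call["tool"] not in KNOWN_TOOLS:
            labels.add("hallucinated_action")
        elif call["tool"] not in trace["allowed_tools"]:
            labels.add("scope_creep")
        # Cascade heuristic: an error followed by a call to a different tool,
        # i.e. the agent kept building on a failed step instead of retrying it.
        if call.get("status", 200) >= 400 and any(
            later["tool"] != call["tool"] for later in calls[i + 1:]
        ):
            labels.add("cascading_error")
    return sorted(labels)


def distribution(traces):
    """Share of traces that carry each label."""
    counts = Counter(label for t in traces for label in tag_trace(t))
    return {label: counts[label] / len(traces) for label in sorted(counts)}


SAMPLE_TRACES = [
    {"id": "t1", "allowed_tools": {"search_orders", "refund_order"},
     "calls": [{"tool": "search_orders", "status": 200},
               {"tool": "refund_order", "status": 200}]},
    {"id": "t2", "allowed_tools": {"search_orders"},
     "calls": [{"tool": "lookup_sku", "status": 404},
               {"tool": "search_orders", "status": 200}]},
    {"id": "t3", "allowed_tools": {"refund_order"},
     "calls": [{"tool": "refund_order", "status": 200},
               {"tool": "send_email", "status": 200}]},
]

if __name__ == "__main__":
    for t in SAMPLE_TRACES:
        print(t["id"], tag_trace(t))
    print(distribution(SAMPLE_TRACES))
```

A trace can carry several labels at once, so the shares in the distribution need not sum to 1.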

Use It

Ship It

outputs/skill-failure-detector.md generates failure-mode detectors tailored to your domain, wired to a trace store.

Exercises

  1. Add a detector for "success hallucination": agent returns success but the target state is unchanged.
  2. Tag 100 real traces from a product you've built. Which mode dominates? What's the cost of fixing it?
  3. Implement a "cascade radius" metric: given a failure at step N, how many downstream steps did it affect?
  4. Read MASFT's 14 failure modes. Pick three that apply to your product. Write detectors.
  5. Wire one detector into a CI job: fail the build if 5% or more of traces are tagged with its mode.

Key Terms

| Term | What people say | What it actually means |
|---|---|---|
| MASFT | "Multi-agent failure taxonomy" | Berkeley's taxonomy of 14 multi-agent failure modes in 3 categories |
| Cascading error | "Ripple failure" | One early mistake propagates through N downstream steps |
| Context loss | "Forgot the constraint" | On a long-horizon task, the agent drops facts or constraints from early turns |
| Tool misuse | "Wrong tool / wrong args" | Right tool with wrong arguments, or the wrong tool entirely |
| Success hallucination | "Faked completion" | Agent claims success after an error (e.g. an HTTP 400); target state unchanged |
| Scope creep | "Overreach" | Agent does more than asked |
| Instruction-following deviation | "Disobedience" | Agent ignores the system prompt or a user constraint |
| Sub-intention errors | "Plan bugs" | Omission, redundancy, or disorder in plan execution |

Further Reading