
Failure Modes: Why Agents Break

> MASFT (Berkeley, 2025) catalogs 14 multi-agent failure modes in 3 categories. Microsoft's Taxonomy documents how existing AI failures amplify in agentic settings. Industry field data converges on five recurring modes: hallucinated actions, scope creep, cascading errors, context loss, tool misuse.

Type: Learn + Build

Languages: Python (stdlib)

Prerequisites: Phase 14 · 05 (Self-Refine and CRITIC), Phase 14 · 24 (Observability)

Time: ~60 minutes

Learning Objectives

Name MASFT's three failure categories and at least two specific modes in each.
Explain why agentic failure amplifies existing AI failure modes (bias, hallucination).
Describe the five industry-recurring modes and their mitigations.
Implement a stdlib detector that tags agent traces with failure-mode labels.

The Problem

Teams ship agents that work on 90% of traces. The remaining 10% are not random noise: the failures fall into a small number of recurring categories. Once you can name them, you can monitor for them and fix them.

The Concept

MASFT (Berkeley, arXiv:2503.13657)

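Multi-Agent System Failure Taxonomy. 14 failure modes clustered into 3 categories: specification and system design failures, inter-agent misalignment, and task verification and termination. Inter-annotator agreement is high (Cohen's kappa 0.88), so the categories are reliably distinguishable.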
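For intuition about that agreement statistic: Cohen's kappa compares the agreement two annotators actually reach with the agreement they would reach by chance. The sketch below is a minimal stdlib version. The labels and annotator lists are made up for illustration and are not MASFT data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Made-up labels for illustration only (not MASFT data).
annotator_1 = ["spec", "spec", "misalign", "verify", "misalign", "spec"]
annotator_2 = ["spec", "spec", "misalign", "verify", "spec", "spec"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.71"
```

Here the annotators agree on 5 of 6 items, but some of that agreement is expected by chance, so kappa lands well below the raw agreement rate.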

Central claim: failures are fundamental design flaws in multi-agent systems, not LLM limitations to be fixed with better base models.

Microsoft Taxonomy of Failure Modes in Agentic AI Systems

Existing AI failures (bias, hallucination, data leakage) amplify in agentic settings.
New failures emerge from autonomy: unintended action at scale, tool misuse, mission drift.
The whitepaper is the risk register for agentic products.

Characterizing Faults in Agentic AI (arXiv:2603.06847)

Failures arise from orchestration, internal state evolution, and environment interaction, not just from "bad code" or "bad model output."

LLM Agent Hallucinations Survey (arXiv:2509.18970)

Two primary manifestations:

Instruction-following Deviation — agent doesn't follow the system prompt.
Long-range Contextual Misuse — agent forgets or misapplies context from earlier turns.

Sub-intention errors: Omission (missed step), Redundancy (repeated step), Disorder (out-of-order steps).
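The three sub-intention errors can be checked mechanically when a planned step list is available. The sketch below is one possible approach, not the survey's method. It assumes steps are plain strings and that each planned step appears once in the plan. The step names are hypothetical.

```python
from collections import Counter

def sub_intention_errors(planned, executed):
    """Compare a planned step list with the steps actually executed.

    Assumes steps are plain strings and each planned step is unique.
    """
    executed_counts = Counter(executed)
    # Omission: planned steps that never ran.
    omission = [s for s in planned if s not in executed_counts]
    # Redundancy: steps that ran more than once.
    redundancy = [s for s, c in executed_counts.items() if c > 1]
    # Disorder: ignore repeats and unplanned steps, then compare relative order.
    seen = set()
    executed_order = []
    for s in executed:
        if s in planned and s not in seen:
            seen.add(s)
            executed_order.append(s)
    planned_order = [s for s in planned if s in seen]
    return {
        "omission": omission,
        "redundancy": redundancy,
        "disorder": executed_order != planned_order,
    }

# Hypothetical plan and run for illustration.
plan = ["fetch_order", "check_stock", "charge_card", "send_receipt"]
run = ["fetch_order", "charge_card", "check_stock", "charge_card"]
report = sub_intention_errors(plan, run)
print(report)
```

In this run the receipt is never sent (omission), the card is charged twice (redundancy), and the charge happens before the stock check (disorder).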

The five industry-recurring modes

Field analyses from Arize, Galileo, and NimbleBrain (2024-2026) converge on five modes:

Hallucinated actions. Agent invokes a tool that doesn't exist or fabricates arguments.
Scope creep. Agent expands task beyond the user's ask (creates extra PRs, sends extra emails).
Cascading errors. One wrong call triggers downstream effects. For example, a single hallucinated (phantom) SKU can trigger four API calls, turning one bad output into a multi-system incident.
Context loss. On long-horizon tasks, the agent forgets constraints set in early turns.
Tool misuse. Agent calls the right tool with wrong arguments, or the wrong tool entirely.

Cascading is the killer. Agents often cannot distinguish "I failed" from "the task is impossible", and may hallucinate a success message after an HTTP 400 error just to close the loop.
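Several of these modes leave signatures that a few lines of Python can catch. The sketch below checks tool calls against a registry and flags steps that claim success after an error status. `TOOL_REGISTRY`, the call and step dictionary shapes, and the keyword list are all assumptions made for illustration.

```python
# Hypothetical tool registry: tool name -> required argument names and types.
TOOL_REGISTRY = {
    "get_inventory": {"sku": str},
    "create_order": {"sku": str, "quantity": int},
}

def check_tool_call(call):
    """Return a failure-mode label for one tool call, or None if it looks valid."""
    schema = TOOL_REGISTRY.get(call["tool"])
    if schema is None:
        return "hallucinated_action"  # the tool does not exist
    args = call.get("args", {})
    for name, expected_type in schema.items():
        if name not in args or not isinstance(args[name], expected_type):
            return "tool_misuse"  # missing or wrongly typed argument
    if set(args) - set(schema):
        return "hallucinated_action"  # fabricated argument
    return None

def claims_success_on_error(step):
    """Flag a step whose tool returned HTTP 4xx/5xx but whose message claims success."""
    failed = step["status"] >= 400
    message = step["message"].lower()
    claims = any(word in message for word in ("success", "done", "completed"))
    return failed and claims

calls = [
    {"tool": "get_inventory", "args": {"sku": "A-100"}},
    {"tool": "refund_all", "args": {}},
    {"tool": "create_order", "args": {"sku": "A-100", "quantity": "two"}},
]
labels = [check_tool_call(c) for c in calls]
print(labels)  # [None, 'hallucinated_action', 'tool_misuse']

step = {"status": 400, "message": "Order created successfully."}
print(claims_success_on_error(step))  # True
```

Keyword matching on the message is deliberately crude. Re-probing the environment is a stronger check when the state can be queried.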

Mitigation: gates at every step

Put automated verification gates at every step of the reasoning chain, each checking factual grounding against environment state. Concretely:

Per-step safety classifier (Lesson 21).
Tool-call argument validation (Lesson 06).
Cross-check retrieved content against known facts (Lesson 05, CRITIC).
Detect success hallucination by re-probing state (was the file actually created?).
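The last gate, re-probing state, can be sketched as a wrapper that runs an action and then asks the environment whether the claimed effect exists. `gated_step` and the two write functions are hypothetical names for illustration. A real gate would probe whatever system the tool touches.

```python
import os
import tempfile

def gated_step(action, probe, description):
    """Run one agent action, then verify its claimed effect by probing real state."""
    claimed = action()   # what the agent reports
    verified = probe()   # what the environment actually shows
    if claimed and not verified:
        return {"step": description, "ok": False, "label": "success_hallucination"}
    return {"step": description, "ok": bool(claimed and verified), "label": None}

with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "report.txt")
    missing = os.path.join(workdir, "missing.txt")

    def honest_write():
        with open(target, "w") as f:
            f.write("done")
        return True

    def fake_write():
        return True  # claims success without touching the file system

    good = gated_step(honest_write, lambda: os.path.exists(target), "write report")
    bad = gated_step(fake_write, lambda: os.path.exists(missing), "write missing")

print(good["ok"], bad["label"])  # True success_hallucination
```

The gate trusts the probe, not the agent's report. This is what stops a hallucinated success from cascading into later steps.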

Where failure monitoring goes wrong

Tagging only crashes. Most agent failures produce valid-looking output, so you need content-level checks.
No baseline. Drift detection needs a last-known-good; without it you cannot say "this is getting worse."
Over-alerting. Every failure produces a page. Cluster failures by mode and rate-limit the alerts.
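The baseline and over-alerting points can be sketched together: compare per-mode failure rates against a last-known-good snapshot, and page at most once per mode within a cooldown window. The rates, the `tolerance` and `cooldown_s` defaults, and the `AlertLimiter` class are illustrative assumptions, not recommended values.

```python
def drifted_modes(baseline, current, tolerance=0.05):
    """Return modes whose failure rate rose more than `tolerance` above baseline."""
    return sorted(
        mode for mode, rate in current.items()
        if rate - baseline.get(mode, 0.0) > tolerance
    )

class AlertLimiter:
    """Allow at most one page per failure mode within `cooldown_s` seconds."""

    def __init__(self, cooldown_s=600):
        self.cooldown_s = cooldown_s
        self.last_sent = {}

    def should_page(self, mode, now):
        last = self.last_sent.get(mode)
        if last is not None and now - last < self.cooldown_s:
            return False  # same mode, still cooling down
        self.last_sent[mode] = now
        return True

# Hypothetical per-mode failure rates (fraction of traces tagged).
baseline = {"tool_misuse": 0.02, "context_loss": 0.03}
current = {"tool_misuse": 0.10, "context_loss": 0.04, "scope_creep": 0.01}
drifted = drifted_modes(baseline, current)
print(drifted)  # ['tool_misuse']

limiter = AlertLimiter(cooldown_s=600)
pages = [limiter.should_page("tool_misuse", t) for t in (0, 30, 700)]
print(pages)  # [True, False, True]
```

Timestamps are passed in explicitly so the limiter is easy to test. In production the caller would pass something like `time.time()`.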

Build It

code/main.py implements a stdlib failure-mode tagger:

A synthetic trace dataset covering the five modes.
Detector functions per mode (signature patterns on tool calls, outputs, repeat actions).
A tagger that labels each trace and reports mode distribution.
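To make the shape of such a tagger concrete, here is a minimal sketch. It is not the contents of code/main.py. The trace format, the two detectors, and the label names are assumptions chosen to keep the example short.

```python
from collections import Counter

def detect_scope_creep(trace):
    # Any action outside the set the user asked for.
    return any(a not in trace["requested_actions"] for a in trace["actions"])

def detect_repeated_action(trace):
    # The same action issued twice in a row: a crude retry-loop signal.
    acts = trace["actions"]
    return any(a == b for a, b in zip(acts, acts[1:]))

DETECTORS = {
    "scope_creep": detect_scope_creep,
    "repeated_action": detect_repeated_action,
}

def tag_trace(trace):
    """Return the sorted list of failure-mode labels that fire on one trace."""
    return sorted(mode for mode, detect in DETECTORS.items() if detect(trace))

def mode_distribution(traces):
    """Count traces per label; traces with no label count as 'clean'."""
    counts = Counter()
    for trace in traces:
        counts.update(tag_trace(trace) or ["clean"])
    return counts

# Tiny synthetic dataset for illustration.
traces = [
    {"requested_actions": {"open_pr"}, "actions": ["open_pr"]},
    {"requested_actions": {"open_pr"}, "actions": ["open_pr", "send_email"]},
    {"requested_actions": {"call_api"}, "actions": ["call_api", "call_api", "call_api"]},
]
distribution = mode_distribution(traces)
print(dict(distribution))
```

Keeping detectors in a dictionary means a new mode is one function plus one registry entry, and a trace can carry several labels at once.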

Run it:

python3 code/main.py

Output: per-trace labels + aggregate distribution, a cheap reproduction of what Phoenix's trace clustering surfaces.

Use It

Phoenix for production drift clustering (Lesson 24).
Langfuse for session replay + annotation.
Custom detectors for domain-specific signatures your observability platform can't detect.

Ship It

outputs/skill-failure-detector.md generates failure-mode detectors tailored to your domain, wired to a trace store.

Exercises

Add a detector for "success hallucination": agent returns success but the target state is unchanged.
Tag 100 real traces from a product you've built. Which mode dominates? What's the cost of fixing it?
Implement a "cascade radius" metric: given a failure at step N, how many downstream steps did it affect?
Read MASFT's 14 failure modes. Pick three that apply to your product. Write detectors.
Wire one detector into a CI job: fail the build if >=5% of traces are tagged with its failure mode.

Key Terms

| Term | What people say | What it actually means |
|---|---|---|
| MASFT | "Multi-agent failure taxonomy" | Berkeley taxonomy of 14 failure modes in 3 categories |
| Cascading error | "Ripple failure" | One early mistake propagates through N steps |
| Context loss | "Forgot the constraint" | Agent drops early-turn facts on a long-horizon task |
| Tool misuse | "Wrong tool / wrong args" | Right tool with wrong arguments, or the wrong tool entirely |
| Success hallucination | "Faked completion" | Agent claims success after an error (e.g., HTTP 400); state unchanged |
| Scope creep | "Overreach" | Agent does more than asked |
| Instruction-following deviation | "Disobedience" | Ignores system prompt or user constraint |
| Sub-intention errors | "Plan bugs" | Omission, redundancy, disorder in plan execution |