Orchestration Patterns: Supervisor, Swarm, Hierarchical

> Four orchestration patterns recur across 2026 frameworks: supervisor-worker, swarm / peer-to-peer, hierarchical, and debate. Anthropic's guidance: "It's about building the right system for your needs." Start simple; add topology only when a single agent plus the five workflow patterns is insufficient.

Type: Learn + Build

Languages: Python (stdlib)

Prerequisites: Phase 14 · 12 (Workflow Patterns), Phase 14 · 25 (Multi-Agent Debate)

Time: ~60 minutes

Learning Objectives

The Problem

Teams reach for "multi-agent" before they need it. Four patterns recur across frameworks; once you can name them, you can pick the right one — or skip topology entirely.

The Concept

Supervisor-worker

Frameworks: LangGraph create_supervisor, Anthropic orchestrator-workers, CrewAI Hierarchical Process.

2026 LangChain recommendation: implement supervision through direct tool calls rather than create_supervisor. This gives finer control over context engineering: you decide exactly what each specialist sees.
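A minimal stdlib sketch of tool-call-based supervision, under stated assumptions: the specialists, the `scripted_supervisor` router, and the dict shape of a "tool call" are all hypothetical stand-ins, not LangGraph or LangChain APIs. The point it tries to show is that the supervisor, not a library, chooses the context each specialist receives.

```python
from typing import Callable, Dict, List

# Hypothetical specialists. Each one sees only the `context` string it is handed.
def refund_agent(context: str) -> str:
    return f"refund agent handled: {context}"

def bug_agent(context: str) -> str:
    return f"bug agent handled: {context}"

def sales_agent(context: str) -> str:
    return f"sales agent handled: {context}"

# Specialists are exposed to the supervisor as plain tools.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "refund": refund_agent,
    "bug": bug_agent,
    "sales": sales_agent,
}

def scripted_supervisor(history: List[str]) -> dict:
    """Stand-in for a model call: keyword match on the latest message."""
    latest = history[-1]
    for name in SPECIALISTS:
        if name in latest.lower():
            # Context engineering happens here: pass the latest message only,
            # not the full history.
            return {"tool": name, "args": {"context": latest}}
    return {"tool": None, "args": {}}

def supervise(history: List[str]) -> str:
    call = scripted_supervisor(history)
    if call["tool"] is None:
        return "supervisor: no specialist matched"
    return SPECIALISTS[call["tool"]](**call["args"])

print(supervise(["hello", "I want a refund for order 17"]))
```

Because the specialists never call each other, every trace has the same shape: supervisor, one specialist, done.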

Swarm / peer-to-peer

Frameworks: LangGraph swarm topology, OpenAI Agents SDK handoffs (when all agents can hand off to all others).
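As a rough illustration of peer-to-peer handoffs, the sketch below uses plain functions that return either an answer or the name of the peer to hand off to. The peer names, the `("handoff", name)` tuple protocol, and the `max_hops` bound are assumptions made for this example; they do not mirror the LangGraph or OpenAI Agents SDK interfaces.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[str, str]  # ("answer", text) or ("handoff", peer_name)

def triage(msg: str) -> Result:
    if "refund" in msg:
        return ("handoff", "billing")
    if "bug" in msg:
        return ("handoff", "support")
    return ("answer", "triage: handled directly")

def billing(msg: str) -> Result:
    if "bug" in msg:
        # Any peer may hand off to any other peer; there is no central router.
        return ("handoff", "support")
    return ("answer", "billing: refund issued")

def support(msg: str) -> Result:
    return ("answer", "support: bug logged")

PEERS: Dict[str, Callable[[str], Result]] = {
    "triage": triage,
    "billing": billing,
    "support": support,
}

def run_swarm(msg: str, start: str = "triage", max_hops: int = 3) -> Tuple[str, List[str]]:
    """Follow handoffs until a peer answers or the hop budget is spent."""
    current, trace, hops = start, [start], 0
    while True:
        kind, value = PEERS[current](msg)
        if kind == "answer":
            return value, trace
        if hops >= max_hops:
            return "swarm: hop limit reached", trace
        hops += 1
        current = value
        trace.append(current)

print(run_swarm("refund please"))
```

The trace is the only record of who handled what, which is one reason swarms tend to be harder to debug than a supervisor.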

Hierarchical

When you need it: when a single supervisor's context budget cannot hold descriptions of all specialists.
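One way to picture the context-budget argument is a two-level router: the top supervisor holds only short team descriptions, and each team supervisor holds only its own specialists. The sketch below assumes a scripted keyword router in place of an LLM; the team names, descriptions, and specialists are invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# The top supervisor sees three team descriptions, not six specialists.
TEAM_DESCRIPTIONS: Dict[str, str] = {
    "billing": "refund invoice",
    "engineering": "bug outage",
    "growth": "sales upgrade",
}

# Each team supervisor sees only the specialists nested under it.
TEAMS: Dict[str, Dict[str, Callable[[str], str]]] = {
    "billing": {
        "refund": lambda m: "billing/refund: refund issued",
        "invoice": lambda m: "billing/invoice: invoice resent",
    },
    "engineering": {
        "bug": lambda m: "engineering/bug: ticket filed",
        "outage": lambda m: "engineering/outage: incident opened",
    },
    "growth": {
        "sales": lambda m: "growth/sales: quote sent",
        "upgrade": lambda m: "growth/upgrade: plan changed",
    },
}

def pick(message: str, descriptions: Dict[str, str]) -> Optional[str]:
    """Scripted router: first entry whose description shares a word with the message."""
    words = set(message.lower().split())
    for name, description in descriptions.items():
        if words & set(description.split()):
            return name
    return None

def run_hierarchy(message: str) -> Tuple[str, List[str]]:
    trace = ["top"]
    team = pick(message, TEAM_DESCRIPTIONS)
    if team is None:
        return "top: no team matched", trace
    trace.append(team)
    specialists = TEAMS[team]
    # The team-level router only ever sees this team's specialist names.
    name = pick(message, {n: n for n in specialists})
    if name is None:
        return f"{team}: no specialist matched", trace
    trace.append(name)
    return specialists[name](message), trace

print(run_hierarchy("there is a bug in checkout"))
```

Each extra level costs one more routing call per request, which is why nesting is only worth it once a flat supervisor no longer fits.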

Debate
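Lesson 25 covers debate in depth. Going only by the Key Terms entry in this lesson (parallel proposers, cross-critique), a minimal stdlib sketch might look like the following. The fixed proposer answers, the agree-or-disagree critique, and the highest-score aggregation rule are all assumptions for illustration, and the proposers run sequentially here rather than in parallel.

```python
from typing import Callable, Dict, Tuple

# Hypothetical scripted proposers: each returns a fixed classification.
PROPOSERS: Dict[str, Callable[[str], str]] = {
    "a": lambda task: "refund",
    "b": lambda task: "refund",
    "c": lambda task: "bug",
}

def critique(own_answer: str, proposal: str) -> int:
    """Scripted critique: endorse (1) a proposal that matches the critic's own answer."""
    return 1 if proposal == own_answer else 0

def debate(task: str) -> Tuple[str, Dict[str, int], int]:
    ops = 0
    # Round 1: every proposer answers independently.
    proposals = {}
    for name, propose in PROPOSERS.items():
        proposals[name] = propose(task)
        ops += 1
    # Round 2: every proposer critiques every other proposer's answer.
    scores = {name: 0 for name in proposals}
    for critic, own_answer in proposals.items():
        for author, proposal in proposals.items():
            if critic != author:
                scores[author] += critique(own_answer, proposal)
                ops += 1
    # Assumed aggregation: the proposal with the most endorsements wins.
    winner = max(scores, key=scores.get)
    return proposals[winner], scores, ops

answer, scores, ops = debate("classify: I was charged twice")
print(answer, scores, ops)
```

With n proposers this sketch makes n proposal calls plus n(n-1) critique calls, so the call count grows quadratically with the number of proposers.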

CrewAI Crew vs Flow

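CrewAI formalizes two deployment modes: Crew, an autonomous team built on role-based collaboration, and Flow, a deterministic, event-driven workflow intended for production.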

This is orthogonal to the four patterns above but maps to topology: Flow is typically supervisor or hierarchical; Crew is typically supervisor with an LLM router.

Anthropic's guidance

"Success in the LLM space isn't about building the most sophisticated system. It's about building the right system for your needs."

Decision order:

  1. Single agent + workflow patterns (Lesson 12) — start here.
  2. Supervisor-worker — when you have 2-4 specialists.
  3. Swarm — when latency matters more than reasoning clarity.
  4. Hierarchical — only when supervisor context budget fails.
  5. Debate — when accuracy matters more than cost.
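The decision order above can be sketched as a ladder of triggers. The `Needs` fields and the `candidates` helper are hypothetical names for this example. The list does not say how to break ties when several triggers hold, so the sketch only reports every rung whose trigger holds, in order, and leaves the final choice to the reader.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Needs:
    specialists: int = 0                   # number of distinct specialist roles
    fits_supervisor_context: bool = True   # can one supervisor hold every specialist description?
    latency_over_clarity: bool = False     # latency matters more than reasoning clarity
    accuracy_over_cost: bool = False       # accuracy matters more than cost

# Rungs in the lesson's order. The single-agent baseline always qualifies.
LADDER: List[Tuple[str, Callable[[Needs], bool]]] = [
    ("single agent + workflow patterns", lambda n: True),
    ("supervisor-worker", lambda n: 2 <= n.specialists <= 4),
    ("swarm", lambda n: n.latency_over_clarity),
    ("hierarchical", lambda n: not n.fits_supervisor_context),
    ("debate", lambda n: n.accuracy_over_cost),
]

def candidates(needs: Needs) -> List[str]:
    """Return every rung whose trigger holds, in decision order."""
    return [name for name, trigger in LADDER if trigger(needs)]

print(candidates(Needs(specialists=3)))
print(candidates(Needs(specialists=12, fits_supervisor_context=False)))
```

If only the baseline comes back, that is the signal to stay with a single agent.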

Where this pattern goes wrong

Build It

code/main.py implements all four patterns using only the Python standard library, running against a scripted LLM.

Each pattern handles the same three-intent task (refund / bug / sales). Trace shapes differ.

Run it:

python3 code/main.py

Output: per-pattern trace + op count. Supervisor is cleanest; swarm is shortest; hierarchical is deepest; debate is most expensive.
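The contents of code/main.py are not reproduced here. As a rough idea of what a scripted LLM with a trace and an op count could look like, the sketch below uses a hypothetical `ScriptedLLM` class with canned replies keyed by keyword; the class name, method names, and script entries are assumptions, not the lesson's actual code.

```python
from typing import Dict, List

class ScriptedLLM:
    """Deterministic stand-in for a model: canned replies, a call counter, and a trace."""

    def __init__(self, script: Dict[str, str], default: str = "route:unknown") -> None:
        self.script = script
        self.default = default
        self.ops = 0
        self.trace: List[str] = []

    def complete(self, role: str, prompt: str) -> str:
        self.ops += 1  # every call counts as one op, whatever the pattern
        text = prompt.lower()
        # First script key found in the prompt wins; otherwise fall back to the default.
        reply = next((out for key, out in self.script.items() if key in text), self.default)
        self.trace.append(f"{role} -> {reply}")
        return reply

# The three-intent task: refund / bug / sales.
llm = ScriptedLLM({"refund": "route:refund", "crash": "route:bug", "pricing": "route:sales"})
tickets = [
    "I want a refund",
    "The app crashes on login",
    "Send me pricing for 50 seats",
]
for ticket in tickets:
    llm.complete("router", ticket)

print(llm.ops)
for line in llm.trace:
    print(line)
```

Sharing one counter across patterns is what makes the comparison fair: each topology is charged for the same unit of work.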

Use It

Ship It

outputs/skill-orchestration-picker.md picks a topology and implements it.

Exercises

  1. Convert a supervisor-worker to a swarm by removing the router. What breaks? What improves?
  2. Add a hop counter to the swarm: refuse after 3 handoffs. Does it catch A->B->A bouncing?
  3. Build a two-level hierarchical system for a 12-specialist domain. Where does the context budget fail without nesting?
  4. Profile the four patterns on a production-shaped workload. Which wins on which metric (latency, cost, accuracy, debuggability)?
  5. Read Anthropic's "Building Effective Agents" post. Map each of your production flows to one of the four. Any that don't map cleanly?

Key Terms

| Term | What people say | What it actually means |
|------|-----------------|------------------------|
| Supervisor-worker | "Router + specialists" | Central LLM dispatches to specialists; they don't talk to each other |
| Swarm | "Peer-to-peer" | Direct handoffs via shared tools; no central router |
| Hierarchical | "Supervisors of supervisors" | Nested subgraphs for large populations |
| Debate | "Proposer + critique" | Parallel proposers, cross-critique (Lesson 25) |
| Tool-call-based supervision | "Supervisor without a library" | Implement supervisor as direct tool calls for context control |
| Crew | "Autonomous team" | CrewAI's role-based collaboration mode |
| Flow | "Deterministic workflow" | CrewAI's event-driven production mode |

Further Reading