LangGraph: Stateful Graphs and Durable Execution

> LangGraph is the 2026 reference for low-level stateful orchestration. Agent is a state machine; nodes are functions; edges are transitions; state is immutable and checkpointed after every step. Resume from any failure exactly where it left off.

Type: Learn + Build

Languages: Python (stdlib)

Prerequisites: Phase 14 · 01 (Agent Loop), Phase 14 · 12 (Workflow Patterns)

Time: ~75 minutes

Learning Objectives

The Problem

Agents and workflows share a problem: when a 40-step run fails at step 38, you want to resume from step 38, not start over. Second-class state models leave operators hacking retries around a library that assumes fresh runs.

LangGraph's design answer: state is a first-class typed object, mutations are explicit, and checkpoints persist after every node. Resume is a load_state(session_id) call.

The Concept

The graph

A graph is defined by:

Example: an agent with classify, refund, bug, sales, done nodes — a routing workflow as a graph.

Durable execution

After each node returns, the runtime serializes the state and writes it to a checkpointer (SQLite, Postgres, Redis, custom). On failure at step N, the runtime can resume(session_id) and pick up from step N+1 with exact state.

The LangGraph docs explicitly highlight production users where this matters: Klarna, Uber, J.P. Morgan. The claim isn't the graph shape; it's that the graph shape plus checkpointing makes recovery cheap.

Streaming

Every node can yield partial output. The graph streams per-node-delta events to the caller so UIs update as the graph runs.

Human-in-the-loop

Inspect and modify state between nodes. Implementations: pause before a critical node, surface state to a human, accept modifications, resume. The checkpointer makes this easy because state is already serialized.

Memory

Short-term (within a run — conversation history in state) and long-term (across runs — persistent via the checkpointer plus a separate long-term store). LangGraph integrates with external memory systems (Mem0, custom) via tools.

Three topologies

  1. Supervisor. Central router LLM dispatches to specialist subagents. create_supervisor() in langgraph-supervisor (though the LangChain team in 2026 recommends doing this through tool calls directly for more context control).
  2. Swarm / peer-to-peer. Agents hand off directly via a shared tool surface. No central router.
  3. Hierarchical. Supervisors managing sub-supervisors, implemented as nested subgraphs.

Where this pattern goes wrong

Build It

code/main.py implements a stdlib stateful graph:

Run it:

python3 code/main.py

The trace shows the first run failing at the human gate, persistence, then resume producing the final output.

Use It

Ship It

outputs/skill-state-graph.md generates a LangGraph-shaped state graph in any target runtime with checkpointing and resume wired in.

Exercises

  1. Add a conditional edge from classify to end when classification confidence is below a threshold. Resume the run after a human sets route manually.
  2. Swap the SQLite-like fake for a real SQLite checkpointer. Measure per-step serialization overhead.
  3. Implement parallel edges: two nodes run concurrently, merge by a custom reducer. What does immutable state buy here?
  4. Read langgraph-supervisor reference. Port the toy to create_supervisor. Compare the trace shapes.
  5. Add streaming: each node yields partial state while it runs. Print the deltas as they arrive.

Key Terms

Term What people say What it actually means
State graph "Agent as state machine" Typed state + nodes + edges + reducers
Checkpointer "Persistence backend" Serializes state after every node; enables resume
Reducer "State merger" Function that combines current state with a node's update
Conditional edge "Branch" Edge chosen by a function of state
Subgraph "Nested graph" A graph used as a node inside another graph
Durable execution "Resume from failure" Restart at the last successful node with exact state
Supervisor "Router LLM" Central dispatcher for specialist subagents
Swarm "P2P agents" Agents hand off via shared tools; no central router

Further Reading