← Capstone — Build a Complete Tool Ecosystem ReWOO and Plan-and-Execute: Decoupled Planning →

The Agent Loop: Observe, Think, Act

> Every agent in 2026 — Claude Code, Cursor, Devin, Operator — is a variant of the ReAct loop from 2022. Reasoning tokens interleave with tool calls and observations until a stop condition fires. Learn this loop cold before touching any framework.

Type: Build

Languages: Python (stdlib)

Prerequisites: Phase 11 (LLM Engineering), Phase 13 (Tools and Protocols)

Time: ~60 minutes

Learning Objectives

Name the three parts of the ReAct loop — Thought, Action, Observation — and explain why each one is load-bearing.
Implement a stdlib agent loop with a toy LLM, tool registry, and stop condition under 200 lines.
Identify the 2026 shift from prompt-based thought tokens to native model reasoning (Responses API, encrypted reasoning passthrough).
Explain why every modern harness (Claude Agent SDK, OpenAI Agents SDK, LangGraph, AutoGen v0.4) still runs this loop under the hood.

The Problem

An LLM on its own is an autocomplete. You ask a question, you get a string back. It cannot read a file, run a query, open a browser, or verify a claim. If the model has outdated or wrong information it will say the wrong thing confidently and stop.

Agents fix this with one pattern: a loop that lets the model decide to pause, call a tool, read the result, and continue thinking. That is the entire idea. Every additional capability in Phase 14 — memory, planning, subagents, debate, evals — is scaffolding around this loop.

The Concept

ReAct: the canonical format

Yao et al. (ICLR 2023, arXiv:2210.03629) introduced Reason + Act. Each turn emits:

Thought: I need to look up the capital of France.
Action: search("capital of France")
Observation: Paris is the capital of France.
Thought: The answer is Paris.
Action: finish("Paris")

Three absolute wins over imitation or RL baselines in the original paper:

ALFWorld: +34 points absolute success rate with only 1–2 in-context examples.
WebShop: +10 points over imitation learning and search baselines.
Hotpot QA: ReAct recovers from hallucinations by grounding each step in retrieval.

Reasoning traces do three things the model cannot do with action-only prompting: induce a plan, track the plan across steps, and handle exceptions when an action returns an unexpected observation.

The 2026 shift: native reasoning

Prompt-based Thought: tokens are a 2022 workaround. The 2025–2026 Responses API lineage replaces them with native reasoning: the model emits reasoning content on a separate channel, and that channel is passed through turns (encrypted across providers in production). Letta V1 (letta_v1_agent) deprecates the old send_message + heartbeat pattern and the explicit thought-token scheme in favor of this.

What does not change: the loop itself. Observe → think → act → observe → think → act → stop. Whether the thought tokens are printed in your transcript or carried in a separate field, the control flow is the same.

The five ingredients

Every agent loop needs exactly five things. Miss any one and you have a chat bot, not an agent.

A message buffer that grows: user turn, assistant turn, tool turn, assistant turn, tool turn, assistant turn, final.
A tool registry the model can invoke by name — schema in, execution, result string out.
A stop condition — model says finish, or the assistant turn contains no tool calls, or max turns, or max tokens, or a guardrail trips.
A turn budget to prevent infinite loops. Anthropic's computer use announcement says dozens-to-hundreds of steps per task is normal; pick a cap that fits the task class, not a one-size-fits-all.
An observation formatter that converts tool outputs into something the model can read. Every 400 error in your stack needs to end up as an observation string, not a crash.

Why this loop is everywhere

Claude Agent SDK, OpenAI Agents SDK, LangGraph, AutoGen v0.4 AgentChat, CrewAI, Agno, Mastra — every one of these runs ReAct under the hood. Framework differences are about what lives around the loop: state checkpointing (LangGraph), actor-model message passing (AutoGen v0.4), role templates (CrewAI), tracing spans (OpenAI Agents SDK). The loop itself is invariant.

2026 pitfalls

Trust boundary collapse. Tool outputs are untrusted input. A PDF retrieved from the web can contain delete the repo. OpenAI's CUA docs are explicit: "only direct instructions from the user count as permission." See Lesson 27.
Cascading failure. One phantom SKU, four downstream API calls, one multi-system outage. Agents cannot tell "I failed" from "the task is impossible" and often hallucinate success on 400 errors. See Lesson 26.
Loop length explosion. Most 2026 agents run 40–400 steps. Debugging step 38's wrong decision requires observability (Lesson 23) and eval trajectories (Lesson 30).

Build It

code/main.py implements the loop end to end with stdlib only. Components:

ToolRegistry — name → callable map with input validation.
ToyLLM — a deterministic script that emits Thought, Action, Observation, Finish lines so the loop is testable offline.
AgentLoop — the while loop with max turns, trace recording, and stop conditions.
Three sample tools — calculator, kv_store.get, kv_store.set — enough surface to show branching.

Run it:

python3 code/main.py

The output is a full ReAct trace: thoughts, tool calls, observations, final answer, and a summary. Swap the ToyLLM for a real provider and you have a production-shaped agent — that is the entire point.

Use It

Every framework in Phase 14 sits on top of this loop. Once you own it, picking a framework is about ergonomics and operational shape (durable state, actor model, role templates, voice transport), not a different control flow.

Reference the framework docs as you learn them:

Claude Agent SDK (Lesson 17) — built-in tools, subagents, lifecycle hooks.
OpenAI Agents SDK (Lesson 16) — Handoffs, Guardrails, Sessions, Tracing.
LangGraph (Lesson 13) — stateful graph of nodes, checkpoints after every step.
AutoGen v0.4 (Lesson 14) — asynchronous message-passing actors.
CrewAI (Lesson 15) — role + goal + backstory templating, Crews vs Flows.

Ship It

outputs/skill-agent-loop.md is a reusable skill that any agent you build can load to explain the ReAct loop and generate a correct reference implementation for any language or runtime.

Exercises

Add a max_tool_calls_per_turn cap. What breaks if the model issues three calls but you only execute the first two?
Implement a no_tool_calls → done stop path. Contrast with finish as an explicit tool. Which is safer against early-termination bugs?
Extend ToyLLM so it sometimes returns an Action with a malformed argument dict. Make the loop recover by feeding back an error observation. This is the shape of 2026 CRITIC-style correction (Lesson 5).
Replace ToyLLM with a real Responses API call. Move the thought trace from inline strings to the reasoning channel. What changes in the transcript?
Add a tool_use_id correlator like the Anthropic schema so parallel tool calls can return out of order. Why do Anthropic, OpenAI, and Bedrock all require it?

Key Terms

Term	What people say	What it actually means
Agent	"Autonomous AI"	A loop: LLM thinks, picks a tool, result feeds back, repeat until stop
ReAct	"Reasoning and Acting"	Yao et al. 2022 — interleave Thought, Action, Observation in one stream
Tool call	"Function calling"	Structured output the runtime dispatches to an executable
Observation	"Tool result"	The string representation of tool output fed back into the next prompt
Reasoning channel	"Thinking tokens"	Native reasoning output on a separate stream, passed through across turns
Stop condition	"Exit clause"	Explicit `finish`, no tool calls emitted, max turns, max tokens, or guardrail trip
Turn budget	"Max steps"	Hard cap on loop iterations — agents run 40–400 steps per task in 2026
Trace	"Transcript"	Full record of thought, action, observation tuples for a run