The Agent Loop: Observe, Think, Act
> Every agent in 2026 — Claude Code, Cursor, Devin, Operator — is a variant of the ReAct loop from 2022. Reasoning tokens interleave with tool calls and observations until a stop condition fires. Learn this loop cold before touching any framework.
Type: Build
Languages: Python (stdlib)
Prerequisites: Phase 11 (LLM Engineering), Phase 13 (Tools and Protocols)
Time: ~60 minutes
Learning Objectives
- Name the three parts of the ReAct loop — Thought, Action, Observation — and explain why each one is load-bearing.
- Implement a stdlib agent loop with a toy LLM, tool registry, and stop condition under 200 lines.
- Identify the 2026 shift from prompt-based thought tokens to native model reasoning (Responses API, encrypted reasoning passthrough).
- Explain why every modern harness (Claude Agent SDK, OpenAI Agents SDK, LangGraph, AutoGen v0.4) still runs this loop under the hood.
The Problem
An LLM on its own is an autocomplete. You ask a question, you get a string back. It cannot read a file, run a query, open a browser, or verify a claim. If the model has outdated or wrong information it will say the wrong thing confidently and stop.
Agents fix this with one pattern: a loop that lets the model decide to pause, call a tool, read the result, and continue thinking. That is the entire idea. Every additional capability in Phase 14 — memory, planning, subagents, debate, evals — is scaffolding around this loop.
The Concept
ReAct: the canonical format
Yao et al. (ICLR 2023, arXiv:2210.03629) introduced Reason + Act. Each turn emits:
Thought: I need to look up the capital of France.
Action: search("capital of France")
Observation: Paris is the capital of France.
Thought: The answer is Paris.
Action: finish("Paris")
Three absolute wins over imitation or RL baselines in the original paper:
- ALFWorld: +34 points absolute success rate with only 1–2 in-context examples.
- WebShop: +10 points over imitation learning and search baselines.
- Hotpot QA: ReAct recovers from hallucinations by grounding each step in retrieval.
Reasoning traces do three things the model cannot do with action-only prompting: induce a plan, track the plan across steps, and handle exceptions when an action returns an unexpected observation.
The 2026 shift: native reasoning
Prompt-based Thought: tokens are a 2022 workaround. The 2025–2026 Responses API lineage replaces them with native reasoning: the model emits reasoning content on a separate channel, and that channel is passed through turns (encrypted across providers in production). Letta V1 (letta_v1_agent) deprecates the old send_message + heartbeat pattern and the explicit thought-token scheme in favor of this.
What does not change: the loop itself. Observe → think → act → observe → think → act → stop. Whether the thought tokens are printed in your transcript or carried in a separate field, the control flow is the same.
The five ingredients
Every agent loop needs exactly five things. Miss any one and you have a chat bot, not an agent.
- A message buffer that grows: user turn, assistant turn, tool turn, assistant turn, tool turn, assistant turn, final.
- A tool registry the model can invoke by name — schema in, execution, result string out.
- A stop condition — model says
finish, or the assistant turn contains no tool calls, or max turns, or max tokens, or a guardrail trips. - A turn budget to prevent infinite loops. Anthropic's computer use announcement says dozens-to-hundreds of steps per task is normal; pick a cap that fits the task class, not a one-size-fits-all.
- An observation formatter that converts tool outputs into something the model can read. Every 400 error in your stack needs to end up as an observation string, not a crash.
Why this loop is everywhere
Claude Agent SDK, OpenAI Agents SDK, LangGraph, AutoGen v0.4 AgentChat, CrewAI, Agno, Mastra — every one of these runs ReAct under the hood. Framework differences are about what lives around the loop: state checkpointing (LangGraph), actor-model message passing (AutoGen v0.4), role templates (CrewAI), tracing spans (OpenAI Agents SDK). The loop itself is invariant.
2026 pitfalls
- Trust boundary collapse. Tool outputs are untrusted input. A PDF retrieved from the web can contain
. OpenAI's CUA docs are explicit: "only direct instructions from the user count as permission." See Lesson 27.delete the repo - Cascading failure. One phantom SKU, four downstream API calls, one multi-system outage. Agents cannot tell "I failed" from "the task is impossible" and often hallucinate success on 400 errors. See Lesson 26.
- Loop length explosion. Most 2026 agents run 40–400 steps. Debugging step 38's wrong decision requires observability (Lesson 23) and eval trajectories (Lesson 30).
Build It
code/main.py implements the loop end to end with stdlib only. Components:
ToolRegistry— name → callable map with input validation.ToyLLM— a deterministic script that emitsThought,Action,Observation,Finishlines so the loop is testable offline.AgentLoop— the while loop with max turns, trace recording, and stop conditions.- Three sample tools —
calculator,kv_store.get,kv_store.set— enough surface to show branching.
Run it:
python3 code/main.py
The output is a full ReAct trace: thoughts, tool calls, observations, final answer, and a summary. Swap the ToyLLM for a real provider and you have a production-shaped agent — that is the entire point.
Use It
Every framework in Phase 14 sits on top of this loop. Once you own it, picking a framework is about ergonomics and operational shape (durable state, actor model, role templates, voice transport), not a different control flow.
Reference the framework docs as you learn them:
- Claude Agent SDK (Lesson 17) — built-in tools, subagents, lifecycle hooks.
- OpenAI Agents SDK (Lesson 16) — Handoffs, Guardrails, Sessions, Tracing.
- LangGraph (Lesson 13) — stateful graph of nodes, checkpoints after every step.
- AutoGen v0.4 (Lesson 14) — asynchronous message-passing actors.
- CrewAI (Lesson 15) — role + goal + backstory templating, Crews vs Flows.
Ship It
outputs/skill-agent-loop.md is a reusable skill that any agent you build can load to explain the ReAct loop and generate a correct reference implementation for any language or runtime.
Exercises
- Add a
max_tool_calls_per_turncap. What breaks if the model issues three calls but you only execute the first two? - Implement a
no_tool_calls → donestop path. Contrast withfinishas an explicit tool. Which is safer against early-termination bugs? - Extend
ToyLLMso it sometimes returns anActionwith a malformed argument dict. Make the loop recover by feeding back an error observation. This is the shape of 2026 CRITIC-style correction (Lesson 5). - Replace
ToyLLMwith a real Responses API call. Move the thought trace from inline strings to the reasoning channel. What changes in the transcript? - Add a
tool_use_idcorrelator like the Anthropic schema so parallel tool calls can return out of order. Why do Anthropic, OpenAI, and Bedrock all require it?
Key Terms
| Term | What people say | What it actually means |
|---|---|---|
| Agent | "Autonomous AI" | A loop: LLM thinks, picks a tool, result feeds back, repeat until stop |
| ReAct | "Reasoning and Acting" | Yao et al. 2022 — interleave Thought, Action, Observation in one stream |
| Tool call | "Function calling" | Structured output the runtime dispatches to an executable |
| Observation | "Tool result" | The string representation of tool output fed back into the next prompt |
| Reasoning channel | "Thinking tokens" | Native reasoning output on a separate stream, passed through across turns |
| Stop condition | "Exit clause" | Explicit finish, no tool calls emitted, max turns, max tokens, or guardrail trip |
| Turn budget | "Max steps" | Hard cap on loop iterations — agents run 40–400 steps per task in 2026 |
| Trace | "Transcript" | Full record of thought, action, observation tuples for a run |
Further Reading
- Yao et al., ReAct: Synergizing Reasoning and Acting in Language Models (arXiv:2210.03629) — the canonical paper
- Anthropic, Building Effective Agents (Dec 2024) — when to use an agent loop vs a workflow
- Letta, Rearchitecting the Agent Loop — the native-reasoning rewrite of MemGPT's loop
- Claude Agent SDK overview — the 2026 harness shape
- OpenAI Agents SDK docs — Handoffs, Guardrails, Sessions, Tracing