The Agent Loop: Observe, Think, Act

> Every agent in 2026 — Claude Code, Cursor, Devin, Operator — is a variant of the ReAct loop from 2022. Reasoning tokens interleave with tool calls and observations until a stop condition fires. Learn this loop cold before touching any framework.

Type: Build

Languages: Python (stdlib)

Prerequisites: Phase 11 (LLM Engineering), Phase 13 (Tools and Protocols)

Time: ~60 minutes

Learning Objectives

The Problem

An LLM on its own is an autocomplete. You ask a question, you get a string back. It cannot read a file, run a query, open a browser, or verify a claim. If the model has outdated or wrong information it will say the wrong thing confidently and stop.

Agents fix this with one pattern: a loop that lets the model decide to pause, call a tool, read the result, and continue thinking. That is the entire idea. Every additional capability in Phase 14 — memory, planning, subagents, debate, evals — is scaffolding around this loop.

The Concept

ReAct: the canonical format

Yao et al. (ICLR 2023, arXiv:2210.03629) introduced Reason + Act. Each turn emits:

Thought: I need to look up the capital of France.
Action: search("capital of France")
Observation: Paris is the capital of France.
Thought: The answer is Paris.
Action: finish("Paris")

Three absolute wins over imitation or RL baselines in the original paper:

Reasoning traces do three things the model cannot do with action-only prompting: induce a plan, track the plan across steps, and handle exceptions when an action returns an unexpected observation.

The 2026 shift: native reasoning

Prompt-based Thought: tokens are a 2022 workaround. The 2025–2026 Responses API lineage replaces them with native reasoning: the model emits reasoning content on a separate channel, and that channel is passed through turns (encrypted across providers in production). Letta V1 (letta_v1_agent) deprecates the old send_message + heartbeat pattern and the explicit thought-token scheme in favor of this.

What does not change: the loop itself. Observe → think → act → observe → think → act → stop. Whether the thought tokens are printed in your transcript or carried in a separate field, the control flow is the same.

The five ingredients

Every agent loop needs exactly five things. Miss any one and you have a chat bot, not an agent.

  1. A message buffer that grows: user turn, assistant turn, tool turn, assistant turn, tool turn, assistant turn, final.
  2. A tool registry the model can invoke by name — schema in, execution, result string out.
  3. A stop condition — model says finish, or the assistant turn contains no tool calls, or max turns, or max tokens, or a guardrail trips.
  4. A turn budget to prevent infinite loops. Anthropic's computer use announcement says dozens-to-hundreds of steps per task is normal; pick a cap that fits the task class, not a one-size-fits-all.
  5. An observation formatter that converts tool outputs into something the model can read. Every 400 error in your stack needs to end up as an observation string, not a crash.

Why this loop is everywhere

Claude Agent SDK, OpenAI Agents SDK, LangGraph, AutoGen v0.4 AgentChat, CrewAI, Agno, Mastra — every one of these runs ReAct under the hood. Framework differences are about what lives around the loop: state checkpointing (LangGraph), actor-model message passing (AutoGen v0.4), role templates (CrewAI), tracing spans (OpenAI Agents SDK). The loop itself is invariant.

2026 pitfalls

Build It

code/main.py implements the loop end to end with stdlib only. Components:

Run it:

python3 code/main.py

The output is a full ReAct trace: thoughts, tool calls, observations, final answer, and a summary. Swap the ToyLLM for a real provider and you have a production-shaped agent — that is the entire point.

Use It

Every framework in Phase 14 sits on top of this loop. Once you own it, picking a framework is about ergonomics and operational shape (durable state, actor model, role templates, voice transport), not a different control flow.

Reference the framework docs as you learn them:

Ship It

outputs/skill-agent-loop.md is a reusable skill that any agent you build can load to explain the ReAct loop and generate a correct reference implementation for any language or runtime.

Exercises

  1. Add a max_tool_calls_per_turn cap. What breaks if the model issues three calls but you only execute the first two?
  2. Implement a no_tool_calls → done stop path. Contrast with finish as an explicit tool. Which is safer against early-termination bugs?
  3. Extend ToyLLM so it sometimes returns an Action with a malformed argument dict. Make the loop recover by feeding back an error observation. This is the shape of 2026 CRITIC-style correction (Lesson 5).
  4. Replace ToyLLM with a real Responses API call. Move the thought trace from inline strings to the reasoning channel. What changes in the transcript?
  5. Add a tool_use_id correlator like the Anthropic schema so parallel tool calls can return out of order. Why do Anthropic, OpenAI, and Bedrock all require it?

Key Terms

Term What people say What it actually means
Agent "Autonomous AI" A loop: LLM thinks, picks a tool, result feeds back, repeat until stop
ReAct "Reasoning and Acting" Yao et al. 2022 — interleave Thought, Action, Observation in one stream
Tool call "Function calling" Structured output the runtime dispatches to an executable
Observation "Tool result" The string representation of tool output fed back into the next prompt
Reasoning channel "Thinking tokens" Native reasoning output on a separate stream, passed through across turns
Stop condition "Exit clause" Explicit finish, no tool calls emitted, max turns, max tokens, or guardrail trip
Turn budget "Max steps" Hard cap on loop iterations — agents run 40–400 steps per task in 2026
Trace "Transcript" Full record of thought, action, observation tuples for a run

Further Reading