Agent Framework Tradeoffs — LangGraph vs CrewAI vs AutoGen vs Agno

> Every framework sells the same demo (research agent builds a report) and hides the same bug (state schema fights with the orchestration layer). Pick the framework whose abstractions match the shape of your problem; everything else is glue you write twice.

Type: Learn

Languages: Python

Prerequisites: Phase 11 · 09 (Function Calling), Phase 11 · 16 (LangGraph)

Time: ~45 minutes

The Problem

You have a task that needs more than one LLM call. Maybe it is a research workflow (plan, search, summarize, cite). Maybe it is a code-review pipeline (parse diff, critique, patch, validate). Maybe it is a multi-turn assistant that books flights, writes emails, and files expense reports. You pick a framework.

Three days later, you discover the framework's abstractions leak. CrewAI gives you roles but fights you when the "researcher" needs to hand a structured plan to the "writer." AutoGen gives you chat between agents but has no first-class state, so your checkpoint is a pickle of a conversation log. LangGraph gives you a state graph but forces you to name every transition before you know what the agent will do. Agno gives you a single-agent primitive that screams when you try to fan out to three concurrent workers.

The fix is not "pick the best framework." It is to match the framework's core abstraction to the shape of your problem. This lesson draws that map.

The Concept

Agent framework matrix: core abstraction vs problem shape

Four frameworks dominate the 2026 landscape. Their core abstractions are not the same.

| Framework | Core abstraction | Best fit | Worst fit |
| --- | --- | --- | --- |
| LangGraph | StateGraph: typed state, nodes, conditional edges, checkpointer. | Workflows with explicit state and human-in-the-loop interrupts; production agents needing time-travel debugging. | Loose, role-driven brainstorming where the topology is unknown. |
| CrewAI | Crew: roles (goal, backstory), tasks, process (sequential or hierarchical). | Role-playing or persona-driven workflows with a short linear/hierarchical plan. | Anything stateful beyond the crew's turn history; complex branching. |
| AutoGen | ConversableAgent pair: two or more agents that speak in turns until an exit condition. | Multi-agent *dialogue* (teacher-student, proposer-critic, actor-reviewer) where the thinking emerges from the chat. | Deterministic workflows with a known DAG; anything needing durable state across restarts. |
| Agno | Agent: a single LLM + tools + memory, composable into teams. | Fast-to-build single agents and lightweight teams; strong multi-modality and built-in storage drivers. | Deep, explicitly-branched graphs with custom reducers. |
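To make the StateGraph row concrete, here is a minimal plain-Python sketch of the same shape: typed state, named nodes, and a conditional edge chosen by a developer-written function. It deliberately does not use the LangGraph API; `ReviewState`, `run_graph`, and the node names are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, TypedDict


class ReviewState(TypedDict):
    """Hypothetical typed state shared by every node."""
    draft: str
    revisions: int
    approved: bool


def write(state: ReviewState) -> ReviewState:
    # A real node would call an LLM here; this stub just appends a marker.
    return {**state, "draft": state["draft"] + " [revised]",
            "revisions": state["revisions"] + 1}


def critique(state: ReviewState) -> ReviewState:
    # Stub critic: approve once the draft has been revised twice.
    return {**state, "approved": state["revisions"] >= 2}


def route_after_critique(state: ReviewState) -> str:
    # Conditional edge: the developer, not an LLM, names the next node.
    return "END" if state["approved"] else "write"


NODES: Dict[str, Callable[[ReviewState], ReviewState]] = {
    "write": write,
    "critique": critique,
}
EDGES: Dict[str, Callable[[ReviewState], str]] = {
    "write": lambda state: "critique",   # static edge
    "critique": route_after_critique,    # conditional edge
}


def run_graph(state: ReviewState, entry: str = "write", max_steps: int = 20) -> ReviewState:
    current = entry
    for _ in range(max_steps):
        if current == "END":
            return state
        state = NODES[current](state)
        current = EDGES[current](state)
    raise RuntimeError("graph did not terminate within max_steps")


final = run_graph({"draft": "v0", "revisions": 0, "approved": False})
print(final)
```

Every transition has a name before the run starts. That is what makes the run auditable, and it is also the constraint that feels heavy when the topology is not yet known.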

What "abstraction" actually means

A framework's core abstraction is the thing you draw on the whiteboard when you pitch the architecture.

The state question

State is where most framework choices break down in production.
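As a rough illustration of what durable state means here, the sketch below writes the state to a JSON file after every step and skips completed steps on restart. It is a stdlib-only toy under stated assumptions, not any framework's checkpointer; the step names, the `done` field, and the `crash_after` parameter are all hypothetical.

```python
import json
import os
import tempfile


def plan(state: dict) -> dict:
    return {**state, "plan": ["search", "summarize"]}


def search(state: dict) -> dict:
    return {**state, "hits": ["doc-a", "doc-b"]}


def summarize(state: dict) -> dict:
    return {**state, "summary": f"{len(state['hits'])} sources reviewed"}


STEPS = [("plan", plan), ("search", search), ("summarize", summarize)]


def run(path, crash_after=None):
    # Resume from the checkpoint if one exists; otherwise start fresh.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    else:
        state = {"done": []}
    for name, step in STEPS:
        if name in state["done"]:
            continue  # this step finished before the restart
        state = step(state)
        state["done"] = state["done"] + [name]
        with open(path, "w") as f:
            json.dump(state, f)  # checkpoint after every step
        if name == crash_after:
            raise RuntimeError(f"simulated crash after {name!r}")
    return state


checkpoint = os.path.join(tempfile.mkdtemp(), "run.json")
try:
    run(checkpoint, crash_after="search")
except RuntimeError:
    pass  # the "process" died; the checkpoint file survives
resumed = run(checkpoint)
print(resumed["done"], resumed["summary"])
```

The state is a small, named structure rather than a pickled conversation log, so a restart can tell exactly which steps are finished. A production version would also need atomic writes and a schema version, which this sketch omits.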

The branching question

Every non-trivial agent branches. Who decides the branch matters.
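The two ends of that spectrum can be sketched side by side, using the code-review pipeline from earlier as the example. `FakeLLM` is a hypothetical stand-in that counts calls instead of contacting a model, and the keyword rule is a placeholder for whatever logic a real router would use.

```python
from dataclasses import dataclass


@dataclass
class FakeLLM:
    """Stand-in for a model client; counts calls instead of spending tokens."""
    calls: int = 0

    def choose(self, options, context):
        self.calls += 1
        # A real planner would reason over the context; the stub uses a keyword.
        return "patch" if "bug" in context else "approve"


def explicit_route(diff_summary: str) -> str:
    # Developer-decided branch: a plain function, no model call.
    return "patch" if "bug" in diff_summary else "approve"


def llm_route(llm: FakeLLM, diff_summary: str) -> str:
    # Model-decided branch: one extra LLM call for every decision.
    return llm.choose(["patch", "approve"], diff_summary)


llm = FakeLLM()
reviews = ["bug in parser", "docs typo", "bug in retry loop"]
explicit = [explicit_route(r) for r in reviews]
selected = [llm_route(llm, r) for r in reviews]
print(explicit, selected, llm.calls)
```

Both routers reach the same decisions in this toy case, but the LLM-selected one pays a model call per branch. The trade is flexibility on inputs the developer did not anticipate against cost and auditability.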

The observability question

Cost and latency

All four frameworks add per-call overhead (framework logic, validation, serialization). Rough order of increasing overhead: Agno ≈ LangGraph < CrewAI ≈ AutoGen. The difference is dominated by how much extra LLM routing the framework does. CrewAI's hierarchical manager spends tokens deciding who goes next, and so does AutoGen's GroupChatManager. LangGraph spends tokens only where you call llm.invoke. Agno's single-agent path is thin.

When cost per run matters, prefer explicit routing (LangGraph edges, or AutoGen's speaker_selection_method set to round_robin or a custom function) over LLM-selected routing.
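A back-of-the-envelope calculation shows why routing tokens add up at volume. Every number below is a made-up placeholder rather than a measurement, and the sketch assumes prompt and output tokens are priced the same, which real providers usually do not do.

```python
def routing_overhead_tokens(turns: int, router_prompt_tokens: int,
                            router_output_tokens: int) -> int:
    """Tokens spent purely on deciding who goes next, across one run."""
    return turns * (router_prompt_tokens + router_output_tokens)


def daily_cost_usd(runs_per_day: int, tokens_per_run: int,
                   usd_per_million_tokens: float) -> float:
    return runs_per_day * tokens_per_run * usd_per_million_tokens / 1_000_000


# Hypothetical workload: 8 routing decisions per run, each re-reading a
# 1,500-token history and emitting a 20-token choice.
overhead = routing_overhead_tokens(turns=8, router_prompt_tokens=1_500,
                                   router_output_tokens=20)
llm_routed = daily_cost_usd(runs_per_day=5_000, tokens_per_run=overhead,
                            usd_per_million_tokens=3.0)
explicit = daily_cost_usd(runs_per_day=5_000, tokens_per_run=0,
                          usd_per_million_tokens=3.0)
print(overhead, round(llm_routed, 2), explicit)
```

With these placeholder inputs the routing decisions alone cost 12,160 tokens per run. Substitute your own turn counts and prices before drawing any conclusion.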

Interoperability

The Skill

> You can explain, in one sentence, why a given framework is right for a given agent problem.

Pre-build checklist:

  1. Draw the shape. Is this a graph (typed state, named transitions)? A role play (specialists hand off work)? A chat (agents talk until done)? A single agent with tools?
  2. Decide who branches. Developer-decided branching → LangGraph. Manager-agent-decided → CrewAI hierarchical. Chat-emergent → AutoGen. Tool-call-decided → Agno.
  3. Check the state budget. Do you need resume-from-checkpoint? Time-travel? Human interrupts mid-run? If yes, LangGraph is the default; Agno sessions cover conversation-scoped state.
  4. Check the cost budget. LLM-selected routing costs extra tokens per turn. If the agent runs thousands of times a day, prefer explicit routing.
  5. Budget the framework overhead. Every framework is another dependency. If the task is two LLM calls and a tool, write 30 lines of plain Python; no framework is cheaper than no framework.

Refuse to reach for a framework before you can draw the graph, the org chart, the chat, or the agent box. Refuse to pick one that forces you to fight its state model for the thing you actually need.
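For checklist item 5, a no-framework version of "two LLM calls and a tool" might look like the sketch below. The `LLM` type alias, `fake_llm`, and `brief` are hypothetical names; the stub lets the example run offline, and a real version would swap in a provider SDK call.

```python
from typing import Callable

# Hypothetical signature: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


def word_count(text: str) -> int:
    # The single "tool": deterministic, no model involved.
    return len(text.split())


def brief(topic: str, llm: LLM, max_words: int = 200) -> dict:
    outline = llm(f"Outline a short brief on: {topic}")            # LLM call 1
    draft = llm(f"Write the brief from this outline:\n{outline}")  # LLM call 2
    words = word_count(draft)                                      # tool call
    return {
        "outline": outline,
        "draft": draft,
        "words": words,
        "within_limit": words <= max_words,
    }


def fake_llm(prompt: str) -> str:
    # Stub so the sketch runs offline; replace with a real client call.
    if prompt.startswith("Outline"):
        return "1. Location 2. History"
    return "A short brief about the topic."


result = brief("Anthropic's headquarters", fake_llm)
print(result["words"], result["within_limit"])
```

The control flow is three lines of ordinary Python. A framework earns its place only when you need something this does not give you, such as checkpoints, interrupts, or managed multi-agent turns.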

The Decision Matrix

| Problem shape | Preferred framework | Why |
| --- | --- | --- |
| Workflow DAG with typed state, human approvals, long-running | LangGraph | First-class state, checkpointer, interrupts, time-travel. |
| Research / writing pipeline with distinct roles | CrewAI (sequential) or LangGraph subgraphs | Role-per-task is cheap to express in CrewAI; scale up with LangGraph when branching gets complex. |
| Proposer-critic or teacher-student dialogue | AutoGen | Two-agent chat is its native shape. |
| Single agent with tools, sessions, memory | Agno | Thinnest setup, built-in storage and memory. |
| Thousands of parallel fanouts with reducers | LangGraph + Send | The only one with a first-class parallel dispatch primitive. |
| Quick prototype, no framework commitment | Plain Python + provider SDK | No framework is the fastest framework. |
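The "parallel fanouts with reducers" row describes a shape that can be sketched without any framework: dispatch one worker per input, then fold the results into shared state with a reducer function. This is a stdlib illustration of the idea, not LangGraph's Send API; `worker`, `merge`, and `fan_out` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def worker(query: str) -> dict:
    # Stub for one fanned-out branch; a real worker would call search or an LLM.
    return {"sources": [f"result-for-{query}"], "tokens": len(query)}


def merge(acc: dict, update: dict) -> dict:
    # Reducer: defines how concurrent branch outputs combine into shared state.
    return {
        "sources": acc["sources"] + update["sources"],
        "tokens": acc["tokens"] + update["tokens"],
    }


def fan_out(queries):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map yields results in input order, so the merge is deterministic.
        updates = list(pool.map(worker, queries))
    return reduce(merge, updates, {"sources": [], "tokens": 0})


state = fan_out(["hq address", "founding year", "funding"])
print(state)
```

The reducer is the part worth designing carefully. Without an explicit rule for combining branch outputs, concurrent writes to shared state become last-writer-wins.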

Exercises

  1. Easy. Take the same task — "research Anthropic's headquarters, write a 200-word brief, cite sources" — and implement it in LangGraph (four nodes: plan, search, write, cite) and in CrewAI (three roles: researcher, writer, editor). Report token cost per run and lines of code.
  2. Medium. Build the same task in AutoGen (researcher ↔ writer chat, editor joins via GroupChat) and Agno (a single agent with search_tools and write_tools, plus a session store). Rank the four implementations on (a) cost per run, (b) ability to resume after a crash, (c) ability to inject a human approval before the write step.
  3. Hard. Build a decision-tree script pick_framework.py that takes a short problem description (JSON: {has_typed_state, has_roles, has_dialogue, has_parallel_fanout, needs_resume}) and returns a recommendation with one-sentence justification. Verify it on six cases you design yourself.

Key Terms

| Term | What people say | What it actually means |
| --- | --- | --- |
| Orchestration | "How the agents coordinate" | The layer that decides which node/role/agent runs next. |
| Durable state | "Resume after a restart" | State that survives process death, attached to a checkpoint or session store. |
| LLM-selected routing | "Let the model decide" | A planner LLM picks the next step each turn; flexible but pays tokens on every decision. |
| Explicit routing | "Developer decides" | A Python function or static edge picks the next step; cheap and auditable. |
| Crew | "A CrewAI team" | Roles + tasks + process (sequential or hierarchical) bound into a single runnable. |
| GroupChat | "AutoGen's multi-agent chat" | A managed conversation between N agents with a speaker selector. |
| Team (Agno) | "Multi-agent Agno" | Route / coordinate / collaborate mode over a set of agents. |
| StateGraph | "LangGraph's graph" | Typed-state, node, conditional-edge, checkpointer primitive. |

Further Reading