Swarm Optimization for LLMs (PSO, ACO)

> Bio-inspired optimization is making a comeback in LLM work. LMPSO (arXiv:2504.09247) runs PSO where each particle's velocity is a prompt and the LLM generates the next candidate; it works well on structured-sequence outputs (math expressions, programs). Model Swarms (arXiv:2410.11163) treats each LLM expert as a PSO particle on a model-weight manifold and reports a 13.3% average gain over 12 baselines on 9 datasets with just 200 instances. SwarmPrompt (ICAART 2025) hybridizes PSO and the Grey Wolf Optimizer for prompt optimization. AMRO-S (arXiv:2603.12933) uses ACO-inspired pheromone specialists for multi-agent LLM routing: it reports a 4.7x speedup, interpretable routing evidence, and a quality-gated asynchronous update that decouples inference from learning. This lesson implements PSO on prompt parameter space and ACO on agent routing, then examines why these classical algorithms fit the LLM era and when they do not.

Type: Learn + Build

Languages: Python (stdlib)

Prerequisites: Phase 16 · 09 (Parallel Swarm Networks), Phase 16 · 14 (Consensus and BFT)

Time: ~75 minutes

Problem

You have a prompt that scores 62% on your task eval. You want to improve it. The naive move is manual tweaking, which scales badly. Reinforcement learning needs reward signals and enough rollouts to train. Backpropagating through the prompt is not directly possible: the prompt is a discrete string, not a differentiable parameter.

Classical bio-inspired optimization (PSO for continuous search spaces, ACO for path selection) was designed for exactly this regime: gradient-free, population-based, and with little overhead beyond the fitness evaluation itself. Pair these algorithms with LLMs, using the swarm as the gradient-free search step, and you get a surprisingly practical optimizer.

The same patterns apply to agent *routing* in multi-agent systems. An ACO-style pheromone trail records which agent worked best on which task-type, lets the router exploit that trail, and decays over time so that stale routes fade and alternatives can be rediscovered.

Concept

PSO refresher (Kennedy & Eberhart 1995)

Particle Swarm Optimization maintains a population of particles in a continuous search space. Each particle i has a position x_i and a velocity v_i. Each iteration:

v_i <- w * v_i + c1 * r1 * (p_best_i - x_i) + c2 * r2 * (g_best - x_i)
x_i <- x_i + v_i
evaluate fitness(x_i)
update p_best_i if improved
update g_best if global best

Here p_best_i is the particle's own best position, g_best is the swarm's best position, w, c1, and c2 are the inertia, cognitive, and social weights, and r1 and r2 are random factors, typically drawn uniformly from [0, 1].
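The update rule above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the lesson's code/main.py: the function name `pso`, the toy objective `toy_score`, the search box [-5, 5], and the default weights are all assumptions chosen for the example. It treats a higher score as better, and draws r1 and r2 fresh for every dimension, which is one common variant.

```python
import random

def pso(score, dim, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize score(x) over x in R^dim, starting from points in [-5, 5]^dim."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in xs]                 # personal best positions
    p_best_s = [score(x) for x in xs]           # personal best scores
    g = max(range(n_particles), key=lambda i: p_best_s[i])
    g_best, g_best_s = p_best[g][:], p_best_s[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vs[i][d] = (w * vs[i][d]
                            + c1 * r1 * (p_best[i][d] - xs[i][d])
                            + c2 * r2 * (g_best[d] - xs[i][d]))
                xs[i][d] += vs[i][d]
            s = score(xs[i])
            if s > p_best_s[i]:
                p_best[i], p_best_s[i] = xs[i][:], s
            if s > g_best_s:
                g_best, g_best_s = xs[i][:], s
    return g_best, g_best_s

def toy_score(x):
    """Hypothetical objective: peak of 0.0 at the origin."""
    return -sum(v * v for v in x)

best, best_s = pso(toy_score, dim=2)
print(best, best_s)
```

Because g_best is only replaced when a strictly better score appears, the returned score can never be worse than the best of the initial random positions.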

PSO on LLM outputs — LMPSO

arXiv:2504.09247 adapts PSO for LLM-generated structured outputs (math expressions, programs). Each particle is a candidate output. Velocity is a *prompt* that describes how to modify the current output toward the personal/global best. The LLM generates the new output from the velocity prompt. The "inertia" of the velocity is a prompt like "make small incremental changes."

This works well when:

It does not work well when fitness needs human review — the per-iteration cost becomes prohibitive.
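The velocity-as-prompt idea described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the prompt wording, the particle dictionary layout, the toy `fitness`, and `fake_llm` (a stand-in for a real model call) are all hypothetical.

```python
def velocity_prompt(current, p_best, g_best,
                    inertia="make small incremental changes"):
    """Build a text 'velocity': instructions for moving `current` toward the bests."""
    return (
        f"Current candidate:\n{current}\n\n"
        f"Your best so far:\n{p_best}\n\n"
        f"Best in the swarm:\n{g_best}\n\n"
        f"Rewrite the current candidate so it moves toward both bests; {inertia}. "
        "Return only the new candidate."
    )

def step(particle, g_best, llm, fitness):
    """One move for a single particle (a dict with 'x', 'p_best', 'p_best_f')."""
    prompt = velocity_prompt(particle["x"], particle["p_best"], g_best)
    particle["x"] = llm(prompt)                  # the LLM produces the new position
    f = fitness(particle["x"])
    if f > particle["p_best_f"]:                 # higher fitness is better here
        particle["p_best"], particle["p_best_f"] = particle["x"], f
    return particle

def fake_llm(prompt):
    """Hypothetical stand-in for a model call; always proposes the same expression."""
    return "x*x + 1"

def fitness(expr):
    """Hypothetical automatic scorer for candidate expressions."""
    return 0.9 if "x*x" in expr else 0.2

particle = {"x": "x + 1", "p_best": "x + 1", "p_best_f": 0.2}
step(particle, g_best="x*x", llm=fake_llm, fitness=fitness)
print(particle["p_best"], particle["p_best_f"])
```

The loop only needs `fitness` to be callable without a human in it, which is why per-iteration cost becomes the limiting factor once review is manual.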

Model Swarms

arXiv:2410.11163 takes PSO off the output layer and into the *model* layer. Each "particle" is an expert LLM, represented by its parameters. The swarm moves the parameters toward the collective best via a gradient-free update. Reported: 13.3% average gain over 12 baselines on 9 datasets, with just 200 instances.

The key insight is that LLM expert models already lie close together on a shared parameter manifold (adapter weights, LoRA deltas). PSO on this low-dimensional subspace is cheap and effective.

ACO refresher (Dorigo 1992)

Ant Colony Optimization: ants traverse a graph in which each edge carries a pheromone trail. At each step an ant chooses its next edge with probability weighted by pheromone strength. Ants that complete the task deposit pheromone in proportion to solution quality. Pheromone decays over time.
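The three mechanics above (pheromone-weighted choice, quality-proportional deposit, decay) can be sketched like this. It is a simplified illustration: selection uses pheromone alone, without the heuristic-desirability term that full ACO formulations usually include, and the edge names, quality values, and evaporation rate `rho` are assumptions made up for the example.

```python
import random

def choose_edge(pheromone, rng):
    """Pick an edge with probability proportional to its pheromone."""
    edges = list(pheromone)
    weights = [pheromone[e] for e in edges]
    return rng.choices(edges, weights=weights, k=1)[0]

def update(pheromone, edge, quality, rho=0.1):
    """Evaporate every trail by a factor (1 - rho), then deposit on the used edge."""
    for e in pheromone:
        pheromone[e] *= (1.0 - rho)
    pheromone[edge] += quality

rng = random.Random(0)
pheromone = {"A": 1.0, "B": 1.0, "C": 1.0}
quality = {"A": 0.2, "B": 0.9, "C": 0.5}    # hypothetical solution quality per edge

for _ in range(200):
    edge = choose_edge(pheromone, rng)
    update(pheromone, edge, quality[edge])

print(pheromone)
```

Evaporation keeps the total pheromone bounded, so an edge that stops being reinforced loses influence instead of dominating forever.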

AMRO-S — ACO for agent routing

arXiv:2603.12933 uses ACO for multi-agent routing. Each task-type is a "destination"; each agent is a possible route. Pheromones strengthen routes that produce good outputs. Key contributions:

The quality gate matters: without it, fast-but-wrong agents accrue pheromone, and the system locks in on bad routes.
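A quality gate of this kind can be sketched as a small router over a task-type × agent pheromone table. This is an illustration of the general pattern, not the AMRO-S algorithm: the class name `PheromoneRouter`, the gate threshold, the evaporation rate, and the agent names are all hypothetical.

```python
import random
from collections import defaultdict

class PheromoneRouter:
    """Routes tasks to agents by pheromone; deposits only on runs that pass the gate."""

    def __init__(self, agents, rho=0.05, gate=0.7, seed=0):
        self.agents = list(agents)
        self.rho = rho              # evaporation rate
        self.gate = gate            # minimum quality required for a deposit
        self.rng = random.Random(seed)
        # tau[task_type][agent]; every route starts at 1.0
        self.tau = defaultdict(lambda: {a: 1.0 for a in self.agents})

    def route(self, task_type):
        """Pick an agent with probability proportional to its pheromone."""
        row = self.tau[task_type]
        weights = [row[a] for a in self.agents]
        return self.rng.choices(self.agents, weights=weights, k=1)[0]

    def feedback(self, task_type, agent, quality):
        """Evaporate the row, then deposit only if quality passes the gate."""
        row = self.tau[task_type]
        for a in row:
            row[a] *= (1.0 - self.rho)
        if quality >= self.gate:
            row[agent] += quality
            return True
        return False

router = PheromoneRouter(["fast", "careful"])
router.feedback("sql", "fast", quality=0.3)       # below the gate: evaporation only
router.feedback("sql", "careful", quality=0.9)    # above the gate: deposit
print(router.tau["sql"], router.route("sql"))
```

Since `route` only reads the table and `feedback` only writes it, `feedback` could be called later from a background worker, which is one way to keep learning off the inference path.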

When to use PSO / ACO for LLMs

Use PSO when:

Use ACO when:

Do not use either when:

Why bio-inspired still wins

Gradient-based methods need differentiable signals. LLM outputs and routing decisions are not trivially differentiable. Pseudo-gradient methods (reinforcement-learned routers, DPO-style prompt tuners) work but need expensive training.

PSO and ACO need only an *evaluator* function. If you can score a candidate output or a routing decision, you can optimize over the space. That makes the bar for applicability much lower.

Practical limits

Build It

code/main.py implements:

Run:

python3 code/main.py

Expected output:

Use It

outputs/skill-swarm-optimizer.md helps choose between PSO, ACO, genetic algorithms, and gradient-based optimizers for LLM / agent optimization problems.

Ship It

Exercises

  1. Run code/main.py. Observe LMPSO convergence. Vary population size 5, 10, 20, 50. At what size does time-to-converge saturate?
  2. Implement a "catastrophic drift" experiment: after iteration 30, change the fitness function. How fast does PSO adapt? Does resetting p_best help?
  3. Add a quality gate to AMRO-S: pheromone deposit only on runs with eval score > 0.7. How does this change convergence vs the un-gated version?
  4. Read LMPSO (arXiv:2504.09247). Map the paper's "velocity as a prompt" back to your numeric velocity. What is lost in the simulation and what is preserved?
  5. Read AMRO-S (arXiv:2603.12933). Implement the decoupled "inference fast-path" with asynchronous pheromone update. How does this change system latency under sustained load?

Key Terms

| Term | What people say | What it actually means |
|---|---|---|
| PSO | "Particle Swarm Optimization" | Kennedy-Eberhart 1995. Population-based gradient-free optimizer. |
| ACO | "Ant Colony Optimization" | Dorigo 1992. Path/route optimization via pheromone trails. |
| LMPSO | "PSO with LLM generation" | arXiv:2504.09247. Velocity is a prompt; LLM produces candidates. |
| Model Swarms | "PSO on expert weights" | arXiv:2410.11163. Gradient-free update on model parameter subspace. |
| AMRO-S | "ACO for agent routing" | arXiv:2603.12933. Pheromone matrix over task-type × agent. |
| p_best / g_best | "Personal / global best" | Per-particle and swarm-wide best solutions found so far. |
| Pheromone | "Routing memory" | Strength on an edge; decays over time; deposits on quality. |
| Quality-gated update | "Only learn from good runs" | Pheromone deposit conditioned on quality check. |
| Catastrophic drift | "Distribution shift" | Fitness landscape changes; old p_best and pheromones become stale. |

Further Reading