Multi-Session Handoff

> The session is going to end. The work is not. The handoff packet is the artifact that turns "the agent worked for an hour" into "the next session is productive in the first minute." Build it on purpose, not as an afterthought.

Type: Build

Languages: Python (stdlib)

Prerequisites: Phase 14 · 34 (Repo Memory), Phase 14 · 38 (Verification), Phase 14 · 39 (Reviewer)

Time: ~50 minutes

Learning Objectives

The Problem

The session ends. The agent says "great, we made progress." The next session opens. The next agent asks "where did we leave off?" The first agent's answer is gone. The next agent rediscovers, re-runs the same commands, re-asks the human the same questions, and burns thirty minutes recovering the last thirty seconds of the previous session.

The cost of a bad handoff is paid every session for the life of the task. The fix is a packet generated automatically at session end: what changed, why, what was tried, what failed, what is left, what to do first next time.

The Concept

flowchart LR State[agent_state.json] --> Generator[generate_handoff.py] Verdict[verification_report.json] --> Generator Review[review_report.json] --> Generator Feedback[feedback_record.jsonl] --> Generator Generator --> Handoff[handoff.md + handoff.json] Handoff --> Next[Next Session]

Seven fields every handoff carries

Field Question it answers
summary One paragraph of what was done
changed_files The diff at a glance
commands_run What was actually executed
failed_attempts What was tried and why it did not work
open_risks What could bite next session, with severity
next_action The first concrete step next session takes
verdict_pointer Path to the verification + review reports

The next_action field is the load-bearing one. A handoff with everything except next_action is a status report, not a handoff.

Handoffs are generated, not written

A hand-written handoff is a handoff that gets skipped on a hard day. The generator reads the workbench artifacts and emits the packet. The agent's job is to leave the workbench in a state the generator can summarize, not to write the summary.

Two forms: human-readable and machine-readable

handoff.md is what the human reads. handoff.json is what the next agent loads. Both come from the same source artifacts. If they diverge, the JSON wins.

Feedback log trimming

The full feedback_record.jsonl may be hundreds of entries. The handoff carries only the last K plus every entry with a non-zero exit. The next session loads the full log if it needs to, but the packet stays small.

Build It

code/main.py implements:

Run it:

python3 code/main.py

Output: a printed handoff body, plus both files on disk.

Production patterns in the wild

Codex CLI, Claude Code, and OpenCode each ship a different compaction story; the structured handoff packet sits on top of all three.

Compaction strategies vary; the packet schema does not. Codex CLI's POST /v1/responses/compact is a server-side opaque AES blob (fast path for OpenAI models); the fallback is a local "handoff summary" appended as a _summary user-role message. Claude Code runs five-stage progressive compaction at 95% of context. OpenCode does timestamp-based message hiding plus a 5-heading LLM summary. Three different mechanisms, same need: serialize what survives compression into a portable artifact. The packet is that artifact.

Fresh-session handoff is not compaction. Compaction extends a session; handoff closes one cleanly and starts the next. The Hermes Issue #20372 framing (April 2026) is right: when in-place compression starts degrading, the agent should write a compact handoff, end the session, and resume in fresh context. The packet is what makes that transition cheap. The mistake is to keep compressing until quality collapses; the fix is to budget for an early, clean handoff.

One active handoff per branch and topic. Multi-agent coordination breaks down on stale handoffs more than on bad model output. Always include branch, last_known_good_commit, and a status of active | superseded | archived. Stale handoffs are archived; only the active one drives the next session. This is the difference between handoff-as-notes and handoff-as-state.

Wrap up before 50-75% context, not at the wall. The hand-written-pattern playbook (CLAUDE.md + HANDOVER.md) reports best results when the session ends at 50-75% context budget instead of 95%. The packet generator runs cleanly before compression artifacts pollute the source state. Cheap to write while context is intact; expensive when the model is already losing its place.

Use It

Production patterns:

The packet is small, regular, and cheap to produce. The cost saving compounds with every session.

Ship It

outputs/skill-handoff-generator.md produces a generator tuned to a project's artifact paths, an end-of-session hook that runs it, and a handoff.json schema the next agent reads on startup.

Exercises

  1. Add an assumptions_to_validate field that surfaces every assumption the builder logged but the reviewer did not score above 1.
  2. Trim the feedback summary differently for failing runs versus passing ones. Defend the asymmetry.
  3. Include a "questions for the human" list. What is the threshold for a question to make it into the packet versus into a chat message?
  4. Make the generator idempotent: running it twice produces the same packet. What needs to be stable for that to hold?
  5. Add a "next session prereqs" section listing exactly the artifacts the next session must load before acting.

Key Terms

Term What people say What it actually means
Handoff packet "Session summary" Generated artifact carrying the seven fields, both markdown and JSON
Next action "What to do first" The one concrete step that starts the next session
Feedback trim "Log summary" Last K records plus every non-zero exit
Status report "What we did" A document missing next_action; useful, but not a handoff
Verdict pointer "Receipt" Path to the verification + review reports for traceability

Further Reading