Scope Contracts and Task Boundaries
> The model does not know where the work ends. A scope contract is a per-task file that says where the work begins, where it ends, and how to roll back if it spills. The contract turns "stay in scope" from a wish into a check.
Type: Build
Languages: Python (stdlib)
Prerequisites: Phase 14 · 32 (Minimal Workbench), Phase 14 · 33 (Rules as Constraints)
Time: ~50 minutes
Learning Objectives
- Write a scope contract that an agent reads at task start and a verifier reads at task end.
- Specify allowed files, forbidden files, acceptance criteria, rollback plan, and approval boundaries.
- Implement a scope checker that compares a diff against the contract and flags violations.
- Make scope creep visible, automatic, and reviewable.
The Problem
Agents creep. The task is "fix the login bug." The diff touches the login route, the email helper, the database driver, the README, and the release script. Each touch had a plausible reason in the moment. Together they are a different change than the one that was reviewed.
Scope creep is the most under-monitored failure mode in agent work because the agent narrates each step in good faith. The fix is not a stricter prompt. The fix is a contract on disk that says what was promised and a check that compares the result against the promise.
The Concept
What goes in a scope contract
| Field | Purpose |
|---|---|
| task_id | Links to the task on the board |
| goal | One sentence the reviewer can verify |
| allowed_files | Globs the agent may write |
| forbidden_files | Globs the agent must not touch, even by accident |
| acceptance_criteria | Test commands or assertion lines that prove done |
| rollback_plan | One paragraph the operator can execute if a halt is required |
| approvals_required | Actions outside scope that need explicit human sign-off |
A contract without forbidden_files is incomplete. The negative space is half the contract.
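The sketch below shows the contract as a Python dict. It comes with a completeness check that refuses a contract in three cases: any field from the table is missing, `forbidden_files` is empty, or `acceptance_criteria` or `rollback_plan` is empty. The example values and the `check_completeness` helper are hypothetical. The lesson's own `code/main.py` may structure this differently.

```python
REQUIRED_FIELDS = (
    "task_id", "goal", "allowed_files", "forbidden_files",
    "acceptance_criteria", "rollback_plan", "approvals_required",
)

# Hypothetical contract for a "fix the login bug" task.
LOGIN_FIX = {
    "task_id": "TASK-142",
    "goal": "POST /login returns 401, not 500, for an unknown email.",
    "allowed_files": ["app/auth/**/*.py", "tests/test_login*.py"],
    "forbidden_files": ["scripts/**", "migrations/**", "config/prod/**"],
    "acceptance_criteria": ["python3 -m pytest tests/test_login.py"],
    "rollback_plan": "Revert the task branch merge commit and redeploy the previous tag.",
    "approvals_required": ["schema migration", "dependency bump"],
}


def check_completeness(contract: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract can go to review."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in contract]
    if not contract.get("forbidden_files"):
        problems.append("forbidden_files is empty: the negative space is half the contract")
    if not contract.get("acceptance_criteria"):
        problems.append("acceptance_criteria is empty: nothing proves the task is done")
    if not str(contract.get("rollback_plan", "")).strip():
        problems.append("rollback_plan is blank: a contract you cannot roll back from is not approvable")
    return problems
```

For this contract, `check_completeness(LOGIN_FIX)` returns an empty list. Setting `forbidden_files` to `[]` returns exactly one problem. Catching the gap when the contract is written is cheaper than catching it at review.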
Globs, not raw paths
Real repos move files. Pin contracts to globs (`app/**/*.py`, `tests/test_signup*.py`) so a refactor between sessions does not invalidate the contract.
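The standard library's `fnmatch` treats `/` as an ordinary character, so its `*` matches across directories. That is looser than a contract author usually intends. The sketch below spells out its glob semantics instead, assuming gitignore-like rules: `**/` spans zero or more whole directories, a trailing `**` matches anything, `*` stays inside one path segment, and `?` matches one non-slash character. It is a simplification with no character classes and no negation. `glob_to_regex` and `matches` are hypothetical helper names.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a gitignore-style glob into a regex (simplified, assumed semantics)."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")      # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")            # trailing '**': anything below this point
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")         # within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(path: str, globs: list[str]) -> bool:
    """True if the repo-relative path fully matches any glob in the list."""
    return any(glob_to_regex(g).fullmatch(path) for g in globs)
```

`app/**/*.py` matches both `app/login.py` and `app/auth/login.py`. Moving the login handler into a subpackage between sessions therefore leaves the contract valid, while `tests/test_signup*.py` still rejects a test moved into `tests/unit/`.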
Rollback is part of scope
Listing how to roll back forces the contract author to think about what could go wrong. A contract you cannot roll back from is a contract that should not be approved.
Scope check is a diff check
The agent writes a diff. The checker reads the diff, the allowed globs, the forbidden globs, and a list of any acceptance commands that ran. Each violation is a tagged finding the verification gate can refuse.
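The checker's first step turns the diff into a list of touched paths. The sketch below assumes the input is `git diff` output in unified format. It reads only file-section headers (`diff --git`, `---`/`+++`, `rename from`/`rename to`), so removed content lines that start with `---` are not mistaken for headers. `touched_files` is a hypothetical name. The sketch does not handle quoted paths, and it misses files whose section has no `---`/`+++` or rename lines, such as binary or mode-only changes. When only the path list is needed, `git diff --name-only` gives it directly.

```python
def touched_files(diff_text: str) -> list[str]:
    """Repo-relative paths named in the file headers of a unified git diff."""
    paths: list[str] = []

    def add(path: str) -> None:
        # /dev/null marks an added or deleted file; skip it and keep first-seen order.
        if path != "/dev/null" and path not in paths:
            paths.append(path)

    in_header = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_header = True               # a new file section starts
        elif line.startswith("@@"):
            in_header = False              # hunks start; later lines are content
        elif not in_header:
            continue
        elif line.startswith(("--- ", "+++ ")):
            path = line[4:].strip()
            add(path[2:] if path.startswith(("a/", "b/")) else path)
        elif line.startswith(("rename from ", "rename to ")):
            add(line.split(" ", 2)[2])     # both sides of a rename count as touched
    return paths
```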
Build It
code/main.py implements:
- A `scope_contract.json` schema (subset of JSON Schema, glob arrays).
- A diff parser that turns a list of touched files plus a list of run commands into a `RunSummary`.
- A `scope_check` that returns `(violations, in_scope, off_scope)` against the contract.
- Two demo runs: one that stays in scope, one that creeps. The checker flags the creep with the exact file and reason.
Run it:
```bash
python3 code/main.py
```
Output: the contract, the two runs, the per-run verdicts, and a saved scope_report.json.
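The sketch below condenses the `RunSummary` and `scope_check` shapes described above and includes both demo runs. It is not the lesson's `code/main.py`, and it makes three simplifying choices. It uses `fnmatch.fnmatchcase` for brevity, so `*` also matches `/`. A forbidden match overrides an allowed one. Acceptance commands are compared as exact strings, which is an assumption. The finding tags (`forbidden_write`, `off_scope_write`, `acceptance_not_run`) are hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class RunSummary:
    touched_files: list[str]                                  # paths the agent wrote
    commands_run: list[str] = field(default_factory=list)    # commands it executed


def _hits(path: str, globs: list[str]) -> bool:
    # fnmatch's '*' also matches '/', so these globs are looser than gitignore-style ones.
    return any(fnmatchcase(path, g) for g in globs)


def scope_check(contract: dict, run: RunSummary):
    """Return (violations, in_scope, off_scope) for one run against one contract."""
    violations, in_scope, off_scope = [], [], []
    for path in run.touched_files:
        if _hits(path, contract["forbidden_files"]):          # forbidden wins over allowed
            off_scope.append(path)
            violations.append({"tag": "forbidden_write", "file": path,
                               "reason": "matches a forbidden_files glob"})
        elif _hits(path, contract["allowed_files"]):
            in_scope.append(path)
        else:
            off_scope.append(path)
            violations.append({"tag": "off_scope_write", "file": path,
                               "reason": "matches no allowed_files glob"})
    for cmd in contract["acceptance_criteria"]:                # exact-string match (assumption)
        if cmd not in run.commands_run:
            violations.append({"tag": "acceptance_not_run", "file": None,
                               "reason": f"never ran: {cmd}"})
    return violations, in_scope, off_scope


CONTRACT = {
    "allowed_files": ["app/auth/*", "tests/test_login*.py"],
    "forbidden_files": ["scripts/*", "migrations/*", "config/prod/*"],
    "acceptance_criteria": ["python3 -m pytest tests/test_login.py"],
}
CLEAN = RunSummary(["app/auth/login.py", "tests/test_login.py"],
                   ["python3 -m pytest tests/test_login.py"])
CREEP = RunSummary(["app/auth/login.py", "app/email/helpers.py", "scripts/release.sh"],
                   ["python3 -m pytest tests/test_login.py"])

for name, run in (("clean", CLEAN), ("creep", CREEP)):
    violations, _, off_scope = scope_check(CONTRACT, run)
    print(name, "PASS" if not violations else "FAIL", off_scope)
```

Running it prints `clean PASS []` and `creep FAIL ['app/email/helpers.py', 'scripts/release.sh']`. Each flagged file also carries a tagged reason the verification gate can act on.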
Production patterns in the wild
A practitioner running "specsmaxxing" (writing scope contracts in YAML before invoking the agent) reports that the rabbit-hole rate dropped from 52% to 21% in three weeks without changing the agent. The contract did the work, not the model. Four patterns make the gain stick.
Violation budgets, not binary failures. agent-guardrails (the OSS merge gate used by Claude Code, Cursor, Windsurf, and Codex via MCP) ships a `violationBudget` per task: minor scope slips within budget surface as warnings, and the merge gate refuses only when the budget is exceeded. Pair it with `violationSeverity: "error" | "warning"`. The budget is the difference between a gate that ships and a gate the team disables because it hates it.
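The sketch below shows a budgeted gate, assuming each finding already carries a `severity` of `error` or `warning`. Any error blocks. Warnings block only once their count exceeds the budget. It illustrates the idea and is neither agent-guardrails' implementation nor its configuration format. `gate` and `violation_budget` are made-up names.

```python
from collections import Counter


def gate(findings: list[dict], violation_budget: int) -> tuple[str, list[str]]:
    """Return ('pass' | 'warn' | 'block', messages) for a run's scope findings."""
    counts = Counter(f["severity"] for f in findings)    # missing keys count as 0
    messages = [f"{f['severity']}: {f['file']}" for f in findings]
    if counts["error"] > 0:
        return "block", messages                          # errors ignore the budget
    if counts["warning"] > violation_budget:
        return "block", messages + [
            f"warning budget exceeded: {counts['warning']} > {violation_budget}"]
    if counts["warning"]:
        return "warn", messages                           # surfaced, but merge proceeds
    return "pass", messages
```

With a budget of 2, one stray `docs/` edit returns `warn` and three return `block`. A single `error` blocks regardless of the budget.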
Severity asymmetry by path family. Off-scope writes to `docs/` usually warn; off-scope writes to `scripts/`, `migrations/`, or `config/prod/` always block. This asymmetry has to live in the contract, not in the runtime, because it is project-specific and changes per task.
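One way to keep the asymmetry in the contract is an ordered list of `(glob, severity)` rules in which the first match wins. The sketch uses a hypothetical `SEVERITY_RULES` field. Unmatched paths default to `error`, an assumption that fails closed: a path family nobody listed blocks rather than slipping through. `fnmatch` is used for brevity, so `*` also crosses `/`.

```python
from fnmatch import fnmatchcase

# Hypothetical contract field: ordered (glob, severity) rules, first match wins.
SEVERITY_RULES = (
    ("docs/*", "warning"),
    ("scripts/*", "error"),
    ("migrations/*", "error"),
    ("config/prod/*", "error"),
)


def off_scope_severity(path: str, rules=SEVERITY_RULES, default: str = "error") -> str:
    """Severity for an off-scope write; paths matching no rule get the default."""
    for glob, severity in rules:
        if fnmatchcase(path, glob):
            return severity
    return default
```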
Time and network budgets next to file budgets. A time_budget_minutes field bounds the wall clock; the runtime refuses to continue past it without re-approval. A network_egress allowlist on hostnames prevents the agent from quietly hitting an external API that was not part of the task. These are scope dimensions too; the file globs are necessary, not sufficient.
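The sketch below covers both extra budgets. It assumes the runtime records a start time from `time.monotonic()` and logs the URLs the agent requested. `network_egress` follows the convention given under multi-contract merge semantics: `None` means no enforcement, `[]` means deny-all, and a list is an exact-hostname allowlist. The function names are hypothetical.

```python
from __future__ import annotations

import time
from urllib.parse import urlsplit


def time_exceeded(started_at: float, budget_minutes: float, now: float | None = None) -> bool:
    """True once elapsed time since started_at (a time.monotonic() reading) passes the budget."""
    now = time.monotonic() if now is None else now
    return now - started_at > budget_minutes * 60


def egress_violations(urls: list[str], allowlist: list[str] | None) -> list[str]:
    """Hostnames contacted outside the allowlist, deduplicated in first-seen order."""
    if allowlist is None:                       # no enforcement configured
        return []
    hosts = dict.fromkeys(h for h in (urlsplit(u).hostname for u in urls) if h)
    return [h for h in hosts if h not in allowlist]   # an empty allowlist denies every host
```

A runtime would call both before each step and pause for re-approval when either reports a violation.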
Multi-contract merge semantics (least privilege). When two scope contracts apply (e.g., a project-wide contract plus a task-specific one), the merge is: intersect allowed_files (both contracts must permit the path), union forbidden_files (either can prohibit), time_budget_minutes is the most restrictive (min), approvals_required accumulates. network_egress is None for no enforcement, [] for deny-all, [...] as an allowlist; under merge, None defers to the other side, two lists intersect, and deny-all stays deny-all. State this in the contract schema so the merge is mechanical and reviewable.
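The sketch below implements the least-privilege merge, assuming both contracts are dicts with the field names used in this lesson. Glob lists cannot be intersected textually. The merged contract therefore keeps each side's `allowed_files` as a separate group, and a path must match a glob in every group. `merge_contracts`, `is_allowed`, and the `allowed_file_groups` key are hypothetical. `fnmatch` is used for brevity.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def _merge_egress(a: list[str] | None, b: list[str] | None) -> list[str] | None:
    if a is None:                        # None defers to the other side
        return b
    if b is None:
        return a
    return [h for h in a if h in b]      # two allowlists intersect; [] stays deny-all


def _union(a: list[str], b: list[str]) -> list[str]:
    return list(dict.fromkeys(a + b))    # order-preserving, no duplicates


def merge_contracts(a: dict, b: dict) -> dict:
    budgets = [t for t in (a.get("time_budget_minutes"), b.get("time_budget_minutes")) if t is not None]
    return {
        "allowed_file_groups": [a["allowed_files"], b["allowed_files"]],   # intersect
        "forbidden_files": _union(a["forbidden_files"], b["forbidden_files"]),
        "time_budget_minutes": min(budgets, default=None),                 # most restrictive
        "approvals_required": _union(a["approvals_required"], b["approvals_required"]),
        "network_egress": _merge_egress(a.get("network_egress"), b.get("network_egress")),
    }


def is_allowed(path: str, merged: dict) -> bool:
    if any(fnmatchcase(path, g) for g in merged["forbidden_files"]):
        return False                     # either contract can prohibit
    return all(any(fnmatchcase(path, g) for g in group)
               for group in merged["allowed_file_groups"])   # both must permit


PROJECT = {"allowed_files": ["app/*", "tests/*"], "forbidden_files": ["config/prod/*"],
           "time_budget_minutes": 60, "approvals_required": ["dependency bump"],
           "network_egress": None}
TASK = {"allowed_files": ["app/auth/*", "tests/test_login*"],
        "forbidden_files": ["app/auth/secrets.py"], "time_budget_minutes": 30,
        "approvals_required": ["schema migration", "dependency bump"],
        "network_egress": ["pypi.org"]}
MERGED = merge_contracts(PROJECT, TASK)
```

`app/auth/login.py` passes because both contracts permit it. `app/email/helpers.py` fails because only the project-wide contract does.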
Use It
Production patterns:
- Claude Code slash commands. A `/scope` command writes the contract and pins it as session context. Subagents read the contract before acting.
- GitHub PRs. Include the contract as JSON in the PR body or commit it as a checked-in artifact. CI runs the scope checker against the merge diff.
- LangGraph interrupts. A scope violation triggers an interrupt; the handler asks the human whether the contract needs to grow or the agent needs to back off.
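The sketch below covers the CI step. It assumes the pipeline pipes `git diff --name-only` output for the merge range into a small script that exits non-zero on any forbidden or off-scope path. The script name `scope_ci.py`, the contract file name, and the `ci_exit_code` helper are hypothetical. `fnmatch` is used for brevity.

```python
import json
import sys
from fnmatch import fnmatchcase


def ci_exit_code(contract: dict, touched: list[str]) -> int:
    """0 if every touched path is allowed and none is forbidden, else 1."""
    bad = []
    for path in touched:
        if any(fnmatchcase(path, g) for g in contract["forbidden_files"]):
            bad.append(f"forbidden: {path}")
        elif not any(fnmatchcase(path, g) for g in contract["allowed_files"]):
            bad.append(f"off-scope: {path}")
    for line in bad:
        print(line, file=sys.stderr)     # CI log shows each offending path
    return 1 if bad else 0


# Run only when a contract path is given, for example:
#   git diff --name-only origin/main...HEAD | python3 scope_ci.py scope_contract.json
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        contract = json.load(fh)
    touched = [line.strip() for line in sys.stdin if line.strip()]
    sys.exit(ci_exit_code(contract, touched))
```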
The contract travels with the task. When the task closes, the contract is archived under outputs/scope/closed/.
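The sketch below shows the archiving step, assuming each contract carries a unique `task_id` that can serve as a file name. Refusing to overwrite an existing archive entry is a design assumption, so a closed contract cannot be silently rewritten. `archive_contract` is a hypothetical helper.

```python
import json
from pathlib import Path


def archive_contract(contract: dict, root: Path) -> Path:
    """Write a closed task's contract to <root>/outputs/scope/closed/<task_id>.json."""
    closed_dir = root / "outputs" / "scope" / "closed"
    closed_dir.mkdir(parents=True, exist_ok=True)
    target = closed_dir / f"{contract['task_id']}.json"
    if target.exists():
        # Assumption: archived contracts are append-only.
        raise FileExistsError(f"{target} is already archived")
    target.write_text(json.dumps(contract, indent=2, sort_keys=True), encoding="utf-8")
    return target
```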
Ship It
outputs/skill-scope-contract.md generates a scope contract for a task description and a glob-aware checker that runs in CI on every agent diff.
Exercises
- Add a `network_egress` field listing allowed external hosts. Refuse runs that touch other hosts.
- Extend the checker to fail soft on `docs/` and hard on `scripts/`. Justify the asymmetry.
- Make the contract derive `allowed_files` from a `goal` field using a static rule set (no LLM). What goes wrong on the first edge case?
- Add a `time_budget_minutes` field and refuse to continue once the wall clock exceeds it.
- Run two contracts against the same diff. What is the right merge semantics when both apply?
Key Terms
| Term | What people say | What it actually means |
|---|---|---|
| Scope contract | "The task brief" | Per-task JSON listing allowed/forbidden files, acceptance, rollback |
| Scope creep | "It also touched..." | Files outside the contract changed in the same task |
| Rollback plan | "We can revert" | The one-paragraph operator runbook for halting |
| Approval boundary | "Needs sign-off" | An action listed in the contract as requiring explicit human approval |
| Diff check | "Path audit" | Comparing touched files against the contract globs |
Further Reading
- LangGraph human-in-the-loop interrupts
- OpenAI Agents SDK tool approval policies
- logi-cmd/agent-guardrails (merge gates and scope validation; violation budgets, severity tiers)
- Dev|Journal, Preventing AI Agent Configuration Drift with Agent Contract Testing (`--strict` mode without external deps)
- Agentic Coding Is Not a Trap (production logs; specsmaxxing receipts, 52% → 21%)
- OpenCode permission globs (fine-grained per-permission scope)
- Knostic, AI Coding Agent Security: Threat Models and Protection Strategies (scope as part of least privilege)
- Augment Code, AI Spec Template (three-tier boundary system: must/ask/never)
- Phase 14 · 27 (prompt injection defenses that pair with scope locks)
- Phase 14 · 33 (the rule set this contract specializes per task)
- Phase 14 · 38 (the verification gate the checker reports into)