Context Compaction

The mechanism claude-code uses to keep the model's context below the "stupid threshold." Internally (per jared-zoneraich's read) it is referred to as H2A: an async I/O buffer that decouples terminal/reasoning output from what actually goes back into the model.

Behaviour

  • Triggers at roughly 92% context fill.
  • Strategy: summarise head and tail, drop the middle (classic "lost-in-the-middle"-aware compression), then continue the loop.
  • Long-term memory is offloaded to the file system via bash — the agent writes markdown notes and re-reads them, rather than keeping everything in-context. Zoneraich predicts every chat UI will ship a sandbox for exactly this reason.
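The strategy above can be sketched in a few lines. This is a hedged toy, not claude-code's internals: `Message`, `summarise`, the 4-chars-per-token heuristic, and the window sizes are all made-up stand-ins; only the shape (threshold trigger, summarise head and tail, drop the middle) comes from the note.

```python
from dataclasses import dataclass

CONTEXT_LIMIT = 8_000     # toy context window, in tokens (not the real limit)
COMPACT_AT = 0.92         # trigger at roughly 92% fill
HEAD, TAIL = 10, 20       # how many messages count as head / tail (arbitrary)

@dataclass
class Message:
    role: str
    text: str

def tokens(msgs):
    # crude ~4-chars-per-token heuristic in place of a real tokenizer
    return sum(len(m.text) // 4 for m in msgs)

def summarise(msgs, label):
    # placeholder: in a real agent this would be an LLM summarisation call
    return Message("system", f"[{label} summary of {len(msgs)} messages]")

def maybe_compact(history):
    """Summarise head and tail, drop the middle, then continue the loop."""
    if tokens(history) < COMPACT_AT * CONTEXT_LIMIT:
        return history                      # still under the threshold
    head, tail = history[:HEAD], history[-TAIL:]
    # everything between head and tail is dropped outright,
    # on the "lost-in-the-middle" logic that it contributes least
    return [summarise(head, "head"), summarise(tail, "tail")]
```

The point of the shape: the check runs every loop iteration, so the model never sees a history much past the threshold.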

Alternative designs mentioned

  • Amp's handoff — instead of compacting in place, spawn a fresh thread and hand over a hand-written briefing. "Faster than reloading, like switching weapons in Call of Duty." Zoneraich leans toward handoff-style as the winning pattern.
  • Cursor Composer — distilled fast model absorbs some of the compaction burden.
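For contrast, a minimal sketch of the handoff pattern: rather than compacting the current thread in place, spawn a fresh one seeded only with a briefing. The function names are illustrative, not Amp's actual API.

```python
def write_briefing(old_thread):
    # in practice the briefing is hand-written (or LLM-drafted):
    # the goal, decisions made so far, and open tasks
    return f"Briefing distilled from {len(old_thread)} messages."

def handoff(old_thread):
    briefing = write_briefing(old_thread)
    # the new thread starts near-empty: "faster than reloading"
    return [("system", briefing)]
```

The design difference from in-place compaction: the old thread is never mutated, and the new one carries only what the briefing chose to keep.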

Why it matters

Context is "the boogeyman" of agent design — the claude-code-master-loop only works because something is actively pruning. Compaction is the reason simple loops beat DAGs: DAGs pay their context cost per node; a well-compacted loop amortises it.
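The amortisation claim can be made concrete with toy arithmetic. The numbers below are illustrative, not measured: a DAG re-pays full context at every node, while a compacted loop pays it once and then a small residue per step.

```python
def dag_cost(nodes, ctx):
    # every node re-reads the full context
    return nodes * ctx

def loop_cost(steps, ctx, compacted):
    # first step pays full context; later steps see the pruned residue
    return ctx + (steps - 1) * compacted
```

With, say, 10 steps, a 100k-token context, and an 8k compacted residue, the loop pays 172k tokens where the DAG pays 1M.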