Driving into Mud

mitchell-hashimoto's phrase for the failure mode of long, under-supervised agent sessions:

"It kind of feels like I'm driving into mud. At first it's fine, then the more times I do that, the slower each iteration gets, because every time it makes a new change, it breaks previous things, and eventually it's just… I'm stuck in the mud and really unhappy with where the codebase has ended up."

The shape

  1. Agent ships a plausible first change. ✅
  2. Next change silently breaks an earlier one. ⚠️
  3. Agent fixes the regression by bending something else. ⚠️⚠️
  4. Codebase becomes a web of near-right pieces that are jointly wrong. 🫠
  5. Owner either burns time restoring invariants, or discards the session.

Fundamentally: finite context-window memory, stochastic outputs, and the agent's bias toward appearing to make progress all compound. Without a human architect holding the invariants, coherence drifts.

Counter-practices from the talk

  • Tight scope per step. Small, well-specified units, not "fix this whole feature."
  • Snapshot between steps. jujutsu snapshots so it's cheap to rewind when you notice the drift.
  • Diff before run. After each successful-looking iteration, scan the diff for anything weird (e.g. an agent spawning a subprocess to delete files) before running.
  • Parallel competitions for low-confidence tasks. Multiple agents on the same prompt in parallel Ghostty clones — pick the best output.
  • Disable self-verification when it can't actually verify. Ghostty-Mac has no screenshot/build feedback, so "I ran the tests and it works" reports are worse than silence.
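The parallel-competition bullet can be sketched as a small harness. This is a minimal sketch, not the talk's actual tooling: `run_agent` is a hypothetical stub (a real version would shell out to the agent CLI inside each clone directory and score the resulting diff), and the clone names are made up.

```python
import concurrent.futures
import random

def run_agent(prompt: str, clone_dir: str) -> tuple[str, float]:
    """Hypothetical stand-in for launching one agent in one clone.

    A real harness would spawn the agent process in `clone_dir` and
    score its output; here we fake a deterministic score so the
    sketch is runnable on its own.
    """
    random.seed(hash(clone_dir))  # stable per clone within one run
    return clone_dir, random.random()

def compete(prompt: str, clones: list[str]) -> str:
    """Run the same prompt across parallel clones; keep the best one."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(clones)) as pool:
        results = list(pool.map(lambda d: run_agent(prompt, d), clones))
    best_dir, _best_score = max(results, key=lambda r: r[1])
    return best_dir

clones = [f"ghostty-clone-{i}" for i in range(4)]
winner = compete("fix the resize flicker", clones)
```

The point of the pattern is that for low-confidence tasks, N cheap attempts plus one human judgment beats one long supervised session: you only review the winning diff.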

Bridge

Inverse of agentic-engineering's "quality bar preserved" promise. Driving-into-mud is what vibe-coding collapses into when applied to serious software without the senior owner staying in the loop — the failure mode the agent-as-junior-engineer frame is designed to prevent.