Driving into Mud
Mitchell Hashimoto's phrase for the failure mode of long, under-supervised agent sessions:
"It kind of feels like I'm driving into mud. At first it's fine, then the more times I do that, the slower each iteration gets, because every time it makes a new change, it breaks previous things, and eventually it's just… I'm stuck in the mud and really unhappy with where the codebase has ended up."
The shape
- Agent ships a plausible first change. ✅
- Next change silently breaks an earlier one. ⚠️
- Agent fixes the regression by bending something else. ⚠️⚠️
- Codebase becomes a web of near-right pieces that are jointly wrong. 🫠
- Owner either burns time restoring invariants, or discards the session.
Fundamentally, three things compound: limited context-window memory, stochastic behavior, and the agent's bias toward appearing to make progress. Without a human architect holding the invariants, coherence drifts.
Counter-practices from the talk
- Tight scope per step. Small, well-specified units, not "fix this whole feature."
- Snapshot between steps. Take a Jujutsu (jj) snapshot after each step so it is cheap to rewind when you notice drift.
- Diff before run. After each successful-looking iteration, scan the diff for anything weird (e.g. an agent spawning a subprocess to delete files) before running.
- Parallel competitions for low-confidence tasks. Multiple agents on the same prompt in parallel Ghostty clones — pick the best output.
- Disable self-verification when it can't actually verify. Ghostty-Mac has no screenshot/build feedback, so "I ran the tests and it works" reports are worse than silence.
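The "diff before run" check can be partly mechanized. The following is a minimal sketch, not something shown in the talk: it assumes the change is available as unified-diff text (for example from `git diff` or `jj diff --git`), and the function name `flag_suspicious_additions` and the pattern list are hypothetical choices for illustration. It only surfaces lines worth a human look; it does not replace reading the diff.

```python
import re

# Hypothetical patterns for "anything weird"; tune these for your own codebase.
SUSPICIOUS_PATTERNS = {
    "spawns a subprocess": re.compile(r"\bsubprocess\.|\bos\.system\("),
    "deletes files": re.compile(
        r"\brm\s+-\w*r|\bshutil\.rmtree\(|\bos\.(remove|unlink)\("
    ),
}


def flag_suspicious_additions(diff_text):
    """Return (file, added_line, reason) for each added line matching a pattern."""
    findings = []
    current_file = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header for the new version, e.g. "+++ b/tools/clean.py".
            # (A sketch-level shortcut: an added line whose content begins
            # with "++ " would be misread as a header.)
            path = line[4:].strip()
            current_file = path[2:] if path.startswith("b/") else path
            continue
        if not line.startswith("+"):
            continue  # skip context lines, removed lines, and hunk headers
        added = line[1:]
        for reason, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(added):
                findings.append((current_file, added.strip(), reason))
    return findings


example_diff = """\
--- a/tools/clean.py
+++ b/tools/clean.py
@@ -1,2 +1,5 @@
 import os
+import subprocess
+subprocess.run(["make", "clean"])
+shutil.rmtree("src")
 print("done")
"""

for finding in flag_suspicious_additions(example_diff):
    print(finding)
```

An empty result means only that none of the listed patterns matched, so the scan is a prompt for attention rather than a safety guarantee.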
Bridge
Inverse of agentic-engineering's "quality bar preserved" promise. Driving-into-mud is what vibe-coding collapses into when applied to serious software without the senior owner staying in the loop — the failure mode the agent-as-junior-engineer frame is designed to prevent.