Parallel Agent Competitions

A workflow pattern described by mitchell-hashimoto: for low-confidence tasks, don't pick the agent — race them.

The setup

ghostty/       # real repo
ghostty-2/     # copy
ghostty-3/     # copy
ghostty-4/     # copy

Same prompt, different models / agents / providers, run in parallel on the clones. When one finishes, hop over and review. Pick the best output — discard the rest. Most of the work is remote anyway, so wall-clock cost is low.
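The clone-and-race loop above can be sketched in a few lines of shell. This is a minimal sketch, not Mitchell's actual tooling: `stub_agent` is a placeholder for whatever agent CLI you actually invoke per clone (Claude, Gemini, a local model), and the repo contents are illustrative.

```shell
#!/bin/sh
# Sketch of the clone-and-race setup. stub_agent stands in for a real
# agent CLI; the ghostty repo here is a one-file stand-in.
set -eu
cd "$(mktemp -d)"

mkdir ghostty
echo "const main = 0;" > ghostty/main.zig   # stand-in for the real repo

# Placeholder: replace the body with your real agent invocation,
# one model/provider per clone, all given the same prompt.
stub_agent() {
  echo "candidate diff from $1" > "$1/RESULT"
}

# Make the copies (ghostty-2 .. ghostty-4).
for n in 2 3 4; do
  cp -r ghostty "ghostty-$n"
done

# Race: launch all four in parallel, then review as they finish.
for dir in ghostty ghostty-2 ghostty-3 ghostty-4; do
  stub_agent "$dir" &
done
wait

ls ghostty*/RESULT   # four candidate outputs; keep the best, discard the rest
```

A `git worktree add` per clone would avoid the full copies for large repos, at the cost of the worktrees sharing one object store.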

"It's a competition. That's something you can't really do with people without getting in a lot of trouble. But with machines, you could be like 'you four, fight to the death.'"

When to use

  • You have low confidence the first attempt will be right (unfamiliar module, tricky invariants, style you can't easily specify).
  • The task is well-scoped enough that reviewing 3–4 candidate diffs is cheaper than re-prompting one agent 3–4 times.
  • Models differ meaningfully in domain strength (e.g. Claude vs Gemini vs local model on the same Zig bug).

When not to use

  • High-confidence tasks — the first attempt will likely land, so extra runs are wasteful.
  • Refactors / renames — all agents do these well; no signal from racing.
  • Long open-ended sessions — coordinating 4 ongoing agents becomes its own full-time job (driving-into-mud × 4).

Mitchell's observation

  • Claude has been the most reliable winner in 2025.
  • Gemini was competitive for a while, then "something changed" and dropped in his informal benchmark.
  • Zig is the discriminating dimension — most models fall apart, so racing surfaces the few that don't.