# Parallel Agent Competitions
Workflow pattern described by mitchell-hashimoto: for low-confidence tasks, don't pick the agent — race them.
## The setup
Same prompt, different models / agents / providers, run in parallel, one clone each. When one finishes, hop over and review. Pick the best output; discard the rest. Most of the work happens remotely anyway, so the wall-clock cost to you is low.
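To make the mechanics concrete, here is a minimal sketch of the race in Python: one fresh clone per agent, all launched in parallel, candidate diffs collected as each competitor finishes. The repo URL, the prompt, and the agent commands (`agent-a`, `agent-b`, `agent-c`) are placeholders, not real CLIs; substitute whatever invocation runs your agents non-interactively.

```python
"""Race several coding agents on the same task, one git clone each.

A minimal sketch. Agent commands below are hypothetical placeholders,
not real CLI tools.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

REPO = "https://github.com/you/project.git"  # placeholder repo
PROMPT = "Fix the off-by-one in the ring buffer wraparound."

# Placeholder invocations: each is assumed to take the prompt as its
# final argument and exit when done.
AGENTS = {
    "agent-a": ["agent-a", "run"],
    "agent-b": ["agent-b", "exec"],
    "agent-c": ["agent-c", "solve"],
}

def run_agent(name: str, cmd: list[str]) -> tuple[str, str]:
    workdir = Path(f"clone-{name}")
    # Fresh clone per agent so the competitors can't trample each other.
    subprocess.run(["git", "clone", "--quiet", REPO, workdir], check=True)
    # Launch the agent non-interactively inside its own clone.
    subprocess.run([*cmd, PROMPT], cwd=workdir, check=True)
    # Collect the candidate diff for review.
    diff = subprocess.run(
        ["git", "-C", str(workdir), "diff"],
        capture_output=True, text=True, check=True,
    ).stdout
    return name, diff

with ThreadPoolExecutor(max_workers=len(AGENTS)) as pool:
    futures = [pool.submit(run_agent, n, c) for n, c in AGENTS.items()]
    for fut in as_completed(futures):
        name, diff = fut.result()  # review candidates as they finish
        print(f"=== candidate from {name} ({len(diff.splitlines())} lines) ===")
        print(diff)
```

Picking the winner is then just applying the best diff back to your working tree (`git apply`) and deleting the losing clones.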
"It's a competition. If that's something you can't really do with people without getting in a lot of trouble. But with machines, you could be like 'you four, fight to the death.'"
## When to use
- You have low confidence the first attempt will be right (unfamiliar module, tricky invariants, style you can't easily specify).
- The task is well-scoped enough that reviewing 3–4 candidate diffs is cheaper than re-prompting one agent 3–4 times.
- Models differ meaningfully in domain strength (e.g. Claude vs Gemini vs a local model on the same Zig bug).
## When not to use
- High-confidence tasks — wasteful.
- Refactors / renames — all agents do these well; no signal from racing.
- Long open-ended sessions — coordinating 4 ongoing agents becomes its own full-time job (driving-into-mud × 4).
## Mitchell's observations
- Claude has been the most reliable winner in 2025.
- Gemini was competitive for a while, then "something changed" and it dropped in his informal benchmark.
- Zig is the discriminating dimension: most models fall apart on it, so racing surfaces the few that don't.