Parallel Agent Competitions

A workflow pattern described by mitchell-hashimoto: for low-confidence tasks, don't pick the agent — race them.

The setup

ghostty/       # real repo
ghostty-2/     # copy
ghostty-3/     # copy
ghostty-4/     # copy

Same prompt, different models / agents / providers, run in parallel on the clones. When one finishes, hop over and review. Pick the best output — discard the rest. Most of the work is remote anyway, so wall-clock cost is low.
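The clone-and-race loop above can be sketched in a few lines of shell. This is a minimal sketch, not Mitchell's actual tooling: `stub_agent` is a placeholder for whatever agent CLI you actually invoke per clone (Claude, Gemini, a local model), and the repo contents are illustrative.

```shell
#!/bin/sh
# Sketch of the clone-and-race setup. stub_agent stands in for a real
# agent CLI; the ghostty repo here is a one-file stand-in.
set -eu
cd "$(mktemp -d)"

mkdir ghostty
echo "const main = 0;" > ghostty/main.zig   # stand-in for the real repo

# Placeholder: replace the body with your real agent invocation,
# one model/provider per clone, all given the same prompt.
stub_agent() {
  echo "candidate diff from $1" > "$1/RESULT"
}

# Make the copies (ghostty-2 .. ghostty-4).
for n in 2 3 4; do
  cp -r ghostty "ghostty-$n"
done

# Race: launch all four in parallel, then review as they finish.
for dir in ghostty ghostty-2 ghostty-3 ghostty-4; do
  stub_agent "$dir" &
done
wait

ls ghostty*/RESULT   # four candidate outputs; keep the best, discard the rest
```

A `git worktree add` per clone would avoid the full copies for large repos, at the cost of the worktrees sharing one object store.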

"It's a competition. That's something you can't really do with people without getting in a lot of trouble. But with machines, you could be like 'you four, fight to the death.'"

When to use

  • You have low confidence the first attempt will be right (unfamiliar module, tricky invariants, style you can't easily specify).
  • The task is well-scoped enough that reviewing 3–4 candidate diffs is cheaper than re-prompting one agent 3–4 times.
  • Models differ meaningfully in domain strength (e.g. Claude vs Gemini vs local model on the same Zig bug).

When not to use

  • High-confidence tasks — the first attempt will likely land, so extra runs are wasteful.
  • Refactors / renames — all agents do these well; no signal from racing.
  • Long open-ended sessions — coordinating 4 ongoing agents becomes its own full-time job (driving-into-mud × 4).

Mitchell's observation

  • Claude has been the most reliable winner in 2025.
  • Gemini was competitive for a while, then "something changed" and dropped in his informal benchmark.
  • Zig is the discriminating dimension — most models fall apart, so racing surfaces the few that don't.