Auto-Research Loop¶
A named automation pattern Shopify ships as Tangent, built on top of Tangle: given a pipeline with a measurable metric, an LLM-driven loop iterates over code, hyperparameters, prompts, and infrastructure choices, running real experiments until the metric improves. Credited with Shopify's search-throughput jump (800 → 4,200 QPS) and with distillations into liquid-ai models for narrow tasks.
What it is / isn't¶
Not just hyperparameter search. The loop can rewrite code (pipeline logic, kernel choices, prompts) and read its own experiment logs. Key enablers: reproducible workflows (Tangle), cheap experiment cloning, and a well-posed reward — usually an existing production metric. Limits (per Parakhin):
- Needs a well-defined reward signal — "taste" problems stay with humans
- Needs a cheap simulator for the inner loop; expensive sims bottleneck progress
- Doesn't replace researcher intuition; democratizes it by letting non-researchers run experiments
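The mechanics above can be sketched as a minimal propose-run-compare loop. Everything here is hypothetical: `run_experiment` stands in for a real (cheap) pipeline run, and `propose_change` stands in for the LLM step, which in the real pattern can rewrite code and prompts rather than just nudge numbers.

```python
import random

def run_experiment(config: dict) -> float:
    """Hypothetical stand-in for a real pipeline run; returns the metric (e.g. QPS).
    Toy surrogate: metric peaks when batch_size is near 64, with some noise."""
    return 1000.0 - abs(config["batch_size"] - 64) * 10 + random.uniform(-5, 5)

def propose_change(config: dict, history: list) -> dict:
    """Hypothetical stand-in for the LLM step: read the experiment log (history)
    and propose an edit. Here it only perturbs one hyperparameter."""
    candidate = dict(config)
    candidate["batch_size"] = config["batch_size"] + random.choice([-16, -8, 8, 16])
    return candidate

def auto_research_loop(config: dict, budget: int) -> tuple[dict, float]:
    best_metric = run_experiment(config)
    history = [(config, best_metric)]        # logs the loop can read back
    for _ in range(budget):
        candidate = propose_change(config, history)
        metric = run_experiment(candidate)   # real experiment, not a guess
        history.append((candidate, metric))
        if metric > best_metric:             # keep only verified improvements
            config, best_metric = candidate, metric
    return config, best_metric
```

The well-posed reward and the cheap inner-loop evaluation are exactly the two limits listed above: if `run_experiment` is expensive or the metric is "taste", the loop stalls.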
Relation to other concepts¶
- tokens-need-critique-loop — auto-research is a critique loop elevated to pipeline scope
- pipeline-as-specification / pipeline-as-verifier — the pipeline and its metric are the spec the loop optimizes
- simgym supplies the cheap simulator that Tangent needs for customer-facing code paths