Distill to Small Task Model

A pattern Shopify runs repeatedly: take the largest frontier model available, use it to generate training signal, then distill into a small (7-8B) specialist model, often a liquid-ai architecture, for a single narrow task at very high throughput or long context. Contrast this with running the frontier model itself at inference time.
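
The mechanics are worth a sketch. Below, a minimal illustration of the pattern, assuming a frontier teacher behind an OpenAI-compatible API and Hugging Face TRL's SFTTrainer for the student; the model names, prompts, and hyperparameters are placeholders, not Shopify's actual pipeline.

```python
# Minimal teacher -> student distillation sketch. Everything here is
# illustrative: TEACHER_MODEL and the prompt set are placeholders, and any
# 7-8B open-weights base could stand in for the student.
from openai import OpenAI
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

TEACHER_MODEL = "frontier-model"    # hypothetical frontier endpoint name
STUDENT_MODEL = "Qwen/Qwen2.5-7B"   # stand-in for a small specialist base

client = OpenAI()

def label_with_teacher(task_prompts):
    """Use the frontier model to generate the training signal."""
    rows = []
    for prompt in task_prompts:
        resp = client.chat.completions.create(
            model=TEACHER_MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        rows.append({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": resp.choices[0].message.content},
        ]})
    return Dataset.from_list(rows)

def distill(task_prompts):
    """Fine-tune the small specialist on teacher-labeled examples."""
    trainer = SFTTrainer(
        model=STUDENT_MODEL,
        train_dataset=label_with_teacher(task_prompts),
        args=SFTConfig(output_dir="student-specialist"),
    )
    trainer.train()
```

In practice the training signal can be richer than plain input-output pairs (logits, rationales, preference data), but the loop is the same: the frontier model labels, the specialist learns.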

Concrete Shopify instances:

  • Search backend: 800 → 4,200 QPS at the same quality on the same hardware
  • Theme Liquid gisting (Liquid here being the Shopify templating language, not the liquid-ai model family)
  • Catalog formulation for ucp-catalog
  • sidekick-pulse merchant notifications

Parakhin's claim: in the narrow-task regime with long context and batched throughput, distilled Liquid specialists beat same-size Qwen / Kimi variants. This is the concrete mechanism by which auto-research loops translate frontier progress into operational cost wins.
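
The throughput side of the claim is just batched inference on a small model. A minimal serving sketch, assuming vLLM and the hypothetical "student-specialist" checkpoint from the distillation sketch above:

```python
# Batched serving of the distilled specialist. "student-specialist" is the
# hypothetical output directory from the distillation sketch above; the
# queries are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="student-specialist")
params = SamplingParams(temperature=0.0, max_tokens=64)

queries = ["rank: red running shoes", "rank: waterproof hiking boots"]
for output in llm.generate(queries, params):
    print(output.outputs[0].text)
```

A 7-8B model leaves far more batching headroom on the same hardware than a frontier model would, which is where a QPS multiple like the search backend's can come from.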

Why it matters

A general operating pattern for big-install-base platforms: don't deploy the frontier model, distill from it. This complements install-base-moat: you get the benefit of frontier improvements without the serving cost, because your data catalog and usage distribution let you specialize aggressively.

See also