Distill to Small Task Model¶
A pattern Shopify runs repeatedly: take the largest frontier model available, use it to generate training signal, and distill into a small (7-8B) specialist model (often a liquid-ai architecture) for a single narrow task at very high throughput or long context. Contrast with serving the frontier model itself at inference time.
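The note doesn't specify Shopify's distillation recipe, so as a minimal sketch: the classic soft-target loss has the student fit the teacher's temperature-softened token distribution rather than hard labels. The function names (`softmax`, `distill_loss`) and the temperature `T` are illustrative assumptions, not anything from the source.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-T distributions, scaled by T^2.

    The teacher's soft targets carry the frontier model's full ranking
    over tokens; the small student is trained to match it.
    """
    p = softmax(teacher_logits, T)  # frontier model's soft targets
    q = softmax(student_logits, T)  # student predictions
    return float((T ** 2) * (p * (np.log(p) - np.log(q))).sum(axis=-1).mean())
```

The loss is zero when the student's logits match the teacher's and positive otherwise, so minimizing it pulls the specialist toward the frontier model's behavior on the narrow task distribution.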
Concrete Shopify instances:
- Search backend: 800 → 4,200 QPS at the same quality on the same hardware
- Gisting of Theme Liquid (Shopify's templating language)
- Catalog formulation for ucp-catalog
- sidekick-pulse merchant notifications
Parakhin's claim: in the narrow-task regime with long context and batched throughput, distilled Liquid specialists beat same-size Qwen / Kimi variants. This is the concrete mechanism by which auto-research loops translate frontier progress into operational cost wins.
Why it matters¶
A general operating pattern for big-install-base platforms: don't deploy the frontier model, distill from it. Complements install-base-moat: you get the benefit of frontier improvements without the serving cost, because your data catalog and usage distribution let you specialize aggressively.