Distill to Small Task Model

A pattern Shopify runs repeatedly: take the largest frontier model available, use it to generate training signal, then distill into a small (7-8B) specialist model, often a liquid-ai architecture, for a single narrow task at very high throughput or long context. Contrast this with running the frontier model itself at inference time.
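
The mechanics are worth a sketch. Below, a minimal illustration of the pattern, assuming a frontier teacher behind an OpenAI-compatible API and Hugging Face TRL's SFTTrainer for the student; the model names, prompts, and hyperparameters are placeholders, not Shopify's actual pipeline.

```python
# Minimal teacher -> student distillation sketch. Everything here is
# illustrative: TEACHER_MODEL and the prompt set are placeholders, and any
# 7-8B open-weights base could stand in for the student.
from openai import OpenAI
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

TEACHER_MODEL = "frontier-model"    # hypothetical frontier endpoint name
STUDENT_MODEL = "Qwen/Qwen2.5-7B"   # stand-in for a small specialist base

client = OpenAI()

def label_with_teacher(task_prompts):
    """Use the frontier model to generate the training signal."""
    rows = []
    for prompt in task_prompts:
        resp = client.chat.completions.create(
            model=TEACHER_MODEL,
            messages=[{"role": "user", "content": prompt}],
        )
        rows.append({"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": resp.choices[0].message.content},
        ]})
    return Dataset.from_list(rows)

def distill(task_prompts):
    """Fine-tune the small specialist on teacher-labeled examples."""
    trainer = SFTTrainer(
        model=STUDENT_MODEL,
        train_dataset=label_with_teacher(task_prompts),
        args=SFTConfig(output_dir="student-specialist"),
    )
    trainer.train()
```

In practice the training signal can be richer than plain input-output pairs (logits, rationales, preference data), but the loop is the same: the frontier model labels, the specialist learns.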

Concrete Shopify instances:

  • Search backend: 800 → 4,200 QPS at the same quality on the same hardware
  • Theme Liquid gisting (Liquid here being the Shopify templating language, not the liquid-ai model family)
  • Catalog formulation for ucp-catalog
  • sidekick-pulse merchant notifications

Parakhin's claim: in the narrow-task regime with long context and batched throughput, distilled Liquid specialists beat same-size Qwen / Kimi variants. This is the concrete mechanism by which auto-research loops translate frontier progress into operational cost wins.
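
The throughput side of the claim is just batched inference on a small model. A minimal serving sketch, assuming vLLM and the hypothetical "student-specialist" checkpoint from the distillation sketch above:

```python
# Batched serving of the distilled specialist. "student-specialist" is the
# hypothetical output directory from the distillation sketch above; the
# queries are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="student-specialist")
params = SamplingParams(temperature=0.0, max_tokens=64)

queries = ["rank: red running shoes", "rank: waterproof hiking boots"]
for output in llm.generate(queries, params):
    print(output.outputs[0].text)
```

A 7-8B model leaves far more batching headroom on the same hardware than a frontier model would, which is where a QPS multiple like the search backend's can come from.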

Why it matters

A general operating pattern for big-install-base platforms: don't deploy the frontier model, distill from it. This complements install-base-moat: you get the benefit of frontier improvements without the serving cost, because your data catalog and usage distribution let you specialize aggressively.

See also