Model Drift

ash-lewis: if you "set and forget" an LLM in production, it is "probably already drifting and getting worse." Drift arises from two compounding sources:

  1. Input drift — the user query distribution evolves after launch (the same phenomenon Konstanty describes in eval-lifecycle-pre-to-production: "Adidas size 47" → "shoes like LeBron James").
  2. Provider-side drift — closed frontier models change behaviour between versions with no notice to the operator.
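Input drift of this kind can be detected by comparing the query distribution at launch against current traffic. A minimal sketch, assuming queries have already been bucketed into intent categories (the categories and the Population Stability Index threshold here are illustrative assumptions, not from the note):

```python
import math
from collections import Counter

def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two categorical samples.
    Near 0 means the distributions match; larger values signal drift."""
    b, c = Counter(baseline), Counter(current)
    nb, nc = len(baseline), len(current)
    score = 0.0
    for cat in set(b) | set(c):
        p = b[cat] / nb or eps  # baseline share (floored to avoid log(0))
        q = c[cat] / nc or eps  # current share
        score += (q - p) * math.log(q / p)
    return score

# Hypothetical intent buckets for the shoe-shopping example
launch_queries = ["size", "size", "brand", "size", "brand"]
todays_queries = ["celebrity", "celebrity", "brand", "celebrity", "size"]

drift = psi(launch_queries, todays_queries)
if drift > 0.2:  # common rule-of-thumb cutoff, an assumption here
    print(f"input drift detected (PSI={drift:.2f}) — re-run evals")
```

A PSI check like this only flags that the inputs moved; deciding whether quality actually dropped still requires re-running evals on the new traffic.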

Remediation pattern (Lewis)

  • Partition usage — don't send all traffic to one model.
  • Deploy small/open-source task-specific models — gives you ownership of weights and inference logs.
  • Agents as continuous selectors (agents-as-model-selectors) — experiments run over Llama/GLiNER/DeepSeek using real inference traffic as the eval set.

Why this matters

Konstanty describes the symptom (silent drop-off, eval drift). Lewis proposes a structural remedy — treat model selection as an ongoing agentic workflow, not a one-time procurement decision.