Model Drift

ash-lewis: if you "set and forget" an LLM in production, it is "probably already drifting and getting worse." Drift arises from two compounding sources:

  1. Input drift — the user query distribution evolves after launch (the same phenomenon Konstanty describes in eval-lifecycle-pre-to-production: "Adidas size 47" → "shoes like LeBron James").
  2. Provider-side drift — closed frontier models change behaviour between versions with no notice to the operator.
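Input drift of this kind can be detected by comparing the query distribution at launch against current traffic. A minimal sketch, assuming queries have already been bucketed into intent categories (the categories and the Population Stability Index threshold here are illustrative assumptions, not from the note):

```python
import math
from collections import Counter

def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two categorical samples.
    Near 0 means the distributions match; larger values signal drift."""
    b, c = Counter(baseline), Counter(current)
    nb, nc = len(baseline), len(current)
    score = 0.0
    for cat in set(b) | set(c):
        p = b[cat] / nb or eps  # baseline share (floored to avoid log(0))
        q = c[cat] / nc or eps  # current share
        score += (q - p) * math.log(q / p)
    return score

# Hypothetical intent buckets for the shoe-shopping example
launch_queries = ["size", "size", "brand", "size", "brand"]
todays_queries = ["celebrity", "celebrity", "brand", "celebrity", "size"]

drift = psi(launch_queries, todays_queries)
if drift > 0.2:  # common rule-of-thumb cutoff, an assumption here
    print(f"input drift detected (PSI={drift:.2f}) — re-run evals")
```

A PSI check like this only flags that the inputs moved; deciding whether quality actually dropped still requires re-running evals on the new traffic.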

Remediation pattern (Lewis)

  • Partition usage — don't send all traffic to one model.
  • Deploy small/open-source task-specific models — gives you ownership of weights and inference logs.
  • Agents as continuous selectors (agents-as-model-selectors) — experiments run over Llama/GLiNER/DeepSeek using real inference traffic as the eval set.

Why this matters

Konstanty describes the symptom (silent drop-off, eval drift). Lewis proposes a structural remedy — treat model selection as an ongoing agentic workflow, not a one-time procurement decision.