# Model Drift
ash-lewis: if you "set and forget" an LLM in production, it is "probably already drifting and getting worse." Drift arises from two compounding sources:
- Input drift — user query distribution evolves after launch (same phenomenon Konstanty describes in eval-lifecycle-pre-to-production: "Adidas size 47" → "shoes like LeBron James").
- Provider-side drift — closed frontier models change behaviour between versions with no notice to the operator.
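Input drift of the kind described above can be caught with a simple distribution check on incoming queries. The sketch below is a minimal illustration (not from the talk), using a Population Stability Index over a crude first-token feature; a production system would compare embedding distributions instead.

```python
import math
from collections import Counter

def psi(baseline, live, bins):
    """Population Stability Index between two categorical distributions.
    Values above ~0.2 are a common heuristic for a significant shift."""
    b_total, l_total = sum(baseline.values()), sum(live.values())
    score = 0.0
    for b in bins:
        # Floor at a tiny probability so empty bins don't blow up the log.
        p = max(baseline.get(b, 0) / b_total, 1e-6)
        q = max(live.get(b, 0) / l_total, 1e-6)
        score += (q - p) * math.log(q / p)
    return score

def bucket_queries(queries):
    # Crude proxy feature: bucket each query by its first token.
    # A real pipeline would bucket by query embedding cluster.
    return Counter(q.split()[0].lower() for q in queries if q.split())

# Illustrative traffic mirroring the launch-time vs. post-launch example.
launch = ["adidas size 47", "adidas gazelle price", "nike air max size"]
live = ["shoes like lebron james", "sneakers like jordan wears", "adidas size 47"]

base, now = bucket_queries(launch), bucket_queries(live)
drift = psi(base, now, set(base) | set(now))
print(drift > 0.2)  # True: the query distribution has shifted
```

Provider-side drift is harder to instrument this way, since the operator sees neither the weights nor a changelog; that asymmetry motivates the remediation pattern below.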
## Remediation pattern (Lewis)
- Partition usage — don't send all traffic to one model.
- Deploy small, open-source, task-specific models — you own the weights and the inference logs, so the model cannot change underneath you.
- Agents as continuous selectors — agents-as-model-selectors: experiments run over Llama/Glyner/DeepSeek, using real inference traffic as the eval set.
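The selector loop in the last bullet can be sketched as follows. This is an assumption-laden illustration, not the talk's implementation: the `judge` function stands in for whatever scoring the eval harness uses (LLM-as-judge, task metrics), and the model names are just labels over the pool mentioned in the talk.

```python
import hashlib

def judge(model, query):
    # Hypothetical placeholder scorer returning a deterministic value in [0, 1].
    # In a real system this would be an LLM-as-judge call or a task metric
    # computed over real inference traffic.
    return hashlib.sha256(f"{model}:{query}".encode()).digest()[0] / 255

def select_model(candidates, traffic_sample):
    """Score each candidate model on a sample of live traffic and
    return the best one plus the full score table."""
    avg = {
        m: sum(judge(m, q) for q in traffic_sample) / len(traffic_sample)
        for m in candidates
    }
    return max(avg, key=avg.get), avg

# Candidate pool from the talk; the traffic sample here is illustrative.
candidates = ["llama", "glyner", "deepseek"]
traffic = ["adidas size 47", "shoes like lebron james", "waterproof trail runners"]
best, scores = select_model(candidates, traffic)
```

Run periodically (or triggered by a drift alarm), this turns model selection into a standing feedback loop rather than a one-time choice.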
## Why this matters
Konstanty describes the symptom (silent performance drop-off, eval drift); Lewis proposes a structural remedy — treat model selection as an ongoing agentic workflow, not a one-time procurement decision.