Arize / Phoenix¶
LLM observability + eval tooling vendor. Phoenix is the OSS tier; Arize Enterprise is the paid platform.
Practitioner take (Konstanty, 2026)¶
maggie-konstanty uses Arize Enterprise but explicitly refuses the UI-based evaluator builder, preferring custom-coded evaluators with API export. Her concrete complaints (quoted verbatim from the transcript):
- Trace export bottleneck — "If I export more than 1,000 traces, they suddenly slow down. I have to do things in batches. And it takes hours."
- No sampling — "They also don't enable sampling as far as I'm concerned." Running 6 evaluators × 100k traces × multi-turn conversations at production scale means on the order of 600k+ evaluator calls, which is cost-prohibitive unless the tool natively supports a sampling strategy.
- Dashboards are not yet standalone-useful — "it's trivial but it's also not really well yet."
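The batching and sampling complaints above suggest an obvious client-side workaround. A minimal sketch, assuming a hypothetical `export_traces(offset, limit)` wrapper around whatever export call the platform actually exposes (the real Arize export API, its pagination parameters, and its rate limits may differ; the stub below fabricates trace records for illustration):

```python
import random

BATCH_SIZE = 1_000   # stay at/under the size where exports reportedly slow down
SAMPLE_RATE = 0.05   # evaluate ~5% of traces instead of all 100k

def export_traces(offset, limit):
    """Hypothetical stand-in for the platform's trace-export call.
    Replace with the real SDK/API call; here it fabricates 100k trace dicts."""
    total = 100_000
    end = min(offset + limit, total)
    return [{"trace_id": i} for i in range(offset, end)]

def iter_traces_in_batches(batch_size=BATCH_SIZE):
    """Page through the export in fixed-size batches rather than one big pull."""
    offset = 0
    while True:
        batch = export_traces(offset, batch_size)
        if not batch:
            return
        yield from batch
        offset += len(batch)

def sample_traces(sample_rate=SAMPLE_RATE, seed=42):
    """Client-side Bernoulli sampling, since the tool doesn't sample natively.
    Fixed seed so the evaluated subset is reproducible across runs."""
    rng = random.Random(seed)
    return [t for t in iter_traces_in_batches() if rng.random() < sample_rate]

sampled = sample_traces()
print(len(sampled))  # roughly 5,000 of the 100k traces
```

At a 5% sample, the 6 evaluators run against ~5k traces (~30k calls) instead of ~600k, which is the cost argument in the bullet above.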
Konstanty's repeated refrain: she spent "a whole day fighting the UI" trying to configure evaluators the tool's way, then went back to code.
This is one datapoint in the broader concepts/ai/custom-code-vs-eval-platform-style dilemma: mature teams tend to defect from platform UIs back to custom code for the eval-construction step, while keeping the platform for trace storage.
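The defect-to-code pattern can be sketched as: traces stay in the platform, but evaluators are plain functions run over exported records, with scores pushed back via whatever ingestion API exists. Everything here (the trace field names, the toy heuristic) is hypothetical illustration of the pattern, not the Arize API:

```python
def concision_evaluator(trace):
    """Toy custom evaluator: flags assistant outputs longer than a budget.
    A real evaluator would call an LLM judge or richer heuristics."""
    output = trace.get("output", "")
    return {"name": "concision", "score": 1.0 if len(output) <= 280 else 0.0}

def run_evaluators(traces, evaluators):
    """Run each custom-coded evaluator over each exported trace.
    The resulting records could then be uploaded back to the platform
    for dashboarding via its ingestion API (not shown)."""
    results = []
    for trace in traces:
        for ev in evaluators:
            results.append({"trace_id": trace["trace_id"], **ev(trace)})
    return results

traces = [
    {"trace_id": "t1", "output": "short answer"},
    {"trace_id": "t2", "output": "x" * 500},
]
results = run_evaluators(traces, [concision_evaluator])
print(results[0]["score"], results[1]["score"])  # 1.0 0.0
```

The division of labor matches Konstanty's stance: the evaluator logic lives in version-controlled code, and the platform is demoted to storage plus display.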