Feedback Subagent¶
A reviewer agent that knows what a good report looks like, runs alongside or after the main agent's async report generation, and flags errors before they reach the user. Listen Labs' answer to the problem of long async runs producing quietly wrong output.
Dual use — runtime + eval¶
The clever move: the same reviewer plays two roles depending on where it runs.
- Async runner (runtime): after the research agent finishes a long report, the feedback subagent reviews it and can force a revision loop before the human sees it. Catches errors during long async runs when no user is watching the stream.
- Live runner (evaluation): on user-facing chats the same reviewer runs as an evaluation system, scoring responses post-hoc and feeding Listen's internal benchmark set and production regression detection. Both roles are sketched after the quote below.
Florian's framing (source):
"We have this sub-agent which is this reviewer agent that just knows what a good report looks like. So that's what we run it using in the asynchronous runner and then in the live runner we use it as an evaluation system."
Relation to other eval patterns¶
- Shares DNA with llm-as-a-judge — a model scoring another model's output. Difference: the feedback subagent is in the loop, not just offline.
- Distinct from Konstanty's lifecycle where eval is a separate stage; Listen collapses runtime review and production eval into one artifact.
- Matches Mabrouk's GEPA argument that judges themselves must be continuously validated; Listen runs its judge in async mode, where failure costs are high, so catching its drift matters. A minimal drift check is sketched after this list.
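A hedged sketch of what validating the judge could look like, assuming a small human-labeled set of reports exists. The names, labeled-set shape, and agreement threshold are made up for illustration:

```python
# Hedged sketch: names, the labeled-set shape, and the 0.9 threshold are assumptions.
from typing import Callable

def judge_agreement(labeled_reports: list[tuple[str, bool]],
                    judge_passes: Callable[[str], bool]) -> float:
    """Fraction of human-labeled reports where the judge agrees with the human label."""
    if not labeled_reports:
        return 1.0
    hits = sum(judge_passes(report) == human_ok for report, human_ok in labeled_reports)
    return hits / len(labeled_reports)

def assert_no_drift(labeled_reports: list[tuple[str, bool]],
                    judge_passes: Callable[[str], bool],
                    min_agreement: float = 0.9) -> None:
    agreement = judge_agreement(labeled_reports, judge_passes)
    if agreement < min_agreement:
        # In async mode a drifting judge silently approves bad reports,
        # so treat low agreement as an alert, not just a dashboard metric.
        raise RuntimeError(f"judge agreement {agreement:.2f} < {min_agreement}")
```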
Pitfalls Florian notes¶
- No substitute for the engineer reading traces themselves. The feedback subagent augments human inspection, doesn't replace it.
- Eval data tension: Listen doesn't yet run the judge on live streaming chats (too latency-sensitive); instead they fire it async after each run. One non-blocking pattern is sketched below.
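One way to keep the judge off the latency-sensitive path, sketched under the assumption of an asyncio chat handler; `respond`, `review_report`, and the transcript format are placeholders, not Listen's code:

```python
# Sketch only: `respond` and `review_report` are placeholder callables, and the
# transcript format is invented for illustration.
import asyncio
from typing import Awaitable, Callable

_background_tasks: set[asyncio.Task] = set()

async def _score_and_log(transcript: str, review_report, regression_log: list) -> None:
    # Run the (possibly slow, blocking) judge off the event loop thread.
    review = await asyncio.to_thread(review_report, transcript)
    regression_log.append((transcript, review))   # fodder for the benchmark set

async def handle_live_turn(user_msg: str,
                           respond: Callable[[str], Awaitable[str]],
                           review_report,
                           regression_log: list) -> str:
    reply = await respond(user_msg)               # latency-sensitive path: no judge here
    # Fire the judge only after the reply exists; keep a reference so the task
    # isn't garbage-collected before it finishes.
    task = asyncio.create_task(
        _score_and_log(f"user: {user_msg}\nassistant: {reply}",
                       review_report, regression_log))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return reply
```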