Map-Reduce Classification¶
The pattern listen-labs uses to get quantitative structure out of hundreds or thousands of open-ended qualitative responses without paying frontier-model costs per row.
Mechanism¶
- The main research agent identifies a question that needs row-level labeling (e.g. "how many of these 500 interviews mention price sensitivity?").
- It calls a hardcoded classification tool that fans out to a small model — GPT-mini or Claude Haiku — one call per row.
- Results aggregate back into a new column on the virtual table.
- The main agent now has robust quantitative data derived from media-rich, open-ended conversations.
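The mechanism above can be sketched as a plain map-reduce over rows. This is a minimal illustration, not Listen's implementation: `classify_row` stands in for the per-row small-model call (here a keyword check so the sketch is runnable), and the function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_row(text: str) -> bool:
    # Stand-in for the small-model call (GPT-mini / Claude Haiku in the
    # note above). A real version would prompt the model with the
    # labeling question; a keyword check keeps the sketch self-contained.
    lowered = text.lower()
    return "price" in lowered or "expensive" in lowered

def map_reduce_classify(rows, label_fn, max_workers=8):
    # Map: one cheap classification call per row, fanned out in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        labels = list(pool.map(label_fn, rows))
    # Reduce: aggregate the per-row cells into a count and a percentage.
    count = sum(labels)
    pct = count / len(labels) if labels else 0.0
    return labels, count, pct

interviews = [
    "The price was too high for our team.",
    "Loved the onboarding flow.",
    "Too expensive compared to alternatives.",
]
labels, count, pct = map_reduce_classify(interviews, classify_row)
```

The key shape guarantee: one row in, one boolean cell out, then a single reduce — exactly the structure the hardcoded tool enforces.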
Florian frames the tool explicitly as a map-reduce:
"You can think of it more as like a map-reduce call… you can call it sub-agent or you can call it just LLM."
Why hardcode it?¶
Listen's research agent could in principle do this via free-form tool use, but Florian's take: some fan-outs are worth a specialized tool. Hardcoding guarantees the aggregation shape (one row in → one cell out, then reduce to a count/percentage) and makes the result live — when a 501st interview arrives, the tool re-maps only the new row, not the whole corpus. See live-report-numbers.
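The "live" property described above amounts to caching per-row cells and mapping only rows that lack one. A hedged sketch, with hypothetical names (`LiveColumn`, `update`) that are not Listen's API:

```python
class LiveColumn:
    """Caches per-row labels so a newly arrived row triggers one
    small-model call instead of a re-map of the whole corpus."""

    def __init__(self, label_fn):
        self.label_fn = label_fn
        self.cells = {}  # row_id -> label (the virtual-table column)

    def update(self, rows):
        # Map: classify only rows with no cached cell yet.
        for row_id, text in rows.items():
            if row_id not in self.cells:
                self.cells[row_id] = self.label_fn(text)
        # Reduce: recompute the live count/percentage over all cells.
        count = sum(self.cells.values())
        pct = count / len(self.cells) if self.cells else 0.0
        return count, pct
```

When interview 501 lands, `update` issues exactly one new classification call; the report's count and percentage stay current without reprocessing the first 500 rows.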
Related¶
- distill-to-small-task-model — same instinct: don't use frontier models for bulk classification
- virtual-table-architecture — the substrate this writes into
- subagent-architecture — when the "map" step uses a full sub-agent vs a single LM call