Virtual Table Architecture
Architectural pattern listen-labs uses to expose qualitative interview data to its research agent. Instead of a virtual file system — the currently fashionable harness pattern (cf. subagent-architecture, harness-engineering) — Listen exposes the corpus as a virtual table.
Shape
- Rows = one response / one interview transcript.
- Columns = a question or an extracted feature (sentiment, topic tag, emotional valence, custom classification).
- The agent's primary affordance is creating new columns by calling a classification tool (e.g. classify-with-small-model) that spawns a map-reduce-classification job across all rows.
- Underneath, the table is Postgres, not a CSV file. When Python execution is needed (roughly 20% of tasks, typically for bespoke analysis the structured tools don't cover), the data is materialized as a pandas DataFrame inside an E2B sandbox.
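The "create a new column" affordance described in the list above can be sketched roughly as follows. This is a minimal in-memory illustration, not Listen's implementation: the `VirtualTable` class, the `add_column` method, the `toy_sentiment` classifier, and the sample rows are all hypothetical stand-ins, and a real system would presumably call a small model per row and persist the result in Postgres.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VirtualTable:
    """Rows are interview responses; columns are questions or derived features."""
    rows: list[dict[str, str]] = field(default_factory=list)

    def add_column(self, name: str, source: str,
                   classify: Callable[[str], str]) -> Counter:
        # Map step: classify each row's source cell independently, in parallel.
        with ThreadPoolExecutor(max_workers=4) as pool:
            labels = list(pool.map(classify, (r[source] for r in self.rows)))
        # Write the derived feature back as a new column on every row.
        for row, label in zip(self.rows, labels):
            row[name] = label
        # Reduce step: summarize the new column as label counts.
        return Counter(labels)


def toy_sentiment(text: str) -> str:
    """Hypothetical stand-in for a small-model classification call."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    if "hate" in lowered or "confusing" in lowered:
        return "negative"
    return "neutral"


# Invented sample data: one row per respondent, one column per question.
table = VirtualTable(rows=[
    {"respondent": "r1", "q1_onboarding": "I love how fast setup was."},
    {"respondent": "r2", "q1_onboarding": "The pricing page was confusing."},
    {"respondent": "r3", "q1_onboarding": "It took about ten minutes."},
])
summary = table.add_column("q1_sentiment", "q1_onboarding", toy_sentiment)
print(summary)
```

The point of the sketch is the shape of the operation: the agent names a new column and a classifier, the per-row work fans out, and what comes back is both a populated column and a small summary the agent can reason over.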
Why table > files for this domain
Florian Juengermann's argument (source): qualitative interview corpora are already tabular — each respondent is a row, each question is a column. Forcing them into a file-system abstraction loses the natural join/aggregate shape and makes the agent invent its own.
"Right now in the main agent, it's not directly file structure. We think of it more as a table… the agent can basically create new columns."
Contrast with Mitchell Hashimoto's and Ryan Lopopolo's setups where the repository/file-tree IS the substrate — those are code domains where files are the native unit. My inference: the right substrate for an agent is whatever unit the domain already uses to think — code = files, research = rows.
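To make the "natural join/aggregate shape" argument concrete, here is a small sketch of the kind of question a row/column layout answers in a few lines. The column names (`segment`, `q2_topic`), the `crosstab` helper, and the data are invented for illustration. In the setup described above, the equivalent would more likely be a SQL `GROUP BY` against Postgres or a pandas operation inside the sandbox.

```python
from collections import defaultdict

# Invented rows: each respondent is a row, each question or feature a column.
rows = [
    {"respondent": "r1", "segment": "enterprise", "q2_topic": "pricing"},
    {"respondent": "r2", "segment": "enterprise", "q2_topic": "pricing"},
    {"respondent": "r3", "segment": "startup", "q2_topic": "onboarding"},
    {"respondent": "r4", "segment": "startup", "q2_topic": "pricing"},
]


def crosstab(rows: list[dict[str, str]], by: str, column: str) -> dict[str, dict[str, int]]:
    """Count the values of `column` within each group defined by `by`."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for row in rows:
        counts[row[by]][row[column]] += 1
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {group: dict(values) for group, values in counts.items()}


topics_by_segment = crosstab(rows, by="segment", column="q2_topic")
print(topics_by_segment)
```

With a file-per-transcript layout, the agent would first have to parse every file and rebuild this structure itself before it could ask the same question.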
Related
- map-reduce-classification — the primary "write" operation on the table
- contextual-prompt-engineering — how the agent knows which columns matter
- subagent-architecture — complementary pattern for orchestration