
Virtual Table Architecture

The architectural pattern listen-labs uses to expose qualitative interview data to its research agent. Instead of a virtual file system, which is the currently fashionable harness pattern (cf. subagent-architecture, harness-engineering), Listen exposes the corpus as a virtual table.

Shape

  • Row = one response / one interview transcript.
  • Column = a question or an extracted feature (sentiment, topic tag, emotional valence, custom classification).
  • The agent's primary affordance is creating new columns by calling a classification tool (e.g. classify-with-small-model) that spawns a map-reduce-classification job across all rows.
  • Underneath, the table is Postgres, not a CSV file. When Python execution is needed (roughly 20% of tasks, typically for bespoke analysis the structured tools don't cover), the data is materialized as a pandas DataFrame inside an E2B sandbox.
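The column-creation affordance described above can be sketched in a few lines. This is a toy illustration, not Listen's implementation: `VirtualTable`, `add_column`, and `toy_sentiment` are hypothetical names, the table lives in memory rather than in Postgres, and a keyword matcher stands in for the small-model call. The map step classifies each row independently, and the reduce step summarizes the new column.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VirtualTable:
    """Toy in-memory stand-in for the table (assumption: the real one is Postgres-backed)."""
    rows: list[dict[str, str]] = field(default_factory=list)

    def add_column(self, name: str, source: str,
                   classify: Callable[[str], str]) -> Counter:
        # Map: classify every row's `source` cell independently, in parallel.
        with ThreadPoolExecutor(max_workers=4) as pool:
            labels = list(pool.map(classify, (row[source] for row in self.rows)))
        # Write the labels back as a new column on each row.
        for row, label in zip(self.rows, labels):
            row[name] = label
        # Reduce: summarize the new column as label counts.
        return Counter(labels)


def toy_sentiment(text: str) -> str:
    """Hypothetical keyword classifier standing in for a small-model call."""
    lowered = text.lower()
    if any(word in lowered for word in ("love", "great", "easy")):
        return "positive"
    if any(word in lowered for word in ("hate", "slow", "confusing")):
        return "negative"
    return "neutral"


table = VirtualTable(rows=[
    {"respondent": "r1", "q1": "I love how easy onboarding was."},
    {"respondent": "r2", "q1": "The dashboard is slow and confusing."},
    {"respondent": "r3", "q1": "I use it about once a week."},
])
summary = table.add_column("q1_sentiment", source="q1", classify=toy_sentiment)
print(summary)
```

After the call, every row carries a `q1_sentiment` cell, so later tool calls can filter or group on it like any other column.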

Why table > files for this domain

Florian Juengermann's argument (source): qualitative interview corpora are already tabular, with each respondent a row and each question a column. Forcing them into a file-system abstraction loses the natural join/aggregate shape and makes the agent reinvent that structure itself.
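To make the "join/aggregate shape" concrete, here is a minimal sketch of what a table substrate gives for free. Everything in it is assumed for illustration: the `responses` schema, the `segment` column, and the sample rows are invented, and SQLite (from the Python standard library) stands in for Postgres.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Postgres-backed table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE responses (respondent TEXT, segment TEXT, q1_sentiment TEXT)"
)
# Hypothetical rows: one per respondent, with one extracted-feature column.
conn.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [
        ("r1", "enterprise", "positive"),
        ("r2", "enterprise", "negative"),
        ("r3", "smb", "positive"),
        ("r4", "smb", "positive"),
    ],
)

# Cross-tabulating sentiment by segment is a single GROUP BY over columns.
breakdown = conn.execute(
    """
    SELECT segment, q1_sentiment, COUNT(*) AS n
    FROM responses
    GROUP BY segment, q1_sentiment
    ORDER BY segment, q1_sentiment
    """
).fetchall()
conn.close()

for segment, sentiment, n in breakdown:
    print(segment, sentiment, n)
```

With one transcript per file, the agent would first have to open every file, extract the segment and sentiment, and build this cross-tab itself before it could count anything.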

"Right now in the main agent, it's not directly file structure. We think of it more as a table… the agent can basically create new columns."

Contrast with Mitchell Hashimoto's and Ryan Lopopolo's setups, where the repository/file-tree IS the substrate. Those are code domains, where files are the native unit. My inference: the right substrate for an agent is whatever unit the domain already thinks in (code = files, research = rows).