Ryan Lopopolo — Harness Engineering: How to Build Software When Humans Steer, Agents Execute
Source index. ~46-min conference talk. Raw at ryan-lopopolo-harness-engineering-2026.
Thesis
Nine months of building software exclusively through agents at OpenAI. Code is free; implementation is no longer scarce. The job of the senior engineer is now harness-engineering — making the codebase, docs, and processes legible to agents so they can do the full job, while humans do the higher-leverage work: delegation, system design, and taste.
Structure
- "I am a token billionaire" — the AGI-pill framing
- The banned-editor experiment — team works only through the harness
- Three scarce resources: human time, human attention, model context window
- code-is-free axiom — consequences for refactor, migration, P3 work
- Non-functional requirements are the hard part — non-functional-requirements-as-prompts
- "Don't accept slop" — short-term velocity hit to install durable guardrails
- Observability-for-agents — DevTools via skill, local dev stack invocable by Codex
- The lint-bespoke-to-the-codebase pattern (fetch without retry/timeout example)
- Test-the-source-code (file ≤350 lines as a context-window invariant)
- PR as hub-and-spoke broadcast — throughput over blocking review
- Five-to-ten deep skills > wide shallow skill surface
- Q&A: working in the car, CarPlay voice mode not ready
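The lint-bespoke-to-the-codebase pattern from the talk (flag `fetch` calls shipped without retry/timeout) can be sketched as a tiny checker. The function name, the single-line regex heuristic, and the "no second argument means no `AbortSignal`" rule are all illustrative assumptions for this sketch, not Lopopolo's actual implementation — a real bespoke lint would walk the AST via ESLint.

```typescript
// Illustrative bespoke lint: flag fetch() calls that pass no
// RequestInit object, and therefore carry no AbortSignal timeout.
// The regex heuristic only inspects one line at a time — an
// assumption to keep the sketch small, not a real rule engine.
function findBareFetches(source: string): number[] {
  const offending: number[] = [];
  source.split("\n").forEach((line, i) => {
    // fetch(<single arg>) closing without a second argument means
    // no options object was supplied on this line.
    if (/\bfetch\(\s*[^,)]*\)/.test(line)) {
      offending.push(i + 1); // report 1-based line numbers
    }
  });
  return offending;
}

console.log(findBareFetches("const res = await fetch(url);")); // flags line 1
console.log(
  findBareFetches("await fetch(url, { signal: AbortSignal.timeout(5000) });")
); // no findings
```

The point of the pattern is that the rule encodes this codebase's taste (every network call must time out), which a generic linter cannot know.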
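The test-the-source-code idea can likewise be enforced as an ordinary test over the repo. The 350-line cap is from the talk; the function names and in-memory fixtures are assumptions for the sketch (a real suite would read files with `fs`).

```typescript
// Sketch of a "test about the source code": keep every file under a
// line cap so any one file fits comfortably in an agent's context
// window. The cap is from the talk; names here are illustrative.
const MAX_LINES = 350;

function overLimit(files: Record<string, string>): string[] {
  return Object.entries(files)
    .filter(([, contents]) => contents.trimEnd().split("\n").length > MAX_LINES)
    .map(([path]) => path)
    .sort();
}

// In-memory fixtures stand in for reading the repo from disk.
const fixtures: Record<string, string> = {
  "src/small.ts": "x\n".repeat(100),
  "src/huge.ts": "x\n".repeat(400),
};
console.log(overLimit(fixtures)); // flags src/huge.ts only
```

Failing this test is cheap to fix precisely because code is free: the agent splits the file and reruns the suite.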
Concepts introduced
- harness-engineering — the framing
- code-is-free — the axiom
- non-functional-requirements-as-prompts — the operational move
Entities
- ryan-lopopolo — speaker, OpenAI MTS
- openai — employer
Memorable moves
- "I am a token billionaire and I believe that in order for us to get into our AGI future, we want everybody to be token billionaires."
- "I've lived that experience by banning my team from even touching their editors."
- "Code is free… it's free to produce, free to refactor, and it is not a thing to get hung up on anymore."
- "Things are either P0s or P2s. Those P3s will never get done. However, in a world where code is free… all those P3s get kicked off immediately, maybe 4x in parallel."
- "The important thing is not the code but the prompt and the guardrails that got you there."
- "Don't produce slop. Don't accept slop. You won't get slop in your codebase."
- "We need to make them legible to those agents that are driving the implementation."
- "We don't go super wide on skills, preferring to mature them deeply."
Cross-ingest links
- agentic-engineering — andrej-karpathy's framing ("preserve the quality bar while going faster with agents"); Lopopolo is what it looks like when you operationalize Karpathy's abstraction at team scale.
- software-factory — eric-zakariasson's factory metaphor; harness-engineering is the ops discipline inside the factory walls. Zakariasson paints the factory; Lopopolo runs the shift.
- agent-as-junior-engineer — Lopopolo takes the stronger stance: "they're isomorphic to you and I." Senior model, not junior.
- driving-into-mud — mitchell-hashimoto's failure mode; Lopopolo's whole guardrail regime is designed to prevent it at fleet scale.
- emergent-cursor-rules — same feedback loop (observe failure → encode rule → refactor codebase to match).
- parallel-agent-competitions — enabled by code-is-free ("4x in parallel… pick one that solves the problem").
- llm-judge-calibration — mahmoud-mabrouk's adjacent move (calibrate agent judgment against human taste before trusting).
- agent-taste — peter-steinberger's concept; what gets encoded into persona docs and bespoke lints is the human's taste.
- verifiable-systems-for-agents — tests-on-source-code (file-length cap) is a verifiable-invariant pattern.
- soul-md — persona files; Lopopolo's "persona-oriented documentation around what a good job looks like" lines up.
Open questions
- The "ban editors" policy is maximalist. What's the minimum-viable version for a team not at OpenAI scale? Which parts generalize?
- 5–10 skills as the centralization target — what's the selection criterion? How do you retire a skill? Lopopolo doesn't say.
- Tests-about-source-code (file length, naming conventions) vs. standard linters — when does this cross into over-engineering? Where's the break-even on the agent-context savings vs. author cognitive load?
- Throughput-over-blocking PR review: what's the defect-escape rate? Lopopolo claims guardrails + tests catch it, but he doesn't cite numbers.
- "Implementation agent can acknowledge, defer, or reject any feedback." Does this scale past a small, high-trust team, or is it an OpenAI-culture-specific move?
- The agents-building-agents reflexive dimension: he builds internal agents to improve co-workers' productivity. What's the compounding / regression profile?
Synthesis note
Five AI Engineer 2026 ingests now form a layered picture of the agentic SDLC:
- Supply-side noise — peter-steinberger (agent-security-slop)
- Eval-side theater — mahmoud-mabrouk (llm-judge-calibration)
- Execution-side distrust — harshil-agrawal (ai-generated-code-is-untrusted, capability-based-security)
- Identity-side delegation — Riley & Galan (auth-for-ai-four-pillars)
- Operating discipline — Lopopolo, this page (harness-engineering, code-is-free, non-functional-requirements-as-prompts)
Lopopolo sits above the other four: they each describe an individual trust boundary (code, eval, execution, identity), while he describes the day-to-day practice of running a team where those boundaries are presumed sound and the work is shifting synchronous human time into higher-leverage guidance. My inference: the four edges without the operating discipline produce a hardened but idle system; the operating discipline without the edges produces throughput with leaks. Neither speaker says this explicitly — it's my synthesis across the five talks.