Ryan Lopopolo — Harness Engineering: How to Build Software When Humans Steer, Agents Execute
Source index. ~46-min conference talk. Raw at ryan-lopopolo-harness-engineering-2026.
Thesis
Nine months of building software exclusively through agents at OpenAI. Code is free; implementation is no longer scarce. The job of the senior engineer is now harness-engineering — making the codebase, docs, and processes legible to agents so they can do the full job, while humans do the higher-leverage work: delegation, system design, and taste.
Structure
- "I am a token billionaire" — the AGI-pill framing
- The banned-editor experiment — team works only through the harness
- Three scarce resources: human time, human attention, model context window
- code-is-free axiom — consequences for refactor, migration, P3 work
- Non-functional requirements are the hard part — non-functional-requirements-as-prompts
- "Don't accept slop" — short-term velocity hit to install durable guardrails
- Observability-for-agents — DevTools via skill, local dev stack invocable by Codex
- The lint-bespoke-to-the-codebase pattern (fetch without retry/timeout example)
- Test-the-source-code (file ≤350 lines as a context-window invariant)
- PR as hub-and-spoke broadcast — throughput over blocking review
- Five-to-ten deep skills > wide shallow skill surface
- Q&A: working in the car, CarPlay voice mode not ready
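The lint-bespoke-to-the-codebase pattern from the talk (flag `fetch` calls shipped without retry/timeout) can be sketched as a tiny checker. The function name, the single-line regex heuristic, and the "no second argument means no `AbortSignal`" rule are all illustrative assumptions for this sketch, not Lopopolo's actual implementation — a real bespoke lint would walk the AST via ESLint.

```typescript
// Illustrative bespoke lint: flag fetch() calls that pass no
// RequestInit object, and therefore carry no AbortSignal timeout.
// The regex heuristic only inspects one line at a time — an
// assumption to keep the sketch small, not a real rule engine.
function findBareFetches(source: string): number[] {
  const offending: number[] = [];
  source.split("\n").forEach((line, i) => {
    // fetch(<single arg>) closing without a second argument means
    // no options object was supplied on this line.
    if (/\bfetch\(\s*[^,)]*\)/.test(line)) {
      offending.push(i + 1); // report 1-based line numbers
    }
  });
  return offending;
}

console.log(findBareFetches("const res = await fetch(url);")); // flags line 1
console.log(
  findBareFetches("await fetch(url, { signal: AbortSignal.timeout(5000) });")
); // no findings
```

The point of the pattern is that the rule encodes this codebase's taste (every network call must time out), which a generic linter cannot know.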
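The test-the-source-code idea can likewise be enforced as an ordinary test over the repo. The 350-line cap is from the talk; the function names and in-memory fixtures are assumptions for the sketch (a real suite would read files with `fs`).

```typescript
// Sketch of a "test about the source code": keep every file under a
// line cap so any one file fits comfortably in an agent's context
// window. The cap is from the talk; names here are illustrative.
const MAX_LINES = 350;

function overLimit(files: Record<string, string>): string[] {
  return Object.entries(files)
    .filter(([, contents]) => contents.trimEnd().split("\n").length > MAX_LINES)
    .map(([path]) => path)
    .sort();
}

// In-memory fixtures stand in for reading the repo from disk.
const fixtures: Record<string, string> = {
  "src/small.ts": "x\n".repeat(100),
  "src/huge.ts": "x\n".repeat(400),
};
console.log(overLimit(fixtures)); // flags src/huge.ts only
```

Failing this test is cheap to fix precisely because code is free: the agent splits the file and reruns the suite.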
Concepts introduced
- harness-engineering — the framing
- code-is-free — the axiom
- non-functional-requirements-as-prompts — the operational move
Entities
- ryan-lopopolo — speaker, OpenAI MTS
- openai — employer
Memorable moves
- "I am a token billionaire and I believe that in order for us to get into our AGI future, we want everybody to be token billionaires."
- "I've lived that experience by banning my team from even touching their editors."
- "Code is free… it's free to produce, free to refactor, and it is not a thing to get hung up on anymore."
- "Things are either P0s or P2s. Those P3s will never get done. However, in a world where code is free… all those P3s get kicked off immediately, maybe 4x in parallel."
- "The important thing is not the code but the prompt and the guardrails that got you there."
- "Don't produce slop. Don't accept slop. You won't get slop in your codebase."
- "We need to make them legible to those agents that are driving the implementation."
- "We don't go super wide on skills, preferring to mature them deeply."
Cross-ingest links
- agentic-engineering — andrej-karpathy's framing ("preserve the quality bar while going faster with agents"); Lopopolo is what it looks like when you operationalize Karpathy's abstraction at team scale.
- software-factory — eric-zakariasson's factory metaphor; harness-engineering is the ops discipline inside the factory walls. Zakariasson paints the factory; Lopopolo runs the shift.
- agent-as-junior-engineer — Lopopolo takes the stronger stance: "they're isomorphic to you and I." Senior model, not junior.
- driving-into-mud — mitchell-hashimoto's failure mode; Lopopolo's whole guardrail regime is designed to prevent it at fleet scale.
- emergent-cursor-rules — same feedback loop (observe failure → encode rule → refactor codebase to match).
- parallel-agent-competitions — enabled by code-is-free ("4x in parallel… pick one that solves the problem").
- llm-judge-calibration — mahmoud-mabrouk's adjacent move (calibrate agent judgment against human taste before trusting).
- agent-taste — peter-steinberger's concept; what gets encoded into persona docs and bespoke lints is the human's taste.
- verifiable-systems-for-agents — tests-on-source-code (file-length cap) is a verifiable-invariant pattern.
- soul-md — persona files; Lopopolo's "persona-oriented documentation around what a good job looks like" lines up.
Open questions
- The "ban editors" policy is maximalist. What's the minimum-viable version for a team not at OpenAI scale? Which parts generalize?
- 5–10 skills as the centralization target — what's the selection criterion? How do you retire a skill? Lopopolo doesn't say.
- Tests-about-source-code (file length, naming conventions) vs. standard linters — when does this cross into over-engineering? Where's the break-even on the agent-context savings vs. author cognitive load?
- Throughput-over-blocking PR review: what's the defect-escape rate? Lopopolo claims guardrails + tests catch it, but he doesn't cite numbers.
- "Implementation agent can acknowledge, defer, or reject any feedback." Does this scale past a small, high-trust team, or is it an OpenAI-culture-specific move?
- The agents-building-agents reflexive dimension: he builds internal agents to improve co-workers' productivity. What's the compounding / regression profile?
Synthesis note
Five AI Engineer 2026 ingests now form a layered picture of the agentic SDLC:
- Supply-side noise — peter-steinberger (agent-security-slop)
- Eval-side theater — mahmoud-mabrouk (llm-judge-calibration)
- Execution-side distrust — harshil-agrawal (ai-generated-code-is-untrusted, capability-based-security)
- Identity-side delegation — Riley & Galan (auth-for-ai-four-pillars)
- Operating discipline — Lopopolo, this page (harness-engineering, code-is-free, non-functional-requirements-as-prompts)
Lopopolo sits above the other four: they each describe an individual trust boundary (code, eval, execution, identity), while he describes the day-to-day practice of running a team where those boundaries are presumed sound and the work is shifting synchronous human time into higher-leverage guidance. My inference: the four edges without the operating discipline produce a hardened but idle system; the operating discipline without the edges produces throughput with leaks. Neither speaker says this explicitly — it's my synthesis across the five talks.