Harshil Agrawal — Why & How to Sandbox AI-Generated Code

Source index. ~38-min talk. Raw at harshil-agrawal-sandbox-ai-code-2026.

Thesis

AI-generated code is untrusted code from the internet. Apply capability-based security via sandboxing. Pick isolates vs. containers based on one question: does the code need a filesystem, processes, or packages? Follow the 8-item universal checklist.
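
A minimal TypeScript sketch of that one-question decision (the pickRuntime helper and its need flags are illustrative, not an API from the talk):

```ts
// If the generated code needs a real filesystem, child processes, or package
// installs, it needs a container (or VM); otherwise a V8 isolate is lighter,
// faster to start, and has a smaller attack surface.
type Runtime = "isolate" | "container";

interface CodeNeeds {
  filesystem?: boolean; // reads/writes files on disk
  processes?: boolean;  // spawns subprocesses (ffmpeg, compilers, ...)
  packages?: boolean;   // installs npm/pip dependencies at runtime
}

function pickRuntime(needs: CodeNeeds): Runtime {
  return needs.filesystem || needs.processes || needs.packages
    ? "container"
    : "isolate";
}

// pickRuntime({ processes: true }) // "container" (e.g. a video-generation demo)
// pickRuntime({})                  // "isolate"   (pure compute / API glue)
```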

Structure

  1. The trajectory — auto-complete → full code gen → autonomous agents, in 2 years
  2. Reframe — we're running untrusted code with production privileges
  3. Threat model — hallucination, "helpful" LLM, prompt injection (direct + indirect)
  4. Why sandboxes work — browsers, OS, mobile already solved this
  5. Capability-based security — default deny, explicit allow (allowlist sketch after this list)
  6. Spectrum: eval → isolates → containers → VMs
  7. Demo 1: an OpenClaw alternative using V8 isolates on Cloudflare
  8. Demo 2: PromptMotion (video generator) using containers
  9. Decision tree + trade-offs
  10. 8-item universal checklist
  11. Secrets anti-pattern + proxy-through-worker fix
  12. Cleanup discipline (try/finally, max lifetimes; see the cleanup sketch after this list)
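
Item 5's "default deny, explicit allow" as a minimal TypeScript sketch. The Capability and SandboxPolicy types are hypothetical, not from the talk: the host enumerates only what the sandboxed code may touch, and anything unlisted simply does not exist inside the sandbox.

```ts
// Capabilities the host can grant to sandboxed code. There is deliberately
// no "deny" list: everything not listed (network, filesystem, env vars, ...)
// is absent inside the sandbox.
type Capability =
  | { kind: "fetch"; allowedHosts: string[] }   // outbound HTTP, host-allowlisted
  | { kind: "kv"; namespace: string }           // a single key-value namespace
  | { kind: "timer"; maxMs: number };           // wall-clock budget

interface SandboxPolicy {
  allow: Capability[]; // enumerate what to ALLOW, never what to block
}

const policy: SandboxPolicy = {
  allow: [
    { kind: "fetch", allowedHosts: ["api.example.com"] },
    { kind: "timer", maxMs: 5_000 },
  ],
};

function isAllowed(p: SandboxPolicy, want: Capability["kind"]): boolean {
  // Default deny: a capability exists only if it was explicitly granted.
  return p.allow.some((c) => c.kind === want);
}

// isAllowed(policy, "fetch") // true
// isAllowed(policy, "kv")    // false: never granted, so it doesn't exist
```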
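And a sketch of the cleanup discipline in item 12, assuming a hypothetical Sandbox interface: destruction happens in finally so it runs on success and failure alike, with a hard lifetime cap as a backstop.

```ts
// Hypothetical stand-in for whatever isolate or container API you actually use.
interface Sandbox {
  run(code: string): Promise<string>;
  destroy(): Promise<void>; // assumed idempotent
}

async function runUntrusted(
  createSandbox: () => Promise<Sandbox>, // injected factory: isolate, container, ...
  code: string,
  maxLifetimeMs = 30_000,
): Promise<string> {
  const sb = await createSandbox();
  // Hard upper bound on lifetime, independent of whether run() ever returns.
  const killer = setTimeout(() => { void sb.destroy(); }, maxLifetimeMs);
  try {
    return await sb.run(code);
  } finally {
    clearTimeout(killer);
    await sb.destroy(); // runs on success, error, and timeout alike
  }
}
```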

Concepts introduced

Entities

Memorable moves

  • "If you told someone, 'I found this code on a random website, let's run it in production,' you absolutely would not. That's security 101. But that's essentially what we're doing with LLM-generated code."
  • "The over-helpful LLM is dangerous precisely because its behavior looks reasonable."
  • "Don't enumerate what to block. Enumerate what to allow."
  • "Would you rather give someone a master key and a list of 10,000 rooms they can't enter — or keys to the 3 rooms they actually need?"
  • Secrets anti-pattern: env.API_KEY → sandbox. Never. Proxy through your own worker.
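
A sketch of that proxy-through-worker fix as a Cloudflare-Worker-style handler (the /proxy/ route, UPSTREAM host, and Env binding are illustrative, not from the talk): the sandbox only ever calls this worker's URL, and env.API_KEY is attached here, outside the sandbox.

```ts
interface Env {
  API_KEY: string; // secret binding; never passed into the sandbox
}

const UPSTREAM = "https://api.example.com";

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url);
    // Allowlist a single upstream path prefix instead of open-proxying.
    if (!url.pathname.startsWith("/proxy/")) {
      return new Response("not found", { status: 404 });
    }
    const upstream = UPSTREAM + url.pathname.replace("/proxy", "");
    return fetch(upstream, {
      method: req.method,
      // The secret is injected server-side; the sandbox never sees it.
      headers: { Authorization: `Bearer ${env.API_KEY}` },
      body: req.method === "GET" || req.method === "HEAD" ? undefined : req.body,
    });
  },
};
```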

Open questions

  • What's the container-per-user cost profile at 10k, 100k, 1M concurrent users? At what scale does the "one user one sandbox" rule start to buckle financially?
  • How do you audit the 8-item checklist across a codebase? Is there a linter / policy-as-code for "this API endpoint exposes an unsandboxed LLM call"?
  • Where do indirect prompt injections get caught — pre-sandbox (input validation) or post-sandbox (capability enforcement)? Probably both but the split matters.
  • The "code mode" pattern Agrawal mentions (for agent integration) — what's its status and how does it compare to OpenAI's code-interpreter?

Synthesis note

Three ingests from AI Engineer 2026 now form a tight triangle on the trust-boundary axis:

  • peter-steinberger — maintainer-side: AI-generated CVE/PR slop
  • mahmoud-mabrouk — judge-side: AI eval is theater unless calibrated
  • harshil-agrawal — execution-side: AI code is untrusted; sandbox it

Together: don't trust what AI produces (at any layer — code, review, judgment) without verified bounds.