
Isolates vs Containers

Agrawal's decision framework for picking a sandbox primitive when running LLM-generated code.

The spectrum

| Level | Isolation | Startup | Capabilities |
| --- | --- | --- | --- |
| eval | None (same process) | Instant | Everything (never do this) |
| V8 isolates | Memory + execution context isolated | ~0.25 ms | JS/TS/Python/WASM; no FS, no processes |
| Containers | Full Linux environment | ~seconds | FS, processes, networking, package managers |
| Full VMs | Hardware | ~seconds to minutes | Everything an OS can do |

The one question decision tree

Does the code need a file system, processes, or package installs?

  • Yes → container. Full stop.
  • No → isolates. Faster, cheaper, tighter isolation model.
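The single question can be written down as a tiny routing function. This is only a sketch: `chooseSandbox` and the `needs*` field names are hypothetical, invented here for illustration, and do not come from any SDK.

```javascript
// Hypothetical helper: the field names are illustrative assumptions.
// Returns "container" if the step needs a file system, processes,
// or package installs; otherwise "isolate".
function chooseSandbox(step) {
  const needsContainer = Boolean(
    step.needsFileSystem || step.needsProcesses || step.needsPackageInstall
  );
  return needsContainer ? "container" : "isolate";
}

// The question is asked per step, so one agent run can use both primitives.
const plan = [
  { name: "transform JSON payload" },
  { name: "npm install + dev server", needsFileSystem: true, needsProcesses: true, needsPackageInstall: true },
].map((step) => ({ name: step.name, sandbox: chooseSandbox(step) }));
```

Here `plan` pairs each step name with the primitive chosen for it, so the first step lands on an isolate and the second on a container.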

When isolates

  • AI-agent tool-calling loops (function gen → run → return to model → iterate)
  • Code interpreters (user-typed snippet → output)
  • Data transformation pipelines
  • Plugins / skills that just need a restricted DB binding + logger
  • Anything that needs sub-millisecond sandbox startup

When containers

  • Building + deploying an application (git clone, npm install, dev server)
  • Running test suites
  • Anything that needs real FS, real processes, real networking

In practice, both

An agent uses isolates as the fast brain (rapid tool-call iteration) and switches to a container as the workbench when it needs to actually build or run an application. The decision isn't which one forever; it's which one for this step.

Isolate pattern (Cloudflare-specific code Agrawal showed)

const isolate = loader.load({
  code: userCode,
  globalOutbound: null,       // ← blocks ALL outbound network
  env: { db: restrictedDB, logger },  // ← explicit capabilities only
});

Few lines of config; strong isolation; no firewall rules, no AST detection of "dangerous" code — just deny-by-default + hand in what's needed.
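"Hand in what's needed" implies the object passed as `db` is itself narrowed before it reaches the untrusted code. A minimal sketch of that idea follows, assuming a plain in-memory stand-in for the database; `fullDb` and `makeRestrictedDb` are hypothetical names, and how such an object actually crosses into an isolate depends on the platform.

```javascript
// Hypothetical in-memory "database" standing in for a real binding.
const fullDb = {
  tables: {
    orders: [{ id: 1, total: 40 }],
    users: [{ id: 1, email: "a@example.com" }],
  },
  read(table) {
    return this.tables[table];
  },
  drop(table) {
    delete this.tables[table];
  },
};

// Build a capability that exposes only reads, and only on allowlisted tables.
// Anything not handed over (drop, other tables) simply does not exist
// from the sandboxed code's point of view.
function makeRestrictedDb(db, allowedTables) {
  const allowed = new Set(allowedTables);
  return Object.freeze({
    read(table) {
      if (!allowed.has(table)) {
        throw new Error(`table not permitted: ${table}`);
      }
      // Return copies so callers cannot mutate the underlying rows.
      return db.read(table).map((row) => ({ ...row }));
    },
  });
}

const restrictedOrdersDb = makeRestrictedDb(fullDb, ["orders"]);
```

The design choice is that nothing is filtered at call time by inspecting the untrusted code; the narrowed object is the whole permission model.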

Container pattern

  • User ID = isolation boundary. One user, one sandbox. Always. Sharing = data-leak vector baked into architecture (hard to unwind).
  • Clean up with try/finally, not try/catch. Even on build failure / exception / fire, destroy the sandbox.
  • Set max lifetimes (Cloudflare default: 10 min). Idle containers cost money and add attack surface.
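The three rules above can be combined in one wrapper. The sketch below assumes a hypothetical sandbox object with `run` and `destroy` methods; `createFakeSandbox` and `withSandbox` are illustrative names, and a real SDK will have its own create and destroy calls.

```javascript
// Hypothetical stand-in for a real container sandbox.
function createFakeSandbox(userId) {
  return {
    id: `sandbox-${userId}`, // one sandbox per user ID, never shared
    destroyed: false,
    async run(command) {
      if (command === "fail") throw new Error("build failed");
      return `ran: ${command}`;
    },
    async destroy() {
      this.destroyed = true;
    },
  };
}

// Runs `task` inside a per-user sandbox, enforces a maximum lifetime,
// and destroys the sandbox in `finally` so cleanup happens on success,
// failure, and timeout alike.
async function withSandbox(userId, maxLifetimeMs, createSandbox, task) {
  const sandbox = createSandbox(userId);
  let timer;
  const expired = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error("sandbox exceeded max lifetime")),
      maxLifetimeMs
    );
  });
  try {
    return await Promise.race([task(sandbox), expired]);
  } finally {
    clearTimeout(timer);
    await sandbox.destroy();
  }
}
```

A call might look like `withSandbox(userId, 10 * 60 * 1000, createFakeSandbox, (s) => s.run("npm test"))`. A `try/catch` alone would skip cleanup on the success path and on any error the catch block rethrows, which is why the destroy call sits in `finally`.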

Trade-offs

Isolates:

  • ✗ Only JS/TS/Python/WASM; no Go, Rust, compiled binaries
  • ✗ No FS; state lives only in memory unless externalized
  • ✗ Stateless; each invocation is fresh
  • ✗ Resource limits (CPU time, memory)
  • ✓ For short-lived + constrained + side-effect-free, constraints are features

Containers:

  • ✗ Seconds not ms startup
  • ✗ More expensive per sandbox
  • ✗ More moving parts (SDK, durable object, orchestration, networking)
  • ✓ Real FS, real processes, real servers

Universal 8-item checklist (applies to both)

  1. Default-deny network access
  2. Grant explicit capabilities, not broad access
  3. Isolate per user — never share sandboxes between tenants
  4. Set resource limits (timeouts, memory caps, CPU limits)
  5. Keep secrets outside the sandbox — proxy sensitive operations through your own code
  6. Clean up — try/finally, max lifetimes
  7. Log everything (code, who ran it, when, what it did) — you need the audit trail
  8. Validate input before it hits the sandbox (length, syntax, known-dangerous-pattern detection) — defense in depth
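Item 8 is the easiest to leave vague, so here is a rough sketch of a pre-sandbox check covering length and pattern detection. `validateSnippet`, the 10,000-character limit, and the pattern list are all assumptions chosen for illustration. A syntax check would need a parser for the snippet's language and is omitted.

```javascript
// Illustrative denylist; a real list would be tuned to the runtime.
// No `g` flag, so RegExp.test() stays stateless between calls.
const DENY_PATTERNS = [
  { name: "environment access", pattern: /\bprocess\.env\b/ },
  { name: "dynamic eval", pattern: /\beval\s*\(/ },
  { name: "child process", pattern: /\bchild_process\b/ },
];

// Returns { ok, reasons }. This is defense in depth only: the sandbox
// remains the real boundary, because pattern matching is easy to evade.
function validateSnippet(code, maxLength = 10000) {
  const reasons = [];
  if (typeof code !== "string" || code.trim() === "") {
    reasons.push("empty or non-string input");
    return { ok: false, reasons };
  }
  if (code.length > maxLength) {
    reasons.push(`input longer than ${maxLength} characters`);
  }
  for (const { name, pattern } of DENY_PATTERNS) {
    if (pattern.test(code)) {
      reasons.push(`matched pattern: ${name}`);
    }
  }
  return { ok: reasons.length === 0, reasons };
}
```

The returned `reasons` array is also a natural thing to write to the audit log from item 7, alongside the code and the user who submitted it.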

The secret-handling anti-pattern

Don't pass API keys as env vars into the sandbox. The moment the key enters, any code inside (AI-generated, prompt-injected, buggy-log-everything) can read it.

Do proxy-through-worker. The sandbox hits your-worker/proxy-endpoint, your worker adds the auth header with the real key, forwards the request, returns the response. Secret never enters the sandbox.
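A sketch of the proxy side, under stated assumptions: `makeProxy`, `UPSTREAM_BASE`, and the plain-object request shape are hypothetical, and `fetchImpl` is injected so the example runs without a network. In a real worker the key would come from a secret binding and `fetchImpl` would be the platform's `fetch`.

```javascript
// Hypothetical upstream API; replace with the real service.
const UPSTREAM_BASE = "https://api.example.com";

// Returns a handler the sandbox can call. The API key is captured in the
// closure on the trusted side and is never part of what the sandbox sees.
function makeProxy(apiKey, fetchImpl) {
  return async function handleProxyRequest(request) {
    // Only allow absolute paths so the sandbox cannot redirect the
    // request (and the key) to another host.
    if (typeof request.path !== "string" || !request.path.startsWith("/")) {
      throw new Error("path must start with '/'");
    }
    // Drop any Authorization header supplied by the sandbox.
    const headers = {};
    for (const [key, value] of Object.entries(request.headers || {})) {
      if (key.toLowerCase() !== "authorization") headers[key] = value;
    }
    headers["Authorization"] = `Bearer ${apiKey}`;

    const upstream = await fetchImpl(UPSTREAM_BASE + request.path, {
      method: request.method || "GET",
      headers,
      body: request.body,
    });
    // Return only status and body; request headers are not echoed back.
    return { status: upstream.status, body: await upstream.text() };
  };
}
```

The sandbox only ever holds the handler's result, so even code that logs everything it can reach has no key to log.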

Cross-references