Skip to content

Jared Zoneraich — How Claude Code Works

Source index. ~1h05m closing workshop at AI Engineer 2025 (uploaded Dec 2025). Raw at jared-zoneraich-claude-code-works-2026. Unofficial reverse engineering of claude-code — not endorsed by Anthropic.

Structure

  • Why coding agents suddenly work (simple architecture + better models)
  • Internals tour: master loop, tools, to-dos, sub-agents, sandboxing, skills
  • Comparative: Codex, Amp, Cursor Composer, Factory Droid, Devin
  • Evaluation: agent smell, back-testing, rigorous tools
  • Future: headless SDKs, agentic endpoints, fewer tool calls

Concepts introduced / minted

Reinforces existing: subagent-architecture, context-engineering, agentic-engineering, harness-engineering, contextual-prompt-engineering, skill-distillation, eval-lifecycle-pre-to-production.

Entities

Notable claims

  • "Less scaffolding, more model." Scaffolding to paper over current-model flaws is obsolete in 3–6 months; invest in the outer loop + rigorous tools instead.
  • Master loop (nO) is ~4 lines. The architectural simplification is the story; model improvements then compound 1:1.
  • Grep > RAG for general-purpose agents. Vector DBs were a workaround for weak long-context / weak tool use.
  • To-dos are not deterministically enforced — purely system-prompt-level structure, only possible because 2025 models follow instructions.
  • Handoff > compact (Amp) is probably the winning context-continuation pattern.
  • "AI therapist problem" — no global maximum in agent design; Claude Code, Codex, Composer, Amp, Droid each own different use cases. Taste and domain experts (promptlayer's thesis) are the moat.
  • Future: most LLM calls may be replaced by claude-code SDK invocations — the agent loop as the new completions API.

Open questions

  • Is skill auto-selection a post-training problem or an inherent limit of prompt-level dispatch?
  • Does the "one mega tool call" vs "hundreds of tool calls" debate resolve via model capability, or via tool-ecosystem standards?
  • How do you eval a maximally flexible master loop? agent-smell is a start — what's the right aggregation?