LLM Knowledge Bases

andrej-karpathy's running project and favorite example of software-3-0-native software — a category that couldn't exist before because "there was no code you could write" to do it.

What it is

Given a pile of source documents (articles, papers, transcripts), use an LLM to recompile them into a structured, interlinked wiki. Not a database query, not a RAG response — a reframing of the same data into new linked pages, organized by concept/entity, that the owner can then query.
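The recompile step can be sketched in a few lines. This is a minimal illustration, not the actual pipeline behind this wiki: `llm` is a hypothetical stand-in for any prompt-to-text completion call, and the prompts and slug scheme are assumptions.

```python
import re


def recompile(sources: dict[str, str], llm) -> dict[str, str]:
    """Recompile raw source documents into interlinked wiki pages.

    `sources` maps document name -> raw text.
    `llm` is a hypothetical callable: prompt string -> completion string.
    Returns a dict of slug -> page text, one page per entity/concept.
    """
    pages: dict[str, str] = {}
    for name, text in sources.items():
        # Ask the model which entities/concepts the document mentions,
        # one per line.
        entities = llm(f"List the key entities in:\n{text}").splitlines()
        for entity in entities:
            # Slugify the entity name (e.g. "Andrej Karpathy" -> "andrej-karpathy").
            slug = re.sub(r"[^a-z0-9]+", "-", entity.lower()).strip("-")
            # Have the model summarize what this source says about the entity,
            # and append it to that entity's page with a link back to the source.
            summary = llm(f"Summarize what '{name}' says about {entity}.")
            pages.setdefault(slug, f"# {entity}\n")
            pages[slug] += f"\n- from [{name}]: {summary}\n"
    return pages
```

The point of the sketch is the shape of the output: the same corpus, reorganized by entity rather than by source document, with every page linking back to where its claims came from.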

Why Karpathy likes it

  • Synthetic data generation over a fixed corpus. Every question against the wiki produces a new projection — a new chance to gain insight.
  • Anti-bottleneck for the director. He describes himself as the understanding-bottleneck of his own workflow (see agentic-engineering). Agents can ship fast, but he still has to know what is worth shipping. A wiki built from what he reads is a tool to enhance understanding — the part LLMs don't excel at.
  • Net-new software. Can't be written as software-1.0 code; it can only exist once you have an LLM that can ingest and cross-link prose.

Connection to this wiki

This very wiki is a concrete instance of the pattern. Sources (YouTube transcripts, articles) get ingested into raw/, then recompiled into entity and concept pages. The andrej-karpathy-sequoia-2026 query page is itself an example of "projection onto information."
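"Projection onto information" can also be sketched: a query gathers the wiki pages that touch the question and asks the model to answer from them. Again a hypothetical illustration, not this wiki's actual query mechanism; `llm` is an assumed prompt-to-text callable and the keyword-overlap filter is a deliberately naive placeholder for relevance selection.

```python
def query(pages: dict[str, str], question: str, llm) -> str:
    """Project the wiki onto a question.

    `pages` maps slug -> page text; `llm` is a hypothetical callable:
    prompt string -> completion string. Selects pages sharing any word
    with the question, then answers from that context.
    """
    words = question.lower().split()
    # Naive relevance filter: keep pages that mention any question word.
    relevant = [body for body in pages.values()
                if any(w in body.lower() for w in words)]
    context = "\n\n".join(relevant)
    return llm(f"Using these wiki pages:\n{context}\n\nAnswer: {question}")
```

Each question selects a different slice of pages, so every query really is a new projection of the same underlying corpus.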

Tension Karpathy flags

"The LLM certainly don't excel at understanding — you still are uniquely in charge of that."

Tools enhance understanding; they don't substitute for it. A knowledge base is only as good as the director querying it.