Scaling Laws (Plural)

jensen-huang's framing: there is no single "scaling law" — there are at least four, and they compound.

The four

  1. Pre-training scaling — more data, more compute, more params. The original Kaplan/Chinchilla regime (loss-curve sketch after this list). People thought high-quality data would be the wall; so far it mostly hasn't been.
  2. Post-training scaling — RLHF, DPO, distillation, synthetic data. Extracts more capability from a fixed pre-trained base (the DPO objective is sketched after this list).
  3. Test-time scaling — inference-time reasoning (chain-of-thought, tool use, search). More compute spent per query (one mechanism, self-consistency, is sketched after this list).
  4. Agentic scaling — multi-step workflows, long-horizon planning, tool-using agents. Compute budget becomes unbounded in principle.
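
A minimal sketch of the pre-training law from item 1, using the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022): L(N, D) = E + A/N^alpha + B/D^beta. The constants below are the paper's reported fits; treat the exact numbers as illustrative, not a calibration.

```python
# Chinchilla-style parametric loss: L(N, D) = E + A / N**alpha + B / D**beta
# Constants are the fitted values reported by Hoffmann et al. (2022);
# illustrative, not a definitive calibration.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for a model with n_params parameters
    trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Scaling params and data together keeps pushing loss down, with
# diminishing returns: the curve flattens rather than hitting a wall.
for scale in (1, 2, 4, 8):
    n, d = scale * 70e9, scale * 1.4e12  # Chinchilla-scale starting point
    print(f"{scale:>2}x: N={n:.0e}, D={d:.0e} -> predicted loss ~{loss(n, d):.3f}")
```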
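
Item 2 name-drops DPO; its core objective is compact enough to state directly. This is the standard loss from Rafailov et al. (2023), written against summed per-completion log-probs; the batch shape and the smoke-test numbers are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen: torch.Tensor, pi_rejected: torch.Tensor,
             ref_chosen: torch.Tensor, ref_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO: push the policy's margin between chosen and rejected
    completions above the frozen reference model's margin; beta
    controls how hard the policy is pulled away from the reference."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

# Smoke test with made-up summed log-probs for two preference pairs.
print(dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.0, -12.5]),
               torch.tensor([-10.5, -12.0]), torch.tensor([-10.8, -12.4])).item())
```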
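
One concrete test-time-scaling mechanism from item 3 is self-consistency (Wang et al., 2022): sample several reasoning paths, then majority-vote the final answer. The answer_once stub below is a hypothetical stand-in for a real model call.

```python
import random
from collections import Counter

def answer_once(question: str) -> str:
    # Hypothetical stand-in for one sampled chain-of-thought rollout.
    return random.choice(["42", "42", "41"])

def self_consistency(question: str, n_samples: int) -> str:
    """Sample n reasoning paths and return the majority answer.
    Compute per query grows linearly in n_samples; that linear knob
    is the test-time scaling axis."""
    votes = Counter(answer_once(question) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(self_consistency("what is 6 * 7?", n_samples=16))
```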

Implication

Each law multiplies demand for compute. The constraint on AI progress is not ideas running out; it's the ai-supply-chain — power, fabs, packaging, HBM — keeping up with the compounding demand curve.
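
A toy back-of-envelope for "the laws compound": if each regime independently multiplies compute demand, total demand is the product of the factors, not their sum. Every growth factor below is a hypothetical placeholder, not a measured number.

```python
# Toy model: each scaling regime multiplies total compute demand.
# All yearly growth factors here are hypothetical placeholders.
yearly_growth = {
    "pre-training": 4.0,   # bigger training runs
    "post-training": 1.5,  # RL / synthetic-data pipelines
    "test-time": 3.0,      # longer reasoning traces per query
    "agentic": 2.0,        # more model calls per task
}

total = 1.0
for law, factor in yearly_growth.items():
    total *= factor
    print(f"after {law:<13} x{factor:.1f} -> cumulative x{total:.1f}")
# The product (x36.0 with these placeholders) is what power, fabs,
# packaging, and HBM have to absorb; no single law tells the story.
```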