Scaling Laws (Plural)
jensen-huang's framing: there is no single "scaling law" — there are at least four, and they compound.
The four
- Pre-training scaling — more data, more compute, more parameters. The original Kaplan/Chinchilla regime (a compute-optimal sketch follows this list). Many expected high-quality data to be the wall; so far it mostly hasn't been.
- Post-training scaling — RLHF, DPO, distillation, synthetic data. Extracts more capability from a fixed pre-trained base (see the DPO sketch below).
- Test-time scaling — inference-time reasoning (chain-of-thought, tool use, search). More compute is spent per query at inference time (self-consistency sketch below).
- Agentic scaling — multi-step workflows, long-horizon planning, tool-using agents. The compute budget per task becomes unbounded in principle (agent-loop sketch below).
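To make the pre-training law concrete, here is a minimal sketch of the Chinchilla compute-optimal recipe, assuming the standard C ≈ 6·N·D FLOPs approximation and the roughly 20-tokens-per-parameter ratio reported by Hoffmann et al. (2022); the function name and default are illustrative.

```python
def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget C into parameters (N) and tokens (D)."""
    # C = 6 * N * D with D = tokens_per_param * N  =>  N = sqrt(C / (6 * k))
    n_params = (compute_flops / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# A 1e24-FLOP budget lands near a ~90B-parameter model trained on ~1.8T tokens.
n, d = chinchilla_optimal(1e24)
print(f"params ~{n:.2e}, tokens ~{d:.2e}")
```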
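For post-training, a minimal sketch of the DPO objective (Rafailov et al., 2023): push the policy's log-ratio for the preferred answer above the rejected one, relative to a frozen reference model. The argument names are hypothetical; each is a tensor of summed token log-probs per example.

```python
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta: float = 0.1):
    """DPO: -log sigmoid(beta * (policy log-ratio margin vs. reference))."""
    # Margin by which the policy prefers the chosen (w) over the rejected (l)
    # answer, measured against the reference model's preferences.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -F.logsigmoid(beta * margin).mean()
```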
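One test-time scaling knob is self-consistency (Wang et al., 2022): sample k reasoning chains and majority-vote the final answers, trading k-times the inference compute for accuracy. Here `sample_answer` is a hypothetical stand-in for any LLM call that returns a final answer string.

```python
from collections import Counter

def self_consistency(sample_answer, prompt: str, k: int = 16) -> str:
    """Spend k-times the inference compute on one query and vote on the result."""
    answers = [sample_answer(prompt) for _ in range(k)]  # k independent chains
    return Counter(answers).most_common(1)[0][0]         # majority answer wins
```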
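And a sketch of why agentic compute is unbounded in principle: the loop runs until the task is done, not for a fixed number of tokens. `llm_step` and `run_tool` are hypothetical stand-ins for a model call and a tool executor.

```python
def agent_loop(llm_step, run_tool, task: str, max_steps: int = 100) -> str:
    """Iterate plan -> act -> observe; only the explicit cap bounds compute."""
    history = [task]
    for _ in range(max_steps):
        action = llm_step(history)        # plan the next step from the trajectory
        if action["type"] == "finish":
            return action["answer"]       # task complete; stop spending compute
        history.append(run_tool(action))  # tool result feeds the next step
    raise TimeoutError("step budget exhausted")
```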
Implication
Each law multiplies demand for compute, and the multipliers compound: illustratively, 10× more training compute combined with 10× more inference compute per query implies roughly 100× total demand. The constraint on AI progress is not ideas running out; it is the ai-supply-chain (power, fabs, packaging, HBM) keeping up with that compounding demand curve.
Cross-links
- extreme-co-design — why rack architecture must evolve faster than Moore's Law
- physical-ai — the next scaling frontier (embodied)
- jensen-huang-lex-fridman-2026 — source interview for this framing