Extreme Co-Design
jensen-huang's term for designing GPU + CPU + memory + networking + storage + power + cooling + software + rack + pod + data center as one computer, because no single-component optimization can keep up with AI workload growth.
Why it's needed
Moore's Law at the transistor level has slowed, but AI compute demand has accelerated. The only way to keep delivering order-of-magnitude gains is to optimize the entire stack simultaneously, with each layer's constraints informing every other layer.
Workload-driven rack generations
- Grace Blackwell (NVL72): designed specifically for inference on mixture-of-experts (MoE) large language models.
- Vera Rubin: one year later, a fundamentally different rack, with a new CPU (Vera), storage accelerators, and a companion rack called Rock, because agentic workloads "bang on tools" and shift the workload shape away from pure LLM serving.
The implication: rack architecture now has to anticipate the dominant workload two years out. Huang says NVIDIA's current mental unit of compute is no longer the chip or even the rack but the pod, and soon the building.
Cross-links
- nvidia — company executing the strategy
- physical-ai — the workload that will drive the next generation
- jensen-huang-lex-fridman-2026 — source