The “Vera Rubin” Leap: NVIDIA Unveils a Unified AI Supercomputer for 2026

SAN FRANCISCO — March 7, 2026 — As volume shipments of the Blackwell Ultra (B300) begin to saturate the market, NVIDIA CEO Jensen Huang today provided the most detailed look yet at the company’s next frontier: the “Vera Rubin” architecture.

Named after the trailblazing astronomer whose galaxy-rotation measurements provided the first compelling evidence for dark matter, the Rubin platform is not just a faster GPU; it is a six-chip “unified supercomputer” designed to collapse the distance between memory, compute, and the networking fabric. Scheduled for full production launch in Q3 2026, Rubin aims to deliver a 10x reduction in inference costs, specifically targeting the needs of massive-context “Agentic AI” systems.


Six Chips, One Unified Nervous System

For the first time, NVIDIA is moving away from the “accelerator” model to a “system-scale” architecture. The Vera Rubin platform comprises six co-designed chips that operate as a single logical unit:

  1. Rubin GPU: A 336-billion-transistor monster built on a 3nm process and featuring the first implementation of HBM4 memory, delivering an unprecedented 22 TB/s of bandwidth.

  2. Vera CPU: An 88-core, custom Arm-based processor (codenamed Olympus) designed to replace the Grace CPU. It features 1.5 TB of LPDDR5X system memory to keep the GPU pipelines constantly saturated.

  3. NVLink 6 Switch: The “spine” of the rack, doubling bandwidth to 3.6 TB/s per GPU, allowing 72 Rubin units to act as a single, giant GPU.

  4. ConnectX-9 SuperNIC: A 1.6 Tbps networking interface designed for ultra-low latency “scale-out” across massive data center clusters.

  5. BlueField-4 DPU: Offloads security and storage tasks, including a new Inference Context Memory system to store millions of tokens for active AI agents.

  6. Spectrum-6 Ethernet: A photonics-based switch system that delivers 5x improved power efficiency for the “Gigawatt Era” data centers.
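Putting the per-chip figures above together gives a sense of the rack-scale numbers involved. The sketch below is a back-of-envelope calculation using only the quantities quoted in this article (72 GPUs per rack, 3.6 TB/s of NVLink 6 bandwidth per GPU, 22 TB/s of HBM4 per GPU); it is illustrative arithmetic, not an NVIDIA specification.

```python
# Back-of-envelope totals for an NVL72-class Vera Rubin rack,
# built only from the per-chip figures quoted in the article.

GPUS_PER_RACK = 72        # "72 Rubin units to act as a single, giant GPU"
NVLINK_BW_TBPS = 3.6      # NVLink 6 bandwidth per GPU (TB/s)
HBM4_BW_TBPS = 22.0       # Rubin GPU memory bandwidth (TB/s)

# Aggregate fabric and memory bandwidth across the rack.
aggregate_nvlink = GPUS_PER_RACK * NVLINK_BW_TBPS   # ~259.2 TB/s
aggregate_hbm = GPUS_PER_RACK * HBM4_BW_TBPS        # 1584.0 TB/s

print(f"Aggregate NVLink 6 bandwidth: {aggregate_nvlink:.1f} TB/s")
print(f"Aggregate HBM4 bandwidth:     {aggregate_hbm:.1f} TB/s")
```

The roughly 6x gap between aggregate memory bandwidth and fabric bandwidth is why NVLink 6, rather than HBM4, remains the binding constraint when a model is sharded across the whole rack.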


Breaking the Memory Wall with HBM4

The definitive breakthrough in Rubin is its integration of HBM4 (High Bandwidth Memory 4). As AI models shift from simple chat to “Agentic Reasoning,” the bottleneck is no longer raw compute, but the speed at which data moves from memory to the chip.

  • 22 TB/s Bandwidth: This represents a nearly 3x jump over the Blackwell Ultra. It allows the system to process “Mixture-of-Experts” (MoE) models with 4x fewer GPUs, significantly reducing training time and energy use.

  • Vertical Integration: By using the Hybrid Bonding technology showcased by SK hynix at MWC, NVIDIA has essentially merged the memory and the processor into a 3D stack, drastically lowering the power required to move a single bit of data.
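The “nearly 3x” figure above can be checked directly from the numbers quoted in this article: 22 TB/s of HBM4 bandwidth on Rubin versus 8 TB/s of HBM3e on Blackwell Ultra. A one-line sanity check:

```python
# Sanity check on the quoted generational jump in memory bandwidth:
# Rubin's HBM4 (22 TB/s) vs Blackwell Ultra's HBM3e (8 TB/s).

blackwell_ultra_bw = 8.0   # TB/s, HBM3e (article figure)
rubin_bw = 22.0            # TB/s, HBM4 (article figure)

speedup = rubin_bw / blackwell_ultra_bw
print(f"Memory-bandwidth jump: {speedup:.2f}x")  # 2.75x -> "nearly 3x"
```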

“Rubin arrives at the exact moment AI is transitioning from training to agency,” Jensen Huang stated. “It is the first platform where the network is the computer. We have integrated the speed of light into the silicon.”


Infrastructure Requirements: 800V DC and Liquid Cooling

The Vera Rubin platform is built for the Gigawatt AI Factory. To handle the massive power density, NVIDIA is mandating a shift in data center standards:

  • 800V DC Distribution: Moving from traditional AC to 800V DC reduces power losses by up to 20%, a necessity for racks that can now draw over 120 kW each.

  • Direct-to-Chip Liquid Cooling: The NVL72 VR200 rack design uses 45°C warm-water cooling, eliminating the need for traditional, energy-hungry chillers.
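Part of the case for 800V DC is simple Ohm’s-law arithmetic: for a fixed rack power, current falls in proportion to voltage, and resistive conduction losses fall with the square of the current. The sketch below illustrates this using the article’s 120 kW rack figure; the 415V comparison point is a common AC distribution voltage chosen here for illustration, and the article’s “up to 20%” saving also reflects removed AC/DC conversion stages, not conduction losses alone.

```python
# Why higher distribution voltage helps: I = P / V, and busbar
# conduction losses scale as I^2 * R. Numbers are illustrative.

RACK_POWER_W = 120_000     # "over 120 kW" per rack (article figure)

def current_amps(power_w, volts):
    """Current drawn at a given distribution voltage."""
    return power_w / volts

i_800dc = current_amps(RACK_POWER_W, 800)   # 150 A
i_415ac = current_amps(RACK_POWER_W, 415)   # ~289 A (assumed AC baseline)

# Relative I^2*R conduction loss for the same conductor resistance.
loss_ratio = (i_800dc / i_415ac) ** 2

print(f"800 V draws {i_800dc:.0f} A vs {i_415ac:.0f} A at 415 V; "
      f"conduction-loss ratio ~{loss_ratio:.2f}")
```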

Availability and Ecosystem

Major cloud providers including AWS, Microsoft Azure, Google Cloud, and Oracle have already signed multi-year capacity agreements for Rubin. Early customer samples began shipping in late February 2026, with the first production racks expected to go online in H2 2026.

Generation   Architecture             Memory   Bandwidth   Transistors
2024         Blackwell (B200)         HBM3e    8 TB/s      208B
2025         Blackwell Ultra (B300)   HBM3e    8 TB/s      208B
2026         Vera Rubin (VR200)       HBM4     22 TB/s     336B

