Rubin has two dies with 25 petaFLOPS per die, NVLink interconnect, and 288GB of HBM4 high-speed memory. The Rubin CPX has a single die with 30 petaFLOPS of performance, no NVLink, and 128GB of GDDR7 memory. That makes Rubin CPX well suited to specific high-context workloads that don't need as much memory. CPX will be cheaper than the standard Rubin, though Nvidia wouldn't say by how much.
To process video, AI models can consume as much as a million tokens for an hour of content, which can take many hours, if not days, to generate. The more tokens the system can generate, the larger the scale of processing it can do.
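As a rough sense of scale, the token figure above can be turned into a back-of-envelope processing-time estimate. This is an illustrative sketch only: the tokens-per-second throughput below is a hypothetical placeholder, not a published Rubin CPX benchmark.

```python
# Back-of-envelope estimate of wall-clock time to process long video
# context, using the ~1M tokens per hour of video figure from the article.
# TOKENS_PER_SECOND is a hypothetical throughput, chosen only to
# illustrate why generation "can take many hours, if not days."

TOKENS_PER_HOUR_OF_VIDEO = 1_000_000   # from the article
TOKENS_PER_SECOND = 100                # hypothetical placeholder rate

def processing_time_hours(video_hours: float) -> float:
    """Hours of wall-clock time to generate the video's worth of tokens."""
    total_tokens = video_hours * TOKENS_PER_HOUR_OF_VIDEO
    return total_tokens / TOKENS_PER_SECOND / 3600

# One hour of video: 1,000,000 tokens at 100 tok/s is ~2.8 hours of compute.
print(round(processing_time_hours(1.0), 2))
```

At that (assumed) rate, a feature-length film would already run into double-digit hours of generation time, which is the bottleneck Rubin CPX is pitched at.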
Rubin CPX delivers up to 30 petaflops of compute with NVFP4 precision. It features 128GB of GDDR7 memory rather than the usual HBM memory, which is more expensive than GDDR7. Nvidia says that GDDR7 offers sufficient performance, and that Rubin CPX delivers three times faster attention capabilities compared with GB300 NVL72 systems.
Rubin CPX is available in multiple configurations, including the Vera Rubin NVL144 CPX, and can be combined with the Quantum-X800 InfiniBand scale-out compute fabric or the Spectrum-X Ethernet networking platform with Nvidia Spectrum-XGS Ethernet technology and Nvidia ConnectX-9 SuperNICs.
Nvidia is also announcing a new Vera Rubin NVL144 CPX rack. Narasimhan said the NVL144 CPX lets AI service providers dramatically increase their profitability, delivering $5 billion of revenue for every $100 million invested in infrastructure.
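Nvidia's profitability claim reduces to a simple revenue multiple; a quick check of the arithmetic:

```python
# Revenue multiple implied by Nvidia's claim: $5 billion of revenue
# for every $100 million of infrastructure investment. Both figures
# are from the article; this just divides one by the other.

revenue = 5_000_000_000      # $5 billion
investment = 100_000_000     # $100 million

multiple = revenue / investment
print(f"{multiple:.0f}x return on infrastructure spend")  # prints "50x return on infrastructure spend"
```

In other words, the claim is a 50x revenue-to-capex ratio, before operating costs.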
It comes in two configurations. The single rack combines 144 Rubin CPX GPUs, 144 Rubin GPUs, and 36 Vera CPUs for 8 exaFLOPS of NVFP4 compute, 100TB of fast memory, and 1.7 PB/s of memory bandwidth. Nvidia said it is 7.5 times faster than the current top-of-the-line GB300 NVL72.
