Nvidia kicked off its GTC 2024 conference with the formal launch of Blackwell, its next-generation GPU architecture due at the end of the year.
Blackwell uses a chiplet design, to a degree. Whereas AMD's designs have multiple chiplets, Blackwell has two very large dies that are tied together as one GPU with a high-speed interlink that operates at 10 terabytes per second, according to Ian Buck, vice president of HPC at Nvidia.
Nvidia will ship three new Blackwell data center and AI GPUs: the B100, B200, and GB200. The B100 has a single processor, the B200 has two GPUs interconnected, and the GB200 features two GPUs and a Grace CPU.
Buck says the GB200 will deliver inference performance seven times greater than the Hopper GH200 can deliver. It delivers four times the AI training performance of Hopper, 30 times better inference performance overall, and 25 times better energy efficiency, Buck claimed. "It increases AI data center scale to beyond 100,000 GPUs," he said on a press call ahead of the announcement.
Blackwell has 192GB of HBM3E memory with more than 8TB/sec of bandwidth and 1.8TB/sec of secondary link bandwidth. Blackwell also supports the company's second-generation transformer engine, which tracks the accuracy and dynamic range of every layer of every tensor and the entire neural network as it proceeds in computing.
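The idea behind tracking a tensor's dynamic range is to pick a per-tensor scale factor before quantizing to a narrow format. As a minimal sketch (not Nvidia's implementation; the function names and the toy format range are illustrative assumptions), this is roughly what that bookkeeping looks like:

```python
# Hedged sketch of per-tensor dynamic-range tracking: find the tensor's
# largest absolute value ("amax"), derive a scale that maps that range
# onto a narrow format's representable maximum, then quantize.
# None of these names come from Nvidia's software; they are illustrative.

def tensor_scale(values, fmt_max):
    """Scale factor mapping the observed dynamic range onto [-fmt_max, fmt_max]."""
    amax = max(abs(v) for v in values)
    return fmt_max / amax if amax else 1.0

def quantize(values, fmt_max):
    """Scale, round, and clamp into the narrow format, then rescale back.

    The round trip shows how much precision a given tensor loses."""
    s = tensor_scale(values, fmt_max)
    return [max(-fmt_max, min(fmt_max, round(v * s))) / s for v in values]

# A narrower format (e.g. FP4 vs. FP8) has a far smaller representable
# range, so per-tensor scaling matters more as precision shrinks.
activations = [0.02, -1.5, 3.75, 0.6]
quantized = quantize(activations, fmt_max=448)  # 448 = FP8 E4M3 max magnitude
```

Doing this per tensor, per layer, rather than with one global scale, is what lets very low-precision formats stay accurate across a whole network.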
Blackwell has 20 petaflops of FP4 AI performance on a single GPU. FP4, with four bits of floating-point precision per operation, is new to the Blackwell processor; Hopper topped out at FP8. The shorter the floating-point format, the faster it can be executed. That's why, as floating-point widths go up – FP8, FP16, FP32, and FP64 – performance is cut in half with each step. Hopper has 4 petaflops of FP8 AI performance, which is less than half the performance of Blackwell.
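The halving rule makes the comparison easy to check arithmetically. A back-of-the-envelope sketch, assuming perfect halving at each step up from the 20-petaflop FP4 figure cited above (an idealization; real peak rates depend on the specific datapath):

```python
# Hedged back-of-the-envelope: starting from 20 PFLOPS at FP4 and halving
# throughput each time the floating-point width doubles, estimate peak
# rates at wider precisions. Assumes ideal scaling, not measured figures.

def pflops_at(bits, fp4_pflops=20.0):
    """Estimated peak petaflops at a given width under perfect halving."""
    rate = fp4_pflops
    width = 4
    while width < bits:  # FP4 -> FP8 -> FP16 -> FP32 -> FP64
        rate /= 2
        width *= 2
    return rate
```

Under this model Blackwell's implied FP8 rate is 10 petaflops, which is consistent with Hopper's 4 petaflops of FP8 being "less than half the performance of Blackwell."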
