To illustrate their point, the authors imagined a 400 MW AI datacenter with 1,024 GPU racks of 128 GPUs each for a total of 128,000 GPUs. “Assume 12.8T scale-up and 1.6T scale-out bandwidth per GPU. With OSFP switch racks that have a density of 1.6 Pbps per rack, this would require more than 1,400 switch racks for scale-up and scale-out fabrics. With XPO, this would require 75% fewer racks, saving over 1,050 racks or 44% of the floor space,” Bechtolsheim and Vusirikala stated in the blog.
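Those figures are easy to sanity-check. Below is a minimal back-of-the-envelope sketch using only the per-GPU bandwidths and rack density quoted above; the note about multi-tier fabric overhead is our assumption, not a figure from the blog.

```python
# Back-of-the-envelope check of the rack-count figures quoted above.
# Inputs come from the blog's stated assumptions; anything beyond the
# simple division (e.g., multi-tier fabric overhead) is our guess.
GPUS = 128_000
SCALE_UP_TBPS = 12.8        # scale-up bandwidth per GPU
SCALE_OUT_TBPS = 1.6        # scale-out bandwidth per GPU
RACK_DENSITY_PBPS = 1.6     # OSFP switch-rack density, Pbps per rack

# Total GPU-facing bandwidth, converted from Tbps to Pbps.
total_pbps = GPUS * (SCALE_UP_TBPS + SCALE_OUT_TBPS) / 1_000
lower_bound_racks = total_pbps / RACK_DENSITY_PBPS
print(f"GPU-facing bandwidth: {total_pbps:,.0f} Pbps")    # ~1,843 Pbps
print(f"Rack lower bound: {lower_bound_racks:,.0f}")      # ~1,152 racks

# Additional spine tiers in a multi-tier fabric add switch ports beyond
# this lower bound, which is how the count grows past the 1,400 racks
# the blog cites. The 75% XPO savings then reproduces the quoted numbers.
OSFP_RACKS = 1_400
xpo_racks = OSFP_RACKS * (1 - 0.75)
print(f"XPO racks: {xpo_racks:,.0f}, saved: {OSFP_RACKS - xpo_racks:,.0f}")
```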
“Eliminating 75% of switch racks translates to massive reductions in construction and infrastructure costs, including power distribution, plumbing and installation costs, while accelerating deployment timelines,” Bechtolsheim and Vusirikala stated.
Arista said the water-cooling capability of XPO would be an important feature.
“All large AI data centers will be liquid cooled, and the switches that go into these data centers also need to be liquid cooled,” Bechtolsheim and Vusirikala stated. “While one can add liquid cooled cold plates on flat-top OSFP modules, this does not significantly improve thermal performance.”
XPO solves this problem by integrating a liquid cold plate inside the module, with two 32-channel paddle cards sharing the common cold plate, which can cool both low-power and high-power optics such as 8x1600G-ZR/ZR+ with up to 400W of power, Bechtolsheim and Vusirikala stated.
XPO modules are also much simpler than OSFP modules, which improves reliability as well. “Each 32-channel paddle card has just one microcontroller and one set of voltage converters, a 75% reduction in common components versus four OSFPs,” Bechtolsheim and Vusirikala wrote.
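The 75% figure follows directly from the part counts; a trivial sketch, assuming one microcontroller and one voltage-converter set per module as the quote describes:

```python
# The 75% common-component reduction is the 4-to-1 consolidation:
# four OSFP modules each carry their own microcontroller and voltage
# converters, while one 32-channel paddle card carries a single set.
osfp_modules = 4
paddle_cards = 1
reduction = 1 - paddle_cards / osfp_modules
print(f"Common-component reduction: {reduction:.0%}")  # -> 75%
```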
