“The rapid growth of compute infrastructure to support training for genAI has created a significant electricity availability challenge,” said a Gartner research note on emerging technologies for energy-efficient generative AI compute systems by researchers Gaurav Gupta, Menglin Cao, Alan Priestley, Akhil Singh, and Joseph Unsworth.
This means those running AI data centers must find solutions to the problem now to mitigate the challenges to their operations, which include increased costs, insufficient power availability, and poorer sustainability performance. “All of these will ultimately be passed on to data center operators’ customers and end users,” the researchers noted.
At the same time, data centers must balance the performance bottlenecks that the drive to GPU-assisted AI is causing, noted Eckhardt Fischer, senior research analyst at IDC. “Any improvement in the computer system to reduce this bottleneck will generally provide a corresponding improvement in output,” he observed.
These bottlenecks for AI/genAI compute requirements include memory and networking, because “even the current Moore’s Law can’t keep up with explosive compute needs,” noted Gartner’s Gupta.
Optimizing resource allocation
Fujitsu’s AI computing broker middleware aims to solve this in part by combining adaptive GPU allocator technology, developed by the company in November 2023, with AI-processing optimization technologies, the company said. This allows the middleware to automatically identify and optimize CPU and GPU resource allocation for AI processing across multiple programs, giving priority to processes with high execution efficiency.
However, rather than conventional resource allocation, which assigns resources on a per-job basis, Fujitsu’s AI computing broker dynamically allocates resources on a per-GPU basis, the company said. This is aimed at improving utilization rates and allowing numerous AI processes to run concurrently without concern for GPU memory usage or physical capacity.
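To make the distinction concrete, the idea of per-GPU allocation that prioritizes high-efficiency processes can be sketched as a simple greedy scheduler. This is a minimal illustration, not Fujitsu's actual middleware: the job names, efficiency scores, and memory figures are hypothetical, and the real system works at the GPU-memory and framework level rather than as a Python loop.

```python
from dataclasses import dataclass, field

@dataclass
class AIJob:
    name: str
    efficiency: float  # hypothetical measured execution efficiency (higher = better)
    mem_gb: float      # GPU memory the job is expected to need

@dataclass
class GPU:
    gpu_id: int
    mem_total_gb: float
    mem_free_gb: float = field(init=False)
    jobs: list = field(default_factory=list)

    def __post_init__(self):
        self.mem_free_gb = self.mem_total_gb

def allocate(jobs, gpus):
    """Greedy per-GPU allocation: place the most efficient jobs first,
    packing several jobs onto one GPU when its free memory allows,
    instead of reserving a whole GPU per job."""
    placement = {}
    for job in sorted(jobs, key=lambda j: j.efficiency, reverse=True):
        # choose the GPU with the most free memory that still fits the job
        candidates = [g for g in gpus if g.mem_free_gb >= job.mem_gb]
        if not candidates:
            placement[job.name] = None  # no capacity yet; the job waits
            continue
        gpu = max(candidates, key=lambda g: g.mem_free_gb)
        gpu.mem_free_gb -= job.mem_gb
        gpu.jobs.append(job.name)
        placement[job.name] = gpu.gpu_id
    return placement
```

With two 24 GB GPUs and three jobs, the two highest-efficiency jobs land on separate GPUs first, and the third is packed alongside an existing job rather than waiting for a dedicated GPU, which is the availability gain the per-GPU approach targets.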