Barclays forecasts that chip-related capital expenditure for consumer AI inference alone will approach $120 billion in 2026 and exceed $1.1 trillion by 2028.
Barclays also noted that LLM providers, such as OpenAI, are being forced to look at custom chips, primarily ASICs, instead of GPUs, to reduce the cost of inference and move toward profitability.
The case for Google TPUs
Inference consumes over 50% of OpenAI’s compute budget, and TPUs, particularly older ones, offer significantly lower cost-per-inference compared with Nvidia GPUs, Dai said, explaining the significance of TPUs for OpenAI.
“While older TPUs lack the peak performance of newer Nvidia chips, their dedicated architecture minimizes energy waste and idle resources, making them more cost-effective at scale,” Dai added.
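The arithmetic behind that claim is straightforward: what matters for serving is not raw speed but dollars per request. A minimal sketch, with hypothetical prices and throughput figures (not Barclays or Dai numbers), illustrates why an older, cheaper chip can still win at scale:

```python
# Illustrative only: all hourly costs and throughput figures below are
# hypothetical placeholders, not figures from the article.
# The point is the arithmetic:
#   cost per inference = hourly chip cost / inferences served per hour

def cost_per_inference(hourly_cost_usd: float, inferences_per_hour: float) -> float:
    """Dollar cost of serving a single inference request."""
    return hourly_cost_usd / inferences_per_hour

# A cheaper, older accelerator can undercut a faster chip on cost per request
# as long as its price falls faster than its throughput does.
newer_gpu = cost_per_inference(hourly_cost_usd=4.00, inferences_per_hour=10_000)
older_tpu = cost_per_inference(hourly_cost_usd=1.20, inferences_per_hour=6_000)

print(f"Newer GPU: ${newer_gpu:.6f} per inference")  # $0.000400
print(f"Older TPU: ${older_tpu:.6f} per inference")  # $0.000200
```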
Omdia principal analyst Alexander Harrowell agreed with Dai.
“…a lot of AI practitioners will tell you they get (from TPUs) a better ratio of floating-point operations per second (FLOPS), a unit for measuring computational performance, utilized versus theoretical maximum performance than they do with anything else,” Harrowell said.
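The ratio Harrowell describes is simply achieved throughput divided by the chip’s theoretical peak. A short sketch, using hypothetical numbers purely for illustration, shows why a chip with a lower peak can still deliver more useful work:

```python
# Illustrative sketch of the utilization ratio Harrowell describes:
# achieved FLOPS divided by the chip's theoretical peak FLOPS.
# All figures are hypothetical placeholders, not vendor numbers.

def flops_utilization(achieved_flops: float, peak_flops: float) -> float:
    """Fraction of a chip's theoretical maximum throughput actually used."""
    return achieved_flops / peak_flops

# Chip B has half the peak of Chip A but wastes fewer of its cycles,
# so a larger share of its rated performance turns into real work.
chip_a = flops_utilization(achieved_flops=400e12, peak_flops=1000e12)
chip_b = flops_utilization(achieved_flops=275e12, peak_flops=500e12)

print(f"Chip A utilization: {chip_a:.0%}")  # 40%
print(f"Chip B utilization: {chip_b:.0%}")  # 55%
```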
