Phil Kaye, Co-Founder and Director of Vespertec, argues that alternative accelerators will grow inside hyperscalers, but it's Nvidia's ecosystem, alongside tightening memory and cooling constraints, that will shape most deployments in the year ahead.
In 2026, I expect hyperscalers will continue investing in their own accelerators – Google will expand its TPU deployments, and Meta is also evaluating the use of Google's TPU platform. These chips will succeed in their own environments, but they won't meaningfully challenge Nvidia's position outside hyperscaler walls.
The reason is simple: what keeps Nvidia ahead is the depth and completeness of its ecosystem. Beyond the hardware itself, the CUDA software stack is deeply embedded across tools and workflows that teams already understand and rely on.
This maturity means Nvidia platforms are well supported across different system architectures and slot cleanly into mixed-vendor environments. For organisations building AI infrastructure at scale, interoperability and software consistency make adoption far easier, without forcing a wholesale redesign of existing estates or locking them into a single-vendor approach.
Alongside this prediction that Nvidia will continue to dominate the market, Phil offered several other insights:
Private cloud makes a comeback, re-establishing itself as a central part of AI architecture
Earlier this month, AWS and Nvidia launched the Private AI Factory, offering customers access to high-performance compute in an environment they can control more directly. When an operator at AWS's scale invests in a model that brings advanced compute closer to where the data resides, the rest of the industry pays attention. In 2026, we can expect to see a noticeable shift back toward private cloud infrastructure as others follow their lead.
This isn't a rejection of public cloud or nostalgia for older deployment patterns, but a reflection of a growing recognition among companies of the need to take control over how their AI workloads run and how performance is governed. For many, regaining control over performance variables is becoming essential to achieving predictable scaling behaviour. AWS and Nvidia's endorsement gives the model legitimacy and sets a path the rest of the market is likely to follow.
Industry-wide AI memory stays in a super-shortage cycle, redefining architecture planning
DRAM, and particularly DDR5, is heading into a severe shortage cycle. Fabrication plants are shifting production capacity toward HBM3 for GPU manufacturing, which reduces the output of traditional RDIMMs just as AI servers are driving RDIMM demand to record levels. Vendors already know what their output capacity looks like for 2026, and demand will rise so sharply that a single cloud provider could realistically snap up everything the market can produce.
With that in mind, we can expect aggressive pricing and allocation-based supply. But contrary to popular belief, both procurement lead times and price pose the biggest risks for buyers. The days of last-minute expansions for performance upgrades are behind us. Organisations will need to plan far ahead and engage with partners earlier to gain an advantage.
NAND faces the same underlying pressure.
As demand for large, GPU-dense servers increases, so does the requirement for high-capacity, high-performance SSDs to support data-heavy AI workloads. This is driving sustained demand for enterprise NAND and contributing to ongoing price firming, with prices expected to continue rising gradually through 2026.
Organisations should plan for higher baseline costs and longer procurement timelines, particularly for high-capacity enterprise SSDs used in large-scale server deployments.
Cooling innovation moves beyond proof of concept, selectively
Cooling is entering a new phase, with immersion continuing to mature beyond proof of concept. However, direct liquid cooling (DLC) remains the primary choice for the most powerful AI systems. Technologies such as NVLink aren't currently compatible with immersion environments, meaning the highest-performance, most thermally demanding platforms still depend on DLC.
As rack power moves beyond 70kW and airflow becomes harder to manage, these systems are the ones pushing cooling limits, yet they're also the least suited to immersion. This helps explain why immersion has not seen broader uptake so far.
That said, the ecosystem around immersion is improving. Interoperability testing is increasing, tooling and service models are maturing, and more vendors are prepared to support immersion where it makes sense. Immersion won't be mainstream in 2026, but it will feature more often in design discussions as organisations recognise that traditional cooling approaches won't always match future compute ambitions.
This article is part of our DCR Predicts 2026 series. Come back each week in January for more.
