Jürgen Hatheier, International CTO at Ciena, explains why the surge in AI training will eclipse cloud traffic and force operators to rethink capacity, automation and fibre strategy at 800 Gb/s and beyond.
Over the past 20 years, global broadband network traffic has increased at a steady, predictable rate. As cloud services, high-resolution video and other high-bandwidth applications have developed over that time, annual bandwidth increases of 20 to 30% have been sufficient to handle the additional demands placed on the network.
But that is about to change rapidly, owing to the growth of AI workloads and the ambitious plans set out by governments and businesses to compete in the AI race.
In the UK, for example, the AI Opportunities Action Plan aims to increase sovereign compute capacity by at least twenty times by 2030, arguing that such expansion is important if the UK is to keep pace. Across the Atlantic, the US Department of Energy recently identified 16 federal sites where tech companies can rapidly build data centres to accelerate commercial development of AI.
Meanwhile, the largest hyperscalers are both optimising their existing data centres for AI workloads and building dedicated AI data centres. Meta, for example, is in discussions to build a new AI data centre campus with costs exceeding $200 billion.
The growth in compute capacity driven by AI data centres will of course have implications for data centre interconnect (DCI), the networks used to link multiple data centres. Ciena recently commissioned a global survey of over 1,300 data centre experts to understand their expectations about AI's impact on DCI in the coming years. The survey findings validated the widely held view that AI is sparking a significant transformation in data centre network infrastructure.
AI's impact on DCI
According to Ciena's survey, more than half (53%) of data centre experts believe that, over the next two to three years, demands from AI workloads on DCI infrastructure will surpass those of cloud computing and big data analytics.
To meet these demands, significant investments in (and expansion of) data centre estates and infrastructure are underway. In fact, according to the survey, 43% of new data centre facilities are expected to be dedicated to AI workloads.
As the requirements for AI compute increase, Large Language Model (LLM) training will likely take place across geographically distributed facilities. Splitting training across different locations will allow the required tens of thousands of power-hungry GPUs to tap into different parts of the power grid. This approach requires data centres to synchronise results at each step of the training process by exchanging massive amounts of data. These transmissions must be as fast as possible to make the most of the costly compute infrastructure, creating demand for unprecedentedly high DCI bandwidths.
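To give a feel for why link speed matters at each synchronisation step, here is a minimal back-of-the-envelope sketch. The 1 TB payload per step is a purely hypothetical figure chosen for illustration (it does not come from the survey), and the calculation ignores latency, protocol overhead and parallel wavelengths.

```python
def transfer_seconds(payload_bytes: float, link_gbps: float) -> float:
    """Idealised time to move a payload over a single link.

    Ignores propagation latency, protocol overhead and retransmissions.
    """
    payload_bits = payload_bytes * 8
    return payload_bits / (link_gbps * 1e9)


# Hypothetical amount of data exchanged per synchronisation step (assumption).
PAYLOAD_BYTES = 1e12  # 1 TB

# Compare a few per-wavelength line rates, including 800 Gb/s.
for rate_gbps in (100, 400, 800, 1600):
    seconds = transfer_seconds(PAYLOAD_BYTES, rate_gbps)
    print(f"{rate_gbps:>5} Gb/s: {seconds:6.1f} s per synchronisation step")
```

Under these assumptions, doubling the line rate halves the time the GPUs may spend waiting on the network at each step, which is why idle compute cost translates so directly into demand for DCI bandwidth.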
The global data centre experts interviewed in our survey expect AI to drive at least a six-fold increase in DCI network bandwidth over the next five years. This translates to 40 to 60% compound annual growth, more than double the growth pattern we've known since the early 2000s.
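The relationship between a total growth multiple and a compound annual rate can be checked with a short calculation. This is only an illustrative sketch; the function names are made up for the example, and the inputs are the figures quoted in this article.

```python
def cagr(growth_multiple: float, years: float) -> float:
    """Compound annual growth rate implied by a total growth multiple."""
    return growth_multiple ** (1.0 / years) - 1.0


def total_multiple(annual_rate: float, years: float) -> float:
    """Total growth multiple after compounding an annual rate."""
    return (1.0 + annual_rate) ** years


YEARS = 5

# A six-fold increase over five years implies roughly 43% per year.
print(f"6x over {YEARS} years -> {cagr(6.0, YEARS):.1%} CAGR")

# What the historic and projected annual rates compound to over five years.
for rate in (0.20, 0.30, 0.40, 0.60):
    print(f"{rate:.0%} per year -> {total_multiple(rate, YEARS):.2f}x")
```

Compounding is what makes the difference so stark: 20 to 30% a year yields roughly 2.5x to 3.7x over five years, whereas a six-fold increase needs about 43% a year, and 60% a year would mean more than a ten-fold increase.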
CSPs and hyperscalers alike are taking steps to add more capacity and prepare their networks to handle the demands driven by AI. According to the survey findings, data centre experts believe a mix of solutions, both hardware and software, will be needed to improve DCI performance, efficiency and scalability.
Larger, smarter networks for AI workloads
As data traffic, driven by AI workloads alongside existing cloud, video and analytics services, continues to surge, data centre operators are increasingly investing in scalable, high-capacity infrastructure to keep pace with demand. For instance, nearly nine in ten (87%) data centre experts believe they will need a minimum of 800 Gb/s per wavelength across both new and existing network routes.
Capacity, however, isn't the only infrastructure challenge that data centre operators will need to address. To handle the diverse traffic types and dynamic traffic patterns of AI workloads, networks are transforming to become more intelligent and adaptive.
Smart networks and intelligent automation platforms will be key to ensuring AI traffic is prioritised and routed efficiently. Real-time software automation capabilities can enable networks to dynamically adjust bandwidth, optimise power consumption, and prevent congestion.
Managed Optical Fibre Networks: A new approach to collaboration
While cloud providers and hyperscalers are expanding very rapidly to support AI initiatives, our survey confirms that they will be leveraging Managed Optical Fibre Networks (MOFN) as an approach to help scale out and support the increased demands on data centres.
In fact, two thirds (67%) of the global survey respondents plan to use MOFN, while a third would acquire dark fibre themselves.
There is no one-size-fits-all approach
While the approaches laid out above may help cloud and communication service providers optimise their networks for AI workloads, the reality is that there is currently no one-size-fits-all network architecture to meet all use cases. Over the coming months and years, we'll see different network architectures and expansion strategies that match specific business models and best cater to their customers. Regardless of the approach taken, the key will be high-performance DCI connectivity.
