Narek Tatevosyan, Product Director at Nebius AI, explores how AI startups can increase efficiency by optimising their tech stack and using full-stack platforms.
The current generative AI boom is pushing the boundaries of technology, transforming industries, and driving up demand for compute. Many AI start-ups are falling into the ‘compute trap’, focusing on gaining access to the latest, most powerful hardware regardless of cost, rather than optimising their existing infrastructure or finding more effective and efficient approaches to building GenAI applications.
While GPU power will always be essential for training large AI models and other machine learning applications, it isn’t the only thing that matters. Without state-of-the-art CPUs, high-speed network interface cards like InfiniBand NDR 400, DDR5 memory, and a motherboard and server rack that can tie it all together, it’s impossible to get maximum performance from an NVIDIA H100 or other top-spec GPUs. As well as taking this broader view of compute, a more holistic approach to developing AI applications that includes efficient data preparation, optimised training runs and scalable inference infrastructure allows you to scale and evolve your AI applications sustainably.
The problem with compute
All else being equal, the more compute you have available and the larger your dataset, the more powerful the AI models you can build. For example, Meta’s Llama 3.1 8B and 405B LLMs were trained on the same 15-trillion-token dataset using NVIDIA H100s – but the 8B version took 1.46 million GPU hours while the significantly more powerful 405B version took 30.84 million GPU hours.
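To put those GPU-hour figures in perspective, a back-of-the-envelope sketch can translate them into rough costs. The $2.00/hour H100 rental rate below is a hypothetical placeholder for illustration only, not a quoted price from any provider:

```python
# Rough training-cost estimate from Meta's published GPU-hour figures.
# HOURLY_RATE_USD is a hypothetical placeholder, not a real quoted price.
HOURLY_RATE_USD = 2.00

llama_gpu_hours = {
    "Llama 3.1 8B": 1.46e6,     # 1.46 million H100 GPU hours
    "Llama 3.1 405B": 30.84e6,  # 30.84 million H100 GPU hours
}

for model, hours in llama_gpu_hours.items():
    cost_musd = hours * HOURLY_RATE_USD / 1e6
    print(f"{model}: {hours / 1e6:.2f}M GPU hours ~ ${cost_musd:.1f}M")

# How much more compute did the larger model consume?
ratio = llama_gpu_hours["Llama 3.1 405B"] / llama_gpu_hours["Llama 3.1 8B"]
print(f"405B used about {ratio:.1f}x the compute of 8B")
```

Even at a modest assumed hourly rate, the gap between the two models is a factor of roughly 21x in raw compute spend, which is exactly the kind of bill that pushes smaller teams toward efficiency rather than brute force.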
In the real world, of course, all else is seldom equal, and very few AI companies have the resources to compete head-on with a Meta. Instead of falling into the compute trap and trying to match the compute spend of some of the largest companies in the world, it can be a more effective competitive strategy to focus holistically on the entire tech stack driving your ML development.
It’s also worth noting that while Llama 8B isn’t as powerful as Llama 405B, it’s still an effective and competitive LLM that outperforms many older, larger models. While Meta clearly used an enormous amount of compute in developing Llama, the researchers were innovating aggressively in other areas too.
The full-stack advantage
Using a single platform to manage everything – from data preparation and labelling to model training, fine-tuning and even inference – comes with a number of advantages.
Developing and deploying an AI application on a single full-stack provider means your team has to learn a single set of tools, rather than several different platforms. Similarly, your data stays on one platform, so you don’t have to deal with the complexities and inefficiencies of multi-cloud operations. Perhaps most usefully, if you run into any issues you are dealing with a single support team that understands the other layers in your stack.
And, of course, there can be financial benefits: by using the same infrastructure for data handling, training and inference, you are more likely to get better pricing from your infrastructure provider.
Beyond the big three
While hyperscalers like AWS, Microsoft Azure and Google Cloud might seem the obvious choice if you are investing in a single platform, they can have downsides for many if not most AI companies.
Most notably, the Big Three cloud computing platforms are expensive. If you run an exceptionally well-funded start-up or a big tech company, this might not be an issue – but for the majority of AI companies, the bigger cloud providers don’t offer the best ROI. Moreover, they aren’t optimised for AI-specific operations, and as a result you pay significant premiums for features you don’t need.
Dedicated full-stack AI platforms like Nebius offer a much more efficient solution. As well as providing more affordable compute and hardware setups optimised for both training and inference, they include only the tools and features needed for developing AI applications. You can focus on developing, training and optimising your AI models, confident that they are on the right hardware for the job, rather than navigating a sprawling server backend or wondering why your expensive GPUs aren’t getting the data throughput they should.
While taking a full-stack approach to ML development requires up-front investment, it minimises your ongoing infrastructure costs. Building a better-optimised application from the start not only saves on training runs, but should also reduce the cost of inference. These kinds of savings can compound over multiple generations of AI models: a slightly more efficient prototype can lead to a much more efficient production model, and so on into the future. The decisions you make now could be what gives your company the runway to make it to an IPO.
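The compounding effect above can be sketched numerically. The baseline cost and the 15% per-generation efficiency gain below are invented example figures, not data from any real deployment:

```python
# Illustration of how per-generation efficiency gains compound.
# Both numbers are invented for the example, not real benchmarks.
baseline_cost = 1_000_000   # hypothetical per-generation infra cost, USD
gain_per_generation = 0.15  # hypothetical efficiency improvement each cycle

cost = baseline_cost
for generation in range(1, 6):
    cost *= (1 - gain_per_generation)
    print(f"Generation {generation}: ${cost:,.0f}")
```

Under these assumed numbers, five generations of modest 15% improvements cut the per-generation cost by more than half – the same mechanism, at real scale, is what buys a start-up extra runway.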