Rafay launched a Serverless Inference offering to help NVIDIA Cloud Partners (NCPs) and GPU Cloud Providers deliver high-margin AI services quickly and cost-effectively.
The offering provides a token-metered API for running open-source and privately trained or fine-tuned large language models (LLMs). Key features include seamless developer integration, intelligent infrastructure management, built-in metering and billing, enterprise-grade security, and observability tools.
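To make the idea concrete, the sketch below shows what consuming such a token-metered inference endpoint could look like from an application. The announcement does not document Rafay's actual API, so the endpoint URL, header names, model identifier, and response fields here are illustrative assumptions, not the published interface.

```python
# Hypothetical sketch: calling a token-metered serverless inference endpoint.
# The URL, headers, model name, and response shape are assumptions for illustration,
# not Rafay's documented API.
import requests

API_BASE = "https://inference.example-gpu-cloud.com/v1"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"  # credential issued by the GPU cloud provider

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama-3-8b-instruct",  # an open-source model hosted by the provider
        "messages": [{"role": "user", "content": "Summarize this support ticket."}],
        "max_tokens": 256,
    },
    timeout=30,
)
response.raise_for_status()
data = response.json()

print(data["choices"][0]["message"]["content"])
# A token-metered service would bill on usage fields such as data["usage"]["total_tokens"].
```

In this model, the application only needs an API key and an endpoint; provisioning GPUs, scaling, and per-token metering are handled by the provider behind the endpoint.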
It enables NCPs and GPU Clouds to transition from GPU-as-a-Service to AI-as-a-Service, addressing growing demand in the AI inference market. The solution eliminates infrastructure complexity, allowing developers and enterprises to integrate generative AI workflows into applications rapidly.
“Having spent the last year experimenting with GenAI, many enterprises are now focused on building agentic AI applications that augment and enhance their business offerings,” says Haseeb Budhani, CEO and co-founder of Rafay Systems. “The ability to rapidly consume GenAI models via inference endpoints is critical to faster development of GenAI capabilities. This is where Rafay’s NCP and GPU Cloud partners have a material advantage.”
This solution represents a shift toward more dynamic, scalable AI workloads that can operate closer to data sources, reducing latency and improving real-time processing. It could also accelerate the adoption of edge-based machine learning applications across industries, driving growth in edge AI inference markets.
The global AI inference market is projected to grow significantly, reaching $106 billion by 2025 and $254 billion by 2030.
Rafay’s platform supports multi-tenant GPU/CPU infrastructure and will soon include fine-tuning capabilities for AI models. Rafay aims to simplify cloud-native and AI infrastructure management, with customers such as MoneyGram and Guardant Health leveraging its solutions.
