Keysight Technologies introduces Keysight AI (KAI) Data Center Builder, an advanced software suite that emulates real-world workloads to evaluate how new algorithms, components, and protocols affect the performance of AI training. KAI Data Center Builder's workload emulation capability integrates large language model (LLM) and other artificial intelligence (AI) model training workloads into the design and validation of AI infrastructure components: networks, hosts, and accelerators. This solution enables tighter synergy between hardware design, protocols, architectures, and AI training algorithms, boosting system performance.
AI operators use various parallel processing techniques, known as model partitioning, to accelerate AI model training. Aligning model partitioning with AI cluster topology and configuration improves training performance. During the AI cluster design phase, critical questions are best answered by experimentation. Many of these questions concern the efficiency of data movement between graphics processing units (GPUs). Key considerations include:
Scale-up design of GPU interconnects within an AI host or rack
Scale-out network design, including bandwidth per GPU and topology
Configuration of network load balancing and congestion control
Tuning of training framework parameters
The KAI Data Center Builder workload emulation solution reproduces the network communication patterns of real-world AI training jobs. It accelerates experimentation, shortens the learning curve needed to become proficient, and gives deeper insight into the causes of performance degradation, which is difficult to achieve with real AI training jobs alone. Keysight customers can access a library of LLM workloads such as GPT and Llama, with several popular model partitioning schemas such as Data Parallel (DP), Fully Sharded Data Parallel (FSDP), and three-dimensional (3D) parallelism.
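The sketch below makes 3D parallelism more concrete. It shows one possible way to lay out GPU ranks into tensor-parallel (TP), pipeline-parallel (PP), and data-parallel (DP) groups. It is a minimal illustration under stated assumptions:

- The `ParallelLayout` class and its method names are hypothetical.
- The rank ordering (TP varies fastest, then DP, then PP) is assumed.
- None of it reflects KAI Data Center Builder's API or workload definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical 3D-parallel layout: world size = tp * pp * dp.

    Assumed rank ordering: TP index varies fastest, then DP, then PP.
    """
    tp: int  # tensor-parallel degree
    pp: int  # pipeline-parallel degree
    dp: int  # data-parallel degree

    @property
    def world_size(self) -> int:
        return self.tp * self.pp * self.dp

    def coords(self, rank: int) -> tuple[int, int, int]:
        """Map a global rank to (tp_idx, dp_idx, pp_idx)."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} out of range")
        tp_idx = rank % self.tp
        dp_idx = (rank // self.tp) % self.dp
        pp_idx = rank // (self.tp * self.dp)
        return tp_idx, dp_idx, pp_idx

    def rank(self, tp_idx: int, dp_idx: int, pp_idx: int) -> int:
        """Inverse of coords(): map (tp_idx, dp_idx, pp_idx) back to a global rank."""
        return tp_idx + self.tp * (dp_idx + self.dp * pp_idx)

    def tp_group(self, rank: int) -> list[int]:
        """Ranks that exchange tensor-parallel traffic with `rank`."""
        _, d, p = self.coords(rank)
        return [self.rank(t, d, p) for t in range(self.tp)]

    def dp_group(self, rank: int) -> list[int]:
        """Ranks that synchronise gradients (e.g. all-reduce) with `rank`."""
        t, _, p = self.coords(rank)
        return [self.rank(t, d, p) for d in range(self.dp)]

    def pp_group(self, rank: int) -> list[int]:
        """Ranks holding the successive pipeline stages for `rank`'s slice."""
        t, d, _ = self.coords(rank)
        return [self.rank(t, d, p) for p in range(self.pp)]


if __name__ == "__main__":
    layout = ParallelLayout(tp=8, pp=4, dp=4)  # 128 GPUs in total
    print(layout.tp_group(0))  # ranks 0..7
    print(layout.dp_group(0))  # ranks 0, 8, 16, 24
    print(layout.pp_group(0))  # ranks 0, 32, 64, 96
```

Assume ranks are placed on 8-GPU hosts in consecutive blocks. Then a TP degree of 8 keeps tensor-parallel traffic on the scale-up interconnect, while DP and PP traffic crosses the scale-out network. Changing the degrees or the ordering shifts which traffic lands on which fabric. That is the kind of trade-off the text describes as best explored experimentally.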
Using the workload emulation application in KAI Data Center Builder enables AI operators to:
Experiment with parallelism parameters, including partition sizes and their distribution over the available AI infrastructure (scheduling)
Understand the impact of communications within and among partitions on overall job completion time (JCT)
Identify low-performing collective operations and drill down to pinpoint bottlenecks
Analyse network utilisation, tail latency, and congestion to understand their impact on JCT
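The sketch below gives a rough illustration of why collective operations and per-GPU bandwidth feed directly into JCT. It applies the textbook latency-bandwidth (alpha-beta) cost model to a ring all-reduce. In that model, each of N GPUs sends 2(N-1)/N times the buffer size over 2(N-1) steps.

The function name, its parameters, and the example numbers are hypothetical. The model also ignores congestion, load-balancing effects, and tail latency, which are exactly what emulation on real infrastructure is meant to expose.

```python
def ring_allreduce_seconds(
    buffer_bytes: float,
    num_gpus: int,
    link_gbps: float,
    step_latency_us: float = 0.0,
) -> float:
    """Estimate ring all-reduce time with a simple alpha-beta model (hypothetical helper).

    buffer_bytes:    size of the buffer reduced on every GPU, in bytes
    num_gpus:        number of ranks in the ring
    link_gbps:       per-GPU network bandwidth, in gigabits per second
    step_latency_us: fixed per-step latency (alpha), in microseconds
    """
    if num_gpus < 2:
        return 0.0  # nothing to exchange with a single rank
    steps = 2 * (num_gpus - 1)  # reduce-scatter phase + all-gather phase
    bytes_sent_per_gpu = 2 * (num_gpus - 1) / num_gpus * buffer_bytes
    bandwidth_bytes_per_s = link_gbps * 1e9 / 8  # convert Gb/s to bytes/s
    latency_term = steps * step_latency_us * 1e-6
    bandwidth_term = bytes_sent_per_gpu / bandwidth_bytes_per_s
    return latency_term + bandwidth_term


if __name__ == "__main__":
    # Hypothetical example: 1 GB of gradients reduced across 8 GPUs.
    for gbps in (200, 400):
        t = ring_allreduce_seconds(1e9, 8, gbps)
        print(f"{gbps} Gb/s per GPU: {t * 1e3:.1f} ms per all-reduce")
```

Under this idealised model, doubling per-GPU bandwidth halves the bandwidth term. Every training iteration that exposes this all-reduce adds its cost to JCT. Measured behaviour will deviate from the model once congestion and tail latency come into play.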
KAI Data Center Builder's new workload emulation capabilities enable AI operators, GPU cloud providers, and infrastructure vendors to bring realistic AI workloads into their lab setups to validate evolving AI cluster designs and new components. They can also experiment to fine-tune model partitioning schemas, parameters, and algorithms to optimise the infrastructure and improve AI workload performance.
Ram Periakaruppan, Vice President and General Manager, Network Test & Security Solutions, Keysight, said: "As AI infrastructure grows in scale and complexity, the need for full-stack validation and optimisation becomes crucial. To avoid costly delays and rework, it is essential to shift validation to earlier phases of the design and manufacturing cycle. KAI Data Center Builder's workload emulation brings a new level of realism to AI component and system design, optimising workloads for peak performance."
KAI Data Center Builder is the foundation of the Keysight Artificial Intelligence (KAI) architecture, a portfolio of end-to-end solutions designed to help customers scale artificial intelligence processing capacity in data centres by validating AI cluster components using real-world AI workload emulation.
