Tencent has expanded its family of open-source Hunyuan AI models with new variants versatile enough for broad use. The new family of models is engineered to deliver strong performance across computational environments, from small edge devices to demanding, high-concurrency production systems.
The release includes a comprehensive set of pre-trained and instruction-tuned models available on the developer platform Hugging Face. The models come in several sizes, with parameter scales of 0.5B, 1.8B, 4B, and 7B, providing substantial flexibility for developers and businesses.
Tencent has indicated that these models were developed using training strategies similar to those of its more powerful Hunyuan-A13B model, allowing them to inherit its performance characteristics. This approach enables users to select the optimal model for their needs, whether a smaller variant for resource-constrained edge computing or a larger model for high-throughput production workloads, all while ensuring strong capabilities.
One of the most notable features of the Hunyuan series is its native support for an ultra-long 256K context window. This allows the models to handle long-text tasks with stable performance, a vital capability for complex document analysis, extended conversations, and in-depth content generation. The models support what Tencent calls "hybrid reasoning," which offers both fast and slow thinking modes that users can choose between depending on their specific requirements.
The company has also placed a strong emphasis on agentic capabilities. The models have been optimised for agent-based tasks and have demonstrated leading results on established benchmarks such as BFCL-v3, τ-Bench, and C3-Bench, suggesting a high degree of proficiency in complex, multi-step problem-solving. For instance, on C3-Bench, the Hunyuan-7B-Instruct model achieves a score of 68.5, while the Hunyuan-4B-Instruct model scores 64.3.
The series' performance is underpinned by a focus on efficient inference. Tencent's Hunyuan models use Grouped Query Attention (GQA), a technique known for improving processing speed and reducing computational overhead. This efficiency is further enhanced by advanced quantisation support, a key element of the Hunyuan architecture designed to lower deployment barriers.
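The idea behind GQA can be illustrated with a minimal NumPy sketch (not Tencent's implementation): several query heads share a single key/value head, so the KV cache shrinks by the grouping factor while each query head still attends normally. The head counts and dimensions below are illustrative only.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Toy GQA: q has n_q_heads, k/v have fewer n_kv_heads, and each
    KV head serves n_q_heads // n_kv_heads query heads."""
    n_q_heads, seq_len, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_q_heads // n_kv_heads
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                       # query head h reads shared KV head kv
        scores = q[h] @ k[kv].T / np.sqrt(d)  # (seq_len, seq_len)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)    # softmax over keys
        out[h] = w @ v[kv]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))  # 8 query heads
k = rng.standard_normal((2, 4, 16))  # only 2 KV heads -> 4x smaller KV cache
v = rng.standard_normal((2, 4, 16))
print(grouped_query_attention(q, k, v).shape)  # (8, 4, 16)
```

The memory saving matters most for long contexts such as the 256K window above, since the KV cache grows linearly with sequence length.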
Tencent has developed its own compression toolset, AngleSlim, to provide a more user-friendly and effective model compression solution. Using this tool, the company offers two main types of quantisation for the Hunyuan series.
The first is FP8 static quantisation, which employs an 8-bit floating-point format. This method uses a small amount of calibration data to pre-determine the quantisation scale without requiring full retraining, converting model weights and activation values into the FP8 format to boost inference efficiency.
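The "static" part of this scheme can be sketched as follows. This is a simplified illustration, not AngleSlim's actual code: it shows only how a fixed scale is derived from calibration data and how values are mapped into the E4M3 range, omitting the actual rounding to the FP8 grid.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def fp8_static_scale(calibration_batches):
    # One offline pass over calibration data fixes the scale ahead of time;
    # no retraining and no per-request statistics are needed at inference.
    absmax = max(np.abs(b).max() for b in calibration_batches)
    return absmax / FP8_E4M3_MAX

def fake_quantize_fp8(x, scale):
    # Simulate the FP8 range: scale down, clip to E4M3 limits, scale back.
    # (A real kernel would also round to the FP8 value grid and keep the
    # low-precision representation for fast matmuls.)
    return np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX) * scale

calib = [np.random.default_rng(i).standard_normal(1024) for i in range(4)]
scale = fp8_static_scale(calib)
x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(fake_quantize_fp8(x, scale), x))  # True: in-range values survive
```

Because the scale is frozen after calibration, inference avoids the cost of computing activation statistics on the fly, which is the main appeal of static over dynamic quantisation.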
The second method is INT4 quantisation, which achieves W4A16 quantisation through the GPTQ and AWQ algorithms:
- The GPTQ approach processes model weights layer by layer, using calibration data to minimise errors in the quantised weights. This avoids the need for model retraining and improves inference speed.
- The AWQ algorithm statistically analyses the amplitude of activation values from a small set of calibration data. It then calculates a scaling coefficient for each weight channel, which expands the numerical range of important weights so that more information is retained during compression.
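The AWQ-style scaling step above can be sketched in a few lines of NumPy. This is a loose illustration of the idea, not the actual AWQ or AngleSlim implementation, and the exponent `alpha` is an illustrative hyperparameter: channels with large calibration activations get their weights scaled up (and activations scaled down) before 4-bit rounding, which is mathematically a no-op until rounding is applied.

```python
import numpy as np

def awq_style_scales(calib_acts, alpha=0.5):
    # Per-input-channel activation magnitude -> per-channel scaling factor.
    mag = np.abs(calib_acts).mean(axis=0)   # (in_features,)
    scales = mag ** alpha
    return scales / scales.mean()           # normalise around 1

def apply_scales(weight, scales):
    # W @ x == (W * s) @ (x / s): salient channels occupy a wider numeric
    # range before rounding; less-salient channels absorb the shrinkage.
    return weight * scales[None, :]

rng = np.random.default_rng(0)
# Channel 7 has 10x larger activations -> it is "salient" for output error.
acts = rng.standard_normal((256, 8)) * np.array([1, 1, 1, 1, 1, 1, 1, 10.0])
w = rng.standard_normal((4, 8))
s = awq_style_scales(acts)
x = rng.standard_normal(8)
print(np.allclose(w @ x, apply_scales(w, s) @ (x / s)))  # True: exact pre-rounding
```

The benefit only appears once the scaled weights are rounded to 4 bits: the relative rounding error on the salient channels shrinks, while the equivalence above guarantees nothing else changes.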
Developers can either run the AngleSlim tool themselves or download the pre-quantised models directly.
Performance benchmarks confirm the strong capabilities of the Tencent Hunyuan models across a range of tasks. The pre-trained Hunyuan-7B model, for example, achieves a score of 79.82 on the MMLU benchmark, 88.25 on GSM8K, and 74.85 on the MATH benchmark, demonstrating solid reasoning and mathematical skills.
The instruction-tuned variants show impressive results in specialised areas. In mathematics, the Hunyuan-7B-Instruct model scores 81.1 on the AIME 2024 benchmark, while the 4B version scores 78.3. In science, the 7B model reaches 76.5 on OlympiadBench, and in coding it scores 42 on Livecodebench.
The quantisation benchmarks show minimal performance degradation. On the DROP benchmark, the Hunyuan-7B-Instruct model scores 85.9 in its base B16 format, 86.0 with FP8, and 85.7 with Int4 GPTQ, indicating that the efficiency gains do not come at a cost to accuracy.
For deployment, Tencent recommends using established frameworks such as TensorRT-LLM, vLLM, or SGLang to serve the Hunyuan models and create OpenAI-compatible API endpoints, ensuring they can be integrated smoothly into existing development workflows. This combination of performance, efficiency, and deployment flexibility positions the Hunyuan series as a powerful contender in open-source AI.
See also: Deep Cogito v2: Open-source AI that hones its reasoning skills

