AI & Compute

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Last updated: March 6, 2025 9:48 pm
Published March 6, 2025

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team has successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.
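To put the size gap in numbers, here is a rough back-of-the-envelope calculation using only the parameter counts quoted above. It assumes that all 32 billion parameters of QwQ-32B are used for every token, which the article does not state; the variable names are illustrative.

```python
# Parameter counts in billions, as quoted in the article.
QWQ_PARAMS_B = 32          # QwQ-32B (assumed here to be fully active per token)
R1_TOTAL_PARAMS_B = 671    # DeepSeek-R1, total
R1_ACTIVE_PARAMS_B = 37    # DeepSeek-R1, activated per token

# How much larger DeepSeek-R1 is, by total and by activated parameters.
total_ratio = R1_TOTAL_PARAMS_B / QWQ_PARAMS_B
active_ratio = R1_ACTIVE_PARAMS_B / QWQ_PARAMS_B

print(f"total: {total_ratio:.1f}x, active: {active_ratio:.2f}x")
# prints "total: 21.0x, active: 1.16x"
```

Under that assumption, DeepSeek-R1 holds roughly 21 times as many parameters in total, while the number it activates per token is only about 16% higher than QwQ-32B's full size.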

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8, but significantly ahead of OpenAI-o1-mini’s 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, again close to DeepSeek-R1-671B’s 65.9, and surpassing the distilled models and OpenAI-o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, ahead of DeepSeek-R1-671B’s 71.6, and outperforming the distilled models and OpenAI-o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, very close to DeepSeek-R1-671B’s 83.3, and leading the distilled models and OpenAI-o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, ahead of DeepSeek-R1-671B’s 62.8, demonstrating a lead over the distilled models and OpenAI-o1-mini’s 49.3.
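
The scores listed above can be compared side by side with a short script. This is only an illustrative sketch: the numbers are copied from the list, the distilled models are left out because the article gives no figures for them, and the names `SCORES` and `gaps` are made up for this example.

```python
# Scores as reported in the article: (QwQ-32B, DeepSeek-R1-671B, OpenAI-o1-mini).
SCORES = {
    "AIME24": (79.5, 79.8, 63.6),
    "LiveCodeBench": (63.4, 65.9, 53.8),
    "LiveBench": (73.1, 71.6, 57.5),
    "IFEval": (83.9, 83.3, 59.1),
    "BFCL": (66.4, 62.8, 49.3),
}


def gaps(scores):
    """Return {benchmark: (QwQ minus R1, QwQ minus o1-mini)}, rounded to 1 decimal."""
    return {
        name: (round(qwq - r1, 1), round(qwq - o1, 1))
        for name, (qwq, r1, o1) in scores.items()
    }


if __name__ == "__main__":
    for name, (vs_r1, vs_o1) in gaps(SCORES).items():
        print(f"{name:<14} vs R1: {vs_r1:+.1f}   vs o1-mini: {vs_o1:+.1f}")
```

On these figures QwQ-32B trails DeepSeek-R1 on AIME24 and LiveCodeBench, leads it on LiveBench, IFEval, and BFCL, and is ahead of o1-mini on all five.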

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
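
To make the idea of outcome-based rewards concrete, the sketch below shows one way such checks could look for the math and coding stage. It is not the Qwen team's implementation, which the article does not describe in detail: the function names, the answer normalisation, and the use of a local subprocess in place of a code execution server are all assumptions made for illustration.

```python
import subprocess
import sys


def math_reward(model_answer: str, reference: str) -> float:
    """Accuracy check: 1.0 if the final answer matches the reference, else 0.0."""
    def normalise(s: str) -> str:
        # Hypothetical normalisation: trim, drop a trailing period, remove spaces.
        return s.strip().rstrip(".").replace(" ", "")
    return 1.0 if normalise(model_answer) == normalise(reference) else 0.0


def code_reward(solution: str, test_code: str, timeout: float = 5.0) -> float:
    """Execution check: 1.0 if the solution plus its tests exits cleanly, else 0.0.

    Runs in a plain subprocess, which is NOT a sandbox; a real setup would
    isolate untrusted code far more strictly.
    """
    program = solution + "\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```

Both functions score only the final outcome (a correct answer, or code that passes its tests) rather than the intermediate reasoning, which is the property the term "outcome-based" refers to.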

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.

© 2026 Data Center News. All Rights Reserved.
