Sunday, 14 Dec 2025
Data Center News

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Last updated: March 6, 2025 9:48 pm
Published March 6, 2025
Image: Person holding popcorn as Alibaba unveils Qwen QwQ-32B.

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team has successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.
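
To put the size gap in numbers, here is a quick back-of-the-envelope calculation. It uses only the parameter counts quoted above; the constant and function names are illustrative.

```python
# Parameter counts quoted in the article, in billions.
QWQ_32B_PARAMS = 32
DEEPSEEK_R1_TOTAL_PARAMS = 671
DEEPSEEK_R1_ACTIVATED_PARAMS = 37


def size_ratio(larger: float, smaller: float) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


total_ratio = size_ratio(DEEPSEEK_R1_TOTAL_PARAMS, QWQ_32B_PARAMS)
activated_ratio = size_ratio(DEEPSEEK_R1_ACTIVATED_PARAMS, QWQ_32B_PARAMS)

print(f"Total parameters: DeepSeek-R1 is about {total_ratio:.1f}x QwQ-32B")
print(f"Activated parameters: DeepSeek-R1 is about {activated_ratio:.2f}x QwQ-32B")
```

By total parameter count, DeepSeek-R1 is roughly 21 times the size of QwQ-32B, while its activated parameter count is only about 1.16 times larger.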

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8, but significantly ahead of OpenAI-o1-mini’s 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, close behind DeepSeek-R1-671B’s 65.9, and surpassing the distilled models and OpenAI-o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, ahead of DeepSeek-R1-671B’s 71.6, and outperforming the distilled models and OpenAI-o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, very close to DeepSeek-R1-671B’s 83.3, and leading the distilled models and OpenAI-o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, ahead of DeepSeek-R1-671B’s 62.8, and leading the distilled models and OpenAI-o1-mini’s 49.3.
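
The gaps are easier to compare side by side. The short script below tabulates the scores listed above and computes QwQ-32B's margin over DeepSeek-R1 and o1-mini on each benchmark. The distilled models are left out because the article does not give their scores; the helper name `score_gaps` is illustrative.

```python
# Scores from the list above, as (QwQ-32B, DeepSeek-R1-671B, OpenAI-o1-mini).
SCORES = {
    "AIME24": (79.5, 79.8, 63.6),
    "LiveCodeBench": (63.4, 65.9, 53.8),
    "LiveBench": (73.1, 71.6, 57.5),
    "IFEval": (83.9, 83.3, 59.1),
    "BFCL": (66.4, 62.8, 49.3),
}


def score_gaps(scores):
    """Map each benchmark to (gap vs DeepSeek-R1, gap vs o1-mini).

    A positive gap means QwQ-32B scored higher.
    """
    return {
        name: (round(qwq - r1, 1), round(qwq - o1_mini, 1))
        for name, (qwq, r1, o1_mini) in scores.items()
    }


gaps = score_gaps(SCORES)
for name, (vs_r1, vs_o1_mini) in gaps.items():
    print(f"{name:<14} vs DeepSeek-R1: {vs_r1:+.1f}   vs o1-mini: {vs_o1_mini:+.1f}")

# Benchmarks on which QwQ-32B scores higher than DeepSeek-R1.
leads_r1 = [name for name, (vs_r1, _) in gaps.items() if vs_r1 > 0]
print("QwQ-32B ahead of DeepSeek-R1 on:", ", ".join(leads_r1))
```

On these figures QwQ-32B trails DeepSeek-R1 on AIME24 and LiveCodeBench, leads it on LiveBench, IFEval, and BFCL, and is ahead of o1-mini on all five.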

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
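
As a rough illustration of what an outcome-based reward can look like in the first stage, the sketch below scores a response only on its final result. A math answer is compared with a reference, and generated code is run against test cases in a separate process. This is a minimal sketch under our own assumptions, not the Qwen team's implementation: the function names, the "Answer:" line convention, and the 0/1 reward values are all hypothetical.

```python
import subprocess
import sys

ANSWER_PREFIX = "Answer:"  # hypothetical convention for marking the final answer


def math_accuracy_reward(response: str, reference: str) -> float:
    """Hypothetical accuracy verifier: 1.0 if the final answer matches, else 0.0.

    Only the last 'Answer: <value>' line is checked; the reasoning that
    precedes it does not affect the reward.
    """
    for line in reversed(response.strip().splitlines()):
        if line.startswith(ANSWER_PREFIX):
            answer = line[len(ANSWER_PREFIX):].strip()
            return 1.0 if answer == reference.strip() else 0.0
    return 0.0  # no final answer found


def code_execution_reward(solution: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical code verifier: 1.0 if the solution passes its tests, else 0.0.

    Runs the solution plus tests in a fresh Python process. A real execution
    server would also need sandboxing, which is omitted here.
    """
    program = solution + "\n\n" + test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


# Example: a correct math answer and a correct code solution.
math_reward = math_accuracy_reward("2 * 21 is 42.\nAnswer: 42", "42")
code_reward = code_execution_reward(
    "def add(a, b):\n    return a + b",
    "assert add(2, 3) == 5",
)
print(math_reward, code_reward)
```

Because both functions look only at the outcome, they can be checked automatically at scale, which is what makes this kind of reward practical for math and coding tasks.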

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.