Data Center News

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Last updated: March 6, 2025 9:48 pm
Published March 6, 2025
Person holding popcorn as Alibaba unveils Qwen QwQ-32B — a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.

The Qwen team has successfully integrated agent capabilities into the reasoning model, enabling it to think critically, utilise tools, and adapt its reasoning based on environmental feedback.

“Scaling RL has the potential to enhance model performance beyond conventional pretraining and post-training methods,” the team stated. “Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models.”

QwQ-32B achieves performance comparable to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testament to the effectiveness of RL when applied to robust foundation models pretrained on extensive world knowledge. This remarkable outcome underscores the potential of RL to bridge the gap between model size and performance.
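To put the size gap in numbers, the short sketch below computes the parameter ratios implied by the figures quoted above. It assumes QwQ-32B is a dense model that uses all 32 billion parameters for every token; the article does not say so explicitly, and the constant and function names are illustrative only.

```python
# Parameter counts in billions, as quoted in the article.
QWQ_PARAMS_B = 32        # assumed dense: all parameters active per token
R1_TOTAL_PARAMS_B = 671  # DeepSeek-R1 total parameters
R1_ACTIVE_PARAMS_B = 37  # DeepSeek-R1 parameters activated per token


def size_ratio(larger_b: float, smaller_b: float) -> float:
    """Return how many times larger one parameter count is, to one decimal."""
    return round(larger_b / smaller_b, 1)


if __name__ == "__main__":
    print("Total parameters, R1 vs QwQ:", size_ratio(R1_TOTAL_PARAMS_B, QWQ_PARAMS_B))
    print("Active parameters, R1 vs QwQ:", size_ratio(R1_ACTIVE_PARAMS_B, QWQ_PARAMS_B))
```

Under that assumption, DeepSeek-R1 is roughly 21 times larger in total parameters, while its activated parameter count per token is only about 1.2 times that of QwQ-32B.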

The model has been evaluated across a range of benchmarks, including AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to assess its mathematical reasoning, coding proficiency, and general problem-solving capabilities.

The results highlight QwQ-32B’s performance in comparison to other leading models, including DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

Benchmark results:

  • AIME24: QwQ-32B achieved 79.5, slightly behind DeepSeek-R1-671B’s 79.8, but significantly ahead of OpenAI-o1-mini’s 63.6 and the distilled models.
  • LiveCodeBench: QwQ-32B scored 63.4, close behind DeepSeek-R1-671B’s 65.9, and surpassing the distilled models and OpenAI-o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, ahead of DeepSeek-R1-671B’s 71.6, and outperforming the distilled models and OpenAI-o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, narrowly ahead of DeepSeek-R1-671B’s 83.3, and leading the distilled models and OpenAI-o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, ahead of DeepSeek-R1-671B’s 62.8, and leading the distilled models and OpenAI-o1-mini’s 49.3.
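
The margins in the list above are easier to compare side by side. The following sketch tabulates the point gaps between QwQ-32B and the two models for which the article gives scores. The numbers are copied from the list; the dictionary layout and function names are illustrative, and scores for the distilled models are omitted because the article does not report them.

```python
# Scores from the list above: (QwQ-32B, DeepSeek-R1, o1-mini).
SCORES = {
    "AIME24": (79.5, 79.8, 63.6),
    "LiveCodeBench": (63.4, 65.9, 53.8),
    "LiveBench": (73.1, 71.6, 57.5),
    "IFEval": (83.9, 83.3, 59.1),
    "BFCL": (66.4, 62.8, 49.3),
}


def gaps(scores):
    """Map each benchmark to (QwQ minus R1, QwQ minus o1-mini), one decimal."""
    return {
        name: (round(qwq - r1, 1), round(qwq - o1, 1))
        for name, (qwq, r1, o1) in scores.items()
    }


def benchmarks_led_over_r1(scores):
    """List the benchmarks where QwQ-32B scores above DeepSeek-R1."""
    return [name for name, (qwq, r1, _) in scores.items() if qwq > r1]


if __name__ == "__main__":
    for name, (vs_r1, vs_o1) in gaps(SCORES).items():
        print(f"{name:14s} vs R1: {vs_r1:+.1f}   vs o1-mini: {vs_o1:+.1f}")
    print("Ahead of R1 on:", ", ".join(benchmarks_led_over_r1(SCORES)))
```

On these figures QwQ-32B trails DeepSeek-R1 on AIME24 and LiveCodeBench and leads it on LiveBench, IFEval, and BFCL.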

The Qwen team’s approach involved a cold-start checkpoint and a multi-stage RL process driven by outcome-based rewards. The initial stage focused on scaling RL for math and coding tasks, utilising accuracy verifiers and code execution servers. The second stage expanded to general capabilities, incorporating rewards from general reward models and rule-based verifiers.
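
The article does not describe how the verifiers work internally, so the following is only a minimal sketch of what an outcome-based reward could look like: the reward depends on whether the final result is correct, not on the reasoning that led to it. The function names, the answer normalisation, and the use of a local subprocess in place of a code execution server are all assumptions made for illustration, not the Qwen team's implementation.

```python
import subprocess
import sys


def math_reward(model_answer: str, reference: str) -> float:
    """Hypothetical accuracy verifier: 1.0 if the final answers match, else 0.0."""
    def normalize(text: str) -> str:
        # Ignore surrounding whitespace, inner spaces, and a trailing full stop.
        return text.strip().replace(" ", "").rstrip(".")
    return 1.0 if normalize(model_answer) == normalize(reference) else 0.0


def code_reward(solution_src: str, test_src: str, timeout_s: float = 5.0) -> float:
    """Hypothetical code verifier: 1.0 if the solution passes its tests, else 0.0.

    Runs the solution followed by assert-based tests in a fresh interpreter.
    A real execution server would sandbox this step; this sketch does not.
    """
    program = solution_src + "\n" + test_src
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return 0.0
    return 1.0 if result.returncode == 0 else 0.0


if __name__ == "__main__":
    print(math_reward(" 42 ", "42"))
    print(code_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"))
```

Binary rewards of this kind are simple to check automatically, which is one reason math and coding tasks are a natural starting point for scaling RL.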

“We find that this stage of RL training with a small amount of steps can increase the performance of other general capabilities, such as instruction following, alignment with human preference, and agent performance, without significant performance drop in math and coding,” the team explained.

QwQ-32B is open-weight and available on Hugging Face and ModelScope under the Apache 2.0 license, and is also accessible via Qwen Chat. The Qwen team views this as an initial step in scaling RL to enhance reasoning capabilities and aims to further explore the integration of agents with RL for long-horizon reasoning.

“As we work towards developing the next generation of Qwen, we are confident that combining stronger foundation models with RL powered by scaled computational resources will propel us closer to achieving Artificial General Intelligence (AGI),” the team stated.
