Saturday, 13 Dec 2025
Subscribe
logo
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Font ResizerAa
Data Center NewsData Center News
Search
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Have an existing account? Sign In
Follow US
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
Data Center News > Blog > AI > Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase
AI

Alibaba Qwen QwQ-32B: Scaled reinforcement learning showcase

Last updated: March 6, 2025 9:48 pm
Published March 6, 2025
Share
Person holding popcorn as Alibaba unveils Qwen QwQ-32B — a 32 billion parameter AI model that demonstrates performance rivalling the much larger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Learning (RL) on robust foundation models.
SHARE

The Qwen crew at Alibaba has unveiled QwQ-32B, a 32 billion parameter AI mannequin that demonstrates efficiency rivalling the a lot bigger DeepSeek-R1. This breakthrough highlights the potential of scaling Reinforcement Studying (RL) on strong basis fashions.

The Qwen crew have efficiently built-in agent capabilities into the reasoning mannequin, enabling it to assume critically, utilise instruments, and adapt its reasoning primarily based on environmental suggestions.

“Scaling RL has the potential to boost mannequin efficiency past standard pretraining and post-training strategies,” the crew acknowledged. “Current research have demonstrated that RL can considerably enhance the reasoning capabilities of fashions.”

QwQ-32B achieves efficiency corresponding to DeepSeek-R1, which boasts 671 billion parameters (with 37 billion activated), a testomony to the effectiveness of RL when utilized to strong basis fashions pretrained on in depth world information. This outstanding end result underscores the potential of RL to bridge the hole between mannequin measurement and efficiency.

The mannequin has been evaluated throughout a variety of benchmarks, together with AIME24, LiveCodeBench, LiveBench, IFEval, and BFCL, designed to evaluate its mathematical reasoning, coding proficiency, and normal problem-solving capabilities.

The outcomes spotlight QwQ-32B’s efficiency compared to different main fashions, together with DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the unique DeepSeek-R1.

Benchmark outcomes:

  • AIME24: QwQ-32B achieved 79.5, barely behind DeepSeek-R1-6718’s 79.8, however considerably forward of OpenAl-o1-mini’s 63.6 and the distilled fashions.
  • LiveCodeBench: QwQ-32B scored 63.4, once more intently matched by DeepSeek-R1-6718’s 65.9, and surpassing the distilled fashions and OpenAl-o1-mini’s 53.8.
  • LiveBench: QwQ-32B achieved 73.1, with DeepSeek-R1-6718 scoring 71.6, and outperforming the distilled fashions and OpenAl-o1-mini’s 57.5.
  • IFEval: QwQ-32B scored 83.9, very near DeepSeek-R1-6718’s 83.3, and main the distilled fashions and OpenAl-o1-mini’s 59.1.
  • BFCL: QwQ-32B achieved 66.4, with DeepSeek-R1-6718 scoring 62.8, demonstrating a lead over the distilled fashions and OpenAl-o1-mini’s 49.3.
See also  Cohere launches new AI models to bridge global language divide

The Qwen crew’s method concerned a cold-start checkpoint and a multi-stage RL course of pushed by outcome-based rewards. The preliminary stage centered on scaling RL for math and coding duties, utilising accuracy verifiers and code execution servers. The second stage expanded to normal capabilities, incorporating rewards from normal reward fashions and rule-based verifiers.

“We discover that this stage of RL coaching with a small quantity of steps can improve the efficiency of different normal capabilities, reminiscent of instruction following, alignment with human desire, and agent efficiency, with out important efficiency drop in math and coding,” the crew defined.

QwQ-32B is open-weight and accessible on Hugging Face and ModelScope beneath the Apache 2.0 license, and can be accessible by way of Qwen Chat. The Qwen crew views this as an preliminary step in scaling RL to boost reasoning capabilities and goals to additional discover the mixing of brokers with RL for long-horizon reasoning.

“As we work in direction of creating the subsequent technology of Qwen, we’re assured that combining stronger basis fashions with RL powered by scaled computational assets will propel us nearer to reaching Synthetic Normal Intelligence (AGI),” the crew acknowledged.

See additionally: Deepgram Nova-3 Medical: AI speech mannequin cuts healthcare transcription errors

Need to study extra about AI and large knowledge from business leaders? Try AI & Big Data Expo going down in Amsterdam, California, and London. The great occasion is co-located with different main occasions together with Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Discover different upcoming enterprise expertise occasions and webinars powered by TechForge here.

See also  Wall Street Ponke Launches with AI Tools, Learning Hub, and Over $300K Raised in Hours

Source link

TAGGED: Alibaba, Learning, Qwen, QwQ32B, reinforcement, Scaled, showcase
Share This Article
Twitter Email Copy Link Print
Previous Article TickPick TickPick Acquires Fanimal
Next Article Alessandro Bruno, CTO, and Matthijs Rijlarsdam, CEO, QuantWare Quantware Raises €20M in Series A Funding
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Your Trusted Source for Accurate and Timely Updates!

Our commitment to accuracy, impartiality, and delivering breaking news as it happens has earned us the trust of a vast audience. Stay ahead with real-time updates on the latest events, trends.
FacebookLike
TwitterFollow
InstagramFollow
YoutubeSubscribe
LinkedInFollow
MediumFollow
- Advertisement -
Ad image

Popular Posts

Digital Realty unveils first data centre in Crete

Digital Realty’s new knowledge heart, HER1, the primary service impartial facility on Crete, uniquely positions…

April 9, 2025

TMRW Raises $1.3M in Pre-Seed Funding

TMRW, a Miami-based Bitcoin startup, raised $1.3m in pre-seed funding spherical. The spherical was led by Maple VC, with participation…

August 25, 2024

EdgeCortix forecasts AI will redefine business in 2024

EdgeCortix, a Japan-based fabless semiconductor company, foresees 2024 as a pivotal year for edge AI.…

January 25, 2024

Celona redefines private 5G with cloud-only AP architecture for industrial AI

Non-public 5G options supplier Celona has launched AerFlex, the primary cloud-controlled, entry level (AP)-only non-public…

August 11, 2025

Counterintuitive’s new chip aims escape the AI ‘twin trap’

AI startup firm, Counterintuitive, has got down to construct “reasoning-native computing,” enabling machines to grasp…

November 2, 2025

You Might Also Like

Newsweek: Building AI-resilience for the next era of information
AI

Newsweek: Building AI-resilience for the next era of information

By saad
Google’s new framework helps AI agents spend their compute and tool budget more wisely
AI

Google’s new framework helps AI agents spend their compute and tool budget more wisely

By saad
BBVA embeds AI into banking workflows using ChatGPT Enterprise
AI

BBVA embeds AI into banking workflows using ChatGPT Enterprise

By saad
Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks
AI

Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

By saad
Data Center News
Facebook Twitter Youtube Instagram Linkedin

About US

Data Center News: Stay informed on the pulse of data centers. Latest updates, tech trends, and industry insights—all in one place. Elevate your data infrastructure knowledge.

Top Categories
  • Global Market
  • Infrastructure
  • Innovations
  • Investments
Usefull Links
  • Home
  • Contact
  • Privacy Policy
  • Terms & Conditions

© 2024 – datacenternews.tech – All rights reserved

Welcome Back!

Sign in to your account

Lost your password?
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.
You can revoke your consent any time using the Revoke consent button.