Thursday, 30 Apr 2026
Subscribe
logo
  • AI Compute
  • Infrastructure
  • Power & Cooling
  • Security
  • Colocation
  • Cloud Computing
  • More
    • Sustainability
    • Industry News
    • About Data Center News
    • Terms & Conditions
Font ResizerAa
Data Center NewsData Center News
Search
  • AI Compute
  • Infrastructure
  • Power & Cooling
  • Security
  • Colocation
  • Cloud Computing
  • More
    • Sustainability
    • Industry News
    • About Data Center News
    • Terms & Conditions
Have an existing account? Sign In
Follow US
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
Data Center News > Blog > AI & Compute > Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost
AI & Compute

Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost

Last updated: January 21, 2025 9:35 am
Published January 21, 2025
Share
Open-source DeepSeek-R1 uses pure reinforcement learning to match OpenAI o1 — at 95% less cost
SHARE

Be part of our every day and weekly newsletters for the newest updates and unique content material on industry-leading AI protection. Study Extra


Chinese language AI startup DeepSeek, recognized for difficult main AI distributors with open-source applied sciences, simply dropped one other bombshell: a brand new open reasoning LLM referred to as DeepSeek-R1.

Primarily based on the lately launched DeepSeek V3 mixture-of-experts mannequin, DeepSeek-R1 matches the efficiency of o1, OpenAI’s frontier reasoning LLM, throughout math, coding and reasoning duties. One of the best half? It does this at a way more tempting price, proving to be 90-95% extra inexpensive than the latter.

The discharge marks a serious leap ahead within the open-source area. It showcases that open fashions are additional closing the hole with closed business fashions within the race to synthetic normal intelligence (AGI). To indicate the prowess of its work, DeepSeek additionally used R1 to distill six Llama and Qwen fashions, taking their efficiency to new ranges. In a single case, the distilled model of Qwen-1.5B outperformed a lot greater fashions, GPT-4o and Claude 3.5 Sonnet, in choose math benchmarks.

These distilled fashions, together with the main R1, have been open-sourced and can be found on Hugging Face under an MIT license.

What does DeepSeek-R1 convey to the desk?

The main target is sharpening on synthetic normal intelligence (AGI), a degree of AI that may carry out mental duties like people. Numerous groups are doubling down on enhancing fashions’ reasoning capabilities. OpenAI made the primary notable transfer within the area with its o1 mannequin, which makes use of a chain-of-thought reasoning course of to sort out an issue. By RL (reinforcement studying, or reward-driven optimization), o1 learns to hone its chain of thought and refine the methods it makes use of — finally studying to acknowledge and proper its errors, or strive new approaches when the present ones aren’t working. 

See also  Beyond benchmarks: How DeepSeek-R1 and o1 perform on real-world tasks

Now, persevering with the work on this path, DeepSeek has launched DeepSeek-R1, which makes use of a mixture of RL and supervised fine-tuning to deal with complicated reasoning duties and match the efficiency of o1. 

When examined, DeepSeek-R1 scored 79.8% on AIME 2024 arithmetic checks and 97.3% on MATH-500. It additionally achieved a 2,029 ranking on Codeforces — higher than 96.3% of human programmers. In distinction, o1-1217 scored 79.2%, 96.4% and 96.6% respectively on these benchmarks. 

It additionally demonstrated robust normal data, with 90.8% accuracy on MMLU, simply behind o1’s 91.8%. 

Efficiency of DeepSeek-R1 vs OpenAI o1 and o1-mini

The coaching pipeline

DeepSeek-R1’s reasoning efficiency marks an enormous win for the Chinese language startup within the US-dominated AI house, particularly as the whole work is open-source, together with how the corporate skilled the entire thing. 

Nevertheless, the work isn’t as easy because it sounds.

Based on the paper describing the analysis, DeepSeek-R1 was developed as an enhanced model of DeepSeek-R1-Zero — a breakthrough mannequin skilled solely from reinforcement studying. 

https://twitter.com/DrJimFan/standing/1881353126210687089

The corporate first used DeepSeek-V3-base as the bottom mannequin, growing its reasoning capabilities with out using supervised information, basically focusing solely on its self-evolution via a pure RL-based trial-and-error course of. Developed intrinsically from the work, this means ensures the mannequin can resolve more and more complicated reasoning duties by leveraging prolonged test-time computation to discover and refine its thought processes in better depth.

“Throughout coaching, DeepSeek-R1-Zero naturally emerged with quite a few highly effective and fascinating reasoning behaviors,” the researchers observe within the paper. “After 1000’s of RL steps, DeepSeek-R1-Zero displays tremendous efficiency on reasoning benchmarks. For example, the move@1 rating on AIME 2024 will increase from 15.6% to 71.0%, and with majority voting, the rating additional improves to 86.7%, matching the efficiency of OpenAI-o1-0912.”

See also  Gartner: GPT-5 is here, but the infrastructure to support true agentic AI isn’t (yet)

Nevertheless, regardless of exhibiting improved efficiency, together with behaviors like reflection and exploration of alternate options, the preliminary mannequin did present some issues, together with poor readability and language mixing. To repair this, the corporate constructed on the work executed for R1-Zero, utilizing a multi-stage method combining each supervised studying and reinforcement studying, and thus got here up with the improved R1 mannequin.

“Particularly, we start by gathering 1000’s of cold-start information to fine-tune the DeepSeek-V3-Base mannequin,” the researchers defined. “Following this, we carry out reasoning-oriented RL like DeepSeek-R1- Zero. Upon nearing convergence within the RL course of, we create new SFT information via rejection sampling on the RL checkpoint, mixed with supervised information from DeepSeek-V3 in domains similar to writing, factual QA, and self-cognition, after which retrain the DeepSeek-V3-Base mannequin. After fine-tuning with the brand new information, the checkpoint undergoes a further RL course of, taking into consideration prompts from all situations. After these steps, we obtained a checkpoint known as DeepSeek-R1, which achieves efficiency on par with OpenAI-o1-1217.”

Way more inexpensive than o1

Along with enhanced efficiency that just about matches OpenAI’s o1 throughout benchmarks, the brand new DeepSeek-R1 can also be very inexpensive. Particularly, the place OpenAI o1 prices $15 per million enter tokens and $60 per million output tokens, DeepSeek Reasoner, which is predicated on the R1 mannequin, costs $0.55 per million enter and $2.19 per million output tokens. 

https://twitter.com/EMostaque/standing/1881310721746804810

The mannequin could be examined as “DeepThink” on the DeepSeek chat platform, which is analogous to ChatGPT. customers can entry the mannequin weights and code repository by way of Hugging Face, below an MIT license, or can go along with the API for direct integration.

See also  How Moonshot AI beat GPT-5 & Claude at a fraction of the cost

Source link
TAGGED: Cost, DeepSeekR1, Learning, match, OpenAI, opensource, Pure, reinforcement
Share This Article
Twitter Email Copy Link Print
Previous Article Linklaters advises Brookfield-owned data centre operator Data4 on its €3.3bn debt raise Linklaters advises Brookfield-owned data centre operator Data4 on its €3.3bn debt raise
Next Article Europe unites to launch its first sovereign edge cloud Europe unites to launch its first sovereign edge cloud
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Your Trusted Source for Accurate and Timely Updates!

Our commitment to accuracy, impartiality, and delivering breaking news as it happens has earned us the trust of a vast audience. Stay ahead with real-time updates on the latest events, trends.
FacebookLike
TwitterFollow
InstagramFollow
YoutubeSubscribe
LinkedInFollow
MediumFollow
- Advertisement -
Ad image

Popular Posts

This new AI technique creates ‘digital twin’ consumers, and it could kill the traditional survey industry

A brand new research paper quietly revealed final week outlines a breakthrough methodology that enables…

October 13, 2025

It’s Qwen’s summer: Qwen3-235B-A22B-Thinking-2507 tops charts

Need smarter insights in your inbox? Join our weekly newsletters to get solely what issues…

July 27, 2025

US Plans AI Chip Curbs on Malaysia, Thailand Over China Concerns

(Bloomberg) -- President Donald Trump’s administration plans to limit shipments of AI chips from the…

July 12, 2025

The role of hyperparameters in fine-tuning AI models

You’ve bought an awesome concept for an AI-based utility. Consider fine-tuning like instructing a pre-trained…

January 11, 2025

JLL bolsters Data Centres team with EMEA appointments

Evaluation by JLL estimates that 742MW of hyperscale self-build is at present beneath development in…

January 17, 2025

You Might Also Like

STL launches Neuralis data centre connectivity suite in the U.S.
AI & Compute

STL launches Neuralis data centre connectivity suite in the U.S.

By saad
What is optical interconnect and why Lightelligence's $10B debut says it matters for AI
AI & Compute

What is optical interconnect and why Lightelligence’s $10B debut says it matters for AI

By saad
IBM launches AI platform Bob to regulate SDLC costs
AI & Compute

IBM launches AI platform Bob to regulate SDLC costs

By saad
The evolution of encoders: From simple models to multimodal AI
AI & Compute

The evolution of encoders: From simple models to multimodal AI

By saad

About Us

Data Center News is your dedicated source for data center infrastructure, AI compute, cloud, and industry news.

Top Categories

  • AI & Compute
  • Cloud Computing
  • Power & Cooling
  • Colocation
  • Security
  • Infrastructure
  • Sustainability
  • Industry News

Useful Links

  • Home
  • Contact
  • Privacy Policy
  • Terms & Conditions

Find Us on Socials

© 2026 Data Center News. All Rights Reserved.

© 2026 Data Center News. All Rights Reserved.
Welcome Back!

Sign in to your account

Lost your password?
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.
You can revoke your consent any time using the Revoke consent button.