AI

That ‘cheap’ open-source AI model is actually burning through your compute budget

Last updated: August 15, 2025 3:11 am
Published August 15, 2025


A new study has revealed that open-source artificial intelligence models consume significantly more computing resources than their closed-source rivals when performing identical tasks, potentially undermining their cost advantages and reshaping how enterprises evaluate AI deployment strategies.

The research, conducted by AI firm Nous Research, found that open-weight models use between 1.5 and 4 times more tokens — the basic units of AI computation — than closed models like those from OpenAI and Anthropic. For simple knowledge questions, the gap widened dramatically, with some open models using up to 10 times more tokens.


“Open weight models use 1.5–4× more tokens than closed ones (up to 10× for simple knowledge questions), making them sometimes more expensive per query despite lower per-token costs,” the researchers wrote in their report published Wednesday.

The findings challenge a prevailing assumption in the AI industry that open-source models offer clear economic advantages over proprietary alternatives. While open-source models typically cost less per token to run, the study suggests this advantage can be “easily offset if they require more tokens to reason about a given problem.”




The real cost of AI: Why ‘cheaper’ models may break your budget

The research examined 19 different AI models across three categories of tasks: basic knowledge questions, mathematical problems, and logic puzzles. The team measured “token efficiency” — how many computational units models use relative to the complexity of their solutions — a metric that has received little systematic study despite its significant cost implications.

“Token efficiency is a critical metric for several practical reasons,” the researchers noted. “While hosting open weight models may be cheaper, this cost advantage could be easily offset if they require more tokens to reason about a given problem.”
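The arithmetic behind that offset is simple to sketch. The prices and token counts below are hypothetical illustrations, not figures from the Nous Research report:

```python
def cost_per_query(tokens_used: int, price_per_million_tokens: float) -> float:
    """Effective dollar cost of one query: tokens consumed times unit price."""
    return tokens_used * price_per_million_tokens / 1_000_000

# Closed model: pricier per token, but concise in its output.
closed = cost_per_query(tokens_used=300, price_per_million_tokens=10.00)

# Open model: 5x cheaper per token, but emits 10x the tokens — within the
# range the study observed for simple knowledge questions.
open_weight = cost_per_query(tokens_used=3_000, price_per_million_tokens=2.00)

print(f"closed: ${closed:.4f}, open: ${open_weight:.4f}")
# The nominally "cheap" open model ends up costing twice as much per query.
```

Lower per-token pricing only wins if the token-usage ratio stays below the price ratio — here 10× the tokens swamps a 5× price discount.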

Open-source AI models use up to 12 times more computational resources than the most efficient closed models for basic knowledge questions. (Credit: Nous Research)

The inefficiency is particularly pronounced for Large Reasoning Models (LRMs), which use extended “chains of thought” to solve complex problems. These models, designed to think through problems step by step, can consume thousands of tokens pondering simple questions that should require minimal computation.

For basic knowledge questions like “What is the capital of Australia?” the study found that reasoning models spend “hundreds of tokens pondering simple knowledge questions” that could be answered in a single word.

Which AI models actually deliver bang for your buck

The analysis revealed stark differences between model providers. OpenAI’s models, particularly its o4-mini and newly released open-source gpt-oss variants, demonstrated exceptional token efficiency, especially for mathematical problems. The study found OpenAI models “stand out for extreme token efficiency in math problems,” using up to three times fewer tokens than other commercial models.


Among open-source options, Nvidia’s llama-3.3-nemotron-super-49b-v1 emerged as “the most token efficient open weight model across all domains,” while newer models from companies like Magistral showed “exceptionally high token usage” as outliers.

The efficiency gap varied significantly by task type. While open models used roughly twice as many tokens for mathematical and logic problems, the difference ballooned for simple knowledge questions where extended reasoning should be unnecessary.

OpenAI’s latest models achieve the lowest costs for simple questions, while some open-source alternatives can cost significantly more despite lower per-token pricing. (Credit: Nous Research)

What enterprise leaders need to know about AI computing costs

The findings have immediate implications for enterprise AI adoption, where computing costs can scale rapidly with usage. Companies evaluating AI models often focus on accuracy benchmarks and per-token pricing, but may overlook the total computational requirements for real-world tasks.

“The better token efficiency of closed weight models often compensates for the higher API pricing of those models,” the researchers found when analyzing total inference costs.

The study also revealed that closed-source model providers appear to be actively optimizing for efficiency. “Closed weight models have been iteratively optimized to use fewer tokens to reduce inference cost,” while open-source models have “increased their token usage for newer versions, possibly reflecting a priority toward better reasoning performance.”

The computational overhead varies dramatically between AI providers, with some models using over 1,000 tokens for internal reasoning on simple tasks. (Credit: Nous Research)

How researchers cracked the code on AI efficiency measurement

The research team faced unique challenges in measuring efficiency across different model architectures. Many closed-source models do not reveal their raw reasoning processes, instead providing compressed summaries of their internal computations to prevent competitors from copying their techniques.

To address this, the researchers used completion tokens — the total computational units billed for each query — as a proxy for reasoning effort. They discovered that “most recent closed source models will not share their raw reasoning traces” and instead “use smaller language models to transcribe the chain of thought into summaries or compressed representations.”
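In practice, that proxy amounts to tallying the billed completion tokens across a batch of queries. A minimal sketch, assuming responses carry an OpenAI-style `usage` block (the field names and numbers here are illustrative, not from the study):

```python
def total_completion_tokens(responses: list[dict]) -> int:
    """Sum billed completion tokens — a proxy for reasoning effort when
    raw chain-of-thought traces are hidden by the provider."""
    return sum(r["usage"]["completion_tokens"] for r in responses)

# Stand-in response metadata: short prompts, but large completion counts
# because hidden internal reasoning is billed as completion tokens.
responses = [
    {"usage": {"prompt_tokens": 12, "completion_tokens": 840}},
    {"usage": {"prompt_tokens": 15, "completion_tokens": 1210}},
]

print(total_completion_tokens(responses))  # 2050
```

Because hidden reasoning is still billed, completion-token counts capture the cost of a model's thinking even when the trace itself is summarized or withheld.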


The study’s methodology included testing with modified versions of well-known problems to minimize the influence of memorized solutions, such as altering variables in mathematical competition problems from the American Invitational Mathematics Examination (AIME).
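One way to realize that memorization guard is to regenerate a known problem template with fresh numbers, so a model must actually compute rather than recall the answer. The template below is an illustrative toy, not an actual AIME problem:

```python
import random

TEMPLATE = "Find the sum of all positive integers n <= {bound} divisible by {d}."

def perturbed_problem(rng: random.Random) -> dict:
    """Generate a variant of the template with fresh parameters and its
    ground-truth answer, so memorized solutions don't apply."""
    bound = rng.randrange(50, 500)
    d = rng.randrange(3, 12)
    answer = sum(n for n in range(1, bound + 1) if n % d == 0)
    return {
        "question": TEMPLATE.format(bound=bound, d=d),
        "bound": bound,
        "d": d,
        "answer": answer,
    }

# A seeded generator makes each benchmark variant reproducible.
problem = perturbed_problem(random.Random(0))
```

Keeping the ground-truth answer alongside each generated variant lets the benchmark score models without ever reusing the published problem instance.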

Different AI models show varying relationships between computation and output, with some providers compressing reasoning traces while others provide full details. (Credit: Nous Research)

The future of AI efficiency: What’s coming next

The researchers suggest that token efficiency should become a primary optimization target alongside accuracy for future model development. “A more densified CoT will also allow for more efficient context usage and may counter context degradation during challenging reasoning tasks,” they wrote.

The release of OpenAI’s open-source gpt-oss models, which demonstrate state-of-the-art efficiency with “freely accessible CoT,” could serve as a reference point for optimizing other open-source models.

The full evaluation dataset and analysis code are available on GitHub, allowing other researchers to validate and extend the findings. As the AI industry races toward more powerful reasoning capabilities, this study suggests the real competition may not be about who can build the smartest AI — but who can build the most efficient one.

After all, in a world where every token counts, the most wasteful models may find themselves priced out of the market, no matter how well they can think.

