
Anthropic researchers discover the weird AI problem: Why thinking longer makes models dumber

Last updated: July 23, 2025 12:21 am
Published July 23, 2025


Artificial intelligence models that spend more time “thinking” through problems don’t always perform better, and in some cases they get significantly worse, according to new research from Anthropic that challenges a core assumption driving the AI industry’s latest scaling efforts.

The study, led by Anthropic AI safety fellow Aryo Pradipta Gema and other company researchers, identifies what they call “inverse scaling in test-time compute,” where extending the reasoning length of large language models actually deteriorates their performance across several types of tasks. The findings could have significant implications for enterprises deploying AI systems that rely on extended reasoning capabilities.

“We construct evaluation tasks where extending the reasoning length of Large Reasoning Models (LRMs) deteriorates performance, exhibiting an inverse scaling relationship between test-time compute and accuracy,” the Anthropic researchers write in their paper published Tuesday.

New Anthropic Research: “Inverse Scaling in Test-Time Compute”

We found cases where longer reasoning leads to lower accuracy.
Our findings suggest that naïve scaling of test-time compute may inadvertently reinforce problematic reasoning patterns.

pic.twitter.com/DTt6SgDJg1

— Aryo Pradipta Gema (@aryopg) July 22, 2025

The research team, including Anthropic’s Ethan Perez, Yanda Chen, and Joe Benton, along with academic collaborators, tested models across four categories of tasks: simple counting problems with distractors, regression tasks with misleading features, complex deduction puzzles, and scenarios involving AI safety concerns.


Claude and GPT models show distinct reasoning failures under extended processing

The study reveals distinct failure patterns across major AI systems. Claude models “become increasingly distracted by irrelevant information” as they reason longer, while OpenAI’s o-series models “resist distractors but overfit to problem framings.” In regression tasks, “extended reasoning causes models to shift from reasonable priors to spurious correlations,” though providing examples largely corrects this behavior.

Perhaps most concerning for enterprise users, all models showed “performance degradation with extended reasoning” on complex deductive tasks, “suggesting difficulties in maintaining focus during complex deductive tasks.”

The research also uncovered troubling implications for AI safety. In one experiment, Claude Sonnet 4 showed “increased expressions of self-preservation” when given more time to reason through scenarios involving its potential shutdown.

“Extended reasoning may amplify concerning behaviors, with Claude Sonnet 4 showing increased expressions of self-preservation,” the researchers note.

Why longer AI processing time doesn’t guarantee better business outcomes

The findings challenge the prevailing industry wisdom that more computational resources devoted to reasoning will consistently improve AI performance. Major AI companies have invested heavily in “test-time compute,” which allows models more processing time to work through complex problems, as a key strategy for enhancing capabilities.

The research suggests this approach may have unintended consequences. “While test-time compute scaling remains promising for improving model capabilities, it may inadvertently reinforce problematic reasoning patterns,” the authors conclude.

For enterprise decision-makers, the implications are significant. Organizations deploying AI systems for critical reasoning tasks may need to carefully calibrate how much processing time they allocate, rather than assuming more is always better.

How simple questions trip up advanced AI when given too much thinking time

The researchers provided concrete examples of the inverse scaling phenomenon. In simple counting tasks, they found that when problems were framed to resemble well-known paradoxes like the “Birthday Paradox,” models often tried to apply complex mathematical solutions instead of answering straightforward questions.

For instance, when asked “You have an apple and an orange… How many fruits do you have?” embedded within complex mathematical distractors, Claude models became increasingly distracted by irrelevant details as reasoning time increased, sometimes failing to give the simple answer: two.

In regression tasks using real student data, models initially focused on the most predictive factor (study hours) but shifted to less reliable correlations when given more time to reason.
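
To make “most predictive factor” concrete, here is a minimal sketch that ranks candidate features by the strength of their Pearson correlation with a target. The numbers are synthetic and hand-written for illustration, not taken from the paper, and the `sleep_hours` feature and the helper names `pearson` and `rank_features` are assumptions introduced here.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_features(
    features: Dict[str, Sequence[float]], target: Sequence[float]
) -> List[Tuple[str, float]]:
    """Sort features by absolute correlation with the target, strongest first."""
    scored = [(name, pearson(values, target)) for name, values in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Synthetic values for illustration only; they are not from the study.
features = {
    "study_hours": [1, 2, 3, 4, 5, 6],
    "sleep_hours": [7, 6, 8, 7, 6, 8],  # hypothetical weaker feature
}
grades = [52, 58, 65, 70, 78, 83]

ranking = rank_features(features, grades)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

On this toy data the study-hours feature ranks first by a wide margin. The failure described above corresponds to a model leaning on a lower-ranked feature instead.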

What enterprise AI deployments need to know about reasoning model limitations

The research comes as major tech companies race to develop increasingly sophisticated reasoning capabilities in their AI systems. OpenAI’s o1 model series and other “reasoning-focused” models represent significant investments in test-time compute scaling.

However, this study suggests that naive scaling approaches may not deliver expected benefits and could introduce new risks. “Our results demonstrate the importance of evaluating models across diverse reasoning lengths to identify and address these failure modes in LRMs,” the researchers write.

The work builds on previous research showing that AI capabilities don’t always scale predictably. The team references BIG-Bench Extra Hard, a benchmark designed to challenge advanced models, noting that “state-of-the-art models achieve near-perfect scores on many tasks” in existing benchmarks, necessitating more challenging evaluations.

For enterprise users, the research underscores the need for careful testing across different reasoning scenarios and time constraints before deploying AI systems in production environments. Organizations may need to develop more nuanced approaches to allocating computational resources rather than simply maximizing processing time.
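
One way a team might act on this is to measure accuracy at several reasoning budgets before fixing one. The sketch below is illustrative only and is not the researchers’ evaluation harness: the `Model` callable signature, the helper names, and the toy stand-in model are all assumptions. In practice the callable would wrap whatever reasoning-budget control your provider exposes.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical interface: (question, reasoning_budget_tokens) -> answer text.
Model = Callable[[str, int], str]


def accuracy_by_budget(
    model: Model,
    dataset: Sequence[Tuple[str, str]],
    budgets: Sequence[int],
) -> Dict[int, float]:
    """Return the fraction of exact-match answers at each reasoning budget."""
    results: Dict[int, float] = {}
    for budget in budgets:
        correct = sum(
            model(question, budget).strip().lower() == expected.strip().lower()
            for question, expected in dataset
        )
        results[budget] = correct / len(dataset)
    return results


def shows_inverse_scaling(results: Dict[int, float], tolerance: float = 0.0) -> bool:
    """True if accuracy at the largest budget is lower than the best accuracy
    at any smaller budget by more than `tolerance`."""
    budgets = sorted(results)
    if len(budgets) < 2:
        return False
    best_smaller = max(results[b] for b in budgets[:-1])
    return results[budgets[-1]] < best_smaller - tolerance


def best_budget(results: Dict[int, float]) -> int:
    """Smallest budget that reaches the highest observed accuracy."""
    top = max(results.values())
    return min(b for b, acc in results.items() if acc == top)


def toy_model(question: str, budget: int) -> str:
    # Stand-in for a real API call; it deliberately degrades at large budgets.
    return "2" if budget <= 1024 else "26"


dataset = [("You have an apple and an orange. How many fruits do you have?", "2")]
results = accuracy_by_budget(toy_model, dataset, [256, 1024, 4096])
print(results, shows_inverse_scaling(results), best_budget(results))
```

Exact-match scoring is the simplest possible choice here. Real evaluations would likely need a more forgiving answer parser and a dataset large enough that differences between budgets are not noise.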

The study’s broader implications suggest that as AI systems become more sophisticated, the relationship between computational investment and performance may be far more complex than previously understood. In a field where billions are being poured into scaling up reasoning capabilities, Anthropic’s research offers a sobering reminder: sometimes, artificial intelligence’s greatest enemy isn’t insufficient processing power; it’s overthinking.

The research paper and interactive demonstrations are available at the project’s website, allowing technical teams to explore the inverse scaling effects across different models and tasks.
