Google DeepMind unveils ‘superhuman’ AI system that excels in fact-checking, saving costs and improving accuracy

Last updated: March 30, 2024 1:08 am
Published March 30, 2024

A new study from Google’s DeepMind research unit has found that an artificial intelligence system can outperform human fact-checkers when evaluating the accuracy of information generated by large language models.

The paper, titled “Long-form factuality in large language models” and published on the pre-print server arXiv, introduces a method called Search-Augmented Factuality Evaluator (SAFE). SAFE uses a large language model to break down generated text into individual facts, and then uses Google Search results to determine the accuracy of each claim.

“SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results,” the authors explained.
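
The two stages the authors describe (split a response into facts, then check each fact against search results) can be sketched roughly as below. This is an illustrative toy, not DeepMind's code: every name here (`split_into_facts`, `rate_fact`, `fake_search`, `FactRating`) is hypothetical, the sentence splitter stands in for the LLM, the word-overlap check stands in for the multi-step reasoning, and a canned list of snippets stands in for Google Search so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FactRating:
    fact: str
    supported: bool


def split_into_facts(response: str) -> List[str]:
    """Stand-in for the LLM step: naively treat each sentence as one fact."""
    return [s.strip() for s in response.split(".") if s.strip()]


def rate_fact(fact: str, search: Callable[[str], List[str]]) -> FactRating:
    """Stand-in for the reasoning step: a fact counts as supported when
    every word in it appears in at least one returned snippet."""
    words = set(fact.lower().split())
    for snippet in search(fact):
        if words <= set(snippet.lower().split()):
            return FactRating(fact, True)
    return FactRating(fact, False)


def evaluate_response(
    response: str, search: Callable[[str], List[str]]
) -> List[FactRating]:
    """Split the response into facts, then rate each fact independently."""
    return [rate_fact(fact, search) for fact in split_into_facts(response)]


# Hypothetical canned "search results"; a real system would query a search API.
_SNIPPETS = [
    "the eiffel tower is in paris france",
    "water boils at 100 degrees celsius at sea level",
]


def fake_search(query: str) -> List[str]:
    # Ignores the query and returns every snippet; enough for a toy example.
    return _SNIPPETS


ratings = evaluate_response(
    "The Eiffel Tower is in Paris. The Eiffel Tower is in Rome.", fake_search
)
for rating in ratings:
    print(rating.fact, "->", "supported" if rating.supported else "not supported")
```

Passing the search function in as a parameter keeps the sketch testable without network access; swapping in a real LLM and a real search backend would change the two stand-in functions but not the overall shape.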

‘Superhuman’ performance sparks debate

The researchers pitted SAFE against human annotators on a dataset of roughly 16,000 facts, finding that SAFE’s assessments matched the human ratings 72% of the time. Even more notably, in a sample of 100 disagreements between SAFE and the human raters, SAFE’s judgment was found to be correct in 76% of cases.
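
To put those percentages into rough absolute numbers, here is a back-of-the-envelope calculation. It assumes exactly 16,000 facts (the article only says "roughly"), so the counts are approximations, and the variable names are purely illustrative.

```python
# Approximate figures taken from the article; 16,000 is a rounded value.
total_facts = 16_000
agreement_pct = 72

# Facts where SAFE and the human annotators gave the same rating.
agreed = total_facts * agreement_pct // 100
# Facts where they gave different ratings.
disagreed = total_facts - agreed

# Only a sample of the disagreements was examined more closely.
sample_size = 100
safe_correct_pct = 76
safe_correct_in_sample = sample_size * safe_correct_pct // 100
# Remaining sampled cases, where SAFE's judgment was not found to be correct.
other_in_sample = sample_size - safe_correct_in_sample

print(f"agreed: ~{agreed}, disagreed: ~{disagreed}")
print(f"sampled disagreements: SAFE correct in {safe_correct_in_sample}, "
      f"not in {other_in_sample}")
```

The 76% figure applies only to the 100 sampled disagreements, a small fraction of the several thousand disagreements overall.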

While the paper asserts that “LLM agents can achieve superhuman rating performance,” some experts are questioning what “superhuman” really means here.

On a quick read I can’t figure out much about the human subjects, but it looks like superhuman means better than an underpaid crowd worker, rather a true human fact checker? That makes the characterization misleading. (Like saying that 1985 chess software was superhuman).…

— Gary Marcus (@GaryMarcus) March 28, 2024

Gary Marcus, a well-known AI researcher and frequent critic of overhyped claims, suggested on Twitter that in this case, “superhuman” may simply mean “better than an underpaid crowd worker, rather a true human fact checker.”

“That makes the characterization misleading,” he said. “Like saying that 1985 chess software was superhuman.”

Marcus raises a valid point. To truly demonstrate superhuman performance, SAFE would need to be benchmarked against expert human fact-checkers, not just crowdsourced workers. The specific details of the human raters, such as their qualifications, compensation, and fact-checking process, are crucial for properly contextualizing the results.

Cost savings and benchmarking top models

One clear advantage of SAFE is cost: the researchers found that using the AI system was about 20 times cheaper than using human fact-checkers. As the volume of information generated by language models continues to explode, having an economical and scalable way to verify claims will be increasingly vital.

The DeepMind team used SAFE to evaluate the factual accuracy of 13 top language models across four families (Gemini, GPT, Claude, and PaLM-2) on a new benchmark called LongFact. Their results indicate that larger models generally produced fewer factual errors.

However, even the best-performing models generated a significant number of false claims. This underscores the risks of over-relying on language models that can fluently express inaccurate information. Automatic fact-checking tools like SAFE could play a key role in mitigating those risks.

Transparency and human baselines are crucial

While the SAFE code and LongFact dataset have been open-sourced on GitHub, allowing other researchers to scrutinize and build upon the work, more transparency is still needed around the human baselines used in the study. Understanding the specifics of the crowdworkers’ background and process is essential for assessing SAFE’s capabilities in proper context.

As the tech giants race to develop ever more powerful language models for applications ranging from search to virtual assistants, the ability to automatically fact-check the outputs of these systems could prove pivotal. Tools like SAFE represent an important step towards building a new layer of trust and accountability.

However, it’s crucial that the development of such consequential technologies happens in the open, with input from a broad range of stakeholders beyond the walls of any one company. Rigorous, transparent benchmarking against human experts, not just crowdworkers, will be essential to measure true progress. Only then can we gauge the real-world impact of automated fact-checking on the fight against misinformation.


