Samsung benchmarks real productivity of enterprise AI models

Last updated: September 25, 2025 3:14 pm
Published September 25, 2025

Samsung is overcoming the limitations of current benchmarks to better assess the real-world productivity of AI models in enterprise settings. The new system, developed by Samsung Research and named TRUEBench, aims to address the growing disparity between theoretical AI performance and its actual utility in the workplace.

As businesses worldwide accelerate their adoption of large language models (LLMs) to improve their operations, a challenge has emerged: how to accurately gauge their effectiveness. Many existing benchmarks focus on academic or general-knowledge tests, often limited to English and simple question-and-answer formats. This has created a gap that leaves enterprises without a reliable method for evaluating how an AI model will perform on complex, multilingual, and context-rich business tasks.

Samsung’s TRUEBench, short for Trustworthy Real-world Usage Evaluation Benchmark, has been developed to fill this void. It provides a comprehensive suite of metrics that assesses LLMs based on scenarios and tasks directly relevant to real-world corporate environments. The benchmark draws on Samsung’s own extensive internal enterprise use of AI models, ensuring the evaluation criteria are grounded in genuine workplace demands.

The framework evaluates common enterprise functions such as creating content, analysing data, summarising lengthy documents, and translating materials. These are broken down into 10 distinct categories and 46 sub-categories, providing a granular view of an AI’s productivity capabilities.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity.”

To tackle the limitations of older benchmarks, TRUEBench is built on a foundation of 2,485 diverse test sets spanning 12 different languages and supporting cross-linguistic scenarios. This multilingual approach is critical for global companies, where information flows across different regions. The test materials themselves reflect the variety of workplace requests, ranging from brief instructions of just eight characters to the complex analysis of documents exceeding 20,000 characters.

Samsung recognised that in a real business context, a user’s full intent isn’t always explicitly stated in their initial prompt. The benchmark is therefore designed to assess an AI model’s ability to understand and fulfil these implicit enterprise needs, moving beyond simple accuracy to a more nuanced measure of helpfulness and relevance.

To achieve this, Samsung Research developed a collaborative process between human experts and AI to create the productivity scoring criteria. First, human annotators establish the evaluation standards for a given task. An AI then reviews these standards, checking for potential errors, internal contradictions, or unnecessary constraints that might not reflect a realistic user expectation. Following the AI’s feedback, the human annotators refine the criteria. This iterative loop ensures the final evaluation standards are precise and reflect a high-quality outcome.
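Samsung has not published code for this loop, so the following is only a minimal sketch of the review-and-refine cycle described above. It assumes two hypothetical callables: `ai_review`, standing in for the AI reviewer (returns a list of issues), and `human_refine`, standing in for the annotators (returns revised criteria). Criteria are modelled as a plain list of conditions, and the `max_rounds` cap is an illustrative assumption.

```python
from typing import Callable, List

# Hypothetical model: a task's evaluation criteria are a list of conditions.
Criteria = List[str]
Reviewer = Callable[[Criteria], List[str]]          # returns issues found
Refiner = Callable[[Criteria, List[str]], Criteria]  # returns revised criteria


def refine_criteria(draft: Criteria, ai_review: Reviewer,
                    human_refine: Refiner, max_rounds: int = 3) -> Criteria:
    """Alternate AI review and human refinement until the reviewer
    reports no issues or the (assumed) round budget is spent."""
    criteria = list(draft)
    for _ in range(max_rounds):
        # Reviewer looks for errors, contradictions, or needless constraints.
        issues = ai_review(criteria)
        if not issues:
            break
        # Annotators revise the criteria in light of the feedback.
        criteria = human_refine(criteria, issues)
    return criteria


def toy_review(criteria: Criteria) -> List[str]:
    # Toy stand-in for the AI reviewer: flag overly rigid length limits.
    return [c for c in criteria if "exactly" in c]


def toy_refine(criteria: Criteria, issues: List[str]) -> Criteria:
    # Toy stand-in for the human annotators: drop the flagged conditions.
    return [c for c in criteria if c not in issues]


final = refine_criteria(
    ["Summary is in Korean", "Summary is exactly 500 words",
     "Mentions the Q3 deadline"],
    toy_review, toy_refine)
print(final)  # ['Summary is in Korean', 'Mentions the Q3 deadline']
```

In practice both roles would involve a model and people rather than string checks; the point of the sketch is only the control flow, which stops as soon as a review round finds nothing left to fix.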

This cross-verified process underpins an automated evaluation system that scores the performance of LLMs. By using AI to apply these refined criteria, the system minimises the subjective bias that can occur with human-only scoring, ensuring consistency and reliability across all tests. TRUEBench also employs a strict scoring model in which an AI model must satisfy every condition associated with a test to receive a passing mark. This all-or-nothing treatment of individual conditions allows a more detailed and exacting assessment of AI models’ performance across different business tasks.
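As an illustration of the all-or-nothing rule, here is a minimal sketch. It assumes a hypothetical `judge` callable that decides whether a response meets one condition (in TRUEBench this role is played by an AI evaluator applying the refined criteria), and it assumes per-category pass rates as the aggregation. The article does not say how Samsung aggregates scores, so the names `BenchItem`, `passes`, and `pass_rates` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchItem:
    category: str          # hypothetical label, e.g. "summarisation"
    conditions: List[str]  # every condition must be met to pass


# Hypothetical judge: True if the response satisfies one condition.
Judge = Callable[[str, str], bool]


def passes(item: BenchItem, response: str, judge: Judge) -> bool:
    # All-or-nothing: a single failed condition fails the whole test.
    return all(judge(response, cond) for cond in item.conditions)


def pass_rates(items: List[BenchItem], responses: List[str],
               judge: Judge) -> Dict[str, float]:
    """Share of tests passed per category (an assumed aggregation)."""
    totals: Dict[str, int] = {}
    passed: Dict[str, int] = {}
    for item, resp in zip(items, responses):
        totals[item.category] = totals.get(item.category, 0) + 1
        if passes(item, resp, judge):
            passed[item.category] = passed.get(item.category, 0) + 1
    return {cat: passed.get(cat, 0) / n for cat, n in totals.items()}


def keyword_judge(response: str, condition: str) -> bool:
    # Toy stand-in for the AI judge: each condition is a required keyword.
    return condition.lower() in response.lower()


items = [BenchItem("translation", ["bonjour"]),
         BenchItem("summarisation", ["revenue", "Q3"]),
         BenchItem("summarisation", ["revenue", "Q3"])]
responses = ["Bonjour tout le monde", "Revenue rose in Q3", "Revenue rose"]
print(pass_rates(items, responses, keyword_judge))
# {'translation': 1.0, 'summarisation': 0.5}
```

The third response meets one of its two conditions but still earns nothing for that test, which is what distinguishes this strict model from partial-credit scoring.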

To boost transparency and encourage wider adoption, Samsung has made TRUEBench’s data samples and leaderboards publicly available on the open-source platform Hugging Face. This allows developers, researchers, and enterprises to directly compare the productivity performance of up to five different AI models at once. The platform provides a clear, at-a-glance overview of how various AI models stack up against each other on practical tasks.

At the time of writing, these are the top 20 models by overall score on Samsung’s AI benchmark:

Current top 20 models by overall ranking based on Samsung’s AI benchmark that assesses the real-world productivity of AI models in enterprise settings.

The full published data also includes the average length of the AI-generated responses. This allows a side-by-side comparison of not only performance but also efficiency, a key consideration for businesses weighing operational costs and speed.
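One way to read score and response length together is to keep only the models that no other model beats on both axes (a Pareto front). This is not a method described by Samsung; it is a hypothetical sketch, and the model names and numbers below are invented placeholders rather than TRUEBench results. Average length is treated as "shorter is cheaper", in whatever unit the leaderboard reports.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    model: str
    score: float       # overall score (higher is better)
    avg_length: float  # average response length (shorter is cheaper)


def pareto_front(entries: List[Entry]) -> List[str]:
    """Return models that no other model matches or beats on both
    score and length while being strictly better on at least one."""
    front = []
    for e in entries:
        dominated = any(
            o.score >= e.score and o.avg_length <= e.avg_length
            and (o.score > e.score or o.avg_length < e.avg_length)
            for o in entries
        )
        if not dominated:
            front.append(e.model)
    return front


# Placeholder values for illustration only; not real leaderboard data.
entries = [Entry("model-a", 70.0, 1200.0),
           Entry("model-b", 68.0, 600.0),
           Entry("model-c", 65.0, 900.0)]
print(pareto_front(entries))  # ['model-a', 'model-b']
```

Here "model-c" drops out because "model-b" scores higher with shorter answers, while the choice between the remaining two depends on how a business weighs a few points of score against response cost.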

With the launch of TRUEBench, Samsung isn’t merely releasing another tool; it is aiming to change how the industry thinks about AI performance. By moving the goalposts from abstract knowledge to tangible productivity, Samsung’s benchmark could help organisations make better decisions about which enterprise AI models to integrate into their workflows, and bridge the gap between an AI’s potential and its proven value.