Samsung benchmarks real productivity of enterprise AI models

Last updated: September 25, 2025 3:14 pm
Published September 25, 2025

Samsung is overcoming limitations of existing benchmarks to better assess the real-world productivity of AI models in enterprise settings. The new system, developed by Samsung Research and named TRUEBench, aims to address the growing disparity between theoretical AI performance and its actual utility in the workplace.

As businesses worldwide accelerate their adoption of large language models (LLMs) to improve their operations, a challenge has emerged: how to accurately gauge their effectiveness. Many existing benchmarks focus on academic or general knowledge tests, often limited to English and simple question-and-answer formats. This has created a gap that leaves enterprises without a reliable method for evaluating how an AI model will perform on complex, multilingual, and context-rich business tasks.

Samsung’s TRUEBench, short for Trustworthy Real-world Usage Evaluation Benchmark, has been developed to fill this void. It provides a comprehensive suite of metrics that assesses LLMs based on scenarios and tasks directly relevant to real-world corporate environments. The benchmark draws upon Samsung’s own extensive internal enterprise use of AI models, ensuring the evaluation criteria are grounded in genuine workplace demands.

The framework evaluates common enterprise functions such as creating content, analysing data, summarising lengthy documents, and translating materials. These are broken down into 10 distinct categories and 46 sub-categories, providing a granular view of an AI’s productivity capabilities.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity.”

To tackle the limitations of older benchmarks, TRUEBench is built upon a foundation of 2,485 diverse test sets spanning 12 different languages and supporting cross-linguistic scenarios. This multilingual approach is crucial for global corporations where information flows across different regions. The test materials themselves reflect the variety of workplace requests, ranging from brief instructions of just eight characters to the complex analysis of documents exceeding 20,000 characters.

Samsung recognised that in a real business context, a user’s full intent isn’t always explicitly stated in their initial prompt. The benchmark is therefore designed to assess an AI model’s ability to understand and fulfil these implicit enterprise needs, moving beyond simple accuracy to a more nuanced measure of helpfulness and relevance.

To achieve this, Samsung Research developed a unique collaborative process between human experts and AI to create the productivity scoring criteria. Initially, human annotators establish the evaluation standards for a given task. An AI then reviews these standards, checking for potential errors, internal contradictions, or unnecessary constraints that might not reflect a realistic user expectation. Following the AI’s feedback, the human annotators refine the criteria. This iterative loop ensures the final evaluation standards are precise and reflective of a high-quality outcome.
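
The review-and-refine loop described above can be sketched in a few lines of Python. This is an illustration only, not Samsung's implementation: the function names, the `max_rounds` cap, and the toy reviewer and reviser are all assumptions made so the example runs on its own.

```python
from typing import Callable

Criteria = list[str]
Reviewer = Callable[[Criteria], list[str]]           # returns the criteria it flags
Reviser = Callable[[Criteria, list[str]], Criteria]  # returns revised criteria


def refine_criteria(draft: Criteria, ai_review: Reviewer, human_revise: Reviser,
                    max_rounds: int = 3) -> Criteria:
    """Alternate AI review and human revision until nothing is flagged.

    max_rounds is an assumed safety cap; the article does not state one.
    """
    criteria = list(draft)
    for _ in range(max_rounds):
        issues = ai_review(criteria)
        if not issues:
            break  # the reviewer found nothing left to fix
        criteria = human_revise(criteria, issues)
    return criteria


# Toy stand-ins (hypothetical): the "AI" flags overly rigid criteria,
# and the "human" simply drops whatever was flagged.
def toy_review(criteria: Criteria) -> list[str]:
    return [c for c in criteria if "exactly" in c]


def toy_revise(criteria: Criteria, issues: list[str]) -> Criteria:
    return [c for c in criteria if c not in issues]


final = refine_criteria(
    ["Summary covers all action items", "Summary is exactly 50 words"],
    toy_review,
    toy_revise,
)
```

In practice the reviewer would be a call to a language model and the reviser a human annotation step; the loop structure is the only part taken from the description.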

This cross-verified process delivers an automated evaluation system that scores the performance of LLMs. By using AI to apply these refined criteria, the system minimises the subjective bias that can occur with human-only scoring, ensuring consistency and reliability across all tests. TRUEBench also employs a strict scoring model where an AI model must satisfy every condition associated with a test to receive a passing mark. This all-or-nothing approach for individual conditions enables a more detailed and exacting assessment of the performance of AI models across different enterprise tasks.
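
To make the all-or-nothing rule concrete, here is a minimal Python sketch. The data structure, the category names, and the handling of empty condition lists are assumptions for illustration; the article does not describe TRUEBench's internal data format.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkItem:
    """One test with per-condition verdicts (hypothetical structure)."""
    category: str
    condition_results: list[bool]  # one verdict per condition, e.g. from an AI judge


def passes(item: BenchmarkItem) -> bool:
    # All-or-nothing: a single failed condition fails the whole test.
    # Assumption: an item with no conditions counts as failed, not vacuously passed.
    return bool(item.condition_results) and all(item.condition_results)


def pass_rate_by_category(items: list[BenchmarkItem]) -> dict[str, float]:
    """Fraction of tests passed in each category."""
    totals: dict[str, int] = {}
    passed: dict[str, int] = {}
    for item in items:
        totals[item.category] = totals.get(item.category, 0) + 1
        passed[item.category] = passed.get(item.category, 0) + int(passes(item))
    return {category: passed[category] / totals[category] for category in totals}


items = [
    BenchmarkItem("summarisation", [True, True, True]),
    BenchmarkItem("summarisation", [True, False, True]),  # one miss fails the test
    BenchmarkItem("translation", [True, True]),
]
rates = pass_rate_by_category(items)
```

Under this rule the second summarisation item scores zero even though it met two of its three conditions, which is what makes the scheme stricter than averaging per-condition results.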

To boost transparency and encourage wider adoption, Samsung has made TRUEBench’s data samples and leaderboards publicly available on the global open-source platform Hugging Face. This allows developers, researchers, and enterprises to directly compare the productivity performance of up to five different AI models simultaneously. The platform provides a clear, at-a-glance overview of how various AIs stack up against each other on practical tasks.

As of writing, here are the top 20 models by overall ranking based on Samsung’s AI benchmark:

[Image: Current top 20 models by overall ranking based on Samsung’s AI benchmark that assesses the real-world productivity of AI models in enterprise settings.]

The full published data also includes the average length of the AI-generated responses. This allows for a simultaneous comparison of not only performance but also efficiency, a key consideration for businesses weighing operational costs and speed.
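
One simple way to read score and response length together is to rank by score and use length as a tie-breaker. The Python sketch below assumes that approach; the row fields, the character unit, and the figures are placeholders, not values from the TRUEBench leaderboard.

```python
from typing import NamedTuple


class Row(NamedTuple):
    model: str
    score: float       # overall score, in percent (placeholder values below)
    avg_length: float  # mean response length; characters is an assumed unit


def rank(rows: list[Row]) -> list[Row]:
    # Higher score first; among equal scores, prefer the shorter average response.
    return sorted(rows, key=lambda r: (-r.score, r.avg_length))


rows = [
    Row("model-a", 60.0, 1800.0),
    Row("model-b", 60.0, 1200.0),
    Row("model-c", 55.0, 900.0),
]
ordered = [r.model for r in rank(rows)]
```

Here `model-b` ranks ahead of `model-a` because it reaches the same score with shorter responses. A team that weighs cost more heavily could swap in a different sort key.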

With the launch of TRUEBench, Samsung isn’t merely releasing another tool but is aiming to change how the industry thinks about AI performance. By moving the goalposts from abstract knowledge to tangible productivity, Samsung’s benchmark could play a role in helping organisations make better decisions about which enterprise AI models to integrate into their workflows and bridge the gap between an AI’s potential and its proven value.
