Samsung benchmarks real productivity of enterprise AI models

Last updated: September 25, 2025 3:14 pm
Published September 25, 2025

Samsung is overcoming limitations of existing benchmarks to better assess the real-world productivity of AI models in enterprise settings. The new system, developed by Samsung Research and named TRUEBench, aims to address the growing disparity between theoretical AI performance and its actual utility in the workplace.

As businesses worldwide accelerate their adoption of large language models (LLMs) to improve their operations, a challenge has emerged: how to accurately gauge their effectiveness. Many existing benchmarks focus on academic or general knowledge tests, often limited to English and simple question-and-answer formats. This has created a gap that leaves enterprises with no reliable method for evaluating how an AI model will perform on complex, multilingual, and context-rich business tasks.

Samsung’s TRUEBench, short for Trustworthy Real-world Usage Evaluation Benchmark, has been developed to fill this void. It provides a comprehensive suite of metrics that assesses LLMs based on scenarios and tasks directly relevant to real-world corporate environments. The benchmark draws upon Samsung’s own extensive internal enterprise use of AI models, ensuring the evaluation criteria are grounded in genuine workplace demands.

The framework evaluates common enterprise functions such as creating content, analysing data, summarising lengthy documents, and translating materials. These are broken down into 10 distinct categories and 46 sub-categories, providing a granular view of an AI’s productivity capabilities.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity.”


To tackle the limitations of older benchmarks, TRUEBench is built upon a foundation of 2,485 diverse test sets spanning 12 different languages and supporting cross-linguistic scenarios. This multilingual approach is essential for global corporations where information flows across different regions. The test materials themselves reflect the variety of workplace requests, ranging from brief instructions of just eight characters to the complex analysis of documents exceeding 20,000 characters.

Samsung recognised that in a real business context, a user’s full intent isn’t always explicitly stated in their initial prompt. The benchmark is therefore designed to assess an AI model’s ability to understand and fulfil these implicit business needs, moving beyond simple accuracy to a more nuanced measure of helpfulness and relevance.

To achieve this, Samsung Research developed a unique collaborative process between human experts and AI to create the productivity scoring criteria. Initially, human annotators establish the evaluation standards for a given task. An AI then reviews these standards, checking for potential errors, internal contradictions, or unnecessary constraints that might not reflect a realistic user expectation. Following the AI’s feedback, the human annotators refine the criteria. This iterative loop ensures the final evaluation standards are precise and reflective of a high-quality outcome.

This cross-verified process delivers an automated evaluation system that scores the performance of LLMs. By using AI to apply these refined criteria, the system minimises the subjective bias that can occur with human-only scoring, ensuring consistency and reliability across all tests. TRUEBench also employs a strict scoring model where an AI model must satisfy every condition associated with a test to receive a passing mark. This all-or-nothing approach for individual conditions enables a more detailed and exacting assessment of the performance of AI models across different enterprise tasks.
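
The strict scoring rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the article does not describe TRUEBench's data format or code, so the `BenchItem` type, the `passes` and `pass_rate` helpers, and the sample prompts are hypothetical. Simple Python predicates also stand in for the AI-applied criteria described above.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in: a condition is any check that accepts a response
# and returns True or False. The real benchmark applies its criteria with an AI.
Condition = Callable[[str], bool]


@dataclass
class BenchItem:
    prompt: str
    conditions: List[Condition]


def passes(response: str, item: BenchItem) -> bool:
    """All-or-nothing: a response passes only if every condition holds."""
    return all(check(response) for check in item.conditions)


def pass_rate(responses: List[str], items: List[BenchItem]) -> float:
    """Fraction of items whose response satisfies all of its conditions."""
    if not items:
        return 0.0
    passed = sum(passes(r, i) for r, i in zip(responses, items))
    return passed / len(items)


# Made-up items, not taken from TRUEBench.
items = [
    BenchItem(
        prompt="Summarise the report in under 20 words and mention revenue.",
        conditions=[
            lambda r: len(r.split()) < 20,
            lambda r: "revenue" in r.lower(),
        ],
    ),
    BenchItem(
        prompt="Translate 'hello' into French.",
        conditions=[lambda r: "bonjour" in r.lower()],
    ),
]

responses = [
    "Profit rose sharply this quarter.",  # short enough, but omits revenue
    "Bonjour",
]

print(pass_rate(responses, items))
```

In this sketch the first response meets the length condition but misses the second one, so it gets no credit at all for that item. That is the practical difference between strict scoring and partial-credit scoring.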


To boost transparency and encourage wider adoption, Samsung has made TRUEBench’s data samples and leaderboards publicly available on the global open-source platform Hugging Face. This allows developers, researchers, and enterprises to directly compare the productivity performance of up to five different AI models simultaneously. The platform provides a clear, at-a-glance overview of how various AIs stack up against each other on practical tasks.

As of writing, here are the top 20 models by overall ranking based on Samsung’s AI benchmark:

[Image: current top 20 models by overall ranking based on Samsung’s AI benchmark, which assesses the real-world productivity of AI models in enterprise settings.]

The full published data also includes the average length of the AI-generated responses. This allows for a simultaneous comparison of not only performance but also efficiency, a key consideration for businesses weighing operational costs and speed.
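
One way a reader might weigh score against response length is sketched below. This is not how the leaderboard itself ranks models. The `Entry` type, the `efficient_models` helper, the model names, and all numbers are made up for illustration, and the unit of length is an assumption. The sketch keeps only the models that no other model beats on both score and average response length.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    model: str
    score: float       # higher is better
    avg_length: float  # average response length (unit assumed); lower is cheaper


def efficient_models(entries: List[Entry]) -> List[Entry]:
    """Keep entries that no other entry matches or beats on both axes."""

    def dominated(e: Entry) -> bool:
        return any(
            o.score >= e.score
            and o.avg_length <= e.avg_length
            and (o.score > e.score or o.avg_length < e.avg_length)
            for o in entries
        )

    return [e for e in entries if not dominated(e)]


# Made-up numbers, not taken from the published leaderboard.
entries = [
    Entry("model-a", 70.0, 1800),
    Entry("model-b", 68.0, 900),
    Entry("model-c", 65.0, 1500),
]

for e in efficient_models(entries):
    print(e.model, e.score, e.avg_length)
```

Here `model-c` drops out because `model-b` scores higher with shorter responses. `model-a` and `model-b` both remain, since choosing between them means trading score against length.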

With the launch of TRUEBench, Samsung isn’t merely releasing another tool but is aiming to change how the industry thinks about AI performance. By moving the goalposts from abstract knowledge to tangible productivity, Samsung’s benchmark could play a role in helping organisations make better decisions about which enterprise AI models to integrate into their workflows and bridge the gap between an AI’s potential and its proven value.

See also: Inside Huawei’s plan to make thousands of AI chips think like one computer

AI News is powered by TechForge Media.
