Samsung benchmarks real productivity of enterprise AI models

Last updated: September 25, 2025 3:14 pm
Published September 25, 2025

Samsung is overcoming limitations of existing benchmarks to better assess the real-world productivity of AI models in enterprise settings. The new system, developed by Samsung Research and named TRUEBench, aims to address the growing disparity between theoretical AI performance and its actual utility in the workplace.

As businesses worldwide accelerate their adoption of large language models (LLMs) to improve their operations, a challenge has emerged: how to accurately gauge their effectiveness. Many existing benchmarks focus on academic or general knowledge tests, often limited to English and simple question-and-answer formats. This has created a gap that leaves enterprises without a reliable method for evaluating how an AI model will perform on complex, multilingual, and context-rich business tasks.

Samsung’s TRUEBench, short for Trustworthy Real-world Usage Evaluation Benchmark, has been developed to fill this void. It provides a comprehensive suite of metrics that assesses LLMs based on scenarios and tasks directly relevant to real-world corporate environments. The benchmark draws upon Samsung’s own extensive internal enterprise use of AI models, ensuring the evaluation criteria are grounded in real workplace demands.

The framework evaluates common enterprise functions such as creating content, analysing data, summarising lengthy documents, and translating materials. These are broken down into 10 distinct categories and 46 sub-categories, providing a granular view of an AI’s productivity capabilities.

“Samsung Research brings deep expertise and a competitive edge through its real-world AI experience,” said Paul (Kyungwhoon) Cheun, CTO of the DX Division at Samsung Electronics and Head of Samsung Research. “We expect TRUEBench to establish evaluation standards for productivity.”


To tackle the limitations of older benchmarks, TRUEBench is built upon a foundation of 2,485 diverse test sets spanning 12 different languages and supporting cross-linguistic scenarios. This multilingual approach is critical for global corporations where information flows across different regions. The test materials themselves reflect the variety of workplace requests, ranging from brief instructions of just eight characters to the complex analysis of documents exceeding 20,000 characters.

Samsung recognised that in a real business context, a user’s full intent isn’t always explicitly stated in their initial prompt. The benchmark is therefore designed to assess an AI model’s ability to understand and fulfil these implicit business needs, moving beyond simple accuracy to a more nuanced measure of helpfulness and relevance.

To achieve this, Samsung Research developed a unique collaborative process between human experts and AI to create the productivity scoring criteria. Initially, human annotators establish the evaluation standards for a given task. An AI then reviews these standards, checking for potential errors, internal contradictions, or unnecessary constraints that might not reflect a realistic user expectation. Following the AI’s feedback, the human annotators refine the criteria. This iterative loop ensures the final evaluation standards are precise and reflective of a high-quality outcome.
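The review-and-refine loop described above can be pictured with a minimal Python sketch. This is not Samsung's implementation: the function names, the `max_rounds` cap, and the toy reviewer and refiner (which only detect and drop duplicate criteria) are all assumptions made for illustration. In the process the article describes, the reviewer would be an AI model and the refiner would be human annotators.

```python
from typing import Callable, List

Criteria = List[str]
# Hypothetical reviewer: returns the issues it finds (empty list if none).
Reviewer = Callable[[Criteria], List[str]]
# Hypothetical refinement step: returns revised criteria given the issues.
Refiner = Callable[[Criteria, List[str]], Criteria]


def refine_criteria(draft: Criteria, review: Reviewer, refine: Refiner,
                    max_rounds: int = 5) -> Criteria:
    """Alternate review and refinement until the reviewer reports no issues.

    max_rounds is an assumed safety cap, not something the article specifies.
    """
    criteria = list(draft)
    for _ in range(max_rounds):
        issues = review(criteria)
        if not issues:
            break
        criteria = refine(criteria, issues)
    return criteria


def toy_review(criteria: Criteria) -> List[str]:
    """Toy stand-in for the AI reviewer: flags exact duplicate criteria."""
    seen, issues = set(), []
    for criterion in criteria:
        if criterion in seen:
            issues.append(criterion)
        seen.add(criterion)
    return issues


def toy_refine(criteria: Criteria, issues: List[str]) -> Criteria:
    """Toy stand-in for the annotators: keeps the first copy of each criterion."""
    result: Criteria = []
    for criterion in criteria:
        if criterion not in result:
            result.append(criterion)
    return result


draft = [
    "Response is in Korean",
    "Summary covers all action items",
    "Response is in Korean",
]
final = refine_criteria(draft, toy_review, toy_refine)
print(final)
```

Passing the reviewer and refiner in as callables keeps the loop itself independent of who, or what, performs each step.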

This cross-verified process delivers an automated evaluation system that scores the performance of LLMs. By using AI to apply these refined criteria, the system minimises the subjective bias that can occur with human-only scoring, ensuring consistency and reliability across all tests. TRUEBench also employs a strict scoring model where an AI model must satisfy every condition associated with a test to receive a passing mark. This all-or-nothing approach for individual conditions enables a more detailed and exacting assessment of the performance of AI models across different enterprise tasks.
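To make the strict scoring rule concrete, here is a small Python sketch in which a response passes a test only if it satisfies every condition attached to it. The `TestCase` class, the function names, and the two example conditions are hypothetical. TRUEBench applies its criteria with an AI judge, whereas simple string checks stand in for that judge here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A condition is a named check on a model response (illustrative stand-in
# for a judged criterion; not TRUEBench's actual format).
Condition = Callable[[str], bool]


@dataclass
class TestCase:
    prompt: str
    conditions: Dict[str, Condition]


def passes(case: TestCase, response: str) -> bool:
    """All-or-nothing: the response passes only if every condition holds."""
    return all(check(response) for check in case.conditions.values())


def pass_rate(cases: List[TestCase], responses: List[str]) -> float:
    """Fraction of test cases whose response satisfies every condition."""
    if len(cases) != len(responses):
        raise ValueError("cases and responses must have the same length")
    if not cases:
        return 0.0
    passed = sum(passes(c, r) for c, r in zip(cases, responses))
    return passed / len(cases)


case = TestCase(
    prompt="Summarise the meeting notes in at most 20 words.",
    conditions={
        "max_20_words": lambda r: len(r.split()) <= 20,
        "mentions_deadline": lambda r: "deadline" in r.lower(),
    },
)

good = "The team agreed to move the deadline to Friday."
bad = "The team met and discussed several topics."

print(passes(case, good))                    # True
print(passes(case, bad))                     # False
print(pass_rate([case, case], [good, bad]))  # 0.5
```

The second response is short enough but never mentions the deadline, so it fails the whole test rather than earning partial credit.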


To boost transparency and encourage wider adoption, Samsung has made TRUEBench’s data samples and leaderboards publicly available on the global open-source platform Hugging Face. This allows developers, researchers, and enterprises to directly compare the productivity performance of up to five different AI models simultaneously. The platform provides a clear, at-a-glance overview of how various AIs stack up against each other on practical tasks.

As of writing, here are the top 20 models by overall ranking based on Samsung’s AI benchmark:

[Image: Current top 20 models by overall ranking based on Samsung’s AI benchmark, which assesses the real-world productivity of AI models in enterprise settings.]

The full published data also includes the average length of the AI-generated responses. This allows for a simultaneous comparison of not only performance but also efficiency, a key consideration for businesses weighing operational costs and speed.

With the launch of TRUEBench, Samsung isn’t merely releasing another tool but is aiming to change how the industry thinks about AI performance. By moving the goalposts from abstract knowledge to tangible productivity, Samsung’s benchmark could play a role in helping organisations make better decisions about which enterprise AI models to integrate into their workflows and bridge the gap between an AI’s potential and its proven value.
