OpenCUA’s open source computer-use agents rival proprietary models from OpenAI and Anthropic

Last updated: August 23, 2025 4:31 am
Published August 23, 2025

A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for creating robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes for scaling the development of computer-use agents (CUAs).

Models trained using this framework perform strongly on CUA benchmarks, outperforming existing open source models and competing closely with closed agents from leading AI labs like OpenAI and Anthropic.

The challenge of building computer-use agents

Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.

“As the lack of transparency limits technical advancements and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks,” the researchers state in their paper.


AI Scaling Hits Its Limits

Energy caps, rising token prices, and inference delays are reshaping enterprise AI. Be a part of our unique salon to find how prime groups are:

  • Turning power right into a strategic benefit
  • Architecting environment friendly inference for actual throughput positive aspects
  • Unlocking aggressive ROI with sustainable AI methods

Safe your spot to remain forward: https://bit.ly/4mwGngO


At the same time, open source efforts face their own set of hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open source datasets for graphical user interfaces (GUIs) have limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to replicate their work.

According to the paper, “These limitations collectively hinder advances in general-purpose CUAs and restrict a meaningful exploration of their scalability, generalizability, and potential learning approaches.”

Introducing OpenCUA

OpenCUA framework. Source: XLANG Lab at HKU

OpenCUA is an open source framework designed to address these challenges by scaling both the data collection and the models themselves. At its core is the AgentNet Tool for recording human demonstrations of computer tasks on different operating systems.

The tool streamlines data collection by running in the background on an annotator’s personal computer, capturing screen videos, mouse and keyboard inputs, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then processed into “state-action trajectories,” pairing a screenshot of the computer (the state) with the user’s corresponding action (a click, key press, etc.). Annotators can then review, edit, and submit these demonstrations.
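To make the idea of a state-action trajectory concrete, here is a minimal Python sketch of how timestamped screenshots might be paired with input events. It assumes each action is matched to the last screenshot captured strictly before it; the class names (`Frame`, `Action`, `Step`) and the function `build_trajectory` are hypothetical and are not taken from the AgentNet Tool's code.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Frame:
    """A screenshot captured at a timestamp, in seconds."""
    timestamp: float
    image_path: str


@dataclass
class Action:
    """A user input event such as a click or key press."""
    timestamp: float
    kind: str      # e.g. "click" or "press"
    argument: str  # e.g. "412,88" for a click or "enter" for a key


@dataclass
class Step:
    """One state-action pair: the screen the user saw and what they did next."""
    state: Frame
    action: Action


def build_trajectory(frames: list[Frame], actions: list[Action]) -> list[Step]:
    """Pair each action with the last frame captured strictly before it."""
    frames = sorted(frames, key=lambda f: f.timestamp)
    times = [f.timestamp for f in frames]
    steps = []
    for action in sorted(actions, key=lambda a: a.timestamp):
        # bisect_left finds the first frame at or after the action; step back one.
        i = bisect_left(times, action.timestamp) - 1
        if i < 0:
            continue  # no screenshot exists yet, so this action has no state
        steps.append(Step(state=frames[i], action=action))
    return steps


frames = [Frame(0.0, "f0.png"), Frame(1.5, "f1.png"), Frame(3.0, "f2.png")]
actions = [Action(1.0, "click", "412,88"), Action(2.0, "press", "enter")]
for step in build_trajectory(frames, actions):
    print(step.state.image_path, step.action.kind)  # f0.png click, then f1.png press
```

Matching on frames captured strictly before the action is a design choice in this sketch: a frame taken at the same instant may already show the result of the action rather than the state that prompted it.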

AgentNet Tool. Source: XLANG Lab at HKU

Using this tool, the researchers collected the AgentNet dataset, which contains over 22,600 task demonstrations across Windows, macOS, and Ubuntu, spanning more than 200 applications and websites. “This dataset authentically captures the complexity of human behaviors and environmental dynamics from users’ personal computing environments,” the paper notes.

Recognizing that screen-recording tools raise significant data privacy concerns for enterprises, the researchers designed the AgentNet Tool with security in mind. Xinyuan Wang, co-author of the paper and PhD student at HKU, explained that they implemented a multi-layer privacy protection framework. “First, annotators themselves can fully observe the data they generate… before deciding whether to submit it,” he told VentureBeat. The data then undergoes manual verification for privacy issues and automated scanning by a large model to detect any remaining sensitive content before release. “This layered process ensures enterprise-grade robustness for environments handling sensitive customer or financial data,” Wang added.
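The three privacy layers Wang describes can be sketched as a sequence of gates, any one of which can block release. This is an illustrative Python sketch, not the AgentNet pipeline: the `Submission` fields and the `release_decision` function are hypothetical, and the large-model scan is replaced here by two simple regular expressions purely as a placeholder.

```python
import re
from dataclasses import dataclass


@dataclass
class Submission:
    text: str                 # text visible or typed during the demonstration
    annotator_approved: bool  # layer 1: annotator reviewed it and chose to submit
    reviewer_approved: bool   # layer 2: manual privacy verification passed


# Layer 3 stand-in. The article describes scanning with a large model; these
# regexes are a simplified placeholder, not OpenCUA's actual detector.
SENSITIVE_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),  # email addresses
    re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # card-number-like digit runs
]


def automated_scan(text: str) -> list[str]:
    """Return substrings that look sensitive."""
    hits = []
    for pattern in SENSITIVE_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits


def release_decision(sub: Submission) -> tuple[bool, str]:
    """Apply the layers in order; any single layer can block release."""
    if not sub.annotator_approved:
        return False, "withheld by annotator"
    if not sub.reviewer_approved:
        return False, "failed manual review"
    hits = automated_scan(sub.text)
    if hits:
        return False, f"automated scan flagged {len(hits)} item(s)"
    return True, "released"


print(release_decision(Submission("Click Save", True, True)))
print(release_decision(Submission("Email jane@example.com", True, True)))
```

Ordering the gates from cheapest (the annotator's own choice) to most expensive (the automated scan) means most withheld data never reaches the later layers.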

To accelerate evaluation, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent’s performance.
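Because AgentNetBench accepts several correct actions per step, a step can be scored as correct when the prediction matches any of them. The Python sketch below assumes a simple action representation (`Act`) and a hypothetical 20-pixel tolerance for clicks; it illustrates the multiple-answer scoring idea and is not AgentNetBench's actual metric.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Act:
    kind: str                                    # e.g. "click", "type", "hotkey"
    value: str = ""                              # text typed or keys pressed
    point: Optional[tuple[float, float]] = None  # click coordinates, if any


def matches(pred: Act, gold: Act, radius: float = 20.0) -> bool:
    """Hypothetical rule: same kind, and same value or a click within `radius` px."""
    if pred.kind != gold.kind:
        return False
    if pred.kind == "click":
        if pred.point is None or gold.point is None:
            return False
        return math.dist(pred.point, gold.point) <= radius
    return pred.value == gold.value


def step_accuracy(predictions: list[Act], gold_sets: list[list[Act]]) -> float:
    """Fraction of steps whose prediction matches ANY accepted gold action."""
    if len(predictions) != len(gold_sets):
        raise ValueError("expected exactly one prediction per step")
    if not predictions:
        return 0.0
    correct = sum(
        any(matches(p, g) for g in golds)
        for p, golds in zip(predictions, gold_sets)
    )
    return correct / len(predictions)


gold = [
    # Step 1: clicking Save or pressing Ctrl+S are both acceptable.
    [Act("click", point=(100, 200)), Act("hotkey", value="ctrl+s")],
    # Step 2: only one correct action.
    [Act("type", value="report.pdf")],
]
preds = [Act("hotkey", value="ctrl+s"), Act("type", value="report.docx")]
print(step_accuracy(preds, gold))  # 0.5
```

Scoring each step against a set of acceptable actions avoids penalizing an agent that reaches the same result by a different but valid route, which is what makes an offline benchmark usable without running a live environment.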

A new recipe for training agents

The OpenCUA framework introduces a novel pipeline for processing data and training computer-use agents. The first step converts the raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.
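The article does not detail how raw recordings become clean state-action pairs. One plausible approach, shown here as a hypothetical Python sketch, is to drop bare mouse movements and merge runs of printable keystrokes into a single typing action; the event names and rules in `reduce_events` are assumptions, not OpenCUA's documented procedure.

```python
from dataclasses import dataclass


@dataclass
class RawEvent:
    kind: str  # "mouse_move", "click", or "key"
    data: str  # key name, or "x,y" for mouse events


def reduce_events(events: list[RawEvent]) -> list[tuple[str, str]]:
    """Collapse a raw input stream into coarser actions (hypothetical rules)."""
    actions: list[tuple[str, str]] = []
    buffer: list[str] = []  # consecutive printable keystrokes awaiting a "type" action

    def flush() -> None:
        if buffer:
            actions.append(("type", "".join(buffer)))
            buffer.clear()

    for ev in events:
        if ev.kind == "mouse_move":
            continue  # cursor movement alone carries little intent; drop it
        if ev.kind == "key" and len(ev.data) == 1:
            buffer.append(ev.data)  # a single printable character
            continue
        flush()  # any other event ends the current typing run
        if ev.kind == "key":
            actions.append(("press", ev.data))  # special key such as "enter"
        else:
            actions.append((ev.kind, ev.data))
    flush()
    return actions


events = [
    RawEvent("mouse_move", "10,10"),
    RawEvent("click", "412,88"),
    RawEvent("key", "h"),
    RawEvent("key", "i"),
    RawEvent("key", "enter"),
]
print(reduce_events(events))  # [('click', '412,88'), ('type', 'hi'), ('press', 'enter')]
```

Coarser actions keep each training pair closer to what a person intends at that moment, which is the kind of "clean" step a VLM can learn from.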

OpenCUA chain-of-thought pipeline. Source: XLANG Lab at HKU

The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed “inner monologue” for each action, which includes planning, memory, and reflection. This structured reasoning is organized into three levels: a high-level observation of the screen, reflective thoughts that analyze the situation and plan the next steps, and finally, the concise, executable action. This approach helps the agent develop a deeper understanding of the tasks.
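The three-level reasoning format can be illustrated as a serialized training target. In this Python sketch the section headings and the action syntax are assumptions chosen for readability, not OpenCUA's actual prompt format; `parse_action` shows that only the final level needs to be executed at inference time.

```python
from dataclasses import dataclass


@dataclass
class ReasonedStep:
    observation: str  # level 1: high-level description of the current screen
    thought: str      # level 2: reflection on progress and a plan for the next step
    action: str       # level 3: the concise, executable action


ACTION_HEADER = "## Action\n"


def to_training_target(step: ReasonedStep) -> str:
    """Serialize one step into a text target a VLM could be trained to produce."""
    return (
        f"## Observation\n{step.observation}\n"
        f"## Thought\n{step.thought}\n"
        f"{ACTION_HEADER}{step.action}"
    )


def parse_action(model_output: str) -> str:
    """Extract the executable action; the other levels are not executed."""
    idx = model_output.rfind(ACTION_HEADER)
    if idx == -1:
        raise ValueError("no action section found")
    return model_output[idx + len(ACTION_HEADER):].strip()


step = ReasonedStep(
    observation="A 'Save As' dialog is open with the file name field focused.",
    thought="The name field is ready, so I should enter the report name next.",
    action='type("report.pdf")',
)
target = to_training_target(step)
print(parse_action(target))  # type("report.pdf")
```

Placing the action last means the model produces its observation and plan before committing to an action, which mirrors the ordering the article describes.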

“We find natural language reasoning crucial for generalizable computer-use foundation models, helping CUAs internalize cognitive capabilities,” the researchers write.

This data synthesis pipeline is a general framework that companies can adapt to train agents on their own internal tools. According to Wang, an enterprise can record demonstrations of its proprietary workflows and use the same “reflector” and “generator” pipeline to create the necessary training data. “This allows them to bootstrap a high-performing agent tailored to their internal tools without needing to handcraft reasoning traces manually,” he explained.
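Wang's description of reusing the reflector and generator suggests a loop that walks through a recorded demonstration and attaches reasoning to each step. In this Python sketch, `reflect` and `generate` are hypothetical callables standing in for model calls; the function name `synthesize_reasoning` and the output fields are illustrative only and do not reflect OpenCUA's released code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DemoStep:
    screenshot: str  # path or ID of the screen image
    action: str      # the recorded action, e.g. "click(412, 88)"


# Hypothetical model interfaces; in practice these would prompt a large model.
Reflector = Callable[[list[DemoStep], DemoStep], str]
Generator = Callable[[str, list[DemoStep], DemoStep, str], str]


def synthesize_reasoning(task: str, demo: list[DemoStep],
                         reflect: Reflector, generate: Generator) -> list[dict]:
    """Attach a generated reasoning trace to every recorded step."""
    samples = []
    for i, step in enumerate(demo):
        history = demo[:i]                  # steps already taken
        reflection = reflect(history, step)  # critique of progress so far
        thought = generate(task, history, step, reflection)
        samples.append({
            "screenshot": step.screenshot,
            "reflection": reflection,
            "thought": thought,
            "action": step.action,  # the human's action stays the training label
        })
    return samples


# Stub "models" so the sketch runs without any external service.
def fake_reflect(history: list[DemoStep], step: DemoStep) -> str:
    return f"{len(history)} step(s) completed so far."


def fake_generate(task: str, history: list[DemoStep], step: DemoStep, reflection: str) -> str:
    return f"To {task}, next I will {step.action}."


demo = [DemoStep("s0.png", "click(412, 88)"), DemoStep("s1.png", 'type("report")')]
for sample in synthesize_reasoning("save the report", demo, fake_reflect, fake_generate):
    print(sample["thought"])
```

Because the recorded human action is kept as the label and only the reasoning is generated, the same loop could in principle run over any company's own recordings, which is the reuse Wang describes.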

Putting OpenCUA to the test

The researchers applied the OpenCUA framework to train a range of open source VLMs, including variants of Qwen and Kimi-VL, with parameter sizes from 3 billion to 32 billion. The models were evaluated on a suite of online and offline benchmarks that test their ability to perform tasks and understand GUIs.

The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open source models on the OSWorld-Verified benchmark. It also surpassed OpenAI’s GPT-4o-based CUA and significantly closed the performance gap with Anthropic’s leading proprietary models.

OpenCUA shows large improvements over base models (left) while competing with leading CUA models (right). Source: XLANG Lab at HKU

For enterprise developers and product leaders, the research offers several key findings. The OpenCUA method is broadly applicable, improving performance on models with different architectures (both dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.

According to Wang, the framework is particularly suited to automating repetitive, labor-intensive enterprise workflows. “For example, in the AgentNet dataset, we already capture a few demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk,” he told VentureBeat. “These tasks involve many sequential steps but follow repeatable patterns.”

However, Wang noted that bridging the gap to live deployment requires addressing key challenges around safety and reliability. “The biggest challenge in real deployment is safety and reliability: the agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task,” he said.

The researchers have released the code, dataset, and weights for their models.

As open source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future where proficiency in complex software becomes less important than the ability to clearly articulate goals to an AI agent.

He described two main modes of work: “offline automation, where the agent leverages its broader software knowledge to pursue a task end-to-end,” and “online collaboration, where the agent responds in real-time and works side by side with the human, much like a colleague.” In short, humans will provide the strategic “what,” while increasingly sophisticated AI agents handle the operational “how.”

