Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments

Last updated: January 30, 2024 4:14 am
Published January 30, 2024

There have been many advances in vision-language models (VLMs) that can match natural language queries to objects in a visual scene, and researchers are experimenting with how these models can be applied to robotics systems, which still lag behind in generalizing their abilities.
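
To make the idea concrete, here is a minimal sketch of the kind of open-vocabulary matching a VLM provides, using CLIP through the Hugging Face transformers library. The checkpoint, image file, and queries are illustrative assumptions; the paper's exact model stack may differ.

```python
# Minimal sketch of open-vocabulary image-text matching with CLIP.
# (Illustrative only; not necessarily OK-Robot's exact models.)
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("scene.jpg")  # hypothetical RGB frame of the scene
queries = ["a coffee mug", "a soda can", "a remote control"]

inputs = processor(text=queries, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each text query.
probs = outputs.logits_per_image.softmax(dim=-1)
print("best match:", queries[probs.argmax().item()])
```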

A new paper by researchers at Meta AI and New York University introduces an open-knowledge-based framework that brings pre-trained machine learning (ML) models together to create a robotics system that can perform tasks in unseen environments. Called OK-Robot, the framework combines VLMs with movement-planning and object-manipulation models to perform pick-and-drop operations without training.

Robotic systems are usually designed to be deployed in previously seen environments and are poor at generalizing their capabilities beyond locations where they have been trained. This limitation is especially problematic in settings where data is scarce, such as unstructured homes.

There have been impressive advances in individual components needed for robotics systems. VLMs are good at matching language prompts to visual objects. At the same time, robotic skills for navigation and grasping have progressed considerably. However, robotic systems that combine modern vision models with robot-specific primitives still perform poorly. 

“Making progress on this problem requires a careful and nuanced framework that both integrates VLMs and robotics primitives, while being flexible enough to incorporate newer models as they are developed by the VLM and robotics community,” the researchers write in their paper.

OK-Robot modules (source: arXiv)

OK-Robot combines state-of-the-art VLMs with powerful robotics primitives to perform pick-and-drop tasks in unseen environments. The models used in the system are trained on large, publicly available datasets. 

OK-Robot combines three primary subsystems: an open-vocabulary object navigation module, an RGB-D grasping module, and a dropping heuristic system. When placed in a new home, OK-Robot requires a manual scan of the interior, which can be captured with an iPhone app that records a sequence of RGB-D images as the user walks around the building. The system combines the images with the corresponding camera poses and positions to create a 3D map of the environment.
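
The core geometric step of such a scan is standard: every depth pixel is back-projected through the camera intrinsics and transformed by the camera pose into world coordinates. Below is a sketch for a single frame; the function name and argument conventions are assumptions for illustration, not the paper's code.

```python
# Sketch: back-project one RGB-D frame into a world-frame point cloud,
# given 3x3 camera intrinsics K and a 4x4 camera-to-world pose T.
# (Illustrates the general technique, not OK-Robot's exact pipeline.)
import numpy as np

def frame_to_points(depth: np.ndarray, K: np.ndarray, T: np.ndarray) -> np.ndarray:
    """depth: HxW array in metres; returns an Nx3 array of world points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]  # pixel -> camera coordinates
    y = (v - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=-1).reshape(-1, 4)
    pts_world = (T @ pts_cam.T).T[:, :3]  # camera -> world coordinates
    return pts_world[depth.reshape(-1) > 0]  # drop invalid zero-depth pixels

# Accumulating frame_to_points over every frame of the scan yields the
# 3D environment map that the navigation module operates on.
```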

The system processes each image with a vision transformer (ViT) model to extract information about the objects it contains. The object and environment information are then combined into a semantic object memory module.
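
A minimal sketch of what such a memory could look like, assuming each detected object is represented by a unit-normalized image embedding paired with a 3D location. The class and its layout are hypothetical, not the paper's implementation.

```python
# Hypothetical semantic object memory: one embedding and one 3D world
# position per object detected during the scan.
import numpy as np

class ObjectMemory:
    def __init__(self):
        self.embeddings = []  # unit-norm vectors, one per detected object
        self.positions = []   # matching 3D world coordinates

    def add(self, embedding: np.ndarray, position: np.ndarray) -> None:
        self.embeddings.append(embedding / np.linalg.norm(embedding))
        self.positions.append(np.asarray(position))

    def as_arrays(self):
        return np.stack(self.embeddings), np.stack(self.positions)
```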

Given a natural language query for picking up an object, the memory module computes an embedding of the prompt and matches it against the stored objects, selecting the one with the closest semantic representation. OK-Robot then uses navigation algorithms to find the best path to the object's location, one that gives the robot room to manipulate the object without causing collisions.
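
Under that formulation, the lookup reduces to a nearest-neighbour search in embedding space. The sketch below reuses the hypothetical ObjectMemory above and returns the 3D position the navigation planner would then target.

```python
# Sketch: resolve a language query against the object memory by taking
# the highest cosine similarity between the prompt embedding and the
# stored object embeddings (all vectors are unit-norm).
import numpy as np

def locate(query_embedding: np.ndarray, memory: "ObjectMemory") -> np.ndarray:
    embeddings, positions = memory.as_arrays()
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = embeddings @ q                   # cosine similarities
    return positions[int(np.argmax(scores))]  # 3D goal for the planner
```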

Finally, the robot uses an RGB-D camera, an object segmentation model, and a pre-trained grasp model to pick up the object. A similar process is used to reach the destination and drop the object. This lets the robot find the most suitable grasp for each object and handle destination spots that might not be flat.
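
One common heuristic for tying a pre-trained grasp generator to the queried object is to keep only candidate grasps that land inside the object's segmentation mask and take the highest-scoring survivor. The sketch below illustrates that filtering step; the inputs and their shapes are assumptions, not the paper's interface.

```python
# Sketch: filter candidate grasps to the target object using its
# segmentation mask, then pick the highest-scoring survivor.
import numpy as np

def select_grasp(grasp_px: np.ndarray, scores: np.ndarray, mask: np.ndarray):
    """grasp_px: Nx2 integer (u, v) pixel centres of candidate grasps;
    scores: N confidences from a pre-trained grasp model;
    mask: HxW boolean segmentation of the queried object."""
    u, v = grasp_px[:, 0], grasp_px[:, 1]
    on_object = mask[v, u]  # True where a grasp centre hits the object
    if not on_object.any():
        return None         # no viable grasp on this object
    return int(np.argmax(np.where(on_object, scores, -np.inf)))
```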

“From arriving into a completely novel environment to start operating autonomously in it, our system takes under 10 minutes on average to complete the first pick-and-drop task,” the researchers write.

The researchers tested OK-Robot in 10 homes, running 171 pick-and-drop experiments to evaluate how it performs in novel environments. OK-Robot completed the full pick-and-drop in 58% of cases. Notably, this is a zero-shot result: the models used in the system were not specifically trained for these environments. The researchers also found that improving the queries, decluttering the space, and excluding adversarial objects raises the success rate above 82%.

OK-Robot is not perfect. It sometimes fails to match the natural language prompt to the right object, its grasping model fails on some objects, and the robot hardware has limitations. More importantly, its object memory module is frozen once the environment has been scanned, so the robot cannot dynamically adapt to changes in objects and their arrangement.

Nonetheless, the OK-Robot project yields some important findings. First, it shows that current open-vocabulary vision-language models are very good at identifying arbitrary objects in the real world and navigating to them in a zero-shot manner. Second, it shows that special-purpose robot models pre-trained on large amounts of data can be applied out of the box to open-vocabulary grasping in unseen environments. Finally, it shows that with the right tooling and configuration, pre-trained models can be combined to perform new tasks without any additional training. OK-Robot could be the starting point for a line of research with plenty of room for improvement.
