Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments

Published January 30, 2024

There have been many advances in vision-language models (VLMs) that can match natural language queries to objects in a visual scene. Researchers are now exploring how these models can be applied to robotics systems, which still lag behind in generalizing their capabilities to new settings.

A new paper by researchers at Meta AI and New York University introduces an open-knowledge-based framework that brings pre-trained machine learning (ML) models together to create a robotics system that can perform tasks in unseen environments. Called OK-Robot, the framework combines VLMs with movement-planning and object-manipulation models to perform pick-and-drop operations without task-specific training.

Robotic systems are usually designed to be deployed in previously seen environments and are poor at generalizing their capabilities beyond locations where they have been trained. This limitation is especially problematic in settings where data is scarce, such as unstructured homes.

There have been impressive advances in individual components needed for robotics systems. VLMs are good at matching language prompts to visual objects. At the same time, robotic skills for navigation and grasping have progressed considerably. However, robotic systems that combine modern vision models with robot-specific primitives still perform poorly. 

“Making progress on this problem requires a careful and nuanced framework that both integrates VLMs and robotics primitives, while being flexible enough to incorporate newer models as they are developed by the VLM and robotics community,” the researchers write in their paper.

Figure: OK-Robot modules (source: arXiv)

OK-Robot combines state-of-the-art VLMs with powerful robotics primitives to perform pick-and-drop tasks in unseen environments. The models used in the system are trained on large, publicly available datasets. 

OK-Robot combines three primary subsystems: an open-vocabulary object navigation module, an RGB-D grasping module, and a dropping heuristic system. When placed in a new home, OK-Robot requires a manual scan of the interior, which can be captured with an iPhone app that records a sequence of RGB-D images as the user moves around the building. The system uses the images, along with the camera pose of each frame, to create a 3D map of the environment.
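
The article does not include the mapping code, but the geometry behind this step is standard: each RGB-D frame is back-projected through the camera intrinsics and transformed into the world frame using the recorded pose. Here is a minimal sketch in NumPy, assuming pinhole intrinsics `K` and 4x4 camera-to-world pose matrices (both names are illustrative assumptions):

```python
import numpy as np

def backproject_rgbd(depth, K, T_world_cam):
    """Lift one depth image into a world-frame point cloud.

    depth:        (H, W) depth values in meters
    K:            (3, 3) pinhole camera intrinsics
    T_world_cam:  (4, 4) camera-to-world pose recorded during the phone scan
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    valid = z > 0  # drop pixels with no depth reading

    # Pixel coordinates -> camera-frame 3D points (pinhole model)
    x = (u.reshape(-1)[valid] - K[0, 2]) * z[valid] / K[0, 0]
    y = (v.reshape(-1)[valid] - K[1, 2]) * z[valid] / K[1, 1]
    pts_cam = np.stack([x, y, z[valid], np.ones_like(x)], axis=0)  # (4, N)

    # Camera frame -> world frame using the recorded pose
    return (T_world_cam @ pts_cam)[:3].T  # (N, 3)
```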

The system processes each image with a vision transformer (ViT) model to extract information about objects. The object and environment information are brought together to create a semantic object memory module.
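
The paper pairs open-vocabulary detection with CLIP-style visual features to build this memory. The sketch below illustrates the idea using OpenAI's CLIP; the `object_memory` list and `add_detection` helper are illustrative stand-ins, not the authors' actual data structures:

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

object_memory = []  # list of (embedding, world_position) entries

def add_detection(crop: Image.Image, world_xyz):
    """Embed a detected object crop and store it with its 3D location.

    `crop` is an image patch around a detected object; `world_xyz` is the
    object's position recovered from depth (see the back-projection sketch).
    """
    with torch.no_grad():
        feat = model.encode_image(preprocess(crop).unsqueeze(0).to(device))
        feat = feat / feat.norm(dim=-1, keepdim=True)  # unit-norm for cosine similarity
    object_memory.append((feat.squeeze(0).cpu(), world_xyz))
```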

Given a natural language query for picking an object, the memory module computes an embedding of the prompt and matches it against the stored objects to find the closest semantic representation. OK-Robot then uses navigation algorithms to plot the best path to the object's location, one that gives the robot room to manipulate the object without causing collisions.
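
Continuing the sketch above, matching a prompt to memory then reduces to embedding the query text with the same model and taking the highest cosine similarity (again an illustration of the idea, not OK-Robot's actual code):

```python
def locate(query: str):
    """Return the stored world position whose embedding best matches the query."""
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        q = model.encode_text(tokens)
        q = q / q.norm(dim=-1, keepdim=True)
    q = q.squeeze(0).cpu()

    # Embeddings are unit-normalized, so cosine similarity is a dot product.
    best = max(object_memory, key=lambda entry: float(entry[0] @ q))
    return best[1]  # world-frame (x, y, z) to hand to the navigation planner

target_xyz = locate("the blue mug on the kitchen counter")
```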

Finally, the robot uses an RGB-D camera, an object segmentation model, and a pre-trained grasp model to pick up the object. A similar process is used to reach the destination and drop the object. This lets the robot find the most suitable grasp for each object and handle destination spots that might not be flat.
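
Put together, the end-to-end task reads as a short pipeline. In the sketch below, `navigate_to`, `segment_object`, `predict_grasps`, `execute_grasp`, and `release_at` are hypothetical placeholders for the navigation stack, segmentation model, pre-trained grasp model, and arm controller described above, not functions from the paper's codebase:

```python
def pick_and_drop(pick_query: str, drop_query: str):
    """High-level flow of a single OK-Robot-style task (structural sketch)."""
    # 1. Look up the object in semantic memory and drive to it.
    navigate_to(locate(pick_query))

    # 2. Segment the target in the RGB-D view, rank candidate grasps from a
    #    pre-trained grasp model, and execute the best one.
    mask = segment_object(pick_query)
    execute_grasp(predict_grasps(mask)[0])

    # 3. Repeat the lookup and navigation for the destination, then release.
    navigate_to(locate(drop_query))
    release_at(drop_query)
```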

“From arriving into a completely novel environment to start operating autonomously in it, our system takes under 10 minutes on average to complete the first pick-and-drop task,” the researchers write.

The researchers tested OK-Robot in 10 homes and ran 171 pick-and-drop experiments to evaluate how it performs in novel environments. OK-Robot succeeded in completing full pick-and-drops in 58% of cases. Notably, this is a zero-shot algorithm, which means the models used in the system were not specifically trained for such environments. The researchers also found that by improving the queries, decluttering the space, and excluding adversarial objects, the success rate increases to above 82%.

OK-Robot is not perfect. It sometimes fails to match the natural language prompt to the right object, its grasping model fails on some objects, and the robot hardware has limitations. More importantly, its object memory module is frozen after the environment is scanned, so the robot cannot dynamically adapt to changes in objects or their arrangement.

Nonetheless, the OK-Robot project yields some important findings. First, it shows that current open-vocabulary vision-language models are very good at identifying arbitrary objects in the real world and navigating to them in a zero-shot manner. Second, it shows that special-purpose robot models pre-trained on large amounts of data can be applied out of the box to open-vocabulary grasping in unseen environments. Finally, it shows that with the right tooling and configuration, pre-trained models can be combined to perform zero-shot tasks with no additional training. OK-Robot could mark the beginning of a line of research with plenty of room for improvement.
