
Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments

Last updated: January 30, 2024 4:14 am
Published January 30, 2024

Vision-language models (VLMs) that match natural language queries to objects in a visual scene have advanced considerably. Researchers are now experimenting with how to apply these models to robotics systems, which still lag in generalizing their abilities.

A new paper by researchers at Meta AI and New York University introduces an open-knowledge-based framework that brings pre-trained machine learning (ML) models together to create a robotics system that can perform tasks in unseen environments. Called OK-Robot, the framework combines VLMs with movement-planning and object-manipulation models to perform pick-and-drop operations without training.

Robotic systems are usually designed to be deployed in previously seen environments and are poor at generalizing their capabilities beyond locations where they have been trained. This limitation is especially problematic in settings where data is scarce, such as unstructured homes.

There have been impressive advances in individual components needed for robotics systems. VLMs are good at matching language prompts to visual objects. At the same time, robotic skills for navigation and grasping have progressed considerably. However, robotic systems that combine modern vision models with robot-specific primitives still perform poorly. 

“Making progress on this problem requires a careful and nuanced framework that both integrates VLMs and robotics primitives, while being flexible enough to incorporate newer models as they are developed by the VLM and robotics community,” the researchers write in their paper.

OK-Robot modules (source: arXiv)

OK-Robot combines state-of-the-art VLMs with powerful robotics primitives to perform pick-and-drop tasks in unseen environments. The models used in the system are trained on large, publicly available datasets. 


OK-Robot combines three primary subsystems: an open-vocabulary object navigation module, an RGB-D grasping module and a dropping heuristic system. When placed in a new home, OK-Robot requires a manual scan of the interior, which can be captured with an iPhone app that records a sequence of RGB-D images as the user moves around the home. The system uses the images, together with the camera pose and position for each one, to create a 3D environment map.
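To make the mapping step concrete, here is a minimal sketch of how a single posed RGB-D frame can be turned into 3D points in a shared world frame. It assumes a standard pinhole camera model and a camera-to-world pose given as a rotation matrix and a translation vector; the function names and the tiny example values are hypothetical and are not taken from the OK-Robot code.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_pixel(u: int, v: int, depth_m: float,
                      fx: float, fy: float, cx: float, cy: float) -> Point:
    """Pinhole model: pixel (u, v) with metric depth -> point in the camera frame."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def camera_to_world(p_cam: Point,
                    rotation: Sequence[Sequence[float]],
                    translation: Sequence[float]) -> Point:
    """Apply a camera-to-world pose: p_world = R * p_cam + t."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def depth_image_to_points(depth: Sequence[Sequence[float]],
                          intrinsics: Tuple[float, float, float, float],
                          rotation: Sequence[Sequence[float]],
                          translation: Sequence[float]) -> List[Point]:
    """Convert one depth image into world-frame points, skipping invalid depths."""
    fx, fy, cx, cy = intrinsics
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0:  # zero or negative depth is treated as a missing reading
                continue
            p_cam = backproject_pixel(u, v, d, fx, fy, cx, cy)
            points.append(camera_to_world(p_cam, rotation, translation))
    return points
```

Repeating this for every frame in the scan and merging the resulting points is one simple way to accumulate a map; a real system would also need to fuse or downsample overlapping observations.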

The system processes each image with a vision transformer (ViT) model to extract information about objects. The object and environment information are brought together to create a semantic object memory module.

Given a natural language query for picking an object, the memory module computes the embedding of the prompt and matches it with the object with the closest semantic representation. OK-Robot then uses navigation algorithms to find the best path to the location of the object in a way that provides the robot with room to manipulate the object without causing collisions.
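The matching step described above can be illustrated with a small sketch. It assumes that each memory entry stores an embedding vector and a 3D position, and that "closest semantic representation" is measured by cosine similarity, which is a common choice but an assumption here. The `MemoryEntry` type, the labels, and the toy three-dimensional vectors are hypothetical; in a real system the embeddings would come from a vision-language model.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class MemoryEntry:
    label: str                           # for readability only; matching uses the embedding
    embedding: Tuple[float, ...]         # stand-in for a VLM image embedding
    position: Tuple[float, float, float] # location in the 3D environment map


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query_embedding: Sequence[float],
               memory: Sequence[MemoryEntry]) -> MemoryEntry:
    """Return the memory entry whose embedding is most similar to the query."""
    if not memory:
        raise ValueError("object memory is empty")
    return max(memory, key=lambda e: cosine_similarity(query_embedding, e.embedding))
```

For example, with one entry embedded near `(0.9, 0.1, 0.0)` and another near `(0.0, 0.2, 0.9)`, a query embedded at `(1.0, 0.0, 0.1)` selects the first entry, and its stored position becomes the navigation target.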

Finally, the robot uses an RGB-D camera, an object segmentation model and a pre-trained grasp model to pick up the object, which lets it find the most suitable grasp for each object. The system uses a similar process to reach the destination and drop the object, and it can handle destination spots that might not be flat.
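One plausible way to combine a segmentation mask with the output of a grasp model is sketched below: keep only the candidate grasps that land on the target object, then take the one with the highest score. The article does not spell out this selection rule, so treat it as an assumption; `GraspCandidate`, its fields, and `select_grasp` are hypothetical names, and a real grasp model would return full 6-DoF poses rather than pixel coordinates.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class GraspCandidate:
    pixel_u: int   # image column where the grasp point projects
    pixel_v: int   # image row where the grasp point projects
    score: float   # confidence reported by the (assumed) grasp model


def select_grasp(candidates: Sequence[GraspCandidate],
                 mask: Sequence[Sequence[bool]]) -> Optional[GraspCandidate]:
    """Pick the highest-scoring grasp whose projection lies inside the object mask.

    `mask[v][u]` is True where the segmentation model marked the target object.
    Returns None when no candidate falls on the object.
    """
    on_object = [
        g for g in candidates
        if 0 <= g.pixel_v < len(mask)
        and 0 <= g.pixel_u < len(mask[g.pixel_v])
        and mask[g.pixel_v][g.pixel_u]
    ]
    if not on_object:
        return None
    return max(on_object, key=lambda g: g.score)
```

Filtering before ranking matters in cluttered scenes: a grasp with a very high score on a neighboring object is discarded, so the robot does not pick up the wrong item.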

“From arriving into a completely novel environment to start operating autonomously in it, our system takes under 10 minutes on average to complete the first pick-and-drop task,” the researchers write.

The researchers tested OK-Robot in 10 homes and ran 171 pick-and-drop experiments to evaluate how it performs in novel environments. OK-Robot completed full pick-and-drops in 58% of cases. Notably, this is a zero-shot algorithm, which means the models used in the system were not specifically trained for such environments. The researchers also found that improving the queries, decluttering the space, and excluding adversarial objects raised the success rate to above 82%.


OK-Robot is not perfect. It sometimes fails to match the natural language prompt with the right object. Its grasping model fails on some objects, and the robot hardware has limitations. More importantly, its object memory module is frozen after the environment is scanned. Therefore, the robot cannot dynamically adapt to changes in the objects and arrangements.

Nonetheless, the OK-Robot project yields several important findings. First, it shows that current open-vocabulary vision-language models are very good at identifying arbitrary objects in the real world and navigating to them in a zero-shot manner. Second, it shows that special-purpose robot models pre-trained on large amounts of data can be applied out-of-the-box to approach open-vocabulary grasping in unseen environments. Finally, it shows that with the right tooling and configuration, pre-trained models can be combined to perform zero-shot tasks with no training. OK-Robot could mark the beginning of a field of research with plenty of room for improvement.
