Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments

Last updated: January 30, 2024 4:14 am
Published January 30, 2024

Vision-language models (VLMs) have advanced considerably and can now match natural language queries to objects in a visual scene. Researchers are experimenting with how to apply these models to robotics systems, which still lag in generalizing their abilities.

A new paper by researchers at Meta AI and New York University introduces an open-knowledge-based framework that brings pre-trained machine learning (ML) models together to create a robotics system that can perform tasks in unseen environments. Called OK-Robot, the framework combines VLMs with movement-planning and object-manipulation models to perform pick-and-drop operations without additional training.

Robotic systems are usually designed to be deployed in previously seen environments and are poor at generalizing their capabilities beyond locations where they have been trained. This limitation is especially problematic in settings where data is scarce, such as unstructured homes.

There have been impressive advances in individual components needed for robotics systems. VLMs are good at matching language prompts to visual objects. At the same time, robotic skills for navigation and grasping have progressed considerably. However, robotic systems that combine modern vision models with robot-specific primitives still perform poorly. 

“Making progress on this problem requires a careful and nuanced framework that both integrates VLMs and robotics primitives, while being flexible enough to incorporate newer models as they are developed by the VLM and robotics community,” the researchers write in their paper.

OK-Robot modules (source: arxiv)

OK-Robot combines state-of-the-art VLMs with powerful robotics primitives to perform pick-and-drop tasks in unseen environments. The models used in the system are trained on large, publicly available datasets. 

OK-Robot has three primary subsystems: an open-vocabulary object navigation module, an RGB-D grasping module and a dropping heuristic system. When placed in a new home, OK-Robot requires a manual scan of the interior, which can be captured with an iPhone app that takes a sequence of RGB-D images as the user moves around the home. The system uses the images, along with the camera poses and positions, to create a 3D environment map.
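
The map-building step described above can be illustrated with a minimal sketch. It assumes a simple pinhole camera model and shows how a depth image and a camera pose might be turned into 3D points in a shared world frame. The function names (`backproject_depth`, `build_map`), the intrinsics parameters (`fx`, `fy`, `cx`, `cy`) and the data layout are hypothetical choices for illustration and are not taken from the OK-Robot code.

```python
def backproject_depth(depth, fx, fy, cx, cy, cam_to_world):
    """Convert one depth image into world-frame 3D points.

    depth: list of rows of depths in meters (0 means no reading).
    fx, fy, cx, cy: assumed pinhole intrinsics (focal lengths, principal point).
    cam_to_world: 4x4 nested list, the camera pose for this image.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels with no depth reading
            # Pixel (u, v) at depth z -> point in the camera frame.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            cam = (x, y, z, 1.0)
            # Apply the camera pose to move the point into the world frame.
            world = tuple(
                sum(cam_to_world[r][c] * cam[c] for c in range(4))
                for r in range(3)
            )
            points.append(world)
    return points


def build_map(frames, fx, fy, cx, cy):
    """Merge all scanned frames into one point list.

    frames: iterable of (depth, cam_to_world) pairs from the scan.
    """
    cloud = []
    for depth, pose in frames:
        cloud.extend(backproject_depth(depth, fx, fy, cx, cy, pose))
    return cloud
```

A real system would likely also voxelize or downsample the resulting points; this sketch only shows how images and poses combine into a single map.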

The system processes each image with a vision transformer (ViT) model to extract information about objects. The object and environment information are brought together to create a semantic object memory module.

Given a natural language query for picking an object, the memory module computes the embedding of the prompt and matches it with the object with the closest semantic representation. OK-Robot then uses navigation algorithms to find the best path to the location of the object in a way that provides the robot with room to manipulate the object without causing collisions.
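
The query-matching step can be sketched as a nearest-neighbor lookup over stored embeddings. This is a minimal sketch, assuming the memory holds one embedding and one 3D position per object and that similarity is measured with cosine similarity; the entry layout, the function names and the tiny three-dimensional vectors are hypothetical stand-ins for the output of a real vision-language model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def query_memory(query_embedding, memory):
    """Return the memory entry whose embedding is closest to the query.

    memory: list of dicts with hypothetical keys
            'label', 'position' (x, y, z) and 'embedding'.
    """
    return max(
        memory,
        key=lambda entry: cosine_similarity(query_embedding, entry["embedding"]),
    )


# Toy memory; in practice the embeddings would come from the vision model.
memory = [
    {"label": "mug", "position": (1.0, 0.5, 0.8), "embedding": [1.0, 0.0, 0.0]},
    {"label": "towel", "position": (3.2, 1.1, 0.4), "embedding": [0.0, 1.0, 0.0]},
]
# Stand-in for the embedding of a text prompt such as "the coffee cup".
best = query_memory([0.9, 0.1, 0.0], memory)
```

The position of the returned entry is what a navigation planner would then use as its goal.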

Finally, the robot uses an RGB-D camera, an object segmentation model and a pre-trained grasp model to pick the object, which enables it to find the most suitable grasp for each object. The system uses a similar process to reach the destination and drop the object, which allows it to handle destination spots that might not be flat.
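
One way the segmentation and grasp models could work together is sketched below. The sketch assumes the grasp model returns candidate grasps with a confidence score and an image location, and that the segmentation model returns a boolean mask for the queried object; the best grasp is then the highest-scoring candidate that lands on the mask. The dictionary keys and the `select_grasp` function are hypothetical, and the paper's actual filtering logic may differ.

```python
def select_grasp(grasps, mask):
    """Pick the highest-scoring grasp that falls on the target object.

    grasps: list of dicts with hypothetical keys 'u', 'v' (pixel column
            and row of the grasp point) and 'score' (model confidence).
    mask:   list of rows of booleans, True where the segmentation
            model marked the target object.
    Returns the chosen grasp dict, or None if no grasp lies on the object.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    on_object = [
        g for g in grasps
        if 0 <= g["v"] < height and 0 <= g["u"] < width and mask[g["v"]][g["u"]]
    ]
    if not on_object:
        return None
    return max(on_object, key=lambda g: g["score"])


# Toy 3x3 mask: the object occupies the center column.
mask = [
    [False, True, False],
    [False, True, False],
    [False, True, False],
]
candidates = [
    {"u": 0, "v": 0, "score": 0.95},  # high score, but off the object
    {"u": 1, "v": 1, "score": 0.80},  # on the object
    {"u": 1, "v": 2, "score": 0.60},  # on the object, lower score
]
chosen = select_grasp(candidates, mask)
```

Filtering by the mask first keeps the robot from grabbing a nearby object that happens to offer an easier grasp.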

“From arriving into a completely novel environment to start operating autonomously in it, our system takes under 10 minutes on average to complete the first pick-and-drop task,” the researchers write.

The researchers tested OK-Robot in 10 homes and ran 171 pick-and-drop experiments to evaluate how it performs in novel environments. OK-Robot completed the full pick-and-drop in 58% of cases. Notably, this is a zero-shot system, which means the models it uses were not specifically trained for these environments. The researchers also found that improving the queries, decluttering the space and excluding adversarial objects raises the success rate to above 82%.

OK-Robot is not perfect. It sometimes fails to match the natural language prompt with the right object. Its grasping model fails on some objects, and the robot hardware has limitations. More importantly, its object memory module is frozen after the environment is scanned. Therefore, the robot cannot dynamically adapt to changes in the objects and arrangements.

Nonetheless, the OK-Robot project yields several important findings. First, it shows that current open-vocabulary vision-language models are very good at identifying arbitrary objects in the real world and supporting navigation to them in a zero-shot manner. Second, it shows that special-purpose robot models pre-trained on large amounts of data can be applied out of the box to open-vocabulary grasping in unseen environments. Finally, it shows that with the right tooling and configuration, pre-trained models can be combined to perform zero-shot tasks with no training. OK-Robot could mark the beginning of a field of research with plenty of room for improvement.
