Thursday, 16 Apr 2026
Subscribe
logo
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Font ResizerAa
Data Center NewsData Center News
Search
  • Global
  • AI
  • Cloud Computing
  • Edge Computing
  • Security
  • Investment
  • Sustainability
  • More
    • Colocation
    • Quantum Computing
    • Regulation & Policy
    • Infrastructure
    • Power & Cooling
    • Design
    • Innovations
    • Blog
Have an existing account? Sign In
Follow US
© 2022 Foxiz News Network. Ruby Design Company. All Rights Reserved.
Data Center News > Blog > AI > Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments
AI

Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments

Last updated: January 30, 2024 4:14 am
Published January 30, 2024
Share
Meta’s OK-Robot performs zero-shot pick-and-drop in unseen environments
SHARE

There have been many advances in vision-language models (VLM) that can match natural language queries to objects in a visual scene. And researchers are experimenting with how these models can be applied to robotics systems, which are still lagging in generalizing their abilities.

A new paper by researchers at Meta AI and New York University introduces an open-knowledge-based framework that brings pre-trained machine learning (ML) models together to create a robotics system that can perform tasks in unseen environments. Called OK-Robot, the framework combines VLMs with movement-planning and object-manipulation models to perform pick-and-drop operations without training.

Robotic systems are usually designed to be deployed in previously seen environments and are poor at generalizing their capabilities beyond locations where they have been trained. This limitation is especially problematic in settings where data is scarce, such as unstructured homes.

There have been impressive advances in individual components needed for robotics systems. VLMs are good at matching language prompts to visual objects. At the same time, robotic skills for navigation and grasping have progressed considerably. However, robotic systems that combine modern vision models with robot-specific primitives still perform poorly. 

“Making progress on this problem requires a careful and nuanced framework that both integrates VLMs and robotics primitives, while being flexible enough to incorporate newer models as they are developed by the VLM and robotics community,” the researchers write in their paper.

OK-Robot modules (source: arxiv)

OK-Robot combines state-of-the-art VLMs with powerful robotics primitives to perform pick-and-drop tasks in unseen environments. The models used in the system are trained on large, publicly available datasets. 

See also  Here's how to try Meta's new Llama 3.2 with vision for free

OK-Robot combines three primary subsystems: an open-vocabulary object navigation module, an RGB-D grasping module and a dropping heuristic system. When placed in a new home, OK-Robot requires a manual scan of the interior, which can be captured with an iPhone app that takes a sequence of RGB-D images as the user moves around the building. The system uses the images and the camera pose and positions to create a 3D environment map.

The system processes each image with a vision transformer (ViT) model to extract information about objects. The object and environment information are brought together to create a semantic object memory module.

Given a natural language query for picking an object, the memory module computes the embedding of the prompt and matches it with the object with the closest semantic representation. OK-Robot then uses navigation algorithms to find the best path to the location of the object in a way that provides the robot with room to manipulate the object without causing collisions.

Finally, the robot uses an RGB-D camera, an object segmentation model and a pre-trained grasp model to pick the object. The system uses a similar process to reach the destination and drop the object. This enables the robot to find the most suitable grasp for each object and also be able to handle destination spots that might not be flat.

“From arriving into a completely novel environment to start operating autonomously in it, our system takes under 10 minutes on average to complete the first pick-and-drop task,” the researchers write.

The researchers tested OK-Robot in 10 homes and ran 171 pick-and-drop experiments to evaluate how it performs in novel environments. OK-Robot succeeded in completing full pick-and-drops in 58% of cases. Notably, this is a zero-shot algorithm, which means the models used in the system were not specifically trained for such environments. The researchers also found that by improving the queries, decluttering the space, and excluding adversarial objects, the success rate increases to above 82%.

See also  Secure I.T. Environments boosts energy efficiency at Isle of Wight NHS Trust’s Data Centre

OK-Robot is not perfect. It sometimes fails to match the natural language prompt with the right object. Its grasping model fails on some objects, and the robot hardware has limitations. More importantly, its object memory module is frozen after the environment is scanned. Therefore, the robot cannot dynamically adapt to changes in the objects and arrangements.

Nonetheless, the OK-Robot project has some very important findings. First, it shows that current open-vocabulary vision-language models are very good at identifying arbitrary objects in the real world and navigating to them in a zero-shot manner. Also, the findings show that special-purpose robot models pre-trained on large amounts of data can be applied out-of-the-box to approach open-vocabulary grasping in unseen environments. Finally, it shows that with the right tooling and configuration, pre-trained models can be combined to perform zero-shot tasks with no training. OK-Robot can be the beginning of a field of research with plenty of room for improvement.

Source link

TAGGED: Environments, Metas, OKRobot, performs, pickanddrop, unseen, zeroshot
Share This Article
Twitter Email Copy Link Print
Previous Article How to enable Stolen Device Protection on your iOS device How to enable Stolen Device Protection on your iOS device
Next Article Generative AI image What is Generative AI?
Leave a comment

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Your Trusted Source for Accurate and Timely Updates!

Our commitment to accuracy, impartiality, and delivering breaking news as it happens has earned us the trust of a vast audience. Stay ahead with real-time updates on the latest events, trends.
FacebookLike
TwitterFollow
InstagramFollow
YoutubeSubscribe
LinkedInFollow
MediumFollow
- Advertisement -
Ad image

Popular Posts

Alibaba is developing an AI inference chip amid US export curbs

Specialists say Alibaba’s AI Chip may additionally deal with the shortcomings of Huawei’s 910B and…

September 2, 2025

Harnessing AI for corporate cybersecurity

Cybersecurity is within the midst of a contemporary arms race, and the highly effective weapon…

August 22, 2025

Nokia boosts industrial edge capabilities with AI, automation, and safety innovations

Nokia has expanded its industrial utility portfolio with six new Business 4.0 purposes on the…

April 3, 2025

Data Center Logical Security Market Forecast Report 2024-2029, with Focus on Bosch, Check Point Software Technologies, Trend Micro, Italtel, HP, VMWare, Fortinet, Intel, IBM and Cisco Systems

Firm EmblemInternational Knowledge Middle Logical Safety MarketInternational Knowledge Middle Logical Safety MarketDublin, Could 02, 2024…

May 2, 2024

Samsung AI-RAN demo signals telecom cloud shift at MWC 2026

Samsung moved synthetic intelligence nearer to dwell telecom infrastructure at MWC 2026, the place it…

March 2, 2026

You Might Also Like

Citizen developers now have their own Wingman
AI

Citizen developers now have their own Wingman

By saad
Commvault launches a ‘Ctrl-Z’ for cloud AI workloads
AI

Commvault launches a ‘Ctrl-Z’ for cloud AI workloads

By saad
Agricultural drones get smarter for large farm holdings
AI

Agricultural drones get smarter for large farm holdings

By saad
Hyundai expands into robotics and physical AI systems
AI

Hyundai expands into robotics and physical AI systems

By saad
Data Center News
Facebook Twitter Youtube Instagram Linkedin

About US

Data Center News: Stay informed on the pulse of data centers. Latest updates, tech trends, and industry insights—all in one place. Elevate your data infrastructure knowledge.

Top Categories
  • Global Market
  • Infrastructure
  • Innovations
  • Investments
Usefull Links
  • Home
  • Contact
  • Privacy Policy
  • Terms & Conditions

© 2024 – datacenternews.tech – All rights reserved

Welcome Back!

Sign in to your account

Lost your password?
We use cookies to ensure that we give you the best experience on our website. If you continue to use this site we will assume that you are happy with it.
You can revoke your consent any time using the Revoke consent button.