Google DeepMind Introduces Gemini Robotics for Real-World Tasks
Google DeepMind has unveiled Gemini Robotics, a new vision-language-action (VLA) model designed to bring AI into the physical world. Built upon Gemini 2.0, this model allows robots to perform a wider range of real-world tasks by adding physical actions as a new output modality.
Key Features of Gemini Robotics
- Generality: Adapts to diverse situations and handles new objects and instructions effectively, even in unfamiliar environments. This adaptability stems from Gemini's inherent understanding of the world.
- Interactivity: Responds quickly to instructions and environmental changes, understanding commands in everyday conversational language and multiple languages.
- Dexterity: Performs complex, multi-step tasks requiring precise manipulation, such as origami folding or packing items.
Gemini Robotics-ER for Enhanced Spatial Reasoning
Alongside Gemini Robotics, Google introduced Gemini Robotics-ER, a vision-language model with enhanced spatial reasoning capabilities. This model allows robots to better understand the world around them, enabling more intuitive interactions with objects. For example, it can determine the best way to grasp a coffee mug by its handle.
Real-World Applications and Partnerships
Gemini Robotics is compatible with various robot form factors, including bi-arm and humanoid robots. Google is working with trusted testers such as Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools to explore real-world applications. This development aligns with Google's vision of robotics as a testing ground for AI advancements in the physical world.