Google is training its robots with the Gemini artificial intelligence model to help them navigate their surroundings and perform tasks more effectively, The Verge reports.
The DeepMind robotics team explains in a new paper how Gemini 1.5 Pro's long context window lets users interact with its RT-2 robots more easily through natural-language instructions.
The approach starts with filming a video tour of a given area, such as a house or office. The researchers then use Gemini 1.5 Pro to have the robot "watch" the video and learn about the environment.
The robot can then carry out commands based on what it has seen. For example, if a user shows it a phone and asks, "Where can I charge this?", the robot will lead them to a power outlet.
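To illustrate the general idea, here is a minimal sketch of how a long-context multimodal model could be asked such a question about a pre-recorded video tour, using Google's public google-generativeai Python SDK. This is not DeepMind's actual robot pipeline; the file name, prompt, and polling loop are illustrative assumptions.

```python
# Illustrative sketch only: a long-context multimodal model answers a
# navigation question about a pre-recorded video tour. This is NOT
# DeepMind's robot system; file names and the prompt are assumptions.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Upload the video tour of the workspace (e.g. a walkthrough of an office).
tour = genai.upload_file(path="tour.mp4")

# Wait until the uploaded file has been processed and is ready for prompting.
while tour.state.name == "PROCESSING":
    time.sleep(5)
    tour = genai.get_file(tour.name)

model = genai.GenerativeModel("gemini-1.5-pro")

# Ask a navigation question grounded in what the video shows.
response = model.generate_content([
    tour,
    "You are guiding a mobile robot through the space shown in this video. "
    "A user holding a phone asks: 'Where can I charge this?' "
    "Describe the nearest power outlet and how to reach it.",
])
print(response.text)
```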
According to DeepMind, the Gemini-powered robot achieved a 90% success rate on more than 50 user instructions issued in a workspace of about 840 square meters (roughly 9,000 square feet).
The researchers also have "preliminary evidence" that Gemini 1.5 Pro lets the robots plan how to fulfill instructions that go beyond simple navigation. At present, the robot takes 10 to 30 seconds to process each instruction.