“The real magic begins once we start to chat with the machines” – Futurist Jim Carroll

In the TV series The Jetsons, the humans regularly talked to the robots.

That future isn’t that far away. Watch the video in which the Google DeepMind research group uses ChatGPT-like commands to instruct a robotic arm, which relies on machine vision to identify and work with a particular object. In this case, it has been asked to identify and lift the extinct animal. Using its AI-based machine-vision analysis, it figures out which animal figure is the extinct one and proceeds accordingly. Imagine the next command being something as simple as: “Find the king of the jungle and place it next to the sports item used by LeBron James.” Magical!

The full details of this not-too-small achievement can be found on an extensive page that covers all the work behind the scenes: “RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control.” The abstract states:

We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web.

In other words, we’re learning how to use large language models (the tech behind Bard, ChatGPT, and Bing) to figure out what to do and to translate those results into actions that are given to the robots. Now consider this: we already know how to use our voice to talk to computers (think Siri, Alexa, and more), so simply talking directly with the robots via something like ChatGPT is the obvious next step.

Similarly, Microsoft is exploring the integration between robotics and AI.

They state similar goals for their project, “ChatGPT for Robotics: Design Principles and Model Abilities”:

Have you ever wanted to tell a robot what to do using your own words, like you would to a human? Wouldn’t it be amazing to just tell your home assistant robot: “Please warm up my lunch”, and have it find the microwave by itself? Even though language is the most intuitive way for us to express our intentions, we still rely heavily on hand-written code to control robots. Our team has been exploring how we can change this reality and make natural human-robot interactions possible using OpenAI’s new AI language model, ChatGPT.

Add to this evolution the fact that sophisticated robotic technology is well on the way. All you need to do is browse the Boston Dynamics page for insight into the evolution and roles of their robots Spot and Stretch. Spend some time reading about the real-world use of this technology in warehouses, construction sites, and manufacturing facilities. And then, watch the video of how this particular robot is actually being used at a power plant:

Then, for the fun of it, watch these robots in their latest dance video:

Now watch 20 years of the evolution of this robotic technology:

Now imagine telling the robot what to do simply by talking to it, or by asking ChatGPT a question and having it figure out the next steps.

The future is magic, and it’s not that far away.




