The Google I/O 2024 keynote gave the company a stage to show off the impressive array of artificial intelligence (AI) models and tools it has been working on. Most of the introduced features will make their way to public previews in the coming months. However, the most interesting technology presented at the event won't arrive for a while. Developed by Google DeepMind, this new AI assistant, called Project Astra, demonstrates real-time interaction based on computer vision.

Project Astra is an AI model that can perform tasks that are extremely advanced for existing chatbots. Google follows a system where it uses its biggest and most powerful AI models to train its production-ready models. Highlighting one such example of an AI model currently being trained, Google DeepMind co-founder and CEO Demis Hassabis demonstrated Project Astra. Introducing it, he said: “Today we have an exciting new advance to share about the future of AI assistants, which we call Project Astra. For a long time, we wanted to create a universal AI agent that could be really useful in everyday life.”

Hassabis also listed the requirements the company has set for such AI agents. They must understand and respond to complex and dynamic real-world environments, and they must remember what they see in order to build context and take action. They must also be teachable and personable, so that they can learn new skills and hold conversations without delay.

Following this description, DeepMind's CEO played a demo video in which a user can be seen holding a smartphone with the camera app open. The user talks to the AI, and the AI instantly responds to various vision-based queries. The AI was also able to use the visual information as context and answer related questions requiring generative abilities. For example, the user showed the AI some crayons and asked it to describe them with alliteration. Without any delay, the chatbot said, “Creative Crayons are coloring merrily. They certainly create colorful creations.”

But that wasn’t all. Later in the video, the user points the camera at a window overlooking buildings and roads. When asked about the neighborhood, the AI immediately gives the correct answer. This demonstrates both the model's computer vision processing and the massive visual dataset that would be required to train it. Perhaps the most interesting demonstration came when the AI was asked about the user's glasses. The glasses had appeared on screen only briefly, a few seconds earlier, and were already out of frame. Yet the AI remembered where they were and guided the user back to them.

Project Astra is not available in either public or private preview. Google is still working on the model, and it needs to understand the use cases for the feature and decide how to make it available to users. The demonstration would have been the most impressive AI feat yet, but OpenAI's Spring Update event a day earlier stole some of its thunder. During that event, OpenAI unveiled GPT-4o, which demonstrated similar capabilities along with emotive voices that made the AI sound more human.