At I/O 2024, Google’s teaser for Project Astra gave us an idea of where AI assistants are headed. It’s a multimodal feature that combines the intelligence of Gemini with the kind of image recognition you get from Google Lens, plus powerful natural language responses. But while the promo video was slick, trying Astra out in person made it clear that it has a long way to go before it lands on your phone. Here are three takeaways from our first experience with Google’s next-generation AI.

Sam’s take:

Currently, most people interact with digital assistants using their voice, so Astra’s multimodality (i.e., using images and sound in addition to text and speech) to communicate with an AI is relatively new. In theory, it allows computer-based entities to operate and behave more like a true assistant or agent (one of Google’s big buzzwords for the show) rather than something more robotic that simply responds to spoken commands.

The first Astra demo project we tried used a large touchscreen connected to a downward-facing camera.

Photo by Sam Rutherford/Engadget

In our demo, we could ask Astra to tell a story based on objects we placed in front of the camera, and it spun a charming tale about a dinosaur and his trusty baguette trying to escape an ominous red light. The story was fun and cute, and the AI worked about as well as you’d expect. But it was still a far cry from the seemingly omniscient assistant in Google’s teaser. And aside from perhaps entertaining a child with an original bedtime story, it didn’t feel like Astra was doing as much with the information as you might want.

My colleague Carissa then drew a bucolic scene on the touchscreen, and Astra correctly identified the flower and sun she had drawn. But the most engaging demo came when we returned a second time with Astra running on a Pixel 8 Pro. This let us point its camera at a collection of objects while it tracked and remembered the location of each one. It was even smart enough to recognize my outfit and where I had hidden my sunglasses, even though those items weren’t originally part of the demo.

In some ways, our experience highlighted the potential highs and lows of AI. Just having a digital assistant tell you where you left your keys, or how many apples were in your fruit bowl before you head to the grocery store, could save you real time. But after talking to some of the researchers behind Astra, it’s clear there are still many hurdles to overcome.

An AI-generated story about a dinosaur and a bagel created by Google's Astra project

Photo by Sam Rutherford/Engadget

Unlike many of Google’s recent AI features, Astra (which Google describes as a “research preview”) still relies on the cloud rather than running on-device. And while it maintains some level of object persistence, those “memories” last only a single session, which currently spans just a few minutes. Even if Astra could remember things for longer, there are factors like storage and latency to consider: every object Astra recalls risks slowing the AI down, resulting in a more awkward experience. So while it’s clear Astra has a lot of potential, my excitement was tempered by the knowledge that it will be a while before we get fuller functionality.

Carissa’s takeaway:

Of all the generative AI advances, multimodal AI is the one I’m most intrigued by. As powerful as the latest models are, I find it hard to get excited about iterative updates to text-based chatbots. But the idea of an AI that can recognize and respond to queries about your surroundings in real time feels like something out of a sci-fi movie. It also gives a much clearer picture of how the latest wave of AI improvements will find their way into new devices like smart glasses.

Google hinted at this with Project Astra, which may one day have a glasses component, but which is mostly experimental for now. (The video shown during the I/O keynote was apparently a “research prototype.”) Personally, though, Project Astra didn’t quite feel like something out of a science fiction movie to me.

During a demonstration at Google I/O, Project Astra was able to remember the position of objects seen by the phone's camera.

Photo by Sam Rutherford/Engadget

It was able to accurately recognize objects placed around the room and answer nuanced questions about them, such as “which of these toys should a 2-year-old play with?” It could recognize what was in my doodle and make up stories about the different toys we showed it.

But most of Astra’s capabilities seemed on par with what Meta already offers with its smart glasses. Meta’s multimodal AI can also recognize your surroundings and do a bit of creative writing on your behalf. And while Meta also bills those features as experimental, at least they’re widely available.

The Astra feature that may set Google’s approach apart is its built-in “memory.” After scanning a bunch of objects, Astra can still “remember” where specific items were placed. For now, Astra’s memory appears limited to a relatively short window of time, but members of the research team told us it could theoretically be expanded. That would obviously open up even more possibilities for the tech, making Astra feel more like a true sidekick. I don’t need to know where I left my glasses 30 seconds ago, but if it can remember where I left them last night, that will really feel like science fiction come to life.

But, as with much of generative AI, the most exciting possibilities are the ones that haven’t quite happened yet. Astra may get there eventually, but right now it looks like Google still has a long way to go.

Catch up on all the news from Google I/O 2024 right now here!