OpenAI on Monday announced GPT-4o, a brand new AI model that the company says is one step closer to “much more natural human-computer interaction.” The new model accepts any combination of text, audio and images as input and can generate output in all three formats. It’s also capable of recognizing emotion, lets you interrupt it mid-speech, and reacts almost as fast as a human during conversations.
“What’s special about GPT-4o is that it’s a GPT-4 level of intelligence for everyone, including our free users,” said OpenAI CTO Mira Murati during a live presentation. “This is the first time we’ve taken a huge step forward when it comes to ease of use.”
During the presentation, OpenAI showed GPT-4o translating live between English and Italian, helping a researcher solve a linear equation in real time on paper, and providing deep breathing guidance to another OpenAI executive simply by listening to his breathing.
The “o” in GPT-4o stands for “omni,” a reference to the model’s multimodal capabilities. OpenAI said GPT-4o is trained across text, vision and audio, meaning all inputs and outputs are processed by the same neural network. That’s a departure from the company’s previous models, GPT-3.5 and GPT-4, which let users ask questions by speaking but first transcribed that speech into text, a pipeline that stripped out tone and emotion and made the interaction slower.
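For developers, the practical upshot of that unified design is that a single request can mix modalities. As a rough illustration only, here is a minimal sketch of a text-plus-image request using the OpenAI Python SDK; the prompt and image URL are placeholders, and audio input isn’t shown since it wasn’t part of this endpoint at launch.

```python
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment.
client = OpenAI()

# One request that combines a text prompt with an image URL (placeholder URL).
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

# The model's reply comes back as ordinary text.
print(response.choices[0].message.content)
```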
OpenAI is making the new model available to everyone, including free ChatGPT users, over the next few weeks. It’s also releasing a desktop version of ChatGPT, initially for Mac, which paid users can access starting today.
OpenAI’s announcement comes a day before Google I/O, the company’s annual developer conference. Shortly after OpenAI revealed GPT-4o, Google released a version of Gemini, its own AI chatbot, with similar capabilities.
https://www.engadget.com/openai-claims-that-its-free-gpt-4o-model-can-talk-laugh-sing-and-see-like-a-human-184249780.html