OpenAI showcases the real-time 'magic' of GPT-4o

“Omnimodel” makes ChatGPT more accessible

OpenAI debuted GPT-4o yesterday, which the company hopes will offer users more human-like interaction in real time via live voice conversation, video streams and text.

The model will roll out over the next few weeks and will be free for all users through both the ChatGPT app and the web interface. Users who subscribe to OpenAI's paid tiers, which start at $20 per month, will be able to make more requests.

OpenAI CTO Mira Murati, leading the live demonstration, said:

"GPT-4o reasons across voice, text and vision, and this is incredibly important, because we're looking at the future of interaction between ourselves and machines."

Whilst GPT-4 provided multiple ways to interact with OpenAI's products, these were only accessible via separate models, which slowed response times and increased computing costs.

GPT-4o now offers all of these capabilities in a single "omnimodel." This should lead to faster responses and smoother transitions between tasks.
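For developers, GPT-4o is exposed through the same API as earlier models, so a single request can now carry mixed modalities. Below is a minimal sketch using OpenAI's official Python SDK; the image file name and prompt are illustrative, and an `OPENAI_API_KEY` environment variable is assumed.

```python
# Minimal sketch: one request carrying both text and an image to the
# single GPT-4o model, via OpenAI's Python SDK (pip install openai).
# Assumes OPENAI_API_KEY is set in the environment.
import base64
from openai import OpenAI

client = OpenAI()

# Encode a local image as a base64 data URL so it can travel in the request.
# "whiteboard.png" is a placeholder file name for this example.
with open("whiteboard.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What equation is on this whiteboard?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Because text and vision are handled by one model, there is no hand-off between a captioning model and a language model, which is where the latency savings are claimed to come from.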

The new release makes ChatGPT more accessible because it allows users to interact with the chatbot as they would with a digital assistant like Siri or Alexa.

Users can even ask GPT-4o-powered ChatGPT a question and interrupt it while it's answering, which prompts the model to reassess its answer. In addition to the real-time responsiveness, OpenAI claims that the model can pick up on nuances in a user's voice and fine-tune its response in "a range of different emotive styles."
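OpenAI hasn't published how interruption works inside the voice interface, but a rough text-mode analogue can be sketched with the SDK's streaming interface: consume the reply incrementally and abandon the stream mid-answer. The 50-chunk cutoff below is a hypothetical stand-in for a real interruption signal, such as the user starting to speak.

```python
# Illustrative sketch: stream a GPT-4o answer chunk-by-chunk and stop
# early, loosely analogous to interrupting the assistant mid-reply.
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain transformers in detail."}],
    stream=True,
)

received = []
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    received.append(delta)
    print(delta, end="", flush=True)
    # Hypothetical interruption: bail out after ~50 chunks, standing in
    # for a user cutting the answer short.
    if len(received) > 50:
        stream.close()  # abandon the rest of the HTTP stream
        break
```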

OpenAI CEO Sam Altman said the new technology "feels like magic", writing in a blog post that it was "the best computer interface" he had ever used.

"It feels like AI from the movies; and it's still a bit surprising to me that it's real," he wrote.

"The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different. It is fast, smart, fun, natural, and helpful."

GPT-4o also boasts enhanced translation capabilities, with support for around 50 languages.
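Translation is done through ordinary prompting rather than a separate feature. A minimal sketch, with an arbitrary English-to-Spanish pairing chosen for illustration:

```python
# Sketch: using GPT-4o as a conversational translator via a system
# prompt. The language pair here is arbitrary; OpenAI cites support
# for roughly 50 languages.
from openai import OpenAI

client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a live interpreter. Translate English to Spanish.",
        },
        {"role": "user", "content": "Where is the nearest train station?"},
    ],
)
print(reply.choices[0].message.content)
```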

Nonetheless, OpenAI is clearly wary of the risks of such user-friendly voice interaction, as the company has said it plans to first launch support for GPT-4o's voice capabilities to "a small group of trusted partners" over the next few weeks.

OpenAI is trying to address some of the shortcomings of chatbots, such as their propensity to hallucinate, or simply make things up. In a blog post accompanying the launch, OpenAI explained that GPT-4o has been extensively tested to identify risks arising from bias and misinformation.

"GPT-4o has also undergone extensive external red teaming with 70+ external experts in domains such as social psychology, bias and fairness, and misinformation to identify risks that are introduced or amplified by the newly added modalities. We used these learnings to build out our safety interventions in order to improve the safety of interacting with GPT-4o. We will continue to mitigate new risks as they're discovered."

Murati also said during the demonstration:

"GPT-4o presents new challenges for us when it comes to safety, because we're dealing with real-time audio, real-time vision and our team has been hard at work, figuring out how to build in mitigations against misuse. We continue to work with different stakeholders out there from government, media, entertainment, all industries, red-teamers, and civil society about how to best bring these technologies into the world."

Whilst clearly tightly managed, the demonstration didn't go perfectly: the chatbot started to solve an equation it hadn't been shown and mistook an image of a smiling man for a wooden surface.

The timing of the event has also been noted. Today, Google is due to announce its latest GenAI developments at its annual conference, Google I/O.