Unpacking the Latest AI Innovations from OpenAI and Google

Published on June 4, 2024

Takeaways from OpenAI and Google's May announcements

A few weeks ago, both OpenAI and Google announced a number of new AI features and products. Although it feels like forever ago given how quickly AI has been progressing, there were a few themes from the announcements that I felt were worth unpacking, which is what I’ll do in this piece.

OpenAI's GPT-4o (omni)

Through its new flagship model, GPT-4o (“omni”), OpenAI showcased a system that can reason across audio, vision, and text in real time. In practice, this means the model understands all of those inputs directly, and can generate outputs in those same modalities directly.
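To make that concrete, here is a minimal sketch of sending mixed text-and-image input to GPT-4o through the OpenAI Python SDK’s chat completions endpoint. The prompt and image URL are placeholder assumptions, and audio in/out is not shown, since it was not exposed through this endpoint at launch.

```python
# Minimal sketch: mixed text + image input to GPT-4o via the OpenAI Python SDK.
# The prompt and image URL below are placeholders, not from the announcements.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```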


Google's Project Astra

Google also showcased Project Astra, a universal AI agent that can likewise reason across these modalities in real time and respond in them directly.


As an example of how this changes things, voice agents today are typically built in three steps:

  1. Transcribe the user’s speech to text (speech recognition).
  2. Feed the transcript to a language model, which generates a text response.
  3. Convert that response back into audio (text-to-speech).

These new models can do all of this natively, skipping the intermediate steps and directly producing speech in response to what they hear. That is a game changer for latency, but also for the models’ ability to pick up tone, emotion, and other vocal cues and produce responses that sound far more natural.
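For a rough picture of what gets collapsed, here is a minimal sketch of the traditional three-step pipeline using the OpenAI Python SDK; the model names, voice, and file paths are illustrative assumptions rather than anything from the announcements. A native speech-to-speech model replaces all three round trips with a single call.

```python
# Sketch of the traditional three-step voice-agent pipeline.
# Each hop adds latency, and the audio's tone/emotion is lost at step 1.
from openai import OpenAI

client = OpenAI()

# Step 1: speech-to-text (transcribe the user's audio).
with open("user_question.wav", "rb") as audio_file:  # placeholder path
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Step 2: run the transcript through a text-only LLM.
chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)
reply_text = chat.choices[0].message.content

# Step 3: text-to-speech (synthesize the reply).
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply_text,
)
speech.stream_to_file("agent_reply.mp3")
```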

Key Takeaways

The biggest unlock in the short to medium term from these announcements will be in two areas:

  • Developers using OpenAI’s and Google’s models are seeing improvements in cost and latency for a given level of performance.
  • AI integrated everywhere, deeply at both the operating system level and within every application, is becoming a prominent theme.

These launches once again raise the key question of where the startup opportunities lie in the AI space.

OpenAI's Consumer Ambitions

It’s rare to see a “startup” simultaneously execute on as many different initiatives across developers/B2B and consumer as OpenAI is doing. On one hand, critics may argue there’s some cookie licking going on across too many different areas. On the other, OpenAI is continuing to execute quite successfully on an ambitious and wide-ranging roadmap across both B2B and consumer.

Between the launch of a desktop app on Mac, a “Her”-like all-encompassing assistant enabled by GPT-4o, and reports that OpenAI may soon partner with Apple on its next-generation assistant, OpenAI’s models may soon be used by a significant number of people directly from an OpenAI product.

In addition, OpenAI is making its flagship GPT-4o model available on the ChatGPT free plan. This is partially to collect more multimodal data (and it highlights how much the cost to serve these models has come down), but it also signals that OpenAI wants to keep leading the race for the de facto consumer AI assistant, amid recent competition from Google and from Meta with the launch of Meta AI.