ThursdAI - Dec 12 - unprecedented AI week - SORA, Gemini 2.0 and More
Hey folks, writing this from beautiful Vancouver, BC, Canada. I'm here for NeurIPS 2024, the biggest ML conference of the year, and let me tell you, this was one hell of a week to not be glued to the screen. After last week's banger, with OpenAI kicking off their 12 days of releases by shipping o1 full and pro mode during ThursdAI, things went parabolic. It seems that all the AI labs decided to just dump EVERYTHING they have before the holidays!
Unprecedented AI Releases
A day after our show, on Friday, Google announced a new Gemini 1206 that became the #1 leading model on LMArena and Meta released Llama 3.3; then on Saturday, xAI released their new image model, code-named Aurora.
On a regular week, the Fri-Sun news above would be enough for a full 2-hour ThursdAI show on its own, but not this week. This week it was barely a 15-minute segment, because so MUCH happened starting Monday that we were barely able to catch our breath. So let's dive into it!
Gemini 2.0 by Google
Believe it or not, Google has absolutely taken the crown away from OpenAI this week with the incredible Gemini 2.0 release. All of us on the show were in agreement that this is a phenomenal release from Google for the 1-year anniversary of Gemini.
Gemini 2.0 Flash beats Pro 002 and Flash 002 on all benchmarks, while being 2x faster than Pro, having a 1M-token context window, and being fully multimodal! The model was announced as fully multimodal on inputs AND outputs, which means it can natively understand text, images, audio, video and documents, and output text, text + images, and audio (so it can speak!). Some of these capabilities are restricted to beta users for now, but we know they exist.
If you remember Project Astra, this is what powers that project. In fact, we had Matt Wolfe join the show; he had early access to Project Astra and demoed it live on the show (see above), powered by Gemini 2.0 Flash.
The most amazing thing is that this functionality, which just 8 months ago was presented to us at Google I/O in a premium booth experience, is now available to everyone, in Google AI Studio, for free! Really, you can try it out right now yourself at https://aistudio.google.com/live, but here's a demo of it helping me proofread this exact paragraph by watching the screen and talking me through it.
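If you'd rather poke at Gemini 2.0 Flash from code instead of the AI Studio UI, here's a minimal sketch using the google-genai Python SDK. It assumes you have an AI Studio API key in a GEMINI_API_KEY environment variable, and the experimental model ID ("gemini-2.0-flash-exp") and SDK details may have changed by the time you read this.

```python
# Minimal sketch: calling Gemini 2.0 Flash via the google-genai Python SDK.
# Assumes a GEMINI_API_KEY env var; the model ID below is the experimental
# one available at launch and may differ from what's current.
import os
from google import genai

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Proofread this paragraph and point out any typos: <your text here>",
)
print(response.text)
```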
SORA by OpenAI
OpenAI has FINALLY released SORA, their long-promised text-to-video and image-to-video (and video-to-video) model (née world simulator) to general availability, including a new website, sora.com, and a completely amazing UI to go with it.
SORA can generate videos at resolutions from 480p up to 1080p and up to 20 seconds long, and they promised those will generate fast, as what they released is actually SORA Turbo! (Apparently SORA 2 is already in the works and will be even more amazing, more on this later.)
OpenAI seems to have severely underestimated how many people would want to generate the 50 videos per month allowed on the Plus account (a Pro account gets you 10x more for $200, plus longer durations, whatever that means), and as of the time of writing these words on ThursdAI afternoon, I still haven't been able to create a sora.com account and try out SORA myself (as I was boarding a plane when they launched it).
I invited one of my favorite video creators, Blaine Brown, to the show. He does incredible video experiments that always go viral, and he had time to play with SORA, so he told us what he thinks both from a video perspective and from an interface perspective.
Blaine had a great take: we all collectively built up so much HYPE over the past 8 months of getting teased that many folks expected SORA to be an incredible one-prompt text-to-video generator, and it's not really that. In fact, if you just send prompts, it's more like a slot machine (which is also confirmed by another friend of the pod, Bilawal).
But the magic starts to come out when additional tools like Blend are brought into play. One example Blaine talked about is the Remix feature, where you can remix videos and adjust the remix strength (Strong, Mild). Another amazing insight Blaine shared is that SORA can be used to fuse two videos that weren't even generated with SORA, with SORA acting as a creative tool to combine them into one.
And lastly, just like Midjourney (and Stable Diffusion before that), SORA has a Featured and a Recent wall of video generations that shows you videos and the prompts others used to create them, for inspiration and learning, so you can remix those videos and learn to prompt better. There are also prompt-extension tools that OpenAI has built in. I loved this discovery and wanted to share it with you; the prompt is "A man smiles to the camera, then holds up a sign. On the sign, there is only a single-digit number (the number of 'r's in 'strawberry')"
Exciting AI Innovations
OpenAI is in the middle of its 12 days of releases, and the other amazing things we got were obviously overshadowed, but they are still cool: Canvas can now run code and be used in custom GPTs, ChatGPT in Apple Intelligence is now widely available with the public release of iOS 18.2, and they announced fine-tuning with reinforcement learning, allowing you to fine-tune o1-mini to outperform o1 on specific tasks with just a few examples. There are 6 more work days to go, and they promised to "end with a bang", so... we'll keep you updated!
This Week's Buzz
Alright, it's time for "This Week's Buzz," our weekly segment. This week I hosted Soumik Rakshit from the Weights and Biases AI Team. Soumik gave us a deep dive into Guardrails, our new set of features in Weave for ensuring reliability in GenAI production! Guardrails serve as a "safety net" for your LLM-powered applications, filtering out inputs or LLM responses that trigger certain criteria or boundaries.
Types of guardrails include prompt injection attacks, PII leakage, jailbreaking attempts, and toxic language, but they can also cover things like mentioning a competitor, selling a product at $0, or citing a policy your company doesn't have.
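To make the idea concrete, here's a minimal, hypothetical sketch of what a guardrail check wrapped around an LLM call can look like. The `detect_pii`, `guarded_completion` and `call_llm` helpers are illustrative names I made up, not Weave's actual Guardrails API; the only real Weave calls used are `weave.init` and `@weave.op` for tracing.

```python
# Illustrative sketch of the guardrail idea (not the actual Weave Guardrails API).
# Only weave.init / @weave.op are real Weave calls; the helpers are hypothetical.
import re
import weave

weave.init("guardrails-demo")  # log traces to a W&B Weave project (requires wandb login)

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def call_llm(prompt: str) -> str:
    # Placeholder for your actual model call (OpenAI, Gemini, etc.)
    return f"Echo: {prompt}"

@weave.op()
def detect_pii(text: str) -> bool:
    """Toy PII check: flag anything that looks like an email address."""
    return bool(EMAIL_RE.search(text))

@weave.op()
def guarded_completion(user_input: str) -> str:
    """Run a pre-call guardrail, call the model, then a post-call guardrail."""
    if detect_pii(user_input):
        return "Sorry, I can't process messages containing personal information."
    response = call_llm(user_input)
    if detect_pii(response):
        return "[response withheld: the model output contained PII]"
    return response

print(guarded_completion("My email is jane@example.com, can you help?"))
```

In the real product the checks are richer (prompt injection, toxicity, jailbreaks, custom business rules), and because the calls are traced, flagged inputs and outputs show up in the Weave UI alongside the rest of your application traces.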
As part of developing the guardrails, Soumik also developed and open-sourced an app to test prompts against them, "Guardrails Genie". We're going to host it so folks can test their prompts against our guardrails, and we're developing both the app and the guardrails in the open, so please check out our GitHub.




















