Exploring the Advancements in AI Assistants and Technology

AI Week in Review 25.06.07 - by Patrick McGuinness

Google has released an upgraded Gemini 2.5 Pro with improvements across the board. This latest boost makes the already-leading Gemini 2.5 Pro the best publicly available AI model (at least for now), scoring 21.6% on Humanity’s Last Exam, 86.4% on GPQA Diamond, 88.0% on AIME 2025, and 82.2% on Aider Polyglot. The update supports thinking budgets and fixes areas where the May code-focused update regressed from the March update. In Google’s words: “We also addressed feedback from our previous 2.5 Pro release, improving its style and structure; it can be more creative with better-formatted responses.” This is likely the final tweak to Gemini 2.5 Pro, which has incrementally evolved since December: this model will be the generally available, stable version starting in a couple of weeks, ready for enterprise-scale applications.

Mistral AI Enters the Coding Assistant Arena

Mistral AI has announced Mistral Code, joining the AI coding assistant competition with an enterprise coding assistant that emphasizes customization: local deployment, fine-tuning on private codebases, and specialized AI models. Mistral Code is built on the open-source coding assistant Continue and works in JetBrains and VS Code IDEs. Its emphasis on the enterprise concerns of security, compliance, and data privacy positions Mistral Code as an interesting alternative to rivals like GitHub Copilot, Cursor, and OpenAI Codex.

Latest Updates in AI Assistants

Cursor has released Cursor 1.0, adding several features to their leading AI coding assistant: BugBot for automatic PR review, Background Agent, support for Jupyter (IPython) notebooks, memories extracted from conversations, and one-click MCP install for easier MCP setup.

Anthropic’s Claude Code is now included in the Claude Pro plan. Pro plan subscribers can use their rate limits for both the Claude apps and Claude Code. On a related note, OpenAI’s Codex coding assistant is now available to Plus members and has internet access.

EleutherAI released Common Pile v0.1, an 8 TB collection of openly licensed and public domain text for AI model training. It was developed with a broad group of partner organizations over two years. The dataset was used to train the new Comma models, showing that models trained on this open dataset perform competitively with those trained on copyrighted material. EleutherAI aims to demonstrate the viability of openly licensed data and promote transparency amidst ongoing AI copyright lawsuits.

ElevenLabs revealed their V3 alpha model, ‘the most expressive TTS model’ yet. It can speak in 70+ languages, supports multi-speaker dialogue and emotional, expressive audio tags such as [excited], [sighs], [laughing], and [whispers]. The ElevenLabs V3 demo is very impressive; these voices pass the audio Turing test.
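To make the audio-tag idea concrete, here is a minimal sketch of how a script annotated with bracketed tags could be separated into its tags and its spoken text. It does not call the ElevenLabs API; the function name `split_audio_tags` and the tag grammar (lowercase words in square brackets) are assumptions inferred only from the examples above.

```python
import re

# Assumed tag grammar: lowercase words in square brackets, e.g. [excited].
# This is inferred from the examples in the text, not from an official spec.
TAG_PATTERN = re.compile(r"\[([a-z ]+)\]")


def split_audio_tags(script: str) -> tuple[list[str], str]:
    """Return (tags in order of appearance, script text with tags removed)."""
    tags = TAG_PATTERN.findall(script)
    plain = TAG_PATTERN.sub("", script)
    # Collapse the whitespace left behind where tags were removed.
    plain = re.sub(r"\s+", " ", plain).strip()
    return tags, plain


if __name__ == "__main__":
    demo = "[excited] We shipped it! [whispers] Do not tell anyone yet."
    print(split_audio_tags(demo))
```

A helper like this could be handy for checking which tags a script uses before sending it to a TTS service, or for producing a plain-text caption from the same script.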

Advancements in AI Technology

Captions introduced Mirage Studio, which helps users “craft talking videos with AI,” providing talking-head avatar videos similar to HeyGen. In the company’s words: “Generate expressive videos at scale, with actors that actually look and feel alive. Our actors laugh, flinch, sing, rap; all of course, per your direction. Just upload an audio, describe the scene, or drop in a reference image, and create energetic content in minutes.” You can use it for explainer videos or marketing, or even make your own music video from a Suno AI music track.

OpenAI made several incremental feature updates to ChatGPT. OpenAI released connectors allowing Deep Research to connect to local data on various platforms. MCP support will be available to Pro users. ChatGPT now has a Record Mode to capture, transcribe, and summarize meetings straight into ChatGPT. It produces structured output and a full transcript with timestamps.

OpenAI is upgrading ChatGPT Advanced Voice, making interactions feel more fluid and human-like with enhancements in AI voice output intonation and naturalness. ChatGPT now has expanded memory capabilities for free users. The model can reference recent conversation history to produce more contextually aware and personalized responses.

OpenAI's business user base has surged 50% since February, reaching 3 million paying business users.

Latest Features by Tech Giants

Google is rolling out “scheduled actions” in the Gemini app for paid subscribers, enabling recurring or one-off tasks to run at set times. This feature offers capabilities akin to ChatGPT's recurring actions.

Google is rolling out interactive chart visualizations in AI Mode in Labs to help bring financial data to life for questions on stocks and mutual funds. Powered by Gemini, this feature allows users to compare and analyze real-time and historical financial information via AI-generated interactive graphs and explanations.

Google began testing “Search Live” in AI Mode on select Android and iOS devices, introducing real-time voice and video conversational search capabilities that allow the AI to ask clarifying questions and process camera inputs for contextual responses.

Windows Insiders on Copilot+ PCs gained access to “Relight”, a new AI-powered feature in Microsoft Photos that offers dynamic lighting controls to adjust the illumination of pictures post-capture.

Anthropic open-sourced its circuit tracing tools to demystify LLMs' “black box” nature for developers and enterprises. Anthropic’s circuit tracing tool uses mechanistic interpretability to understand internal model workings, investigate errors, and enable granular fine-tuning.

Current Research and Studies

A new study, “How much do language models memorize?”, reports that GPT-style LLMs have a fixed memorization capacity of approximately 3.6 bits per parameter. The research indicates that models don't memorize more as training data grows; instead, they spread this fixed capacity over more data, so less is memorized per sample. As a result, training on more data leads to better generalization.
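A quick back-of-the-envelope calculation shows what the 3.6 bits-per-parameter figure implies. The sketch below is only illustrative arithmetic, not the paper's measurement method; the function names and the even-spread assumption (capacity divided equally across samples) are simplifications introduced here.

```python
# Approximate capacity figure reported in the study.
BITS_PER_PARAM = 3.6


def capacity_bits(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Total memorization capacity in bits for a model of the given size."""
    return num_params * bits_per_param


def bits_per_sample(num_params: int, num_samples: int) -> float:
    """Capacity available per training sample, ASSUMING an even spread.

    Real models do not memorize uniformly; this is an upper-level average
    used only to show the trend as the dataset grows.
    """
    return capacity_bits(num_params) / num_samples


def bits_to_megabytes(bits: float) -> float:
    """Convert bits to decimal megabytes (1 MB = 10**6 bytes)."""
    return bits / 8 / 1e6


if __name__ == "__main__":
    params = 1_000_000_000  # a hypothetical 1B-parameter model
    print(f"capacity: {bits_to_megabytes(capacity_bits(params)):.0f} MB")
    for samples in (10**6, 10**7, 10**8):
        print(f"{samples:>11,} samples -> {bits_per_sample(params, samples):,.0f} bits each")
```

Under these assumptions, a 1B-parameter model has roughly 3.6 billion bits (about 450 MB) of capacity in total, and each tenfold increase in training samples cuts the average capacity per sample by a factor of ten.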

AI Research Highlights

Our AI Research Review for this week covered AI research on self-improving AI and entropy management in RL training for reasoning:

Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
The Entropy Mechanism of RL for Reasoning Language Models
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective RL for LLM Reasoning
Skywork Open Reasoner 1 Technical Report