GPT-4o generates Ghibli Art, Gemini 2.5 Pro is out, Tencent's Hunyuan-T1, and more
Hello AI Enthusiasts! Welcome to the twelfth edition of "This Week in AI Engineering"! GPT-4o introduces powerful native image generation, sparking the viral "Ghibli effect." Meanwhile, Tencent unveils the world's first ultra-large Hybrid-Transformer-Mamba MoE model, Google releases Gemini 2.5 Pro with state-of-the-art reasoning capabilities, Microsoft introduces KBLaM for efficiently plugging knowledge bases into language models, and Anthropic ships a "think" tool for Claude 3.7.
GPT-4o's New Image Generation System
OpenAI has launched a new image generation system integrated directly into GPT-4o, superseding DALL-E by creating images within the language model itself. This multimodal approach improves accuracy and context awareness, since the same model that understands the conversation is the one producing the image. The feature's release has triggered the "Ghibli effect," with users transforming photos into art resembling Studio Ghibli's animation style.
Despite its advancements, OpenAI acknowledges limitations in areas like cropping, hallucinations, and multilingual text rendering, which are expected to be addressed in future updates. To read more about GPT-4o's image generation system, visit here.
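For developers, here is a minimal sketch of what generating an image programmatically could look like. It assumes OpenAI's Python SDK and the `gpt-image-1` model identifier that later exposed this capability in the Images API; at launch the feature lived only inside ChatGPT, so treat the model name as an assumption to verify against current docs.

```python
# Hedged sketch: assumes the OpenAI Python SDK and the "gpt-image-1"
# model id; check OpenAI's current documentation before relying on either.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-1",  # assumption: API identifier for GPT-4o image generation
    prompt="A quiet seaside town in a hand-painted, Ghibli-like style",
    size="1024x1024",
)

# This model returns base64-encoded image data rather than a URL.
with open("seaside.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```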
Google's Gemini 2.5 Pro
Google has introduced Gemini 2.5 Pro, a "thinking" model with reasoning built directly into its response process rather than bolted on as a separate step, which translates into stronger benchmark performance and problem analysis. Gemini 2.5 Pro also excels at visual reasoning and image understanding, making it a versatile foundation for building advanced AI agents.
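A minimal sketch of prompting Gemini 2.5 Pro through Google's `google-genai` Python SDK follows; the exact model string is an assumption (Google shipped the model under an experimental identifier at launch), so check the current model list.

```python
# Hedged sketch using the google-genai SDK; the model string is an
# assumption and may be an "-exp-" identifier depending on release stage.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="A train leaves at 9:40 and arrives at 13:05. How long is the trip?",
)
print(response.text)  # the model reasons internally before answering
```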
Microsoft's KBLaM
Microsoft Research introduces KBLaM (Knowledge Base-Augmented Language Model), an approach for efficiently integrating external knowledge into pre-trained language models. Instead of a separate retrieval system or costly retraining, KBLaM encodes knowledge-base triples as continuous key-value vector pairs and feeds them into the model's attention layers through a "rectangular" attention mechanism, so memory and compute grow linearly with the size of the knowledge base rather than quadratically as with in-context prompting.
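To make the idea concrete, here is a toy numpy sketch of that rectangular attention pattern (not Microsoft's implementation): knowledge triples appear only as extra keys and values, language tokens are the only queries, and causal masking among language tokens is omitted for brevity.

```python
# Toy sketch of KBLaM-style "rectangular" attention. Knowledge tokens are
# pre-encoded key/value vectors that are attended to but never issue queries,
# so the attention matrix is T x (M + T) instead of square over everything.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def rectangular_attention(q_lang, k_lang, v_lang, k_kb, v_kb):
    """q_lang/k_lang/v_lang: (T, d) projections of T language tokens.
    k_kb/v_kb: (M, d) key/value encodings of M knowledge triples."""
    k = np.concatenate([k_kb, k_lang], axis=0)           # (M+T, d)
    v = np.concatenate([v_kb, v_lang], axis=0)           # (M+T, d)
    scores = q_lang @ k.T / np.sqrt(q_lang.shape[-1])    # (T, M+T): rectangular
    return softmax(scores, axis=-1) @ v                  # (T, d)

# Toy usage: 4 language tokens attending over 10 knowledge triples + themselves.
d, T, M = 16, 4, 10
rng = np.random.default_rng(0)
out = rectangular_attention(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                            rng.normal(size=(T, d)), rng.normal(size=(M, d)),
                            rng.normal(size=(M, d)))
print(out.shape)  # (4, 16)
```

Because knowledge tokens never attend to each other, adding more triples only widens the key/value matrices, which is where the linear scaling comes from.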
Tencent's Hunyuan-T1
Tencent has released Hunyuan-T1, an upgraded version of its ultra-large Hybrid-Transformer-Mamba MoE model. Built on the TurboS fast-thinking base architecture, Hunyuan-T1 excels at reading comprehension, Chinese language understanding, and mathematical reasoning, positioning it as a leading competitor among reasoning models.
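Tencent has not published Hunyuan-T1's exact layer recipe, but the general hybrid Transformer-Mamba MoE pattern can be sketched in toy numpy: attention layers interleaved with a linear-time state-space scan, each followed by a routed mixture-of-experts feed-forward. Every name and ratio below is a placeholder, not Tencent's design.

```python
# Illustrative sketch of a hybrid Transformer-Mamba MoE stack; all details
# here are placeholders, since Hunyuan-T1's actual layout is unpublished.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_block(x):
    # Single-head causal self-attention (projection weights omitted for brevity).
    T, d = x.shape
    scores = x @ x.T / np.sqrt(d) + np.triu(np.full((T, T), -1e9), k=1)
    return softmax(scores) @ x

def ssm_block(x, a=0.9):
    # Toy diagonal state-space recurrence, h_t = a*h_{t-1} + x_t: a Mamba-like
    # linear-time scan, minus the input-dependent gating of the real thing.
    h, out = np.zeros_like(x[0]), []
    for x_t in x:
        h = a * h + x_t
        out.append(h)
    return np.stack(out)

def moe_ffn(x, experts, router):
    # Top-1 routing: each token is processed by its highest-scoring expert.
    idx = (x @ router).argmax(axis=-1)
    return np.stack([experts[i](t) for t, i in zip(x, idx)])

T, d, n_exp = 8, 16, 4
rng = np.random.default_rng(0)
experts = [lambda t, W=rng.normal(size=(d, d)) / np.sqrt(d): np.tanh(t @ W)
           for _ in range(n_exp)]
router = rng.normal(size=(d, n_exp))

x = rng.normal(size=(T, d))
for layer in range(6):
    mix = attention_block if layer % 3 == 0 else ssm_block  # placeholder 1:2 ratio
    x = x + mix(x)                        # residual around the sequence mixer
    x = x + moe_ffn(x, experts, router)   # residual around the MoE feed-forward
print(x.shape)  # (8, 16)
```

The appeal of the hybrid is that the state-space layers process long contexts in linear time while the occasional attention layers retain precise token-to-token recall.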
Anthropic's "Think" Tool for Claude 3.7
Anthropic introduces a new "think" tool for Claude 3.7 Sonnet, improving the model's performance on complex tasks that involve long chains of tool calls and multi-step decision-making. The tool is deliberately a no-op: it gives Claude a designated scratchpad to reason about fresh tool results mid-response, making it a low-risk, high-reward upgrade with minimal implementation complexity.
To explore the technical implementation and performance metrics of Anthropic's "think" tool, along with the key differences from extended thinking approaches, visit this page.
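The tool itself is strikingly simple: an ordinary tool definition whose handler does nothing. A minimal sketch with the anthropic Python SDK follows, with the description paraphrased from Anthropic's post.

```python
# Minimal sketch of registering the "think" tool. The schema follows
# Anthropic's published example; the description is paraphrased.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

think_tool = {
    "name": "think",
    "description": "Use the tool to think about something. It will not obtain "
                   "new information or change anything; it just lets you cache "
                   "reasoning when chains of tool calls are needed.",
    "input_schema": {
        "type": "object",
        "properties": {
            "thought": {"type": "string", "description": "A thought to think about."}
        },
        "required": ["thought"],
    },
}

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=[think_tool],  # registered alongside your real tools
    messages=[{"role": "user", "content": "Refund order #1234 per our policy."}],
)
print(response.content)
```

When Claude invokes the tool, the application simply returns an empty tool result; the benefit comes from the written-out thought itself, not from any side effect.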
That concludes this edition of "This Week in AI Engineering." Stay updated on the latest AI advancements and share this newsletter with fellow enthusiasts. Subscribe to receive future updates directly in your inbox. Happy building!