Revolutionizing Autoregressive Models with LLaDA

Published On Tue Feb 18 2025
[AINews] LLaDA: Large Language Diffusion Models • Buttondown

This is AI News, an MVP of a service that goes through all AI Discords, Twitters, and Reddits and summarizes what people are talking about, so you can stay updated without the fatigue. Signing up here opts you in for the real thing when it launches soon.

Chinese AI is all you need? AI News for 2/14/2025-2/17/2025. We scanned 7 subreddits, 433 Twitters, and 29 Discords (211 channels, and 11039 messages) for you. The estimated reading time saved (at 200wpm) is 1163 minutes. You are now able to tag @smol_ai for AINews discussions!

LLaDA: Large Language Diffusion Models

Ahead of the expected Grok 3 release late tonight on a US holiday, today was a day of mostly small news items, so we are awarding today's title story to LLaDA: Large Language Diffusion Models, the first text diffusion model scaled up to be competitive with autoregressive models like Llama 3 8B.

This is a "white whale" alternative LLM architecture that had long been speculated about but never successfully scaled up until now. The main trick is adapting diffusion to predict uniformly masked tokens, so that text is produced by a diffusion process: generation starts from a fully masked sequence and fills in tokens over several denoising steps.
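The masking idea above can be sketched in a few lines of Python. This is a toy illustration, not the LLaDA authors' implementation: the `[MASK]` symbol, the function names, the linear unmasking schedule, and the `predict` callback (a stand-in for a trained mask predictor) are all assumptions made for the example.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this sketch


def forward_mask(tokens, t, rng):
    """Forward (noising) step: mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_unmask(length, predict, steps, rng):
    """Reverse (denoising) loop, assuming a linear schedule.

    Starts from a fully masked sequence. At each step, every masked
    position is filled with the predictor's guess, then a shrinking
    share of those positions is re-masked at random. The final step
    re-masks nothing, so the result contains no MASK symbols.
    """
    seq = [MASK] * length
    for step in range(steps, 0, -1):
        masked_positions = [i for i, tok in enumerate(seq) if tok == MASK]
        guesses = predict(seq)  # one guess per position
        for i in masked_positions:
            seq[i] = guesses[i]
        # Keep (step - 1) / step of the just-filled positions masked.
        keep_masked = round(len(masked_positions) * (step - 1) / step)
        for i in rng.sample(masked_positions, keep_masked):
            seq[i] = MASK
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    target = "text can be generated by iterative unmasking".split()

    # Training-style corruption: t would normally be drawn uniformly from [0, 1].
    print(forward_mask(target, t=0.5, rng=rng))

    # Hypothetical oracle predictor that always returns the target tokens;
    # a real model would return its own predictions for the masked slots.
    oracle = lambda seq: target
    print(reverse_unmask(len(target), oracle, steps=4, rng=rng))
```

With the oracle predictor the loop always ends on the target sentence, which only shows that the schedule unmasks every position. In a real system the predictor is a trained model, and which positions get re-masked may depend on its confidence rather than on random choice.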

Table of Contents

AI Model & Research Releases

Benchmarks & Performance

Tools & Libraries

China & DeepSeek Focus

Perplexity Deep Research & Usage

AI & Society, Ethics, and Future

Humor/Memes

Themes:

Theme 1. Zonos: Open Weight Voice Cloning Model

Theme 2. OpenArc Python API Enhances Intel Inference

Theme 3. DeepSeek-R1: MoE Model CPU Performances

Subreddits: /r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding

Themes:

Theme 1. Nvidia GPUs: Compute Doubling in 10 Months Raises Questions

Theme 2. Advances in Video-to-Video AI: Hunyuan's Harry Potter Anime

Theme 3. Open-Source Video Model Step-Video-T2V: High Demand, High Innovation

Theme 4. AI Agent Apply Hero: Mass Job Applications and its Impact

Theme 5. AI Image Restoration: Upscaling the Windows XP Bliss Wallpaper

A summary of Summaries of Summaries by Gemini 2.0 Flash Thinking:

Theme 1. Grok 3 Ignites AI Debate: Musk's Claims Meet Community Skepticism

Theme 2. DeepSeek Models Drive Performance and Ethical Discussions

Theme 3. Llama 3.2 Challenges VRAM Limits and Sparks Quantization Solutions

Theme 4. RAG vs Fine-tuning Debate Intensifies: Efficiency and Application Focus Emerge

Theme 5. Community-Driven Tooling and Optimization Efforts Advance AI Development

The Gorilla LLM (Berkeley Function Calling) Discord and the AI21 Labs (Jamba) Discord have no new messages. Topics discussed in other Discords included Unsloth hiring challenges, learning resources for AI, fine-tuning models, model uploads on Hugging Face, and personal experiences in tech.

Many other topics, discussions, and links from across the AI field also came up in these conversations.