Meta releases first Llama 4 models
Both are distilled from a larger model that is unreleased and still in training.
The free lunch of unlimited requests to GitHub Copilot's premium models is over: rate limits arrive in May. Any request to a model other than the base model (currently OpenAI GPT-4o) counts as premium, so Sonnet, Gemini, o1, etc. The monthly limits are as follows:
- Copilot Free: 50/month
- Copilot Pro: 300/month
- Copilot Pro+: 1500/month
- Copilot Business: 300/month
- Copilot Enterprise: 1000/month
The premium models also carry a rate multiplier: Claude Sonnet 3.5/3.7 have a multiplier of 1, while o1 and GPT-4.5 have multipliers of 10 and 50 respectively.
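As a rough illustration of how the multipliers interact with the monthly allowances (the figures are from the announcement above; the helper function itself is hypothetical, not part of any GitHub API):

```python
def requests_allowed(monthly_quota: int, multiplier: float) -> int:
    """Hypothetical helper: each request to a premium model consumes
    `multiplier` premium-request credits from the monthly quota."""
    return int(monthly_quota // multiplier)

# With Copilot Pro's 300 premium requests/month:
print(requests_allowed(300, 1))   # Claude Sonnet 3.5/3.7 -> 300 requests
print(requests_allowed(300, 10))  # o1 -> 30 requests
print(requests_allowed(300, 50))  # GPT-4.5 -> 6 requests
```

In other words, a Pro subscriber burning the whole quota on GPT-4.5 gets just six requests for the month.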
The news comes with the announcement of the GitHub Copilot Pro+ plan, a new individual tier priced at $39 USD/month with 1500 premium-model requests/month.
Gemini 2.5 Pro pricing is out
For prompts less than 200k tokens:
- Input: $1.25/mil tokens
- Output: $10/mil tokens
For prompts over 200k tokens:
- Input: $2.50/mil tokens
- Output: $15/mil tokens
Importantly, this bumps the rate limit to at least 150 RPM and 1,000 RPD on Tier 1.
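A quick back-of-the-envelope for what a call costs under this tiered pricing (a sketch only; it assumes the tier is selected per prompt and that exactly 200k tokens falls in the lower tier, which the announcement doesn't spell out):

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate USD cost of one Gemini 2.5 Pro call.

    Prompts up to 200k tokens: $1.25/M input, $10/M output.
    Prompts over 200k tokens:  $2.50/M input, $15/M output.
    """
    if input_tokens <= 200_000:
        in_rate, out_rate = 1.25, 10.0
    else:
        in_rate, out_rate = 2.50, 15.0
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 100k-token prompt with a 5k-token response:
print(round(gemini_25_pro_cost(100_000, 5_000), 4))  # 0.175
```

So a fairly large 100k-token prompt still comes in well under 20 cents, while crossing the 200k boundary doubles the input rate.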
The success of the new ChatGPT 4o image generation caused the rollout to be delayed, but it's now available to free users, rate-limited to 3 generations per day.
Not sure what tangible changes this might produce considering Elon already owned both, but xAI has acquired X: "xAI and X's futures are intertwined. Today, we officially take the step to combine the data, models, compute, distribution and talent. This combination will unlock immense potential by blending xAI's advanced AI capability and expertise with X's massive reach."
The next version of Docker (v4.40) will add native LLM capability to the Docker CLI. Docker Model Runner is not yet publicly released, but adds commands like docker model run that run models outside of containers. Initial reports look promising and it may be a nice replacement for running llama.cpp, koboldcpp or ollama locally.
OpenAI are adopting MCP. They’ve already integrated it with their Agents SDK and note they’re also working on MCP support for the OpenAI API and ChatGPT desktop app.
It's been some time since I wrote a browser extension, and it couldn't be easier with wxt, the Next-gen Web Extension Framework. Based on Vite, it can export to both Chrome and Firefox and has an HMR dev mode that feels very familiar.
MCP C# SDK allows C# developers to build MCP clients and servers in the .NET ecosystem.