Unveiling the Secrets of AI Language Models: A Deep Dive

Published On Mon Jan 13 2025

Almost Timely News: 🗞️ A Semi-Technical Deep Dive into AI Language Models (2025-01-12)




Understanding Tokens in AI

This week, let’s do a very deep dive into the technical guts of generative AI, specifically large language models. To get the best results from AI, it helps to understand at least a few of the underlying concepts, so we know why these models behave the way they do.

Generative AI begins with tokens. What’s a token? It’s a unit of information that the AI uses to learn and generate text. Think of it like ingredients in a recipe. To understand language, AI needs to break it down into its basic components: tokens.

For large language models, character-level tokenization is too granular; it makes it hard for the AI to see the bigger picture. Word-level tokenization, on the other hand, would result in a gigantic recipe book, with a separate entry for every possible word. Hence, subword tokenization is used, which breaks words down into familiar parts like “straw,” “berry,” “chocolate,” and “cake.”
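The idea behind subword tokenization can be sketched in a few lines of Python. This is a toy greedy longest-match tokenizer, not the algorithm any particular model uses (real tokenizers like BPE learn their vocabularies from data), and the vocabulary here is a made-up example:

```python
# Hypothetical mini-vocabulary; real models learn tens of thousands of subwords.
VOCAB = {"straw", "berry", "chocolate", "cake"}

def subword_tokenize(word: str) -> list[str]:
    """Greedily split a word into the longest known subword pieces."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest possible match first.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append(piece)
                i = j
                break
        else:
            # No known subword starts here: fall back to a single character.
            tokens.append(word[i])
            i += 1
    return tokens

print(subword_tokenize("strawberry"))     # ['straw', 'berry']
print(subword_tokenize("chocolatecake"))  # ['chocolate', 'cake']
```

The model never sees “strawberry” as one unit; it sees the familiar pieces, which keeps the vocabulary small while still letting it handle words it has rarely encountered.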


Vectorization and Embeddings

Once a text is tokenized, the next step is to convert those tokens into numbers, and then into vectors and embeddings. These embeddings help the model understand where words are located on a map and how they relate to each other. It’s like the coordinates and distances that Google Maps uses to calculate the best “route” between words.

Embeddings allow large language models to understand and generate human language based on the relationships between tokens.
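The “map distance” idea can be made concrete with cosine similarity, the standard way to compare embedding vectors. The three-dimensional vectors below are invented for illustration; real embeddings have hundreds or thousands of dimensions learned during training:

```python
import math

# Hypothetical 3-dimensional embeddings, purely for illustration.
embeddings = {
    "strawberry": [0.9, 0.8, 0.1],
    "blueberry":  [0.8, 0.9, 0.2],
    "asphalt":    [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more related."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Related words sit closer together on the "map" than unrelated ones.
print(cosine_similarity(embeddings["strawberry"], embeddings["blueberry"]))
print(cosine_similarity(embeddings["strawberry"], embeddings["asphalt"]))
```

Because “strawberry” and “blueberry” point in nearly the same direction, their similarity score is high; “strawberry” and “asphalt” point in different directions, so their score is low. That geometry is what the model uses to capture relationships between tokens.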


The Attention Mechanism

Popularized by the 2017 paper “Attention Is All You Need,” the attention mechanism is a novel way of predicting tokens for generative AI. It allows the model to consider a large amount of text when making predictions, not just the few words immediately preceding. This mechanism helps the model decide which words are most relevant to the prediction it’s making at the moment.

Every word that appears on screen can be taken into account when the AI is predicting the next word, with the attention mechanism determining the most relevant words for the prediction.
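The core computation, scaled dot-product attention, can be sketched for a single query in plain Python. This is a simplified illustration with made-up two-dimensional vectors, not the full multi-head machinery of a real transformer:

```python
import math

def softmax(xs):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d = len(query)
    # Score each context token by how similar its key is to the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)  # relevance of each context token
    # Output: a weighted blend of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return output, weights

# Hypothetical keys/values for three context tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, weights = attention([1.0, 0.0], keys, values)
print(weights)  # the weights sum to 1; larger weight = more relevant token
```

Tokens whose keys align with the query get the largest weights, which is exactly the “deciding which words matter most right now” behavior described above.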

Insights into AI Language Models

By understanding tokens, vectorization, embeddings, and the attention mechanism, we gain insights into how generative AI language models operate.