Decoding the ChatGPT Algorithm: A Comprehensive Guide

Published on Wed, Jan 22, 2025

From Concept to Code: Unveiling the ChatGPT Algorithm

For the past two years, ChatGPT and Large Language Models (LLMs) in general have been the big thing in artificial intelligence. Many articles have been published about how to use them, about prompt engineering, and about the logic behind them. Nevertheless, when I started familiarizing myself with the algorithm behind LLMs, the so-called transformer, I had to go through many different sources before I felt I really understood the topic.

In this article, I want to summarize my understanding of Large Language Models. I will explain conceptually, step by step, how LLMs calculate their responses, go deep into the attention mechanism, and demonstrate the inner workings with a code example.

1.1 Introduction to Transformers

Tokens are the basic building blocks for text processing in Large Language Models. The process of splitting text into tokens is called tokenization. Depending on the tokenization model, the resulting tokens can look quite different: some models split text into words, others into subwords or characters. Token vocabularies also include punctuation marks and special tokens such as <start> and <stop>, which control the LLM's output during a user interaction.

The basic idea of tokenization is to split the processed text into tokens drawn from a large but finite vocabulary that the LLM knows.

1.2 Tokenization

Fig. 1.2.1 shows a simple example. The sentence "Let’s go in the garden" is split into the seven tokens "let", "’", "s", "go", "in", "the", and "garden". These tokens are known to the LLM and are represented by internal IDs for further processing.

[Fig. 1.2.1: Tokenization of the sentence "Let’s go in the garden"]
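To make the idea concrete, here is a minimal sketch of such a tokenizer in Python. The tiny vocabulary and the regular-expression split are purely illustrative assumptions; a real LLM uses a learned subword vocabulary with tens of thousands of entries.

```python
import re

# Illustrative toy vocabulary (an assumption for this sketch); real models
# learn subword vocabularies with tens of thousands of entries.
vocab = {"<start>": 0, "<stop>": 1, "let": 2, "'": 3, "s": 4,
         "go": 5, "in": 6, "the": 7, "garden": 8}

def tokenize(text: str) -> list[int]:
    """Split text into known tokens and map each one to its internal ID."""
    pieces = re.findall(r"[a-z]+|'", text.lower())
    return [vocab[p] for p in pieces]

print(tokenize("Let's go in the garden"))
# [2, 3, 4, 5, 6, 7, 8]  ->  "let", "'", "s", "go", "in", "the", "garden"
```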

1.3 Word Embedding

Word embedding translates tokens into large vectors, usually with several hundred to several thousand dimensions. The higher the embedding dimension, the more information the embedding can capture.

[Fig. 1.3.1: Word embedding of the tokenized sentence]

In Fig. 1.3.1, the tokenizer has split the sentence "Let’s go in the garden" into tokens, and the word embedding translates these tokens into vectors.
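As a sketch, the embedding step is just a table lookup: each token ID selects one row of an embedding matrix. The NumPy code below uses a randomly initialized matrix as a stand-in for the weights a trained model would have learned.

```python
import numpy as np

vocab_size, d_model = 9, 8              # toy sizes; real models use thousands of dimensions
rng = np.random.default_rng(0)

# Randomly initialized here; in a trained model these weights are learned.
embedding_matrix = rng.normal(size=(vocab_size, d_model))

token_ids = [2, 3, 4, 5, 6, 7, 8]       # "let", "'", "s", "go", "in", "the", "garden"
embedded = embedding_matrix[token_ids]  # shape (7, 8): one vector per token
print(embedded.shape)
```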

1.4 Positional Encoding

Positional encoding is used to specify the position of each token in the context. It adds a vector of the same size as the word embedding to each embedding vector.

[Fig. 1.4.1: Positional encoding vectors added to the token embeddings]

Fig. 1.4.1 continues the example of "Let’s go in the garden" by showing positional vectors added to each token.
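One common way to compute these positional vectors is the sinusoidal encoding from the original transformer paper; whether ChatGPT uses this exact scheme is not something this article claims, so treat the sketch below purely as an illustration of the idea.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding as in 'Attention Is All You Need'."""
    positions = np.arange(seq_len)[:, None]          # (seq_len, 1)
    dims = np.arange(d_model)[None, :]               # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])            # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])            # odd dimensions use cosine
    return pe

# Each token receives a position-dependent vector of the same size as its embedding.
pos_enc = sinusoidal_positional_encoding(seq_len=7, d_model=8)
# embedded_with_position = embedded + pos_enc       # 'embedded' from the sketch above
```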

1.5 Attention Mechanism

The attention mechanism is the heart of the transformer and is the main reason why ChatGPT is so good at language processing. It revolves around the concept of "context".
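As a first taste, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the mechanism; the random weight matrices stand in for parameters that a real model learns during training.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each token attends to every token in the context, weighted by query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pairwise similarity of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the context
    return weights @ V                               # context-aware mixture of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(7, 8))                          # 7 tokens with 8-dimensional embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)                                     # (7, 8): one context-enriched vector per token
```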

We will dive deeper into the attention mechanism and other key components in the upcoming sections.

...

My personal impression is that the exact way the positional vectors are calculated matters less for the performance of a transformer. What is crucial is that positional encoding is used at all, no matter how the vectors have been calculated.

1.6 Layer Norm

Layer normalization rescales each token's vector to zero mean and unit variance and then applies a learned scale and shift. In transformers it is applied around the attention and feed-forward blocks and keeps training stable.
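A minimal sketch of the operation, assuming NumPy; the learned scale and shift parameters of a real implementation are left out for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    # A real layer additionally applies a learned scale (gamma) and shift (beta).
    return (x - mean) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(loc=3.0, scale=2.0, size=(7, 8))
print(layer_norm(x).mean(axis=-1).round(6))   # roughly zero for every token
```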

1.7 Feed Forward

The feed-forward block applies the same small neural network to every token position independently: it expands each vector to a larger hidden dimension, applies a non-linear activation, and projects the result back to the model dimension.
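A sketch of the block with random weights standing in for learned parameters; ReLU is used here for simplicity, while many real models use GELU or related activations.

```python
import numpy as np

def feed_forward(x, W1, b1, W2, b2):
    """Two-layer MLP applied independently to every token position."""
    hidden = np.maximum(0, x @ W1 + b1)   # expand and apply the ReLU non-linearity
    return hidden @ W2 + b2               # project back to the model dimension

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32                     # the hidden layer is typically about 4x wider
x = rng.normal(size=(7, d_model))
W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
print(feed_forward(x, W1, b1, W2, b2).shape)   # (7, 8)
```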

1.8 Softmax

The softmax function turns the model's raw output scores (logits) into a probability distribution over the vocabulary, so every candidate next token gets a probability and all probabilities sum to one.
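A small sketch with made-up logits for three candidate tokens:

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to one."""
    shifted = logits - logits.max()       # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])        # hypothetical scores for three candidate tokens
print(softmax(logits))                     # approximately [0.659, 0.242, 0.099]
```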

1.9 Multinomial

The next token is then drawn from a multinomial distribution over these probabilities rather than always taking the most likely one, which is why the same prompt can lead to different responses.
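A sketch of this sampling step, reusing the hypothetical probabilities from the softmax example above:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.659, 0.242, 0.099])          # softmax output from the previous sketch
candidate_tokens = ["garden", "park", "house"]   # hypothetical next-token candidates

# Draw the next token from the multinomial distribution instead of always
# picking the most probable one; this makes the model's responses non-deterministic.
next_token = rng.choice(candidate_tokens, p=probs)
print(next_token)
```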

Stay tuned for the upcoming sections, where we will delve into data preparation, attention heads, model training, and more.