Enhanced Efficiency: Meta's Multi-Token Prediction for LLMs

Published On Fri May 10 2024

Meta's Multi-Token Prediction Can Speed Up LLMs by Up to 3 Times

Researchers have suggested that training AI large language models (LLMs) to predict multiple tokens simultaneously can enhance their speed and accuracy.


Meta has made significant advancements in the field of Generative AI recently. They introduced their AI Assistant called Meta AI on WhatsApp, Instagram, and Messenger. Additionally, they unveiled Llama 3, which stands as one of the most powerful LLMs available.

Enhancing Sample Efficiency with Multi-Token Prediction

In a recent study, researchers showed that the sample efficiency of LLMs can be improved by training them to predict multiple tokens at once. Under this approach, at each position in the training data the model predicts several future tokens in parallel rather than only the next one.

The study proposed a simple multi-token prediction architecture that incurs no additional training time or memory overhead.
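As a rough illustration of the idea (not the paper's actual implementation), the sketch below uses a shared trunk representation and several independent linear output heads, where head k is trained to predict the token k+1 positions ahead. All sizes and names here are toy assumptions, and the trunk is replaced by random vectors for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 50   # toy vocabulary size (assumption)
d_model = 16      # hidden width of the shared trunk (assumption)
n_heads = 4       # number of future tokens predicted per position
seq_len = 8

# Shared trunk output: one hidden vector per position.
# In a real model this would come from a transformer; here it is random.
hidden = rng.normal(size=(seq_len, d_model))

# One independent linear output head per future offset.
heads = rng.normal(size=(n_heads, d_model, vocab_size))

def multi_token_logits(hidden, heads):
    """Head k produces logits for the token k+1 positions ahead."""
    # Result shape: (n_heads, seq_len, vocab_size)
    return np.einsum("td,kdv->ktv", hidden, heads)

def cross_entropy(logits, targets):
    """Mean token-level cross-entropy for a single head."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

tokens = rng.integers(0, vocab_size, size=seq_len + n_heads)
logits = multi_token_logits(hidden, heads)

# The training loss sums over heads: head k is supervised with
# targets shifted by k+1, so one forward pass trains all offsets.
loss = sum(
    cross_entropy(logits[k], tokens[k + 1 : k + 1 + seq_len])
    for k in range(n_heads)
)
print(round(float(loss), 3))
```

Because every head shares the same trunk, the extra supervision comes almost for free: only the small per-head projections are added on top of the usual next-token setup.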


According to the researchers, this method yields higher sample efficiency than the traditional next-token prediction loss used to train language models such as GPT and Llama.

Challenges with Next-Token Prediction

Although next-token prediction has been the standard training strategy for LLMs, it is an inefficient way to acquire language, general knowledge, and reasoning skills. The researchers highlight a key limitation: a model trained this way tends to latch onto local patterns rather than learn the harder, longer-range decisions.

By contrast, multi-token prediction offers a more efficient and powerful way to train transformer models. The architecture adds multiple output heads, each predicting the token at a different future offset, which also accelerates decoding.

In a recent podcast episode titled AI Breakdown, experts discuss the impact of multi-token prediction on AI development.

Benefits of Multi-Token Prediction

Through evaluating various tasks using models of different sizes, researchers found that multi-token prediction yields better results as the model size increases. Notably, models with billions of parameters outperformed baseline single-token prediction on coding benchmarks.

The study demonstrated that pretraining with multi-token prediction equips the model for self-speculative decoding: the extra heads draft several future tokens, which the model then verifies, speeding up inference without sacrificing output quality.
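The draft-then-verify loop can be sketched with a toy deterministic "model" (every function below is an illustrative assumption, not the paper's code): the extra heads guess a few tokens ahead in one pass, the next-token head verifies them, and the longest correct prefix is accepted, so several tokens can be emitted per verification pass:

```python
def next_token(context):
    """Toy next-token head: deterministic counter mod 10."""
    return (context[-1] + 1) % 10

def draft_heads(context, n_draft):
    """Toy extra heads guessing n_draft tokens ahead in one pass.
    Every third guess is corrupted on purpose to simulate imperfect heads."""
    guesses, t = [], context[-1]
    for k in range(n_draft):
        t = (t + 1) % 10
        guesses.append(t if (k + 1) % 3 else (t + 5) % 10)
    return guesses

def self_speculative_decode(context, n_tokens, n_draft=3):
    out = list(context)
    passes = 0  # count of full verification passes
    while len(out) - len(context) < n_tokens:
        drafts = draft_heads(out, n_draft)
        passes += 1
        # Accept the longest prefix of drafts that the
        # next-token head agrees with.
        cur = list(out)
        for guess in drafts:
            if guess != next_token(cur):
                break
            cur.append(guess)
        out = cur
        # The verification pass itself yields one more token for free.
        out.append(next_token(out))
    return out[len(context):][:n_tokens], passes

tokens, passes = self_speculative_decode([0], 9)
print(tokens, passes)  # → [1, 2, 3, 4, 5, 6, 7, 8, 9] 3
```

With two of every three drafts accepted, nine tokens are produced in three verification passes instead of nine sequential steps, which is the source of the advertised decoding speed-up.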

The Future of Multi-Token Prediction

Researchers are exploring potential research directions to optimize multi-token prediction further. They are considering methods to determine the optimal number of tokens to predict and investigating the impact of vocabulary sizes on multi-token predictions.


Overall, multi-token prediction shows promise for making AI models both faster and more accurate, particularly in generative tasks like code completion. The research offers a fresh perspective on training language models for such tasks and a practical path toward faster inference.