Revolutionizing Open-Source Language Models: Meet MPT-7B

Published On Mon May 08 2023

Introducing MPT-7B: A New Standard for Open-Source Language Models

Large language models have been changing the world, but for those outside well-resourced industry labs, they can be challenging to train and deploy. To address this, MosaicML has released a new model series, the MosaicML Pretrained Transformer (MPT): open-source, commercially usable models that match and, in some cases, surpass existing open-source LLMs.

The MPT-7B release includes MPT-7B Base, MPT-7B-StoryWriter-65k+, MPT-7B-Instruct, and MPT-7B-Chat, all of which were rigorously evaluated and demonstrate strong performance on standard benchmarks.

MPT-7B Base

MPT-7B Base is a decoder-style transformer with 6.7 billion parameters, trained on 1 trillion tokens of text and code curated by MosaicML’s data team. The base model uses FlashAttention for fast training and inference and ALiBi (Attention with Linear Biases) for finetuning on, and extrapolating to, long context lengths.
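
As a quick orientation, here is a minimal sketch of loading the base model with the Hugging Face transformers library. The Hub id mosaicml/mpt-7b and the trust_remote_code flag follow the usual conventions for custom architectures on the Hub, but the model card is the authoritative reference for the exact loading recipe.

```python
# Minimal sketch: load MPT-7B Base via Hugging Face transformers.
# Assumes the model is published on the Hub as "mosaicml/mpt-7b" and that its
# custom architecture requires trust_remote_code=True; verify against the model card.
import torch
import transformers

name = "mosaicml/mpt-7b"

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.bfloat16,  # bf16 weights keep memory manageable
    trust_remote_code=True,      # MPT ships its own modeling code
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

inputs = tokenizer("MosaicML is", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```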

MPT-7B-StoryWriter-65k+

MPT-7B-StoryWriter-65k+ is designed to read and write stories with ultra-long context lengths. It was built by finetuning MPT-7B with a context length of 65k tokens on a filtered fiction subset of the books3 dataset. At inference time, thanks to ALiBi, MPT-7B-StoryWriter-65k+ can extrapolate even beyond 65k tokens, as demonstrated by generations as long as 84k tokens on a single node of A100-80GB GPUs.
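
Because ALiBi biases attention by distance rather than relying on learned position embeddings, the maximum sequence length is a configuration choice rather than a hard architectural limit. The sketch below assumes the remote MPT configuration exposes a max_seq_len field and that the model is published as mosaicml/mpt-7b-storywriter; treat both as assumptions to be checked against the model card.

```python
# Sketch: load MPT-7B-StoryWriter-65k+ with an extended context window.
# The Hub id "mosaicml/mpt-7b-storywriter" and the max_seq_len config field are
# assumptions based on the public release; adjust if the model card differs.
import torch
import transformers

name = "mosaicml/mpt-7b-storywriter"

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 83968  # extrapolate beyond the 65k finetuning length via ALiBi

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```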

MPT-7B-Instruct

MPT-7B-Instruct is a model for short-form instruction following. It was built by finetuning MPT-7B on a dataset derived from Databricks Dolly-15k and Anthropic’s Helpful and Harmless datasets.
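
Instruction-tuned models expect prompts in the format they were finetuned on. Since MPT-7B-Instruct was trained on Dolly-derived data, a Dolly/Alpaca-style template like the one below is a reasonable assumption; the exact wording is illustrative, not authoritative.

```python
# Sketch: a Dolly/Alpaca-style prompt template for MPT-7B-Instruct.
# The template text is an assumption based on the finetuning data; check the
# model card for the canonical format.
INSTRUCTION_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def format_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the assumed finetuning template."""
    return INSTRUCTION_TEMPLATE.format(instruction=instruction)

print(format_prompt("Summarize ALiBi in one sentence."))
```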

MPT-7B-Chat

MPT-7B-Chat is a chatbot-like model for dialogue generation. It was built by finetuning MPT-7B on the ShareGPT-Vicuna, HC3, Alpaca, Helpful and Harmless, and Evol-Instruct datasets.
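
For dialogue, the conversation is serialized into a single prompt before generation. The ChatML-style markers below are an assumption about MPT-7B-Chat's template (the model card is authoritative); the generation call itself is standard transformers usage with sampling.

```python
# Sketch: single-turn chat with MPT-7B-Chat.
# The <|im_start|>/<|im_end|> ChatML-style markers are an assumption about the chat
# template; the Hub id "mosaicml/mpt-7b-chat" follows the release naming.
import torch
import transformers

name = "mosaicml/mpt-7b-chat"
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

prompt = (
    "<|im_start|>user\nWhat makes ALiBi useful for long documents?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        **inputs, max_new_tokens=128, do_sample=True, temperature=0.7, top_p=0.9
    )
# Print only the newly generated assistant turn.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```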

MosaicML has open-sourced the entire codebase for pretraining, finetuning, and evaluating MPT through its new MosaicML LLM Foundry, a framework for building great LLMs with MosaicML’s usual emphasis on efficiency, ease of use, and rigorous attention to detail. Customers can train MPT models efficiently, without loss spikes or divergence, and can serve them with both standard HuggingFace pipelines and NVIDIA’s FasterTransformer. A sketch of the HuggingFace serving path follows.
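
The snippet below wraps an MPT model in a standard text-generation pipeline. It is ordinary transformers usage rather than LLM Foundry's own serving stack, and the device placement is an assumption about available hardware.

```python
# Sketch: serving an MPT model through a standard Hugging Face text-generation pipeline.
# Generic transformers usage; not the LLM Foundry or FasterTransformer serving path.
import torch
import transformers

name = "mosaicml/mpt-7b-instruct"  # any MPT variant works; instruct shown as an example

generator = transformers.pipeline(
    "text-generation",
    model=name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device=0,  # assumes a single GPU; use device=-1 for CPU
)

print(generator("Here is a short explanation of ALiBi:\n", max_new_tokens=100)[0]["generated_text"])
```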

MPT-7B matches the quality of LLaMA-7B and outperforms other open-source models in the 7B to 20B range on standard academic tasks. To evaluate model quality, 11 open-source benchmarks commonly used for in-context learning (ICL) were compiled, and MPT was evaluated on them in an industry-standard manner. A custom Jeopardy benchmark was also added to evaluate the model’s ability to produce factually correct answers to challenging questions.
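
To make the evaluation setup concrete, here is a minimal sketch of how a multiple-choice ICL task can be scored with a causal language model: each candidate continuation is scored by its average token log-likelihood given the context, and the highest-scoring option is taken as the prediction. This is a generic illustration of ICL evaluation, not the exact harness or benchmarks used for MPT.

```python
# Sketch: scoring one multiple-choice ICL example with a causal LM.
# Generic illustration only; the actual MPT evaluation used its own harness and 11 benchmarks.
import torch
import transformers

name = "mosaicml/mpt-7b"
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
)
model.eval()

def option_logprob(context: str, option: str) -> float:
    """Average log-likelihood of `option` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probs of each token, conditioned on everything before it.
    logprobs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    n_ctx = ctx_ids.shape[1]
    return token_lp[:, n_ctx - 1:].mean().item()  # positions predicting the option tokens

context = "Question: What is the capital of France?\nAnswer:"
options = [" Paris", " Lyon", " Marseille"]
print(max(options, key=lambda o: option_logprob(context, o)))
```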

With the MosaicML platform and a single node of 8xA100-40GB GPUs, MPT-7B can easily be finetuned to handle context lengths of up to 65k tokens. This ability to adapt to such extreme context lengths comes from ALiBi, one of the key architectural choices in MPT-7B.
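
ALiBi adds a fixed, head-specific linear penalty to the attention logits based on how far each key sits behind the query, instead of using learned position embeddings; because the penalty is defined for any distance, the model can be finetuned and run at lengths it never saw in pretraining. The snippet below is a from-scratch illustration of the bias computation, not MPT's actual implementation.

```python
# Sketch: computing ALiBi attention biases from scratch (illustrative, not MPT's code).
import math
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric sequence of per-head slopes from the ALiBi paper
    # (assumes n_heads is a power of two).
    ratio = 2.0 ** (-8.0 / n_heads)
    return torch.tensor([ratio ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # bias[h, i, j] = -slope_h * (i - j): keys further behind the query are penalized more.
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).clamp(max=0)        # (seq, seq), 0 on/above diagonal
    return alibi_slopes(n_heads).view(n_heads, 1, 1) * dist  # (heads, seq, seq)

def attention_scores(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # q, k: (heads, seq, head_dim). The bias is simply added to the scaled dot products;
    # a causal mask is still applied on top of this in a full attention layer.
    n_heads, seq_len, head_dim = q.shape
    scores = q @ k.transpose(-2, -1) / math.sqrt(head_dim)
    return scores + alibi_bias(n_heads, seq_len)

# Because the bias depends only on relative distance, the same formula extends to
# sequence lengths longer than any seen in training.
scores = attention_scores(torch.randn(8, 16, 64), torch.randn(8, 16, 64))
print(scores.shape)  # torch.Size([8, 16, 16])
```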

In conclusion, MPT-7B provides a commercially usable, open-source model that matches and, in some cases, surpasses existing open-source models, and MosaicML has open-sourced the entire codebase for building great LLMs with ease of use and rigorous attention to detail. Customers can train and serve MPT models efficiently and adapt them to context lengths of up to 65k tokens.