Unlocking the Potential of MPT-7b Models with a 50/50 Merge

Published On Mon May 08 2023

Merging MPT-7b Storywriter and MPT-7b Instruct: Creating the 50/50 Merge

The MPT-7b model has gained popularity for its ability to generate text with remarkable fluency. Its capacity for extended prose, however, has always been limited by the context length it was trained on. This has led to a high level of interest in merging MPT-7b variants to expand their ability to generate long-form content.

The MPT-7b-InstructAndStorywriting-50_50-Merge is a model that explores this possibility. It is a merge of the MPT-7b Storywriter and MPT-7b Instruct models, produced with a weighted-average merge strategy in which each parent contributes 50% of the final weights.
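The exact script used for this merge is not published here, but a weighted-average merge of two checkpoints with matching architectures is conceptually simple. The sketch below, which assumes both parents load through the Hugging Face `transformers` library and share identical parameter names, averages every tensor element-wise; the output path is a placeholder.

```python
# Minimal sketch of a 50/50 weighted-average merge (not the authors' exact script).
# Assumes both parent checkpoints have identical architectures and parameter names.
import torch
from transformers import AutoModelForCausalLM

storywriter = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter", trust_remote_code=True, torch_dtype=torch.float32
)
instruct = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-instruct", trust_remote_code=True, torch_dtype=torch.float32
)

sw_state = storywriter.state_dict()
ins_state = instruct.state_dict()

# Element-wise average: each parent contributes 50% to every parameter.
merged_state = {name: 0.5 * sw_state[name] + 0.5 * ins_state[name] for name in sw_state}

storywriter.load_state_dict(merged_state)
storywriter.save_pretrained("mpt-7b-instruct-storywriter-50_50-merge")  # placeholder path
```

Loading both models in float32 keeps the averaging numerically straightforward; the merged checkpoint can be cast down to a lower precision afterwards if needed.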

The idea behind this merge is to test how a long-context fine-tune affects attention when it is combined with a model trained for a different purpose on a shorter context span. The aim is a new model that can generate long prose while inheriting some of the Assistant / Instruct / Helpful properties of the Instruct base.

Because the MPT-7b Storywriter parent was fine-tuned on a wide array of books, the merged model can generate content that is considered NSFW.

Prompting the MPT-7b-InstructAndStorywriting-50_50-Merge Model

The ideal prompting format for this model is unknown. It is suggested to approach it first as a story or text-completion model, then mix in Alpaca's instruct format and see which style produces the most interesting output, as illustrated below.
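As a rough illustration of the two suggested styles, the snippet below defines a plain completion prompt and a prompt in the standard Alpaca instruct template. The wording of the example prompts is invented for illustration; only the Alpaca template itself follows the published format.

```python
# Style 1: plain story / text completion — just start the prose and let the model continue.
completion_prompt = (
    "The lighthouse keeper had not spoken to another soul in three years, "
    "until the morning a rowboat appeared on the horizon."
)

# Style 2: Alpaca-style instruct format inherited from the Instruct parent.
alpaca_prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Write the opening chapter of a mystery novel set in a remote lighthouse.

### Response:
"""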

If you want to run inference on this model, read through the original model card for instructions on how to proceed.
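For orientation only, a typical `transformers` loading pattern for MPT-based checkpoints looks like the sketch below; the repo id is a placeholder, and the original model card remains the authoritative reference for precision, context-length settings, and any custom configuration.

```python
# Rough inference sketch, assuming the merged checkpoint is hosted on the Hugging Face Hub.
# The repo id below is a placeholder; follow the original model card for exact instructions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-namespace/MPT-7b-InstructAndStorywriting-50_50-Merge"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,   # MPT models ship custom modeling code
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = "The lighthouse keeper had not spoken to another soul in three years."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```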