Revolutionize Your Language Model with Dolly 2.0

Published On Mon May 08 2023

Dolly 2.0: An Open-Source ChatGPT Alternative for Commercial Use

Dolly 2.0 is an open-source large language model (LLM) fine-tuned on a human-generated dataset, which makes it suitable for both research and commercial use. The Databricks team had previously released Dolly 1.0, which showed ChatGPT-like instruction-following ability at a low training cost. However, Dolly 1.0 was trained on the Stanford Alpaca dataset, whose terms restrict commercial use, so the model itself was unsuitable for commercial applications.

Dolly 2.0 resolves this issue by fine-tuning a 12B-parameter language model (Pythia) on a high-quality, human-generated instruction-following dataset written by Databricks employees. Both the model and the dataset are available for commercial use. The new dataset, databricks-dolly-15k, contains 15,000 high-quality human-written prompt/response pairs that can be used to instruction-tune large language models. It is released under the Creative Commons Attribution-ShareAlike 3.0 Unported License, which allows anyone to use it, modify it, and build commercial applications on it.
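To make the idea of a prompt/response pair more concrete, here is a minimal Python sketch of how one record might be turned into a single training string for instruction tuning. The field names (instruction, context, response) are assumed to match the published dataset, and the prompt template is a hypothetical one chosen for illustration, not necessarily the exact format Databricks used to train Dolly 2.0.

```python
from typing import Optional

# Hypothetical template for illustration only; not confirmed to be the
# exact prompt format used to train Dolly 2.0.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def format_example(
    instruction: str, response: str, context: Optional[str] = None
) -> str:
    """Join one prompt/response pair into a single training string."""
    parts = [INTRO, f"### Instruction:\n{instruction.strip()}"]
    # The context section is optional and is skipped when empty.
    if context and context.strip():
        parts.append(f"### Input:\n{context.strip()}")
    parts.append(f"### Response:\n{response.strip()}")
    return "\n\n".join(parts)


# Made-up record; field names are assumed to mirror the dataset's schema.
record = {
    "instruction": "Name the base model that Dolly 2.0 was fine-tuned from.",
    "context": "",
    "response": "Pythia",
}

text = format_example(
    record["instruction"], record["response"], record["context"]
)
print(text)
```

Each formatted string would then be tokenized and fed to a standard fine-tuning loop. Keeping the template in one function makes it easy to reuse the same layout at inference time, with the response left empty for the model to complete.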

In terms of performance, the dolly-v2-12b model has outperformed EleutherAI/gpt-neox-20b and EleutherAI/pythia-6.9b. It has underperformed Dolly 1.0 on some evaluation benchmarks, but the Dolly model family is under active development, and an updated version with better performance is likely to follow.

Dolly 2.0 is 100% open-source, with all components suitable for commercial use. The Databricks team also set up a contest to produce high-quality human-generated data, which can be used to improve the performance of large language models. In conclusion, Dolly 2.0 offers an open-source alternative to ChatGPT that is suitable for commercial use and has room for further development and improvement.