Fine-Tuning Dolly 2.0: The Ideal Chat Bot for Sensitive Data

Published On Mon May 08 2023

Building a Chatbot That Follows Instructions Using Company-Specific Data

A chatbot that can follow instructions is highly desirable for many companies, especially those dealing with sensitive and proprietary data. Off-the-shelf chatbots may seem like the easiest solution, but many companies prefer to build their own models on their own data to maintain accountability and privacy.

A community of AI experts in the San Francisco Bay Area has studied LLM-based instruction-following chatbots, including GPT-4, Alpaca, Koala, and Vicuna. After multiple discussions and customer feedback, the community believes that companies and enterprises would be best served by owning their models and building higher-quality models for their specific applications, without handing sensitive data over to third parties.

One open-source chatbot released for this purpose is Dolly 2.0 by Databricks, a 12B-parameter language model based on the EleutherAI Pythia model family. Dolly 2.0 was fine-tuned exclusively on a new, high-quality, human-generated instruction-following dataset crowdsourced among Databricks employees, making it a strong choice for companies that want to maintain control over their sensitive data.
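To make "instruction-following dataset" concrete, here is a minimal sketch of how a single human-written record might be turned into a training prompt. The field names (`instruction`, `context`, `response`), the section markers, and the introductory sentence are assumptions chosen for illustration; the article does not specify the exact template used for Dolly 2.0.

```python
from typing import Optional

# Hypothetical template: the wording and the "###" markers are illustrative
# assumptions, not a format confirmed by this article.
INTRO = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)


def build_prompt(instruction: str, response: str, context: Optional[str] = None) -> str:
    """Join one instruction record into a single training string."""
    parts = [INTRO, f"### Instruction:\n{instruction.strip()}"]
    # Only add the context section when the record actually has context.
    if context and context.strip():
        parts.append(f"### Input:\n{context.strip()}")
    parts.append(f"### Response:\n{response.strip()}")
    return "\n\n".join(parts)


# A made-up, company-specific record used purely as an example.
example = build_prompt(
    instruction="Summarize our refund policy in one sentence.",
    context="Refunds are issued within 14 days of purchase with a receipt.",
    response="Customers with a receipt can get a refund within 14 days.",
)
print(example)
```

A company would apply the same function to its own internally written records, so the sensitive text never has to leave its own infrastructure during fine-tuning.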

It is worth noting that instruction-following chatbots do not require the latest or largest language models. Dolly 2.0, for example, has only 12 billion parameters, compared to 175 billion for GPT-3. This means that companies of all sizes can afford to develop their own models to improve their products.
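A rough back-of-the-envelope calculation shows why model size matters for affordability. The sketch below estimates only the memory needed to hold the weights, assuming 2 bytes per parameter (16-bit precision) and the 12B and 175B parameter counts quoted in this article. It ignores activations, optimizer state, and other fine-tuning overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in the article; 2 bytes/param is an assumption.
dolly_gb = weight_memory_gb(12e9)    # 24.0 GB
gpt3_gb = weight_memory_gb(175e9)    # 350.0 GB

print(f"Dolly 2.0 weights: ~{dolly_gb:.0f} GB")
print(f"GPT-3 weights:     ~{gpt3_gb:.0f} GB")
print(f"Ratio:             ~{gpt3_gb / dolly_gb:.1f}x")
```

Under these assumptions, the smaller model's weights fit on one or two high-memory accelerators, while the larger model's weights must be spread across many devices.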

Building a company-specific chatbot has many benefits, including maintaining privacy and accountability while producing higher-quality models. With the help of third-party consultants and openly available base models such as Dolly 2.0, companies can train their own chatbots, turning LLMs from something only a few companies can afford into something every company can own and customize to improve its products.