FastChat is an open platform for training, serving, and evaluating large language model-based chatbots. It provides a core set of features, including:
- Training, serving, and evaluation code for chatbot models
- An OpenAI-compatible API
- An AI-enhanced evaluation pipeline based on GPT-4
- Multiple models available for testing and evaluation
FastChat is available on GitHub and can be cloned with Git.
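To get started, the repository can be cloned and installed from source. A minimal sketch, following the project's README (check the repository for the current instructions):

```sh
# Clone the repository and install FastChat from source
git clone https://github.com/lm-sys/FastChat.git
cd FastChat
pip3 install --upgrade pip
pip3 install -e .
```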
Using FastChat
FastChat provides detailed usage instructions on its GitHub page. Vicuna weights are released as delta weights to comply with the LLaMA model license: adding these deltas to the original LLaMA weights yields the Vicuna weights. Instructions are also provided for reducing the CPU RAM required during weight conversion.
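A sketch of the conversion step, assuming the deltas are fetched from the `lmsys` organization on Hugging Face; flag names follow the README at the time of writing and may differ between FastChat versions:

```sh
# Apply the released deltas to the original LLaMA weights
# to reconstruct the full Vicuna weights
python3 -m fastchat.model.apply_delta \
    --base-model-path /path/to/llama-13b \
    --target-model-path /path/to/output/vicuna-13b \
    --delta-path lmsys/vicuna-13b-delta-v1.1

# If CPU RAM is tight, the README notes a --low-cpu-mem option that
# splits large weight files and spills them to disk during conversion
```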
Chatting can be started with a single command. FastChat's model parallelism can also aggregate GPU memory across multiple GPUs, and an optional 8-bit compression mode roughly halves memory usage at the cost of slightly degraded model quality. FastChat additionally lets users serve their models through a web UI; the architecture of this service is described on the GitHub page along with detailed instructions.
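A sketch of these commands, following the README; the weight path is the output of the conversion above, and flag names (`--model-path`, `--num-gpus`, `--load-8bit`) may differ across FastChat versions:

```sh
# Chat in the terminal with a single command
python3 -m fastchat.serve.cli --model-path /path/to/output/vicuna-13b

# Model parallelism: aggregate memory across two GPUs
python3 -m fastchat.serve.cli --model-path /path/to/output/vicuna-13b --num-gpus 2

# 8-bit compression: roughly half the memory, slightly lower quality
python3 -m fastchat.serve.cli --model-path /path/to/output/vicuna-13b --load-8bit

# Web UI serving: a controller, one or more model workers, and a Gradio server
python3 -m fastchat.serve.controller
python3 -m fastchat.serve.model_worker --model-path /path/to/output/vicuna-13b
python3 -m fastchat.serve.gradio_web_server

# OpenAI-compatible REST API server (for the compatibility noted above)
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```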
AI-Enhanced Evaluation Pipeline
FastChat provides an AI-enhanced evaluation pipeline based on GPT-4. The pipeline consists of generating answers from different models, generating reviews with GPT-4, generating visualization data, and visualizing it; a static website provides the visualization interface. Evaluation data is encoded in JSON Lines (JSONL) format and covers models, prompts, reviewers, questions, answers, and reviews. The evaluation process can be customized or contributed to by accessing the relevant data.
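To illustrate the JSONL encoding, here is a hypothetical trio of records (a question, a model answer, and a GPT-4 review); the field names are placeholders, and the exact schema is documented in the repository:

```jsonl
{"question_id": 1, "category": "generic", "text": "How can I improve my time management skills?"}
{"question_id": 1, "model_id": "vicuna-13b", "answer_id": "a1", "text": "Start by auditing where your time goes..."}
{"review_id": "r1", "question_id": 1, "reviewer_id": "gpt-4", "score": [8, 9], "text": "Both answers are relevant; Assistant 2 gives more actionable detail."}
```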
Vicuna
Vicuna is a chatbot created by fine-tuning a LLaMA base model on approximately 70K user-shared conversations gathered from ShareGPT.com via its public APIs. The ShareGPT data is cleaned by converting HTML back to markdown and filtering out inappropriate or low-quality samples, and lengthy conversations are divided into smaller segments that fit the model's maximum context length. Vicuna is trained with hyperparameters similar to those of Stanford Alpaca. The fine-tuning code can be test-run on the dummy questions in dummy.json, and Vicuna can be trained on 8 A100 GPUs with 80GB of memory using SkyPilot, a framework built by UC Berkeley for running ML workloads on any cloud; a sketch of the launch command follows.
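A trimmed sketch of the fine-tuning launch, based on the repository's training instructions; the hyperparameter values shown are illustrative placeholders rather than the exact Vicuna recipe:

```sh
# Launch fine-tuning on 8 GPUs using the dummy data shipped with the repo;
# substitute the cleaned ShareGPT conversations for a real run
torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train_mem.py \
    --model_name_or_path /path/to/llama-13b \
    --data_path dummy.json \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --learning_rate 2e-5 \
    --model_max_length 2048 \
    --fsdp "full_shard auto_wrap"

# Or launch the same job on cloud A100s with SkyPilot
# (train.yaml is a hypothetical cluster/job spec; see the SkyPilot docs)
sky launch -c vicuna-train train.yaml
```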
Overall, FastChat provides a powerful platform for training, serving, and evaluating large language model-based chatbots, and its command-line and web interfaces make it accessible to both beginners and advanced users.