OpenAssistant Launches Its Open-Source ChatGPT Competitor
The OpenAssistant project has released its open-source AI assistant to compete with OpenAI's ChatGPT. The project started in December 2022, shortly after OpenAI launched ChatGPT, with the goal of creating an open-source AI assistant with capabilities similar to ChatGPT's. With the help of more than 13,500 volunteers, the team collected a large "human-generated, human-annotated assistant-style conversation corpus". The corpus consists of 161,443 messages distributed across 66,497 conversation trees in 35 different languages, annotated with 461,292 quality ratings.
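Because each message in the corpus replies to a parent message, a conversation forms a tree in which one prompt can branch into several alternative replies. A minimal sketch of that layout follows; the field names (`message_id`, `parent_id`, `role`, `text`) mirror the released dataset but are assumptions for this illustration, and the records are hand-made examples, not real corpus data.

```python
# Sketch: group flat message records into conversation trees via parent_id.
# Field names are assumptions modeled on the OpenAssistant Conversations data.
from collections import defaultdict

def build_trees(messages):
    """Group flat message records into conversation trees."""
    children = defaultdict(list)
    roots = []
    for msg in messages:
        if msg["parent_id"] is None:
            roots.append(msg)          # a root prompt starts a new tree
        else:
            children[msg["parent_id"]].append(msg)

    def attach(node):
        # Recursively attach replies; branching = alternative answers.
        node["replies"] = [attach(c) for c in children[node["message_id"]]]
        return node

    return [attach(root) for root in roots]

# Tiny hand-made example: one prompt with two alternative assistant replies.
records = [
    {"message_id": "m1", "parent_id": None, "role": "prompter",
     "text": "What is instruction tuning?"},
    {"message_id": "m2", "parent_id": "m1", "role": "assistant",
     "text": "Fine-tuning a model on instruction-response pairs."},
    {"message_id": "m3", "parent_id": "m1", "role": "assistant",
     "text": "Training on demonstrations of instruction following."},
]

trees = build_trees(records)
print(len(trees))                # → 1 conversation tree
print(len(trees[0]["replies"]))  # → 2 alternative replies to the root
```

The tree structure is what makes the per-message quality ratings useful: annotators can rank sibling replies against each other rather than judging messages in isolation.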
The OpenAssistant team used the collected instruction data to fine-tune several language models, including variants of Meta's LLaMA model and EleutherAI's Pythia model. The largest variant is based on the 30-billion-parameter LLaMA model. Like Alpaca or Vicuna, the models are "instruction-tuned" and not further improved with human feedback. According to a comparative study with volunteers, the chatbots' outputs approach those of ChatGPT's gpt-3.5-turbo model.
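The released chat checkpoints expect dialogue turns wrapped in special tokens rather than raw text. A hedged sketch of assembling such a prompt is shown below; the token names (`<|prompter|>`, `<|assistant|>`, `<|endoftext|>`) are taken from the model cards of the Pythia-based checkpoints and should be treated as an assumption here, not a specification.

```python
# Sketch: build a dialogue prompt in the special-token format used by
# OpenAssistant's Pythia-based chat checkpoints (token names assumed
# from the published model cards).
PROMPTER, ASSISTANT, END = "<|prompter|>", "<|assistant|>", "<|endoftext|>"

def format_dialogue(turns):
    """turns: list of (role, text) pairs. Returns one prompt string that
    ends with the assistant token, cueing the model to reply."""
    parts = []
    for role, text in turns:
        token = PROMPTER if role == "prompter" else ASSISTANT
        parts.append(f"{token}{text}{END}")
    parts.append(ASSISTANT)  # model continues from here as the assistant
    return "".join(parts)

prompt = format_dialogue([("prompter", "Explain instruction tuning briefly.")])
print(prompt)
# → <|prompter|>Explain instruction tuning briefly.<|endoftext|><|assistant|>
```

The resulting string would typically be passed to a tokenizer and the model's generation call; the exact checkpoint and inference stack are left out of this sketch.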
The Pythia models are already available, and the LLaMA models will be released shortly. Although the LLaMA models cannot be used commercially due to Meta's licensing, the Pythia models are available for commercial use. The team also released the training code and the collected data as OpenAssistant Conversations. In addition, all models can be tried out through a web interface, where conversations can be rated and used to further improve the models.
OpenAssistant's models exhibit the well-known problems of large language models, such as hallucinations, according to the accompanying paper. Moreover, the training data was contributed mostly by male annotators with a median age of 26, which may introduce biases into the dataset. The team took steps to detect and remove harmful messages from the dataset but states that the system is not infallible.
The team said that AI research, particularly on large language models and their alignment with human values, has been limited to a handful of research labs with the resources to train models and collect data. OpenAssistant, with its published models and freely available dataset, is an attempt to democratize this research. It is a clear counter to OpenAI's approach of making the development of its own language models and data sourcing increasingly opaque, and of conducting alignment research with a small group of selected specialists.
OpenAssistant was founded by Andreas Köpf, Yannic Kilcher, Huu Nguyen, and Christoph Schumann and includes a team of over 20 developers, data and security experts, and a moderation and documentation team. The project is supported with computational resources, tools, and other assistance by Redmond AI, Hugging Face, Weights & Biases, as well as Stability AI and LAION.