Teuken-7B: Revolutionizing Open Source Language Models

Published On Wed Nov 27 2024

Multilingual and open source: OpenGPT-X research project releases ...

The large language model of the OpenGPT-X research project is now available for download on Hugging Face. "Teuken-7B" has been trained from scratch in all 24 official languages of the European Union and contains 7 billion parameters. Researchers and companies can leverage this commercially usable open source model for their own artificial intelligence applications.


The OpenGPT-X Consortium

The OpenGPT-X consortium, led by the Fraunhofer Institutes for Intelligent Analysis and Information Systems IAIS and for Integrated Circuits IIS, has developed an AI language model that is open source and has a distinctly European perspective.

Model Features and Benefits

Over two years, the OpenGPT-X project researched the underlying technologies for large AI foundation models and trained models together with leading industry and research partners. The "Teuken-7B" model is freely available and offers a public, research-based alternative for use in academia and industry.

Tokenizer Development

In addition to model training, the OpenGPT-X team addressed research questions related to training and operating multilingual AI language models more efficiently. The project developed a multilingual "tokenizer" to optimize model performance across multiple languages.
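One common way to compare how efficiently a tokenizer handles different languages is to count how many tokens it produces per word: the fewer tokens per word, the less compute a given text needs. The sketch below illustrates that measurement only. It is not the OpenGPT-X tokenizer or the project's evaluation method; `toy_tokenize` is a hypothetical stand-in that cuts words into fixed-size pieces, and the sample sentences are invented for illustration.

```python
from typing import Callable, Iterable, List


def toy_tokenize(text: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in tokenizer: cut every word into fixed-size pieces."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + piece_len] for i in range(0, len(word), piece_len))
    return tokens


def tokens_per_word(texts: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Average number of tokens per whitespace-separated word."""
    n_tokens = 0
    n_words = 0
    for text in texts:
        n_tokens += len(tokenize(text))
        n_words += len(text.split())
    if n_words == 0:
        raise ValueError("no words in input")
    return n_tokens / n_words


# Invented sample sentences, one short list per language code.
samples = {
    "en": ["The model is open source"],
    "de": ["Das Sprachmodell ist quelloffen"],
}
scores = {lang: tokens_per_word(texts, toy_tokenize) for lang, texts in samples.items()}
```

With this toy tokenizer the German sample scores higher than the English one simply because its words are longer. To measure a real tokenizer, pass its own tokenize function in place of `toy_tokenize`.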


Industry Applications and Innovations

The technology developed in OpenGPT-X will give companies a basis for training their own models in the future, so that they can create customized AI solutions for various applications without relying on third-party components.

Collaborative Efforts and Future Outlook

The collaborative efforts of the consortium partners have produced valuable foundational technology in the OpenGPT-X project. The research project, which began in 2022, runs until March 2025, which leaves time for further optimization and evaluation of the models.

Interested developers can download Teuken-7B free of charge from Hugging Face for research or commercial purposes. The model has been optimized for chat applications through "instruction tuning," enhancing its usability in practical scenarios.
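As a rough sketch of what the download could look like in practice, the snippet below uses `snapshot_download` from the `huggingface_hub` Python package. The repository id is an assumption that is not stated in this article, so check the OpenGPT-X page on Hugging Face for the current model names and license terms. Access may also require accepting those terms and passing an access token.

```python
from typing import Optional

# Assumption: repository id not given in the article; verify it on Hugging Face.
DEFAULT_REPO_ID = "openGPT-X/Teuken-7B-instruct-research-v0.4"


def download_model(repo_id: str = DEFAULT_REPO_ID, token: Optional[str] = None) -> str:
    """Download all files of a model repository and return the local folder path."""
    # Imported lazily so the module can be loaded without the dependency installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    # The files are stored in the local Hugging Face cache and reused on later calls.
    return snapshot_download(repo_id=repo_id, token=token)
```

Calling `download_model()` fetches several gigabytes of weights, so it is best run once on a machine with enough disk space. The returned path can then be passed to whichever inference library is in use.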