Apple shows off open AI prowess: new models outperform Mistral ...
Introduction
Just like DCLM-7B, the smaller 1.4B version of the model, trained jointly with Toyota Research Institute on 2.6 trillion tokens, also delivers strong performance on the MMLU, Core, and Extended benchmarks.
Open Source Initiative
Notably, the project was made truly open source: the release includes the model weights, the training code, and the pretraining dataset.
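For readers who want to try the released weights, a minimal sketch of loading them with the Hugging Face transformers library is shown below. The repository identifier apple/DCLM-7B and the use of trust_remote_code are assumptions for illustration, not details confirmed by this article.

```python
# Minimal sketch: load the released DCLM-7B weights and generate text.
# Assumptions (not stated in the article): the weights are hosted on the
# Hugging Face Hub under an identifier like "apple/DCLM-7B" and ship
# custom model code, hence trust_remote_code=True.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "apple/DCLM-7B"  # assumed repository identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```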
Data Curation Techniques
To demonstrate the effectiveness of the curation technique, the resulting dataset, DCLM-Baseline, was used to train the new DCLM decoder-only transformer English-language models, with 7 billion and 1.4 billion parameters, from scratch.
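Because the pretraining dataset itself is part of the release, it can be inspected directly. The short sketch below streams a few documents with the Hugging Face datasets library; the dataset identifier mlfoundations/dclm-baseline-1.0 and the "text" field name are assumptions for illustration, not stated in this article.

```python
# Minimal sketch: peek at the released DCLM-Baseline pretraining data.
# Assumptions (not stated in the article): the dataset is on the Hugging
# Face Hub under an identifier like "mlfoundations/dclm-baseline-1.0"
# and each record carries a "text" field.
from datasets import load_dataset

ds = load_dataset("mlfoundations/dclm-baseline-1.0", split="train", streaming=True)

# Print the start of the first few documents without downloading the corpus.
for i, example in enumerate(ds):
    print(example.get("text", "")[:200])
    if i >= 2:
        break
```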