OpenAI Introduces a New Series of Reasoning Models for Solving Complex Problems
OpenAI has recently unveiled a new series of AI models designed to spend more time reasoning before they respond. These models can tackle intricate tasks and solve challenging problems across domains such as science, coding, and mathematics.
Introduction of the New Series
The first release in this series, o1-preview, is now available in ChatGPT and the OpenAI API. This launch is a preview, with regular updates and enhancements to follow. Evaluations for the next update, currently in development, are also included alongside this release.
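For developers, a call to the preview model through the official OpenAI Python SDK might look like the sketch below. This is a minimal example assuming the standard Chat Completions endpoint; the prompt is hypothetical, and the parameters the preview release accepts may be more restricted than usual.

```python
# Minimal sketch: calling o1-preview through the OpenAI Python SDK's
# Chat Completions endpoint. The prompt is a made-up example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."}
    ],
)

print(response.choices[0].message.content)
```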
Functionality and Performance
These models are trained to spend more time thinking through problems before responding, much as a person would. Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes.
In evaluations, the upcoming model update performs comparably to PhD students on challenging benchmark tasks in physics, chemistry, and biology, and it also excels in mathematics and coding. For example, in a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Its coding ability was also tested in competition, where it reached the 89th percentile in Codeforces contests. Further technical insights are available in our research post.
Safety Measures
With the introduction of these new models, OpenAI has developed a new safety training approach that leverages their reasoning abilities to keep them aligned with safety guidelines. Because the models can reason about the safety rules in context, they apply those rules more reliably.
One aspect of safety verification involves assessing how well the model adheres to safety rules when faced with attempts to circumvent them (referred to as "jailbreaking"). In a challenging jailbreaking test scenario, GPT-4o scored 22 out of 100, while the o1-preview model achieved a significantly higher score of 84. Further details are provided in the system card and associated research materials.
Target Audience
The enhanced reasoning capabilities of these models are particularly useful for people working on complex problems in science, coding, mathematics, and related fields. Healthcare researchers can use o1 to annotate cell sequencing data, physicists can use it to generate the intricate mathematical formulas needed for quantum optics, and developers across sectors can use it to build and execute multi-step workflows, as sketched below.
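As an illustration of that last use case, a simple multi-step workflow can be built by chaining API calls and feeding one response into the next prompt. The sketch below is a hypothetical example using the OpenAI Python SDK and the o1-preview model name mentioned above; the task and prompts are invented for illustration, not an OpenAI-prescribed pattern.

```python
# Illustrative two-step workflow with o1-preview: first ask for a plan,
# then feed the plan back and ask for detail on the first step.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single user message to o1-preview and return the reply text."""
    response = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: have the model outline an analysis plan (hypothetical task).
plan = ask("Outline a step-by-step plan for annotating a cell sequencing dataset.")

# Step 2: pass the plan back and ask for the first step in detail.
detail = ask(f"Given this plan:\n{plan}\n\nExpand step 1 into concrete instructions.")

print(detail)
```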
OpenAI o1-mini
In addition to o1-preview, OpenAI is introducing o1-mini, which excels at generating and debugging complex code efficiently. This streamlined model offers a cost-effective option for developers: it is 80% cheaper than o1-preview while remaining highly effective at coding tasks.
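For a coding task such as debugging, swapping the model name in the same kind of API call is presumably all that changes. The sketch below assumes the o1-mini model name is accepted by the Chat Completions endpoint; the buggy snippet is a made-up example.

```python
# Sketch: pointing the same Chat Completions call at o1-mini for a
# code-debugging task. Only the model name differs from the o1-preview example.
from openai import OpenAI

client = OpenAI()

buggy_code = """
def mean(values):
    return sum(values) / len(values)  # crashes on an empty list
"""

response = client.chat.completions.create(
    model="o1-mini",
    messages=[
        {"role": "user", "content": f"Find and fix the bug in this function:\n{buggy_code}"}
    ],
)

print(response.choices[0].message.content)
```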
Utilizing OpenAI o1
ChatGPT Plus and Team users can now access the o1 models in ChatGPT. Both o1-preview and o1-mini can be selected manually from the model picker, with initial weekly rate limits of 30 messages for o1-preview and 50 for o1-mini. OpenAI is working to raise these limits and to let ChatGPT automatically pick the most suitable model for a given prompt.