OpenAI's New Models: o1-preview, o1-mini - A Paradigm Shift in AI Technology

Published On Fri Sep 13 2024

New OpenAI Models: o1-preview, o1-mini. New Means Better? | by Blazej Kunke

Today, OpenAI, the creator of the well-known ChatGPT, released two new models. They take more time to work through difficult problems that previous models struggled with, trading speed for depth. What kind of problems? On its website, OpenAI boasts that the new models handle mathematics and programming remarkably well. According to OpenAI, in a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o (the previous best model) correctly solved only 13% of the problems, while the new reasoning model scored 83%. Note that OpenAI attributes the 83% figure to the full o1 model, of which o1-preview is an early version.

This is a completely new approach, moving away from instant answers toward calmer, more deliberate analysis. To me, it’s like talking to an expert: an older person who needs a moment to think but then gives a more nuanced answer. The previous models were more like a student at an oral exam, blurting out something quickly and confidently. But was it accurate? Who knows.

However, we should exercise caution here. It’s too early to draw conclusions about a model released only a few hours ago, and I’m not claiming that answers that take longer to generate are necessarily less error-prone than those from previous models.

Nomen non est omen

So I decided to test the two new models against the previous best one, GPT-4o, by asking all three the same question:

How will raising the minimum wage in Poland by 500 PLN affect the labor market, inflation, interest rates, and Poland’s public debt?

The responses I received differed, though less dramatically than I had expected. A proper evaluation would call for the opinions of labor market and public finance specialists, so I’ll refrain from a detailed assessment; the responses seem broadly similar in quality. I will, however, point out their length.

Nomen non est omen: the model with “mini” in its name gave me the longest answer, over 30% longer than GPT-4o’s. o1-preview’s answer, meanwhile, was 10% longer than GPT-4o’s.

It’s important to keep the cons in sight as well: both new models have certain limitations compared to the previous version. This brings me back to the question I posed in the title: does new necessarily mean better? Maybe. In this single example, the quality and speed of the responses didn’t differ significantly (I didn’t measure speed precisely, though that’s a good idea for future articles). Unfortunately, the new models are weaker than the previous one in two key respects: they can neither browse the web for answers nor accept files such as PDFs or images as part of a prompt.
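The length figures above (o1-mini over 30% longer than GPT-4o, o1-preview 10% longer) are relative differences measured against GPT-4o’s answer. A minimal Python sketch of that calculation follows. The article doesn’t say whether length was counted in words or characters, so this sketch assumes words; the function names and the placeholder texts are hypothetical and are not the real model responses.

```python
def relative_length(answer: str, baseline: str) -> float:
    """Percentage by which `answer` is longer (positive) or shorter (negative)
    than `baseline`, counting whitespace-separated words (an assumption)."""
    base_words = len(baseline.split())
    if base_words == 0:
        raise ValueError("baseline answer is empty")
    return 100.0 * (len(answer.split()) - base_words) / base_words


def compare_to_baseline(answers: dict, baseline_model: str) -> dict:
    """Relative length (rounded to 0.1%) of every model's answer versus the baseline model's."""
    baseline = answers[baseline_model]
    return {
        model: round(relative_length(text, baseline), 1)
        for model, text in answers.items()
        if model != baseline_model
    }


if __name__ == "__main__":
    # Hypothetical placeholder texts of 100, 131 and 110 words, NOT the actual responses.
    answers = {
        "gpt-4o": "word " * 100,
        "o1-mini": "word " * 131,
        "o1-preview": "word " * 110,
    }
    print(compare_to_baseline(answers, "gpt-4o"))
```

Run as a script, it prints `{'o1-mini': 31.0, 'o1-preview': 10.0}` for the made-up word counts; pasting in the three real answers would give the actual percentages under the word-count assumption.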

What’s Next?

At the time of writing, OpenAI’s newest models haven’t been live for even a full day. Further testing and use will show whether they prove successful and whether their limitations will discourage most users. They may turn out to be a good fit for programmers or geneticists, while most other users might return to the tried-and-true GPT-4o.

OpenAI promises to keep developing this model, add new features, and make it more widely available, even to free users.

Do you have a question or comment? Feel free to join the discussion on my website and social media: Threads, X, LinkedIn. For those interested in a deeper analysis, I’m happy to share the models’ responses.