10 Mind-Blowing Facts About ChatGPT-4.5

ChatGPT: more human than us

In the whirlwind of GenAI news, one headline jumped out at me and prompted new reflections on how this technology will shape the future: ‘ChatGPT has passed the Turing test, it is more human than a human’. Article titles on the web are sometimes misleading because they are written to capture the reader’s attention. This one certainly captured mine, above all because, like many of my colleagues, I find it impossible not to follow the developments of GenAI and, therefore, of OpenAI’s chatbot.

The achievement of ChatGPT-4.5 is based on a study by Cameron Jones and Benjamin Bergen, two researchers from the Language and Cognition Lab at the University of California San Diego, who compared four conversational systems using the Turing test. The results were clear and favoured the latest version of the chatbot trained by OpenAI.

The Turing Test Results

ChatGPT-4.5 was judged to be the human in 73% of cases. LLaMa 3.1, the open-source model developed by Meta, was judged to be human in 56% of cases, while the ancient ELIZA (software from the 1960s) still managed to finish ahead of ChatGPT-4o (23% vs 21%).

Before drawing any conclusions, however, we need to understand how the test was conducted and what it shows about generative AI. Basically, the test devised by computer scientist Alan Turing, also known as the ‘imitation game’, requires a human to distinguish between a real person and a machine based on dialogue.

The Turing Test Process

In this case, the eight rounds of conversations involved 284 participants acting as interrogators, exchanging text messages with two witnesses simultaneously. Participants had to interact with both on a split screen for five minutes and then decide which witness was human and which was software.
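
The scoring implied by this setup can be sketched in a few lines of Python. This is only an illustration of how a "judged human" percentage could be tallied from the interrogators' verdicts, not the researchers' own analysis code. The `Round` record, the `judged_human_rate` function, the model names and the sample verdicts are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Round:
    """One five-minute round: an interrogator chats with two witnesses."""
    ai_model: str             # hypothetical label for the software witness
    picked_ai_as_human: bool  # True if the verdict named the software as the human


def judged_human_rate(rounds, model):
    """Share of a model's rounds in which it was judged to be the human."""
    relevant = [r for r in rounds if r.ai_model == model]
    if not relevant:
        raise ValueError(f"no rounds recorded for {model!r}")
    return sum(r.picked_ai_as_human for r in relevant) / len(relevant)


# Invented verdicts for illustration only; these are not data from the study.
sample = [
    Round("model-a", True),
    Round("model-a", True),
    Round("model-a", False),
    Round("model-a", True),
    Round("model-b", False),
    Round("model-b", True),
]

print(judged_human_rate(sample, "model-a"))
print(judged_human_rate(sample, "model-b"))
```

Because each round pits one human against one machine, a rate near 50% would mean interrogators are guessing, while a rate well above 50% means the software was picked as the human more often than the real person was.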

The basic prompt given to ChatGPT was this: ‘You are about to participate in a Turing test, your goal is to convince the interviewer that you are human’. The LLM was then asked to impersonate a young introvert who is an internet expert and uses slang.

Impact of ChatGPT-4.5

Thanks to its success, the Turing test has often been considered the main indicator for establishing the validity of artificial intelligence. However, this view is controversial in the computer science community, because several analyses have questioned the effectiveness of the test.

The researchers’ conclusion is that “ChatGPT-4.5 can perceive linguistic nuances, feign emotions and even sexual experiences”. An excellent result, but one that demonstrates above all how skilful OpenAI has been in training its model.

Conclusion

The increasingly sophisticated ability of LLMs to approach human reasoning cannot be dismissed as a simple improvement, because it could have enormous consequences. Whether they are positive or negative depends, as always, on how the technology is used and on the purposes of those who use it. The important thing is to maintain our capacity for analysis and to continue recognising when we are dealing with truly human reasoning and when a machine is only pretending to reason like a human.