Decoding the black box: How ChatGPT's creators used AI to explain its behavior

Published on May 13, 2023

ChatGPT's creators use AI to understand their model's behavior

The team behind ChatGPT has attempted to use artificial intelligence to explain how its own models behave. The effort ran into problems, however, including the possibility that the AI relies on concepts humans have no names for and do not understand.

To tackle the "black box" problem with large language models, researchers at OpenAI used the latest version of their model, GPT-4, to try to explain the behavior of an earlier model, GPT-2. Opacity is not only a problem for researchers: without insight into how a model works, it is hard to recognize possible biases or verify that the information it provides is truthful.

Engineers and scientists are working toward a solution through "interpretability research," which involves examining the individual "neurons" that make up the AI system. This is difficult because manual inspection is infeasible for models with billions of parameters, as the sketch below illustrates.
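To make the scale problem concrete, here is a minimal sketch (not OpenAI's tooling; the layer and neuron indices are arbitrary) of what inspecting a single neuron looks like using the open-source GPT-2 weights and the Hugging Face transformers library. It records one neuron's response to one passage, something a human would have to repeat for every neuron in the network.

```python
# Minimal sketch of manual neuron inspection on GPT-2 (illustrative only).
import torch
from transformers import GPT2Tokenizer, GPT2Model

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

LAYER, NEURON = 5, 123  # arbitrary example indices
captured = {}

def hook(module, inputs, output):
    # Output of the MLP's first linear layer: one value per token per neuron.
    captured["acts"] = output.detach()

handle = model.h[LAYER].mlp.c_fc.register_forward_hook(hook)

text = "The quick brown fox jumps over the lazy dog."
tokens = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    model(**tokens)
handle.remove()

# Pair each token with the chosen neuron's (pre-activation) value --
# the raw material a human inspector would have to eyeball, neuron by neuron.
for tok_id, act in zip(tokens["input_ids"][0], captured["acts"][0, :, NEURON]):
    print(f"{tokenizer.decode(int(tok_id))!r:>12}  {act.item():+.3f}")
```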

Researchers at OpenAI tried to automate the process: they built a pipeline in which GPT-4 produces natural-language explanations of a neuron's behavior, and applied it to neurons in the earlier GPT-2 model. Most explanations scored poorly, but the experiment suggested that, with further work, AI could be used to help explain AI.
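At a high level, the published method works in three steps: ask GPT-4 to explain a neuron from examples of its activations, ask GPT-4 to simulate the neuron's activations from that explanation alone, and score how well the simulation matches reality. The sketch below is a simplified, hypothetical rendering of that loop, not OpenAI's actual code: `ask_gpt4` is a placeholder for whatever API call returns a completion, the prompts and the 0-10 activation scale are invented for illustration, and the score shown is a plain Pearson correlation.

```python
# Hypothetical sketch of an explain / simulate / score loop (illustrative only).
from typing import Callable

def explain_neuron(token_acts: list[tuple[str, float]],
                   ask_gpt4: Callable[[str], str]) -> str:
    """Step 1: show GPT-4 tokens with the neuron's activations and ask for
    a short natural-language explanation of what the neuron responds to."""
    shown = "\n".join(f"{tok!r}\t{act:.2f}" for tok, act in token_acts)
    prompt = ("Here are tokens and a neuron's activation on each.\n"
              f"{shown}\n"
              "In one sentence, what does this neuron respond to?")
    return ask_gpt4(prompt)

def simulate_activations(tokens: list[str], explanation: str,
                         ask_gpt4: Callable[[str], str]) -> list[float]:
    """Step 2: ask GPT-4 to predict, from the explanation alone, how strongly
    the neuron would fire on each token (one number per line)."""
    prompt = (f"A neuron is described as: {explanation}\n"
              "For each token below, output its predicted activation (0-10), "
              "one number per line:\n" + "\n".join(tokens))
    return [float(line) for line in ask_gpt4(prompt).splitlines()]

def score(real: list[float], simulated: list[float]) -> float:
    """Step 3: score the explanation by how well the simulated activations
    track the real ones (here, a simple Pearson correlation)."""
    n = len(real)
    mr, ms = sum(real) / n, sum(simulated) / n
    cov = sum((r - mr) * (s - ms) for r, s in zip(real, simulated))
    den_r = sum((r - mr) ** 2 for r in real) ** 0.5
    den_s = sum((s - ms) ** 2 for s in simulated) ** 0.5
    return cov / (den_r * den_s) if den_r and den_s else 0.0
```

A high score would mean the explanation predicts the neuron's behavior well; a low score means the explanation misses what the neuron is actually doing.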

Even so, explaining the system's behavior in plain language remained difficult: a single neuron can represent many distinct concepts, or a concept that humans cannot understand and have no words for. The approach also consumes a great deal of computing power.

ChatGPT's creators hope that their interpretability research will eventually lead to more transparent and trustworthy AI systems.