Unveiling ChatGPT's Struggle with Reproducible Science

With over 100 million users and approximately 1 billion monthly website visits, the quality of the answers ChatGPT provides matters. Large language models have the potential to drive scientific breakthroughs by processing vast amounts of information in seconds and learning from data at a scale and speed unattainable by humans. However, recognizing reproducibility, a core aspect of high-quality science, remains a challenge for them.

Study on ChatGPT and Scientific Reproducibility

Our study investigates the effectiveness of ChatGPT (GPT-3.5) in evaluating scientific reproducibility, a critical and underexplored topic, by analyzing the methods sections of 158 research articles. In our methodology, we asked ChatGPT, through a structured prompt, to predict the reproducibility of a scientific article based on the extracted text from its methods section.
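To make the idea of a structured prompt concrete, the following is a minimal sketch of how extracted methods text might be wrapped in a fixed template before being sent to a model. The function name `build_prompt` and the wording of the template are hypothetical illustrations; they are not the prompt used in the study, which is available in the repository.

```python
def build_prompt(methods_text: str) -> str:
    """Wrap an article's methods section in a fixed, structured prompt.

    The template wording is a hypothetical example, not the study's prompt.
    """
    # Collapse line breaks and repeated spaces left over from text extraction.
    cleaned = " ".join(methods_text.split())
    return (
        "You will be given the methods section of a research article.\n"
        "Question: Based on this text alone, is the study reproducible?\n"
        "Answer with one word (Yes or No), then give a brief justification.\n"
        "\n"
        "Methods section:\n"
        f"{cleaned}\n"
    )


if __name__ == "__main__":
    print(build_prompt("Samples were  incubated\nat 37 C for 24 h."))
```

Keeping the template fixed and varying only the methods text means every article is evaluated under the same instructions, which makes the responses easier to compare.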

Research's latest challenge: Reproducibility of Results

The findings of our study reveal significant limitations. Of the 158 assessed articles, only 18 (11.4%) were accurately classified and 29 (18.4%) were misclassified, while for the remaining 111 (70.3%) ChatGPT had difficulty interpreting key methodological details that influence reproducibility. Future advancements should ensure consistent answers to identical or similar prompts, improve reasoning over technical, jargon-heavy text, and make the model's decision-making more transparent.
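As a quick arithmetic check, the reported percentages can be recomputed from the three counts. This is a small sketch; the category labels and the helper name `percentage_shares` are chosen here for illustration only.

```python
def percentage_shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts as reported in the text; the labels are illustrative shorthand.
counts = {
    "accurately classified": 18,
    "misclassified": 29,
    "interpretation difficulties": 111,
}

if __name__ == "__main__":
    print(sum(counts.values()))       # 158 articles in total
    print(percentage_shares(counts))  # 11.4, 18.4 and 70.3 percent
```

The three counts add up to the 158 analyzed articles, and the rounded shares match the figures quoted above.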

Additionally, we suggest the development of a dedicated benchmark to systematically evaluate how well AI models can assess the reproducibility of scientific articles. This study highlights the continued need for human expertise and the risks of uncritical reliance on AI.

Access the Full Article and Dataset

Our full dataset, including the extracted methods sections and responses from ChatGPT to our prompt, is shown here. It is also accessible through our repository (https://bitbucket.org/nordlinglab/nordlinglab-reprod-chatgpt/).