Challenging Google's Narrative: Examining Gemini's Data-Analyzing Skills

Gemini's data-analyzing abilities aren't as good as Google claims

One of the selling points of Google's flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their "long context," like summarizing multiple hundred-page documents or searching across scenes in film footage.

But new research suggests that the models aren't, in fact, very good at those things. Two separate studies investigated how well Google's Gemini models and others make sense of enormous amounts of data (think "War and Peace"-length works).


Data Analysis Challenges

Both studies find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40%-50% of the time.

"While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don't actually 'understand' the content," Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Challenges with Context

A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). The newest versions of Gemini can take in upward of 2 million tokens as context, which is the largest context of any commercially available model.
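
To make the scale of a 2 million-token window concrete, here is a minimal sketch that estimates whether a document would fit. It assumes roughly four characters per token, a common rule of thumb for English text rather than anything Gemini's actual tokenizer guarantees, and the function names and constants are hypothetical, chosen for illustration only.

```python
import math

# Assumption: roughly 4 characters per token, a rule of thumb for English text.
# Real tokenizers vary by model and language, so treat this as an estimate only.
CHARS_PER_TOKEN = 4.0

# The 2 million-token figure comes from the article; it is not read from any API.
CONTEXT_LIMIT_TOKENS = 2_000_000


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return a rough token count for text based on its character length."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, limit: int = CONTEXT_LIMIT_TOKENS) -> bool:
    """Return True if the estimated token count is within the context limit."""
    return estimate_tokens(text) <= limit


if __name__ == "__main__":
    # Hypothetical long document: 600,000 repetitions of a 5-character string.
    sample = "word " * 600_000
    print(estimate_tokens(sample), fits_in_context(sample))
```

A check like this only says whether the input fits in the window. As the studies discussed here suggest, fitting in the window is not the same as the model answering questions about that input correctly.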


In a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini's long-context capabilities. However, the two studies highlight significant limitations in Gemini's data-analyzing abilities.

Limitations in Understanding Complex Information

Test results showed that Gemini models struggled with verifying claims that require considering larger portions of the text or even the entire document. The studies also revealed difficulties in verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.

Challenges with Video Analysis

The studies also tested the ability of Gemini 1.5 Flash to "reason over" videos. Results indicated that Flash had difficulty transcribing and recognizing objects in images, pointing to further limitations in its data analysis capabilities.

Both studies raise questions about the efficacy of Google's Gemini models and shed light on the discrepancies between Google's claims and the actual performance of these AI systems.

Generative AI technology, in general, is facing increased scrutiny as businesses and investors become more aware of the limitations and challenges associated with these advanced AI models.