How DataGemma Addresses AI Hallucination With Google's Data Commons
The large language models (LLMs) behind today's AI advances are becoming increasingly sophisticated. They can sift through vast amounts of text to produce summaries, suggest new creative directions, and even draft code. However, generative AI has a persistent weakness: "hallucination," where a model confidently presents false information.
Google has published research aimed at this problem: grounding LLMs in real-world statistical data from Google's Data Commons, which helps reduce hallucinations and makes the models more reliable. Alongside that research, Google has released DataGemma, which it describes as the first open models designed to connect LLMs with the extensive real-world data in Data Commons.
The Role of Data Commons
Data Commons is a publicly available knowledge graph with over 240 billion data points across hundreds of thousands of statistical variables. It aggregates data from trusted organizations such as the World Health Organization (WHO), the United Nations (UN), the Centers for Disease Control and Prevention (CDC), and national census bureaus. This consolidated repository helps researchers, policymakers, and organizations find accurate figures without collecting them from each source separately.
Think of Data Commons as a vast library of trustworthy data on topics ranging from economics and health to demographics and the environment. The data can be queried in plain language through Google's AI-powered natural language interface.
Enhancing AI Models with DataGemma
Google aims to make generative AI more useful by integrating Data Commons into Gemma, its family of lightweight open models built from the same research and technology as the Gemini models. DataGemma extends Gemma with access to Data Commons, giving researchers and developers open models they can ground in that data.
Future Prospects
Google is committed to further refining these approaches to enhance the capabilities of Gemma and Gemini models. The ongoing research aims to make LLMs more reliable and trustworthy, ultimately promoting informed decision-making and a deeper understanding of the world around us.
If you are a researcher or developer interested in exploring DataGemma, you can start with the quickstart notebooks provided for its two grounding techniques: Retrieval-Interleaved Generation (RIG) and Retrieval-Augmented Generation (RAG).
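To make the difference between the two techniques concrete, here is a minimal sketch, not DataGemma's actual implementation. Broadly, RIG checks the statistics in a model's draft against Data Commons as the answer is produced, while RAG retrieves relevant statistics first and hands them to the model as context. The sketch assumes a tiny in-memory dictionary in place of Data Commons and plain string handling in place of a model. Every name in it (STATS, lookup, rig_annotate, rag_prompt, the marker format) is hypothetical, and the figures describe a fictional place.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for Data Commons: made-up figures for a fictional place.
STATS = {
    "population of exampleland": "1,250,000",
    "median age in exampleland": "34.2",
}


def lookup(question: str) -> Optional[str]:
    """Return the stored value for a question, or None if it is unknown."""
    return STATS.get(question.strip().lower().rstrip("?"))


# Illustrative marker a model might emit next to a statistic it is unsure of:
#   [DC("<question>") -> "<model's own guess>"]
MARKER = re.compile(r'\[DC\("([^"]+)"\)\s*->\s*"([^"]*)"\]')


def rig_annotate(draft: str) -> str:
    """RIG-style pass: swap each marked guess for the trusted value, if any."""
    def fix(match):
        question, model_guess = match.group(1), match.group(2)
        trusted = lookup(question)
        # Keep the model's guess when the data store has no answer.
        return trusted if trusted is not None else model_guess
    return MARKER.sub(fix, draft)


def rag_prompt(user_question: str, sub_questions: List[str]) -> str:
    """RAG-style pass: fetch statistics first, then build a grounded prompt."""
    facts = []
    for q in sub_questions:
        value = lookup(q)
        if value is not None:
            facts.append(f"- {q}: {value}")
    context = "\n".join(facts) if facts else "- (no matching statistics found)"
    return f"Use only these statistics:\n{context}\n\nQuestion: {user_question}"


if __name__ == "__main__":
    draft = 'Exampleland has [DC("population of Exampleland") -> "1,100,000"] residents.'
    print(rig_annotate(draft))
    print(rag_prompt("How big is Exampleland?", ["Population of Exampleland?"]))
```

In this toy version the RIG pass corrects a draft after the fact, whereas the RAG pass shapes the prompt before any text is generated. That is the essential trade-off: RIG needs the model to flag its own statistics, and RAG needs a good guess at which statistics the question requires.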