Zuckerberg Appeared to Know Meta Trained AI on Pirated Library
The AI rush has brought with it thorny questions of copyright and ownership of data as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as they worked to integrate such tools into Facebook and Instagram.
Questionable Data Sources
As first revealed in a motion filed by attorneys for novelists Christopher Golden and Richard Kadrey and comedian Sarah Silverman, Meta's use of Library Genesis, a known "shadow library" of pirated ebooks and PDFs, for training its AI model raised copyright concerns. Despite acknowledging that LibGen was pirated, CEO Mark Zuckerberg reportedly approved its use for training the next iteration of its AI language model, Llama.
Legal and Ethical Concerns
Under scrutiny from a class-action lawsuit, internal dialogues at Meta revealed discussions about leveraging pirated data from LibGen, with employees strategizing on ways to process and filter the data to avoid copyright indicators. The unsealed documents also highlighted the potential legal repercussions and negative media attention Meta could face if its use of pirated data was discovered by external parties.
Meta's Handling of the Situation
Meta's team discussed methods for obtaining the LibGen data set, including concerns about torrenting from corporate laptops. Despite these ethical dilemmas, Meta continued to push for the use of pirated data to improve its AI models, risking backlash and legal action.
Implications for the Tech Industry
The revelations from Meta's internal communications shed light on the challenges tech companies face in navigating copyright laws while advancing AI technology. The class-action lawsuit against Meta could set a precedent for similar suits against AI companies regarding copyright infringement and data usage without permission.
As the tech industry continues to grapple with these ethical and legal challenges, the responsibility to respect copyright and intellectual property rights remains crucial in the development of AI technologies.




















