Meta's Controversial AI Training on Pirated Library Exposed

Zuckerberg Appeared to Know Meta Trained AI on Pirated Library

The AI rush has brought with it thorny questions of copyright and ownership of data as tech companies train bots like ChatGPT on existing texts, but it seems Meta largely brushed these aside as they worked to integrate such tools into Facebook and Instagram.

Zuckerberg Appears to Know Meta Trained AI on Pirated Library

Questionable Data Sources

As first revealed in a motion filed by attorneys for novelists Christopher Golden and Richard Kadrey and comedian Sarah Silverman, Meta's use of Library Genesis, a known "shadow library" of pirated ebooks and PDFs, for training its AI model raised copyright concerns. Despite acknowledging that LibGen was pirated, CEO Mark Zuckerberg reportedly approved its use for training the next iteration of its AI language model, Llama.

Legal and Ethical Concerns

Under scrutiny from a class-action lawsuit, internal dialogues at Meta revealed discussions about leveraging pirated data from LibGen, with employees strategizing on ways to process and filter the data to avoid copyright indicators. The unsealed documents also highlighted the potential legal repercussions and negative media attention Meta could face if its use of pirated data was discovered by external parties.

Meta Secretly Trained Its AI on a Notorious Piracy Database

Meta's Handling of the Situation

Meta's team discussed methods for obtaining the LibGen data set, including concerns about torrenting from corporate laptops. Despite these ethical dilemmas, Meta continued to push for the use of pirated data to improve its AI models, risking backlash and legal action.

Implications for the Tech Industry

The revelations from Meta's internal communications shed light on the challenges tech companies face in navigating copyright laws while advancing AI technology. The class-action lawsuit against Meta could set a precedent for similar suits against AI companies regarding copyright infringement and data usage without permission.

Meta's LibGen controversy reveals how desperate AI companies are ...

As the tech industry continues to grapple with these ethical and legal challenges, the responsibility to respect copyright and intellectual property rights remains crucial in the development of AI technologies.