Meta's AI Scandal: Pirated Books Used with CEO's Approval

Published On Fri Jan 10 2025

Meta Knew It Used Pirated Books to Train AI, Authors Say

Meta Platforms used pirated versions of copyrighted books to train its artificial intelligence systems with approval from its CEO Mark Zuckerberg, a group of authors alleged in newly disclosed court papers. The authors sued Meta in 2023, arguing that the tech giant misused their books to train its large language model Llama.

The accusations were made by authors including Ta-Nehisi Coates and comedian Sarah Silverman, who are suing Meta for copyright infringement. They said internal documents produced by Meta during the discovery process showed that the company knew the works it was using were pirated.


The case is one of several alleging that copyrighted works were used to develop AI products without permission. Defendants have argued that they made fair use of the copyrighted material.

New Allegations

The authors asked the court for permission to file an updated complaint based on new evidence. They claimed that Meta trained its AI on the LibGen dataset, which allegedly includes millions of pirated works, and that Meta distributed the dataset through peer-to-peer torrents.

According to the authors, internal Meta communications showed that Zuckerberg approved the use of the LibGen dataset despite concerns within Meta's AI executive team that it was pirated.

The authors argued that these communications show Meta knew its AI training data was pirated, which could strengthen their case.

Legal Proceedings

Last year, U.S. District Judge Vince Chhabria dismissed claims that text generated by Meta's chatbots infringed the authors' copyrights and that Meta unlawfully stripped their books' copyright management information (CMI). However, the authors argued that the new evidence justified reviving their CMI claim and adding a new computer fraud claim.