Authors Escape OpenAI Bid for Entirety of ChatGPT Testing Data
Authors accusing OpenAI Inc. of copyright infringement persuaded a federal judge to partly overturn an order requiring them to share all of the methodology and data they used to test the flagship ChatGPT chatbot in preparation for their lawsuit.
The author-plaintiffs still must disclose the prompts, outputs, and account settings that produced the results they said in their complaint demonstrate infringement, but they won’t have to turn over testing data for queries that didn’t improperly reproduce their works, Judge Araceli Martinez-Olguin wrote in an order issued Thursday in the US District Court for the Northern District of California.
Lawsuit Details
Comedian Sarah Silverman and a dozen other authors sued OpenAI in June 2023, claiming OpenAI trained ChatGPT by copying hundreds of thousands of books without the authors’ permission. Their March 2024 amended complaint included examples of prompts asking ChatGPT to summarize in detail various parts of their writing, along with the chatbot’s responses.
As part of the discovery process, OpenAI requested documents about the OpenAI accounts, prompts, and outputs the authors used, and documentation of the authors’ methodology. The authors offered to produce full threads of the prompts and outputs that elicited the complaint’s examples but balked at providing other materials.
Discovery Process
OpenAI has made similar demands in other copyright suits against it, including one brought by New York Times Co. The AI company accused the Times of “prompt hacking” to obtain the results cited in its complaint, which the newspaper disputed in a filing.
The authors “offered up only their preferred, cherry-picked results,” OpenAI argued, and asked the court to require the authors to produce the entirety of their test results. In June, Magistrate Judge Robert M. Illman sided with OpenAI and said the authors “cannot avoid the notion that by placing a large subset of these facts” in the complaint, they “have waived the ability to assert work product protection.”
Court Rulings
Martinez-Olguin disagreed, saying the authors’ testing qualified as “virtually undiscoverable” opinion work product because “the ChatGPT prompts were queries crafted by counsel and contain counsel’s mental impressions and opinions about how to interrogate ChatGPT, in an effort to vindicate Plaintiffs’ copyrights against the alleged infringements.”
Latham & Watkins LLP; Morrison & Foerster LLP; and Keker, Van Nest & Peters LLP represent OpenAI. Joseph Saveri Law Firm LLP and Matthew Butterick represent the authors. Cafferty Clobes Meriwether & Sprengel LLP also represents author Michael Chabon.
The case is Tremblay v. OpenAI Inc., N.D. Cal., No. 23-cv-03223, order 8/8/24.










