YouTube vs OpenAI: The Ethics of AI Training

Published On Fri Jun 07 2024

Using YouTube videos to train OpenAI's text-to-video generator would violate the platform's terms of service, YouTube Chief Executive Officer Neal Mohan said. In his first public remarks on the topic, Mohan said he had no firsthand knowledge of whether OpenAI had, in fact, used YouTube videos to refine its artificial intelligence-powered video creation tool, called Sora. But if it had, that would be a "clear violation" of YouTube's terms of use, he said.

"From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations," Mohan said Thursday in an interview with Emily Chang, host of Bloomberg Originals. "One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform."

Debate Over AI Training

There has been much public debate over what material OpenAI uses to train the AI models underlying popular content creation products such as ChatGPT and DALL-E. Sora and other generative AI tools work by ingesting large amounts of content from around the web and using that data as the foundation for generating new content, including videos, photos and narrative text. As OpenAI, Google and other companies race to develop more powerful artificial intelligence, they are seeking as much content as possible to train their models and produce higher-quality results. Google and YouTube are units of Alphabet Inc.

Content Licensing and AI Training

Mohan said Google abides by YouTube's individual contracts with creators when deciding whether to use videos from the platform to train the company's own AI model, Gemini. "Lots of creators have different sorts of licensing contracts in terms of their content on our platform," Mohan said. Though "some portion of that YouTube corpus may be being used" to train models like Gemini, Google and YouTube ensure that using the videos as training data for Google's AI is "in concert with whatever the terms of service or the contract that that creator has signed" beforehand, he said.