Facebook posts from as far back as 2007 being used to train Meta AI
Text and photos posted publicly to Facebook and Instagram from as far back as 2007 are being used to train Meta's artificial intelligence models, a parliamentary committee has heard.
Representatives from Meta, the owner of Facebook and Instagram, were pressed on their AI models during a Senate select committee hearing on Wednesday. Melinda Claybaugh, global privacy policy director at Meta, confirmed that unless a person had set their profile or posts to private, their social media posts were being used to train Meta's AI models.
Greens Senator David Shoebridge pressed Ms. Claybaugh on the issue. "The truth of the matter is unless you had consciously set those posts to private since 2007, Meta has just decided it will scrape all of the photos and all of the text from every public post on Instagram or Facebook that Australians have shared since 2007 unless there was a conscious decision to set them in private. That is actually the reality, isn't it?" he said.
"Correct," Ms. Claybaugh said.
Privacy Concerns
Ms. Claybaugh attempted to continue her answer before committee chair and Labor Senator Tony Sheldon interjected, suggesting Meta's 25,000-word privacy statement did not give enough detail about how people can protect their information.
Ms. Claybaugh said she wanted to make clear Meta did not use data from accounts of people under 18 years of age to train its models.
Ethical Concerns
Senator Shoebridge later returned to the issue, asking the representatives whether there was an ethical issue to be addressed.
"When a young mother celebrated the birth of their child in 2010 and put a post up on Facebook and then celebrated their daughter's fifth-year birthday in 2015 ... she would never have contemplated that Meta was going to scrape those photos, scrape the text, take the name of her daughter, the photograph of her daughter and feed it into an AI model," he said.
Ms. Claybaugh said Meta had a layered approach to privacy to ensure someone's personal data was not memorised and "spit out" by the generative AI product.
Opt-In Process
Labor Senator Varun Ghosh later asked Simon Milner, vice president of public policy APAC at Meta, whether Meta should instead offer an opt-in process before using people's data to train AI models.
"A compulsory opt-in at all times would be extremely annoying for most people across the internet. We know that for a fact," Mr. Milner said.
Copyrighted Works Issue
The committee also raised the issue of copyrighted works being used to train Meta's AI model.
Hundreds of thousands of pirated books, including works by Australian authors, have allegedly been used to train Meta's AI models, drawn from a dataset known as Books3.
Senator Sheldon later issued a statement saying the issue made "a mockery of the entire existence of copyright protections".
He labelled Meta's use of people's information as "dishonest" and "predatory."
"Meta must think we're mugs if they expect us to believe someone uploading a family photo to Facebook in 2007 consented to it being used 17 years later to train AI technology that didn't even exist at the time," Senator Sheldon said.