The Rise of AI Training from Social Platforms

AI is learning from what you said on Reddit, Stack Overflow or ...

CAMBRIDGE, Mass. (AP) — Post a comment on Reddit, answer coding questions on Stack Overflow, edit a Wikipedia entry, or share a baby photo on your public Facebook or Instagram feed and you are also helping to train the next generation of artificial intelligence.

Not everyone is OK with that — especially as the same online forums where they've spent years contributing are increasingly flooded with AI-generated commentary mimicking what real humans might say. Some longtime users have tried to delete their past contributions or rewrite them into gibberish, but the protests haven't had much effect. A handful of governments — including Brazil's privacy regulator on Tuesday — have also tried to step in.

Platform Responses

Platforms are responding — with mixed results. Take Stack Overflow, the popular hub for computer programming tips. First, it banned ChatGPT-written responses due to frequent errors, but now it's partnering with AI chatbot developers and has punished some of its own users who tried to erase their past contributions in protest.

Community Concerns

Software developer Andy Rotering of Bloomington, Minnesota, has used Stack Overflow daily for 15 years and said he worries the company “could be inadvertently hurting its greatest resource” — the community of contributors who’ve donated time to help other programmers.

Stack Overflow CEO Prashanth Chandrasekar said the company is trying to balance rising demand for instant chatbot-generated coding assistance with the desire for a community “knowledge base” where people still want to post and “get recognized” for what they've contributed.

Challenges Ahead

Chandrasekar readily likens Stack Overflow's challenges to one of the “case studies” he learned about at Harvard Business School: how a business survives, or doesn't, after a disruptive technological change.

For more than a decade, users typically landed on Stack Overflow after typing a coding question into Google, then found the answer and copied and pasted it. The answers they were most likely to see came from volunteers who'd built up points measuring their credibility, which in some cases could help land them a job.

Now programmers can simply ask an AI chatbot — some of which are already trained on everything ever posted to Stack Overflow — and it can instantly spit out an answer.

Adapting to AI

ChatGPT's debut in late 2022 threatened to put Stack Overflow out of business. So Chandrasekar carved out a special 40-person team at the company to speed up the launch of its own specialized AI chatbot, called Overflow AI.

That kind of strategy makes sense but may have come too late, said Maria Roche, an assistant professor at Harvard Business School. “I’m surprised that Stack Overflow wasn’t working on this earlier,” she said.

Global Response

Brazil’s national data protection authority on Tuesday took action to ban social media giant Meta Platforms from training its AI models on the Facebook and Instagram posts of Brazilians. It established a daily fine of 50,000 reais ($8,820) for non-compliance.

Meta in a statement called it a “step backwards for innovation” and said it has been more transparent than many industry counterparts doing similar AI training on public content, and that its practices comply with Brazilian laws.

Privacy Concerns

Meta has also encountered resistance in Europe, where it recently put on hold its plans to start feeding people’s public posts into training AI systems. That training was supposed to start last week. In the U.S., where there's no national law protecting online privacy, such training is likely already happening.

Reddit's Approach

Reddit has taken a different approach — partnering with AI developers like OpenAI and Google while also making clear that content can't be taken in bulk without the platform’s approval by commercial entities “with no regard for user rights or privacy.” The deals helped bring Reddit the money it needed to debut on Wall Street in March, with investors pushing the value of the company close to $9 billion seconds after it began trading on the New York Stock Exchange.

Reddit hasn't tried to punish users who protested — nor could it easily do so given how much say voluntary moderators have on what happens in their specialty forums known as subreddits.

But what worries Gilbert, who helps moderate the “AskHistorians” subreddit, is the increasing flow of AI-generated commentary that moderators must decide whether to allow or ban.

“People come to Reddit because they want to talk to people, they don’t want to talk to bots,” Gilbert said. “There’s apps where they can talk to bots if they want to. But historically Reddit has been for connecting with humans.”