Is Your Community Selling You Out to AI?


On May 6, Stack Overflow, a community-powered question-and-answer site for programmers, announced that it had reached a deal with OpenAI to give the company broad access to its fifteen-year catalog of questions and answers.

To say that the announcement did not go over well would be an understatement.

Users began to rebel, with some attempting to deface or edit their previous answers. One user, identified as Ben, said he was suspended for seven days after replacing some of his most popular answers with a protest message. His original answers were restored without his permission.

As of this writing, the post announcing the partnership on the sister site Stack Exchange has a score of nearly negative 800.

However, those upset about the new policy have precious little recourse. According to a report by Benj Edwards at Ars Technica, Stack Overflow owns user posts, and users cannot revoke permission for any content they contribute. They can stop participating moving forward, but there's little they can do about what they have already posted.

For many, what makes this new partnership sting is that it feels like a betrayal by Stack Overflow. The site had previously banned generative AI use in answers but partially reversed that decision in June 2023.

Moderators were also worried that generative AI systems were siphoning off the site's audience as programmers turned to AI, rather than Stack Overflow, to answer their questions. This helped cement AI's status as an enemy of, and a threat to, the site.

But, while Stack Overflow may be the most recent example of a community selling its users out to AI companies, it isn't the first and won't be the last. In fact, there's a surprisingly rich history of it already.

A Brief History of Communities and AI

To be clear, Stack Overflow isn’t the first or the biggest community to encounter these issues.

On February 21 this year, the social media site Reddit announced it had reached a deal with Google to make its content available to train its AI systems. This led to calls for users to delete or edit their posts to prevent AI systems from accessing them.

Before that, in September 2023, Meta drew criticism for using public Facebook and Instagram posts to train its AI systems. Though the company said the “vast majority” of the content used was publicly available, it hinted that wasn't true for everything.

A few months earlier, in June 2023, Adobe announced Firefly, an “ethically trained” generative AI system for images. Since the system was trained solely on licensed photos, Adobe even offered indemnification for corporate users.

However, members of the Adobe Stock community, whose images were used for the training, accused Adobe of using their images without explicit permission. They also raised concerns it would “cannibalize” the platform and harm their business overall.

Finally, the community art website DeviantArt became the center of controversy in November 2022 after it launched a new AI art tool, DreamUp. Between the lack of clear consent and a deficient opt-out tool, the launch drew significant community backlash.

These are just a few examples of large communities using member content to train AI systems. While they are among the bigger stories, they aren't the only ones, and more are certain to follow.

Simply put, AI is big business and too big for some communities to ignore.

Following the Money

The issue is simple: Training an AI system requires a large amount of content.

Historically, AI companies have simply scraped internet content for that purpose. However, that practice has led to dozens of lawsuits against AI companies, with creators arguing that such unauthorized use is a copyright infringement.
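To make that concrete, here is a minimal sketch of the kind of bulk scraping at issue: fetch a page, strip the markup, and keep the visible text for a training corpus. It is an illustration under assumed details (the URL and the cleanup rules are placeholders), not any particular company's pipeline.

```python
# Minimal illustration of bulk text scraping for AI training data.
# The target URL and the cleanup rules are placeholder assumptions.
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible page text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def scrape_page(url):
    """Fetch one page and return its visible text."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)

# A real pipeline would loop this over millions of URLs; one page
# is shown here for illustration.
print(scrape_page("https://example.com/"))
```

Scaled across millions of pages, that simple loop yields exactly the kind of corpus at issue in those lawsuits.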

But even without the lawsuits, large sites and communities have worked to stymie mass scraping. In June 2023, Reddit altered its API policy, drawing significant protests due to the impact on third-party apps. Just this week, X (formerly Twitter) lost a lawsuit it had filed against a data-scraping company in a bid to prevent such content extraction.
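One mechanism behind that pushback is robots.txt, the file a site uses to tell crawlers what they may fetch. Below is a small sketch of how a compliant crawler would check it before scraping; the bot name is a made-up placeholder, and the live answer depends on the site's current policy.

```python
# Checking a site's robots.txt before crawling. "ExampleAIBot" is a
# hypothetical user agent; the result depends on the site's live policy.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://www.reddit.com/robots.txt")
rp.read()

allowed = rp.can_fetch("ExampleAIBot", "https://www.reddit.com/r/programming/")
print(f"Fetch allowed for ExampleAIBot: {allowed}")
```

Of course, robots.txt is only advisory, which is part of why sites have turned to API restrictions and litigation instead.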

It might be tempting to credit these sites with protecting user content, but their actions and words show otherwise. These moves are attempts to claw back control of user content in a bid to get paid for it.

AI companies and their customers are wary of the legal issues that AI scraping and training have generated. One way to defend against that risk is to license large content libraries, and sites like Reddit and Stack Overflow have some of the largest in the world.

When you consider the amount of money being invested in AI, the problem becomes clear. Sites with large libraries of user-generated content have suddenly become very valuable. Given the past struggles many have had monetizing such content, AI becomes a tempting solution.

However, for users and community contributors, this is cold comfort. Members trust that their communities will not abuse their contributions. Though the terms of service may sign away a swath of rights, members expect the community to respect their wishes.

But, thanks to the allure of AI money and opportunities, that expectation has become less and less realistic.

This has led to a growing divide between communities eager to take advantage of AI opportunities and AI-wary users.

Bottom Line

In the end, communities like Stack Overflow, Reddit and DeviantArt face a simple decision: either strike deals with AI companies or miss out on a lucrative opportunity. The decision is complicated by the ongoing legal battles over AI and the fact that AI companies have likely already scraped that data.

However, communities must remember that they are in a position of trust. Many of their members see such a deal as a breach of that trust.

Communities live, thrive and die on the goodwill of their members. Trading that goodwill for profits is a risky move.

If a community is going to enter into such a deal, proper outreach to address the community’s concerns is important. Communities should openly answer any questions, offer effective opt-out tools, and honestly communicate why it is happening.

Most importantly, they must ensure the partnership makes sense for the community. Many groups will never embrace AI, no matter how transparent the community is.

In the end, communities need to consider these partnerships carefully. Sadly, many seem to be thinking only about maximizing the value they receive, not how to serve the community that generated that value.

That mistake could be costly, trading long-term stability for a short-term boon.
