Why I’m Giving Up on Blocking AI Bots (For Now)


Back in August 2023, I wrote an article entitled “How to Block ChatGPT (And Why to Do It).” 

At the time, ChatGPT had been public for less than a year, and we were still very much in a pre-AI mindset. As such, the recommendation was to simply update your robots.txt file to block OpenAI’s bots and be done with it.
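For readers who never set one up, the sketch below shows roughly what that kind of block looks like and how a well-behaved crawler would interpret it. It uses Python’s standard urllib.robotparser module. The robots.txt contents, the example.com URL, and the is_allowed helper are assumptions for illustration. GPTBot and ChatGPT-User are user-agent tokens that OpenAI has published, but the current list should be checked against its documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt contents for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain.
    for agent in ("GPTBot", "ChatGPT-User", "Googlebot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/article"))
```

Note that this only models a crawler that chooses to consult the file. Nothing in it stops a crawler that never asks.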

However, the next three years have been incredibly rough for the robots.txt standard. In July 2025, I noticed that, despite my blocking of ChatGPT, the bot was still indexing this site and regurgitating my content. How it was doing so was unclear, but my best guess was that it was getting the content from other sources, likely including the Internet Archive.

But even if OpenAI fully ignored my robots.txt file, it likely wouldn’t be considered copyright infringement. The reason is that, in December 2025, a judge ruled that robots.txt does not qualify as an effective control mechanism for protecting copyrighted works, meaning that circumventing it is not a violation of the Digital Millennium Copyright Act (DMCA).

As such, when I recently checked every one of the major AI companies’ tools, I found that Plagiarism Today content was easily available, despite being blocked via robots.txt at both the server and content delivery network (CDN) levels.

Simply put, robots.txt doesn’t matter anymore. It hasn’t mattered for some time. It’s either time to adopt a new approach or time to give up the fight. 

For now, I’m giving up. But that doesn’t mean that I’m done trying for good.

The Death of Robots.txt

Robots.txt, or the Robots Exclusion Protocol (REP), was first proposed in 1994 as a way for a site to communicate with crawlers and other bots.

It’s always been something of a “gentleman’s agreement” between the site owner and the search engine. Bad bots have always ignored it, and those that did ignore it were often targeted with blocks. 

Despite being a de facto standard for over 25 years, robots.txt wasn’t formally codified until September 2022, just in time to start becoming irrelevant. That’s because, in November 2022, OpenAI launched ChatGPT to the public, kicking off the current explosion of consumer AI.

This marked a major turning point in the relationship between websites and crawlers. Historically, websites were happy to allow crawlers to index their content in hopes of search engines directing traffic toward them. However, AI has turned this symbiotic relationship into a parasitic one.

Instead of indexing content to present links to users, crawlers now scoop up content to train AI systems that often don’t cite sources at all and cite them poorly when they do.

The move becomes obvious: block those bots. But blocking those bots the traditional way is pointless for two reasons. First, as mentioned above, there’s no practical legal enforcement for the standard. Second, even if they do honor it, there are a myriad of other ways they can get the content.

Simply put, AI companies have not been picky about how they obtain training data. One of the few legal victories that human creators have had against AI companies has concerned the rampant piracy those companies committed in obtaining data.

As such, even if AI companies were to honor your robots.txt file, they can obtain your content through:

  1. The Internet Archive (and other archiving sites)
  2. Massive Datasets such as Books3 or Common Crawl
  3. Spam/Infringing Sites that Republish Your Content
  4. Search Results
  5. Legitimate Sources Discussing or Distributing Content

In short, even if AI bots completely honored your crawling restrictions, which they have no reason to do, they would still have access to your content through a myriad of other sources.

The only thing a robots.txt exclusion does is discourage AI bots from linking to your content. As small as that benefit is, it is the only benefit on offer right now.

So what is a frustrated human to do?

From Collaborative to Combative

Robots.txt worked as a standard because search engines had a reason to honor it. Their goal of directing users to relevant content gave them an incentive to ensure that creators wanted to be indexed and consented to being included.

AI broke that relationship down completely. So what now?

One idea was Really Simple Licensing (RSL), an add-on to the robots.txt standard that would instruct AI tools to obtain licenses or follow guidelines to use a site’s content to train AI systems. However, since AI crawlers can largely ignore robots.txt with impunity, there’s little reason for them to follow it.

Cloudflare, on the other hand, introduced its own licensing tool, one that blocked AI bots from accessing the content. This is likely to work much more effectively, especially for larger rights holders, but it still has the problem of the content being available on other sites and resources.

If you truly want to keep your new content from being used to train AI systems, the only real option is to block nearly all bots and to do so not through robots.txt, but through server-level and/or CDN-level blocking. Simply asking the bots to leave won’t work and, most likely, is not enforceable.
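To make the difference concrete, here is a minimal sketch of server-level blocking written as Python WSGI middleware. Instead of asking a bot to leave, it refuses the request with a 403 response. Everything in it is an assumption for illustration: the BlockAIBots class, the site placeholder application, the status_for helper, and the short list of user-agent tokens, which is far from complete. It also only catches bots that identify themselves honestly, so a real deployment would likely pair it with IP-level or CDN-level rules.

```python
import re

# Illustrative, incomplete list of AI crawler user-agent tokens (an assumption;
# verify current tokens with each vendor before relying on them).
BLOCKED_AGENTS = re.compile(
    r"GPTBot|ClaudeBot|CCBot|PerplexityBot|Bytespider", re.IGNORECASE
)


class BlockAIBots:
    """WSGI middleware that answers matching user agents with 403 Forbidden."""

    def __init__(self, app, pattern=BLOCKED_AGENTS):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if self.pattern.search(user_agent):
            body = b"Forbidden"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Anything else is passed through to the wrapped application.
        return self.app(environ, start_response)


def site(environ, start_response):
    """Placeholder application standing in for the real website."""
    body = b"Article text"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


app = BlockAIBots(site)


def status_for(user_agent: str) -> str:
    """Call the wrapped app with a fake request and return its status line."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    b"".join(app({"HTTP_USER_AGENT": user_agent}, start_response))
    return captured["status"]


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

This sketch uses a denylist because it is the simplest thing to show. Blocking “nearly all bots” would mean inverting the logic so that only known, wanted clients get through.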

But this creates its own set of problems. An aggressive blocking strategy risks blocking legitimate users. This includes those who use virtual private networks (VPNs), are on corporate networks, or use privacy tools such as Apple’s Private Relay.

There are also bots that most webmasters do want to allow, such as actual search engine crawlers, bots that send email newsletters, and so forth. 
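One way to picture the sorting problem is a small classifier that checks a wanted list before an unwanted list and treats everything else as unknown. The sketch below is only that, a picture: the classify function, the Verdict names, and both token lists are hypothetical, and matching on user-agent strings alone can be fooled by any bot willing to lie about what it is.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    UNKNOWN = "unknown"


# Hypothetical, incomplete lists chosen for illustration.
WANTED_BOTS = ("googlebot", "bingbot", "duckduckbot")
UNWANTED_BOTS = ("gptbot", "claudebot", "ccbot")


def classify(user_agent: str) -> Verdict:
    """Sort a request by its user-agent string, checking wanted bots first."""
    ua = user_agent.lower()
    if any(token in ua for token in WANTED_BOTS):
        return Verdict.ALLOW
    if any(token in ua for token in UNWANTED_BOTS):
        return Verdict.BLOCK
    # Ordinary browsers and unrecognized bots both land here.
    return Verdict.UNKNOWN


if __name__ == "__main__":
    for ua in (
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "CCBot/2.0 (https://commoncrawl.org/faq/)",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
    ):
        print(classify(ua).value, "<-", ua)
```

The hard part is the UNKNOWN bucket. That is where VPN users, corporate networks, and undeclared bots all look the same, and where the work and expertise come in.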

The thing about robots.txt is that it was much easier. You could set up a robots.txt file using either a template or a simple tool. Actively blocking dubious bots requires either a great deal of work and expertise or the use of a third party like Cloudflare.

However, that is what the internet has come to. If you want a block that means something legally and technically, robots.txt isn’t going to cut it. You need to block at the source.

Bottom Line

The obvious question is “Why not just leave up the block, even if it isn’t doing any good?”

The reason is fairly simple. This site, like countless others, has been hit hard by a traffic drop with the rise of consumer AI. I’ve been fortunate in that my business model doesn’t rely on traffic to keep the lights on. I work as a copyright and plagiarism expert/consultant, and my revenue is divorced from traffic statistics.

Since the AI companies have made it clear that my content is fair game, I want to see what impact, if any, removing the block has on my traffic. Personally, I don’t think it will have any impact, positive or negative. But it would be nice to at least see that firsthand.

What is clear is that, when it comes to bots, we’ve entered a new era of the internet. The age of mutual respect and cooperation is over. Asking bots to leave never really worked, but now it has no real benefit at all.

Since respect is absent and legal enforcement is wanting, the only tool left is technical enforcement.

That’s not the internet I wanted, but it’s the internet we are getting. 

