AI Crawlers: The Nasty Bugs Causing Trouble on the Internet
AI tools with web search capabilities, such as Anthropic’s Claude, browse the internet to fetch the information users need. Perplexity, OpenAI, and Google offer similar capabilities through features such as ‘Deep Research’.
AI Crawlers and Their Impact
In a blog post, Cloudflare explained that these web crawlers, often referred to as AI crawlers, use the same techniques as search engine crawlers to gather publicly available information. While AI crawlers exist to assist users, they may be causing more damage on the internet than one realises: they drive up server resource usage for website administrators, leading to unexpected bills and outright disruptions.
Gergely Orosz, creator of The Pragmatic Engineer newsletter, shared on LinkedIn, “AI crawlers are wrecking the open internet, and I’m now being hit for the bill for their training.”
Vercel, a cloud platform company, shared statistics from its network in a blog post: “AI crawlers have become a significant presence on the web. OpenAI’s GPTBot generated 569 million requests across Vercel’s network in the past month, while Anthropic’s Claude followed with 370 million.”

Frustrations and Solutions
Xe Iaso, a software developer, grew frustrated after noticing that AmazonBot was eating up their Git server’s resources, and attempts to block it failed. Iaso wrote in a blog post, “It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more. I just want the requests to stop.”
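To see why such blocking is futile, consider the simplest form it takes: a user-agent denylist. The sketch below is a minimal illustration, assuming a Flask app; the denylist entries are illustrative, not a vetted list. It rejects requests whose User-Agent header matches a known crawler, which works only as long as the crawler identifies itself honestly.

```python
from flask import Flask, abort, request

app = Flask(__name__)

# Illustrative denylist; crawlers that rotate or spoof their
# User-Agent string sail straight through a check like this.
BLOCKED_AGENTS = ("amazonbot", "gptbot", "claudebot")

@app.before_request
def block_known_crawlers():
    ua = (request.headers.get("User-Agent") or "").lower()
    if any(bot in ua for bot in BLOCKED_AGENTS):
        abort(403)  # only effective while the bot tells the truth about itself

@app.route("/")
def index():
    return "Hello, human."
```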
In response, the developer built an open source tool, Anubis, which puts a challenge in front of incoming clients and blocks the requests of those that fail it.
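Anubis makes each new client solve a small SHA-256 proof-of-work puzzle before its requests go through: negligible for a single human visitor, expensive for a crawler issuing millions of requests. The sketch below illustrates the core idea in Python; it is not Anubis’s actual implementation, and the difficulty setting and function names are assumptions.

```python
import hashlib
import secrets

DIFFICULTY_BITS = 16  # assumed tuning value; expected work grows as 2**DIFFICULTY_BITS


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            return bits + (8 - byte.bit_length())
    return bits


def issue_challenge() -> str:
    """Server side: hand the client a random challenge string."""
    return secrets.token_hex(16)


def solve(challenge: str) -> int:
    """Client side: brute-force a nonce until the hash clears the difficulty bar."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= DIFFICULTY_BITS:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single cheap hash checks the client's work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS


if __name__ == "__main__":
    c = issue_challenge()
    n = solve(c)
    print(f"nonce {n} solves challenge {c}: {verify(c, n)}")
```

The asymmetry is the point: verification costs the server one hash, while solving costs the client thousands, so the burden lands on whoever makes the most requests.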
Community Efforts and Conclusion
Ars Technica reached a similar conclusion about AI crawlers, focusing on their impact on open source projects. Many other reports describe people trying to fend off AI crawlers that consume their web resources.

Solutions such as Iaso’s Anubis, though not suitable for everyone, are a good option and are being embraced by a growing number of individuals. Cloudflare has joined the fight with AI Labyrinth, which targets AI bots that do not honour robots.txt rules by serving them AI-generated decoy content to keep them occupied and waste their resources.
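For reference, robots.txt remains the standard way to ask crawlers to stay away. A minimal example using the user-agent tokens OpenAI and Anthropic publish for their crawlers (the point of the reports above being that not every bot honours it):

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
```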
Beyond these defences, AI companies can do their bit by making their crawlers respect web resources and gather information less aggressively. Web search in AI tools provides real value, but it should not come at the cost of overwhelming the servers of small or independent web admins.