Perplexity has fired back at Cloudflare’s accusations that the AI company uses stealth crawlers to bypass website restrictions, calling the claims “embarrassing errors” and questioning Cloudflare’s technical competence. The dispute escalates a growing battle over AI companies’ data collection practices, with Perplexity accusing Cloudflare, a content delivery network provider, of fundamental misunderstandings about how modern AI assistants operate.
The original accusations: Cloudflare claimed Perplexity was disguising its web crawlers as regular Chrome browsers to scrape content from sites that had explicitly blocked its official bots through robots.txt files and firewall rules.
• The CDN company said it observed Perplexity rotating through multiple IP addresses and different network providers to evade blocks across “tens of thousands of domains and millions of requests per day.”
• Cloudflare set up test domains with explicit crawling prohibitions and found that Perplexity still accessed the content and could provide detailed answers about it to users.
• This follows similar accusations from WIRED and Forbes last year alleging Perplexity scraped their content without permission.
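To make the robots.txt part of the accusation concrete, here is a minimal sketch of how a compliant crawler checks a site's rules before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `may_fetch` helper, and the example URL are assumptions for illustration. They are not taken from Cloudflare's test setup or from Perplexity's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site. The rules are assumptions;
# "PerplexityBot" is used here as the declared crawler name to block.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # A crawler that identifies itself honestly is told to stay out.
    print(may_fetch("PerplexityBot", url))
    # The same rules do not match a generic browser-style user agent.
    print(may_fetch("Mozilla/5.0 (X11; Linux x86_64) Chrome/124.0", url))
```

The sketch shows that robots.txt is advisory and keyed on the name a client declares for itself. A request that presents a browser-style user agent is not matched by a rule aimed at a named bot, which is why site operators also use firewall rules and network-level detection.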
Perplexity’s aggressive response: The AI company published a blog post dismissing Cloudflare’s technical analysis as fundamentally flawed and arguing that the errors undermine Cloudflare’s credibility on the subject.
• “This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats,” Perplexity stated.
• The company claimed Cloudflare made “technical errors” including misattributing “millions of requests” and publishing “completely inaccurate technical diagrams.”
• Perplexity argued that Cloudflare has “forfeited any claim to expertise in this space” due to these alleged technical mistakes.
Cloudflare’s defensive measures: The CDN provider has rolled out new tools to help customers block AI crawlers while noting that other companies like OpenAI respect robots.txt restrictions.
• Cloudflare’s bot management system can now detect and block Perplexity’s alleged stealth crawlers, with existing customers already protected.
• The company launched a “Pay Per Crawl” program allowing publishers to set rates for AI companies wanting to scrape their content.
• Cloudflare also offers automatic blocking of all AI crawlers as an option for customers.
Industry context: The dispute highlights broader tensions over AI training data as companies seek content while publishers demand compensation.
• Recent licensing deals include partnerships between The New York Times and Amazon, The Washington Post and OpenAI, and notably, Perplexity with Gannett Publishing.
• Ziff Davis, ZDNET’s parent company, filed a lawsuit against OpenAI in April 2025 alleging copyright infringement in the training and operation of its AI systems.