Cloudflare has accused Perplexity AI of acting like “North Korean hackers” after discovering the AI search company’s bots repeatedly circumventing anti-scraping measures to crawl websites without permission. This escalation in the ongoing battle over AI data collection could significantly undermine Perplexity’s ability to index content, as Cloudflare, an internet infrastructure provider, has now delisted the company as a “verified bot” and implemented hard blocks against its web crawlers.
What happened: Cloudflare CEO Matthew Prince publicly called out Perplexity AI on Monday for invasive web crawling practices that violate website protection measures.
- An investigation revealed Perplexity was “repeatedly modifying” its web-crawling bots to evade data-scraping blocks on third-party websites.
- The company used stealth tactics, including presenting its crawler as a Google Chrome browser on macOS and rotating through multiple IP addresses outside its official range when blocked.
- Cloudflare verified these claims by creating test domains deliberately hidden from search engines, which Perplexity still managed to crawl.
The technical details: Perplexity’s sophisticated evasion methods operated at massive scale across the internet.
- “We observed that Perplexity uses not only their declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when their declared crawler was blocked,” Cloudflare found.
- The activity spanned “tens of thousands of domains and millions of requests per day,” according to the investigation.
- When stealth crawling was successfully blocked, Perplexity would fall back to other data sources, though these produced “less specific” answers that “lacked details from the original content.”
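To make the two signals described above concrete, here is a minimal sketch of how a site operator might compare a request's self-reported user-agent with a crawler's published IP range. It is not Cloudflare's detection method, which the article does not describe at this level. The bot name `ExampleAIBot` and the address range are hypothetical placeholders (the range is a reserved documentation block).

```python
import ipaddress

# Hypothetical values for illustration only; not any real crawler's details.
DECLARED_TOKEN = "ExampleAIBot"
DECLARED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def classify(user_agent: str, remote_ip: str) -> str:
    """Label a request using only its user-agent string and source IP."""
    ip = ipaddress.ip_address(remote_ip)
    in_declared_range = any(ip in net for net in DECLARED_RANGES)
    claims_to_be_bot = DECLARED_TOKEN.lower() in user_agent.lower()

    if claims_to_be_bot and in_declared_range:
        return "declared-crawler"   # name and address both match
    if claims_to_be_bot:
        return "unverified-claim"   # uses the name from an unlisted address
    return "undeclared"             # looks like any ordinary browser


chrome_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)
print(classify("ExampleAIBot/1.0", "192.0.2.10"))
print(classify(chrome_ua, "203.0.113.5"))
```

A crawler that sends a generic Chrome user-agent from an address outside its published range lands in the same bucket as a human visitor, which is why these two checks alone cannot catch the behavior Cloudflare describes.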
In plain English: Web crawlers are programs that automatically visit websites to collect information, much like a librarian systematically cataloging books. Websites can tell crawlers to stay out using a “robots.txt” file, essentially a “Do Not Enter” sign that works only if crawlers choose to obey it. Perplexity allegedly ignored these signs and disguised its crawlers to look like regular human users browsing with Chrome, making it much harder for websites to identify and block them.
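As a rough illustration of how a well-behaved crawler consults robots.txt, the sketch below uses Python's standard `urllib.robotparser`. The robots.txt contents, the bot name `ExampleAIBot`, and the URL are hypothetical, chosen only to show the mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is barred, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the rules permit this user-agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/article"
chrome_ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)

print(may_fetch("ExampleAIBot/1.0", url))  # the named crawler is disallowed
print(may_fetch(chrome_ua, url))           # a browser-style name is allowed
```

The rules are matched against the name a client reports about itself, and nothing in the file enforces them. A crawler that skips the check or sends a browser-style user-agent is not stopped by robots.txt.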
Why this matters: The crackdown highlights the growing tension between AI companies’ data hunger and website owners’ rights to control their content.
- Over 2.5 million websites have chosen to block AI training through Cloudflare’s managed robots.txt feature or AI crawler blocking rules.
- Media companies have already sued Perplexity and other AI providers like OpenAI for alleged copyright infringement over unauthorized content scraping.
- The incident underscores how some AI companies are willing to use deceptive practices to gather training data, despite growing legal and ethical concerns.
Competitive contrast: Cloudflare noted that other major AI companies are respecting anti-scraping measures.
- The company specifically mentioned that OpenAI has been complying with data scraping protections.
- This compliance difference could give AI companies that honor blocking rules a competitive advantage as website owners increasingly implement them.
What’s next: Cloudflare expects an ongoing cat-and-mouse game as AI companies develop new evasion tactics.
- The company anticipates Perplexity will update its web crawler to circumvent the new blocking measures.
- Cloudflare’s delisting of Perplexity as a “verified bot” lumps its crawlers in with other untrusted traffic, potentially making content indexing significantly more difficult.
Cloudflare: Perplexity AI Acts Like North Korean Hackers, Ignores Scraping Blocks