Cloudflare Flags Perplexity AI for Using Stealth Crawlers to Bypass Website No-Crawl Directives

Cloudflare has accused Perplexity AI, an AI-powered answer engine, of engaging in stealth web crawling to evade robots.txt no-crawl directives and network blocks. According to Cloudflare’s security intelligence, Perplexity initially crawls websites with its declared user agent, but when blocked it allegedly switches to undeclared crawlers that present modified user agents and originate from different ASNs (autonomous system numbers) to disguise its activity.

Cloudflare reports that Perplexity’s stealth crawling often bypasses or ignores robots.txt files (sometimes not even fetching them), violating standard crawler ethics. As a result, Cloudflare has removed Perplexity from its Verified Bots list and implemented AI crawler blocking heuristics within its managed rules to protect websites from unauthorized scraping.
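
For context, a robots.txt no-crawl directive is a plain-text rule served from a site’s root, and it only works when a crawler identifies itself honestly and honors the file. The minimal sketch below is illustrative, using PerplexityBot, Perplexity’s publicly declared crawler name, as the example token:

    # robots.txt served at https://example.com/robots.txt
    # Ask the named crawler not to fetch anything on this site
    User-agent: PerplexityBot
    Disallow: /

    # All other crawlers remain unrestricted
    User-agent: *
    Allow: /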

Why This Matters for AI Crawlers and Web Security
For over 30 years, the internet has relied on trust between web operators and crawlers. Ethical bot operators are expected to:

  • Clearly identify themselves with a unique user agent, declared IP ranges, and contact details.
  • Avoid excessive traffic, data scraping of sensitive content, or stealth crawling tactics.
  • Serve a defined, public purpose (e.g., powering search, accessibility, or price tracking).
  • Operate separate bots for different functions, so site owners can grant crawler permissions selectively.
  • Always follow rules in robots.txt and honor site owner preferences.

Cloudflare’s Verified Bots Policy outlines these best practices, which Perplexity’s alleged activity contradicts; a minimal sketch of the robots.txt check in particular follows below.
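
The following Python sketch shows what honoring robots.txt looks like in practice, using the standard library’s urllib.robotparser; the crawler name and URLs are hypothetical:

    import urllib.request
    import urllib.robotparser

    # Hypothetical, clearly declared user agent for an ethical crawler
    USER_AGENT = "ExampleAnswerBot/1.0 (+https://bot.example.org/info)"
    TARGET_URL = "https://example.com/articles/some-page"

    # Fetch and parse the site's robots.txt before requesting any content
    robots = urllib.robotparser.RobotFileParser()
    robots.set_url("https://example.com/robots.txt")
    robots.read()

    if robots.can_fetch(USER_AGENT, TARGET_URL):
        # Identify the bot honestly on every request it makes
        req = urllib.request.Request(TARGET_URL, headers={"User-Agent": USER_AGENT})
        with urllib.request.urlopen(req) as resp:
            page = resp.read()
    else:
        # The site owner has opted out for this agent: respect it and move on
        print("robots.txt disallows", TARGET_URL, "for", USER_AGENT)

A stealth crawler, by contrast, skips this check entirely or returns with a different user agent once the declared one is refused.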

Cloudflare’s bot management system has flagged all Perplexity-related stealth crawling as bot activity, blocking it through managed challenge rules. Customers can either block traffic completely or challenge suspicious requests to verify human access. The company has also added signature detection for these stealth crawlers to its AI crawler blocking rule, available to all customers—including free-tier users.
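
For customers who prefer to write their own rule rather than rely on the managed one, a custom rule in Cloudflare’s Rules language can match Perplexity’s documented user agents and apply either action. This is an illustrative sketch, not Cloudflare’s actual managed rule, and it only covers declared traffic; the undeclared stealth crawlers are what the signature-based managed rule targets:

    Expression:  (http.user_agent contains "PerplexityBot") or (http.user_agent contains "Perplexity-User")
    Action:      Block, or Managed Challenge to let human visitors through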

This crackdown follows Cloudflare’s Content Independence Day initiative, which gives site owners more control over AI training data access. Over 2.5 million websites now block AI crawlers entirely using managed robots.txt rules or targeted blocking.

Cloudflare continues to work with global policy and technical standards bodies like the IETF to develop enforceable guidelines for ethical AI crawling. The goal is to create transparent, standardized rules that legitimate AI bot operators must follow—ensuring the web remains secure, fair, and respectful of digital content ownership.
