conbersa.ai
Technical · 4 min read

What Is the Anthropic Web Crawler?

Neil Ruaro·Founder, Conbersa
Tags: anthropic-crawler · claudebot · ai-crawlers · web-crawlers

The Anthropic web crawler is a set of automated programs operated by Anthropic - the company behind Claude AI - that browse the internet to collect web content for improving Claude's AI models and powering Claude's search features. Anthropic operates three distinct crawlers: ClaudeBot for training data, Claude-User for fetching pages requested by Claude users, and Claude-SearchBot for indexing content for search results. Each crawler has its own user-agent string and can be independently controlled through robots.txt files.

Understanding Anthropic's crawlers matters for AI search visibility because Claude is one of the major AI platforms where users ask questions and receive cited answers. If your content is blocked from Anthropic's crawlers, it cannot appear in Claude's responses.

How Do Anthropic's Three Crawlers Work?

Anthropic updated its crawler documentation in February 2025 with a formal breakdown of its three-bot framework:

| Crawler | User-Agent | Purpose | Impact on Claude |
| --- | --- | --- | --- |
| ClaudeBot | ClaudeBot | Collects data for model training and improvement | Influences what Claude "knows" from training data |
| Claude-User | Claude-User | Fetches pages when a user shares a URL with Claude | Enables Claude to read and summarize specific pages on request |
| Claude-SearchBot | Claude-SearchBot | Indexes content for Claude's web search feature | Directly provides sources for Claude's search results |

This three-bot architecture mirrors OpenAI's approach with GPTBot and OAI-SearchBot - separating training crawlers from search crawlers so website owners can make granular decisions about data usage.

ClaudeBot (Training Crawler)

ClaudeBot is Anthropic's original and primary crawler. It browses the web to collect publicly available content that contributes to training and improving Claude's AI models. When ClaudeBot crawls your content, that content may be used to train future versions of Claude, making it more likely that Claude "knows" about your brand and expertise.

ClaudeBot identifies itself with the user-agent string:

User-agent: ClaudeBot

Before ClaudeBot, Anthropic operated under the deprecated user-agent strings Claude-Web and Anthropic-AI. If your robots.txt references those old strings, update them to ClaudeBot.
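If you manage many sites, a small script can flag robots.txt files that still reference the deprecated strings. This is a hypothetical helper (the function name and the exact lowercase token matching are assumptions), sketched with only the standard library:

```python
# Scan a robots.txt body for Anthropic's deprecated user-agent
# tokens so the rules can be updated to ClaudeBot.
DEPRECATED_AGENTS = ("claude-web", "anthropic-ai")

def find_deprecated_agents(robots_txt):
    """Return the deprecated agent tokens named in User-agent lines."""
    found = []
    for line in robots_txt.lower().splitlines():
        if line.strip().startswith("user-agent:"):
            agent = line.split(":", 1)[1].strip()
            if agent in DEPRECATED_AGENTS:
                found.append(agent)
    return found

sample = "User-agent: Claude-Web\nDisallow: /\n"
print(find_deprecated_agents(sample))  # ['claude-web']
```

Any hit means the file predates the ClaudeBot rename and should be reviewed.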

Claude-User (User-Requested Fetching)

Claude-User is activated when a Claude user shares a URL or asks Claude to read a specific webpage. Unlike ClaudeBot, which crawls proactively, Claude-User only fetches pages when a human user explicitly requests them.

Blocking Claude-User means that when someone pastes your URL into Claude and asks for a summary or analysis, Claude will not be able to access the page. For most websites, there is no reason to block this crawler.

Claude-SearchBot (Search Indexing)

Claude-SearchBot indexes web content specifically for Claude's web search feature. When a Claude user asks a question that triggers a web search, Claude-SearchBot's index determines which sources appear in the results.

Blocking Claude-SearchBot prevents your content from appearing in Claude's real-time search results, even if ClaudeBot has already crawled your content for training. This is analogous to blocking OAI-SearchBot for ChatGPT search.

How Do You Control Anthropic Crawler Access?

Control access through your robots.txt file. To allow all three crawlers:

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

To block training but allow search and user-requested access:

User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /
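Before deploying a policy like this, you can sanity-check it with Python's standard-library robots.txt parser. A minimal sketch (the example URL is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Sample policy: block training (ClaudeBot) but allow search and
# user-requested fetching, mirroring the robots.txt above.
robots_txt = """\
User-agent: ClaudeBot
Disallow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("ClaudeBot", "https://example.com/pricing"))        # False
print(rp.can_fetch("Claude-User", "https://example.com/pricing"))      # True
print(rp.can_fetch("Claude-SearchBot", "https://example.com/pricing")) # True
```

Note that `urllib.robotparser` uses case-insensitive token matching; real crawlers may interpret edge cases slightly differently, so treat this as a smoke test rather than a guarantee.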

Anthropic also supports the non-standard Crawl-delay directive to control how frequently ClaudeBot visits your site:

User-agent: ClaudeBot
Allow: /
Crawl-delay: 1

This tells ClaudeBot to wait at least 1 second between requests, reducing server load from crawling.
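Although Crawl-delay is non-standard, Python's `urllib.robotparser` has parsed it since Python 3.6, which makes it easy to confirm the directive is being read as intended:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: ClaudeBot
Allow: /
Crawl-delay: 1
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# crawl_delay() returns the declared delay for a matching agent,
# or None if no Crawl-delay directive applies to it.
print(rp.crawl_delay("ClaudeBot"))  # 1
print(rp.crawl_delay("SomeOtherBot"))  # None
```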

Should Startups Allow or Block Anthropic's Crawlers?

For most startups focused on AI visibility, the answer is clear: allow all three crawlers.

Allow ClaudeBot to ensure your content contributes to Claude's training data. This increases the likelihood that Claude "knows" about your brand and can reference your expertise in responses.

Allow Claude-User so that when potential customers share your content with Claude for analysis, Claude can access and summarize it. Blocking this creates a frustrating user experience.

Allow Claude-SearchBot to appear in Claude's web search results. As Claude's search capabilities expand, this becomes increasingly important for visibility.

The only reason to block ClaudeBot is if you have specific concerns about your content being used in AI training data. Even then, consider allowing Claude-SearchBot and Claude-User for the search visibility and user experience benefits.

For a complete audit of all AI crawlers accessing your site, see our guide on how to audit AI crawler access. Understanding and managing crawler access is the foundation of any AI search optimization strategy.
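As a starting point for such an audit, here is a minimal sketch that counts hits per Anthropic crawler in a web server access log. The log lines and their format are assumptions (a typical combined-log layout with the user-agent in the final quoted field); adapt the matching to your server's configuration:

```python
from collections import Counter

ANTHROPIC_AGENTS = ("ClaudeBot", "Claude-User", "Claude-SearchBot")

def count_anthropic_hits(log_lines):
    """Tally access-log lines by the Anthropic crawler that made them."""
    counts = Counter()
    for line in log_lines:
        for agent in ANTHROPIC_AGENTS:
            # Simple substring match on the user-agent portion of the line.
            if agent in line:
                counts[agent] += 1
                break
    return counts

# Hypothetical sample lines in combined log format.
sample_log = [
    '1.2.3.4 - - [10/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 1234 "-" "ClaudeBot/1.0"',
    '5.6.7.8 - - [10/May/2025:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 5678 "-" "Claude-SearchBot/1.0"',
]
print(count_anthropic_hits(sample_log))
```

A rising Claude-SearchBot count is a useful signal that your pages are being considered for Claude's search results.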
