Robots.txt: The File That Controls Whether AI Can See Your Website

Tags: AI Visibility, Robots.txt, Technical Guide, AI Crawlers

Most websites have a file called robots.txt. It sits at the root of your domain (yourdomain.com/robots.txt) and tells crawlers what they're allowed to access. For decades, this file was mainly about managing Googlebot. In 2025, it has become one of the most important factors in whether AI systems can recommend your business.

Here's the problem: many websites are accidentally blocking AI crawlers without knowing it. Security plugins, hosting providers, and CDN configurations often add blanket blocks that include the bots used by ChatGPT, Perplexity, and other AI search engines. If your robots.txt blocks these crawlers, your business is invisible to AI — no matter how good your content or schema markup is.

What robots.txt Actually Does

The file is a simple text document with rules that tell web crawlers which pages they can and can't access. It doesn't require authentication or special software — any text editor works. Here's what a basic one looks like:

User-agent: *
Allow: /

Sitemap: https://yourdomain.com/sitemap.xml

The User-agent: * line means "this rule applies to all crawlers." The Allow: / line means "you can access everything." This is the simplest, most open configuration — and for most businesses, it's a perfectly good starting point.

Problems start when specific crawlers get blocked:

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

This tells OpenAI's and Anthropic's crawlers that they can't access any page on your site. If these lines are in your robots.txt, GPTBot and ClaudeBot will skip your website entirely, so ChatGPT and Claude have nothing from your own pages to draw on.
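
To see how a crawler would read those rules, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper name is_allowed and the /services URL are illustrative assumptions, and real crawlers run their own parsers, which may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The blocking rules from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page URL, used only for illustration.
PAGE = "https://yourdomain.com/services"

for bot in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    status = "allowed" if is_allowed(ROBOTS_TXT, bot, PAGE) else "BLOCKED"
    print(f"{bot}: {status}")
```

With these rules, GPTBot and ClaudeBot are reported as blocked, while PerplexityBot is allowed because no group names it and there is no catch-all block.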

The AI Crawlers You Need to Know

There are five AI crawlers that matter right now. Each belongs to a different AI platform, and each checks your robots.txt before accessing your site:

  • GPTBot: OpenAI's crawler. It feeds ChatGPT's knowledge of your website. If you block it, ChatGPT can't learn about your business from your site.
  • ClaudeBot: Anthropic's crawler. Used by Claude to index web content. Blocking it removes your site from Claude's training and retrieval data.
  • PerplexityBot: Perplexity's crawler. Perplexity is one of the fastest-growing AI search engines. Blocking this bot means you won't appear in Perplexity answers.
  • GoogleOther: Google's generic secondary crawler, separate from Googlebot (which handles regular search). Google also documents a separate robots.txt token, Google-Extended, as the control for whether your content is used by Gemini, so check for both names. You can rank on Google but still be invisible to Gemini if these are blocked.
  • Bytespider: ByteDance's crawler. Less critical for most UK/US businesses but relevant if your audience uses TikTok's AI features or ByteDance products.

How to Check Your robots.txt Right Now

Open a browser and go to yourdomain.com/robots.txt. If the file exists, it is publicly readable at that URL; a 404 means you have no robots.txt, which crawlers treat as permission to access everything. Look for any of these patterns:

# Bad — blocks all AI crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: GoogleOther
Disallow: /

If you see Disallow: / next to any of these bot names, that AI system is blocked from your entire site.

Also watch for overly broad rules:

# This blocks EVERYTHING — including AI crawlers
User-agent: *
Disallow: /

This blocks every crawler from every page. Some security-focused hosting setups default to this. It's the nuclear option and it kills your visibility everywhere.
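
The manual check above can also be scripted. The following is a rough sketch, not a complete audit: it pastes the robots.txt text into Python's standard-library parser and reports which of the five crawlers named in this article may fetch the homepage. The function name audit_robots and the sample rules are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# The five crawler names discussed in this article.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "GoogleOther", "Bytespider")


def audit_robots(robots_txt: str, site: str = "https://yourdomain.com") -> dict[str, bool]:
    """Map each AI crawler name to True (allowed) or False (blocked) for the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, site + "/") for bot in AI_CRAWLERS}


# Sample rules for illustration: one named block plus an open default.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for bot, allowed in audit_robots(SAMPLE).items():
    print(f"{bot:15} {'allowed' if allowed else 'BLOCKED'}")
```

To run this against a live site instead of pasted text, RobotFileParser also offers set_url() and read(), which download the file before can_fetch() is called.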

What Your robots.txt Should Look Like

For most businesses, you want AI crawlers to have full access to your public pages while keeping private areas (admin panels, user accounts, internal APIs) blocked. Here's a solid template:

# Allow all crawlers by default
User-agent: *
Allow: /

# Block private areas
Disallow: /admin/
Disallow: /api/
Disallow: /dashboard/
Disallow: /account/

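# Explicitly allow AI crawlers (belt and suspenders).
# A crawler that matches a named group ignores the * group,
# so the private-area blocks are repeated here.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
User-agent: GoogleOther
Allow: /
Disallow: /admin/
Disallow: /api/
Disallow: /dashboard/
Disallow: /account/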

# Sitemap location
Sitemap: https://yourdomain.com/sitemap.xml

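The explicit Allow rules for each AI crawler aren't strictly necessary if your User-agent: * rule already allows access, but they act as a safety net. A crawler obeys only the most specific group that matches its name, so if a plugin or security rule later adds a block under User-agent: *, the named bots still follow their own rules. The flip side is that nothing under User-agent: * applies to those named bots, so any Disallow lines you want them to honor must appear in their own group too.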
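
The precedence of a named group over the catch-all group can be checked with Python's standard-library parser. This is a sketch under the assumption that the crawler follows the same group-selection logic; the helper allowed and the /pricing and /admin/ paths are illustrative. It also shows a side effect worth knowing: a crawler with its own group does not inherit Disallow lines from the * group.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, "https://yourdomain.com" + path)


# Case 1: a blanket block was added, but GPTBot has its own group.
BLANKET_BLOCK = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""
print(allowed(BLANKET_BLOCK, "GPTBot", "/pricing"))        # True
print(allowed(BLANKET_BLOCK, "SomeOtherBot", "/pricing"))  # False

# Case 2: private areas are blocked only in the * group.
PRIVATE_IN_STAR = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Allow: /
"""
print(allowed(PRIVATE_IN_STAR, "GPTBot", "/admin/"))        # True: not inherited
print(allowed(PRIVATE_IN_STAR, "SomeOtherBot", "/admin/"))  # False
```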

Common Mistakes We See

1. WordPress Security Plugins Adding Blanket Blocks

Plugins like Wordfence, Sucuri, and iThemes Security sometimes add crawler blocks to robots.txt or via server-level rules. After installing or updating a security plugin, always check your robots.txt. Some plugins have an "AI crawler" toggle buried in settings that defaults to "block."

2. Cloudflare Bot Fight Mode

Cloudflare's Bot Fight Mode and Super Bot Fight Mode can block AI crawlers at the CDN level — before they even reach your robots.txt. If you're using Cloudflare, check your Security settings. You may need to create a WAF rule that explicitly allows GPTBot, ClaudeBot, and PerplexityBot user agents.

3. Blocking GoogleOther but Allowing Googlebot

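This is a subtle one. Many sites correctly allow Googlebot (for regular search rankings) but block Google's secondary user agents, such as GoogleOther or the Google-Extended token that Google documents as the Gemini control. Your site will still rank on Google, but Gemini may not be able to use your content. These user agents are matched separately from Googlebot, so each needs its own check.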

4. robots.txt vs. Server-Level Blocks

robots.txt is a polite request — well-behaved crawlers respect it, but it's not enforcement. Conversely, your server might be blocking crawlers at the firewall level even if robots.txt allows them. If your robots.txt looks clean but AI crawlers still can't access your site, check your hosting firewall, CDN settings, and any .htaccess rules.
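
One rough way to spot a server-level block is to request the same page twice, once with a browser-like User-Agent and once with a crawler name, and compare the status codes. The sketch below assumes simplified placeholder User-Agent strings rather than the crawlers' real ones, and the function names are hypothetical. It only detects rules keyed on the User-Agent header; firewalls that verify crawler IP addresses will behave differently for real crawlers than for this test.

```python
import urllib.error
import urllib.request

# Simplified placeholder User-Agent values, not the crawlers' real strings.
BROWSER_UA = "Mozilla/5.0"
CRAWLER_UA = "GPTBot"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with the given User-Agent and return the HTTP status."""
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; the code is what we want.
        return err.code


def ua_blocked(browser_status: int, crawler_status: int) -> bool:
    """True when the browser request succeeds but the crawler request is refused."""
    return browser_status < 400 <= crawler_status


def check_site(url: str) -> bool:
    """Compare both requests for one URL (performs two network calls)."""
    return ua_blocked(fetch_status(url, BROWSER_UA), fetch_status(url, CRAWLER_UA))
```

Calling check_site("https://yourdomain.com/") returns True when the crawler-named request is rejected while the browser-like one succeeds, which points to a firewall, CDN, or .htaccess rule rather than robots.txt.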

Does robots.txt Affect Your AI Visibility Score?

Yes. It's one of the 26 factors in our AI visibility audit. We check whether GPTBot, ClaudeBot, PerplexityBot, GoogleOther, and Bytespider are allowed or blocked. Each blocked crawler reduces your score under the "AI Crawler Access" factor, which carries a 1.6x impact multiplier — making it one of the higher-weighted factors.

We consistently see this as one of the fastest fixes for improving AI visibility. Unblocking crawlers is free and takes under 5 minutes. The change takes effect the next time each crawler fetches your robots.txt, and crawlers typically start indexing your site on their next pass, usually within days.

What About Privacy Concerns?

Some businesses block AI crawlers deliberately because they don't want their content used for AI training. That's a legitimate choice — but it comes with a trade-off. If you block GPTBot, ChatGPT won't recommend your business. If you block PerplexityBot, Perplexity won't cite you in answers.

For most businesses, the visibility benefit of allowing AI crawlers far outweighs the concern. Your public web pages are already indexed by Google and archived by the Wayback Machine. Allowing AI crawlers extends that same public indexing to AI search engines — and in return, you get recommended to the growing audience that uses AI instead of Google.

If you have genuinely sensitive content, the better approach is to keep those specific pages behind authentication (where crawlers can't reach them) while allowing AI access to your public marketing, product, and service pages.

Quick Checklist

  1. Check your file — go to yourdomain.com/robots.txt right now
  2. Look for blocks — search for GPTBot, ClaudeBot, PerplexityBot, GoogleOther
  3. Remove any Disallow rules for AI crawlers (or change them to Allow)
  4. Check your CDN/firewall — Cloudflare, Sucuri, and similar services may block at the network level
  5. Check after plugin updates — security plugins can silently re-add blocks
  6. Add your sitemap URL — helps all crawlers find your content efficiently

This is the single easiest and fastest thing you can do to improve your AI visibility. No code changes, no technical expertise needed, no cost. Just a text file edit.

Run a free audit to check your AI crawler access alongside 25 other visibility factors — results in 30 seconds.
