Two Different Crawlers, Two Different Jobs
OpenAI doesn't send one crawler to your website. It sends two. That surprises a lot of site owners, and it causes real configuration mistakes in robots.txt files.
GPTBot is OpenAI's training crawler. It visits your site to collect content that may be used to train future versions of ChatGPT and other OpenAI models. ChatGPT-User is a separate agent entirely. It's the real-time retrieval crawler, the one that fetches live web content when a user asks ChatGPT a question and the model needs up-to-date information from the web.
These two crawlers have different purposes, different schedules, and different implications for your site. Writing a single blanket rule for both, or assuming one covers the other, is a mistake that can either lock you out of AI visibility or leave your content open to collection with no corresponding visibility benefit.
So yes, you very likely do need separate rules for them. Here's why, and how to write them correctly.
What GPTBot Actually Does
GPTBot crawls your site asynchronously, meaning it isn't triggered by a user doing anything in real time. It runs background indexing passes, not unlike Googlebot, and the content it collects feeds into OpenAI's model training pipelines.
Blocking GPTBot tells OpenAI that your content should not be collected for training future models. For many site owners that feels like a reasonable choice, particularly if they're protective of proprietary content, product data, or long-form editorial. For others, especially those who want their brand voice, product details, and expertise baked into future model behaviour, blocking GPTBot is actively counterproductive.
The user-agent token to use in robots.txt is GPTBot. To block it from the whole site:
User-agent: GPTBot
Disallow: /
That blocks all GPTBot access. To allow it fully:
User-agent: GPTBot
Allow: /
You can also be selective. If you want GPTBot to crawl your blog and product pages but not your account or checkout areas:
User-agent: GPTBot
Allow: /blog/
Allow: /products/
Disallow: /account/
Disallow: /checkout/
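That's a sensible middle ground for most e-commerce brands. Bear in mind that any path not matched by a rule is allowed by default, so the Allow lines here document your intent rather than change what GPTBot can reach.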
What ChatGPT-User Does Differently
ChatGPT-User is a live retrieval agent. When a ChatGPT user has web browsing enabled and asks a question, ChatGPT may fetch content from URLs in real time to construct its answer. ChatGPT-User is the agent doing that fetching.
This is much closer in nature to a user clicking a link than to a background crawler building a training dataset. The implications are significant. If you block ChatGPT-User, ChatGPT cannot retrieve live content from your site to answer user queries. That means your pages won't appear as cited sources in ChatGPT's browsing-enabled responses, even if your content is exactly what the user needs.
From an AI visibility standpoint, ChatGPT-User access is arguably more commercially valuable than GPTBot access in the short term. A citation from a live ChatGPT answer drives immediate referral intent. Training data influence is longer-term and harder to measure.
The user-agent token is ChatGPT-User. To block it from the whole site:
User-agent: ChatGPT-User
Disallow: /
Or to allow:
User-agent: ChatGPT-User
Allow: /
Why a Single Rule Won't Cover Both
Some site owners assume that writing a rule for GPTBot will apply to all OpenAI crawlers. It doesn't. The robots.txt protocol is user-agent specific. A rule written for GPTBot has no effect on ChatGPT-User, and vice versa.
If you write:
User-agent: GPTBot
Disallow: /
ChatGPT-User is completely unaffected. It will still crawl and retrieve your content for live user queries. Conversely, if you only write rules for ChatGPT-User, GPTBot crawls on regardless.
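One way to see this for yourself is to run the rule through a robots.txt parser. The sketch below uses Python's standard-library urllib.robotparser; the example.com URL is a placeholder, and other parsers may differ in edge cases, so treat it as an illustration rather than a statement of how OpenAI's crawlers evaluate the file.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that only mentions GPTBot.
ROBOTS_GPTBOT_ONLY = """\
User-agent: GPTBot
Disallow: /
"""

def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

URL = "https://example.com/blog/post"  # placeholder URL
gptbot_allowed = is_allowed(ROBOTS_GPTBOT_ONLY, "GPTBot", URL)
chatgpt_user_allowed = is_allowed(ROBOTS_GPTBOT_ONLY, "ChatGPT-User", URL)

print("GPTBot allowed:", gptbot_allowed)
print("ChatGPT-User allowed:", chatgpt_user_allowed)
```

With no group naming ChatGPT-User and no wildcard group, the parser falls back to the default of allowing access.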
This also means accidental gaps are common. A site that added GPTBot rules back in 2023 when OpenAI first published the user-agent, but never revisited the file, may have no ChatGPT-User rules at all. That's worth checking right now.
You can inspect your current robots.txt by going to yourdomain.com/robots.txt in your browser. Look for both user-agent strings explicitly. If one is missing, you have an unconfigured crawler.
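If you manage several sites, the same check can be scripted. This is a minimal sketch that works on robots.txt text you have already downloaded; the function names and the sample file are made up for illustration, and it only reports whether a group names each agent, not what the rules inside it do.

```python
OPENAI_AGENTS = ("GPTBot", "ChatGPT-User")

def declared_agents(robots_txt: str) -> set[str]:
    """Return the lowercased names found on User-agent lines."""
    agents = set()
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "user-agent":
            agents.add(value.strip().lower())
    return agents

def missing_openai_agents(robots_txt: str) -> list[str]:
    """List the OpenAI agents that have no explicit group in the file."""
    declared = declared_agents(robots_txt)
    return [a for a in OPENAI_AGENTS if a.lower() not in declared]

# Hypothetical file: GPTBot was configured, ChatGPT-User never was.
sample = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""
missing = missing_openai_agents(sample)
print("No explicit rules for:", missing)
```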
If you want to understand more about what's actually hitting your site right now, checking your server logs for AI crawler activity is a practical first step before making any changes.
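A rough sketch of that log check, assuming access-log lines that include the request's user-agent string (as the common "combined" log format does). The sample lines, IP addresses, and shortened user-agent strings are invented for illustration. User-agent headers can be spoofed, so treat the counts as a first look rather than proof of who visited.

```python
from collections import Counter

AI_AGENT_TOKENS = ("GPTBot", "ChatGPT-User")

def count_ai_hits(log_lines):
    """Count log lines whose text contains each agent token (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for token in AI_AGENT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
    return hits

# Invented sample lines; real logs will look different in detail.
sample_log = [
    '203.0.113.7 - - [12/Mar/2025:10:01:22 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.7 - - [12/Mar/2025:10:01:25 +0000] "GET /products/a HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.4 - - [12/Mar/2025:10:02:10 +0000] "GET /products/a HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ChatGPT-User/1.0)"',
    '192.0.2.55 - - [12/Mar/2025:10:03:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
hits = count_ai_hits(sample_log)
for token in AI_AGENT_TOKENS:
    print(token, hits[token])
```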
Recommended Configurations for Different Scenarios
There's no single right answer here. Your configuration should reflect your actual goals around AI visibility and content protection.
You want maximum AI visibility
Allow both crawlers explicitly. Don't rely on the absence of rules. Write them out clearly:
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
Explicit allow rules signal intent. They also protect both crawlers from an overly broad wildcard rule, because a crawler that finds a group naming it follows that group instead of the User-agent: * group.
You want live citations but not training use
This is a legitimate position. You're happy for ChatGPT to cite your content in real-time answers, but you don't want your content used in model training. The configuration looks like this:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Allow: /
Be aware, though, that robots.txt is advisory. Enforcement relies on OpenAI respecting your directives, and nothing in the protocol technically prevents a crawler from ignoring them. OpenAI has stated publicly that it honours robots.txt, and most serious AI labs do respect the standard, but it's worth understanding the limitation.
You want to block both completely
Perhaps you're running a members-only site, a database of proprietary research, or a competitor intelligence tool. Blocking both makes sense:
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
You want selective access
For most e-commerce brands, this is the most sensible approach. Allow access to public-facing product, blog, and category pages. Disallow anything sensitive:
User-agent: GPTBot
Allow: /products/
Allow: /blog/
Allow: /collections/
Disallow: /account/
Disallow: /orders/
Disallow: /cart/
User-agent: ChatGPT-User
Allow: /products/
Allow: /blog/
Allow: /collections/
Disallow: /account/
Disallow: /orders/
Disallow: /cart/
You might configure GPTBot and ChatGPT-User identically, or differently, depending on your content strategy. There's no rule that says they have to match.
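Before deploying a selective configuration, it can help to spot-check a few paths against it. The sketch below loads the rules above into Python's standard-library parser and reports what each agent may fetch; the test paths are hypothetical. One caveat: this parser applies the first matching rule in a group, whereas the published standard (RFC 9309) uses the longest matching path, so results can differ when Allow and Disallow paths overlap. They don't overlap here.

```python
from urllib.robotparser import RobotFileParser

SELECTIVE_RULES = """\
User-agent: GPTBot
Allow: /products/
Allow: /blog/
Allow: /collections/
Disallow: /account/
Disallow: /orders/
Disallow: /cart/

User-agent: ChatGPT-User
Allow: /products/
Allow: /blog/
Allow: /collections/
Disallow: /account/
Disallow: /orders/
Disallow: /cart/
"""

def access_report(robots_txt, agents, paths):
    """Map each (agent, path) pair to True if the parser allows the fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(a, p): parser.can_fetch(a, p) for a in agents for p in paths}

# Hypothetical paths; /about is not mentioned by any rule.
paths = ["/products/widget", "/account/settings", "/cart/", "/about"]
report = access_report(SELECTIVE_RULES, ["GPTBot", "ChatGPT-User"], paths)
for (agent, path), allowed in report.items():
    print(f"{agent:13} {path:20} {'allow' if allowed else 'block'}")
```

Note that /about comes back as allowed: a path that no rule matches is open by default, so a group like this only restricts the paths it explicitly disallows.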
Common Mistakes to Avoid
Using a wildcard that blocks everything
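A wildcard rule like User-agent: * with Disallow: / will block all crawlers that have no group of their own, including both OpenAI agents. This is sometimes added by developers during a site build and never removed. Under the Robots Exclusion Protocol (RFC 9309), a crawler follows the group that most specifically matches its name, wherever that group sits in the file, so a dedicated GPTBot or ChatGPT-User group takes precedence over the wildcard. Not every parser follows the standard closely, though, which is one more reason to write explicit rules for each agent you care about and to remove a leftover blanket Disallow.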
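To illustrate how a named group interacts with a blanket wildcard block, here is a small sketch using Python's standard-library parser, which selects a group naming the agent before falling back to the wildcard group. The URL and the SomeOtherBot name are placeholders, and other parsers are not guaranteed to behave the same way.

```python
from urllib.robotparser import RobotFileParser

# Blanket block for everyone, with an explicit exception for GPTBot.
ROBOTS = """\
User-agent: *
Disallow: /

User-agent: GPTBot
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS.splitlines())

URL = "https://example.com/products/widget"  # placeholder URL
results = {
    agent: parser.can_fetch(agent, URL)
    for agent in ("GPTBot", "ChatGPT-User", "SomeOtherBot")
}
for agent, allowed in results.items():
    print(agent, "allowed" if allowed else "blocked")
```

In this parser GPTBot is let through by its own group, while ChatGPT-User has no group of its own and inherits the wildcard block.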
Misspelling the user-agent strings
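The standard (RFC 9309) says crawlers must match user-agent names case-insensitively, so gptbot should work in a compliant parser, but some implementations are stricter. Use exactly GPTBot and ChatGPT-User to be safe. Hyphenation errors are the bigger risk: GPT-Bot and ChatGPTUser (missing the hyphen) are different tokens and will not match in any parser.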
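A simple lint can catch these near misses before they reach production. The sketch below is one possible approach, not a standard tool: it strips hyphens, underscores, spaces, and case from each User-agent value and flags any that resemble a known token without being spelled exactly like it. The function names and sample file are hypothetical.

```python
KNOWN_TOKENS = ("GPTBot", "ChatGPT-User")

def _normalise(token: str) -> str:
    """Reduce a token to lowercase letters and digits for fuzzy comparison."""
    return token.replace("-", "").replace("_", "").replace(" ", "").lower()

def suspicious_agents(robots_txt: str) -> list[tuple[str, str]]:
    """Return (written, intended) pairs for near-miss User-agent values."""
    canonical = {_normalise(t): t for t in KNOWN_TOKENS}
    findings = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if not sep or field.strip().lower() != "user-agent":
            continue
        written = value.strip()
        intended = canonical.get(_normalise(written))
        if intended and written != intended:
            findings.append((written, intended))
    return findings

# Hypothetical file containing two misspellings and one correct token.
sample = """\
User-agent: GPT-Bot
Disallow: /

User-agent: ChatGPTUser
Allow: /

User-agent: GPTBot
Allow: /
"""
findings = suspicious_agents(sample)
for written, intended in findings:
    print(f"'{written}' looks like a misspelling of '{intended}'")
```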
Adding rules and never verifying
Use the robots.txt report in Google Search Console (which replaced the older robots.txt Tester), or a standalone robots.txt validator, to confirm your rules are parsed as intended. What looks correct to the human eye isn't always what the parser sees, especially if there are formatting issues, BOM characters, or encoding problems in the file.
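The BOM problem is easy to reproduce. In the sketch below, a robots.txt saved with a UTF-8 byte order mark is fed to Python's standard-library parser, which does not recognise the first line as a User-agent line and so drops the whole group. Stripping the BOM restores the rule. This shows one parser's behaviour only; others may tolerate the BOM. The helper name and URL are placeholders.

```python
import codecs
from urllib.robotparser import RobotFileParser

def strip_utf8_bom(raw: bytes) -> tuple[bytes, bool]:
    """Remove a leading UTF-8 BOM if present; report whether one was found."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):], True
    return raw, False

def gptbot_allowed(raw: bytes) -> bool:
    """Decode robots.txt bytes as UTF-8 and ask whether GPTBot may fetch a page."""
    parser = RobotFileParser()
    parser.parse(raw.decode("utf-8").splitlines())
    return parser.can_fetch("GPTBot", "https://example.com/page")  # placeholder URL

# A file that an editor saved with a BOM in front of the first directive.
with_bom = codecs.BOM_UTF8 + b"User-agent: GPTBot\nDisallow: /\n"
cleaned, had_bom = strip_utf8_bom(with_bom)

allowed_with_bom = gptbot_allowed(with_bom)  # group is lost, so access is allowed
allowed_cleaned = gptbot_allowed(cleaned)    # rule is read, so access is blocked
print("BOM found:", had_bom)
print("Allowed with BOM:", allowed_with_bom)
print("Allowed after stripping:", allowed_cleaned)
```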
It's also worth knowing that robots.txt can block AI crawlers without you realising, particularly on platforms where the file is partially managed by the CMS.
How This Fits Into Broader AI Visibility
Robots.txt is only one layer of AI visibility. Allowing GPTBot and ChatGPT-User to crawl your site is a necessary condition for AI search inclusion, but it isn't sufficient on its own.
For your content to be cited well by AI models, it also needs to be structured clearly. Schema markup, particularly types like Product, FAQPage, Article, and Organization, gives AI crawlers semantic context that plain HTML doesn't. A page that's technically accessible but structurally ambiguous is harder for a model to interpret and cite accurately.
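As a rough illustration of what that markup looks like, the sketch below builds a schema.org FAQPage object as JSON-LD and wraps it in the script tag that would go in a page's HTML. The helper name and the sample question are placeholders, and a real page would use its own questions and answers.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder content for illustration.
markup = faq_jsonld([
    (
        "Does blocking GPTBot also block ChatGPT-User?",
        "No. They are separate user-agents and need separate rules.",
    ),
])
script_tag = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(script_tag)
```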
At FlinnSchema, we see this combination regularly: a site has correctly configured crawler access, but the content is marked up poorly or not at all, so it never gets cited in AI-generated answers. Fixing robots.txt is step one. Structured data is what turns access into visibility.
If you're not sure where your site currently stands, a free AI visibility audit will show you exactly which crawlers can access your content and where the structural gaps are.
You can also read more about what GPTBot is and how to configure access to it if you want a deeper look at that specific crawler.
Frequently Asked Questions
Does blocking GPTBot also block ChatGPT-User?
No. These are two separate user-agents and must be configured independently in robots.txt. A rule for GPTBot has no effect on ChatGPT-User, and vice versa. You need an explicit directive for each one if you want to control both.
If I allow ChatGPT-User, will my pages definitely appear in ChatGPT answers?
Allowing ChatGPT-User removes the technical barrier to access, but it doesn't guarantee citation. ChatGPT retrieves content based on query relevance, content quality, and page structure. Well-structured pages with clear headings, schema markup, and concise answers to specific questions are far more likely to be retrieved and cited than unstructured pages.
Can I allow ChatGPT-User but block GPTBot?
Yes, entirely. This is a reasonable configuration if you want your content to be used in live ChatGPT browsing answers but you'd prefer it isn't fed into OpenAI's training pipeline. The two functions are independent, and your robots.txt rules can reflect that distinction clearly.
How do I check whether my current robots.txt already has rules for these crawlers?
Visit yourdomain.com/robots.txt in a browser and search the page for GPTBot and ChatGPT-User. If only one appears, the other has no specific rules. Any crawler without its own group falls back to your wildcard rules: if you have User-agent: * with Allow: /, it can crawl, and if you have User-agent: * with Disallow: /, it can't. If there is no wildcard group either, or no robots.txt at all, crawling is allowed by default.

