Two crawlers, two very different jobs
Google runs a fleet of crawlers, and most site owners have only ever had to think about Googlebot. That one crawler handled search indexing, and everything else was a footnote. That changed when Google started building AI products at scale. Suddenly, Google needed to separate the act of indexing for traditional search from the act of collecting data to train and power AI systems. That distinction gave us two new robots.txt user agent tokens: Google-Extended and GoogleOther.
They sound similar. They're not. Confusing them in your robots.txt can have real consequences for your AI visibility, your search rankings, or both. Here's exactly what each one does and how to handle them.
What Google-Extended actually is
Google-Extended is a standalone product token that Google introduced in September 2023. Its specific purpose is to control whether content Google crawls from your site can be used to train Google's Gemini models and for grounding in Gemini products. Google states that it does not affect a site's inclusion or ranking in Google Search. AI Overviews (formerly Search Generative Experience) are a Search feature, so they follow Googlebot and the Search preview controls rather than Google-Extended.
Google-Extended does not visit your site itself. Google documents it as a control token with no separate HTTP user agent string: the fetching is done by Google's existing crawlers, and the Google-Extended rules in your robots.txt decide whether that content may be used as training data. Blocking Google-Extended tells Google: "Do not use my content to improve your AI products." Allowing it means your content can feed into the Gemini models.
This is a meaningful distinction. Some site owners want their content to appear in Google Search but do not want it used as AI training material. Google-Extended gives you that lever. You can block it independently of Googlebot, and traditional search indexing carries on unaffected.
How to block Google-Extended in robots.txt
Blocking Google-Extended is straightforward:
User-agent: Google-Extended
Disallow: /
That tells Google not to use any page on your site for AI training purposes. You can also target specific directories:
User-agent: Google-Extended
Disallow: /blog/
Disallow: /research/
If you want to explicitly allow Google-Extended everywhere (which is the default if you don't mention it at all), you can state it clearly:
User-agent: Google-Extended
Allow: /
There is no ambiguity in how Google-Extended behaves. It respects robots.txt directives, and Google has been clear about what it does.
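If you want to sanity-check rules like the ones above before deploying them, a short script can help. The following is a minimal sketch using Python's standard-library robots.txt parser; the robots.txt content, the `is_allowed` helper, and the example.com URLs are illustrative assumptions. Python's parser applies the first matching rule rather than Google's most-specific-match logic, so treat the result as an approximation of how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, mirroring the directory example above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /blog/
Disallow: /research/

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, token: str, url: str) -> bool:
    """Return True if `token` may access `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    for token in ("Google-Extended", "Googlebot"):
        for path in ("/blog/post", "/products/"):
            url = "https://example.com" + path
            print(token, path, is_allowed(ROBOTS_TXT, token, url))
```

The point of checking Googlebot alongside Google-Extended is to confirm that the opt-out is scoped to the one token and has not spilled over into search crawling.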
What GoogleOther actually is
GoogleOther is a more general-purpose crawler. Google introduced it to give its product teams a declared user agent for fetching publicly accessible web content outside of building the Search index. Google's own example is one-off crawls for internal research and development. Think product research, infrastructure testing, one-off data collection projects, and similar internal tasks that do not fit neatly under Googlebot.
GoogleOther does not crawl for search indexing, and Google does not describe it as a dedicated AI training crawler. It is essentially a catch-all label for miscellaneous Google crawling activity that would otherwise appear in your logs under a vague or unlabelled string.
In practical terms, GoogleOther visits are low-frequency. You are unlikely to see it hammering your server. Most site owners will notice it only when reviewing access logs, if at all. Its visits tend to be exploratory and sparse rather than systematic.
GoogleOther-Image and GoogleOther-Video
It is worth knowing that Google has also defined more specific variants: GoogleOther-Image and GoogleOther-Video. These are used when Google's teams are specifically collecting image or video content for internal purposes. They follow the same robots.txt rules as the base GoogleOther agent. If you block GoogleOther, you block these variants too.
The key differences side by side
Let's be direct about how these two crawlers differ, because the overlap in naming causes genuine confusion.
Purpose
Google-Extended exists specifically to control whether your content is used for AI model training and grounding in Google's Gemini products. It is a permission token, not a crawler. GoogleOther is an actual crawler, used for miscellaneous internal Google projects that do not involve search indexing.
Impact on your AI visibility
Blocking Google-Extended has a direct, tangible effect on whether your content contributes to Google's AI systems. Blocking GoogleOther has essentially no measurable effect on AI visibility or traditional search. It is the less consequential of the two for most site owners.
Impact on search rankings
Neither Google-Extended nor GoogleOther affects traditional Google Search indexing. That is handled by Googlebot. Blocking either of these agents will not cause your pages to drop out of search results. This is important to understand because some site owners panic when they see these agents in their logs and start adding overly broad blocks.
Frequency of visits
Google-Extended does not make visits of its own, because it is a control token rather than a crawler. The content it governs is fetched by Google's existing crawlers on their usual schedules. GoogleOther tends to be infrequent and irregular.
How they appear in your server logs
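In your access logs, only one of the two shows up under its own name. Google-Extended has no HTTP user agent string of its own: Google documents it as a robots.txt control token, and the fetching is done with Google's existing user agent strings, so you will not find a "Google-Extended" entry in your logs.
GoogleOther does identify itself. Look for the GoogleOther token in the user agent field, which in a browser-style string appears at the end as:
(compatible; GoogleOther)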
If you are unsure which crawlers are visiting your site right now, it is worth auditing your logs. We have written a practical guide on how to check which AI crawlers are visiting your site, which covers both Google's agents and third-party crawlers from OpenAI, Anthropic, and others.
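As a starting point for that kind of audit, here is a minimal sketch that counts requests per crawler token in an access log. It assumes the common "combined" log format, where the user agent is the last quoted field on each line; the `TOKENS` list, the `count_crawler_hits` helper, and the `access.log` path are illustrative assumptions to adapt to your own setup.

```python
import re
from collections import Counter

# Tokens to look for (an illustrative list, adjust as needed).
# Longer tokens come first so "GoogleOther-Image" is not counted as "GoogleOther".
TOKENS = ["GoogleOther-Image", "GoogleOther-Video", "GoogleOther", "Googlebot"]

# In the combined log format the user agent is the last quoted field on the line.
UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines, tokens=TOKENS):
    """Count log lines per token, matching case-insensitively on the user agent."""
    counts = Counter()
    for line in lines:
        match = UA_FIELD.search(line)
        if not match:
            continue  # line does not end with a quoted field; skip it
        user_agent = match.group(1).lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # count each request once, under the first matching token
    return counts


if __name__ == "__main__":
    # "access.log" is a placeholder path for your server's log file.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for token, hits in count_crawler_hits(handle).most_common():
            print(f"{token}: {hits}")
```

Matching on the user agent string alone is easy to spoof, so treat these counts as a first pass rather than proof that a request came from Google.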
Should you block Google-Extended?
This is the question that actually matters. The answer depends on what you want.
If you are an e-commerce brand, a publisher, or a service business trying to grow your visibility in AI-generated answers, blocking Google-Extended is likely counterproductive. Google's AI products are becoming a significant source of brand exposure. If your content does not feed into Google's models, you lose ground to competitors whose content does. Keep in mind that the block applies to Gemini training and grounding. AI Overviews in Search follow Googlebot rules, so Google-Extended neither adds you to them nor removes you from them.
If you produce original research, proprietary data, or creative work and you have concerns about your content being used without compensation to improve commercial AI systems, blocking Google-Extended is a reasonable stance. Many news publishers and academic institutions have taken this position.
There is no universally correct answer. What matters is that you make the decision intentionally, not by accident. The worst outcome is blocking Google-Extended without realising it, because a wildcard Disallow rule or an overenthusiastic plugin has swept it up.
Speaking of accidental blocks, it is surprisingly common for robots.txt files to block AI crawlers unintentionally. This post on why your robots.txt might be blocking AI crawlers without you realising is worth a read if you have not audited yours recently.
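One way to catch an accidental block is to test your robots.txt against the tokens you care about. The sketch below uses Python's standard-library parser and a hypothetical robots.txt in which only Googlebot is allowed and everything else falls under a wildcard Disallow. The `AI_TOKENS` list and the `find_blocked_tokens` helper are assumptions for illustration, and Python's matching is only an approximation of each vendor's own parser.

```python
from urllib.robotparser import RobotFileParser

# Tokens to audit (illustrative; extend with any others you care about).
AI_TOKENS = ["Google-Extended", "GoogleOther", "GPTBot", "ClaudeBot"]

# Hypothetical robots.txt: search is allowed, everything else hits the wildcard.
SAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def find_blocked_tokens(robots_txt, tokens=AI_TOKENS, url="https://example.com/"):
    """Return the tokens that are not allowed to access `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [token for token in tokens if not parser.can_fetch(token, url)]


if __name__ == "__main__":
    blocked = find_blocked_tokens(SAMPLE_ROBOTS_TXT)
    print("Blocked tokens:", ", ".join(blocked) or "none")
```

In this sample no rule names Google-Extended, yet it is blocked anyway because it falls through to the wildcard group. That is exactly the kind of unintended opt-out worth checking for.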
Should you block GoogleOther?
For most site owners, blocking GoogleOther has no meaningful upside. It does not crawl for search or AI training, so blocking it does not protect your content in any significant way. The main reason someone might block it is server resource conservation, though given GoogleOther's low visit frequency, even that benefit is marginal.
If you are running a tight server with bandwidth constraints and you see GoogleOther making repeated requests, adding a block is harmless. Otherwise, leave it alone.
How this fits into a broader AI visibility strategy
Understanding individual crawlers is useful, but it is one piece of a larger picture. Controlling who can crawl your site is a defensive measure. The proactive work involves making your content as readable, structured, and citable as possible for AI systems, regardless of which crawler is visiting.
Structured data plays a major role here. When your pages include well-formed schema markup, AI systems can extract information with confidence. That increases the likelihood that your content gets cited in AI-generated answers, whether from Google, Perplexity, or ChatGPT. At FlinnSchema, this is the core of what we do: making e-commerce and service brands genuinely visible to AI search, not just technically crawlable.
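To make "well-formed schema markup" concrete, here is a minimal sketch that builds a schema.org Product description as JSON-LD. The `product_jsonld` helper and the sample product values are hypothetical, and a real page would normally carry more properties than this.

```python
import json


def product_jsonld(name, description, price, currency):
    """Build a minimal schema.org Product object and return it as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }
    return json.dumps(data, indent=2)


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(product_jsonld("Trail Running Shoe", "Lightweight shoe for rough terrain.", "89.00", "GBP"))
```

The resulting JSON is typically embedded in the page inside a script element of type application/ld+json.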
If you have never assessed how your site currently looks to AI systems, a free AI visibility audit is the most efficient place to start. It surfaces the gaps that robots.txt settings alone cannot fix.
A note on other AI crawlers beyond Google
Google-Extended and GoogleOther are Google-specific. But if you are thinking seriously about AI visibility, you also need to account for crawlers from OpenAI (GPTBot), Anthropic (ClaudeBot), Meta, and others. Each has its own user agent string and its own robots.txt rules.
Blocking Google-Extended while leaving GPTBot unrestricted means your content can still influence ChatGPT's outputs but not Google's. Whether that is what you want depends on where your audience spends their time asking questions. For a deeper look at one of those other crawlers, our post on what GPTBot is and how to let it crawl your site covers the OpenAI side of the equation.
The point is that AI crawler management is not a single switch. It is a set of deliberate decisions about which AI systems get access to your content, with each one carrying different implications for where your brand shows up in AI-generated answers.
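One way to keep those decisions deliberate is to write them down as a policy and generate the robots.txt groups from it. This is a minimal sketch under that assumption; the `render_robots_txt` helper and the sample policy are hypothetical, and the output covers only the AI-related groups, not a complete robots.txt.

```python
def render_robots_txt(policy):
    """Render robots.txt groups from a policy.

    `policy` maps a user agent token to a list of disallowed path prefixes.
    An empty list means the token is explicitly allowed everywhere.
    """
    groups = []
    for token, disallowed_paths in policy.items():
        lines = [f"User-agent: {token}"]
        if disallowed_paths:
            lines.extend(f"Disallow: {path}" for path in disallowed_paths)
        else:
            lines.append("Allow: /")
        groups.append("\n".join(lines))
    # Groups are separated by a blank line, and the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    # Sample policy, made up for illustration.
    policy = {
        "Google-Extended": ["/research/"],
        "GPTBot": [],
        "ClaudeBot": ["/"],
    }
    print(render_robots_txt(policy), end="")
```

Keeping the policy in one place makes each crawler's access a visible choice rather than a side effect of whichever rule was added last.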
Frequently Asked Questions
Does blocking Google-Extended affect my Google Search rankings?
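No. Google-Extended is entirely separate from Googlebot, which handles traditional search indexing. Blocking Google-Extended only prevents your content from being used for Gemini training and grounding. Google states that it has no effect on inclusion or ranking in Search, so your organic search rankings are unaffected. AI Overviews, as a Search feature, follow your Googlebot rules instead.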
Will blocking GoogleOther hurt my site in any way?
Almost certainly not. GoogleOther does not crawl for search indexing or AI training, so blocking it has no measurable impact on your search performance or AI visibility. It is a low-stakes decision either way.
If I do not mention Google-Extended in my robots.txt, is it allowed to crawl?
Yes. The default is allow. If you do not include a specific directive for Google-Extended, Google treats it as permitted and may use your content for AI training purposes. You need to explicitly add a Disallow rule if you want to opt out.
Can I block Google-Extended for some pages but not others?
Yes. You can use path-specific directives in robots.txt to block Google-Extended from particular directories or pages while allowing it elsewhere. For example, you might allow it on your product pages but block it from your proprietary research or editorial content.

