What is AI visibility and why does it matter for e-commerce?

AI visibility refers to how easily AI assistants like ChatGPT, Perplexity, and Gemini can find, understand, and recommend your business. As more consumers use AI instead of traditional search engines to find products and services, businesses without structured data risk being overlooked by this growing channel. Gartner predicts traditional search volume will drop 25% by 2026.

What is JSON-LD schema markup?

JSON-LD (JavaScript Object Notation for Linked Data) is a method of encoding structured data using the Schema.org vocabulary. It tells AI systems and search engines exactly what your business is, what you sell, your reviews, FAQs, and more in a machine-readable format. It is embedded in your website HTML and is invisible to users but critical for AI comprehension.

What is Generative Engine Optimisation (GEO)?

Generative Engine Optimisation (GEO) is the practice of optimising your website content and structured data specifically for AI-powered search engines and assistants. Unlike traditional SEO, which focuses on Google rankings, GEO focuses on making your business recommendable by ChatGPT, Perplexity, Gemini, and other AI systems.

How does the FlinnSchema AI audit work?

The FlinnSchema AI audit analyses your website across 26 factors including schema markup quality, E-E-A-T signals, AI crawler access, Reddit community presence, content freshness, conversational content structure, LLM readability, and llms.txt presence. You receive an overall AI visibility score (capped at 90%) with detailed breakdowns and actionable recommendations.

How long does schema markup implementation take?

Implementation typically takes 1-2 weeks depending on the size and complexity of your store. This includes a full audit, hand-coded JSON-LD markup for all key pages, validation against Google Rich Results Test and Schema.org, and comprehensive documentation.


Does ChatGPT Use Google's Index or Its Own Crawl?

Flinn Evans, 9 June 2026

Tags: AI search, ChatGPT, LLM SEO, AI visibility, GPTBot, web crawling, AI indexing

It is a question that comes up constantly from business owners and SEOs alike. ChatGPT seems to know things about your website, but you never gave it access. So where is it getting its information? Is it piggybacking on Google's index, or does it have its own way of finding and reading your content?

The short answer is: it depends on which version of ChatGPT you are using, and the answer has changed significantly over the past couple of years. Let's break it down properly.

The Training Data vs. Live Web Distinction

Before we get into crawlers and indexes, it helps to understand the fundamental split in how ChatGPT works. There are two separate mechanisms at play, and they are often confused with each other.

The first is training data. When OpenAI built GPT-4 (and its predecessors), it fed the model an enormous amount of text scraped from the web, books, code repositories, and other sources. That training process has a cutoff date. Anything published after that date simply does not exist in the model's base knowledge. ChatGPT is not constantly re-reading the internet in the background. Its base knowledge was baked in once, during training.

The second is live web browsing. ChatGPT now has the ability to search the web in real time when you ask it a question that requires current information. This is a separate feature built on top of the base model, and it works very differently.

Google's index is involved in neither of these directly. OpenAI assembled its training data independently of Google, and live browsing relies on Bing rather than Google, as covered below. Google does not hand over its index to OpenAI.

How OpenAI Crawled the Web for Training

OpenAI's crawler for collecting training data is called GPTBot. It visits publicly accessible web pages and reads their content, which can then be folded into the corpus used to train future models. OpenAI only documented GPTBot publicly in August 2023, so earlier models such as the original GPT-4 drew on other sources. GPTBot has its own user-agent string, which means you can identify it in your server logs and, if you choose, block it via robots.txt.
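As a rough illustration of spotting GPTBot in server logs, the sketch below counts requests whose user-agent contains the token "GPTBot". It assumes logs in the common "combined" format, where the user agent is the last quoted field. The sample log lines, IP addresses, and the `count_ai_crawler_hits` helper are made up for this example.

```python
import re
from collections import Counter

# Substrings that identify AI crawlers in a User-Agent header.
# GPTBot is the one discussed here; extend the tuple as needed.
AI_CRAWLER_TOKENS = ("GPTBot",)

# Assumption: "combined" log format, user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def count_ai_crawler_hits(log_lines):
    """Return a Counter mapping crawler token -> number of matching requests."""
    hits = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line does not end with a quoted field
        user_agent = match.group(1).lower()
        for token in AI_CRAWLER_TOKENS:
            if token.lower() in user_agent:
                hits[token] += 1
    return hits


# Illustrative log lines, not real traffic.
sample_log = [
    '203.0.113.5 - - [09/Jun/2026:10:00:00 +0000] "GET /products HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [09/Jun/2026:10:00:01 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
print(count_ai_crawler_hits(sample_log))  # Counter({'GPTBot': 1})
```

A user-agent string can be spoofed, so a match here is a hint rather than proof that the request came from OpenAI.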

This is entirely separate from Googlebot. Google's crawler feeds Google's search index. GPTBot feeds OpenAI's training pipeline. They are different bots, different infrastructure, different purposes.

OpenAI has also licensed third-party data, including deals with news publishers, and has drawn on public datasets like Common Crawl, a freely available archive of web content. For site owners, though, the mechanism that matters is GPTBot going out and reading pages directly.

One important nuance: being crawled by GPTBot does not mean your content will appear in ChatGPT's answers. Training data goes through filtering, weighting, and fine-tuning processes. High-quality, well-structured content is more likely to be retained and surfaced, but there are no guarantees based on crawl alone.

ChatGPT's Live Browsing: Bing, Not Google

Here is where things get interesting for anyone hoping to influence ChatGPT's current answers. When ChatGPT searches the live web, it uses Bing, not Google.

OpenAI has a partnership with Microsoft, and live web search in ChatGPT is powered by the Bing Search API. So when a user asks ChatGPT something that requires fresh information, such as today's news, a recent product release, or an updated price, ChatGPT sends a query to Bing and uses the results to inform its response.

This is significant. If you want your content to appear in ChatGPT's live, cited answers, being indexed and ranked well in Bing matters more than you might have assumed. Google rankings still matter for traditional SEO and for Gemini (which does use Google's infrastructure), but for ChatGPT's live browsing specifically, Bing is the relevant search engine.

That said, ChatGPT's browsing behaviour is not identical to a normal Bing search. The model decides when to browse, which queries to send, and how to synthesise results. It does not always use browsing even when a question might benefit from it. The model makes judgement calls based on how it was fine-tuned.

What This Means for Your Site's AI Visibility

Understanding this distinction has real practical implications for how you think about getting your business in front of AI-generated answers.

Structured data matters for training comprehension

When a crawler collects pages for training, it reads raw HTML. Pages with clear structure, well-written copy, and schema markup give the model more to work with. Schema markup like Organization, Product, and FAQPage helps any reader, human or machine, understand what a page is about and who is behind it.
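To make the schema types mentioned above concrete, here is a minimal sketch that builds Organization and FAQPage markup as JSON-LD and wraps it in the script element that would sit in the page HTML. The store name, URL, FAQ text, and both helper functions are placeholders invented for this example; real markup would normally carry more properties.

```python
import json


def organization_jsonld(name, url, faqs):
    """Build a Schema.org @graph holding an Organization and a FAQPage."""
    return {
        "@context": "https://schema.org",
        "@graph": [
            {"@type": "Organization", "name": name, "url": url},
            {
                "@type": "FAQPage",
                "mainEntity": [
                    {
                        "@type": "Question",
                        "name": question,
                        "acceptedAnswer": {"@type": "Answer", "text": answer},
                    }
                    for question, answer in faqs
                ],
            },
        ],
    }


def to_script_tag(data):
    """Wrap JSON-LD in the script element used to embed it in a page."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder business details for illustration only.
markup = organization_jsonld(
    name="Example Store",
    url="https://www.example.com",
    faqs=[("Do you ship internationally?", "Yes, to most countries.")],
)
print(to_script_tag(markup))
```

Generating the markup from the same data that renders the visible page is one way to keep the two from drifting apart.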

If your site was poorly structured or thin on content at the time of training, the model may have very little reliable information about your brand. That is partly why some businesses find ChatGPT gives vague or inaccurate answers about them.

Bing indexing is not optional

Given that ChatGPT's live browsing runs on Bing, submitting your sitemap to Bing Webmaster Tools is no longer something you can afford to ignore. Many site owners have never touched Bing's tools because Google dominates traditional search. But for AI visibility through ChatGPT, Bing's index is directly relevant.

Check that your key pages are indexed in Bing. Make sure your site loads quickly, has clean internal linking, and presents content in a way that Bing's crawler can easily read. These are basic things, but they are often overlooked.
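If a site has no sitemap to submit yet, one can be generated with a few lines of code. The sketch below writes a minimal XML sitemap in the sitemaps.org format using only the Python standard library. The URLs and dates are placeholders, and `build_sitemap` is a hypothetical helper rather than part of any Bing tooling.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML for an iterable of (absolute_url, YYYY-MM-DD) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages for illustration only.
sitemap_xml = build_sitemap(
    [
        ("https://www.example.com/", "2026-06-09"),
        ("https://www.example.com/products/example-mug", "2026-06-01"),
    ]
)
print(sitemap_xml)
```

The resulting file is typically saved as sitemap.xml at the site root and then submitted through Bing Webmaster Tools.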

Fresh content gets picked up through live browsing

One of the arguments for publishing regular, well-structured content is that it can be picked up by ChatGPT's live browsing. A product guide, a detailed FAQ page, or an authoritative article on a topic relevant to your business could be surfaced when a user asks ChatGPT something in that space.

This is where the quality of your content structure really shows. ChatGPT tends to cite pages that are easy to parse, give clear answers, and carry signals of authority. Sparse or poorly formatted pages rarely get cited, even if they technically appear in Bing results.

For more on this, take a look at the guide on writing content AI search engines will quote.

Perplexity, Gemini, and the Bigger Picture

It is worth noting that ChatGPT is not the only AI search engine with its own approach to sourcing information. Perplexity uses its own crawler (PerplexityBot) alongside real-time search. Gemini sits inside Google's ecosystem and does have access to Google's index and search infrastructure. Each platform has its own behaviour.

This is why treating "AI SEO" as a single unified thing is a mistake. What helps you rank in Gemini (strong Google presence, structured data, E-E-A-T signals) is not identical to what helps you appear in ChatGPT's live browsing (Bing indexing, citable content structure) or Perplexity (its own crawler plus search APIs).

At FlinnSchema, we look at all of these channels together when auditing a site's AI visibility, because optimising for one and ignoring the others leaves real gaps. You can request a free AI visibility audit if you want to see how your site currently performs across these platforms.

Can You Block GPTBot?

Yes. If you do not want OpenAI's crawler to read your content for training purposes, you can add a directive to your robots.txt file:

User-agent: GPTBot
Disallow: /

This prevents future training crawls of your site. It does not remove content that was already collected before you added the rule, and it does not affect ChatGPT's live browsing, which goes through Bing rather than GPTBot. OpenAI also documents separate user agents for search and for user-initiated page fetches (OAI-SearchBot and ChatGPT-User), and a GPTBot rule does not apply to them. If you want to think through whether blocking AI crawlers is the right call for your business, this post on blocking AI crawlers walks through the trade-offs in detail.
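Before deploying a rule like the one above, it can be worth checking that it blocks what you intend and nothing else. The sketch below feeds the same two lines to Python's standard-library robots.txt parser and asks whether a given user agent may fetch a URL. The example URL and the `is_allowed` helper are assumptions for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article, held as a string for testing.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://www.example.com/products"  # placeholder URL
print(is_allowed(ROBOTS_TXT, "GPTBot", page))   # False: GPTBot is blocked
print(is_allowed(ROBOTS_TXT, "bingbot", page))  # True: other crawlers are unaffected
```

The second check matters for the Bing point made earlier: a GPTBot-only rule leaves Bing's crawler free to index the site.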

Why Your Schema Markup Still Matters

Whether the information is coming from training data or a live Bing result, structured data makes your content easier for AI systems to interpret correctly. Schema markup is not just a Google thing. It is a machine-readability signal that helps any automated system (crawlers, language models, browsing agents) understand what your page is saying and who is saying it.

If your pages lack structured data, AI systems are left guessing at your brand name, your products, your pricing, your location, and your authority. Schema fills in those gaps with explicit, machine-readable facts.
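A quick way to see which of those facts a page already declares is to pull the JSON-LD out of its HTML and list the Schema.org types it contains. The sketch below does that with the standard library only. The sample page, the `JsonLdExtractor` class, and the set of "wanted" types are assumptions for illustration, and this is not how the FlinnSchema audit itself works.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every application/ld+json script."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed markup instead of failing


def schema_types(html):
    """Return the set of @type values declared in a page's JSON-LD."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = set()
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            for node in item.get("@graph", [item]):
                declared = node.get("@type", []) if isinstance(node, dict) else []
                found.update([declared] if isinstance(declared, str) else declared)
    return found


# Made-up page that declares a Product and nothing else.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Example Mug"}
</script>
</head><body><h1>Example Mug</h1></body></html>"""

wanted = {"Organization", "Product", "FAQPage"}
print(sorted(wanted - schema_types(page)))  # ['FAQPage', 'Organization']
```

This only reports which types are present. It says nothing about whether the properties inside them are complete or accurate.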

If you are not sure where your schema stands right now, the free audit is a good starting point. It shows you exactly what is missing and what would make the biggest difference to how AI engines read your site.

For a deeper look at how AI crawlers actually find and read your site in the first place, this article on how AI crawlers like GPTBot and ClaudeBot find your site is worth reading alongside this one.

Frequently Asked Questions

Does ChatGPT use Google's search results?

No. ChatGPT does not use Google's index or search results. For live web browsing, it uses the Bing Search API via OpenAI's partnership with Microsoft. For its base knowledge, it relies on training data that OpenAI collected independently using its own crawler, GPTBot.

If I rank well on Google, will ChatGPT find my content?

Not automatically. Google rankings do not carry over to ChatGPT. For ChatGPT's live browsing, you need to be indexed and visible in Bing. For the model's base knowledge, your content needed to be crawled by GPTBot before the training cutoff and be of sufficient quality to be retained in the training data.

Does blocking GPTBot affect ChatGPT's live search answers?

No. Blocking GPTBot only prevents your content from being included in future OpenAI training runs. ChatGPT's live browsing operates through Bing, not through GPTBot. If you want to prevent your content from appearing in live ChatGPT answers, that would require a different approach entirely, such as ensuring your pages are not indexed in Bing.

How is this different for Gemini and Perplexity?

Gemini is built inside Google's ecosystem and does have access to Google's search infrastructure, so strong Google visibility is directly relevant there. Perplexity uses its own crawler, PerplexityBot, combined with real-time search APIs. Each AI platform has its own sourcing mechanism, which is why a site-level AI visibility strategy needs to account for all of them rather than treating them as identical.
