GEO Toolbox

Reference

GEO Glossary.

Short, plain-English definitions of the terms behind generative engine optimization and AI search. Each one links to the full guide.

AI Crawlers

CCBot

CCBot is the crawler operated by Common Crawl, a nonprofit that publishes a free, open dataset of web pages. Because that dataset is widely used to train large language models, CCBot is one of the most common indirect routes your content takes into AI systems. It respects robots.txt.

ChatGPT-User

ChatGPT-User is the OpenAI agent that fetches a specific web page in real time when a user's ChatGPT prompt requires it, such as following a link or answering a question about a page. It is distinct from GPTBot (training) and OAI-SearchBot (search indexing).

ClaudeBot

ClaudeBot is Anthropic's web crawler, used to gather publicly available content for Claude. It identifies with the user agent ClaudeBot and respects robots.txt, so you allow or block it the same way you would any other crawler.

Google-Extended

Google-Extended is a robots.txt token that controls whether your content can be used to train Google's AI models, including Gemini, and to ground AI answers. Crucially, it does not affect your ranking in Google Search: blocking it removes you from one AI surface, not from search.

GPTBot

GPTBot is OpenAI's web crawler that gathers publicly available content which may be used to train its models. You control it through robots.txt. It is separate from OAI-SearchBot, the crawler that surfaces pages in ChatGPT's search answers, so blocking GPTBot opts you out of training without removing you from ChatGPT search.

llms.txt

llms.txt is a markdown file at a site's root that gives AI systems a curated map of its most important content. As of 2026 it is not a Google Search or AI Overviews ranking signal, but Google's Chrome Lighthouse now audits for it as an agentic-browsing best practice, so it is becoming low-cost infrastructure for helping AI agents navigate your site.
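As a rough illustration, the sketch below renders a small llms.txt file. The layout (a title line, a one-line summary as a blockquote, then sections of annotated links) follows the community llms.txt proposal as commonly described; the function name, site name, and URLs are hypothetical.

```python
def render_llms_txt(site_name: str, summary: str,
                    sections: dict[str, list[tuple[str, str, str]]]) -> str:
    """Render a minimal llms.txt body as markdown text."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            # One annotated link per important page.
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
llms_txt = render_llms_txt(
    "Example Co",
    "Guides and reference material about AI search.",
    {
        "Guides": [
            ("Glossary", "https://example.com/glossary", "Definitions of key terms"),
            ("robots.txt guide", "https://example.com/robots", "How to allow or block AI crawlers"),
        ],
    },
)
print(llms_txt)
```

The output would be saved as /llms.txt at the site root. Keeping the list short and curated is the point: it is a map of the pages that matter most, not a full sitemap.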

OAI-SearchBot

OAI-SearchBot is OpenAI's crawler that surfaces and links websites in ChatGPT's search answers. It respects robots.txt and is separate from GPTBot (training): if you block OAI-SearchBot you can disappear from ChatGPT search results, while GPTBot, unless you block it too, can still collect your content for training.

Perplexity-User

Perplexity-User is the agent Perplexity uses to fetch a specific page in real time when a user's question requires it. Because the request is user-initiated, it generally ignores robots.txt, so a robots.txt block will not stop a direct user-driven fetch.

PerplexityBot

PerplexityBot is Perplexity's crawler, designed to surface and link websites in Perplexity's search results. Perplexity states it is not used to train foundation models and recommends allowing it in robots.txt. Blocking it removes you from the index Perplexity builds answers from.

robots.txt

robots.txt is a plain-text file at the root of a site that tells crawlers which paths they may or may not fetch, by user agent. For AI search it is the primary control for allowing or blocking crawlers like GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. Most well-behaved AI crawlers respect it.
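One way to check what a robots.txt policy actually allows is to run it through Python's standard-library parser. The sketch below assumes a hypothetical policy that opts out of OpenAI training (GPTBot) while staying available to ChatGPT search (OAI-SearchBot); the rules and the example.com URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block the training crawler, allow the search crawler,
# and keep /private/ off limits for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
PAGE = "https://example.com/guide"

for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot"):
    print(agent, parser.can_fetch(agent, PAGE))
# GPTBot False
# OAI-SearchBot True
# ClaudeBot True
```

ClaudeBot has no group of its own here, so it falls back to the wildcard group and is only kept out of /private/. The check describes what the file asks for; it cannot tell you whether a given crawler honors the request.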

AI Search Surfaces

GEO Concepts

AI Citation

An AI citation is when an AI search engine references and links your page as a source in its answer. Citations are the core unit of AI visibility: in a zero-click world, what gets your brand seen is being one of the cited sources, not ranking as a blue link.

AI Search Engine Optimization

AI search engine optimization is the umbrella term for improving how your brand appears in AI-powered search, across ChatGPT, Perplexity, Gemini, and Google AI Overviews. It is the plain-English name for the same discipline also called generative engine optimization (GEO) and answer engine optimization (AEO).

Answer Engine Optimization (AEO)

Answer engine optimization (AEO) is the practice of structuring content to win the direct answer in answer engines, including featured snippets, voice results, and AI answers. It overlaps almost entirely with generative engine optimization (GEO); the two are the same job described from different angles.

Entity SEO

Entity SEO is optimizing around clearly defined entities (your brand, people, products, and concepts) and the relationships between them, rather than around keyword strings. It helps search and AI engines recognize who you are and connect facts to you, which makes you safer to cite.

LLM SEO (LLMO)

LLM SEO, sometimes called LLMO, is optimizing content to be surfaced and cited by large language model tools like ChatGPT and Claude. It frames the work around the models specifically, but in practice it is the same discipline as generative engine optimization (GEO) and answer engine optimization (AEO).

Schema Markup (Structured Data)

Schema markup is code, usually JSON-LD using the schema.org vocabulary, that labels what the content on a page means so machines can parse it. It does not force AI citations, and Google says no special schema is required for AI features, but it helps engines understand your entities and content.
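A minimal sketch of what such markup can look like, generated here with Python's json module. The Organization type and the name, url, and sameAs properties come from the schema.org vocabulary; the company name and URLs are placeholders, and the helper function is hypothetical.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Build a JSON-LD string describing an organization."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Profiles on other sites that refer to the same entity.
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2)


markup = organization_jsonld(
    "Example Co",
    "https://example.com",
    ["https://www.linkedin.com/company/example-co"],
)

# JSON-LD is usually embedded in the page inside a script tag.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```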

Share of Voice (AI)

AI share of voice is the percentage of AI answers, across a defined set of prompts, in which your brand is mentioned or cited, compared with competitors. It is the closest thing to a ranking metric in AI search: not a position, but how often you are part of the answer.
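The calculation can be sketched in a few lines. This assumes you have already collected the answer text for each prompt in your set; the brand names and answers below are made up, and the matching is a naive case-insensitive substring test that a real tracker would need to refine.

```python
def share_of_voice(answers: list[str], brands: list[str]) -> dict[str, float]:
    """Percentage of answers that mention each brand (case-insensitive)."""
    if not answers:
        return {brand: 0.0 for brand in brands}
    result = {}
    for brand in brands:
        hits = sum(1 for answer in answers if brand.lower() in answer.lower())
        result[brand] = 100.0 * hits / len(answers)
    return result


# Hypothetical answers collected for four prompts.
answers = [
    "Acme and Globex both offer this feature.",
    "Most reviewers recommend Globex.",
    "Acme is the usual pick for small teams.",
    "Neither vendor is mentioned here.",
]
print(share_of_voice(answers, ["Acme", "Globex", "Initech"]))
# {'Acme': 50.0, 'Globex': 50.0, 'Initech': 0.0}
```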

How AI Search Works

AI Hallucination

An AI hallucination is when a model generates false or fabricated information and presents it as fact. In AI search it shows up as wrong claims, invented sources, or incorrect brand details. Grounding answers in retrieved, citable sources is the main defense, which is why clear, sourced content matters.

Content Chunking

Content chunking is structuring a page into self-contained units, each making sense on its own, so AI systems can retrieve and cite a single passage cleanly. Engines pull chunks, not whole pages, so a section that stands alone without 'as mentioned above' is far easier to quote.
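To make the idea concrete, the sketch below splits a page into one chunk per heading and flags chunks that lean on earlier text. It assumes markdown-style "## " headings and a short, hypothetical list of back-reference phrases; real retrieval systems chunk in their own undisclosed ways.

```python
import re

# Phrases that suggest a chunk depends on text outside itself (illustrative list).
BACK_REFERENCES = re.compile(
    r"\b(as mentioned above|as noted earlier|see above)\b", re.IGNORECASE
)


def chunk_by_heading(page: str) -> list[dict]:
    """Split a page into one chunk per '## ' heading."""
    chunks = []
    for line in page.splitlines():
        if line.startswith("## "):
            chunks.append({"heading": line[3:].strip(), "text": ""})
        elif chunks and line.strip():
            # Append body text to the most recent heading's chunk.
            chunks[-1]["text"] = (chunks[-1]["text"] + " " + line.strip()).strip()
    for chunk in chunks:
        chunk["self_contained"] = BACK_REFERENCES.search(chunk["text"]) is None
    return chunks


PAGE_TEXT = """\
## What is GPTBot?
GPTBot is OpenAI's web crawler.

## How do I block it?
As mentioned above, use the same file.
"""

for chunk in chunk_by_heading(PAGE_TEXT):
    print(chunk["heading"], "->", chunk["self_contained"])
# What is GPTBot? -> True
# How do I block it? -> False
```

The second chunk fails the check because a reader who sees only that passage cannot tell which file is meant.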

Grounding

Grounding is the practice of tying an AI model's answer to verifiable external sources retrieved at query time, rather than relying on the model's internal memory alone. Grounded answers cite where each claim came from, which reduces hallucination and makes your content quotable.

Knowledge Graph

A knowledge graph is a structured network of entities (people, places, brands, concepts) and the relationships between them, often modeled as triples (subject, relation, object). Google's Knowledge Graph powers knowledge panels and helps engines understand who you are. A clear, consistent entity presence makes you easier to recognize and cite.
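The triple model is simple enough to sketch directly. The entities and relation names below are invented for illustration and do not reflect how any particular engine stores its graph.

```python
from collections import defaultdict

# Each fact is a (subject, relation, object) triple. All values are hypothetical.
triples = [
    ("Example Co", "founded_by", "Jane Doe"),
    ("Example Co", "headquartered_in", "Berlin"),
    ("Jane Doe", "works_for", "Example Co"),
]


def index_by_subject(facts):
    """Group triples by subject so all facts about an entity can be looked up."""
    graph = defaultdict(list)
    for subject, relation, obj in facts:
        graph[subject].append((relation, obj))
    return graph


graph = index_by_subject(triples)
print(graph["Example Co"])
# [('founded_by', 'Jane Doe'), ('headquartered_in', 'Berlin')]
```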

Query Fan-Out

Query fan-out is the technique where an AI search engine expands a single user question into many parallel sub-queries, retrieves results for each, and synthesizes one answer. Ask for a 5-day trip to Japan and it quietly searches for hotels, weather, train passes, and more, all at once.
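The pattern can be sketched as expand, search in parallel, then collect. Both expand and search below are hypothetical stand-ins: a real engine would use a model to generate the sub-queries and a retrieval backend to answer them.

```python
from concurrent.futures import ThreadPoolExecutor


def expand(question: str) -> list[str]:
    """Hard-coded expansion for illustration; real engines generate these."""
    facets = ["hotels", "weather", "train passes"]
    return [f"{question} {facet}" for facet in facets]


def search(sub_query: str) -> str:
    """Stand-in for a real retrieval call."""
    return f"results for '{sub_query}'"


def fan_out(question: str) -> dict[str, str]:
    """Run every sub-query in parallel and collect the results."""
    sub_queries = expand(question)
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        results = list(pool.map(search, sub_queries))
    return dict(zip(sub_queries, results))


for sub_query, result in fan_out("5-day trip to Japan").items():
    print(sub_query, "->", result)
```

The practical consequence is that your page may be retrieved for a sub-query the user never typed, so covering the adjacent questions around a topic can matter as much as covering the headline one.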

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) is the technique behind most AI search: instead of answering only from memory, the model retrieves relevant documents at query time and grounds its generated answer in them, then cites the sources it used. It is why fresh, reachable, extractable pages can be quoted right away.
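A toy version of the retrieve-then-ground loop is sketched below. It swaps in simple word overlap for the retrieval step (production systems typically use embeddings) and stops at building the prompt rather than calling a model; the documents and URLs are made up.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set, used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[dict], k: int = 2) -> list[dict]:
    """Return the k documents sharing the most words with the question."""
    query_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc["text"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, sources: list[dict]) -> str:
    """Assemble a grounded prompt with numbered, citable sources."""
    numbered = "\n".join(
        f"[{i}] {doc['url']}: {doc['text']}" for i, doc in enumerate(sources, start=1)
    )
    return (
        "Answer using only these sources and cite them by number.\n"
        f"{numbered}\n\nQuestion: {question}"
    )


# Hypothetical document store.
DOCUMENTS = [
    {"url": "https://example.com/gptbot", "text": "GPTBot is the OpenAI crawler used for training."},
    {"url": "https://example.com/pricing", "text": "Our plans start at ten dollars per month."},
    {"url": "https://example.com/robots", "text": "Use robots.txt to block a crawler such as GPTBot."},
]

question = "How do I block the GPTBot crawler?"
sources = retrieve(question, DOCUMENTS)
print(build_prompt(question, sources))
```

Only the two crawler pages reach the prompt; the pricing page shares no words with the question and is never seen by the model, which is the sense in which unreachable or unextractable content cannot be cited.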

Semantic Search

Semantic search matches content to a query by meaning and intent rather than exact keywords, using vector embeddings to compare concepts. It is why AI engines can pull your page for a question that does not contain your exact phrasing, and why writing for intent beats writing for keyword strings.

Vector Embeddings

A vector embedding is a numerical representation of text (or of images or audio) that captures its meaning as a point in high-dimensional space. Pieces of content with similar meaning sit close together, which lets AI systems retrieve relevant passages by similarity. Embeddings power semantic search and RAG.
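Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hand-written three-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: the first two passages are about the same topic.
EMBEDDINGS = {
    "How to block AI crawlers": [0.9, 0.1, 0.0],
    "Stopping GPTBot with robots.txt": [0.8, 0.2, 0.1],
    "Best ramen in Tokyo": [0.0, 0.1, 0.9],
}

# Made-up embedding for a query about keeping bots off a site.
query_vector = [0.85, 0.15, 0.05]

ranked = sorted(
    EMBEDDINGS,
    key=lambda title: cosine_similarity(query_vector, EMBEDDINGS[title]),
    reverse=True,
)
for title in ranked:
    print(f"{cosine_similarity(query_vector, EMBEDDINGS[title]):.3f}  {title}")
```

Both crawler passages score close to 1 against the query even though their wording differs, while the unrelated passage scores near 0. That is the mechanism that lets a page be retrieved for phrasing it never used.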