
GEO Glossary.

Short, plain-English definitions of the terms behind generative engine optimization and AI search. Each one links to the full guide.

AI Crawlers

AI Crawler

An AI crawler is an automated bot that fetches web pages for an AI company, either to train models or to retrieve live content for answers. Most of them can be controlled through robots.txt, and whether your pages reach AI answers depends on allowing the right ones.

Amazonbot

Amazonbot is Amazon's web crawler, used to improve Amazon services such as letting Alexa answer more questions, with crawled data that may also help train Amazon's AI models. It identifies with the user agent Amazonbot and respects robots.txt.

Applebot

Applebot is Apple's web crawler, powering Siri, Spotlight, and Safari search suggestions. A separate robots.txt token, Applebot-Extended, controls whether your content may be used to train Apple's foundation models, without affecting Applebot's search crawling.

Bingbot

Bingbot is Microsoft's web crawler that builds the Bing search index. It matters for AI visibility because Bing's index also backs Microsoft Copilot and ChatGPT search, so a page Bingbot cannot reach can be missing from those AI answers too.

Bytespider

Bytespider is ByteDance's web crawler, used to collect data to train its large language models. It is known for crawling aggressively and reportedly does not always respect robots.txt, so blocking it often takes a server or WAF rule rather than robots.txt alone.
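A server-side block can be sketched as a small WSGI middleware that refuses any request whose User-Agent header contains a blocked token. This is a minimal sketch, assuming a Python WSGI app; `block_user_agents`, `BLOCKED_AGENTS`, and `hello_app` are hypothetical names. Because the User-Agent header is easy to spoof, a production WAF rule may also need other signals.

```python
# Hypothetical policy: user-agent substrings to refuse (matched case-insensitively).
BLOCKED_AGENTS = ("Bytespider",)


def block_user_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token.lower() in user_agent for token in blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)  # everyone else reaches the site

    return middleware


def hello_app(environ, start_response):
    """Hypothetical site app that the middleware protects."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]


app = block_user_agents(hello_app)
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app)` can serve the wrapped app.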

CCBot

CCBot is the crawler operated by Common Crawl, a nonprofit that publishes a free, open dataset of web pages. Because that dataset is widely used to train large language models, CCBot is one of the most common indirect routes your content takes into AI systems. It respects robots.txt.

ChatGPT-User

ChatGPT-User is the OpenAI agent that fetches a specific web page in real time when a user's ChatGPT prompt requires it, such as following a link or answering a question about a page. It is distinct from GPTBot (training) and OAI-SearchBot (search indexing).

ClaudeBot

ClaudeBot is Anthropic's training crawler, which gathers publicly available content that may train future Claude models. It is one of three Anthropic bots: Claude-SearchBot indexes pages for Claude's web search, and Claude-User fetches pages live during a user's session. Each respects robots.txt and is controlled separately.

Google-Extended

Google-Extended is a robots.txt token, not a separate crawler, that controls whether content Google crawls can be used to train Google's AI models, including Gemini, and to ground Gemini's answers. Blocking it affects Gemini training and grounding only; it does not affect your inclusion or ranking in Google Search.

GPTBot

GPTBot is OpenAI's web crawler that gathers publicly available content which may be used to train its models. You control it through robots.txt. It is separate from OAI-SearchBot, the crawler that surfaces pages in ChatGPT's search answers, so blocking GPTBot opts you out of training without removing you from ChatGPT search.

llms.txt

llms.txt is a markdown file at a site's root that gives AI systems a curated map of its most important content. As of 2026 it is not a Google Search or AI Overviews ranking signal, but Google's Chrome Lighthouse now audits for it as an agentic-browsing best practice, so it is becoming low-cost infrastructure for helping AI agents navigate your site.
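The file itself is plain Markdown. The sketch below renders one following the layout of the llms.txt proposal published at llmstxt.org (an H1 site name, a blockquote summary, then H2 sections listing links with optional notes). `build_llms_txt` and the sample site are hypothetical, and the proposal allows optional parts not shown here.

```python
def build_llms_txt(site_name, summary, sections):
    """Render a minimal llms.txt.

    sections maps an H2 heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:  # the short description after the link is optional
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines)


example = build_llms_txt(
    "Example Co",
    "Example Co makes hypothetical widgets.",
    {"Docs": [("Getting started", "https://example.com/start.md", "Setup guide")]},
)
print(example)
```

The returned string would be saved as `/llms.txt` at the site root.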

meta-externalagent

meta-externalagent is Meta's web crawler, used to gather public content to train its AI models and index the web. It identifies with the user agent meta-externalagent and respects robots.txt. It is distinct from Meta-ExternalFetcher, which fetches links on a user's behalf.

OAI-SearchBot

OAI-SearchBot is OpenAI's crawler that surfaces and links websites in ChatGPT's search answers. It respects robots.txt and is separate from GPTBot (training): blocking OAI-SearchBot can remove you from ChatGPT search results, while training use is governed separately by your GPTBot rule.

Perplexity-User

Perplexity-User is the agent Perplexity uses to fetch a specific page in real time when a user's question requires it. Because the request is user-initiated, it generally ignores robots.txt, so a robots.txt block will not stop a direct user-driven fetch.

PerplexityBot

PerplexityBot is Perplexity's crawler, designed to surface and link websites in Perplexity's search results. Perplexity states it is not used to train foundation models and recommends allowing it in robots.txt. Blocking it removes you from the index Perplexity builds answers from.

robots.txt

robots.txt is a plain-text file at the root of a site that tells crawlers, by user agent, which paths they may or may not fetch. For AI search it is the primary control for allowing or blocking crawlers such as GPTBot, ClaudeBot, and PerplexityBot, as well as usage tokens such as Google-Extended. Most well-behaved AI crawlers respect it.
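One way to sanity-check a policy is Python's standard-library `urllib.robotparser`, which answers whether a given user agent may fetch a URL. The policy below is a hypothetical example that opts out of OpenAI training via GPTBot while allowing OAI-SearchBot. This parser applies the first matching rule in file order rather than the longest-match rule Google documents, so treat it as a rough check, not a replica of any crawler's logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: no OpenAI training, stay in ChatGPT search,
# and keep every other bot out of /private/.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "OAI-SearchBot", "ClaudeBot"):
    for path in ("/blog/post", "/private/report"):
        url = "https://example.com" + path
        print(agent, path, parser.can_fetch(agent, url))
```

ClaudeBot has no group of its own here, so it falls back to the `*` group.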

AI Search Surfaces

AI Overviews

AI Overviews are Google's AI-generated summaries that appear at the top of some search results. Powered by Gemini, they synthesize an answer from multiple web sources and link to a few of them, so users often get the answer without clicking. Being one of the cited sources is the goal of optimizing for AI Overviews.

Answer Engine

An answer engine is a search tool that returns a single synthesized answer to a question rather than a list of links to evaluate. Perplexity, Google's AI Overviews, and ChatGPT search are answer engines. Optimizing to be cited in them is the focus of answer engine optimization (AEO).

ChatGPT Search

ChatGPT Search is ChatGPT's ability to retrieve live web results and answer with citations, instead of relying on training data alone. Pages are gathered by OAI-SearchBot, and appearing in its answers is a separate goal from using ChatGPT to write content.

Featured Snippet

A featured snippet is the boxed answer Google pulls from a ranking page and shows at the very top of results, above the blue links. It predates AI search but rewards the same thing AI engines do: a concise, directly stated answer that can be lifted out of the page.

Google AI Mode

Google AI Mode is Google's conversational, AI-first search experience, powered by Gemini, that returns a generated answer you can follow up on rather than a traditional list of ten links. It sits alongside AI Overviews as a surface where being a cited source, not a ranked link, is the goal.

Google SGE

Google SGE (Search Generative Experience) was the experimental name for Google's AI-generated answers in Search, launched in Labs in 2023. It graduated and was rebranded as AI Overviews, which rolled out to all U.S. users in May 2024, so SGE is now the retired label for the same surface.

Knowledge Panel

A Google Knowledge Panel is the box of facts about an entity (a brand, person, or place) that appears in search results, generated automatically from the Knowledge Graph. It is the clearest sign Google recognizes you as an entity, and it cannot be requested, only earned and then claimed.

Zero-Click Search

A zero-click search is a query that ends without the user clicking through to any website, because the answer is shown directly on the results page, such as in a featured snippet, knowledge panel, or AI Overview. It is the core reason rankings can hold steady while traffic falls.

GEO Concepts

AI Agent

An AI agent is an AI system that takes actions to reach a goal, not just answers a question. Where a chatbot replies, an agent plans steps, uses tools (search, code, APIs, a browser), and works through a multi-step task on your behalf. Coding assistants, deep-research modes, and computer-use tools are common examples.

AI Citation

An AI citation is when an AI search engine references and links your page as a source in its answer. Citations are the core unit of AI visibility: in a zero-click world, being one of the cited sources, not ranking a blue link, is what gets your brand seen.

AI Search Engine Optimization

AI search engine optimization is the umbrella term for improving how your brand appears in AI-powered search, across ChatGPT, Perplexity, Gemini, and Google AI Overviews. It is the plain-English name for the same discipline also called generative engine optimization (GEO) and answer engine optimization (AEO).

AI Visibility

AI visibility is how often, and how favorably, AI engines like ChatGPT, Google AI Overviews, Gemini, and Perplexity surface your brand in their answers. It is the AI-era equivalent of search rankings: instead of where you sit in a list of links, it measures whether the AI mentions, cites, or recommends you at all.

Answer Engine Optimization (AEO)

Answer engine optimization (AEO) is the practice of structuring content to win the direct answer in answer engines, including featured snippets, voice results, and AI answers. It overlaps almost entirely with generative engine optimization (GEO); the two are the same job described from different angles.

Brand Mention

A brand mention is any reference to your brand across the web, linked or unlinked. For AI search, mentions act as corroboration: the more consistently your brand and its facts appear on trusted sources, the more confidently engines recognize and cite you.

E-E-A-T

E-E-A-T stands for Experience, Expertise, Authoritativeness, and Trustworthiness, the qualities Google's Search Quality Rater Guidelines use to judge content. It is not a direct ranking factor, but the signals behind it (first-hand experience and corroborated credibility) are what make content safer to cite, including in AI answers.

Entity SEO

Entity SEO is optimizing around clearly defined entities (your brand, people, products, and concepts) and the relationships between them, rather than around keyword strings. It helps search and AI engines recognize who you are and connect facts to you, which makes you safer to cite.

Generative Engine Optimization (GEO)

Generative engine optimization (GEO) is the practice of structuring your content and brand presence so AI engines like ChatGPT, Perplexity, and Google AI Overviews cite, quote, and recommend you in their answers. Where SEO competes for a ranked link, GEO competes to be part of the synthesized answer itself.

LLM SEO (LLMO)

LLM SEO, sometimes called LLMO, is optimizing content to be surfaced and cited by large language model tools like ChatGPT and Claude. It frames the work around the models specifically, but in practice it is the same discipline as generative engine optimization (GEO) and answer engine optimization (AEO).

Schema Markup (Structured Data)

Schema markup is code, usually JSON-LD using the schema.org vocabulary, that labels what the content on a page means so machines can parse it. It does not force AI citations, and Google says no special schema is required for AI features, but it helps engines understand your entities and content.
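A minimal sketch of generating Organization markup with Python's `json` module. The organization details, logo URL, and the Wikidata and LinkedIn `sameAs` links are placeholders, not real entities; the output is the `<script type="application/ld+json">` block that would go in the page's HTML.

```python
import json

# Placeholder facts; replace with your organization's real details.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Co",
    "url": "https://example.com",
    "logo": "https://example.com/logo.png",
    # sameAs ties the entity to its profiles elsewhere (placeholder IDs).
    "sameAs": [
        "https://www.wikidata.org/wiki/Q0000000",
        "https://www.linkedin.com/company/example-co",
    ],
}

json_ld = json.dumps(organization, indent=2)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```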

Share of Voice (AI)

AI share of voice is the percentage of AI answers, across a defined set of prompts, in which your brand is mentioned or cited, compared with competitors. It is the closest thing to a ranking metric in AI search: not a position, but how often you are part of the answer.
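As a worked example, a brand's share of voice is the number of tracked prompts whose answer mentions or cites it, divided by the total number of prompts. The prompts, brands, and answers below are invented for illustration; collecting real answers from each engine is the hard part and is not shown.

```python
from collections import Counter

# Hypothetical audit: for each tracked prompt, the brands the AI answer named.
answers = {
    "best crm for startups": {"Acme", "Globex"},
    "crm with free tier": {"Globex"},
    "how to pick a crm": {"Acme", "Initech"},
    "crm for agencies": set(),  # answer mentioned no tracked brand
}


def share_of_voice(answers):
    """Percent of prompts whose answer mentions each brand, highest first."""
    counts = Counter(brand for brands in answers.values() for brand in brands)
    total = len(answers)
    return {brand: 100 * n / total for brand, n in counts.most_common()}


print(share_of_voice(answers))
```

Here Acme and Globex each appear in 2 of 4 answers (50%) and Initech in 1 of 4 (25%).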

Topical Authority

Topical authority is the depth and breadth of credible content a site has on a subject, signaling to search and AI engines that it is a reliable source on that topic. It is built with connected content clusters and consistent entity signals, not a single page.

How AI Search Works

AI Hallucination

An AI hallucination is when a model generates false or fabricated information and presents it as fact. In AI search it shows up as wrong claims, invented sources, or incorrect brand details. Grounding answers in retrieved, citable sources is the main defense, which is why clear, sourced content matters.

AI Inference

Inference is the act of running a trained AI model to produce an output, as opposed to training, which is building the model in the first place. Every time you send a prompt and get an answer, that is one inference. It is where the ongoing cost, speed, and energy use of AI mostly live.

Content Chunking

Content chunking is structuring a page into self-contained units, each making sense on its own, so AI systems can retrieve and cite a single passage cleanly. Engines pull chunks, not whole pages, so a section that stands alone, without back-references like 'as mentioned above', is far easier to quote.
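A minimal sketch of one simple chunking strategy: split a Markdown page at its `##` headings so each chunk carries its own heading. How any particular AI engine chunks pages is not public, so `chunk_by_headings` and its "Introduction" fallback label are hypothetical.

```python
import re


def chunk_by_headings(markdown_text):
    """Split a Markdown page into (heading, body) chunks at level-2 headings."""
    chunks = []
    heading, body = "Introduction", []  # label for text before the first heading
    for line in markdown_text.splitlines():
        match = re.match(r"^##\s+(.*)", line)  # "###" lines do not match
        if match:
            if any(part.strip() for part in body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


page = """Intro text.

## What is GEO?
GEO is the practice of...

## How to start
Audit your prompts.
"""

for heading, text in chunk_by_headings(page):
    print(f"[{heading}] {text}")
```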

Context Window

A context window is how much text, measured in tokens, a language model can consider at once, including the prompt and any retrieved sources. In AI search, retrieved pages are loaded into the context window so the model can ground its answer in them.
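Retrieved sources compete for a fixed budget. The sketch below greedily packs passages into a token limit, assuming they are already sorted by relevance and approximating tokens with a whitespace word count (real tokenizers split text into subword pieces). `pack_context` is a hypothetical helper; a real system would also reserve room for the prompt and the answer.

```python
def pack_context(passages, max_tokens, count_tokens=lambda text: len(text.split())):
    """Add ranked passages in order, skipping any that would overflow the budget.

    count_tokens defaults to a word count, a rough stand-in for a real tokenizer.
    """
    packed, used = [], 0
    for passage in passages:
        cost = count_tokens(passage)
        if used + cost > max_tokens:
            continue  # too big for the space left; try the next passage
        packed.append(passage)
        used += cost
    return packed, used


passages = ["one two three", "four five six seven eight", "nine ten"]
print(pack_context(passages, max_tokens=6))
```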

Fine-Tuning

Fine-tuning is further training a pretrained model on a smaller, focused dataset so it specializes in a task, tone, or domain. The model keeps its general knowledge but adjusts its weights toward your examples, which is why a fine-tuned model can sound on-brand or follow a niche format that prompting alone struggles to enforce.

Grounding

Grounding is the practice of tying an AI model's answer to verifiable external sources retrieved at query time, rather than relying on the model's internal memory alone. Grounded answers cite where each claim came from, which reduces hallucination and makes your content quotable.

Knowledge Graph

A knowledge graph is a structured network of entities (people, places, brands, concepts) and the relationships between them, often modeled as triples (subject, relation, object). Google's Knowledge Graph powers knowledge panels and helps engines understand who you are. A clear, consistent entity presence makes you easier to recognize and cite.
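In code, a tiny knowledge graph can be held as a list of triples and indexed by subject. The entities and relation names below (`founded_by`, `headquartered_in`, and so on) are made up, and this only illustrates the triple structure, not how Google's Knowledge Graph is built.

```python
from collections import defaultdict

# Hypothetical facts about a made-up brand, as (subject, relation, object).
triples = [
    ("Example Co", "founded_by", "Jane Doe"),
    ("Example Co", "headquartered_in", "Berlin"),
    ("Example Co", "makes", "Widget Pro"),
    ("Jane Doe", "works_for", "Example Co"),
]

# Index by subject so all facts about an entity can be looked up at once.
graph = defaultdict(list)
for subject, relation, obj in triples:
    graph[subject].append((relation, obj))


def facts_about(entity):
    """Return every (relation, object) pair recorded for an entity."""
    return graph.get(entity, [])


print(facts_about("Example Co"))
```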

Large Language Model (LLM)

A large language model (LLM) is an AI model trained on vast amounts of text to understand and generate language. LLMs like GPT, Gemini, and Claude power AI search engines, producing answers from patterns learned in training plus sources retrieved at query time.

Mixture of Experts (MoE)

A mixture of experts is a model architecture split into many smaller 'expert' sub-networks, where a routing layer activates only a few experts for each token instead of running the whole model every time. This lets a model hold a very large number of parameters while keeping the compute (and cost) per answer low.
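The routing step can be sketched for a single token in plain Python, assuming the common top-k softmax gating design: score every expert, keep the best `k`, renormalize their scores into weights, and run only those experts. The experts here are trivial hypothetical functions; real MoE layers use learned router matrices, process batches of tokens, and add load-balancing terms.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token, experts, router_weights, k=2):
    """Route one token vector to its top-k experts and mix their outputs.

    Assumes each expert returns a vector the same length as its input.
    """
    # Router: score each expert by the dot product of its weight vector and the token.
    scores = [sum(w * x for w, x in zip(weights, token)) for weights in router_weights]
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])  # weights over the chosen experts only
    output = [0.0] * len(token)
    for gate, i in zip(gates, top):
        expert_out = experts[i](token)  # only the k selected experts are evaluated
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, top


# Four hypothetical experts that just scale their input.
experts = [lambda t, s=s: [s * x for x in t] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, chosen = moe_layer([1.0, 0.5], experts, router_weights, k=2)
print(chosen, out)
```

With four experts and `k=2`, half of the expert computation is skipped for this token.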

Multimodal AI

Multimodal AI is a model that can work with more than one type of data (text, images, audio, and video) in a single system, rather than handling text alone. A multimodal model can read a screenshot, describe a chart, transcribe speech, and answer questions about a video, often in the same conversation.

Named Entity Recognition (NER)

Named entity recognition (NER) is the natural language processing step that detects and classifies the entities (people, brands, places, products) mentioned in text. It is how search and AI engines turn a string of words into known things they can connect to facts in a knowledge graph.
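Production NER relies on trained models, but the simplest rule-based form, matching text against a gazetteer (a list of known names), shows the input and output shape: raw text in, typed entities with IDs out. The gazetteer entries and IDs below are hypothetical.

```python
import re

# Hypothetical gazetteer: known surface form -> (entity type, entity ID).
GAZETTEER = {
    "Example Co": ("ORG", "example-co"),
    "Jane Doe": ("PERSON", "jane-doe"),
    "Berlin": ("PLACE", "berlin"),
}


def recognize_entities(text, gazetteer=GAZETTEER):
    """Return (surface form, type, ID, start offset) for each known name, in text order."""
    found = []
    for name, (label, entity_id) in gazetteer.items():
        # Word boundaries stop "Berlin" from matching inside "Berliner".
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((name, label, entity_id, match.start()))
    return sorted(found, key=lambda item: item[3])


print(recognize_entities("Jane Doe founded Example Co in Berlin."))
```

The IDs are what let a system connect each mention to facts stored elsewhere, such as in a knowledge graph.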

Open Source AI

Open source AI refers to AI models released under licenses that let anyone use, study, modify, and share them. In the strict Open Source Initiative sense that requires open code, weights, and enough training detail to recreate the model. In practice the label is used loosely: many models called 'open source' are really open weights, with the data and recipe kept private.

Open Weights

An open-weights model is one whose trained parameters (the weights) are published for download, so you can run and usually fine-tune them, subject to the license. It is not the same as open source: the weights are released, but the training code, data, and recipe usually are not, so you cannot fully reproduce or audit how it was built.

Query Fan-Out

Query fan-out is the technique where an AI search engine expands a single user question into many parallel sub-queries, retrieves results for each, and synthesizes one answer. Ask for a 5-day trip to Japan and it quietly searches hotels, weather, train passes, and more at once.
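A minimal sketch of the pattern: expand a topic into sub-queries, run them concurrently, and collect the results for a synthesis step. The fixed `FACETS` templates and the `search` stub are hypothetical stand-ins; real engines generate sub-queries with a model and pass the merged results to an LLM.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical facets; a real engine would generate sub-queries with a model.
FACETS = ("hotels", "weather", "rail pass", "itinerary")


def fan_out(topic):
    """Expand one topic (assumed already extracted from the question) into sub-queries."""
    return [f"{topic} {facet}" for facet in FACETS]


def search(query):
    """Stub standing in for a real search backend."""
    return [f"result for '{query}'"]


def gather_for_answer(topic):
    sub_queries = fan_out(topic)
    # Run every sub-query concurrently; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        results = list(pool.map(search, sub_queries))
    # A real engine would now hand these results to an LLM to write one answer.
    return dict(zip(sub_queries, results))


print(gather_for_answer("5-day Japan trip"))
```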

Reasoning Model

A reasoning model is a large language model trained to work through a problem step by step before giving its final answer, rather than responding in one pass. Examples include OpenAI's o-series, DeepSeek-R1, Gemini Deep Think, and Claude's extended thinking. The extra 'thinking' improves accuracy on hard math, coding, and multi-step logic, at the cost of more time and tokens.

Retrieval-Augmented Generation (RAG)

Retrieval-augmented generation (RAG) is the technique behind most AI search: instead of answering only from memory, the model retrieves relevant documents at query time and grounds its generated answer in them, then cites the sources it used. It is why fresh, reachable, extractable pages can be quoted right away.
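The retrieve-then-ground loop can be sketched without calling a model: rank stored passages against the question, then build a prompt that numbers each source so the answer can cite it. This sketch uses word overlap in place of vector search, the documents and URLs are invented, and the final LLM call is omitted.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for embedding a passage."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Return the k (url, text) pairs sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(documents.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query, documents, k=2):
    """Number the retrieved sources so the model can cite them as [1], [2], ..."""
    sources = retrieve(query, documents, k)
    context = "\n".join(f"[{i}] ({url}) {text}" for i, (url, text) in enumerate(sources, start=1))
    prompt = (
        "Answer using only the sources below and cite them like [1].\n\n"
        f"{context}\n\nQuestion: {query}"
    )
    return prompt, [url for url, _ in sources]


docs = {
    "https://example.com/geo": "GEO structures content so AI engines cite it.",
    "https://example.com/seo": "SEO competes for ranked links in search results.",
    "https://example.com/bread": "Sourdough needs a mature starter.",
}
prompt, cited_urls = build_grounded_prompt("How do AI engines cite content?", docs)
print(prompt)  # this string would be sent to an LLM (not shown)
```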

Semantic Search

Semantic search matches content to a query by meaning and intent rather than exact keywords, using vector embeddings to compare concepts. It is why AI engines can pull your page for a question that does not contain your exact phrasing, and why writing for intent beats writing for keyword strings.
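Ranking by meaning usually comes down to cosine similarity between embedding vectors. The three-dimensional vectors below are invented for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), but they show how a query sharing no keywords with a page can still rank it first.

```python
import math

# Toy "embeddings" invented for illustration, not produced by any real model.
DOCS = {
    "How to lower your electricity bill": [0.9, 0.1, 0.0],
    "Cutting home energy costs in winter": [0.8, 0.2, 0.1],
    "Best hiking trails in Colorado": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vector, docs=DOCS):
    """Rank document titles by cosine similarity to the query embedding."""
    return sorted(docs, key=lambda title: cosine(query_vector, docs[title]), reverse=True)


# Pretend this is the embedding of "reduce power consumption" (no shared keywords).
print(semantic_search([0.85, 0.15, 0.05]))
```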

Transformer Model

The transformer is the neural-network architecture behind nearly every modern large language model, introduced by Google researchers in the 2017 paper 'Attention Is All You Need.' Its key idea, self-attention, lets the model weigh how much every word in the input relates to every other word, which is what makes it so good at language.
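Self-attention itself is a few lines of arithmetic. This sketch implements single-head scaled dot-product attention, softmax(Q·K^T / sqrt(d))·V, in plain Python with no masking. In a real transformer Q, K, and V come from learned projections of the token embeddings; here the same toy vectors are reused for all three.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a short sequence.

    Each argument is a list of vectors, one per token.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly this token relates to every token in the sequence.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output: the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1; the first token attends equally to itself and to the third token, and less to the second.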

Vector Embeddings

A vector embedding is a numerical representation of text, images, or audio that captures its meaning as a point in high-dimensional space. Pieces of content with similar meaning sit close together, which lets AI systems retrieve relevant passages by similarity. Embeddings power semantic search and RAG.

Wikidata

Wikidata is a free, collaborative knowledge base of machine-readable facts about entities, structured as items with unique IDs (QIDs) and referenced statements. It feeds Google's Knowledge Graph and is one of the sources AI engines lean on to recognize and describe a brand.