
How ChatGPT Cites Sources (and Why It Gets Them Wrong)

How ChatGPT cites sources: what a citation is, how it decides what to cite, why so many citations are wrong, and how to track which sources it shows you.

Samy Ben Sadok · 14 min read

Those little numbered links in a ChatGPT answer are its citations, the web pages the model pulled to build the response. The machinery behind them is stranger than it looks: ChatGPT reads far more than it credits, and the credits it does show are wrong more often than most people assume.

One quick disambiguation first. If you came here to learn how to cite ChatGPT itself in an APA or MLA paper, that is a different question with a short answer in the FAQ. This article is about how ChatGPT cites its sources, which is what matters if you care whether your content is the thing it points to.

What a ChatGPT Citation Actually Is

A ChatGPT citation is a link to a web page the model retrieved to build its answer. When ChatGPT runs a web search, it surfaces those sources two ways: as inline numbered links in the text, and in a Sources panel you can expand. On desktop, hovering an inline link shows a preview card of the page. On mobile, the sources collapse into a tap-to-open list, so that hover preview is a desktop convenience, not the citation itself.

The most useful thing to know is the tell. Per OpenAI's documentation, a ChatGPT Search answer carries inline citations plus a Sources button, so in practice, no Sources button means ChatGPT did not search the web for that answer. It answered from parametric memory, the patterns the large language model absorbed during training, and cited nothing, even when it sounds just as sure of itself.

That distinction is the foundation for everything below. A citation only exists when ChatGPT actually searched, and what it shows you is a filtered slice of what it read, not the whole list. The glossary entry on an AI citation defines the term; this piece is about the machinery behind it, which starts with how AI search works.

How ChatGPT Decides What to Cite

ChatGPT cites only about half of what it reads. When it searches, it fires several queries, pulls back a candidate pool of pages, and then cites roughly one in two. Ahrefs analyzed 1.4 million ChatGPT prompts and found the model cited 49.98% of the URLs it retrieved, about 23.4 million cited pages out of roughly 47 million pulled, while the other half were read for context and dropped. So the Sources panel is the visible tip of a much larger set of pages the model looked at.

What survives the cut depends heavily on the type of source. The same study scored citation rates by source type, and the spread is enormous.

Source type | Citation rate
Search-index pages | 88.46%
News | 12.01%
Reddit | 1.93%
YouTube | 0.51%
Academia | 0.40%

Two more findings cut against common assumptions. Among search-index results, the median cited page was around 500 days old, roughly 1.4 years, so relevance can outweigh raw freshness for ChatGPT (the opposite of Perplexity's live-first bias, which we cover in ChatGPT vs Perplexity). And in that same slice, pages with natural-language URL slugs were cited at 89.78% versus 81.11% for those without, an edge of about 8.7 percentage points for clean URLs.

Why Search-Index Pages Dominate

The 88% figure for search-index pages is doing a lot of work. ChatGPT Search runs on a web index, and the pages it already finds through that index are the ones it trusts enough to cite. News, Reddit, and YouTube get read for context and color, but the citation usually lands on a page that was retrievable and rankable in the first place. Classic discoverability still gates everything: if your page is not in the index ChatGPT searches, it cannot be the one-in-two that gets cited.

Fan-Out: One Question, Many Searches

When you ask one question, ChatGPT does not run one search. It expands your prompt into several related sub-queries, a process called fan-out, and pulls a candidate pool from all of them. It then scores those candidates against the answer it is drafting and keeps the closest matches, which is why roughly half the pool is dropped. A page can be solid and still lose its slot because another source matched the specific sentence better. Being retrievable gets you into the pool; being the clearest match for a passage is what earns the citation.

The Reddit Paradox: Read but Not Credited

ChatGPT reads Reddit constantly and almost never credits it. In the same analysis, 67.8% of all non-cited URLs came from Reddit, even though Reddit's own citation rate was just 1.93%. The model mines Reddit threads for context, opinions, and phrasing, then writes the answer and cites a cleaner-looking source instead.

This is the gap most people miss: retrieval is not citation. Your page can be pulled into the candidate pool, shape the wording of the answer, and never appear in the Sources panel. Being read is not the same as being cited, and the panel quietly undercounts what actually influenced the response.

Why does it swap in a different source? The model leans toward a citation that looks authoritative and self-contained (a clean article over a sprawling forum thread), even when the thread is where the substance came from. So a Reddit discussion can shape the answer while a tidy blog post takes the visible credit.

One denominator note keeps these numbers honest. The 1.93% above is the rate at which retrieved Reddit URLs earn a citation, not Reddit's slice of the citations you actually see. By share of all citations, Reddit sits near the top of the table: Profound's February 2026 study of roughly 730,000 US ChatGPT conversations with web citations (October to December 2025 data) measured Reddit at about 3% of all citations, and a May 2026 5W Public Relations audit put it at 11.97%, second only to Wikipedia's 13.15%. The two numbers differ by a factor of four because the studies are built differently: Profound counted one dataset from late 2025, while 5W synthesized nine datasets reaching into spring 2026, and as the sections below show, these shares move fast enough that the window matters. What they agree on is the rank, with Reddit at or near number two. Both pictures are true at once: ChatGPT retrieves so much Reddit that even a tiny per-URL citation rate leaves Reddit among its most-cited domains. Whenever you compare citation stats, check the denominator first.
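The two denominators are easy to mix up, so here is a minimal sketch of the difference. Every count below is invented for illustration (only the 1.93% Reddit rate echoes the study), and the domain names and function names are hypothetical.

```python
# Hypothetical retrieval log. These counts are made up for illustration;
# they are not taken from any of the studies discussed in this article.
retrieved = {"reddit.com": 100_000, "example-blog.com": 2_000, "example-news.com": 5_000}
cited = {"reddit.com": 1_930, "example-blog.com": 1_700, "example-news.com": 600}


def per_url_citation_rate(domain):
    """Share of a domain's retrieved URLs that ended up cited."""
    return cited[domain] / retrieved[domain]


def share_of_citations(domain):
    """Share of all visible citations that point at this domain."""
    return cited[domain] / sum(cited.values())


for domain in retrieved:
    rate = per_url_citation_rate(domain)
    share = share_of_citations(domain)
    print(f"{domain}: per-URL rate {rate:.2%}, share of citations {share:.2%}")
```

In this toy log Reddit has the lowest per-URL rate and still the largest share of citations, simply because so many of its URLs were retrieved in the first place.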

For a brand, that has a blunt consequence. If your category lives on Reddit, ChatGPT is probably reading those threads, but the visible credit is going somewhere else, and only the visible credit is measurable from the Sources list.

How Citations Differ by Mode

There is no single "ChatGPT citation" behavior, because ChatGPT is several products wearing one name. How (and whether) it cites depends on the mode you are in.

Mode | Does it cite? | How sources show | What to expect
Plain chat (no search) | No | Nothing | Answer from parametric memory; no Sources button
Search | Yes | Inline links + Sources panel | A handful of cited pages behind a one-shot web search
Deep Research | Yes, heavily | Many inline citations + a long source list | Dozens of sources across many searches; the most source-dense mode
Shopping | Yes | Product cards with merchant links | Sources are retailers and product pages, not articles
Agentic browser (Atlas) | Yes | Links to pages it visited | Cites what the agent actually opened while completing a task

The practical takeaway is that the same question can produce zero citations in plain chat and twenty in Deep Research. If you are checking whether ChatGPT cites you, the mode you test in changes the answer. And because each mode searches differently, a page cited in Deep Research is not guaranteed to surface in a quick Search answer.

Why ChatGPT Gets Citations Wrong

A ChatGPT citation tells you where the model looked, not whether it looked correctly. When Columbia Journalism Review's Tow Center ran a 1,600-query news-attribution test across eight AI search engines, more than 60% of the answers carried incorrect citations. ChatGPT Search misidentified 134 of 200 source articles, hedged with uncertainty language only 15 times, and never once declined to answer. It was confidently wrong far more often than it was cautiously right, and the study found the premium tiers answered more confidently, not more accurately, than the free ones.

Engine | Incorrect answers (news attribution)
Perplexity (best) | 37%
ChatGPT Search | 67% (134 of 200)
Grok-3 (worst) | 94% incorrect; 154 of 200 cited links broken

Part of this is history. The University of Arizona Libraries notes that early ChatGPT routinely fabricated citations because it could not search the web and was built to produce plausible text, not retrieve real references. Web search reduced that, but did not erase it. Headlines still get invented, bylines still land on the wrong outlet, and links still point at scraped copies rather than the original.

Licensing Doesn't Save You

A licensing deal with OpenAI does not guarantee accurate credit. An earlier Tow Center study from November 2024 found ChatGPT citing partner publishers like the New York Post and The Atlantic inaccurately, and crediting a Yahoo-republished copy of a USA Today article even though USA Today had blocked OpenAI's crawler. For a brand, that means the credit for your own work can land on a scraper, and a claim can carry your name without ever coming from you. It is the same machinery behind why AI gets your brand wrong, and the deeper dive there covers what to do about it.

How to Read and Trust the Sources ChatGPT Shows You

Treat the Sources panel as a starting point, not a verdict. Two facts from above should change how you read it: the panel undercounts what the model actually used, and a meaningful share of cited links are wrong or broken. A source appearing next to a claim is not proof the source supports that claim.

A 30-Second Citation Check

A quick way to check any ChatGPT citation:

  1. Click through. A cited link can 404 or land on an unrelated page. If it does not open, the citation is worthless.
  2. Confirm the page actually says it. Read the cited page for the specific claim. ChatGPT regularly attaches a real source to a sentence that source never made.
  3. Watch for syndication. If the link is a republished or scraped copy, find the original and cite that instead.
  4. Check the date. A confidently current-sounding answer can rest on a year-old page, given the ~500-day median.
  5. Re-ask. Run the same prompt again. If the sources change, you are seeing how unstable a single snapshot is.

The thirty seconds this takes is the difference between quoting ChatGPT and quoting whatever ChatGPT half-read. For anything you publish or send a client, the citation is a lead to verify, not a citation to trust.
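If you check citations often, the first two steps of the list above can be roughly automated. What follows is a minimal sketch, not a verification tool: the function names, the User-Agent string, and the idea of matching a few key phrases are all assumptions made for illustration, and a plain substring match on raw HTML will miss paraphrases and content rendered by JavaScript.

```python
import urllib.error
import urllib.request


def fetch_page(url, timeout=10):
    """Return (status, text). Status is None when the request fails outright."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "citation-check-sketch/0.1"}  # hypothetical UA
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, body
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code, ""
    except (OSError, ValueError):
        # DNS failure, timeout, refused connection, or a malformed URL.
        return None, ""


def check_citation(url, key_phrases, fetch=fetch_page):
    """Step 1: does the link open? Step 2: does the page contain the claim's key phrases?"""
    status, text = fetch(url)
    if status != 200:
        return {"url": url, "opens": False, "status": status,
                "missing_phrases": list(key_phrases)}
    lowered = text.lower()
    missing = [phrase for phrase in key_phrases if phrase.lower() not in lowered]
    return {"url": url, "opens": True, "status": status, "missing_phrases": missing}
```

Calling `check_citation(cited_url, ["a key phrase from the claim"])` returns a small report. An empty `missing_phrases` list only means the words appear on the page, so you still have to read the passage to confirm it supports the claim. Steps 3 to 5 remain manual.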

Do ChatGPT's Citations Stick Around?

A ChatGPT citation is a snapshot, not a permanent record. The link points to the live web, so if the page changes, moves, or comes down, the citation rots the way any bookmark does. Open an old chat months later and a source that once backed the claim may now 404, redirect, or show different content than the model summarized.

The Source Set Moves

There is a second kind of impermanence that matters more for visibility. Because ChatGPT re-runs the search every time, the source set itself is not fixed. The page it cited for a query last month may not be cited this month, even when nothing about your page changed, because the candidate pool and the model both shifted underneath it. A citation you earned is not a citation you keep, which means a screenshot of the Sources panel is evidence of one moment, not a standing fact.

The swings are not subtle. When Semrush tracked 230,000 prompts weekly from July 14 to October 12, 2025, the share of ChatGPT responses citing Reddit collapsed from close to 60% in early August to around 10% by mid-September 2025, and Wikipedia fell from roughly 55% of responses to under 20% over the same study window (those are shares of responses, a different denominator from the citation shares earlier). Model updates move the set too: a SISTRIX analysis of 3.8 million German-language ChatGPT responses found that when ChatGPT rolled over to GPT-5.5 in late May 2026, citation patterns that normally drift 1 to 2% a day shifted 47% in 48 hours, and the average number of cited sources per response dropped from 30.9 to 28.4. Nothing about those pages changed. The engine did.
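To put a number on how far the source set moved between two runs of the same prompt, one simple option is Jaccard overlap on the cited domains. This is a generic set-similarity measure chosen here for illustration, not the method used by Semrush or SISTRIX, and the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Reduce a URL to its host, dropping a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def source_overlap(run_a, run_b):
    """Jaccard overlap of cited domains: 1.0 means identical sets, 0.0 means nothing shared."""
    a = {domain_of(url) for url in run_a}
    b = {domain_of(url) for url in run_b}
    if not a and not b:
        return 1.0  # two empty source lists are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical source lists copied from the Sources panel on two different days.
last_month = [
    "https://www.example.com/guide",
    "https://reddit.com/r/some_thread",
    "https://en.wikipedia.org/wiki/Some_topic",
]
this_month = [
    "https://example.com/pricing",
    "https://news.example.org/story",
]
print(source_overlap(last_month, this_month))
```

Comparing at the domain level is a design choice: it treats a different page on the same site as continuity. Compare full URLs instead if you care about one specific page keeping its slot.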

How to Measure Which Sources ChatGPT Cites for You

There is no ChatGPT Search Console. Google hands you a dashboard of the queries you rank for; ChatGPT hands you nothing, so the sources it shows for your category are invisible unless you go looking on purpose.

The manual method works for a quick read. Write down the prompts a real customer would ask about your category, such as "best [your product] for [use case]" or "is [your brand] worth it," run each one in the mode your audience uses, and log which sources appear in the panel. Repeat it on a schedule, because the results move. In the scans we run for our own brand prompts, the same prompt can return a different source set day to day, so a single check tells you almost nothing, and the only useful signal is the trend across many runs.
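If you keep that log in a spreadsheet or a small script, the trend is straightforward to compute. The sketch below assumes a hypothetical log format (one record per run, with the date, the prompt, and the set of domains in the Sources panel) and made-up data. It is not how any particular tracker stores results.

```python
from collections import defaultdict
from datetime import date

# Hypothetical manual log: one record per prompt run. All values are invented.
runs = [
    {"day": date(2026, 1, 5), "prompt": "best crm for startups", "cited_domains": {"example.com", "reddit.com"}},
    {"day": date(2026, 1, 6), "prompt": "best crm for startups", "cited_domains": {"reddit.com"}},
    {"day": date(2026, 1, 7), "prompt": "best crm for startups", "cited_domains": {"example.com"}},
    {"day": date(2026, 1, 8), "prompt": "best crm for startups", "cited_domains": {"wikipedia.org"}},
    {"day": date(2026, 1, 12), "prompt": "best crm for startups", "cited_domains": {"reddit.com"}},
    {"day": date(2026, 1, 14), "prompt": "best crm for startups", "cited_domains": set()},
]


def weekly_citation_share(runs, domain):
    """Map ISO (year, week) to the fraction of that week's runs that cited `domain`."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for run in runs:
        iso = run["day"].isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if domain in run["cited_domains"]:
            hits[week] += 1
    return {week: hits[week] / totals[week] for week in sorted(totals)}


for week, share in weekly_citation_share(runs, "example.com").items():
    print(week, f"{share:.0%}")
```

Any single run in this log would say either "cited" or "not cited". Only the weekly share shows the direction, which is the point of repeating the check.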

That is also where the manual method runs out. Tracking dozens of prompts across modes, week after week, by hand is not realistic, which is the gap geotoolbox is built to close: its visibility tracker re-runs your prompts on a schedule (the free plan covers ChatGPT; paid plans add more engines), and citation tracking on the paid plans logs which sources surface over time. The broader metric this rolls up into is your AI share of voice, and the tooling view is an AI rank tracker. Both exist to make the source picture visible instead of guessed at.

What This Means If You Want to Be Cited

If the goal is to be the page ChatGPT cites, the mechanics above point to three things, and none of them is a new trick. Be in the index it searches, which means letting the OpenAI crawlers reach you and not blocking them in robots.txt or at the firewall. Be the clearest, most quotable source on the question, because the model favors pages it can lift a clean answer from. And give it the signals it rewards.
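The first of those three points, crawler access, is the easiest to check yourself. The sketch below uses Python's standard `urllib.robotparser` against a sample robots.txt. The crawler names are the two named in the FAQ of this article (OAI-SearchBot and GPTBot), while the sample rules, the URL, and the `crawler_access` helper are hypothetical. It only reads robots.txt rules, so it says nothing about firewall or CDN blocks.

```python
from urllib.robotparser import RobotFileParser

# Crawler names as given in this article's FAQ. Confirm the current list in
# OpenAI's own documentation before relying on it.
OPENAI_AGENTS = ("OAI-SearchBot", "GPTBot")


def crawler_access(robots_txt, page_url, agents=OPENAI_AGENTS):
    """Return {agent: allowed?} for one page, based only on the robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


# Hypothetical robots.txt: opts out of training (GPTBot) but leaves search open.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(crawler_access(sample_robots, "https://example.com/blog/post"))
```

With the sample rules, the search crawler is allowed and the training crawler is blocked, which matches the opt-out-of-training-only setup described in the FAQ. To test your own site, paste in the contents of your live robots.txt in place of `sample_robots`.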

The field is also more open than the big-domain headlines suggest. In Profound's February 2026 analysis of roughly 730,000 US ChatGPT conversations, the ten most-cited domains captured only about 12% of all citations, with Wikipedia first at just 5%. Trackers built on other windows and methods put the top domains higher (the 5W audit above has Wikipedia and Reddit alone over 25% combined), but every large study agrees on the shape: most of ChatGPT's citations go to a long tail of ordinary sites that were the cleanest, best-corroborated answer to a specific question.

Check the gate first

Before ChatGPT can cite you, a crawler has to reach you. The free AI Crawler Checker shows which AI crawlers your robots.txt allows or blocks, with the exact line to fix.

The third point above, giving ChatGPT the signals it rewards, has data behind it. The Generative Engine Optimization study (Aggarwal et al., IIT Delhi and Princeton, KDD 2024) found that its top tactics, citing sources, adding quotations, and adding relevant statistics, each lifted a source's visibility in generated answers by roughly 30 to 40%. Concrete, well-sourced, quotable writing is what gets pulled into the citation set.

The full getting-cited playbook, including the OpenAI crawlers and what ChatGPT actually rewards, is in SEO for ChatGPT, and the engine-agnostic workflow is in how to optimize for AI search. Start there once you understand the machinery on this page.

Citations You Can See Are Citations You Can Fix

ChatGPT's citations are a filtered, often-wrong, mode-dependent slice of what it read, not a clean bibliography. Once you know that, the panel stops being a verdict and becomes a working surface: every link in it is something you can verify, correct, or compete for. Understanding that is what separates quoting ChatGPT from being quoted by it.

The one move that pays off no matter what is to stop guessing. If you have never looked at which sources ChatGPT shows for your category, the citation tracking in geotoolbox gives you that view across the engines on your plan. It reads the same Sources panel ChatGPT shows everyone, the cited half rather than the full retrieval pool, but the cited half is the half you can act on. You cannot fix a citation you cannot see.

Frequently Asked Questions

Can I make ChatGPT cite its sources? Yes. Toggle the web-search tool on, or ask it directly to search the web for the answer, and the response comes back with inline citations and a Sources panel. Without a live search there is nothing for it to cite, so forcing the search is the only reliable way to get sources on demand.

Does blocking OpenAI's crawlers remove my site from ChatGPT citations? Blocking OAI-SearchBot removes your pages from ChatGPT Search results, and with them your citations, over time. Blocking GPTBot only opts you out of model training, not search. And blocked content can still surface indirectly: syndicated copies on sites that allow the crawlers can get cited in your place, as USA Today found.

How do I spot a fabricated ChatGPT citation? The tell is plausibility without traceability: a real author paired with a paper that does not exist, or a journal that checks out while the article title does not. Paste the exact title into a search engine; if nothing comes back, the reference is invented, no matter how polished it looks.

Why are the links ChatGPT cites sometimes broken? Three mechanisms: the model can return a URL from retrieval without re-fetching it to confirm it still resolves, it sometimes links syndicated or scraped copies whose hosts later pull them down, and in the worst cases a generated URL was never valid at all. Click-testing the link is always step one.

How do I cite ChatGPT in an APA or MLA paper? That is the other meaning of "ChatGPT citations" and a separate question from this article. Both APA and MLA treat ChatGPT output as a non-recoverable source and publish their own current formats, so follow the official APA Style and MLA Style guidance for the exact template.
