CCBot

Also: Common Crawl bot

CCBot is the web crawler operated by Common Crawl, a nonprofit that publishes a free, open dataset of crawled web pages. Because that dataset is widely used to train large language models, CCBot is one of the most common indirect routes by which your content reaches AI systems. It respects robots.txt, so you can allow or block it with rules addressed to the CCBot user-agent token.
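As a minimal sketch of what "respects robots.txt" means in practice, the snippet below assumes a hypothetical robots.txt that blocks CCBot site-wide while leaving every other crawler unrestricted. It checks that policy with Python's standard-library urllib.robotparser. That parser approximates, rather than exactly reproduces, how CCBot itself evaluates the file. For example, it does not handle wildcard path patterns. The function name ccbot_allowed and the example.com URL are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a CCBot-specific group blocks the whole site,
# while the wildcard group leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""


def ccbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets CCBot fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network fetch
    return parser.can_fetch("CCBot", url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"
    print("CCBot allowed:", ccbot_allowed(ROBOTS_TXT, url))
    print("Other bots allowed:", RobotFileParser().can_fetch("*", url) or True)
```

Because the CCBot group is more specific than the wildcard group, it takes precedence for CCBot only. Deleting those two lines would let CCBot crawl the site again.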

CCBot is not tied to any single AI company, but because its open dataset feeds many models, allowing or blocking it has knock-on effects across the AI ecosystem.

If you are auditing which AI crawlers can reach your site, include CCBot alongside engine-specific bots such as GPTBot (OpenAI) and ClaudeBot (Anthropic).
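The audit can be sketched as a small helper that evaluates one robots.txt against several user-agent tokens at once. This is a minimal sketch built on Python's standard-library urllib.robotparser. It assumes you already have the robots.txt text in hand, and the sample policy, the AI_CRAWLERS tuple and the function name audit_ai_crawlers are all hypothetical. The sample shows a common gap: a file that blocks GPTBot and ClaudeBot but never mentions CCBot still leaves CCBot allowed.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens to audit (illustrative list; extend it as needed).
AI_CRAWLERS: tuple[str, ...] = ("CCBot", "GPTBot", "ClaudeBot")

# Hypothetical policy that blocks two engine-specific bots but forgets CCBot.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /
"""


def audit_ai_crawlers(
    robots_txt: str, url: str, agents: tuple[str, ...] = AI_CRAWLERS
) -> dict[str, bool]:
    """Map each user-agent token to whether robots.txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    report = audit_ai_crawlers(SAMPLE_ROBOTS_TXT, "https://example.com/pricing")
    for agent, allowed in report.items():
        print(f"{agent:10} {'allowed' if allowed else 'blocked'}")
```

Here CCBot is reported as allowed because no group in the file names it and there is no wildcard group, so robots.txt imposes no restriction on it.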