What is AI Inference?


AI Inference

Also: inference, model inference, inference vs training

Inference is the act of running a trained AI model to produce an output. Training, by contrast, is the process that builds the model (sets its weights) in the first place. Every time you send a prompt and get an answer, that is one inference request. Inference is where most of the ongoing cost, speed, and energy use of AI lives.
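To make the split concrete, here is a toy sketch in Python. It uses a one-variable linear model fitted by gradient descent as a stand-in for a real AI model. This is an assumption for illustration only, since real models have billions of weights and are trained at vastly larger scale. The hypothetical `train` function runs once and produces weights. The hypothetical `infer` function applies those frozen weights to a new input and would run again for every request. The training data is also made up.

```python
# Toy stand-in for a real model: y = w * x + b (hypothetical example).

def train(xs, ys, lr=0.05, epochs=2000):
    """Training: fit w and b once by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def infer(model, x):
    """Inference: apply the frozen weights to a new input; no learning happens."""
    w, b = model
    return w * x + b


model = train([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # upfront cost, paid once
for new_x in [10, 20]:                           # recurring cost, one inference per request
    print(round(infer(model, new_x), 2))
# prints 21.0 then 41.0
```

The training loop touches every example thousands of times, while each call to `infer` is a single cheap computation. At real scale, however, inference calls add up because they never stop.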

Updated June 23, 2026

Pretraining a frontier model is an enormous upfront expense; inference is the recurring one, paid every time anyone asks the model anything. These economics shape the products you use: free tiers route you to fast, cheap models, daily caps exist because inference costs money, and architectures like mixture of experts exist largely to make inference cheaper.

The same economics explain why AI answer engines do not re-read the whole web for every query. Each answer has a limited inference budget, so it draws on a handful of retrieved sources, not hundreds. Being one of the few clear, authoritative pages on your topic is worth more than being one of many vague ones, because only a few make it into the answer.
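The following is a minimal sketch of that budget at work. It assumes the engine scores candidate pages for relevance and can fit only a fixed number of tokens into the model's context. The `Source` type, the URLs, the relevance scores, the token counts, and the greedy packing rule are all hypothetical. Real engines use their own retrieval and ranking methods.

```python
from dataclasses import dataclass


@dataclass
class Source:
    url: str          # hypothetical page address
    relevance: float  # hypothetical retriever score, higher is better
    tokens: int       # hypothetical length of the page text in tokens


def pack_context(candidates, token_budget):
    """Greedily keep the most relevant sources that still fit in the token budget."""
    chosen, used = [], 0
    for src in sorted(candidates, key=lambda s: s.relevance, reverse=True):
        if used + src.tokens <= token_budget:  # skip anything that would overflow
            chosen.append(src)
            used += src.tokens
    return chosen


candidates = [
    Source("https://example.com/a", 0.95, 1500),
    Source("https://example.com/b", 0.90, 1200),
    Source("https://example.com/c", 0.70, 2500),
    Source("https://example.com/d", 0.40, 800),
    Source("https://example.com/e", 0.20, 600),
]
chosen = pack_context(candidates, token_budget=4000)
print([s.url for s in chosen])
# prints ['https://example.com/a', 'https://example.com/b', 'https://example.com/d']
```

Page `c` is skipped even though it ranks third, because it would push the total past the budget. Page `e` is cut for lack of room. Out of five candidates, only three reach the model.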