What is Mixture of Experts (MoE)?

Also: MoE, mixture-of-experts

A mixture of experts is a model architecture split into many smaller 'expert' sub-networks, where a routing layer activates only a few experts for each token instead of running the whole model every time. This lets a model hold a very large number of parameters while keeping the compute (and cost) per answer low.
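The routing step in that definition can be sketched in a few lines of Python. This is a toy illustration, not any lab's actual implementation: the names `moe_layer`, `make_expert`, and `router_weights` are hypothetical, each "expert" is just a small random matrix, and the router uses plain top-k selection with a softmax over the chosen experts, which is one common variant among several.

```python
import math
import random


def softmax(xs):
    """Turn raw scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Build a toy expert: a dim x dim matrix standing in for a sub-network."""
    w = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    return expert


def moe_layer(token, router_weights, experts, top_k=2):
    """Route one token to its top_k experts and blend their outputs."""
    # The router gives every expert a score for this token.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    # Keep only the top_k highest-scoring experts.
    order = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)
    chosen = order[:top_k]
    # Normalize gate weights over the chosen experts only.
    gates = softmax([scores[i] for i in chosen])
    out = [0.0] * len(token)
    for gate, idx in zip(gates, chosen):
        y = experts[idx](token)  # only the chosen experts are evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


rng = random.Random(0)
DIM, N_EXPERTS, TOP_K = 4, 8, 2  # illustrative sizes, not from any real model
experts = [make_expert(DIM, rng) for _ in range(N_EXPERTS)]
router_weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_layer(token, router_weights, experts, TOP_K)
print(f"ran experts {chosen} out of {N_EXPERTS}")
```

Only two of the eight toy experts do any work for this token; the other six are never evaluated, which is where the compute saving comes from.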

Updated June 23, 2026

MoE is the trick behind the headline parameter counts you see on modern models. DeepSeek's V4-Pro, for example, is reported to hold 1.6 trillion parameters but to activate only a fraction of them, on the order of tens of billions, for any given token. That gap between total and active parameters is how labs like DeepSeek and Google ship huge, capable models that are still cheap enough to run at scale.
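The gap between total and active parameters is simple arithmetic. The sketch below uses made-up round numbers, not the figures of any real model, and the helper `moe_param_counts` is hypothetical. It assumes a simplified model in which some parameters are shared by every token (attention, embeddings) and the rest are split evenly across experts.

```python
def moe_param_counts(shared, per_expert, n_experts, top_k):
    """Return (total, active) parameter counts for a simplified MoE model.

    shared     -- parameters every token uses (assumed: attention, embeddings)
    per_expert -- parameters in one expert
    n_experts  -- number of experts stored in the model
    top_k      -- number of experts the router activates per token
    """
    total = shared + per_expert * n_experts
    active = shared + per_expert * top_k
    return total, active


# Illustrative numbers only: 2B shared, 64 experts of 1B each, 2 active per token.
total, active = moe_param_counts(
    shared=2_000_000_000,
    per_expert=1_000_000_000,
    n_experts=64,
    top_k=2,
)
print(f"total:  {total / 1e9:.0f}B parameters")
print(f"active: {active / 1e9:.0f}B parameters per token")
print(f"share of the model used per token: {active / total:.1%}")
```

In this toy setup the headline figure is 66 billion parameters, but each token touches only 4 billion of them. The full model still has to be stored in memory; it is the per-token compute that shrinks.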

For a publisher, MoE is mostly a "why it is cheap and fast" explanation rather than something you optimize for. It does not change how an engine decides what to cite. It is worth knowing mainly because it explains a paradox you will keep meeting: models with trillion-parameter headlines that still answer in a second and cost cents to run.