Open Weights vs Open Source: The Real Difference (2026)

Open weights and open source get used as if they mean the same thing. They do not. Open weights means you can download the finished model and run it. Open source means you also get the recipe: the training code and enough detail about the data to rebuild and audit the model from scratch. Almost every model marketed as "open source" today - Llama, DeepSeek, Qwen, Gemma - is actually only open-weight. This guide covers what each term really means, which 2026 models fall where, the license catches that bite in production, and why the distinction matters if you care about being cited in AI answers.

Open Weights vs Open Source: The Short Answer

An open-weights model hands you the trained model. An open-source model hands you the model plus everything needed to reproduce it.

Both let you download and run the thing. The difference is what you can do beyond running it: study how it was built, verify what went into it, and rebuild it yourself if you need to. Open weights give you the product. Open source gives you the product and the factory.

What you get	Open weight	Open source (OSAID)
Trained weights to download and run	Yes	Yes
Fine-tune on your own data	Yes	Yes
Training code	No	Yes
Training data (or detailed data information)	No	Yes
Rebuild from scratch and fully audit	No	Yes
Typical license	Permissive or custom (Apache, MIT, or vendor terms)	OSI-approved (e.g. Apache 2.0)

That single distinction drives everything else: what you can legally do with the model, whether you can trust what is inside it, and whether the "open" label means anything at all. And notice what the license column does not settle. The same permissive license, Apache 2.0, turns up on both sides in the model table below. What separates open weight from open source is not the license name, it is whether the training code and data come with it.

What "Open Weights" Actually Means

The weights are the numbers a model learned during training. Billions of parameters that, together, decide how it turns your prompt into a response. When a lab publishes those weights, you can download them, run the model on your own hardware, and fine-tune it on your own data. That is a real and useful kind of openness. You are no longer renting the model through someone else's API on their meter and their terms.

What you do not get is how it was made. The training code, the exact data it was trained on, the filtering and cleaning steps, the specific recipe - all of that usually stays private. You have the finished parameters, not the process that produced them.

A common way to describe this: the weights are the compiled output, not the source. It is like shipping a program as an executable instead of the source code. You can run it and even patch it, but you cannot read the logic that built it or recreate it cleanly. The comparison is not perfect, and some engineers push back on it, since in practice most people work with a model by adjusting weights, not by retraining from raw data. But it captures the core limit. You are trusting the creator's choices about what went in, because you cannot see them.

In practice, open weights show up on hubs like Hugging Face, where you download a model file and a license, run it locally or on rented GPUs, and adapt it. Downloading the weights is easy. Whether you can use them the way you want is a separate question, and it lives entirely in the license.

What "Open Source AI" Actually Means

Open source has a specific definition, and it is stricter than "you can download it." The Open Source Initiative, the same body that has defined open source software since 1998, published its Open Source AI Definition (OSAID 1.0) on October 28, 2024. It sets the bar for what a genuinely open-source model has to provide.

The definition grants four freedoms, inherited from free software: the freedom to use the system for any purpose, to study how it works, to modify it, and to share it. To make those freedoms real, an open-source model has to release the "preferred form for making modifications." For an AI system, that means three things together: the model weights, the full code used to train and run it, and sufficiently detailed information about the training data.

There is one nuance that trips people up. OSAID does not force a lab to publish the raw training dataset, which is often impossible to redistribute for copyright or privacy reasons. It requires detailed data information instead: where the data came from, how it was selected and filtered, and enough documentation that a skilled person could assemble a substantially equivalent dataset and retrain the model. Critics argue this is too soft and that true openness should require the data itself. That debate is still live.

The practical test is reproducibility. If you have the weights, the training code, and the data information, an independent team can rebuild the model, inspect it for bias or backdoors, and verify what it claims to be. Very few models clear that bar. The clearest current example is OLMo from the Allen Institute for AI, which ships its weights and full training code under Apache 2.0, along with its Dolma training dataset. It has company: EleutherAI's Pythia and LLM360's Amber and CrystalCoder are among the models the Open Source Initiative has pointed to as genuinely conforming. That is what open source actually looks like.

The Openness Spectrum: Closed, Open-Weight, Open-Source

Treating this as open versus closed hides the part that matters. Openness is a spectrum, and most of the interesting models sit in the middle.

The AI openness spectrum from closed to open-weight to open-source, showing what you get at each tier: closed gives API access only, open-weight adds downloadable weights, open-source adds training code and data information. — Openness is a spectrum. Most models people call "open" are open-weight, not open-source.

At one end are closed models like GPT-5.x, Claude, and Gemini. You reach them through an API. You never touch the weights, and you cannot run them yourself. In the middle are open-weight models: the weights are downloadable, but the recipe is not. At the far end are open-source models that meet OSAID, where the full stack is public.

Tier	What you get	Example models	Typical license
Closed	API access only; no weights	GPT-5.x, Claude, Gemini	Proprietary terms of service
Open-weight	Downloadable weights; run and fine-tune; no training code or data	Llama, DeepSeek, Qwen, Gemma, gpt-oss, Kimi K2	Custom or permissive, varies
Open-source (OSAID)	Weights, training code, and data information; fully reproducible	OLMo	OSI-approved (Apache 2.0)

There is a fourth label worth knowing: restricted weights. These are open-weight models released under terms that limit who can use them or how, often for regulated industries. The Open Source Initiative keeps a hard line here. In its view, any restriction on use or field of endeavor means a model is source-available or proprietary, not open source. Others prefer the spectrum framing, arguing that "usable but limited" deserves its own category rather than being lumped in with fully closed systems.

Which AI Models Are Open Weight vs Open Source? (2026)

Now the useful part: which real models are which. This is where the labels get sloppiest, because nearly every model below is called "open source" somewhere. Only one of them actually is.

Model	Open-weight or open-source?	License	Commercial-use catch
Llama (Meta)	Open-weight	Llama Community License + Acceptable Use Policy	Free unless you exceed 700M monthly active users; use-case and regional bans apply
OLMo (Allen Institute for AI)	Open-source	Apache 2.0	None; full code and Dolma data released
DeepSeek (R1, V3)	Open-weight	MIT	None; distillation and commercial use allowed
Qwen (Alibaba)	Open-weight	Apache 2.0 (Qwen3 and Qwen3.5)	None on current models; older Qwen used a community license
Kimi K2 (Moonshot)	Open-weight	Modified MIT	Attribution or display required above large usage thresholds
GLM (Zhipu / z.ai)	Open-weight	MIT	None
Mistral	Mixed	Apache 2.0 or research/non-production license, by model	Research-licensed models are not for commercial use
Gemma (Google)	Open-weight	Apache 2.0 (Gemma 4)	None on Gemma 4; earlier Gemma used custom Google terms
gpt-oss (OpenAI)	Open-weight	Apache 2.0	None

Read that top row carefully, because it is the one people get wrong most. Meta calls Llama open source in its own marketing, but Wikipedia's summary reflects the view of the definition-setting bodies: Llama's license discriminates against certain users and uses, which the Open Source Definition forbids, so it is not open source. The Free Software Foundation reached the same conclusion. Meta's Community License requires anyone above 700 million monthly active users to ask for permission, which Meta can refuse.

The permissive open-weight releases are more generous. DeepSeek ships its recent models under the MIT license, Qwen moved its current models to Apache 2.0, and OpenAI's gpt-oss models are Apache 2.0 as well. Kimi K2 uses a Modified MIT license that adds an attribution requirement at scale, and Zhipu's GLM models are MIT too. Google flipped Gemma 4 to Apache 2.0 in April 2026, dropping the custom terms its earlier models carried. In our experience helping brands track how AI systems cite them, the license is the detail teams skip and later regret, because the download page and the terms of use tell two different stories. Notice the pattern across the whole field of Chinese open-weight models and beyond: permissive licenses are common, but open source, in the strict sense, is rare.

Why the Difference Matters

This is not a semantic argument. The gap between open weights and open source has three consequences that land on real projects.

The first is legal. Downloading the weights does not give you the right to do anything you want with them. The license does. A custom open-weight license can cap you at a user threshold, ban entire fields of use, forbid using the model's outputs to train a competitor, or exclude whole regions. Meta's multimodal Llama models, for example, are not licensed to individuals or companies based in the European Union. The permissive open-source licenses, MIT and Apache 2.0, carry almost none of that. So "is it free to use commercially" has no general answer. It depends on which model and which release.

The second is trust. With an open-weight model you inherit the creator's decisions about training data, filtering, and safety tuning, and you cannot fully audit any of them. That is a genuine liability in regulated sectors like finance or healthcare, where you may need to prove what a model was and was not trained on. Downloadable is also not the same as vetted. You cannot read a block of weights for a hidden backdoor the way you can read source code, and poisoned model files do surface on public hubs, so open weights still need scanning rather than blind trust. Only a reproducible, open-source model lets an outside team verify those claims in full.

The third is control. Open weights let you fine-tune, which is cheap and practical. They do not let you retrain from scratch, which for a frontier model is wildly expensive and, without the data and code, impossible anyway. So your ability to truly change the model is bounded. Self-hosting is not the automatic saving people expect, either: renting the GPUs and keeping a large model running usually costs more than a metered API until you are serving very high volume. And an open-weight model still depends on its vendor. Terms can change, and a model can be pulled or restricted, as export controls have already shown, leaving teams scrambling for a replacement.

The capability cost of choosing open is smaller than most people assume. Epoch AI finds that the best open-weight models trail the best closed models by only a few months, about three and a half on average by its tracking. When the performance gap is that narrow, licensing and control become the real decision, not raw capability.

Why Companies Release Weights but Call It "Open Source"

If open weights are not open source, why do so many labs use the term anyway? Part of it is genuine, and part of it is marketing.

The genuine part is that publishing full training data and code is hard and risky. The data often contains copyrighted or private material a company cannot legally redistribute. The training recipe is a trade secret worth hundreds of millions in compute and research. And a fully open pipeline makes it easier for someone to strip out the safety tuning. Releasing weights while holding back the rest is a real compromise between openness and those pressures.

The marketing part is where it gets slippery. "Open source" carries goodwill that "open weights" does not, so labs reach for the stronger term. Critics call this openwashing: dressing up a weights-only release as something more transparent than it is. When Meta published its 2024 essay framing Llama as open source, the Open Source Initiative pushed back publicly, and the dispute became the clearest example of the gap between the label and the definition.

There is even a linguistic argument that the word has drifted. In this view "open source" has fossilized into a loose synonym for "not locked behind an API," with the "source" part no longer meant literally. Whether you find that reasonable or misleading, it is why precise terms matter.

Others are trying to fix the vocabulary rather than fight it. The Open Weight Definition, from the Open Source Alliance, gives weights-only releases their own honest standard instead of forcing them under the open-source banner. The practical takeaway for anyone choosing a model is simple: ignore the label on the announcement and read the license on the model card. That is the only thing that actually governs what you can do.

What Open Weights vs Open Source Means for AI Visibility

For most marketers this reads as an engineering debate, but it shapes where your brand does and does not get mentioned. The biggest answer engines, ChatGPT and Google's AI Overviews and Gemini and Copilot, still run on closed frontier models. Open weights widen the long tail underneath them. Anyone can run DeepSeek, Qwen, Kimi, Llama, or gpt-oss, so a growing set of assistants, search features, and web-browsing agents is built on downloadable models rather than a big lab's API. Perplexity's Sonar, for one, is fine-tuned from Meta's Llama.

The lever that decides whether you get named is the same across all of them, open or closed. None of these systems let you touch the weights, but every one of them pulls in outside sources at answer time and picks which to cite. That evidence is the part you can influence, and open weights simply mean more systems are making the call.

Frequently Asked Questions

Is Llama open source or open weight? Open weight. Meta calls Llama open source, but its Community License restricts certain users and fields of use, which the Open Source Definition forbids, so the Open Source Initiative and the Free Software Foundation both say it is not open source. You can download and fine-tune the weights, but the training data and code are not released, and companies above 700 million monthly active users need a separate license from Meta.

Is DeepSeek open source or open weight? Open weight. DeepSeek's R1 and V3 weights are published under a permissive MIT license, so commercial use and even distillation are allowed. But DeepSeek does not release its training data or full training code, so you cannot reproduce or fully audit the model. Very permissive, still not open source in the strict sense.

Is ChatGPT open source? What about gpt-oss? ChatGPT and the GPT-5.x models behind it are closed. You only reach them through OpenAI's API or app. Separately, OpenAI released gpt-oss-120b and gpt-oss-20b as open-weight models under Apache 2.0, which you can download and run yourself. Those are open weight, not open source, because the training data and code stay private.

Is Qwen open source? Qwen's current open-weight models, Qwen3 and Qwen3.5, are released under Apache 2.0, so they are free for commercial use with no user cap. Older Qwen models used a custom community license with more conditions. Like the others, the weights are open but the training data and recipe are not, so it is open weight rather than open source.

Can I use open-weight models commercially? Usually yes, but the license decides, not the download button. Permissive licenses like MIT and Apache 2.0 allow commercial use freely. Custom licenses may add user thresholds, attribution requirements, field-of-use bans, or regional exclusions. Always read the license on the model card before you build on it.

What is the difference between open weights and open source in one sentence? Open weights give you the finished model to run and fine-tune, while open source gives you the model plus the training code and data information needed to rebuild and fully audit it.

The Label Is Loose, the License Is Not

The word "open" is doing a lot of unearned work in AI right now. What almost always sits behind it is open weights: downloadable and genuinely useful, but not reproducible and not free of strings. That is not a scandal, just a different thing, with real consequences for what you are allowed to do and what you can trust. So read the license on the model card, not the label on the announcement.

If your buyers are getting answers from AI, being named in those answers is its own problem to solve, whichever models the engines happen to run on. geotoolbox's Citation Interceptor shows which offsite sources the major AI engines cite when they answer questions in your market, and where your brand is missing from them, so you can work on the evidence these systems actually retrieve.

Open Weights vs Open Source: The Real Difference (2026)

Open Weights vs Open Source: The Short Answer

What "Open Weights" Actually Means

What "Open Source AI" Actually Means

The Openness Spectrum: Closed, Open-Weight, Open-Source

Which AI Models Are Open Weight vs Open Source? (2026)

Why the Difference Matters

Why Companies Release Weights but Call It "Open Source"

What Open Weights vs Open Source Means for AI Visibility

Frequently Asked Questions

The Label Is Loose, the License Is Not

Sources

Chinese AI Models Compared: DeepSeek, Qwen, GLM, Kimi (2026)

What Is DeepSeek? China's Open-Weight AI, Explained

GEO vs AEO vs SEO: What's the Difference?

Put this into practice.

GEO Scan

Content Analyzer

Domain Overview

Open Weights vs Open Source: The Short Answer

What "Open Weights" Actually Means

What "Open Source AI" Actually Means

The Openness Spectrum: Closed, Open-Weight, Open-Source

Which AI Models Are Open Weight vs Open Source? (2026)

Why the Difference Matters

Why Companies Release Weights but Call It "Open Source"

What Open Weights vs Open Source Means for AI Visibility

Frequently Asked Questions

The Label Is Loose, the License Is Not

Sources

More on this topic

Chinese AI Models Compared: DeepSeek, Qwen, GLM, Kimi (2026)

What Is DeepSeek? China's Open-Weight AI, Explained

GEO vs AEO vs SEO: What's the Difference?

Put this into practice.

GEO Scan

Content Analyzer

Domain Overview