30+ Open Source LLM Statistics & Trends (2026)
Explore 30+ open source LLM statistics for 2026, covering ecosystem growth, model usage, performance, inference costs, and leading developers.
Written by Sherlock Xu
Last updated on Jul. 2, 2026

Open source LLMs went from a research curiosity to the backbone of real production systems in under three years. They now power coding assistants, agents, and enterprise pipelines, and in some fields they’ve overtaken the proprietary models that defined the early days of the boom.
So how big is the open source LLM ecosystem today? And how fast is it actually growing?
I pulled together the most useful open source LLM statistics I could find, all from primary sources like the Stanford AI Index, Hugging Face, Meta, OpenRouter, and the Stack Overflow Developer Survey. Each section links to the original sources so you can cite them directly.
Top open source LLM statistics
This article prioritizes first-party reports, research papers, official documentation, and original datasets.
| Statistic | Value | Scope and date | Source |
|---|---|---|---|
| Public models hosted on Hugging Face | More than 2 million | All public model repositories, not only LLMs; 2025 | Hugging Face |
| Downloads captured by the top 200 models | 49.6% | Hugging Face downloads; 2025 | Hugging Face |
| Direct derivative models in the Qwen family | More than 113,000 | Hugging Face repositories; March 2026 | Hugging Face |
| Llama downloads | More than 1 billion | Cumulative downloads reported in March 2025 | Meta |
| Open source share of token usage | Roughly one-third | OpenRouter, late 2025 | OpenRouter |
| Tokens processed by DeepSeek models | 14.37 trillion | OpenRouter, November 2024-November 2025 | OpenRouter |
| Open-vs.-closed performance gap | 3.3% | Top Arena models, March 2026 | Stanford AI Index 2026 |
| OpenRouter annualized token run rate | About 1.5 quadrillion | All models on OpenRouter, May 2026 | Menlo Ventures |
| Historical inference cost decline | About 10x per year | Equivalent MMLU performance in a 2024 analysis | Andreessen Horowitz |
What is an open source LLM?
An open source LLM generally refers to a language model that people can download, run, and modify. Compared with proprietary models that are only accessible through an API, open source LLMs give developers much greater control over deployment, customization, and infrastructure.
The term open source is often used loosely. Many models described as open source are actually released as open-weight models. Their weights are publicly available but the license may include restrictions that differ from a traditional open source software license. Because the industry commonly uses “open source LLM” to refer to both categories, this article follows that convention for simplicity.
How many open source LLMs are there?
There is no authoritative global count of open source LLMs. Hugging Face hosted more than 2 million public model repositories in 2025, but that total includes models for text, image, audio, robotics, and other tasks, as well as fine-tunes, adapters, quantizations, and derivatives.
The broader Hugging Face ecosystem is still useful for measuring the scale and direction of open model development:
| Hugging Face statistic | Value | What it measures |
|---|---|---|
| Registered users | 13 million | Platform community size |
| Public model repositories | More than 2 million | All model types and derivatives |
| Public datasets | More than 500,000 | All dataset categories |
| Models with fewer than 200 downloads | About 50% | The long tail of model repositories |
| Share of downloads captured by the top 200 models | 49.6% | Concentration among the most-used repositories |
| Fortune 500 companies with verified accounts | More than 30% | Organizational presence, not confirmed production adoption |
| Industry share of model development | 37% | Down from roughly 70% before 2022 |
| Downloads attributed to unaffiliated developers | 39% | Up from 17% before 2022 |
| Mean size of a downloaded model | 20.8B parameters | Up from 827M in 2023 |
| Median size of a downloaded model | 406M parameters | Up from 326M in 2023 |
| Mean engagement period after release | About 6 weeks | How long models typically sustain attention |
Source: Hugging Face
The mean downloaded model size grew by about 25x between 2023 and 2025, while the median grew by only about 25%. That difference suggests a relatively small number of large models are pulling up the average while small models remain common.
Open source LLM adoption and usage
Open models have moved well past experimentation. By late 2025, they reached roughly one-third of all token volume on OpenRouter, a unified API platform that gives developers access to hundreds of AI models. The underlying study analyzed more than 100 trillion tokens over a year, covering November 2024 through November 2025.
| Open model developer | Tokens processed on OpenRouter |
|---|---|
| DeepSeek | 14.37 trillion |
| Qwen | 5.59 trillion |
| Meta Llama | 3.96 trillion |
| Mistral AI | 2.92 trillion |
| OpenAI | 1.65 trillion |
| MiniMax | 1.26 trillion |
| Z.ai | 1.18 trillion |
| TNGTech | 1.13 trillion |
| Moonshot AI | 0.92 trillion |
| 0.82 trillion |
Source: OpenRouter
DeepSeek processed about 2.6 times as many tokens as Qwen, the second-largest open source family in the study. By late 2025, however, no individual model consistently accounted for more than 20%-25% of open source model tokens. Usage had spread across five to seven competitive models.
The same study found that:
- Models developed in China rose from as little as 1.2% of weekly token volume in late 2024 to nearly 30% in some weeks.
- Chinese open source models averaged 13.0% of weekly OpenRouter token volume over the full period.
- Open source models developed outside China averaged 13.7%.
- Proprietary models developed outside China retained an average share of about 70%.
- Roleplay represented about 52% of open source model tokens, while programming was the second-largest category at roughly 15%-20%.
OpenRouter itself continued to grow after the study period. Menlo Ventures reported that the platform increased from 2.5 million to more than 8 million developers in roughly one year and reached an annualized run rate of about 1.5 quadrillion tokens in May 2026.
Developer adoption of AI tooling is now near-universal. 84% of respondents to the 2025 Stack Overflow Developer Survey use or plan to use AI tools, up from 76% the year before, and 51% of professional developers use them daily.
Sources: OpenRouter, Menlo Ventures, Stack Overflow
The most popular open source LLMs
Popularity depends on the metric. Downloads measure distribution, derivative repositories measure how often developers build on a model, and routed tokens measure hosted API usage.
Meta reported that the Llama family passed 1 billion cumulative downloads in March 2025. On Hugging Face, however, Qwen became the largest ecosystem for derivative work: the family had more than 113,000 direct derivative models by March 2026 and more than 200,000 repositories when every Qwen-tagged model was included.
These figures count downloads and repositories, not unique users or production deployments. Automated downloads, mirrors, quantizations, and repeated pulls can all affect the totals.
You can compare current model releases, licenses, context windows, and hardware requirements on the OpenLLMStack models page.
Sources: Meta, Hugging Face
Open vs. closed model performance
The leading open model trailed the leading closed model by 3.3% on the Arena leaderboard in March 2026, according to the Stanford AI Index. The gap had been only 0.5% in August 2024 before reopening during 2025.
| Date | Top closed-vs-open performance gap |
|---|---|
| August 2024 | 0.5% |
| March 2026 | 3.3% |
The correct conclusion is not that open models have permanently reached parity. The gap is small enough to remain competitive, but it changes as new model generations arrive. Six of the top ten Arena models were closed as of March 2026.
Source: Stanford AI Index 2026
Where open source models come from
Models developed in China made up 41% of Hugging Face downloads in 2025, the largest share attributed to a single country. China surpassed the United States in both monthly and overall downloads during the year.
The contributor mix changed at the same time. The share of model development attributed to industry fell from roughly 70% before 2022 to 37% in 2025, while unaffiliated developers grew from 17% to 39% of downloads.
OpenRouter recorded a similar geographic shift, although through a different metric. Chinese open source models rose from 1.2% of weekly token usage in late 2024 to nearly 30% in some weeks of 2025.
Sources: Hugging Face, OpenRouter
The falling cost of LLM inference
An Andreessen Horowitz analysis found that the price of inference at a fixed level of MMLU performance fell by roughly 10x per year. At an MMLU score of 42, the cheapest observed price dropped from $60 per million tokens in 2021 to $0.06 in 2024, a 1,000-fold decline.
That historical estimate has limitations: it relies on MMLU, averages input and output pricing, and covers selected models from OpenAI, Anthropic, and Meta. It is useful as a directional cost trend, not a law that guarantees another 10x decline every year.
Current API pricing shows how large the spread can be. As of July 2, 2026:
| Model | Input, cache miss | Cached input | Output |
|---|---|---|---|
| DeepSeek-V4-Flash | $0.14 | $0.0028 | $0.28 |
| DeepSeek-V4-Pro | $0.435 | $0.003625 | $0.87 |
| GPT-5.5 | $5.00 | $0.50 | $30.00 |
Prices are per 1 million tokens at standard rates. On those published prices, DeepSeek-V4-Flash input is 97.2% cheaper and output is 99.1% cheaper than GPT-5.5.
Serving efficiency comes from more than cheaper hardware. Quantization, batching, caching, sparse architectures, and optimized kernels all contribute. OpenLLMStack tracks major techniques on the inference optimizations page and the engines that implement them in the inference directory.
Sources: Andreessen Horowitz, DeepSeek API pricing, OpenAI API pricing
Top open source LLM API providers
You do not need your own GPUs to run an open model. A growing set of serverless inference providers host the leading open weights behind a single API, billed per token, so you can switch models without managing infrastructure.
| Provider | Known for |
|---|---|
| Together AI | Broad catalog of 200+ open models behind one unified API |
| Fireworks AI | Fast, production-grade serving of popular open models |
| Groq | Custom LPU silicon for very low latency and high throughput |
| DeepInfra | Low-cost, pay-per-token hosting of open models |
| Baseten | Custom deployment and autoscaling for open weights |
| Modular | Shared endpoints and reserved dedicated GPU capacity, optimized by MAX across GPU vendors |
| Hugging Face | Hub-native inference endpoints next to the models |
For the cheapest access to a single model, the first-party API from the model maker is often the lowest-priced option, such as the APIs from DeepSeek and Alibaba for their own models. Aggregators like OpenRouter route one request across many of these providers so you can compare price and speed.
DeepSeek R1: the breakout moment for open source LLMs
No single release did more for the profile of open source LLMs than DeepSeek R1. The model launched on January 20, 2025, and within days the DeepSeek app climbed to No. 1 on the U.S. Apple App Store, displacing ChatGPT and topping the charts in more than 50 countries.
The download surge was almost vertical. The app reached 2.6 million downloads across the App Store and Google Play by the Monday after launch, with more than 80% of all downloads coming in the previous seven days, and Appfigures data ranked the app No. 1 worldwide.
The market reaction was just as dramatic. On January 27, 2025, Nvidia lost about $589 billion in market value, the largest single-day loss for any company in history, after DeepSeek showed that a frontier-grade open model could reportedly be trained for around $5.6 million.
You can trace these milestones on the OpenLLMStack timeline.
Sources: TechCrunch, Bloomberg, Forbes
Frequently asked questions
How many open source LLMs are there?
No authoritative organization maintains a global count of open source LLMs. Hugging Face hosted more than 2 million public model repositories in 2025, but that figure covers all model types and includes derivatives, adapters, and quantizations. It is not a count of unique LLMs.
Which open source LLM is the most popular?
It depends on the metric.
- Meta reported more than 1 billion cumulative Llama downloads in March 2025.
- Qwen had the largest derivative ecosystem reported by Hugging Face in March 2026, with more than 113,000 direct derivative models.
- DeepSeek led open source usage on OpenRouter with 14.37 trillion tokens processed from November 2024 through November 2025.
What is the best open source LLM in 2026?
There is no single best open source LLM in 2026. Some models are good at coding, others at reasoning, long-context processing, or multilingual tasks. The right model depends on your workload, hardware, latency requirements, and budget.
If you self-host an open source LLM, you can also adapt it to your domain by fine-tuning the model on proprietary data. This can significantly improve performance for specialized tasks, such as legal analysis, healthcare, finance, or customer support, helping the model outperform a general-purpose foundation model in your specific domain.
By mid-2026, several open families compete at or near the proprietary frontier, each with a different strength.
| Model | Maker | Strongest at |
|---|---|---|
| GLM-5.2 | Z.ai | State-of-the-art coding and agentic engineering, 1M token context |
| MiniMax-M3 | MiniMax | Frontier coding and agentic work, native multimodal and computer use |
| DeepSeek-V4-Pro | DeepSeek | Reasoning and coding with adaptive effort modes and strong world knowledge |
| Kimi-K2.6 | Moonshot AI | Long-horizon coding and agent swarm orchestration |
| Qwen3.5-397B-A17B | Alibaba | Multimodal reasoning across 200+ languages, very long context |
| MiMo-V2.5-Pro | Xiaomi | Token-efficient coding agents with long-context reasoning |
| Gemma 4 | Top-tier reasoning and coding |
A common thread runs through the 2026 leaders: state-of-the-art coding, agentic tool use, and context windows that now stretch to 1 million tokens or more.
For the full list with parameters, licenses, context windows, and recommended GPUs, see the OpenLLMStack models page.
How much open source LLM usage is there?
Open source models reached roughly one-third of token volume on OpenRouter by late 2025. This is strong evidence of adoption on that platform, but self-hosted usage and traffic on other providers are not included.
Conclusion
Open source AI in 2025 and 2026 is defined by three trends: models that now rival closed systems on quality, inference costs that have collapsed, and a center of gravity shifting toward Chinese and independent developers. Open source models are no longer the budget option. For a growing share of teams, they’re the default.
If you’re building with open models, OpenLLMStack tracks current releases, inference engines, optimization techniques, and agent frameworks in one place.