Timeline

Major Milestones

Key moments that shaped the open-source LLM ecosystem, from the Transformer paper to frontier open models.

papers

Transformer Architecture Published

Researchers at Google publish 'Attention Is All You Need', introducing the Transformer, the architecture that underlies nearly all modern LLMs.

models

LLaMA Released by Meta

Meta releases LLaMA (7B-65B) to researchers under a noncommercial license, showing that smaller models trained on more data can compete with much larger ones. Catalyzes the open-source LLM movement.

inference

llama.cpp Launches

Georgi Gerganov releases llama.cpp, a plain C/C++ implementation of LLaMA inference that runs on consumer hardware, including Apple Silicon. Makes local LLM use practical without a datacenter GPU.

inference

vLLM Introduces PagedAttention

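UC Berkeley researchers release vLLM with PagedAttention, which applies virtual-memory paging to the KV cache: keys and values live in fixed-size blocks that need not be contiguous in GPU memory. The authors report 2-4x higher throughput than prior serving systems at comparable latency, and vLLM becomes one of the most widely used LLM serving engines.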
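
The paging idea can be illustrated with a toy block table. This is a minimal sketch, not vLLM's implementation: the class name `PagedKVCache`, its methods, and the block sizes are all hypothetical, and it tracks only where each token's keys and values would live, not the tensors themselves.

```python
class PagedKVCache:
    """Toy allocator mapping each sequence's token positions to physical blocks.

    Hypothetical illustration only; names and layout are assumptions.
    """

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))  # physical block ids not in use
        self.block_tables = {}  # seq_id -> list of physical block ids
        self.seq_lens = {}      # seq_id -> number of tokens stored so far

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one new token and return (physical_block, offset)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.seq_lens.get(seq_id, 0)
        # A new block is needed only when the last one is full (or none exists).
        if length % self.block_size == 0:
            if not self.free_blocks:
                raise MemoryError("no free KV blocks")
            table.append(self.free_blocks.pop())
        self.seq_lens[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: str) -> None:
        """Return all of a finished sequence's blocks to the free pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.seq_lens.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots = [cache.append_token("a") for _ in range(3)]
```

Because blocks are allocated on demand, a sequence wastes at most one partially filled block, rather than reserving memory for its maximum possible length up front.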

models

Llama 2 Released

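Meta releases Llama 2, including chat-tuned variants, under a community license that permits most commercial use. One of the first high-quality open-weight model families that could be deployed in commercial products, though the license carries restrictions and is not an OSI-approved open-source license.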

models

Mistral 7B Released

Mistral AI drops Mistral 7B via a torrent link with no fanfare. Mistral reports that it outperforms Llama 2 13B on every benchmark it tested, using sliding window attention and grouped-query attention for efficient inference.
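
Sliding window attention restricts each token to a fixed number of recent positions instead of the full prefix. A minimal sketch of the resulting mask follows; the function name is hypothetical, and the boundary convention (each query sees itself plus the `window - 1` positions before it) is an assumption that may differ from Mistral's reference code.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Causal (j <= i) and local (i - j < window). Boundary convention is assumed.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


mask = sliding_window_mask(seq_len=5, window=3)
```

With this mask, per-token attention cost and KV cache size are bounded by the window rather than growing with sequence length.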

models

Mixtral 8x7B Paper

Mistral AI publishes the Mixtral paper, showing that a sparse mixture-of-experts (MoE) model with about 47B total but only about 13B active parameters per token can match or beat Llama 2 70B on most benchmarks. Shows MoE is viable for open models.
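
The gap between total and active parameters comes from routing: each token is sent to only a few experts. Below is a minimal sketch of top-2 routing over 8 experts, assuming the router takes a softmax over the selected experts' logits. The function names and the scalar toy experts are hypothetical stand-ins for real feed-forward blocks.

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax their logits into weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs by router weight."""
    weights = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy experts: expert i just multiplies its input by i.
experts = [lambda x, scale=i: scale * x for i in range(8)]
logits = [0.1, 2.0, -1.0, 2.0, 0.5, 0.0, 0.3, -0.2]
weights = top_k_route(logits)
output = moe_layer(1.0, experts, logits)
```

Only 2 of the 8 experts run for this token, so compute per token scales with the active parameters while the model's capacity scales with the total.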

models

Llama 3.1 405B Released

Meta releases Llama 3.1, including a 405B-parameter model, among the largest open-weight models at the time and competitive with GPT-4-class models on many benchmarks.

agents

Agentic Frameworks Surge

2024 sees an explosion of agentic frameworks — LangGraph, CrewAI, AutoGen, and others mature rapidly. The focus shifts from simple chatbots to autonomous multi-step agents.

models

DeepSeek-V3 Released

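DeepSeek releases V3, a 671B-parameter MoE model (about 37B active per token) with a reported training cost of roughly $5.5M in GPU time, a figure that excludes prior research and ablation experiments but still challenges cost assumptions. Uses Multi-head Latent Attention (MLA), introduced in DeepSeek-V2, and is competitive with top proprietary models.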

models

DeepSeek-R1 — Open Reasoning

DeepSeek releases R1, an open-weight reasoning model that rivals OpenAI's o1. Its precursor, R1-Zero, learns chain-of-thought reasoning through pure RL without supervised reasoning traces; R1 adds a small supervised cold-start stage before RL.
