ULMFiT, not GPT-1, was the first true LLM according to new analysis

The development of Large Language Models (LLMs) has fundamentally transformed AI capabilities, but understanding their origins helps contextualize today’s rapid advancements. While GPT-4 and Claude dominate current discussions, identifying the first true LLM clarifies the evolutionary path of these increasingly sophisticated systems and provides valuable perspective on how quickly this technology has developed in just a few years.

The big picture: According to Australian tech blogger Jonathon Belotti, ULMFiT, published by Jeremy Howard and Sebastian Ruder in January 2018, represents the first true LLM, predating OpenAI’s GPT-1 by several months.

  • GPT-1, developed at OpenAI by Alec Radford and colleagues, was published on June 11, 2018, several months after ULMFiT’s introduction.
  • Both models demonstrated the core capabilities that define modern LLMs, though at a far smaller scale than today’s systems.

What makes an LLM: Belotti defines an LLM as a language model effectively trained as a “next word predictor” that can be easily adapted to multiple text-based tasks without architectural changes.

  • The definition emphasizes self-supervised training on unlabeled text data, focusing on next-word prediction capabilities.
  • True LLMs must achieve state-of-the-art performance across multiple text challenges with minimal adaptation.
  • This definition helps distinguish early language models from true LLMs based on their capabilities and adaptability.
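The "next word predictor" idea at the heart of this definition can be made concrete with a toy sketch (an illustration, not anything from the article or from ULMFiT itself): the training signal comes entirely from the raw text, with no human-provided labels, which is what "self-supervised" means here.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Toy next-word predictor: for each word, count which word
    follows it in the corpus. No labels are needed -- the text
    itself supplies both input and target (self-supervision)."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        model[current][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent continuation seen in training,
    or None for a word the model never saw."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "the model predicts the next word and the next word again"
model = train_bigram(corpus)
print(predict_next(model, "the"))   # prints the most common word after "the"
```

Real LLMs replace the bigram counts with a neural network trained on billions of words, but the objective is the same: predict the next token given what came before.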

Historical context: The article examines several pre-2018 models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.

  • After analysis, Belotti concludes that, while the case is arguable, ULMFiT most convincingly fits the definition of the first genuine LLM.
  • Earlier models lacked either the adaptability or performance characteristics that define modern LLMs.

Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in technical vernacular.

  • Like “GPU” (which originally stood for Graphics Processing Unit but now handles many non-graphics tasks), “LLM” may become a standard term even as models evolve beyond pure language processing.
