ULMFiT, not GPT-1, was the first true LLM, according to a new analysis

The development of Large Language Models (LLMs) has fundamentally transformed AI capabilities, but understanding their origins helps contextualize today’s rapid advancements. While GPT-4 and Claude dominate current discussions, identifying the first true LLM clarifies the evolutionary path of these increasingly sophisticated systems and provides valuable perspective on how quickly this technology has developed in just a few years.

The big picture: According to Australian tech blogger Jonathon Belotti, ULMFiT, published by Jeremy Howard and Sebastian Ruder in January 2018, was the first true LLM, predating OpenAI’s GPT-1 by several months.

  • GPT-1, created by Alec Radford, was published on June 11, 2018, several months after ULMFiT’s introduction.
  • Both models demonstrated the core capabilities that define modern LLMs, though at a far smaller scale than today’s systems.

What makes an LLM: Belotti defines an LLM as a language model effectively trained as a “next word predictor” that can be easily adapted to multiple text-based tasks without architectural changes.

  • The definition emphasizes self-supervised training on unlabeled text data, focusing on next-word prediction capabilities.
  • True LLMs must achieve state-of-the-art performance across multiple text challenges with minimal adaptation.
  • This definition helps distinguish early language models from true LLMs based on their capabilities and adaptability; the sketch below illustrates the next-word-prediction objective at the heart of it.
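
To make the definition concrete, the toy PyTorch sketch below shows what “trained as a next word predictor” means in practice: the only supervision is the text itself, shifted by one token. The model, dimensions, and optimizer settings are illustrative assumptions for this summary, not the actual architectures or hyperparameters of ULMFiT or GPT-1 (ULMFiT used an AWD-LSTM; GPT-1 used a Transformer decoder).

```python
# Minimal sketch of the self-supervised "next word predictor" objective.
# Everything here (model size, optimizer, batch shape) is illustrative,
# not ULMFiT's or GPT-1's actual setup.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 10_000, 128, 256

class TinyLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)  # score for every candidate next token

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        hidden, _ = self.rnn(self.embed(tokens))
        return self.head(hidden)                # (batch, seq_len, vocab_size)

model = TinyLanguageModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Self-supervision: the "labels" are just the same text shifted by one token,
# so no human annotation is required.
batch = torch.randint(0, vocab_size, (8, 65))   # stand-in for tokenized raw text
inputs, targets = batch[:, :-1], batch[:, 1:]

optimizer.zero_grad()
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
optimizer.step()
```

Adapting such a model to a downstream text task, in the sense Belotti describes, amounts to reusing the pretrained weights and attaching a small task-specific head, rather than redesigning the architecture.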

Historical context: Belotti’s post examines several pre-2018 models, such as CoVe and ELMo, to determine whether they meet the criteria for being considered LLMs.

  • Belotti concludes that, while the case is debatable, ULMFiT most convincingly fits the definition of the first genuine LLM.
  • Earlier models lacked either the adaptability or performance characteristics that define modern LLMs.

Where we go from here: Despite the increasing multimodality of AI models, Belotti suggests the term “LLM” will likely persist in technical vernacular.

  • Just as “GPU” remains the standard name for chips that now handle many non-graphics workloads, “LLM” may stick as a standard term even as models evolve beyond pure language processing.
