Google’s new LLM architecture cuts costs with memory separation

Google’s new Titans architecture promises a significant upgrade for large language models (LLMs), reimagining how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
  • By segregating memory functions, Titans can process sequences up to 2 million tokens in length
  • Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information; a simplified code sketch follows the list below.

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component employs neural memory for storing important information
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base
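
To make the division of labor concrete, here is a minimal PyTorch sketch of how the three modules might be wired together. Everything here is an illustrative assumption rather than Google’s released code: the class name, the placeholder MLP standing in for the neural long-term memory, and the choice to prepend persistent tokens to the attention context.

```python
import torch
import torch.nn as nn

class TitansStyleBlock(nn.Module):
    """Illustrative three-module block: core attention (short-term),
    a neural long-term memory, and fixed persistent-memory tokens.
    Names and wiring are assumptions, not Google's actual code."""

    def __init__(self, dim: int, num_heads: int = 8, num_persistent: int = 16):
        super().__init__()
        # "Core": short-term memory via standard attention over the window
        self.core = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # "Long-term memory": stands in for the neural memory that is
        # updated at test time (reduced here to a plain MLP placeholder)
        self.long_term = nn.Sequential(
            nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        # "Persistent memory": learnable tokens frozen after training,
        # acting as input-independent task knowledge
        self.persistent = nn.Parameter(torch.randn(num_persistent, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        batch = x.shape[0]
        # Retrieve from long-term memory, then let the core attend over
        # [persistent tokens | retrieved memory | current window]
        retrieved = self.long_term(x)
        persistent = self.persistent.unsqueeze(0).expand(batch, -1, -1)
        context = torch.cat([persistent, retrieved, x], dim=1)
        out, _ = self.core(x, context, context)
        return out
```

In this arrangement only the current window is processed with full attention, which is what keeps the quadratic attention cost bounded while the two memory modules carry the rest of the context.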

Memory management innovation: Titans employs a sophisticated “surprise” mechanism to determine which information deserves long-term storage, as sketched in the code after this list.

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context
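
In the Titans paper, “surprise” is measured as the gradient of the memory’s recall error: tokens the memory predicts poorly produce large gradients and get written in more strongly, with a momentum term smoothing over recent surprise and a decay term providing controlled forgetting. The sketch below follows that update rule under simplified assumptions; the tiny linear memory, the hyperparameter values, and the function name are all illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def surprise_step(memory, k_t, v_t, momenta, lr=0.1, eta=0.9, alpha=0.01):
    """One test-time write, loosely following the paper's rule
    M_t = (1 - alpha) * M_{t-1} + S_t, with surprise momentum
    S_t = eta * S_{t-1} - lr * grad(loss). Values are illustrative."""
    # "Surprise" = gradient of the recall error w.r.t. memory weights
    loss = F.mse_loss(memory(k_t), v_t)
    grads = torch.autograd.grad(loss, list(memory.parameters()))
    with torch.no_grad():
        for p, g, s in zip(memory.parameters(), grads, momenta):
            s.mul_(eta).add_(g, alpha=-lr)  # momentum over recent surprise
            p.mul_(1 - alpha).add_(s)       # decay = controlled forgetting
    return loss.item()

# Usage: a tiny linear memory consuming a stream of (key, value) pairs
memory = nn.Linear(64, 64)
momenta = [torch.zeros_like(p) for p in memory.parameters()]
for _ in range(3):
    k, v = torch.randn(8, 64), torch.randn(8, 64)
    print(surprise_step(memory, k, v, momenta))
```

Because writes are gated by surprise rather than applied to every token, the memory accumulates only the information worth keeping without growing in size, which is where the efficiency gains come from.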

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.
