Google’s new LLM architecture cuts costs with memory separation

Google’s new Titans architecture rethinks how large language models (LLMs) store and process information by splitting memory into separate components, each handling a different kind of information.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
  • By separating memory functions, Titans can reportedly handle context windows of more than 2 million tokens
  • In the researchers’ reported benchmarks, Titans outperformed much larger models such as GPT-4 on long-sequence tasks despite using far fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information.

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component uses a neural memory module that learns, while the model is running, which information to retain beyond the current context
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base
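
The division of labor among the three modules can be pictured with a toy data structure. This is only an analogy under stated assumptions, not the neural implementation: the class name `ThreePartMemory`, its fields, and the `important` flag are hypothetical, and in Titans the long-term memory is a neural network rather than a list.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThreePartMemory:
    """Toy analogy of the three modules (hypothetical names throughout)."""
    persistent: tuple            # fixed knowledge, never written at inference
    window: int = 4              # how many recent tokens the "core" can see
    short_term: deque = field(default_factory=deque)
    long_term: list = field(default_factory=list)

    def observe(self, token, important=False):
        # The core only keeps a sliding window of recent tokens.
        self.short_term.append(token)
        if len(self.short_term) > self.window:
            self.short_term.popleft()
        # Only selected tokens are copied into long-term storage.
        if important:
            self.long_term.append(token)

    def context(self):
        # What the core would attend over: fixed knowledge, retained
        # long-term items, then the recent window.
        return list(self.persistent) + self.long_term + list(self.short_term)


memory = ThreePartMemory(persistent=("sys",), window=2)
memory.observe("a", important=True)
memory.observe("b")
memory.observe("c")
print(memory.context())
```

Here "a" has already slid out of the short-term window but is still visible because it was written to long-term storage, while "sys" is always present and never changes.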

Memory management innovation: Titans uses a “surprise” mechanism to decide which information deserves long-term storage, giving priority to input that differs sharply from what the memory already expects.

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context
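
One way to make the “surprise” idea concrete is to treat the long-term memory as a small model that predicts a value from a key and to measure surprise as the size of the gradient of its prediction error. The sketch below assumes a linear memory and assumed values for the learning rate, momentum, and decay; the function names are hypothetical and this is not Google’s released code.

```python
import math


def make_memory(dim):
    """Return a dim x dim zero matrix standing in for the long-term memory."""
    return [[0.0] * dim for _ in range(dim)]


def recall(memory, key):
    """Read from memory with a plain matrix-vector product."""
    return [sum(w * k for w, k in zip(row, key)) for row in memory]


def surprise_update(memory, momentum, key, value, lr=0.5, beta=0.5, decay=0.01):
    """One write step; returns (new_memory, new_momentum, surprise)."""
    # Prediction error of the current memory on this (key, value) pair.
    error = [p - v for p, v in zip(recall(memory, key), value)]
    # Gradient of 0.5 * ||M @ key - value||^2 with respect to M.
    grad = [[e * k for k in key] for e in error]
    # Surprise: the gradient norm. Large means the input was unexpected.
    surprise = math.sqrt(sum(g * g for row in grad for g in row))
    # Momentum carries part of earlier surprise into the current write.
    new_momentum = [[beta * m - lr * g for m, g in zip(m_row, g_row)]
                    for m_row, g_row in zip(momentum, grad)]
    # Decay slowly forgets old content before the new write is added.
    new_memory = [[(1.0 - decay) * w + s for w, s in zip(w_row, s_row)]
                  for w_row, s_row in zip(memory, new_momentum)]
    return new_memory, new_momentum, surprise


memory, momentum = make_memory(2), make_memory(2)
key, value = [1.0, 0.0], [0.0, 1.0]
memory, momentum, first = surprise_update(memory, momentum, key, value)
memory, momentum, second = surprise_update(memory, momentum, key, value)
print(first, second)
```

Showing the same pair twice yields a smaller surprise the second time, because the memory has already partly absorbed it. That is the selective behavior described above: familiar input triggers small writes, unexpected input triggers large ones.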

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: Titans could make large language models more practical and cost-effective for enterprise applications. Open questions remain about real-world performance at scale and about integration with existing systems.

Google’s new neural-net LLM architecture separates memory components to control exploding costs of capacity and compute
