Google's new LLM architecture cuts costs with memory separation

Google’s new Titans architecture proposes a significant upgrade for large language models (LLMs), reimagining how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

The architecture introduces a novel three-part system that handles information processing and storage differently from traditional LLMs
By segregating memory functions, Titans can scale to sequences of more than 2 million tokens
Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters
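
To see why sequences on the order of 2 million tokens strain conventional models, a rough count of attention scores helps. The Python sketch below compares full causal attention with a core that attends only over a limited short-term window. The 4,096-token window is a hypothetical value chosen for illustration, not a figure Google has published for Titans, and the count ignores the per-token cost of the long-term memory module.

```python
# Back-of-envelope count of attention scores (illustrative only).
# Assumptions: full causal attention scores every (query, earlier-or-same key)
# pair; a windowed core lets each token attend to at most WINDOW tokens,
# itself included. WINDOW is hypothetical, not a published Titans setting.

SEQ_LEN = 2_000_000  # sequence length cited in the article
WINDOW = 4_096       # hypothetical short-term attention window


def full_attention_pairs(n: int) -> int:
    """Pairs scored by causal full attention: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def windowed_attention_pairs(n: int, w: int) -> int:
    """Pairs scored when token i (1-indexed) attends to min(i, w) tokens."""
    if n <= w:
        return full_attention_pairs(n)
    # The first w tokens fill the window; every later token sees exactly w.
    return full_attention_pairs(w) + (n - w) * w


if __name__ == "__main__":
    full = full_attention_pairs(SEQ_LEN)
    windowed = windowed_attention_pairs(SEQ_LEN, WINDOW)
    print(f"full attention:     {full:,} pairs")
    print(f"windowed attention: {windowed:,} pairs")
    print(f"ratio:              {full / windowed:,.0f}x")
```

With these assumed numbers the windowed core scores a few hundred times fewer pairs than full attention, and the gap widens roughly linearly as the sequence grows.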

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information.

A “core” module manages short-term memory using traditional attention mechanisms
A “long-term memory” component employs neural memory for storing important information
A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base
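
The division of labor among the three modules can be pictured with a toy Python sketch. It is a hypothetical simplification, not Google’s implementation: persistent memory is a list of vectors that is never updated, the short-term window is a fixed-size deque of recent tokens, and the long-term memory’s output is passed in as already-retrieved vectors. The core then runs plain scaled dot-product attention over all three, using each vector as both key and value for brevity.

```python
import math
from collections import deque
from typing import Deque, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, memory: List[Vector]) -> Vector:
    """Scaled dot-product attention; each vector acts as both key and value."""
    scale = math.sqrt(len(query))
    scores = [dot(query, m) / scale for m in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * m[d] for w, m in zip(weights, memory)) / total
            for d in range(len(query))]


class ThreeMemorySketch:
    """Hypothetical toy model of the three memory types, not Google's code.

    persistent: vectors fixed after "training" (never modified here)
    window:     the most recent tokens, i.e. the core's short-term memory
    long_term:  supplied per step as already-retrieved vectors
    """

    def __init__(self, persistent: List[Vector], window_size: int) -> None:
        self.persistent = [list(p) for p in persistent]
        self.window: Deque[Vector] = deque(maxlen=window_size)

    def observe(self, token: Vector) -> None:
        # A deque with maxlen silently drops the oldest token when full.
        self.window.append(token)

    def context(self, long_term: List[Vector]) -> List[Vector]:
        # The core attends over persistent + long-term + short-term entries.
        return self.persistent + long_term + list(self.window)

    def step(self, query: Vector, long_term: List[Vector]) -> Vector:
        return attend(query, self.context(long_term))


if __name__ == "__main__":
    model = ThreeMemorySketch(persistent=[[1.0, 0.0]], window_size=2)
    for token in ([0.0, 1.0], [1.0, 1.0], [0.5, 0.5]):
        model.observe(token)
    print(len(model.window))  # 2: the oldest token has been evicted
    print(model.step([1.0, 0.0], long_term=[[0.2, 0.8]]))
```

In this toy setup each attention step touches only the persistent entries, the retrieved long-term entries, and the current window, so its cost does not grow with the total number of tokens observed.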

Memory management innovation: Titans uses a “surprise” mechanism to decide which information deserves long-term storage, prioritizing inputs that deviate from what the model expects.

This selective approach helps optimize memory usage and computational efficiency
The system can maintain longer context windows without the dramatic cost increases typically associated with expanding model capacity
The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context
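
The surprise mechanism can be sketched as an online update to an associative memory. The Python code below loosely follows the idea in the Titans research: surprise is measured as the gradient of the error ||M k - v||^2 with respect to the memory M, accumulated with momentum, while old contents decay through a forgetting factor. Everything here is a simplifying assumption for illustration: the memory is a single linear matrix, keys and values are supplied directly, and `eta`, `theta`, and `alpha` are fixed hypothetical scalars rather than learned, input-dependent gates.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def norm(x: Vector) -> float:
    return math.sqrt(sum(v * v for v in x))


class SurpriseMemory:
    """Linear associative memory M updated by a momentum "surprise" signal.

    Simplified, hypothetical sketch: M maps keys to values, and eta (momentum),
    theta (step size) and alpha (forgetting) are fixed scalars.
    """

    def __init__(self, d_key: int, d_value: int, eta: float = 0.9,
                 theta: float = 0.1, alpha: float = 0.01) -> None:
        self.M: Matrix = [[0.0] * d_key for _ in range(d_value)]
        self.S: Matrix = [[0.0] * d_key for _ in range(d_value)]  # past surprise
        self.eta, self.theta, self.alpha = eta, theta, alpha

    def _error(self, key: Vector, value: Vector) -> Vector:
        return [p - v for p, v in zip(matvec(self.M, key), value)]

    def surprise(self, key: Vector, value: Vector) -> float:
        """Frobenius norm of d/dM ||M k - v||^2 = 2 (M k - v) k^T."""
        return 2.0 * norm(self._error(key, value)) * norm(key)

    def update(self, key: Vector, value: Vector) -> float:
        """Write (key, value) into memory; return the surprise before the write."""
        err = self._error(key, value)  # computed with the pre-update M
        score = 2.0 * norm(err) * norm(key)
        for i, (s_row, m_row) in enumerate(zip(self.S, self.M)):
            for j, k in enumerate(key):
                grad = 2.0 * err[i] * k
                # Momentum: carry over past surprise, add the new gradient step.
                s_row[j] = self.eta * s_row[j] - self.theta * grad
                # Forgetting: decay old contents, then apply the surprise.
                m_row[j] = (1.0 - self.alpha) * m_row[j] + s_row[j]
        return score


if __name__ == "__main__":
    mem = SurpriseMemory(d_key=2, d_value=1)
    for _ in range(3):
        print(mem.update([1.0, 0.0], [1.0]))  # surprise shrinks as M learns
    print(mem.surprise([0.0, 1.0], [1.0]))  # unseen key: still surprising
```

Over the first three writes of the same pair the returned surprise falls (2.0, then about 1.6, then about 0.92), while a key the memory has never seen still scores 2.0. That contrast is what lets a surprise signal single out novel information for storage.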

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

Plans are in place to release both PyTorch and JAX implementations for training and evaluation
The architecture could be integrated into Google’s existing models like Gemini and Gemma
The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.

Source: VentureBeat, “Google’s new neural-net LLM architecture separates memory components to control exploding costs of capacity and compute”