AI headphones clone multiple voices for real-time translation

Yes, please do listen to the voices in your head.

Researchers have developed a groundbreaking AI headphone system that can translate multiple speakers simultaneously in real time, potentially eliminating language barriers in multilingual group conversations. The Spatial Speech Translation system not only converts foreign-language speech into English text but also preserves each speaker's unique vocal characteristics and emotional tone, creating a more natural translation experience than existing technologies. This innovation could transform international communication by enabling people to express themselves confidently across language divides.

How it works: The University of Washington’s Spatial Speech Translation system uses AI to track and translate multiple speakers simultaneously in group settings.

  • The technology works with standard noise-canceling headphones connected to a laptop powered by Apple's M2 chip, which supports the necessary neural networks.
  • The system employs two AI models: the first identifies speakers and their locations, while the second translates their speech from French, German, or Spanish into English text.
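The two-stage design described above can be sketched as a simple pipeline: one model separates the mixed audio into per-speaker streams with estimated directions, and a second model translates each stream into English. The sketch below is purely illustrative; the function names, data classes, and toy dictionary are assumptions, not the researchers' code, and the stand-in stages replace what would be neural localization and speech-translation models.

```python
from dataclasses import dataclass

@dataclass
class SpeakerSegment:
    speaker_id: int
    direction_deg: float  # estimated angle of the speaker relative to the listener
    text: str             # transcribed source-language speech
    lang: str             # e.g. "fr", "de", or "es"

def separate_speakers(mixed_audio):
    """Stage 1 (stand-in): a real system would run a neural model on the
    binaural headphone audio to identify speakers and their locations."""
    # Toy output: pretend the model found two speakers at different angles.
    return [
        SpeakerSegment(0, -30.0, "bonjour tout le monde", "fr"),
        SpeakerSegment(1, 45.0, "guten morgen", "de"),
    ]

# Toy lookup table standing in for a neural translation model.
TOY_DICTIONARY = {
    "bonjour tout le monde": "hello everyone",
    "guten morgen": "good morning",
}

def translate(segment):
    """Stage 2 (stand-in): a real system would run a speech-translation
    model and re-synthesize the output in the speaker's cloned voice."""
    return TOY_DICTIONARY.get(segment.text, segment.text)

def pipeline(mixed_audio):
    # Each result keeps the speaker's identity and direction, so the
    # listener can attribute each translation to the right person.
    return [
        (seg.speaker_id, seg.direction_deg, translate(seg))
        for seg in separate_speakers(mixed_audio)
    ]
```

Keeping the direction estimate alongside each translation is what makes the system "spatial": the translated audio can be rendered so it still appears to come from the person who spoke.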

The big picture: Unlike existing translation tools that focus on single speakers, this system addresses the challenge of following conversations where multiple people speak different languages simultaneously.

  • The technology preserves speakers’ unique vocal characteristics, essentially creating a “cloned” voice that maintains the emotional tone of the original speaker.
  • Researchers presented their work at the ACM CHI Conference on Human Factors in Computing Systems in Japan this month.

Why this matters: The technology could break down significant communication barriers for non-native speakers in various professional and social contexts.

  • “There are so many smart people across the world, and the language barrier prevents them from having the confidence to communicate,” explains Shyam Gollakota, a professor who worked on the project.
  • Gollakota shares that his mother has “incredible ideas when she’s speaking in Telugu,” but struggles to communicate with people during visits to the US from India.

What’s next: Researchers are now working to reduce the system’s latency to under one second to enable more natural conversational flow.

  • The team aims to maintain the “conversational vibe” by minimizing delays between when someone speaks and when the translation is delivered to the listener.
  • Current technology requires the headphones to be connected to a laptop, though the same M2 chip that powers the system is also present in Apple’s Vision Pro headset, suggesting potential for more portable implementations.
