DeepL enables real-time voice-to-voice translation in 40+ languages



The Cologne-based translation company, known for its text tools, has introduced a full voice product suite covering meetings, chats, group settings and an API for enterprise integration. A live demo in Seoul showed a delay of one to two sentences, and DeepL’s CPO acknowledged that word-order differences between languages remain a major challenge.

DeepL, the Cologne-based AI company that built its reputation on high-quality text translation, has launched DeepL Voice-to-Voice: a real-time speech translation package designed for live business communication.

The product covers four use cases: virtual meetings, mobile and web chats, group settings for frontline workers, and enterprise applications via API. It supports more than 40 languages, including all 24 official EU languages and additional ones such as Vietnamese, Thai, Arabic, Norwegian, Hebrew, Bengali and Tagalog.

The four components of the package are in various stages of availability. Voice for Chats is now generally available, enabling real-time translation on mobile and web without requiring app installation.

Voice for Meetings, which integrates with Microsoft Teams and Zoom so participants can speak in their native language while others hear a simultaneous translation, enters early access in June.

The Voice-to-Voice API, which allows businesses to embed DeepL’s translation engine into customer-facing applications such as call centers, is in ongoing early access. Colloquial Terms, a customization feature that allows the system to learn industry-specific vocabulary, company names and personal names, is scheduled for general release on May 7.

Jarek Kutylowski, founder and CEO of DeepL, described the launch as reaching “another frontier in translation.”

“DeepL Voice-to-Voice allows anyone to speak naturally in their own language without the hassle or expense of translators,” he said.

DeepL has positioned the product as an enterprise tool, not a consumer one. The company said its voice technology never uses customer data to train its models, nor does it permanently store transcription or translation data after a call ends. That security framework differentiates it from consumer AI voice products and is aimed at regulated industries.

The current system works with a three-step pipeline: speech is converted to text, text is translated using DeepL’s built-in translation engine, and the result is converted back to speech.
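The three steps described above form what is commonly called a cascaded pipeline. A minimal Python sketch of that structure follows. It is an illustration only, not DeepL’s implementation or API: the function names, the `CascadeResult` type and the toy stand-in stages are all hypothetical, and real systems would stream audio rather than process it in one piece.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    transcript: str    # output of step 1 (speech to text)
    translation: str   # output of step 2 (text to text)
    audio: bytes       # output of step 3 (text to speech)


def run_cascade(
    audio_in: bytes,
    recognize: Callable[[bytes], str],
    translate: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> CascadeResult:
    """Chain the three stages; each stage consumes the previous stage's output."""
    transcript = recognize(audio_in)
    translation = translate(transcript)
    audio_out = synthesize(translation)
    return CascadeResult(transcript, translation, audio_out)


# Toy stand-ins (hypothetical) so the sketch runs without any speech models.
def fake_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_translate(text: str) -> str:
    # Word-for-word lookup; a real engine translates whole sentences in context.
    glossary = {"hello": "hallo", "world": "welt"}
    return " ".join(glossary.get(word, word) for word in text.lower().split())


def fake_synthesize(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


result = run_cascade(b"Hello world", fake_recognize, fake_translate, fake_synthesize)
print(result.translation)
```

Because each stage waits for the previous one, the quality and the latency of the whole chain depend on every step, which is why the middle translation stage carries so much weight in this design.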

DeepL’s competitive argument rests on the quality of the middle step: the company says its text translation models outperform alternatives, and that this advantage carries over to voice output.

In blind evaluations commissioned by DeepL and independently conducted by Slator, a language industry research firm, 96% of professional linguists preferred DeepL Voice over the built-in translation features in Google Meet, Microsoft Teams, and Zoom, citing superior fluency and contextual accuracy. DeepL Voice scored 96.4 out of 100 for Zoom and 96.3 for Microsoft Teams.

However, a live demonstration by Chief Product Officer Gonzalo Gaiolas at the company’s DeepL Connect Seoul event on April 15 revealed the system’s current limitation: an apparent delay of one to two sentences between the moment a speaker finishes and the delivery of the translation.

Gaiolas directly acknowledged the delay. “Different languages have different word orders and sentence structures, which cause delays in real-time translation,” he said, according to Seoul Economic Daily.

The company plans to reduce the delay through continued model development. On the audio side, the current system speaks translations in a fixed synthetic voice; DeepL said it plans to release, by the end of 2026, a voice preservation feature that retains the speaker’s original vocal characteristics in the translated output.

DeepL enters a market with many well-funded competitors. Sanas, which uses artificial intelligence to change speaker accents in real-time for call center applications, raised $65 million in a round led by Quadrille Capital.

Based in Dubai, Camb.AI focuses on speech synthesis and translation for media dubbing. Backed by Reddit co-founder Alexis Ohanian’s Seven Seven Six, Palabra develops a real-time speech translation engine focused on preserving dynamic audio features.

Google, Microsoft, and Zoom all offer their own meeting translation features, and Microsoft Teams and Zoom are platforms that DeepL both competes with and integrates into. DeepL’s strategic bet is that translation quality, its longest-standing differentiator, can outweigh the structural advantages incumbents have in platform distribution.


