DeepL, known for its text translation, now wants to translate your voice


DeepL, a translation company known for its text-to-text tools, today released a speech-to-speech translation suite that covers use cases such as meetings, mobile and web conversations, and group chats for frontline workers via dedicated apps. The company is also releasing an API that lets external developers and enterprises build on top of DeepL’s technology for custom use cases such as call centers.

“After spending many years translating text, voice was a natural step for us,” DeepL CEO Jarek Kutylowski told TechCrunch. “We’ve come a long way when it comes to text translation and document translation. But we didn’t think there was a great product for real-time voice translation.”

Kutylowski said the challenges in building a real-time translation product center around reducing latency — the delay between someone speaking and the translated audio being played — and maintaining accurate results.

DeepL is releasing add-ons for platforms such as Zoom and Microsoft Teams, where listeners can either hear the translation in real time while others speak in their native language, or read the translated text in real time on screen. The program is currently in early access, and the company is inviting organizations to join a waiting list. The company also has a product for mobile and web-based conversations that can take place in person or remotely.

DeepL also supports group conversations in settings such as training sessions or workshops, where participants can join via a QR code.

DeepL said its voice-to-voice technology can also learn and match custom vocabulary such as industry terms and company and personal names.

Artificial intelligence is reshaping what customer service will look like in the coming years, Kutylowski said. He noted that the translation layer helps companies provide support in languages where skilled staff are scarce and expensive to hire.

The company said it controls the entire voice stack. For now, the system converts speech to text, translates that text, and then converts the result back to speech. DeepL believes this gives it an advantage in translation quality because it has been working on text translation for years. In the future, the company wants to develop an end-to-end voice translation model that skips the text step altogether.

DeepL faces competition from several well-funded startups operating in adjacent corners of the space. Sanas, which raised $65 million last year from Quadrille Capital and Teleperformance, uses artificial intelligence to change a speaker’s accent in real time, a tool aimed primarily at call center agents.

Dubai-based Camb.AI focuses on speech synthesis and translation, working with Amazon Web Services to help media and entertainment companies dub and localize video content at scale.

Palabra, backed by Reddit co-founder Alexis Ohanian’s firm Seven Seven Six, is building a real-time speech translation engine designed to preserve both the meaning and the speaker’s original voice, putting it in more direct competition with what DeepL is building now.


