Facebook has released Seamless, an AI-powered language translation model. It converts speech in one language into another while preserving the tone and emotion of the original speech.
In 2023, Facebook moved closer to its goal of a universal language translator when it released SeamlessM4T, which can translate speech or text in 100 languages into text in any of those languages or into speech in 36 languages. Much water has flowed through the Ganga since then.
Seamless is the foundational model. In the future, it is meant to facilitate a world where everyone can be understood.
The updated version is called SeamlessM4T v2. It performs automatic speech recognition and handles speech-to-speech, speech-to-text and text-to-speech translation.
The two new components built on it are SeamlessStreaming and SeamlessExpressive.
A typical translator starts translating only after the speaker finishes a sentence. It waits because language structures differ: word order, such as subject-verb-object, varies from language to language, so the meaning of a sentence may not be clear until its end. This waiting delays the translation and makes conversations feel less natural.
SeamlessStreaming begins translating while the speaker is still talking. The listener hears the translation with a latency of only about 1-2 seconds.
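The difference between waiting for the whole sentence and translating on the fly can be illustrated with a toy calculation. The sketch below is not how SeamlessStreaming works internally: it uses a fixed "wait-k" rule (start after hearing k words, then stay k words behind) as a simple stand-in for a streaming policy, and it assumes, purely for illustration, one output word per input word. All function names are hypothetical.

```python
from statistics import mean


def words_heard_full_sentence(n: int) -> list[int]:
    """Source words heard before each output word when the translator
    waits for the whole n-word sentence (the 'typical' behaviour)."""
    return [n] * n


def words_heard_wait_k(n: int, k: int) -> list[int]:
    """Source words heard before each output word under a fixed wait-k rule.
    Assumption: this rule is a stand-in, not SeamlessStreaming's policy."""
    return [min(i + k, n) for i in range(n)]


def average_lag(words_heard: list[int]) -> float:
    """Average number of extra source words heard beyond the one that
    lines up with each output word (0 would mean no waiting at all)."""
    return mean(heard - (i + 1) for i, heard in enumerate(words_heard))


# Compare the two strategies on an 8-word sentence.
full = average_lag(words_heard_full_sentence(8))
streaming = average_lag(words_heard_wait_k(8, k=2))
print(f"full sentence: {full} words behind on average")
print(f"wait-2:        {streaming} words behind on average")
```

In this toy setting the streaming rule stays about one word behind the speaker, while the sentence-level translator lags by several words, and that gap grows with sentence length.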
SeamlessExpressive focuses not just on what is said but on how it is said. The translation should preserve the emotion, style and rhythm of the original speech.
Seamless is a multitask, multilingual model.
It includes an expressivity encoder and an expressive unit-to-speech generator conditioned on the source speech.
It has been made open source and is available on GitHub.