Isometric MT: Neural Machine Translation for Automatic Dubbing
TL;DR: Automatic dubbing needs translations that match the source length so the dubbed speech stays in sync. Prior methods over-generate hypotheses and then re-rank them. Isometric MT teaches a single transformer to produce length-matched output directly, which is simpler and, in the reported evaluations, better than the more complex re-ranking alternatives.
The Problem
In automatic dubbing, the translated speech has to fit the time the original speaker was talking. That makes length a first-class constraint: the translation should land within roughly ±10% of the source character count, without sacrificing quality. Length and quality pull against each other, since squeezing output to a target length usually degrades translation. The common fix is a two-step pipeline: generate an N-best list of hypotheses, then re-rank them by a function of length and quality. It works, but it is heavy: many hypotheses per sentence plus an auxiliary ranker.
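The ±10% band and the two-step baseline described above can be sketched in a few lines of Python. This is only an illustration: the names `length_ratio`, `is_length_compliant`, `Hypothesis`, and `rerank`, the linear scoring formula, and the `length_weight` parameter are all assumptions made for this sketch, not the ranking function used in the literature.

```python
from dataclasses import dataclass
from typing import List


def length_ratio(source: str, translation: str) -> float:
    """Character-count ratio of translation to source."""
    if not source:
        raise ValueError("source must be non-empty")
    return len(translation) / len(source)


def is_length_compliant(source: str, translation: str, tolerance: float = 0.10) -> bool:
    """True if the translation is within +/- tolerance of the source character count."""
    return abs(len(translation) - len(source)) <= tolerance * len(source)


@dataclass
class Hypothesis:
    text: str
    model_score: float  # hypothetical quality score, higher is better


def rerank(source: str, hypotheses: List[Hypothesis], length_weight: float = 1.0) -> Hypothesis:
    """Pick the hypothesis with the best quality score minus a length penalty.

    The penalty is the absolute deviation of the length ratio from 1.0.
    This linear combination is an assumed stand-in for a real ranking function.
    """
    def combined(h: Hypothesis) -> float:
        penalty = abs(length_ratio(source, h.text) - 1.0)
        return h.model_score - length_weight * penalty

    return max(hypotheses, key=combined)
```

With `length_weight=0.0` the function reduces to picking the highest-scoring hypothesis regardless of length. Raising the weight trades model score for length match, which is the tension the re-ranking step has to manage.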
Approach
We replace that pipeline with a self-learning approach: the transformer learns to generate length-compliant translations directly, in a single pass. No N-best list, no separate ranking function — the length-matching behavior is baked into the model itself.
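The summary above does not spell out the self-learning recipe. One generic form of self-learning is to translate training sources with an existing model, keep only the outputs that already satisfy the length constraint, and train further on those pairs. The sketch below illustrates just that data-selection step. It is an assumption for illustration, not the authors' procedure, and `select_length_matched_pairs` and the `translate` callable are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple


def select_length_matched_pairs(
    sources: Iterable[str],
    translate: Callable[[str], str],  # hypothetical: wraps any existing MT model
    tolerance: float = 0.10,
) -> List[Tuple[str, str]]:
    """Keep (source, translation) pairs whose character counts differ by at most
    `tolerance` of the source length. The kept pairs could then serve as
    additional training data (assumed use, not confirmed by the source)."""
    selected = []
    for src in sources:
        if not src:
            continue  # skip empty lines, the ratio is undefined for them
        hyp = translate(src)
        if abs(len(hyp) - len(src)) <= tolerance * len(src):
            selected.append((src, hyp))
    return selected


def toy_translate(text: str) -> str:
    """Toy stand-in for a real model: short inputs keep their length,
    long inputs double in length."""
    return text.upper() if len(text) < 10 else text + text


pairs = select_length_matched_pairs(["hello", "a much longer sentence"], toy_translate)
```

Here only the first pair survives the filter, because the toy model doubles the length of the second input. In a real setup `translate` would wrap an actual MT system, and the filtered pairs would feed a later training round.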
Key Results
- Evaluated on four language pairs — English → French, Italian, German, Spanish — on a publicly available benchmark.
- Both automatic and manual evaluation show Isometric MT outperforms the more complex N-best + re-ranking approaches from the literature — while being a single model with a single decoding pass.
Why It Matters
Isometric MT makes length control a property of the model rather than a bolt-on pipeline, which is cheaper to run and easier to deploy in a real dubbing system. It’s a cornerstone of the broader automatic-dubbing line of work — from verbosity control and isochrony-aware translation to jointly optimizing translation and speech timing — and it underpinned the Isometric Spoken Language Translation shared task at IWSLT 2022.
Details & Resources
- Paper: ICASSP 2022 — arXiv
- Code: github.com/amazon-science/isometric-slt
- Citation: S. M. Lakew, Y. Virkar, P. Mathur, M. Federico. Isometric MT: Neural Machine Translation for Automatic Dubbing. ICASSP 2022.