TL;DR: Instead of training a new NMT model from scratch for every language pair, we let a trained model grow: we expand its vocabulary on the fly and transfer its existing parameters. Compared with from-scratch baselines, this yields gains of +3.85 to +13.63 BLEU, and the grown models reach higher quality after only ~4% of the training steps.

The Problem

Adding a new language pair to an NMT system usually means training a fresh model — expensive, slow, and especially wasteful in low-resource settings where data is scarce. Worse, a fixed vocabulary baked in at training time can’t accommodate the new language’s tokens. Can we instead reuse what a trained model already knows and extend it incrementally?

Approach

We introduce a shared dynamic vocabulary: the model’s vocabulary can expand as new data arrives, adding new items only when they aren’t already covered, while transferring parameters from the initial model. We evaluate two scenarios:

  1. Adapt — repurpose a trained single-pair model to a new language pair (progAdapt).
  2. Grow — continuously add new pairs to build up a multilingual model over time (progGrow).
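The rule "add new items only when they aren't already covered" can be made concrete with a small Python sketch. It assumes whitespace tokens in place of the subword units a real system would use, and `update_vocab` is a hypothetical helper rather than the authors' code. The property it illustrates is that existing tokens keep their indices, so parameters indexed by them stay valid as new pairs arrive, as in the progGrow scenario.

```python
from collections import Counter
from typing import Dict, Iterable, List


def update_vocab(vocab: Dict[str, int], sentences: Iterable[str],
                 min_count: int = 1) -> List[str]:
    """Extend `vocab` in place with tokens from `sentences` it does not yet cover.

    Existing entries keep their indices, so any parameters indexed by them stay
    valid. Unseen tokens are appended by decreasing frequency (ties broken
    alphabetically). Returns the newly added tokens in insertion order.
    """
    # Whitespace splitting stands in for a real subword segmenter (assumption).
    counts = Counter(tok for sent in sentences for tok in sent.split())
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    new_tokens = [tok for tok, c in ranked if c >= min_count and tok not in vocab]
    for tok in new_tokens:
        vocab[tok] = len(vocab)  # append at the end: old indices are untouched
    return new_tokens


# Toy progGrow-style sequence: start from one pair, then add another.
vocab = {"<pad>": 0, "<unk>": 1, "<s>": 2, "</s>": 3}
update_vocab(vocab, ["the house is small", "das Haus ist klein"])  # en-de
size_before = len(vocab)
added = update_vocab(vocab, ["the house is big", "la casa es pequeña"])  # en-es
print(added)  # ['big', 'casa', 'es', 'la', 'pequeña']
print(size_before, len(vocab))  # 12 17
```

Tokens shared across pairs ("the", "house", "is") are reused rather than duplicated, which is what makes the vocabulary shared as well as dynamic.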
Parameters transfer from the initial model while the shared vocabulary is dynamically extended to cover the new language.
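The parameter-transfer step can be sketched in the same spirit. This minimal example assumes an embedding table stored as one row per vocabulary index, with previously known tokens keeping their indices when the vocabulary is extended. `expand_embeddings` is a hypothetical helper: it copies the trained rows and appends freshly initialized rows for the new items. The Gaussian initialization of the new rows is our assumption, not necessarily what the original work used.

```python
import random
from typing import List

Matrix = List[List[float]]


def expand_embeddings(trained: Matrix, new_vocab_size: int,
                      init_std: float = 0.01, seed: int = 0) -> Matrix:
    """Build an embedding matrix for an enlarged vocabulary.

    Rows for tokens the trained model already knows are copied over unchanged
    (parameter transfer); rows for newly added tokens are drawn from
    N(0, init_std^2), which is an assumed initialization.
    """
    if not trained:
        raise ValueError("trained matrix must be non-empty")
    if new_vocab_size < len(trained):
        raise ValueError("a dynamic vocabulary only grows; it never shrinks")
    dim = len(trained[0])
    rng = random.Random(seed)
    transferred = [list(row) for row in trained]  # copy, leave the old model intact
    fresh = [[rng.gauss(0.0, init_std) for _ in range(dim)]
             for _ in range(new_vocab_size - len(trained))]
    return transferred + fresh


# Toy example: a 3-token, 2-dimensional embedding grown to 5 tokens.
trained = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
grown = expand_embeddings(trained, new_vocab_size=5)
print(len(grown), grown[:3] == trained)  # 5 True
```

In a typical encoder-decoder, the internal encoder and decoder layers do not depend on vocabulary size and can be carried over as-is; only vocabulary-indexed tensors, such as the embedding tables and the decoder's output projection, would need this kind of row-wise expansion.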

Key Results

Training steps to convergence: progAdapt and progGrow need only a fraction of the steps required by the from-scratch Baseline.

Why It Matters

This reframes multilingual NMT as something you incrementally grow rather than retrain. For low-resource languages — where every training hour and every sentence pair counts — transferring parameters through a dynamic vocabulary makes adding a language cheap and fast. The idea connects directly to later work on adapting multilingual NMT to unseen languages.

Details & Resources