Authors

Guillaume Lample, Myle Ott, Alexis Conneau, Ludovic Denoyer, Marc’Aurelio Ranzato

[Paper]

Overview

Data scarcity is among the main challenges in training a usable Neural Machine Translation (NMT) model. Despite the progress made for high-resource language pairs (such as English-German), most language pairs lack the parallel data needed to train an NMT system. In this first post of my paper-review series, I summarize “Phrase-Based & Neural Unsupervised Machine Translation”, to be presented at EMNLP 2018. The authors propose two model variants: i) phrase-based and ii) neural.

Illustration

Both the phrase-based and neural variants rely on three core principles (#P1, #P2, and #P3), and consequently outperform state-of-the-art approaches to unsupervised translation.

[Figure: toy illustration of the three principles of unsupervised MT]

The illustration aims to visualize the idea behind the three principles:

– A) shows the distributions of the two monolingual datasets (see the legend).

– B) Initialization: the two distributions are roughly aligned using a mechanism such as word-by-word translation.

– C) Language Modeling (LM): a language model is learned for each language; the LMs are then used to denoise examples.

– D) Back-translation: a source→target inference step is followed by a target→source inference step that reconstructs the examples in the original language. The same procedure is applied in the reverse translation direction, providing the feedback signals for optimizing both the target→source and source→target models.

P1: Model Initialization
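
The authors bootstrap translation by aligning the two languages at the word level, e.g. via a bilingual dictionary inferred from cross-lingual word embeddings (the neural model additionally uses joint BPE tokenization and shared embeddings). Below is a minimal sketch of the word-by-word idea; the vocabularies and embeddings are toy placeholders, and a real system would load embeddings already aligned across languages (e.g. with MUSE).

```python
import numpy as np

def word_by_word_translate(sentence, src_vocab, tgt_words, src_emb, tgt_emb):
    """Translate each token to its nearest neighbor in the target embedding
    space (cosine similarity); copy out-of-vocabulary tokens unchanged."""
    tgt_norm = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    out = []
    for word in sentence.split():
        if word not in src_vocab:
            out.append(word)  # e.g. punctuation or rare words
            continue
        v = src_emb[src_vocab[word]]
        sims = tgt_norm @ (v / np.linalg.norm(v))
        out.append(tgt_words[int(sims.argmax())])
    return " ".join(out)

# Toy usage: pretend the two embedding spaces are already aligned.
rng = np.random.default_rng(0)
src_vocab = {"hello": 0, "world": 1}
tgt_words = ["bonjour", "monde"]
src_emb = rng.normal(size=(2, 8))
tgt_emb = src_emb + 0.01 * rng.normal(size=(2, 8))
print(word_by_word_translate("hello world !", src_vocab, tgt_words, src_emb, tgt_emb))
# -> "bonjour monde !"
```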

P2: Language Modeling
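
In practice, the language-modeling principle is realized via denoising autoencoding: the model is trained to reconstruct a sentence from a corrupted version of it. Here is a minimal sketch of a noise function in the spirit of the paper (word dropping plus local word shuffling); the noise parameters below are illustrative, not the paper's exact values.

```python
import random

def add_noise(tokens, p_drop=0.1, k=3):
    """Corrupt a sentence: drop each word with probability p_drop, then
    displace the remaining words by a small random amount (at most ~k slots)."""
    kept = [t for t in tokens if random.random() > p_drop] or tokens[:1]
    keys = [i + random.uniform(0, k) for i in range(len(kept))]
    return [t for _, t in sorted(zip(keys, kept), key=lambda x: x[0])]

sentence = "the cat sat on the mat".split()
print(add_noise(sentence))  # e.g. ['cat', 'the', 'sat', 'the', 'on', 'mat']
# A denoising autoencoder is trained to map add_noise(sentence) -> sentence,
# which gives the decoder a language-model-like fluency signal.
```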

P3: Iterative Back-Translation
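
Back-translation turns monolingual data into pseudo-parallel data: the current target→source model translates target sentences into the source language, and the resulting (pseudo-source, real-target) pairs train the source→target model, and vice versa. A minimal runnable sketch, where ToyModel is a hypothetical stand-in for a real translation model:

```python
class ToyModel:
    """Hypothetical stand-in for a translation model (NMT or PBSMT)."""
    def __init__(self, name):
        self.name = name
    def translate(self, sentence):
        return sentence  # placeholder inference
    def train(self, pairs):
        print(f"{self.name}: update on {len(pairs)} pseudo-parallel pairs")

def back_translation_round(src_mono, tgt_mono, src2tgt, tgt2src):
    # (pseudo-source, real-target) pairs train the source->target model
    src2tgt.train([(tgt2src.translate(t), t) for t in tgt_mono])
    # symmetric step for the other direction
    tgt2src.train([(src2tgt.translate(s), s) for s in src_mono])

src2tgt, tgt2src = ToyModel("src->tgt"), ToyModel("tgt->src")
src_mono, tgt_mono = ["a b c", "d e"], ["x y", "z w v"]
for _ in range(3):  # iterate: better models -> better pseudo-data -> ...
    back_translation_round(src_mono, tgt_mono, src2tgt, tgt2src)
```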

Algorithms

Integrating the above three principles, the neural and phrase-based algorithms are given below, where S and T represent the source and target monolingual corpora, P_s and P_t the language models trained on them, and P_s→t and P_t→s the source→target and target→source translation models.

Neural
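
The neural algorithm alternates the denoising and back-translation objectives on a shared encoder-decoder. The skeleton below is my condensed paraphrase of its control flow, with no-op placeholder helpers rather than the paper's actual code:

```python
def init_shared_model():
    return {"step": 0}  # P1: in practice, joint BPE + shared embeddings

def denoising_step(model, mono):
    model["step"] += 1  # P2: minimize reconstruction loss of the noised input

def back_translation_step(model, src_mono, tgt_mono):
    model["step"] += 1  # P3: train on freshly generated pseudo-parallel pairs

def unsupervised_nmt(src_mono, tgt_mono, n_iterations=3):
    model = init_shared_model()
    for _ in range(n_iterations):
        denoising_step(model, src_mono)
        denoising_step(model, tgt_mono)
        back_translation_step(model, src_mono, tgt_mono)
    return model

print(unsupervised_nmt(["a b"], ["x y"]))
```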

Phrase-Based
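
The phrase-based algorithm initializes phrase tables from the inferred bilingual dictionary, scores fluency with n-gram language models, and then iteratively retrains each direction on synthetic data produced by the other. The skeleton below paraphrases that loop; translate_corpus and train_pbsmt are hypothetical placeholders (the paper uses Moses, whose actual invocation is omitted):

```python
def translate_corpus(model, corpus):
    return [f"<{model}:{s}>" for s in corpus]  # placeholder inference

def train_pbsmt(pseudo_src, real_tgt, direction):
    return f"{direction}-model"                # placeholder PBSMT training

def unsupervised_pbsmt(S, T, n_iterations=3):
    # P1: initial phrase tables from a dictionary inferred from cross-lingual
    # embeddings; P2: n-gram LMs (e.g. KenLM) score output fluency.
    p_st, p_ts = "s->t-init", "t->s-init"
    for _ in range(n_iterations):              # P3: iterative back-translation
        p_ts = train_pbsmt(translate_corpus(p_st, S), S, "t->s")
        p_st = train_pbsmt(translate_corpus(p_ts, T), T, "s->t")
    return p_st, p_ts

print(unsupervised_pbsmt(["a b"], ["x y"]))
```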

Results

Experiments are conducted on the well-known WMT16 En↔De and WMT14 En↔Fr benchmarks. Combining PBSMT and NMT gives the best results (see the last row).

My Thoughts

More on Unsupervised MT

If you are interested in exploring more unsupervised MT approaches, check out the following works: