Neural Machine Translation of Text from Non-Native Speakers

Antonios Anastasopoulos, Alison Lui, Toan Q. Nguyen, David Chiang

June 2019

PDF Code Dataset DOI

Abstract

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.0 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.

Type

Conference paper

Publication

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

MT robustness

Antonios Anastasopoulos

Assistant Professor

I work on multilingual models, machine translation, speech recognition, and NLP for under-served languages.

Neural Machine Translation of Text from Non-Native Speakers

Abstract

Antonios Anastasopoulos

Assistant Professor

Related