Date of Degree

2-2020

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy

Program

Computer Science

Advisor

Jia Xu

Committee Members

Robert Haralick

Lei Xie

Hui Wan

Subject Categories

Artificial Intelligence and Robotics | Computer Sciences

Keywords

Machine Learning, Deep Learning, Natural Language Processing, Neural Machine Translation

Abstract

This thesis aims for general robust Neural Machine Translation (NMT) that is agnostic to the test domain. NMT has achieved high quality on benchmarks with closed datasets such as WMT and NIST but can fail when the translation input contains noise due to, for example, mismatched domains or spelling errors. The standard solution is to apply domain adaptation or data augmentation to build a domain-dependent system. However, in real life, the input noise varies in a wide range of domains and types, which is unknown in the training phase. This thesis introduces five general approaches to improve NMT accuracy and robustness, where three of them are invariant to models, test domains, and noise types. First, we describe a novel unsupervised text normalization framework Lex-Var, to reduce the lexical variations for NMT. Then, we apply the phonetic encoding as auxiliary linguistic information and obtained very significant (5 BLEU point) improvement in translation quality and robustness. Furthermore, we introduce the random clustering encoding method based on our hypothesis of Semantic Diversity by Phonetics and generalizes to all languages. We also discussed two domain adaptation models for the known test domain. Finally, we provide a measurement of translation robustness based on the consistency of translation accuracy among samples and use it to evaluate our other methods. All these approaches are verified with extensive experiments across different languages and achieved significant and consistent improvements in translation quality and robustness over the state-of-the-art NMT.

Recommended Citation

Khan, Abdul Rafae, "Robust Neural Machine Translation" (2020). CUNY Academic Works.
https://academicworks.cuny.edu/gc_etds/3532

Download

Included in

Artificial Intelligence and Robotics Commons

COinS

Dissertations, Theses, and Capstone Projects

Robust Neural Machine Translation

Date of Degree

Document Type

Degree Name

Program

Advisor

Committee Members

Subject Categories

Keywords

Abstract

Recommended Citation

Included in

Browse

Author Corner

Search

Links

Dissertations, Theses, and Capstone Projects

Robust Neural Machine Translation

Author

Date of Degree

Document Type

Degree Name

Program

Advisor

Committee Members

Subject Categories

Keywords

Abstract

Recommended Citation

Included in

Share

Browse

Author Corner

Search

Links