Constructing Low Resource Approaches to Improve Speech-to-text Translation from Modern Standard Arabic to English
Author(s)
Manna, Rami
Advisor
Glass, James R.
Belinkov, Yonatan
Abstract
This thesis explores new approaches to Arabic-English speech-to-text translation. First, we construct a novel parallel dataset of Modern Standard Arabic speech and English text. Second, we propose a novel framework for leveraging unsupervised machine translation to improve speech-to-text translation, and apply this framework to Arabic-English speech-to-text translation. Specifically, we propose a 3-step cascade approach. In step 1, a speech recognition model transcribes the Arabic speech into Arabic text. In step 2, unsupervised machine translation learns a mapping between the output of the speech recognition model (transcribed Arabic) and Modern Standard Arabic (formal written Arabic). In step 3, an Arabic-English machine translation model translates the output of the unsupervised model into English. Our third contribution is an exploration of approaches to low-resource end-to-end speech-to-text translation, in which we present and compare two approaches for synthesizing parallel training data. Finally, we compare the end-to-end approach with the cascaded approach. We found that the 3-step cascade did not perform as well as the 2-step cascaded speech-to-text baseline. However, we show that an end-to-end model trained with synthetic English text achieves performance similar to that of the 2-step cascaded baseline.
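Structurally, the cascade described in the abstract is a left-to-right composition of independently trained models. The following is a minimal sketch of that composition only, under stated assumptions. The stage functions `asr`, `unsup_normalize`, and `mt` are hypothetical toy placeholders, not the thesis's models. We also assume that the 2-step baseline is the same pipeline with step 2 removed (speech recognition followed directly by machine translation).

```python
from typing import Callable, Sequence

# A stage maps one representation to the next (speech -> text, text -> text).
Stage = Callable[[object], object]


def make_cascade(stages: Sequence[Stage]) -> Stage:
    """Compose stages left to right: each stage's output feeds the next."""
    def run(x: object) -> object:
        for stage in stages:
            x = stage(x)
        return x
    return run


# Hypothetical stand-ins only; a real system would call trained models here.
def asr(audio: str) -> str:
    # Step 1 (toy): "transcribe" speech by stripping an <audio> marker.
    marker = "<audio>"
    return audio[len(marker):] if audio.startswith(marker) else audio


def unsup_normalize(asr_text: str) -> str:
    # Step 2 (toy): map ASR-style Arabic toward formal MSA by tagging it.
    return f"msa({asr_text})"


def mt(arabic_text: str) -> str:
    # Step 3 (toy): "translate" Arabic text to English by tagging it.
    return f"en({arabic_text})"


# Assumed 2-step baseline: step 2 is simply omitted.
two_step = make_cascade([asr, mt])
three_step = make_cascade([asr, unsup_normalize, mt])
```

For example, `two_step("<audio>salam")` yields `"en(salam)"`, while `three_step("<audio>salam")` yields `"en(msa(salam))"`. One consequence of this design is that errors compound: each stage consumes the previous stage's output, so a transcription error in step 1 propagates through every later step. The end-to-end approach replaces the whole chain with a single model.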
Date issued
2021-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology