Advancements in Word Alignment: Introducing a Novel Count-Based Subword Model Alongside Neural and Ensemble Models
Author(s)
Ghosh, Shinjini
Advisor
Andreas, Jacob
Abstract
The task of aligning words across source and target languages, known as word alignment, plays a crucial role in natural language processing and machine translation. This thesis addresses the word alignment problem by developing and comparing three models: a count-based subword model, a baseline encoder-decoder neural alignment model, and an ensemble model. The count-based subword model estimates alignments from co-occurrence statistics over subword units. The neural alignment model employs an encoder-decoder architecture with attention mechanisms for end-to-end alignment learning. The ensemble model combines the strengths of the count-based and neural models to improve alignment accuracy and robustness. Through extensive experimentation, we demonstrate the effectiveness of each model in capturing subword boundaries, identifying cross-lingual correspondences, and aligning words across parallel sentences. The results highlight the superior performance of the count-based subword model and the ensemble model, demonstrating the potential for more accurate and robust alignment techniques with applications in a range of natural language processing tasks. This research contributes to the advancement of word alignment techniques, providing insights and methods for improving multilingual processing, machine translation, and other language-related applications.
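The count-based subword model described above relies on co-occurrence statistics gathered from parallel sentences. As a rough illustration of that general idea (not the thesis's actual method), the sketch below scores source/target token pairs with the Dice coefficient over a toy parallel corpus and links each source token to its best-scoring target token; the function name, the threshold value, and the example data are all assumptions made for illustration.

from collections import Counter
from itertools import product

def dice_alignments(parallel_corpus, threshold=0.3):
    """Illustrative count-based aligner: scores source/target token pairs
    by the Dice coefficient over sentence-level co-occurrence counts, then
    links each source token to its best-scoring target token."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()

    # Collect sentence-level co-occurrence statistics.
    for src_sent, tgt_sent in parallel_corpus:
        src_counts.update(set(src_sent))
        tgt_counts.update(set(tgt_sent))
        pair_counts.update(product(set(src_sent), set(tgt_sent)))

    def dice(s, t):
        # 2 * joint count / (source marginal count + target marginal count)
        return 2 * pair_counts[(s, t)] / (src_counts[s] + tgt_counts[t])

    # Link each source position to its highest-scoring target position.
    alignments = []
    for src_sent, tgt_sent in parallel_corpus:
        links = []
        for i, s in enumerate(src_sent):
            j = max(range(len(tgt_sent)), key=lambda j: dice(s, tgt_sent[j]))
            if dice(s, tgt_sent[j]) >= threshold:
                links.append((i, j))
        alignments.append(links)
    return alignments

# Toy English-German data with pre-split tokens, purely for illustration.
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "cat"], ["die", "katze"]),
    (["a", "house"], ["ein", "haus"]),
]
print(dice_alignments(corpus))  # e.g. [[(0, 0), (1, 1)], ...]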
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology