Advancements in Word Alignment: Introducing a Novel Count-Based Subword Model Alongside Neural and Ensemble Models
Author(s)
Ghosh, Shinjini
Advisor
Andreas, Jacob
Abstract
The task of aligning words across source and target languages, known as word alignment, plays a crucial role in natural language processing and machine translation. This thesis addresses the word alignment problem by developing and comparing three models: a count-based subword model, a baseline encoder-decoder neural alignment model, and an ensemble model. The count-based subword model estimates alignments from co-occurrence statistics over subword units. The neural alignment model employs an encoder-decoder architecture with attention mechanisms for end-to-end alignment learning. The ensemble model combines the strengths of the count-based and neural models to improve alignment accuracy and robustness. Through extensive experimentation, we demonstrate the effectiveness of each model in capturing subword boundaries, identifying cross-lingual correspondences, and aligning words across parallel sentences. The results highlight the superior performance of the count-based subword model and the ensemble model, demonstrating the potential for more accurate and robust alignment techniques with applications in a range of natural language processing tasks. This research contributes to the advancement of word alignment techniques, providing insights and methods for improving multilingual processing, machine translation, and other language-related applications.
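The count-based subword model described above relies on co-occurrence statistics gathered from parallel sentences. As a rough illustration of that general idea (not the thesis's actual method), the sketch below scores source/target token pairs with the Dice coefficient over a toy parallel corpus and links each source token to its best-scoring target token; the function name, the threshold value, and the example data are all assumptions made for illustration.

from collections import Counter
from itertools import product

def dice_alignments(parallel_corpus, threshold=0.3):
    """Illustrative count-based aligner: scores source/target token pairs
    by the Dice coefficient over sentence-level co-occurrence counts, then
    links each source token to its best-scoring target token."""
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()

    # Collect sentence-level co-occurrence statistics.
    for src_sent, tgt_sent in parallel_corpus:
        src_counts.update(set(src_sent))
        tgt_counts.update(set(tgt_sent))
        pair_counts.update(product(set(src_sent), set(tgt_sent)))

    def dice(s, t):
        # 2 * joint count / (source marginal count + target marginal count)
        return 2 * pair_counts[(s, t)] / (src_counts[s] + tgt_counts[t])

    # Link each source position to its highest-scoring target position.
    alignments = []
    for src_sent, tgt_sent in parallel_corpus:
        links = []
        for i, s in enumerate(src_sent):
            j = max(range(len(tgt_sent)), key=lambda j: dice(s, tgt_sent[j]))
            if dice(s, tgt_sent[j]) >= threshold:
                links.append((i, j))
        alignments.append(links)
    return alignments

# Toy English-German data with pre-split tokens, purely for illustration.
corpus = [
    (["the", "house"], ["das", "haus"]),
    (["the", "cat"], ["die", "katze"]),
    (["a", "house"], ["ein", "haus"]),
]
print(dice_alignments(corpus))  # e.g. [[(0, 0), (1, 1)], ...]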
Date issued
2023-06
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology