DSpace@MIT (MIT Libraries)


Advancements in Word Alignment: Introducing a Novel Count-Based Subword Model Alongside Neural and Ensemble Models

Author(s)
Ghosh, Shinjini
Advisor
Andreas, Jacob
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
The task of aligning words across source and target languages, known as word alignment, plays a crucial role in natural language processing and machine translation. This thesis addresses the word alignment problem by developing and comparing three models: a count-based subword model, a baseline encoder-decoder neural alignment model, and an ensemble model. The count-based subword model utilizes statistical measures and co-occurrence statistics for word alignment estimation. The neural alignment model employs an encoder-decoder architecture with attention mechanisms for end-to-end alignment learning. The ensemble model combines the strengths of both the count-based and neural models to improve alignment accuracy and robustness. Through extensive experimentation, we demonstrate the effectiveness of each model in capturing subword boundaries, identifying relationships, and aligning words across parallel sentences. The results highlight the superior performance of the count-based subword model and the ensemble model, showcasing the potential for more accurate and robust alignment techniques with applications in various natural language processing tasks. This research contributes to the advancement of word alignment techniques, providing valuable insights and methods for enhancing multilingual processing, machine translation, and other language-related applications.
Date issued
2023-06
URI
https://hdl.handle.net/1721.1/151524
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses