Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization
Author(s)
Zhang, Mozhi; Xu, Keyulu; Kawarabayashi, Ken-ichi; Jegelka, Stefanie Sabrina; Boyd-Graber, Jordan
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Cross-lingual word embeddings (CLWE) underlie many multilingual natural language processing systems, often through orthogonal transformations of pre-trained monolingual embeddings. However, orthogonal mapping only works on language pairs whose embeddings are naturally isomorphic. For non-isomorphic pairs, our method (Iterative Normalization) transforms monolingual embeddings to make orthogonal alignment easier by simultaneously enforcing that (1) individual word vectors are unit length, and (2) each language's average vector is zero. Iterative Normalization consistently improves word translation accuracy of three CLWE methods, with the largest improvement observed on English-Japanese (from 2% to 44% test accuracy).
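The two constraints named in the abstract can be illustrated with a short NumPy sketch. It assumes the constraints are enforced by alternating length normalization and mean centering for a fixed number of rounds; the abstract does not state the schedule or stopping rule, so `n_iter`, `eps`, and the function name `iterative_normalization` are illustrative choices rather than the authors' implementation.

```python
import numpy as np


def iterative_normalization(emb, n_iter=5, eps=1e-12):
    """Alternately rescale rows to unit length and center the matrix.

    emb    : array of shape (vocab_size, dim), one word vector per row.
    n_iter : number of alternation rounds (assumed; not given in the abstract).
    eps    : guard against division by zero for all-zero rows (assumed).
    """
    x = np.asarray(emb, dtype=np.float64).copy()
    for _ in range(n_iter):
        # (1) Scale every word vector to unit Euclidean length.
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        x /= np.maximum(norms, eps)
        # (2) Subtract the average vector so the language's mean is zero.
        x -= x.mean(axis=0, keepdims=True)
    return x
```

Each step slightly disturbs the other constraint (centering changes vector lengths, and rescaling shifts the mean), which is why the sketch repeats both rather than applying each once. In this sketch the function would be applied separately to each language's monolingual embeddings before an orthogonal map is learned between them.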
Date issued
2019-07
Department
Massachusetts Institute of Technology. Department of Linguistics and Philosophy
Journal
57th Annual Meeting of the Association for Computational Linguistics
Publisher
Association for Computational Linguistics
Citation
Zhang, Mozhi et al. "Are Girls Neko or Shōjo? Cross-Lingual Alignment of Non-Isomorphic Embeddings with Iterative Normalization." 57th Annual Meeting of the Association for Computational Linguistics, July 2019, Florence, Italy, Association for Computational Linguistics, July 2019. © 2019 Association for Computational Linguistics
Version: Final published version