Using linguistic knowledge in statistical machine translation

Zbib, Rabih M. (Rabih Mohamed), 1974-

dc.contributor.advisor	James R. Glass and Steven R. Lerman.	en_US
dc.contributor.author	Zbib, Rabih M. (Rabih Mohamed), 1974-	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.	en_US
dc.date.accessioned	2011-04-25T15:51:36Z
dc.date.available	2011-04-25T15:51:36Z
dc.date.copyright	2010	en_US
dc.date.issued	2010	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/62391
dc.description	Thesis (Ph. D. in Information Technology)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2010.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 153-162).	en_US
dc.description.abstract	In this thesis, we present methods for using linguistically motivated information to enhance the performance of statistical machine translation (SMT). One of the advantages of the statistical approach to machine translation is that it is largely language-agnostic. Machine learning models are used to automatically learn translation patterns from data. SMT can, however, be improved by using linguistic knowledge to address specific areas of the translation process, where translations would be hard to learn fully automatically. We present methods that use linguistic knowledge at various levels to improve statistical machine translation, focusing on Arabic-English translation as a case study. In the first part, morphological information is used to preprocess the Arabic text for Arabic-to-English and English-to-Arabic translation, which reduces the gap in the complexity of the morphology between Arabic and English. The second method addresses the issue of long-distance reordering in translation to account for the difference in the syntax of the two languages. In the third part, we show how additional local context information on the source side is incorporated, which helps reduce lexical ambiguity. Two methods are proposed for using binary decision trees to control the amount of context information introduced. These methods are successfully applied to the use of diacritized Arabic source in Arabic-to-English translation. The final method combines the outputs of an SMT system and a Rule-based MT (RBMT) system, taking advantage of the flexibility of the statistical approach and the rich linguistic knowledge embedded in the rule-based MT system.	en_US
dc.description.statementofresponsibility	by Rabih M. Zbib.	en_US
dc.format.extent	162 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Civil and Environmental Engineering.	en_US
dc.title	Using linguistic knowledge in statistical machine translation	en_US
dc.title.alternative	Using linguistic knowledge in SMT	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D. in Information Technology	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Civil and Environmental Engineering
dc.identifier.oclc	710154183	en_US

Files in this item

Name: 710154183-MIT.pdf
Size: 10.48 MB
Format: PDF
Description: Full printable version


This item appears in the following Collection(s)

Doctoral Theses
