A tree-to-tree model for statistical machine translation

Cowan, Brooke A. (Brooke Alissa), 1972-

dc.contributor.advisor	Michael J. Collins.	en_US
dc.contributor.author	Cowan, Brooke A. (Brooke Alissa), 1972-	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2009-03-16T19:29:41Z
dc.date.available	2009-03-16T19:29:41Z
dc.date.copyright	2008	en_US
dc.date.issued	2008	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/44689
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.	en_US
dc.description	Includes bibliographical references (p. 227-234).	en_US
dc.description.abstract	In this thesis, we take a statistical tree-to-tree approach to solving the problem of machine translation (MT). In a statistical tree-to-tree approach, first the source-language input is parsed into a syntactic tree structure; then the source-language tree is mapped to a target-language tree. This kind of approach has several advantages. For one, parsing the input generates valuable information about its meaning. In addition, the mapping from a source-language tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output. A main focus of this thesis is to develop a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parse-tree like structure that models clause-level phenomena such as verbal argument structure and lexical word-order. The AEP also contains alignment information that links the source-language input to the target-language output. Instead of learning a mapping from a source-language tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP. The AEP is a complex structure, and learning a mapping from parse trees to AEPs presents a challenging machine learning problem. In this thesis, we use a linear structured prediction model to solve this learning problem. A human evaluation of the AEP-based translation approach in a German-to-English task shows significant improvements in the grammaticality of translations. This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system.	en_US
dc.description.statementofresponsibility	by Brooke Alissa Cowan.	en_US
dc.format.extent	234 p	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	A tree-to-tree model for statistical machine translation	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	289341212	en_US

Files in this item

Name:: 289341212-MIT.pdf
Size:: 17.47Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record