Information extraction to facilitate translation of natural language legislation

Wang, Samuel (Samuel Siyue)

dc.contributor.advisor	Hal Abelson.	en_US
dc.contributor.author	Wang, Samuel (Samuel Siyue)	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2011-06-20T15:57:58Z
dc.date.available	2011-06-20T15:57:58Z
dc.date.copyright	2011	en_US
dc.date.issued	2011	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/64598
dc.description	Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 65-66).	en_US
dc.description.abstract	There is a large body of existing legislation and policies that govern how government organizations and corporations can share information. Since these rules are generally expressed in natural language, it is difficult and labor-intensive to verify whether data-sharing events comply with the relevant policies. This work aims to develop a natural language processing framework that automates significant portions of this translation process, so that legal policies are more accessible to existing automated reasoning systems. Even though these laws are expressed in natural language, in this very specific domain only a handful of sentence structures are actually used to convey logic. This structure can be exploited so that the program can automatically detect the actor, action, object, and conditions of each rule. In addition, once the structure of a rule is identified, similar rules can be presented to the user. If integrated into an authoring environment, this will allow the user to reuse previously translated rules as templates to translate novel rules more easily, independent of the target language for translation. A body of 315 real-world rules from 12 legal sources was collected and annotated for this project. Cross-validation experiments were conducted on this annotated data set, and the developed system identified the underlying rule structure 43% of the time and annotated the underlying tokens with a recall of 0.66 and a precision of 0.66. In addition, for 70% of the rules in each test set, the underlying rule structure had been seen in the training set. This supports the hypothesis that rules are expressed in only a limited number of ways.	en_US
dc.description.statementofresponsibility	by Samuel Wang.	en_US
dc.format.extent	66 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Information extraction to facilitate translation of natural language legislation	en_US
dc.type	Thesis	en_US
dc.description.degree	S.M.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	727067697	en_US

Files in this item

Name: 727067697-MIT.pdf
Size: 3.208Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Graduate Theses
