dc.contributor.advisor: Samuel Madden. [en_US]
dc.contributor.author: Cattori, Pedro [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2016-12-22T16:28:01Z
dc.date.available: 2016-12-22T16:28:01Z
dc.date.copyright: 2016 [en_US]
dc.date.issued: 2016 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/106077
dc.description: Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. [en_US]
dc.description: Cataloged from PDF version of thesis. [en_US]
dc.description: Includes bibliographical references (pages 86-87). [en_US]
dc.description.abstract: The Field Extraction Library (FEL) provides functions for named-entity extraction within free text. FEL models the content structure of the specified named entities rather than relying on brittle, context-specific separator logic. Users specify the names of the fields they wish to extract, which determine the number of states for an underlying Hidden Markov Model. The observable emission set is pre-determined by FEL's tokenizer. Once the model topology is set, users provide training examples of the form: x = raw text, y = {field1: val1, field2: val2, ...}. FEL learns the parameters of the underlying Hidden Markov Model by maximum-likelihood estimation on the training examples. FEL is designed to operate on small, sparse training data. As a result, users can provide few (fewer than 10) training examples to bootstrap the model. FEL offers 3 iterative mechanisms for scaling data quality as users provide guidance through additional feedback: (1) accept more training examples, (2) create landmark states, and (3) bridge related states with state bridges. FEL detects ambiguities both in its internal model and in the extraction results to prompt users for more feedback. Once the model yields acceptable result quality, users can extract fields into a table for easy querying and exporting. [en_US]
dc.description.statementofresponsibility: by Pedro Cattori. [en_US]
dc.format.extent: 87 pages [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Extracting fields from free-text [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M. Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 965198310 [en_US]
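
The training scheme described in the abstract (field names fix the HMM states, examples of the form x = raw text, y = {field: value} fix the parameters) can be illustrated with a small Python sketch. This is not FEL's code or API, which this record does not describe. Every name here (`emission_class`, `tokenize`, `label_tokens`, `train`, `viterbi`, `extract`) is hypothetical, as are the four coarse emission classes, the extra "other" state for tokens outside any field, and the add-one smoothing, which makes this a smoothed variant of maximum-likelihood counting rather than the thesis's exact estimator. Landmark states and state bridges are not modeled.

```python
import math
import re
from collections import Counter, defaultdict

SYMBOLS = ["NUM", "CAP", "WORD", "SYM"]  # assumed emission set, not FEL's

def emission_class(token):
    """Map a token to a coarse emission symbol (hypothetical tokenizer classes)."""
    if token.isdigit():
        return "NUM"
    if token.isalpha():
        return "CAP" if token[0].isupper() else "WORD"
    return "SYM"

def tokenize(text):
    """Split text into runs of word characters and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def label_tokens(text, fields):
    """Turn one (x, y) example into tokens plus a per-token state sequence."""
    tokens = tokenize(text)
    labels = ["other"] * len(tokens)
    for name, value in fields.items():
        target = tokenize(value)
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:  # first match wins
                labels[i:i + len(target)] = [name] * len(target)
                break
    return tokens, labels

def train(examples, field_names, alpha=1.0):
    """Estimate start, transition, and emission log-probabilities by counting."""
    states = list(field_names) + ["other"]
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for text, fields in examples:
        tokens, labels = label_tokens(text, fields)
        if not labels:
            continue
        start[labels[0]] += 1
        for prev, cur in zip(labels, labels[1:]):
            trans[prev][cur] += 1
        for tok, lab in zip(tokens, labels):
            emit[lab][emission_class(tok)] += 1

    def log_probs(counts, keys):
        # Add-alpha smoothing keeps unseen events from getting probability zero.
        total = sum(counts[k] for k in keys) + alpha * len(keys)
        return {k: math.log((counts[k] + alpha) / total) for k in keys}

    return {
        "states": states,
        "start": log_probs(start, states),
        "trans": {s: log_probs(trans[s], states) for s in states},
        "emit": {s: log_probs(emit[s], SYMBOLS) for s in states},
    }

def viterbi(model, tokens):
    """Return the most likely state sequence for the tokens."""
    if not tokens:
        return []
    states = model["states"]
    obs = [emission_class(t) for t in tokens]
    score = {s: model["start"][s] + model["emit"][s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev_score, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda p: prev_score[p] + model["trans"][p][s])
            score[s] = prev_score[best] + model["trans"][best][s] + model["emit"][s][o]
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def extract(model, text):
    """Decode the text and group tokens by the field state assigned to them."""
    tokens = tokenize(text)
    fields = defaultdict(list)
    for tok, state in zip(tokens, viterbi(model, tokens)):
        if state != "other":
            fields[state].append(tok)
    return {name: " ".join(toks) for name, toks in fields.items()}

# Two made-up training examples in the x / y form from the abstract.
examples = [
    ("Alice Smith , age 34", {"name": "Alice Smith", "age": "34"}),
    ("Bob Jones , age 27", {"name": "Bob Jones", "age": "27"}),
]
model = train(examples, ["name", "age"])
result = extract(model, "Carol White , age 51")
print(result)
```

Because the emissions are token classes rather than literal words, the model trained on two examples can label a name and number it has never seen, which is one plausible reading of how a content-structure model copes with small, sparse training data.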

