Show simple item record

dc.contributor.advisor | Catherine Havasi and Kevin C. Gold. | en_US
dc.contributor.author | Sittig, Karen Anne | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2014-03-06T15:46:31Z |
dc.date.available | 2014-03-06T15:46:31Z |
dc.date.copyright | 2013 | en_US
dc.date.issued | 2013 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/85500 |
dc.description | Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013. | en_US
dc.description | Cataloged from PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages 59-60). | en_US
dc.description.abstract | Writing programs that are capable of completing complex tasks on web pages is difficult due to the inconsistent nature of the pages themselves. While there exist best practices for developing naming schemes for page elements, these schemes are not strictly enforced, making it difficult to develop a general-use automated system. Many pages must be hand-labeled if they are to be incorporated into an automated testing framework. In this thesis, I build an application that assists human users in classifying and labeling web pages. This system uses a gradient boosting classifier from the scikit-learn Python package to identify which of four tasks may be performed on a given web page. It also attempts to automatically label the input fields and buttons on the web page using a gradient boosting classifier. It outputs its results in a format that can be easily consumed by the LARIAT system at MIT Lincoln Laboratory, greatly reducing the human labor required to incorporate new web pages into the system. | en_US
dc.description.statementofresponsibility | by Karen Anne Sittig. | en_US
dc.format.extent | 60 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | An affordance-inspired tool for automated web page labeling and classification | en_US
dc.title.alternative | Automated web navigation via affordance learning | en_US
dc.type | Thesis | en_US
dc.description.degree | M. Eng. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science |
dc.identifier.oclc | 871002879 | en_US


