Show simple item record

dc.contributor.advisorRandall Davis and Boris Katz.en_US
dc.contributor.authorUzuner, Ozlem, 1975-en_US
dc.contributor.otherMassachusetts Institute of Technology. Technology, Management, and Policy Program.en_US
dc.date.accessioned2006-03-24T18:37:00Z
dc.date.available2006-03-24T18:37:00Z
dc.date.copyright2005en_US
dc.date.issued2005en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/30286
dc.descriptionThesis (Ph. D.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology, Management, and Policy Program, 2005.en_US
dc.descriptionIncludes bibliographical references (leaves 181-192).en_US
dc.description.abstractThis thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. The approach presented facilitates identification of intellectual property using expression fingerprints. Copyright law protects expression of content. Recognizing literary works for copyright protection requires identification of the expression of their content. The expression fingerprints described in this thesis use a novel set of linguistic features that capture both the content presented in documents and the manner of expression used in conveying this content. These fingerprints consist of both syntactic and semantic elements of language. Examples of the syntactic elements of expression include structures of embedding and embedded verb phrases. The semantic elements of expression consist of high-level, broad semantic categories. Syntactic and semantic elements of expression enable generation of models that correctly identify books and their paraphrases 82% of the time, providing a significant (approximately 18%) improvement over models that use tfidf-weighted keywords. The performance of models built with these features is also better than models created with standard features used in stylometry (e.g., function words), which yield an accuracy of 62%. In the non-digital world, copyright holders collect revenues by controlling distribution of their works. Current approaches to the digital copyright problem attempt to provide copyright holders with the same kind of control over distribution by employing Digital Rights Management (DRM) systems.en_US
dc.description.abstract(cont.) However, DRM systems also enable copyright holders to control and limit fair use, to inhibit others' speech, and to collect private information about individual users of digital works. Digital tracking technologies enable alternate solutions to the digital copyright problem; some of these solutions can protect creative incentives of copyright holders in the absence of control over distribution of works. Expression fingerprints facilitate digital tracking even when literary works are DRM- and watermark-free, and even when they are paraphrased. As such, they enable metering popularity of works and make practicable solutions that encourage large-scale dissemination and unrestricted use of digital works and that protect the revenues of copyright holders, for example through taxation-based revenue collection and distribution systems, without imposing limits on distribution.en_US
dc.description.statementofresponsibilityby Özlem Uzuner.en_US
dc.format.extent200 leavesen_US
dc.format.extent11620170 bytes
dc.format.extent11646101 bytes
dc.format.mimetypeapplication/pdf
dc.format.mimetypeapplication/pdf
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsM.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582
dc.subjectTechnology, Management, and Policy Program.en_US
dc.titleIdentifying expression fingerprints using linguistic informationen_US
dc.typeThesisen_US
dc.description.degreePh.D.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Engineering Systems Division
dc.contributor.departmentTechnology and Policy Program
dc.identifier.oclc60933836en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record