MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Identifying Expression Fingerprints using Linguistic Information

Author(s)
Uzuner, Ozlem
Thumbnail
DownloadMIT-CSAIL-TR-2005-077.ps (170.7Mb)
Additional downloads
Metadata
Show full item record
Abstract
This thesis presents a technology to complement taxation-based policy proposals aimed at addressing the digital copyright problem. Theapproach presented facilitates identification of intellectual propertyusing expression fingerprints. Copyright law protects expression of content. Recognizing literaryworks for copyright protection requires identification of theexpression of their content. The expression fingerprints described inthis thesis use a novel set of linguistic features that capture boththe content presented in documents and the manner of expression usedin conveying this content. These fingerprints consist of bothsyntactic and semantic elements of language. Examples of thesyntactic elements of expression include structures of embedding andembedded verb phrases. The semantic elements of expression consist ofhigh-level, broad semantic categories. Syntactic and semantic elements of expression enable generation ofmodels that correctly identify books and their paraphrases 82% of thetime, providing a significant (approximately 18%) improvement over modelsthat use tfidf-weighted keywords. The performance of models builtwith these features is also better than models created with standardfeatures used in stylometry (e.g., function words), which yield anaccuracy of 62%.In the non-digital world, copyright holders collect revenues bycontrolling distribution of their works. Current approaches to thedigital copyright problem attempt to provide copyright holders withthe same kind of control over distribution by employing Digital RightsManagement (DRM) systems. However, DRM systems also enable copyrightholders to control and limit fair use, to inhibit others' speech, andto collect private information about individual users of digitalworks.Digital tracking technologies enable alternate solutions to thedigital copyright problem; some of these solutions can protectcreative incentives of copyright holders in the absence of controlover distribution of works. Expression fingerprints facilitatedigital tracking even when literary works are DRM- and watermark-free,and even when they are paraphrased. As such, they enable meteringpopularity of works and make practicable solutions that encouragelarge-scale dissemination and unrestricted use of digital works andthat protect the revenues of copyright holders, for example throughtaxation-based revenue collection and distribution systems, withoutimposing limits on distribution.
Date issued
2005-11-18
URI
http://hdl.handle.net/1721.1/30587
Other identifiers
MIT-CSAIL-TR-2005-077
AITR-2005-004
Series/Report no.
Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory
Keywords
AI, natural language processing, syntactic information, content, expression

Collections
  • CSAIL Technical Reports (July 1, 2003 - present)

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.