Attribution principles for data integration : technology and policy perspectives
Author(s)
Lee, Thomas Y. (Thomas Yupoo)
Alternative title
Principles of attribution for data integration : technology and policy perspectives
Other Contributors
Massachusetts Institute of Technology. Technology, Management, and Policy Program.
Advisor
Stuart E. Madnick.
Abstract
This thesis addresses problems of attribution that arise from the data integration that is exemplified by data re-use and re-distribution on the Web. We present two different perspectives. We begin with a simple definition of attribution, asking what data are we interested in and where does it come from? A formal model and its properties are defined, implementation in an extended relational algebra is described, and application to semistructured data on the Web is discussed. However, because the problem is more than simply what and where, we then expand the scope of our analysis. From the perspective of intellectual property policies, we adopt a broader view of the attribution problem space. A policy analysis that surveys the status quo policy landscape and stakeholder interests is followed by specific policy recommendations. Informed by our technology perspective, we offer two new arguments to support misappropriation as a policy approach to the attribution problem space.
Our formal model of attribution is developed in the established foundation of the Domain Relational Calculus (DRC). Three distinct types of attribution are identified: comprehensive, source, and relevant. For each type, we consider the attribution of equivalent DRC expressions, attribution for composed queries, and granularity. An algebra is presented to implement the model. The extended algebra is closed, reduces to the standard relational algebra, and is a consistent extension of the standard algebra.
The policy perspective encompasses not only what and where but also integration architectures and the relationships between data providers and users. Information technologies separate the processes and products of data gathering from data selection and presentation. Where the latter is addressed by copyright, the former is not addressed at all. Based upon two traditional, legal-economic frameworks, the asymmetric Prisoner's Dilemma and Entitlement Theory, we argue for a policy of misappropriation to support integration and attribution for data.
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology, Management, and Policy Program, 2002. Includes bibliographical references (p. [229]-250).
Date issued
2002
Department
Massachusetts Institute of Technology. Engineering Systems Division; Technology and Policy Program
Publisher
Massachusetts Institute of Technology
Keywords
Technology, Management, and Policy Program.