MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
  • DSpace@MIT Home
  • Computer Science and Artificial Intelligence Lab (CSAIL)
  • CSAIL Digital Archive
  • CSAIL Technical Reports (July 1, 2003 - present)
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Harvesting Application Information for Industry-Scale Relational Schema Matching

Author(s)
Kushman, Nate; Adib, Fadel; Katabi, Dina; Barzilay, Regina
Thumbnail
DownloadMIT-CSAIL-TR-2013-022.pdf (1.276Mb)
Other Contributors
Natural Language Processing
Advisor
Regina Barzilay
Metadata
Show full item record
Abstract
Consider the problem of migrating a company's CRM or ERP database from one application to another, or integrating two such databases as a result of a merger. This problem requires matching two large relational schemas with hundreds and sometimes thousands of fields. Further, the correct match is likely complex: rather than a simple one-to-one alignment, some fields in the source database may map to multiple fields in the target database, and others may have no equivalent fields in the target database. Despite major advances in schema matching, fully automated solutions to large relational schema matching problems are still elusive. This paper focuses on improving the accuracy of automated large relational schema matching. Our key insight is the observation that modern database applications have a rich user interface that typically exhibits more consistency across applications than the underlying schemas. We associate UI widgets in the application with the underlying database fields on which they operate and demonstrate that this association delivers new information useful for matching large and complex relational schemas. Additionally, we show how to formalize the schema matching problem as a quadratic program, and solve it efficiently using standard optimization and machine learning techniques. We evaluate our approach on real-world CRM applications with hundreds of fields and show that it improves the accuracy by a factor of 2-4x.
Date issued
2013-09-10
URI
http://hdl.handle.net/1721.1/80380
Series/Report no.
MIT-CSAIL-TR-2013-022

Collections
  • CSAIL Technical Reports (July 1, 2003 - present)

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.