Information integration using contextual knowledge and ontology merging
Sloan School of Management.
Stuart E. Madnick.
With advances in telecommunications and the introduction of the Internet, information systems have achieved physical connectivity but have yet to establish logical connectivity. Lack of logical connectivity often invites disaster, as in the case of the Mars Climate Orbiter, which was lost because one team used metric units and the other English units while exchanging critical maneuver data. In this thesis, we focus on two intertwined subproblems of logical connectivity, namely data extraction and data interpretation, in the domain of heterogeneous information systems. The first challenge, data extraction, is about making it easy to exchange data among semi-structured and structured information systems. We describe the design and implementation of a general-purpose, regular-expression-based wrapper engine, Cameleon, with an integrated capabilities-aware planner/optimizer/executioner. The second challenge, data interpretation, deals with the existence of heterogeneous contexts, whereby each source of information and each potential receiver of that information may operate with a different context, leading to large-scale semantic heterogeneity. We extend the existing formalization of the COIN framework with new logical formalisms and features to handle a larger set of heterogeneities between data sources. This extension, named Extended Context Interchange (ECOIN), is motivated by our analysis of financial information systems, which indicates that there are three fundamental types of heterogeneity in data sources: contextual, ontological, and temporal. While the COIN framework was able to deal with contextual heterogeneities, the ECOIN framework expands the scope to include ontological heterogeneities as well. In particular, we are able to deal with equational ontological conflicts (EOC), which refer to heterogeneity in the way data items are calculated from other data items in terms of definitional equations.
ECOIN provides a context-based solution to the EOC problem based on a novel approach that integrates abductive reasoning and symbolic equation solving techniques in a unified framework. Furthermore, we address the merging of independently built ECOIN applications, which involves merging disparate ontologies and contextual knowledge. The relationship between ECOIN and the Semantic Web is also discussed. Finally, we demonstrate the feasibility and features of our integration approach with a prototype implementation that provides mediated access to heterogeneous information systems.
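To make the notion of an equational ontological conflict concrete, the following is a minimal sketch and not the ECOIN implementation: it does not reproduce the abductive reasoning described above, and the equation, the term names (`profit`, `revenue`, `expenses`), the scale factors, and the figures are all invented for illustration. It assumes a source that reports revenue and profit in thousands and a receiver that wants expenses in single units, so the mediator must first convert between contexts and then solve the definitional equation for the missing term.

```python
# Hypothetical illustration: names and figures are invented, not taken from the thesis.

# A definitional equation is stored as the coefficients of a linear relation
# sum(coeff * term) == 0. Here: profit - revenue + expenses == 0,
# which is the same as profit = revenue - expenses.
PROFIT_DEFINITION = {"profit": 1.0, "revenue": -1.0, "expenses": 1.0}


def solve_for(equation, target, known):
    """Solve a linear definitional equation for `target`, given all other terms."""
    coeff = equation[target]
    if coeff == 0:
        raise ValueError(f"{target!r} does not appear in the equation")
    # Sum the contribution of every term except the one being solved for.
    rest = sum(c * known[name] for name, c in equation.items() if name != target)
    return -rest / coeff


def to_receiver_context(value, source_scale, receiver_scale):
    """Convert a number between scale-factor contexts (e.g. thousands to units)."""
    return value * source_scale / receiver_scale


# The source reports revenue and profit in thousands.
source_row = {"revenue": 500.0, "profit": 120.0}

# Step 1: resolve the contextual difference (scale factor 1000 -> 1).
converted = {k: to_receiver_context(v, 1000, 1) for k, v in source_row.items()}

# Step 2: resolve the equational difference by solving for the missing term.
expenses = solve_for(PROFIT_DEFINITION, "expenses", converted)
print(expenses)
```

The same `solve_for` helper can be asked for any term of the equation, which is the point of keeping the definition declarative: a receiver that needs `profit` from a source reporting `revenue` and `expenses` reuses the same relation rather than a separately hand-written formula.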
Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, 2003. Includes bibliographical references (p. 145-152).
Department: Sloan School of Management.
Massachusetts Institute of Technology