Publication: A comparative study of instance-based schema matching in relational database
Date
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Subject LCSH
Relational databases
Semantic computing
Subject ICSI
Call Number
Abstract
Schema matching is regarded as an indispensable process for database integration in many contemporary database systems. Its aim is to identify correspondences between schemas, which in turn serve the data integration process. The main concern in data integration is supporting merge decisions by establishing correspondences among the attributes of heterogeneous data sources. Numerous schema matching techniques in the literature use database instances to detect correspondences between attributes. However, no single technique has provided an accurate and comprehensive match across different types of data. Some techniques treat numeric values as strings, which adversely affects the match and, in turn, the quality of the match results. Likewise, other techniques treat textual instances as numeric, which can reduce match accuracy. This thesis therefore investigates the performance of two different instance-based schema matching techniques, exploring the strengths and weaknesses of each over various types of data sets. The study develops a syntactic instance-based schema matching technique named Regular Expression (RegEx) with the WordNet database, and selects Google similarity as the semantic instance-based schema matching technique. Both methods have been evaluated over three data types: (i) numeric, (ii) alphabetic, and (iii) mixed. Several analyses have been performed on real and synthetic data sets to examine match accuracy in terms of precision (P), recall (R) and F-measure (F).
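As an illustration only, the three data types and the three accuracy measures named in the abstract can be sketched as follows. This is not the thesis's implementation: the regular expressions, the function names `classify_instance` and `precision_recall_f`, and the representation of a match as a pair of attribute names are all assumptions made for this sketch. The measures use the standard definitions, with F taken as the harmonic mean of P and R.

```python
import re
from typing import Set, Tuple

# Hypothetical patterns for the data types named in the abstract.
# The thesis's actual regular expressions are not given in this record.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?")
ALPHABETIC = re.compile(r"[A-Za-z]+(\s[A-Za-z]+)*")


def classify_instance(value: str) -> str:
    """Label one attribute instance as numeric, alphabetic, or mixed."""
    v = value.strip()
    if NUMERIC.fullmatch(v):
        return "numeric"
    if ALPHABETIC.fullmatch(v):
        return "alphabetic"
    return "mixed"  # anything that is neither purely numeric nor alphabetic


# A match is modelled here as a (source attribute, target attribute) pair.
Match = Tuple[str, str]


def precision_recall_f(found: Set[Match],
                       expected: Set[Match]) -> Tuple[float, float, float]:
    """Compute precision, recall and F-measure of found matches
    against a reference set of expected matches."""
    correct = len(found & expected)
    p = correct / len(found) if found else 0.0
    r = correct / len(expected) if expected else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f
```

For example, if a matcher proposes three attribute pairs and two of them appear in a reference set of three expected pairs, precision, recall and F-measure all equal 2/3.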