Publication:
A comparative study of instance-based schema matching in relational database

Date

2017

Publisher

Kuala Lumpur : International Islamic University Malaysia, 2017

Subject LCSH

Data integration (Computer science)
Relational databases
Semantic computing

Call Number

t QA 76.9 D338 K45C 2017

Abstract

Schema matching is deemed to be an indispensable process for database integration in many contemporary database systems. The aim of schema matching is to identify correspondences between schemas, which ultimately serves the data integration process. The main concern in data integration is to support the merging decision by establishing correspondences among the attributes of heterogeneous data sources. Numerous schema matching techniques that use database instances to detect correspondences between attributes have been suggested in the literature. However, no single technique has managed to provide an accurate and comprehensive match for different types of data. Some techniques treat numeric values as strings, which adversely affects the match and, in turn, the quality of the match results. Likewise, other techniques tend to treat textual instances as numeric, which might negatively influence the accuracy of the match. Thus, this thesis investigates the performance of two different instance-based schema matching techniques. The study explores the strengths and weaknesses of each technique over various types of data sets. It develops a syntactic instance-based schema matching technique based on regular expressions (RegEx) with the WordNet database, and selects Google similarity as a semantic instance-based schema matching technique. Both methods have been evaluated over three different data types: (i) numeric, (ii) alphabetic, and (iii) mixed. Several analyses have been performed on real and synthetic data sets to examine match accuracy with respect to precision (P), recall (R), and F-measure (F).
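
The abstract reports match accuracy in terms of precision (P), recall (R), and F-measure (F) but does not spell out how they are computed. The sketch below uses the standard set-based definitions, assuming a match result is a set of attribute pairs compared against a manually determined reference set. The function name `match_scores` and the attribute names in the example are hypothetical and are not taken from the thesis.

```python
def match_scores(found, expected):
    """Return (precision, recall, f_measure) for a set of attribute matches.

    found    -- attribute pairs proposed by a matcher (hypothetical input)
    expected -- attribute pairs in the reference (ground-truth) mapping
    """
    found, expected = set(found), set(expected)
    correct = len(found & expected)  # proposed pairs that are truly correct

    # Precision: share of proposed matches that are correct.
    precision = correct / len(found) if found else 0.0
    # Recall: share of true matches that were proposed.
    recall = correct / len(expected) if expected else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative data only: two of three proposed pairs are correct,
# and two of the four true pairs are recovered.
found = {("cust_id", "customer_no"), ("phone", "tel"), ("name", "city")}
expected = {
    ("cust_id", "customer_no"),
    ("phone", "tel"),
    ("name", "full_name"),
    ("zip", "postcode"),
}

p, r, f = match_scores(found, expected)
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")
```

Treating matches as a set of pairs keeps the metrics independent of the matcher itself, so the same function could score a syntactic and a semantic technique on identical reference mappings.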
