Publication:
Comparative study between regular expression and Google similarity index for instance-based schema matching

Date

2016

Publisher

Gombak, Selangor : International Islamic University Malaysia, 2016

Subject LCSH

Ontologies (Information retrieval)
Artificial intelligence--Data processing
Semantic integration (Computer systems)

Call Number

t TK 5105.88815 A478C 2016

Abstract

Schema matching is one of the essential phases of database integration. The schema matching process aims to identify correspondences between schemas, which later support the data integration process. The main issue during schema matching is how to support the merging decision by identifying correspondences between attributes despite syntactic and semantic heterogeneity in the data sources. Many attempts in the literature have used database instances to detect correspondences between attributes, and many instance-based schema matching approaches have been proposed to improve the accuracy of the matching process. We observed that no single technique provides accurate matching for different types of data. For instance, some techniques treat numeric values as strings, which negatively affects the discovery of matches and, in turn, the quality of the match results. Similarly, other techniques treat textual instances as numeric, which also degrades the quality of the match results. Thus, a comparative study between syntactic and semantic techniques is needed, one that analyzes these techniques in depth to determine the strengths and weaknesses of each. This thesis develops two schema matching techniques, namely (i) regular expression and (ii) Google similarity, to identify matches between attributes with numeric, alphabetic, and mixed instances, and then compares these techniques and evaluates their performance empirically. Several analyses were conducted on real and synthetic datasets to evaluate the performance of the schema matching techniques considered in this thesis in terms of Precision (P), Recall (R), and F-Measure.
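To illustrate the syntactic side, the following is a minimal sketch of a regular-expression matcher, assuming that each instance is classified as numeric, alphabetic, or mixed and that two attributes are compared by the distribution of these types across their instances. The patterns, the function names (`instance_type`, `type_profile`, `profile_similarity`), and the total-variation scoring are hypothetical illustrations; the abstract does not specify the expressions or scoring used in the thesis.

```python
import re
from collections import Counter

# Hypothetical patterns; the thesis's actual regular expressions are not given.
PATTERNS = {
    "numeric": re.compile(r"[+-]?\d+(\.\d+)?"),            # e.g. 42, -7, 3.14
    "alphabetic": re.compile(r"[A-Za-z]+(\s[A-Za-z]+)*"),  # e.g. Kuala Lumpur
}
TYPES = ("numeric", "alphabetic", "mixed")


def instance_type(value: str) -> str:
    """Classify one instance; anything not fully numeric or alphabetic is mixed."""
    v = value.strip()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(v):
            return name
    return "mixed"


def type_profile(values: list[str]) -> dict[str, float]:
    """Fraction of an attribute's (non-empty list of) instances in each type."""
    counts = Counter(instance_type(v) for v in values)
    return {t: counts[t] / len(values) for t in TYPES}


def profile_similarity(a: list[str], b: list[str]) -> float:
    """1 minus the total variation distance between two type profiles (1.0 = same)."""
    pa, pb = type_profile(a), type_profile(b)
    return 1.0 - 0.5 * sum(abs(pa[t] - pb[t]) for t in TYPES)


salary = ["1200", "3500.50", "980"]
pay = ["2100", "450", "77.5"]
name = ["Ali", "Siti Aminah", "Omar"]
print(profile_similarity(salary, pay))   # 1.0: both attributes are all numeric
print(profile_similarity(salary, name))  # 0.0: numeric versus alphabetic
```

A profile of this kind keeps numeric values from being compared as strings, which is the failure mode the abstract describes, but on its own it cannot distinguish two numeric attributes with unrelated meanings.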
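For the semantic side, the abstract does not define "Google similarity". A common measure with that name is the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, computed from search-engine hit counts. The sketch below assumes that measure and takes the hit counts as plain numbers; it does not call any search API, and the mapping `google_similarity = max(0, 1 - NGD)` is a hypothetical choice for turning a distance into a score.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from hit counts.

    fx, fy: hits for terms x and y alone; fxy: hits for pages containing both;
    n: estimated number of indexed pages. Assumes 1 <= fxy and max(fx, fy) < n.
    """
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def google_similarity(fx: float, fy: float, fxy: float, n: float) -> float:
    """Hypothetical conversion of NGD (0 = always co-occur) to a [0, 1] score."""
    return max(0.0, 1.0 - ngd(fx, fy, fxy, n))


# Illustrative counts only, not real search results.
print(google_similarity(1000, 1000, 1000, 1e10))  # 1.0: terms always co-occur
print(google_similarity(100, 100, 10, 10_000))    # about 0.5
```

Because NGD works on term co-occurrence rather than character patterns, it can relate textual instances such as synonyms, but it says little about numeric values, which is the complementary weakness the comparative study targets.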
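The evaluation metrics named in the abstract can be computed by comparing the set of predicted attribute correspondences with a reference (gold-standard) set. The sketch below assumes correspondences are pairs of qualified attribute names and that F-Measure is the balanced harmonic mean of P and R (F1); the attribute names are invented for illustration.

```python
def evaluate(predicted: set[tuple[str, str]],
             reference: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. reference matches."""
    tp = len(predicted & reference)                    # correct correspondences
    p = tp / len(predicted) if predicted else 0.0      # share of predictions correct
    r = tp / len(reference) if reference else 0.0      # share of true matches found
    f = 2 * p * r / (p + r) if p + r else 0.0          # harmonic mean of P and R
    return p, r, f


predicted = {("emp.name", "staff.fullname"), ("emp.salary", "staff.pay"),
             ("emp.dob", "staff.phone")}
reference = {("emp.name", "staff.fullname"), ("emp.salary", "staff.pay"),
             ("emp.dob", "staff.birthdate"), ("emp.dept", "staff.unit")}
print(evaluate(predicted, reference))  # P = 2/3, R = 1/2, F = 4/7
```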
