Publication: Skyline queries in large-scale and incomplete graphs using machine learning
Subject LCSH
Database management
Abstract
Skyline queries are widely used in multi-criteria decision-making to identify non-dominated data points that balance conflicting preferences. While skyline computation has been studied extensively for relational and complete databases, limited attention has been given to skyline query processing in large-scale incomplete graph databases. The challenges include the dynamic and evolving nature of graphs, with frequent additions and deletions of nodes, and the prevalence of missing attribute values, which disrupt dominance relationships and reduce query reliability. These challenges become critical in real-world applications such as recommendation systems, urban planning, fraud detection, and location-based services, where incomplete or sparse data is common. Traditional approaches that rely on relational-to-graph transformation and heavy preprocessing suffer from inefficiency, sparsity, and poor scalability when applied to high-dimensional graph data. This study introduces an optimized framework for skyline query processing in incomplete graph databases that integrates machine learning techniques, including clustering-based optimization with the K-Means algorithm, dynamic data pruning, and adaptive indexing. The framework reduces computational overhead, handles missing values more effectively, and ensures accurate skyline retrieval over incomplete graph databases. Experimental evaluation on synthetic graph datasets, designed to simulate real-world incompleteness, demonstrates the framework's effectiveness. Compared to traditional baseline algorithms, the proposed method reduces query processing time by 30-50% and dataset size by up to 44.44%. Cluster quality was validated using intrinsic metrics such as the Silhouette Score, ensuring the robustness of the groupings.
The proposed solution significantly advances skyline query processing for complex, incomplete graph structures, contributing to more efficient and reliable decision-support systems, recommendation engines, and location-based services.
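
To make concrete how missing attribute values disrupt dominance relationships, the following is a minimal sketch and not the framework described in this abstract. It assumes a commonly used convention for incomplete data: two points are compared only on the dimensions that both have observed, smaller values are preferred, and `None` marks a missing value. The names `Point`, `dominates`, and `skyline` are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Sequence

# Assumption: a point is a sequence of attribute values, None = missing.
Point = Sequence[Optional[float]]


def dominates(p: Point, q: Point) -> bool:
    """Return True if p dominates q on the dimensions both have observed.

    Assumes smaller values are preferred on every dimension.
    """
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    if not shared:
        return False  # no common dimension, so the points are incomparable
    no_worse = all(a <= b for a, b in shared)
    strictly_better = any(a < b for a, b in shared)
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Naive all-pairs skyline: keep points that no other point dominates."""
    return [
        p
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical node attribute vectors with missing values.
nodes = [(1, None, 3), (2, 2, None), (None, 1, 4), (3, 3, 5)]
result = skyline(nodes)

# Under this convention dominance can become cyclic: each point below is
# dominated by another one, so the skyline of the three is empty.
cyclic = [(1, 2, None), (None, 1, 2), (2, None, 1)]
cyclic_result = skyline(cyclic)
```

In the first example only `(1, None, 3)` survives. In the second, every point is dominated by another, so the skyline is empty. Because dominance stops being transitive once values are missing, pruning shortcuts that are safe on complete data cannot be applied blindly, which is one reason incomplete data makes skyline computation harder.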
