Publication: Full optimisation of imbalance techniques for Qur'anic data using genetic algorithm
Abstract
The Holy Qur'an is the first fundamental source of legislation and law in the Muslim community. Islamic scholars have worked on the Qur'anic text to make Qur'anic knowledge available quickly and systematically, for example through the digital Qur'an and Qur'anic computing. This work relies on text mining techniques to automate the processing of the Qur'anic text. The classification of Qur'anic verses, conducted through automatic Qur'anic classification, is a focal point of many studies. The purpose of Qur'anic classification is to assign the most appropriate predefined topics to a given Qur'anic verse according to its content. However, some properties of the Qur'anic topics, such as imbalanced classes, can weaken classification performance when these classes are classified using traditional classification. Imbalanced classes occur when the classes in a dataset do not contain equal numbers of samples. As observed in the dictionary used in this research, many Qur'anic topics differ in their number of verses, which means that the problem of imbalanced classes arises when these topics are classified together using traditional classification. The main problem that this study tries to solve is obtaining equal accuracies for all classes of Qur'anic topics during the classification process. Therefore, this study explores a new approach to categorising the Qur'anic topics, called optimisation learning, which is based on imbalanced learning and a genetic algorithm. Imbalanced classification was applied to solve the problem of imbalanced classes in the Qur'anic topics. The genetic algorithm was used for optimisation before classification was carried out. This optimisation was applied to the samples of Qur'anic text to adjust the convergence and spacing between samples, both within the same class and between classes. This adjustment improves the performance of Qur'anic topic classification.
Three optimisation cases were tested in this study using the proposed techniques: partial optimisation with oversampling, full optimisation without oversampling, and full optimisation with oversampling. These cases were implemented using three new oversampling methods: Genetic Oversampling (GOS) and two variants of the Harmonised Oversampling method based on Genetic Algorithm (HOGA-1 and HOGA-2). The third case of optimisation achieved the best results. Moreover, all proposed methods significantly outperformed the well-known methods widely used to classify imbalanced datasets, namely the Synthetic Minority Oversampling Technique (SMOTE), Random Undersampling (RUS), and Random Oversampling (ROS). According to the experimental results, the GOS method outperformed SMOTE and ROS, which were the second-best methods among the existing ones, on the Specificity metric by 12% under 10-fold cross-validation. The HOGA-1 method outperformed the closest method on the Matthews Correlation Coefficient (MCC) metric by 7% under training-testing validation. The HOGA-2 method, which was the best of all the proposed oversampling methods, outperformed the closest methods on the Sensitivity/Recall, Balanced Accuracy, and Geometric Mean (G-Mean) metrics by 10% under 10-fold cross-validation.
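
The metrics named above can be illustrated with a minimal sketch that derives them from the counts of a binary confusion matrix, using their standard definitions. It is an illustration only, not the evaluation code of this study: the function name `imbalance_metrics` and the example counts are hypothetical, and the binary (one topic versus the rest) setting is an assumption.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """Standard imbalance-aware metrics from binary confusion counts.

    tp/fn count the minority (positive) class, fp/tn the majority class.
    The function name and signature are hypothetical, for illustration only.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    balanced_accuracy = (sensitivity + specificity) / 2
    g_mean = math.sqrt(sensitivity * specificity)
    # Matthews Correlation Coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "g_mean": g_mean,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts: 10 minority verses, 90 majority verses.
    for name, value in imbalance_metrics(tp=5, fn=5, fp=10, tn=80).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts, plain accuracy looks high although only half of the minority class is recovered, which is why Sensitivity, Balanced Accuracy, G-Mean, and MCC are more informative on imbalanced topics.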