Publication: Loan default prediction using machine learning : an empirical study of microfinance banks in Pakistan
Subject LCSH
Credit scoring systems -- Pakistan
Machine learning
Abstract
Loan default poses a significant challenge to the financial well-being of microfinance banks in Pakistan, affecting their profitability and overall financial stability. Traditional credit evaluation techniques such as the 5Cs model (Character, Capital, Capacity, Collateral, and Conditions) rely on human judgment, which makes them inefficient and prone to inaccuracy. Consequently, many borrowers at risk of default continue to obtain loans, increasing financial risk exposure. To address these challenges, this study applies machine-learning techniques to develop a data-driven loan default prediction model aimed at enhancing risk assessment in microfinance banks. Methodology: This research is a mixed-methods study combining quantitative and qualitative approaches. Quantitatively, four machine-learning algorithms, Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB), were trained and validated on a large real-world dataset gathered from two microfinance banks in Pakistan. The dataset was split into training and testing sets, with 70% of the data used for training and 30% for testing. Feature selection and preprocessing steps, including handling of missing values, outlier detection, and data normalization, were employed to improve predictive performance. Accuracy, precision, recall, and F1 score were used to assess the predictive performance of the models. The qualitative component involved surveys of loan officers, used to cross-check the machine-learning results against real lending behavior. Findings: The research identified the most influential predictors of loan default as debt-to-income ratio, instalment amount, annual income, total number of accounts, loan amount, interest rate, open accounts, public records, loan term, and debt consolidation as the loan purpose.
The results indicate that Gradient Boosting outperformed the other models on accuracy (77.75%) and precision (55.33%), although its low recall (11.33%) and F1 score (18.81%) limited its usefulness. Random Forest scored slightly lower on every metric, with an accuracy of 77.58%, precision of 53.85%, recall of 10.19%, and F1 score of 17.14%. Logistic Regression showed reasonable accuracy (77.39%) and precision (51.74%) but had the lowest recall (8.74%) and F1 score (14.95%). The Decision Tree model, while interpretable, had the lowest accuracy (68.9%) and precision (31.15%), although its recall (30.32%) and F1 score (30.73%) were the highest of the four models. In addition, practitioner perceptions were used to cross-check the machine-learning results, and they reaffirmed that debt-to-income ratio, instalment amount, and income level are among the most dominant drivers in human credit evaluation. The research also points out differences between the machine-learning models and practitioner perceptions: credit history, climate risk, and borrower reputation are drivers usually undervalued by automated models but significant in real-world lending situations. Implications: The findings of this study have strong implications for microfinance banks, policymakers, and regulators, who are advised to adopt machine-learning-based credit risk models to enhance loan screening, minimize Non-Performing Loans, and maintain financial sustainability. This study also adds to the literature on financial technology (FinTech) by illustrating how machine learning can enhance credit risk analysis in emerging economies. Recommendations: Future research should focus on hybrid models that combine quantitative machine-learning techniques with qualitative expert judgment, gathered through structured interviews, to give a more complete credit risk assessment. Regulatory frameworks need to be developed to ensure the fair, transparent, and ethical application of AI in credit risk management.
Moreover, incorporating behavioral and macroeconomic factors into the dataset could further improve loan default forecasting.
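
As a consistency check on the reported metrics, the F1 score can be recomputed from each model's precision and recall, since F1 is their harmonic mean. The sketch below is illustrative only: the function name `f1_score` and the `reported` dictionary are assumptions made for this example, not part of the study's code, and the inputs are simply the rounded percentages quoted in the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) pairs as quoted in the abstract.
# The dictionary itself is a hypothetical container for this example.
reported = {
    "Gradient Boosting": (55.33, 11.33),
    "Random Forest": (53.85, 10.19),
    "Logistic Regression": (51.74, 8.74),
    "Decision Tree": (31.15, 30.32),
}

# Recompute F1 for each model, rounded to two decimals like the abstract.
f1_by_model = {
    name: round(f1_score(p, r), 2) for name, (p, r) in reported.items()
}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.2f}%")
```

The recomputed values agree with the F1 scores stated in the abstract to two decimal places. They also show why accuracy alone can mislead on imbalanced default data: the three models with the highest accuracy recover only about a tenth of actual defaulters.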
