Loan default prediction using machine learning : an empirical study of microfinance banks in Pakistan

Soomro, Anam


cris.virtualsource.department	44808325-3d8b-43a8-a9a0-004ba410db39
cris.virtualsource.orcid	44808325-3d8b-43a8-a9a0-004ba410db39
dc.contributor.author	Soomro, Anam
dc.contributor.supervisor	Habeebullah Zakariyah
dc.contributor.supervisor	Asadullah Shah
dc.date.accessioned	2025-10-09T03:18:55Z
dc.date.available	2025-10-09T03:18:55Z
dc.date.issued	2025
dc.description.abstract	Loan default poses a significant challenge to the financial well-being of microfinance banks in Pakistan, affecting their profitability and overall financial stability. Traditional credit evaluation techniques such as the 5Cs model (Character, Capital, Capacity, Collateral, and Conditions) are human-centric, inherently inefficient, and prone to inaccuracies. Consequently, many borrowers at risk of default continue to obtain loans, which increases financial risk exposure. To address these challenges, this study applies machine-learning techniques to develop a data-driven loan default prediction model aimed at enhancing risk assessment in microfinance banks. Methodology: This research is a mixed-methods study combining quantitative and qualitative methods. Quantitatively, four machine-learning algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB), are trained and validated on a large real-world dataset gathered from two microfinance banks in Pakistan. The dataset was split into training and testing sets, with 70% of the data used for training and 30% for testing. Feature selection and preprocessing strategies such as handling missing values, outlier detection, and data normalization were employed to improve predictive performance. Accuracy, precision, recall, and F1 score were used to assess the predictive performance of the models. The qualitative aspect of the study involved surveys of loan officers, conducted to cross-check the machine-learning results against real lending behavior. Findings: The research identified the most influential predictors of loan default as debt-to-income ratio, instalment amount, annual income, total number of accounts, loan amount, interest rate, open accounts, public records, loan term, and debt consolidation as the loan purpose.
The results indicate that Gradient Boosting performed best on accuracy (77.75%) and precision (55.33%), although its low recall (11.33%) and F1 score (18.81%) limited its usefulness. Random Forest showed slightly lower accuracy (77.58%), precision (53.85%), recall (10.19%), and F1 score (17.14%), whereas Logistic Regression showed reasonable accuracy (77.39%) and precision (51.74%) but had the lowest recall (8.74%) and F1 score (14.95%). The Decision Tree model, while interpretable, had the weakest overall performance, with the lowest accuracy (68.9%) and precision (31.15%), although its recall (30.32%) and F1 score (30.73%) were the highest of the four models. In addition, practitioner perceptions were included to cross-check the machine-learning results, and they reaffirmed that debt-to-income ratio, instalment amount, and income level are among the most dominant drivers in human credit evaluation. The research also points out differences between the machine-learning models and practitioner perceptions, including credit history, climate risk, and borrower reputation, drivers that are usually undervalued by automated models but significant in real-world lending. Implications: The findings have strong implications for microfinance banks, policymakers, and regulators, who are advised to adopt machine-learning-based credit risk models to enhance loan screening, minimize non-performing loans, and maintain financial sustainability. This study also contributes to the literature on financial technology (FinTech) by illustrating how machine learning can enhance credit risk analysis in emerging economies. Recommendations: Future research should focus on hybrid models that combine quantitative machine-learning techniques with qualitative expert judgment gathered through structured interviews, to provide a more complete credit risk assessment. Regulatory frameworks need to be developed to ensure fairness, transparency, and the ethical application of AI in credit risk management.
Moreover, incorporating behavioral and macroeconomic factors into the dataset could further improve loan default forecasting.
dc.description.abstractarabic	Loan default poses a major challenge to the financial sustainability of microfinance banks in Pakistan, affecting their profitability and overall financial stability. Traditional credit evaluation techniques such as the 5Cs model (Character, Capital, Capacity, Collateral, and Conditions) are human-centric, inherently inefficient, and prone to inaccuracy. As a result, many borrowers at risk of default continue to obtain loans, which increases exposure to financial risk. To address these challenges, this study applies machine-learning techniques to develop a data-driven loan default prediction model aimed at improving risk assessment in microfinance banks. Methodology: This research is a mixed-methods study based on quantitative and qualitative methods. Quantitatively, four machine-learning algorithms, namely Decision Tree (DT), Logistic Regression (LR), Random Forest (RF), and Gradient Boosting (GB), are trained and validated on a large real-world dataset collected from two microfinance banks in Pakistan. The dataset was split into training and testing sets, with 70% of the data used for training and 30% for testing. Feature selection and preprocessing strategies such as handling missing values, detecting outliers, and normalizing data were used to improve predictive performance. The models' predictive performance was evaluated using accuracy, precision, recall, and F1 score. The qualitative aspect of the study involved surveys of loan officers, intended to verify the machine-learning output against real lending behavior. Findings: The research found that the most influential factors in loan default are the debt-to-income ratio, instalment amount, annual income, total number of accounts, loan amount, interest rate, open accounts, public records, loan term, and debt consolidation as the loan purpose. The results indicate that Gradient Boosting outperformed the other automated models, with the best accuracy (77.75%) and precision (55.33%), although its recall (11.33%) and F1 score (18.81%) limited it.
By comparison, Random Forest showed slightly lower accuracy (77.58%), precision (53.85%), recall (10.19%), and F1 score (17.14%), while Logistic Regression showed reasonable accuracy (77.39%) and precision (51.74%) but had the lowest recall (8.74%) and F1 score (14.95%). The Decision Tree model, despite its interpretability, had the lowest overall performance, achieving an accuracy of 68.9%, precision of 31.15%, recall of 30.32%, and an F1 score of 30.73%. In addition, practitioner perceptions were included to verify the machine-learning results, and they confirmed that the debt-to-income ratio, instalment amount, and income level were among the most dominant factors in human credit evaluation. The study further highlighted gaps between the results of the automated models and practitioners' views, including the use of credit history, climate risk, and borrower-reputation drivers, which are usually undervalued by automated models but important in real-world lending situations. Implications: The findings of this study have strong implications for microfinance banks, policymakers, and regulators, who are advised to adopt machine-learning-based credit risk models to improve loan screening, reduce non-performing loans, and maintain financial sustainability. This study also adds to the literature on financial technology (FinTech) by showing how machine learning can enhance credit risk analysis in emerging economies. Recommendations: Future research should focus on hybrid models that combine quantitative machine-learning techniques with qualitative expert judgment through structured interviews to provide a better assessment of credit risk. Regulatory frameworks should be developed to ensure fairness, transparency, and the ethical application of artificial intelligence in credit risk management. Furthermore, incorporating behavioral and macroeconomic factors into the dataset would improve loan default prediction.
dc.description.callnumber	et HG 178.33 P18 S711L 2025
dc.description.cpsemail	cps2u@iium.edu.my
dc.description.degreelevel	Doctoral
dc.description.identifier	Thesis : Loan default prediction using machine learning : an empirical study of microfinance banks in Pakistan / by Anam Soomro
dc.description.kulliyah	Institute of Islamic Banking and Finance (IIiBF)
dc.description.notes	Thesis (Ph.D)--International Islamic University Malaysia, 2025.
dc.description.physicaldescription	1 online resource (xvii, 162 leaves) ; color illustrations.
dc.description.programme	Doctor of Philosophy in Islamic Banking and Finance
dc.identifier.uri	https://studentrepo.iium.edu.my/handle/123456789/33297
dc.language.iso	en
dc.publisher	Kuala Lumpur : IIUM Institute of Islamic Banking and Finance, International Islamic University Malaysia, 2025
dc.rights	OWNED BY STUDENT
dc.subject	Loan Default prediction;Pakistan;Microfinance Banks
dc.subject.lcsh	Microfinance -- Pakistan
dc.subject.lcsh	Credit scoring systems -- Pakistan
dc.subject.lcsh	Machine learning
dc.title	Loan default prediction using machine learning : an empirical study of microfinance banks in Pakistan
dc.type	Doctoral Theses	en_US
dspace.entity.type	Publication
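The abstract describes the preprocessing only at a high level (handling missing values, outlier detection, normalization, and a 70/30 train/test split). The following is a minimal standard-library Python sketch of one common way to realise those steps for a single numeric feature; it is not the thesis's code. Median imputation, Tukey IQR clipping, min-max scaling, the stratified split, the seed, and the toy debt-to-income values are all assumptions chosen for illustration, and each transform is fitted on the training rows only so that test data does not leak into the preprocessing.

```python
import random
import statistics

def stratified_split(labels, test_frac=0.3, seed=42):
    """Split row indices 70/30 (by default), keeping the default rate similar in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)

def fit_median_imputer(train_values):
    """Fill missing values (None) with the training-set median (assumed strategy)."""
    med = statistics.median([v for v in train_values if v is not None])
    return lambda v: med if v is None else v

def fit_iqr_clipper(train_values, k=1.5):
    """Clip outliers to Tukey fences [Q1 - k*IQR, Q3 + k*IQR] computed on training data."""
    q1, _, q3 = statistics.quantiles(train_values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return lambda v: min(max(v, lo), hi)

def fit_min_max(train_values):
    """Rescale to [0, 1] using the training minimum and maximum (assumed normalization)."""
    lo, hi = min(train_values), max(train_values)
    span = (hi - lo) or 1.0  # guard against a constant column
    return lambda v: (v - lo) / span

def preprocess_column(values, train_idx, test_idx):
    """Impute, clip and scale one numeric feature; every step is fitted on training rows only."""
    impute = fit_median_imputer([values[i] for i in train_idx])
    filled = [impute(v) for v in values]
    clip = fit_iqr_clipper([filled[i] for i in train_idx])
    clipped = [clip(v) for v in filled]
    scale = fit_min_max([clipped[i] for i in train_idx])
    scaled = [scale(v) for v in clipped]
    return [scaled[i] for i in train_idx], [scaled[i] for i in test_idx]

# Hypothetical toy data: debt-to-income ratios with gaps and one extreme value.
dti = [0.12, 0.35, None, 0.28, 0.41, 3.90, 0.22, 0.30, None, 0.18]
defaulted = [0, 1, 0, 0, 1, 1, 0, 0, 0, 0]

train_idx, test_idx = stratified_split(defaulted)
x_train, x_test = preprocess_column(dti, train_idx, test_idx)
print(len(x_train), len(x_test))  # 7 3
```

Stratifying the split keeps the share of defaulters roughly equal in both partitions, which matters when defaults are the minority class; in practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would usually replace the hand-written version.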
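The reported F1 scores can be checked against the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes F1 for the four models from the figures in the abstract, then derives all four metrics from a confusion matrix. The confusion-matrix counts are hypothetical, chosen only to show how a model can reach high accuracy while recall stays low when defaulters are a minority; they are not taken from the thesis.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (works for fractions or percentages)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive class (default = 1)."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

# Precision, recall and F1 (%) as reported in the abstract.
reported = {
    "Gradient Boosting": (55.33, 11.33, 18.81),
    "Random Forest": (53.85, 10.19, 17.14),
    "Logistic Regression": (51.74, 8.74, 14.95),
    "Decision Tree": (31.15, 30.32, 30.73),
}
for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: recomputed F1 = {f1_from(p, r):.2f}% (reported {f1_reported}%)")

# Hypothetical test set: 1000 loans, 220 defaults; the model flags 45 loans, 25 correctly.
# A model that never predicts default would score 780/1000 = 0.78 accuracy on the same set.
m = classification_metrics(tp=25, fp=20, fn=195, tn=760)
print({k: round(v, 4) for k, v in m.items()})
```

All four recomputed F1 values agree with the abstract to two decimal places. The hypothetical example illustrates why recall and F1 are more informative than accuracy here: a model that misses most defaulters can still post an accuracy close to that of a trivial "no default" rule.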

Files

Original bundle


Name: sa_loan_default_phd.pdf
Size: 21.39 MB
Format: Adobe Portable Document Format
Description: Full Text


Collections

IIiBF - Doctoral Theses
