In Ghana, many individuals and small businesses are excluded from traditional credit scoring systems because they lack formal financial histories. This exclusion restricts access to credit, especially in rural and underserved areas. This study addresses the gap by creating and testing a fair, AI-driven credit scoring model that uses non-traditional financial data, such as electricity utility payment histories, to determine creditworthiness. An experimental quantitative methodology was employed to gather anonymized data from 250 households in five rural communities, which was then augmented with synthetic data produced by Conditional Tabular Generative Adversarial Networks (CTGAN) to improve robustness. Three machine learning models (Logistic Regression, Random Forest, and XGBoost) were created and tested using a "Train on Synthetic, Test on Real" method to check that they could be used in the real world. Model performance was measured with accuracy, precision, recall, and AUC-ROC, and fairness was measured with Demographic Parity across the different communities. Logistic Regression had the best balance between performance and interpretability, with an AUC of 0.852. It was chosen as the best model because it was clear, easy to explain, and met regulatory requirements. To reduce algorithmic bias, a fairness-by-design approach was used, limiting model inputs to behavioral utility features and leaving out demographic or regional variables. Explainability analyses confirmed the logical coherence of predictions, pinpointing recent and significant payment delinquencies as primary indicators of risk. The results show that alternative data can support fair and accurate credit scoring systems that help Ghana's unbanked and underbanked people gain access to financial services.
In addition to its technical contributions, the study highlights the necessity of incorporating fairness and transparency into AI design, while also advocating for supportive regulatory frameworks to guarantee ethical implementation within Ghana's developing financial ecosystem. This research ultimately demonstrates that locally relevant, interpretable AI-driven models can enhance financial inclusion, mitigate systemic biases, and establish a foundation for a more equitable digital finance ecosystem in Ghana.
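As an illustration of the fairness measure named above, the following is a minimal sketch of how a Demographic Parity gap across communities could be computed. It is not the study's implementation: the function names, the toy decisions, and the community labels "A" and "B" are hypothetical, and the gap is taken here as the difference between the highest and lowest approval rates, which is one common way to summarize Demographic Parity. The community label is used only to audit the decisions and is not a model input, in line with the fairness-by-design approach described in the abstract.

```python
from collections import defaultdict


def approval_rates(decisions, groups):
    """Return the share of positive decisions (1 = approve) within each group."""
    totals = defaultdict(int)
    approved = defaultdict(int)
    for decision, group in zip(decisions, groups):
        totals[group] += 1
        approved[group] += decision
    return {group: approved[group] / totals[group] for group in totals}


def demographic_parity_difference(decisions, groups):
    """Gap between the highest and lowest group approval rates (0 = parity)."""
    rates = approval_rates(decisions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: model decisions and the community of each household.
decisions = [1, 0, 1, 1, 0, 1, 0, 1]
communities = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = approval_rates(decisions, communities)
gap = demographic_parity_difference(decisions, communities)
print(rates)  # approval rate per community
print(gap)    # a gap closer to 0 means more similar approval rates
```

A gap of 0 would mean every community is approved at the same rate; what size of gap counts as acceptable is a policy choice that this sketch does not settle.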
Research Area
Machine Learning: Machine Learning (ML) research in Computer Science and Information Technology focuses on the development of algorithms and models that enable computers to learn from data and improve their performance over time without being explicitly programmed. It is a subset of Artificial Intelligence that uses statistical techniques to give machines the ability to learn patterns, make decisions, and predict outcomes based on data.
Supervised learning, a key area of ML research, involves training models on labeled data, where the input-output relationships are predefined. This method is widely used for tasks such as classification (e.g., spam detection) and regression (e.g., predicting house prices). Unsupervised learning, on the other hand, involves finding hidden patterns in data without predefined labels, with clustering and association being typical applications in areas such as customer segmentation and anomaly detection.
Reinforcement learning is another area of ML that focuses on teaching agents to make decisions by interacting with their environment and receiving feedback in the form of rewards or penalties. It is often applied in robotics, game playing, and autonomous systems, where continuous learning and adaptation are required.
Project Main Objective
The main objective of this research is to develop and evaluate a fair machine learning credit scoring framework for Ghana that addresses bias and strengthens regulatory oversight in a digital financial ecosystem.