MORTGAGE DATA MINING

PROJECT BACKGROUND

Whilst the Bank of England recently announced that it had ‘scrapped’ its affordability test because other reliable measures of affordability exist (Financial Policy Committee confirms withdrawal of Mortgage Market Affordability Test 2023), the Financial Conduct Authority maintains that responsible lending must remain a major objective for banks. Section MCOB 11.6.2 R of the FCA Handbook states that banks must assess the customer’s ability to repay debt (MCOB 11.6 1 Responsible Lending and Financing 2014).

This therefore became the main objective of the data models proposed using Random Forest and C4.5: to identify data patterns that can support rigorous assessment of affordability criteria.

PROJECT AIM

The aim of this project was to assess how well each algorithm captures valuable information from the dataset provided, and to identify which algorithm most efficiently explores areas of the mortgage decision space, such as affordability criteria. This was examined by comparing the data models built by the selected algorithms.

THE DATA

The mortgage dataset provided contained various continuous and discrete data, aside from personally identifiable data such as Customer ID, Names, Ages and Gender.

The data outlined was discussed and explored to analyse its significance to the selected algorithms and its value towards the assessment of affordability criteria. As these are machine learning algorithms that are expected to learn the behaviours and patterns of customers who have defaulted on or repaid existing mortgages or loans, it was crucial that the data provided to each algorithm was pre-processed and simplified prior to the experiments.
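The kind of pre-processing described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the project: it assumes that identifying fields are removed before modelling and that rows with missing numeric values are dropped. All field names (`customer_id`, `annual_income`, `loan_amount`, `defaulted`) and the helper functions are hypothetical; the dataset's real column names are not given here.

```python
# Hypothetical field names: only Customer ID, Names, Ages and Gender are
# mentioned in the text; the exact column labels are assumptions.
IDENTIFYING_FIELDS = {"customer_id", "name", "age", "gender"}


def strip_identifying_fields(record, identifying=IDENTIFYING_FIELDS):
    """Return a copy of the record without the identifying fields."""
    return {key: value for key, value in record.items() if key not in identifying}


def to_number(value):
    """Convert a raw value such as '35,000' to a float, or None if unusable."""
    if value is None:
        return None
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None


def preprocess(records, numeric_fields):
    """Remove identifying fields, parse numeric fields, drop incomplete rows."""
    cleaned = []
    for record in records:
        row = strip_identifying_fields(record)
        for field in numeric_fields:
            row[field] = to_number(row.get(field))
        # Rows with a missing numeric value are dropped rather than guessed.
        if all(row[field] is not None for field in numeric_fields):
            cleaned.append(row)
    return cleaned


raw_records = [
    {"customer_id": "C001", "name": "A. Example", "age": "41", "gender": "F",
     "annual_income": "35,000", "loan_amount": "120000", "defaulted": "no"},
    {"customer_id": "C002", "name": "B. Example", "age": "29", "gender": "M",
     "annual_income": "", "loan_amount": "90000", "defaulted": "yes"},
]

for row in preprocess(raw_records, ["annual_income", "loan_amount"]):
    print(row)
```

Dropping incomplete rows is only one option; whether to drop or impute them depends on how much data is lost and is a judgement for each experiment.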

THE 8 EXPERIMENTS

Following this, the process aimed to develop a further understanding of the data, to identify any erroneous or unrealistic values that may exist, and to examine the ethical, legal and social issues behind the current dataset, in the hope of guiding each algorithm to adopt mortgage-led thinking in the data model it develops.

I conducted 8 experiments to establish how hyperparameter tuning supports algorithms in making informed decisions. I also made use of further performance measures such as ROC curves, true-positive rates and error rates, as it is widely known that classification accuracy is not the only measure of performance.
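To illustrate why accuracy alone can mislead, the sketch below computes the error rate, true-positive rate and ROC points from first principles using only the Python standard library. It is an illustration and not the code used in the experiments; the labels and scores are made-up toy values, and the convention that 1 means "defaulted" and 0 means "repaid" is an assumption.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = defaulted, 0 = repaid)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Accuracy, error rate, true-positive rate and false-positive rate."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if score >= threshold else 0 for score in scores]
        metrics = summary_metrics(y_true, y_pred)
        points.append((metrics["false_positive_rate"], metrics["true_positive_rate"]))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of ROC points ordered by increasing FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data: 8 repaid customers, 2 defaulters.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

# A model that predicts "repaid" for everyone looks accurate but finds no defaulters.
always_repaid = [0] * len(y_true)
print(summary_metrics(y_true, always_repaid))

# Hypothetical predicted default probabilities from some classifier.
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4, 0.2, 0.9, 0.35]
print(area_under_curve(roc_points(y_true, scores)))
```

On this toy data the always-"repaid" model reaches 80% accuracy while its true-positive rate is zero, which is the situation the additional measures are meant to expose.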