Loan Eligibility Model

Binary Loan Eligibility Model with Different Machine Learning Algorithms.

Many people assume that Neural Network models are always better than other algorithms, but this is not always true: a Neural Network can consume a lot of memory, and its training time can be quite long.

There are lighter Machine Learning Algorithms to choose from that train quickly and, for a problem like this one, can reach the same Accuracy, Recall, and Precision as a Neural Network Model.

Modules Needed:

import numpy as np 
import pandas as pd
import sklearn.metrics  # makes sklearn.metrics.classification_report available below
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
import tensorflow as tf

Loading the Training and Testing Dataset

The Loan Eligibility Dataset used here is linked in the Resources section at the bottom of this blog.

train_dataset = pd.read_csv("./data/loan-train.csv")
test_dataset = pd.read_csv("./data/loan-test.csv")

Before feeding the training dataset to the different models, some preprocessing is needed: several features have missing values that must be filled, the categorical features need to be encoded as numbers, and the numeric features are standardized.
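
As a quick check before filling anything (an optional addition, not part of the original pipeline), the number of missing values per column can be printed:

# Optional check: count the missing values in each column of the training data
print(train_dataset.isnull().sum())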

Filling Missing Values:

train_dataset['Credit_History'].fillna(train_dataset['Credit_History'].mode()[0], inplace=True)
test_dataset['Credit_History'].fillna(test_dataset['Credit_History'].mode()[0], inplace=True)

train_dataset['LoanAmount'].fillna(train_dataset['LoanAmount'].mean(), inplace=True) 
test_dataset['LoanAmount'].fillna(test_dataset['LoanAmount'].mean(), inplace=True) 

train_dataset['Gender'].fillna(train_dataset['Gender'].mode()[0], inplace=True)
test_dataset['Gender'].fillna(test_dataset['Gender'].mode()[0], inplace=True)

train_dataset['Dependents'].fillna(train_dataset['Dependents'].mode()[0], inplace=True)
test_dataset['Dependents'].fillna(test_dataset['Dependents'].mode()[0], inplace=True)
train_dataset['Married'].fillna(train_dataset['Married'].mode()[0], inplace=True)
test_dataset['Married'].fillna(test_dataset['Married'].mode()[0], inplace=True)

train_dataset['Loan_Amount_Term'].fillna(train_dataset['Loan_Amount_Term'].mean(), inplace=True)
test_dataset['Loan_Amount_Term'].fillna(test_dataset['Loan_Amount_Term'].mean(), inplace=True)

train_dataset['Self_Employed'].fillna(train_dataset['Self_Employed'].mode()[0], inplace=True)
test_dataset['Self_Employed'].fillna(test_dataset['Self_Employed'].mode()[0], inplace=True)

Encoding Categorical Features:

le = LabelEncoder()

feature_col = ['Property_Area','Education', 'Dependents']

for col in feature_col:
    train_dataset[col] = le.fit_transform(train_dataset[col])
    test_dataset[col] = le.transform(test_dataset[col])  # reuse the encoder fitted on the training column

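As a quick sanity check (an optional addition, not in the original post), the mapping learned by the encoder can be inspected; after the loop, le holds the mapping for the last column, 'Dependents':

# Optional check: show which integer each category was mapped to
print(dict(zip(le.classes_, le.transform(le.classes_))))
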
Replacing Feature Values with Numeric Values:

train_dataset.Loan_Status = train_dataset.Loan_Status.replace({"Y": 1, "N" : 0})

train_dataset.Gender = train_dataset.Gender.replace({"Male": 1, "Female" : 0})
test_dataset.Gender = test_dataset.Gender.replace({"Male": 1, "Female" : 0})
train_dataset.Married = train_dataset.Married.replace({"Yes": 1, "No" : 0})
test_dataset.Married = test_dataset.Married.replace({"Yes": 1, "No" : 0})

train_dataset.Self_Employed = train_dataset.Self_Employed.replace({"Yes": 1, "No" : 0})
test_dataset.Self_Employed = test_dataset.Self_Employed.replace({"Yes": 1, "No" : 0})
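
As an optional check (not in the original post), we can confirm that the encoded columns are now numeric:

# Optional check: the encoded columns should now have a numeric dtype
print(train_dataset[['Gender', 'Married', 'Self_Employed', 'Education', 'Dependents', 'Property_Area']].dtypes)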

Checking Different Models' Accuracies


Logistic Regression Model

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

logistic_model = LogisticRegression()

features = ['Dependents', 'Education',
       'Self_Employed', 'ApplicantIncome', 'CoapplicantIncome', 'LoanAmount',
       'Loan_Amount_Term', 'Credit_History', 'Property_Area']

X_Train = train_dataset[features]
Y_Train = train_dataset["Loan_Status"]  # a 1-D target avoids shape warnings when fitting the scikit-learn models

X_Test = test_dataset[features]

Applying Standard Scaler:

numerical_features = X_Train.select_dtypes(include=['float64', 'int64'])

numerical_columns = numerical_features.columns

ct = ColumnTransformer([("only numeric", StandardScaler(), numerical_columns)], remainder='passthrough')
X_Train = ct.fit_transform(X_Train)

X_Test = ct.transform(X_Test)
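
As an optional check (not in the original post), the scaled columns of the transformed training matrix should now have roughly zero mean and unit variance:

# Optional check: column means should be close to 0 and standard deviations close to 1
print(X_Train.mean(axis=0).round(2))
print(X_Train.std(axis=0).round(2))
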
logistic_model.fit(X_Train, Y_Train)

Logistic Regression Model Accuracy (measured on the training data):

y_predicted = logistic_model.predict(X_Train)

print(sklearn.metrics.classification_report(Y_Train, y_predicted))
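
Beyond the classification report, a confusion matrix gives a quick view of false approvals versus false rejections (an optional addition, not in the original post):

from sklearn.metrics import confusion_matrix

# Rows are the true labels, columns the predicted labels (0 = not eligible, 1 = eligible)
print(confusion_matrix(Y_Train, y_predicted))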

Neural Network Model

num_features = len(features)

model = tf.keras.Sequential()

model.add(tf.keras.layers.InputLayer(input_shape=(num_features,)))
model.add(tf.keras.layers.Dense(units=128, kernel_regularizer=tf.keras.regularizers.L2(l2=0.001), activation="relu", kernel_initializer="he_normal"))
model.add(tf.keras.layers.Dropout(rate=0.4))
model.add(tf.keras.layers.Dense(units=64, kernel_regularizer=tf.keras.regularizers.L2(l2=0.001), activation="relu", kernel_initializer="he_normal"))
model.add(tf.keras.layers.Dropout(rate=0.2))
model.add(tf.keras.layers.Dense(units=32, kernel_regularizer=tf.keras.regularizers.L2(l2=0.001), activation="relu", kernel_initializer="he_normal"))
model.add(tf.keras.layers.Dense(units=1, activation="sigmoid"))
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=["acc"]
)

history = model.fit(
    X_Train, Y_Train, 
    batch_size=4, 
    epochs=40, 
    validation_split=0.2)
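
Because the model is compiled with the "acc" metric and trained with a validation split, the training history can be used to see how the validation accuracy evolved (an optional check, not in the original post):

# Optional check: best validation accuracy reached over the 40 epochs
print(max(history.history["val_acc"]))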

Neural Network Model Accuracy:

y_predicted = model.predict(X_Train)

y_predicted = y_predicted.flatten()

y_predicted = np.where(y_predicted > 0.5, 1, 0)

print(sklearn.metrics.classification_report(Y_Train, y_predicted))

Naive Bayes Model

from sklearn.naive_bayes import GaussianNB

nb_model = GaussianNB()
nb_model = nb_model.fit(X_Train, Y_Train)

Naive Bayes Model Accuracy:

y_pred = nb_model.predict(X_Train)
print(sklearn.metrics.classification_report(Y_Train, y_pred))

SVM Model

from sklearn.svm import SVC

svm_model = SVC()
svm_model = svm_model.fit(X_Train, Y_Train)

SVM Model Accuracy:

y_pred = svm_model.predict(X_Train)
print(sklearn.metrics.classification_report(Y_Train, y_pred))


Overall, the Logistic Regression and SVM Models perform almost the same as the Neural Network Model: the Logistic Regression Model achieves an Accuracy of 81%, the SVM Model 82%, and the Neural Network 82%.
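
The test set is never used for evaluation above, but the trained models can still generate predictions for it. A minimal sketch with the Logistic Regression Model (using the X_Test prepared earlier):

# Sketch: predict loan eligibility for the unlabeled test set (1 = eligible, 0 = not eligible)
test_predictions = logistic_model.predict(X_Test)
print(test_predictions[:10])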

Resources: