Bank Customer Churn Prediction Model

Predict, Prevent, and Prosper - Predicting Churn with Machine Learning

Link to project repository on GitHub

Goal: build a machine learning model to predict if a bank's customer will churn (leave the bank) or not.
For each instance in the test set, you must predict a 0 or 1 value for the target variable (Classifier).

Steps

  1. EDA
    • data.describe()
    • Pie chart
    • Correlation heatmap
    • Histograms
    • KDE plots
    • Univariate Exploration
  2. Feature Engineering (note I will train models on original features and engineered features separately)
    • Age Group: grouping customers into age categories to capture non-linear relationships and improve the model’s ability to identify patterns.
    • Balance to Estimated Salary Ratio: provides insight into a customer's financial stability. A higher ratio may indicate financial strain, which could increase the likelihood of the customer leaving the bank.
    • Tenure Group: grouping tenure into categories to capture non-linear effects and reduce noise.
    • Loyalty Index: a composite feature reflecting customer loyalty, combining tenure, the number of products held, and whether they are an active member.
    • Balance Above 0: a binary feature indicating whether the customer's account balance is at or above zero.
  3. Preprocessing
    • Split data
    • Create pipeline
      • OneHot encoding
      • StandardScaler
  4. Create Models
    • Model 1: Random Forest Classifier
    • Model 2: Logistic Regression
    • Model 3: Support Vector Machine
    • Model 4: K-Nearest Neighbours
    • Model 5: XGBoost
    • Visualize models:
      • Confusion matrix
      • Display feature importance (when applicable)
  5. Create Models with Engineered Features
    • Repeat above (10 models total)

Result of Model Evaluations

Model Results

Data

The dataset used in this project is available publicly on Kaggle: https://www.kaggle.com/datasets/shrutimechlearn/churn-modelling

Technologies

Python