Titanic Prediction Model

Sailing through Data: Who Survives the Titanic?

Link to project repository on GitHub

Goal: build a machine learning model that predicts whether a passenger survived the sinking of the Titanic.
For each passenger in the test set, you must predict a 0 or 1 value for the Survived variable, which makes this a binary classification task.

Submissions are evaluated on accuracy: the percentage of passengers whose outcome you predict correctly.
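
As a small illustration of the metric, the sketch below computes accuracy by hand. The `accuracy` helper and the five example labels are hypothetical and exist only to show the arithmetic; they are not taken from the project or the competition data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for five passengers (1 = survived, 0 = did not survive).
actual = [0, 1, 1, 0, 1]
predicted = [0, 1, 0, 0, 1]

# Four of the five predictions match, so this prints "accuracy = 0.80".
print(f"accuracy = {accuracy(actual, predicted):.2f}")
```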

Steps

  1. EDA
  2. Feature Engineering
    • Family Size: Larger families might have different survival rates compared to solo travelers.
    • Person's Title (e.g., Ms, Mr): Titles can provide insight into age, gender, and social status, which might affect survival chances.
    • Cabin Deck: The deck could correlate with proximity to lifeboats and thus survival rates.
    • Cabin Assigned: Passengers who have not been assigned a cabin might have different survival probabilities compared to those with recorded cabin details.
    • Age Group: Different age groups might have had different survival probabilities.
    • Fare Price Groups: Binning fares into groups can capture non-linear relationships between fare and survival.
    • Name Length: Especially in the early 1900s, a longer name could indicate higher social standing, which can affect survival rate.
  3. Preprocessing
    • Dealing With Nulls
    • Split the Data
    • Create Pipelines + Transform Columns
  4. Visualize and Understand Data
    • Histogram
    • KDE
    • Pie Chart
    • Heatmap
  5. Define Models
    I created 5 models:
    • Model 1: Random Forest Classifier
    • Model 2: Logistic Regression
    • Model 3: K-Nearest Neighbours
    • Model 4: XGBoost
    • Model 5: AdaBoost (Adaptive Boosting)
  6. Create Competition Submission
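
The feature engineering ideas in step 2 could be sketched with pandas roughly as follows. This is a minimal sketch, not the project's actual code: it assumes the Kaggle column names (`Name`, `SibSp`, `Parch`, `Cabin`, `Age`, `Fare`), and the function name `add_features`, the new column names, the age bin edges, the quartile fare groups, and the `"U"` placeholder for an unknown deck are all illustrative assumptions.

```python
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered columns added."""
    out = df.copy()

    # Family Size: siblings/spouses + parents/children + the passenger.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1

    # Title: the text between the comma and the first period,
    # e.g. "Braund, Mr. Owen Harris" -> "Mr".
    out["Title"] = (
        out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    )

    # Cabin Assigned: 1 if a cabin is recorded, 0 otherwise.
    out["HasCabin"] = out["Cabin"].notna().astype(int)

    # Cabin Deck: first letter of the cabin code; "U" (unknown) is an assumption.
    out["Deck"] = out["Cabin"].str[0].fillna("U")

    # Age Group: the bin edges are illustrative; missing ages stay missing.
    out["AgeGroup"] = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 60, 120],
        labels=["child", "teen", "adult", "senior"],
    )

    # Fare Price Groups: quartile bins (an assumption), numbered from 0.
    out["FareGroup"] = pd.qcut(out["Fare"], q=4, labels=False, duplicates="drop")

    # Name Length: number of characters in the full name.
    out["NameLength"] = out["Name"].str.len()
    return out


# A few illustrative rows in the competition's column layout.
sample = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
            "Heikkinen, Miss. Laina",
            "Allen, Mr. William Henry",
        ],
        "SibSp": [1, 1, 0, 0],
        "Parch": [0, 0, 0, 0],
        "Cabin": [None, "C85", None, None],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Fare": [7.25, 71.2833, 7.925, 8.05],
    }
)

features = add_features(sample)
print(features[["FamilySize", "Title", "HasCabin", "Deck", "AgeGroup", "NameLength"]])
```

Quantile-based fare groups keep the bins roughly equal in size; fixed price thresholds would work as well if specific fare bands are of interest.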

Result of Model Evaluations
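
One way a comparison like this could be set up is sketched here with scikit-learn pipelines. It is an assumption-laden sketch rather than the project's code: the data is synthetic placeholder data (not real Titanic records), the column split into numeric and categorical features is illustrative, and 5-fold cross-validation stands in for whatever split the project actually used. XGBoost is left out so the sketch depends only on scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic placeholder data in the competition's column layout.
# These are NOT real Titanic records; load Kaggle's train.csv for real results.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "Pclass": rng.integers(1, 4, n),
        "Age": rng.uniform(1, 70, n),
        "Fare": rng.uniform(5, 100, n),
        "Sex": rng.choice(["male", "female"], n),
        "Embarked": rng.choice(["S", "C", "Q"], n),
    }
)
X.loc[rng.choice(n, 20, replace=False), "Age"] = np.nan  # simulate nulls
y = pd.Series(rng.integers(0, 2, n), name="Survived")

numeric_cols = ["Pclass", "Age", "Fare"]
categorical_cols = ["Sex", "Embarked"]

# Dealing with nulls and transforming columns, one sub-pipeline per column type.
preprocess = ColumnTransformer(
    [
        (
            "num",
            Pipeline(
                [
                    ("impute", SimpleImputer(strategy="median")),
                    ("scale", StandardScaler()),
                ]
            ),
            numeric_cols,
        ),
        (
            "cat",
            Pipeline(
                [
                    ("impute", SimpleImputer(strategy="most_frequent")),
                    ("onehot", OneHotEncoder(handle_unknown="ignore")),
                ]
            ),
            categorical_cols,
        ),
    ]
)

# xgboost.XGBClassifier could be added to this dict if xgboost is installed.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbours": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()

# Print the models from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because the labels here are random, the printed scores only confirm that the pipeline runs end to end; they say nothing about how the models perform on the real dataset.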

Model 1: Random Forest Classifier

Model 2: Logistic Regression

Model 3: K-Nearest Neighbours

Model 4: XGBoost

Model 5: AdaBoost (Adaptive Boosting)

Competition Scores (Best to Worst)

Data

The dataset used in this project is available publicly on Kaggle: https://www.kaggle.com/competitions/titanic/data

Technologies

Python