tripathiGithub/Classification_on_unknown_features

Objective:

We would like to predict whether a person X will buy a product Y. A number of largely demographic features about person X are available, along with features describing X's past activities. We also have, as features, aggregated data about the people who typically buy product Y (their demographics and past activities). However, we do not know with certainty which features are which. The variable C tells us whether person X actually bought product Y.

Full Notebook Report: https://nbviewer.jupyter.org/github/tripathiGithub/Classification_on_unknown_features/blob/main/ML_Classification.ipynb (use this link to view the code rather than opening the ipynb on GitHub directly, because GitHub does not render ipynb files properly)

Target Distribution

[Figure: distribution of the target classes]

It was an imbalanced binary classification problem.

  • I used 'class_weight' to deal with the imbalance; it lets a model give more weight to the minority class (a minimal sketch follows below)
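
As an illustration, here is a minimal sketch of how class weighting can be passed to a scikit-learn style model; X_train and y_train are placeholder names, not variables from the notebook:

```python
# 'balanced' re-weights samples inversely to class frequency, so the
# minority (positive) class contributes more to the training loss.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
# clf.fit(X_train, y_train)   # X_train / y_train are placeholders
```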

Models Experimented:

  • Logistic Regression
  • Random Forest
  • LightGBM
  • XGBoost
  • CatBoost
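
Below is a rough sketch of how these five candidates could be instantiated with minority-class weighting; the hyper-parameters and the helper name build_candidates are assumptions for illustration, not taken from the notebook.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

def build_candidates(y_train):
    """Return the five candidate classifiers, each configured to up-weight the minority class."""
    # Ratio of negatives to positives, used as the positive-class weight.
    pos_weight = float((y_train == 0).sum()) / float((y_train == 1).sum())
    return {
        "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
        "random_forest": RandomForestClassifier(class_weight="balanced", n_estimators=300),
        "lightgbm": LGBMClassifier(class_weight="balanced"),
        "xgboost": XGBClassifier(scale_pos_weight=pos_weight, eval_metric="aucpr"),
        "catboost": CatBoostClassifier(class_weights=[1.0, pos_weight], verbose=0),
    }
```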

Metrics Used

  • Area Under the Precision-Recall Curve (PR-AUC) for choosing the best model and for hyper-parameter tuning
  • F1-score for threshold tuning (see the sketch after this list)
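
For reference, a hedged sketch of the two metrics: average_precision_score is used here as the PR-AUC estimate, and the F1 threshold is found by a simple sweep over candidate cut-offs; the notebook may compute these differently.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def pr_auc(y_true, y_score):
    """Area under the precision-recall curve, estimated via average precision."""
    return average_precision_score(y_true, y_score)

def best_f1_threshold(y_true, y_score):
    """Sweep probability thresholds and keep the one that maximises F1 on validation data."""
    thresholds = np.linspace(0.05, 0.95, 91)
    f1_scores = [f1_score(y_true, (y_score >= t).astype(int)) for t in thresholds]
    best = int(np.argmax(f1_scores))
    return thresholds[best], f1_scores[best]
```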

Model Comparisons:

[Figure: model comparison]

Best Model: CatBoost, with PR-AUC = 0.44 and F1-score = 0.52 on the validation set

[Figure: best model (CatBoost) validation results]

Test Set Results:

[Figure: test set results]