
Objective:

We would like to predict whether a person X will buy a product Y. A number of largely demographic features are available about person X, along with features describing X's past activities. We also have aggregated features about the people who typically buy product Y (their demographics and past activities). However, we do not know with certainty which features are which. The variable C tells us whether person X actually bought product Y.

Full Notebook Report: https://nbviewer.jupyter.org/github/tripathiGithub/Classification_on_unknown_features/blob/main/ML_Classification.ipynb (use this link to view the code rather than opening the ipynb on GitHub directly, because GitHub does not render ipynb files properly)

Target Distribution

[image: target class distribution]

It was an imbalanced binary classification problem.

  • I used 'class_weight' to handle the imbalance; it lets the model give more weight to the minority class
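As a minimal sketch (illustrative, not the author's exact code), here is how `class_weight='balanced'` upweights the minority class in scikit-learn; the synthetic data below is a stand-in for the real features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data: roughly 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 'balanced' reweights each class inversely to its frequency,
# so errors on the rare class cost more during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)
```

LightGBM, XGBoost, and CatBoost expose analogous knobs (e.g. class/sample weights), so the same idea carries over to the boosted models.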

Models Experimented:

  • Logistic Regression
  • Random Forest
  • LightGBM
  • XGBoost
  • CatBoost
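The comparison loop can be sketched as below (a hypothetical illustration using only scikit-learn estimators; LightGBM, XGBoost, and CatBoost provide the same `fit`/`predict_proba` interface, so they slot into the same loop):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in imbalanced dataset
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

models = {
    "LogisticRegression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "RandomForest": RandomForestClassifier(class_weight="balanced", random_state=0),
}

# Cross-validated PR-AUC ('average_precision') per model
scores = {
    name: cross_val_score(m, X, y, scoring="average_precision").mean()
    for name, m in models.items()
}
```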

Metrics Used

  • Area under the Precision-Recall curve (PR-AUC) for choosing the best model and for hyper-parameter tuning
  • F1-score for threshold tuning
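The two metrics above play different roles, which the toy sketch below illustrates (hypothetical scores, not results from the notebook): PR-AUC scores the ranking of predicted probabilities, while F1 is swept over candidate thresholds to pick a decision cutoff.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Toy labels and predicted probabilities
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.8])

# PR-AUC (average precision) is threshold-free: used for model selection
pr_auc = average_precision_score(y_true, y_prob)

# F1 is computed per threshold: pick the cutoff that maximises it
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_true, y_prob >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
```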

Model Comparisons:

[image: model comparison table]

Best Model: CatBoost, with PR-AUC = 0.44 and F1-score = 0.52 (on the validation set)

[image: validation-set results for CatBoost]

Test set Results:

[image: test-set results]