We would like to predict whether a person X will buy a product Y. A number of largely demographic features are available about person X, along with features describing X's past activities. We also have, as features, aggregated data about people who typically buy product Y (their demographics and past activities). However, we do not know with certainty which features are which. The variable C tells us whether person X actually bought product Y.
Full Notebook Report: https://nbviewer.jupyter.org/github/tripathiGithub/Classification_on_unknown_features/blob/main/ML_Classification.ipynb (use this link to view the code rather than opening the .ipynb on GitHub directly, because GitHub does not render .ipynb files properly)
- I used `class_weight` to deal with the class imbalance; it lets the models give more weight to the minority class
- Models evaluated:
  - Logistic Regression
  - Random Forest
  - LightGBM
  - XGBoost
  - CatBoost
- Area under the precision-recall curve (AUPRC) was used to choose the best model and to tune hyper-parameters
- F1-score was used for threshold tuning
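To illustrate what `class_weight='balanced'` does under the hood, here is a minimal sketch of how scikit-learn derives the per-class weights: each class gets `n_samples / (n_classes * count_of_that_class)`, so the minority class receives a proportionally larger weight. The labels below are made up for illustration.

```python
from collections import Counter

def balanced_class_weights(y):
    """Compute 'balanced' class weights: n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in counts}

# Imbalanced labels: 8 negatives, 2 positives.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(balanced_class_weights(y))  # {0: 0.625, 1: 2.5}
```

With these weights, each misclassified positive contributes four times as much to the loss as a misclassified negative, which is what counteracts the 8:2 imbalance.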
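As a sketch of the model-selection metric, average precision is the step-wise estimate of the area under the precision-recall curve: the mean of the precision at each positive example's rank when predictions are sorted by score. The data here is illustrative, not from the notebook.

```python
def average_precision(y_true, y_prob):
    """Step-wise AUPRC estimate: mean precision at each positive's rank."""
    # Sort examples by descending predicted score.
    pairs = sorted(zip(y_prob, y_true), reverse=True)
    tp = 0
    ap = 0.0
    n_pos = sum(y_true)
    for rank, (_, y) in enumerate(pairs, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank  # precision at this positive's rank
    return ap / n_pos

# A model that ranks both positives above all negatives scores 1.0.
print(average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0
```

Comparing this score across candidate models (and hyper-parameter settings) on a validation set gives a threshold-free ranking that is more informative than accuracy on imbalanced data.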
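Threshold tuning with the F1-score can be sketched as follows: sweep candidate thresholds over the model's predicted probabilities and keep the one that maximises F1. The labels and probabilities below are illustrative only.

```python
def f1_at_threshold(y_true, y_prob, t):
    """F1-score when predicting positive for probabilities >= t."""
    y_pred = [1 if p >= t else 0 for p in y_prob]
    tp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 1)
    fp = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 0 and yp == 1)
    fn = sum(1 for yt, yp in zip(y_true, y_pred) if yt == 1 and yp == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(y_true, y_prob):
    # Candidate thresholds: the distinct predicted probabilities.
    return max(set(y_prob), key=lambda t: f1_at_threshold(y_true, y_prob, t))

y_true = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.35, 0.4, 0.55, 0.6, 0.65, 0.8, 0.9]
t = best_threshold(y_true, y_prob)
print(t, f1_at_threshold(y_true, y_prob, t))
```

In practice the sweep is run on a held-out set, so the chosen threshold is not overfit to the training data.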