
Objective:

We would like to predict whether a person X will buy a product Y. A number of largely demographic features are available about person X, along with features describing X's past activities. We also have aggregated features about the people who typically buy product Y (their demographics and past activities). However, we do not know with certainty which features are which. The variable C tells us whether person X actually bought product Y.

Full Notebook Report: https://nbviewer.jupyter.org/github/tripathiGithub/Classification_on_unknown_features/blob/main/ML_Classification.ipynb (use this link to view the code rather than opening the ipynb on GitHub directly, because GitHub does not render ipynb files properly)

Target Distribution

[image: target class distribution]

It was an imbalanced binary classification problem.

  • I used 'class_weight' to handle the imbalance; it lets the model give more weight to the minority class
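As a minimal sketch (illustrative, not the author's exact code), here is how `class_weight='balanced'` upweights the minority class in scikit-learn; the synthetic data below is a stand-in for the real features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic imbalanced data: roughly 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 'balanced' reweights each class inversely to its frequency,
# so errors on the rare class cost more during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X, y)
```

LightGBM, XGBoost, and CatBoost expose analogous knobs (e.g. class/sample weights), so the same idea carries over to the boosted models.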

Models Experimented:

  • Logistic Regression
  • Random Forest
  • LightGBM
  • XGBoost
  • CatBoost
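The comparison loop can be sketched as below (a hypothetical illustration using only scikit-learn estimators; LightGBM, XGBoost, and CatBoost provide the same `fit`/`predict_proba` interface, so they slot into the same loop):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in imbalanced dataset
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

models = {
    "LogisticRegression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "RandomForest": RandomForestClassifier(class_weight="balanced", random_state=0),
}

# Cross-validated PR-AUC ('average_precision') per model
scores = {
    name: cross_val_score(m, X, y, scoring="average_precision").mean()
    for name, m in models.items()
}
```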

Metrics Used

  • Area under the Precision-Recall curve (PR-AUC) for choosing the best model and for hyper-parameter tuning
  • F1-score for threshold tuning
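The two metrics above play different roles, which the toy sketch below illustrates (hypothetical scores, not results from the notebook): PR-AUC scores the ranking of predicted probabilities, while F1 is swept over candidate thresholds to pick a decision cutoff.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Toy labels and predicted probabilities
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.8])

# PR-AUC (average precision) is threshold-free: used for model selection
pr_auc = average_precision_score(y_true, y_prob)

# F1 is computed per threshold: pick the cutoff that maximises it
thresholds = np.linspace(0.05, 0.95, 19)
f1s = [f1_score(y_true, y_prob >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]
```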

Model Comparisons:

[image: model comparison table]

Best Model: CatBoost, with PR-AUC = 0.44 and F1-score = 0.52 (on the validation set)

[image: validation-set results for CatBoost]

Test set Results:

[image: test-set results]