tripathiGithub/Classification_on_unknown_features

Objective:

We would like to predict whether a person X will buy a product Y. A number of largely demographic features about person X are available, along with features describing X's past activities. We also have, as features, aggregated data about the people who typically buy product Y (their demographics and past activities). However, we do not know with certainty which features are which. The variable C tells us whether person X actually bought product Y.

Full Notebook Report: https://nbviewer.jupyter.org/github/tripathiGithub/Classification_on_unknown_features/blob/main/ML_Classification.ipynb (use this link to view the code rather than opening the ipynb on GitHub directly, because GitHub does not render ipynb files properly)

Target Distribution

[Figure: distribution of the target classes]

It was an imbalanced binary classification problem.

  • I used 'class_weight' to deal with the imbalance; it lets a model give more weight to the minority class (a minimal sketch follows below)
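
As an illustration, here is a minimal sketch of how class weighting can be passed to a scikit-learn style model; X_train and y_train are placeholder names, not variables from the notebook:

```python
# 'balanced' re-weights samples inversely to class frequency, so the
# minority (positive) class contributes more to the training loss.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=42)
# clf.fit(X_train, y_train)   # X_train / y_train are placeholders
```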

Models Experimented:

  • Logistic Regression
  • Random Forest
  • LightGBM
  • XGBoost
  • CatBoost
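
Below is a rough sketch of how these five candidates could be instantiated with minority-class weighting; the hyper-parameters and the helper name build_candidates are assumptions for illustration, not taken from the notebook.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMClassifier
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

def build_candidates(y_train):
    """Return the five candidate classifiers, each configured to up-weight the minority class."""
    # Ratio of negatives to positives, used as the positive-class weight.
    pos_weight = float((y_train == 0).sum()) / float((y_train == 1).sum())
    return {
        "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
        "random_forest": RandomForestClassifier(class_weight="balanced", n_estimators=300),
        "lightgbm": LGBMClassifier(class_weight="balanced"),
        "xgboost": XGBClassifier(scale_pos_weight=pos_weight, eval_metric="aucpr"),
        "catboost": CatBoostClassifier(class_weights=[1.0, pos_weight], verbose=0),
    }
```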

Metrics Used

  • Area Under the Precision-Recall Curve (PR-AUC) for choosing the best model and for hyper-parameter tuning
  • F1-score for threshold tuning (see the sketch after this list)
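
For reference, a hedged sketch of the two metrics: average_precision_score is used here as the PR-AUC estimate, and the F1 threshold is found by a simple sweep over candidate cut-offs; the notebook may compute these differently.

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def pr_auc(y_true, y_score):
    """Area under the precision-recall curve, estimated via average precision."""
    return average_precision_score(y_true, y_score)

def best_f1_threshold(y_true, y_score):
    """Sweep probability thresholds and keep the one that maximises F1 on validation data."""
    thresholds = np.linspace(0.05, 0.95, 91)
    f1_scores = [f1_score(y_true, (y_score >= t).astype(int)) for t in thresholds]
    best = int(np.argmax(f1_scores))
    return thresholds[best], f1_scores[best]
```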

Model Comparisons:

[Figure: model comparison]

Best Model: CatBoost, with PR-AUC = 0.44 and F1-score = 0.52 on the validation set

[Figure: best model (CatBoost) validation results]

Test Set Results:

[Figure: test set results]