Machine learning algorithms and data visualisation for credit cards default prediction #284

Closed
15 tasks
amukh18 opened this issue Oct 12, 2019 · 0 comments
Labels
Data Analysis (data analysis tasks), Hacktoberfest (for Hacktoberfest), Intermediate (intermediate difficulty), Machine Learning, Python

amukh18 commented Oct 12, 2019

Description

Your task is to predict the probability that a credit card owner will default, based on his/her characteristics and payment history. This is a binary classification problem: each client either defaults or does not.

The dataset can be found here.

The features convey the following information:
X1: Amount of the given credit (New Taiwan dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (in years).
X6-X11: History of past payments (from April to September, 2005)
Where
X6 = the repayment status in September, 2005;
X7 = the repayment status in August, 2005;
...
X11 = the repayment status in April, 2005.

And the measurement scale for the repayment status is:
-1 = pay duly;
1 = payment delay for one month;
2 = payment delay for two months;
...
8 = payment delay for eight months;
9 = payment delay for nine months and above.
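When inspecting X6-X11 it can help to translate the codes back into words. The sketch below is a hypothetical helper (the name `describe_repayment_status` is not part of the issue); it encodes only the scale listed above and returns `None` for any other value, so unexpected codes surface instead of being silently mislabelled.

```python
from typing import Optional


def describe_repayment_status(code: int) -> Optional[str]:
    """Translate an X6-X11 repayment-status code into words.

    Only the codes defined in the issue (-1 and 1-9) are handled;
    any other value returns None so it can be checked by hand.
    """
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    return None


print(describe_repayment_status(2))  # payment delay for 2 months
```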

X12-X17: Amount of bill statement (New Taiwan dollar).
Where
X12 = amount of bill statement in September, 2005;
X13 = amount of bill statement in August, 2005;
...
X17 = amount of bill statement in April, 2005.

X18-X23: Amount of previous payment (New Taiwan dollar).
Where
X18 = amount paid in September, 2005;
X19 = amount paid in August, 2005;
...
X23 = amount paid in April, 2005.
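The generic names X1-X23 are easy to mix up once the monthly blocks are involved. A minimal pandas sketch, assuming the data has been loaded into a DataFrame whose columns are literally named `X1` to `X23`; the descriptive names on the right are hypothetical, chosen only to mirror the descriptions above.

```python
import pandas as pd

# Months covered by X6-X11, X12-X17 and X18-X23, most recent first,
# as listed in the issue (September 2005 back to April 2005).
MONTHS = ["sep", "aug", "jul", "jun", "may", "apr"]


def build_column_map() -> dict:
    """Map X1-X23 to descriptive (hypothetical) column names."""
    mapping = {
        "X1": "credit_amount",
        "X2": "gender",
        "X3": "education",
        "X4": "marital_status",
        "X5": "age",
    }
    for i, month in enumerate(MONTHS):
        mapping[f"X{6 + i}"] = f"pay_status_{month}"   # repayment status
        mapping[f"X{12 + i}"] = f"bill_amt_{month}"    # bill statement amount
        mapping[f"X{18 + i}"] = f"pay_amt_{month}"     # amount paid
    return mapping


COLUMN_MAP = build_column_map()

# Tiny made-up frame; with the real data you would call
# df.rename(columns=COLUMN_MAP) right after loading it.
demo = pd.DataFrame({f"X{i}": [0] for i in range(1, 24)})
demo = demo.rename(columns=COLUMN_MAP)
print(list(demo.columns)[:7])
```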


Details

  • Technical Specifications: pandas, Python, scikit-learn, NumPy
  • Type of issue: Multiple
  • Time Limit:
    • Plots must be implemented within 24 hours of being taken up.
    • Algorithms must be implemented within 2 days of being taken up.
    • Cross-validations/ensembles must be implemented within 3 days of being taken up.

Issue requirements / progress

All algorithms and ensembles must be scored using the RMSE, log loss and accuracy metrics. Each pull request must fulfil only one of the tasks below.
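One way to report all three metrics consistently is sketched below with scikit-learn. The issue does not say which predictions RMSE and accuracy should be computed on, so this assumes RMSE is taken on the predicted default probabilities (i.e. the square root of the Brier score) and accuracy on labels obtained with a 0.5 threshold; `score_predictions` is a hypothetical helper name.

```python
import numpy as np
from sklearn.metrics import accuracy_score, log_loss, mean_squared_error


def score_predictions(y_true, y_proba, threshold=0.5):
    """Return RMSE, log loss and accuracy for binary default predictions.

    y_proba is the predicted probability of default (class 1).
    Assumption: RMSE is computed on probabilities, accuracy on
    hard labels from the given threshold.
    """
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba, dtype=float)
    rmse = np.sqrt(mean_squared_error(y_true, y_proba))
    logloss = log_loss(y_true, y_proba, labels=[0, 1])
    accuracy = accuracy_score(y_true, (y_proba >= threshold).astype(int))
    return {"rmse": rmse, "logloss": logloss, "accuracy": accuracy}


scores = score_predictions([0, 1, 1, 0], [0.1, 0.8, 0.6, 0.3])
print(scores)
```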

Plots:

  • Scatterplot (X12-X17 vs X12-X17)
  • Heatmap
  • lmplot

Algorithms:

  • Support vector classifier
  • Logistic regression
  • K-nearest neighbors
  • Gaussian Naive Bayes
  • Random forest classifier
  • XGBoost Classifier
  • LightGBM Classifier
  • Multi-layer Perceptron Classifier
  • Decision Tree Classifier
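All of the algorithms above share scikit-learn's `fit` / `predict_proba` interface, so a single template can be reused. A minimal sketch with logistic regression, assuming synthetic data from `make_classification` in place of the real 23 features (the roughly 22% positive rate is an illustrative choice, not a property of the dataset). Feature scaling is included because distance- and gradient-based models (SVC, KNN, logistic regression, MLP) are sensitive to the very different scales of the New Taiwan dollar amounts and the coded columns.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X1-X23 and the binary default label.
X, y = make_classification(n_samples=2000, n_features=23, n_informative=8,
                           weights=[0.78], random_state=0)

# Stratify so the train and test splits keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Standardise features, then fit logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # predicted probability of default
accuracy = accuracy_score(y_test, model.predict(X_test))
print("log loss:", log_loss(y_test, proba))
print("accuracy:", accuracy)
```

Swapping `LogisticRegression` for another estimator (for example `SVC(probability=True)` or `KNeighborsClassifier()`) keeps the rest of the template unchanged.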

Cross-validations/Ensembles:

  • 10-fold cross-validation of XGBoost
  • 10-fold cross-validation of LightGBM
  • Non-weighted average of the predictions of a 10-fold cross-validation of XGBoost and LightGBM (both models must be trained within each fold)

Resources

Here are a few resources that may help you:

  1. NumPy documentation: https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html
  2. Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html
  3. Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/
  4. Jupyter Notebook installation and tutorial: https://www.dataquest.io/blog/jupyter-notebook-tutorial/
  5. XGBoost documentation: https://xgboost.readthedocs.io/en/latest/
  6. LightGBM documentation: https://lightgbm.readthedocs.io/en/latest/
  7. Seaborn documentation
    a. Scatter-plot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
    b. Heat-map: http://seaborn.pydata.org/generated/seaborn.heatmap.html
    c. lmplot: https://seaborn.pydata.org/generated/seaborn.lmplot.html

Directory Structure

The following convention must be adhered to when placing your solution files.

Plots:

  • For Scatter-plot
    /machineLearning/credit_default/plots/sp/<solution_file>
  • For Heatmap
    /machineLearning/credit_default/plots/hm/<solution_file>
  • For lmplot
    /machineLearning/credit_default/plots/lp/<solution_file>

Algorithms:

  • For Support Vector Classifier:
    /machineLearning/credit_default/algo/svc/<solution_file>
  • For Logistic Regression:
    /machineLearning/credit_default/algo/lr/<solution_file>
  • For K-Nearest Neighbors:
    /machineLearning/credit_default/algo/knn/<solution_file>
  • For Gaussian Naive Bayes:
    /machineLearning/credit_default/algo/gnb/<solution_file>
  • For Decision Tree Classifier:
    /machineLearning/credit_default/algo/dtc/<solution_file>
  • For Random Forest Classifier:
    /machineLearning/credit_default/algo/rfc/<solution_file>
  • For Multi-layer Perceptron Classifier:
    /machineLearning/credit_default/algo/mlp/<solution_file>
  • For XGBoost:
    /machineLearning/credit_default/algo/xgb/<solution_file>
  • For LightGBM:
    /machineLearning/credit_default/algo/lgbm/<solution_file>

Ensembles:

  • For 10-fold XGBoost:
    /machineLearning/credit_default/ens/10-xgb/<solution_file>
  • For 10-fold LightGBM:
    /machineLearning/credit_default/ens/10-lgbm/<solution_file>
  • For the non-weighted average of the predictions of 10-fold XGBoost and 10-fold LightGBM:
    /machineLearning/credit_default/ens/avg/<solution_file>

Note

Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.

amukh18 changed the title from "Issue: Credit card default prediction" to "Issue 284: Credit card default prediction" Oct 12, 2019
amukh18 changed the title from "Issue 284: Credit card default prediction" to "Issue: Credit card default prediction" Oct 12, 2019
amukh18 changed the title from "Issue: Credit card default prediction" to "Machine learning algorithms and data visualisation for credit cards default prediction" Oct 12, 2019
SaurabhAgarwala added the Hacktoberfest label Oct 12, 2019
hpgupt added the Data Analysis, Intermediate, Machine Learning and Python labels Oct 16, 2019