Machine learning algorithms and data visualisation for credit cards default prediction #284

amukh18 · 2019-10-12T06:16:40Z

Description

Your task is to predict the probability that a credit card owner will default based on his/her characteristics and payment history. This is a classification problem.

The dataset can be found here.

The features convey the following information:
X1: Amount of the given credit (New Taiwan dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6 - X11: History of past payment (from April to September, 2005)
_{Where

X6 = the repayment status in September, 2005;

X7 = the repayment status in August, 2005;

...

X11 = the repayment status in April, 2005.}

_{And the measurement scale for the repayment status is:

-1 = pay duly;

1 = payment delay for one month;

2 = payment delay for two months;

...

8 = payment delay for eight months;

9 = payment delay for nine months and above.}

X12-X17: Amount of bill statement (New Taiwan dollar).
_{Where

X12 = amount of bill statement in September, 2005;

X13 = amount of bill statement in August, 2005;

...

X17 = amount of bill statement in April, 2005.}

X18-X23: Amount of previous payment (New Taiwan dollar).
_{Where

X18 = amount paid in September, 2005;

X19 = amount paid in August, 2005; . . .

. . . X23 = amount paid in April, 2005.}

Details

Technical Specifications: pandas, python, sci-kit, numpy
Type of issue: Multiple
Time Limit:
- Plots must be implemented within 24 hours of being taken up.
- Algorithms must be implemented within 2 days of being taken up.
- Cross-validations/ensembles must be implemented within 3 days of being taken up.

Issue requirements / progress

All algorithms and ensembles must be scores using RMSE, Logloss and Accuracy metrics. Each pull request must only fulfill one of the tasks below.

Plots:

Scatterplot (X12-X17 vs X12-X17)
Heatmap
lmplot

Algorithms:

Cross-validations/Ensembles:

10-fold cross-validation of XGBoost
10-fold cross-validation of LightGBM
Average (non-weighted) the predictions of a 10-fold cross-validation of XGBoost and LightGBM (both must be implemented in the fold)

Resources

List of resources that might be required / helpful.
Here are a few resources that may help you:

NumPy documentation: https://docs.scipy.org/doc/numpy-1.13.0/reference/index.html
Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html
Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/
Jupyter Notebook installation and tutorial : https://www.dataquest.io/blog/jupyter-notebook-tutorial/
XGBoost documentation: https://xgboost.readthedocs.io/en/latest/
LightGBM documentation: https://lightgbm.readthedocs.io/en/latest/
Scikit-learn documentation
- Logistic regression: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
- Support vector classifier: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
- K-nearest neighbors: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
- Gaussian Naive Bayes: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html
- Random forest classifier : https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
- Multi-layer perceptron classifier: https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier
- Decision tree classifier: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
Seaborn documentation
a. Scatter-plot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
b. Heat-map: http://seaborn.pydata.org/generated/seaborn.heatmap.html
c. lmplot: https://seaborn.pydata.org/generated/seaborn.lmplot.html

Directory Structure

The following convention must be adhered to when placing your solution files.

Plots:

For Scatter-plot
/machineLearning/credit_default/plots/sp/<solution_file>
For Heatmap
/machineLearning/credit_default/plots/hm/<solution_file>
For lmplot
/machineLearning/credit_default/plots/lp/<solution_file>

Algorithms:

For Support Vector Classifier:
/machineLearning/credit_default/algo/svc/<solution_file>
For Logistic Regression:
/machineLearning/credit_default/algo/lr/<solution_file>
For K-Nearest Neighbors:
/machineLearning/credit_default/algo/knn/<solution_file>
For Gaussian Naive Bayes:
/machineLearning/credit_default/algo/gnb/<solution_file>
For Decision Trees Classifier:
/machineLearning/credit_default/algo/dtc/<solution_file>
For Random Forest Classifier:
/machineLearning/credit_default/algo/rfc/<solution_file>
For Multi-layer Perceptron Classifier:
/machineLearning/credit_default/algo/mlp/<solution_file>
For XGBoost:
/machineLearning/credit_default/algo/xgb/<solution_file>
For LightGBM:
/machineLearning/credit_default/algo/lgbm/<solution_file>

Ensembles:

For 10-fold XGBoost:
/machineLearning/credit_default/ens/10-xgb/<solution_file>
For 10-fold LightGBM:
/machineLearning/credit_default/ens/10-lgbm/<solution_file>
For average(non-weighted) of predictions of 10-fold XGBoost and 10-fold LightGBM
/machineLearning/credit_default/ens/avg/<solution_file>

Note

Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.

The text was updated successfully, but these errors were encountered:

amukh18 changed the title ~~Issue: Credit card default prediction~~ Issue 284: Credit card default prediction Oct 12, 2019

amukh18 changed the title ~~Issue 284: Credit card default prediction~~ Issue: Credit card default prediction Oct 12, 2019

amukh18 changed the title ~~Issue: Credit card default prediction~~ Machine learning algorithms and data visualisation for credit cards default prediction Oct 12, 2019

SaurabhAgarwala added the Hacktoberfest For hacktoberfest label Oct 12, 2019

hpgupt added Data Analysis Data analysis tasks Intermediate For intermediate difficulty Machine Learning Python Python labels Oct 16, 2019

niranjanrao1509 closed this as completed Oct 31, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Machine learning algorithms and data visualisation for credit cards default prediction #284

Machine learning algorithms and data visualisation for credit cards default prediction #284

amukh18 commented Oct 12, 2019

Machine learning algorithms and data visualisation for credit cards default prediction #284

Machine learning algorithms and data visualisation for credit cards default prediction #284

Comments

amukh18 commented Oct 12, 2019

Description

Details

Issue requirements / progress

Plots:

Algorithms:

Cross-validations/Ensembles:

Resources

Directory Structure

Plots:

Algorithms:

Ensembles:

Note