Machine learning algorithms and data visualisation for credit cards default prediction #284
Labels
Data Analysis
Hacktoberfest
Intermediate
Machine Learning
Python
Description
Your task is to predict the probability that a credit card holder will default, based on his/her characteristics and payment history. This is a binary classification problem.
The dataset can be found here.
The features convey the following information:
X1: Amount of the given credit (New Taiwan dollar): it includes both the individual consumer credit and his/her family (supplementary) credit.
X2: Gender (1 = male; 2 = female).
X3: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
X4: Marital status (1 = married; 2 = single; 3 = others).
X5: Age (year).
X6 - X11: History of past payment (from April to September, 2005)
Where
X6 = the repayment status in September, 2005;
X7 = the repayment status in August, 2005;
...
X11 = the repayment status in April, 2005.
And the measurement scale for the repayment status is:
-1 = pay duly;
1 = payment delay for one month;
2 = payment delay for two months;
...
8 = payment delay for eight months;
9 = payment delay for nine months and above.
X12-X17: Amount of bill statement (New Taiwan dollar).
Where
X12 = amount of bill statement in September, 2005;
X13 = amount of bill statement in August, 2005;
...
X17 = amount of bill statement in April, 2005.
X18-X23: Amount of previous payment (New Taiwan dollar).
Where
X18 = amount paid in September, 2005;
X19 = amount paid in August, 2005;
...
X23 = amount paid in April, 2005.
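The feature codes above can be turned into readable names before any analysis. The following is a minimal sketch using only the standard library. The column names (`credit_limit`, `repay_status_sep`, and so on) are hypothetical and not part of the dataset, and the months between August and April are assumed to run consecutively, since the description elides them.

```python
# Months in the order used by the feature list (X6/X12/X18 = September,
# X11/X17/X23 = April). The months in between are elided in the description
# and are assumed here to run consecutively.
MONTHS = ["sep", "aug", "jul", "jun", "may", "apr"]

# Category codes exactly as listed in the feature description.
GENDER = {1: "male", 2: "female"}
EDUCATION = {1: "graduate school", 2: "university", 3: "high school", 4: "others"}
MARITAL_STATUS = {1: "married", 2: "single", 3: "others"}


def build_feature_names():
    """Map X1..X23 to readable (hypothetical) column names."""
    names = {
        "X1": "credit_limit",
        "X2": "gender",
        "X3": "education",
        "X4": "marital_status",
        "X5": "age",
    }
    # Each monthly group starts at a fixed index and spans six months.
    groups = ((6, "repay_status"), (12, "bill_amount"), (18, "paid_amount"))
    for start, prefix in groups:
        for i, month in enumerate(MONTHS):
            names[f"X{start + i}"] = f"{prefix}_{month}"
    return names


def describe_repayment_status(code):
    """Translate a repayment status code (X6-X11) into text."""
    if code == -1:
        return "pay duly"
    if 1 <= code <= 8:
        unit = "month" if code == 1 else "months"
        return f"payment delay for {code} {unit}"
    if code == 9:
        return "payment delay for nine months and above"
    # Codes outside the documented scale are reported rather than guessed.
    return "undocumented code"


FEATURE_NAMES = build_feature_names()
print(FEATURE_NAMES["X6"])           # repay_status_sep
print(describe_repayment_status(2))  # payment delay for 2 months
```

Returning "undocumented code" instead of raising keeps the helper usable if the data contains values outside the scale listed above.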
Details
Issue requirements / progress
All algorithms and ensembles must be scored using the RMSE, log loss, and accuracy metrics. Each pull request must fulfill only one of the tasks below.
Plots:
Algorithms:
Cross-validations/Ensembles:
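As an illustration of the three required metrics, here is a minimal sketch written with the standard library only, so that each definition is explicit. It assumes the model outputs a default probability per card owner, that RMSE is computed between those probabilities and the 0/1 labels, and that accuracy uses a 0.5 threshold. The issue does not specify these details, and the sample values are made up. scikit-learn's `sklearn.metrics` module provides `accuracy_score`, `log_loss`, and `mean_squared_error` for the same purpose.

```python
import math


def accuracy(y_true, p_default, threshold=0.5):
    """Share of correct labels after thresholding the probabilities."""
    predictions = [1 if p >= threshold else 0 for p in p_default]
    correct = sum(1 for y, y_hat in zip(y_true, predictions) if y == y_hat)
    return correct / len(y_true)


def log_loss(y_true, p_default, eps=1e-15):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(y_true, p_default):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def rmse(y_true, p_default):
    """Root mean squared error between labels and probabilities."""
    squared = [(y - p) ** 2 for y, p in zip(y_true, p_default)]
    return math.sqrt(sum(squared) / len(y_true))


# Made-up labels (1 = default) and predicted default probabilities.
y_true = [0, 1, 1, 0]
p_default = [0.1, 0.8, 0.4, 0.3]

scores = {
    "rmse": rmse(y_true, p_default),
    "logloss": log_loss(y_true, p_default),
    "accuracy": accuracy(y_true, p_default),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Reporting all three together is useful because accuracy ignores how confident the probabilities are, while log loss and RMSE penalise confident mistakes.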
Resources
Here are a few resources that may help you:
a. Scatter-plot: https://seaborn.pydata.org/generated/seaborn.scatterplot.html
b. Heat-map: http://seaborn.pydata.org/generated/seaborn.heatmap.html
c. lmplot: https://seaborn.pydata.org/generated/seaborn.lmplot.html
Directory Structure
The following convention must be adhered to when placing your solution files.
Plots:
/machineLearning/credit_default/plots/sp/<solution_file>
/machineLearning/credit_default/plots/hm/<solution_file>
/machineLearning/credit_default/plots/lp/<solution_file>
Algorithms:
/machineLearning/credit_default/algo/svc/<solution_file>
/machineLearning/credit_default/algo/lr/<solution_file>
/machineLearning/credit_default/algo/knn/<solution_file>
/machineLearning/credit_default/algo/gnb/<solution_file>
/machineLearning/credit_default/algo/dtc/<solution_file>
/machineLearning/credit_default/algo/rfc/<solution_file>
/machineLearning/credit_default/algo/mlp/<solution_file>
/machineLearning/credit_default/algo/xgb/<solution_file>
/machineLearning/credit_default/algo/lgbm/<solution_file>
Ensembles:
/machineLearning/credit_default/ens/10-xgb/<solution_file>
/machineLearning/credit_default/ens/10-lgbm/<solution_file>
/machineLearning/credit_default/ens/avg/<solution_file>
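The convention above can be checked programmatically before opening a pull request. This is a small sketch, not part of the repository: the helper name `solution_path` and the file name `solution.py` are hypothetical, and the directory names are copied from the list above.

```python
from pathlib import PurePosixPath

# Base directory and task folders as listed in the directory structure.
BASE_DIR = PurePosixPath("/machineLearning/credit_default")

TASK_DIRS = {
    "plots": {"sp", "hm", "lp"},
    "algo": {"svc", "lr", "knn", "gnb", "dtc", "rfc", "mlp", "xgb", "lgbm"},
    "ens": {"10-xgb", "10-lgbm", "avg"},
}


def solution_path(category, task, filename):
    """Return the expected location of a solution file, or raise ValueError."""
    if task not in TASK_DIRS.get(category, set()):
        raise ValueError(f"unknown task directory: {category}/{task}")
    return BASE_DIR / category / task / filename


print(solution_path("algo", "xgb", "solution.py"))
# /machineLearning/credit_default/algo/xgb/solution.py
```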
Note
Please claim the issue first by commenting here before starting to work on it. Feel free to contact @amukh18 or @CinnamonRolls1 with any issues at any time.