# Group 13 Project Proposal

In [1]:
pip install -U altair

Note: you may need to restart the kernel to use updated packages.



**Title: Credit Score Classification**

https://www.kaggle.com/datasets/parisrohan/credit-score-classification/data 


**Introduction:**
A credit report is a summary of a person’s credit history and is created when they borrow money or apply for a credit card. A credit score is a three-digit number calculated from the credit report that summarizes how well a person manages credit and how risky it would be to lend them money. The higher the credit score, the better the rating.


A credit score is calculated from several factors, including:
- a person's annual income
- the number of credit cards they have
- the number of loans they have
- their credit card payment history
- the age of their credit history

**The question we aim to answer:** Can we classify someone’s credit score based on their banking history and financial traits (such as those listed above)?

**Dataset description:** The dataset contains bank and credit-related information on many individuals, collected by a global finance company. Its 27 columns cover bank account history, loans, debt, and EMI (equated monthly installment), along with the number of credit cards a person has and their credit card payment history.


In [2]:
import random
import altair as alt
import pandas as pd
import numpy as np
from sklearn import set_config
from sklearn.compose import make_column_transformer
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

### Reading the data from a URL

In [3]:
url_train = "https://raw.githubusercontent.com/vedika37/dsci100-proj/main/train.csv"
url_test = "https://raw.githubusercontent.com/vedika37/dsci100-proj/main/test.csv"
train = pd.read_csv(url_train,sep = ",", low_memory=False)
test = pd.read_csv(url_test,sep = ",", low_memory=False)

### Cleaning Training Data 

In [4]:
# dropping null values and columns not used in analysis
train = train.dropna()
train = train.iloc[:, 7:]
train

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,...,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,11.27,...,_,809.98,26.822620,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
6,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,8_,11.27,...,Good,809.98,22.537593,22 Years and 7 Months,No,49.574949,178.3440674122349,Low_spent_Small_value_payments,244.5653167062043,Good
8,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,4,5.42,...,Good,605.03,24.464031,26 Years and 7 Months,No,18.816215,104.291825168246,Low_spent_Small_value_payments,470.69062692529184,Standard
9,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,7,1,7.42,...,Good,605.03,38.550848,26 Years and 8 Months,No,18.816215,40.39123782853101,High_spent_Large_value_payments,484.5912142650067,Good
10,34847.84_,3037.986667,2,1385,6,1,Credit-Builder Loan,3,-1,5.42,...,_,605.03,33.224951,26 Years and 9 Months,No,18.816215,58.51597569589465,High_spent_Large_value_payments,466.46647639764313,Standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99994,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",20,6,9.5,...,_,502.38,39.323569,31 Years and 5 Months,No,35.104023,140.58140274528395,High_spent_Medium_value_payments,410.2561579776419,Poor
99995,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",23,7,11.5,...,_,502.38,34.663572,31 Years and 6 Months,No,35.104023,60.97133255718485,High_spent_Large_value_payments,479.86622816574095,Poor
99996,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",18,7,11.5,...,_,502.38,40.565631,31 Years and 7 Months,No,35.104023,54.18595028760385,High_spent_Medium_value_payments,496.651610435322,Poor
99997,39628.99,3359.415833,4,6,5729,2,"Auto Loan, and Student Loan",27,6,11.5,...,Good,502.38,41.255522,31 Years and 8 Months,No,35.104023,24.02847744864441,High_spent_Large_value_payments,516.8090832742814,Poor


In [5]:
# deleting garbage values
train = train[train["Payment_Behaviour"] != "!@9#%8"]
train

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,...,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,11.27,...,_,809.98,26.822620,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
6,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,8_,11.27,...,Good,809.98,22.537593,22 Years and 7 Months,No,49.574949,178.3440674122349,Low_spent_Small_value_payments,244.5653167062043,Good
8,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,4,5.42,...,Good,605.03,24.464031,26 Years and 7 Months,No,18.816215,104.291825168246,Low_spent_Small_value_payments,470.69062692529184,Standard
9,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,7,1,7.42,...,Good,605.03,38.550848,26 Years and 8 Months,No,18.816215,40.39123782853101,High_spent_Large_value_payments,484.5912142650067,Good
10,34847.84_,3037.986667,2,1385,6,1,Credit-Builder Loan,3,-1,5.42,...,_,605.03,33.224951,26 Years and 9 Months,No,18.816215,58.51597569589465,High_spent_Large_value_payments,466.46647639764313,Standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99991,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,25,18.31,...,Bad,3571.7,37.140784,6 Years and 3 Months,Yes,60.964772,34.66290609052614,High_spent_Large_value_payments,337.3629882027182,Standard
99994,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",20,6,9.5,...,_,502.38,39.323569,31 Years and 5 Months,No,35.104023,140.58140274528395,High_spent_Medium_value_payments,410.2561579776419,Poor
99995,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",23,7,11.5,...,_,502.38,34.663572,31 Years and 6 Months,No,35.104023,60.97133255718485,High_spent_Large_value_payments,479.86622816574095,Poor
99996,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",18,7,11.5,...,_,502.38,40.565631,31 Years and 7 Months,No,35.104023,54.18595028760385,High_spent_Medium_value_payments,496.651610435322,Poor


### Cleaning Testing Data

In [6]:
# dropping null values and columns not used in analysis
test = test.dropna()
test = test.iloc[:, 7:]
test

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance
0,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,11.27,2022.0,Good,809.98,35.030402,22 Years and 9 Months,No,49.574949,236.64268203272135,Low_spent_Small_value_payments,186.26670208571772
1,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,9,13.27,4.0,Good,809.98,33.053114,22 Years and 10 Months,No,49.574949,21.465380264657146,High_spent_Medium_value_payments,361.44400385378196
4,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,1,5.42,5.0,Good,605.03,25.926822,27 Years and 3 Months,No,18.816215,39.684018417945296,High_spent_Large_value_payments,485.2984336755923
5,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,3,5.42,5.0,Good,605.03,30.116600,27 Years and 4 Months,No,18.816215,251.62736875017606,Low_spent_Large_value_payments,303.3550833433617
7,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,2_,7.42,5.0,_,605.03,33.875167,27 Years and 6 Months,No,18.816215,153.53448761392985,!@9#%8,421.44796447960783
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49992,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,25,18.31,9.0,Bad,3571.7,32.391288,6 Years and 4 Months,Yes,60.964772,107.21074164760236,Low_spent_Small_value_payments,314.8151526456419
49993,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,25,18.31,12.0,Bad,3571.7,37.528511,6 Years and 5 Months,Yes,60.964772,71.79442082882734,Low_spent_Small_value_payments,350.23147346441687
49994,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,22,18.31,12.0,Bad,3571.7,27.027812,6 Years and 6 Months,Yes,60.964772,50.84684680498023,High_spent_Small_value_payments,341.179047488264
49997,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",23,5,13.5,7.0,Good,502.38,36.858542,32 Years and 0 Months,No,35.104023,__10000__,Low_spent_Large_value_payments,349.7263321025098


In [7]:
# deleting garbage values
test = test[test["Payment_Behaviour"] != "!@9#%8"]
test

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Type_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance
0,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,7,11.27,2022.0,Good,809.98,35.030402,22 Years and 9 Months,No,49.574949,236.64268203272135,Low_spent_Small_value_payments,186.26670208571772
1,19114.12,1824.843333,3,4,3,4,"Auto Loan, Credit-Builder Loan, Personal Loan,...",3,9,13.27,4.0,Good,809.98,33.053114,22 Years and 10 Months,No,49.574949,21.465380264657146,High_spent_Medium_value_payments,361.44400385378196
4,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,1,5.42,5.0,Good,605.03,25.926822,27 Years and 3 Months,No,18.816215,39.684018417945296,High_spent_Large_value_payments,485.2984336755923
5,34847.84,3037.986667,2,4,6,1,Credit-Builder Loan,3,3,5.42,5.0,Good,605.03,30.116600,27 Years and 4 Months,No,18.816215,251.62736875017606,Low_spent_Large_value_payments,303.3550833433617
9,143162.64,12187.220000,1,5,8,3,"Auto Loan, Auto Loan, and Not Specified",6,3,2.1,3.0,Good,1303.01,35.685836,18 Years and 6 Months,No,246.992319,453.6151305781054,Low_spent_Large_value_payments,788.1145499681528
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
49992,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,25,18.31,9.0,Bad,3571.7,32.391288,6 Years and 4 Months,Yes,60.964772,107.21074164760236,Low_spent_Small_value_payments,314.8151526456419
49993,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,25,18.31,12.0,Bad,3571.7,37.528511,6 Years and 5 Months,Yes,60.964772,71.79442082882734,Low_spent_Small_value_payments,350.23147346441687
49994,20002.88,1929.906667,10,8,29,5,"Personal Loan, Auto Loan, Mortgage Loan, Stude...",33,22,18.31,12.0,Bad,3571.7,27.027812,6 Years and 6 Months,Yes,60.964772,50.84684680498023,High_spent_Small_value_payments,341.179047488264
49997,39628.99,3359.415833,4,6,7,2,"Auto Loan, and Student Loan",23,5,13.5,7.0,Good,502.38,36.858542,32 Years and 0 Months,No,35.104023,__10000__,Low_spent_Large_value_payments,349.7263321025098


In [8]:
# checking the Payment_Behaviour column to see if garbage values still exist
train['Payment_Behaviour'].unique()

array(['High_spent_Small_value_payments',
       'Low_spent_Small_value_payments',
       'High_spent_Large_value_payments',
       'Low_spent_Large_value_payments',
       'High_spent_Medium_value_payments',
       'Low_spent_Medium_value_payments'], dtype=object)

In [9]:
test['Payment_Behaviour'].unique()

array(['Low_spent_Small_value_payments',
       'High_spent_Medium_value_payments',
       'High_spent_Large_value_payments',
       'Low_spent_Large_value_payments',
       'Low_spent_Medium_value_payments',
       'High_spent_Small_value_payments'], dtype=object)
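
The `unique()` checks above cover `Payment_Behaviour` only, while the table previews also show `_` in `Credit_Mix`. The following is a minimal sketch of how the same check could be extended to other categorical columns. It runs on a small made-up frame rather than the real data; the helper `placeholder_counts` and the `PLACEHOLDERS` set are hypothetical names, and the tokens in the set are an assumption based only on the values visible in the previews.

```python
import pandas as pd

# Toy stand-in for the cleaned training frame (values are illustrative only).
toy = pd.DataFrame({
    "Credit_Mix": ["_", "Good", "Good", "_", "Bad"],
    "Payment_Behaviour": [
        "High_spent_Small_value_payments",
        "!@9#%8",
        "Low_spent_Small_value_payments",
        "Low_spent_Large_value_payments",
        "High_spent_Large_value_payments",
    ],
})

# Assumed placeholder tokens, taken from the previews shown in this notebook.
PLACEHOLDERS = {"_", "!@9#%8"}

def placeholder_counts(df, columns):
    """Count, per column, how many cells hold a known placeholder token."""
    return {col: int(df[col].isin(PLACEHOLDERS).sum()) for col in columns}

counts = placeholder_counts(toy, ["Credit_Mix", "Payment_Behaviour"])
print(counts)
```

A non-zero count for a column suggests it needs the same kind of filtering (or recoding as missing) that was applied to `Payment_Behaviour`.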

In [10]:
# tidy dataframes
tidy_train = train.drop(columns = 'Type_of_Loan')
tidy_test = test.drop(columns = 'Type_of_Loan')
tidy_train

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,19114.12,1824.843333,3,4,3,4,3,7,11.27,4.0,_,809.98,26.822620,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
6,19114.12,1824.843333,3,4,3,4,3,8_,11.27,4.0,Good,809.98,22.537593,22 Years and 7 Months,No,49.574949,178.3440674122349,Low_spent_Small_value_payments,244.5653167062043,Good
8,34847.84,3037.986667,2,4,6,1,3,4,5.42,2.0,Good,605.03,24.464031,26 Years and 7 Months,No,18.816215,104.291825168246,Low_spent_Small_value_payments,470.69062692529184,Standard
9,34847.84,3037.986667,2,4,6,1,7,1,7.42,2.0,Good,605.03,38.550848,26 Years and 8 Months,No,18.816215,40.39123782853101,High_spent_Large_value_payments,484.5912142650067,Good
10,34847.84_,3037.986667,2,1385,6,1,3,-1,5.42,2.0,_,605.03,33.224951,26 Years and 9 Months,No,18.816215,58.51597569589465,High_spent_Large_value_payments,466.46647639764313,Standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99991,20002.88,1929.906667,10,8,29,5,33,25,18.31,9.0,Bad,3571.7,37.140784,6 Years and 3 Months,Yes,60.964772,34.66290609052614,High_spent_Large_value_payments,337.3629882027182,Standard
99994,39628.99,3359.415833,4,6,7,2,20,6,9.5,3.0,_,502.38,39.323569,31 Years and 5 Months,No,35.104023,140.58140274528395,High_spent_Medium_value_payments,410.2561579776419,Poor
99995,39628.99,3359.415833,4,6,7,2,23,7,11.5,3.0,_,502.38,34.663572,31 Years and 6 Months,No,35.104023,60.97133255718485,High_spent_Large_value_payments,479.86622816574095,Poor
99996,39628.99,3359.415833,4,6,7,2,18,7,11.5,3.0,_,502.38,40.565631,31 Years and 7 Months,No,35.104023,54.18595028760385,High_spent_Medium_value_payments,496.651610435322,Poor
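
`Credit_History_Age` is stored as text such as `22 Years and 1 Months`, which K-nearest neighbors cannot use directly as a numeric predictor. One possible approach, sketched here under the assumption that every value follows the pattern seen in the preview, is to convert it to a month count. The helper name `history_age_to_months` is hypothetical.

```python
import pandas as pd

def history_age_to_months(series):
    """Convert strings like '22 Years and 1 Months' to a total number of months.

    Assumes every value matches the pattern; a non-matching value would
    produce NaN in the extraction and make the integer cast fail.
    """
    parts = series.str.extract(
        r"(?P<years>\d+)\s+Years?\s+and\s+(?P<months>\d+)\s+Months?"
    )
    return parts["years"].astype(int) * 12 + parts["months"].astype(int)

ages = pd.Series([
    "22 Years and 1 Months",
    "6 Years and 3 Months",
    "32 Years and 0 Months",
])
months = history_age_to_months(ages)
print(months.tolist())  # [265, 75, 384]
```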


### Summarizing Training Data

In [11]:
# each column along with its datatype
tidy_train.dtypes

Annual_Income                object
Monthly_Inhand_Salary       float64
Num_Bank_Accounts             int64
Num_Credit_Card               int64
Interest_Rate                 int64
Num_of_Loan                  object
Delay_from_due_date           int64
Num_of_Delayed_Payment       object
Changed_Credit_Limit         object
Num_Credit_Inquiries        float64
Credit_Mix                   object
Outstanding_Debt             object
Credit_Utilization_Ratio    float64
Credit_History_Age           object
Payment_of_Min_Amount        object
Total_EMI_per_month         float64
Amount_invested_monthly      object
Payment_Behaviour            object
Monthly_Balance              object
Credit_Score                 object
dtype: object
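
The `dtypes` output lists columns such as `Annual_Income` and `Num_of_Delayed_Payment` as `object`, and the previews show values like `34847.84_` and `8_`, so these columns would need converting before they can be scaled or used in distance calculations. The sketch below shows one way this might be done with `pandas.to_numeric`; the helper name `to_clean_numeric` is hypothetical, and stripping underscores is an assumption about how the stray characters should be handled. Whether a value such as `__10000__` should be kept as 10000 or treated as missing is a separate decision that this sketch does not settle.

```python
import pandas as pd

def to_clean_numeric(series):
    """Strip leading/trailing underscores, then convert to numbers.

    Values that still cannot be parsed (for example a lone '_') become NaN.
    """
    stripped = series.astype(str).str.strip("_")
    return pd.to_numeric(stripped, errors="coerce")

# Illustrative values copied from the previews in this notebook.
raw = pd.Series(["34847.84_", "19114.12", "8_", "__10000__", "-1", "_"])
clean = to_clean_numeric(raw)
print(clean.tolist())
```

Applied to the real frame, the same helper could be mapped over each affected column, followed by another `dropna()` if the coerced missing values should be removed.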

In [12]:
# generating descriptive statistics for the numeric columns (describe() skips 'object' columns by default)
tidy_train.describe()

Unnamed: 0,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Delay_from_due_date,Num_Credit_Inquiries,Credit_Utilization_Ratio,Total_EMI_per_month
count,49120.0,49120.0,49120.0,49120.0,49120.0,49120.0,49120.0,49120.0
mean,4025.099363,16.895582,22.882736,76.739088,21.97616,27.121498,32.223997,1443.061398
std,3096.560883,115.872316,129.613883,482.042963,15.218689,187.787616,5.058518,8385.261482
min,303.645417,-1.0,0.0,1.0,-5.0,0.0,20.88125,4.462837
25%,1574.233333,4.0,4.0,8.0,10.0,3.0,28.051176,41.132527
50%,2987.101667,6.0,6.0,15.0,19.0,6.0,32.256987,78.321042
75%,5715.376667,8.0,7.0,22.0,29.0,9.0,36.412784,168.846508
max,15204.633333,1798.0,1499.0,5797.0,67.0,2594.0,49.564519,82331.0
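
The summary shows maxima far above the upper quartile for several columns (for example `Num_Bank_Accounts` has a 75th percentile of 8 and a maximum of 1798), which suggests data-entry errors that would distort distance-based methods. The sketch below illustrates one common way to filter such values, the interquartile-range (IQR) rule. This is an assumption on our part rather than a decided step of the analysis; the helper `iqr_mask` and the multiplier `k=1.5` are illustrative choices.

```python
import pandas as pd

def iqr_mask(series, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return series.between(q1 - k * iqr, q3 + k * iqr)

# Toy column with one implausible value, similar to the 1385 seen in the preview.
toy = pd.DataFrame({"Num_Credit_Card": [4, 4, 6, 7, 5, 1385, 6, 4]})
kept = toy[iqr_mask(toy["Num_Credit_Card"])]
print(kept["Num_Credit_Card"].tolist())  # [4, 4, 6, 7, 5, 6, 4]
```

A domain-based bound (for instance, a maximum plausible number of credit cards) could be used instead of the IQR rule if one is available.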


### Visualizing Data - Distribution of Predictor Variables

In [13]:
# taking a subset of the data since the original is too big for charts
subset_train = tidy_train.iloc[:1000, :]
subset_train

Unnamed: 0,Annual_Income,Monthly_Inhand_Salary,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Delay_from_due_date,Num_of_Delayed_Payment,Changed_Credit_Limit,Num_Credit_Inquiries,Credit_Mix,Outstanding_Debt,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Total_EMI_per_month,Amount_invested_monthly,Payment_Behaviour,Monthly_Balance,Credit_Score
0,19114.12,1824.843333,3,4,3,4,3,7,11.27,4.0,_,809.98,26.822620,22 Years and 1 Months,No,49.574949,80.41529543900253,High_spent_Small_value_payments,312.49408867943663,Good
6,19114.12,1824.843333,3,4,3,4,3,8_,11.27,4.0,Good,809.98,22.537593,22 Years and 7 Months,No,49.574949,178.3440674122349,Low_spent_Small_value_payments,244.5653167062043,Good
8,34847.84,3037.986667,2,4,6,1,3,4,5.42,2.0,Good,605.03,24.464031,26 Years and 7 Months,No,18.816215,104.291825168246,Low_spent_Small_value_payments,470.69062692529184,Standard
9,34847.84,3037.986667,2,4,6,1,7,1,7.42,2.0,Good,605.03,38.550848,26 Years and 8 Months,No,18.816215,40.39123782853101,High_spent_Large_value_payments,484.5912142650067,Good
10,34847.84_,3037.986667,2,1385,6,1,3,-1,5.42,2.0,_,605.03,33.224951,26 Years and 9 Months,No,18.816215,58.51597569589465,High_spent_Large_value_payments,466.46647639764313,Standard
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2115,29233.34,2637.111667,6,4,17,3,9,13,8.9,5.0,_,96.31,29.278739,9 Years and 7 Months,Yes,50.714455,52.61005015810621,High_spent_Small_value_payments,420.3866611928658,Standard
2117,29233.34,2637.111667,6,4,17,3,9,13,8.9,5.0,Standard,96.31,33.036581,9 Years and 9 Months,Yes,50.714455,118.1865112952499,Low_spent_Large_value_payments,364.810200055722,Standard
2118,29233.34,2637.111667,6,4,17,3,9,13,2.9000000000000004,5.0,_,96.31,28.714455,9 Years and 10 Months,Yes,13320.000000,346.7761351132651,Low_spent_Small_value_payments,156.22057623770692,Standard
2119,29233.34,2637.111667,6,4,17,3,9,11,8.9,5.0,_,96.31,28.262163,9 Years and 11 Months,Yes,50.714455,232.83089261028897,Low_spent_Large_value_payments,250.16581874068302,Standard
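
Taking the first 1000 rows keeps the charts small, but the preview shows consecutive rows that share the same income and salary, so the head of the file likely covers only a limited group of customers. A seeded random sample is one alternative; the sketch below contrasts the two on a toy frame (not the real data). The `random_state` value is arbitrary.

```python
import pandas as pd

# Toy frame standing in for tidy_train; only the row order matters here.
toy = pd.DataFrame({"Monthly_Inhand_Salary": range(10_000)})

# First 1000 rows (what iloc[:1000] does) versus a seeded random sample.
head_subset = toy.iloc[:1000, :]
random_subset = toy.sample(n=1000, random_state=1)

# The head only ever sees the top of the file.
print(head_subset["Monthly_Inhand_Salary"].max())  # 999
# The random sample has the same size but draws from the whole frame.
print(len(random_subset))  # 1000
```

Either subset stays below Altair's default limit of 5000 rows per chart.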


In [14]:
# distribution of monthly salary
monthly_inhand_salary_distribution = alt.Chart(subset_train).mark_bar().encode(
    x = alt.X("Monthly_Inhand_Salary", title = "Monthly Inhand Salary").bin(),
    y = alt.Y("count()")
)
monthly_inhand_salary_distribution

In [15]:
# distribution of payment behaviour
payment_behaviour_distribution = alt.Chart(subset_train).mark_bar().encode(
    x = alt.X("count()"),
    y = alt.Y("Payment_Behaviour").title("Payment Behaviour")
)
payment_behaviour_distribution

In [16]:
# distribution of credit utilization ratio
credit_utilization_ratio_distribution = alt.Chart(subset_train).mark_bar().encode(
    x = alt.X("Credit_Utilization_Ratio").title("Credit Utilization Ratio").bin(),
    y = alt.Y("count()")
)
credit_utilization_ratio_distribution

In [None]:
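# sample visualization: mean monthly in-hand salary for each credit score class
# (without an aggregate, mark_bar stacks every row, so bar height would mostly reflect class size)
training_plot = alt.Chart(subset_train).mark_bar().encode(
    x = alt.X("Credit_Score").title("Credit score"),
    y = alt.Y("mean(Monthly_Inhand_Salary)").title("Mean Monthly Inhand Salary")
)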
training_plot

### Methods

We plan to conduct our data analysis using the K-Nearest Neighbors classification algorithm. We will choose the best value of k using cross-validation and then use the following predictors to predict whether someone's credit score is Good, Standard, or Poor.

Predictors:
- Annual_Income
- Monthly_Inhand_Salary
- Num_Bank_Accounts
- Num_Credit_Card
- Interest_Rate
- Num_of_Loan
- Delay_from_due_date
- Num_of_Delayed_Payment
- Changed_Credit_Limit
- Num_Credit_Inquiries
- Credit_Mix
- Outstanding_Debt
- Credit_Utilization_Ratio
- Credit_History_Age
- Payment_of_Min_Amount
- Total_EMI_per_month
- Amount_invested_monthly
- Payment_Behaviour
- Monthly_Balance
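
The plan above (scale numeric predictors, encode categorical ones, pick k by cross-validation) could be sketched roughly as follows. This is an illustration on synthetic data, not the real dataset or our final pipeline: the three predictors, the candidate k values, and the 75/25 split are all assumptions made for the example. A labelled hold-out is carved from the training data because the test file shown earlier has no `Credit_Score` column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the real dataset): two numeric predictors,
# one categorical predictor, and a three-class label.
rng = np.random.default_rng(0)
n = 300
label = rng.choice(["Good", "Standard", "Poor"], size=n)
shift = pd.Series(label).map({"Good": 2.0, "Standard": 0.0, "Poor": -2.0}).to_numpy()
data = pd.DataFrame({
    "Monthly_Inhand_Salary": 3000 + 1000 * shift + rng.normal(0, 500, n),
    "Delay_from_due_date": 20 - 5 * shift + rng.normal(0, 3, n),
    "Payment_of_Min_Amount": rng.choice(["Yes", "No"], size=n),
    "Credit_Score": label,
})

X = data.drop(columns="Credit_Score")
y = data["Credit_Score"]
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale numeric columns and one-hot encode the categorical column.
preprocessor = make_column_transformer(
    (StandardScaler(), ["Monthly_Inhand_Salary", "Delay_from_due_date"]),
    (OneHotEncoder(handle_unknown="ignore"), ["Payment_of_Min_Amount"]),
)
pipeline = make_pipeline(preprocessor, KNeighborsClassifier())

# 5-fold cross-validation over a small, arbitrary grid of k values.
k_values = list(range(1, 22, 4))
grid = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": k_values},
    cv=5,
)
grid.fit(X_train, y_train)

best_k = grid.best_params_["kneighborsclassifier__n_neighbors"]
valid_accuracy = grid.score(X_valid, y_valid)
print(best_k, round(valid_accuracy, 3))
```

Putting the preprocessing inside the pipeline means the scaler and encoder are refit on each training fold, so the validation folds do not leak into the scaling.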


**Describing our visualizations**
We will plot histograms to visualize the distributions of our predictors. This will help us explore the relationship between credit score category and the factors we expect to influence it most, such as income, missed payments, and credit utilization ratio.


### Expected outcomes and significance
- *What do you expect to find?* 
  
  We expect people with more loans to fall into a lower credit score category, and people with a higher income, an older credit account (more credit history), and fewer delayed payments to fall into a better one. We also expect a mix of credit types (loans, credit cards, mortgages) to be associated with a better score.
  
- *What impact could such findings have?*
  - Helping banks decide whether to issue a credit card to a new customer.
  - Informing an individual’s credit limit or interest rate.
  - Studying how individual factors relate to credit score category.
  
- *What future questions could this lead to?* 
  
  How do we evaluate or categorize a new person who has just started working and does not yet have a long enough credit history?
