# The effect of socioeconomic factors on credit card approval
**Authors:**
* Brian McGiffin / *directory id* / *uid*
* Walter Osborne / *directory id* / *uid*
* Cedric Prentice / cprentic / 117196856

## Introduction
Credit is an increasingly important tool for Americans. The rising costs of products like [housing](https://www.whitehouse.gov/cea/written-materials/2021/09/09/housing-prices-and-inflation/), [cars](https://fred.stlouisfed.org/series/CUSR0000SETA02), and home appliances make it difficult or impossible for most Americans to buy them outright. Besides making larger purchases possible, good credit brings another big advantage: better terms on almost all credit products. People with good credit can get higher credit limits, larger loan amounts (for things like mortgages), longer loan terms, and lower interest rates.  
  
Unfortunately, not everyone has an equal chance to benefit from the opportunities credit provides. Historical inequalities mean that African Americans, for example, face significant financial disadvantages compared to white Americans. According to the [Center for American Progress](https://www.americanprogress.org/article/systematic-inequality/), black households have less in personal savings, and they are more likely to need to draw on those savings (because of negative income shocks). This lack of available financial resources leads black households to take on more debt than white households, and that debt makes it harder to get lines of credit.  
  
By looking at existing credit approval data, we can investigate how socioeconomic factors, like ethnicity, citizenship, and occupation, affect credit approval and credit scores. Over the course of this tutorial, we will cover the [data science lifecycle](https://www.datascience-pm.com/data-science-life-cycle/): data collection, data processing, exploratory analysis and data visualization, analysis, and interpretation.  
  
### Table of contents:
1. *TODO: Insert a table of contents here*
  
### Aside: [credit scores](https://www.investopedia.com/terms/c/credit_score.asp)
The most important data point in consumer credit is the credit score. A credit score is a number that rates a consumer’s creditworthiness. It ranges from 300 to 850, with a higher score indicating a more creditworthy consumer. Lenders use it to estimate the probability that a borrower will repay loans on time. Five main factors affect a credit score:
1. Payment history (35% of score)
2. Total amount owed (30% of score)
3. Length of credit history (15% of score)
4. Types of credit (10% of score)
5. New credit (10% of score)
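
To make the weights concrete, the sketch below combines five factor sub-scores into a single number on the 300 to 850 scale. It is only an illustration: real scoring models are proprietary, and the `toy_credit_score` function, the 0-to-1 sub-scores, and the linear scaling are all assumptions made for this example.

```python
# Factor weights from the list above (percent of score).
FACTOR_WEIGHTS = {
    "payment_history": 35,
    "amount_owed": 30,
    "length_of_history": 15,
    "types_of_credit": 10,
    "new_credit": 10,
}

SCORE_MIN, SCORE_MAX = 300, 850


def toy_credit_score(sub_scores):
    """Hypothetical: map 0-to-1 sub-scores onto the 300-850 range.

    This is NOT a real scoring formula; it only shows how the
    weights control each factor's influence on the result.
    """
    weighted = sum(
        FACTOR_WEIGHTS[name] * sub_scores[name] for name in FACTOR_WEIGHTS
    ) / 100
    return SCORE_MIN + weighted * (SCORE_MAX - SCORE_MIN)


# The five weights account for the whole score.
assert sum(FACTOR_WEIGHTS.values()) == 100

perfect = {name: 1.0 for name in FACTOR_WEIGHTS}
late_payer = dict(perfect, payment_history=0.0)

print(toy_credit_score(perfect))     # 850.0
print(toy_credit_score(late_payer))  # 657.5
```

In this toy model, zeroing out payment history alone removes 35% of the 550-point range, which is why it is listed first.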

## Data collection
### Modules used
TODO: Provide description about the libraries we used and provide links to official documentation. This doesn't have to be long.

In [2]:
# Import the required modules
import pandas as pd
import os

### Importing the data
The first step of the data science lifecycle is importing data. Our data is downloadable from [Kaggle](https://www.kaggle.com/datasets/samuelcortinhas/credit-card-approval-clean-data), but it is originally sourced from the University of California, Irvine. **Note that certain columns have been rescaled to protect the anonymity of the applicants.** Many of the columns are self-explanatory, but brief descriptions of the less obvious columns are below:
* Gender: 0 = female, 1 = male
* Married: 0 = single, divorced, etc.; 1 = married
* Drivers license: 0 = no license, 1 = license
* Approved: 0 = not approved for card, 1 = approved for card
* The zip code column is randomized for the applicants’ privacy
* The outstanding debt and income columns are rescaled for privacy, but the original distribution is preserved  
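
The 0/1 codes above can be turned into readable labels with `Series.map`. The sketch below is one possible way to do it, using a few hand-typed rows rather than the real file; the `LABELS` dictionary and the `sample` and `decoded` names are assumptions introduced for this example.

```python
import pandas as pd

# A few hand-typed rows using the binary columns described above.
sample = pd.DataFrame({
    "Gender": [1, 0, 0],
    "Married": [1, 1, 0],
    "DriversLicense": [0, 0, 1],
    "Approved": [1, 1, 0],
})

# Code-to-label mappings taken from the column descriptions.
LABELS = {
    "Gender": {0: "female", 1: "male"},
    "Married": {0: "not married", 1: "married"},
    "DriversLicense": {0: "no license", 1: "license"},
    "Approved": {0: "not approved", 1: "approved"},
}

# Work on a copy so the numeric codes stay available for modeling.
decoded = sample.copy()
for column, mapping in LABELS.items():
    decoded[column] = decoded[column].map(mapping)

print(decoded)
```

Keeping the numeric version alongside the labeled copy is useful later, since most models expect numbers while plots read better with labels.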
  
The raw data is in CSV (Comma-Separated Values) format. To load the data, we use the ```read_csv``` function from the [Pandas](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) library.

In [7]:
# Load the data from dataset.csv in the current working directory
cwd = os.getcwd()
df = pd.read_csv(os.path.join(cwd, 'dataset.csv'))
df

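| Unnamed: 0 | Gender | Age | Debt | Married | BankCustomer | Industry | Ethnicity | YearsEmployed | PriorDefault | Employed | CreditScore | DriversLicense | Citizen | ZipCode | Income | Approved |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 30.83 | 0.000 | 1 | 1 | Industrials | White | 1.25 | 1 | 1 | 1 | 0 | ByBirth | 202 | 0 | 1 |
| 1 | 0 | 58.67 | 4.460 | 1 | 1 | Materials | Black | 3.04 | 1 | 1 | 6 | 0 | ByBirth | 43 | 560 | 1 |
| 2 | 0 | 24.50 | 0.500 | 1 | 1 | Materials | Black | 1.50 | 1 | 0 | 0 | 0 | ByBirth | 280 | 824 | 1 |
| 3 | 1 | 27.83 | 1.540 | 1 | 1 | Industrials | White | 3.75 | 1 | 1 | 5 | 1 | ByBirth | 100 | 3 | 1 |
| 4 | 1 | 20.17 | 5.625 | 1 | 1 | Industrials | White | 1.71 | 1 | 0 | 0 | 0 | ByOtherMeans | 120 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 685 | 1 | 21.08 | 10.085 | 0 | 0 | Education | Black | 1.25 | 0 | 0 | 0 | 0 | ByBirth | 260 | 0 | 0 |
| 686 | 0 | 22.67 | 0.750 | 1 | 1 | Energy | White | 2.00 | 0 | 1 | 2 | 1 | ByBirth | 200 | 394 | 0 |
| 687 | 0 | 25.25 | 13.500 | 0 | 0 | Healthcare | Latino | 2.00 | 0 | 1 | 1 | 1 | ByBirth | 200 | 1 | 0 |
| 688 | 1 | 17.92 | 0.205 | 1 | 1 | ConsumerStaples | White | 0.04 | 0 | 0 | 0 | 0 | ByBirth | 280 | 750 | 0 |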
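
After loading, a few quick checks help confirm the data was read as expected. The sketch below runs them on a stand-in for `dataset.csv`: three rows copied from the preview above and read from an in-memory string. The `csv_text` and `preview` names are assumptions for this example, and the leading row-number column is left out on the assumption that it is only the DataFrame index.

```python
import io

import pandas as pd

# Stand-in for dataset.csv: three rows copied from the preview above.
csv_text = """Gender,Age,Debt,Married,BankCustomer,Industry,Ethnicity,YearsEmployed,PriorDefault,Employed,CreditScore,DriversLicense,Citizen,ZipCode,Income,Approved
1,30.83,0.000,1,1,Industrials,White,1.25,1,1,1,0,ByBirth,202,0,1
0,58.67,4.460,1,1,Materials,Black,3.04,1,1,6,0,ByBirth,43,560,1
0,25.25,13.500,0,0,Healthcare,Latino,2.00,0,1,1,1,ByBirth,200,1,0
"""
preview = pd.read_csv(io.StringIO(csv_text))

print(preview.shape)               # (number of rows, number of columns)
print(preview.dtypes)              # which columns are numeric vs. text
print(preview.isna().sum())        # missing values per column
print(preview["Approved"].mean())  # share of approved applications
```

The same four calls can be run on `df` once the full file is loaded.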


## Processing the data
Text

In [None]:
# Code block