# Lending Club Case Study

## **1. Introduction**



### **1.1. Objective**
Analyse the given data set from a leading lending company and identify the risks and issues in applicants' borrowing patterns. Prepare a detailed case study document that highlights the risks involved and presents observations and recommendations.

### **1.2. Problem Statement**
- Decide on each loan application as it is received, based on the applicant's risk factors
- Do not reject an application if the applicant is likely to repay the loan
- Do not approve an application if the applicant is likely to default

## **2. Data Understanding**



### **2.1. Import necessary libraries**


In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Pandas by default doesn't display all the columns in the dataframe
# As we're going to work on a large dataset, the following setting will help read data from all the columns
pd.set_option('display.max_columns', None)

# For some of the columns we may have to see the data from all rows
# Eg: Categorical columns.
pd.set_option('display.max_rows', None)

# Set themes, styles and color palette for seaborn charts
sns.set_theme(style='darkgrid', context='paper')
sns.set_palette(palette='pastel')

# Default figure size for all charts
# (set through rcParams so that no empty figure is created in this cell)
plt.rcParams["figure.figsize"] = (12, 4)

plt.rcParams["axes.titlesize"] = 14
plt.rcParams["axes.labelsize"] = 12
plt.rcParams["xtick.labelsize"] = 10
plt.rcParams["ytick.labelsize"] = 10

### **2.2. Data Overview**

In [11]:
# Load the data from loan.csv file. 
# Using the read_csv function from the pandas library, we can load the data from the csv to a pandas dataframe.
df = pd.read_csv('Data/loan.csv', low_memory=False)

# The shape property of a pandas dataframe returns its dimensions
# as a tuple: (number of rows, number of columns)
df.shape

# Quick observation
#-------------------
# 1. There are 39717 rows and
# 2. There are 111 columns
#-------------------

(39717, 111)

In [12]:
df.dtypes

# The dtypes property of the dataframe shows the data type of each column
#
# Quick observation
#-------------------
# 1. Most of the columns are of type object.
# 2. `issue_d` looks like a date column
# 3. `grade`, `sub_grade`, `term`, `loan_status`, `verification_status` etc. look like categorical columns
# 4. Columns like `id`, `member_id`, `url`, `desc` may not be useful for analysing risk, so they can be removed.
#-------------------

id                                  int64
member_id                           int64
loan_amnt                           int64
funded_amnt                         int64
funded_amnt_inv                   float64
term                               object
int_rate                           object
installment                       float64
grade                              object
sub_grade                          object
emp_title                          object
emp_length                         object
home_ownership                     object
annual_inc                        float64
verification_status                object
issue_d                            object
loan_status                        object
pymnt_plan                         object
url                                object
desc                               object
purpose                            object
title                              object
zip_code                           object
addr_state                        

In [15]:
# The pandas dataframe offers the `nunique` function, which gives the number of unique values in each column
df.nunique().sort_values(ascending=False)

# Quick observation
#-------------------
# 1. `id`, `url` and `member_id` each have 39717 unique values, one per row, so they act as identifiers.
# 2. `emp_title`, `desc` and `title` have thousands of distinct values and look like free-text columns.
#-------------------

id                                39717
url                               39717
member_id                         39717
total_pymnt                       37850
total_pymnt_inv                   37518
total_rec_int                     35148
last_pymnt_amnt                   34930
emp_title                         28820
desc                              26526
revol_bal                         21711
title                             19615
installment                       15383
funded_amnt_inv                    8205
total_rec_prncp                    7976
annual_inc                         5318
recoveries                         4040
dti                                2868
collection_recovery_fee            2616
total_rec_late_fee                 1356
out_prncp_inv                      1138
out_prncp                          1137
revol_util                         1089
funded_amnt                        1041
loan_amnt                           885
zip_code                            823


## **3. Data Preparation**



### **3.1. Data Cleaning**
- Remove columns
- Handle missing / null values
- Remove duplicates
- Calculate IQR and remove outliers
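
The cleaning steps listed above could be sketched as follows. This is an illustration on a hypothetical toy dataframe, under assumed choices: a 50% missing-value threshold for dropping columns and the common 1.5 × IQR fences for outliers. Neither value comes from the analysis itself.

```python
import pandas as pd

# Hypothetical toy data (values are invented for illustration)
toy = pd.DataFrame({
    "loan_amnt": [5000, 5000, 12000, 8000, 10000, 7000],
    "annual_inc": [40000.0, 40000.0, 65000.0, 52000.0, 900000.0, 48000.0],
    "desc": [None, None, None, "car repair", None, None],
})

# 1. Drop columns whose share of missing values exceeds the assumed 50% threshold
null_share = toy.isna().mean()
toy = toy.drop(columns=null_share[null_share > 0.5].index)

# 2. Remove exact duplicate rows
toy = toy.drop_duplicates()

# 3. Keep only rows whose annual income lies inside the IQR fences
q1, q3 = toy["annual_inc"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = toy["annual_inc"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = toy[in_range]

print(cleaned)
```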



### **3.2. Data Engineering**
- Create derived columns
- Map categorical variables
- Convert data types
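
A possible shape for these engineering steps is sketched below. The raw string formats (`" 36 months"`, `"10.65%"`, `"Dec-11"`) and the `loan_status` labels are assumptions about how the data looks, and the new column names (`term_months`, `int_rate_pct`, `issue_year`, `issue_month`, `is_default`) are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the string formats are assumed, not taken from loan.csv
toy = pd.DataFrame({
    "term": [" 36 months", " 60 months"],
    "int_rate": ["10.65%", "15.27%"],
    "issue_d": ["Dec-11", "Nov-11"],
    "loan_status": ["Fully Paid", "Charged Off"],
    "grade": ["B", "C"],
})

# Convert data types: pull the number out of the text columns
toy["term_months"] = toy["term"].str.extract(r"(\d+)", expand=False).astype(int)
toy["int_rate_pct"] = toy["int_rate"].str.rstrip("%").astype(float)
toy["issue_d"] = pd.to_datetime(toy["issue_d"], format="%b-%y")

# Create derived columns from the issue date
toy["issue_year"] = toy["issue_d"].dt.year
toy["issue_month"] = toy["issue_d"].dt.month

# Map a categorical variable to a numeric flag (1 = defaulted)
toy["is_default"] = toy["loan_status"].map({"Fully Paid": 0, "Charged Off": 1})

# Store low-cardinality text as a pandas category
toy["grade"] = toy["grade"].astype("category")

print(toy.dtypes)
```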



### **3.3. Data Exploration**
- Summarize initial understanding
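
One way to summarize the initial understanding is a per-column overview table. The helper below is a hypothetical sketch (the name `summarize` is not part of pandas); it combines the data type, the percentage of missing values and the number of unique values for each column.

```python
import pandas as pd

def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, % missing and number of unique values."""
    overview = pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "null_pct": (frame.isna().mean() * 100).round(2),
        "n_unique": frame.nunique(),
    })
    return overview.sort_values("null_pct", ascending=False)

# Hypothetical toy data (values are invented)
toy = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "emp_length": ["10+ years", None, "1 year", "1 year"],
})

overview = summarize(toy)
print(overview)
```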

## **4. Exploratory Data Analysis (EDA)**



### **4.1. Univariate Analysis**



#### **4.1.1. Numeric Variables**
- Histogram for distribution
- Boxplots to find outliers



#### **4.1.2. Categorical Variables**
- Bar charts for frequency distribution
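
For a categorical variable, the frequency distribution can be computed with `value_counts` and drawn as a bar chart. The sketch below uses invented `grade` values and plots percentages rather than raw counts, which is an assumed presentation choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical grades (invented values)
toy = pd.DataFrame({"grade": ["A", "B", "B", "C", "B", "A"]})

# Share of loans in each grade, as a percentage, ordered by grade
share = toy["grade"].value_counts(normalize=True).sort_index() * 100

fig, ax = plt.subplots(figsize=(12, 4))
sns.barplot(x=share.index.tolist(), y=share.to_numpy(), ax=ax)
ax.set_xlabel("grade")
ax.set_ylabel("Share of loans (%)")
ax.set_title("Frequency distribution of grade")
```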



### **4.2. Segmented Univariate Analysis**
- Boxplots by categorical variables
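
Segmented univariate analysis looks at one numeric variable within each level of a categorical variable. The sketch below assumes a numeric interest-rate column (the hypothetical `int_rate_pct`) and invented `loan_status` labels and values.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data (invented values)
toy = pd.DataFrame({
    "loan_status": ["Fully Paid"] * 3 + ["Charged Off"] * 3,
    "int_rate_pct": [7.9, 10.6, 11.7, 13.5, 15.3, 18.2],
})

# Numeric summary per segment
medians = toy.groupby("loan_status")["int_rate_pct"].median()

# One box per segment
fig, ax = plt.subplots(figsize=(12, 4))
sns.boxplot(data=toy, x="loan_status", y="int_rate_pct", ax=ax)
ax.set_title("Interest rate by loan status")

print(medians)
```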



### **4.3. Bivariate Analysis**



#### **4.3.1. Numeric Variables by Numeric Variables**
- Scatter plots
- Correlation heatmaps
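
The two charts for numeric pairs could look like the sketch below. The data is invented, and the colour map and the fixed colour range of -1 to 1 are assumed styling choices.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric columns (invented values)
toy = pd.DataFrame({
    "loan_amnt": [2000, 5000, 8000, 12000, 20000],
    "installment": [65.0, 160.0, 250.0, 390.0, 640.0],
    "dti": [18.0, 5.0, 22.0, 9.0, 14.0],
})

# Pearson correlation between every pair of numeric columns
corr = toy.corr()

fig, (ax_scatter, ax_heat) = plt.subplots(1, 2, figsize=(12, 4))

# Scatter plot for one pair of variables
sns.scatterplot(data=toy, x="loan_amnt", y="installment", ax=ax_scatter)

# Heatmap of the full correlation matrix
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, ax=ax_heat)

fig.tight_layout()
```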



#### **4.3.2. Numeric Variables by Categorical Variables**
- Boxplots for comparative analysis
- Violin plots to find the density
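
A violin plot adds a density estimate to the comparison across categories. The following is a small sketch on invented data. The `term` labels are assumed to be plain strings.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data (invented values)
toy = pd.DataFrame({
    "term": ["36 months"] * 4 + ["60 months"] * 4,
    "loan_amnt": [3000, 5000, 6500, 9000, 10000, 14000, 18000, 25000],
})

fig, ax = plt.subplots(figsize=(12, 4))

# The width of each violin shows where loan amounts are concentrated
sns.violinplot(data=toy, x="term", y="loan_amnt", ax=ax)
ax.set_title("Loan amount density by term")
```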



#### **4.3.3. Categorical Variables by Categorical Variables**
- Grouped bar charts (count plots) by category
- Heatmaps to find proportions
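
For two categorical variables, proportions can be computed with `pd.crosstab` and shown as a heatmap. In this sketch the data and the `loan_status` labels are invented, and `normalize="index"` is an assumed choice that makes each row (each grade) sum to 1.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data (invented values)
toy = pd.DataFrame({
    "grade": ["A", "A", "A", "B", "B", "C", "C", "C", "C"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid",
                    "Fully Paid", "Charged Off",
                    "Charged Off", "Charged Off", "Fully Paid", "Charged Off"],
})

# Share of each loan status within each grade
proportions = pd.crosstab(toy["grade"], toy["loan_status"], normalize="index")

fig, ax = plt.subplots(figsize=(12, 4))
sns.heatmap(proportions, annot=True, fmt=".0%", cmap="Blues", ax=ax)
ax.set_title("Loan status share within each grade")

print(proportions)
```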



## **5. Summary and Results**



### **5.1. Summary**
- List the summary of insights leading to loan default
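
Insights of this kind are often backed by a default rate per segment. The sketch below is illustrative only: the `purpose` values and the `is_default` flag (1 = defaulted) are hypothetical, and the numbers are invented rather than results from the data.

```python
import pandas as pd

# Hypothetical data (invented values, not results from loan.csv)
toy = pd.DataFrame({
    "purpose": ["small_business"] * 3 + ["car"] * 2 + ["debt_consolidation"] * 4,
    "is_default": [1, 1, 0, 0, 0, 0, 1, 0, 0],
})

# Number of loans and share of defaults in each segment, riskiest first
rates = (
    toy.groupby("purpose")["is_default"]
    .agg(loans="count", default_rate="mean")
    .sort_values("default_rate", ascending=False)
)

print(rates)
```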



### **5.2. Recommendations**
