# Loan Risk Assessment using Machine Learning 

Most of us work hard to keep our financial records clean. Every time we plan to buy or rent a property, or even apply for a credit card, a credit check is run in order to verify the applicant's financial standing. The interest rate offered to an applicant depends on that financial history.

Lending Club is an online platform designed to bring investors and borrowers together by encouraging fair lending and borrowing transactions. This has helped foster the growth of small businesses and let people take control of their debt.

We have built a machine learning model that predicts whether a loan applicant is eligible and, if so, the interest rate. Let's begin by understanding how our model is structured.

### Preliminary Step
An applicant begins by creating an online profile and entering the details of the loan they require. The system checks the applicant's eligibility for the requested loan based on their financial standing and the market situation. The applicant is informed whether they are eligible and, if so, of the sanctioned loan amount and the interest rate. This is the most critical step and is what generates business for Lending Club. The interest rate offered must be fair to both the customer and the bank.

For our study, we scraped data from the Lending Club data website. Data was readily available on Kaggle, but it was out of date. We scraped records for applicants whose loan requests were approved and for those whose requests were declined. We then structured our understanding of the model and drew the outline below.

## Structure of our Model

<img src="document.jpg">

## Exploratory Data Analysis 

We carried out a preliminary study of the data to understand its trends. The analysis is available [here](0.Exploratory Data Analysis.html).

## Data Download

We created a user profile with a registered email ID, built a [LUIGI](LUIGI_Commands) pipeline, and used the [Beautiful Soup](http://www.pythonforbeginners.com/beautifulsoup/beautifulsoup-4-python) package to scrape the data. The declined and approved loan data are downloaded by two separate .py files, [declined.py](declined.py) and [approved.py](loandata.py), which are included with this package.

## Pre-processing of Data

The downloaded data needed cleaning: removing punctuation, junk text, and stray white space, and converting the columns used in computation to the appropriate data types.
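A minimal sketch of this kind of cleaning with pandas is shown below. The column names (`int_rate`, `loan_amnt`, `purpose`) and the sample values are assumptions chosen for illustration, not the exact columns or rules used in this project.

```python
import pandas as pd


def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Strip stray characters from text columns and convert numeric-looking text."""
    out = df.copy()
    # Remove leading and trailing white space from a free-text column.
    out["purpose"] = out["purpose"].str.strip()
    # " 13.56%" -> 13.56; values that cannot be parsed become NaN.
    out["int_rate"] = pd.to_numeric(
        out["int_rate"].str.strip().str.rstrip("%"), errors="coerce"
    )
    # "$10,000" -> 10000 by dropping currency signs, commas, and white space.
    out["loan_amnt"] = pd.to_numeric(
        out["loan_amnt"].str.replace(r"[$,\s]", "", regex=True), errors="coerce"
    )
    return out


# Hypothetical raw rows, as they might look straight after download.
raw = pd.DataFrame(
    {
        "int_rate": [" 13.56%", "7.9% ", "n/a"],
        "loan_amnt": ["$10,000", " 2,500", "12000"],
        "purpose": ["  car ", "debt consolidation", " other"],
    }
)
cleaned = clean_columns(raw)
print(cleaned)
```

Using `errors="coerce"` turns junk text into missing values, so it can be handled together with the other missing data in the next step.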

### Handling Missing Values

We dropped the columns in which more than 80% of the values were missing. For the remaining columns, we filled the missing values using the mean, median, or maximum of the column.
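The rule above could be sketched as follows. This is an illustration under assumptions: the function name `handle_missing`, the sample columns, and the choice of the median for every numeric column are hypothetical, whereas the project chose between mean, median, and maximum per column.

```python
import numpy as np
import pandas as pd


def handle_missing(df: pd.DataFrame, drop_threshold: float = 0.8) -> pd.DataFrame:
    """Drop columns missing more than `drop_threshold`, then impute the rest."""
    missing_share = df.isna().mean()  # fraction of missing values per column
    kept = df.loc[:, missing_share <= drop_threshold].copy()
    # Assumption: the median is used for all numeric columns in this sketch.
    numeric_cols = kept.select_dtypes(include="number").columns
    kept[numeric_cols] = kept[numeric_cols].fillna(kept[numeric_cols].median())
    return kept


# Hypothetical sample: "mostly_empty" is missing in 5 of 6 rows (about 83%).
sample = pd.DataFrame(
    {
        "annual_inc": [50000.0, np.nan, 70000.0, 90000.0, 60000.0, 80000.0],
        "mostly_empty": [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0],
    }
)
filled = handle_missing(sample)
print(filled)
```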

We also introduced a Risk Score column, derived by averaging the maximum and minimum FICO scores.
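As a small illustration, the derived column can be computed as a row-wise mean. The column names `fico_range_low` and `fico_range_high` are assumptions about how the minimum and maximum FICO scores are stored, and the values are made up.

```python
import pandas as pd

# Hypothetical applicants with a minimum and maximum FICO score each.
loans = pd.DataFrame(
    {
        "fico_range_low": [660, 700, 745],
        "fico_range_high": [664, 704, 749],
    }
)

# Risk Score = average of the minimum and maximum FICO score for each row.
loans["risk_score"] = loans[["fico_range_low", "fico_range_high"]].mean(axis=1)
print(loans)
```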

### Feature Engineering :

Because the approved and declined loan datasets are large, we needed to narrow the input data (over 100 columns) down to fewer columns. We studied the information requested from a new applicant, kept only those columns, and performed feature selection on them for further processing.

Once this is done, we have our files ready for building our model.

# Phase 1 : Assessment of Loan Eligibility - Classification Model

We built a classification model to decide whether a user is eligible for a loan. Please see [Classification Model](1.Classification.html) to understand how we built it.

As per our structure, if a user is not eligible, the message "You are currently not eligible for the loan" is displayed. If the user is eligible, the model moves on to Phase 2, where the profile is evaluated for the loan amount and interest rate sanctioned to the user.
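The two-phase flow can be sketched as below. This is only an illustration under assumptions: the helper `assess_applicant`, the toy data, the single risk-score feature, and the choice of a decision tree and a linear regression are all hypothetical stand-ins for the project's own models, and only the interest rate is predicted here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

NOT_ELIGIBLE_MESSAGE = "You are currently not eligible for the loan"


def assess_applicant(features, classifier, regressor):
    """Phase 1 decides eligibility; Phase 2 runs only for eligible applicants."""
    row = np.asarray(features, dtype=float).reshape(1, -1)
    if classifier.predict(row)[0] != 1:
        return NOT_ELIGIBLE_MESSAGE
    return float(regressor.predict(row)[0])


# Toy data (made up): one feature, a risk score, with approval labels.
scores = np.array([[600], [620], [640], [700], [720], [740]], dtype=float)
approved = np.array([0, 0, 0, 1, 1, 1])
rates = np.array([14.0, 12.0, 10.0])  # interest rates of the approved rows

classifier = DecisionTreeClassifier(random_state=0).fit(scores, approved)
regressor = LinearRegression().fit(scores[approved == 1], rates)

print(assess_applicant([610], classifier, regressor))  # low score
print(assess_applicant([730], classifier, regressor))  # high score
```

Keeping the regressor behind the eligibility check means Phase 2 is trained and evaluated only on approved loans, which matches the structure described above.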

# Phase 2 : Evaluation of Loan Amount and Interest Rate sanctioned

We used three techniques to evaluate an eligible applicant's profile. Click on the links below to see the model built and its performance in each case.

### K-means Clustering :
The clusters are formed as shown [here](2.1.Clustering Algorithm_final.html).

We apply Logistic Regression, K-NN Regressor, Random Forest Regressor, and Multi-Layer Perceptron Regressor to each of our clusters. We also calculate the RMSE and MAE on the train and test data for each model to identify the best-performing one.
##### Below are the clusters formed
[Cluster 1](3.1.KmeansCluster0Regression.html)
[Cluster 2](3.2.KmeansCluster1Regression.html)
[Cluster 3](3.3.KmeansCluster2Regression.html)
[Cluster 4](3.4.KmeansCluster3Regression.html)
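A minimal sketch of clustering followed by per-cluster regression and RMSE/MAE evaluation is shown below. It rests on assumptions: the data is synthetic, only two of the regressors named above are fitted, and the hyperparameters are arbitrary, so it does not reproduce the linked notebooks.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for the applicant features and the interest rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 10 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=400)

# Group the applicants into four clusters.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

models = {
    "knn": lambda: KNeighborsRegressor(n_neighbors=5),
    "random_forest": lambda: RandomForestRegressor(n_estimators=50, random_state=0),
}

results = {}
for cluster_id in range(4):
    mask = labels == cluster_id
    X_tr, X_te, y_tr, y_te = train_test_split(
        X[mask], y[mask], test_size=0.25, random_state=0
    )
    for name, make_model in models.items():
        model = make_model().fit(X_tr, y_tr)
        results[(cluster_id, name)] = {
            "train_rmse": float(np.sqrt(mean_squared_error(y_tr, model.predict(X_tr)))),
            "test_rmse": float(np.sqrt(mean_squared_error(y_te, model.predict(X_te)))),
            "train_mae": float(mean_absolute_error(y_tr, model.predict(X_tr))),
            "test_mae": float(mean_absolute_error(y_te, model.predict(X_te))),
        }

for key, metrics in results.items():
    print(key, {k: round(v, 3) for k, v in metrics.items()})
```

Comparing train and test errors side by side makes overfitting visible: a model with a much lower train error than test error is memorising the cluster rather than generalising.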

### Manual Clustering (Risk Score)

The clusters are formed as shown [here](2.1.ManualClustering.html).

We apply Logistic Regression, K-NN Regressor, Random Forest Regressor, and Multi-Layer Perceptron Regressor to each of our clusters. We also calculate the RMSE and MAE on the train and test data for each model to identify the best-performing one.

##### Below are the clusters formed
[Cluster 1](2.2.ManualCluster1Regression.html)
[Cluster 2](2.3.ManualCluster2Regression.html)
[Cluster 3](2.4.ManualCluster3Regression.html)
[Cluster 4](2.5.ManualCluster4Regression.html)
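Manual clustering on the Risk Score can be sketched with `pandas.cut`. The bin edges below are hypothetical and chosen only for illustration; the boundaries actually used are in the linked notebook.

```python
import pandas as pd

# Hypothetical risk scores for seven applicants.
loans = pd.DataFrame({"risk_score": [612, 655, 689, 702, 731, 768, 805]})

# Assumed bin edges: (0, 650], (650, 700], (700, 750], (750, 850].
bins = [0, 650, 700, 750, 850]
loans["cluster"] = pd.cut(
    loans["risk_score"], bins=bins, labels=[1, 2, 3, 4], include_lowest=True
).astype(int)

# Each cluster can then be modelled separately.
for cluster_id, group in loans.groupby("cluster"):
    print(cluster_id, group["risk_score"].tolist())
```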

### No Clustering 

We apply Logistic Regression, K-NN Regressor, Random Forest Regressor, and Multi-Layer Perceptron Regressor to the full dataset, without clustering. We also calculate the RMSE and MAE on the train and test data for each model to identify the best-performing one.

Follow this [link](4.NoclusteringRegression.html) to see the code.

The data is read from the files downloaded by [declined.py](declined.py) and [approved.py](loandata.py).

The text in the document by Tushar Goel & Trupti Gore is licensed under CC BY 3.0 https://creativecommons.org/licenses/by/3.0/us/

### MIT License

The code in the document by Tushar Goel & Trupti Gore is licensed under the MIT License https://opensource.org/licenses/MIT 

Copyright (c) 2018 Trupti Gore

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
