# Data Story

The data (Bondora's loan book) can be downloaded from: https://www.bondora.com/marketing/media/LoanData.zip

In [None]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

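import warnings
# VisibleDeprecationWarning lives in np.exceptions from NumPy 1.25 and is no longer a top-level name in NumPy 2.0
_vdw = getattr(getattr(np, "exceptions", np), "VisibleDeprecationWarning", None)
if _vdw is not None:
    warnings.filterwarnings("ignore", category=_vdw)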

pd.options.display.max_rows = 125

In [None]:
import seaborn as sns
sns.set(color_codes=True)
sns.set(rc={"figure.figsize": (16, 4)})

In [None]:
loandata = pd.read_csv("data/loandata.csv", low_memory=False)

In [None]:
loandata.shape

### Number of loans per year

In [None]:
loandata['year'] = pd.to_datetime(loandata['ListedOnUTC']).dt.year

In [None]:
countByYear = loandata.groupby('year').size()

In [None]:
sns.barplot(x=countByYear.index,y=countByYear)

The number of loans per year appears to have increased every year since 2012.
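To put a number on that trend, the yearly counts can be turned into year-over-year growth rates. The sketch below is only illustrative: `yearly_growth` is a hypothetical helper, and the timestamps are toy values rather than Bondora data.

```python
import pandas as pd

def yearly_growth(listed_on: pd.Series) -> pd.DataFrame:
    """Count loans per listing year and compute the relative change from the previous year."""
    years = pd.to_datetime(listed_on).dt.year
    counts = years.value_counts().sort_index()
    return pd.DataFrame({"loans": counts, "growth": counts.pct_change()})

# Toy timestamps for illustration only, not Bondora data
toy = pd.Series(["2012-03-01", "2013-01-15", "2013-06-30", "2014-02-01",
                 "2014-05-05", "2014-09-09", "2014-12-31"])
growth = yearly_growth(toy)
print(growth)
```

On the real data this would be called as `yearly_growth(loandata['ListedOnUTC'])`. If the most recent year in the export is incomplete, its growth figure will understate the trend.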

### Average income per year and country

In [None]:
t = loandata[['year', 'IncomeTotal', 'Country']]

In [None]:
t[(t['year'] > 2010) & (t['year'] < 2017)].groupby(['year', 'Country']).mean().unstack(1).plot(kind='bar', figsize=(16, 4))

Average income appears to have risen every year since 2011.
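A mean can be pulled upward by a handful of very large incomes, so it may be worth checking the median alongside it. The following is a minimal sketch with a hypothetical helper `income_by_year_country`; the toy frame and the country code in it are placeholders, not values from the loan book.

```python
import pandas as pd

def income_by_year_country(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median IncomeTotal for every (year, Country) pair."""
    return df.groupby(["year", "Country"])["IncomeTotal"].agg(["mean", "median"])

# Toy data for illustration only: one outlier in 2016 inflates the mean
toy = pd.DataFrame({
    "year": [2015, 2015, 2015, 2016, 2016, 2016],
    "Country": ["EE"] * 6,
    "IncomeTotal": [800, 900, 1000, 900, 1000, 10000],
})
stats = income_by_year_country(toy)
print(stats)
```

If the median rises in step with the mean on the real data, the upward trend is less likely to be an artefact of outliers.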

### Distribution of loan amount

In [None]:
sns.distplot(loandata['Amount'].astype(int), bins=50)

### Comparison of loan duration and loan amount

In [None]:
# Work on an explicit copy so the assignment below does not touch loandata
t = loandata[['Amount', 'LoanDuration']].copy()
# Round the duration (in months) down to whole years
t['LoanDuration'] = t['LoanDuration'] // 12 * 12

In [None]:
grid = sns.boxplot(x="LoanDuration", y="Amount", data=t)

The longer the loan duration, the higher the loan amount tends to be.
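The box plot gives a visual impression only. One way to quantify a monotonic relationship is a Spearman rank correlation, computed here as the Pearson correlation of the ranks so that pandas alone suffices. `rank_correlation` is a hypothetical helper and the numbers are toy values, not results from the loan book.

```python
import pandas as pd

def rank_correlation(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())

# Toy data for illustration only
toy = pd.DataFrame({
    "LoanDuration": [12, 24, 36, 48, 60],
    "Amount": [500, 1500, 1200, 3000, 5000],
})
rho = rank_correlation(toy["LoanDuration"], toy["Amount"])
print(round(rho, 2))  # 0.9 for the toy data
```

A value close to 1 would support the reading above; a value near 0 would suggest the pattern in the plot is driven by a few duration buckets.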

### Number of dependants vs age

In [None]:
p = loandata[['Age', 'NrOfDependants']].copy()
# Non-numeric entries become NaN and are dropped before casting to int
p['NrOfDependants'] = pd.to_numeric(p['NrOfDependants'], errors='coerce')
p = p.dropna().astype(int)

In [None]:
# seaborn 0.9 renamed lmplot's `size` parameter to `height`
grid = sns.lmplot(x="Age", y="NrOfDependants", data=p, fit_reg=False, height=6, aspect=3)

The number of dependants peaks at around 40-45 years of age.
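A scatter plot of integer values overplots heavily, so the peak is easier to confirm by averaging within age bands. The sketch below reuses the floor-division trick applied to `LoanDuration` earlier; `mean_dependants_by_age_band` is a hypothetical helper and the frame holds toy values.

```python
import pandas as pd

def mean_dependants_by_age_band(df: pd.DataFrame, width: int = 5) -> pd.Series:
    """Average number of dependants within fixed-width age bands."""
    band = df["Age"] // width * width  # lower edge of each band
    return df.groupby(band)["NrOfDependants"].mean()

# Toy data for illustration only
toy = pd.DataFrame({
    "Age": [22, 24, 31, 33, 41, 44, 52, 58],
    "NrOfDependants": [0, 0, 1, 2, 3, 2, 1, 0],
})
by_band = mean_dependants_by_age_band(toy, width=10)
peak_band = by_band.idxmax()  # lower edge of the band with the highest mean
print(by_band)
```

On the cleaned frame `p`, a call such as `mean_dependants_by_age_band(p)` would show whether the 40-45 band really has the highest average.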

### Number of loans listed per month and country

In [None]:
loandata['yearmonth'] = pd.to_datetime(loandata['ListedOnUTC']).dt.to_period('M')

In [None]:
loandata.groupby(['yearmonth', 'Country']).size().unstack(1).sort_index(ascending=True).fillna(0).plot(figsize=(16, 5))
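To read the active period of each country off a table rather than off the line chart, the first and last listing month per country can be computed directly. This is a sketch under assumptions: `listing_span` is a hypothetical helper, and the rows and country codes below are toy placeholders.

```python
import pandas as pd

def listing_span(df: pd.DataFrame) -> pd.DataFrame:
    """First listing month, last listing month, and loan count per country."""
    listed = pd.to_datetime(df["ListedOnUTC"])
    span = listed.groupby(df["Country"]).agg(["min", "max", "count"])
    span.columns = ["first_month", "last_month", "loans"]
    span["first_month"] = span["first_month"].dt.to_period("M")
    span["last_month"] = span["last_month"].dt.to_period("M")
    return span

# Toy data for illustration only
toy = pd.DataFrame({
    "Country": ["EE", "EE", "SK", "SK", "EE"],
    "ListedOnUTC": ["2013-05-01", "2016-11-20", "2014-03-10",
                    "2014-09-02", "2015-01-01"],
})
span = listing_span(toy)
print(span)
```

A country whose `last_month` lies well before the end of the data set has most likely stopped listing new loans.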

## Summary

The initial analysis shows that the number of loans has grown over time. This could reflect higher demand for loans or the rising popularity of Bondora. Borrowers' incomes also generally increase over time, which is expected, since average salaries in the countries where Bondora operates have risen in recent years. There is also a visible positive relationship between the amount borrowed and the loan duration: the longer the loan, the higher the amount borrowed. By contrast, the relationship between a borrower's age and the number of dependants is non-linear: it rises gradually from the age of 18, peaks between 40 and 45, and then gradually declines. Lastly, the monthly listing counts show that Slovak loans were listed only for a short period (mostly in 2014) and that listings from that country have since been phased out.