# Understanding Logistic Regression Tables

Using the same code as in the previous exercise, try to interpret the summary table.

### More information about the dataset
Note that <i>interest rate</i> is the 3-month interest rate between banks, and <i>duration</i> is the time since the last contact was made with a given consumer. The <i>previous</i> variable shows whether the last marketing campaign was successful with this customer. <i>March</i> and <i>May</i> are Boolean variables that record when the call was made to the specific customer, and <i>credit</i> shows whether the customer has enough credit to avoid defaulting.

<i> Notes: 
    <li> the first column of the dataset is an index one; </li>
    <li> you don't need the graph for this exercise; </li>
    <li> the dataset used is much bigger </li>
</i>

## Import the relevant libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
sns.set()

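from scipy import stats
# scipy.stats.chisqprob was removed in SciPy 1.0, but older statsmodels releases
# still expect it when building summary(); restore it as the chi-squared survival function
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)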
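The patch only needs `stats.chi2.sf`, the chi-squared survival function (1 - CDF), which gives the upper-tail probability used for p-values. A minimal sketch, using a local function named `chisqprob` (a hypothetical stand-in, not something SciPy exports today), that checks it against the closed form for 2 degrees of freedom, where the survival function is exactly exp(-x / 2):

```python
import math

from scipy import stats


def chisqprob(chisq, df):
    """Upper-tail probability P(X > chisq) for a chi-squared variable with df degrees of freedom."""
    return stats.chi2.sf(chisq, df)


# For df = 2 the chi-squared survival function has the closed form exp(-x / 2)
for x in (0.5, 2.0, 10.0):
    print(x, chisqprob(x, 2), math.exp(-x / 2))
```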

## Load the data

Load the 'Bank-data.csv' dataset.

In [2]:
data = pd.read_csv('Bank-data.csv')

In [3]:
data.head()

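|   | Unnamed: 0 | interest_rate | credit | march | may | previous | duration | y |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.334 | 0.0 | 1.0 | 0.0 | 0.0 | 117.0 | no |
| 1 | 1 | 0.767 | 0.0 | 0.0 | 2.0 | 1.0 | 274.0 | yes |
| 2 | 2 | 4.858 | 0.0 | 1.0 | 0.0 | 0.0 | 167.0 | no |
| 3 | 3 | 4.12 | 0.0 | 0.0 | 0.0 | 0.0 | 686.0 | yes |
| 4 | 4 | 4.856 | 0.0 | 1.0 | 0.0 | 0.0 | 157.0 | no |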
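Because the first column of the file is only an index (see the notes above), it shows up as the extra `Unnamed: 0` column. A minimal sketch of one way to avoid it, assuming the file's first header cell is empty: pass `index_col=0` to `pd.read_csv`. The in-memory CSV text below is a hypothetical stand-in built from the first three rows shown above, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'Bank-data.csv': the first header cell is empty,
# because the file was saved together with its row index
csv_text = """,interest_rate,credit,march,may,previous,duration,y
0,1.334,0.0,1.0,0.0,0.0,117.0,no
1,0.767,0.0,0.0,2.0,1.0,274.0,yes
2,4.858,0.0,1.0,0.0,0.0,167.0,no
"""

# Default read: the saved index becomes an extra 'Unnamed: 0' column
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 uses the first column as the row index instead
data = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(data.columns.tolist())
```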


In [5]:
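# keep only the columns used in the regression ('Unnamed: 0', 'march' and 'may' are dropped)
new_data = data[['interest_rate','credit','previous','duration','y']]

clean_data = new_data.copy()

# encode the target: 'yes' -> 0, 'no' -> 1, so the model predicts the probability of 'no'
clean_data['y'] = clean_data['y'].map({'yes':0,'no':1})

clean_data.head()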

|   | interest_rate | credit | previous | duration | y |
|---|---|---|---|---|---|
| 0 | 1.334 | 0.0 | 0.0 | 117.0 | 1 |
| 1 | 0.767 | 0.0 | 1.0 | 274.0 | 0 |
| 2 | 4.858 | 0.0 | 0.0 | 167.0 | 1 |
| 3 | 4.12 | 0.0 | 0.0 | 686.0 | 0 |
| 4 | 4.856 | 0.0 | 0.0 | 157.0 | 1 |
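The mapping above turns 'no' into 1 and 'yes' into 0, so every coefficient in the regression describes the log-odds of a customer answering 'no'. A quick sanity check, sketched on the five labels from `data.head()`: `Series.map` returns `NaN` for any label missing from the dictionary, so counting nulls after mapping catches typos such as 'Yes'. The conventional alternative `{'yes': 1, 'no': 0}` would fit the same model with the sign of every coefficient flipped.

```python
import pandas as pd

# Target labels of the first five rows of the dataset
labels = pd.Series(['no', 'yes', 'no', 'yes', 'no'])

# Encoding used in this notebook: 'no' -> 1, 'yes' -> 0
y_notebook = labels.map({'yes': 0, 'no': 1})

# Conventional encoding: 'yes' -> 1, 'no' -> 0
y_conventional = labels.map({'yes': 1, 'no': 0})

# Unmapped labels would show up as NaN; there should be none
print(y_notebook.isna().sum())   # 0
print(y_notebook.tolist())       # [1, 0, 1, 0, 1]

# The two encodings are complements of each other on every row
print((y_notebook + y_conventional == 1).all())
```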


### Declare the dependent and independent variables

Use 'interest_rate', 'credit', 'previous' and 'duration' as the independent variables and 'y' as the dependent variable.

In [23]:
y = clean_data['y']
x1 = clean_data[['interest_rate','credit','previous','duration']]

### Logistic Regression

Run the regression.

In [24]:
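x = sm.add_constant(x1)      # add an intercept column named 'const'
reg_log = sm.Logit(y,x)      # logistic regression of y on the four predictors
results_log = reg_log.fit()  # estimate the coefficients by maximum likelihood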

Optimization terminated successfully.
         Current function value: 0.370460
         Iterations 7
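The "Current function value" printed during fitting appears to be the average negative log-likelihood per observation, the quantity the optimizer minimizes. Multiplying it by the number of observations in the summary table recovers the model's log-likelihood. A small worked check using only numbers printed in this notebook:

```python
# Values printed by results_log.fit() and reported in the summary table
avg_neg_loglik = 0.370460   # "Current function value"
n_obs = 518                 # "No. Observations"

# Total log-likelihood of the fitted model
log_likelihood = -avg_neg_loglik * n_obs
print(round(log_likelihood, 1))   # -191.9, matching "Log-Likelihood"
```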


### Interpretation

In [25]:
results_log.summary()

|   |   |   |   |
|---|---|---|---|
| Dep. Variable: | y | No. Observations: | 518 |
| Model: | Logit | Df Residuals: | 513 |
| Method: | MLE | Df Model: | 4 |
| Date: | Tue, 28 May 2019 | Pseudo R-squ.: | 0.4655 |
| Time: | 09:55:31 | Log-Likelihood: | -191.9 |
| converged: | True | LL-Null: | -359.05 |
|   |   | LLR p-value: | 4.291e-71 |
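Two of these statistics can be reproduced from the log-likelihoods. statsmodels reports McFadden's pseudo R-squared, 1 - LL / LL-Null, and the LLR p-value comes from the likelihood-ratio statistic 2 (LL - LL-Null), compared against a chi-squared distribution with Df Model = 4 degrees of freedom. A minimal sketch using the rounded values from the table, so the results agree only approximately:

```python
from scipy import stats

ll_model = -191.9    # Log-Likelihood of the fitted model
ll_null = -359.05    # LL-Null: log-likelihood of the intercept-only model
df_model = 4         # Df Model: number of predictors

# McFadden's pseudo R-squared
pseudo_r2 = 1 - ll_model / ll_null
print(round(pseudo_r2, 4))  # 0.4655

# Likelihood-ratio test of the full model against the intercept-only model
llr = 2 * (ll_model - ll_null)
p_value = stats.chi2.sf(llr, df_model)
print(llr, p_value)
```

A p-value this close to zero says the four predictors together explain the outcome far better than a model with only an intercept.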

|   | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 0.6332 | 0.274 | 2.313 | 0.021 | 0.097 | 1.170 |
| interest_rate | 0.7028 | 0.082 | 8.520 | 0.000 | 0.541 | 0.864 |
| credit | -2.8717 | 1.080 | -2.659 | 0.008 | -4.988 | -0.755 |
| previous | -1.5894 | 0.479 | -3.316 | 0.001 | -2.529 | -0.650 |
| duration | -0.0066 | 0.001 | -9.389 | 0.000 | -0.008 | -0.005 |
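Because the target is encoded as 'no' = 1, each coefficient is the change in the log-odds of a 'no' for a one-unit increase in that variable, holding the others fixed. Exponentiating a coefficient turns it into an odds ratio. With the fitted model you could call `np.exp(results_log.params)`; the sketch below copies the coefficients from the table instead, so it runs on its own:

```python
import numpy as np

# Coefficients from the summary table (target: y = 1 means 'no')
coefs = {
    'const': 0.6332,
    'interest_rate': 0.7028,
    'credit': -2.8717,
    'previous': -1.5894,
    'duration': -0.0066,
}

# Odds ratio = exp(coefficient): multiplicative change in the odds of 'no'
# for a one-unit increase in the variable, other variables held fixed
odds_ratios = {name: float(np.exp(b)) for name, b in coefs.items()}

for name, ratio in odds_ratios.items():
    print(f'{name:>14}: {ratio:.4f}')
```

For example, an odds ratio of about 2.02 for interest_rate means each one-unit increase roughly doubles the odds of a 'no', while credit = 1 multiplies those odds by about 0.06. All four predictors have p-values below 0.05.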


In [26]:
results_log.pred_table()  # rows: actual 0/1, columns: predicted 0/1 (default threshold 0.5)

array([[226.,  33.],
       [ 43., 216.]])
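`pred_table()` compares the observed `y` with a 0/1 prediction obtained by thresholding the model's predicted probability at 0.5 (its default `threshold` argument); rows are the actual classes and columns the predicted ones. A sketch of that step for the first customer in `clean_data` (interest_rate = 1.334, credit = 0, previous = 0, duration = 117), using the rounded coefficients from the table:

```python
import math

# Rounded coefficients from the summary table (y = 1 means 'no')
const, b_rate, b_credit = 0.6332, 0.7028, -2.8717
b_prev, b_dur = -1.5894, -0.0066

# First row of clean_data
interest_rate, credit, previous, duration = 1.334, 0.0, 0.0, 117.0

# Linear predictor: the log-odds of 'no'
log_odds = (const + b_rate * interest_rate + b_credit * credit
            + b_prev * previous + b_dur * duration)

# The logistic (sigmoid) function turns log-odds into a probability
prob_no = 1 / (1 + math.exp(-log_odds))

# A probability above the threshold is counted as a prediction of 1
predicted = 1 if prob_no > 0.5 else 0
print(round(prob_no, 3), predicted)  # the actual y for this row is 1
```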

In [27]:
cm_df = pd.DataFrame(results_log.pred_table())
cm_df.columns = ['Predicted 0','Predicted 1']
cm_df = cm_df.rename(index={0:'Actual 0',1:'Actual 1'})

cm_df.head()

|   | Predicted 0 | Predicted 1 |
|---|---|---|
| Actual 0 | 226.0 | 33.0 |
| Actual 1 | 43.0 | 216.0 |
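The same kind of table can be built directly from two label arrays, which is handy when checking a model on data it was not fitted on. A minimal sketch with small hypothetical arrays `actual` and `predicted` (illustrative values, not taken from the dataset); entry [i, j] counts observations with actual class i and predicted class j, matching the layout of `cm_df`:

```python
import numpy as np
import pandas as pd

# Hypothetical labels for illustration only
actual = np.array([0, 0, 1, 1, 1, 0, 1, 0])
predicted = np.array([0, 1, 1, 1, 0, 0, 1, 0])

# Count every (actual, predicted) pair
cm = np.zeros((2, 2), dtype=int)
for a, p in zip(actual, predicted):
    cm[a, p] += 1

cm_df = pd.DataFrame(cm,
                     index=['Actual 0', 'Actual 1'],
                     columns=['Predicted 0', 'Predicted 1'])
print(cm_df)
```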


In [29]:
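# correct predictions lie on the diagonal of the confusion matrix
pred_good = 226 + 216
# misclassifications lie off the diagonal
pred_bad = 43 + 33
com = pred_good + pred_bad   # total number of observations (518)

In [30]:
pred_ = pred_good/com * 100  # overall accuracy in percent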

In [31]:
pred_

85.32818532818533
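The accuracy above is the share of observations on the diagonal of the confusion matrix. Reading the counts straight from the matrix avoids retyping them; a sketch using the `pred_table()` output shown earlier, plus the per-class hit rates (each diagonal cell divided by its row total). Under this notebook's encoding, class 0 is 'yes' and class 1 is 'no'.

```python
import numpy as np

# Confusion matrix from results_log.pred_table(): rows = actual, columns = predicted
cm = np.array([[226., 33.],
               [43., 216.]])

# Overall accuracy: correct predictions (diagonal) over all observations
accuracy = np.trace(cm) / cm.sum() * 100
print(round(accuracy, 2))   # 85.33

# Share of each actual class that was classified correctly
per_class = np.diag(cm) / cm.sum(axis=1)
print(per_class.round(3))
```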