# Multivariate: https://courses.thinkful.com/data-001v2/project/2.5.2

1. Load the Lending Club Statistics.

2. Use income (annual_inc) to model interest rates (int_rate).

3. Add home ownership (home_ownership) to the model.

4. Does that affect the significance of the coefficients in the original model?

5. Try to add the interaction of home ownership and incomes as a term. How does this impact the new model?


## 1. Load the Lending Club Statistics.

In [2]:
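# Data is a local copy of https://resources.lendingclub.com/LoanStats3d.csv.zip

import numpy as np
import pandas as pd
import statsmodels.api as sm

# skiprows=[0] skips the first line of the file, which precedes the header row.
data = pd.read_csv('LoanStats3d.csv', skiprows=[0], low_memory=False)

# Encode home ownership as a binary indicator: 1 for 'OWN', 0 otherwise.
# 'MORTGAGE' could arguably be treated as ownership as well.
# Note: missing values are also mapped to 0 here, so the dropna below
# never removes rows on account of home_ownership.
data['home_ownership'] = data['home_ownership'].map(lambda x: 1 if x == 'OWN' else 0)

data.dropna(subset=['annual_inc', 'int_rate', 'home_ownership'], inplace=True)

data['annual_inc'] = data['annual_inc'].astype(float)
# int_rate is stored as a percent string; strip the '%' sign and convert to float.
data['int_rate'] = data['int_rate'].astype(str).str.strip().str.rstrip('%').astype(float)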
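
The cleaning steps above can be tried on a toy frame. This is only a sketch: the values are made up, and `toy`, `encoded` and `encoded_keep_na` are hypothetical names that do not appear in the notebook. It shows the percent-string conversion and how the 0/1 encoding treats a missing `home_ownership` value.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the relevant Lending Club columns (values are made up).
toy = pd.DataFrame({
    'int_rate': [' 13.99%', '  7.26%', '10.5%'],
    'home_ownership': ['OWN', 'RENT', np.nan],
})

# Percent strings to floats: trim whitespace, drop the trailing '%', then cast.
toy['int_rate'] = toy['int_rate'].str.strip().str.rstrip('%').astype(float)

# The notebook's encoding: anything that is not 'OWN', including NaN, becomes 0.
encoded = toy['home_ownership'].map(lambda x: 1 if x == 'OWN' else 0)

# Alternative that keeps missing values missing, so dropna can still remove them.
encoded_keep_na = toy['home_ownership'].map(
    lambda x: 1 if x == 'OWN' else 0, na_action='ignore'
)

print(toy['int_rate'].tolist())
print(encoded.tolist())
print(encoded_keep_na.isna().tolist())
```

Whether a missing ownership value should count as "not an owner" or be dropped is a modelling choice. Passing `na_action='ignore'` to `Series.map` is one way to leave that decision to `dropna`.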

## 2. Use income (annual_inc) to model interest rates (int_rate).

In [3]:
X = data[['annual_inc']]
y = data['int_rate']

X = sm.add_constant(X)
est = sm.OLS(y, X).fit()

est.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | int_rate | R-squared: | 0.009 |
| Model: | OLS | Adj. R-squared: | 0.009 |
| Method: | Least Squares | F-statistic: | 2726.0 |
| Date: | Sun, 08 May 2016 | Prob (F-statistic): | 0.0 |
| Time: | 14:33:16 | Log-Likelihood: | -839640.0 |
| No. Observations: | 290591 | AIC: | 1679000.0 |
| Df Residuals: | 290589 | BIC: | 1679000.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| const | 13.2418 | 0.012 | 1078.643 | 0.000 | 13.218, 13.266 |
| annual_inc | -6.336e-06 | 1.21e-07 | -52.214 | 0.000 | -6.57e-06, -6.1e-06 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 15431.694 | Durbin-Watson: | 1.99 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18215.084 |
| Skew: | 0.577 | Prob(JB): | 0.0 |
| Kurtosis: | 3.417 | Cond. No. | 154000.0 |
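
To read the `annual_inc` coefficient in practical terms, the fitted line can be evaluated by hand. The sketch below copies `const` and the `annual_inc` coefficient from the summary above. The function name `predicted_int_rate` and the two example incomes are assumptions made for illustration.

```python
# Coefficients copied from the summary above.
const = 13.2418
coef_annual_inc = -6.336e-06  # percentage points of int_rate per dollar of income


def predicted_int_rate(annual_inc):
    """Fitted interest rate (in percent) for a given annual income in dollars."""
    return const + coef_annual_inc * annual_inc


# Effect of a $10,000 increase in income on the fitted rate.
delta_per_10k = coef_annual_inc * 10_000

for income in (50_000, 100_000):
    print(f"{income:>7}: {predicted_int_rate(income):.3f}%")
print(f"change per $10,000: {delta_per_10k:.4f} percentage points")
```

The coefficient is statistically significant, but the shift per $10,000 of income is small, which is consistent with the reported R-squared of 0.009.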


## 3. Add home ownership (home_ownership) to the model.

In [4]:
X = data[['annual_inc', 'home_ownership']]
y = data['int_rate']

X = sm.add_constant(X)
est = sm.OLS(y, X).fit()

est.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | int_rate | R-squared: | 0.009 |
| Model: | OLS | Adj. R-squared: | 0.009 |
| Method: | Least Squares | F-statistic: | 1364.0 |
| Date: | Sun, 08 May 2016 | Prob (F-statistic): | 0.0 |
| Time: | 14:33:18 | Log-Likelihood: | -839640.0 |
| No. Observations: | 290591 | AIC: | 1679000.0 |
| Df Residuals: | 290588 | BIC: | 1679000.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| const | 13.2371 | 0.013 | 1046.879 | 0.000 | 13.212, 13.262 |
| annual_inc | -6.331e-06 | 1.21e-07 | -52.158 | 0.000 | -6.57e-06, -6.09e-06 |
| home_ownership | 0.0406 | 0.026 | 1.553 | 0.120 | -0.011, 0.092 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 15418.052 | Durbin-Watson: | 1.99 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 18194.333 |
| Skew: | 0.577 | Prob(JB): | 0.0 |
| Kurtosis: | 3.416 | Cond. No. | 330000.0 |
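
The large `Cond. No.` values in these summaries are plausibly a matter of scale, since `annual_inc` is in dollars while the constant column is all ones. The sketch below illustrates that idea on synthetic incomes, not on the Lending Club data. The income range, the variable names and the choice of $10,000 as the unit are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic incomes in dollars (illustrative only, not Lending Club data).
income = rng.uniform(20_000, 150_000, size=1_000)

# Design matrices with an intercept column plus income in two different units.
X_dollars = np.column_stack([np.ones_like(income), income])
X_10k = np.column_stack([np.ones_like(income), income / 10_000])

cond_dollars = np.linalg.cond(X_dollars)
cond_10k = np.linalg.cond(X_10k)

print(f"condition number, income in dollars: {cond_dollars:.3g}")
print(f"condition number, income in $10k:    {cond_10k:.3g}")
```

Rescaling a regressor changes the size of its coefficient by the same factor but leaves the fit itself (R-squared, t-statistics) unchanged, so it is a cheap way to make the output easier to read.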


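## 4. Does that affect the significance of the coefficients in the original model?

Barely. With `home_ownership` added, the `annual_inc` coefficient moves from -6.336e-06 to -6.331e-06 and its t-statistic from -52.214 to -52.158, so it remains significant (P>|t| = 0.000). The `home_ownership` coefficient itself (0.0406) is not significant at the 5% level: P>|t| = 0.120, and its 95% confidence interval (-0.011, 0.092) includes zero. R-squared stays at 0.009.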
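
One way to check this question against the two summaries in steps 2 and 3 is to compare the reported numbers directly. The sketch below uses only values copied from those summaries. The variable names are made up for illustration.

```python
# Values copied from the summaries in steps 2 and 3.
coef_before = -6.336e-06   # annual_inc, income-only model
coef_after = -6.331e-06    # annual_inc, with home_ownership added

# Relative change in the income coefficient after adding home_ownership.
rel_change = abs(coef_after - coef_before) / abs(coef_before)

# 95% confidence interval reported for the home_ownership coefficient.
ci_low, ci_high = -0.011, 0.092
ci_contains_zero = ci_low <= 0.0 <= ci_high

print(f"relative change in annual_inc coefficient: {rel_change:.2%}")
print(f"home_ownership CI contains zero: {ci_contains_zero}")
```

A relative change well under 0.1% and an interval that straddles zero both suggest that the ownership indicator adds little to the income-only model.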

## 5. Try to add the interaction of home ownership and incomes as a term. How does this impact the new model?

In [6]:
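# Note: this cell regresses home_ownership on annual_inc (a linear probability
# model). It does not add an income x home-ownership interaction term to the
# int_rate model.
X = data[['annual_inc']]
y = data['home_ownership']

X = sm.add_constant(X)
# est = sm.Logit(y, X).fit()  # earlier attempt; Logit suits a binary outcome, but OLS is used here
est = sm.OLS(y, X).fit()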

est.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | home_ownership | R-squared: | 0.001 |
| Model: | OLS | Adj. R-squared: | 0.001 |
| Method: | Least Squares | F-statistic: | 185.8 |
| Date: | Sun, 08 May 2016 | Prob (F-statistic): | 2.77e-42 |
| Time: | 14:33:39 | Log-Likelihood: | -71028.0 |
| No. Observations: | 290591 | AIC: | 142100.0 |
| Df Residuals: | 290589 | BIC: | 142100.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [95.0% Conf. Int.] |
|---|---|---|---|---|---|
| const | 0.1159 | 0.001 | 132.980 | 0.000 | 0.114, 0.118 |
| annual_inc | -1.174e-07 | 8.62e-09 | -13.629 | 0.000 | -1.34e-07, -1.01e-07 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 139999.626 | Durbin-Watson: | 1.995 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 554053.277 |
| Skew: | 2.542 | Prob(JB): | 0.0 |
| Kurtosis: | 7.463 | Cond. No. | 154000.0 |
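
The cell above models `home_ownership` as a function of `annual_inc`. An interaction term is more commonly added as an extra regressor in the interest-rate model, namely the product of income and the ownership indicator. The sketch below shows that construction on synthetic data, so its numbers say nothing about the Lending Club results. The "true" coefficients, the variable names and the rescaling of income to units of $10,000 are all assumptions, and `np.linalg.lstsq` stands in for `sm.OLS` to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic stand-ins for the notebook's columns (not Lending Club data).
income_10k = rng.uniform(2.0, 15.0, size=n)   # annual_inc / 10_000
owns_home = rng.integers(0, 2, size=n)        # 1 if 'OWN', else 0

# Assumed "true" coefficients, chosen only for illustration:
# const, income, ownership, income x ownership.
true_beta = np.array([13.0, -0.06, 0.5, -0.03])

# Design matrix: intercept, main effects, and the product (interaction) column.
X = np.column_stack([
    np.ones(n),
    income_10k,
    owns_home,
    income_10k * owns_home,
])
y = X @ true_beta + rng.normal(0.0, 0.5, size=n)

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

for name, b in zip(['const', 'income', 'owns_home', 'income:owns_home'], beta_hat):
    print(f"{name:>17}: {b: .4f}")
```

In the notebook itself, the analogous step would be to add a product column such as `data['annual_inc'] * data['home_ownership']` to `X` before calling `sm.OLS(y, X).fit()` with `int_rate` as `y`. The coefficient on that column estimates how the income slope differs between owners and non-owners.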
