<h1 style="color:blue;">Scenario 5 notebook</h1>

- C1.S5.Py01	Create a Simple Regression Model Using statsmodels
- C1.S5.Py02	How to create residuals and export to a DataFrame
- C1.S5.Py03	Graphically looking at y and its residuals

In [None]:
#Code Block 1

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns



#style options

%matplotlib inline
# display graphs automatically in the notebook, without calling plt.show()

pd.set_option('display.max_columns',500) #allows for up to 500 columns to be displayed when viewing a dataframe

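# the 'seaborn' style was renamed 'seaborn-v0_8' in Matplotlib 3.6; use whichever name this installation provides
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'seaborn')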



In [None]:
#Code Block 2
df = pd.read_csv('data/Scenario5.csv', index_col=0, header=0)
    # index_col=0 sets the first column as the index
    # header=0 uses the top row as the column headers

In [None]:
#Code Block 3
df.head(3)

<h2 style="color:blue;">C1.S5.Py01 - Creating a Simple Regression Model using statsmodels</h2>


### How to import and use statsmodels
https://anaconda.org/anaconda/statsmodels

In [None]:
#Code Block 4
import statsmodels
import statsmodels.api as sm

### Using a simple dataset (tips)

In [None]:
#Code Block 5
df_tips = sns.load_dataset('tips')
df_tips

In [None]:
#Code Block 6
plt.figure(figsize=(20,12))
sns.regplot(x='total_bill', y='tip', data = df_tips, scatter_kws={"color":"green","alpha":0.5,"s":200, "linewidth":2,"edgecolor":"white"},
           line_kws={'color': 'black'})

In [None]:
#Code Block 7
X = df_tips['total_bill']
y = df_tips['tip']
X = sm.add_constant(X) # adding a constant

reg_tips = sm.OLS(y, X).fit()

predictions = reg_tips.predict(X)
reg_tips.summary()

#### AIC and BIC penalize model complexity and reward simplicity; when comparing models fit to the same data, lower values are better.
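
Both criteria start from the model's log-likelihood and add a penalty for each estimated coefficient: 2 per coefficient for AIC and log(n) per coefficient for BIC. The following is a minimal sketch of that trade-off, assuming the standard Gaussian-likelihood definitions for a least-squares fit. The data is synthetic, and `gaussian_aic_bic` is a hypothetical helper written for this illustration, not a library function. A fitted statsmodels OLS result reports the same quantities as `aic` and `bic`.

```python
import numpy as np

def gaussian_aic_bic(y, y_hat, k):
    """Return (AIC, BIC) for a least-squares fit with k estimated coefficients."""
    n = len(y)
    ssr = float(np.sum((y - y_hat) ** 2))
    # Gaussian log-likelihood evaluated at the least-squares solution
    llf = -0.5 * n * (np.log(2 * np.pi) + np.log(ssr / n) + 1)
    aic = -2 * llf + 2 * k          # penalty of 2 per coefficient
    bic = -2 * llf + k * np.log(n)  # penalty of log(n) per coefficient
    return aic, bic

# Synthetic data with a truly linear relationship (illustrative values only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 0.5 * x + rng.normal(scale=1.0, size=200)

# Simple model: intercept + slope (k = 2)
simple_fit = np.polyval(np.polyfit(x, y, deg=1), x)
# More complex model: cubic polynomial (k = 4)
complex_fit = np.polyval(np.polyfit(x, y, deg=3), x)

aic_simple, bic_simple = gaussian_aic_bic(y, simple_fit, k=2)
aic_complex, bic_complex = gaussian_aic_bic(y, complex_fit, k=4)

print(f"simple : AIC={aic_simple:.2f}  BIC={bic_simple:.2f}")
print(f"complex: AIC={aic_complex:.2f}  BIC={bic_complex:.2f}")
```

The cubic model can never fit worse in terms of squared error, so any advantage for the simple model comes entirely from the penalty terms. With more than about 7 observations, log(n) exceeds 2, so BIC penalizes each extra coefficient more heavily than AIC.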

In [None]:
#Code Block 8
plt.figure(figsize=(20,8))
sns.residplot(x='total_bill', y='tip', data = df_tips, scatter_kws={"color":"blue","alpha":0.5,"s":150, "linewidth":2,"edgecolor":"white"},
           line_kws={'color': 'black'})

### Look at Amount Funded and Interest Rate
- Does Amount Funded have a relationship with Interest Rate?
- The independent variable (feature) = Amount Funded
- The dependent variable (target variable) = Interest Rate
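
With one feature and a constant, the regression line has a closed form: the slope is the covariance of the feature and the target divided by the variance of the feature, and the intercept makes the line pass through the two means. The sketch below works this out with NumPy on synthetic data. The arrays `amount_funded` and `interest_rate` are hypothetical stand-ins, not the Scenario 5 values.

```python
import numpy as np

# Hypothetical stand-ins for the feature and target (synthetic, illustrative only)
rng = np.random.default_rng(42)
amount_funded = rng.uniform(1_000, 35_000, size=500)
interest_rate = 8.0 + 0.0001 * amount_funded + rng.normal(scale=3.0, size=500)

x_mean = amount_funded.mean()
y_mean = interest_rate.mean()

# Slope = sum of cross-deviations / sum of squared deviations of the feature
slope = np.sum((amount_funded - x_mean) * (interest_rate - y_mean)) / np.sum(
    (amount_funded - x_mean) ** 2
)
# Intercept forces the fitted line through the point (x_mean, y_mean)
intercept = y_mean - slope * x_mean

# Pearson correlation summarizes the strength of the linear relationship
r = np.corrcoef(amount_funded, interest_rate)[0, 1]

print(f"slope={slope:.6f}  intercept={intercept:.3f}  r={r:.3f}")
```

The slope answers the question in the list above in the target's units: it is the expected change in the dependent variable for a one-unit change in the independent variable.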

In [None]:
#Code Block 9
X = df['Amount Funded'] # the feature named above; the residual cells below also use this column
y = df['Interest Rate']
X = sm.add_constant(X) # adding a constant

reg = sm.OLS(y, X).fit()

predictions = reg.predict(X)
reg.summary()

<h2 style="color:blue;">C1.S5.Py02 - How to create residuals and export to a DataFrame</h2>    

### Manually calculating residuals
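
A residual is the observed value minus the model's prediction for the same row. The steps in the cells below can be sketched on a small synthetic frame as follows. This is a hedged illustration: the column names `x`, `y`, `y_pred`, and `residual` are hypothetical, and `np.polyfit` stands in for the fitted statsmodels model.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan data (synthetic, illustrative only)
rng = np.random.default_rng(1)
toy = pd.DataFrame({"x": rng.uniform(1_000, 35_000, size=50)})
toy["y"] = 10.0 + 0.0002 * toy["x"] + rng.normal(scale=2.0, size=50)

# Least-squares line with an intercept
slope, intercept = np.polyfit(toy["x"], toy["y"], deg=1)

# Predictions as a named Series that shares the frame's index
pred = (intercept + slope * toy["x"]).rename("y_pred")

# concat with axis=1 aligns rows on the index, so each prediction
# lands next to the observation it belongs to
toy = pd.concat([toy, pred], axis=1)
toy["residual"] = toy["y"] - toy["y_pred"]

print(toy.head())
print("sum of residuals:", toy["residual"].sum())
```

Because the fit includes an intercept, the residuals sum to zero up to floating-point error, which makes a quick sanity check on a manual calculation.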

In [None]:
#Code Block 10
df_simplereg = df[['Amount Funded', 'Interest Rate']]
df_simplereg.head()

In [None]:
#Code Block 11
predictions = pd.DataFrame(predictions)
predictions=predictions.rename(columns = {0:'Interest_Pred'})
predictions.head()

In [None]:
#Code Block 12
df_simplereg = pd.concat([df_simplereg, predictions], axis=1)
df_simplereg['Calc_Residual'] = df_simplereg['Interest Rate'] - df_simplereg['Interest_Pred']
df_simplereg.head(10)

### Calculate residuals using statsmodels

In [None]:
#Code Block 13
resid = reg.resid
resid

In [None]:
#Code Block 14
# name the residual Series before concatenating so the new column is labelled 'Residual'
df_simplereg = pd.concat([df_simplereg, resid.rename('Residual')], axis=1)
df_simplereg.head()

<h2 style="color:blue;">C1.S5.Py03 - Graphically looking at y and its residuals</h2>    

In [None]:
#Code Block 15
sns.regplot(x='Amount Funded', y='Interest Rate', data = df_simplereg, scatter_kws={"color":"green","alpha":0.15,"s":20},
           line_kws={'color': 'black'})

https://seaborn.pydata.org/generated/seaborn.residplot.html

In [None]:
#Code Block 16
plt.figure(figsize=(20,10)) #changes area of scatterplot
sns.residplot(x='Amount Funded', y='Interest Rate',
              data = df_simplereg, scatter_kws={"color":"blue","alpha":0.15, "s":100,"linewidth":2,"edgecolor":"white"},
              line_kws={'color': 'black'})

In [None]:
#Code Block 17
plt.figure(figsize=(20,10)) #changes area of scatterplot
sns.regplot(x='Amount Funded', y='Interest Rate', data = df, scatter_kws={"color":"grey","alpha":0.15,"s":150,"linewidth":2,"edgecolor":"white"},
           line_kws={'color': 'red'})
plt.title('Seaborn regplot for Amount Funded and Actual Interest Rate', color = 'green', fontsize=18)
plt.xlabel('Amount Funded', color = 'red', fontsize=14)
plt.ylabel('Actual Interest Rate', color = 'red', fontsize=14)

In [None]:
#Code Block 18
# lmplot is a figure-level function that creates its own figure,
# so its size is set with height and aspect rather than plt.figure
sns.lmplot(x='Amount Funded', y='Interest Rate', hue="Home Ownership", data = df, palette="Set1",
           height = 10, aspect = 2, scatter_kws={"alpha":0.15,"s":150,"linewidth":2,"edgecolor":"white"})
plt.title('Seaborn lmplot for Amount Funded and Actual Interest Rate', color = 'green', fontsize=18)
plt.xlabel('Amount Funded', color = 'red', fontsize=14)
plt.ylabel('Actual Interest Rate', color = 'red', fontsize=14)

In [None]:
#Code Block 19
# lmplot creates its own figure; the size of each facet is controlled by height and aspect
sns.lmplot(x='Amount Funded', y='Interest Rate', col="Home Ownership", data = df, palette="Set1",
           aspect = 2, scatter_kws={"alpha":0.15,"s":150,"linewidth":2,"edgecolor":"white"})

In [None]:
#Code Block 20
# lmplot creates its own figure; col_wrap=2 starts a new row of facets after every two columns
sns.lmplot(x='Amount Funded', y='Interest Rate', col="Home Ownership", col_wrap=2, data = df, palette="Set1",
           aspect = 2, scatter_kws={"alpha":0.15,"s":150,"linewidth":2,"edgecolor":"white"}, line_kws={'color': 'red'})