## Analysis

In [1]:
import pandas as pd

import matplotlib.pyplot as plt
# Note: Matplotlib 3.6+ renames these styles to 'seaborn-v0_8' and 'seaborn-v0_8-talk'.
plt.style.use('seaborn')
plt.style.use('seaborn-talk')

In [11]:
arima_predictions = pd.read_csv('data/arima_predictions', index_col='date')
arima_predictions.index = pd.to_datetime(arima_predictions.index)
arima_predictions.tail()

date,65807,65802,65804,65810,65806,65809
2019-12-01,137481.612848,100931.11478,168107.42912,208566.975,64758.46124,314575.448793
2020-01-01,138193.703838,101074.440374,169160.295731,208933.78968,64901.576122,316724.333984
2020-02-01,138907.295866,101217.932291,170215.629355,209301.077634,65046.137845,318882.407084
2020-03-01,139622.377104,101361.578804,171273.430893,209668.783798,65192.146407,321049.672687
2020-04-01,140338.9429,101505.374645,172333.700868,210036.867794,65339.60181,323226.133471


In [3]:
prophet_predictions = pd.read_csv('data/prophet_predictions.csv', index_col='date')
prophet_predictions.index = pd.to_datetime(prophet_predictions.index)
prophet_predictions.tail()

date,65807,65802,65804,65810,65806,65809
2019-12-01,127070.233087,99174.442951,148194.53687,202212.222251,63707.912363,265616.752851
2020-01-01,127517.386168,99590.750794,148669.931055,202847.907779,63653.907614,266730.304083
2020-02-01,127809.488868,99904.390392,149031.206487,203300.8252,63584.750557,267542.519975
2020-03-01,128064.701521,100012.391682,149444.405223,203721.232062,63874.360932,268200.753897
2020-04-01,128651.011816,100255.124506,149841.237435,204064.850838,63873.122266,267778.466056


### RMSE comparison

In [4]:
arima_rmses = pd.read_csv('data/arima_rmses.csv', index_col=0)
fb_rmses = pd.read_csv('data/fb_rmses.csv', index_col=0)
fb_rmses.drop(columns=['zip_code'], inplace=True)

In [5]:
df_rmse = pd.concat([arima_rmses, fb_rmses], axis=1)

In [6]:
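def best_model(row):
    """Return the name of the model with the lower RMSE for this zip code."""
    # Ties fall through to 'fb' because the comparison is strict.
    if row['arima_rmse'] < row['fb_rmse']:
        return 'arima'
    return 'fb'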

df_rmse['best_model'] = df_rmse.apply(best_model, axis=1)

In [7]:
df_rmse

,zip_code,arima_rmse,fb_rmse,best_model
0,65807,1731,3829,arima
1,65802,3342,3074,fb
2,65804,2229,4339,arima
3,65810,4222,6539,arima
4,65806,3073,3740,arima
5,65809,11553,13054,arima


> Note that ARIMA had the lower RMSE for every zip code except 65802, where Prophet's RMSE was lower (3074 vs. 3342).
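
The RMSE files are loaded precomputed, so the notebook does not show how the scores were derived. For reference, here is a minimal sketch of the usual RMSE definition. The function name `rmse` and the sample numbers are illustrative assumptions, not the code or data behind `arima_rmses.csv` or `fb_rmses.csv`.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Made-up numbers, purely for illustration (not from the housing data)
sample_actual = [100.0, 110.0, 120.0]
sample_predicted = [102.0, 108.0, 123.0]
sample_rmse = rmse(sample_actual, sample_predicted)
print(round(sample_rmse, 2))
```

Because RMSE is expressed in the same units as the target, a zip code with higher prices will tend to show a larger RMSE even when its percentage error is similar.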

In [9]:
from scipy import stats
import datetime as dt

arima_predictions['date_ordinal'] = arima_predictions.index.map(dt.datetime.toordinal)
prophet_predictions['date_ordinal'] = prophet_predictions.index.map(dt.datetime.toordinal)

# Slopes are in price units per day, because the x-axis is the date ordinal.
print('Arima slope')
for col in arima_predictions.columns[:-1]:  # skip the trailing date_ordinal column
    slope = stats.linregress(arima_predictions['date_ordinal'], arima_predictions[col]).slope
    print(col, 'slope:', round(slope, 2))

print('\n\nFb prophet average slope')
for col in prophet_predictions.columns[:-1]:  # iterate over the Prophet columns
    slope = stats.linregress(prophet_predictions['date_ordinal'], prophet_predictions[col]).slope
    print(col, 'slope:', round(slope, 2))

Arima slope
65807 slope: 22.76
65802 slope: 4.99
65804 slope: 34.19
65810 slope: 11.8
65806 slope: 4.32
65809 slope: 69.41


Fb prophet average slope
65807 slope: 12.9
65802 slope: 8.21
65804 slope: 11.88
65810 slope: 15.84
65806 slope: 4.36
65809 slope: 12.58
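
Because the regression uses the date ordinal (a count of days) as its x-axis, each slope above is a price change per day. As a rough check, the ARIMA forecast for 65807 rises from 137481.61 on 2019-12-01 to 138193.70 on 2020-01-01, about 712 over 31 days, or roughly 23 per day, in line with the 22.76 slope. The sketch below converts the printed slopes to an approximate yearly change. The `DAYS_PER_YEAR` constant and the variable names are assumptions added for illustration.

```python
import pandas as pd

# Per-day slopes copied from the printed output above
slopes = pd.DataFrame(
    {
        'arima': [22.76, 4.99, 34.19, 11.80, 4.32, 69.41],
        'prophet': [12.90, 8.21, 11.88, 15.84, 4.36, 12.58],
    },
    index=['65807', '65802', '65804', '65810', '65806', '65809'],
)

DAYS_PER_YEAR = 365.25  # assumption: average calendar year length

# Approximate price change per year implied by each linear trend
annual_change = (slopes * DAYS_PER_YEAR).round(0)
print(annual_change)
```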


>Ranked by the slope of the ARIMA forecast (price change per day), the zip codes with the greatest potential are:
><ol>
> <li>65809 slope: 69.41 (high relative RMSE)</li>
> <li>65804 slope: 34.19</li>
> <li>65807 slope: 22.76</li>
> <li>65810 slope: 11.80</li>
> <li>65802 slope: 4.99</li>
> <li>65806 slope: 4.32</li>
></ol>

>Ranked by the slope of the FB Prophet forecast (price change per day), the zip codes with the greatest potential are:
><ol>
> <li>65810 slope: 15.84</li>
> <li>65807 slope: 12.90</li>
> <li>65809 slope: 12.58 (high relative RMSE)</li>
> <li>65804 slope: 11.88</li>
> <li>65802 slope: 8.21</li>
> <li>65806 slope: 4.36</li>
></ol>
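
The two ranked lists above disagree in places, which is easier to see side by side. The following sketch rebuilds both rankings from the printed slopes and reports how far apart the two models place each zip code. The `rank_gap` column is a hypothetical helper introduced here, not part of the original analysis.

```python
import pandas as pd

# Per-day slopes copied from the printed output
slopes = pd.DataFrame(
    {
        'arima': [22.76, 4.99, 34.19, 11.80, 4.32, 69.41],
        'prophet': [12.90, 8.21, 11.88, 15.84, 4.36, 12.58],
    },
    index=['65807', '65802', '65804', '65810', '65806', '65809'],
)

# Rank 1 = steepest slope within each model
ranks = slopes.rank(ascending=False).astype(int)

# Absolute difference between the two models' ranks for each zip code
ranks['rank_gap'] = (ranks['arima'] - ranks['prophet']).abs()

print(ranks.sort_values('arima'))
```

Both models place 65802 and 65806 at the bottom. The disagreement is concentrated in the top four positions.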

## Conclusion

Based on the models, we conclude that 65804 is the best zip code for the home renovation company to invest in. By focusing on the 65804 area, the company should see average housing prices continue to increase over the next two years.

A few reasons we came to this decision:
<ul>
<li>65804 had the second-highest slope in our ARIMA model, behind only 65809 (which was eliminated in the decision-making process due to a high relative RMSE)</li>
<li>65804 had the second-lowest ARIMA RMSE of the six zip codes (2229), behind only 65807 (1731)</li>
</ul>
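
The reasoning above can be sketched as a simple selection rule. The notebook does not state a numeric cutoff for a "high relative RMSE", so this sketch assumes the simplest reading: drop the single zip code with the largest ARIMA RMSE, then pick the steepest remaining ARIMA slope. The `summary`, `candidates`, and `choice` names are illustrative.

```python
import pandas as pd

# ARIMA slopes and RMSEs copied from the outputs earlier in the notebook
summary = pd.DataFrame(
    {
        'arima_slope': [22.76, 4.99, 34.19, 11.80, 4.32, 69.41],
        'arima_rmse': [1731, 3342, 2229, 4222, 3073, 11553],
    },
    index=pd.Index(
        ['65807', '65802', '65804', '65810', '65806', '65809'],
        name='zip_code',
    ),
)

# Assumed rule: exclude the zip code with the largest ARIMA RMSE ...
candidates = summary.drop(index=summary['arima_rmse'].idxmax())

# ... then choose the steepest ARIMA slope among the rest
choice = candidates['arima_slope'].idxmax()
print(choice)  # 65804
```

A different exclusion rule, such as a threshold on RMSE as a share of price, could change which zip codes are dropped, so the cutoff is worth stating explicitly.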
