# Modeling

Now that we have explored our data and cleaned it up into an acceptable format, we are ready to run some modeling techniques on it. We will begin by building several standard models on the data set to see how well they perform. After this, we will build a Neural Network to see whether it can improve on the performance of these baseline models.

In [1]:
#Import Libraries
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
import matplotlib.pyplot as plt

## Loading Dataset

This dataset has already been cleaned up and preprocessed. Among its columns, we expect at least the following:
* SALEPRICE
* PROPERTYZIP

> NOTE/TODO: Currently, PROPERTYZIP is not used as a feature. We plan to build a model for each ZIP code, but that depends on our preliminary results (to see whether it is even necessary).

In [2]:
# Loading Data
data = pd.read_csv('preprocess.csv')

# Rename columns: swap square brackets for parentheses, replace ', ' with '-', and drop remaining spaces
column_mapping = {}
for col in list(data.columns):

    new_col = col.replace(']', ')')
    new_col = new_col.replace('[', '(')
    new_col = new_col.replace(', ', '-')
    new_col = new_col.replace(' ', '')
    column_mapping[col] = new_col

data = data.rename(columns=column_mapping)
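The loop builds an explicit old-to-new mapping. `DataFrame.rename` also accepts a function that it applies to every label, so the same substitutions can be written that way. Below is a minimal sketch. The sample column names are hypothetical. They were chosen to resemble the interval labels produced by binning with `pd.cut` (for example `(1.0, 17000.0]`), which is presumably where the brackets in the raw columns come from.

```python
import pandas as pd

def clean_column_name(col: str) -> str:
    """Apply the same substitutions as the loop above to a single column name."""
    return (col.replace(']', ')')
               .replace('[', '(')
               .replace(', ', '-')
               .replace(' ', ''))

# Hypothetical column names resembling interval labels from pd.cut
demo = pd.DataFrame(columns=['PREVSALEPRICE_(1.0, 17000.0]', 'TAXDESC_10 - Exempt', 'SALEPRICE'])

# DataFrame.rename accepts a callable that maps each old label to a new one
demo = demo.rename(columns=clean_column_name)
print(list(demo.columns))
```

Passing a function keeps the renaming rule in one place. It also avoids building an intermediate dictionary.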

In [3]:
data.head()

Unnamed: 0,PROPERTYZIP,LOTAREA,SALEPRICE,COUNTYBUILDING,COUNTYLAND,COUNTYTOTAL,LOCALBUILDING,LOCALLAND,LOCALTOTAL,FAIRMARKETBUILDING,FAIRMARKETLAND,FAIRMARKETTOTAL,YEARBLT,TOTALROOMS,BEDROOMS,FULLBATHS,HALFBATHS,FIREPLACES,BSMTGARAGE,FINISHEDLIVINGAREA,TAXDESC_10-Exempt,TAXDESC_12-PublicUtilityRealtyTax-PURTA,TAXDESC_20-Taxable,TAXDESC_nan,HOMESTEADFLAG_HOM,HOMESTEADFLAG_nan,CLEANGREEN_Y,CLEANGREEN_nan,ABATEMENTFLAG_Y,ABATEMENTFLAG_nan,PREVSALEPRICE_(-0.001-1.0),PREVSALEPRICE_(1.0-17000.0),PREVSALEPRICE_(17000.0-36000.0),PREVSALEPRICE_(36000.0-51456.0),PREVSALEPRICE_(51456.0-66500.0),PREVSALEPRICE_(66500.0-83000.0),PREVSALEPRICE_(83000.0-105000.0),PREVSALEPRICE_(105000.0-131700.0),PREVSALEPRICE_(131700.0-170000.0),PREVSALEPRICE_(170000.0-255000.0),PREVSALEPRICE_(255000.0-55000000.0),PREVSALEPRICE_nan,PREVSALEPRICE2_(-0.001-1.0),PREVSALEPRICE2_(1.0-8367.6),PREVSALEPRICE2_(8367.6-30000.0),PREVSALEPRICE2_(30000.0-47000.0),PREVSALEPRICE2_(47000.0-62000.0),PREVSALEPRICE2_(62000.0-78000.0),PREVSALEPRICE2_(78000.0-95000.0),PREVSALEPRICE2_(95000.0-123000.0),PREVSALEPRICE2_(123000.0-159900.0),PREVSALEPRICE2_(159900.0-239900.0),PREVSALEPRICE2_(239900.0-51500000.0),PREVSALEPRICE2_nan,OWNERDESC_CORPORATION,OWNERDESC_CORPORATION-RAILROAD,OWNERDESC_REGULAR,OWNERDESC_REGULAR-ESTATE,OWNERDESC_REGULAR-ETAL,OWNERDESC_REGULAR-ETUXORETVIR,OWNERDESC_nan,CLASSDESC_AGRICULTURAL,CLASSDESC_COMMERCIAL,CLASSDESC_GOVERNMENT,CLASSDESC_INDUSTRIAL,CLASSDESC_OTHER,CLASSDESC_RESIDENTIAL,CLASSDESC_UTILITIES,CLASSDESC_nan,STYLEDESC_BI-LEVEL,STYLEDESC_BUNGALOW,STYLEDESC_CAPECOD,STYLEDESC_COLONIAL,STYLEDESC_CONDO,STYLEDESC_CONDOCONV,STYLEDESC_CONDOEND,STYLEDESC_CONDOGRDN,STYLEDESC_CONDOHR,STYLEDESC_CONDOINT,STYLEDESC_CONDOPAT/CARG,STYLEDESC_CONDOSINGLE,STYLEDESC_CONTEMPORARY,STYLEDESC_CONVENTIONAL,STYLEDESC_LOGCABIN,STYLEDESC_MANUFACTURED,STYLEDESC_MODULARHOME,STYLEDESC_MULTI-FAMILY,STYLEDESC_OLDSTYLE,STYLEDESC_OTHER,STYLEDESC_RANCH,STYLEDESC_ROWEND,STYLEDESC_ROWINTERIOR,STYLEDESC_SEMIDETACHED,STYLEDESC_SPLITLEVEL,STYLEDESC_TOWNHOUSE,STYLEDESC_TUDOR,STYLEDESC_VICTORIAN,STYLEDESC_nan,STORIES_1.0,STORIES_1.5,STORIES_2.0,STORIES_2.5,STORIES_3.0,STORIES_4.0,STORIES_nan,EXTFINISH_DESC_Brick,EXTFINISH_DESC_Concrete,EXTFINISH_DESC_ConcreteBlock,EXTFINISH_DESC_Frame,EXTFINISH_DESC_Log,EXTFINISH_DESC_MasonryFRAME,EXTFINISH_DESC_Stone,EXTFINISH_DESC_Stucco,EXTFINISH_DESC_nan,ROOFDESC_METAL,ROOFDESC_ROLL,ROOFDESC_RUBBER,ROOFDESC_SHINGLE,ROOFDESC_SLATE,ROOFDESC_TILE,ROOFDESC_nan,BASEMENTDESC_Crawl,BASEMENTDESC_Full,BASEMENTDESC_None,BASEMENTDESC_Part,BASEMENTDESC_Slab/Piers,BASEMENTDESC_nan,GRADEDESC_AVERAGE,GRADEDESC_AVERAGE+,GRADEDESC_AVERAGE-,GRADEDESC_BELOWAVERAGE,GRADEDESC_BELOWAVERAGE+,GRADEDESC_BELOWAVERAGE-,GRADEDESC_EXCELLENT,GRADEDESC_EXCELLENT+,GRADEDESC_EXCELLENT-,GRADEDESC_GOOD,GRADEDESC_GOOD+,GRADEDESC_GOOD-,GRADEDESC_HIGHESTCOST-,GRADEDESC_HighestCost,GRADEDESC_HighestCost+,GRADEDESC_POOR,GRADEDESC_POOR+,GRADEDESC_POOR-,GRADEDESC_VERYGOOD,GRADEDESC_VERYGOOD+,GRADEDESC_VERYGOOD-,GRADEDESC_nan,CONDITIONDESC_AVERAGE,CONDITIONDESC_EXCELLENT,CONDITIONDESC_FAIR,CONDITIONDESC_GOOD,CONDITIONDESC_POOR,CONDITIONDESC_UNSOUND,CONDITIONDESC_VERYGOOD,CONDITIONDESC_VERYPOOR,CONDITIONDESC_nan,CDUDESC_AVERAGE,CDUDESC_EXCELLENT,CDUDESC_FAIR,CDUDESC_GOOD,CDUDESC_POOR,CDUDESC_UNSOUND,CDUDESC_VERYGOOD,CDUDESC_VERYPOOR,CDUDESC_nan,HEATINGCOOLINGDESC_CentralHeat,HEATINGCOOLINGDESC_CentralHeatwithAC,HEATINGCOOLINGDESC_Electric,HEATINGCOOLINGDESC_ElectricHeatwithAC,HEATINGCOOLINGDESC_FloorFurnace,HEATINGCOOLINGDESC_FloorFurnacewithAC,HEATINGCOOLINGDESC_HeatPump,HEATINGCOOLINGDESC_HeatPumpwithAC,HEATINGCOOLINGDESC_NoHeatbutwithAC,HEATINGCOOLINGDESC_None,HEATINGCOOLINGDESC_Other,HEATINGCOOLINGDESC_UnitHeat,HEATINGCOOLINGDESC_UnitHeatwithAC,HEATINGCOOLINGDESC_WallFurnace,HEATINGCOOLINGDESC_WallFurnacewithAC,HEATINGCOOLINGDESC_nan
0,15222.0,0,525000.0,325400,0,325400,343400,0,343400,343400,0,343400,13.0,3.0,2.0,2.0,0.0,0.0,1.0,1520.0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,15222.0,0,350000.0,325400,0,325400,343400,0,343400,343400,0,343400,13.0,4.0,1.0,1.0,1.0,0.0,1.0,1578.0,0,0,1,0,1,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,15222.0,0,535000.0,468400,0,468400,468400,0,468400,468400,0,468400,13.0,5.0,2.0,2.0,0.0,0.0,1.0,1763.0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,15222.0,0,394000.0,303200,0,303200,303200,0,303200,303200,0,303200,13.0,3.0,1.0,1.0,1.0,0.0,1.0,1315.0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,15222.0,2024,255000.0,177500,155900,333400,177500,155900,333400,177500,155900,333400,68.0,7.0,3.0,2.0,1.0,0.0,1.0,1722.0,0,0,1,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1


## Train/Validation/Test Split

We will split our dataset into three sets:
* Training - (67%)
* Validation - (16.5%)
* Testing - (16.5%)

In [4]:
from sklearn.model_selection import train_test_split

X = data.drop(['PROPERTYZIP', 'SALEPRICE'], axis=1)
Y = data['SALEPRICE']

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.33, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, test_size=0.5, random_state=42)

print("Training Shape: ", X_train.shape)
print("Validation Shape: ", X_val.shape)
print("Testing Shape: ", X_test.shape)

Training Shape:  (111731, 181)
Validation Shape:  (27516, 181)
Testing Shape:  (27516, 181)
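The percentages follow from the two calls to `train_test_split`. The first call holds out 33% of the rows, and the second call splits that holdout in half. This leaves 67% / 16.5% / 16.5%.

The small sketch below reproduces the reported shapes. It assumes scikit-learn's rounding rule for a float `test_size`: the test part is rounded up with `ceil`, and the remaining rows go to training. `split_sizes` is a hypothetical helper, not part of scikit-learn.

```python
import math

def split_sizes(n_samples, test_size=0.33, val_share=0.5):
    """Hypothetical helper: row counts of the two-stage split, assuming the test
    part of each split is rounded up (ceil) and the rest goes to the first part."""
    n_holdout = math.ceil(test_size * n_samples)   # first split: train vs. holdout
    n_train = n_samples - n_holdout
    n_test = math.ceil(val_share * n_holdout)      # second split: holdout -> validation / test
    n_val = n_holdout - n_test
    return n_train, n_val, n_test

n_total = 111731 + 27516 + 27516                   # total rows reported in the output above
sizes = split_sizes(n_total)
print(sizes)
print([round(s / n_total, 3) for s in sizes])      # fractions of the full data set
```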


# Standard Modeling
We will build some standard models on our data and see which model performs the best. The following models will be implemented:

* K-Nearest Neighbors
* Linear Support Vector Machine
* Gradient Boosted Decision Tree

## Testing Function
Here we will define a function that accepts a model class, its parameters, a display name, and the data. The function builds the model, trains it, and reports its RMSE on the evaluation data along with the elapsed time.

We will use this function to test all of our models.

In [5]:
from sklearn.metrics import mean_squared_error
import time

def test_model(model, params, model_name, X_train, y_train, X_test, y_test):
    """Trains a model and evaluates it with RMSE.
        PARAMS:
            model - SKLearn estimator class: Model to be trained
            params - dictionary: Keyword arguments used to instantiate the model
            model_name - string: Name used when printing results
            X_train - DataFrame: Training Data, Features
            y_train - Series: Training Data, Targets
            X_test - DataFrame: Testing Data, Features
            y_test - Series: Testing Data, Targets
        
        RETURNS:
            Tuple (rmse, clf, model_name):
                rmse - float: Root mean squared error on the testing data
                clf - fitted SKLearn model
                model_name - string: Name of the model
    """
    print("Begin ", model_name)
    start = time.time()

    # Instantiate and fit the model
    clf = model(**params)
    clf.fit(X_train, y_train)
    
    # Predict on the held-out data and compute the RMSE
    predictions = clf.predict(X_test)
    rmse = mean_squared_error(y_test, predictions) ** 0.5

    print(f"{model_name} - RMSE: ", rmse)
    
    print("Time Elapsed: ", time.time() - start)
    
    return (rmse, clf, model_name)
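`test_model` scores each model by RMSE, computed as `mean_squared_error(...) ** 0.5`. Written out, RMSE is the square root of the mean squared residual. This puts it in the same units as `SALEPRICE` and makes it weight large misses heavily.

Below is a small check with made-up numbers. One miss of 30,000 contributes 9e8 of the 1.1e9 total squared error.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up prices and predictions: residuals of 10,000, -10,000 and 30,000
y_true = [100_000, 200_000, 300_000]
y_pred = [110_000, 190_000, 330_000]

manual = rmse(y_true, y_pred)
via_sklearn = mean_squared_error(y_true, y_pred) ** 0.5   # what test_model computes
print(manual, via_sklearn)
```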

## Testing Models
We will run all of the following models through the testing function:
* KNearestNeighbors
* Linear Support Vector Machine
* GradientBoostingRegressor

Each of these models will be trained on the `Training Set` and evaluated on the `Validation Set`. The model that performs best on the validation set (lowest `RMSE`) will be promoted to the next phase of model tuning (Hyperparameter Tuning).

In [6]:
from sklearn.svm import LinearSVR
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import GradientBoostingRegressor

knc_params = {}
svr_params = {}
gbr_params = {}

models_to_test = [
    (KNeighborsClassifier, knc_params, "KNeighbors"),
    (LinearSVR, svr_params, "Support Vector Machine"),
    (GradientBoostingRegressor, gbr_params, "GradientBoostingRegressor")
]

results = []
for model in models_to_test:
    results += [test_model(model[0], model[1], model[2], X_train, y_train, X_val, y_val)]

Begin  KNeighbors
KNeighbors - RMSE:  396224.9408896203
Time Elapsed:  17.702276468276978
Begin  Support Vector Machine
Support Vector Machine - RMSE:  1218827.0140549217
Time Elapsed:  55.24514102935791
Begin  GradientBoostingRegressor
GradientBoostingRegressor - RMSE:  247195.02580891573
Time Elapsed:  79.86917090415955


In [7]:
# Sort results by RMSE, lowest (best) first.
results.sort(key=lambda x: x[0])

print(results)

[(247195.02580891573, GradientBoostingRegressor(alpha=0.9, ccp_alpha=0.0, criterion='friedman_mse',
                          init=None, learning_rate=0.1, loss='ls', max_depth=3,
                          max_features=None, max_leaf_nodes=None,
                          min_impurity_decrease=0.0, min_impurity_split=None,
                          min_samples_leaf=1, min_samples_split=2,
                          min_weight_fraction_leaf=0.0, n_estimators=100,
                          n_iter_no_change=None, presort='deprecated',
                          random_state=None, subsample=1.0, tol=0.0001,
                          validation_fraction=0.1, verbose=0, warm_start=False), 'GradientBoostingRegressor'), (396224.9408896203, KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,
                     weights='uniform'), 'KNeighbors'), (1218827.0140549217, LinearSVR(C=1.0, dual=True, epsilon=0
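Two caveats apply to this comparison.

First, `KNeighborsClassifier` is a classifier. It treats each distinct `SALEPRICE` value as a separate class label and predicts the most common price among the 5 neighbors rather than an average. `KNeighborsRegressor` is the regression counterpart.

Second, KNN distances and `LinearSVR` are both sensitive to feature scale, and these features mix large dollar amounts with 0/1 indicator columns.

Below is a sketch of both estimators with standardized inputs. It runs on synthetic data from `make_regression` as a stand-in for `X_train`/`X_val`. The sizes and settings are placeholders, not tuned values.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data; in the notebook these would be X_train, y_train, X_val, y_val
X_demo, y_demo = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
Xd_train, Xd_val, yd_train, yd_val = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# KNeighborsRegressor averages the targets of the k nearest neighbors
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
# LinearSVR is scale sensitive, so the features are standardized first
svr = make_pipeline(StandardScaler(), LinearSVR(max_iter=10000, random_state=0))

for name, model in [("KNeighborsRegressor", knn), ("LinearSVR", svr)]:
    model.fit(Xd_train, yd_train)
    preds = model.predict(Xd_val)
    print(name, "RMSE:", mean_squared_error(yd_val, preds) ** 0.5)
```

`test_model` only calls `model(**params)`, so a pipeline can be plugged in through a small factory, e.g. `lambda **kw: make_pipeline(StandardScaler(), KNeighborsRegressor(**kw))`.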

# Hyper Parameter Tuning

The `GradientBoostingRegressor` had the best performance (lowest validation RMSE).

We will now run a `Randomized Grid Search` (scikit-learn's `RandomizedSearchCV`) on it to see which parameters might be most effective. We chose a Randomized Grid Search over a pure Grid Search because trying every combination would take too much time. In the future, we may want to consider expanding the grid and trying all combinations.

> NOTE: A Randomized Grid Search only evaluates a random sample of parameter combinations (`n_iter=10` here), so we will likely not get the best possible parameters. However, because it samples across the whole grid, we are likely to get a 'good' result.
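The exhaustive alternative is much larger than the randomized one. The sketch below counts the combinations in the parameter grid of the next cell. The per-parameter counts are transcribed by hand from that cell. The fold count is the 5 reported in the search log.

```python
import math

# Number of candidate values per hyperparameter, transcribed from the grid in the next cell
grid_sizes = {
    'learning_rate': 3,
    'n_estimators': 6,
    'subsample': 3,
    'min_samples_split': 3,
    'min_samples_leaf': 4,
    'max_depth': 3,
    'max_features': 3,
    'alpha': 3,
}

n_combinations = math.prod(grid_sizes.values())   # requires Python 3.8+
n_folds = 5                                       # "Fitting 5 folds" in the search log
print("Full grid:", n_combinations, "combinations ->", n_combinations * n_folds, "fits")
print("Randomized search (n_iter=10):", 10 * n_folds, "fits")
```

Note that `GradientBoostingRegressor` only uses `alpha` when `loss='huber'` or `loss='quantile'`. With the default squared-error loss, the three `alpha` values produce identical models. Dropping that key would shrink the grid threefold without losing any distinct candidates.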

In [8]:
from sklearn.model_selection import RandomizedSearchCV

parameters = { 
    'learning_rate':[0.01,0.1,0.2],
    'n_estimators':[50,100,150,200,250,300],
    'subsample':[0.75,0.9,1.0],
    'min_samples_split':[2,4,6],
    'min_samples_leaf':[1,3,5,7],
    'max_depth':[3,5,7],
    'max_features':(None,'sqrt','log2'),
    'alpha':[0.85,0.9,0.95]
}

start = time.time()

clf_rand = RandomizedSearchCV(GradientBoostingRegressor(), parameters, n_iter=10, n_jobs=4, verbose=10)

clf_rand.fit(X_train, y_train)

print("Time Elapsed: ", time.time() - start)

print("Best parameters: ", clf_rand.best_params_)
print("Best Score: ", clf_rand.best_score_)

Fitting 5 folds for each of 10 candidates, totalling 50 fits


[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done   5 tasks      | elapsed:  4.0min
[Parallel(n_jobs=4)]: Done  10 tasks      | elapsed:  5.0min
[Parallel(n_jobs=4)]: Done  17 tasks      | elapsed:  5.8min
[Parallel(n_jobs=4)]: Done  24 tasks      | elapsed:  9.1min
[Parallel(n_jobs=4)]: Done  33 tasks      | elapsed: 13.3min
[Parallel(n_jobs=4)]: Done  42 tasks      | elapsed: 14.5min
[Parallel(n_jobs=4)]: Done  50 out of  50 | elapsed: 22.1min finished


Time Elapsed:  1362.1772680282593
Best parameters:  {'subsample': 0.9, 'n_estimators': 200, 'min_samples_split': 4, 'min_samples_leaf': 3, 'max_features': 'sqrt', 'max_depth': 7, 'learning_rate': 0.01, 'alpha': 0.95}
Best Score:  0.43542794772197907
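The reported `Best Score` of 0.435 is not an RMSE. When no `scoring` is passed, `RandomizedSearchCV` uses the estimator's own `score` method, which for `GradientBoostingRegressor` is R² (the coefficient of determination), averaged over the folds. To rank candidates by the metric used elsewhere in this notebook, a scorer can be passed explicitly.

Below is a sketch on synthetic data. The parameter values are placeholders, and the `'neg_root_mean_squared_error'` scorer name assumes scikit-learn 0.22 or later.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic problem so the sketch runs in seconds
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=0),
    {'n_estimators': [50, 100], 'max_depth': [2, 3], 'learning_rate': [0.05, 0.1]},
    n_iter=4,
    cv=5,
    scoring='neg_root_mean_squared_error',  # scorers are maximized, so RMSE is negated
    random_state=0,                          # makes the sampled candidates reproducible
)
search.fit(X_demo, y_demo)

print("Best parameters:", search.best_params_)
print("Best cross-validated RMSE:", -search.best_score_)
```

Setting `random_state` on the search also makes the sampled candidates reproducible. The search above was run without it.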


## Testing Model

We will evaluate the performance of the tuned `GradientBoostingRegressor` model on the
* Training Set
* Validation Set
* Testing Set

Comparing the RMSE across these data sets will help us understand how well the model generalizes to new data.

> NOTE: We will only use the results on the `testing` set to decide whether to accept or decline our model.

In [10]:
from sklearn.metrics import mean_squared_error

start = time.time()

clf_gbr = GradientBoostingRegressor(**clf_rand.best_params_).fit(X_train, y_train)

print("Time Elapsed: ", time.time() - start)

predictions = clf_gbr.predict(X_train)
rmse = mean_squared_error(y_train, predictions) ** 0.5
print("Training Root Mean Squared Error: ", rmse)

predictions = clf_gbr.predict(X_val)
rmse = mean_squared_error(y_val, predictions) ** 0.5
print("Validation Root Mean Squared Error: ", rmse)

predictions = clf_gbr.predict(X_test)
rmse = mean_squared_error(y_test, predictions) ** 0.5
print("Testing Root Mean Squared Error: ", rmse)

Time Elapsed:  31.564534187316895
Training Root Mean Squared Error:  384124.43514979706
Validation Root Mean Squared Error:  249198.16436027107
Testing Root Mean Squared Error:  313687.4409712059
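The cell above repeats the same predict-then-score steps for each split, and the Neural Network evaluation later in this notebook does the same. A hypothetical helper, `rmse_by_split`, could replace that repetition. It takes any prediction function and a dictionary of named `(X, y)` splits.

The sketch below computes RMSE with numpy directly and runs on toy data.

```python
import numpy as np

def rmse_by_split(predict, splits):
    """Hypothetical helper: RMSE of predict(X) against y for each named (X, y) split."""
    results = {}
    for name, (X, y) in splits.items():
        y_true = np.asarray(y, dtype=float).ravel()
        y_pred = np.asarray(predict(X), dtype=float).ravel()
        results[name] = float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
    return results

# Toy stand-in "model" that predicts 2 * x
toy_predict = lambda X: 2 * np.asarray(X, dtype=float)
splits = {
    "Training": ([1, 2, 3], [2, 4, 6]),   # perfect predictions
    "Validation": ([1, 2], [3, 4]),       # residuals of 1 and 0
}
scores = rmse_by_split(toy_predict, splits)
for name, value in scores.items():
    print(f"{name} Root Mean Squared Error: {value}")
```

For the tuned model, the call would be `rmse_by_split(clf_gbr.predict, {"Training": (X_train, y_train), "Validation": (X_val, y_val), "Testing": (X_test, y_test)})`.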


In [11]:
y_test = y_test.reset_index(drop=True)
pd.concat([pd.Series(predictions), y_test],axis=1).head(5)

Unnamed: 0,0,SALEPRICE
0,67774.353542,26000.0
1,134665.280979,166500.0
2,116918.655634,139900.0
3,91064.530723,75000.0
4,153064.692673,150000.0


## Standard Model Results

We tested the following models:
* KNearestNeighbors
* GradientBoostingRegressor
* Linear Support Vector Machine 

The best performing model was the `GradientBoostingRegressor`. We then ran a `Randomized Grid Search` on it, which samples parameter combinations at random from the grid in search of the best parameters. The resulting parameters are not guaranteed to be optimal, but they should be good. With this set of 'good' parameters, we evaluated our predictions on the Training, Validation, and Testing sets.

Our testing set had an RMSE of `313687.44`. This indicates that there is still a lot of room for improvement, so we might want to attempt a different approach.

# Neural Network using PyTorch

The `GradientBoostingRegressor` didn't perform very well, so we are opting to try a `Neural Network` using `PyTorch`. Neural Networks can learn nonlinear relationships and interactions between features directly from the inputs, so we might see better performance here.

## Defining the Neural Network

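In PyTorch, a Neural Network is defined as a Python class that extends `torch.nn.Module`. The layers are created in `__init__`, and `forward` describes how an input passes through them. Our network has one hidden layer with a ReLU activation, followed by a linear output layer.

This example was inspired by the following source: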
https://medium.com/@benjamin.phillips22/simple-regression-with-neural-networks-in-pytorch-313f06910379

In [12]:
import torch
import torch.nn.functional as F

# this is one way to define a network
class Net(torch.nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super(Net, self).__init__()
        self.hidden = torch.nn.Linear(n_feature, n_hidden)   # hidden layer
        self.predict = torch.nn.Linear(n_hidden, n_output)   # output layer

    def forward(self, x):
        x = F.relu(self.hidden(x))      # activation function for hidden layer
        x = self.predict(x)             # linear output
        return x
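For the configuration trained in this notebook (181 input features, 100 hidden units, one output), `Net` is a small model. Each `torch.nn.Linear(in, out)` layer holds an `out × in` weight matrix plus `out` biases.

The arithmetic is sketched below with a hypothetical helper. For a constructed model, the same total is given by `sum(p.numel() for p in net.parameters())`.

```python
def count_parameters(n_feature, n_hidden, n_output):
    """Hypothetical helper: weights plus biases of Net's two Linear layers."""
    hidden = n_feature * n_hidden + n_hidden   # hidden layer: weight matrix + bias vector
    output = n_hidden * n_output + n_output    # output layer: weight matrix + bias vector
    return hidden + output

# 181 features (from the split output), 100 hidden units, 1 output
print(count_parameters(181, 100, 1))
```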

## Fixing the Data
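The data is currently in a Pandas DataFrame. We want to convert it to `float32` PyTorch tensors. The targets are also reshaped into a column of shape `(n_samples, 1)` so that they match the network's output. Otherwise, `MSELoss` would broadcast the `(n_samples, 1)` predictions against an `(n_samples,)` target and compute the wrong loss.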

In [13]:
def df_to_tensor(df):
    """Converts a pandas DataFrame or Series to a float32 PyTorch tensor."""
    return torch.tensor(df.to_numpy(), dtype=torch.float32)

# Features: (n_samples, n_features); targets reshaped to (n_samples, 1) to match the network output.
X_train_tensor = df_to_tensor(X_train)
Y_train_tensor = df_to_tensor(y_train).reshape(-1, 1)
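These tensors hold raw values: dollar amounts in the hundreds of thousands next to 0/1 indicator columns, and a target on the same large scale. Gradient-based training with a small learning rate such as `lr=0.001` typically struggles with inputs on such different scales. Standardizing them is a common first step; scikit-learn's `StandardScaler` does the same thing.

Below is a minimal numpy sketch. It assumes the mean and standard deviation are computed on the training split only and then reused for the other splits. `standardize` is a hypothetical helper.

```python
import numpy as np

def standardize(train, *others):
    """Hypothetical helper: scale columns to zero mean / unit variance using
    statistics from `train` only. Columns that are constant in `train`
    (e.g. an all-zero indicator) are only centered, to avoid dividing by zero."""
    train = np.asarray(train, dtype=np.float32)
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0
    scaled = [(train - mean) / std]
    scaled += [(np.asarray(o, dtype=np.float32) - mean) / std for o in others]
    return scaled, mean, std

# Toy example: a dollar-scale column next to a 0/1 indicator column
train = [[300_000, 0], [100_000, 1], [200_000, 1]]
val = [[250_000, 0]]
(train_s, val_s), mean, std = standardize(train, val)
print(train_s.round(3))
print(val_s.round(3))
```

The scaled arrays can be turned into tensors with `torch.tensor(train_s)`. If the target is standardized as well, predictions must be mapped back with `pred * std + mean` before computing the RMSE in the original `SALEPRICE` units.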

## Training the Neural Network

Here, we will train the Neural Network.

We are using `100 Hidden Units`.

We use the `Adam` optimizer with a learning rate of `0.001` to minimize the `Mean Squared Error`.

We will train the network for `200` epochs.

In [16]:
# Instantiate PyTorch Neural Network
net = Net(n_feature=X_train_tensor.shape[1], n_hidden=100, n_output=1)     # define the network
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)
loss_func = torch.nn.MSELoss()  # this is for regression mean squared loss

start = time.time()

# train the network
for t in range(200):
  
    prediction = net(X_train_tensor)     # input x and predict based on x
    
    loss = loss_func(prediction, Y_train_tensor)     # must be (1. nn output, 2. target)

    optimizer.zero_grad()   # clear gradients for next train
    loss.backward()         # backpropagation, compute gradients
    optimizer.step()        # apply gradients

print("Time Elapsed: ", time.time() - start)

Time Elapsed:  93.22580480575562
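The loop above is full-batch training. Every iteration computes the loss on all 111,731 training rows, so 200 epochs correspond to only 200 parameter updates. A common alternative is mini-batch training with `torch.utils.data.DataLoader`, which shuffles the data and performs one update per batch.

Below is a sketch on synthetic data. The data, batch size, and epoch count are placeholders. `torch.nn.Sequential` stands in for `Net` (same Linear, ReLU, Linear shape) only to keep the block self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in data; in the notebook these would be X_train_tensor and Y_train_tensor
X = torch.randn(1000, 20)
y = X @ torch.randn(20, 1) + 0.1 * torch.randn(1000, 1)

model = torch.nn.Sequential(          # same shape as Net: Linear -> ReLU -> Linear
    torch.nn.Linear(20, 100),
    torch.nn.ReLU(),
    torch.nn.Linear(100, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = torch.nn.MSELoss()

with torch.no_grad():
    initial_mse = loss_func(model(X), y).item()

# Shuffled mini-batches: each epoch now performs len(loader) parameter updates instead of one
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

for epoch in range(30):
    for xb, yb in loader:
        loss = loss_func(model(xb), yb)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

with torch.no_grad():
    final_mse = loss_func(model(X), y).item()
print("Updates per epoch:", len(loader))
print("Training MSE before/after:", initial_mse, final_mse)
```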


In [18]:
# Evaluate without tracking gradients; only the predictions are needed here.
with torch.no_grad():
    # Training Predictions
    X_train_tensor = df_to_tensor(X_train).float()
    train_prediction = net(X_train_tensor).numpy().reshape(-1)

    train_rmse = mean_squared_error(y_train, train_prediction) ** 0.5
    print("Training Root Mean Squared Error: ", train_rmse)

    # Validation Predictions
    X_val_tensor = df_to_tensor(X_val).float()
    val_prediction = net(X_val_tensor).numpy().reshape(-1)

    val_rmse = mean_squared_error(y_val, val_prediction) ** 0.5
    print("Validation Root Mean Squared Error: ", val_rmse)

    # Testing Predictions
    X_test_tensor = df_to_tensor(X_test).float()
    test_prediction = net(X_test_tensor).numpy().reshape(-1)

    test_rmse = mean_squared_error(y_test, test_prediction) ** 0.5
    print("Testing Root Mean Squared Error: ", test_rmse)

Training Root Mean Squared Error:  458942.9851471995
Validation Root Mean Squared Error:  435207.9854275888
Testing Root Mean Squared Error:  293786.02337937144


## Neural Network Results

The Neural Network appears to have gotten lucky on the testing set: its `RMSE` is much worse on the Training and Validation data. Still, its Testing RMSE (`293786.02`) is **better** than that of the `GradientBoostingRegressor` (`313687.44`). If we had to accept one of the models based on the testing set alone, we might choose the Neural Network.

However, these results are still poor. We should look to improve them by:
* Cleaning our data set more
* Trying different Standard Models
* Experimenting more with the Neural Network settings