# CS4001/4042 Assignment 1, Part B, Q2
In Question B1, we used the Category Embedding model, which builds a feedforward neural network in which each categorical feature gets a learnable embedding. In this question, we use a library called Pytorch-WideDeep, which makes it easy to work with multimodal deep-learning problems that combine images, text, and tables. We will use only the deeptabular component of this library, through the TabMlp network.
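
To make the idea of learnable embeddings concrete, here is a minimal pure-Python sketch of how one table row could be turned into the input vector of an MLP. It is not the library's implementation: the vocabularies, the embedding sizes, the helper `mlp_input`, and the choice to reserve index 0 for unseen categories are all assumptions made for illustration.

```python
import random

random.seed(0)

# Hypothetical categorical vocabularies (not taken from the dataset).
vocab = {
    "town": ["ANG MO KIO", "BEDOK", "BISHAN"],
    "storey_range": ["01 TO 03", "04 TO 06"],
}
embed_dim = {"town": 4, "storey_range": 2}  # illustrative sizes only

# One embedding table per categorical column: one row of weights per category.
# In a real network these weights are updated by gradient descent.
# Index 0 is reserved here for categories that were never seen when fitting.
tables = {
    col: [
        [random.gauss(0.0, 1.0) for _ in range(embed_dim[col])]
        for _ in range(len(values) + 1)
    ]
    for col, values in vocab.items()
}
codes = {
    col: {value: i + 1 for i, value in enumerate(values)}
    for col, values in vocab.items()
}


def mlp_input(row, continuous_cols):
    """Concatenate the embedding vectors with the continuous values of one row."""
    vector = []
    for col in vocab:
        idx = codes[col].get(row[col], 0)  # 0 means "unseen category"
        vector.extend(tables[col][idx])
    vector.extend(float(row[c]) for c in continuous_cols)
    return vector


row = {"town": "BEDOK", "storey_range": "04 TO 06", "floor_area_sqm": 92.0}
x = mlp_input(row, ["floor_area_sqm"])
print(len(x))  # 4 + 2 + 1 = 7
```

The MLP then sees a single dense vector per row, so categorical and continuous features are handled by the same linear layers.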

In [4]:
!pip install pytorch-widedeep


In [5]:
SEED = 42

import os

import random
random.seed(SEED)

import numpy as np
np.random.seed(SEED)

import pandas as pd

from pytorch_widedeep.preprocessing import TabPreprocessor
from pytorch_widedeep.models import TabMlp, WideDeep
from pytorch_widedeep import Trainer
from pytorch_widedeep.metrics import R2Score


>Divide the dataset (‘hdb_price_prediction.csv’) into train and test sets by using entries from the year 2020 and before as training data, and entries from 2021 and after as the test data.

In [6]:
num_features = [
    "dist_to_nearest_stn",
    "dist_to_dhoby",
    "degree_centrality",
    "eigenvector_centrality",
    "remaining_lease_years",
    "floor_area_sqm",
]

cat_features = [
    "month",
    "town",
    "flat_model_type",
    "storey_range",
]

features = num_features + cat_features

targets = ["resale_price"]

df = pd.read_csv("hdb_price_prediction.csv")

df_train = df[df["year"] <= 2020]
df_test = df[df["year"] >= 2021]

train = df_train[features + targets]
test = df_test[features + targets]


>Refer to the documentation of Pytorch-WideDeep and perform the following tasks:
https://pytorch-widedeep.readthedocs.io/en/latest/index.html
* Use [**TabPreprocessor**](https://pytorch-widedeep.readthedocs.io/en/latest/examples/01_preprocessors_and_utils.html#2-tabpreprocessor) to create the deeptabular component using the continuous features and the categorical features. Use this component to transform the training dataset.
* Create the [**TabMlp**](https://pytorch-widedeep.readthedocs.io/en/latest/pytorch-widedeep/model_components.html#pytorch_widedeep.models.tabular.mlp.tab_mlp.TabMlp) model with 2 linear layers in the MLP, with 200 and 100 neurons respectively.
* Create a [**Trainer**](https://pytorch-widedeep.readthedocs.io/en/latest/pytorch-widedeep/trainer.html#pytorch_widedeep.training.Trainer) for the training of the created TabMlp model with the root mean squared error (RMSE) cost function. Train the model for 100 epochs using this trainer, keeping a batch size of 64. (Note: set the *num_workers* parameter to 0.)

In [12]:
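import torch

# Fit the preprocessor on the training data only; the same fitted object is
# reused later to transform the test set.
tab_preprocessor = TabPreprocessor(
    embed_cols=cat_features, continuous_cols=num_features
)
X_tab = tab_preprocessor.fit_transform(train)

# MLP with two hidden layers of 200 and 100 neurons on top of the embeddings.
tab_mlp = TabMlp(
    column_idx=tab_preprocessor.column_idx,
    cat_embed_input=tab_preprocessor.cat_embed_input,
    continuous_cols=num_features,
    mlp_hidden_dims=[200, 100],
)
model = WideDeep(deeptabular=tab_mlp)

# Train with the RMSE objective; fall back to the CPU when no GPU is available.
trainer = Trainer(
    model,
    objective="root_mean_squared_error",
    num_workers=0,
    seed=SEED,
    device="cuda" if torch.cuda.is_available() else "cpu",
)
trainer.fit(
    X_tab=X_tab, target=train["resale_price"].values, n_epochs=100, batch_size=64
)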


epoch 1: 100%|██████████| 1366/1366 [00:10<00:00, 129.80it/s, loss=2.3e+5] 
epoch 2: 100%|██████████| 1366/1366 [00:09<00:00, 136.83it/s, loss=9.89e+4]
epoch 3: 100%|██████████| 1366/1366 [00:09<00:00, 138.64it/s, loss=8.62e+4]
epoch 4: 100%|██████████| 1366/1366 [00:09<00:00, 142.02it/s, loss=7.93e+4]
epoch 5: 100%|██████████| 1366/1366 [00:09<00:00, 142.73it/s, loss=7.55e+4]
epoch 6: 100%|██████████| 1366/1366 [00:09<00:00, 142.57it/s, loss=7.29e+4]
epoch 7: 100%|██████████| 1366/1366 [00:09<00:00, 141.89it/s, loss=7.14e+4]
epoch 8: 100%|██████████| 1366/1366 [00:10<00:00, 129.35it/s, loss=6.99e+4]
epoch 9: 100%|██████████| 1366/1366 [00:10<00:00, 129.79it/s, loss=6.9e+4] 
epoch 10: 100%|██████████| 1366/1366 [00:10<00:00, 136.32it/s, loss=6.82e+4]
epoch 11: 100%|██████████| 1366/1366 [00:09<00:00, 140.90it/s, loss=6.75e+4]
epoch 12: 100%|██████████| 1366/1366 [00:11<00:00, 121.07it/s, loss=6.71e+4]
epoch 13: 100%|██████████| 1366/1366 [00:09<00:00, 142.84it/s, loss=6.64e+4]
epoch 14
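
The progress bars above report 1366 iterations per epoch at a batch size of 64. As a small worked check, and assuming the final incomplete batch is kept rather than dropped (an assumption, since the log does not say), the number of batches is `ceil(n_rows / batch_size)`, which brackets the size of the training set. The helper name `batches_per_epoch` is made up for this sketch.

```python
import math


def batches_per_epoch(n_rows, batch_size):
    """Number of batches when the last, possibly smaller, batch is kept."""
    return math.ceil(n_rows / batch_size)


batch_size = 64
n_batches = 1366  # value read from the progress bars

# Smallest and largest row counts that give exactly n_batches batches.
smallest = (n_batches - 1) * batch_size + 1
largest = n_batches * batch_size
print(smallest, largest)  # 87361 87424
```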

>Report the test RMSE and the test R2 value that you obtained.

In [17]:
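r2_score = R2Score()

# Reuse the preprocessor fitted on the training set; do not refit on test data.
X_tab_te = tab_preprocessor.transform(test)
y_pred = trainer.predict(X_tab=X_tab_te)

# Flatten both arrays to 1-D so the subtraction below is element-wise.
y_test = np.asarray(test["resale_price"]).ravel()
y_pred = np.asarray(y_pred).ravel()

rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
r2 = r2_score(y_pred, y_test)  # R2Score expects (y_pred, y_true)

print(f"Test RMSE: {rmse}")
print(f"Test R2: {r2}")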


predict: 100%|██████████| 1128/1128 [00:03<00:00, 359.93it/s]

Test RMSE: 97072.19440824342
Test R2: 0.6707784730545674
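
For reference, the two reported metrics can be written out by hand. The sketch below uses only the standard library and made-up numbers chosen so the arithmetic is easy to follow; the function names `rmse` and `r2` are local to this example and are not the library's metric classes.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, not taken from the dataset.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 8.0]

print(rmse(y_true, y_pred))  # sqrt(2/3), about 0.816
print(r2(y_true, y_pred))    # 1 - 2/8 = 0.75
```

RMSE is expressed in the same units as the target, so the test RMSE above is in the units of `resale_price`. R2 is unitless: a value of about 0.67 means the model's squared error on the test set is roughly a third of the squared error obtained by always predicting the test-set mean.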



