## TPOT pipeline
#### Here we take the engineered data and use it to run a TPOT pipeline search, which looks for the best combination of ML model and hyperparameters.
#### We start by building a 1-D array with the target values and a 2-D array with all the features, since these are the formats TPOT expects for the target and the input values.

In [1]:
import pandas as pd
import numpy as np

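data = pd.read_json("../data/engineered/presences.json")

# Feature matrix: 2-D array of shape (n_samples, n_features)
inputData = data[["room", "building", "city", "day", "month", "hour"]].to_numpy()

# Target vector: 1-D array of shape (n_samples,)
target = data["target"].to_numpy()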
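To make the expected formats concrete, here is a small sketch on a hypothetical toy DataFrame with the same column names. The values are invented and assumed to be already integer-encoded, since scikit-learn estimators (and therefore TPOT) expect numeric features. It checks that the features come out as a 2-D array of shape `(n_samples, n_features)` and the target as a 1-D array of shape `(n_samples,)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for ../data/engineered/presences.json;
# categorical columns are assumed to be integer-encoded already.
data = pd.DataFrame({
    "room":     [0, 1, 2, 0],
    "building": [0, 0, 1, 1],
    "city":     [0, 0, 0, 1],
    "day":      [1, 2, 3, 4],
    "month":    [1, 1, 2, 2],
    "hour":     [9, 10, 14, 16],
    "target":   [12, 30, 7, 21],
})

feature_columns = ["room", "building", "city", "day", "month", "hour"]

# Double brackets select a DataFrame, so the result is 2-D: (n_samples, n_features).
inputData = data[feature_columns].to_numpy()
# Single brackets select a Series, so the result is 1-D: (n_samples,).
target = data["target"].to_numpy()

print(inputData.shape)  # (4, 6)
print(target.shape)     # (4,)
```

The double-bracket versus single-bracket selection is what decides whether pandas hands back a 2-D or a 1-D structure.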

#### First, we split the dataset into a training set (80%) and a test set (20%).
#### Second, we run TPOT on the training set to search for the best combination of ML model and hyperparameters.
#### Finally, we evaluate the best pipeline found by TPOT on the test set, export it as a Python script, and save the test-set predictions.
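As a quick check of what the 80/20 split produces, the sketch below applies `train_test_split` to a hypothetical array of 10 samples. With float sizes, scikit-learn rounds the test share up and the train share down, so 10 samples give 8 training rows and 2 test rows. `random_state=0` is added here only so the shuffle is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples with 6 features each, plus a 1-D target.
X = np.arange(60).reshape(10, 6)
y = np.arange(10)

# Same proportions as the notebook: 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.80, test_size=0.20, random_state=0
)

print(X_train.shape, X_test.shape)  # (8, 6) (2, 6)
print(y_train.shape, y_test.shape)  # (8,) (2,)
```

Rows are shuffled before splitting, so every sample ends up in exactly one of the two sets and features stay aligned with their targets.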

In [None]:
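import os

from tpot import TPOTRegressor
from sklearn.model_selection import train_test_split

# 80% training, 20% test; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(inputData, target,
                                                    train_size=0.80, test_size=0.20,
                                                    random_state=42)

# verbosity=2 prints progress and the best internal CV score per generation;
# warm_start=True reuses the evolved population on later calls to fit()
tpot = TPOTRegressor(
    verbosity=2,
    warm_start=True,
    random_state=42
)

# Search for the best pipeline on the training set, then predict on the test set
tpot.fit(X_train, y_train)
preds = tpot.predict(X_test)

# score() uses TPOT's scoring function (negative MSE by default for regression)
print(tpot.score(X_test, y_test))

# Write the best pipeline found as a standalone Python script
tpot.export('tpot_exported_pipeline.py')

# Save the test-set predictions, creating the output folder if needed
os.makedirs("../predictions", exist_ok=True)
np.savetxt("../predictions/preds.csv", preds, delimiter=",")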
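Because `TPOTRegressor.score` reports TPOT's scoring function, which in the classic TPOT 0.x API defaults to negative mean squared error for regression, it can be useful to check the saved predictions independently. The sketch below round-trips hypothetical `y_test` and `preds` arrays through `np.savetxt`/`np.loadtxt` in a temporary directory (standing in for `../predictions/`) and computes the MSE by hand; all numbers are invented for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical test targets and predictions (invented values).
y_test = np.array([10.0, 20.0, 30.0, 40.0])
preds = np.array([12.0, 18.0, 33.0, 40.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "preds.csv")
    # Same call as in the notebook: one prediction per line.
    np.savetxt(path, preds, delimiter=",")
    loaded = np.loadtxt(path, delimiter=",")

# Mean squared error; TPOT's default regression score (assumed 0.x API)
# is the negative of this value.
mse = np.mean((y_test - loaded) ** 2)
print(mse)  # 4.25
```

Here the errors are -2, 2, -3 and 0, so the MSE is (4 + 4 + 9 + 0) / 4 = 4.25, and a TPOT score of -4.25 would correspond to the same quality of fit.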