# References

- Vectice Documentation: https://docs.vectice.com/
- Vectice API Documentation: https://api-docs.vectice.com/

## In this exercise, we assume the model must be retrained because the 'Postal Code' variable was accidentally left in the initial modeling dataset

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Install the latest Vectice Python client library

In [None]:
%pip install -q -U vectice
%pip install category_encoders

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from category_encoders import OneHotEncoder

## Get started by connecting to Vectice

You can learn more about the `Connection` object in the [documentation](https://api-docs.vectice.com/reference/vectice/connection/)

<div class="alert alert-block alert-warning">
<b>Code Lineage:</b> Code lineage is tracked only when the notebook runs inside a Git repository, that is, a directory containing a `.git` folder. To track code, clone a repository with `git clone` and run the notebook from within it.
</div>

**First, we need to authenticate to the Vectice server. Before proceeding further:**

- Visit the Vectice app (https://app.vectice.com/account/api-keys) to create and download an API token, and save the file as "My-token.json", the name used in the code below

- Upload the file to Colab by clicking on the "folder" icon on the left-hand taskbar and selecting "Upload to Session Storage"

- Then execute the following cell

In [None]:
import vectice as vect

vec = vect.connect(config="My-token.json")  # Use your own token file, as explained in the tutorial
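
Before connecting, it can help to confirm that the uploaded token file is where the notebook expects it and that it parses as JSON. The following is a minimal sketch and not part of the Vectice API: `check_token_file` is a hypothetical helper, and it makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def check_token_file(path):
    """Return the parsed token file, or raise a descriptive error.

    Hypothetical helper for illustration only. It checks that the file
    exists and holds a JSON object, not that the token itself is valid.
    """
    token_path = Path(path)
    if not token_path.is_file():
        raise FileNotFoundError(f"Token file not found: {token_path.resolve()}")
    with token_path.open(encoding="utf-8") as handle:
        try:
            content = json.load(handle)
        except json.JSONDecodeError as error:
            raise ValueError(f"{token_path} is not valid JSON: {error}") from error
    if not isinstance(content, dict):
        raise ValueError(f"{token_path} should contain a JSON object")
    return content
```

Calling `check_token_file("My-token.json")` ahead of `vect.connect(...)` would surface a missing or corrupted upload with a clearer message than a failed connection.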

## Specify which project phase you want to document
In the Vectice UI, navigate to your personal workspace. Inside your default Tutorial project, open the Model Retraining phase you just created, then copy its Phase ID and paste it below.

In [None]:
phase = vec.phase("PHA-xxxx")  # Put your own Model Retraining phase ID

## Next we are going to create an iteration
An iteration allows you to organize your work in repeatable sequences of steps. You can have multiple iterations within a phase.

In [None]:
model_iteration = phase.create_iteration()

## Retrieve the cleaned dataset version created in the Data Preparation phase for lineage
You can retrieve a variety of Vectice objects with the `browse('VECTICE-ID')` method, including Phases, Iterations, Datasets, and Models.

Inside your project, go to the Dataset tab and look for "ProductSales Cleaned" to get the Vectice ID. This is the Wrapped Dataset created in the Data Preparation phase and will be useful for the lineage.

In [None]:
cleaned_ds = vec.browse("DTS-xxxx")  # Use the ID of the cleaned dataset created in the Data Preparation phase
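
A common slip at this point is leaving the `xxxx` placeholder in place or pasting a phase ID where a dataset ID is expected. The sketch below catches both before any call to Vectice is made. It assumes that IDs consist of a three-letter uppercase prefix, a dash, and digits; that shape is inferred from the `PHA-xxxx` and `DTS-xxxx` placeholders in this notebook, not taken from the API reference, and `check_vectice_id` is a hypothetical helper.

```python
import re

# Assumed ID shape: three uppercase letters, a dash, then digits (e.g. "PHA-1234").
# Inferred from the placeholders in this notebook, not from the Vectice API reference.
ID_PATTERN = re.compile(r"^(?P<prefix>[A-Z]{3})-(?P<number>\d+)$")


def check_vectice_id(value, expected_prefix):
    """Return the stripped ID, or raise ValueError if it looks wrong.

    Hypothetical helper for illustration only.
    """
    candidate = value.strip()
    match = ID_PATTERN.match(candidate)
    if match is None:
        raise ValueError(
            f"{value!r} does not look like an ID such as {expected_prefix}-1234"
        )
    if match.group("prefix") != expected_prefix:
        raise ValueError(f"Expected a {expected_prefix}- ID, got {value!r}")
    return candidate
```

For example, `check_vectice_id("DTS-xxxx", "DTS")` raises a `ValueError` because the placeholder was never replaced.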

## Log a dataset version

Use the following code block to create a local dataset. We assume that the dataset contains data and that it was already cleaned in the previous phases.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/vectice/GettingStarted/23.2.4.1-Tutorial_update/23.2/tutorial/ProductSales%20Cleaned.csv")
df.head()

## Remove the Postal Code column

In [None]:
X = df.drop(["Sales", "Postal Code"], axis=1)
y = df["Sales"]
print(X.shape)
print(y.shape)
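
Because the whole point of this retraining is to keep `Postal Code` out of the features, an explicit check is cheap insurance. The sketch below runs on a toy frame rather than the tutorial data; the `Region` column and all values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for the cleaned dataset; "Region" and the values are made up.
toy_df = pd.DataFrame(
    {
        "Region": ["East", "West", "East"],
        "Postal Code": [10001, 94105, 10002],
        "Sales": [120.0, 80.5, 99.9],
    }
)

target = "Sales"
excluded = ["Postal Code"]

# Same pattern as the cell above: drop the target and the unwanted column.
toy_X = toy_df.drop(columns=[target, *excluded])
toy_y = toy_df[target]

# Fail loudly if the target or an excluded column is still among the features.
leaked = set(toy_X.columns) & {target, *excluded}
assert not leaked, f"Columns leaked into the features: {sorted(leaked)}"
print(list(toy_X.columns))
```

The same assertion can be applied to `X` from the cell above by replacing `toy_X` with `X`.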

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
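
With `test_size=0.2`, one fifth of the rows is held out for testing, and fixing `random_state` makes the partition reproducible across runs. A small sketch on ten stand-in rows (not the tutorial data) illustrates both properties:

```python
from sklearn.model_selection import train_test_split

rows = list(range(10))  # stand-in for ten dataset rows

first_train, first_test = train_test_split(rows, test_size=0.2, random_state=42)
second_train, second_test = train_test_split(rows, test_size=0.2, random_state=42)

# 80/20 split of ten rows: eight for training, two for testing.
print(len(first_train), len(first_test))

# The same seed yields the same partition.
assert first_test == second_test

# Every row lands in exactly one of the two sets.
assert sorted(first_train + first_test) == rows
```

Reproducibility matters here because the train and test files are logged to Vectice: rerunning the notebook with the same seed produces the same files.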

In [None]:
# Save the train and test splits of the modeling dataset as CSV files
train_df = X_train.copy()
test_df = X_test.copy()

train_df["Sales"] = y_train
test_df["Sales"] = y_test

train_df.to_csv("train dataset.csv", index=False)
test_df.to_csv("test dataset.csv", index=False)

### Log dataset metadata and statistics
Log dataset metadata and statistics to Vectice by passing the file resource path and a `pandas.DataFrame`.

In [None]:
train_ds = vect.FileResource(paths="train dataset.csv", dataframes=train_df)
test_ds = vect.FileResource(paths="test dataset.csv", dataframes=test_df)

In [None]:
modeling_dataset = vect.Dataset.modeling(
    name="ProductSales Modeling",
    training_resource=train_ds,
    testing_resource=test_ds,
    derived_from=cleaned_ds,
)

## Log the dataset version
Since the metadata for the modeling dataset has changed, Vectice will automatically create a new version of the existing ProductSales Modeling dataset with the updated metadata.

In [None]:
model_iteration.step_model_input_data = modeling_dataset

### Retrain a Ridge regressor

In [None]:
model = make_pipeline(OneHotEncoder(use_cat_names=True),
                     Ridge())
model.fit(X_train, y_train)

In [None]:
# Predict on the training data
y_train_pred = model.predict(X_train)
# Evaluate the model on the training data
mae_train = mean_absolute_error(y_train, y_train_pred)
print(round(mae_train, 2))

In [None]:
# Predict on the testing data
y_test_pred = model.predict(X_test)
# Evaluate the model on the testing data
mae_test = mean_absolute_error(y_test, y_test_pred)
print(round(mae_test, 2))
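
`mean_absolute_error` averages the absolute differences between observed and predicted values, so the result is expressed in the same units as `Sales`. The pure-Python sketch below reproduces that calculation on made-up numbers; `mean_absolute_error_manual` is a hypothetical name used only for illustration.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |observed - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    total = sum(abs(true - pred) for true, pred in zip(y_true, y_pred))
    return total / len(y_true)


# Made-up values: absolute errors are 0.5, 0.5, 0.0 and 1.0.
observed = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(mean_absolute_error_manual(observed, predicted))  # prints 0.5
```

When comparing the two scores above, a test MAE far larger than the training MAE usually signals overfitting.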

In [None]:
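# Recent category_encoders releases expose get_feature_names_out();
# older releases used get_feature_names() instead.
features = model.named_steps["onehotencoder"].get_feature_names_out()
importance = model.named_steps["ridge"].coef_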

feat_imf = pd.Series(importance, index=features).sort_values()

feat_imf.tail(10).plot(kind="barh")
plt.ylabel("Features")
plt.xlabel("Importance")
plt.title("Feature Importance")
plt.tight_layout()
plt.savefig("Feature Importance.png")
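
Note that `sort_values()` orders coefficients by their signed value, so `tail(10)` shows the ten largest positive coefficients and leaves out strongly negative ones. If the goal is to rank features by the size of their effect regardless of direction, one option is to sort by absolute value. A minimal sketch on made-up coefficients:

```python
import pandas as pd

# Made-up coefficients; the feature names are placeholders.
toy_coefs = pd.Series({"feature_a": -5.0, "feature_b": 2.0, "feature_c": 0.5})

# Signed sort: the last entry is the largest positive coefficient.
by_value = toy_coefs.sort_values()

# Absolute-value sort: the last entry is the coefficient with the largest magnitude.
by_magnitude = toy_coefs.sort_values(key=lambda s: s.abs())

print(by_value.index[-1])
print(by_magnitude.index[-1])
```

Here the signed sort ranks `feature_b` last, while the magnitude sort ranks `feature_a` last, since its coefficient of -5.0 has the largest absolute value.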

## Log a trained model version with an attachment

Then we log a trained model to Vectice using the `vectice.Model` class, available here as `vect.Model`.

In [None]:
vect_model = vect.Model(
    library="scikit-learn",
    technique="Ridge Regression",
    metrics={"mae_train": round(mae_train, 2), "mae_test": round(mae_test, 2)},
    properties=model.named_steps,
    predictor=model,
    derived_from=modeling_dataset,
    attachments="Feature Importance.png",
)

## Add the retrained model to the step
You can add multiple models to a single step by using the `+=` operator.

In [None]:
model_iteration.step_build_model += vect_model

## Add a comment 

Assigning a string to a step adds it as a comment.

In [None]:
# Record the validation outcome as a comment on the step
model_iteration.step_model_validation = f"Model passed acceptance criteria\nMAE Train: {round(mae_train,2)}\nMAE Test: {round(mae_test,2)}"

### Once you are satisfied with your iteration, you can complete it so that it can no longer be modified, and then request a review

In [None]:
model_iteration.complete()

✴ You can view your registered assets and comments in the UI by clicking the links in the output messages.