# Before you start with this Model Retraining Notebook

This notebook is the primary notebook of the Vectice Tutorial. It illustrates how to use Vectice in a realistic business scenario. In this notebook, we retrain a model after dropping a column from our dataset. The notebook maps to the "Model Retraining" phase of the **"Tutorial: Forecast in store-unit sales"** project, which you can find in your personal Vectice workspace.

### Pre-requisites:
Before using this notebook you will need:
* An account in Vectice
* An API token to connect to Vectice through the APIs
* Your Model Retraining Phase Id

Refer to the Vectice Tutorial Guide for more detailed instructions: https://docs.vectice.com/getting-started/tutorial


### Other Resources
*   Vectice Documentation: https://docs.vectice.com/
*   Vectice API documentation: https://api-docs.vectice.com/

## In this exercise, we assume that we want to retrain a Ridge model because the variable 'postal code' was accidentally left in our initial modeling dataset

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Install the latest Vectice Python client library

In [None]:
%pip install -q vectice -U

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

## Get started by connecting to Vectice


<div class="alert" style="color: #383d41; background-color: #e2e3e5; border-color: #d6d8db" role="alert">
<b>Automated code lineage:</b> The code lineage functionalities are not covered in this Tutorial because they require setting up a Git repository first.
</div>

**First, we need to authenticate to the Vectice server. Before proceeding further:**

- Visit the Vectice app to create and copy an API token (cf. https://docs.vectice.com/getting-started/create-an-api-token)

- Paste the API token in the code below

In [None]:
import vectice

connect = vectice.connect(api_token="my-api-token") #Paste your API token
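
Hard-coding the token in a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, you could read the token from an environment variable instead. The variable name `VECTICE_API_TOKEN` and the helper `load_api_token` are illustrative choices made for this example, not something this tutorial defines.

```python
import os


def load_api_token(var_name: str = "VECTICE_API_TOKEN") -> str:
    """Return the API token stored in an environment variable.

    The variable name is an assumption made for this sketch.
    """
    token = os.environ.get(var_name, "").strip()
    # Reject a missing value as well as the placeholder used in the cell above
    if not token or token == "my-api-token":
        raise RuntimeError(
            f"Set {var_name} to your Vectice API token before connecting."
        )
    return token


# Usage (requires the variable to be set and the vectice package installed):
# connect = vectice.connect(api_token=load_api_token())
```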

## Specify which project phase you want to document

- In the Vectice UI, navigate to your personal workspace, then open your Tutorial: Forecast in store-unit sales project.

- Go to the Model Retraining Phase and copy and paste your Phase Id below. (cf. https://docs.vectice.com/getting-started/tutorial#duplicate-phase)

In [None]:
phase = connect.phase("PHA-xxxx") # Paste your own Model Retraining phase ID

### Next we are going to create an iteration
An iteration allows you to organize your work in repeatable sequences. You can have multiple iterations within a phase. Iterations can be organized into sections.

In [None]:
iteration = phase.create_or_get_current_iteration()

## Re-create the modeling Dataset without the postal code

Load the data from GitHub. This DataFrame has already been cleaned as part of the Data Preparation Phase.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/vectice/GettingStarted/main/23.3/tutorial/ProductSales%20Cleaned.csv", converters = {'Postal Code': str})
df.head()

### Remove Postal code

In [None]:
X = df.drop(["Sales", "Postal Code"],axis=1)
y = df["Sales"]
print(X.shape)
print(y.shape)

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [None]:
# Save the modeling train test split datasets as csv files
train_df = X_train.copy()
test_df = X_test.copy()

train_df["Sales"] = y_train
test_df["Sales"] = y_test

train_df.to_csv("train dataset.csv", index=False)
test_df.to_csv("test dataset.csv", index=False)
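
Before logging the files, it can be worth confirming that the dropped column really is absent from what was written to disk. The sketch below reads only the header row with the standard library. The helper name `csv_columns` and the demo file with made-up values are assumptions for illustration.

```python
import csv
import os
import tempfile


def csv_columns(path: str) -> list[str]:
    """Return the header row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle))


# Self-contained demonstration on a throwaway file with made-up values;
# in the notebook you would call csv_columns("train dataset.csv") instead.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo dataset.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Quantity", "Discount", "Profit", "Sales"])
        writer.writerow([2, 0.0, 41.91, 261.96])
    demo_columns = csv_columns(demo_path)

# The removed column must not appear, while the target column must
assert "Postal Code" not in demo_columns
assert "Sales" in demo_columns
```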

### Log the recreated modeling Dataset in Vectice
The Vectice resource will automatically extract pertinent metadata from the local dataset file and collect statistics (optional) from the pandas dataframe. This information will be documented within the iteration as part of a Dataset version.

In [None]:
train_ds = vectice.FileResource(paths="train dataset.csv", dataframes=train_df)
test_ds = vectice.FileResource(paths="test dataset.csv", dataframes=test_df)

In [None]:
#Declare the modeling Dataset
modeling_dataset = vectice.Dataset.modeling(
        name="ProductSales Modeling",
        training_resource=train_ds,
        testing_resource=test_ds, 
    )

In [None]:
# Log the new modeling Dataset to the iteration to document it inside the model input data section 
iteration.log(modeling_dataset, section = "model input data")

## Log a comment to indicate you removed the "postal code" column

Logging a `string` to an iteration will log a comment. We will utilize the optional parameter called `section` to enhance the organization of an iteration. Sections can be dynamically generated either through the API or the App.

In [None]:
iteration.log("The postal code column was removed from the modeling dataset", section = "model input data")

## Retrain a Ridge regressor model

In [None]:
OHE = OneHotEncoder(handle_unknown='infrequent_if_exist')
scaler = StandardScaler()

cat_cols = ['Ship Mode', 'Segment', 'Country', 'City', 'State','Region', 'Category', 'Sub-Category']
num_cols = ['Quantity', 'Discount', 'Profit']

transformer = ColumnTransformer([('cat_cols', OHE, cat_cols),
                                ('num_cols', scaler, num_cols)])

model = make_pipeline(transformer,Ridge())
model.fit(X_train,y_train)

In [None]:
# Making Prediction with the training data
y_train_pred = model.predict(X_train)
#Evaluating the model 
mae_train=mean_absolute_error(y_train, y_train_pred)
print(round(mae_train,2))

In [None]:
# Making Prediction with the testing data
y_test_pred = model.predict(X_test)
#Evaluating the model 
mae_test = mean_absolute_error(y_test, y_test_pred)
print(round(mae_test,2))
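
For reference, `mean_absolute_error` averages the absolute differences between actual and predicted values, so the result is expressed in the same unit as `Sales`. The pure-Python sketch below reproduces that calculation on made-up numbers. The function name `mean_absolute_error_manual` is hypothetical.

```python
def mean_absolute_error_manual(y_true, y_pred):
    """Average of |actual - predicted| over all samples."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: absolute errors are 10, 20, 0 and 10, so the mean is 10.0
actual = [100.0, 250.0, 80.0, 40.0]
predicted = [110.0, 230.0, 80.0, 50.0]
mae_demo = mean_absolute_error_manual(actual, predicted)
print(round(mae_demo, 2))  # 10.0
```

A test MAE that is much larger than the training MAE is a common sign of overfitting.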

In [None]:
#Generate feature importance
feature_names = transformer.get_feature_names_out()
feature_importances = model.named_steps['ridge'].coef_

feat_imf = pd.Series(feature_importances, index=feature_names).sort_values()

feat_imf.tail(10).plot(kind="barh")
plt.ylabel("Features")
plt.xlabel("Importance")
plt.title("Feature Importance")
plt.tight_layout()
plt.savefig("Feature Importance.png")
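
Because `sort_values()` sorts in ascending order, `tail(10)` in the cell above plots the ten largest positive coefficients. A strongly negative Ridge coefficient influences predictions just as much, so ranking by magnitude can be a useful complement. The sketch below does this in plain Python. The helper `top_by_magnitude` and the coefficient values are made up for illustration; in the notebook the input could come from `feat_imf.to_dict()`.

```python
def top_by_magnitude(coefficients, k=10):
    """Return the k (feature, coefficient) pairs with the largest absolute value."""
    ranked = sorted(
        coefficients.items(), key=lambda item: abs(item[1]), reverse=True
    )
    return ranked[:k]


# Hypothetical coefficients, named the way ColumnTransformer prefixes its outputs
demo_coefficients = {
    "num_cols__Profit": 120.5,
    "num_cols__Discount": -85.2,
    "num_cols__Quantity": 30.1,
    "cat_cols__Segment_Consumer": -2.4,
}

top_two = top_by_magnitude(demo_coefficients, k=2)
for name, value in top_two:
    print(f"{name}: {value}")
```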

## Log the retrained Ridge model we created with the feature importance graph as attachment

In [None]:
#Declare the Ridge model with the information you want to document
vect_model = vectice.Model(library="scikit-learn", 
                            technique="Ridge Regression",
                            metrics={"mae_train": round(mae_train,2), "mae_test": round(mae_test,2)}, 
                            properties=model.named_steps, 
                            predictor=model,                        # Pass your model as a predictor to save it as a pickle file
                            derived_from=modeling_dataset,          # Pass your modeling dataset to document the lineage
                            attachments="Feature Importance.png")   # Pass your Feature Importance graph as an attachment

In [None]:
# Log the Ridge model to the iteration to document it inside the build model section 
iteration.log(vect_model, section = "build model")

You can add multiple models to a single iteration.

## Log a comment 

Similarly to the **model input data** section, passing a `string` to an iteration will log a comment.

In [None]:
# Specify some validation metrics that will be used for reviewing the model
comment = f"Model passed acceptance criteria\nMAE Train: {round(mae_train,2)}\nMAE Test: {round(mae_test,2)}"
iteration.log(comment, section = "model validation")
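
The comment above states that the model passed acceptance criteria without showing the check itself. As a minimal sketch, the verdict could be derived from explicit thresholds so the logged comment always reflects the metrics. The function `validation_comment`, its parameters `max_mae_test` and `max_gap_ratio`, and the values used are hypothetical and not part of the tutorial.

```python
def validation_comment(mae_train, mae_test, max_mae_test, max_gap_ratio=0.25):
    """Build a review comment whose verdict follows from the metrics.

    The model passes when the test MAE is under max_mae_test and the
    relative gap between test and train MAE is under max_gap_ratio.
    """
    if mae_train:
        gap_ratio = abs(mae_test - mae_train) / mae_train
    else:
        gap_ratio = float("inf")
    passed = mae_test <= max_mae_test and gap_ratio <= max_gap_ratio
    verdict = "passed" if passed else "failed"
    return (
        f"Model {verdict} acceptance criteria\n"
        f"MAE Train: {round(mae_train, 2)}\n"
        f"MAE Test: {round(mae_test, 2)}"
    )


# Made-up metrics: test MAE is under the limit and within 25% of train MAE
demo_comment = validation_comment(mae_train=100.0, mae_test=110.0, max_mae_test=120.0)
print(demo_comment)
```

The returned string can then be logged the same way as the comment in the cell above.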

### Once you are satisfied with your iteration you can complete it.

In [None]:
iteration.complete()

### Completed iterations can't be modified anymore. This can be useful as part of the review process.

## 🥇 Congrats! You learned how to successfully use Vectice to auto-document the Model Retraining phase of the Tutorial Project.
### Next we encourage you to follow [part 2](https://docs.vectice.com/getting-started/tutorial#part-2) of the Tutorial guide to continue learning about Vectice.

✴ You can view your registered assets and comments in the UI by clicking the links in the output messages.