# Before you start with this Modeling Notebook

This notebook is part of the Vectice tutorial project notebook series. It illustrates how the forecast model was trained in the "Modeling" phase of the **"Tutorial: Forecast in store-unit sales"** project, which you can find in your personal Vectice workspace.

### Pre-requisites:
Before using this notebook you will need:
* An account in Vectice
* An API key to connect to Vectice through the APIs
* The Phase Id of the project where you want to log your work

Refer to the Vectice Tutorial Guide for more detailed instructions: https://docs.vectice.com/v/24.2.11/getting-started/tutorial


### Other Resources
*   Vectice Documentation: https://docs.vectice.com/v/24.2.11/
*   Vectice API documentation: https://api-docs.vectice.com/24.2.11/


In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Install the latest Vectice Python client library

In [None]:
%pip install -q -U vectice

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

## Get started by connecting to Vectice

<div class="alert" style="color: #383d41; background-color: #e2e3e5; border-color: #d6d8db" role="alert">
<b>Automated code lineage:</b> The code lineage functionalities are not covered in this tutorial because they require first setting up a Git repository.
</div>

**First, we need to authenticate to the Vectice server. Before proceeding further:**

- Visit the Vectice app to create and copy an API key (see https://docs.vectice.com/v/24.2.11/getting-started/create-an-api-key)

- Paste the API key in the code below

In [None]:
import vectice

connect = vectice.connect(api_token="your-api-key", host="your-host-info")  # Paste your API key and host
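
As an alternative to pasting the key into the notebook, the credentials can be read from the environment. The following is a minimal sketch and not part of the original tutorial: the variable names `VECTICE_API_TOKEN` and `VECTICE_HOST` and the helper `load_vectice_credentials` are assumptions chosen for illustration.

```python
import os


def load_vectice_credentials(env=None):
    """Return keyword arguments for vectice.connect() read from a mapping.

    The variable names below are assumptions made for this sketch.
    """
    env = os.environ if env is None else env
    required = ("VECTICE_API_TOKEN", "VECTICE_HOST")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {"api_token": env["VECTICE_API_TOKEN"], "host": env["VECTICE_HOST"]}


# Example with placeholder values; omit the argument to read os.environ instead.
credentials = load_vectice_credentials(
    {"VECTICE_API_TOKEN": "your-api-key", "VECTICE_HOST": "your-host-info"}
)
# connect = vectice.connect(**credentials)
```

Keeping the key out of the notebook makes it less likely to be committed or shared by accident.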

## Specify which project phase you want to document
In the Vectice app, navigate to your personal workspace, open your default Tutorial project, go to the Modeling phase, and copy and paste your Phase Id below.

In [None]:
phase = connect.phase("PHA-xxxx") #Paste your own Modeling Phase ID

## Next we are going to create an iteration
An iteration allows you to organize your work in repeatable sequences. You can have multiple iterations within a phase, and each iteration can be organized into sections.

In [None]:
iteration = phase.create_or_get_current_iteration()

## Retrieve your cleaned Dataset previously created in your Data Preparation phase of the project
You can retrieve a variety of Vectice objects with the `browse('VECTICE-ID')` method, including Phases, Iterations, Datasets, and Models.

In [None]:
cleaned_ds = connect.browse("DTV-xxxx") #Get the ID of your Clean Dataset created in the Data Preparation phase

### Retrieve your clean dataframe

Load the data from GitHub. This DataFrame has already been cleaned as part of the Data Preparation Phase.

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/vectice/GettingStarted/main/Tutorial%20data/ProductSales%20Cleaned.csv", converters = {'Postal Code': str})
df.head()

In [None]:
target = "Sales"
X = df.drop([target], axis=1)
y = df[target]
print(X.shape)
print(y.shape)

In [None]:
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.2, random_state=42)
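
To see what `test_size=0.2` and `random_state=42` do, here is a small sketch on toy data. The lists `X_demo` and `y_demo` are hypothetical stand-ins for the tutorial's features and target.

```python
from sklearn.model_selection import train_test_split

# Toy data: ten rows standing in for the tutorial's features and target.
X_demo = [[i] for i in range(10)]
y_demo = [i * 1.5 for i in range(10)]

# Same arguments as the notebook's call, run twice.
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # 8 training rows, 2 test rows
print(split_a == split_b)    # True: the same seed gives the same partition
```

Fixing the seed means the train and test CSV files saved in the next cell contain the same rows on every run.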

In [None]:
# Save the modeling train test split datasets as csv files
train_df = X_train.copy()
test_df = X_test.copy()

train_df["Sales"] = y_train
test_df["Sales"] = y_test

train_df.to_csv("train dataset.csv", index=False)
test_df.to_csv("test dataset.csv", index=False)

### Log a modeling Dataset
The Vectice resource will automatically extract pertinent metadata from the local dataset file and collect statistics from the pandas dataframe. This information will be documented within the iteration as part of a Dataset version.

In [None]:
train_ds = vectice.FileResource(paths="train dataset.csv", dataframes=train_df)
test_ds = vectice.FileResource(paths="test dataset.csv", dataframes=test_df)

In [None]:
modeling_dataset = vectice.Dataset.modeling(
        name="ProductSales Modeling",
        training_resource=train_ds,
        testing_resource=test_ds, 
        derived_from=cleaned_ds,
    )

In [None]:
iteration.log(modeling_dataset, section = "model input data")

In [None]:
# Baseline mean absolute error: always predict the mean of the training target
y_mean=y_train.mean()
y_mean_pred=[y_mean] * len(y_train)
baseline_mae=mean_absolute_error(y_train,y_mean_pred)
round(baseline_mae,2)
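
The baseline above predicts the training mean for every row and then measures the average absolute gap between that constant and the actual values. A worked example with hypothetical numbers, chosen so the arithmetic is easy to follow:

```python
from statistics import mean

# Hypothetical training targets, not values from the tutorial dataset.
y_train_demo = [10.0, 20.0, 30.0, 40.0]

# The constant predictor outputs the training mean for every row.
constant_prediction = mean(y_train_demo)  # 25.0

# MAE is the average absolute gap between each target and that constant:
# (15 + 5 + 5 + 15) / 4
baseline_mae_demo = mean(abs(y - constant_prediction) for y in y_train_demo)
print(baseline_mae_demo)  # 10.0
```

Note that the notebook measures the baseline on the training targets. Evaluating the same constant against `y_test` would give a figure that is directly comparable to a test-set MAE.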

## Log a Baseline model

First, we establish a baseline by logging to Vectice a naive model that always returns the average sales.

In [None]:
# Baseline `model` to compare the Ridge Regression against
vect_baseline_model = vectice.Model(name = "Baseline", library="Own", technique="Constant predictor", metrics={"mae_baseline": round(baseline_mae,2)}, derived_from=modeling_dataset)

In [None]:
iteration.log(vect_baseline_model, section = "build model")

### Train a Ridge regressor as a challenger

In [None]:
OHE = OneHotEncoder(handle_unknown='infrequent_if_exist')
scaler = StandardScaler()

cat_cols = ['Ship Mode', 'Segment', 'Country', 'City', 'State', 'Postal Code','Region', 'Category', 'Sub-Category']
num_cols = ['Quantity', 'Discount', 'Profit']

transformer = ColumnTransformer([('cat_cols', OHE, cat_cols),
                                ('num_cols', scaler, num_cols)])

model = make_pipeline(transformer,Ridge())
model.fit(X_train,y_train)

In [None]:
# Making Prediction with the training data
y_train_pred = model.predict(X_train)

In [None]:
#Evaluating the model 
mae_train=mean_absolute_error(y_train, y_train_pred)
print(round(mae_train,2))

In [None]:
#Making prediction on test
y_test_pred = model.predict(X_test)

In [None]:
#Evaluating the model 
mae_test = mean_absolute_error(y_test, y_test_pred)
print(round(mae_test,2))
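
One way to read these MAE values is relative to the baseline logged earlier. The helper below is a hypothetical sketch, and the numbers are illustrative only; they are not results from this notebook.

```python
def relative_improvement(baseline_mae, model_mae):
    """Fraction by which the model reduces MAE relative to the baseline."""
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return (baseline_mae - model_mae) / baseline_mae


# Hypothetical values for illustration only.
example_baseline_mae = 200.0
example_test_mae = 150.0

improvement = relative_improvement(example_baseline_mae, example_test_mae)
print(f"{improvement:.0%}")  # 25%
```

A negative result would mean the model performs worse than always predicting the average.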

In [None]:
feature_names = transformer.get_feature_names_out()
feature_importances = model.named_steps['ridge'].coef_

feat_imf = pd.Series(feature_importances, index=feature_names).sort_values()

feat_imf.tail(10).plot(kind="barh")
plt.ylabel("Features")
plt.xlabel("Importance")
plt.title("Feature Importance")
plt.tight_layout()
plt.savefig("Feature Importance.png")
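
Ridge coefficients are signed, so the plot above, which takes the last ten entries of the sorted series, shows the ten largest positive coefficients. If the goal is to rank features by strength of effect regardless of direction, one option is to sort by absolute value instead. A minimal sketch with hypothetical coefficients and placeholder feature names:

```python
import pandas as pd

# Hypothetical coefficients; the names are placeholders, not dataset features.
coefs = pd.Series(
    {"feature_a": 0.5, "feature_b": -3.0, "feature_c": 2.0, "feature_d": -0.1}
)

# Ranking by signed value: large negative coefficients fall to the bottom.
top_signed = coefs.sort_values().tail(2)

# Ranking by magnitude: order by |coefficient| but keep the original sign.
top_by_magnitude = coefs.reindex(coefs.abs().sort_values().index).tail(2)

print(list(top_signed.index))        # ['feature_a', 'feature_c']
print(list(top_by_magnitude.index))  # ['feature_c', 'feature_b']
```

Which ranking is appropriate depends on whether the chart is meant to show the drivers of higher sales or the most influential features overall.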

## Log the Ridge model we created with the feature importance graph as attachment



In [None]:
vect_model = vectice.Model(library="scikit-learn", 
                            technique="Ridge Regression",
                            metrics={"mae_train": round(mae_train,2), "mae_test": round(mae_test,2)}, 
                            properties=model.named_steps, 
                            predictor=model,                        # Pass your model as a predictor to save it as a pickle file
                            derived_from=modeling_dataset,          # Pass your modeling dataset to document the lineage
                            attachments="Feature Importance.png")   # Pass your feature importance graph as an attachment

### Assign the Ridge model to the iteration

You can log multiple models to a single iteration by using the `iteration.log()` method.

In [None]:
iteration.log(vect_model, section = "build model")

## Add a note 

Logging a `string` to an iteration will add a note.

In [None]:
# Log a note summarizing the validation results
iteration.log(f"Model passed acceptance criteria\nMAE Train: {round(mae_train,2)}\nMAE Test: {round(mae_test,2)}", section = "model validation")

In [None]:
iteration.complete()

## 🥇 Congrats! You learned how to successfully use Vectice to auto-document the Modeling phase of the Tutorial Project.<br>
### Next, we encourage you to explore the other notebooks in the tutorial series. You can find them in the Vectice Tutorial Guide: [Want to learn more about the other phases of the tutorial project?](https://docs.vectice.com/v/24.2.11/getting-started/tutorial#want-to-learn-more-about-the-other-phases-of-the-tutorial-project)

✴ You can view your registered assets and notes in the Vectice app by clicking the links in the output messages.