# Before you start with this Modeling Notebook

This notebook is part of the Vectice tutorial project notebook series. It illustrates how the forecast model was trained in the "Modeling" phase of the **"Tutorial: Forecast in store-unit sales"** project, which you can find in your personal Vectice workspace.

### Pre-requisites:
Before using this notebook you will need:
* An account in Vectice
* An API token to connect to Vectice through the APIs
* The Phase Id of the project where you want to log your work

Refer to Vectice Tutorial Guide for more detailed instructions: https://docs.vectice.com/getting-started/tutorial


### Other Resources
*   Vectice Documentation: https://docs.vectice.com/
*   Vectice API documentation: https://api-docs.vectice.com/

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

## Install the latest Vectice Python client library

In [None]:
%pip install -q -U vectice

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

In [None]:
from google.cloud import aiplatform

## Get started by connecting to Vectice

You can learn more about the `Connection` object in the [documentation](https://api-docs.vectice.com/reference/vectice/connection/).

<div class="alert" style="color: #383d41; background-color: #e2e3e5; border-color: #d6d8db" role="alert">
<b>Automated code lineage:</b> The code lineage functionalities are not covered in this Tutorial because they require first setting up a Git repository.
</div>

**First, we need to authenticate to the Vectice server. Before proceeding further:**

- Visit the Vectice app to create and copy an API token (cf. https://docs.vectice.com/getting-started/create-an-api-token)

- Paste the API token in the code below

In [None]:
import vectice

vec = vectice.connect(api_token="my-api-token") #Paste your API token
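
If you prefer not to paste the token into the notebook, one option is to read it from an environment variable. The following is a minimal sketch, not part of the original tutorial: the variable name `VECTICE_API_TOKEN` and the helper `get_api_token` are hypothetical names chosen for illustration.

```python
import os

# Hypothetical variable name; use whatever name suits your environment.
TOKEN_ENV_VAR = "VECTICE_API_TOKEN"


def get_api_token(env=None):
    """Return the API token from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set {TOKEN_ENV_VAR} before connecting to Vectice.")
    return token


# Usage (requires the variable to be set and the vectice package installed):
# vec = vectice.connect(api_token=get_api_token())
```

This keeps the token out of the saved notebook, which matters if the notebook is shared or committed to a repository.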

## Specify which project phase you want to document
In the Vectice UI, navigate to your personal workspace. Inside your default Tutorial project, go to the Modeling phase, then copy and paste your Phase Id below.

In [None]:
phase = vec.phase("PHA-xxxx") #Paste your own Modeling Phase ID

## Next we are going to create an iteration
An iteration allows you to organize your work in repeatable sequences of steps. You can have multiple iterations within a phase.

In [None]:
model_iteration = phase.create_iteration()

## Retrieve your cleaned Dataset previously created in your Data Preparation phase of the project
You can retrieve a variety of Vectice objects with the `browse('VECTICE-ID')` method, including Phases, Iterations, Datasets, and Models.

In [None]:
cleaned_ds = vec.browse("DTV-xxxx") #Get the ID of your Clean Dataset created in the Data Preparation phase

## Log a Dataset version

Use the following code block to log a local Dataset

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/vectice/GettingStarted/main/23.2/tutorial/ProductSales%20Cleaned.csv", converters = {'Postal Code': str})
df.to_csv("ProductSales Cleaned.csv", index=False)
df.head()

In [None]:
target="Sales"
X=df.drop([target],axis=1)
y=df[target]
print(X.shape)
print(y.shape)

In [None]:
X_train, X_test, y_train, y_test=train_test_split(X, y, test_size=0.2, random_state=42)
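
To make the effect of `test_size=0.2` and `random_state=42` concrete, here is a plain-Python sketch of a seeded, shuffled split. It illustrates the idea only: `split_indices` is a hypothetical helper, and it uses a different random number generator than scikit-learn, so the selected rows will not match those of `train_test_split`.

```python
import math
import random


def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed, then cut off a test fraction."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_size)
    # The first n_test shuffled indices form the test set, the rest the train set.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))  # 8 2
```

Because the seed is fixed, calling the function again returns the same split, which is what makes the logged train and test datasets reproducible.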

In [None]:
# Save the modeling train test split datasets as csv files
train_df = X_train.copy()
test_df = X_test.copy()

train_df["Sales"] = y_train
test_df["Sales"] = y_test

train_df.to_csv("train dataset.csv", index=False)
test_df.to_csv("test dataset.csv", index=False)

### Log a modeling Dataset
The Vectice resource will automatically extract pertinent metadata from the local dataset file and collect statistics from the pandas dataframe. This information will be documented within the iteration as part of a Dataset version.

In [None]:
train_ds = vectice.FileResource(paths="train dataset.csv", dataframes=train_df)
test_ds = vectice.FileResource(paths="test dataset.csv", dataframes=test_df)

In [None]:
modeling_dataset = vectice.Dataset.modeling(
        name="ProductSales Modeling",
        training_resource=train_ds,
        testing_resource=test_ds, 
        derived_from=cleaned_ds,
    )

In [None]:
model_iteration.step_model_input_data = modeling_dataset

# Log a Dataset with Vertex AI

In [None]:
aiplatform.init(project="tries-and-spikes", location="us-central1")

dataset = aiplatform.TabularDataset.create(
    display_name="ProductSales Modeling",
    gcs_source=["gs://aidan_vertex_tutorial/tutorial/test dataset.csv","gs://aidan_vertex_tutorial/tutorial/train dataset.csv"],
)

dataset.wait()

print(f'\tDataset: "{dataset.display_name}"')
print(f'\tname: "{dataset.resource_name}"')

In [None]:
from google.cloud import storage
# Set up the Google Cloud Storage client. It is used to create the vectice.GCSResource below.
gcs_client = storage.Client() # You might need to pass credentials, depending on the environment you're in.

In [None]:
# Pass a GCS URI, a pandas DataFrame (used to capture statistics), and the GCS client.
train_dataset = vectice.GCSResource("gs://aidan_vertex_tutorial/tutorial/train dataset.csv", pd.read_csv("gs://aidan_vertex_tutorial/tutorial/train dataset.csv"), gcs_client=gcs_client)

In [None]:
test_dataset = vectice.GCSResource("gs://aidan_vertex_tutorial/tutorial/test dataset.csv", pd.read_csv("gs://aidan_vertex_tutorial/tutorial/test dataset.csv"), gcs_client=gcs_client)

In [None]:
modeling_dataset = vectice.Dataset.modeling(
        name="ProductSales Modeling",
        training_resource=train_dataset,
        testing_resource=test_dataset, 
        derived_from=cleaned_ds,
    )

In [None]:
model_iteration.step_model_input_data = modeling_dataset

In [None]:
# Baseline mean absolute error
y_mean=y_train.mean()
y_mean_pred=[y_mean] * len(y_train)
baseline_mae=mean_absolute_error(y_train,y_mean_pred)
round(baseline_mae,2)

## Log a Baseline model with Vertex AI & Vectice

First, we log a naive model to Vectice that always returns the average sales, to establish a baseline.
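
As a worked illustration of the baseline metric, the sketch below computes the mean absolute error of a constant predictor in plain Python. The sales figures are made up for the example, and `mean_absolute_error_constant` is a hypothetical helper, not part of scikit-learn or Vectice.

```python
def mean_absolute_error_constant(y_true):
    """MAE of a predictor that always returns the mean of y_true."""
    y_mean = sum(y_true) / len(y_true)
    return sum(abs(y - y_mean) for y in y_true) / len(y_true)


# Made-up sales figures, for illustration only.
toy_sales = [10.0, 20.0, 30.0, 60.0]

# The mean is 30.0, so the MAE is (20 + 10 + 0 + 30) / 4 = 15.0
print(mean_absolute_error_constant(toy_sales))
```

Any trained model should achieve a lower error than this value to be worth considering.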

In [None]:
aiplatform.init(project="tries-and-spikes", experiment='baseline-01')

In [None]:
experiment_run = aiplatform.start_run("run-003")

In [None]:
aiplatform.log_metrics({"mae_baseline": round(baseline_mae,2)})

In [None]:
aiplatform.log_params({"technique": "Constant predictor"})

In [None]:
aiplatform.end_run()

In [None]:
baseline_metrics = experiment_run.get_metrics()

In [None]:
base_line_params = experiment_run.get_params()

### Vectice log Baseline model 

In [None]:
# Baseline `model` to compare the Ridge Regression against
vect_baseline_model = vectice.Model(name="Baseline",
                            library="Own",
                            technique="Constant predictor",
                            properties=base_line_params,
                            metrics=baseline_metrics,
                            derived_from=modeling_dataset)

In [None]:
model_iteration.step_build_model = vect_baseline_model

### Train a Ridge regressor as a challenger

In [None]:
OHE = OneHotEncoder(handle_unknown='ignore')
scaler = StandardScaler()

cat_cols = ['Ship Mode', 'Segment', 'Country', 'City', 'State', 'Postal Code','Region', 'Category', 'Sub-Category']
num_cols = ['Quantity', 'Discount', 'Profit']

transformer = ColumnTransformer([('cat_cols', OHE, cat_cols),
                                ('num_cols', scaler, num_cols)])

model = make_pipeline(transformer,Ridge())
model.fit(X_train,y_train)
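
The `handle_unknown='ignore'` setting matters because the test set can contain categories (for example a city) that never appear in the training set. The sketch below mimics that behavior in plain Python with made-up values. It is an illustration of the idea, not scikit-learn's implementation, and `fit_categories` and `one_hot` are hypothetical helpers.

```python
def fit_categories(values):
    """Collect the sorted set of categories seen during fitting."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value; an unseen category yields an all-zero vector."""
    return [1 if value == category else 0 for category in categories]


# Made-up training values, for illustration only.
categories = fit_categories(["Consumer", "Corporate", "Consumer"])

print(one_hot("Corporate", categories))    # [0, 1]
print(one_hot("Home Office", categories))  # [0, 0] (unseen category is ignored)
```

Without this setting, an unseen category at prediction time would raise an error instead of producing an all-zero encoding.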

In [None]:
# Making Prediction with the training data
y_train_pred = model.predict(X_train)

In [None]:
#Evaluating the model 
mae_train=mean_absolute_error(y_train, y_train_pred)
print(round(mae_train,2))

In [None]:
#Making prediction on test
y_test_pred = model.predict(X_test)

In [None]:
#Evaluating the model 
mae_test = mean_absolute_error(y_test, y_test_pred)
print(round(mae_test,2))

In [None]:
feature_names = transformer.get_feature_names_out()
feature_importances = model.named_steps['ridge'].coef_

feat_imf = pd.Series(feature_importances, index=feature_names).sort_values()

feat_imf.tail(10).plot(kind="barh")
plt.ylabel("Features")
plt.xlabel("Importance")
plt.title("Feature Importance")
plt.tight_layout()
plt.savefig("Feature Importance.png")
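
Note that `sort_values().tail(10)` selects the ten largest coefficients, so features with large negative coefficients do not appear in the plot. If you want to rank by strength of effect regardless of sign, one option is to sort by absolute value. The sketch below shows the idea in plain Python with made-up coefficients; `top_by_magnitude` is a hypothetical helper.

```python
def top_by_magnitude(coefficients, k=10):
    """Return the k (name, coefficient) pairs with the largest absolute value."""
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Made-up coefficients, for illustration only.
toy_coefficients = {
    "num_cols__Profit": 0.8,
    "num_cols__Discount": -1.5,
    "num_cols__Quantity": 0.2,
}

print(top_by_magnitude(toy_coefficients, k=2))
# [('num_cols__Discount', -1.5), ('num_cols__Profit', 0.8)]
```

Whether to rank by signed or absolute value depends on what the chart is meant to communicate.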

# Log the Model with Vertex AI, then Vectice
- Log the Ridge model we created, with the feature importance graph as an attachment


In [None]:
aiplatform.init(experiment="ridge-regression-001")

In [None]:
# Assign the experiment run to a variable to easily capture its metrics and parameters with Vectice
experiment = aiplatform.start_run("run-003")

In [None]:
aiplatform.log_model(model)

In [None]:
aiplatform.log_metrics({"mae_train": round(mae_train,2), "mae_test": round(mae_test,2)})

In [None]:
parameters = {key: str(val) for key, val in model.named_steps.items()}

In [None]:
aiplatform.log_params(parameters)  # Step objects were converted to strings above so that only plain values are logged
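
Converting every value to a string also turns numeric parameters into text. A gentler alternative is to keep plain numbers and strings as they are and stringify only the rest. The sketch below assumes the tracking backend accepts `int`, `float`, and `str` values; `to_loggable_params` is a hypothetical helper, not part of Vertex AI or Vectice.

```python
def to_loggable_params(params):
    """Keep int, float and str values; convert everything else to its string form."""
    loggable = {}
    for key, value in params.items():
        # bool is a subclass of int, so it is excluded explicitly and stringified.
        if isinstance(value, (int, float, str)) and not isinstance(value, bool):
            loggable[key] = value
        else:
            loggable[key] = str(value)
    return loggable


# Made-up parameters, for illustration only.
print(to_loggable_params({"alpha": 1.0, "fit_intercept": True, "steps": ["a"]}))
# {'alpha': 1.0, 'fit_intercept': 'True', 'steps': "['a']"}
```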

In [None]:
aiplatform.end_run()

## Log the Ridge model with Vectice
- You can log everything as you do in Vertex AI
- However, with Vectice you can pass attachments such as graphs too

In [None]:
vect_model = vectice.Model(library="scikit-learn", 
                            technique="Ridge Regression",
                            metrics={"mae_train": round(mae_train,2), "mae_test": round(mae_test,2)}, 
                            properties=parameters, 
                            predictor=model,                        # Pass your model as a predictor to save it as a pickle file
                            derived_from=modeling_dataset,          # Pass your modeling dataset to document the lineage
                            attachments="Feature Importance.png")   # Pass your Feature Importance graph as an attachment

#### Retrieve a past Vertex Experiment
- You can retrieve a past Experiment run and capture its metrics and parameters with Vectice, as shown below
- Then simply pass these to the `vectice.Model`

In [None]:
# Retrieve past experiments and capture them with Vectice
experiment = aiplatform.ExperimentRun("run-002", "ridge-regression-001")

In [None]:
# Get metrics from the experiment 
ridge_metrics = experiment.get_metrics()

In [None]:
# Get parameters from the experiment
ridge_params = experiment.get_params()

In [None]:
# Get the predictor. Look up the artifact ID in the Vertex AI UI.
vertex_model = aiplatform.get_experiment_model(artifact_id="tutorial-ridge-sklearn-2023-07-18-07-10-16-0fae2-tb-run")

### Assign the Ridge model to the step

You can add multiple models to a single step by using the `+=` operator.

In [None]:
model_iteration.step_build_model += vect_model

# Log a Model With Vertex AI using autolog & Vectice
This section outlines how to take advantage of Vertex AI autologging together with Vectice. It is for illustration purposes only.

In [None]:
aiplatform.init(
    experiment="tutorial-ridge",
    project="tries-and-spikes",
    location="us-central1",
)

aiplatform.autolog()

# Your model training code goes here
OHE = OneHotEncoder(handle_unknown='ignore')
scaler = StandardScaler()

cat_cols = ['Ship Mode', 'Segment', 'Country', 'City', 'State', 'Postal Code','Region', 'Category', 'Sub-Category']
num_cols = ['Quantity', 'Discount', 'Profit']

transformer = ColumnTransformer([('cat_cols', OHE, cat_cols),
                                ('num_cols', scaler, num_cols)])

model = make_pipeline(transformer,Ridge())
model.fit(X_train,y_train)

y_train_pred = model.predict(X_train)
mae_train=mean_absolute_error(y_train, y_train_pred)
y_test_pred = model.predict(X_test)
mae_test = mean_absolute_error(y_test, y_test_pred)

# Insert Vectice to capture
vect_model = vectice.Model(library="scikit-learn", 
                            technique="Ridge Regression",
                            metrics={"mae_train": round(mae_train,2), "mae_test": round(mae_test,2)}, 
                            properties={key: str(val) for key, val in model.named_steps.items()},
                            predictor=model,                        # Pass your model as a predictor to save it as a pickle file
                            derived_from=modeling_dataset,          # Pass your modeling dataset to document the lineage
                            attachments="Feature Importance.png")   # Pass your Feature Importance graph as an attachment

aiplatform.autolog(disable=True)

In [None]:
# You can get the run and experiment names from the logging output. For example, the line
# `Associating projects/599225543291/locations/us-central1/metadataStores/default/contexts/tutorial-ridge-sklearn-2023-06-29-09-55-07-3acb6 to Experiment: tutorial-ridge`
# gives Experiment name = `tutorial-ridge` & Run name = `sklearn-2023-06-29-09-55-07-3acb6`. Use the names from your own output below.
autolog_exp = aiplatform.ExperimentRun("sklearn-2023-07-18-07-10-16-0fae2", "tutorial-ridge")
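
If you would rather not copy the names by hand, the log line can be parsed. The sketch below assumes the format of the `Associating ...` line quoted in the comment above, where the context name is the experiment name, a hyphen, and the run name. `parse_autolog_line` is a hypothetical helper, and the log format may differ between SDK versions.

```python
import re

# Assumed format, taken from the example log line in this notebook.
CONTEXT_PATTERN = re.compile(
    r"contexts/(?P<context>\S+) to Experiment: (?P<experiment>\S+)"
)


def parse_autolog_line(line):
    """Return (experiment_name, run_name) from an 'Associating ...' log line."""
    match = CONTEXT_PATTERN.search(line)
    if match is None:
        raise ValueError("Not an 'Associating ... to Experiment' line")
    experiment = match.group("experiment")
    context = match.group("context")
    prefix = experiment + "-"
    if not context.startswith(prefix):
        raise ValueError("Context name does not start with the experiment name")
    return experiment, context[len(prefix):]


line = (
    "Associating projects/599225543291/locations/us-central1/metadataStores/default/"
    "contexts/tutorial-ridge-sklearn-2023-06-29-09-55-07-3acb6 to Experiment: tutorial-ridge"
)
print(parse_autolog_line(line))
# ('tutorial-ridge', 'sklearn-2023-06-29-09-55-07-3acb6')
```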

In [None]:
# Retrieve the predictor
autolog_model = aiplatform.get_experiment_model(artifact_id="tutorial-ridge-sklearn-2023-07-18-07-10-16-0fae2-tb-run")

In [None]:
autolog_metrics = autolog_exp.get_metrics()
autolog_params = autolog_exp.get_params()

In [None]:
# Pass the metrics, parameters and model to the `vectice.Model`
# You would then assign this model to a step as seen in the above examples
vect_model = vectice.Model(library="scikit-learn", 
                            technique="Ridge Regression",
                            metrics=autolog_metrics, 
                            properties=autolog_params, 
                            predictor=model,                        # Pass your model as a predictor to save it as a pickle file
                            derived_from=modeling_dataset,          # Pass your modeling dataset to document the lineage
                            attachments="Feature Importance.png")   # Pass your Feature Importance graph as an attachment

## Add a comment 

Passing a `string` to a step will add a comment.

In [None]:
# Add a comment to the model validation step
model_iteration.step_model_validation = f"Model passed acceptance criteria\nMAE Train: {round(mae_train,2)}\nMAE Test: {round(mae_test,2)}"
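
The comment above states that the model passed, but the notebook does not check any criterion. One way to make the statement follow from the numbers is to derive it from a comparison. The sketch below uses a hypothetical criterion (test MAE lower than the baseline MAE) and a hypothetical helper `validation_comment`; your project's acceptance criteria may differ.

```python
def validation_comment(mae_train, mae_test, baseline_mae):
    """Build a step comment; 'passed' here means beating the baseline on test data."""
    passed = mae_test < baseline_mae
    verdict = "passed" if passed else "did not pass"
    return (
        f"Model {verdict} acceptance criteria\n"
        f"MAE Train: {round(mae_train, 2)}\n"
        f"MAE Test: {round(mae_test, 2)}"
    )


# Made-up values, for illustration only.
print(validation_comment(mae_train=1.2, mae_test=2.5, baseline_mae=3.0))

# Usage in this notebook (assumes the variables defined in earlier cells):
# model_iteration.step_model_validation = validation_comment(mae_train, mae_test, baseline_mae)
```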

In [None]:
model_iteration.complete()

## 🥇 Congrats! You have learned how to successfully use Vectice to auto-document the Modeling phase of the Tutorial Project.
### Next, we encourage you to explore other notebooks in the tutorial series. You can find those notebooks in the Vectice public GitHub repository: https://github.com/vectice/GettingStarted/

✴ You can view your registered assets and comments in the UI by clicking the links in the output messages.