# Before You Start This Tutorial

**Tutorial goal:** Walk through one iteration of a modeling phase, and its related steps, so that you can:

*   Experience the data science lifecycle using Vectice
*   See how simple it is to connect your notebook to Vectice
*   Learn how to structure and log your work using Vectice

**Resources needed:**
*   **Tutorial Project: Forecast in-store unit sales (23.2)** - You can find it in your personal workspace

**Other resources:**
*   Vectice Webapp Documentation: https://docs.vectice.com/
*   Vectice API documentation: https://api-docs.vectice.com/

# 1. Getting Started         

**First, we need to install Vectice and authenticate with the Vectice server. Before proceeding further:**
*   Visit the Vectice app (https://app.vectice.com/account/api-keys) to create and download an API token, and name the file "My Token" (the cell below expects "My Token.json")
*   Upload the file to Colab by clicking the "folder" icon in the left-hand taskbar and selecting "Upload to Session Storage"

#### Then execute the cell below:

In [None]:
%pip install -q -U vectice
import vectice as vct

vec = vct.connect(config="My Token.json")

#### You have successfully installed Vectice in your notebook and connected to your instance. 
#### Wasn't that easy?
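
#### If the connection does not work, one thing worth checking is the token file itself. The sketch below is not part of the Vectice API: `check_token_file` is a hypothetical helper that uses only the Python standard library to confirm that the file exists and parses as JSON. It assumes the file is named "My Token.json", as in the cell above, and makes no assumption about the keys inside it.

```python
import json
from pathlib import Path


def check_token_file(path="My Token.json"):
    """Return True if `path` exists and parses as JSON (hypothetical helper)."""
    token_path = Path(path)
    if not token_path.is_file():
        print(f"Token file not found: {token_path}")
        return False
    try:
        json.loads(token_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        print(f"Token file is not valid JSON: {err}")
        return False
    return True


# Run the check before connecting to Vectice with the same file name
token_ok = check_token_file()
```

#### If `token_ok` is `False`, upload the file again before retrying the connection.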

# 2. Navigating the Vectice API

#### Next, navigate to your personal workspace by executing the cell below.

In [None]:
active_wksp = vec.my_workspace # Your personal workspace can be accessed with the my_workspace property

#### Let's list all the projects it contains, execute:

In [None]:
active_wksp.list_projects()

#### Make a note of the Tutorial project ID and Name
#### Before we continue, let’s take a moment to look at how API documentation works in Vectice. Execute:

In [None]:
help(active_wksp.project) # you can call help on any method in Vectice

#### to display the documentation for the “project()” method.
#### In the Vectice API, you can call “help” on any method to see how that method is used.
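
#### `help` is a Python built-in that reads an object's signature and docstring, so the same information is available programmatically through the standard `inspect` module. A minimal sketch, using a hypothetical `phase_label` function that is not part of Vectice:

```python
import inspect


def phase_label(name, status):
    """Return a one-line label such as 'Modeling [In Progress]'."""
    return f"{name} [{status}]"


# help(phase_label) prints the same two pieces of information interactively
print(inspect.signature(phase_label))  # (name, status)
print(inspect.getdoc(phase_label))
```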


#### Now that you’ve familiarized yourself with the help method, let’s keep moving. Using the Name or ID of the tutorial project, execute


In [None]:
active_proj = active_wksp.project("Tutorial Project: Forecast in store unit sales (23.2)") # pass the name or ID of your tutorial project

#### to create a pointer to the tutorial project. 
#### For this tutorial scenario, we’ve created a project template with 6 different project phases for you, some with specific steps we need to complete.

#### Go ahead and execute

In [None]:
active_proj.list_phases()

#### to retrieve a list of phases and their statuses.
#### As you can see, the first 3 phases have been completed for you. 
#### In this tutorial, we are going to learn how to document a new model in Vectice.
#### Go ahead and execute:

In [None]:
active_phase = active_proj.phase("Modeling") # pass the name of the phase

#### to get a handle on the modeling phase.

#### Before we continue, let’s pause briefly. 
#### You may be thinking that it feels like a lot of work to get this far - and it has been.
#### But simply note that (knowing what we do now) we can get to the very same place with a single instruction by using the ID of the phase displayed in the UI or the cells above.
#### Go ahead, try it!

In [None]:
active_phase = vec.phase("PHA-5629") # Replace with your own phase-ID for the Modeling phase

#### Much simpler, no?
#### You can skip the navigation entirely by passing the ID of the project, phase, etc. directly.

#### To get more information as to what this phase requires, execute:

In [None]:
active_phase.list_steps()

#### Great! We now know what needs to be completed within this phase.

#### Let's start a new iteration of the phase
#### Execute:

In [None]:
active_iter = active_phase.create_iteration()

active_iter.list_steps()

#### We now have an iteration containing the steps that were defined for us

#### Okay, it’s modelin’ time.

# 3. A Simple Modeling Exercise

### 3.1 Logging a Simple Text-Only Message
#### The first step, as described above, calls for describing the modeling technique we will use in this iteration of the model. Execute

In [None]:
active_iter.step_select_modeling_techniques = "For this first iteration we are going to use a Linear Regression model to get a base model."

#### to log a short description of our work. This completes the step.

### 3.2 Logging a Text Message with Embedded Variables

#### For our next step, it looks like our modeling overlords would like us to split our dataset into training, testing and validation datasets, and log some basic information about the split.

#### Let's get the dataset we need to split. Lucky for us, the Vectice elves have left us a clean dataset, created as part of the “Data Preparation” phase.
#### Execute the cell below to download it locally.

In [None]:
!wget https://vectice-examples.s3.us-west-1.amazonaws.com/Tutorial/ForecastTutorial/original_clean.csv -q --no-check-certificate

#### Alright - it’s time to split this baby up.

#### Since we’re about to do some modeling work, we need to load a few analytics libraries and packages. 
#### Execute the following boilerplate code, which is completely independent of Vectice.
#### Don’t worry about understanding the following cell in great detail - suffice to say we are simply retrieving the dataset we saved above, splitting it into two (for training and testing) and saving the split datasets locally.


In [None]:
# NOTE: this cell is boilerplate data science code
# there is no Vectice code below

# here, we import essential analytics libraries and read our dataset,
# before splitting it into 2 files (for training and testing)
# that we then save locally

# import some essential math libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.offline as py
import IPython.display
%matplotlib inline
py.init_notebook_mode(connected=True)


# load scikit-learn modeling packages
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error,mean_absolute_error

# read the dataset
df_model = pd.read_csv("original_clean.csv")

# specify how much of the dataset to set aside for testing
test_size = 0.30
# specify a seed value so we can always generate the same split
random_state = 42

# Generate df_train, df_test, which we will need for modeling
df_train, df_test = train_test_split(df_model, test_size = test_size, random_state = random_state)

# save the 2 split datasets locally
df_train.to_csv("traindataset.csv")
df_test.to_csv("testdataset.csv")
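
#### The seed is what makes the split repeatable. The sketch below illustrates the idea with Python's standard `random` module on a made-up range of row indices. It is not scikit-learn's implementation, and `split_indices` is a name invented for this example.

```python
import random


def split_indices(n_rows, test_size, seed):
    """Shuffle row indices with a seeded generator and split them in two."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # same seed, same shuffle order
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_a, test_a = split_indices(n_rows=10, test_size=0.30, seed=42)
train_b, test_b = split_indices(n_rows=10, test_size=0.30, seed=42)

print(len(train_a), len(test_a))  # 7 3
print(test_a == test_b)           # True: the split is reproducible
```

#### Changing the seed gives a different, but equally repeatable, split.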

#### The next thing we should do is log the datasets we used, so we know where these numbers came from. Go ahead and execute

In [None]:
train_ds = vct.FileResource(paths="traindataset.csv")
test_ds = vct.FileResource(paths="testdataset.csv")


dataset = vct.Dataset.modeling(
    name="my modeling dataset",
    training_resource=train_ds,
    testing_resource=test_ds
)
active_iter.step_generate_test_design += dataset

#### to package our 2 datasets with their essential metadata, and log them in Vectice.

#### As before, let’s log our work as a message in Vectice, so we keep a trace of the work we did.

In [None]:
# First let's build our message
msg = f"We split the dataset into training and testing datasets. "\
      f"{test_size * 100}% of the data is set aside for testing.\n "\
      f"- Training dataset size: {df_train.shape[0]}\n "\
      f"- Testing dataset size: {df_test.shape[0]}\n "\
      f"Our seed to generate repeatable datasets is {random_state}"
active_iter.step_generate_test_design += msg
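
#### The message above is an ordinary Python f-string, so format specifiers can tidy up the numbers before they are logged. A small sketch with made-up row counts (the real values come from `df_train` and `df_test`); the `example_` names are used only so that nothing in the notebook is overwritten:

```python
example_test_size = 0.30
example_n_train, example_n_test = 7000, 3000  # made-up sizes, for illustration only

example_msg = (
    f"{example_test_size:.0%} of the data is set aside for testing.\n"
    f"- Training dataset size: {example_n_train:,}\n"
    f"- Testing dataset size: {example_n_test:,}"
)
print(example_msg)
```

#### Here `.0%` renders the fraction as a whole percentage and `,` adds thousands separators.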

### 3.3 Logging a Model and Associated Datasets

#### We’re on a roll!!
#### As before, we’ve provided you with the following boilerplate code, which is completely independent of Vectice.
#### Don’t worry about understanding the following cell in great detail - all we’re doing is running a linear regression, and outputting summary statistics as well as a nice plot.


In [None]:
# NOTE: this cell is boilerplate data science code
# there is no Vectice code below
X_train, y_train = df_train.drop(['unit_sales'], axis=1), df_train["unit_sales"]
X_test, y_test = df_test.drop(['unit_sales'], axis=1), df_test["unit_sales"]

# here, we are running a linear regression, before outputting some summary
# statistics and a nice plot


# create a linear regression model
model_linreg = LinearRegression()
model_linreg.fit(X_train, y_train)

# evaluate, define and save the RMSE and MAE summary statistics
pred = model_linreg.predict(X_test)    
RMSE = np.sqrt(mean_squared_error(y_test, pred))
MAE = mean_absolute_error(y_test, pred)

# the summary_stats dictionary holds our two key summary statistics
summary_stats = {"RMSE": RMSE, "MAE": MAE}

# finally, generate and save a plot
plt.scatter(X_train.iloc[:,0].values, y_train ,color='g')
plt.plot(X_test, pred,color='k')
plt.savefig("regression_graph.png")
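
#### For reference, RMSE is the square root of the mean squared error and MAE is the mean of the absolute errors. The sketch below computes both by hand on three made-up values, using only the standard library; the cell above relies on scikit-learn for the same quantities.

```python
import math

y_true_demo = [3.0, 5.0, 2.0]  # made-up observed values
y_pred_demo = [2.0, 5.0, 4.0]  # made-up predictions

errors = [t - p for t, p in zip(y_true_demo, y_pred_demo)]  # [1.0, 0.0, -2.0]

mae_demo = sum(abs(e) for e in errors) / len(errors)             # (1 + 0 + 2) / 3
rmse_demo = math.sqrt(sum(e * e for e in errors) / len(errors))  # sqrt(5 / 3)

print(f"MAE = {mae_demo:.3f}, RMSE = {rmse_demo:.3f}")  # MAE = 1.000, RMSE = 1.291
```

#### Because the errors are squared before averaging, RMSE penalizes large misses more heavily than MAE does.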

#### As before, let’s log our work in Vectice, so we keep a trace of what we did.
#### Let's document the model we just generated. Run the following cell:

In [None]:
# Similar to the way we packaged our datasets previously,
# let’s use the 'Model' object to package our model with some of its essential metadata
model = vct.Model(
                name          = "Unit Sales Predictor",
                library       = "scikit-learn",
                technique     = "linear regression",
                metrics       = summary_stats,
                attachments   = "regression_graph.png",
                predictor     = model_linreg,
                derived_from  = [dataset.latest_version_id])

# Next, let's log the model to the step
active_iter.step_build_model += model 

#### Next, execute the following cell

In [None]:
msg = f"The model generated the following metrics: \n"\
      f"RMSE = {summary_stats['RMSE']} and MAE = {summary_stats['MAE']}"
active_iter.step_build_model += msg

#### to log our summary statistics as a simple message.



### 3.4 The Final Step

#### The very last step of the Modeling phase calls for assessing the performance of our model, and reflecting on next steps. 
#### But it’s been a long journey, so feel free to simply execute the code below (which should be familiar to you by now!) and call it a day.


In [None]:
active_iter.step_assess_model = (
    "As expected, the model performs better; however, this is not good enough "
    "and we should try a different method. We recommend trying a Random Forest "
    "as a new iteration to get a base model."
)

# The iteration of the phase completed all the steps needed. Let's mark it as completed
active_iter.complete()
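
#### One detail to watch when a message spans several lines of code: a backslash continuation inside a string literal keeps the indentation of the following line in the text, whereas adjacent string literals inside parentheses are joined exactly as written. A short illustration in plain Python, independent of Vectice:

```python
with_backslash = "this is not good enough \
      and we should try a different method."

with_parentheses = (
    "this is not good enough "
    "and we should try a different method."
)

print(with_backslash)    # contains a run of spaces before "and"
print(with_parentheses)  # single spaces only
```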

# 4. Reviewing Our Work
#### That’s it for the heavy lifting. It’s now time for us to enjoy the fruits of our labor. Go ahead and open up your Vectice workspaces (https://app.vectice.com/workspaces/) so we can inspect what was happening in the background, while we were busy in the notebook.

#### Navigate to your personal workspace, and consult the list of projects. Here, you should see the Tutorial project we’ve been working on together. If all went well, the project card should show you as having last updated the project only a few minutes ago - via API.

![alt text](https://vectice-examples.s3.us-west-1.amazonaws.com/images/Vectice_-_AI_asset_management_and_collaboration_software4.jpg)

#### Inside the Tutorial project, things should also be looking very familiar. These are the phases we interacted with earlier, via the API. As you may recall, the Vectice elves had already completed the first few sections for us. Note, though, that the “Modeling” phase is now “In Progress.” This change happened automatically when we created our first iteration.

#### Feel free to take a moment to poke around the rest of the project and familiarize yourself with the way in which the UI and API both represent the same project data model.

![alt text](https://vectice-examples.s3.us-west-1.amazonaws.com/images/Vectice_-_AI_asset_management_and_collaboration_software3.jpg)

#### If you now pull up the Tutorial project’s “Modeling” phase, and visit the “Iterations” tab, you should be able to see the model iteration you created earlier.

![alt text](https://vectice-examples.s3.us-west-1.amazonaws.com/images/Vectice_-_AI_asset_management_and_collaboration_software2.jpg)

#### Better yet, all the information we logged into the steps is now accessible here. Years from now, you’ll still know exactly what data went into this version of the “Unit Sales Predictor,” who worked on it, what the results were and what you decided to do next. It’s like having a time machine for Data Science.

![alt text](https://vectice-examples.s3.us-west-1.amazonaws.com/images/Vectice_-_AI_asset_management_and_collaboration_software.jpg)

#### That’s all for now! We hope you enjoyed this brief introduction to Vectice. If you have any questions, please get in touch with our support team at support@vectice.com
