# Before you start with this Data Understanding Notebook

This notebook is part of the Vectice tutorial project notebook series. It illustrates how to log the assets documented in the "Data Understanding" phase of the **"Tutorial: Forecast in store-unit sales"** project you can find in your personal Vectice workspace.

### Pre-requisites:
Before using this notebook you will need:
* An account in Vectice
* An API token to connect to Vectice through the APIs
* The Phase Id of the project where you want to log your work

Refer to the Vectice Tutorial Guide for more detailed instructions: https://docs.vectice.com/getting-started/tutorial


### Other Resources
*   Vectice Documentation: https://docs.vectice.com/
*   Vectice API documentation: https://api-docs.vectice.com/

In [1]:
!pip install mlflow==2.5.0

ERROR: Ignored the following versions that require a different python version: 2.0.0 Requires-Python >=3.8; 2.0.0rc0 Requires-Python >=3.8; 2.0.1 Requires-Python >=3.8; 2.1.0 Requires-Python >=3.8; 2.1.1 Requires-Python >=3.8; 2.2.0 Requires-Python >=3.8; 2.2.1 Requires-Python >=3.8; 2.2.2 Requires-Python >=3.8; 2.3.0 Requires-Python >=3.8; 2.3.1 Requires-Python >=3.8; 2.3.2 Requires-Python >=3.8; 2.4.0 Requires-Python >=3.8; 2.4.1 Requires-Python >=3.8; 2.4.2 Requires-Python >=3.8; 2.5.0 Requires-Python >=3.8
ERROR: Could not find a version that satisfies the requirement mlflow==2.5.0 (from versions: 0.0.1, 0.1.0, 0.2.0, 0.2.1, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.5.0, 0.5.1, 0.5.2, 0.6.0, 0.7.0, 0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.9.0.1, 0.9.1, 1.0.0, 1.1.0, 1.1.1.dev0, 1.2.0, 1.3.0, 1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.7.1, 1.7.2, 1.8.0, 1.9.0, 1.9.1, 1.10.0, 1.11.0, 1.12.0, 1.12.1, 1.13, 1.13.1, 1.14.0, 1.14.1, 1.15.0, 1.16.0, 1.17.0, 1.18.0, 1.19.0, 1.20.0, 1.20.1, 1.20.2,

In [None]:
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Install the latest Vectice Python client library

In [None]:
%pip install -q -U vectice

## Get started by connecting to Vectice

In [None]:
import vectice

vec = vectice.connect(api_token="my-api-token")  # Paste your API token
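
Pasting a token directly into a notebook makes it easy to leak when the notebook is shared. As a minimal sketch, the token could instead be read from an environment variable before connecting. The helper `read_api_token` and the variable name `VECTICE_API_TOKEN` are assumptions made for this illustration, not something the tutorial defines.

```python
import os


def read_api_token(env=None, var_name="VECTICE_API_TOKEN"):
    """Return the token stored under `var_name`, or raise if it is missing.

    `var_name` is a hypothetical name chosen for this sketch.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} before connecting to Vectice")
    return token


# Intended use in the notebook (left commented out so the sketch runs
# without a Vectice account):
# vec = vectice.connect(api_token=read_api_token())
```

Passing a plain dictionary as `env` makes the helper easy to check without touching the real environment.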

## Specify which project phase you want to document
In the Vectice UI, navigate to your personal workspace. Inside your default Tutorial project, go to the Data Understanding phase, then copy and paste your Phase Id below.

In [None]:
phase = vec.phase("PHA-xxxx")  # Paste your own Data Understanding Phase ID

## Next we are going to create an iteration
An iteration allows you to organize your work in repeatable sequences of steps. You can have multiple iterations within a phase.

In [None]:
iteration = phase.create_iteration()

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/vectice/GettingStarted/main/23.2/tutorial/SampleSuperstore.csv", converters = {'Postal Code': str})
df.to_csv("SampleSuperstore.csv", index=False)
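
The `converters` argument in the cell above keeps `Postal Code` as text. A minimal sketch of why that matters, using a made-up two-row CSV rather than the tutorial data: without the converter, pandas infers an integer column and any leading zero is lost.

```python
import io

import pandas as pd

# Hypothetical sample data, not taken from SampleSuperstore.csv
csv_text = "Postal Code,Sales\n02138,10.5\n90210,20.0\n"

# Default parsing infers an integer column, so "02138" becomes 2138
default_df = pd.read_csv(io.StringIO(csv_text))

# With a converter the values stay as the original strings
string_df = pd.read_csv(io.StringIO(csv_text), converters={"Postal Code": str})

print(default_df["Postal Code"].tolist())
print(string_df["Postal Code"].tolist())
```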

## Log a dataset
Use the following code block to create an origin dataset from the local CSV file:

In [None]:
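# Wrap the local CSV file and its dataframe as a Vectice resource
origin_ds = vectice.FileResource(paths="SampleSuperstore.csv", dataframes=df)

# Declare that resource as the origin dataset
origin_dataset = vectice.Dataset.origin(
    name="ProductSales Origin",
    resource=origin_ds,
)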

In [None]:
iteration.step_collect_initial_data = origin_dataset

In [None]:
iteration.step_describe_data = str(df.columns.values)
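
The cell above stores only the column names. As a sketch, assuming the step accepts any string (as that cell suggests), a slightly richer description could list each column's dtype and missing-value count. The helper `describe_columns` and the sample frame are hypothetical and not part of the Vectice API.

```python
import pandas as pd


def describe_columns(frame):
    """Build a one-line-per-column text summary of a dataframe."""
    lines = []
    for name in frame.columns:
        column = frame[name]
        missing = int(column.isna().sum())
        lines.append(f"{name}: dtype={column.dtype}, missing={missing}")
    return "\n".join(lines)


# Hypothetical sample frame standing in for the tutorial dataframe
sample = pd.DataFrame({"Sales": [10.5, None], "Quantity": [1, 2]})
summary = describe_columns(sample)
print(summary)

# In the notebook this string could be assigned in place of the column list:
# iteration.step_describe_data = describe_columns(df)
```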

In [None]:
# Checking for multicollinearity among the numeric columns other than Sales
corr_matrix = df.select_dtypes("number").drop("Sales", axis=1).corr()
sns.heatmap(corr_matrix)
plt.savefig("corr_matrix.png")
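
A heatmap shows multicollinearity visually. As a complement, here is a minimal sketch that lists the column pairs whose absolute correlation exceeds a threshold. The helper `high_corr_pairs`, the 0.8 threshold, and the toy data are assumptions for illustration, not part of the tutorial.

```python
import pandas as pd


def high_corr_pairs(corr, threshold=0.8):
    """Return (left, right, value) for each column pair with |corr| >= threshold."""
    columns = list(corr.columns)
    pairs = []
    for i, left in enumerate(columns):
        # Only look at columns after `left` so each pair appears once
        for right in columns[i + 1:]:
            value = float(corr.loc[left, right])
            if abs(value) >= threshold:
                pairs.append((left, right, value))
    return pairs


# Toy data: "b" is an exact multiple of "a", "c" is only weakly related
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})
pairs = high_corr_pairs(toy.corr(), threshold=0.8)
print(pairs)
```

In the notebook, the same helper could be applied to `corr_matrix` to get a text summary alongside the saved image.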

## Log graphs

You can add multiple items to a single step by using the `+=` operator.

In [None]:
iteration.step_explore_data += "corr_matrix.png"

In [None]:
# Checking for outliers (histplot replaces the deprecated sns.distplot)
sns.histplot(df["Quantity"], kde=True)
plt.savefig("Quantity.png")
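
A distribution plot gives a visual impression of outliers. As a sketch of a numeric check, the common 1.5 × IQR rule flags values far outside the middle half of the data. This rule is a general convention rather than something the tutorial prescribes, and the helper `iqr_outliers` and the sample values are hypothetical.

```python
import pandas as pd


def iqr_outliers(values, factor=1.5):
    """Return the values lying more than `factor` * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return values[(values < lower) | (values > upper)]


# Hypothetical quantities with one unusually large order
quantities = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 5, 14])
outliers = iqr_outliers(quantities)
print(outliers.tolist())
```

In the notebook, `iqr_outliers(df["Quantity"])` would apply the same check to the tutorial data.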

In [None]:
iteration.step_explore_data += "Quantity.png"

In [None]:
iteration.complete()

## 🥇 Congrats! You learned how to successfully use Vectice to auto-document the Data Understanding phase of the Tutorial Project.<br>
### Next, we encourage you to explore the other notebooks in the tutorial series. You can find those notebooks in the Vectice public GitHub repository: https://github.com/vectice/GettingStarted/