# Getting Started with Whisk
Whisk makes it easy to create reproducible, collaborative machine learning projects. It provides the project guardrails so you can focus on the data science. Here's what you need to know to get started.

## Virtual Environment
This project comes pre-loaded with a virtual environment named `venv` and an IPython kernel named `bike_image_classifier_tensorflow`. This notebook uses the `bike_image_classifier_tensorflow` kernel. To activate the venv, run `source venv/bin/activate` in the terminal.

Dependencies are listed in the `requirements.txt` file. Add your dependencies to this file and run `pip install -r requirements.txt` to update your environment.

In [None]:
%cat ../requirements.txt
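After running `pip install -r requirements.txt`, you may want to confirm that the environment actually matches the file. The helper below is a minimal sketch and not part of Whisk: `check_requirements()` is a hypothetical name, it needs Python 3.8+ for `importlib.metadata`, it only verifies exact `name==version` pins, and it reports other specifiers (such as `>=`) as unchecked.

```python
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from pathlib import Path


def check_requirements(path):
    """Compare exact `name==version` pins in a requirements file with installed packages.

    Hypothetical helper. Returns a dict mapping each requirement to "ok",
    "missing", "mismatch (installed X)", or "unchecked" (no exact `==` pin).
    """
    results = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in requirement:
            results[requirement] = "unchecked"
            continue
        name, pinned = (part.strip() for part in requirement.split("==", 1))
        name = name.split("[", 1)[0]  # drop extras such as package[extra]
        try:
            installed = version(name)
        except PackageNotFoundError:
            results[name] = "missing"
            continue
        # Plain string comparison: "1.0" and "1.0.0" count as a mismatch.
        results[name] = "ok" if installed == pinned else f"mismatch (installed {installed})"
    return results
```

In this notebook you could call `check_requirements("../requirements.txt")` and look for any entry that isn't `"ok"`.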

## Loading code from Python files
When your notebook goes beyond exploratory work, it's a good practice to move your functions and classes to Python files. These are easier to maintain than notebook cells.

The cell below ensures that your notebook always runs the latest version of the code in the `src` directory.

In [3]:
# Load the "autoreload" extension, which reloads modules before executing code.
# There's no need to restart the Jupyter kernel if you modify code in the `src` directory.
# https://ipython.org/ipython-doc/3/config/extensions/autoreload.html
%load_ext autoreload

# OPTIONAL: always reload modules so that as you change code in src, it gets loaded
%autoreload 2


For example, `src/bike_image_classifier_tensorflow/data/extract.py` contains a sample function named `extract_example()`. You can call this function from this notebook:

In [1]:
from bike_image_classifier_tensorflow.data.extract import extract_example
extract_example()

Extracting lots of data ... done.
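The autoreload extension automates something you can also do by hand with `importlib.reload`. The sketch below is for illustration only: it creates a throwaway module (the name `demo_mod` is hypothetical) in a temporary directory, edits its source, and reloads it. One difference to keep in mind: `importlib.reload` only refreshes the module object, so names brought in with `from module import name` keep pointing at the old objects, whereas autoreload also attempts to upgrade such imported functions and classes.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name `demo_mod`) to a temporary directory.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_mod.py"
module_file.write_text("def greet():\n    return 'before edit'\n")
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # make the import system notice the new file

demo_mod = importlib.import_module("demo_mod")
print(demo_mod.greet())  # before edit

# Edit the source, as you would when changing code under src/.
# (The new source has a different size, so a cached .pyc is not reused.)
module_file.write_text("def greet():\n    return 'after edit'\n")
importlib.reload(demo_mod)  # roughly what autoreload does for changed modules before a cell runs
print(demo_mod.greet())  # after edit
```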


## Accessing the data directory
Training data should be in version control alongside your code to ensure experiments are reproducible. Smaller training sets are fine to store directly in Git. For larger training sets, use DVC (Data Version Control), which comes pre-installed.

Place your data inside the project's data directory. You can obtain the path to this directory like this:

In [None]:
import whisk
whisk.data_dir
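Once you know where the data directory is, you can locate your files with `pathlib`. This sketch assumes `whisk.data_dir` behaves like a `pathlib.Path` (the artifacts example joins `whisk.artifacts_dir / "model.pkl"` the same way); the helper name `list_data_files()` and the `.jpg` pattern in the usage note are hypothetical.

```python
from pathlib import Path


def list_data_files(data_dir, pattern="**/*"):
    """Return files under `data_dir` that match a glob `pattern`, as sorted relative paths.

    Hypothetical helper; "**" matches the top-level directory and all subdirectories.
    """
    data_dir = Path(data_dir)
    return sorted(
        p.relative_to(data_dir).as_posix()
        for p in data_dir.glob(pattern)
        if p.is_file()  # skip directories returned by the glob
    )
```

For example, `list_data_files(whisk.data_dir, "**/*.jpg")` would list every JPEG image stored under the data directory, assuming your images use that extension.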

## Saving models to the artifacts directory

After training a model, save it to disk so you can invoke it later. How you save a model depends on your ML framework: scikit-learn models are commonly serialized with pickle, PyTorch models with `torch.save()`, and TensorFlow Keras models with their own `save()` method.

Regardless of your ML framework, save your model and any artifacts required for pre/post-processing to the artifacts directory. Saving a model looks like this:

In [None]:
import whisk
from whisk.model_stub import ModelStub # A fake model
# This example uses pickle to serialize a Python object. 
# Use the preferred serialization approach for your ML framework.
import pickle

model = ModelStub()
file_path = whisk.artifacts_dir / "model.pkl"
# Use a context manager so the file is flushed and closed after writing.
with open(file_path, "wb") as f:
    pickle.dump(model, f)
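Pre/post-processing settings can live in the artifacts directory next to the model. A minimal sketch, assuming the settings are plain JSON-serializable values: the function names, the `preprocessing.json` filename, and the example settings (image size, class names) are all hypothetical, so replace them with whatever your model actually needs.

```python
import json
from pathlib import Path


def save_preprocessing_config(artifacts_dir, config, filename="preprocessing.json"):
    """Write pre/post-processing settings as JSON next to the saved model (hypothetical helper)."""
    path = Path(artifacts_dir) / filename
    path.write_text(json.dumps(config, indent=2))
    return path


def load_preprocessing_config(artifacts_dir, filename="preprocessing.json"):
    """Read back the settings written by save_preprocessing_config()."""
    return json.loads((Path(artifacts_dir) / filename).read_text())


# Hypothetical settings for an image classifier; replace with your own.
config = {"image_size": [224, 224], "class_names": ["mountain_bike", "road_bike"]}
```

In this project you would call `save_preprocessing_config(whisk.artifacts_dir, config)` after training and `load_preprocessing_config(whisk.artifacts_dir)` wherever the model is loaded. JSON keeps the settings human-readable and independent of the ML framework.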

## Invoking a saved model

This project includes a sample `Model` class (in the `bike_image_classifier_tensorflow.models.model` module) that loads a model from disk and generates predictions. Find this class in the `src/bike_image_classifier_tensorflow/models/model.py` file. You can invoke the model like this:

In [None]:
from bike_image_classifier_tensorflow.models.model import Model
model = Model()
model.predict([[1]])

Update `src/bike_image_classifier_tensorflow/models/model.py` to handle loading and pre/post-processing for your own model.
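As a rough sketch of the load-once, predict-many pattern such a class can follow, the hypothetical `PickledModel` below loads the `model.pkl` file saved earlier and wraps `predict()` with pre- and post-processing hooks. It is not the project's actual `Model` class; it assumes the pickled object exposes a `predict()` method, and for a TensorFlow model you would swap the pickle load for your framework's own loader.

```python
import pickle
from pathlib import Path


class PickledModel:
    """Hypothetical stand-in for a Model class: load once, then predict many times."""

    def __init__(self, artifacts_dir, filename="model.pkl"):
        # Load the serialized model a single time, when the wrapper is created.
        with open(Path(artifacts_dir) / filename, "rb") as f:
            self.model = pickle.load(f)

    def preprocess(self, data):
        return data  # placeholder: e.g. resize or normalize images here

    def postprocess(self, predictions):
        return predictions  # placeholder: e.g. map class indices to labels here

    def predict(self, data):
        return self.postprocess(self.model.predict(self.preprocess(data)))
```

With the stub model saved above, `PickledModel(whisk.artifacts_dir).predict([[1]])` would mirror the sample call. Keeping loading in `__init__` avoids reading the file from disk on every prediction.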