## JupyterHub: Python notebook

In this notebook we walk through an example workflow in Python, using the classic MNIST deep learning example. We use a locally saved copy of the dataset to show that you can read and write data in your own folders on VSC_DATA.

### Loading and adding packages

First, we load the necessary packages:

In [None]:
import tensorflow as tf
import datetime, os
import pickle

Since we want to use a GPU, we first check whether the requested GPU is actually available:

In [None]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
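If you would rather see which devices were found than only how many, the check can be wrapped in a small helper. This is a minimal sketch and not part of the original notebook: the function name `available_gpus` is hypothetical, and the import is guarded only so that the snippet also runs in a kernel where TensorFlow is missing.

```python
def available_gpus():
    """Return the names of the GPUs TensorFlow can see (empty list if none)."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this kernel, so no GPU can be used.
        return []
    return [device.name for device in tf.config.list_physical_devices('GPU')]


gpus = available_gpus()
if gpus:
    print("Training can use:", ", ".join(gpus))
else:
    print("No GPU visible; TensorFlow will run on the CPU.")
```

An empty list usually means either that no GPU was requested for the session or that the installed TensorFlow build cannot find one.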

### Getting data from your VSC_DATA folder

If you need to access a dataset on the cluster, store it in your VSC_DATA folder, as this is the standard location for JupyterHub. Reading the data works exactly as it does anywhere else. In this case the dataset is stored as a pickle file:

In [None]:
# Build the path instead of changing directory, so the cell can be re-run safely
with open(os.path.join('py_notebook_ex', 'mnist.pickle'), 'rb') as data:
    mnist_dataset = pickle.load(data)
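Relative paths depend on the directory the notebook was started in. A sketch of an alternative is to build absolute paths from the `VSC_DATA` environment variable. It assumes that this variable is set in the notebook session and that `py_notebook_ex` sits directly below it; the helper names `data_path`, `load_pickle` and `save_pickle` are hypothetical.

```python
import os
import pickle
from pathlib import Path


def data_path(*parts):
    """Build an absolute path below $VSC_DATA (assumed to be set on the cluster)."""
    # Fall back to the home directory when VSC_DATA is not defined,
    # for example when trying the notebook outside the cluster.
    base = Path(os.environ.get('VSC_DATA', str(Path.home())))
    return base.joinpath(*parts)


def load_pickle(path):
    """Read one pickled object from `path`."""
    with open(path, 'rb') as handle:
        return pickle.load(handle)


def save_pickle(obj, path):
    """Write `obj` to `path`, creating missing parent folders first."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, 'wb') as handle:
        pickle.dump(obj, handle)


# Possible usage in this notebook (commented out because it needs the real file):
# mnist_dataset = load_pickle(data_path('py_notebook_ex', 'mnist.pickle'))
```

The same helpers cover writing: `save_pickle(results, data_path('py_notebook_ex', 'results.pickle'))` would store an object next to the dataset.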

Data exploration is just as straightforward. Plotting in the notebook works as easily as in other Python IDEs. For example, we could use matplotlib to plot the first training image.

Since matplotlib is not installed yet, we use conda to install it. Jupyter lets you do this from within the notebook: just run the standard 'conda install' command:

In [None]:
conda install matplotlib

As the warning above mentions, you need to restart your kernel before the package becomes available. If we then list the installed packages, we see that matplotlib is included:

In [None]:
conda list

All conda commands can be run from a notebook cell. Some things you need to be aware of:
 - You can only run one conda command per cell.
 - Don't mix conda usage in a terminal with conda usage in the notebook. This can cause strange behaviour and even corrupt your kernel. Either install the packages in the terminal before launching the kernel, or install them here and restart the kernel afterwards.
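Before running an install, it can be useful to check whether a package is already importable from the running kernel. The snippet below is a small sketch using only the standard library; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module `module_name` can be imported."""
    return importlib.util.find_spec(module_name) is not None


for name in ('matplotlib', 'pickle'):
    status = 'available' if is_installed(name) else 'missing'
    print(f'{name}: {status}')
```

If a package is reported as missing right after installing it, restarting the kernel is the first thing to try.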

Now that it is installed, we can import matplotlib:

In [None]:
import matplotlib.pyplot as plt

### Training and testing the model

Here we unpack the training and test sets, scale the pixel values by dividing by 255, plot the first training example, and then define the model and loss function.

In [None]:
(x_train, y_train), (x_test, y_test) = mnist_dataset
x_train, x_test = x_train / 255.0, x_test / 255.0

In [None]:
plt.imshow(x_train[0], cmap='gray')

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(10)
])

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
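The last `Dense(10)` layer has no activation, so the model outputs raw scores (logits) rather than probabilities. Setting `from_logits=True` tells the loss to apply the softmax itself. The pure-Python sketch below illustrates that calculation for a single example. It is meant as an explanation of the idea, not as the Keras implementation, and the function names are chosen here for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example, given logits and an integer class label."""
    return -math.log(softmax(logits)[label])


# Ten identical logits mean every digit gets probability 0.1,
# so the loss equals ln(10), roughly 2.30.
uniform_logits = [0.0] * 10
loss = sparse_categorical_crossentropy(uniform_logits, label=3)
print(round(loss, 4))
```

A freshly initialised model typically starts close to this value on MNIST, which makes it a handy sanity check for the first epoch.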

The model can now be compiled and trained:

In [None]:
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

# Train for 5 epochs; the test set doubles as validation data here
model.fit(x_train,
          y_train,
          epochs=5,
          validation_data=(x_test, y_test))
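`model.fit` returns a `History` object whose `history` attribute is a dictionary of per-epoch metrics. If you assign the return value to a variable, for example `history = model.fit(...)`, the metrics can be written to disk for later inspection. The sketch below assumes that setup. The helper name `save_history` is hypothetical, the numbers are placeholders rather than real results, and the temporary directory stands in for a folder on VSC_DATA.

```python
import json
import tempfile
from pathlib import Path


def save_history(history_dict, path):
    """Write per-epoch metrics (a dict of lists) to a JSON file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Convert every value to a plain float so that json can serialise it.
    clean = {name: [float(v) for v in values]
             for name, values in history_dict.items()}
    with open(path, 'w') as handle:
        json.dump(clean, handle, indent=2)
    return path


# Placeholder values standing in for `history.history`; not real training output.
placeholder_history = {'loss': [1.0, 0.5], 'accuracy': [0.5, 0.75]}
out_file = save_history(
    placeholder_history,
    Path(tempfile.gettempdir()) / 'py_notebook_ex' / 'history.json',
)
print('Saved training history to', out_file)
```

In the notebook itself you would pass `history.history` and a path inside your own data folder instead.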