# Deep Learning on SLURM

## Session 01 - Preparations

- *Course*: Big Data and Language Technologies
- *Date*: 02.05.2022

In this session, we will prepare to run a deep learning model on our cluster servers, which are managed by SLURM. First, we will introduce the dataset and the model architecture.

In [None]:
#!pip install transformers
#!pip install tensorflow
#!pip install scikit-learn

## Setup

In [None]:
CONFIG = {
    "model": "distilbert-base-uncased",
    "seq_length": 512,
    "num_classes": 20,
    "batch_size": 64,
}

In [None]:
import tensorflow as tf
# Check for TensorFlow GPU access
print(f"TensorFlow has access to the following devices:\n{tf.config.list_physical_devices()}")

# Check TensorFlow version
print(f"TensorFlow version: {tf.__version__}")

## Obtaining Data

**Exercise**: load the 20 Newsgroups (20NG) dataset using the `sklearn.datasets` module.

In [None]:
from sklearn.datasets import fetch_20newsgroups
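
One possible way to approach this exercise is sketched below. It assumes that the full corpus (`subset="all"`) is wanted so that the split can be done by hand afterwards. Stripping headers, footers, and quotes via `remove` is likewise an assumption, not a course requirement, and the helper name `load_20ng` is hypothetical.

```python
from sklearn.datasets import fetch_20newsgroups


def load_20ng(remove=("headers", "footers", "quotes")):
    """Return (texts, labels, label_names) for the 20 Newsgroups corpus.

    The first call downloads the data and caches it locally.
    """
    # Assumption: load train and test parts together and split them later.
    bunch = fetch_20newsgroups(subset="all", remove=remove)
    return bunch.data, bunch.target, bunch.target_names
```

Calling `texts, labels, label_names = load_20ng()` should yield 20 label names, matching `CONFIG["num_classes"]`.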


**Exercise**: split the data into training, validation, and test sets (80/10/10).

In [None]:
from sklearn.model_selection import train_test_split
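
An 80/10/10 split can be obtained with two calls to `train_test_split`: the first holds out 20% of the data, the second halves the held-out part. The sketch below is one way to do this. The helper name `split_80_10_10`, the fixed seed, and the stratification by label are assumptions.

```python
from sklearn.model_selection import train_test_split


def split_80_10_10(texts, labels, seed=42):
    """Split parallel sequences into train/validation/test parts (80/10/10)."""
    # First cut: 80% training, 20% held out.
    x_train, x_rest, y_train, y_rest = train_test_split(
        texts, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    # Second cut: halve the held-out part into validation and test.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=seed
    )
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```

Stratifying keeps the class proportions roughly equal across the three parts; drop the `stratify` arguments if that is not wanted.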


## Preprocessing

**Exercise**: load the pretrained `DistilBertTokenizer` and use it to tokenize the 20NG data.

In [None]:
from transformers import DistilBertTokenizer
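
A minimal sketch of this step follows. The helper names `load_tokenizer` and `tokenize_texts` are hypothetical. Padding and truncating every document to `CONFIG["seq_length"]` and returning NumPy arrays are assumptions chosen so that the result has a fixed shape.

```python
def load_tokenizer(model_name="distilbert-base-uncased"):
    """Fetch the pretrained tokenizer (downloads the vocabulary on first use)."""
    from transformers import DistilBertTokenizer

    return DistilBertTokenizer.from_pretrained(model_name)


def tokenize_texts(tokenizer, texts, seq_length=512):
    """Tokenize a sequence of strings into fixed-length NumPy arrays."""
    return tokenizer(
        list(texts),
        padding="max_length",  # pad short documents up to seq_length
        truncation=True,       # cut long documents down to seq_length
        max_length=seq_length,
        return_tensors="np",
    )
```

With the real tokenizer, the result holds the `input_ids` and `attention_mask` arrays, each of shape `(len(texts), seq_length)`.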


**Exercise**: convert the tokenized dataset into a `tf.data.Dataset`.
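
One way to do the conversion is `tf.data.Dataset.from_tensor_slices`, sketched below. The helper name `to_dataset` is hypothetical. Keying the features by `input_ids` and `attention_mask` is an assumption.

```python
import numpy as np
import tensorflow as tf


def to_dataset(input_ids, attention_mask, labels):
    """Pair each tokenized document with its label in a tf.data.Dataset."""
    features = {
        "input_ids": np.asarray(input_ids, dtype="int32"),
        "attention_mask": np.asarray(attention_mask, dtype="int32"),
    }
    # Slicing along the first axis yields one (features, label) pair per document.
    return tf.data.Dataset.from_tensor_slices((features, np.asarray(labels)))
```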

**Exercise**: shuffle, repeat, and batch the training data; batch the validation and test data.
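
The input pipeline could look like the sketch below. The helper name `batch_dataset`, the shuffle buffer size, and the added `prefetch` call are assumptions.

```python
import tensorflow as tf


def batch_dataset(dataset, batch_size, training=False, shuffle_buffer=10_000):
    """Batch a dataset; shuffle and repeat it first when used for training."""
    if training:
        # Shuffle individual examples, then repeat indefinitely.
        dataset = dataset.shuffle(shuffle_buffer).repeat()
    # Prefetching overlaps data preparation with model execution.
    return dataset.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Because the training dataset then repeats without end, `fit` has to be given `steps_per_epoch` so that Keras knows where an epoch stops.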

## Importing the Pretrained Model

**Exercise**: import `TFDistilBertModel` from Hugging Face `transformers` and load the pretrained model with the name specified in the config.

In [None]:
from transformers import TFDistilBertModel


**Exercise**: define two input layers to feed the `input_ids` and `attention_mask` sequences into the transformer layer.

**Exercise**: extract the hidden representation of the `[CLS]` special token (always the first token of every sequence).
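
The three exercises in this section can be sketched together as follows. The helper names `load_encoder` and `build_cls_features` are hypothetical. The sketch also assumes a TensorFlow and `transformers` combination in which the pretrained model can be called on Keras functional inputs.

```python
import tensorflow as tf


def load_encoder(model_name="distilbert-base-uncased"):
    """Fetch the pretrained encoder (downloads the weights on first use)."""
    from transformers import TFDistilBertModel

    return TFDistilBertModel.from_pretrained(model_name)


def build_cls_features(encoder, seq_length=512):
    """Create the two input layers and return (inputs, [CLS] representation)."""
    input_ids = tf.keras.Input(shape=(seq_length,), dtype="int32", name="input_ids")
    attention_mask = tf.keras.Input(
        shape=(seq_length,), dtype="int32", name="attention_mask"
    )
    # Element 0 of the encoder output is the last hidden state,
    # shaped (batch, seq_length, hidden_size).
    hidden_states = encoder(input_ids, attention_mask=attention_mask)[0]
    # [CLS] is the first token of every sequence.
    cls_features = hidden_states[:, 0, :]
    return [input_ids, attention_mask], cls_features
```

A call such as `inputs, cls_features = build_cls_features(load_encoder(CONFIG["model"]), CONFIG["seq_length"])` would then provide the tensors that the classification head builds on.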

## Defining a Classification Head

**Exercise**: define a classification head with 5 layers: Dropout, Dense, Dropout, Dense, Output. The output layer must have as many units as there are classes (20).

**Exercise**: use the previously defined layer stack to define a model. 
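
A sketch of the head and the resulting model follows. The hidden layer sizes, the dropout rate, the ReLU activations, and the softmax output are assumptions, and `classification_head` is a hypothetical helper name. To keep the block runnable without downloading the encoder, a plain 768-dimensional input (the hidden size of `distilbert-base-uncased`) stands in for the `[CLS]` representation.

```python
import tensorflow as tf


def classification_head(features, num_classes=20, hidden_units=(256, 64), dropout=0.2):
    """Stack Dropout, Dense, Dropout, Dense, and an output layer on `features`."""
    x = tf.keras.layers.Dropout(dropout)(features)
    x = tf.keras.layers.Dense(hidden_units[0], activation="relu")(x)
    x = tf.keras.layers.Dropout(dropout)(x)
    x = tf.keras.layers.Dense(hidden_units[1], activation="relu")(x)
    # One softmax unit per class.
    return tf.keras.layers.Dense(num_classes, activation="softmax", name="output")(x)


# Stand-in for the [CLS] representation produced by the encoder.
features = tf.keras.Input(shape=(768,), name="cls_features")
model = tf.keras.Model(inputs=features, outputs=classification_head(features))
```

In the full model, `inputs` would be the two input layers and `features` the extracted `[CLS]` tensor.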

**Exercise**: set the pretrained layer (`tf_distilbert_model`) to non-trainable (we only want to train the head).
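
Freezing comes down to setting the layer's `trainable` attribute, as in the sketch below. The helper name `freeze_layer` is hypothetical, and the layer is assumed to be reachable under the name `tf_distilbert_model`.

```python
import tensorflow as tf


def freeze_layer(model, layer_name="tf_distilbert_model"):
    """Exclude one layer's weights from training; call this before compile()."""
    model.get_layer(layer_name).trainable = False
    return model
```

Afterwards, `model.summary()` should list the encoder weights as non-trainable parameters.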

**Exercise**: compile the model with `SparseCategoricalCrossentropy` as the loss function and `Adam` as the optimizer.
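
A possible compile step is sketched below. The learning rate and the accuracy metric are assumptions, and `compile_model` is a hypothetical helper name. The loss is left at its default `from_logits=False`, which assumes that the output layer applies a softmax.

```python
import tensorflow as tf


def compile_model(model, learning_rate=1e-3):
    """Attach optimizer, loss, and metric to the model."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        # The sparse variant expects integer class ids rather than
        # one-hot encoded labels.
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return model
```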

**Exercise**: define three callbacks: `ModelCheckpoint`, `EarlyStopping`, and `TensorBoard`.
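
The three callbacks might be set up as in the sketch below. Monitoring `val_loss`, the patience value, and the helper name `make_callbacks` are assumptions. The checkpoint path is left as a parameter because the accepted file extensions depend on the Keras version in use.

```python
import tensorflow as tf


def make_callbacks(checkpoint_path, log_dir, patience=3):
    """Return checkpointing, early stopping, and TensorBoard callbacks."""
    return [
        # Keep only the model with the lowest validation loss on disk.
        tf.keras.callbacks.ModelCheckpoint(
            filepath=checkpoint_path, monitor="val_loss", save_best_only=True
        ),
        # Stop once the validation loss has not improved for `patience` epochs.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=patience, restore_best_weights=True
        ),
        # Write training curves for inspection in TensorBoard.
        tf.keras.callbacks.TensorBoard(log_dir=log_dir),
    ]
```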

**Exercise**: fit the model.
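
A sketch of the training call follows. It assumes that the training dataset repeats indefinitely, which is why `steps_per_epoch` is derived from the number of training examples. The helper name `fit_model` and the default epoch count are assumptions.

```python
import tensorflow as tf


def fit_model(
    model: tf.keras.Model,
    train_ds: tf.data.Dataset,
    val_ds: tf.data.Dataset,
    num_train_examples: int,
    batch_size: int,
    epochs: int = 10,
    callbacks=None,
):
    """Train on a repeating dataset and validate after every epoch."""
    # A repeating dataset has no natural end, so Keras must be told
    # how many batches make up one epoch.
    steps_per_epoch = max(1, num_train_examples // batch_size)
    return model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        callbacks=callbacks or [],
    )
```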

**Exercise**: load the best checkpoint from disk.

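*Notes*:
- the model will likely be too big to train locally on your computer. You can download a finetuned version with the `curl` command below.
- when loading the model, you need to pass the custom `TFDistilBertModel` pretrained class to the loading function, for example via the `custom_objects` argument of `tf.keras.models.load_model`.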
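
Loading a saved model that contains the Hugging Face layer could look like the sketch below. The helper name `load_finetuned` is hypothetical. The default path assumes the finetuned file from the `curl` command in this section, and the sketch assumes the file stores a complete Keras model rather than weights only.

```python
import tensorflow as tf


def load_finetuned(path="finetuned_bert_20ng.hdf5"):
    """Load a saved Keras model that contains a TFDistilBertModel layer."""
    from transformers import TFDistilBertModel

    # Keras cannot rebuild the Hugging Face layer on its own, so the class
    # is handed over through custom_objects.
    return tf.keras.models.load_model(
        path, custom_objects={"TFDistilBertModel": TFDistilBertModel}
    )
```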

In [None]:
#!curl https://files.webis.de/bdlt-ss22/finetuned_bert_20ng.hdf5 --output finetuned_bert_20ng.hdf5