<a href="https://colab.research.google.com/github/talmolab/sleap/blob/main/docs/notebooks/Interactive_and_resumable_training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Interactive and resumable training

Most of the time, you will be training models through the GUI or using the [`sleap-train` CLI](https://sleap.ai/guides/cli.html#sleap-train).

If you'd like to customize the training process, however, you can use SLEAP's low-level training functionality interactively. This allows you to define scripts that train models according to your own workflow, for example, to **resume training** on an already trained model. Another possible application would be to train a model using **transfer learning**, where a pretrained model can be used to initialize the weights of the new model.

In this notebook we will explore how to set up a training job and train a model for multiple rounds without the GUI or CLI.

## 1. Set up SLEAP

Run this cell first to install SLEAP. If you get a dependency error in subsequent cells, just click **Runtime** → **Restart runtime** to reload the packages.

Don't forget to set **Runtime** → **Change runtime type** → **GPU** as the accelerator.

```python
# This should take care of all the dependencies on Colab:
!pip uninstall -y opencv-python opencv-contrib-python && pip install sleap[pypi]

# To install locally, we recommend the conda package instead (available on Windows and Linux):
# conda create -n sleap -c sleap -c conda-forge -c nvidia sleap
```

```
Collecting opencv-python<=4.6.0,>=4.2.0 (from sleap[pypi])
  Using cached opencv_python-4.5.5.64-cp36-abi3-win_amd64.whl (35.4 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.5.5.64
```




Import SLEAP to make sure it installed correctly and print out some information about the system:

```python
import sleap
sleap.versions()
sleap.system_summary()
```

```
SLEAP: 1.3.2
TensorFlow: 2.11.0
Numpy: 1.21.6
Python: 3.7.12
OS: Windows-10-10.0.19041-SP0
GPUs: None detected.
```


## 2. Set up training data

Here we will download an existing training dataset package. This is an `.slp` file that contains both the labeled poses and the image data for the labeled frames.

If you are running on Google Colab, you'll probably want to replace this download by mounting the Google Drive folder that contains your own data. If you are running locally, simply change `TRAINING_SLP_FILE` below to the path of your labels.

```python
# !curl -L --output labels.pkg.slp https://www.dropbox.com/s/b990gxjt3d3j3jh/210205.sleap_wt_gold.13pt.pkg.slp?dl=1
!curl -L --output labels.pkg.slp https://storage.googleapis.com/sleap-data/datasets/wt_gold.13pt/tracking_split2/train.pkg.slp
!ls -lah
```


```python
TRAINING_SLP_FILE = "labels.pkg.slp"
```
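On Colab, an alternative to downloading the example package is to mount your Google Drive and point `TRAINING_SLP_FILE` at a file stored there. The sketch below assumes your labels live in a Drive folder called `sleap_data`; that folder name is hypothetical, so adjust it to your own layout. Outside Colab the `google.colab` import fails and the path falls back to the current directory.

```python
import os

try:
    # google.colab is only importable inside a Colab runtime.
    from google.colab import drive
    drive.mount("/content/drive")
    # Hypothetical folder name: change this to wherever your .slp file lives.
    DATA_DIR = "/content/drive/MyDrive/sleap_data"
except ImportError:
    # Running locally: look for the labels in the current directory.
    DATA_DIR = "."

TRAINING_SLP_FILE = os.path.join(DATA_DIR, "labels.pkg.slp")
print(TRAINING_SLP_FILE)
```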

## 3. Set up the training job

A SLEAP `TrainingJobConfig` is a structure that contains all of the hyperparameters needed to train a SLEAP model. It is typically saved to `initial_config.json` and `training_config.json` in the model folder, both so that training runs can be reproduced and to store metadata needed for inference.

Normally, these configs are generated interactively in the GUI, or written by hand by editing an existing JSON file in a text editor. Here, we will define a configuration entirely in Python.

```python
from sleap.nn.config import *

# Initialize the default training job configuration.
cfg = TrainingJobConfig()

# Update the path to the training data we just downloaded.
cfg.data.labels.training_labels = TRAINING_SLP_FILE
cfg.data.labels.validation_fraction = 0.1

# Preprocessing and training parameters.
cfg.data.instance_cropping.center_on_part = "thorax"
cfg.optimization.augmentation_config.rotate = True
cfg.optimization.epochs = 10  # Maximum number of epochs; early stopping can end training sooner.

# These configure the neural network backbone and the model type:
cfg.model.backbone.unet = UNetConfig(
    filters=16,
    output_stride=4
)
cfg.model.heads.centered_instance = CenteredInstanceConfmapsHeadConfig(
    anchor_part="thorax",
    sigma=1.5,
    output_stride=4
)

# Set up how we want to save the trained model.
cfg.outputs.run_name = "baseline_model.topdown"
```
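The `epochs` value is an upper bound rather than a fixed count, because training can stop once the validation loss plateaus. The plateau rule can be sketched in plain Python as below. This is a simplified, Keras-style `EarlyStopping` check written for illustration only: the `patience` and `min_delta` values and the loss sequence are made up, and SLEAP's own settings should be found under `cfg.optimization.early_stopping`.

```python
def stopping_epoch(val_losses, max_epochs, patience=3, min_delta=1e-6):
    """Return the 1-based epoch at which a plateau rule would stop training.

    Training stops once `patience` consecutive epochs fail to improve the best
    validation loss by more than `min_delta`, or when `max_epochs` is reached.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # No plateau detected: training runs until the data or epoch budget ends.
    return min(len(val_losses), max_epochs)


# Illustrative validation losses: they improve, then flatten out.
losses = [0.0021, 0.0015, 0.0012, 0.0012, 0.0012, 0.0012, 0.0011, 0.0010]
print(stopping_epoch(losses, max_epochs=10))  # 6
```

Here the loss stops improving after epoch 3, so with `patience=3` training would end at epoch 6 even though 10 epochs were allowed.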

Existing configs can also be loaded from a `.json` file with:

```python
cfg = sleap.load_config("training_config.json")
```
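Because the saved config is plain JSON, it can also be inspected or tweaked with the standard library alone, for example to derive a new run from an existing one. A minimal sketch, assuming the file nests fields the same way as the attributes used above (`optimization.epochs`, `outputs.run_name`, `data.labels.training_labels`); the helper `derive_config` and the output name `training_config.resume.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def derive_config(src_path, dst_path, epochs, run_name):
    """Copy a training config JSON, overriding the epoch budget and run name.

    Assumes the JSON mirrors the TrainingJobConfig layout
    (e.g. optimization.epochs and outputs.run_name).
    """
    cfg = json.loads(Path(src_path).read_text())
    cfg["optimization"]["epochs"] = epochs
    cfg["outputs"]["run_name"] = run_name
    Path(dst_path).write_text(json.dumps(cfg, indent=4))
    return cfg


with tempfile.TemporaryDirectory() as tmp:
    # A tiny stand-in for a real training_config.json.
    src = Path(tmp) / "training_config.json"
    src.write_text(json.dumps({
        "data": {"labels": {"training_labels": "labels.pkg.slp"}},
        "optimization": {"epochs": 10},
        "outputs": {"run_name": "baseline_model.topdown"},
    }))
    dst = Path(tmp) / "training_config.resume.json"
    new_cfg = derive_config(src, dst, epochs=3, run_name="baseline_model.topdown.resume")
    print(json.loads(dst.read_text())["optimization"]["epochs"])  # 3
```

The derived file can then be loaded with `sleap.load_config` like any other config.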

## 4. Training

Next, we will create a SLEAP `Trainer` from the configuration we just specified. This handles all the nitty-gritty mechanics needed to set up training in the backend.

```python
trainer = sleap.nn.training.Trainer.from_config(cfg)
```

```
INFO:sleap.nn.training:Loading training labels from: labels.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 1440 / Validation = 160.
```
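The numbers in this log can be checked by hand. With 1,600 labeled frames (1,440 + 160) and `validation_fraction = 0.1`, 160 frames are held out for validation. The `360/360` steps per epoch that appear in the training log are consistent with one pass over the 1,440 training frames at a batch size of 4. The batch size of 4 and the floor of 200 batches per epoch used below are assumptions about SLEAP's defaults, and the rounding rule is an approximation; the helper `split_and_steps` is hypothetical.

```python
def split_and_steps(n_labeled, validation_fraction, batch_size=4, min_batches_per_epoch=200):
    """Estimate train/val split sizes and batches per epoch.

    batch_size=4 and min_batches_per_epoch=200 are assumed defaults,
    not values read from SLEAP.
    """
    n_val = round(n_labeled * validation_fraction)
    n_train = n_labeled - n_val
    # One epoch is roughly one pass over the training split, with a lower bound.
    steps = max(min_batches_per_epoch, n_train // batch_size)
    return n_train, n_val, steps


print(split_and_steps(1600, 0.1))  # (1440, 160, 360)
```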


Great, now we're ready to do the first round of training. This is when the model will actually start to improve over time:

```python
trainer.train()
```

```
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [15.6s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/10
360/360 - 77s - loss: 0.0024 - head: 8.5501e-04 - thorax: 9.5685e-04 - abdomen: 0.0023 - wingL: 0.0023 - wingR: 0.0023 - forelegL4: 0.0030 - forelegR4: 0.0028 - midlegL4: 0.0035 - midlegR4: 0.0035 - hindlegL4: 0.0036 - hindlegR4: 0.0036 - eyeL: 9.2221e-04 - eyeR: 9.6568e-04 - val_loss: 0.0021 - val_head: 5.9166e-04 - val_thorax: 6.1347e-04 - val_abdomen: 0.0022 - val_wingL: 0.0018 - val_wingR: 0.0020 - val_forelegL4: 0.0027 - val_forelegR4: 0.0027 - val_midlegL4: 0.0032 - val_midlegR4: 0.0034 - val_hindlegL4: 0.0035 - val_hindlegR4: 0.0034 - val_eyeL: 7.9556e-04 - val_eyeR: 8.8806e-04 - lr: 1.0000e-04 - 77s/epoch - 213ms/step
Epoch 2/10
360/360 - 78s - loss: 0.0020 - head: 6.5916e-04 - thorax: 5.5024e-04 - abdomen: 0.0020 - wingL: 0.0019 - wingR: 0.0019 - forelegL4: 0.0025 - forel
```

```
INFO:sleap.nn.evals:Saved predictions: models\baseline_model.topdown\labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models\baseline_model.topdown\metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.528719
```


```
INFO:sleap.nn.evals:Saved predictions: models\baseline_model.topdown\labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models\baseline_model.topdown\metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.534329
```
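The `OKS mAP` values summarize how close predicted keypoints are to the ground truth. In the usual COCO-style definition, the object keypoint similarity (OKS) of one predicted/ground-truth pair is the mean, over labeled keypoints, of exp(-d²/(2s²k²)), where d is the point-to-point distance, s an instance scale and k a per-keypoint tolerance; mAP then averages precision over OKS thresholds, typically 0.50 to 0.95. The sketch below computes only the per-instance OKS. It uses one illustrative tolerance `k` for every keypoint and treats a missing prediction as a miss; SLEAP's exact scale and tolerance choices may differ, and `instance_oks` is a hypothetical helper.

```python
import math


def instance_oks(gt_points, pr_points, scale, k=0.025):
    """COCO-style object keypoint similarity for one instance pair.

    gt_points / pr_points: sequences of (x, y) tuples; None in gt_points marks
    an unlabeled keypoint (skipped). None in pr_points counts as a miss (0).
    scale: instance scale s (e.g. sqrt of the bounding-box area).
    k: per-keypoint tolerance, one illustrative value for all keypoints.
    """
    scores = []
    for gt, pr in zip(gt_points, pr_points):
        if gt is None:
            continue  # unlabeled keypoints do not count
        if pr is None:
            scores.append(0.0)
            continue
        d2 = (gt[0] - pr[0]) ** 2 + (gt[1] - pr[1]) ** 2
        scores.append(math.exp(-d2 / (2 * scale**2 * k**2)))
    return sum(scores) / len(scores) if scores else float("nan")


gt = [(10.0, 10.0), (20.0, 15.0), None]
perfect = instance_oks(gt, [(10.0, 10.0), (20.0, 15.0), (5.0, 5.0)], scale=40.0)
off = instance_oks(gt, [(11.0, 10.0), (20.0, 15.0), None], scale=40.0)
print(perfect)  # 1.0
print(off)  # about 0.80: one keypoint is 1 px off
```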


## 5. Continuing training

If we still have the trainer in memory, we can continue training by simply calling `trainer.train()` again with a potentially different number of epochs:

```python
trainer.config.optimization.epochs = 3
trainer.train()
```

```
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [29.4s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/3
360/360 - 57s - loss: 9.1732e-04 - head: 3.5629e-04 - thorax: 1.9609e-04 - abdomen: 0.0010 - wingL: 9.1318e-04 - wingR: 9.1330e-04 - forelegL4: 0.0013 - forelegR4: 0.0013 - midlegL4: 0.0011 - midlegR4: 0.0011 - hindlegL4: 0.0014 - hindlegR4: 0.0015 - eyeL: 4.4475e-04 - eyeR: 4.3944e-04 - val_loss: 9.2727e-04 - val_head: 3.8719e-04 - val_thorax: 1.5200e-04 - val_abdomen: 0.0011 - val_wingL: 9.3115e-04 - val_wingR: 8.9376e-04 - val_forelegL4: 0.0012 - val_forelegR4: 0.0012 - val_midlegL4: 9.9703e-04 - val_midlegR4: 0.0012 - val_hindlegL4: 0.0015 - val_hindlegR4: 0.0016 - val_eyeL: 4.5374e-04 - val_eyeR: 5.1839e-04 - lr: 1.0000e-04 - 57s/epoch - 158ms/step
Epoch 2/3
360/360 - 56s - loss: 8.7900e-04 - head: 3.4532e-04 - thorax: 1.7895e-04 - abdomen: 0.0010 - wingL: 8.7539e-04 - wingR: 
```


```
INFO:sleap.nn.evals:Saved predictions: models/baseline_model.topdown/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/baseline_model.topdown/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.551905
```



```
INFO:sleap.nn.evals:Saved predictions: models/baseline_model.topdown/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/baseline_model.topdown/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.551469
```


As you can see, the loss and accuracy pick up from where they left off in the previous round of training.
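As the logs suggest, the second call reuses the model built by the first one instead of re-initializing it, so optimization simply continues. The toy trainer below shows the same behavior with one-parameter gradient descent. `ToyTrainer` is a hypothetical class, unrelated to SLEAP's `Trainer`, and it is named `toy` rather than `trainer` so that running it here does not overwrite the notebook's trainer.

```python
class ToyTrainer:
    """Hypothetical stand-in for a stateful trainer (not SLEAP's Trainer).

    Fits y = w * x by gradient descent on mean squared error. The weight `w`
    lives on the instance, so repeated calls to train() continue from it.
    """

    def __init__(self, xs, ys, lr=0.01):
        self.xs, self.ys, self.lr = xs, ys, lr
        self.w = 0.0  # fixed initialization for reproducibility

    def loss(self):
        n = len(self.xs)
        return sum((self.w * x - y) ** 2 for x, y in zip(self.xs, self.ys)) / n

    def train(self, epochs):
        n = len(self.xs)
        for _ in range(epochs):
            grad = sum(2 * (self.w * x - y) * x for x, y in zip(self.xs, self.ys)) / n
            self.w -= self.lr * grad
        return self.loss()


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # the true weight is 2
toy = ToyTrainer(xs, ys)
first = toy.train(epochs=10)
second = toy.train(epochs=3)  # resumes from the current w, no reset
print(first, second)

# Training 10 + 3 epochs in two calls matches 13 epochs in a single call.
print(abs(ToyTrainer(xs, ys).train(epochs=13) - second) < 1e-12)  # True
```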


More often, however, you'll want to continue training a model that was trained and saved in an earlier session, so the original trainer is no longer in memory.

In this case, all you need to do to continue training is to create a new `Trainer` from the existing model configuration and load up the weights before continuing training:

```python
# Load config.
cfg = sleap.load_config("models/baseline_model.topdown")
# cfg.outputs.run_name = "new_folder"  # Set the run_name to a new value if you want the model to be saved to a different folder.

# Create and initialize the trainer.
trainer = sleap.nn.training.Trainer.from_config(cfg)
trainer.setup()

# Replace the randomly initialized weights with the saved weights.
trainer.keras_model.load_weights("models/baseline_model.topdown/best_model.h5")
```

```
INFO:sleap.nn.training:Loading training labels from: labels.pkg.slp
INFO:sleap.nn.training:Creating training and validation splits from validation fraction: 0.1
INFO:sleap.nn.training:  Splits: Training = 1440 / Validation = 160.
INFO:sleap.nn.training:Setting up for training...
INFO:sleap.nn.training:Setting up pipeline builders...
INFO:sleap.nn.training:Setting up model...
INFO:sleap.nn.training:Building test pipeline...
INFO:sleap.nn.training:Loaded test example. [1.909s]
INFO:sleap.nn.training:  Input shape: (160, 160, 1)
INFO:sleap.nn.training:Created Keras model.
INFO:sleap.nn.training:  Backbone: UNet(stacks=1, filters=16, filters_rate=2.0, kernel_size=3, stem_kernel_size=7, convs_per_block=2, stem_blocks=0, down_blocks=4, middle_block=True, up_blocks=2, up_interpolate=False, block_contraction=False)
INFO:sleap.nn.training:  Max stride: 16
INFO:sleap.nn.training:  Parameters: 2,101,501
INFO:sleap.nn.training:  Heads: 
INFO:sleap.nn.training:    [0] = CenteredInstanceConfmapsHead
```
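If the folder passed to `sleap.load_config` or the weights path passed to `load_weights` is wrong, the resulting error may be less obvious than a plain missing-file message. A small pre-flight check, assuming the model folder contains the `training_config.json` and `best_model.h5` files referenced in this notebook; the helper `find_model_files` is hypothetical.

```python
import tempfile
from pathlib import Path


def find_model_files(model_dir):
    """Return paths to the config and best weights inside a trained model folder.

    Assumes the file names used in this notebook: training_config.json and
    best_model.h5. Raises FileNotFoundError listing anything that is missing.
    """
    model_dir = Path(model_dir)
    config_path = model_dir / "training_config.json"
    weights_path = model_dir / "best_model.h5"
    missing = [p.name for p in (config_path, weights_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    return config_path, weights_path


# Quick self-check on a throwaway folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "training_config.json").write_text("{}")
    try:
        find_model_files(tmp)  # best_model.h5 does not exist yet
        missing_msg = ""
    except FileNotFoundError as err:
        missing_msg = str(err)
    (Path(tmp) / "best_model.h5").write_bytes(b"")
    found_names = [p.name for p in find_model_files(tmp)]

print(missing_msg)
print(found_names)  # ['training_config.json', 'best_model.h5']

# In this notebook, the check would go before loading the weights, e.g.:
# config_path, weights_path = find_model_files("models/baseline_model.topdown")
# trainer.keras_model.load_weights(str(weights_path))
```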

```python
trainer.config.optimization.epochs = 3
trainer.train()
```

```
INFO:sleap.nn.training:Creating tf.data.Datasets for training data generation...
INFO:sleap.nn.training:Finished creating training datasets. [28.9s]
INFO:sleap.nn.training:Starting training loop...
Epoch 1/3
360/360 - 63s - loss: 8.2769e-04 - head: 3.4427e-04 - thorax: 1.6900e-04 - abdomen: 9.4941e-04 - wingL: 8.1514e-04 - wingR: 8.1826e-04 - forelegL4: 0.0012 - forelegR4: 0.0012 - midlegL4: 9.2980e-04 - midlegR4: 9.6439e-04 - hindlegL4: 0.0013 - hindlegR4: 0.0013 - eyeL: 4.2129e-04 - eyeR: 4.0767e-04 - val_loss: 7.8855e-04 - val_head: 3.2701e-04 - val_thorax: 1.8405e-04 - val_abdomen: 0.0010 - val_wingL: 7.3709e-04 - val_wingR: 7.1027e-04 - val_forelegL4: 0.0010 - val_forelegR4: 0.0011 - val_midlegL4: 9.3918e-04 - val_midlegR4: 9.0288e-04 - val_hindlegL4: 0.0012 - val_hindlegR4: 0.0013 - val_eyeL: 3.8746e-04 - val_eyeR: 3.3939e-04 - lr: 1.0000e-04 - 63s/epoch - 174ms/step
Epoch 2/3
360/360 - 58s - loss: 7.9662e-04 - head: 3.2407e-04 - thorax: 1.5127e-04 - abdomen: 9.1911e-04 - wingL: 
```


```
INFO:sleap.nn.evals:Saved predictions: models/baseline_model.topdown/labels_pr.train.slp
INFO:sleap.nn.evals:Saved metrics: models/baseline_model.topdown/metrics.train.npz
INFO:sleap.nn.evals:OKS mAP: 0.597609
```



```
INFO:sleap.nn.evals:Saved predictions: models/baseline_model.topdown/labels_pr.val.slp
INFO:sleap.nn.evals:Saved metrics: models/baseline_model.topdown/metrics.val.npz
INFO:sleap.nn.evals:OKS mAP: 0.621393
```


Again, the loss and accuracy pick up from where they left off prior to this round of training.

The resulting model can be used as usual for inference on new data.