<a href="https://colab.research.google.com/github/talmolab/sleap/blob/main/docs/notebooks/Training_and_inference_on_an_example_dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Training and inference on an example dataset

In this notebook we'll install SLEAP, download a sample dataset, run training and inference on that dataset using the SLEAP command-line interface, and then download the trained models and predictions.

## Install SLEAP
Note: Before installing SLEAP, check the [SLEAP releases](https://github.com/talmolab/sleap/releases) page for the latest version.

In [36]:
!pip uninstall -qqq -y opencv-python opencv-contrib-python
!pip install -qqq "sleap[pypi]>=1.3.3"

ERROR: Cannot uninstall opencv-python 4.6.0, RECORD file not found. Hint: The package was installed by conda.
ERROR: Cannot uninstall shiboken2 5.15.6, RECORD file not found. You might be able to recover from this via: 'pip install --force-reinstall --no-deps shiboken2==5.15.6'.
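To check which SLEAP version pip actually installed (note that the training logs below report SLEAP 1.3.2), you can query the package metadata from Python. This is a minimal sketch using the standard-library `importlib.metadata` (Python 3.8+); `parse_version`, `installed_version`, and `meets_minimum` are hypothetical helpers, and the version parsing is deliberately simplified, so prefer `packaging.version` if you need full version semantics.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '1.3.3' into (1, 3, 3).

    Simplified on purpose: keeps the leading digits of each dot-separated
    part and stops at the first part without any, so '1.4.1a2' becomes
    (1, 4, 1) and pre-releases compare equal to the final release.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(package="sleap"):
    """Return the installed version string of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def meets_minimum(installed, minimum="1.3.3"):
    """True if `installed` is at least `minimum` (numeric parts only)."""
    return parse_version(installed) >= parse_version(minimum)


sleap_version = installed_version("sleap")
if sleap_version is None:
    print("SLEAP is not installed in this environment.")
else:
    ok = meets_minimum(sleap_version, "1.3.3")
    print(f"SLEAP {sleap_version} installed (>= 1.3.3: {ok})")
```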

## Download sample training data into Colab
Let's download a sample dataset from the SLEAP [sample datasets repository](https://github.com/talmolab/sleap-datasets) into Colab.

In [24]:
!apt-get install tree
!wget -O dataset.zip https://github.com/talmolab/sleap-datasets/releases/download/dm-courtship-v1/drosophila-melanogaster-courtship.zip
!mkdir dataset
!unzip dataset.zip -d dataset
!rm dataset.zip
!tree dataset

E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
--2023-09-01 13:30:33--  https://github.com/talmolab/sleap-datasets/releases/download/dm-courtship-v1/drosophila-melanogaster-courtship.zip
Resolving github.com (github.com)... 192.30.255.113
Connecting to github.com (github.com)|192.30.255.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/263375180/16df8d00-94f1-11ea-98d1-6c03a2f89e1c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20230901%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230901T203033Z&X-Amz-Expires=300&X-Amz-Signature=b9b0638744af3144affdc46668c749128bd6c4f23ca2a1313821c7bbcd54ccdd&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=263375180&response-content-disposition=attachment%3B%20filename%3Ddrosophila-melan
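As the lock-file errors above show, `apt-get install tree` needs root privileges. If `tree` isn't available, a small standard-library sketch like the one below prints a similar listing of the extracted dataset. `tree_lines` and `print_tree` are hypothetical helpers, and their ordering (plain name sort) may differ slightly from `tree`'s locale-aware sort.

```python
from pathlib import Path


def tree_lines(root, prefix=""):
    """Return lines resembling `tree` output for the directory `root`.

    Entries are sorted by name; subdirectories are walked recursively.
    """
    entries = sorted(Path(root).iterdir(), key=lambda p: p.name)
    lines = []
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        connector = "└── " if last else "├── "
        lines.append(prefix + connector + entry.name)
        if entry.is_dir():
            # Continue the vertical guide only if more siblings follow.
            extension = "    " if last else "│   "
            lines.extend(tree_lines(entry, prefix + extension))
    return lines


def print_tree(root):
    print(root)
    for line in tree_lines(root):
        print(line)


if Path("dataset").is_dir():
    print_tree("dataset")
```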

## Train models
For the top-down pipeline, we'll need to train two models: a centroid model and a centered-instance model.

Using the command-line interface, we'll first train a model for centroids using the default **training profile**. The training profile determines the model architecture, the learning rate, and other parameters.

When you start training, you'll first see the training parameters and then the training and validation loss for each training epoch. 

Once you're satisfied with the validation loss reported for an epoch, you can stop training by clicking the stop button. The version of the model with the lowest validation loss is saved during training, and that's what will be used for inference.

If you don't stop training, it will run for 200 epochs or until validation loss fails to improve for some number of epochs (controlled by the `early_stopping` fields in the training profile).
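If you want different limits, one approach is to copy a profile, edit it, and train with the edited file. The sketch below works on a plain dict so it runs without SLEAP. It assumes the epoch limit lives at `optimization.epochs` and the patience at `optimization.early_stopping.plateau_patience`; these key names and the `profile_excerpt` values are assumptions for illustration, so check them against the actual profile JSON before relying on them. `customize_profile` is a hypothetical helper.

```python
import copy


def customize_profile(profile, epochs, patience):
    """Return a copy of a training profile with a new epoch limit and
    early-stopping patience. Key names are assumptions; verify them
    against the real profile JSON."""
    updated = copy.deepcopy(profile)  # leave the original dict untouched
    optimization = updated.setdefault("optimization", {})
    optimization["epochs"] = epochs
    optimization.setdefault("early_stopping", {})["plateau_patience"] = patience
    return updated


# Hypothetical excerpt of a profile, not the full baseline file.
profile_excerpt = {
    "optimization": {
        "epochs": 200,  # the 200-epoch limit described above
        "early_stopping": {
            "stop_training_on_plateau": True,
            "plateau_patience": 20,  # illustrative value, not verified
        },
    }
}

custom = customize_profile(profile_excerpt, epochs=50, patience=5)
print(custom["optimization"])
```

In practice you would `json.load` a complete copy of a profile (the training log below prints the location of `baseline.centroid.json` in SLEAP's `training_profiles` directory), apply the change, `json.dump` it to a new file, and pass that file's path to `sleap-train` in place of `baseline.centroid.json`.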

In [25]:
!sleap-train baseline.centroid.json "dataset/drosophila-melanogaster-courtship/courtship_labels.slp" --run_name "courtship.centroid" --video-paths "dataset/drosophila-melanogaster-courtship/20190128_113421.mp4"

INFO:sleap.nn.training:Versions:
SLEAP: 1.3.2
TensorFlow: 2.7.0
Numpy: 1.21.5
Python: 3.7.12
OS: Linux-5.15.0-78-generic-x86_64-with-debian-bookworm-sid
INFO:sleap.nn.training:Training labels file: dataset/drosophila-melanogaster-courtship/courtship_labels.slp
INFO:sleap.nn.training:Training profile: /home/talmolab/sleap-estimates-animal-poses/pull-requests/sleap/sleap/training_profiles/baseline.centroid.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
    "training_job_path": "baseline.centroid.json",
    "labels_path": "dataset/drosophila-melanogaster-courtship/courtship_labels.slp",
    "video_paths": [
        "dataset/drosophila-melanogaster-courtship/20190128_113421.mp4"
    ],
    "val_labels": null,
    "test_labels": null,
    "base_checkpoint": null,
    "tensorboard": false,
    "save_viz": false,
    "zmq": false,
    "run_name": "courtship.centroid",
    "prefix": "",
    "suffix": "",
    "cpu": false,
    "first_gpu": false,
    "la

Let's now train a centered-instance model.

In [26]:
!sleap-train baseline_medium_rf.topdown.json "dataset/drosophila-melanogaster-courtship/courtship_labels.slp" --run_name "courtship.topdown_confmaps" --video-paths "dataset/drosophila-melanogaster-courtship/20190128_113421.mp4"

INFO:sleap.nn.training:Versions:
SLEAP: 1.3.2
TensorFlow: 2.7.0
Numpy: 1.21.5
Python: 3.7.12
OS: Linux-5.15.0-78-generic-x86_64-with-debian-bookworm-sid
INFO:sleap.nn.training:Training labels file: dataset/drosophila-melanogaster-courtship/courtship_labels.slp
INFO:sleap.nn.training:Training profile: /home/talmolab/sleap-estimates-animal-poses/pull-requests/sleap/sleap/training_profiles/baseline_medium_rf.topdown.json
INFO:sleap.nn.training:
INFO:sleap.nn.training:Arguments:
INFO:sleap.nn.training:{
    "training_job_path": "baseline_medium_rf.topdown.json",
    "labels_path": "dataset/drosophila-melanogaster-courtship/courtship_labels.slp",
    "video_paths": [
        "dataset/drosophila-melanogaster-courtship/20190128_113421.mp4"
    ],
    "val_labels": null,
    "test_labels": null,
    "base_checkpoint": null,
    "tensorboard": false,
    "save_viz": false,
    "zmq": false,
    "run_name": "courtship.topdown_confmaps",
    "prefix": "",
    "suffix": "",
    "cpu": false,
    "
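The two training cells differ only in the profile and the run name. If you'd rather drive them from Python (for example, to queue up more runs), here is a minimal sketch using `subprocess`. It uses only the `sleap-train` arguments from the cells above; `JOBS`, `train_command`, and `train_all` are hypothetical names, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

DATA_DIR = "dataset/drosophila-melanogaster-courtship"
LABELS = f"{DATA_DIR}/courtship_labels.slp"
VIDEO = f"{DATA_DIR}/20190128_113421.mp4"

# (training profile, run name) for each stage of the top-down pipeline.
JOBS = [
    ("baseline.centroid.json", "courtship.centroid"),
    ("baseline_medium_rf.topdown.json", "courtship.topdown_confmaps"),
]


def train_command(profile, run_name, labels=LABELS, video=VIDEO):
    """Build the same sleap-train argument list used in the cells above."""
    return [
        "sleap-train", profile, labels,
        "--run_name", run_name,
        "--video-paths", video,
    ]


def train_all(dry_run=True):
    for profile, run_name in JOBS:
        cmd = train_command(profile, run_name)
        print(shlex.join(cmd))  # shell-quoted, copy-pasteable form
        if not dry_run:
            # check=True raises CalledProcessError if a run fails.
            subprocess.run(cmd, check=True)


train_all(dry_run=True)
```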

The models (along with the profiles and ground truth data used to train and validate them) are saved in the `models/` directory:

In [27]:
!tree models/

[01;34mmodels/[00m
├── [01;34mcourtship.centroid[00m
│   ├── best_model.h5
│   ├── initial_config.json
│   ├── labels_gt.train.slp
│   ├── labels_gt.val.slp
│   ├── labels_pr.train.slp
│   ├── labels_pr.val.slp
│   ├── metrics.train.npz
│   ├── metrics.val.npz
│   ├── training_config.json
│   └── training_log.csv
└── [01;34mcourtship.topdown_confmaps[00m
    ├── best_model.h5
    ├── initial_config.json
    ├── labels_gt.train.slp
    ├── labels_gt.val.slp
    ├── labels_pr.train.slp
    ├── labels_pr.val.slp
    ├── metrics.train.npz
    ├── metrics.val.npz
    ├── training_config.json
    └── training_log.csv

2 directories, 20 files
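Each run directory also has a `training_log.csv` with per-epoch losses, which lets you check which epoch produced the lowest validation loss (the one `best_model.h5` is meant to capture). This sketch assumes the CSV has `epoch` and `val_loss` columns, the names Keras' `CSVLogger` writes; check the header of your file. `best_epoch` is a hypothetical helper.

```python
import csv


def best_epoch(log_path, metric="val_loss"):
    """Return (epoch, value) for the row with the lowest `metric`."""
    with open(log_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"{log_path} has no rows")
    best = min(rows, key=lambda row: float(row[metric]))
    return int(best["epoch"]), float(best[metric])


for run in ["courtship.centroid", "courtship.topdown_confmaps"]:
    path = f"models/{run}/training_log.csv"
    try:
        epoch, loss = best_epoch(path)
    except FileNotFoundError:
        print(f"{path}: not found (has this model been trained?)")
        continue
    print(f"{run}: lowest val_loss {loss:.6f} at epoch {epoch}")
```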


## Inference
Let's run inference with our trained models for centroids and centered instances.

In [28]:
!sleap-track "dataset/drosophila-melanogaster-courtship/20190128_113421.mp4" --frames 0-100 -m "models/courtship.centroid" -m "models/courtship.topdown_confmaps"

Started inference at: 2023-09-01 13:42:03.066840
Args:
[1m{[0m
[2;32m│   [0m[32m'data_path'[0m: [32m'dataset/drosophila-melanogaster-courtship/20190128_113421.mp4'[0m,
[2;32m│   [0m[32m'models'[0m: [1m[[0m
[2;32m│   │   [0m[32m'models/courtship.centroid'[0m,
[2;32m│   │   [0m[32m'models/courtship.topdown_confmaps'[0m
[2;32m│   [0m[1m][0m,
[2;32m│   [0m[32m'frames'[0m: [32m'0-100'[0m,
[2;32m│   [0m[32m'only_labeled_frames'[0m: [3;91mFalse[0m,
[2;32m│   [0m[32m'only_suggested_frames'[0m: [3;91mFalse[0m,
[2;32m│   [0m[32m'output'[0m: [3;35mNone[0m,
[2;32m│   [0m[32m'no_empty_frames'[0m: [3;91mFalse[0m,
[2;32m│   [0m[32m'verbosity'[0m: [32m'rich'[0m,
[2;32m│   [0m[32m'video.dataset'[0m: [3;35mNone[0m,
[2;32m│   [0m[32m'video.input_format'[0m: [32m'channels_last'[0m,
[2;32m│   [0m[32m'video.index'[0m: [32m''[0m,
[2;32m│   [0m[32m'cpu'[0m: [3;91mFalse[0m,
[2;32m│   [0m[32m'first_gpu'[0m: [3;91mFalse[0

When inference is finished, the predictions are saved to a file. Since we didn't specify an output path, the file is named `<video filename>.predictions.slp` and placed in the same directory as the video:
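If you have several videos, you can generate one `sleap-track` call per video from Python and know in advance where each predictions file will land. This is a sketch, not part of SLEAP: `track_command`, `default_predictions_path`, and `track_videos` are hypothetical helpers that use only the `--frames` and `-m` arguments from the cell above and the default naming rule just described. Omitting `frames` processes the whole video; `run=False` only prints the commands.

```python
import subprocess

MODELS = ["models/courtship.centroid", "models/courtship.topdown_confmaps"]


def default_predictions_path(video_path):
    """Default output when no path is given: <video filename>.predictions.slp."""
    return f"{video_path}.predictions.slp"


def track_command(video_path, models=MODELS, frames=None):
    """Build a sleap-track call like the one above (one -m per model)."""
    cmd = ["sleap-track", str(video_path)]
    if frames is not None:
        cmd += ["--frames", frames]
    for model in models:
        cmd += ["-m", model]
    return cmd


def track_videos(videos, run=False):
    """Print (and optionally run) one sleap-track call per video."""
    for video in videos:
        cmd = track_command(video)
        print(" ".join(cmd), "->", default_predictions_path(video))
        if run:
            subprocess.run(cmd, check=True)


track_videos(["dataset/drosophila-melanogaster-courtship/20190128_113421.mp4"])
```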

In [29]:
!tree dataset/drosophila-melanogaster-courtship

[01;34mdataset/drosophila-melanogaster-courtship[00m
├── [01;32m20190128_113421.mp4[00m
├── 20190128_113421.mp4.predictions.slp
├── [01;32mcourtship_labels.slp[00m
└── [01;35mexample.jpg[00m

0 directories, 4 files


You can inspect your predictions file using `sleap-inspect`:

In [30]:
!sleap-inspect dataset/drosophila-melanogaster-courtship/20190128_113421.mp4.predictions.slp

Labeled frames: 101
Tracks: 0
Video files:
  dataset/drosophila-melanogaster-courtship/20190128_113421.mp4
    labeled frames: 101
    labeled frames from 0 to 100
    user labeled frames: 0
    tracks: 1
    max instances in frame: 2
Total user labeled frames: 0

Provenance:
  model_paths: ['models/courtship.centroid/training_config.json', 'models/courtship.topdown_confmaps/training_config.json']
  predictor: TopDownPredictor
  sleap_version: 1.3.2
  platform: Linux-5.15.0-78-generic-x86_64-with-debian-bookworm-sid
  command: /home/talmolab/micromamba/envs/s0/bin/sleap-track dataset/drosophila-melanogaster-courtship/20190128_113421.mp4 --frames 0-100 -m models/courtship.centroid -m models/courtship.topdown_confmaps
  data_path: dataset/drosophila-melanogaster-courtship/20190128_113421.mp4
  output_path: dataset/drosophila-melanogaster-courtship/20190128_113421.mp4.predictions.slp
  total_elapsed: 7.775644779205322
  start_timestamp: 2023-09-01 13:42:03.066840
  finish_timestamp: 2023-
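The `Labeled frames: 101` count follows from `--frames 0-100` covering frames 0 through 100 inclusive. To check how many frames a given spec will cover before launching a long run, a hypothetical helper like `expand_frames` can expand it. It assumes comma-separated items that are either single frame indices or inclusive `start-end` ranges; it is an illustration, not SLEAP's own parser.

```python
def expand_frames(spec):
    """Expand a spec like '0-100' or '1,5,10-12' into sorted unique frame indices.

    Ranges are inclusive on both ends, consistent with the 101 frames
    reported above for '0-100'.
    """
    frames = set()
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        if "-" in item:
            start, end = (int(x) for x in item.split("-", 1))
            if end < start:
                raise ValueError(f"range {item!r} ends before it starts")
            frames.update(range(start, end + 1))
        else:
            frames.add(int(item))
    return sorted(frames)


print(len(expand_frames("0-100")))  # 101
```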

If you're using Chrome, you can download your trained models like this:

In [31]:
# Zip up the models directory
!zip -r trained_models.zip models/

# Download.
from google.colab import files
files.download("/content/trained_models.zip")

  adding: models/ (stored 0%)
  adding: models/courtship.topdown_confmaps/ (stored 0%)
  adding: models/courtship.topdown_confmaps/labels_pr.val.slp (deflated 74%)
  adding: models/courtship.topdown_confmaps/metrics.val.npz (deflated 0%)
  adding: models/courtship.topdown_confmaps/labels_pr.train.slp (deflated 67%)
  adding: models/courtship.topdown_confmaps/labels_gt.val.slp (deflated 72%)
  adding: models/courtship.topdown_confmaps/initial_config.json (deflated 73%)
  adding: models/courtship.topdown_confmaps/training_log.csv (deflated 55%)
  adding: models/courtship.topdown_confmaps/metrics.train.npz (deflated 0%)
  adding: models/courtship.topdown_confmaps/labels_gt.train.slp (deflated 61%)
  adding: models/courtship.topdown_confmaps/best_model.h5 (deflated 8%)
  adding: models/courtship.topdown_confmaps/training_config.json (deflated 88%)
  adding: models/courtship.centroid/ (stored 0%)
  adding: models/courtship.centroid/labels_pr.val.slp (deflated 82%)
  adding: models/courtship
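If the `zip` command isn't available (for example, outside Colab), the standard-library `shutil.make_archive` can build an equivalent archive. `archive_dir` is a hypothetical helper; it keeps `models/` as the top-level folder inside the archive, like `zip -r trained_models.zip models/` does.

```python
import shutil
from pathlib import Path


def archive_dir(src_dir, archive_base):
    """Zip `src_dir`, keeping its name as the top-level folder.

    `archive_base` is the archive path without extension, e.g.
    'trained_models' -> 'trained_models.zip'. Returns the archive path.
    """
    src = Path(src_dir).resolve()
    return shutil.make_archive(
        str(archive_base), "zip", root_dir=src.parent, base_dir=src.name
    )


if Path("models").is_dir():
    print(archive_dir("models", "trained_models"))
```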

And you can likewise download your predictions:

In [33]:
from google.colab import files
files.download('dataset/drosophila-melanogaster-courtship/20190128_113421.mp4.predictions.slp')

In some other browsers (e.g., Safari) you might get an error. In that case, download the files from the "Files" tab in the side panel (the one with the folder icon). If you don't see the side panel, select "Show table of contents" in the "View" menu.