
SLEAP-NN v0.3.0



@talmo released this 29 Jun 20:07 (commit f92d812)

Summary

SLEAP-NN v0.3.0 is a major release centered on a new unified sleap-nn predict inference command and a clean Predictor Python API, first-class centroid-only models, an experimental instance-segmentation stack (bottom-up, top-down, and SAM-prompted), the sleap-io v0.8.0 annotation architecture, and Kalman tracking. GPU (CUDA 13 / cu130) is now the default backend. 60+ PRs since v0.2.0.

⚠️ Breaking changes: sleap-nn predict now means pose inference (in v0.2.0 it was the exported-model runner); the sleap-io pin moves to >=0.8.0,<0.9.0; seed defaults to 42; and a fresh prediction .slp is no longer embedded by default. Read Breaking Changes and Upgrade Notes before updating automation or the SLEAP GUI.

Highlights:

  • New sleap-nn predict inference command + Predictor API. A single, unified entry point from model dir(s) + data to sio.Labels, with streaming, raw-tensor access, in-memory frames, and matching CLI ↔ Python ergonomics. (In v0.2.0 the inference command was sleap-nn track, which remains available as a legacy command.)
  • Top-level Python API: sleap_nn.predict(...), sleap_nn.Predictor, and sleap_nn.load_models(...) are now importable straight from sleap_nn for quick scripting and discoverability.
  • Centroid-only models are first-class. Train a lone centroid head and predict / evaluate / export it end-to-end; passing a single centroid model directory is auto-detected as a centroid-only model.
  • Instance segmentation (experimental): bottom-up, top-down (centered_instance_segmentation), and SAM-prompted backends, with mask evaluation, mask-IoU tracking, and training augmentation/viz/eval parity.
  • Kalman tracking (--use_kalman) alongside optical-flow shift, plus a per-node "keypoints" tracking mode.
  • ⚠️ sleap-io >=0.8.0,<0.9.0 — the new annotation architecture (Centroid, PredictedSegmentationMask, PredictedROI, rendering helpers) plus a few behavior changes (read-only annotation views, identity-default track matching).
  • Exported ONNX/TensorRT inference is unified into sleap-nn predict --runtime onnx|tensorrt (CLI) and Predictor.from_export_dir(...) (Python).
  • GPU (cu130) is the default backend; remote-URL --data_path; repeatable --output_format; configurable output embedding/source-video controls.

Installation

# Install / upgrade the CLI tool (auto-selects the right torch backend)
uv tool install sleap-nn --torch-backend auto --upgrade

# Verify
sleap-nn --version
# Expected output: sleap-nn 0.3.0

GPU builds now default to the cu130 (CUDA 13) backend. See the installation docs for CPU-only, specific CUDA versions, and project-dependency usage.


Breaking Changes

⚠️ sleap-nn predict now means pose inference

In v0.2.0, sleap-nn predict was the exported-model runner (positional EXPORT_DIR VIDEO --runtime ...), and pose inference was run with sleap-nn track. In v0.3.0, sleap-nn predict is the unified pose-inference command, and running an exported ONNX/TensorRT model is now a flag on it. sleap-nn track still exists as a legacy command, so existing track scripts keep working; new work should use predict.

# Pose inference (v0.2.0)
sleap-nn track   -i video.mp4 -m models/centroid/ -m models/centered_instance/
# Pose inference (v0.3.0)
sleap-nn predict -i video.mp4 -m models/centroid/ -m models/centered_instance/

# Exported ONNX/TRT model (v0.2.0): sleap-nn predict <export_dir> <video> --runtime onnx
# Exported ONNX/TRT model (v0.3.0):
sleap-nn predict -m exported_model/ -i video.mp4 -o predictions.slp --runtime onnx

The standalone exported-inference entry points were removed — including the sleap_nn.export.predictors.ONNXPredictor class. The Python replacement for running exported models is Predictor.from_export_dir, which supports both ONNX and TensorRT:

from sleap_nn.inference import Predictor

predictor = Predictor.from_export_dir("exported_model/", runtime="onnx")       # or runtime="tensorrt"
labels = predictor.predict("video.mp4")

⚠️ sleap-io pinned to >=0.8.0,<0.9.0

v0.3.0 adopts the sleap-io v0.8.0 annotation architecture (v0.2.0 capped sleap-io at <0.8.0). Notable downstream-visible changes: Labels annotation lists (instances, masks, …) are now read-only views (mutate them via the documented APIs), and track matching in Labels.merge() / Labels.match() now defaults to identity rather than name; pass track="name" to restore same-named-track collapsing. sleap-io 0.8.0's save and Analysis HDF5 behavior is now locked in by compatibility regression tests (#659).

base.merge(other)                 # 0.8.0 default: same-named tracks stay separate
base.merge(other, track="name")   # restore pre-0.8.0 name-collapse behavior
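To make the identity-vs-name distinction concrete, here is a toy model in plain Python. Track and merge_tracks are hypothetical stand-ins, not sleap-io classes, and the real Labels.merge() also reconciles frames, instances, and videos; the sketch only shows why two same-named tracks now stay separate by default.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: two Track objects compare equal only if they are the same object
class Track:
    name: str


def merge_tracks(base, other, track="identity"):
    """Toy merge of two track lists (hypothetical; not sleap-io's implementation)."""
    merged = list(base)
    for t in other:
        if track == "name":
            # Name matching: collapse onto any existing track with the same name.
            if any(existing.name == t.name for existing in merged):
                continue
        else:
            # Identity matching: only the very same Track object counts as a match.
            if any(existing is t for existing in merged):
                continue
        merged.append(t)
    return merged


a = [Track("mouse1")]
b = [Track("mouse1")]  # same name, but a different Track object
print(len(merge_tracks(a, b)))                # 2: identity keeps them separate
print(len(merge_tracks(a, b, track="name")))  # 1: name matching collapses them
```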

⚠️ seed now defaults to 42

Training previously left seed unset; it now defaults to 42, and resumed runs warn on a seed mismatch. This changes the train/val split RNG relative to v0.2.0, so a config that omits seed will produce a different split than before. All shipped sample configs now set seed: 42. To restore fully random behavior, set seed: null explicitly.
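Why the default matters can be seen with a seeded shuffle. The split_indices function below is a hypothetical sketch of a seeded train/val split, not sleap-nn's actual splitting code: with a fixed seed the split is identical on every run, while seed=None seeds from OS entropy, so each run can differ.

```python
import random


def split_indices(n_frames, val_fraction=0.1, seed=42):
    """Hypothetical train/val split: shuffle frame indices with a seeded RNG.

    seed=None makes random.Random seed itself from system randomness,
    so repeated calls generally produce different splits.
    """
    rng = random.Random(seed)
    idx = list(range(n_frames))
    rng.shuffle(idx)
    n_val = max(1, int(round(n_frames * val_fraction)))
    return sorted(idx[n_val:]), sorted(idx[:n_val])


train_a, val_a = split_indices(100, seed=42)
train_b, val_b = split_indices(100, seed=42)
assert val_a == val_b            # same seed: identical split on every run
print(len(train_a), len(val_a))  # 90 10
```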

⚠️ Default prediction .slp is non-embedded

By default, a prediction .slp is now written non-embedded: it references the original source videos (--embed false, --restore_source_videos true) instead of being a self-contained .pkg.slp. Pass --embed true for a self-contained file. The GUI and any downstream .slp loader must expect non-embedded outputs and need access to the referenced videos to read frame images.


New Features

Unified sleap-nn predict + Predictor API

The inference stack gained a unified command and a clean Python surface. The one-call predict() returns sio.Labels; Predictor is reusable; raw model outputs and streaming are first-class.

from sleap_nn.inference import predict, Predictor

# One call -> sio.Labels (single-stage / bottom-up)
labels = predict("video.mp4", model_paths=["models/my_model/"])
# Top-down (centroid + centered-instance)
labels = predict("video.mp4", model_paths=["models/centroid/", "models/ci/"])

# Build once, predict many times
predictor = Predictor.from_model_paths(["models/bottomup/"], device="cuda")
labels = predictor.predict("video.mp4", peak_threshold=0.3)

# Raw outputs (confmaps / PAFs / centroids) without building Labels
for out in predictor.predict("video.mp4", make_labels=False, return_confmaps=True):
    cms = out.pred_confmaps

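# Predict directly on in-memory frames: np.ndarray / torch.Tensor (N, H, W, C)
import numpy as np
frames = np.zeros((4, 512, 512, 1), dtype=np.uint8)  # placeholder: 4 blank grayscale frames
labels = predictor.predict(frames)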
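The in-memory path expects an (N, H, W, C) batch. If your frames come as a single image or a grayscale stack, a small helper can coerce them first. as_nhwc is a hypothetical helper, not part of sleap-nn, and this sketch does not imply that predict() itself accepts other layouts.

```python
import numpy as np


def as_nhwc(frames):
    """Hypothetical helper: coerce frames into the (N, H, W, C) layout.

    Accepts (H, W), (H, W, C), (N, H, W), or (N, H, W, C). A 3-D array is
    ambiguous: it is treated as (H, W, C) when the last axis is 1 or 3
    (likely channels), otherwise as an (N, H, W) grayscale stack.
    """
    arr = np.asarray(frames)
    if arr.ndim == 2:                # (H, W) -> (1, H, W, 1)
        return arr[None, :, :, None]
    if arr.ndim == 3:
        if arr.shape[-1] in (1, 3):  # (H, W, C) -> (1, H, W, C)
            return arr[None]
        return arr[..., None]        # (N, H, W) -> (N, H, W, 1)
    if arr.ndim == 4:
        return arr
    raise ValueError(f"expected 2-4 dims, got shape {arr.shape}")


print(as_nhwc(np.zeros((480, 640), np.uint8)).shape)     # (1, 480, 640, 1)
print(as_nhwc(np.zeros((8, 480, 640), np.uint8)).shape)  # (8, 480, 640, 1)
```

The result can then be handed to predictor.predict(...) as in the snippet above.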

Top-level Python API for discoverability

predict, Predictor, and a new load_models(...) convenience are importable directly from sleap_nn:

import sleap_nn

labels = sleap_nn.predict("video.mp4", model_paths=["models/my_model/"])   # one-shot
predictor = sleap_nn.load_models(["models/bottomup/"], device="cuda")      # reusable Predictor
labels = predictor.predict("video.mp4")

Centroid-only models

Train a lone centroid head and use it end-to-end. A single centroid model directory is auto-detected as a centroid-only model; the output collapses to a single-node centroid skeleton emitting sio.Centroid, and evaluation uses distance matching.

sleap-nn predict -m models/centroid/ -i video.mp4 -o centroids.slp
sleap-nn eval -g gt.slp -p centroids.slp --match_method centroid

Centroid-only models are an inference/predict feature; running one through the legacy track path raises a clear error (use predict). Standalone ONNX/TensorRT export is supported.
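The notes say only that centroid evaluation uses "distance matching". As an illustration of the general idea (not sleap-nn's eval code), the sketch below derives a ground-truth centroid as the mean of visible nodes, in the spirit of the "mean-of-visible-nodes anchor fallback" from #562, and then greedily pairs the closest ground-truth/predicted centroids. max_dist is a hypothetical threshold.

```python
import math


def visible_mean(points):
    """Mean of the visible (non-None) (x, y) nodes; None if no node is visible."""
    vis = [p for p in points if p is not None]
    if not vis:
        return None
    return (sum(x for x, _ in vis) / len(vis), sum(y for _, y in vis) / len(vis))


def match_centroids(gt, pred, max_dist=10.0):
    """Greedy distance matching: closest (gt, pred) pairs first, each used at most once."""
    pairs = sorted(
        (math.dist(g, p), i, j) for i, g in enumerate(gt) for j, p in enumerate(pred)
    )
    used_gt, used_pred, matches = set(), set(), []
    for d, i, j in pairs:
        if d > max_dist:
            break  # pairs are sorted, so every remaining pair is even farther
        if i in used_gt or j in used_pred:
            continue
        used_gt.add(i)
        used_pred.add(j)
        matches.append((i, j, d))
    return matches


gt_instances = [[(10, 10), (14, 10), None], [(100, 50), (104, 54)]]
gt_centroids = [visible_mean(inst) for inst in gt_instances]  # [(12.0, 10.0), (102.0, 52.0)]
pred_centroids = [(101, 52), (12, 11), (300, 300)]            # the last one is a false positive
print(match_centroids(gt_centroids, pred_centroids))          # [(0, 1, 1.0), (1, 0, 1.0)]
```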

Instance segmentation — experimental

  • Bottom-up: training stack, masks carried through to sio.Labels (frame.masks), wired into predict with --min_mask_area/--fg_threshold.
  • Top-down (centered_instance_segmentation): offset-aware mask decode, training, inference, and eval routing/CLI.
  • SAM-prompted (opt-in, lazy heavy deps): SAM1 and SAM3 backends, mask-based reconciliation + re-tracking.
  • Tracking & post-processing: mask-IoU tracker; adaptive distance-gate + greedy RAG fragment-merge; mask-at-output-stride encoding with opt-in morphology and polygon ROI.
  • Eval & training parity: COCO mask AP/AR/boundary-IoU/fragmentation/per-size + PQ; post-training mask-IoU eval; augmentation/viz/eval parity; predicted-mask overlay in training viz.
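The mask-IoU tracker is described only as an MVP. To illustrate the underlying idea (this is not sleap-nn's tracker), the sketch computes IoU between boolean masks and assigns current-frame masks to previous-frame tracks greedily, highest IoU first. min_iou is a hypothetical cutoff; unmatched current masks would start new tracks.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union else 0.0


def assign_tracks(prev_masks, curr_masks, min_iou=0.3):
    """Greedy one-to-one mask-IoU assignment; returns {curr_index: prev_index}."""
    scored = sorted(
        ((mask_iou(p, c), i, j)
         for i, p in enumerate(prev_masks)
         for j, c in enumerate(curr_masks)),
        reverse=True,
    )
    taken_prev, assignment = set(), {}
    for iou, i, j in scored:
        if iou < min_iou:
            break  # remaining pairs overlap even less
        if i in taken_prev or j in assignment:
            continue
        taken_prev.add(i)
        assignment[j] = i
    return assignment


def square(r, c, size=10, shape=(40, 40)):
    """Boolean mask with a size x size square whose top-left corner is (r, c)."""
    m = np.zeros(shape, dtype=bool)
    m[r:r + size, c:c + size] = True
    return m


prev = [square(0, 0), square(20, 20)]
curr = [square(21, 21), square(1, 1)]  # same animals, moved 1 px, detected in a different order
print(assign_tracks(prev, curr))       # {0: 1, 1: 0}
```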

Segmentation is experimental in 0.3.0 — see Known Limitations.

Tracking

KalmanShiftTracker (--use_kalman, requires a target instance count; uses pykalman) lands alongside optical-flow shift, with a per-node "keypoints" tracking mode. FlowShiftTracker no longer crashes on frames with no detections. Setting a track cap (--max_tracks) now auto-switches the candidate method to local_queues (logged at INFO) so the cap is actually honored — previously it was silently ignored under the default fixed_window (#670).
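The core of Kalman-based tracking is predicting where each animal will be in the next frame, then matching detections against that prediction instead of the last observed position. The class below is a minimal 2-D constant-velocity Kalman filter written directly in NumPy for illustration; it is not KalmanShiftTracker (which uses pykalman), and the noise parameters are arbitrary assumptions.

```python
import numpy as np


class ConstantVelocityKalman:
    """Minimal 2-D constant-velocity Kalman filter (illustrative sketch).

    State is [x, y, vx, vy]; only the position (x, y) is observed.
    """

    def __init__(self, xy, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.x = np.array([xy[0], xy[1], 0.0, 0.0])    # state estimate, zero initial velocity
        self.P = np.eye(4) * 10.0                      # large initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)  # constant-velocity motion model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)  # we observe position only
        self.Q = np.eye(4) * process_var               # process noise
        self.R = np.eye(2) * meas_var                  # measurement noise

    def predict(self):
        """Propagate the state one step; returns the predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def update(self, z):
        """Fold an observed (x, y) position into the state estimate."""
        innovation = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P


kf = ConstantVelocityKalman((0.0, 0.0))
for t in range(1, 20):       # an animal moving +2 px per frame along x
    kf.predict()
    kf.update((2.0 * t, 0.0))
next_xy = kf.predict()       # expected position at t = 20
print(next_xy)               # approximately [40, 0]
```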

Inference UX & I/O

  • Repeatable --output_format — write several formats by repeating the flag: --output_format slp --output_format analysis_h5 writes both a .slp and a SLEAP Analysis HDF5 (one .analysis.h5 per video).
  • Remote URLs (http/https/s3/gs/…) accepted as --data_path.
  • Configurable output embedding / source-video controls for the .slp (--embed, --restore_source_videos).
  • Frame-based progress with a windowed FPS column, inference spin-up & run-summary logging, and a tracking progress bar.
  • --workspace-size-gb for TensorRT export.
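A repeatable flag simply collects one value per occurrence. The sketch below shows that pattern with argparse's action="append"; it is not sleap-nn's CLI code (which may use a different CLI framework), the choices are limited to the two formats named above, and falling back to slp when the flag is absent is an assumption.

```python
import argparse

parser = argparse.ArgumentParser(prog="predict-sketch")
# action="append" collects one list entry per occurrence of the flag.
parser.add_argument(
    "--output_format",
    action="append",
    choices=["slp", "analysis_h5"],
    help="Repeat to write several output formats.",
)


def formats(argv):
    args = parser.parse_args(argv)
    # Assumed default when the flag is never given: write only a .slp.
    return args.output_format or ["slp"]


print(formats(["--output_format", "slp", "--output_format", "analysis_h5"]))  # ['slp', 'analysis_h5']
print(formats([]))                                                            # ['slp']
```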

Training

Symmetry-aware flip augmentation; a hard error on multi-instance frames in single-instance training; and negative-frame split metrics (val/loss_negative, val/n_negative) with a double-count fix. Out-of-bounds keypoints — off-frame annotations, or nodes pushed outside the crop/frame by augmentation — are now masked to empty targets instead of producing phantom Gaussian blobs at the image/crop edge (#666). Training visualizations can now be saved as JPG to shrink the local viz/ folder when training a battery of models (trainer_config.viz_img_format: jpg; #644).
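Two of these training changes are easy to picture on a keypoint array. In the sketch below, hflip_points mirrors x and then swaps symmetric node pairs so a mirrored left ear is still labeled as the left ear, and mask_out_of_bounds sets off-image nodes to NaN (the changelog entry for #666 describes NaN-masking off-crop/off-frame nodes). The symmetry_pairs input, the (instances, nodes, 2) layout, and the pixel-center convention (valid x in [0, width - 1]) are assumptions, not sleap-nn's internals.

```python
import numpy as np


def hflip_points(points, width, symmetry_pairs):
    """Mirror keypoints horizontally and swap symmetric node labels.

    points: (n_instances, n_nodes, 2) array of (x, y); NaN marks missing nodes.
    symmetry_pairs: hypothetical list of (left_index, right_index) node pairs.
    """
    out = points.copy()
    out[..., 0] = (width - 1) - out[..., 0]  # mirror x (pixel-center convention assumed)
    for left, right in symmetry_pairs:
        # The right-hand side is a copy (advanced indexing), so this is a true swap.
        out[:, [left, right]] = out[:, [right, left]]
    return out


def mask_out_of_bounds(points, height, width):
    """Set nodes outside [0, width-1] x [0, height-1] to NaN so they produce no target."""
    out = points.copy()
    x, y = out[..., 0], out[..., 1]
    oob = (x < 0) | (x > width - 1) | (y < 0) | (y > height - 1)  # NaN compares False
    out[oob] = np.nan
    return out


# Hypothetical 3-node skeleton: left_ear, right_ear, tail.
pts = np.array([[[10.0, 5.0], [30.0, 5.0], [20.0, 40.0]]])
flipped = hflip_points(pts, width=64, symmetry_pairs=[(0, 1)])
print(flipped[0])  # left_ear at x=33, right_ear at x=53, tail at x=43

off = np.array([[[10.0, 5.0], [70.0, 5.0], [20.0, -3.0]]])
masked = mask_out_of_bounds(off, height=48, width=64)
print(masked[0])   # only the first node survives; the other two become NaN
```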


Fixes

  • FlowShiftTracker no longer crashes on detection-less frames (#612).
  • A warning now fires when SizeMatcher silently resizes input frames (#561).
  • WandBRenderer peak-values shape fixed for the centroid case (#557).
  • Each Tracker gets its own _track_objects state (mutable-default bug) (#592).
  • Three bottom-up segmentation inference parity bugs fixed (#614).
  • Single-instance ONNX/TensorRT export now antialiases the input resize to match PyTorch inference, fixing a ~5 px keypoint discrepancy (#672).
  • Mixed-resolution .slp / .pkg.slp inputs no longer crash predict / post-training eval — LabelsProvider now grows each batch to a shared image shape and closes it at a video/resolution boundary (#678).
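The mixed-resolution fix boils down to never letting a batch span two image shapes. The generator below is a minimal sketch of that batching rule, not LabelsProvider's code; the (video_id, shape, payload) record format is hypothetical. A batch is closed when it is full or when the next frame comes from a different video or has a different shape.

```python
def batch_by_shape(records, batch_size):
    """Group (video_id, image_shape, payload) records into shape-homogeneous batches.

    Yields (image_shape, [payload, ...]) tuples; no batch mixes videos or resolutions.
    """
    batch, key = [], None
    for video_id, shape, payload in records:
        k = (video_id, shape)
        if batch and (k != key or len(batch) == batch_size):
            yield key[1], batch  # close the batch at a boundary or when it is full
            batch = []
        batch.append(payload)
        key = k
    if batch:
        yield key[1], batch


records = [
    ("v1", (480, 640, 1), 0),
    ("v1", (480, 640, 1), 1),
    ("v1", (480, 640, 1), 2),
    ("v2", (1080, 1920, 1), 3),
    ("v2", (1080, 1920, 1), 4),
]
for shape, batch in batch_by_shape(records, batch_size=2):
    print(shape, batch)
# (480, 640, 1) [0, 1]
# (480, 640, 1) [2]
# (1080, 1920, 1) [3, 4]
```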

Dependencies & Build

  • sleap-io >=0.8.0,<0.9.0 from PyPI; 0.8.0 save/analysis behavior is regression-guarded.
  • GPU (cu130) is the default uv backend; gpu/cpu are first-class torch extras so --extra gpu attaches cuDNN.
  • pykalman is a new (lazy-imported) core dependency for Kalman tracking.

Documentation

Inference docs migrated to sleap-nn predict with nav/overview cleanup (plus multi-GPU & Windows uv fixes); a flies13 top-down training + tracking demo notebook; segmentation docs/config polish; and a docs-correctness sweep (fixed broken copy-paste examples, corrected eval defaults, and seed: 42 in all sample configs). The model reference also gained a reworked plain-language "Choosing a Model" picker and a new Supervised ID guide for the multi_class_topdown / multi_class_bottomup model types (#677).


Known Limitations (planned for 0.3.1)

The segmentation/SAM stack is experimental and intentionally limited; it is opt-in and does not affect the core pose train/predict/eval paths.

  • No config-generator/TUI scaffolding for the segmentation model types — hand-author YAML or copy a sample config (config_bottomup_segmentation_unet.yaml, config_topdown_centered_instance_segmentation_unet.yaml).
  • Mask tracking is an MVP; mask_output="polygon" writes only frame.rois (not re-trainable / mask-evaluable); SAM reconciliation/re-tracking has no CLI producer yet.
  • Inference API: there is no clean accessor yet for the underlying torch nn.Module or for head-swap "model surgery"; the realtime layer.warmup() is not auto-invoked, so the first single-frame call pays the cold-start cost.
  • Eval: evaluation.get_instances ignores LabeledFrame.centroids, so centroid predictions saved with the non-default emit_centroid="centroid" are not picked up by evaluation; .slp files from the default instance emission evaluate correctly.

Upgrade Notes

  • Pose inference: prefer sleap-nn predict … (the v0.2.0 sleap-nn track … still works as a legacy command).
  • Exported models: sleap-nn predict -m <export_dir> --runtime onnx|tensorrt, or in Python Predictor.from_export_dir(<export_dir>, runtime="onnx"|"tensorrt") (the ONNXPredictor class is gone).
  • Run centroid-only models with predict, not track.
  • Expect prediction .slp files to be non-embedded and to reference the original source videos by default (--embed true restores embedding).
  • If you pin sleap-io, move to >=0.8.0,<0.9.0; if you call Labels.merge() / .match() directly, pass track="name" to keep the pre-0.8.0 behavior.
  • A config that omits seed now defaults to 42 (different split RNG than v0.2.0).
  • The legacy from sleap_nn.predict import run_inference import moved to from sleap_nn.legacy_predict import run_inference (the sleap_nn.predict name is now the high-level inference function).
  • SLEAP GUI integrators: route centroid-only models to predict, stop using the removed exported-inference entry points, and tolerate the new default non-embedded .slp output.

Changelog

  • #530: New unified inference pipeline / Predictor API (#508) (@gitttt-1234)
  • #557: Fix WandBRenderer peak_values shape for the centroid case (@davorvr)
  • #558: Restrict Codecov upload to talmolab/sleap-nn (skip upload in forks) (@davorvr)
  • #559: Expose --workspace-size-gb to the export CLI (TensorRT) (@davorvr)
  • #560: Fix swint torch.fx.wrap so torch.compile works (#527) (@gitttt-1234)
  • #561: Warn when SizeMatcher silently resizes input frames (@tom21100227)
  • #562: Centroid-only inference + mean-of-visible-nodes anchor fallback (@gitttt-1234)
  • #563: Device-agnostic layer buffers + Linux spawn-context for the PAF pool (@gitttt-1234)
  • #564: Inference preprocessing parity with the legacy pipeline (@gitttt-1234)
  • #580: Inference loader/factory fork (independent of the legacy predictors module) (@gitttt-1234)
  • #585: Inference parity/correctness follow-ups (@talmo)
  • #587: Inference CLI/streaming/feature follow-ups (@talmo)
  • #588: Inference test-coverage + minor-correctness follow-ups (@talmo)
  • #589: Centroid-only models (1/3): core inference — collapse + sio.Centroid emission (@talmo)
  • #590: Centroid-only models (2/3): distance eval + single-point tracking + train post-eval (@talmo)
  • #591: Centroid-only models (3/3): authoring UX + export consistency + docs (@talmo)
  • #592: Give each Tracker its own _track_objects dict (mutable attrs default) (#574) (@talmo)
  • #593: Accept .ckpt / training_config paths in model_paths (#575) (@talmo)
  • #594: Validation-side negative-frame split metrics (val/loss_negative, val/n_negative) (#577) (@talmo)
  • #595: Add KalmanShiftTracker for legacy parity (#572) (@talmo)
  • #596: Fix KalmanShiftTracker algorithmic correctness (#572 follow-up) (@talmo)
  • #597: Fix negative-frame metric double-count; add weighted/unweighted + per-head split metrics (@talmo)
  • #599: Add keypoints (per-node pose) tracking mode for KalmanShiftTracker (#572) (@talmo)
  • #600: Default seed to 42 and warn on seed mismatch during resume (@gitttt-1234)
  • #601: Add a progress bar for tracking in the new inference pipeline (@gitttt-1234)
  • #603: Bottom-up instance segmentation — training stack (refresh of #501) (@talmo)
  • #604: Carry segmentation masks through Outputs → sio.Labels (@talmo)
  • #605: Wire segmentation into the new predict pipeline (train→predict) (@talmo)
  • #607: Rename the inference subcommand to predict (@gitttt-1234)
  • #608: Post-training mask-IoU evaluation for bottom-up segmentation (@talmo)
  • #612: Fix FlowShiftTracker crash on frames with no detections (#611) (@talmo)
  • #613: Add --output_format to save predictions directly as analysis HDF5 (@tom21100227)
  • #614: Fix three bottom-up segmentation inference parity bugs (@talmo)
  • #615: Unify exported-model inference into sleap-nn predict (@gitttt-1234)
  • #624: Segmentation polish — docs/config, PQ eval, postproc knobs, offset viz (@talmo)
  • #626: Frame-based inference progress + windowed FPS column (#610) (@gitttt-1234)
  • #628: Inference spin-up + run-summary logging (#610) (@gitttt-1234)
  • #629: COCO-style mask AP/AR/boundary-IoU/fragmentation/per-size eval (#616) (@talmo)
  • #630: Mask-IoU tracker MVP for bottom-up segmentation (#619) (@talmo)
  • #631: Encode masks at output-stride + opt-in morphology + polygon ROI (#618) (@talmo)
  • #633: Bump sleap-io pin to main (4ee1fb38) (@talmo)
  • #634: Error on multi-instance frames in single-instance training (@gitttt-1234)
  • #635: Make gpu/cpu first-class torch extras so --extra gpu attaches cuDNN (#632) (@talmo)
  • #636: Adaptive distance-gate + greedy RAG fragment-merge postproc (#617) (@talmo)
  • #637: Top-down segmentation — offset-aware mask decode + sleap-io bump (#622) (@talmo)
  • #638: centered_instance_segmentation model type — config, data, training (#622) (@talmo)
  • #639: Top-down (crop-centered) segmentation inference (#622) (@talmo)
  • #640: Top-down segmentation eval routing + CLI + docs (#622) (@talmo)
  • #641: Lower bottomup_segmentation center-head sigma default 10.0 → 4.0 (@talmo)
  • #645: Symmetry-aware flip augmentation (@gitttt-1234)
  • #646: Mask-based reconciliation + re-tracking for SAM inference (@talmo)
  • #647: SAM1 prompted inference segmentation core (@talmo)
  • #648: SAM3 prompted mask backend (opt-in, mask_backend="sam3") (@talmo)
  • #649: Augmentation, viz & eval parity for segmentation training (@talmo)
  • #650: SAM-inference tech-debt cleanup + predict CLI surface (@talmo)
  • #651: Make codecov coverage upload non-blocking (@talmo)
  • #653: Training viz — overlay predicted per-instance masks (#627) (@talmo)
  • #654: Configurable image-embedding & source-video controls for the prediction output .slp (#652) (@talmo)
  • #659: Lock in sleap-io 0.8.0 save/analysis behavior (compat regression guards) (@talmo)
  • #660: Pin sleap-io to >=0.8.0,<0.9.0 from PyPI (drop git override) (@talmo)
  • #661: Accept remote URLs (http/s3/gs/...) as --data_path (@talmo)
  • #662: Make GPU (cu130) the default uv sync/run backend (@talmo)
  • #663: Migrate inference docs to sleap-nn predict + nav/overview cleanup (multi-GPU & Windows uv fixes) (@gitttt-1234)
  • #664: flies13 top-down training + tracking demo notebook (@talmo)
  • #665: 0.3.0 version bump + seed-config fix + docs-correctness sweep (@talmo)
  • #666: Filter out-of-bounds points before training (NaN-mask off-crop/off-frame nodes) (#571) (@gitttt-1234)
  • #667: Route np.ndarray / torch.Tensor predict() source to NumpyProvider (@talmo)
  • #668: Top-level predict/Predictor/load_models + repeatable --output_format (@talmo)
  • #669: Add trainer_config.viz_img_format (png|jpg) for local training-viz images (#644) (@talmo)
  • #670: Auto-switch to local_queues when max_tracks is set so the track cap is honored (sleap#2720) (@gitttt-1234)
  • #672: Antialias the single-instance ONNX resize to match PyTorch inference (@talmo)
  • #677: Clarify model selection + add a Supervised ID guide for multi_class_topdown/bottomup (#570) (@gitttt-1234)
  • #678: Batch LabelsProvider frames by shape so mixed-resolution .slp inputs don't crash predict/eval (@gitttt-1234)

Contributors: @talmo, @gitttt-1234, @tom21100227, @davorvr

Full Changelog: v0.2.0...v0.3.0