Merge branch 'main' into ci
tlpss committed Oct 25, 2023
2 parents 16e3959 + 15648f7 commit 9eb26f8
Showing 54 changed files with 1,398 additions and 1,196 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,8 +4,10 @@
.vscode/**

datasets/
scripts/dummy_dataset/
**wandb/
lightning_logs**
**.ckpt

**/checkpoints/
build/**
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -18,7 +18,7 @@ repos:
files: \.py$

- repo: https://github.com/PyCQA/isort
rev: 5.10.1
rev: 5.12.0
hooks:
- id: isort
name: isort - sort imports
99 changes: 80 additions & 19 deletions README.md
@@ -1,6 +1,6 @@
<h1 align="center">Pytorch Keypoint Detection</h1>

This repo contains a Python package for 2D keypoint detection using [Pytorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) and [wandb](https://docs.wandb.ai/). Keypoints are trained using Gaussian heatmaps, as in [Jakab et al.](https://proceedings.neurips.cc/paper/2018/hash/1f36c15d6a3d18d52e8d493bc8187cb9-Abstract.html) or [CenterNet](https://github.com/xingyizhou/CenterNet).
A framework for keypoint detection using [Pytorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) and [wandb](https://docs.wandb.ai/). Keypoints are trained with Gaussian heatmaps, as in [Jakab et al.](https://proceedings.neurips.cc/paper/2018/hash/1f36c15d6a3d18d52e8d493bc8187cb9-Abstract.html) or [CenterNet](https://github.com/xingyizhou/CenterNet).

This package has been used for research at the [AI and Robotics](https://airo.ugent.be/projects/computervision/) research group at Ghent University. You can see some applications below: the first image shows how this package is used to detect corners of cardboard boxes so that a robot can close them; the second example shows how it is used to detect a varying number of flowers.
<div align="center">
@@ -10,15 +10,16 @@ This package has been used for research at the [AI and Robotics](https://airo.uge


## Main Features
- The detector can deal with an **arbitrary number of keypoint channels**, each of which can contain **a varying number of keypoints**. You can easily configure which keypoint types from the COCO dataset should be mapped onto the different channels of the keypoint detector. This flexibility allows you to, e.g., combine different semantic locations that have symmetries onto the same channel to overcome this ambiguity.
- We use the standard **COCO dataset format**.

- This package contains **different backbones** (Unet-like, dilated CNN, Unet-like with pretrained ConvNeXt encoder). Furthermore, you can easily add new backbones or loss functions. The head of the keypoint detector is a single CNN layer.
- The package uses the widely used **COCO dataset format**.
- The detector can deal with an **arbitrary number of keypoint channels**, each of which can contain **a varying number of keypoints**. You can easily configure which keypoint types from the COCO dataset should be mapped onto the different channels of the keypoint detector (see the configuration example below this feature list).
- The package contains an implementation of the Average Precision metric for keypoint detection.
- Extensive **logging to wandb is provided**: the loss for each channel is logged, together with the AP metrics for all specified threshold distances. Furthermore, the raw heatmaps, detected keypoints and ground truth heatmaps are logged at every epoch for the first batch to provide insight into the training dynamics and to verify that all data processing is as desired.
- **Different backbones** can be used (Unet-like, dilated CNN, Unet-like with pretrained encoders). Furthermore, you can easily add new backbones or loss functions. The head of the keypoint detector is a single CNN layer.

- The package contains an implementation of the Average Precision metric for keypoint detection. The threshold distance for classifying detections as FP or TP is based on the L2 distance between the detected keypoints and the ground truth keypoints.
- Extensive **logging to wandb is provided**: the train/val loss for each channel is logged, together with the AP metrics for all specified threshold distances and all channels. Furthermore, the raw heatmaps, detected keypoints and ground truth heatmaps are logged to provide insight into the training dynamics and to verify that all data processing is as desired.
- All **hyperparameters are configurable** using a Python argument parser or wandb sweeps.
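
As an illustration of such a channel configuration, below is a minimal sketch. The keypoint names (`corner0`, `flower_center`, ...) are made-up placeholders; use the keypoint types defined in your own COCO dataset.

```
# Hypothetical channel configuration: each inner list becomes one output channel
# (one heatmap) of the detector. Grouping symmetric keypoint types, such as the
# four corners of a box, into a single channel removes their semantic ambiguity.
keypoint_channel_configuration = [
    ["corner0", "corner1", "corner2", "corner3"],  # channel 0: all box corners
    ["flower_center"],                             # channel 1: a single keypoint type
]
```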

Note: this is the second version of the package; for the older version, which used a custom dataset format, see the GitHub releases.
Note: this package is still under development and we make no commitment to backwards compatibility or reproducibility on the main branch. If you need this, it is best to pin a specific commit.


TODO: add integration example.
@@ -30,47 +31,107 @@ TODO: add integration example.
- run `wandb login` to set up your wandb account.
- you are now ready to start training.


## Training

To train a keypoint detector, run the `keypoint-detection train` CLI with the appropriate arguments.
To create your own configuration: run `keypoint-detection train -h` to see all parameter options and their documentation.

A good starting point is the bash script `bash test/integration_test.sh`, which trains on the provided test dataset of 4 images. You should see the loss go down consistently until the detector has completely overfit the train set and the loss is around the entropy of the ground truth heatmaps (if you selected the default BCE loss).

### Wandb sweeps
Alternatively, you can create a sweep on [wandb](https://wandb.ai) and then start one or more wandb agents. This is very useful for running multiple configurations (hyperparameter search, testing on multiple datasets, ...).
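
A minimal sketch of creating such a sweep programmatically is shown below; the program, metric and parameter names are placeholders (check `keypoint-detection train -h` for the actual hyperparameter names), so treat it as an illustration of the wandb sweep mechanism rather than a ready-made configuration.

```
import wandb

# Hypothetical sweep configuration; the entry point, metric and parameter names are placeholders.
sweep_configuration = {
    "program": "train.py",  # placeholder training entry point
    "method": "grid",
    "metric": {"name": "validation/loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 3e-4]},
        "max_epochs": {"values": [50]},
    },
}

sweep_id = wandb.sweep(sweep=sweep_configuration, project="keypoint-detection")
print(f"start one or more agents with: wandb agent {sweep_id}")
```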

### Loading pretrained weights
If you want to load pretrained keypoint detector weights, you can specify the wandb artifact of the checkpoint in the training parameters: `keypoint-detection train ..... -wandb_checkpoint_artifact <artifact-path>`. This can be used, for example, to finetune on real data after pretraining on synthetic data.

## Dataset

This package uses the [COCO format](https://cocodataset.org/#format-data) for keypoint annotation and expects a dataset with the following structure:
```
dataset/
images/
...
<name>.json : a COCO-formatted keypoint annotation file.
<name>.json : a COCO-formatted keypoint annotation file with filepaths relative to its parent directory.
```
For an example, see the `test_dataset` at `test/test_dataset`.
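
For reference, a minimal COCO-style keypoints annotation file could look roughly like the sketch below. The file, category and keypoint names are made-up placeholders; the `test_dataset` and the COCO format specification remain the authoritative examples.

```
import json

# Hypothetical minimal COCO keypoints dataset with a single image and annotation.
coco_dataset = {
    "images": [{"id": 1, "file_name": "images/0.png", "height": 256, "width": 256}],
    "categories": [
        {
            "id": 1,
            "name": "box",  # placeholder category name
            "keypoints": ["corner0", "corner1"],  # keypoint types of this category
        }
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,
            "category_id": 1,
            "num_keypoints": 2,
            # flat list of (x, y, visibility) triplets; x and y may be sub-pixel floats
            "keypoints": [120.5, 74.2, 2.0, 30.0, 200.1, 1.0],
        }
    ],
}

with open("dataset/annotations.json", "w") as f:
    json.dump(coco_dataset, f)
```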


### Labeling
If you want to label data, we provide integration with the [CVAT](https://github.com/opencv/cvat) labeling tool: You can annotate your data and export it in their custom format, which can then be converted to COCO format. Take a look [here](labeling/Readme.md) for more information on this workflow and an example. To visualize a given dataset, you can use the `keypoint_detection/utils/visualization.py` script.
If you want to label data, we use the [CVAT](https://github.com/opencv/cvat) labeling tool. The workflow and the code to create COCO keypoints datasets are available in the [airo-dataset-tools](https://github.com/airo-ugent/airo-mono/tree/main) package.

## Training
It is best to label your data with floats that represent the subpixel location of the keypoints. This allows for more precise resizing of the images later on. The keypoint detector casts them to ints before training to obtain the pixel they belong to (it does not support sub-pixel detections).
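
Concretely, the conversion inside the dataset boils down to a floor operation on each keypoint coordinate, along these lines:

```
import math

# sub-pixel COCO keypoints (x, y) are mapped to the pixel box they fall in
keypoints = [[120.7, 74.2], [30.0, 200.9]]
pixel_keypoints = [[math.floor(x), math.floor(y)] for x, y in keypoints]
print(pixel_keypoints)  # [[120, 74], [30, 200]]
```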

## Evaluation
TODO
`keypoint-detection eval --help`

There are 2 ways to train the keypoint detector:
## Fiftyone viewer
TODO
`scripts/fiftyone_viewer`

- The first is to run the `train.py` script with the appropriate arguments. e.g. from the root folder of this repo, you can run the bash script `bash test/integration_test.sh` to test on the provided test dataset, which contains 4 images. You should see the loss going down consistently until the detector has completely overfit the train set and the loss is around the entropy of the ground truth heatmaps (if you selected the default BCE loss).
## Using a trained model for Inference
During training Pytorch Lightning will have saved checkpoints. See `scripts/checkpoint_inference.py` for a simple example to run inference with a checkpoint.
For benchmarking the inference (or training), see `scripts/benchmark.py`.
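
A rough sketch of such checkpoint inference is given below; the import path, preprocessing and output handling are assumptions, so treat `scripts/checkpoint_inference.py` as the authoritative reference.

```
import torch

# Assumption: the package exposes its LightningModule under this import path;
# check scripts/checkpoint_inference.py for the actual path and preprocessing.
from keypoint_detection.models.detector import KeypointDetector

model = KeypointDetector.load_from_checkpoint("path/to/checkpoint.ckpt")
model.eval()

image = torch.rand(1, 3, 256, 256)  # dummy (N, C, H, W) batch with values in [0, 1]
with torch.no_grad():
    heatmaps = model(image)  # one heatmap per keypoint channel
print(heatmaps.shape)
```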

- The second method is to create a sweep on [wandb](https://wandb.ai) and then start a wandb agent from the correct relative location.
A minimal sweep example is given in `test/configuration.py`. The same content should be written to a yaml file according to the wandb format. The sweep can be started by running `wandb agent <sweep-id>` from your CLI.

## Metrics

To calculate AP, precision or recall, the detections need to be classified as true or false positives (with missed ground truth keypoints counting as false negatives), as for object detection or instance segmentation.

This package simply uses a number of Euclidean pixel distance thresholds. You can set the Euclidean distances for which you want to calculate the metrics in the hyperparameters.

Pixel-perfect keypoints have a pixel distance of 0, so if you want a metric for pixel-perfect keypoints you should add a threshold distance of 0.
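
As a small illustration (a sketch of the idea, not the package's internal implementation), classifying a single detection against a ground truth keypoint boils down to:

```
import math

def is_true_positive(detected, ground_truth, threshold_px):
    """A detection counts as a true positive if its L2 pixel distance
    to the ground truth keypoint is at most the chosen threshold."""
    return math.dist(detected, ground_truth) <= threshold_px

print(is_true_positive((120, 74), (121, 75), threshold_px=2))  # True (distance ~1.4)
print(is_true_positive((120, 74), (121, 75), threshold_px=0))  # False: not pixel-perfect
```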

Usually it is best to calculate the real-world deviations (in cm) that are acceptable and then determine the threshold(s) (in pixels) you are interested in.

In general, a lower threshold will result in a lower metric value. The size of this gap is determined by the 'ambiguity' of your dataset and/or the accuracy of your labels.
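
As a back-of-the-envelope example of picking such thresholds (assuming a pinhole camera model with a known focal length and object distance; all numbers below are made up):

```
# Convert an acceptable real-world deviation into a pixel distance threshold
# using the pinhole projection: pixels ~= focal_length_px * error_m / distance_m.
focal_length_px = 600.0      # focal length expressed in pixels
object_distance_m = 0.5      # distance from the camera to the keypoints
acceptable_error_m = 0.005   # 5 mm of acceptable real-world deviation

threshold_px = focal_length_px * acceptable_error_m / object_distance_m
print(f"use a distance threshold of ~{threshold_px:.0f} pixels")  # ~6 pixels
```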

TODO: add a figure to illustrate this.


We do not use OKS as in COCO for the following reasons:
1. It requires bbox annotations, which are not needed for keypoint detection itself and represent additional labeling effort.
2. More importantly, in robotics the size of an object does not always correlate with the required precision. If a large and a small mug stand on a table, they require equally precise localisation of keypoints for a robot to grasp them, even though their apparent sizes are different.
3. It requires estimating the per-keypoint label variance, though you could simply set k=1 and skip this part.

To create your own configuration: run `python train.py -h` to see all parameter options and their documentation.

## Using a trained model (Inference)
During training Pytorch Lightning will have saved checkpoints. See `scripts/checkpoint_inference.py` for a simple example to run inference with a checkpoint.
For benchmarking the inference (or training), see `scripts/benchmark.py`.

## Development info
- formatting and linting is done using [pre-commit](https://pre-commit.com/)
- testing is done using pytest (with GitHub Actions for CI)


## Note on performance
- Keep in mind that the Average Precision is a very expensive operation; it can easily take as long to calculate the AP on a 10% data split as it takes to train on the remaining 90% of the data. Therefore it makes sense to use the metric sparingly. The AP will always be calculated at the final epoch, so for optimal train performance (w/o intermediate feedback), you can e.g. set the `ap_epoch_start` parameter to your max number of epochs + 1.
- Keep in mind that calculating the Average Precision is an expensive operation; it can easily take as long to calculate the AP on a 10% data split as it takes to train on the remaining 90% of the data. Therefore it makes sense to use the metric sparingly, for which hyperparameters are available. The AP will always be calculated at the final epoch.

## Note on top-down vs. bottom-up keypoint detection
There are 2 ways to do keypoint detection when multiple instances are present in an image:
1. first do instance detection and then detect keypoints on a crop of the bbox for each instance
2. detect keypoints on the full image.

Option 1 suffers from compounding errors (if the instance is not detected, no keypoints will be detected) and/or requires you to train (and hence label) an object detector.
Option 2 can have lower performance for the keypoints (more 'noise' in the image that can distract the detector), and if you have multiple keypoints per instance as well as multiple instances per image, you need to do keypoint association.

This repo is somewhat agnostic to that choice.
For option 1: crop your dataset upfront and train the detector on those crops; at inference, chain the object detector and the keypoint detector.
For option 2: if you can do the association manually, simply do it after inference. However, this repo does not offer learning the associations as in the [Part Affinity Fields]() paper.


## Rationale:
TODO
- why this repo?
- why not label keypoints as bboxes and use YOLO/Detectron2?
- ..

# Citing this project

You are invited to cite the following publication if you use this keypoint detector in your research:
```
@inproceedings{lips2022synthkeypoints,
title={Learning Keypoints from Synthetic Data for Robotic Cloth Folding},
author={Lips, Thomas and De Gusseme, Victor-Louis and others},
journal={2nd workshop on Representing and Manipulating Deformable Objects - ICRA},
year={2022}
}
```
5 changes: 3 additions & 2 deletions environment.yaml
@@ -1,11 +1,12 @@
name: keypoint-detection # to update an existing environment: conda env update -n <current_name> --file <path-to-this-file>
channels:
- pytorch
- nvidia
- conda-forge
dependencies:
- cudatoolkit=11.3
- python=3.9
- pytorch
- pytorch=1.13
- pytorch-cuda=11.7
- torchvision
- pip
- pip:
43 changes: 32 additions & 11 deletions keypoint_detection/data/coco_dataset.py
@@ -1,5 +1,6 @@
import argparse
import json
import math
import typing
from collections import defaultdict
from pathlib import Path
@@ -42,21 +43,23 @@ def add_argparse_args(parent_parser: argparse.ArgumentParser) -> argparse.Argume
"""
parser = parent_parser.add_argument_group("COCOkeypointsDataset")
parser.add_argument(
"--detect_non_visible_keypoints",
default=True,
type=str,
help="detect keypoints with visibility flag = 1? default = True",
"--detect_only_visible_keypoints",
dest="detect_only_visible_keypoints",
default=False,
action="store_true",
help="If set, only keypoints with flag > 1.0 will be used.",
)

return parent_parser

def __init__(
self,
json_dataset_path: str,
keypoint_channel_configuration: list[list[str]],
detect_non_visible_keypoints: bool = True,
detect_only_visible_keypoints: bool = True,
transform: A.Compose = None,
imageloader: ImageLoader = None,
**kwargs
**kwargs,
):
super().__init__(imageloader)

@@ -65,7 +68,9 @@ def __init__(
self.dataset_dir_path = self.dataset_json_path.parent # assume paths in JSON are relative to this directory!

self.keypoint_channel_configuration = keypoint_channel_configuration
self.detect_non_visible_keypoints = detect_non_visible_keypoints
self.detect_only_visible_keypoints = detect_only_visible_keypoints

print(f"{detect_only_visible_keypoints=}")

self.random_crop_transform = None
self.transform = transform
@@ -88,13 +93,27 @@ def __getitem__(self, index) -> Tuple[torch.Tensor, IMG_KEYPOINTS_TYPE]:

image_path = self.dataset_dir_path / self.dataset[index][0]
image = self.image_loader.get_image(str(image_path), index)
# remove alpha channel if needed
if image.shape[2] == 4:
image = image[..., :3]

keypoints = self.dataset[index][1]

if self.transform:
transformed = self.transform(image=image, keypoints=keypoints)
image, keypoints = transformed["image"], transformed["keypoints"]

# convert all keypoints to integer values.
# COCO keypoints can be floats if they specify the exact location of the keypoint (e.g. from CVAT)
# even though the COCO format specifies zero-indexed integers (i.e. every keypoint in the [0,1] x [0,1] pixel box becomes (0,0)).
# we convert them to ints here, as the heatmap generation will add a 0.5 offset to the keypoint location to center it in the pixel
# the distance metrics also operate on integer values.

# so basically from here on every keypoint is an int that represents the pixel-box in which the keypoint is located.
keypoints = [
[[math.floor(keypoint[0]), math.floor(keypoint[1])] for keypoint in channel_keypoints]
for channel_keypoints in keypoints
]
image = self.image_to_tensor_transform(image)
return image, keypoints

@@ -169,10 +188,12 @@ def is_keypoint_visible(self, keypoint: COCO_KEYPOINT_TYPE) -> bool:
Returns:
bool: True if current keypoint is considered visible according to the dataset configuration, else False
"""
minimal_flag = 0
if not self.detect_non_visible_keypoints:
minimal_flag = 1
return keypoint[2] > minimal_flag
if self.detect_only_visible_keypoints:
# filter out occluded keypoints with flag 1.0
return keypoint[2] > 1.5
else:
# filter out non-labeled keypoints with flag 0.0
return keypoint[2] > 0.5

@staticmethod
def split_list_in_keypoints(list_to_split: List[COCO_KEYPOINT_TYPE]) -> List[List[COCO_KEYPOINT_TYPE]]:
2 changes: 2 additions & 0 deletions keypoint_detection/data/coco_parser.py
@@ -51,6 +51,8 @@ class CocoKeypointAnnotation(BaseModel):
image_id: ImageID

num_keypoints: Optional[int]
# COCO keypoints can be floats if they specify the exact location of the keypoint (e.g. from CVAT)
# even though the COCO format specifies zero-indexed integers (i.e. every keypoint in the [0,1] x [0,1] pixel box becomes (0,0))
keypoints: List[float]

# TODO: add checks.
